public inbox for pve-devel@lists.proxmox.com
 help / color / mirror / Atom feed
* [pve-devel] [PATCH proxmox-apt] Support quoted parsing for .list format
@ 2021-07-01 13:46 Fabian Ebner
  2021-07-01 13:46 ` [pve-devel] [PATCH proxmox-apt 1/2] avoid backtick unicode symbol in string Fabian Ebner
                   ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: Fabian Ebner @ 2021-07-01 13:46 UTC (permalink / raw)
  To: pve-devel

First patch is just replacing the unicode back-tick that caused
problems down the line. Not related to the second patch.

Second patch makes the .lists files parser aware of quoting, which
fixes parsing CD ROM and file:// repositories with spaces, and in
general makes the parser behave more like APT.

Fabian Ebner (2):
  avoid backtick unicode symbol in string
  support quote-word parsing for one-line format

 src/repositories/file/list_parser.rs      | 143 +++++++++++++++++-----
 src/repositories/repository.rs            |  58 ++++++++-
 src/repositories/standard.rs              |   2 +-
 tests/sources.list.d.expected/cdroms.list |  10 ++
 tests/sources.list.d.expected/files.list  |   4 +
 tests/sources.list.d/cdroms.list          |   7 ++
 tests/sources.list.d/files.list           |   2 +
 7 files changed, 187 insertions(+), 39 deletions(-)
 create mode 100644 tests/sources.list.d.expected/cdroms.list
 create mode 100644 tests/sources.list.d.expected/files.list
 create mode 100644 tests/sources.list.d/cdroms.list
 create mode 100644 tests/sources.list.d/files.list

-- 
2.30.2





^ permalink raw reply	[flat|nested] 4+ messages in thread

* [pve-devel] [PATCH proxmox-apt 1/2] avoid backtick unicode symbol in string
  2021-07-01 13:46 [pve-devel] [PATCH proxmox-apt] Support quoted parsing for .list format Fabian Ebner
@ 2021-07-01 13:46 ` Fabian Ebner
  2021-07-01 13:46 ` [pve-devel] [PATCH proxmox-apt 2/2] support quote-word parsing for one-line format Fabian Ebner
  2021-07-01 16:30 ` [pve-devel] applied: [PATCH proxmox-apt] Support quoted parsing for .list format Thomas Lamprecht
  2 siblings, 0 replies; 4+ messages in thread
From: Fabian Ebner @ 2021-07-01 13:46 UTC (permalink / raw)
  To: pve-devel

Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
---
 src/repositories/standard.rs | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/repositories/standard.rs b/src/repositories/standard.rs
index 9d48670..463e735 100644
--- a/src/repositories/standard.rs
+++ b/src/repositories/standard.rs
@@ -110,7 +110,7 @@ impl APTRepositoryHandle {
             APTRepositoryHandle::NoSubscription => {
                 "This is the recommended repository for testing and non-production use. \
                 Its packages are not as heavily tested and validated as the production ready \
-                enterprise repository. You don’t need a subscription key to access this repository."
+                enterprise repository. You don't need a subscription key to access this repository."
             }
             APTRepositoryHandle::Test => {
                 "This repository contains the latest packages and is primarily used for test labs \
-- 
2.30.2





^ permalink raw reply	[flat|nested] 4+ messages in thread

* [pve-devel] [PATCH proxmox-apt 2/2] support quote-word parsing for one-line format
  2021-07-01 13:46 [pve-devel] [PATCH proxmox-apt] Support quoted parsing for .list format Fabian Ebner
  2021-07-01 13:46 ` [pve-devel] [PATCH proxmox-apt 1/2] avoid backtick unicode symbol in string Fabian Ebner
@ 2021-07-01 13:46 ` Fabian Ebner
  2021-07-01 16:30 ` [pve-devel] applied: [PATCH proxmox-apt] Support quoted parsing for .list format Thomas Lamprecht
  2 siblings, 0 replies; 4+ messages in thread
From: Fabian Ebner @ 2021-07-01 13:46 UTC (permalink / raw)
  To: pve-devel

so that parsing CD ROM repositories with spaces in the name works too.
But it's not limited to that, and should make one-line parsing rather
similar to what APT does (stanza parsing in APT doesn't use
ParseQuoteWord at all AFAICS).

Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
---
 src/repositories/file/list_parser.rs      | 143 +++++++++++++++++-----
 src/repositories/repository.rs            |  58 ++++++++-
 tests/sources.list.d.expected/cdroms.list |  10 ++
 tests/sources.list.d.expected/files.list  |   4 +
 tests/sources.list.d/cdroms.list          |   7 ++
 tests/sources.list.d/files.list           |   2 +
 6 files changed, 186 insertions(+), 38 deletions(-)
 create mode 100644 tests/sources.list.d.expected/cdroms.list
 create mode 100644 tests/sources.list.d.expected/files.list
 create mode 100644 tests/sources.list.d/cdroms.list
 create mode 100644 tests/sources.list.d/files.list

diff --git a/src/repositories/file/list_parser.rs b/src/repositories/file/list_parser.rs
index 86c9955..04c1729 100644
--- a/src/repositories/file/list_parser.rs
+++ b/src/repositories/file/list_parser.rs
@@ -1,7 +1,6 @@
 use std::convert::TryInto;
 use std::io::BufRead;
-use std::iter::{Iterator, Peekable};
-use std::str::SplitAsciiWhitespace;
+use std::iter::Iterator;
 
 use anyhow::{bail, format_err, Error};
 
@@ -9,6 +8,78 @@ use crate::repositories::{APTRepository, APTRepositoryFileType, APTRepositoryOpt
 
 use super::APTRepositoryParser;
 
+// TODO convert %-escape characters. Also adapt printing back accordingly,
+// because at least '%' needs to be re-escaped when printing.
+/// See APT's ParseQuoteWord in contrib/strutl.cc
+///
+/// Doesn't split on whitespace when between `[]` or `""` and strips `"` from the word.
+///
+/// Currently, %-escaped characters are not interpreted, but passed along as is.
+struct SplitQuoteWord {
+    rest: String,
+    position: usize,
+}
+
+impl SplitQuoteWord {
+    pub fn new(string: String) -> Self {
+        Self {
+            rest: string,
+            position: 0,
+        }
+    }
+}
+
+impl Iterator for SplitQuoteWord {
+    type Item = Result<String, Error>;
+
+    fn next(&mut self) -> Option<Self::Item> {
+        let rest = &self.rest[self.position..];
+
+        let mut start = None;
+        let mut wait_for = None;
+
+        for (n, c) in rest.chars().enumerate() {
+            self.position += 1;
+
+            if let Some(wait_for_char) = wait_for {
+                if wait_for_char == c {
+                    wait_for = None;
+                }
+                continue;
+            }
+
+            if char::is_ascii_whitespace(&c) {
+                if let Some(start) = start {
+                    return Some(Ok(rest[start..n].replace('"', "")));
+                }
+                continue;
+            }
+
+            if start == None {
+                start = Some(n);
+            }
+
+            if c == '"' {
+                wait_for = Some('"');
+            }
+
+            if c == '[' {
+                wait_for = Some(']');
+            }
+        }
+
+        if let Some(wait_for) = wait_for {
+            return Some(Err(format_err!("missing terminating '{}'", wait_for)));
+        }
+
+        if let Some(start) = start {
+            return Some(Ok(rest[start..].replace('"', "")));
+        }
+
+        None
+    }
+}
+
 pub struct APTListFileParser<R: BufRead> {
     input: R,
     line_nr: usize,
@@ -31,24 +102,18 @@ impl<R: BufRead> APTListFileParser<R> {
     /// Errors when options are invalid or not closed by `']'`.
     fn parse_options(
         options: &mut Vec<APTRepositoryOption>,
-        tokens: &mut Peekable<SplitAsciiWhitespace>,
+        tokens: &mut SplitQuoteWord,
     ) -> Result<(), Error> {
-        let mut option = match tokens.peek() {
-            Some(token) => {
-                match token.strip_prefix('[') {
-                    Some(option) => option,
-                    None => return Ok(()), // doesn't look like options
-                }
-            }
-            None => return Ok(()),
-        };
-
-        tokens.next(); // avoid reading the beginning twice
-
         let mut finished = false;
+
         loop {
+            let mut option = match tokens.next() {
+                Some(token) => token?,
+                None => bail!("options not closed by ']'"),
+            };
+
             if let Some(stripped) = option.strip_suffix(']') {
-                option = stripped;
+                option = stripped.to_string();
                 if option.is_empty() {
                     break;
                 }
@@ -83,11 +148,6 @@ impl<R: BufRead> APTListFileParser<R> {
             if finished {
                 break;
             }
-
-            option = match tokens.next() {
-                Some(option) => option,
-                None => bail!("options not closed by ']'"),
-            }
         }
 
         Ok(())
@@ -122,24 +182,43 @@ impl<R: BufRead> APTListFileParser<R> {
             line = line_start;
         }
 
-        let mut tokens = line.split_ascii_whitespace().peekable();
-
-        match tokens.next() {
-            Some(package_type) => {
+        // e.g. quoted "deb" is not accepted by APT, so no need for quote word parsing here
+        line = match line.split_once(|c| char::is_ascii_whitespace(&c)) {
+            Some((package_type, rest)) => {
                 repo.types.push(package_type.try_into()?);
+                rest
             }
             None => return Ok(None), // empty line
-        }
+        };
 
-        Self::parse_options(&mut repo.options, &mut tokens)?;
+        line = line.trim_start_matches(|c| char::is_ascii_whitespace(&c));
+
+        let has_options = match line.strip_prefix('[') {
+            Some(rest) => {
+                // avoid the start of the options to be interpreted as the start of a quote word
+                line = rest;
+                true
+            }
+            None => false,
+        };
+
+        let mut tokens = SplitQuoteWord::new(line.to_string());
+
+        if has_options {
+            Self::parse_options(&mut repo.options, &mut tokens)?;
+        }
 
         // the rest of the line is just '<uri> <suite> [<components>...]'
-        let mut tokens = tokens.map(str::to_string);
         repo.uris
-            .push(tokens.next().ok_or_else(|| format_err!("missing URI"))?);
-        repo.suites
-            .push(tokens.next().ok_or_else(|| format_err!("missing suite"))?);
-        repo.components.extend(tokens);
+            .push(tokens.next().ok_or_else(|| format_err!("missing URI"))??);
+        repo.suites.push(
+            tokens
+                .next()
+                .ok_or_else(|| format_err!("missing suite"))??,
+        );
+        for token in tokens {
+            repo.components.push(token?);
+        }
 
         repo.comment = std::mem::take(&mut self.comment);
 
diff --git a/src/repositories/repository.rs b/src/repositories/repository.rs
index cf17380..4e1ea6e 100644
--- a/src/repositories/repository.rs
+++ b/src/repositories/repository.rs
@@ -433,6 +433,41 @@ fn suite_variant(suite: &str) -> (&str, &str) {
     (suite, "")
 }
 
+/// Strips existing double quotes from the string first, and then adds double quotes at
+/// the beginning and end if there is an ASCII whitespace in the `string`, which is not
+/// escaped by `[]`.
+fn quote_for_one_line(string: &str) -> String {
+    let mut add_quotes = false;
+    let mut wait_for_bracket = false;
+
+    // easier to just quote the whole string, so ignore pre-existing quotes
+    // currently, parsing removes them anyways, but being on the safe side is rather cheap
+    let string = string.replace('"', "");
+
+    for c in string.chars() {
+        if wait_for_bracket {
+            if c == ']' {
+                wait_for_bracket = false;
+            }
+            continue;
+        }
+
+        if char::is_ascii_whitespace(&c) {
+            add_quotes = true;
+            break;
+        }
+
+        if c == '[' {
+            wait_for_bracket = true;
+        }
+    }
+
+    match add_quotes {
+        true => format!("\"{}\"", string),
+        false => string,
+    }
+}
+
 /// Writes a repository in one-line format followed by a blank line.
 ///
 /// Expects that `repo.file_type == APTRepositoryFileType::List`.
@@ -457,15 +492,26 @@ fn write_one_line(repo: &APTRepository, w: &mut dyn Write) -> Result<(), Error>
 
     if !repo.options.is_empty() {
         write!(w, "[ ")?;
-        repo.options
-            .iter()
-            .try_for_each(|option| write!(w, "{}={} ", option.key, option.values.join(",")))?;
+
+        for option in repo.options.iter() {
+            let option = quote_for_one_line(&format!("{}={}", option.key, option.values.join(",")));
+            write!(w, "{} ", option)?;
+        }
+
         write!(w, "] ")?;
     };
 
-    write!(w, "{} ", repo.uris[0])?;
-    write!(w, "{} ", repo.suites[0])?;
-    writeln!(w, "{}", repo.components.join(" "))?;
+    write!(w, "{} ", quote_for_one_line(&repo.uris[0]))?;
+    write!(w, "{} ", quote_for_one_line(&repo.suites[0]))?;
+    writeln!(
+        w,
+        "{}",
+        repo.components
+            .iter()
+            .map(|comp| quote_for_one_line(comp))
+            .collect::<Vec<String>>()
+            .join(" ")
+    )?;
 
     writeln!(w)?;
 
diff --git a/tests/sources.list.d.expected/cdroms.list b/tests/sources.list.d.expected/cdroms.list
new file mode 100644
index 0000000..7d75573
--- /dev/null
+++ b/tests/sources.list.d.expected/cdroms.list
@@ -0,0 +1,10 @@
+# deb [ trusted=yes ] cdrom:[Proxmox VE 5.1]/ stretch pve
+
+# deb [ trusted=yes ] cdrom:[Proxmox VE 5.1]/proxmox/packages/ / 
+
+deb [ trusted=yes ] cdrom:[Proxmox VE 7.0 BETA]/ bullseye pve
+
+deb cdrom:[Proxmox VE 7.0 BETA]/proxmox/packages/ / 
+
+deb [ trusted=yes ] cdrom:[Debian GNU/Linux 10.6.0 _Buster_ - Official amd64 NETINST 20200926-10:16]/ buster main
+
diff --git a/tests/sources.list.d.expected/files.list b/tests/sources.list.d.expected/files.list
new file mode 100644
index 0000000..5e77023
--- /dev/null
+++ b/tests/sources.list.d.expected/files.list
@@ -0,0 +1,4 @@
+deb [ trusted=yes ] "file:///some/spacey/mount point/" bullseye pve
+
+deb [ lang=it ] "file:///some/spacey/mount point/proxmox/packages/" / 
+
diff --git a/tests/sources.list.d/cdroms.list b/tests/sources.list.d/cdroms.list
new file mode 100644
index 0000000..3b12626
--- /dev/null
+++ b/tests/sources.list.d/cdroms.list
@@ -0,0 +1,7 @@
+#deb [trusted=yes] cdrom:[Proxmox VE 5.1]/ stretch pve
+#deb [trusted=yes] cdrom:[Proxmox VE 5.1]/proxmox/packages/ /
+
+deb [trusted=yes] cdrom:[Proxmox VE 7.0 BETA]/ bullseye pve
+deb cdrom:[Proxmox VE 7.0 BETA]/proxmox/packages/ /
+
+deb [ "trusted=yes" ] cdrom:[Debian GNU/Linux 10.6.0 _Buster_ - Official amd64 NETINST 20200926-10:16]/ buster main
diff --git a/tests/sources.list.d/files.list b/tests/sources.list.d/files.list
new file mode 100644
index 0000000..4a5e4c2
--- /dev/null
+++ b/tests/sources.list.d/files.list
@@ -0,0 +1,2 @@
+deb [trusted=yes] "file:///some/spacey/mount point/" bullseye pve
+deb [lang="it"] file:///some/spacey/"mount point"/proxmox/packages/ /
-- 
2.30.2





^ permalink raw reply	[flat|nested] 4+ messages in thread

* [pve-devel] applied: [PATCH proxmox-apt] Support quoted parsing for .list format
  2021-07-01 13:46 [pve-devel] [PATCH proxmox-apt] Support quoted parsing for .list format Fabian Ebner
  2021-07-01 13:46 ` [pve-devel] [PATCH proxmox-apt 1/2] avoid backtick unicode symbol in string Fabian Ebner
  2021-07-01 13:46 ` [pve-devel] [PATCH proxmox-apt 2/2] support quote-word parsing for one-line format Fabian Ebner
@ 2021-07-01 16:30 ` Thomas Lamprecht
  2 siblings, 0 replies; 4+ messages in thread
From: Thomas Lamprecht @ 2021-07-01 16:30 UTC (permalink / raw)
  To: Proxmox VE development discussion, Fabian Ebner

On 01.07.21 15:46, Fabian Ebner wrote:
> First patch is just replacing the unicode back-tick that caused
> problems down the line. Not related to the second patch.
> 
> Second patch makes the .lists files parser aware of quoting, which
> fixes parsing CD ROM and file:// repositories with spaces, and in
> general makes the parser behave more like APT.
> 
> Fabian Ebner (2):
>   avoid backtick unicode symbol in string
>   support quote-word parsing for one-line format
> 

applied, thanks! And argh, that's what one gets from copy-pasting ascidooc *rendered*
html, always does fancy non-ascii characters..




^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2021-07-01 16:31 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-07-01 13:46 [pve-devel] [PATCH proxmox-apt] Support quoted parsing for .list format Fabian Ebner
2021-07-01 13:46 ` [pve-devel] [PATCH proxmox-apt 1/2] avoid backtick unicode symbol in string Fabian Ebner
2021-07-01 13:46 ` [pve-devel] [PATCH proxmox-apt 2/2] support quote-word parsing for one-line format Fabian Ebner
2021-07-01 16:30 ` [pve-devel] applied: [PATCH proxmox-apt] Support quoted parsing for .list format Thomas Lamprecht

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox
Service provided by Proxmox Server Solutions GmbH | Privacy | Legal