public inbox for pbs-devel@lists.proxmox.com
 help / color / mirror / Atom feed
* [pbs-devel] [RFC proxmox] support quoted strings in property strings
@ 2022-02-16 13:39 Wolfgang Bumiller
  2022-02-17  8:58 ` Fabian Ebner
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Wolfgang Bumiller @ 2022-02-16 13:39 UTC (permalink / raw)
  To: pbs-devel
  Cc: pve-devel, Thomas Lamprecht, Dominik Csapak,
	Fabian Grünbichler, Hannes Laimer

This allows free form text to exist within property strings,
quoted, like:
    key="A value with \"quotes, also commas",key2=value2
or also:
    "the value for a default_key",key2=value2

And drop ';' as a key=value separator since those are meant
for arrays inside property strings...

Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
---
This is mostly a reaction to Hannes' maintenance mode series.
I think it would make more sense to improve our "property string
specification" (as much as there is one :P) to support quoted strings.
This way we can avoid the url encoding mess.

We could also do this in PVE (which would be particularly useful if we
want to allow adding notes to net/disk devices).
AFAICT the only property strings containing string values which would
in *theory* allow quotes in the beginning wouldn't work with them in
*practice*, (eg. the 'path' in a container's mount point, or an 'mdev'
in a VM's hostpci entry?)

 proxmox-schema/src/lib.rs             |   2 +
 proxmox-schema/src/property_string.rs | 163 ++++++++++++++++++++++++++
 proxmox-schema/src/schema.rs          |  25 ++--
 3 files changed, 177 insertions(+), 13 deletions(-)
 create mode 100644 proxmox-schema/src/property_string.rs

diff --git a/proxmox-schema/src/lib.rs b/proxmox-schema/src/lib.rs
index 4e98443..13b0739 100644
--- a/proxmox-schema/src/lib.rs
+++ b/proxmox-schema/src/lib.rs
@@ -19,6 +19,8 @@ pub use const_regex::ConstRegexPattern;
 pub mod de;
 pub mod format;
 
+pub(crate) mod property_string;
+
 mod schema;
 pub use schema::*;
 
diff --git a/proxmox-schema/src/property_string.rs b/proxmox-schema/src/property_string.rs
new file mode 100644
index 0000000..a40c1ca
--- /dev/null
+++ b/proxmox-schema/src/property_string.rs
@@ -0,0 +1,163 @@
+//! Property string parsing
+//! This may at some point also get a proper direkt `Serializer`/`Deserializer` for property
+//! strings.
+
+use std::borrow::Cow;
+use std::mem;
+
+use anyhow::{bail, format_err, Error};
+
+/// Iterate over the `key=value` pairs of a property string.
+///
+/// Note, that the `key` may be optional when the schema defines a "default" key.
+/// If the value does not require stripping backslash escapes it will be borrowed, otherwise an
+/// owned `String` will be returned.
+pub struct PropertyIterator<'a> {
+    data: &'a str,
+}
+
+impl<'a> PropertyIterator<'a> {
+    pub fn new(data: &'a str) -> Self {
+        Self { data }
+    }
+}
+
+impl<'a> Iterator for PropertyIterator<'a> {
+    type Item = Result<(Option<&'a str>, Cow<'a, str>), Error>;
+
+    fn next(&mut self) -> Option<Self::Item> {
+        if self.data.is_empty() {
+            return None;
+        }
+
+        let key = if self.data.starts_with('"') {
+            // value without key and quoted
+            None
+        } else {
+            let key = match self.data.find(&[',', '=']) {
+                Some(pos) if self.data.as_bytes()[pos] == b',' => None,
+                Some(pos) => Some(ascii_split_off(&mut self.data, pos)),
+                None => None,
+            };
+
+            if !self.data.starts_with('"') {
+                let value = match self.data.find(',') {
+                    Some(pos) => ascii_split_off(&mut self.data, pos),
+                    None => mem::take(&mut self.data),
+                };
+                return Some(Ok((key, Cow::Borrowed(value))));
+            }
+
+            key
+        };
+
+        let value = match parse_quoted_string(&mut self.data) {
+            Ok(value) => value,
+            Err(err) => return Some(Err(err)),
+        };
+
+        if !self.data.is_empty() {
+            if self.data.starts_with(',') {
+                self.data = &self.data[1..];
+            } else {
+                return Some(Err(format_err!("garbage after quoted string")));
+            }
+        }
+
+        Some(Ok((key, value)))
+    }
+}
+
+impl<'a> std::iter::FusedIterator for PropertyIterator<'a> {}
+
+/// Parse a quoted string and move `data` to after the closing quote.
+///
+/// The string must start with a double quote.
+///
+/// This allows `\"`, `\\` and `\n` as escape sequences.
+///
+/// If no escape sequence is found, `Cow::Borrowed` is returned, otherwise an owned `String` is
+/// returned.
+fn parse_quoted_string<'s>(data: &'_ mut &'s str) -> Result<Cow<'s, str>, Error> {
+    let data_out = data;
+    let data = data_out.as_bytes();
+
+    if data[0] != b'"' {
+        bail!("not a quoted string");
+    }
+
+    let mut i = 1;
+    while i != data.len() {
+        if data[i] == b'"' {
+            // unsafe: we're at an ascii boundary and index [0] is an ascii quote
+            let value = unsafe { std::str::from_utf8_unchecked(&data[1..i]) };
+            *data_out = unsafe { std::str::from_utf8_unchecked(&data[(i + 1)..]) };
+            return Ok(Cow::Borrowed(value));
+        } else if data[i] == b'\\' {
+            // we cannot borrow
+            break;
+        }
+        i += 1;
+    }
+    if i == data.len() {
+        // reached the end before reaching a quote
+        bail!("unexpected end of string");
+    }
+
+    // we're now at the first backslash, don't include it in the output:
+    let mut out = Vec::with_capacity(data.len());
+    out.extend_from_slice(&data[1..i]);
+    i += 1;
+    let mut was_backslash = true;
+    while i != data.len() {
+        match (data[i], mem::replace(&mut was_backslash, false)) {
+            (b'"', false) => {
+                i += 1;
+                break;
+            }
+            (b'"', true) => out.push(b'"'),
+            (b'\\', true) => out.push(b'\\'),
+            (b'n', true) => out.push(b'\n'),
+            (_, true) => bail!("unsupported escape sequence"),
+            (b'\\', false) => was_backslash = true,
+            (ch, false) => out.push(ch),
+        }
+        i += 1;
+    }
+
+    // unsafe: we know we're at an ascii boundary
+    *data_out = unsafe { std::str::from_utf8_unchecked(&data[i..]) };
+    Ok(Cow::Owned(unsafe { String::from_utf8_unchecked(out) }))
+}
+
+/// Like `str::split_at` but with assumes `mid` points to an ASCII character and the 2nd slice
+/// *excludes* `mid`.
+fn ascii_split_around(s: &str, mid: usize) -> (&str, &str) {
+    (&s[..mid], &s[(mid + 1)..])
+}
+
+/// Split "off" the first `mid` bytes of `s`, advancing it to `mid + 1` (assuming `mid` points to
+/// an ASCII character!).
+fn ascii_split_off<'a, 's>(s: &'a mut &'s str, mid: usize) -> &'s str {
+    let (a, b) = ascii_split_around(s, mid);
+    *s = b;
+    a
+}
+
+#[test]
+fn iterate_over_property_string() {
+    let data = r#""value w/o key",fst=v1,sec="with = in it",third=3,"and \" and '\\'""#;
+    let mut iter = PropertyIterator::new(data).map(|entry| entry.unwrap());
+    assert_eq!(iter.next().unwrap(), (None, Cow::Borrowed("value w/o key")));
+    assert_eq!(iter.next().unwrap(), (Some("fst"), Cow::Borrowed("v1")));
+    assert_eq!(
+        iter.next().unwrap(),
+        (Some("sec"), Cow::Borrowed("with = in it"))
+    );
+    assert_eq!(iter.next().unwrap(), (Some("third"), Cow::Borrowed("3")));
+    assert_eq!(
+        iter.next().unwrap(),
+        (None, Cow::Borrowed(r#"and " and '\'"#))
+    );
+    assert!(iter.next().is_none());
+}
diff --git a/proxmox-schema/src/schema.rs b/proxmox-schema/src/schema.rs
index 44c56e7..58c6a82 100644
--- a/proxmox-schema/src/schema.rs
+++ b/proxmox-schema/src/schema.rs
@@ -904,19 +904,18 @@ impl Schema {
             schema: T,
             default_key: Option<&'static str>,
         ) -> Result<Value, Error> {
-            let mut param_list: Vec<(String, String)> = vec![];
-            let key_val_list: Vec<&str> = value_str
-                .split(|c: char| c == ',' || c == ';')
-                .filter(|s| !s.is_empty())
-                .collect();
-            for key_val in key_val_list {
-                let kv: Vec<&str> = key_val.splitn(2, '=').collect();
-                if kv.len() == 2 {
-                    param_list.push((kv[0].trim().into(), kv[1].trim().into()));
-                } else if let Some(key) = default_key {
-                    param_list.push((key.into(), kv[0].trim().into()));
-                } else {
-                    bail!("Value without key, but schema does not define a default key.");
+            let mut param_list = Vec::new();
+            for entry in crate::property_string::PropertyIterator::new(value_str) {
+                let (key, value) = entry?;
+                match key {
+                    Some(key) => param_list.push((key.to_string(), value.into_owned())),
+                    None => {
+                        if let Some(key) = default_key {
+                            param_list.push((key.to_string(), value.into_owned()));
+                        } else {
+                            bail!("Value without key, but schema does not define a default key.");
+                        }
+                    }
                 }
             }
 
-- 
2.30.2





^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [pbs-devel] [RFC proxmox] support quoted strings in property strings
  2022-02-16 13:39 [pbs-devel] [RFC proxmox] support quoted strings in property strings Wolfgang Bumiller
@ 2022-02-17  8:58 ` Fabian Ebner
  2022-02-17  9:47   ` Wolfgang Bumiller
  2022-02-17 11:13 ` [pbs-devel] applied: " Thomas Lamprecht
  2022-02-18  8:02 ` [pbs-devel] " Fabian Ebner
  2 siblings, 1 reply; 6+ messages in thread
From: Fabian Ebner @ 2022-02-17  8:58 UTC (permalink / raw)
  To: pbs-devel, Wolfgang Bumiller

Am 16.02.22 um 14:39 schrieb Wolfgang Bumiller:
> This allows free form text to exist within property strings,
> quoted, like:
>     key="A value with \"quotes, also commas",key2=value2
> or also:
>     "the value for a default_key",key2=value2
> 
> And drop ';' as a key=value separator since those are meant
> for arrays inside property strings...
> 

Isn't that backwards-incompatible?

> Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
> ---
> This is mostly a reaction to Hannes' maintenance mode series.
> I think it would make more sense to improve our "property string
> specification" (as much as there is one :P) to support quoted strings.
> This way we can avoid the url encoding mess.
> 
> We could also do this in PVE (which would be particularly useful if we
> want to allow adding notes to net/disk devices).
> AFAICT the only property strings containing string values which would
> in *theory* allow quotes in the beginning wouldn't work with them in
> *practice*, (eg. the 'path' in a container's mount point, or an 'mdev'
> in a VM's hostpci entry?)
> 
>  proxmox-schema/src/lib.rs             |   2 +
>  proxmox-schema/src/property_string.rs | 163 ++++++++++++++++++++++++++
>  proxmox-schema/src/schema.rs          |  25 ++--
>  3 files changed, 177 insertions(+), 13 deletions(-)
>  create mode 100644 proxmox-schema/src/property_string.rs
> 
> diff --git a/proxmox-schema/src/lib.rs b/proxmox-schema/src/lib.rs
> index 4e98443..13b0739 100644
> --- a/proxmox-schema/src/lib.rs
> +++ b/proxmox-schema/src/lib.rs
> @@ -19,6 +19,8 @@ pub use const_regex::ConstRegexPattern;
>  pub mod de;
>  pub mod format;
>  
> +pub(crate) mod property_string;
> +
>  mod schema;
>  pub use schema::*;
>  
> diff --git a/proxmox-schema/src/property_string.rs b/proxmox-schema/src/property_string.rs
> new file mode 100644
> index 0000000..a40c1ca
> --- /dev/null
> +++ b/proxmox-schema/src/property_string.rs
> @@ -0,0 +1,163 @@
> +//! Property string parsing
> +//! This may at some point also get a proper direkt `Serializer`/`Deserializer` for property
> +//! strings.
> +
> +use std::borrow::Cow;
> +use std::mem;
> +
> +use anyhow::{bail, format_err, Error};
> +
> +/// Iterate over the `key=value` pairs of a property string.
> +///
> +/// Note, that the `key` may be optional when the schema defines a "default" key.

Nit: But the iterator does not use the schema. Shouldn't the comment
here rather just describe the behavior of the iterator?

> +/// If the value does not require stripping backslash escapes it will be borrowed, otherwise an
> +/// owned `String` will be returned.
> +pub struct PropertyIterator<'a> {
> +    data: &'a str,
> +}
> +




^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [pbs-devel] [RFC proxmox] support quoted strings in property strings
  2022-02-17  8:58 ` Fabian Ebner
@ 2022-02-17  9:47   ` Wolfgang Bumiller
  0 siblings, 0 replies; 6+ messages in thread
From: Wolfgang Bumiller @ 2022-02-17  9:47 UTC (permalink / raw)
  To: Fabian Ebner; +Cc: pbs-devel

On Thu, Feb 17, 2022 at 09:58:03AM +0100, Fabian Ebner wrote:
> Am 16.02.22 um 14:39 schrieb Wolfgang Bumiller:
> > This allows free form text to exist within property strings,
> > quoted, like:
> >     key="A value with \"quotes, also commas",key2=value2
> > or also:
> >     "the value for a default_key",key2=value2
> > 
> > And drop ';' as a key=value separator since those are meant
> > for arrays inside property strings...
> > 
> 
> Isn't that backwards-incompatible?

Technically yes, but currently we aren't using it, it's not compatible
with pve and it's also not documented, so chances are nobody manually
wrote this in a config file yet.

In pve we for instance have
    features: mount=ext4;ntfs;foo
in containers and
    cpu: flags=+FOO;-BAR
in VMs

and it was initially meant to work the same in pbs...

I really do prefer "breaking" this ASAP.

> > +/// Iterate over the `key=value` pairs of a property string.
> > +///
> > +/// Note, that the `key` may be optional when the schema defines a "default" key.
> 
> Nit: But the iterator does not use the schema. Shouldn't the comment
> here rather just describe the behavior of the iterator?

It doesn't, but it's meant specifically for property strings and the
iterator is not exposed to the public right now. Iow. it's currently a
parser-internal detail.
(A reason to keep it internal is also that the de-quoting happening is
very limited (on purpose), and exposing it might give people the idea of
adding things like `\xXY` or `\U0ABC` support to the string parser, too.
But if someone needs that, I'd rather look for an existing parse/quote
crate first before adding more custom code... ;-) )
(In fact... it may even be nicer code-wise if the default key was itself
stored in the iterator... not sure... but I have some serde
De/Serializer WIP/POC code for property strings to avoid the
serde_json::Value intermediate step for those (particularly for the
*verify* case))




^ permalink raw reply	[flat|nested] 6+ messages in thread

* [pbs-devel] applied: [RFC proxmox] support quoted strings in property strings
  2022-02-16 13:39 [pbs-devel] [RFC proxmox] support quoted strings in property strings Wolfgang Bumiller
  2022-02-17  8:58 ` Fabian Ebner
@ 2022-02-17 11:13 ` Thomas Lamprecht
  2022-02-18  8:02 ` [pbs-devel] " Fabian Ebner
  2 siblings, 0 replies; 6+ messages in thread
From: Thomas Lamprecht @ 2022-02-17 11:13 UTC (permalink / raw)
  To: Proxmox Backup Server development discussion, Wolfgang Bumiller
  Cc: pve-devel, Hannes Laimer

On 16.02.22 14:39, Wolfgang Bumiller wrote:
> This allows free form text to exist within property strings,
> quoted, like:
>     key="A value with \"quotes, also commas",key2=value2
> or also:
>     "the value for a default_key",key2=value2
> 
> And drop ';' as a key=value separator since those are meant
> for arrays inside property strings...
> 
> Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
> ---
> This is mostly a reaction to Hannes' maintenance mode series.
> I think it would make more sense to improve our "property string
> specification" (as much as there is one :P) to support quoted strings.
> This way we can avoid the url encoding mess.
> 
> We could also do this in PVE (which would be particularly useful if we
> want to allow adding notes to net/disk devices).
> AFAICT the only property strings containing string values which would
> in *theory* allow quotes in the beginning wouldn't work with them in
> *practice*, (eg. the 'path' in a container's mount point, or an 'mdev'
> in a VM's hostpci entry?)
> 
>  proxmox-schema/src/lib.rs             |   2 +
>  proxmox-schema/src/property_string.rs | 163 ++++++++++++++++++++++++++
>  proxmox-schema/src/schema.rs          |  25 ++--
>  3 files changed, 177 insertions(+), 13 deletions(-)
>  create mode 100644 proxmox-schema/src/property_string.rs
> 
>

applied, thanks!

Would be great to get now for PVE too, albeit we can wait out the rust take over
there ;-P

@Hannes: can you rework the maintenance series to use this now, having mode and
message/comment more cleanly separated? I'd already apply the rest of that series,
but fwict it'd need changes in the first patch already, and the latter do not
apply independently.




^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [pbs-devel] [RFC proxmox] support quoted strings in property strings
  2022-02-16 13:39 [pbs-devel] [RFC proxmox] support quoted strings in property strings Wolfgang Bumiller
  2022-02-17  8:58 ` Fabian Ebner
  2022-02-17 11:13 ` [pbs-devel] applied: " Thomas Lamprecht
@ 2022-02-18  8:02 ` Fabian Ebner
  2 siblings, 0 replies; 6+ messages in thread
From: Fabian Ebner @ 2022-02-18  8:02 UTC (permalink / raw)
  To: pbs-devel, Wolfgang Bumiller

> +fn parse_quoted_string<'s>(data: &'_ mut &'s str) -> Result<Cow<'s, str>, Error> {
> +    let data_out = data;
> +    let data = data_out.as_bytes();
> +
> +    if data[0] != b'"' {
> +        bail!("not a quoted string");
> +    }
> +
> +    let mut i = 1;
> +    while i != data.len() {
> +        if data[i] == b'"' {

What if a byte that's part of an UTF-8 character is equal to b'"', or
can that not happen?

> +            // unsafe: we're at an ascii boundary and index [0] is an ascii quote
> +            let value = unsafe { std::str::from_utf8_unchecked(&data[1..i]) };
> +            *data_out = unsafe { std::str::from_utf8_unchecked(&data[(i + 1)..]) };
> +            return Ok(Cow::Borrowed(value));
> +        } else if data[i] == b'\\' {
> +            // we cannot borrow
> +            break;
> +        }
> +        i += 1;
> +    }




^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [pbs-devel] [RFC proxmox] support quoted strings in property strings
@ 2022-02-18  8:13 Wolfgang Bumiller
  0 siblings, 0 replies; 6+ messages in thread
From: Wolfgang Bumiller @ 2022-02-18  8:13 UTC (permalink / raw)
  To: Fabian Ebner, pbs-devel


> On 02/18/2022 9:02 AM Fabian Ebner <f.ebner@proxmox.com> wrote:
> 
>  
> > +fn parse_quoted_string<'s>(data: &'_ mut &'s str) -> Result<Cow<'s, str>, Error> {
> > +    let data_out = data;
> > +    let data = data_out.as_bytes();
> > +
> > +    if data[0] != b'"' {
> > +        bail!("not a quoted string");
> > +    }
> > +
> > +    let mut i = 1;
> > +    while i != data.len() {
> > +        if data[i] == b'"' {
> 
> What if a byte that's part of an UTF-8 character is equal to b'"', or
> can that not happen?

This cannot happen as then utf-8 wouldn't be ascii compatible.
In multibyte characters each byte has the high bit set and the first byte's
number of set high bits before the first zero bit tells you the length of the
sequence.
If that length were to cover a b'"' byte then the string was invalid to begin
with, which is not our fault.




^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2022-02-18  8:13 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-02-16 13:39 [pbs-devel] [RFC proxmox] support quoted strings in property strings Wolfgang Bumiller
2022-02-17  8:58 ` Fabian Ebner
2022-02-17  9:47   ` Wolfgang Bumiller
2022-02-17 11:13 ` [pbs-devel] applied: " Thomas Lamprecht
2022-02-18  8:02 ` [pbs-devel] " Fabian Ebner
2022-02-18  8:13 Wolfgang Bumiller

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox
Service provided by Proxmox Server Solutions GmbH | Privacy | Legal