Re: [pdm-devel] [PATCH proxmox 1/3] pve-api-types: add FixedString type

From: Wolfgang Bumiller <w.bumiller@proxmox.com>
To: Stefan Hanreich <s.hanreich@proxmox.com>
Cc: pdm-devel@lists.proxmox.com
Subject: Re: [pdm-devel] [PATCH proxmox 1/3] pve-api-types: add FixedString type
Date: Wed, 12 Nov 2025 11:50:01 +0100	[thread overview]
Message-ID: <azh3ifpvho4xs4qqti6wbmkjn5atenshwbap3ycijv3mx3badm@uxj5bjp33xx3> (raw)
In-Reply-To: <20251112092225.17890-2-s.hanreich@proxmox.com>

On Wed, Nov 12, 2025 at 10:22:05AM +0100, Stefan Hanreich wrote:
> A custom, immutable, string type, that is copiable, which can hold up
> to 23 characters. It will be used later for storing the name of
> unknown enum variants when parsing the return value of enum
> properties. The main reason for introducing this type is that its copy
> as opposed to String, which lets us keep the Copy implementation for
> the existing enums.
> 
> Main reason for choosing 23 characters is the fact that String is 24
> bytes large as well (ptr, len, capacity).
> 
> Signed-off-by: Stefan Hanreich <s.hanreich@proxmox.com>
> ---
>  pve-api-types/src/types/fixed_string.rs | 331 ++++++++++++++++++++++++
>  pve-api-types/src/types/mod.rs          |   3 +
>  2 files changed, 334 insertions(+)
>  create mode 100644 pve-api-types/src/types/fixed_string.rs
> 
> diff --git a/pve-api-types/src/types/fixed_string.rs b/pve-api-types/src/types/fixed_string.rs
> new file mode 100644
> index 00000000..f1692227
> --- /dev/null
> +++ b/pve-api-types/src/types/fixed_string.rs
> @@ -0,0 +1,331 @@
> +use std::cmp::Ordering;
> +use std::error::Error;
> +use std::fmt::{Debug, Display, Formatter};
> +use std::ops::Deref;
> +use std::str::FromStr;
> +
> +use serde::{Deserialize, Serialize};
> +
> +/// Error type used by constructors of [`FixedString`]
> +#[derive(Clone, Copy, Debug)]
> +pub enum FixedStringError {
> +    TooLong,
> +}

IMO this does not need to be an enum. `pub struct TooLongError` should
be enough.

> +
> +impl Error for FixedStringError {}
> +
> +impl Display for FixedStringError {
> +    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> Result<(), std::fmt::Error> {

^ you're importing `Formatter`.
And on that note: personally I prefer to import `std::fmt` and prefix
its items with `fmt::*`.

> +        f.write_str(match self {
> +            Self::TooLong => "string is longer than 23 characters",
> +        })
> +    }
> +}
> +
> +/// An immutable string type with a maximum size of 23 bytes.
> +///
> +/// After construction it is guaranteed that its contents are:
> +/// * valid utf-8
> +/// * not longer than 23 characters
> +///
> +/// FixedString is immutable, therefore it is sufficient to validate the invariants only at
> +/// construction time to guarantee that they will always hold during the lifecycle of the
> +/// struct.
> +#[derive(Clone, Copy)]
> +pub struct FixedString {
> +    buf: [u8; 23],
> +    len: u8,
> +}
> +
> +impl FixedString {
> +    /// Creates a new FixedString instance from a str reference.
> +    ///
> +    /// # Errors
> +    /// This function will return an error if:
> +    /// * The passed string is longer than 23 bytes
> +    pub fn new(value: &str) -> Result<Self, FixedStringError> {
> +        if value.len() > 23 {
> +            return Err(FixedStringError::TooLong);
> +        }
> +
> +        let mut buf = [0; 23];
> +        buf[..value.len()].copy_from_slice(value.as_bytes());
> +
> +        Ok(Self {
> +            buf,
> +            // SAFETY: self.len is at least 0 and at most 23, which fits into u8
> +            len: value.len() as u8,
> +        })
> +    }
> +
> +    /// Creates a new FixedString instance from a str reference, truncating if required
> +    ///
> +    /// It works by copying the bytes from the supplied value into its own local buffer. If the
> +    /// string is longer than 23 bytes, then this methods will truncate the string at the nearest
> +    /// utf-8 boundary, so even if a string longer than 23 characters is passed in, it is possible
> +    /// that it gets truncated to a length less than 23 if the utf-8 boundary doesn't perfectly
> +    /// align with the 23 bytes limit.
> +    pub fn new_truncated(value: &str) -> Self {

I don't think we should keep this unless we also have a way of checking
for this.
In case we do want to use this: since values up to 31 (a potential
still-cheap-enough extension to this `FixedString`'s length (since it
would then just be 4 pointers in size)) only require 5 bits, we have 3
bits remaining in the length byte which could be used to signal that
this was a truncated string, which could be returned via a
`.is_truncated()` method, and we could refuse to serialize such a string
without calling some extra method first (and include some marker in the
Debug impl)?

But IMO: for now, we just don't need it.

> +        // TODO: replace, once [`str::floor_char_boundary`] has been stabilized
> +        let len = if value.len() > 23 {
> +            let mut index = 23;
> +
> +            while index > 0 && !value.is_char_boundary(index) {
> +                index = index - 1;

index -= 1; ;-)

or:
    panic!("Who let Dominik add emojis to enums?")

> +            }
> +
> +            index
> +        } else {
> +            value.len()
> +        };

^ Actually the above could just be shortened to

let mut len = value.len();
if len > 23 {
    len = 23;
    while ... { len -= 1 }
}

> +
> +        let mut buf = [0; 23];
> +        buf[..len].copy_from_slice(&value.as_bytes()[..len]);
> +
> +        Self {
> +            buf,
> +            // SAFETY: self.len is at least 0 and at most 23, which fits into u8
> +            len: len as u8,
> +        }
> +    }
> +
> +    /// Returns a str reference to the stored data
> +    #[inline]
> +    pub fn as_str(&self) -> &str {
> +        // SAFETY: self.buf must be a valid utf-8 string by construction
> +        unsafe { str::from_utf8_unchecked(self.as_bytes()) }
> +    }
> +
> +    /// Returns a reference to the set bytes in the stored buffer
> +    #[inline]
> +    pub fn as_bytes(&self) -> &[u8] {
> +        // SAFETY: self.len >= 0 and self.len <= 23 by construction
> +        unsafe { self.buf.get_unchecked(..self.len as usize) }
> +    }
> +}
> +
> +impl PartialEq for FixedString {

IMO - regarding ordering where it matters - it would be easier to read
such impls when they're purely forwarded, eg.:

    PartialEq::eq(self.as_str(), other)

since then all implementations look the same (no need for `Some()`
wrapping in the `PartialOrd` impl, for instance.

Also, it could be generated with a simple forwarding macro which would
IMO also make review easier than having to go through each of them
individually:

    macro_rules! forward_impl_to_bytes {
        ($($trait:ident {$fn:ident -> $out:ty })+) => {
            $(
                impl $trait for FixedString {
                    fn $fn(&self, other: &Self) -> $out {
                        <[u8] as $trait>::$fn(self.as_bytes(), other.as_bytes())
                    }
                }
            )+
        };
    }

    forward_impl_to_bytes! {
        PartialEq { eq -> bool }
        PartialOrd { partial_cmp -> Option<Ordering> }
        Ord { cmp -> Ordering }
    }

And for the ones below:

    macro_rules! forward_impl_to_str_bidir {
        ($($trait:ident {$fn:ident -> $out:ty })+) => {
            $(
                impl $trait<str> for FixedString {
                    fn $fn(&self, other: &str) -> $out {
                        <str as $trait>::$fn(self.as_str(), other)
                    }
                }

                impl $trait<FixedString> for &str {
                    fn $fn(&self, other: &FixedString) -> $out {
                        <str as $trait>::$fn(self, other.as_str())
                    }
                }
            )+
        };
    }

    forward_impl_to_str_bidir! {
        PartialEq { eq -> bool }
        PartialOrd { partial_cmp -> Option<Ordering> }
    }

Yes I realize that for 2 traits this doesn't save much, but it would
make the intention clearer and IMO is still easier to review.

> +    #[inline]
> +    fn eq(&self, other: &FixedString) -> bool {
> +        self.as_bytes() == other.as_bytes()
> +    }
> +}
> +
> +impl Eq for FixedString {}
> +
> +impl PartialOrd for FixedString {
> +    fn partial_cmp(&self, other: &FixedString) -> Option<Ordering> {
> +        Some(self.cmp(other))
> +    }
> +}
> +
> +impl Ord for FixedString {
> +    #[inline]
> +    fn cmp(&self, other: &FixedString) -> Ordering {
> +        self.as_bytes().cmp(other.as_bytes())
> +    }
> +}
> +
> +impl Display for FixedString {
> +    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> Result<(), std::fmt::Error> {
> +        f.write_str(self.as_str())

Consider forwarding to &str's fmt instead so that things such as
alignment etc. work:

    Display::fmt(self.as_str(), f)

so that `println!("{the_enum:>8}")` will print "     foo" instead of
just "foo", and `{x:0^8}` produces "00foo000" ;-)

> +    }
> +}
> +
> +impl Debug for FixedString {
> +    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> Result<(), std::fmt::Error> {
> +        f.write_str(self.as_str())

Same here with `Debug::fmt(self.as_str(), f)`.

> +    }
> +}
> +
> +impl Deref for FixedString {
> +    type Target = str;
> +
> +    fn deref(&self) -> &str {
> +        self.as_str()
> +    }
> +}
> +
> +impl AsRef<str> for FixedString {
> +    fn as_ref(&self) -> &str {
> +        self.as_str()
> +    }
> +}

I'd also add `AsRef<[u8]>`, and `Borrow<str>`. (The latter would be
required to allow a `HashMap<FixedString, T>` to use `.get("key")` with
`&str` for a key - not that that's something we'd expect to see...

> +
> +impl TryFrom<String> for FixedString {
> +    type Error = FixedStringError;
> +
> +    fn try_from(value: String) -> Result<Self, Self::Error> {
> +        FixedString::new(value.as_str())
> +    }
> +}
> +
> +impl FromStr for FixedString {
> +    type Err = FixedStringError;
> +
> +    fn from_str(value: &str) -> Result<Self, Self::Err> {
> +        FixedString::new(value)
> +    }
> +}
> +
> +impl TryFrom<&str> for FixedString {
> +    type Error = FixedStringError;
> +
> +    fn try_from(value: &str) -> Result<Self, Self::Error> {
> +        FixedString::new(value)
> +    }
> +}
> +
> +impl PartialEq<str> for FixedString {
> +    fn eq(&self, other: &str) -> bool {
> +        self.as_str().eq(other)
> +    }
> +}
> +
> +impl PartialEq<FixedString> for &str {
> +    fn eq(&self, other: &FixedString) -> bool {
> +        self.eq(&other.as_str())
> +    }
> +}
> +
> +impl PartialOrd<str> for FixedString {
> +    fn partial_cmp(&self, other: &str) -> Option<Ordering> {
> +        Some(self.as_str().cmp(&other))
> +    }
> +}
> +
> +impl PartialOrd<FixedString> for &str {
> +    fn partial_cmp(&self, other: &FixedString) -> Option<Ordering> {
> +        Some(self.cmp(&other.as_str()))
> +    }
> +}
> +
> +impl Serialize for FixedString {
> +    fn serialize<S>(&self, serializer: S) -> Result<S::Ok, S::Error>
> +    where
> +        S: serde::Serializer,
> +    {
> +        serializer.serialize_str(self.as_str())
> +    }
> +}
> +
> +impl<'de> Deserialize<'de> for FixedString {
> +    fn deserialize<D>(deserializer: D) -> Result<Self, D::Error>
> +    where
> +        D: serde::Deserializer<'de>,
> +    {
> +        deserializer.deserialize_str(FixedStringVisitor)
> +    }
> +}
> +
> +struct FixedStringVisitor;

^ I'd prefer to move the type and Visitor impl into the `fn deserialize`.

> +
> +impl<'de> serde::de::Visitor<'de> for FixedStringVisitor {
> +    type Value = FixedString;
> +
> +    fn expecting(&self, f: &mut Formatter) -> std::fmt::Result {
> +        f.write_str("a string that is at most 23 bytes long")
> +    }
> +
> +    fn visit_str<E>(self, v: &str) -> Result<Self::Value, E>
> +    where
> +        E: serde::de::Error,
> +    {
> +        v.try_into().map_err(E::custom)
> +    }
> +
> +    fn visit_string<E>(self, v: String) -> Result<Self::Value, E>
> +    where
> +        E: serde::de::Error,
> +    {

^ You can drop this. `visit_string` and `visit_borrowed_str` by default
forward to `visit_str`.

> +        v.try_into().map_err(E::custom)
> +    }
> +}
> +
> +#[cfg(test)]
> +mod tests {
> +    use super::*;
> +
> +    use serde_plain;
> +
> +    #[test]
> +    fn test_construct() {
> +        let fixed_string = FixedString::new("").expect("empty string is valid");
> +        assert_eq!("", fixed_string);
> +
> +        let fixed_string = FixedString::new("a").expect("valid string");
> +        assert_eq!("a", fixed_string);
> +
> +        let fixed_string = FixedString::new("🌏🌏🌏🌏🌏").expect("valid string");
> +        assert_eq!("🌏🌏🌏🌏🌏", fixed_string);
> +
> +        let fixed_string =
> +            FixedString::new("aaaaaaaaaaaaaaaaaaaaaaa").expect("23 characters are allowed");
> +        assert_eq!("aaaaaaaaaaaaaaaaaaaaaaa", fixed_string);
> +
> +        FixedString::new("🌏🌏🌏🌏🌏🌏").expect_err("string too long");
> +        FixedString::new("aaaaaaaaaaaaaaaaaaaaaaaa").expect_err("string too long");
> +    }
> +
> +    #[test]
> +    fn test_serialize_deserialize() {
> +        let valid_string = "aaaaaaaaaaaaaaaaaaaaaaa";
> +
> +        let fixed_string: FixedString =
> +            serde_plain::from_str("aaaaaaaaaaaaaaaaaaaaaaa").expect("deserialization works");
> +        assert_eq!(valid_string, fixed_string);
> +
> +        let serialized_string =
> +            serde_plain::to_string(&fixed_string).expect("can be serialized into a string");
> +        assert_eq!(valid_string, serialized_string);
> +
> +        serde_plain::from_str::<FixedString>("aaaaaaaaaaaaaaaaaaaaaaaa")
> +            .expect_err("cannot deserialize string that is too long");
> +    }
> +
> +    #[test]
> +    fn test_ord() {
> +        let fixed_string = FixedString::new("abc").expect("valid string");
> +
> +        assert!(fixed_string == fixed_string);
> +        assert!(fixed_string >= fixed_string);
> +        assert!(fixed_string <= fixed_string);
> +
> +        assert!("ab" < fixed_string);
> +        assert!("abc" == fixed_string);
> +        assert!("abcd" > fixed_string);
> +
> +        let larger_fixed_string = FixedString::new("abcde").expect("valid string");
> +
> +        assert!(larger_fixed_string > fixed_string);
> +        assert!(fixed_string < larger_fixed_string);
> +    }
> +
> +    #[test]
> +    fn test_truncate() {
> +        let fixed_string = FixedString::new_truncated("aaaaaaaaaaaaaaaaaaaaaaaaaaaa");
> +        assert_eq!("aaaaaaaaaaaaaaaaaaaaaaa", fixed_string);
> +
> +        let fixed_string = FixedString::new_truncated("aaaaaaaaaaaaaaaaaa💯");
> +        assert_eq!("aaaaaaaaaaaaaaaaaa💯", fixed_string);
> +
> +        // '💯' is 4 bytes long
> +        // first byte of '💯' is inside the 23 byte limit, but it's not a valid utf-8 boundary so
> +        // it should get truncated
> +        let fixed_string = FixedString::new_truncated("aaaaaaaaaaaaaaaaaaaaa💯");
> +        assert_eq!("aaaaaaaaaaaaaaaaaaaaa", fixed_string);
> +
> +        let fixed_string = FixedString::new_truncated("aaaaaaaaaaaaaaaaaaaaaa💯");
> +        assert_eq!("aaaaaaaaaaaaaaaaaaaaaa", fixed_string);
> +
> +        let fixed_string = FixedString::new_truncated("aaaaaaaaaaaaaaaaaaaaaaa💯");
> +        assert_eq!("aaaaaaaaaaaaaaaaaaaaaaa", fixed_string);
> +
> +        // '🌏' is 4 bytes long
> +        let fixed_string = FixedString::new_truncated("🌏🌏🌏🌏🌏");
> +        assert_eq!("🌏🌏🌏🌏🌏", fixed_string);
> +
> +        let fixed_string = FixedString::new_truncated("🌏🌏🌏🌏🌏🌏");
> +        assert_eq!("🌏🌏🌏🌏🌏", fixed_string);
> +    }
> +}
> diff --git a/pve-api-types/src/types/mod.rs b/pve-api-types/src/types/mod.rs
> index fe52a169..9653dd46 100644
> --- a/pve-api-types/src/types/mod.rs
> +++ b/pve-api-types/src/types/mod.rs
> @@ -17,6 +17,9 @@ pub mod array;
>  pub mod stringlist;
>  pub mod verifiers;
>  
> +mod fixed_string;
> +pub use fixed_string::{FixedString, FixedStringError};
> +
>  include!("../generated/types.rs");
>  
>  /// A PVE Upid, contrary to a PBS Upid, contains no 'task-id' number.
> -- 
> 2.47.3

_______________________________________________
pdm-devel mailing list
pdm-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pdm-devel