Re: [pdm-devel] [PATCH proxmox 1/3] pve-api-types: add FixedString type

From: Stefan Hanreich <s.hanreich@proxmox.com>
To: Wolfgang Bumiller <w.bumiller@proxmox.com>
Cc: pdm-devel@lists.proxmox.com
Subject: Re: [pdm-devel] [PATCH proxmox 1/3] pve-api-types: add FixedString type
Date: Wed, 12 Nov 2025 17:39:46 +0100	[thread overview]
Message-ID: <0786d417-db35-4c7e-96e2-66620ad26e9c@proxmox.com> (raw)
In-Reply-To: <azh3ifpvho4xs4qqti6wbmkjn5atenshwbap3ycijv3mx3badm@uxj5bjp33xx3>

thanks for the review! Incorporated all your suggestions, waiting with
the v2 til tomorrow, in case something else comes up.

On 11/12/25 11:49 AM, Wolfgang Bumiller wrote:
> On Wed, Nov 12, 2025 at 10:22:05AM +0100, Stefan Hanreich wrote:
>> A custom, immutable, string type, that is copiable, which can hold up
>> to 23 characters. It will be used later for storing the name of
>> unknown enum variants when parsing the return value of enum
>> properties. The main reason for introducing this type is that its copy
>> as opposed to String, which lets us keep the Copy implementation for
>> the existing enums.
>>
>> Main reason for choosing 23 characters is the fact that String is 24
>> bytes large as well (ptr, len, capacity).
>>
>> Signed-off-by: Stefan Hanreich <s.hanreich@proxmox.com>
>> ---
>>  pve-api-types/src/types/fixed_string.rs | 331 ++++++++++++++++++++++++
>>  pve-api-types/src/types/mod.rs          |   3 +
>>  2 files changed, 334 insertions(+)
>>  create mode 100644 pve-api-types/src/types/fixed_string.rs
>>
>> diff --git a/pve-api-types/src/types/fixed_string.rs b/pve-api-types/src/types/fixed_string.rs
>> new file mode 100644
>> index 00000000..f1692227
>> --- /dev/null
>> +++ b/pve-api-types/src/types/fixed_string.rs
>> @@ -0,0 +1,331 @@
>> +use std::cmp::Ordering;
>> +use std::error::Error;
>> +use std::fmt::{Debug, Display, Formatter};
>> +use std::ops::Deref;
>> +use std::str::FromStr;
>> +
>> +use serde::{Deserialize, Serialize};
>> +
>> +/// Error type used by constructors of [`FixedString`]
>> +#[derive(Clone, Copy, Debug)]
>> +pub enum FixedStringError {
>> +    TooLong,
>> +}
> 
> IMO this does not need to be an enum. `pub struct TooLongError` should
> be enough.

also thought about it, wanted it to be potentially extendable without
too much hassle - but works for me as well. Additional error variants
are unlikely anyway..

>> +
>> +impl Error for FixedStringError {}
>> +
>> +impl Display for FixedStringError {
>> +    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> Result<(), std::fmt::Error> {
> 
> ^ you're importing `Formatter`.
> And on that note: personally I prefer to import `std::fmt` and prefix
> its items with `fmt::*`.
> 
>> +        f.write_str(match self {
>> +            Self::TooLong => "string is longer than 23 characters",
>> +        })
>> +    }
>> +}
>> +
>> +/// An immutable string type with a maximum size of 23 bytes.
>> +///
>> +/// After construction it is guaranteed that its contents are:
>> +/// * valid utf-8
>> +/// * not longer than 23 characters
>> +///
>> +/// FixedString is immutable, therefore it is sufficient to validate the invariants only at
>> +/// construction time to guarantee that they will always hold during the lifecycle of the
>> +/// struct.
>> +#[derive(Clone, Copy)]
>> +pub struct FixedString {
>> +    buf: [u8; 23],
>> +    len: u8,
>> +}
>> +
>> +impl FixedString {
>> +    /// Creates a new FixedString instance from a str reference.
>> +    ///
>> +    /// # Errors
>> +    /// This function will return an error if:
>> +    /// * The passed string is longer than 23 bytes
>> +    pub fn new(value: &str) -> Result<Self, FixedStringError> {
>> +        if value.len() > 23 {
>> +            return Err(FixedStringError::TooLong);
>> +        }
>> +
>> +        let mut buf = [0; 23];
>> +        buf[..value.len()].copy_from_slice(value.as_bytes());
>> +
>> +        Ok(Self {
>> +            buf,
>> +            // SAFETY: self.len is at least 0 and at most 23, which fits into u8
>> +            len: value.len() as u8,
>> +        })
>> +    }
>> +
>> +    /// Creates a new FixedString instance from a str reference, truncating if required
>> +    ///
>> +    /// It works by copying the bytes from the supplied value into its own local buffer. If the
>> +    /// string is longer than 23 bytes, then this methods will truncate the string at the nearest
>> +    /// utf-8 boundary, so even if a string longer than 23 characters is passed in, it is possible
>> +    /// that it gets truncated to a length less than 23 if the utf-8 boundary doesn't perfectly
>> +    /// align with the 23 bytes limit.
>> +    pub fn new_truncated(value: &str) -> Self {
> 
> I don't think we should keep this unless we also have a way of checking
> for this.
> In case we do want to use this: since values up to 31 (a potential
> still-cheap-enough extension to this `FixedString`'s length (since it
> would then just be 4 pointers in size)) only require 5 bits, we have 3
> bits remaining in the length byte which could be used to signal that
> this was a truncated string, which could be returned via a
> `.is_truncated()` method, and we could refuse to serialize such a string
> without calling some extra method first (and include some marker in the
> Debug impl)?
> 
> But IMO: for now, we just don't need it.
> 

dropped it now then

>> +        // TODO: replace, once [`str::floor_char_boundary`] has been stabilized
>> +        let len = if value.len() > 23 {
>> +            let mut index = 23;
>> +
>> +            while index > 0 && !value.is_char_boundary(index) {
>> +                index = index - 1;
> 
> index -= 1; ;-)
> 
> or:
>     panic!("Who let Dominik add emojis to enums?")
> 
>> +            }
>> +
>> +            index
>> +        } else {
>> +            value.len()
>> +        };
> 
> ^ Actually the above could just be shortened to
> 
> let mut len = value.len();
> if len > 23 {
>     len = 23;
>     while ... { len -= 1 }
> }
> 
>> +
>> +        let mut buf = [0; 23];
>> +        buf[..len].copy_from_slice(&value.as_bytes()[..len]);
>> +
>> +        Self {
>> +            buf,
>> +            // SAFETY: self.len is at least 0 and at most 23, which fits into u8
>> +            len: len as u8,
>> +        }
>> +    }
>> +
>> +    /// Returns a str reference to the stored data
>> +    #[inline]
>> +    pub fn as_str(&self) -> &str {
>> +        // SAFETY: self.buf must be a valid utf-8 string by construction
>> +        unsafe { str::from_utf8_unchecked(self.as_bytes()) }
>> +    }
>> +
>> +    /// Returns a reference to the set bytes in the stored buffer
>> +    #[inline]
>> +    pub fn as_bytes(&self) -> &[u8] {
>> +        // SAFETY: self.len >= 0 and self.len <= 23 by construction
>> +        unsafe { self.buf.get_unchecked(..self.len as usize) }
>> +    }
>> +}
>> +
>> +impl PartialEq for FixedString {
> 
> IMO - regarding ordering where it matters - it would be easier to read
> such impls when they're purely forwarded, eg.:
> 
>     PartialEq::eq(self.as_str(), other)
> 
> since then all implementations look the same (no need for `Some()`
> wrapping in the `PartialOrd` impl, for instance.

yeah, that's fair - idk why I didn't do it in the first place...

> Also, it could be generated with a simple forwarding macro which would
> IMO also make review easier than having to go through each of them
> individually:
> 
>     macro_rules! forward_impl_to_bytes {
>         ($($trait:ident {$fn:ident -> $out:ty })+) => {
>             $(
>                 impl $trait for FixedString {
>                     fn $fn(&self, other: &Self) -> $out {
>                         <[u8] as $trait>::$fn(self.as_bytes(), other.as_bytes())
>                     }
>                 }
>             )+
>         };
>     }
> 
>     forward_impl_to_bytes! {
>         PartialEq { eq -> bool }
>         PartialOrd { partial_cmp -> Option<Ordering> }
>         Ord { cmp -> Ordering }
>     }
> 
> 
> And for the ones below:
> 
>     macro_rules! forward_impl_to_str_bidir {
>         ($($trait:ident {$fn:ident -> $out:ty })+) => {
>             $(
>                 impl $trait<str> for FixedString {
>                     fn $fn(&self, other: &str) -> $out {
>                         <str as $trait>::$fn(self.as_str(), other)
>                     }
>                 }
> 
>                 impl $trait<FixedString> for &str {
>                     fn $fn(&self, other: &FixedString) -> $out {
>                         <str as $trait>::$fn(self, other.as_str())
>                     }
>                 }
>             )+
>         };
>     }
> 
>     forward_impl_to_str_bidir! {
>         PartialEq { eq -> bool }
>         PartialOrd { partial_cmp -> Option<Ordering> }
>     }
> 
> Yes I realize that for 2 traits this doesn't save much, but it would
> make the intention clearer and IMO is still easier to review.

still, this makes adding additional traits a breeze and I'm sure we
might need to do that at some point. I kept the inline for both cases
though - what do you think? makes sense imo.

>> +    #[inline]
>> +    fn eq(&self, other: &FixedString) -> bool {
>> +        self.as_bytes() == other.as_bytes()
>> +    }
>> +}
>> +
>> +impl Eq for FixedString {}
>> +
>> +impl PartialOrd for FixedString {
>> +    fn partial_cmp(&self, other: &FixedString) -> Option<Ordering> {
>> +        Some(self.cmp(other))
>> +    }
>> +}
>> +
>> +impl Ord for FixedString {
>> +    #[inline]
>> +    fn cmp(&self, other: &FixedString) -> Ordering {
>> +        self.as_bytes().cmp(other.as_bytes())
>> +    }
>> +}
>> +
>> +impl Display for FixedString {
>> +    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> Result<(), std::fmt::Error> {
>> +        f.write_str(self.as_str())
> 
> Consider forwarding to &str's fmt instead so that things such as
> alignment etc. work:
> 
>     Display::fmt(self.as_str(), f)
> 
> so that `println!("{the_enum:>8}")` will print "     foo" instead of
> just "foo", and `{x:0^8}` produces "00foo000" ;-)
> 

done

>> +    }
>> +}
>> +
>> +impl Debug for FixedString {
>> +    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> Result<(), std::fmt::Error> {
>> +        f.write_str(self.as_str())
> 
> Same here with `Debug::fmt(self.as_str(), f)`.
> 

done

>> +    }
>> +}
>> +
>> +impl Deref for FixedString {
>> +    type Target = str;
>> +
>> +    fn deref(&self) -> &str {
>> +        self.as_str()
>> +    }
>> +}
>> +
>> +impl AsRef<str> for FixedString {
>> +    fn as_ref(&self) -> &str {
>> +        self.as_str()
>> +    }
>> +}
> 
> I'd also add `AsRef<[u8]>`, and `Borrow<str>`. (The latter would be
> required to allow a `HashMap<FixedString, T>` to use `.get("key")` with
> `&str` for a key - not that that's something we'd expect to see...
> 

done

>> +
>> +impl TryFrom<String> for FixedString {
>> +    type Error = FixedStringError;
>> +
>> +    fn try_from(value: String) -> Result<Self, Self::Error> {
>> +        FixedString::new(value.as_str())
>> +    }
>> +}
>> +
>> +impl FromStr for FixedString {
>> +    type Err = FixedStringError;
>> +
>> +    fn from_str(value: &str) -> Result<Self, Self::Err> {
>> +        FixedString::new(value)
>> +    }
>> +}
>> +
>> +impl TryFrom<&str> for FixedString {
>> +    type Error = FixedStringError;
>> +
>> +    fn try_from(value: &str) -> Result<Self, Self::Error> {
>> +        FixedString::new(value)
>> +    }
>> +}
>> +
>> +impl PartialEq<str> for FixedString {
>> +    fn eq(&self, other: &str) -> bool {
>> +        self.as_str().eq(other)
>> +    }
>> +}
>> +
>> +impl PartialEq<FixedString> for &str {
>> +    fn eq(&self, other: &FixedString) -> bool {
>> +        self.eq(&other.as_str())
>> +    }
>> +}
>> +
>> +impl PartialOrd<str> for FixedString {
>> +    fn partial_cmp(&self, other: &str) -> Option<Ordering> {
>> +        Some(self.as_str().cmp(&other))
>> +    }
>> +}
>> +
>> +impl PartialOrd<FixedString> for &str {
>> +    fn partial_cmp(&self, other: &FixedString) -> Option<Ordering> {
>> +        Some(self.cmp(&other.as_str()))
>> +    }
>> +}
>> +
>> +impl Serialize for FixedString {
>> +    fn serialize<S>(&self, serializer: S) -> Result<S::Ok, S::Error>
>> +    where
>> +        S: serde::Serializer,
>> +    {
>> +        serializer.serialize_str(self.as_str())
>> +    }
>> +}
>> +
>> +impl<'de> Deserialize<'de> for FixedString {
>> +    fn deserialize<D>(deserializer: D) -> Result<Self, D::Error>
>> +    where
>> +        D: serde::Deserializer<'de>,
>> +    {
>> +        deserializer.deserialize_str(FixedStringVisitor)
>> +    }
>> +}
>> +
>> +struct FixedStringVisitor;
> 
> ^ I'd prefer to move the type and Visitor impl into the `fn deserialize`.
> 

done

>> +
>> +impl<'de> serde::de::Visitor<'de> for FixedStringVisitor {
>> +    type Value = FixedString;
>> +
>> +    fn expecting(&self, f: &mut Formatter) -> std::fmt::Result {
>> +        f.write_str("a string that is at most 23 bytes long")
>> +    }
>> +
>> +    fn visit_str<E>(self, v: &str) -> Result<Self::Value, E>
>> +    where
>> +        E: serde::de::Error,
>> +    {
>> +        v.try_into().map_err(E::custom)
>> +    }
>> +
>> +    fn visit_string<E>(self, v: String) -> Result<Self::Value, E>
>> +    where
>> +        E: serde::de::Error,
>> +    {
> 
> ^ You can drop this. `visit_string` and `visit_borrowed_str` by default
> forward to `visit_str`.
> 

done

>> +        v.try_into().map_err(E::custom)
>> +    }
>> +}
>> +
>> +#[cfg(test)]
>> +mod tests {
>> +    use super::*;
>> +
>> +    use serde_plain;
>> +
>> +    #[test]
>> +    fn test_construct() {
>> +        let fixed_string = FixedString::new("").expect("empty string is valid");
>> +        assert_eq!("", fixed_string);
>> +
>> +        let fixed_string = FixedString::new("a").expect("valid string");
>> +        assert_eq!("a", fixed_string);
>> +
>> +        let fixed_string = FixedString::new("🌏🌏🌏🌏🌏").expect("valid string");
>> +        assert_eq!("🌏🌏🌏🌏🌏", fixed_string);
>> +
>> +        let fixed_string =
>> +            FixedString::new("aaaaaaaaaaaaaaaaaaaaaaa").expect("23 characters are allowed");
>> +        assert_eq!("aaaaaaaaaaaaaaaaaaaaaaa", fixed_string);
>> +
>> +        FixedString::new("🌏🌏🌏🌏🌏🌏").expect_err("string too long");
>> +        FixedString::new("aaaaaaaaaaaaaaaaaaaaaaaa").expect_err("string too long");
>> +    }
>> +
>> +    #[test]
>> +    fn test_serialize_deserialize() {
>> +        let valid_string = "aaaaaaaaaaaaaaaaaaaaaaa";
>> +
>> +        let fixed_string: FixedString =
>> +            serde_plain::from_str("aaaaaaaaaaaaaaaaaaaaaaa").expect("deserialization works");
>> +        assert_eq!(valid_string, fixed_string);
>> +
>> +        let serialized_string =
>> +            serde_plain::to_string(&fixed_string).expect("can be serialized into a string");
>> +        assert_eq!(valid_string, serialized_string);
>> +
>> +        serde_plain::from_str::<FixedString>("aaaaaaaaaaaaaaaaaaaaaaaa")
>> +            .expect_err("cannot deserialize string that is too long");
>> +    }
>> +
>> +    #[test]
>> +    fn test_ord() {
>> +        let fixed_string = FixedString::new("abc").expect("valid string");
>> +
>> +        assert!(fixed_string == fixed_string);
>> +        assert!(fixed_string >= fixed_string);
>> +        assert!(fixed_string <= fixed_string);
>> +
>> +        assert!("ab" < fixed_string);
>> +        assert!("abc" == fixed_string);
>> +        assert!("abcd" > fixed_string);
>> +
>> +        let larger_fixed_string = FixedString::new("abcde").expect("valid string");
>> +
>> +        assert!(larger_fixed_string > fixed_string);
>> +        assert!(fixed_string < larger_fixed_string);
>> +    }
>> +
>> +    #[test]
>> +    fn test_truncate() {
>> +        let fixed_string = FixedString::new_truncated("aaaaaaaaaaaaaaaaaaaaaaaaaaaa");
>> +        assert_eq!("aaaaaaaaaaaaaaaaaaaaaaa", fixed_string);
>> +
>> +        let fixed_string = FixedString::new_truncated("aaaaaaaaaaaaaaaaaa💯");
>> +        assert_eq!("aaaaaaaaaaaaaaaaaa💯", fixed_string);
>> +
>> +        // '💯' is 4 bytes long
>> +        // first byte of '💯' is inside the 23 byte limit, but it's not a valid utf-8 boundary so
>> +        // it should get truncated
>> +        let fixed_string = FixedString::new_truncated("aaaaaaaaaaaaaaaaaaaaa💯");
>> +        assert_eq!("aaaaaaaaaaaaaaaaaaaaa", fixed_string);
>> +
>> +        let fixed_string = FixedString::new_truncated("aaaaaaaaaaaaaaaaaaaaaa💯");
>> +        assert_eq!("aaaaaaaaaaaaaaaaaaaaaa", fixed_string);
>> +
>> +        let fixed_string = FixedString::new_truncated("aaaaaaaaaaaaaaaaaaaaaaa💯");
>> +        assert_eq!("aaaaaaaaaaaaaaaaaaaaaaa", fixed_string);
>> +
>> +        // '🌏' is 4 bytes long
>> +        let fixed_string = FixedString::new_truncated("🌏🌏🌏🌏🌏");
>> +        assert_eq!("🌏🌏🌏🌏🌏", fixed_string);
>> +
>> +        let fixed_string = FixedString::new_truncated("🌏🌏🌏🌏🌏🌏");
>> +        assert_eq!("🌏🌏🌏🌏🌏", fixed_string);
>> +    }
>> +}
>> diff --git a/pve-api-types/src/types/mod.rs b/pve-api-types/src/types/mod.rs
>> index fe52a169..9653dd46 100644
>> --- a/pve-api-types/src/types/mod.rs
>> +++ b/pve-api-types/src/types/mod.rs
>> @@ -17,6 +17,9 @@ pub mod array;
>>  pub mod stringlist;
>>  pub mod verifiers;
>>  
>> +mod fixed_string;
>> +pub use fixed_string::{FixedString, FixedStringError};
>> +
>>  include!("../generated/types.rs");
>>  
>>  /// A PVE Upid, contrary to a PBS Upid, contains no 'task-id' number.
>> -- 
>> 2.47.3

_______________________________________________
pdm-devel mailing list
pdm-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pdm-devel