public inbox for pbs-devel@lists.proxmox.com
 help / color / mirror / Atom feed
From: Wolfgang Bumiller <w.bumiller@proxmox.com>
To: Gabriel Goller <g.goller@proxmox.com>
Cc: pbs-devel@lists.proxmox.com
Subject: Re: [pbs-devel] [PATCH pathpatterns v2] match_list: added `matches_mode_lazy()`, which only retrieves file mode if necessary
Date: Thu, 17 Aug 2023 11:01:26 +0200	[thread overview]
Message-ID: <geluhhwcgsj5otowki6gf67lpzo7zbaxmhwdbuf2nemvunnpwm@bohtdrk5lg7e> (raw)
In-Reply-To: <20230816140330.191947-1-g.goller@proxmox.com>

Getting there :-)
I'd still like a few changes.

On Wed, Aug 16, 2023 at 04:03:29PM +0200, Gabriel Goller wrote:
> Added `matches_mode_lazy()` function, which takes a closure that should return the
> `file_mode`. The function will go through the patterns and match by path only
> until it finds a pattern which does not have `MatchFlag::ANY_FILE_TYPE`, in case
> it will call the closure, which will return a `file_mode`. This will ensure that
> the closure (which in our case executes `stat()`) will only be run once(at most),
> every pattern will only be processed once and that the order of the patterns
> will be respected.
> 
> Signed-off-by: Gabriel Goller <g.goller@proxmox.com>
> ---
> 
> changes v2:
>  - updated the `matches_path()` function to take a closure and renamed 
>    it to `matches_mode_lazy()`. This allows us to only match once, so we 
>    don't need to match before and after `stat()` (without and with 
>    `file_mode`) anymore.
> 
> changes v1:
>  - added `matches_path()` function, which matches by path only and returns an
>    error if the `file_mode` is required.
> 
>  src/match_list.rs | 203 +++++++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 202 insertions(+), 1 deletion(-)
> 
> diff --git a/src/match_list.rs b/src/match_list.rs
> index c5b14e0..9914130 100644
> --- a/src/match_list.rs
> +++ b/src/match_list.rs
> @@ -1,6 +1,6 @@
>  //! Helpers for include/exclude lists.
> -
>  use bitflags::bitflags;
> +use std::{error::Error, fmt};
>  
>  use crate::PatternFlag;
>  
> @@ -39,6 +39,17 @@ impl Default for MatchFlag {
>      }
>  }
>  
> +#[derive(Debug, PartialEq)]
> +pub struct FileModeRequired;
> +
> +impl fmt::Display for FileModeRequired {
> +    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
> +        write!(f, "File mode is required for matching")
> +    }
> +}
> +
> +impl Error for FileModeRequired {}
> +
>  /// A pattern entry. (Glob patterns or literal patterns.)
>  // Note:
>  // For regex we'd likely use the POSIX extended REs via `regexec(3)`, since we're targetting
> @@ -304,12 +315,26 @@ impl MatchEntry {
>  
>          self.matches_path_exact(path)
>      }
> +

These can be dropped, because we can actually avoid the need to call any
`matches` functions that would require a mode if we don't actually have
a mode.

For this purpose, `MatchEntry` should just have a

    pub fn needs_file_mode(&self) -> bool;

(also utilized in its `fn matches_mode()` for less duplicate bitflag
handling code)

> +    /// Check whether the path contains a matching suffix. Returns an error if a file mode is required.
> +    pub fn matches_path<T: AsRef<[u8]>>(&self, path: T) -> Result<bool, FileModeRequired> {
> +        self.matches_path_do(path.as_ref())
> +    }
> +
> +    fn matches_path_do(&self, path: &[u8]) -> Result<bool, FileModeRequired> {
> +        if !self.flags.contains(MatchFlag::ANY_FILE_TYPE) {
> +            return Err(FileModeRequired);
> +        }
> +
> +        Ok(self.matches_path_suffix_do(path))
> +    }
>  }
>  
>  #[doc(hidden)]
>  pub trait MatchListEntry {
>      fn entry_matches(&self, path: &[u8], file_mode: Option<u32>) -> Option<MatchType>;
>      fn entry_matches_exact(&self, path: &[u8], file_mode: Option<u32>) -> Option<MatchType>;
> +    fn entry_matches_path(&self, path: &[u8]) -> Result<Option<MatchType>, FileModeRequired>;

Instead of this, we should expose
    fn needs_file_mode(&self) -> bool
in this trait.

>  }
>  
>  impl MatchListEntry for &'_ MatchEntry {
> @@ -328,6 +353,15 @@ impl MatchListEntry for &'_ MatchEntry {
>              None
>          }
>      }
> +
> +    fn entry_matches_path(&self, path: &[u8]) -> Result<Option<MatchType>, FileModeRequired> {
> +        let matches = self.matches_path(path)?;
> +        if matches {
> +            Ok(Some(self.match_type()))
> +        } else {
> +            Ok(None)
> +        }
> +    }
>  }
>  
>  impl MatchListEntry for &'_ &'_ MatchEntry {
> @@ -346,6 +380,15 @@ impl MatchListEntry for &'_ &'_ MatchEntry {
>              None
>          }
>      }
> +
> +    fn entry_matches_path(&self, path: &[u8]) -> Result<Option<MatchType>, FileModeRequired> {
> +        let matches = self.matches_path(path)?;
> +        if matches {
> +            Ok(Some(self.match_type()))
> +        } else {
> +            Ok(None)
> +        }
> +    }
>  }
>  
>  /// This provides [`matches`](MatchList::matches) and [`matches_exact`](MatchList::matches_exact)
> @@ -374,6 +417,22 @@ pub trait MatchList {
>      }

And here would be the main API break.
All of the methods here in `MatchList` should act as lazy instead of
having this as a new method.

The only downside is that this means fixing up all the test cases... ;-)

>  
>      fn matches_exact_do(&self, path: &[u8], file_mode: Option<u32>) -> Option<MatchType>;
> +
> +    /// Check whether this list contains anything exactly matching the path, returns error if
> +    /// `file_mode` is required for exact matching and cannot be retrieved using closure.
> +    fn matches_mode_lazy<T: AsRef<[u8]>, E: Error, U: FnMut() -> Result<u32, E>>(

Style nit: with large trait bounds, please move them into a `where`
clause instead (same in the other cases where you add these).

> +        &self,
> +        path: T,
> +        get_file_mode: U,
> +    ) -> Result<Option<MatchType>, E> {
> +        self.matches_mode_lazy_do(path.as_ref(), get_file_mode)
> +    }
> +
> +    fn matches_mode_lazy_do<E: Error, T: FnMut() -> Result<u32, E>>(
> +        &self,
> +        path: &[u8],
> +        get_file_mode: T,
> +    ) -> Result<Option<MatchType>, E>;
>  }
>  
>  impl<'a, T> MatchList for T
> @@ -408,6 +467,39 @@ where
>  
>          None
>      }
> +
> +    fn matches_mode_lazy_do<E: Error, U: FnMut() -> Result<u32, E>>(

`FnOnce` should suffice, to guarantee at the API level that we only call
it once.

You can make this work by moving it into an `Option` first:

    let mut get_file_mode = Some(get_file_mode);
    let mut file_mode = None;
    for m in ... {
        if file_mode.is_none() && m.needs_file_mode() {
            // unwrap: we always have get_file_mode XOR file_mode as Some()
            file_mode = Some((get_file_mode.take().unwrap())()?);
        }
        if let Some(mt) = m.entry_matches(path, file_mode) {
            return Ok(Some(mt));
        }
    }

Note that we just call the regular `entry_matches`, since given `None`
as file mode it simply won't bother with the file mode, but at the same
time, if the pattern does *want* a file mode, we try to fetch it.

Now, as a bonus, we could also make `None` work again as a parameter for
these methods by replacing the `FnOnce` parameter with a new small
custom trait:

    pub trait GetFileMode {
        type Error;
        fn get(self) -> Result<u32, Self::Error>;
    }

With this being our own trait, we can have a generic impl along with a
one for `Option<u32>`:

    impl<T> GetFileMode for T
    where
        T: FnOnce() -> Result<u32, E>,
    {
        type Error = E;
        fn get(self) -> Result<u32, Self::Error> {
            return self();
        }
    }

    impl GetFileMode for Option<u32> {
        type Error = FileModeRequired;
        fn get(self) -> Result<u32, Self::Error> {
            self.ok_or(FileModeRequired)
        }
    }

^ otherwise we also won't need the `FileModeRequired` error type ;-)

This will also make adapting the test cases much easier (only the `Ok()`
around the result types needs to be added, no closures for the modes).


> +        &self,
> +        path: &[u8],
> +        mut get_file_mode: U,
> +    ) -> Result<Option<MatchType>, E> {
> +        // This is an &self method on a `T where T: 'a`.
> +        let this: &'a Self = unsafe { std::mem::transmute(self) };
> +
> +        let mut file_mode: Option<u32> = None;
> +        for m in this.into_iter().rev() {
> +            if file_mode.is_none() {
> +                match m.entry_matches_path(path) {
> +                    Ok(mt) => {
> +                        if mt.is_some() {
> +                            return Ok(mt);
> +                        }
> +                    }
> +                    Err(_) => match get_file_mode() {
> +                        Ok(x) => file_mode = Some(x),
> +                        Err(e) => return Err(e),
> +                    },
> +                }
> +            }
> +            if file_mode.is_some() {
> +                let full_match = m.entry_matches(path, file_mode);
> +                if full_match.is_some() {
> +                    return Ok(full_match);
> +                }
> +            }
> +        }
> +        Ok(None)
> +    }
>  }
>  
>  #[test]




      parent reply	other threads:[~2023-08-17  9:01 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-08-16 14:03 Gabriel Goller
2023-08-16 14:03 ` [pbs-devel] [PATCH proxmox-backup v4] fix #4380: check if file is excluded before running `stat()` Gabriel Goller
2023-08-17  9:01 ` Wolfgang Bumiller [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=geluhhwcgsj5otowki6gf67lpzo7zbaxmhwdbuf2nemvunnpwm@bohtdrk5lg7e \
    --to=w.bumiller@proxmox.com \
    --cc=g.goller@proxmox.com \
    --cc=pbs-devel@lists.proxmox.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox
Service provided by Proxmox Server Solutions GmbH | Privacy | Legal