public inbox for pbs-devel@lists.proxmox.com
 help / color / mirror / Atom feed
* [pbs-devel] [PATCH pathpatterns v4] match_list: updated `matches()`, to only retrieve file mode if necessary
@ 2023-08-21 13:08 Gabriel Goller
  2023-08-21 13:08 ` [pbs-devel] [PATCH proxmox-backup v6] fix #4380: check if file is excluded before running `stat()` Gabriel Goller
  2023-08-22 13:07 ` [pbs-devel] [PATCH pathpatterns v4] match_list: updated `matches()`, to only retrieve file mode if necessary Wolfgang Bumiller
  0 siblings, 2 replies; 5+ messages in thread
From: Gabriel Goller @ 2023-08-21 13:08 UTC (permalink / raw)
  To: pbs-devel

Updated `matches()` function, which now takes the `GetFileMode` trait and returns a
Result. `GetFileMode` is a function that should return the `file_mode`. `matches()`
will go through the patterns and match by path only until it finds a pattern which
does not have `MatchFlag::ANY_FILE_TYPE`, in case it will call the
`get_file_mode`, which will return a `file_mode`. This will ensure that the
`get()` (which in our case executes `stat()`) will only be run once(at most),
every pattern will only be processed once and that the order of the patterns
will be respected. `GetFileMode` is also implemented for `Option<u32>` (so that
we can simply pass options of the `file_mode`) and `u32` (returning
`std::convert::Infallible`, so that we can pass a u32 mode directly).

Signed-off-by: Gabriel Goller <g.goller@proxmox.com>
---

changes v3:
 - changed `matches()` and `matches_exact()` to retrieve the file_mode lazily.
   Allowed these functions to take in a trait `GetFileMode`, so that we can pass
   closures and options that contain the `file_mode`.

changes v2:
 - updated the `matches_path()` function to take a closure and renamed
   it to `matches_mode_lazy()`. This allows us to only match once, so we
   don't need to match before and after `stat()` (without and with
   `file_mode`) anymore.

changes v1:
 - added `matches_path()` function, which matches by path only and returns an
   error if the `file_mode` is required.


 src/lib.rs        |  72 +++++-----
 src/match_list.rs | 335 ++++++++++++++++++++++++++++++++++++++--------
 2 files changed, 315 insertions(+), 92 deletions(-)

diff --git a/src/lib.rs b/src/lib.rs
index cb89917..998cdae 100644
--- a/src/lib.rs
+++ b/src/lib.rs
@@ -31,17 +31,17 @@
 //!     MatchEntry::include(Pattern::path("bananas/curved.*")?),
 //! ];
 //!
-//! assert_eq!(list.matches("/things", None), None);
-//! assert_eq!(list.matches("/things/shop", None), Some(MatchType::Include));
-//! assert_eq!(list.matches("/things/shop/bananas", None), Some(MatchType::Exclude));
-//! assert_eq!(list.matches("/things/shop/bananas/curved.txt", None), Some(MatchType::Include));
-//! assert_eq!(list.matches("/things/shop/bananas/curved.bak", None), Some(MatchType::Include));
+//! assert_eq!(list.matches("/things", None), Ok(None));
+//! assert_eq!(list.matches("/things/shop", None), Ok(Some(MatchType::Include)));
+//! assert_eq!(list.matches("/things/shop/bananas", None), Ok(Some(MatchType::Exclude)));
+//! assert_eq!(list.matches("/things/shop/bananas/curved.txt", None), Ok(Some(MatchType::Include)));
+//! assert_eq!(list.matches("/things/shop/bananas/curved.bak", None), Ok(Some(MatchType::Include)));
 //!
 //! // this will exclude the curved.bak file
 //! list.push(MatchEntry::exclude(Pattern::path("curved.bak")?));
-//! assert_eq!(list.matches("/things/shop/bananas/curved.bak", None), Some(MatchType::Exclude));
+//! assert_eq!(list.matches("/things/shop/bananas/curved.bak", None), Ok(Some(MatchType::Exclude)));
 //! list.pop();
-//! # assert_eq!(list.matches("/things/shop/bananas/curved.bak", None), Some(MatchType::Include));
+//! # assert_eq!(list.matches("/things/shop/bananas/curved.bak", None), Ok(Some(MatchType::Include)));
 //!
 //! // but this will not:
 //! list.push(
@@ -49,40 +49,40 @@
 //!         .flags(MatchFlag::ANCHORED)
 //! );
 //! // or: list.push
-//! assert_eq!(list.matches("/things/shop/bananas/curved.bak", None), Some(MatchType::Include));
+//! assert_eq!(list.matches("/things/shop/bananas/curved.bak", None), Ok(Some(MatchType::Include)));
 //! list.pop();
 //!
 //! // let's check some patterns, anything starting with a 'c', 'f' or 's':
 //! let mut list = vec![MatchEntry::include(Pattern::path("[cfs]*")?)];
-//! assert_eq!(list.matches("/things", None), None);
-//! assert_eq!(list.matches("/things/file1.dat", None), Some(MatchType::Include));
-//! assert_eq!(list.matches("/things/file2.dat", None), Some(MatchType::Include));
-//! assert_eq!(list.matches("/things/shop", None), Some(MatchType::Include));
-//! assert_eq!(list.matches("/things/shop/info.txt", None), None);
-//! assert_eq!(list.matches("/things/shop/apples", None), None);
-//! assert_eq!(list.matches("/things/shop/apples/gala.txt", None), None);
-//! assert_eq!(list.matches("/things/shop/apples/golden-delicious.txt", None), None);
-//! assert_eq!(list.matches("/things/shop/bananas", None), None);
-//! assert_eq!(list.matches("/things/shop/bananas/straight.txt", None), Some(MatchType::Include));
-//! assert_eq!(list.matches("/things/shop/bananas/curved.txt", None), Some(MatchType::Include));
-//! assert_eq!(list.matches("/shop/bananas/curved.bak", None), Some(MatchType::Include));
-//! assert_eq!(list.matches("/things/shop/bananas/other.txt", None), None);
+//! assert_eq!(list.matches("/things", None), Ok(None));
+//! assert_eq!(list.matches("/things/file1.dat", None), Ok(Some(MatchType::Include)));
+//! assert_eq!(list.matches("/things/file2.dat", None), Ok(Some(MatchType::Include)));
+//! assert_eq!(list.matches("/things/shop", None), Ok(Some(MatchType::Include)));
+//! assert_eq!(list.matches("/things/shop/info.txt", None), Ok(None));
+//! assert_eq!(list.matches("/things/shop/apples", None), Ok(None));
+//! assert_eq!(list.matches("/things/shop/apples/gala.txt", None), Ok(None));
+//! assert_eq!(list.matches("/things/shop/apples/golden-delicious.txt", None), Ok(None));
+//! assert_eq!(list.matches("/things/shop/bananas", None), Ok(None));
+//! assert_eq!(list.matches("/things/shop/bananas/straight.txt", None), Ok(Some(MatchType::Include)));
+//! assert_eq!(list.matches("/things/shop/bananas/curved.txt", None), Ok(Some(MatchType::Include)));
+//! assert_eq!(list.matches("/shop/bananas/curved.bak", None), Ok(Some(MatchType::Include)));
+//! assert_eq!(list.matches("/things/shop/bananas/other.txt", None), Ok(None));
 //!
 //! // If we add `**` we end up including the entire `shop/` subtree:
 //! list.push(MatchEntry::include(Pattern::path("[cfs]*/**")?));
-//! assert_eq!(list.matches("/things", None), None);
-//! assert_eq!(list.matches("/things/file1.dat", None), Some(MatchType::Include));
-//! assert_eq!(list.matches("/things/file2.dat", None), Some(MatchType::Include));
-//! assert_eq!(list.matches("/things/shop", None), Some(MatchType::Include));
-//! assert_eq!(list.matches("/things/shop/info.txt", None), Some(MatchType::Include));
-//! assert_eq!(list.matches("/things/shop/apples", None), Some(MatchType::Include));
-//! assert_eq!(list.matches("/things/shop/apples/gala.txt", None), Some(MatchType::Include));
-//! assert_eq!(list.matches("/shop/apples/golden-delicious.txt", None), Some(MatchType::Include));
-//! assert_eq!(list.matches("/things/shop/bananas", None), Some(MatchType::Include));
-//! assert_eq!(list.matches("/things/shop/bananas/straight.txt", None), Some(MatchType::Include));
-//! assert_eq!(list.matches("/things/shop/bananas/curved.txt", None), Some(MatchType::Include));
-//! assert_eq!(list.matches("/shop/bananas/curved.bak", None), Some(MatchType::Include));
-//! assert_eq!(list.matches("/shop/bananas/other.txt", None), Some(MatchType::Include));
+//! assert_eq!(list.matches("/things", None), Ok(None));
+//! assert_eq!(list.matches("/things/file1.dat", None), Ok(Some(MatchType::Include)));
+//! assert_eq!(list.matches("/things/file2.dat", None), Ok(Some(MatchType::Include)));
+//! assert_eq!(list.matches("/things/shop", None), Ok(Some(MatchType::Include)));
+//! assert_eq!(list.matches("/things/shop/info.txt", None), Ok(Some(MatchType::Include)));
+//! assert_eq!(list.matches("/things/shop/apples", None), Ok(Some(MatchType::Include)));
+//! assert_eq!(list.matches("/things/shop/apples/gala.txt", None), Ok(Some(MatchType::Include)));
+//! assert_eq!(list.matches("/shop/apples/golden-delicious.txt", None), Ok(Some(MatchType::Include)));
+//! assert_eq!(list.matches("/things/shop/bananas", None), Ok(Some(MatchType::Include)));
+//! assert_eq!(list.matches("/things/shop/bananas/straight.txt", None), Ok(Some(MatchType::Include)));
+//! assert_eq!(list.matches("/things/shop/bananas/curved.txt", None), Ok(Some(MatchType::Include)));
+//! assert_eq!(list.matches("/shop/bananas/curved.bak", None), Ok(Some(MatchType::Include)));
+//! assert_eq!(list.matches("/shop/bananas/other.txt", None), Ok(Some(MatchType::Include)));
 //! # Ok(())
 //! # }
 //! # test().unwrap()
@@ -92,7 +92,9 @@ mod match_list;
 mod pattern;
 
 #[doc(inline)]
-pub use match_list::{MatchEntry, MatchFlag, MatchList, MatchPattern, MatchType};
+pub use match_list::{
+    FileModeRequired, GetFileMode, MatchEntry, MatchFlag, MatchList, MatchPattern, MatchType,
+};
 
 #[doc(inline)]
 pub use pattern::{ParseError, Pattern, PatternFlag};
diff --git a/src/match_list.rs b/src/match_list.rs
index c5b14e0..69b1fdb 100644
--- a/src/match_list.rs
+++ b/src/match_list.rs
@@ -1,5 +1,4 @@
 //! Helpers for include/exclude lists.
-
 use bitflags::bitflags;
 
 use crate::PatternFlag;
@@ -39,6 +38,46 @@ impl Default for MatchFlag {
     }
 }
 
+#[derive(Debug, PartialEq)]
+pub struct FileModeRequired;
+
+impl std::fmt::Display for FileModeRequired {
+    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
+        write!(f, "File mode is required for matching")
+    }
+}
+
+impl std::error::Error for FileModeRequired {}
+
+pub trait GetFileMode {
+    type Error;
+    fn get(self) -> Result<u32, Self::Error>;
+}
+
+impl<T, E> GetFileMode for T
+where
+    T: FnOnce() -> Result<u32, E>,
+{
+    type Error = E;
+    fn get(self) -> Result<u32, Self::Error> {
+        self()
+    }
+}
+
+impl GetFileMode for Option<u32> {
+    type Error = FileModeRequired;
+    fn get(self) -> Result<u32, Self::Error> {
+        self.ok_or(FileModeRequired {})
+    }
+}
+
+impl GetFileMode for u32 {
+    type Error = std::convert::Infallible;
+    fn get(self) -> Result<u32, Self::Error> {
+        Ok(self)
+    }
+}
+
 /// A pattern entry. (Glob patterns or literal patterns.)
 // Note:
 // For regex we'd likely use the POSIX extended REs via `regexec(3)`, since we're targetting
@@ -211,7 +250,7 @@ impl MatchEntry {
     pub fn matches_mode(&self, file_mode: u32) -> bool {
         // bitflags' `.contains` means ALL bits must be set, if they are all set we don't
         // need to check the mode...
-        if self.flags.contains(MatchFlag::ANY_FILE_TYPE) {
+        if !self.needs_file_mode() {
             return true;
         }
 
@@ -304,12 +343,21 @@ impl MatchEntry {
 
         self.matches_path_exact(path)
     }
+
+    pub fn needs_file_mode(&self) -> bool {
+        let flags = (self.flags & MatchFlag::ANY_FILE_TYPE).bits();
+        if flags != 0 && flags != MatchFlag::ANY_FILE_TYPE.bits() {
+            return true;
+        }
+        false
+    }
 }
 
 #[doc(hidden)]
 pub trait MatchListEntry {
     fn entry_matches(&self, path: &[u8], file_mode: Option<u32>) -> Option<MatchType>;
     fn entry_matches_exact(&self, path: &[u8], file_mode: Option<u32>) -> Option<MatchType>;
+    fn entry_needs_file_mode(&self) -> bool;
 }
 
 impl MatchListEntry for &'_ MatchEntry {
@@ -328,6 +376,10 @@ impl MatchListEntry for &'_ MatchEntry {
             None
         }
     }
+
+    fn entry_needs_file_mode(&self) -> bool {
+        self.needs_file_mode()
+    }
 }
 
 impl MatchListEntry for &'_ &'_ MatchEntry {
@@ -346,6 +398,10 @@ impl MatchListEntry for &'_ &'_ MatchEntry {
             None
         }
     }
+
+    fn entry_needs_file_mode(&self) -> bool {
+        self.needs_file_mode()
+    }
 }
 
 /// This provides [`matches`](MatchList::matches) and [`matches_exact`](MatchList::matches_exact)
@@ -361,19 +417,18 @@ impl MatchListEntry for &'_ &'_ MatchEntry {
 /// This makes it easier to use slices over entries or references to entries.
 pub trait MatchList {
     /// Check whether this list contains anything matching a prefix of the specified path, and the
-    /// specified file mode.
-    fn matches<T: AsRef<[u8]>>(&self, path: T, file_mode: Option<u32>) -> Option<MatchType> {
-        self.matches_do(path.as_ref(), file_mode)
-    }
-
-    fn matches_do(&self, path: &[u8], file_mode: Option<u32>) -> Option<MatchType>;
-
-    /// Check whether this list contains anything exactly matching the path and mode.
-    fn matches_exact<T: AsRef<[u8]>>(&self, path: T, file_mode: Option<u32>) -> Option<MatchType> {
-        self.matches_exact_do(path.as_ref(), file_mode)
-    }
-
-    fn matches_exact_do(&self, path: &[u8], file_mode: Option<u32>) -> Option<MatchType>;
+    /// specified file mode. Gets the file_mode lazily, only if needed.
+    fn matches<T, U>(&self, path: T, get_file_mode: U) -> Result<Option<MatchType>, U::Error>
+    where
+        T: AsRef<[u8]>,
+        U: GetFileMode;
+
+    /// Check whether this list contains anything exactly matching the path and mode. Gets the
+    /// file_mode lazily, only if needed.
+    fn matches_exact<T, U>(&self, path: T, get_file_mode: U) -> Result<Option<MatchType>, U::Error>
+    where
+        T: AsRef<[u8]>,
+        U: GetFileMode;
 }
 
 impl<'a, T> MatchList for T
@@ -383,30 +438,48 @@ where
     <&'a T as IntoIterator>::IntoIter: DoubleEndedIterator,
     <&'a T as IntoIterator>::Item: MatchListEntry,
 {
-    fn matches_do(&self, path: &[u8], file_mode: Option<u32>) -> Option<MatchType> {
+    fn matches<U, G>(&self, path: U, get_file_mode: G) -> Result<Option<MatchType>, G::Error>
+    where
+        U: AsRef<[u8]>,
+        G: GetFileMode,
+    {
         // This is an &self method on a `T where T: 'a`.
         let this: &'a Self = unsafe { std::mem::transmute(self) };
 
+        let mut get_file_mode = Some(get_file_mode);
+        let mut file_mode = None;
+
         for m in this.into_iter().rev() {
-            if let Some(mt) = m.entry_matches(path, file_mode) {
-                return Some(mt);
+            if file_mode.is_none() && m.entry_needs_file_mode() {
+                file_mode = Some(get_file_mode.take().unwrap().get()?);
+            }
+            if let Some(mt) = m.entry_matches(path.as_ref(), file_mode) {
+                return Ok(Some(mt));
             }
         }
-
-        None
+        Ok(None)
     }
 
-    fn matches_exact_do(&self, path: &[u8], file_mode: Option<u32>) -> Option<MatchType> {
+    fn matches_exact<U, G>(&self, path: U, get_file_mode: G) -> Result<Option<MatchType>, G::Error>
+    where
+        U: AsRef<[u8]>,
+        G: GetFileMode,
+    {
         // This is an &self method on a `T where T: 'a`.
         let this: &'a Self = unsafe { std::mem::transmute(self) };
 
+        let mut get_file_mode = Some(get_file_mode);
+        let mut file_mode = None;
+
         for m in this.into_iter().rev() {
-            if let Some(mt) = m.entry_matches_exact(path, file_mode) {
-                return Some(mt);
+            if file_mode.is_none() && m.entry_needs_file_mode() {
+                file_mode = Some(get_file_mode.take().unwrap().get()?);
+            }
+            if let Some(mt) = m.entry_matches_exact(path.as_ref(), file_mode) {
+                return Ok(Some(mt));
             }
         }
-
-        None
+        Ok(None)
     }
 }
 
@@ -415,20 +488,20 @@ fn assert_containers_implement_match_list() {
     use std::iter::FromIterator;
 
     let vec = vec![MatchEntry::include(crate::Pattern::path("a*").unwrap())];
-    assert_eq!(vec.matches("asdf", None), Some(MatchType::Include));
+    assert_eq!(vec.matches("asdf", None), Ok(Some(MatchType::Include)));
 
     // FIXME: ideally we can make this work as well!
     let vd = std::collections::VecDeque::<MatchEntry>::from_iter(vec.clone());
-    assert_eq!(vd.matches("asdf", None), Some(MatchType::Include));
+    assert_eq!(vd.matches("asdf", None), Ok(Some(MatchType::Include)));
 
     let list: &[MatchEntry] = &vec[..];
-    assert_eq!(list.matches("asdf", None), Some(MatchType::Include));
+    assert_eq!(list.matches("asdf", None), Ok(Some(MatchType::Include)));
 
     let list: Vec<&MatchEntry> = vec.iter().collect();
-    assert_eq!(list.matches("asdf", None), Some(MatchType::Include));
+    assert_eq!(list.matches("asdf", None), Ok(Some(MatchType::Include)));
 
     let list: &[&MatchEntry] = &list[..];
-    assert_eq!(list.matches("asdf", None), Some(MatchType::Include));
+    assert_eq!(list.matches("asdf", None), Ok(Some(MatchType::Include)));
 }
 
 #[test]
@@ -443,25 +516,28 @@ fn test_file_type_matches() {
     ];
     assert_eq!(
         matchlist.matches("a_dir", Some(libc::S_IFDIR)),
-        Some(MatchType::Include)
+        Ok(Some(MatchType::Include))
     );
     assert_eq!(
         matchlist.matches("/a_dir", Some(libc::S_IFDIR)),
-        Some(MatchType::Include)
+        Ok(Some(MatchType::Include))
     );
-    assert_eq!(matchlist.matches("/a_dir", Some(libc::S_IFREG)), None);
+    assert_eq!(matchlist.matches("/a_dir", Some(libc::S_IFREG)), Ok(None));
 
     assert_eq!(
         matchlist.matches("/a_file", Some(libc::S_IFREG)),
-        Some(MatchType::Exclude)
+        Ok(Some(MatchType::Exclude))
     );
-    assert_eq!(matchlist.matches("/a_file", Some(libc::S_IFDIR)), None);
+    assert_eq!(matchlist.matches("/a_file", Some(libc::S_IFDIR)), Ok(None));
 
     assert_eq!(
         matchlist.matches("/another_dir", Some(libc::S_IFDIR)),
-        Some(MatchType::Exclude)
+        Ok(Some(MatchType::Exclude))
+    );
+    assert_eq!(
+        matchlist.matches("/another_dir", Some(libc::S_IFREG)),
+        Ok(None)
     );
-    assert_eq!(matchlist.matches("/another_dir", Some(libc::S_IFREG)), None);
 }
 
 #[test]
@@ -471,22 +547,25 @@ fn test_anchored_matches() {
     let matchlist = vec![
         MatchEntry::new(Pattern::path("file-a").unwrap(), MatchType::Include),
         MatchEntry::new(Pattern::path("some/path").unwrap(), MatchType::Include)
-            .flags(MatchFlag::ANCHORED),
+            .add_flags(MatchFlag::ANCHORED),
     ];
 
-    assert_eq!(matchlist.matches("file-a", None), Some(MatchType::Include));
+    assert_eq!(
+        matchlist.matches("file-a", None),
+        Ok(Some(MatchType::Include))
+    );
     assert_eq!(
         matchlist.matches("another/file-a", None),
-        Some(MatchType::Include)
+        Ok(Some(MatchType::Include))
     );
 
-    assert_eq!(matchlist.matches("some", None), None);
-    assert_eq!(matchlist.matches("path", None), None);
+    assert_eq!(matchlist.matches("some", None), Ok(None));
+    assert_eq!(matchlist.matches("path", None), Ok(None));
     assert_eq!(
         matchlist.matches("some/path", None),
-        Some(MatchType::Include)
+        Ok(Some(MatchType::Include))
     );
-    assert_eq!(matchlist.matches("another/some/path", None), None);
+    assert_eq!(matchlist.matches("another/some/path", None), Ok(None));
 }
 
 #[test]
@@ -495,7 +574,10 @@ fn test_literal_matches() {
         MatchPattern::Literal(b"/bin/mv".to_vec()),
         MatchType::Include,
     )];
-    assert_eq!(matchlist.matches("/bin/mv", None), Some(MatchType::Include));
+    assert_eq!(
+        matchlist.matches("/bin/mv", None),
+        Ok(Some(MatchType::Include))
+    );
 }
 
 #[test]
@@ -504,29 +586,168 @@ fn test_path_relativity() {
     let matchlist = vec![
         MatchEntry::new(Pattern::path("noslash").unwrap(), MatchType::Include),
         MatchEntry::new(Pattern::path("noslash-a").unwrap(), MatchType::Include)
-            .flags(MatchFlag::ANCHORED),
+            .add_flags(MatchFlag::ANCHORED),
         MatchEntry::new(Pattern::path("/slash").unwrap(), MatchType::Include),
         MatchEntry::new(Pattern::path("/slash-a").unwrap(), MatchType::Include)
-            .flags(MatchFlag::ANCHORED),
+            .add_flags(MatchFlag::ANCHORED),
     ];
-    assert_eq!(matchlist.matches("noslash", None), Some(MatchType::Include));
+    assert_eq!(
+        matchlist.matches("noslash", None),
+        Ok(Some(MatchType::Include))
+    );
     assert_eq!(
         matchlist.matches("noslash-a", None),
-        Some(MatchType::Include)
+        Ok(Some(MatchType::Include))
     );
-    assert_eq!(matchlist.matches("slash", None), None);
-    assert_eq!(matchlist.matches("/slash", None), Some(MatchType::Include));
-    assert_eq!(matchlist.matches("slash-a", None), None);
+    assert_eq!(matchlist.matches("slash", None), Ok(None));
+    assert_eq!(
+        matchlist.matches("/slash", None),
+        Ok(Some(MatchType::Include))
+    );
+    assert_eq!(matchlist.matches("slash-a", None), Ok(None));
     assert_eq!(
         matchlist.matches("/slash-a", None),
-        Some(MatchType::Include)
+        Ok(Some(MatchType::Include))
     );
 
     assert_eq!(
         matchlist.matches("foo/noslash", None),
-        Some(MatchType::Include)
+        Ok(Some(MatchType::Include))
+    );
+    assert_eq!(matchlist.matches("foo/noslash-a", None), Ok(None));
+    assert_eq!(matchlist.matches("foo/slash", None), Ok(None));
+    assert_eq!(matchlist.matches("foo/slash-a", None), Ok(None));
+}
+
+#[test]
+fn matches_path() {
+    use crate::Pattern;
+
+    let matchlist = vec![
+        MatchEntry::new(Pattern::path("a*").unwrap(), MatchType::Exclude),
+        MatchEntry::new(Pattern::path("b*").unwrap(), MatchType::Exclude),
+    ];
+
+    assert_eq!(
+        matchlist.matches("ahsjdj", || Err(FileModeRequired {})),
+        Ok(Some(MatchType::Exclude))
+    );
+    let mut test = 1;
+    let result = matchlist.matches("bhshdf", || {
+        test += 1;
+        Ok::<u32, FileModeRequired>(libc::S_IFDIR)
+    });
+    assert_eq!(result, Ok(Some(MatchType::Exclude)));
+    assert_eq!(test, 1);
+
+    let matchlist = vec![
+        MatchEntry::new(Pattern::path("a*").unwrap(), MatchType::Exclude)
+            .flags(MatchFlag::MATCH_DIRECTORIES),
+        MatchEntry::new(Pattern::path("b*").unwrap(), MatchType::Exclude),
+    ];
+
+    let mut test = 1;
+    let result = matchlist.matches("aa", || {
+        test += 1;
+        Ok::<u32, FileModeRequired>(libc::S_IFDIR)
+    });
+    assert_eq!(result, Ok(Some(MatchType::Exclude)));
+    assert_eq!(test, 2);
+    assert_eq!(
+        matchlist.matches("ahsjdj", || Err(FileModeRequired {})),
+        Err(FileModeRequired {})
+    );
+    assert_eq!(
+        matchlist.matches("bhshdf", || Err(FileModeRequired {})),
+        Ok(Some(MatchType::Exclude))
+    );
+    assert_eq!(
+        matchlist.matches("ahsjdj", || Ok::<u32, FileModeRequired>(libc::S_IFREG)),
+        Ok(None)
+    );
+
+    let matchlist = vec![
+        MatchEntry::new(Pattern::path("b*").unwrap(), MatchType::Include),
+        MatchEntry::new(Pattern::path("a*").unwrap(), MatchType::Exclude)
+            .flags(MatchFlag::MATCH_DIRECTORIES),
+    ];
+    assert_eq!(
+        matchlist.matches("ahsjdj", || Err(FileModeRequired {})),
+        Err(FileModeRequired {})
+    );
+    assert_eq!(
+        matchlist.matches("bhshdf", || Err(FileModeRequired {})),
+        Err(FileModeRequired {})
+    );
+    assert_eq!(
+        matchlist.matches("ahsjdj", || Ok::<u32, FileModeRequired>(libc::S_IFDIR)),
+        Ok(Some(MatchType::Exclude))
+    );
+    assert_eq!(
+        matchlist.matches("ahsjdj", || Ok::<u32, FileModeRequired>(libc::S_IFREG)),
+        Ok(None)
+    );
+    assert_eq!(
+        matchlist.matches("bbb", || Ok::<u32, FileModeRequired>(libc::S_IFREG)),
+        Ok(Some(MatchType::Include))
     );
-    assert_eq!(matchlist.matches("foo/noslash-a", None), None);
-    assert_eq!(matchlist.matches("foo/slash", None), None);
-    assert_eq!(matchlist.matches("foo/slash-a", None), None);
+
+    let matchlist = vec![
+        MatchEntry::new(Pattern::path("a*").unwrap(), MatchType::Exclude)
+            .flags(MatchFlag::MATCH_DIRECTORIES),
+    ];
+
+    assert_eq!(
+        matchlist.matches("bbb", || Ok::<u32, FileModeRequired>(libc::S_IFDIR)),
+        Ok(None)
+    );
+
+    let matchlist = vec![
+        MatchEntry::new(Pattern::path("a*").unwrap(), MatchType::Exclude)
+            .flags(MatchFlag::MATCH_DIRECTORIES),
+        MatchEntry::new(Pattern::path("b*").unwrap(), MatchType::Exclude)
+            .flags(MatchFlag::MATCH_REGULAR_FILES),
+    ];
+
+    assert_eq!(
+        matchlist.matches("ahsjdj", || Ok::<u32, FileModeRequired>(libc::S_IFDIR)),
+        Ok(Some(MatchType::Exclude))
+    );
+    assert_eq!(
+        matchlist.matches("ahsjdj", || Ok::<u32, FileModeRequired>(libc::S_IFREG)),
+        Ok(None)
+    );
+    assert_eq!(
+        matchlist.matches("bhsjdj", || Ok::<u32, FileModeRequired>(libc::S_IFREG)),
+        Ok(Some(MatchType::Exclude))
+    );
+    assert_eq!(
+        matchlist.matches("bhsjdj", libc::S_IFREG as u32),
+        Ok(Some(MatchType::Exclude))
+    );
+    assert_eq!(matchlist.matches("bhsjdj", 0), Ok(None));
+}
+
+#[test]
+fn match_entry_needs_flag() {
+    use crate::Pattern;
+    let match_entry =
+        MatchEntry::parse_pattern("a*/", PatternFlag::PATH_NAME, MatchType::Exclude).unwrap();
+    assert_eq!(match_entry.needs_file_mode(), true);
+
+    let match_entry = MatchEntry::new(Pattern::path("a*").unwrap(), MatchType::Exclude)
+        .flags(MatchFlag::MATCH_REGULAR_FILES);
+    assert_eq!(match_entry.needs_file_mode(), true);
+
+    let match_entry = MatchEntry::new(Pattern::path("a*").unwrap(), MatchType::Exclude)
+        .flags(MatchFlag::MATCH_DIRECTORIES | MatchFlag::MATCH_REGULAR_FILES);
+    assert_eq!(match_entry.needs_file_mode(), true);
+
+    let match_entry = MatchEntry::new(Pattern::path("a*").unwrap(), MatchType::Exclude)
+        .flags(MatchFlag::ANCHORED | MatchFlag::MATCH_DIRECTORIES);
+    assert_eq!(match_entry.needs_file_mode(), true);
+
+    let match_entry = MatchEntry::new(Pattern::path("a*").unwrap(), MatchType::Exclude)
+        .flags(MatchFlag::ANY_FILE_TYPE);
+    assert_eq!(match_entry.needs_file_mode(), false);
 }
-- 
2.39.2





^ permalink raw reply	[flat|nested] 5+ messages in thread

* [pbs-devel] [PATCH proxmox-backup v6] fix #4380: check if file is excluded before running `stat()`
  2023-08-21 13:08 [pbs-devel] [PATCH pathpatterns v4] match_list: updated `matches()`, to only retrieve file mode if necessary Gabriel Goller
@ 2023-08-21 13:08 ` Gabriel Goller
  2023-08-22 13:04   ` Wolfgang Bumiller
  2023-08-22 13:07 ` [pbs-devel] [PATCH pathpatterns v4] match_list: updated `matches()`, to only retrieve file mode if necessary Wolfgang Bumiller
  1 sibling, 1 reply; 5+ messages in thread
From: Gabriel Goller @ 2023-08-21 13:08 UTC (permalink / raw)
  To: pbs-devel

Passed a closure with the `stat()` function call to `matches()`. This
will traverse through all patterns and try to match using the path only, if a
`file_mode` is needed, it will run the closure. This means that if we exclude
a file with the `MatchType::ANY_FILE_TYPE`, we will skip it without running
`stat()` on it. As we updated the `matches()` function, we also updated all the
invocations of it.
Added `pathpatterns` crate to local overrides in cargo.toml.

Signed-off-by: Gabriel Goller <g.goller@proxmox.com>
---

changes v5:
 - updated all invocations of `matches()` 

changes v4:
 - match only by path and exclude the matched files, the run `stat()` and
   match again, this time using the `file_mode`. This will match everything 
   twice in the worst case, which is not optimal.
changes v3:
 - checking for `read` and `execute` permissions before entering directory,
   doesn't work because there are a lot of side-effects (executed by
   different user, AppArmor, SELinux, ...).
changes v2:
 - checking for excluded files with `matches()` before executing `stat()`,
   this doesn't work because we get the file_mode from `stat()` and don't
   want to ignore it when matching.


 Cargo.toml                      |  5 +++--
 pbs-client/src/catalog_shell.rs |  8 +++----
 pbs-client/src/pxar/create.rs   | 38 ++++++++++++++++++++-------------
 pbs-client/src/pxar/extract.rs  | 10 +++++----
 pbs-datastore/src/catalog.rs    |  6 +++---
 5 files changed, 39 insertions(+), 28 deletions(-)

diff --git a/Cargo.toml b/Cargo.toml
index 5cbae1b8..560794a4 100644
--- a/Cargo.toml
+++ b/Cargo.toml
@@ -264,8 +264,9 @@ proxmox-rrd.workspace = true
 #proxmox-sortable-macro = { path = "../proxmox/proxmox-sortable-macro" }
 #proxmox-human-byte = { path = "../proxmox/proxmox-human-byte" }
 
-#proxmox-apt = { path = "../proxmox/proxmox-apt" }
-#proxmox-openid = { path = "../proxmox/proxmox-openid" }
+#proxmox-apt = { path = "../proxmox-apt" }
+#proxmox-openid = { path = "../proxmox-openid-rs" }
+#pathpatterns = {path = "../pathpatterns" }
 
 #pxar = { path = "../pxar" }
 
diff --git a/pbs-client/src/catalog_shell.rs b/pbs-client/src/catalog_shell.rs
index b8aaf8cb..f53b3cc5 100644
--- a/pbs-client/src/catalog_shell.rs
+++ b/pbs-client/src/catalog_shell.rs
@@ -1138,14 +1138,14 @@ impl<'a> ExtractorState<'a> {
     pub async fn handle_entry(&mut self, entry: catalog::DirEntry) -> Result<(), Error> {
         let match_result = self.match_list.matches(&self.path, entry.get_file_mode());
         let did_match = match match_result {
-            Some(MatchType::Include) => true,
-            Some(MatchType::Exclude) => false,
-            None => self.matches,
+            Ok(Some(MatchType::Include)) => true,
+            Ok(Some(MatchType::Exclude)) => false,
+            _ => self.matches,
         };
 
         match (did_match, &entry.attr) {
             (_, DirEntryAttribute::Directory { .. }) => {
-                self.handle_new_directory(entry, match_result).await?;
+                self.handle_new_directory(entry, match_result?).await?;
             }
             (true, DirEntryAttribute::File { .. }) => {
                 self.dir_stack.push(PathStackEntry::new(entry));
diff --git a/pbs-client/src/pxar/create.rs b/pbs-client/src/pxar/create.rs
index 2577cf98..2d516cfa 100644
--- a/pbs-client/src/pxar/create.rs
+++ b/pbs-client/src/pxar/create.rs
@@ -21,7 +21,6 @@ use pxar::Metadata;
 
 use proxmox_io::vec;
 use proxmox_lang::c_str;
-use proxmox_sys::error::SysError;
 use proxmox_sys::fs::{self, acl, xattr};
 
 use pbs_datastore::catalog::BackupCatalogWriter;
@@ -420,7 +419,7 @@ impl Archiver {
         for file in dir.iter() {
             let file = file?;
 
-            let file_name = file.file_name().to_owned();
+            let file_name = file.file_name();
             let file_name_bytes = file_name.to_bytes();
             if file_name_bytes == b"." || file_name_bytes == b".." {
                 continue;
@@ -434,25 +433,34 @@ impl Archiver {
             assert_single_path_component(os_file_name)?;
             let full_path = self.path.join(os_file_name);
 
-            let stat = match nix::sys::stat::fstatat(
+            let match_path = PathBuf::from("/").join(full_path.clone());
+
+            let mut stat_results: Option<FileStat> = None;
+
+            let get_file_mode = || match nix::sys::stat::fstatat(
                 dir_fd,
-                file_name.as_c_str(),
+                file_name.to_owned().as_c_str(),
                 nix::fcntl::AtFlags::AT_SYMLINK_NOFOLLOW,
             ) {
-                Ok(stat) => stat,
-                Err(ref err) if err.not_found() => continue,
-                Err(err) => return Err(err).context(format!("stat failed on {:?}", full_path)),
+                Ok(stat) => Ok(stat),
+                Err(e) => Err(e),
             };
-
-            let match_path = PathBuf::from("/").join(full_path.clone());
-            if self
-                .patterns
-                .matches(match_path.as_os_str().as_bytes(), Some(stat.st_mode))
-                == Some(MatchType::Exclude)
+            if Some(MatchType::Exclude)
+                == self
+                    .patterns
+                    .matches(match_path.as_os_str().as_bytes(), || {
+                        Ok::<_, Errno>(match &stat_results {
+                            Some(result) => result.st_mode,
+                            None => stat_results.insert(get_file_mode()?).st_mode,
+                        })
+                    })
+                    .with_context(|| format!("stat failed on {full_path:?}"))?
             {
                 continue;
             }
 
+            let stat = stat_results.map(Ok).unwrap_or_else(get_file_mode)?;
+
             self.entry_counter += 1;
             if self.entry_counter > self.entry_limit {
                 bail!(
@@ -462,7 +470,7 @@ impl Archiver {
             }
 
             file_list.push(FileListEntry {
-                name: file_name,
+                name: file_name.to_owned(),
                 path: full_path,
                 stat,
             });
@@ -533,7 +541,7 @@ impl Archiver {
         let match_path = PathBuf::from("/").join(self.path.clone());
         if self
             .patterns
-            .matches(match_path.as_os_str().as_bytes(), Some(stat.st_mode))
+            .matches(match_path.as_os_str().as_bytes(), stat.st_mode)?
             == Some(MatchType::Exclude)
         {
             return Ok(());
diff --git a/pbs-client/src/pxar/extract.rs b/pbs-client/src/pxar/extract.rs
index 4eb6fb90..e24a1560 100644
--- a/pbs-client/src/pxar/extract.rs
+++ b/pbs-client/src/pxar/extract.rs
@@ -251,22 +251,24 @@ where
 
         self.extractor.set_path(entry.path().as_os_str().to_owned());
 
+        // We can `unwrap()` safely here because we get a `Result<_, std::convert::Infallible>`
         let match_result = self.match_list.matches(
             entry.path().as_os_str().as_bytes(),
-            Some(metadata.file_type() as u32),
-        );
+            metadata.file_type() as u32
+        ).unwrap();
 
         let did_match = match match_result {
             Some(MatchType::Include) => true,
             Some(MatchType::Exclude) => false,
-            None => self.state.current_match,
+            _ => self.state.current_match,
         };
 
         let extract_res = match (did_match, entry.kind()) {
             (_, EntryKind::Directory) => {
                 self.callback(entry.path());
 
-                let create = self.state.current_match && match_result != Some(MatchType::Exclude);
+                let create =
+                    self.state.current_match && match_result != Some(MatchType::Exclude);
                 let res = self
                     .extractor
                     .enter_directory(file_name_os.to_owned(), metadata.clone(), create)
diff --git a/pbs-datastore/src/catalog.rs b/pbs-datastore/src/catalog.rs
index 11c14b64..86e20c92 100644
--- a/pbs-datastore/src/catalog.rs
+++ b/pbs-datastore/src/catalog.rs
@@ -678,9 +678,9 @@ impl<R: Read + Seek> CatalogReader<R> {
             }
             file_path.extend(&e.name);
             match match_list.matches(&file_path, e.get_file_mode()) {
-                Some(MatchType::Exclude) => continue,
-                Some(MatchType::Include) => callback(file_path)?,
-                None => (),
+                Ok(Some(MatchType::Exclude)) => continue,
+                Ok(Some(MatchType::Include)) => callback(file_path)?,
+                _ => (),
             }
             if is_dir {
                 self.find(&e, file_path, match_list, callback)?;
-- 
2.39.2





^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [pbs-devel] [PATCH proxmox-backup v6] fix #4380: check if file is excluded before running `stat()`
  2023-08-21 13:08 ` [pbs-devel] [PATCH proxmox-backup v6] fix #4380: check if file is excluded before running `stat()` Gabriel Goller
@ 2023-08-22 13:04   ` Wolfgang Bumiller
  2023-08-22 14:07     ` Gabriel Goller
  0 siblings, 1 reply; 5+ messages in thread
From: Wolfgang Bumiller @ 2023-08-22 13:04 UTC (permalink / raw)
  To: Gabriel Goller; +Cc: pbs-devel

On Mon, Aug 21, 2023 at 03:08:26PM +0200, Gabriel Goller wrote:
> Passed a closure with the `stat()` function call to `matches()`. This
> will traverse through all patterns and try to match using the path only, if a
> `file_mode` is needed, it will run the closure. This means that if we exclude
> a file with the `MatchType::ANY_FILE_TYPE`, we will skip it without running
> `stat()` on it. As we updated the `matches()` function, we also updated all the
> invocations of it.
> Added `pathpatterns` crate to local overrides in cargo.toml.
> 
> Signed-off-by: Gabriel Goller <g.goller@proxmox.com>
> ---
> 
> changes v5:
>  - updated all invocations of `matches()` 
> 
> changes v4:
>  - match only by path and exclude the matched files, the run `stat()` and
>    match again, this time using the `file_mode`. This will match everything 
>    twice in the worst case, which is not optimal.
> changes v3:
>  - checking for `read` and `execute` permissions before entering directory,
>    doesn't work because there are a lot of side-effects (executed by
>    different user, AppArmor, SELinux, ...).
> changes v2:
>  - checking for excluded files with `matches()` before executing `stat()`,
>    this doesn't work because we get the file_mode from `stat()` and don't
>    want to ignore it when matching.
> 
> 
>  Cargo.toml                      |  5 +++--
>  pbs-client/src/catalog_shell.rs |  8 +++----
>  pbs-client/src/pxar/create.rs   | 38 ++++++++++++++++++++-------------
>  pbs-client/src/pxar/extract.rs  | 10 +++++----
>  pbs-datastore/src/catalog.rs    |  6 +++---
>  5 files changed, 39 insertions(+), 28 deletions(-)
> 
> diff --git a/Cargo.toml b/Cargo.toml
> index 5cbae1b8..560794a4 100644
> --- a/Cargo.toml
> +++ b/Cargo.toml
> @@ -264,8 +264,9 @@ proxmox-rrd.workspace = true
>  #proxmox-sortable-macro = { path = "../proxmox/proxmox-sortable-macro" }
>  #proxmox-human-byte = { path = "../proxmox/proxmox-human-byte" }
>  
> -#proxmox-apt = { path = "../proxmox/proxmox-apt" }
> -#proxmox-openid = { path = "../proxmox/proxmox-openid" }
> +#proxmox-apt = { path = "../proxmox-apt" }
> +#proxmox-openid = { path = "../proxmox-openid-rs" }
> +#pathpatterns = {path = "../pathpatterns" }
>  
>  #pxar = { path = "../pxar" }
>  
> diff --git a/pbs-client/src/catalog_shell.rs b/pbs-client/src/catalog_shell.rs
> index b8aaf8cb..f53b3cc5 100644
> --- a/pbs-client/src/catalog_shell.rs
> +++ b/pbs-client/src/catalog_shell.rs
> @@ -1138,14 +1138,14 @@ impl<'a> ExtractorState<'a> {
>      pub async fn handle_entry(&mut self, entry: catalog::DirEntry) -> Result<(), Error> {
>          let match_result = self.match_list.matches(&self.path, entry.get_file_mode());
>          let did_match = match match_result {
> -            Some(MatchType::Include) => true,
> -            Some(MatchType::Exclude) => false,
> -            None => self.matches,
> +            Ok(Some(MatchType::Include)) => true,
> +            Ok(Some(MatchType::Exclude)) => false,
> +            _ => self.matches,
>          };
>  
>          match (did_match, &entry.attr) {
>              (_, DirEntryAttribute::Directory { .. }) => {
> -                self.handle_new_directory(entry, match_result).await?;
> +                self.handle_new_directory(entry, match_result?).await?;
>              }
>              (true, DirEntryAttribute::File { .. }) => {
>                  self.dir_stack.push(PathStackEntry::new(entry));
> diff --git a/pbs-client/src/pxar/create.rs b/pbs-client/src/pxar/create.rs
> index 2577cf98..2d516cfa 100644
> --- a/pbs-client/src/pxar/create.rs
> +++ b/pbs-client/src/pxar/create.rs
> @@ -21,7 +21,6 @@ use pxar::Metadata;
>  
>  use proxmox_io::vec;
>  use proxmox_lang::c_str;
> -use proxmox_sys::error::SysError;
>  use proxmox_sys::fs::{self, acl, xattr};
>  
>  use pbs_datastore::catalog::BackupCatalogWriter;
> @@ -420,7 +419,7 @@ impl Archiver {
>          for file in dir.iter() {
>              let file = file?;
>  
> -            let file_name = file.file_name().to_owned();
> +            let file_name = file.file_name();
>              let file_name_bytes = file_name.to_bytes();
>              if file_name_bytes == b"." || file_name_bytes == b".." {
>                  continue;
> @@ -434,25 +433,34 @@ impl Archiver {
>              assert_single_path_component(os_file_name)?;
>              let full_path = self.path.join(os_file_name);
>  
> -            let stat = match nix::sys::stat::fstatat(
> +            let match_path = PathBuf::from("/").join(full_path.clone());
> +
> +            let mut stat_results: Option<FileStat> = None;
> +
> +            let get_file_mode = || match nix::sys::stat::fstatat(

You don't need that 'match' here if your cases are basically a no-op.

You *could* attach the `stat failed on ...` context with the file name
here... but this will make it a bit more tedious to check for `ENOENT`
as we'd need to use `.downcast()` on the anyhow::Error.

So either that, or we'll need to duplicate the context line down below
where we do the final `let stat = results.unwrap_or_else(get_file_mode)?`.

>                  dir_fd,
> -                file_name.as_c_str(),
> +                file_name.to_owned().as_c_str(),
>                  nix::fcntl::AtFlags::AT_SYMLINK_NOFOLLOW,
>              ) {
> -                Ok(stat) => stat,
> -                Err(ref err) if err.not_found() => continue,

Because the thing is, we still need to handle this case - just noticed
that this was removed entirely, which is of course not what we want :-)
If the file gets removed between listing it from the directory and us
trying to `stat` it, we should just act as if it had never existed.

> -                Err(err) => return Err(err).context(format!("stat failed on {:?}", full_path)),
> +                Ok(stat) => Ok(stat),
> +                Err(e) => Err(e),
>              };
> -
> -            let match_path = PathBuf::from("/").join(full_path.clone());
> -            if self
> -                .patterns
> -                .matches(match_path.as_os_str().as_bytes(), Some(stat.st_mode))
> -                == Some(MatchType::Exclude)
> +            if Some(MatchType::Exclude)
> +                == self
> +                    .patterns
> +                    .matches(match_path.as_os_str().as_bytes(), || {
> +                        Ok::<_, Errno>(match &stat_results {
> +                            Some(result) => result.st_mode,
> +                            None => stat_results.insert(get_file_mode()?).st_mode,
> +                        })
> +                    })

... it will instead need to be checked here ^
Basically, we will need to swap the `if` for a `match` after all, with
the cases
    Ok(Some(MatchType::Exclude)) => continue,
    Ok(_) => (),
    Err(err) if err.not_found() => continue,
    err => return err.with_context(...),
or the final 2 cases when using *not* not changing the `get_file_mode`
error to `anyhow::Error` would be:
    Err(err) => match err.downcast::<io::Error>() {
        Some(err) if err.not_found() => continue,
        _ => return Err(err),
    }


I hope I didn't miss anything now.

> +                    .with_context(|| format!("stat failed on {full_path:?}"))?
>              {
>                  continue;
>              }
>  
> +            let stat = stat_results.map(Ok).unwrap_or_else(get_file_mode)?;

If the context line is not already in the closure, we'd need to
duplicate it here.

> +
>              self.entry_counter += 1;
>              if self.entry_counter > self.entry_limit {
>                  bail!(
> @@ -462,7 +470,7 @@ impl Archiver {
>              }
>  
>              file_list.push(FileListEntry {
> -                name: file_name,
> +                name: file_name.to_owned(),
>                  path: full_path,
>                  stat,
>              });
> @@ -533,7 +541,7 @@ impl Archiver {
>          let match_path = PathBuf::from("/").join(self.path.clone());
>          if self
>              .patterns
> -            .matches(match_path.as_os_str().as_bytes(), Some(stat.st_mode))
> +            .matches(match_path.as_os_str().as_bytes(), stat.st_mode)?
>              == Some(MatchType::Exclude)
>          {
>              return Ok(());
> diff --git a/pbs-client/src/pxar/extract.rs b/pbs-client/src/pxar/extract.rs
> index 4eb6fb90..e24a1560 100644
> --- a/pbs-client/src/pxar/extract.rs
> +++ b/pbs-client/src/pxar/extract.rs
> @@ -251,22 +251,24 @@ where
>  
>          self.extractor.set_path(entry.path().as_os_str().to_owned());
>  
> +        // We can `unwrap()` safely here because we get a `Result<_, std::convert::Infallible>`
>          let match_result = self.match_list.matches(
>              entry.path().as_os_str().as_bytes(),
> -            Some(metadata.file_type() as u32),
> -        );
> +            metadata.file_type() as u32
> +        ).unwrap();
>  
>          let did_match = match match_result {
>              Some(MatchType::Include) => true,
>              Some(MatchType::Exclude) => false,
> -            None => self.state.current_match,
> +            _ => self.state.current_match,
>          };
>  
>          let extract_res = match (did_match, entry.kind()) {
>              (_, EntryKind::Directory) => {
>                  self.callback(entry.path());
>  
> -                let create = self.state.current_match && match_result != Some(MatchType::Exclude);
> +                let create =
> +                    self.state.current_match && match_result != Some(MatchType::Exclude);
>                  let res = self
>                      .extractor
>                      .enter_directory(file_name_os.to_owned(), metadata.clone(), create)
> diff --git a/pbs-datastore/src/catalog.rs b/pbs-datastore/src/catalog.rs
> index 11c14b64..86e20c92 100644
> --- a/pbs-datastore/src/catalog.rs
> +++ b/pbs-datastore/src/catalog.rs
> @@ -678,9 +678,9 @@ impl<R: Read + Seek> CatalogReader<R> {
>              }
>              file_path.extend(&e.name);
>              match match_list.matches(&file_path, e.get_file_mode()) {
> -                Some(MatchType::Exclude) => continue,
> -                Some(MatchType::Include) => callback(file_path)?,
> -                None => (),
> +                Ok(Some(MatchType::Exclude)) => continue,
> +                Ok(Some(MatchType::Include)) => callback(file_path)?,
> +                _ => (),





^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [pbs-devel] [PATCH pathpatterns v4] match_list: updated `matches()`, to only retrieve file mode if necessary
  2023-08-21 13:08 [pbs-devel] [PATCH pathpatterns v4] match_list: updated `matches()`, to only retrieve file mode if necessary Gabriel Goller
  2023-08-21 13:08 ` [pbs-devel] [PATCH proxmox-backup v6] fix #4380: check if file is excluded before running `stat()` Gabriel Goller
@ 2023-08-22 13:07 ` Wolfgang Bumiller
  1 sibling, 0 replies; 5+ messages in thread
From: Wolfgang Bumiller @ 2023-08-22 13:07 UTC (permalink / raw)
  To: Gabriel Goller; +Cc: pbs-devel

LGTM now, 1 minor nit below since the other patch needs another version
anyway...

On Mon, Aug 21, 2023 at 03:08:25PM +0200, Gabriel Goller wrote:
> Updated `matches()` function, which now takes the `GetFileMode` trait and returns a
> Result. `GetFileMode` is a function that should return the `file_mode`. `matches()`
> will go through the patterns and match by path only until it finds a pattern which
> does not have `MatchFlag::ANY_FILE_TYPE`, in case it will call the
> `get_file_mode`, which will return a `file_mode`. This will ensure that the
> `get()` (which in our case executes `stat()`) will only be run once(at most),
> every pattern will only be processed once and that the order of the patterns
> will be respected. `GetFileMode` is also implemented for `Option<u32>` (so that
> we can simply pass options of the `file_mode`) and `u32` (returning
> `std::convert::Infallible`, so that we can pass a u32 mode directly).
> 
> Signed-off-by: Gabriel Goller <g.goller@proxmox.com>
> ---
> 
> changes v3:
>  - changed `matches()` and `matches_exact()` to retrieve the file_mode lazily.
>    Allowed these functions to take in a trait `GetFileMode`, so that we can pass
>    closures and options that contain the `file_mode`.
> 
> changes v2:
>  - updated the `matches_path()` function to take a closure and renamed
>    it to `matches_mode_lazy()`. This allows us to only match once, so we
>    don't need to match before and after `stat()` (without and with
>    `file_mode`) anymore.
> 
> changes v1:
>  - added `matches_path()` function, which matches by path only and returns an
>    error if the `file_mode` is required.
> 
> 
>  src/lib.rs        |  72 +++++-----
>  src/match_list.rs | 335 ++++++++++++++++++++++++++++++++++++++--------
>  2 files changed, 315 insertions(+), 92 deletions(-)
> 
> diff --git a/src/match_list.rs b/src/match_list.rs
> index c5b14e0..69b1fdb 100644
> --- a/src/match_list.rs
> +++ b/src/match_list.rs
(...)
> @@ -383,30 +438,48 @@ where
>      <&'a T as IntoIterator>::IntoIter: DoubleEndedIterator,
>      <&'a T as IntoIterator>::Item: MatchListEntry,
>  {

Since we have a T here already...

> -    fn matches_do(&self, path: &[u8], file_mode: Option<u32>) -> Option<MatchType> {
> +    fn matches<U, G>(&self, path: U, get_file_mode: G) -> Result<Option<MatchType>, G::Error>

maybe use `P` for the `path` ;-)

> +    where
> +        U: AsRef<[u8]>,
> +        G: GetFileMode,
> +    {
>          // This is an &self method on a `T where T: 'a`.
>          let this: &'a Self = unsafe { std::mem::transmute(self) };
>  
> +        let mut get_file_mode = Some(get_file_mode);
> +        let mut file_mode = None;
> +
>          for m in this.into_iter().rev() {
> -            if let Some(mt) = m.entry_matches(path, file_mode) {
> -                return Some(mt);
> +            if file_mode.is_none() && m.entry_needs_file_mode() {
> +                file_mode = Some(get_file_mode.take().unwrap().get()?);
> +            }
> +            if let Some(mt) = m.entry_matches(path.as_ref(), file_mode) {
> +                return Ok(Some(mt));
>              }
>          }
> -
> -        None
> +        Ok(None)
>      }
>  
> -    fn matches_exact_do(&self, path: &[u8], file_mode: Option<u32>) -> Option<MatchType> {
> +    fn matches_exact<U, G>(&self, path: U, get_file_mode: G) -> Result<Option<MatchType>, G::Error>

^ here as well

> +    where
> +        U: AsRef<[u8]>,
> +        G: GetFileMode,
> +    {
>          // This is an &self method on a `T where T: 'a`.
>          let this: &'a Self = unsafe { std::mem::transmute(self) };
>  
> +        let mut get_file_mode = Some(get_file_mode);
> +        let mut file_mode = None;
> +
>          for m in this.into_iter().rev() {
> -            if let Some(mt) = m.entry_matches_exact(path, file_mode) {
> -                return Some(mt);
> +            if file_mode.is_none() && m.entry_needs_file_mode() {
> +                file_mode = Some(get_file_mode.take().unwrap().get()?);
> +            }
> +            if let Some(mt) = m.entry_matches_exact(path.as_ref(), file_mode) {
> +                return Ok(Some(mt));
>              }
>          }
> -
> -        None
> +        Ok(None)
>      }
>  }
>  




^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [pbs-devel] [PATCH proxmox-backup v6] fix #4380: check if file is excluded before running `stat()`
  2023-08-22 13:04   ` Wolfgang Bumiller
@ 2023-08-22 14:07     ` Gabriel Goller
  0 siblings, 0 replies; 5+ messages in thread
From: Gabriel Goller @ 2023-08-22 14:07 UTC (permalink / raw)
  To: Wolfgang Bumiller; +Cc: pbs-devel

Submitted a new version.

On 8/22/23 15:04, Wolfgang Bumiller wrote:
> On Mon, Aug 21, 2023 at 03:08:26PM +0200, Gabriel Goller wrote:
>> Passed a closure with the `stat()` function call to `matches()`. This
>> will traverse through all patterns and try to match using the path only, if a
>> `file_mode` is needed, it will run the closure. This means that if we exclude
>> a file with the `MatchType::ANY_FILE_TYPE`, we will skip it without running
>> `stat()` on it. As we updated the `matches()` function, we also updated all the
>> invocations of it.
>> Added `pathpatterns` crate to local overrides in cargo.toml.
>>
>> Signed-off-by: Gabriel Goller <g.goller@proxmox.com>
>> ---
>>
>> changes v5:
>>   - updated all invocations of `matches()`
>>
>> changes v4:
>>   - match only by path and exclude the matched files, the run `stat()` and
>>     match again, this time using the `file_mode`. This will match everything
>>     twice in the worst case, which is not optimal.
>> changes v3:
>>   - checking for `read` and `execute` permissions before entering directory,
>>     doesn't work because there are a lot of side-effects (executed by
>>     different user, AppArmor, SELinux, ...).
>> changes v2:
>>   - checking for excluded files with `matches()` before executing `stat()`,
>>     this doesn't work because we get the file_mode from `stat()` and don't
>>     want to ignore it when matching.
>>
>>
>>   Cargo.toml                      |  5 +++--
>>   pbs-client/src/catalog_shell.rs |  8 +++----
>>   pbs-client/src/pxar/create.rs   | 38 ++++++++++++++++++++-------------
>>   pbs-client/src/pxar/extract.rs  | 10 +++++----
>>   pbs-datastore/src/catalog.rs    |  6 +++---
>>   5 files changed, 39 insertions(+), 28 deletions(-)
>>
>> diff --git a/Cargo.toml b/Cargo.toml
>> index 5cbae1b8..560794a4 100644
>> --- a/Cargo.toml
>> +++ b/Cargo.toml
>> @@ -264,8 +264,9 @@ proxmox-rrd.workspace = true
>>   #proxmox-sortable-macro = { path = "../proxmox/proxmox-sortable-macro" }
>>   #proxmox-human-byte = { path = "../proxmox/proxmox-human-byte" }
>>   
>> -#proxmox-apt = { path = "../proxmox/proxmox-apt" }
>> -#proxmox-openid = { path = "../proxmox/proxmox-openid" }
>> +#proxmox-apt = { path = "../proxmox-apt" }
>> +#proxmox-openid = { path = "../proxmox-openid-rs" }
>> +#pathpatterns = {path = "../pathpatterns" }
>>   
>>   #pxar = { path = "../pxar" }
>>   
>> diff --git a/pbs-client/src/catalog_shell.rs b/pbs-client/src/catalog_shell.rs
>> index b8aaf8cb..f53b3cc5 100644
>> --- a/pbs-client/src/catalog_shell.rs
>> +++ b/pbs-client/src/catalog_shell.rs
>> @@ -1138,14 +1138,14 @@ impl<'a> ExtractorState<'a> {
>>       pub async fn handle_entry(&mut self, entry: catalog::DirEntry) -> Result<(), Error> {
>>           let match_result = self.match_list.matches(&self.path, entry.get_file_mode());
>>           let did_match = match match_result {
>> -            Some(MatchType::Include) => true,
>> -            Some(MatchType::Exclude) => false,
>> -            None => self.matches,
>> +            Ok(Some(MatchType::Include)) => true,
>> +            Ok(Some(MatchType::Exclude)) => false,
>> +            _ => self.matches,
>>           };
>>   
>>           match (did_match, &entry.attr) {
>>               (_, DirEntryAttribute::Directory { .. }) => {
>> -                self.handle_new_directory(entry, match_result).await?;
>> +                self.handle_new_directory(entry, match_result?).await?;
>>               }
>>               (true, DirEntryAttribute::File { .. }) => {
>>                   self.dir_stack.push(PathStackEntry::new(entry));
>> diff --git a/pbs-client/src/pxar/create.rs b/pbs-client/src/pxar/create.rs
>> index 2577cf98..2d516cfa 100644
>> --- a/pbs-client/src/pxar/create.rs
>> +++ b/pbs-client/src/pxar/create.rs
>> @@ -21,7 +21,6 @@ use pxar::Metadata;
>>   
>>   use proxmox_io::vec;
>>   use proxmox_lang::c_str;
>> -use proxmox_sys::error::SysError;
>>   use proxmox_sys::fs::{self, acl, xattr};
>>   
>>   use pbs_datastore::catalog::BackupCatalogWriter;
>> @@ -420,7 +419,7 @@ impl Archiver {
>>           for file in dir.iter() {
>>               let file = file?;
>>   
>> -            let file_name = file.file_name().to_owned();
>> +            let file_name = file.file_name();
>>               let file_name_bytes = file_name.to_bytes();
>>               if file_name_bytes == b"." || file_name_bytes == b".." {
>>                   continue;
>> @@ -434,25 +433,34 @@ impl Archiver {
>>               assert_single_path_component(os_file_name)?;
>>               let full_path = self.path.join(os_file_name);
>>   
>> -            let stat = match nix::sys::stat::fstatat(
>> +            let match_path = PathBuf::from("/").join(full_path.clone());
>> +
>> +            let mut stat_results: Option<FileStat> = None;
>> +
>> +            let get_file_mode = || match nix::sys::stat::fstatat(
> You don't need that 'match' here if your cases are basically a no-op.
>
> You *could* attach the `stat failed on ...` context with the file name
> here... but this will make it a bit more tedious to check for `ENOENT`
> as we'd need to use `.downcast()` on the anyhow::Error.
>
> So either that, or we'll need to duplicate the context line down below
> where we do the final `let stat = results.unwrap_or_else(get_file_mode)?`.
>
>>                   dir_fd,
>> -                file_name.as_c_str(),
>> +                file_name.to_owned().as_c_str(),
>>                   nix::fcntl::AtFlags::AT_SYMLINK_NOFOLLOW,
>>               ) {
>> -                Ok(stat) => stat,
>> -                Err(ref err) if err.not_found() => continue,
> Because the thing is, we still need to handle this case - just noticed
> that this was removed entirely, which is of course not what we want :-)
> If the file gets removed between listing it from the directory and us
> trying to `stat` it, we should just act as if it had never existed.
>
>> -                Err(err) => return Err(err).context(format!("stat failed on {:?}", full_path)),
>> +                Ok(stat) => Ok(stat),
>> +                Err(e) => Err(e),
>>               };
>> -
>> -            let match_path = PathBuf::from("/").join(full_path.clone());
>> -            if self
>> -                .patterns
>> -                .matches(match_path.as_os_str().as_bytes(), Some(stat.st_mode))
>> -                == Some(MatchType::Exclude)
>> +            if Some(MatchType::Exclude)
>> +                == self
>> +                    .patterns
>> +                    .matches(match_path.as_os_str().as_bytes(), || {
>> +                        Ok::<_, Errno>(match &stat_results {
>> +                            Some(result) => result.st_mode,
>> +                            None => stat_results.insert(get_file_mode()?).st_mode,
>> +                        })
>> +                    })
> ... it will instead need to be checked here ^
> Basically, we will need to swap the `if` for a `match` after all, with
> the cases
>      Ok(Some(MatchType::Exclude)) => continue,
>      Ok(_) => (),
>      Err(err) if err.not_found() => continue,
>      err => return err.with_context(...),
> or the final 2 cases when using *not* not changing the `get_file_mode`
> error to `anyhow::Error` would be:
>      Err(err) => match err.downcast::<io::Error>() {
>          Some(err) if err.not_found() => continue,
>          _ => return Err(err),
>      }
>
>
> I hope I didn't miss anything now.
>
>> +                    .with_context(|| format!("stat failed on {full_path:?}"))?
>>               {
>>                   continue;
>>               }
>>   
>> +            let stat = stat_results.map(Ok).unwrap_or_else(get_file_mode)?;
> If the context line is not already in the closure, we'd need to
> duplicate it here.
>
>> +
>>               self.entry_counter += 1;
>>               if self.entry_counter > self.entry_limit {
>>                   bail!(
>> @@ -462,7 +470,7 @@ impl Archiver {
>>               }
>>   
>>               file_list.push(FileListEntry {
>> -                name: file_name,
>> +                name: file_name.to_owned(),
>>                   path: full_path,
>>                   stat,
>>               });
>> @@ -533,7 +541,7 @@ impl Archiver {
>>           let match_path = PathBuf::from("/").join(self.path.clone());
>>           if self
>>               .patterns
>> -            .matches(match_path.as_os_str().as_bytes(), Some(stat.st_mode))
>> +            .matches(match_path.as_os_str().as_bytes(), stat.st_mode)?
>>               == Some(MatchType::Exclude)
>>           {
>>               return Ok(());
>> diff --git a/pbs-client/src/pxar/extract.rs b/pbs-client/src/pxar/extract.rs
>> index 4eb6fb90..e24a1560 100644
>> --- a/pbs-client/src/pxar/extract.rs
>> +++ b/pbs-client/src/pxar/extract.rs
>> @@ -251,22 +251,24 @@ where
>>   
>>           self.extractor.set_path(entry.path().as_os_str().to_owned());
>>   
>> +        // We can `unwrap()` safely here because we get a `Result<_, std::convert::Infallible>`
>>           let match_result = self.match_list.matches(
>>               entry.path().as_os_str().as_bytes(),
>> -            Some(metadata.file_type() as u32),
>> -        );
>> +            metadata.file_type() as u32
>> +        ).unwrap();
>>   
>>           let did_match = match match_result {
>>               Some(MatchType::Include) => true,
>>               Some(MatchType::Exclude) => false,
>> -            None => self.state.current_match,
>> +            _ => self.state.current_match,
>>           };
>>   
>>           let extract_res = match (did_match, entry.kind()) {
>>               (_, EntryKind::Directory) => {
>>                   self.callback(entry.path());
>>   
>> -                let create = self.state.current_match && match_result != Some(MatchType::Exclude);
>> +                let create =
>> +                    self.state.current_match && match_result != Some(MatchType::Exclude);
>>                   let res = self
>>                       .extractor
>>                       .enter_directory(file_name_os.to_owned(), metadata.clone(), create)
>> diff --git a/pbs-datastore/src/catalog.rs b/pbs-datastore/src/catalog.rs
>> index 11c14b64..86e20c92 100644
>> --- a/pbs-datastore/src/catalog.rs
>> +++ b/pbs-datastore/src/catalog.rs
>> @@ -678,9 +678,9 @@ impl<R: Read + Seek> CatalogReader<R> {
>>               }
>>               file_path.extend(&e.name);
>>               match match_list.matches(&file_path, e.get_file_mode()) {
>> -                Some(MatchType::Exclude) => continue,
>> -                Some(MatchType::Include) => callback(file_path)?,
>> -                None => (),
>> +                Ok(Some(MatchType::Exclude)) => continue,
>> +                Ok(Some(MatchType::Include)) => callback(file_path)?,
>> +                _ => (),




^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2023-08-22 14:07 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-08-21 13:08 [pbs-devel] [PATCH pathpatterns v4] match_list: updated `matches()`, to only retrieve file mode if necessary Gabriel Goller
2023-08-21 13:08 ` [pbs-devel] [PATCH proxmox-backup v6] fix #4380: check if file is excluded before running `stat()` Gabriel Goller
2023-08-22 13:04   ` Wolfgang Bumiller
2023-08-22 14:07     ` Gabriel Goller
2023-08-22 13:07 ` [pbs-devel] [PATCH pathpatterns v4] match_list: updated `matches()`, to only retrieve file mode if necessary Wolfgang Bumiller

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox
Service provided by Proxmox Server Solutions GmbH | Privacy | Legal