public inbox for pbs-devel@lists.proxmox.com
 help / color / mirror / Atom feed
* [pbs-devel] [PATCH pathpatterns v3] match_list: updated `matches()`, to only retrieve file mode if necessary
@ 2023-08-18 14:32 Gabriel Goller
  2023-08-18 14:32 ` [pbs-devel] [PATCH proxmox-backup v5] fix #4380: check if file is excluded before running `stat()` Gabriel Goller
  2023-08-21  7:57 ` [pbs-devel] [PATCH pathpatterns v3] match_list: updated `matches()`, to only retrieve file mode if necessary Wolfgang Bumiller
  0 siblings, 2 replies; 7+ messages in thread
From: Gabriel Goller @ 2023-08-18 14:32 UTC (permalink / raw)
  To: pbs-devel

Updated `matches()` function, which now takes the `GetFileMode` trait, that
has a function that should return the `file_mode`. `matches()` will go
through the patterns and match by path only until it finds a pattern which
does not have `MatchFlag::ANY_FILE_TYPE`, in case it will call the
`get_file_mode`, which will return a `file_mode`. This will ensure that the
`get()` (which in our case executes `stat()`) will only be run once(at most),
every pattern will only be processed once and that the order of the patterns
will be respected.

Signed-off-by: Gabriel Goller <g.goller@proxmox.com>
---

changes v2:
 - updated the `matches_path()` function to take a closure and renamed
   it to `matches_mode_lazy()`. This allows us to only match once, so we
   don't need to match before and after `stat()` (without and with
   `file_mode`) anymore.

changes v1:
 - added `matches_path()` function, which matches by path only and returns an
   error if the `file_mode` is required.

 src/lib.rs        |  72 +++++-----
 src/match_list.rs | 337 ++++++++++++++++++++++++++++++++++++++--------
 2 files changed, 315 insertions(+), 94 deletions(-)

diff --git a/src/lib.rs b/src/lib.rs
index cb89917..998cdae 100644
--- a/src/lib.rs
+++ b/src/lib.rs
@@ -31,17 +31,17 @@
 //!     MatchEntry::include(Pattern::path("bananas/curved.*")?),
 //! ];
 //!
-//! assert_eq!(list.matches("/things", None), None);
-//! assert_eq!(list.matches("/things/shop", None), Some(MatchType::Include));
-//! assert_eq!(list.matches("/things/shop/bananas", None), Some(MatchType::Exclude));
-//! assert_eq!(list.matches("/things/shop/bananas/curved.txt", None), Some(MatchType::Include));
-//! assert_eq!(list.matches("/things/shop/bananas/curved.bak", None), Some(MatchType::Include));
+//! assert_eq!(list.matches("/things", None), Ok(None));
+//! assert_eq!(list.matches("/things/shop", None), Ok(Some(MatchType::Include)));
+//! assert_eq!(list.matches("/things/shop/bananas", None), Ok(Some(MatchType::Exclude)));
+//! assert_eq!(list.matches("/things/shop/bananas/curved.txt", None), Ok(Some(MatchType::Include)));
+//! assert_eq!(list.matches("/things/shop/bananas/curved.bak", None), Ok(Some(MatchType::Include)));
 //!
 //! // this will exclude the curved.bak file
 //! list.push(MatchEntry::exclude(Pattern::path("curved.bak")?));
-//! assert_eq!(list.matches("/things/shop/bananas/curved.bak", None), Some(MatchType::Exclude));
+//! assert_eq!(list.matches("/things/shop/bananas/curved.bak", None), Ok(Some(MatchType::Exclude)));
 //! list.pop();
-//! # assert_eq!(list.matches("/things/shop/bananas/curved.bak", None), Some(MatchType::Include));
+//! # assert_eq!(list.matches("/things/shop/bananas/curved.bak", None), Ok(Some(MatchType::Include)));
 //!
 //! // but this will not:
 //! list.push(
@@ -49,40 +49,40 @@
 //!         .flags(MatchFlag::ANCHORED)
 //! );
 //! // or: list.push
-//! assert_eq!(list.matches("/things/shop/bananas/curved.bak", None), Some(MatchType::Include));
+//! assert_eq!(list.matches("/things/shop/bananas/curved.bak", None), Ok(Some(MatchType::Include)));
 //! list.pop();
 //!
 //! // let's check some patterns, anything starting with a 'c', 'f' or 's':
 //! let mut list = vec![MatchEntry::include(Pattern::path("[cfs]*")?)];
-//! assert_eq!(list.matches("/things", None), None);
-//! assert_eq!(list.matches("/things/file1.dat", None), Some(MatchType::Include));
-//! assert_eq!(list.matches("/things/file2.dat", None), Some(MatchType::Include));
-//! assert_eq!(list.matches("/things/shop", None), Some(MatchType::Include));
-//! assert_eq!(list.matches("/things/shop/info.txt", None), None);
-//! assert_eq!(list.matches("/things/shop/apples", None), None);
-//! assert_eq!(list.matches("/things/shop/apples/gala.txt", None), None);
-//! assert_eq!(list.matches("/things/shop/apples/golden-delicious.txt", None), None);
-//! assert_eq!(list.matches("/things/shop/bananas", None), None);
-//! assert_eq!(list.matches("/things/shop/bananas/straight.txt", None), Some(MatchType::Include));
-//! assert_eq!(list.matches("/things/shop/bananas/curved.txt", None), Some(MatchType::Include));
-//! assert_eq!(list.matches("/shop/bananas/curved.bak", None), Some(MatchType::Include));
-//! assert_eq!(list.matches("/things/shop/bananas/other.txt", None), None);
+//! assert_eq!(list.matches("/things", None), Ok(None));
+//! assert_eq!(list.matches("/things/file1.dat", None), Ok(Some(MatchType::Include)));
+//! assert_eq!(list.matches("/things/file2.dat", None), Ok(Some(MatchType::Include)));
+//! assert_eq!(list.matches("/things/shop", None), Ok(Some(MatchType::Include)));
+//! assert_eq!(list.matches("/things/shop/info.txt", None), Ok(None));
+//! assert_eq!(list.matches("/things/shop/apples", None), Ok(None));
+//! assert_eq!(list.matches("/things/shop/apples/gala.txt", None), Ok(None));
+//! assert_eq!(list.matches("/things/shop/apples/golden-delicious.txt", None), Ok(None));
+//! assert_eq!(list.matches("/things/shop/bananas", None), Ok(None));
+//! assert_eq!(list.matches("/things/shop/bananas/straight.txt", None), Ok(Some(MatchType::Include)));
+//! assert_eq!(list.matches("/things/shop/bananas/curved.txt", None), Ok(Some(MatchType::Include)));
+//! assert_eq!(list.matches("/shop/bananas/curved.bak", None), Ok(Some(MatchType::Include)));
+//! assert_eq!(list.matches("/things/shop/bananas/other.txt", None), Ok(None));
 //!
 //! // If we add `**` we end up including the entire `shop/` subtree:
 //! list.push(MatchEntry::include(Pattern::path("[cfs]*/**")?));
-//! assert_eq!(list.matches("/things", None), None);
-//! assert_eq!(list.matches("/things/file1.dat", None), Some(MatchType::Include));
-//! assert_eq!(list.matches("/things/file2.dat", None), Some(MatchType::Include));
-//! assert_eq!(list.matches("/things/shop", None), Some(MatchType::Include));
-//! assert_eq!(list.matches("/things/shop/info.txt", None), Some(MatchType::Include));
-//! assert_eq!(list.matches("/things/shop/apples", None), Some(MatchType::Include));
-//! assert_eq!(list.matches("/things/shop/apples/gala.txt", None), Some(MatchType::Include));
-//! assert_eq!(list.matches("/shop/apples/golden-delicious.txt", None), Some(MatchType::Include));
-//! assert_eq!(list.matches("/things/shop/bananas", None), Some(MatchType::Include));
-//! assert_eq!(list.matches("/things/shop/bananas/straight.txt", None), Some(MatchType::Include));
-//! assert_eq!(list.matches("/things/shop/bananas/curved.txt", None), Some(MatchType::Include));
-//! assert_eq!(list.matches("/shop/bananas/curved.bak", None), Some(MatchType::Include));
-//! assert_eq!(list.matches("/shop/bananas/other.txt", None), Some(MatchType::Include));
+//! assert_eq!(list.matches("/things", None), Ok(None));
+//! assert_eq!(list.matches("/things/file1.dat", None), Ok(Some(MatchType::Include)));
+//! assert_eq!(list.matches("/things/file2.dat", None), Ok(Some(MatchType::Include)));
+//! assert_eq!(list.matches("/things/shop", None), Ok(Some(MatchType::Include)));
+//! assert_eq!(list.matches("/things/shop/info.txt", None), Ok(Some(MatchType::Include)));
+//! assert_eq!(list.matches("/things/shop/apples", None), Ok(Some(MatchType::Include)));
+//! assert_eq!(list.matches("/things/shop/apples/gala.txt", None), Ok(Some(MatchType::Include)));
+//! assert_eq!(list.matches("/shop/apples/golden-delicious.txt", None), Ok(Some(MatchType::Include)));
+//! assert_eq!(list.matches("/things/shop/bananas", None), Ok(Some(MatchType::Include)));
+//! assert_eq!(list.matches("/things/shop/bananas/straight.txt", None), Ok(Some(MatchType::Include)));
+//! assert_eq!(list.matches("/things/shop/bananas/curved.txt", None), Ok(Some(MatchType::Include)));
+//! assert_eq!(list.matches("/shop/bananas/curved.bak", None), Ok(Some(MatchType::Include)));
+//! assert_eq!(list.matches("/shop/bananas/other.txt", None), Ok(Some(MatchType::Include)));
 //! # Ok(())
 //! # }
 //! # test().unwrap()
@@ -92,7 +92,9 @@ mod match_list;
 mod pattern;
 
 #[doc(inline)]
-pub use match_list::{MatchEntry, MatchFlag, MatchList, MatchPattern, MatchType};
+pub use match_list::{
+    FileModeRequired, GetFileMode, MatchEntry, MatchFlag, MatchList, MatchPattern, MatchType,
+};
 
 #[doc(inline)]
 pub use pattern::{ParseError, Pattern, PatternFlag};
diff --git a/src/match_list.rs b/src/match_list.rs
index c5b14e0..51aa65d 100644
--- a/src/match_list.rs
+++ b/src/match_list.rs
@@ -1,5 +1,4 @@
 //! Helpers for include/exclude lists.
-
 use bitflags::bitflags;
 
 use crate::PatternFlag;
@@ -39,6 +38,37 @@ impl Default for MatchFlag {
     }
 }
 
+#[derive(Debug, PartialEq)]
+pub struct FileModeRequired {}
+impl std::fmt::Display for FileModeRequired {
+    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
+        write!(f, "File mode is required for matching")
+    }
+}
+impl std::error::Error for FileModeRequired {}
+
+pub trait GetFileMode {
+    type Error;
+    fn get(self) -> Result<u32, Self::Error>;
+}
+
+impl<T, E> GetFileMode for T
+where
+    T: FnOnce() -> Result<u32, E>,
+{
+    type Error = E;
+    fn get(self) -> Result<u32, Self::Error> {
+        self()
+    }
+}
+
+impl GetFileMode for Option<u32> {
+    type Error = FileModeRequired;
+    fn get(self) -> Result<u32, Self::Error> {
+        self.ok_or(FileModeRequired {})
+    }
+}
+
 /// A pattern entry. (Glob patterns or literal patterns.)
 // Note:
 // For regex we'd likely use the POSIX extended REs via `regexec(3)`, since we're targetting
@@ -211,7 +241,7 @@ impl MatchEntry {
     pub fn matches_mode(&self, file_mode: u32) -> bool {
         // bitflags' `.contains` means ALL bits must be set, if they are all set we don't
         // need to check the mode...
-        if self.flags.contains(MatchFlag::ANY_FILE_TYPE) {
+        if !self.needs_file_mode() {
             return true;
         }
 
@@ -304,12 +334,21 @@ impl MatchEntry {
 
         self.matches_path_exact(path)
     }
+
+    pub fn needs_file_mode(&self) -> bool {
+        let flags = (self.flags & MatchFlag::ANY_FILE_TYPE).bits();
+        if flags != 0 && flags != MatchFlag::ANY_FILE_TYPE.bits() {
+            return true;
+        }
+        false
+    }
 }
 
 #[doc(hidden)]
 pub trait MatchListEntry {
     fn entry_matches(&self, path: &[u8], file_mode: Option<u32>) -> Option<MatchType>;
     fn entry_matches_exact(&self, path: &[u8], file_mode: Option<u32>) -> Option<MatchType>;
+    fn entry_needs_file_mode(&self) -> bool;
 }
 
 impl MatchListEntry for &'_ MatchEntry {
@@ -328,6 +367,10 @@ impl MatchListEntry for &'_ MatchEntry {
             None
         }
     }
+
+    fn entry_needs_file_mode(&self) -> bool {
+        self.needs_file_mode()
+    }
 }
 
 impl MatchListEntry for &'_ &'_ MatchEntry {
@@ -346,6 +389,10 @@ impl MatchListEntry for &'_ &'_ MatchEntry {
             None
         }
     }
+
+    fn entry_needs_file_mode(&self) -> bool {
+        self.needs_file_mode()
+    }
 }
 
 /// This provides [`matches`](MatchList::matches) and [`matches_exact`](MatchList::matches_exact)
@@ -361,19 +408,26 @@ impl MatchListEntry for &'_ &'_ MatchEntry {
 /// This makes it easier to use slices over entries or references to entries.
 pub trait MatchList {
     /// Check whether this list contains anything matching a prefix of the specified path, and the
-    /// specified file mode.
-    fn matches<T: AsRef<[u8]>>(&self, path: T, file_mode: Option<u32>) -> Option<MatchType> {
-        self.matches_do(path.as_ref(), file_mode)
-    }
-
-    fn matches_do(&self, path: &[u8], file_mode: Option<u32>) -> Option<MatchType>;
-
-    /// Check whether this list contains anything exactly matching the path and mode.
-    fn matches_exact<T: AsRef<[u8]>>(&self, path: T, file_mode: Option<u32>) -> Option<MatchType> {
-        self.matches_exact_do(path.as_ref(), file_mode)
-    }
-
-    fn matches_exact_do(&self, path: &[u8], file_mode: Option<u32>) -> Option<MatchType>;
+    /// specified file mode. Gets the file_mode lazily, only if needed.
+    fn matches<T, U>(
+        &self,
+        path: T,
+        get_file_mode: U,
+    ) -> Result<Option<MatchType>, <U as GetFileMode>::Error>
+    where
+        T: AsRef<[u8]>,
+        U: GetFileMode;
+
+    /// Check whether this list contains anything exactly matching the path and mode. Gets the
+    /// file_mode lazily, only if needed.
+    fn matches_exact<T, U>(
+        &self,
+        path: T,
+        get_file_mode: U,
+    ) -> Result<Option<MatchType>, <U as GetFileMode>::Error>
+    where
+        T: AsRef<[u8]>,
+        U: GetFileMode;
 }
 
 impl<'a, T> MatchList for T
@@ -383,30 +437,56 @@ where
     <&'a T as IntoIterator>::IntoIter: DoubleEndedIterator,
     <&'a T as IntoIterator>::Item: MatchListEntry,
 {
-    fn matches_do(&self, path: &[u8], file_mode: Option<u32>) -> Option<MatchType> {
+    fn matches<U, G>(
+        &self,
+        path: U,
+        get_file_mode: G,
+    ) -> Result<Option<MatchType>, <G as GetFileMode>::Error>
+    where
+        U: AsRef<[u8]>,
+        G: GetFileMode,
+    {
         // This is an &self method on a `T where T: 'a`.
         let this: &'a Self = unsafe { std::mem::transmute(self) };
 
+        let mut get_file_mode = Some(get_file_mode);
+        let mut file_mode = None;
+
         for m in this.into_iter().rev() {
-            if let Some(mt) = m.entry_matches(path, file_mode) {
-                return Some(mt);
+            if file_mode.is_none() && m.entry_needs_file_mode() {
+                file_mode = Some(get_file_mode.take().unwrap().get()?);
+            }
+            if let Some(mt) = m.entry_matches(path.as_ref(), file_mode) {
+                return Ok(Some(mt));
             }
         }
-
-        None
-    }
-
-    fn matches_exact_do(&self, path: &[u8], file_mode: Option<u32>) -> Option<MatchType> {
+        Ok(None)
+    }
+
+    fn matches_exact<U, G>(
+        &self,
+        path: U,
+        get_file_mode: G,
+    ) -> Result<Option<MatchType>, <G as GetFileMode>::Error>
+    where
+        U: AsRef<[u8]>,
+        G: GetFileMode,
+    {
         // This is an &self method on a `T where T: 'a`.
         let this: &'a Self = unsafe { std::mem::transmute(self) };
 
+        let mut get_file_mode = Some(get_file_mode);
+        let mut file_mode = None;
+
         for m in this.into_iter().rev() {
-            if let Some(mt) = m.entry_matches_exact(path, file_mode) {
-                return Some(mt);
+            if file_mode.is_none() && m.entry_needs_file_mode() {
+                file_mode = Some(get_file_mode.take().unwrap().get()?);
+            }
+            if let Some(mt) = m.entry_matches_exact(path.as_ref(), file_mode) {
+                return Ok(Some(mt));
             }
         }
-
-        None
+        Ok(None)
     }
 }
 
@@ -415,20 +495,20 @@ fn assert_containers_implement_match_list() {
     use std::iter::FromIterator;
 
     let vec = vec![MatchEntry::include(crate::Pattern::path("a*").unwrap())];
-    assert_eq!(vec.matches("asdf", None), Some(MatchType::Include));
+    assert_eq!(vec.matches("asdf", None), Ok(Some(MatchType::Include)));
 
     // FIXME: ideally we can make this work as well!
     let vd = std::collections::VecDeque::<MatchEntry>::from_iter(vec.clone());
-    assert_eq!(vd.matches("asdf", None), Some(MatchType::Include));
+    assert_eq!(vd.matches("asdf", None), Ok(Some(MatchType::Include)));
 
     let list: &[MatchEntry] = &vec[..];
-    assert_eq!(list.matches("asdf", None), Some(MatchType::Include));
+    assert_eq!(list.matches("asdf", None), Ok(Some(MatchType::Include)));
 
     let list: Vec<&MatchEntry> = vec.iter().collect();
-    assert_eq!(list.matches("asdf", None), Some(MatchType::Include));
+    assert_eq!(list.matches("asdf", None), Ok(Some(MatchType::Include)));
 
     let list: &[&MatchEntry] = &list[..];
-    assert_eq!(list.matches("asdf", None), Some(MatchType::Include));
+    assert_eq!(list.matches("asdf", None), Ok(Some(MatchType::Include)));
 }
 
 #[test]
@@ -443,25 +523,28 @@ fn test_file_type_matches() {
     ];
     assert_eq!(
         matchlist.matches("a_dir", Some(libc::S_IFDIR)),
-        Some(MatchType::Include)
+        Ok(Some(MatchType::Include))
     );
     assert_eq!(
         matchlist.matches("/a_dir", Some(libc::S_IFDIR)),
-        Some(MatchType::Include)
+        Ok(Some(MatchType::Include))
     );
-    assert_eq!(matchlist.matches("/a_dir", Some(libc::S_IFREG)), None);
+    assert_eq!(matchlist.matches("/a_dir", Some(libc::S_IFREG)), Ok(None));
 
     assert_eq!(
         matchlist.matches("/a_file", Some(libc::S_IFREG)),
-        Some(MatchType::Exclude)
+        Ok(Some(MatchType::Exclude))
     );
-    assert_eq!(matchlist.matches("/a_file", Some(libc::S_IFDIR)), None);
+    assert_eq!(matchlist.matches("/a_file", Some(libc::S_IFDIR)), Ok(None));
 
     assert_eq!(
         matchlist.matches("/another_dir", Some(libc::S_IFDIR)),
-        Some(MatchType::Exclude)
+        Ok(Some(MatchType::Exclude))
+    );
+    assert_eq!(
+        matchlist.matches("/another_dir", Some(libc::S_IFREG)),
+        Ok(None)
     );
-    assert_eq!(matchlist.matches("/another_dir", Some(libc::S_IFREG)), None);
 }
 
 #[test]
@@ -471,22 +554,25 @@ fn test_anchored_matches() {
     let matchlist = vec![
         MatchEntry::new(Pattern::path("file-a").unwrap(), MatchType::Include),
         MatchEntry::new(Pattern::path("some/path").unwrap(), MatchType::Include)
-            .flags(MatchFlag::ANCHORED),
+            .add_flags(MatchFlag::ANCHORED),
     ];
 
-    assert_eq!(matchlist.matches("file-a", None), Some(MatchType::Include));
+    assert_eq!(
+        matchlist.matches("file-a", None),
+        Ok(Some(MatchType::Include))
+    );
     assert_eq!(
         matchlist.matches("another/file-a", None),
-        Some(MatchType::Include)
+        Ok(Some(MatchType::Include))
     );
 
-    assert_eq!(matchlist.matches("some", None), None);
-    assert_eq!(matchlist.matches("path", None), None);
+    assert_eq!(matchlist.matches("some", None), Ok(None));
+    assert_eq!(matchlist.matches("path", None), Ok(None));
     assert_eq!(
         matchlist.matches("some/path", None),
-        Some(MatchType::Include)
+        Ok(Some(MatchType::Include))
     );
-    assert_eq!(matchlist.matches("another/some/path", None), None);
+    assert_eq!(matchlist.matches("another/some/path", None), Ok(None));
 }
 
 #[test]
@@ -495,7 +581,10 @@ fn test_literal_matches() {
         MatchPattern::Literal(b"/bin/mv".to_vec()),
         MatchType::Include,
     )];
-    assert_eq!(matchlist.matches("/bin/mv", None), Some(MatchType::Include));
+    assert_eq!(
+        matchlist.matches("/bin/mv", None),
+        Ok(Some(MatchType::Include))
+    );
 }
 
 #[test]
@@ -504,29 +593,159 @@ fn test_path_relativity() {
     let matchlist = vec![
         MatchEntry::new(Pattern::path("noslash").unwrap(), MatchType::Include),
         MatchEntry::new(Pattern::path("noslash-a").unwrap(), MatchType::Include)
-            .flags(MatchFlag::ANCHORED),
+            .add_flags(MatchFlag::ANCHORED),
         MatchEntry::new(Pattern::path("/slash").unwrap(), MatchType::Include),
         MatchEntry::new(Pattern::path("/slash-a").unwrap(), MatchType::Include)
-            .flags(MatchFlag::ANCHORED),
+            .add_flags(MatchFlag::ANCHORED),
     ];
-    assert_eq!(matchlist.matches("noslash", None), Some(MatchType::Include));
+    assert_eq!(
+        matchlist.matches("noslash", None),
+        Ok(Some(MatchType::Include))
+    );
     assert_eq!(
         matchlist.matches("noslash-a", None),
-        Some(MatchType::Include)
+        Ok(Some(MatchType::Include))
     );
-    assert_eq!(matchlist.matches("slash", None), None);
-    assert_eq!(matchlist.matches("/slash", None), Some(MatchType::Include));
-    assert_eq!(matchlist.matches("slash-a", None), None);
+    assert_eq!(matchlist.matches("slash", None), Ok(None));
+    assert_eq!(
+        matchlist.matches("/slash", None),
+        Ok(Some(MatchType::Include))
+    );
+    assert_eq!(matchlist.matches("slash-a", None), Ok(None));
     assert_eq!(
         matchlist.matches("/slash-a", None),
-        Some(MatchType::Include)
+        Ok(Some(MatchType::Include))
     );
 
     assert_eq!(
         matchlist.matches("foo/noslash", None),
-        Some(MatchType::Include)
+        Ok(Some(MatchType::Include))
     );
-    assert_eq!(matchlist.matches("foo/noslash-a", None), None);
-    assert_eq!(matchlist.matches("foo/slash", None), None);
-    assert_eq!(matchlist.matches("foo/slash-a", None), None);
+    assert_eq!(matchlist.matches("foo/noslash-a", None), Ok(None));
+    assert_eq!(matchlist.matches("foo/slash", None), Ok(None));
+    assert_eq!(matchlist.matches("foo/slash-a", None), Ok(None));
+}
+
+#[test]
+fn matches_path() {
+    use crate::Pattern;
+
+    let matchlist = vec![
+        MatchEntry::new(Pattern::path("a*").unwrap(), MatchType::Exclude),
+        MatchEntry::new(Pattern::path("b*").unwrap(), MatchType::Exclude),
+    ];
+
+    assert_eq!(
+        matchlist.matches("ahsjdj", || Err(FileModeRequired {})),
+        Ok(Some(MatchType::Exclude))
+    );
+    let mut _test = MatchFlag::ANY_FILE_TYPE;
+    assert_eq!(
+        matchlist.matches("bhshdf", || {
+            _test = MatchFlag::MATCH_DIRECTORIES;
+            Ok::<u32, FileModeRequired>(libc::S_IFDIR)
+        }),
+        Ok(Some(MatchType::Exclude))
+    );
+
+    let matchlist = vec![
+        MatchEntry::new(Pattern::path("a*").unwrap(), MatchType::Exclude)
+            .flags(MatchFlag::MATCH_DIRECTORIES),
+        MatchEntry::new(Pattern::path("b*").unwrap(), MatchType::Exclude),
+    ];
+
+    assert_eq!(
+        matchlist.matches("ahsjdj", || Err(FileModeRequired {})),
+        Err(FileModeRequired {})
+    );
+    assert_eq!(
+        matchlist.matches("bhshdf", || Err(FileModeRequired {})),
+        Ok(Some(MatchType::Exclude))
+    );
+    assert_eq!(
+        matchlist.matches("ahsjdj", || Ok::<u32, FileModeRequired>(libc::S_IFDIR)),
+        Ok(Some(MatchType::Exclude))
+    );
+
+    let matchlist = vec![
+        MatchEntry::new(Pattern::path("b*").unwrap(), MatchType::Include),
+        MatchEntry::new(Pattern::path("a*").unwrap(), MatchType::Exclude)
+            .flags(MatchFlag::MATCH_DIRECTORIES),
+    ];
+
+    assert_eq!(
+        matchlist.matches("ahsjdj", || Err(FileModeRequired {})),
+        Err(FileModeRequired {})
+    );
+    assert_eq!(
+        matchlist.matches("bhshdf", || Err(FileModeRequired {})),
+        Err(FileModeRequired {})
+    );
+
+    assert_eq!(
+        matchlist.matches("ahsjdj", || Ok::<u32, FileModeRequired>(libc::S_IFDIR)),
+        Ok(Some(MatchType::Exclude))
+    );
+    assert_eq!(
+        matchlist.matches("bhshdf", || Err(FileModeRequired {})),
+        Err(FileModeRequired {})
+    );
+    assert_eq!(
+        matchlist.matches("bbb", || Ok::<u32, FileModeRequired>(libc::S_IFDIR)),
+        Ok(Some(MatchType::Include))
+    );
+
+    let matchlist = vec![
+        MatchEntry::new(Pattern::path("a*").unwrap(), MatchType::Exclude)
+            .flags(MatchFlag::MATCH_DIRECTORIES),
+    ];
+
+    assert_eq!(
+        matchlist.matches("bbb", || Ok::<u32, FileModeRequired>(libc::S_IFDIR)),
+        Ok(None)
+    );
+
+    let matchlist = vec![
+        MatchEntry::new(Pattern::path("a*").unwrap(), MatchType::Exclude)
+            .flags(MatchFlag::MATCH_DIRECTORIES),
+        MatchEntry::new(Pattern::path("b*").unwrap(), MatchType::Exclude)
+            .flags(MatchFlag::MATCH_REGULAR_FILES),
+    ];
+
+    assert_eq!(
+        matchlist.matches("ahsjdj", || Ok::<u32, FileModeRequired>(libc::S_IFDIR)),
+        Ok(Some(MatchType::Exclude))
+    );
+    assert_eq!(
+        matchlist.matches("ahsjdj", || Ok::<u32, FileModeRequired>(libc::S_IFREG)),
+        Ok(None)
+    );
+    assert_eq!(
+        matchlist.matches("bhsjdj", || Ok::<u32, FileModeRequired>(libc::S_IFREG)),
+        Ok(Some(MatchType::Exclude))
+    );
+}
+
+#[test]
+fn match_entry_needs_flag() {
+    use crate::Pattern;
+    let match_entry = MatchEntry::new(Pattern::path("a*").unwrap(), MatchType::Exclude)
+        .flags(MatchFlag::MATCH_DIRECTORIES);
+    assert_eq!(match_entry.needs_file_mode(), true);
+
+    let match_entry = MatchEntry::new(Pattern::path("a*").unwrap(), MatchType::Exclude)
+        .flags(MatchFlag::MATCH_REGULAR_FILES);
+    assert_eq!(match_entry.needs_file_mode(), true);
+
+    let match_entry = MatchEntry::new(Pattern::path("a*").unwrap(), MatchType::Exclude)
+        .flags(MatchFlag::MATCH_DIRECTORIES | MatchFlag::MATCH_REGULAR_FILES);
+    assert_eq!(match_entry.needs_file_mode(), true);
+
+    let match_entry = MatchEntry::new(Pattern::path("a*").unwrap(), MatchType::Exclude)
+        .flags(MatchFlag::ANCHORED | MatchFlag::MATCH_DIRECTORIES);
+    assert_eq!(match_entry.needs_file_mode(), true);
+
+    let match_entry = MatchEntry::new(Pattern::path("a*").unwrap(), MatchType::Exclude)
+        .flags(MatchFlag::ANY_FILE_TYPE);
+    assert_eq!(match_entry.needs_file_mode(), false);
 }
-- 
2.39.2





^ permalink raw reply	[flat|nested] 7+ messages in thread

* [pbs-devel] [PATCH proxmox-backup v5] fix #4380: check if file is excluded before running `stat()`
  2023-08-18 14:32 [pbs-devel] [PATCH pathpatterns v3] match_list: updated `matches()`, to only retrieve file mode if necessary Gabriel Goller
@ 2023-08-18 14:32 ` Gabriel Goller
  2023-08-21  8:41   ` Wolfgang Bumiller
  2023-08-21  7:57 ` [pbs-devel] [PATCH pathpatterns v3] match_list: updated `matches()`, to only retrieve file mode if necessary Wolfgang Bumiller
  1 sibling, 1 reply; 7+ messages in thread
From: Gabriel Goller @ 2023-08-18 14:32 UTC (permalink / raw)
  To: pbs-devel

Passed a closure with the `stat()` function call to `matches()`. This
will traverse through all patterns and try to match using the path only, if a
`file_mode` is needed, it will run the closure. This means that if we exclude
a file with the `MatchType::ANY_FILE_TYPE`, we will skip it without running
`stat()` on it.
Added `pathpatterns` crate to local overrides in cargo.toml.

Signed-off-by: Gabriel Goller <g.goller@proxmox.com>
---

changes v4:
 - match only by path and exclude the matched files, the run `stat()` and
   match again, this time using the `file_mode`. This will match everything 
   twice in the worst case, which is not optimal.
changes v3:
 - checking for `read` and `execute` permissions before entering directory,
   doesn't work because there are a lot of side-effects (executed by
   different user, AppArmor, SELinux, ...).
changes v2:
 - checking for excluded files with `matches()` before executing `stat()`,
   this doesn't work because we get the file_mode from `stat()` and don't
   want to ignore it when matching.

 Cargo.toml                      |  1 +
 pbs-client/src/catalog_shell.rs | 12 +++---
 pbs-client/src/pxar/create.rs   | 69 ++++++++++++++++++++++++---------
 pbs-client/src/pxar/extract.rs  |  9 +++--
 pbs-datastore/src/catalog.rs    |  6 +--
 5 files changed, 65 insertions(+), 32 deletions(-)

diff --git a/Cargo.toml b/Cargo.toml
index 2a05c471..c54bc28b 100644
--- a/Cargo.toml
+++ b/Cargo.toml
@@ -262,6 +262,7 @@ proxmox-rrd.workspace = true
 
 #proxmox-apt = { path = "../proxmox-apt" }
 #proxmox-openid = { path = "../proxmox-openid-rs" }
+#pathpatterns = {path = "../pathpatterns" }
 
 #pxar = { path = "../pxar" }
 
diff --git a/pbs-client/src/catalog_shell.rs b/pbs-client/src/catalog_shell.rs
index 98af5699..89bcc280 100644
--- a/pbs-client/src/catalog_shell.rs
+++ b/pbs-client/src/catalog_shell.rs
@@ -13,7 +13,7 @@ use nix::dir::Dir;
 use nix::fcntl::OFlag;
 use nix::sys::stat::Mode;
 
-use pathpatterns::{MatchEntry, MatchList, MatchPattern, MatchType, PatternFlag};
+use pathpatterns::{FileModeRequired, MatchEntry, MatchList, MatchPattern, MatchType, PatternFlag};
 use proxmox_router::cli::{self, CliCommand, CliCommandMap, CliHelper, CommandLineInterface};
 use proxmox_schema::api;
 use proxmox_sys::fs::{create_path, CreateOptions};
@@ -1108,7 +1108,7 @@ impl<'a> ExtractorState<'a> {
     async fn handle_new_directory(
         &mut self,
         entry: catalog::DirEntry,
-        match_result: Option<MatchType>,
+        match_result: Result<Option<MatchType>, FileModeRequired>,
     ) -> Result<(), Error> {
         // enter a new directory:
         self.read_dir_stack.push(mem::replace(
@@ -1123,7 +1123,7 @@ impl<'a> ExtractorState<'a> {
         Shell::walk_pxar_archive(self.accessor, &mut self.dir_stack).await?;
         let dir_pxar = self.dir_stack.last().unwrap().pxar.as_ref().unwrap();
         let dir_meta = dir_pxar.entry().metadata().clone();
-        let create = self.matches && match_result != Some(MatchType::Exclude);
+        let create = self.matches && match_result != Ok(Some(MatchType::Exclude));
         self.extractor
             .enter_directory(dir_pxar.file_name().to_os_string(), dir_meta, create)?;
 
@@ -1133,9 +1133,9 @@ impl<'a> ExtractorState<'a> {
     pub async fn handle_entry(&mut self, entry: catalog::DirEntry) -> Result<(), Error> {
         let match_result = self.match_list.matches(&self.path, entry.get_file_mode());
         let did_match = match match_result {
-            Some(MatchType::Include) => true,
-            Some(MatchType::Exclude) => false,
-            None => self.matches,
+            Ok(Some(MatchType::Include)) => true,
+            Ok(Some(MatchType::Exclude)) => false,
+            _ => self.matches,
         };
 
         match (did_match, &entry.attr) {
diff --git a/pbs-client/src/pxar/create.rs b/pbs-client/src/pxar/create.rs
index 2577cf98..4a844599 100644
--- a/pbs-client/src/pxar/create.rs
+++ b/pbs-client/src/pxar/create.rs
@@ -21,7 +21,6 @@ use pxar::Metadata;
 
 use proxmox_io::vec;
 use proxmox_lang::c_str;
-use proxmox_sys::error::SysError;
 use proxmox_sys::fs::{self, acl, xattr};
 
 use pbs_datastore::catalog::BackupCatalogWriter;
@@ -420,7 +419,7 @@ impl Archiver {
         for file in dir.iter() {
             let file = file?;
 
-            let file_name = file.file_name().to_owned();
+            let file_name = file.file_name();
             let file_name_bytes = file_name.to_bytes();
             if file_name_bytes == b"." || file_name_bytes == b".." {
                 continue;
@@ -434,25 +433,57 @@ impl Archiver {
             assert_single_path_component(os_file_name)?;
             let full_path = self.path.join(os_file_name);
 
-            let stat = match nix::sys::stat::fstatat(
-                dir_fd,
-                file_name.as_c_str(),
-                nix::fcntl::AtFlags::AT_SYMLINK_NOFOLLOW,
-            ) {
-                Ok(stat) => stat,
-                Err(ref err) if err.not_found() => continue,
-                Err(err) => return Err(err).context(format!("stat failed on {:?}", full_path)),
-            };
-
             let match_path = PathBuf::from("/").join(full_path.clone());
-            if self
+
+            let mut stat_results: Option<FileStat> = None;
+
+            let stat = self
                 .patterns
-                .matches(match_path.as_os_str().as_bytes(), Some(stat.st_mode))
-                == Some(MatchType::Exclude)
-            {
+                .matches(
+                    match_path.as_os_str().as_bytes(),
+                    || match nix::sys::stat::fstatat(
+                        dir_fd,
+                        file_name.to_owned().as_c_str(),
+                        nix::fcntl::AtFlags::AT_SYMLINK_NOFOLLOW,
+                    ) {
+                        Ok(stat) => {
+                            stat_results = Some(stat);
+                            Ok(stat.st_mode)
+                        }
+                        Err(e) => Err(e),
+                    },
+                )
+                .map_err(|err| {
+                    anyhow::Error::new(err).context(format!(
+                        "stat failed on {:?} with {:?}",
+                        full_path,
+                        err.desc()
+                    ))
+                })?;
+
+            if stat == Some(MatchType::Exclude) {
                 continue;
             }
 
+            if stat_results.is_none() {
+                match nix::sys::stat::fstatat(
+                    dir_fd,
+                    file_name.to_owned().as_c_str(),
+                    nix::fcntl::AtFlags::AT_SYMLINK_NOFOLLOW,
+                ) {
+                    Ok(stat) => {
+                        stat_results = Some(stat);
+                    }
+                    Err(err) => {
+                        return Err(err).context(format!(
+                            "stat failed on {:?} with {:?}",
+                            full_path,
+                            err.desc()
+                        ));
+                    }
+                }
+            }
+
             self.entry_counter += 1;
             if self.entry_counter > self.entry_limit {
                 bail!(
@@ -462,9 +493,9 @@ impl Archiver {
             }
 
             file_list.push(FileListEntry {
-                name: file_name,
+                name: file_name.to_owned(),
                 path: full_path,
-                stat,
+                stat: stat_results.unwrap(),
             });
         }
 
@@ -534,7 +565,7 @@ impl Archiver {
         if self
             .patterns
             .matches(match_path.as_os_str().as_bytes(), Some(stat.st_mode))
-            == Some(MatchType::Exclude)
+            == Ok(Some(MatchType::Exclude))
         {
             return Ok(());
         }
diff --git a/pbs-client/src/pxar/extract.rs b/pbs-client/src/pxar/extract.rs
index 4dbaf52d..e08af3c0 100644
--- a/pbs-client/src/pxar/extract.rs
+++ b/pbs-client/src/pxar/extract.rs
@@ -244,16 +244,17 @@ where
         );
 
         let did_match = match match_result {
-            Some(MatchType::Include) => true,
-            Some(MatchType::Exclude) => false,
-            None => self.state.current_match,
+            Ok(Some(MatchType::Include)) => true,
+            Ok(Some(MatchType::Exclude)) => false,
+            _ => self.state.current_match,
         };
 
         let extract_res = match (did_match, entry.kind()) {
             (_, EntryKind::Directory) => {
                 self.callback(entry.path());
 
-                let create = self.state.current_match && match_result != Some(MatchType::Exclude);
+                let create =
+                    self.state.current_match && match_result != Ok(Some(MatchType::Exclude));
                 let res = self
                     .extractor
                     .enter_directory(file_name_os.to_owned(), metadata.clone(), create)
diff --git a/pbs-datastore/src/catalog.rs b/pbs-datastore/src/catalog.rs
index 11c14b64..86e20c92 100644
--- a/pbs-datastore/src/catalog.rs
+++ b/pbs-datastore/src/catalog.rs
@@ -678,9 +678,9 @@ impl<R: Read + Seek> CatalogReader<R> {
             }
             file_path.extend(&e.name);
             match match_list.matches(&file_path, e.get_file_mode()) {
-                Some(MatchType::Exclude) => continue,
-                Some(MatchType::Include) => callback(file_path)?,
-                None => (),
+                Ok(Some(MatchType::Exclude)) => continue,
+                Ok(Some(MatchType::Include)) => callback(file_path)?,
+                _ => (),
             }
             if is_dir {
                 self.find(&e, file_path, match_list, callback)?;
-- 
2.39.2





^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [pbs-devel] [PATCH pathpatterns v3] match_list: updated `matches()`, to only retrieve file mode if necessary
  2023-08-18 14:32 [pbs-devel] [PATCH pathpatterns v3] match_list: updated `matches()`, to only retrieve file mode if necessary Gabriel Goller
  2023-08-18 14:32 ` [pbs-devel] [PATCH proxmox-backup v5] fix #4380: check if file is excluded before running `stat()` Gabriel Goller
@ 2023-08-21  7:57 ` Wolfgang Bumiller
  1 sibling, 0 replies; 7+ messages in thread
From: Wolfgang Bumiller @ 2023-08-21  7:57 UTC (permalink / raw)
  To: Gabriel Goller; +Cc: pbs-devel

On Fri, Aug 18, 2023 at 04:32:11PM +0200, Gabriel Goller wrote:
> Updated `matches()` function, which now takes the `GetFileMode` trait, that
> has a function that should return the `file_mode`. `matches()` will go
> through the patterns and match by path only until it finds a pattern which
> does not have `MatchFlag::ANY_FILE_TYPE`, in case it will call the
> `get_file_mode`, which will return a `file_mode`. This will ensure that the
> `get()` (which in our case executes `stat()`) will only be run once(at most),
> every pattern will only be processed once and that the order of the patterns
> will be respected.
> 
> Signed-off-by: Gabriel Goller <g.goller@proxmox.com>
> ---
> 
> changes v2:
>  - updated the `matches_path()` function to take a closure and renamed
>    it to `matches_mode_lazy()`. This allows us to only match once, so we
>    don't need to match before and after `stat()` (without and with
>    `file_mode`) anymore.
> 
> changes v1:
>  - added `matches_path()` function, which matches by path only and returns an
>    error if the `file_mode` is required.
> 
>  src/lib.rs        |  72 +++++-----
>  src/match_list.rs | 337 ++++++++++++++++++++++++++++++++++++++--------
>  2 files changed, 315 insertions(+), 94 deletions(-)
> 
> diff --git a/src/lib.rs b/src/lib.rs
> index cb89917..998cdae 100644
> --- a/src/lib.rs
> +++ b/src/lib.rs
> @@ -31,17 +31,17 @@
>  //!     MatchEntry::include(Pattern::path("bananas/curved.*")?),
>  //! ];
>  //!
> -//! assert_eq!(list.matches("/things", None), None);
> -//! assert_eq!(list.matches("/things/shop", None), Some(MatchType::Include));
> -//! assert_eq!(list.matches("/things/shop/bananas", None), Some(MatchType::Exclude));
> -//! assert_eq!(list.matches("/things/shop/bananas/curved.txt", None), Some(MatchType::Include));
> -//! assert_eq!(list.matches("/things/shop/bananas/curved.bak", None), Some(MatchType::Include));
> +//! assert_eq!(list.matches("/things", None), Ok(None));
> +//! assert_eq!(list.matches("/things/shop", None), Ok(Some(MatchType::Include)));
> +//! assert_eq!(list.matches("/things/shop/bananas", None), Ok(Some(MatchType::Exclude)));
> +//! assert_eq!(list.matches("/things/shop/bananas/curved.txt", None), Ok(Some(MatchType::Include)));
> +//! assert_eq!(list.matches("/things/shop/bananas/curved.bak", None), Ok(Some(MatchType::Include)));
>  //!
>  //! // this will exclude the curved.bak file
>  //! list.push(MatchEntry::exclude(Pattern::path("curved.bak")?));
> -//! assert_eq!(list.matches("/things/shop/bananas/curved.bak", None), Some(MatchType::Exclude));
> +//! assert_eq!(list.matches("/things/shop/bananas/curved.bak", None), Ok(Some(MatchType::Exclude)));
>  //! list.pop();
> -//! # assert_eq!(list.matches("/things/shop/bananas/curved.bak", None), Some(MatchType::Include));
> +//! # assert_eq!(list.matches("/things/shop/bananas/curved.bak", None), Ok(Some(MatchType::Include)));
>  //!
>  //! // but this will not:
>  //! list.push(
> @@ -49,40 +49,40 @@
>  //!         .flags(MatchFlag::ANCHORED)
>  //! );
>  //! // or: list.push
> -//! assert_eq!(list.matches("/things/shop/bananas/curved.bak", None), Some(MatchType::Include));
> +//! assert_eq!(list.matches("/things/shop/bananas/curved.bak", None), Ok(Some(MatchType::Include)));
>  //! list.pop();
>  //!
>  //! // let's check some patterns, anything starting with a 'c', 'f' or 's':
>  //! let mut list = vec![MatchEntry::include(Pattern::path("[cfs]*")?)];
> -//! assert_eq!(list.matches("/things", None), None);
> -//! assert_eq!(list.matches("/things/file1.dat", None), Some(MatchType::Include));
> -//! assert_eq!(list.matches("/things/file2.dat", None), Some(MatchType::Include));
> -//! assert_eq!(list.matches("/things/shop", None), Some(MatchType::Include));
> -//! assert_eq!(list.matches("/things/shop/info.txt", None), None);
> -//! assert_eq!(list.matches("/things/shop/apples", None), None);
> -//! assert_eq!(list.matches("/things/shop/apples/gala.txt", None), None);
> -//! assert_eq!(list.matches("/things/shop/apples/golden-delicious.txt", None), None);
> -//! assert_eq!(list.matches("/things/shop/bananas", None), None);
> -//! assert_eq!(list.matches("/things/shop/bananas/straight.txt", None), Some(MatchType::Include));
> -//! assert_eq!(list.matches("/things/shop/bananas/curved.txt", None), Some(MatchType::Include));
> -//! assert_eq!(list.matches("/shop/bananas/curved.bak", None), Some(MatchType::Include));
> -//! assert_eq!(list.matches("/things/shop/bananas/other.txt", None), None);
> +//! assert_eq!(list.matches("/things", None), Ok(None));
> +//! assert_eq!(list.matches("/things/file1.dat", None), Ok(Some(MatchType::Include)));
> +//! assert_eq!(list.matches("/things/file2.dat", None), Ok(Some(MatchType::Include)));
> +//! assert_eq!(list.matches("/things/shop", None), Ok(Some(MatchType::Include)));
> +//! assert_eq!(list.matches("/things/shop/info.txt", None), Ok(None));
> +//! assert_eq!(list.matches("/things/shop/apples", None), Ok(None));
> +//! assert_eq!(list.matches("/things/shop/apples/gala.txt", None), Ok(None));
> +//! assert_eq!(list.matches("/things/shop/apples/golden-delicious.txt", None), Ok(None));
> +//! assert_eq!(list.matches("/things/shop/bananas", None), Ok(None));
> +//! assert_eq!(list.matches("/things/shop/bananas/straight.txt", None), Ok(Some(MatchType::Include)));
> +//! assert_eq!(list.matches("/things/shop/bananas/curved.txt", None), Ok(Some(MatchType::Include)));
> +//! assert_eq!(list.matches("/shop/bananas/curved.bak", None), Ok(Some(MatchType::Include)));
> +//! assert_eq!(list.matches("/things/shop/bananas/other.txt", None), Ok(None));
>  //!
>  //! // If we add `**` we end up including the entire `shop/` subtree:
>  //! list.push(MatchEntry::include(Pattern::path("[cfs]*/**")?));
> -//! assert_eq!(list.matches("/things", None), None);
> -//! assert_eq!(list.matches("/things/file1.dat", None), Some(MatchType::Include));
> -//! assert_eq!(list.matches("/things/file2.dat", None), Some(MatchType::Include));
> -//! assert_eq!(list.matches("/things/shop", None), Some(MatchType::Include));
> -//! assert_eq!(list.matches("/things/shop/info.txt", None), Some(MatchType::Include));
> -//! assert_eq!(list.matches("/things/shop/apples", None), Some(MatchType::Include));
> -//! assert_eq!(list.matches("/things/shop/apples/gala.txt", None), Some(MatchType::Include));
> -//! assert_eq!(list.matches("/shop/apples/golden-delicious.txt", None), Some(MatchType::Include));
> -//! assert_eq!(list.matches("/things/shop/bananas", None), Some(MatchType::Include));
> -//! assert_eq!(list.matches("/things/shop/bananas/straight.txt", None), Some(MatchType::Include));
> -//! assert_eq!(list.matches("/things/shop/bananas/curved.txt", None), Some(MatchType::Include));
> -//! assert_eq!(list.matches("/shop/bananas/curved.bak", None), Some(MatchType::Include));
> -//! assert_eq!(list.matches("/shop/bananas/other.txt", None), Some(MatchType::Include));
> +//! assert_eq!(list.matches("/things", None), Ok(None));
> +//! assert_eq!(list.matches("/things/file1.dat", None), Ok(Some(MatchType::Include)));
> +//! assert_eq!(list.matches("/things/file2.dat", None), Ok(Some(MatchType::Include)));
> +//! assert_eq!(list.matches("/things/shop", None), Ok(Some(MatchType::Include)));
> +//! assert_eq!(list.matches("/things/shop/info.txt", None), Ok(Some(MatchType::Include)));
> +//! assert_eq!(list.matches("/things/shop/apples", None), Ok(Some(MatchType::Include)));
> +//! assert_eq!(list.matches("/things/shop/apples/gala.txt", None), Ok(Some(MatchType::Include)));
> +//! assert_eq!(list.matches("/shop/apples/golden-delicious.txt", None), Ok(Some(MatchType::Include)));
> +//! assert_eq!(list.matches("/things/shop/bananas", None), Ok(Some(MatchType::Include)));
> +//! assert_eq!(list.matches("/things/shop/bananas/straight.txt", None), Ok(Some(MatchType::Include)));
> +//! assert_eq!(list.matches("/things/shop/bananas/curved.txt", None), Ok(Some(MatchType::Include)));
> +//! assert_eq!(list.matches("/shop/bananas/curved.bak", None), Ok(Some(MatchType::Include)));
> +//! assert_eq!(list.matches("/shop/bananas/other.txt", None), Ok(Some(MatchType::Include)));
>  //! # Ok(())
>  //! # }
>  //! # test().unwrap()
> @@ -92,7 +92,9 @@ mod match_list;
>  mod pattern;
>  
>  #[doc(inline)]
> -pub use match_list::{MatchEntry, MatchFlag, MatchList, MatchPattern, MatchType};
> +pub use match_list::{
> +    FileModeRequired, GetFileMode, MatchEntry, MatchFlag, MatchList, MatchPattern, MatchType,
> +};
>  
>  #[doc(inline)]
>  pub use pattern::{ParseError, Pattern, PatternFlag};
> diff --git a/src/match_list.rs b/src/match_list.rs
> index c5b14e0..51aa65d 100644
> --- a/src/match_list.rs
> +++ b/src/match_list.rs
> @@ -1,5 +1,4 @@
>  //! Helpers for include/exclude lists.
> -
>  use bitflags::bitflags;
>  
>  use crate::PatternFlag;
> @@ -39,6 +38,37 @@ impl Default for MatchFlag {
>      }
>  }
>  
> +#[derive(Debug, PartialEq)]

Aaaah shoot, can't put the `PartialEq` behind `cfg(test)` due to
doctests :(

Well, those doctests look ugly in the docs anyway, so I suppose a future
todo-item would be to move those into regular tests and guard this
derive.
But for now this is fine (definitely out of scope of this series)!

> +pub struct FileModeRequired {}

^ Why does this have an empty body now? We don't need that, please drop
it :-)

> +impl std::fmt::Display for FileModeRequired {
> +    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
> +        write!(f, "File mode is required for matching")
> +    }
> +}
> +impl std::error::Error for FileModeRequired {}
> +
> +pub trait GetFileMode {
> +    type Error;
> +    fn get(self) -> Result<u32, Self::Error>;
> +}
> +
> +impl<T, E> GetFileMode for T
> +where
> +    T: FnOnce() -> Result<u32, E>,
> +{
> +    type Error = E;
> +    fn get(self) -> Result<u32, Self::Error> {
> +        self()
> +    }
> +}
> +
> +impl GetFileMode for Option<u32> {
> +    type Error = FileModeRequired;
> +    fn get(self) -> Result<u32, Self::Error> {
> +        self.ok_or(FileModeRequired {})
> +    }
> +}
> +
>  /// A pattern entry. (Glob patterns or literal patterns.)
>  // Note:
>  // For regex we'd likely use the POSIX extended REs via `regexec(3)`, since we're targetting
> @@ -211,7 +241,7 @@ impl MatchEntry {
>      pub fn matches_mode(&self, file_mode: u32) -> bool {
>          // bitflags' `.contains` means ALL bits must be set, if they are all set we don't
>          // need to check the mode...
> -        if self.flags.contains(MatchFlag::ANY_FILE_TYPE) {
> +        if !self.needs_file_mode() {
>              return true;
>          }
>  
> @@ -304,12 +334,21 @@ impl MatchEntry {
>  
>          self.matches_path_exact(path)
>      }
> +
> +    pub fn needs_file_mode(&self) -> bool {
> +        let flags = (self.flags & MatchFlag::ANY_FILE_TYPE).bits();
> +        if flags != 0 && flags != MatchFlag::ANY_FILE_TYPE.bits() {
> +            return true;
> +        }
> +        false
> +    }
>  }
>  
>  #[doc(hidden)]
>  pub trait MatchListEntry {
>      fn entry_matches(&self, path: &[u8], file_mode: Option<u32>) -> Option<MatchType>;
>      fn entry_matches_exact(&self, path: &[u8], file_mode: Option<u32>) -> Option<MatchType>;
> +    fn entry_needs_file_mode(&self) -> bool;
>  }
>  
>  impl MatchListEntry for &'_ MatchEntry {
> @@ -328,6 +367,10 @@ impl MatchListEntry for &'_ MatchEntry {
>              None
>          }
>      }
> +
> +    fn entry_needs_file_mode(&self) -> bool {
> +        self.needs_file_mode()
> +    }
>  }
>  
>  impl MatchListEntry for &'_ &'_ MatchEntry {
> @@ -346,6 +389,10 @@ impl MatchListEntry for &'_ &'_ MatchEntry {
>              None
>          }
>      }
> +
> +    fn entry_needs_file_mode(&self) -> bool {
> +        self.needs_file_mode()
> +    }
>  }
>  
>  /// This provides [`matches`](MatchList::matches) and [`matches_exact`](MatchList::matches_exact)
> @@ -361,19 +408,26 @@ impl MatchListEntry for &'_ &'_ MatchEntry {
>  /// This makes it easier to use slices over entries or references to entries.
>  pub trait MatchList {
>      /// Check whether this list contains anything matching a prefix of the specified path, and the
> -    /// specified file mode.
> -    fn matches<T: AsRef<[u8]>>(&self, path: T, file_mode: Option<u32>) -> Option<MatchType> {
> -        self.matches_do(path.as_ref(), file_mode)
> -    }
> -
> -    fn matches_do(&self, path: &[u8], file_mode: Option<u32>) -> Option<MatchType>;
> -
> -    /// Check whether this list contains anything exactly matching the path and mode.
> -    fn matches_exact<T: AsRef<[u8]>>(&self, path: T, file_mode: Option<u32>) -> Option<MatchType> {
> -        self.matches_exact_do(path.as_ref(), file_mode)
> -    }
> -
> -    fn matches_exact_do(&self, path: &[u8], file_mode: Option<u32>) -> Option<MatchType>;
> +    /// specified file mode. Gets the file_mode lazily, only if needed.
> +    fn matches<T, U>(
> +        &self,
> +        path: T,
> +        get_file_mode: U,
> +    ) -> Result<Option<MatchType>, <U as GetFileMode>::Error>

Just use `U::Error`, formats cleaner.

> +    where
> +        T: AsRef<[u8]>,
> +        U: GetFileMode;
> +
> +    /// Check whether this list contains anything exactly matching the path and mode. Gets the
> +    /// file_mode lazily, only if needed.
> +    fn matches_exact<T, U>(
> +        &self,
> +        path: T,
> +        get_file_mode: U,
> +    ) -> Result<Option<MatchType>, <U as GetFileMode>::Error>
> +    where
> +        T: AsRef<[u8]>,
> +        U: GetFileMode;
>  }
>  
>  impl<'a, T> MatchList for T
> @@ -383,30 +437,56 @@ where
>      <&'a T as IntoIterator>::IntoIter: DoubleEndedIterator,
>      <&'a T as IntoIterator>::Item: MatchListEntry,
>  {
> -    fn matches_do(&self, path: &[u8], file_mode: Option<u32>) -> Option<MatchType> {
> +    fn matches<U, G>(
> +        &self,
> +        path: U,
> +        get_file_mode: G,
> +    ) -> Result<Option<MatchType>, <G as GetFileMode>::Error>
> +    where
> +        U: AsRef<[u8]>,
> +        G: GetFileMode,
> +    {
>          // This is an &self method on a `T where T: 'a`.
>          let this: &'a Self = unsafe { std::mem::transmute(self) };
>  
> +        let mut get_file_mode = Some(get_file_mode);
> +        let mut file_mode = None;
> +
>          for m in this.into_iter().rev() {
> -            if let Some(mt) = m.entry_matches(path, file_mode) {
> -                return Some(mt);
> +            if file_mode.is_none() && m.entry_needs_file_mode() {
> +                file_mode = Some(get_file_mode.take().unwrap().get()?);
> +            }
> +            if let Some(mt) = m.entry_matches(path.as_ref(), file_mode) {
> +                return Ok(Some(mt));
>              }
>          }
> -
> -        None
> -    }
> -
> -    fn matches_exact_do(&self, path: &[u8], file_mode: Option<u32>) -> Option<MatchType> {
> +        Ok(None)
> +    }
> +
> +    fn matches_exact<U, G>(
> +        &self,
> +        path: U,
> +        get_file_mode: G,
> +    ) -> Result<Option<MatchType>, <G as GetFileMode>::Error>
> +    where
> +        U: AsRef<[u8]>,
> +        G: GetFileMode,
> +    {
>          // This is an &self method on a `T where T: 'a`.
>          let this: &'a Self = unsafe { std::mem::transmute(self) };
>  
> +        let mut get_file_mode = Some(get_file_mode);
> +        let mut file_mode = None;
> +
>          for m in this.into_iter().rev() {
> -            if let Some(mt) = m.entry_matches_exact(path, file_mode) {
> -                return Some(mt);
> +            if file_mode.is_none() && m.entry_needs_file_mode() {
> +                file_mode = Some(get_file_mode.take().unwrap().get()?);
> +            }
> +            if let Some(mt) = m.entry_matches_exact(path.as_ref(), file_mode) {
> +                return Ok(Some(mt));
>              }
>          }
> -
> -        None
> +        Ok(None)
>      }
>  }
>  
> @@ -415,20 +495,20 @@ fn assert_containers_implement_match_list() {
>      use std::iter::FromIterator;
>  
>      let vec = vec![MatchEntry::include(crate::Pattern::path("a*").unwrap())];
> -    assert_eq!(vec.matches("asdf", None), Some(MatchType::Include));
> +    assert_eq!(vec.matches("asdf", None), Ok(Some(MatchType::Include)));
>  
>      // FIXME: ideally we can make this work as well!
>      let vd = std::collections::VecDeque::<MatchEntry>::from_iter(vec.clone());
> -    assert_eq!(vd.matches("asdf", None), Some(MatchType::Include));
> +    assert_eq!(vd.matches("asdf", None), Ok(Some(MatchType::Include)));
>  
>      let list: &[MatchEntry] = &vec[..];
> -    assert_eq!(list.matches("asdf", None), Some(MatchType::Include));
> +    assert_eq!(list.matches("asdf", None), Ok(Some(MatchType::Include)));
>  
>      let list: Vec<&MatchEntry> = vec.iter().collect();
> -    assert_eq!(list.matches("asdf", None), Some(MatchType::Include));
> +    assert_eq!(list.matches("asdf", None), Ok(Some(MatchType::Include)));
>  
>      let list: &[&MatchEntry] = &list[..];
> -    assert_eq!(list.matches("asdf", None), Some(MatchType::Include));
> +    assert_eq!(list.matches("asdf", None), Ok(Some(MatchType::Include)));
>  }
>  
>  #[test]
> @@ -443,25 +523,28 @@ fn test_file_type_matches() {
>      ];
>      assert_eq!(
>          matchlist.matches("a_dir", Some(libc::S_IFDIR)),
> -        Some(MatchType::Include)
> +        Ok(Some(MatchType::Include))
>      );
>      assert_eq!(
>          matchlist.matches("/a_dir", Some(libc::S_IFDIR)),
> -        Some(MatchType::Include)
> +        Ok(Some(MatchType::Include))
>      );
> -    assert_eq!(matchlist.matches("/a_dir", Some(libc::S_IFREG)), None);
> +    assert_eq!(matchlist.matches("/a_dir", Some(libc::S_IFREG)), Ok(None));
>  
>      assert_eq!(
>          matchlist.matches("/a_file", Some(libc::S_IFREG)),
> -        Some(MatchType::Exclude)
> +        Ok(Some(MatchType::Exclude))
>      );
> -    assert_eq!(matchlist.matches("/a_file", Some(libc::S_IFDIR)), None);
> +    assert_eq!(matchlist.matches("/a_file", Some(libc::S_IFDIR)), Ok(None));
>  
>      assert_eq!(
>          matchlist.matches("/another_dir", Some(libc::S_IFDIR)),
> -        Some(MatchType::Exclude)
> +        Ok(Some(MatchType::Exclude))
> +    );
> +    assert_eq!(
> +        matchlist.matches("/another_dir", Some(libc::S_IFREG)),
> +        Ok(None)
>      );
> -    assert_eq!(matchlist.matches("/another_dir", Some(libc::S_IFREG)), None);
>  }
>  
>  #[test]
> @@ -471,22 +554,25 @@ fn test_anchored_matches() {
>      let matchlist = vec![
>          MatchEntry::new(Pattern::path("file-a").unwrap(), MatchType::Include),
>          MatchEntry::new(Pattern::path("some/path").unwrap(), MatchType::Include)
> -            .flags(MatchFlag::ANCHORED),
> +            .add_flags(MatchFlag::ANCHORED),
>      ];
>  
> -    assert_eq!(matchlist.matches("file-a", None), Some(MatchType::Include));
> +    assert_eq!(
> +        matchlist.matches("file-a", None),
> +        Ok(Some(MatchType::Include))
> +    );
>      assert_eq!(
>          matchlist.matches("another/file-a", None),
> -        Some(MatchType::Include)
> +        Ok(Some(MatchType::Include))
>      );
>  
> -    assert_eq!(matchlist.matches("some", None), None);
> -    assert_eq!(matchlist.matches("path", None), None);
> +    assert_eq!(matchlist.matches("some", None), Ok(None));
> +    assert_eq!(matchlist.matches("path", None), Ok(None));
>      assert_eq!(
>          matchlist.matches("some/path", None),
> -        Some(MatchType::Include)
> +        Ok(Some(MatchType::Include))
>      );
> -    assert_eq!(matchlist.matches("another/some/path", None), None);
> +    assert_eq!(matchlist.matches("another/some/path", None), Ok(None));
>  }
>  
>  #[test]
> @@ -495,7 +581,10 @@ fn test_literal_matches() {
>          MatchPattern::Literal(b"/bin/mv".to_vec()),
>          MatchType::Include,
>      )];
> -    assert_eq!(matchlist.matches("/bin/mv", None), Some(MatchType::Include));
> +    assert_eq!(
> +        matchlist.matches("/bin/mv", None),
> +        Ok(Some(MatchType::Include))
> +    );
>  }
>  
>  #[test]
> @@ -504,29 +593,159 @@ fn test_path_relativity() {
>      let matchlist = vec![
>          MatchEntry::new(Pattern::path("noslash").unwrap(), MatchType::Include),
>          MatchEntry::new(Pattern::path("noslash-a").unwrap(), MatchType::Include)
> -            .flags(MatchFlag::ANCHORED),
> +            .add_flags(MatchFlag::ANCHORED),
>          MatchEntry::new(Pattern::path("/slash").unwrap(), MatchType::Include),
>          MatchEntry::new(Pattern::path("/slash-a").unwrap(), MatchType::Include)
> -            .flags(MatchFlag::ANCHORED),
> +            .add_flags(MatchFlag::ANCHORED),
>      ];
> -    assert_eq!(matchlist.matches("noslash", None), Some(MatchType::Include));
> +    assert_eq!(
> +        matchlist.matches("noslash", None),
> +        Ok(Some(MatchType::Include))
> +    );
>      assert_eq!(
>          matchlist.matches("noslash-a", None),
> -        Some(MatchType::Include)
> +        Ok(Some(MatchType::Include))
>      );
> -    assert_eq!(matchlist.matches("slash", None), None);
> -    assert_eq!(matchlist.matches("/slash", None), Some(MatchType::Include));
> -    assert_eq!(matchlist.matches("slash-a", None), None);
> +    assert_eq!(matchlist.matches("slash", None), Ok(None));
> +    assert_eq!(
> +        matchlist.matches("/slash", None),
> +        Ok(Some(MatchType::Include))
> +    );
> +    assert_eq!(matchlist.matches("slash-a", None), Ok(None));
>      assert_eq!(
>          matchlist.matches("/slash-a", None),
> -        Some(MatchType::Include)
> +        Ok(Some(MatchType::Include))
>      );
>  
>      assert_eq!(
>          matchlist.matches("foo/noslash", None),
> -        Some(MatchType::Include)
> +        Ok(Some(MatchType::Include))
>      );
> -    assert_eq!(matchlist.matches("foo/noslash-a", None), None);
> -    assert_eq!(matchlist.matches("foo/slash", None), None);
> -    assert_eq!(matchlist.matches("foo/slash-a", None), None);
> +    assert_eq!(matchlist.matches("foo/noslash-a", None), Ok(None));
> +    assert_eq!(matchlist.matches("foo/slash", None), Ok(None));
> +    assert_eq!(matchlist.matches("foo/slash-a", None), Ok(None));
> +}
> +
> +#[test]
> +fn matches_path() {
> +    use crate::Pattern;
> +
> +    let matchlist = vec![
> +        MatchEntry::new(Pattern::path("a*").unwrap(), MatchType::Exclude),
> +        MatchEntry::new(Pattern::path("b*").unwrap(), MatchType::Exclude),
> +    ];
> +
> +    assert_eq!(
> +        matchlist.matches("ahsjdj", || Err(FileModeRequired {})),
> +        Ok(Some(MatchType::Exclude))
> +    );
> +    let mut _test = MatchFlag::ANY_FILE_TYPE;

^ Not sure what the point of this is. If it's only to test the
assignment/mutability, maybe don't use `MatchFlag` for the contents,
that's a bit confusing/misleading, and maybe include a test that would
actually trigger the access as well and also actually validate the final
value.

> +    assert_eq!(
> +        matchlist.matches("bhshdf", || {
> +            _test = MatchFlag::MATCH_DIRECTORIES;
> +            Ok::<u32, FileModeRequired>(libc::S_IFDIR)
> +        }),
> +        Ok(Some(MatchType::Exclude))
> +    );
> +
> +    let matchlist = vec![
> +        MatchEntry::new(Pattern::path("a*").unwrap(), MatchType::Exclude)
> +            .flags(MatchFlag::MATCH_DIRECTORIES),
> +        MatchEntry::new(Pattern::path("b*").unwrap(), MatchType::Exclude),
> +    ];
> +
> +    assert_eq!(
> +        matchlist.matches("ahsjdj", || Err(FileModeRequired {})),
> +        Err(FileModeRequired {})
> +    );
> +    assert_eq!(
> +        matchlist.matches("bhshdf", || Err(FileModeRequired {})),
> +        Ok(Some(MatchType::Exclude))
> +    );
> +    assert_eq!(
> +        matchlist.matches("ahsjdj", || Ok::<u32, FileModeRequired>(libc::S_IFDIR)),
> +        Ok(Some(MatchType::Exclude))
> +    );

Could also have a case with a different mode here.

> +
> +    let matchlist = vec![
> +        MatchEntry::new(Pattern::path("b*").unwrap(), MatchType::Include),
> +        MatchEntry::new(Pattern::path("a*").unwrap(), MatchType::Exclude)
> +            .flags(MatchFlag::MATCH_DIRECTORIES),
> +    ];
> +
> +    assert_eq!(
> +        matchlist.matches("ahsjdj", || Err(FileModeRequired {})),
> +        Err(FileModeRequired {})
> +    );
> +    assert_eq!(
> +        matchlist.matches("bhshdf", || Err(FileModeRequired {})),
> +        Err(FileModeRequired {})
> +    );
> +
> +    assert_eq!(
> +        matchlist.matches("ahsjdj", || Ok::<u32, FileModeRequired>(libc::S_IFDIR)),
> +        Ok(Some(MatchType::Exclude))
> +    );
> +    assert_eq!(
> +        matchlist.matches("bhshdf", || Err(FileModeRequired {})),
> +        Err(FileModeRequired {})
> +    );

^ This case is a duplicate.

> +    assert_eq!(
> +        matchlist.matches("bbb", || Ok::<u32, FileModeRequired>(libc::S_IFDIR)),
> +        Ok(Some(MatchType::Include))
> +    );
> +
> +    let matchlist = vec![
> +        MatchEntry::new(Pattern::path("a*").unwrap(), MatchType::Exclude)
> +            .flags(MatchFlag::MATCH_DIRECTORIES),
> +    ];
> +
> +    assert_eq!(
> +        matchlist.matches("bbb", || Ok::<u32, FileModeRequired>(libc::S_IFDIR)),
> +        Ok(None)
> +    );
> +
> +    let matchlist = vec![
> +        MatchEntry::new(Pattern::path("a*").unwrap(), MatchType::Exclude)
> +            .flags(MatchFlag::MATCH_DIRECTORIES),
> +        MatchEntry::new(Pattern::path("b*").unwrap(), MatchType::Exclude)
> +            .flags(MatchFlag::MATCH_REGULAR_FILES),
> +    ];
> +
> +    assert_eq!(
> +        matchlist.matches("ahsjdj", || Ok::<u32, FileModeRequired>(libc::S_IFDIR)),
> +        Ok(Some(MatchType::Exclude))
> +    );
> +    assert_eq!(
> +        matchlist.matches("ahsjdj", || Ok::<u32, FileModeRequired>(libc::S_IFREG)),
> +        Ok(None)
> +    );
> +    assert_eq!(
> +        matchlist.matches("bhsjdj", || Ok::<u32, FileModeRequired>(libc::S_IFREG)),
> +        Ok(Some(MatchType::Exclude))
> +    );
> +}
> +
> +#[test]
> +fn match_entry_needs_flag() {
> +    use crate::Pattern;
> +    let match_entry = MatchEntry::new(Pattern::path("a*").unwrap(), MatchType::Exclude)
> +        .flags(MatchFlag::MATCH_DIRECTORIES);

could replace the `.flags` call here with using `a*/` as a pattern
(trailing slash).

> +    assert_eq!(match_entry.needs_file_mode(), true);
> +
> +    let match_entry = MatchEntry::new(Pattern::path("a*").unwrap(), MatchType::Exclude)
> +        .flags(MatchFlag::MATCH_REGULAR_FILES);
> +    assert_eq!(match_entry.needs_file_mode(), true);
> +
> +    let match_entry = MatchEntry::new(Pattern::path("a*").unwrap(), MatchType::Exclude)
> +        .flags(MatchFlag::MATCH_DIRECTORIES | MatchFlag::MATCH_REGULAR_FILES);
> +    assert_eq!(match_entry.needs_file_mode(), true);
> +
> +    let match_entry = MatchEntry::new(Pattern::path("a*").unwrap(), MatchType::Exclude)
> +        .flags(MatchFlag::ANCHORED | MatchFlag::MATCH_DIRECTORIES);
> +    assert_eq!(match_entry.needs_file_mode(), true);
> +
> +    let match_entry = MatchEntry::new(Pattern::path("a*").unwrap(), MatchType::Exclude)
> +        .flags(MatchFlag::ANY_FILE_TYPE);
> +    assert_eq!(match_entry.needs_file_mode(), false);
>  }
> -- 
> 2.39.2




^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [pbs-devel] [PATCH proxmox-backup v5] fix #4380: check if file is excluded before running `stat()`
  2023-08-18 14:32 ` [pbs-devel] [PATCH proxmox-backup v5] fix #4380: check if file is excluded before running `stat()` Gabriel Goller
@ 2023-08-21  8:41   ` Wolfgang Bumiller
  2023-08-21 11:03     ` Gabriel Goller
  0 siblings, 1 reply; 7+ messages in thread
From: Wolfgang Bumiller @ 2023-08-21  8:41 UTC (permalink / raw)
  To: Gabriel Goller; +Cc: pbs-devel


needs a rebase, and comments inline:

On Fri, Aug 18, 2023 at 04:32:12PM +0200, Gabriel Goller wrote:
> Passed a closure with the `stat()` function call to `matches()`. This
> will traverse through all patterns and try to match using the path only, if a
> `file_mode` is needed, it will run the closure. This means that if we exclude
> a file with the `MatchType::ANY_FILE_TYPE`, we will skip it without running
> `stat()` on it.
> Added `pathpatterns` crate to local overrides in cargo.toml.
> 
> Signed-off-by: Gabriel Goller <g.goller@proxmox.com>
> ---
> 
> changes v4:
>  - match only by path and exclude the matched files, the run `stat()` and
>    match again, this time using the `file_mode`. This will match everything 
>    twice in the worst case, which is not optimal.
> changes v3:
>  - checking for `read` and `execute` permissions before entering directory,
>    doesn't work because there are a lot of side-effects (executed by
>    different user, AppArmor, SELinux, ...).
> changes v2:
>  - checking for excluded files with `matches()` before executing `stat()`,
>    this doesn't work because we get the file_mode from `stat()` and don't
>    want to ignore it when matching.
> 
>  Cargo.toml                      |  1 +
>  pbs-client/src/catalog_shell.rs | 12 +++---
>  pbs-client/src/pxar/create.rs   | 69 ++++++++++++++++++++++++---------
>  pbs-client/src/pxar/extract.rs  |  9 +++--
>  pbs-datastore/src/catalog.rs    |  6 +--
>  5 files changed, 65 insertions(+), 32 deletions(-)
> 
> diff --git a/Cargo.toml b/Cargo.toml
> index 2a05c471..c54bc28b 100644
> --- a/Cargo.toml
> +++ b/Cargo.toml
> @@ -262,6 +262,7 @@ proxmox-rrd.workspace = true
>  
>  #proxmox-apt = { path = "../proxmox-apt" }
>  #proxmox-openid = { path = "../proxmox-openid-rs" }
> +#pathpatterns = {path = "../pathpatterns" }
>  
>  #pxar = { path = "../pxar" }
>  
> diff --git a/pbs-client/src/catalog_shell.rs b/pbs-client/src/catalog_shell.rs
> index 98af5699..89bcc280 100644
> --- a/pbs-client/src/catalog_shell.rs
> +++ b/pbs-client/src/catalog_shell.rs
> @@ -13,7 +13,7 @@ use nix::dir::Dir;
>  use nix::fcntl::OFlag;
>  use nix::sys::stat::Mode;
>  
> -use pathpatterns::{MatchEntry, MatchList, MatchPattern, MatchType, PatternFlag};
> +use pathpatterns::{FileModeRequired, MatchEntry, MatchList, MatchPattern, MatchType, PatternFlag};
>  use proxmox_router::cli::{self, CliCommand, CliCommandMap, CliHelper, CommandLineInterface};
>  use proxmox_schema::api;
>  use proxmox_sys::fs::{create_path, CreateOptions};
> @@ -1108,7 +1108,7 @@ impl<'a> ExtractorState<'a> {
>      async fn handle_new_directory(
>          &mut self,
>          entry: catalog::DirEntry,
> -        match_result: Option<MatchType>,
> +        match_result: Result<Option<MatchType>, FileModeRequired>,

^ This is not needed, let's not take errors with us throughout the
process.

This is our catalog shell. The catalog file always contains file types
for its contents except for hardlinks, since we cannot resolve them.
However, hardlinks also cannot recurse down any further, since they're
also never directories, so `handle_new_directory` will never actually
get `Err()` passed here.

>      ) -> Result<(), Error> {
>          // enter a new directory:
>          self.read_dir_stack.push(mem::replace(
> @@ -1123,7 +1123,7 @@ impl<'a> ExtractorState<'a> {
>          Shell::walk_pxar_archive(self.accessor, &mut self.dir_stack).await?;
>          let dir_pxar = self.dir_stack.last().unwrap().pxar.as_ref().unwrap();
>          let dir_meta = dir_pxar.entry().metadata().clone();
> -        let create = self.matches && match_result != Some(MatchType::Exclude);
> +        let create = self.matches && match_result != Ok(Some(MatchType::Exclude));

^ so no need for this change

>          self.extractor
>              .enter_directory(dir_pxar.file_name().to_os_string(), dir_meta, create)?;
>  
> @@ -1133,9 +1133,9 @@ impl<'a> ExtractorState<'a> {
>      pub async fn handle_entry(&mut self, entry: catalog::DirEntry) -> Result<(), Error> {
>          let match_result = self.match_list.matches(&self.path, entry.get_file_mode());

I'd like to point out that if we don't know what to do, we should fail,
so, just use `?` here.

Since this is an interactive tool, we *may* get away with just logging
this and continuing with the matches you wrote out below here.

But we should not just be silent about it :-)

Personally, I'm fine with erroring and telling the user to investigate.
In practice I doubt anyone will really run into this. The catalog shell
is too awkward to have many users (I think???)

>          let did_match = match match_result {
> -            Some(MatchType::Include) => true,
> -            Some(MatchType::Exclude) => false,
> -            None => self.matches,
> +            Ok(Some(MatchType::Include)) => true,
> +            Ok(Some(MatchType::Exclude)) => false,
> +            _ => self.matches,
>          };
>  
>          match (did_match, &entry.attr) {
> diff --git a/pbs-client/src/pxar/create.rs b/pbs-client/src/pxar/create.rs
> index 2577cf98..4a844599 100644
> --- a/pbs-client/src/pxar/create.rs
> +++ b/pbs-client/src/pxar/create.rs
> @@ -21,7 +21,6 @@ use pxar::Metadata;
>  
>  use proxmox_io::vec;
>  use proxmox_lang::c_str;
> -use proxmox_sys::error::SysError;
>  use proxmox_sys::fs::{self, acl, xattr};
>  
>  use pbs_datastore::catalog::BackupCatalogWriter;
> @@ -420,7 +419,7 @@ impl Archiver {
>          for file in dir.iter() {
>              let file = file?;
>  
> -            let file_name = file.file_name().to_owned();
> +            let file_name = file.file_name();
>              let file_name_bytes = file_name.to_bytes();
>              if file_name_bytes == b"." || file_name_bytes == b".." {
>                  continue;
> @@ -434,25 +433,57 @@ impl Archiver {
>              assert_single_path_component(os_file_name)?;
>              let full_path = self.path.join(os_file_name);
>  
> -            let stat = match nix::sys::stat::fstatat(
> -                dir_fd,
> -                file_name.as_c_str(),
> -                nix::fcntl::AtFlags::AT_SYMLINK_NOFOLLOW,
> -            ) {
> -                Ok(stat) => stat,
> -                Err(ref err) if err.not_found() => continue,
> -                Err(err) => return Err(err).context(format!("stat failed on {:?}", full_path)),
> -            };
> -
>              let match_path = PathBuf::from("/").join(full_path.clone());
> -            if self
> +
> +            let mut stat_results: Option<FileStat> = None;

Please factor the duplicate `fstatat` call out up here into a closure
    let get_file_mode = || fstatat(...);

> +
> +            let stat = self

^ stat is not a good name for this.
This can stay an `if` as before since the condition doesn't need to be
this huge.

>                  .patterns
> -                .matches(match_path.as_os_str().as_bytes(), Some(stat.st_mode))
> -                == Some(MatchType::Exclude)
> -            {
> +                .matches(
> +                    match_path.as_os_str().as_bytes(),
> +                    || match nix::sys::stat::fstatat(
> +                        dir_fd,
> +                        file_name.to_owned().as_c_str(),
> +                        nix::fcntl::AtFlags::AT_SYMLINK_NOFOLLOW,
> +                    ) {
> +                        Ok(stat) => {
> +                            stat_results = Some(stat);
> +                            Ok(stat.st_mode)
> +                        }
> +                        Err(e) => Err(e),
> +                    },
> +                )

^ this block would just be
                .matches(match_path.as_os_str().as_bytes(), || {
                    Ok::<_, Errno>(match &stat_results {
                        Some(result) => result.st_mode,
                        None => stat_results.insert(get_file_mode()?).st_mode,
                    })
                })


> +                .map_err(|err| {
> +                    anyhow::Error::new(err).context(format!(
> +                        "stat failed on {:?} with {:?}",
> +                        full_path,
> +                        err.desc()

When switching to 'context', there's no need to include the error in the
message, it just causes duplicate text in the output.

> +                    ))
> +                })?;

^ We have both `Error` and `Context` in imported. The entire `.map_err`
can be turned into just:

        .with_context(|| format!("stat failed on {full_path:?}"))?

> +
> +            if stat == Some(MatchType::Exclude) {
>                  continue;
>              }
>  
> +            if stat_results.is_none() {
> +                match nix::sys::stat::fstatat(
> +                    dir_fd,
> +                    file_name.to_owned().as_c_str(),
> +                    nix::fcntl::AtFlags::AT_SYMLINK_NOFOLLOW,
> +                ) {
> +                    Ok(stat) => {
> +                        stat_results = Some(stat);
> +                    }
> +                    Err(err) => {
> +                        return Err(err).context(format!(
> +                            "stat failed on {:?} with {:?}",
> +                            full_path,
> +                            err.desc()
> +                        ));
> +                    }
> +                }
> +            }

^ Instead of this large block we then only need:

    let stat = stat_results.map(Ok).unwrap_or_else(get_file_mode)?;

> +
>              self.entry_counter += 1;
>              if self.entry_counter > self.entry_limit {
>                  bail!(
> @@ -462,9 +493,9 @@ impl Archiver {
>              }
>  
>              file_list.push(FileListEntry {
> -                name: file_name,
> +                name: file_name.to_owned(),
>                  path: full_path,
> -                stat,
> +                stat: stat_results.unwrap(),

^ and this change can just be dropped :-)

>              });
>          }
>  
> @@ -534,7 +565,7 @@ impl Archiver {
>          if self
>              .patterns
>              .matches(match_path.as_os_str().as_bytes(), Some(stat.st_mode))
> -            == Some(MatchType::Exclude)
> +            == Ok(Some(MatchType::Exclude))

^ Please don't just ignore errors, just put a `?` onto the call.

On the one hand we know that this cannot error since we pass `Some`, so
we could even unwrap, but that's worse, since there's no guarantee that
no other errors can happen.
We *may* be able to guarantee this by adding a `GetFileMode` impl
in pathpatterns directly to `u32` with `type Error =
std::convert::Infallible`, and then drop the `Some()` wrapping around
this piece?

>          {
>              return Ok(());
>          }
> diff --git a/pbs-client/src/pxar/extract.rs b/pbs-client/src/pxar/extract.rs
> index 4dbaf52d..e08af3c0 100644
> --- a/pbs-client/src/pxar/extract.rs
> +++ b/pbs-client/src/pxar/extract.rs
> @@ -244,16 +244,17 @@ where
>          );
>  
>          let did_match = match match_result {
> -            Some(MatchType::Include) => true,
> -            Some(MatchType::Exclude) => false,
> -            None => self.state.current_match,
> +            Ok(Some(MatchType::Include)) => true,
> +            Ok(Some(MatchType::Exclude)) => false,
> +            _ => self.state.current_match,

Same here, 3 lines above we pass `Some(file_type)` to `matches()`.
We should not just use `_` here, but rather, the above should fail.

>          };
>  
>          let extract_res = match (did_match, entry.kind()) {
>              (_, EntryKind::Directory) => {
>                  self.callback(entry.path());
>  
> -                let create = self.state.current_match && match_result != Some(MatchType::Exclude);
> +                let create =
> +                    self.state.current_match && match_result != Ok(Some(MatchType::Exclude));
>                  let res = self
>                      .extractor
>                      .enter_directory(file_name_os.to_owned(), metadata.clone(), create)
> diff --git a/pbs-datastore/src/catalog.rs b/pbs-datastore/src/catalog.rs
> index 11c14b64..86e20c92 100644
> --- a/pbs-datastore/src/catalog.rs
> +++ b/pbs-datastore/src/catalog.rs
> @@ -678,9 +678,9 @@ impl<R: Read + Seek> CatalogReader<R> {
>              }
>              file_path.extend(&e.name);
>              match match_list.matches(&file_path, e.get_file_mode()) {
> -                Some(MatchType::Exclude) => continue,
> -                Some(MatchType::Include) => callback(file_path)?,
> -                None => (),
> +                Ok(Some(MatchType::Exclude)) => continue,
> +                Ok(Some(MatchType::Include)) => callback(file_path)?,
> +                _ => (),
>              }
>              if is_dir {
>                  self.find(&e, file_path, match_list, callback)?;
> -- 
> 2.39.2




^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [pbs-devel] [PATCH proxmox-backup v5] fix #4380: check if file is excluded before running `stat()`
  2023-08-21  8:41   ` Wolfgang Bumiller
@ 2023-08-21 11:03     ` Gabriel Goller
  2023-08-21 12:18       ` Wolfgang Bumiller
  0 siblings, 1 reply; 7+ messages in thread
From: Gabriel Goller @ 2023-08-21 11:03 UTC (permalink / raw)
  To: Wolfgang Bumiller; +Cc: pbs-devel


On 8/21/23 10:41, Wolfgang Bumiller wrote:
> needs a rebase, and comments inline:
>
> On Fri, Aug 18, 2023 at 04:32:12PM +0200, Gabriel Goller wrote:
>> Passed a closure with the `stat()` function call to `matches()`. This
>> will traverse through all patterns and try to match using the path only, if a
>> `file_mode` is needed, it will run the closure. This means that if we exclude
>> a file with the `MatchType::ANY_FILE_TYPE`, we will skip it without running
>> `stat()` on it.
>> Added `pathpatterns` crate to local overrides in cargo.toml.
>>
>> Signed-off-by: Gabriel Goller <g.goller@proxmox.com>
>> ---
>>
>> changes v4:
>>   - match only by path and exclude the matched files, the run `stat()` and
>>     match again, this time using the `file_mode`. This will match everything
>>     twice in the worst case, which is not optimal.
>> changes v3:
>>   - checking for `read` and `execute` permissions before entering directory,
>>     doesn't work because there are a lot of side-effects (executed by
>>     different user, AppArmor, SELinux, ...).
>> changes v2:
>>   - checking for excluded files with `matches()` before executing `stat()`,
>>     this doesn't work because we get the file_mode from `stat()` and don't
>>     want to ignore it when matching.
>>
>>   Cargo.toml                      |  1 +
>>   pbs-client/src/catalog_shell.rs | 12 +++---
>>   pbs-client/src/pxar/create.rs   | 69 ++++++++++++++++++++++++---------
>>   pbs-client/src/pxar/extract.rs  |  9 +++--
>>   pbs-datastore/src/catalog.rs    |  6 +--
>>   5 files changed, 65 insertions(+), 32 deletions(-)
>>
>> diff --git a/Cargo.toml b/Cargo.toml
>> index 2a05c471..c54bc28b 100644
>> --- a/Cargo.toml
>> +++ b/Cargo.toml
>> @@ -262,6 +262,7 @@ proxmox-rrd.workspace = true
>>   
>>   #proxmox-apt = { path = "../proxmox-apt" }
>>   #proxmox-openid = { path = "../proxmox-openid-rs" }
>> +#pathpatterns = {path = "../pathpatterns" }
>>   
>>   #pxar = { path = "../pxar" }
>>   
>> diff --git a/pbs-client/src/catalog_shell.rs b/pbs-client/src/catalog_shell.rs
>> index 98af5699..89bcc280 100644
>> --- a/pbs-client/src/catalog_shell.rs
>> +++ b/pbs-client/src/catalog_shell.rs
>> @@ -13,7 +13,7 @@ use nix::dir::Dir;
>>   use nix::fcntl::OFlag;
>>   use nix::sys::stat::Mode;
>>   
>> -use pathpatterns::{MatchEntry, MatchList, MatchPattern, MatchType, PatternFlag};
>> +use pathpatterns::{FileModeRequired, MatchEntry, MatchList, MatchPattern, MatchType, PatternFlag};
>>   use proxmox_router::cli::{self, CliCommand, CliCommandMap, CliHelper, CommandLineInterface};
>>   use proxmox_schema::api;
>>   use proxmox_sys::fs::{create_path, CreateOptions};
>> @@ -1108,7 +1108,7 @@ impl<'a> ExtractorState<'a> {
>>       async fn handle_new_directory(
>>           &mut self,
>>           entry: catalog::DirEntry,
>> -        match_result: Option<MatchType>,
>> +        match_result: Result<Option<MatchType>, FileModeRequired>,
> ^ This is not needed, let's not take errors with us throughout the
> process.
>
> This is our catalog shell. The catalog file always contains file types
> for its contents except for hardlinks, since we cannot resolve them.
> However, hardlinks also cannot recurse down any further, since they're
> also never directories, so `handle_new_directory` will never actually
> get `Err()` passed here.
>
>>       ) -> Result<(), Error> {
>>           // enter a new directory:
>>           self.read_dir_stack.push(mem::replace(
>> @@ -1123,7 +1123,7 @@ impl<'a> ExtractorState<'a> {
>>           Shell::walk_pxar_archive(self.accessor, &mut self.dir_stack).await?;
>>           let dir_pxar = self.dir_stack.last().unwrap().pxar.as_ref().unwrap();
>>           let dir_meta = dir_pxar.entry().metadata().clone();
>> -        let create = self.matches && match_result != Some(MatchType::Exclude);
>> +        let create = self.matches && match_result != Ok(Some(MatchType::Exclude));
> ^ so no need for this change
>
>>           self.extractor
>>               .enter_directory(dir_pxar.file_name().to_os_string(), dir_meta, create)?;
>>   
>> @@ -1133,9 +1133,9 @@ impl<'a> ExtractorState<'a> {
>>       pub async fn handle_entry(&mut self, entry: catalog::DirEntry) -> Result<(), Error> {
>>           let match_result = self.match_list.matches(&self.path, entry.get_file_mode());
> I'd like to point out that if we don't know what to do, we should fail,
> so, just use `?` here.
>
> Since this is an interactive tool, we *may* get away with just logging
> this and continuing with the matches you wrote out below here.
>
> But we should not just be silent about it :-)
>
> Personally, I'm fine with erroring and telling the user to investigate.
> In practice I doubt anyone will really run into this. The catalog shell
> is too awkward to have many users (I think???)
How about we introduce this log message, then use the question mark 
operator?

     match (did_match, &entry.attr) {
         (_, DirEntryAttribute::Directory { .. }) => {
             if match_result.is_err() {
                 log::error!("error while matching file paths");
             }
             self.handle_new_directory(entry, match_result?).await?;
         }
         ...
     }
>>           let did_match = match match_result {
>> -            Some(MatchType::Include) => true,
>> -            Some(MatchType::Exclude) => false,
>> -            None => self.matches,
>> +            Ok(Some(MatchType::Include)) => true,
>> +            Ok(Some(MatchType::Exclude)) => false,
>> +            _ => self.matches,
>>           };
>>   
>>           match (did_match, &entry.attr) {
>> diff --git a/pbs-client/src/pxar/create.rs b/pbs-client/src/pxar/create.rs
>> index 2577cf98..4a844599 100644
>> --- a/pbs-client/src/pxar/create.rs
>> +++ b/pbs-client/src/pxar/create.rs
>> @@ -21,7 +21,6 @@ use pxar::Metadata;
>>   
>>   use proxmox_io::vec;
>>   use proxmox_lang::c_str;
>> -use proxmox_sys::error::SysError;
>>   use proxmox_sys::fs::{self, acl, xattr};
>>   
>>   use pbs_datastore::catalog::BackupCatalogWriter;
>> @@ -420,7 +419,7 @@ impl Archiver {
>>           for file in dir.iter() {
>>               let file = file?;
>>   
>> -            let file_name = file.file_name().to_owned();
>> +            let file_name = file.file_name();
>>               let file_name_bytes = file_name.to_bytes();
>>               if file_name_bytes == b"." || file_name_bytes == b".." {
>>                   continue;
>> @@ -434,25 +433,57 @@ impl Archiver {
>>               assert_single_path_component(os_file_name)?;
>>               let full_path = self.path.join(os_file_name);
>>   
>> -            let stat = match nix::sys::stat::fstatat(
>> -                dir_fd,
>> -                file_name.as_c_str(),
>> -                nix::fcntl::AtFlags::AT_SYMLINK_NOFOLLOW,
>> -            ) {
>> -                Ok(stat) => stat,
>> -                Err(ref err) if err.not_found() => continue,
>> -                Err(err) => return Err(err).context(format!("stat failed on {:?}", full_path)),
>> -            };
>> -
>>               let match_path = PathBuf::from("/").join(full_path.clone());
>> -            if self
>> +
>> +            let mut stat_results: Option<FileStat> = None;
> Please factor the duplicate `fstatat` call out up here into a closure
>      let get_file_mode = || fstatat(...);
>
>> +
>> +            let stat = self
> ^ stat is not a good name for this.
> This can stay an `if` as before since the condition doesn't need to be
> this huge.
>
>>                   .patterns
>> -                .matches(match_path.as_os_str().as_bytes(), Some(stat.st_mode))
>> -                == Some(MatchType::Exclude)
>> -            {
>> +                .matches(
>> +                    match_path.as_os_str().as_bytes(),
>> +                    || match nix::sys::stat::fstatat(
>> +                        dir_fd,
>> +                        file_name.to_owned().as_c_str(),
>> +                        nix::fcntl::AtFlags::AT_SYMLINK_NOFOLLOW,
>> +                    ) {
>> +                        Ok(stat) => {
>> +                            stat_results = Some(stat);
>> +                            Ok(stat.st_mode)
>> +                        }
>> +                        Err(e) => Err(e),
>> +                    },
>> +                )
> ^ this block would just be
>                  .matches(match_path.as_os_str().as_bytes(), || {
>                      Ok::<_, Errno>(match &stat_results {
>                          Some(result) => result.st_mode,
>                          None => stat_results.insert(get_file_mode()?).st_mode,
>                      })
>                  })
>
>
>> +                .map_err(|err| {
>> +                    anyhow::Error::new(err).context(format!(
>> +                        "stat failed on {:?} with {:?}",
>> +                        full_path,
>> +                        err.desc()
> When switching to 'context', there's no need to include the error in the
> message, it just causes duplicate text in the output.
>
>> +                    ))
>> +                })?;
> ^ We have both `Error` and `Context` in imported. The entire `.map_err`
> can be turned into just:
>
>          .with_context(|| format!("stat failed on {full_path:?}"))?
>
>> +
>> +            if stat == Some(MatchType::Exclude) {
>>                   continue;
>>               }
>>   
>> +            if stat_results.is_none() {
>> +                match nix::sys::stat::fstatat(
>> +                    dir_fd,
>> +                    file_name.to_owned().as_c_str(),
>> +                    nix::fcntl::AtFlags::AT_SYMLINK_NOFOLLOW,
>> +                ) {
>> +                    Ok(stat) => {
>> +                        stat_results = Some(stat);
>> +                    }
>> +                    Err(err) => {
>> +                        return Err(err).context(format!(
>> +                            "stat failed on {:?} with {:?}",
>> +                            full_path,
>> +                            err.desc()
>> +                        ));
>> +                    }
>> +                }
>> +            }
> ^ Instead of this large block we then only need:
>
>      let stat = stat_results.map(Ok).unwrap_or_else(get_file_mode)?;
>
>> +
>>               self.entry_counter += 1;
>>               if self.entry_counter > self.entry_limit {
>>                   bail!(
>> @@ -462,9 +493,9 @@ impl Archiver {
>>               }
>>   
>>               file_list.push(FileListEntry {
>> -                name: file_name,
>> +                name: file_name.to_owned(),
>>                   path: full_path,
>> -                stat,
>> +                stat: stat_results.unwrap(),
> ^ and this change can just be dropped :-)
>
>>               });
>>           }
>>   
>> @@ -534,7 +565,7 @@ impl Archiver {
>>           if self
>>               .patterns
>>               .matches(match_path.as_os_str().as_bytes(), Some(stat.st_mode))
>> -            == Some(MatchType::Exclude)
>> +            == Ok(Some(MatchType::Exclude))
> ^ Please don't just ignore errors, just put a `?` onto the call.
>
> On the one hand we know that this cannot error since we pass `Some`, so
> we could even unwrap, but that's worse, since there's no guarantee that
> no other errors can happen.
> We *may* be able to guarantee this by adding a `GetFileMode` impl
> in pathpatterns directly to `u32` with `type Error =
> std::convert::Infallible`, and then drop the `Some()` wrapping around
> this piece?

Yes, that's nice. Adding this implementation to pathpatterns:

     impl GetFileMode for u32 {
         type Error = std::convert::Infallible;
         fn get(self) -> Result<u32, Self::Error> {
             Ok(self)
         }
     }

then it should look like this:

     if self
         .patterns
         .matches(match_path.as_os_str().as_bytes(), stat.st_mode)?
             == Some(MatchType::Exclude)
     {
         return Ok(());
     }

>>           {
>>               return Ok(());
>>           }
>> diff --git a/pbs-client/src/pxar/extract.rs b/pbs-client/src/pxar/extract.rs
>> index 4dbaf52d..e08af3c0 100644
>> --- a/pbs-client/src/pxar/extract.rs
>> +++ b/pbs-client/src/pxar/extract.rs
>> @@ -244,16 +244,17 @@ where
>>           );
>>   
>>           let did_match = match match_result {
>> -            Some(MatchType::Include) => true,
>> -            Some(MatchType::Exclude) => false,
>> -            None => self.state.current_match,
>> +            Ok(Some(MatchType::Include)) => true,
>> +            Ok(Some(MatchType::Exclude)) => false,
>> +            _ => self.state.current_match,
> Same here, 3 lines above we pass `Some(file_type)` to `matches()`.
> We should not just use `_` here, but rather, the above should fail.

Here we can write this:

     let match_result = self.match_list.matches(
         entry.path().as_os_str().as_bytes(),
         metadata.file_type() as u32
     ).unwrap();
     let did_match = match match_result {
         Some(MatchType::Include) => true,
         Some(MatchType::Exclude) => false,
         _ => self.state.current_match,

     };

Unwrapping should (?) be safe because the `Err` is `Infallible`.

>>           };
>>   
>>           let extract_res = match (did_match, entry.kind()) {
>>               (_, EntryKind::Directory) => {
>>                   self.callback(entry.path());
>>   
>> -                let create = self.state.current_match && match_result != Some(MatchType::Exclude);
>> +                let create =
>> +                    self.state.current_match && match_result != Ok(Some(MatchType::Exclude));
>>                   let res = self
>>                       .extractor
>>                       .enter_directory(file_name_os.to_owned(), metadata.clone(), create)
>> diff --git a/pbs-datastore/src/catalog.rs b/pbs-datastore/src/catalog.rs
>> index 11c14b64..86e20c92 100644
>> --- a/pbs-datastore/src/catalog.rs
>> +++ b/pbs-datastore/src/catalog.rs
>> @@ -678,9 +678,9 @@ impl<R: Read + Seek> CatalogReader<R> {
>>               }
>>               file_path.extend(&e.name);
>>               match match_list.matches(&file_path, e.get_file_mode()) {
>> -                Some(MatchType::Exclude) => continue,
>> -                Some(MatchType::Include) => callback(file_path)?,
>> -                None => (),
>> +                Ok(Some(MatchType::Exclude)) => continue,
>> +                Ok(Some(MatchType::Include)) => callback(file_path)?,
>> +                _ => (),
>>               }
>>               if is_dir {
>>                   self.find(&e, file_path, match_list, callback)?;
>> -- 
>> 2.39.2




^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [pbs-devel] [PATCH proxmox-backup v5] fix #4380: check if file is excluded before running `stat()`
  2023-08-21 11:03     ` Gabriel Goller
@ 2023-08-21 12:18       ` Wolfgang Bumiller
  2023-08-21 12:48         ` Gabriel Goller
  0 siblings, 1 reply; 7+ messages in thread
From: Wolfgang Bumiller @ 2023-08-21 12:18 UTC (permalink / raw)
  To: Gabriel Goller; +Cc: pbs-devel

On Mon, Aug 21, 2023 at 01:03:56PM +0200, Gabriel Goller wrote:
> 
> On 8/21/23 10:41, Wolfgang Bumiller wrote:
> > needs a rebase, and comments inline:
> > 
> > On Fri, Aug 18, 2023 at 04:32:12PM +0200, Gabriel Goller wrote:
> > > @@ -1108,7 +1108,7 @@ impl<'a> ExtractorState<'a> {
> > >       async fn handle_new_directory(
> > >           &mut self,
> > >           entry: catalog::DirEntry,
> > > -        match_result: Option<MatchType>,
> > > +        match_result: Result<Option<MatchType>, FileModeRequired>,
> > ^ This is not needed, let's not take errors with us throughout the
> > process.
> > 
> > This is our catalog shell. The catalog file always contains file types
> > for its contents except for hardlinks, since we cannot resolve them.
> > However, hardlinks also cannot recurse down any further, since they're
> > also never directories, so `handle_new_directory` will never actually
> > get `Err()` passed here.
> > 
> > >       ) -> Result<(), Error> {
> > >           // enter a new directory:
> > >           self.read_dir_stack.push(mem::replace(
> > > @@ -1123,7 +1123,7 @@ impl<'a> ExtractorState<'a> {
> > >           Shell::walk_pxar_archive(self.accessor, &mut self.dir_stack).await?;
> > >           let dir_pxar = self.dir_stack.last().unwrap().pxar.as_ref().unwrap();
> > >           let dir_meta = dir_pxar.entry().metadata().clone();
> > > -        let create = self.matches && match_result != Some(MatchType::Exclude);
> > > +        let create = self.matches && match_result != Ok(Some(MatchType::Exclude));
> > ^ so no need for this change
> > 
> > >           self.extractor
> > >               .enter_directory(dir_pxar.file_name().to_os_string(), dir_meta, create)?;
> > > @@ -1133,9 +1133,9 @@ impl<'a> ExtractorState<'a> {
> > >       pub async fn handle_entry(&mut self, entry: catalog::DirEntry) -> Result<(), Error> {
> > >           let match_result = self.match_list.matches(&self.path, entry.get_file_mode());
> > I'd like to point out that if we don't know what to do, we should fail,
> > so, just use `?` here.
> > 
> > Since this is an interactive tool, we *may* get away with just logging
> > this and continuing with the matches you wrote out below here.
> > 
> > But we should not just be silent about it :-)
> > 
> > Personally, I'm fine with erroring and telling the user to investigate.
> > In practice I doubt anyone will really run into this. The catalog shell
> > is too awkward to have many users (I think???)
> How about we introduce this log message, then use the question mark
> operator?

This would produce both a warning and an error message then, no?
Or did you mean something else?

> 
>     match (did_match, &entry.attr) {
>         (_, DirEntryAttribute::Directory { .. }) => {
>             if match_result.is_err() {
>                 log::error!("error while matching file paths");
>             }
>             self.handle_new_directory(entry, match_result?).await?;
>         }
>         ...
>     }
> > >           let did_match = match match_result {
> > > -            Some(MatchType::Include) => true,
> > > -            Some(MatchType::Exclude) => false,
> > > -            None => self.matches,
> > > +            Ok(Some(MatchType::Include)) => true,
> > > +            Ok(Some(MatchType::Exclude)) => false,
> > > +            _ => self.matches,
> > >           };
> > >           match (did_match, &entry.attr) {
> > > diff --git a/pbs-client/src/pxar/create.rs b/pbs-client/src/pxar/create.rs
> > > index 2577cf98..4a844599 100644
> > > --- a/pbs-client/src/pxar/create.rs
> > > +++ b/pbs-client/src/pxar/create.rs
(...)
> > > +            if stat_results.is_none() {
> > > +                match nix::sys::stat::fstatat(
> > > +                    dir_fd,
> > > +                    file_name.to_owned().as_c_str(),
> > > +                    nix::fcntl::AtFlags::AT_SYMLINK_NOFOLLOW,
> > > +                ) {
> > > +                    Ok(stat) => {
> > > +                        stat_results = Some(stat);
> > > +                    }
> > > +                    Err(err) => {
> > > +                        return Err(err).context(format!(
> > > +                            "stat failed on {:?} with {:?}",
> > > +                            full_path,
> > > +                            err.desc()
> > > +                        ));
> > > +                    }
> > > +                }
> > > +            }
> > ^ Instead of this large block we then only need:
> > 
> >      let stat = stat_results.map(Ok).unwrap_or_else(get_file_mode)?;
> > 
> > > +
> > >               self.entry_counter += 1;
> > >               if self.entry_counter > self.entry_limit {
> > >                   bail!(
> > > @@ -462,9 +493,9 @@ impl Archiver {
> > >               }
> > >               file_list.push(FileListEntry {
> > > -                name: file_name,
> > > +                name: file_name.to_owned(),
> > >                   path: full_path,
> > > -                stat,
> > > +                stat: stat_results.unwrap(),
> > ^ and this change can just be dropped :-)
> > 
> > >               });
> > >           }
> > > @@ -534,7 +565,7 @@ impl Archiver {
> > >           if self
> > >               .patterns
> > >               .matches(match_path.as_os_str().as_bytes(), Some(stat.st_mode))
> > > -            == Some(MatchType::Exclude)
> > > +            == Ok(Some(MatchType::Exclude))
> > ^ Please don't just ignore errors, just put a `?` onto the call.
> > 
> > On the one hand we know that this cannot error since we pass `Some`, so
> > we could even unwrap, but that's worse, since there's no guarantee that
> > no other errors can happen.
> > We *may* be able to guarantee this by adding a `GetFileMode` impl
> > in pathpatterns directly to `u32` with `type Error =
> > std::convert::Infallible`, and then drop the `Some()` wrapping around
> > this piece?
> 
> Yes, that's nice. Adding this implementation to pathpatterns:
> 
>     impl GetFileMode for u32 {
>         type Error = std::convert::Infallible;
>         fn get(self) -> Result<u32, Self::Error> {
>             Ok(self)
>         }
>     }
> 
> then it should look like this:
> 
>     if self
>         .patterns
>         .matches(match_path.as_os_str().as_bytes(), stat.st_mode)?
>             == Some(MatchType::Exclude)
>     {
>         return Ok(());
>     }
> 
> > >           {
> > >               return Ok(());
> > >           }
> > > diff --git a/pbs-client/src/pxar/extract.rs b/pbs-client/src/pxar/extract.rs
> > > index 4dbaf52d..e08af3c0 100644
> > > --- a/pbs-client/src/pxar/extract.rs
> > > +++ b/pbs-client/src/pxar/extract.rs
> > > @@ -244,16 +244,17 @@ where
> > >           );
> > >           let did_match = match match_result {
> > > -            Some(MatchType::Include) => true,
> > > -            Some(MatchType::Exclude) => false,
> > > -            None => self.state.current_match,
> > > +            Ok(Some(MatchType::Include)) => true,
> > > +            Ok(Some(MatchType::Exclude)) => false,
> > > +            _ => self.state.current_match,
> > Same here, 3 lines above we pass `Some(file_type)` to `matches()`.
> > We should not just use `_` here, but rather, the above should fail.
> 
> Here we can write this:
> 
>     let match_result = self.match_list.matches(
>         entry.path().as_os_str().as_bytes(),
>         metadata.file_type() as u32
>     ).unwrap();
>     let did_match = match match_result {
>         Some(MatchType::Include) => true,
>         Some(MatchType::Exclude) => false,
>         _ => self.state.current_match,
> 
>     };
> 
> Unwrapping should (?) be safe because the `Err` is `Infallible`.

Yeah I guess that's fine. Note: please add comments above unwraps ;-)

Somehow I'm thinking the stdlib should have a separate function for
this... like fn Result<T, Infallible>::unwrap_infallible(self) -> T...
But I don't really feel like it's worth it pulling in another crate for
this ;-)




^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [pbs-devel] [PATCH proxmox-backup v5] fix #4380: check if file is excluded before running `stat()`
  2023-08-21 12:18       ` Wolfgang Bumiller
@ 2023-08-21 12:48         ` Gabriel Goller
  0 siblings, 0 replies; 7+ messages in thread
From: Gabriel Goller @ 2023-08-21 12:48 UTC (permalink / raw)
  To: Wolfgang Bumiller; +Cc: pbs-devel


On 8/21/23 14:18, Wolfgang Bumiller wrote:
> On Mon, Aug 21, 2023 at 01:03:56PM +0200, Gabriel Goller wrote:
>> On 8/21/23 10:41, Wolfgang Bumiller wrote:
>>> needs a rebase, and comments inline:
>>>
>>> On Fri, Aug 18, 2023 at 04:32:12PM +0200, Gabriel Goller wrote:
>>>> @@ -1108,7 +1108,7 @@ impl<'a> ExtractorState<'a> {
>>>>        async fn handle_new_directory(
>>>>            &mut self,
>>>>            entry: catalog::DirEntry,
>>>> -        match_result: Option<MatchType>,
>>>> +        match_result: Result<Option<MatchType>, FileModeRequired>,
>>> ^ This is not needed, let's not take errors with us throughout the
>>> process.
>>>
>>> This is our catalog shell. The catalog file always contains file types
>>> for its contents except for hardlinks, since we cannot resolve them.
>>> However, hardlinks also cannot recurse down any further, since they're
>>> also never directories, so `handle_new_directory` will never actually
>>> get `Err()` passed here.
>>>
>>>>        ) -> Result<(), Error> {
>>>>            // enter a new directory:
>>>>            self.read_dir_stack.push(mem::replace(
>>>> @@ -1123,7 +1123,7 @@ impl<'a> ExtractorState<'a> {
>>>>            Shell::walk_pxar_archive(self.accessor, &mut self.dir_stack).await?;
>>>>            let dir_pxar = self.dir_stack.last().unwrap().pxar.as_ref().unwrap();
>>>>            let dir_meta = dir_pxar.entry().metadata().clone();
>>>> -        let create = self.matches && match_result != Some(MatchType::Exclude);
>>>> +        let create = self.matches && match_result != Ok(Some(MatchType::Exclude));
>>> ^ so no need for this change
>>>
>>>>            self.extractor
>>>>                .enter_directory(dir_pxar.file_name().to_os_string(), dir_meta, create)?;
>>>> @@ -1133,9 +1133,9 @@ impl<'a> ExtractorState<'a> {
>>>>        pub async fn handle_entry(&mut self, entry: catalog::DirEntry) -> Result<(), Error> {
>>>>            let match_result = self.match_list.matches(&self.path, entry.get_file_mode());
>>> I'd like to point out that if we don't know what to do, we should fail,
>>> so, just use `?` here.
>>>
>>> Since this is an interactive tool, we *may* get away with just logging
>>> this and continuing with the matches you wrote out below here.
>>>
>>> But we should not just be silent about it :-)
>>>
>>> Personally, I'm fine with erroring and telling the user to investigate.
>>> In practice I doubt anyone will really run into this. The catalog shell
>>> is too awkward to have many users (I think???)
>> How about we introduce this log message, then use the question mark
>> operator?
> This would produce both a warning and an error message then, no?
> Or did you mean something else?
Oh, you are right, I missed that. We already log the error.
>>      match (did_match, &entry.attr) {
>>          (_, DirEntryAttribute::Directory { .. }) => {
>>              if match_result.is_err() {
>>                  log::error!("error while matching file paths");
>>              }
>>              self.handle_new_directory(entry, match_result?).await?;
>>          }
>>          ...
>>      }
>>>>            let did_match = match match_result {
>>>> -            Some(MatchType::Include) => true,
>>>> -            Some(MatchType::Exclude) => false,
>>>> -            None => self.matches,
>>>> +            Ok(Some(MatchType::Include)) => true,
>>>> +            Ok(Some(MatchType::Exclude)) => false,
>>>> +            _ => self.matches,
>>>>            };
>>>>            match (did_match, &entry.attr) {
>>>> diff --git a/pbs-client/src/pxar/create.rs b/pbs-client/src/pxar/create.rs
>>>> index 2577cf98..4a844599 100644
>>>> --- a/pbs-client/src/pxar/create.rs
>>>> +++ b/pbs-client/src/pxar/create.rs
> (...)
>>>> +            if stat_results.is_none() {
>>>> +                match nix::sys::stat::fstatat(
>>>> +                    dir_fd,
>>>> +                    file_name.to_owned().as_c_str(),
>>>> +                    nix::fcntl::AtFlags::AT_SYMLINK_NOFOLLOW,
>>>> +                ) {
>>>> +                    Ok(stat) => {
>>>> +                        stat_results = Some(stat);
>>>> +                    }
>>>> +                    Err(err) => {
>>>> +                        return Err(err).context(format!(
>>>> +                            "stat failed on {:?} with {:?}",
>>>> +                            full_path,
>>>> +                            err.desc()
>>>> +                        ));
>>>> +                    }
>>>> +                }
>>>> +            }
>>> ^ Instead of this large block we then only need:
>>>
>>>       let stat = stat_results.map(Ok).unwrap_or_else(get_file_mode)?;
>>>
>>>> +
>>>>                self.entry_counter += 1;
>>>>                if self.entry_counter > self.entry_limit {
>>>>                    bail!(
>>>> @@ -462,9 +493,9 @@ impl Archiver {
>>>>                }
>>>>                file_list.push(FileListEntry {
>>>> -                name: file_name,
>>>> +                name: file_name.to_owned(),
>>>>                    path: full_path,
>>>> -                stat,
>>>> +                stat: stat_results.unwrap(),
>>> ^ and this change can just be dropped :-)
>>>
>>>>                });
>>>>            }
>>>> @@ -534,7 +565,7 @@ impl Archiver {
>>>>            if self
>>>>                .patterns
>>>>                .matches(match_path.as_os_str().as_bytes(), Some(stat.st_mode))
>>>> -            == Some(MatchType::Exclude)
>>>> +            == Ok(Some(MatchType::Exclude))
>>> ^ Please don't just ignore errors, just put a `?` onto the call.
>>>
>>> On the one hand we know that this cannot error since we pass `Some`, so
>>> we could even unwrap, but that's worse, since there's no guarantee that
>>> no other errors can happen.
>>> We *may* be able to guarantee this by adding a `GetFileMode` impl
>>> in pathpatterns directly to `u32` with `type Error =
>>> std::convert::Infallible`, and then drop the `Some()` wrapping around
>>> this piece?
>> Yes, that's nice. Adding this implementation to pathpatterns:
>>
>>      impl GetFileMode for u32 {
>>          type Error = std::convert::Infallible;
>>          fn get(self) -> Result<u32, Self::Error> {
>>              Ok(self)
>>          }
>>      }
>>
>> then it should look like this:
>>
>>      if self
>>          .patterns
>>          .matches(match_path.as_os_str().as_bytes(), stat.st_mode)?
>>              == Some(MatchType::Exclude)
>>      {
>>          return Ok(());
>>      }
>>
>>>>            {
>>>>                return Ok(());
>>>>            }
>>>> diff --git a/pbs-client/src/pxar/extract.rs b/pbs-client/src/pxar/extract.rs
>>>> index 4dbaf52d..e08af3c0 100644
>>>> --- a/pbs-client/src/pxar/extract.rs
>>>> +++ b/pbs-client/src/pxar/extract.rs
>>>> @@ -244,16 +244,17 @@ where
>>>>            );
>>>>            let did_match = match match_result {
>>>> -            Some(MatchType::Include) => true,
>>>> -            Some(MatchType::Exclude) => false,
>>>> -            None => self.state.current_match,
>>>> +            Ok(Some(MatchType::Include)) => true,
>>>> +            Ok(Some(MatchType::Exclude)) => false,
>>>> +            _ => self.state.current_match,
>>> Same here, 3 lines above we pass `Some(file_type)` to `matches()`.
>>> We should not just use `_` here, but rather, the above should fail.
>> Here we can write this:
>>
>>      let match_result = self.match_list.matches(
>>          entry.path().as_os_str().as_bytes(),
>>          metadata.file_type() as u32
>>      ).unwrap();
>>      let did_match = match match_result {
>>          Some(MatchType::Include) => true,
>>          Some(MatchType::Exclude) => false,
>>          _ => self.state.current_match,
>>
>>      };
>>
>> Unwrapping should (?) be safe because the `Err` is `Infallible`.
> Yeah I guess that's fine. Note: please add comments above unwraps ;-)
>
> Somehow I'm thinking the stdlib should have a separate function for
> this... like fn Result<T, Infallible>::unwrap_infallible(self) -> T...
> But I don't really feel like it's worth it pulling in another crate for
> this ;-)
Yes, I agree.




^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2023-08-21 12:49 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-08-18 14:32 [pbs-devel] [PATCH pathpatterns v3] match_list: updated `matches()`, to only retrieve file mode if necessary Gabriel Goller
2023-08-18 14:32 ` [pbs-devel] [PATCH proxmox-backup v5] fix #4380: check if file is excluded before running `stat()` Gabriel Goller
2023-08-21  8:41   ` Wolfgang Bumiller
2023-08-21 11:03     ` Gabriel Goller
2023-08-21 12:18       ` Wolfgang Bumiller
2023-08-21 12:48         ` Gabriel Goller
2023-08-21  7:57 ` [pbs-devel] [PATCH pathpatterns v3] match_list: updated `matches()`, to only retrieve file mode if necessary Wolfgang Bumiller

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox
Service provided by Proxmox Server Solutions GmbH | Privacy | Legal