* [pbs-devel] [PATCH pathpatterns v4] match_list: updated `matches()`, to only retrieve file mode if necessary
@ 2023-08-21 13:08 Gabriel Goller
2023-08-21 13:08 ` [pbs-devel] [PATCH proxmox-backup v6] fix #4380: check if file is excluded before running `stat()` Gabriel Goller
2023-08-22 13:07 ` [pbs-devel] [PATCH pathpatterns v4] match_list: updated `matches()`, to only retrieve file mode if necessary Wolfgang Bumiller
0 siblings, 2 replies; 5+ messages in thread
From: Gabriel Goller @ 2023-08-21 13:08 UTC (permalink / raw)
To: pbs-devel
Updated `matches()` function, which now takes the `GetFileMode` trait and returns a
Result. `GetFileMode` is a function that should return the `file_mode`. `matches()`
will go through the patterns and match by path only until it finds a pattern which
does not have `MatchFlag::ANY_FILE_TYPE`, in case it will call the
`get_file_mode`, which will return a `file_mode`. This will ensure that the
`get()` (which in our case executes `stat()`) will only be run once(at most),
every pattern will only be processed once and that the order of the patterns
will be respected. `GetFileMode` is also implemented for `Option<u32>` (so that
we can simply pass options of the `file_mode`) and `u32` (returning
`std::convert::Infallible`, so that we can pass a u32 mode directly).
Signed-off-by: Gabriel Goller <g.goller@proxmox.com>
---
changes v3:
- changed `matches()` and `matches_exact()` to retrieve the file_mode lazily.
Allowed these functions to take in a trait `GetFileMode`, so that we can pass
closures and options that contain the `file_mode`.
changes v2:
- updated the `matches_path()` function to take a closure and renamed
it to `matches_mode_lazy()`. This allows us to only match once, so we
don't need to match before and after `stat()` (without and with
`file_mode`) anymore.
changes v1:
- added `matches_path()` function, which matches by path only and returns an
error if the `file_mode` is required.
src/lib.rs | 72 +++++-----
src/match_list.rs | 335 ++++++++++++++++++++++++++++++++++++++--------
2 files changed, 315 insertions(+), 92 deletions(-)
diff --git a/src/lib.rs b/src/lib.rs
index cb89917..998cdae 100644
--- a/src/lib.rs
+++ b/src/lib.rs
@@ -31,17 +31,17 @@
//! MatchEntry::include(Pattern::path("bananas/curved.*")?),
//! ];
//!
-//! assert_eq!(list.matches("/things", None), None);
-//! assert_eq!(list.matches("/things/shop", None), Some(MatchType::Include));
-//! assert_eq!(list.matches("/things/shop/bananas", None), Some(MatchType::Exclude));
-//! assert_eq!(list.matches("/things/shop/bananas/curved.txt", None), Some(MatchType::Include));
-//! assert_eq!(list.matches("/things/shop/bananas/curved.bak", None), Some(MatchType::Include));
+//! assert_eq!(list.matches("/things", None), Ok(None));
+//! assert_eq!(list.matches("/things/shop", None), Ok(Some(MatchType::Include)));
+//! assert_eq!(list.matches("/things/shop/bananas", None), Ok(Some(MatchType::Exclude)));
+//! assert_eq!(list.matches("/things/shop/bananas/curved.txt", None), Ok(Some(MatchType::Include)));
+//! assert_eq!(list.matches("/things/shop/bananas/curved.bak", None), Ok(Some(MatchType::Include)));
//!
//! // this will exclude the curved.bak file
//! list.push(MatchEntry::exclude(Pattern::path("curved.bak")?));
-//! assert_eq!(list.matches("/things/shop/bananas/curved.bak", None), Some(MatchType::Exclude));
+//! assert_eq!(list.matches("/things/shop/bananas/curved.bak", None), Ok(Some(MatchType::Exclude)));
//! list.pop();
-//! # assert_eq!(list.matches("/things/shop/bananas/curved.bak", None), Some(MatchType::Include));
+//! # assert_eq!(list.matches("/things/shop/bananas/curved.bak", None), Ok(Some(MatchType::Include)));
//!
//! // but this will not:
//! list.push(
@@ -49,40 +49,40 @@
//! .flags(MatchFlag::ANCHORED)
//! );
//! // or: list.push
-//! assert_eq!(list.matches("/things/shop/bananas/curved.bak", None), Some(MatchType::Include));
+//! assert_eq!(list.matches("/things/shop/bananas/curved.bak", None), Ok(Some(MatchType::Include)));
//! list.pop();
//!
//! // let's check some patterns, anything starting with a 'c', 'f' or 's':
//! let mut list = vec![MatchEntry::include(Pattern::path("[cfs]*")?)];
-//! assert_eq!(list.matches("/things", None), None);
-//! assert_eq!(list.matches("/things/file1.dat", None), Some(MatchType::Include));
-//! assert_eq!(list.matches("/things/file2.dat", None), Some(MatchType::Include));
-//! assert_eq!(list.matches("/things/shop", None), Some(MatchType::Include));
-//! assert_eq!(list.matches("/things/shop/info.txt", None), None);
-//! assert_eq!(list.matches("/things/shop/apples", None), None);
-//! assert_eq!(list.matches("/things/shop/apples/gala.txt", None), None);
-//! assert_eq!(list.matches("/things/shop/apples/golden-delicious.txt", None), None);
-//! assert_eq!(list.matches("/things/shop/bananas", None), None);
-//! assert_eq!(list.matches("/things/shop/bananas/straight.txt", None), Some(MatchType::Include));
-//! assert_eq!(list.matches("/things/shop/bananas/curved.txt", None), Some(MatchType::Include));
-//! assert_eq!(list.matches("/shop/bananas/curved.bak", None), Some(MatchType::Include));
-//! assert_eq!(list.matches("/things/shop/bananas/other.txt", None), None);
+//! assert_eq!(list.matches("/things", None), Ok(None));
+//! assert_eq!(list.matches("/things/file1.dat", None), Ok(Some(MatchType::Include)));
+//! assert_eq!(list.matches("/things/file2.dat", None), Ok(Some(MatchType::Include)));
+//! assert_eq!(list.matches("/things/shop", None), Ok(Some(MatchType::Include)));
+//! assert_eq!(list.matches("/things/shop/info.txt", None), Ok(None));
+//! assert_eq!(list.matches("/things/shop/apples", None), Ok(None));
+//! assert_eq!(list.matches("/things/shop/apples/gala.txt", None), Ok(None));
+//! assert_eq!(list.matches("/things/shop/apples/golden-delicious.txt", None), Ok(None));
+//! assert_eq!(list.matches("/things/shop/bananas", None), Ok(None));
+//! assert_eq!(list.matches("/things/shop/bananas/straight.txt", None), Ok(Some(MatchType::Include)));
+//! assert_eq!(list.matches("/things/shop/bananas/curved.txt", None), Ok(Some(MatchType::Include)));
+//! assert_eq!(list.matches("/shop/bananas/curved.bak", None), Ok(Some(MatchType::Include)));
+//! assert_eq!(list.matches("/things/shop/bananas/other.txt", None), Ok(None));
//!
//! // If we add `**` we end up including the entire `shop/` subtree:
//! list.push(MatchEntry::include(Pattern::path("[cfs]*/**")?));
-//! assert_eq!(list.matches("/things", None), None);
-//! assert_eq!(list.matches("/things/file1.dat", None), Some(MatchType::Include));
-//! assert_eq!(list.matches("/things/file2.dat", None), Some(MatchType::Include));
-//! assert_eq!(list.matches("/things/shop", None), Some(MatchType::Include));
-//! assert_eq!(list.matches("/things/shop/info.txt", None), Some(MatchType::Include));
-//! assert_eq!(list.matches("/things/shop/apples", None), Some(MatchType::Include));
-//! assert_eq!(list.matches("/things/shop/apples/gala.txt", None), Some(MatchType::Include));
-//! assert_eq!(list.matches("/shop/apples/golden-delicious.txt", None), Some(MatchType::Include));
-//! assert_eq!(list.matches("/things/shop/bananas", None), Some(MatchType::Include));
-//! assert_eq!(list.matches("/things/shop/bananas/straight.txt", None), Some(MatchType::Include));
-//! assert_eq!(list.matches("/things/shop/bananas/curved.txt", None), Some(MatchType::Include));
-//! assert_eq!(list.matches("/shop/bananas/curved.bak", None), Some(MatchType::Include));
-//! assert_eq!(list.matches("/shop/bananas/other.txt", None), Some(MatchType::Include));
+//! assert_eq!(list.matches("/things", None), Ok(None));
+//! assert_eq!(list.matches("/things/file1.dat", None), Ok(Some(MatchType::Include)));
+//! assert_eq!(list.matches("/things/file2.dat", None), Ok(Some(MatchType::Include)));
+//! assert_eq!(list.matches("/things/shop", None), Ok(Some(MatchType::Include)));
+//! assert_eq!(list.matches("/things/shop/info.txt", None), Ok(Some(MatchType::Include)));
+//! assert_eq!(list.matches("/things/shop/apples", None), Ok(Some(MatchType::Include)));
+//! assert_eq!(list.matches("/things/shop/apples/gala.txt", None), Ok(Some(MatchType::Include)));
+//! assert_eq!(list.matches("/shop/apples/golden-delicious.txt", None), Ok(Some(MatchType::Include)));
+//! assert_eq!(list.matches("/things/shop/bananas", None), Ok(Some(MatchType::Include)));
+//! assert_eq!(list.matches("/things/shop/bananas/straight.txt", None), Ok(Some(MatchType::Include)));
+//! assert_eq!(list.matches("/things/shop/bananas/curved.txt", None), Ok(Some(MatchType::Include)));
+//! assert_eq!(list.matches("/shop/bananas/curved.bak", None), Ok(Some(MatchType::Include)));
+//! assert_eq!(list.matches("/shop/bananas/other.txt", None), Ok(Some(MatchType::Include)));
//! # Ok(())
//! # }
//! # test().unwrap()
@@ -92,7 +92,9 @@ mod match_list;
mod pattern;
#[doc(inline)]
-pub use match_list::{MatchEntry, MatchFlag, MatchList, MatchPattern, MatchType};
+pub use match_list::{
+ FileModeRequired, GetFileMode, MatchEntry, MatchFlag, MatchList, MatchPattern, MatchType,
+};
#[doc(inline)]
pub use pattern::{ParseError, Pattern, PatternFlag};
diff --git a/src/match_list.rs b/src/match_list.rs
index c5b14e0..69b1fdb 100644
--- a/src/match_list.rs
+++ b/src/match_list.rs
@@ -1,5 +1,4 @@
//! Helpers for include/exclude lists.
-
use bitflags::bitflags;
use crate::PatternFlag;
@@ -39,6 +38,46 @@ impl Default for MatchFlag {
}
}
+#[derive(Debug, PartialEq)]
+pub struct FileModeRequired;
+
+impl std::fmt::Display for FileModeRequired {
+ fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
+ write!(f, "File mode is required for matching")
+ }
+}
+
+impl std::error::Error for FileModeRequired {}
+
+pub trait GetFileMode {
+ type Error;
+ fn get(self) -> Result<u32, Self::Error>;
+}
+
+impl<T, E> GetFileMode for T
+where
+ T: FnOnce() -> Result<u32, E>,
+{
+ type Error = E;
+ fn get(self) -> Result<u32, Self::Error> {
+ self()
+ }
+}
+
+impl GetFileMode for Option<u32> {
+ type Error = FileModeRequired;
+ fn get(self) -> Result<u32, Self::Error> {
+ self.ok_or(FileModeRequired {})
+ }
+}
+
+impl GetFileMode for u32 {
+ type Error = std::convert::Infallible;
+ fn get(self) -> Result<u32, Self::Error> {
+ Ok(self)
+ }
+}
+
/// A pattern entry. (Glob patterns or literal patterns.)
// Note:
// For regex we'd likely use the POSIX extended REs via `regexec(3)`, since we're targetting
@@ -211,7 +250,7 @@ impl MatchEntry {
pub fn matches_mode(&self, file_mode: u32) -> bool {
// bitflags' `.contains` means ALL bits must be set, if they are all set we don't
// need to check the mode...
- if self.flags.contains(MatchFlag::ANY_FILE_TYPE) {
+ if !self.needs_file_mode() {
return true;
}
@@ -304,12 +343,21 @@ impl MatchEntry {
self.matches_path_exact(path)
}
+
+ pub fn needs_file_mode(&self) -> bool {
+ let flags = (self.flags & MatchFlag::ANY_FILE_TYPE).bits();
+ if flags != 0 && flags != MatchFlag::ANY_FILE_TYPE.bits() {
+ return true;
+ }
+ false
+ }
}
#[doc(hidden)]
pub trait MatchListEntry {
fn entry_matches(&self, path: &[u8], file_mode: Option<u32>) -> Option<MatchType>;
fn entry_matches_exact(&self, path: &[u8], file_mode: Option<u32>) -> Option<MatchType>;
+ fn entry_needs_file_mode(&self) -> bool;
}
impl MatchListEntry for &'_ MatchEntry {
@@ -328,6 +376,10 @@ impl MatchListEntry for &'_ MatchEntry {
None
}
}
+
+ fn entry_needs_file_mode(&self) -> bool {
+ self.needs_file_mode()
+ }
}
impl MatchListEntry for &'_ &'_ MatchEntry {
@@ -346,6 +398,10 @@ impl MatchListEntry for &'_ &'_ MatchEntry {
None
}
}
+
+ fn entry_needs_file_mode(&self) -> bool {
+ self.needs_file_mode()
+ }
}
/// This provides [`matches`](MatchList::matches) and [`matches_exact`](MatchList::matches_exact)
@@ -361,19 +417,18 @@ impl MatchListEntry for &'_ &'_ MatchEntry {
/// This makes it easier to use slices over entries or references to entries.
pub trait MatchList {
/// Check whether this list contains anything matching a prefix of the specified path, and the
- /// specified file mode.
- fn matches<T: AsRef<[u8]>>(&self, path: T, file_mode: Option<u32>) -> Option<MatchType> {
- self.matches_do(path.as_ref(), file_mode)
- }
-
- fn matches_do(&self, path: &[u8], file_mode: Option<u32>) -> Option<MatchType>;
-
- /// Check whether this list contains anything exactly matching the path and mode.
- fn matches_exact<T: AsRef<[u8]>>(&self, path: T, file_mode: Option<u32>) -> Option<MatchType> {
- self.matches_exact_do(path.as_ref(), file_mode)
- }
-
- fn matches_exact_do(&self, path: &[u8], file_mode: Option<u32>) -> Option<MatchType>;
+ /// specified file mode. Gets the file_mode lazily, only if needed.
+ fn matches<T, U>(&self, path: T, get_file_mode: U) -> Result<Option<MatchType>, U::Error>
+ where
+ T: AsRef<[u8]>,
+ U: GetFileMode;
+
+ /// Check whether this list contains anything exactly matching the path and mode. Gets the
+ /// file_mode lazily, only if needed.
+ fn matches_exact<T, U>(&self, path: T, get_file_mode: U) -> Result<Option<MatchType>, U::Error>
+ where
+ T: AsRef<[u8]>,
+ U: GetFileMode;
}
impl<'a, T> MatchList for T
@@ -383,30 +438,48 @@ where
<&'a T as IntoIterator>::IntoIter: DoubleEndedIterator,
<&'a T as IntoIterator>::Item: MatchListEntry,
{
- fn matches_do(&self, path: &[u8], file_mode: Option<u32>) -> Option<MatchType> {
+ fn matches<U, G>(&self, path: U, get_file_mode: G) -> Result<Option<MatchType>, G::Error>
+ where
+ U: AsRef<[u8]>,
+ G: GetFileMode,
+ {
// This is an &self method on a `T where T: 'a`.
let this: &'a Self = unsafe { std::mem::transmute(self) };
+ let mut get_file_mode = Some(get_file_mode);
+ let mut file_mode = None;
+
for m in this.into_iter().rev() {
- if let Some(mt) = m.entry_matches(path, file_mode) {
- return Some(mt);
+ if file_mode.is_none() && m.entry_needs_file_mode() {
+ file_mode = Some(get_file_mode.take().unwrap().get()?);
+ }
+ if let Some(mt) = m.entry_matches(path.as_ref(), file_mode) {
+ return Ok(Some(mt));
}
}
-
- None
+ Ok(None)
}
- fn matches_exact_do(&self, path: &[u8], file_mode: Option<u32>) -> Option<MatchType> {
+ fn matches_exact<U, G>(&self, path: U, get_file_mode: G) -> Result<Option<MatchType>, G::Error>
+ where
+ U: AsRef<[u8]>,
+ G: GetFileMode,
+ {
// This is an &self method on a `T where T: 'a`.
let this: &'a Self = unsafe { std::mem::transmute(self) };
+ let mut get_file_mode = Some(get_file_mode);
+ let mut file_mode = None;
+
for m in this.into_iter().rev() {
- if let Some(mt) = m.entry_matches_exact(path, file_mode) {
- return Some(mt);
+ if file_mode.is_none() && m.entry_needs_file_mode() {
+ file_mode = Some(get_file_mode.take().unwrap().get()?);
+ }
+ if let Some(mt) = m.entry_matches_exact(path.as_ref(), file_mode) {
+ return Ok(Some(mt));
}
}
-
- None
+ Ok(None)
}
}
@@ -415,20 +488,20 @@ fn assert_containers_implement_match_list() {
use std::iter::FromIterator;
let vec = vec![MatchEntry::include(crate::Pattern::path("a*").unwrap())];
- assert_eq!(vec.matches("asdf", None), Some(MatchType::Include));
+ assert_eq!(vec.matches("asdf", None), Ok(Some(MatchType::Include)));
// FIXME: ideally we can make this work as well!
let vd = std::collections::VecDeque::<MatchEntry>::from_iter(vec.clone());
- assert_eq!(vd.matches("asdf", None), Some(MatchType::Include));
+ assert_eq!(vd.matches("asdf", None), Ok(Some(MatchType::Include)));
let list: &[MatchEntry] = &vec[..];
- assert_eq!(list.matches("asdf", None), Some(MatchType::Include));
+ assert_eq!(list.matches("asdf", None), Ok(Some(MatchType::Include)));
let list: Vec<&MatchEntry> = vec.iter().collect();
- assert_eq!(list.matches("asdf", None), Some(MatchType::Include));
+ assert_eq!(list.matches("asdf", None), Ok(Some(MatchType::Include)));
let list: &[&MatchEntry] = &list[..];
- assert_eq!(list.matches("asdf", None), Some(MatchType::Include));
+ assert_eq!(list.matches("asdf", None), Ok(Some(MatchType::Include)));
}
#[test]
@@ -443,25 +516,28 @@ fn test_file_type_matches() {
];
assert_eq!(
matchlist.matches("a_dir", Some(libc::S_IFDIR)),
- Some(MatchType::Include)
+ Ok(Some(MatchType::Include))
);
assert_eq!(
matchlist.matches("/a_dir", Some(libc::S_IFDIR)),
- Some(MatchType::Include)
+ Ok(Some(MatchType::Include))
);
- assert_eq!(matchlist.matches("/a_dir", Some(libc::S_IFREG)), None);
+ assert_eq!(matchlist.matches("/a_dir", Some(libc::S_IFREG)), Ok(None));
assert_eq!(
matchlist.matches("/a_file", Some(libc::S_IFREG)),
- Some(MatchType::Exclude)
+ Ok(Some(MatchType::Exclude))
);
- assert_eq!(matchlist.matches("/a_file", Some(libc::S_IFDIR)), None);
+ assert_eq!(matchlist.matches("/a_file", Some(libc::S_IFDIR)), Ok(None));
assert_eq!(
matchlist.matches("/another_dir", Some(libc::S_IFDIR)),
- Some(MatchType::Exclude)
+ Ok(Some(MatchType::Exclude))
+ );
+ assert_eq!(
+ matchlist.matches("/another_dir", Some(libc::S_IFREG)),
+ Ok(None)
);
- assert_eq!(matchlist.matches("/another_dir", Some(libc::S_IFREG)), None);
}
#[test]
@@ -471,22 +547,25 @@ fn test_anchored_matches() {
let matchlist = vec![
MatchEntry::new(Pattern::path("file-a").unwrap(), MatchType::Include),
MatchEntry::new(Pattern::path("some/path").unwrap(), MatchType::Include)
- .flags(MatchFlag::ANCHORED),
+ .add_flags(MatchFlag::ANCHORED),
];
- assert_eq!(matchlist.matches("file-a", None), Some(MatchType::Include));
+ assert_eq!(
+ matchlist.matches("file-a", None),
+ Ok(Some(MatchType::Include))
+ );
assert_eq!(
matchlist.matches("another/file-a", None),
- Some(MatchType::Include)
+ Ok(Some(MatchType::Include))
);
- assert_eq!(matchlist.matches("some", None), None);
- assert_eq!(matchlist.matches("path", None), None);
+ assert_eq!(matchlist.matches("some", None), Ok(None));
+ assert_eq!(matchlist.matches("path", None), Ok(None));
assert_eq!(
matchlist.matches("some/path", None),
- Some(MatchType::Include)
+ Ok(Some(MatchType::Include))
);
- assert_eq!(matchlist.matches("another/some/path", None), None);
+ assert_eq!(matchlist.matches("another/some/path", None), Ok(None));
}
#[test]
@@ -495,7 +574,10 @@ fn test_literal_matches() {
MatchPattern::Literal(b"/bin/mv".to_vec()),
MatchType::Include,
)];
- assert_eq!(matchlist.matches("/bin/mv", None), Some(MatchType::Include));
+ assert_eq!(
+ matchlist.matches("/bin/mv", None),
+ Ok(Some(MatchType::Include))
+ );
}
#[test]
@@ -504,29 +586,168 @@ fn test_path_relativity() {
let matchlist = vec![
MatchEntry::new(Pattern::path("noslash").unwrap(), MatchType::Include),
MatchEntry::new(Pattern::path("noslash-a").unwrap(), MatchType::Include)
- .flags(MatchFlag::ANCHORED),
+ .add_flags(MatchFlag::ANCHORED),
MatchEntry::new(Pattern::path("/slash").unwrap(), MatchType::Include),
MatchEntry::new(Pattern::path("/slash-a").unwrap(), MatchType::Include)
- .flags(MatchFlag::ANCHORED),
+ .add_flags(MatchFlag::ANCHORED),
];
- assert_eq!(matchlist.matches("noslash", None), Some(MatchType::Include));
+ assert_eq!(
+ matchlist.matches("noslash", None),
+ Ok(Some(MatchType::Include))
+ );
assert_eq!(
matchlist.matches("noslash-a", None),
- Some(MatchType::Include)
+ Ok(Some(MatchType::Include))
);
- assert_eq!(matchlist.matches("slash", None), None);
- assert_eq!(matchlist.matches("/slash", None), Some(MatchType::Include));
- assert_eq!(matchlist.matches("slash-a", None), None);
+ assert_eq!(matchlist.matches("slash", None), Ok(None));
+ assert_eq!(
+ matchlist.matches("/slash", None),
+ Ok(Some(MatchType::Include))
+ );
+ assert_eq!(matchlist.matches("slash-a", None), Ok(None));
assert_eq!(
matchlist.matches("/slash-a", None),
- Some(MatchType::Include)
+ Ok(Some(MatchType::Include))
);
assert_eq!(
matchlist.matches("foo/noslash", None),
- Some(MatchType::Include)
+ Ok(Some(MatchType::Include))
+ );
+ assert_eq!(matchlist.matches("foo/noslash-a", None), Ok(None));
+ assert_eq!(matchlist.matches("foo/slash", None), Ok(None));
+ assert_eq!(matchlist.matches("foo/slash-a", None), Ok(None));
+}
+
+#[test]
+fn matches_path() {
+ use crate::Pattern;
+
+ let matchlist = vec![
+ MatchEntry::new(Pattern::path("a*").unwrap(), MatchType::Exclude),
+ MatchEntry::new(Pattern::path("b*").unwrap(), MatchType::Exclude),
+ ];
+
+ assert_eq!(
+ matchlist.matches("ahsjdj", || Err(FileModeRequired {})),
+ Ok(Some(MatchType::Exclude))
+ );
+ let mut test = 1;
+ let result = matchlist.matches("bhshdf", || {
+ test += 1;
+ Ok::<u32, FileModeRequired>(libc::S_IFDIR)
+ });
+ assert_eq!(result, Ok(Some(MatchType::Exclude)));
+ assert_eq!(test, 1);
+
+ let matchlist = vec![
+ MatchEntry::new(Pattern::path("a*").unwrap(), MatchType::Exclude)
+ .flags(MatchFlag::MATCH_DIRECTORIES),
+ MatchEntry::new(Pattern::path("b*").unwrap(), MatchType::Exclude),
+ ];
+
+ let mut test = 1;
+ let result = matchlist.matches("aa", || {
+ test += 1;
+ Ok::<u32, FileModeRequired>(libc::S_IFDIR)
+ });
+ assert_eq!(result, Ok(Some(MatchType::Exclude)));
+ assert_eq!(test, 2);
+ assert_eq!(
+ matchlist.matches("ahsjdj", || Err(FileModeRequired {})),
+ Err(FileModeRequired {})
+ );
+ assert_eq!(
+ matchlist.matches("bhshdf", || Err(FileModeRequired {})),
+ Ok(Some(MatchType::Exclude))
+ );
+ assert_eq!(
+ matchlist.matches("ahsjdj", || Ok::<u32, FileModeRequired>(libc::S_IFREG)),
+ Ok(None)
+ );
+
+ let matchlist = vec![
+ MatchEntry::new(Pattern::path("b*").unwrap(), MatchType::Include),
+ MatchEntry::new(Pattern::path("a*").unwrap(), MatchType::Exclude)
+ .flags(MatchFlag::MATCH_DIRECTORIES),
+ ];
+ assert_eq!(
+ matchlist.matches("ahsjdj", || Err(FileModeRequired {})),
+ Err(FileModeRequired {})
+ );
+ assert_eq!(
+ matchlist.matches("bhshdf", || Err(FileModeRequired {})),
+ Err(FileModeRequired {})
+ );
+ assert_eq!(
+ matchlist.matches("ahsjdj", || Ok::<u32, FileModeRequired>(libc::S_IFDIR)),
+ Ok(Some(MatchType::Exclude))
+ );
+ assert_eq!(
+ matchlist.matches("ahsjdj", || Ok::<u32, FileModeRequired>(libc::S_IFREG)),
+ Ok(None)
+ );
+ assert_eq!(
+ matchlist.matches("bbb", || Ok::<u32, FileModeRequired>(libc::S_IFREG)),
+ Ok(Some(MatchType::Include))
);
- assert_eq!(matchlist.matches("foo/noslash-a", None), None);
- assert_eq!(matchlist.matches("foo/slash", None), None);
- assert_eq!(matchlist.matches("foo/slash-a", None), None);
+
+ let matchlist = vec![
+ MatchEntry::new(Pattern::path("a*").unwrap(), MatchType::Exclude)
+ .flags(MatchFlag::MATCH_DIRECTORIES),
+ ];
+
+ assert_eq!(
+ matchlist.matches("bbb", || Ok::<u32, FileModeRequired>(libc::S_IFDIR)),
+ Ok(None)
+ );
+
+ let matchlist = vec![
+ MatchEntry::new(Pattern::path("a*").unwrap(), MatchType::Exclude)
+ .flags(MatchFlag::MATCH_DIRECTORIES),
+ MatchEntry::new(Pattern::path("b*").unwrap(), MatchType::Exclude)
+ .flags(MatchFlag::MATCH_REGULAR_FILES),
+ ];
+
+ assert_eq!(
+ matchlist.matches("ahsjdj", || Ok::<u32, FileModeRequired>(libc::S_IFDIR)),
+ Ok(Some(MatchType::Exclude))
+ );
+ assert_eq!(
+ matchlist.matches("ahsjdj", || Ok::<u32, FileModeRequired>(libc::S_IFREG)),
+ Ok(None)
+ );
+ assert_eq!(
+ matchlist.matches("bhsjdj", || Ok::<u32, FileModeRequired>(libc::S_IFREG)),
+ Ok(Some(MatchType::Exclude))
+ );
+ assert_eq!(
+ matchlist.matches("bhsjdj", libc::S_IFREG as u32),
+ Ok(Some(MatchType::Exclude))
+ );
+ assert_eq!(matchlist.matches("bhsjdj", 0), Ok(None));
+}
+
+#[test]
+fn match_entry_needs_flag() {
+ use crate::Pattern;
+ let match_entry =
+ MatchEntry::parse_pattern("a*/", PatternFlag::PATH_NAME, MatchType::Exclude).unwrap();
+ assert_eq!(match_entry.needs_file_mode(), true);
+
+ let match_entry = MatchEntry::new(Pattern::path("a*").unwrap(), MatchType::Exclude)
+ .flags(MatchFlag::MATCH_REGULAR_FILES);
+ assert_eq!(match_entry.needs_file_mode(), true);
+
+ let match_entry = MatchEntry::new(Pattern::path("a*").unwrap(), MatchType::Exclude)
+ .flags(MatchFlag::MATCH_DIRECTORIES | MatchFlag::MATCH_REGULAR_FILES);
+ assert_eq!(match_entry.needs_file_mode(), true);
+
+ let match_entry = MatchEntry::new(Pattern::path("a*").unwrap(), MatchType::Exclude)
+ .flags(MatchFlag::ANCHORED | MatchFlag::MATCH_DIRECTORIES);
+ assert_eq!(match_entry.needs_file_mode(), true);
+
+ let match_entry = MatchEntry::new(Pattern::path("a*").unwrap(), MatchType::Exclude)
+ .flags(MatchFlag::ANY_FILE_TYPE);
+ assert_eq!(match_entry.needs_file_mode(), false);
}
--
2.39.2
^ permalink raw reply [flat|nested] 5+ messages in thread
* [pbs-devel] [PATCH proxmox-backup v6] fix #4380: check if file is excluded before running `stat()`
2023-08-21 13:08 [pbs-devel] [PATCH pathpatterns v4] match_list: updated `matches()`, to only retrieve file mode if necessary Gabriel Goller
@ 2023-08-21 13:08 ` Gabriel Goller
2023-08-22 13:04 ` Wolfgang Bumiller
2023-08-22 13:07 ` [pbs-devel] [PATCH pathpatterns v4] match_list: updated `matches()`, to only retrieve file mode if necessary Wolfgang Bumiller
1 sibling, 1 reply; 5+ messages in thread
From: Gabriel Goller @ 2023-08-21 13:08 UTC (permalink / raw)
To: pbs-devel
Passed a closure with the `stat()` function call to `matches()`. This
will traverse through all patterns and try to match using the path only, if a
`file_mode` is needed, it will run the closure. This means that if we exclude
a file with the `MatchType::ANY_FILE_TYPE`, we will skip it without running
`stat()` on it. As we updated the `matches()` function, we also updated all the
invocations of it.
Added `pathpatterns` crate to local overrides in cargo.toml.
Signed-off-by: Gabriel Goller <g.goller@proxmox.com>
---
changes v5:
- updated all invocations of `matches()`
changes v4:
- match only by path and exclude the matched files, the run `stat()` and
match again, this time using the `file_mode`. This will match everything
twice in the worst case, which is not optimal.
changes v3:
- checking for `read` and `execute` permissions before entering directory,
doesn't work because there are a lot of side-effects (executed by
different user, AppArmor, SELinux, ...).
changes v2:
- checking for excluded files with `matches()` before executing `stat()`,
this doesn't work because we get the file_mode from `stat()` and don't
want to ignore it when matching.
Cargo.toml | 5 +++--
pbs-client/src/catalog_shell.rs | 8 +++----
pbs-client/src/pxar/create.rs | 38 ++++++++++++++++++++-------------
| 10 +++++----
pbs-datastore/src/catalog.rs | 6 +++---
5 files changed, 39 insertions(+), 28 deletions(-)
diff --git a/Cargo.toml b/Cargo.toml
index 5cbae1b8..560794a4 100644
--- a/Cargo.toml
+++ b/Cargo.toml
@@ -264,8 +264,9 @@ proxmox-rrd.workspace = true
#proxmox-sortable-macro = { path = "../proxmox/proxmox-sortable-macro" }
#proxmox-human-byte = { path = "../proxmox/proxmox-human-byte" }
-#proxmox-apt = { path = "../proxmox/proxmox-apt" }
-#proxmox-openid = { path = "../proxmox/proxmox-openid" }
+#proxmox-apt = { path = "../proxmox-apt" }
+#proxmox-openid = { path = "../proxmox-openid-rs" }
+#pathpatterns = {path = "../pathpatterns" }
#pxar = { path = "../pxar" }
diff --git a/pbs-client/src/catalog_shell.rs b/pbs-client/src/catalog_shell.rs
index b8aaf8cb..f53b3cc5 100644
--- a/pbs-client/src/catalog_shell.rs
+++ b/pbs-client/src/catalog_shell.rs
@@ -1138,14 +1138,14 @@ impl<'a> ExtractorState<'a> {
pub async fn handle_entry(&mut self, entry: catalog::DirEntry) -> Result<(), Error> {
let match_result = self.match_list.matches(&self.path, entry.get_file_mode());
let did_match = match match_result {
- Some(MatchType::Include) => true,
- Some(MatchType::Exclude) => false,
- None => self.matches,
+ Ok(Some(MatchType::Include)) => true,
+ Ok(Some(MatchType::Exclude)) => false,
+ _ => self.matches,
};
match (did_match, &entry.attr) {
(_, DirEntryAttribute::Directory { .. }) => {
- self.handle_new_directory(entry, match_result).await?;
+ self.handle_new_directory(entry, match_result?).await?;
}
(true, DirEntryAttribute::File { .. }) => {
self.dir_stack.push(PathStackEntry::new(entry));
diff --git a/pbs-client/src/pxar/create.rs b/pbs-client/src/pxar/create.rs
index 2577cf98..2d516cfa 100644
--- a/pbs-client/src/pxar/create.rs
+++ b/pbs-client/src/pxar/create.rs
@@ -21,7 +21,6 @@ use pxar::Metadata;
use proxmox_io::vec;
use proxmox_lang::c_str;
-use proxmox_sys::error::SysError;
use proxmox_sys::fs::{self, acl, xattr};
use pbs_datastore::catalog::BackupCatalogWriter;
@@ -420,7 +419,7 @@ impl Archiver {
for file in dir.iter() {
let file = file?;
- let file_name = file.file_name().to_owned();
+ let file_name = file.file_name();
let file_name_bytes = file_name.to_bytes();
if file_name_bytes == b"." || file_name_bytes == b".." {
continue;
@@ -434,25 +433,34 @@ impl Archiver {
assert_single_path_component(os_file_name)?;
let full_path = self.path.join(os_file_name);
- let stat = match nix::sys::stat::fstatat(
+ let match_path = PathBuf::from("/").join(full_path.clone());
+
+ let mut stat_results: Option<FileStat> = None;
+
+ let get_file_mode = || match nix::sys::stat::fstatat(
dir_fd,
- file_name.as_c_str(),
+ file_name.to_owned().as_c_str(),
nix::fcntl::AtFlags::AT_SYMLINK_NOFOLLOW,
) {
- Ok(stat) => stat,
- Err(ref err) if err.not_found() => continue,
- Err(err) => return Err(err).context(format!("stat failed on {:?}", full_path)),
+ Ok(stat) => Ok(stat),
+ Err(e) => Err(e),
};
-
- let match_path = PathBuf::from("/").join(full_path.clone());
- if self
- .patterns
- .matches(match_path.as_os_str().as_bytes(), Some(stat.st_mode))
- == Some(MatchType::Exclude)
+ if Some(MatchType::Exclude)
+ == self
+ .patterns
+ .matches(match_path.as_os_str().as_bytes(), || {
+ Ok::<_, Errno>(match &stat_results {
+ Some(result) => result.st_mode,
+ None => stat_results.insert(get_file_mode()?).st_mode,
+ })
+ })
+ .with_context(|| format!("stat failed on {full_path:?}"))?
{
continue;
}
+ let stat = stat_results.map(Ok).unwrap_or_else(get_file_mode)?;
+
self.entry_counter += 1;
if self.entry_counter > self.entry_limit {
bail!(
@@ -462,7 +470,7 @@ impl Archiver {
}
file_list.push(FileListEntry {
- name: file_name,
+ name: file_name.to_owned(),
path: full_path,
stat,
});
@@ -533,7 +541,7 @@ impl Archiver {
let match_path = PathBuf::from("/").join(self.path.clone());
if self
.patterns
- .matches(match_path.as_os_str().as_bytes(), Some(stat.st_mode))
+ .matches(match_path.as_os_str().as_bytes(), stat.st_mode)?
== Some(MatchType::Exclude)
{
return Ok(());
--git a/pbs-client/src/pxar/extract.rs b/pbs-client/src/pxar/extract.rs
index 4eb6fb90..e24a1560 100644
--- a/pbs-client/src/pxar/extract.rs
+++ b/pbs-client/src/pxar/extract.rs
@@ -251,22 +251,24 @@ where
self.extractor.set_path(entry.path().as_os_str().to_owned());
+ // We can `unwrap()` safely here because we get a `Result<_, std::convert::Infallible>`
let match_result = self.match_list.matches(
entry.path().as_os_str().as_bytes(),
- Some(metadata.file_type() as u32),
- );
+ metadata.file_type() as u32
+ ).unwrap();
let did_match = match match_result {
Some(MatchType::Include) => true,
Some(MatchType::Exclude) => false,
- None => self.state.current_match,
+ _ => self.state.current_match,
};
let extract_res = match (did_match, entry.kind()) {
(_, EntryKind::Directory) => {
self.callback(entry.path());
- let create = self.state.current_match && match_result != Some(MatchType::Exclude);
+ let create =
+ self.state.current_match && match_result != Some(MatchType::Exclude);
let res = self
.extractor
.enter_directory(file_name_os.to_owned(), metadata.clone(), create)
diff --git a/pbs-datastore/src/catalog.rs b/pbs-datastore/src/catalog.rs
index 11c14b64..86e20c92 100644
--- a/pbs-datastore/src/catalog.rs
+++ b/pbs-datastore/src/catalog.rs
@@ -678,9 +678,9 @@ impl<R: Read + Seek> CatalogReader<R> {
}
file_path.extend(&e.name);
match match_list.matches(&file_path, e.get_file_mode()) {
- Some(MatchType::Exclude) => continue,
- Some(MatchType::Include) => callback(file_path)?,
- None => (),
+ Ok(Some(MatchType::Exclude)) => continue,
+ Ok(Some(MatchType::Include)) => callback(file_path)?,
+ _ => (),
}
if is_dir {
self.find(&e, file_path, match_list, callback)?;
--
2.39.2
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [pbs-devel] [PATCH proxmox-backup v6] fix #4380: check if file is excluded before running `stat()`
2023-08-21 13:08 ` [pbs-devel] [PATCH proxmox-backup v6] fix #4380: check if file is excluded before running `stat()` Gabriel Goller
@ 2023-08-22 13:04 ` Wolfgang Bumiller
2023-08-22 14:07 ` Gabriel Goller
0 siblings, 1 reply; 5+ messages in thread
From: Wolfgang Bumiller @ 2023-08-22 13:04 UTC (permalink / raw)
To: Gabriel Goller; +Cc: pbs-devel
On Mon, Aug 21, 2023 at 03:08:26PM +0200, Gabriel Goller wrote:
> Passed a closure with the `stat()` function call to `matches()`. This
> will traverse through all patterns and try to match using the path only, if a
> `file_mode` is needed, it will run the closure. This means that if we exclude
> a file with the `MatchType::ANY_FILE_TYPE`, we will skip it without running
> `stat()` on it. As we updated the `matches()` function, we also updated all the
> invocations of it.
> Added `pathpatterns` crate to local overrides in cargo.toml.
>
> Signed-off-by: Gabriel Goller <g.goller@proxmox.com>
> ---
>
> changes v5:
> - updated all invocations of `matches()`
>
> changes v4:
> - match only by path and exclude the matched files, the run `stat()` and
> match again, this time using the `file_mode`. This will match everything
> twice in the worst case, which is not optimal.
> changes v3:
> - checking for `read` and `execute` permissions before entering directory,
> doesn't work because there are a lot of side-effects (executed by
> different user, AppArmor, SELinux, ...).
> changes v2:
> - checking for excluded files with `matches()` before executing `stat()`,
> this doesn't work because we get the file_mode from `stat()` and don't
> want to ignore it when matching.
>
>
> Cargo.toml | 5 +++--
> pbs-client/src/catalog_shell.rs | 8 +++----
> pbs-client/src/pxar/create.rs | 38 ++++++++++++++++++++-------------
> pbs-client/src/pxar/extract.rs | 10 +++++----
> pbs-datastore/src/catalog.rs | 6 +++---
> 5 files changed, 39 insertions(+), 28 deletions(-)
>
> diff --git a/Cargo.toml b/Cargo.toml
> index 5cbae1b8..560794a4 100644
> --- a/Cargo.toml
> +++ b/Cargo.toml
> @@ -264,8 +264,9 @@ proxmox-rrd.workspace = true
> #proxmox-sortable-macro = { path = "../proxmox/proxmox-sortable-macro" }
> #proxmox-human-byte = { path = "../proxmox/proxmox-human-byte" }
>
> -#proxmox-apt = { path = "../proxmox/proxmox-apt" }
> -#proxmox-openid = { path = "../proxmox/proxmox-openid" }
> +#proxmox-apt = { path = "../proxmox-apt" }
> +#proxmox-openid = { path = "../proxmox-openid-rs" }
> +#pathpatterns = {path = "../pathpatterns" }
>
> #pxar = { path = "../pxar" }
>
> diff --git a/pbs-client/src/catalog_shell.rs b/pbs-client/src/catalog_shell.rs
> index b8aaf8cb..f53b3cc5 100644
> --- a/pbs-client/src/catalog_shell.rs
> +++ b/pbs-client/src/catalog_shell.rs
> @@ -1138,14 +1138,14 @@ impl<'a> ExtractorState<'a> {
> pub async fn handle_entry(&mut self, entry: catalog::DirEntry) -> Result<(), Error> {
> let match_result = self.match_list.matches(&self.path, entry.get_file_mode());
> let did_match = match match_result {
> - Some(MatchType::Include) => true,
> - Some(MatchType::Exclude) => false,
> - None => self.matches,
> + Ok(Some(MatchType::Include)) => true,
> + Ok(Some(MatchType::Exclude)) => false,
> + _ => self.matches,
> };
>
> match (did_match, &entry.attr) {
> (_, DirEntryAttribute::Directory { .. }) => {
> - self.handle_new_directory(entry, match_result).await?;
> + self.handle_new_directory(entry, match_result?).await?;
> }
> (true, DirEntryAttribute::File { .. }) => {
> self.dir_stack.push(PathStackEntry::new(entry));
> diff --git a/pbs-client/src/pxar/create.rs b/pbs-client/src/pxar/create.rs
> index 2577cf98..2d516cfa 100644
> --- a/pbs-client/src/pxar/create.rs
> +++ b/pbs-client/src/pxar/create.rs
> @@ -21,7 +21,6 @@ use pxar::Metadata;
>
> use proxmox_io::vec;
> use proxmox_lang::c_str;
> -use proxmox_sys::error::SysError;
> use proxmox_sys::fs::{self, acl, xattr};
>
> use pbs_datastore::catalog::BackupCatalogWriter;
> @@ -420,7 +419,7 @@ impl Archiver {
> for file in dir.iter() {
> let file = file?;
>
> - let file_name = file.file_name().to_owned();
> + let file_name = file.file_name();
> let file_name_bytes = file_name.to_bytes();
> if file_name_bytes == b"." || file_name_bytes == b".." {
> continue;
> @@ -434,25 +433,34 @@ impl Archiver {
> assert_single_path_component(os_file_name)?;
> let full_path = self.path.join(os_file_name);
>
> - let stat = match nix::sys::stat::fstatat(
> + let match_path = PathBuf::from("/").join(full_path.clone());
> +
> + let mut stat_results: Option<FileStat> = None;
> +
> + let get_file_mode = || match nix::sys::stat::fstatat(
You don't need that 'match' here if your cases are basically a no-op.
You *could* attach the `stat failed on ...` context with the file name
here... but this will make it a bit more tedious to check for `ENOENT`
as we'd need to use `.downcast()` on the anyhow::Error.
So either that, or we'll need to duplicate the context line down below
where we do the final `let stat = results.unwrap_or_else(get_file_mode)?`.
> dir_fd,
> - file_name.as_c_str(),
> + file_name.to_owned().as_c_str(),
> nix::fcntl::AtFlags::AT_SYMLINK_NOFOLLOW,
> ) {
> - Ok(stat) => stat,
> - Err(ref err) if err.not_found() => continue,
Because the thing is, we still need to handle this case - just noticed
that this was removed entirely, which is of course not what we want :-)
If the file gets removed between listing it from the directory and us
trying to `stat` it, we should just act as if it had never existed.
> - Err(err) => return Err(err).context(format!("stat failed on {:?}", full_path)),
> + Ok(stat) => Ok(stat),
> + Err(e) => Err(e),
> };
> -
> - let match_path = PathBuf::from("/").join(full_path.clone());
> - if self
> - .patterns
> - .matches(match_path.as_os_str().as_bytes(), Some(stat.st_mode))
> - == Some(MatchType::Exclude)
> + if Some(MatchType::Exclude)
> + == self
> + .patterns
> + .matches(match_path.as_os_str().as_bytes(), || {
> + Ok::<_, Errno>(match &stat_results {
> + Some(result) => result.st_mode,
> + None => stat_results.insert(get_file_mode()?).st_mode,
> + })
> + })
... it will instead need to be checked here ^
Basically, we will need to swap the `if` for a `match` after all, with
the cases
Ok(Some(MatchType::Exclude)) => continue,
Ok(_) => (),
Err(err) if err.not_found() => continue,
err => return err.with_context(...),
or the final 2 cases when using *not* not changing the `get_file_mode`
error to `anyhow::Error` would be:
Err(err) => match err.downcast::<io::Error>() {
Some(err) if err.not_found() => continue,
_ => return Err(err),
}
I hope I didn't miss anything now.
> + .with_context(|| format!("stat failed on {full_path:?}"))?
> {
> continue;
> }
>
> + let stat = stat_results.map(Ok).unwrap_or_else(get_file_mode)?;
If the context line is not already in the closure, we'd need to
duplicate it here.
> +
> self.entry_counter += 1;
> if self.entry_counter > self.entry_limit {
> bail!(
> @@ -462,7 +470,7 @@ impl Archiver {
> }
>
> file_list.push(FileListEntry {
> - name: file_name,
> + name: file_name.to_owned(),
> path: full_path,
> stat,
> });
> @@ -533,7 +541,7 @@ impl Archiver {
> let match_path = PathBuf::from("/").join(self.path.clone());
> if self
> .patterns
> - .matches(match_path.as_os_str().as_bytes(), Some(stat.st_mode))
> + .matches(match_path.as_os_str().as_bytes(), stat.st_mode)?
> == Some(MatchType::Exclude)
> {
> return Ok(());
> diff --git a/pbs-client/src/pxar/extract.rs b/pbs-client/src/pxar/extract.rs
> index 4eb6fb90..e24a1560 100644
> --- a/pbs-client/src/pxar/extract.rs
> +++ b/pbs-client/src/pxar/extract.rs
> @@ -251,22 +251,24 @@ where
>
> self.extractor.set_path(entry.path().as_os_str().to_owned());
>
> + // We can `unwrap()` safely here because we get a `Result<_, std::convert::Infallible>`
> let match_result = self.match_list.matches(
> entry.path().as_os_str().as_bytes(),
> - Some(metadata.file_type() as u32),
> - );
> + metadata.file_type() as u32
> + ).unwrap();
>
> let did_match = match match_result {
> Some(MatchType::Include) => true,
> Some(MatchType::Exclude) => false,
> - None => self.state.current_match,
> + _ => self.state.current_match,
> };
>
> let extract_res = match (did_match, entry.kind()) {
> (_, EntryKind::Directory) => {
> self.callback(entry.path());
>
> - let create = self.state.current_match && match_result != Some(MatchType::Exclude);
> + let create =
> + self.state.current_match && match_result != Some(MatchType::Exclude);
> let res = self
> .extractor
> .enter_directory(file_name_os.to_owned(), metadata.clone(), create)
> diff --git a/pbs-datastore/src/catalog.rs b/pbs-datastore/src/catalog.rs
> index 11c14b64..86e20c92 100644
> --- a/pbs-datastore/src/catalog.rs
> +++ b/pbs-datastore/src/catalog.rs
> @@ -678,9 +678,9 @@ impl<R: Read + Seek> CatalogReader<R> {
> }
> file_path.extend(&e.name);
> match match_list.matches(&file_path, e.get_file_mode()) {
> - Some(MatchType::Exclude) => continue,
> - Some(MatchType::Include) => callback(file_path)?,
> - None => (),
> + Ok(Some(MatchType::Exclude)) => continue,
> + Ok(Some(MatchType::Include)) => callback(file_path)?,
> + _ => (),
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [pbs-devel] [PATCH proxmox-backup v6] fix #4380: check if file is excluded before running `stat()`
2023-08-22 13:04 ` Wolfgang Bumiller
@ 2023-08-22 14:07 ` Gabriel Goller
0 siblings, 0 replies; 5+ messages in thread
From: Gabriel Goller @ 2023-08-22 14:07 UTC (permalink / raw)
To: Wolfgang Bumiller; +Cc: pbs-devel
Submitted a new version.
On 8/22/23 15:04, Wolfgang Bumiller wrote:
> On Mon, Aug 21, 2023 at 03:08:26PM +0200, Gabriel Goller wrote:
>> Passed a closure with the `stat()` function call to `matches()`. This
>> will traverse through all patterns and try to match using the path only, if a
>> `file_mode` is needed, it will run the closure. This means that if we exclude
>> a file with the `MatchType::ANY_FILE_TYPE`, we will skip it without running
>> `stat()` on it. As we updated the `matches()` function, we also updated all the
>> invocations of it.
>> Added `pathpatterns` crate to local overrides in cargo.toml.
>>
>> Signed-off-by: Gabriel Goller <g.goller@proxmox.com>
>> ---
>>
>> changes v5:
>> - updated all invocations of `matches()`
>>
>> changes v4:
>> - match only by path and exclude the matched files, the run `stat()` and
>> match again, this time using the `file_mode`. This will match everything
>> twice in the worst case, which is not optimal.
>> changes v3:
>> - checking for `read` and `execute` permissions before entering directory,
>> doesn't work because there are a lot of side-effects (executed by
>> different user, AppArmor, SELinux, ...).
>> changes v2:
>> - checking for excluded files with `matches()` before executing `stat()`,
>> this doesn't work because we get the file_mode from `stat()` and don't
>> want to ignore it when matching.
>>
>>
>> Cargo.toml | 5 +++--
>> pbs-client/src/catalog_shell.rs | 8 +++----
>> pbs-client/src/pxar/create.rs | 38 ++++++++++++++++++++-------------
>> pbs-client/src/pxar/extract.rs | 10 +++++----
>> pbs-datastore/src/catalog.rs | 6 +++---
>> 5 files changed, 39 insertions(+), 28 deletions(-)
>>
>> diff --git a/Cargo.toml b/Cargo.toml
>> index 5cbae1b8..560794a4 100644
>> --- a/Cargo.toml
>> +++ b/Cargo.toml
>> @@ -264,8 +264,9 @@ proxmox-rrd.workspace = true
>> #proxmox-sortable-macro = { path = "../proxmox/proxmox-sortable-macro" }
>> #proxmox-human-byte = { path = "../proxmox/proxmox-human-byte" }
>>
>> -#proxmox-apt = { path = "../proxmox/proxmox-apt" }
>> -#proxmox-openid = { path = "../proxmox/proxmox-openid" }
>> +#proxmox-apt = { path = "../proxmox-apt" }
>> +#proxmox-openid = { path = "../proxmox-openid-rs" }
>> +#pathpatterns = {path = "../pathpatterns" }
>>
>> #pxar = { path = "../pxar" }
>>
>> diff --git a/pbs-client/src/catalog_shell.rs b/pbs-client/src/catalog_shell.rs
>> index b8aaf8cb..f53b3cc5 100644
>> --- a/pbs-client/src/catalog_shell.rs
>> +++ b/pbs-client/src/catalog_shell.rs
>> @@ -1138,14 +1138,14 @@ impl<'a> ExtractorState<'a> {
>> pub async fn handle_entry(&mut self, entry: catalog::DirEntry) -> Result<(), Error> {
>> let match_result = self.match_list.matches(&self.path, entry.get_file_mode());
>> let did_match = match match_result {
>> - Some(MatchType::Include) => true,
>> - Some(MatchType::Exclude) => false,
>> - None => self.matches,
>> + Ok(Some(MatchType::Include)) => true,
>> + Ok(Some(MatchType::Exclude)) => false,
>> + _ => self.matches,
>> };
>>
>> match (did_match, &entry.attr) {
>> (_, DirEntryAttribute::Directory { .. }) => {
>> - self.handle_new_directory(entry, match_result).await?;
>> + self.handle_new_directory(entry, match_result?).await?;
>> }
>> (true, DirEntryAttribute::File { .. }) => {
>> self.dir_stack.push(PathStackEntry::new(entry));
>> diff --git a/pbs-client/src/pxar/create.rs b/pbs-client/src/pxar/create.rs
>> index 2577cf98..2d516cfa 100644
>> --- a/pbs-client/src/pxar/create.rs
>> +++ b/pbs-client/src/pxar/create.rs
>> @@ -21,7 +21,6 @@ use pxar::Metadata;
>>
>> use proxmox_io::vec;
>> use proxmox_lang::c_str;
>> -use proxmox_sys::error::SysError;
>> use proxmox_sys::fs::{self, acl, xattr};
>>
>> use pbs_datastore::catalog::BackupCatalogWriter;
>> @@ -420,7 +419,7 @@ impl Archiver {
>> for file in dir.iter() {
>> let file = file?;
>>
>> - let file_name = file.file_name().to_owned();
>> + let file_name = file.file_name();
>> let file_name_bytes = file_name.to_bytes();
>> if file_name_bytes == b"." || file_name_bytes == b".." {
>> continue;
>> @@ -434,25 +433,34 @@ impl Archiver {
>> assert_single_path_component(os_file_name)?;
>> let full_path = self.path.join(os_file_name);
>>
>> - let stat = match nix::sys::stat::fstatat(
>> + let match_path = PathBuf::from("/").join(full_path.clone());
>> +
>> + let mut stat_results: Option<FileStat> = None;
>> +
>> + let get_file_mode = || match nix::sys::stat::fstatat(
> You don't need that 'match' here if your cases are basically a no-op.
>
> You *could* attach the `stat failed on ...` context with the file name
> here... but this will make it a bit more tedious to check for `ENOENT`
> as we'd need to use `.downcast()` on the anyhow::Error.
>
> So either that, or we'll need to duplicate the context line down below
> where we do the final `let stat = results.unwrap_or_else(get_file_mode)?`.
>
>> dir_fd,
>> - file_name.as_c_str(),
>> + file_name.to_owned().as_c_str(),
>> nix::fcntl::AtFlags::AT_SYMLINK_NOFOLLOW,
>> ) {
>> - Ok(stat) => stat,
>> - Err(ref err) if err.not_found() => continue,
> Because the thing is, we still need to handle this case - just noticed
> that this was removed entirely, which is of course not what we want :-)
> If the file gets removed between listing it from the directory and us
> trying to `stat` it, we should just act as if it had never existed.
>
>> - Err(err) => return Err(err).context(format!("stat failed on {:?}", full_path)),
>> + Ok(stat) => Ok(stat),
>> + Err(e) => Err(e),
>> };
>> -
>> - let match_path = PathBuf::from("/").join(full_path.clone());
>> - if self
>> - .patterns
>> - .matches(match_path.as_os_str().as_bytes(), Some(stat.st_mode))
>> - == Some(MatchType::Exclude)
>> + if Some(MatchType::Exclude)
>> + == self
>> + .patterns
>> + .matches(match_path.as_os_str().as_bytes(), || {
>> + Ok::<_, Errno>(match &stat_results {
>> + Some(result) => result.st_mode,
>> + None => stat_results.insert(get_file_mode()?).st_mode,
>> + })
>> + })
> ... it will instead need to be checked here ^
> Basically, we will need to swap the `if` for a `match` after all, with
> the cases
> Ok(Some(MatchType::Exclude)) => continue,
> Ok(_) => (),
> Err(err) if err.not_found() => continue,
> err => return err.with_context(...),
> or the final 2 cases when using *not* not changing the `get_file_mode`
> error to `anyhow::Error` would be:
> Err(err) => match err.downcast::<io::Error>() {
> Some(err) if err.not_found() => continue,
> _ => return Err(err),
> }
>
>
> I hope I didn't miss anything now.
>
>> + .with_context(|| format!("stat failed on {full_path:?}"))?
>> {
>> continue;
>> }
>>
>> + let stat = stat_results.map(Ok).unwrap_or_else(get_file_mode)?;
> If the context line is not already in the closure, we'd need to
> duplicate it here.
>
>> +
>> self.entry_counter += 1;
>> if self.entry_counter > self.entry_limit {
>> bail!(
>> @@ -462,7 +470,7 @@ impl Archiver {
>> }
>>
>> file_list.push(FileListEntry {
>> - name: file_name,
>> + name: file_name.to_owned(),
>> path: full_path,
>> stat,
>> });
>> @@ -533,7 +541,7 @@ impl Archiver {
>> let match_path = PathBuf::from("/").join(self.path.clone());
>> if self
>> .patterns
>> - .matches(match_path.as_os_str().as_bytes(), Some(stat.st_mode))
>> + .matches(match_path.as_os_str().as_bytes(), stat.st_mode)?
>> == Some(MatchType::Exclude)
>> {
>> return Ok(());
>> diff --git a/pbs-client/src/pxar/extract.rs b/pbs-client/src/pxar/extract.rs
>> index 4eb6fb90..e24a1560 100644
>> --- a/pbs-client/src/pxar/extract.rs
>> +++ b/pbs-client/src/pxar/extract.rs
>> @@ -251,22 +251,24 @@ where
>>
>> self.extractor.set_path(entry.path().as_os_str().to_owned());
>>
>> + // We can `unwrap()` safely here because we get a `Result<_, std::convert::Infallible>`
>> let match_result = self.match_list.matches(
>> entry.path().as_os_str().as_bytes(),
>> - Some(metadata.file_type() as u32),
>> - );
>> + metadata.file_type() as u32
>> + ).unwrap();
>>
>> let did_match = match match_result {
>> Some(MatchType::Include) => true,
>> Some(MatchType::Exclude) => false,
>> - None => self.state.current_match,
>> + _ => self.state.current_match,
>> };
>>
>> let extract_res = match (did_match, entry.kind()) {
>> (_, EntryKind::Directory) => {
>> self.callback(entry.path());
>>
>> - let create = self.state.current_match && match_result != Some(MatchType::Exclude);
>> + let create =
>> + self.state.current_match && match_result != Some(MatchType::Exclude);
>> let res = self
>> .extractor
>> .enter_directory(file_name_os.to_owned(), metadata.clone(), create)
>> diff --git a/pbs-datastore/src/catalog.rs b/pbs-datastore/src/catalog.rs
>> index 11c14b64..86e20c92 100644
>> --- a/pbs-datastore/src/catalog.rs
>> +++ b/pbs-datastore/src/catalog.rs
>> @@ -678,9 +678,9 @@ impl<R: Read + Seek> CatalogReader<R> {
>> }
>> file_path.extend(&e.name);
>> match match_list.matches(&file_path, e.get_file_mode()) {
>> - Some(MatchType::Exclude) => continue,
>> - Some(MatchType::Include) => callback(file_path)?,
>> - None => (),
>> + Ok(Some(MatchType::Exclude)) => continue,
>> + Ok(Some(MatchType::Include)) => callback(file_path)?,
>> + _ => (),
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [pbs-devel] [PATCH pathpatterns v4] match_list: updated `matches()`, to only retrieve file mode if necessary
2023-08-21 13:08 [pbs-devel] [PATCH pathpatterns v4] match_list: updated `matches()`, to only retrieve file mode if necessary Gabriel Goller
2023-08-21 13:08 ` [pbs-devel] [PATCH proxmox-backup v6] fix #4380: check if file is excluded before running `stat()` Gabriel Goller
@ 2023-08-22 13:07 ` Wolfgang Bumiller
1 sibling, 0 replies; 5+ messages in thread
From: Wolfgang Bumiller @ 2023-08-22 13:07 UTC (permalink / raw)
To: Gabriel Goller; +Cc: pbs-devel
LGTM now, 1 minor nit below since the other patch needs another version
anyway...
On Mon, Aug 21, 2023 at 03:08:25PM +0200, Gabriel Goller wrote:
> Updated `matches()` function, which now takes the `GetFileMode` trait and returns a
> Result. `GetFileMode` is a function that should return the `file_mode`. `matches()`
> will go through the patterns and match by path only until it finds a pattern which
> does not have `MatchFlag::ANY_FILE_TYPE`, in case it will call the
> `get_file_mode`, which will return a `file_mode`. This will ensure that the
> `get()` (which in our case executes `stat()`) will only be run once(at most),
> every pattern will only be processed once and that the order of the patterns
> will be respected. `GetFileMode` is also implemented for `Option<u32>` (so that
> we can simply pass options of the `file_mode`) and `u32` (returning
> `std::convert::Infallible`, so that we can pass a u32 mode directly).
>
> Signed-off-by: Gabriel Goller <g.goller@proxmox.com>
> ---
>
> changes v3:
> - changed `matches()` and `matches_exact()` to retrieve the file_mode lazily.
> Allowed these functions to take in a trait `GetFileMode`, so that we can pass
> closures and options that contain the `file_mode`.
>
> changes v2:
> - updated the `matches_path()` function to take a closure and renamed
> it to `matches_mode_lazy()`. This allows us to only match once, so we
> don't need to match before and after `stat()` (without and with
> `file_mode`) anymore.
>
> changes v1:
> - added `matches_path()` function, which matches by path only and returns an
> error if the `file_mode` is required.
>
>
> src/lib.rs | 72 +++++-----
> src/match_list.rs | 335 ++++++++++++++++++++++++++++++++++++++--------
> 2 files changed, 315 insertions(+), 92 deletions(-)
>
> diff --git a/src/match_list.rs b/src/match_list.rs
> index c5b14e0..69b1fdb 100644
> --- a/src/match_list.rs
> +++ b/src/match_list.rs
(...)
> @@ -383,30 +438,48 @@ where
> <&'a T as IntoIterator>::IntoIter: DoubleEndedIterator,
> <&'a T as IntoIterator>::Item: MatchListEntry,
> {
Since we have a T here already...
> - fn matches_do(&self, path: &[u8], file_mode: Option<u32>) -> Option<MatchType> {
> + fn matches<U, G>(&self, path: U, get_file_mode: G) -> Result<Option<MatchType>, G::Error>
maybe use `P` for the `path` ;-)
> + where
> + U: AsRef<[u8]>,
> + G: GetFileMode,
> + {
> // This is an &self method on a `T where T: 'a`.
> let this: &'a Self = unsafe { std::mem::transmute(self) };
>
> + let mut get_file_mode = Some(get_file_mode);
> + let mut file_mode = None;
> +
> for m in this.into_iter().rev() {
> - if let Some(mt) = m.entry_matches(path, file_mode) {
> - return Some(mt);
> + if file_mode.is_none() && m.entry_needs_file_mode() {
> + file_mode = Some(get_file_mode.take().unwrap().get()?);
> + }
> + if let Some(mt) = m.entry_matches(path.as_ref(), file_mode) {
> + return Ok(Some(mt));
> }
> }
> -
> - None
> + Ok(None)
> }
>
> - fn matches_exact_do(&self, path: &[u8], file_mode: Option<u32>) -> Option<MatchType> {
> + fn matches_exact<U, G>(&self, path: U, get_file_mode: G) -> Result<Option<MatchType>, G::Error>
^ here as well
> + where
> + U: AsRef<[u8]>,
> + G: GetFileMode,
> + {
> // This is an &self method on a `T where T: 'a`.
> let this: &'a Self = unsafe { std::mem::transmute(self) };
>
> + let mut get_file_mode = Some(get_file_mode);
> + let mut file_mode = None;
> +
> for m in this.into_iter().rev() {
> - if let Some(mt) = m.entry_matches_exact(path, file_mode) {
> - return Some(mt);
> + if file_mode.is_none() && m.entry_needs_file_mode() {
> + file_mode = Some(get_file_mode.take().unwrap().get()?);
> + }
> + if let Some(mt) = m.entry_matches_exact(path.as_ref(), file_mode) {
> + return Ok(Some(mt));
> }
> }
> -
> - None
> + Ok(None)
> }
> }
>
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2023-08-22 14:07 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-08-21 13:08 [pbs-devel] [PATCH pathpatterns v4] match_list: updated `matches()`, to only retrieve file mode if necessary Gabriel Goller
2023-08-21 13:08 ` [pbs-devel] [PATCH proxmox-backup v6] fix #4380: check if file is excluded before running `stat()` Gabriel Goller
2023-08-22 13:04 ` Wolfgang Bumiller
2023-08-22 14:07 ` Gabriel Goller
2023-08-22 13:07 ` [pbs-devel] [PATCH pathpatterns v4] match_list: updated `matches()`, to only retrieve file mode if necessary Wolfgang Bumiller
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.
Service provided by Proxmox Server Solutions GmbH | Privacy | Legal