public inbox for pbs-devel@lists.proxmox.com
 help / color / mirror / Atom feed
From: Wolfgang Bumiller <w.bumiller@proxmox.com>
To: Gabriel Goller <g.goller@proxmox.com>
Cc: pbs-devel@lists.proxmox.com
Subject: Re: [pbs-devel] [PATCH proxmox-backup v6] fix #4380: check if file is excluded before running `stat()`
Date: Tue, 22 Aug 2023 15:04:39 +0200	[thread overview]
Message-ID: <vzaphbts2zuwbuy3hrwxpaf4jn4wysakeobqmmqddqjalhqsoi@yq2qfgiuodnw> (raw)
In-Reply-To: <20230821130826.147473-2-g.goller@proxmox.com>

On Mon, Aug 21, 2023 at 03:08:26PM +0200, Gabriel Goller wrote:
> Passed a closure with the `stat()` function call to `matches()`. This
> will traverse through all patterns and try to match using the path only, if a
> `file_mode` is needed, it will run the closure. This means that if we exclude
> a file with the `MatchType::ANY_FILE_TYPE`, we will skip it without running
> `stat()` on it. As we updated the `matches()` function, we also updated all the
> invocations of it.
> Added `pathpatterns` crate to local overrides in cargo.toml.
> 
> Signed-off-by: Gabriel Goller <g.goller@proxmox.com>
> ---
> 
> changes v5:
>  - updated all invocations of `matches()` 
> 
> changes v4:
>  - match only by path and exclude the matched files, the run `stat()` and
>    match again, this time using the `file_mode`. This will match everything 
>    twice in the worst case, which is not optimal.
> changes v3:
>  - checking for `read` and `execute` permissions before entering directory,
>    doesn't work because there are a lot of side-effects (executed by
>    different user, AppArmor, SELinux, ...).
> changes v2:
>  - checking for excluded files with `matches()` before executing `stat()`,
>    this doesn't work because we get the file_mode from `stat()` and don't
>    want to ignore it when matching.
> 
> 
>  Cargo.toml                      |  5 +++--
>  pbs-client/src/catalog_shell.rs |  8 +++----
>  pbs-client/src/pxar/create.rs   | 38 ++++++++++++++++++++-------------
>  pbs-client/src/pxar/extract.rs  | 10 +++++----
>  pbs-datastore/src/catalog.rs    |  6 +++---
>  5 files changed, 39 insertions(+), 28 deletions(-)
> 
> diff --git a/Cargo.toml b/Cargo.toml
> index 5cbae1b8..560794a4 100644
> --- a/Cargo.toml
> +++ b/Cargo.toml
> @@ -264,8 +264,9 @@ proxmox-rrd.workspace = true
>  #proxmox-sortable-macro = { path = "../proxmox/proxmox-sortable-macro" }
>  #proxmox-human-byte = { path = "../proxmox/proxmox-human-byte" }
>  
> -#proxmox-apt = { path = "../proxmox/proxmox-apt" }
> -#proxmox-openid = { path = "../proxmox/proxmox-openid" }
> +#proxmox-apt = { path = "../proxmox-apt" }
> +#proxmox-openid = { path = "../proxmox-openid-rs" }
> +#pathpatterns = {path = "../pathpatterns" }
>  
>  #pxar = { path = "../pxar" }
>  
> diff --git a/pbs-client/src/catalog_shell.rs b/pbs-client/src/catalog_shell.rs
> index b8aaf8cb..f53b3cc5 100644
> --- a/pbs-client/src/catalog_shell.rs
> +++ b/pbs-client/src/catalog_shell.rs
> @@ -1138,14 +1138,14 @@ impl<'a> ExtractorState<'a> {
>      pub async fn handle_entry(&mut self, entry: catalog::DirEntry) -> Result<(), Error> {
>          let match_result = self.match_list.matches(&self.path, entry.get_file_mode());
>          let did_match = match match_result {
> -            Some(MatchType::Include) => true,
> -            Some(MatchType::Exclude) => false,
> -            None => self.matches,
> +            Ok(Some(MatchType::Include)) => true,
> +            Ok(Some(MatchType::Exclude)) => false,
> +            _ => self.matches,
>          };
>  
>          match (did_match, &entry.attr) {
>              (_, DirEntryAttribute::Directory { .. }) => {
> -                self.handle_new_directory(entry, match_result).await?;
> +                self.handle_new_directory(entry, match_result?).await?;
>              }
>              (true, DirEntryAttribute::File { .. }) => {
>                  self.dir_stack.push(PathStackEntry::new(entry));
> diff --git a/pbs-client/src/pxar/create.rs b/pbs-client/src/pxar/create.rs
> index 2577cf98..2d516cfa 100644
> --- a/pbs-client/src/pxar/create.rs
> +++ b/pbs-client/src/pxar/create.rs
> @@ -21,7 +21,6 @@ use pxar::Metadata;
>  
>  use proxmox_io::vec;
>  use proxmox_lang::c_str;
> -use proxmox_sys::error::SysError;
>  use proxmox_sys::fs::{self, acl, xattr};
>  
>  use pbs_datastore::catalog::BackupCatalogWriter;
> @@ -420,7 +419,7 @@ impl Archiver {
>          for file in dir.iter() {
>              let file = file?;
>  
> -            let file_name = file.file_name().to_owned();
> +            let file_name = file.file_name();
>              let file_name_bytes = file_name.to_bytes();
>              if file_name_bytes == b"." || file_name_bytes == b".." {
>                  continue;
> @@ -434,25 +433,34 @@ impl Archiver {
>              assert_single_path_component(os_file_name)?;
>              let full_path = self.path.join(os_file_name);
>  
> -            let stat = match nix::sys::stat::fstatat(
> +            let match_path = PathBuf::from("/").join(full_path.clone());
> +
> +            let mut stat_results: Option<FileStat> = None;
> +
> +            let get_file_mode = || match nix::sys::stat::fstatat(

You don't need that 'match' here if your cases are basically a no-op.

You *could* attach the `stat failed on ...` context with the file name
here... but this will make it a bit more tedious to check for `ENOENT`
as we'd need to use `.downcast()` on the anyhow::Error.

So either that, or we'll need to duplicate the context line down below
where we do the final `let stat = results.unwrap_or_else(get_file_mode)?`.

>                  dir_fd,
> -                file_name.as_c_str(),
> +                file_name.to_owned().as_c_str(),
>                  nix::fcntl::AtFlags::AT_SYMLINK_NOFOLLOW,
>              ) {
> -                Ok(stat) => stat,
> -                Err(ref err) if err.not_found() => continue,

Because the thing is, we still need to handle this case - just noticed
that this was removed entirely, which is of course not what we want :-)
If the file gets removed between listing it from the directory and us
trying to `stat` it, we should just act as if it had never existed.

> -                Err(err) => return Err(err).context(format!("stat failed on {:?}", full_path)),
> +                Ok(stat) => Ok(stat),
> +                Err(e) => Err(e),
>              };
> -
> -            let match_path = PathBuf::from("/").join(full_path.clone());
> -            if self
> -                .patterns
> -                .matches(match_path.as_os_str().as_bytes(), Some(stat.st_mode))
> -                == Some(MatchType::Exclude)
> +            if Some(MatchType::Exclude)
> +                == self
> +                    .patterns
> +                    .matches(match_path.as_os_str().as_bytes(), || {
> +                        Ok::<_, Errno>(match &stat_results {
> +                            Some(result) => result.st_mode,
> +                            None => stat_results.insert(get_file_mode()?).st_mode,
> +                        })
> +                    })

... it will instead need to be checked here ^
Basically, we will need to swap the `if` for a `match` after all, with
the cases
    Ok(Some(MatchType::Exclude)) => continue,
    Ok(_) => (),
    Err(err) if err.not_found() => continue,
    err => return err.with_context(...),
or the final 2 cases when using *not* not changing the `get_file_mode`
error to `anyhow::Error` would be:
    Err(err) => match err.downcast::<io::Error>() {
        Some(err) if err.not_found() => continue,
        _ => return Err(err),
    }


I hope I didn't miss anything now.

> +                    .with_context(|| format!("stat failed on {full_path:?}"))?
>              {
>                  continue;
>              }
>  
> +            let stat = stat_results.map(Ok).unwrap_or_else(get_file_mode)?;

If the context line is not already in the closure, we'd need to
duplicate it here.

> +
>              self.entry_counter += 1;
>              if self.entry_counter > self.entry_limit {
>                  bail!(
> @@ -462,7 +470,7 @@ impl Archiver {
>              }
>  
>              file_list.push(FileListEntry {
> -                name: file_name,
> +                name: file_name.to_owned(),
>                  path: full_path,
>                  stat,
>              });
> @@ -533,7 +541,7 @@ impl Archiver {
>          let match_path = PathBuf::from("/").join(self.path.clone());
>          if self
>              .patterns
> -            .matches(match_path.as_os_str().as_bytes(), Some(stat.st_mode))
> +            .matches(match_path.as_os_str().as_bytes(), stat.st_mode)?
>              == Some(MatchType::Exclude)
>          {
>              return Ok(());
> diff --git a/pbs-client/src/pxar/extract.rs b/pbs-client/src/pxar/extract.rs
> index 4eb6fb90..e24a1560 100644
> --- a/pbs-client/src/pxar/extract.rs
> +++ b/pbs-client/src/pxar/extract.rs
> @@ -251,22 +251,24 @@ where
>  
>          self.extractor.set_path(entry.path().as_os_str().to_owned());
>  
> +        // We can `unwrap()` safely here because we get a `Result<_, std::convert::Infallible>`
>          let match_result = self.match_list.matches(
>              entry.path().as_os_str().as_bytes(),
> -            Some(metadata.file_type() as u32),
> -        );
> +            metadata.file_type() as u32
> +        ).unwrap();
>  
>          let did_match = match match_result {
>              Some(MatchType::Include) => true,
>              Some(MatchType::Exclude) => false,
> -            None => self.state.current_match,
> +            _ => self.state.current_match,
>          };
>  
>          let extract_res = match (did_match, entry.kind()) {
>              (_, EntryKind::Directory) => {
>                  self.callback(entry.path());
>  
> -                let create = self.state.current_match && match_result != Some(MatchType::Exclude);
> +                let create =
> +                    self.state.current_match && match_result != Some(MatchType::Exclude);
>                  let res = self
>                      .extractor
>                      .enter_directory(file_name_os.to_owned(), metadata.clone(), create)
> diff --git a/pbs-datastore/src/catalog.rs b/pbs-datastore/src/catalog.rs
> index 11c14b64..86e20c92 100644
> --- a/pbs-datastore/src/catalog.rs
> +++ b/pbs-datastore/src/catalog.rs
> @@ -678,9 +678,9 @@ impl<R: Read + Seek> CatalogReader<R> {
>              }
>              file_path.extend(&e.name);
>              match match_list.matches(&file_path, e.get_file_mode()) {
> -                Some(MatchType::Exclude) => continue,
> -                Some(MatchType::Include) => callback(file_path)?,
> -                None => (),
> +                Ok(Some(MatchType::Exclude)) => continue,
> +                Ok(Some(MatchType::Include)) => callback(file_path)?,
> +                _ => (),





  reply	other threads:[~2023-08-22 13:05 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-08-21 13:08 [pbs-devel] [PATCH pathpatterns v4] match_list: updated `matches()`, to only retrieve file mode if necessary Gabriel Goller
2023-08-21 13:08 ` [pbs-devel] [PATCH proxmox-backup v6] fix #4380: check if file is excluded before running `stat()` Gabriel Goller
2023-08-22 13:04   ` Wolfgang Bumiller [this message]
2023-08-22 14:07     ` Gabriel Goller
2023-08-22 13:07 ` [pbs-devel] [PATCH pathpatterns v4] match_list: updated `matches()`, to only retrieve file mode if necessary Wolfgang Bumiller

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=vzaphbts2zuwbuy3hrwxpaf4jn4wysakeobqmmqddqjalhqsoi@yq2qfgiuodnw \
    --to=w.bumiller@proxmox.com \
    --cc=g.goller@proxmox.com \
    --cc=pbs-devel@lists.proxmox.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox
Service provided by Proxmox Server Solutions GmbH | Privacy | Legal