all lists on lists.proxmox.com
 help / color / mirror / Atom feed
From: Wolfgang Bumiller <w.bumiller@proxmox.com>
To: Gabriel Goller <g.goller@proxmox.com>
Cc: pbs-devel@lists.proxmox.com
Subject: Re: [pbs-devel] [PATCH proxmox-backup v3 1/1] fix #4380: check if file is excluded before running `stat()`
Date: Fri, 11 Aug 2023 10:51:31 +0200	[thread overview]
Message-ID: <ff3ue3ab5uscsatexjmo7sahc6vtotqe4jcdzmjc7d22msnzg5@a35jf4thexka> (raw)
In-Reply-To: <20230809101913.81818-2-g.goller@proxmox.com>

On Wed, Aug 09, 2023 at 12:19:13PM +0200, Gabriel Goller wrote:
> Before running `stat()` we now make a `matches_path()` call, which checks

I was more thinking to use `matches_path()` only if `stat()` actually
fails and then propagate `stat()`'s error (instead of the error that a
file mode is required... ;-) )
That way we wouldn't match twice in the other case.

> if the file/directory is excluded (`matches_path()` tries to match using
> only the path, without the file mode). If the file is not excluded,
> we `stat()` and then check again if the path matches using the file_mode and
> the `matches()` function.
> Added `pathpatterns` crate to local overrides in cargo.toml.
> 
> changes v2:
>  - checking for `read` and `execute` permissions before entering directory,
>    doesn't work because there are a lot of side-effects (executed by
>    different user, AppArmor, SELinux, ...).
> changes v1:
>  - checking for excluded files with `matches()` before executing `stat()`,
>    this doesn't work because we get the file_mode from `stat()` and don't 
>    want to ignore it when matching.
> 
> Signed-off-by: Gabriel Goller <g.goller@proxmox.com>
> ---
>  Cargo.toml                    |  1 +
>  pbs-client/src/pxar/create.rs | 45 ++++++++++++++++++++++++++---------
>  2 files changed, 35 insertions(+), 11 deletions(-)
> 
> diff --git a/Cargo.toml b/Cargo.toml
> index 2a05c471..c54bc28b 100644
> --- a/Cargo.toml
> +++ b/Cargo.toml
> @@ -262,6 +262,7 @@ proxmox-rrd.workspace = true
>  
>  #proxmox-apt = { path = "../proxmox-apt" }
>  #proxmox-openid = { path = "../proxmox-openid-rs" }
> +#pathpatterns = {path = "../pathpatterns" }
>  
>  #pxar = { path = "../pxar" }
>  
> diff --git a/pbs-client/src/pxar/create.rs b/pbs-client/src/pxar/create.rs
> index 2577cf98..a6d608e7 100644
> --- a/pbs-client/src/pxar/create.rs
> +++ b/pbs-client/src/pxar/create.rs
> @@ -232,7 +232,7 @@ impl Archiver {
>              let entry_counter = self.entry_counter;
>  
>              let old_patterns_count = self.patterns.len();
> -            self.read_pxar_excludes(dir.as_raw_fd())?;
> +            self.read_pxar_excludes(&mut dir)?;
>  
>              let mut file_list = self.generate_directory_file_list(&mut dir, is_root)?;
>  
> @@ -315,8 +315,23 @@ impl Archiver {
>          }
>      }
>  
> -    fn read_pxar_excludes(&mut self, parent: RawFd) -> Result<(), Error> {
> -        let fd = match self.open_file(parent, c_str!(".pxarexclude"), OFlag::O_RDONLY, false)? {
> +    fn read_pxar_excludes(&mut self, parent: &mut Dir) -> Result<(), Error> {
> +        let mut exclude_file_found = false;
> +        for file in parent.iter() {
> +            if file?.file_name().to_str()? == ".pxarexclude" {
> +                exclude_file_found = true;
> +                break;
> +            }
> +        }
> +        if !exclude_file_found {

Why are we adding this check?
It's not necessary.

> +            return Ok(());
> +        }
> +        let fd = match self.open_file(
> +            parent.as_raw_fd(),
> +            c_str!(".pxarexclude"),
> +            OFlag::O_RDONLY,
> +            false,
> +        )? {
>              Some(fd) => fd,
>              None => return Ok(()),

^ This already means the file does not exist.

>          };
> @@ -420,7 +435,7 @@ impl Archiver {
>          for file in dir.iter() {
>              let file = file?;
>  
> -            let file_name = file.file_name().to_owned();
> +            let file_name = file.file_name();
>              let file_name_bytes = file_name.to_bytes();
>              if file_name_bytes == b"." || file_name_bytes == b".." {
>                  continue;
> @@ -434,9 +449,17 @@ impl Archiver {
>              assert_single_path_component(os_file_name)?;
>              let full_path = self.path.join(os_file_name);
>  
> +            let match_path = PathBuf::from("/").join(full_path.clone());
> +            let pre_stat_match = self
> +                .patterns
> +                .matches_path(match_path.as_os_str().as_bytes());

As said above, I'd like this to instead happen in stat's `Err() => {}`
case.

> +            if pre_stat_match == Ok(Some(MatchType::Exclude)) {
> +                continue;
> +            }
> +
>              let stat = match nix::sys::stat::fstatat(
>                  dir_fd,
> -                file_name.as_c_str(),
> +                file_name.to_owned().as_c_str(),
>                  nix::fcntl::AtFlags::AT_SYMLINK_NOFOLLOW,
>              ) {
>                  Ok(stat) => stat,
> @@ -444,11 +467,11 @@ impl Archiver {
>                  Err(err) => return Err(err).context(format!("stat failed on {:?}", full_path)),
>              };
>  
> -            let match_path = PathBuf::from("/").join(full_path.clone());
> -            if self
> -                .patterns
> -                .matches(match_path.as_os_str().as_bytes(), Some(stat.st_mode))
> -                == Some(MatchType::Exclude)
> +            if pre_stat_match.is_err()

And then this won't be needed and AFAICT isn't correct with the current
implementation of `matches_path()` of this series since it might have
found a match *after* one of a higher precedence would have already
matched potentially differently.

> +                && self
> +                    .patterns
> +                    .matches(match_path.as_os_str().as_bytes(), Some(stat.st_mode))
> +                    == Some(MatchType::Exclude)
>              {
>                  continue;
>              }
> @@ -462,7 +485,7 @@ impl Archiver {
>              }
>  
>              file_list.push(FileListEntry {
> -                name: file_name,
> +                name: file_name.to_owned(),
>                  path: full_path,
>                  stat,
>              });
> -- 
> 2.39.2




  reply	other threads:[~2023-08-11  8:52 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-08-09 10:19 [pbs-devel] [PATCH pathpatterns] match_list: added `matches_path()` function, which matches only the path Gabriel Goller
2023-08-09 10:19 ` [pbs-devel] [PATCH proxmox-backup v3 1/1] fix #4380: check if file is excluded before running `stat()` Gabriel Goller
2023-08-11  8:51   ` Wolfgang Bumiller [this message]
2023-08-14  7:41     ` Gabriel Goller
2023-08-11  8:26 ` [pbs-devel] [PATCH pathpatterns] match_list: added `matches_path()` function, which matches only the path Wolfgang Bumiller
2023-08-11  8:32   ` Wolfgang Bumiller
2023-08-11  8:38     ` Wolfgang Bumiller
2023-08-14  9:32   ` Gabriel Goller

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ff3ue3ab5uscsatexjmo7sahc6vtotqe4jcdzmjc7d22msnzg5@a35jf4thexka \
    --to=w.bumiller@proxmox.com \
    --cc=g.goller@proxmox.com \
    --cc=pbs-devel@lists.proxmox.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.
Service provided by Proxmox Server Solutions GmbH | Privacy | Legal