public inbox for pbs-devel@lists.proxmox.com
 help / color / mirror / Atom feed
From: Wolfgang Bumiller <w.bumiller@proxmox.com>
To: Max Carrara <m.carrara@proxmox.com>
Cc: pbs-devel@lists.proxmox.com
Subject: Re: [pbs-devel] [PATCH proxmox-backup 2/5] pbs-client: pxar: refactor body of `extract_archive` to `ExtractorIter`
Date: Tue, 11 Jul 2023 15:24:17 +0200	[thread overview]
Message-ID: <rt25g6fws5y4r2iuwxq23qcezkjewvzy4bwzm7igfd4qqtb5aa@m4c7tuh2fcrp> (raw)
In-Reply-To: <20230607163428.1154123-3-m.carrara@proxmox.com>

On Wed, Jun 07, 2023 at 06:34:25PM +0200, Max Carrara wrote:
> This change factors the body of `extract_archive()` into a separate
> struct named `ExtractorIter` which implements the `Iterator` trait.
> 
> This refactor has two goals:
>   * Make it easier to provide and propagate errors and additional
>     information via `anyhow::Context`
>   * Introduce a means to handle errors that occur during extraction,
>     with the possibility to continue extraction if the handler decides
>     that the error is not fatal

Does it really help much with that both those things?
Which parts actually benefit from this?

> 
> The latter point benefits from the information provided by the former;
> previously, errors could only be handled in certain locations
> (e.g. application of metadata), but not on a "per-entry" basis.
> 
> Since `extract_archive()` was already using a "desugared" version of
> the iterator pattern to begin with, wrapping its body up in an actual
> `Iterator` made the most sense, as it didn't require changing the already
> existing control flow that much.
> 
> Signed-off-by: Max Carrara <m.carrara@proxmox.com>
> ---
>  pbs-client/src/pxar/extract.rs | 382 +++++++++++++++++++++++----------
>  1 file changed, 273 insertions(+), 109 deletions(-)
> 
> diff --git a/pbs-client/src/pxar/extract.rs b/pbs-client/src/pxar/extract.rs
> index 7af80bb9..fa08bfd7 100644
> --- a/pbs-client/src/pxar/extract.rs
> +++ b/pbs-client/src/pxar/extract.rs
> @@ -8,7 +8,7 @@ use std::os::unix::io::{AsRawFd, FromRawFd, RawFd};
>  use std::path::{Path, PathBuf};
>  use std::sync::{Arc, Mutex};
>  
> -use anyhow::{bail, Context, Error};
> +use anyhow::{anyhow, bail, Context, Error};
>  use nix::dir::Dir;
>  use nix::fcntl::OFlag;
>  use nix::sys::stat::Mode;
> @@ -40,75 +40,224 @@ pub struct PxarExtractOptions<'a> {
>  pub type ErrorHandler = Box<dyn FnMut(Error) -> Result<(), Error> + Send>;
>  
>  pub fn extract_archive<T, F>(
> -    mut decoder: pxar::decoder::Decoder<T>,
> +    decoder: pxar::decoder::Decoder<T>,
>      destination: &Path,
>      feature_flags: Flags,
> -    mut callback: F,
> +    callback: F,
>      options: PxarExtractOptions,
>  ) -> Result<(), Error>
>  where
>      T: pxar::decoder::SeqRead,
>      F: FnMut(&Path),
>  {
> -    // we use this to keep track of our directory-traversal
> -    decoder.enable_goodbye_entries(true);
> +    ExtractorIter::new(decoder, destination, feature_flags, callback, options)
> +        .context("failed to initialize extractor")?
> +        .collect::<Result<(), Error>>()
> +        .context("encountered unexpected error during extraction")
> +}
>  
> -    let root = decoder
> -        .next()
> -        .context("found empty pxar archive")?
> -        .context("error reading pxar archive")?;
> +struct ExtractorIterState {
> +    match_stack: Vec<bool>,
> +    err_path_stack: Vec<OsString>,
> +    current_match: bool,
> +    end_reached: bool,
> +}
>  
> -    if !root.is_dir() {
> -        bail!("pxar archive does not start with a directory entry!");
> +/// An [`Iterator`] that encapsulates the process of extraction in [extract_archive].
> +/// Therefore, traversing over an [`ExtractorIter`] until exhaustion extracts an
> +/// entire PXAR archive.
> +struct ExtractorIter<'a, T, F>
> +where
> +    T: pxar::decoder::SeqRead,
> +    F: FnMut(&Path),
> +{
> +    decoder: pxar::decoder::Decoder<T>,
> +    callback: F,
> +    extractor: Extractor,
> +    match_list: &'a [MatchEntry],
> +    state: ExtractorIterState,
> +}
> +
> +impl ExtractorIterState {
> +    fn new(options: &PxarExtractOptions) -> Self {
> +        Self {
> +            match_stack: Vec::new(),
> +            err_path_stack: Vec::new(),
> +            current_match: options.extract_match_default,
> +            end_reached: false,
> +        }
>      }
> +}
>  
> -    create_path(
> -        destination,
> -        None,
> -        Some(CreateOptions::new().perm(Mode::from_bits_truncate(0o700))),
> -    )
> -    .context(format!("error creating directory {:?}", destination))?;
> +impl<'a, T, F> ExtractorIter<'a, T, F>
> +where
> +    T: pxar::decoder::SeqRead,
> +    F: FnMut(&Path),
> +{
> +    /// Creates and initializes the state of a new [`ExtractorIter`].
> +    ///
> +    /// This function requires that the given [`Decoder`][D] has not made a single
> +    /// traversal (a call to [`next()`][next]) yet.
> +    ///
> +    /// [D]: pxar::decoder::Decoder
> +    /// [next]: std::iter::Iterator::next()
> +    fn new(
> +        mut decoder: pxar::decoder::Decoder<T>,
> +        destination: &Path,
> +        feature_flags: Flags,
> +        callback: F,
> +        options: PxarExtractOptions<'a>,
> +    ) -> Result<Self, Error> {
> +        // we use this to keep track of our directory-traversal
> +        decoder.enable_goodbye_entries(true);
> +
> +        let root = decoder
> +            .next()
> +            .context("found empty pxar archive")?
> +            .context("error reading pxar archive")?;
> +
> +        if !root.is_dir() {
> +            bail!("pxar archive does not start with a directory entry!");
> +        }
>  
> -    let dir = Dir::open(
> -        destination,
> -        OFlag::O_DIRECTORY | OFlag::O_CLOEXEC,
> -        Mode::empty(),
> -    )
> -    .context(format!("unable to open target directory {:?}", destination))?;
> -
> -    let mut extractor = Extractor::new(
> -        dir,
> -        root.metadata().clone(),
> -        options.allow_existing_dirs,
> -        options.overwrite,
> -        feature_flags,
> -    );
> -
> -    if let Some(on_error) = options.on_error {
> -        extractor.on_error(on_error);
> +        let mut state = ExtractorIterState::new(&options);
> +        state.err_path_stack.push(OsString::from("/"));
> +
> +        create_path(
> +            destination,
> +            None,
> +            Some(CreateOptions::new().perm(Mode::from_bits_truncate(0o700))),
> +        )
> +        .context(format!("error creating directory {destination:?}"))?;
> +
> +        let dir = Dir::open(
> +            destination,
> +            OFlag::O_DIRECTORY | OFlag::O_CLOEXEC,
> +            Mode::empty(),
> +        )
> +        .context(format!("unable to open target directory {destination:?}"))?;
> +
> +        let mut extractor = Extractor::new(
> +            dir,
> +            root.metadata().clone(),
> +            options.allow_existing_dirs,
> +            options.overwrite,
> +            feature_flags,
> +        );
> +
> +        if let Some(on_error) = options.on_error {
> +            extractor.on_error(on_error);
> +        }
> +
> +        Ok(Self {
> +            decoder,
> +            callback,
> +            extractor,
> +            match_list: options.match_list,
> +            state,
> +        })
>      }
>  
> -    let mut match_stack = Vec::new();
> -    let mut err_path_stack = vec![OsString::from("/")];
> -    let mut current_match = options.extract_match_default;
> -    while let Some(entry) = decoder.next() {
> -        let entry = entry.context("error reading pxar archive")?;
> +    #[inline(always)]
> +    fn callback(&mut self, path: &Path) {
> +        (self.callback)(path)
> +    }
>  
> -        let file_name_os = entry.file_name();
> +    /// Yields the [`Decoder`][D]'s [`Result`]s and validates each [`Entry`][E]
> +    /// it encounters. It also controls the [`ExtractorIter`]'s stop condition.
> +    ///
> +    /// In detail, the [`ExtractorIter`] will stop if and only if one of the
> +    /// following conditions is true:
> +    ///   * The [`Decoder`][D] is exhausted
> +    ///   * The [`Decoder`][D] failed to read from the archive and consequently
> +    ///     yielded an [`io::Error`]
> +    ///   * Validation of an [`Entry`][E] failed
> +    ///
> +    /// The iterator will yield the encountered error before stopping.
> +    ///
> +    /// [D]: pxar::decoder::Decoder
> +    /// [E]: pxar::Entry
> +    #[inline]
> +    fn next_entry(&mut self) -> Option<Result<Entry, Error>> {
> +        if self.state.end_reached {
> +            return None;
> +        }
>  
> -        // safety check: a file entry in an archive must never contain slashes:
> -        if file_name_os.as_bytes().contains(&b'/') {
> -            bail!("archive file entry contains slashes, which is invalid and a security concern");
> +        let entry = self.decoder.next();
> +
> +        if entry.is_none() {

^ This could be a diverging `match` to avoid the `unwrap` below.

    let entry = match entry {
        None => { .. the code from here .. }
        Some(entry) => entry,
    };

(side note: too much code for a `let else` to be formatted properly)

> +            self.state.end_reached = true;
> +            if !self.extractor.dir_stack.is_empty() {
> +                return Some(Err(anyhow!("unexpected eof while decoding pxar archive")));

style: we typically use `format_err!`, I don't think we should start
mixing in `anyhow!()` unless there's a good reason for it.

> +            } else {
> +                return None;
> +            }
>          }
>  
> -        let file_name = CString::new(file_name_os.as_bytes())
> -            .context("encountered file name with null-bytes")?;
> +        #[inline(always)]
> +        fn validate_entry(entry: Result<Entry, io::Error>) -> Result<Entry, Error> {
> +            let entry = entry.context("error reading pxar archive")?;
> +
> +            let file_name_bytes = entry.file_name().as_bytes();
> +
> +            if file_name_bytes.contains(&b'/') {
> +                bail!(
> +                    "archive file entry contains slashes, which is invalid and a security concern"
> +                );
> +            }
> +
> +            CString::new(file_name_bytes).context("encountered file name with null-bytes")?;

^ I'd rather drop this here, because this allocates for no reason other
than to check for errors, both of which happens again later on.
As a side note, do we even need a `CString`? Or can the whole code using
`file_name` below work on a `CStr` as well?

If you insist on checking here
- at least use `CStr::from_bytes_with_nul` here
- add a comment about the code below
- use an `unsafe{}` block with a comment below with
  `CStr::from_bytes_with_nul_unchecked`

But only checking later makes a lot more sense to me :-)

> +
> +            Ok(entry)
> +        }
> +
> +        let entry = validate_entry(entry.unwrap());
> +
> +        if entry.is_err() {
> +            self.state.end_reached = true;
> +        }
>  
> +        Some(entry)
> +    }
> +}
> +
> +impl<'a, T, F> Iterator for ExtractorIter<'a, T, F>
> +where
> +    T: pxar::decoder::SeqRead,
> +    F: FnMut(&Path),
> +{
> +    type Item = Result<(), Error>;
> +
> +    /// Performs the actual extraction of decoded [`Entries`][E].
> +    ///
> +    /// Should an error occur during any point of extraction (**not** while
> +    /// fetching the next [`Entry`][E]), the error may be handled by the
> +    /// [`ErrorHandler`] provided by the [`PxarExtractOptions`] used to
> +    /// initialize the iterator.
> +    ///
> +    /// [E]: pxar::Entry
> +    fn next(&mut self) -> Option<Self::Item> {
> +        let entry = self.next_entry()?;
> +
> +        if entry.is_err() {
> +            assert!(
> +                self.state.end_reached,
> +                "encountered an error while reading the archive, but the iterator was not stopped!"
> +            );
> +
> +            return Some(entry.map(drop));
> +        }
> +
> +        let entry = entry.unwrap();

^ Same as in `next_entry` - we can easily get rid of the `unwrap` with a
`match`.

> +        let file_name_os = entry.file_name();
>          let metadata = entry.metadata();
>  
> -        extractor.set_path(entry.path().as_os_str().to_owned());
> +        // can unwrap here because next_entry already checked for null bytes
> +        let file_name = CString::new(file_name_os.as_bytes()).unwrap();
> +
> +        self.extractor.set_path(entry.path().as_os_str().to_owned());
>  
> -        let match_result = options.match_list.matches(
> +        let match_result = self.match_list.matches(
>              entry.path().as_os_str().as_bytes(),
>              Some(metadata.file_type() as u32),
>          );
> @@ -116,99 +265,114 @@ where
>          let did_match = match match_result {
>              Some(MatchType::Include) => true,
>              Some(MatchType::Exclude) => false,
> -            None => current_match,
> +            None => self.state.current_match,
>          };
> -        match (did_match, entry.kind()) {
> -            (_, EntryKind::Directory) => {
> -                callback(entry.path());
> -
> -                let create = current_match && match_result != Some(MatchType::Exclude);
> -                extractor
> -                    .enter_directory(file_name_os.to_owned(), metadata.clone(), create)
> -                    .context(format!("error at entry {:?}", file_name_os))?;
>  
> -                // We're starting a new directory, push our old matching state and replace it with
> -                // our new one:
> -                match_stack.push(current_match);
> -                current_match = did_match;
> -
> -                // When we hit the goodbye table we'll try to apply metadata to the directory, but
> -                // the Goodbye entry will not contain the path, so push it to our path stack for
> -                // error messages:
> -                err_path_stack.push(extractor.clone_path());
> +        let extract_res = match (did_match, entry.kind()) {
> +            (_, EntryKind::Directory) => {
> +                self.callback(entry.path());
> +
> +                let create = self.state.current_match && match_result != Some(MatchType::Exclude);
> +                let res = self.extractor.enter_directory(
> +                    file_name_os.to_owned(),
> +                    metadata.clone(),
> +                    create,
> +                );
> +
> +                if res.is_ok() {
> +                    // We're starting a new directory, push our old matching state and replace it with
> +                    // our new one:
> +                    self.state.match_stack.push(self.state.current_match);
> +                    self.state.current_match = did_match;
> +
> +                    // When we hit the goodbye table we'll try to apply metadata to the directory, but
> +                    // the Goodbye entry will not contain the path, so push it to our path stack for
> +                    // error messages:
> +                    self.state.err_path_stack.push(self.extractor.clone_path());
> +                }
>  
> -                Ok(())
> +                res
>              }
>              (_, EntryKind::GoodbyeTable) => {
>                  // go up a directory
>  
> -                extractor.set_path(err_path_stack.pop().context(format!(
> -                    "error at entry {:?} - unexpected end of directory",
> -                    file_name_os
> -                ))?);
> -
> -                extractor
> -                    .leave_directory()
> -                    .context(format!("error at entry {:?}", file_name_os))?;
> -
> -                // We left a directory, also get back our previous matching state. This is in sync
> -                // with `dir_stack` so this should never be empty except for the final goodbye
> -                // table, in which case we get back to the default of `true`.
> -                current_match = match_stack.pop().unwrap_or(true);
> +                let res = self
> +                    .state
> +                    .err_path_stack
> +                    .pop()
> +                    .context("unexpected end of directory")
> +                    .map(|path| self.extractor.set_path(path))
> +                    .and(self.extractor.leave_directory());
> +
> +                if res.is_ok() {
> +                    // We left a directory, also get back our previous matching state. This is in sync
> +                    // with `dir_stack` so this should never be empty except for the final goodbye
> +                    // table, in which case we get back to the default of `true`.
> +                    self.state.current_match = self.state.match_stack.pop().unwrap_or(true);
> +                }
>  
> -                Ok(())
> +                res
>              }
>              (true, EntryKind::Symlink(link)) => {
> -                callback(entry.path());
> -                extractor.extract_symlink(&file_name, metadata, link.as_ref())
> +                self.callback(entry.path());
> +                self.extractor
> +                    .extract_symlink(&file_name, metadata, link.as_ref())
>              }
>              (true, EntryKind::Hardlink(link)) => {
> -                callback(entry.path());
> -                extractor.extract_hardlink(&file_name, link.as_os_str())
> +                self.callback(entry.path());
> +                self.extractor
> +                    .extract_hardlink(&file_name, link.as_os_str())
>              }
>              (true, EntryKind::Device(dev)) => {
> -                if extractor.contains_flags(Flags::WITH_DEVICE_NODES) {
> -                    callback(entry.path());
> -                    extractor.extract_device(&file_name, metadata, dev)
> +                if self.extractor.contains_flags(Flags::WITH_DEVICE_NODES) {
> +                    self.callback(entry.path());
> +                    self.extractor.extract_device(&file_name, metadata, dev)
>                  } else {
>                      Ok(())
>                  }
>              }
>              (true, EntryKind::Fifo) => {
> -                if extractor.contains_flags(Flags::WITH_FIFOS) {
> -                    callback(entry.path());
> -                    extractor.extract_special(&file_name, metadata, 0)
> +                if self.extractor.contains_flags(Flags::WITH_FIFOS) {
> +                    self.callback(entry.path());
> +                    self.extractor.extract_special(&file_name, metadata, 0)
>                  } else {
>                      Ok(())
>                  }
>              }
>              (true, EntryKind::Socket) => {
> -                if extractor.contains_flags(Flags::WITH_SOCKETS) {
> -                    callback(entry.path());
> -                    extractor.extract_special(&file_name, metadata, 0)
> +                if self.extractor.contains_flags(Flags::WITH_SOCKETS) {
> +                    self.callback(entry.path());
> +                    self.extractor.extract_special(&file_name, metadata, 0)
>                  } else {
>                      Ok(())
>                  }
>              }
> -            (true, EntryKind::File { size, .. }) => extractor.extract_file(
> -                &file_name,
> -                metadata,
> -                *size,
> -                &mut decoder
> -                    .contents()
> -                    .context("found regular file entry without contents in archive")?,
> -                extractor.overwrite,
> -            ),
> +            (true, EntryKind::File { size, .. }) => {
> +                let contents = self.decoder.contents();
> +
> +                if let Some(mut contents) = contents {
> +                    self.extractor.extract_file(
> +                        &file_name,
> +                        metadata,
> +                        *size,
> +                        &mut contents,
> +                        self.extractor.overwrite,
> +                    )
> +                } else {
> +                    Err(anyhow!(
> +                        "found regular file entry without contents in archive"
> +                    ))
> +                }
> +            }
>              (false, _) => Ok(()), // skip this
> -        }
> -        .context(format!("error at entry {:?}", file_name_os))?;
> -    }
> +        };
>  
> -    if !extractor.dir_stack.is_empty() {
> -        bail!("unexpected eof while decoding pxar archive");
> +        Some(
> +            extract_res
> +                .context(format!("error at entry {file_name_os:?}"))
> +                .or_else(&mut *self.extractor.on_error),
> +        )
>      }
> -
> -    Ok(())
>  }
>  
>  /// Common state for file extraction.
> -- 
> 2.30.2




  reply	other threads:[~2023-07-11 13:24 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-06-07 16:34 [pbs-devel] [PATCH proxmox-backup/pve-container 0/5] improve error handling of pxar extractor Max Carrara
2023-06-07 16:34 ` [pbs-devel] [PATCH proxmox-backup 1/5] pbs-client: pxar: preserve error context Max Carrara
2023-07-11 12:54   ` Wolfgang Bumiller
2023-06-07 16:34 ` [pbs-devel] [PATCH proxmox-backup 2/5] pbs-client: pxar: refactor body of `extract_archive` to `ExtractorIter` Max Carrara
2023-07-11 13:24   ` Wolfgang Bumiller [this message]
2023-06-07 16:34 ` [pbs-devel] [PATCH proxmox-backup 3/5] pbs-client: pxar: add `PxarExtractContext` Max Carrara
2023-06-07 16:34 ` [pbs-devel] [PATCH proxmox-backup 4/5] proxmox-backup-client: restore: add 'ignore-extract-device-errors' flag Max Carrara
2023-07-11 14:17   ` Wolfgang Bumiller
2023-06-07 16:34 ` [pbs-devel] [PATCH pve-container 5/5] fix #3460: restore: honor '--ignore-unpack-errors' flag for pbs Max Carrara
2023-06-08 13:47   ` Thomas Lamprecht

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=rt25g6fws5y4r2iuwxq23qcezkjewvzy4bwzm7igfd4qqtb5aa@m4c7tuh2fcrp \
    --to=w.bumiller@proxmox.com \
    --cc=m.carrara@proxmox.com \
    --cc=pbs-devel@lists.proxmox.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox
Service provided by Proxmox Server Solutions GmbH | Privacy | Legal