From: Christian Ebner <c.ebner@proxmox.com>
To: pbs-devel@lists.proxmox.com
Subject: [pbs-devel] [PATCH v4 proxmox-backup 14/26] fix #3174: extractor: impl seq restore from appendix
Date: Thu,  9 Nov 2023 19:46:02 +0100
Message-ID: <20231109184614.1611127-15-c.ebner@proxmox.com>
In-Reply-To: <20231109184614.1611127-1-c.ebner@proxmox.com>

Restores the file payloads for all AppendixRef entries encountered
during the sequential restore of the pxar archive.
This is done by iterating over all the files listed in the corresponding
state variable. For each file, every parent directory is opened and its
metadata stored for later restore, then the file is created and its
contents are written to it.

When leaving the directories, their metadata is restored.

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
Changes since version 3:
- Use BTreeMap and sorted insert instead of Vec and sorting afterwards

Changes since version 2:
- Sort entries by their appendix start offset for restore. Required
  since chunks are now normalized during upload.

Changes since version 1:
- Use the Encoder<SeqSink> to get encoded metadata byte size
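
The offset bookkeeping that relies on the sorted BTreeMap can be sketched
in isolation as follows. This is only an illustration, not the code from
the patch: `plan_skips` and `ENTRY_OVERHEAD` are hypothetical names, and
the fixed overhead stands in for the encoded filename + metadata size
plus the payload header size that the patch computes per entry. The
sketch uses checked subtraction so that overlapping or out-of-range
offsets yield `None` instead of wrapping.

```rust
use std::collections::BTreeMap;
use std::path::PathBuf;

/// Hypothetical fixed per-entry overhead (filename, metadata, payload header).
/// The patch derives this per entry; a constant keeps the sketch short.
const ENTRY_OVERHEAD: u64 = 64;

/// For each referenced file, in ascending appendix offset order, compute how
/// many bytes have to be skipped before its entry starts. Also returns the
/// number of bytes left to skip after the last entry, given the appendix size.
/// Returns `None` if the offsets and sizes are inconsistent.
fn plan_skips(
    refs: &BTreeMap<u64, (PathBuf, u64)>,
    total: u64,
) -> Option<(Vec<(PathBuf, u64)>, u64)> {
    let mut consumed = 0u64; // bytes consumed since the appendix marker
    let mut plan = Vec::new();
    // BTreeMap iterates in key order, so entries are visited by offset.
    for (offset, (path, size)) in refs {
        let skip = offset.checked_sub(consumed)?;
        plan.push((path.clone(), skip));
        consumed = consumed.checked_add(skip + ENTRY_OVERHEAD + *size)?;
    }
    let tail = total.checked_sub(consumed)?;
    Some((plan, tail))
}

fn main() {
    let mut refs = BTreeMap::new();
    // Insertion order does not matter, the map keeps keys sorted.
    refs.insert(500u64, (PathBuf::from("/b/two"), 100u64));
    refs.insert(0u64, (PathBuf::from("/a/one"), 36u64));

    match plan_skips(&refs, 1000) {
        Some((plan, tail)) => {
            for (path, skip) in &plan {
                println!("skip {skip} bytes, then restore {}", path.display());
            }
            println!("skip {tail} trailing bytes");
        }
        None => eprintln!("inconsistent appendix offsets"),
    }
}
```

With the two example entries, the first file needs no skip, the second
needs a skip of 400 bytes (500 minus the 100 bytes consumed by the first
entry), and 336 bytes remain to be skipped at the end of the appendix.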

 pbs-client/src/pxar/create.rs  |   4 +-
 pbs-client/src/pxar/extract.rs | 149 +++++++++++++++++++++++++++++++--
 pbs-client/src/pxar/tools.rs   |   1 +
 3 files changed, 144 insertions(+), 10 deletions(-)

diff --git a/pbs-client/src/pxar/create.rs b/pbs-client/src/pxar/create.rs
index 611d7421..50bba4e6 100644
--- a/pbs-client/src/pxar/create.rs
+++ b/pbs-client/src/pxar/create.rs
@@ -43,7 +43,7 @@ pub struct PxarCreateOptions {
     pub skip_lost_and_found: bool,
 }
 
-fn detect_fs_type(fd: RawFd) -> Result<i64, Error> {
+pub fn detect_fs_type(fd: RawFd) -> Result<i64, Error> {
     let mut fs_stat = std::mem::MaybeUninit::uninit();
     let res = unsafe { libc::fstatfs(fd, fs_stat.as_mut_ptr()) };
     Errno::result(res)?;
@@ -776,7 +776,7 @@ impl Archiver {
     }
 }
 
-fn get_metadata(
+pub fn get_metadata(
     fd: RawFd,
     stat: &FileStat,
     flags: Flags,
diff --git a/pbs-client/src/pxar/extract.rs b/pbs-client/src/pxar/extract.rs
index d2d42749..7bdcb595 100644
--- a/pbs-client/src/pxar/extract.rs
+++ b/pbs-client/src/pxar/extract.rs
@@ -1,6 +1,6 @@
 //! Code for extraction of pxar contents onto the file system.
 
-use std::collections::HashMap;
+use std::collections::{BTreeMap, HashMap};
 use std::ffi::{CStr, CString, OsStr, OsString};
 use std::io;
 use std::os::unix::ffi::OsStrExt;
@@ -74,7 +74,7 @@ struct ExtractorIterState {
     err_path_stack: Vec<OsString>,
     current_match: bool,
     end_reached: bool,
-    appendix_list: Vec<(PathBuf, u64, u64)>,
+    appendix_refs: BTreeMap<u64, (PathBuf, u64)>,
 }
 
 /// An [`Iterator`] that encapsulates the process of extraction in [extract_archive].
@@ -99,7 +99,7 @@ impl ExtractorIterState {
             err_path_stack: Vec::new(),
             current_match: options.extract_match_default,
             end_reached: false,
-            appendix_list: Vec::new(),
+            appendix_refs: BTreeMap::new(),
         }
     }
 }
@@ -313,6 +313,141 @@ where
 
                 res
             }
+            (_, EntryKind::Appendix { total }) => {
+                // Bytes consumed in decoder since encountering the appendix marker
+                let mut consumed = 0;
+                for (offset, (path, size)) in &self.state.appendix_refs {
+                    self.extractor.allow_existing_dirs = true;
+
+                    // Open dir path components, skipping the root component, get metadata
+                    for dir in path.iter().skip(1) {
+                        let parent_fd = match self.extractor.dir_stack.last_dir_fd(true) {
+                            Ok(parent_fd) => parent_fd,
+                            Err(err) => return Some(Err(err.into())),
+                        };
+                        let fs_magic =
+                            match crate::pxar::create::detect_fs_type(parent_fd.as_raw_fd()) {
+                                Ok(fs_magic) => fs_magic,
+                                Err(err) => return Some(Err(err.into())),
+                            };
+
+                        let mut fs_feature_flags = Flags::from_magic(fs_magic);
+                        let file_name = match CString::new(dir.as_bytes()) {
+                            Ok(file_name) => file_name,
+                            Err(err) => return Some(Err(err.into())),
+                        };
+                        let fd = match proxmox_sys::fd::openat(
+                            &parent_fd,
+                            file_name.as_ref(),
+                            OFlag::O_NOATIME,
+                            Mode::empty(),
+                        ) {
+                            Ok(fd) => fd,
+                            Err(err) => return Some(Err(err.into())),
+                        };
+
+                        let stat = match nix::sys::stat::fstat(fd.as_raw_fd()) {
+                            Ok(stat) => stat,
+                            Err(err) => return Some(Err(err.into())),
+                        };
+                        let metadata = match crate::pxar::create::get_metadata(
+                            fd.as_raw_fd(),
+                            &stat,
+                            fs_feature_flags,
+                            fs_magic,
+                            &mut fs_feature_flags,
+                        ) {
+                            Ok(metadata) => metadata,
+                            Err(err) => return Some(Err(err)),
+                        };
+
+                        match self.extractor.enter_directory(
+                            dir.to_os_string(),
+                            metadata.clone(),
+                            true,
+                        ) {
+                            Ok(()) => (),
+                            Err(err) => return Some(Err(err)),
+                        };
+                    }
+
+                    let skip = *offset - consumed;
+                    match self.decoder.skip_bytes(skip) {
+                        Ok(()) => (),
+                        Err(err) => return Some(Err(err.into())),
+                    };
+
+                    let entry = match self.decoder.next() {
+                        Some(Ok(entry)) => entry,
+                        Some(Err(err)) => return Some(Err(err.into())),
+                        None => return Some(Err(format_err!("expected entry"))),
+                    };
+
+                    let file_name_os = entry.file_name();
+                    let file_name_bytes = file_name_os.as_bytes();
+
+                    let file_name = match CString::new(file_name_bytes) {
+                        Ok(file_name_ref) => file_name_ref,
+                        Err(err) => return Some(Err(err.into())),
+                    };
+
+                    let metadata = entry.metadata();
+
+                    self.extractor.set_path(path.as_os_str().to_owned());
+
+                    let contents = self.decoder.contents();
+                    match contents {
+                        None => {
+                            return Some(Err(format_err!(
+                                "found regular file entry without contents in archive"
+                            )))
+                        }
+                        Some(mut contents) => {
+                            let result = self
+                                .extractor
+                                .extract_file(
+                                    &file_name,
+                                    metadata,
+                                    *size,
+                                    &mut contents,
+                                    self.extractor
+                                        .overwrite_flags
+                                        .contains(OverwriteFlags::FILE),
+                                )
+                                .context(PxarExtractContext::ExtractFile);
+                            if let Err(err) = result {
+                                return Some(Err(err.into()));
+                            }
+                        }
+                    }
+
+                    // Iter over all dir path components, skipping the root component, set metadata
+                    for _dir in path.iter().skip(1) {
+                        if let Err(err) = self.extractor.leave_directory() {
+                            return Some(Err(err.into()));
+                        }
+                    }
+
+                    let mut bytes = match pxar::encoder::Encoder::<pxar::encoder::SeqSink>::byte_len(
+                        file_name.as_c_str(),
+                        &metadata,
+                    ) {
+                        Ok(bytes) => bytes,
+                        Err(err) => return Some(Err(err.into())),
+                    };
+                    // payload header size
+                    bytes += std::mem::size_of::<pxar::format::Header>() as u64;
+
+                    consumed += skip + bytes + *size;
+                }
+
+                let skip = *total - consumed;
+                if let Err(err) = self.decoder.skip_bytes(skip) {
+                    return Some(Err(err.into()));
+                }
+
+                Ok(())
+            }
             (true, EntryKind::Symlink(link)) => {
                 self.callback(entry.path());
                 self.extractor
@@ -382,11 +517,9 @@ where
                     file_size,
                 },
             ) => {
-                self.state.appendix_list.push((
-                    entry.path().to_path_buf(),
-                    *appendix_offset,
-                    *file_size,
-                ));
+                self.state
+                    .appendix_refs
+                    .insert(*appendix_offset, (entry.path().to_path_buf(), *file_size));
                 Ok(())
             }
             (false, _) => Ok(()), // skip this
diff --git a/pbs-client/src/pxar/tools.rs b/pbs-client/src/pxar/tools.rs
index aac5a1e7..174a7351 100644
--- a/pbs-client/src/pxar/tools.rs
+++ b/pbs-client/src/pxar/tools.rs
@@ -156,6 +156,7 @@ pub fn format_multi_line_entry(entry: &Entry) -> String {
 
     let (size, link, type_name) = match entry.kind() {
         EntryKind::File { size, .. } => (format!("{}", *size), String::new(), "file"),
+        EntryKind::Appendix { total } => (format!("{total}"), String::new(), "appendix"),
         EntryKind::AppendixRef {
             appendix_offset,
             file_size,
-- 
2.39.2