public inbox for pbs-devel@lists.proxmox.com
From: Christian Ebner <c.ebner@proxmox.com>
To: pbs-devel@lists.proxmox.com
Subject: [pbs-devel] [PATCH v5 pxar 4/28] fix #3174: enc/dec: impl PXAR_APPENDIX_REF entrytype
Date: Wed, 15 Nov 2023 16:47:49 +0100
Message-ID: <20231115154813.281564-5-c.ebner@proxmox.com>
In-Reply-To: <20231115154813.281564-1-c.ebner@proxmox.com>

Add an additional entry type for regular files, which stores a reference
into the appendix section of the pxar archive. The entry records the
file's position relative to the appendix start, together with the file
size, so that the file payload can be accessed within the appendix.
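For illustration, here is a minimal sketch of the on-disk layout this implies. It assumes the usual pxar item header: a little-endian `htype`, followed by a `full_size` that includes the 16-byte header itself. The function names are hypothetical and not part of the pxar crate. The decoder checks lengths instead of calling `unwrap()`, so a truncated entry yields `None` rather than a panic.

```rust
// Hypothetical sketch of the APPENDIX_REF wire layout; not the pxar crate's API.
use std::convert::TryInto;

/// Entry type hash as defined in `src/format/mod.rs` by this patch.
const PXAR_APPENDIX_REF: u64 = 0x849b4a17e0234f8e;
/// Assumed item header size: `htype` (u64) followed by `full_size` (u64).
const HEADER_SIZE: usize = 16;
/// Payload: appendix-relative offset (u64) followed by file size (u64).
const PAYLOAD_SIZE: usize = 16;

/// Serialize an APPENDIX_REF item: header plus payload, all little-endian.
fn encode_appendix_ref(appendix_offset: u64, file_size: u64) -> Vec<u8> {
    let full_size = (HEADER_SIZE + PAYLOAD_SIZE) as u64;
    let mut buf = Vec::with_capacity(HEADER_SIZE + PAYLOAD_SIZE);
    buf.extend_from_slice(&PXAR_APPENDIX_REF.to_le_bytes());
    buf.extend_from_slice(&full_size.to_le_bytes());
    buf.extend_from_slice(&appendix_offset.to_le_bytes());
    buf.extend_from_slice(&file_size.to_le_bytes());
    buf
}

/// Parse an APPENDIX_REF item into (appendix_offset, file_size).
/// Returns `None` on a wrong type, wrong size or truncated input.
fn decode_appendix_ref(buf: &[u8]) -> Option<(u64, u64)> {
    // Read a little-endian u64 at `at`, or `None` if the slice is too short.
    let read_u64 = |at: usize| -> Option<u64> {
        Some(u64::from_le_bytes(buf.get(at..at + 8)?.try_into().ok()?))
    };
    if read_u64(0)? != PXAR_APPENDIX_REF {
        return None;
    }
    if read_u64(8)? != (HEADER_SIZE + PAYLOAD_SIZE) as u64 {
        return None;
    }
    Some((read_u64(HEADER_SIZE)?, read_u64(HEADER_SIZE + 8)?))
}

fn main() {
    let item = encode_appendix_ref(4096, 1234);
    assert_eq!(item.len(), 32);
    assert_eq!(decode_appendix_ref(&item), Some((4096, 1234)));
    // A truncated item is rejected rather than causing a panic.
    assert_eq!(decode_appendix_ref(&item[..24]), None);
    println!("ok");
}
```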

This new entry type is used to reference already existing payload chunks
of files whose contents are unchanged.

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
Changes since v4:
- no changes

Changes since v3:
- no changes

Changes since v2:
- no changes

Changes since v1:
- Do not re-encode the filename and metadata in add_appendix_ref by
  removing the call to start_file_do
- Use dedicated types for archive offsets instead of raw `u64` values.
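The following standalone sketch shows why dedicated offset types help. It is modeled on the `AppendixRefOffset` and `AppendixStartOffset` types added below. The `resolve`, `checked_add` and `checked_sub` helpers are hypothetical. They use checked arithmetic in place of the patch's plain `+`/`-`, and they assume the reader already knows the absolute position of the appendix start.

```rust
// Hypothetical sketch: standalone newtypes modeled on the patch's
// `AppendixRefOffset` and `AppendixStartOffset`; not the pxar crate itself.

/// Offset of an entry relative to the start of the appendix section.
#[derive(Clone, Copy, Debug, Default, Eq, PartialEq, Ord, PartialOrd)]
struct AppendixRefOffset(u64);

/// Absolute archive position at which the appendix section begins.
#[derive(Clone, Copy, Debug, Eq, PartialEq, Ord, PartialOrd)]
struct AppendixStartOffset(u64);

impl AppendixRefOffset {
    /// Shift forward; `None` on overflow instead of a debug-build panic.
    fn checked_add(self, offset: u64) -> Option<Self> {
        self.0.checked_add(offset).map(AppendixRefOffset)
    }

    /// Shift backward; `None` on underflow.
    fn checked_sub(self, offset: u64) -> Option<Self> {
        self.0.checked_sub(offset).map(AppendixRefOffset)
    }
}

impl AppendixStartOffset {
    /// Turn an appendix-relative reference into an absolute archive position.
    fn resolve(self, reference: AppendixRefOffset) -> Option<u64> {
        self.0.checked_add(reference.0)
    }
}

fn main() {
    let start = AppendixStartOffset(1_000_000);
    // Assume 4096 bytes of entries were already placed in the appendix.
    let reference = AppendixRefOffset::default().checked_add(4096).unwrap();
    assert_eq!(start.resolve(reference), Some(1_004_096));
    assert_eq!(reference.checked_sub(8192), None);
    // `start.resolve(start)` would not compile: the two offsets cannot be mixed up.
    println!("ok");
}
```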

 examples/mk-format-hashes.rs |  5 +++
 src/decoder/mod.rs           | 22 ++++++++++++-
 src/encoder/aio.rs           | 14 ++++++++-
 src/encoder/mod.rs           | 61 ++++++++++++++++++++++++++++++++++++
 src/encoder/sync.rs          | 16 +++++++++-
 src/format/mod.rs            |  5 +++
 src/lib.rs                   |  6 ++++
 7 files changed, 126 insertions(+), 3 deletions(-)

diff --git a/examples/mk-format-hashes.rs b/examples/mk-format-hashes.rs
index 1ad606c..8b4f5de 100644
--- a/examples/mk-format-hashes.rs
+++ b/examples/mk-format-hashes.rs
@@ -41,6 +41,11 @@ const CONSTANTS: &[(&str, &str, &str)] = &[
         "PXAR_PAYLOAD",
         "__PROXMOX_FORMAT_PXAR_PAYLOAD__",
     ),
+    (
+        "Marks the beginning of an appendix reference for regular files",
+        "PXAR_APPENDIX_REF",
+        "__PROXMOX_FORMAT_PXAR_APPENDIX_REF__",
+    ),
     (
         "Marks item as entry of goodbye table",
         "PXAR_GOODBYE",
diff --git a/src/decoder/mod.rs b/src/decoder/mod.rs
index a094324..70a8697 100644
--- a/src/decoder/mod.rs
+++ b/src/decoder/mod.rs
@@ -276,6 +276,10 @@ impl<I: SeqRead> DecoderImpl<I> {
 
             match self.current_header.htype {
                 format::PXAR_FILENAME => return self.handle_file_entry().await,
+                format::PXAR_APPENDIX_REF => {
+                    self.state = State::Default;
+                    return self.handle_appendix_ref_entry().await
+                }
                 format::PXAR_GOODBYE => {
                     self.state = State::InGoodbyeTable;
 
@@ -334,6 +338,22 @@ impl<I: SeqRead> DecoderImpl<I> {
         self.read_next_entry().await.map(Some)
     }
 
+    async fn handle_appendix_ref_entry(&mut self) -> io::Result<Option<Entry>> {
+        let bytes = self.read_entry_as_bytes().await?;
+        let appendix_offset = u64::from_le_bytes(bytes[0..8].try_into().unwrap());
+        let file_size = u64::from_le_bytes(bytes[8..16].try_into().unwrap());
+
+        self.reset_path()?;
+        Ok(Some(Entry {
+            path: self.entry.path.clone(),
+            metadata: Metadata::default(),
+            kind: EntryKind::AppendixRef {
+                appendix_offset,
+                file_size,
+            },
+        }))
+    }
+
     fn reset_path(&mut self) -> io::Result<()> {
         let path_len = *self
             .path_lengths
@@ -535,7 +555,7 @@ impl<I: SeqRead> DecoderImpl<I> {
                 self.state = State::InPayload { offset: 0 };
                 return Ok(ItemResult::Entry);
             }
-            format::PXAR_FILENAME | format::PXAR_GOODBYE => {
+            format::PXAR_FILENAME | format::PXAR_GOODBYE | format::PXAR_APPENDIX_REF => {
                 if self.entry.metadata.is_fifo() {
                     self.state = State::InSpecialFile;
                     self.entry.kind = EntryKind::Fifo;
diff --git a/src/encoder/aio.rs b/src/encoder/aio.rs
index ad25fea..66ea535 100644
--- a/src/encoder/aio.rs
+++ b/src/encoder/aio.rs
@@ -5,7 +5,7 @@ use std::path::Path;
 use std::pin::Pin;
 use std::task::{Context, Poll};
 
-use crate::encoder::{self, LinkOffset, SeqWrite};
+use crate::encoder::{self, AppendixRefOffset, LinkOffset, SeqWrite};
 use crate::format;
 use crate::Metadata;
 
@@ -112,6 +112,18 @@ impl<'a, T: SeqWrite + 'a> Encoder<'a, T> {
         self.inner.finish().await
     }
 
+    /// Add reference to archive appendix
+    pub async fn add_appendix_ref<PF: AsRef<Path>>(
+        &mut self,
+        file_name: PF,
+        appendix_ref_offset: AppendixRefOffset,
+        file_size: u64,
+    ) -> io::Result<()> {
+        self.inner
+            .add_appendix_ref(file_name.as_ref(), appendix_ref_offset, file_size)
+            .await
+    }
+
     /// Add a symbolic link to the archive.
     pub async fn add_symlink<PF: AsRef<Path>, PT: AsRef<Path>>(
         &mut self,
diff --git a/src/encoder/mod.rs b/src/encoder/mod.rs
index 860c21f..982e3f9 100644
--- a/src/encoder/mod.rs
+++ b/src/encoder/mod.rs
@@ -38,6 +38,33 @@ impl LinkOffset {
     }
 }
 
+/// Offset relative to the appendix start, used to create appendix references.
+#[derive(Clone, Copy, Debug, Default, Eq, PartialEq, Ord, PartialOrd)]
+pub struct AppendixRefOffset(u64);
+
+impl AppendixRefOffset {
+    /// Get the raw byte offset for this appendix reference.
+    #[inline]
+    pub fn raw(self) -> u64 {
+        self.0
+    }
+
+    /// Return a new AppendixRefOffset, positively shifted by offset
+    #[inline]
+    pub fn add(&self, offset: u64) -> Self {
+        Self(self.0 + offset)
+    }
+
+    /// Return a new AppendixRefOffset, negatively shifted by offset
+    #[inline]
+    pub fn sub(&self, offset: u64) -> Self {
+        Self(self.0 - offset)
+    }
+}
+
+/// Offset pointing to the start of the appendix section of the archive.
+#[derive(Clone, Copy, Debug, Eq, PartialEq, Ord, PartialOrd)]
+pub struct AppendixStartOffset(u64);
 /// Sequential write interface used by the encoder's state machine.
 ///
 /// This is our internal writer trait which is available for `std::io::Write` types in the
@@ -466,6 +493,40 @@ impl<'a, T: SeqWrite + 'a> EncoderImpl<'a, T> {
         Ok(offset)
     }
 
+    /// Add reference to pxar archive appendix
+    pub async fn add_appendix_ref(
+        &mut self,
+        file_name: &Path,
+        appendix_ref_offset: AppendixRefOffset,
+        file_size: u64,
+    ) -> io::Result<()> {
+        self.check()?;
+
+        let offset = self.position();
+        let file_name = file_name.as_os_str().as_bytes();
+
+        let mut data = Vec::with_capacity(2 * 8);
+        data.extend(&appendix_ref_offset.raw().to_le_bytes());
+        data.extend(&file_size.to_le_bytes());
+        seq_write_pxar_entry(
+            self.output.as_mut(),
+            format::PXAR_APPENDIX_REF,
+            &data,
+            &mut self.state.write_position,
+        )
+        .await?;
+
+        let end_offset = self.position();
+
+        self.state.items.push(GoodbyeItem {
+            hash: format::hash_filename(file_name),
+            offset,
+            size: end_offset - offset,
+        });
+
+        Ok(())
+    }
+
     /// Return a file offset usable with `add_hardlink`.
     pub async fn add_symlink(
         &mut self,
diff --git a/src/encoder/sync.rs b/src/encoder/sync.rs
index 93c3b2c..370a219 100644
--- a/src/encoder/sync.rs
+++ b/src/encoder/sync.rs
@@ -6,7 +6,7 @@ use std::pin::Pin;
 use std::task::{Context, Poll};
 
 use crate::decoder::sync::StandardReader;
-use crate::encoder::{self, LinkOffset, SeqWrite};
+use crate::encoder::{self, AppendixRefOffset, LinkOffset, SeqWrite};
 use crate::format;
 use crate::util::poll_result_once;
 use crate::Metadata;
@@ -110,6 +110,20 @@ impl<'a, T: SeqWrite + 'a> Encoder<'a, T> {
         poll_result_once(self.inner.finish())
     }
 
+    /// Add reference to archive appendix
+    pub fn add_appendix_ref<PF: AsRef<Path>>(
+        &mut self,
+        file_name: PF,
+        appendix_ref_offset: AppendixRefOffset,
+        file_size: u64,
+    ) -> io::Result<()> {
+        poll_result_once(self.inner.add_appendix_ref(
+            file_name.as_ref(),
+            appendix_ref_offset,
+            file_size,
+        ))
+    }
+
     /// Add a symbolic link to the archive.
     pub fn add_symlink<PF: AsRef<Path>, PT: AsRef<Path>>(
         &mut self,
diff --git a/src/format/mod.rs b/src/format/mod.rs
index 72a193c..5eb7562 100644
--- a/src/format/mod.rs
+++ b/src/format/mod.rs
@@ -22,6 +22,7 @@
 //!   * `FCAPS`             -- file capability in Linux disk format
 //!   * `QUOTA_PROJECT_ID`  -- the ext4/xfs quota project ID
 //!   * `PAYLOAD`           -- file contents, if it is one
+//!   * `APPENDIX_REF`      -- start offset and size of a file entry relative to the appendix start
 //!   * `SYMLINK`           -- symlink target, if it is one
 //!   * `DEVICE`            -- device major/minor, if it is a block/char device
 //!
@@ -99,6 +100,8 @@ pub const PXAR_QUOTA_PROJID: u64 = 0xe07540e82f7d1cbb;
 pub const PXAR_HARDLINK: u64 = 0x51269c8422bd7275;
 /// Marks the beginnig of the payload (actual content) of regular files
 pub const PXAR_PAYLOAD: u64 = 0x28147a1b0b7c1a25;
+/// Marks the beginning of an appendix reference for regular files
+pub const PXAR_APPENDIX_REF: u64 = 0x849b4a17e0234f8e;
 /// Marks item as entry of goodbye table
 pub const PXAR_GOODBYE: u64 = 0x2fec4fa642d5731d;
 /// The end marker used in the GOODBYE object
@@ -151,6 +154,7 @@ impl Header {
             PXAR_ACL_GROUP_OBJ => size_of::<acl::GroupObject>() as u64,
             PXAR_QUOTA_PROJID => size_of::<QuotaProjectId>() as u64,
             PXAR_ENTRY => size_of::<Stat>() as u64,
+            PXAR_APPENDIX_REF => u64::MAX - (size_of::<Self>() as u64),
             PXAR_PAYLOAD | PXAR_GOODBYE => u64::MAX - (size_of::<Self>() as u64),
             _ => u64::MAX - (size_of::<Self>() as u64),
         }
@@ -191,6 +195,7 @@ impl Display for Header {
             PXAR_ACL_GROUP_OBJ => "ACL_GROUP_OBJ",
             PXAR_QUOTA_PROJID => "QUOTA_PROJID",
             PXAR_ENTRY => "ENTRY",
+            PXAR_APPENDIX_REF => "APPENDIX_REF",
             PXAR_PAYLOAD => "PAYLOAD",
             PXAR_GOODBYE => "GOODBYE",
             _ => "UNKNOWN",
diff --git a/src/lib.rs b/src/lib.rs
index 210c4b1..fa84e7a 100644
--- a/src/lib.rs
+++ b/src/lib.rs
@@ -366,6 +366,12 @@ pub enum EntryKind {
         offset: Option<u64>,
     },
 
+    /// Reference to pxar archive appendix
+    AppendixRef {
+        appendix_offset: u64,
+        file_size: u64,
+    },
+
     /// Directory entry. When iterating through an archive, the contents follow next.
     Directory,
 
-- 
2.39.2

Thread overview: 29+ messages
2023-11-15 15:47 [pbs-devel] [PATCH-SERIES v5 pxar proxmox-backup proxmox-widget-toolkit 00/28] fix #3174: improve file-level backup Christian Ebner
2023-11-15 15:47 ` [pbs-devel] [PATCH v5 pxar 1/28] fix #3174: decoder: factor out skip_bytes from skip_entry Christian Ebner
2023-11-15 15:47 ` [pbs-devel] [PATCH v5 pxar 2/28] fix #3174: decoder: impl skip_bytes for sync dec Christian Ebner
2023-11-15 15:47 ` [pbs-devel] [PATCH v5 pxar 3/28] fix #3174: encoder: calc filename + metadata byte size Christian Ebner
2023-11-15 15:47 ` Christian Ebner [this message]
2023-11-15 15:47 ` [pbs-devel] [PATCH v5 pxar 5/28] fix #3174: enc/dec: impl PXAR_APPENDIX entrytype Christian Ebner
2023-11-15 15:47 ` [pbs-devel] [PATCH v5 pxar 6/28] fix #3174: encoder: helper to add to encoder position Christian Ebner
2023-11-15 15:47 ` [pbs-devel] [PATCH v5 pxar 7/28] fix #3174: enc/dec: impl PXAR_APPENDIX_TAIL entrytype Christian Ebner
2023-11-15 15:47 ` [pbs-devel] [PATCH v5 pxar 8/28] fix #3174: enc/dec: introduce pxar format version 2 Christian Ebner
2023-11-15 15:47 ` [pbs-devel] [PATCH v5 proxmox-backup 09/28] fix #3174: index: add fn index list from start/end-offsets Christian Ebner
2023-11-15 15:47 ` [pbs-devel] [PATCH v5 proxmox-backup 10/28] fix #3174: index: add fn digest for DynamicEntry Christian Ebner
2023-11-15 15:47 ` [pbs-devel] [PATCH v5 proxmox-backup 11/28] fix #3174: api: double catalog upload size Christian Ebner
2023-11-15 15:47 ` [pbs-devel] [PATCH v5 proxmox-backup 12/28] fix #3174: catalog: introduce extended format v2 Christian Ebner
2023-11-15 15:47 ` [pbs-devel] [PATCH v5 proxmox-backup 13/28] fix #3174: archiver/extractor: impl appendix ref Christian Ebner
2023-11-15 15:47 ` [pbs-devel] [PATCH v5 proxmox-backup 14/28] fix #3174: catalog: add specialized Archive entry Christian Ebner
2023-11-15 15:48 ` [pbs-devel] [PATCH v5 proxmox-backup 15/28] fix #3174: extractor: impl seq restore from appendix Christian Ebner
2023-11-15 15:48 ` [pbs-devel] [PATCH v5 proxmox-backup 16/28] fix #3174: archiver: store ref to previous backup Christian Ebner
2023-11-15 15:48 ` [pbs-devel] [PATCH v5 proxmox-backup 17/28] fix #3174: upload stream: impl reused chunk injector Christian Ebner
2023-11-15 15:48 ` [pbs-devel] [PATCH v5 proxmox-backup 18/28] fix #3174: chunker: add forced boundaries Christian Ebner
2023-11-15 15:48 ` [pbs-devel] [PATCH v5 proxmox-backup 19/28] fix #3174: backup writer: inject queued chunk in upload steam Christian Ebner
2023-11-15 15:48 ` [pbs-devel] [PATCH v5 proxmox-backup 20/28] fix #3174: archiver: reuse files with unchanged metadata Christian Ebner
2023-11-15 15:48 ` [pbs-devel] [PATCH v5 proxmox-backup 21/28] fix #3174: specs: add backup detection mode specification Christian Ebner
2023-11-15 15:48 ` [pbs-devel] [PATCH v5 proxmox-backup 22/28] fix #3174: client: Add detection mode to backup creation Christian Ebner
2023-11-15 15:48 ` [pbs-devel] [PATCH v5 proxmox-backup 23/28] test-suite: add detection mode change benchmark Christian Ebner
2023-11-15 15:48 ` [pbs-devel] [PATCH v5 proxmox-backup 24/28] test-suite: Add bin to deb, add shell completions Christian Ebner
2023-11-15 15:48 ` [pbs-devel] [PATCH v5 proxmox-backup 25/28] catalog: fetch offset and size for files and refs Christian Ebner
2023-11-15 15:48 ` [pbs-devel] [PATCH v5 proxmox-backup 26/28] pxar: add heuristic to reduce reused chunk fragmentation Christian Ebner
2023-11-15 15:48 ` [pbs-devel] [PATCH v5 proxmox-backup 27/28] catalog: use format version 2 conditionally Christian Ebner
2023-11-15 15:48 ` [pbs-devel] [PATCH v5 proxmox-widget-toolkit 28/28] file-browser: support pxar archive and fileref types Christian Ebner
