public inbox for pbs-devel@lists.proxmox.com
 help / color / mirror / Atom feed
* [pbs-devel] [RFC v2 pxar proxmox-backup 00/23] fix #3174: improve file-level backup
@ 2023-10-09 11:51 Christian Ebner
  2023-10-09 11:51 ` [pbs-devel] [RFC v2 pxar 1/23] fix #3174: decoder: factor out skip_bytes from skip_entry Christian Ebner
                   ` (22 more replies)
  0 siblings, 23 replies; 24+ messages in thread
From: Christian Ebner @ 2023-10-09 11:51 UTC (permalink / raw)
  To: pbs-devel

Disclaimer: Version 2 of the patch series remains work in progress and
is only intended for review/testing purposes.

Changes to the patch series since version 1 are based on the feedback
obtained via mailing list or other communication channels. Many thanks
to Fabian, Wolfgang and Thomas for testing and discussions.

This patch series prototypes a possible approach to improve the pxar
file-level backup creation speed.

The current approach is to skip encoding of regular file payloads whose
metadata (currently ctime and size) did not change compared to a
previous backup run. Instead of re-encoding such files, a reference to a
newly introduced appendix section of the pxar archive is written. The
appendix section is created as a concatenation of indexed chunks from
the previous backup run and therefore contains the sequential file
payload at a calculated offset relative to the start of the appendix
section.
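
The reuse decision itself boils down to a metadata comparison against
the catalog entry recorded by the previous run. The following is a
minimal sketch with hypothetical types (`PrevFileEntry` is not part of
this series); the actual archiver logic differs in detail:

```rust
/// Hypothetical view of a previous run's catalog entry for a file.
struct PrevFileEntry {
    ctime: i64,
    size: u64,
}

/// A file payload qualifies for an appendix reference only if both
/// ctime and size are unchanged compared to the previous backup run.
fn can_reuse_payload(prev: Option<&PrevFileEntry>, ctime: i64, size: u64) -> bool {
    match prev {
        Some(entry) => entry.ctime == ctime && entry.size == size,
        // No reference entry: the payload has to be encoded regularly.
        None => false,
    }
}
```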

Metadata comparison and calculation of the chunks to be indexed for the
appendix section are performed using the catalog of a previous backup as
reference. To be able to calculate the offsets, an updated catalog file
format version 2 is introduced, which extends the previous version by
the file offset relative to the pxar archive byte stream as well as the
file's ctime. This makes it possible to find the required chunk indices
and the start padding within the concatenated chunks. The catalog reader
remains backwards compatible with catalog file format version 1.
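
Sketching that calculation under the assumption that chunks are
described by their cumulative end offsets (as in a dynamic index): the
chunks covering a byte range and the padding within the first chunk can
be found by binary search. This helper is illustrative only, not the
series' implementation:

```rust
/// Return the inclusive index range [first, last] of the chunks
/// covering `start..end` plus the padding from the first covering
/// chunk's start to `start`. `chunk_ends` holds the cumulative end
/// offset of each chunk; the range is assumed to lie within the index.
fn covering_chunks(chunk_ends: &[u64], start: u64, end: u64) -> (usize, usize, u64) {
    // First chunk whose end lies beyond `start`.
    let first = chunk_ends.partition_point(|&e| e <= start);
    // Last chunk needed to reach `end`.
    let last = chunk_ends.partition_point(|&e| e < end);
    let first_chunk_start = if first == 0 { 0 } else { chunk_ends[first - 1] };
    (first, last, start - first_chunk_start)
}
```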

During encoding, the chunks needed for the appendix section are injected
into the backup upload stream after forcing a chunk boundary once
regular pxar encoding is finished. Finally, pxar archives containing an
appendix section are marked as such by appending a final pxar goodbye
lookup table that only contains the offset to the appendix section start
and the total size of that section, which is needed for random access,
e.g. to mount the archive via the fuse filesystem implementation.
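
A reader can hence tell the two archive flavors apart by looking at the
final, goodbye-item-sized tail entry. A minimal sketch of that check,
using the marker constants introduced later in this series:

```rust
use std::convert::TryInto;

// Marker constants as introduced by this series.
const PXAR_GOODBYE_TAIL_MARKER: u64 = 0xef5eed5b753e1555;
const PXAR_APPENDIX_TAIL: u64 = 0x5b1b9abb7ae454f1;

/// Inspect the last 24 bytes of an archive: returns the appendix start
/// offset and size for an appendix tail, `None` for a plain goodbye
/// tail marker, and an error for anything else.
fn appendix_info(tail: &[u8; 24]) -> std::io::Result<Option<(u64, u64)>> {
    let hash = u64::from_le_bytes(tail[0..8].try_into().unwrap());
    let offset = u64::from_le_bytes(tail[8..16].try_into().unwrap());
    let size = u64::from_le_bytes(tail[16..24].try_into().unwrap());
    match hash {
        PXAR_APPENDIX_TAIL => Ok(Some((offset, size))),
        PXAR_GOODBYE_TAIL_MARKER => Ok(None),
        _ => Err(std::io::Error::new(
            std::io::ErrorKind::InvalidData,
            "no goodbye tail marker found",
        )),
    }
}
```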

The most notable changes included in this series since the previous
version 1 are:

- fixes the missing chunk issue by refactoring the logic to avoid
  re-appending the same chunks multiple times when they are referenced
  by multiple consecutive files.
- fixes a performance issue where catalog lookups became the bottleneck
  for directories with many entries, causing the metadata-based file
  change detection to perform worse than the regular mode.
- fixes the creation of multi-archive backups. All of the archives use
  the same reference catalog.
- the catalog file format is bumped to version 2, including the needed
  archive offsets as well as ctime for file change detection.
- the catalog is fully backward compatible with catalog file format
  version 1, so both can be decoded by the reader. However, the new
  version of the catalog file format will be used for all new backups.
- it is now possible to perform multiple consecutive runs of the backup
  with metadata-based file change detection; there is no longer a need
  to perform a regular run before the metadata-based one.
- change from `incremental` flag to enum based `BackupDetectionMode`
  parameter for command invocations.
- includes a new `proxmox-backup-test-suite` binary to create and run
  benchmarks to compare the performance of the different detection modes.

A backup run with these patches applied is now invoked as:
```bash
proxmox-backup-client backup <label>.pxar:<source-path> --change-detection-mode=metadata
```
During the first run, no reference catalog is available, so the backup
is a regular run. Subsequent backups will however utilize the catalog
and index files of the previous run to perform file change detection.

As benchmarks, the Linux source code as well as the COCO dataset for
computer vision and pattern recognition can be used.
The benchmarks can be performed by running:
```bash
proxmox-backup-test-suite detection-mode-bench prepare --target /<path-to-bench-source-target>
proxmox-backup-test-suite detection-mode-bench run linux.pxar:/<path-to-bench-source-target>/linux
proxmox-backup-test-suite detection-mode-bench run coco.pxar:/<path-to-bench-source-target>/coco
```

The above command invocations assume the default repository and
credentials to be set as environment variables; they might however be
passed as additional optional parameters instead.
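
For example, using the standard client environment variables (the
values below are placeholders):

```bash
export PBS_REPOSITORY='user@pbs@host:datastore'
export PBS_PASSWORD='...'
```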

Initial benchmark runs using these test data show a significant
improvement in the time needed for the backups:

For the linux source code backup:
    Completed benchmark with 5 runs for each tested mode.

    Completed regular backup with:
    Total runtime: 50.41 s
    Average: 10.08 ± 0.11 s
    Min: 9.93 s
    Max: 10.19 s

    Completed metadata detection mode backup with:
    Total runtime: 8.93 s
    Average: 1.79 ± 0.06 s
    Min: 1.74 s
    Max: 1.86 s

    Differences (metadata based - regular):
    Delta total runtime: -41.48 s (-82.28 %)
    Delta average: -8.30 ± 0.13 s (-82.28 %)
    Delta min: -8.19 s (-82.52 %)
    Delta max: -8.33 s (-81.77 %)

For the coco dataset backup:
    Completed benchmark with 5 runs for each tested mode.

    Completed regular backup with:
    Total runtime: 587.54 s
    Average: 117.51 ± 2.66 s
    Min: 114.32 s
    Max: 119.62 s

    Completed metadata detection mode backup with:
    Total runtime: 111.07 s
    Average: 22.21 ± 0.19 s
    Min: 22.05 s
    Max: 22.54 s

    Differences (metadata based - regular):
    Delta total runtime: -476.47 s (-81.10 %)
    Delta average: -95.29 ± 2.66 s (-81.10 %)
    Delta min: -92.27 s (-80.71 %)
    Delta max: -97.08 s (-81.16 %)

The above tests were performed inside a test VM backing up to a local
datastore.

pxar:

Christian Ebner (7):
  fix #3174: decoder: factor out skip_bytes from skip_entry
  fix #3174: decoder: impl skip_bytes for sync dec
  fix #3174: encoder: calc filename + metadata byte size
  fix #3174: enc/dec: impl PXAR_APPENDIX_REF entrytype
  fix #3174: enc/dec: impl PXAR_APPENDIX entrytype
  fix #3174: encoder: helper to add to encoder position
  fix #3174: enc/dec: impl PXAR_APPENDIX_TAIL entrytype

 examples/mk-format-hashes.rs |  11 +++
 examples/pxarcmd.rs          |   4 +-
 src/accessor/mod.rs          |  46 +++++++++++
 src/decoder/mod.rs           |  50 ++++++++++--
 src/decoder/sync.rs          |   5 ++
 src/encoder/aio.rs           |  32 +++++++-
 src/encoder/mod.rs           | 153 ++++++++++++++++++++++++++++++++++-
 src/encoder/sync.rs          |  39 ++++++++-
 src/format/mod.rs            |  16 ++++
 src/lib.rs                   |  10 +++
 10 files changed, 348 insertions(+), 18 deletions(-)

proxmox-backup:

Christian Ebner (16):
  fix #3174: index: add fn index list from start/end-offsets
  fix #3174: index: add fn digest for DynamicEntry
  fix #3174: api: double catalog upload size
  fix #3174: catalog: introduce extended format v2
  fix #3174: archiver/extractor: impl appendix ref
  fix #3174: catalog: add specialized Archive entry
  fix #3174: extractor: impl seq restore from appendix
  fix #3174: archiver: store ref to previous backup
  fix #3174: upload stream: impl reused chunk injector
  fix #3174: chunker: add forced boundaries
  fix #3174: backup writer: inject queued chunk in upload steam
  fix #3174: archiver: reuse files with unchanged metadata
  fix #3174: schema: add backup detection mode schema
  fix #3174: client: Add detection mode to backup creation
  test-suite: add detection mode change benchmark
  test-suite: Add bin to deb, add shell completions

 Cargo.toml                                    |   1 +
 Makefile                                      |  13 +-
 debian/proxmox-backup-client.bash-completion  |   1 +
 debian/proxmox-backup-client.install          |   2 +
 debian/proxmox-backup-test-suite.bc           |   8 +
 examples/test_chunk_speed2.rs                 |   9 +-
 pbs-api-types/src/datastore.rs                |  41 ++
 pbs-client/src/backup_writer.rs               |  88 +--
 pbs-client/src/catalog_shell.rs               |   3 +-
 pbs-client/src/chunk_stream.rs                |  41 +-
 pbs-client/src/inject_reused_chunks.rs        | 123 ++++
 pbs-client/src/lib.rs                         |   1 +
 pbs-client/src/pxar/create.rs                 | 314 +++++++++-
 pbs-client/src/pxar/extract.rs                | 141 +++++
 pbs-client/src/pxar/mod.rs                    |   2 +-
 pbs-client/src/pxar/tools.rs                  |   9 +
 pbs-client/src/pxar_backup_stream.rs          |   8 +-
 pbs-datastore/src/catalog.rs                  | 568 +++++++++++++++---
 pbs-datastore/src/dynamic_index.rs            |  37 ++
 pbs-datastore/src/file_formats.rs             |   3 +
 proxmox-backup-client/src/main.rs             | 161 ++++-
 proxmox-backup-test-suite/Cargo.toml          |  18 +
 .../src/detection_mode_bench.rs               | 288 +++++++++
 proxmox-backup-test-suite/src/main.rs         |  17 +
 .../src/proxmox_restore_daemon/api.rs         |  21 +-
 pxar-bin/src/main.rs                          |  23 +-
 src/api2/backup/upload_chunk.rs               |   4 +-
 src/tape/file_formats/snapshot_archive.rs     |   2 +-
 tests/catar.rs                                |   3 +
 zsh-completions/_proxmox-backup-test-suite    |  13 +
 30 files changed, 1789 insertions(+), 174 deletions(-)
 create mode 100644 debian/proxmox-backup-test-suite.bc
 create mode 100644 pbs-client/src/inject_reused_chunks.rs
 create mode 100644 proxmox-backup-test-suite/Cargo.toml
 create mode 100644 proxmox-backup-test-suite/src/detection_mode_bench.rs
 create mode 100644 proxmox-backup-test-suite/src/main.rs
 create mode 100644 zsh-completions/_proxmox-backup-test-suite

-- 
2.39.2





^ permalink raw reply	[flat|nested] 24+ messages in thread

* [pbs-devel] [RFC v2 pxar 1/23] fix #3174: decoder: factor out skip_bytes from skip_entry
  2023-10-09 11:51 [pbs-devel] [RFC v2 pxar proxmox-backup 00/23] fix #3174: improve file-level backup Christian Ebner
@ 2023-10-09 11:51 ` Christian Ebner
  2023-10-09 11:51 ` [pbs-devel] [RFC v2 pxar 2/23] fix #3174: decoder: impl skip_bytes for sync dec Christian Ebner
                   ` (21 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: Christian Ebner @ 2023-10-09 11:51 UTC (permalink / raw)
  To: pbs-devel

Allows skipping bytes independently of the current entry's header, as
implemented by skip_entry.

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---

Changes since v1:
- use `u64` instead of `usize` as type for len parameter of `skip_bytes`

 src/decoder/mod.rs | 19 ++++++++++++-------
 1 file changed, 12 insertions(+), 7 deletions(-)

diff --git a/src/decoder/mod.rs b/src/decoder/mod.rs
index d1fb911..a094324 100644
--- a/src/decoder/mod.rs
+++ b/src/decoder/mod.rs
@@ -562,20 +562,25 @@ impl<I: SeqRead> DecoderImpl<I> {
     // These utilize additional information and hence are not part of the `dyn SeqRead` impl.
     //
 
-    async fn skip_entry(&mut self, offset: u64) -> io::Result<()> {
-        let mut len = self.current_header.content_size() - offset;
+    async fn skip_bytes(&mut self, len: u64) -> io::Result<()> {
+        let mut remaining = len;
         let scratch = scratch_buffer();
-        while len >= (scratch.len() as u64) {
+        while remaining >= (scratch.len() as u64) {
             seq_read_exact(&mut self.input, scratch).await?;
-            len -= scratch.len() as u64;
+            remaining -= scratch.len() as u64;
         }
-        let len = len as usize;
-        if len > 0 {
-            seq_read_exact(&mut self.input, &mut scratch[..len]).await?;
+        let remaining = remaining as usize;
+        if remaining > 0 {
+            seq_read_exact(&mut self.input, &mut scratch[..remaining]).await?;
         }
         Ok(())
     }
 
+    async fn skip_entry(&mut self, offset: u64) -> io::Result<()> {
+        let len = self.current_header.content_size() - offset;
+        self.skip_bytes(len).await
+    }
+
     async fn read_entry_as_bytes(&mut self) -> io::Result<Vec<u8>> {
         let size = usize::try_from(self.current_header.content_size()).map_err(io_err_other)?;
         let data = seq_read_exact_data(&mut self.input, size).await?;
-- 
2.39.2





^ permalink raw reply	[flat|nested] 24+ messages in thread

* [pbs-devel] [RFC v2 pxar 2/23] fix #3174: decoder: impl skip_bytes for sync dec
  2023-10-09 11:51 [pbs-devel] [RFC v2 pxar proxmox-backup 00/23] fix #3174: improve file-level backup Christian Ebner
  2023-10-09 11:51 ` [pbs-devel] [RFC v2 pxar 1/23] fix #3174: decoder: factor out skip_bytes from skip_entry Christian Ebner
@ 2023-10-09 11:51 ` Christian Ebner
  2023-10-09 11:51 ` [pbs-devel] [RFC v2 pxar 3/23] fix #3174: encoder: calc filename + metadata byte size Christian Ebner
                   ` (20 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: Christian Ebner @ 2023-10-09 11:51 UTC (permalink / raw)
  To: pbs-devel

Allows skipping over a given number of bytes during pxar archive
decoding. This is used to skip over unneeded contents in a pxar
appendix.

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
Changes since v1:
- remove type cast to `usize`, not needed anymore

 src/decoder/sync.rs | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/src/decoder/sync.rs b/src/decoder/sync.rs
index 5597a03..bdfe2f4 100644
--- a/src/decoder/sync.rs
+++ b/src/decoder/sync.rs
@@ -81,6 +81,11 @@ impl<T: SeqRead> Decoder<T> {
     pub fn enable_goodbye_entries(&mut self, on: bool) {
         self.inner.with_goodbye_tables = on;
     }
+
+    /// Skip bytes in decoder stream.
+    pub fn skip_bytes(&mut self, len: u64) -> io::Result<()> {
+        poll_result_once(self.inner.skip_bytes(len))
+    }
 }
 
 impl<T: SeqRead> Iterator for Decoder<T> {
-- 
2.39.2





^ permalink raw reply	[flat|nested] 24+ messages in thread

* [pbs-devel] [RFC v2 pxar 3/23] fix #3174: encoder: calc filename + metadata byte size
  2023-10-09 11:51 [pbs-devel] [RFC v2 pxar proxmox-backup 00/23] fix #3174: improve file-level backup Christian Ebner
  2023-10-09 11:51 ` [pbs-devel] [RFC v2 pxar 1/23] fix #3174: decoder: factor out skip_bytes from skip_entry Christian Ebner
  2023-10-09 11:51 ` [pbs-devel] [RFC v2 pxar 2/23] fix #3174: decoder: impl skip_bytes for sync dec Christian Ebner
@ 2023-10-09 11:51 ` Christian Ebner
  2023-10-09 11:51 ` [pbs-devel] [RFC v2 pxar 4/23] fix #3174: enc/dec: impl PXAR_APPENDIX_REF entrytype Christian Ebner
                   ` (19 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: Christian Ebner @ 2023-10-09 11:51 UTC (permalink / raw)
  To: pbs-devel

Introduce SeqSink and implement SeqWrite for it in order to create an
encoder instance which, instead of writing data to a stream, consumes
the encoded stream and returns the number of bytes consumed.

Based on this, implement a helper function `byte_len` which returns the
byte size of the filename entry and metadata entry as encoded for the
archive.

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
Changes since v1:
- Instead of calculating the metadata size based on the known encoding
  sizes, implement an Encoder instance which counts the encoded bytes.

 src/encoder/mod.rs  | 37 +++++++++++++++++++++++++++++++++++++
 src/encoder/sync.rs |  9 ++++++++-
 2 files changed, 45 insertions(+), 1 deletion(-)

diff --git a/src/encoder/mod.rs b/src/encoder/mod.rs
index 0d342ec..a209ee7 100644
--- a/src/encoder/mod.rs
+++ b/src/encoder/mod.rs
@@ -85,6 +85,24 @@ where
     }
 }
 
+#[derive(Default)]
+/// Sink to consume sequential byte stream
+pub struct SeqSink;
+
+impl SeqWrite for SeqSink {
+    fn poll_seq_write(
+        self: Pin<&mut Self>,
+        _cx: &mut Context,
+        buf: &[u8],
+    ) -> Poll<io::Result<usize>> {
+        Poll::Ready(Ok(buf.len()))
+    }
+
+    fn poll_flush(self: Pin<&mut Self>, _cx: &mut Context) -> Poll<io::Result<()>> {
+        Poll::Ready(Ok(()))
+    }
+}
+
 /// awaitable verison of `poll_seq_write`.
 async fn seq_write<T: SeqWrite + ?Sized>(
     output: &mut T,
@@ -833,6 +851,25 @@ impl<'a, T: SeqWrite + 'a> EncoderImpl<'a, T> {
     }
 }
 
+impl EncoderImpl<'_, SeqSink> {
+    /// Calculate the encoded byte len of filename and metadata struct
+    async fn byte_len(filename: &std::ffi::CStr, metadata: &Metadata) -> io::Result<u64> {
+        let mut this = Self {
+            output: EncoderOutput::Owned(SeqSink::default()),
+            state: EncoderState::default(),
+            parent: None,
+            finished: false,
+            file_copy_buffer: Arc::new(Mutex::new(unsafe {
+                crate::util::vec_new_uninitialized(1024 * 1024)
+            })),
+        };
+
+        this.start_file_do(Some(metadata), filename.to_bytes())
+            .await?;
+        Ok(this.position())
+    }
+}
+
 /// Writer for a file object in a directory.
 pub(crate) struct FileImpl<'a, S: SeqWrite> {
     output: &'a mut S,
diff --git a/src/encoder/sync.rs b/src/encoder/sync.rs
index 1ec91b8..ac0025c 100644
--- a/src/encoder/sync.rs
+++ b/src/encoder/sync.rs
@@ -6,7 +6,7 @@ use std::pin::Pin;
 use std::task::{Context, Poll};
 
 use crate::decoder::sync::StandardReader;
-use crate::encoder::{self, LinkOffset, SeqWrite};
+use crate::encoder::{self, LinkOffset, SeqSink, SeqWrite};
 use crate::format;
 use crate::util::poll_result_once;
 use crate::Metadata;
@@ -165,6 +165,13 @@ impl<'a, T: SeqWrite + 'a> Encoder<'a, T> {
     }
 }
 
+impl<'a> Encoder<'a, SeqSink> {
+    /// Calculate the encoded byte len of filename and metadata struct
+    pub fn byte_len(filename: &std::ffi::CStr, metadata: &Metadata) -> io::Result<u64> {
+        poll_result_once(encoder::EncoderImpl::byte_len(filename, metadata))
+    }
+}
+
 /// This is a "file" inside a pxar archive, to which the initially declared amount of data should
 /// be written.
 ///
-- 
2.39.2





^ permalink raw reply	[flat|nested] 24+ messages in thread

* [pbs-devel] [RFC v2 pxar 4/23] fix #3174: enc/dec: impl PXAR_APPENDIX_REF entrytype
  2023-10-09 11:51 [pbs-devel] [RFC v2 pxar proxmox-backup 00/23] fix #3174: improve file-level backup Christian Ebner
                   ` (2 preceding siblings ...)
  2023-10-09 11:51 ` [pbs-devel] [RFC v2 pxar 3/23] fix #3174: encoder: calc filename + metadata byte size Christian Ebner
@ 2023-10-09 11:51 ` Christian Ebner
  2023-10-09 11:51 ` [pbs-devel] [RFC v2 pxar 5/23] fix #3174: enc/dec: impl PXAR_APPENDIX entrytype Christian Ebner
                   ` (18 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: Christian Ebner @ 2023-10-09 11:51 UTC (permalink / raw)
  To: pbs-devel

Add an additional entry type for regular files to store a reference to
the appendix section of the pxar archive. The position relative to the
appendix start is stored in order to be able to access the file payload
within the appendix.

This new entry type is used to reference the contents of existing file
payload chunks for unchanged file payloads.
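
A hypothetical call site, based on the asynchronous encoder API added
by this patch (the offset and size variable names are made up):

```rust
// Reference an unchanged file's payload located at `ref_offset`
// (an AppendixRefOffset) within the appendix, instead of re-encoding
// the `file_size` payload bytes.
encoder
    .add_appendix_ref(file_name, ref_offset, file_size)
    .await?;
```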

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
Changes since v1:
- Do not re-encode the filename and metadata in add_appendix_ref by
  removing the call to start_file_do
- Use dedicated types for archive offsets instead of raw `u64` values.

 examples/mk-format-hashes.rs |  5 +++
 src/decoder/mod.rs           | 22 ++++++++++++-
 src/encoder/aio.rs           | 14 ++++++++-
 src/encoder/mod.rs           | 61 ++++++++++++++++++++++++++++++++++++
 src/encoder/sync.rs          | 16 +++++++++-
 src/format/mod.rs            |  5 +++
 src/lib.rs                   |  6 ++++
 7 files changed, 126 insertions(+), 3 deletions(-)

diff --git a/examples/mk-format-hashes.rs b/examples/mk-format-hashes.rs
index 1ad606c..8b4f5de 100644
--- a/examples/mk-format-hashes.rs
+++ b/examples/mk-format-hashes.rs
@@ -41,6 +41,11 @@ const CONSTANTS: &[(&str, &str, &str)] = &[
         "PXAR_PAYLOAD",
         "__PROXMOX_FORMAT_PXAR_PAYLOAD__",
     ),
+    (
+        "Marks the beginnig of an appendix reference for regular files",
+        "PXAR_APPENDIX_REF",
+        "__PROXMOX_FORMAT_PXAR_APPENDIX_REF__",
+    ),
     (
         "Marks item as entry of goodbye table",
         "PXAR_GOODBYE",
diff --git a/src/decoder/mod.rs b/src/decoder/mod.rs
index a094324..70a8697 100644
--- a/src/decoder/mod.rs
+++ b/src/decoder/mod.rs
@@ -276,6 +276,10 @@ impl<I: SeqRead> DecoderImpl<I> {
 
             match self.current_header.htype {
                 format::PXAR_FILENAME => return self.handle_file_entry().await,
+                format::PXAR_APPENDIX_REF => {
+                    self.state = State::Default;
+                    return self.handle_appendix_ref_entry().await
+                }
                 format::PXAR_GOODBYE => {
                     self.state = State::InGoodbyeTable;
 
@@ -334,6 +338,22 @@ impl<I: SeqRead> DecoderImpl<I> {
         self.read_next_entry().await.map(Some)
     }
 
+    async fn handle_appendix_ref_entry(&mut self) -> io::Result<Option<Entry>> {
+        let bytes = self.read_entry_as_bytes().await?;
+        let appendix_offset = u64::from_le_bytes(bytes[0..8].try_into().unwrap());
+        let file_size = u64::from_le_bytes(bytes[8..16].try_into().unwrap());
+
+        self.reset_path()?;
+        return Ok(Some(Entry {
+            path: self.entry.path.clone(),
+            metadata: Metadata::default(),
+            kind: EntryKind::AppendixRef {
+                appendix_offset,
+                file_size,
+            }
+        }));
+    }
+
     fn reset_path(&mut self) -> io::Result<()> {
         let path_len = *self
             .path_lengths
@@ -535,7 +555,7 @@ impl<I: SeqRead> DecoderImpl<I> {
                 self.state = State::InPayload { offset: 0 };
                 return Ok(ItemResult::Entry);
             }
-            format::PXAR_FILENAME | format::PXAR_GOODBYE => {
+            format::PXAR_FILENAME | format::PXAR_GOODBYE | format::PXAR_APPENDIX_REF => {
                 if self.entry.metadata.is_fifo() {
                     self.state = State::InSpecialFile;
                     self.entry.kind = EntryKind::Fifo;
diff --git a/src/encoder/aio.rs b/src/encoder/aio.rs
index ad25fea..66ea535 100644
--- a/src/encoder/aio.rs
+++ b/src/encoder/aio.rs
@@ -5,7 +5,7 @@ use std::path::Path;
 use std::pin::Pin;
 use std::task::{Context, Poll};
 
-use crate::encoder::{self, LinkOffset, SeqWrite};
+use crate::encoder::{self, AppendixRefOffset, LinkOffset, SeqWrite};
 use crate::format;
 use crate::Metadata;
 
@@ -112,6 +112,18 @@ impl<'a, T: SeqWrite + 'a> Encoder<'a, T> {
         self.inner.finish().await
     }
 
+    /// Add reference to archive appendix
+    pub async fn add_appendix_ref<PF: AsRef<Path>>(
+        &mut self,
+        file_name: PF,
+        appendix_ref_offset: AppendixRefOffset,
+        file_size: u64,
+    ) -> io::Result<()> {
+        self.inner
+            .add_appendix_ref(file_name.as_ref(), appendix_ref_offset, file_size)
+            .await
+    }
+
     /// Add a symbolic link to the archive.
     pub async fn add_symlink<PF: AsRef<Path>, PT: AsRef<Path>>(
         &mut self,
diff --git a/src/encoder/mod.rs b/src/encoder/mod.rs
index a209ee7..78612a6 100644
--- a/src/encoder/mod.rs
+++ b/src/encoder/mod.rs
@@ -38,6 +38,33 @@ impl LinkOffset {
     }
 }
 
+/// File reference used to create appendix references.
+#[derive(Clone, Copy, Debug, Default, Eq, PartialEq, Ord, PartialOrd)]
+pub struct AppendixRefOffset(u64);
+
+impl AppendixRefOffset {
+    /// Get the raw byte offset for this appendix reference.
+    #[inline]
+    pub fn raw(self) -> u64 {
+        self.0
+    }
+
+    /// Return a new AppendixRefOffset, positively shifted by offset
+    #[inline]
+    pub fn add(&self, offset: u64) -> Self {
+        Self(self.0 + offset)
+    }
+
+    /// Return a new AppendixRefOffset, negatively shifted by offset
+    #[inline]
+    pub fn sub(&self, offset: u64) -> Self {
+        Self(self.0 - offset)
+    }
+}
+
+/// Offset pointing to the start of the appendix section of the archive.
+#[derive(Clone, Copy, Debug, Eq, PartialEq, Ord, PartialOrd)]
+pub struct AppendixStartOffset(u64);
 /// Sequential write interface used by the encoder's state machine.
 ///
 /// This is our internal writer trait which is available for `std::io::Write` types in the
@@ -449,6 +476,40 @@ impl<'a, T: SeqWrite + 'a> EncoderImpl<'a, T> {
         Ok(offset)
     }
 
+    /// Add reference to pxar archive appendix
+    pub async fn add_appendix_ref(
+        &mut self,
+        file_name: &Path,
+        appendix_ref_offset: AppendixRefOffset,
+        file_size: u64,
+    ) -> io::Result<()> {
+        self.check()?;
+
+        let offset = self.position();
+        let file_name = file_name.as_os_str().as_bytes();
+
+        let mut data = Vec::with_capacity(2 * 8);
+        data.extend(&appendix_ref_offset.raw().to_le_bytes());
+        data.extend(&file_size.to_le_bytes());
+        seq_write_pxar_entry(
+            self.output.as_mut(),
+            format::PXAR_APPENDIX_REF,
+            &data,
+            &mut self.state.write_position,
+        )
+        .await?;
+
+        let end_offset = self.position();
+
+        self.state.items.push(GoodbyeItem {
+            hash: format::hash_filename(file_name),
+            offset: offset,
+            size: end_offset - offset,
+        });
+
+        Ok(())
+    }
+
     /// Return a file offset usable with `add_hardlink`.
     pub async fn add_symlink(
         &mut self,
diff --git a/src/encoder/sync.rs b/src/encoder/sync.rs
index ac0025c..2c9ea2b 100644
--- a/src/encoder/sync.rs
+++ b/src/encoder/sync.rs
@@ -6,7 +6,7 @@ use std::pin::Pin;
 use std::task::{Context, Poll};
 
 use crate::decoder::sync::StandardReader;
-use crate::encoder::{self, LinkOffset, SeqSink, SeqWrite};
+use crate::encoder::{self, AppendixRefOffset, LinkOffset, SeqSink, SeqWrite};
 use crate::format;
 use crate::util::poll_result_once;
 use crate::Metadata;
@@ -110,6 +110,20 @@ impl<'a, T: SeqWrite + 'a> Encoder<'a, T> {
         poll_result_once(self.inner.finish())
     }
 
+    /// Add reference to archive appendix
+    pub async fn add_appendix_ref<PF: AsRef<Path>>(
+        &mut self,
+        file_name: PF,
+        appendix_ref_offset: AppendixRefOffset,
+        file_size: u64,
+    ) -> io::Result<()> {
+        poll_result_once(self.inner.add_appendix_ref(
+            file_name.as_ref(),
+            appendix_ref_offset,
+            file_size,
+        ))
+    }
+
     /// Add a symbolic link to the archive.
     pub fn add_symlink<PF: AsRef<Path>, PT: AsRef<Path>>(
         &mut self,
diff --git a/src/format/mod.rs b/src/format/mod.rs
index 72a193c..5eb7562 100644
--- a/src/format/mod.rs
+++ b/src/format/mod.rs
@@ -22,6 +22,7 @@
 //!   * `FCAPS`             -- file capability in Linux disk format
 //!   * `QUOTA_PROJECT_ID`  -- the ext4/xfs quota project ID
 //!   * `PAYLOAD`           -- file contents, if it is one
+//!   * `APPENDIX_REF`      -- start offset and size of a file entry relative to the appendix start
 //!   * `SYMLINK`           -- symlink target, if it is one
 //!   * `DEVICE`            -- device major/minor, if it is a block/char device
 //!
@@ -99,6 +100,8 @@ pub const PXAR_QUOTA_PROJID: u64 = 0xe07540e82f7d1cbb;
 pub const PXAR_HARDLINK: u64 = 0x51269c8422bd7275;
 /// Marks the beginnig of the payload (actual content) of regular files
 pub const PXAR_PAYLOAD: u64 = 0x28147a1b0b7c1a25;
+/// Marks the beginning of an appendix reference for regular files
+pub const PXAR_APPENDIX_REF: u64 = 0x849b4a17e0234f8e;
 /// Marks item as entry of goodbye table
 pub const PXAR_GOODBYE: u64 = 0x2fec4fa642d5731d;
 /// The end marker used in the GOODBYE object
@@ -151,6 +154,7 @@ impl Header {
             PXAR_ACL_GROUP_OBJ => size_of::<acl::GroupObject>() as u64,
             PXAR_QUOTA_PROJID => size_of::<QuotaProjectId>() as u64,
             PXAR_ENTRY => size_of::<Stat>() as u64,
+            PXAR_APPENDIX_REF => u64::MAX - (size_of::<Self>() as u64),
             PXAR_PAYLOAD | PXAR_GOODBYE => u64::MAX - (size_of::<Self>() as u64),
             _ => u64::MAX - (size_of::<Self>() as u64),
         }
@@ -191,6 +195,7 @@ impl Display for Header {
             PXAR_ACL_GROUP_OBJ => "ACL_GROUP_OBJ",
             PXAR_QUOTA_PROJID => "QUOTA_PROJID",
             PXAR_ENTRY => "ENTRY",
+            PXAR_APPENDIX_REF => "APPENDIX_REF",
             PXAR_PAYLOAD => "PAYLOAD",
             PXAR_GOODBYE => "GOODBYE",
             _ => "UNKNOWN",
diff --git a/src/lib.rs b/src/lib.rs
index 210c4b1..fa84e7a 100644
--- a/src/lib.rs
+++ b/src/lib.rs
@@ -366,6 +366,12 @@ pub enum EntryKind {
         offset: Option<u64>,
     },
 
+    /// Reference to pxar archive appendix
+    AppendixRef {
+        appendix_offset: u64,
+        file_size: u64,
+    },
+
     /// Directory entry. When iterating through an archive, the contents follow next.
     Directory,
 
-- 
2.39.2





^ permalink raw reply	[flat|nested] 24+ messages in thread

* [pbs-devel] [RFC v2 pxar 5/23] fix #3174: enc/dec: impl PXAR_APPENDIX entrytype
  2023-10-09 11:51 [pbs-devel] [RFC v2 pxar proxmox-backup 00/23] fix #3174: improve file-level backup Christian Ebner
                   ` (3 preceding siblings ...)
  2023-10-09 11:51 ` [pbs-devel] [RFC v2 pxar 4/23] fix #3174: enc/dec: impl PXAR_APPENDIX_REF entrytype Christian Ebner
@ 2023-10-09 11:51 ` Christian Ebner
  2023-10-09 11:51 ` [pbs-devel] [RFC v2 pxar 6/23] fix #3174: encoder: helper to add to encoder position Christian Ebner
                   ` (17 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: Christian Ebner @ 2023-10-09 11:51 UTC (permalink / raw)
  To: pbs-devel

Add an additional entry type for marking the start of a pxar archive
appendix section. The appendix is a concatenation of possibly
uncorrelated chunks and therefore no longer follows the pxar archive
format. The appendix is only used to access the file metadata and
payloads when a PXAR_APPENDIX_REF entry is encountered in the archive
before this point.

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
Changes since v1:
- Use custom type for appendix start offset instead of raw `u64`

 examples/mk-format-hashes.rs |  1 +
 src/decoder/mod.rs           |  9 +++++++++
 src/encoder/aio.rs           |  9 ++++++++-
 src/encoder/mod.rs           | 30 ++++++++++++++++++++++++++++++
 src/encoder/sync.rs          |  9 ++++++++-
 src/format/mod.rs            |  7 +++++++
 src/lib.rs                   |  4 ++++
 7 files changed, 67 insertions(+), 2 deletions(-)

diff --git a/examples/mk-format-hashes.rs b/examples/mk-format-hashes.rs
index 8b4f5de..f068edd 100644
--- a/examples/mk-format-hashes.rs
+++ b/examples/mk-format-hashes.rs
@@ -12,6 +12,7 @@ const CONSTANTS: &[(&str, &str, &str)] = &[
         "__PROXMOX_FORMAT_ENTRY__",
     ),
     ("", "PXAR_FILENAME", "__PROXMOX_FORMAT_FILENAME__"),
+    ("", "PXAR_APPENDIX", "__PROXMOX_FORMAT_APPENDIX__"),
     ("", "PXAR_SYMLINK", "__PROXMOX_FORMAT_SYMLINK__"),
     ("", "PXAR_DEVICE", "__PROXMOX_FORMAT_DEVICE__"),
     ("", "PXAR_XATTR", "__PROXMOX_FORMAT_XATTR__"),
diff --git a/src/decoder/mod.rs b/src/decoder/mod.rs
index 70a8697..43a1122 100644
--- a/src/decoder/mod.rs
+++ b/src/decoder/mod.rs
@@ -295,6 +295,7 @@ impl<I: SeqRead> DecoderImpl<I> {
                         continue;
                     }
                 }
+                format::PXAR_APPENDIX => return Ok(Some(self.entry.take())),
                 _ => io_bail!(
                     "expected filename or directory-goodbye pxar entry, got: {}",
                     self.current_header,
@@ -546,6 +547,14 @@ impl<I: SeqRead> DecoderImpl<I> {
                 self.entry.kind = EntryKind::Device(self.read_device().await?);
                 return Ok(ItemResult::Entry);
             }
+            format::PXAR_APPENDIX => {
+                let bytes = self.read_entry_as_bytes().await?;
+                let total = u64::from_le_bytes(bytes[0..8].try_into().unwrap());
+                self.entry.kind = EntryKind::Appendix {
+                    total,
+                };
+                return Ok(ItemResult::Entry);
+            }
             format::PXAR_PAYLOAD => {
                 let offset = seq_read_position(&mut self.input).await.transpose()?;
                 self.entry.kind = EntryKind::File {
diff --git a/src/encoder/aio.rs b/src/encoder/aio.rs
index 66ea535..5c7c7b9 100644
--- a/src/encoder/aio.rs
+++ b/src/encoder/aio.rs
@@ -5,7 +5,7 @@ use std::path::Path;
 use std::pin::Pin;
 use std::task::{Context, Poll};
 
-use crate::encoder::{self, AppendixRefOffset, LinkOffset, SeqWrite};
+use crate::encoder::{self, AppendixRefOffset, AppendixStartOffset, LinkOffset, SeqWrite};
 use crate::format;
 use crate::Metadata;
 
@@ -124,6 +124,13 @@ impl<'a, T: SeqWrite + 'a> Encoder<'a, T> {
             .await
     }
 
+    /// Add the appendix start entry marker
+    ///
+    /// Returns the AppendixStartOffset pointing after the entry, the appendix start offset
+    pub async fn add_appendix(&mut self, full_size: u64) -> io::Result<AppendixStartOffset> {
+        self.inner.add_appendix(full_size).await
+    }
+
     /// Add a symbolic link to the archive.
     pub async fn add_symlink<PF: AsRef<Path>, PT: AsRef<Path>>(
         &mut self,
diff --git a/src/encoder/mod.rs b/src/encoder/mod.rs
index 78612a6..3270d52 100644
--- a/src/encoder/mod.rs
+++ b/src/encoder/mod.rs
@@ -65,6 +65,15 @@ impl AppendixRefOffset {
 /// Offset pointing to the start of the appendix section of the archive.
 #[derive(Clone, Copy, Debug, Eq, PartialEq, Ord, PartialOrd)]
 pub struct AppendixStartOffset(u64);
+
+impl AppendixStartOffset {
+    /// Get the raw byte start offset for this appendix section.
+    #[inline]
+    pub fn raw(self) -> u64 {
+        self.0
+    }
+}
+
 /// Sequential write interface used by the encoder's state machine.
 ///
 /// This is our internal writer trait which is available for `std::io::Write` types in the
@@ -510,6 +519,27 @@ impl<'a, T: SeqWrite + 'a> EncoderImpl<'a, T> {
         Ok(())
     }
 
+    /// Add the appendix start entry marker
+    ///
+    /// Returns the AppendixStartOffset pointing after the entry, the start of the appendix
+    /// section of the archive.
+    pub async fn add_appendix(&mut self, full_size: u64) -> io::Result<AppendixStartOffset> {
+        self.check()?;
+
+        let data = &full_size.to_le_bytes().to_vec();
+        seq_write_pxar_entry(
+            self.output.as_mut(),
+            format::PXAR_APPENDIX,
+            &data,
+            &mut self.state.write_position,
+        )
+        .await?;
+
+        let offset = self.position();
+
+        Ok(AppendixStartOffset(offset))
+    }
+
     /// Return a file offset usable with `add_hardlink`.
     pub async fn add_symlink(
         &mut self,
diff --git a/src/encoder/sync.rs b/src/encoder/sync.rs
index 2c9ea2b..2aaa6f2 100644
--- a/src/encoder/sync.rs
+++ b/src/encoder/sync.rs
@@ -6,7 +6,7 @@ use std::pin::Pin;
 use std::task::{Context, Poll};
 
 use crate::decoder::sync::StandardReader;
-use crate::encoder::{self, AppendixRefOffset, LinkOffset, SeqSink, SeqWrite};
+use crate::encoder::{self, AppendixRefOffset, AppendixStartOffset, LinkOffset, SeqSink, SeqWrite};
 use crate::format;
 use crate::util::poll_result_once;
 use crate::Metadata;
@@ -124,6 +124,13 @@ impl<'a, T: SeqWrite + 'a> Encoder<'a, T> {
         ))
     }
 
+    /// Add the appendix start entry marker
+    ///
+    /// Returns the AppendixStartOffset pointing after the entry, the appendix start offset
+    pub async fn add_appendix(&mut self, full_size: u64) -> io::Result<AppendixStartOffset> {
+        poll_result_once(self.inner.add_appendix(full_size))
+    }
+
     /// Add a symbolic link to the archive.
     pub fn add_symlink<PF: AsRef<Path>, PT: AsRef<Path>>(
         &mut self,
diff --git a/src/format/mod.rs b/src/format/mod.rs
index 5eb7562..8254df9 100644
--- a/src/format/mod.rs
+++ b/src/format/mod.rs
@@ -35,6 +35,12 @@
 //!   * `<archive>`         -- serialization of the second directory entry
 //!   * ...
 //!   * `GOODBYE`           -- lookup table at the end of a list of directory entries
+//!
+//! For backups referencing previous backups to skip file payloads, the archive is followed by an
+//! appendix marker after which the concatenated pxar archive fragments containing the file payloads
+//! are appended. They are NOT guaranteed to follow the full pxar structure and should only be
+//! used to extract the file payloads by given offset.
+//!   * `APPENDIX`          -- pxar archive fragments containing file payloads
 
 use std::cmp::Ordering;
 use std::ffi::{CStr, OsStr};
@@ -85,6 +91,7 @@ pub const PXAR_ENTRY: u64 = 0xd5956474e588acef;
 /// Previous version of the entry struct
 pub const PXAR_ENTRY_V1: u64 = 0x11da850a1c1cceff;
 pub const PXAR_FILENAME: u64 = 0x16701121063917b3;
+pub const PXAR_APPENDIX: u64 = 0x9ff6c9507864b38d;
 pub const PXAR_SYMLINK: u64 = 0x27f971e7dbf5dc5f;
 pub const PXAR_DEVICE: u64 = 0x9fc9e906586d5ce9;
 pub const PXAR_XATTR: u64 = 0x0dab0229b57dcd03;
diff --git a/src/lib.rs b/src/lib.rs
index fa84e7a..035f995 100644
--- a/src/lib.rs
+++ b/src/lib.rs
@@ -372,6 +372,10 @@ pub enum EntryKind {
         file_size: u64,
     },
 
+    Appendix {
+        total: u64,
+    },
+
     /// Directory entry. When iterating through an archive, the contents follow next.
     Directory,
 
-- 
2.39.2





^ permalink raw reply	[flat|nested] 24+ messages in thread

* [pbs-devel] [RFC v2 pxar 6/23] fix #3174: encoder: helper to add to encoder position
  2023-10-09 11:51 [pbs-devel] [RFC v2 pxar proxmox-backup 00/23] fix #3174: improve file-level backup Christian Ebner
                   ` (4 preceding siblings ...)
  2023-10-09 11:51 ` [pbs-devel] [RFC v2 pxar 5/23] fix #3174: enc/dec: impl PXAR_APPENDIX entrytype Christian Ebner
@ 2023-10-09 11:51 ` Christian Ebner
  2023-10-09 11:51 ` [pbs-devel] [RFC v2 pxar 7/23] fix #3174: enc/dec: impl PXAR_APPENDIX_TAIL entrytype Christian Ebner
                   ` (16 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: Christian Ebner @ 2023-10-09 11:51 UTC (permalink / raw)
  To: pbs-devel

Adds a helper to increase the encoder position by a given size. This is
used to increase the position when adding an appendix section to the
pxar stream, as these bytes are never encoded directly but rather
referenced by already existing chunks.
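
A hypothetical call site: after injecting `appendix_size` bytes worth
of reused chunks into the upload stream out-of-band, only the encoder
position is advanced so that subsequent offsets stay consistent:

```rust
// Safety: `appendix_size` must exactly match the number of bytes
// injected into the stream, otherwise the archive encoding breaks.
let new_position = unsafe { encoder.position_add(appendix_size) };
```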

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
Changes since v1:
- add unsafe function signature in order to make it clear that calling
  `position_add` with incorrect size values will break the pxar
  archive encoding.

 src/encoder/aio.rs  | 5 +++++
 src/encoder/mod.rs  | 6 ++++++
 src/encoder/sync.rs | 5 +++++
 3 files changed, 16 insertions(+)

diff --git a/src/encoder/aio.rs b/src/encoder/aio.rs
index 5c7c7b9..8ff1364 100644
--- a/src/encoder/aio.rs
+++ b/src/encoder/aio.rs
@@ -112,6 +112,11 @@ impl<'a, T: SeqWrite + 'a> Encoder<'a, T> {
         self.inner.finish().await
     }
 
+    /// Add size to encoders position and return new position.
+    pub unsafe fn position_add(&mut self, size: u64) -> u64 {
+        unsafe { self.inner.position_add(size) }
+    }
+
     /// Add reference to archive appendix
     pub async fn add_appendix_ref<PF: AsRef<Path>>(
         &mut self,
diff --git a/src/encoder/mod.rs b/src/encoder/mod.rs
index 3270d52..de23728 100644
--- a/src/encoder/mod.rs
+++ b/src/encoder/mod.rs
@@ -673,6 +673,12 @@ impl<'a, T: SeqWrite + 'a> EncoderImpl<'a, T> {
         self.state.write_position
     }
 
+    #[inline]
+    pub unsafe fn position_add(&mut self, size: u64) -> u64 {
+        self.state.write_position += size;
+        self.state.write_position
+    }
+
     pub async fn create_directory(
         &mut self,
         file_name: &Path,
diff --git a/src/encoder/sync.rs b/src/encoder/sync.rs
index 2aaa6f2..a7f6b20 100644
--- a/src/encoder/sync.rs
+++ b/src/encoder/sync.rs
@@ -110,6 +110,11 @@ impl<'a, T: SeqWrite + 'a> Encoder<'a, T> {
         poll_result_once(self.inner.finish())
     }
 
+    /// Add size to encoders position and return new position.
+    pub unsafe fn position_add(&mut self, size: u64) -> u64 {
+        unsafe { self.inner.position_add(size) }
+    }
+
     /// Add reference to archive appendix
     pub async fn add_appendix_ref<PF: AsRef<Path>>(
         &mut self,
-- 
2.39.2





^ permalink raw reply	[flat|nested] 24+ messages in thread

* [pbs-devel] [RFC v2 pxar 7/23] fix #3174: enc/dec: impl PXAR_APPENDIX_TAIL entrytype
  2023-10-09 11:51 [pbs-devel] [RFC v2 pxar proxmox-backup 00/23] fix #3174: improve file-level backup Christian Ebner
                   ` (5 preceding siblings ...)
  2023-10-09 11:51 ` [pbs-devel] [RFC v2 pxar 6/23] fix #3174: encoder: helper to add to encoder position Christian Ebner
@ 2023-10-09 11:51 ` Christian Ebner
  2023-10-09 11:51 ` [pbs-devel] [RFC v2 proxmox-backup 08/23] fix #3174: index: add fn index list from start/end-offsets Christian Ebner
                   ` (15 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: Christian Ebner @ 2023-10-09 11:51 UTC (permalink / raw)
  To: pbs-devel

The PXAR_APPENDIX_TAIL entry marks pxar archives containing an appendix
section. It has the same size as a goodbye tail marker item so that
archives with and without such a section can easily be read and
distinguished.

This also implements the accessor used by e.g. the fuse implementation
to perform random I/O on the archive. The accessor reads the last entry
and stores the appendix offset if present, in order to recalculate the
actual file payload offset within the archive when encountering an
appendix reference entry.

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
Changes since v1:
- adapt to custom type for appendix start offset

 examples/mk-format-hashes.rs |  5 ++++
 examples/pxarcmd.rs          |  4 ++--
 src/accessor/mod.rs          | 46 ++++++++++++++++++++++++++++++++++++
 src/encoder/aio.rs           |  6 ++---
 src/encoder/mod.rs           | 19 ++++++++++++++-
 src/encoder/sync.rs          |  4 ++--
 src/format/mod.rs            |  4 ++++
 7 files changed, 80 insertions(+), 8 deletions(-)

diff --git a/examples/mk-format-hashes.rs b/examples/mk-format-hashes.rs
index f068edd..7fb938d 100644
--- a/examples/mk-format-hashes.rs
+++ b/examples/mk-format-hashes.rs
@@ -57,6 +57,11 @@ const CONSTANTS: &[(&str, &str, &str)] = &[
         "PXAR_GOODBYE_TAIL_MARKER",
         "__PROXMOX_FORMAT_PXAR_GOODBYE_TAIL_MARKER__",
     ),
+    (
+        "Marks the end of an archive containing an appendix section",
+        "PXAR_APPENDIX_TAIL",
+        "__PROXMOX_FORMAT_APPENDIX_TAIL__",
+    ),
 ];
 
 fn main() {
diff --git a/examples/pxarcmd.rs b/examples/pxarcmd.rs
index e0c779d..c7848cc 100644
--- a/examples/pxarcmd.rs
+++ b/examples/pxarcmd.rs
@@ -105,7 +105,7 @@ fn cmd_create(mut args: std::env::ArgsOs) -> Result<(), Error> {
 
     let mut encoder = Encoder::create(file, &meta)?;
     add_directory(&mut encoder, dir, &dir_path, &mut HashMap::new())?;
-    encoder.finish()?;
+    encoder.finish(None)?;
 
     Ok(())
 }
@@ -145,7 +145,7 @@ fn add_directory<'a, T: SeqWrite + 'a>(
                 root_path,
                 &mut *hardlinks,
             )?;
-            dir.finish()?;
+            dir.finish(None)?;
         } else if file_type.is_symlink() {
             todo!("symlink handling");
         } else if file_type.is_file() {
diff --git a/src/accessor/mod.rs b/src/accessor/mod.rs
index 6a2de73..1c19d7b 100644
--- a/src/accessor/mod.rs
+++ b/src/accessor/mod.rs
@@ -182,6 +182,7 @@ pub(crate) struct AccessorImpl<T> {
     input: T,
     size: u64,
     caches: Arc<Caches>,
+    appendix_offset: Option<u64>,
 }
 
 impl<T: ReadAt> AccessorImpl<T> {
@@ -190,10 +191,22 @@ impl<T: ReadAt> AccessorImpl<T> {
             io_bail!("too small to contain a pxar archive");
         }
 
+        let tail_offset = size - (size_of::<GoodbyeItem>() as u64);
+        let tail: GoodbyeItem = read_entry_at(&input, tail_offset).await?;
+
+        let (appendix_offset, size) = if tail.hash == format::PXAR_APPENDIX_TAIL {
+            (Some(tail.offset), size - 40)
+        } else if tail.hash != format::PXAR_GOODBYE_TAIL_MARKER {
+            io_bail!("no goodbye tail marker found");
+        } else {
+            (None, size)
+        };
+
         Ok(Self {
             input,
             size,
             caches: Arc::new(Caches::default()),
+            appendix_offset,
         })
     }
 
@@ -207,6 +220,7 @@ impl<T: ReadAt> AccessorImpl<T> {
             self.size,
             "/".into(),
             Arc::clone(&self.caches),
+            self.appendix_offset,
         )
         .await
     }
@@ -263,6 +277,7 @@ impl<T: Clone + ReadAt> AccessorImpl<T> {
             self.size,
             "/".into(),
             Arc::clone(&self.caches),
+            self.appendix_offset,
         )
         .await
     }
@@ -274,6 +289,7 @@ impl<T: Clone + ReadAt> AccessorImpl<T> {
             offset,
             "/".into(),
             Arc::clone(&self.caches),
+            self.appendix_offset,
         )
         .await
     }
@@ -369,6 +385,7 @@ pub(crate) struct DirectoryImpl<T> {
     table: Arc<[GoodbyeItem]>,
     path: PathBuf,
     caches: Arc<Caches>,
+    appendix_offset: Option<u64>,
 }
 
 impl<T: Clone + ReadAt> DirectoryImpl<T> {
@@ -378,6 +395,7 @@ impl<T: Clone + ReadAt> DirectoryImpl<T> {
         end_offset: u64,
         path: PathBuf,
         caches: Arc<Caches>,
+        appendix_offset: Option<u64>,
     ) -> io::Result<DirectoryImpl<T>> {
         let tail = Self::read_tail_entry(&input, end_offset).await?;
 
@@ -407,6 +425,7 @@ impl<T: Clone + ReadAt> DirectoryImpl<T> {
             table: table.as_ref().map_or_else(|| Arc::new([]), Arc::clone),
             path,
             caches,
+            appendix_offset,
         };
 
         // sanity check:
@@ -516,6 +535,32 @@ impl<T: Clone + ReadAt> DirectoryImpl<T> {
             .next()
             .await
             .ok_or_else(|| io_format_err!("unexpected EOF while decoding directory entry"))??;
+
+        if let EntryKind::AppendixRef {
+            appendix_offset,
+            file_size,
+        } = entry.kind()
+        {
+            let appendix_start = match self.appendix_offset {
+                Some(appendix_start) => appendix_start,
+                None => io_bail!("missing required appendix start offset information"),
+            };
+
+            let name = file_name.ok_or_else(|| io_format_err!("missing required filename"))?;
+            let c_string = std::ffi::CString::new(name.as_os_str().as_bytes())?;
+            let start =
+                appendix_start + appendix_offset + 16 + c_string.as_bytes_with_nul().len() as u64;
+            let end = start + file_size;
+            decoder = self.get_decoder(start..end, file_name).await?;
+
+            let entry = decoder
+                .next()
+                .await
+                .ok_or_else(|| io_format_err!("unexpected EOF while decoding directory entry"))??;
+
+            return Ok((entry, decoder));
+        }
+
         Ok((entry, decoder))
     }
 
@@ -698,6 +743,7 @@ impl<T: Clone + ReadAt> FileEntryImpl<T> {
             self.entry_range_info.entry_range.end,
             self.entry.path.clone(),
             Arc::clone(&self.caches),
+            None,
         )
         .await
     }
diff --git a/src/encoder/aio.rs b/src/encoder/aio.rs
index 8ff1364..48ba857 100644
--- a/src/encoder/aio.rs
+++ b/src/encoder/aio.rs
@@ -108,8 +108,8 @@ impl<'a, T: SeqWrite + 'a> Encoder<'a, T> {
     }
 
     /// Finish this directory. This is mandatory, otherwise the `Drop` handler will `panic!`.
-    pub async fn finish(self) -> io::Result<()> {
-        self.inner.finish().await
+    pub async fn finish(self, appendix_tail: Option<(AppendixStartOffset, u64)>) -> io::Result<()> {
+        self.inner.finish(appendix_tail).await
     }
 
     /// Add size to encoders position and return new position.
@@ -327,7 +327,7 @@ mod test {
                     .await
                     .unwrap();
             }
-            encoder.finish().await.unwrap();
+            encoder.finish(None).await.unwrap();
         };
 
         fn test_send<T: Send>(_: T) {}
diff --git a/src/encoder/mod.rs b/src/encoder/mod.rs
index de23728..532b906 100644
--- a/src/encoder/mod.rs
+++ b/src/encoder/mod.rs
@@ -869,7 +869,10 @@ impl<'a, T: SeqWrite + 'a> EncoderImpl<'a, T> {
         .await
     }
 
-    pub async fn finish(mut self) -> io::Result<()> {
+    pub async fn finish(
+        mut self,
+        appendix_tail: Option<(AppendixStartOffset, u64)>,
+    ) -> io::Result<()> {
         let tail_bytes = self.finish_goodbye_table().await?;
         seq_write_pxar_entry(
             self.output.as_mut(),
@@ -879,6 +882,20 @@ impl<'a, T: SeqWrite + 'a> EncoderImpl<'a, T> {
         )
         .await?;
 
+        if let Some((appendix_start_offset, size)) = appendix_tail {
+            let mut appendix_tail = Vec::new();
+            appendix_tail.append(&mut format::PXAR_APPENDIX_TAIL.to_le_bytes().to_vec());
+            appendix_tail.append(&mut appendix_start_offset.raw().to_le_bytes().to_vec());
+            appendix_tail.append(&mut size.to_le_bytes().to_vec());
+            seq_write_pxar_entry(
+                self.output.as_mut(),
+                format::PXAR_GOODBYE,
+                &appendix_tail,
+                &mut self.state.write_position,
+            )
+            .await?;
+        }
+
         if let EncoderOutput::Owned(output) = &mut self.output {
             flush(output).await?;
         }
diff --git a/src/encoder/sync.rs b/src/encoder/sync.rs
index a7f6b20..97b4e7c 100644
--- a/src/encoder/sync.rs
+++ b/src/encoder/sync.rs
@@ -106,8 +106,8 @@ impl<'a, T: SeqWrite + 'a> Encoder<'a, T> {
     }
 
     /// Finish this directory. This is mandatory, otherwise the `Drop` handler will `panic!`.
-    pub fn finish(self) -> io::Result<()> {
-        poll_result_once(self.inner.finish())
+    pub fn finish(self, appendix_tail: Option<(AppendixStartOffset, u64)>) -> io::Result<()> {
+        poll_result_once(self.inner.finish(appendix_tail))
     }
 
     /// Add size to encoders position and return new position.
diff --git a/src/format/mod.rs b/src/format/mod.rs
index 8254df9..8016ab1 100644
--- a/src/format/mod.rs
+++ b/src/format/mod.rs
@@ -41,6 +41,8 @@
 //! are appended. They are NOT guaranteed to follow the full pxar structure and should only be
 //! used to extract the file payloads by given offset.
 //!   * `APPENDIX`          -- pxar archive fragments containing file payloads
+//!   * final goodbye table
+//!   * `APPENDIX_TAIL`     -- marks the end of an archive containing a APPENDIX section
 
 use std::cmp::Ordering;
 use std::ffi::{CStr, OsStr};
@@ -113,6 +115,8 @@ pub const PXAR_APPENDIX_REF: u64 = 0x849b4a17e0234f8e;
 pub const PXAR_GOODBYE: u64 = 0x2fec4fa642d5731d;
 /// The end marker used in the GOODBYE object
 pub const PXAR_GOODBYE_TAIL_MARKER: u64 = 0xef5eed5b753e1555;
+/// Marks the end of an archive containing an appendix section
+pub const PXAR_APPENDIX_TAIL: u64 = 0x5b1b9abb7ae454f1;
 
 #[derive(Debug, Endian)]
 #[repr(C)]
-- 
2.39.2





^ permalink raw reply	[flat|nested] 24+ messages in thread

* [pbs-devel] [RFC v2 proxmox-backup 08/23] fix #3174: index: add fn index list from start/end-offsets
  2023-10-09 11:51 [pbs-devel] [RFC v2 pxar proxmox-backup 00/23] fix #3174: improve file-level backup Christian Ebner
                   ` (6 preceding siblings ...)
  2023-10-09 11:51 ` [pbs-devel] [RFC v2 pxar 7/23] fix #3174: enc/dec: impl PXAR_APPENDIX_TAIL entrytype Christian Ebner
@ 2023-10-09 11:51 ` Christian Ebner
  2023-10-09 11:51 ` [pbs-devel] [RFC v2 proxmox-backup 09/23] fix #3174: index: add fn digest for DynamicEntry Christian Ebner
                   ` (14 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: Christian Ebner @ 2023-10-09 11:51 UTC (permalink / raw)
  To: pbs-devel

Adds a function to get a list of DynamicEntry items from a chunk index
for a given start and end offset, such that the data within these
offsets is contained in the returned chunks.
This is needed in order to reference file payloads and reuse the
chunks containing them from a previous backup run.

The index entries are normalized, meaning each entry's end() is equal
to the chunk size.

In addition to the list of index entries, the padding from the start of
the first chunk to the requested start offset is returned. This is
needed for the calculation of the appendix section offsets during
decoding.

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
Changes since version 1:
- remove unneeded total size calculation

 pbs-datastore/src/dynamic_index.rs | 32 ++++++++++++++++++++++++++++++
 1 file changed, 32 insertions(+)

diff --git a/pbs-datastore/src/dynamic_index.rs b/pbs-datastore/src/dynamic_index.rs
index 71a5082e..11b8381d 100644
--- a/pbs-datastore/src/dynamic_index.rs
+++ b/pbs-datastore/src/dynamic_index.rs
@@ -188,6 +188,38 @@ impl DynamicIndexReader {
             self.binary_search(middle_idx + 1, middle_end, end_idx, end, offset)
         }
     }
+
+    /// List of chunk indices containing the data from start_offset to end_offset
+    pub fn indices(
+        &self,
+        start_offset: u64,
+        end_offset: u64,
+    ) -> Result<(Vec<DynamicEntry>, u64), Error> {
+        let end_idx = self.index.len() - 1;
+        let chunk_end = self.chunk_end(end_idx);
+        let start = self.binary_search(0, 0, end_idx, chunk_end, start_offset)?;
+        let end = self.binary_search(0, 0, end_idx, chunk_end, end_offset)?;
+
+        let offset_first = if start == 0 {
+            0
+        } else {
+            self.index[start - 1].end()
+        };
+
+        let padding_start = start_offset - offset_first;
+
+        let mut indices = Vec::new();
+        let mut prev_end = offset_first;
+        for dynamic_entry in &self.index[start..end + 1] {
+            let mut dynamic_entry = dynamic_entry.clone();
+            // Normalize end to be equal to size
+            dynamic_entry.end_le -= prev_end;
+            prev_end += dynamic_entry.end_le;
+            indices.push(dynamic_entry);
+        }
+
+        Ok((indices, padding_start))
+    }
 }
 
 impl IndexFile for DynamicIndexReader {
-- 
2.39.2





^ permalink raw reply	[flat|nested] 24+ messages in thread

* [pbs-devel] [RFC v2 proxmox-backup 09/23] fix #3174: index: add fn digest for DynamicEntry
  2023-10-09 11:51 [pbs-devel] [RFC v2 pxar proxmox-backup 00/23] fix #3174: improve file-level backup Christian Ebner
                   ` (7 preceding siblings ...)
  2023-10-09 11:51 ` [pbs-devel] [RFC v2 proxmox-backup 08/23] fix #3174: index: add fn index list from start/end-offsets Christian Ebner
@ 2023-10-09 11:51 ` Christian Ebner
  2023-10-09 11:51 ` [pbs-devel] [RFC v2 proxmox-backup 10/23] fix #3174: api: double catalog upload size Christian Ebner
                   ` (13 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: Christian Ebner @ 2023-10-09 11:51 UTC (permalink / raw)
  To: pbs-devel

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
Changes since version 1:
no changes

 pbs-datastore/src/dynamic_index.rs | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/pbs-datastore/src/dynamic_index.rs b/pbs-datastore/src/dynamic_index.rs
index 11b8381d..71439ef2 100644
--- a/pbs-datastore/src/dynamic_index.rs
+++ b/pbs-datastore/src/dynamic_index.rs
@@ -72,6 +72,11 @@ impl DynamicEntry {
     pub fn end(&self) -> u64 {
         u64::from_le(self.end_le)
     }
+
+    #[inline]
+    pub fn digest(&self) -> [u8; 32] {
+        self.digest.clone()
+    }
 }
 
 pub struct DynamicIndexReader {
-- 
2.39.2





^ permalink raw reply	[flat|nested] 24+ messages in thread

* [pbs-devel] [RFC v2 proxmox-backup 10/23] fix #3174: api: double catalog upload size
  2023-10-09 11:51 [pbs-devel] [RFC v2 pxar proxmox-backup 00/23] fix #3174: improve file-level backup Christian Ebner
                   ` (8 preceding siblings ...)
  2023-10-09 11:51 ` [pbs-devel] [RFC v2 proxmox-backup 09/23] fix #3174: index: add fn digest for DynamicEntry Christian Ebner
@ 2023-10-09 11:51 ` Christian Ebner
  2023-10-09 11:51 ` [pbs-devel] [RFC v2 proxmox-backup 11/23] fix #3174: catalog: introduce extended format v2 Christian Ebner
                   ` (12 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: Christian Ebner @ 2023-10-09 11:51 UTC (permalink / raw)
  To: pbs-devel

Double the maximum catalog upload size, as the catalog now needs to
store the additional archive offset information required to reference
file payloads in an appendix when the catalog is used to perform
backups without re-reading file payloads.

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
Changes since version 1:
no changes

 src/api2/backup/upload_chunk.rs | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/api2/backup/upload_chunk.rs b/src/api2/backup/upload_chunk.rs
index 20259660..84485bfd 100644
--- a/src/api2/backup/upload_chunk.rs
+++ b/src/api2/backup/upload_chunk.rs
@@ -126,7 +126,7 @@ pub const API_METHOD_UPLOAD_FIXED_CHUNK: ApiMethod = ApiMethod::new(
                 false,
                 &IntegerSchema::new("Chunk size.")
                     .minimum(1)
-                    .maximum(1024 * 1024 * 16)
+                    .maximum(1024 * 1024 * 32)
                     .schema()
             ),
             (
@@ -135,7 +135,7 @@ pub const API_METHOD_UPLOAD_FIXED_CHUNK: ApiMethod = ApiMethod::new(
                 &IntegerSchema::new("Encoded chunk size.")
                     .minimum((std::mem::size_of::<DataBlobHeader>() as isize) + 1)
                     .maximum(
-                        1024 * 1024 * 16
+                        1024 * 1024 * 32
                             + (std::mem::size_of::<EncryptedDataBlobHeader>() as isize)
                     )
                     .schema()
-- 
2.39.2

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [pbs-devel] [RFC v2 proxmox-backup 11/23] fix #3174: catalog: introduce extended format v2
  2023-10-09 11:51 [pbs-devel] [RFC v2 pxar proxmox-backup 00/23] fix #3174: improve file-level backup Christian Ebner
                   ` (9 preceding siblings ...)
  2023-10-09 11:51 ` [pbs-devel] [RFC v2 proxmox-backup 10/23] fix #3174: api: double catalog upload size Christian Ebner
@ 2023-10-09 11:51 ` Christian Ebner
  2023-10-09 11:51 ` [pbs-devel] [RFC v2 proxmox-backup 12/23] fix #3174: archiver/extractor: impl appendix ref Christian Ebner
                   ` (11 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: Christian Ebner @ 2023-10-09 11:51 UTC (permalink / raw)
  To: pbs-devel

Increments the catalog file format to version 2. The new catalog
format extends file entries in order to store the additional ctime
and pxar archive file offsets, needed for metadata based file change
detection and for reusing pxar file entries via the pxar appendix
section.

The ctime is included in order to also detect file status changes
since the last backup run, e.g. updated extended attributes, which do
not change the mtime.

Including the pxar archive file offset allows calculating both the
size of a file entry within the archive, needed for re-indexing the
chunks containing this entry, and its offset relative to the appendix
start in the pxar archive, needed to calculate the bytes to skip over
during sequential decoding while restoring the pxar archive.
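
As a rough sketch, the comparison this enables could look as follows
(illustrative types only, not the actual implementation of this
series):

    // Illustrative only: a file payload is considered reusable if
    // both size and ctime match the entry recorded in the catalog of
    // the previous backup run.
    struct PrevFileEntry {
        size: u64,
        ctime: i64,
    }

    fn payload_reusable(prev: &PrevFileEntry, size: u64, ctime: i64) -> bool {
        prev.size == size && prev.ctime == ctime
    }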

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
Changes since version 1:
not present in previous version

 pbs-client/src/pxar/create.rs                 |  31 +-
 pbs-datastore/src/catalog.rs                  | 323 ++++++++++++++----
 pbs-datastore/src/file_formats.rs             |   3 +
 .../src/proxmox_restore_daemon/api.rs         |   6 +-
 4 files changed, 282 insertions(+), 81 deletions(-)

diff --git a/pbs-client/src/pxar/create.rs b/pbs-client/src/pxar/create.rs
index e7053d9e..a2338218 100644
--- a/pbs-client/src/pxar/create.rs
+++ b/pbs-client/src/pxar/create.rs
@@ -390,12 +390,6 @@ impl Archiver {
         patterns_count: usize,
     ) -> Result<(), Error> {
         let content = generate_pxar_excludes_cli(&self.patterns[..patterns_count]);
-        if let Some(ref catalog) = self.catalog {
-            catalog
-                .lock()
-                .unwrap()
-                .add_file(file_name, content.len() as u64, 0)?;
-        }
 
         let mut metadata = Metadata::default();
         metadata.stat.mode = pxar::format::mode::IFREG | 0o600;
@@ -405,6 +399,14 @@ impl Archiver {
             .await?;
         file.write_all(&content).await?;
 
+        if let Some(ref catalog) = self.catalog {
+            let link_offset = file.file_offset();
+            catalog
+                .lock()
+                .unwrap()
+                .add_file(file_name, content.len() as u64, 0, 0, link_offset)?;
+        }
+
         Ok(())
     }
 
@@ -572,17 +574,20 @@ impl Archiver {
                 }
 
                 let file_size = stat.st_size as u64;
-                if let Some(ref catalog) = self.catalog {
-                    catalog
-                        .lock()
-                        .unwrap()
-                        .add_file(c_file_name, file_size, stat.st_mtime)?;
-                }
-
                 let offset: LinkOffset = self
                     .add_regular_file(encoder, fd, file_name, &metadata, file_size)
                     .await?;
 
+                if let Some(ref catalog) = self.catalog {
+                    catalog.lock().unwrap().add_file(
+                        c_file_name,
+                        file_size,
+                        stat.st_mtime,
+                        stat.st_ctime,
+                        offset,
+                    )?;
+                }
+
                 if stat.st_nlink > 1 {
                     self.hardlinks
                         .insert(link_info, (self.path.clone(), offset));
diff --git a/pbs-datastore/src/catalog.rs b/pbs-datastore/src/catalog.rs
index 86e20c92..c4d1a4de 100644
--- a/pbs-datastore/src/catalog.rs
+++ b/pbs-datastore/src/catalog.rs
@@ -11,7 +11,7 @@ use pathpatterns::{MatchList, MatchType};
 use proxmox_io::ReadExt;
 use proxmox_schema::api;
 
-use crate::file_formats::PROXMOX_CATALOG_FILE_MAGIC_1_0;
+use crate::file_formats::{PROXMOX_CATALOG_FILE_MAGIC_1_0, PROXMOX_CATALOG_FILE_MAGIC_2_0};
 
 /// Trait for writing file list catalogs.
 ///
@@ -20,7 +20,14 @@ use crate::file_formats::PROXMOX_CATALOG_FILE_MAGIC_1_0;
 pub trait BackupCatalogWriter {
     fn start_directory(&mut self, name: &CStr) -> Result<(), Error>;
     fn end_directory(&mut self) -> Result<(), Error>;
-    fn add_file(&mut self, name: &CStr, size: u64, mtime: i64) -> Result<(), Error>;
+    fn add_file(
+        &mut self,
+        name: &CStr,
+        size: u64,
+        mtime: i64,
+        ctime: i64,
+        file_offset: pxar::encoder::LinkOffset,
+    ) -> Result<(), Error>;
     fn add_symlink(&mut self, name: &CStr) -> Result<(), Error>;
     fn add_hardlink(&mut self, name: &CStr) -> Result<(), Error>;
     fn add_block_device(&mut self, name: &CStr) -> Result<(), Error>;
@@ -81,6 +88,21 @@ impl fmt::Display for CatalogEntryType {
     }
 }
 
+#[derive(Clone, Copy, Debug, Default, Eq, PartialEq, Ord, PartialOrd)]
+pub struct FileOffset {
+    offset: u64,
+}
+
+impl FileOffset {
+    pub fn raw(&self) -> u64 {
+        self.offset
+    }
+}
+#[derive(Clone, Copy, Debug, Eq, PartialEq, Ord, PartialOrd)]
+pub enum Offset {
+    FileOffset { offset: u64 },
+}
+
 /// Represents a named directory entry
 ///
 /// The ``attr`` property contain the exact type with type specific
@@ -91,11 +113,23 @@ pub struct DirEntry {
     pub attr: DirEntryAttribute,
 }
 
+#[derive(Clone, Debug, PartialEq, Eq)]
+pub struct CatalogV2Extension {
+    pub ctime: i64,
+    pub file_offset: FileOffset,
+}
+
 /// Used to specific additional attributes inside DirEntry
 #[derive(Clone, Debug, PartialEq, Eq)]
 pub enum DirEntryAttribute {
-    Directory { start: u64 },
-    File { size: u64, mtime: i64 },
+    Directory {
+        start: u64,
+    },
+    File {
+        size: u64,
+        mtime: i64,
+        extension: Option<CatalogV2Extension>,
+    },
     Symlink,
     Hardlink,
     BlockDevice,
@@ -105,40 +139,63 @@ pub enum DirEntryAttribute {
 }
 
 impl DirEntry {
-    fn new(etype: CatalogEntryType, name: Vec<u8>, start: u64, size: u64, mtime: i64) -> Self {
-        match etype {
-            CatalogEntryType::Directory => DirEntry {
+    fn new(
+        etype: CatalogEntryType,
+        name: Vec<u8>,
+        start: u64,
+        size: u64,
+        mtime: i64,
+        ctime: i64,
+        offset: Option<Offset>,
+    ) -> Self {
+        match (etype, offset) {
+            (CatalogEntryType::Directory, None) => DirEntry {
                 name,
                 attr: DirEntryAttribute::Directory { start },
             },
-            CatalogEntryType::File => DirEntry {
-                name,
-                attr: DirEntryAttribute::File { size, mtime },
-            },
-            CatalogEntryType::Symlink => DirEntry {
+            (CatalogEntryType::File, offset) => {
+                let extension = if let Some(Offset::FileOffset { offset }) = offset {
+                    Some(CatalogV2Extension {
+                        ctime,
+                        file_offset: FileOffset { offset },
+                    })
+                } else {
+                    None
+                };
+                DirEntry {
+                    name,
+                    attr: DirEntryAttribute::File {
+                        size,
+                        mtime,
+                        extension,
+                    },
+                }
+            }
+            (CatalogEntryType::Symlink, None) => DirEntry {
                 name,
                 attr: DirEntryAttribute::Symlink,
             },
-            CatalogEntryType::Hardlink => DirEntry {
+            (CatalogEntryType::Hardlink, None) => DirEntry {
                 name,
                 attr: DirEntryAttribute::Hardlink,
             },
-            CatalogEntryType::BlockDevice => DirEntry {
+            (CatalogEntryType::BlockDevice, None) => DirEntry {
                 name,
                 attr: DirEntryAttribute::BlockDevice,
             },
-            CatalogEntryType::CharDevice => DirEntry {
+            (CatalogEntryType::CharDevice, None) => DirEntry {
                 name,
                 attr: DirEntryAttribute::CharDevice,
             },
-            CatalogEntryType::Fifo => DirEntry {
+            (CatalogEntryType::Fifo, None) => DirEntry {
                 name,
                 attr: DirEntryAttribute::Fifo,
             },
-            CatalogEntryType::Socket => DirEntry {
+            (CatalogEntryType::Socket, None) => DirEntry {
                 name,
                 attr: DirEntryAttribute::Socket,
             },
+            _ => panic!("unexpected parameters '{etype}' and '{offset:?}'"),
         }
     }
 
@@ -197,13 +254,22 @@ impl DirInfo {
             }
             DirEntry {
                 name,
-                attr: DirEntryAttribute::File { size, mtime },
+                attr:
+                    DirEntryAttribute::File {
+                        size,
+                        mtime,
+                        extension,
+                    },
             } => {
                 writer.write_all(&[CatalogEntryType::File as u8])?;
                 catalog_encode_u64(writer, name.len() as u64)?;
                 writer.write_all(name)?;
                 catalog_encode_u64(writer, *size)?;
                 catalog_encode_i64(writer, *mtime)?;
+                if let Some(CatalogV2Extension { ctime, file_offset }) = extension {
+                    catalog_encode_i64(writer, *ctime)?;
+                    catalog_encode_u64(writer, file_offset.raw())?;
+                }
             }
             DirEntry {
                 name,
@@ -271,8 +337,11 @@ impl DirInfo {
         Ok((self.name, data))
     }
 
-    fn parse<C: FnMut(CatalogEntryType, &[u8], u64, u64, i64) -> Result<bool, Error>>(
+    fn parse<
+        C: FnMut(CatalogEntryType, &[u8], u64, u64, i64, i64, Option<Offset>) -> Result<bool, Error>,
+    >(
         data: &[u8],
+        catalog_version: Option<[u8; 8]>,
         mut callback: C,
     ) -> Result<(), Error> {
         let mut cursor = data;
@@ -300,14 +369,28 @@ impl DirInfo {
             let cont = match etype {
                 CatalogEntryType::Directory => {
                     let offset = catalog_decode_u64(&mut cursor)?;
-                    callback(etype, name, offset, 0, 0)?
+                    callback(etype, name, offset, 0, 0, 0, None)?
                 }
                 CatalogEntryType::File => {
                     let size = catalog_decode_u64(&mut cursor)?;
                     let mtime = catalog_decode_i64(&mut cursor)?;
-                    callback(etype, name, 0, size, mtime)?
+                    let (ctime, offset) = if let Some(version) = catalog_version {
+                        let mut ctime = 0;
+                        let mut offset = None;
+                        if version == PROXMOX_CATALOG_FILE_MAGIC_2_0 {
+                            ctime = catalog_decode_i64(&mut cursor)?;
+                            let file_offset = catalog_decode_u64(&mut cursor)?;
+                            offset = Some(Offset::FileOffset {
+                                offset: file_offset,
+                            })
+                        }
+                        (ctime, offset)
+                    } else {
+                        (0, None)
+                    };
+                    callback(etype, name, 0, size, mtime, ctime, offset)?
                 }
-                _ => callback(etype, name, 0, 0, 0)?,
+                _ => callback(etype, name, 0, 0, 0, 0, None)?,
             };
             if !cont {
                 return Ok(());
@@ -342,7 +425,7 @@ impl<W: Write> CatalogWriter<W> {
             dirstack: vec![DirInfo::new_rootdir()],
             pos: 0,
         };
-        me.write_all(&PROXMOX_CATALOG_FILE_MAGIC_1_0)?;
+        me.write_all(&PROXMOX_CATALOG_FILE_MAGIC_2_0)?;
         Ok(me)
     }
 
@@ -407,15 +490,29 @@ impl<W: Write> BackupCatalogWriter for CatalogWriter<W> {
         Ok(())
     }
 
-    fn add_file(&mut self, name: &CStr, size: u64, mtime: i64) -> Result<(), Error> {
+    fn add_file(
+        &mut self,
+        name: &CStr,
+        size: u64,
+        mtime: i64,
+        ctime: i64,
+        file_offset: pxar::encoder::LinkOffset,
+    ) -> Result<(), Error> {
         let dir = self
             .dirstack
             .last_mut()
             .ok_or_else(|| format_err!("outside root"))?;
         let name = name.to_bytes().to_vec();
+        let file_offset = FileOffset {
+            offset: file_offset.raw(),
+        };
         dir.entries.push(DirEntry {
             name,
-            attr: DirEntryAttribute::File { size, mtime },
+            attr: DirEntryAttribute::File {
+                size,
+                mtime,
+                extension: Some(CatalogV2Extension { ctime, file_offset }),
+            },
         });
         Ok(())
     }
@@ -502,12 +599,16 @@ impl<W: Write> BackupCatalogWriter for CatalogWriter<W> {
 /// Read Catalog files
 pub struct CatalogReader<R> {
     reader: R,
+    magic: Option<[u8; 8]>,
 }
 
 impl<R: Read + Seek> CatalogReader<R> {
     /// Create a new CatalogReader instance
     pub fn new(reader: R) -> Self {
-        Self { reader }
+        Self {
+            reader,
+            magic: None,
+        }
     }
 
     /// Print whole catalog to stdout
@@ -528,8 +629,11 @@ impl<R: Read + Seek> CatalogReader<R> {
         self.reader.seek(SeekFrom::Start(0))?;
         let mut magic = [0u8; 8];
         self.reader.read_exact(&mut magic)?;
-        if magic != PROXMOX_CATALOG_FILE_MAGIC_1_0 {
-            bail!("got unexpected magic number for catalog");
+        match magic {
+            PROXMOX_CATALOG_FILE_MAGIC_1_0 | PROXMOX_CATALOG_FILE_MAGIC_2_0 => {
+                self.magic = Some(magic)
+            }
+            _ => bail!("got unexpected magic number for catalog"),
         }
         self.reader.seek(SeekFrom::End(-8))?;
         let start = unsafe { self.reader.read_le_value::<u64>()? };
@@ -550,11 +654,23 @@ impl<R: Read + Seek> CatalogReader<R> {
 
         let mut entry_list = Vec::new();
 
-        DirInfo::parse(&data, |etype, name, offset, size, mtime| {
-            let entry = DirEntry::new(etype, name.to_vec(), start - offset, size, mtime);
-            entry_list.push(entry);
-            Ok(true)
-        })?;
+        DirInfo::parse(
+            &data,
+            self.magic,
+            |etype, name, offset, size, mtime, ctime, link_offset| {
+                let entry = DirEntry::new(
+                    etype,
+                    name.to_vec(),
+                    start - offset,
+                    size,
+                    mtime,
+                    ctime,
+                    link_offset,
+                );
+                entry_list.push(entry);
+                Ok(true)
+            },
+        )?;
 
         Ok(entry_list)
     }
@@ -600,15 +716,27 @@ impl<R: Read + Seek> CatalogReader<R> {
         let data = self.read_raw_dirinfo_block(start)?;
 
         let mut item = None;
-        DirInfo::parse(&data, |etype, name, offset, size, mtime| {
-            if name != filename {
-                return Ok(true);
-            }
+        DirInfo::parse(
+            &data,
+            self.magic,
+            |etype, name, offset, size, mtime, ctime, link_offset| {
+                if name != filename {
+                    return Ok(true);
+                }
 
-            let entry = DirEntry::new(etype, name.to_vec(), start - offset, size, mtime);
-            item = Some(entry);
-            Ok(false) // stop parsing
-        })?;
+                let entry = DirEntry::new(
+                    etype,
+                    name.to_vec(),
+                    start - offset,
+                    size,
+                    mtime,
+                    ctime,
+                    link_offset,
+                );
+                item = Some(entry);
+                Ok(false) // stop parsing
+            },
+        )?;
 
         Ok(item)
     }
@@ -628,35 +756,51 @@ impl<R: Read + Seek> CatalogReader<R> {
     pub fn dump_dir(&mut self, prefix: &std::path::Path, start: u64) -> Result<(), Error> {
         let data = self.read_raw_dirinfo_block(start)?;
 
-        DirInfo::parse(&data, |etype, name, offset, size, mtime| {
-            let mut path = std::path::PathBuf::from(prefix);
-            let name: &OsStr = OsStrExt::from_bytes(name);
-            path.push(name);
-
-            match etype {
-                CatalogEntryType::Directory => {
-                    log::info!("{} {:?}", etype, path);
-                    if offset > start {
-                        bail!("got wrong directory offset ({} > {})", offset, start);
+        DirInfo::parse(
+            &data,
+            self.magic,
+            |etype, name, offset, size, mtime, ctime, link_offset| {
+                let mut path = std::path::PathBuf::from(prefix);
+                let name: &OsStr = OsStrExt::from_bytes(name);
+                path.push(name);
+
+                match etype {
+                    CatalogEntryType::Directory => {
+                        log::info!("{} {:?}", etype, path);
+                        if offset > start {
+                            bail!("got wrong directory offset ({} > {})", offset, start);
+                        }
+                        let pos = start - offset;
+                        self.dump_dir(&path, pos)?;
                     }
-                    let pos = start - offset;
-                    self.dump_dir(&path, pos)?;
-                }
-                CatalogEntryType::File => {
-                    let mut mtime_string = mtime.to_string();
-                    if let Ok(s) = proxmox_time::strftime_local("%FT%TZ", mtime) {
-                        mtime_string = s;
+                    CatalogEntryType::File => {
+                        let mut mtime_string = mtime.to_string();
+                        let mut ctime_string = ctime.to_string();
+                        if let Ok(s) = proxmox_time::strftime_local("%FT%TZ", mtime) {
+                            mtime_string = s;
+                        }
+                        if let Ok(s) = proxmox_time::strftime_local("%FT%TZ", ctime) {
+                            ctime_string = s;
+                        }
+
+                        log::info!(
+                            "{} {:?} {} {} {} {:?}",
+                            etype,
+                            path,
+                            size,
+                            mtime_string,
+                            ctime_string,
+                            link_offset
+                        );
+                    }
+                    _ => {
+                        log::info!("{} {:?}", etype, path);
                     }
-
-                    log::info!("{} {:?} {} {}", etype, path, size, mtime_string,);
-                }
-                _ => {
-                    log::info!("{} {:?}", etype, path);
                 }
-            }
 
-            Ok(true)
-        })
+                Ok(true)
+            },
+        )
     }
 
     /// Finds all entries matching the given match patterns and calls the
@@ -705,9 +849,24 @@ impl<R: Read + Seek> CatalogReader<R> {
             components.push(b'/');
             components.extend(&direntry.name);
             let mut entry = ArchiveEntry::new(&components, Some(&direntry.attr));
-            if let DirEntryAttribute::File { size, mtime } = direntry.attr {
+            if let DirEntryAttribute::File {
+                size,
+                mtime,
+                extension,
+            } = direntry.attr
+            {
                 entry.size = size.into();
                 entry.mtime = mtime.into();
+                entry.ctime = None;
+                entry.file_offset = None;
+                if let Some(CatalogV2Extension {
+                    ctime,
+                    file_offset: FileOffset { offset },
+                }) = extension
+                {
+                    entry.ctime = ctime.into();
+                    entry.file_offset = offset.into();
+                }
             }
             res.push(entry);
         }
@@ -916,6 +1075,12 @@ pub struct ArchiveEntry {
     /// The file "last modified" time stamp, if entry_type is 'f' (file)
     #[serde(skip_serializing_if = "Option::is_none")]
     pub mtime: Option<i64>,
+    /// The file "last status change" time stamp, if entry_type is 'f' (file)
+    #[serde(skip_serializing_if = "Option::is_none")]
+    pub ctime: Option<i64>,
+    /// The file archive offset, if entry_type is 'f' (file)
+    #[serde(skip_serializing_if = "Option::is_none")]
+    pub file_offset: Option<u64>,
 }
 
 impl ArchiveEntry {
@@ -946,6 +1111,30 @@ impl ArchiveEntry {
                 Some(DirEntryAttribute::File { mtime, .. }) => Some(*mtime),
                 _ => None,
             },
+            ctime: match entry_type {
+                Some(DirEntryAttribute::File { extension, .. }) => {
+                    if let Some(CatalogV2Extension { ctime, .. }) = extension {
+                        Some(*ctime)
+                    } else {
+                        None
+                    }
+                }
+                _ => None,
+            },
+            file_offset: match entry_type {
+                Some(DirEntryAttribute::File { extension, .. }) => {
+                    if let Some(CatalogV2Extension {
+                        file_offset: FileOffset { offset },
+                        ..
+                    }) = extension
+                    {
+                        Some(*offset)
+                    } else {
+                        None
+                    }
+                }
+                _ => None,
+            },
         }
     }
 }
diff --git a/pbs-datastore/src/file_formats.rs b/pbs-datastore/src/file_formats.rs
index 73d67e20..4181f0f1 100644
--- a/pbs-datastore/src/file_formats.rs
+++ b/pbs-datastore/src/file_formats.rs
@@ -5,6 +5,9 @@ use endian_trait::Endian;
 // openssl::sha::sha256(b"Proxmox Backup Catalog file v1.0")[0..8]
 pub const PROXMOX_CATALOG_FILE_MAGIC_1_0: [u8; 8] = [145, 253, 96, 249, 196, 103, 88, 213];
 
+// openssl::sha::sha256(b"Proxmox Backup Catalog file v2.0")[0..8]
+pub const PROXMOX_CATALOG_FILE_MAGIC_2_0: [u8; 8] = [204, 223, 24, 211, 187, 125, 183, 226];
+
 // openssl::sha::sha256(b"Proxmox Backup uncompressed blob v1.0")[0..8]
 pub const UNCOMPRESSED_BLOB_MAGIC_1_0: [u8; 8] = [66, 171, 56, 7, 190, 131, 112, 161];
 
diff --git a/proxmox-restore-daemon/src/proxmox_restore_daemon/api.rs b/proxmox-restore-daemon/src/proxmox_restore_daemon/api.rs
index c4e97d33..fb12befa 100644
--- a/proxmox-restore-daemon/src/proxmox_restore_daemon/api.rs
+++ b/proxmox-restore-daemon/src/proxmox_restore_daemon/api.rs
@@ -24,7 +24,7 @@ use proxmox_sys::fs::read_subdir;
 
 use pbs_api_types::file_restore::{FileRestoreFormat, RestoreDaemonStatus};
 use pbs_client::pxar::{create_archive, Flags, PxarCreateOptions, ENCODER_MAX_ENTRIES};
-use pbs_datastore::catalog::{ArchiveEntry, DirEntryAttribute};
+use pbs_datastore::catalog::{ArchiveEntry, CatalogV2Extension, DirEntryAttribute, FileOffset};
 use pbs_tools::json::required_string_param;
 
 use pxar::encoder::aio::TokioWriter;
@@ -109,6 +109,10 @@ fn get_dir_entry(path: &Path) -> Result<DirEntryAttribute, Error> {
         libc::S_IFREG => DirEntryAttribute::File {
             size: stat.st_size as u64,
             mtime: stat.st_mtime,
+            extension: Some(CatalogV2Extension {
+                ctime: stat.st_ctime,
+                file_offset: FileOffset::default(),
+            }),
         },
         libc::S_IFDIR => DirEntryAttribute::Directory { start: 0 },
         _ => bail!("unsupported file type: {}", stat.st_mode),
-- 
2.39.2
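
To summarize the format change: the byte layout written by
encode_entry() above for a file entry is roughly the following, where
all integers use the catalog's variable-length encoding and the last
two fields are only present in version 2 catalogs:

    [ b'f' ][ name_len ][ name ][ size ][ mtime ][ ctime ][ file_offset ]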

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [pbs-devel] [RFC v2 proxmox-backup 12/23] fix #3174: archiver/extractor: impl appendix ref
  2023-10-09 11:51 [pbs-devel] [RFC v2 pxar proxmox-backup 00/23] fix #3174: improve file-level backup Christian Ebner
                   ` (10 preceding siblings ...)
  2023-10-09 11:51 ` [pbs-devel] [RFC v2 proxmox-backup 11/23] fix #3174: catalog: introduce extended format v2 Christian Ebner
@ 2023-10-09 11:51 ` Christian Ebner
  2023-10-09 11:51 ` [pbs-devel] [RFC v2 proxmox-backup 13/23] fix #3174: catalog: add specialized Archive entry Christian Ebner
                   ` (10 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: Christian Ebner @ 2023-10-09 11:51 UTC (permalink / raw)
  To: pbs-devel

Implements the functionality to create and extract appendix
references via the pbs client.

This reuses the pxar encoder's functionality to write appendix
reference entries and adds the implementation to store and access
them in the catalog.
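
As a hedged sketch, the archiver-side decision this enables might be
modeled as follows (hypothetical helper; the actual wiring follows in
later patches of this series):

    // Illustrative only: reuse an unchanged payload by emitting an
    // appendix reference instead of re-encoding the file contents.
    enum EncodeAction {
        Payload,
        AppendixRef { appendix_offset: u64, file_size: u64 },
    }

    fn choose_action(unchanged: bool, prev_offset: Option<u64>, file_size: u64) -> EncodeAction {
        match (unchanged, prev_offset) {
            // Metadata matched and the previous run knows the offset:
            // reference the payload within the appendix section.
            (true, Some(appendix_offset)) => EncodeAction::AppendixRef {
                appendix_offset,
                file_size,
            },
            // Changed file or no previous reference: encode as usual.
            _ => EncodeAction::Payload,
        }
    }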

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
Changes since version 1:
- Use custom types for appendix ref archive offsets
- Include catalog implementation in this patch

 pbs-client/src/catalog_shell.rs |   2 +-
 pbs-client/src/pxar/create.rs   |  12 ++++
 pbs-client/src/pxar/extract.rs  |  16 +++++
 pbs-client/src/pxar/tools.rs    |   8 +++
 pbs-datastore/src/catalog.rs    | 103 ++++++++++++++++++++++++++++++++
 5 files changed, 140 insertions(+), 1 deletion(-)

diff --git a/pbs-client/src/catalog_shell.rs b/pbs-client/src/catalog_shell.rs
index f53b3cc5..99416d2f 100644
--- a/pbs-client/src/catalog_shell.rs
+++ b/pbs-client/src/catalog_shell.rs
@@ -1147,7 +1147,7 @@ impl<'a> ExtractorState<'a> {
             (_, DirEntryAttribute::Directory { .. }) => {
                 self.handle_new_directory(entry, match_result?).await?;
             }
-            (true, DirEntryAttribute::File { .. }) => {
+            (true, DirEntryAttribute::File { .. } | DirEntryAttribute::AppendixRef { .. }) => {
                 self.dir_stack.push(PathStackEntry::new(entry));
                 let file = Shell::walk_pxar_archive(self.accessor, &mut self.dir_stack).await?;
                 self.extract_file(file).await?;
diff --git a/pbs-client/src/pxar/create.rs b/pbs-client/src/pxar/create.rs
index a2338218..611d7421 100644
--- a/pbs-client/src/pxar/create.rs
+++ b/pbs-client/src/pxar/create.rs
@@ -735,6 +735,18 @@ impl Archiver {
         Ok(out.file_offset())
     }
 
+    async fn add_appendix_ref<T: SeqWrite + Send>(
+        &mut self,
+        encoder: &mut Encoder<'_, T>,
+        file_name: &Path,
+        appendix_offset: pxar::encoder::AppendixRefOffset,
+        file_size: u64,
+    ) -> Result<(), Error> {
+        Ok(encoder
+            .add_appendix_ref(file_name, appendix_offset, file_size)
+            .await?)
+    }
+
     async fn add_symlink<T: SeqWrite + Send>(
         &mut self,
         encoder: &mut Encoder<'_, T>,
diff --git a/pbs-client/src/pxar/extract.rs b/pbs-client/src/pxar/extract.rs
index f78e06c2..d2d42749 100644
--- a/pbs-client/src/pxar/extract.rs
+++ b/pbs-client/src/pxar/extract.rs
@@ -74,6 +74,7 @@ struct ExtractorIterState {
     err_path_stack: Vec<OsString>,
     current_match: bool,
     end_reached: bool,
+    appendix_list: Vec<(PathBuf, u64, u64)>,
 }
 
 /// An [`Iterator`] that encapsulates the process of extraction in [extract_archive].
@@ -98,6 +99,7 @@ impl ExtractorIterState {
             err_path_stack: Vec::new(),
             current_match: options.extract_match_default,
             end_reached: false,
+            appendix_list: Vec::new(),
         }
     }
 }
@@ -373,6 +375,20 @@ where
                 }
                 .context(PxarExtractContext::ExtractFile)
             }
+            (
+                true,
+                EntryKind::AppendixRef {
+                    appendix_offset,
+                    file_size,
+                },
+            ) => {
+                self.state.appendix_list.push((
+                    entry.path().to_path_buf(),
+                    *appendix_offset,
+                    *file_size,
+                ));
+                Ok(())
+            }
             (false, _) => Ok(()), // skip this
         };
 
diff --git a/pbs-client/src/pxar/tools.rs b/pbs-client/src/pxar/tools.rs
index 0cfbaf5b..aac5a1e7 100644
--- a/pbs-client/src/pxar/tools.rs
+++ b/pbs-client/src/pxar/tools.rs
@@ -156,6 +156,14 @@ pub fn format_multi_line_entry(entry: &Entry) -> String {
 
     let (size, link, type_name) = match entry.kind() {
         EntryKind::File { size, .. } => (format!("{}", *size), String::new(), "file"),
+        EntryKind::AppendixRef {
+            appendix_offset,
+            file_size,
+        } => (
+            format!("{} {}", appendix_offset, file_size),
+            String::new(),
+            "appendix ref",
+        ),
         EntryKind::Symlink(link) => (
             "0".to_string(),
             format!(" -> {:?}", link.as_os_str()),
diff --git a/pbs-datastore/src/catalog.rs b/pbs-datastore/src/catalog.rs
index c4d1a4de..da68dac9 100644
--- a/pbs-datastore/src/catalog.rs
+++ b/pbs-datastore/src/catalog.rs
@@ -28,6 +28,14 @@ pub trait BackupCatalogWriter {
         ctime: i64,
         file_offset: pxar::encoder::LinkOffset,
     ) -> Result<(), Error>;
+    fn add_appendix_ref(
+        &mut self,
+        name: &CStr,
+        size: u64,
+        mtime: i64,
+        ctime: i64,
+        appendix_ref_offset: pxar::encoder::AppendixRefOffset,
+    ) -> Result<(), Error>;
     fn add_symlink(&mut self, name: &CStr) -> Result<(), Error>;
     fn add_hardlink(&mut self, name: &CStr) -> Result<(), Error>;
     fn add_block_device(&mut self, name: &CStr) -> Result<(), Error>;
@@ -41,6 +49,7 @@ pub trait BackupCatalogWriter {
 pub enum CatalogEntryType {
     Directory = b'd',
     File = b'f',
+    AppendixRef = b'r',
     Symlink = b'l',
     Hardlink = b'h',
     BlockDevice = b'b',
@@ -56,6 +65,7 @@ impl TryFrom<u8> for CatalogEntryType {
         Ok(match value {
             b'd' => CatalogEntryType::Directory,
             b'f' => CatalogEntryType::File,
+            b'r' => CatalogEntryType::AppendixRef,
             b'l' => CatalogEntryType::Symlink,
             b'h' => CatalogEntryType::Hardlink,
             b'b' => CatalogEntryType::BlockDevice,
@@ -72,6 +82,7 @@ impl From<&DirEntryAttribute> for CatalogEntryType {
         match value {
             DirEntryAttribute::Directory { .. } => CatalogEntryType::Directory,
             DirEntryAttribute::File { .. } => CatalogEntryType::File,
+            DirEntryAttribute::AppendixRef { .. } => CatalogEntryType::AppendixRef,
             DirEntryAttribute::Symlink => CatalogEntryType::Symlink,
             DirEntryAttribute::Hardlink => CatalogEntryType::Hardlink,
             DirEntryAttribute::BlockDevice => CatalogEntryType::BlockDevice,
@@ -98,9 +109,22 @@ impl FileOffset {
         self.offset
     }
 }
+
+#[derive(Clone, Copy, Debug, Eq, PartialEq, Ord, PartialOrd)]
+pub struct AppendixRefOffset {
+    offset: u64,
+}
+
+impl AppendixRefOffset {
+    pub fn raw(&self) -> u64 {
+        self.offset
+    }
+}
+
 #[derive(Clone, Copy, Debug, Eq, PartialEq, Ord, PartialOrd)]
 pub enum Offset {
     FileOffset { offset: u64 },
+    AppendixRefOffset { offset: u64 },
 }
 
 /// Represents a named directory entry
@@ -130,6 +154,12 @@ pub enum DirEntryAttribute {
         mtime: i64,
         extension: Option<CatalogV2Extension>,
     },
+    AppendixRef {
+        size: u64,
+        mtime: i64,
+        ctime: i64,
+        appendix_ref_offset: AppendixRefOffset,
+    },
     Symlink,
     Hardlink,
     BlockDevice,
@@ -195,6 +225,17 @@ impl DirEntry {
                 name,
                 attr: DirEntryAttribute::Socket,
             },
+            (CatalogEntryType::AppendixRef, Some(Offset::AppendixRefOffset { offset })) => {
+                DirEntry {
+                    name,
+                    attr: DirEntryAttribute::AppendixRef {
+                        size,
+                        mtime,
+                        ctime,
+                        appendix_ref_offset: AppendixRefOffset { offset },
+                    },
+                }
+            }
             _ => panic!("unexpected parameters '{etype}' and '{offset:?}'"),
         }
     }
@@ -204,6 +245,7 @@ impl DirEntry {
         Some(match self.attr {
             DirEntryAttribute::Directory { .. } => pxar::mode::IFDIR,
             DirEntryAttribute::File { .. } => pxar::mode::IFREG,
+            DirEntryAttribute::AppendixRef { .. } => pxar::mode::IFREG,
             DirEntryAttribute::Symlink => pxar::mode::IFLNK,
             DirEntryAttribute::Hardlink => return None,
             DirEntryAttribute::BlockDevice => pxar::mode::IFBLK,
@@ -271,6 +313,24 @@ impl DirInfo {
                     catalog_encode_u64(writer, file_offset.raw())?;
                 }
             }
+            DirEntry {
+                name,
+                attr:
+                    DirEntryAttribute::AppendixRef {
+                        size,
+                        mtime,
+                        ctime,
+                        appendix_ref_offset,
+                    },
+            } => {
+                writer.write_all(&[CatalogEntryType::AppendixRef as u8])?;
+                catalog_encode_u64(writer, name.len() as u64)?;
+                writer.write_all(name)?;
+                catalog_encode_u64(writer, *size)?;
+                catalog_encode_i64(writer, *mtime)?;
+                catalog_encode_i64(writer, *ctime)?;
+                catalog_encode_u64(writer, appendix_ref_offset.raw())?;
+            }
             DirEntry {
                 name,
                 attr: DirEntryAttribute::Symlink,
@@ -390,6 +450,21 @@ impl DirInfo {
                     };
                     callback(etype, name, 0, size, mtime, ctime, offset)?
                 }
+                CatalogEntryType::AppendixRef => {
+                    let size = catalog_decode_u64(&mut cursor)?;
+                    let mtime = catalog_decode_i64(&mut cursor)?;
+                    let ctime = catalog_decode_i64(&mut cursor)?;
+                    let offset = catalog_decode_u64(&mut cursor)?;
+                    callback(
+                        etype,
+                        name,
+                        0,
+                        size,
+                        mtime,
+                        ctime,
+                        Some(Offset::AppendixRefOffset { offset }),
+                    )?
+                }
                 _ => callback(etype, name, 0, 0, 0, 0, None)?,
             };
             if !cont {
@@ -517,6 +592,34 @@ impl<W: Write> BackupCatalogWriter for CatalogWriter<W> {
         Ok(())
     }
 
+    fn add_appendix_ref(
+        &mut self,
+        name: &CStr,
+        size: u64,
+        mtime: i64,
+        ctime: i64,
+        appendix_ref_offset: pxar::encoder::AppendixRefOffset,
+    ) -> Result<(), Error> {
+        let dir = self
+            .dirstack
+            .last_mut()
+            .ok_or_else(|| format_err!("outside root"))?;
+        let name = name.to_bytes().to_vec();
+        let appendix_ref_offset = AppendixRefOffset {
+            offset: appendix_ref_offset.raw(),
+        };
+        dir.entries.push(DirEntry {
+            name,
+            attr: DirEntryAttribute::AppendixRef {
+                size,
+                mtime,
+                ctime,
+                appendix_ref_offset,
+            },
+        });
+        Ok(())
+    }
+
     fn add_symlink(&mut self, name: &CStr) -> Result<(), Error> {
         let dir = self
             .dirstack
-- 
2.39.2

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [pbs-devel] [RFC v2 proxmox-backup 13/23] fix #3174: catalog: add specialized Archive entry
  2023-10-09 11:51 [pbs-devel] [RFC v2 pxar proxmox-backup 00/23] fix #3174: improve file-level backup Christian Ebner
                   ` (11 preceding siblings ...)
  2023-10-09 11:51 ` [pbs-devel] [RFC v2 proxmox-backup 12/23] fix #3174: archiver/extractor: impl appendix ref Christian Ebner
@ 2023-10-09 11:51 ` Christian Ebner
  2023-10-09 11:51 ` [pbs-devel] [RFC v2 proxmox-backup 14/23] fix #3174: extractor: impl seq restore from appendix Christian Ebner
                   ` (9 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: Christian Ebner @ 2023-10-09 11:51 UTC (permalink / raw)
  To: pbs-devel

Introduces a specialized pxar directory entry type Archive, which
extends the regular directory entry by storing an additional,
optional appendix start offset.

The archive entry type is only used for the topmost entries in the
catalog, replacing the directory entry used so far. If such an entry
was created by reusing pxar file entries in an appendix section, the
appendix start offset is present and can be used to easily locate the
referenced file entries within the appendix section and access them
from the catalog shell.

Since the catalog might contain multiple archives, each archive entry
stores its individual appendix start offset.
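
A sketch of how the per-archive offset might be consumed, based on
the CatalogReader::appendix_offset() helper added below; the archive
name and the assumption that reference offsets are relative to the
appendix start are illustrative:

    // Resolve an appendix reference to an absolute archive offset.
    fn resolve_offset(
        catalog: &mut pbs_datastore::catalog::CatalogReader<std::fs::File>,
        ref_offset: u64,
    ) -> Result<Option<u64>, anyhow::Error> {
        Ok(catalog
            .appendix_offset(b"root.pxar.didx")?
            .map(|start| start.raw() + ref_offset))
    }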

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
Changes since version 1:
- This reworks the Appendix Offset impl of version 1 completely

 pbs-client/src/catalog_shell.rs |   1 +
 pbs-datastore/src/catalog.rs    | 146 +++++++++++++++++++++++++++++++-
 2 files changed, 145 insertions(+), 2 deletions(-)

diff --git a/pbs-client/src/catalog_shell.rs b/pbs-client/src/catalog_shell.rs
index 99416d2f..7deb9d9a 100644
--- a/pbs-client/src/catalog_shell.rs
+++ b/pbs-client/src/catalog_shell.rs
@@ -1144,6 +1144,7 @@ impl<'a> ExtractorState<'a> {
         };
 
         match (did_match, &entry.attr) {
+            (_, DirEntryAttribute::Archive { .. }) |
             (_, DirEntryAttribute::Directory { .. }) => {
                 self.handle_new_directory(entry, match_result?).await?;
             }
diff --git a/pbs-datastore/src/catalog.rs b/pbs-datastore/src/catalog.rs
index da68dac9..5e588480 100644
--- a/pbs-datastore/src/catalog.rs
+++ b/pbs-datastore/src/catalog.rs
@@ -18,6 +18,11 @@ use crate::file_formats::{PROXMOX_CATALOG_FILE_MAGIC_1_0, PROXMOX_CATALOG_FILE_M
 /// A file list catalog simply stores a directory tree. Such catalogs may be used as index to do a
 /// fast search for files.
 pub trait BackupCatalogWriter {
+    fn start_archive(&mut self, name: &CStr) -> Result<(), Error>;
+    fn end_archive(
+        &mut self,
+        appendix: Option<pxar::encoder::AppendixStartOffset>,
+    ) -> Result<(), Error>;
     fn start_directory(&mut self, name: &CStr) -> Result<(), Error>;
     fn end_directory(&mut self) -> Result<(), Error>;
     fn add_file(
@@ -50,6 +55,7 @@ pub enum CatalogEntryType {
     Directory = b'd',
     File = b'f',
     AppendixRef = b'r',
+    Archive = b'a',
     Symlink = b'l',
     Hardlink = b'h',
     BlockDevice = b'b',
@@ -66,6 +72,7 @@ impl TryFrom<u8> for CatalogEntryType {
             b'd' => CatalogEntryType::Directory,
             b'f' => CatalogEntryType::File,
             b'r' => CatalogEntryType::AppendixRef,
+            b'a' => CatalogEntryType::Archive,
             b'l' => CatalogEntryType::Symlink,
             b'h' => CatalogEntryType::Hardlink,
             b'b' => CatalogEntryType::BlockDevice,
@@ -83,6 +90,7 @@ impl From<&DirEntryAttribute> for CatalogEntryType {
             DirEntryAttribute::Directory { .. } => CatalogEntryType::Directory,
             DirEntryAttribute::File { .. } => CatalogEntryType::File,
             DirEntryAttribute::AppendixRef { .. } => CatalogEntryType::AppendixRef,
+            DirEntryAttribute::Archive { .. } => CatalogEntryType::Archive,
             DirEntryAttribute::Symlink => CatalogEntryType::Symlink,
             DirEntryAttribute::Hardlink => CatalogEntryType::Hardlink,
             DirEntryAttribute::BlockDevice => CatalogEntryType::BlockDevice,
@@ -121,10 +129,22 @@ impl AppendixRefOffset {
     }
 }
 
+#[derive(Clone, Copy, Debug, Eq, PartialEq, Ord, PartialOrd)]
+pub struct AppendixStartOffset {
+    offset: u64,
+}
+
+impl AppendixStartOffset {
+    pub fn raw(&self) -> u64 {
+        self.offset
+    }
+}
+
 #[derive(Clone, Copy, Debug, Eq, PartialEq, Ord, PartialOrd)]
 pub enum Offset {
     FileOffset { offset: u64 },
     AppendixRefOffset { offset: u64 },
+    AppendixStartOffset { offset: u64 },
 }
 
 /// Represents a named directory entry
@@ -160,6 +180,10 @@ pub enum DirEntryAttribute {
         ctime: i64,
         appendix_ref_offset: AppendixRefOffset,
     },
+    Archive {
+        start: u64,
+        appendix_offset: AppendixStartOffset,
+    },
     Symlink,
     Hardlink,
     BlockDevice,
@@ -236,6 +260,13 @@ impl DirEntry {
                     },
                 }
             }
+            (CatalogEntryType::Archive, Some(Offset::AppendixStartOffset { offset })) => DirEntry {
+                name,
+                attr: DirEntryAttribute::Archive {
+                    start,
+                    appendix_offset: AppendixStartOffset { offset },
+                },
+            },
             _ => panic!("unexpected parameters '{etype}' and '{offset:?}'"),
         }
     }
@@ -246,6 +277,7 @@ impl DirEntry {
             DirEntryAttribute::Directory { .. } => pxar::mode::IFDIR,
             DirEntryAttribute::File { .. } => pxar::mode::IFREG,
             DirEntryAttribute::AppendixRef { .. } => pxar::mode::IFREG,
+            DirEntryAttribute::Archive { .. } => pxar::mode::IFDIR,
             DirEntryAttribute::Symlink => pxar::mode::IFLNK,
             DirEntryAttribute::Hardlink => return None,
             DirEntryAttribute::BlockDevice => pxar::mode::IFBLK,
@@ -258,6 +290,12 @@ impl DirEntry {
     /// Check if DirEntry is a directory
     pub fn is_directory(&self) -> bool {
         matches!(self.attr, DirEntryAttribute::Directory { .. })
+            || matches!(self.attr, DirEntryAttribute::Archive { .. })
+    }
+
+    /// Check if DirEntry is an archive
+    pub fn is_archive(&self) -> bool {
+        matches!(self.attr, DirEntryAttribute::Archive { .. })
     }
 
     /// Check if DirEntry is a symlink
@@ -285,6 +323,20 @@ impl DirInfo {
 
     fn encode_entry<W: Write>(writer: &mut W, entry: &DirEntry, pos: u64) -> Result<(), Error> {
         match entry {
+            DirEntry {
+                name,
+                attr:
+                    DirEntryAttribute::Archive {
+                        start,
+                        appendix_offset,
+                    },
+            } => {
+                writer.write_all(&[CatalogEntryType::Archive as u8])?;
+                catalog_encode_u64(writer, name.len() as u64)?;
+                writer.write_all(name)?;
+                catalog_encode_u64(writer, appendix_offset.raw())?;
+                catalog_encode_u64(writer, pos - start)?;
+            }
             DirEntry {
                 name,
                 attr: DirEntryAttribute::Directory { start },
@@ -427,6 +479,19 @@ impl DirInfo {
             cursor.read_exact(name)?;
 
             let cont = match etype {
+                CatalogEntryType::Archive => {
+                    let offset = catalog_decode_u64(&mut cursor)?;
+                    let start = catalog_decode_u64(&mut cursor)?;
+                    callback(
+                        etype,
+                        name,
+                        start,
+                        0,
+                        0,
+                        0,
+                        Some(Offset::AppendixStartOffset { offset }),
+                    )?
+                }
                 CatalogEntryType::Directory => {
                     let offset = catalog_decode_u64(&mut cursor)?;
                     callback(etype, name, offset, 0, 0, 0, None)?
@@ -533,6 +598,51 @@ impl<W: Write> CatalogWriter<W> {
 }
 
 impl<W: Write> BackupCatalogWriter for CatalogWriter<W> {
+    fn start_archive(&mut self, name: &CStr) -> Result<(), Error> {
+        let new = DirInfo::new(name.to_owned());
+        self.dirstack.push(new);
+        Ok(())
+    }
+
+    fn end_archive(
+        &mut self,
+        appendix: Option<pxar::encoder::AppendixStartOffset>,
+    ) -> Result<(), Error> {
+        let (start, name) = match self.dirstack.pop() {
+            Some(dir) => {
+                let start = self.pos;
+                let (name, data) = dir.encode(start)?;
+                self.write_all(&data)?;
+                (start, name)
+            }
+            None => {
+                bail!("got unexpected end_directory level 0");
+            }
+        };
+
+        let current = self
+            .dirstack
+            .last_mut()
+            .ok_or_else(|| format_err!("outside root"))?;
+        let name = name.to_bytes().to_vec();
+        let appendix_offset = if let Some(appendix) = appendix {
+            AppendixStartOffset {
+                offset: appendix.raw(),
+            }
+        } else {
+            AppendixStartOffset { offset: 0 }
+        };
+        current.entries.push(DirEntry {
+            name,
+            attr: DirEntryAttribute::Archive {
+                start,
+                appendix_offset,
+            },
+        });
+
+        Ok(())
+    }
+
     fn start_directory(&mut self, name: &CStr) -> Result<(), Error> {
         let new = DirInfo::new(name.to_owned());
         self.dirstack.push(new);
@@ -746,10 +856,33 @@ impl<R: Read + Seek> CatalogReader<R> {
         })
     }
 
+    pub fn appendix_offset(
+        &mut self,
+        archive_name: &[u8],
+    ) -> Result<Option<AppendixStartOffset>, Error> {
+        let root = self.root()?;
+        let dir_entry = self.lookup(&root, archive_name)?.unwrap();
+        if let DirEntry {
+            attr: DirEntryAttribute::Archive {
+                appendix_offset, ..
+            },
+            ..
+        } = dir_entry
+        {
+            if appendix_offset.raw() != 0 {
+                return Ok(Some(appendix_offset));
+            } else {
+                return Ok(None);
+            }
+        }
+        Ok(None)
+    }
+
     /// Read all directory entries
     pub fn read_dir(&mut self, parent: &DirEntry) -> Result<Vec<DirEntry>, Error> {
         let start = match parent.attr {
             DirEntryAttribute::Directory { start } => start,
+            DirEntryAttribute::Archive { start, .. } => start,
             _ => bail!("parent is not a directory - internal error"),
         };
 
@@ -813,6 +946,7 @@ impl<R: Read + Seek> CatalogReader<R> {
     ) -> Result<Option<DirEntry>, Error> {
         let start = match parent.attr {
             DirEntryAttribute::Directory { start } => start,
+            DirEntryAttribute::Archive { start, .. } => start,
             _ => bail!("parent is not a directory - internal error"),
         };
 
@@ -822,7 +956,7 @@ impl<R: Read + Seek> CatalogReader<R> {
         DirInfo::parse(
             &data,
             self.magic,
-            |etype, name, offset, size, mtime, ctime, link_offset| {
+            |etype, name, offset, size, mtime, ctime, archive_offset| {
                 if name != filename {
                     return Ok(true);
                 }
@@ -834,7 +968,7 @@ impl<R: Read + Seek> CatalogReader<R> {
                     size,
                     mtime,
                     ctime,
-                    link_offset,
+                    archive_offset,
                 );
                 item = Some(entry);
                 Ok(false) // stop parsing
@@ -868,6 +1002,14 @@ impl<R: Read + Seek> CatalogReader<R> {
                 path.push(name);
 
                 match etype {
+                    CatalogEntryType::Archive => {
+                        log::info!("{} {:?}", etype, path);
+                        if offset > start {
+                            bail!("got wrong archive offset ({} > {})", offset, start);
+                        }
+                        let pos = start - offset;
+                        self.dump_dir(&path, pos)?;
+                    }
                     CatalogEntryType::Directory => {
                         log::info!("{} {:?}", etype, path);
                         if offset > start {
-- 
2.39.2
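
Analogous to the file entry layout, the archive entry written by
encode_entry() above has roughly the following byte layout (integers
again use the catalog's variable-length encoding):

    [ b'a' ][ name_len ][ name ][ appendix_start_offset ][ pos - start ]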

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [pbs-devel] [RFC v2 proxmox-backup 14/23] fix #3174: extractor: impl seq restore from appendix
  2023-10-09 11:51 [pbs-devel] [RFC v2 pxar proxmox-backup 00/23] fix #3174: improve file-level backup Christian Ebner
                   ` (12 preceding siblings ...)
  2023-10-09 11:51 ` [pbs-devel] [RFC v2 proxmox-backup 13/23] fix #3174: catalog: add specialized Archive entry Christian Ebner
@ 2023-10-09 11:51 ` Christian Ebner
  2023-10-09 11:51 ` [pbs-devel] [RFC v2 proxmox-backup 15/23] fix #3174: archiver: store ref to previous backup Christian Ebner
                   ` (8 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: Christian Ebner @ 2023-10-09 11:51 UTC (permalink / raw)
  To: pbs-devel

Restores the file payloads for all AppendixRef entries encountered
during the sequential restore of the pxar archive.
This is done by iterating over all the files listed in the
corresponding state variable, opening each of the parent directories
while storing their metadata for later restore, then creating the
file and writing its contents.

When leaving the directories, their metadata is restored.
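
The offset bookkeeping of the restore loop below can be summarized by
this self-contained model, where each entry tuple is (offset,
overhead, payload): offsets are relative to the appendix start, and
overhead stands for the encoded file name, metadata and payload
header sizes:

    // Simplified model of the decoder bookkeeping: `consumed` tracks
    // the bytes already read past the appendix marker, so only the
    // gap up to the next reused entry has to be skipped.
    fn total_consumed(entries: &[(u64, u64, u64)]) -> u64 {
        let mut consumed = 0;
        for &(offset, overhead, payload) in entries {
            let skip = offset - consumed;
            consumed += skip + overhead + payload;
        }
        consumed
    }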

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
Changes since version 1:
- Use the Encoder<SeqSink> to get encoded metadata byte size

 pbs-client/src/pxar/create.rs  |   4 +-
 pbs-client/src/pxar/extract.rs | 125 +++++++++++++++++++++++++++++++++
 pbs-client/src/pxar/tools.rs   |   1 +
 3 files changed, 128 insertions(+), 2 deletions(-)

diff --git a/pbs-client/src/pxar/create.rs b/pbs-client/src/pxar/create.rs
index 611d7421..50bba4e6 100644
--- a/pbs-client/src/pxar/create.rs
+++ b/pbs-client/src/pxar/create.rs
@@ -43,7 +43,7 @@ pub struct PxarCreateOptions {
     pub skip_lost_and_found: bool,
 }
 
-fn detect_fs_type(fd: RawFd) -> Result<i64, Error> {
+pub fn detect_fs_type(fd: RawFd) -> Result<i64, Error> {
     let mut fs_stat = std::mem::MaybeUninit::uninit();
     let res = unsafe { libc::fstatfs(fd, fs_stat.as_mut_ptr()) };
     Errno::result(res)?;
@@ -776,7 +776,7 @@ impl Archiver {
     }
 }
 
-fn get_metadata(
+pub fn get_metadata(
     fd: RawFd,
     stat: &FileStat,
     flags: Flags,
diff --git a/pbs-client/src/pxar/extract.rs b/pbs-client/src/pxar/extract.rs
index d2d42749..0450cee4 100644
--- a/pbs-client/src/pxar/extract.rs
+++ b/pbs-client/src/pxar/extract.rs
@@ -313,6 +313,131 @@ where
 
                 res
             }
+            (_, EntryKind::Appendix { total }) => {
+                // Bytes consumed in decoder since encountering the appendix marker
+                let mut consumed = 0;
+
+                for (path, offset, size) in &self.state.appendix_list {
+                    self.extractor.allow_existing_dirs = true;
+
+                    // Open dir path components, skipping the root component, get metadata
+                    for dir in path.iter().skip(1) {
+                        let parent_fd = match self.extractor.dir_stack.last_dir_fd(true) {
+                            Ok(parent_fd) => parent_fd,
+                            Err(err) => return Some(Err(err)),
+                        };
+                        let fs_magic =
+                            match crate::pxar::create::detect_fs_type(parent_fd.as_raw_fd()) {
+                                Ok(fs_magic) => fs_magic,
+                                Err(err) => return Some(Err(err)),
+                            };
+
+                        let mut fs_feature_flags = Flags::from_magic(fs_magic);
+                        let file_name = match CString::new(dir.as_bytes()) {
+                            Ok(file_name) => file_name,
+                            Err(err) => return Some(Err(err.into())),
+                        };
+                        let fd = proxmox_sys::fd::openat(
+                            &parent_fd,
+                            file_name.as_ref(),
+                            OFlag::O_NOATIME,
+                            Mode::empty(),
+                        )
+                        .unwrap();
+                        let stat = nix::sys::stat::fstat(fd.as_raw_fd()).unwrap();
+                        let metadata = match crate::pxar::create::get_metadata(
+                            fd.as_raw_fd(),
+                            &stat,
+                            fs_feature_flags,
+                            fs_magic,
+                            &mut fs_feature_flags,
+                        ) {
+                            Ok(metadata) => metadata,
+                            Err(err) => return Some(Err(err)),
+                        };
+
+                        match self.extractor.enter_directory(
+                            dir.to_os_string(),
+                            metadata.clone(),
+                            true,
+                        ) {
+                            Ok(()) => (),
+                            Err(err) => return Some(Err(err)),
+                        };
+                    }
+
+                    let skip = *offset - consumed;
+                    match self.decoder.skip_bytes(skip) {
+                        Ok(()) => (),
+                        Err(err) => return Some(Err(err.into())),
+                    };
+
+                    let entry = match self.decoder.next() {
+                        Some(Ok(entry)) => entry,
+                        Some(Err(err)) => return Some(Err(err.into())),
+                        None => return Some(Err(format_err!("expected entry"))),
+                    };
+
+                    let file_name_os = entry.file_name();
+                    let file_name_bytes = file_name_os.as_bytes();
+
+                    let file_name = match CString::new(file_name_bytes) {
+                        Ok(file_name_ref) => file_name_ref,
+                        Err(err) => return Some(Err(format_err!(err))),
+                    };
+
+                    let metadata = entry.metadata();
+
+                    self.extractor.set_path(path.as_os_str().to_owned());
+
+                    let contents = self.decoder.contents();
+
+                    let result = if let Some(mut contents) = contents {
+                        self.extractor.extract_file(
+                            &file_name,
+                            metadata,
+                            *size,
+                            &mut contents,
+                            self.extractor
+                                .overwrite_flags
+                                .contains(OverwriteFlags::FILE),
+                        )
+                    } else {
+                        Err(format_err!(
+                            "found regular file entry without contents in archive"
+                        ))
+                    }
+                    .context(PxarExtractContext::ExtractFile);
+                    if let Err(err) = result {
+                        return Some(Err(err));
+                    }
+
+                    // Iterate over all directory path components, skipping the root, and
+                    // apply the stored metadata while leaving each directory again
+                    for _dir in path.iter().skip(1) {
+                        match self.extractor.leave_directory() {
+                            Ok(()) => (),
+                            Err(err) => return Some(Err(err)),
+                        }
+                    }
+
+                    let mut bytes = match pxar::encoder::Encoder::<pxar::encoder::SeqSink>::byte_len(
+                        file_name.as_c_str(),
+                        &metadata,
+                    ) {
+                        Ok(bytes) => bytes,
+                        Err(err) => return Some(Err(err.into())),
+                    };
+                    // payload header size
+                    bytes += std::mem::size_of::<pxar::format::Header>() as u64;
+
+                    consumed += skip + bytes + *size;
+                }
+
+                let skip = *total - consumed;
+                match self.decoder.skip_bytes(skip) {
+                    Ok(()) => (),
+                    Err(err) => return Some(Err(err.into())),
+                }
+
+                Ok(())
+            }
             (true, EntryKind::Symlink(link)) => {
                 self.callback(entry.path());
                 self.extractor
diff --git a/pbs-client/src/pxar/tools.rs b/pbs-client/src/pxar/tools.rs
index aac5a1e7..174a7351 100644
--- a/pbs-client/src/pxar/tools.rs
+++ b/pbs-client/src/pxar/tools.rs
@@ -156,6 +156,7 @@ pub fn format_multi_line_entry(entry: &Entry) -> String {
 
     let (size, link, type_name) = match entry.kind() {
         EntryKind::File { size, .. } => (format!("{}", *size), String::new(), "file"),
+        EntryKind::Appendix { total } => (format!("{total}"), String::new(), "appendix"),
         EntryKind::AppendixRef {
             appendix_offset,
             file_size,
-- 
2.39.2

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [pbs-devel] [RFC v2 proxmox-backup 15/23] fix #3174: archiver: store ref to previous backup
  2023-10-09 11:51 [pbs-devel] [RFC v2 pxar proxmox-backup 00/23] fix #3174: improve file-level backup Christian Ebner
                   ` (13 preceding siblings ...)
  2023-10-09 11:51 ` [pbs-devel] [RFC v2 proxmox-backup 14/23] fix #3174: extractor: impl seq restore from appendix Christian Ebner
@ 2023-10-09 11:51 ` Christian Ebner
  2023-10-09 11:51 ` [pbs-devel] [RFC v2 proxmox-backup 16/23] fix #3174: upload stream: impl reused chunk injector Christian Ebner
                   ` (7 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: Christian Ebner @ 2023-10-09 11:51 UTC (permalink / raw)
  To: pbs-devel

Adds the stateful information needed for accessing a previous backup's
data, such as its index and catalog.

This patch only introduces the struct and interface; since all
occurrences are set to None, it is not used yet.
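
For illustration, a minimal sketch of how a caller could populate the
new field once previous-snapshot lookup is wired up; the file names and
constructor calls here are assumptions, not part of this patch:

    let index = DynamicIndexReader::new(std::fs::File::open("prev.pxar.didx")?)?;
    let catalog = CatalogReader::new(std::fs::File::open("prev.catalog")?);
    let options = PxarCreateOptions {
        previous_ref: Some(PxarPrevRef {
            index,
            catalog,
            archive_name: "archive.pxar.didx".to_string(),
        }),
        ..Default::default()
    };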

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
Changes since version 1:
no changes

 pbs-client/src/pxar/create.rs                 | 19 +++++++++++++++++--
 pbs-client/src/pxar/mod.rs                    |  2 +-
 proxmox-backup-client/src/main.rs             |  1 +
 .../src/proxmox_restore_daemon/api.rs         |  1 +
 pxar-bin/src/main.rs                          | 17 +++++++++--------
 5 files changed, 29 insertions(+), 11 deletions(-)

diff --git a/pbs-client/src/pxar/create.rs b/pbs-client/src/pxar/create.rs
index 50bba4e6..bc6b63dd 100644
--- a/pbs-client/src/pxar/create.rs
+++ b/pbs-client/src/pxar/create.rs
@@ -24,14 +24,15 @@ use proxmox_io::vec;
 use proxmox_lang::c_str;
 use proxmox_sys::fs::{self, acl, xattr};
 
-use pbs_datastore::catalog::BackupCatalogWriter;
+use pbs_datastore::catalog::{BackupCatalogWriter, CatalogReader};
+use pbs_datastore::dynamic_index::DynamicIndexReader;
 
 use crate::pxar::metadata::errno_is_unsupported;
 use crate::pxar::tools::assert_single_path_component;
 use crate::pxar::Flags;
 
 /// Pxar options for creating a pxar archive/stream
-#[derive(Default, Clone)]
+#[derive(Default)]
 pub struct PxarCreateOptions {
     /// Device/mountpoint st_dev numbers that should be included. None for no limitation.
     pub device_set: Option<HashSet<u64>>,
@@ -41,6 +42,18 @@ pub struct PxarCreateOptions {
     pub entries_max: usize,
     /// Skip lost+found directory
     pub skip_lost_and_found: bool,
+    /// Reference state for partial backups
+    pub previous_ref: Option<PxarPrevRef>,
+}
+
+/// Contains stateful information about previous backup snapshots for partial backups
+pub struct PxarPrevRef {
+    /// Reference index for partial backups
+    pub index: DynamicIndexReader,
+    /// Reference catalog for partial backups
+    pub catalog: CatalogReader<std::fs::File>,
+    /// Reference archive name for partial backups
+    pub archive_name: String,
 }
 
 pub fn detect_fs_type(fd: RawFd) -> Result<i64, Error> {
@@ -128,6 +141,7 @@ struct Archiver {
     device_set: Option<HashSet<u64>>,
     hardlinks: HashMap<HardLinkInfo, (PathBuf, LinkOffset)>,
     file_copy_buffer: Vec<u8>,
+    previous_ref: Option<PxarPrevRef>,
 }
 
 type Encoder<'a, T> = pxar::encoder::aio::Encoder<'a, T>;
@@ -192,6 +206,7 @@ where
         device_set,
         hardlinks: HashMap::new(),
         file_copy_buffer: vec::undefined(4 * 1024 * 1024),
+        previous_ref: options.previous_ref,
     };
 
     archiver
diff --git a/pbs-client/src/pxar/mod.rs b/pbs-client/src/pxar/mod.rs
index 14674b9b..24315f5f 100644
--- a/pbs-client/src/pxar/mod.rs
+++ b/pbs-client/src/pxar/mod.rs
@@ -56,7 +56,7 @@ pub(crate) mod tools;
 mod flags;
 pub use flags::Flags;
 
-pub use create::{create_archive, PxarCreateOptions};
+pub use create::{create_archive, PxarCreateOptions, PxarPrevRef};
 pub use extract::{
     create_tar, create_zip, extract_archive, extract_sub_dir, extract_sub_dir_seq, ErrorHandler,
     OverwriteFlags, PxarExtractContext, PxarExtractOptions,
diff --git a/proxmox-backup-client/src/main.rs b/proxmox-backup-client/src/main.rs
index 1a13291a..509fa22c 100644
--- a/proxmox-backup-client/src/main.rs
+++ b/proxmox-backup-client/src/main.rs
@@ -993,6 +993,7 @@ async fn create_backup(
                     patterns: pattern_list.clone(),
                     entries_max: entries_max as usize,
                     skip_lost_and_found,
+                    previous_ref: None,
                 };
 
                 let upload_options = UploadOptions {
diff --git a/proxmox-restore-daemon/src/proxmox_restore_daemon/api.rs b/proxmox-restore-daemon/src/proxmox_restore_daemon/api.rs
index fb12befa..f89b0ab4 100644
--- a/proxmox-restore-daemon/src/proxmox_restore_daemon/api.rs
+++ b/proxmox-restore-daemon/src/proxmox_restore_daemon/api.rs
@@ -356,6 +356,7 @@ fn extract(
                         device_set: None,
                         patterns,
                         skip_lost_and_found: false,
+                        previous_ref: None,
                     };
 
                     let pxar_writer = TokioWriter::new(writer);
diff --git a/pxar-bin/src/main.rs b/pxar-bin/src/main.rs
index bc044035..9376a2c1 100644
--- a/pxar-bin/src/main.rs
+++ b/pxar-bin/src/main.rs
@@ -330,13 +330,6 @@ async fn create_archive(
         Some(HashSet::new())
     };
 
-    let options = pbs_client::pxar::PxarCreateOptions {
-        entries_max: entries_max as usize,
-        device_set,
-        patterns,
-        skip_lost_and_found: false,
-    };
-
     let source = PathBuf::from(source);
 
     let dir = nix::dir::Dir::open(
@@ -349,7 +342,7 @@ async fn create_archive(
         .create_new(true)
         .write(true)
         .mode(0o640)
-        .open(archive)?;
+        .open(archive.clone())?;
 
     let writer = std::io::BufWriter::with_capacity(1024 * 1024, file);
     let mut feature_flags = Flags::DEFAULT;
@@ -372,6 +365,14 @@ async fn create_archive(
         feature_flags.remove(Flags::WITH_SOCKETS);
     }
 
+    let options = pbs_client::pxar::PxarCreateOptions {
+        entries_max: entries_max as usize,
+        device_set,
+        patterns,
+        skip_lost_and_found: false,
+        previous_ref: None,
+    };
+
     let writer = pxar::encoder::sync::StandardWriter::new(writer);
     pbs_client::pxar::create_archive(
         dir,
-- 
2.39.2

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [pbs-devel] [RFC v2 proxmox-backup 16/23] fix #3174: upload stream: impl reused chunk injector
  2023-10-09 11:51 [pbs-devel] [RFC v2 pxar proxmox-backup 00/23] fix #3174: improve file-level backup Christian Ebner
                   ` (14 preceding siblings ...)
  2023-10-09 11:51 ` [pbs-devel] [RFC v2 proxmox-backup 15/23] fix #3174: archiver: store ref to previous backup Christian Ebner
@ 2023-10-09 11:51 ` Christian Ebner
  2023-10-09 11:51 ` [pbs-devel] [RFC v2 proxmox-backup 17/23] fix #3174: chunker: add forced boundaries Christian Ebner
                   ` (6 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: Christian Ebner @ 2023-10-09 11:51 UTC (permalink / raw)
  To: pbs-devel

In order to be included in the backup's index file, the reused chunks
which store the payloads of skipped files during pxar encoding have to
be inserted after the encoder has written the pxar appendix entry type.

The chunker forces a chunk boundary after this marker and queues the
list of chunks to be uploaded thereafter. This patch implements the
logic to inject those chunks into the chunk upload stream once such a
boundary is requested, by looping over the queued chunks and inserting
them into the stream.
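
A rough usage sketch (the queue and counters are created by the caller;
names follow the trait introduced below, the surrounding variables are
assumed context):

    let injection_queue = Arc::new(Mutex::new(VecDeque::new()));
    let stream = chunk_stream.inject_reused_chunks(
        injection_queue.clone(), // filled by the chunker at forced boundaries
        stream_len.clone(),      // running archive offset
        index_csum.clone(),      // checksum over the index entries
    );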

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
Changes since version 1:
no changes

 pbs-client/src/inject_reused_chunks.rs | 123 +++++++++++++++++++++++++
 pbs-client/src/lib.rs                  |   1 +
 2 files changed, 124 insertions(+)
 create mode 100644 pbs-client/src/inject_reused_chunks.rs

diff --git a/pbs-client/src/inject_reused_chunks.rs b/pbs-client/src/inject_reused_chunks.rs
new file mode 100644
index 00000000..01cb1350
--- /dev/null
+++ b/pbs-client/src/inject_reused_chunks.rs
@@ -0,0 +1,123 @@
+use std::collections::VecDeque;
+use std::pin::Pin;
+use std::sync::atomic::{AtomicUsize, Ordering};
+use std::sync::{Arc, Mutex};
+use std::task::{Context, Poll};
+
+use anyhow::Error;
+use futures::{ready, Stream};
+use pin_project_lite::pin_project;
+
+use pbs_datastore::dynamic_index::DynamicEntry;
+
+pin_project! {
+    pub struct InjectReusedChunksQueue<S> {
+        #[pin]
+        input: S,
+        current: Option<InjectChunks>,
+        injection_queue: Arc<Mutex<VecDeque<InjectChunks>>>,
+        stream_len: Arc<AtomicUsize>,
+        index_csum: Arc<Mutex<Option<openssl::sha::Sha256>>>,
+    }
+}
+
+#[derive(Debug)]
+pub struct InjectChunks {
+    pub boundary: u64,
+    pub chunks: Vec<DynamicEntry>,
+    pub size: usize,
+}
+
+pub enum InjectedChunksInfo {
+    Known(Vec<(u64, [u8; 32])>),
+    Raw((u64, bytes::BytesMut)),
+}
+
+pub trait InjectReusedChunks: Sized {
+    fn inject_reused_chunks(
+        self,
+        injection_queue: Arc<Mutex<VecDeque<InjectChunks>>>,
+        stream_len: Arc<AtomicUsize>,
+        index_csum: Arc<Mutex<Option<openssl::sha::Sha256>>>,
+    ) -> InjectReusedChunksQueue<Self>;
+}
+
+impl<S> InjectReusedChunks for S
+where
+    S: Stream<Item = Result<bytes::BytesMut, Error>>,
+{
+    fn inject_reused_chunks(
+        self,
+        injection_queue: Arc<Mutex<VecDeque<InjectChunks>>>,
+        stream_len: Arc<AtomicUsize>,
+        index_csum: Arc<Mutex<Option<openssl::sha::Sha256>>>,
+    ) -> InjectReusedChunksQueue<Self> {
+        let current = injection_queue.lock().unwrap().pop_front();
+
+        InjectReusedChunksQueue {
+            input: self,
+            current,
+            injection_queue,
+            stream_len,
+            index_csum,
+        }
+    }
+}
+
+impl<S> Stream for InjectReusedChunksQueue<S>
+where
+    S: Stream<Item = Result<bytes::BytesMut, Error>>,
+{
+    type Item = Result<InjectedChunksInfo, Error>;
+
+    fn poll_next(self: Pin<&mut Self>, cx: &mut Context) -> Poll<Option<Self::Item>> {
+        let mut this = self.project();
+        loop {
+            let current = this.current.take();
+            if let Some(current) = current {
+                let mut chunks = Vec::new();
+                let mut guard = this.index_csum.lock().unwrap();
+                let csum = guard.as_mut().unwrap();
+
+                for chunk in current.chunks {
+                    let offset = this
+                        .stream_len
+                        .fetch_add(chunk.end() as usize, Ordering::SeqCst)
+                        as u64;
+                    let digest = chunk.digest();
+                    chunks.push((offset, digest));
+                    // Chunk end is assumed to be normalized to chunk size here
+                    let end_offset = offset + chunk.end();
+                    csum.update(&end_offset.to_le_bytes());
+                    csum.update(&digest);
+                }
+                let chunk_info = InjectedChunksInfo::Known(chunks);
+                return Poll::Ready(Some(Ok(chunk_info)));
+            }
+
+            match ready!(this.input.as_mut().poll_next(cx)) {
+                None => return Poll::Ready(None),
+                Some(Err(err)) => return Poll::Ready(Some(Err(err))),
+                Some(Ok(raw)) => {
+                    let chunk_size = raw.len();
+                    let offset = this.stream_len.fetch_add(chunk_size, Ordering::SeqCst) as u64;
+                    let mut injections = this.injection_queue.lock().unwrap();
+                    if let Some(inject) = injections.pop_front() {
+                        if inject.boundary == offset {
+                            let _ = this.current.insert(inject);
+                            // Should be injected here, directly jump to next loop iteration
+                            continue;
+                        } else if inject.boundary <= offset + chunk_size as u64 {
+                            let _ = this.current.insert(inject);
+                        } else {
+                            injections.push_front(inject);
+                        }
+                    }
+                    let data = InjectedChunksInfo::Raw((offset, raw));
+
+                    return Poll::Ready(Some(Ok(data)));
+                }
+            }
+        }
+    }
+}
diff --git a/pbs-client/src/lib.rs b/pbs-client/src/lib.rs
index 21cf8556..8bf26381 100644
--- a/pbs-client/src/lib.rs
+++ b/pbs-client/src/lib.rs
@@ -8,6 +8,7 @@ pub mod pxar;
 pub mod tools;
 
 mod merge_known_chunks;
+mod inject_reused_chunks;
 pub mod pipe_to_stream;
 
 mod http_client;
-- 
2.39.2

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [pbs-devel] [RFC v2 proxmox-backup 17/23] fix #3174: chunker: add forced boundaries
  2023-10-09 11:51 [pbs-devel] [RFC v2 pxar proxmox-backup 00/23] fix #3174: improve file-level backup Christian Ebner
                   ` (15 preceding siblings ...)
  2023-10-09 11:51 ` [pbs-devel] [RFC v2 proxmox-backup 16/23] fix #3174: upload stream: impl reused chunk injector Christian Ebner
@ 2023-10-09 11:51 ` Christian Ebner
  2023-10-09 11:51 ` [pbs-devel] [RFC v2 proxmox-backup 18/23] fix #3174: backup writer: inject queued chunks in upload stream Christian Ebner
                   ` (5 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: Christian Ebner @ 2023-10-09 11:51 UTC (permalink / raw)
  To: pbs-devel

Allow forcing a boundary while chunking and injecting reused chunks
into the stream. Double-ended queues are used to coordinate the
boundaries and chunk injection between archiver, chunker and uploader.

The archiver gets an interface to request a boundary and to push the
list of chunks to inject following that boundary. The chunker reads
this queue, creates the boundary and passes the list of chunks to
inject on to the uploader via a second, dedicated double-ended queue.
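
From the archiver's perspective, requesting a boundary amounts to
queueing an InjectChunks entry (the variable names here are
hypothetical):

    let inject = InjectChunks {
        boundary: encoder_position, // stream offset where the chunker must cut
        chunks: reused_chunks,      // Vec<DynamicEntry> taken from the old index
        size: payload_size,         // summed size of the injected chunks
    };
    forced_boundaries.lock().unwrap().push_back(inject);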

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
Changes since version 1:
no changes

 examples/test_chunk_speed2.rs                 |  9 +++-
 pbs-client/src/backup_writer.rs               |  6 ++-
 pbs-client/src/chunk_stream.rs                | 41 ++++++++++++++++++-
 pbs-client/src/pxar/create.rs                 | 10 ++++-
 pbs-client/src/pxar_backup_stream.rs          |  8 +++-
 proxmox-backup-client/src/main.rs             | 36 ++++++++++++----
 .../src/proxmox_restore_daemon/api.rs         | 13 +++++-
 pxar-bin/src/main.rs                          |  5 ++-
 tests/catar.rs                                |  3 ++
 9 files changed, 114 insertions(+), 17 deletions(-)

diff --git a/examples/test_chunk_speed2.rs b/examples/test_chunk_speed2.rs
index 3f69b436..e8bac726 100644
--- a/examples/test_chunk_speed2.rs
+++ b/examples/test_chunk_speed2.rs
@@ -1,5 +1,7 @@
 use anyhow::Error;
 use futures::*;
+use std::collections::VecDeque;
+use std::sync::{Arc, Mutex};
 
 extern crate proxmox_backup;
 
@@ -26,7 +28,12 @@ async fn run() -> Result<(), Error> {
         .map_err(Error::from);
 
     //let chunk_stream = FixedChunkStream::new(stream, 4*1024*1024);
-    let mut chunk_stream = ChunkStream::new(stream, None);
+    let mut chunk_stream = ChunkStream::new(
+        stream,
+        None,
+        Arc::new(Mutex::new(VecDeque::new())),
+        Arc::new(Mutex::new(VecDeque::new())),
+    );
 
     let start_time = std::time::Instant::now();
 
diff --git a/pbs-client/src/backup_writer.rs b/pbs-client/src/backup_writer.rs
index 8a03d8ea..cc6dd49a 100644
--- a/pbs-client/src/backup_writer.rs
+++ b/pbs-client/src/backup_writer.rs
@@ -1,4 +1,4 @@
-use std::collections::HashSet;
+use std::collections::{HashSet, VecDeque};
 use std::future::Future;
 use std::os::unix::fs::OpenOptionsExt;
 use std::sync::atomic::{AtomicU64, AtomicUsize, Ordering};
@@ -23,6 +23,7 @@ use pbs_tools::crypt_config::CryptConfig;
 
 use proxmox_human_byte::HumanByte;
 
+use super::inject_reused_chunks::InjectChunks;
 use super::merge_known_chunks::{MergeKnownChunks, MergedChunkInfo};
 
 use super::{H2Client, HttpClient};
@@ -265,6 +266,7 @@ impl BackupWriter {
         archive_name: &str,
         stream: impl Stream<Item = Result<bytes::BytesMut, Error>>,
         options: UploadOptions,
+        injection_queue: Arc<Mutex<VecDeque<InjectChunks>>>,
     ) -> Result<BackupStats, Error> {
         let known_chunks = Arc::new(Mutex::new(HashSet::new()));
 
@@ -341,6 +343,7 @@ impl BackupWriter {
                 None
             },
             options.compress,
+            injection_queue,
         )
         .await?;
 
@@ -637,6 +640,7 @@ impl BackupWriter {
         known_chunks: Arc<Mutex<HashSet<[u8; 32]>>>,
         crypt_config: Option<Arc<CryptConfig>>,
         compress: bool,
+        injection_queue: Arc<Mutex<VecDeque<InjectChunks>>>,
     ) -> impl Future<Output = Result<UploadStats, Error>> {
         let total_chunks = Arc::new(AtomicUsize::new(0));
         let total_chunks2 = total_chunks.clone();
diff --git a/pbs-client/src/chunk_stream.rs b/pbs-client/src/chunk_stream.rs
index 895f6eae..1373502f 100644
--- a/pbs-client/src/chunk_stream.rs
+++ b/pbs-client/src/chunk_stream.rs
@@ -1,5 +1,7 @@
 use std::pin::Pin;
 use std::task::{Context, Poll};
+use std::sync::{Arc, Mutex};
+use std::collections::VecDeque;
 
 use anyhow::Error;
 use bytes::BytesMut;
@@ -8,21 +10,34 @@ use futures::stream::{Stream, TryStream};
 
 use pbs_datastore::Chunker;
 
+use crate::inject_reused_chunks::InjectChunks;
+
 /// Split input stream into dynamic sized chunks
 pub struct ChunkStream<S: Unpin> {
     input: S,
     chunker: Chunker,
     buffer: BytesMut,
     scan_pos: usize,
+    consumed: u64,
+    boundaries: Arc<Mutex<VecDeque<InjectChunks>>>,
+    injections: Arc<Mutex<VecDeque<InjectChunks>>>,
 }
 
 impl<S: Unpin> ChunkStream<S> {
-    pub fn new(input: S, chunk_size: Option<usize>) -> Self {
+    pub fn new(
+        input: S,
+        chunk_size: Option<usize>,
+        boundaries: Arc<Mutex<VecDeque<InjectChunks>>>,
+        injections: Arc<Mutex<VecDeque<InjectChunks>>>,
+    ) -> Self {
         Self {
             input,
             chunker: Chunker::new(chunk_size.unwrap_or(4 * 1024 * 1024)),
             buffer: BytesMut::new(),
             scan_pos: 0,
+            consumed: 0,
+            boundaries,
+            injections,
         }
     }
 }
@@ -40,6 +55,29 @@ where
     fn poll_next(self: Pin<&mut Self>, cx: &mut Context) -> Poll<Option<Self::Item>> {
         let this = self.get_mut();
         loop {
+            {
+                // Make sure to release this lock and don't hold it longer than required
+                let mut boundaries = this.boundaries.lock().unwrap();
+                if let Some(inject) = boundaries.pop_front() {
+                    let max = this.consumed + this.buffer.len() as u64;
+                    if inject.boundary <= max {
+                        let chunk_size = (inject.boundary - this.consumed) as usize;
+                        let result = this.buffer.split_to(chunk_size);
+                        this.consumed += chunk_size as u64;
+                        this.scan_pos = 0;
+
+                        // Add the size of the injected chunks to consumed, so chunk stream offsets
+                        // are in sync with the rest of the archive.
+                        this.consumed += inject.size as u64;
+
+                        this.injections.lock().unwrap().push_back(inject);
+
+                        return Poll::Ready(Some(Ok(result)));
+                    }
+                    boundaries.push_front(inject);
+                }
+            }
+
             if this.scan_pos < this.buffer.len() {
                 let boundary = this.chunker.scan(&this.buffer[this.scan_pos..]);
 
@@ -50,6 +88,7 @@ where
                     // continue poll
                 } else if chunk_size <= this.buffer.len() {
                     let result = this.buffer.split_to(chunk_size);
+                    this.consumed += chunk_size as u64;
                     this.scan_pos = 0;
                     return Poll::Ready(Some(Ok(result)));
                 } else {
diff --git a/pbs-client/src/pxar/create.rs b/pbs-client/src/pxar/create.rs
index bc6b63dd..4464a3ed 100644
--- a/pbs-client/src/pxar/create.rs
+++ b/pbs-client/src/pxar/create.rs
@@ -1,4 +1,4 @@
-use std::collections::{HashMap, HashSet};
+use std::collections::{HashMap, HashSet, VecDeque};
 use std::ffi::{CStr, CString, OsStr};
 use std::fmt;
 use std::io::{self, Read};
@@ -25,8 +25,9 @@ use proxmox_lang::c_str;
 use proxmox_sys::fs::{self, acl, xattr};
 
 use pbs_datastore::catalog::{BackupCatalogWriter, CatalogReader};
-use pbs_datastore::dynamic_index::DynamicIndexReader;
+use pbs_datastore::dynamic_index::{DynamicEntry, DynamicIndexReader};
 
+use crate::inject_reused_chunks::InjectChunks;
 use crate::pxar::metadata::errno_is_unsupported;
 use crate::pxar::tools::assert_single_path_component;
 use crate::pxar::Flags;
@@ -142,6 +143,8 @@ struct Archiver {
     hardlinks: HashMap<HardLinkInfo, (PathBuf, LinkOffset)>,
     file_copy_buffer: Vec<u8>,
     previous_ref: Option<PxarPrevRef>,
+    forced_boundaries: Arc<Mutex<VecDeque<InjectChunks>>>,
+    inject: (usize, Vec<DynamicEntry>),
 }
 
 type Encoder<'a, T> = pxar::encoder::aio::Encoder<'a, T>;
@@ -153,6 +156,7 @@ pub async fn create_archive<T, F>(
     callback: F,
     catalog: Option<Arc<Mutex<dyn BackupCatalogWriter + Send>>>,
     options: PxarCreateOptions,
+    forced_boundaries: Arc<Mutex<VecDeque<InjectChunks>>>,
 ) -> Result<(), Error>
 where
     T: SeqWrite + Send,
@@ -207,6 +211,8 @@ where
         hardlinks: HashMap::new(),
         file_copy_buffer: vec::undefined(4 * 1024 * 1024),
         previous_ref: options.previous_ref,
+        forced_boundaries,
+        inject: (0, Vec::new()),
     };
 
     archiver
diff --git a/pbs-client/src/pxar_backup_stream.rs b/pbs-client/src/pxar_backup_stream.rs
index 22a6ffdc..d18ba470 100644
--- a/pbs-client/src/pxar_backup_stream.rs
+++ b/pbs-client/src/pxar_backup_stream.rs
@@ -1,3 +1,4 @@
+use std::collections::VecDeque;
 use std::io::Write;
 //use std::os::unix::io::FromRawFd;
 use std::path::Path;
@@ -17,6 +18,8 @@ use proxmox_io::StdChannelWriter;
 
 use pbs_datastore::catalog::CatalogWriter;
 
+use crate::inject_reused_chunks::InjectChunks;
+
 /// Stream implementation to encode and upload .pxar archives.
 ///
 /// The hyper client needs an async Stream for file upload, so we
@@ -40,6 +43,7 @@ impl PxarBackupStream {
         dir: Dir,
         catalog: Arc<Mutex<CatalogWriter<W>>>,
         options: crate::pxar::PxarCreateOptions,
+        boundaries: Arc<Mutex<VecDeque<InjectChunks>>>,
     ) -> Result<Self, Error> {
         let (tx, rx) = std::sync::mpsc::sync_channel(10);
 
@@ -64,6 +68,7 @@ impl PxarBackupStream {
                 },
                 Some(catalog),
                 options,
+                boundaries,
             )
             .await
             {
@@ -87,10 +92,11 @@ impl PxarBackupStream {
         dirname: &Path,
         catalog: Arc<Mutex<CatalogWriter<W>>>,
         options: crate::pxar::PxarCreateOptions,
+        boundaries: Arc<Mutex<VecDeque<InjectChunks>>>,
     ) -> Result<Self, Error> {
         let dir = nix::dir::Dir::open(dirname, OFlag::O_DIRECTORY, Mode::empty())?;
 
-        Self::new(dir, catalog, options)
+        Self::new(dir, catalog, options, boundaries)
     }
 }
 
diff --git a/proxmox-backup-client/src/main.rs b/proxmox-backup-client/src/main.rs
index 509fa22c..5945ae5d 100644
--- a/proxmox-backup-client/src/main.rs
+++ b/proxmox-backup-client/src/main.rs
@@ -1,4 +1,4 @@
-use std::collections::HashSet;
+use std::collections::{HashSet, VecDeque};
 use std::io::{self, Read, Seek, SeekFrom, Write};
 use std::path::{Path, PathBuf};
 use std::pin::Pin;
@@ -192,8 +192,17 @@ async fn backup_directory<P: AsRef<Path>>(
     pxar_create_options: pbs_client::pxar::PxarCreateOptions,
     upload_options: UploadOptions,
 ) -> Result<BackupStats, Error> {
-    let pxar_stream = PxarBackupStream::open(dir_path.as_ref(), catalog, pxar_create_options)?;
-    let mut chunk_stream = ChunkStream::new(pxar_stream, chunk_size);
+    let boundaries = Arc::new(Mutex::new(VecDeque::new()));
+    let pxar_stream = PxarBackupStream::open(
+        dir_path.as_ref(),
+        catalog,
+        pxar_create_options,
+        boundaries.clone(),
+    )?;
+
+    let injections = Arc::new(Mutex::new(VecDeque::new()));
+    let mut chunk_stream =
+        ChunkStream::new(pxar_stream, chunk_size, boundaries, injections.clone());
 
     let (tx, rx) = mpsc::channel(10); // allow to buffer 10 chunks
 
@@ -211,7 +220,7 @@ async fn backup_directory<P: AsRef<Path>>(
     }
 
     let stats = client
-        .upload_stream(archive_name, stream, upload_options)
+        .upload_stream(archive_name, stream, upload_options, injections)
         .await?;
 
     Ok(stats)
@@ -237,8 +246,9 @@ async fn backup_image<P: AsRef<Path>>(
         bail!("cannot backup image with dynamic chunk size!");
     }
 
+    let injection_queue = Arc::new(Mutex::new(VecDeque::new()));
     let stats = client
-        .upload_stream(archive_name, stream, upload_options)
+        .upload_stream(archive_name, stream, upload_options, injection_queue)
         .await?;
 
     Ok(stats)
@@ -529,7 +539,14 @@ fn spawn_catalog_upload(
     let (catalog_tx, catalog_rx) = std::sync::mpsc::sync_channel(10); // allow to buffer 10 writes
     let catalog_stream = proxmox_async::blocking::StdChannelStream(catalog_rx);
     let catalog_chunk_size = 512 * 1024;
-    let catalog_chunk_stream = ChunkStream::new(catalog_stream, Some(catalog_chunk_size));
+    let boundaries = Arc::new(Mutex::new(VecDeque::new()));
+    let injections = Arc::new(Mutex::new(VecDeque::new()));
+    let catalog_chunk_stream = ChunkStream::new(
+        catalog_stream,
+        Some(catalog_chunk_size),
+        boundaries,
+        injections.clone(),
+    );
 
     let catalog_writer = Arc::new(Mutex::new(CatalogWriter::new(TokioWriterAdapter::new(
         StdChannelWriter::new(catalog_tx),
@@ -545,7 +562,12 @@ fn spawn_catalog_upload(
 
     tokio::spawn(async move {
         let catalog_upload_result = client
-            .upload_stream(CATALOG_NAME, catalog_chunk_stream, upload_options)
+            .upload_stream(
+                CATALOG_NAME,
+                catalog_chunk_stream,
+                upload_options,
+                injections,
+            )
             .await;
 
         if let Err(ref err) = catalog_upload_result {
diff --git a/proxmox-restore-daemon/src/proxmox_restore_daemon/api.rs b/proxmox-restore-daemon/src/proxmox_restore_daemon/api.rs
index f89b0ab4..5eff673e 100644
--- a/proxmox-restore-daemon/src/proxmox_restore_daemon/api.rs
+++ b/proxmox-restore-daemon/src/proxmox_restore_daemon/api.rs
@@ -1,8 +1,10 @@
 ///! File-restore API running inside the restore VM
+use std::collections::VecDeque;
 use std::ffi::OsStr;
 use std::fs;
 use std::os::unix::ffi::OsStrExt;
 use std::path::{Path, PathBuf};
+use std::sync::{Arc, Mutex};
 
 use anyhow::{bail, Error};
 use futures::FutureExt;
@@ -360,8 +362,15 @@ fn extract(
                     };
 
                     let pxar_writer = TokioWriter::new(writer);
-                    create_archive(dir, pxar_writer, Flags::DEFAULT, |_| Ok(()), None, options)
-                        .await
+                    create_archive(
+                        dir,
+                        pxar_writer,
+                        Flags::DEFAULT,
+                        |_| Ok(()),
+                        None,
+                        options,
+                        Arc::new(Mutex::new(VecDeque::new())),
+                    ).await
                 }
                 .await;
                 if let Err(err) = result {
diff --git a/pxar-bin/src/main.rs b/pxar-bin/src/main.rs
index 9376a2c1..c019f3e4 100644
--- a/pxar-bin/src/main.rs
+++ b/pxar-bin/src/main.rs
@@ -1,10 +1,10 @@
-use std::collections::HashSet;
+use std::collections::{HashSet, VecDeque};
 use std::ffi::OsStr;
 use std::fs::OpenOptions;
 use std::os::unix::fs::OpenOptionsExt;
 use std::path::{Path, PathBuf};
 use std::sync::atomic::{AtomicBool, Ordering};
-use std::sync::Arc;
+use std::sync::{Arc, Mutex};
 
 use anyhow::{bail, format_err, Error};
 use futures::future::FutureExt;
@@ -384,6 +384,7 @@ async fn create_archive(
         },
         None,
         options,
+        Arc::new(Mutex::new(VecDeque::new())),
     )
     .await?;
 
diff --git a/tests/catar.rs b/tests/catar.rs
index 36bb4f3b..d69cb37b 100644
--- a/tests/catar.rs
+++ b/tests/catar.rs
@@ -1,4 +1,6 @@
 use std::process::Command;
+use std::sync::{Arc, Mutex};
+use std::collections::VecDeque;
 
 use anyhow::Error;
 
@@ -40,6 +42,7 @@ fn run_test(dir_name: &str) -> Result<(), Error> {
         |_| Ok(()),
         None,
         options,
+        Arc::new(Mutex::new(VecDeque::new())),
     ))?;
 
     Command::new("cmp")
-- 
2.39.2

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [pbs-devel] [RFC v2 proxmox-backup 18/23] fix #3174: backup writer: inject queued chunks in upload stream
  2023-10-09 11:51 [pbs-devel] [RFC v2 pxar proxmox-backup 00/23] fix #3174: improve file-level backup Christian Ebner
                   ` (16 preceding siblings ...)
  2023-10-09 11:51 ` [pbs-devel] [RFC v2 proxmox-backup 17/23] fix #3174: chunker: add forced boundaries Christian Ebner
@ 2023-10-09 11:51 ` Christian Ebner
  2023-10-09 11:51 ` [pbs-devel] [RFC v2 proxmox-backup 19/23] fix #3174: archiver: reuse files with unchanged metadata Christian Ebner
                   ` (4 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: Christian Ebner @ 2023-10-09 11:51 UTC (permalink / raw)
  To: pbs-devel

Inject the queued chunks into the backup writer's upload stream,
thereby including them in the index file.
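
Condensed, the upload stream handling now distinguishes the two item
kinds introduced by the injector (the helper names below are
hypothetical):

    match chunk_info {
        // digests are already known, only index entries are emitted
        InjectedChunksInfo::Known(chunks) => forward_known(chunks),
        // regular data still runs through the chunk builder and upload
        InjectedChunksInfo::Raw((offset, data)) => build_and_upload(offset, data),
    }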

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
Changes since version 1:
no changes

 pbs-client/src/backup_writer.rs | 84 +++++++++++++++++++--------------
 1 file changed, 49 insertions(+), 35 deletions(-)

diff --git a/pbs-client/src/backup_writer.rs b/pbs-client/src/backup_writer.rs
index cc6dd49a..0f18b1ff 100644
--- a/pbs-client/src/backup_writer.rs
+++ b/pbs-client/src/backup_writer.rs
@@ -23,7 +23,7 @@ use pbs_tools::crypt_config::CryptConfig;
 
 use proxmox_human_byte::HumanByte;
 
-use super::inject_reused_chunks::InjectChunks;
+use super::inject_reused_chunks::{InjectChunks, InjectReusedChunks, InjectedChunksInfo};
 use super::merge_known_chunks::{MergeKnownChunks, MergedChunkInfo};
 
 use super::{H2Client, HttpClient};
@@ -667,48 +667,62 @@ impl BackupWriter {
         let index_csum_2 = index_csum.clone();
 
         stream
-            .and_then(move |data| {
-                let chunk_len = data.len();
+            .inject_reused_chunks(injection_queue, stream_len, index_csum.clone())
+            .and_then(move |chunk_info| {
+                match chunk_info {
+                    InjectedChunksInfo::Known(chunks) => {
+                        total_chunks.fetch_add(chunks.len(), Ordering::SeqCst);
+                        future::ok(MergedChunkInfo::Known(chunks))
+                    }
+                    InjectedChunksInfo::Raw((offset, data)) => {
+                        let chunk_len = data.len();
 
-                total_chunks.fetch_add(1, Ordering::SeqCst);
-                let offset = stream_len.fetch_add(chunk_len, Ordering::SeqCst) as u64;
+                        total_chunks.fetch_add(1, Ordering::SeqCst);
 
-                let mut chunk_builder = DataChunkBuilder::new(data.as_ref()).compress(compress);
+                        let mut chunk_builder =
+                            DataChunkBuilder::new(data.as_ref()).compress(compress);
 
-                if let Some(ref crypt_config) = crypt_config {
-                    chunk_builder = chunk_builder.crypt_config(crypt_config);
-                }
+                        if let Some(ref crypt_config) = crypt_config {
+                            chunk_builder = chunk_builder.crypt_config(crypt_config);
+                        }
 
-                let mut known_chunks = known_chunks.lock().unwrap();
-                let digest = chunk_builder.digest();
+                        let mut known_chunks = known_chunks.lock().unwrap();
 
-                let mut guard = index_csum.lock().unwrap();
-                let csum = guard.as_mut().unwrap();
+                        let digest = chunk_builder.digest();
 
-                let chunk_end = offset + chunk_len as u64;
+                        let mut guard = index_csum.lock().unwrap();
+                        let csum = guard.as_mut().unwrap();
 
-                if !is_fixed_chunk_size {
-                    csum.update(&chunk_end.to_le_bytes());
-                }
-                csum.update(digest);
+                        let chunk_end = offset + chunk_len as u64;
 
-                let chunk_is_known = known_chunks.contains(digest);
-                if chunk_is_known {
-                    known_chunk_count.fetch_add(1, Ordering::SeqCst);
-                    reused_len.fetch_add(chunk_len, Ordering::SeqCst);
-                    future::ok(MergedChunkInfo::Known(vec![(offset, *digest)]))
-                } else {
-                    let compressed_stream_len2 = compressed_stream_len.clone();
-                    known_chunks.insert(*digest);
-                    future::ready(chunk_builder.build().map(move |(chunk, digest)| {
-                        compressed_stream_len2.fetch_add(chunk.raw_size(), Ordering::SeqCst);
-                        MergedChunkInfo::New(ChunkInfo {
-                            chunk,
-                            digest,
-                            chunk_len: chunk_len as u64,
-                            offset,
-                        })
-                    }))
+                        if !is_fixed_chunk_size {
+                            csum.update(&chunk_end.to_le_bytes());
+                        }
+                        csum.update(digest);
+
+                        let chunk_is_known = known_chunks.contains(digest);
+                        if chunk_is_known {
+                            known_chunk_count.fetch_add(1, Ordering::SeqCst);
+                            reused_len.fetch_add(chunk_len, Ordering::SeqCst);
+
+                            future::ok(MergedChunkInfo::Known(vec![(offset, *digest)]))
+                        } else {
+                            let compressed_stream_len2 = compressed_stream_len.clone();
+                            known_chunks.insert(*digest);
+
+                            future::ready(chunk_builder.build().map(move |(chunk, digest)| {
+                                compressed_stream_len2
+                                    .fetch_add(chunk.raw_size(), Ordering::SeqCst);
+
+                                MergedChunkInfo::New(ChunkInfo {
+                                    chunk,
+                                    digest,
+                                    chunk_len: chunk_len as u64,
+                                    offset,
+                                })
+                            }))
+                        }
+                    }
                 }
             })
             .merge_known_chunks()
-- 
2.39.2

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [pbs-devel] [RFC v2 proxmox-backup 19/23] fix #3174: archiver: reuse files with unchanged metadata
  2023-10-09 11:51 [pbs-devel] [RFC v2 pxar proxmox-backup 00/23] fix #3174: improve file-level backup Christian Ebner
                   ` (17 preceding siblings ...)
  2023-10-09 11:51 ` [pbs-devel] [RFC v2 proxmox-backup 18/23] fix #3174: backup writer: inject queued chunks in upload stream Christian Ebner
@ 2023-10-09 11:51 ` Christian Ebner
  2023-10-09 11:51 ` [pbs-devel] [RFC v2 proxmox-backup 20/23] fix #3174: schema: add backup detection mode schema Christian Ebner
                   ` (3 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: Christian Ebner @ 2023-10-09 11:51 UTC (permalink / raw)
  To: pbs-devel

During pxar archive encoding, check regular files against the metadata
stored in the previous backup's catalog, if present.

Instead of re-encoding files with unchanged metadata whose size exceeds
a given threshold, mark the entries as appendix references in the pxar
archive and append the chunks containing the file payload to the
appendix.
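
Simplified, the reuse decision made by the archiver boils down to the
following (a sketch, not the literal patch code):

    // reuse only pays off above a size threshold, and only if ctime and
    // size still match the entry recorded in the previous catalog
    let unchanged = catalog_ctime == stat.st_ctime && catalog_size == file_size;
    if file_size > MAX_FILE_SIZE && unchanged {
        // emit an appendix reference and queue the payload chunks
    } else {
        // fall back to regularly re-encoding the file
    }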

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
Changes since version 1:
- Refactor to use new catalog file format and types
- fix issue with injected chunks not being included in the list to
  upload in case of consecutive files referencing the same chunk
- remove restriction of needing the previous backup run to be regular in
  order to perform metadata based file change detection. This can be
  done now, since the catalog stores the appendix start offset for each
  pxar archive in an easily accessible way.

 pbs-client/src/pxar/create.rs                 | 250 +++++++++++++++++-
 proxmox-backup-client/src/main.rs             |   8 +-
 .../src/proxmox_restore_daemon/api.rs         |   1 +
 pxar-bin/src/main.rs                          |   1 +
 src/tape/file_formats/snapshot_archive.rs     |   2 +-
 5 files changed, 242 insertions(+), 20 deletions(-)

diff --git a/pbs-client/src/pxar/create.rs b/pbs-client/src/pxar/create.rs
index 4464a3ed..a24c790b 100644
--- a/pbs-client/src/pxar/create.rs
+++ b/pbs-client/src/pxar/create.rs
@@ -17,14 +17,17 @@ use nix::sys::stat::{FileStat, Mode};
 
 use pathpatterns::{MatchEntry, MatchFlag, MatchList, MatchType, PatternFlag};
 use proxmox_sys::error::SysError;
-use pxar::encoder::{LinkOffset, SeqWrite};
+use pxar::encoder::{LinkOffset, SeqSink, SeqWrite};
 use pxar::Metadata;
 
 use proxmox_io::vec;
 use proxmox_lang::c_str;
 use proxmox_sys::fs::{self, acl, xattr};
 
-use pbs_datastore::catalog::{BackupCatalogWriter, CatalogReader};
+use pbs_datastore::catalog::{
+    AppendixStartOffset, BackupCatalogWriter, CatalogReader, CatalogV2Extension, DirEntry,
+    DirEntryAttribute,
+};
 use pbs_datastore::dynamic_index::{DynamicEntry, DynamicIndexReader};
 
 use crate::inject_reused_chunks::InjectChunks;
@@ -32,6 +35,8 @@ use crate::pxar::metadata::errno_is_unsupported;
 use crate::pxar::tools::assert_single_path_component;
 use crate::pxar::Flags;
 
+/// Size threshold in bytes: only regular files larger than this are
+/// considered for reuse via the appendix section.
+const MAX_FILE_SIZE: u64 = 128;
+
 /// Pxar options for creating a pxar archive/stream
 #[derive(Default)]
 pub struct PxarCreateOptions {
@@ -45,6 +50,8 @@ pub struct PxarCreateOptions {
     pub skip_lost_and_found: bool,
     /// Reference state for partial backups
     pub previous_ref: Option<PxarPrevRef>,
+    /// Catalog archive name
+    pub archive_name: Option<CString>,
 }
 
 /// Contains stateful information about previous backup snapshots for partial backups
@@ -144,7 +151,8 @@ struct Archiver {
     file_copy_buffer: Vec<u8>,
     previous_ref: Option<PxarPrevRef>,
     forced_boundaries: Arc<Mutex<VecDeque<InjectChunks>>>,
-    inject: (usize, Vec<DynamicEntry>),
+    inject: (pxar::encoder::AppendixRefOffset, Vec<DynamicEntry>),
+    prev_appendix: Option<AppendixStartOffset>,
 }
 
 type Encoder<'a, T> = pxar::encoder::aio::Encoder<'a, T>;
@@ -155,7 +163,7 @@ pub async fn create_archive<T, F>(
     feature_flags: Flags,
     callback: F,
     catalog: Option<Arc<Mutex<dyn BackupCatalogWriter + Send>>>,
-    options: PxarCreateOptions,
+    mut options: PxarCreateOptions,
     forced_boundaries: Arc<Mutex<VecDeque<InjectChunks>>>,
 ) -> Result<(), Error>
 where
@@ -196,6 +204,22 @@ where
         )?);
     }
 
+    let (appendix_start, prev_cat_parent) = if let Some(ref mut prev_ref) = options.previous_ref {
+        let entry = prev_ref
+            .catalog
+            .lookup_recursive(prev_ref.archive_name.as_bytes())?;
+        let parent = match entry.attr {
+            DirEntryAttribute::Archive { .. } => Some(entry),
+            _ => None,
+        };
+        let appendix_start = prev_ref
+            .catalog
+            .appendix_offset(prev_ref.archive_name.as_bytes())?;
+        (appendix_start, parent)
+    } else {
+        (None, None)
+    };
+
     let mut archiver = Archiver {
         feature_flags,
         fs_feature_flags,
@@ -212,13 +236,42 @@ where
         file_copy_buffer: vec::undefined(4 * 1024 * 1024),
         previous_ref: options.previous_ref,
         forced_boundaries,
-        inject: (0, Vec::new()),
+        inject: (pxar::encoder::AppendixRefOffset::default(), Vec::new()),
+        prev_appendix: appendix_start,
     };
 
+    if let Some(ref mut catalog) = archiver.catalog {
+        if let Some(ref archive_name) = options.archive_name {
+            catalog
+                .lock()
+                .unwrap()
+                .start_archive(archive_name.as_c_str())?;
+        }
+    }
+
     archiver
-        .archive_dir_contents(&mut encoder, source_dir, true)
+        .archive_dir_contents(&mut encoder, source_dir, prev_cat_parent.as_ref(), true)
         .await?;
-    encoder.finish().await?;
+
+    if archiver.inject.1.len() > 0 {
+        let (appendix_offset, appendix_size) = archiver.add_appendix(&mut encoder).await?;
+        encoder
+            .finish(Some((appendix_offset, appendix_size)))
+            .await?;
+        if let Some(catalog) = archiver.catalog {
+            if options.archive_name.is_some() {
+                catalog.lock().unwrap().end_archive(Some(appendix_offset))?;
+            }
+        }
+    } else {
+        encoder.finish(None).await?;
+        if let Some(ref mut catalog) = archiver.catalog {
+            if options.archive_name.is_some() {
+                catalog.lock().unwrap().end_archive(None)?;
+            }
+        }
+    }
+
     Ok(())
 }
 
@@ -247,6 +300,7 @@ impl Archiver {
         &'a mut self,
         encoder: &'a mut Encoder<'b, T>,
         mut dir: Dir,
+        prev_cat_parent: Option<&'a DirEntry>,
         is_root: bool,
     ) -> BoxFuture<'a, Result<(), Error>> {
         async move {
@@ -269,6 +323,13 @@ impl Archiver {
 
             let old_path = std::mem::take(&mut self.path);
 
+            let mut prev_catalog_dir_entries = Vec::new();
+            if let Some(ref mut prev_ref) = self.previous_ref {
+                if let Some(ref parent) = prev_cat_parent {
+                    prev_catalog_dir_entries = prev_ref.catalog.read_dir(parent)?;
+                }
+            }
+
             for file_entry in file_list {
                 let file_name = file_entry.name.to_bytes();
 
@@ -280,9 +341,15 @@ impl Archiver {
 
                 (self.callback)(&file_entry.path)?;
                 self.path = file_entry.path;
-                self.add_entry(encoder, dir_fd, &file_entry.name, &file_entry.stat)
-                    .await
-                    .map_err(|err| self.wrap_err(err))?;
+                self.add_entry(
+                    encoder,
+                    dir_fd,
+                    &file_entry.name,
+                    &file_entry.stat,
+                    &prev_catalog_dir_entries,
+                )
+                .await
+                .map_err(|err| self.wrap_err(err))?;
             }
             self.path = old_path;
             self.entry_counter = entry_counter;
@@ -529,12 +596,146 @@ impl Archiver {
         Ok(())
     }
 
+    async fn add_appendix<T: SeqWrite + Send>(
+        &mut self,
+        encoder: &mut Encoder<'_, T>,
+    ) -> Result<(pxar::encoder::AppendixStartOffset, u64), Error> {
+        let total = self
+            .inject
+            .1
+            .iter()
+            .fold(0, |sum, inject| sum + inject.end());
+        let appendix_offset = encoder.add_appendix(total).await?;
+        let mut boundaries = self.forced_boundaries.lock().unwrap();
+        let mut position = unsafe { encoder.position_add(0) };
+
+        // Inject reused chunks in batches of 128 to not exceed the upload post request size limit
+        for injects in self.inject.1.chunks(128) {
+            let size = injects
+                .iter()
+                .fold(0, |sum, inject| sum + inject.end() as usize);
+            let inject_chunks = InjectChunks {
+                boundary: position,
+                chunks: injects.to_vec(),
+                size,
+            };
+            boundaries.push_back(inject_chunks);
+            position = unsafe { encoder.position_add(size as u64) };
+        }
+
+        Ok((appendix_offset, total))
+    }
+
+    async fn reuse_if_metadata_unchanged<T: SeqWrite + Send>(
+        &mut self,
+        encoder: &mut Encoder<'_, T>,
+        c_file_name: &CStr,
+        metadata: &Metadata,
+        stat: &FileStat,
+        catalog_entries: &[DirEntry],
+    ) -> Result<bool, Error> {
+        let prev_ref = match self.previous_ref {
+            None => return Ok(false),
+            Some(ref mut prev_ref) => prev_ref,
+        };
+
+        let catalog_entry = catalog_entries
+            .iter()
+            .find(|entry| entry.name == c_file_name.to_bytes());
+
+        let (size, ctime, start_offset) = match catalog_entry {
+            Some(&DirEntry {
+                attr:
+                    DirEntryAttribute::File {
+                        size,
+                        mtime: _,
+                        extension: Some(CatalogV2Extension { ctime, file_offset }),
+                    },
+                ..
+            }) => (size, ctime, file_offset.raw()),
+            Some(&DirEntry {
+                attr:
+                    DirEntryAttribute::AppendixRef {
+                        size,
+                        mtime: _,
+                        ctime,
+                        appendix_ref_offset,
+                    },
+                ..
+            }) => (
+                size,
+                ctime,
+                self.prev_appendix.unwrap().raw() + appendix_ref_offset.raw(),
+            ),
+            // The entry type found in the catalog is not a regular file,
+            // do not reuse entry but rather encode it again.
+            _ => return Ok(false),
+        };
+
+        let file_size = stat.st_size as u64;
+        if ctime == stat.st_ctime && size == file_size {
+            // Since ctime is unchanged, use the current metadata to calculate the encoded
+            // size and thereby the end offset of the file payload in the reference archive.
+            // This is required to find the existing indexes to be included in the new index file.
+            let mut bytes = pxar::encoder::Encoder::<SeqSink>::byte_len(c_file_name, metadata)?;
+            // payload header size
+            bytes += std::mem::size_of::<pxar::format::Header>() as u64;
+
+            let end_offset = start_offset + bytes + file_size;
+            let (indices, padding_start) = prev_ref.index.indices(start_offset, end_offset)?;
+
+            let mut appendix_ref_offset = self.inject.0.add(padding_start);
+            let mut chunks = indices.into_iter();
+
+            if let Some(last) = self.inject.1.last() {
+                if let Some(next) = chunks.next() {
+                    // Check if the first chunk to be appended is equal to the last one.
+                    // If so, do not append again but rather adjust the reference offset.
+                    // Otherwise, append it without recalculating the reference offset.
+                    // This avoids concatenating the same chunk multiple times when
+                    // multiple consecutive unchanged files reference it.
+                    if next.digest() == last.digest() {
+                        appendix_ref_offset = appendix_ref_offset.sub(last.end());
+                    } else {
+                        self.inject.0 = self.inject.0.add(next.end());
+                        self.inject.1.push(next);
+                    }
+                }
+            }
+
+            // Append all remaining chunks
+            for chunk in chunks {
+                self.inject.0 = self.inject.0.add(chunk.end());
+                self.inject.1.push(chunk);
+            }
+
+            let file_name: &Path = OsStr::from_bytes(c_file_name.to_bytes()).as_ref();
+            self.add_appendix_ref(encoder, file_name, appendix_ref_offset, file_size)
+                .await?;
+
+            if let Some(ref catalog) = self.catalog {
+                catalog.lock().unwrap().add_appendix_ref(
+                    c_file_name,
+                    file_size,
+                    stat.st_mtime,
+                    stat.st_ctime,
+                    appendix_ref_offset,
+                )?;
+            }
+
+            return Ok(true);
+        }
+
+        Ok(false)
+    }
+
     async fn add_entry<T: SeqWrite + Send>(
         &mut self,
         encoder: &mut Encoder<'_, T>,
         parent: RawFd,
         c_file_name: &CStr,
         stat: &FileStat,
+        prev_cat_entries: &[DirEntry],
     ) -> Result<(), Error> {
         use pxar::format::mode;
 
@@ -595,6 +796,20 @@ impl Archiver {
                 }
 
                 let file_size = stat.st_size as u64;
+                if file_size > MAX_FILE_SIZE
+                    && self
+                        .reuse_if_metadata_unchanged(
+                            encoder,
+                            c_file_name,
+                            &metadata,
+                            stat,
+                            prev_cat_entries,
+                        )
+                        .await?
+                {
+                    return Ok(());
+                }
+
                 let offset: LinkOffset = self
                     .add_regular_file(encoder, fd, file_name, &metadata, file_size)
                     .await?;
@@ -622,8 +837,15 @@ impl Archiver {
                 if let Some(ref catalog) = self.catalog {
                     catalog.lock().unwrap().start_directory(c_file_name)?;
                 }
+                let prev_cat_entry = prev_cat_entries.iter().find(|entry| match entry {
+                    DirEntry {
+                        name,
+                        attr: DirEntryAttribute::Directory { .. },
+                    } => name == c_file_name.to_bytes(),
+                    _ => false,
+                });
                 let result = self
-                    .add_directory(encoder, dir, c_file_name, &metadata, stat)
+                    .add_directory(encoder, dir, c_file_name, &metadata, stat, prev_cat_entry)
                     .await;
                 if let Some(ref catalog) = self.catalog {
                     catalog.lock().unwrap().end_directory()?;
@@ -680,6 +902,7 @@ impl Archiver {
         dir_name: &CStr,
         metadata: &Metadata,
         stat: &FileStat,
+        prev_cat_parent: Option<&DirEntry>,
     ) -> Result<(), Error> {
         let dir_name = OsStr::from_bytes(dir_name.to_bytes());
 
@@ -706,14 +929,15 @@ impl Archiver {
             log::info!("skipping mount point: {:?}", self.path);
             Ok(())
         } else {
-            self.archive_dir_contents(&mut encoder, dir, false).await
+            self.archive_dir_contents(&mut encoder, dir, prev_cat_parent, false)
+                .await
         };
 
         self.fs_magic = old_fs_magic;
         self.fs_feature_flags = old_fs_feature_flags;
         self.current_st_dev = old_st_dev;
 
-        encoder.finish().await?;
+        encoder.finish(None).await?;
         result
     }
 
diff --git a/proxmox-backup-client/src/main.rs b/proxmox-backup-client/src/main.rs
index 5945ae5d..cbdd9f43 100644
--- a/proxmox-backup-client/src/main.rs
+++ b/proxmox-backup-client/src/main.rs
@@ -48,7 +48,7 @@ use pbs_client::{
     FixedChunkStream, HttpClient, PxarBackupStream, RemoteChunkReader, UploadOptions,
     BACKUP_SOURCE_SCHEMA,
 };
-use pbs_datastore::catalog::{BackupCatalogWriter, CatalogReader, CatalogWriter};
+use pbs_datastore::catalog::{CatalogReader, CatalogWriter};
 use pbs_datastore::chunk_store::verify_chunk_size;
 use pbs_datastore::dynamic_index::{BufferedDynamicReader, DynamicIndexReader};
 use pbs_datastore::fixed_index::FixedIndexReader;
@@ -1005,10 +1005,6 @@ async fn create_backup(
                 let catalog = catalog.as_ref().unwrap();
 
                 log_file("directory", &filename, &target);
-                catalog
-                    .lock()
-                    .unwrap()
-                    .start_directory(std::ffi::CString::new(target.as_str())?.as_c_str())?;
 
                 let pxar_options = pbs_client::pxar::PxarCreateOptions {
                     device_set: devices.clone(),
@@ -1016,6 +1012,7 @@ async fn create_backup(
                     entries_max: entries_max as usize,
                     skip_lost_and_found,
                     previous_ref: None,
+                    archive_name: Some(std::ffi::CString::new(target.as_str())?),
                 };
 
                 let upload_options = UploadOptions {
@@ -1036,7 +1033,6 @@ async fn create_backup(
                 )
                 .await?;
                 manifest.add_file(target, stats.size, stats.csum, crypto.mode)?;
-                catalog.lock().unwrap().end_directory()?;
             }
             (BackupSpecificationType::IMAGE, false) => {
                 log_file("image", &filename, &target);
diff --git a/proxmox-restore-daemon/src/proxmox_restore_daemon/api.rs b/proxmox-restore-daemon/src/proxmox_restore_daemon/api.rs
index 5eff673e..734fa976 100644
--- a/proxmox-restore-daemon/src/proxmox_restore_daemon/api.rs
+++ b/proxmox-restore-daemon/src/proxmox_restore_daemon/api.rs
@@ -359,6 +359,7 @@ fn extract(
                         patterns,
                         skip_lost_and_found: false,
                         previous_ref: None,
+                        archive_name: None,
                     };
 
                     let pxar_writer = TokioWriter::new(writer);
diff --git a/pxar-bin/src/main.rs b/pxar-bin/src/main.rs
index c019f3e4..6d365115 100644
--- a/pxar-bin/src/main.rs
+++ b/pxar-bin/src/main.rs
@@ -371,6 +371,7 @@ async fn create_archive(
         patterns,
         skip_lost_and_found: false,
         previous_ref: None,
+        archive_name: None,
     };
 
     let writer = pxar::encoder::sync::StandardWriter::new(writer);
diff --git a/src/tape/file_formats/snapshot_archive.rs b/src/tape/file_formats/snapshot_archive.rs
index 252384b5..4bbf4727 100644
--- a/src/tape/file_formats/snapshot_archive.rs
+++ b/src/tape/file_formats/snapshot_archive.rs
@@ -88,7 +88,7 @@ pub fn tape_write_snapshot_archive<'a>(
                 proxmox_lang::io_bail!("file '{}' shrunk while reading", filename);
             }
         }
-        encoder.finish()?;
+        encoder.finish(None)?;
         Ok(())
     });
 
-- 
2.39.2

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [pbs-devel] [RFC v2 proxmox-backup 20/23] fix #3174: schema: add backup detection mode schema
  2023-10-09 11:51 [pbs-devel] [RFC v2 pxar proxmox-backup 00/23] fix #3174: improve file-level backup Christian Ebner
                   ` (18 preceding siblings ...)
  2023-10-09 11:51 ` [pbs-devel] [RFC v2 proxmox-backup 19/23] fix #3174: archiver: reuse files with unchanged metadata Christian Ebner
@ 2023-10-09 11:51 ` Christian Ebner
  2023-10-09 11:51 ` [pbs-devel] [RFC v2 proxmox-backup 21/23] fix #3174: client: Add detection mode to backup creation Christian Ebner
                   ` (2 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: Christian Ebner @ 2023-10-09 11:51 UTC (permalink / raw)
  To: pbs-devel

Adds the schema for switching the detection mode used to identify
regular files which changed since a reference backup run.
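
As a usage sketch (not part of the patch), the new type round-trips
between its schema string and enum form via the as_str and FromStr
implementations added below:

    use std::str::FromStr;
    use pbs_api_types::BackupDetectionMode;

    fn main() {
        // "metadata" parses to the corresponding enum variant ...
        let mode = BackupDetectionMode::from_str("metadata").unwrap();
        assert_eq!(mode, BackupDetectionMode::Metadata);
        // ... and as_str() maps it back to the schema string.
        assert_eq!(mode.as_str(), "metadata");
        // values outside the schema are rejected with an error
        assert!(BackupDetectionMode::from_str("incremental").is_err());
    }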

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
Changes since version 1:
not present in version 1

 pbs-api-types/src/datastore.rs | 41 ++++++++++++++++++++++++++++++++++
 1 file changed, 41 insertions(+)

diff --git a/pbs-api-types/src/datastore.rs b/pbs-api-types/src/datastore.rs
index 73c4890e..723bb43d 100644
--- a/pbs-api-types/src/datastore.rs
+++ b/pbs-api-types/src/datastore.rs
@@ -65,6 +65,14 @@ pub const BACKUP_TYPE_SCHEMA: Schema = StringSchema::new("Backup type.")
     ]))
     .schema();
 
+pub const BACKUP_DETECTION_MODE_SCHEMA: Schema =
+    StringSchema::new("Backup file change detection mode.")
+        .format(&ApiStringFormat::Enum(&[
+            EnumEntry::new("data", "Reading full data"),
+            EnumEntry::new("metadata", "Check file metadata"),
+        ]))
+        .schema();
+
 pub const BACKUP_TIME_SCHEMA: Schema = IntegerSchema::new("Backup time (Unix epoch.)")
     .minimum(1)
     .schema();
@@ -1048,6 +1056,39 @@ impl std::str::FromStr for BackupPart {
     }
 }
 
+#[api]
+/// Backup file detection mode
+#[derive(Clone, Copy, Debug, Eq, Hash, PartialEq, Deserialize, Serialize)]
+#[serde(rename_all = "lowercase")]
+pub enum BackupDetectionMode {
+    /// Data.
+    Data,
+    /// Metadata.
+    Metadata,
+}
+
+impl BackupDetectionMode {
+    pub const fn as_str(&self) -> &'static str {
+        match self {
+            BackupDetectionMode::Data => "data",
+            BackupDetectionMode::Metadata => "metadata",
+        }
+    }
+}
+
+impl std::str::FromStr for BackupDetectionMode {
+    type Err = Error;
+
+    /// Parse backup file detection mode.
+    fn from_str(ty: &str) -> Result<Self, Error> {
+        Ok(match ty {
+            "data" => BackupDetectionMode::Data,
+            "metadata" => BackupDetectionMode::Metadata,
+            _ => bail!("invalid backup detection mode {ty:?}"),
+        })
+    }
+}
+
 #[api(
     properties: {
         "backup": { type: BackupDir },
-- 
2.39.2

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [pbs-devel] [RFC v2 proxmox-backup 21/23] fix #3174: client: Add detection mode to backup creation
  2023-10-09 11:51 [pbs-devel] [RFC v2 pxar proxmox-backup 00/23] fix #3174: improve file-level backup Christian Ebner
                   ` (19 preceding siblings ...)
  2023-10-09 11:51 ` [pbs-devel] [RFC v2 proxmox-backup 20/23] fix #3174: schema: add backup detection mode schema Christian Ebner
@ 2023-10-09 11:51 ` Christian Ebner
  2023-10-09 11:51 ` [pbs-devel] [RFC v2 proxmox-backup 22/23] test-suite: add detection mode change benchmark Christian Ebner
  2023-10-09 11:51 ` [pbs-devel] [RFC v2 proxmox-backup 23/23] test-suite: Add bin to deb, add shell completions Christian Ebner
  22 siblings, 0 replies; 24+ messages in thread
From: Christian Ebner @ 2023-10-09 11:51 UTC (permalink / raw)
  To: pbs-devel

Introduces the `change-detection-mode` parameter to change file
encoding behavior.

When set to `metadata`, the catalog for the previous backup run and
the corresponding index file are fetched from the server and used as
reference during pxar archive creation.
This allows the archiver to skip encoding the payloads of unchanged
regular files and to instead reference their existing chunks for
inclusion in the new backup's index file, creating a pxar archive with
an appendix section that contains the payloads as a concatenation of
chunks.
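
As a usage sketch (repository and backup source spec are placeholders),
a backup using metadata based file change detection could then be
invoked as:

    proxmox-backup-client backup root.pxar:/ \
        --repository <host>:<datastore> \
        --change-detection-mode=metadata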

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
Changes since version 1:
- Replace `incremental` flag with `change-detection-mode` param

 proxmox-backup-client/src/main.rs | 118 ++++++++++++++++++++++++++++--
 1 file changed, 110 insertions(+), 8 deletions(-)

diff --git a/proxmox-backup-client/src/main.rs b/proxmox-backup-client/src/main.rs
index cbdd9f43..478049f9 100644
--- a/proxmox-backup-client/src/main.rs
+++ b/proxmox-backup-client/src/main.rs
@@ -1,5 +1,6 @@
 use std::collections::{HashSet, VecDeque};
 use std::io::{self, Read, Seek, SeekFrom, Write};
+use std::os::unix::fs::OpenOptionsExt;
 use std::path::{Path, PathBuf};
 use std::pin::Pin;
 use std::sync::{Arc, Mutex};
@@ -24,10 +25,11 @@ use proxmox_time::{epoch_i64, strftime_local};
 use pxar::accessor::{MaybeReady, ReadAt, ReadAtOperation};
 
 use pbs_api_types::{
-    Authid, BackupDir, BackupGroup, BackupNamespace, BackupPart, BackupType, CryptMode,
-    Fingerprint, GroupListItem, PruneJobOptions, PruneListItem, RateLimitConfig, SnapshotListItem,
-    StorageStatus, BACKUP_ID_SCHEMA, BACKUP_NAMESPACE_SCHEMA, BACKUP_TIME_SCHEMA,
-    BACKUP_TYPE_SCHEMA, TRAFFIC_CONTROL_BURST_SCHEMA, TRAFFIC_CONTROL_RATE_SCHEMA,
+    Authid, BackupDetectionMode, BackupDir, BackupGroup, BackupNamespace, BackupPart, BackupType,
+    CryptMode, Fingerprint, GroupListItem, PruneJobOptions, PruneListItem, RateLimitConfig,
+    SnapshotListItem, StorageStatus, BACKUP_DETECTION_MODE_SCHEMA, BACKUP_ID_SCHEMA,
+    BACKUP_NAMESPACE_SCHEMA, BACKUP_TIME_SCHEMA, BACKUP_TYPE_SCHEMA, TRAFFIC_CONTROL_BURST_SCHEMA,
+    TRAFFIC_CONTROL_RATE_SCHEMA,
 };
 use pbs_client::catalog_shell::Shell;
 use pbs_client::pxar::ErrorHandler as PxarErrorHandler;
@@ -666,6 +668,10 @@ fn spawn_catalog_upload(
                schema: TRAFFIC_CONTROL_BURST_SCHEMA,
                optional: true,
            },
+           "change-detection-mode": {
+               schema: BACKUP_DETECTION_MODE_SCHEMA,
+               optional: true,
+           },
            "exclude": {
                type: Array,
                description: "List of paths or patterns for matching files to exclude.",
@@ -849,7 +855,22 @@ async fn create_backup(
 
     let backup_time = backup_time_opt.unwrap_or_else(epoch_i64);
 
-    let client = connect_rate_limited(&repo, rate_limit)?;
+    let detection_mode: BackupDetectionMode = param["change-detection-mode"]
+        .as_str()
+        .unwrap_or("data")
+        .parse()?;
+
+    let client = connect_rate_limited(&repo, rate_limit.clone())?;
+    let backup_group = BackupGroup::new(backup_type, backup_id);
+
+    let previous_snapshot = if detection_mode == BackupDetectionMode::Metadata {
+        let snapshot =
+            api_datastore_latest_snapshot(&client, &repo.store(), &backup_ns, backup_group).await?;
+        Some(snapshot)
+    } else {
+        None
+    };
+
     record_repository(&repo);
 
     let snapshot = BackupDir::from((backup_type, backup_id.to_owned(), backup_time));
@@ -959,8 +980,8 @@ async fn create_backup(
         log::info!("{} {} '{}' to '{}' as {}", what, desc, file, repo, target);
     };
 
-    for (backup_type, filename, target, size) in upload_list {
-        match (backup_type, dry_run) {
+    for (backup_spec_type, filename, target, size) in upload_list {
+        match (backup_spec_type, dry_run) {
             // dry-run
             (BackupSpecificationType::CONFIG, true) => log_file("config file", &filename, &target),
             (BackupSpecificationType::LOGFILE, true) => log_file("log file", &filename, &target),
@@ -1006,12 +1027,44 @@ async fn create_backup(
 
                 log_file("directory", &filename, &target);
 
+                let known_chunks = Arc::new(Mutex::new(HashSet::new()));
+                let previous_ref = if detection_mode == BackupDetectionMode::Metadata {
+                    match previous_manifest {
+                        None => None,
+                        Some(ref manifest) => {
+                            let reference_index = client
+                                .download_previous_dynamic_index(
+                                    &target,
+                                    &manifest,
+                                    known_chunks.clone(),
+                                )
+                                .await?;
+
+                            let reference_catalog = download_reference_catalog(
+                                &repo,
+                                previous_snapshot.as_ref().unwrap(),
+                                &backup_ns,
+                                crypt_config.clone(),
+                            )
+                            .await?;
+
+                            Some(pbs_client::pxar::PxarPrevRef {
+                                index: reference_index,
+                                catalog: reference_catalog,
+                                archive_name: target.clone(),
+                            })
+                        }
+                    }
+                } else {
+                    None
+                };
+
                 let pxar_options = pbs_client::pxar::PxarCreateOptions {
                     device_set: devices.clone(),
                     patterns: pattern_list.clone(),
                     entries_max: entries_max as usize,
                     skip_lost_and_found,
-                    previous_ref: None,
+                    previous_ref,
                     archive_name: Some(std::ffi::CString::new(target.as_str())?),
                 };
 
@@ -1112,6 +1165,55 @@ async fn create_backup(
     Ok(Value::Null)
 }
 
+async fn download_reference_catalog(
+    repo: &BackupRepository,
+    previous_snapshot: &BackupDir,
+    backup_ns: &BackupNamespace,
+    crypt_config: Option<Arc<CryptConfig>>,
+) -> Result<CatalogReader<std::fs::File>, Error> {
+    let http_reader_client = connect(&repo)?;
+    let backup_reader = BackupReader::start(
+        http_reader_client,
+        crypt_config.clone(),
+        repo.store(),
+        &backup_ns,
+        &previous_snapshot,
+        true,
+    )
+    .await?;
+
+    let (manifest, _) = backup_reader.download_manifest().await?;
+    manifest.check_fingerprint(crypt_config.as_ref().map(Arc::as_ref))?;
+
+    let index = backup_reader
+        .download_dynamic_index(&manifest, CATALOG_NAME)
+        .await?;
+    let most_used = index.find_most_used_chunks(8);
+    let file_info = manifest.lookup_file_info(CATALOG_NAME)?;
+
+    let chunk_reader = RemoteChunkReader::new(
+        backup_reader,
+        crypt_config.clone(),
+        file_info.chunk_crypt_mode(),
+        most_used,
+    );
+
+    let mut reader = BufferedDynamicReader::new(index, chunk_reader);
+
+    let mut catalogfile = std::fs::OpenOptions::new()
+        .write(true)
+        .read(true)
+        .custom_flags(libc::O_TMPFILE)
+        .open("/tmp")?;
+
+    std::io::copy(&mut reader, &mut catalogfile)
+        .map_err(|err| format_err!("failed to download reference catalog - {}", err))?;
+
+    catalogfile.seek(SeekFrom::Start(0))?;
+
+    Ok(CatalogReader::new(catalogfile))
+}
+
 async fn dump_image<W: Write>(
     client: Arc<BackupReader>,
     crypt_config: Option<Arc<CryptConfig>>,
-- 
2.39.2

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [pbs-devel] [RFC v2 proxmox-backup 22/23] test-suite: add detection mode change benchmark
  2023-10-09 11:51 [pbs-devel] [RFC v2 pxar proxmox-backup 00/23] fix #3174: improve file-level backup Christian Ebner
                   ` (20 preceding siblings ...)
  2023-10-09 11:51 ` [pbs-devel] [RFC v2 proxmox-backup 21/23] fix #3174: client: Add detection mode to backup creation Christian Ebner
@ 2023-10-09 11:51 ` Christian Ebner
  2023-10-09 11:51 ` [pbs-devel] [RFC v2 proxmox-backup 23/23] test-suite: Add bin to deb, add shell completions Christian Ebner
  22 siblings, 0 replies; 24+ messages in thread
From: Christian Ebner @ 2023-10-09 11:51 UTC (permalink / raw)
  To: pbs-devel

Introduces the proxmox-backup-test-suite crate, intended for
benchmarking and high-level, user-facing testing.

The initial code includes a benchmark intended for regression testing
of the proxmox-backup-client when using different file change detection
modes during backup.
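
As a usage sketch (paths and repository are placeholders), preparing
the test data and running the benchmark could look like:

    proxmox-backup-test-suite detection-mode-bench prepare \
        --target /path/to/testdata
    proxmox-backup-test-suite detection-mode-bench run \
        linux.pxar:/path/to/testdata/linux \
        --repository <host>:<datastore> \
        --number-of-runs 5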

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
Changes since version 1:
not present in version 1

 Cargo.toml                                    |   1 +
 proxmox-backup-test-suite/Cargo.toml          |  18 ++
 .../src/detection_mode_bench.rs               | 288 ++++++++++++++++++
 proxmox-backup-test-suite/src/main.rs         |  17 ++
 4 files changed, 324 insertions(+)
 create mode 100644 proxmox-backup-test-suite/Cargo.toml
 create mode 100644 proxmox-backup-test-suite/src/detection_mode_bench.rs
 create mode 100644 proxmox-backup-test-suite/src/main.rs

diff --git a/Cargo.toml b/Cargo.toml
index cfbf2ba1..92ffd1d5 100644
--- a/Cargo.toml
+++ b/Cargo.toml
@@ -46,6 +46,7 @@ members = [
     "proxmox-rrd",
 
     "pxar-bin",
+    "proxmox-backup-test-suite",
 ]
 
 [lib]
diff --git a/proxmox-backup-test-suite/Cargo.toml b/proxmox-backup-test-suite/Cargo.toml
new file mode 100644
index 00000000..3f899e9b
--- /dev/null
+++ b/proxmox-backup-test-suite/Cargo.toml
@@ -0,0 +1,18 @@
+[package]
+name = "proxmox-backup-test-suite"
+version = "0.1.0"
+authors.workspace = true
+edition.workspace = true
+
+[dependencies]
+anyhow.workspace = true
+futures.workspace = true
+serde.workspace = true
+serde_json.workspace = true
+
+pbs-client.workspace = true
+pbs-key-config.workspace = true
+pbs-tools.workspace = true
+proxmox-async.workspace = true
+proxmox-router = { workspace = true, features = ["cli"] }
+proxmox-schema = { workspace = true, features = [ "api-macro" ] }
diff --git a/proxmox-backup-test-suite/src/detection_mode_bench.rs b/proxmox-backup-test-suite/src/detection_mode_bench.rs
new file mode 100644
index 00000000..e219dc84
--- /dev/null
+++ b/proxmox-backup-test-suite/src/detection_mode_bench.rs
@@ -0,0 +1,288 @@
+use std::path::Path;
+use std::process::Command;
+
+use anyhow::{bail, format_err, Error};
+use serde_json::Value;
+
+use pbs_client::{
+    tools::{complete_repository, key_source::KEYFILE_SCHEMA, REPO_URL_SCHEMA},
+    BACKUP_SOURCE_SCHEMA,
+};
+use pbs_tools::json;
+use proxmox_router::cli::*;
+use proxmox_schema::api;
+
+const DEFAULT_NUMBER_OF_RUNS: u64 = 5;
+// Homepage https://cocodataset.org/
+const COCO_DATASET_SRC_URL: &'static str = "http://images.cocodataset.org/zips/unlabeled2017.zip";
+// Homepage https://kernel.org/
+const LINUX_GIT_REPOSITORY: &'static str =
+    "git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git";
+const LINUX_GIT_TAG: &'static str = "v6.5.5";
+
+pub(crate) fn detection_mode_bench_mgmt_cli() -> CliCommandMap {
+    let run_cmd_def = CliCommand::new(&API_METHOD_DETECTION_MODE_BENCH_RUN)
+        .arg_param(&["backupspec"])
+        .completion_cb("repository", complete_repository)
+        .completion_cb("keyfile", complete_file_name);
+
+    let prepare_cmd_def = CliCommand::new(&API_METHOD_DETECTION_MODE_BENCH_PREPARE);
+    CliCommandMap::new()
+        .insert("prepare", prepare_cmd_def)
+        .insert("run", run_cmd_def)
+}
+
+#[api(
+   input: {
+       properties: {
+           backupspec: {
+               type: Array,
+               description: "List of backup source specifications ([<label.ext>:<path>] ...)",
+               items: {
+                   schema: BACKUP_SOURCE_SCHEMA,
+               }
+           },
+           repository: {
+               schema: REPO_URL_SCHEMA,
+               optional: true,
+           },
+           keyfile: {
+               schema: KEYFILE_SCHEMA,
+               optional: true,
+           },
+           "number-of-runs": {
+               description: "Number of times to repeat the run",
+               type: Integer,
+               optional: true,
+           },
+       }
+   }
+)]
+/// Run benchmark to compare performance for backups using different change detection modes.
+fn detection_mode_bench_run(param: Value) -> Result<(), Error> {
+    let mut pbc = Command::new("proxmox-backup-client");
+    pbc.arg("backup");
+
+    let backupspec_list = json::required_array_param(&param, "backupspec")?;
+    for backupspec in backupspec_list {
+        let arg = backupspec
+            .as_str()
+            .ok_or_else(|| format_err!("failed to parse backupspec"))?;
+        pbc.arg(arg);
+    }
+
+    if let Some(repo) = param["repository"].as_str() {
+        pbc.arg("--repository");
+        pbc.arg::<&str>(repo);
+    }
+
+    if let Some(keyfile) = param["keyfile"].as_str() {
+        pbc.arg("--keyfile");
+        pbc.arg::<&str>(keyfile);
+    }
+
+    let number_of_runs = match param["number-of-runs"].as_u64() {
+        Some(n) => n,
+        None => DEFAULT_NUMBER_OF_RUNS,
+    };
+    if number_of_runs < 1 {
+        bail!("Number of runs must be at least 1, aborting.");
+    }
+
+    // First run is an initial run to make sure all chunks are present already, reducing side
+    // effects caused by filesystem caches etc.
+    let _stats_initial = do_run(&mut pbc, 1)?;
+
+    println!("\nStarting to benchmark backups with regular detection mode...\n");
+    let stats_reg = do_run(&mut pbc, number_of_runs)?;
+
+    println!("\nStarting to benchmark backups with metadata detection mode...\n");
+    pbc.arg("--change-detection-mode=metadata");
+    let stats_meta = do_run(&mut pbc, number_of_runs)?;
+
+    println!("\nCompleted benchmark with {number_of_runs} runs for each tested mode.");
+    println!("\nCompleted regular backup with:");
+    println!("Total runtime: {:.2} s", stats_reg.total);
+    println!("Average: {:.2} ± {:.2} s", stats_reg.avg, stats_reg.stddev);
+    println!("Min: {:.2} s", stats_reg.min);
+    println!("Max: {:.2} s", stats_reg.max);
+
+    println!("\nCompleted metadata detection mode backup with:");
+    println!("Total runtime: {:.2} s", stats_meta.total);
+    println!(
+        "Average: {:.2} ± {:.2} s",
+        stats_meta.avg, stats_meta.stddev
+    );
+    println!("Min: {:.2} s", stats_meta.min);
+    println!("Max: {:.2} s", stats_meta.max);
+
+    let diff_stddev =
+        ((stats_meta.stddev * stats_meta.stddev) + (stats_reg.stddev * stats_reg.stddev)).sqrt();
+    println!("\nDifferences (metadata based - regular):");
+    println!(
+        "Delta total runtime: {:.2} s ({:.2} %)",
+        stats_meta.total - stats_reg.total,
+        100.0 * (stats_meta.total / stats_reg.total - 1.0),
+    );
+    println!(
+        "Delta average: {:.2} ± {:.2} s ({:.2} %)",
+        stats_meta.avg - stats_reg.avg,
+        diff_stddev,
+        100.0 * (stats_meta.avg / stats_reg.avg - 1.0),
+    );
+    println!(
+        "Delta min: {:.2} s ({:.2} %)", 
+        stats_meta.min - stats_reg.min,
+        100.0 * (stats_meta.min / stats_reg.min - 1.0),
+    );
+    println!(
+        "Delta max: {:.2} s ({:.2} %)",
+        stats_meta.max - stats_reg.max,
+        100.0 * (stats_meta.max / stats_reg.max - 1.0),
+    );
+
+    Ok(())
+}
+
+fn do_run(cmd: &mut Command, n_runs: u64) -> Result<Statistics, Error> {
+    let mut timings = Vec::with_capacity(n_runs as usize);
+    for iteration in 1..n_runs + 1 {
+        let start = std::time::SystemTime::now();
+        let mut child = cmd.spawn()?;
+        let exit_code = child.wait()?;
+        let elapsed = start.elapsed()?;
+        timings.push(elapsed);
+        if !exit_code.success() {
+            bail!("Run number {iteration} of {n_runs} failed, aborting.");
+        }
+    }
+
+    Ok(statistics(timings))
+}
+
+struct Statistics {
+    total: f64,
+    avg: f64,
+    stddev: f64,
+    min: f64,
+    max: f64,
+}
+
+fn statistics(timings: Vec<std::time::Duration>) -> Statistics {
+    let total = timings
+        .iter()
+        .fold(0f64, |sum, time| sum + time.as_secs_f64());
+    let avg = total / timings.len() as f64;
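+    // sample variance with Bessel's correction (n - 1 denominator)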
+    let var = 1f64 / (timings.len() - 1) as f64
+        * timings.iter().fold(0f64, |sq_sum, time| {
+            let diff = time.as_secs_f64() - avg;
+            sq_sum + diff * diff
+        });
+    let stddev = var.sqrt();
+    let min = timings.iter().min().unwrap().as_secs_f64();
+    let max = timings.iter().max().unwrap().as_secs_f64();
+
+    Statistics {
+        total,
+        avg,
+        stddev,
+        min,
+        max,
+    }
+}
+
+#[api(
+    input: {
+        properties: {
+            target: {
+                description: "target path to prepare test data.",
+            },
+        },
+    },
+)]
+/// Prepare files required for detection mode backup benchmarks.
+fn detection_mode_bench_prepare(target: String) -> Result<(), Error> {
+    let linux_repo_target = format!("{target}/linux");
+    let coco_dataset_target = format!("{target}/coco");
+    git_clone(LINUX_GIT_REPOSITORY, linux_repo_target.as_str())?;
+    git_checkout(LINUX_GIT_TAG, linux_repo_target.as_str())?;
+    wget_download(COCO_DATASET_SRC_URL, coco_dataset_target.as_str())?;
+
+    Ok(())
+}
+
+fn git_clone(repo: &str, target: &str) -> Result<(), Error> {
+    println!("Calling git clone for '{repo}'.");
+    let target_git = format!("{target}/.git");
+    let path = Path::new(&target_git);
+    if let Ok(true) = path.try_exists() {
+        println!("Target '{target}' already contains a git repository, skip.");
+        return Ok(());
+    }
+
+    let mut git = Command::new("git");
+    git.args(["clone", repo, target]);
+
+    let mut child = git.spawn()?;
+    let exit_code = child.wait()?;
+    if exit_code.success() {
+        println!("git clone finished with success.");
+    } else {
+        bail!("git clone failed for '{target}'.");
+    }
+
+    Ok(())
+}
+
+fn git_checkout(checkout_target: &str, target: &str) -> Result<(), Error> {
+    println!("Calling git checkout '{checkout_target}'.");
+    let mut git = Command::new("git");
+    git.args(["-C", target, "checkout", checkout_target]);
+
+    let mut child = git.spawn()?;
+    let exit_code = child.wait()?;
+    if exit_code.success() {
+        println!("git checkout finished with success.");
+    } else {
+        bail!("git checkout '{checkout_target}' failed for '{target}'.");
+    }
+    Ok(())
+}
+
+fn wget_download(source_url: &str, target: &str) -> Result<(), Error> {
+    let path = Path::new(&target);
+    if let Ok(true) = path.try_exists() {
+        println!("Target '{target}' already exists, skip.");
+        return Ok(());
+    }
+    let zip = format!("{}/unlabeled2017.zip", target);
+    let path = Path::new(&zip);
+    if !path.try_exists()? {
+        println!("Download archive using wget from '{source_url}' to '{target}'.");
+        let mut wget = Command::new("wget");
+        wget.args(["-P", target, source_url]);
+
+        let mut child = wget.spawn()?;
+        let exit_code = child.wait()?;
+        if exit_code.success() {
+            println!("Download finished with success.");
+        } else {
+            bail!("Failed to download '{source_url}' to '{target}'.");
+        }
+    } else {
+        println!("Target '{target}' already contains download, skip download.");
+    }
+
+    let mut unzip = Command::new("unzip");
+    unzip.args([&zip, "-d", target]);
+
+    let mut child = unzip.spawn()?;
+    let exit_code = child.wait()?;
+    if exit_code.success() {
+        println!("Extracting zip archive finished with success.");
+    } else {
+        bail!("Failed to extract zip archive '{zip}' to '{target}'.");
+    }
+    Ok(())
+}
diff --git a/proxmox-backup-test-suite/src/main.rs b/proxmox-backup-test-suite/src/main.rs
new file mode 100644
index 00000000..0a5b436a
--- /dev/null
+++ b/proxmox-backup-test-suite/src/main.rs
@@ -0,0 +1,17 @@
+use proxmox_router::cli::*;
+
+mod detection_mode_bench;
+
+fn main() {
+    let cmd_def = CliCommandMap::new().insert(
+        "detection-mode-bench",
+        detection_mode_bench::detection_mode_bench_mgmt_cli(),
+    );
+
+    let rpcenv = CliEnvironment::new();
+    run_cli_command(
+        cmd_def,
+        rpcenv,
+        Some(|future| proxmox_async::runtime::main(future)),
+    );
+}
-- 
2.39.2

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [pbs-devel] [RFC v2 proxmox-backup 23/23] test-suite: Add bin to deb, add shell completions
  2023-10-09 11:51 [pbs-devel] [RFC v2 pxar proxmox-backup 00/23] fix #3174: improve file-level backup Christian Ebner
                   ` (21 preceding siblings ...)
  2023-10-09 11:51 ` [pbs-devel] [RFC v2 proxmox-backup 22/23] test-suite: add detection mode change benchmark Christian Ebner
@ 2023-10-09 11:51 ` Christian Ebner
  22 siblings, 0 replies; 24+ messages in thread
From: Christian Ebner @ 2023-10-09 11:51 UTC (permalink / raw)
  To: pbs-devel

Adds the required files for bash and zsh completion and packages the
binary to be included in the proxmox-backup-client debian package.

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
Changes since version 1:
not present in version 1

 Makefile                                     | 13 ++++++++-----
 debian/proxmox-backup-client.bash-completion |  1 +
 debian/proxmox-backup-client.install         |  2 ++
 debian/proxmox-backup-test-suite.bc          |  8 ++++++++
 zsh-completions/_proxmox-backup-test-suite   | 13 +++++++++++++
 5 files changed, 32 insertions(+), 5 deletions(-)
 create mode 100644 debian/proxmox-backup-test-suite.bc
 create mode 100644 zsh-completions/_proxmox-backup-test-suite

diff --git a/Makefile b/Makefile
index 0317dd5e..acaac3f7 100644
--- a/Makefile
+++ b/Makefile
@@ -8,11 +8,12 @@ SUBDIRS := etc www docs
 
 # Binaries usable by users
 USR_BIN := \
-	proxmox-backup-client 	\
-	proxmox-file-restore	\
-	pxar			\
-	proxmox-tape		\
-	pmtx			\
+	proxmox-backup-client 		\
+	proxmox-backup-test-suite 	\
+	proxmox-file-restore		\
+	pxar				\
+	proxmox-tape			\
+	pmtx				\
 	pmt
 
 # Binaries usable by admins
@@ -165,6 +166,8 @@ $(COMPILED_BINS) $(COMPILEDIR)/dump-catalog-shell-cli $(COMPILEDIR)/docgen: .do-
 	    --bin proxmox-backup-client \
 	    --bin dump-catalog-shell-cli \
 	    --bin proxmox-backup-debug \
+	    --package proxmox-backup-test-suite \
+	    --bin proxmox-backup-test-suite \
 	    --package proxmox-file-restore \
 	    --bin proxmox-file-restore \
 	    --package pxar-bin \
diff --git a/debian/proxmox-backup-client.bash-completion b/debian/proxmox-backup-client.bash-completion
index 43736017..c4ff02ae 100644
--- a/debian/proxmox-backup-client.bash-completion
+++ b/debian/proxmox-backup-client.bash-completion
@@ -1,2 +1,3 @@
 debian/proxmox-backup-client.bc proxmox-backup-client
+debian/proxmox-backup-test-suite.bc proxmox-backup-test-suite
 debian/pxar.bc pxar
diff --git a/debian/proxmox-backup-client.install b/debian/proxmox-backup-client.install
index 74b568f1..0eb85975 100644
--- a/debian/proxmox-backup-client.install
+++ b/debian/proxmox-backup-client.install
@@ -1,6 +1,8 @@
 usr/bin/proxmox-backup-client
+usr/bin/proxmox-backup-test-suite
 usr/bin/pxar
 usr/share/man/man1/proxmox-backup-client.1
 usr/share/man/man1/pxar.1
 usr/share/zsh/vendor-completions/_proxmox-backup-client
+usr/share/zsh/vendor-completions/_proxmox-backup-test-suite
 usr/share/zsh/vendor-completions/_pxar
diff --git a/debian/proxmox-backup-test-suite.bc b/debian/proxmox-backup-test-suite.bc
new file mode 100644
index 00000000..2686d7ea
--- /dev/null
+++ b/debian/proxmox-backup-test-suite.bc
@@ -0,0 +1,8 @@
+# proxmox-backup-test-suite bash completion
+
+# see http://tiswww.case.edu/php/chet/bash/FAQ
+# and __ltrim_colon_completions() in /usr/share/bash-completion/bash_completion
+# this modifies global var, but I found no better way
+COMP_WORDBREAKS=${COMP_WORDBREAKS//:}
+
+complete -C 'proxmox-backup-test-suite bashcomplete' proxmox-backup-test-suite
diff --git a/zsh-completions/_proxmox-backup-test-suite b/zsh-completions/_proxmox-backup-test-suite
new file mode 100644
index 00000000..72ebcea5
--- /dev/null
+++ b/zsh-completions/_proxmox-backup-test-suite
@@ -0,0 +1,13 @@
+#compdef _proxmox-backup-test-suite() proxmox-backup-test-suite
+
+function _proxmox-backup-test-suite() {
+    local cwords line point cmd curr prev
+    cwords=${#words[@]}
+    line=$words
+    point=${#line}
+    cmd=${words[1]}
+    curr=${words[cwords]}
+    prev=${words[cwords-1]}
+    compadd -- $(COMP_CWORD="$cwords" COMP_LINE="$line" COMP_POINT="$point" \
+        proxmox-backup-test-suite bashcomplete "$cmd" "$curr" "$prev")
+}
-- 
2.39.2

^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2023-10-09 12:01 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-10-09 11:51 [pbs-devel] [RFC v2 pxar proxmox-backup 00/23] fix #3174: improve file-level backup Christian Ebner
2023-10-09 11:51 ` [pbs-devel] [RFC v2 pxar 1/23] fix #3174: decoder: factor out skip_bytes from skip_entry Christian Ebner
2023-10-09 11:51 ` [pbs-devel] [RFC v2 pxar 2/23] fix #3174: decoder: impl skip_bytes for sync dec Christian Ebner
2023-10-09 11:51 ` [pbs-devel] [RFC v2 pxar 3/23] fix #3174: encoder: calc filename + metadata byte size Christian Ebner
2023-10-09 11:51 ` [pbs-devel] [RFC v2 pxar 4/23] fix #3174: enc/dec: impl PXAR_APPENDIX_REF entrytype Christian Ebner
2023-10-09 11:51 ` [pbs-devel] [RFC v2 pxar 5/23] fix #3174: enc/dec: impl PXAR_APPENDIX entrytype Christian Ebner
2023-10-09 11:51 ` [pbs-devel] [RFC v2 pxar 6/23] fix #3174: encoder: helper to add to encoder position Christian Ebner
2023-10-09 11:51 ` [pbs-devel] [RFC v2 pxar 7/23] fix #3174: enc/dec: impl PXAR_APPENDIX_TAIL entrytype Christian Ebner
2023-10-09 11:51 ` [pbs-devel] [RFC v2 proxmox-backup 08/23] fix #3174: index: add fn index list from start/end-offsets Christian Ebner
2023-10-09 11:51 ` [pbs-devel] [RFC v2 proxmox-backup 09/23] fix #3174: index: add fn digest for DynamicEntry Christian Ebner
2023-10-09 11:51 ` [pbs-devel] [RFC v2 proxmox-backup 10/23] fix #3174: api: double catalog upload size Christian Ebner
2023-10-09 11:51 ` [pbs-devel] [RFC v2 proxmox-backup 11/23] fix #3174: catalog: introduce extended format v2 Christian Ebner
2023-10-09 11:51 ` [pbs-devel] [RFC v2 proxmox-backup 12/23] fix #3174: archiver/extractor: impl appendix ref Christian Ebner
2023-10-09 11:51 ` [pbs-devel] [RFC v2 proxmox-backup 13/23] fix #3174: catalog: add specialized Archive entry Christian Ebner
2023-10-09 11:51 ` [pbs-devel] [RFC v2 proxmox-backup 14/23] fix #3174: extractor: impl seq restore from appendix Christian Ebner
2023-10-09 11:51 ` [pbs-devel] [RFC v2 proxmox-backup 15/23] fix #3174: archiver: store ref to previous backup Christian Ebner
2023-10-09 11:51 ` [pbs-devel] [RFC v2 proxmox-backup 16/23] fix #3174: upload stream: impl reused chunk injector Christian Ebner
2023-10-09 11:51 ` [pbs-devel] [RFC v2 proxmox-backup 17/23] fix #3174: chunker: add forced boundaries Christian Ebner
2023-10-09 11:51 ` [pbs-devel] [RFC v2 proxmox-backup 18/23] fix #3174: backup writer: inject queued chunk in upload steam Christian Ebner
2023-10-09 11:51 ` [pbs-devel] [RFC v2 proxmox-backup 19/23] fix #3174: archiver: reuse files with unchanged metadata Christian Ebner
2023-10-09 11:51 ` [pbs-devel] [RFC v2 proxmox-backup 20/23] fix #3174: schema: add backup detection mode schema Christian Ebner
2023-10-09 11:51 ` [pbs-devel] [RFC v2 proxmox-backup 21/23] fix #3174: client: Add detection mode to backup creation Christian Ebner
2023-10-09 11:51 ` [pbs-devel] [RFC v2 proxmox-backup 22/23] test-suite: add detection mode change benchmark Christian Ebner
2023-10-09 11:51 ` [pbs-devel] [RFC v2 proxmox-backup 23/23] test-suite: Add bin to deb, add shell completions Christian Ebner

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox