public inbox for pbs-devel@lists.proxmox.com
* [pbs-devel] [PATCH v7 pxar proxmox-backup 00/69] fix #3174: improve file-level backup
@ 2024-05-27 14:32 Christian Ebner
From: Christian Ebner @ 2024-05-27 14:32 UTC (permalink / raw)
  To: pbs-devel

This series of patches implements a metadata-based file change
detection mechanism to speed up pxar file-level backup creation
for unchanged files.

The chosen approach is to split pxar archives on creation via the
proxmox-backup-client into two separate data and upload streams:
one exclusively for regular file payloads, the other for the rest
of the pxar archive, which is mostly metadata.
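A minimal sketch of the split, assuming hypothetical types `SplitWriter` and `PayloadRef` that are not part of the pxar API: regular file contents are appended to the payload stream, while the metadata stream only stores the file name plus an (offset, size) reference into the payload stream. In-memory buffers stand in for the two upload streams, and the byte layout of the reference is made up for illustration.

```rust
/// Hypothetical reference from the metadata stream into the payload stream.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct PayloadRef {
    pub offset: u64,
    pub size: u64,
}

/// Hypothetical split writer: two in-memory buffers stand in for the
/// metadata and payload upload streams.
#[derive(Default)]
pub struct SplitWriter {
    pub metadata: Vec<u8>,
    pub payload: Vec<u8>,
}

impl SplitWriter {
    /// Add a regular file: its contents go to the payload stream, while the
    /// metadata stream only records the name and a reference to the payload.
    pub fn add_file(&mut self, name: &str, contents: &[u8]) -> PayloadRef {
        let reference = PayloadRef {
            offset: self.payload.len() as u64,
            size: contents.len() as u64,
        };
        self.payload.extend_from_slice(contents);
        // Illustrative encoding only: name bytes followed by offset and size.
        self.metadata.extend_from_slice(name.as_bytes());
        self.metadata.extend_from_slice(&reference.offset.to_le_bytes());
        self.metadata.extend_from_slice(&reference.size.to_le_bytes());
        reference
    }
}
```

Since the metadata stream only grows with the number of entries and not with the file sizes, it stays small enough to be fetched and scanned quickly on the next run.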

On consecutive runs, the metadata archive of the previous backup run,
which is limited in size and can therefore be accessed rapidly, is used
to look up and compare the metadata of the entries to encode.
This assumes that the connection to the Proxmox Backup Server is
sufficiently fast to allow downloading and caching the chunks of that
index.

Changes to regular files are detected by comparing the file's full
metadata object, including mtime, ACLs, etc. If no changes are detected,
the previous payload index is used to look up chunks to possibly reuse
in the payload stream of the new archive.
In order to reduce possible chunk fragmentation, the decision whether to
reuse or reencode a file payload is deferred until enough information
has been gathered, by adding entries to a look-ahead cache. If the
padding introduced by reusing chunks falls below a threshold, the
entries are referenced and the chunks are reused and injected into the
pxar payload upload stream; otherwise the cached entries are discarded
and the files are encoded regularly.
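The reuse decision can be sketched as follows, assuming a hypothetical `FileMetadata` struct holding a subset of the compared fields and a hypothetical `LookAheadCache` that sums the payload of cached unchanged files and the padding pulled in by their chunks. Expressing the threshold as a padding ratio is an assumption for illustration; the threshold used by the series may be defined differently.

```rust
/// Hypothetical subset of the metadata compared between two backup runs.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct FileMetadata {
    pub mode: u32,
    pub uid: u32,
    pub gid: u32,
    pub mtime_ns: i64,
    pub size: u64,
    pub acl: Vec<String>,
}

/// A file is a reuse candidate only if its whole metadata object is unchanged.
pub fn is_unchanged(previous: &FileMetadata, current: &FileMetadata) -> bool {
    previous == current
}

/// Hypothetical look-ahead cache: accumulates reusable entries before deciding.
#[derive(Default)]
pub struct LookAheadCache {
    /// Payload bytes of the cached, unchanged files.
    pub payload_bytes: u64,
    /// Bytes of the referenced chunks not belonging to any cached file.
    pub padding_bytes: u64,
}

impl LookAheadCache {
    pub fn add(&mut self, file_size: u64, padding: u64) {
        self.payload_bytes += file_size;
        self.padding_bytes += padding;
    }

    /// Reuse the chunks only if the padding ratio falls below `threshold`;
    /// otherwise the cached files would be reencoded.
    pub fn should_reuse(&self, threshold: f64) -> bool {
        let total = self.payload_bytes + self.padding_bytes;
        if total == 0 {
            return false;
        }
        (self.padding_bytes as f64) / (total as f64) < threshold
    }
}
```

Deferring the decision lets a run of unchanged files that share chunks amortize the padding, instead of judging each file in isolation.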

For the patches to compile as a sequential chain, patches 16 and 17
must be applied before the patches to the pxar repository, while pxar
patches 14 and 15 must be applied only after patch 52 of the series.

The following lists the most notable changes included in this series since
version 6:
- Allow using the `.pxar` extension in CLI commands for convenience.
- Refactor the input/output interface for the pxar encoder, decoder and
  accessor to use a `PxarVariant` enum, in order to guarantee that the
  payload-related input/output is always attached for split archives.
- Refactor the look-ahead caching logic in the pxar `Archiver` to
  improve overall code readability.
- Add a helper method for file name matching and use it where possible,
  so that the matching is handled in a single place.
- Extend the documentation to include additional information about which
  metadata is compared to the previous snapshot.
- Fix an issue with `pxar list`, which failed for metadata-only pxar
  archives.
- Fix an issue in the payload chunker test where the context was not
  updated accordingly.
- Various clippy fixes, smaller refactoring and reordering of patches.
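The `.mpxar` and `.ppxar` extensions of the two split archive parts appear in the test file names of the diffstat below. A sketch of mapping the convenience `.pxar` name onto them, using a hypothetical `split_archive_names` helper (not the function added by the series) and assuming both parts are stored as dynamic indexes (`.didx`):

```rust
/// Hypothetical helper: map a user-facing `.pxar` archive name to the
/// metadata and payload index names of the corresponding split archive.
pub fn split_archive_names(archive: &str) -> Option<(String, String)> {
    // Accept the `.pxar` convenience extension, with or without `.didx`.
    let base = archive
        .strip_suffix(".pxar.didx")
        .or_else(|| archive.strip_suffix(".pxar"))?;
    if base.is_empty() {
        return None;
    }
    Some((format!("{base}.mpxar.didx"), format!("{base}.ppxar.didx")))
}
```

Keeping this mapping in one helper means every command accepting archive names resolves split archives the same way.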

The following lists the most notable changes included in this series since
version 5:
- Fix an issue where the payload chunker was not correctly reset after
  suggested or forced boundaries.
- Add regression tests for the payload chunker and chunk stream.
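The interplay of suggested boundaries and the chunker reset can be illustrated with a heavily simplified, hypothetical `PayloadChunker`: it only cuts at suggested boundaries or at `max_size` and omits the content-defined (rolling hash) boundaries of the real chunker. The point shown is that the per-chunk state is reset after every cut, whether suggested or forced.

```rust
use std::collections::VecDeque;

/// Hypothetical, heavily simplified payload chunker (no rolling hash).
pub struct PayloadChunker {
    max_size: usize,
    /// Bytes of the current chunk consumed so far.
    chunk_size: usize,
    /// Absolute stream offset of the next byte passed to `scan`.
    position: u64,
    /// Suggested absolute boundary offsets, in ascending order.
    suggested: VecDeque<u64>,
}

impl PayloadChunker {
    pub fn new(max_size: usize) -> Self {
        assert!(max_size > 0, "max_size must be non-zero");
        Self {
            max_size,
            chunk_size: 0,
            position: 0,
            suggested: VecDeque::new(),
        }
    }

    pub fn suggest_boundary(&mut self, offset: u64) {
        self.suggested.push_back(offset);
    }

    /// Returns `Some(n)` if the current chunk ends after the first `n` bytes
    /// of `data` (the caller then feeds `data[n..]` again), or `None` if all
    /// of `data` belongs to the current chunk.
    pub fn scan(&mut self, data: &[u8]) -> Option<usize> {
        // Suggestions at or before the current position can no longer be honored.
        while matches!(self.suggested.front(), Some(&s) if s <= self.position) {
            self.suggested.pop_front();
        }
        let limit = self.max_size - self.chunk_size;
        let position = self.position;
        let suggested = self
            .suggested
            .front()
            .map(|&s| (s - position) as usize)
            .filter(|&n| n <= data.len() && n <= limit);
        let cut = match suggested {
            Some(n) => {
                self.suggested.pop_front();
                Some(n)
            }
            None if limit <= data.len() => Some(limit), // forced boundary
            None => None,
        };
        match cut {
            Some(n) => {
                self.position += n as u64;
                // Reset the chunk state after every suggested or forced cut.
                self.chunk_size = 0;
            }
            None => {
                self.position += data.len() as u64;
                self.chunk_size += data.len();
            }
        }
        cut
    }
}
```

Without the reset, the bytes of the previous chunk would still count towards `max_size`, and the next forced boundary would land too early.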

The following lists the most notable changes included in this series since
version 4:
- Increase the open file handle limit to the hard limit and adapt the
  look-ahead cache size dynamically (thanks a lot to Thomas for pointing
  this out and providing the necessary background information). This
  helps to reuse multiple entries contained within the same chunk, which
  would otherwise exceed the padding threshold and therefore be
  reencoded instead.
- Fix the payload chunker scan to only scan up to the chunk position in
  case a suggested boundary is chosen.
- Fix an issue with the decoder state not being set to the correct
  `InDirectory` state after reading the prelude and getting the root
  directory entry.
- Fix an issue with kept-back chunk injection when the chunk follows a
  range discontinuity.
- Add a regression test for pxar create with metadata archive and payload
  index reference.
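The arithmetic behind raising the limit and deriving the cache size can be sketched as below. Actually querying and raising `RLIMIT_NOFILE` would go through `getrlimit`/`setrlimit` (e.g. via the `libc` crate) and is left out here; the `NofileLimit` type, the reserve and the halving rule are hypothetical and not taken from the series. The sketch presumes that each cached entry may hold an open file handle until the reuse decision is made.

```rust
/// Hypothetical soft/hard pair, as `getrlimit(RLIMIT_NOFILE)` would report it.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct NofileLimit {
    pub soft: u64,
    pub hard: u64,
}

/// Raise the soft limit to the hard limit (the value passed to `setrlimit`).
pub fn raised(limit: NofileLimit) -> NofileLimit {
    NofileLimit {
        soft: limit.hard,
        hard: limit.hard,
    }
}

/// Hypothetical sizing rule: keep `reserve` handles for everything else and
/// let the look-ahead cache use at most half of the remaining handles.
pub fn cache_limit(limit: NofileLimit, reserve: u64) -> usize {
    (limit.soft.saturating_sub(reserve) / 2) as usize
}
```

A larger cache lets more entries sharing the same chunks be collected before the padding check, which is what the reuse logic benefits from.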

The following lists the most notable changes included in this series since
version 3:
- Rework the whole reused chunk injection and accounting logic and use
  lockless async `mpsc::channel`s instead of `Arc<Mutex<VecDeque<..>>>`.
- Rework the look-ahead caching logic to use payload ranges and check for
  possible range continuation instead of looking up the reusable dynamic
  entries immediately in case of a reusable entry chain. This also
  avoids edge cases not covered in the previous version of the patch series.
  The current version therefore tends to reencode small files more
  aggressively, since they might introduce additional unwanted padding.
- Correctly cover hardlinks in the reuse logic as well, avoiding
  reencoding these entries.
- Add a dedicated chunker implementation for the payload data
  stream, allowing the archiver to suggest boundaries to the chunker to
  reduce padding for reused chunks.
- Add an additional `change-detection-mode=data`, in order to allow
  creating split archives with fully reencoded payload data.
- Add payload input readers for pxar accessor type implementations
  where needed.
- Add a consistency check in the pxar encoder when dropping the state
  or the encoder instance.
- CliParams was renamed to the more opaque Prelude, since the pxar
  archive does not care about its contents and this might be extended to
  store other information about the archive as well.
- Add missing proxmox-file-restore support for split archives and fix
  restore of tar/zip archives via the WebUI. This is handled by the same
  decoder logic, and needed an updated payload input content range to
  read the data from the correct location in the payload data archive.
- Additional refactoring to use the pxar reader helpers where possible.
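A sketch of the channel-based hand-over, using `std::sync::mpsc` as a synchronous stand-in for the async channels used by the series and a hypothetical `ReusedChunk` type: the archiver queues batches of reused chunks, and the upload side drains everything queued so far without blocking.

```rust
use std::sync::mpsc;

/// Hypothetical description of a previously uploaded chunk to be reused.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ReusedChunk {
    pub digest: [u8; 32],
    pub size: u64,
}

/// Archiver side: queue a batch of reused chunks for injection.
/// Returns `false` if the receiving side has already been dropped.
pub fn queue_reused(tx: &mpsc::Sender<Vec<ReusedChunk>>, chunks: Vec<ReusedChunk>) -> bool {
    tx.send(chunks).is_ok()
}

/// Upload side: collect all batches queued so far without blocking and
/// return them together with the total injected size.
pub fn drain_injections(rx: &mpsc::Receiver<Vec<ReusedChunk>>) -> (Vec<ReusedChunk>, u64) {
    let mut injected = Vec::new();
    while let Ok(batch) = rx.try_recv() {
        injected.extend(batch);
    }
    let total: u64 = injected.iter().map(|c| c.size).sum();
    (injected, total)
}
```

Compared with a shared `Arc<Mutex<VecDeque<..>>>`, the channel gives each side sole ownership of its end, so neither has to take a lock explicitly.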

The following lists the most notable changes included in this series since
version 2:
- Many bugfixes regarding incorrect archive encoding caused by wrong
  offset generation; additional sanity checks make encoding fail rather
  than produce an incorrectly encoded archive.
- Different approach for deciding whether to reuse or reencode the
  entries. Previously, the entries were encoded when a cached payload
  size threshold was reached. Now, the padding introduced by reusable
  chunks is tracked, and the entries are reused only if the padding does
  not exceed the set threshold. This reduces the possible padding, at the
  cost of reencoding more entries. It also avoids reusing chunks which
  now have large padding holes because of moved/removed files contained
  within.
- Add headers for the metadata archive and payload file.
- Add documentation.

With these patches, a backup run is now invoked as:
```bash
proxmox-backup-client backup <label>.pxar:<source-path> --change-detection-mode=metadata
```
During the first run, no reference index is available, but the pxar
archive is nevertheless split into the two parts.
Subsequent backups then use the pxar archive accessor and the index
files of the previous run to perform file change detection.

The Linux kernel source code as well as the COCO dataset for computer
vision and pattern recognition can be used as benchmarks.
The benchmarks can be run with:
```bash
proxmox-backup-test-suite detection-mode-bench prepare --target /<path-to-bench-source-target>
proxmox-backup-test-suite detection-mode-bench run linux.pxar:/<path-to-bench-source-target>/linux
proxmox-backup-test-suite detection-mode-bench run coco.pxar:/<path-to-bench-source-target>/coco
```

The above command invocations assume that the default repository and
credentials are set as environment variables; they may, however, be
passed as additional optional parameters instead.

pxar:

Christian Ebner (15):
  decoder: factor out skip part from skip_entry
  lib: add type for input/output variant differentiation
  encoder: move to stack based state tracking
  format/examples: add header type `PXAR_PAYLOAD_REF`
  decoder: add method to read payload references
  encoder: allow split output writer for archive creation
  decoder/accessor: allow for split input stream variant
  decoder: set payload input range when decoding via accessor
  encoder: add payload reference capability
  encoder: add payload position capability
  encoder: add payload advance capability
  encoder/format: finish payload stream with marker
  format: add payload stream start marker
  format/encoder/decoder: new pxar entry type `Version`
  format/encoder/decoder: new pxar entry type `Prelude`

 examples/apxar.rs            |   2 +-
 examples/mk-format-hashes.rs |  21 ++
 examples/pxarcmd.rs          |   7 +-
 src/accessor/aio.rs          |  10 +-
 src/accessor/mod.rs          | 120 +++++++--
 src/accessor/sync.rs         |   8 +-
 src/decoder/aio.rs           |  13 +-
 src/decoder/mod.rs           | 249 ++++++++++++++---
 src/decoder/sync.rs          |  21 +-
 src/encoder/aio.rs           |  90 +++++--
 src/encoder/mod.rs           | 508 ++++++++++++++++++++++++++---------
 src/encoder/sync.rs          |  75 +++++-
 src/format/mod.rs            |  63 +++++
 src/lib.rs                   |  71 +++++
 tests/compat.rs              |   3 +-
 tests/simple/fs.rs           |   8 +-
 tests/simple/main.rs         |  11 +-
 17 files changed, 1027 insertions(+), 253 deletions(-)

proxmox-backup:

Christian Ebner (54):
  client: backup: factor out extension from backup target
  api: datastore: refactor getting local chunk reader
  client: pxar: switch to stack based encoder state
  client: pxar: combine writers into struct
  client: pxar: optionally split metadata and payload streams
  client: helper: add helpers for creating reader instances
  client: helper: add method for split archive name mapping
  client: tools: helper to check pxar filename extensions
  client: restore: read payload from dedicated index
  tools: cover extension for split pxar archives
  restore: cover extension for split pxar archives
  client: mount: make split pxar archives mountable
  api: datastore: attach split archive payload chunk reader
  catalog: shell: make split pxar archives accessible
  www: cover metadata extension for pxar archives
  file restore: factor out getting pxar reader
  file restore: cover split metadata and payload archives
  file restore: show more error context when extraction fails
  pxar: add optional payload input for archive restore
  pxar: cover listing for split archives
  pxar: add more context to extraction error
  client: pxar: include payload offset in entry listing
  pxar: show padding in debug output on archive list
  datastore: dynamic index: add method to get digest
  client: pxar: helper for lookup of reusable dynamic entries
  upload stream: implement reused chunk injector
  client: chunk stream: add struct to hold injection state
  chunker: add method to reset chunker state
  client: streams: add channels for dynamic entry injection
  specs: add backup detection mode specification
  client: implement prepare reference method
  client: pxar: add method for metadata comparison
  pxar: caching: add look-ahead cache
  client: pxar: refactor catalog encoding for directories
  fix #3174: client: pxar: enable caching and meta comparison
  client: backup writer: add injected chunk count to stats
  pxar: create: keep track of reused chunks and files
  pxar: create: show chunk injection stats debug output
  client: pxar: add helper to handle optional preludes
  client: pxar: opt encode cli exclude patterns as Prelude
  pxar: ignore version and prelude entries in listing
  docs: file formats: describe split pxar archive file layout
  docs: add section describing change detection mode
  test-suite: add detection mode change benchmark
  test-suite: Makefile: add debian package and related files
  datastore: chunker: add Chunker trait
  datastore: chunker: implement chunker for payload stream
  client: chunk stream: switch payload stream chunker
  client: pxar: allow to restore prelude to optional path
  client: pxar: add archive creation with reference test
  client: tools: add helper to raise nofile rlimit
  client: pxar: set cache limit based on nofile rlimit
  chunker: tests: add regression tests for payload chunker
  chunk stream: tests: add regression tests for payload chunker

 Cargo.toml                                    |   1 +
 Makefile                                      |  18 +-
 debian/control                                |   7 +
 debian/proxmox-backup-client.bash-completion  |   1 +
 debian/proxmox-backup-test-suite.bc           |   8 +
 debian/proxmox-backup-test-suite.install      |   3 +
 docs/Makefile                                 |   2 +
 docs/backup-client.rst                        |  45 +
 docs/command-line-tools.rst                   |   5 +
 docs/command-syntax.rst                       |   4 +
 docs/conf.py                                  |   1 +
 docs/file-formats.rst                         |  46 +
 docs/meta-format-overview.dot                 |  50 +
 .../proxmox-backup-test-suite/description.rst |   2 +
 docs/proxmox-backup-test-suite/man1.rst       |  17 +
 docs/technical-overview.rst                   |   3 +
 examples/test_chunk_size.rs                   |   9 +-
 examples/test_chunk_speed.rs                  |   7 +-
 examples/test_chunk_speed2.rs                 |   2 +-
 pbs-client/src/backup_specification.rs        |  26 +
 pbs-client/src/backup_writer.rs               | 118 ++-
 pbs-client/src/chunk_stream.rs                | 238 ++++-
 pbs-client/src/inject_reused_chunks.rs        | 129 +++
 pbs-client/src/lib.rs                         |   3 +-
 pbs-client/src/pxar/create.rs                 | 911 +++++++++++++++++-
 pbs-client/src/pxar/extract.rs                |  28 +-
 pbs-client/src/pxar/look_ahead_cache.rs       | 165 ++++
 pbs-client/src/pxar/mod.rs                    |   5 +-
 pbs-client/src/pxar/tools.rs                  | 123 ++-
 pbs-client/src/pxar_backup_stream.rs          |  71 +-
 pbs-client/src/tools/mod.rs                   |  69 +-
 pbs-datastore/src/chunker.rs                  | 267 ++++-
 pbs-datastore/src/dynamic_index.rs            |  14 +-
 pbs-datastore/src/lib.rs                      |   2 +-
 pbs-pxar-fuse/src/lib.rs                      |   2 +-
 proxmox-backup-client/src/catalog.rs          |  29 +-
 proxmox-backup-client/src/helper.rs           | 114 +++
 proxmox-backup-client/src/main.rs             | 291 +++++-
 proxmox-backup-client/src/mount.rs            |  33 +-
 proxmox-backup-test-suite/Cargo.toml          |  18 +
 .../src/detection_mode_bench.rs               | 294 ++++++
 proxmox-backup-test-suite/src/main.rs         |  17 +
 proxmox-file-restore/src/main.rs              |  73 +-
 .../src/proxmox_restore_daemon/api.rs         |  20 +-
 pxar-bin/src/main.rs                          |  84 +-
 src/api2/admin/datastore.rs                   |  48 +-
 src/api2/tape/restore.rs                      |  22 +-
 src/bin/proxmox_backup_debug/diff.rs          |   2 +-
 src/tape/file_formats/snapshot_archive.rs     |   8 +-
 tests/catar.rs                                |   7 +-
 tests/pxar/backup-client-pxar-data.mpxar      | Bin 0 -> 15070 bytes
 tests/pxar/backup-client-pxar-data.ppxar.didx | Bin 0 -> 8096 bytes
 tests/pxar/backup-client-pxar-expected.mpxar  | Bin 0 -> 15086 bytes
 www/datastore/Content.js                      |   6 +-
 zsh-completions/_proxmox-backup-test-suite    |  13 +
 55 files changed, 3144 insertions(+), 337 deletions(-)
 create mode 100644 debian/proxmox-backup-test-suite.bc
 create mode 100644 debian/proxmox-backup-test-suite.install
 create mode 100644 docs/meta-format-overview.dot
 create mode 100644 docs/proxmox-backup-test-suite/description.rst
 create mode 100644 docs/proxmox-backup-test-suite/man1.rst
 create mode 100644 pbs-client/src/inject_reused_chunks.rs
 create mode 100644 pbs-client/src/pxar/look_ahead_cache.rs
 create mode 100644 proxmox-backup-client/src/helper.rs
 create mode 100644 proxmox-backup-test-suite/Cargo.toml
 create mode 100644 proxmox-backup-test-suite/src/detection_mode_bench.rs
 create mode 100644 proxmox-backup-test-suite/src/main.rs
 create mode 100644 tests/pxar/backup-client-pxar-data.mpxar
 create mode 100644 tests/pxar/backup-client-pxar-data.ppxar.didx
 create mode 100644 tests/pxar/backup-client-pxar-expected.mpxar
 create mode 100644 zsh-completions/_proxmox-backup-test-suite

-- 
2.39.2



_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel

* [pbs-devel] [PATCH v7 pxar 01/69] decoder: factor out skip part from skip_entry
From: Christian Ebner @ 2024-05-27 14:32 UTC (permalink / raw)
  To: pbs-devel

Make the skip part reusable for a different input.

This is in preparation for skipping payload paddings in a separate input.
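With the input passed as a parameter, the same loop can consume bytes from any sequential input, such as the payload stream. A synchronous sketch of that loop, using `std::io::Read` instead of pxar's `SeqRead`; the 4 KiB scratch buffer size is an arbitrary choice for illustration:

```rust
use std::io::{self, Read};

/// Skip `len` bytes of `input` by reading them into a scratch buffer,
/// mirroring the factored-out `skip` helper with synchronous I/O.
pub fn skip<R: Read>(input: &mut R, mut len: usize) -> io::Result<()> {
    let mut scratch = [0u8; 4096];
    // Consume full scratch-buffer sized blocks first.
    while len >= scratch.len() {
        input.read_exact(&mut scratch)?;
        len -= scratch.len();
    }
    // Then consume the remaining tail, if any.
    if len > 0 {
        input.read_exact(&mut scratch[..len])?;
    }
    Ok(())
}
```

Skipping past the end of the input fails with `UnexpectedEof`, just as a truncated archive should.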

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
changes since version 6:
- no changes, patch reordered

 src/decoder/mod.rs | 15 +++++++++------
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/src/decoder/mod.rs b/src/decoder/mod.rs
index d1fb911..3c6d9ef 100644
--- a/src/decoder/mod.rs
+++ b/src/decoder/mod.rs
@@ -563,15 +563,18 @@ impl<I: SeqRead> DecoderImpl<I> {
     //
 
     async fn skip_entry(&mut self, offset: u64) -> io::Result<()> {
-        let mut len = self.current_header.content_size() - offset;
+        let len = (self.current_header.content_size() - offset) as usize;
+        Self::skip(&mut self.input, len).await
+    }
+
+    async fn skip(input: &mut I, mut len: usize) -> io::Result<()> {
         let scratch = scratch_buffer();
-        while len >= (scratch.len() as u64) {
-            seq_read_exact(&mut self.input, scratch).await?;
-            len -= scratch.len() as u64;
+        while len >= (scratch.len()) {
+            seq_read_exact(input, scratch).await?;
+            len -= scratch.len();
         }
-        let len = len as usize;
         if len > 0 {
-            seq_read_exact(&mut self.input, &mut scratch[..len]).await?;
+            seq_read_exact(input, &mut scratch[..len]).await?;
         }
         Ok(())
     }
-- 
2.39.2

* [pbs-devel] [PATCH v7 pxar 02/69] lib: add type for input/output variant differentiation
From: Christian Ebner @ 2024-05-27 14:32 UTC (permalink / raw)
  To: pbs-devel

Introduce an enum which stores one of two possible variants of
inputs or outputs to be passed to encoder and decoder/accessor
instances, depending on whether to read/write a fully self-contained
pxar archive or to split off the payload stream into a separate
input/output.
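To illustrate how callers can use such an enum, here is a self-contained sketch with a reduced copy of `PxarVariant` (only `archive`, `payload` and `wrap`, matching the diff below) and a hypothetical `describe` consumer that only sees a payload input for split archives:

```rust
/// Reduced copy of `PxarVariant`, so the example compiles on its own.
#[derive(Clone, Debug, PartialEq)]
pub enum PxarVariant<A, P> {
    /// All of the archive is contained within a single input/output.
    Unified(A),
    /// Metadata and payload live in separate inputs/outputs.
    Split(A, P),
}

impl<A, P> PxarVariant<A, P> {
    pub fn archive(&self) -> &A {
        match self {
            PxarVariant::Unified(a) | PxarVariant::Split(a, _) => a,
        }
    }

    pub fn payload(&self) -> Option<&P> {
        match self {
            PxarVariant::Unified(_) => None,
            PxarVariant::Split(_, p) => Some(p),
        }
    }
}

impl<IN> PxarVariant<IN, IN> {
    /// Apply the same conversion to every contained input/output.
    pub fn wrap<OUT, F: Fn(IN) -> OUT>(self, f: F) -> PxarVariant<OUT, OUT> {
        match self {
            PxarVariant::Unified(a) => PxarVariant::Unified(f(a)),
            PxarVariant::Split(a, p) => PxarVariant::Split(f(a), f(p)),
        }
    }
}

/// Hypothetical consumer: the payload input only exists for split archives.
pub fn describe(input: &PxarVariant<Vec<u8>, Vec<u8>>) -> String {
    match input.payload() {
        Some(payload) => format!(
            "split: {} metadata bytes, {} payload bytes",
            input.archive().len(),
            payload.len()
        ),
        None => format!("unified: {} bytes", input.archive().len()),
    }
}
```

Because a split archive cannot be constructed without its payload input, the type rules out a split archive being read or written with the payload input missing.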

Co-authored-by: Dominik Csapak <d.csapak@proxmox.com>
Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
changes since version 6:
- not present in previous version

 src/lib.rs | 62 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 62 insertions(+)

diff --git a/src/lib.rs b/src/lib.rs
index 210c4b1..f784c9e 100644
--- a/src/lib.rs
+++ b/src/lib.rs
@@ -494,3 +494,65 @@ impl Entry {
         }
     }
 }
+
+#[derive(Clone)]
+/// Possible variants of the encoder output and decoder as well as accessor inputs
+///
+/// Allow to have a unified or split input/output, depending on whether this is a split
+/// archive or not.
+pub enum PxarVariant<A, P> {
+    /// All of the pxar archive is contained within the given input/output
+    Unified(A),
+    /// Metadata and payload are split into separate inputs/outputs
+    Split(A, P),
+}
+
+impl<A, P> PxarVariant<A, P> {
+    pub fn archive(&self) -> &A {
+        match self {
+            PxarVariant::Unified(a) => a,
+            PxarVariant::Split(a, _) => a,
+        }
+    }
+
+    pub fn archive_mut(&mut self) -> &mut A {
+        match self {
+            PxarVariant::Unified(a) => a,
+            PxarVariant::Split(a, _) => a,
+        }
+    }
+
+    pub fn payload(&self) -> Option<&P> {
+        match self {
+            PxarVariant::Unified(_) => None,
+            PxarVariant::Split(_, p) => Some(p),
+        }
+    }
+
+    pub fn payload_mut(&mut self) -> Option<&mut P> {
+        match self {
+            PxarVariant::Unified(_) => None,
+            PxarVariant::Split(_, p) => Some(p),
+        }
+    }
+
+    pub fn wrap_multi<OUT1, OUT2, F1: Fn(A) -> OUT1, F2: Fn(P) -> OUT2>(
+        self,
+        f1: F1,
+        f2: F2,
+    ) -> PxarVariant<OUT1, OUT2> {
+        match self {
+            PxarVariant::Unified(a) => PxarVariant::Unified(f1(a)),
+            PxarVariant::Split(a, p) => PxarVariant::Split(f1(a), f2(p)),
+        }
+    }
+}
+
+impl<IN> PxarVariant<IN, IN> {
+    pub fn wrap<OUT, F: Fn(IN) -> OUT>(self, f: F) -> PxarVariant<OUT, OUT> {
+        match self {
+            PxarVariant::Unified(a) => PxarVariant::Unified(f(a)),
+            PxarVariant::Split(a, p) => PxarVariant::Split(f(a), f(p)),
+        }
+    }
+}
-- 
2.39.2

* [pbs-devel] [PATCH v7 pxar 03/69] encoder: move to stack based state tracking
From: Christian Ebner @ 2024-05-27 14:32 UTC (permalink / raw)
  To: pbs-devel

This is in preparation for the proxmox-backup-client look-ahead caching,
where passing around different encoder instances with internal
references is not feasible.

Instead of creating a new encoder instance for each directory level
and keeping references to the parent state, use an internal stack.
Add helper functions to solve borrow issues when both the state and
the writers have to be accessed via mutable references.

This is a breaking change in the pxar library API.
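The idea can be illustrated with a reduced, hypothetical `StackEncoder` (not the pxar encoder): creating a directory pushes a new state, finishing it pops the state again, and a single encoder instance owns both the output and the whole stack, so no parent references are needed.

```rust
/// Hypothetical per-directory state.
#[derive(Default)]
struct DirState {
    /// Number of entries encoded in this directory.
    entries: usize,
    /// Output position at which the directory contents started.
    start: u64,
}

/// Hypothetical encoder owning the output and one state per open directory.
pub struct StackEncoder {
    output: Vec<u8>,
    state: Vec<DirState>,
}

impl StackEncoder {
    pub fn new() -> Self {
        // The root directory state is always present initially.
        Self { output: Vec::new(), state: vec![DirState::default()] }
    }

    fn state_mut(&mut self) -> Result<&mut DirState, String> {
        self.state
            .last_mut()
            .ok_or_else(|| "encoder state stack underflow".to_string())
    }

    pub fn add_file(&mut self, name: &str) -> Result<(), String> {
        self.state_mut()?.entries += 1;
        self.output.extend_from_slice(name.as_bytes());
        Ok(())
    }

    pub fn create_directory(&mut self, name: &str) -> Result<(), String> {
        // The directory itself is an entry of its parent.
        self.add_file(name)?;
        let start = self.output.len() as u64;
        self.state.push(DirState { entries: 0, start });
        Ok(())
    }

    /// Finish the current directory, returning (entries, bytes written inside).
    pub fn finish(&mut self) -> Result<(usize, u64), String> {
        let state = self
            .state
            .pop()
            .ok_or_else(|| "encoder state stack underflow".to_string())?;
        Ok((state.entries, self.output.len() as u64 - state.start))
    }

    pub fn depth(&self) -> usize {
        self.state.len()
    }
}
```

Since the caller keeps a single `&mut StackEncoder`, it can be stored and passed around freely, which a chain of borrowed child encoders would not allow.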

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
changes since version 6:
- patch reordered

 examples/pxarcmd.rs  |   7 +-
 src/encoder/aio.rs   |  26 +++--
 src/encoder/mod.rs   | 271 +++++++++++++++++++++++++------------------
 src/encoder/sync.rs  |  16 ++-
 tests/simple/fs.rs   |   6 +-
 tests/simple/main.rs |   3 +
 6 files changed, 196 insertions(+), 133 deletions(-)

diff --git a/examples/pxarcmd.rs b/examples/pxarcmd.rs
index e0c779d..0294eba 100644
--- a/examples/pxarcmd.rs
+++ b/examples/pxarcmd.rs
@@ -106,6 +106,7 @@ fn cmd_create(mut args: std::env::ArgsOs) -> Result<(), Error> {
     let mut encoder = Encoder::create(file, &meta)?;
     add_directory(&mut encoder, dir, &dir_path, &mut HashMap::new())?;
     encoder.finish()?;
+    encoder.close()?;
 
     Ok(())
 }
@@ -138,14 +139,14 @@ fn add_directory<'a, T: SeqWrite + 'a>(
 
         let meta = Metadata::from(&file_meta);
         if file_type.is_dir() {
-            let mut dir = encoder.create_directory(file_name, &meta)?;
+            encoder.create_directory(file_name, &meta)?;
             add_directory(
-                &mut dir,
+                encoder,
                 std::fs::read_dir(file_path)?,
                 root_path,
                 &mut *hardlinks,
             )?;
-            dir.finish()?;
+            encoder.finish()?;
         } else if file_type.is_symlink() {
             todo!("symlink handling");
         } else if file_type.is_file() {
diff --git a/src/encoder/aio.rs b/src/encoder/aio.rs
index ad25fea..f11e57c 100644
--- a/src/encoder/aio.rs
+++ b/src/encoder/aio.rs
@@ -98,20 +98,23 @@ impl<'a, T: SeqWrite + 'a> Encoder<'a, T> {
         &mut self,
         file_name: P,
         metadata: &Metadata,
-    ) -> io::Result<Encoder<'_, T>> {
-        Ok(Encoder {
-            inner: self
-                .inner
-                .create_directory(file_name.as_ref(), metadata)
-                .await?,
-        })
+    ) -> io::Result<()> {
+        self.inner
+            .create_directory(file_name.as_ref(), metadata)
+            .await
     }
 
-    /// Finish this directory. This is mandatory, otherwise the `Drop` handler will `panic!`.
-    pub async fn finish(self) -> io::Result<()> {
+    /// Finish this directory. This is mandatory, encodes the end for the current directory.
+    pub async fn finish(&mut self) -> io::Result<()> {
         self.inner.finish().await
     }
 
+    /// Close the encoder instance. This is mandatory, encodes the end for the optional payload
+    /// output stream, if some is given
+    pub async fn close(self) -> io::Result<()> {
+        self.inner.close().await
+    }
+
     /// Add a symbolic link to the archive.
     pub async fn add_symlink<PF: AsRef<Path>, PT: AsRef<Path>>(
         &mut self,
@@ -295,11 +298,12 @@ mod test {
                 .await
                 .unwrap();
             {
-                let mut dir = encoder
+                encoder
                     .create_directory("baba", &Metadata::dir_builder(0o700).build())
                     .await
                     .unwrap();
-                dir.create_file(&Metadata::file_builder(0o755).build(), "abab", 1024)
+                encoder
+                    .create_file(&Metadata::file_builder(0o755).build(), "abab", 1024)
                     .await
                     .unwrap();
             }
diff --git a/src/encoder/mod.rs b/src/encoder/mod.rs
index da41733..2bc3128 100644
--- a/src/encoder/mod.rs
+++ b/src/encoder/mod.rs
@@ -221,9 +221,17 @@ struct EncoderState {
 
     /// We need to keep track how much we have written to get offsets.
     write_position: u64,
+
+    /// Mark the encoder state as correctly finished, ready to be dropped
+    finished: bool,
 }
 
 impl EncoderState {
+    #[inline]
+    fn position(&self) -> u64 {
+        self.write_position
+    }
+
     fn merge_error(&mut self, error: Option<EncodeError>) {
         // one error is enough:
         if self.encode_error.is_none() {
@@ -234,6 +242,23 @@ impl EncoderState {
     fn add_error(&mut self, error: EncodeError) {
         self.merge_error(Some(error));
     }
+
+    fn finish(&mut self) -> Option<EncodeError> {
+        self.finished = true;
+        self.encode_error.take()
+    }
+}
+
+impl Drop for EncoderState {
+    fn drop(&mut self) {
+        if !self.finished {
+            eprintln!("unfinished encoder state dropped");
+        }
+
+        if self.encode_error.is_some() {
+            eprintln!("finished encoder state with errors");
+        }
+    }
 }
 
 pub(crate) enum EncoderOutput<'a, T> {
@@ -241,16 +266,6 @@ pub(crate) enum EncoderOutput<'a, T> {
     Borrowed(&'a mut T),
 }
 
-impl<'a, T> EncoderOutput<'a, T> {
-    #[inline]
-    fn to_borrowed_mut<'s>(&'s mut self) -> EncoderOutput<'s, T>
-    where
-        'a: 's,
-    {
-        EncoderOutput::Borrowed(self.as_mut())
-    }
-}
-
 impl<'a, T> std::convert::AsMut<T> for EncoderOutput<'a, T> {
     fn as_mut(&mut self) -> &mut T {
         match self {
@@ -278,8 +293,8 @@ impl<'a, T> std::convert::From<&'a mut T> for EncoderOutput<'a, T> {
 /// synchronous or `async` I/O objects in as output.
 pub(crate) struct EncoderImpl<'a, T: SeqWrite + 'a> {
     output: EncoderOutput<'a, T>,
-    state: EncoderState,
-    parent: Option<&'a mut EncoderState>,
+    /// EncoderState stack storing the state for each directory level
+    state: Vec<EncoderState>,
     finished: bool,
 
     /// Since only the "current" entry can be actively writing files, we share the file copy
@@ -289,15 +304,12 @@ pub(crate) struct EncoderImpl<'a, T: SeqWrite + 'a> {
 
 impl<'a, T: SeqWrite + 'a> Drop for EncoderImpl<'a, T> {
     fn drop(&mut self) {
-        if let Some(ref mut parent) = self.parent {
-            // propagate errors:
-            parent.merge_error(self.state.encode_error);
-            if !self.finished {
-                parent.add_error(EncodeError::IncompleteDirectory);
-            }
-        } else if !self.finished {
-            // FIXME: how do we deal with this?
-            // eprintln!("Encoder dropped without finishing!");
+        if !self.finished {
+            eprintln!("unclosed encoder dropped");
+        }
+
+        if !self.state.is_empty() {
+            eprintln!("closed encoder dropped with state");
         }
     }
 }
@@ -312,8 +324,7 @@ impl<'a, T: SeqWrite + 'a> EncoderImpl<'a, T> {
         }
         let mut this = Self {
             output,
-            state: EncoderState::default(),
-            parent: None,
+            state: vec![EncoderState::default()],
             finished: false,
             file_copy_buffer: Arc::new(Mutex::new(unsafe {
                 crate::util::vec_new_uninitialized(1024 * 1024)
@@ -321,19 +332,45 @@ impl<'a, T: SeqWrite + 'a> EncoderImpl<'a, T> {
         };
 
         this.encode_metadata(metadata).await?;
-        this.state.files_offset = this.position();
+        let state = this.state_mut()?;
+        state.files_offset = state.position();
 
         Ok(this)
     }
 
     fn check(&self) -> io::Result<()> {
-        match self.state.encode_error {
+        if self.finished {
+            io_bail!("unexpected encoder finished state");
+        }
+        let state = self.state()?;
+        match state.encode_error {
             Some(EncodeError::IncompleteFile) => io_bail!("incomplete file"),
             Some(EncodeError::IncompleteDirectory) => io_bail!("directory not finalized"),
             None => Ok(()),
         }
     }
 
+    fn state(&self) -> io::Result<&EncoderState> {
+        self.state
+            .last()
+            .ok_or_else(|| io_format_err!("encoder state stack underflow"))
+    }
+
+    fn state_mut(&mut self) -> io::Result<&mut EncoderState> {
+        self.state
+            .last_mut()
+            .ok_or_else(|| io_format_err!("encoder state stack underflow"))
+    }
+
+    fn output_state(&mut self) -> io::Result<(&mut T, &mut EncoderState)> {
+        Ok((
+            self.output.as_mut(),
+            self.state
+                .last_mut()
+                .ok_or_else(|| io_format_err!("encoder state stack underflow"))?,
+        ))
+    }
+
     pub async fn create_file<'b>(
         &'b mut self,
         metadata: &Metadata,
@@ -358,27 +395,27 @@ impl<'a, T: SeqWrite + 'a> EncoderImpl<'a, T> {
     {
         self.check()?;
 
-        let file_offset = self.position();
+        let file_offset = self.state()?.position();
         self.start_file_do(Some(metadata), file_name).await?;
 
         let header = format::Header::with_content_size(format::PXAR_PAYLOAD, file_size);
         header.check_header_size()?;
+        let (output, state) = self.output_state()?;
+        seq_write_struct(output, header, &mut state.write_position).await?;
 
-        seq_write_struct(self.output.as_mut(), header, &mut self.state.write_position).await?;
-
-        let payload_data_offset = self.position();
+        let payload_data_offset = state.position();
 
         let meta_size = payload_data_offset - file_offset;
 
         Ok(FileImpl {
-            output: self.output.as_mut(),
+            output,
             goodbye_item: GoodbyeItem {
                 hash: format::hash_filename(file_name),
                 offset: file_offset,
                 size: file_size + meta_size,
             },
             remaining_size: file_size,
-            parent: &mut self.state,
+            parent: state,
         })
     }
 
@@ -459,7 +496,7 @@ impl<'a, T: SeqWrite + 'a> EncoderImpl<'a, T> {
         target: &Path,
         target_offset: LinkOffset,
     ) -> io::Result<()> {
-        let current_offset = self.position();
+        let current_offset = self.state()?.position();
         if current_offset <= target_offset.0 {
             io_bail!("invalid hardlink offset, can only point to prior files");
         }
@@ -533,24 +570,20 @@ impl<'a, T: SeqWrite + 'a> EncoderImpl<'a, T> {
     ) -> io::Result<LinkOffset> {
         self.check()?;
 
-        let file_offset = self.position();
+        let file_offset = self.state()?.position();
 
         let file_name = file_name.as_os_str().as_bytes();
 
         self.start_file_do(metadata, file_name).await?;
+
+        let (output, state) = self.output_state()?;
         if let Some((htype, entry_data)) = entry_htype_data {
-            seq_write_pxar_entry(
-                self.output.as_mut(),
-                htype,
-                entry_data,
-                &mut self.state.write_position,
-            )
-            .await?;
+            seq_write_pxar_entry(output, htype, entry_data, &mut state.write_position).await?;
         }
 
-        let end_offset = self.position();
+        let end_offset = state.position();
 
-        self.state.items.push(GoodbyeItem {
+        state.items.push(GoodbyeItem {
             hash: format::hash_filename(file_name),
             offset: file_offset,
             size: end_offset - file_offset,
@@ -559,16 +592,11 @@ impl<'a, T: SeqWrite + 'a> EncoderImpl<'a, T> {
         Ok(LinkOffset(file_offset))
     }
 
-    #[inline]
-    fn position(&mut self) -> u64 {
-        self.state.write_position
-    }
-
     pub async fn create_directory(
         &mut self,
         file_name: &Path,
         metadata: &Metadata,
-    ) -> io::Result<EncoderImpl<'_, T>> {
+    ) -> io::Result<()> {
         self.check()?;
 
         if !metadata.is_dir() {
@@ -578,34 +606,30 @@ impl<'a, T: SeqWrite + 'a> EncoderImpl<'a, T> {
         let file_name = file_name.as_os_str().as_bytes();
         let file_hash = format::hash_filename(file_name);
 
-        let file_offset = self.position();
+        let file_offset = self.state()?.position();
         self.encode_filename(file_name).await?;
 
-        let entry_offset = self.position();
+        let entry_offset = self.state()?.position();
         self.encode_metadata(metadata).await?;
 
-        let files_offset = self.position();
+        let state = self.state_mut()?;
+        let files_offset = state.position();
 
         // the child will write to OUR state now:
-        let write_position = self.position();
-
-        let file_copy_buffer = Arc::clone(&self.file_copy_buffer);
-
-        Ok(EncoderImpl {
-            // always forward as Borrowed(), to avoid stacking references on nested calls
-            output: self.output.to_borrowed_mut(),
-            state: EncoderState {
-                entry_offset,
-                files_offset,
-                file_offset: Some(file_offset),
-                file_hash,
-                write_position,
-                ..Default::default()
-            },
-            parent: Some(&mut self.state),
+        let write_position = state.position();
+
+        self.state.push(EncoderState {
+            items: Vec::new(),
+            encode_error: None,
+            entry_offset,
+            files_offset,
+            file_offset: Some(file_offset),
+            file_hash,
+            write_position,
             finished: false,
-            file_copy_buffer,
-        })
+        });
+
+        Ok(())
     }
 
     async fn start_file_do(
@@ -621,11 +645,12 @@ impl<'a, T: SeqWrite + 'a> EncoderImpl<'a, T> {
     }
 
     async fn encode_metadata(&mut self, metadata: &Metadata) -> io::Result<()> {
+        let (output, state) = self.output_state()?;
         seq_write_pxar_struct_entry(
-            self.output.as_mut(),
+            output,
             format::PXAR_ENTRY,
             metadata.stat.clone(),
-            &mut self.state.write_position,
+            &mut state.write_position,
         )
         .await?;
 
@@ -647,72 +672,74 @@ impl<'a, T: SeqWrite + 'a> EncoderImpl<'a, T> {
     }
 
     async fn write_xattr(&mut self, xattr: &format::XAttr) -> io::Result<()> {
+        let (output, state) = self.output_state()?;
         seq_write_pxar_entry(
-            self.output.as_mut(),
+            output,
             format::PXAR_XATTR,
             &xattr.data,
-            &mut self.state.write_position,
+            &mut state.write_position,
         )
         .await
     }
 
     async fn write_acls(&mut self, acl: &crate::Acl) -> io::Result<()> {
+        let (output, state) = self.output_state()?;
         for acl in &acl.users {
             seq_write_pxar_struct_entry(
-                self.output.as_mut(),
+                output,
                 format::PXAR_ACL_USER,
                 acl.clone(),
-                &mut self.state.write_position,
+                &mut state.write_position,
             )
             .await?;
         }
 
         for acl in &acl.groups {
             seq_write_pxar_struct_entry(
-                self.output.as_mut(),
+                output,
                 format::PXAR_ACL_GROUP,
                 acl.clone(),
-                &mut self.state.write_position,
+                &mut state.write_position,
             )
             .await?;
         }
 
         if let Some(acl) = &acl.group_obj {
             seq_write_pxar_struct_entry(
-                self.output.as_mut(),
+                output,
                 format::PXAR_ACL_GROUP_OBJ,
                 acl.clone(),
-                &mut self.state.write_position,
+                &mut state.write_position,
             )
             .await?;
         }
 
         if let Some(acl) = &acl.default {
             seq_write_pxar_struct_entry(
-                self.output.as_mut(),
+                output,
                 format::PXAR_ACL_DEFAULT,
                 acl.clone(),
-                &mut self.state.write_position,
+                &mut state.write_position,
             )
             .await?;
         }
 
         for acl in &acl.default_users {
             seq_write_pxar_struct_entry(
-                self.output.as_mut(),
+                output,
                 format::PXAR_ACL_DEFAULT_USER,
                 acl.clone(),
-                &mut self.state.write_position,
+                &mut state.write_position,
             )
             .await?;
         }
 
         for acl in &acl.default_groups {
             seq_write_pxar_struct_entry(
-                self.output.as_mut(),
+                output,
                 format::PXAR_ACL_DEFAULT_GROUP,
                 acl.clone(),
-                &mut self.state.write_position,
+                &mut state.write_position,
             )
             .await?;
         }
@@ -721,11 +748,12 @@ impl<'a, T: SeqWrite + 'a> EncoderImpl<'a, T> {
     }
 
     async fn write_file_capabilities(&mut self, fcaps: &format::FCaps) -> io::Result<()> {
+        let (output, state) = self.output_state()?;
         seq_write_pxar_entry(
-            self.output.as_mut(),
+            output,
             format::PXAR_FCAPS,
             &fcaps.data,
-            &mut self.state.write_position,
+            &mut state.write_position,
         )
         .await
     }
@@ -734,66 +762,89 @@ impl<'a, T: SeqWrite + 'a> EncoderImpl<'a, T> {
         &mut self,
         quota_project_id: &format::QuotaProjectId,
     ) -> io::Result<()> {
+        let (output, state) = self.output_state()?;
         seq_write_pxar_struct_entry(
-            self.output.as_mut(),
+            output,
             format::PXAR_QUOTA_PROJID,
             *quota_project_id,
-            &mut self.state.write_position,
+            &mut state.write_position,
         )
         .await
     }
 
     async fn encode_filename(&mut self, file_name: &[u8]) -> io::Result<()> {
         crate::util::validate_filename(file_name)?;
+        let (output, state) = self.output_state()?;
         seq_write_pxar_entry_zero(
-            self.output.as_mut(),
+            output,
             format::PXAR_FILENAME,
             file_name,
-            &mut self.state.write_position,
+            &mut state.write_position,
         )
         .await
     }
 
-    pub async fn finish(mut self) -> io::Result<()> {
+    pub async fn close(mut self) -> io::Result<()> {
+        if !self.state.is_empty() {
+            io_bail!("unexpected state on encoder close");
+        }
+
+        if let EncoderOutput::Owned(output) = &mut self.output {
+            flush(output).await?;
+        }
+
+        self.finished = true;
+
+        Ok(())
+    }
+
+    pub async fn finish(&mut self) -> io::Result<()> {
         let tail_bytes = self.finish_goodbye_table().await?;
+        let mut state = self
+            .state
+            .pop()
+            .ok_or_else(|| io_format_err!("encoder state stack underflow"))?;
         seq_write_pxar_entry(
             self.output.as_mut(),
             format::PXAR_GOODBYE,
             &tail_bytes,
-            &mut self.state.write_position,
+            &mut state.write_position,
         )
         .await?;
 
-        if let EncoderOutput::Owned(output) = &mut self.output {
-            flush(output).await?;
-        }
+        let end_offset = state.position();
 
-        // done up here because of the self-borrow and to propagate
-        let end_offset = self.position();
-
-        if let Some(parent) = &mut self.parent {
+        let encode_error = state.finish();
+        if let Some(parent) = self.state.last_mut() {
             parent.write_position = end_offset;
 
-            let file_offset = self
-                .state
+            let file_offset = state
                 .file_offset
                 .expect("internal error: parent set but no file_offset?");
 
             parent.items.push(GoodbyeItem {
-                hash: self.state.file_hash,
+                hash: state.file_hash,
                 offset: file_offset,
                 size: end_offset - file_offset,
             });
+            // propagate errors
+            parent.merge_error(encode_error);
+            Ok(())
+        } else {
+            match encode_error {
+                Some(EncodeError::IncompleteFile) => io_bail!("incomplete file"),
+                Some(EncodeError::IncompleteDirectory) => io_bail!("directory not finalized"),
+                None => Ok(()),
+            }
         }
-        self.finished = true;
-        Ok(())
     }
 
     async fn finish_goodbye_table(&mut self) -> io::Result<Vec<u8>> {
-        let goodbye_offset = self.position();
+        let state = self.state_mut()?;
+        let goodbye_offset = state.position();
 
         // "take" out the tail (to not leave an array of endian-swapped structs in `self`)
-        let mut tail = take(&mut self.state.items);
+        let mut tail = take(&mut state.items);
         let tail_size = (tail.len() + 1) * size_of::<GoodbyeItem>();
         let goodbye_size = tail_size as u64 + size_of::<format::Header>() as u64;
 
@@ -818,7 +869,7 @@ impl<'a, T: SeqWrite + 'a> EncoderImpl<'a, T> {
         bst.push(
             GoodbyeItem {
                 hash: format::PXAR_GOODBYE_TAIL_MARKER,
-                offset: goodbye_offset - self.state.entry_offset,
+                offset: goodbye_offset - state.entry_offset,
                 size: goodbye_size,
             }
             .to_le(),
@@ -845,8 +896,8 @@ pub(crate) struct FileImpl<'a, S: SeqWrite> {
     /// exactly zero.
     remaining_size: u64,
 
-    /// The directory containing this file. This is where we propagate the `IncompleteFile` error
-    /// to, and where we insert our `GoodbyeItem`.
+    /// The directory stack with the last item being the directory containing this file. This is
+    /// where we propagate the `IncompleteFile` error to, and where we insert our `GoodbyeItem`.
     parent: &'a mut EncoderState,
 }
 
diff --git a/src/encoder/sync.rs b/src/encoder/sync.rs
index 1ec91b8..48a97af 100644
--- a/src/encoder/sync.rs
+++ b/src/encoder/sync.rs
@@ -99,17 +99,21 @@ impl<'a, T: SeqWrite + 'a> Encoder<'a, T> {
         &mut self,
         file_name: P,
         metadata: &Metadata,
-    ) -> io::Result<Encoder<'_, T>> {
-        Ok(Encoder {
-            inner: poll_result_once(self.inner.create_directory(file_name.as_ref(), metadata))?,
-        })
+    ) -> io::Result<()> {
+        poll_result_once(self.inner.create_directory(file_name.as_ref(), metadata))
     }
 
-    /// Finish this directory. This is mandatory, otherwise the `Drop` handler will `panic!`.
-    pub fn finish(self) -> io::Result<()> {
+    /// Finish this directory. This is mandatory, encodes the end for the current directory.
+    pub fn finish(&mut self) -> io::Result<()> {
         poll_result_once(self.inner.finish())
     }
 
+    /// Close the encoder instance. This is mandatory, encodes the end for the optional payload
+    /// output stream, if some is given
+    pub fn close(self) -> io::Result<()> {
+        poll_result_once(self.inner.close())
+    }
+
     /// Add a symbolic link to the archive.
     pub fn add_symlink<PF: AsRef<Path>, PT: AsRef<Path>>(
         &mut self,
diff --git a/tests/simple/fs.rs b/tests/simple/fs.rs
index 9a89c4d..4284805 100644
--- a/tests/simple/fs.rs
+++ b/tests/simple/fs.rs
@@ -144,12 +144,12 @@ impl Entry {
 
             EntryKind::Directory(entries) => {
                 self.no_hardlink()?;
-                let mut dir = encoder.create_directory(&self.name, &self.metadata)?;
+                encoder.create_directory(&self.name, &self.metadata)?;
                 let path = path.join(&self.name);
                 for entry in entries {
-                    entry.encode_into(&mut dir, hardlinks, &path)?;
+                    entry.encode_into(encoder, hardlinks, &path)?;
                 }
-                dir.finish()?;
+                encoder.finish()?;
             }
 
             EntryKind::Symlink(path) => {
diff --git a/tests/simple/main.rs b/tests/simple/main.rs
index d661c7d..e55457f 100644
--- a/tests/simple/main.rs
+++ b/tests/simple/main.rs
@@ -51,6 +51,9 @@ fn test1() {
     encoder
         .finish()
         .expect("failed to finish encoding the pxar archive");
+    encoder
+        .close()
+        .expect("failed to close the encoder instance");
 
     assert!(!file.is_empty(), "encoder did not write any data");
 
-- 
2.39.2



_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel


^ permalink raw reply	[flat|nested] 71+ messages in thread
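The stack-based state tracking in the patch above can be shown with a heavily simplified model. `create_directory` pushes a new state that continues at the parent's write position. `finish` pops it and records the child's offset and size in the parent. `close` succeeds only once the stack is empty. This is a sketch, not the pxar implementation. `StackEncoder`, `DirState` and `write` are hypothetical names, and the model writes no goodbye tables and produces no output.

```rust
/// Hypothetical, heavily simplified per-directory encoder state.
#[derive(Debug, Default)]
struct DirState {
    write_position: u64,
    /// (name, offset, size) entries, standing in for goodbye items
    items: Vec<(String, u64, u64)>,
    /// offset of this directory entry in its parent (None for the root)
    file_offset: Option<u64>,
    name: String,
}

/// Hypothetical encoder keeping one `DirState` per open directory level.
#[derive(Debug)]
struct StackEncoder {
    state: Vec<DirState>,
}

impl StackEncoder {
    fn new() -> Self {
        // the root directory state is present from the start
        StackEncoder { state: vec![DirState::default()] }
    }

    fn state_mut(&mut self) -> Result<&mut DirState, String> {
        self.state
            .last_mut()
            .ok_or_else(|| "encoder state stack underflow".to_string())
    }

    /// Simulate writing `len` bytes of entry data into the current directory.
    fn write(&mut self, len: u64) -> Result<(), String> {
        self.state_mut()?.write_position += len;
        Ok(())
    }

    /// Push a child state that continues at the parent's current position.
    fn create_directory(&mut self, name: &str) -> Result<(), String> {
        let pos = self.state_mut()?.write_position;
        self.state.push(DirState {
            write_position: pos,
            items: Vec::new(),
            file_offset: Some(pos),
            name: name.to_string(),
        });
        Ok(())
    }

    /// Pop the current state and propagate its position and item to the parent.
    fn finish(&mut self) -> Result<(), String> {
        let state = self
            .state
            .pop()
            .ok_or_else(|| "encoder state stack underflow".to_string())?;
        if let Some(parent) = self.state.last_mut() {
            let offset = state.file_offset.expect("nested directory without file_offset");
            parent.write_position = state.write_position;
            parent.items.push((state.name, offset, state.write_position - offset));
        }
        Ok(())
    }

    /// Only valid once every directory, including the root, has been finished.
    fn close(self) -> Result<(), String> {
        if !self.state.is_empty() {
            return Err("unexpected state on encoder close".into());
        }
        Ok(())
    }
}
```

As in the patch, a caller now calls `finish()` on the same encoder for every directory, including the root, and then `close()` once.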

* [pbs-devel] [PATCH v7 pxar 04/69] format/examples: add header type `PXAR_PAYLOAD_REF`
  2024-05-27 14:32 [pbs-devel] [PATCH v7 pxar proxmox-backup 00/69] fix #3174: improve file-level backup Christian Ebner
                   ` (2 preceding siblings ...)
  2024-05-27 14:32 ` [pbs-devel] [PATCH v7 pxar 03/69] encoder: move to stack based state tracking Christian Ebner
@ 2024-05-27 14:32 ` Christian Ebner
  2024-05-27 14:32 ` [pbs-devel] [PATCH v7 pxar 05/69] decoder: add method to read payload references Christian Ebner
                   ` (65 subsequent siblings)
  69 siblings, 0 replies; 71+ messages in thread
From: Christian Ebner @ 2024-05-27 14:32 UTC (permalink / raw)
  To: pbs-devel

Introduce the header type `PXAR_PAYLOAD_REF` to mark regular file
entry payloads that are not encoded within the regular pxar archive but
are instead redirected to a dedicated payload output writer.
For these entries it replaces the `PXAR_PAYLOAD` header type.

The header marks the start and size of a `PayloadRef` object in the
archive, which stores the offset of the payload header within the
payload stream of the dedicated payload output, as well as the payload
size.

The `PayloadRef` type provides the means to store, serialize and
deserialize this entry.
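Here is a minimal sketch of the serialized layout. It assumes the 16-byte little-endian encoding produced by `PayloadRef::data()` in the diff, with the offset first and then the size. The stand-alone struct and the `from_data` inverse are hypothetical illustrations, not part of the pxar crate. The 16 bytes also match `size_of::<PayloadRef>()`, which `check_header_size` expects as the content size for `PXAR_PAYLOAD_REF`.

```rust
use std::convert::TryInto;

/// Hypothetical stand-in for `format::PayloadRef`: offset of the payload
/// header in the payload stream, plus the payload size.
#[derive(Clone, Debug, PartialEq, Eq)]
struct PayloadRef {
    offset: u64,
    size: u64,
}

impl PayloadRef {
    /// Serialize as two little-endian u64 values (offset first, then size),
    /// mirroring `PayloadRef::data()` from the patch.
    fn data(&self) -> Vec<u8> {
        let mut data = Vec::with_capacity(16);
        data.extend_from_slice(&self.offset.to_le_bytes());
        data.extend_from_slice(&self.size.to_le_bytes());
        data
    }

    /// Hypothetical inverse: parse exactly 16 little-endian bytes.
    fn from_data(data: &[u8]) -> Option<Self> {
        if data.len() != 16 {
            return None;
        }
        let offset = u64::from_le_bytes(data[0..8].try_into().ok()?);
        let size = u64::from_le_bytes(data[8..16].try_into().ok()?);
        Some(PayloadRef { offset, size })
    }
}
```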

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
changes since version 6:
- patch reordered

 examples/mk-format-hashes.rs |  5 +++++
 src/format/mod.rs            | 20 ++++++++++++++++++++
 2 files changed, 25 insertions(+)

diff --git a/examples/mk-format-hashes.rs b/examples/mk-format-hashes.rs
index 6e00654..83adb38 100644
--- a/examples/mk-format-hashes.rs
+++ b/examples/mk-format-hashes.rs
@@ -41,6 +41,11 @@ const CONSTANTS: &[(&str, &str, &str)] = &[
         "PXAR_PAYLOAD",
         "__PROXMOX_FORMAT_PXAR_PAYLOAD__",
     ),
+    (
+        "Marks the beginning of a payload reference for regular files",
+        "PXAR_PAYLOAD_REF",
+        "__PROXMOX_FORMAT_PXAR_PAYLOAD_REF__",
+    ),
     (
         "Marks item as entry of goodbye table",
         "PXAR_GOODBYE",
diff --git a/src/format/mod.rs b/src/format/mod.rs
index bfea9f6..5d7a652 100644
--- a/src/format/mod.rs
+++ b/src/format/mod.rs
@@ -22,6 +22,7 @@
 //!   * `FCAPS`             -- file capability in Linux disk format
 //!   * `QUOTA_PROJECT_ID`  -- the ext4/xfs quota project ID
 //!   * `PAYLOAD`           -- file contents, if it is one
+//!   * `PAYLOAD_REF`       -- reference to file offset in optional payload file (introduced in v2)
 //!   * `SYMLINK`           -- symlink target, if it is one
 //!   * `DEVICE`            -- device major/minor, if it is a block/char device
 //!
@@ -99,6 +100,8 @@ pub const PXAR_QUOTA_PROJID: u64 = 0xe07540e82f7d1cbb;
 pub const PXAR_HARDLINK: u64 = 0x51269c8422bd7275;
 /// Marks the beginning of the payload (actual content) of regular files
 pub const PXAR_PAYLOAD: u64 = 0x28147a1b0b7c1a25;
+/// Marks the beginning of a payload reference for regular files
+pub const PXAR_PAYLOAD_REF: u64 = 0x419d3d6bc4ba977e;
 /// Marks item as entry of goodbye table
 pub const PXAR_GOODBYE: u64 = 0x2fec4fa642d5731d;
 /// The end marker used in the GOODBYE object
@@ -152,6 +155,7 @@ impl Header {
             PXAR_QUOTA_PROJID => size_of::<QuotaProjectId>() as u64,
             PXAR_ENTRY => size_of::<Stat>() as u64,
             PXAR_PAYLOAD | PXAR_GOODBYE => u64::MAX - (size_of::<Self>() as u64),
+            PXAR_PAYLOAD_REF => size_of::<PayloadRef>() as u64,
             _ => u64::MAX - (size_of::<Self>() as u64),
         }
     }
@@ -192,6 +196,7 @@ impl Display for Header {
             PXAR_QUOTA_PROJID => "QUOTA_PROJID",
             PXAR_ENTRY => "ENTRY",
             PXAR_PAYLOAD => "PAYLOAD",
+            PXAR_PAYLOAD_REF => "PAYLOAD_REF",
             PXAR_GOODBYE => "GOODBYE",
             _ => "UNKNOWN",
         };
@@ -723,6 +728,21 @@ impl GoodbyeItem {
     }
 }
 
+/// References a regular file payload found in a separated payload archive
+#[derive(Clone, Debug, Endian)]
+pub struct PayloadRef {
+    pub offset: u64,
+    pub size: u64,
+}
+
+impl PayloadRef {
+    pub(crate) fn data(&self) -> Vec<u8> {
+        let mut data = self.offset.to_le_bytes().to_vec();
+        data.append(&mut self.size.to_le_bytes().to_vec());
+        data
+    }
+}
+
 /// Hash a file name for use in the goodbye table.
 pub fn hash_filename(name: &[u8]) -> u64 {
     use std::hash::Hasher;
-- 
2.39.2




* [pbs-devel] [PATCH v7 pxar 05/69] decoder: add method to read payload references
  2024-05-27 14:32 [pbs-devel] [PATCH v7 pxar proxmox-backup 00/69] fix #3174: improve file-level backup Christian Ebner
                   ` (3 preceding siblings ...)
  2024-05-27 14:32 ` [pbs-devel] [PATCH v7 pxar 04/69] format/examples: add header type `PXAR_PAYLOAD_REF` Christian Ebner
@ 2024-05-27 14:32 ` Christian Ebner
  2024-05-27 14:32 ` [pbs-devel] [PATCH v7 pxar 06/69] encoder: allow split output writer for archive creation Christian Ebner
                   ` (64 subsequent siblings)
  69 siblings, 0 replies; 71+ messages in thread
From: Christian Ebner @ 2024-05-27 14:32 UTC (permalink / raw)
  To: pbs-devel

This is in preparation for reading payloads from a dedicated payload
input stream.
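The following synchronous sketch shows what reading a payload reference involves. It is written against `std::io::Read` rather than pxar's `SeqRead`. It assumes a 16-byte entry header of little-endian `htype` and full size (header included), followed by the little-endian offset and size. The `read_payload_ref` method in this patch relies on the already decoded `current_header`. This hypothetical helper, by contrast, also parses the header itself.

```rust
use std::io::{self, Read};

const PXAR_PAYLOAD_REF: u64 = 0x419d3d6bc4ba977e;
/// Assumed header layout: htype (u64) + full_size (u64)
const HEADER_SIZE: u64 = 16;
/// offset (u64) + size (u64)
const PAYLOAD_REF_SIZE: u64 = 16;

fn read_u64_le<R: Read>(input: &mut R) -> io::Result<u64> {
    let mut buf = [0u8; 8];
    input.read_exact(&mut buf)?;
    Ok(u64::from_le_bytes(buf))
}

/// Hypothetical helper: read a header, verify that it announces a payload
/// reference of the expected size, then read the (offset, size) pair.
fn read_payload_ref<R: Read>(input: &mut R) -> io::Result<(u64, u64)> {
    let htype = read_u64_le(input)?;
    let full_size = read_u64_le(input)?;
    if htype != PXAR_PAYLOAD_REF {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "expected PAYLOAD_REF header"));
    }
    // same idea as `check_header_size`: the content must be exactly one reference
    if full_size != HEADER_SIZE + PAYLOAD_REF_SIZE {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "invalid PAYLOAD_REF size"));
    }
    let offset = read_u64_le(input)?;
    let size = read_u64_le(input)?;
    Ok((offset, size))
}
```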

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
changes since version 6:
- patch reordered

 src/decoder/mod.rs | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/src/decoder/mod.rs b/src/decoder/mod.rs
index 3c6d9ef..d19ffd1 100644
--- a/src/decoder/mod.rs
+++ b/src/decoder/mod.rs
@@ -664,6 +664,11 @@ impl<I: SeqRead> DecoderImpl<I> {
     async fn read_quota_project_id(&mut self) -> io::Result<format::QuotaProjectId> {
         self.read_simple_entry("quota project id").await
     }
+
+    async fn read_payload_ref(&mut self) -> io::Result<format::PayloadRef> {
+        self.current_header.check_header_size()?;
+        seq_read_entry(&mut self.input).await
+    }
 }
 
 /// Reader for file contents inside a pxar archive.
-- 
2.39.2




* [pbs-devel] [PATCH v7 pxar 06/69] encoder: allow split output writer for archive creation
  2024-05-27 14:32 [pbs-devel] [PATCH v7 pxar proxmox-backup 00/69] fix #3174: improve file-level backup Christian Ebner
                   ` (4 preceding siblings ...)
  2024-05-27 14:32 ` [pbs-devel] [PATCH v7 pxar 05/69] decoder: add method to read payload references Christian Ebner
@ 2024-05-27 14:32 ` Christian Ebner
  2024-05-27 14:32 ` [pbs-devel] [PATCH v7 pxar 07/69] decoder/accessor: allow for split input stream variant Christian Ebner
                   ` (63 subsequent siblings)
  69 siblings, 0 replies; 71+ messages in thread
From: Christian Ebner @ 2024-05-27 14:32 UTC (permalink / raw)
  To: pbs-devel

During regular pxar archive encoding, the payload of regular files is
written as part of the archive.

This patch instead allows passing a split writer variant, with a
dedicated payload writer instance, so that the payload is redirected to
a different output.
Separating the data and metadata streams allows payload data to be
reused efficiently by referencing its byte offset in the payload
stream, without having to re-encode it.

Whenever the payload of regular files is redirected to a dedicated
output writer, encode a payload reference header followed by the
information required to locate the payload, instead of adding the
regular payload header followed by the encoded payload to the archive.

This is in preparation for reusing payload chunks for unchanged files
of backups created via the proxmox-backup-client.
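The redirection can be sketched with plain `std::io::Write` outputs. `Output` is a hypothetical stand-in for `PxarVariant`. The header layout (little-endian type and full size, including the 16-byte header) is an assumption. Like the patch, the sketch records the payload reference offset before writing the payload header. The reference therefore points at the header that separates payload entries in the payload stream.

```rust
use std::io::{self, Write};

const PXAR_PAYLOAD: u64 = 0x28147a1b0b7c1a25;
const PXAR_PAYLOAD_REF: u64 = 0x419d3d6bc4ba977e;
/// Assumed header layout: htype (u64) + full_size (u64)
const HEADER_SIZE: u64 = 16;

/// Hypothetical simplification of `PxarVariant`: one unified stream, or an
/// archive stream plus a dedicated payload stream.
enum Output<W> {
    Unified(W),
    Split(W, W),
}

/// Bytes written per stream, like `write_position` / `payload_write_position`.
#[derive(Default)]
struct Positions {
    archive: u64,
    payload: u64,
}

fn write_header<W: Write>(out: &mut W, htype: u64, content_size: u64, pos: &mut u64) -> io::Result<()> {
    out.write_all(&htype.to_le_bytes())?;
    // the full size includes the header itself (assumption)
    out.write_all(&(content_size + HEADER_SIZE).to_le_bytes())?;
    *pos += HEADER_SIZE;
    Ok(())
}

/// Encode a regular file's contents, redirecting them to the payload stream when split.
fn encode_file<W: Write>(output: &mut Output<W>, pos: &mut Positions, content: &[u8]) -> io::Result<()> {
    let size = content.len() as u64;
    match output {
        Output::Unified(archive) => {
            write_header(archive, PXAR_PAYLOAD, size, &mut pos.archive)?;
            archive.write_all(content)?;
            pos.archive += size;
        }
        Output::Split(archive, payload) => {
            // reference the position *before* the payload header
            let payload_offset = pos.payload;
            write_header(payload, PXAR_PAYLOAD, size, &mut pos.payload)?;
            payload.write_all(content)?;
            pos.payload += size;
            // the archive only receives a PAYLOAD_REF entry: (offset, size) as LE u64
            write_header(archive, PXAR_PAYLOAD_REF, 16, &mut pos.archive)?;
            archive.write_all(&payload_offset.to_le_bytes())?;
            archive.write_all(&size.to_le_bytes())?;
            pos.archive += 16;
        }
    }
    Ok(())
}
```

In split mode with two files of 3 and 5 bytes, the second reference records offset 19: the first file's 16-byte payload header plus its 3 bytes of data.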

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
changes since version 6:
- patch reordered, use PxarVariant instead of optional payload output

 src/encoder/aio.rs  |  24 ++++---
 src/encoder/mod.rs  | 160 ++++++++++++++++++++++++++++++++------------
 src/encoder/sync.rs |  19 ++++--
 3 files changed, 148 insertions(+), 55 deletions(-)

diff --git a/src/encoder/aio.rs b/src/encoder/aio.rs
index f11e57c..610fce5 100644
--- a/src/encoder/aio.rs
+++ b/src/encoder/aio.rs
@@ -7,7 +7,7 @@ use std::task::{Context, Poll};
 
 use crate::encoder::{self, LinkOffset, SeqWrite};
 use crate::format;
-use crate::Metadata;
+use crate::{Metadata, PxarVariant};
 
 /// Asynchronous `pxar` encoder.
 ///
@@ -22,10 +22,10 @@ impl<'a, T: tokio::io::AsyncWrite + 'a> Encoder<'a, TokioWriter<T>> {
     /// Encode a `pxar` archive into a `tokio::io::AsyncWrite` output.
     #[inline]
     pub async fn from_tokio(
-        output: T,
+        output: PxarVariant<T, T>,
         metadata: &Metadata,
     ) -> io::Result<Encoder<'a, TokioWriter<T>>> {
-        Encoder::new(TokioWriter::new(output), metadata).await
+        Encoder::new(output.wrap(|output| TokioWriter::new(output)), metadata).await
     }
 }
 
@@ -37,7 +37,9 @@ impl<'a> Encoder<'a, TokioWriter<tokio::fs::File>> {
         metadata: &'b Metadata,
     ) -> io::Result<Encoder<'a, TokioWriter<tokio::fs::File>>> {
         Encoder::new(
-            TokioWriter::new(tokio::fs::File::create(path.as_ref()).await?),
+            PxarVariant::Unified(TokioWriter::new(
+                tokio::fs::File::create(path.as_ref()).await?,
+            )),
             metadata,
         )
         .await
@@ -46,9 +48,10 @@ impl<'a> Encoder<'a, TokioWriter<tokio::fs::File>> {
 
 impl<'a, T: SeqWrite + 'a> Encoder<'a, T> {
     /// Create an asynchronous encoder for an output implementing our internal write interface.
-    pub async fn new(output: T, metadata: &Metadata) -> io::Result<Encoder<'a, T>> {
+    pub async fn new(output: PxarVariant<T, T>, metadata: &Metadata) -> io::Result<Encoder<'a, T>> {
+        let output = output.wrap_multi(|output| output.into(), |payload_output| payload_output);
         Ok(Self {
-            inner: encoder::EncoderImpl::new(output.into(), metadata).await?,
+            inner: encoder::EncoderImpl::new(output, metadata).await?,
         })
     }
 
@@ -294,9 +297,12 @@ mod test {
     /// Assert that `Encoder` is `Send`
     fn send_test() {
         let test = async {
-            let mut encoder = Encoder::new(DummyOutput, &Metadata::dir_builder(0o700).build())
-                .await
-                .unwrap();
+            let mut encoder = Encoder::new(
+                crate::PxarVariant::Unified(DummyOutput),
+                &Metadata::dir_builder(0o700).build(),
+            )
+            .await
+            .unwrap();
             {
                 encoder
                     .create_directory("baba", &Metadata::dir_builder(0o700).build())
diff --git a/src/encoder/mod.rs b/src/encoder/mod.rs
index 2bc3128..fbd90fe 100644
--- a/src/encoder/mod.rs
+++ b/src/encoder/mod.rs
@@ -17,8 +17,8 @@ use endian_trait::Endian;
 
 use crate::binary_tree_array;
 use crate::decoder::{self, SeqRead};
-use crate::format::{self, GoodbyeItem};
-use crate::Metadata;
+use crate::format::{self, GoodbyeItem, PayloadRef};
+use crate::{Metadata, PxarVariant};
 
 pub mod aio;
 pub mod sync;
@@ -222,6 +222,9 @@ struct EncoderState {
     /// We need to keep track how much we have written to get offsets.
     write_position: u64,
 
+    /// Track the bytes written to the payload writer
+    payload_write_position: u64,
+
     /// Mark the encoder state as correctly finished, ready to be dropped
     finished: bool,
 }
@@ -232,6 +235,11 @@ impl EncoderState {
         self.write_position
     }
 
+    #[inline]
+    fn payload_position(&self) -> u64 {
+        self.payload_write_position
+    }
+
     fn merge_error(&mut self, error: Option<EncodeError>) {
         // one error is enough:
         if self.encode_error.is_none() {
@@ -292,7 +300,7 @@ impl<'a, T> std::convert::From<&'a mut T> for EncoderOutput<'a, T> {
 /// We use `async fn` to implement the encoder state machine so that we can easily plug in both
 /// synchronous or `async` I/O objects in as output.
 pub(crate) struct EncoderImpl<'a, T: SeqWrite + 'a> {
-    output: EncoderOutput<'a, T>,
+    output: PxarVariant<EncoderOutput<'a, T>, T>,
     /// EncoderState stack storing the state for each directory level
     state: Vec<EncoderState>,
     finished: bool,
@@ -316,7 +324,7 @@ impl<'a, T: SeqWrite + 'a> Drop for EncoderImpl<'a, T> {
 
 impl<'a, T: SeqWrite + 'a> EncoderImpl<'a, T> {
     pub async fn new(
-        output: EncoderOutput<'a, T>,
+        output: PxarVariant<EncoderOutput<'a, T>, T>,
         metadata: &Metadata,
     ) -> io::Result<EncoderImpl<'a, T>> {
         if !metadata.is_dir() {
@@ -362,9 +370,16 @@ impl<'a, T: SeqWrite + 'a> EncoderImpl<'a, T> {
             .ok_or_else(|| io_format_err!("encoder state stack underflow"))
     }
 
-    fn output_state(&mut self) -> io::Result<(&mut T, &mut EncoderState)> {
+    fn output_state(&mut self) -> io::Result<(PxarVariant<&mut T, &mut T>, &mut EncoderState)> {
+        let output = match &mut self.output {
+            PxarVariant::Unified(output) => PxarVariant::Unified(output.as_mut()),
+            PxarVariant::Split(output, payload_output) => {
+                PxarVariant::Split(output.as_mut(), payload_output)
+            }
+        };
+
         Ok((
-            self.output.as_mut(),
+            output,
             self.state
                 .last_mut()
                 .ok_or_else(|| io_format_err!("encoder state stack underflow"))?,
@@ -398,10 +413,33 @@ impl<'a, T: SeqWrite + 'a> EncoderImpl<'a, T> {
         let file_offset = self.state()?.position();
         self.start_file_do(Some(metadata), file_name).await?;
 
-        let header = format::Header::with_content_size(format::PXAR_PAYLOAD, file_size);
-        header.check_header_size()?;
-        let (output, state) = self.output_state()?;
-        seq_write_struct(output, header, &mut state.write_position).await?;
+        let (mut output, state) = self.output_state()?;
+        if let Some(payload_output) = output.payload_mut() {
+            // payload references must point to the position prior to the payload header,
+            // separating payload entries in the payload stream
+            let payload_position = state.payload_position();
+
+            let header = format::Header::with_content_size(format::PXAR_PAYLOAD, file_size);
+            header.check_header_size()?;
+            seq_write_struct(payload_output, header, &mut state.payload_write_position).await?;
+
+            let payload_ref = PayloadRef {
+                offset: payload_position,
+                size: file_size,
+            };
+
+            seq_write_pxar_entry(
+                output.archive_mut(),
+                format::PXAR_PAYLOAD_REF,
+                &payload_ref.data(),
+                &mut state.write_position,
+            )
+            .await?;
+        } else {
+            let header = format::Header::with_content_size(format::PXAR_PAYLOAD, file_size);
+            header.check_header_size()?;
+            seq_write_struct(output.archive_mut(), header, &mut state.write_position).await?;
+        }
 
         let payload_data_offset = state.position();
 
@@ -576,9 +614,15 @@ impl<'a, T: SeqWrite + 'a> EncoderImpl<'a, T> {
 
         self.start_file_do(metadata, file_name).await?;
 
-        let (output, state) = self.output_state()?;
+        let (mut output, state) = self.output_state()?;
         if let Some((htype, entry_data)) = entry_htype_data {
-            seq_write_pxar_entry(output, htype, entry_data, &mut state.write_position).await?;
+            seq_write_pxar_entry(
+                output.archive_mut(),
+                htype,
+                entry_data,
+                &mut state.write_position,
+            )
+            .await?;
         }
 
         let end_offset = state.position();
@@ -617,6 +661,7 @@ impl<'a, T: SeqWrite + 'a> EncoderImpl<'a, T> {
 
         // the child will write to OUR state now:
         let write_position = state.position();
+        let payload_write_position = state.payload_position();
 
         self.state.push(EncoderState {
             items: Vec::new(),
@@ -626,6 +671,7 @@ impl<'a, T: SeqWrite + 'a> EncoderImpl<'a, T> {
             file_offset: Some(file_offset),
             file_hash,
             write_position,
+            payload_write_position,
             finished: false,
         });
 
@@ -645,9 +691,9 @@ impl<'a, T: SeqWrite + 'a> EncoderImpl<'a, T> {
     }
 
     async fn encode_metadata(&mut self, metadata: &Metadata) -> io::Result<()> {
-        let (output, state) = self.output_state()?;
+        let (mut output, state) = self.output_state()?;
         seq_write_pxar_struct_entry(
-            output,
+            output.archive_mut(),
             format::PXAR_ENTRY,
             metadata.stat.clone(),
             &mut state.write_position,
@@ -672,9 +718,9 @@ impl<'a, T: SeqWrite + 'a> EncoderImpl<'a, T> {
     }
 
     async fn write_xattr(&mut self, xattr: &format::XAttr) -> io::Result<()> {
-        let (output, state) = self.output_state()?;
+        let (mut output, state) = self.output_state()?;
         seq_write_pxar_entry(
-            output,
+            output.archive_mut(),
             format::PXAR_XATTR,
             &xattr.data,
             &mut state.write_position,
@@ -683,10 +729,10 @@ impl<'a, T: SeqWrite + 'a> EncoderImpl<'a, T> {
     }
 
     async fn write_acls(&mut self, acl: &crate::Acl) -> io::Result<()> {
-        let (output, state) = self.output_state()?;
+        let (mut output, state) = self.output_state()?;
         for acl in &acl.users {
             seq_write_pxar_struct_entry(
-                output,
+                output.archive_mut(),
                 format::PXAR_ACL_USER,
                 acl.clone(),
                 &mut state.write_position,
@@ -696,7 +742,7 @@ impl<'a, T: SeqWrite + 'a> EncoderImpl<'a, T> {
 
         for acl in &acl.groups {
             seq_write_pxar_struct_entry(
-                output,
+                output.archive_mut(),
                 format::PXAR_ACL_GROUP,
                 acl.clone(),
                 &mut state.write_position,
@@ -706,7 +752,7 @@ impl<'a, T: SeqWrite + 'a> EncoderImpl<'a, T> {
 
         if let Some(acl) = &acl.group_obj {
             seq_write_pxar_struct_entry(
-                output,
+                output.archive_mut(),
                 format::PXAR_ACL_GROUP_OBJ,
                 acl.clone(),
                 &mut state.write_position,
@@ -716,7 +762,7 @@ impl<'a, T: SeqWrite + 'a> EncoderImpl<'a, T> {
 
         if let Some(acl) = &acl.default {
             seq_write_pxar_struct_entry(
-                output,
+                output.archive_mut(),
                 format::PXAR_ACL_DEFAULT,
                 acl.clone(),
                 &mut state.write_position,
@@ -726,7 +772,7 @@ impl<'a, T: SeqWrite + 'a> EncoderImpl<'a, T> {
 
         for acl in &acl.default_users {
             seq_write_pxar_struct_entry(
-                output,
+                output.archive_mut(),
                 format::PXAR_ACL_DEFAULT_USER,
                 acl.clone(),
                 &mut state.write_position,
@@ -736,7 +782,7 @@ impl<'a, T: SeqWrite + 'a> EncoderImpl<'a, T> {
 
         for acl in &acl.default_groups {
             seq_write_pxar_struct_entry(
-                output,
+                output.archive_mut(),
                 format::PXAR_ACL_DEFAULT_GROUP,
                 acl.clone(),
                 &mut state.write_position,
@@ -748,9 +794,9 @@ impl<'a, T: SeqWrite + 'a> EncoderImpl<'a, T> {
     }
 
     async fn write_file_capabilities(&mut self, fcaps: &format::FCaps) -> io::Result<()> {
-        let (output, state) = self.output_state()?;
+        let (mut output, state) = self.output_state()?;
         seq_write_pxar_entry(
-            output,
+            output.archive_mut(),
             format::PXAR_FCAPS,
             &fcaps.data,
             &mut state.write_position,
@@ -762,9 +808,9 @@ impl<'a, T: SeqWrite + 'a> EncoderImpl<'a, T> {
         &mut self,
         quota_project_id: &format::QuotaProjectId,
     ) -> io::Result<()> {
-        let (output, state) = self.output_state()?;
+        let (mut output, state) = self.output_state()?;
         seq_write_pxar_struct_entry(
-            output,
+            output.archive_mut(),
             format::PXAR_QUOTA_PROJID,
             *quota_project_id,
             &mut state.write_position,
@@ -774,9 +820,9 @@ impl<'a, T: SeqWrite + 'a> EncoderImpl<'a, T> {
 
     async fn encode_filename(&mut self, file_name: &[u8]) -> io::Result<()> {
         crate::util::validate_filename(file_name)?;
-        let (output, state) = self.output_state()?;
+        let (mut output, state) = self.output_state()?;
         seq_write_pxar_entry_zero(
-            output,
+            output.archive_mut(),
             format::PXAR_FILENAME,
             file_name,
             &mut state.write_position,
@@ -789,7 +835,11 @@ impl<'a, T: SeqWrite + 'a> EncoderImpl<'a, T> {
             io_bail!("unexpected state on encoder close");
         }
 
-        if let EncoderOutput::Owned(output) = &mut self.output {
+        if let Some(output) = self.output.payload_mut() {
+            flush(output).await?;
+        }
+
+        if let EncoderOutput::Owned(output) = self.output.archive_mut() {
             flush(output).await?;
         }
 
@@ -805,7 +855,7 @@ impl<'a, T: SeqWrite + 'a> EncoderImpl<'a, T> {
             .pop()
             .ok_or_else(|| io_format_err!("encoder state stack underflow"))?;
         seq_write_pxar_entry(
-            self.output.as_mut(),
+            self.output.archive_mut().as_mut(),
             format::PXAR_GOODBYE,
             &tail_bytes,
             &mut state.write_position,
@@ -813,10 +863,12 @@ impl<'a, T: SeqWrite + 'a> EncoderImpl<'a, T> {
         .await?;
 
         let end_offset = state.position();
+        let payload_end_offset = state.payload_position();
 
         let encode_error = state.finish();
         if let Some(parent) = self.state.last_mut() {
             parent.write_position = end_offset;
+            parent.payload_write_position = payload_end_offset;
 
             let file_offset = state
                 .file_offset
@@ -886,7 +938,8 @@ impl<'a, T: SeqWrite + 'a> EncoderImpl<'a, T> {
 
 /// Writer for a file object in a directory.
 pub(crate) struct FileImpl<'a, S: SeqWrite> {
-    output: &'a mut S,
+    /// Optional write redirection of file payloads to this sequential stream
+    output: PxarVariant<&'a mut S, &'a mut S>,
 
     /// This file's `GoodbyeItem`. FIXME: We currently don't touch this, can we just push it
     /// directly instead of on Drop of FileImpl?
@@ -934,7 +987,7 @@ impl<'a, S: SeqWrite> FileImpl<'a, S> {
     ) -> Poll<io::Result<usize>> {
         let this = self.get_mut();
         this.check_remaining(data.len())?;
-        let output = unsafe { Pin::new_unchecked(&mut *this.output) };
+        let output = unsafe { Pin::new_unchecked(&mut *this.output.archive_mut()) };
         match output.poll_seq_write(cx, data) {
             Poll::Ready(Ok(put)) => {
                 this.remaining_size -= put as u64;
@@ -948,7 +1001,10 @@ impl<'a, S: SeqWrite> FileImpl<'a, S> {
     /// Poll flush interface to more easily connect to tokio/futures.
     #[cfg(feature = "tokio-io")]
     pub fn poll_flush(self: Pin<&mut Self>, cx: &mut Context) -> Poll<io::Result<()>> {
-        unsafe { self.map_unchecked_mut(|this| this.output).poll_flush(cx) }
+        unsafe {
+            self.map_unchecked_mut(|this| this.output.archive_mut())
+                .poll_flush(cx)
+        }
     }
 
     /// Poll close/shutdown interface to more easily connect to tokio/futures.
@@ -957,7 +1013,10 @@ impl<'a, S: SeqWrite> FileImpl<'a, S> {
     /// provided by our encoder.
     #[cfg(feature = "tokio-io")]
     pub fn poll_close(self: Pin<&mut Self>, cx: &mut Context) -> Poll<io::Result<()>> {
-        unsafe { self.map_unchecked_mut(|this| this.output).poll_flush(cx) }
+        unsafe {
+            self.map_unchecked_mut(|this| this.output.archive_mut())
+                .poll_flush(cx)
+        }
     }
 
     /// Write file data for the current file entry in a pxar archive.
@@ -967,19 +1026,38 @@ impl<'a, S: SeqWrite> FileImpl<'a, S> {
     /// for convenience.
     pub async fn write(&mut self, data: &[u8]) -> io::Result<usize> {
         self.check_remaining(data.len())?;
-        let put =
-            poll_fn(|cx| unsafe { Pin::new_unchecked(&mut self.output).poll_seq_write(cx, data) })
-                .await?;
-        //let put = seq_write(self.output.as_mut().unwrap(), data).await?;
+        let put = if let Some(mut output) = self.output.payload_mut() {
+            let put =
+                poll_fn(|cx| unsafe { Pin::new_unchecked(&mut output).poll_seq_write(cx, data) })
+                    .await?;
+            self.parent.payload_write_position += put as u64;
+            put
+        } else {
+            let put = poll_fn(|cx| unsafe {
+                Pin::new_unchecked(self.output.archive_mut()).poll_seq_write(cx, data)
+            })
+            .await?;
+            self.parent.write_position += put as u64;
+            put
+        };
+
         self.remaining_size -= put as u64;
-        self.parent.write_position += put as u64;
         Ok(put)
     }
 
     /// Completely write file data for the current file entry in a pxar archive.
     pub async fn write_all(&mut self, data: &[u8]) -> io::Result<()> {
         self.check_remaining(data.len())?;
-        seq_write_all(self.output, data, &mut self.parent.write_position).await?;
+        if let Some(ref mut output) = self.output.payload_mut() {
+            seq_write_all(output, data, &mut self.parent.payload_write_position).await?;
+        } else {
+            seq_write_all(
+                self.output.archive_mut(),
+                data,
+                &mut self.parent.write_position,
+            )
+            .await?;
+        }
         self.remaining_size -= data.len() as u64;
         Ok(())
     }
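With a payload output attached, the encoder writes the `PXAR_PAYLOAD` header and the file data to the payload stream. It records the payload position *before* that header, and the archive stream receives only a `PXAR_PAYLOAD_REF` entry carrying `(offset, size)`. The encoder state therefore tracks two positions: `write_position` for the archive stream and `payload_write_position` for the payload stream. Below is a simplified, self-contained model of that bookkeeping using in-memory `Vec<u8>` buffers. The 16-byte header layout (type and full size, both little-endian `u64`), the type tag values and the `SplitWriter` name are assumptions for illustration. They are not pxar's actual format constants or API.

```rust
// Placeholder type tags for illustration; NOT pxar's real format constants.
const HTYPE_PAYLOAD: u64 = 1;
const HTYPE_PAYLOAD_REF: u64 = 2;
/// Assumed header layout: entry type + full size (header + content), both u64 LE.
const HEADER_SIZE: u64 = 16;
/// A payload reference carries an offset and a size, both u64 LE.
const REF_CONTENT_SIZE: u64 = 16;

fn write_header(out: &mut Vec<u8>, htype: u64, content_size: u64) {
    out.extend_from_slice(&htype.to_le_bytes());
    out.extend_from_slice(&(HEADER_SIZE + content_size).to_le_bytes());
}

/// Hypothetical split-stream writer: metadata goes to `archive`,
/// regular file contents go to `payload`.
#[derive(Default)]
struct SplitWriter {
    archive: Vec<u8>,
    payload: Vec<u8>,
}

impl SplitWriter {
    /// Store one regular file and return the `(offset, size)` reference
    /// that was written to the archive stream.
    fn add_file(&mut self, data: &[u8]) -> (u64, u64) {
        // Reference the position *before* the payload header, so each
        // payload in the payload stream is delimited by its own header.
        let offset = self.payload.len() as u64;
        let size = data.len() as u64;
        write_header(&mut self.payload, HTYPE_PAYLOAD, size);
        self.payload.extend_from_slice(data); // advances the payload position only

        // The archive stream receives just a small PAYLOAD_REF entry.
        write_header(&mut self.archive, HTYPE_PAYLOAD_REF, REF_CONTENT_SIZE);
        self.archive.extend_from_slice(&offset.to_le_bytes());
        self.archive.extend_from_slice(&size.to_le_bytes());
        (offset, size)
    }
}

fn main() {
    let mut w = SplitWriter::default();
    assert_eq!(w.add_file(b"hello"), (0, 5));
    // The second payload starts after the first header (16) and its data (5).
    assert_eq!(w.add_file(b"world!"), (21, 6));
    assert_eq!(w.payload.len(), 21 + 16 + 6);
    // The archive holds two 32-byte PAYLOAD_REF entries and no file data.
    assert_eq!(w.archive.len(), 64);
}
```

Pointing the reference at the header rather than at the data keeps payloads separated in the payload stream, matching the comment in the encoder hunk above.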
diff --git a/src/encoder/sync.rs b/src/encoder/sync.rs
index 48a97af..9d39658 100644
--- a/src/encoder/sync.rs
+++ b/src/encoder/sync.rs
@@ -9,7 +9,7 @@ use crate::decoder::sync::StandardReader;
 use crate::encoder::{self, LinkOffset, SeqWrite};
 use crate::format;
 use crate::util::poll_result_once;
-use crate::Metadata;
+use crate::{Metadata, PxarVariant};
 
 /// Blocking `pxar` encoder.
 ///
@@ -28,7 +28,7 @@ impl<'a, T: io::Write + 'a> Encoder<'a, StandardWriter<T>> {
     /// Encode a `pxar` archive into a regular `std::io::Write` output.
     #[inline]
     pub fn from_std(output: T, metadata: &Metadata) -> io::Result<Encoder<'a, StandardWriter<T>>> {
-        Encoder::new(StandardWriter::new(output), metadata)
+        Encoder::new(PxarVariant::Unified(StandardWriter::new(output)), metadata)
     }
 }
 
@@ -39,7 +39,7 @@ impl<'a> Encoder<'a, StandardWriter<std::fs::File>> {
         metadata: &'b Metadata,
     ) -> io::Result<Encoder<'a, StandardWriter<std::fs::File>>> {
         Encoder::new(
-            StandardWriter::new(std::fs::File::create(path.as_ref())?),
+            PxarVariant::Unified(StandardWriter::new(std::fs::File::create(path.as_ref())?)),
             metadata,
         )
     }
@@ -50,9 +50,18 @@ impl<'a, T: SeqWrite + 'a> Encoder<'a, T> {
     ///
     /// Note that the `output`'s `SeqWrite` implementation must always return `Poll::Ready` and is
     /// not allowed to use the `Waker`, as this will cause a `panic!`.
-    pub fn new(output: T, metadata: &Metadata) -> io::Result<Self> {
+    /// Optionally attach a dedicated writer to redirect the payloads of regular files to a separate
+    /// output.
+    pub fn new(output: PxarVariant<T, T>, metadata: &Metadata) -> io::Result<Self> {
+        let output = match output {
+            PxarVariant::Unified(output) => PxarVariant::Unified(output.into()),
+            PxarVariant::Split(output, payload_output) => {
+                PxarVariant::Split(output.into(), payload_output)
+            }
+        };
+
         Ok(Self {
-            inner: poll_result_once(encoder::EncoderImpl::new(output.into(), metadata))?,
+            inner: poll_result_once(encoder::EncoderImpl::new(output, metadata))?,
         })
     }
 
-- 
2.39.2




* [pbs-devel] [PATCH v7 pxar 07/69] decoder/accessor: allow for split input stream variant
From: Christian Ebner @ 2024-05-27 14:32 UTC
  To: pbs-devel

When a pxar archive was encoded using the split stream output
variant, access to the payload of regular files has to be redirected
to the corresponding dedicated input.

Allow passing the split input variant to the decoder and accessor
instances, so that they handle the split streams accordingly and can
decode split stream archives.

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
changes since version 6:
- use PxarVariant instead of optional payload inputs
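The call sites in this patch use `PxarVariant` through `archive()`, `archive_mut()`, `payload()`, `payload_mut()`, `wrap()` and `wrap_multi()`. That usage suggests an enum holding either one unified stream, or an archive stream plus a dedicated payload stream. The following is a minimal sketch of such a type under that assumption. It is inferred from the call sites rather than copied from the crate, so the real definition may differ.

```rust
use std::ops::Range;

/// Hypothetical reconstruction: either one stream carrying everything,
/// or a metadata archive stream plus a dedicated payload stream.
#[derive(Clone, Debug, PartialEq)]
pub enum PxarVariant<T, P> {
    Unified(T),
    Split(T, P),
}

impl<T, P> PxarVariant<T, P> {
    /// The archive (metadata) stream is present in both variants.
    pub fn archive(&self) -> &T {
        match self {
            PxarVariant::Unified(a) | PxarVariant::Split(a, _) => a,
        }
    }

    pub fn archive_mut(&mut self) -> &mut T {
        match self {
            PxarVariant::Unified(a) | PxarVariant::Split(a, _) => a,
        }
    }

    /// The payload stream only exists for split archives.
    pub fn payload(&self) -> Option<&P> {
        match self {
            PxarVariant::Unified(_) => None,
            PxarVariant::Split(_, p) => Some(p),
        }
    }

    pub fn payload_mut(&mut self) -> Option<&mut P> {
        match self {
            PxarVariant::Unified(_) => None,
            PxarVariant::Split(_, p) => Some(p),
        }
    }

    /// Map the archive and payload parts with separate closures.
    pub fn wrap_multi<U, Q>(
        self,
        archive_fn: impl FnOnce(T) -> U,
        payload_fn: impl FnOnce(P) -> Q,
    ) -> PxarVariant<U, Q> {
        match self {
            PxarVariant::Unified(a) => PxarVariant::Unified(archive_fn(a)),
            PxarVariant::Split(a, p) => PxarVariant::Split(archive_fn(a), payload_fn(p)),
        }
    }
}

impl<T> PxarVariant<T, T> {
    /// Map both parts with the same closure when they share a type.
    pub fn wrap<U>(self, f: impl Fn(T) -> U) -> PxarVariant<U, U> {
        match self {
            PxarVariant::Unified(a) => PxarVariant::Unified(f(a)),
            PxarVariant::Split(a, p) => PxarVariant::Split(f(a), f(p)),
        }
    }
}

fn main() {
    // Mirrors AccessorImpl::new: the payload size becomes a 0..size range.
    let input: PxarVariant<&str, (&str, u64)> = PxarVariant::Split("archive", ("payload", 4096));
    let input: PxarVariant<&str, (&str, Range<u64>)> =
        input.wrap_multi(|archive| archive, |(payload, size)| (payload, 0..size));
    assert_eq!(*input.archive(), "archive");
    assert_eq!(input.payload(), Some(&("payload", 0..4096)));

    // A unified stream simply has no payload part.
    let unified: PxarVariant<&str, (&str, u64)> = PxarVariant::Unified("archive");
    assert!(unified.payload().is_none());
}
```

Carrying both streams in one value means that code which only needs metadata can call `archive()` without caring whether a payload stream exists.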

 examples/apxar.rs    |   2 +-
 src/accessor/aio.rs  |  10 ++--
 src/accessor/mod.rs  |  83 ++++++++++++++++++---------
 src/accessor/sync.rs |   8 +--
 src/decoder/aio.rs   |  13 +++--
 src/decoder/mod.rs   | 133 ++++++++++++++++++++++++++++++++++---------
 src/decoder/sync.rs  |  21 +++++--
 src/lib.rs           |   3 +
 tests/compat.rs      |   3 +-
 tests/simple/main.rs |   8 ++-
 10 files changed, 206 insertions(+), 78 deletions(-)

diff --git a/examples/apxar.rs b/examples/apxar.rs
index 0c62242..0dab51d 100644
--- a/examples/apxar.rs
+++ b/examples/apxar.rs
@@ -9,7 +9,7 @@ async fn main() {
         .await
         .expect("failed to open file");
 
-    let mut reader = Decoder::from_tokio(file)
+    let mut reader = Decoder::from_tokio(pxar::PxarVariant::Unified(file))
         .await
         .expect("failed to open pxar archive contents");
 
diff --git a/src/accessor/aio.rs b/src/accessor/aio.rs
index 98d7755..73b1025 100644
--- a/src/accessor/aio.rs
+++ b/src/accessor/aio.rs
@@ -18,7 +18,7 @@ use crate::accessor::{self, cache::Cache, MaybeReady, ReadAt, ReadAtOperation};
 use crate::decoder::aio::Decoder;
 use crate::format::GoodbyeItem;
 use crate::util;
-use crate::Entry;
+use crate::{Entry, PxarVariant};
 
 use super::sync::{FileReader, FileRefReader};
 
@@ -39,7 +39,7 @@ impl<T: FileExt> Accessor<FileReader<T>> {
     /// by a blocking file.
     #[inline]
     pub async fn from_file_and_size(input: T, size: u64) -> io::Result<Self> {
-        Accessor::new(FileReader::new(input), size).await
+        Accessor::new(PxarVariant::Unified(FileReader::new(input)), size).await
     }
 }
 
@@ -75,7 +75,7 @@ where
         input: T,
         size: u64,
     ) -> io::Result<Accessor<FileRefReader<T>>> {
-        Accessor::new(FileRefReader::new(input), size).await
+        Accessor::new(PxarVariant::Unified(FileRefReader::new(input)), size).await
     }
 }
 
@@ -85,7 +85,9 @@ impl<T: ReadAt> Accessor<T> {
     ///
     /// Note that the `input`'s `SeqRead` implementation must always return `Poll::Ready` and is
     /// not allowed to use the `Waker`, as this will cause a `panic!`.
-    pub async fn new(input: T, size: u64) -> io::Result<Self> {
+    /// Optionally take the file payloads from the provided input stream rather than the regular
+    /// pxar stream.
+    pub async fn new(input: PxarVariant<T, (T, u64)>, size: u64) -> io::Result<Self> {
         Ok(Self {
             inner: accessor::AccessorImpl::new(input, size).await?,
         })
diff --git a/src/accessor/mod.rs b/src/accessor/mod.rs
index 6a2de73..c061b74 100644
--- a/src/accessor/mod.rs
+++ b/src/accessor/mod.rs
@@ -19,7 +19,7 @@ use crate::binary_tree_array;
 use crate::decoder::{self, DecoderImpl};
 use crate::format::{self, GoodbyeItem};
 use crate::util;
-use crate::{Entry, EntryKind};
+use crate::{Entry, EntryKind, PxarVariant};
 
 pub mod aio;
 pub mod cache;
@@ -179,17 +179,22 @@ struct Caches {
 
 /// The random access state machine implementation.
 pub(crate) struct AccessorImpl<T> {
-    input: T,
+    input: PxarVariant<T, (T, Range<u64>)>,
     size: u64,
     caches: Arc<Caches>,
 }
 
 impl<T: ReadAt> AccessorImpl<T> {
-    pub async fn new(input: T, size: u64) -> io::Result<Self> {
+    pub async fn new(input: PxarVariant<T, (T, u64)>, size: u64) -> io::Result<Self> {
         if size < (size_of::<GoodbyeItem>() as u64) {
             io_bail!("too small to contain a pxar archive");
         }
 
+        let input = input.wrap_multi(
+            |input| input,
+            |(payload_input, size)| (payload_input, 0..size),
+        );
+
         Ok(Self {
             input,
             size,
@@ -202,13 +207,14 @@ impl<T: ReadAt> AccessorImpl<T> {
     }
 
     pub async fn open_root_ref(&self) -> io::Result<DirectoryImpl<&dyn ReadAt>> {
-        DirectoryImpl::open_at_end(
-            &self.input as &dyn ReadAt,
-            self.size,
-            "/".into(),
-            Arc::clone(&self.caches),
-        )
-        .await
+        let input = match &self.input {
+            PxarVariant::Unified(input) => PxarVariant::Unified(input as &dyn ReadAt),
+            PxarVariant::Split(input, (payload_input, range)) => PxarVariant::Split(
+                input as &dyn ReadAt,
+                (payload_input as &dyn ReadAt, range.clone()),
+            ),
+        };
+        DirectoryImpl::open_at_end(input, self.size, "/".into(), Arc::clone(&self.caches)).await
     }
 
     pub fn set_goodbye_table_cache(
@@ -224,21 +230,25 @@ impl<T: ReadAt> AccessorImpl<T> {
 }
 
 async fn get_decoder<T: ReadAt>(
-    input: T,
+    input: PxarVariant<T, (T, Range<u64>)>,
     entry_range: Range<u64>,
     path: PathBuf,
 ) -> io::Result<DecoderImpl<SeqReadAtAdapter<T>>> {
-    DecoderImpl::new_full(SeqReadAtAdapter::new(input, entry_range), path, true).await
+    let input = input.wrap_multi(
+        |input| SeqReadAtAdapter::new(input, entry_range.clone()),
+        |(payload_input, range)| SeqReadAtAdapter::new(payload_input, range),
+    );
+    DecoderImpl::new_full(input, path, true).await
 }
 
 // NOTE: This performs the Decoder::read_next_item() behavior! Keep in mind when changing!
 async fn get_decoder_at_filename<T: ReadAt>(
-    input: T,
+    input: PxarVariant<T, (T, Range<u64>)>,
     entry_range: Range<u64>,
     path: PathBuf,
 ) -> io::Result<(DecoderImpl<SeqReadAtAdapter<T>>, u64)> {
     // Read the header, it should be a FILENAME, then skip over it and its length:
-    let header: format::Header = read_entry_at(&input, entry_range.start).await?;
+    let header: format::Header = read_entry_at(input.archive(), entry_range.start).await?;
     header.check_header_size()?;
 
     if header.htype != format::PXAR_FILENAME {
@@ -293,6 +303,7 @@ impl<T: Clone + ReadAt> AccessorImpl<T> {
             .next()
             .await
             .ok_or_else(|| io_format_err!("unexpected EOF while decoding file entry"))??;
+
         Ok(FileEntryImpl {
             input: self.input.clone(),
             entry,
@@ -303,7 +314,11 @@ impl<T: Clone + ReadAt> AccessorImpl<T> {
 
     /// Allow opening arbitrary contents from a specific range.
     pub unsafe fn open_contents_at_range(&self, range: Range<u64>) -> FileContentsImpl<T> {
-        FileContentsImpl::new(self.input.clone(), range)
+        if let Some((payload_input, _)) = &self.input.payload() {
+            FileContentsImpl::new(payload_input.clone(), range)
+        } else {
+            FileContentsImpl::new(self.input.archive().clone(), range)
+        }
     }
 
     /// Following a hardlink breaks a couple of conventions we otherwise have, particularly we will
@@ -342,6 +357,7 @@ impl<T: Clone + ReadAt> AccessorImpl<T> {
             EntryKind::File {
                 offset: Some(offset),
                 size,
+                ..
             } => {
                 let meta_size = offset - link_offset;
                 let entry_end = link_offset + meta_size + size;
@@ -362,7 +378,7 @@ impl<T: Clone + ReadAt> AccessorImpl<T> {
 
 /// The directory random-access state machine implementation.
 pub(crate) struct DirectoryImpl<T> {
-    input: T,
+    input: PxarVariant<T, (T, Range<u64>)>,
     entry_ofs: u64,
     goodbye_ofs: u64,
     size: u64,
@@ -374,12 +390,12 @@ pub(crate) struct DirectoryImpl<T> {
 impl<T: Clone + ReadAt> DirectoryImpl<T> {
     /// Open a directory ending at the specified position.
     async fn open_at_end(
-        input: T,
+        input: PxarVariant<T, (T, Range<u64>)>,
         end_offset: u64,
         path: PathBuf,
         caches: Arc<Caches>,
     ) -> io::Result<DirectoryImpl<T>> {
-        let tail = Self::read_tail_entry(&input, end_offset).await?;
+        let tail = Self::read_tail_entry(input.archive(), end_offset).await?;
 
         if end_offset < tail.size {
             io_bail!("goodbye tail size out of range");
@@ -434,7 +450,7 @@ impl<T: Clone + ReadAt> DirectoryImpl<T> {
                 data.as_mut_ptr() as *mut u8,
                 len * size_of::<GoodbyeItem>(),
             );
-            read_exact_at(&self.input, slice, self.table_offset()).await?;
+            read_exact_at(self.input.archive(), slice, self.table_offset()).await?;
         }
         Ok(Arc::from(data))
     }
@@ -599,7 +615,8 @@ impl<T: Clone + ReadAt> DirectoryImpl<T> {
 
             let cursor = self.get_cursor(index).await?;
             if cursor.file_name == path {
-                return Ok(Some(cursor.decode_entry().await?));
+                let entry = cursor.decode_entry().await?;
+                return Ok(Some(entry));
             }
 
             dup += 1;
@@ -645,13 +662,13 @@ impl<T: Clone + ReadAt> DirectoryImpl<T> {
     }
 
     async fn read_filename_entry(&self, file_ofs: u64) -> io::Result<(PathBuf, u64)> {
-        let head: format::Header = read_entry_at(&self.input, file_ofs).await?;
+        let head: format::Header = read_entry_at(self.input.archive(), file_ofs).await?;
         if head.htype != format::PXAR_FILENAME {
             io_bail!("expected PXAR_FILENAME header, found: {}", head);
         }
 
         let mut path = read_exact_data_at(
-            &self.input,
+            self.input.archive(),
             head.content_size() as usize,
             file_ofs + (size_of_val(&head) as u64),
         )
@@ -681,7 +698,7 @@ impl<T: Clone + ReadAt> DirectoryImpl<T> {
 /// A file entry retrieved from a Directory.
 #[derive(Clone)]
 pub(crate) struct FileEntryImpl<T: Clone + ReadAt> {
-    input: T,
+    input: PxarVariant<T, (T, Range<u64>)>,
     entry: Entry,
     entry_range_info: EntryRangeInfo,
     caches: Arc<Caches>,
@@ -711,15 +728,29 @@ impl<T: Clone + ReadAt> FileEntryImpl<T> {
             EntryKind::File {
                 size,
                 offset: Some(offset),
+                payload_offset: None,
             } => Ok(Some(offset..(offset + size))),
+            // The payload offset, if set, takes precedence over the regular offset
+            EntryKind::File {
+                size,
+                offset: Some(_offset),
+                payload_offset: Some(payload_offset),
+            } => {
+                let start_offset = payload_offset + size_of::<format::Header>() as u64;
+                Ok(Some(start_offset..start_offset + size))
+            }
             _ => Ok(None),
         }
     }
 
     pub async fn contents(&self) -> io::Result<FileContentsImpl<T>> {
-        match self.content_range()? {
-            Some(range) => Ok(FileContentsImpl::new(self.input.clone(), range)),
-            None => io_bail!("not a file"),
+        let range = self
+            .content_range()?
+            .ok_or_else(|| io_format_err!("not a file"))?;
+        if let Some((ref payload_input, _)) = self.input.payload() {
+            Ok(FileContentsImpl::new(payload_input.clone(), range))
+        } else {
+            Ok(FileContentsImpl::new(self.input.archive().clone(), range))
         }
     }
 
diff --git a/src/accessor/sync.rs b/src/accessor/sync.rs
index a777152..df2ed23 100644
--- a/src/accessor/sync.rs
+++ b/src/accessor/sync.rs
@@ -12,7 +12,7 @@ use crate::accessor::{self, cache::Cache, MaybeReady, ReadAt, ReadAtOperation};
 use crate::decoder::Decoder;
 use crate::format::GoodbyeItem;
 use crate::util::poll_result_once;
-use crate::Entry;
+use crate::{Entry, PxarVariant};
 
 /// Blocking `pxar` random-access decoder.
 ///
@@ -31,7 +31,7 @@ impl<T: FileExt> Accessor<FileReader<T>> {
     /// Decode a `pxar` archive from a standard file implementing `FileExt`.
     #[inline]
     pub fn from_file_and_size(input: T, size: u64) -> io::Result<Self> {
-        Accessor::new(FileReader::new(input), size)
+        Accessor::new(PxarVariant::Unified(FileReader::new(input)), size)
     }
 }
 
@@ -64,7 +64,7 @@ where
 {
     /// Open an `Arc` or `Rc` of `File`.
     pub fn from_file_ref_and_size(input: T, size: u64) -> io::Result<Accessor<FileRefReader<T>>> {
-        Accessor::new(FileRefReader::new(input), size)
+        Accessor::new(PxarVariant::Unified(FileRefReader::new(input)), size)
     }
 }
 
@@ -74,7 +74,7 @@ impl<T: ReadAt> Accessor<T> {
     ///
     /// Note that the `input`'s `SeqRead` implementation must always return `Poll::Ready` and is
     /// not allowed to use the `Waker`, as this will cause a `panic!`.
-    pub fn new(input: T, size: u64) -> io::Result<Self> {
+    pub fn new(input: PxarVariant<T, (T, u64)>, size: u64) -> io::Result<Self> {
         Ok(Self {
             inner: poll_result_once(accessor::AccessorImpl::new(input, size))?,
         })
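`FileEntryImpl::content_range()` in `src/accessor/mod.rs` now distinguishes two cases. In a unified archive the stored offset already points at the file contents. In a split archive, `payload_offset` points at the `PXAR_PAYLOAD` header in the payload stream, so the contents start one header size later. `contents()` then reads that range from the payload input when one is present. The standalone sketch below shows the arithmetic. The `FileRef` struct is a hypothetical stand-in for the file variant of `EntryKind`, and the 16-byte header size is an assumption.

```rust
use std::ops::Range;

/// Assumed header size: two u64 fields (type and full size).
const HEADER_SIZE: u64 = 16;

/// Hypothetical stand-in for the file variant of pxar's `EntryKind`.
struct FileRef {
    size: u64,
    offset: Option<u64>,
    payload_offset: Option<u64>,
}

/// Byte range holding the file contents, in whichever stream they live in.
fn content_range(file: &FileRef) -> Option<Range<u64>> {
    match (file.offset, file.payload_offset) {
        // Split archive: skip the payload header in the payload stream.
        (Some(_), Some(payload_offset)) => {
            let start = payload_offset + HEADER_SIZE;
            Some(start..start + file.size)
        }
        // Unified archive: the offset already points at the contents.
        (Some(offset), None) => Some(offset..offset + file.size),
        _ => None,
    }
}

fn main() {
    let unified = FileRef { size: 5, offset: Some(100), payload_offset: None };
    let split = FileRef { size: 6, offset: Some(100), payload_offset: Some(21) };
    assert_eq!(content_range(&unified), Some(100..105));
    assert_eq!(content_range(&split), Some(37..43));
}
```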
diff --git a/src/decoder/aio.rs b/src/decoder/aio.rs
index 4de8c6f..3f9881d 100644
--- a/src/decoder/aio.rs
+++ b/src/decoder/aio.rs
@@ -6,7 +6,7 @@ use std::io;
 use std::path::Path;
 
 use crate::decoder::{self, Contents, SeqRead};
-use crate::Entry;
+use crate::{Entry, PxarVariant};
 
 /// Asynchronous `pxar` decoder.
 ///
@@ -20,8 +20,8 @@ pub struct Decoder<T> {
 impl<T: tokio::io::AsyncRead> Decoder<TokioReader<T>> {
     /// Decode a `pxar` archive from a `tokio::io::AsyncRead` input.
     #[inline]
-    pub async fn from_tokio(input: T) -> io::Result<Self> {
-        Decoder::new(TokioReader::new(input)).await
+    pub async fn from_tokio(input: PxarVariant<T, T>) -> io::Result<Self> {
+        Decoder::new(input.wrap(|input| TokioReader::new(input))).await
     }
 }
 
@@ -30,13 +30,16 @@ impl Decoder<TokioReader<tokio::fs::File>> {
     /// Decode a `pxar` archive from a `tokio::io::AsyncRead` input.
     #[inline]
     pub async fn open<P: AsRef<Path>>(path: P) -> io::Result<Self> {
-        Decoder::from_tokio(tokio::fs::File::open(path.as_ref()).await?).await
+        Decoder::from_tokio(PxarVariant::Unified(
+            tokio::fs::File::open(path.as_ref()).await?,
+        ))
+        .await
     }
 }
 
 impl<T: SeqRead> Decoder<T> {
     /// Create an async decoder from an input implementing our internal read interface.
-    pub async fn new(input: T) -> io::Result<Self> {
+    pub async fn new(input: PxarVariant<T, T>) -> io::Result<Self> {
         Ok(Self {
             inner: decoder::DecoderImpl::new(input).await?,
         })
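When a payload input is present, the decoder and accessor read regular file contents from it rather than from the archive input. The sketch below is a rough, random-access illustration. It resolves an `(offset, size)` payload reference against an in-memory payload stream and validates the referenced header before returning the data. The header layout, the type tag value and the `read_payload` helper are assumptions, not pxar's format constants or API. The sequential decoder in this patch does not seek; it keeps a running `payload_consumed` counter instead.

```rust
use std::io::{self, Cursor, Read, Seek, SeekFrom};

const HTYPE_PAYLOAD: u64 = 1; // placeholder, not the real pxar constant
const HEADER_SIZE: u64 = 16; // assumed: type + full size, both u64 LE

fn read_u64(r: &mut impl Read) -> io::Result<u64> {
    let mut buf = [0u8; 8];
    r.read_exact(&mut buf)?;
    Ok(u64::from_le_bytes(buf))
}

/// Resolve a payload reference: seek to the referenced header, check it
/// against the expected size, then read exactly `size` bytes of contents.
fn read_payload<R: Read + Seek>(payload: &mut R, offset: u64, size: u64) -> io::Result<Vec<u8>> {
    payload.seek(SeekFrom::Start(offset))?;
    let htype = read_u64(payload)?;
    let full_size = read_u64(payload)?;
    if htype != HTYPE_PAYLOAD || full_size != size + HEADER_SIZE {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "payload header mismatch"));
    }
    let mut data = vec![0u8; size as usize];
    payload.read_exact(&mut data)?;
    Ok(data)
}

fn main() -> io::Result<()> {
    // Build a payload stream holding a single entry: header + "hello".
    let mut stream = Vec::new();
    stream.extend_from_slice(&HTYPE_PAYLOAD.to_le_bytes());
    stream.extend_from_slice(&(5 + HEADER_SIZE).to_le_bytes());
    stream.extend_from_slice(b"hello");

    let mut cursor = Cursor::new(stream);
    assert_eq!(read_payload(&mut cursor, 0, 5)?, b"hello");
    // A size that disagrees with the header is rejected.
    assert!(read_payload(&mut cursor, 0, 4).is_err());
    Ok(())
}
```

Checking the header before returning data catches a stale or wrong reference early, instead of returning bytes from a neighbouring payload.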
diff --git a/src/decoder/mod.rs b/src/decoder/mod.rs
index d19ffd1..b5c17b8 100644
--- a/src/decoder/mod.rs
+++ b/src/decoder/mod.rs
@@ -19,7 +19,7 @@ use endian_trait::Endian;
 
 use crate::format::{self, Header};
 use crate::util::{self, io_err_other};
-use crate::{Entry, EntryKind, Metadata};
+use crate::{Entry, EntryKind, Metadata, PxarVariant};
 
 pub mod aio;
 pub mod sync;
@@ -150,13 +150,16 @@ async fn seq_read_entry<T: SeqRead + ?Sized, E: Endian>(input: &mut T) -> io::Re
 /// We use `async fn` to implement the decoder state machine so that we can easily plug in both
 /// synchronous or `async` I/O objects in as input.
 pub(crate) struct DecoderImpl<T> {
-    pub(crate) input: T,
+    // Payload of regular files might be provided by a different reader
+    pub(crate) input: PxarVariant<T, T>,
     current_header: Header,
     entry: Entry,
     path_lengths: Vec<usize>,
     state: State,
     with_goodbye_tables: bool,
 
+    payload_consumed: u64,
+
     /// The random access code uses decoders for sub-ranges which may not end in a `PAYLOAD` for
     /// entries like FIFOs or sockets, so there we explicitly allow an item to terminate with EOF.
     eof_after_entry: bool,
@@ -167,6 +170,7 @@ enum State {
     Default,
     InPayload {
         offset: u64,
+        size: u64,
     },
 
     /// file entries with no data (fifo, socket)
@@ -195,16 +199,16 @@ pub(crate) enum ItemResult {
 }
 
 impl<I: SeqRead> DecoderImpl<I> {
-    pub async fn new(input: I) -> io::Result<Self> {
+    pub async fn new(input: PxarVariant<I, I>) -> io::Result<Self> {
         Self::new_full(input, "/".into(), false).await
     }
 
     pub(crate) fn input(&self) -> &I {
-        &self.input
+        self.input.archive()
     }
 
     pub(crate) async fn new_full(
-        input: I,
+        input: PxarVariant<I, I>,
         path: PathBuf,
         eof_after_entry: bool,
     ) -> io::Result<Self> {
@@ -219,6 +223,7 @@ impl<I: SeqRead> DecoderImpl<I> {
             path_lengths: Vec::new(),
             state: State::Begin,
             with_goodbye_tables: false,
+            payload_consumed: 0,
             eof_after_entry,
         };
 
@@ -242,9 +247,14 @@ impl<I: SeqRead> DecoderImpl<I> {
                     // hierarchy and parse the next PXAR_FILENAME or the PXAR_GOODBYE:
                     self.read_next_item().await?;
                 }
-                State::InPayload { offset } => {
-                    // We need to skip the current payload first.
-                    self.skip_entry(offset).await?;
+                State::InPayload { offset, .. } => {
+                    if self.input.payload().is_some() {
+                        // Update consumed payload as given by the offset referenced by the content reader
+                        self.payload_consumed += offset;
+                    } else {
+                        // Skip remaining payload of current entry in regular stream
+                        self.skip_entry(offset).await?;
+                    }
                     self.read_next_item().await?;
                 }
                 State::InGoodbyeTable => {
@@ -300,20 +310,28 @@ impl<I: SeqRead> DecoderImpl<I> {
     }
 
     pub fn content_size(&self) -> Option<u64> {
-        if let State::InPayload { .. } = self.state {
-            Some(self.current_header.content_size())
+        if let State::InPayload { size, .. } = self.state {
+            if self.input.payload().is_some() {
+                Some(size)
+            } else {
+                Some(self.current_header.content_size())
+            }
         } else {
             None
         }
     }
 
     pub fn content_reader(&mut self) -> Option<Contents<I>> {
-        if let State::InPayload { offset } = &mut self.state {
-            Some(Contents::new(
-                &mut self.input,
-                offset,
-                self.current_header.content_size(),
-            ))
+        if let State::InPayload { offset, size } = &mut self.state {
+            if self.input.payload().is_some() {
+                Some(Contents::new(
+                    self.input.payload_mut().unwrap(),
+                    offset,
+                    *size,
+                ))
+            } else {
+                Some(Contents::new(self.input.archive_mut(), offset, *size))
+            }
         } else {
             None
         }
@@ -357,7 +375,7 @@ impl<I: SeqRead> DecoderImpl<I> {
         self.state = State::Default;
         self.entry.clear_data();
 
-        let header: Header = match seq_read_entry_or_eof(&mut self.input).await? {
+        let header: Header = match seq_read_entry_or_eof(self.input.archive_mut()).await? {
             None => return Ok(None),
             Some(header) => header,
         };
@@ -377,11 +395,11 @@ impl<I: SeqRead> DecoderImpl<I> {
         } else if header.htype == format::PXAR_ENTRY || header.htype == format::PXAR_ENTRY_V1 {
             if header.htype == format::PXAR_ENTRY {
                 self.entry.metadata = Metadata {
-                    stat: seq_read_entry(&mut self.input).await?,
+                    stat: seq_read_entry(self.input.archive_mut()).await?,
                     ..Default::default()
                 };
             } else if header.htype == format::PXAR_ENTRY_V1 {
-                let stat: format::Stat_V1 = seq_read_entry(&mut self.input).await?;
+                let stat: format::Stat_V1 = seq_read_entry(self.input.archive_mut()).await?;
 
                 self.entry.metadata = Metadata {
                     stat: stat.into(),
@@ -457,7 +475,7 @@ impl<I: SeqRead> DecoderImpl<I> {
             )
         };
 
-        match seq_read_exact_or_eof(&mut self.input, dest).await? {
+        match seq_read_exact_or_eof(self.input.archive_mut(), dest).await? {
             Some(()) => {
                 self.current_header.check_header_size()?;
                 Ok(Some(()))
@@ -527,12 +545,71 @@ impl<I: SeqRead> DecoderImpl<I> {
                 return Ok(ItemResult::Entry);
             }
             format::PXAR_PAYLOAD => {
-                let offset = seq_read_position(&mut self.input).await.transpose()?;
+                let offset = seq_read_position(self.input.archive_mut())
+                    .await
+                    .transpose()?;
                 self.entry.kind = EntryKind::File {
                     size: self.current_header.content_size(),
                     offset,
+                    payload_offset: None,
+                };
+                self.state = State::InPayload {
+                    offset: 0,
+                    size: self.current_header.content_size(),
+                };
+                return Ok(ItemResult::Entry);
+            }
+            format::PXAR_PAYLOAD_REF => {
+                let offset = seq_read_position(self.input.archive_mut())
+                    .await
+                    .transpose()?;
+                let payload_ref = self.read_payload_ref().await?;
+
+                if let Some(payload_input) = self.input.payload_mut() {
+                    if seq_read_position(payload_input)
+                        .await
+                        .transpose()?
+                        .is_none()
+                    {
+                        if self.payload_consumed > payload_ref.offset {
+                            io_bail!(
+                                "unexpected offset {}, smaller than already consumed payload {}",
+                                payload_ref.offset,
+                                self.payload_consumed,
+                            );
+                        }
+                        let to_skip = payload_ref.offset - self.payload_consumed;
+                        Self::skip(payload_input, to_skip as usize).await?;
+                        self.payload_consumed += to_skip;
+                    }
+
+                    let header: Header = seq_read_entry(payload_input).await?;
+                    if header.htype != format::PXAR_PAYLOAD {
+                        io_bail!(
+                            "unexpected header in payload input: expected {}, got {header}",
+                            format::PXAR_PAYLOAD,
+                        );
+                    }
+                    self.payload_consumed += size_of::<Header>() as u64;
+
+                    if header.content_size() != payload_ref.size {
+                        io_bail!(
+                            "encountered payload size mismatch: got {}, expected {}",
+                            header.content_size(),
+                            payload_ref.size,
+                        );
+                    }
+                }
+
+                self.entry.kind = EntryKind::File {
+                    size: payload_ref.size,
+                    offset,
+                    payload_offset: Some(payload_ref.offset),
+                };
+                self.state = State::InPayload {
+                    offset: 0,
+                    size: payload_ref.size,
                 };
-                self.state = State::InPayload { offset: 0 };
                 return Ok(ItemResult::Entry);
             }
             format::PXAR_FILENAME | format::PXAR_GOODBYE => {
@@ -564,7 +641,7 @@ impl<I: SeqRead> DecoderImpl<I> {
 
     async fn skip_entry(&mut self, offset: u64) -> io::Result<()> {
         let len = (self.current_header.content_size() - offset) as usize;
-        Self::skip(&mut self.input, len).await
+        Self::skip(self.input.archive_mut(), len).await
     }
 
     async fn skip(input: &mut I, mut len: usize) -> io::Result<()> {
@@ -581,7 +658,7 @@ impl<I: SeqRead> DecoderImpl<I> {
 
     async fn read_entry_as_bytes(&mut self) -> io::Result<Vec<u8>> {
         let size = usize::try_from(self.current_header.content_size()).map_err(io_err_other)?;
-        let data = seq_read_exact_data(&mut self.input, size).await?;
+        let data = seq_read_exact_data(self.input.archive_mut(), size).await?;
         Ok(data)
     }
 
@@ -598,7 +675,7 @@ impl<I: SeqRead> DecoderImpl<I> {
                 size_of::<T>(),
             );
         }
-        seq_read_entry(&mut self.input).await
+        seq_read_entry(self.input.archive_mut()).await
     }
 
     //
@@ -630,8 +707,8 @@ impl<I: SeqRead> DecoderImpl<I> {
         }
         let data_size = content_size - size_of::<u64>();
 
-        let offset: u64 = seq_read_entry(&mut self.input).await?;
-        let data = seq_read_exact_data(&mut self.input, data_size).await?;
+        let offset: u64 = seq_read_entry(self.input.archive_mut()).await?;
+        let data = seq_read_exact_data(self.input.archive_mut(), data_size).await?;
 
         Ok(format::Hardlink { offset, data })
     }
@@ -667,7 +744,7 @@ impl<I: SeqRead> DecoderImpl<I> {
 
     async fn read_payload_ref(&mut self) -> io::Result<format::PayloadRef> {
         self.current_header.check_header_size()?;
-        seq_read_entry(&mut self.input).await
+        seq_read_entry(self.input.archive_mut()).await
     }
 }
 
diff --git a/src/decoder/sync.rs b/src/decoder/sync.rs
index 5597a03..8779f87 100644
--- a/src/decoder/sync.rs
+++ b/src/decoder/sync.rs
@@ -7,7 +7,7 @@ use std::task::{Context, Poll};
 
 use crate::decoder::{self, SeqRead};
 use crate::util::poll_result_once;
-use crate::Entry;
+use crate::{Entry, PxarVariant};
 
 /// Blocking `pxar` decoder.
 ///
@@ -25,8 +25,8 @@ pub struct Decoder<T> {
 impl<T: io::Read> Decoder<StandardReader<T>> {
     /// Decode a `pxar` archive from a regular `std::io::Read` input.
     #[inline]
-    pub fn from_std(input: T) -> io::Result<Self> {
-        Decoder::new(StandardReader::new(input))
+    pub fn from_std(input: PxarVariant<T, T>) -> io::Result<Self> {
+        Decoder::new(input.wrap(|i| StandardReader::new(i)))
     }
 
     /// Get a direct reference to the reader contained inside the contained [`StandardReader`].
@@ -37,8 +37,15 @@ impl<T: io::Read> Decoder<StandardReader<T>> {
 
 impl Decoder<StandardReader<std::fs::File>> {
     /// Convenience shortcut for `File::open` followed by `Accessor::from_file`.
-    pub fn open<P: AsRef<Path>>(path: P) -> io::Result<Self> {
-        Self::from_std(std::fs::File::open(path.as_ref())?)
+    pub fn open<P: AsRef<Path>>(path: PxarVariant<P, P>) -> io::Result<Self> {
+        let input = match path {
+            PxarVariant::Split(input, payload_input) => PxarVariant::Split(
+                std::fs::File::open(input)?,
+                std::fs::File::open(payload_input)?,
+            ),
+            PxarVariant::Unified(input) => PxarVariant::Unified(std::fs::File::open(input)?),
+        };
+        Self::from_std(input)
     }
 }
 
@@ -47,7 +54,9 @@ impl<T: SeqRead> Decoder<T> {
     ///
     /// Note that the `input`'s `SeqRead` implementation must always return `Poll::Ready` and is
     /// not allowed to use the `Waker`, as this will cause a `panic!`.
-    pub fn new(input: T) -> io::Result<Self> {
+    /// The optional payload input must be used to restore regular file payloads for payload references
+    /// encountered within the archive.
+    pub fn new(input: PxarVariant<T, T>) -> io::Result<Self> {
         Ok(Self {
             inner: poll_result_once(decoder::DecoderImpl::new(input))?,
         })
diff --git a/src/lib.rs b/src/lib.rs
index f784c9e..bafdfe4 100644
--- a/src/lib.rs
+++ b/src/lib.rs
@@ -364,6 +364,9 @@ pub enum EntryKind {
 
         /// The file's byte offset inside the archive, if available.
         offset: Option<u64>,
+
+        /// The file's byte offset inside the payload stream, if available.
+        payload_offset: Option<u64>,
     },
 
     /// Directory entry. When iterating through an archive, the contents follow next.
diff --git a/tests/compat.rs b/tests/compat.rs
index 3b43e38..8f1b778 100644
--- a/tests/compat.rs
+++ b/tests/compat.rs
@@ -94,7 +94,8 @@ fn create_archive() -> io::Result<Vec<u8>> {
 fn test_archive() {
     let archive = create_archive().expect("failed to create test archive");
     let mut input = &archive[..];
-    let mut decoder = decoder::Decoder::from_std(&mut input).expect("failed to create decoder");
+    let mut decoder = decoder::Decoder::from_std(pxar::PxarVariant::Unified(&mut input))
+        .expect("failed to create decoder");
 
     let item = decoder
         .next()
diff --git a/tests/simple/main.rs b/tests/simple/main.rs
index e55457f..e403184 100644
--- a/tests/simple/main.rs
+++ b/tests/simple/main.rs
@@ -61,14 +61,16 @@ fn test1() {
     // std::fs::write("myarchive.pxar", &file).expect("failed to write out test archive");
 
     let mut input = &file[..];
-    let mut decoder = decoder::Decoder::from_std(&mut input).expect("failed to create decoder");
+    let mut decoder = decoder::Decoder::from_std(pxar::PxarVariant::Unified(&mut input))
+        .expect("failed to create decoder");
     let decoded_fs =
         fs::Entry::decode_from(&mut decoder).expect("failed to decode previously encoded archive");
 
     assert_eq!(test_fs, decoded_fs);
 
-    let accessor = accessor::Accessor::new(&file[..], file.len() as u64)
-        .expect("failed to create random access reader for encoded archive");
+    let accessor =
+        accessor::Accessor::new(pxar::PxarVariant::Unified(&file[..]), file.len() as u64)
+            .expect("failed to create random access reader for encoded archive");
 
     check_bunzip2(&accessor);
     check_run_special_files(&accessor);
-- 
2.39.2



_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel



* [pbs-devel] [PATCH v7 pxar 08/69] decoder: set payload input range when decoding via accessor
  2024-05-27 14:32 [pbs-devel] [PATCH v7 pxar proxmox-backup 00/69] fix #3174: improve file-level backup Christian Ebner
                   ` (6 preceding siblings ...)
  2024-05-27 14:32 ` [pbs-devel] [PATCH v7 pxar 07/69] decoder/accessor: allow for split input stream variant Christian Ebner
@ 2024-05-27 14:32 ` Christian Ebner
  2024-05-27 14:32 ` [pbs-devel] [PATCH v7 pxar 09/69] encoder: add payload reference capability Christian Ebner
                   ` (61 subsequent siblings)
  69 siblings, 0 replies; 71+ messages in thread
From: Christian Ebner @ 2024-05-27 14:32 UTC (permalink / raw)
  To: pbs-devel

When accessing the file contents via the sequential file restore, the
range of the payload contents cannot be inferred a priori, but needs
to be calculated based on the payload references encountered during
decoding.

Extend the `SeqRead` trait with an `update_range` method, which allows
setting the range of the payload reader instance. Implementing the
method for `SeqReadAtAdapter` ensures that the correct content range
is accessed.

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
changes since version 6:
- no changes
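
A minimal sketch of the two branches in the `PXAR_PAYLOAD_REF` handling,
using simplified stand-in types instead of the real pxar ones
(`RangedReader`, `payload_range` and `bytes_to_skip` are hypothetical
names). It assumes a 16-byte `Header` (a `u64` type tag plus a `u64`
full size). When the payload input reports its position, as
`SeqReadAtAdapter` does, the reader range is narrowed to the referenced
header plus contents; otherwise the decoder skips forward from the
payload bytes it has already consumed.

```rust
use std::io;
use std::ops::Range;

/// Assumed size of a pxar `Header`: a `u64` type tag plus a `u64` full size.
const HEADER_SIZE: u64 = 16;

/// Stand-in for `SeqRead`, modelling only the new `update_range` hook.
trait RangeUpdate {
    /// Default is a no-op, like the trait's default implementation.
    fn update_range(&mut self, _range: Range<u64>) {}
}

/// Hypothetical stand-in for `SeqReadAtAdapter`, which reads a fixed byte range.
struct RangedReader {
    range: Range<u64>,
}

impl RangeUpdate for RangedReader {
    fn update_range(&mut self, range: Range<u64>) {
        self.range = range;
    }
}

/// Range covering the `PXAR_PAYLOAD` header and the file contents following it.
fn payload_range(offset: u64, size: u64) -> Range<u64> {
    offset..offset + size + HEADER_SIZE
}

/// Sequential case: bytes to skip to reach the referenced payload entry.
fn bytes_to_skip(consumed: u64, offset: u64) -> io::Result<u64> {
    offset.checked_sub(consumed).ok_or_else(|| {
        io::Error::new(
            io::ErrorKind::InvalidData,
            format!("unexpected offset {offset}, smaller than already consumed payload {consumed}"),
        )
    })
}
```

Returning an error instead of underflowing keeps the sequential path
robust against payload references that point backwards.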

 src/accessor/mod.rs |  4 ++++
 src/decoder/mod.rs  | 10 ++++++++++
 2 files changed, 14 insertions(+)

diff --git a/src/accessor/mod.rs b/src/accessor/mod.rs
index c061b74..51e846a 100644
--- a/src/accessor/mod.rs
+++ b/src/accessor/mod.rs
@@ -1013,4 +1013,8 @@ impl<T: ReadAt> decoder::SeqRead for SeqReadAtAdapter<T> {
     fn poll_position(self: Pin<&mut Self>, _cx: &mut Context) -> Poll<Option<io::Result<u64>>> {
         Poll::Ready(Some(Ok(self.range.start)))
     }
+
+    fn update_range(&mut self, range: Range<u64>) {
+        self.range = range;
+    }
 }
diff --git a/src/decoder/mod.rs b/src/decoder/mod.rs
index b5c17b8..0fc3698 100644
--- a/src/decoder/mod.rs
+++ b/src/decoder/mod.rs
@@ -8,6 +8,7 @@ use std::ffi::OsString;
 use std::future::poll_fn;
 use std::io;
 use std::mem::{self, size_of, size_of_val, MaybeUninit};
+use std::ops::Range;
 use std::os::unix::ffi::{OsStrExt, OsStringExt};
 use std::path::{Path, PathBuf};
 use std::pin::Pin;
@@ -55,6 +56,11 @@ pub trait SeqRead {
     fn poll_position(self: Pin<&mut Self>, _cx: &mut Context) -> Poll<Option<io::Result<u64>>> {
         Poll::Ready(None)
     }
+
+    /// Update the range of the underlying reader (only has an effect for `SeqReadAtAdapter`)
+    fn update_range(&mut self, _range: Range<u64>) {
+        // nothing to be done, only implemented by `SeqReadAtAdapter`s
+    }
 }
 
 /// Allow using trait objects for generics taking a `SeqRead`:
@@ -581,6 +587,10 @@ impl<I: SeqRead> DecoderImpl<I> {
                         let to_skip = payload_ref.offset - self.payload_consumed;
                         Self::skip(payload_input, to_skip as usize).await?;
                         self.payload_consumed += to_skip;
+                    } else {
+                        let start = payload_ref.offset;
+                        let end = start + payload_ref.size + size_of::<Header>() as u64;
+                        payload_input.update_range(start..end);
                     }
 
                     let header: Header = seq_read_entry(payload_input).await?;
-- 
2.39.2



* [pbs-devel] [PATCH v7 pxar 09/69] encoder: add payload reference capability
  2024-05-27 14:32 [pbs-devel] [PATCH v7 pxar proxmox-backup 00/69] fix #3174: improve file-level backup Christian Ebner
                   ` (7 preceding siblings ...)
  2024-05-27 14:32 ` [pbs-devel] [PATCH v7 pxar 08/69] decoder: set payload input range when decoding via accessor Christian Ebner
@ 2024-05-27 14:32 ` Christian Ebner
  2024-05-27 14:32 ` [pbs-devel] [PATCH v7 pxar 10/69] encoder: add payload position capability Christian Ebner
                   ` (60 subsequent siblings)
  69 siblings, 0 replies; 71+ messages in thread
From: Christian Ebner @ 2024-05-27 14:32 UTC (permalink / raw)
  To: pbs-devel

Allow encoding a regular file as a payload reference pointing into a
separate payload archive, rather than encoding the payload within the
regular archive.

The PXAR_PAYLOAD_REF header is followed by the encoded payload offset
and size.

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
changes since version 6:
- no changes
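
For illustration, a sketch of the bytes produced for such an entry,
assuming the on-disk `Header` is a little-endian `u64` type tag followed
by a `u64` full size (header plus content), and that `PayloadRef` is two
little-endian `u64`s (offset, size). The constant is taken from
`format::PXAR_PAYLOAD_REF`; `encode_payload_ref` is a hypothetical
helper, not the encoder's actual code path (which goes through
`add_file_entry`).

```rust
/// Type tag copied from `format::PXAR_PAYLOAD_REF`.
const PXAR_PAYLOAD_REF: u64 = 0x419d3d6bc4ba977e;
/// Assumed header size: type tag (u64) + full size (u64).
const HEADER_SIZE: u64 = 16;
/// Assumed `PayloadRef` size: offset (u64) + size (u64).
const PAYLOAD_REF_SIZE: u64 = 16;

/// Simplified payload reference: where the file's `PXAR_PAYLOAD` entry starts
/// in the payload stream and how large its contents are.
struct PayloadRef {
    offset: u64,
    size: u64,
}

/// Serialize a header plus payload reference (layout assumed, see above).
fn encode_payload_ref(payload_ref: &PayloadRef) -> Vec<u8> {
    let mut out = Vec::with_capacity((HEADER_SIZE + PAYLOAD_REF_SIZE) as usize);
    // Header: type tag, then the full entry size including the header itself.
    out.extend_from_slice(&PXAR_PAYLOAD_REF.to_le_bytes());
    out.extend_from_slice(&(HEADER_SIZE + PAYLOAD_REF_SIZE).to_le_bytes());
    // Content: offset into the payload stream, then the file size.
    out.extend_from_slice(&payload_ref.offset.to_le_bytes());
    out.extend_from_slice(&payload_ref.size.to_le_bytes());
    out
}
```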

 src/encoder/aio.rs  | 18 ++++++++++++++-
 src/encoder/mod.rs  | 54 +++++++++++++++++++++++++++++++++++++++++++++
 src/encoder/sync.rs | 21 +++++++++++++++++-
 3 files changed, 91 insertions(+), 2 deletions(-)

diff --git a/src/encoder/aio.rs b/src/encoder/aio.rs
index 610fce5..23b2bbf 100644
--- a/src/encoder/aio.rs
+++ b/src/encoder/aio.rs
@@ -5,7 +5,7 @@ use std::path::Path;
 use std::pin::Pin;
 use std::task::{Context, Poll};
 
-use crate::encoder::{self, LinkOffset, SeqWrite};
+use crate::encoder::{self, LinkOffset, PayloadOffset, SeqWrite};
 use crate::format;
 use crate::{Metadata, PxarVariant};
 
@@ -95,6 +95,22 @@ impl<'a, T: SeqWrite + 'a> Encoder<'a, T> {
     //     ).await
     // }
 
+    /// Encode a payload reference pointing to given offset in the separate payload output
+    ///
+    /// Returns with error if the encoder instance has no separate payload output or encoding
+    /// failed.
+    pub async fn add_payload_ref(
+        &mut self,
+        metadata: &Metadata,
+        file_name: &Path,
+        file_size: u64,
+        payload_offset: PayloadOffset,
+    ) -> io::Result<LinkOffset> {
+        self.inner
+            .add_payload_ref(metadata, file_name.as_ref(), file_size, payload_offset)
+            .await
+    }
+
     /// Create a new subdirectory. Note that the subdirectory has to be finished by calling the
     /// `finish()` method, otherwise the entire archive will be in an error state.
     pub async fn create_directory<P: AsRef<Path>>(
diff --git a/src/encoder/mod.rs b/src/encoder/mod.rs
index fbd90fe..31933fc 100644
--- a/src/encoder/mod.rs
+++ b/src/encoder/mod.rs
@@ -38,6 +38,24 @@ impl LinkOffset {
     }
 }
 
+/// File reference used to create payload references.
+#[derive(Clone, Copy, Debug, Default, Eq, PartialEq, Ord, PartialOrd)]
+pub struct PayloadOffset(u64);
+
+impl PayloadOffset {
+    /// Get the raw byte offset within the payload stream.
+    #[inline]
+    pub fn raw(self) -> u64 {
+        self.0
+    }
+
+    /// Return a new PayloadOffset, positively shifted by offset
+    #[inline]
+    pub fn add(&self, offset: u64) -> Self {
+        Self(self.0 + offset)
+    }
+}
+
 /// Sequential write interface used by the encoder's state machine.
 ///
 /// This is our internal writer trait which is available for `std::io::Write` types in the
@@ -506,6 +524,42 @@ impl<'a, T: SeqWrite + 'a> EncoderImpl<'a, T> {
         Ok(offset)
     }
 
+    /// Encode a payload reference pointing to given offset in the separate payload output
+    ///
+    /// Returns a file offset usable with `add_hardlink`, or an error if the encoder instance
+    /// has no separate payload output or encoding failed.
+    pub async fn add_payload_ref(
+        &mut self,
+        metadata: &Metadata,
+        file_name: &Path,
+        file_size: u64,
+        payload_offset: PayloadOffset,
+    ) -> io::Result<LinkOffset> {
+        if self.output.payload().is_none() {
+            io_bail!("unable to add payload reference");
+        }
+
+        let offset = payload_offset.raw();
+        let payload_position = self.state()?.payload_position();
+        if offset < payload_position {
+            io_bail!("offset smaller than current position: {offset} < {payload_position}");
+        }
+
+        let payload_ref = PayloadRef {
+            offset,
+            size: file_size,
+        };
+        let this_offset: LinkOffset = self
+            .add_file_entry(
+                Some(metadata),
+                file_name,
+                Some((format::PXAR_PAYLOAD_REF, &payload_ref.data())),
+            )
+            .await?;
+
+        Ok(this_offset)
+    }
+
     /// Return a file offset usable with `add_hardlink`.
     pub async fn add_symlink(
         &mut self,
diff --git a/src/encoder/sync.rs b/src/encoder/sync.rs
index 9d39658..62bb5e1 100644
--- a/src/encoder/sync.rs
+++ b/src/encoder/sync.rs
@@ -6,7 +6,7 @@ use std::pin::Pin;
 use std::task::{Context, Poll};
 
 use crate::decoder::sync::StandardReader;
-use crate::encoder::{self, LinkOffset, SeqWrite};
+use crate::encoder::{self, LinkOffset, PayloadOffset, SeqWrite};
 use crate::format;
 use crate::util::poll_result_once;
 use crate::{Metadata, PxarVariant};
@@ -102,6 +102,25 @@ impl<'a, T: SeqWrite + 'a> Encoder<'a, T> {
         ))
     }
 
+    /// Encode a payload reference pointing to given offset in the separate payload output
+    ///
+    /// Returns with error if the encoder instance has no separate payload output or encoding
+    /// failed.
+    pub async fn add_payload_ref(
+        &mut self,
+        metadata: &Metadata,
+        file_name: &Path,
+        file_size: u64,
+        payload_offset: PayloadOffset,
+    ) -> io::Result<LinkOffset> {
+        poll_result_once(self.inner.add_payload_ref(
+            metadata,
+            file_name.as_ref(),
+            file_size,
+            payload_offset,
+        ))
+    }
+
     /// Create a new subdirectory. Note that the subdirectory has to be finished by calling the
     /// `finish()` method, otherwise the entire archive will be in an error state.
     pub fn create_directory<P: AsRef<Path>>(
-- 
2.39.2



* [pbs-devel] [PATCH v7 pxar 10/69] encoder: add payload position capability
  2024-05-27 14:32 [pbs-devel] [PATCH v7 pxar proxmox-backup 00/69] fix #3174: improve file-level backup Christian Ebner
                   ` (8 preceding siblings ...)
  2024-05-27 14:32 ` [pbs-devel] [PATCH v7 pxar 09/69] encoder: add payload reference capability Christian Ebner
@ 2024-05-27 14:32 ` Christian Ebner
  2024-05-27 14:32 ` [pbs-devel] [PATCH v7 pxar 11/69] encoder: add payload advance capability Christian Ebner
                   ` (59 subsequent siblings)
  69 siblings, 0 replies; 71+ messages in thread
From: Christian Ebner @ 2024-05-27 14:32 UTC (permalink / raw)
  To: pbs-devel

Allow reading the current payload offset of the dedicated payload
output stream. The proxmox-backup-client requires this offset to
calculate forced chunk boundaries when injecting reused payload chunks
into the payload stream.

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
changes since version 6:
- no changes
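
A simplified sketch of the position bookkeeping, assuming an encoder
state that only tracks `payload_write_position` (`EncoderState` and
`check_payload_ref` are hypothetical names). It mirrors
`PayloadOffset::raw()`/`add()` and the check performed by
`add_payload_ref`, which rejects references pointing before the current
payload position.

```rust
use std::io;

/// Stand-in mirroring the `PayloadOffset` newtype from this series.
#[derive(Clone, Copy, Debug, Default, Eq, PartialEq, Ord, PartialOrd)]
struct PayloadOffset(u64);

impl PayloadOffset {
    /// Raw byte offset within the payload stream.
    fn raw(self) -> u64 {
        self.0
    }

    /// New offset, shifted forward by `offset` bytes.
    fn add(&self, offset: u64) -> Self {
        Self(self.0 + offset)
    }
}

/// Hypothetical, simplified encoder state: only the payload position is tracked.
struct EncoderState {
    payload_write_position: u64,
}

impl EncoderState {
    /// Current position in the payload stream, like `payload_position()`.
    fn payload_position(&self) -> PayloadOffset {
        PayloadOffset(self.payload_write_position)
    }

    /// Same check as in `add_payload_ref`: a reference must not point
    /// before the current payload position.
    fn check_payload_ref(&self, payload_offset: PayloadOffset) -> io::Result<()> {
        let offset = payload_offset.raw();
        let position = self.payload_write_position;
        if offset < position {
            return Err(io::Error::new(
                io::ErrorKind::Other,
                format!("offset smaller than current position: {offset} < {position}"),
            ));
        }
        Ok(())
    }
}
```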

 src/encoder/aio.rs  | 5 +++++
 src/encoder/mod.rs  | 4 ++++
 src/encoder/sync.rs | 5 +++++
 3 files changed, 14 insertions(+)

diff --git a/src/encoder/aio.rs b/src/encoder/aio.rs
index 23b2bbf..524c281 100644
--- a/src/encoder/aio.rs
+++ b/src/encoder/aio.rs
@@ -75,6 +75,11 @@ impl<'a, T: SeqWrite + 'a> Encoder<'a, T> {
         })
     }
 
+    /// Get current position for payload stream
+    pub fn payload_position(&self) -> io::Result<PayloadOffset> {
+        self.inner.payload_position()
+    }
+
     // /// Convenience shortcut to add a *regular* file by path including its contents to the archive.
     // pub async fn add_file<P, F>(
     //     &mut self,
diff --git a/src/encoder/mod.rs b/src/encoder/mod.rs
index 31933fc..a3a6e9a 100644
--- a/src/encoder/mod.rs
+++ b/src/encoder/mod.rs
@@ -524,6 +524,10 @@ impl<'a, T: SeqWrite + 'a> EncoderImpl<'a, T> {
         Ok(offset)
     }
 
+    pub fn payload_position(&self) -> io::Result<PayloadOffset> {
+        Ok(PayloadOffset(self.state()?.payload_position()))
+    }
+
     /// Encode a payload reference pointing to given offset in the separate payload output
     ///
     /// Returns a file offset usable with `add_hardlink` or with error if the encoder instance has
diff --git a/src/encoder/sync.rs b/src/encoder/sync.rs
index 62bb5e1..75650e4 100644
--- a/src/encoder/sync.rs
+++ b/src/encoder/sync.rs
@@ -102,6 +102,11 @@ impl<'a, T: SeqWrite + 'a> Encoder<'a, T> {
         ))
     }
 
+    /// Get current payload position for payload stream
+    pub fn payload_position(&self) -> io::Result<PayloadOffset> {
+        self.inner.payload_position()
+    }
+
     /// Encode a payload reference pointing to given offset in the separate payload output
     ///
     /// Returns with error if the encoder instance has no separate payload output or encoding
-- 
2.39.2



* [pbs-devel] [PATCH v7 pxar 11/69] encoder: add payload advance capability
  2024-05-27 14:32 [pbs-devel] [PATCH v7 pxar proxmox-backup 00/69] fix #3174: improve file-level backup Christian Ebner
                   ` (9 preceding siblings ...)
  2024-05-27 14:32 ` [pbs-devel] [PATCH v7 pxar 10/69] encoder: add payload position capability Christian Ebner
@ 2024-05-27 14:32 ` Christian Ebner
  2024-05-27 14:32 ` [pbs-devel] [PATCH v7 pxar 12/69] encoder/format: finish payload stream with marker Christian Ebner
                   ` (58 subsequent siblings)
  69 siblings, 0 replies; 71+ messages in thread
From: Christian Ebner @ 2024-05-27 14:32 UTC (permalink / raw)
  To: pbs-devel

Allow advancing the payload writer position by a given size. This is
used to update the encoder's payload write position when injecting
reused chunks for files with unchanged metadata.

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
changes since version 6:
- no changes
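
A sketch of how the advanced position could be used, under the
assumption that a reused chunk range is injected unchanged at the
current write position. `PayloadWriter` and `relocated_offset` are
hypothetical, and the client-side logic is not part of this patch. The
real `advance()` takes a `PayloadOffset` rather than a plain `u64`.

```rust
/// Hypothetical, simplified payload writer that only tracks its position.
struct PayloadWriter {
    payload_write_position: u64,
}

impl PayloadWriter {
    /// Account for `size` bytes injected as reused chunks, without writing
    /// them through the encoder (mirrors `advance()` in this patch).
    fn advance(&mut self, size: u64) {
        self.payload_write_position += size;
    }
}

/// Illustrative helper (assumption): new offset of a file whose entry started at
/// `old_offset` in the previous payload stream, when the reused range beginning at
/// `old_range_start` is injected at `write_position`. `None` if the entry lies
/// before the reused range.
fn relocated_offset(write_position: u64, old_range_start: u64, old_offset: u64) -> Option<u64> {
    old_offset
        .checked_sub(old_range_start)
        .map(|delta| write_position + delta)
}
```

The offset has to be computed before calling `advance()`, since it is
relative to where the injected range begins.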

 src/encoder/aio.rs  | 5 +++++
 src/encoder/mod.rs  | 6 ++++++
 src/encoder/sync.rs | 5 +++++
 3 files changed, 16 insertions(+)

diff --git a/src/encoder/aio.rs b/src/encoder/aio.rs
index 524c281..46856b0 100644
--- a/src/encoder/aio.rs
+++ b/src/encoder/aio.rs
@@ -116,6 +116,11 @@ impl<'a, T: SeqWrite + 'a> Encoder<'a, T> {
             .await
     }
 
+    /// Add size to payload stream
+    pub fn advance(&mut self, size: PayloadOffset) -> io::Result<()> {
+        self.inner.advance(size)
+    }
+
     /// Create a new subdirectory. Note that the subdirectory has to be finished by calling the
     /// `finish()` method, otherwise the entire archive will be in an error state.
     pub async fn create_directory<P: AsRef<Path>>(
diff --git a/src/encoder/mod.rs b/src/encoder/mod.rs
index a3a6e9a..e89dc3b 100644
--- a/src/encoder/mod.rs
+++ b/src/encoder/mod.rs
@@ -564,6 +564,12 @@ impl<'a, T: SeqWrite + 'a> EncoderImpl<'a, T> {
         Ok(this_offset)
     }
 
+    /// Add size to payload stream
+    pub fn advance(&mut self, size: PayloadOffset) -> io::Result<()> {
+        self.state_mut()?.payload_write_position += size.raw();
+        Ok(())
+    }
+
     /// Return a file offset usable with `add_hardlink`.
     pub async fn add_symlink(
         &mut self,
diff --git a/src/encoder/sync.rs b/src/encoder/sync.rs
index 75650e4..5aa8d69 100644
--- a/src/encoder/sync.rs
+++ b/src/encoder/sync.rs
@@ -126,6 +126,11 @@ impl<'a, T: SeqWrite + 'a> Encoder<'a, T> {
         ))
     }
 
+    /// Add size to payload stream
+    pub fn advance(&mut self, size: PayloadOffset) -> io::Result<()> {
+        self.inner.advance(size)
+    }
+
     /// Create a new subdirectory. Note that the subdirectory has to be finished by calling the
     /// `finish()` method, otherwise the entire archive will be in an error state.
     pub fn create_directory<P: AsRef<Path>>(
-- 
2.39.2



* [pbs-devel] [PATCH v7 pxar 12/69] encoder/format: finish payload stream with marker
  2024-05-27 14:32 [pbs-devel] [PATCH v7 pxar proxmox-backup 00/69] fix #3174: improve file-level backup Christian Ebner
                   ` (10 preceding siblings ...)
  2024-05-27 14:32 ` [pbs-devel] [PATCH v7 pxar 11/69] encoder: add payload advance capability Christian Ebner
@ 2024-05-27 14:32 ` Christian Ebner
  2024-05-27 14:32 ` [pbs-devel] [PATCH v7 pxar 13/69] format: add payload stream start marker Christian Ebner
                   ` (57 subsequent siblings)
  69 siblings, 0 replies; 71+ messages in thread
From: Christian Ebner @ 2024-05-27 14:32 UTC (permalink / raw)
  To: pbs-devel

Mark the end of the optional payload stream. This ensures that at
least some bytes are written to the stream (empty archives are not
allowed by the Proxmox Backup Server) and that possibly injected chunks
are consumed before the stream is finished.

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
changes since version 6:
- no changes
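
A sketch of the bytes the tail marker adds, assuming the usual pxar
`Header` layout of a little-endian `u64` type tag followed by a `u64`
full size. Since the entry carries no content, its full size equals the
16-byte header; `write_tail_marker` is a hypothetical helper standing in
for `seq_write_pxar_entry`.

```rust
use std::io::{self, Write};

/// Value copied from `format::PXAR_PAYLOAD_TAIL_MARKER`.
const PXAR_PAYLOAD_TAIL_MARKER: u64 = 0x6c72b78b984c81b5;
/// Assumed header size: type tag (u64) + full size (u64).
const HEADER_SIZE: u64 = 16;

/// Write a content-less entry (header only) and return the number of bytes written.
fn write_tail_marker<W: Write>(out: &mut W) -> io::Result<u64> {
    out.write_all(&PXAR_PAYLOAD_TAIL_MARKER.to_le_bytes())?;
    // With empty content, the full size covers only the header itself.
    out.write_all(&HEADER_SIZE.to_le_bytes())?;
    Ok(HEADER_SIZE)
}
```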

 examples/mk-format-hashes.rs | 5 +++++
 src/encoder/mod.rs           | 8 ++++++++
 src/format/mod.rs            | 4 ++++
 3 files changed, 17 insertions(+)

diff --git a/examples/mk-format-hashes.rs b/examples/mk-format-hashes.rs
index 83adb38..de73df0 100644
--- a/examples/mk-format-hashes.rs
+++ b/examples/mk-format-hashes.rs
@@ -56,6 +56,11 @@ const CONSTANTS: &[(&str, &str, &str)] = &[
         "PXAR_GOODBYE_TAIL_MARKER",
         "__PROXMOX_FORMAT_PXAR_GOODBYE_TAIL_MARKER__",
     ),
+    (
+        "The end marker used in the separate payload stream",
+        "PXAR_PAYLOAD_TAIL_MARKER",
+        "__PROXMOX_FORMAT_PXAR_PAYLOAD_TAIL_MARKER__",
+    ),
 ];
 
 fn main() {
diff --git a/src/encoder/mod.rs b/src/encoder/mod.rs
index e89dc3b..54e147d 100644
--- a/src/encoder/mod.rs
+++ b/src/encoder/mod.rs
@@ -900,6 +900,14 @@ impl<'a, T: SeqWrite + 'a> EncoderImpl<'a, T> {
         }
 
         if let Some(output) = self.output.payload_mut() {
+            let mut dummy_writer = 0;
+            seq_write_pxar_entry(
+                output,
+                format::PXAR_PAYLOAD_TAIL_MARKER,
+                &[],
+                &mut dummy_writer,
+            )
+            .await?;
             flush(output).await?;
         }
 
diff --git a/src/format/mod.rs b/src/format/mod.rs
index 5d7a652..e451b0f 100644
--- a/src/format/mod.rs
+++ b/src/format/mod.rs
@@ -106,6 +106,8 @@ pub const PXAR_PAYLOAD_REF: u64 = 0x419d3d6bc4ba977e;
 pub const PXAR_GOODBYE: u64 = 0x2fec4fa642d5731d;
 /// The end marker used in the GOODBYE object
 pub const PXAR_GOODBYE_TAIL_MARKER: u64 = 0xef5eed5b753e1555;
+/// The end marker used in the separate payload stream
+pub const PXAR_PAYLOAD_TAIL_MARKER: u64 = 0x6c72b78b984c81b5;
 
 #[derive(Debug, Endian)]
 #[repr(C)]
@@ -156,6 +158,7 @@ impl Header {
             PXAR_ENTRY => size_of::<Stat>() as u64,
             PXAR_PAYLOAD | PXAR_GOODBYE => u64::MAX - (size_of::<Self>() as u64),
             PXAR_PAYLOAD_REF => size_of::<PayloadRef>() as u64,
+            PXAR_PAYLOAD_TAIL_MARKER => size_of::<Header>() as u64,
             _ => u64::MAX - (size_of::<Self>() as u64),
         }
     }
@@ -197,6 +200,7 @@ impl Display for Header {
             PXAR_ENTRY => "ENTRY",
             PXAR_PAYLOAD => "PAYLOAD",
             PXAR_PAYLOAD_REF => "PAYLOAD_REF",
+            PXAR_PAYLOAD_TAIL_MARKER => "PXAR_PAYLOAD_TAIL_MARKER",
             PXAR_GOODBYE => "GOODBYE",
             _ => "UNKNOWN",
         };
-- 
2.39.2



* [pbs-devel] [PATCH v7 pxar 13/69] format: add payload stream start marker
  2024-05-27 14:32 [pbs-devel] [PATCH v7 pxar proxmox-backup 00/69] fix #3174: improve file-level backup Christian Ebner
                   ` (11 preceding siblings ...)
  2024-05-27 14:32 ` [pbs-devel] [PATCH v7 pxar 12/69] encoder/format: finish payload stream with marker Christian Ebner
@ 2024-05-27 14:32 ` Christian Ebner
  2024-05-27 14:32 ` [pbs-devel] [PATCH v7 pxar 14/69] format/encoder/decoder: new pxar entry type `Version` Christian Ebner
                   ` (56 subsequent siblings)
  69 siblings, 0 replies; 71+ messages in thread
From: Christian Ebner @ 2024-05-27 14:32 UTC (permalink / raw)
  To: pbs-devel

Mark the beginning of the payload stream with a magic number. This allows
for version and file type detection.
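
As a rough illustration of the idea, not the pxar implementation itself: the marker
is a bare header with no content, as written by
`Header::with_content_size(PXAR_PAYLOAD_START_MARKER, 0)`. The sketch assumes the
header is a little-endian `htype: u64` followed by a `full_size: u64` (16 bytes).
`write_start_marker` and `check_start_marker` are hypothetical helpers working on
byte buffers. The byte count returned by the check corresponds to the initial
`payload_consumed` value that `DecoderImpl::new` now passes to `new_full`.

```rust
use std::io;

/// Start marker value, as added to `src/format/mod.rs` by this patch.
const PXAR_PAYLOAD_START_MARKER: u64 = 0x834c68c2194a4ed2;
/// Assumed header layout: `htype: u64` followed by `full_size: u64`, little-endian.
const HEADER_SIZE: u64 = 16;

/// Read a little-endian u64 at `offset`, or `None` if the buffer is too short.
fn read_u64_le(buf: &[u8], offset: usize) -> Option<u64> {
    let bytes = buf.get(offset..offset + 8)?;
    let mut word = [0u8; 8];
    word.copy_from_slice(bytes);
    Some(u64::from_le_bytes(word))
}

/// Hypothetical helper: write the start marker header (no content) to the payload stream.
fn write_start_marker(payload: &mut Vec<u8>) {
    payload.extend_from_slice(&PXAR_PAYLOAD_START_MARKER.to_le_bytes());
    // full size covers only the header itself
    payload.extend_from_slice(&HEADER_SIZE.to_le_bytes());
}

/// Hypothetical helper: verify the start marker and return how many bytes it occupies.
fn check_start_marker(payload: &[u8]) -> io::Result<u64> {
    let (htype, full_size) = match (read_u64_le(payload, 0), read_u64_le(payload, 8)) {
        (Some(htype), Some(full_size)) => (htype, full_size),
        _ => {
            return Err(io::Error::new(
                io::ErrorKind::UnexpectedEof,
                "payload input too short",
            ))
        }
    };
    if htype != PXAR_PAYLOAD_START_MARKER {
        return Err(io::Error::new(
            io::ErrorKind::InvalidData,
            format!(
                "unexpected header in payload input: expected {:#x}, got {:#x}",
                PXAR_PAYLOAD_START_MARKER, htype
            ),
        ));
    }
    Ok(full_size)
}
```

A payload stream that does not begin with the marker is rejected before any file
payload is read, which is what makes the marker useful for file type detection.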

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
changes since version 6:
- no changes

 examples/mk-format-hashes.rs |  5 +++++
 src/accessor/mod.rs          |  2 +-
 src/decoder/mod.rs           | 28 +++++++++++++++++++---------
 src/encoder/mod.rs           | 12 ++++++++++--
 src/format/mod.rs            |  2 ++
 5 files changed, 37 insertions(+), 12 deletions(-)

diff --git a/examples/mk-format-hashes.rs b/examples/mk-format-hashes.rs
index de73df0..35cff99 100644
--- a/examples/mk-format-hashes.rs
+++ b/examples/mk-format-hashes.rs
@@ -56,6 +56,11 @@ const CONSTANTS: &[(&str, &str, &str)] = &[
         "PXAR_GOODBYE_TAIL_MARKER",
         "__PROXMOX_FORMAT_PXAR_GOODBYE_TAIL_MARKER__",
     ),
+    (
+        "The start marker used in the separate payload stream",
+        "PXAR_PAYLOAD_START_MARKER",
+        "__PROXMOX_FORMAT_PXAR_PAYLOAD_START_MARKER__",
+    ),
     (
         "The end marker used in the separate payload stream",
         "PXAR_PAYLOAD_TAIL_MARKER",
diff --git a/src/accessor/mod.rs b/src/accessor/mod.rs
index 51e846a..c3a5e14 100644
--- a/src/accessor/mod.rs
+++ b/src/accessor/mod.rs
@@ -238,7 +238,7 @@ async fn get_decoder<T: ReadAt>(
         |input| SeqReadAtAdapter::new(input, entry_range.clone()),
         |(payload_input, range)| SeqReadAtAdapter::new(payload_input, range),
     );
-    DecoderImpl::new_full(input, path, true).await
+    DecoderImpl::new_full(input, path, true, 0).await
 }
 
 // NOTE: This performs the Decoder::read_next_item() behavior! Keep in mind when changing!
diff --git a/src/decoder/mod.rs b/src/decoder/mod.rs
index 0fc3698..19b1b5c 100644
--- a/src/decoder/mod.rs
+++ b/src/decoder/mod.rs
@@ -205,8 +205,21 @@ pub(crate) enum ItemResult {
 }
 
 impl<I: SeqRead> DecoderImpl<I> {
-    pub async fn new(input: PxarVariant<I, I>) -> io::Result<Self> {
-        Self::new_full(input, "/".into(), false).await
+    pub async fn new(mut input: PxarVariant<I, I>) -> io::Result<Self> {
+        let payload_consumed = if let Some(payload_input) = input.payload_mut() {
+            let header: Header = seq_read_entry(payload_input).await?;
+            if header.htype != format::PXAR_PAYLOAD_START_MARKER {
+                io_bail!(
+                    "unexpected header in payload input: expected {:#x?} , got {header:#x?}",
+                    format::PXAR_PAYLOAD_START_MARKER,
+                );
+            }
+            header.full_size()
+        } else {
+            0
+        };
+
+        Self::new_full(input, "/".into(), false, payload_consumed).await
     }
 
     pub(crate) fn input(&self) -> &I {
@@ -217,8 +230,9 @@ impl<I: SeqRead> DecoderImpl<I> {
         input: PxarVariant<I, I>,
         path: PathBuf,
         eof_after_entry: bool,
+        payload_consumed: u64,
     ) -> io::Result<Self> {
-        let this = DecoderImpl {
+        Ok(DecoderImpl {
             input,
             current_header: unsafe { mem::zeroed() },
             entry: Entry {
@@ -229,13 +243,9 @@ impl<I: SeqRead> DecoderImpl<I> {
             path_lengths: Vec::new(),
             state: State::Begin,
             with_goodbye_tables: false,
-            payload_consumed: 0,
+            payload_consumed,
             eof_after_entry,
-        };
-
-        // this.read_next_entry().await?;
-
-        Ok(this)
+        })
     }
 
     /// Get the next file entry, recursing into directories.
diff --git a/src/encoder/mod.rs b/src/encoder/mod.rs
index 54e147d..b579e18 100644
--- a/src/encoder/mod.rs
+++ b/src/encoder/mod.rs
@@ -342,15 +342,23 @@ impl<'a, T: SeqWrite + 'a> Drop for EncoderImpl<'a, T> {
 
 impl<'a, T: SeqWrite + 'a> EncoderImpl<'a, T> {
     pub async fn new(
-        output: PxarVariant<EncoderOutput<'a, T>, T>,
+        mut output: PxarVariant<EncoderOutput<'a, T>, T>,
         metadata: &Metadata,
     ) -> io::Result<EncoderImpl<'a, T>> {
         if !metadata.is_dir() {
             io_bail!("directory metadata must contain the directory mode flag");
         }
+
+        let mut state = EncoderState::default();
+        if let Some(payload_output) = output.payload_mut() {
+            let header = format::Header::with_content_size(format::PXAR_PAYLOAD_START_MARKER, 0);
+            header.check_header_size()?;
+            seq_write_struct(payload_output, header, &mut state.payload_write_position).await?;
+        }
+
         let mut this = Self {
             output,
-            state: vec![EncoderState::default()],
+            state: vec![state],
             finished: false,
             file_copy_buffer: Arc::new(Mutex::new(unsafe {
                 crate::util::vec_new_uninitialized(1024 * 1024)
diff --git a/src/format/mod.rs b/src/format/mod.rs
index e451b0f..6519bfc 100644
--- a/src/format/mod.rs
+++ b/src/format/mod.rs
@@ -106,6 +106,8 @@ pub const PXAR_PAYLOAD_REF: u64 = 0x419d3d6bc4ba977e;
 pub const PXAR_GOODBYE: u64 = 0x2fec4fa642d5731d;
 /// The end marker used in the GOODBYE object
 pub const PXAR_GOODBYE_TAIL_MARKER: u64 = 0xef5eed5b753e1555;
+/// The start marker used in the separate payload stream
+pub const PXAR_PAYLOAD_START_MARKER: u64 = 0x834c68c2194a4ed2;
 /// The end marker used in the separate payload stream
 pub const PXAR_PAYLOAD_TAIL_MARKER: u64 = 0x6c72b78b984c81b5;
 
-- 
2.39.2




* [pbs-devel] [PATCH v7 pxar 14/69] format/encoder/decoder: new pxar entry type `Version`
  2024-05-27 14:32 [pbs-devel] [PATCH v7 pxar proxmox-backup 00/69] fix #3174: improve file-level backup Christian Ebner
                   ` (12 preceding siblings ...)
  2024-05-27 14:32 ` [pbs-devel] [PATCH v7 pxar 13/69] format: add payload stream start marker Christian Ebner
@ 2024-05-27 14:32 ` Christian Ebner
  2024-05-27 14:32 ` [pbs-devel] [PATCH v7 pxar 15/69] format/encoder/decoder: new pxar entry type `Prelude` Christian Ebner
                   ` (55 subsequent siblings)
  69 siblings, 0 replies; 71+ messages in thread
From: Christian Ebner @ 2024-05-27 14:32 UTC (permalink / raw)
  To: pbs-devel

Introduces a new pxar format entry type `Version` and the associated
encoder and decoder methods. The format version entry is only allowed
once, as the first entry of the pxar archive, marked with a
`PXAR_FORMAT_VERSION` header followed by the encoded version number.
If the entry is not present, the archive is assumed to be encoded in
the default format version 1.

The entry allows incompatibilities with an encoded archive to be
detected early, so the decoder can bail or switch mode based on the
encountered version.

The format version entry is not backwards compatible with pxar format
version 1.
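
A minimal sketch of the encoding described above, assuming the same 16-byte
header layout (`htype: u64`, `full_size: u64`, little-endian). The version entry
is a `PXAR_FORMAT_VERSION` header whose full size includes an 8-byte version
number, 24 bytes in total. A reader that does not find it falls back to version 1.
`encode_version_entry` and `detect_version` are hypothetical helpers, not pxar API.
Unlike the decoder in this patch, which reads entries sequentially and rejects a
version entry anywhere but the start, they only inspect the first bytes of a buffer.

```rust
use std::io;

/// Header type value, as added to `src/format/mod.rs` by this patch.
const PXAR_FORMAT_VERSION: u64 = 0x730f6c75df16a40d;
/// Assumed header layout: `htype: u64` followed by `full_size: u64`, little-endian.
const HEADER_SIZE: usize = 16;

#[derive(Clone, Copy, Debug, PartialEq)]
enum FormatVersion {
    Version1,
    Version2,
}

/// Read a little-endian u64 at `offset`, or `None` if the buffer is too short.
fn read_u64_le(buf: &[u8], offset: usize) -> Option<u64> {
    let bytes = buf.get(offset..offset + 8)?;
    let mut word = [0u8; 8];
    word.copy_from_slice(bytes);
    Some(u64::from_le_bytes(word))
}

/// Write the version entry. Version 1 archives carry none, so nothing is written.
fn encode_version_entry(archive: &mut Vec<u8>, version: FormatVersion) {
    if version == FormatVersion::Version1 {
        return;
    }
    archive.extend_from_slice(&PXAR_FORMAT_VERSION.to_le_bytes());
    // full size covers the header plus the 8-byte version number
    archive.extend_from_slice(&((HEADER_SIZE + 8) as u64).to_le_bytes());
    archive.extend_from_slice(&2u64.to_le_bytes());
}

/// Return the format version and the size of the version entry
/// (0 if absent, in which case version 1 is assumed).
fn detect_version(archive: &[u8]) -> io::Result<(FormatVersion, usize)> {
    if read_u64_le(archive, 0) != Some(PXAR_FORMAT_VERSION) {
        return Ok((FormatVersion::Version1, 0));
    }
    match read_u64_le(archive, HEADER_SIZE) {
        Some(2) => Ok((FormatVersion::Version2, HEADER_SIZE + 8)),
        // Unknown versions are rejected up front instead of misparsing the archive.
        Some(v) => Err(io::Error::new(
            io::ErrorKind::InvalidData,
            format!("unexpected pxar format version {}", v),
        )),
        None => Err(io::Error::new(
            io::ErrorKind::UnexpectedEof,
            "truncated format version entry",
        )),
    }
}
```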

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
changes since version 6:
- no changes

 examples/mk-format-hashes.rs |  5 +++++
 src/accessor/mod.rs          | 21 ++++++++++++++++++--
 src/decoder/mod.rs           | 37 ++++++++++++++++++++++++++++++++++--
 src/encoder/mod.rs           | 37 +++++++++++++++++++++++++++++++++---
 src/format/mod.rs            | 11 +++++++++++
 src/lib.rs                   |  3 +++
 tests/simple/fs.rs           |  1 +
 7 files changed, 108 insertions(+), 7 deletions(-)

diff --git a/examples/mk-format-hashes.rs b/examples/mk-format-hashes.rs
index 35cff99..e5d69b1 100644
--- a/examples/mk-format-hashes.rs
+++ b/examples/mk-format-hashes.rs
@@ -1,6 +1,11 @@
 use pxar::format::hash_filename;
 
 const CONSTANTS: &[(&str, &str, &str)] = &[
+    (
+        "Pxar format version entry, fallback to version 1 if not present",
+        "PXAR_FORMAT_VERSION",
+        "__PROXMOX_FORMAT_VERSION__",
+    ),
     (
         "Beginning of an entry (current version).",
         "PXAR_ENTRY",
diff --git a/src/accessor/mod.rs b/src/accessor/mod.rs
index c3a5e14..faab430 100644
--- a/src/accessor/mod.rs
+++ b/src/accessor/mod.rs
@@ -299,11 +299,19 @@ impl<T: Clone + ReadAt> AccessorImpl<T> {
             PathBuf::new(),
         )
         .await?;
-        let entry = decoder
+        let mut entry = decoder
             .next()
             .await
             .ok_or_else(|| io_format_err!("unexpected EOF while decoding file entry"))??;
 
+        // Skip over possible Version and Prelude before the root entry of type Directory
+        if let EntryKind::Version(_) = entry.kind() {
+            entry = decoder
+                .next()
+                .await
+                .ok_or_else(|| io_format_err!("unexpected EOF while decoding directory entry"))??;
+        }
+
         Ok(FileEntryImpl {
             input: self.input.clone(),
             entry,
@@ -528,10 +536,19 @@ impl<T: Clone + ReadAt> DirectoryImpl<T> {
         file_name: Option<&Path>,
     ) -> io::Result<(Entry, DecoderImpl<SeqReadAtAdapter<T>>)> {
         let mut decoder = self.get_decoder(entry_range, file_name).await?;
-        let entry = decoder
+        let mut entry = decoder
             .next()
             .await
             .ok_or_else(|| io_format_err!("unexpected EOF while decoding directory entry"))??;
+
+        // Skip over possible Version and Prelude before the root entry of type Directory
+        if let EntryKind::Version(_) = entry.kind() {
+            entry = decoder
+                .next()
+                .await
+                .ok_or_else(|| io_format_err!("unexpected EOF while decoding directory entry"))??;
+        }
+
         Ok((entry, decoder))
     }
 
diff --git a/src/decoder/mod.rs b/src/decoder/mod.rs
index 19b1b5c..43c83ae 100644
--- a/src/decoder/mod.rs
+++ b/src/decoder/mod.rs
@@ -18,7 +18,7 @@ use std::task::{Context, Poll};
 
 use endian_trait::Endian;
 
-use crate::format::{self, Header};
+use crate::format::{self, FormatVersion, Header};
 use crate::util::{self, io_err_other};
 use crate::{Entry, EntryKind, Metadata, PxarVariant};
 
@@ -169,10 +169,14 @@ pub(crate) struct DecoderImpl<T> {
     /// The random access code uses decoders for sub-ranges which may not end in a `PAYLOAD` for
     /// entries like FIFOs or sockets, so there we explicitly allow an item to terminate with EOF.
     eof_after_entry: bool,
+    /// The format version as determined by the format version header
+    version: format::FormatVersion,
 }
 
+#[derive(Clone, PartialEq)]
 enum State {
     Begin,
+    Root,
     Default,
     InPayload {
         offset: u64,
@@ -245,6 +249,7 @@ impl<I: SeqRead> DecoderImpl<I> {
             with_goodbye_tables: false,
             payload_consumed,
             eof_after_entry,
+            version: FormatVersion::default(),
         })
     }
 
@@ -257,7 +262,19 @@ impl<I: SeqRead> DecoderImpl<I> {
         loop {
             match self.state {
                 State::Eof => return Ok(None),
-                State::Begin => return self.read_next_entry().await.map(Some),
+                State::Begin => {
+                    let entry = self.read_next_entry().await.map(Some);
+                    if let Ok(Some(ref entry)) = entry {
+                        if let EntryKind::Version(version) = entry.kind() {
+                            self.version = version.clone();
+                            self.state = State::Root;
+                        }
+                    }
+                    return entry;
+                }
+                State::Root => {
+                    return self.read_next_entry().await.map(Some);
+                }
                 State::Default => {
                     // we completely finished an entry, so now we're going "up" in the directory
                     // hierarchy and parse the next PXAR_FILENAME or the PXAR_GOODBYE:
@@ -388,6 +405,7 @@ impl<I: SeqRead> DecoderImpl<I> {
     }
 
     async fn read_next_entry_or_eof(&mut self) -> io::Result<Option<Entry>> {
+        let previous_state = self.state.clone();
         self.state = State::Default;
         self.entry.clear_data();
 
@@ -407,6 +425,14 @@ impl<I: SeqRead> DecoderImpl<I> {
             self.entry.metadata = Metadata::default();
             self.entry.kind = EntryKind::Hardlink(self.read_hardlink().await?);
 
+            Ok(Some(self.entry.take()))
+        } else if header.htype == format::PXAR_FORMAT_VERSION {
+            if previous_state != State::Begin {
+                io_bail!("Got format version entry at unexpected position");
+            }
+            self.current_header = header;
+            self.entry.kind = EntryKind::Version(self.read_format_version().await?);
+
             Ok(Some(self.entry.take()))
         } else if header.htype == format::PXAR_ENTRY || header.htype == format::PXAR_ENTRY_V1 {
             if header.htype == format::PXAR_ENTRY {
@@ -766,6 +792,13 @@ impl<I: SeqRead> DecoderImpl<I> {
         self.current_header.check_header_size()?;
         seq_read_entry(self.input.archive_mut()).await
     }
+
+    async fn read_format_version(&mut self) -> io::Result<format::FormatVersion> {
+        match seq_read_entry(self.input.archive_mut()).await? {
+            2u64 => Ok(format::FormatVersion::Version2),
+            version => io_bail!("unexpected pxar format version {version}"),
+        }
+    }
 }
 
 /// Reader for file contents inside a pxar archive.
diff --git a/src/encoder/mod.rs b/src/encoder/mod.rs
index b579e18..4bed040 100644
--- a/src/encoder/mod.rs
+++ b/src/encoder/mod.rs
@@ -17,7 +17,7 @@ use endian_trait::Endian;
 
 use crate::binary_tree_array;
 use crate::decoder::{self, SeqRead};
-use crate::format::{self, GoodbyeItem, PayloadRef};
+use crate::format::{self, FormatVersion, GoodbyeItem, PayloadRef};
 use crate::{Metadata, PxarVariant};
 
 pub mod aio;
@@ -326,6 +326,8 @@ pub(crate) struct EncoderImpl<'a, T: SeqWrite + 'a> {
     /// Since only the "current" entry can be actively writing files, we share the file copy
     /// buffer.
     file_copy_buffer: Arc<Mutex<Vec<u8>>>,
+    /// Pxar format version to encode
+    version: format::FormatVersion,
 }
 
 impl<'a, T: SeqWrite + 'a> Drop for EncoderImpl<'a, T> {
@@ -350,11 +352,14 @@ impl<'a, T: SeqWrite + 'a> EncoderImpl<'a, T> {
         }
 
         let mut state = EncoderState::default();
-        if let Some(payload_output) = output.payload_mut() {
+        let version = if let Some(payload_output) = output.payload_mut() {
             let header = format::Header::with_content_size(format::PXAR_PAYLOAD_START_MARKER, 0);
             header.check_header_size()?;
             seq_write_struct(payload_output, header, &mut state.payload_write_position).await?;
-        }
+            format::FormatVersion::Version2
+        } else {
+            format::FormatVersion::Version1
+        };
 
         let mut this = Self {
             output,
@@ -363,8 +368,10 @@ impl<'a, T: SeqWrite + 'a> EncoderImpl<'a, T> {
             file_copy_buffer: Arc::new(Mutex::new(unsafe {
                 crate::util::vec_new_uninitialized(1024 * 1024)
             })),
+            version,
         };
 
+        this.encode_format_version().await?;
         this.encode_metadata(metadata).await?;
         let state = this.state_mut()?;
         state.files_offset = state.position();
@@ -547,6 +554,10 @@ impl<'a, T: SeqWrite + 'a> EncoderImpl<'a, T> {
         file_size: u64,
         payload_offset: PayloadOffset,
     ) -> io::Result<LinkOffset> {
+        if self.version == FormatVersion::Version1 {
+            io_bail!("payload references not supported in pxar format version 1");
+        }
+
         if self.output.payload().is_none() {
             io_bail!("unable to add payload reference");
         }
@@ -762,6 +773,26 @@ impl<'a, T: SeqWrite + 'a> EncoderImpl<'a, T> {
         Ok(())
     }
 
+    async fn encode_format_version(&mut self) -> io::Result<()> {
+        let version_bytes = match self.version {
+            format::FormatVersion::Version1 => return Ok(()),
+            format::FormatVersion::Version2 => 2u64.to_le_bytes(),
+        };
+
+        let (mut output, state) = self.output_state()?;
+        if state.write_position != 0 {
+            io_bail!("pxar format version must be encoded at the beginning of an archive");
+        }
+
+        seq_write_pxar_entry(
+            output.archive_mut(),
+            format::PXAR_FORMAT_VERSION,
+            &version_bytes,
+            &mut state.write_position,
+        )
+        .await
+    }
+
     async fn encode_metadata(&mut self, metadata: &Metadata) -> io::Result<()> {
         let (mut output, state) = self.output_state()?;
         seq_write_pxar_struct_entry(
diff --git a/src/format/mod.rs b/src/format/mod.rs
index 6519bfc..9b66fe2 100644
--- a/src/format/mod.rs
+++ b/src/format/mod.rs
@@ -6,6 +6,7 @@
 //! item data.
 //!
 //! An archive contains items in the following order:
+//!  * `FORMAT_VERSION`     -- (optional for v1), version of encoding format
 //!  * `ENTRY`              -- containing general stat() data and related bits
 //!   * `XATTR`             -- one extended attribute
 //!   * ...                 -- more of these when there are multiple defined
@@ -80,6 +81,8 @@ pub mod mode {
 }
 
 // Generated by `cargo run --example mk-format-hashes`
+/// Pxar format version entry, fallback to version 1 if not present
+pub const PXAR_FORMAT_VERSION: u64 = 0x730f6c75df16a40d;
 /// Beginning of an entry (current version).
 pub const PXAR_ENTRY: u64 = 0xd5956474e588acef;
 /// Previous version of the entry struct
@@ -186,6 +189,7 @@ impl Header {
 impl Display for Header {
     fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
         let readable = match self.htype {
+            PXAR_FORMAT_VERSION => "FORMAT_VERSION",
             PXAR_FILENAME => "FILENAME",
             PXAR_SYMLINK => "SYMLINK",
             PXAR_HARDLINK => "HARDLINK",
@@ -551,6 +555,13 @@ impl From<&std::fs::Metadata> for Stat {
     }
 }
 
+#[derive(Clone, Debug, Default, PartialEq)]
+pub enum FormatVersion {
+    #[default]
+    Version1,
+    Version2,
+}
+
 #[derive(Clone, Debug)]
 pub struct Filename {
     pub name: Vec<u8>,
diff --git a/src/lib.rs b/src/lib.rs
index bafdfe4..7e5b48f 100644
--- a/src/lib.rs
+++ b/src/lib.rs
@@ -342,6 +342,9 @@ impl Acl {
 /// Identifies whether the entry is a file, symlink, directory, etc.
 #[derive(Clone, Debug)]
 pub enum EntryKind {
+    /// Pxar file format version
+    Version(format::FormatVersion),
+
     /// Symbolic links.
     Symlink(format::Symlink),
 
diff --git a/tests/simple/fs.rs b/tests/simple/fs.rs
index 4284805..8a8c607 100644
--- a/tests/simple/fs.rs
+++ b/tests/simple/fs.rs
@@ -229,6 +229,7 @@ impl Entry {
                     })?))
                 };
             match item.kind() {
+                PxarEntryKind::Version(_) => continue,
                 PxarEntryKind::GoodbyeTable => break,
                 PxarEntryKind::File { size, .. } => {
                     let mut data = Vec::new();
-- 
2.39.2




* [pbs-devel] [PATCH v7 pxar 15/69] format/encoder/decoder: new pxar entry type `Prelude`
  2024-05-27 14:32 [pbs-devel] [PATCH v7 pxar proxmox-backup 00/69] fix #3174: improve file-level backup Christian Ebner
                   ` (13 preceding siblings ...)
  2024-05-27 14:32 ` [pbs-devel] [PATCH v7 pxar 14/69] format/encoder/decoder: new pxar entry type `Version` Christian Ebner
@ 2024-05-27 14:32 ` Christian Ebner
  2024-05-27 14:32 ` [pbs-devel] [PATCH v7 proxmox-backup 16/69] client: backup: factor out extension from backup target Christian Ebner
                   ` (54 subsequent siblings)
  69 siblings, 0 replies; 71+ messages in thread
From: Christian Ebner @ 2024-05-27 14:32 UTC (permalink / raw)
  To: pbs-devel

Introduces a new pxar format entry type `Prelude` and the associated
encoder and decoder methods.
A prelude starts with the header marker `PXAR_PRELUDE`, followed by raw
byte content used to store additional metadata associated with the
pxar archive, e.g. command line arguments passed on archive creation.
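
A minimal sketch of the byte layout, assuming the same 16-byte header
(`htype: u64`, `full_size: u64`, little-endian) and a 24-byte version entry in
front of it. `encode_prelude` mirrors the position check in the patch's
`EncoderImpl::encode_prelude`. `decode_prelude` is a hypothetical reader. Neither
is pxar API. The prelude content `--example-arg` is arbitrary test data, not the
format the client actually stores.

```rust
use std::io;

/// Header type value, as added to `src/format/mod.rs` by this patch.
const PXAR_PRELUDE: u64 = 0xe309d79d9f7b771b;
/// Assumed header layout: `htype: u64` followed by `full_size: u64`, little-endian.
const HEADER_SIZE: usize = 16;
/// A version 2 format version entry: one header plus one u64 version number.
const VERSION_ENTRY_SIZE: usize = HEADER_SIZE + 8;

/// Read a little-endian u64 at `offset`, or `None` if the buffer is too short.
fn read_u64_le(buf: &[u8], offset: usize) -> Option<u64> {
    let bytes = buf.get(offset..offset + 8)?;
    let mut word = [0u8; 8];
    word.copy_from_slice(bytes);
    Some(u64::from_le_bytes(word))
}

/// Append a prelude entry, refusing to write it anywhere other than
/// directly after the format version entry.
fn encode_prelude(archive: &mut Vec<u8>, prelude: &[u8]) -> io::Result<()> {
    if archive.len() != VERSION_ENTRY_SIZE {
        return Err(io::Error::new(
            io::ErrorKind::Other,
            format!(
                "prelude must be encoded following the version header, current position {}",
                archive.len()
            ),
        ));
    }
    archive.extend_from_slice(&PXAR_PRELUDE.to_le_bytes());
    archive.extend_from_slice(&((HEADER_SIZE + prelude.len()) as u64).to_le_bytes());
    // raw, uninterpreted content bytes
    archive.extend_from_slice(prelude);
    Ok(())
}

/// Return the prelude content if a prelude entry directly follows the version entry.
fn decode_prelude(archive: &[u8]) -> Option<&[u8]> {
    if read_u64_le(archive, VERSION_ENTRY_SIZE)? != PXAR_PRELUDE {
        return None;
    }
    let full_size = read_u64_le(archive, VERSION_ENTRY_SIZE + 8)? as usize;
    let content_len = full_size.checked_sub(HEADER_SIZE)?;
    let start = VERSION_ENTRY_SIZE + HEADER_SIZE;
    archive.get(start..start.checked_add(content_len)?)
}
```

Usage: starting from a zeroed 24-byte placeholder for the version entry,
`encode_prelude(&mut archive, b"--example-arg")` appends a 29-byte entry, and
`decode_prelude` returns the same 13 bytes unchanged.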

The prelude's content has no fixed encoding format but is stored as
a raw, arbitrary byte slice. A prelude entry is encoded right after
a pxar format version entry; for archives with a dedicated payload
output, both are encoded in the metadata archive.
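
The ordering the decoder enforces can be modeled as a small state machine. This
is a simplified sketch, not the pxar decoder. It borrows the state names from
`src/decoder/mod.rs`, but folds the `previous_state` checks in
`read_next_entry_or_eof` into a single transition function. It also requires a
directory after the prelude, whereas the patch's `State::Root` simply reads the
next entry. `Kind::File` stands in for any other entry type.

```rust
/// Simplified entry kinds: only what is needed to model the archive start.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Kind {
    Version,
    Prelude,
    Directory,
    File,
}

/// Decoder states named after the ones in the patch.
#[derive(Clone, Copy, Debug, PartialEq)]
enum State {
    Begin,
    Prelude,
    Root,
    InDirectory,
}

/// One transition of the simplified decoder state machine.
fn step(state: State, kind: Kind) -> Result<State, String> {
    match (state, kind) {
        // Version entry first: a prelude or the root directory may follow.
        (State::Begin, Kind::Version) => Ok(State::Prelude),
        // No version entry: format version 1, the root directory comes first.
        (State::Begin, Kind::Directory) => Ok(State::InDirectory),
        (State::Prelude, Kind::Prelude) => Ok(State::Root),
        (State::Prelude, Kind::Directory) | (State::Root, Kind::Directory) => {
            Ok(State::InDirectory)
        }
        // Version and prelude entries are never allowed inside the tree.
        (State::InDirectory, Kind::Version) | (State::InDirectory, Kind::Prelude) => {
            Err(format!("got {:?} entry at unexpected position", kind))
        }
        (State::InDirectory, _) => Ok(State::InDirectory),
        (state, kind) => Err(format!("unexpected {:?} entry in state {:?}", kind, state)),
    }
}

/// Run a sequence of entry kinds through the state machine.
fn validate(kinds: &[Kind]) -> Result<State, String> {
    kinds.iter().try_fold(State::Begin, |state, &kind| step(state, kind))
}
```

Under this model, `[Version, Prelude, Directory, File]`, `[Version, Directory]`
and a version 1 style `[Directory, File]` are accepted. A prelude without a
preceding version entry, a second prelude, or a version entry inside the tree are
rejected.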

The prelude is not backwards compatible with pxar format version 1.

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
changes since version 6:
- no changes

 examples/mk-format-hashes.rs |  1 +
 src/accessor/mod.rs          | 12 ++++++++++++
 src/decoder/mod.rs           | 31 ++++++++++++++++++++++++++++++-
 src/encoder/aio.rs           | 18 +++++++++++++++---
 src/encoder/mod.rs           | 26 ++++++++++++++++++++++++++
 src/encoder/sync.rs          | 15 ++++++++++++---
 src/format/mod.rs            | 26 ++++++++++++++++++++++++++
 src/lib.rs                   |  3 +++
 tests/simple/fs.rs           |  1 +
 9 files changed, 126 insertions(+), 7 deletions(-)

diff --git a/examples/mk-format-hashes.rs b/examples/mk-format-hashes.rs
index e5d69b1..e998760 100644
--- a/examples/mk-format-hashes.rs
+++ b/examples/mk-format-hashes.rs
@@ -16,6 +16,7 @@ const CONSTANTS: &[(&str, &str, &str)] = &[
         "PXAR_ENTRY_V1",
         "__PROXMOX_FORMAT_ENTRY__",
     ),
+    ("", "PXAR_PRELUDE", "__PROXMOX_FORMAT_PRELUDE__"),
     ("", "PXAR_FILENAME", "__PROXMOX_FORMAT_FILENAME__"),
     ("", "PXAR_SYMLINK", "__PROXMOX_FORMAT_SYMLINK__"),
     ("", "PXAR_DEVICE", "__PROXMOX_FORMAT_DEVICE__"),
diff --git a/src/accessor/mod.rs b/src/accessor/mod.rs
index faab430..e4bf3f9 100644
--- a/src/accessor/mod.rs
+++ b/src/accessor/mod.rs
@@ -310,6 +310,12 @@ impl<T: Clone + ReadAt> AccessorImpl<T> {
                 .next()
                 .await
                 .ok_or_else(|| io_format_err!("unexpected EOF while decoding directory entry"))??;
+
+            if let EntryKind::Prelude(_) = entry.kind() {
+                entry = decoder.next().await.ok_or_else(|| {
+                    io_format_err!("unexpected EOF while decoding directory entry")
+                })??;
+            }
         }
 
         Ok(FileEntryImpl {
@@ -547,6 +553,12 @@ impl<T: Clone + ReadAt> DirectoryImpl<T> {
                 .next()
                 .await
                 .ok_or_else(|| io_format_err!("unexpected EOF while decoding directory entry"))??;
+
+            if let EntryKind::Prelude(_) = entry.kind() {
+                entry = decoder.next().await.ok_or_else(|| {
+                    io_format_err!("unexpected EOF while decoding directory entry")
+                })??;
+            }
         }
 
         Ok((entry, decoder))
diff --git a/src/decoder/mod.rs b/src/decoder/mod.rs
index 43c83ae..1a1be35 100644
--- a/src/decoder/mod.rs
+++ b/src/decoder/mod.rs
@@ -176,6 +176,7 @@ pub(crate) struct DecoderImpl<T> {
 #[derive(Clone, PartialEq)]
 enum State {
     Begin,
+    Prelude,
     Root,
     Default,
     InPayload {
@@ -264,10 +265,25 @@ impl<I: SeqRead> DecoderImpl<I> {
                 State::Eof => return Ok(None),
                 State::Begin => {
                     let entry = self.read_next_entry().await.map(Some);
+                    // If the first entry is of kind Version, next must be Prelude or Directory
                     if let Ok(Some(ref entry)) = entry {
                         if let EntryKind::Version(version) = entry.kind() {
                             self.version = version.clone();
-                            self.state = State::Root;
+                            self.state = State::Prelude;
+                        }
+                    }
+                    return entry;
+                }
+                State::Prelude => {
+                    let entry = self.read_next_entry().await.map(Some);
+                    if let Ok(Some(ref entry)) = entry {
+                        match entry.kind() {
+                            EntryKind::Prelude(_) => self.state = State::Root,
+                            EntryKind::Directory => self.state = State::InDirectory,
+                            _ => io_bail!(
+                                "expected directory or prelude entry, got entry kind {:?}",
+                                entry.kind()
+                            ),
                         }
                     }
                     return entry;
@@ -433,6 +449,14 @@ impl<I: SeqRead> DecoderImpl<I> {
             self.current_header = header;
             self.entry.kind = EntryKind::Version(self.read_format_version().await?);
 
+            Ok(Some(self.entry.take()))
+        } else if header.htype == format::PXAR_PRELUDE {
+            if previous_state != State::Prelude {
+                io_bail!("Got prelude entry at unexpected position");
+            }
+            self.current_header = header;
+            self.entry.kind = EntryKind::Prelude(self.read_prelude().await?);
+
             Ok(Some(self.entry.take()))
         } else if header.htype == format::PXAR_ENTRY || header.htype == format::PXAR_ENTRY_V1 {
             if header.htype == format::PXAR_ENTRY {
@@ -799,6 +823,11 @@ impl<I: SeqRead> DecoderImpl<I> {
             version => io_bail!("unexpected pxar format version {version}"),
         }
     }
+
+    async fn read_prelude(&mut self) -> io::Result<format::Prelude> {
+        let data = self.read_entry_as_bytes().await?;
+        Ok(format::Prelude { data })
+    }
 }
 
 /// Reader for file contents inside a pxar archive.
diff --git a/src/encoder/aio.rs b/src/encoder/aio.rs
index 46856b0..8973402 100644
--- a/src/encoder/aio.rs
+++ b/src/encoder/aio.rs
@@ -24,8 +24,14 @@ impl<'a, T: tokio::io::AsyncWrite + 'a> Encoder<'a, TokioWriter<T>> {
     pub async fn from_tokio(
         output: PxarVariant<T, T>,
         metadata: &Metadata,
+        prelude: Option<&[u8]>,
     ) -> io::Result<Encoder<'a, TokioWriter<T>>> {
-        Encoder::new(output.wrap(|output| TokioWriter::new(output)), metadata).await
+        Encoder::new(
+            output.wrap(|output| TokioWriter::new(output)),
+            metadata,
+            prelude,
+        )
+        .await
     }
 }
 
@@ -41,6 +47,7 @@ impl<'a> Encoder<'a, TokioWriter<tokio::fs::File>> {
                 tokio::fs::File::create(path.as_ref()).await?,
             )),
             metadata,
+            None,
         )
         .await
     }
@@ -48,10 +55,14 @@ impl<'a> Encoder<'a, TokioWriter<tokio::fs::File>> {
 
 impl<'a, T: SeqWrite + 'a> Encoder<'a, T> {
     /// Create an asynchronous encoder for an output implementing our internal write interface.
-    pub async fn new(output: PxarVariant<T, T>, metadata: &Metadata) -> io::Result<Encoder<'a, T>> {
+    pub async fn new(
+        output: PxarVariant<T, T>,
+        metadata: &Metadata,
+        prelude: Option<&[u8]>,
+    ) -> io::Result<Encoder<'a, T>> {
         let output = output.wrap_multi(|output| output.into(), |payload_output| payload_output);
         Ok(Self {
-            inner: encoder::EncoderImpl::new(output, metadata).await?,
+            inner: encoder::EncoderImpl::new(output, metadata, prelude).await?,
         })
     }
 
@@ -326,6 +337,7 @@ mod test {
             let mut encoder = Encoder::new(
                 crate::PxarVariant::Unified(DummyOutput),
                 &Metadata::dir_builder(0o700).build(),
+                None,
             )
             .await
             .unwrap();
diff --git a/src/encoder/mod.rs b/src/encoder/mod.rs
index 4bed040..a309c0f 100644
--- a/src/encoder/mod.rs
+++ b/src/encoder/mod.rs
@@ -346,6 +346,7 @@ impl<'a, T: SeqWrite + 'a> EncoderImpl<'a, T> {
     pub async fn new(
         mut output: PxarVariant<EncoderOutput<'a, T>, T>,
         metadata: &Metadata,
+        prelude: Option<&[u8]>,
     ) -> io::Result<EncoderImpl<'a, T>> {
         if !metadata.is_dir() {
             io_bail!("directory metadata must contain the directory mode flag");
@@ -372,6 +373,9 @@ impl<'a, T: SeqWrite + 'a> EncoderImpl<'a, T> {
         };
 
         this.encode_format_version().await?;
+        if let Some(prelude) = prelude {
+            this.encode_prelude(prelude).await?;
+        }
         this.encode_metadata(metadata).await?;
         let state = this.state_mut()?;
         state.files_offset = state.position();
@@ -773,6 +777,28 @@ impl<'a, T: SeqWrite + 'a> EncoderImpl<'a, T> {
         Ok(())
     }
 
+    async fn encode_prelude(&mut self, prelude: &[u8]) -> io::Result<()> {
+        if self.version == FormatVersion::Version1 {
+            io_bail!("encoding prelude not supported in pxar format version 1");
+        }
+
+        let (mut output, state) = self.output_state()?;
+        if state.write_position != (size_of::<u64>() + size_of::<format::Header>()) as u64 {
+            io_bail!(
+                "prelude must be encoded following the version header, current position {}",
+                state.write_position,
+            );
+        }
+
+        seq_write_pxar_entry(
+            output.archive_mut(),
+            format::PXAR_PRELUDE,
+            prelude,
+            &mut state.write_position,
+        )
+        .await
+    }
+
     async fn encode_format_version(&mut self) -> io::Result<()> {
         let version_bytes = match self.version {
             format::FormatVersion::Version1 => return Ok(()),
diff --git a/src/encoder/sync.rs b/src/encoder/sync.rs
index 5aa8d69..3cfa03b 100644
--- a/src/encoder/sync.rs
+++ b/src/encoder/sync.rs
@@ -28,7 +28,11 @@ impl<'a, T: io::Write + 'a> Encoder<'a, StandardWriter<T>> {
     /// Encode a `pxar` archive into a regular `std::io::Write` output.
     #[inline]
     pub fn from_std(output: T, metadata: &Metadata) -> io::Result<Encoder<'a, StandardWriter<T>>> {
-        Encoder::new(PxarVariant::Unified(StandardWriter::new(output)), metadata)
+        Encoder::new(
+            PxarVariant::Unified(StandardWriter::new(output)),
+            metadata,
+            None,
+        )
     }
 }
 
@@ -41,6 +45,7 @@ impl<'a> Encoder<'a, StandardWriter<std::fs::File>> {
         Encoder::new(
             PxarVariant::Unified(StandardWriter::new(std::fs::File::create(path.as_ref())?)),
             metadata,
+            None,
         )
     }
 }
@@ -52,7 +57,11 @@ impl<'a, T: SeqWrite + 'a> Encoder<'a, T> {
     /// not allowed to use the `Waker`, as this will cause a `panic!`.
     // Optionally attach a dedicated writer to redirect the payloads of regular files to a separate
     // output.
-    pub fn new(output: PxarVariant<T, T>, metadata: &Metadata) -> io::Result<Self> {
+    pub fn new(
+        output: PxarVariant<T, T>,
+        metadata: &Metadata,
+        prelude: Option<&[u8]>,
+    ) -> io::Result<Self> {
         let output = match output {
             PxarVariant::Unified(output) => PxarVariant::Unified(output.into()),
             PxarVariant::Split(output, payload_output) => {
@@ -61,7 +70,7 @@ impl<'a, T: SeqWrite + 'a> Encoder<'a, T> {
         };
 
         Ok(Self {
-            inner: poll_result_once(encoder::EncoderImpl::new(output, metadata))?,
+            inner: poll_result_once(encoder::EncoderImpl::new(output, metadata, prelude))?,
         })
     }
 
diff --git a/src/format/mod.rs b/src/format/mod.rs
index 9b66fe2..73b06cd 100644
--- a/src/format/mod.rs
+++ b/src/format/mod.rs
@@ -87,6 +87,7 @@ pub const PXAR_FORMAT_VERSION: u64 = 0x730f6c75df16a40d;
 pub const PXAR_ENTRY: u64 = 0xd5956474e588acef;
 /// Previous version of the entry struct
 pub const PXAR_ENTRY_V1: u64 = 0x11da850a1c1cceff;
+pub const PXAR_PRELUDE: u64 = 0xe309d79d9f7b771b;
 pub const PXAR_FILENAME: u64 = 0x16701121063917b3;
 pub const PXAR_SYMLINK: u64 = 0x27f971e7dbf5dc5f;
 pub const PXAR_DEVICE: u64 = 0x9fc9e906586d5ce9;
@@ -147,6 +148,7 @@ impl Header {
     #[inline]
     pub fn max_content_size(&self) -> u64 {
         match self.htype {
+            PXAR_PRELUDE => u64::MAX - (size_of::<Self>() as u64),
             // + null-termination
             PXAR_FILENAME => crate::util::MAX_FILENAME_LEN + 1,
             // + null-termination
@@ -190,6 +192,7 @@ impl Display for Header {
     fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
         let readable = match self.htype {
             PXAR_FORMAT_VERSION => "FORMAT_VERSION",
+            PXAR_PRELUDE => "PRELUDE",
             PXAR_FILENAME => "FILENAME",
             PXAR_SYMLINK => "SYMLINK",
             PXAR_HARDLINK => "HARDLINK",
@@ -694,6 +697,29 @@ impl Device {
     }
 }
 
+#[derive(Clone, Debug)]
+pub struct Prelude {
+    pub data: Vec<u8>,
+}
+
+impl Prelude {
+    pub fn as_os_str(&self) -> &OsStr {
+        self.as_ref()
+    }
+}
+
+impl AsRef<[u8]> for Prelude {
+    fn as_ref(&self) -> &[u8] {
+        &self.data
+    }
+}
+
+impl AsRef<OsStr> for Prelude {
+    fn as_ref(&self) -> &OsStr {
+        OsStr::from_bytes(&self.data[..self.data.len().max(1) - 1])
+    }
+}
+
 #[cfg(all(test, target_os = "linux"))]
 #[test]
 fn test_linux_devices() {
diff --git a/src/lib.rs b/src/lib.rs
index 7e5b48f..e0c5498 100644
--- a/src/lib.rs
+++ b/src/lib.rs
@@ -345,6 +345,9 @@ pub enum EntryKind {
     /// Pxar file format version
     Version(format::FormatVersion),
 
+    /// Pxar prelude blob
+    Prelude(format::Prelude),
+
     /// Symbolic links.
     Symlink(format::Symlink),
 
diff --git a/tests/simple/fs.rs b/tests/simple/fs.rs
index 8a8c607..96fcee9 100644
--- a/tests/simple/fs.rs
+++ b/tests/simple/fs.rs
@@ -230,6 +230,7 @@ impl Entry {
                 };
             match item.kind() {
                 PxarEntryKind::Version(_) => continue,
+                PxarEntryKind::Prelude(_) => continue,
                 PxarEntryKind::GoodbyeTable => break,
                 PxarEntryKind::File { size, .. } => {
                     let mut data = Vec::new();
-- 
2.39.2



_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel



* [pbs-devel] [PATCH v7 proxmox-backup 16/69] client: backup: factor out extension from backup target
  2024-05-27 14:32 [pbs-devel] [PATCH v7 pxar proxmox-backup 00/69] fix #3174: improve file-level backup Christian Ebner
                   ` (14 preceding siblings ...)
  2024-05-27 14:32 ` [pbs-devel] [PATCH v7 pxar 15/69] format/encoder/decoder: new pxar entry type `Prelude` Christian Ebner
@ 2024-05-27 14:32 ` Christian Ebner
  2024-05-27 14:32 ` [pbs-devel] [PATCH v7 proxmox-backup 17/69] api: datastore: refactor getting local chunk reader Christian Ebner
                   ` (53 subsequent siblings)
  69 siblings, 0 replies; 71+ messages in thread
From: Christian Ebner @ 2024-05-27 14:32 UTC (permalink / raw)
  To: pbs-devel

Instead of composing the backup target name and pushing it to the
backup list, push the archive name and extension separately and only
compose the target name later, while iterating over the list.

This way it remains possible to additionally prefix the extension, as
required for the separate pxar metadata and payload indexes.
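A minimal sketch of the idea, assuming a simplified upload entry: the hypothetical `UploadEntry` type below stands in for the patch's tuple, and the `ext_prefix` parameter is a hypothetical hook illustrating that the extension can still be altered because the name is composed only at iteration time. It is not the client's actual code.

```rust
/// Hypothetical stand-in for one upload list entry: the archive base name
/// and its index extension are stored separately, as in the patch's tuple.
struct UploadEntry {
    target_base: String,
    extension: &'static str,
}

impl UploadEntry {
    /// Compose the final target name only when it is actually needed.
    /// `ext_prefix` is a hypothetical hook showing that the extension can
    /// still be prefixed at this point, without re-parsing a composed string.
    fn target(&self, ext_prefix: Option<&str>) -> String {
        let prefix = ext_prefix.unwrap_or("");
        format!("{}.{}{}", self.target_base, prefix, self.extension)
    }
}

/// Compose all target names while iterating over the list, mirroring the
/// `format!("{target_base}.{extension}")` step in the upload loop.
fn target_names(entries: &[UploadEntry], ext_prefix: Option<&str>) -> Vec<String> {
    entries.iter().map(|e| e.target(ext_prefix)).collect()
}
```

With no prefix, the composed names are identical to the previously pre-formatted ones (e.g. `root.pxar.didx`), so existing backups are unaffected.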

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
changes since version 6:
- no changes, patch reordered

 proxmox-backup-client/src/main.rs | 15 ++++++++++-----
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/proxmox-backup-client/src/main.rs b/proxmox-backup-client/src/main.rs
index 32fe914c4..4453c7756 100644
--- a/proxmox-backup-client/src/main.rs
+++ b/proxmox-backup-client/src/main.rs
@@ -785,7 +785,8 @@ async fn create_backup(
                 upload_list.push((
                     BackupSpecificationType::PXAR,
                     filename.to_owned(),
-                    format!("{}.didx", target),
+                    target.to_owned(),
+                    "didx",
                     0,
                 ));
             }
@@ -803,7 +804,8 @@ async fn create_backup(
                 upload_list.push((
                     BackupSpecificationType::IMAGE,
                     filename.to_owned(),
-                    format!("{}.fidx", target),
+                    target.to_owned(),
+                    "fidx",
                     size,
                 ));
             }
@@ -814,7 +816,8 @@ async fn create_backup(
                 upload_list.push((
                     BackupSpecificationType::CONFIG,
                     filename.to_owned(),
-                    format!("{}.blob", target),
+                    target.to_owned(),
+                    "blob",
                     metadata.len(),
                 ));
             }
@@ -825,7 +828,8 @@ async fn create_backup(
                 upload_list.push((
                     BackupSpecificationType::LOGFILE,
                     filename.to_owned(),
-                    format!("{}.blob", target),
+                    target.to_owned(),
+                    "blob",
                     metadata.len(),
                 ));
             }
@@ -944,7 +948,8 @@ async fn create_backup(
         log::info!("{} {} '{}' to '{}' as {}", what, desc, file, repo, target);
     };
 
-    for (backup_type, filename, target, size) in upload_list {
+    for (backup_type, filename, target_base, extension, size) in upload_list {
+        let target = format!("{target_base}.{extension}");
         match (backup_type, dry_run) {
             // dry-run
             (BackupSpecificationType::CONFIG, true) => log_file("config file", &filename, &target),
-- 
2.39.2




* [pbs-devel] [PATCH v7 proxmox-backup 17/69] api: datastore: refactor getting local chunk reader
  2024-05-27 14:32 [pbs-devel] [PATCH v7 pxar proxmox-backup 00/69] fix #3174: improve file-level backup Christian Ebner
                   ` (15 preceding siblings ...)
  2024-05-27 14:32 ` [pbs-devel] [PATCH v7 proxmox-backup 16/69] client: backup: factor out extension from backup target Christian Ebner
@ 2024-05-27 14:32 ` Christian Ebner
  2024-05-27 14:32 ` [pbs-devel] [PATCH v7 proxmox-backup 18/69] client: pxar: switch to stack based encoder state Christian Ebner
                   ` (52 subsequent siblings)
  69 siblings, 0 replies; 71+ messages in thread
From: Christian Ebner @ 2024-05-27 14:32 UTC (permalink / raw)
  To: pbs-devel

Move the code to get the local chunk reader into a dedicated function
to make it reusable. The same code is required to get the local chunk
reader for the payload stream of split stream archives.
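The first step of the factored-out `get_local_pxar_reader` builds the index path from the datastore base path, the snapshot's relative path and the archive name. The sketch below isolates only that step in a hypothetical `archive_index_path` helper (opening the index, verifying it against the manifest and wrapping it in a reader are omitted) to show how one helper can serve both archives of a split backup; the archive names used are illustrative assumptions.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper mirroring the path-building step of
/// `get_local_pxar_reader`: base path + snapshot path + archive name.
fn archive_index_path(base: &Path, snapshot_rel: &Path, archive_name: &str) -> PathBuf {
    let mut path = base.to_path_buf();
    // `snapshot_rel` must be relative, otherwise `push` replaces `base`.
    path.push(snapshot_rel);
    path.push(archive_name);
    path
}

/// With the logic factored out, the metadata and the payload index of a
/// split archive are located by the same code (names are illustrative).
fn split_index_paths(
    base: &Path,
    snapshot_rel: &Path,
    metadata_name: &str,
    payload_name: &str,
) -> (PathBuf, PathBuf) {
    (
        archive_index_path(base, snapshot_rel, metadata_name),
        archive_index_path(base, snapshot_rel, payload_name),
    )
}
```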

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
changes since version 6:
- patch reordered, no changes

 src/api2/admin/datastore.rs | 39 ++++++++++++++++++++++++-------------
 1 file changed, 25 insertions(+), 14 deletions(-)

diff --git a/src/api2/admin/datastore.rs b/src/api2/admin/datastore.rs
index baa537595..ca72a2f2b 100644
--- a/src/api2/admin/datastore.rs
+++ b/src/api2/admin/datastore.rs
@@ -1743,6 +1743,29 @@ pub const API_METHOD_PXAR_FILE_DOWNLOAD: ApiMethod = ApiMethod::new(
     &Permission::Anybody,
 );
 
+fn get_local_pxar_reader(
+    datastore: Arc<DataStore>,
+    manifest: &BackupManifest,
+    backup_dir: &BackupDir,
+    pxar_name: &str,
+) -> Result<(LocalDynamicReadAt<LocalChunkReader>, u64), Error> {
+    let mut path = datastore.base_path();
+    path.push(backup_dir.relative_path());
+    path.push(pxar_name);
+
+    let index = DynamicIndexReader::open(&path)
+        .map_err(|err| format_err!("unable to read dynamic index '{:?}' - {}", &path, err))?;
+
+    let (csum, size) = index.compute_csum();
+    manifest.verify_file(pxar_name, &csum, size)?;
+
+    let chunk_reader = LocalChunkReader::new(datastore, None, CryptMode::None);
+    let reader = BufferedDynamicReader::new(index, chunk_reader);
+    let archive_size = reader.archive_size();
+
+    Ok((LocalDynamicReadAt::new(reader), archive_size))
+}
+
 pub fn pxar_file_download(
     _parts: Parts,
     _req_body: Body,
@@ -1787,20 +1810,8 @@ pub fn pxar_file_download(
             }
         }
 
-        let mut path = datastore.base_path();
-        path.push(backup_dir.relative_path());
-        path.push(pxar_name);
-
-        let index = DynamicIndexReader::open(&path)
-            .map_err(|err| format_err!("unable to read dynamic index '{:?}' - {}", &path, err))?;
-
-        let (csum, size) = index.compute_csum();
-        manifest.verify_file(pxar_name, &csum, size)?;
-
-        let chunk_reader = LocalChunkReader::new(datastore, None, CryptMode::None);
-        let reader = BufferedDynamicReader::new(index, chunk_reader);
-        let archive_size = reader.archive_size();
-        let reader = LocalDynamicReadAt::new(reader);
+        let (reader, archive_size) =
+            get_local_pxar_reader(datastore.clone(), &manifest, &backup_dir, pxar_name)?;
 
         let decoder = Accessor::new(reader, archive_size).await?;
         let root = decoder.open_root().await?;
-- 
2.39.2




* [pbs-devel] [PATCH v7 proxmox-backup 18/69] client: pxar: switch to stack based encoder state
  2024-05-27 14:32 [pbs-devel] [PATCH v7 pxar proxmox-backup 00/69] fix #3174: improve file-level backup Christian Ebner
                   ` (16 preceding siblings ...)
  2024-05-27 14:32 ` [pbs-devel] [PATCH v7 proxmox-backup 17/69] api: datastore: refactor getting local chunk reader Christian Ebner
@ 2024-05-27 14:32 ` Christian Ebner
  2024-05-27 14:32 ` [pbs-devel] [PATCH v7 proxmox-backup 19/69] client: pxar: combine writers into struct Christian Ebner
                   ` (51 subsequent siblings)
  69 siblings, 0 replies; 71+ messages in thread
From: Christian Ebner @ 2024-05-27 14:32 UTC (permalink / raw)
  To: pbs-devel

... and adapt to the new reader/writer variant for encoder or
decoder/accessor to attach a dedicated payload input/output for split
pxar archives.

This prepares for look-ahead caching, where passing around
per-directory encoder instances holding internal references is not
feasible.

Previously, a new encoder instance was created for each directory
level, which limited possible implementation errors. These encoder
instances were linked internally by references to keep track of
state changes in a parent-child relationship.

This is, however, not feasible when the encoder has to be passed by
mutable reference, as the look-ahead cache implementation requires.
The encoder has therefore been adapted to a single-instance
implementation with an internal stack that keeps track of the state.
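To illustrate the difference, here is a minimal sketch of a stack-based encoder state. `StackEncoder`, `DirState` and `archive_dir` are hypothetical names; the per-directory state is reduced to an entry counter and no pxar data is written. The point is that entering a directory pushes state and finishing it pops back to the parent, so recursion only needs a `&mut` to one instance rather than a child encoder borrowing its parent.

```rust
/// Hypothetical per-directory state kept by the encoder.
#[derive(Debug)]
struct DirState {
    name: String,
    entries: usize,
}

/// A single encoder instance tracking open directories on an internal
/// stack, instead of one child encoder per level borrowing its parent.
struct StackEncoder {
    stack: Vec<DirState>,
}

impl StackEncoder {
    fn new() -> Self {
        Self { stack: vec![DirState { name: "/".to_string(), entries: 0 }] }
    }

    /// Enter a subdirectory: count it in the parent, then push its state.
    fn create_directory(&mut self, name: &str) {
        if let Some(parent) = self.stack.last_mut() {
            parent.entries += 1;
        }
        self.stack.push(DirState { name: name.to_string(), entries: 0 });
    }

    /// Add a file to the directory currently on top of the stack.
    fn add_file(&mut self) {
        let dir = self.stack.last_mut().expect("no open directory");
        dir.entries += 1;
    }

    /// Close the current directory and return to its parent.
    fn finish(&mut self) -> Option<DirState> {
        self.stack.pop()
    }

    fn depth(&self) -> usize {
        self.stack.len()
    }
}

/// Only `&mut StackEncoder` is needed here, so the same instance could
/// also be lent to other components, such as a look-ahead cache.
fn archive_dir(encoder: &mut StackEncoder, name: &str, files: usize) -> Option<DirState> {
    encoder.create_directory(name);
    for _ in 0..files {
        encoder.add_file();
    }
    encoder.finish()
}
```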

Depends on the bumped pxar library version, which includes the patches
for passing the corresponding variant when instantiating the pxar
reader/writer.

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
changes since version 6:
- adapt to new PxarVariant pxar interface type

 pbs-client/src/pxar/create.rs             | 8 +++++---
 pbs-pxar-fuse/src/lib.rs                  | 2 +-
 proxmox-backup-client/src/catalog.rs      | 3 ++-
 proxmox-backup-client/src/main.rs         | 2 +-
 proxmox-backup-client/src/mount.rs        | 3 ++-
 proxmox-file-restore/src/main.rs          | 4 ++--
 pxar-bin/src/main.rs                      | 2 +-
 src/api2/admin/datastore.rs               | 2 +-
 src/api2/tape/restore.rs                  | 5 +++--
 src/bin/proxmox_backup_debug/diff.rs      | 2 +-
 src/tape/file_formats/snapshot_archive.rs | 7 +++++--
 11 files changed, 24 insertions(+), 16 deletions(-)

diff --git a/pbs-client/src/pxar/create.rs b/pbs-client/src/pxar/create.rs
index 60efb0ce5..1b1bac2d4 100644
--- a/pbs-client/src/pxar/create.rs
+++ b/pbs-client/src/pxar/create.rs
@@ -170,7 +170,7 @@ where
         set.insert(stat.st_dev);
     }
 
-    let mut encoder = Encoder::new(&mut writer, &metadata).await?;
+    let mut encoder = Encoder::new(pxar::PxarVariant::Unified(&mut writer), &metadata).await?;
 
     let mut patterns = options.patterns;
 
@@ -203,6 +203,8 @@ where
         .archive_dir_contents(&mut encoder, source_dir, true)
         .await?;
     encoder.finish().await?;
+    encoder.close().await?;
+
     Ok(())
 }
 
@@ -663,7 +665,7 @@ impl Archiver {
     ) -> Result<(), Error> {
         let dir_name = OsStr::from_bytes(dir_name.to_bytes());
 
-        let mut encoder = encoder.create_directory(dir_name, metadata).await?;
+        encoder.create_directory(dir_name, metadata).await?;
 
         let old_fs_magic = self.fs_magic;
         let old_fs_feature_flags = self.fs_feature_flags;
@@ -686,7 +688,7 @@ impl Archiver {
             log::info!("skipping mount point: {:?}", self.path);
             Ok(())
         } else {
-            self.archive_dir_contents(&mut encoder, dir, false).await
+            self.archive_dir_contents(encoder, dir, false).await
         };
 
         self.fs_magic = old_fs_magic;
diff --git a/pbs-pxar-fuse/src/lib.rs b/pbs-pxar-fuse/src/lib.rs
index bf196b6c4..377635b2a 100644
--- a/pbs-pxar-fuse/src/lib.rs
+++ b/pbs-pxar-fuse/src/lib.rs
@@ -66,7 +66,7 @@ impl Session {
         let file = std::fs::File::open(archive_path)?;
         let file_size = file.metadata()?.len();
         let reader: Reader = Arc::new(accessor::sync::FileReader::new(file));
-        let accessor = Accessor::new(reader, file_size).await?;
+        let accessor = Accessor::new(pxar::PxarVariant::Unified(reader), file_size).await?;
         Self::mount(accessor, options, verbose, mountpoint)
     }
 
diff --git a/proxmox-backup-client/src/catalog.rs b/proxmox-backup-client/src/catalog.rs
index 72b22e67f..e72b6a1e0 100644
--- a/proxmox-backup-client/src/catalog.rs
+++ b/proxmox-backup-client/src/catalog.rs
@@ -220,7 +220,8 @@ async fn catalog_shell(param: Value) -> Result<(), Error> {
     let reader = BufferedDynamicReader::new(index, chunk_reader);
     let archive_size = reader.archive_size();
     let reader: pbs_pxar_fuse::Reader = Arc::new(BufferedDynamicReadAt::new(reader));
-    let decoder = pbs_pxar_fuse::Accessor::new(reader, archive_size).await?;
+    let decoder =
+        pbs_pxar_fuse::Accessor::new(pxar::PxarVariant::Unified(reader), archive_size).await?;
 
     client.download(CATALOG_NAME, &mut tmpfile).await?;
     let index = DynamicIndexReader::new(tmpfile)
diff --git a/proxmox-backup-client/src/main.rs b/proxmox-backup-client/src/main.rs
index 4453c7756..ad2bc5a66 100644
--- a/proxmox-backup-client/src/main.rs
+++ b/proxmox-backup-client/src/main.rs
@@ -1458,7 +1458,7 @@ async fn restore(
 
         if let Some(target) = target {
             pbs_client::pxar::extract_archive(
-                pxar::decoder::Decoder::from_std(reader)?,
+                pxar::decoder::Decoder::from_std(pxar::PxarVariant::Unified(reader))?,
                 Path::new(target),
                 feature_flags,
                 |path| {
diff --git a/proxmox-backup-client/src/mount.rs b/proxmox-backup-client/src/mount.rs
index 4a2f83357..4d352b6e4 100644
--- a/proxmox-backup-client/src/mount.rs
+++ b/proxmox-backup-client/src/mount.rs
@@ -296,7 +296,8 @@ async fn mount_do(param: Value, pipe: Option<OwnedFd>) -> Result<Value, Error> {
         let reader = BufferedDynamicReader::new(index, chunk_reader);
         let archive_size = reader.archive_size();
         let reader: pbs_pxar_fuse::Reader = Arc::new(BufferedDynamicReadAt::new(reader));
-        let decoder = pbs_pxar_fuse::Accessor::new(reader, archive_size).await?;
+        let decoder =
+            pbs_pxar_fuse::Accessor::new(pxar::PxarVariant::Unified(reader), archive_size).await?;
 
         let session =
             pbs_pxar_fuse::Session::mount(decoder, options, false, Path::new(target.unwrap()))
diff --git a/proxmox-file-restore/src/main.rs b/proxmox-file-restore/src/main.rs
index 50875a636..6a6379f27 100644
--- a/proxmox-file-restore/src/main.rs
+++ b/proxmox-file-restore/src/main.rs
@@ -457,7 +457,7 @@ async fn extract(
 
             let archive_size = reader.archive_size();
             let reader = LocalDynamicReadAt::new(reader);
-            let decoder = Accessor::new(reader, archive_size).await?;
+            let decoder = Accessor::new(pxar::PxarVariant::Unified(reader), archive_size).await?;
             extract_to_target(decoder, &path, target, format, zstd).await?;
         }
         ExtractPath::VM(file, path) => {
@@ -483,7 +483,7 @@ async fn extract(
                     false,
                 )
                 .await?;
-                let decoder = Decoder::from_tokio(reader).await?;
+                let decoder = Decoder::from_tokio(pxar::PxarVariant::Unified(reader)).await?;
                 extract_sub_dir_seq(&target, decoder).await?;
 
                 // we extracted a .pxarexclude-cli file auto-generated by the VM when encoding the
diff --git a/pxar-bin/src/main.rs b/pxar-bin/src/main.rs
index 2bbe90e34..0729db1c9 100644
--- a/pxar-bin/src/main.rs
+++ b/pxar-bin/src/main.rs
@@ -26,7 +26,7 @@ fn extract_archive_from_reader<R: std::io::Read>(
     options: PxarExtractOptions,
 ) -> Result<(), Error> {
     pbs_client::pxar::extract_archive(
-        pxar::decoder::Decoder::from_std(reader)?,
+        pxar::decoder::Decoder::from_std(pxar::PxarVariant::Unified(reader))?,
         Path::new(target),
         feature_flags,
         |path| {
diff --git a/src/api2/admin/datastore.rs b/src/api2/admin/datastore.rs
index ca72a2f2b..af1c12cc0 100644
--- a/src/api2/admin/datastore.rs
+++ b/src/api2/admin/datastore.rs
@@ -1813,7 +1813,7 @@ pub fn pxar_file_download(
         let (reader, archive_size) =
             get_local_pxar_reader(datastore.clone(), &manifest, &backup_dir, pxar_name)?;
 
-        let decoder = Accessor::new(reader, archive_size).await?;
+        let decoder = Accessor::new(pxar::PxarVariant::Unified(reader), archive_size).await?;
         let root = decoder.open_root().await?;
         let path = OsStr::from_bytes(file_path).to_os_string();
         let file = root
diff --git a/src/api2/tape/restore.rs b/src/api2/tape/restore.rs
index 84557bce1..9184ff934 100644
--- a/src/api2/tape/restore.rs
+++ b/src/api2/tape/restore.rs
@@ -1069,7 +1069,8 @@ fn restore_snapshots_to_tmpdir(
                     "File {file_num}: snapshot archive {source_datastore}:{snapshot}",
                 );
 
-                let mut decoder = pxar::decoder::sync::Decoder::from_std(reader)?;
+                let mut decoder =
+                    pxar::decoder::sync::Decoder::from_std(pxar::PxarVariant::Unified(reader))?;
 
                 let target_datastore = match store_map.target_store(&source_datastore) {
                     Some(datastore) => datastore,
@@ -1685,7 +1686,7 @@ fn restore_snapshot_archive<'a>(
     reader: Box<dyn 'a + TapeRead>,
     snapshot_path: &Path,
 ) -> Result<bool, Error> {
-    let mut decoder = pxar::decoder::sync::Decoder::from_std(reader)?;
+    let mut decoder = pxar::decoder::sync::Decoder::from_std(pxar::PxarVariant::Unified(reader))?;
     match try_restore_snapshot_archive(worker, &mut decoder, snapshot_path) {
         Ok(_) => Ok(true),
         Err(err) => {
diff --git a/src/bin/proxmox_backup_debug/diff.rs b/src/bin/proxmox_backup_debug/diff.rs
index 5b68941a4..e6767c17c 100644
--- a/src/bin/proxmox_backup_debug/diff.rs
+++ b/src/bin/proxmox_backup_debug/diff.rs
@@ -277,7 +277,7 @@ async fn open_dynamic_index(
     let reader = BufferedDynamicReader::new(index, chunk_reader);
     let archive_size = reader.archive_size();
     let reader: Arc<dyn ReadAt + Send + Sync> = Arc::new(LocalDynamicReadAt::new(reader));
-    let accessor = Accessor::new(reader, archive_size).await?;
+    let accessor = Accessor::new(pxar::PxarVariant::Unified(reader), archive_size).await?;
 
     Ok((lookup_index, accessor))
 }
diff --git a/src/tape/file_formats/snapshot_archive.rs b/src/tape/file_formats/snapshot_archive.rs
index 252384b50..82f466980 100644
--- a/src/tape/file_formats/snapshot_archive.rs
+++ b/src/tape/file_formats/snapshot_archive.rs
@@ -58,8 +58,10 @@ pub fn tape_write_snapshot_archive<'a>(
             ));
         }
 
-        let mut encoder =
-            pxar::encoder::sync::Encoder::new(PxarTapeWriter::new(writer), &root_metadata)?;
+        let mut encoder = pxar::encoder::sync::Encoder::new(
+            pxar::PxarVariant::Unified(PxarTapeWriter::new(writer)),
+            &root_metadata,
+        )?;
 
         for filename in file_list.iter() {
             let mut file = snapshot_reader.open_file(filename).map_err(|err| {
@@ -89,6 +91,7 @@ pub fn tape_write_snapshot_archive<'a>(
             }
         }
         encoder.finish()?;
+        encoder.close()?;
         Ok(())
     });
 
-- 
2.39.2




* [pbs-devel] [PATCH v7 proxmox-backup 19/69] client: pxar: combine writers into struct
  2024-05-27 14:32 [pbs-devel] [PATCH v7 pxar proxmox-backup 00/69] fix #3174: improve file-level backup Christian Ebner
                   ` (17 preceding siblings ...)
  2024-05-27 14:32 ` [pbs-devel] [PATCH v7 proxmox-backup 18/69] client: pxar: switch to stack based encoder state Christian Ebner
@ 2024-05-27 14:32 ` Christian Ebner
  2024-05-27 14:32 ` [pbs-devel] [PATCH v7 proxmox-backup 20/69] client: pxar: optionally split metadata and payload streams Christian Ebner
                   ` (50 subsequent siblings)
  69 siblings, 0 replies; 71+ messages in thread
From: Christian Ebner @ 2024-05-27 14:32 UTC (permalink / raw)
  To: pbs-devel

Introduce a `PxarWriters` struct to bundle all writer instances
required for the pxar archive creation into a single object to limit
the number of function call parameters.
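A self-contained sketch of the bundling pattern follows. `Variant` is a hypothetical stand-in for `pxar::PxarVariant`, `CatalogWriter` for the catalog writer trait, and `create_archive` here is a toy that only writes an entry name and its content; none of these match the real signatures. It shows how a single `Writers` parameter carries both the unified/split archive writer(s) and the optional catalog.

```rust
use std::io::{self, Write};
use std::sync::{Arc, Mutex};

/// Hypothetical stand-in for `pxar::PxarVariant`.
enum Variant<T, P> {
    Unified(T),
    Split(T, P),
}

/// Hypothetical stand-in for the catalog writer trait.
trait CatalogWriter {
    fn add_entry(&mut self, name: &str);
}

/// All writers needed for archive creation, bundled into one parameter.
struct Writers<T> {
    archive: Variant<T, T>,
    catalog: Option<Arc<Mutex<dyn CatalogWriter + Send>>>,
}

impl<T> Writers<T> {
    fn new(archive: Variant<T, T>, catalog: Option<Arc<Mutex<dyn CatalogWriter + Send>>>) -> Self {
        Self { archive, catalog }
    }
}

/// Toy archive creation: the entry name goes to the (metadata) archive,
/// the content either inline or to the dedicated payload writer.
fn create_archive<T: Write>(writers: Writers<T>, name: &str, content: &[u8]) -> io::Result<()> {
    let Writers { archive, catalog } = writers;
    match archive {
        Variant::Unified(mut out) => {
            out.write_all(name.as_bytes())?;
            out.write_all(content)?;
        }
        Variant::Split(mut meta, mut payload) => {
            meta.write_all(name.as_bytes())?;
            payload.write_all(content)?;
        }
    }
    if let Some(catalog) = catalog {
        catalog.lock().unwrap().add_entry(name);
    }
    Ok(())
}

/// Simple in-memory catalog used for the example.
#[derive(Default)]
struct NameCatalog {
    names: Vec<String>,
}

impl CatalogWriter for NameCatalog {
    fn add_entry(&mut self, name: &str) {
        self.names.push(name.to_string());
    }
}
```

Adding a further writer later only changes the struct and its constructor, not every function signature along the call chain.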

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
changes since version 6:
- rename archive writer from `writer` to `archive`
- adapt to new PxarVariant pxar interface type

 pbs-client/src/pxar/create.rs                 | 23 +++++++++++++++----
 pbs-client/src/pxar/mod.rs                    |  2 +-
 pbs-client/src/pxar_backup_stream.rs          |  8 ++++---
 .../src/proxmox_restore_daemon/api.rs         |  8 ++++---
 pxar-bin/src/main.rs                          |  8 +++----
 tests/catar.rs                                |  5 ++--
 6 files changed, 35 insertions(+), 19 deletions(-)

diff --git a/pbs-client/src/pxar/create.rs b/pbs-client/src/pxar/create.rs
index 1b1bac2d4..cc75f0262 100644
--- a/pbs-client/src/pxar/create.rs
+++ b/pbs-client/src/pxar/create.rs
@@ -18,7 +18,7 @@ use nix::sys::stat::{FileStat, Mode};
 use pathpatterns::{MatchEntry, MatchFlag, MatchList, MatchType, PatternFlag};
 use proxmox_sys::error::SysError;
 use pxar::encoder::{LinkOffset, SeqWrite};
-use pxar::Metadata;
+use pxar::{Metadata, PxarVariant};
 
 use proxmox_io::vec;
 use proxmox_lang::c_str;
@@ -135,12 +135,25 @@ struct Archiver {
 
 type Encoder<'a, T> = pxar::encoder::aio::Encoder<'a, T>;
 
+pub struct PxarWriters<T> {
+    archive: PxarVariant<T, T>,
+    catalog: Option<Arc<Mutex<dyn BackupCatalogWriter + Send>>>,
+}
+
+impl<T> PxarWriters<T> {
+    pub fn new(
+        archive: PxarVariant<T, T>,
+        catalog: Option<Arc<Mutex<dyn BackupCatalogWriter + Send>>>,
+    ) -> Self {
+        Self { archive, catalog }
+    }
+}
+
 pub async fn create_archive<T, F>(
     source_dir: Dir,
-    mut writer: T,
+    writers: PxarWriters<T>,
     feature_flags: Flags,
     callback: F,
-    catalog: Option<Arc<Mutex<dyn BackupCatalogWriter + Send>>>,
     options: PxarCreateOptions,
 ) -> Result<(), Error>
 where
@@ -170,7 +183,7 @@ where
         set.insert(stat.st_dev);
     }
 
-    let mut encoder = Encoder::new(pxar::PxarVariant::Unified(&mut writer), &metadata).await?;
+    let mut encoder = Encoder::new(writers.archive, &metadata).await?;
 
     let mut patterns = options.patterns;
 
@@ -188,7 +201,7 @@ where
         fs_magic,
         callback: Box::new(callback),
         patterns,
-        catalog,
+        catalog: writers.catalog,
         path: PathBuf::new(),
         entry_counter: 0,
         entry_limit: options.entries_max,
diff --git a/pbs-client/src/pxar/mod.rs b/pbs-client/src/pxar/mod.rs
index 14674b9b9..b7dcf8362 100644
--- a/pbs-client/src/pxar/mod.rs
+++ b/pbs-client/src/pxar/mod.rs
@@ -56,7 +56,7 @@ pub(crate) mod tools;
 mod flags;
 pub use flags::Flags;
 
-pub use create::{create_archive, PxarCreateOptions};
+pub use create::{create_archive, PxarCreateOptions, PxarWriters};
 pub use extract::{
     create_tar, create_zip, extract_archive, extract_sub_dir, extract_sub_dir_seq, ErrorHandler,
     OverwriteFlags, PxarExtractContext, PxarExtractOptions,
diff --git a/pbs-client/src/pxar_backup_stream.rs b/pbs-client/src/pxar_backup_stream.rs
index 22a6ffdc2..8dc3fd088 100644
--- a/pbs-client/src/pxar_backup_stream.rs
+++ b/pbs-client/src/pxar_backup_stream.rs
@@ -17,6 +17,8 @@ use proxmox_io::StdChannelWriter;
 
 use pbs_datastore::catalog::CatalogWriter;
 
+use crate::pxar::create::PxarWriters;
+
 /// Stream implementation to encode and upload .pxar archives.
 ///
 /// The hyper client needs an async Stream for file upload, so we
@@ -53,16 +55,16 @@ impl PxarBackupStream {
                 StdChannelWriter::new(tx),
             ));
 
-            let writer = pxar::encoder::sync::StandardWriter::new(writer);
+            let writer =
+                pxar::PxarVariant::Unified(pxar::encoder::sync::StandardWriter::new(writer));
             if let Err(err) = crate::pxar::create_archive(
                 dir,
-                writer,
+                PxarWriters::new(writer, Some(catalog)),
                 crate::pxar::Flags::DEFAULT,
                 move |path| {
                     log::debug!("{:?}", path);
                     Ok(())
                 },
-                Some(catalog),
                 options,
             )
             .await
diff --git a/proxmox-restore-daemon/src/proxmox_restore_daemon/api.rs b/proxmox-restore-daemon/src/proxmox_restore_daemon/api.rs
index cb7b53e11..95c9f4619 100644
--- a/proxmox-restore-daemon/src/proxmox_restore_daemon/api.rs
+++ b/proxmox-restore-daemon/src/proxmox_restore_daemon/api.rs
@@ -23,7 +23,9 @@ use proxmox_sortable_macro::sortable;
 use proxmox_sys::fs::read_subdir;
 
 use pbs_api_types::file_restore::{FileRestoreFormat, RestoreDaemonStatus};
-use pbs_client::pxar::{create_archive, Flags, PxarCreateOptions, ENCODER_MAX_ENTRIES};
+use pbs_client::pxar::{
+    create_archive, Flags, PxarCreateOptions, PxarWriters, ENCODER_MAX_ENTRIES,
+};
 use pbs_datastore::catalog::{ArchiveEntry, DirEntryAttribute};
 use pbs_tools::json::required_string_param;
 
@@ -360,8 +362,8 @@ fn extract(
                         skip_e2big_xattr: false,
                     };
 
-                    let pxar_writer = TokioWriter::new(writer);
-                    create_archive(dir, pxar_writer, Flags::DEFAULT, |_| Ok(()), None, options)
+                    let pxar_writer = pxar::PxarVariant::Unified(TokioWriter::new(writer));
+                    create_archive(dir, PxarWriters::new(pxar_writer, None), Flags::DEFAULT, |_| Ok(()), options)
                         .await
                 }
                 .await;
diff --git a/pxar-bin/src/main.rs b/pxar-bin/src/main.rs
index 0729db1c9..61756d21c 100644
--- a/pxar-bin/src/main.rs
+++ b/pxar-bin/src/main.rs
@@ -13,7 +13,8 @@ use tokio::signal::unix::{signal, SignalKind};
 
 use pathpatterns::{MatchEntry, MatchType, PatternFlag};
 use pbs_client::pxar::{
-    format_single_line_entry, Flags, OverwriteFlags, PxarExtractOptions, ENCODER_MAX_ENTRIES,
+    format_single_line_entry, Flags, OverwriteFlags, PxarExtractOptions, PxarWriters,
+    ENCODER_MAX_ENTRIES,
 };
 
 use proxmox_router::cli::*;
@@ -373,16 +374,15 @@ async fn create_archive(
         feature_flags.remove(Flags::WITH_SOCKETS);
     }
 
-    let writer = pxar::encoder::sync::StandardWriter::new(writer);
+    let writer = pxar::PxarVariant::Unified(pxar::encoder::sync::StandardWriter::new(writer));
     pbs_client::pxar::create_archive(
         dir,
-        writer,
+        PxarWriters::new(writer, None),
         feature_flags,
         move |path| {
             log::debug!("{:?}", path);
             Ok(())
         },
-        None,
         options,
     )
     .await?;
diff --git a/tests/catar.rs b/tests/catar.rs
index 36bb4f3bc..932df61a9 100644
--- a/tests/catar.rs
+++ b/tests/catar.rs
@@ -19,7 +19,7 @@ fn run_test(dir_name: &str) -> Result<(), Error> {
         .write(true)
         .truncate(true)
         .open("test-proxmox.catar")?;
-    let writer = pxar::encoder::sync::StandardWriter::new(writer);
+    let writer = pxar::PxarVariant::Unified(pxar::encoder::sync::StandardWriter::new(writer));
 
     let dir = nix::dir::Dir::open(
         dir_name,
@@ -35,10 +35,9 @@ fn run_test(dir_name: &str) -> Result<(), Error> {
     let rt = tokio::runtime::Runtime::new().unwrap();
     rt.block_on(create_archive(
         dir,
-        writer,
+        PxarWriters::new(writer, None),
         Flags::DEFAULT,
         |_| Ok(()),
-        None,
         options,
     ))?;
 
-- 
2.39.2




* [pbs-devel] [PATCH v7 proxmox-backup 20/69] client: pxar: optionally split metadata and payload streams
  2024-05-27 14:32 [pbs-devel] [PATCH v7 pxar proxmox-backup 00/69] fix #3174: improve file-level backup Christian Ebner
                   ` (18 preceding siblings ...)
  2024-05-27 14:32 ` [pbs-devel] [PATCH v7 proxmox-backup 19/69] client: pxar: combine writers into struct Christian Ebner
@ 2024-05-27 14:32 ` Christian Ebner
  2024-05-27 14:32 ` [pbs-devel] [PATCH v7 proxmox-backup 21/69] client: helper: add helpers for creating reader instances Christian Ebner
                   ` (49 subsequent siblings)
  69 siblings, 0 replies; 71+ messages in thread
From: Christian Ebner @ 2024-05-27 14:32 UTC (permalink / raw)
  To: pbs-devel

... and attach the split payload writer variant to the pxar archive
creation. As a result, metadata and payload data are written to
separate dynamic indexes, which allows payload chunks to be looked up
and reused without the additional overhead of the pxar archive's
metadata.

For now this functionality remains disabled; it will be enabled in a
later patch once the logic for reusing the payload chunks is in
place.
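The channel setup in `PxarBackupStream::new` can be sketched in isolation. In this simplified model (hypothetical names `Sink` and `spawn_backup_streams`; a thread instead of a tokio task; entry names and file contents sent as raw byte chunks rather than pxar-encoded data), a second bounded channel is created only when a separate payload stream is requested, and the caller receives the metadata receiver plus an optional payload receiver, analogous to the patch's `(Self, Option<Self>)` return value.

```rust
use std::sync::mpsc::{sync_channel, Receiver, SyncSender};
use std::thread;

type Chunk = Vec<u8>;

/// Hypothetical stand-in for the unified/split writer variant.
enum Sink {
    Unified(SyncSender<Chunk>),
    Split(SyncSender<Chunk>, SyncSender<Chunk>),
}

/// Spawn a toy encoder and return the metadata receiver plus, only when
/// requested, a receiver for the separate payload stream.
fn spawn_backup_streams(
    files: Vec<(String, Chunk)>,
    separate_payload_stream: bool,
) -> (Receiver<Chunk>, Option<Receiver<Chunk>>) {
    let (tx, rx) = sync_channel(10);
    let (sink, payload_rx) = if separate_payload_stream {
        let (payload_tx, payload_rx) = sync_channel(10);
        (Sink::Split(tx, payload_tx), Some(payload_rx))
    } else {
        (Sink::Unified(tx), None)
    };

    thread::spawn(move || {
        for (name, content) in files {
            let sent = match &sink {
                // Metadata and payload interleaved in a single stream.
                Sink::Unified(tx) => tx.send(name.into_bytes()).and_then(|_| tx.send(content)),
                // Metadata and payload routed to different streams.
                Sink::Split(meta_tx, payload_tx) => meta_tx
                    .send(name.into_bytes())
                    .and_then(|_| payload_tx.send(content)),
            };
            // A send error means the consumer hung up, so stop encoding.
            if sent.is_err() {
                break;
            }
        }
        // `sink` is dropped here, which closes both streams.
    });

    (rx, payload_rx)
}
```

Because both channels are bounded, a real consumer has to drain the two receivers concurrently; otherwise a full payload channel blocks the producer while the reader waits on the metadata stream.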

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
changes since version 6:
- adapt to new PxarVariant pxar interface type

 pbs-client/src/pxar_backup_stream.rs | 51 ++++++++++++++-----
 proxmox-backup-client/src/main.rs    | 75 +++++++++++++++++++++++++---
 2 files changed, 105 insertions(+), 21 deletions(-)

diff --git a/pbs-client/src/pxar_backup_stream.rs b/pbs-client/src/pxar_backup_stream.rs
index 8dc3fd088..3541eddb5 100644
--- a/pbs-client/src/pxar_backup_stream.rs
+++ b/pbs-client/src/pxar_backup_stream.rs
@@ -42,21 +42,37 @@ impl PxarBackupStream {
         dir: Dir,
         catalog: Arc<Mutex<CatalogWriter<W>>>,
         options: crate::pxar::PxarCreateOptions,
-    ) -> Result<Self, Error> {
-        let (tx, rx) = std::sync::mpsc::sync_channel(10);
-
+        separate_payload_stream: bool,
+    ) -> Result<(Self, Option<Self>), Error> {
         let buffer_size = 256 * 1024;
 
-        let error = Arc::new(Mutex::new(None));
-        let error2 = Arc::clone(&error);
-        let handler = async move {
-            let writer = TokioWriterAdapter::new(std::io::BufWriter::with_capacity(
+        let (tx, rx) = std::sync::mpsc::sync_channel(10);
+        let writer = TokioWriterAdapter::new(std::io::BufWriter::with_capacity(
+            buffer_size,
+            StdChannelWriter::new(tx),
+        ));
+        let writer = pxar::encoder::sync::StandardWriter::new(writer);
+
+        let (writer, payload_rx) = if separate_payload_stream {
+            let (tx, rx) = std::sync::mpsc::sync_channel(10);
+            let payload_writer = TokioWriterAdapter::new(std::io::BufWriter::with_capacity(
                 buffer_size,
                 StdChannelWriter::new(tx),
             ));
+            (
+                pxar::PxarVariant::Split(
+                    writer,
+                    pxar::encoder::sync::StandardWriter::new(payload_writer),
+                ),
+                Some(rx),
+            )
+        } else {
+            (pxar::PxarVariant::Unified(writer), None)
+        };
 
-            let writer =
-                pxar::PxarVariant::Unified(pxar::encoder::sync::StandardWriter::new(writer));
+        let error = Arc::new(Mutex::new(None));
+        let error2 = Arc::clone(&error);
+        let handler = async move {
             if let Err(err) = crate::pxar::create_archive(
                 dir,
                 PxarWriters::new(writer, Some(catalog)),
@@ -78,21 +94,30 @@ impl PxarBackupStream {
         let future = Abortable::new(handler, registration);
         tokio::spawn(future);
 
-        Ok(Self {
+        let backup_stream = Self {
+            rx: Some(rx),
+            handle: Some(handle.clone()),
+            error: Arc::clone(&error),
+        };
+
+        let backup_payload_stream = payload_rx.map(|rx| Self {
             rx: Some(rx),
             handle: Some(handle),
             error,
-        })
+        });
+
+        Ok((backup_stream, backup_payload_stream))
     }
 
     pub fn open<W: Write + Send + 'static>(
         dirname: &Path,
         catalog: Arc<Mutex<CatalogWriter<W>>>,
         options: crate::pxar::PxarCreateOptions,
-    ) -> Result<Self, Error> {
+        separate_payload_stream: bool,
+    ) -> Result<(Self, Option<Self>), Error> {
         let dir = nix::dir::Dir::open(dirname, OFlag::O_DIRECTORY, Mode::empty())?;
 
-        Self::new(dir, catalog, options)
+        Self::new(dir, catalog, options, separate_payload_stream)
     }
 }
 
diff --git a/proxmox-backup-client/src/main.rs b/proxmox-backup-client/src/main.rs
index ad2bc5a66..25556d672 100644
--- a/proxmox-backup-client/src/main.rs
+++ b/proxmox-backup-client/src/main.rs
@@ -187,18 +187,24 @@ async fn backup_directory<P: AsRef<Path>>(
     client: &BackupWriter,
     dir_path: P,
     archive_name: &str,
+    payload_target: Option<&str>,
     chunk_size: Option<usize>,
     catalog: Arc<Mutex<CatalogWriter<TokioWriterAdapter<StdChannelWriter<Error>>>>>,
     pxar_create_options: pbs_client::pxar::PxarCreateOptions,
     upload_options: UploadOptions,
-) -> Result<BackupStats, Error> {
+) -> Result<(BackupStats, Option<BackupStats>), Error> {
     if upload_options.fixed_size.is_some() {
         bail!("cannot backup directory with fixed chunk size!");
     }
 
-    let pxar_stream = PxarBackupStream::open(dir_path.as_ref(), catalog, pxar_create_options)?;
-    let mut chunk_stream = ChunkStream::new(pxar_stream, chunk_size);
+    let (pxar_stream, payload_stream) = PxarBackupStream::open(
+        dir_path.as_ref(),
+        catalog,
+        pxar_create_options,
+        payload_target.is_some(),
+    )?;
 
+    let mut chunk_stream = ChunkStream::new(pxar_stream, chunk_size);
     let (tx, rx) = mpsc::channel(10); // allow to buffer 10 chunks
 
     let stream = ReceiverStream::new(rx).map_err(Error::from);
@@ -210,11 +216,36 @@ async fn backup_directory<P: AsRef<Path>>(
         }
     });
 
-    let stats = client
-        .upload_stream(archive_name, stream, upload_options)
-        .await?;
+    let stats = client.upload_stream(archive_name, stream, upload_options.clone());
 
-    Ok(stats)
+    if let Some(payload_stream) = payload_stream {
+        let payload_target = payload_target
+            .ok_or_else(|| format_err!("got payload stream, but no target archive name"))?;
+
+        let mut payload_chunk_stream = ChunkStream::new(payload_stream, chunk_size);
+        let (payload_tx, payload_rx) = mpsc::channel(10); // allow to buffer 10 chunks
+        let stream = ReceiverStream::new(payload_rx).map_err(Error::from);
+
+        // spawn payload chunker inside a separate task so that it can run parallel
+        tokio::spawn(async move {
+            while let Some(v) = payload_chunk_stream.next().await {
+                let _ = payload_tx.send(v).await;
+            }
+        });
+
+        let payload_stats = client.upload_stream(&payload_target, stream, upload_options);
+
+        match futures::join!(stats, payload_stats) {
+            (Ok(stats), Ok(payload_stats)) => Ok((stats, Some(payload_stats))),
+            (Err(err), Ok(_)) => Err(format_err!("upload failed: {err}")),
+            (Ok(_), Err(err)) => Err(format_err!("upload failed: {err}")),
+            (Err(err), Err(payload_err)) => {
+                Err(format_err!("upload failed: {err} - {payload_err}"))
+            }
+        }
+    } else {
+        Ok((stats.await?, None))
+    }
 }
 
 async fn backup_image<P: AsRef<Path>>(
@@ -985,6 +1016,23 @@ async fn create_backup(
                 manifest.add_file(target, stats.size, stats.csum, crypto.mode)?;
             }
             (BackupSpecificationType::PXAR, false) => {
+                let metadata_mode = false; // Until enabled via param
+
+                let target_base = if let Some(base) = target_base.strip_suffix(".pxar") {
+                    base.to_string()
+                } else {
+                    bail!("unexpected suffix in target: {target_base}");
+                };
+
+                let (target, payload_target) = if metadata_mode {
+                    (
+                        format!("{target_base}.mpxar.{extension}"),
+                        Some(format!("{target_base}.ppxar.{extension}")),
+                    )
+                } else {
+                    (target, None)
+                };
+
                 // start catalog upload on first use
                 if catalog.is_none() {
                     let catalog_upload_res =
@@ -1015,16 +1063,27 @@ async fn create_backup(
                     ..UploadOptions::default()
                 };
 
-                let stats = backup_directory(
+                let (stats, payload_stats) = backup_directory(
                     &client,
                     &filename,
                     &target,
+                    payload_target.as_deref(),
                     chunk_size_opt,
                     catalog.clone(),
                     pxar_options,
                     upload_options,
                 )
                 .await?;
+
+                if let Some(payload_stats) = payload_stats {
+                    manifest.add_file(
+                        payload_target
+                            .ok_or_else(|| format_err!("missing payload target archive"))?,
+                        payload_stats.size,
+                        payload_stats.csum,
+                        crypto.mode,
+                    )?;
+                }
                 manifest.add_file(target, stats.size, stats.csum, crypto.mode)?;
                 catalog.lock().unwrap().end_directory()?;
             }
-- 
2.39.2



_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel


^ permalink raw reply	[flat|nested] 71+ messages in thread

* [pbs-devel] [PATCH v7 proxmox-backup 21/69] client: helper: add helpers for creating reader instances
  2024-05-27 14:32 [pbs-devel] [PATCH v7 pxar proxmox-backup 00/69] fix #3174: improve file-level backup Christian Ebner
                   ` (19 preceding siblings ...)
  2024-05-27 14:32 ` [pbs-devel] [PATCH v7 proxmox-backup 20/69] client: pxar: optionally split metadata and payload streams Christian Ebner
@ 2024-05-27 14:32 ` Christian Ebner
  2024-05-27 14:32 ` [pbs-devel] [PATCH v7 proxmox-backup 22/69] client: helper: add method for split archive name mapping Christian Ebner
                   ` (48 subsequent siblings)
  69 siblings, 0 replies; 71+ messages in thread
From: Christian Ebner @ 2024-05-27 14:32 UTC (permalink / raw)
  To: pbs-devel

Add a module for helper methods which are used by different
submodules of the client.

Add `get_pxar_fuse_reader`, `get_buffered_pxar_reader` and
`get_pxar_fuse_accessor` to create reader instances for accessing pxar
archives.
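The core of `get_pxar_fuse_accessor` is choosing between a unified and a split reader depending on whether a payload archive name is given. A minimal sketch of that pattern, assuming a generic fallible `open` callback in place of `get_pxar_fuse_reader`; `Variant` and `open_variant` are hypothetical names, not the pxar API.

```rust
/// Hypothetical stand-in for `pxar::PxarVariant`.
#[derive(Debug, PartialEq)]
pub enum Variant<T, P> {
    Unified(T),
    Split(T, P),
}

/// Open the metadata (or unified) archive and, if a payload archive name is
/// given, the payload archive as well. Any error from `open` is propagated.
pub fn open_variant<R, E>(
    archive_name: &str,
    payload_archive_name: Option<&str>,
    mut open: impl FnMut(&str) -> Result<R, E>,
) -> Result<Variant<R, R>, E> {
    let reader = open(archive_name)?;
    match payload_archive_name {
        Some(name) => Ok(Variant::Split(reader, open(name)?)),
        None => Ok(Variant::Unified(reader)),
    }
}
```

In the patch itself the payload half also carries the payload archive size, so its type differs from the metadata reader; the sketch keeps both halves the same type for brevity.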

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
changes since version 6:
- adapt to new PxarVariant pxar interface type
- fix clippy warnings

 proxmox-backup-client/src/helper.rs | 72 +++++++++++++++++++++++++++++
 proxmox-backup-client/src/main.rs   |  2 +
 2 files changed, 74 insertions(+)
 create mode 100644 proxmox-backup-client/src/helper.rs

diff --git a/proxmox-backup-client/src/helper.rs b/proxmox-backup-client/src/helper.rs
new file mode 100644
index 000000000..5b21b6720
--- /dev/null
+++ b/proxmox-backup-client/src/helper.rs
@@ -0,0 +1,72 @@
+use std::sync::Arc;
+
+use anyhow::Error;
+use pbs_client::{BackupReader, RemoteChunkReader};
+use pbs_datastore::BackupManifest;
+use pbs_tools::crypt_config::CryptConfig;
+
+use crate::{BufferedDynamicReadAt, BufferedDynamicReader, IndexFile};
+
+pub(crate) async fn get_pxar_fuse_accessor(
+    archive_name: &str,
+    payload_archive_name: Option<&str>,
+    client: Arc<BackupReader>,
+    manifest: &BackupManifest,
+    crypt_config: Option<Arc<CryptConfig>>,
+) -> Result<pbs_pxar_fuse::Accessor, Error> {
+    let (reader, archive_size) =
+        get_pxar_fuse_reader(archive_name, client.clone(), manifest, crypt_config.clone()).await?;
+
+    let reader = if let Some(payload_archive_name) = payload_archive_name {
+        let (payload_reader, payload_size) = get_pxar_fuse_reader(
+            payload_archive_name,
+            client.clone(),
+            manifest,
+            crypt_config.clone(),
+        )
+        .await?;
+
+        pxar::PxarVariant::Split(reader, (payload_reader, payload_size))
+    } else {
+        pxar::PxarVariant::Unified(reader)
+    };
+
+    let accessor = pbs_pxar_fuse::Accessor::new(reader, archive_size).await?;
+
+    Ok(accessor)
+}
+
+pub(crate) async fn get_pxar_fuse_reader(
+    archive_name: &str,
+    client: Arc<BackupReader>,
+    manifest: &BackupManifest,
+    crypt_config: Option<Arc<CryptConfig>>,
+) -> Result<(pbs_pxar_fuse::Reader, u64), Error> {
+    let reader = get_buffered_pxar_reader(archive_name, client, manifest, crypt_config).await?;
+    let archive_size = reader.archive_size();
+    let reader: pbs_pxar_fuse::Reader = Arc::new(BufferedDynamicReadAt::new(reader));
+
+    Ok((reader, archive_size))
+}
+
+pub(crate) async fn get_buffered_pxar_reader(
+    archive_name: &str,
+    client: Arc<BackupReader>,
+    manifest: &BackupManifest,
+    crypt_config: Option<Arc<CryptConfig>>,
+) -> Result<BufferedDynamicReader<RemoteChunkReader>, Error> {
+    let index = client
+        .download_dynamic_index(manifest, archive_name)
+        .await?;
+
+    let most_used = index.find_most_used_chunks(8);
+    let file_info = manifest.lookup_file_info(archive_name)?;
+    let chunk_reader = RemoteChunkReader::new(
+        client.clone(),
+        crypt_config.clone(),
+        file_info.chunk_crypt_mode(),
+        most_used,
+    );
+
+    Ok(BufferedDynamicReader::new(index, chunk_reader))
+}
diff --git a/proxmox-backup-client/src/main.rs b/proxmox-backup-client/src/main.rs
index 25556d672..db0fb6324 100644
--- a/proxmox-backup-client/src/main.rs
+++ b/proxmox-backup-client/src/main.rs
@@ -72,6 +72,8 @@ mod catalog;
 pub use catalog::*;
 mod snapshot;
 pub use snapshot::*;
+mod helper;
+pub(crate) use helper::*;
 pub mod key;
 pub mod namespace;
 
-- 
2.39.2



* [pbs-devel] [PATCH v7 proxmox-backup 22/69] client: helper: add method for split archive name mapping
  2024-05-27 14:32 [pbs-devel] [PATCH v7 pxar proxmox-backup 00/69] fix #3174: improve file-level backup Christian Ebner
                   ` (20 preceding siblings ...)
  2024-05-27 14:32 ` [pbs-devel] [PATCH v7 proxmox-backup 21/69] client: helper: add helpers for creating reader instances Christian Ebner
@ 2024-05-27 14:32 ` Christian Ebner
  2024-05-27 14:32 ` [pbs-devel] [PATCH v7 proxmox-backup 23/69] client: tools: helper to check pxar filename extensions Christian Ebner
                   ` (47 subsequent siblings)
  69 siblings, 0 replies; 71+ messages in thread
From: Christian Ebner @ 2024-05-27 14:32 UTC (permalink / raw)
  To: pbs-devel

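Helper method that takes either the metadata or the payload archive
name as input and maps it to the matching pair of metadata and payload
archive names.

A regular `.pxar` archive name is mapped to the split archive names
only if no such regular archive is present in the manifest. If neither
case matches, fall back to returning the passed in archive name as
target archive and `None` for the payload archive name.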
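A self-contained sketch of this mapping, assuming the manifest lists server-side archive names ending in `.didx` (as added by `create_backup`). `split_archive_names` and the plain slice of names are hypothetical stand-ins for `get_pxar_archive_names` and `BackupManifest`.

```rust
/// Map an archive name to (metadata or unified name, optional payload name).
/// `manifest_files` stands in for the file names listed in the manifest.
pub fn split_archive_names(archive_name: &str, manifest_files: &[&str]) -> (String, Option<String>) {
    // Remember whether the caller passed a server-side name (with `.didx`).
    let (filename, ext) = match archive_name.strip_suffix(".didx") {
        Some(stripped) => (stripped, ".didx"),
        None => (archive_name, ""),
    };
    let split_names = |base: &str| {
        (format!("{base}.mpxar{ext}"), Some(format!("{base}.ppxar{ext}")))
    };

    // Either half of a split archive maps to both halves.
    if let Some(base) = filename
        .strip_suffix(".mpxar")
        .or_else(|| filename.strip_suffix(".ppxar"))
    {
        return split_names(base);
    }

    // A `.pxar` name falls back to split naming only if no regular archive exists.
    if let Some(base) = filename.strip_suffix(".pxar") {
        let server_name = format!("{filename}.didx");
        if !manifest_files.iter().any(|name| *name == server_name) {
            return split_names(base);
        }
    }

    (archive_name.to_owned(), None)
}
```

Checking the `.mpxar`/`.ppxar` suffixes first matters, since `data.mpxar` would otherwise also match a `.pxar`-style suffix check on a shorter base.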

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
changes since version 6:
- extend mapping to also include `.pxar` as allowed extension, mapping
  to `.mpxar`

 proxmox-backup-client/src/helper.rs | 42 +++++++++++++++++++++++++++++
 1 file changed, 42 insertions(+)

diff --git a/proxmox-backup-client/src/helper.rs b/proxmox-backup-client/src/helper.rs
index 5b21b6720..5589aa5b1 100644
--- a/proxmox-backup-client/src/helper.rs
+++ b/proxmox-backup-client/src/helper.rs
@@ -70,3 +70,45 @@ pub(crate) async fn get_buffered_pxar_reader(
 
     Ok(BufferedDynamicReader::new(index, chunk_reader))
 }
+
+pub(crate) fn get_pxar_archive_names(
+    archive_name: &str,
+    manifest: &BackupManifest,
+) -> (String, Option<String>) {
+    let filename = archive_name.strip_suffix(".didx").unwrap_or(archive_name);
+
+    if let Some(base) = filename
+        .strip_suffix(".mpxar")
+        .or_else(|| filename.strip_suffix(".ppxar"))
+    {
+        if archive_name.ends_with(".didx") {
+            return (
+                format!("{base}.mpxar.didx"),
+                Some(format!("{base}.ppxar.didx")),
+            );
+        } else {
+            return (format!("{base}.mpxar"), Some(format!("{base}.ppxar")));
+        }
+    }
+
+    if let Some(base) = filename.strip_suffix(".pxar") {
+        // Check if pxar is present, otherwise fall back to split archive naming
+        if manifest
+            .files()
+            .iter()
+            .find(|fileinfo| fileinfo.filename == format!("{filename}.didx"))
+            .is_none()
+        {
+            if archive_name.ends_with(".didx") {
+                return (
+                    format!("{base}.mpxar.didx"),
+                    Some(format!("{base}.ppxar.didx")),
+                );
+            } else {
+                return (format!("{base}.mpxar"), Some(format!("{base}.ppxar")));
+            }
+        }
+    }
+
+    (archive_name.to_owned(), None)
+}
-- 
2.39.2



* [pbs-devel] [PATCH v7 proxmox-backup 23/69] client: tools: helper to check pxar filename extensions
  2024-05-27 14:32 [pbs-devel] [PATCH v7 pxar proxmox-backup 00/69] fix #3174: improve file-level backup Christian Ebner
                   ` (21 preceding siblings ...)
  2024-05-27 14:32 ` [pbs-devel] [PATCH v7 proxmox-backup 22/69] client: helper: add method for split archive name mapping Christian Ebner
@ 2024-05-27 14:32 ` Christian Ebner
  2024-05-27 14:32 ` [pbs-devel] [PATCH v7 proxmox-backup 24/69] client: restore: read payload from dedicated index Christian Ebner
                   ` (46 subsequent siblings)
  69 siblings, 0 replies; 71+ messages in thread
From: Christian Ebner @ 2024-05-27 14:32 UTC (permalink / raw)
  To: pbs-devel

With the introduction of split pxar archives, the allowed extensions
are now `.pxar`, `.mpxar` and `.ppxar`. Add a helper function to
check for all valid variants, including the optional additional
`.didx` extension in case of a server archive name.
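A table-driven sketch that should behave the same as `has_pxar_filename_extension`. The constant `PXAR_EXTENSIONS` and the function name `has_pxar_extension` are hypothetical; the patch spells out the three suffixes explicitly instead.

```rust
/// Valid pxar archive extensions: regular, metadata-only and payload-only archives.
const PXAR_EXTENSIONS: [&str; 3] = [".pxar", ".mpxar", ".ppxar"];

/// If `with_didx_extension` is true, the name must additionally end in `.didx`
/// (server archive name); the pxar extension is checked in front of it.
pub fn has_pxar_extension(name: &str, with_didx_extension: bool) -> bool {
    let name = if with_didx_extension {
        match name.strip_suffix(".didx") {
            Some(stripped) => stripped,
            None => return false,
        }
    } else {
        name
    };
    PXAR_EXTENSIONS.iter().any(|ext| name.ends_with(*ext))
}
```

Keeping the suffixes in one list means a future archive variant only has to be added in a single place.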

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
changes since version 6:
- not present in previous version

 pbs-client/src/tools/mod.rs | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/pbs-client/src/tools/mod.rs b/pbs-client/src/tools/mod.rs
index 1b0123a39..6a9d1992d 100644
--- a/pbs-client/src/tools/mod.rs
+++ b/pbs-client/src/tools/mod.rs
@@ -526,3 +526,16 @@ pub fn place_xdg_file(
         .and_then(|base| base.place_config_file(file_name).map_err(Error::from))
         .with_context(|| format!("failed to place {} in xdg home", description))
 }
+
+/// Check if the given filename has a valid pxar filename extension variant
+///
+/// If `with_didx_extension` is `true`, check the additional `.didx` ending.
+pub fn has_pxar_filename_extension(name: &str, with_didx_extension: bool) -> bool {
+    if with_didx_extension {
+        name.ends_with(".pxar.didx")
+            || name.ends_with(".mpxar.didx")
+            || name.ends_with(".ppxar.didx")
+    } else {
+        name.ends_with(".pxar") || name.ends_with(".mpxar") || name.ends_with(".ppxar")
+    }
+}
-- 
2.39.2



* [pbs-devel] [PATCH v7 proxmox-backup 24/69] client: restore: read payload from dedicated index
  2024-05-27 14:32 [pbs-devel] [PATCH v7 pxar proxmox-backup 00/69] fix #3174: improve file-level backup Christian Ebner
                   ` (22 preceding siblings ...)
  2024-05-27 14:32 ` [pbs-devel] [PATCH v7 proxmox-backup 23/69] client: tools: helper to check pxar filename extensions Christian Ebner
@ 2024-05-27 14:32 ` Christian Ebner
  2024-05-27 14:32 ` [pbs-devel] [PATCH v7 proxmox-backup 25/69] tools: cover extension for split pxar archives Christian Ebner
                   ` (45 subsequent siblings)
  69 siblings, 0 replies; 71+ messages in thread
From: Christian Ebner @ 2024-05-27 14:32 UTC (permalink / raw)
  To: pbs-devel

Whenever a split pxar archive is encountered, instantiate and attach
the required dedicated reader instance to the decoder instance on
restore.

Piping the output to stdout is not possible for split archives, as this
would require a decoder instance which decodes the input streams while
maintaining the pxar stream format as output.
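A sketch of the resulting restore decision, assuming the archive names were already mapped to server names. `RestoreMode` and `restore_mode` are hypothetical; the error message matches the `bail!` added by this patch, which likewise checks the archive name suffix rather than the presence of a payload archive.

```rust
/// How a dynamic index archive is restored (hypothetical summary type).
#[derive(Debug, PartialEq)]
pub enum RestoreMode {
    /// Extract into a target directory; split archives get a second (payload) reader.
    Extract { split: bool },
    /// Stream the raw pxar archive to stdout, only possible for unified archives.
    Pipe,
}

/// Decide the restore mode, refusing to pipe split archives.
pub fn restore_mode(
    archive_name: &str,
    payload_archive_name: Option<&str>,
    target: Option<&str>,
) -> Result<RestoreMode, String> {
    match target {
        Some(_) => Ok(RestoreMode::Extract {
            split: payload_archive_name.is_some(),
        }),
        None if archive_name.ends_with(".mpxar.didx") || archive_name.ends_with(".ppxar.didx") => {
            Err("unable to pipe split archive".to_string())
        }
        None => Ok(RestoreMode::Pipe),
    }
}
```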

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
changes since version 6:
- adapt to PxarVariant pxar interface
- use newly introduced has_pxar_filename_extension
- allow for pxar -> mpxar mapping

 proxmox-backup-client/src/main.rs | 45 ++++++++++++++++++++-----------
 1 file changed, 29 insertions(+), 16 deletions(-)

diff --git a/proxmox-backup-client/src/main.rs b/proxmox-backup-client/src/main.rs
index db0fb6324..ef481743f 100644
--- a/proxmox-backup-client/src/main.rs
+++ b/proxmox-backup-client/src/main.rs
@@ -35,7 +35,7 @@ use pbs_client::tools::{
     complete_archive_name, complete_auth_id, complete_backup_group, complete_backup_snapshot,
     complete_backup_source, complete_chunk_size, complete_group_or_snapshot,
     complete_img_archive_name, complete_namespace, complete_pxar_archive_name, complete_repository,
-    connect, connect_rate_limited, extract_repository_from_value,
+    connect, connect_rate_limited, extract_repository_from_value, has_pxar_filename_extension,
     key_source::{
         crypto_parameters, format_key_source, get_encryption_key_password, KEYFD_SCHEMA,
         KEYFILE_SCHEMA, MASTER_PUBKEY_FD_SCHEMA, MASTER_PUBKEY_FILE_SCHEMA,
@@ -1216,7 +1216,7 @@ async fn dump_image<W: Write>(
 fn parse_archive_type(name: &str) -> (String, ArchiveType) {
     if name.ends_with(".didx") || name.ends_with(".fidx") || name.ends_with(".blob") {
         (name.into(), archive_type(name).unwrap())
-    } else if name.ends_with(".pxar") {
+    } else if has_pxar_filename_extension(name, false) {
         (format!("{}.didx", name), ArchiveType::DynamicIndex)
     } else if name.ends_with(".img") {
         (format!("{}.fidx", name), ArchiveType::FixedIndex)
@@ -1400,6 +1400,9 @@ async fn restore(
 
     let (manifest, backup_index_data) = client.download_manifest().await?;
 
+    let (archive_name, payload_archive_name) =
+        helper::get_pxar_archive_names(&archive_name, &manifest);
+
     if archive_name == ENCRYPTED_KEY_BLOB_NAME && crypt_config.is_none() {
         log::info!("Restoring encrypted key blob without original key - skipping manifest fingerprint check!")
     } else {
@@ -1450,20 +1453,13 @@ async fn restore(
                 .map_err(|err| format_err!("unable to pipe data - {}", err))?;
         }
     } else if archive_type == ArchiveType::DynamicIndex {
-        let index = client
-            .download_dynamic_index(&manifest, &archive_name)
-            .await?;
-
-        let most_used = index.find_most_used_chunks(8);
-
-        let chunk_reader = RemoteChunkReader::new(
+        let mut reader = get_buffered_pxar_reader(
+            &archive_name,
             client.clone(),
-            crypt_config,
-            file_info.chunk_crypt_mode(),
-            most_used,
-        );
-
-        let mut reader = BufferedDynamicReader::new(index, chunk_reader);
+            &manifest,
+            crypt_config.clone(),
+        )
+        .await?;
 
         let on_error = if ignore_extract_device_errors {
             let handler: PxarErrorHandler = Box::new(move |err: Error| {
@@ -1518,8 +1514,22 @@ async fn restore(
         }
 
         if let Some(target) = target {
+            let reader = if let Some(payload_archive_name) = payload_archive_name {
+                let payload_reader = get_buffered_pxar_reader(
+                    &payload_archive_name,
+                    client.clone(),
+                    &manifest,
+                    crypt_config.clone(),
+                )
+                .await?;
+                pxar::PxarVariant::Split(reader, payload_reader)
+            } else {
+                pxar::PxarVariant::Unified(reader)
+            };
+            let decoder = pxar::decoder::Decoder::from_std(reader)?;
+
             pbs_client::pxar::extract_archive(
-                pxar::decoder::Decoder::from_std(pxar::PxarVariant::Unified(reader))?,
+                decoder,
                 Path::new(target),
                 feature_flags,
                 |path| {
@@ -1529,6 +1539,9 @@ async fn restore(
             )
             .map_err(|err| format_err!("error extracting archive - {:#}", err))?;
         } else {
+            if archive_name.ends_with(".mpxar.didx") || archive_name.ends_with(".ppxar.didx") {
+                bail!("unable to pipe split archive");
+            }
             let mut writer = std::fs::OpenOptions::new()
                 .write(true)
                 .open("/dev/stdout")
-- 
2.39.2



* [pbs-devel] [PATCH v7 proxmox-backup 25/69] tools: cover extension for split pxar archives
  2024-05-27 14:32 [pbs-devel] [PATCH v7 pxar proxmox-backup 00/69] fix #3174: improve file-level backup Christian Ebner
                   ` (23 preceding siblings ...)
  2024-05-27 14:32 ` [pbs-devel] [PATCH v7 proxmox-backup 24/69] client: restore: read payload from dedicated index Christian Ebner
@ 2024-05-27 14:32 ` Christian Ebner
  2024-05-27 14:32 ` [pbs-devel] [PATCH v7 proxmox-backup 26/69] restore: " Christian Ebner
                   ` (44 subsequent siblings)
  69 siblings, 0 replies; 71+ messages in thread
From: Christian Ebner @ 2024-05-27 14:32 UTC (permalink / raw)
  To: pbs-devel

Cover the additional `.mpxar` extension for the metadata archive and
`.ppxar` for the payload data archive in the CLI parameter completion
callback.
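A sketch of the completion filter after this change, assuming the server file list is available as plain strings. `complete_pxar_names` is hypothetical, and stripping `.didx` with `strip_suffix` stands in for `pbs_tools::format::strip_server_file_extension`.

```rust
/// Keep server file names that are pxar archives (regular, metadata or payload)
/// and offer them without the server-side `.didx` suffix.
pub fn complete_pxar_names(server_files: &[&str]) -> Vec<String> {
    server_files
        .iter()
        .filter(|name| {
            [".pxar.didx", ".mpxar.didx", ".ppxar.didx"]
                .iter()
                .any(|ext| name.ends_with(*ext))
        })
        .filter_map(|name| name.strip_suffix(".didx"))
        .map(str::to_owned)
        .collect()
}
```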

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
changes since version 6:
- use newly introduced has_pxar_filename_extension helper

 pbs-client/src/tools/mod.rs | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/pbs-client/src/tools/mod.rs b/pbs-client/src/tools/mod.rs
index 6a9d1992d..f43058dbd 100644
--- a/pbs-client/src/tools/mod.rs
+++ b/pbs-client/src/tools/mod.rs
@@ -337,7 +337,7 @@ pub fn complete_pxar_archive_name(arg: &str, param: &HashMap<String, String>) ->
     complete_server_file_name(arg, param)
         .iter()
         .filter_map(|name| {
-            if name.ends_with(".pxar.didx") {
+            if has_pxar_filename_extension(name, true) {
                 Some(pbs_tools::format::strip_server_file_extension(name).to_owned())
             } else {
                 None
-- 
2.39.2



* [pbs-devel] [PATCH v7 proxmox-backup 26/69] restore: cover extension for split pxar archives
  2024-05-27 14:32 [pbs-devel] [PATCH v7 pxar proxmox-backup 00/69] fix #3174: improve file-level backup Christian Ebner
                   ` (24 preceding siblings ...)
  2024-05-27 14:32 ` [pbs-devel] [PATCH v7 proxmox-backup 25/69] tools: cover extension for split pxar archives Christian Ebner
@ 2024-05-27 14:32 ` Christian Ebner
  2024-05-27 14:32 ` [pbs-devel] [PATCH v7 proxmox-backup 27/69] client: mount: make split pxar archives mountable Christian Ebner
                   ` (43 subsequent siblings)
  69 siblings, 0 replies; 71+ messages in thread
From: Christian Ebner @ 2024-05-27 14:32 UTC (permalink / raw)
  To: pbs-devel

Cover the additional `.mpxar` extension for the metadata archive and
`.ppxar` for the payload data archive of pxar archives written as split
archives.
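A sketch of the archive listing in `list_files` after this change, assuming the manifest is reduced to a slice of server file names. `ArchiveEntry` and `list_archives` are hypothetical; in the patch, pxar entries are reported with a directory attribute and image archives are handled separately.

```rust
/// Kind of entry shown when listing the archives of a snapshot.
#[derive(Debug, PartialEq)]
pub enum ArchiveEntry {
    /// A pxar archive (regular, metadata or payload); its root is a directory.
    PxarDirectory(String),
    /// A VM image archive.
    Image(String),
}

pub fn list_archives(manifest_files: &[&str]) -> Vec<ArchiveEntry> {
    let is_pxar = |name: &str| {
        [".pxar.didx", ".mpxar.didx", ".ppxar.didx"]
            .iter()
            .any(|ext| name.ends_with(*ext))
    };
    manifest_files
        .iter()
        .filter_map(|name| {
            let path = format!("/{name}");
            if is_pxar(*name) {
                Some(ArchiveEntry::PxarDirectory(path))
            } else if name.ends_with(".img.fidx") {
                Some(ArchiveEntry::Image(path))
            } else {
                None // skip blobs and other files
            }
        })
        .collect()
}
```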

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
changes since version 6:
- use newly introduced has_pxar_filename_extension helper

 proxmox-file-restore/src/main.rs | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/proxmox-file-restore/src/main.rs b/proxmox-file-restore/src/main.rs
index 6a6379f27..680281632 100644
--- a/proxmox-file-restore/src/main.rs
+++ b/proxmox-file-restore/src/main.rs
@@ -75,7 +75,7 @@ fn parse_path(path: String, base64: bool) -> Result<ExtractPath, Error> {
         (file, path)
     };
 
-    if file.ends_with(".pxar.didx") {
+    if has_pxar_filename_extension(&file, true) {
         Ok(ExtractPath::Pxar(file, path))
     } else if file.ends_with(".img.fidx") {
         Ok(ExtractPath::VM(file, path))
@@ -123,11 +123,13 @@ async fn list_files(
         ExtractPath::ListArchives => {
             let mut entries = vec![];
             for file in manifest.files() {
-                if !file.filename.ends_with(".pxar.didx") && !file.filename.ends_with(".img.fidx") {
+                if !has_pxar_filename_extension(&file.filename, true)
+                    && !file.filename.ends_with(".img.fidx")
+                {
                     continue;
                 }
                 let path = format!("/{}", file.filename);
-                let attr = if file.filename.ends_with(".pxar.didx") {
+                let attr = if has_pxar_filename_extension(&file.filename, true) {
                     // a pxar file is a file archive, so it's root is also a directory root
                     Some(&DirEntryAttribute::Directory { start: 0 })
                 } else {
-- 
2.39.2



* [pbs-devel] [PATCH v7 proxmox-backup 27/69] client: mount: make split pxar archives mountable
  2024-05-27 14:32 [pbs-devel] [PATCH v7 pxar proxmox-backup 00/69] fix #3174: improve file-level backup Christian Ebner
                   ` (25 preceding siblings ...)
  2024-05-27 14:32 ` [pbs-devel] [PATCH v7 proxmox-backup 26/69] restore: " Christian Ebner
@ 2024-05-27 14:32 ` Christian Ebner
  2024-05-27 14:32 ` [pbs-devel] [PATCH v7 proxmox-backup 28/69] api: datastore: attach split archive payload chunk reader Christian Ebner
                   ` (42 subsequent siblings)
  69 siblings, 0 replies; 71+ messages in thread
From: Christian Ebner @ 2024-05-27 14:32 UTC (permalink / raw)
  To: pbs-devel

Cover the cases where the pxar archive was uploaded as split payload
data and metadata streams. Instantiate the required reader and
decoder instances to access the metadata and payload data archives.

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
changes since version 6:
- use newly introduced has_pxar_filename_extension helper
- adapt to PxarVariant pxar interface
- fix clippy warnings

 proxmox-backup-client/src/mount.rs | 34 +++++++++++++-----------------
 proxmox-file-restore/src/main.rs   |  1 +
 2 files changed, 16 insertions(+), 19 deletions(-)

diff --git a/proxmox-backup-client/src/mount.rs b/proxmox-backup-client/src/mount.rs
index 4d352b6e4..8b3d8915a 100644
--- a/proxmox-backup-client/src/mount.rs
+++ b/proxmox-backup-client/src/mount.rs
@@ -10,6 +10,7 @@ use futures::future::FutureExt;
 use futures::select;
 use futures::stream::{StreamExt, TryStreamExt};
 use nix::unistd::{fork, ForkResult};
+use pbs_client::tools::has_pxar_filename_extension;
 use serde_json::Value;
 use tokio::signal::unix::{signal, SignalKind};
 
@@ -21,17 +22,16 @@ use pbs_api_types::BackupNamespace;
 use pbs_client::tools::key_source::get_encryption_key_password;
 use pbs_client::{BackupReader, RemoteChunkReader};
 use pbs_datastore::cached_chunk_reader::CachedChunkReader;
-use pbs_datastore::dynamic_index::BufferedDynamicReader;
 use pbs_datastore::index::IndexFile;
 use pbs_key_config::load_and_decrypt_key;
 use pbs_tools::crypt_config::CryptConfig;
 use pbs_tools::json::required_string_param;
 
+use crate::helper;
 use crate::{
     complete_group_or_snapshot, complete_img_archive_name, complete_namespace,
     complete_pxar_archive_name, complete_repository, connect, dir_or_last_from_group,
-    extract_repository_from_value, optional_ns_param, record_repository, BufferedDynamicReadAt,
-    REPO_URL_SCHEMA,
+    extract_repository_from_value, optional_ns_param, record_repository, REPO_URL_SCHEMA,
 };
 
 #[sortable]
@@ -219,7 +219,7 @@ async fn mount_do(param: Value, pipe: Option<OwnedFd>) -> Result<Value, Error> {
         }
     };
 
-    let server_archive_name = if archive_name.ends_with(".pxar") {
+    let server_archive_name = if has_pxar_filename_extension(archive_name, false) {
         if target.is_none() {
             bail!("use the 'mount' command to mount pxar archives");
         }
@@ -246,7 +246,10 @@ async fn mount_do(param: Value, pipe: Option<OwnedFd>) -> Result<Value, Error> {
     let (manifest, _) = client.download_manifest().await?;
     manifest.check_fingerprint(crypt_config.as_ref().map(Arc::as_ref))?;
 
-    let file_info = manifest.lookup_file_info(&server_archive_name)?;
+    let (archive_name, payload_archive_name) =
+        helper::get_pxar_archive_names(&server_archive_name, &manifest);
+
+    let file_info = manifest.lookup_file_info(&archive_name)?;
 
     let daemonize = || -> Result<(), Error> {
         if let Some(pipe) = pipe {
@@ -283,21 +286,14 @@ async fn mount_do(param: Value, pipe: Option<OwnedFd>) -> Result<Value, Error> {
         futures::future::select(interrupt_int.recv().boxed(), interrupt_term.recv().boxed());
 
     if server_archive_name.ends_with(".didx") {
-        let index = client
-            .download_dynamic_index(&manifest, &server_archive_name)
-            .await?;
-        let most_used = index.find_most_used_chunks(8);
-        let chunk_reader = RemoteChunkReader::new(
+        let decoder = helper::get_pxar_fuse_accessor(
+            &archive_name,
+            payload_archive_name.as_deref(),
             client.clone(),
-            crypt_config,
-            file_info.chunk_crypt_mode(),
-            most_used,
-        );
-        let reader = BufferedDynamicReader::new(index, chunk_reader);
-        let archive_size = reader.archive_size();
-        let reader: pbs_pxar_fuse::Reader = Arc::new(BufferedDynamicReadAt::new(reader));
-        let decoder =
-            pbs_pxar_fuse::Accessor::new(pxar::PxarVariant::Unified(reader), archive_size).await?;
+            &manifest,
+            crypt_config.clone(),
+        )
+        .await?;
 
         let session =
             pbs_pxar_fuse::Session::mount(decoder, options, false, Path::new(target.unwrap()))
diff --git a/proxmox-file-restore/src/main.rs b/proxmox-file-restore/src/main.rs
index 680281632..61dece97d 100644
--- a/proxmox-file-restore/src/main.rs
+++ b/proxmox-file-restore/src/main.rs
@@ -24,6 +24,7 @@ use pbs_api_types::{file_restore::FileRestoreFormat, BackupDir, BackupNamespace,
 use pbs_client::pxar::{create_tar, create_zip, extract_sub_dir, extract_sub_dir_seq};
 use pbs_client::tools::{
     complete_group_or_snapshot, complete_repository, connect, extract_repository_from_value,
+    has_pxar_filename_extension,
     key_source::{
         crypto_parameters_keep_fd, format_key_source, get_encryption_key_password, KEYFD_SCHEMA,
         KEYFILE_SCHEMA,
-- 
2.39.2



_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel



* [pbs-devel] [PATCH v7 proxmox-backup 28/69] api: datastore: attach split archive payload chunk reader
  2024-05-27 14:32 [pbs-devel] [PATCH v7 pxar proxmox-backup 00/69] fix #3174: improve file-level backup Christian Ebner
                   ` (26 preceding siblings ...)
  2024-05-27 14:32 ` [pbs-devel] [PATCH v7 proxmox-backup 27/69] client: mount: make split pxar archives mountable Christian Ebner
@ 2024-05-27 14:32 ` Christian Ebner
  2024-05-27 14:32 ` [pbs-devel] [PATCH v7 proxmox-backup 29/69] catalog: shell: make split pxar archives accessible Christian Ebner
                   ` (41 subsequent siblings)
  69 siblings, 0 replies; 71+ messages in thread
From: Christian Ebner @ 2024-05-27 14:32 UTC (permalink / raw)
  To: pbs-devel

Attach the payload chunk reader for pxar archives which have been
uploaded using split streams for metadata and payload data.
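A minimal sketch of the name mapping used in the hunk below, written as a standalone function for illustration. `payload_archive_name` is a hypothetical helper, not part of the patch. It derives the payload index name from a metadata index name and yields `None` for unified archives:

```rust
/// Hypothetical helper: derive the payload archive name of a split
/// archive from its metadata archive name. Returns `None` if the name
/// does not refer to a `.mpxar.didx` metadata archive, i.e. for a
/// regular (unified) pxar archive.
fn payload_archive_name(archive_name: &str) -> Option<String> {
    archive_name
        .strip_suffix(".mpxar.didx")
        .map(|base| format!("{base}.ppxar.didx"))
}
```

For example, `root.mpxar.didx` maps to `root.ppxar.didx`, while `root.pxar.didx` yields `None` and is opened as a unified archive.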

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
changes since version 6:
- adapt to PxarVariant pxar interface

 src/api2/admin/datastore.rs | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/src/api2/admin/datastore.rs b/src/api2/admin/datastore.rs
index af1c12cc0..bab1104d4 100644
--- a/src/api2/admin/datastore.rs
+++ b/src/api2/admin/datastore.rs
@@ -1813,7 +1813,16 @@ pub fn pxar_file_download(
         let (reader, archive_size) =
             get_local_pxar_reader(datastore.clone(), &manifest, &backup_dir, pxar_name)?;
 
-        let decoder = Accessor::new(pxar::PxarVariant::Unified(reader), archive_size).await?;
+        let reader = if let Some(archive_base_name) = pxar_name.strip_suffix(".mpxar.didx") {
+            let payload_archive_name = format!("{archive_base_name}.ppxar.didx");
+            let payload_input =
+                get_local_pxar_reader(datastore, &manifest, &backup_dir, &payload_archive_name)?;
+            pxar::PxarVariant::Split(reader, payload_input)
+        } else {
+            pxar::PxarVariant::Unified(reader)
+        };
+        let decoder = Accessor::new(reader, archive_size).await?;
+
         let root = decoder.open_root().await?;
         let path = OsStr::from_bytes(file_path).to_os_string();
         let file = root
-- 
2.39.2




* [pbs-devel] [PATCH v7 proxmox-backup 29/69] catalog: shell: make split pxar archives accessible
  2024-05-27 14:32 [pbs-devel] [PATCH v7 pxar proxmox-backup 00/69] fix #3174: improve file-level backup Christian Ebner
                   ` (27 preceding siblings ...)
  2024-05-27 14:32 ` [pbs-devel] [PATCH v7 proxmox-backup 28/69] api: datastore: attach split archive payload chunk reader Christian Ebner
@ 2024-05-27 14:32 ` Christian Ebner
  2024-05-27 14:32 ` [pbs-devel] [PATCH v7 proxmox-backup 30/69] www: cover metadata extension for pxar archives Christian Ebner
                   ` (40 subsequent siblings)
  69 siblings, 0 replies; 71+ messages in thread
From: Christian Ebner @ 2024-05-27 14:32 UTC (permalink / raw)
  To: pbs-devel

Cover the cases where the pxar archive was uploaded as split payload
data and metadata streams. Instantiate the required reader and
decoder instances to access the metadata and payload data archives,
using the corresponding helper methods.
This allows restoring split metadata and payload stream pxar archives
via the catalog shell.
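The pxar to mpxar mapping can be sketched roughly as follows. This is a hypothetical stand-in, not the actual `helper::get_pxar_archive_names`: `resolve_archive_names` and the plain `manifest_files` slice (standing in for the file list of the backup manifest) are assumptions, and the sketch assumes the name as given is preferred, with a fallback to the split metadata/payload pair:

```rust
/// Hypothetical archive name resolution. `manifest_files` stands in for
/// the file list of the backup manifest.
fn resolve_archive_names(
    server_archive_name: &str,
    manifest_files: &[&str],
) -> (String, Option<String>) {
    // Prefer the archive as given if the manifest lists it; a metadata
    // archive additionally needs its payload archive.
    if manifest_files.contains(&server_archive_name) {
        let payload = server_archive_name
            .strip_suffix(".mpxar.didx")
            .map(|base| format!("{base}.ppxar.didx"));
        return (server_archive_name.to_string(), payload);
    }
    // Otherwise map `<name>.pxar.didx` to the split `.mpxar`/`.ppxar` pair.
    if let Some(base) = server_archive_name.strip_suffix(".pxar.didx") {
        let metadata = format!("{base}.mpxar.didx");
        if manifest_files.contains(&metadata.as_str()) {
            return (metadata, Some(format!("{base}.ppxar.didx")));
        }
    }
    // Fall back to the name as given; a later manifest lookup reports
    // the missing archive.
    (server_archive_name.to_string(), None)
}
```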

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
changes since version 6:
- use newly introduced has_pxar_filename_extension helper
- allow for pxar -> mpxar mapping
- adapt to PxarVariant pxar interface
- fix clippy warnings

 proxmox-backup-client/src/catalog.rs | 30 ++++++++++++----------------
 1 file changed, 13 insertions(+), 17 deletions(-)

diff --git a/proxmox-backup-client/src/catalog.rs b/proxmox-backup-client/src/catalog.rs
index e72b6a1e0..ebd75b83f 100644
--- a/proxmox-backup-client/src/catalog.rs
+++ b/proxmox-backup-client/src/catalog.rs
@@ -9,17 +9,19 @@ use proxmox_router::cli::*;
 use proxmox_schema::api;
 
 use pbs_api_types::BackupNamespace;
+use pbs_client::tools::has_pxar_filename_extension;
 use pbs_client::tools::key_source::get_encryption_key_password;
 use pbs_client::{BackupReader, RemoteChunkReader};
 use pbs_tools::crypt_config::CryptConfig;
 use pbs_tools::json::required_string_param;
 
+use crate::helper;
 use crate::{
     complete_backup_snapshot, complete_group_or_snapshot, complete_namespace,
     complete_pxar_archive_name, complete_repository, connect, crypto_parameters, decrypt_key,
     dir_or_last_from_group, extract_repository_from_value, format_key_source, optional_ns_param,
-    record_repository, BackupDir, BufferedDynamicReadAt, BufferedDynamicReader, CatalogReader,
-    DynamicIndexReader, IndexFile, Shell, CATALOG_NAME, KEYFD_SCHEMA, REPO_URL_SCHEMA,
+    record_repository, BackupDir, BufferedDynamicReader, CatalogReader, DynamicIndexReader,
+    IndexFile, Shell, CATALOG_NAME, KEYFD_SCHEMA, REPO_URL_SCHEMA,
 };
 
 #[api(
@@ -180,7 +182,7 @@ async fn catalog_shell(param: Value) -> Result<(), Error> {
         }
     };
 
-    let server_archive_name = if archive_name.ends_with(".pxar") {
+    let server_archive_name = if has_pxar_filename_extension(archive_name, false) {
         format!("{}.didx", archive_name)
     } else {
         bail!("Can only mount pxar archives.");
@@ -205,23 +207,17 @@ async fn catalog_shell(param: Value) -> Result<(), Error> {
     let (manifest, _) = client.download_manifest().await?;
     manifest.check_fingerprint(crypt_config.as_ref().map(Arc::as_ref))?;
 
-    let index = client
-        .download_dynamic_index(&manifest, &server_archive_name)
-        .await?;
-    let most_used = index.find_most_used_chunks(8);
+    let (archive_name, payload_archive_name) =
+        helper::get_pxar_archive_names(&server_archive_name, &manifest);
 
-    let file_info = manifest.lookup_file_info(&server_archive_name)?;
-    let chunk_reader = RemoteChunkReader::new(
+    let decoder = helper::get_pxar_fuse_accessor(
+        &archive_name,
+        payload_archive_name.as_deref(),
         client.clone(),
+        &manifest,
         crypt_config.clone(),
-        file_info.chunk_crypt_mode(),
-        most_used,
-    );
-    let reader = BufferedDynamicReader::new(index, chunk_reader);
-    let archive_size = reader.archive_size();
-    let reader: pbs_pxar_fuse::Reader = Arc::new(BufferedDynamicReadAt::new(reader));
-    let decoder =
-        pbs_pxar_fuse::Accessor::new(pxar::PxarVariant::Unified(reader), archive_size).await?;
+    )
+    .await?;
 
     client.download(CATALOG_NAME, &mut tmpfile).await?;
     let index = DynamicIndexReader::new(tmpfile)
-- 
2.39.2




* [pbs-devel] [PATCH v7 proxmox-backup 30/69] www: cover metadata extension for pxar archives
  2024-05-27 14:32 [pbs-devel] [PATCH v7 pxar proxmox-backup 00/69] fix #3174: improve file-level backup Christian Ebner
                   ` (28 preceding siblings ...)
  2024-05-27 14:32 ` [pbs-devel] [PATCH v7 proxmox-backup 29/69] catalog: shell: make split pxar archives accessible Christian Ebner
@ 2024-05-27 14:32 ` Christian Ebner
  2024-05-27 14:32 ` [pbs-devel] [PATCH v7 proxmox-backup 31/69] file restore: factor out getting pxar reader Christian Ebner
                   ` (39 subsequent siblings)
  69 siblings, 0 replies; 71+ messages in thread
From: Christian Ebner @ 2024-05-27 14:32 UTC (permalink / raw)
  To: pbs-devel

Allows accessing the pxar metadata archives for navigation and
download via the Proxmox Backup Server web UI.
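The browse check is JavaScript, but its logic can be illustrated with a hypothetical Rust equivalent, meant only to show why the leading dot matters. The previous `endsWith('pxar.didx')` also matched payload archives such as `root.ppxar.didx`, which hold only file payloads and no directory structure to browse:

```rust
/// Hypothetical rendering of the browse predicate in `Content.js`:
/// unified (`.pxar.didx`) and split metadata (`.mpxar.didx`) archives
/// are browsable, payload-only archives (`.ppxar.didx`) are not.
fn is_browsable_archive(filename: &str) -> bool {
    filename.ends_with(".pxar.didx") || filename.ends_with(".mpxar.didx")
}
```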

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
changes since version 6:
- no changes

 www/datastore/Content.js | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/www/datastore/Content.js b/www/datastore/Content.js
index c2403ff9c..6dd1ab319 100644
--- a/www/datastore/Content.js
+++ b/www/datastore/Content.js
@@ -1050,7 +1050,7 @@ Ext.define('PBS.DataStoreContent', {
 		    tooltip: gettext('Browse'),
 		    getClass: (v, m, { data }) => {
 			if (
-			    (data.ty === 'file' && data.filename.endsWith('pxar.didx')) ||
+			    (data.ty === 'file' && (data.filename.endsWith('.pxar.didx') || data.filename.endsWith('.mpxar.didx'))) ||
 			    (data.ty === 'ns' && !data.root)
 			) {
 			    return 'fa fa-folder-open-o';
@@ -1058,7 +1058,9 @@ Ext.define('PBS.DataStoreContent', {
 			return 'pmx-hidden';
 		    },
 		    isActionDisabled: (v, r, c, i, { data }) =>
-			!(data.ty === 'file' && data.filename.endsWith('pxar.didx') && data['crypt-mode'] < 3) && data.ty !== 'ns',
+			!(data.ty === 'file' &&
+			(data.filename.endsWith('.pxar.didx') || data.filename.endsWith('.mpxar.didx')) &&
+			data['crypt-mode'] < 3) && data.ty !== 'ns',
 		},
 	    ],
 	},
-- 
2.39.2




* [pbs-devel] [PATCH v7 proxmox-backup 31/69] file restore: factor out getting pxar reader
  2024-05-27 14:32 [pbs-devel] [PATCH v7 pxar proxmox-backup 00/69] fix #3174: improve file-level backup Christian Ebner
                   ` (29 preceding siblings ...)
  2024-05-27 14:32 ` [pbs-devel] [PATCH v7 proxmox-backup 30/69] www: cover metadata extension for pxar archives Christian Ebner
@ 2024-05-27 14:32 ` Christian Ebner
  2024-05-27 14:32 ` [pbs-devel] [PATCH v7 proxmox-backup 32/69] file restore: cover split metadata and payload archives Christian Ebner
                   ` (38 subsequent siblings)
  69 siblings, 0 replies; 71+ messages in thread
From: Christian Ebner @ 2024-05-27 14:32 UTC (permalink / raw)
  To: pbs-devel

Factor out the logic to get the pxar reader into a dedicated function
so it can be reused to get the payload data archive reader instance.

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
changes since version 6:
- rename from `get_local_pxar_reader` to `get_remote_pxar_reader`

 proxmox-file-restore/src/main.rs | 44 ++++++++++++++++++++------------
 1 file changed, 28 insertions(+), 16 deletions(-)

diff --git a/proxmox-file-restore/src/main.rs b/proxmox-file-restore/src/main.rs
index 61dece97d..c9a545677 100644
--- a/proxmox-file-restore/src/main.rs
+++ b/proxmox-file-restore/src/main.rs
@@ -35,7 +35,7 @@ use pbs_client::{BackupReader, BackupRepository, RemoteChunkReader};
 use pbs_datastore::catalog::{ArchiveEntry, CatalogReader, DirEntryAttribute};
 use pbs_datastore::dynamic_index::{BufferedDynamicReader, LocalDynamicReadAt};
 use pbs_datastore::index::IndexFile;
-use pbs_datastore::CATALOG_NAME;
+use pbs_datastore::{BackupManifest, CATALOG_NAME};
 use pbs_key_config::decrypt_key;
 use pbs_tools::crypt_config::CryptConfig;
 
@@ -328,6 +328,31 @@ async fn list(
     Ok(())
 }
 
+async fn get_remote_pxar_reader(
+    archive_name: &str,
+    client: Arc<BackupReader>,
+    manifest: &BackupManifest,
+    crypt_config: Option<Arc<CryptConfig>>,
+) -> Result<(LocalDynamicReadAt<RemoteChunkReader>, u64), Error> {
+    let index = client
+        .download_dynamic_index(&manifest, &archive_name)
+        .await?;
+    let most_used = index.find_most_used_chunks(8);
+
+    let file_info = manifest.lookup_file_info(&archive_name)?;
+    let chunk_reader = RemoteChunkReader::new(
+        client.clone(),
+        crypt_config,
+        file_info.chunk_crypt_mode(),
+        most_used,
+    );
+
+    let reader = BufferedDynamicReader::new(index, chunk_reader);
+    let archive_size = reader.archive_size();
+
+    Ok((LocalDynamicReadAt::new(reader), archive_size))
+}
+
 #[api(
     input: {
         properties: {
@@ -445,21 +470,8 @@ async fn extract(
 
     match path {
         ExtractPath::Pxar(archive_name, path) => {
-            let file_info = manifest.lookup_file_info(&archive_name)?;
-            let index = client
-                .download_dynamic_index(&manifest, &archive_name)
-                .await?;
-            let most_used = index.find_most_used_chunks(8);
-            let chunk_reader = RemoteChunkReader::new(
-                client.clone(),
-                crypt_config,
-                file_info.chunk_crypt_mode(),
-                most_used,
-            );
-            let reader = BufferedDynamicReader::new(index, chunk_reader);
-
-            let archive_size = reader.archive_size();
-            let reader = LocalDynamicReadAt::new(reader);
+            let (reader, archive_size) =
+                get_remote_pxar_reader(&archive_name, client, &manifest, crypt_config).await?;
             let decoder = Accessor::new(pxar::PxarVariant::Unified(reader), archive_size).await?;
             extract_to_target(decoder, &path, target, format, zstd).await?;
         }
-- 
2.39.2




* [pbs-devel] [PATCH v7 proxmox-backup 32/69] file restore: cover split metadata and payload archives
  2024-05-27 14:32 [pbs-devel] [PATCH v7 pxar proxmox-backup 00/69] fix #3174: improve file-level backup Christian Ebner
                   ` (30 preceding siblings ...)
  2024-05-27 14:32 ` [pbs-devel] [PATCH v7 proxmox-backup 31/69] file restore: factor out getting pxar reader Christian Ebner
@ 2024-05-27 14:32 ` Christian Ebner
  2024-05-27 14:32 ` [pbs-devel] [PATCH v7 proxmox-backup 33/69] file restore: show more error context when extraction fails Christian Ebner
                   ` (37 subsequent siblings)
  69 siblings, 0 replies; 71+ messages in thread
From: Christian Ebner @ 2024-05-27 14:32 UTC (permalink / raw)
  To: pbs-devel

Attach the payload data archive as input stream to the decoder
and accessor instances for split archives.
This allows restoring contents from split archives via the
`proxmox-file-restore extract` command by passing the metadata
archive name.
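The variant selection can be modelled in a self-contained way. In this sketch `PxarVariant` is a simplified local stand-in for the pxar type, and `get_reader`, `open_archive` and the in-memory `HashMap` store (replacing the remote datastore) are hypothetical. The point is only that the factored-out reader helper is called once for the metadata archive and, for `.mpxar.didx` names, a second time for the matching payload archive:

```rust
use std::collections::HashMap;
use std::io::Cursor;

/// Simplified stand-in for `pxar::PxarVariant`.
#[derive(Debug)]
enum PxarVariant<T, P> {
    Unified(T),
    Split(T, P),
}

/// A reader together with the size of the archive it reads.
type Input = (Cursor<Vec<u8>>, u64);

/// Hypothetical reader helper: look the archive up in the in-memory
/// store and return a reader plus the archive size.
fn get_reader(store: &HashMap<String, Vec<u8>>, name: &str) -> Result<Input, String> {
    let data = store
        .get(name)
        .ok_or_else(|| format!("archive '{name}' not found"))?;
    Ok((Cursor::new(data.clone()), data.len() as u64))
}

/// Open an archive, attaching the payload input if `name` refers to a
/// split metadata archive. The returned size is that of the metadata
/// (or unified) archive.
fn open_archive(
    store: &HashMap<String, Vec<u8>>,
    name: &str,
) -> Result<(PxarVariant<Cursor<Vec<u8>>, Input>, u64), String> {
    let (reader, archive_size) = get_reader(store, name)?;
    let variant = if let Some(base) = name.strip_suffix(".mpxar.didx") {
        let payload = get_reader(store, &format!("{base}.ppxar.didx"))?;
        PxarVariant::Split(reader, payload)
    } else {
        PxarVariant::Unified(reader)
    };
    Ok((variant, archive_size))
}
```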

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
changes since version 6:
- s/get_local_pxar_reader/get_remote_pxar_reader
- adapt to PxarVariant pxar interface

 proxmox-file-restore/src/main.rs | 22 +++++++++++++++++++---
 1 file changed, 19 insertions(+), 3 deletions(-)

diff --git a/proxmox-file-restore/src/main.rs b/proxmox-file-restore/src/main.rs
index c9a545677..0de2b47e6 100644
--- a/proxmox-file-restore/src/main.rs
+++ b/proxmox-file-restore/src/main.rs
@@ -470,9 +470,25 @@ async fn extract(
 
     match path {
         ExtractPath::Pxar(archive_name, path) => {
-            let (reader, archive_size) =
-                get_remote_pxar_reader(&archive_name, client, &manifest, crypt_config).await?;
-            let decoder = Accessor::new(pxar::PxarVariant::Unified(reader), archive_size).await?;
+            let (reader, archive_size) = get_remote_pxar_reader(
+                &archive_name,
+                client.clone(),
+                &manifest,
+                crypt_config.clone(),
+            )
+            .await?;
+
+            let reader = if let Some(archive_base_name) = archive_name.strip_suffix(".mpxar.didx") {
+                let payload_archive_name = format!("{archive_base_name}.ppxar.didx");
+                let (payload_reader, payload_size) =
+                    get_remote_pxar_reader(&payload_archive_name, client, &manifest, crypt_config)
+                        .await?;
+                pxar::PxarVariant::Split(reader, (payload_reader, payload_size))
+            } else {
+                pxar::PxarVariant::Unified(reader)
+            };
+            let decoder = Accessor::new(reader, archive_size).await?;
+
             extract_to_target(decoder, &path, target, format, zstd).await?;
         }
         ExtractPath::VM(file, path) => {
-- 
2.39.2




* [pbs-devel] [PATCH v7 proxmox-backup 33/69] file restore: show more error context when extraction fails
  2024-05-27 14:32 [pbs-devel] [PATCH v7 pxar proxmox-backup 00/69] fix #3174: improve file-level backup Christian Ebner
                   ` (31 preceding siblings ...)
  2024-05-27 14:32 ` [pbs-devel] [PATCH v7 proxmox-backup 32/69] file restore: cover split metadata and payload archives Christian Ebner
@ 2024-05-27 14:32 ` Christian Ebner
  2024-05-27 14:32 ` [pbs-devel] [PATCH v7 proxmox-backup 34/69] pxar: add optional payload input for archive restore Christian Ebner
                   ` (36 subsequent siblings)
  69 siblings, 0 replies; 71+ messages in thread
From: Christian Ebner @ 2024-05-27 14:32 UTC (permalink / raw)
  To: pbs-devel

Otherwise the context swallows the actual, underlying error message.
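The hunk below formats the error with `{err:#}`. For `anyhow::Error`, the alternate `{:#}` form prints the whole cause chain joined by ": ", whereas plain `{}` prints only the outermost message. A std-only sketch of the same idea, with a hypothetical `ContextError` type and `format_chain` helper:

```rust
use std::error::Error;
use std::fmt;

/// Hypothetical error type wrapping an underlying cause with a context
/// message, similar in spirit to adding context with `anyhow`.
#[derive(Debug)]
struct ContextError {
    context: String,
    source: Box<dyn Error>,
}

impl fmt::Display for ContextError {
    // Plain `{}` output: only the context message, the cause is hidden.
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}", self.context)
    }
}

impl Error for ContextError {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        Some(&*self.source)
    }
}

/// Render an error and all of its causes separated by ": ", which is
/// what `{:#}` prints for an `anyhow::Error`.
fn format_chain(err: &dyn Error) -> String {
    let mut out = err.to_string();
    let mut cur = err.source();
    while let Some(cause) = cur {
        out.push_str(": ");
        out.push_str(&cause.to_string());
        cur = cause.source();
    }
    out
}
```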

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
changes since version 6:
- no changes

 proxmox-file-restore/src/main.rs | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/proxmox-file-restore/src/main.rs b/proxmox-file-restore/src/main.rs
index 0de2b47e6..26e7663a1 100644
--- a/proxmox-file-restore/src/main.rs
+++ b/proxmox-file-restore/src/main.rs
@@ -489,7 +489,9 @@ async fn extract(
             };
             let decoder = Accessor::new(reader, archive_size).await?;
 
-            extract_to_target(decoder, &path, target, format, zstd).await?;
+            extract_to_target(decoder, &path, target, format, zstd)
+                .await
+                .map_err(|err| format_err!("error extracting archive - {err:#}"))?;
         }
         ExtractPath::VM(file, path) => {
             let details = SnapRestoreDetails {
-- 
2.39.2




* [pbs-devel] [PATCH v7 proxmox-backup 34/69] pxar: add optional payload input for archive restore
  2024-05-27 14:32 [pbs-devel] [PATCH v7 pxar proxmox-backup 00/69] fix #3174: improve file-level backup Christian Ebner
                   ` (32 preceding siblings ...)
  2024-05-27 14:32 ` [pbs-devel] [PATCH v7 proxmox-backup 33/69] file restore: show more error context when extraction fails Christian Ebner
@ 2024-05-27 14:32 ` Christian Ebner
  2024-05-27 14:32 ` [pbs-devel] [PATCH v7 proxmox-backup 35/69] pxar: cover listing for split archives Christian Ebner
                   ` (35 subsequent siblings)
  69 siblings, 0 replies; 71+ messages in thread
From: Christian Ebner @ 2024-05-27 14:32 UTC (permalink / raw)
  To: pbs-devel

Allows passing the optional payload input to restore, for cases where
the regular file payloads are stored in the payload archive of a split
archive.
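A reduced sketch of how an optional payload input can be threaded through a generic reader function. `read_inputs` is hypothetical and simply reads both inputs to the end instead of decoding them; it mirrors the `Option<&mut R>` parameter and the `payload_reader.as_mut()` call site of the patch:

```rust
use std::io::{self, Read};

/// Hypothetical sketch: read the archive input and, if given, the
/// separate payload input of a split archive.
fn read_inputs<R: Read>(
    reader: &mut R,
    payload_reader: Option<&mut R>,
) -> io::Result<(Vec<u8>, Option<Vec<u8>>)> {
    let mut metadata = Vec::new();
    reader.read_to_end(&mut metadata)?;

    // Only split archives come with a payload input.
    let payload = match payload_reader {
        Some(payload_reader) => {
            let mut buf = Vec::new();
            payload_reader.read_to_end(&mut buf)?;
            Some(buf)
        }
        None => None,
    };
    Ok((metadata, payload))
}
```

Because both parameters share the type parameter `R`, the payload input has to be the same reader type as the archive input. That holds for the file case in the patch, where both are `std::io::BufReader<std::fs::File>`, while the stdin path passes `None`.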

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
changes since version 6:
- adapt to PxarVariant pxar interface

 pxar-bin/src/main.rs | 29 ++++++++++++++++++++++++++---
 1 file changed, 26 insertions(+), 3 deletions(-)

diff --git a/pxar-bin/src/main.rs b/pxar-bin/src/main.rs
index 61756d21c..903467c98 100644
--- a/pxar-bin/src/main.rs
+++ b/pxar-bin/src/main.rs
@@ -25,9 +25,15 @@ fn extract_archive_from_reader<R: std::io::Read>(
     target: &str,
     feature_flags: Flags,
     options: PxarExtractOptions,
+    payload_reader: Option<&mut R>,
 ) -> Result<(), Error> {
+    let reader = if let Some(payload_reader) = payload_reader {
+        pxar::PxarVariant::Split(reader, payload_reader)
+    } else {
+        pxar::PxarVariant::Unified(reader)
+    };
     pbs_client::pxar::extract_archive(
-        pxar::decoder::Decoder::from_std(pxar::PxarVariant::Unified(reader))?,
+        pxar::decoder::Decoder::from_std(reader)?,
         Path::new(target),
         feature_flags,
         |path| {
@@ -120,6 +126,10 @@ fn extract_archive_from_reader<R: std::io::Read>(
                 optional: true,
                 default: false,
             },
+            "payload-input": {
+                description: "'ppxar' payload input data file to restore split archive.",
+                optional: true,
+            },
         },
     },
 )]
@@ -142,6 +152,7 @@ fn extract_archive(
     no_fifos: bool,
     no_sockets: bool,
     strict: bool,
+    payload_input: Option<String>,
 ) -> Result<(), Error> {
     let mut feature_flags = Flags::DEFAULT;
     if no_xattrs {
@@ -220,12 +231,24 @@ fn extract_archive(
     if archive == "-" {
         let stdin = std::io::stdin();
         let mut reader = stdin.lock();
-        extract_archive_from_reader(&mut reader, target, feature_flags, options)?;
+        extract_archive_from_reader(&mut reader, target, feature_flags, options, None)?;
     } else {
         log::debug!("PXAR extract: {}", archive);
         let file = std::fs::File::open(archive)?;
         let mut reader = std::io::BufReader::new(file);
-        extract_archive_from_reader(&mut reader, target, feature_flags, options)?;
+        let mut payload_reader = if let Some(payload_input) = payload_input {
+            let file = std::fs::File::open(payload_input)?;
+            Some(std::io::BufReader::new(file))
+        } else {
+            None
+        };
+        extract_archive_from_reader(
+            &mut reader,
+            target,
+            feature_flags,
+            options,
+            payload_reader.as_mut(),
+        )?;
     }
 
     if !was_ok.load(Ordering::Acquire) {
-- 
2.39.2




* [pbs-devel] [PATCH v7 proxmox-backup 35/69] pxar: cover listing for split archives
  2024-05-27 14:32 [pbs-devel] [PATCH v7 pxar proxmox-backup 00/69] fix #3174: improve file-level backup Christian Ebner
                   ` (33 preceding siblings ...)
  2024-05-27 14:32 ` [pbs-devel] [PATCH v7 proxmox-backup 34/69] pxar: add optional payload input for archive restore Christian Ebner
@ 2024-05-27 14:32 ` Christian Ebner
  2024-05-27 14:32 ` [pbs-devel] [PATCH v7 proxmox-backup 36/69] pxar: add more context to extraction error Christian Ebner
                   ` (34 subsequent siblings)
  69 siblings, 0 replies; 71+ messages in thread
From: Christian Ebner @ 2024-05-27 14:32 UTC (permalink / raw)
  To: pbs-devel

Allows listing the entries of split pxar archives. As the decoder skips
over the file payloads, the corresponding payload file has to be
provided. Otherwise the decoder would skip inside the metadata
archive, leading to incorrect decoding.
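Why the payload input is needed can be illustrated with a simplified model of skipping a file payload. `Payload`, `Positions` and `skip_payload` are hypothetical and ignore the actual pxar encoding (entry headers, goodbye tables and so on); they only show that for a split archive the skip has to advance the payload stream rather than the metadata stream:

```rust
/// Hypothetical model of where a regular file's payload is stored.
enum Payload {
    /// Unified archive: `size` payload bytes follow the entry inline.
    Inline { size: u64 },
    /// Split archive: the metadata stream only holds a reference into
    /// the separate payload stream.
    Reference { offset: u64, size: u64 },
}

/// Positions in the metadata and (optional) payload input streams.
#[derive(Debug, PartialEq)]
struct Positions {
    metadata: u64,
    payload: Option<u64>,
}

/// Skip over a file's payload. A referenced payload can only be skipped
/// in the payload stream; without a payload input there is no valid
/// stream to skip in.
fn skip_payload(pos: Positions, payload: &Payload) -> Result<Positions, String> {
    match payload {
        Payload::Inline { size } => Ok(Positions {
            metadata: pos.metadata + size,
            payload: pos.payload,
        }),
        Payload::Reference { offset, size } => match pos.payload {
            // The metadata position stays put, the payload stream moves.
            Some(_) => Ok(Positions {
                metadata: pos.metadata,
                payload: Some(offset + size),
            }),
            None => Err("payload input required for split pxar archives".to_string()),
        },
    }
}
```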

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
changes since version 6:
- not present in previous version

 pxar-bin/src/main.rs | 18 ++++++++++++++++--
 1 file changed, 16 insertions(+), 2 deletions(-)

diff --git a/pxar-bin/src/main.rs b/pxar-bin/src/main.rs
index 903467c98..b64ae1d19 100644
--- a/pxar-bin/src/main.rs
+++ b/pxar-bin/src/main.rs
@@ -454,12 +454,26 @@ async fn mount_archive(archive: String, mountpoint: String, verbose: bool) -> Re
             archive: {
                 description: "Archive name.",
             },
+            "payload-input": {
+                description: "'ppxar' payload input data file for split archive.",
+                optional: true,
+            },
         },
     },
 )]
 /// List the contents of an archive.
-fn dump_archive(archive: String) -> Result<(), Error> {
-    for entry in pxar::decoder::Decoder::open(archive)? {
+fn dump_archive(archive: String, payload_input: Option<String>) -> Result<(), Error> {
+    if archive.ends_with(".mpxar") && payload_input.is_none() {
+        bail!("Payload input required for split pxar archives");
+    }
+
+    let input = if let Some(payload_input) = payload_input {
+        pxar::PxarVariant::Split(archive, payload_input)
+    } else {
+        pxar::PxarVariant::Unified(archive)
+    };
+
+    for entry in pxar::decoder::Decoder::open(input)? {
         let entry = entry?;
 
         if log::log_enabled!(log::Level::Debug) {
-- 
2.39.2




* [pbs-devel] [PATCH v7 proxmox-backup 36/69] pxar: add more context to extraction error
  2024-05-27 14:32 [pbs-devel] [PATCH v7 pxar proxmox-backup 00/69] fix #3174: improve file-level backup Christian Ebner
                   ` (34 preceding siblings ...)
  2024-05-27 14:32 ` [pbs-devel] [PATCH v7 proxmox-backup 35/69] pxar: cover listing for split archives Christian Ebner
@ 2024-05-27 14:32 ` Christian Ebner
  2024-05-27 14:32 ` [pbs-devel] [PATCH v7 proxmox-backup 37/69] client: pxar: include payload offset in entry listing Christian Ebner
                   ` (33 subsequent siblings)
  69 siblings, 0 replies; 71+ messages in thread
From: Christian Ebner @ 2024-05-27 14:32 UTC (permalink / raw)
  To: pbs-devel

Show more of the extraction error context provided by the pxar decoder.

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
changes since version 6:
- no changes

 pxar-bin/src/main.rs | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/pxar-bin/src/main.rs b/pxar-bin/src/main.rs
index b64ae1d19..17e468062 100644
--- a/pxar-bin/src/main.rs
+++ b/pxar-bin/src/main.rs
@@ -231,7 +231,8 @@ fn extract_archive(
     if archive == "-" {
         let stdin = std::io::stdin();
         let mut reader = stdin.lock();
-        extract_archive_from_reader(&mut reader, target, feature_flags, options, None)?;
+        extract_archive_from_reader(&mut reader, target, feature_flags, options, None)
+            .map_err(|err| format_err!("error extracting archive - {err:#}"))?;
     } else {
         log::debug!("PXAR extract: {}", archive);
         let file = std::fs::File::open(archive)?;
@@ -248,7 +249,8 @@ fn extract_archive(
             feature_flags,
             options,
             payload_reader.as_mut(),
-        )?;
+        )
+        .map_err(|err| format_err!("error extracting archive - {err:#}"))?
     }
 
     if !was_ok.load(Ordering::Acquire) {
-- 
2.39.2




* [pbs-devel] [PATCH v7 proxmox-backup 37/69] client: pxar: include payload offset in entry listing
  2024-05-27 14:32 [pbs-devel] [PATCH v7 pxar proxmox-backup 00/69] fix #3174: improve file-level backup Christian Ebner
                   ` (35 preceding siblings ...)
  2024-05-27 14:32 ` [pbs-devel] [PATCH v7 proxmox-backup 36/69] pxar: add more context to extraction error Christian Ebner
@ 2024-05-27 14:32 ` Christian Ebner
  2024-05-27 14:32 ` [pbs-devel] [PATCH v7 proxmox-backup 38/69] pxar: show padding in debug output on archive list Christian Ebner
                   ` (32 subsequent siblings)
  69 siblings, 0 replies; 71+ messages in thread
From: Christian Ebner @ 2024-05-27 14:32 UTC (permalink / raw)
  To: pbs-devel

Also display the payload offset in the listing output when the regular
file entry has a payload reference rather than the payload encoded in
the archive. This allows debugging by inspecting the raw payload data
file at the given offset.
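The change duplicates the `format!` call for the two cases. One way to keep a single format string is to append the offset only when it is present. A reduced, hypothetical sketch: `format_entry_line` formats only size, path and link target, not mode, owner or mtime:

```rust
use std::fmt::Write as _;

/// Hypothetical, reduced single-line listing entry; the payload offset
/// is appended only for entries referencing a separate payload archive.
fn format_entry_line(size: u64, path: &str, link: &str, payload_offset: Option<u64>) -> String {
    let mut line = format!("{:>8} {:?}{}", size, path, link);
    if let Some(offset) = payload_offset {
        // Writing to a String cannot fail.
        let _ = write!(line, " {offset}");
    }
    line
}
```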

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
changes since version 6:
- no changes

 pbs-client/src/pxar/tools.rs | 116 ++++++++++++++++++++++++-----------
 1 file changed, 80 insertions(+), 36 deletions(-)

diff --git a/pbs-client/src/pxar/tools.rs b/pbs-client/src/pxar/tools.rs
index 0cfbaf5b9..459951d50 100644
--- a/pbs-client/src/pxar/tools.rs
+++ b/pbs-client/src/pxar/tools.rs
@@ -128,25 +128,42 @@ pub fn format_single_line_entry(entry: &Entry) -> String {
 
     let meta = entry.metadata();
 
-    let (size, link) = match entry.kind() {
-        EntryKind::File { size, .. } => (format!("{}", *size), String::new()),
-        EntryKind::Symlink(link) => ("0".to_string(), format!(" -> {:?}", link.as_os_str())),
-        EntryKind::Hardlink(link) => ("0".to_string(), format!(" -> {:?}", link.as_os_str())),
-        EntryKind::Device(dev) => (format!("{},{}", dev.major, dev.minor), String::new()),
-        _ => ("0".to_string(), String::new()),
+    let (size, link, payload_offset) = match entry.kind() {
+        EntryKind::File {
+            size,
+            payload_offset,
+            ..
+        } => (format!("{}", *size), String::new(), *payload_offset),
+        EntryKind::Symlink(link) => ("0".to_string(), format!(" -> {:?}", link.as_os_str()), None),
+        EntryKind::Hardlink(link) => ("0".to_string(), format!(" -> {:?}", link.as_os_str()), None),
+        EntryKind::Device(dev) => (format!("{},{}", dev.major, dev.minor), String::new(), None),
+        _ => ("0".to_string(), String::new(), None),
     };
 
     let owner_string = format!("{}/{}", meta.stat.uid, meta.stat.gid);
 
-    format!(
-        "{} {:<13} {} {:>8} {:?}{}",
-        mode_string,
-        owner_string,
-        format_mtime(&meta.stat.mtime),
-        size,
-        entry.path(),
-        link,
-    )
+    if let Some(offset) = payload_offset {
+        format!(
+            "{} {:<13} {} {:>8} {:?}{} {}",
+            mode_string,
+            owner_string,
+            format_mtime(&meta.stat.mtime),
+            size,
+            entry.path(),
+            link,
+            offset,
+        )
+    } else {
+        format!(
+            "{} {:<13} {} {:>8} {:?}{}",
+            mode_string,
+            owner_string,
+            format_mtime(&meta.stat.mtime),
+            size,
+            entry.path(),
+            link,
+        )
+    }
 }
 
 pub fn format_multi_line_entry(entry: &Entry) -> String {
@@ -154,17 +171,23 @@ pub fn format_multi_line_entry(entry: &Entry) -> String {
 
     let meta = entry.metadata();
 
-    let (size, link, type_name) = match entry.kind() {
-        EntryKind::File { size, .. } => (format!("{}", *size), String::new(), "file"),
+    let (size, link, type_name, payload_offset) = match entry.kind() {
+        EntryKind::File {
+            size,
+            payload_offset,
+            ..
+        } => (format!("{}", *size), String::new(), "file", *payload_offset),
         EntryKind::Symlink(link) => (
             "0".to_string(),
             format!(" -> {:?}", link.as_os_str()),
             "symlink",
+            None,
         ),
         EntryKind::Hardlink(link) => (
             "0".to_string(),
             format!(" -> {:?}", link.as_os_str()),
             "symlink",
+            None,
         ),
         EntryKind::Device(dev) => (
             format!("{},{}", dev.major, dev.minor),
@@ -176,11 +199,12 @@ pub fn format_multi_line_entry(entry: &Entry) -> String {
             } else {
                 "device"
             },
+            None,
         ),
-        EntryKind::Socket => ("0".to_string(), String::new(), "socket"),
-        EntryKind::Fifo => ("0".to_string(), String::new(), "fifo"),
-        EntryKind::Directory => ("0".to_string(), String::new(), "directory"),
-        EntryKind::GoodbyeTable => ("0".to_string(), String::new(), "bad entry"),
+        EntryKind::Socket => ("0".to_string(), String::new(), "socket", None),
+        EntryKind::Fifo => ("0".to_string(), String::new(), "fifo", None),
+        EntryKind::Directory => ("0".to_string(), String::new(), "directory", None),
+        EntryKind::GoodbyeTable => ("0".to_string(), String::new(), "bad entry", None),
     };
 
     let file_name = match std::str::from_utf8(entry.path().as_os_str().as_bytes()) {
@@ -188,19 +212,39 @@ pub fn format_multi_line_entry(entry: &Entry) -> String {
         Err(_) => std::borrow::Cow::Owned(format!("{:?}", entry.path())),
     };
 
-    format!(
-        "  File: {}{}\n  \
-           Size: {:<13} Type: {}\n\
-         Access: ({:o}/{})  Uid: {:<5} Gid: {:<5}\n\
-         Modify: {}\n",
-        file_name,
-        link,
-        size,
-        type_name,
-        meta.file_mode(),
-        mode_string,
-        meta.stat.uid,
-        meta.stat.gid,
-        format_mtime(&meta.stat.mtime),
-    )
+    if let Some(offset) = payload_offset {
+        format!(
+            "  File: {}{}\n  \
+               Size: {:<13} Type: {}\n\
+             Access: ({:o}/{})  Uid: {:<5} Gid: {:<5}\n\
+             Modify: {}\n\
+             PayloadOffset: {}\n",
+            file_name,
+            link,
+            size,
+            type_name,
+            meta.file_mode(),
+            mode_string,
+            meta.stat.uid,
+            meta.stat.gid,
+            format_mtime(&meta.stat.mtime),
+            offset,
+        )
+    } else {
+        format!(
+            "  File: {}{}\n  \
+               Size: {:<13} Type: {}\n\
+             Access: ({:o}/{})  Uid: {:<5} Gid: {:<5}\n\
+             Modify: {}\n",
+            file_name,
+            link,
+            size,
+            type_name,
+            meta.file_mode(),
+            mode_string,
+            meta.stat.uid,
+            meta.stat.gid,
+            format_mtime(&meta.stat.mtime),
+        )
+    }
 }
-- 
2.39.2



_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel


^ permalink raw reply	[flat|nested] 71+ messages in thread

* [pbs-devel] [PATCH v7 proxmox-backup 38/69] pxar: show padding in debug output on archive list
  2024-05-27 14:32 [pbs-devel] [PATCH v7 pxar proxmox-backup 00/69] fix #3174: improve file-level backup Christian Ebner
                   ` (36 preceding siblings ...)
  2024-05-27 14:32 ` [pbs-devel] [PATCH v7 proxmox-backup 37/69] client: pxar: include payload offset in entry listing Christian Ebner
@ 2024-05-27 14:32 ` Christian Ebner
  2024-05-27 14:32 ` [pbs-devel] [PATCH v7 proxmox-backup 39/69] datastore: dynamic index: add method to get digest Christian Ebner
                   ` (31 subsequent siblings)
  69 siblings, 0 replies; 71+ messages in thread
From: Christian Ebner @ 2024-05-27 14:32 UTC (permalink / raw)
  To: pbs-devel

In addition to the entries, also show the padding encountered in-between
referenced payloads.

Example invocation: `PXAR_LOG=debug pxar list archive.mpxar`
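
The reported padding is the gap between the end of one referenced payload and the offset of the next one. The end of a payload is its offset plus the payload header plus its size. The sketch below applies that arithmetic to plain `(offset, size)` pairs. It assumes a 16 byte header as a stand-in for `size_of::<pxar::format::Header>()`. `HEADER_SIZE` and `paddings` are hypothetical, and `checked_sub` is used so that overlapping or out-of-order offsets are reported instead of underflowing:

```rust
/// Assumed size of the header preceding each payload (stand-in value).
const HEADER_SIZE: u64 = 16;

/// For each pair of consecutive referenced payloads, return the number of
/// bytes between the end of the previous payload and the start of the next,
/// or `None` if the next offset lies before the previous end.
fn paddings(entries: &[(u64, u64)]) -> Vec<Option<u64>> {
    entries
        .windows(2)
        .map(|pair| {
            let (prev_offset, prev_size) = pair[0];
            let (next_offset, _) = pair[1];
            let prev_end = prev_offset + HEADER_SIZE + prev_size;
            next_offset.checked_sub(prev_end)
        })
        .collect()
}

fn main() {
    // The second payload directly follows the first; the third leaves a gap.
    let entries = [(0, 84), (100, 50), (266, 10)];
    for (i, padding) in paddings(&entries).iter().enumerate() {
        match padding {
            Some(0) => {}
            Some(bytes) => println!("entry {}: padding of {bytes} bytes", i + 1),
            None => println!("entry {}: offset before previous end", i + 1),
        }
    }
}
```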

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
changes since version 6:
- no changes

 pxar-bin/src/main.rs | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/pxar-bin/src/main.rs b/pxar-bin/src/main.rs
index 17e468062..cfeddd9fa 100644
--- a/pxar-bin/src/main.rs
+++ b/pxar-bin/src/main.rs
@@ -9,6 +9,7 @@ use std::sync::Arc;
 use anyhow::{bail, format_err, Error};
 use futures::future::FutureExt;
 use futures::select;
+use pxar::EntryKind;
 use tokio::signal::unix::{signal, SignalKind};
 
 use pathpatterns::{MatchEntry, MatchType, PatternFlag};
@@ -479,6 +480,23 @@ fn dump_archive(archive: String, payload_input: Option<String>) -> Result<(), Er
         let entry = entry?;
 
         if log::log_enabled!(log::Level::Debug) {
+            match entry.kind() {
+                EntryKind::File {
+                    payload_offset: Some(offset),
+                    size,
+                    ..
+                } => {
+                    if let Some(last) = last {
+                        let skipped = offset - last;
+                        if skipped > 0 {
+                            log::debug!("Encountered padding of {skipped} bytes");
+                        }
+                    }
+                    last = Some(offset + size + std::mem::size_of::<pxar::format::Header>() as u64);
+                }
+                _ => (),
+            }
+
             log::debug!("{}", format_single_line_entry(&entry));
         } else {
             log::info!("{:?}", entry.path());
-- 
2.39.2




* [pbs-devel] [PATCH v7 proxmox-backup 39/69] datastore: dynamic index: add method to get digest
  2024-05-27 14:32 [pbs-devel] [PATCH v7 pxar proxmox-backup 00/69] fix #3174: improve file-level backup Christian Ebner
                   ` (37 preceding siblings ...)
  2024-05-27 14:32 ` [pbs-devel] [PATCH v7 proxmox-backup 38/69] pxar: show padding in debug output on archive list Christian Ebner
@ 2024-05-27 14:32 ` Christian Ebner
  2024-05-27 14:32 ` [pbs-devel] [PATCH v7 proxmox-backup 40/69] client: pxar: helper for lookup of reusable dynamic entries Christian Ebner
                   ` (30 subsequent siblings)
  69 siblings, 0 replies; 71+ messages in thread
From: Christian Ebner @ 2024-05-27 14:32 UTC (permalink / raw)
  To: pbs-devel

Add a method to get the digest of a dynamic index entry, so that a
reusable dynamic entry can be constructed from it. This prepares for
injecting reused payload chunks into the payload stream for regular
files with unchanged metadata.
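
A dynamic index entry stores only the little-endian end offset of a chunk and its digest. The size of a chunk therefore follows from the end offset of the previous entry. The sketch below models such an entry with the accessor added here. `IndexEntry` and `sizes_and_digests` are hypothetical, and the field layout is assumed from the methods visible in the diff:

```rust
/// Hypothetical stand-in for a dynamic index entry: the chunk end offset,
/// stored little-endian, followed by the chunk digest.
#[derive(Clone, Copy)]
#[repr(C)]
struct IndexEntry {
    end_le: u64,
    digest: [u8; 32],
}

impl IndexEntry {
    fn new(end: u64, digest: [u8; 32]) -> Self {
        Self { end_le: end.to_le(), digest }
    }

    /// End offset of the chunk, converted to native byte order.
    fn end(&self) -> u64 {
        u64::from_le(self.end_le)
    }

    /// Digest of the chunk, returned by value (`[u8; 32]` is `Copy`).
    fn digest(&self) -> [u8; 32] {
        self.digest
    }
}

/// Pair each chunk size with its digest. The size is derived from the
/// previous entry's end offset, and the first chunk starts at offset 0.
fn sizes_and_digests(entries: &[IndexEntry]) -> Vec<(u64, [u8; 32])> {
    let mut prev_end = 0;
    entries
        .iter()
        .map(|entry| {
            let size = entry.end() - prev_end;
            prev_end = entry.end();
            (size, entry.digest())
        })
        .collect()
}

fn main() {
    let entries = [IndexEntry::new(4096, [1; 32]), IndexEntry::new(6144, [2; 32])];
    for (size, digest) in sizes_and_digests(&entries) {
        println!("chunk of {size} bytes, digest starts with {:02x}", digest[0]);
    }
}
```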

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
changes since version 6:
- no changes

 pbs-datastore/src/dynamic_index.rs | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/pbs-datastore/src/dynamic_index.rs b/pbs-datastore/src/dynamic_index.rs
index 71a5082e1..b8047b5b1 100644
--- a/pbs-datastore/src/dynamic_index.rs
+++ b/pbs-datastore/src/dynamic_index.rs
@@ -72,6 +72,11 @@ impl DynamicEntry {
     pub fn end(&self) -> u64 {
         u64::from_le(self.end_le)
     }
+
+    #[inline]
+    pub fn digest(&self) -> [u8; 32] {
+        self.digest
+    }
 }
 
 pub struct DynamicIndexReader {
-- 
2.39.2




* [pbs-devel] [PATCH v7 proxmox-backup 40/69] client: pxar: helper for lookup of reusable dynamic entries
  2024-05-27 14:32 [pbs-devel] [PATCH v7 pxar proxmox-backup 00/69] fix #3174: improve file-level backup Christian Ebner
                   ` (38 preceding siblings ...)
  2024-05-27 14:32 ` [pbs-devel] [PATCH v7 proxmox-backup 39/69] datastore: dynamic index: add method to get digest Christian Ebner
@ 2024-05-27 14:32 ` Christian Ebner
  2024-05-27 14:32 ` [pbs-devel] [PATCH v7 proxmox-backup 41/69] upload stream: implement reused chunk injector Christian Ebner
                   ` (29 subsequent siblings)
  69 siblings, 0 replies; 71+ messages in thread
From: Christian Ebner @ 2024-05-27 14:32 UTC (permalink / raw)
  To: pbs-devel

The helper method allows looking up the entries of a dynamic index
which fully cover a given offset range. Further, the helper returns
the start padding and the end padding. The start padding runs from
the start offset of the first covering dynamic index entry to the
start offset of the given range. The end padding runs from the end
offset of the given range to the end offset of the last covering entry.

This will be used to look up size and digest for chunks covering the
payload range of a regular file. Found chunks can then be re-used by
indexing them in the archive's index file instead of re-encoding the
payload.
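
The sketch below performs the same lookup over a plain slice of ascending chunk end offsets. It uses `slice::partition_point` in place of the index's `binary_search`, and `lookup_covering` and `Covering` are hypothetical. The sketch stops at the first chunk whose end is at or past `range.end`, so a range that ends exactly on a chunk boundary gets an end padding of zero. The helper in this patch only stops once `range.end < end`:

```rust
use std::ops::Range;

/// Hypothetical reduced reusable entry: position in the index and chunk size.
#[derive(Debug, PartialEq)]
struct Covering {
    index: usize,
    size: u64,
}

/// Find the chunks covering `range`, given the ascending chunk end offsets
/// of a dynamic index. Returns the covering chunks, the start padding
/// (bytes of the first chunk before `range.start`) and the end padding
/// (bytes of the last chunk after `range.end`).
fn lookup_covering(ends: &[u64], range: Range<u64>) -> Option<(Vec<Covering>, u64, u64)> {
    // The first chunk whose end lies past range.start contains range.start.
    let start = ends.partition_point(|&end| end <= range.start);
    if start == ends.len() || range.end > *ends.last()? {
        return None; // range not fully covered by the index
    }
    let mut prev_end = if start == 0 { 0 } else { ends[start - 1] };
    let padding_start = range.start - prev_end;

    let mut chunks = Vec::new();
    for (index, &end) in ends.iter().enumerate().skip(start) {
        chunks.push(Covering { index, size: end - prev_end });
        if range.end <= end {
            return Some((chunks, padding_start, end - range.end));
        }
        prev_end = end;
    }
    None
}

fn main() {
    let ends = [10, 20, 30, 40];
    if let Some((chunks, start_pad, end_pad)) = lookup_covering(&ends, 15..35) {
        for chunk in &chunks {
            println!("chunk {} ({} bytes)", chunk.index, chunk.size);
        }
        println!("padding: {start_pad} at start, {end_pad} at end");
    }
}
```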

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
changes since version 6:
- no changes

 pbs-client/src/pxar/create.rs | 70 +++++++++++++++++++++++++++++++++++
 1 file changed, 70 insertions(+)

diff --git a/pbs-client/src/pxar/create.rs b/pbs-client/src/pxar/create.rs
index cc75f0262..6dbd1e664 100644
--- a/pbs-client/src/pxar/create.rs
+++ b/pbs-client/src/pxar/create.rs
@@ -2,6 +2,7 @@ use std::collections::{HashMap, HashSet};
 use std::ffi::{CStr, CString, OsStr};
 use std::fmt;
 use std::io::{self, Read};
+use std::ops::Range;
 use std::os::unix::ffi::OsStrExt;
 use std::os::unix::io::{AsRawFd, FromRawFd, IntoRawFd, OwnedFd, RawFd};
 use std::path::{Path, PathBuf};
@@ -25,6 +26,8 @@ use proxmox_lang::c_str;
 use proxmox_sys::fs::{self, acl, xattr};
 
 use pbs_datastore::catalog::BackupCatalogWriter;
+use pbs_datastore::dynamic_index::DynamicIndexReader;
+use pbs_datastore::index::IndexFile;
 
 use crate::pxar::metadata::errno_is_unsupported;
 use crate::pxar::tools::assert_single_path_component;
@@ -780,6 +783,73 @@ impl Archiver {
     }
 }
 
+/// Dynamic entry reusable by payload references
+#[derive(Clone, Debug)]
+#[repr(C)]
+pub struct ReusableDynamicEntry {
+    size: u64,
+    padding: u64,
+    digest: [u8; 32],
+}
+
+impl ReusableDynamicEntry {
+    #[inline]
+    pub fn size(&self) -> u64 {
+        self.size
+    }
+
+    #[inline]
+    pub fn digest(&self) -> [u8; 32] {
+        self.digest
+    }
+}
+
+/// List of dynamic entries containing the data given by an offset range
+fn lookup_dynamic_entries(
+    index: &DynamicIndexReader,
+    range: Range<u64>,
+) -> Result<(Vec<ReusableDynamicEntry>, u64, u64), Error> {
+    let end_idx = index.index_count() - 1;
+    let chunk_end = index.chunk_end(end_idx);
+    let start = index.binary_search(0, 0, end_idx, chunk_end, range.start)?;
+
+    let mut prev_end = if start == 0 {
+        0
+    } else {
+        index.chunk_end(start - 1)
+    };
+    let padding_start = range.start - prev_end;
+    let mut padding_end = 0;
+
+    let mut indices = Vec::new();
+    for dynamic_entry in &index.index()[start..] {
+        let end = dynamic_entry.end();
+
+        let reusable_dynamic_entry = ReusableDynamicEntry {
+            size: (end - prev_end),
+            padding: 0,
+            digest: dynamic_entry.digest(),
+        };
+        indices.push(reusable_dynamic_entry);
+
+        if range.end < end {
+            padding_end = end - range.end;
+            break;
+        }
+        prev_end = end;
+    }
+
+    if let Some(first) = indices.first_mut() {
+        first.padding += padding_start;
+    }
+
+    if let Some(last) = indices.last_mut() {
+        last.padding += padding_end;
+    }
+
+    Ok((indices, padding_start, padding_end))
+}
+
 fn get_metadata(
     fd: RawFd,
     stat: &FileStat,
-- 
2.39.2




* [pbs-devel] [PATCH v7 proxmox-backup 41/69] upload stream: implement reused chunk injector
  2024-05-27 14:32 [pbs-devel] [PATCH v7 pxar proxmox-backup 00/69] fix #3174: improve file-level backup Christian Ebner
                   ` (39 preceding siblings ...)
  2024-05-27 14:32 ` [pbs-devel] [PATCH v7 proxmox-backup 40/69] client: pxar: helper for lookup of reusable dynamic entries Christian Ebner
@ 2024-05-27 14:32 ` Christian Ebner
  2024-05-27 14:32 ` [pbs-devel] [PATCH v7 proxmox-backup 42/69] client: chunk stream: add struct to hold injection state Christian Ebner
                   ` (28 subsequent siblings)
  69 siblings, 0 replies; 71+ messages in thread
From: Christian Ebner @ 2024-05-27 14:32 UTC (permalink / raw)
  To: pbs-devel

In order to be included in the backup's index file, reused payload
chunks have to be injected into the payload upload stream at a
forced boundary. The chunker forces a chunk boundary and sends the
list of reusable dynamic entries to be uploaded.

This implements the logic to receive these dynamic entries from the
chunker via the corresponding communication channel. The entries are
injected into the backup upload stream once the stream reaches the
matching chunk boundary, which the chunker has already forced.
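
The core of the injector is the comparison between the next injection boundary and the current stream offset. The sketch below makes that decision in synchronous form. It assumes that the raw chunks never contain the injected bytes and that the offset advances by the sizes of the injected chunks, as the backup writer does for known chunks. `interleave`, `Item` and `Injection` are hypothetical simplifications of `InjectReusedChunksQueue`, `InjectedChunksInfo` and `InjectChunks`:

```rust
use std::cmp::Ordering;
use std::collections::VecDeque;

/// Simplified injector output: reused chunk sizes, or a raw chunk length.
#[derive(Debug, PartialEq)]
enum Item {
    Known(Vec<u64>),
    Raw(usize),
}

/// Hypothetical injection request: reuse chunks of `sizes` at `boundary`.
struct Injection {
    boundary: u64,
    sizes: Vec<u64>,
}

/// Interleave raw chunks and injections, mirroring the boundary comparison
/// done in `InjectReusedChunksQueue::poll_next`.
fn interleave(raw: Vec<usize>, injections: Vec<Injection>) -> Result<Vec<Item>, String> {
    let mut raw: VecDeque<usize> = raw.into();
    let mut injections: VecDeque<Injection> = injections.into();
    let mut offset = 0u64; // plays the role of the shared stream_len counter
    let mut out = Vec::new();

    loop {
        if let Some(boundary) = injections.front().map(|next| next.boundary) {
            match boundary.cmp(&offset) {
                // the stream reached the forced boundary: inject now
                Ordering::Equal => {
                    let injection = injections.pop_front().unwrap();
                    offset += injection.sizes.iter().sum::<u64>();
                    out.push(Item::Known(injection.sizes));
                    continue;
                }
                // the boundary was already passed: chunks and injections did not line up
                Ordering::Less => return Err(format!("invalid injection boundary {boundary}")),
                // the boundary is still ahead: consume more raw input first
                Ordering::Greater => {}
            }
        }

        match raw.pop_front() {
            Some(0) => continue, // skip empty chunks
            Some(len) => {
                offset += len as u64;
                out.push(Item::Raw(len));
            }
            None if injections.is_empty() => return Ok(out),
            None => return Err("injection queue not fully consumed".to_string()),
        }
    }
}

fn main() {
    let injections = vec![Injection { boundary: 4, sizes: vec![3, 2] }];
    println!("{:?}", interleave(vec![4, 0, 6], injections));
}
```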

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
changes since version 6:
- no changes

 pbs-client/src/inject_reused_chunks.rs | 129 +++++++++++++++++++++++++
 pbs-client/src/lib.rs                  |   1 +
 2 files changed, 130 insertions(+)
 create mode 100644 pbs-client/src/inject_reused_chunks.rs

diff --git a/pbs-client/src/inject_reused_chunks.rs b/pbs-client/src/inject_reused_chunks.rs
new file mode 100644
index 000000000..ed147f5fb
--- /dev/null
+++ b/pbs-client/src/inject_reused_chunks.rs
@@ -0,0 +1,129 @@
+use std::cmp;
+use std::pin::Pin;
+use std::sync::atomic::{AtomicUsize, Ordering};
+use std::sync::{mpsc, Arc};
+use std::task::{Context, Poll};
+
+use anyhow::{anyhow, Error};
+use futures::{ready, Stream};
+use pin_project_lite::pin_project;
+
+use crate::pxar::create::ReusableDynamicEntry;
+
+pin_project! {
+    pub struct InjectReusedChunksQueue<S> {
+        #[pin]
+        input: S,
+        next_injection: Option<InjectChunks>,
+        buffer: Option<bytes::BytesMut>,
+        injections: Option<mpsc::Receiver<InjectChunks>>,
+        stream_len: Arc<AtomicUsize>,
+    }
+}
+
+type StreamOffset = u64;
+#[derive(Debug)]
+/// Holds a list of chunks to inject at the given boundary by forcing a chunk boundary.
+pub struct InjectChunks {
+    /// Offset at which to force the boundary
+    pub boundary: StreamOffset,
+    /// List of chunks to inject
+    pub chunks: Vec<ReusableDynamicEntry>,
+    /// Cumulative size of the chunks in the list
+    pub size: usize,
+}
+
+/// Variants for stream consumer to distinguish between raw data chunks and injected ones.
+pub enum InjectedChunksInfo {
+    Known(Vec<ReusableDynamicEntry>),
+    Raw(bytes::BytesMut),
+}
+
+pub trait InjectReusedChunks: Sized {
+    fn inject_reused_chunks(
+        self,
+        injections: Option<mpsc::Receiver<InjectChunks>>,
+        stream_len: Arc<AtomicUsize>,
+    ) -> InjectReusedChunksQueue<Self>;
+}
+
+impl<S> InjectReusedChunks for S
+where
+    S: Stream<Item = Result<bytes::BytesMut, Error>>,
+{
+    fn inject_reused_chunks(
+        self,
+        injections: Option<mpsc::Receiver<InjectChunks>>,
+        stream_len: Arc<AtomicUsize>,
+    ) -> InjectReusedChunksQueue<Self> {
+        InjectReusedChunksQueue {
+            input: self,
+            next_injection: None,
+            injections,
+            buffer: None,
+            stream_len,
+        }
+    }
+}
+
+impl<S> Stream for InjectReusedChunksQueue<S>
+where
+    S: Stream<Item = Result<bytes::BytesMut, Error>>,
+{
+    type Item = Result<InjectedChunksInfo, Error>;
+
+    fn poll_next(self: Pin<&mut Self>, cx: &mut Context) -> Poll<Option<Self::Item>> {
+        let mut this = self.project();
+
+        // loop to skip over possible empty chunks
+        loop {
+            if this.next_injection.is_none() {
+                if let Some(injections) = this.injections.as_mut() {
+                    if let Ok(injection) = injections.try_recv() {
+                        *this.next_injection = Some(injection);
+                    }
+                }
+            }
+
+            if let Some(inject) = this.next_injection.take() {
+                // got reusable dynamic entries to inject
+                let offset = this.stream_len.load(Ordering::SeqCst) as u64;
+
+                match inject.boundary.cmp(&offset) {
+                    // inject now
+                    cmp::Ordering::Equal => {
+                        let chunk_info = InjectedChunksInfo::Known(inject.chunks);
+                        return Poll::Ready(Some(Ok(chunk_info)));
+                    }
+                    // inject later
+                    cmp::Ordering::Greater => *this.next_injection = Some(inject),
+                    // incoming new chunks and injections didn't line up?
+                    cmp::Ordering::Less => {
+                        return Poll::Ready(Some(Err(anyhow!("invalid injection boundary"))))
+                    }
+                }
+            }
+
+            // nothing to inject now, await further input
+            match ready!(this.input.as_mut().poll_next(cx)) {
+                None => {
+                    if let Some(injections) = this.injections.as_mut() {
+                        if this.next_injection.is_some() || injections.try_recv().is_ok() {
+                            // stream finished, but remaining dynamic entries to inject
+                            return Poll::Ready(Some(Err(anyhow!(
+                                "injection queue not fully consumed"
+                            ))));
+                        }
+                    }
+                    // stream finished and all dynamic entries already injected
+                    return Poll::Ready(None);
+                }
+                Some(Err(err)) => return Poll::Ready(Some(Err(err))),
+                // ignore empty chunks, injected chunks from queue at forced boundary, but boundary
+                // did not require splitting of the raw stream buffer to force the boundary
+                Some(Ok(raw)) if raw.is_empty() => continue,
+                Some(Ok(raw)) => return Poll::Ready(Some(Ok(InjectedChunksInfo::Raw(raw)))),
+            }
+        }
+    }
+}
diff --git a/pbs-client/src/lib.rs b/pbs-client/src/lib.rs
index 21cf8556b..3e7bd2a8b 100644
--- a/pbs-client/src/lib.rs
+++ b/pbs-client/src/lib.rs
@@ -7,6 +7,7 @@ pub mod catalog_shell;
 pub mod pxar;
 pub mod tools;
 
+mod inject_reused_chunks;
 mod merge_known_chunks;
 pub mod pipe_to_stream;
 
-- 
2.39.2




* [pbs-devel] [PATCH v7 proxmox-backup 42/69] client: chunk stream: add struct to hold injection state
  2024-05-27 14:32 [pbs-devel] [PATCH v7 pxar proxmox-backup 00/69] fix #3174: improve file-level backup Christian Ebner
                   ` (40 preceding siblings ...)
  2024-05-27 14:32 ` [pbs-devel] [PATCH v7 proxmox-backup 41/69] upload stream: implement reused chunk injector Christian Ebner
@ 2024-05-27 14:32 ` Christian Ebner
  2024-05-27 14:32 ` [pbs-devel] [PATCH v7 proxmox-backup 43/69] chunker: add method to reset chunker state Christian Ebner
                   ` (27 subsequent siblings)
  69 siblings, 0 replies; 71+ messages in thread
From: Christian Ebner @ 2024-05-27 14:32 UTC (permalink / raw)
  To: pbs-devel

Adds a dedicated structure to hold the optional sender and receiver
instances and the state for injecting reused dynamic entries into the
payload stream of split stream pxar archives.

The asynchronous channels must only be attached to the payload
archive. This leaves the current behavior unchanged for the metadata
archive and for the current default encoding, which does not reuse
payload chunks of previous snapshots.
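
The structure ties together two `std::sync::mpsc` channels. The sketch below shows the intended wiring. It assumes that the archiver holds the sender for boundaries and the backup writer holds the receiver for injections, as described in patch 44 of this series. `Inject`, `ChunkStreamSide`, `wire` and `forward_pending` are hypothetical, and forcing the actual chunk boundary before forwarding is left out:

```rust
use std::sync::mpsc;

/// Hypothetical reduced injection request: boundary plus reused chunk sizes.
#[derive(Debug, PartialEq)]
struct Inject {
    boundary: u64,
    sizes: Vec<u64>,
}

/// Same shape as `InjectionData`, minus its `consumed` counter. The chunk
/// stream receives requests from the archiver and forwards them to the writer.
struct ChunkStreamSide {
    boundaries: mpsc::Receiver<Inject>,
    injections: mpsc::Sender<Inject>,
}

impl ChunkStreamSide {
    /// Forward all pending requests without blocking; returns how many were forwarded.
    fn forward_pending(&self) -> usize {
        let mut forwarded = 0;
        while let Ok(request) = self.boundaries.try_recv() {
            // the real chunk stream would force the chunk boundary here first
            if self.injections.send(request).is_err() {
                break; // the writer side hung up
            }
            forwarded += 1;
        }
        forwarded
    }
}

/// Create both channels: (archiver sender, chunk stream side, writer receiver).
fn wire() -> (mpsc::Sender<Inject>, ChunkStreamSide, mpsc::Receiver<Inject>) {
    let (boundary_tx, boundary_rx) = mpsc::channel();
    let (injection_tx, injection_rx) = mpsc::channel();
    let side = ChunkStreamSide {
        boundaries: boundary_rx,
        injections: injection_tx,
    };
    (boundary_tx, side, injection_rx)
}

fn main() {
    let (archiver, stream, writer) = wire();
    archiver.send(Inject { boundary: 4096, sizes: vec![1024, 2048] }).unwrap();
    stream.forward_pending();
    if let Ok(inject) = writer.try_recv() {
        println!("inject {} chunk(s) at offset {}", inject.sizes.len(), inject.boundary);
    }
}
```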

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
changes since version 6:
- no changes

 pbs-client/src/chunk_stream.rs | 23 +++++++++++++++++++++++
 pbs-client/src/lib.rs          |  2 +-
 2 files changed, 24 insertions(+), 1 deletion(-)

diff --git a/pbs-client/src/chunk_stream.rs b/pbs-client/src/chunk_stream.rs
index 895f6eae2..83c75ba28 100644
--- a/pbs-client/src/chunk_stream.rs
+++ b/pbs-client/src/chunk_stream.rs
@@ -1,4 +1,5 @@
 use std::pin::Pin;
+use std::sync::mpsc;
 use std::task::{Context, Poll};
 
 use anyhow::Error;
@@ -8,6 +9,28 @@ use futures::stream::{Stream, TryStream};
 
 use pbs_datastore::Chunker;
 
+use crate::inject_reused_chunks::InjectChunks;
+
+/// Holds the queues for optional injection of reused dynamic index entries
+pub struct InjectionData {
+    boundaries: mpsc::Receiver<InjectChunks>,
+    injections: mpsc::Sender<InjectChunks>,
+    consumed: u64,
+}
+
+impl InjectionData {
+    pub fn new(
+        boundaries: mpsc::Receiver<InjectChunks>,
+        injections: mpsc::Sender<InjectChunks>,
+    ) -> Self {
+        Self {
+            boundaries,
+            injections,
+            consumed: 0,
+        }
+    }
+}
+
 /// Split input stream into dynamic sized chunks
 pub struct ChunkStream<S: Unpin> {
     input: S,
diff --git a/pbs-client/src/lib.rs b/pbs-client/src/lib.rs
index 3e7bd2a8b..3d2da27b9 100644
--- a/pbs-client/src/lib.rs
+++ b/pbs-client/src/lib.rs
@@ -39,6 +39,6 @@ mod backup_specification;
 pub use backup_specification::*;
 
 mod chunk_stream;
-pub use chunk_stream::{ChunkStream, FixedChunkStream};
+pub use chunk_stream::{ChunkStream, FixedChunkStream, InjectionData};
 
 pub const PROXMOX_BACKUP_TCP_KEEPALIVE_TIME: u32 = 120;
-- 
2.39.2




* [pbs-devel] [PATCH v7 proxmox-backup 43/69] chunker: add method to reset chunker state
  2024-05-27 14:32 [pbs-devel] [PATCH v7 pxar proxmox-backup 00/69] fix #3174: improve file-level backup Christian Ebner
                   ` (41 preceding siblings ...)
  2024-05-27 14:32 ` [pbs-devel] [PATCH v7 proxmox-backup 42/69] client: chunk stream: add struct to hold injection state Christian Ebner
@ 2024-05-27 14:32 ` Christian Ebner
  2024-05-27 14:32 ` [pbs-devel] [PATCH v7 proxmox-backup 44/69] client: streams: add channels for dynamic entry injection Christian Ebner
                   ` (26 subsequent siblings)
  69 siblings, 0 replies; 71+ messages in thread
From: Christian Ebner @ 2024-05-27 14:32 UTC (permalink / raw)
  To: pbs-devel

When forcing a boundary, the internal chunker state is no longer in
sync with the chunk stream. The reset method therefore allows
resetting the internal state.
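
A forced boundary ends a chunk that the chunker did not choose. Without a reset, any partially accumulated state would carry over into the next chunk. The toy illustration below uses a chunker that only cuts at a maximum size. `ToyChunker` is hypothetical and deliberately omits the rolling hash (`h`) and window state that the real `Chunker` also resets:

```rust
/// Toy chunker that only cuts at a maximum chunk size. It is a hypothetical
/// stand-in for the rolling-hash chunker and keeps only the size counter.
struct ToyChunker {
    chunk_size: usize,
    max_size: usize,
}

impl ToyChunker {
    fn new(max_size: usize) -> Self {
        Self { chunk_size: 0, max_size }
    }

    /// Return the position of the next boundary within `data`, or 0 if none.
    fn scan(&mut self, data: &[u8]) -> usize {
        for pos in 1..=data.len() {
            self.chunk_size += 1;
            if self.chunk_size >= self.max_size {
                self.chunk_size = 0;
                return pos;
            }
        }
        0
    }

    /// Forget the partially scanned chunk, e.g. after a forced boundary.
    fn reset(&mut self) {
        self.chunk_size = 0;
    }
}

fn main() {
    let mut chunker = ToyChunker::new(4);
    println!("no boundary yet: {}", chunker.scan(&[0; 3]));
    chunker.reset(); // a boundary was forced after these 3 bytes
    println!("next boundary at: {}", chunker.scan(&[0; 4]));
}
```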

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
changes since version 6:
- no changes

 pbs-datastore/src/chunker.rs | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/pbs-datastore/src/chunker.rs b/pbs-datastore/src/chunker.rs
index 712751829..253d2cf4c 100644
--- a/pbs-datastore/src/chunker.rs
+++ b/pbs-datastore/src/chunker.rs
@@ -167,6 +167,12 @@ impl Chunker {
         0
     }
 
+    pub fn reset(&mut self) {
+        self.h = 0;
+        self.chunk_size = 0;
+        self.window_size = 0;
+    }
+
     // fast implementation avoiding modulo
     // #[inline(always)]
     fn shall_break(&self) -> bool {
-- 
2.39.2




* [pbs-devel] [PATCH v7 proxmox-backup 44/69] client: streams: add channels for dynamic entry injection
  2024-05-27 14:32 [pbs-devel] [PATCH v7 pxar proxmox-backup 00/69] fix #3174: improve file-level backup Christian Ebner
                   ` (42 preceding siblings ...)
  2024-05-27 14:32 ` [pbs-devel] [PATCH v7 proxmox-backup 43/69] chunker: add method to reset chunker state Christian Ebner
@ 2024-05-27 14:32 ` Christian Ebner
  2024-05-27 14:32 ` [pbs-devel] [PATCH v7 proxmox-backup 45/69] specs: add backup detection mode specification Christian Ebner
                   ` (25 subsequent siblings)
  69 siblings, 0 replies; 71+ messages in thread
From: Christian Ebner @ 2024-05-27 14:32 UTC (permalink / raw)
  To: pbs-devel

Add non-blocking channels to reuse dynamic entries of a previous
backup run and index them for the new snapshot: one between the pxar
archiver and the chunk stream, and one between the chunk stream and
the backup writer.

The archiver sends the forced boundary positions to the chunk stream,
together with the dynamic entries to inject after each boundary.

The chunk stream consumes this channel's input as receiver whenever a
new chunk is requested by the upload stream, forcing a non-regular
chunk boundary in the pxar stream at the requested positions.

The dynamic entries to inject and the boundary are then sent via the
second asynchronous channel to the backup writer's upload stream,
which indexes them by inserting the dynamic entries as known chunks
into the upload stream.
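
The chunk stream forces a boundary by emitting the buffered data that precedes the requested position as a chunk of its own. The sketch below performs that split on a plain `Vec<u8>`. It assumes that `consumed` is the stream offset of the first buffered byte, and `split_at_boundary` is hypothetical rather than the `ChunkStream` code from this patch. A boundary at the very start of the buffer yields an empty chunk. This would correspond to the empty chunks that the injector from patch 41 skips:

```rust
/// Hypothetical helper: split off the part of `buffer` before a forced boundary.
/// `consumed` is the stream offset of the first byte in `buffer`.
/// Returns `None` (leaving `buffer` untouched) if the boundary is not
/// within the buffered data.
fn split_at_boundary(buffer: &mut Vec<u8>, consumed: u64, boundary: u64) -> Option<Vec<u8>> {
    let pos = boundary.checked_sub(consumed)?;
    let pos = usize::try_from(pos).ok()?;
    if pos > buffer.len() {
        return None;
    }
    // keep the bytes after the boundary buffered, return the ones before it
    let rest = buffer.split_off(pos);
    Some(std::mem::replace(buffer, rest))
}

fn main() {
    let mut buffer: Vec<u8> = (1..=10).collect();
    if let Some(chunk) = split_at_boundary(&mut buffer, 100, 104) {
        println!("forced chunk {:?}, still buffered {:?}", chunk, buffer);
    }
}
```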

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
changes since version 6:
- removed unnecessary code changes present in last patch
- adapted to PxarVariant pxar interface

 examples/test_chunk_speed2.rs                 |  2 +-
 pbs-client/src/backup_writer.rs               | 98 ++++++++++++-------
 pbs-client/src/chunk_stream.rs                | 78 ++++++++++++++-
 pbs-client/src/pxar/create.rs                 |  6 +-
 pbs-client/src/pxar_backup_stream.rs          |  8 +-
 proxmox-backup-client/src/main.rs             | 28 ++++--
 .../src/proxmox_restore_daemon/api.rs         |  2 +-
 pxar-bin/src/main.rs                          |  1 +
 tests/catar.rs                                |  1 +
 9 files changed, 171 insertions(+), 53 deletions(-)

diff --git a/examples/test_chunk_speed2.rs b/examples/test_chunk_speed2.rs
index 3f69b436d..22dd14ce2 100644
--- a/examples/test_chunk_speed2.rs
+++ b/examples/test_chunk_speed2.rs
@@ -26,7 +26,7 @@ async fn run() -> Result<(), Error> {
         .map_err(Error::from);
 
     //let chunk_stream = FixedChunkStream::new(stream, 4*1024*1024);
-    let mut chunk_stream = ChunkStream::new(stream, None);
+    let mut chunk_stream = ChunkStream::new(stream, None, None);
 
     let start_time = std::time::Instant::now();
 
diff --git a/pbs-client/src/backup_writer.rs b/pbs-client/src/backup_writer.rs
index dc9aa569f..b2ada85cd 100644
--- a/pbs-client/src/backup_writer.rs
+++ b/pbs-client/src/backup_writer.rs
@@ -23,6 +23,7 @@ use pbs_tools::crypt_config::CryptConfig;
 
 use proxmox_human_byte::HumanByte;
 
+use super::inject_reused_chunks::{InjectChunks, InjectReusedChunks, InjectedChunksInfo};
 use super::merge_known_chunks::{MergeKnownChunks, MergedChunkInfo};
 
 use super::{H2Client, HttpClient};
@@ -265,6 +266,7 @@ impl BackupWriter {
         archive_name: &str,
         stream: impl Stream<Item = Result<bytes::BytesMut, Error>>,
         options: UploadOptions,
+        injections: Option<std::sync::mpsc::Receiver<InjectChunks>>,
     ) -> Result<BackupStats, Error> {
         let known_chunks = Arc::new(Mutex::new(HashSet::new()));
 
@@ -341,6 +343,7 @@ impl BackupWriter {
                 None
             },
             options.compress,
+            injections,
         )
         .await?;
 
@@ -636,6 +639,7 @@ impl BackupWriter {
         known_chunks: Arc<Mutex<HashSet<[u8; 32]>>>,
         crypt_config: Option<Arc<CryptConfig>>,
         compress: bool,
+        injections: Option<std::sync::mpsc::Receiver<InjectChunks>>,
     ) -> impl Future<Output = Result<UploadStats, Error>> {
         let total_chunks = Arc::new(AtomicUsize::new(0));
         let total_chunks2 = total_chunks.clone();
@@ -662,48 +666,72 @@ impl BackupWriter {
         let index_csum_2 = index_csum.clone();
 
         stream
-            .and_then(move |data| {
-                let chunk_len = data.len();
+            .inject_reused_chunks(injections, stream_len.clone())
+            .and_then(move |chunk_info| match chunk_info {
+                InjectedChunksInfo::Known(chunks) => {
+                    // account for injected chunks
+                    let count = chunks.len();
+                    total_chunks.fetch_add(count, Ordering::SeqCst);
+
+                    let mut known = Vec::new();
+                    let mut guard = index_csum.lock().unwrap();
+                    let csum = guard.as_mut().unwrap();
+                    for chunk in chunks {
+                        let offset =
+                            stream_len.fetch_add(chunk.size() as usize, Ordering::SeqCst) as u64;
+                        reused_len.fetch_add(chunk.size() as usize, Ordering::SeqCst);
+                        let digest = chunk.digest();
+                        known.push((offset, digest));
+                        let end_offset = offset + chunk.size();
+                        csum.update(&end_offset.to_le_bytes());
+                        csum.update(&digest);
+                    }
+                    future::ok(MergedChunkInfo::Known(known))
+                }
+                InjectedChunksInfo::Raw(data) => {
+                    // account for not injected chunks (new and known)
+                    let chunk_len = data.len();
 
-                total_chunks.fetch_add(1, Ordering::SeqCst);
-                let offset = stream_len.fetch_add(chunk_len, Ordering::SeqCst) as u64;
+                    total_chunks.fetch_add(1, Ordering::SeqCst);
+                    let offset = stream_len.fetch_add(chunk_len, Ordering::SeqCst) as u64;
 
-                let mut chunk_builder = DataChunkBuilder::new(data.as_ref()).compress(compress);
+                    let mut chunk_builder = DataChunkBuilder::new(data.as_ref()).compress(compress);
 
-                if let Some(ref crypt_config) = crypt_config {
-                    chunk_builder = chunk_builder.crypt_config(crypt_config);
-                }
+                    if let Some(ref crypt_config) = crypt_config {
+                        chunk_builder = chunk_builder.crypt_config(crypt_config);
+                    }
 
-                let mut known_chunks = known_chunks.lock().unwrap();
-                let digest = chunk_builder.digest();
+                    let mut known_chunks = known_chunks.lock().unwrap();
+                    let digest = chunk_builder.digest();
 
-                let mut guard = index_csum.lock().unwrap();
-                let csum = guard.as_mut().unwrap();
+                    let mut guard = index_csum.lock().unwrap();
+                    let csum = guard.as_mut().unwrap();
 
-                let chunk_end = offset + chunk_len as u64;
+                    let chunk_end = offset + chunk_len as u64;
 
-                if !is_fixed_chunk_size {
-                    csum.update(&chunk_end.to_le_bytes());
-                }
-                csum.update(digest);
-
-                let chunk_is_known = known_chunks.contains(digest);
-                if chunk_is_known {
-                    known_chunk_count.fetch_add(1, Ordering::SeqCst);
-                    reused_len.fetch_add(chunk_len, Ordering::SeqCst);
-                    future::ok(MergedChunkInfo::Known(vec![(offset, *digest)]))
-                } else {
-                    let compressed_stream_len2 = compressed_stream_len.clone();
-                    known_chunks.insert(*digest);
-                    future::ready(chunk_builder.build().map(move |(chunk, digest)| {
-                        compressed_stream_len2.fetch_add(chunk.raw_size(), Ordering::SeqCst);
-                        MergedChunkInfo::New(ChunkInfo {
-                            chunk,
-                            digest,
-                            chunk_len: chunk_len as u64,
-                            offset,
-                        })
-                    }))
+                    if !is_fixed_chunk_size {
+                        csum.update(&chunk_end.to_le_bytes());
+                    }
+                    csum.update(digest);
+
+                    let chunk_is_known = known_chunks.contains(digest);
+                    if chunk_is_known {
+                        known_chunk_count.fetch_add(1, Ordering::SeqCst);
+                        reused_len.fetch_add(chunk_len, Ordering::SeqCst);
+                        future::ok(MergedChunkInfo::Known(vec![(offset, *digest)]))
+                    } else {
+                        let compressed_stream_len2 = compressed_stream_len.clone();
+                        known_chunks.insert(*digest);
+                        future::ready(chunk_builder.build().map(move |(chunk, digest)| {
+                            compressed_stream_len2.fetch_add(chunk.raw_size(), Ordering::SeqCst);
+                            MergedChunkInfo::New(ChunkInfo {
+                                chunk,
+                                digest,
+                                chunk_len: chunk_len as u64,
+                                offset,
+                            })
+                        }))
+                    }
                 }
             })
             .merge_known_chunks()
diff --git a/pbs-client/src/chunk_stream.rs b/pbs-client/src/chunk_stream.rs
index 83c75ba28..87a018d50 100644
--- a/pbs-client/src/chunk_stream.rs
+++ b/pbs-client/src/chunk_stream.rs
@@ -14,6 +14,7 @@ use crate::inject_reused_chunks::InjectChunks;
 /// Holds the queues for optional injection of reused dynamic index entries
 pub struct InjectionData {
     boundaries: mpsc::Receiver<InjectChunks>,
+    next_boundary: Option<InjectChunks>,
     injections: mpsc::Sender<InjectChunks>,
     consumed: u64,
 }
@@ -25,6 +26,7 @@ impl InjectionData {
     ) -> Self {
         Self {
             boundaries,
+            next_boundary: None,
             injections,
             consumed: 0,
         }
@@ -37,15 +39,17 @@ pub struct ChunkStream<S: Unpin> {
     chunker: Chunker,
     buffer: BytesMut,
     scan_pos: usize,
+    injection_data: Option<InjectionData>,
 }
 
 impl<S: Unpin> ChunkStream<S> {
-    pub fn new(input: S, chunk_size: Option<usize>) -> Self {
+    pub fn new(input: S, chunk_size: Option<usize>, injection_data: Option<InjectionData>) -> Self {
         Self {
             input,
             chunker: Chunker::new(chunk_size.unwrap_or(4 * 1024 * 1024)),
             buffer: BytesMut::new(),
             scan_pos: 0,
+            injection_data,
         }
     }
 }
@@ -62,7 +66,70 @@ where
 
     fn poll_next(self: Pin<&mut Self>, cx: &mut Context) -> Poll<Option<Self::Item>> {
         let this = self.get_mut();
+
         loop {
+            if let Some(InjectionData {
+                boundaries,
+                next_boundary,
+                injections,
+                consumed,
+            }) = this.injection_data.as_mut()
+            {
+                if next_boundary.is_none() {
+                    if let Ok(boundary) = boundaries.try_recv() {
+                        *next_boundary = Some(boundary);
+                    }
+                }
+
+                if let Some(inject) = next_boundary.take() {
+                    // require forced boundary, lookup next regular boundary
+                    let pos = if this.scan_pos < this.buffer.len() {
+                        this.chunker.scan(&this.buffer[this.scan_pos..])
+                    } else {
+                        0
+                    };
+
+                    let chunk_boundary = if pos == 0 {
+                        *consumed + this.buffer.len() as u64
+                    } else {
+                        *consumed + (this.scan_pos + pos) as u64
+                    };
+
+                    if inject.boundary <= chunk_boundary {
+                        // forced boundary is before next boundary, force within current buffer
+                        let chunk_size = (inject.boundary - *consumed) as usize;
+                        let raw_chunk = this.buffer.split_to(chunk_size);
+                        this.chunker.reset();
+                        this.scan_pos = 0;
+
+                        *consumed += chunk_size as u64;
+
+                        // add the size of the injected chunks to consumed, so chunk stream offsets
+                        // are in sync with the rest of the archive.
+                        *consumed += inject.size as u64;
+
+                        injections.send(inject).unwrap();
+
+                        // the chunk can be empty, return nevertheless to allow the caller to
+                        // make progress by consuming from the injection queue
+                        return Poll::Ready(Some(Ok(raw_chunk)));
+                    } else if pos != 0 {
+                        *next_boundary = Some(inject);
+                        // forced boundary is after next boundary, split off chunk from buffer
+                        let chunk_size = this.scan_pos + pos;
+                        let raw_chunk = this.buffer.split_to(chunk_size);
+                        *consumed += chunk_size as u64;
+                        this.scan_pos = 0;
+
+                        return Poll::Ready(Some(Ok(raw_chunk)));
+                    } else {
+                        // forced boundary is after current buffer length, continue reading
+                        *next_boundary = Some(inject);
+                        this.scan_pos = this.buffer.len();
+                    }
+                }
+            }
+
             if this.scan_pos < this.buffer.len() {
                 let boundary = this.chunker.scan(&this.buffer[this.scan_pos..]);
 
@@ -70,11 +137,14 @@ where
 
                 if boundary == 0 {
                     this.scan_pos = this.buffer.len();
-                    // continue poll
                 } else if chunk_size <= this.buffer.len() {
-                    let result = this.buffer.split_to(chunk_size);
+                    // found new chunk boundary inside buffer, split off chunk from buffer
+                    let raw_chunk = this.buffer.split_to(chunk_size);
+                    if let Some(InjectionData { consumed, .. }) = this.injection_data.as_mut() {
+                        *consumed += chunk_size as u64;
+                    }
                     this.scan_pos = 0;
-                    return Poll::Ready(Some(Ok(result)));
+                    return Poll::Ready(Some(Ok(raw_chunk)));
                 } else {
                     panic!("got unexpected chunk boundary from chunker");
                 }
diff --git a/pbs-client/src/pxar/create.rs b/pbs-client/src/pxar/create.rs
index 6dbd1e664..7667348d4 100644
--- a/pbs-client/src/pxar/create.rs
+++ b/pbs-client/src/pxar/create.rs
@@ -6,7 +6,7 @@ use std::ops::Range;
 use std::os::unix::ffi::OsStrExt;
 use std::os::unix::io::{AsRawFd, FromRawFd, IntoRawFd, OwnedFd, RawFd};
 use std::path::{Path, PathBuf};
-use std::sync::{Arc, Mutex};
+use std::sync::{mpsc, Arc, Mutex};
 
 use anyhow::{bail, Context, Error};
 use futures::future::BoxFuture;
@@ -29,6 +29,7 @@ use pbs_datastore::catalog::BackupCatalogWriter;
 use pbs_datastore::dynamic_index::DynamicIndexReader;
 use pbs_datastore::index::IndexFile;
 
+use crate::inject_reused_chunks::InjectChunks;
 use crate::pxar::metadata::errno_is_unsupported;
 use crate::pxar::tools::assert_single_path_component;
 use crate::pxar::Flags;
@@ -134,6 +135,7 @@ struct Archiver {
     hardlinks: HashMap<HardLinkInfo, (PathBuf, LinkOffset)>,
     file_copy_buffer: Vec<u8>,
     skip_e2big_xattr: bool,
+    forced_boundaries: Option<mpsc::Sender<InjectChunks>>,
 }
 
 type Encoder<'a, T> = pxar::encoder::aio::Encoder<'a, T>;
@@ -158,6 +160,7 @@ pub async fn create_archive<T, F>(
     feature_flags: Flags,
     callback: F,
     options: PxarCreateOptions,
+    forced_boundaries: Option<mpsc::Sender<InjectChunks>>,
 ) -> Result<(), Error>
 where
     T: SeqWrite + Send,
@@ -213,6 +216,7 @@ where
         hardlinks: HashMap::new(),
         file_copy_buffer: vec::undefined(4 * 1024 * 1024),
         skip_e2big_xattr: options.skip_e2big_xattr,
+        forced_boundaries,
     };
 
     archiver
diff --git a/pbs-client/src/pxar_backup_stream.rs b/pbs-client/src/pxar_backup_stream.rs
index 3541eddb5..fb6d063f2 100644
--- a/pbs-client/src/pxar_backup_stream.rs
+++ b/pbs-client/src/pxar_backup_stream.rs
@@ -2,7 +2,7 @@ use std::io::Write;
 //use std::os::unix::io::FromRawFd;
 use std::path::Path;
 use std::pin::Pin;
-use std::sync::{Arc, Mutex};
+use std::sync::{mpsc, Arc, Mutex};
 use std::task::{Context, Poll};
 
 use anyhow::{format_err, Error};
@@ -17,6 +17,7 @@ use proxmox_io::StdChannelWriter;
 
 use pbs_datastore::catalog::CatalogWriter;
 
+use crate::inject_reused_chunks::InjectChunks;
 use crate::pxar::create::PxarWriters;
 
 /// Stream implementation to encode and upload .pxar archives.
@@ -42,6 +43,7 @@ impl PxarBackupStream {
         dir: Dir,
         catalog: Arc<Mutex<CatalogWriter<W>>>,
         options: crate::pxar::PxarCreateOptions,
+        boundaries: Option<mpsc::Sender<InjectChunks>>,
         separate_payload_stream: bool,
     ) -> Result<(Self, Option<Self>), Error> {
         let buffer_size = 256 * 1024;
@@ -82,6 +84,7 @@ impl PxarBackupStream {
                     Ok(())
                 },
                 options,
+                boundaries,
             )
             .await
             {
@@ -113,11 +116,12 @@ impl PxarBackupStream {
         dirname: &Path,
         catalog: Arc<Mutex<CatalogWriter<W>>>,
         options: crate::pxar::PxarCreateOptions,
+        boundaries: Option<mpsc::Sender<InjectChunks>>,
         separate_payload_stream: bool,
     ) -> Result<(Self, Option<Self>), Error> {
         let dir = nix::dir::Dir::open(dirname, OFlag::O_DIRECTORY, Mode::empty())?;
 
-        Self::new(dir, catalog, options, separate_payload_stream)
+        Self::new(dir, catalog, options, boundaries, separate_payload_stream)
     }
 }
 
diff --git a/proxmox-backup-client/src/main.rs b/proxmox-backup-client/src/main.rs
index ef481743f..f93f9c851 100644
--- a/proxmox-backup-client/src/main.rs
+++ b/proxmox-backup-client/src/main.rs
@@ -45,8 +45,8 @@ use pbs_client::tools::{
 use pbs_client::{
     delete_ticket_info, parse_backup_specification, view_task_result, BackupReader,
     BackupRepository, BackupSpecificationType, BackupStats, BackupWriter, ChunkStream,
-    FixedChunkStream, HttpClient, PxarBackupStream, RemoteChunkReader, UploadOptions,
-    BACKUP_SOURCE_SCHEMA,
+    FixedChunkStream, HttpClient, InjectionData, PxarBackupStream, RemoteChunkReader,
+    UploadOptions, BACKUP_SOURCE_SCHEMA,
 };
 use pbs_datastore::catalog::{BackupCatalogWriter, CatalogReader, CatalogWriter};
 use pbs_datastore::chunk_store::verify_chunk_size;
@@ -199,14 +199,16 @@ async fn backup_directory<P: AsRef<Path>>(
         bail!("cannot backup directory with fixed chunk size!");
     }
 
+    let (payload_boundaries_tx, payload_boundaries_rx) = std::sync::mpsc::channel();
     let (pxar_stream, payload_stream) = PxarBackupStream::open(
         dir_path.as_ref(),
         catalog,
         pxar_create_options,
+        Some(payload_boundaries_tx),
         payload_target.is_some(),
     )?;
 
-    let mut chunk_stream = ChunkStream::new(pxar_stream, chunk_size);
+    let mut chunk_stream = ChunkStream::new(pxar_stream, chunk_size, None);
     let (tx, rx) = mpsc::channel(10); // allow to buffer 10 chunks
 
     let stream = ReceiverStream::new(rx).map_err(Error::from);
@@ -218,13 +220,16 @@ async fn backup_directory<P: AsRef<Path>>(
         }
     });
 
-    let stats = client.upload_stream(archive_name, stream, upload_options.clone());
+    let stats = client.upload_stream(archive_name, stream, upload_options.clone(), None);
 
     if let Some(payload_stream) = payload_stream {
         let payload_target = payload_target
             .ok_or_else(|| format_err!("got payload stream, but no target archive name"))?;
 
-        let mut payload_chunk_stream = ChunkStream::new(payload_stream, chunk_size);
+        let (payload_injections_tx, payload_injections_rx) = std::sync::mpsc::channel();
+        let injection_data = InjectionData::new(payload_boundaries_rx, payload_injections_tx);
+        let mut payload_chunk_stream =
+            ChunkStream::new(payload_stream, chunk_size, Some(injection_data));
         let (payload_tx, payload_rx) = mpsc::channel(10); // allow to buffer 10 chunks
         let stream = ReceiverStream::new(payload_rx).map_err(Error::from);
 
@@ -235,7 +240,12 @@ async fn backup_directory<P: AsRef<Path>>(
             }
         });
 
-        let payload_stats = client.upload_stream(&payload_target, stream, upload_options);
+        let payload_stats = client.upload_stream(
+            &payload_target,
+            stream,
+            upload_options,
+            Some(payload_injections_rx),
+        );
 
         match futures::join!(stats, payload_stats) {
             (Ok(stats), Ok(payload_stats)) => Ok((stats, Some(payload_stats))),
@@ -271,7 +281,7 @@ async fn backup_image<P: AsRef<Path>>(
     }
 
     let stats = client
-        .upload_stream(archive_name, stream, upload_options)
+        .upload_stream(archive_name, stream, upload_options, None)
         .await?;
 
     Ok(stats)
@@ -562,7 +572,7 @@ fn spawn_catalog_upload(
     let (catalog_tx, catalog_rx) = std::sync::mpsc::sync_channel(10); // allow to buffer 10 writes
     let catalog_stream = proxmox_async::blocking::StdChannelStream(catalog_rx);
     let catalog_chunk_size = 512 * 1024;
-    let catalog_chunk_stream = ChunkStream::new(catalog_stream, Some(catalog_chunk_size));
+    let catalog_chunk_stream = ChunkStream::new(catalog_stream, Some(catalog_chunk_size), None);
 
     let catalog_writer = Arc::new(Mutex::new(CatalogWriter::new(TokioWriterAdapter::new(
         StdChannelWriter::new(catalog_tx),
@@ -578,7 +588,7 @@ fn spawn_catalog_upload(
 
     tokio::spawn(async move {
         let catalog_upload_result = client
-            .upload_stream(CATALOG_NAME, catalog_chunk_stream, upload_options)
+            .upload_stream(CATALOG_NAME, catalog_chunk_stream, upload_options, None)
             .await;
 
         if let Err(ref err) = catalog_upload_result {
diff --git a/proxmox-restore-daemon/src/proxmox_restore_daemon/api.rs b/proxmox-restore-daemon/src/proxmox_restore_daemon/api.rs
index 95c9f4619..f7fbae093 100644
--- a/proxmox-restore-daemon/src/proxmox_restore_daemon/api.rs
+++ b/proxmox-restore-daemon/src/proxmox_restore_daemon/api.rs
@@ -363,7 +363,7 @@ fn extract(
                     };
 
                     let pxar_writer = pxar::PxarVariant::Unified(TokioWriter::new(writer));
-                    create_archive(dir, PxarWriters::new(pxar_writer, None), Flags::DEFAULT, |_| Ok(()), options)
+                    create_archive(dir, PxarWriters::new(pxar_writer, None), Flags::DEFAULT, |_| Ok(()), options, None)
                         .await
                 }
                 .await;
diff --git a/pxar-bin/src/main.rs b/pxar-bin/src/main.rs
index cfeddd9fa..91857f399 100644
--- a/pxar-bin/src/main.rs
+++ b/pxar-bin/src/main.rs
@@ -410,6 +410,7 @@ async fn create_archive(
             Ok(())
         },
         options,
+        None,
     )
     .await?;
 
diff --git a/tests/catar.rs b/tests/catar.rs
index 932df61a9..9f83b4cc2 100644
--- a/tests/catar.rs
+++ b/tests/catar.rs
@@ -39,6 +39,7 @@ fn run_test(dir_name: &str) -> Result<(), Error> {
         Flags::DEFAULT,
         |_| Ok(()),
         options,
+        None,
     ))?;
 
     Command::new("cmp")
-- 
2.39.2
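The forced-boundary handling added to `ChunkStream::poll_next` in this patch reduces to a three-way decision. The following std-only sketch shows that decision; it is not the patch's code. `Action`, `next_action` and `consumed_after_force` are hypothetical names, and the buffer, chunker and channels are replaced by plain offsets. It assumes the forced boundary never lies before `consumed`, which the subtraction in the patch also relies on.

```rust
/// Next step for a chunk stream that has a forced boundary pending.
#[derive(Debug, PartialEq)]
enum Action {
    /// Split `len` bytes off the buffer so the chunk ends exactly at the
    /// forced boundary (`len` may be 0); the pending injection is handed on.
    ForceSplit { len: usize },
    /// Split `len` bytes off at the next regular (content-defined) boundary;
    /// the forced boundary stays pending.
    RegularSplit { len: usize },
    /// Neither boundary lies within the buffered data: read more input.
    ReadMore,
}

/// `consumed`:   absolute stream offset of the first buffered byte
/// `buffer_len`: number of buffered bytes
/// `scan_pos`:   buffer position up to which the chunker already scanned
/// `pos`:        chunker result when scanning from `scan_pos` (0 = none found)
/// `forced`:     absolute offset of the forced boundary (assumed >= `consumed`)
fn next_action(
    consumed: u64,
    buffer_len: usize,
    scan_pos: usize,
    pos: usize,
    forced: u64,
) -> Action {
    // Absolute offset of the next regular boundary, or of the buffer end
    // if the chunker found none.
    let chunk_boundary = if pos == 0 {
        consumed + buffer_len as u64
    } else {
        consumed + (scan_pos + pos) as u64
    };

    if forced <= chunk_boundary {
        Action::ForceSplit { len: (forced - consumed) as usize }
    } else if pos != 0 {
        Action::RegularSplit { len: scan_pos + pos }
    } else {
        Action::ReadMore
    }
}

/// Stream offset after a forced split: the split-off bytes plus the size of
/// the injected (reused) chunks, keeping offsets in sync with the archive.
fn consumed_after_force(consumed: u64, len: usize, injected_size: u64) -> u64 {
    consumed + len as u64 + injected_size
}
```

For example, with 100 bytes buffered at stream offset 0 and a regular boundary 60 bytes in, a forced boundary at 40 yields `ForceSplit { len: 40 }`, while one at 80 yields `RegularSplit { len: 60 }` and stays pending for a later poll.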




* [pbs-devel] [PATCH v7 proxmox-backup 45/69] specs: add backup detection mode specification
  2024-05-27 14:32 [pbs-devel] [PATCH v7 pxar proxmox-backup 00/69] fix #3174: improve file-level backup Christian Ebner
                   ` (43 preceding siblings ...)
  2024-05-27 14:32 ` [pbs-devel] [PATCH v7 proxmox-backup 44/69] client: streams: add channels for dynamic entry injection Christian Ebner
@ 2024-05-27 14:32 ` Christian Ebner
  2024-05-27 14:33 ` [pbs-devel] [PATCH v7 proxmox-backup 46/69] client: implement prepare reference method Christian Ebner
                   ` (24 subsequent siblings)
  69 siblings, 0 replies; 71+ messages in thread
From: Christian Ebner @ 2024-05-27 14:32 UTC (permalink / raw)
  To: pbs-devel

Adds the specification for switching the detection mode used to
identify regular files which changed since a reference backup run.
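
The detection mode mainly decides whether the client writes one unified archive or a split metadata/payload pair, as the later `create_backup` changes in this series do. A minimal std-only sketch, assuming no serde: the `FromStr` impl stands in for `#[serde(rename_all = "lowercase")]`, `archive_names` is a hypothetical helper, and the unsplit target name `<base>.pxar.<extension>` is an assumption.

```rust
use std::str::FromStr;

/// Std-only mirror of `BackupDetectionMode`; the real type derives serde traits.
#[derive(Debug, Default, Clone, Copy, PartialEq)]
enum BackupDetectionMode {
    /// Self-contained pxar archive
    #[default]
    Default,
    /// Split metadata/payload archives, payload always re-encoded
    Data,
    /// Split archives, payload chunks reused if metadata is unchanged
    Metadata,
}

impl BackupDetectionMode {
    fn is_data(&self) -> bool {
        matches!(self, Self::Data)
    }
    fn is_metadata(&self) -> bool {
        matches!(self, Self::Metadata)
    }
}

impl FromStr for BackupDetectionMode {
    type Err = String;

    // Accept only the lowercase variant names, like `rename_all = "lowercase"`.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "default" => Ok(Self::Default),
            "data" => Ok(Self::Data),
            "metadata" => Ok(Self::Metadata),
            other => Err(format!("unknown detection mode '{other}'")),
        }
    }
}

/// Hypothetical helper: archive name(s) for a `.pxar` target; both split
/// modes produce an `.mpxar` metadata and a `.ppxar` payload archive.
fn archive_names(
    mode: BackupDetectionMode,
    target_base: &str,
    extension: &str,
) -> Result<(String, Option<String>), String> {
    let base = target_base
        .strip_suffix(".pxar")
        .ok_or_else(|| format!("unexpected suffix in target: {target_base}"))?;
    if mode.is_metadata() || mode.is_data() {
        Ok((
            format!("{base}.mpxar.{extension}"),
            Some(format!("{base}.ppxar.{extension}")),
        ))
    } else {
        // Assumed unified archive name.
        Ok((format!("{base}.pxar.{extension}"), None))
    }
}
```

With `"data".parse()` and the target `root.pxar`, the sketch yields `root.mpxar.<extension>` plus `root.ppxar.<extension>`; only `metadata` additionally enables reuse of payload chunks from the previous snapshot.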

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
changes since version 6:
- use api macro and serde Serialize/Deserialize instead of implementing
  schema and parsing

 pbs-client/src/backup_specification.rs | 26 ++++++++++++++++++++++++++
 1 file changed, 26 insertions(+)

diff --git a/pbs-client/src/backup_specification.rs b/pbs-client/src/backup_specification.rs
index 619a3a9da..66bf71965 100644
--- a/pbs-client/src/backup_specification.rs
+++ b/pbs-client/src/backup_specification.rs
@@ -1,4 +1,5 @@
 use anyhow::{bail, Error};
+use serde::{Deserialize, Serialize};
 
 use proxmox_schema::*;
 
@@ -45,3 +46,28 @@ pub fn parse_backup_specification(value: &str) -> Result<BackupSpecification, Er
 
     bail!("unable to parse backup source specification '{}'", value);
 }
+
+#[api]
+#[derive(Default, Deserialize, Serialize)]
+#[serde(rename_all = "lowercase")]
+/// Mode to detect file changes since last backup run
+pub enum BackupDetectionMode {
+    /// Encode backup as self contained pxar archive
+    #[default]
+    Default,
+    /// Split backup mode, re-encode payload data
+    Data,
+    /// Compare metadata, reuse payload chunks if metadata unchanged
+    Metadata,
+}
+
+impl BackupDetectionMode {
+    /// Selected mode is data based file change detection with split meta/payload streams
+    pub fn is_data(&self) -> bool {
+        matches!(self, Self::Data)
+    }
+    /// Selected mode is metadata based file change detection
+    pub fn is_metadata(&self) -> bool {
+        matches!(self, Self::Metadata)
+    }
+}
-- 
2.39.2




* [pbs-devel] [PATCH v7 proxmox-backup 46/69] client: implement prepare reference method
  2024-05-27 14:32 [pbs-devel] [PATCH v7 pxar proxmox-backup 00/69] fix #3174: improve file-level backup Christian Ebner
                   ` (44 preceding siblings ...)
  2024-05-27 14:32 ` [pbs-devel] [PATCH v7 proxmox-backup 45/69] specs: add backup detection mode specification Christian Ebner
@ 2024-05-27 14:33 ` Christian Ebner
  2024-05-27 14:33 ` [pbs-devel] [PATCH v7 proxmox-backup 47/69] client: pxar: add method for metadata comparison Christian Ebner
                   ` (23 subsequent siblings)
  69 siblings, 0 replies; 71+ messages in thread
From: Christian Ebner @ 2024-05-27 14:33 UTC (permalink / raw)
  To: pbs-devel

Implement a method that prepares the decoder instance to access a
previous snapshot's metadata index and payload index, in order to
pass it to the pxar archiver. The archiver can then use these to
compare the metadata of files against the previous state and gather
reusable chunks.
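
Conceptually, the reference accessor is walked in lockstep with the live directory tree: `add_directory` looks up the directory name in the reference and only descends if that entry is a directory, otherwise the subtree is archived without a reference. The sketch below models this with in-memory trees; `RefEntry`, `FsEntry` and `walk` are hypothetical, and comparing only mtime and size is a simplification of the full metadata comparison the series performs.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for an entry of the reference metadata archive.
#[derive(Debug)]
enum RefEntry {
    File { mtime: i64, size: u64 },
    Dir(HashMap<String, RefEntry>),
}

/// Hypothetical stand-in for a live filesystem entry.
enum FsEntry {
    File { mtime: i64, size: u64 },
    Dir(Vec<(String, FsEntry)>),
}

/// Walk `dir` and the reference tree in lockstep, collecting the paths of
/// files whose (simplified) metadata is unchanged and could reuse chunks.
fn walk(
    dir: &[(String, FsEntry)],
    reference: Option<&HashMap<String, RefEntry>>,
    path: &str,
    reusable: &mut Vec<String>,
) {
    for (name, entry) in dir {
        let full = format!("{path}/{name}");
        // Look up the same name in the reference directory, if there is one.
        let prev = reference.and_then(|r| r.get(name));
        match entry {
            FsEntry::File { mtime, size } => {
                if let Some(RefEntry::File { mtime: m, size: s }) = prev {
                    if m == mtime && s == size {
                        reusable.push(full);
                    }
                }
            }
            FsEntry::Dir(children) => {
                // Only enter the reference if it is a directory as well,
                // otherwise continue without a reference for this subtree.
                let sub_ref = match prev {
                    Some(RefEntry::Dir(d)) => Some(d),
                    _ => None,
                };
                walk(children, sub_ref, &full, reusable);
            }
        }
    }
}
```

A directory that turned into a regular file in the reference (or vice versa) simply loses its reference, so every file below it is re-encoded, which matches the `is_dir()` check in `add_directory`.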

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
changes since version 6:
- adapt to PxarVariant pxar interface
- adapt to `change_detection_mode` now being an api type
- adapt to pxar -> mpxar mapping helper changes

 pbs-client/src/pxar/create.rs                 |  67 +++++++++-
 pbs-client/src/pxar/mod.rs                    |   4 +-
 proxmox-backup-client/src/main.rs             | 120 +++++++++++++++---
 .../src/proxmox_restore_daemon/api.rs         |   1 +
 pxar-bin/src/main.rs                          |   1 +
 5 files changed, 169 insertions(+), 24 deletions(-)

diff --git a/pbs-client/src/pxar/create.rs b/pbs-client/src/pxar/create.rs
index 7667348d4..678ad768f 100644
--- a/pbs-client/src/pxar/create.rs
+++ b/pbs-client/src/pxar/create.rs
@@ -18,6 +18,8 @@ use nix::sys::stat::{FileStat, Mode};
 
 use pathpatterns::{MatchEntry, MatchFlag, MatchList, MatchType, PatternFlag};
 use proxmox_sys::error::SysError;
+use pxar::accessor::aio::{Accessor, Directory};
+use pxar::accessor::ReadAt;
 use pxar::encoder::{LinkOffset, SeqWrite};
 use pxar::{Metadata, PxarVariant};
 
@@ -35,7 +37,7 @@ use crate::pxar::tools::assert_single_path_component;
 use crate::pxar::Flags;
 
 /// Pxar options for creating a pxar archive/stream
-#[derive(Default, Clone)]
+#[derive(Default)]
 pub struct PxarCreateOptions {
     /// Device/mountpoint st_dev numbers that should be included. None for no limitation.
     pub device_set: Option<HashSet<u64>>,
@@ -47,6 +49,20 @@ pub struct PxarCreateOptions {
     pub skip_lost_and_found: bool,
     /// Skip xattrs of files that return E2BIG error
     pub skip_e2big_xattr: bool,
+    /// Reference state for partial backups
+    pub previous_ref: Option<PxarPrevRef>,
+}
+
+pub type MetadataArchiveReader = Arc<dyn ReadAt + Send + Sync + 'static>;
+
+/// Stateful information of previous backup snapshots for partial backups
+pub struct PxarPrevRef {
+    /// Reference accessor for metadata comparison
+    pub accessor: Accessor<MetadataArchiveReader>,
+    /// Reference index for reusing payload chunks
+    pub payload_index: DynamicIndexReader,
+    /// Reference archive name for partial backups
+    pub archive_name: String,
 }
 
 fn detect_fs_type(fd: RawFd) -> Result<i64, Error> {
@@ -136,6 +152,7 @@ struct Archiver {
     file_copy_buffer: Vec<u8>,
     skip_e2big_xattr: bool,
     forced_boundaries: Option<mpsc::Sender<InjectChunks>>,
+    previous_payload_index: Option<DynamicIndexReader>,
 }
 
 type Encoder<'a, T> = pxar::encoder::aio::Encoder<'a, T>;
@@ -200,6 +217,15 @@ where
             MatchType::Exclude,
         )?);
     }
+    let (previous_payload_index, previous_metadata_accessor) =
+        if let Some(refs) = options.previous_ref {
+            (
+                Some(refs.payload_index),
+                refs.accessor.open_root().await.ok(),
+            )
+        } else {
+            (None, None)
+        };
 
     let mut archiver = Archiver {
         feature_flags,
@@ -217,10 +243,11 @@ where
         file_copy_buffer: vec::undefined(4 * 1024 * 1024),
         skip_e2big_xattr: options.skip_e2big_xattr,
         forced_boundaries,
+        previous_payload_index,
     };
 
     archiver
-        .archive_dir_contents(&mut encoder, source_dir, true)
+        .archive_dir_contents(&mut encoder, previous_metadata_accessor, source_dir, true)
         .await?;
     encoder.finish().await?;
     encoder.close().await?;
@@ -252,6 +279,7 @@ impl Archiver {
     fn archive_dir_contents<'a, T: SeqWrite + Send>(
         &'a mut self,
         encoder: &'a mut Encoder<'_, T>,
+        mut previous_metadata_accessor: Option<Directory<MetadataArchiveReader>>,
         mut dir: Dir,
         is_root: bool,
     ) -> BoxFuture<'a, Result<(), Error>> {
@@ -286,9 +314,15 @@ impl Archiver {
 
                 (self.callback)(&file_entry.path)?;
                 self.path = file_entry.path;
-                self.add_entry(encoder, dir_fd, &file_entry.name, &file_entry.stat)
-                    .await
-                    .map_err(|err| self.wrap_err(err))?;
+                self.add_entry(
+                    encoder,
+                    &mut previous_metadata_accessor,
+                    dir_fd,
+                    &file_entry.name,
+                    &file_entry.stat,
+                )
+                .await
+                .map_err(|err| self.wrap_err(err))?;
             }
             self.path = old_path;
             self.entry_counter = entry_counter;
@@ -536,6 +570,7 @@ impl Archiver {
     async fn add_entry<T: SeqWrite + Send>(
         &mut self,
         encoder: &mut Encoder<'_, T>,
+        previous_metadata: &mut Option<Directory<MetadataArchiveReader>>,
         parent: RawFd,
         c_file_name: &CStr,
         stat: &FileStat,
@@ -625,7 +660,14 @@ impl Archiver {
                     catalog.lock().unwrap().start_directory(c_file_name)?;
                 }
                 let result = self
-                    .add_directory(encoder, dir, c_file_name, &metadata, stat)
+                    .add_directory(
+                        encoder,
+                        previous_metadata,
+                        dir,
+                        c_file_name,
+                        &metadata,
+                        stat,
+                    )
                     .await;
                 if let Some(ref catalog) = self.catalog {
                     catalog.lock().unwrap().end_directory()?;
@@ -678,6 +720,7 @@ impl Archiver {
     async fn add_directory<T: SeqWrite + Send>(
         &mut self,
         encoder: &mut Encoder<'_, T>,
+        previous_metadata_accessor: &mut Option<Directory<MetadataArchiveReader>>,
         dir: Dir,
         dir_name: &CStr,
         metadata: &Metadata,
@@ -708,7 +751,17 @@ impl Archiver {
             log::info!("skipping mount point: {:?}", self.path);
             Ok(())
         } else {
-            self.archive_dir_contents(encoder, dir, false).await
+            let mut dir_accessor = None;
+            if let Some(accessor) = previous_metadata_accessor.as_mut() {
+                if let Some(file_entry) = accessor.lookup(dir_name).await? {
+                    if file_entry.entry().is_dir() {
+                        let dir = file_entry.enter_directory().await?;
+                        dir_accessor = Some(dir);
+                    }
+                }
+            }
+            self.archive_dir_contents(encoder, dir_accessor, dir, false)
+                .await
         };
 
         self.fs_magic = old_fs_magic;
diff --git a/pbs-client/src/pxar/mod.rs b/pbs-client/src/pxar/mod.rs
index b7dcf8362..5248a1956 100644
--- a/pbs-client/src/pxar/mod.rs
+++ b/pbs-client/src/pxar/mod.rs
@@ -56,7 +56,9 @@ pub(crate) mod tools;
 mod flags;
 pub use flags::Flags;
 
-pub use create::{create_archive, PxarCreateOptions, PxarWriters};
+pub use create::{
+    create_archive, MetadataArchiveReader, PxarCreateOptions, PxarPrevRef, PxarWriters,
+};
 pub use extract::{
     create_tar, create_zip, extract_archive, extract_sub_dir, extract_sub_dir_seq, ErrorHandler,
     OverwriteFlags, PxarExtractContext, PxarExtractOptions,
diff --git a/proxmox-backup-client/src/main.rs b/proxmox-backup-client/src/main.rs
index f93f9c851..fcce13430 100644
--- a/proxmox-backup-client/src/main.rs
+++ b/proxmox-backup-client/src/main.rs
@@ -21,6 +21,7 @@ use proxmox_router::{cli::*, ApiMethod, RpcEnvironment};
 use proxmox_schema::api;
 use proxmox_sys::fs::{file_get_json, image_size, replace_file, CreateOptions};
 use proxmox_time::{epoch_i64, strftime_local};
+use pxar::accessor::aio::Accessor;
 use pxar::accessor::{MaybeReady, ReadAt, ReadAtOperation};
 
 use pbs_api_types::{
@@ -30,7 +31,7 @@ use pbs_api_types::{
     BACKUP_TYPE_SCHEMA, TRAFFIC_CONTROL_BURST_SCHEMA, TRAFFIC_CONTROL_RATE_SCHEMA,
 };
 use pbs_client::catalog_shell::Shell;
-use pbs_client::pxar::ErrorHandler as PxarErrorHandler;
+use pbs_client::pxar::{ErrorHandler as PxarErrorHandler, MetadataArchiveReader, PxarPrevRef};
 use pbs_client::tools::{
     complete_archive_name, complete_auth_id, complete_backup_group, complete_backup_snapshot,
     complete_backup_source, complete_chunk_size, complete_group_or_snapshot,
@@ -43,14 +44,14 @@ use pbs_client::tools::{
     CHUNK_SIZE_SCHEMA, REPO_URL_SCHEMA,
 };
 use pbs_client::{
-    delete_ticket_info, parse_backup_specification, view_task_result, BackupReader,
-    BackupRepository, BackupSpecificationType, BackupStats, BackupWriter, ChunkStream,
-    FixedChunkStream, HttpClient, InjectionData, PxarBackupStream, RemoteChunkReader,
+    delete_ticket_info, parse_backup_specification, view_task_result, BackupDetectionMode,
+    BackupReader, BackupRepository, BackupSpecificationType, BackupStats, BackupWriter,
+    ChunkStream, FixedChunkStream, HttpClient, InjectionData, PxarBackupStream, RemoteChunkReader,
     UploadOptions, BACKUP_SOURCE_SCHEMA,
 };
 use pbs_datastore::catalog::{BackupCatalogWriter, CatalogReader, CatalogWriter};
 use pbs_datastore::chunk_store::verify_chunk_size;
-use pbs_datastore::dynamic_index::{BufferedDynamicReader, DynamicIndexReader};
+use pbs_datastore::dynamic_index::{BufferedDynamicReader, DynamicIndexReader, LocalDynamicReadAt};
 use pbs_datastore::fixed_index::FixedIndexReader;
 use pbs_datastore::index::IndexFile;
 use pbs_datastore::manifest::{
@@ -687,6 +688,10 @@ fn spawn_catalog_upload(
                schema: TRAFFIC_CONTROL_BURST_SCHEMA,
                optional: true,
            },
+           "change-detection-mode": {
+               type: BackupDetectionMode,
+               optional: true,
+           },
            "exclude": {
                type: Array,
                description: "List of paths or patterns for matching files to exclude.",
@@ -722,6 +727,7 @@ async fn create_backup(
     param: Value,
     all_file_systems: bool,
     skip_lost_and_found: bool,
+    change_detection_mode: Option<BackupDetectionMode>,
     dry_run: bool,
     skip_e2big_xattr: bool,
     _info: &ApiMethod,
@@ -881,6 +887,8 @@ async fn create_backup(
 
     let backup_time = backup_time_opt.unwrap_or_else(epoch_i64);
 
+    let detection_mode = change_detection_mode.unwrap_or_default();
+
     let http_client = connect_rate_limited(&repo, rate_limit)?;
     record_repository(&repo);
 
@@ -981,7 +989,7 @@ async fn create_backup(
         None
     };
 
-    let mut manifest = BackupManifest::new(snapshot);
+    let mut manifest = BackupManifest::new(snapshot.clone());
 
     let mut catalog = None;
     let mut catalog_result_rx = None;
@@ -1028,22 +1036,21 @@ async fn create_backup(
                 manifest.add_file(target, stats.size, stats.csum, crypto.mode)?;
             }
             (BackupSpecificationType::PXAR, false) => {
-                let metadata_mode = false; // Until enabled via param
-
                 let target_base = if let Some(base) = target_base.strip_suffix(".pxar") {
                     base.to_string()
                 } else {
                     bail!("unexpected suffix in target: {target_base}");
                 };
 
-                let (target, payload_target) = if metadata_mode {
-                    (
-                        format!("{target_base}.mpxar.{extension}"),
-                        Some(format!("{target_base}.ppxar.{extension}")),
-                    )
-                } else {
-                    (target, None)
-                };
+                let (target, payload_target) =
+                    if detection_mode.is_metadata() || detection_mode.is_data() {
+                        (
+                            format!("{target_base}.mpxar.{extension}"),
+                            Some(format!("{target_base}.ppxar.{extension}")),
+                        )
+                    } else {
+                        (target, None)
+                    };
 
                 // start catalog upload on first use
                 if catalog.is_none() {
@@ -1060,12 +1067,41 @@ async fn create_backup(
                     .unwrap()
                     .start_directory(std::ffi::CString::new(target.as_str())?.as_c_str())?;
 
+                let mut previous_ref = None;
+                if detection_mode.is_metadata() {
+                    if let Some(ref manifest) = previous_manifest {
+                        // BackupWriter::start created a new snapshot, get the one before
+                        if let Some(backup_time) = client.previous_backup_time().await? {
+                            let backup_dir: BackupDir =
+                                (snapshot.group.clone(), backup_time).into();
+                            let backup_reader = BackupReader::start(
+                                &http_client,
+                                crypt_config.clone(),
+                                repo.store(),
+                                &backup_ns,
+                                &backup_dir,
+                                true,
+                            )
+                            .await?;
+                            previous_ref = prepare_reference(
+                                &target,
+                                manifest.clone(),
+                                &client,
+                                backup_reader.clone(),
+                                crypt_config.clone(),
+                            )
+                            .await?
+                        }
+                    }
+                }
+
                 let pxar_options = pbs_client::pxar::PxarCreateOptions {
                     device_set: devices.clone(),
                     patterns: pattern_list.clone(),
                     entries_max: entries_max as usize,
                     skip_lost_and_found,
                     skip_e2big_xattr,
+                    previous_ref,
                 };
 
                 let upload_options = UploadOptions {
@@ -1177,6 +1213,58 @@ async fn create_backup(
     Ok(Value::Null)
 }
 
+async fn prepare_reference(
+    target: &str,
+    manifest: Arc<BackupManifest>,
+    backup_writer: &BackupWriter,
+    backup_reader: Arc<BackupReader>,
+    crypt_config: Option<Arc<CryptConfig>>,
+) -> Result<Option<PxarPrevRef>, Error> {
+    let (target, payload_target) = helper::get_pxar_archive_names(target, &manifest);
+    let payload_target = payload_target.unwrap_or_default();
+
+    let metadata_ref_index = if let Ok(index) = backup_reader
+        .download_dynamic_index(&manifest, &target)
+        .await
+    {
+        index
+    } else {
+        log::info!("No previous metadata index, continue without reference");
+        return Ok(None);
+    };
+
+    if manifest.lookup_file_info(&payload_target).is_err() {
+        log::info!("No previous payload index found in manifest, continue without reference");
+        return Ok(None);
+    }
+
+    let known_payload_chunks = Arc::new(Mutex::new(HashSet::new()));
+    let payload_ref_index = backup_writer
+        .download_previous_dynamic_index(&payload_target, &manifest, known_payload_chunks)
+        .await?;
+
+    log::info!("Using previous index as metadata reference for '{target}'");
+
+    let most_used = metadata_ref_index.find_most_used_chunks(8);
+    let file_info = manifest.lookup_file_info(&target)?;
+    let chunk_reader = RemoteChunkReader::new(
+        backup_reader.clone(),
+        crypt_config.clone(),
+        file_info.chunk_crypt_mode(),
+        most_used,
+    );
+    let reader = BufferedDynamicReader::new(metadata_ref_index, chunk_reader);
+    let archive_size = reader.archive_size();
+    let reader: MetadataArchiveReader = Arc::new(LocalDynamicReadAt::new(reader));
+    let accessor = Accessor::new(reader, archive_size, None).await?;
+
+    Ok(Some(pbs_client::pxar::PxarPrevRef {
+        accessor,
+        payload_index: payload_ref_index,
+        archive_name: target,
+    }))
+}
+
 async fn dump_image<W: Write>(
     client: Arc<BackupReader>,
     crypt_config: Option<Arc<CryptConfig>>,
diff --git a/proxmox-restore-daemon/src/proxmox_restore_daemon/api.rs b/proxmox-restore-daemon/src/proxmox_restore_daemon/api.rs
index f7fbae093..681fa6db9 100644
--- a/proxmox-restore-daemon/src/proxmox_restore_daemon/api.rs
+++ b/proxmox-restore-daemon/src/proxmox_restore_daemon/api.rs
@@ -360,6 +360,7 @@ fn extract(
                         patterns,
                         skip_lost_and_found: false,
                         skip_e2big_xattr: false,
+                        previous_ref: None,
                     };
 
                     let pxar_writer = pxar::PxarVariant::Unified(TokioWriter::new(writer));
diff --git a/pxar-bin/src/main.rs b/pxar-bin/src/main.rs
index 91857f399..a3b6ec4c0 100644
--- a/pxar-bin/src/main.rs
+++ b/pxar-bin/src/main.rs
@@ -363,6 +363,7 @@ async fn create_archive(
         patterns,
         skip_lost_and_found: false,
         skip_e2big_xattr: false,
+        previous_ref: None,
     };
 
     let source = PathBuf::from(source);
-- 
2.39.2



_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel



* [pbs-devel] [PATCH v7 proxmox-backup 47/69] client: pxar: add method for metadata comparison
  2024-05-27 14:32 [pbs-devel] [PATCH v7 pxar proxmox-backup 00/69] fix #3174: improve file-level backup Christian Ebner
                   ` (45 preceding siblings ...)
  2024-05-27 14:33 ` [pbs-devel] [PATCH v7 proxmox-backup 46/69] client: implement prepare reference method Christian Ebner
@ 2024-05-27 14:33 ` Christian Ebner
  2024-05-27 14:33 ` [pbs-devel] [PATCH v7 proxmox-backup 48/69] pxar: caching: add look-ahead cache Christian Ebner
                   ` (22 subsequent siblings)
  69 siblings, 0 replies; 71+ messages in thread
From: Christian Ebner @ 2024-05-27 14:33 UTC (permalink / raw)
  To: pbs-devel

Add a method to compare the metadata of the current file entry against
the metadata of the entry looked up in the previous backup snapshot. If
the metadata matches, the byte range in the payload stream covering the
file's payload header and payload is returned.
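
A minimal sketch of the comparison and range computation, assuming a
simplified stand-in for the previous archive's entry (`PrevEntry` is
hypothetical, and a `(mode, mtime)` tuple replaces the full
`pxar::Metadata`). The local `Header` struct assumes the pxar entry
header consists of two `u64` fields:

```rust
use std::mem::size_of;
use std::ops::Range;

/// Assumed layout of the pxar entry header: type tag plus full size.
#[repr(C)]
struct Header {
    _htype: u64,
    _full_size: u64,
}

/// Hypothetical, simplified view of a file entry from the previous archive.
struct PrevEntry {
    metadata: (u32, i64), // (mode, mtime); the real code compares pxar::Metadata
    payload_offset: Option<u64>,
    size: u64,
}

/// Returns the payload range (header included) if the file can be reused.
fn reusable_range(prev: Option<&PrevEntry>, current: &(u32, i64)) -> Option<Range<u64>> {
    let entry = prev?; // not found in the previous archive: re-encode
    if entry.metadata != *current {
        return None; // metadata changed: re-encode
    }
    let offset = entry.payload_offset?; // no payload reference: re-encode
    Some(offset..offset + entry.size + size_of::<Header>() as u64)
}
```

For a file of 50 bytes whose header starts at offset 1000, the
returned range is `1000..1066`, i.e. 16 header bytes plus the payload.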

This is in preparation for reusing payload chunks for unchanged files.

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
changes since version 6:
- move the check for a previous metadata accessor into
  `is_reusable_entry`

 pbs-client/src/pxar/create.rs     | 37 ++++++++++++++++++++++++++++++-
 proxmox-backup-client/src/main.rs |  3 ++-
 2 files changed, 38 insertions(+), 2 deletions(-)

diff --git a/pbs-client/src/pxar/create.rs b/pbs-client/src/pxar/create.rs
index 678ad768f..ac8827bb2 100644
--- a/pbs-client/src/pxar/create.rs
+++ b/pbs-client/src/pxar/create.rs
@@ -2,6 +2,7 @@ use std::collections::{HashMap, HashSet};
 use std::ffi::{CStr, CString, OsStr};
 use std::fmt;
 use std::io::{self, Read};
+use std::mem::size_of;
 use std::ops::Range;
 use std::os::unix::ffi::OsStrExt;
 use std::os::unix::io::{AsRawFd, FromRawFd, IntoRawFd, OwnedFd, RawFd};
@@ -21,7 +22,7 @@ use proxmox_sys::error::SysError;
 use pxar::accessor::aio::{Accessor, Directory};
 use pxar::accessor::ReadAt;
 use pxar::encoder::{LinkOffset, SeqWrite};
-use pxar::{Metadata, PxarVariant};
+use pxar::{EntryKind, Metadata, PxarVariant};
 
 use proxmox_io::vec;
 use proxmox_lang::c_str;
@@ -333,6 +334,40 @@ impl Archiver {
         .boxed()
     }
 
+    async fn is_reusable_entry(
+        &mut self,
+        previous_metadata_accessor: &Option<Directory<MetadataArchiveReader>>,
+        file_name: &Path,
+        metadata: &Metadata,
+    ) -> Result<Option<Range<u64>>, Error> {
+        if let Some(previous_metadata_accessor) = previous_metadata_accessor {
+            if let Some(file_entry) = previous_metadata_accessor.lookup(file_name).await? {
+                if metadata == file_entry.metadata() {
+                    if let EntryKind::File {
+                        payload_offset: Some(offset),
+                        size,
+                        ..
+                    } = file_entry.entry().kind()
+                    {
+                        let range =
+                            *offset..*offset + size + size_of::<pxar::format::Header>() as u64;
+                        log::debug!(
+                            "reusable: {file_name:?} at range {range:?} has unchanged metadata."
+                        );
+                        return Ok(Some(range));
+                    }
+                    log::debug!("reencode: {file_name:?} not a regular file.");
+                    return Ok(None);
+                }
+                log::debug!("reencode: {file_name:?} metadata did not match.");
+                return Ok(None);
+            }
+            log::debug!("reencode: {file_name:?} not found in previous archive.");
+        }
+
+        Ok(None)
+    }
+
     /// openat() wrapper which allows but logs `EACCES` and turns `ENOENT` into `None`.
     ///
     /// The `existed` flag is set when iterating through a directory to note that we know the file
diff --git a/proxmox-backup-client/src/main.rs b/proxmox-backup-client/src/main.rs
index fcce13430..32e5f9b81 100644
--- a/proxmox-backup-client/src/main.rs
+++ b/proxmox-backup-client/src/main.rs
@@ -1256,7 +1256,8 @@ async fn prepare_reference(
     let reader = BufferedDynamicReader::new(metadata_ref_index, chunk_reader);
     let archive_size = reader.archive_size();
     let reader: MetadataArchiveReader = Arc::new(LocalDynamicReadAt::new(reader));
-    let accessor = Accessor::new(reader, archive_size, None).await?;
+    // only care about the metadata, therefore do not attach payload reader
+    let accessor = Accessor::new(pxar::PxarVariant::Unified(reader), archive_size).await?;
 
     Ok(Some(pbs_client::pxar::PxarPrevRef {
         accessor,
-- 
2.39.2






* [pbs-devel] [PATCH v7 proxmox-backup 48/69] pxar: caching: add look-ahead cache
  2024-05-27 14:32 [pbs-devel] [PATCH v7 pxar proxmox-backup 00/69] fix #3174: improve file-level backup Christian Ebner
                   ` (46 preceding siblings ...)
  2024-05-27 14:33 ` [pbs-devel] [PATCH v7 proxmox-backup 47/69] client: pxar: add method for metadata comparison Christian Ebner
@ 2024-05-27 14:33 ` Christian Ebner
  2024-05-27 14:33 ` [pbs-devel] [PATCH v7 proxmox-backup 49/69] client: pxar: refactor catalog encoding for directories Christian Ebner
                   ` (21 subsequent siblings)
  69 siblings, 0 replies; 71+ messages in thread
From: Christian Ebner @ 2024-05-27 14:33 UTC (permalink / raw)
  To: pbs-devel

Add a lookahead cache and the necessary types to store the required
data and keep track of directory boundaries while traversing the
filesystem tree, in order to postpone the decision whether to reuse or
re-encode a given regular file with unchanged metadata.
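
The following sketch isolates how such a cache can track the
contiguous payload range of its entries, mirroring `try_extend_range`
and `clear_range` from this patch. `RangeTracker` is a hypothetical
name; file descriptors, metadata and directory markers are left out:

```rust
use std::ops::Range;

/// Hypothetical reduction of the cache to its payload range bookkeeping.
struct RangeTracker {
    range: Range<u64>,
}

impl RangeTracker {
    fn new() -> Self {
        Self { range: 0..0 }
    }

    /// Extends the tracked range if `next` starts exactly where it ends.
    fn try_extend(&mut self, next: Range<u64>) -> bool {
        if self.range.end == 0 {
            // nothing tracked yet: start an empty range at the new start
            self.range = next.start..next.start;
        }
        if self.range.end == next.start {
            self.range.end = next.end;
            return true;
        }
        false // hole in the range: the caller has to flush first
    }

    /// Resets after a flush but keeps the end, so a directly following
    /// entry can still continue the range.
    fn clear(&mut self) {
        self.range = self.range.end..self.range.end;
    }
}
```

Keeping the end on `clear` is what allows caching to continue
seamlessly when the cache was flushed only because it was full.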

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
changes since version 6:
- add PxarLookaheadCache and refactor some of the logic to be contained
  within this patch

 pbs-client/src/pxar/create.rs           |   2 +-
 pbs-client/src/pxar/look_ahead_cache.rs | 165 ++++++++++++++++++++++++
 pbs-client/src/pxar/mod.rs              |   1 +
 3 files changed, 167 insertions(+), 1 deletion(-)
 create mode 100644 pbs-client/src/pxar/look_ahead_cache.rs

diff --git a/pbs-client/src/pxar/create.rs b/pbs-client/src/pxar/create.rs
index ac8827bb2..6127aa88f 100644
--- a/pbs-client/src/pxar/create.rs
+++ b/pbs-client/src/pxar/create.rs
@@ -131,7 +131,7 @@ impl fmt::Display for ArchiveError {
 }
 
 #[derive(Eq, PartialEq, Hash)]
-struct HardLinkInfo {
+pub(crate) struct HardLinkInfo {
     st_dev: u64,
     st_ino: u64,
 }
diff --git a/pbs-client/src/pxar/look_ahead_cache.rs b/pbs-client/src/pxar/look_ahead_cache.rs
new file mode 100644
index 000000000..539586271
--- /dev/null
+++ b/pbs-client/src/pxar/look_ahead_cache.rs
@@ -0,0 +1,165 @@
+use std::collections::HashSet;
+use std::ffi::CString;
+use std::ops::Range;
+use std::os::unix::io::OwnedFd;
+use std::path::PathBuf;
+
+use nix::sys::stat::FileStat;
+
+use pxar::encoder::PayloadOffset;
+use pxar::Metadata;
+
+use super::create::*;
+
+const DEFAULT_CACHE_SIZE: usize = 512;
+
+pub(crate) struct CacheEntryData {
+    pub(crate) fd: OwnedFd,
+    pub(crate) c_file_name: CString,
+    pub(crate) stat: FileStat,
+    pub(crate) metadata: Metadata,
+    pub(crate) payload_offset: PayloadOffset,
+}
+
+pub(crate) enum CacheEntry {
+    RegEntry(CacheEntryData),
+    DirEntry(CacheEntryData),
+    DirEnd,
+}
+
+pub(crate) struct PxarLookaheadCache {
+    // Current state of the cache
+    enabled: bool,
+    // Cached entries
+    entries: Vec<CacheEntry>,
+    // Entries encountered having more than one link given by stat
+    hardlinks: HashSet<HardLinkInfo>,
+    // Payload range covered by the currently cached entries
+    range: Range<u64>,
+    // Possible held back last chunk from last flush, used for possible chunk continuation
+    last_chunk: Option<ReusableDynamicEntry>,
+    // Path when started caching
+    start_path: PathBuf,
+    // Number of entries with file descriptors
+    fd_entries: usize,
+    // Max number of entries with file descriptors
+    cache_size: usize,
+}
+
+impl PxarLookaheadCache {
+    pub(crate) fn new(size: Option<usize>) -> Self {
+        Self {
+            enabled: false,
+            entries: Vec::new(),
+            hardlinks: HashSet::new(),
+            range: 0..0,
+            last_chunk: None,
+            start_path: PathBuf::new(),
+            fd_entries: 0,
+            cache_size: size.unwrap_or(DEFAULT_CACHE_SIZE),
+        }
+    }
+
+    pub(crate) fn is_full(&self) -> bool {
+        self.fd_entries >= self.cache_size
+    }
+
+    pub(crate) fn caching_enabled(&self) -> bool {
+        self.enabled
+    }
+
+    pub(crate) fn insert(
+        &mut self,
+        fd: OwnedFd,
+        c_file_name: CString,
+        stat: FileStat,
+        metadata: Metadata,
+        payload_offset: PayloadOffset,
+    ) {
+        self.enabled = true;
+        self.fd_entries += 1;
+        if metadata.is_dir() {
+            self.entries.push(CacheEntry::DirEntry(CacheEntryData {
+                fd,
+                c_file_name,
+                stat,
+                metadata,
+                payload_offset,
+            }))
+        } else {
+            self.entries.push(CacheEntry::RegEntry(CacheEntryData {
+                fd,
+                c_file_name,
+                stat,
+                metadata,
+                payload_offset,
+            }))
+        }
+    }
+
+    pub(crate) fn insert_dir_end(&mut self) {
+        self.entries.push(CacheEntry::DirEnd);
+    }
+
+    pub(crate) fn take_and_reset(&mut self) -> Vec<CacheEntry> {
+        self.fd_entries = 0;
+        self.enabled = false;
+        self.start_path.clear();
+        self.clear_range();
+        std::mem::take(&mut self.entries)
+    }
+
+    pub(crate) fn update_start_path(&mut self, path: PathBuf) {
+        self.start_path = path;
+    }
+
+    pub(crate) fn start_path(&self) -> &PathBuf {
+        &self.start_path
+    }
+
+    pub(crate) fn contains_hardlink(&self, info: &HardLinkInfo) -> bool {
+        self.hardlinks.contains(info)
+    }
+
+    pub(crate) fn insert_hardlink(&mut self, info: HardLinkInfo) -> bool {
+        self.hardlinks.insert(info)
+    }
+
+    pub(crate) fn range(&self) -> &Range<u64> {
+        &self.range
+    }
+
+    pub(crate) fn update_range(&mut self, range: Range<u64>) {
+        self.range = range;
+    }
+
+    pub(crate) fn clear_range(&mut self) {
+        // keep end for possible continuation if cache has been cleared because
+        // it was full, but further caching would be fine
+        self.range = self.range.end..self.range.end
+    }
+
+    pub(crate) fn try_extend_range(&mut self, range: Range<u64>) -> bool {
+        if self.range.end == 0 {
+            // initialize first range to start and end with start of new range
+            self.range.start = range.start;
+            self.range.end = range.start;
+        }
+
+        // range continued, update end
+        if self.range.end == range.start {
+            self.range.end = range.end;
+            return true;
+        }
+
+        false
+    }
+
+    pub(crate) fn take_last_chunk(&mut self) -> Option<ReusableDynamicEntry> {
+        self.last_chunk.take()
+    }
+
+    pub(crate) fn update_last_chunk(&mut self, chunk: Option<ReusableDynamicEntry>) {
+        self.last_chunk = chunk;
+    }
+}
diff --git a/pbs-client/src/pxar/mod.rs b/pbs-client/src/pxar/mod.rs
index 5248a1956..334759df6 100644
--- a/pbs-client/src/pxar/mod.rs
+++ b/pbs-client/src/pxar/mod.rs
@@ -50,6 +50,7 @@
 pub(crate) mod create;
 pub(crate) mod dir_stack;
 pub(crate) mod extract;
+pub(crate) mod look_ahead_cache;
 pub(crate) mod metadata;
 pub(crate) mod tools;
 
-- 
2.39.2






* [pbs-devel] [PATCH v7 proxmox-backup 49/69] client: pxar: refactor catalog encoding for directories
  2024-05-27 14:32 [pbs-devel] [PATCH v7 pxar proxmox-backup 00/69] fix #3174: improve file-level backup Christian Ebner
                   ` (47 preceding siblings ...)
  2024-05-27 14:33 ` [pbs-devel] [PATCH v7 proxmox-backup 48/69] pxar: caching: add look-ahead cache Christian Ebner
@ 2024-05-27 14:33 ` Christian Ebner
  2024-05-27 14:33 ` [pbs-devel] [PATCH v7 proxmox-backup 50/69] fix #3174: client: pxar: enable caching and meta comparison Christian Ebner
                   ` (20 subsequent siblings)
  69 siblings, 0 replies; 71+ messages in thread
From: Christian Ebner @ 2024-05-27 14:33 UTC (permalink / raw)
  To: pbs-devel

Move the catalog directory start and end encoding from `add_entry`
to `add_directory`, which is called by the former.

This allows the `add_entry` method to be reused to walk the filesystem
tree with the lookahead cache enabled, without encoding anything.
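
A toy model of the resulting call structure, assuming a hypothetical
`Catalog` that merely records the calls it receives and a hypothetical
`Node` tree in place of the filesystem. `add_directory` brackets its
children with `start_directory`/`end_directory` itself, so `add_entry`
no longer touches the catalog for directories:

```rust
/// Hypothetical catalog that only records the calls it receives.
#[derive(Default)]
struct Catalog {
    events: Vec<String>,
}

impl Catalog {
    fn start_directory(&mut self, name: &str) {
        self.events.push(format!("start {name}"));
    }
    fn end_directory(&mut self) {
        self.events.push("end".to_string());
    }
    fn add_file(&mut self, name: &str) {
        self.events.push(format!("file {name}"));
    }
}

/// Hypothetical stand-in for filesystem entries.
enum Node {
    File(&'static str),
    Dir(&'static str, Vec<Node>),
}

/// Dispatches on the entry type; directories handle the catalog themselves.
fn add_entry(catalog: &mut Option<Catalog>, node: &Node) {
    match node {
        Node::File(name) => {
            if let Some(c) = catalog {
                c.add_file(name);
            }
        }
        Node::Dir(name, children) => add_directory(catalog, name, children),
    }
}

/// Owns the start/end bracketing of the directory in the catalog.
fn add_directory(catalog: &mut Option<Catalog>, name: &str, children: &[Node]) {
    if let Some(c) = catalog {
        c.start_directory(name);
    }
    for child in children {
        add_entry(catalog, child);
    }
    if let Some(c) = catalog {
        c.end_directory();
    }
}
```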

No functional change intended.

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
changes since version 6:
- not present in previous version

 pbs-client/src/pxar/create.rs | 38 +++++++++++++++++------------------
 1 file changed, 18 insertions(+), 20 deletions(-)

diff --git a/pbs-client/src/pxar/create.rs b/pbs-client/src/pxar/create.rs
index 6127aa88f..04c89b453 100644
--- a/pbs-client/src/pxar/create.rs
+++ b/pbs-client/src/pxar/create.rs
@@ -690,24 +690,15 @@ impl Archiver {
             }
             mode::IFDIR => {
                 let dir = Dir::from_fd(fd.into_raw_fd())?;
-
-                if let Some(ref catalog) = self.catalog {
-                    catalog.lock().unwrap().start_directory(c_file_name)?;
-                }
-                let result = self
-                    .add_directory(
-                        encoder,
-                        previous_metadata,
-                        dir,
-                        c_file_name,
-                        &metadata,
-                        stat,
-                    )
-                    .await;
-                if let Some(ref catalog) = self.catalog {
-                    catalog.lock().unwrap().end_directory()?;
-                }
-                result
+                self.add_directory(
+                    encoder,
+                    previous_metadata,
+                    dir,
+                    c_file_name,
+                    &metadata,
+                    stat,
+                )
+                .await
             }
             mode::IFSOCK => {
                 if let Some(ref catalog) = self.catalog {
@@ -757,12 +748,15 @@ impl Archiver {
         encoder: &mut Encoder<'_, T>,
         previous_metadata_accessor: &mut Option<Directory<MetadataArchiveReader>>,
         dir: Dir,
-        dir_name: &CStr,
+        c_dir_name: &CStr,
         metadata: &Metadata,
         stat: &FileStat,
     ) -> Result<(), Error> {
-        let dir_name = OsStr::from_bytes(dir_name.to_bytes());
+        let dir_name = OsStr::from_bytes(c_dir_name.to_bytes());
 
+        if let Some(ref catalog) = self.catalog {
+            catalog.lock().unwrap().start_directory(c_dir_name)?;
+        }
         encoder.create_directory(dir_name, metadata).await?;
 
         let old_fs_magic = self.fs_magic;
@@ -804,6 +798,10 @@ impl Archiver {
         self.current_st_dev = old_st_dev;
 
         encoder.finish().await?;
+        if let Some(ref catalog) = self.catalog {
+            catalog.lock().unwrap().end_directory()?;
+        }
+
         result
     }
 
-- 
2.39.2






* [pbs-devel] [PATCH v7 proxmox-backup 50/69] fix #3174: client: pxar: enable caching and meta comparison
  2024-05-27 14:32 [pbs-devel] [PATCH v7 pxar proxmox-backup 00/69] fix #3174: improve file-level backup Christian Ebner
                   ` (48 preceding siblings ...)
  2024-05-27 14:33 ` [pbs-devel] [PATCH v7 proxmox-backup 49/69] client: pxar: refactor catalog encoding for directories Christian Ebner
@ 2024-05-27 14:33 ` Christian Ebner
  2024-05-27 14:33 ` [pbs-devel] [PATCH v7 proxmox-backup 51/69] client: backup writer: add injected chunk count to stats Christian Ebner
                   ` (19 subsequent siblings)
  69 siblings, 0 replies; 71+ messages in thread
From: Christian Ebner @ 2024-05-27 14:33 UTC (permalink / raw)
  To: pbs-devel

When walking the filesystem tree, check for each entry whether it is
reusable, meaning that its metadata did not change and its payload
chunks can be re-indexed instead of re-encoding the whole data.

If the metadata matches, the range of the dynamic index entries for
that file is looked up in the previous payload data index.
Use this range and the padding possibly introduced by partial reuse of
chunks to decide whether to reuse the dynamic entries and encode the
file payloads as payload references right away, or to cache the entry
for now and keep looking ahead.
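
A sketch of such a padding check under stated assumptions: chunk
boundaries are given as sorted cumulative end offsets of the previous
payload index, a linear search stands in for whatever lookup the index
provides, and the padding is compared relative to the total size of
the covering chunks against 0.1, like `CHUNK_PADDING_THRESHOLD` in this
patch. The function names and the exact ratio definition are
illustrative, not taken from the patch:

```rust
use std::ops::Range;

const CHUNK_PADDING_THRESHOLD: f64 = 0.1;

/// Finds the chunks covering `range`, returning their index range plus the
/// padding bytes before and after `range` within those chunks.
fn covering_chunks(chunk_ends: &[u64], range: &Range<u64>) -> Option<(Range<usize>, u64, u64)> {
    // first chunk ending after the range start
    let first = chunk_ends.iter().position(|&end| end > range.start)?;
    // first chunk reaching the range end
    let last = chunk_ends.iter().position(|&end| end >= range.end)?;
    let first_start = if first == 0 { 0 } else { chunk_ends[first - 1] };
    let start_padding = range.start - first_start;
    let end_padding = chunk_ends[last] - range.end;
    Some((first..last + 1, start_padding, end_padding))
}

/// Reuse the chunks only if the padding is a small fraction of their size.
fn should_reuse(chunk_ends: &[u64], range: &Range<u64>) -> bool {
    let Some((_indices, start_padding, end_padding)) = covering_chunks(chunk_ends, range) else {
        return false; // range not covered by the previous index
    };
    let padding = start_padding + end_padding;
    let total = (range.end - range.start) + padding; // bytes of all covering chunks
    (padding as f64) / (total as f64) < CHUNK_PADDING_THRESHOLD
}
```

With chunk ends `[100, 200, 300, 400]`, the range `105..395` needs
only 10 padding bytes out of 300 and is reused, while `150..160` would
drag along 90 foreign bytes out of 100 and is not.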

If, however, a non-reusable (i.e. changed) entry is encountered
before the padding threshold is reached, the cached entries are
flushed to the archive by re-encoding them, and the cache state is reset.
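
This decision can be modelled as a small state machine. In the
hypothetical sketch below each entry is reduced to a name and a
reusable flag; reusable entries are collected, and a changed entry
first flushes the collected ones for re-encoding. The threshold checks
on full or discontinuous caches are deliberately left out:

```rust
#[derive(Debug, PartialEq)]
enum Action {
    /// Entry kept in the lookahead cache, decision postponed.
    Cache(&'static str),
    /// Cached entries written out by re-encoding their payloads.
    FlushReencode(Vec<&'static str>),
    /// Entry encoded directly.
    Encode(&'static str),
}

/// Hypothetical walk over `(name, reusable)` pairs in traversal order.
fn walk(entries: &[(&'static str, bool)]) -> Vec<Action> {
    let mut cache: Vec<&'static str> = Vec::new();
    let mut actions = Vec::new();
    for &(name, reusable) in entries {
        if reusable {
            cache.push(name);
            actions.push(Action::Cache(name));
        } else {
            if !cache.is_empty() {
                // a changed entry ends the run: re-encode what was cached
                actions.push(Action::FlushReencode(std::mem::take(&mut cache)));
            }
            actions.push(Action::Encode(name));
        }
    }
    actions
}
```

Entries still cached when the walk ends are left to the final flush
that runs after the directory contents have been archived.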

The digests and sizes of reusable chunks, as well as reference offsets
to the start of regular file payloads within the payload stream, are
injected into the backup stream by sending them to the chunker via a
dedicated channel, which forces a chunk boundary and inserts the chunks.
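
A simplified model of this mechanism, assuming a hypothetical `Inject`
message (not the crate's `InjectChunks` type) whose `boundary` is
counted in newly encoded payload bytes, and a fixed-size chunker in
place of the real content-defined one. `max_chunk` must be non-zero:

```rust
use std::sync::mpsc;

/// Hypothetical injection request: force a cut at `boundary` and insert
/// already stored chunks of the given sizes there.
struct Inject {
    boundary: u64,
    chunk_sizes: Vec<u64>,
}

#[derive(Debug, PartialEq)]
enum Chunk {
    New(u64),    // freshly chunked data of this size
    Reused(u64), // injected chunk from the previous index
}

/// Fixed-size chunker honoring forced boundaries received over `rx`.
fn chunk_with_injections(
    stream_len: u64,
    max_chunk: u64,
    rx: &mpsc::Receiver<Inject>,
) -> Vec<Chunk> {
    let mut pending: Vec<Inject> = rx.try_iter().collect();
    pending.sort_by_key(|inj| inj.boundary);
    let mut pending = pending.into_iter().peekable();
    let mut chunks = Vec::new();
    let mut pos = 0;
    loop {
        // inject all reused chunks whose boundary is the current position
        while pending.peek().map_or(false, |inj| inj.boundary == pos) {
            let inj = pending.next().unwrap();
            chunks.extend(inj.chunk_sizes.into_iter().map(Chunk::Reused));
        }
        if pos >= stream_len {
            break;
        }
        // cut at the size limit, but never past the next forced boundary
        let mut cut = (pos + max_chunk).min(stream_len);
        if let Some(inj) = pending.peek() {
            cut = cut.min(inj.boundary);
        }
        chunks.push(Chunk::New(cut - pos));
        pos = cut;
    }
    chunks
}
```

Injecting two stored chunks at offset 6 of a 10 byte stream chunked in
4 byte pieces yields `New(4), New(2), Reused(100), Reused(200), New(4)`:
the regular cut at 8 is moved forward to the forced boundary at 6.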

If the padding stays below the threshold value for reuse, the chunks
are injected into the payload stream and the references with the
corresponding offsets are encoded in the metadata stream.
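
As noted in a code comment of this patch, the cached offsets are
relative to the start of the cached range and do not yet include the
padding of the reused chunks. Assuming the position in the new payload
stream where the first reused chunk is injected is known, a
hypothetical helper deriving the new offsets could look like this:

```rust
use std::ops::Range;

/// Hypothetical helper mapping the previous payload ranges of cached files
/// onto offsets in the new payload stream.
///
/// * `injection_start` - new stream position of the first injected chunk
/// * `start_padding` - bytes of that chunk preceding the cached range
/// * `cached_range` - previous payload range covered by the cache
fn new_payload_offsets(
    injection_start: u64,
    start_padding: u64,
    cached_range: &Range<u64>,
    file_ranges: &[Range<u64>],
) -> Vec<u64> {
    file_ranges
        .iter()
        .map(|r| {
            // offset relative to the range start, as stored in the cache
            let relative = r.start - cached_range.start;
            injection_start + start_padding + relative
        })
        .collect()
}
```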

Since multiple files might be contained within a single chunk,
deduplication of chunks is ensured by holding back the last chunk, so
that following files may reuse that same chunk without indexing it
twice. This chunk is guaranteed to be injected into the stream even if
the following lookups lead to a cache clear and re-encoding.
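
The hold-back logic can be sketched as follows, assuming chunks are
identified by their digest only (shortened to 4 bytes here) and using
the hypothetical helper name `inject_run`:

```rust
/// Shortened stand-in for a 32-byte SHA-256 chunk digest.
type Digest = [u8; 4];

/// Hypothetical helper: given the chunk held back from the previous flush
/// and the chunks of the next reusable run, returns the chunks to inject
/// now and the new chunk to hold back.
fn inject_run(held_back: Option<Digest>, run: &[Digest]) -> (Vec<Digest>, Option<Digest>) {
    let Some((last, init)) = run.split_last() else {
        return (Vec::new(), held_back); // nothing new, keep holding
    };
    let mut to_inject = Vec::new();
    if let Some(prev) = held_back {
        // If the run starts in the held-back chunk, it is emitted once as
        // part of the run; otherwise it still has to be injected on its own.
        if run[0] != prev {
            to_inject.push(prev);
        }
    }
    to_inject.extend_from_slice(init);
    (to_inject, Some(*last))
}
```

When the cache is cleared for re-encoding instead, the caller has to
inject the held-back chunk before writing any new payload data.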

Directory boundaries are cached as well, and written as part of the
encoding when flushing.
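
To illustrate why directory boundaries need to be cached, the
following hypothetical sketch replays reduced cache entries (names
only, no file descriptors or metadata) starting at the path where
caching began; `DirEnd` returns to the parent directory, which also
allows leaving the start directory:

```rust
use std::path::PathBuf;

/// Hypothetical, reduced cache entries.
enum CacheEntry {
    Reg(&'static str),
    Dir(&'static str),
    DirEnd,
}

/// Replays cached entries from `start_path`, returning the full path of
/// every entry in encoding order.
fn replay(start_path: &str, entries: &[CacheEntry]) -> Vec<String> {
    let mut current = PathBuf::from(start_path);
    let mut encoded = Vec::new();
    for entry in entries {
        match entry {
            CacheEntry::Reg(name) => encoded.push(current.join(name).display().to_string()),
            CacheEntry::Dir(name) => {
                current.push(name); // following entries live inside this directory
                encoded.push(current.display().to_string());
            }
            CacheEntry::DirEnd => {
                current.pop(); // directory finished, continue in its parent
            }
        }
    }
    encoded
}
```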

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
changes since version 6:
- use PxarLookaheadCache and its provided methods
- refactor to remove some unnecessary methods, improving readability

 pbs-client/src/pxar/create.rs | 387 +++++++++++++++++++++++++++++++---
 1 file changed, 360 insertions(+), 27 deletions(-)

diff --git a/pbs-client/src/pxar/create.rs b/pbs-client/src/pxar/create.rs
index 04c89b453..f044dd1e6 100644
--- a/pbs-client/src/pxar/create.rs
+++ b/pbs-client/src/pxar/create.rs
@@ -21,9 +21,10 @@ use pathpatterns::{MatchEntry, MatchFlag, MatchList, MatchType, PatternFlag};
 use proxmox_sys::error::SysError;
 use pxar::accessor::aio::{Accessor, Directory};
 use pxar::accessor::ReadAt;
-use pxar::encoder::{LinkOffset, SeqWrite};
+use pxar::encoder::{LinkOffset, PayloadOffset, SeqWrite};
 use pxar::{EntryKind, Metadata, PxarVariant};
 
+use proxmox_human_byte::HumanByte;
 use proxmox_io::vec;
 use proxmox_lang::c_str;
 use proxmox_sys::fs::{self, acl, xattr};
@@ -33,10 +34,13 @@ use pbs_datastore::dynamic_index::DynamicIndexReader;
 use pbs_datastore::index::IndexFile;
 
 use crate::inject_reused_chunks::InjectChunks;
+use crate::pxar::look_ahead_cache::{CacheEntry, CacheEntryData, PxarLookaheadCache};
 use crate::pxar::metadata::errno_is_unsupported;
 use crate::pxar::tools::assert_single_path_component;
 use crate::pxar::Flags;
 
+const CHUNK_PADDING_THRESHOLD: f64 = 0.1;
+
 /// Pxar options for creating a pxar archive/stream
 #[derive(Default)]
 pub struct PxarCreateOptions {
@@ -154,6 +158,7 @@ struct Archiver {
     skip_e2big_xattr: bool,
     forced_boundaries: Option<mpsc::Sender<InjectChunks>>,
     previous_payload_index: Option<DynamicIndexReader>,
+    cache: PxarLookaheadCache,
 }
 
 type Encoder<'a, T> = pxar::encoder::aio::Encoder<'a, T>;
@@ -207,6 +212,7 @@ where
         set.insert(stat.st_dev);
     }
 
+    let metadata_mode = options.previous_ref.is_some() && writers.archive.payload().is_some();
     let mut encoder = Encoder::new(writers.archive, &metadata).await?;
 
     let mut patterns = options.patterns;
@@ -245,11 +251,19 @@ where
         skip_e2big_xattr: options.skip_e2big_xattr,
         forced_boundaries,
         previous_payload_index,
+        cache: PxarLookaheadCache::new(None),
     };
 
     archiver
         .archive_dir_contents(&mut encoder, previous_metadata_accessor, source_dir, true)
         .await?;
+
+    if metadata_mode {
+        archiver
+            .flush_cached_reusing_if_below_threshold(&mut encoder, false)
+            .await?;
+    }
+
     encoder.finish().await?;
     encoder.close().await?;
 
@@ -307,7 +321,10 @@ impl Archiver {
             for file_entry in file_list {
                 let file_name = file_entry.name.to_bytes();
 
-                if is_root && file_name == b".pxarexclude-cli" {
+                if is_root
+                    && file_name == b".pxarexclude-cli"
+                    && previous_metadata_accessor.is_none()
+                {
                     self.encode_pxarexclude_cli(encoder, &file_entry.name, old_patterns_count)
                         .await?;
                     continue;
@@ -610,8 +627,6 @@ impl Archiver {
         c_file_name: &CStr,
         stat: &FileStat,
     ) -> Result<(), Error> {
-        use pxar::format::mode;
-
         let file_mode = stat.st_mode & libc::S_IFMT;
         let open_mode = if file_mode == libc::S_IFREG || file_mode == libc::S_IFDIR {
             OFlag::empty()
@@ -649,6 +664,127 @@ impl Archiver {
             self.skip_e2big_xattr,
         )?;
 
+        if self.previous_payload_index.is_none() {
+            return self
+                .add_entry_to_archive(encoder, &mut None, c_file_name, stat, fd, &metadata, None)
+                .await;
+        }
+
+        // Avoid having to many open file handles in cached entries
+        if self.cache.is_full() {
+            log::debug!("Max cache size reached, reuse cached entries");
+            self.flush_cached_reusing_if_below_threshold(encoder, true)
+                .await?;
+        }
+
+        if metadata.is_regular_file() {
+            if stat.st_nlink > 1 {
+                let link_info = HardLinkInfo {
+                    st_dev: stat.st_dev,
+                    st_ino: stat.st_ino,
+                };
+                if self.cache.contains_hardlink(&link_info) {
+                    // This hardlink has been seen by the lookahead cache already, put it on the cache
+                    // with a dummy offset and continue without lookup and chunk injection.
+                    // On flushing or re-encoding, the logic there will store the actual hardlink with
+                    // offset.
+                    if !self.cache.caching_enabled() {
+                        // have regular file, get directory path
+                        let mut path = self.path.clone();
+                        path.pop();
+                        self.cache.update_start_path(path);
+                    }
+                    self.cache.insert(
+                        fd,
+                        c_file_name.into(),
+                        *stat,
+                        metadata.clone(),
+                        PayloadOffset::default(),
+                    );
+                    return Ok(());
+                } else {
+                    // mark this hardlink as seen by the lookahead cache
+                    self.cache.insert_hardlink(link_info);
+                }
+            }
+
+            let file_name: &Path = OsStr::from_bytes(c_file_name.to_bytes()).as_ref();
+            if let Some(payload_range) = self
+                .is_reusable_entry(previous_metadata, file_name, &metadata)
+                .await?
+            {
+                if !self.cache.try_extend_range(payload_range.clone()) {
+                    log::debug!("Cache range has hole, new range: {payload_range:?}");
+                    self.flush_cached_reusing_if_below_threshold(encoder, true)
+                        .await?;
+                    // range has to be set after flushing of cached entries, which resets the range
+                    self.cache.update_range(payload_range.clone());
+                }
+
+                // offset relative to start of current range, does not include possible padding of
+                // actual chunks, which needs to be added before encoding the payload reference
+                let offset =
+                    PayloadOffset::default().add(payload_range.start - self.cache.range().start);
+                log::debug!("Offset relative to range start: {offset:?}");
+
+                if !self.cache.caching_enabled() {
+                    // have regular file, get directory path
+                    let mut path = self.path.clone();
+                    path.pop();
+                    self.cache.update_start_path(path);
+                }
+                self.cache
+                    .insert(fd, c_file_name.into(), *stat, metadata.clone(), offset);
+                return Ok(());
+            }
+        } else if self.cache.caching_enabled() {
+            self.cache.insert(
+                fd.try_clone()?,
+                c_file_name.into(),
+                *stat,
+                metadata.clone(),
+                PayloadOffset::default(),
+            );
+
+            if metadata.is_dir() {
+                self.add_directory(
+                    encoder,
+                    previous_metadata,
+                    Dir::from_fd(fd.into_raw_fd())?,
+                    c_file_name,
+                    &metadata,
+                    stat,
+                )
+                .await?;
+            }
+            return Ok(());
+        }
+
+        self.encode_entries_to_archive(encoder, None).await?;
+        self.add_entry_to_archive(
+            encoder,
+            previous_metadata,
+            c_file_name,
+            stat,
+            fd,
+            &metadata,
+            None,
+        )
+        .await
+    }
+
+    async fn add_entry_to_archive<T: SeqWrite + Send>(
+        &mut self,
+        encoder: &mut Encoder<'_, T>,
+        previous_metadata: &mut Option<Directory<MetadataArchiveReader>>,
+        c_file_name: &CStr,
+        stat: &FileStat,
+        fd: OwnedFd,
+        metadata: &Metadata,
+        payload_offset: Option<PayloadOffset>,
+    ) -> Result<(), Error> {
+        use pxar::format::mode;
+
         let file_name: &Path = OsStr::from_bytes(c_file_name.to_bytes()).as_ref();
         match metadata.file_type() {
             mode::IFREG => {
@@ -677,9 +813,14 @@ impl Archiver {
                         .add_file(c_file_name, file_size, stat.st_mtime)?;
                 }
 
-                let offset: LinkOffset = self
-                    .add_regular_file(encoder, fd, file_name, &metadata, file_size)
-                    .await?;
+                let offset: LinkOffset = if let Some(payload_offset) = payload_offset {
+                    encoder
+                        .add_payload_ref(metadata, file_name, file_size, payload_offset)
+                        .await?
+                } else {
+                    self.add_regular_file(encoder, fd, file_name, metadata, file_size)
+                        .await?
+                };
 
                 if stat.st_nlink > 1 {
                     self.hardlinks
@@ -690,50 +831,43 @@ impl Archiver {
             }
             mode::IFDIR => {
                 let dir = Dir::from_fd(fd.into_raw_fd())?;
-                self.add_directory(
-                    encoder,
-                    previous_metadata,
-                    dir,
-                    c_file_name,
-                    &metadata,
-                    stat,
-                )
-                .await
+                self.add_directory(encoder, previous_metadata, dir, c_file_name, metadata, stat)
+                    .await
             }
             mode::IFSOCK => {
                 if let Some(ref catalog) = self.catalog {
                     catalog.lock().unwrap().add_socket(c_file_name)?;
                 }
 
-                Ok(encoder.add_socket(&metadata, file_name).await?)
+                Ok(encoder.add_socket(metadata, file_name).await?)
             }
             mode::IFIFO => {
                 if let Some(ref catalog) = self.catalog {
                     catalog.lock().unwrap().add_fifo(c_file_name)?;
                 }
 
-                Ok(encoder.add_fifo(&metadata, file_name).await?)
+                Ok(encoder.add_fifo(metadata, file_name).await?)
             }
             mode::IFLNK => {
                 if let Some(ref catalog) = self.catalog {
                     catalog.lock().unwrap().add_symlink(c_file_name)?;
                 }
 
-                self.add_symlink(encoder, fd, file_name, &metadata).await
+                self.add_symlink(encoder, fd, file_name, metadata).await
             }
             mode::IFBLK => {
                 if let Some(ref catalog) = self.catalog {
                     catalog.lock().unwrap().add_block_device(c_file_name)?;
                 }
 
-                self.add_device(encoder, file_name, &metadata, stat).await
+                self.add_device(encoder, file_name, metadata, stat).await
             }
             mode::IFCHR => {
                 if let Some(ref catalog) = self.catalog {
                     catalog.lock().unwrap().add_char_device(c_file_name)?;
                 }
 
-                self.add_device(encoder, file_name, &metadata, stat).await
+                self.add_device(encoder, file_name, metadata, stat).await
             }
             other => bail!(
                 "encountered unknown file type: 0x{:x} (0o{:o})",
@@ -743,6 +877,199 @@ impl Archiver {
         }
     }
 
+    async fn flush_cached_reusing_if_below_threshold<T: SeqWrite + Send>(
+        &mut self,
+        encoder: &mut Encoder<'_, T>,
+        keep_last_chunk: bool,
+    ) -> Result<(), Error> {
+        if self.cache.range().is_empty() {
+            // only non regular file entries (e.g. directories) in cache, allows to do regular encoding
+            self.encode_entries_to_archive(encoder, None).await?;
+            return Ok(());
+        }
+
+        // Take ownership of previous last chunk, only update where it must be injected
+        let mut prev_last_chunk = self.cache.take_last_chunk();
+        if let Some(ref ref_payload_index) = self.previous_payload_index {
+            let (mut indices, start_padding, end_padding) =
+                lookup_dynamic_entries(ref_payload_index, self.cache.range().clone())?;
+            let mut padding = start_padding + end_padding;
+            let range = self.cache.range();
+            let total_size = (range.end - range.start) + padding;
+
+            // take into account used bytes of kept back chunk for padding
+            if let (Some(first), Some(last)) = (indices.first_mut(), prev_last_chunk.as_mut()) {
+                if last.digest() == first.digest() {
+                    // Update padding used for threshold calculation only
+                    let used = last.size() - last.padding;
+                    padding -= used;
+                }
+            }
+
+            let ratio = padding as f64 / total_size as f64;
+
+            // do not reuse chunks if introduced padding higher than threshold
+            // opt for re-encoding in that case
+            if ratio > CHUNK_PADDING_THRESHOLD {
+                log::debug!(
+                    "Padding ratio: {ratio} > {CHUNK_PADDING_THRESHOLD}, padding: {}, total {}, chunks: {}",
+                    HumanByte::from(padding),
+                    HumanByte::from(total_size),
+                    indices.len(),
+                );
+                self.cache.update_last_chunk(prev_last_chunk);
+                self.encode_entries_to_archive(encoder, None).await?;
+            } else {
+                log::debug!(
+                    "Padding ratio: {ratio} < {CHUNK_PADDING_THRESHOLD}, padding: {}, total {}, chunks: {}",
+                    HumanByte::from(padding),
+                    HumanByte::from(total_size),
+                    indices.len(),
+                );
+
+                // check for cases where the kept back last chunk is not equal to the first chunk
+                // because the range end aligned with a chunk boundary, so it needs to be injected
+                if let (Some(first), Some(last)) = (indices.first_mut(), prev_last_chunk) {
+                    if last.digest() != first.digest() {
+                        // make sure to inject previous last chunk before encoding entries
+                        self.inject_chunks_at_current_payload_position(encoder, &[last])?;
+                    } else {
+                        let used = last.size() - last.padding;
+                        first.padding -= used;
+                    }
+                }
+
+                let base_offset = Some(encoder.payload_position()?.add(start_padding));
+                self.encode_entries_to_archive(encoder, base_offset).await?;
+
+                if keep_last_chunk {
+                    self.cache.update_last_chunk(indices.pop());
+                }
+
+                self.inject_chunks_at_current_payload_position(encoder, indices.as_slice())?;
+            }
+
+            Ok(())
+        } else {
+            bail!("cannot reuse chunks without previous index reader");
+        }
+    }
+
+    // Take ownership of cached entries and encode them to the archive
+    // Encode with reused payload chunks when base offset is some, reencode otherwise
+    async fn encode_entries_to_archive<T: SeqWrite + Send>(
+        &mut self,
+        encoder: &mut Encoder<'_, T>,
+        base_offset: Option<PayloadOffset>,
+    ) -> Result<(), Error> {
+        if let Some(prev) = self.cache.take_last_chunk() {
+            // make sure to inject previous last chunk before encoding entries
+            self.inject_chunks_at_current_payload_position(encoder, &[prev])?;
+        }
+
+        let old_path = self.path.clone();
+        self.path = self.cache.start_path().clone();
+
+        // take ownership of cached entries and reset caching state
+        let entries = self.cache.take_and_reset();
+        log::debug!(
+            "Got {} cache entries to encode: reuse is {}",
+            entries.len(),
+            base_offset.is_some()
+        );
+
+        for entry in entries {
+            match entry {
+                CacheEntry::RegEntry(CacheEntryData {
+                    fd,
+                    c_file_name,
+                    stat,
+                    metadata,
+                    payload_offset,
+                }) => {
+                    let file_name = OsStr::from_bytes(c_file_name.to_bytes());
+                    self.path.push(file_name);
+                    self.add_entry_to_archive(
+                        encoder,
+                        &mut None,
+                        &c_file_name,
+                        &stat,
+                        fd,
+                        &metadata,
+                        base_offset.map(|base_offset| payload_offset.add(base_offset.raw())),
+                    )
+                    .await?;
+                    self.path.pop();
+                }
+                CacheEntry::DirEntry(CacheEntryData {
+                    c_file_name,
+                    metadata,
+                    ..
+                }) => {
+                    let file_name = OsStr::from_bytes(c_file_name.to_bytes());
+                    self.path.push(file_name);
+                    if let Some(ref catalog) = self.catalog {
+                        catalog.lock().unwrap().start_directory(&c_file_name)?;
+                    }
+                    let dir_name = OsStr::from_bytes(c_file_name.to_bytes());
+                    encoder.create_directory(dir_name, &metadata).await?;
+                }
+                CacheEntry::DirEnd => {
+                    encoder.finish().await?;
+                    if let Some(ref catalog) = self.catalog {
+                        catalog.lock().unwrap().end_directory()?;
+                    }
+                    self.path.pop();
+                }
+            }
+        }
+
+        self.path = old_path;
+
+        Ok(())
+    }
+
+    fn inject_chunks_at_current_payload_position<T: SeqWrite + Send>(
+        &mut self,
+        encoder: &mut Encoder<'_, T>,
+        reused_chunks: &[ReusableDynamicEntry],
+    ) -> Result<(), Error> {
+        let mut injection_boundary = encoder.payload_position()?;
+
+        for chunks in reused_chunks.chunks(128) {
+            let mut chunk_list = Vec::with_capacity(128);
+            let mut size = PayloadOffset::default();
+
+            for chunk in chunks.iter() {
+                log::debug!(
+                    "Injecting chunk with {} padding (chunk size {})",
+                    HumanByte::from(chunk.padding),
+                    HumanByte::from(chunk.size()),
+                );
+                size = size.add(chunk.size());
+                chunk_list.push(chunk.clone());
+            }
+
+            let inject_chunks = InjectChunks {
+                boundary: injection_boundary.raw(),
+                chunks: chunk_list,
+                size: size.raw() as usize,
+            };
+
+            if let Some(sender) = self.forced_boundaries.as_mut() {
+                sender.send(inject_chunks)?;
+            } else {
+                bail!("missing injection queue");
+            };
+
+            injection_boundary = injection_boundary.add(size.raw());
+            log::debug!("Advance payload position by: {size:?}");
+            encoder.advance(size)?;
+        }
+
+        Ok(())
+    }
+
     async fn add_directory<T: SeqWrite + Send>(
         &mut self,
         encoder: &mut Encoder<'_, T>,
@@ -754,10 +1081,12 @@ impl Archiver {
     ) -> Result<(), Error> {
         let dir_name = OsStr::from_bytes(c_dir_name.to_bytes());
 
-        if let Some(ref catalog) = self.catalog {
-            catalog.lock().unwrap().start_directory(c_dir_name)?;
+        if !self.cache.caching_enabled() {
+            if let Some(ref catalog) = self.catalog {
+                catalog.lock().unwrap().start_directory(c_dir_name)?;
+            }
+            encoder.create_directory(dir_name, metadata).await?;
         }
-        encoder.create_directory(dir_name, metadata).await?;
 
         let old_fs_magic = self.fs_magic;
         let old_fs_feature_flags = self.fs_feature_flags;
@@ -797,9 +1126,13 @@ impl Archiver {
         self.fs_feature_flags = old_fs_feature_flags;
         self.current_st_dev = old_st_dev;
 
-        encoder.finish().await?;
-        if let Some(ref catalog) = self.catalog {
-            catalog.lock().unwrap().end_directory()?;
+        if !self.cache.caching_enabled() {
+            encoder.finish().await?;
+            if let Some(ref catalog) = self.catalog {
+                catalog.lock().unwrap().end_directory()?;
+            }
+        } else {
+            self.cache.insert_dir_end();
         }
 
         result
-- 
2.39.2



_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel


^ permalink raw reply	[flat|nested] 71+ messages in thread

* [pbs-devel] [PATCH v7 proxmox-backup 51/69] client: backup writer: add injected chunk count to stats
  2024-05-27 14:32 [pbs-devel] [PATCH v7 pxar proxmox-backup 00/69] fix #3174: improve file-level backup Christian Ebner
                   ` (49 preceding siblings ...)
  2024-05-27 14:33 ` [pbs-devel] [PATCH v7 proxmox-backup 50/69] fix #3174: client: pxar: enable caching and meta comparison Christian Ebner
@ 2024-05-27 14:33 ` Christian Ebner
  2024-05-27 14:33 ` [pbs-devel] [PATCH v7 proxmox-backup 52/69] pxar: create: keep track of reused chunks and files Christian Ebner
                   ` (18 subsequent siblings)
  69 siblings, 0 replies; 71+ messages in thread
From: Christian Ebner @ 2024-05-27 14:33 UTC (permalink / raw)
  To: pbs-devel

Track the number and size of injected chunks and show them in the output.
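The accounting pattern used here (a pair of `Arc<AtomicUsize>` handles per counter, one moved into the injection closure and one read once the upload has finished) can be sketched in isolation. This is a simplified, synchronous model: `InjectedChunk`, `InjectionStats` and `account_injections` are hypothetical names, not part of `pbs-client`.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

/// Hypothetical stand-in for an injected chunk; only its size matters here.
struct InjectedChunk {
    size: usize,
}

/// Snapshot of the counters, loosely mirroring the `chunk_injected` and
/// `size_injected` fields added to `UploadStats`.
#[derive(Debug, PartialEq)]
struct InjectionStats {
    chunk_injected: usize,
    size_injected: usize,
}

fn account_injections(batches: Vec<Vec<InjectedChunk>>) -> InjectionStats {
    let injected_chunk_count = Arc::new(AtomicUsize::new(0));
    let injected_chunk_count2 = injected_chunk_count.clone();
    let injected_len = Arc::new(AtomicUsize::new(0));
    let injected_len2 = injected_len.clone();

    // The closure takes ownership of the first pair of handles, similar to
    // the stream combinator in the upload path.
    let on_inject = move |chunks: &[InjectedChunk]| {
        injected_chunk_count.fetch_add(chunks.len(), Ordering::SeqCst);
        for chunk in chunks {
            injected_len.fetch_add(chunk.size, Ordering::SeqCst);
        }
    };

    for batch in &batches {
        on_inject(batch.as_slice());
    }

    // The second pair of handles is read once all batches are processed.
    InjectionStats {
        chunk_injected: injected_chunk_count2.load(Ordering::SeqCst),
        size_injected: injected_len2.load(Ordering::SeqCst),
    }
}

fn main() {
    let stats = account_injections(vec![
        vec![InjectedChunk { size: 4096 }, InjectedChunk { size: 1024 }],
        vec![InjectedChunk { size: 512 }],
    ]);
    // prints: InjectionStats { chunk_injected: 3, size_injected: 5632 }
    println!("{stats:?}");
}
```

Cloning the `Arc` before the closure is what lets the final summary read the counters after the closure has been moved away.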

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
changes since version 6:
- adapt output for injected and therefore reused chunks
- refactoring of unneeded variables

 pbs-client/src/backup_writer.rs | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/pbs-client/src/backup_writer.rs b/pbs-client/src/backup_writer.rs
index b2ada85cd..c22978096 100644
--- a/pbs-client/src/backup_writer.rs
+++ b/pbs-client/src/backup_writer.rs
@@ -57,8 +57,10 @@ pub struct UploadOptions {
 struct UploadStats {
     chunk_count: usize,
     chunk_reused: usize,
+    chunk_injected: usize,
     size: usize,
     size_reused: usize,
+    size_injected: usize,
     size_compressed: usize,
     duration: std::time::Duration,
     csum: [u8; 32],
@@ -355,6 +357,14 @@ impl BackupWriter {
             pbs_tools::format::strip_server_file_extension(archive_name)
         };
 
+        if upload_stats.chunk_injected > 0 {
+            log::info!(
+                "{archive}: reused {} from previous snapshot for unchanged files ({} chunks)",
+                HumanByte::from(upload_stats.size_injected),
+                upload_stats.chunk_injected,
+            );
+        }
+
         if archive_name != CATALOG_NAME {
             let speed: HumanByte =
                 ((size_dirty * 1_000_000) / (upload_stats.duration.as_micros() as usize)).into();
@@ -645,6 +655,8 @@ impl BackupWriter {
         let total_chunks2 = total_chunks.clone();
         let known_chunk_count = Arc::new(AtomicUsize::new(0));
         let known_chunk_count2 = known_chunk_count.clone();
+        let injected_chunk_count = Arc::new(AtomicUsize::new(0));
+        let injected_chunk_count2 = injected_chunk_count.clone();
 
         let stream_len = Arc::new(AtomicUsize::new(0));
         let stream_len2 = stream_len.clone();
@@ -652,6 +664,8 @@ impl BackupWriter {
         let compressed_stream_len2 = compressed_stream_len.clone();
         let reused_len = Arc::new(AtomicUsize::new(0));
         let reused_len2 = reused_len.clone();
+        let injected_len = Arc::new(AtomicUsize::new(0));
+        let injected_len2 = injected_len.clone();
 
         let append_chunk_path = format!("{}_index", prefix);
         let upload_chunk_path = format!("{}_chunk", prefix);
@@ -672,6 +686,7 @@ impl BackupWriter {
                     // account for injected chunks
                     let count = chunks.len();
                     total_chunks.fetch_add(count, Ordering::SeqCst);
+                    injected_chunk_count.fetch_add(count, Ordering::SeqCst);
 
                     let mut known = Vec::new();
                     let mut guard = index_csum.lock().unwrap();
@@ -680,6 +695,7 @@ impl BackupWriter {
                         let offset =
                             stream_len.fetch_add(chunk.size() as usize, Ordering::SeqCst) as u64;
                         reused_len.fetch_add(chunk.size() as usize, Ordering::SeqCst);
+                        injected_len.fetch_add(chunk.size() as usize, Ordering::SeqCst);
                         let digest = chunk.digest();
                         known.push((offset, digest));
                         let end_offset = offset + chunk.size();
@@ -795,8 +811,10 @@ impl BackupWriter {
                 let duration = start_time.elapsed();
                 let chunk_count = total_chunks2.load(Ordering::SeqCst);
                 let chunk_reused = known_chunk_count2.load(Ordering::SeqCst);
+                let chunk_injected = injected_chunk_count2.load(Ordering::SeqCst);
                 let size = stream_len2.load(Ordering::SeqCst);
                 let size_reused = reused_len2.load(Ordering::SeqCst);
+                let size_injected = injected_len2.load(Ordering::SeqCst);
                 let size_compressed = compressed_stream_len2.load(Ordering::SeqCst) as usize;
 
                 let mut guard = index_csum_2.lock().unwrap();
@@ -805,8 +823,10 @@ impl BackupWriter {
                 futures::future::ok(UploadStats {
                     chunk_count,
                     chunk_reused,
+                    chunk_injected,
                     size,
                     size_reused,
+                    size_injected,
                     size_compressed,
                     duration,
                     csum,
-- 
2.39.2




* [pbs-devel] [PATCH v7 proxmox-backup 52/69] pxar: create: keep track of reused chunks and files
  2024-05-27 14:32 [pbs-devel] [PATCH v7 pxar proxmox-backup 00/69] fix #3174: improve file-level backup Christian Ebner
                   ` (50 preceding siblings ...)
  2024-05-27 14:33 ` [pbs-devel] [PATCH v7 proxmox-backup 51/69] client: backup writer: add injected chunk count to stats Christian Ebner
@ 2024-05-27 14:33 ` Christian Ebner
  2024-05-27 14:33 ` [pbs-devel] [PATCH v7 proxmox-backup 53/69] pxar: create: show chunk injection stats debug output Christian Ebner
                   ` (17 subsequent siblings)
  69 siblings, 0 replies; 71+ messages in thread
From: Christian Ebner @ 2024-05-27 14:33 UTC (permalink / raw)
  To: pbs-devel

Track and log reused or re-encoded files, as well as the reused chunks
and their padding.
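As a rough illustration, the bookkeeping this patch adds to `add_entry_to_archive` and `inject_chunks_at_current_payload_position` can be condensed into two methods on a simplified struct. The method names `account_file` and `account_chunk` are hypothetical, and `HEADER_SIZE` assumes a 16-byte pxar entry header (the patch itself uses `size_of::<pxar::format::Header>()`).

```rust
/// Simplified copy of the counters kept in `ReuseStats`.
#[derive(Default, Debug, PartialEq)]
struct ReuseStats {
    files_reused_count: u64,
    files_hardlink_count: u64,
    files_reencoded_count: u64,
    total_injected_count: u64,
    partial_chunks_count: u64,
    total_injected_size: u64,
    total_reused_payload_size: u64,
    total_reencoded_size: u64,
}

/// Assumption: size of a pxar entry header (two u64 fields).
const HEADER_SIZE: u64 = 16;

impl ReuseStats {
    /// Regular file: either referenced in the previous payload or re-encoded.
    fn account_file(&mut self, file_size: u64, reused: bool, nlink: u64) {
        if reused {
            self.total_reused_payload_size += file_size + HEADER_SIZE;
            self.files_reused_count += 1;
        } else {
            self.total_reencoded_size += file_size + HEADER_SIZE;
            self.files_reencoded_count += 1;
        }
        if nlink > 1 {
            self.files_hardlink_count += 1;
        }
    }

    /// Injected chunk: any padding means it is only partially used.
    fn account_chunk(&mut self, size: u64, padding: u64) {
        self.total_injected_size += size;
        self.total_injected_count += 1;
        if padding > 0 {
            self.partial_chunks_count += 1;
        }
    }
}

fn main() {
    let mut stats = ReuseStats::default();
    stats.account_file(1000, true, 1);
    stats.account_file(500, false, 2);
    stats.account_chunk(2048, 1032);
    println!("{stats:?}");
}
```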

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
changes since version 6:
- Adapt stats to also include the reencoded size

 pbs-client/src/pxar/create.rs | 30 ++++++++++++++++++++++++++++++
 1 file changed, 30 insertions(+)

diff --git a/pbs-client/src/pxar/create.rs b/pbs-client/src/pxar/create.rs
index f044dd1e6..3e72036f1 100644
--- a/pbs-client/src/pxar/create.rs
+++ b/pbs-client/src/pxar/create.rs
@@ -140,6 +140,18 @@ pub(crate) struct HardLinkInfo {
     st_ino: u64,
 }
 
+#[derive(Default)]
+struct ReuseStats {
+    files_reused_count: u64,
+    files_hardlink_count: u64,
+    files_reencoded_count: u64,
+    total_injected_count: u64,
+    partial_chunks_count: u64,
+    total_injected_size: u64,
+    total_reused_payload_size: u64,
+    total_reencoded_size: u64,
+}
+
 struct Archiver {
     feature_flags: Flags,
     fs_feature_flags: Flags,
@@ -159,6 +171,7 @@ struct Archiver {
     forced_boundaries: Option<mpsc::Sender<InjectChunks>>,
     previous_payload_index: Option<DynamicIndexReader>,
     cache: PxarLookaheadCache,
+    reuse_stats: ReuseStats,
 }
 
 type Encoder<'a, T> = pxar::encoder::aio::Encoder<'a, T>;
@@ -252,6 +265,7 @@ where
         forced_boundaries,
         previous_payload_index,
         cache: PxarLookaheadCache::new(None),
+        reuse_stats: ReuseStats::default(),
     };
 
     archiver
@@ -814,15 +828,24 @@ impl Archiver {
                 }
 
                 let offset: LinkOffset = if let Some(payload_offset) = payload_offset {
+                    self.reuse_stats.total_reused_payload_size +=
+                        file_size + size_of::<pxar::format::Header>() as u64;
+                    self.reuse_stats.files_reused_count += 1;
+
                     encoder
                         .add_payload_ref(metadata, file_name, file_size, payload_offset)
                         .await?
                 } else {
+                    self.reuse_stats.total_reencoded_size +=
+                        file_size + size_of::<pxar::format::Header>() as u64;
+                    self.reuse_stats.files_reencoded_count += 1;
+
                     self.add_regular_file(encoder, fd, file_name, metadata, file_size)
                         .await?
                 };
 
                 if stat.st_nlink > 1 {
+                    self.reuse_stats.files_hardlink_count += 1;
                     self.hardlinks
                         .insert(link_info, (self.path.clone(), offset));
                 }
@@ -1046,6 +1069,13 @@ impl Archiver {
                     HumanByte::from(chunk.padding),
                     HumanByte::from(chunk.size()),
                 );
+                self.reuse_stats.total_injected_size += chunk.size();
+                self.reuse_stats.total_injected_count += 1;
+
+                if chunk.padding > 0 {
+                    self.reuse_stats.partial_chunks_count += 1;
+                }
+
                 size = size.add(chunk.size());
                 chunk_list.push(chunk.clone());
             }
-- 
2.39.2




* [pbs-devel] [PATCH v7 proxmox-backup 53/69] pxar: create: show chunk injection stats debug output
  2024-05-27 14:32 [pbs-devel] [PATCH v7 pxar proxmox-backup 00/69] fix #3174: improve file-level backup Christian Ebner
                   ` (51 preceding siblings ...)
  2024-05-27 14:33 ` [pbs-devel] [PATCH v7 proxmox-backup 52/69] pxar: create: keep track of reused chunks and files Christian Ebner
@ 2024-05-27 14:33 ` Christian Ebner
  2024-05-27 14:33 ` [pbs-devel] [PATCH v7 proxmox-backup 54/69] client: pxar: add helper to handle optional preludes Christian Ebner
                   ` (16 subsequent siblings)
  69 siblings, 0 replies; 71+ messages in thread
From: Christian Ebner @ 2024-05-27 14:33 UTC (permalink / raw)
  To: pbs-devel

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
changes since version 6:
- Adapt output to be in more readable list form
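Two of the summary figures are derived rather than counted directly: the total file count is the sum of reused, re-encoded and hardlinked files, and the padding is the part of the injected chunks not covered by reused file payloads (headers included). A small sketch of that arithmetic follows; `SummaryInput` is a hypothetical stand-in for the relevant `ReuseStats` fields, and it uses `saturating_sub` where the patch uses plain subtraction.

```rust
/// Hypothetical stand-in for the `ReuseStats` fields read by the summary.
struct SummaryInput {
    files_reused_count: u64,
    files_reencoded_count: u64,
    files_hardlink_count: u64,
    total_injected_size: u64,
    total_reused_payload_size: u64,
    partial_chunks_count: u64,
}

impl SummaryInput {
    /// Total files as printed by the summary: reused + re-encoded + hardlinks.
    fn total_files(&self) -> u64 {
        self.files_reused_count + self.files_reencoded_count + self.files_hardlink_count
    }

    /// Padding = injected chunk bytes minus reused payload bytes.
    /// `saturating_sub` avoids an underflow panic in debug builds.
    fn padding(&self) -> u64 {
        self.total_injected_size
            .saturating_sub(self.total_reused_payload_size)
    }
}

fn main() {
    let input = SummaryInput {
        files_reused_count: 10,
        files_reencoded_count: 3,
        files_hardlink_count: 2,
        total_injected_size: 8192,
        total_reused_payload_size: 7000,
        partial_chunks_count: 2,
    };
    println!("Change detection summary:");
    println!(" - {} total files ({} hardlinks)", input.total_files(), input.files_hardlink_count);
    println!(" - {} B padding in {} partial chunks", input.padding(), input.partial_chunks_count);
}
```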

 pbs-client/src/pxar/create.rs | 28 ++++++++++++++++++++++++++++
 1 file changed, 28 insertions(+)

diff --git a/pbs-client/src/pxar/create.rs b/pbs-client/src/pxar/create.rs
index 3e72036f1..be86df356 100644
--- a/pbs-client/src/pxar/create.rs
+++ b/pbs-client/src/pxar/create.rs
@@ -281,6 +281,34 @@ where
     encoder.finish().await?;
     encoder.close().await?;
 
+    if metadata_mode {
+        log::info!("Change detection summary:");
+        log::info!(
+            " - {} total files ({} hardlinks)",
+            archiver.reuse_stats.files_reused_count
+                + archiver.reuse_stats.files_reencoded_count
+                + archiver.reuse_stats.files_hardlink_count,
+            archiver.reuse_stats.files_hardlink_count,
+        );
+        log::info!(
+            " - {} unchanged, reusable files with {} data",
+            archiver.reuse_stats.files_reused_count,
+            HumanByte::from(archiver.reuse_stats.total_reused_payload_size),
+        );
+        log::info!(
+            " - {} changed or non-reusable files with {} data",
+            archiver.reuse_stats.files_reencoded_count,
+            HumanByte::from(archiver.reuse_stats.total_reencoded_size),
+        );
+        log::info!(
+            " - {} padding in {} partial chunks",
+            HumanByte::from(
+                archiver.reuse_stats.total_injected_size
+                    - archiver.reuse_stats.total_reused_payload_size
+            ),
+            archiver.reuse_stats.partial_chunks_count,
+        );
+    }
     Ok(())
 }
 
-- 
2.39.2



_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel


^ permalink raw reply	[flat|nested] 71+ messages in thread

* [pbs-devel] [PATCH v7 proxmox-backup 54/69] client: pxar: add helper to handle optional preludes
  2024-05-27 14:32 [pbs-devel] [PATCH v7 pxar proxmox-backup 00/69] fix #3174: improve file-level backup Christian Ebner
                   ` (52 preceding siblings ...)
  2024-05-27 14:33 ` [pbs-devel] [PATCH v7 proxmox-backup 53/69] pxar: create: show chunk injection stats debug output Christian Ebner
@ 2024-05-27 14:33 ` Christian Ebner
  2024-05-27 14:33 ` [pbs-devel] [PATCH v7 proxmox-backup 55/69] client: pxar: opt encode cli exclude patterns as Prelude Christian Ebner
                   ` (15 subsequent siblings)
  69 siblings, 0 replies; 71+ messages in thread
From: Christian Ebner @ 2024-05-27 14:33 UTC (permalink / raw)
  To: pbs-devel

Pxar archives with format version 2 allow storing optional file format
version and prelude entries.

Cover the case for these entries: the file format version entry is
introduced to distinguish between the different file formats used for
encoding, while the prelude entry is used to store optional metadata
such as the pxar cli exclude parameters.

Add the logic to accept and decode these prelude entries when
accessing the archive via a decoder instance.

For now, simply ignore them.
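The accepted entry order at the start of an archive is `[Version [Prelude]] Directory`. The sketch below models that order over a simplified entry enum rather than the real `pxar::EntryKind`; `Kind` and `split_root` are hypothetical names. Unlike the helper in this patch, the sketch also checks that the entry following a prelude is a directory.

```rust
/// Simplified stand-in for `pxar::EntryKind`.
#[derive(Debug, Clone, PartialEq)]
enum Kind {
    Version,
    Prelude(Vec<u8>),
    Directory,
    File,
}

/// Returns the root directory entry and, if present, the prelude entry.
fn split_root<I: Iterator<Item = Kind>>(mut entries: I) -> Result<(Kind, Option<Kind>), String> {
    let mut next = || entries.next().ok_or_else(|| "missing root entry".to_string());

    let (root, prelude) = match next()? {
        // Plain archive: the root directory comes first.
        Kind::Directory => (Kind::Directory, None),
        // Version entry, optionally followed by a prelude entry.
        Kind::Version => match next()? {
            Kind::Prelude(data) => (next()?, Some(Kind::Prelude(data))),
            other => (other, None),
        },
        other => return Err(format!("unexpected entry kind {other:?}")),
    };

    if root != Kind::Directory {
        return Err(format!("unexpected entry kind {root:?}"));
    }
    Ok((root, prelude))
}

fn main() {
    let archive = vec![
        Kind::Version,
        Kind::Prelude(b"--exclude".to_vec()),
        Kind::Directory,
        Kind::File,
    ];
    match split_root(archive.into_iter()) {
        Ok((root, prelude)) => println!("root: {root:?}, prelude: {prelude:?}"),
        Err(err) => eprintln!("error: {err}"),
    }
}
```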

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
changes since version 6:
- Get rid of additional, but unused EntryKind::Version return value

 pbs-client/src/pxar/create.rs             |  2 +-
 pbs-client/src/pxar/extract.rs            |  7 ++---
 pbs-client/src/pxar/tools.rs              |  7 +++++
 pbs-client/src/tools/mod.rs               | 31 +++++++++++++++++++++++
 src/api2/tape/restore.rs                  | 17 +++++--------
 src/tape/file_formats/snapshot_archive.rs |  1 +
 6 files changed, 50 insertions(+), 15 deletions(-)

diff --git a/pbs-client/src/pxar/create.rs b/pbs-client/src/pxar/create.rs
index be86df356..9a9882fc7 100644
--- a/pbs-client/src/pxar/create.rs
+++ b/pbs-client/src/pxar/create.rs
@@ -226,7 +226,7 @@ where
     }
 
     let metadata_mode = options.previous_ref.is_some() && writers.archive.payload().is_some();
-    let mut encoder = Encoder::new(writers.archive, &metadata).await?;
+    let mut encoder = Encoder::new(writers.archive, &metadata, None).await?;
 
     let mut patterns = options.patterns;
 
diff --git a/pbs-client/src/pxar/extract.rs b/pbs-client/src/pxar/extract.rs
index 5f5ac6188..e22390606 100644
--- a/pbs-client/src/pxar/extract.rs
+++ b/pbs-client/src/pxar/extract.rs
@@ -29,6 +29,7 @@ use proxmox_compression::zip::{ZipEncoder, ZipEntry};
 use crate::pxar::dir_stack::PxarDirStack;
 use crate::pxar::metadata;
 use crate::pxar::Flags;
+use crate::tools::handle_root_with_optional_format_version_prelude;
 
 pub struct PxarExtractOptions<'a> {
     pub match_list: &'a [MatchEntry],
@@ -124,9 +125,7 @@ where
         // we use this to keep track of our directory-traversal
         decoder.enable_goodbye_entries(true);
 
-        let root = decoder
-            .next()
-            .context("found empty pxar archive")?
+        let (root, _) = handle_root_with_optional_format_version_prelude(&mut decoder)
             .context("error reading pxar archive")?;
 
         if !root.is_dir() {
@@ -267,6 +266,8 @@ where
         };
 
         let extract_res = match (did_match, entry.kind()) {
+            (_, EntryKind::Version(_version)) => Ok(()),
+            (_, EntryKind::Prelude(_prelude)) => Ok(()),
             (_, EntryKind::Directory) => {
                 self.callback(entry.path());
 
diff --git a/pbs-client/src/pxar/tools.rs b/pbs-client/src/pxar/tools.rs
index 459951d50..27e5185a3 100644
--- a/pbs-client/src/pxar/tools.rs
+++ b/pbs-client/src/pxar/tools.rs
@@ -172,6 +172,13 @@ pub fn format_multi_line_entry(entry: &Entry) -> String {
     let meta = entry.metadata();
 
     let (size, link, type_name, payload_offset) = match entry.kind() {
+        EntryKind::Version(version) => (format!("{version:?}"), String::new(), "version", None),
+        EntryKind::Prelude(prelude) => (
+            "0".to_string(),
+            format!("raw data: {:?} bytes", prelude.data.len()),
+            "prelude",
+            None,
+        ),
         EntryKind::File {
             size,
             payload_offset,
diff --git a/pbs-client/src/tools/mod.rs b/pbs-client/src/tools/mod.rs
index f43058dbd..8d4fefaf3 100644
--- a/pbs-client/src/tools/mod.rs
+++ b/pbs-client/src/tools/mod.rs
@@ -539,3 +539,34 @@ pub fn has_pxar_filename_extension(name: &str, with_didx_extension: bool) -> boo
         name.ends_with(".pxar") || name.ends_with(".mpxar") || name.ends_with(".ppxar")
     }
 }
+
+/// Decode possible format version and prelude entries before getting the root directory
+/// entry.
+///
+/// Returns the root directory entry and, if present, the prelude entry
+pub fn handle_root_with_optional_format_version_prelude<R: pxar::decoder::SeqRead>(
+    decoder: &mut pxar::decoder::sync::Decoder<R>,
+) -> Result<(pxar::Entry, Option<pxar::Entry>), Error> {
+    let first = decoder
+        .next()
+        .ok_or_else(|| format_err!("missing root entry"))??;
+    match first.kind() {
+        pxar::EntryKind::Directory => Ok((first, None)),
+        pxar::EntryKind::Version(_version) => {
+            let second = decoder
+                .next()
+                .ok_or_else(|| format_err!("missing root entry"))??;
+            match second.kind() {
+                pxar::EntryKind::Directory => Ok((second, None)),
+                pxar::EntryKind::Prelude(_prelude) => {
+                    let third = decoder
+                        .next()
+                        .ok_or_else(|| format_err!("missing root entry"))??;
+                    Ok((third, Some(second)))
+                }
+                _ => bail!("unexpected entry kind {:?}", second.kind()),
+            }
+        }
+        _ => bail!("unexpected entry kind {:?}", first.kind()),
+    }
+}
diff --git a/src/api2/tape/restore.rs b/src/api2/tape/restore.rs
index 9184ff934..382909647 100644
--- a/src/api2/tape/restore.rs
+++ b/src/api2/tape/restore.rs
@@ -23,6 +23,7 @@ use pbs_api_types::{
     PRIV_DATASTORE_MODIFY, PRIV_TAPE_READ, TAPE_RESTORE_NAMESPACE_SCHEMA,
     TAPE_RESTORE_SNAPSHOT_SCHEMA, UPID_SCHEMA,
 };
+use pbs_client::tools::handle_root_with_optional_format_version_prelude;
 use pbs_config::CachedUserInfo;
 use pbs_datastore::dynamic_index::DynamicIndexReader;
 use pbs_datastore::fixed_index::FixedIndexReader;
@@ -1713,17 +1714,11 @@ fn try_restore_snapshot_archive<R: pxar::decoder::SeqRead>(
     decoder: &mut pxar::decoder::sync::Decoder<R>,
     snapshot_path: &Path,
 ) -> Result<BackupManifest, Error> {
-    let _root = match decoder.next() {
-        None => bail!("missing root entry"),
-        Some(root) => {
-            let root = root?;
-            match root.kind() {
-                pxar::EntryKind::Directory => { /* Ok */ }
-                _ => bail!("wrong root entry type"),
-            }
-            root
-        }
-    };
+    let (root, _) = handle_root_with_optional_format_version_prelude(decoder)?;
+    match root.kind() {
+        pxar::EntryKind::Directory => { /* Ok */ }
+        _ => bail!("wrong root entry type"),
+    }
 
     let root_path = Path::new("/");
     let manifest_file_name = OsStr::new(MANIFEST_BLOB_NAME);
diff --git a/src/tape/file_formats/snapshot_archive.rs b/src/tape/file_formats/snapshot_archive.rs
index 82f466980..f5a588f4e 100644
--- a/src/tape/file_formats/snapshot_archive.rs
+++ b/src/tape/file_formats/snapshot_archive.rs
@@ -61,6 +61,7 @@ pub fn tape_write_snapshot_archive<'a>(
         let mut encoder = pxar::encoder::sync::Encoder::new(
             pxar::PxarVariant::Unified(PxarTapeWriter::new(writer)),
             &root_metadata,
+            None,
         )?;
 
         for filename in file_list.iter() {
-- 
2.39.2



_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel

* [pbs-devel] [PATCH v7 proxmox-backup 55/69] client: pxar: opt encode cli exclude patterns as Prelude
  2024-05-27 14:32 [pbs-devel] [PATCH v7 pxar proxmox-backup 00/69] fix #3174: improve file-level backup Christian Ebner
                   ` (53 preceding siblings ...)
  2024-05-27 14:33 ` [pbs-devel] [PATCH v7 proxmox-backup 54/69] client: pxar: add helper to handle optional preludes Christian Ebner
@ 2024-05-27 14:33 ` Christian Ebner
  2024-05-27 14:33 ` [pbs-devel] [PATCH v7 proxmox-backup 56/69] pxar: ignore version and prelude entries in listing Christian Ebner
                   ` (14 subsequent siblings)
  69 siblings, 0 replies; 71+ messages in thread
From: Christian Ebner @ 2024-05-27 14:33 UTC (permalink / raw)
  To: pbs-devel

Instead of encoding the pxar cli exclude patterns as a regular file
within the root directory of an archive, store this information in an
entry of kind Prelude, directly after the pxar format version entry.

However, this behavior is currently limited to archives written with
format version 2, i.e. in the split metadata and payload case.

This is a breaking change for the encoding of new cli exclude
parameters: new exclude parameters will not be added to an already
present .pxarexclude-cli file, and the file will not be created if it
is not present.
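The resulting placement of the CLI exclude patterns can be summarized with the following minimal sketch, simplified from the conditions in the hunks of this patch. It assumes that a previous reference always comes with a previous metadata accessor; the enum and function names are hypothetical and not part of pbs-client.

```rust
/// Hypothetical summary of where the CLI exclude patterns are stored
/// after this change (not an actual pbs-client type).
#[derive(Debug, PartialEq)]
enum CliExcludeLocation {
    /// Encoded in the Prelude entry following the format version entry.
    Prelude,
    /// Encoded as `.pxarexclude-cli` file in the archive root directory.
    RootFile,
    /// Not stored, as no CLI exclude patterns were given.
    NotStored,
}

/// Prelude whenever a previous reference is used, the legacy root file
/// only without a previous reference and with at least one CLI pattern.
fn cli_exclude_location(has_previous_ref: bool, cli_pattern_count: usize) -> CliExcludeLocation {
    if has_previous_ref {
        // `cli_params` is `Some(..)` whenever a previous reference exists,
        // even if no patterns were given.
        CliExcludeLocation::Prelude
    } else if cli_pattern_count > 0 {
        CliExcludeLocation::RootFile
    } else {
        CliExcludeLocation::NotStored
    }
}
```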

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
changes since version 6:
- adapt to field naming change for PxarWriters

 pbs-client/src/pxar/create.rs | 16 ++++++++++++----
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/pbs-client/src/pxar/create.rs b/pbs-client/src/pxar/create.rs
index 9a9882fc7..528577520 100644
--- a/pbs-client/src/pxar/create.rs
+++ b/pbs-client/src/pxar/create.rs
@@ -225,9 +225,6 @@ where
         set.insert(stat.st_dev);
     }
 
-    let metadata_mode = options.previous_ref.is_some() && writers.archive.payload().is_some();
-    let mut encoder = Encoder::new(writers.archive, &metadata, None).await?;
-
     let mut patterns = options.patterns;
 
     if options.skip_lost_and_found {
@@ -237,6 +234,15 @@ where
             MatchType::Exclude,
         )?);
     }
+
+    let cli_params_content = generate_pxar_excludes_cli(&patterns[..]);
+    let cli_params = if options.previous_ref.is_some() {
+        Some(cli_params_content.as_slice())
+    } else {
+        None
+    };
+
+    let metadata_mode = options.previous_ref.is_some() && writers.archive.payload().is_some();
     let (previous_payload_index, previous_metadata_accessor) =
         if let Some(refs) = options.previous_ref {
             (
@@ -247,6 +253,8 @@ where
             (None, None)
         };
 
+    let mut encoder = Encoder::new(writers.archive, &metadata, cli_params).await?;
+
     let mut archiver = Archiver {
         feature_flags,
         fs_feature_flags,
@@ -348,7 +356,7 @@ impl Archiver {
 
             let mut file_list = self.generate_directory_file_list(&mut dir, is_root)?;
 
-            if is_root && old_patterns_count > 0 {
+            if is_root && old_patterns_count > 0 && previous_metadata_accessor.is_none() {
                 file_list.push(FileListEntry {
                     name: CString::new(".pxarexclude-cli").unwrap(),
                     path: PathBuf::new(),
-- 
2.39.2

* [pbs-devel] [PATCH v7 proxmox-backup 56/69] pxar: ignore version and prelude entries in listing
  2024-05-27 14:32 [pbs-devel] [PATCH v7 pxar proxmox-backup 00/69] fix #3174: improve file-level backup Christian Ebner
                   ` (54 preceding siblings ...)
  2024-05-27 14:33 ` [pbs-devel] [PATCH v7 proxmox-backup 55/69] client: pxar: opt encode cli exclude patterns as Prelude Christian Ebner
@ 2024-05-27 14:33 ` Christian Ebner
  2024-05-27 14:33 ` [pbs-devel] [PATCH v7 proxmox-backup 57/69] docs: file formats: describe split pxar archive file layout Christian Ebner
                   ` (13 subsequent siblings)
  69 siblings, 0 replies; 71+ messages in thread
From: Christian Ebner @ 2024-05-27 14:33 UTC (permalink / raw)
  To: pbs-devel

Do not list the pxar format version and the prelude entries in the
output of pxar list, since these are not regular entries.
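The filtering can be illustrated with a small sketch. It uses a simplified, hypothetical stand-in for `pxar::EntryKind` (the real enum has more variants and different payload types) and mirrors the `continue` added for `Version` and `Prelude` entries.

```rust
/// Simplified stand-in for `pxar::EntryKind`, reduced to what is needed
/// here; the real enum has more variants and different payloads.
#[derive(Debug)]
enum EntryKind {
    Version(u64),
    Prelude(Vec<u8>),
    Directory,
    File { size: u64 },
}

/// Return the paths that should show up in a listing, skipping the
/// format version and prelude entries, which are not filesystem entries.
fn listable_paths<'a>(entries: &[(&'a str, EntryKind)]) -> Vec<&'a str> {
    entries
        .iter()
        .filter(|(_, kind)| !matches!(kind, EntryKind::Version(_) | EntryKind::Prelude(_)))
        .map(|(path, _)| *path)
        .collect()
}
```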

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
changes since version 6:
- not present

 pxar-bin/src/main.rs | 1 +
 1 file changed, 1 insertion(+)

diff --git a/pxar-bin/src/main.rs b/pxar-bin/src/main.rs
index a3b6ec4c0..7efecd524 100644
--- a/pxar-bin/src/main.rs
+++ b/pxar-bin/src/main.rs
@@ -483,6 +483,7 @@ fn dump_archive(archive: String, payload_input: Option<String>) -> Result<(), Er
 
         if log::log_enabled!(log::Level::Debug) {
             match entry.kind() {
+                EntryKind::Version(_) | EntryKind::Prelude(_) => continue,
                 EntryKind::File {
                     payload_offset: Some(offset),
                     size,
-- 
2.39.2

* [pbs-devel] [PATCH v7 proxmox-backup 57/69] docs: file formats: describe split pxar archive file layout
  2024-05-27 14:32 [pbs-devel] [PATCH v7 pxar proxmox-backup 00/69] fix #3174: improve file-level backup Christian Ebner
                   ` (55 preceding siblings ...)
  2024-05-27 14:33 ` [pbs-devel] [PATCH v7 proxmox-backup 56/69] pxar: ignore version and prelude entries in listing Christian Ebner
@ 2024-05-27 14:33 ` Christian Ebner
  2024-05-27 14:33 ` [pbs-devel] [PATCH v7 proxmox-backup 58/69] docs: add section describing change detection mode Christian Ebner
                   ` (12 subsequent siblings)
  69 siblings, 0 replies; 71+ messages in thread
From: Christian Ebner @ 2024-05-27 14:33 UTC (permalink / raw)
  To: pbs-devel

Describe the file format layout of the pxar metadata archive and of the
corresponding pxar payload file.
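Each header in the payload file is described as a little-endian `[u8; 16]` consisting of a type hash and a size. A minimal parsing sketch follows; it assumes, as for the regular pxar headers, that the size field counts the 16 header bytes themselves. The `Header` struct and helper names are illustrative, and the actual type hash constants are not reproduced here.

```rust
/// Size of a pxar header in bytes: type hash (u64) + size (u64).
const HEADER_SIZE: usize = 16;

/// Illustrative header representation (not the pxar crate's type).
#[derive(Debug, PartialEq)]
struct Header {
    htype: u64,
    full_size: u64,
}

/// Decode a little-endian header from the start of `buf`, if long enough.
fn parse_header(buf: &[u8]) -> Option<Header> {
    let bytes: &[u8; HEADER_SIZE] = buf.get(..HEADER_SIZE)?.try_into().ok()?;
    let htype = u64::from_le_bytes(bytes[..8].try_into().ok()?);
    let full_size = u64::from_le_bytes(bytes[8..].try_into().ok()?);
    Some(Header { htype, full_size })
}

/// Payload size following the header, assuming `full_size` includes the
/// header itself; `None` for a malformed (too small) size.
fn content_size(header: &Header) -> Option<u64> {
    header.full_size.checked_sub(HEADER_SIZE as u64)
}
```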

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
changes since version 6:
- no changes

 docs/file-formats.rst         | 46 ++++++++++++++++++++++++++++++++
 docs/meta-format-overview.dot | 50 +++++++++++++++++++++++++++++++++++
 2 files changed, 96 insertions(+)
 create mode 100644 docs/meta-format-overview.dot

diff --git a/docs/file-formats.rst b/docs/file-formats.rst
index 43ecfefce..77d55b5ef 100644
--- a/docs/file-formats.rst
+++ b/docs/file-formats.rst
@@ -8,7 +8,53 @@ Proxmox File Archive Format (``.pxar``)
 
 .. graphviz:: pxar-format-overview.dot
 
+.. _pxar-meta-format:
 
+Proxmox File Archive Format - Meta (``.mpxar``)
+-----------------------------------------------
+
+Pxar metadata archive with the same structure as a regular pxar archive, except
+that regular file payloads are not contained within the archive itself, but are
+stored as payload references into the corresponding pxar payload (``.ppxar``)
+file.
+
+It can be used to look up all archive entries and their metadata without the
+size overhead introduced by the file payloads.
+
+.. graphviz:: meta-format-overview.dot
+
+.. _ppxar-format:
+
+Proxmox File Archive Format - Payload (``.ppxar``)
+--------------------------------------------------
+
+Pxar payload file storing regular file payloads to be referenced and accessed by
+the corresponding pxar metadata (``.mpxar``) archive. It contains a
+concatenation of regular file payloads, each prefixed by a ``PAYLOAD`` header.
+Referenced payload entries might additionally be separated by padding
+(full/partial payloads not referenced), introduced when reusing chunks of a
+previous backup run whose boundaries do not align with payload entry offsets.
+
+All headers are stored as little-endian.
+
+.. list-table::
+   :widths: auto
+
+   * - ``PAYLOAD_START_MARKER``
+     - header of ``[u8; 16]`` consisting of type hash and size;
+       marks start
+   * - ``PAYLOAD``
+     - header of ``[u8; 16]`` consisting of type hash and size;
+       referenced by metadata archive
+   * - Payload
+     - raw regular file payload
+   * - Padding
+     - partial/full unreferenced payloads, caused by unaligned chunk boundary
+   * - ...
+     - further concatenation of payload header, payload and padding
+   * - ``PAYLOAD_TAIL_MARKER``
+     - header of ``[u8; 16]`` consisting of type hash and size; marks end
+
 .. _data-blob-format:
 
 Data Blob Format (``.blob``)
diff --git a/docs/meta-format-overview.dot b/docs/meta-format-overview.dot
new file mode 100644
index 000000000..7eea4b55b
--- /dev/null
+++ b/docs/meta-format-overview.dot
@@ -0,0 +1,50 @@
+digraph g {
+graph [
+rankdir = "LR"
+fontname="Helvetica"
+];
+node [
+fontsize = "16"
+shape = "record"
+];
+edge [
+];
+
+"archive" [
+label = "archive.mpxar"
+shape = "record"
+];
+
+"rootdir" [
+label = "<fv>FORMAT_VERSION\l|PRELUDE\l|<f0>ENTRY\l|\{XATTR\}\* extended attribute list\l|\{ACL_USER\}\* USER ACL entries\l|\{ACL_GROUP\}\* GROUP ACL entries\l|\[ACL_GROUP_OBJ\] the ACL_GROUP_OBJ \l|\[ACL_DEFAULT\] the various default ACL fields\l|\{ACL_DEFAULT_USER\}\* USER ACL entries\l|\{ACL_DEFAULT_GROUP\}\* GROUP ACL entries\l|\[FCAPS\] file capability in Linux disk format\l|\[QUOTA_PROJECT_ID\] the ext4/xfs quota project ID\l|{<pl> PAYLOAD_REF|SYMLINK|DEVICE|{<de> \{DirectoryEntries\}\*|GOODBYE}}"
+shape = "record"
+];
+
+
+"entry" [
+label = "<f0> size: u64 = 64\l|type: u64 = ENTRY\l|feature_flags: u64\l|mode: u64\l|flags: u64\l|uid: u64\l|gid: u64\l|mtime: u64\l"
+labeljust = "l"
+shape = "record"
+];
+
+
+
+"direntry" [
+label = "<f0> FILENAME\l|{ENTRY\l|HARDLINK\l}"
+shape = "record"
+];
+
+"payloadrefentry" [
+label = "<f0> offset: u64\l|size: u64\l"
+shape = "record"
+];
+
+"archive" -> "rootdir":fv
+
+"rootdir":f0 -> "entry":f0
+
+"rootdir":de -> "direntry":f0
+
+"rootdir":pl -> "payloadrefentry":f0
+
+}
-- 
2.39.2

* [pbs-devel] [PATCH v7 proxmox-backup 58/69] docs: add section describing change detection mode
  2024-05-27 14:32 [pbs-devel] [PATCH v7 pxar proxmox-backup 00/69] fix #3174: improve file-level backup Christian Ebner
                   ` (56 preceding siblings ...)
  2024-05-27 14:33 ` [pbs-devel] [PATCH v7 proxmox-backup 57/69] docs: file formats: describe split pxar archive file layout Christian Ebner
@ 2024-05-27 14:33 ` Christian Ebner
  2024-05-27 14:33 ` [pbs-devel] [PATCH v7 proxmox-backup 59/69] test-suite: add detection mode change benchmark Christian Ebner
                   ` (11 subsequent siblings)
  69 siblings, 0 replies; 71+ messages in thread
From: Christian Ebner @ 2024-05-27 14:33 UTC (permalink / raw)
  To: pbs-devel

Describe the motivation and basic principle of the client's change
detection mode and show an example invocation.
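The two steps described in the documentation hunk below (metadata comparison, then a deferred reuse decision based on padding) can be sketched as follows. The field set, the `Decision` type, and the ratio-based threshold are assumptions for illustration only; the patch does not state the exact formula used by the client.

```rust
/// Hypothetical subset of the metadata compared for a regular file.
#[derive(Debug, Clone, PartialEq, Eq)]
struct FileMetadata {
    size: u64,
    file_type: u32,
    mode: u32,
    uid: u32,
    gid: u32,
    mtime_ns: i64,
    /// Serialized ACLs and extended attributes, compared byte-wise.
    acls_and_xattrs: Vec<u8>,
}

/// A file counts as unchanged only if an entry exists in the previous
/// metadata archive and all compared fields are identical.
fn is_unchanged(current: &FileMetadata, previous: Option<&FileMetadata>) -> bool {
    previous == Some(current)
}

#[derive(Debug, PartialEq)]
enum Decision {
    Reuse,
    Reencode,
}

/// `payload_bytes`: total payload size of the cached unchanged entries.
/// `chunk_bytes`: total size of the previous chunks needed to cover them.
fn reuse_decision(payload_bytes: u64, chunk_bytes: u64, max_padding_ratio: f64) -> Decision {
    if chunk_bytes == 0 {
        return Decision::Reencode;
    }
    // Chunk bytes not belonging to any cached payload end up as padding.
    let padding = chunk_bytes.saturating_sub(payload_bytes);
    if padding as f64 / chunk_bytes as f64 <= max_padding_ratio {
        Decision::Reuse
    } else {
        Decision::Reencode
    }
}
```

Deferring the decision until several entries are cached lets the padding be judged over a whole run of chunks rather than per file.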

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
changes since version 6:
- add more information on metadata being compared
- adapt and link from technical overview

 docs/backup-client.rst      | 45 +++++++++++++++++++++++++++++++++++++
 docs/technical-overview.rst |  3 +++
 2 files changed, 48 insertions(+)

diff --git a/docs/backup-client.rst b/docs/backup-client.rst
index 00a1abbb3..58fcd79f0 100644
--- a/docs/backup-client.rst
+++ b/docs/backup-client.rst
@@ -280,6 +280,51 @@ Multiple paths can be excluded like this:
 
     # proxmox-backup-client backup.pxar:./linux --exclude=/usr --exclude=/rust
 
+.. _client_change_detection_mode:
+
+Change Detection Mode
+~~~~~~~~~~~~~~~~~~~~~
+
+File-based backups containing a lot of data can take a long time, as the default
+behavior of the Proxmox backup client is to read all data and re-encode it.
+The encoded stream is split into variable-sized chunks for efficient
+deduplication; based on the chunk digest, the client decides whether a given
+chunk needs to be uploaded or can be indexed without upload, because it is
+already available on the server (and therefore deduplicated). For use cases
+where files do not change frequently, re-reading all data is often neither
+feasible nor desirable.
+
+The backup client's ``change-detection-mode`` can be switched from the default
+to ``metadata`` based detection to mitigate the limitations described above,
+instructing the client to avoid re-reading files with unchanged metadata
+whenever possible. In this mode, instead of a regular pxar archive, the backup
+snapshot is stored in two separate files: the ``mpxar`` containing the archive's
+metadata and the ``ppxar`` containing a concatenation of the file contents.
+This split allows metadata lookups without the overhead of the file contents.
+Setting ``change-detection-mode`` to ``data`` creates the same split archive as
+the ``metadata`` mode, but without using a previous reference, and therefore
+re-encodes all file payloads.
+
+When creating the backup archives, the current file metadata is compared to the
+one looked up in the previous ``mpxar`` archive. The comparison includes file
+size, file type, ownership and permission information, ACLs and attributes, and
+most importantly the file's mtime. For details, see the
+:ref:`pxar metadata archive format <pxar-meta-format>`.
+
+If unchanged, the entry is cached for possible re-use of its content chunks
+without re-reading, by indexing the already present chunks from the previous
+backup snapshot. Since a file might only partially re-use chunks (thereby
+introducing wasted space in the form of padding), the decision whether to re-use
+or re-encode the currently cached entries is deferred until enough information
+is available, by comparing the resulting padding against a threshold value.
+
+The following shows an example client invocation using the ``metadata``
+mode:
+
+.. code-block:: console
+
+    # proxmox-backup-client backup.pxar:./linux --change-detection-mode=metadata
+
 .. _client_encryption:
 
 Encryption
diff --git a/docs/technical-overview.rst b/docs/technical-overview.rst
index 89835a7cc..a8b1c7268 100644
--- a/docs/technical-overview.rst
+++ b/docs/technical-overview.rst
@@ -28,6 +28,9 @@ which are not chunked, e.g. the client log), or one or more indexes
 
 When uploading an index, the client first has to read the source data, chunk it
 and send the data as chunks with their identifying checksum to the server.
+When using the :ref:`change detection mode <client_change_detection_mode>`,
+payload chunks for unchanged files are reused from the previous snapshot,
+thereby avoiding re-reading the source data.
 
 If there is a previous Snapshot in the backup group, the client can first
 download the chunk list of the previous Snapshot. If it detects a chunk that
-- 
2.39.2

* [pbs-devel] [PATCH v7 proxmox-backup 59/69] test-suite: add detection mode change benchmark
  2024-05-27 14:32 [pbs-devel] [PATCH v7 pxar proxmox-backup 00/69] fix #3174: improve file-level backup Christian Ebner
                   ` (57 preceding siblings ...)
  2024-05-27 14:33 ` [pbs-devel] [PATCH v7 proxmox-backup 58/69] docs: add section describing change detection mode Christian Ebner
@ 2024-05-27 14:33 ` Christian Ebner
  2024-05-27 14:33 ` [pbs-devel] [PATCH v7 proxmox-backup 60/69] test-suite: Makefile: add debian package and related files Christian Ebner
                   ` (10 subsequent siblings)
  69 siblings, 0 replies; 71+ messages in thread
From: Christian Ebner @ 2024-05-27 14:33 UTC (permalink / raw)
  To: pbs-devel

Introduces the proxmox-backup-test-suite crate intended for
benchmarking and high-level, user-facing testing.

The initial code includes a benchmark intended for regression testing of
the proxmox-backup-client when using different change detection modes
during backup.
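The benchmark reports the sample mean and the sample standard deviation (with the n - 1 denominator) per mode, and combines both deviations as sqrt(s_meta² + s_reg²) for the difference of the averages, which assumes the two sets of runs are independent. A standalone sketch of these formulas, with hypothetical helper names, is shown below; it returns `None` for fewer than two samples, where the n - 1 denominator would be zero.

```rust
/// Sample mean and sample standard deviation (Bessel-corrected).
/// Returns `None` for fewer than two samples.
fn mean_and_sample_stddev(samples: &[f64]) -> Option<(f64, f64)> {
    if samples.len() < 2 {
        return None;
    }
    let n = samples.len() as f64;
    let mean = samples.iter().sum::<f64>() / n;
    let var = samples.iter().map(|x| (x - mean).powi(2)).sum::<f64>() / (n - 1.0);
    Some((mean, var.sqrt()))
}

/// Standard deviation of the difference of two independent means,
/// i.e. sqrt(s_a^2 + s_b^2).
fn delta_stddev(s_a: f64, s_b: f64) -> f64 {
    s_a.hypot(s_b)
}
```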

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
changes since version 6:
- no changes

 Cargo.toml                                    |   1 +
 proxmox-backup-test-suite/Cargo.toml          |  18 ++
 .../src/detection_mode_bench.rs               | 294 ++++++++++++++++++
 proxmox-backup-test-suite/src/main.rs         |  17 +
 4 files changed, 330 insertions(+)
 create mode 100644 proxmox-backup-test-suite/Cargo.toml
 create mode 100644 proxmox-backup-test-suite/src/detection_mode_bench.rs
 create mode 100644 proxmox-backup-test-suite/src/main.rs

diff --git a/Cargo.toml b/Cargo.toml
index 4119b3cac..e83c65b60 100644
--- a/Cargo.toml
+++ b/Cargo.toml
@@ -45,6 +45,7 @@ members = [
     "proxmox-restore-daemon",
 
     "pxar-bin",
+    "proxmox-backup-test-suite",
 ]
 
 [lib]
diff --git a/proxmox-backup-test-suite/Cargo.toml b/proxmox-backup-test-suite/Cargo.toml
new file mode 100644
index 000000000..3f899e9bc
--- /dev/null
+++ b/proxmox-backup-test-suite/Cargo.toml
@@ -0,0 +1,18 @@
+[package]
+name = "proxmox-backup-test-suite"
+version = "0.1.0"
+authors.workspace = true
+edition.workspace = true
+
+[dependencies]
+anyhow.workspace = true
+futures.workspace = true
+serde.workspace = true
+serde_json.workspace = true
+
+pbs-client.workspace = true
+pbs-key-config.workspace = true
+pbs-tools.workspace = true
+proxmox-async.workspace = true
+proxmox-router = { workspace = true, features = ["cli"] }
+proxmox-schema = { workspace = true, features = [ "api-macro" ] }
diff --git a/proxmox-backup-test-suite/src/detection_mode_bench.rs b/proxmox-backup-test-suite/src/detection_mode_bench.rs
new file mode 100644
index 000000000..9a3c76802
--- /dev/null
+++ b/proxmox-backup-test-suite/src/detection_mode_bench.rs
@@ -0,0 +1,294 @@
+use std::path::Path;
+use std::process::Command;
+use std::{thread, time};
+
+use anyhow::{bail, format_err, Error};
+use serde_json::Value;
+
+use pbs_client::{
+    tools::{complete_repository, key_source::KEYFILE_SCHEMA, REPO_URL_SCHEMA},
+    BACKUP_SOURCE_SCHEMA,
+};
+use pbs_tools::json;
+use proxmox_router::cli::*;
+use proxmox_schema::api;
+
+const DEFAULT_NUMBER_OF_RUNS: u64 = 5;
+// Homepage https://cocodataset.org/
+const COCO_DATASET_SRC_URL: &str = "http://images.cocodataset.org/zips/unlabeled2017.zip";
+// Homepage https://kernel.org/
+const LINUX_GIT_REPOSITORY: &str =
+    "git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git";
+const LINUX_GIT_TAG: &str = "v6.5.5";
+
+pub(crate) fn detection_mode_bench_mgtm_cli() -> CliCommandMap {
+    let run_cmd_def = CliCommand::new(&API_METHOD_DETECTION_MODE_BENCH_RUN)
+        .arg_param(&["backupspec"])
+        .completion_cb("repository", complete_repository)
+        .completion_cb("keyfile", complete_file_name);
+
+    let prepare_cmd_def = CliCommand::new(&API_METHOD_DETECTION_MODE_BENCH_PREPARE);
+    CliCommandMap::new()
+        .insert("prepare", prepare_cmd_def)
+        .insert("run", run_cmd_def)
+}
+
+#[api(
+   input: {
+       properties: {
+           backupspec: {
+               type: Array,
+               description: "List of backup source specifications ([<label.ext>:<path>] ...)",
+               items: {
+                   schema: BACKUP_SOURCE_SCHEMA,
+               }
+           },
+           repository: {
+               schema: REPO_URL_SCHEMA,
+               optional: true,
+           },
+           keyfile: {
+               schema: KEYFILE_SCHEMA,
+               optional: true,
+           },
+           "number-of-runs": {
+               description: "Number of times to repeat the run",
+               type: Integer,
+               optional: true,
+           },
+       }
+   }
+)]
+/// Run benchmark to compare performance for backups using different change detection modes.
+fn detection_mode_bench_run(param: Value) -> Result<(), Error> {
+    let mut pbc = Command::new("proxmox-backup-client");
+    pbc.arg("backup");
+
+    let backupspec_list = json::required_array_param(&param, "backupspec")?;
+    for backupspec in backupspec_list {
+        let arg = backupspec
+            .as_str()
+            .ok_or_else(|| format_err!("failed to parse backupspec"))?;
+        pbc.arg(arg);
+    }
+
+    if let Some(repo) = param["repository"].as_str() {
+        pbc.arg("--repository");
+        pbc.arg::<&str>(repo);
+    }
+
+    if let Some(keyfile) = param["keyfile"].as_str() {
+        pbc.arg("--keyfile");
+        pbc.arg::<&str>(keyfile);
+    }
+
+    let number_of_runs = match param["number-of-runs"].as_u64() {
+        Some(n) => n,
+        None => DEFAULT_NUMBER_OF_RUNS,
+    };
+    if number_of_runs < 2 {
+        bail!("Number of runs must be greater than 1, aborting.");
+    }
+
+    // The first run is an initial run to make sure all chunks are already present, and to
+    // reduce side effects of filesystem caches etc.
+    let _stats_initial = do_run(&mut pbc, 1)?;
+
+    println!("\nStarting benchmarking backups with regular detection mode...\n");
+    let stats_reg = do_run(&mut pbc, number_of_runs)?;
+
+    // Make sure to have a valid reference with catalog format version 2
+    pbc.arg("--change-detection-mode=metadata");
+    let _stats_initial = do_run(&mut pbc, 1)?;
+
+    println!("\nStarting benchmarking backups with metadata detection mode...\n");
+    let stats_meta = do_run(&mut pbc, number_of_runs)?;
+
+    println!("\nCompleted benchmark with {number_of_runs} runs for each tested mode.");
+    println!("\nCompleted regular backup with:");
+    println!("Total runtime: {:.2} s", stats_reg.total);
+    println!("Average: {:.2} ± {:.2} s", stats_reg.avg, stats_reg.stddev);
+    println!("Min: {:.2} s", stats_reg.min);
+    println!("Max: {:.2} s", stats_reg.max);
+
+    println!("\nCompleted metadata detection mode backup with:");
+    println!("Total runtime: {:.2} s", stats_meta.total);
+    println!(
+        "Average: {:.2} ± {:.2} s",
+        stats_meta.avg, stats_meta.stddev
+    );
+    println!("Min: {:.2} s", stats_meta.min);
+    println!("Max: {:.2} s", stats_meta.max);
+
+    let diff_stddev =
+        ((stats_meta.stddev * stats_meta.stddev) + (stats_reg.stddev * stats_reg.stddev)).sqrt();
+    println!("\nDifferences (metadata based - regular):");
+    println!(
+        "Delta total runtime: {:.2} s ({:.2} %)",
+        stats_meta.total - stats_reg.total,
+        100.0 * (stats_meta.total / stats_reg.total - 1.0),
+    );
+    println!(
+        "Delta average: {:.2} ± {:.2} s ({:.2} %)",
+        stats_meta.avg - stats_reg.avg,
+        diff_stddev,
+        100.0 * (stats_meta.avg / stats_reg.avg - 1.0),
+    );
+    println!(
+        "Delta min: {:.2} s ({:.2} %)",
+        stats_meta.min - stats_reg.min,
+        100.0 * (stats_meta.min / stats_reg.min - 1.0),
+    );
+    println!(
+        "Delta max: {:.2} s ({:.2} %)",
+        stats_meta.max - stats_reg.max,
+        100.0 * (stats_meta.max / stats_reg.max - 1.0),
+    );
+
+    Ok(())
+}
+
+fn do_run(cmd: &mut Command, n_runs: u64) -> Result<Statistics, Error> {
+    let mut timings = Vec::with_capacity(n_runs as usize);
+    for iteration in 1..=n_runs {
+        // Avoid snapshot timestamp collisions between consecutive runs
+        thread::sleep(time::Duration::from_secs(1));
+        let start = time::Instant::now();
+        let mut child = cmd.spawn()?;
+        let exit_code = child.wait()?;
+        let elapsed = start.elapsed();
+        timings.push(elapsed);
+        if !exit_code.success() {
+            bail!("Run number {iteration} of {n_runs} failed, aborting.");
+        }
+    }
+
+    Ok(statistics(timings))
+}
+
+struct Statistics {
+    total: f64,
+    avg: f64,
+    stddev: f64,
+    min: f64,
+    max: f64,
+}
+
+fn statistics(timings: Vec<std::time::Duration>) -> Statistics {
+    let total = timings
+        .iter()
+        .fold(0f64, |sum, time| sum + time.as_secs_f64());
+    let avg = total / timings.len() as f64;
+    let var = 1f64 / (timings.len() - 1) as f64
+        * timings.iter().fold(0f64, |sq_sum, time| {
+            let diff = time.as_secs_f64() - avg;
+            sq_sum + diff * diff
+        });
+    let stddev = var.sqrt();
+    let min = timings.iter().min().unwrap().as_secs_f64();
+    let max = timings.iter().max().unwrap().as_secs_f64();
+
+    Statistics {
+        total,
+        avg,
+        stddev,
+        min,
+        max,
+    }
+}
+
+#[api(
+    input: {
+        properties: {
+            target: {
+                description: "target path to prepare test data.",
+            },
+        },
+    },
+)]
+/// Prepare files required for detection mode backup benchmarks.
+fn detection_mode_bench_prepare(target: String) -> Result<(), Error> {
+    let linux_repo_target = format!("{target}/linux");
+    let coco_dataset_target = format!("{target}/coco");
+    git_clone(LINUX_GIT_REPOSITORY, linux_repo_target.as_str())?;
+    git_checkout(LINUX_GIT_TAG, linux_repo_target.as_str())?;
+    wget_download(COCO_DATASET_SRC_URL, coco_dataset_target.as_str())?;
+
+    Ok(())
+}
+
+fn git_clone(repo: &str, target: &str) -> Result<(), Error> {
+    println!("Calling git clone for '{repo}'.");
+    let target_git = format!("{target}/.git");
+    let path = Path::new(&target_git);
+    if let Ok(true) = path.try_exists() {
+        println!("Target '{target}' already contains a git repository, skip.");
+        return Ok(());
+    }
+
+    let mut git = Command::new("git");
+    git.args(["clone", repo, target]);
+
+    let mut child = git.spawn()?;
+    let exit_code = child.wait()?;
+    if exit_code.success() {
+        println!("git clone finished with success.");
+    } else {
+        bail!("git clone failed for '{target}'.");
+    }
+
+    Ok(())
+}
+
+fn git_checkout(checkout_target: &str, target: &str) -> Result<(), Error> {
+    println!("Calling git checkout '{checkout_target}'.");
+    let mut git = Command::new("git");
+    git.args(["-C", target, "checkout", checkout_target]);
+
+    let mut child = git.spawn()?;
+    let exit_code = child.wait()?;
+    if exit_code.success() {
+        println!("git checkout finished with success.");
+    } else {
+        bail!("git checkout '{checkout_target}' failed for '{target}'.");
+    }
+    Ok(())
+}
+
+fn wget_download(source_url: &str, target: &str) -> Result<(), Error> {
+    let path = Path::new(&target);
+    if let Ok(true) = path.try_exists() {
+        println!("Target '{target}' already exists, skip.");
+        return Ok(());
+    }
+    let zip = format!("{}/unlabeled2017.zip", target);
+    let path = Path::new(&zip);
+    if !path.try_exists()? {
+        println!("Download archive using wget from '{source_url}' to '{target}'.");
+        let mut wget = Command::new("wget");
+        wget.args(["-P", target, source_url]);
+
+        let mut child = wget.spawn()?;
+        let exit_code = child.wait()?;
+        if exit_code.success() {
+            println!("Download finished with success.");
+        } else {
+            bail!("Failed to download '{source_url}' to '{target}'.");
+        }
+        // Fall through to extract the freshly downloaded archive.
+    } else {
+        println!("Target '{target}' already contains download, skip download.");
+    }
+
+    let mut unzip = Command::new("unzip");
+    unzip.args([&zip, "-d", target]);
+
+    let mut child = unzip.spawn()?;
+    let exit_code = child.wait()?;
+    if exit_code.success() {
+        println!("Extracting zip archive finished with success.");
+    } else {
+        bail!("Failed to extract zip archive '{zip}' to '{target}'.");
+    }
+    Ok(())
+}
diff --git a/proxmox-backup-test-suite/src/main.rs b/proxmox-backup-test-suite/src/main.rs
new file mode 100644
index 000000000..0a5b436a8
--- /dev/null
+++ b/proxmox-backup-test-suite/src/main.rs
@@ -0,0 +1,17 @@
+use proxmox_router::cli::*;
+
+mod detection_mode_bench;
+
+fn main() {
+    let cmd_def = CliCommandMap::new().insert(
+        "detection-mode-bench",
+        detection_mode_bench::detection_mode_bench_mgtm_cli(),
+    );
+
+    let rpcenv = CliEnvironment::new();
+    run_cli_command(
+        cmd_def,
+        rpcenv,
+        Some(|future| proxmox_async::runtime::main(future)),
+    );
+}
-- 
2.39.2

* [pbs-devel] [PATCH v7 proxmox-backup 60/69] test-suite: Makefile: add debian package and related files
  2024-05-27 14:32 [pbs-devel] [PATCH v7 pxar proxmox-backup 00/69] fix #3174: improve file-level backup Christian Ebner
                   ` (58 preceding siblings ...)
  2024-05-27 14:33 ` [pbs-devel] [PATCH v7 proxmox-backup 59/69] test-suite: add detection mode change benchmark Christian Ebner
@ 2024-05-27 14:33 ` Christian Ebner
  2024-05-27 14:33 ` [pbs-devel] [PATCH v7 proxmox-backup 61/69] datastore: chunker: add Chunker trait Christian Ebner
                   ` (9 subsequent siblings)
  69 siblings, 0 replies; 71+ messages in thread
From: Christian Ebner @ 2024-05-27 14:33 UTC (permalink / raw)
  To: pbs-devel

Adds the required Makefile and debian packaging entries to package
the test suite binary as a standalone Debian package.

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
changes since version 6:
- move binary into its own debian package

 Makefile                                       | 18 +++++++++++-------
 debian/control                                 |  7 +++++++
 debian/proxmox-backup-client.bash-completion   |  1 +
 debian/proxmox-backup-test-suite.bc            |  8 ++++++++
 debian/proxmox-backup-test-suite.install       |  3 +++
 docs/Makefile                                  |  2 ++
 docs/command-line-tools.rst                    |  5 +++++
 docs/command-syntax.rst                        |  4 ++++
 docs/conf.py                                   |  1 +
 docs/proxmox-backup-test-suite/description.rst |  2 ++
 docs/proxmox-backup-test-suite/man1.rst        | 17 +++++++++++++++++
 zsh-completions/_proxmox-backup-test-suite     | 13 +++++++++++++
 12 files changed, 74 insertions(+), 7 deletions(-)
 create mode 100644 debian/proxmox-backup-test-suite.bc
 create mode 100644 debian/proxmox-backup-test-suite.install
 create mode 100644 docs/proxmox-backup-test-suite/description.rst
 create mode 100644 docs/proxmox-backup-test-suite/man1.rst
 create mode 100644 zsh-completions/_proxmox-backup-test-suite

diff --git a/Makefile b/Makefile
index 03e938767..8529363ce 100644
--- a/Makefile
+++ b/Makefile
@@ -8,11 +8,12 @@ SUBDIRS := etc www docs templates
 
 # Binaries usable by users
 USR_BIN := \
-	proxmox-backup-client 	\
-	proxmox-file-restore	\
-	pxar			\
-	proxmox-tape		\
-	pmtx			\
+	proxmox-backup-client 		\
+	proxmox-backup-test-suite	\
+	proxmox-file-restore		\
+	pxar				\
+	proxmox-tape			\
+	pmtx				\
 	pmt
 
 # Binaries usable by admins
@@ -60,9 +61,10 @@ CLIENT_DBG_DEB=$(PACKAGE)-client-dbgsym_$(DEB_VERSION)_$(ARCH).deb
 RESTORE_DEB=proxmox-backup-file-restore_$(DEB_VERSION)_$(ARCH).deb
 RESTORE_DBG_DEB=proxmox-backup-file-restore-dbgsym_$(DEB_VERSION)_$(ARCH).deb
 DOC_DEB=$(PACKAGE)-docs_$(DEB_VERSION)_all.deb
+TEST_SUITE_DEB=$(PACKAGE)-test-suite_$(DEB_VERSION)_$(ARCH).deb
 
 DEBS=$(SERVER_DEB) $(SERVER_DBG_DEB) $(CLIENT_DEB) $(CLIENT_DBG_DEB) \
-     $(RESTORE_DEB) $(RESTORE_DBG_DEB)
+     $(RESTORE_DEB) $(RESTORE_DBG_DEB) $(TEST_SUITE_DEB)
 
 DSC = rust-$(PACKAGE)_$(DEB_VERSION).dsc
 
@@ -165,6 +167,8 @@ $(COMPILED_BINS) $(COMPILEDIR)/dump-catalog-shell-cli $(COMPILEDIR)/docgen: .do-
 	    --bin proxmox-backup-client \
 	    --bin dump-catalog-shell-cli \
 	    --bin proxmox-backup-debug \
+	    --package proxmox-backup-test-suite \
+	    --bin proxmox-backup-test-suite \
 	    --package proxmox-file-restore \
 	    --bin proxmox-file-restore \
 	    --package pxar-bin \
@@ -218,7 +222,7 @@ upload: UPLOAD_DIST ?= $(DEB_DISTRIBUTION)
 upload: $(SERVER_DEB) $(CLIENT_DEB) $(RESTORE_DEB) $(DOC_DEB)
 	# check if working directory is clean
 	git diff --exit-code --stat && git diff --exit-code --stat --staged
-	tar cf - $(SERVER_DEB) $(SERVER_DBG_DEB) $(DOC_DEB) $(CLIENT_DEB) $(CLIENT_DBG_DEB) \
+	tar cf - $(SERVER_DEB) $(SERVER_DBG_DEB) $(DOC_DEB) $(CLIENT_DEB) $(CLIENT_DBG_DEB) $(TEST_SUITE_DEB) \
 	  | ssh -X repoman@repo.proxmox.com upload --product pbs --dist $(UPLOAD_DIST)
 	tar cf - $(CLIENT_DEB) $(CLIENT_DBG_DEB) | ssh -X repoman@repo.proxmox.com upload --product "pve,pmg,pbs-client" --dist $(UPLOAD_DIST)
 	tar cf - $(RESTORE_DEB) $(RESTORE_DBG_DEB) | ssh -X repoman@repo.proxmox.com upload --product "pve" --dist $(UPLOAD_DIST)
diff --git a/debian/control b/debian/control
index a7f8f327b..38720b983 100644
--- a/debian/control
+++ b/debian/control
@@ -216,3 +216,10 @@ Description: Proxmox Backup single file restore tools for pxar and block device
  This package contains the Proxmox Backup single file restore client for
  restoring individual files and folders from both host/container and VM/block
  device backups. It includes a block device restore driver using QEMU.
+
+Package: proxmox-backup-test-suite
+Architecture: any
+Depends: proxmox-backup-client, ${shlibs:Depends}
+Description: Proxmox Backup Test Suite tool
+ This package contains the Proxmox Backup Test Suite, which provides a CLI tool
+ to run performance tests.
diff --git a/debian/proxmox-backup-client.bash-completion b/debian/proxmox-backup-client.bash-completion
index 437360175..c4ff02ae6 100644
--- a/debian/proxmox-backup-client.bash-completion
+++ b/debian/proxmox-backup-client.bash-completion
@@ -1,2 +1,3 @@
 debian/proxmox-backup-client.bc proxmox-backup-client
+debian/proxmox-backup-test-suite.bc proxmox-backup-test-suite
 debian/pxar.bc pxar
diff --git a/debian/proxmox-backup-test-suite.bc b/debian/proxmox-backup-test-suite.bc
new file mode 100644
index 000000000..2686d7eaa
--- /dev/null
+++ b/debian/proxmox-backup-test-suite.bc
@@ -0,0 +1,8 @@
+# proxmox-backup-test-suite bash completion
+
+# see http://tiswww.case.edu/php/chet/bash/FAQ
+# and __ltrim_colon_completions() in /usr/share/bash-completion/bash_completion
+# this modifies global var, but I found no better way
+COMP_WORDBREAKS=${COMP_WORDBREAKS//:}
+
+complete -C 'proxmox-backup-test-suite bashcomplete' proxmox-backup-test-suite
diff --git a/debian/proxmox-backup-test-suite.install b/debian/proxmox-backup-test-suite.install
new file mode 100644
index 000000000..e0cb31ac6
--- /dev/null
+++ b/debian/proxmox-backup-test-suite.install
@@ -0,0 +1,3 @@
+usr/bin/proxmox-backup-test-suite
+usr/share/man/man1/proxmox-backup-test-suite.1
+usr/share/zsh/vendor-completions/_proxmox-backup-test-suite
diff --git a/docs/Makefile b/docs/Makefile
index d6c61c86e..014739f69 100644
--- a/docs/Makefile
+++ b/docs/Makefile
@@ -7,6 +7,7 @@ GENERATED_SYNOPSIS := 						\
 	proxmox-backup-manager/synopsis.rst			\
 	proxmox-backup-debug/synopsis.rst			\
 	proxmox-file-restore/synopsis.rst			\
+	proxmox-backup-test-suite/synopsis.rst			\
 	pxar/synopsis.rst					\
 	pmtx/synopsis.rst					\
 	pmt/synopsis.rst					\
@@ -33,6 +34,7 @@ MAN1_PAGES := 				\
 	proxmox-backup-manager.1	\
 	proxmox-file-restore.1		\
 	proxmox-backup-debug.1		\
+	proxmox-backup-test-suite.1	\
 	pbs2to3.1			\
 
 MAN5_PAGES :=				\
diff --git a/docs/command-line-tools.rst b/docs/command-line-tools.rst
index 0cac17c8b..3655b7c8c 100644
--- a/docs/command-line-tools.rst
+++ b/docs/command-line-tools.rst
@@ -40,3 +40,8 @@ Command-line Tools
 ~~~~~~~~~~~~~~~~~~~~~~~~
 
 .. include:: proxmox-backup-debug/description.rst
+
+``proxmox-backup-test-suite``
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. include:: proxmox-backup-test-suite/description.rst
diff --git a/docs/command-syntax.rst b/docs/command-syntax.rst
index 9657557d1..bfaf635a1 100644
--- a/docs/command-syntax.rst
+++ b/docs/command-syntax.rst
@@ -65,3 +65,7 @@ The following commands are available in an interactive restore shell:
 ``proxmox-backup-debug``
 ------------------------
 .. include:: proxmox-backup-debug/synopsis.rst
+
+``proxmox-backup-test-suite``
+-----------------------------
+.. include:: proxmox-backup-test-suite/synopsis.rst
diff --git a/docs/conf.py b/docs/conf.py
index fba726295..876e53479 100644
--- a/docs/conf.py
+++ b/docs/conf.py
@@ -98,6 +98,7 @@ man_pages = [
     ('proxmox-backup-proxy/man1', 'proxmox-backup-proxy', 'Proxmox Backup Public API Server', [author], 1),
     ('proxmox-backup/man1', 'proxmox-backup', 'Proxmox Backup Local API Server', [author], 1),
     ('proxmox-file-restore/man1', 'proxmox-file-restore', 'CLI tool for restoring files and directories from Proxmox Backup Server archives', [author], 1),
+    ('proxmox-backup-test-suite/man1', 'proxmox-backup-test-suite', 'CLI tool for performing performance benchmarks', [author], 1),
     ('proxmox-tape/man1', 'proxmox-tape', 'Proxmox Tape Backup CLI Tool', [author], 1),
     ('pxar/man1', 'pxar', 'Proxmox File Archive CLI Tool', [author], 1),
     ('pmt/man1', 'pmt', 'Control Linux Tape Devices', [author], 1),
diff --git a/docs/proxmox-backup-test-suite/description.rst b/docs/proxmox-backup-test-suite/description.rst
new file mode 100644
index 000000000..b99c29adf
--- /dev/null
+++ b/docs/proxmox-backup-test-suite/description.rst
@@ -0,0 +1,2 @@
+Command-line tool for running performance benchmarks.
+
diff --git a/docs/proxmox-backup-test-suite/man1.rst b/docs/proxmox-backup-test-suite/man1.rst
new file mode 100644
index 000000000..2e57423c0
--- /dev/null
+++ b/docs/proxmox-backup-test-suite/man1.rst
@@ -0,0 +1,17 @@
+:orphan:
+
+=========================
+proxmox-backup-test-suite
+=========================
+
+Synopsis
+========
+
+.. include:: synopsis.rst
+
+Description
+============
+
+.. include:: description.rst
+
+.. include:: ../pbs-copyright.rst
diff --git a/zsh-completions/_proxmox-backup-test-suite b/zsh-completions/_proxmox-backup-test-suite
new file mode 100644
index 000000000..72ebcea5f
--- /dev/null
+++ b/zsh-completions/_proxmox-backup-test-suite
@@ -0,0 +1,13 @@
+#compdef _proxmox-backup-test-suite() proxmox-backup-test-suite
+
+function _proxmox-backup-test-suite() {
+    local cwords line point cmd curr prev
+    cwords=${#words[@]}
+    line=$words
+    point=${#line}
+    cmd=${words[1]}
+    curr=${words[cwords]}
+    prev=${words[cwords-1]}
+    compadd -- $(COMP_CWORD="$cwords" COMP_LINE="$line" COMP_POINT="$point" \
+        proxmox-backup-test-suite bashcomplete "$cmd" "$curr" "$prev")
+}
-- 
2.39.2




* [pbs-devel] [PATCH v7 proxmox-backup 61/69] datastore: chunker: add Chunker trait
  2024-05-27 14:32 [pbs-devel] [PATCH v7 pxar proxmox-backup 00/69] fix #3174: improve file-level backup Christian Ebner
                   ` (59 preceding siblings ...)
  2024-05-27 14:33 ` [pbs-devel] [PATCH v7 proxmox-backup 60/69] test-suite: Makefile: add debian package and related files Christian Ebner
@ 2024-05-27 14:33 ` Christian Ebner
  2024-05-27 14:33 ` [pbs-devel] [PATCH v7 proxmox-backup 62/69] datastore: chunker: implement chunker for payload stream Christian Ebner
                   ` (8 subsequent siblings)
  69 siblings, 0 replies; 71+ messages in thread
From: Christian Ebner @ 2024-05-27 14:33 UTC (permalink / raw)
  To: pbs-devel

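Add the Chunker trait and rename the current Chunker struct to
ChunkerImpl, which implements the trait. This allows using different
chunker implementations via dynamic dispatch and prepares for a
dedicated payload chunker. The scan method now also receives a
Context with the bytes already consumed by the chunk stream and the
currently buffered size; to provide it, the consumed counter moves
from InjectionData into ChunkStream.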
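A minimal sketch of what the trait enables, assuming only the trait shape shown in the diff below: a chunker is held as a `Box<dyn Chunker + Send>` and fed data block by block, where `scan` returns the boundary offset within the given data or 0 if more data is needed. `FixedChunker` and `split` are hypothetical helpers for illustration; `FixedChunker` is a trivial fixed-size splitter, not the Buzhash-based `ChunkerImpl`.

```rust
/// Mirrors the `Context` struct added by this patch (field names from the diff).
#[derive(Default)]
pub struct Context {
    /// Bytes already consumed by the chunk stream
    pub base: u64,
    /// Total bytes currently buffered
    pub total: u64,
}

/// Same shape as the `Chunker` trait introduced in pbs-datastore/src/chunker.rs.
pub trait Chunker {
    fn scan(&mut self, data: &[u8], ctx: &Context) -> usize;
    fn reset(&mut self);
}

/// Hypothetical fixed-size chunker, used only to illustrate dynamic dispatch;
/// it is NOT the Buzhash-based `ChunkerImpl`.
pub struct FixedChunker {
    chunk_size: usize,
    pos: usize, // bytes seen since the last boundary
}

impl FixedChunker {
    pub fn new(chunk_size: usize) -> Self {
        Self { chunk_size, pos: 0 }
    }
}

impl Chunker for FixedChunker {
    /// Returns the boundary offset within `data`, or 0 if more data is needed.
    fn scan(&mut self, data: &[u8], _ctx: &Context) -> usize {
        let missing = self.chunk_size - self.pos;
        if data.len() >= missing {
            self.pos = 0;
            missing
        } else {
            self.pos += data.len();
            0
        }
    }

    fn reset(&mut self) {
        self.pos = 0;
    }
}

/// Hypothetical helper: feeds `input` to the chunker in blocks of `block` bytes
/// and returns the resulting chunk sizes. The chunker is a trait object, like
/// the `Box<dyn Chunker + Send>` now stored in ChunkStream.
pub fn split(input: &[u8], block: usize, mut chunker: Box<dyn Chunker + Send>) -> Vec<usize> {
    assert!(block > 0);
    chunker.reset(); // start from a clean state
    let ctx = Context::default();
    let mut sizes = Vec::new();
    let mut chunk_start = 0; // offset where the current chunk begins
    let mut pos = 0; // next offset to scan
    while pos < input.len() {
        let end = (pos + block).min(input.len());
        let k = chunker.scan(&input[pos..end], &ctx);
        if k == 0 {
            pos = end; // no boundary in this block, continue with the next one
        } else {
            pos += k;
            sizes.push(pos - chunk_start);
            chunk_start = pos;
        }
    }
    if chunk_start < input.len() {
        sizes.push(input.len() - chunk_start); // trailing partial chunk
    }
    sizes
}
```

Swapping in another implementation only requires a different `Box::new(...)` at the call site, which is what allows a dedicated payload chunker to be plugged into the chunk stream later.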

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
changes since version 6:
- no changes

 examples/test_chunk_size.rs        |  9 +--
 examples/test_chunk_speed.rs       |  7 ++-
 pbs-client/src/chunk_stream.rs     | 37 ++++++------
 pbs-datastore/src/chunker.rs       | 95 ++++++++++++++++++------------
 pbs-datastore/src/dynamic_index.rs |  9 +--
 pbs-datastore/src/lib.rs           |  2 +-
 6 files changed, 91 insertions(+), 68 deletions(-)

diff --git a/examples/test_chunk_size.rs b/examples/test_chunk_size.rs
index a01a5e640..2ebc22f64 100644
--- a/examples/test_chunk_size.rs
+++ b/examples/test_chunk_size.rs
@@ -5,10 +5,10 @@ extern crate proxmox_backup;
 use anyhow::Error;
 use std::io::{Read, Write};
 
-use pbs_datastore::Chunker;
+use pbs_datastore::{Chunker, ChunkerImpl};
 
 struct ChunkWriter {
-    chunker: Chunker,
+    chunker: ChunkerImpl,
     last_chunk: usize,
     chunk_offset: usize,
 
@@ -23,7 +23,7 @@ struct ChunkWriter {
 impl ChunkWriter {
     fn new(chunk_size: usize) -> Self {
         ChunkWriter {
-            chunker: Chunker::new(chunk_size),
+            chunker: ChunkerImpl::new(chunk_size),
             last_chunk: 0,
             chunk_offset: 0,
             chunk_count: 0,
@@ -69,7 +69,8 @@ impl Write for ChunkWriter {
     fn write(&mut self, data: &[u8]) -> std::result::Result<usize, std::io::Error> {
         let chunker = &mut self.chunker;
 
-        let pos = chunker.scan(data);
+        let ctx = pbs_datastore::chunker::Context::default();
+        let pos = chunker.scan(data, &ctx);
 
         if pos > 0 {
             self.chunk_offset += pos;
diff --git a/examples/test_chunk_speed.rs b/examples/test_chunk_speed.rs
index 37e13e0de..2d79604ab 100644
--- a/examples/test_chunk_speed.rs
+++ b/examples/test_chunk_speed.rs
@@ -1,6 +1,6 @@
 extern crate proxmox_backup;
 
-use pbs_datastore::Chunker;
+use pbs_datastore::{Chunker, ChunkerImpl};
 
 fn main() {
     let mut buffer = Vec::new();
@@ -12,7 +12,7 @@ fn main() {
             buffer.push(byte);
         }
     }
-    let mut chunker = Chunker::new(64 * 1024);
+    let mut chunker = ChunkerImpl::new(64 * 1024);
 
     let count = 5;
 
@@ -23,8 +23,9 @@ fn main() {
     for _i in 0..count {
         let mut pos = 0;
         let mut _last = 0;
+        let ctx = pbs_datastore::chunker::Context::default();
         while pos < buffer.len() {
-            let k = chunker.scan(&buffer[pos..]);
+            let k = chunker.scan(&buffer[pos..], &ctx);
             if k == 0 {
                 //println!("LAST {}", pos);
                 break;
diff --git a/pbs-client/src/chunk_stream.rs b/pbs-client/src/chunk_stream.rs
index 87a018d50..84158a2c9 100644
--- a/pbs-client/src/chunk_stream.rs
+++ b/pbs-client/src/chunk_stream.rs
@@ -7,7 +7,7 @@ use bytes::BytesMut;
 use futures::ready;
 use futures::stream::{Stream, TryStream};
 
-use pbs_datastore::Chunker;
+use pbs_datastore::{Chunker, ChunkerImpl};
 
 use crate::inject_reused_chunks::InjectChunks;
 
@@ -16,7 +16,6 @@ pub struct InjectionData {
     boundaries: mpsc::Receiver<InjectChunks>,
     next_boundary: Option<InjectChunks>,
     injections: mpsc::Sender<InjectChunks>,
-    consumed: u64,
 }
 
 impl InjectionData {
@@ -28,7 +27,6 @@ impl InjectionData {
             boundaries,
             next_boundary: None,
             injections,
-            consumed: 0,
         }
     }
 }
@@ -36,19 +34,22 @@ impl InjectionData {
 /// Split input stream into dynamic sized chunks
 pub struct ChunkStream<S: Unpin> {
     input: S,
-    chunker: Chunker,
+    chunker: Box<dyn Chunker + Send>,
     buffer: BytesMut,
     scan_pos: usize,
+    consumed: u64,
     injection_data: Option<InjectionData>,
 }
 
 impl<S: Unpin> ChunkStream<S> {
     pub fn new(input: S, chunk_size: Option<usize>, injection_data: Option<InjectionData>) -> Self {
+        let chunk_size = chunk_size.unwrap_or(4 * 1024 * 1024);
         Self {
             input,
-            chunker: Chunker::new(chunk_size.unwrap_or(4 * 1024 * 1024)),
+            chunker: Box::new(ChunkerImpl::new(chunk_size)),
             buffer: BytesMut::new(),
             scan_pos: 0,
+            consumed: 0,
             injection_data,
         }
     }
@@ -68,11 +69,15 @@ where
         let this = self.get_mut();
 
         loop {
+            let ctx = pbs_datastore::chunker::Context {
+                base: this.consumed,
+                total: this.buffer.len() as u64,
+            };
+
             if let Some(InjectionData {
                 boundaries,
                 next_boundary,
                 injections,
-                consumed,
             }) = this.injection_data.as_mut()
             {
                 if next_boundary.is_none() {
@@ -84,29 +89,29 @@ where
                 if let Some(inject) = next_boundary.take() {
                     // require forced boundary, lookup next regular boundary
                     let pos = if this.scan_pos < this.buffer.len() {
-                        this.chunker.scan(&this.buffer[this.scan_pos..])
+                        this.chunker.scan(&this.buffer[this.scan_pos..], &ctx)
                     } else {
                         0
                     };
 
                     let chunk_boundary = if pos == 0 {
-                        *consumed + this.buffer.len() as u64
+                        this.consumed + this.buffer.len() as u64
                     } else {
-                        *consumed + (this.scan_pos + pos) as u64
+                        this.consumed + (this.scan_pos + pos) as u64
                     };
 
                     if inject.boundary <= chunk_boundary {
                         // forced boundary is before next boundary, force within current buffer
-                        let chunk_size = (inject.boundary - *consumed) as usize;
+                        let chunk_size = (inject.boundary - this.consumed) as usize;
                         let raw_chunk = this.buffer.split_to(chunk_size);
                         this.chunker.reset();
                         this.scan_pos = 0;
 
-                        *consumed += chunk_size as u64;
+                        this.consumed += chunk_size as u64;
 
                         // add the size of the injected chunks to consumed, so chunk stream offsets
                         // are in sync with the rest of the archive.
-                        *consumed += inject.size as u64;
+                        this.consumed += inject.size as u64;
 
                         injections.send(inject).unwrap();
 
@@ -118,7 +123,7 @@ where
                         // forced boundary is after next boundary, split off chunk from buffer
                         let chunk_size = this.scan_pos + pos;
                         let raw_chunk = this.buffer.split_to(chunk_size);
-                        *consumed += chunk_size as u64;
+                        this.consumed += chunk_size as u64;
                         this.scan_pos = 0;
 
                         return Poll::Ready(Some(Ok(raw_chunk)));
@@ -131,7 +136,7 @@ where
             }
 
             if this.scan_pos < this.buffer.len() {
-                let boundary = this.chunker.scan(&this.buffer[this.scan_pos..]);
+                let boundary = this.chunker.scan(&this.buffer[this.scan_pos..], &ctx);
 
                 let chunk_size = this.scan_pos + boundary;
 
@@ -140,9 +145,7 @@ where
                 } else if chunk_size <= this.buffer.len() {
                     // found new chunk boundary inside buffer, split off chunk from buffer
                     let raw_chunk = this.buffer.split_to(chunk_size);
-                    if let Some(InjectionData { consumed, .. }) = this.injection_data.as_mut() {
-                        *consumed += chunk_size as u64;
-                    }
+                    this.consumed += chunk_size as u64;
                     this.scan_pos = 0;
                     return Poll::Ready(Some(Ok(raw_chunk)));
                 } else {
diff --git a/pbs-datastore/src/chunker.rs b/pbs-datastore/src/chunker.rs
index 253d2cf4c..d75e63fa8 100644
--- a/pbs-datastore/src/chunker.rs
+++ b/pbs-datastore/src/chunker.rs
@@ -5,6 +5,20 @@
 /// use hash value 0 to detect a boundary.
 const CA_CHUNKER_WINDOW_SIZE: usize = 64;
 
+/// Additional context for chunker to find possible boundaries in payload streams
+#[derive(Default)]
+pub struct Context {
+    /// Already consumed bytes of the chunk stream consumer
+    pub base: u64,
+    /// Total size currently buffered
+    pub total: u64,
+}
+
+pub trait Chunker {
+    fn scan(&mut self, data: &[u8], ctx: &Context) -> usize;
+    fn reset(&mut self);
+}
+
 /// Sliding window chunker (Buzhash)
 ///
 /// This is a rewrite of *casync* chunker (cachunker.h) in rust.
@@ -15,7 +29,7 @@ const CA_CHUNKER_WINDOW_SIZE: usize = 64;
 /// Hash](https://en.wikipedia.org/wiki/Rolling_hash) article from
 /// Wikipedia.
 
-pub struct Chunker {
+pub struct ChunkerImpl {
     h: u32,
     window_size: usize,
     chunk_size: usize,
@@ -67,7 +81,7 @@ const BUZHASH_TABLE: [u32; 256] = [
     0x5eff22f4, 0x6027f4cc, 0x77178b3c, 0xae507131, 0x7bf7cabc, 0xf9c18d66, 0x593ade65, 0xd95ddf11,
 ];
 
-impl Chunker {
+impl ChunkerImpl {
     /// Create a new Chunker instance, which produces and average
     /// chunk size of `chunk_size_avg` (need to be a power of two). We
     /// allow variation from `chunk_size_avg/4` up to a maximum of
@@ -105,11 +119,44 @@ impl Chunker {
         }
     }
 
+    // fast implementation avoiding modulo
+    // #[inline(always)]
+    fn shall_break(&self) -> bool {
+        if self.chunk_size >= self.chunk_size_max {
+            return true;
+        }
+
+        if self.chunk_size < self.chunk_size_min {
+            return false;
+        }
+
+        //(self.h & 0x1ffff) <= 2 //THIS IS SLOW!!!
+
+        //(self.h & self.break_test_mask) <= 2 // Bad on 0 streams
+
+        (self.h & self.break_test_mask) >= self.break_test_minimum
+    }
+
+    // This is the original implementation from casync
+    /*
+    #[inline(always)]
+    fn shall_break_orig(&self) -> bool {
+
+        if self.chunk_size >= self.chunk_size_max { return true; }
+
+        if self.chunk_size < self.chunk_size_min { return false; }
+
+        (self.h % self.discriminator) == (self.discriminator - 1)
+    }
+     */
+}
+
+impl Chunker for ChunkerImpl {
     /// Scans the specified data for a chunk border. Returns 0 if none
     /// was found (and the function should be called with more data
     /// later on), or another value indicating the position of a
     /// border.
-    pub fn scan(&mut self, data: &[u8]) -> usize {
+    fn scan(&mut self, data: &[u8], _ctx: &Context) -> usize {
         let window_len = self.window.len();
         let data_len = data.len();
 
@@ -167,42 +214,11 @@ impl Chunker {
         0
     }
 
-    pub fn reset(&mut self) {
+    fn reset(&mut self) {
         self.h = 0;
         self.chunk_size = 0;
         self.window_size = 0;
     }
-
-    // fast implementation avoiding modulo
-    // #[inline(always)]
-    fn shall_break(&self) -> bool {
-        if self.chunk_size >= self.chunk_size_max {
-            return true;
-        }
-
-        if self.chunk_size < self.chunk_size_min {
-            return false;
-        }
-
-        //(self.h & 0x1ffff) <= 2 //THIS IS SLOW!!!
-
-        //(self.h & self.break_test_mask) <= 2 // Bad on 0 streams
-
-        (self.h & self.break_test_mask) >= self.break_test_minimum
-    }
-
-    // This is the original implementation from casync
-    /*
-    #[inline(always)]
-    fn shall_break_orig(&self) -> bool {
-
-        if self.chunk_size >= self.chunk_size_max { return true; }
-
-        if self.chunk_size < self.chunk_size_min { return false; }
-
-        (self.h % self.discriminator) == (self.discriminator - 1)
-    }
-     */
 }
 
 #[test]
@@ -215,17 +231,18 @@ fn test_chunker1() {
             buffer.push(byte);
         }
     }
-    let mut chunker = Chunker::new(64 * 1024);
+    let mut chunker = ChunkerImpl::new(64 * 1024);
 
     let mut pos = 0;
     let mut last = 0;
 
     let mut chunks1: Vec<(usize, usize)> = vec![];
     let mut chunks2: Vec<(usize, usize)> = vec![];
+    let ctx = Context::default();
 
     // test1: feed single bytes
     while pos < buffer.len() {
-        let k = chunker.scan(&buffer[pos..pos + 1]);
+        let k = chunker.scan(&buffer[pos..pos + 1], &ctx);
         pos += 1;
         if k != 0 {
             let prev = last;
@@ -235,13 +252,13 @@ fn test_chunker1() {
     }
     chunks1.push((last, buffer.len() - last));
 
-    let mut chunker = Chunker::new(64 * 1024);
+    let mut chunker = ChunkerImpl::new(64 * 1024);
 
     let mut pos = 0;
 
     // test2: feed with whole buffer
     while pos < buffer.len() {
-        let k = chunker.scan(&buffer[pos..]);
+        let k = chunker.scan(&buffer[pos..], &ctx);
         if k != 0 {
             chunks2.push((pos, k));
             pos += k;
diff --git a/pbs-datastore/src/dynamic_index.rs b/pbs-datastore/src/dynamic_index.rs
index b8047b5b1..dc9eee050 100644
--- a/pbs-datastore/src/dynamic_index.rs
+++ b/pbs-datastore/src/dynamic_index.rs
@@ -23,7 +23,7 @@ use crate::data_blob::{DataBlob, DataChunkBuilder};
 use crate::file_formats;
 use crate::index::{ChunkReadInfo, IndexFile};
 use crate::read_chunk::ReadChunk;
-use crate::Chunker;
+use crate::{Chunker, ChunkerImpl};
 
 /// Header format definition for dynamic index files (`.dixd`)
 #[repr(C)]
@@ -397,7 +397,7 @@ impl DynamicIndexWriter {
 pub struct DynamicChunkWriter {
     index: DynamicIndexWriter,
     closed: bool,
-    chunker: Chunker,
+    chunker: ChunkerImpl,
     stat: ChunkStat,
     chunk_offset: usize,
     last_chunk: usize,
@@ -409,7 +409,7 @@ impl DynamicChunkWriter {
         Self {
             index,
             closed: false,
-            chunker: Chunker::new(chunk_size),
+            chunker: ChunkerImpl::new(chunk_size),
             stat: ChunkStat::new(0),
             chunk_offset: 0,
             last_chunk: 0,
@@ -494,7 +494,8 @@ impl Write for DynamicChunkWriter {
     fn write(&mut self, data: &[u8]) -> std::result::Result<usize, std::io::Error> {
         let chunker = &mut self.chunker;
 
-        let pos = chunker.scan(data);
+        let ctx = crate::chunker::Context::default();
+        let pos = chunker.scan(data, &ctx);
 
         if pos > 0 {
             self.chunk_buffer.extend_from_slice(&data[0..pos]);
diff --git a/pbs-datastore/src/lib.rs b/pbs-datastore/src/lib.rs
index 43050162f..24429626c 100644
--- a/pbs-datastore/src/lib.rs
+++ b/pbs-datastore/src/lib.rs
@@ -196,7 +196,7 @@ pub use backup_info::{BackupDir, BackupGroup, BackupInfo};
 pub use checksum_reader::ChecksumReader;
 pub use checksum_writer::ChecksumWriter;
 pub use chunk_store::ChunkStore;
-pub use chunker::Chunker;
+pub use chunker::{Chunker, ChunkerImpl};
 pub use crypt_reader::CryptReader;
 pub use crypt_writer::CryptWriter;
 pub use data_blob::DataBlob;
-- 
2.39.2




* [pbs-devel] [PATCH v7 proxmox-backup 62/69] datastore: chunker: implement chunker for payload stream
  2024-05-27 14:32 [pbs-devel] [PATCH v7 pxar proxmox-backup 00/69] fix #3174: improve file-level backup Christian Ebner
                   ` (60 preceding siblings ...)
  2024-05-27 14:33 ` [pbs-devel] [PATCH v7 proxmox-backup 61/69] datastore: chunker: add Chunker trait Christian Ebner
@ 2024-05-27 14:33 ` Christian Ebner
  2024-05-27 14:33 ` [pbs-devel] [PATCH v7 proxmox-backup 63/69] client: chunk stream: switch payload stream chunker Christian Ebner
                   ` (7 subsequent siblings)
  69 siblings, 0 replies; 71+ messages in thread
From: Christian Ebner @ 2024-05-27 14:33 UTC (permalink / raw)
  To: pbs-devel

Implement the Chunker trait for a dedicated payload stream chunker,
which extends the regular chunker with the option to suggest
boundaries, to be used instead of the hash-based boundaries whenever
possible.
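The decision order in `PayloadChunker::scan` below can be sketched as follows: discard suggestions that lie in the past, defer those not yet buffered, discard those that would produce a chunk below the minimum size, use those within the maximum size, and otherwise fall back to the regular scan. In this reduced, hypothetical `SuggestingChunker` the hash-based fallback is replaced by returning 0 ("no boundary found"), and `min`/`max` stand in for the chunker's minimum and maximum chunk sizes.

```rust
use std::sync::mpsc::{channel, Receiver};

/// Same fields as the chunker `Context` in this series.
pub struct Context {
    pub base: u64,  // bytes already consumed by the chunk stream
    pub total: u64, // bytes currently buffered
}

/// Reduced, hypothetical version of `PayloadChunker`: the hash-based
/// fallback scan is replaced by returning 0 ("no boundary found").
pub struct SuggestingChunker {
    min: usize,
    max: usize,
    current: Option<u64>,
    suggested: Receiver<u64>,
}

impl SuggestingChunker {
    pub fn new(min: usize, max: usize, suggested: Receiver<u64>) -> Self {
        Self { min, max, current: None, suggested }
    }

    /// Returns the boundary offset within `data`, or 0 to defer to the regular scan.
    pub fn scan(&mut self, data: &[u8], ctx: &Context) -> usize {
        assert!(ctx.total >= data.len() as u64);
        // offset of `data` within the buffered bytes
        let pos = ctx.total - data.len() as u64;
        loop {
            let boundary = match self.current {
                Some(b) => b,
                None => match self.suggested.try_recv() {
                    Ok(b) => b,
                    Err(_) => return 0, // no suggestion available
                },
            };
            self.current = Some(boundary);

            if boundary < ctx.base + pos {
                self.current = None; // already passed, discard
                continue;
            }
            if boundary > ctx.base + ctx.total {
                return 0; // not buffered yet, keep it for a later call
            }
            let chunk_size = (boundary - ctx.base) as usize;
            if chunk_size < self.min {
                self.current = None; // chunk would be too small, discard
                continue;
            }
            if chunk_size > self.max {
                return 0; // chunk would be too big, keep it and defer
            }
            self.current = None;
            // Boundary relative to the start of `data`. A result of 0 means the
            // boundary lies exactly at the start of `data`, which this sketch
            // treats like the patch does: defer to the regular scan.
            return chunk_size - pos as usize;
        }
    }
}
```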

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
changes since version 6:
- no changes

 pbs-datastore/src/chunker.rs | 90 ++++++++++++++++++++++++++++++++++++
 pbs-datastore/src/lib.rs     |  2 +-
 2 files changed, 91 insertions(+), 1 deletion(-)

diff --git a/pbs-datastore/src/chunker.rs b/pbs-datastore/src/chunker.rs
index d75e63fa8..d0543bca0 100644
--- a/pbs-datastore/src/chunker.rs
+++ b/pbs-datastore/src/chunker.rs
@@ -1,3 +1,5 @@
+use std::sync::mpsc::Receiver;
+
 /// Note: window size 32 or 64, is faster because we can
 /// speedup modulo operations, but always computes hash 0
 /// for constant data streams .. 0,0,0,0,0,0
@@ -46,6 +48,16 @@ pub struct ChunkerImpl {
     window: [u8; CA_CHUNKER_WINDOW_SIZE],
 }
 
+/// Sliding window chunker (Buzhash) with boundary suggestions
+///
+/// Chunk at suggested boundaries instead of the regular hash-based chunk boundaries whenever
+/// possible, for better alignment with file payload boundaries.
+pub struct PayloadChunker {
+    chunker: ChunkerImpl,
+    current_suggested: Option<u64>,
+    suggested_boundaries: Receiver<u64>,
+}
+
 const BUZHASH_TABLE: [u32; 256] = [
     0x458be752, 0xc10748cc, 0xfbbcdbb8, 0x6ded5b68, 0xb10a82b5, 0x20d75648, 0xdfc5665f, 0xa8428801,
     0x7ebf5191, 0x841135c7, 0x65cc53b3, 0x280a597c, 0x16f60255, 0xc78cbc3e, 0x294415f5, 0xb938d494,
@@ -221,6 +233,84 @@ impl Chunker for ChunkerImpl {
     }
 }
 
+impl PayloadChunker {
+    /// Create a new PayloadChunker instance, which produces an average
+    /// chunk size of `chunk_size_avg` (needs to be a power of two) if no
+    /// suggested boundaries are provided.
+    /// Suggested boundaries are used instead whenever the resulting chunk
+    /// size is within the min - max range.
+    pub fn new(chunk_size_avg: usize, suggested_boundaries: Receiver<u64>) -> Self {
+        Self {
+            chunker: ChunkerImpl::new(chunk_size_avg),
+            current_suggested: None,
+            suggested_boundaries,
+        }
+    }
+}
+
+impl Chunker for PayloadChunker {
+    fn scan(&mut self, data: &[u8], ctx: &Context) -> usize {
+        assert!(ctx.total >= data.len() as u64);
+        let pos = ctx.total - data.len() as u64;
+
+        loop {
+            if let Some(boundary) = self.current_suggested {
+                if boundary < ctx.base + pos {
+                    log::debug!("Boundary {boundary} in past");
+                    // ignore passed boundaries
+                    self.current_suggested = None;
+                    continue;
+                }
+
+                if boundary > ctx.base + ctx.total {
+                    log::debug!("Boundary {boundary} in future");
+                    // boundary in future, cannot decide yet
+                    return self.chunker.scan(data, ctx);
+                }
+
+                let chunk_size = (boundary - ctx.base) as usize;
+                if chunk_size < self.chunker.chunk_size_min {
+                    log::debug!("Chunk size {chunk_size} below minimum chunk size");
+                    // chunk too small, ignore boundary
+                    self.current_suggested = None;
+                    continue;
+                }
+
+                if chunk_size <= self.chunker.chunk_size_max {
+                    self.current_suggested = None;
+                    // calculate boundary relative to start of given data buffer
+                    let len = chunk_size - pos as usize;
+                    if len == 0 {
+                        // passed this one, previous scan did not know about boundary just yet
+                        return self.chunker.scan(data, ctx);
+                    }
+                    self.chunker.reset();
+                    log::debug!(
+                        "Chunk at suggested boundary: {boundary}, chunk size: {chunk_size}"
+                    );
+                    return len;
+                }
+
+                log::debug!("Chunk {chunk_size} too big, regular scan");
+                // chunk too big, cannot decide yet
+                // scan for hash based chunk boundary instead
+                return self.chunker.scan(data, ctx);
+            }
+
+            if let Ok(boundary) = self.suggested_boundaries.try_recv() {
+                self.current_suggested = Some(boundary);
+            } else {
+                log::debug!("No suggested boundary, regular scan");
+                return self.chunker.scan(data, ctx);
+            }
+        }
+    }
+
+    fn reset(&mut self) {
+        self.chunker.reset();
+    }
+}
+
 #[test]
 fn test_chunker1() {
     let mut buffer = Vec::new();
diff --git a/pbs-datastore/src/lib.rs b/pbs-datastore/src/lib.rs
index 24429626c..3e4aa34c2 100644
--- a/pbs-datastore/src/lib.rs
+++ b/pbs-datastore/src/lib.rs
@@ -196,7 +196,7 @@ pub use backup_info::{BackupDir, BackupGroup, BackupInfo};
 pub use checksum_reader::ChecksumReader;
 pub use checksum_writer::ChecksumWriter;
 pub use chunk_store::ChunkStore;
-pub use chunker::{Chunker, ChunkerImpl};
+pub use chunker::{Chunker, ChunkerImpl, PayloadChunker};
 pub use crypt_reader::CryptReader;
 pub use crypt_writer::CryptWriter;
 pub use data_blob::DataBlob;
-- 
2.39.2




* [pbs-devel] [PATCH v7 proxmox-backup 63/69] client: chunk stream: switch payload stream chunker
  2024-05-27 14:32 [pbs-devel] [PATCH v7 pxar proxmox-backup 00/69] fix #3174: improve file-level backup Christian Ebner
                   ` (61 preceding siblings ...)
  2024-05-27 14:33 ` [pbs-devel] [PATCH v7 proxmox-backup 62/69] datastore: chunker: implement chunker for payload stream Christian Ebner
@ 2024-05-27 14:33 ` Christian Ebner
  2024-05-27 14:33 ` [pbs-devel] [PATCH v7 proxmox-backup 64/69] client: pxar: allow to restore prelude to optional path Christian Ebner
                   ` (6 subsequent siblings)
  69 siblings, 0 replies; 71+ messages in thread
From: Christian Ebner @ 2024-05-27 14:33 UTC (permalink / raw)
  To: pbs-devel

Use the dedicated chunker with boundary suggestions for the payload
stream by attaching the channel sender to the archiver and the channel
receiver to the payload stream chunker.

The archiver sends the file boundaries for the chunker to consume.
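The channel wiring described above can be sketched with `std::sync::mpsc`: the archiver side owns the `Sender<u64>`, the chunker side owns the `Receiver<u64>` and drains suggestions without blocking. `Archiver`, `add_file` and `pending_boundaries` are hypothetical names, and sending the payload offset at which each file starts is an assumption made for this sketch; the actual offsets sent by the archiver are defined in pbs-client/src/pxar/create.rs.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};

/// Hypothetical archiver: appends each file payload to the payload stream and
/// reports the stream offset where that payload starts (assumption for this sketch).
pub struct Archiver {
    payload: Vec<u8>,
    boundaries: Sender<u64>,
}

impl Archiver {
    pub fn add_file(&mut self, contents: &[u8]) {
        let offset = self.payload.len() as u64;
        // In this sketch a disconnected receiver is ignored, so encoding continues.
        let _ = self.boundaries.send(offset);
        self.payload.extend_from_slice(contents);
    }
}

/// Chunker side: collect all suggestions currently available, without blocking.
pub fn pending_boundaries(rx: &Receiver<u64>) -> Vec<u64> {
    rx.try_iter().collect()
}
```

Using `try_iter` (like `try_recv` in `PayloadChunker`) keeps the chunker from ever waiting on the archiver: if no suggestion has arrived yet, it simply falls back to hash-based chunking.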

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
changes since version 6:
- no changes

 examples/test_chunk_speed2.rs                 |  2 +-
 pbs-client/src/chunk_stream.rs                | 15 +++++--
 pbs-client/src/pxar/create.rs                 |  8 ++++
 pbs-client/src/pxar_backup_stream.rs          | 40 +++++++++++--------
 proxmox-backup-client/src/main.rs             | 16 +++++---
 .../src/proxmox_restore_daemon/api.rs         | 12 +++++-
 pxar-bin/src/main.rs                          |  1 +
 tests/catar.rs                                |  1 +
 8 files changed, 68 insertions(+), 27 deletions(-)

diff --git a/examples/test_chunk_speed2.rs b/examples/test_chunk_speed2.rs
index 22dd14ce2..f2963746a 100644
--- a/examples/test_chunk_speed2.rs
+++ b/examples/test_chunk_speed2.rs
@@ -26,7 +26,7 @@ async fn run() -> Result<(), Error> {
         .map_err(Error::from);
 
     //let chunk_stream = FixedChunkStream::new(stream, 4*1024*1024);
-    let mut chunk_stream = ChunkStream::new(stream, None, None);
+    let mut chunk_stream = ChunkStream::new(stream, None, None, None);
 
     let start_time = std::time::Instant::now();
 
diff --git a/pbs-client/src/chunk_stream.rs b/pbs-client/src/chunk_stream.rs
index 84158a2c9..de3e7bb5d 100644
--- a/pbs-client/src/chunk_stream.rs
+++ b/pbs-client/src/chunk_stream.rs
@@ -7,7 +7,7 @@ use bytes::BytesMut;
 use futures::ready;
 use futures::stream::{Stream, TryStream};
 
-use pbs_datastore::{Chunker, ChunkerImpl};
+use pbs_datastore::{Chunker, ChunkerImpl, PayloadChunker};
 
 use crate::inject_reused_chunks::InjectChunks;
 
@@ -42,11 +42,20 @@ pub struct ChunkStream<S: Unpin> {
 }
 
 impl<S: Unpin> ChunkStream<S> {
-    pub fn new(input: S, chunk_size: Option<usize>, injection_data: Option<InjectionData>) -> Self {
+    pub fn new(
+        input: S,
+        chunk_size: Option<usize>,
+        injection_data: Option<InjectionData>,
+        suggested_boundaries: Option<mpsc::Receiver<u64>>,
+    ) -> Self {
         let chunk_size = chunk_size.unwrap_or(4 * 1024 * 1024);
         Self {
             input,
-            chunker: Box::new(ChunkerImpl::new(chunk_size)),
+            chunker: if let Some(suggested) = suggested_boundaries {
+                Box::new(PayloadChunker::new(chunk_size, suggested))
+            } else {
+                Box::new(ChunkerImpl::new(chunk_size))
+            },
             buffer: BytesMut::new(),
             scan_pos: 0,
             consumed: 0,
diff --git a/pbs-client/src/pxar/create.rs b/pbs-client/src/pxar/create.rs
index 528577520..ff7e86804 100644
--- a/pbs-client/src/pxar/create.rs
+++ b/pbs-client/src/pxar/create.rs
@@ -169,6 +169,7 @@ struct Archiver {
     file_copy_buffer: Vec<u8>,
     skip_e2big_xattr: bool,
     forced_boundaries: Option<mpsc::Sender<InjectChunks>>,
+    suggested_boundaries: Option<mpsc::Sender<u64>>,
     previous_payload_index: Option<DynamicIndexReader>,
     cache: PxarLookaheadCache,
     reuse_stats: ReuseStats,
@@ -197,6 +198,7 @@ pub async fn create_archive<T, F>(
     callback: F,
     options: PxarCreateOptions,
     forced_boundaries: Option<mpsc::Sender<InjectChunks>>,
+    suggested_boundaries: Option<mpsc::Sender<u64>>,
 ) -> Result<(), Error>
 where
     T: SeqWrite + Send,
@@ -271,6 +273,7 @@ where
         file_copy_buffer: vec::undefined(4 * 1024 * 1024),
         skip_e2big_xattr: options.skip_e2big_xattr,
         forced_boundaries,
+        suggested_boundaries,
         previous_payload_index,
         cache: PxarLookaheadCache::new(None),
         reuse_stats: ReuseStats::default(),
@@ -863,6 +866,11 @@ impl Archiver {
                         .add_file(c_file_name, file_size, stat.st_mtime)?;
                 }
 
+                if let Some(sender) = self.suggested_boundaries.as_mut() {
+                    let offset = encoder.payload_position()?.raw();
+                    sender.send(offset)?;
+                }
+
                 let offset: LinkOffset = if let Some(payload_offset) = payload_offset {
                     self.reuse_stats.total_reused_payload_size +=
                         file_size + size_of::<pxar::format::Header>() as u64;
diff --git a/pbs-client/src/pxar_backup_stream.rs b/pbs-client/src/pxar_backup_stream.rs
index fb6d063f2..f322566f0 100644
--- a/pbs-client/src/pxar_backup_stream.rs
+++ b/pbs-client/src/pxar_backup_stream.rs
@@ -27,6 +27,7 @@ use crate::pxar::create::PxarWriters;
 /// consumer.
 pub struct PxarBackupStream {
     rx: Option<std::sync::mpsc::Receiver<Result<Vec<u8>, Error>>>,
+    pub suggested_boundaries: Option<std::sync::mpsc::Receiver<u64>>,
     handle: Option<AbortHandle>,
     error: Arc<Mutex<Option<String>>>,
 }
@@ -55,22 +56,26 @@ impl PxarBackupStream {
         ));
         let writer = pxar::encoder::sync::StandardWriter::new(writer);
 
-        let (writer, payload_rx) = if separate_payload_stream {
-            let (tx, rx) = std::sync::mpsc::sync_channel(10);
-            let payload_writer = TokioWriterAdapter::new(std::io::BufWriter::with_capacity(
-                buffer_size,
-                StdChannelWriter::new(tx),
-            ));
-            (
-                pxar::PxarVariant::Split(
-                    writer,
-                    pxar::encoder::sync::StandardWriter::new(payload_writer),
-                ),
-                Some(rx),
-            )
-        } else {
-            (pxar::PxarVariant::Unified(writer), None)
-        };
+        let (writer, payload_rx, suggested_boundaries_tx, suggested_boundaries_rx) =
+            if separate_payload_stream {
+                let (tx, rx) = std::sync::mpsc::sync_channel(10);
+                let (suggested_boundaries_tx, suggested_boundaries_rx) = std::sync::mpsc::channel();
+                let payload_writer = TokioWriterAdapter::new(std::io::BufWriter::with_capacity(
+                    buffer_size,
+                    StdChannelWriter::new(tx),
+                ));
+                (
+                    pxar::PxarVariant::Split(
+                        writer,
+                        pxar::encoder::sync::StandardWriter::new(payload_writer),
+                    ),
+                    Some(rx),
+                    Some(suggested_boundaries_tx),
+                    Some(suggested_boundaries_rx),
+                )
+            } else {
+                (pxar::PxarVariant::Unified(writer), None, None, None)
+            };
 
         let error = Arc::new(Mutex::new(None));
         let error2 = Arc::clone(&error);
@@ -85,6 +90,7 @@ impl PxarBackupStream {
                 },
                 options,
                 boundaries,
+                suggested_boundaries_tx,
             )
             .await
             {
@@ -99,12 +105,14 @@ impl PxarBackupStream {
 
         let backup_stream = Self {
             rx: Some(rx),
+            suggested_boundaries: None,
             handle: Some(handle.clone()),
             error: Arc::clone(&error),
         };
 
         let backup_payload_stream = payload_rx.map(|rx| Self {
             rx: Some(rx),
+            suggested_boundaries: suggested_boundaries_rx,
             handle: Some(handle),
             error,
         });
diff --git a/proxmox-backup-client/src/main.rs b/proxmox-backup-client/src/main.rs
index 32e5f9b81..87dbb63d5 100644
--- a/proxmox-backup-client/src/main.rs
+++ b/proxmox-backup-client/src/main.rs
@@ -209,7 +209,7 @@ async fn backup_directory<P: AsRef<Path>>(
         payload_target.is_some(),
     )?;
 
-    let mut chunk_stream = ChunkStream::new(pxar_stream, chunk_size, None);
+    let mut chunk_stream = ChunkStream::new(pxar_stream, chunk_size, None, None);
     let (tx, rx) = mpsc::channel(10); // allow to buffer 10 chunks
 
     let stream = ReceiverStream::new(rx).map_err(Error::from);
@@ -223,14 +223,19 @@ async fn backup_directory<P: AsRef<Path>>(
 
     let stats = client.upload_stream(archive_name, stream, upload_options.clone(), None);
 
-    if let Some(payload_stream) = payload_stream {
+    if let Some(mut payload_stream) = payload_stream {
         let payload_target = payload_target
             .ok_or_else(|| format_err!("got payload stream, but no target archive name"))?;
 
         let (payload_injections_tx, payload_injections_rx) = std::sync::mpsc::channel();
         let injection_data = InjectionData::new(payload_boundaries_rx, payload_injections_tx);
-        let mut payload_chunk_stream =
-            ChunkStream::new(payload_stream, chunk_size, Some(injection_data));
+        let suggested_boundaries = payload_stream.suggested_boundaries.take();
+        let mut payload_chunk_stream = ChunkStream::new(
+            payload_stream,
+            chunk_size,
+            Some(injection_data),
+            suggested_boundaries,
+        );
         let (payload_tx, payload_rx) = mpsc::channel(10); // allow to buffer 10 chunks
         let stream = ReceiverStream::new(payload_rx).map_err(Error::from);
 
@@ -573,7 +578,8 @@ fn spawn_catalog_upload(
     let (catalog_tx, catalog_rx) = std::sync::mpsc::sync_channel(10); // allow to buffer 10 writes
     let catalog_stream = proxmox_async::blocking::StdChannelStream(catalog_rx);
     let catalog_chunk_size = 512 * 1024;
-    let catalog_chunk_stream = ChunkStream::new(catalog_stream, Some(catalog_chunk_size), None);
+    let catalog_chunk_stream =
+        ChunkStream::new(catalog_stream, Some(catalog_chunk_size), None, None);
 
     let catalog_writer = Arc::new(Mutex::new(CatalogWriter::new(TokioWriterAdapter::new(
         StdChannelWriter::new(catalog_tx),
diff --git a/proxmox-restore-daemon/src/proxmox_restore_daemon/api.rs b/proxmox-restore-daemon/src/proxmox_restore_daemon/api.rs
index 681fa6db9..80af5011e 100644
--- a/proxmox-restore-daemon/src/proxmox_restore_daemon/api.rs
+++ b/proxmox-restore-daemon/src/proxmox_restore_daemon/api.rs
@@ -364,8 +364,16 @@ fn extract(
                     };
 
                     let pxar_writer = pxar::PxarVariant::Unified(TokioWriter::new(writer));
-                    create_archive(dir, PxarWriters::new(pxar_writer, None), Flags::DEFAULT, |_| Ok(()), options, None)
-                        .await
+                    create_archive(
+                        dir,
+                        PxarWriters::new(pxar_writer, None),
+                        Flags::DEFAULT,
+                        |_| Ok(()),
+                        options,
+                        None,
+                        None,
+                    )
+                    .await
                 }
                 .await;
                 if let Err(err) = result {
diff --git a/pxar-bin/src/main.rs b/pxar-bin/src/main.rs
index 7efecd524..a2a3d241a 100644
--- a/pxar-bin/src/main.rs
+++ b/pxar-bin/src/main.rs
@@ -412,6 +412,7 @@ async fn create_archive(
         },
         options,
         None,
+        None,
     )
     .await?;
 
diff --git a/tests/catar.rs b/tests/catar.rs
index 9f83b4cc2..94c565012 100644
--- a/tests/catar.rs
+++ b/tests/catar.rs
@@ -40,6 +40,7 @@ fn run_test(dir_name: &str) -> Result<(), Error> {
         |_| Ok(()),
         options,
         None,
+        None,
     ))?;
 
     Command::new("cmp")
-- 
2.39.2




* [pbs-devel] [PATCH v7 proxmox-backup 64/69] client: pxar: allow to restore prelude to optional path
  2024-05-27 14:32 [pbs-devel] [PATCH v7 pxar proxmox-backup 00/69] fix #3174: improve file-level backup Christian Ebner
                   ` (62 preceding siblings ...)
  2024-05-27 14:33 ` [pbs-devel] [PATCH v7 proxmox-backup 63/69] client: chunk stream: switch payload stream chunker Christian Ebner
@ 2024-05-27 14:33 ` Christian Ebner
  2024-05-27 14:33 ` [pbs-devel] [PATCH v7 proxmox-backup 65/69] client: pxar: add archive creation with reference test Christian Ebner
                   ` (5 subsequent siblings)
  69 siblings, 0 replies; 71+ messages in thread
From: Christian Ebner @ 2024-05-27 14:33 UTC (permalink / raw)
  To: pbs-devel

Since pxar format version 2, pxar archives can store additional
information in a prelude entry.

Add an optional parameter to `pxar` and `proxmox-backup-client` to
specify the path to restore the prelude to, and pass it to the
archive extraction by extending `PxarExtractOptions` with a
corresponding field. If no path is given, the prelude is skipped
during restore.
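The restore decision described above can be sketched as a standalone
helper. `restore_prelude` is a hypothetical name and the prelude is
passed in as raw bytes rather than as a pxar entry. Unlike the patch
below, this sketch also opens the target with `truncate(true)`, so that
a pre-existing, longer file does not keep stale trailing bytes.

```rust
use std::fs::OpenOptions;
use std::io::{self, Write};
use std::path::Path;

/// Hypothetical helper: writes `prelude` to `path` only if a target path was
/// requested and the archive contains a prelude. Returns whether it wrote a file.
fn restore_prelude(prelude: Option<&[u8]>, path: Option<&Path>) -> io::Result<bool> {
    let Some(path) = path else {
        return Ok(false); // no target path given: skip the prelude
    };
    let Some(data) = prelude else {
        eprintln!("No prelude entry found, skip prelude restore.");
        return Ok(false);
    };
    let mut file = OpenOptions::new()
        .create(true)
        .write(true)
        // Truncate so that a longer, pre-existing file keeps no stale trailing bytes.
        .truncate(true)
        .open(path)?;
    file.write_all(data)?;
    Ok(true)
}

fn main() -> io::Result<()> {
    let target = std::env::temp_dir().join(format!("prelude-sketch-{}", std::process::id()));
    // Without a target path nothing is written.
    assert!(!restore_prelude(Some(b"prelude data".as_slice()), None)?);
    // An existing, longer file is fully replaced.
    std::fs::write(&target, b"a much longer previous content")?;
    assert!(restore_prelude(Some(b"v2".as_slice()), Some(target.as_path()))?);
    assert_eq!(std::fs::read(&target)?, b"v2");
    std::fs::remove_file(&target)?;
    Ok(())
}
```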

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
changes since version 6:
- remove the binding for the version value, which is no longer returned

 pbs-client/src/pxar/extract.rs    | 23 +++++++++++++++++++++--
 proxmox-backup-client/src/main.rs | 12 +++++++++++-
 pxar-bin/src/main.rs              |  6 ++++++
 3 files changed, 38 insertions(+), 3 deletions(-)

diff --git a/pbs-client/src/pxar/extract.rs b/pbs-client/src/pxar/extract.rs
index e22390606..99c0d0e10 100644
--- a/pbs-client/src/pxar/extract.rs
+++ b/pbs-client/src/pxar/extract.rs
@@ -2,7 +2,8 @@
 
 use std::collections::HashMap;
 use std::ffi::{CStr, CString, OsStr, OsString};
-use std::io;
+use std::fs::OpenOptions;
+use std::io::{self, Write};
 use std::os::unix::ffi::OsStrExt;
 use std::os::unix::io::{AsRawFd, FromRawFd, RawFd};
 use std::path::{Path, PathBuf};
@@ -37,6 +38,7 @@ pub struct PxarExtractOptions<'a> {
     pub allow_existing_dirs: bool,
     pub overwrite_flags: OverwriteFlags,
     pub on_error: Option<ErrorHandler>,
+    pub prelude_path: Option<PathBuf>,
 }
 
 bitflags! {
@@ -125,9 +127,26 @@ where
         // we use this to keep track of our directory-traversal
         decoder.enable_goodbye_entries(true);
 
-        let (root, _) = handle_root_with_optional_format_version_prelude(&mut decoder)
+        let (root, prelude) = handle_root_with_optional_format_version_prelude(&mut decoder)
             .context("error reading pxar archive")?;
 
+        if let Some(ref path) = options.prelude_path {
+            if let Some(entry) = prelude {
+                let mut prelude_file = OpenOptions::new()
+                    .create(true)
+                    .write(true)
+                    .open(path)
+                    .with_context(|| format!("error creating prelude file '{path:?}'"))?;
+                if let pxar::EntryKind::Prelude(ref prelude) = entry.kind() {
+                    prelude_file.write_all(prelude.as_os_str().as_bytes())?;
+                } else {
+                    log::info!("unexpected entry kind for prelude");
+                }
+            } else {
+                log::info!("No prelude entry found, skip prelude restore.");
+            }
+        }
+
         if !root.is_dir() {
             bail!("pxar archive does not start with a directory entry!");
         }
diff --git a/proxmox-backup-client/src/main.rs b/proxmox-backup-client/src/main.rs
index 87dbb63d5..ad9042857 100644
--- a/proxmox-backup-client/src/main.rs
+++ b/proxmox-backup-client/src/main.rs
@@ -1433,7 +1433,12 @@ We do not extract '.pxar' archives when writing to standard output.
                 description: "ignore errors that occur during device node extraction",
                 optional: true,
                 default: false,
-            }
+            },
+            "restore-prelude-to": {
+                description: "Path to restore the prelude to (pxar v2 archives only).",
+                type: String,
+                optional: true,
+            },
         }
     }
 )]
@@ -1595,12 +1600,17 @@ async fn restore(
             overwrite_flags.insert(pbs_client::pxar::OverwriteFlags::all());
         }
 
+        let prelude_path = param["restore-prelude-to"]
+            .as_str()
+            .map(|path| PathBuf::from(path));
+
         let options = pbs_client::pxar::PxarExtractOptions {
             match_list: &[],
             extract_match_default: true,
             allow_existing_dirs,
             overwrite_flags,
             on_error,
+            prelude_path,
         };
 
         let mut feature_flags = pbs_client::pxar::Flags::DEFAULT;
diff --git a/pxar-bin/src/main.rs b/pxar-bin/src/main.rs
index a2a3d241a..4cb7713e8 100644
--- a/pxar-bin/src/main.rs
+++ b/pxar-bin/src/main.rs
@@ -131,6 +131,10 @@ fn extract_archive_from_reader<R: std::io::Read>(
                 description: "'ppxar' payload input data file to restore split archive.",
                 optional: true,
             },
+            "restore-prelude-to": {
+                description: "Path to restore pxar archive prelude to.",
+                optional: true,
+            },
         },
     },
 )]
@@ -154,6 +158,7 @@ fn extract_archive(
     no_sockets: bool,
     strict: bool,
     payload_input: Option<String>,
+    restore_prelude_to: Option<String>,
 ) -> Result<(), Error> {
     let mut feature_flags = Flags::DEFAULT;
     if no_xattrs {
@@ -227,6 +232,7 @@ fn extract_archive(
         overwrite_flags,
         extract_match_default,
         on_error,
+        prelude_path: restore_prelude_to.map(|path| PathBuf::from(path)),
     };
 
     if archive == "-" {
-- 
2.39.2




* [pbs-devel] [PATCH v7 proxmox-backup 65/69] client: pxar: add archive creation with reference test
  2024-05-27 14:32 [pbs-devel] [PATCH v7 pxar proxmox-backup 00/69] fix #3174: improve file-level backup Christian Ebner
                   ` (63 preceding siblings ...)
  2024-05-27 14:33 ` [pbs-devel] [PATCH v7 proxmox-backup 64/69] client: pxar: allow to restore prelude to optional path Christian Ebner
@ 2024-05-27 14:33 ` Christian Ebner
  2024-05-27 14:33 ` [pbs-devel] [PATCH v7 proxmox-backup 66/69] client: tools: add helper to raise nofile rlimit Christian Ebner
                   ` (4 subsequent siblings)
  69 siblings, 0 replies; 71+ messages in thread
From: Christian Ebner @ 2024-05-27 14:33 UTC (permalink / raw)
  To: pbs-devel

Add a basic regression test for archive creation with a reference
metadata archive and payload index.

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
changes since version 6:
- adapt to PxarVariant pxar interface
- adapt to PxarLookaheadCache

 pbs-client/src/pxar/create.rs                 | 243 ++++++++++++++++++
 tests/pxar/backup-client-pxar-data.mpxar      | Bin 0 -> 15070 bytes
 tests/pxar/backup-client-pxar-data.ppxar.didx | Bin 0 -> 8096 bytes
 tests/pxar/backup-client-pxar-expected.mpxar  | Bin 0 -> 15086 bytes
 4 files changed, 243 insertions(+)
 create mode 100644 tests/pxar/backup-client-pxar-data.mpxar
 create mode 100644 tests/pxar/backup-client-pxar-data.ppxar.didx
 create mode 100644 tests/pxar/backup-client-pxar-expected.mpxar

diff --git a/pbs-client/src/pxar/create.rs b/pbs-client/src/pxar/create.rs
index ff7e86804..bcadf12bd 100644
--- a/pbs-client/src/pxar/create.rs
+++ b/pbs-client/src/pxar/create.rs
@@ -1718,3 +1718,246 @@ fn generate_pxar_excludes_cli(patterns: &[MatchEntry]) -> Vec<u8> {
 
     content
 }
+
+#[cfg(test)]
+mod tests {
+    use std::ffi::OsString;
+    use std::fs::File;
+    use std::fs::OpenOptions;
+    use std::io::{self, BufReader, Seek, SeekFrom, Write};
+    use std::pin::Pin;
+    use std::process::Command;
+    use std::sync::mpsc;
+    use std::task::{Context, Poll};
+
+    use pbs_datastore::dynamic_index::DynamicIndexReader;
+    use pxar::accessor::sync::FileReader;
+    use pxar::encoder::SeqWrite;
+
+    use crate::pxar::extract::Extractor;
+    use crate::pxar::OverwriteFlags;
+
+    use super::*;
+
+    struct DummyWriter {
+        file: Option<File>,
+    }
+
+    impl DummyWriter {
+        fn new<P: AsRef<Path>>(path: Option<P>) -> Result<Self, Error> {
+            let file = if let Some(path) = path {
+                Some(
+                    OpenOptions::new()
+                        .read(true)
+                        .write(true)
+                        .truncate(true)
+                        .create(true)
+                        .open(path)?,
+                )
+            } else {
+                None
+            };
+            Ok(Self { file })
+        }
+    }
+
+    impl Write for DummyWriter {
+        fn write(&mut self, data: &[u8]) -> io::Result<usize> {
+            if let Some(file) = self.file.as_mut() {
+                file.write_all(data)?;
+            }
+            Ok(data.len())
+        }
+
+        fn flush(&mut self) -> io::Result<()> {
+            if let Some(file) = self.file.as_mut() {
+                file.flush()?;
+            }
+            Ok(())
+        }
+    }
+
+    impl SeqWrite for DummyWriter {
+        fn poll_seq_write(
+            mut self: Pin<&mut Self>,
+            _cx: &mut Context,
+            buf: &[u8],
+        ) -> Poll<io::Result<usize>> {
+            Poll::Ready(self.as_mut().write(buf))
+        }
+
+        fn poll_flush(mut self: Pin<&mut Self>, _cx: &mut Context) -> Poll<Result<(), io::Error>> {
+            Poll::Ready(self.as_mut().flush())
+        }
+    }
+
+    fn prepare<P: AsRef<Path>>(dir_path: P) -> Result<(), Error> {
+        let dir = nix::dir::Dir::open(dir_path.as_ref(), OFlag::O_DIRECTORY, Mode::empty())?;
+
+        let fs_magic = detect_fs_type(dir.as_raw_fd()).unwrap();
+        let stat = nix::sys::stat::fstat(dir.as_raw_fd()).unwrap();
+        let mut fs_feature_flags = Flags::from_magic(fs_magic);
+        let metadata = get_metadata(
+            dir.as_raw_fd(),
+            &stat,
+            fs_feature_flags,
+            fs_magic,
+            &mut fs_feature_flags,
+            false,
+        )?;
+
+        let mut extractor = Extractor::new(
+            dir,
+            metadata.clone(),
+            true,
+            OverwriteFlags::empty(),
+            fs_feature_flags,
+        );
+
+        let dir_metadata = Metadata {
+            stat: pxar::Stat::default().mode(0o777u64).set_dir().gid(0).uid(0),
+            ..Default::default()
+        };
+
+        let file_metadata = Metadata {
+            stat: pxar::Stat::default()
+                .mode(0o777u64)
+                .set_regular_file()
+                .gid(0)
+                .uid(0),
+            ..Default::default()
+        };
+
+        extractor.enter_directory(
+            OsString::from("testdir"),
+            dir_metadata.clone(),
+            true,
+        )?;
+
+        let size = 1024 * 1024;
+        let mut cursor = BufReader::new(std::io::Cursor::new(vec![0u8; size]));
+        for i in 0..10 {
+            extractor.enter_directory(
+                OsString::from(format!("folder_{i}")),
+                dir_metadata.clone(),
+                true,
+            )?;
+            for j in 0..10 {
+                cursor.seek(SeekFrom::Start(0))?;
+                extractor.extract_file(
+                    CString::new(format!("file_{j}").as_str())?.as_c_str(),
+                    &file_metadata,
+                    size as u64,
+                    &mut cursor,
+                    true,
+                )?;
+            }
+            extractor.leave_directory()?;
+        }
+
+        extractor.leave_directory()?;
+
+        Ok(())
+    }
+
+    #[test]
+    fn test_create_archive_with_reference() -> Result<(), Error> {
+        let mut testdir = PathBuf::from("./target/testout");
+        testdir.push(std::module_path!());
+
+        let _ = std::fs::remove_dir_all(&testdir);
+        let _ = std::fs::create_dir_all(&testdir);
+
+        prepare(testdir.as_path())?;
+
+        let previous_payload_index = Some(DynamicIndexReader::new(File::open(
+            "../tests/pxar/backup-client-pxar-data.ppxar.didx",
+        )?)?);
+        let metadata_archive = File::open("../tests/pxar/backup-client-pxar-data.mpxar").unwrap();
+        let metadata_size = metadata_archive.metadata()?.len();
+        let reader: MetadataArchiveReader = Arc::new(FileReader::new(metadata_archive));
+
+        let rt = tokio::runtime::Runtime::new().unwrap();
+        let (suggested_boundaries, _rx) = mpsc::channel();
+        let (forced_boundaries, _rx) = mpsc::channel();
+
+        rt.block_on(async move {
+            testdir.push("testdir");
+            let source_dir =
+                nix::dir::Dir::open(testdir.as_path(), OFlag::O_DIRECTORY, Mode::empty()).unwrap();
+
+            let fs_magic = detect_fs_type(source_dir.as_raw_fd()).unwrap();
+            let stat = nix::sys::stat::fstat(source_dir.as_raw_fd()).unwrap();
+            let mut fs_feature_flags = Flags::from_magic(fs_magic);
+
+            let metadata = get_metadata(
+                source_dir.as_raw_fd(),
+                &stat,
+                fs_feature_flags,
+                fs_magic,
+                &mut fs_feature_flags,
+                false,
+            )?;
+
+            let writer = DummyWriter::new(Some("./target/backup-client-pxar-run.mpxar")).unwrap();
+            let payload_writer = DummyWriter::new::<PathBuf>(None).unwrap();
+
+            let mut encoder = Encoder::new(
+                pxar::PxarVariant::Split(writer, payload_writer),
+                &metadata,
+                Some(&[]),
+            )
+            .await?;
+
+            let mut archiver = Archiver {
+                feature_flags: Flags::from_magic(fs_magic),
+                fs_feature_flags: Flags::from_magic(fs_magic),
+                fs_magic,
+                callback: Box::new(|_| Ok(())),
+                patterns: Vec::new(),
+                catalog: None,
+                path: PathBuf::new(),
+                entry_counter: 0,
+                entry_limit: 1024,
+                current_st_dev: stat.st_dev,
+                device_set: None,
+                hardlinks: HashMap::new(),
+                file_copy_buffer: vec::undefined(4 * 1024 * 1024),
+                skip_e2big_xattr: false,
+                forced_boundaries: Some(forced_boundaries),
+                previous_payload_index,
+                suggested_boundaries: Some(suggested_boundaries),
+                cache: PxarLookaheadCache::new(None),
+                reuse_stats: ReuseStats::default(),
+            };
+
+            let accessor = Accessor::new(pxar::PxarVariant::Unified(reader), metadata_size)
+                .await
+                .unwrap();
+            let root = accessor.open_root().await.ok();
+            archiver
+                .archive_dir_contents(&mut encoder, root, source_dir, true)
+                .await
+                .unwrap();
+
+            archiver
+                .flush_cached_reusing_if_below_threshold(&mut encoder, false)
+                .await
+                .unwrap();
+
+            encoder.finish().await.unwrap();
+            encoder.close().await.unwrap();
+
+            let status = Command::new("diff")
+                .args([
+                    "../tests/pxar/backup-client-pxar-expected.mpxar",
+                    "./target/backup-client-pxar-run.mpxar",
+                ])
+                .status()
+                .expect("failed to execute diff");
+            assert!(status.success());
+
+            Ok::<(), Error>(())
+        })
+    }
+}
diff --git a/tests/pxar/backup-client-pxar-data.mpxar b/tests/pxar/backup-client-pxar-data.mpxar
new file mode 100644
index 0000000000000000000000000000000000000000..00f3dc295fb38062c23e6cf7cac9ae110beb0a65
GIT binary patch
literal 15070
zcmeI3ZD<@t7{_Pd4&n>F7EIfqb#1^FO6}HK%_)tWO2s0r+iLp7VpnN`+SqK3@uiTk
z0g)mo3n~RgSX7jP#Z`-;wUEWA!B4JW5kF|x5580@T|bDmXzQ8GN@tzklRN*x`)~`#
z+|A9+Z+4!U-!r*zm%i41e0X5q&>}W-sk}V(=Du$q+4;h;F8=yl4}ZdoA2i1PeiW~F
z7gkDF&G*_D^Edhj2X^*7yu)JuwZnyZhYt+&$+{a8M{=R@jqX44;jVQr_n5qS`Ja!?
zJj=%~;8y>8^bO)nmIG_xu7%+&Cf=v??$*F?HnW6jmEx|0;T&euxV12x%N!baJq+hD
zm&V-y!}-jkaa}N6z<e54f#E_H2)HYTj;(+Fj+3hvDKpg*+uC;ZZa5IH;Qkxr`*hRW
zPwf8mu1mHb<?ZtNpKkkPvV3Urmo~1zy#C9{-_I`n@$iyOh4%etZrOKo^U622=`*~%
zeP!$T7i*WVKk;~>pO3D*w{v9smfybSqt4rptjHd`|MLm<Vqu)Wo?ABc{O-rP2Mg^x
z7k_iMt1TS)zR-W~W!=Mf|9sD>XZd*YdB}HcLEjPq_HYs}F67(1L&2w#Y%n&v?uz=3
zSjazEo-U<0$><xz#Vn$6IDIE9rg1oZr!1jyIDKa<rExfYGbN*OIDMBD#vM>&W#aU0
zDplb0RRf39x22dg4ySKhu>@R8-*xF*Vx%6v7kKeM>Dy6kA+B?*Z&z_>oMf`bW;a>I
z<m4$Xjl=2NS3DYr(|4fwG!CclPzh)pPT!Fd(m0&HV<n<-IDIEdOyh9+PL)K!we($=
zz9ow2nVpfOKE<8BGbI(`D#hVW-%QPD98TY5mGQr_Y8<H~u^F3PY>L^!RI9-0s|F6I
zZ%Z|498TZ1YSB2Hz8%%3aX5Xuszc*&`u0?p#^LnstDb;s>ANm{OZIGY=sQq-A+B?*
z?@$eB98TYn8qzqNzGF3_agwFbV75rqn8xAsovI0q!|6LyQyPcUH`6j2htqdiWBlvb
z8kruaZ&RxR&pTMO^j(*}C7Y-@^lfRT5Z5`@x2;(;4ySKNvuPYo->&A+IGnyc&82aY
zmDgal@HLOd;q)D7K8?faJJbRihtqeYg)|PQ?^ufjTua||>07d@n?v7;77KBmV|}Mu
zLgR4y&a{-q;q=Y)jK<;gUDg@@$9att98TY+UIm_af|D*4$wF^1TUfeD<8b=6b&JN~
z^zG<2jl=2N)g1xX(sy0@mMpX8(6^_%LR_VL68GJ=uX{8Or|&@bX&g@9p&rmUoW3JH
zq;WWX$9hELaQaU4n8r!=RfE|g)e{<r(|4w)G!Cb4W@G}crSH1*Es1+`=(}t%gFI5<
z^lchdAa#Pn>Dw|)8i&)jZCEr8r*FrwX&g@9uHn!)oW4E7rExfY`-Vs3B-^;bY!Mhf
zjl=0XGy(zF(sy0@mIR_X^c@+Y5Z5_AeaA*b<8b;;jF`sZ^qm?Bjl=0XGg2Cd(>E(+
zG!Ccla*375OpnvIS*il5g9T3CR>`Ds5^FS=E$osd;9B~Y>$^BF%I$l%bRK>#eY7&O
zHYWHE=v;8~)Sh?xU(H|VW$)?(9aB$LPP~7)*uLYlq5Vhi+jHX|?PC2`OI{f&Z9MeB
z>6K!Ahu7bI_2$0s#@C4T7d>;$s;8fP>(9z^v3}X<gBuoTx3=wFD%QVullIW@VSRMn
ee6jw{9WR|3pSSVg=*41v{&S{}`TgcUXZj0DmM>rc

literal 0
HcmV?d00001

diff --git a/tests/pxar/backup-client-pxar-data.ppxar.didx b/tests/pxar/backup-client-pxar-data.ppxar.didx
new file mode 100644
index 0000000000000000000000000000000000000000..a646218b5d504196443b17d62f3b22d171f011b8
GIT binary patch
literal 8096
zcmeIw&x=k`9LMqR`E|?Ga2Ha_;uI^9yBJNz!Z9f+>6R!pO*b*jh^~^|JnlRqSshV|
z%`G8TSEgiYa;C7OyU<J|9b=l(l;<{OBpc7nKj5>pIN$Ya@$KDb%dI01H%~o(*K{tQ
zJ~q6+^=PT==^y&xJ9`F3sC!@cUYwa2-S=W{^1`mZr(YJX_P=dkztmp8Zd>P0&yH!m
zYQlvAp+G1Q3WNfoKqwFjgaV;JC=d#S0-?bFT|iU3_TbPp*KU10`+DQ}SnEW!^~!_6
zmE|WN7T?dES$;Kh@A!s<^qNZtPc0p`(~IYOPkMW9;>Of`Ykp<tpHIL1?CtT1yY~$x
zkW0xxE~6B3Ic1P5D2JS-0&*o;$W>HA&QS%qnjGXj)sSn*LylMjxtI}Kh5y=%W?c!m
zglWhbmOw6L267ooA(yiZas|sFXITNcl3B=Atc09n736B>Am>>PxrTYj5pN(DbK=OZ
zH1A4ee_TV(@C0%xH;~JC3b~wTkSll&Im-*kmE1zE;w9u9uOL@*2RYAc$Ti$Ujzj~w
zSdc(=rA1dF`x6>+MkJ6+g@IfqQpn{ZgIpnU$XQW9t`rt>l_(+SL<PB8ILLWXL#`1X
zawHqb#gZhlD=oVc*`L&qGcti(Dh=c^nL;j?8RQC?L(a+qa;3D8t7Hi|Co9O+(m~G4
z8gh;FkR#PVE>@(FU1`;o$o`auoKXqnQe_~QsT6X#${<&$9CB6_kSmpiT%}6LIaNWf
zRt|Dr)sSnHha5!><l=}TWLG-sN@RbLhMb8K$YqgPb4Mq~o{fC&+#LA+d)TeK+0;Jx
Tczf@GpWkK|cE3#e4vqc=((xEu

literal 0
HcmV?d00001

diff --git a/tests/pxar/backup-client-pxar-expected.mpxar b/tests/pxar/backup-client-pxar-expected.mpxar
new file mode 100644
index 0000000000000000000000000000000000000000..ae4a18c89749f3d7ec82623e84509df19943d03e
GIT binary patch
literal 15086
zcmeI3Z-^9S9LJyew{ZQzQRvjGZ1W%mF~`ihExcw8BMEJ+&NoR;;T@Hij$M~!+%X3c
z5)=a!LLm(mg^)C*cv!*>U2*iP2{P$LIT8J_45t^7Nokw=%!_Ax+~4i?J=zyLa6C89
zJ@<T`XP)Qz{C>O3UiwDo@!`Q)L-SbmQh9m#&Zl18d#vMIli#0ud-r#bZF%Wv55GTG
z=D+abM~$(6erm4+b4!J*XM3IV`5y+h4{qsybhE|&Yln054j&rqmvuKLj^sk)8{PB%
zM_X6zEf;z7ykx98^L+dQZu!4Q-z3iBn7X*@U^tuQ^Q$wv6)>E`EdE&Q;I4<^TxQd_
zl`x#g92$264CgbK#@z_R1<a#yJuqCzd>U7R;UX3YxGRT_u72~*lgs8Q)#{0j9b5a>
z?2DIhA8zNZ*S-7XwomW5WYZDeF0cRj_D?3wgOk5@a0TY|UrzpUcHvKl7p$vkKXB&O
z-6z*CeQTp$?Kp2=x@-K{%EhZsJW<on$5-9oJ+f)T?_cwA<n2e6WDh_1`2>5pW}LsB
zTQv3Jww=9syS(h4|IOK+j&S6Mn*RGP>m9!Lm-|jV&&QKLhg^R(`j!Z=%tywH3;8zh
zQ1GcF8jMY^yIOt6Ead-K$2gMFH;GGFMB{M!PFYOjaQe<zLgR4yW=cxqaQZftjK<;g
zT~ru%K%Je5)3>FVG!Cb4TdB<N{8eXmIDI>cCE(inZb;t}BbE7C;Kl!>Z&$H}b(Ka7
zoW4E9p>dLjH8#D6RU4dq#iemLeFut1<8b;86`#i8^c^Vyjl=0XRzezw(|4joG!Ccl
zREcREPT!f52)MSs8`8H#5#{L_N$OKv_RZ8(SXU_yr*BiuXdF)8MV0YaS#@$8$=Zxf
zZ*6L$g{7J_4ySKht<3NIRcCfMeLJc}<8b<RRh!1)^zEq*jl=2NS6v#1(|4eH0<Nv^
zhV(5tv#p`;Q1yj%ond`PYCz*~`i|9*#^Lmxs1c2mY=tJHMXJU$4yW%-O=uiW-%Lws
z98TY+meDwzzKa^;Z^zaNy*PbanknGg`ff<yl0(!Q`nI*oMxrt}T=wl~7LCK{+tq9u
zhts#GIW!KZZ(nn1oMh)U87%_Mqj5NWhni30aQcq4fX3nU9cv+t!|6NGA_3RdcSHJ?
z?CRFgcdEt0y3TO+ooNY;!|9vpDUHMF+tf1}htqdaXZ(ZnIvE^J-<EFDILStDGFsSr
zWqwp*fz!96TQm-*Z&$Zz98TY!?g+THz8liFWSg~yzJ1*l)^&#U9q1m7!|6NJeHw?;
zccce24yW%}4{01u--#a4IGnyyJ*IJzbJb+D$n=E9;q=Xnl*ZxoZ5o+?YwNoqeM{E8
zHS}FHm_g<^xHnHM!=!OIecMK5epCjB)3;+-G!Cb4*RW|EPT!v4&^VmFeZ!@3IDH3(
zN8=>NxXEY{8a|D~={qt40oT@dL;991L~H0fHbP-tXE^&#jEKhJ^qm?pjl=0XGZGqy
z(>E)mG!Cb4vyjm^oW6?%Rv<$!PTy9+q;WWX+l9*fsKi2IjV7aoQ?LYFTi<eh*FG2J
zj$IqN55JH;UaBtE1U~`Yb8ea1@!r7e`F&pYE#KEQ^-Sr+2Um#gyFMG*bL4>?H~rZu
z)_=9&wV}e=gCCw=D%N*-1HIR*@Be;$g;;;lbJs3=_UU*2DlHc47oFa}W{!4S$F7B9
o{h^z+M~)BcqpN0%^>=T6<;?i3wfjde7VGn`GkwA5n}40@Z-J;Sk^lez

literal 0
HcmV?d00001

-- 
2.39.2



_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel



* [pbs-devel] [PATCH v7 proxmox-backup 66/69] client: tools: add helper to raise nofile rlimit
  2024-05-27 14:32 [pbs-devel] [PATCH v7 pxar proxmox-backup 00/69] fix #3174: improve file-level backup Christian Ebner
                   ` (64 preceding siblings ...)
  2024-05-27 14:33 ` [pbs-devel] [PATCH v7 proxmox-backup 65/69] client: pxar: add archive creation with reference test Christian Ebner
@ 2024-05-27 14:33 ` Christian Ebner
  2024-05-27 14:33 ` [pbs-devel] [PATCH v7 proxmox-backup 67/69] client: pxar: set cache limit based on " Christian Ebner
                   ` (3 subsequent siblings)
  69 siblings, 0 replies; 71+ messages in thread
From: Christian Ebner @ 2024-05-27 14:33 UTC (permalink / raw)
  To: pbs-devel

The default soft limit for open file handles is rather low (commonly
1024), because some APIs (e.g. the POSIX `select(2)` syscall) do not
work with file descriptors numbered 1024 or higher [0].
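This toy Rust model shows why a process that may call `select(2)` keeps its soft limit low. It is not glibc's actual definition. It treats an `fd_set` as a fixed bitmap of `FD_SETSIZE` bits and assumes glibc's value of 1024. A descriptor at or above that value cannot be represented. Passing such a descriptor to the real `FD_SET` macro is undefined behavior, whereas the model returns an error instead.

```rust
/// Assumed to match glibc's FD_SETSIZE on Linux.
const FD_SETSIZE: usize = 1024;

/// Toy model of `fd_set`: one bit per file descriptor, fixed size.
struct FdSet {
    bits: [u64; FD_SETSIZE / 64],
}

impl FdSet {
    fn new() -> Self {
        FdSet { bits: [0; FD_SETSIZE / 64] }
    }

    /// Mark `fd` as watched; descriptors beyond the bitmap cannot be stored.
    fn insert(&mut self, fd: usize) -> Result<(), String> {
        if fd >= FD_SETSIZE {
            return Err(format!("fd {} does not fit into a {}-bit fd_set", fd, FD_SETSIZE));
        }
        self.bits[fd / 64] |= 1u64 << (fd % 64);
        Ok(())
    }

    fn contains(&self, fd: usize) -> bool {
        fd < FD_SETSIZE && (self.bits[fd / 64] & (1u64 << (fd % 64))) != 0
    }
}

fn main() {
    let mut set = FdSet::new();
    assert!(set.insert(1023).is_ok());
    assert!(set.contains(1023));
    // Once the soft limit is raised, open() may hand out descriptors >= 1024,
    // which a select(2)-based caller could not watch.
    assert!(set.insert(1024).is_err());
}
```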

The lookahead cache used during the backup client's metadata comparison
to reuse unchanged files, however, requires much higher limits to work
effectively.

This helper function raises the soft limit to the hard limit, as
reported by the `getrlimit(2)` syscall.

[0] https://0pointer.net/blog/file-descriptor-limits.html

Suggested-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
changes since version 6:
- no changes

 pbs-client/src/tools/mod.rs | 23 +++++++++++++++++++++++
 1 file changed, 23 insertions(+)

diff --git a/pbs-client/src/tools/mod.rs b/pbs-client/src/tools/mod.rs
index 8d4fefaf3..a9d1e9843 100644
--- a/pbs-client/src/tools/mod.rs
+++ b/pbs-client/src/tools/mod.rs
@@ -570,3 +570,26 @@ pub fn handle_root_with_optional_format_version_prelude<R: pxar::decoder::SeqRea
         _ => bail!("unexpected entry kind {:?}", first.kind()),
     }
 }
+
+/// Raise the soft limit for open file handles to the hard limit
+///
+/// Returns the values set before raising the limit as libc::rlimit64
+pub fn raise_nofile_limit() -> Result<libc::rlimit64, Error> {
+    let mut old = libc::rlimit64 {
+        rlim_cur: 0,
+        rlim_max: 0,
+    };
+    if 0 != unsafe { libc::getrlimit64(libc::RLIMIT_NOFILE, &mut old as *mut libc::rlimit64) } {
+        bail!("Failed to get nofile rlimit");
+    }
+
+    let mut new = libc::rlimit64 {
+        rlim_cur: old.rlim_max,
+        rlim_max: old.rlim_max,
+    };
+    if 0 != unsafe { libc::setrlimit64(libc::RLIMIT_NOFILE, &mut new as *mut libc::rlimit64) } {
+        bail!("Failed to set nofile rlimit");
+    }
+
+    Ok(old)
+}
-- 
2.39.2




* [pbs-devel] [PATCH v7 proxmox-backup 67/69] client: pxar: set cache limit based on nofile rlimit
  2024-05-27 14:32 [pbs-devel] [PATCH v7 pxar proxmox-backup 00/69] fix #3174: improve file-level backup Christian Ebner
                   ` (65 preceding siblings ...)
  2024-05-27 14:33 ` [pbs-devel] [PATCH v7 proxmox-backup 66/69] client: tools: add helper to raise nofile rlimit Christian Ebner
@ 2024-05-27 14:33 ` Christian Ebner
  2024-05-27 14:33 ` [pbs-devel] [PATCH v7 proxmox-backup 68/69] chunker: tests: add regression tests for payload chunker Christian Ebner
                   ` (2 subsequent siblings)
  69 siblings, 0 replies; 71+ messages in thread
From: Christian Ebner @ 2024-05-27 14:33 UTC (permalink / raw)
  To: pbs-devel

The lookahead cache requires a high resource limit for open file
handles in order to allow for efficient reuse of unchanged file
payloads.

Increase the nofile soft limit to the hard limit and dynamically adapt
the cache size to the new soft limit minus half of the previous soft
limit.
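This minimal Rust sketch works through the sizing rule. The helper name `lookahead_cache_size` is hypothetical. As an example, take a soft limit of 1024 and a hard limit of 524288, the defaults on recent systemd-based systems. The cache may then hold 524288 - 1024 / 2 = 523776 entries. The soft limit never exceeds the hard limit, so the subtraction cannot underflow and the result is at least half the hard limit. Keeping half of the old soft limit in reserve presumably leaves descriptors for everything outside the cache.

```rust
use std::convert::TryFrom;
use std::num::TryFromIntError;

/// Hypothetical helper mirroring the computation in `create_backup`:
/// new soft limit (= old hard limit) minus half of the previous soft limit.
fn lookahead_cache_size(old_soft: u64, old_hard: u64) -> Result<usize, TryFromIntError> {
    // rlim_cur <= rlim_max always holds, so this cannot underflow.
    usize::try_from(old_hard - old_soft / 2)
}

fn main() {
    // Example limits: soft 1024, hard 524288.
    match lookahead_cache_size(1024, 524_288) {
        Ok(size) => println!("max lookahead cache entries: {}", size), // prints 523776
        Err(err) => eprintln!("limit does not fit into usize: {}", err),
    }
}
```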

The `PxarCreateOptions` and the `Archiver` are therefore extended by
an additional field to store the maximum cache size, falling back to
a default size of 512 entries if none is given.
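This minimal Rust sketch shows the optional field with its fallback. `LookaheadCache`, its `push` method and `DEFAULT_CACHE_SIZE` are hypothetical stand-ins, not the actual `PxarLookaheadCache` API, and only the capacity handling is modelled. `None`, as passed by the restore daemon and `pxar-bin`, yields the 512-entry default. `Some(n)` caps the cache at `n` entries.

```rust
/// Fallback capacity named in the commit message.
const DEFAULT_CACHE_SIZE: usize = 512;

/// Hypothetical stand-in for the lookahead cache; only capacity is modelled.
struct LookaheadCache<T> {
    entries: Vec<T>,
    max_size: usize,
}

impl<T> LookaheadCache<T> {
    fn new(max_size: Option<usize>) -> Self {
        let max_size = max_size.unwrap_or(DEFAULT_CACHE_SIZE);
        LookaheadCache { entries: Vec::new(), max_size }
    }

    /// Cache an entry, or hand it back when full so the caller can flush
    /// the cached entries first.
    fn push(&mut self, entry: T) -> Result<(), T> {
        if self.entries.len() >= self.max_size {
            return Err(entry);
        }
        self.entries.push(entry);
        Ok(())
    }
}

fn main() {
    let default_cache: LookaheadCache<u32> = LookaheadCache::new(None);
    assert_eq!(default_cache.max_size, 512);

    let mut small = LookaheadCache::new(Some(2));
    assert!(small.push("a").is_ok());
    assert!(small.push("b").is_ok());
    assert_eq!(small.push("c"), Err("c"));
}
```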

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
changes since version 6:
- adapt to PxarLookaheadCache

 pbs-client/src/pxar/create.rs                 |  6 ++++--
 proxmox-backup-client/src/main.rs             | 21 ++++++++++++++++---
 .../src/proxmox_restore_daemon/api.rs         |  1 +
 pxar-bin/src/main.rs                          |  1 +
 4 files changed, 24 insertions(+), 5 deletions(-)

diff --git a/pbs-client/src/pxar/create.rs b/pbs-client/src/pxar/create.rs
index bcadf12bd..c30ba340f 100644
--- a/pbs-client/src/pxar/create.rs
+++ b/pbs-client/src/pxar/create.rs
@@ -56,6 +56,8 @@ pub struct PxarCreateOptions {
     pub skip_e2big_xattr: bool,
     /// Reference state for partial backups
     pub previous_ref: Option<PxarPrevRef>,
+    /// Maximum number of lookahead cache entries
+    pub max_cache_size: Option<usize>,
 }
 
 pub type MetadataArchiveReader = Arc<dyn ReadAt + Send + Sync + 'static>;
@@ -275,7 +277,7 @@ where
         forced_boundaries,
         suggested_boundaries,
         previous_payload_index,
-        cache: PxarLookaheadCache::new(None),
+        cache: PxarLookaheadCache::new(options.max_cache_size),
         reuse_stats: ReuseStats::default(),
     };
 
@@ -1927,7 +1929,7 @@ mod tests {
                 forced_boundaries: Some(forced_boundaries),
                 previous_payload_index,
                 suggested_boundaries: Some(suggested_boundaries),
-                cache: PxarLookaheadCache::new(),
+                cache: PxarLookaheadCache::new(None),
                 reuse_stats: ReuseStats::default(),
             };
 
diff --git a/proxmox-backup-client/src/main.rs b/proxmox-backup-client/src/main.rs
index ad9042857..9e5c2006e 100644
--- a/proxmox-backup-client/src/main.rs
+++ b/proxmox-backup-client/src/main.rs
@@ -41,7 +41,7 @@ use pbs_client::tools::{
         crypto_parameters, format_key_source, get_encryption_key_password, KEYFD_SCHEMA,
         KEYFILE_SCHEMA, MASTER_PUBKEY_FD_SCHEMA, MASTER_PUBKEY_FILE_SCHEMA,
     },
-    CHUNK_SIZE_SCHEMA, REPO_URL_SCHEMA,
+    raise_nofile_limit, CHUNK_SIZE_SCHEMA, REPO_URL_SCHEMA,
 };
 use pbs_client::{
     delete_ticket_info, parse_backup_specification, view_task_result, BackupDetectionMode,
@@ -1074,7 +1074,8 @@ async fn create_backup(
                     .start_directory(std::ffi::CString::new(target.as_str())?.as_c_str())?;
 
                 let mut previous_ref = None;
-                if detection_mode.is_metadata() {
+                let max_cache_size = if detection_mode.is_metadata() {
+                    let old_rlimit = raise_nofile_limit()?;
                     if let Some(ref manifest) = previous_manifest {
                         // BackupWriter::start created a new snapshot, get the one before
                         if let Some(backup_time) = client.previous_backup_time().await? {
@@ -1099,7 +1100,20 @@ async fn create_backup(
                             .await?
                         }
                     }
-                }
+
+                    if old_rlimit.rlim_max <= 4096 {
+                        log::info!(
+                            "resource limit for open file handles low: {}",
+                            old_rlimit.rlim_max,
+                        );
+                    }
+
+                    Some(usize::try_from(
+                        old_rlimit.rlim_max - old_rlimit.rlim_cur / 2,
+                    )?)
+                } else {
+                    None
+                };
 
                 let pxar_options = pbs_client::pxar::PxarCreateOptions {
                     device_set: devices.clone(),
@@ -1108,6 +1122,7 @@ async fn create_backup(
                     skip_lost_and_found,
                     skip_e2big_xattr,
                     previous_ref,
+                    max_cache_size,
                 };
 
                 let upload_options = UploadOptions {
diff --git a/proxmox-restore-daemon/src/proxmox_restore_daemon/api.rs b/proxmox-restore-daemon/src/proxmox_restore_daemon/api.rs
index 80af5011e..0a535b7a7 100644
--- a/proxmox-restore-daemon/src/proxmox_restore_daemon/api.rs
+++ b/proxmox-restore-daemon/src/proxmox_restore_daemon/api.rs
@@ -361,6 +361,7 @@ fn extract(
                         skip_lost_and_found: false,
                         skip_e2big_xattr: false,
                         previous_ref: None,
+                        max_cache_size: None,
                     };
 
                     let pxar_writer = pxar::PxarVariant::Unified(TokioWriter::new(writer));
diff --git a/pxar-bin/src/main.rs b/pxar-bin/src/main.rs
index 4cb7713e8..3a746b9f4 100644
--- a/pxar-bin/src/main.rs
+++ b/pxar-bin/src/main.rs
@@ -370,6 +370,7 @@ async fn create_archive(
         skip_lost_and_found: false,
         skip_e2big_xattr: false,
         previous_ref: None,
+        max_cache_size: None,
     };
 
     let source = PathBuf::from(source);
-- 
2.39.2




* [pbs-devel] [PATCH v7 proxmox-backup 68/69] chunker: tests: add regression tests for payload chunker
  2024-05-27 14:32 [pbs-devel] [PATCH v7 pxar proxmox-backup 00/69] fix #3174: improve file-level backup Christian Ebner
                   ` (66 preceding siblings ...)
  2024-05-27 14:33 ` [pbs-devel] [PATCH v7 proxmox-backup 67/69] client: pxar: set cache limit based on " Christian Ebner
@ 2024-05-27 14:33 ` Christian Ebner
  2024-05-27 14:33 ` [pbs-devel] [PATCH v7 proxmox-backup 69/69] chunk stream: " Christian Ebner
  2024-05-28  9:45 ` [pbs-devel] [PATCH v7 pxar proxmox-backup 00/69] fix #3174: improve file-level backup Christian Ebner
  69 siblings, 0 replies; 71+ messages in thread
From: Christian Ebner @ 2024-05-27 14:33 UTC (permalink / raw)
  To: pbs-devel

Test chunking of a payload stream with suggested chunk boundaries.
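The nested loop in `test_suggested_boundary` builds an input buffer that is simply a 32-bit counter encoded little-endian, giving 1 MiB of data. The Rust sketch below uses hypothetical helper names. It checks that equivalence. It also checks that the expected chunk sizes listed in the test cover the whole buffer, with chunk ends at the suggested boundaries 32768 and 405521.

```rust
/// Input as built by the nested loop in the test: byte `j` of counter `i`.
fn buffer_from_loop() -> Vec<u8> {
    let mut buffer = Vec::new();
    for i in 0..(256 * 1024) {
        for j in 0..4 {
            let byte = ((i >> (j << 3)) & 0xff) as u8;
            buffer.push(byte);
        }
    }
    buffer
}

/// Same data, written as the little-endian encoding of a u32 counter.
fn buffer_from_counter() -> Vec<u8> {
    (0u32..256 * 1024).flat_map(|i| i.to_le_bytes()).collect()
}

fn main() {
    let buffer = buffer_from_counter();
    assert_eq!(buffer, buffer_from_loop());
    assert_eq!(buffer.len(), 1024 * 1024);

    // Expected chunk sizes from the test; their running sums are the chunk ends.
    let expected_sizes = [32768usize, 110609, 229376, 32768, 262144, 262144, 118767];
    let mut end = 0;
    let ends: Vec<usize> = expected_sizes
        .iter()
        .map(|size| {
            end += *size;
            end
        })
        .collect();
    assert_eq!(end, buffer.len());
    assert!(ends.contains(&32768) && ends.contains(&405521));
}
```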

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
changes since version 6:
- fix incorrect context updates for tests

 pbs-datastore/src/chunker.rs | 94 ++++++++++++++++++++++++++++++++++++
 1 file changed, 94 insertions(+)

diff --git a/pbs-datastore/src/chunker.rs b/pbs-datastore/src/chunker.rs
index d0543bca0..ecdbca296 100644
--- a/pbs-datastore/src/chunker.rs
+++ b/pbs-datastore/src/chunker.rs
@@ -382,3 +382,97 @@ fn test_chunker1() {
         panic!("got different chunks");
     }
 }
+
+#[test]
+fn test_suggested_boundary() {
+    let mut buffer = Vec::new();
+
+    for i in 0..(256 * 1024) {
+        for j in 0..4 {
+            let byte = ((i >> (j << 3)) & 0xff) as u8;
+            buffer.push(byte);
+        }
+    }
+    let (tx, rx) = std::sync::mpsc::channel();
+    let mut chunker = PayloadChunker::new(64 * 1024, rx);
+
+    // Suggest chunk boundary within regular chunk
+    tx.send(32 * 1024).unwrap();
+    // Suggest chunk boundary within regular chunk, resulting in a zero-sized chunk
+    tx.send(32 * 1024).unwrap();
+    // Suggest chunk boundary in the past, must be ignored
+    tx.send(0).unwrap();
+    // Suggest chunk boundary aligned with regular boundary
+    tx.send(405521).unwrap();
+
+    let mut pos = 0;
+    let mut last = 0;
+
+    let mut chunks1: Vec<(usize, usize)> = vec![];
+    let mut chunks2: Vec<(usize, usize)> = vec![];
+    let mut ctx = Context::default();
+
+    // test1: feed single bytes with suggested boundaries
+    while pos < buffer.len() {
+        ctx.total += 1;
+        let k = chunker.scan(&buffer[pos..pos + 1], &ctx);
+        pos += 1;
+        if k != 0 {
+            let prev = last;
+            last = pos;
+            ctx.base += pos as u64;
+            ctx.total = 0;
+            chunks1.push((prev, pos - prev));
+        }
+    }
+    chunks1.push((last, buffer.len() - last));
+
+    let mut pos = 0;
+    let mut ctx = Context::default();
+    ctx.total = buffer.len() as u64;
+    chunker.reset();
+    // Suggest chunk boundary within regular chunk
+    tx.send(32 * 1024).unwrap();
+    // Suggest chunk boundary within regular chunk,
+    // resulting chunk being too small and therefore ignored
+    tx.send(32 * 1024).unwrap();
+    // Suggest chunk boundary in the past, must be ignored
+    tx.send(0).unwrap();
+    // Suggest chunk boundary aligned with regular boundary
+    tx.send(405521).unwrap();
+
+    while pos < buffer.len() {
+        let k = chunker.scan(&buffer[pos..], &ctx);
+        if k != 0 {
+            chunks2.push((pos, k));
+            pos += k;
+            ctx.base += pos as u64;
+            ctx.total = (buffer.len() - pos) as u64;
+        } else {
+            break;
+        }
+    }
+
+    chunks2.push((pos, buffer.len() - pos));
+
+    if chunks1 != chunks2 {
+        let mut size1 = 0;
+        for (_offset, len) in &chunks1 {
+            size1 += len;
+        }
+        println!("Chunks1: {size1}\n{chunks1:?}\n");
+
+        let mut size2 = 0;
+        for (_offset, len) in &chunks2 {
+            size2 += len;
+        }
+        println!("Chunks2: {size2}\n{chunks2:?}\n");
+
+        panic!("got different chunks");
+    }
+
+    let expected_sizes = [32768, 110609, 229376, 32768, 262144, 262144, 118767];
+    for ((_, chunk_size), expected) in chunks1.iter().zip(expected_sizes.iter()) {
+        assert_eq!(chunk_size, expected);
+    }
+}
-- 
2.39.2




* [pbs-devel] [PATCH v7 proxmox-backup 69/69] chunk stream: tests: add regression tests for payload chunker
  2024-05-27 14:32 [pbs-devel] [PATCH v7 pxar proxmox-backup 00/69] fix #3174: improve file-level backup Christian Ebner
                   ` (67 preceding siblings ...)
  2024-05-27 14:33 ` [pbs-devel] [PATCH v7 proxmox-backup 68/69] chunker: tests: add regression tests for payload chunker Christian Ebner
@ 2024-05-27 14:33 ` Christian Ebner
  2024-05-28  9:45 ` [pbs-devel] [PATCH v7 pxar proxmox-backup 00/69] fix #3174: improve file-level backup Christian Ebner
  69 siblings, 0 replies; 71+ messages in thread
From: Christian Ebner @ 2024-05-27 14:33 UTC (permalink / raw)
  To: pbs-devel

Add regression tests covering suggested and forced boundaries as well
as chunk injection.
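The expected values in `test_chunk_stream_forced_boundaries` can be cross-checked with plain arithmetic. The Rust sketch below uses hypothetical names throughout. It rests on one assumption, inferred from the numbers rather than stated in the patch: a forced boundary offset counts the bytes injected before it. The matching input position is then the boundary minus all earlier injection sizes. Under that reading, every forced boundary coincides with the end of an expected chunk. The emitted chunks cover exactly the 1 MiB input, and the injected sizes make up the rest of the final `total` assertion.

```rust
/// Values copied from `test_chunk_stream_forced_boundaries`.
const INPUT_LEN: u64 = 4 * 256 * 1024;
const EXPECTED_CHUNKS: [u64; 8] = [32768, 31744, 65536, 262144, 262144, 512, 262144, 131584];
/// (boundary, injected size) pairs in the order they are sent.
const FORCED: [(u64, u64); 4] = [
    (32 * 1024, 1024),
    (128 * 1024, 2048),
    (657408, 512),
    (657408 + 1024, 0),
];

/// End offsets of the emitted chunks, measured in input bytes.
fn chunk_ends(sizes: &[u64]) -> Vec<u64> {
    let mut end = 0u64;
    sizes
        .iter()
        .map(|size| {
            end += *size;
            end
        })
        .collect()
}

/// Assumption: a boundary counts previously injected bytes, so the input
/// offset is the boundary minus all earlier injections.
fn boundaries_in_input(forced: &[(u64, u64)]) -> Vec<u64> {
    let mut injected = 0u64;
    forced
        .iter()
        .map(|&(boundary, size)| {
            let input_offset = boundary - injected;
            injected += size;
            input_offset
        })
        .collect()
}

fn main() {
    let ends = chunk_ends(&EXPECTED_CHUNKS);
    assert_eq!(*ends.last().unwrap(), INPUT_LEN);
    for offset in boundaries_in_input(&FORCED) {
        assert!(ends.contains(&offset), "no chunk ends at input offset {}", offset);
    }
    let injected: u64 = FORCED.iter().map(|&(_, size)| size).sum();
    assert_eq!(INPUT_LEN + injected, 4 * 256 * 1024 + 1024 + 2048 + 512);
}
```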

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
changes since version 6:
- add missing `#[cfg(test)]` attribute
- fix formatting

 pbs-client/src/chunk_stream.rs | 117 +++++++++++++++++++++++++++++++++
 1 file changed, 117 insertions(+)

diff --git a/pbs-client/src/chunk_stream.rs b/pbs-client/src/chunk_stream.rs
index de3e7bb5d..e3f0980c6 100644
--- a/pbs-client/src/chunk_stream.rs
+++ b/pbs-client/src/chunk_stream.rs
@@ -237,3 +237,120 @@ where
         }
     }
 }
+
+#[cfg(test)]
+mod test {
+    use futures::stream::StreamExt;
+
+    use super::*;
+
+    struct DummyInput {
+        data: Vec<u8>,
+    }
+
+    impl DummyInput {
+        fn new(data: Vec<u8>) -> Self {
+            Self { data }
+        }
+    }
+
+    impl Stream for DummyInput {
+        type Item = Result<Vec<u8>, Error>;
+
+        fn poll_next(self: Pin<&mut Self>, _cx: &mut Context) -> Poll<Option<Self::Item>> {
+            let this = self.get_mut();
+            match this.data.len() {
+                0 => Poll::Ready(None),
+                size if size > 10 => Poll::Ready(Some(Ok(this.data.split_off(10)))),
+                _ => Poll::Ready(Some(Ok(std::mem::take(&mut this.data)))),
+            }
+        }
+    }
+
+    #[test]
+    fn test_chunk_stream_forced_boundaries() {
+        let mut data = Vec::new();
+        for i in 0..(256 * 1024) {
+            for j in 0..4 {
+                let byte = ((i >> (j << 3)) & 0xff) as u8;
+                data.push(byte);
+            }
+        }
+
+        let mut input = DummyInput::new(data);
+        let input = Pin::new(&mut input);
+
+        let (injections_tx, injections_rx) = mpsc::channel();
+        let (boundaries_tx, boundaries_rx) = mpsc::channel();
+        let (suggested_tx, suggested_rx) = mpsc::channel();
+        let injection_data = InjectionData::new(boundaries_rx, injections_tx);
+
+        let mut chunk_stream = ChunkStream::new(
+            input,
+            Some(64 * 1024),
+            Some(injection_data),
+            Some(suggested_rx),
+        );
+        let chunks = std::sync::Arc::new(std::sync::Mutex::new(Vec::new()));
+        let chunks_clone = chunks.clone();
+
+        // Suggested boundary matching forced boundary
+        suggested_tx.send(32 * 1024).unwrap();
+        // Suggested boundary not matching forced boundary
+        suggested_tx.send(64 * 1024).unwrap();
+        // Force chunk boundary at suggested boundary
+        boundaries_tx
+            .send(InjectChunks {
+                boundary: 32 * 1024,
+                chunks: Vec::new(),
+                size: 1024,
+            })
+            .unwrap();
+        // Force chunk boundary within regular chunk
+        boundaries_tx
+            .send(InjectChunks {
+                boundary: 128 * 1024,
+                chunks: Vec::new(),
+                size: 2048,
+            })
+            .unwrap();
+        // Force chunk boundary aligned with regular boundary
+        boundaries_tx
+            .send(InjectChunks {
+                boundary: 657408,
+                chunks: Vec::new(),
+                size: 512,
+            })
+            .unwrap();
+        // Force chunk boundary within regular chunk, without injecting data
+        boundaries_tx
+            .send(InjectChunks {
+                boundary: 657408 + 1024,
+                chunks: Vec::new(),
+                size: 0,
+            })
+            .unwrap();
+
+        let rt = tokio::runtime::Runtime::new().unwrap();
+        rt.block_on(async move {
+            while let Some(chunk) = chunk_stream.next().await {
+                let chunk = chunk.unwrap();
+                let mut chunks = chunks.lock().unwrap();
+                chunks.push(chunk);
+            }
+        });
+
+        let mut total = 0;
+        let chunks = chunks_clone.lock().unwrap();
+        let expected = [32768, 31744, 65536, 262144, 262144, 512, 262144, 131584];
+        for (chunk, expected) in chunks.as_slice().iter().zip(expected.iter()) {
+            assert_eq!(chunk.len(), *expected);
+            total += chunk.len();
+        }
+        while let Ok(injection) = injections_rx.recv() {
+            total += injection.size;
+        }
+
+        assert_eq!(total, 4 * 256 * 1024 + 1024 + 2048 + 512);
+    }
+}
-- 
2.39.2




* Re: [pbs-devel] [PATCH v7 pxar proxmox-backup 00/69] fix #3174: improve file-level backup
  2024-05-27 14:32 [pbs-devel] [PATCH v7 pxar proxmox-backup 00/69] fix #3174: improve file-level backup Christian Ebner
                   ` (68 preceding siblings ...)
  2024-05-27 14:33 ` [pbs-devel] [PATCH v7 proxmox-backup 69/69] chunk stream: " Christian Ebner
@ 2024-05-28  9:45 ` Christian Ebner
  69 siblings, 0 replies; 71+ messages in thread
From: Christian Ebner @ 2024-05-28  9:45 UTC (permalink / raw)
  To: pbs-devel

Superseded by version 8:
https://lists.proxmox.com/pipermail/pbs-devel/2024-May/009526.html



end of thread, other threads:[~2024-05-28  9:45 UTC | newest]

Thread overview: 71+ messages
2024-05-27 14:32 [pbs-devel] [PATCH v7 pxar proxmox-backup 00/69] fix #3174: improve file-level backup Christian Ebner
2024-05-27 14:32 ` [pbs-devel] [PATCH v7 pxar 01/69] decoder: factor out skip part from skip_entry Christian Ebner
2024-05-27 14:32 ` [pbs-devel] [PATCH v7 pxar 02/69] lib: add type for input/output variant differentiation Christian Ebner
2024-05-27 14:32 ` [pbs-devel] [PATCH v7 pxar 03/69] encoder: move to stack based state tracking Christian Ebner
2024-05-27 14:32 ` [pbs-devel] [PATCH v7 pxar 04/69] format/examples: add header type `PXAR_PAYLOAD_REF` Christian Ebner
2024-05-27 14:32 ` [pbs-devel] [PATCH v7 pxar 05/69] decoder: add method to read payload references Christian Ebner
2024-05-27 14:32 ` [pbs-devel] [PATCH v7 pxar 06/69] encoder: allow split output writer for archive creation Christian Ebner
2024-05-27 14:32 ` [pbs-devel] [PATCH v7 pxar 07/69] decoder/accessor: allow for split input stream variant Christian Ebner
2024-05-27 14:32 ` [pbs-devel] [PATCH v7 pxar 08/69] decoder: set payload input range when decoding via accessor Christian Ebner
2024-05-27 14:32 ` [pbs-devel] [PATCH v7 pxar 09/69] encoder: add payload reference capability Christian Ebner
2024-05-27 14:32 ` [pbs-devel] [PATCH v7 pxar 10/69] encoder: add payload position capability Christian Ebner
2024-05-27 14:32 ` [pbs-devel] [PATCH v7 pxar 11/69] encoder: add payload advance capability Christian Ebner
2024-05-27 14:32 ` [pbs-devel] [PATCH v7 pxar 12/69] encoder/format: finish payload stream with marker Christian Ebner
2024-05-27 14:32 ` [pbs-devel] [PATCH v7 pxar 13/69] format: add payload stream start marker Christian Ebner
2024-05-27 14:32 ` [pbs-devel] [PATCH v7 pxar 14/69] format/encoder/decoder: new pxar entry type `Version` Christian Ebner
2024-05-27 14:32 ` [pbs-devel] [PATCH v7 pxar 15/69] format/encoder/decoder: new pxar entry type `Prelude` Christian Ebner
2024-05-27 14:32 ` [pbs-devel] [PATCH v7 proxmox-backup 16/69] client: backup: factor out extension from backup target Christian Ebner
2024-05-27 14:32 ` [pbs-devel] [PATCH v7 proxmox-backup 17/69] api: datastore: refactor getting local chunk reader Christian Ebner
2024-05-27 14:32 ` [pbs-devel] [PATCH v7 proxmox-backup 18/69] client: pxar: switch to stack based encoder state Christian Ebner
2024-05-27 14:32 ` [pbs-devel] [PATCH v7 proxmox-backup 19/69] client: pxar: combine writers into struct Christian Ebner
2024-05-27 14:32 ` [pbs-devel] [PATCH v7 proxmox-backup 20/69] client: pxar: optionally split metadata and payload streams Christian Ebner
2024-05-27 14:32 ` [pbs-devel] [PATCH v7 proxmox-backup 21/69] client: helper: add helpers for creating reader instances Christian Ebner
2024-05-27 14:32 ` [pbs-devel] [PATCH v7 proxmox-backup 22/69] client: helper: add method for split archive name mapping Christian Ebner
2024-05-27 14:32 ` [pbs-devel] [PATCH v7 proxmox-backup 23/69] client: tools: helper to check pxar filename extensions Christian Ebner
2024-05-27 14:32 ` [pbs-devel] [PATCH v7 proxmox-backup 24/69] client: restore: read payload from dedicated index Christian Ebner
2024-05-27 14:32 ` [pbs-devel] [PATCH v7 proxmox-backup 25/69] tools: cover extension for split pxar archives Christian Ebner
2024-05-27 14:32 ` [pbs-devel] [PATCH v7 proxmox-backup 26/69] restore: " Christian Ebner
2024-05-27 14:32 ` [pbs-devel] [PATCH v7 proxmox-backup 27/69] client: mount: make split pxar archives mountable Christian Ebner
2024-05-27 14:32 ` [pbs-devel] [PATCH v7 proxmox-backup 28/69] api: datastore: attach split archive payload chunk reader Christian Ebner
2024-05-27 14:32 ` [pbs-devel] [PATCH v7 proxmox-backup 29/69] catalog: shell: make split pxar archives accessible Christian Ebner
2024-05-27 14:32 ` [pbs-devel] [PATCH v7 proxmox-backup 30/69] www: cover metadata extension for pxar archives Christian Ebner
2024-05-27 14:32 ` [pbs-devel] [PATCH v7 proxmox-backup 31/69] file restore: factor out getting pxar reader Christian Ebner
2024-05-27 14:32 ` [pbs-devel] [PATCH v7 proxmox-backup 32/69] file restore: cover split metadata and payload archives Christian Ebner
2024-05-27 14:32 ` [pbs-devel] [PATCH v7 proxmox-backup 33/69] file restore: show more error context when extraction fails Christian Ebner
2024-05-27 14:32 ` [pbs-devel] [PATCH v7 proxmox-backup 34/69] pxar: add optional payload input for archive restore Christian Ebner
2024-05-27 14:32 ` [pbs-devel] [PATCH v7 proxmox-backup 35/69] pxar: cover listing for split archives Christian Ebner
2024-05-27 14:32 ` [pbs-devel] [PATCH v7 proxmox-backup 36/69] pxar: add more context to extraction error Christian Ebner
2024-05-27 14:32 ` [pbs-devel] [PATCH v7 proxmox-backup 37/69] client: pxar: include payload offset in entry listing Christian Ebner
2024-05-27 14:32 ` [pbs-devel] [PATCH v7 proxmox-backup 38/69] pxar: show padding in debug output on archive list Christian Ebner
2024-05-27 14:32 ` [pbs-devel] [PATCH v7 proxmox-backup 39/69] datastore: dynamic index: add method to get digest Christian Ebner
2024-05-27 14:32 ` [pbs-devel] [PATCH v7 proxmox-backup 40/69] client: pxar: helper for lookup of reusable dynamic entries Christian Ebner
2024-05-27 14:32 ` [pbs-devel] [PATCH v7 proxmox-backup 41/69] upload stream: implement reused chunk injector Christian Ebner
2024-05-27 14:32 ` [pbs-devel] [PATCH v7 proxmox-backup 42/69] client: chunk stream: add struct to hold injection state Christian Ebner
2024-05-27 14:32 ` [pbs-devel] [PATCH v7 proxmox-backup 43/69] chunker: add method to reset chunker state Christian Ebner
2024-05-27 14:32 ` [pbs-devel] [PATCH v7 proxmox-backup 44/69] client: streams: add channels for dynamic entry injection Christian Ebner
2024-05-27 14:32 ` [pbs-devel] [PATCH v7 proxmox-backup 45/69] specs: add backup detection mode specification Christian Ebner
2024-05-27 14:33 ` [pbs-devel] [PATCH v7 proxmox-backup 46/69] client: implement prepare reference method Christian Ebner
2024-05-27 14:33 ` [pbs-devel] [PATCH v7 proxmox-backup 47/69] client: pxar: add method for metadata comparison Christian Ebner
2024-05-27 14:33 ` [pbs-devel] [PATCH v7 proxmox-backup 48/69] pxar: caching: add look-ahead cache Christian Ebner
2024-05-27 14:33 ` [pbs-devel] [PATCH v7 proxmox-backup 49/69] client: pxar: refactor catalog encoding for directories Christian Ebner
2024-05-27 14:33 ` [pbs-devel] [PATCH v7 proxmox-backup 50/69] fix #3174: client: pxar: enable caching and meta comparison Christian Ebner
2024-05-27 14:33 ` [pbs-devel] [PATCH v7 proxmox-backup 51/69] client: backup writer: add injected chunk count to stats Christian Ebner
2024-05-27 14:33 ` [pbs-devel] [PATCH v7 proxmox-backup 52/69] pxar: create: keep track of reused chunks and files Christian Ebner
2024-05-27 14:33 ` [pbs-devel] [PATCH v7 proxmox-backup 53/69] pxar: create: show chunk injection stats debug output Christian Ebner
2024-05-27 14:33 ` [pbs-devel] [PATCH v7 proxmox-backup 54/69] client: pxar: add helper to handle optional preludes Christian Ebner
2024-05-27 14:33 ` [pbs-devel] [PATCH v7 proxmox-backup 55/69] client: pxar: opt encode cli exclude patterns as Prelude Christian Ebner
2024-05-27 14:33 ` [pbs-devel] [PATCH v7 proxmox-backup 56/69] pxar: ignore version and prelude entries in listing Christian Ebner
2024-05-27 14:33 ` [pbs-devel] [PATCH v7 proxmox-backup 57/69] docs: file formats: describe split pxar archive file layout Christian Ebner
2024-05-27 14:33 ` [pbs-devel] [PATCH v7 proxmox-backup 58/69] docs: add section describing change detection mode Christian Ebner
2024-05-27 14:33 ` [pbs-devel] [PATCH v7 proxmox-backup 59/69] test-suite: add detection mode change benchmark Christian Ebner
2024-05-27 14:33 ` [pbs-devel] [PATCH v7 proxmox-backup 60/69] test-suite: Makefile: add debian package and related files Christian Ebner
2024-05-27 14:33 ` [pbs-devel] [PATCH v7 proxmox-backup 61/69] datastore: chunker: add Chunker trait Christian Ebner
2024-05-27 14:33 ` [pbs-devel] [PATCH v7 proxmox-backup 62/69] datastore: chunker: implement chunker for payload stream Christian Ebner
2024-05-27 14:33 ` [pbs-devel] [PATCH v7 proxmox-backup 63/69] client: chunk stream: switch payload stream chunker Christian Ebner
2024-05-27 14:33 ` [pbs-devel] [PATCH v7 proxmox-backup 64/69] client: pxar: allow to restore prelude to optional path Christian Ebner
2024-05-27 14:33 ` [pbs-devel] [PATCH v7 proxmox-backup 65/69] client: pxar: add archive creation with reference test Christian Ebner
2024-05-27 14:33 ` [pbs-devel] [PATCH v7 proxmox-backup 66/69] client: tools: add helper to raise nofile rlimit Christian Ebner
2024-05-27 14:33 ` [pbs-devel] [PATCH v7 proxmox-backup 67/69] client: pxar: set cache limit based on " Christian Ebner
2024-05-27 14:33 ` [pbs-devel] [PATCH v7 proxmox-backup 68/69] chunker: tests: add regression tests for payload chunker Christian Ebner
2024-05-27 14:33 ` [pbs-devel] [PATCH v7 proxmox-backup 69/69] chunk stream: " Christian Ebner
2024-05-28  9:45 ` [pbs-devel] [PATCH v7 pxar proxmox-backup 00/69] fix #3174: improve file-level backup Christian Ebner
