public inbox for pbs-devel@lists.proxmox.com
* [pbs-devel] [PATCH v5 pxar proxmox-backup 00/62] fix #3174: improve file-level backup
@ 2024-05-07 15:51 Christian Ebner
  2024-05-07 15:51 ` [pbs-devel] [PATCH v5 pxar 01/62] format/examples: add header type `PXAR_PAYLOAD_REF` Christian Ebner
                   ` (62 more replies)
  0 siblings, 63 replies; 67+ messages in thread
From: Christian Ebner @ 2024-05-07 15:51 UTC (permalink / raw)
  To: pbs-devel

This series of patches implements a metadata-based file change
detection mechanism for improved pxar file level backup creation speed
for unchanged files.

The chosen approach is to split pxar archives on creation via the
proxmox-backup-client into two separate data and upload streams,
one exclusive for regular file payloads, the other one for the rest
of the pxar archive, which is mostly metadata.

On consecutive runs, the metadata archive of the previous backup run,
which is limited in size and can therefore be accessed rapidly, is used
to look up and compare the metadata of the entries to encode.
This assumes that the connection to the Proxmox Backup Server is
sufficiently fast to download and cache the chunks for that index.

Changes to regular files are detected by comparing all of the file's
metadata, including mtime, ACLs, etc. If no changes are detected,
the previous payload index is used to look up chunks that can possibly
be reused in the payload stream of the new archive.
In order to reduce possible chunk fragmentation, the decision whether to
reuse or reencode a file payload is deferred until enough information
is gathered by adding entries to a look-ahead cache. If the padding
introduced by reusing chunks stays below a threshold, the entries are
referenced and the chunks are reused and injected into the pxar payload
upload stream; otherwise they are discarded and the files are encoded
regularly.
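
As a rough illustration of the metadata comparison described above, here
is a minimal sketch. The `EntryMeta` struct, the `Classification` enum and
the `classify` function are hypothetical and heavily simplified; they are
not the types used in this series, and the real comparison covers the whole
pxar metadata object.

```rust
/// Hypothetical, simplified stand-in for the metadata stored per entry.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct EntryMeta {
    pub mode: u32,
    pub uid: u32,
    pub gid: u32,
    pub size: u64,
    pub mtime_secs: i64,
    pub mtime_nanos: u32,
    pub acls: Vec<String>,
}

/// Outcome of comparing an entry against the previous metadata archive.
#[derive(Debug, PartialEq, Eq)]
pub enum Classification {
    /// Metadata is identical: the payload chunks may be reused. The final
    /// decision is still deferred until the padding is known.
    ReuseCandidate,
    /// No previous entry or metadata differs: encode the payload again.
    Reencode,
}

/// Compare the current metadata with the entry found in the previous run.
pub fn classify(previous: Option<&EntryMeta>, current: &EntryMeta) -> Classification {
    match previous {
        Some(prev) if prev == current => Classification::ReuseCandidate,
        _ => Classification::Reencode,
    }
}
```

Any difference, including a changed mtime or ACL, leads to a regular
reencode in this sketch, as does a file that was not present in the
previous run.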

Patches 13 and 14 are to be applied to the pxar repository only after
patch 49 in the series, for the patches to compile in a sequential
chain.

The following lists the most notable changes included in this series since
version 4:
- Increase open file handle limit to hard limit and adapt lookahead
  cache size dynamically (thanks a lot to Thomas for pointing this out
  and providing the necessary background information). This helps to
  reuse multiple entries contained within the same chunk, which would
  otherwise exceed the padding threshold and therefore be reencoded
  instead.
- Fix payload chunker scan to only scan up until chunk pos in case a
  suggested boundary is chosen.
- Fix issue with decoder state not being set to the correct `InDirectory`
  state after reading the prelude and getting the root directory entry.
- Fix issue with kept back chunk injection when the chunk follows a
  range discontinuity.
- Add regression test for pxar create with metadata archive and payload
  index reference.

The following lists the most notable changes included in this series since
version 3:
- Rework the whole reused chunk injection and accounting logic and use
  lockless async `mpsc::channel`s instead of `Arc<Mutex<VecDeque<..>>>`.
- Reworked lookahead caching logic to use payload ranges and check for
  possible range continuation instead of looking up the reusable dynamic
  entries immediately in case of a reusable entry chain. This also
  avoids edge cases not covered in the previous version of the patch series.
  This current version therefore tends to reencode small files more
  aggressively, since they might introduce additional unwanted paddings.
- Correctly cover hardlinks in the reuse logic as well, avoiding
  reencoding of these entries.
- Add additional dedicated chunker implementation for payload data
  stream, allowing the archiver to suggest boundaries to the chunker to
  reduce padding for reused chunks.
- Add additional `change-detection-mode=data`, in order to allow
  creating split archives with fully reencoded payload data.
- Add additional payload input readers for pxar accessor type
  implementations where needed.
- Add additional consistency check in pxar encoder when dropping state
  or encoder instance.
- CliParams was renamed to the more opaque Prelude, since the pxar
  archive does not care about its contents and this might be extended to
  store other information about the archive as well.
- Add missing proxmox-file-restore for split archives and fix restore of
  tar/zip archives via WebUI. This is handled by the same decoder logic,
  and needed an updated payload input content range to read the data
  from the correct location in the payload data archive.
- Additional refactoring to use the pxar reader helpers where possible.

The following lists the most notable changes included in this series since
version 2:
- many bugfixes regarding incorrect archive encoding by wrong offset
  generation, adding additional sanity checks and rather fail on
  encoding than produce an incorrectly encoded archive
- different approach for deciding whether to reuse or reencode the
  entries. Previously, the entries were encoded when a cached
  payload size threshold was reached. Now, the padding introduced by
  reusable chunks is tracked, and the entries are reused only if the
  padding does not exceed the set threshold. This reduces the possible
  padding, at the cost of reencoding more entries. It also avoids
  reusing chunks which now contain large padding holes because of
  moved/removed files contained within.
- added headers for metadata archive and payload file
- added documentation
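
The padding-based reuse decision from the version 2 changes can be
sketched as follows. This is only an illustration: the function names are
hypothetical, and expressing the threshold as a percentage of the reused
chunk bytes is an assumption made for this sketch, not something stated in
this cover letter.

```rust
/// Bytes inside the reused chunks that no cached entry references.
pub fn padding(chunk_bytes: u64, referenced_bytes: u64) -> u64 {
    chunk_bytes.saturating_sub(referenced_bytes)
}

/// Decide whether a run of cached entries should reference the old chunks.
///
/// `threshold_percent` is an assumed unit: the maximum share of padding,
/// in percent of the reused chunk bytes, that is still acceptable.
pub fn should_reuse(chunk_bytes: u64, referenced_bytes: u64, threshold_percent: u64) -> bool {
    if chunk_bytes == 0 {
        // Nothing to reuse.
        return false;
    }
    // Widen to u128 so the multiplication cannot overflow.
    let pad = padding(chunk_bytes, referenced_bytes) as u128;
    pad * 100 <= threshold_percent as u128 * chunk_bytes as u128
}
```

With 4 MiB of reused chunks of which 3.5 MiB are referenced, the padding
is 12.5 percent, so the entries would be reused with a 20 percent
threshold and reencoded with a 10 percent threshold.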

An invocation of a backup run with these patches is now:
```bash
proxmox-backup-client backup <label>.pxar:<source-path> --change-detection-mode=metadata
```
During the first run, no reference index is available; the pxar archive
will nevertheless be split into the two parts.
Subsequent backups will then utilize the pxar archive accessor and
index files of the previous run to perform file change detection.

As benchmarks, the linux source code as well as the coco dataset for
computer vision and pattern recognition can be used.
The benchmarks can be performed by running:
```bash
proxmox-backup-test-suite detection-mode-bench prepare --target /<path-to-bench-source-target>
proxmox-backup-test-suite detection-mode-bench run linux.pxar:/<path-to-bench-source-target>/linux
proxmox-backup-test-suite detection-mode-bench run coco.pxar:/<path-to-bench-source-target>/coco
```

The above command invocations assume the default repository and
credentials to be set as environment variables. They can however be
passed as additional optional parameters instead.

pxar:

Christian Ebner (14):
  format/examples: add header type `PXAR_PAYLOAD_REF`
  decoder: add method to read payload references
  decoder: factor out skip part from skip_entry
  encoder: add optional output writer for file payloads
  encoder: move to stack based state tracking
  decoder/accessor: add optional payload input stream
  decoder: set payload input range when decoding via accessor
  encoder: add payload reference capability
  encoder: add payload position capability
  encoder: add payload advance capability
  encoder/format: finish payload stream with marker
  format: add payload stream start marker
  format/encoder/decoder: new pxar entry type `Version`
  format/encoder/decoder: new pxar entry type `Prelude`

 examples/apxar.rs            |   2 +-
 examples/mk-format-hashes.rs |  21 ++
 examples/pxarcmd.rs          |   7 +-
 src/accessor/aio.rs          |  10 +-
 src/accessor/mod.rs          | 116 +++++++-
 src/accessor/sync.rs         |   8 +-
 src/decoder/aio.rs           |  14 +-
 src/decoder/mod.rs           | 212 +++++++++++++--
 src/decoder/sync.rs          |  15 +-
 src/encoder/aio.rs           |  87 ++++--
 src/encoder/mod.rs           | 497 ++++++++++++++++++++++++++---------
 src/encoder/sync.rs          |  67 ++++-
 src/format/mod.rs            |  63 +++++
 src/lib.rs                   |   9 +
 tests/compat.rs              |   3 +-
 tests/simple/fs.rs           |   8 +-
 tests/simple/main.rs         |   8 +-
 17 files changed, 935 insertions(+), 212 deletions(-)

proxmox-backup:

Christian Ebner (48):
  client: pxar: switch to stack based encoder state
  client: backup: factor out extension from backup target
  client: pxar: combine writers into struct
  client: pxar: add optional pxar payload writer instance
  client: pxar: optionally split metadata and payload streams
  client: helper: add helpers for creating reader instances
  client: helper: add method for split archive name mapping
  client: restore: read payload from dedicated index
  tools: cover extension for split pxar archives
  restore: cover extension for split pxar archives
  client: mount: make split pxar archives mountable
  api: datastore: refactor getting local chunk reader
  api: datastore: attach optional payload chunk reader
  catalog: shell: make split pxar archives accessible
  www: cover metadata extension for pxar archives
  file restore: factor out getting pxar reader
  file restore: cover split metadata and payload archives
  file restore: show more error context when extraction fails
  pxar: add optional payload input for achive restore
  pxar: add more context to extraction error
  client: pxar: include payload offset in entry listing
  pxar: show padding in debug output on archive list
  datastore: dynamic index: add method to get digest
  client: pxar: helper for lookup of reusable dynamic entries
  upload stream: implement reused chunk injector
  client: chunk stream: add struct to hold injection state
  client: streams: add channels for dynamic entry injection
  specs: add backup detection mode specification
  client: implement prepare reference method
  client: pxar: add method for metadata comparison
  pxar: caching: add look-ahead cache types
  fix #3174: client: pxar: enable caching and meta comparison
  client: backup writer: add injected chunk count to stats
  pxar: create: keep track of reused chunks and files
  pxar: create: show chunk injection stats debug output
  client: pxar: add helper to handle optional preludes
  client: pxar: opt encode cli exclude patterns as Prelude
  docs: file formats: describe split pxar archive file layout
  docs: add section describing change detection mode
  test-suite: add detection mode change benchmark
  test-suite: add bin to deb, add shell completions
  datastore: chunker: add Chunker trait
  datastore: chunker: implement chunker for payload stream
  client: chunk stream: switch payload stream chunker
  client: pxar: allow to restore prelude to optional path
  client: pxar: add archive creation with reference test
  client: tools: add helper to raise nofile rlimit
  client: pxar: set cache limit based on nofile rlimit

 Cargo.toml                                    |    1 +
 Makefile                                      |   13 +-
 debian/proxmox-backup-client.bash-completion  |    1 +
 debian/proxmox-backup-client.install          |    2 +
 debian/proxmox-backup-test-suite.bc           |    8 +
 docs/backup-client.rst                        |   41 +
 docs/file-formats.rst                         |   46 +
 docs/meta-format-overview.dot                 |   50 +
 examples/test_chunk_size.rs                   |    9 +-
 examples/test_chunk_speed.rs                  |    7 +-
 examples/test_chunk_speed2.rs                 |    2 +-
 pbs-client/src/backup_specification.rs        |   44 +
 pbs-client/src/backup_writer.rs               |  120 +-
 pbs-client/src/chunk_stream.rs                |  122 +-
 pbs-client/src/inject_reused_chunks.rs        |  129 +++
 pbs-client/src/lib.rs                         |    3 +-
 pbs-client/src/pxar/create.rs                 | 1004 ++++++++++++++++-
 pbs-client/src/pxar/extract.rs                |   31 +-
 pbs-client/src/pxar/look_ahead_cache.rs       |   38 +
 pbs-client/src/pxar/mod.rs                    |    5 +-
 pbs-client/src/pxar/tools.rs                  |  123 +-
 pbs-client/src/pxar_backup_stream.rs          |   68 +-
 pbs-client/src/tools/mod.rs                   |   55 +-
 pbs-datastore/src/chunker.rs                  |  161 ++-
 pbs-datastore/src/dynamic_index.rs            |   14 +-
 pbs-datastore/src/lib.rs                      |    2 +-
 pbs-pxar-fuse/src/lib.rs                      |    2 +-
 proxmox-backup-client/src/catalog.rs          |   30 +-
 proxmox-backup-client/src/helper.rs           |   96 ++
 proxmox-backup-client/src/main.rs             |  284 ++++-
 proxmox-backup-client/src/mount.rs            |   34 +-
 proxmox-backup-test-suite/Cargo.toml          |   18 +
 .../src/detection_mode_bench.rs               |  294 +++++
 proxmox-backup-test-suite/src/main.rs         |   17 +
 proxmox-file-restore/src/main.rs              |   80 +-
 .../src/proxmox_restore_daemon/api.rs         |   18 +-
 pxar-bin/src/main.rs                          |   61 +-
 src/api2/admin/datastore.rs                   |   47 +-
 src/api2/tape/restore.rs                      |   21 +-
 src/bin/proxmox_backup_debug/diff.rs          |    2 +-
 src/tape/file_formats/snapshot_archive.rs     |    9 +-
 tests/catar.rs                                |    5 +-
 tests/pxar/backup-client-pxar-data.mpxar      |  Bin 0 -> 15070 bytes
 tests/pxar/backup-client-pxar-data.ppxar.didx |  Bin 0 -> 8096 bytes
 tests/pxar/backup-client-pxar-expected.mpxar  |  Bin 0 -> 15086 bytes
 www/datastore/Content.js                      |    6 +-
 zsh-completions/_proxmox-backup-test-suite    |   13 +
 47 files changed, 2802 insertions(+), 334 deletions(-)
 create mode 100644 debian/proxmox-backup-test-suite.bc
 create mode 100644 docs/meta-format-overview.dot
 create mode 100644 pbs-client/src/inject_reused_chunks.rs
 create mode 100644 pbs-client/src/pxar/look_ahead_cache.rs
 create mode 100644 proxmox-backup-client/src/helper.rs
 create mode 100644 proxmox-backup-test-suite/Cargo.toml
 create mode 100644 proxmox-backup-test-suite/src/detection_mode_bench.rs
 create mode 100644 proxmox-backup-test-suite/src/main.rs
 create mode 100644 tests/pxar/backup-client-pxar-data.mpxar
 create mode 100644 tests/pxar/backup-client-pxar-data.ppxar.didx
 create mode 100644 tests/pxar/backup-client-pxar-expected.mpxar
 create mode 100644 zsh-completions/_proxmox-backup-test-suite

-- 
2.39.2



_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel



* [pbs-devel] [PATCH v5 pxar 01/62] format/examples: add header type `PXAR_PAYLOAD_REF`
  2024-05-07 15:51 [pbs-devel] [PATCH v5 pxar proxmox-backup 00/62] fix #3174: improve file-level backup Christian Ebner
@ 2024-05-07 15:51 ` Christian Ebner
  2024-05-07 15:51 ` [pbs-devel] [PATCH v5 pxar 02/62] decoder: add method to read payload references Christian Ebner
                   ` (61 subsequent siblings)
  62 siblings, 0 replies; 67+ messages in thread
From: Christian Ebner @ 2024-05-07 15:51 UTC (permalink / raw)
  To: pbs-devel

Introduces the header type `PXAR_PAYLOAD_REF` to mark regular file
entry payloads, not encoded within the regular pxar archive but
rather redirected to a dedicated payload output writer.
It therefore substitutes the `PXAR_PAYLOAD` header type for these
entries.

The header marks the start and size of a `PayloadRef` typed object
in the archive, storing the offset of the payload header in the
payload stream of the dedicated payload output, as well as the payload
size.

The `PayloadRef` provides the means to store, serialize and
deserialize the entry.
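
A minimal, self-contained sketch of the byte layout, assuming two
little-endian `u64` fields as written by `PayloadRef::data()` in this
patch. The `to_bytes` and `from_bytes` helpers are hypothetical names used
only for illustration; the patch itself relies on the `Endian` derive for
reading.

```rust
/// Illustrative copy of the reference object: offset and size of a payload.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct PayloadRef {
    pub offset: u64,
    pub size: u64,
}

impl PayloadRef {
    /// Serialize as 16 bytes: offset first, then size, both little-endian.
    pub fn to_bytes(&self) -> [u8; 16] {
        let mut out = [0u8; 16];
        out[..8].copy_from_slice(&self.offset.to_le_bytes());
        out[8..].copy_from_slice(&self.size.to_le_bytes());
        out
    }

    /// Inverse of `to_bytes`.
    pub fn from_bytes(bytes: &[u8; 16]) -> Self {
        let mut offset = [0u8; 8];
        let mut size = [0u8; 8];
        offset.copy_from_slice(&bytes[..8]);
        size.copy_from_slice(&bytes[8..]);
        Self {
            offset: u64::from_le_bytes(offset),
            size: u64::from_le_bytes(size),
        }
    }
}
```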

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
changes since version 4:
- no changes

 examples/mk-format-hashes.rs |  5 +++++
 src/format/mod.rs            | 20 ++++++++++++++++++++
 2 files changed, 25 insertions(+)

diff --git a/examples/mk-format-hashes.rs b/examples/mk-format-hashes.rs
index 6e00654..83adb38 100644
--- a/examples/mk-format-hashes.rs
+++ b/examples/mk-format-hashes.rs
@@ -41,6 +41,11 @@ const CONSTANTS: &[(&str, &str, &str)] = &[
         "PXAR_PAYLOAD",
         "__PROXMOX_FORMAT_PXAR_PAYLOAD__",
     ),
+    (
+        "Marks the beginning of a payload reference for regular files",
+        "PXAR_PAYLOAD_REF",
+        "__PROXMOX_FORMAT_PXAR_PAYLOAD_REF__",
+    ),
     (
         "Marks item as entry of goodbye table",
         "PXAR_GOODBYE",
diff --git a/src/format/mod.rs b/src/format/mod.rs
index bfea9f6..5d7a652 100644
--- a/src/format/mod.rs
+++ b/src/format/mod.rs
@@ -22,6 +22,7 @@
 //!   * `FCAPS`             -- file capability in Linux disk format
 //!   * `QUOTA_PROJECT_ID`  -- the ext4/xfs quota project ID
 //!   * `PAYLOAD`           -- file contents, if it is one
+//!   * `PAYLOAD_REF`       -- reference to file offset in optional payload file (introduced in v2)
 //!   * `SYMLINK`           -- symlink target, if it is one
 //!   * `DEVICE`            -- device major/minor, if it is a block/char device
 //!
@@ -99,6 +100,8 @@ pub const PXAR_QUOTA_PROJID: u64 = 0xe07540e82f7d1cbb;
 pub const PXAR_HARDLINK: u64 = 0x51269c8422bd7275;
 /// Marks the beginning of the payload (actual content) of regular files
 pub const PXAR_PAYLOAD: u64 = 0x28147a1b0b7c1a25;
+/// Marks the beginning of a payload reference for regular files
+pub const PXAR_PAYLOAD_REF: u64 = 0x419d3d6bc4ba977e;
 /// Marks item as entry of goodbye table
 pub const PXAR_GOODBYE: u64 = 0x2fec4fa642d5731d;
 /// The end marker used in the GOODBYE object
@@ -152,6 +155,7 @@ impl Header {
             PXAR_QUOTA_PROJID => size_of::<QuotaProjectId>() as u64,
             PXAR_ENTRY => size_of::<Stat>() as u64,
             PXAR_PAYLOAD | PXAR_GOODBYE => u64::MAX - (size_of::<Self>() as u64),
+            PXAR_PAYLOAD_REF => size_of::<PayloadRef>() as u64,
             _ => u64::MAX - (size_of::<Self>() as u64),
         }
     }
@@ -192,6 +196,7 @@ impl Display for Header {
             PXAR_QUOTA_PROJID => "QUOTA_PROJID",
             PXAR_ENTRY => "ENTRY",
             PXAR_PAYLOAD => "PAYLOAD",
+            PXAR_PAYLOAD_REF => "PAYLOAD_REF",
             PXAR_GOODBYE => "GOODBYE",
             _ => "UNKNOWN",
         };
@@ -723,6 +728,21 @@ impl GoodbyeItem {
     }
 }
 
+/// References a regular file payload found in a separated payload archive
+#[derive(Clone, Debug, Endian)]
+pub struct PayloadRef {
+    pub offset: u64,
+    pub size: u64,
+}
+
+impl PayloadRef {
+    pub(crate) fn data(&self) -> Vec<u8> {
+        let mut data = self.offset.to_le_bytes().to_vec();
+        data.append(&mut self.size.to_le_bytes().to_vec());
+        data
+    }
+}
+
 /// Hash a file name for use in the goodbye table.
 pub fn hash_filename(name: &[u8]) -> u64 {
     use std::hash::Hasher;
-- 
2.39.2




* [pbs-devel] [PATCH v5 pxar 02/62] decoder: add method to read payload references
  2024-05-07 15:51 [pbs-devel] [PATCH v5 pxar proxmox-backup 00/62] fix #3174: improve file-level backup Christian Ebner
  2024-05-07 15:51 ` [pbs-devel] [PATCH v5 pxar 01/62] format/examples: add header type `PXAR_PAYLOAD_REF` Christian Ebner
@ 2024-05-07 15:51 ` Christian Ebner
  2024-05-07 15:51 ` [pbs-devel] [PATCH v5 pxar 03/62] decoder: factor out skip part from skip_entry Christian Ebner
                   ` (60 subsequent siblings)
  62 siblings, 0 replies; 67+ messages in thread
From: Christian Ebner @ 2024-05-07 15:51 UTC (permalink / raw)
  To: pbs-devel

This is in preparation for reading payloads from a dedicated payload
input stream.

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
changes since version 4:
- no changes

 src/decoder/mod.rs | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/src/decoder/mod.rs b/src/decoder/mod.rs
index d1fb911..9dce7b2 100644
--- a/src/decoder/mod.rs
+++ b/src/decoder/mod.rs
@@ -661,6 +661,11 @@ impl<I: SeqRead> DecoderImpl<I> {
     async fn read_quota_project_id(&mut self) -> io::Result<format::QuotaProjectId> {
         self.read_simple_entry("quota project id").await
     }
+
+    async fn read_payload_ref(&mut self) -> io::Result<format::PayloadRef> {
+        self.current_header.check_header_size()?;
+        seq_read_entry(&mut self.input).await
+    }
 }
 
 /// Reader for file contents inside a pxar archive.
-- 
2.39.2




* [pbs-devel] [PATCH v5 pxar 03/62] decoder: factor out skip part from skip_entry
  2024-05-07 15:51 [pbs-devel] [PATCH v5 pxar proxmox-backup 00/62] fix #3174: improve file-level backup Christian Ebner
  2024-05-07 15:51 ` [pbs-devel] [PATCH v5 pxar 01/62] format/examples: add header type `PXAR_PAYLOAD_REF` Christian Ebner
  2024-05-07 15:51 ` [pbs-devel] [PATCH v5 pxar 02/62] decoder: add method to read payload references Christian Ebner
@ 2024-05-07 15:51 ` Christian Ebner
  2024-05-07 15:51 ` [pbs-devel] [PATCH v5 pxar 04/62] encoder: add optional output writer for file payloads Christian Ebner
                   ` (59 subsequent siblings)
  62 siblings, 0 replies; 67+ messages in thread
From: Christian Ebner @ 2024-05-07 15:51 UTC (permalink / raw)
  To: pbs-devel

Make the skip part reusable for a different input.

In preparation for skipping payload padding in a separate input.
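
For illustration, the same skip loop can be written against the
synchronous `std::io::Read` trait. This is a sketch only: the patch works
on the crate's async `SeqRead` input and a shared scratch buffer, while the
4096 byte stack buffer used here is an assumption.

```rust
use std::io::{self, Read};

/// Discard `len` bytes from `input` by reading them into a scratch buffer.
pub fn skip<R: Read>(input: &mut R, mut len: usize) -> io::Result<()> {
    // Assumed buffer size, chosen only for this sketch.
    let mut scratch = [0u8; 4096];
    // Consume whole buffers first ...
    while len >= scratch.len() {
        input.read_exact(&mut scratch)?;
        len -= scratch.len();
    }
    // ... then the remainder, if any.
    if len > 0 {
        input.read_exact(&mut scratch[..len])?;
    }
    Ok(())
}
```

Because the function takes the input as a parameter instead of using
`self.input`, it can be called for the metadata input and for a separate
payload input alike.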

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
changes since version 4:
- no changes

 src/decoder/mod.rs | 15 +++++++++------
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/src/decoder/mod.rs b/src/decoder/mod.rs
index 9dce7b2..d19ffd1 100644
--- a/src/decoder/mod.rs
+++ b/src/decoder/mod.rs
@@ -563,15 +563,18 @@ impl<I: SeqRead> DecoderImpl<I> {
     //
 
     async fn skip_entry(&mut self, offset: u64) -> io::Result<()> {
-        let mut len = self.current_header.content_size() - offset;
+        let len = (self.current_header.content_size() - offset) as usize;
+        Self::skip(&mut self.input, len).await
+    }
+
+    async fn skip(input: &mut I, mut len: usize) -> io::Result<()> {
         let scratch = scratch_buffer();
-        while len >= (scratch.len() as u64) {
-            seq_read_exact(&mut self.input, scratch).await?;
-            len -= scratch.len() as u64;
+        while len >= (scratch.len()) {
+            seq_read_exact(input, scratch).await?;
+            len -= scratch.len();
         }
-        let len = len as usize;
         if len > 0 {
-            seq_read_exact(&mut self.input, &mut scratch[..len]).await?;
+            seq_read_exact(input, &mut scratch[..len]).await?;
         }
         Ok(())
     }
-- 
2.39.2




* [pbs-devel] [PATCH v5 pxar 04/62] encoder: add optional output writer for file payloads
  2024-05-07 15:51 [pbs-devel] [PATCH v5 pxar proxmox-backup 00/62] fix #3174: improve file-level backup Christian Ebner
                   ` (2 preceding siblings ...)
  2024-05-07 15:51 ` [pbs-devel] [PATCH v5 pxar 03/62] decoder: factor out skip part from skip_entry Christian Ebner
@ 2024-05-07 15:51 ` Christian Ebner
  2024-05-07 15:51 ` [pbs-devel] [PATCH v5 pxar 05/62] encoder: move to stack based state tracking Christian Ebner
                   ` (58 subsequent siblings)
  62 siblings, 0 replies; 67+ messages in thread
From: Christian Ebner @ 2024-05-07 15:51 UTC (permalink / raw)
  To: pbs-devel

During regular pxar archive encoding, the payload of regular files is
written as part of the archive.

This patch introduces functionality to attach an optional, dedicated
writer instance to redirect the payload to a different output.
The separation of data and metadata streams allows for efficient
reuse of payload data by referencing the payload writer byte offset,
without having to reencode it.

Whenever the payload of regular files is redirected to a dedicated
output writer, encode a payload reference header followed by the
required data to locate the data, instead of adding the regular payload
header followed by the encoded payload to the archive.
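
The redirection can be sketched with plain `std::io::Write` outputs. The
`SplitWriter` type and `write_file` method are hypothetical, and the header
layout (type hash followed by the full size including the 16 byte header,
both little-endian) is an assumption made for this sketch; only the two
hash constants are taken from the format module.

```rust
use std::io::{self, Write};

pub const PXAR_PAYLOAD: u64 = 0x28147a1b0b7c1a25;
pub const PXAR_PAYLOAD_REF: u64 = 0x419d3d6bc4ba977e;
/// Assumed header size: u64 type hash plus u64 full size.
const HEADER_SIZE: u64 = 16;

/// Write a header and advance the given write position.
fn write_header<W: Write>(out: &mut W, htype: u64, content: u64, pos: &mut u64) -> io::Result<()> {
    out.write_all(&htype.to_le_bytes())?;
    out.write_all(&(content + HEADER_SIZE).to_le_bytes())?;
    *pos += HEADER_SIZE;
    Ok(())
}

/// Hypothetical writer pair: metadata stream plus optional payload stream.
pub struct SplitWriter<W: Write> {
    pub meta: W,
    pub payload: Option<W>,
    pub meta_pos: u64,
    pub payload_pos: u64,
}

impl<W: Write> SplitWriter<W> {
    pub fn write_file(&mut self, data: &[u8]) -> io::Result<()> {
        let size = data.len() as u64;
        match self.payload.as_mut() {
            Some(payload) => {
                // The reference points to the position before the payload header.
                let offset = self.payload_pos;
                write_header(payload, PXAR_PAYLOAD, size, &mut self.payload_pos)?;
                payload.write_all(data)?;
                self.payload_pos += size;
                // The metadata stream only receives the reference entry.
                write_header(&mut self.meta, PXAR_PAYLOAD_REF, 16, &mut self.meta_pos)?;
                self.meta.write_all(&offset.to_le_bytes())?;
                self.meta.write_all(&size.to_le_bytes())?;
                self.meta_pos += 16;
            }
            None => {
                // Regular encoding: header and payload go into the archive itself.
                write_header(&mut self.meta, PXAR_PAYLOAD, size, &mut self.meta_pos)?;
                self.meta.write_all(data)?;
                self.meta_pos += size;
            }
        }
        Ok(())
    }
}
```

With a payload output attached, the metadata stream grows by a fixed 32
bytes per file in this sketch, independent of the file size.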

This is in preparation for reusing payload chunks for unchanged files
of backups created via the proxmox-backup-client.

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
changes since version 4:
- no changes

 src/encoder/aio.rs  | 24 +++++++++---
 src/encoder/mod.rs  | 89 ++++++++++++++++++++++++++++++++++++++++-----
 src/encoder/sync.rs | 13 +++++--
 3 files changed, 107 insertions(+), 19 deletions(-)

diff --git a/src/encoder/aio.rs b/src/encoder/aio.rs
index ad25fea..31a1a2f 100644
--- a/src/encoder/aio.rs
+++ b/src/encoder/aio.rs
@@ -24,8 +24,14 @@ impl<'a, T: tokio::io::AsyncWrite + 'a> Encoder<'a, TokioWriter<T>> {
     pub async fn from_tokio(
         output: T,
         metadata: &Metadata,
+        payload_output: Option<T>,
     ) -> io::Result<Encoder<'a, TokioWriter<T>>> {
-        Encoder::new(TokioWriter::new(output), metadata).await
+        Encoder::new(
+            TokioWriter::new(output),
+            metadata,
+            payload_output.map(|payload_output| TokioWriter::new(payload_output)),
+        )
+        .await
     }
 }
 
@@ -39,6 +45,7 @@ impl<'a> Encoder<'a, TokioWriter<tokio::fs::File>> {
         Encoder::new(
             TokioWriter::new(tokio::fs::File::create(path.as_ref()).await?),
             metadata,
+            None,
         )
         .await
     }
@@ -46,9 +53,13 @@ impl<'a> Encoder<'a, TokioWriter<tokio::fs::File>> {
 
 impl<'a, T: SeqWrite + 'a> Encoder<'a, T> {
     /// Create an asynchronous encoder for an output implementing our internal write interface.
-    pub async fn new(output: T, metadata: &Metadata) -> io::Result<Encoder<'a, T>> {
+    pub async fn new(
+        output: T,
+        metadata: &Metadata,
+        payload_output: Option<T>,
+    ) -> io::Result<Encoder<'a, T>> {
         Ok(Self {
-            inner: encoder::EncoderImpl::new(output.into(), metadata).await?,
+            inner: encoder::EncoderImpl::new(output.into(), metadata, payload_output).await?,
         })
     }
 
@@ -291,9 +302,10 @@ mod test {
     /// Assert that `Encoder` is `Send`
     fn send_test() {
         let test = async {
-            let mut encoder = Encoder::new(DummyOutput, &Metadata::dir_builder(0o700).build())
-                .await
-                .unwrap();
+            let mut encoder =
+                Encoder::new(DummyOutput, &Metadata::dir_builder(0o700).build(), None)
+                    .await
+                    .unwrap();
             {
                 let mut dir = encoder
                     .create_directory("baba", &Metadata::dir_builder(0o700).build())
diff --git a/src/encoder/mod.rs b/src/encoder/mod.rs
index da41733..99c3758 100644
--- a/src/encoder/mod.rs
+++ b/src/encoder/mod.rs
@@ -17,7 +17,7 @@ use endian_trait::Endian;
 
 use crate::binary_tree_array;
 use crate::decoder::{self, SeqRead};
-use crate::format::{self, GoodbyeItem};
+use crate::format::{self, GoodbyeItem, PayloadRef};
 use crate::Metadata;
 
 pub mod aio;
@@ -221,6 +221,9 @@ struct EncoderState {
 
     /// We need to keep track how much we have written to get offsets.
     write_position: u64,
+
+    /// Track the bytes written to the payload writer
+    payload_write_position: u64,
 }
 
 impl EncoderState {
@@ -278,6 +281,7 @@ impl<'a, T> std::convert::From<&'a mut T> for EncoderOutput<'a, T> {
 /// synchronous or `async` I/O objects in as output.
 pub(crate) struct EncoderImpl<'a, T: SeqWrite + 'a> {
     output: EncoderOutput<'a, T>,
+    payload_output: EncoderOutput<'a, Option<T>>,
     state: EncoderState,
     parent: Option<&'a mut EncoderState>,
     finished: bool,
@@ -306,12 +310,14 @@ impl<'a, T: SeqWrite + 'a> EncoderImpl<'a, T> {
     pub async fn new(
         output: EncoderOutput<'a, T>,
         metadata: &Metadata,
+        payload_output: Option<T>,
     ) -> io::Result<EncoderImpl<'a, T>> {
         if !metadata.is_dir() {
             io_bail!("directory metadata must contain the directory mode flag");
         }
         let mut this = Self {
             output,
+            payload_output: EncoderOutput::Owned(None),
             state: EncoderState::default(),
             parent: None,
             finished: false,
@@ -323,6 +329,10 @@ impl<'a, T: SeqWrite + 'a> EncoderImpl<'a, T> {
         this.encode_metadata(metadata).await?;
         this.state.files_offset = this.position();
 
+        if let Some(payload_output) = payload_output {
+            this.payload_output = EncoderOutput::Owned(Some(payload_output));
+        }
+
         Ok(this)
     }
 
@@ -361,10 +371,37 @@ impl<'a, T: SeqWrite + 'a> EncoderImpl<'a, T> {
         let file_offset = self.position();
         self.start_file_do(Some(metadata), file_name).await?;
 
-        let header = format::Header::with_content_size(format::PXAR_PAYLOAD, file_size);
-        header.check_header_size()?;
+        if let Some(payload_output) = self.payload_output.as_mut() {
+            // payload references must point to the position prior to the payload header,
+            // separating payload entries in the payload stream
+            let payload_position = self.state.payload_write_position;
+
+            let header = format::Header::with_content_size(format::PXAR_PAYLOAD, file_size);
+            header.check_header_size()?;
+            seq_write_struct(
+                payload_output,
+                header,
+                &mut self.state.payload_write_position,
+            )
+            .await?;
+
+            let payload_ref = PayloadRef {
+                offset: payload_position,
+                size: file_size,
+            };
 
-        seq_write_struct(self.output.as_mut(), header, &mut self.state.write_position).await?;
+            seq_write_pxar_entry(
+                self.output.as_mut(),
+                format::PXAR_PAYLOAD_REF,
+                &payload_ref.data(),
+                &mut self.state.write_position,
+            )
+            .await?;
+        } else {
+            let header = format::Header::with_content_size(format::PXAR_PAYLOAD, file_size);
+            header.check_header_size()?;
+            seq_write_struct(self.output.as_mut(), header, &mut self.state.write_position).await?;
+        }
 
         let payload_data_offset = self.position();
 
@@ -372,6 +409,7 @@ impl<'a, T: SeqWrite + 'a> EncoderImpl<'a, T> {
 
         Ok(FileImpl {
             output: self.output.as_mut(),
+            payload_output: self.payload_output.as_mut().as_mut(),
             goodbye_item: GoodbyeItem {
                 hash: format::hash_filename(file_name),
                 offset: file_offset,
@@ -564,6 +602,11 @@ impl<'a, T: SeqWrite + 'a> EncoderImpl<'a, T> {
         self.state.write_position
     }
 
+    #[inline]
+    fn payload_position(&mut self) -> u64 {
+        self.state.payload_write_position
+    }
+
     pub async fn create_directory(
         &mut self,
         file_name: &Path,
@@ -588,18 +631,21 @@ impl<'a, T: SeqWrite + 'a> EncoderImpl<'a, T> {
 
         // the child will write to OUR state now:
         let write_position = self.position();
+        let payload_write_position = self.payload_position();
 
         let file_copy_buffer = Arc::clone(&self.file_copy_buffer);
 
         Ok(EncoderImpl {
             // always forward as Borrowed(), to avoid stacking references on nested calls
             output: self.output.to_borrowed_mut(),
+            payload_output: self.payload_output.to_borrowed_mut(),
             state: EncoderState {
                 entry_offset,
                 files_offset,
                 file_offset: Some(file_offset),
                 file_hash,
                 write_position,
+                payload_write_position,
                 ..Default::default()
             },
             parent: Some(&mut self.state),
@@ -764,15 +810,21 @@ impl<'a, T: SeqWrite + 'a> EncoderImpl<'a, T> {
         )
         .await?;
 
+        if let EncoderOutput::Owned(Some(output)) = &mut self.payload_output {
+            flush(output).await?;
+        }
+
         if let EncoderOutput::Owned(output) = &mut self.output {
             flush(output).await?;
         }
 
         // done up here because of the self-borrow and to propagate
         let end_offset = self.position();
+        let payload_end_offset = self.payload_position();
 
         if let Some(parent) = &mut self.parent {
             parent.write_position = end_offset;
+            parent.payload_write_position = payload_end_offset;
 
             let file_offset = self
                 .state
@@ -837,6 +889,9 @@ impl<'a, T: SeqWrite + 'a> EncoderImpl<'a, T> {
 pub(crate) struct FileImpl<'a, S: SeqWrite> {
     output: &'a mut S,
 
+    /// Optional write redirection of file payloads to this sequential stream
+    payload_output: Option<&'a mut S>,
+
     /// This file's `GoodbyeItem`. FIXME: We currently don't touch this, can we just push it
     /// directly instead of on Drop of FileImpl?
     goodbye_item: GoodbyeItem,
@@ -916,19 +971,33 @@ impl<'a, S: SeqWrite> FileImpl<'a, S> {
     /// for convenience.
     pub async fn write(&mut self, data: &[u8]) -> io::Result<usize> {
         self.check_remaining(data.len())?;
-        let put =
-            poll_fn(|cx| unsafe { Pin::new_unchecked(&mut self.output).poll_seq_write(cx, data) })
-                .await?;
-        //let put = seq_write(self.output.as_mut().unwrap(), data).await?;
+        let put = if let Some(mut output) = self.payload_output.as_mut() {
+            let put =
+                poll_fn(|cx| unsafe { Pin::new_unchecked(&mut output).poll_seq_write(cx, data) })
+                    .await?;
+            self.parent.payload_write_position += put as u64;
+            put
+        } else {
+            let put = poll_fn(|cx| unsafe {
+                Pin::new_unchecked(&mut self.output).poll_seq_write(cx, data)
+            })
+            .await?;
+            self.parent.write_position += put as u64;
+            put
+        };
+
         self.remaining_size -= put as u64;
-        self.parent.write_position += put as u64;
         Ok(put)
     }
 
     /// Completely write file data for the current file entry in a pxar archive.
     pub async fn write_all(&mut self, data: &[u8]) -> io::Result<()> {
         self.check_remaining(data.len())?;
-        seq_write_all(self.output, data, &mut self.parent.write_position).await?;
+        if let Some(ref mut output) = self.payload_output {
+            seq_write_all(output, data, &mut self.parent.payload_write_position).await?;
+        } else {
+            seq_write_all(self.output, data, &mut self.parent.write_position).await?;
+        }
         self.remaining_size -= data.len() as u64;
         Ok(())
     }
diff --git a/src/encoder/sync.rs b/src/encoder/sync.rs
index 1ec91b8..96d056d 100644
--- a/src/encoder/sync.rs
+++ b/src/encoder/sync.rs
@@ -28,7 +28,7 @@ impl<'a, T: io::Write + 'a> Encoder<'a, StandardWriter<T>> {
     /// Encode a `pxar` archive into a regular `std::io::Write` output.
     #[inline]
     pub fn from_std(output: T, metadata: &Metadata) -> io::Result<Encoder<'a, StandardWriter<T>>> {
-        Encoder::new(StandardWriter::new(output), metadata)
+        Encoder::new(StandardWriter::new(output), metadata, None)
     }
 }
 
@@ -41,6 +41,7 @@ impl<'a> Encoder<'a, StandardWriter<std::fs::File>> {
         Encoder::new(
             StandardWriter::new(std::fs::File::create(path.as_ref())?),
             metadata,
+            None,
         )
     }
 }
@@ -50,9 +51,15 @@ impl<'a, T: SeqWrite + 'a> Encoder<'a, T> {
     ///
     /// Note that the `output`'s `SeqWrite` implementation must always return `Poll::Ready` and is
     /// not allowed to use the `Waker`, as this will cause a `panic!`.
-    pub fn new(output: T, metadata: &Metadata) -> io::Result<Self> {
+    // Optionally attach a dedicated writer to redirect the payloads of regular files to a separate
+    // output.
+    pub fn new(output: T, metadata: &Metadata, payload_output: Option<T>) -> io::Result<Self> {
         Ok(Self {
-            inner: poll_result_once(encoder::EncoderImpl::new(output.into(), metadata))?,
+            inner: poll_result_once(encoder::EncoderImpl::new(
+                output.into(),
+                metadata,
+                payload_output,
+            ))?,
         })
     }
 
-- 
2.39.2



_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel



* [pbs-devel] [PATCH v5 pxar 05/62] encoder: move to stack based state tracking
  2024-05-07 15:51 [pbs-devel] [PATCH v5 pxar proxmox-backup 00/62] fix #3174: improve file-level backup Christian Ebner
                   ` (3 preceding siblings ...)
  2024-05-07 15:51 ` [pbs-devel] [PATCH v5 pxar 04/62] encoder: add optional output writer for file payloads Christian Ebner
@ 2024-05-07 15:51 ` Christian Ebner
  2024-05-07 15:51 ` [pbs-devel] [PATCH v5 pxar 06/62] decoder/accessor: add optional payload input stream Christian Ebner
                   ` (57 subsequent siblings)
  62 siblings, 0 replies; 67+ messages in thread
From: Christian Ebner @ 2024-05-07 15:51 UTC (permalink / raw)
  To: pbs-devel

This prepares the encoder for the proxmox-backup-client look-ahead
caching, where passing around different encoder instances with internal
references is not feasible.

Instead of creating a new encoder instance for each directory level
and keeping references to the parent state, use an internal stack.
Add helper functions to resolve borrow issues when both the state and
the writers have to be accessed by mutable reference.
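
To illustrate the borrow issue mentioned above, here is a minimal,
self-contained sketch. It is not the pxar code itself: `StackEncoder`,
`DirState`, `write_entry` and `split_borrow_example` are hypothetical
names, and only the idea of a helper like `output_state()` that hands out
the writer and the top-of-stack state together is taken from this patch.

```rust
use std::io::{self, Write};

/// Simplified, hypothetical stand-in for the per-directory encoder state.
#[derive(Default)]
struct DirState {
    write_position: u64,
}

/// Simplified, hypothetical stand-in for the encoder: one writer plus a
/// stack holding one state per directory level.
struct StackEncoder<W: Write> {
    output: W,
    state: Vec<DirState>,
}

impl<W: Write> StackEncoder<W> {
    fn new(output: W) -> Self {
        Self {
            output,
            state: vec![DirState::default()],
        }
    }

    /// Borrow the writer and the top of the state stack at the same time.
    /// Both borrows come from disjoint fields within one method, so the
    /// borrow checker accepts them. Holding the results of two separate
    /// `&mut self` accessors at once would be rejected.
    fn output_state(&mut self) -> io::Result<(&mut W, &mut DirState)> {
        let state = self.state.last_mut().ok_or_else(|| {
            io::Error::new(io::ErrorKind::Other, "encoder state stack underflow")
        })?;
        Ok((&mut self.output, state))
    }

    /// Write some bytes and advance the position of the current level.
    fn write_entry(&mut self, data: &[u8]) -> io::Result<()> {
        let (output, state) = self.output_state()?;
        output.write_all(data)?;
        state.write_position += data.len() as u64;
        Ok(())
    }
}

/// Write two entries and return the written bytes and the final position.
fn split_borrow_example() -> io::Result<(Vec<u8>, u64)> {
    let mut encoder = StackEncoder::new(Vec::new());
    encoder.write_entry(b"entry")?;
    encoder.write_entry(b"xattr")?;
    let position = encoder.state.last().map(|s| s.write_position).unwrap_or(0);
    Ok((encoder.output, position))
}
```

Returning both references from a single method keeps the call sites
short, at the cost of one helper per combination of fields that need to
be borrowed together.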

This is a breaking change in the pxar library API.
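
For callers, the change means that create_directory() no longer returns
a child encoder: the same instance is reused, finish() is called once
per directory level (including the root), and close() is called last.
The toy model below sketches that call sequence under simplifying
assumptions. `ToyEncoder`, `LevelState`, `add_bytes` and
`lifecycle_example` are hypothetical, nothing is actually encoded, and
only the push/pop bookkeeping mirrors what this patch does.

```rust
/// Hypothetical per-directory state kept on the encoder's stack.
#[derive(Debug, Default)]
struct LevelState {
    write_position: u64,
    start: u64,
    items: Vec<u64>, // sizes of finished child directories
}

/// Hypothetical encoder that only tracks positions, not archive contents.
#[derive(Debug)]
struct ToyEncoder {
    state: Vec<LevelState>,
}

impl ToyEncoder {
    fn new() -> Self {
        // The root directory state is on the stack from the start.
        Self { state: vec![LevelState::default()] }
    }

    fn top(&mut self) -> Result<&mut LevelState, String> {
        self.state
            .last_mut()
            .ok_or_else(|| "encoder state stack underflow".to_string())
    }

    /// Pretend to write `len` bytes at the current directory level.
    fn add_bytes(&mut self, len: u64) -> Result<(), String> {
        self.top()?.write_position += len;
        Ok(())
    }

    /// Enter a directory: push a new state that continues at the parent's position.
    fn create_directory(&mut self) -> Result<(), String> {
        let position = self.top()?.write_position;
        self.state.push(LevelState {
            write_position: position,
            start: position,
            items: Vec::new(),
        });
        Ok(())
    }

    /// Leave a directory: pop its state and propagate the position to the parent.
    fn finish(&mut self) -> Result<(), String> {
        let done = self
            .state
            .pop()
            .ok_or_else(|| "encoder state stack underflow".to_string())?;
        if let Some(parent) = self.state.last_mut() {
            parent.write_position = done.write_position;
            parent.items.push(done.write_position - done.start);
        }
        Ok(())
    }

    /// Closing is only valid once every level, including the root, is finished.
    fn close(self) -> Result<(), String> {
        if !self.state.is_empty() {
            return Err("unexpected state on encoder close".to_string());
        }
        Ok(())
    }
}

/// Encode a root with one subdirectory; return the root position and child sizes.
fn lifecycle_example() -> Result<(u64, Vec<u64>), String> {
    let mut encoder = ToyEncoder::new();
    encoder.add_bytes(10)?; // root metadata
    encoder.create_directory()?; // push the subdirectory state
    encoder.add_bytes(32)?; // content inside the subdirectory
    encoder.finish()?; // pop the subdirectory, root continues after it
    let root = encoder.top()?;
    let result = (root.write_position, root.items.clone());
    encoder.finish()?; // pop the root
    encoder.close()?; // stack is empty, so closing succeeds
    Ok(result)
}
```

In this model, calling close() while any level is still on the stack
returns an error, which matches the "unexpected state on encoder close"
check added by the patch.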

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
changes since version 4:
- no changes

 examples/pxarcmd.rs  |   7 +-
 src/encoder/aio.rs   |  26 ++--
 src/encoder/mod.rs   | 323 +++++++++++++++++++++++++------------------
 src/encoder/sync.rs  |  16 ++-
 tests/simple/fs.rs   |   6 +-
 tests/simple/main.rs |   3 +
 6 files changed, 226 insertions(+), 155 deletions(-)

diff --git a/examples/pxarcmd.rs b/examples/pxarcmd.rs
index e0c779d..0294eba 100644
--- a/examples/pxarcmd.rs
+++ b/examples/pxarcmd.rs
@@ -106,6 +106,7 @@ fn cmd_create(mut args: std::env::ArgsOs) -> Result<(), Error> {
     let mut encoder = Encoder::create(file, &meta)?;
     add_directory(&mut encoder, dir, &dir_path, &mut HashMap::new())?;
     encoder.finish()?;
+    encoder.close()?;
 
     Ok(())
 }
@@ -138,14 +139,14 @@ fn add_directory<'a, T: SeqWrite + 'a>(
 
         let meta = Metadata::from(&file_meta);
         if file_type.is_dir() {
-            let mut dir = encoder.create_directory(file_name, &meta)?;
+            encoder.create_directory(file_name, &meta)?;
             add_directory(
-                &mut dir,
+                encoder,
                 std::fs::read_dir(file_path)?,
                 root_path,
                 &mut *hardlinks,
             )?;
-            dir.finish()?;
+            encoder.finish()?;
         } else if file_type.is_symlink() {
             todo!("symlink handling");
         } else if file_type.is_file() {
diff --git a/src/encoder/aio.rs b/src/encoder/aio.rs
index 31a1a2f..635e550 100644
--- a/src/encoder/aio.rs
+++ b/src/encoder/aio.rs
@@ -109,20 +109,23 @@ impl<'a, T: SeqWrite + 'a> Encoder<'a, T> {
         &mut self,
         file_name: P,
         metadata: &Metadata,
-    ) -> io::Result<Encoder<'_, T>> {
-        Ok(Encoder {
-            inner: self
-                .inner
-                .create_directory(file_name.as_ref(), metadata)
-                .await?,
-        })
+    ) -> io::Result<()> {
+        self.inner
+            .create_directory(file_name.as_ref(), metadata)
+            .await
     }
 
-    /// Finish this directory. This is mandatory, otherwise the `Drop` handler will `panic!`.
-    pub async fn finish(self) -> io::Result<()> {
+    /// Finish this directory. This is mandatory and encodes the end of the current directory.
+    pub async fn finish(&mut self) -> io::Result<()> {
         self.inner.finish().await
     }
 
+    /// Close the encoder instance. This is mandatory and encodes the end of the optional payload
+    /// output stream, if one is given.
+    pub async fn close(self) -> io::Result<()> {
+        self.inner.close().await
+    }
+
     /// Add a symbolic link to the archive.
     pub async fn add_symlink<PF: AsRef<Path>, PT: AsRef<Path>>(
         &mut self,
@@ -307,11 +310,12 @@ mod test {
                     .await
                     .unwrap();
             {
-                let mut dir = encoder
+                encoder
                     .create_directory("baba", &Metadata::dir_builder(0o700).build())
                     .await
                     .unwrap();
-                dir.create_file(&Metadata::file_builder(0o755).build(), "abab", 1024)
+                encoder
+                    .create_file(&Metadata::file_builder(0o755).build(), "abab", 1024)
                     .await
                     .unwrap();
             }
diff --git a/src/encoder/mod.rs b/src/encoder/mod.rs
index 99c3758..369a937 100644
--- a/src/encoder/mod.rs
+++ b/src/encoder/mod.rs
@@ -224,9 +224,22 @@ struct EncoderState {
 
     /// Track the bytes written to the payload writer
     payload_write_position: u64,
+
+    /// Mark the encoder state as correctly finished, ready to be dropped
+    finished: bool,
 }
 
 impl EncoderState {
+    #[inline]
+    fn position(&self) -> u64 {
+        self.write_position
+    }
+
+    #[inline]
+    fn payload_position(&self) -> u64 {
+        self.payload_write_position
+    }
+
     fn merge_error(&mut self, error: Option<EncodeError>) {
         // one error is enough:
         if self.encode_error.is_none() {
@@ -237,6 +250,23 @@ impl EncoderState {
     fn add_error(&mut self, error: EncodeError) {
         self.merge_error(Some(error));
     }
+
+    fn finish(&mut self) -> Option<EncodeError> {
+        self.finished = true;
+        self.encode_error.take()
+    }
+}
+
+impl Drop for EncoderState {
+    fn drop(&mut self) {
+        if !self.finished {
+            eprintln!("unfinished encoder state dropped");
+        }
+
+        if self.encode_error.is_some() {
+            eprintln!("finished encoder state with errors");
+        }
+    }
 }
 
 pub(crate) enum EncoderOutput<'a, T> {
@@ -244,16 +274,6 @@ pub(crate) enum EncoderOutput<'a, T> {
     Borrowed(&'a mut T),
 }
 
-impl<'a, T> EncoderOutput<'a, T> {
-    #[inline]
-    fn to_borrowed_mut<'s>(&'s mut self) -> EncoderOutput<'s, T>
-    where
-        'a: 's,
-    {
-        EncoderOutput::Borrowed(self.as_mut())
-    }
-}
-
 impl<'a, T> std::convert::AsMut<T> for EncoderOutput<'a, T> {
     fn as_mut(&mut self) -> &mut T {
         match self {
@@ -282,8 +302,8 @@ impl<'a, T> std::convert::From<&'a mut T> for EncoderOutput<'a, T> {
 pub(crate) struct EncoderImpl<'a, T: SeqWrite + 'a> {
     output: EncoderOutput<'a, T>,
     payload_output: EncoderOutput<'a, Option<T>>,
-    state: EncoderState,
-    parent: Option<&'a mut EncoderState>,
+    /// EncoderState stack storing the state for each directory level
+    state: Vec<EncoderState>,
     finished: bool,
 
     /// Since only the "current" entry can be actively writing files, we share the file copy
@@ -293,15 +313,12 @@ pub(crate) struct EncoderImpl<'a, T: SeqWrite + 'a> {
 
 impl<'a, T: SeqWrite + 'a> Drop for EncoderImpl<'a, T> {
     fn drop(&mut self) {
-        if let Some(ref mut parent) = self.parent {
-            // propagate errors:
-            parent.merge_error(self.state.encode_error);
-            if !self.finished {
-                parent.add_error(EncodeError::IncompleteDirectory);
-            }
-        } else if !self.finished {
-            // FIXME: how do we deal with this?
-            // eprintln!("Encoder dropped without finishing!");
+        if !self.finished {
+            eprintln!("unclosed encoder dropped");
+        }
+
+        if !self.state.is_empty() {
+            eprintln!("closed encoder dropped with state");
         }
     }
 }
@@ -318,8 +335,7 @@ impl<'a, T: SeqWrite + 'a> EncoderImpl<'a, T> {
         let mut this = Self {
             output,
             payload_output: EncoderOutput::Owned(None),
-            state: EncoderState::default(),
-            parent: None,
+            state: vec![EncoderState::default()],
             finished: false,
             file_copy_buffer: Arc::new(Mutex::new(unsafe {
                 crate::util::vec_new_uninitialized(1024 * 1024)
@@ -327,7 +343,8 @@ impl<'a, T: SeqWrite + 'a> EncoderImpl<'a, T> {
         };
 
         this.encode_metadata(metadata).await?;
-        this.state.files_offset = this.position();
+        let state = this.state_mut()?;
+        state.files_offset = state.position();
 
         if let Some(payload_output) = payload_output {
             this.payload_output = EncoderOutput::Owned(Some(payload_output));
@@ -337,13 +354,50 @@ impl<'a, T: SeqWrite + 'a> EncoderImpl<'a, T> {
     }
 
     fn check(&self) -> io::Result<()> {
-        match self.state.encode_error {
+        if self.finished {
+            io_bail!("unexpected encoder finished state");
+        }
+        let state = self.state()?;
+        match state.encode_error {
             Some(EncodeError::IncompleteFile) => io_bail!("incomplete file"),
             Some(EncodeError::IncompleteDirectory) => io_bail!("directory not finalized"),
             None => Ok(()),
         }
     }
 
+    fn state(&self) -> io::Result<&EncoderState> {
+        self.state
+            .last()
+            .ok_or_else(|| io_format_err!("encoder state stack underflow"))
+    }
+
+    fn state_mut(&mut self) -> io::Result<&mut EncoderState> {
+        self.state
+            .last_mut()
+            .ok_or_else(|| io_format_err!("encoder state stack underflow"))
+    }
+
+    fn output_state(&mut self) -> io::Result<(&mut T, &mut EncoderState)> {
+        Ok((
+            self.output.as_mut(),
+            self.state
+                .last_mut()
+                .ok_or_else(|| io_format_err!("encoder state stack underflow"))?,
+        ))
+    }
+
+    fn output_payload_output_state(
+        &mut self,
+    ) -> io::Result<(&mut T, Option<&mut T>, &mut EncoderState)> {
+        Ok((
+            self.output.as_mut(),
+            self.payload_output.as_mut().as_mut(),
+            self.state
+                .last_mut()
+                .ok_or_else(|| io_format_err!("encoder state stack underflow"))?,
+        ))
+    }
+
     pub async fn create_file<'b>(
         &'b mut self,
         metadata: &Metadata,
@@ -368,22 +422,17 @@ impl<'a, T: SeqWrite + 'a> EncoderImpl<'a, T> {
     {
         self.check()?;
 
-        let file_offset = self.position();
+        let file_offset = self.state()?.position();
         self.start_file_do(Some(metadata), file_name).await?;
 
-        if let Some(payload_output) = self.payload_output.as_mut() {
+        if let (output, Some(payload_output), state) = self.output_payload_output_state()? {
             // payload references must point to the position prior to the payload header,
             // separating payload entries in the payload stream
-            let payload_position = self.state.payload_write_position;
+            let payload_position = state.payload_position();
 
             let header = format::Header::with_content_size(format::PXAR_PAYLOAD, file_size);
             header.check_header_size()?;
-            seq_write_struct(
-                payload_output,
-                header,
-                &mut self.state.payload_write_position,
-            )
-            .await?;
+            seq_write_struct(payload_output, header, &mut state.payload_write_position).await?;
 
             let payload_ref = PayloadRef {
                 offset: payload_position,
@@ -391,32 +440,34 @@ impl<'a, T: SeqWrite + 'a> EncoderImpl<'a, T> {
             };
 
             seq_write_pxar_entry(
-                self.output.as_mut(),
+                output,
                 format::PXAR_PAYLOAD_REF,
                 &payload_ref.data(),
-                &mut self.state.write_position,
+                &mut state.write_position,
             )
             .await?;
         } else {
             let header = format::Header::with_content_size(format::PXAR_PAYLOAD, file_size);
             header.check_header_size()?;
-            seq_write_struct(self.output.as_mut(), header, &mut self.state.write_position).await?;
+            let (output, state) = self.output_state()?;
+            seq_write_struct(output, header, &mut state.write_position).await?;
         }
 
-        let payload_data_offset = self.position();
+        let (output, payload_output, state) = self.output_payload_output_state()?;
+        let payload_data_offset = state.position();
 
         let meta_size = payload_data_offset - file_offset;
 
         Ok(FileImpl {
-            output: self.output.as_mut(),
-            payload_output: self.payload_output.as_mut().as_mut(),
+            output,
+            payload_output,
             goodbye_item: GoodbyeItem {
                 hash: format::hash_filename(file_name),
                 offset: file_offset,
                 size: file_size + meta_size,
             },
             remaining_size: file_size,
-            parent: &mut self.state,
+            parent: state,
         })
     }
 
@@ -497,7 +548,7 @@ impl<'a, T: SeqWrite + 'a> EncoderImpl<'a, T> {
         target: &Path,
         target_offset: LinkOffset,
     ) -> io::Result<()> {
-        let current_offset = self.position();
+        let current_offset = self.state()?.position();
         if current_offset <= target_offset.0 {
             io_bail!("invalid hardlink offset, can only point to prior files");
         }
@@ -571,24 +622,20 @@ impl<'a, T: SeqWrite + 'a> EncoderImpl<'a, T> {
     ) -> io::Result<LinkOffset> {
         self.check()?;
 
-        let file_offset = self.position();
+        let file_offset = self.state()?.position();
 
         let file_name = file_name.as_os_str().as_bytes();
 
         self.start_file_do(metadata, file_name).await?;
+
+        let (output, state) = self.output_state()?;
         if let Some((htype, entry_data)) = entry_htype_data {
-            seq_write_pxar_entry(
-                self.output.as_mut(),
-                htype,
-                entry_data,
-                &mut self.state.write_position,
-            )
-            .await?;
+            seq_write_pxar_entry(output, htype, entry_data, &mut state.write_position).await?;
         }
 
-        let end_offset = self.position();
+        let end_offset = state.position();
 
-        self.state.items.push(GoodbyeItem {
+        state.items.push(GoodbyeItem {
             hash: format::hash_filename(file_name),
             offset: file_offset,
             size: end_offset - file_offset,
@@ -597,21 +644,11 @@ impl<'a, T: SeqWrite + 'a> EncoderImpl<'a, T> {
         Ok(LinkOffset(file_offset))
     }
 
-    #[inline]
-    fn position(&mut self) -> u64 {
-        self.state.write_position
-    }
-
-    #[inline]
-    fn payload_position(&mut self) -> u64 {
-        self.state.payload_write_position
-    }
-
     pub async fn create_directory(
         &mut self,
         file_name: &Path,
         metadata: &Metadata,
-    ) -> io::Result<EncoderImpl<'_, T>> {
+    ) -> io::Result<()> {
         self.check()?;
 
         if !metadata.is_dir() {
@@ -621,37 +658,32 @@ impl<'a, T: SeqWrite + 'a> EncoderImpl<'a, T> {
         let file_name = file_name.as_os_str().as_bytes();
         let file_hash = format::hash_filename(file_name);
 
-        let file_offset = self.position();
+        let file_offset = self.state()?.position();
         self.encode_filename(file_name).await?;
 
-        let entry_offset = self.position();
+        let entry_offset = self.state()?.position();
         self.encode_metadata(metadata).await?;
 
-        let files_offset = self.position();
+        let state = self.state_mut()?;
+        let files_offset = state.position();
 
         // the child will write to OUR state now:
-        let write_position = self.position();
-        let payload_write_position = self.payload_position();
-
-        let file_copy_buffer = Arc::clone(&self.file_copy_buffer);
-
-        Ok(EncoderImpl {
-            // always forward as Borrowed(), to avoid stacking references on nested calls
-            output: self.output.to_borrowed_mut(),
-            payload_output: self.payload_output.to_borrowed_mut(),
-            state: EncoderState {
-                entry_offset,
-                files_offset,
-                file_offset: Some(file_offset),
-                file_hash,
-                write_position,
-                payload_write_position,
-                ..Default::default()
-            },
-            parent: Some(&mut self.state),
+        let write_position = state.position();
+        let payload_write_position = state.payload_position();
+
+        self.state.push(EncoderState {
+            items: Vec::new(),
+            encode_error: None,
+            entry_offset,
+            files_offset,
+            file_offset: Some(file_offset),
+            file_hash,
+            write_position,
+            payload_write_position,
             finished: false,
-            file_copy_buffer,
-        })
+        });
+
+        Ok(())
     }
 
     async fn start_file_do(
@@ -667,11 +699,12 @@ impl<'a, T: SeqWrite + 'a> EncoderImpl<'a, T> {
     }
 
     async fn encode_metadata(&mut self, metadata: &Metadata) -> io::Result<()> {
+        let (output, state) = self.output_state()?;
         seq_write_pxar_struct_entry(
-            self.output.as_mut(),
+            output,
             format::PXAR_ENTRY,
             metadata.stat.clone(),
-            &mut self.state.write_position,
+            &mut state.write_position,
         )
         .await?;
 
@@ -693,72 +726,74 @@ impl<'a, T: SeqWrite + 'a> EncoderImpl<'a, T> {
     }
 
     async fn write_xattr(&mut self, xattr: &format::XAttr) -> io::Result<()> {
+        let (output, state) = self.output_state()?;
         seq_write_pxar_entry(
-            self.output.as_mut(),
+            output,
             format::PXAR_XATTR,
             &xattr.data,
-            &mut self.state.write_position,
+            &mut state.write_position,
         )
         .await
     }
 
     async fn write_acls(&mut self, acl: &crate::Acl) -> io::Result<()> {
+        let (output, state) = self.output_state()?;
         for acl in &acl.users {
             seq_write_pxar_struct_entry(
-                self.output.as_mut(),
+                output,
                 format::PXAR_ACL_USER,
                 acl.clone(),
-                &mut self.state.write_position,
+                &mut state.write_position,
             )
             .await?;
         }
 
         for acl in &acl.groups {
             seq_write_pxar_struct_entry(
-                self.output.as_mut(),
+                output,
                 format::PXAR_ACL_GROUP,
                 acl.clone(),
-                &mut self.state.write_position,
+                &mut state.write_position,
             )
             .await?;
         }
 
         if let Some(acl) = &acl.group_obj {
             seq_write_pxar_struct_entry(
-                self.output.as_mut(),
+                output,
                 format::PXAR_ACL_GROUP_OBJ,
                 acl.clone(),
-                &mut self.state.write_position,
+                &mut state.write_position,
             )
             .await?;
         }
 
         if let Some(acl) = &acl.default {
             seq_write_pxar_struct_entry(
-                self.output.as_mut(),
+                output,
                 format::PXAR_ACL_DEFAULT,
                 acl.clone(),
-                &mut self.state.write_position,
+                &mut state.write_position,
             )
             .await?;
         }
 
         for acl in &acl.default_users {
             seq_write_pxar_struct_entry(
-                self.output.as_mut(),
+                output,
                 format::PXAR_ACL_DEFAULT_USER,
                 acl.clone(),
-                &mut self.state.write_position,
+                &mut state.write_position,
             )
             .await?;
         }
 
         for acl in &acl.default_groups {
             seq_write_pxar_struct_entry(
-                self.output.as_mut(),
+                output,
                 format::PXAR_ACL_DEFAULT_GROUP,
                 acl.clone(),
-                &mut self.state.write_position,
+                &mut state.write_position,
             )
             .await?;
         }
@@ -767,11 +802,12 @@ impl<'a, T: SeqWrite + 'a> EncoderImpl<'a, T> {
     }
 
     async fn write_file_capabilities(&mut self, fcaps: &format::FCaps) -> io::Result<()> {
+        let (output, state) = self.output_state()?;
         seq_write_pxar_entry(
-            self.output.as_mut(),
+            output,
             format::PXAR_FCAPS,
             &fcaps.data,
-            &mut self.state.write_position,
+            &mut state.write_position,
         )
         .await
     }
@@ -780,35 +816,32 @@ impl<'a, T: SeqWrite + 'a> EncoderImpl<'a, T> {
         &mut self,
         quota_project_id: &format::QuotaProjectId,
     ) -> io::Result<()> {
+        let (output, state) = self.output_state()?;
         seq_write_pxar_struct_entry(
-            self.output.as_mut(),
+            output,
             format::PXAR_QUOTA_PROJID,
             *quota_project_id,
-            &mut self.state.write_position,
+            &mut state.write_position,
         )
         .await
     }
 
     async fn encode_filename(&mut self, file_name: &[u8]) -> io::Result<()> {
         crate::util::validate_filename(file_name)?;
+        let (output, state) = self.output_state()?;
         seq_write_pxar_entry_zero(
-            self.output.as_mut(),
+            output,
             format::PXAR_FILENAME,
             file_name,
-            &mut self.state.write_position,
+            &mut state.write_position,
         )
         .await
     }
 
-    pub async fn finish(mut self) -> io::Result<()> {
-        let tail_bytes = self.finish_goodbye_table().await?;
-        seq_write_pxar_entry(
-            self.output.as_mut(),
-            format::PXAR_GOODBYE,
-            &tail_bytes,
-            &mut self.state.write_position,
-        )
-        .await?;
+    pub async fn close(mut self) -> io::Result<()> {
+        if !self.state.is_empty() {
+            io_bail!("unexpected state on encoder close");
+        }
 
         if let EncoderOutput::Owned(Some(output)) = &mut self.payload_output {
             flush(output).await?;
@@ -818,34 +851,60 @@ impl<'a, T: SeqWrite + 'a> EncoderImpl<'a, T> {
             flush(output).await?;
         }
 
-        // done up here because of the self-borrow and to propagate
-        let end_offset = self.position();
-        let payload_end_offset = self.payload_position();
+        self.finished = true;
+
+        Ok(())
+    }
 
-        if let Some(parent) = &mut self.parent {
+    pub async fn finish(&mut self) -> io::Result<()> {
+        let tail_bytes = self.finish_goodbye_table().await?;
+        let mut state = self
+            .state
+            .pop()
+            .ok_or_else(|| io_format_err!("encoder state stack underflow"))?;
+        seq_write_pxar_entry(
+            self.output.as_mut(),
+            format::PXAR_GOODBYE,
+            &tail_bytes,
+            &mut state.write_position,
+        )
+        .await?;
+
+        let end_offset = state.position();
+        let payload_end_offset = state.payload_position();
+
+        let encode_error = state.finish();
+        if let Some(parent) = self.state.last_mut() {
             parent.write_position = end_offset;
             parent.payload_write_position = payload_end_offset;
 
-            let file_offset = self
-                .state
+            let file_offset = state
                 .file_offset
                 .expect("internal error: parent set but no file_offset?");
 
             parent.items.push(GoodbyeItem {
-                hash: self.state.file_hash,
+                hash: state.file_hash,
                 offset: file_offset,
                 size: end_offset - file_offset,
             });
+            // propagate errors
+            parent.merge_error(encode_error);
+            Ok(())
+        } else {
+            match encode_error {
+                Some(EncodeError::IncompleteFile) => io_bail!("incomplete file"),
+                Some(EncodeError::IncompleteDirectory) => io_bail!("directory not finalized"),
+                None => Ok(()),
+            }
         }
-        self.finished = true;
-        Ok(())
     }
 
     async fn finish_goodbye_table(&mut self) -> io::Result<Vec<u8>> {
-        let goodbye_offset = self.position();
+        let state = self.state_mut()?;
+        let goodbye_offset = state.position();
 
         // "take" out the tail (to not leave an array of endian-swapped structs in `self`)
-        let mut tail = take(&mut self.state.items);
+        let mut tail = take(&mut state.items);
         let tail_size = (tail.len() + 1) * size_of::<GoodbyeItem>();
         let goodbye_size = tail_size as u64 + size_of::<format::Header>() as u64;
 
@@ -870,7 +929,7 @@ impl<'a, T: SeqWrite + 'a> EncoderImpl<'a, T> {
         bst.push(
             GoodbyeItem {
                 hash: format::PXAR_GOODBYE_TAIL_MARKER,
-                offset: goodbye_offset - self.state.entry_offset,
+                offset: goodbye_offset - state.entry_offset,
                 size: goodbye_size,
             }
             .to_le(),
@@ -900,8 +959,8 @@ pub(crate) struct FileImpl<'a, S: SeqWrite> {
     /// exactly zero.
     remaining_size: u64,
 
-    /// The directory containing this file. This is where we propagate the `IncompleteFile` error
-    /// to, and where we insert our `GoodbyeItem`.
+    /// State of the directory containing this file (top of the state stack). This is where we
+    /// propagate the `IncompleteFile` error to, and where we insert our `GoodbyeItem`.
     parent: &'a mut EncoderState,
 }
 
diff --git a/src/encoder/sync.rs b/src/encoder/sync.rs
index 96d056d..d0d62ba 100644
--- a/src/encoder/sync.rs
+++ b/src/encoder/sync.rs
@@ -106,17 +106,21 @@ impl<'a, T: SeqWrite + 'a> Encoder<'a, T> {
         &mut self,
         file_name: P,
         metadata: &Metadata,
-    ) -> io::Result<Encoder<'_, T>> {
-        Ok(Encoder {
-            inner: poll_result_once(self.inner.create_directory(file_name.as_ref(), metadata))?,
-        })
+    ) -> io::Result<()> {
+        poll_result_once(self.inner.create_directory(file_name.as_ref(), metadata))
     }
 
-    /// Finish this directory. This is mandatory, otherwise the `Drop` handler will `panic!`.
-    pub fn finish(self) -> io::Result<()> {
+    /// Finish this directory. This is mandatory and encodes the end of the current directory.
+    pub fn finish(&mut self) -> io::Result<()> {
         poll_result_once(self.inner.finish())
     }
 
+    /// Close the encoder instance. This is mandatory and encodes the end of the optional payload
+    /// output stream, if one is given.
+    pub fn close(self) -> io::Result<()> {
+        poll_result_once(self.inner.close())
+    }
+
     /// Add a symbolic link to the archive.
     pub fn add_symlink<PF: AsRef<Path>, PT: AsRef<Path>>(
         &mut self,
diff --git a/tests/simple/fs.rs b/tests/simple/fs.rs
index 9a89c4d..4284805 100644
--- a/tests/simple/fs.rs
+++ b/tests/simple/fs.rs
@@ -144,12 +144,12 @@ impl Entry {
 
             EntryKind::Directory(entries) => {
                 self.no_hardlink()?;
-                let mut dir = encoder.create_directory(&self.name, &self.metadata)?;
+                encoder.create_directory(&self.name, &self.metadata)?;
                 let path = path.join(&self.name);
                 for entry in entries {
-                    entry.encode_into(&mut dir, hardlinks, &path)?;
+                    entry.encode_into(encoder, hardlinks, &path)?;
                 }
-                dir.finish()?;
+                encoder.finish()?;
             }
 
             EntryKind::Symlink(path) => {
diff --git a/tests/simple/main.rs b/tests/simple/main.rs
index d661c7d..e55457f 100644
--- a/tests/simple/main.rs
+++ b/tests/simple/main.rs
@@ -51,6 +51,9 @@ fn test1() {
     encoder
         .finish()
         .expect("failed to finish encoding the pxar archive");
+    encoder
+        .close()
+        .expect("failed to close the encoder instance");
 
     assert!(!file.is_empty(), "encoder did not write any data");
 
-- 
2.39.2



_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel



* [pbs-devel] [PATCH v5 pxar 06/62] decoder/accessor: add optional payload input stream
  2024-05-07 15:51 [pbs-devel] [PATCH v5 pxar proxmox-backup 00/62] fix #3174: improve file-level backup Christian Ebner
                   ` (4 preceding siblings ...)
  2024-05-07 15:51 ` [pbs-devel] [PATCH v5 pxar 05/62] encoder: move to stack based state tracking Christian Ebner
@ 2024-05-07 15:51 ` Christian Ebner
  2024-05-07 15:51 ` [pbs-devel] [PATCH v5 pxar 07/62] decoder: set payload input range when decoding via accessor Christian Ebner
                   ` (56 subsequent siblings)
  62 siblings, 0 replies; 67+ messages in thread
From: Christian Ebner @ 2024-05-07 15:51 UTC (permalink / raw)
  To: pbs-devel

Implement an optional redirection to read the payload for regular files
from a different input stream.

This allows decoding split-stream archives.

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
changes since version 4:
- no changes
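
For illustration only, a minimal sketch of how a file's content range is
resolved once a payload reference is involved, and of how far a purely
sequential payload reader has to skip ahead. `FileKind`, `content_range`
and `bytes_to_skip` are hypothetical stand-ins and not part of the pxar
API; the header size of 16 bytes (two u64 fields) is an assumption.

```rust
use std::ops::Range;

/// Assumed size of a pxar item header (two u64 fields). Hypothetical constant.
const HEADER_SIZE: u64 = 16;

/// Simplified, hypothetical stand-in for `EntryKind::File`.
#[derive(Clone, Copy, Debug)]
struct FileKind {
    size: u64,
    offset: Option<u64>,
    payload_offset: Option<u64>,
}

/// Byte range of the file contents, in whichever stream holds them.
fn content_range(kind: FileKind) -> Option<Range<u64>> {
    match (kind.offset, kind.payload_offset) {
        // Payload embedded in the regular archive.
        (Some(offset), None) => Some(offset..offset + kind.size),
        // Payload reference: the contents start right after the payload
        // header located at `payload_offset` in the payload stream.
        (Some(_), Some(payload_offset)) => {
            let start = payload_offset + HEADER_SIZE;
            Some(start..start + kind.size)
        }
        _ => None,
    }
}

/// Number of bytes a sequential payload reader has to skip to reach
/// `ref_offset`, given how many bytes it has consumed so far.
fn bytes_to_skip(consumed: u64, ref_offset: u64) -> Result<u64, String> {
    ref_offset.checked_sub(consumed).ok_or_else(|| {
        format!("unexpected offset {ref_offset}, smaller than already consumed payload {consumed}")
    })
}
```

With a payload reference present, the range refers to the payload stream
and no longer to the regular archive, which is why the accessor has to
pick the matching input when opening the contents.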

 examples/apxar.rs    |  2 +-
 src/accessor/aio.rs  | 10 +++--
 src/accessor/mod.rs  | 78 ++++++++++++++++++++++++++++++------
 src/accessor/sync.rs |  8 ++--
 src/decoder/aio.rs   | 14 ++++---
 src/decoder/mod.rs   | 94 ++++++++++++++++++++++++++++++++++++++------
 src/decoder/sync.rs  | 15 ++++---
 src/lib.rs           |  3 ++
 tests/compat.rs      |  3 +-
 tests/simple/main.rs |  5 ++-
 10 files changed, 188 insertions(+), 44 deletions(-)

diff --git a/examples/apxar.rs b/examples/apxar.rs
index 0c62242..d5eb04e 100644
--- a/examples/apxar.rs
+++ b/examples/apxar.rs
@@ -9,7 +9,7 @@ async fn main() {
         .await
         .expect("failed to open file");
 
-    let mut reader = Decoder::from_tokio(file)
+    let mut reader = Decoder::from_tokio(file, None)
         .await
         .expect("failed to open pxar archive contents");
 
diff --git a/src/accessor/aio.rs b/src/accessor/aio.rs
index 98d7755..06167b4 100644
--- a/src/accessor/aio.rs
+++ b/src/accessor/aio.rs
@@ -39,7 +39,7 @@ impl<T: FileExt> Accessor<FileReader<T>> {
     /// by a blocking file.
     #[inline]
     pub async fn from_file_and_size(input: T, size: u64) -> io::Result<Self> {
-        Accessor::new(FileReader::new(input), size).await
+        Accessor::new(FileReader::new(input), size, None).await
     }
 }
 
@@ -75,7 +75,7 @@ where
         input: T,
         size: u64,
     ) -> io::Result<Accessor<FileRefReader<T>>> {
-        Accessor::new(FileRefReader::new(input), size).await
+        Accessor::new(FileRefReader::new(input), size, None).await
     }
 }
 
@@ -85,9 +85,11 @@ impl<T: ReadAt> Accessor<T> {
     ///
     /// Note that the `input`'s `SeqRead` implementation must always return `Poll::Ready` and is
     /// not allowed to use the `Waker`, as this will cause a `panic!`.
-    pub async fn new(input: T, size: u64) -> io::Result<Self> {
+    /// Optionally take the file payloads from the provided input stream rather than the regular
+    /// pxar stream.
+    pub async fn new(input: T, size: u64, payload_input: Option<(T, u64)>) -> io::Result<Self> {
         Ok(Self {
-            inner: accessor::AccessorImpl::new(input, size).await?,
+            inner: accessor::AccessorImpl::new(input, size, payload_input).await?,
         })
     }
 
diff --git a/src/accessor/mod.rs b/src/accessor/mod.rs
index 6a2de73..46afbe3 100644
--- a/src/accessor/mod.rs
+++ b/src/accessor/mod.rs
@@ -182,10 +182,11 @@ pub(crate) struct AccessorImpl<T> {
     input: T,
     size: u64,
     caches: Arc<Caches>,
+    payload_input: Option<(T, Range<u64>)>,
 }
 
 impl<T: ReadAt> AccessorImpl<T> {
-    pub async fn new(input: T, size: u64) -> io::Result<Self> {
+    pub async fn new(input: T, size: u64, payload_input: Option<(T, u64)>) -> io::Result<Self> {
         if size < (size_of::<GoodbyeItem>() as u64) {
             io_bail!("too small to contain a pxar archive");
         }
@@ -194,6 +195,7 @@ impl<T: ReadAt> AccessorImpl<T> {
             input,
             size,
             caches: Arc::new(Caches::default()),
+            payload_input: payload_input.map(|(input, size)| (input, 0..size)),
         })
     }
 
@@ -207,6 +209,9 @@ impl<T: ReadAt> AccessorImpl<T> {
             self.size,
             "/".into(),
             Arc::clone(&self.caches),
+            self.payload_input
+                .as_ref()
+                .map(|(input, range)| (input as &dyn ReadAt, range.clone())),
         )
         .await
     }
@@ -227,8 +232,15 @@ async fn get_decoder<T: ReadAt>(
     input: T,
     entry_range: Range<u64>,
     path: PathBuf,
+    payload_input: Option<(T, Range<u64>)>,
 ) -> io::Result<DecoderImpl<SeqReadAtAdapter<T>>> {
-    DecoderImpl::new_full(SeqReadAtAdapter::new(input, entry_range), path, true).await
+    DecoderImpl::new_full(
+        SeqReadAtAdapter::new(input, entry_range.clone()),
+        path,
+        true,
+        payload_input.map(|(input, range)| SeqReadAtAdapter::new(input, range)),
+    )
+    .await
 }
 
 // NOTE: This performs the Decoder::read_next_item() behavior! Keep in mind when changing!
@@ -236,6 +248,7 @@ async fn get_decoder_at_filename<T: ReadAt>(
     input: T,
     entry_range: Range<u64>,
     path: PathBuf,
+    payload_input: Option<(T, Range<u64>)>,
 ) -> io::Result<(DecoderImpl<SeqReadAtAdapter<T>>, u64)> {
     // Read the header, it should be a FILENAME, then skip over it and its length:
     let header: format::Header = read_entry_at(&input, entry_range.start).await?;
@@ -251,7 +264,7 @@ async fn get_decoder_at_filename<T: ReadAt>(
     }
 
     Ok((
-        get_decoder(input, entry_offset..entry_range.end, path).await?,
+        get_decoder(input, entry_offset..entry_range.end, path, payload_input).await?,
         entry_offset,
     ))
 }
@@ -263,6 +276,7 @@ impl<T: Clone + ReadAt> AccessorImpl<T> {
             self.size,
             "/".into(),
             Arc::clone(&self.caches),
+            self.payload_input.clone(),
         )
         .await
     }
@@ -274,6 +288,7 @@ impl<T: Clone + ReadAt> AccessorImpl<T> {
             offset,
             "/".into(),
             Arc::clone(&self.caches),
+            self.payload_input.clone(),
         )
         .await
     }
@@ -287,23 +302,30 @@ impl<T: Clone + ReadAt> AccessorImpl<T> {
             self.input.clone(),
             entry_range_info.entry_range.clone(),
             PathBuf::new(),
+            self.payload_input.clone(),
         )
         .await?;
         let entry = decoder
             .next()
             .await
             .ok_or_else(|| io_format_err!("unexpected EOF while decoding file entry"))??;
+
         Ok(FileEntryImpl {
             input: self.input.clone(),
             entry,
             entry_range_info: entry_range_info.clone(),
             caches: Arc::clone(&self.caches),
+            payload_input: self.payload_input.clone(),
         })
     }
 
     /// Allow opening arbitrary contents from a specific range.
     pub unsafe fn open_contents_at_range(&self, range: Range<u64>) -> FileContentsImpl<T> {
-        FileContentsImpl::new(self.input.clone(), range)
+        if let Some((payload_input, _)) = &self.payload_input {
+            FileContentsImpl::new(payload_input.clone(), range)
+        } else {
+            FileContentsImpl::new(self.input.clone(), range)
+        }
     }
 
     /// Following a hardlink breaks a couple of conventions we otherwise have, particularly we will
@@ -326,9 +348,13 @@ impl<T: Clone + ReadAt> AccessorImpl<T> {
 
         let link_offset = entry_file_offset - link_offset;
 
-        let (mut decoder, entry_offset) =
-            get_decoder_at_filename(self.input.clone(), link_offset..self.size, PathBuf::new())
-                .await?;
+        let (mut decoder, entry_offset) = get_decoder_at_filename(
+            self.input.clone(),
+            link_offset..self.size,
+            PathBuf::new(),
+            self.payload_input.clone(),
+        )
+        .await?;
 
         let entry = decoder
             .next()
@@ -342,6 +368,7 @@ impl<T: Clone + ReadAt> AccessorImpl<T> {
             EntryKind::File {
                 offset: Some(offset),
                 size,
+                ..
             } => {
                 let meta_size = offset - link_offset;
                 let entry_end = link_offset + meta_size + size;
@@ -353,6 +380,7 @@ impl<T: Clone + ReadAt> AccessorImpl<T> {
                         entry_range: entry_offset..entry_end,
                     },
                     caches: Arc::clone(&self.caches),
+                    payload_input: self.payload_input.clone(),
                 })
             }
             _ => io_bail!("hardlink does not point to a regular file"),
@@ -369,6 +397,7 @@ pub(crate) struct DirectoryImpl<T> {
     table: Arc<[GoodbyeItem]>,
     path: PathBuf,
     caches: Arc<Caches>,
+    payload_input: Option<(T, Range<u64>)>,
 }
 
 impl<T: Clone + ReadAt> DirectoryImpl<T> {
@@ -378,6 +407,7 @@ impl<T: Clone + ReadAt> DirectoryImpl<T> {
         end_offset: u64,
         path: PathBuf,
         caches: Arc<Caches>,
+        payload_input: Option<(T, Range<u64>)>,
     ) -> io::Result<DirectoryImpl<T>> {
         let tail = Self::read_tail_entry(&input, end_offset).await?;
 
@@ -407,6 +437,7 @@ impl<T: Clone + ReadAt> DirectoryImpl<T> {
             table: table.as_ref().map_or_else(|| Arc::new([]), Arc::clone),
             path,
             caches,
+            payload_input,
         };
 
         // sanity check:
@@ -502,6 +533,7 @@ impl<T: Clone + ReadAt> DirectoryImpl<T> {
                 None => self.path.clone(),
                 Some(file) => self.path.join(file),
             },
+            self.payload_input.clone(),
         )
         .await
     }
@@ -533,6 +565,7 @@ impl<T: Clone + ReadAt> DirectoryImpl<T> {
                 entry_range: self.entry_range(),
             },
             caches: Arc::clone(&self.caches),
+            payload_input: self.payload_input.clone(),
         })
     }
 
@@ -575,6 +608,10 @@ impl<T: Clone + ReadAt> DirectoryImpl<T> {
             cur = next;
         }
 
+        if let Some(cur) = cur.as_mut() {
+            cur.payload_input = self.payload_input.clone();
+        }
+
         Ok(cur)
     }
 
@@ -599,7 +636,9 @@ impl<T: Clone + ReadAt> DirectoryImpl<T> {
 
             let cursor = self.get_cursor(index).await?;
             if cursor.file_name == path {
-                return Ok(Some(cursor.decode_entry().await?));
+                let mut entry = cursor.decode_entry().await?;
+                entry.payload_input = self.payload_input.clone();
+                return Ok(Some(entry));
             }
 
             dup += 1;
@@ -685,6 +724,7 @@ pub(crate) struct FileEntryImpl<T: Clone + ReadAt> {
     entry: Entry,
     entry_range_info: EntryRangeInfo,
     caches: Arc<Caches>,
+    payload_input: Option<(T, Range<u64>)>,
 }
 
 impl<T: Clone + ReadAt> FileEntryImpl<T> {
@@ -698,6 +738,7 @@ impl<T: Clone + ReadAt> FileEntryImpl<T> {
             self.entry_range_info.entry_range.end,
             self.entry.path.clone(),
             Arc::clone(&self.caches),
+            self.payload_input.clone(),
         )
         .await
     }
@@ -711,15 +752,29 @@ impl<T: Clone + ReadAt> FileEntryImpl<T> {
             EntryKind::File {
                 size,
                 offset: Some(offset),
+                payload_offset: None,
             } => Ok(Some(offset..(offset + size))),
+            // The payload offset takes precedence over the regular offset, if present
+            EntryKind::File {
+                size,
+                offset: Some(_offset),
+                payload_offset: Some(payload_offset),
+            } => {
+                let start_offset = payload_offset + size_of::<format::Header>() as u64;
+                Ok(Some(start_offset..start_offset + size))
+            }
             _ => Ok(None),
         }
     }
 
     pub async fn contents(&self) -> io::Result<FileContentsImpl<T>> {
-        match self.content_range()? {
-            Some(range) => Ok(FileContentsImpl::new(self.input.clone(), range)),
-            None => io_bail!("not a file"),
+        let range = self
+            .content_range()?
+            .ok_or_else(|| io_format_err!("not a file"))?;
+        if let Some((ref payload_input, _)) = self.payload_input {
+            Ok(FileContentsImpl::new(payload_input.clone(), range))
+        } else {
+            Ok(FileContentsImpl::new(self.input.clone(), range))
         }
     }
 
@@ -808,6 +863,7 @@ impl<'a, T: Clone + ReadAt> DirEntryImpl<'a, T> {
             entry,
             entry_range_info: self.entry_range_info.clone(),
             caches: Arc::clone(&self.caches),
+            payload_input: self.dir.payload_input.clone(),
         })
     }
 
diff --git a/src/accessor/sync.rs b/src/accessor/sync.rs
index a777152..cd8dff0 100644
--- a/src/accessor/sync.rs
+++ b/src/accessor/sync.rs
@@ -31,7 +31,7 @@ impl<T: FileExt> Accessor<FileReader<T>> {
     /// Decode a `pxar` archive from a standard file implementing `FileExt`.
     #[inline]
     pub fn from_file_and_size(input: T, size: u64) -> io::Result<Self> {
-        Accessor::new(FileReader::new(input), size)
+        Accessor::new(FileReader::new(input), size, None)
     }
 }
 
@@ -64,7 +64,7 @@ where
 {
     /// Open an `Arc` or `Rc` of `File`.
     pub fn from_file_ref_and_size(input: T, size: u64) -> io::Result<Accessor<FileRefReader<T>>> {
-        Accessor::new(FileRefReader::new(input), size)
+        Accessor::new(FileRefReader::new(input), size, None)
     }
 }
 
@@ -74,9 +74,9 @@ impl<T: ReadAt> Accessor<T> {
     ///
     /// Note that the `input`'s `SeqRead` implementation must always return `Poll::Ready` and is
     /// not allowed to use the `Waker`, as this will cause a `panic!`.
-    pub fn new(input: T, size: u64) -> io::Result<Self> {
+    pub fn new(input: T, size: u64, payload_input: Option<(T, u64)>) -> io::Result<Self> {
         Ok(Self {
-            inner: poll_result_once(accessor::AccessorImpl::new(input, size))?,
+            inner: poll_result_once(accessor::AccessorImpl::new(input, size, payload_input))?,
         })
     }
 
diff --git a/src/decoder/aio.rs b/src/decoder/aio.rs
index 4de8c6f..bb032cf 100644
--- a/src/decoder/aio.rs
+++ b/src/decoder/aio.rs
@@ -20,8 +20,12 @@ pub struct Decoder<T> {
 impl<T: tokio::io::AsyncRead> Decoder<TokioReader<T>> {
     /// Decode a `pxar` archive from a `tokio::io::AsyncRead` input.
     #[inline]
-    pub async fn from_tokio(input: T) -> io::Result<Self> {
-        Decoder::new(TokioReader::new(input)).await
+    pub async fn from_tokio(input: T, payload_input: Option<T>) -> io::Result<Self> {
+        Decoder::new(
+            TokioReader::new(input),
+            payload_input.map(|payload_input| TokioReader::new(payload_input)),
+        )
+        .await
     }
 }
 
@@ -30,15 +34,15 @@ impl Decoder<TokioReader<tokio::fs::File>> {
     /// Decode a `pxar` archive from a `tokio::io::AsyncRead` input.
     #[inline]
     pub async fn open<P: AsRef<Path>>(path: P) -> io::Result<Self> {
-        Decoder::from_tokio(tokio::fs::File::open(path.as_ref()).await?).await
+        Decoder::from_tokio(tokio::fs::File::open(path.as_ref()).await?, None).await
     }
 }
 
 impl<T: SeqRead> Decoder<T> {
     /// Create an async decoder from an input implementing our internal read interface.
-    pub async fn new(input: T) -> io::Result<Self> {
+    pub async fn new(input: T, payload_input: Option<T>) -> io::Result<Self> {
         Ok(Self {
-            inner: decoder::DecoderImpl::new(input).await?,
+            inner: decoder::DecoderImpl::new(input, payload_input).await?,
         })
     }
 
diff --git a/src/decoder/mod.rs b/src/decoder/mod.rs
index d19ffd1..07b6c61 100644
--- a/src/decoder/mod.rs
+++ b/src/decoder/mod.rs
@@ -157,6 +157,10 @@ pub(crate) struct DecoderImpl<T> {
     state: State,
     with_goodbye_tables: bool,
 
+    // Payload of regular files might be provided by a different reader
+    payload_input: Option<T>,
+    payload_consumed: u64,
+
     /// The random access code uses decoders for sub-ranges which may not end in a `PAYLOAD` for
     /// entries like FIFOs or sockets, so there we explicitly allow an item to terminate with EOF.
     eof_after_entry: bool,
@@ -167,6 +171,7 @@ enum State {
     Default,
     InPayload {
         offset: u64,
+        size: u64,
     },
 
     /// file entries with no data (fifo, socket)
@@ -195,8 +200,8 @@ pub(crate) enum ItemResult {
 }
 
 impl<I: SeqRead> DecoderImpl<I> {
-    pub async fn new(input: I) -> io::Result<Self> {
-        Self::new_full(input, "/".into(), false).await
+    pub async fn new(input: I, payload_input: Option<I>) -> io::Result<Self> {
+        Self::new_full(input, "/".into(), false, payload_input).await
     }
 
     pub(crate) fn input(&self) -> &I {
@@ -207,6 +212,7 @@ impl<I: SeqRead> DecoderImpl<I> {
         input: I,
         path: PathBuf,
         eof_after_entry: bool,
+        payload_input: Option<I>,
     ) -> io::Result<Self> {
         let this = DecoderImpl {
             input,
@@ -219,6 +225,8 @@ impl<I: SeqRead> DecoderImpl<I> {
             path_lengths: Vec::new(),
             state: State::Begin,
             with_goodbye_tables: false,
+            payload_input,
+            payload_consumed: 0,
             eof_after_entry,
         };
 
@@ -242,9 +250,14 @@ impl<I: SeqRead> DecoderImpl<I> {
                     // hierarchy and parse the next PXAR_FILENAME or the PXAR_GOODBYE:
                     self.read_next_item().await?;
                 }
-                State::InPayload { offset } => {
-                    // We need to skip the current payload first.
-                    self.skip_entry(offset).await?;
+                State::InPayload { offset, .. } => {
+                    if self.payload_input.is_some() {
+                        // Update consumed payload as given by the offset referenced by the content reader
+                        self.payload_consumed += offset;
+                    } else {
+                        // Skip remaining payload of current entry in regular stream
+                        self.skip_entry(offset).await?;
+                    }
                     self.read_next_item().await?;
                 }
                 State::InGoodbyeTable => {
@@ -300,19 +313,23 @@ impl<I: SeqRead> DecoderImpl<I> {
     }
 
     pub fn content_size(&self) -> Option<u64> {
-        if let State::InPayload { .. } = self.state {
-            Some(self.current_header.content_size())
+        if let State::InPayload { size, .. } = self.state {
+            if self.payload_input.is_some() {
+                Some(size)
+            } else {
+                Some(self.current_header.content_size())
+            }
         } else {
             None
         }
     }
 
     pub fn content_reader(&mut self) -> Option<Contents<I>> {
-        if let State::InPayload { offset } = &mut self.state {
+        if let State::InPayload { offset, size } = &mut self.state {
             Some(Contents::new(
-                &mut self.input,
+                self.payload_input.as_mut().unwrap_or(&mut self.input),
                 offset,
-                self.current_header.content_size(),
+                *size,
             ))
         } else {
             None
@@ -531,8 +548,63 @@ impl<I: SeqRead> DecoderImpl<I> {
                 self.entry.kind = EntryKind::File {
                     size: self.current_header.content_size(),
                     offset,
+                    payload_offset: None,
+                };
+                self.state = State::InPayload {
+                    offset: 0,
+                    size: self.current_header.content_size(),
+                };
+                return Ok(ItemResult::Entry);
+            }
+            format::PXAR_PAYLOAD_REF => {
+                let offset = seq_read_position(&mut self.input).await.transpose()?;
+                let payload_ref = self.read_payload_ref().await?;
+
+                if let Some(payload_input) = self.payload_input.as_mut() {
+                    if seq_read_position(payload_input)
+                        .await
+                        .transpose()?
+                        .is_none()
+                    {
+                        if self.payload_consumed > payload_ref.offset {
+                            io_bail!(
+                                "unexpected offset {}, smaller than already consumed payload {}",
+                                payload_ref.offset,
+                                self.payload_consumed,
+                            );
+                        }
+                        let to_skip = payload_ref.offset - self.payload_consumed;
+                        Self::skip(payload_input, to_skip as usize).await?;
+                        self.payload_consumed += to_skip;
+                    }
+
+                    let header: Header = seq_read_entry(payload_input).await?;
+                    if header.htype != format::PXAR_PAYLOAD {
+                        io_bail!(
+                            "unexpected header in payload input: expected {}, got {header}",
+                            format::PXAR_PAYLOAD,
+                        );
+                    }
+                    self.payload_consumed += size_of::<Header>() as u64;
+
+                    if header.content_size() != payload_ref.size {
+                        io_bail!(
+                            "encountered payload size mismatch: got {}, expected {}",
+                            payload_ref.size,
+                            header.content_size(),
+                        );
+                    }
+                }
+
+                self.entry.kind = EntryKind::File {
+                    size: payload_ref.size,
+                    offset,
+                    payload_offset: Some(payload_ref.offset),
+                };
+                self.state = State::InPayload {
+                    offset: 0,
+                    size: payload_ref.size,
                 };
-                self.state = State::InPayload { offset: 0 };
                 return Ok(ItemResult::Entry);
             }
             format::PXAR_FILENAME | format::PXAR_GOODBYE => {
diff --git a/src/decoder/sync.rs b/src/decoder/sync.rs
index 5597a03..caa8bcd 100644
--- a/src/decoder/sync.rs
+++ b/src/decoder/sync.rs
@@ -25,8 +25,11 @@ pub struct Decoder<T> {
 impl<T: io::Read> Decoder<StandardReader<T>> {
     /// Decode a `pxar` archive from a regular `std::io::Read` input.
     #[inline]
-    pub fn from_std(input: T) -> io::Result<Self> {
-        Decoder::new(StandardReader::new(input))
+    pub fn from_std(input: T, payload_input: Option<T>) -> io::Result<Self> {
+        Decoder::new(
+            StandardReader::new(input),
+            payload_input.map(|payload_input| StandardReader::new(payload_input)),
+        )
     }
 
     /// Get a direct reference to the reader contained inside the contained [`StandardReader`].
@@ -38,7 +41,7 @@ impl<T: io::Read> Decoder<StandardReader<T>> {
 impl Decoder<StandardReader<std::fs::File>> {
     /// Convenience shortcut for `File::open` followed by `Accessor::from_file`.
     pub fn open<P: AsRef<Path>>(path: P) -> io::Result<Self> {
-        Self::from_std(std::fs::File::open(path.as_ref())?)
+        Self::from_std(std::fs::File::open(path.as_ref())?, None)
     }
 }
 
@@ -47,9 +50,11 @@ impl<T: SeqRead> Decoder<T> {
     ///
     /// Note that the `input`'s `SeqRead` implementation must always return `Poll::Ready` and is
     /// not allowed to use the `Waker`, as this will cause a `panic!`.
-    pub fn new(input: T) -> io::Result<Self> {
+    /// The optional payload input must be used to restore regular file payloads for payload references
+    /// encountered within the archive.
+    pub fn new(input: T, payload_input: Option<T>) -> io::Result<Self> {
         Ok(Self {
-            inner: poll_result_once(decoder::DecoderImpl::new(input))?,
+            inner: poll_result_once(decoder::DecoderImpl::new(input, payload_input))?,
         })
     }
 
diff --git a/src/lib.rs b/src/lib.rs
index 210c4b1..ef81a85 100644
--- a/src/lib.rs
+++ b/src/lib.rs
@@ -364,6 +364,9 @@ pub enum EntryKind {
 
         /// The file's byte offset inside the archive, if available.
         offset: Option<u64>,
+
+        /// The file's byte offset inside the payload stream, if available.
+        payload_offset: Option<u64>,
     },
 
     /// Directory entry. When iterating through an archive, the contents follow next.
diff --git a/tests/compat.rs b/tests/compat.rs
index 3b43e38..a1514ba 100644
--- a/tests/compat.rs
+++ b/tests/compat.rs
@@ -94,7 +94,8 @@ fn create_archive() -> io::Result<Vec<u8>> {
 fn test_archive() {
     let archive = create_archive().expect("failed to create test archive");
     let mut input = &archive[..];
-    let mut decoder = decoder::Decoder::from_std(&mut input).expect("failed to create decoder");
+    let mut decoder =
+        decoder::Decoder::from_std(&mut input, None).expect("failed to create decoder");
 
     let item = decoder
         .next()
diff --git a/tests/simple/main.rs b/tests/simple/main.rs
index e55457f..6ee93d1 100644
--- a/tests/simple/main.rs
+++ b/tests/simple/main.rs
@@ -61,13 +61,14 @@ fn test1() {
     // std::fs::write("myarchive.pxar", &file).expect("failed to write out test archive");
 
     let mut input = &file[..];
-    let mut decoder = decoder::Decoder::from_std(&mut input).expect("failed to create decoder");
+    let mut decoder =
+        decoder::Decoder::from_std(&mut input, None).expect("failed to create decoder");
     let decoded_fs =
         fs::Entry::decode_from(&mut decoder).expect("failed to decode previously encoded archive");
 
     assert_eq!(test_fs, decoded_fs);
 
-    let accessor = accessor::Accessor::new(&file[..], file.len() as u64)
+    let accessor = accessor::Accessor::new(&file[..], file.len() as u64, None)
         .expect("failed to create random access reader for encoded archive");
 
     check_bunzip2(&accessor);
-- 
2.39.2




* [pbs-devel] [PATCH v5 pxar 07/62] decoder: set payload input range when decoding via accessor
  2024-05-07 15:51 [pbs-devel] [PATCH v5 pxar proxmox-backup 00/62] fix #3174: improve file-level backup Christian Ebner
                   ` (5 preceding siblings ...)
  2024-05-07 15:51 ` [pbs-devel] [PATCH v5 pxar 06/62] decoder/accessor: add optional payload input stream Christian Ebner
@ 2024-05-07 15:51 ` Christian Ebner
  2024-05-07 15:51 ` [pbs-devel] [PATCH v5 pxar 08/62] encoder: add payload reference capability Christian Ebner
                   ` (55 subsequent siblings)
  62 siblings, 0 replies; 67+ messages in thread
From: Christian Ebner @ 2024-05-07 15:51 UTC (permalink / raw)
  To: pbs-devel

When accessing file contents via sequential file restore, the range of
the payload contents cannot be inferred a priori but needs to be
calculated from the payload references encountered during decoding.

Extend the `SeqRead` trait with an `update_range` method and implement
it for `SeqReadAtAdapter`, so that the decoder can set the payload
reader to the correct content range for each payload reference.

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
changes since version 4:
- fixed typo in commit message
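
For illustration only, a minimal sketch of the pattern used here: a trait
method with a default no-op body that only the range-based reader
overrides. `RangeAware`, `StreamReader`, `RangedReader` and
`window_for_ref` are hypothetical names and not part of the pxar API; the
header size of 16 bytes is an assumption.

```rust
use std::ops::Range;

/// Assumed size of a pxar item header (two u64 fields). Hypothetical constant.
const HEADER_SIZE: u64 = 16;

/// Hypothetical stand-in for the relevant part of `SeqRead`.
trait RangeAware {
    /// Default: readers without a notion of a range ignore the update.
    fn update_range(&mut self, _range: Range<u64>) {}

    /// Helper for this sketch only: expose the current range, if any.
    fn current_range(&self) -> Option<Range<u64>> {
        None
    }
}

/// A plain sequential reader, which keeps the default no-op.
struct StreamReader;

impl RangeAware for StreamReader {}

/// Hypothetical stand-in for `SeqReadAtAdapter`: reads within a window.
struct RangedReader {
    range: Range<u64>,
}

impl RangeAware for RangedReader {
    fn update_range(&mut self, range: Range<u64>) {
        self.range = range;
    }

    fn current_range(&self) -> Option<Range<u64>> {
        Some(self.range.clone())
    }
}

/// Window covering the payload header plus the contents of one reference.
fn window_for_ref(offset: u64, size: u64) -> Range<u64> {
    offset..offset + size + HEADER_SIZE
}
```

The window starts at the payload header rather than at the contents,
since the decoder still reads and validates that header before handing
out the file contents.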

 src/accessor/mod.rs |  4 ++++
 src/decoder/mod.rs  | 10 ++++++++++
 2 files changed, 14 insertions(+)

diff --git a/src/accessor/mod.rs b/src/accessor/mod.rs
index 46afbe3..fadd3d2 100644
--- a/src/accessor/mod.rs
+++ b/src/accessor/mod.rs
@@ -1038,4 +1038,8 @@ impl<T: ReadAt> decoder::SeqRead for SeqReadAtAdapter<T> {
     fn poll_position(self: Pin<&mut Self>, _cx: &mut Context) -> Poll<Option<io::Result<u64>>> {
         Poll::Ready(Some(Ok(self.range.start)))
     }
+
+    fn update_range(&mut self, range: Range<u64>) {
+        self.range = range;
+    }
 }
diff --git a/src/decoder/mod.rs b/src/decoder/mod.rs
index 07b6c61..7c5cc12 100644
--- a/src/decoder/mod.rs
+++ b/src/decoder/mod.rs
@@ -8,6 +8,7 @@ use std::ffi::OsString;
 use std::future::poll_fn;
 use std::io;
 use std::mem::{self, size_of, size_of_val, MaybeUninit};
+use std::ops::Range;
 use std::os::unix::ffi::{OsStrExt, OsStringExt};
 use std::path::{Path, PathBuf};
 use std::pin::Pin;
@@ -55,6 +56,11 @@ pub trait SeqRead {
     fn poll_position(self: Pin<&mut Self>, _cx: &mut Context) -> Poll<Option<io::Result<u64>>> {
         Poll::Ready(None)
     }
+
+    /// Update the range to read from. Only `SeqReadAtAdapter` overrides this.
+    fn update_range(&mut self, _range: Range<u64>) {
+        // nothing to be done, only implemented by `SeqReadAtAdapter`s
+    }
 }
 
 /// Allow using trait objects for generics taking a `SeqRead`:
@@ -576,6 +582,10 @@ impl<I: SeqRead> DecoderImpl<I> {
                         let to_skip = payload_ref.offset - self.payload_consumed;
                         Self::skip(payload_input, to_skip as usize).await?;
                         self.payload_consumed += to_skip;
+                    } else {
+                        let start = payload_ref.offset;
+                        let end = start + payload_ref.size + size_of::<Header>() as u64;
+                        payload_input.update_range(start..end);
                     }
 
                     let header: Header = seq_read_entry(payload_input).await?;
-- 
2.39.2




* [pbs-devel] [PATCH v5 pxar 08/62] encoder: add payload reference capability
  2024-05-07 15:51 [pbs-devel] [PATCH v5 pxar proxmox-backup 00/62] fix #3174: improve file-level backup Christian Ebner
                   ` (6 preceding siblings ...)
  2024-05-07 15:51 ` [pbs-devel] [PATCH v5 pxar 07/62] decoder: set payload input range when decoding via accessor Christian Ebner
@ 2024-05-07 15:51 ` Christian Ebner
  2024-05-07 15:51 ` [pbs-devel] [PATCH v5 pxar 09/62] encoder: add payload position capability Christian Ebner
                   ` (54 subsequent siblings)
  62 siblings, 0 replies; 67+ messages in thread
From: Christian Ebner @ 2024-05-07 15:51 UTC (permalink / raw)
  To: pbs-devel

Allow encoding regular files as a payload reference pointing into a
separate payload archive, rather than encoding the payload within the
regular archive.

The payload offset and size are encoded following the header marked
PXAR_PAYLOAD_REF.

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
changes since version 4:
- no changes
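
For illustration only, the resulting entry layout can be sketched as
below. This is not the code from this patch: `HEADER_SIZE`, the local
`PayloadRef` struct and both helper functions are hypothetical
stand-ins, and the sketch assumes a 16 byte header (`htype` followed by
`full_size`) with all fields encoded as little-endian u64.

```rust
use std::convert::TryInto;

/// Header type constant, value as defined in src/format/mod.rs.
const PXAR_PAYLOAD_REF: u64 = 0x419d3d6bc4ba977e;

/// Assumed header size: `htype: u64` followed by `full_size: u64`.
const HEADER_SIZE: u64 = 16;

/// Hypothetical stand-in for `format::PayloadRef`.
#[derive(Debug, PartialEq)]
struct PayloadRef {
    offset: u64,
    size: u64,
}

/// Serialize a payload reference entry: header, then offset, then size.
fn encode_payload_ref(payload_ref: &PayloadRef) -> Vec<u8> {
    let full_size = HEADER_SIZE + 16; // header plus two u64 fields
    let mut out = Vec::with_capacity(full_size as usize);
    out.extend_from_slice(&PXAR_PAYLOAD_REF.to_le_bytes());
    out.extend_from_slice(&full_size.to_le_bytes());
    out.extend_from_slice(&payload_ref.offset.to_le_bytes());
    out.extend_from_slice(&payload_ref.size.to_le_bytes());
    out
}

/// Parse the entry back, returning `None` on a wrong type or length.
fn decode_payload_ref(data: &[u8]) -> Option<PayloadRef> {
    if data.len() != 32 {
        return None;
    }
    // Read the i-th little-endian u64 of the buffer.
    let field = |i: usize| u64::from_le_bytes(data[i * 8..(i + 1) * 8].try_into().unwrap());
    if field(0) != PXAR_PAYLOAD_REF || field(1) != 32 {
        return None;
    }
    Some(PayloadRef {
        offset: field(2),
        size: field(3),
    })
}
```

Under these assumptions a payload reference always occupies 32 bytes in
the metadata archive, independent of the size of the referenced file.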

 src/encoder/aio.rs  | 18 ++++++++++++++-
 src/encoder/mod.rs  | 54 +++++++++++++++++++++++++++++++++++++++++++++
 src/encoder/sync.rs | 21 +++++++++++++++++-
 3 files changed, 91 insertions(+), 2 deletions(-)

diff --git a/src/encoder/aio.rs b/src/encoder/aio.rs
index 635e550..b0e460b 100644
--- a/src/encoder/aio.rs
+++ b/src/encoder/aio.rs
@@ -5,7 +5,7 @@ use std::path::Path;
 use std::pin::Pin;
 use std::task::{Context, Poll};
 
-use crate::encoder::{self, LinkOffset, SeqWrite};
+use crate::encoder::{self, LinkOffset, PayloadOffset, SeqWrite};
 use crate::format;
 use crate::Metadata;
 
@@ -103,6 +103,22 @@ impl<'a, T: SeqWrite + 'a> Encoder<'a, T> {
     //     ).await
     // }
 
+    /// Encode a payload reference pointing to given offset in the separate payload output
+    ///
+    /// Returns with error if the encoder instance has no separate payload output or encoding
+    /// failed.
+    pub async fn add_payload_ref(
+        &mut self,
+        metadata: &Metadata,
+        file_name: &Path,
+        file_size: u64,
+        payload_offset: PayloadOffset,
+    ) -> io::Result<LinkOffset> {
+        self.inner
+            .add_payload_ref(metadata, file_name.as_ref(), file_size, payload_offset)
+            .await
+    }
+
     /// Create a new subdirectory. Note that the subdirectory has to be finished by calling the
     /// `finish()` method, otherwise the entire archive will be in an error state.
     pub async fn create_directory<P: AsRef<Path>>(
diff --git a/src/encoder/mod.rs b/src/encoder/mod.rs
index 369a937..4152c91 100644
--- a/src/encoder/mod.rs
+++ b/src/encoder/mod.rs
@@ -38,6 +38,24 @@ impl LinkOffset {
     }
 }
 
+/// File reference used to create payload references.
+#[derive(Clone, Copy, Debug, Default, Eq, PartialEq, Ord, PartialOrd)]
+pub struct PayloadOffset(u64);
+
+impl PayloadOffset {
+    /// Get the raw byte offset of this link.
+    #[inline]
+    pub fn raw(self) -> u64 {
+        self.0
+    }
+
+    /// Return a new PayloadOffset, positively shifted by offset
+    #[inline]
+    pub fn add(&self, offset: u64) -> Self {
+        Self(self.0 + offset)
+    }
+}
+
 /// Sequential write interface used by the encoder's state machine.
 ///
 /// This is our internal writer trait which is available for `std::io::Write` types in the
@@ -520,6 +538,42 @@ impl<'a, T: SeqWrite + 'a> EncoderImpl<'a, T> {
         Ok(offset)
     }
 
+    /// Encode a payload reference pointing to given offset in the separate payload output
+    ///
+    /// Returns a file offset usable with `add_hardlink` or with error if the encoder instance has
+    /// no separate payload output or encoding failed.
+    pub async fn add_payload_ref(
+        &mut self,
+        metadata: &Metadata,
+        file_name: &Path,
+        file_size: u64,
+        payload_offset: PayloadOffset,
+    ) -> io::Result<LinkOffset> {
+        if self.payload_output.as_mut().is_none() {
+            io_bail!("unable to add payload reference");
+        }
+
+        let offset = payload_offset.raw();
+        let payload_position = self.state()?.payload_position();
+        if offset < payload_position {
+            io_bail!("offset smaller than current position: {offset} < {payload_position}");
+        }
+
+        let payload_ref = PayloadRef {
+            offset,
+            size: file_size,
+        };
+        let this_offset: LinkOffset = self
+            .add_file_entry(
+                Some(metadata),
+                file_name,
+                Some((format::PXAR_PAYLOAD_REF, &payload_ref.data())),
+            )
+            .await?;
+
+        Ok(this_offset)
+    }
+
     /// Return a file offset usable with `add_hardlink`.
     pub async fn add_symlink(
         &mut self,
diff --git a/src/encoder/sync.rs b/src/encoder/sync.rs
index d0d62ba..4bfbc8b 100644
--- a/src/encoder/sync.rs
+++ b/src/encoder/sync.rs
@@ -6,7 +6,7 @@ use std::pin::Pin;
 use std::task::{Context, Poll};
 
 use crate::decoder::sync::StandardReader;
-use crate::encoder::{self, LinkOffset, SeqWrite};
+use crate::encoder::{self, LinkOffset, PayloadOffset, SeqWrite};
 use crate::format;
 use crate::util::poll_result_once;
 use crate::Metadata;
@@ -100,6 +100,25 @@ impl<'a, T: SeqWrite + 'a> Encoder<'a, T> {
         ))
     }
 
+    /// Encode a payload reference pointing to given offset in the separate payload output
+    ///
+    /// Returns with error if the encoder instance has no separate payload output or encoding
+    /// failed.
+    pub async fn add_payload_ref(
+        &mut self,
+        metadata: &Metadata,
+        file_name: &Path,
+        file_size: u64,
+        payload_offset: PayloadOffset,
+    ) -> io::Result<LinkOffset> {
+        poll_result_once(self.inner.add_payload_ref(
+            metadata,
+            file_name.as_ref(),
+            file_size,
+            payload_offset,
+        ))
+    }
+
     /// Create a new subdirectory. Note that the subdirectory has to be finished by calling the
     /// `finish()` method, otherwise the entire archive will be in an error state.
     pub fn create_directory<P: AsRef<Path>>(
-- 
2.39.2




* [pbs-devel] [PATCH v5 pxar 09/62] encoder: add payload position capability
  2024-05-07 15:51 [pbs-devel] [PATCH v5 pxar proxmox-backup 00/62] fix #3174: improve file-level backup Christian Ebner
                   ` (7 preceding siblings ...)
  2024-05-07 15:51 ` [pbs-devel] [PATCH v5 pxar 08/62] encoder: add payload reference capability Christian Ebner
@ 2024-05-07 15:51 ` Christian Ebner
  2024-05-07 15:51 ` [pbs-devel] [PATCH v5 pxar 10/62] encoder: add payload advance capability Christian Ebner
                   ` (53 subsequent siblings)
  62 siblings, 0 replies; 67+ messages in thread
From: Christian Ebner @ 2024-05-07 15:51 UTC (permalink / raw)
  To: pbs-devel

Allow reading the current offset of the dedicated payload stream. This
is required to calculate forced boundaries in the proxmox-backup-client
when injecting reused payload chunks into the payload stream.

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
changes since version 4:
- no changes

 src/encoder/aio.rs  | 5 +++++
 src/encoder/mod.rs  | 4 ++++
 src/encoder/sync.rs | 5 +++++
 3 files changed, 14 insertions(+)

diff --git a/src/encoder/aio.rs b/src/encoder/aio.rs
index b0e460b..f817747 100644
--- a/src/encoder/aio.rs
+++ b/src/encoder/aio.rs
@@ -83,6 +83,11 @@ impl<'a, T: SeqWrite + 'a> Encoder<'a, T> {
         })
     }
 
+    /// Get current position for payload stream
+    pub fn payload_position(&self) -> io::Result<PayloadOffset> {
+        self.inner.payload_position()
+    }
+
     // /// Convenience shortcut to add a *regular* file by path including its contents to the archive.
     // pub async fn add_file<P, F>(
     //     &mut self,
diff --git a/src/encoder/mod.rs b/src/encoder/mod.rs
index 4152c91..aeaab1d 100644
--- a/src/encoder/mod.rs
+++ b/src/encoder/mod.rs
@@ -538,6 +538,10 @@ impl<'a, T: SeqWrite + 'a> EncoderImpl<'a, T> {
         Ok(offset)
     }
 
+    pub fn payload_position(&self) -> io::Result<PayloadOffset> {
+        Ok(PayloadOffset(self.state()?.payload_position()))
+    }
+
     /// Encode a payload reference pointing to given offset in the separate payload output
     ///
     /// Returns a file offset usable with `add_hardlink` or with error if the encoder instance has
diff --git a/src/encoder/sync.rs b/src/encoder/sync.rs
index 4bfbc8b..e47f008 100644
--- a/src/encoder/sync.rs
+++ b/src/encoder/sync.rs
@@ -100,6 +100,11 @@ impl<'a, T: SeqWrite + 'a> Encoder<'a, T> {
         ))
     }
 
+    /// Get current payload position for payload stream
+    pub fn payload_position(&self) -> io::Result<PayloadOffset> {
+        self.inner.payload_position()
+    }
+
     /// Encode a payload reference pointing to given offset in the separate payload output
     ///
     /// Returns with error if the encoder instance has no separate payload output or encoding
-- 
2.39.2




* [pbs-devel] [PATCH v5 pxar 10/62] encoder: add payload advance capability
  2024-05-07 15:51 [pbs-devel] [PATCH v5 pxar proxmox-backup 00/62] fix #3174: improve file-level backup Christian Ebner
                   ` (8 preceding siblings ...)
  2024-05-07 15:51 ` [pbs-devel] [PATCH v5 pxar 09/62] encoder: add payload position capability Christian Ebner
@ 2024-05-07 15:51 ` Christian Ebner
  2024-05-07 15:51 ` [pbs-devel] [PATCH v5 pxar 11/62] encoder/format: finish payload stream with marker Christian Ebner
                   ` (52 subsequent siblings)
  62 siblings, 0 replies; 67+ messages in thread
From: Christian Ebner @ 2024-05-07 15:51 UTC (permalink / raw)
  To: pbs-devel

Allow advancing the payload writer position by a given size.
This is used to update the encoder's payload position when injecting
reused chunks for files with unchanged metadata.

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
changes since version 4:
- no changes
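
As a rough sketch of how position, advance and the offset check of
payload references interact, consider the toy model below. The
`PayloadTracker` type and its `check_ref` method are hypothetical and
exist only for this example, and the local `PayloadOffset` merely
mirrors the type added earlier in the series; the real encoder keeps
this state in its `EncoderState`.

```rust
use std::io;

/// Hypothetical stand-in for `encoder::PayloadOffset`.
#[derive(Clone, Copy, Debug, Default, PartialEq, PartialOrd)]
struct PayloadOffset(u64);

impl PayloadOffset {
    fn raw(self) -> u64 {
        self.0
    }

    fn add(&self, offset: u64) -> Self {
        Self(self.0 + offset)
    }
}

/// Toy encoder state that only tracks the payload write position.
#[derive(Default)]
struct PayloadTracker {
    payload_write_position: u64,
}

impl PayloadTracker {
    /// Current position in the payload stream.
    fn payload_position(&self) -> PayloadOffset {
        PayloadOffset(self.payload_write_position)
    }

    /// Account for bytes injected into the stream by someone else.
    fn advance(&mut self, size: PayloadOffset) {
        self.payload_write_position += size.raw();
    }

    /// A reference must not point before the current position.
    fn check_ref(&self, offset: PayloadOffset) -> io::Result<()> {
        let position = self.payload_write_position;
        if offset.raw() < position {
            return Err(io::Error::new(
                io::ErrorKind::Other,
                format!("offset smaller than current position: {} < {}", offset.raw(), position),
            ));
        }
        Ok(())
    }
}
```

In this model a caller would first query the position, add references
at or beyond it, and only then advance by the size of the injected
chunks.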

 src/encoder/aio.rs  | 5 +++++
 src/encoder/mod.rs  | 6 ++++++
 src/encoder/sync.rs | 5 +++++
 3 files changed, 16 insertions(+)

diff --git a/src/encoder/aio.rs b/src/encoder/aio.rs
index f817747..e385457 100644
--- a/src/encoder/aio.rs
+++ b/src/encoder/aio.rs
@@ -124,6 +124,11 @@ impl<'a, T: SeqWrite + 'a> Encoder<'a, T> {
             .await
     }
 
+    /// Add size to payload stream
+    pub fn advance(&mut self, size: PayloadOffset) -> io::Result<()> {
+        self.inner.advance(size)
+    }
+
     /// Create a new subdirectory. Note that the subdirectory has to be finished by calling the
     /// `finish()` method, otherwise the entire archive will be in an error state.
     pub async fn create_directory<P: AsRef<Path>>(
diff --git a/src/encoder/mod.rs b/src/encoder/mod.rs
index aeaab1d..54258b7 100644
--- a/src/encoder/mod.rs
+++ b/src/encoder/mod.rs
@@ -578,6 +578,12 @@ impl<'a, T: SeqWrite + 'a> EncoderImpl<'a, T> {
         Ok(this_offset)
     }
 
+    /// Add size to payload stream
+    pub fn advance(&mut self, size: PayloadOffset) -> io::Result<()> {
+        self.state_mut()?.payload_write_position += size.raw();
+        Ok(())
+    }
+
     /// Return a file offset usable with `add_hardlink`.
     pub async fn add_symlink(
         &mut self,
diff --git a/src/encoder/sync.rs b/src/encoder/sync.rs
index e47f008..bc6430a 100644
--- a/src/encoder/sync.rs
+++ b/src/encoder/sync.rs
@@ -124,6 +124,11 @@ impl<'a, T: SeqWrite + 'a> Encoder<'a, T> {
         ))
     }
 
+    /// Add size to payload stream
+    pub fn advance(&mut self, size: PayloadOffset) -> io::Result<()> {
+        self.inner.advance(size)
+    }
+
     /// Create a new subdirectory. Note that the subdirectory has to be finished by calling the
     /// `finish()` method, otherwise the entire archive will be in an error state.
     pub fn create_directory<P: AsRef<Path>>(
-- 
2.39.2




* [pbs-devel] [PATCH v5 pxar 11/62] encoder/format: finish payload stream with marker
  2024-05-07 15:51 [pbs-devel] [PATCH v5 pxar proxmox-backup 00/62] fix #3174: improve file-level backup Christian Ebner
                   ` (9 preceding siblings ...)
  2024-05-07 15:51 ` [pbs-devel] [PATCH v5 pxar 10/62] encoder: add payload advance capability Christian Ebner
@ 2024-05-07 15:51 ` Christian Ebner
  2024-05-07 15:51 ` [pbs-devel] [PATCH v5 pxar 12/62] format: add payload stream start marker Christian Ebner
                   ` (51 subsequent siblings)
  62 siblings, 0 replies; 67+ messages in thread
From: Christian Ebner @ 2024-05-07 15:51 UTC (permalink / raw)
  To: pbs-devel

Mark the end of the optional payload stream. This makes sure that at
least some bytes are written to the stream (empty archives are not
allowed by the Proxmox Backup Server) and that any injected chunks are
consumed.

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
changes since version 4:
- no changes

 examples/mk-format-hashes.rs | 5 +++++
 src/encoder/mod.rs           | 8 ++++++++
 src/format/mod.rs            | 4 ++++
 3 files changed, 17 insertions(+)

diff --git a/examples/mk-format-hashes.rs b/examples/mk-format-hashes.rs
index 83adb38..de73df0 100644
--- a/examples/mk-format-hashes.rs
+++ b/examples/mk-format-hashes.rs
@@ -56,6 +56,11 @@ const CONSTANTS: &[(&str, &str, &str)] = &[
         "PXAR_GOODBYE_TAIL_MARKER",
         "__PROXMOX_FORMAT_PXAR_GOODBYE_TAIL_MARKER__",
     ),
+    (
+        "The end marker used in the separate payload stream",
+        "PXAR_PAYLOAD_TAIL_MARKER",
+        "__PROXMOX_FORMAT_PXAR_PAYLOAD_TAIL_MARKER__",
+    ),
 ];
 
 fn main() {
diff --git a/src/encoder/mod.rs b/src/encoder/mod.rs
index 54258b7..cf16449 100644
--- a/src/encoder/mod.rs
+++ b/src/encoder/mod.rs
@@ -908,6 +908,14 @@ impl<'a, T: SeqWrite + 'a> EncoderImpl<'a, T> {
         }
 
         if let EncoderOutput::Owned(Some(output)) = &mut self.payload_output {
+            let mut dummy_writer = 0;
+            seq_write_pxar_entry(
+                output,
+                format::PXAR_PAYLOAD_TAIL_MARKER,
+                &[],
+                &mut dummy_writer,
+            )
+            .await?;
             flush(output).await?;
         }
 
diff --git a/src/format/mod.rs b/src/format/mod.rs
index 5d7a652..e451b0f 100644
--- a/src/format/mod.rs
+++ b/src/format/mod.rs
@@ -106,6 +106,8 @@ pub const PXAR_PAYLOAD_REF: u64 = 0x419d3d6bc4ba977e;
 pub const PXAR_GOODBYE: u64 = 0x2fec4fa642d5731d;
 /// The end marker used in the GOODBYE object
 pub const PXAR_GOODBYE_TAIL_MARKER: u64 = 0xef5eed5b753e1555;
+/// The end marker used in the separate payload stream
+pub const PXAR_PAYLOAD_TAIL_MARKER: u64 = 0x6c72b78b984c81b5;
 
 #[derive(Debug, Endian)]
 #[repr(C)]
@@ -156,6 +158,7 @@ impl Header {
             PXAR_ENTRY => size_of::<Stat>() as u64,
             PXAR_PAYLOAD | PXAR_GOODBYE => u64::MAX - (size_of::<Self>() as u64),
             PXAR_PAYLOAD_REF => size_of::<PayloadRef>() as u64,
+            PXAR_PAYLOAD_TAIL_MARKER => size_of::<Header>() as u64,
             _ => u64::MAX - (size_of::<Self>() as u64),
         }
     }
@@ -197,6 +200,7 @@ impl Display for Header {
             PXAR_ENTRY => "ENTRY",
             PXAR_PAYLOAD => "PAYLOAD",
             PXAR_PAYLOAD_REF => "PAYLOAD_REF",
+            PXAR_PAYLOAD_TAIL_MARKER => "PXAR_PAYLOAD_TAIL_MARKER",
             PXAR_GOODBYE => "GOODBYE",
             _ => "UNKNOWN",
         };
-- 
2.39.2




* [pbs-devel] [PATCH v5 pxar 12/62] format: add payload stream start marker
  2024-05-07 15:51 [pbs-devel] [PATCH v5 pxar proxmox-backup 00/62] fix #3174: improve file-level backup Christian Ebner
                   ` (10 preceding siblings ...)
  2024-05-07 15:51 ` [pbs-devel] [PATCH v5 pxar 11/62] encoder/format: finish payload stream with marker Christian Ebner
@ 2024-05-07 15:51 ` Christian Ebner
  2024-05-07 15:51 ` [pbs-devel] [PATCH v5 pxar 13/62] format/encoder/decoder: new pxar entry type `Version` Christian Ebner
                   ` (50 subsequent siblings)
  62 siblings, 0 replies; 67+ messages in thread
From: Christian Ebner @ 2024-05-07 15:51 UTC (permalink / raw)
  To: pbs-devel

Mark the beginning of the payload stream with a magic number. Allows for
version and file type detection.

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
changes since version 4:
- no changes
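
The framing of the payload stream with start and tail marker can be
sketched as follows. This is a simplified model, not the encoder or
decoder code: the helpers `marker`, `frame_payload_stream` and
`check_start_marker` are made up for this example, and a header is
assumed to be 16 bytes (`htype` and `full_size` as little-endian u64).

```rust
use std::convert::TryInto;

/// Marker values as defined in src/format/mod.rs.
const PXAR_PAYLOAD_START_MARKER: u64 = 0x834c68c2194a4ed2;
const PXAR_PAYLOAD_TAIL_MARKER: u64 = 0x6c72b78b984c81b5;

/// Assumed header size: `htype: u64` followed by `full_size: u64`.
const HEADER_SIZE: u64 = 16;

/// Build a header-only entry, i.e. a header with empty content.
fn marker(htype: u64) -> [u8; 16] {
    let mut buf = [0u8; 16];
    buf[..8].copy_from_slice(&htype.to_le_bytes());
    buf[8..].copy_from_slice(&HEADER_SIZE.to_le_bytes());
    buf
}

/// Wrap already encoded payload entries in start and tail marker.
fn frame_payload_stream(body: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(body.len() + 32);
    out.extend_from_slice(&marker(PXAR_PAYLOAD_START_MARKER));
    out.extend_from_slice(body);
    out.extend_from_slice(&marker(PXAR_PAYLOAD_TAIL_MARKER));
    out
}

/// Verify the start marker and return the number of bytes consumed.
fn check_start_marker(stream: &[u8]) -> Result<u64, String> {
    if stream.len() < 16 {
        return Err("unexpected EOF in payload input".to_string());
    }
    let htype = u64::from_le_bytes(stream[..8].try_into().unwrap());
    if htype != PXAR_PAYLOAD_START_MARKER {
        return Err(format!("unexpected header in payload input: {htype:#x}"));
    }
    // The full size of the header doubles as the initial consumed count.
    Ok(u64::from_le_bytes(stream[8..16].try_into().unwrap()))
}
```

With this layout even a payload stream without any file payload is 32
bytes long, so it is never empty.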

 examples/mk-format-hashes.rs |  5 +++++
 src/accessor/mod.rs          |  1 +
 src/decoder/mod.rs           | 28 +++++++++++++++++++---------
 src/encoder/mod.rs           | 18 +++++++++++-------
 src/format/mod.rs            |  2 ++
 5 files changed, 38 insertions(+), 16 deletions(-)

diff --git a/examples/mk-format-hashes.rs b/examples/mk-format-hashes.rs
index de73df0..35cff99 100644
--- a/examples/mk-format-hashes.rs
+++ b/examples/mk-format-hashes.rs
@@ -56,6 +56,11 @@ const CONSTANTS: &[(&str, &str, &str)] = &[
         "PXAR_GOODBYE_TAIL_MARKER",
         "__PROXMOX_FORMAT_PXAR_GOODBYE_TAIL_MARKER__",
     ),
+    (
+        "The start marker used in the separate payload stream",
+        "PXAR_PAYLOAD_START_MARKER",
+        "__PROXMOX_FORMAT_PXAR_PAYLOAD_START_MARKER__",
+    ),
     (
         "The end marker used in the separate payload stream",
         "PXAR_PAYLOAD_TAIL_MARKER",
diff --git a/src/accessor/mod.rs b/src/accessor/mod.rs
index fadd3d2..9048c94 100644
--- a/src/accessor/mod.rs
+++ b/src/accessor/mod.rs
@@ -239,6 +239,7 @@ async fn get_decoder<T: ReadAt>(
         path,
         true,
         payload_input.map(|(input, range)| SeqReadAtAdapter::new(input, range)),
+        0,
     )
     .await
 }
diff --git a/src/decoder/mod.rs b/src/decoder/mod.rs
index 7c5cc12..3bca835 100644
--- a/src/decoder/mod.rs
+++ b/src/decoder/mod.rs
@@ -206,8 +206,21 @@ pub(crate) enum ItemResult {
 }
 
 impl<I: SeqRead> DecoderImpl<I> {
-    pub async fn new(input: I, payload_input: Option<I>) -> io::Result<Self> {
-        Self::new_full(input, "/".into(), false, payload_input).await
+    pub async fn new(input: I, mut payload_input: Option<I>) -> io::Result<Self> {
+        let payload_consumed = if let Some(payload_input) = payload_input.as_mut() {
+            let header: Header = seq_read_entry(payload_input).await?;
+            if header.htype != format::PXAR_PAYLOAD_START_MARKER {
+                io_bail!(
+                    "unexpected header in payload input: expected {:#x?}, got {header:#x?}",
+                    format::PXAR_PAYLOAD_START_MARKER,
+                );
+            }
+            header.full_size()
+        } else {
+            0
+        };
+
+        Self::new_full(input, "/".into(), false, payload_input, payload_consumed).await
     }
 
     pub(crate) fn input(&self) -> &I {
@@ -219,8 +232,9 @@ impl<I: SeqRead> DecoderImpl<I> {
         path: PathBuf,
         eof_after_entry: bool,
         payload_input: Option<I>,
+        payload_consumed: u64,
     ) -> io::Result<Self> {
-        let this = DecoderImpl {
+        Ok(DecoderImpl {
             input,
             current_header: unsafe { mem::zeroed() },
             entry: Entry {
@@ -232,13 +246,9 @@ impl<I: SeqRead> DecoderImpl<I> {
             state: State::Begin,
             with_goodbye_tables: false,
             payload_input,
-            payload_consumed: 0,
+            payload_consumed,
             eof_after_entry,
-        };
-
-        // this.read_next_entry().await?;
-
-        Ok(this)
+        })
     }
 
     /// Get the next file entry, recursing into directories.
diff --git a/src/encoder/mod.rs b/src/encoder/mod.rs
index cf16449..ea82eef 100644
--- a/src/encoder/mod.rs
+++ b/src/encoder/mod.rs
@@ -345,15 +345,23 @@ impl<'a, T: SeqWrite + 'a> EncoderImpl<'a, T> {
     pub async fn new(
         output: EncoderOutput<'a, T>,
         metadata: &Metadata,
-        payload_output: Option<T>,
+        mut payload_output: Option<T>,
     ) -> io::Result<EncoderImpl<'a, T>> {
         if !metadata.is_dir() {
             io_bail!("directory metadata must contain the directory mode flag");
         }
+
+        let mut state = EncoderState::default();
+        if let Some(payload_output) = payload_output.as_mut() {
+            let header = format::Header::with_content_size(format::PXAR_PAYLOAD_START_MARKER, 0);
+            header.check_header_size()?;
+            seq_write_struct(payload_output, header, &mut state.payload_write_position).await?;
+        }
+
         let mut this = Self {
             output,
-            payload_output: EncoderOutput::Owned(None),
-            state: vec![EncoderState::default()],
+            payload_output: EncoderOutput::Owned(payload_output),
+            state: vec![state],
             finished: false,
             file_copy_buffer: Arc::new(Mutex::new(unsafe {
                 crate::util::vec_new_uninitialized(1024 * 1024)
@@ -364,10 +372,6 @@ impl<'a, T: SeqWrite + 'a> EncoderImpl<'a, T> {
         let state = this.state_mut()?;
         state.files_offset = state.position();
 
-        if let Some(payload_output) = payload_output {
-            this.payload_output = EncoderOutput::Owned(Some(payload_output));
-        }
-
         Ok(this)
     }
 
diff --git a/src/format/mod.rs b/src/format/mod.rs
index e451b0f..6519bfc 100644
--- a/src/format/mod.rs
+++ b/src/format/mod.rs
@@ -106,6 +106,8 @@ pub const PXAR_PAYLOAD_REF: u64 = 0x419d3d6bc4ba977e;
 pub const PXAR_GOODBYE: u64 = 0x2fec4fa642d5731d;
 /// The end marker used in the GOODBYE object
 pub const PXAR_GOODBYE_TAIL_MARKER: u64 = 0xef5eed5b753e1555;
+/// The start marker used in the separate payload stream
+pub const PXAR_PAYLOAD_START_MARKER: u64 = 0x834c68c2194a4ed2;
 /// The end marker used in the separate payload stream
 pub const PXAR_PAYLOAD_TAIL_MARKER: u64 = 0x6c72b78b984c81b5;
 
-- 
2.39.2




* [pbs-devel] [PATCH v5 pxar 13/62] format/encoder/decoder: new pxar entry type `Version`
  2024-05-07 15:51 [pbs-devel] [PATCH v5 pxar proxmox-backup 00/62] fix #3174: improve file-level backup Christian Ebner
                   ` (11 preceding siblings ...)
  2024-05-07 15:51 ` [pbs-devel] [PATCH v5 pxar 12/62] format: add payload stream start marker Christian Ebner
@ 2024-05-07 15:51 ` Christian Ebner
  2024-05-07 15:51 ` [pbs-devel] [PATCH v5 pxar 14/62] format/encoder/decoder: new pxar entry type `Prelude` Christian Ebner
                   ` (49 subsequent siblings)
  62 siblings, 0 replies; 67+ messages in thread
From: Christian Ebner @ 2024-05-07 15:51 UTC (permalink / raw)
  To: pbs-devel

Introduce a new pxar format entry type `Version` and the associated
encoder and decoder methods. The format version entry is only allowed
once, as the first entry of the pxar archive. It consists of a
`PXAR_FORMAT_VERSION` header followed by the encoded version number.
If the entry is not present, format version 1 is assumed as the
encoding format of the archive.

The entry allows early detection of an incompatible archive, so that a
reader can bail out or switch mode based on the encountered version.

The format version entry is not backwards compatible with pxar format
version 1.

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
changes since version 4:
- fix typos in commit message
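
The version detection with fallback described above could look roughly
like the sketch below. It is a standalone model rather than the decoder
from this patch: `encode_version_entry` and `detect_version` are
hypothetical helpers, the local `FormatVersion` enum only mirrors the
one in `format`, and a 16 byte little-endian header is assumed.

```rust
use std::convert::TryInto;

/// Header type constant, value as defined in src/format/mod.rs.
const PXAR_FORMAT_VERSION: u64 = 0x730f6c75df16a40d;

/// Hypothetical mirror of `format::FormatVersion`.
#[derive(Clone, Copy, Debug, PartialEq)]
enum FormatVersion {
    Version1,
    Version2,
}

/// Encode a version 2 entry: header (type, full size) plus version.
fn encode_version_entry() -> Vec<u8> {
    let mut out = Vec::with_capacity(24);
    out.extend_from_slice(&PXAR_FORMAT_VERSION.to_le_bytes());
    out.extend_from_slice(&24u64.to_le_bytes()); // 16 byte header + 8 byte content
    out.extend_from_slice(&2u64.to_le_bytes());
    out
}

/// Inspect the first entry of an archive and derive the format version.
fn detect_version(archive: &[u8]) -> Result<FormatVersion, String> {
    if archive.len() < 16 {
        return Err("unexpected EOF while reading first header".to_string());
    }
    let htype = u64::from_le_bytes(archive[..8].try_into().unwrap());
    if htype != PXAR_FORMAT_VERSION {
        // No version entry present: fall back to format version 1.
        return Ok(FormatVersion::Version1);
    }
    if archive.len() < 24 {
        return Err("unexpected EOF while reading format version".to_string());
    }
    match u64::from_le_bytes(archive[16..24].try_into().unwrap()) {
        2 => Ok(FormatVersion::Version2),
        version => Err(format!("unexpected pxar format version {version}")),
    }
}
```

An archive starting with any other header type is treated as version 1
here, while a version entry carrying an unknown number is rejected.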

 examples/mk-format-hashes.rs |  5 +++++
 src/accessor/mod.rs          | 21 ++++++++++++++++++--
 src/decoder/mod.rs           | 37 ++++++++++++++++++++++++++++++++++--
 src/encoder/mod.rs           | 37 +++++++++++++++++++++++++++++++++---
 src/format/mod.rs            | 11 +++++++++++
 src/lib.rs                   |  3 +++
 tests/simple/fs.rs           |  1 +
 7 files changed, 108 insertions(+), 7 deletions(-)

diff --git a/examples/mk-format-hashes.rs b/examples/mk-format-hashes.rs
index 35cff99..e5d69b1 100644
--- a/examples/mk-format-hashes.rs
+++ b/examples/mk-format-hashes.rs
@@ -1,6 +1,11 @@
 use pxar::format::hash_filename;
 
 const CONSTANTS: &[(&str, &str, &str)] = &[
+    (
+        "Pxar format version entry, fallback to version 1 if not present",
+        "PXAR_FORMAT_VERSION",
+        "__PROXMOX_FORMAT_VERSION__",
+    ),
     (
         "Beginning of an entry (current version).",
         "PXAR_ENTRY",
diff --git a/src/accessor/mod.rs b/src/accessor/mod.rs
index 9048c94..6441baa 100644
--- a/src/accessor/mod.rs
+++ b/src/accessor/mod.rs
@@ -306,11 +306,19 @@ impl<T: Clone + ReadAt> AccessorImpl<T> {
             self.payload_input.clone(),
         )
         .await?;
-        let entry = decoder
+        let mut entry = decoder
             .next()
             .await
             .ok_or_else(|| io_format_err!("unexpected EOF while decoding file entry"))??;
 
+        // Skip over possible Version and Prelude before the root entry of type Directory
+        if let EntryKind::Version(_) = entry.kind() {
+            entry = decoder
+                .next()
+                .await
+                .ok_or_else(|| io_format_err!("unexpected EOF while decoding directory entry"))??;
+        }
+
         Ok(FileEntryImpl {
             input: self.input.clone(),
             entry,
@@ -545,10 +553,19 @@ impl<T: Clone + ReadAt> DirectoryImpl<T> {
         file_name: Option<&Path>,
     ) -> io::Result<(Entry, DecoderImpl<SeqReadAtAdapter<T>>)> {
         let mut decoder = self.get_decoder(entry_range, file_name).await?;
-        let entry = decoder
+        let mut entry = decoder
             .next()
             .await
             .ok_or_else(|| io_format_err!("unexpected EOF while decoding directory entry"))??;
+
+        // Skip over possible Version and Prelude before the root entry of type Directory
+        if let EntryKind::Version(_) = entry.kind() {
+            entry = decoder
+                .next()
+                .await
+                .ok_or_else(|| io_format_err!("unexpected EOF while decoding directory entry"))??;
+        }
+
         Ok((entry, decoder))
     }
 
diff --git a/src/decoder/mod.rs b/src/decoder/mod.rs
index 3bca835..305ecf1 100644
--- a/src/decoder/mod.rs
+++ b/src/decoder/mod.rs
@@ -18,7 +18,7 @@ use std::task::{Context, Poll};
 
 use endian_trait::Endian;
 
-use crate::format::{self, Header};
+use crate::format::{self, FormatVersion, Header};
 use crate::util::{self, io_err_other};
 use crate::{Entry, EntryKind, Metadata};
 
@@ -170,10 +170,14 @@ pub(crate) struct DecoderImpl<T> {
     /// The random access code uses decoders for sub-ranges which may not end in a `PAYLOAD` for
     /// entries like FIFOs or sockets, so there we explicitly allow an item to terminate with EOF.
     eof_after_entry: bool,
+    /// The format version as determined by the format version header
+    version: format::FormatVersion,
 }
 
+#[derive(Clone, PartialEq)]
 enum State {
     Begin,
+    Root,
     Default,
     InPayload {
         offset: u64,
@@ -248,6 +252,7 @@ impl<I: SeqRead> DecoderImpl<I> {
             payload_input,
             payload_consumed,
             eof_after_entry,
+            version: FormatVersion::default(),
         })
     }
 
@@ -260,7 +265,19 @@ impl<I: SeqRead> DecoderImpl<I> {
         loop {
             match self.state {
                 State::Eof => return Ok(None),
-                State::Begin => return self.read_next_entry().await.map(Some),
+                State::Begin => {
+                    let entry = self.read_next_entry().await.map(Some);
+                    if let Ok(Some(ref entry)) = entry {
+                        if let EntryKind::Version(version) = entry.kind() {
+                            self.version = version.clone();
+                            self.state = State::Root;
+                        }
+                    }
+                    return entry;
+                }
+                State::Root => {
+                    return self.read_next_entry().await.map(Some);
+                }
                 State::Default => {
                     // we completely finished an entry, so now we're going "up" in the directory
                     // hierarchy and parse the next PXAR_FILENAME or the PXAR_GOODBYE:
@@ -387,6 +404,7 @@ impl<I: SeqRead> DecoderImpl<I> {
     }
 
     async fn read_next_entry_or_eof(&mut self) -> io::Result<Option<Entry>> {
+        let previous_state = self.state.clone();
         self.state = State::Default;
         self.entry.clear_data();
 
@@ -406,6 +424,14 @@ impl<I: SeqRead> DecoderImpl<I> {
             self.entry.metadata = Metadata::default();
             self.entry.kind = EntryKind::Hardlink(self.read_hardlink().await?);
 
+            Ok(Some(self.entry.take()))
+        } else if header.htype == format::PXAR_FORMAT_VERSION {
+            if previous_state != State::Begin {
+                io_bail!("Got format version entry at unexpected position");
+            }
+            self.current_header = header;
+            self.entry.kind = EntryKind::Version(self.read_format_version().await?);
+
             Ok(Some(self.entry.take()))
         } else if header.htype == format::PXAR_ENTRY || header.htype == format::PXAR_ENTRY_V1 {
             if header.htype == format::PXAR_ENTRY {
@@ -761,6 +787,13 @@ impl<I: SeqRead> DecoderImpl<I> {
         self.current_header.check_header_size()?;
         seq_read_entry(&mut self.input).await
     }
+
+    async fn read_format_version(&mut self) -> io::Result<format::FormatVersion> {
+        match seq_read_entry(&mut self.input).await? {
+            2u64 => Ok(format::FormatVersion::Version2),
+            version => io_bail!("unexpected pxar format version {version}"),
+        }
+    }
 }
 
 /// Reader for file contents inside a pxar archive.
diff --git a/src/encoder/mod.rs b/src/encoder/mod.rs
index ea82eef..906ef62 100644
--- a/src/encoder/mod.rs
+++ b/src/encoder/mod.rs
@@ -17,7 +17,7 @@ use endian_trait::Endian;
 
 use crate::binary_tree_array;
 use crate::decoder::{self, SeqRead};
-use crate::format::{self, GoodbyeItem, PayloadRef};
+use crate::format::{self, FormatVersion, GoodbyeItem, PayloadRef};
 use crate::Metadata;
 
 pub mod aio;
@@ -327,6 +327,8 @@ pub(crate) struct EncoderImpl<'a, T: SeqWrite + 'a> {
     /// Since only the "current" entry can be actively writing files, we share the file copy
     /// buffer.
     file_copy_buffer: Arc<Mutex<Vec<u8>>>,
+    /// Pxar format version to encode
+    version: format::FormatVersion,
 }
 
 impl<'a, T: SeqWrite + 'a> Drop for EncoderImpl<'a, T> {
@@ -352,11 +354,14 @@ impl<'a, T: SeqWrite + 'a> EncoderImpl<'a, T> {
         }
 
         let mut state = EncoderState::default();
-        if let Some(payload_output) = payload_output.as_mut() {
+        let version = if let Some(payload_output) = payload_output.as_mut() {
             let header = format::Header::with_content_size(format::PXAR_PAYLOAD_START_MARKER, 0);
             header.check_header_size()?;
             seq_write_struct(payload_output, header, &mut state.payload_write_position).await?;
-        }
+            format::FormatVersion::Version2
+        } else {
+            format::FormatVersion::Version1
+        };
 
         let mut this = Self {
             output,
@@ -366,8 +371,10 @@ impl<'a, T: SeqWrite + 'a> EncoderImpl<'a, T> {
             file_copy_buffer: Arc::new(Mutex::new(unsafe {
                 crate::util::vec_new_uninitialized(1024 * 1024)
             })),
+            version,
         };
 
+        this.encode_format_version().await?;
         this.encode_metadata(metadata).await?;
         let state = this.state_mut()?;
         state.files_offset = state.position();
@@ -557,6 +564,10 @@ impl<'a, T: SeqWrite + 'a> EncoderImpl<'a, T> {
         file_size: u64,
         payload_offset: PayloadOffset,
     ) -> io::Result<LinkOffset> {
+        if self.version == FormatVersion::Version1 {
+            io_bail!("payload references not supported in pxar format version 1");
+        }
+
         if self.payload_output.as_mut().is_none() {
             io_bail!("unable to add payload reference");
         }
@@ -766,6 +777,26 @@ impl<'a, T: SeqWrite + 'a> EncoderImpl<'a, T> {
         Ok(())
     }
 
+    async fn encode_format_version(&mut self) -> io::Result<()> {
+        let version_bytes = match self.version {
+            format::FormatVersion::Version1 => return Ok(()),
+            format::FormatVersion::Version2 => 2u64.to_le_bytes(),
+        };
+
+        let (output, state) = self.output_state()?;
+        if state.write_position != 0 {
+            io_bail!("pxar format version must be encoded at the beginning of an archive");
+        }
+
+        seq_write_pxar_entry(
+            output,
+            format::PXAR_FORMAT_VERSION,
+            &version_bytes,
+            &mut state.write_position,
+        )
+        .await
+    }
+
     async fn encode_metadata(&mut self, metadata: &Metadata) -> io::Result<()> {
         let (output, state) = self.output_state()?;
         seq_write_pxar_struct_entry(
diff --git a/src/format/mod.rs b/src/format/mod.rs
index 6519bfc..9b66fe2 100644
--- a/src/format/mod.rs
+++ b/src/format/mod.rs
@@ -6,6 +6,7 @@
 //! item data.
 //!
 //! An archive contains items in the following order:
+//!  * `FORMAT_VERSION`     -- (optional for v1), version of encoding format
 //!  * `ENTRY`              -- containing general stat() data and related bits
 //!   * `XATTR`             -- one extended attribute
 //!   * ...                 -- more of these when there are multiple defined
@@ -80,6 +81,8 @@ pub mod mode {
 }
 
 // Generated by `cargo run --example mk-format-hashes`
+/// Pxar format version entry, fallback to version 1 if not present
+pub const PXAR_FORMAT_VERSION: u64 = 0x730f6c75df16a40d;
 /// Beginning of an entry (current version).
 pub const PXAR_ENTRY: u64 = 0xd5956474e588acef;
 /// Previous version of the entry struct
@@ -186,6 +189,7 @@ impl Header {
 impl Display for Header {
     fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
         let readable = match self.htype {
+            PXAR_FORMAT_VERSION => "FORMAT_VERSION",
             PXAR_FILENAME => "FILENAME",
             PXAR_SYMLINK => "SYMLINK",
             PXAR_HARDLINK => "HARDLINK",
@@ -551,6 +555,13 @@ impl From<&std::fs::Metadata> for Stat {
     }
 }
 
+#[derive(Clone, Debug, Default, PartialEq)]
+pub enum FormatVersion {
+    #[default]
+    Version1,
+    Version2,
+}
+
 #[derive(Clone, Debug)]
 pub struct Filename {
     pub name: Vec<u8>,
diff --git a/src/lib.rs b/src/lib.rs
index ef81a85..a87b5ac 100644
--- a/src/lib.rs
+++ b/src/lib.rs
@@ -342,6 +342,9 @@ impl Acl {
 /// Identifies whether the entry is a file, symlink, directory, etc.
 #[derive(Clone, Debug)]
 pub enum EntryKind {
+    /// Pxar file format version
+    Version(format::FormatVersion),
+
     /// Symbolic links.
     Symlink(format::Symlink),
 
diff --git a/tests/simple/fs.rs b/tests/simple/fs.rs
index 4284805..8a8c607 100644
--- a/tests/simple/fs.rs
+++ b/tests/simple/fs.rs
@@ -229,6 +229,7 @@ impl Entry {
                     })?))
                 };
             match item.kind() {
+                PxarEntryKind::Version(_) => continue,
                 PxarEntryKind::GoodbyeTable => break,
                 PxarEntryKind::File { size, .. } => {
                     let mut data = Vec::new();
-- 
2.39.2



_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel



* [pbs-devel] [PATCH v5 pxar 14/62] format/encoder/decoder: new pxar entry type `Prelude`
  2024-05-07 15:51 [pbs-devel] [PATCH v5 pxar proxmox-backup 00/62] fix #3174: improve file-level backup Christian Ebner
                   ` (12 preceding siblings ...)
  2024-05-07 15:51 ` [pbs-devel] [PATCH v5 pxar 13/62] format/encoder/decoder: new pxar entry type `Version` Christian Ebner
@ 2024-05-07 15:51 ` Christian Ebner
  2024-05-07 15:51 ` [pbs-devel] [PATCH v5 proxmox-backup 15/62] client: pxar: switch to stack based encoder state Christian Ebner
                   ` (48 subsequent siblings)
  62 siblings, 0 replies; 67+ messages in thread
From: Christian Ebner @ 2024-05-07 15:51 UTC (permalink / raw)
  To: pbs-devel

Introduces a new pxar format entry type `Prelude` and the associated
encoder and decoder methods.
A prelude starts with header marker `PXAR_PRELUDE` followed by raw
byte content, used to store additional metadata associated with the
pxar archive, e.g. command line arguments passed on archive creation.

The prelude's content has no fixed encoding format but is stored as
a raw, arbitrary byte slice. A prelude entry is encoded right after
a pxar format version entry, both being encoded in the metadata
archive in case of an archive with dedicated payload output.
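
As a rough illustration of the resulting byte layout, the sketch below
builds the start of a version 2 metadata archive by hand. It assumes a
16 byte entry header (u64 type hash followed by a u64 size that includes
the header) with little-endian fields, which is inferred from the
position check in `encode_prelude`. The helper names `push_entry`,
`archive_start` and `read_u64_le` are made up for this example and are
not part of the pxar API.

```rust
/// Assumed size of a pxar entry header: u64 type hash plus u64 full size.
const HEADER_SIZE: u64 = 16;

// Type hashes as introduced by this series.
const PXAR_FORMAT_VERSION: u64 = 0x730f6c75df16a40d;
const PXAR_PRELUDE: u64 = 0xe309d79d9f7b771b;

/// Append one entry: header (type hash, size including the header), then content.
/// Little-endian field encoding is an assumption of this sketch.
fn push_entry(out: &mut Vec<u8>, htype: u64, content: &[u8]) {
    let full_size = HEADER_SIZE + content.len() as u64;
    out.extend_from_slice(&htype.to_le_bytes());
    out.extend_from_slice(&full_size.to_le_bytes());
    out.extend_from_slice(content);
}

/// Build the start of a version 2 metadata archive: version entry, then prelude.
fn archive_start(prelude: &[u8]) -> Vec<u8> {
    let mut out = Vec::new();
    push_entry(&mut out, PXAR_FORMAT_VERSION, &2u64.to_le_bytes());
    // The prelude has to start right here, after header plus u64 version.
    assert_eq!(out.len() as u64, HEADER_SIZE + 8);
    push_entry(&mut out, PXAR_PRELUDE, prelude);
    out
}

/// Read a little-endian u64 at the given byte offset.
fn read_u64_le(buf: &[u8], offset: usize) -> u64 {
    let mut raw = [0u8; 8];
    raw.copy_from_slice(&buf[offset..offset + 8]);
    u64::from_le_bytes(raw)
}
```

With a 13 byte prelude such as `b"--exclude tmp"`, the version entry
occupies bytes 0..24 and the prelude entry bytes 24..53 under these
assumptions.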

The prelude is not backwards compatible with pxar format version 1.

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
changes since version 4:
- fix decoder state: must be in `InDirectory` after reading the root
  directory entry, when reading the optional prelude
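
The decoder state transitions touched by this fix can be modelled in
isolation as below. This is a simplified sketch, not the decoder code:
the `State` and `Kind` enums are reduced to the variants relevant here,
and the transitions `Begin -> InDirectory` (version 1 archive without a
version entry) and `Root -> InDirectory` are assumptions about the
surrounding code that is not part of this diff.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
enum State {
    Begin,
    Prelude,
    Root,
    InDirectory,
}

#[derive(Clone, Copy, Debug, PartialEq)]
enum Kind {
    Version,
    Prelude,
    Directory,
    File,
}

/// Return the state after reading an entry of the given kind.
fn next_state(state: State, kind: Kind) -> Result<State, String> {
    match (state, kind) {
        // A version entry may be followed by a prelude or the root directory.
        (State::Begin, Kind::Version) => Ok(State::Prelude),
        // Assumed: version 1 archives start directly with the root directory.
        (State::Begin, Kind::Directory) => Ok(State::InDirectory),
        (State::Prelude, Kind::Prelude) => Ok(State::Root),
        // The fix: without a prelude, the root directory was just read.
        (State::Prelude, Kind::Directory) => Ok(State::InDirectory),
        // Assumed: after the prelude, the root directory entry follows.
        (State::Root, Kind::Directory) => Ok(State::InDirectory),
        (state, kind) => Err(format!(
            "unexpected entry kind {:?} in state {:?}",
            kind, state
        )),
    }
}

/// Feed a sequence of entry kinds through the state machine.
fn walk(kinds: &[Kind]) -> Result<State, String> {
    kinds
        .iter()
        .try_fold(State::Begin, |state, kind| next_state(state, *kind))
}
```

Both `Version, Prelude, Directory` and `Version, Directory` end up in
`InDirectory`, while a prelude without a preceding version entry is
rejected.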

 examples/mk-format-hashes.rs |  1 +
 src/accessor/mod.rs          | 12 ++++++++++++
 src/decoder/mod.rs           | 31 ++++++++++++++++++++++++++++++-
 src/encoder/aio.rs           | 19 ++++++++++++++-----
 src/encoder/mod.rs           | 26 ++++++++++++++++++++++++++
 src/encoder/sync.rs          | 11 +++++++++--
 src/format/mod.rs            | 26 ++++++++++++++++++++++++++
 src/lib.rs                   |  3 +++
 tests/simple/fs.rs           |  1 +
 9 files changed, 122 insertions(+), 8 deletions(-)

diff --git a/examples/mk-format-hashes.rs b/examples/mk-format-hashes.rs
index e5d69b1..e998760 100644
--- a/examples/mk-format-hashes.rs
+++ b/examples/mk-format-hashes.rs
@@ -16,6 +16,7 @@ const CONSTANTS: &[(&str, &str, &str)] = &[
         "PXAR_ENTRY_V1",
         "__PROXMOX_FORMAT_ENTRY__",
     ),
+    ("", "PXAR_PRELUDE", "__PROXMOX_FORMAT_PRELUDE__"),
     ("", "PXAR_FILENAME", "__PROXMOX_FORMAT_FILENAME__"),
     ("", "PXAR_SYMLINK", "__PROXMOX_FORMAT_SYMLINK__"),
     ("", "PXAR_DEVICE", "__PROXMOX_FORMAT_DEVICE__"),
diff --git a/src/accessor/mod.rs b/src/accessor/mod.rs
index 6441baa..a746868 100644
--- a/src/accessor/mod.rs
+++ b/src/accessor/mod.rs
@@ -317,6 +317,12 @@ impl<T: Clone + ReadAt> AccessorImpl<T> {
                 .next()
                 .await
                 .ok_or_else(|| io_format_err!("unexpected EOF while decoding directory entry"))??;
+
+            if let EntryKind::Prelude(_) = entry.kind() {
+                entry = decoder.next().await.ok_or_else(|| {
+                    io_format_err!("unexpected EOF while decoding directory entry")
+                })??;
+            }
         }
 
         Ok(FileEntryImpl {
@@ -564,6 +570,12 @@ impl<T: Clone + ReadAt> DirectoryImpl<T> {
                 .next()
                 .await
                 .ok_or_else(|| io_format_err!("unexpected EOF while decoding directory entry"))??;
+
+            if let EntryKind::Prelude(_) = entry.kind() {
+                entry = decoder.next().await.ok_or_else(|| {
+                    io_format_err!("unexpected EOF while decoding directory entry")
+                })??;
+            }
         }
 
         Ok((entry, decoder))
diff --git a/src/decoder/mod.rs b/src/decoder/mod.rs
index 305ecf1..21dc208 100644
--- a/src/decoder/mod.rs
+++ b/src/decoder/mod.rs
@@ -177,6 +177,7 @@ pub(crate) struct DecoderImpl<T> {
 #[derive(Clone, PartialEq)]
 enum State {
     Begin,
+    Prelude,
     Root,
     Default,
     InPayload {
@@ -267,10 +268,25 @@ impl<I: SeqRead> DecoderImpl<I> {
                 State::Eof => return Ok(None),
                 State::Begin => {
                     let entry = self.read_next_entry().await.map(Some);
+                    // If the first entry is of kind Version, next must be Prelude or Directory
                     if let Ok(Some(ref entry)) = entry {
                         if let EntryKind::Version(version) = entry.kind() {
                             self.version = version.clone();
-                            self.state = State::Root;
+                            self.state = State::Prelude;
+                        }
+                    }
+                    return entry;
+                }
+                State::Prelude => {
+                    let entry = self.read_next_entry().await.map(Some);
+                    if let Ok(Some(ref entry)) = entry {
+                        match entry.kind() {
+                            EntryKind::Prelude(_) => self.state = State::Root,
+                            EntryKind::Directory => self.state = State::InDirectory,
+                            _ => io_bail!(
+                                "expected directory or prelude entry, got entry kind {:?}",
+                                entry.kind()
+                            ),
                         }
                     }
                     return entry;
@@ -432,6 +448,14 @@ impl<I: SeqRead> DecoderImpl<I> {
             self.current_header = header;
             self.entry.kind = EntryKind::Version(self.read_format_version().await?);
 
+            Ok(Some(self.entry.take()))
+        } else if header.htype == format::PXAR_PRELUDE {
+            if previous_state != State::Prelude {
+                io_bail!("Got prelude entry at unexpected position");
+            }
+            self.current_header = header;
+            self.entry.kind = EntryKind::Prelude(self.read_prelude().await?);
+
             Ok(Some(self.entry.take()))
         } else if header.htype == format::PXAR_ENTRY || header.htype == format::PXAR_ENTRY_V1 {
             if header.htype == format::PXAR_ENTRY {
@@ -794,6 +818,11 @@ impl<I: SeqRead> DecoderImpl<I> {
             version => io_bail!("unexpected pxar format version {version}"),
         }
     }
+
+    async fn read_prelude(&mut self) -> io::Result<format::Prelude> {
+        let data = self.read_entry_as_bytes().await?;
+        Ok(format::Prelude { data })
+    }
 }
 
 /// Reader for file contents inside a pxar archive.
diff --git a/src/encoder/aio.rs b/src/encoder/aio.rs
index e385457..19055ad 100644
--- a/src/encoder/aio.rs
+++ b/src/encoder/aio.rs
@@ -25,11 +25,13 @@ impl<'a, T: tokio::io::AsyncWrite + 'a> Encoder<'a, TokioWriter<T>> {
         output: T,
         metadata: &Metadata,
         payload_output: Option<T>,
+        prelude: Option<&[u8]>,
     ) -> io::Result<Encoder<'a, TokioWriter<T>>> {
         Encoder::new(
             TokioWriter::new(output),
             metadata,
             payload_output.map(|payload_output| TokioWriter::new(payload_output)),
+            prelude,
         )
         .await
     }
@@ -46,6 +48,7 @@ impl<'a> Encoder<'a, TokioWriter<tokio::fs::File>> {
             TokioWriter::new(tokio::fs::File::create(path.as_ref()).await?),
             metadata,
             None,
+            None,
         )
         .await
     }
@@ -57,9 +60,11 @@ impl<'a, T: SeqWrite + 'a> Encoder<'a, T> {
         output: T,
         metadata: &Metadata,
         payload_output: Option<T>,
+        prelude: Option<&[u8]>,
     ) -> io::Result<Encoder<'a, T>> {
         Ok(Self {
-            inner: encoder::EncoderImpl::new(output.into(), metadata, payload_output).await?,
+            inner: encoder::EncoderImpl::new(output.into(), metadata, payload_output, prelude)
+                .await?,
         })
     }
 
@@ -331,10 +336,14 @@ mod test {
     /// Assert that `Encoder` is `Send`
     fn send_test() {
         let test = async {
-            let mut encoder =
-                Encoder::new(DummyOutput, &Metadata::dir_builder(0o700).build(), None)
-                    .await
-                    .unwrap();
+            let mut encoder = Encoder::new(
+                DummyOutput,
+                &Metadata::dir_builder(0o700).build(),
+                None,
+                None,
+            )
+            .await
+            .unwrap();
             {
                 encoder
                     .create_directory("baba", &Metadata::dir_builder(0o700).build())
diff --git a/src/encoder/mod.rs b/src/encoder/mod.rs
index 906ef62..b785ebc 100644
--- a/src/encoder/mod.rs
+++ b/src/encoder/mod.rs
@@ -348,6 +348,7 @@ impl<'a, T: SeqWrite + 'a> EncoderImpl<'a, T> {
         output: EncoderOutput<'a, T>,
         metadata: &Metadata,
         mut payload_output: Option<T>,
+        prelude: Option<&[u8]>,
     ) -> io::Result<EncoderImpl<'a, T>> {
         if !metadata.is_dir() {
             io_bail!("directory metadata must contain the directory mode flag");
@@ -375,6 +376,9 @@ impl<'a, T: SeqWrite + 'a> EncoderImpl<'a, T> {
         };
 
         this.encode_format_version().await?;
+        if let Some(prelude) = prelude {
+            this.encode_prelude(prelude).await?;
+        }
         this.encode_metadata(metadata).await?;
         let state = this.state_mut()?;
         state.files_offset = state.position();
@@ -777,6 +781,28 @@ impl<'a, T: SeqWrite + 'a> EncoderImpl<'a, T> {
         Ok(())
     }
 
+    async fn encode_prelude(&mut self, prelude: &[u8]) -> io::Result<()> {
+        if self.version == FormatVersion::Version1 {
+            io_bail!("encoding prelude not supported in pxar format version 1");
+        }
+
+        let (output, state) = self.output_state()?;
+        if state.write_position != (size_of::<u64>() + size_of::<format::Header>()) as u64 {
+            io_bail!(
+                "prelude must be encoded following the version header, current position {}",
+                state.write_position,
+            );
+        }
+
+        seq_write_pxar_entry(
+            output,
+            format::PXAR_PRELUDE,
+            prelude,
+            &mut state.write_position,
+        )
+        .await
+    }
+
     async fn encode_format_version(&mut self) -> io::Result<()> {
         let version_bytes = match self.version {
             format::FormatVersion::Version1 => return Ok(()),
diff --git a/src/encoder/sync.rs b/src/encoder/sync.rs
index bc6430a..ffed47b 100644
--- a/src/encoder/sync.rs
+++ b/src/encoder/sync.rs
@@ -28,7 +28,7 @@ impl<'a, T: io::Write + 'a> Encoder<'a, StandardWriter<T>> {
     /// Encode a `pxar` archive into a regular `std::io::Write` output.
     #[inline]
     pub fn from_std(output: T, metadata: &Metadata) -> io::Result<Encoder<'a, StandardWriter<T>>> {
-        Encoder::new(StandardWriter::new(output), metadata, None)
+        Encoder::new(StandardWriter::new(output), metadata, None, None)
     }
 }
 
@@ -42,6 +42,7 @@ impl<'a> Encoder<'a, StandardWriter<std::fs::File>> {
             StandardWriter::new(std::fs::File::create(path.as_ref())?),
             metadata,
             None,
+            None,
         )
     }
 }
@@ -53,12 +54,18 @@ impl<'a, T: SeqWrite + 'a> Encoder<'a, T> {
     /// not allowed to use the `Waker`, as this will cause a `panic!`.
     // Optionally attach a dedicated writer to redirect the payloads of regular files to a separate
     // output.
-    pub fn new(output: T, metadata: &Metadata, payload_output: Option<T>) -> io::Result<Self> {
+    pub fn new(
+        output: T,
+        metadata: &Metadata,
+        payload_output: Option<T>,
+        prelude: Option<&[u8]>,
+    ) -> io::Result<Self> {
         Ok(Self {
             inner: poll_result_once(encoder::EncoderImpl::new(
                 output.into(),
                 metadata,
                 payload_output,
+                prelude,
             ))?,
         })
     }
diff --git a/src/format/mod.rs b/src/format/mod.rs
index 9b66fe2..73b06cd 100644
--- a/src/format/mod.rs
+++ b/src/format/mod.rs
@@ -87,6 +87,7 @@ pub const PXAR_FORMAT_VERSION: u64 = 0x730f6c75df16a40d;
 pub const PXAR_ENTRY: u64 = 0xd5956474e588acef;
 /// Previous version of the entry struct
 pub const PXAR_ENTRY_V1: u64 = 0x11da850a1c1cceff;
+pub const PXAR_PRELUDE: u64 = 0xe309d79d9f7b771b;
 pub const PXAR_FILENAME: u64 = 0x16701121063917b3;
 pub const PXAR_SYMLINK: u64 = 0x27f971e7dbf5dc5f;
 pub const PXAR_DEVICE: u64 = 0x9fc9e906586d5ce9;
@@ -147,6 +148,7 @@ impl Header {
     #[inline]
     pub fn max_content_size(&self) -> u64 {
         match self.htype {
+            PXAR_PRELUDE => u64::MAX - (size_of::<Self>() as u64),
             // + null-termination
             PXAR_FILENAME => crate::util::MAX_FILENAME_LEN + 1,
             // + null-termination
@@ -190,6 +192,7 @@ impl Display for Header {
     fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
         let readable = match self.htype {
             PXAR_FORMAT_VERSION => "FORMAT_VERSION",
+            PXAR_PRELUDE => "PRELUDE",
             PXAR_FILENAME => "FILENAME",
             PXAR_SYMLINK => "SYMLINK",
             PXAR_HARDLINK => "HARDLINK",
@@ -694,6 +697,29 @@ impl Device {
     }
 }
 
+#[derive(Clone, Debug)]
+pub struct Prelude {
+    pub data: Vec<u8>,
+}
+
+impl Prelude {
+    pub fn as_os_str(&self) -> &OsStr {
+        self.as_ref()
+    }
+}
+
+impl AsRef<[u8]> for Prelude {
+    fn as_ref(&self) -> &[u8] {
+        &self.data
+    }
+}
+
+impl AsRef<OsStr> for Prelude {
+    fn as_ref(&self) -> &OsStr {
+        OsStr::from_bytes(&self.data[..self.data.len().max(1) - 1])
+    }
+}
+
 #[cfg(all(test, target_os = "linux"))]
 #[test]
 fn test_linux_devices() {
diff --git a/src/lib.rs b/src/lib.rs
index a87b5ac..16d69f8 100644
--- a/src/lib.rs
+++ b/src/lib.rs
@@ -345,6 +345,9 @@ pub enum EntryKind {
     /// Pxar file format version
     Version(format::FormatVersion),
 
+    /// Pxar prelude blob
+    Prelude(format::Prelude),
+
     /// Symbolic links.
     Symlink(format::Symlink),
 
diff --git a/tests/simple/fs.rs b/tests/simple/fs.rs
index 8a8c607..96fcee9 100644
--- a/tests/simple/fs.rs
+++ b/tests/simple/fs.rs
@@ -230,6 +230,7 @@ impl Entry {
                 };
             match item.kind() {
                 PxarEntryKind::Version(_) => continue,
+                PxarEntryKind::Prelude(_) => continue,
                 PxarEntryKind::GoodbyeTable => break,
                 PxarEntryKind::File { size, .. } => {
                     let mut data = Vec::new();
-- 
2.39.2




* [pbs-devel] [PATCH v5 proxmox-backup 15/62] client: pxar: switch to stack based encoder state
  2024-05-07 15:51 [pbs-devel] [PATCH v5 pxar proxmox-backup 00/62] fix #3174: improve file-level backup Christian Ebner
                   ` (13 preceding siblings ...)
  2024-05-07 15:51 ` [pbs-devel] [PATCH v5 pxar 14/62] format/encoder/decoder: new pxar entry type `Prelude` Christian Ebner
@ 2024-05-07 15:51 ` Christian Ebner
  2024-05-07 15:51 ` [pbs-devel] [PATCH v5 proxmox-backup 16/62] client: backup: factor out extension from backup target Christian Ebner
                   ` (47 subsequent siblings)
  62 siblings, 0 replies; 67+ messages in thread
From: Christian Ebner @ 2024-05-07 15:51 UTC (permalink / raw)
  To: pbs-devel

... and add the new optional encoder/decoder/accessor parameter to
attach a dedicated payload input/output.

This is done in preparation for look-ahead caching, where passing
around per-directory level encoder instances with internal references
is not feasible.

Previously, a new encoder instance was generated for each directory
level, restricting possible implementation errors. These encoder
instances were internally linked by references to keep track of
the state changes in a parent-child relationship.

This is however not feasible when the encoder has to be passed by
mutable reference, as required by the look-ahead cache
implementation. The encoder has therefore been adapted to use a
single instance implementation with an internal stack keeping track
of the state.
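
To illustrate the idea, here is a minimal sketch of a single encoder
instance that tracks directory nesting on an internal stack. It is not
the pxar encoder: `StackEncoder`, `DirState`, `add_file` and
`archive_dir` are hypothetical names, and only the call pattern
(`create_directory`, `finish`, `close`) mirrors what is visible in the
diff.

```rust
/// Per-directory bookkeeping kept on the encoder's internal stack.
#[derive(Debug)]
struct DirState {
    name: String,
    entries: usize,
}

/// Single encoder instance; nesting is tracked by a stack, not by child encoders.
#[derive(Debug)]
struct StackEncoder {
    stack: Vec<DirState>,
}

impl StackEncoder {
    fn new() -> Self {
        let root = DirState { name: "/".to_string(), entries: 0 };
        Self { stack: vec![root] }
    }

    /// Enter a subdirectory by pushing a new state; no child encoder is returned.
    fn create_directory(&mut self, name: &str) -> Result<(), String> {
        let parent = self.stack.last_mut().ok_or("encoder already finished")?;
        parent.entries += 1;
        self.stack.push(DirState { name: name.to_string(), entries: 0 });
        Ok(())
    }

    /// Count a file in the directory currently on top of the stack.
    fn add_file(&mut self, _name: &str) -> Result<(), String> {
        let current = self.stack.last_mut().ok_or("encoder already finished")?;
        current.entries += 1;
        Ok(())
    }

    /// Leave the current directory: pop its state, return name and entry count.
    fn finish(&mut self) -> Result<(String, usize), String> {
        let done = self.stack.pop().ok_or("no open directory to finish")?;
        Ok((done.name, done.entries))
    }

    /// Close the encoder; only valid once every directory has been finished.
    fn close(self) -> Result<(), String> {
        if self.stack.is_empty() {
            Ok(())
        } else {
            Err(format!("{} directories still open", self.stack.len()))
        }
    }
}

/// The same `&mut StackEncoder` is handed down instead of a per-level instance.
fn archive_dir(encoder: &mut StackEncoder, name: &str, files: &[&str]) -> Result<usize, String> {
    encoder.create_directory(name)?;
    for file in files {
        encoder.add_file(file)?;
    }
    let (_, entries) = encoder.finish()?;
    Ok(entries)
}
```

Since every call takes the one encoder by mutable reference, a cache
can hold on to it across directory levels, which is what the
look-ahead cache requires.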

Depends on the bumped pxar library version, including the patches to
optionally attach the dedicated payload input/output.

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
changes since version 4:
- no changes

 pbs-client/src/pxar/create.rs             | 8 +++++---
 pbs-pxar-fuse/src/lib.rs                  | 2 +-
 proxmox-backup-client/src/catalog.rs      | 2 +-
 proxmox-backup-client/src/main.rs         | 2 +-
 proxmox-backup-client/src/mount.rs        | 2 +-
 proxmox-file-restore/src/main.rs          | 4 ++--
 pxar-bin/src/main.rs                      | 2 +-
 src/api2/admin/datastore.rs               | 2 +-
 src/api2/tape/restore.rs                  | 4 ++--
 src/bin/proxmox_backup_debug/diff.rs      | 2 +-
 src/tape/file_formats/snapshot_archive.rs | 3 ++-
 11 files changed, 18 insertions(+), 15 deletions(-)

diff --git a/pbs-client/src/pxar/create.rs b/pbs-client/src/pxar/create.rs
index 60efb0ce5..c9bf6df85 100644
--- a/pbs-client/src/pxar/create.rs
+++ b/pbs-client/src/pxar/create.rs
@@ -170,7 +170,7 @@ where
         set.insert(stat.st_dev);
     }
 
-    let mut encoder = Encoder::new(&mut writer, &metadata).await?;
+    let mut encoder = Encoder::new(&mut writer, &metadata, None).await?;
 
     let mut patterns = options.patterns;
 
@@ -203,6 +203,8 @@ where
         .archive_dir_contents(&mut encoder, source_dir, true)
         .await?;
     encoder.finish().await?;
+    encoder.close().await?;
+
     Ok(())
 }
 
@@ -663,7 +665,7 @@ impl Archiver {
     ) -> Result<(), Error> {
         let dir_name = OsStr::from_bytes(dir_name.to_bytes());
 
-        let mut encoder = encoder.create_directory(dir_name, metadata).await?;
+        encoder.create_directory(dir_name, metadata).await?;
 
         let old_fs_magic = self.fs_magic;
         let old_fs_feature_flags = self.fs_feature_flags;
@@ -686,7 +688,7 @@ impl Archiver {
             log::info!("skipping mount point: {:?}", self.path);
             Ok(())
         } else {
-            self.archive_dir_contents(&mut encoder, dir, false).await
+            self.archive_dir_contents(encoder, dir, false).await
         };
 
         self.fs_magic = old_fs_magic;
diff --git a/pbs-pxar-fuse/src/lib.rs b/pbs-pxar-fuse/src/lib.rs
index bf196b6c4..dff7aac31 100644
--- a/pbs-pxar-fuse/src/lib.rs
+++ b/pbs-pxar-fuse/src/lib.rs
@@ -66,7 +66,7 @@ impl Session {
         let file = std::fs::File::open(archive_path)?;
         let file_size = file.metadata()?.len();
         let reader: Reader = Arc::new(accessor::sync::FileReader::new(file));
-        let accessor = Accessor::new(reader, file_size).await?;
+        let accessor = Accessor::new(reader, file_size, None).await?;
         Self::mount(accessor, options, verbose, mountpoint)
     }
 
diff --git a/proxmox-backup-client/src/catalog.rs b/proxmox-backup-client/src/catalog.rs
index 72b22e67f..db919477f 100644
--- a/proxmox-backup-client/src/catalog.rs
+++ b/proxmox-backup-client/src/catalog.rs
@@ -220,7 +220,7 @@ async fn catalog_shell(param: Value) -> Result<(), Error> {
     let reader = BufferedDynamicReader::new(index, chunk_reader);
     let archive_size = reader.archive_size();
     let reader: pbs_pxar_fuse::Reader = Arc::new(BufferedDynamicReadAt::new(reader));
-    let decoder = pbs_pxar_fuse::Accessor::new(reader, archive_size).await?;
+    let decoder = pbs_pxar_fuse::Accessor::new(reader, archive_size, None).await?;
 
     client.download(CATALOG_NAME, &mut tmpfile).await?;
     let index = DynamicIndexReader::new(tmpfile)
diff --git a/proxmox-backup-client/src/main.rs b/proxmox-backup-client/src/main.rs
index 32fe914c4..287005024 100644
--- a/proxmox-backup-client/src/main.rs
+++ b/proxmox-backup-client/src/main.rs
@@ -1453,7 +1453,7 @@ async fn restore(
 
         if let Some(target) = target {
             pbs_client::pxar::extract_archive(
-                pxar::decoder::Decoder::from_std(reader)?,
+                pxar::decoder::Decoder::from_std(reader, None)?,
                 Path::new(target),
                 feature_flags,
                 |path| {
diff --git a/proxmox-backup-client/src/mount.rs b/proxmox-backup-client/src/mount.rs
index 4a2f83357..67fd23468 100644
--- a/proxmox-backup-client/src/mount.rs
+++ b/proxmox-backup-client/src/mount.rs
@@ -296,7 +296,7 @@ async fn mount_do(param: Value, pipe: Option<OwnedFd>) -> Result<Value, Error> {
         let reader = BufferedDynamicReader::new(index, chunk_reader);
         let archive_size = reader.archive_size();
         let reader: pbs_pxar_fuse::Reader = Arc::new(BufferedDynamicReadAt::new(reader));
-        let decoder = pbs_pxar_fuse::Accessor::new(reader, archive_size).await?;
+        let decoder = pbs_pxar_fuse::Accessor::new(reader, archive_size, None).await?;
 
         let session =
             pbs_pxar_fuse::Session::mount(decoder, options, false, Path::new(target.unwrap()))
diff --git a/proxmox-file-restore/src/main.rs b/proxmox-file-restore/src/main.rs
index 50875a636..dbab69942 100644
--- a/proxmox-file-restore/src/main.rs
+++ b/proxmox-file-restore/src/main.rs
@@ -457,7 +457,7 @@ async fn extract(
 
             let archive_size = reader.archive_size();
             let reader = LocalDynamicReadAt::new(reader);
-            let decoder = Accessor::new(reader, archive_size).await?;
+            let decoder = Accessor::new(reader, archive_size, None).await?;
             extract_to_target(decoder, &path, target, format, zstd).await?;
         }
         ExtractPath::VM(file, path) => {
@@ -483,7 +483,7 @@ async fn extract(
                     false,
                 )
                 .await?;
-                let decoder = Decoder::from_tokio(reader).await?;
+                let decoder = Decoder::from_tokio(reader, None).await?;
                 extract_sub_dir_seq(&target, decoder).await?;
 
                 // we extracted a .pxarexclude-cli file auto-generated by the VM when encoding the
diff --git a/pxar-bin/src/main.rs b/pxar-bin/src/main.rs
index 2bbe90e34..d475da4e3 100644
--- a/pxar-bin/src/main.rs
+++ b/pxar-bin/src/main.rs
@@ -26,7 +26,7 @@ fn extract_archive_from_reader<R: std::io::Read>(
     options: PxarExtractOptions,
 ) -> Result<(), Error> {
     pbs_client::pxar::extract_archive(
-        pxar::decoder::Decoder::from_std(reader)?,
+        pxar::decoder::Decoder::from_std(reader, None)?,
         Path::new(target),
         feature_flags,
         |path| {
diff --git a/src/api2/admin/datastore.rs b/src/api2/admin/datastore.rs
index 3ea174998..068b6a61e 100644
--- a/src/api2/admin/datastore.rs
+++ b/src/api2/admin/datastore.rs
@@ -1803,7 +1803,7 @@ pub fn pxar_file_download(
         let archive_size = reader.archive_size();
         let reader = LocalDynamicReadAt::new(reader);
 
-        let decoder = Accessor::new(reader, archive_size).await?;
+        let decoder = Accessor::new(reader, archive_size, None).await?;
         let root = decoder.open_root().await?;
         let path = OsStr::from_bytes(file_path).to_os_string();
         let file = root
diff --git a/src/api2/tape/restore.rs b/src/api2/tape/restore.rs
index 84557bce1..11fb2b4cd 100644
--- a/src/api2/tape/restore.rs
+++ b/src/api2/tape/restore.rs
@@ -1069,7 +1069,7 @@ fn restore_snapshots_to_tmpdir(
                     "File {file_num}: snapshot archive {source_datastore}:{snapshot}",
                 );
 
-                let mut decoder = pxar::decoder::sync::Decoder::from_std(reader)?;
+                let mut decoder = pxar::decoder::sync::Decoder::from_std(reader, None)?;
 
                 let target_datastore = match store_map.target_store(&source_datastore) {
                     Some(datastore) => datastore,
@@ -1685,7 +1685,7 @@ fn restore_snapshot_archive<'a>(
     reader: Box<dyn 'a + TapeRead>,
     snapshot_path: &Path,
 ) -> Result<bool, Error> {
-    let mut decoder = pxar::decoder::sync::Decoder::from_std(reader)?;
+    let mut decoder = pxar::decoder::sync::Decoder::from_std(reader, None)?;
     match try_restore_snapshot_archive(worker, &mut decoder, snapshot_path) {
         Ok(_) => Ok(true),
         Err(err) => {
diff --git a/src/bin/proxmox_backup_debug/diff.rs b/src/bin/proxmox_backup_debug/diff.rs
index 5b68941a4..140c573c1 100644
--- a/src/bin/proxmox_backup_debug/diff.rs
+++ b/src/bin/proxmox_backup_debug/diff.rs
@@ -277,7 +277,7 @@ async fn open_dynamic_index(
     let reader = BufferedDynamicReader::new(index, chunk_reader);
     let archive_size = reader.archive_size();
     let reader: Arc<dyn ReadAt + Send + Sync> = Arc::new(LocalDynamicReadAt::new(reader));
-    let accessor = Accessor::new(reader, archive_size).await?;
+    let accessor = Accessor::new(reader, archive_size, None).await?;
 
     Ok((lookup_index, accessor))
 }
diff --git a/src/tape/file_formats/snapshot_archive.rs b/src/tape/file_formats/snapshot_archive.rs
index 252384b50..43d1cf9c3 100644
--- a/src/tape/file_formats/snapshot_archive.rs
+++ b/src/tape/file_formats/snapshot_archive.rs
@@ -59,7 +59,7 @@ pub fn tape_write_snapshot_archive<'a>(
         }
 
         let mut encoder =
-            pxar::encoder::sync::Encoder::new(PxarTapeWriter::new(writer), &root_metadata)?;
+            pxar::encoder::sync::Encoder::new(PxarTapeWriter::new(writer), &root_metadata, None)?;
 
         for filename in file_list.iter() {
             let mut file = snapshot_reader.open_file(filename).map_err(|err| {
@@ -89,6 +89,7 @@ pub fn tape_write_snapshot_archive<'a>(
             }
         }
         encoder.finish()?;
+        encoder.close()?;
         Ok(())
     });
 
-- 
2.39.2




* [pbs-devel] [PATCH v5 proxmox-backup 16/62] client: backup: factor out extension from backup target
  2024-05-07 15:51 [pbs-devel] [PATCH v5 pxar proxmox-backup 00/62] fix #3174: improve file-level backup Christian Ebner
                   ` (14 preceding siblings ...)
  2024-05-07 15:51 ` [pbs-devel] [PATCH v5 proxmox-backup 15/62] client: pxar: switch to stack based encoder state Christian Ebner
@ 2024-05-07 15:51 ` Christian Ebner
  2024-05-07 15:51 ` [pbs-devel] [PATCH v5 proxmox-backup 17/62] client: pxar: combine writers into struct Christian Ebner
                   ` (46 subsequent siblings)
  62 siblings, 0 replies; 67+ messages in thread
From: Christian Ebner @ 2024-05-07 15:51 UTC (permalink / raw)
  To: pbs-devel

Instead of composing the backup target name and pushing it to the
backup list, push the archive name and extension separately, and
construct the full target name only later, while iterating the list.

This keeps it possible to additionally prefix the extension, as
required with the separate pxar metadata and payload indexes.
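
A small sketch of why keeping the parts separate helps: the target
name can be composed late and an optional prefix slotted in before the
extension. `UploadEntry`, `target_name` and the prefix `meta` are
placeholders for illustration only; the real upload list is a tuple and
the actual prefixes are introduced by later patches.

```rust
/// One upload list entry with target base name and extension kept separate.
struct UploadEntry {
    target_base: String,
    extension: &'static str,
}

/// Compose the final target name, optionally prefixing the extension.
fn target_name(entry: &UploadEntry, prefix: Option<&str>) -> String {
    match prefix {
        // Hypothetical prefix placed between base name and extension.
        Some(prefix) => format!("{}.{}.{}", entry.target_base, prefix, entry.extension),
        // Same result as the previous eager `format!("{}.didx", target)`.
        None => format!("{}.{}", entry.target_base, entry.extension),
    }
}
```

Without a prefix this yields the same names as before the refactoring,
so existing archive types are unaffected.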

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
changes since version 4:
- no changes

 proxmox-backup-client/src/main.rs | 15 ++++++++++-----
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/proxmox-backup-client/src/main.rs b/proxmox-backup-client/src/main.rs
index 287005024..d8da36de4 100644
--- a/proxmox-backup-client/src/main.rs
+++ b/proxmox-backup-client/src/main.rs
@@ -785,7 +785,8 @@ async fn create_backup(
                 upload_list.push((
                     BackupSpecificationType::PXAR,
                     filename.to_owned(),
-                    format!("{}.didx", target),
+                    target.to_owned(),
+                    "didx",
                     0,
                 ));
             }
@@ -803,7 +804,8 @@ async fn create_backup(
                 upload_list.push((
                     BackupSpecificationType::IMAGE,
                     filename.to_owned(),
-                    format!("{}.fidx", target),
+                    target.to_owned(),
+                    "fidx",
                     size,
                 ));
             }
@@ -814,7 +816,8 @@ async fn create_backup(
                 upload_list.push((
                     BackupSpecificationType::CONFIG,
                     filename.to_owned(),
-                    format!("{}.blob", target),
+                    target.to_owned(),
+                    "blob",
                     metadata.len(),
                 ));
             }
@@ -825,7 +828,8 @@ async fn create_backup(
                 upload_list.push((
                     BackupSpecificationType::LOGFILE,
                     filename.to_owned(),
-                    format!("{}.blob", target),
+                    target.to_owned(),
+                    "blob",
                     metadata.len(),
                 ));
             }
@@ -944,7 +948,8 @@ async fn create_backup(
         log::info!("{} {} '{}' to '{}' as {}", what, desc, file, repo, target);
     };
 
-    for (backup_type, filename, target, size) in upload_list {
+    for (backup_type, filename, target_base, extension, size) in upload_list {
+        let target = format!("{target_base}.{extension}");
         match (backup_type, dry_run) {
             // dry-run
             (BackupSpecificationType::CONFIG, true) => log_file("config file", &filename, &target),
-- 
2.39.2




* [pbs-devel] [PATCH v5 proxmox-backup 17/62] client: pxar: combine writers into struct
  2024-05-07 15:51 [pbs-devel] [PATCH v5 pxar proxmox-backup 00/62] fix #3174: improve file-level backup Christian Ebner
                   ` (15 preceding siblings ...)
  2024-05-07 15:51 ` [pbs-devel] [PATCH v5 proxmox-backup 16/62] client: backup: factor out extension from backup target Christian Ebner
@ 2024-05-07 15:51 ` Christian Ebner
  2024-05-07 15:52 ` [pbs-devel] [PATCH v5 proxmox-backup 18/62] client: pxar: add optional pxar payload writer instance Christian Ebner
                   ` (45 subsequent siblings)
  62 siblings, 0 replies; 67+ messages in thread
From: Christian Ebner @ 2024-05-07 15:51 UTC (permalink / raw)
  To: pbs-devel

Introduce a `PxarWriters` struct that bundles all writer instances
required for pxar archive creation into a single object. This limits
the number of function call parameters and allows extending the
struct later, e.g. by the optional payload writer instance.

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
changes since version 4:
- no changes
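
For illustration only, a simplified sketch of the bundling pattern,
using `std::io::Write` and a plain `Vec<String>` as a stand-in for the
catalog writer. The names `Writers`, `Catalog` and this
`create_archive` signature are assumptions made for the sketch, not the
API touched by the patch.

```rust
use std::io::Write;
use std::sync::{Arc, Mutex};

/// Simplified stand-in for the catalog writer (assumption for this sketch).
type Catalog = Arc<Mutex<Vec<String>>>;

/// Bundles everything the archive creation writes to.
pub struct Writers<T> {
    writer: T,
    catalog: Option<Catalog>,
}

impl<T> Writers<T> {
    pub fn new(writer: T, catalog: Option<Catalog>) -> Self {
        Self { writer, catalog }
    }
}

/// Takes one bundled argument instead of separate writer and catalog
/// parameters, and hands the writer back once all entries are written.
pub fn create_archive<T: Write>(mut writers: Writers<T>, entries: &[&str]) -> std::io::Result<T> {
    for entry in entries {
        writeln!(writers.writer, "{entry}")?;
        // the catalog is optional, just like in the bundled struct above
        if let Some(catalog) = &writers.catalog {
            catalog.lock().unwrap().push(entry.to_string());
        }
    }
    Ok(writers.writer)
}
```

Adding another writer later only changes the struct and its
constructor, not every function signature along the call chain.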

 pbs-client/src/pxar/create.rs                  | 18 ++++++++++++++----
 pbs-client/src/pxar/mod.rs                     |  2 +-
 pbs-client/src/pxar_backup_stream.rs           |  5 +++--
 .../src/proxmox_restore_daemon/api.rs          |  6 ++++--
 pxar-bin/src/main.rs                           |  6 +++---
 tests/catar.rs                                 |  3 +--
 6 files changed, 26 insertions(+), 14 deletions(-)

diff --git a/pbs-client/src/pxar/create.rs b/pbs-client/src/pxar/create.rs
index c9bf6df85..82f05889b 100644
--- a/pbs-client/src/pxar/create.rs
+++ b/pbs-client/src/pxar/create.rs
@@ -135,12 +135,22 @@ struct Archiver {
 
 type Encoder<'a, T> = pxar::encoder::aio::Encoder<'a, T>;
 
+pub struct PxarWriters<T> {
+    writer: T,
+    catalog: Option<Arc<Mutex<dyn BackupCatalogWriter + Send>>>,
+}
+
+impl<T> PxarWriters<T> {
+    pub fn new(writer: T, catalog: Option<Arc<Mutex<dyn BackupCatalogWriter + Send>>>) -> Self {
+        Self { writer, catalog }
+    }
+}
+
 pub async fn create_archive<T, F>(
     source_dir: Dir,
-    mut writer: T,
+    mut writers: PxarWriters<T>,
     feature_flags: Flags,
     callback: F,
-    catalog: Option<Arc<Mutex<dyn BackupCatalogWriter + Send>>>,
     options: PxarCreateOptions,
 ) -> Result<(), Error>
 where
@@ -170,7 +180,7 @@ where
         set.insert(stat.st_dev);
     }
 
-    let mut encoder = Encoder::new(&mut writer, &metadata, None).await?;
+    let mut encoder = Encoder::new(&mut writers.writer, &metadata, None).await?;
 
     let mut patterns = options.patterns;
 
@@ -188,7 +198,7 @@ where
         fs_magic,
         callback: Box::new(callback),
         patterns,
-        catalog,
+        catalog: writers.catalog,
         path: PathBuf::new(),
         entry_counter: 0,
         entry_limit: options.entries_max,
diff --git a/pbs-client/src/pxar/mod.rs b/pbs-client/src/pxar/mod.rs
index 14674b9b9..b7dcf8362 100644
--- a/pbs-client/src/pxar/mod.rs
+++ b/pbs-client/src/pxar/mod.rs
@@ -56,7 +56,7 @@ pub(crate) mod tools;
 mod flags;
 pub use flags::Flags;
 
-pub use create::{create_archive, PxarCreateOptions};
+pub use create::{create_archive, PxarCreateOptions, PxarWriters};
 pub use extract::{
     create_tar, create_zip, extract_archive, extract_sub_dir, extract_sub_dir_seq, ErrorHandler,
     OverwriteFlags, PxarExtractContext, PxarExtractOptions,
diff --git a/pbs-client/src/pxar_backup_stream.rs b/pbs-client/src/pxar_backup_stream.rs
index 22a6ffdc2..bfa108a8b 100644
--- a/pbs-client/src/pxar_backup_stream.rs
+++ b/pbs-client/src/pxar_backup_stream.rs
@@ -17,6 +17,8 @@ use proxmox_io::StdChannelWriter;
 
 use pbs_datastore::catalog::CatalogWriter;
 
+use crate::pxar::create::PxarWriters;
+
 /// Stream implementation to encode and upload .pxar archives.
 ///
 /// The hyper client needs an async Stream for file upload, so we
@@ -56,13 +58,12 @@ impl PxarBackupStream {
             let writer = pxar::encoder::sync::StandardWriter::new(writer);
             if let Err(err) = crate::pxar::create_archive(
                 dir,
-                writer,
+                PxarWriters::new(writer, Some(catalog)),
                 crate::pxar::Flags::DEFAULT,
                 move |path| {
                     log::debug!("{:?}", path);
                     Ok(())
                 },
-                Some(catalog),
                 options,
             )
             .await
diff --git a/proxmox-restore-daemon/src/proxmox_restore_daemon/api.rs b/proxmox-restore-daemon/src/proxmox_restore_daemon/api.rs
index c20552225..1ee200573 100644
--- a/proxmox-restore-daemon/src/proxmox_restore_daemon/api.rs
+++ b/proxmox-restore-daemon/src/proxmox_restore_daemon/api.rs
@@ -23,7 +23,9 @@ use proxmox_sortable_macro::sortable;
 use proxmox_sys::fs::read_subdir;
 
 use pbs_api_types::file_restore::{FileRestoreFormat, RestoreDaemonStatus};
-use pbs_client::pxar::{create_archive, Flags, PxarCreateOptions, ENCODER_MAX_ENTRIES};
+use pbs_client::pxar::{
+    create_archive, Flags, PxarCreateOptions, PxarWriters, ENCODER_MAX_ENTRIES,
+};
 use pbs_datastore::catalog::{ArchiveEntry, DirEntryAttribute};
 use pbs_tools::json::required_string_param;
 
@@ -356,7 +358,7 @@ fn extract(
                     };
 
                     let pxar_writer = TokioWriter::new(writer);
-                    create_archive(dir, pxar_writer, Flags::DEFAULT, |_| Ok(()), None, options)
+                    create_archive(dir, PxarWriters::new(pxar_writer, None), Flags::DEFAULT, |_| Ok(()), options)
                         .await
                 }
                 .await;
diff --git a/pxar-bin/src/main.rs b/pxar-bin/src/main.rs
index d475da4e3..ae2325078 100644
--- a/pxar-bin/src/main.rs
+++ b/pxar-bin/src/main.rs
@@ -13,7 +13,8 @@ use tokio::signal::unix::{signal, SignalKind};
 
 use pathpatterns::{MatchEntry, MatchType, PatternFlag};
 use pbs_client::pxar::{
-    format_single_line_entry, Flags, OverwriteFlags, PxarExtractOptions, ENCODER_MAX_ENTRIES,
+    format_single_line_entry, Flags, OverwriteFlags, PxarExtractOptions, PxarWriters,
+    ENCODER_MAX_ENTRIES,
 };
 
 use proxmox_router::cli::*;
@@ -376,13 +377,12 @@ async fn create_archive(
     let writer = pxar::encoder::sync::StandardWriter::new(writer);
     pbs_client::pxar::create_archive(
         dir,
-        writer,
+        PxarWriters::new(writer, None),
         feature_flags,
         move |path| {
             log::debug!("{:?}", path);
             Ok(())
         },
-        None,
         options,
     )
     .await?;
diff --git a/tests/catar.rs b/tests/catar.rs
index 36bb4f3bc..f414da8c9 100644
--- a/tests/catar.rs
+++ b/tests/catar.rs
@@ -35,10 +35,9 @@ fn run_test(dir_name: &str) -> Result<(), Error> {
     let rt = tokio::runtime::Runtime::new().unwrap();
     rt.block_on(create_archive(
         dir,
-        writer,
+        PxarWriters::new(writer, None),
         Flags::DEFAULT,
         |_| Ok(()),
-        None,
         options,
     ))?;
 
-- 
2.39.2




* [pbs-devel] [PATCH v5 proxmox-backup 18/62] client: pxar: add optional pxar payload writer instance
  2024-05-07 15:51 [pbs-devel] [PATCH v5 pxar proxmox-backup 00/62] fix #3174: improve file-level backup Christian Ebner
                   ` (16 preceding siblings ...)
  2024-05-07 15:51 ` [pbs-devel] [PATCH v5 proxmox-backup 17/62] client: pxar: combine writers into struct Christian Ebner
@ 2024-05-07 15:52 ` Christian Ebner
  2024-05-07 15:52 ` [pbs-devel] [PATCH v5 proxmox-backup 19/62] client: pxar: optionally split metadata and payload streams Christian Ebner
                   ` (44 subsequent siblings)
  62 siblings, 0 replies; 67+ messages in thread
From: Christian Ebner @ 2024-05-07 15:52 UTC (permalink / raw)
  To: pbs-devel

Extend the PxarWriters to hold the optional pxar payload writer
and attach it to the pxar encoder during archive creation.

The payload writer will encode the payloads of regular files to a
different backup stream, splitting the metadata from the payload
data.

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
changes since version 4:
- no changes
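
For illustration only, a sketch of how an optional second writer can be
handed on via `Option::as_mut`, as done for the encoder in this patch.
The `encode_file` and `run` functions are hypothetical, and writing the
payload inline when no payload writer is present is a simplification:
the actual encoding done by the pxar encoder is not reproduced here.

```rust
use std::io::{self, Write};

/// Write `metadata` to `output` and `payload` either to the dedicated
/// payload output, if given, or inline to `output`.
fn encode_file<T: Write>(
    output: &mut T,
    payload_output: Option<&mut T>,
    metadata: &[u8],
    payload: &[u8],
) -> io::Result<()> {
    output.write_all(metadata)?;
    match payload_output {
        Some(payload_output) => payload_output.write_all(payload),
        None => output.write_all(payload),
    }
}

/// Reduced version of the writer bundle, holding only the two outputs.
struct Writers<T> {
    writer: T,
    payload_writer: Option<T>,
}

fn run<T: Write>(writers: &mut Writers<T>, metadata: &[u8], payload: &[u8]) -> io::Result<()> {
    // `Option::as_mut` turns `&mut Option<T>` into `Option<&mut T>`, so
    // the bundle keeps ownership of the payload writer.
    encode_file(
        &mut writers.writer,
        writers.payload_writer.as_mut(),
        metadata,
        payload,
    )
}
```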

 pbs-client/src/pxar/create.rs                 | 20 ++++++++++++++++---
 pbs-client/src/pxar_backup_stream.rs          |  2 +-
 .../src/proxmox_restore_daemon/api.rs         | 10 ++++++++--
 pxar-bin/src/main.rs                          |  2 +-
 tests/catar.rs                                |  2 +-
 5 files changed, 28 insertions(+), 8 deletions(-)

diff --git a/pbs-client/src/pxar/create.rs b/pbs-client/src/pxar/create.rs
index 82f05889b..2bb5a6253 100644
--- a/pbs-client/src/pxar/create.rs
+++ b/pbs-client/src/pxar/create.rs
@@ -137,12 +137,21 @@ type Encoder<'a, T> = pxar::encoder::aio::Encoder<'a, T>;
 
 pub struct PxarWriters<T> {
     writer: T,
+    payload_writer: Option<T>,
     catalog: Option<Arc<Mutex<dyn BackupCatalogWriter + Send>>>,
 }
 
 impl<T> PxarWriters<T> {
-    pub fn new(writer: T, catalog: Option<Arc<Mutex<dyn BackupCatalogWriter + Send>>>) -> Self {
-        Self { writer, catalog }
+    pub fn new(
+        writer: T,
+        payload_writer: Option<T>,
+        catalog: Option<Arc<Mutex<dyn BackupCatalogWriter + Send>>>,
+    ) -> Self {
+        Self {
+            writer,
+            payload_writer,
+            catalog,
+        }
     }
 }
 
@@ -180,7 +189,12 @@ where
         set.insert(stat.st_dev);
     }
 
-    let mut encoder = Encoder::new(&mut writers.writer, &metadata, None).await?;
+    let mut encoder = Encoder::new(
+        &mut writers.writer,
+        &metadata,
+        writers.payload_writer.as_mut(),
+    )
+    .await?;
 
     let mut patterns = options.patterns;
 
diff --git a/pbs-client/src/pxar_backup_stream.rs b/pbs-client/src/pxar_backup_stream.rs
index bfa108a8b..cdfb7eaa8 100644
--- a/pbs-client/src/pxar_backup_stream.rs
+++ b/pbs-client/src/pxar_backup_stream.rs
@@ -58,7 +58,7 @@ impl PxarBackupStream {
             let writer = pxar::encoder::sync::StandardWriter::new(writer);
             if let Err(err) = crate::pxar::create_archive(
                 dir,
-                PxarWriters::new(writer, Some(catalog)),
+                PxarWriters::new(writer, None, Some(catalog)),
                 crate::pxar::Flags::DEFAULT,
                 move |path| {
                     log::debug!("{:?}", path);
diff --git a/proxmox-restore-daemon/src/proxmox_restore_daemon/api.rs b/proxmox-restore-daemon/src/proxmox_restore_daemon/api.rs
index 1ee200573..ea97976e6 100644
--- a/proxmox-restore-daemon/src/proxmox_restore_daemon/api.rs
+++ b/proxmox-restore-daemon/src/proxmox_restore_daemon/api.rs
@@ -358,8 +358,14 @@ fn extract(
                     };
 
                     let pxar_writer = TokioWriter::new(writer);
-                    create_archive(dir, PxarWriters::new(pxar_writer, None), Flags::DEFAULT, |_| Ok(()), options)
-                        .await
+                    create_archive(
+                        dir,
+                        PxarWriters::new(pxar_writer, None, None),
+                        Flags::DEFAULT,
+                        |_| Ok(()),
+                        options,
+                    )
+                    .await
                 }
                 .await;
                 if let Err(err) = result {
diff --git a/pxar-bin/src/main.rs b/pxar-bin/src/main.rs
index ae2325078..34944cf16 100644
--- a/pxar-bin/src/main.rs
+++ b/pxar-bin/src/main.rs
@@ -377,7 +377,7 @@ async fn create_archive(
     let writer = pxar::encoder::sync::StandardWriter::new(writer);
     pbs_client::pxar::create_archive(
         dir,
-        PxarWriters::new(writer, None),
+        PxarWriters::new(writer, None, None),
         feature_flags,
         move |path| {
             log::debug!("{:?}", path);
diff --git a/tests/catar.rs b/tests/catar.rs
index f414da8c9..9e96a8610 100644
--- a/tests/catar.rs
+++ b/tests/catar.rs
@@ -35,7 +35,7 @@ fn run_test(dir_name: &str) -> Result<(), Error> {
     let rt = tokio::runtime::Runtime::new().unwrap();
     rt.block_on(create_archive(
         dir,
-        PxarWriters::new(writer, None),
+        PxarWriters::new(writer, None, None),
         Flags::DEFAULT,
         |_| Ok(()),
         options,
-- 
2.39.2




* [pbs-devel] [PATCH v5 proxmox-backup 19/62] client: pxar: optionally split metadata and payload streams
  2024-05-07 15:51 [pbs-devel] [PATCH v5 pxar proxmox-backup 00/62] fix #3174: improve file-level backup Christian Ebner
                   ` (17 preceding siblings ...)
  2024-05-07 15:52 ` [pbs-devel] [PATCH v5 proxmox-backup 18/62] client: pxar: add optional pxar payload writer instance Christian Ebner
@ 2024-05-07 15:52 ` Christian Ebner
  2024-05-07 15:52 ` [pbs-devel] [PATCH v5 proxmox-backup 20/62] client: helper: add helpers for creating reader instances Christian Ebner
                   ` (43 subsequent siblings)
  62 siblings, 0 replies; 67+ messages in thread
From: Christian Ebner @ 2024-05-07 15:52 UTC (permalink / raw)
  To: pbs-devel

... and attach the optional payload writer to the pxar archive
creation. This way, metadata and payload data end up in different
dynamic indexes, which allows looking up and reusing payload chunks
without the additional overhead of the pxar archive's metadata.

For now this functionality remains disabled and will be enabled in a
later patch once the logic for reusing the payload chunks is in
place.

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
changes since version 4:
- no changes
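
For illustration only, a standalone sketch of how the target names are
derived for the single and the split case, following the logic added to
`create_backup` in this patch. The function name `split_targets` and
the `String` error type are assumptions; the patch itself does this
inline and uses `bail!`.

```rust
/// Derive the metadata (or single) archive target and the optional
/// payload archive target from the target base and index extension.
fn split_targets(
    target_base: &str,
    extension: &str,
    metadata_mode: bool,
) -> Result<(String, Option<String>), String> {
    // the base is expected to end in ".pxar" in either mode
    let base = target_base
        .strip_suffix(".pxar")
        .ok_or_else(|| format!("unexpected suffix in target: {target_base}"))?;

    if metadata_mode {
        Ok((
            format!("{base}.mpxar.{extension}"),
            Some(format!("{base}.ppxar.{extension}")),
        ))
    } else {
        // unchanged single-archive naming
        Ok((format!("{target_base}.{extension}"), None))
    }
}
```

With the mode disabled, as it is in this patch, `root.pxar` still maps
to `root.pxar.didx` and no payload target is produced.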

 pbs-client/src/pxar_backup_stream.rs | 49 +++++++++++++-----
 proxmox-backup-client/src/main.rs    | 75 +++++++++++++++++++++++++---
 2 files changed, 103 insertions(+), 21 deletions(-)

diff --git a/pbs-client/src/pxar_backup_stream.rs b/pbs-client/src/pxar_backup_stream.rs
index cdfb7eaa8..95145cb0d 100644
--- a/pbs-client/src/pxar_backup_stream.rs
+++ b/pbs-client/src/pxar_backup_stream.rs
@@ -42,23 +42,37 @@ impl PxarBackupStream {
         dir: Dir,
         catalog: Arc<Mutex<CatalogWriter<W>>>,
         options: crate::pxar::PxarCreateOptions,
-    ) -> Result<Self, Error> {
-        let (tx, rx) = std::sync::mpsc::sync_channel(10);
-
+        separate_payload_stream: bool,
+    ) -> Result<(Self, Option<Self>), Error> {
         let buffer_size = 256 * 1024;
 
-        let error = Arc::new(Mutex::new(None));
-        let error2 = Arc::clone(&error);
-        let handler = async move {
-            let writer = TokioWriterAdapter::new(std::io::BufWriter::with_capacity(
+        let (tx, rx) = std::sync::mpsc::sync_channel(10);
+        let writer = TokioWriterAdapter::new(std::io::BufWriter::with_capacity(
+            buffer_size,
+            StdChannelWriter::new(tx),
+        ));
+        let writer = pxar::encoder::sync::StandardWriter::new(writer);
+
+        let (payload_writer, payload_rx) = if separate_payload_stream {
+            let (tx, rx) = std::sync::mpsc::sync_channel(10);
+            let payload_writer = TokioWriterAdapter::new(std::io::BufWriter::with_capacity(
                 buffer_size,
                 StdChannelWriter::new(tx),
             ));
+            (
+                Some(pxar::encoder::sync::StandardWriter::new(payload_writer)),
+                Some(rx),
+            )
+        } else {
+            (None, None)
+        };
 
-            let writer = pxar::encoder::sync::StandardWriter::new(writer);
+        let error = Arc::new(Mutex::new(None));
+        let error2 = Arc::clone(&error);
+        let handler = async move {
             if let Err(err) = crate::pxar::create_archive(
                 dir,
-                PxarWriters::new(writer, None, Some(catalog)),
+                PxarWriters::new(writer, payload_writer, Some(catalog)),
                 crate::pxar::Flags::DEFAULT,
                 move |path| {
                     log::debug!("{:?}", path);
@@ -77,21 +91,30 @@ impl PxarBackupStream {
         let future = Abortable::new(handler, registration);
         tokio::spawn(future);
 
-        Ok(Self {
+        let backup_stream = Self {
+            rx: Some(rx),
+            handle: Some(handle.clone()),
+            error: Arc::clone(&error),
+        };
+
+        let backup_payload_stream = payload_rx.map(|rx| Self {
             rx: Some(rx),
             handle: Some(handle),
             error,
-        })
+        });
+
+        Ok((backup_stream, backup_payload_stream))
     }
 
     pub fn open<W: Write + Send + 'static>(
         dirname: &Path,
         catalog: Arc<Mutex<CatalogWriter<W>>>,
         options: crate::pxar::PxarCreateOptions,
-    ) -> Result<Self, Error> {
+        separate_payload_stream: bool,
+    ) -> Result<(Self, Option<Self>), Error> {
         let dir = nix::dir::Dir::open(dirname, OFlag::O_DIRECTORY, Mode::empty())?;
 
-        Self::new(dir, catalog, options)
+        Self::new(dir, catalog, options, separate_payload_stream)
     }
 }
 
diff --git a/proxmox-backup-client/src/main.rs b/proxmox-backup-client/src/main.rs
index d8da36de4..ab7d316d4 100644
--- a/proxmox-backup-client/src/main.rs
+++ b/proxmox-backup-client/src/main.rs
@@ -187,18 +187,24 @@ async fn backup_directory<P: AsRef<Path>>(
     client: &BackupWriter,
     dir_path: P,
     archive_name: &str,
+    payload_target: Option<&str>,
     chunk_size: Option<usize>,
     catalog: Arc<Mutex<CatalogWriter<TokioWriterAdapter<StdChannelWriter<Error>>>>>,
     pxar_create_options: pbs_client::pxar::PxarCreateOptions,
     upload_options: UploadOptions,
-) -> Result<BackupStats, Error> {
+) -> Result<(BackupStats, Option<BackupStats>), Error> {
     if upload_options.fixed_size.is_some() {
         bail!("cannot backup directory with fixed chunk size!");
     }
 
-    let pxar_stream = PxarBackupStream::open(dir_path.as_ref(), catalog, pxar_create_options)?;
-    let mut chunk_stream = ChunkStream::new(pxar_stream, chunk_size);
+    let (pxar_stream, payload_stream) = PxarBackupStream::open(
+        dir_path.as_ref(),
+        catalog,
+        pxar_create_options,
+        payload_target.is_some(),
+    )?;
 
+    let mut chunk_stream = ChunkStream::new(pxar_stream, chunk_size);
     let (tx, rx) = mpsc::channel(10); // allow to buffer 10 chunks
 
     let stream = ReceiverStream::new(rx).map_err(Error::from);
@@ -210,11 +216,36 @@ async fn backup_directory<P: AsRef<Path>>(
         }
     });
 
-    let stats = client
-        .upload_stream(archive_name, stream, upload_options)
-        .await?;
+    let stats = client.upload_stream(archive_name, stream, upload_options.clone());
 
-    Ok(stats)
+    if let Some(payload_stream) = payload_stream {
+        let payload_target = payload_target
+            .ok_or_else(|| format_err!("got payload stream, but no target archive name"))?;
+
+        let mut payload_chunk_stream = ChunkStream::new(payload_stream, chunk_size);
+        let (payload_tx, payload_rx) = mpsc::channel(10); // allow to buffer 10 chunks
+        let stream = ReceiverStream::new(payload_rx).map_err(Error::from);
+
+        // spawn payload chunker inside a separate task so that it can run parallel
+        tokio::spawn(async move {
+            while let Some(v) = payload_chunk_stream.next().await {
+                let _ = payload_tx.send(v).await;
+            }
+        });
+
+        let payload_stats = client.upload_stream(&payload_target, stream, upload_options);
+
+        match futures::join!(stats, payload_stats) {
+            (Ok(stats), Ok(payload_stats)) => Ok((stats, Some(payload_stats))),
+            (Err(err), Ok(_)) => Err(format_err!("upload failed: {err}")),
+            (Ok(_), Err(err)) => Err(format_err!("upload failed: {err}")),
+            (Err(err), Err(payload_err)) => {
+                Err(format_err!("upload failed: {err} - {payload_err}"))
+            }
+        }
+    } else {
+        Ok((stats.await?, None))
+    }
 }
 
 async fn backup_image<P: AsRef<Path>>(
@@ -985,6 +1016,23 @@ async fn create_backup(
                 manifest.add_file(target, stats.size, stats.csum, crypto.mode)?;
             }
             (BackupSpecificationType::PXAR, false) => {
+                let metadata_mode = false; // Until enabled via param
+
+                let target_base = if let Some(base) = target_base.strip_suffix(".pxar") {
+                    base.to_string()
+                } else {
+                    bail!("unexpected suffix in target: {target_base}");
+                };
+
+                let (target, payload_target) = if metadata_mode {
+                    (
+                        format!("{target_base}.mpxar.{extension}"),
+                        Some(format!("{target_base}.ppxar.{extension}")),
+                    )
+                } else {
+                    (target, None)
+                };
+
                 // start catalog upload on first use
                 if catalog.is_none() {
                     let catalog_upload_res =
@@ -1015,16 +1063,27 @@ async fn create_backup(
                     ..UploadOptions::default()
                 };
 
-                let stats = backup_directory(
+                let (stats, payload_stats) = backup_directory(
                     &client,
                     &filename,
                     &target,
+                    payload_target.as_deref(),
                     chunk_size_opt,
                     catalog.clone(),
                     pxar_options,
                     upload_options,
                 )
                 .await?;
+
+                if let Some(payload_stats) = payload_stats {
+                    manifest.add_file(
+                        payload_target
+                            .ok_or_else(|| format_err!("missing payload target archive"))?,
+                        payload_stats.size,
+                        payload_stats.csum,
+                        crypto.mode,
+                    )?;
+                }
                 manifest.add_file(target, stats.size, stats.csum, crypto.mode)?;
                 catalog.lock().unwrap().end_directory()?;
             }
-- 
2.39.2




* [pbs-devel] [PATCH v5 proxmox-backup 20/62] client: helper: add helpers for creating reader instances
  2024-05-07 15:51 [pbs-devel] [PATCH v5 pxar proxmox-backup 00/62] fix #3174: improve file-level backup Christian Ebner
                   ` (18 preceding siblings ...)
  2024-05-07 15:52 ` [pbs-devel] [PATCH v5 proxmox-backup 19/62] client: pxar: optionally split metadata and payload streams Christian Ebner
@ 2024-05-07 15:52 ` Christian Ebner
  2024-05-07 15:52 ` [pbs-devel] [PATCH v5 proxmox-backup 21/62] client: helper: add method for split archive name mapping Christian Ebner
                   ` (42 subsequent siblings)
  62 siblings, 0 replies; 67+ messages in thread
From: Christian Ebner @ 2024-05-07 15:52 UTC (permalink / raw)
  To: pbs-devel

Add a module for helper methods that are needed in different
submodules of the client.

Add `get_pxar_fuse_reader`, `get_buffered_pxar_reader` and
`get_pxar_fuse_accessor` to create reader instances to access pxar
archives.

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
changes since version 4:
- no changes

 proxmox-backup-client/src/helper.rs | 75 +++++++++++++++++++++++++++++
 proxmox-backup-client/src/main.rs   |  2 +
 2 files changed, 77 insertions(+)
 create mode 100644 proxmox-backup-client/src/helper.rs

diff --git a/proxmox-backup-client/src/helper.rs b/proxmox-backup-client/src/helper.rs
new file mode 100644
index 000000000..00b3ce362
--- /dev/null
+++ b/proxmox-backup-client/src/helper.rs
@@ -0,0 +1,75 @@
+use std::sync::Arc;
+
+use anyhow::Error;
+use pbs_client::{BackupReader, RemoteChunkReader};
+use pbs_datastore::BackupManifest;
+use pbs_tools::crypt_config::CryptConfig;
+
+use crate::{BufferedDynamicReadAt, BufferedDynamicReader, IndexFile};
+
+pub(crate) async fn get_pxar_fuse_accessor(
+    archive_name: &str,
+    payload_archive_name: Option<&str>,
+    client: Arc<BackupReader>,
+    manifest: &BackupManifest,
+    crypt_config: Option<Arc<CryptConfig>>,
+) -> Result<pbs_pxar_fuse::Accessor, Error> {
+    let (reader, archive_size) = get_pxar_fuse_reader(
+        &archive_name,
+        client.clone(),
+        &manifest,
+        crypt_config.clone(),
+    )
+    .await?;
+
+    let accessor = if let Some(payload_archive_name) = payload_archive_name {
+        let (payload_reader, payload_size) = get_pxar_fuse_reader(
+            payload_archive_name,
+            client.clone(),
+            &manifest,
+            crypt_config.clone(),
+        )
+        .await?;
+        pbs_pxar_fuse::Accessor::new(reader, archive_size, Some((payload_reader, payload_size)))
+            .await?
+    } else {
+        pbs_pxar_fuse::Accessor::new(reader, archive_size, None).await?
+    };
+
+    Ok(accessor)
+}
+
+pub(crate) async fn get_pxar_fuse_reader(
+    archive_name: &str,
+    client: Arc<BackupReader>,
+    manifest: &BackupManifest,
+    crypt_config: Option<Arc<CryptConfig>>,
+) -> Result<(pbs_pxar_fuse::Reader, u64), Error> {
+    let reader = get_buffered_pxar_reader(archive_name, client, manifest, crypt_config).await?;
+    let archive_size = reader.archive_size();
+    let reader: pbs_pxar_fuse::Reader = Arc::new(BufferedDynamicReadAt::new(reader));
+
+    Ok((reader, archive_size))
+}
+
+pub(crate) async fn get_buffered_pxar_reader(
+    archive_name: &str,
+    client: Arc<BackupReader>,
+    manifest: &BackupManifest,
+    crypt_config: Option<Arc<CryptConfig>>,
+) -> Result<BufferedDynamicReader<RemoteChunkReader>, Error> {
+    let index = client
+        .download_dynamic_index(&manifest, &archive_name)
+        .await?;
+
+    let most_used = index.find_most_used_chunks(8);
+    let file_info = manifest.lookup_file_info(&archive_name)?;
+    let chunk_reader = RemoteChunkReader::new(
+        client.clone(),
+        crypt_config.clone(),
+        file_info.chunk_crypt_mode(),
+        most_used,
+    );
+
+    Ok(BufferedDynamicReader::new(index, chunk_reader))
+}
diff --git a/proxmox-backup-client/src/main.rs b/proxmox-backup-client/src/main.rs
index ab7d316d4..b81719dad 100644
--- a/proxmox-backup-client/src/main.rs
+++ b/proxmox-backup-client/src/main.rs
@@ -72,6 +72,8 @@ mod catalog;
 pub use catalog::*;
 mod snapshot;
 pub use snapshot::*;
+mod helper;
+pub(crate) use helper::*;
 pub mod key;
 pub mod namespace;
 
-- 
2.39.2




* [pbs-devel] [PATCH v5 proxmox-backup 21/62] client: helper: add method for split archive name mapping
  2024-05-07 15:51 [pbs-devel] [PATCH v5 pxar proxmox-backup 00/62] fix #3174: improve file-level backup Christian Ebner
                   ` (19 preceding siblings ...)
  2024-05-07 15:52 ` [pbs-devel] [PATCH v5 proxmox-backup 20/62] client: helper: add helpers for creating reader instances Christian Ebner
@ 2024-05-07 15:52 ` Christian Ebner
  2024-05-07 15:52 ` [pbs-devel] [PATCH v5 proxmox-backup 22/62] client: restore: read payload from dedicated index Christian Ebner
                   ` (41 subsequent siblings)
  62 siblings, 0 replies; 67+ messages in thread
From: Christian Ebner @ 2024-05-07 15:52 UTC (permalink / raw)
  To: pbs-devel

Helper method that takes the meta or payload archive name as input
and maps it to the correct archive names for metadata and payload
archive.

If neither is matched, fall back to returning the passed-in archive
name as target archive and `None` for the payload archive name.

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
changes since version 4:
- no changes
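
For illustration only, a standalone copy of the mapping logic from this
patch (without the `pub(crate)` visibility), so that the behaviour for
the different inputs can be tried in isolation. The example archive
names in the note below the block are made up.

```rust
/// Map a metadata or payload archive name to the pair of
/// (metadata archive name, payload archive name).
fn get_pxar_archive_names(archive_name: &str) -> (String, Option<String>) {
    // names including the dynamic index extension
    if let Some(base) = archive_name
        .strip_suffix(".mpxar.didx")
        .or_else(|| archive_name.strip_suffix(".ppxar.didx"))
    {
        return (
            format!("{base}.mpxar.didx"),
            Some(format!("{base}.ppxar.didx")),
        );
    }

    // names without the index extension
    if let Some(base) = archive_name
        .strip_suffix(".mpxar")
        .or_else(|| archive_name.strip_suffix(".ppxar"))
    {
        return (format!("{base}.mpxar"), Some(format!("{base}.ppxar")));
    }

    // regular archive: no dedicated payload archive
    (archive_name.to_owned(), None)
}
```

Both `root.mpxar.didx` and `root.ppxar.didx` map to the same pair,
while a regular `root.pxar.didx` is returned unchanged with `None` as
payload archive name.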

 proxmox-backup-client/src/helper.rs | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)

diff --git a/proxmox-backup-client/src/helper.rs b/proxmox-backup-client/src/helper.rs
index 00b3ce362..f55231ce2 100644
--- a/proxmox-backup-client/src/helper.rs
+++ b/proxmox-backup-client/src/helper.rs
@@ -73,3 +73,24 @@ pub(crate) async fn get_buffered_pxar_reader(
 
     Ok(BufferedDynamicReader::new(index, chunk_reader))
 }
+
+pub(crate) fn get_pxar_archive_names(archive_name: &str) -> (String, Option<String>) {
+    if let Some(base) = archive_name
+        .strip_suffix(".mpxar.didx")
+        .or_else(|| archive_name.strip_suffix(".ppxar.didx"))
+    {
+        return (
+            format!("{base}.mpxar.didx"),
+            Some(format!("{base}.ppxar.didx")),
+        );
+    }
+
+    if let Some(base) = archive_name
+        .strip_suffix(".mpxar")
+        .or_else(|| archive_name.strip_suffix(".ppxar"))
+    {
+        return (format!("{base}.mpxar"), Some(format!("{base}.ppxar")));
+    }
+
+    (archive_name.to_owned(), None)
+}
-- 
2.39.2




* [pbs-devel] [PATCH v5 proxmox-backup 22/62] client: restore: read payload from dedicated index
  2024-05-07 15:51 [pbs-devel] [PATCH v5 pxar proxmox-backup 00/62] fix #3174: improve file-level backup Christian Ebner
                   ` (20 preceding siblings ...)
  2024-05-07 15:52 ` [pbs-devel] [PATCH v5 proxmox-backup 21/62] client: helper: add method for split archive name mapping Christian Ebner
@ 2024-05-07 15:52 ` Christian Ebner
  2024-05-07 15:52 ` [pbs-devel] [PATCH v5 proxmox-backup 23/62] tools: cover extension for split pxar archives Christian Ebner
                   ` (40 subsequent siblings)
  62 siblings, 0 replies; 67+ messages in thread
From: Christian Ebner @ 2024-05-07 15:52 UTC (permalink / raw)
  To: pbs-devel

Whenever a split pxar archive is encountered, instantiate and attach
the required dedicated reader instance to the decoder instance on
restore.

Piping the output to stdout is not possible for split archives, as
this would require a decoder instance that can decode the input stream
while preserving the pxar stream format on output.

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
changes since version 4:
- no changes
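
For reference, the name mapping this patch relies on can be sketched
stand-alone. The function below mirrors `get_pxar_archive_names` from
patch 21 of this series; it is only an illustration and assumes nothing
else from the client code.

```rust
/// Map a pxar archive name to (metadata name, optional payload name).
/// Mirrors `helper::get_pxar_archive_names` from patch 21.
fn get_pxar_archive_names(archive_name: &str) -> (String, Option<String>) {
    // Server-side index names: either half of a split archive selects both.
    if let Some(base) = archive_name
        .strip_suffix(".mpxar.didx")
        .or_else(|| archive_name.strip_suffix(".ppxar.didx"))
    {
        return (
            format!("{base}.mpxar.didx"),
            Some(format!("{base}.ppxar.didx")),
        );
    }

    // Names as given on the command line, without the `.didx` suffix.
    if let Some(base) = archive_name
        .strip_suffix(".mpxar")
        .or_else(|| archive_name.strip_suffix(".ppxar"))
    {
        return (format!("{base}.mpxar"), Some(format!("{base}.ppxar")));
    }

    // Regular, non-split archive: no payload archive to attach.
    (archive_name.to_owned(), None)
}
```

Passing either half of a split archive resolves to the same pair, so
the restore path always reads metadata from the `.mpxar.didx` index and
payload data from the `.ppxar.didx` index.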

 proxmox-backup-client/src/main.rs | 39 ++++++++++++++++++++-----------
 1 file changed, 25 insertions(+), 14 deletions(-)

diff --git a/proxmox-backup-client/src/main.rs b/proxmox-backup-client/src/main.rs
index b81719dad..821777d66 100644
--- a/proxmox-backup-client/src/main.rs
+++ b/proxmox-backup-client/src/main.rs
@@ -1216,7 +1216,7 @@ async fn dump_image<W: Write>(
 fn parse_archive_type(name: &str) -> (String, ArchiveType) {
     if name.ends_with(".didx") || name.ends_with(".fidx") || name.ends_with(".blob") {
         (name.into(), archive_type(name).unwrap())
-    } else if name.ends_with(".pxar") {
+    } else if name.ends_with(".pxar") || name.ends_with(".mpxar") || name.ends_with(".ppxar") {
         (format!("{}.didx", name), ArchiveType::DynamicIndex)
     } else if name.ends_with(".img") {
         (format!("{}.fidx", name), ArchiveType::FixedIndex)
@@ -1450,20 +1450,15 @@ async fn restore(
                 .map_err(|err| format_err!("unable to pipe data - {}", err))?;
         }
     } else if archive_type == ArchiveType::DynamicIndex {
-        let index = client
-            .download_dynamic_index(&manifest, &archive_name)
-            .await?;
+        let (archive_name, payload_archive_name) = helper::get_pxar_archive_names(&archive_name);
 
-        let most_used = index.find_most_used_chunks(8);
-
-        let chunk_reader = RemoteChunkReader::new(
+        let mut reader = get_buffered_pxar_reader(
+            &archive_name,
             client.clone(),
-            crypt_config,
-            file_info.chunk_crypt_mode(),
-            most_used,
-        );
-
-        let mut reader = BufferedDynamicReader::new(index, chunk_reader);
+            &manifest,
+            crypt_config.clone(),
+        )
+        .await?;
 
         let on_error = if ignore_extract_device_errors {
             let handler: PxarErrorHandler = Box::new(move |err: Error| {
@@ -1518,8 +1513,21 @@ async fn restore(
         }
 
         if let Some(target) = target {
+            let decoder = if let Some(payload_archive_name) = payload_archive_name {
+                let payload_reader = get_buffered_pxar_reader(
+                    &payload_archive_name,
+                    client.clone(),
+                    &manifest,
+                    crypt_config.clone(),
+                )
+                .await?;
+                pxar::decoder::Decoder::from_std(reader, Some(payload_reader))?
+            } else {
+                pxar::decoder::Decoder::from_std(reader, None)?
+            };
+
             pbs_client::pxar::extract_archive(
-                pxar::decoder::Decoder::from_std(reader, None)?,
+                decoder,
                 Path::new(target),
                 feature_flags,
                 |path| {
@@ -1529,6 +1537,9 @@ async fn restore(
             )
             .map_err(|err| format_err!("error extracting archive - {:#}", err))?;
         } else {
+            if archive_name.ends_with(".mpxar.didx") || archive_name.ends_with(".ppxar.didx") {
+                bail!("unable to pipe split archive");
+            }
             let mut writer = std::fs::OpenOptions::new()
                 .write(true)
                 .open("/dev/stdout")
-- 
2.39.2




* [pbs-devel] [PATCH v5 proxmox-backup 23/62] tools: cover extension for split pxar archives
  2024-05-07 15:51 [pbs-devel] [PATCH v5 pxar proxmox-backup 00/62] fix #3174: improve file-level backup Christian Ebner
                   ` (21 preceding siblings ...)
  2024-05-07 15:52 ` [pbs-devel] [PATCH v5 proxmox-backup 22/62] client: restore: read payload from dedicated index Christian Ebner
@ 2024-05-07 15:52 ` Christian Ebner
  2024-05-07 15:52 ` [pbs-devel] [PATCH v5 proxmox-backup 24/62] restore: " Christian Ebner
                   ` (39 subsequent siblings)
  62 siblings, 0 replies; 67+ messages in thread
From: Christian Ebner @ 2024-05-07 15:52 UTC (permalink / raw)
  To: pbs-devel

Cover the additional `.mpxar` extension for the metadata archive and
the `.ppxar` extension for the payload data archive in the CLI
parameter completion callback.

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
changes since version 4:
- no changes
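
The resulting filter can be sketched as follows. `complete_from` and
`PXAR_INDEX_SUFFIXES` are hypothetical stand-ins for illustration; the
real callback uses `complete_server_file_name` and
`strip_server_file_extension`, as shown in the diff.

```rust
/// Index suffixes that denote a pxar archive on the server.
const PXAR_INDEX_SUFFIXES: [&str; 3] = [".pxar.didx", ".mpxar.didx", ".ppxar.didx"];

/// Keep only pxar index names and drop the trailing `.didx`.
/// Hypothetical stand-in for the completion callback, not the real one.
fn complete_from(server_files: &[&str]) -> Vec<String> {
    server_files
        .iter()
        .copied()
        // Accept regular and split pxar archive indices only.
        .filter(|name| PXAR_INDEX_SUFFIXES.iter().any(|s| name.ends_with(*s)))
        // Offer the name without the server-side index extension.
        .filter_map(|name| name.strip_suffix(".didx"))
        .map(str::to_owned)
        .collect()
}
```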

 pbs-client/src/tools/mod.rs | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/pbs-client/src/tools/mod.rs b/pbs-client/src/tools/mod.rs
index 1b0123a39..f8d3102d1 100644
--- a/pbs-client/src/tools/mod.rs
+++ b/pbs-client/src/tools/mod.rs
@@ -337,7 +337,10 @@ pub fn complete_pxar_archive_name(arg: &str, param: &HashMap<String, String>) ->
     complete_server_file_name(arg, param)
         .iter()
         .filter_map(|name| {
-            if name.ends_with(".pxar.didx") {
+            if name.ends_with(".pxar.didx")
+                || name.ends_with(".mpxar.didx")
+                || name.ends_with(".ppxar.didx")
+            {
                 Some(pbs_tools::format::strip_server_file_extension(name).to_owned())
             } else {
                 None
-- 
2.39.2




* [pbs-devel] [PATCH v5 proxmox-backup 24/62] restore: cover extension for split pxar archives
  2024-05-07 15:51 [pbs-devel] [PATCH v5 pxar proxmox-backup 00/62] fix #3174: improve file-level backup Christian Ebner
                   ` (22 preceding siblings ...)
  2024-05-07 15:52 ` [pbs-devel] [PATCH v5 proxmox-backup 23/62] tools: cover extension for split pxar archives Christian Ebner
@ 2024-05-07 15:52 ` Christian Ebner
  2024-05-07 15:52 ` [pbs-devel] [PATCH v5 proxmox-backup 25/62] client: mount: make split pxar archives mountable Christian Ebner
                   ` (38 subsequent siblings)
  62 siblings, 0 replies; 67+ messages in thread
From: Christian Ebner @ 2024-05-07 15:52 UTC (permalink / raw)
  To: pbs-devel

Cover the additional `.mpxar` extension for the metadata archive and
the `.ppxar` extension for the payload data of pxar archives written
as split archives.

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
changes since version 4:
- no changes

 proxmox-file-restore/src/main.rs | 16 +++++++++++++---
 1 file changed, 13 insertions(+), 3 deletions(-)

diff --git a/proxmox-file-restore/src/main.rs b/proxmox-file-restore/src/main.rs
index dbab69942..685ce34d9 100644
--- a/proxmox-file-restore/src/main.rs
+++ b/proxmox-file-restore/src/main.rs
@@ -75,7 +75,10 @@ fn parse_path(path: String, base64: bool) -> Result<ExtractPath, Error> {
         (file, path)
     };
 
-    if file.ends_with(".pxar.didx") {
+    if file.ends_with(".pxar.didx")
+        || file.ends_with(".mpxar.didx")
+        || file.ends_with(".ppxar.didx")
+    {
         Ok(ExtractPath::Pxar(file, path))
     } else if file.ends_with(".img.fidx") {
         Ok(ExtractPath::VM(file, path))
@@ -123,11 +126,18 @@ async fn list_files(
         ExtractPath::ListArchives => {
             let mut entries = vec![];
             for file in manifest.files() {
-                if !file.filename.ends_with(".pxar.didx") && !file.filename.ends_with(".img.fidx") {
+                if !file.filename.ends_with(".pxar.didx")
+                    && !file.filename.ends_with(".img.fidx")
+                    && !file.filename.ends_with(".mpxar.didx")
+                    && !file.filename.ends_with(".ppxar.didx")
+                {
                     continue;
                 }
                 let path = format!("/{}", file.filename);
-                let attr = if file.filename.ends_with(".pxar.didx") {
+                let attr = if file.filename.ends_with(".pxar.didx")
+                    || file.filename.ends_with(".mpxar.didx")
+                    || file.filename.ends_with(".ppxar.didx")
+                {
                     // a pxar file is a file archive, so it's root is also a directory root
                     Some(&DirEntryAttribute::Directory { start: 0 })
                 } else {
-- 
2.39.2




* [pbs-devel] [PATCH v5 proxmox-backup 25/62] client: mount: make split pxar archives mountable
  2024-05-07 15:51 [pbs-devel] [PATCH v5 pxar proxmox-backup 00/62] fix #3174: improve file-level backup Christian Ebner
                   ` (23 preceding siblings ...)
  2024-05-07 15:52 ` [pbs-devel] [PATCH v5 proxmox-backup 24/62] restore: " Christian Ebner
@ 2024-05-07 15:52 ` Christian Ebner
  2024-05-07 15:52 ` [pbs-devel] [PATCH v5 proxmox-backup 26/62] api: datastore: refactor getting local chunk reader Christian Ebner
                   ` (37 subsequent siblings)
  62 siblings, 0 replies; 67+ messages in thread
From: Christian Ebner @ 2024-05-07 15:52 UTC (permalink / raw)
  To: pbs-devel

Cover the cases where the pxar archive was uploaded as split payload
data and metadata streams. Instantiate the required reader and
decoder instances to access the metadata and payload data archives.

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
changes since version 4:
- no changes
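
A side note on the `payload_archive_name.as_ref().map(|x| x.as_str())`
expression in the diff below: the standard library's `Option::as_deref`
yields the same `Option<&str>` more briefly. A minimal sketch, where
`describe` is a hypothetical consumer taking an optional payload name:

```rust
/// Hypothetical consumer taking an optional payload archive name.
fn describe(archive: &str, payload: Option<&str>) -> String {
    match payload {
        Some(payload) => format!("{archive} + {payload}"),
        None => archive.to_owned(),
    }
}

/// Both conversions from `Option<String>` to `Option<&str>` agree.
fn conversions_agree(payload_archive_name: &Option<String>) -> bool {
    let verbose: Option<&str> = payload_archive_name.as_ref().map(|x| x.as_str());
    let concise: Option<&str> = payload_archive_name.as_deref();
    verbose == concise
}
```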

 proxmox-backup-client/src/mount.rs | 34 ++++++++++++++----------------
 1 file changed, 16 insertions(+), 18 deletions(-)

diff --git a/proxmox-backup-client/src/mount.rs b/proxmox-backup-client/src/mount.rs
index 67fd23468..dd9532fbe 100644
--- a/proxmox-backup-client/src/mount.rs
+++ b/proxmox-backup-client/src/mount.rs
@@ -21,17 +21,16 @@ use pbs_api_types::BackupNamespace;
 use pbs_client::tools::key_source::get_encryption_key_password;
 use pbs_client::{BackupReader, RemoteChunkReader};
 use pbs_datastore::cached_chunk_reader::CachedChunkReader;
-use pbs_datastore::dynamic_index::BufferedDynamicReader;
 use pbs_datastore::index::IndexFile;
 use pbs_key_config::load_and_decrypt_key;
 use pbs_tools::crypt_config::CryptConfig;
 use pbs_tools::json::required_string_param;
 
+use crate::helper;
 use crate::{
     complete_group_or_snapshot, complete_img_archive_name, complete_namespace,
     complete_pxar_archive_name, complete_repository, connect, dir_or_last_from_group,
-    extract_repository_from_value, optional_ns_param, record_repository, BufferedDynamicReadAt,
-    REPO_URL_SCHEMA,
+    extract_repository_from_value, optional_ns_param, record_repository, REPO_URL_SCHEMA,
 };
 
 #[sortable]
@@ -219,7 +218,10 @@ async fn mount_do(param: Value, pipe: Option<OwnedFd>) -> Result<Value, Error> {
         }
     };
 
-    let server_archive_name = if archive_name.ends_with(".pxar") {
+    let server_archive_name = if archive_name.ends_with(".pxar")
+        || archive_name.ends_with(".mpxar")
+        || archive_name.ends_with(".ppxar")
+    {
         if target.is_none() {
             bail!("use the 'mount' command to mount pxar archives");
         }
@@ -246,7 +248,9 @@ async fn mount_do(param: Value, pipe: Option<OwnedFd>) -> Result<Value, Error> {
     let (manifest, _) = client.download_manifest().await?;
     manifest.check_fingerprint(crypt_config.as_ref().map(Arc::as_ref))?;
 
-    let file_info = manifest.lookup_file_info(&server_archive_name)?;
+    let (archive_name, payload_archive_name) = helper::get_pxar_archive_names(&server_archive_name);
+
+    let file_info = manifest.lookup_file_info(&archive_name)?;
 
     let daemonize = || -> Result<(), Error> {
         if let Some(pipe) = pipe {
@@ -283,20 +287,14 @@ async fn mount_do(param: Value, pipe: Option<OwnedFd>) -> Result<Value, Error> {
         futures::future::select(interrupt_int.recv().boxed(), interrupt_term.recv().boxed());
 
     if server_archive_name.ends_with(".didx") {
-        let index = client
-            .download_dynamic_index(&manifest, &server_archive_name)
-            .await?;
-        let most_used = index.find_most_used_chunks(8);
-        let chunk_reader = RemoteChunkReader::new(
+        let decoder = helper::get_pxar_fuse_accessor(
+            &archive_name,
+            payload_archive_name.as_ref().map(|x| x.as_str()),
             client.clone(),
-            crypt_config,
-            file_info.chunk_crypt_mode(),
-            most_used,
-        );
-        let reader = BufferedDynamicReader::new(index, chunk_reader);
-        let archive_size = reader.archive_size();
-        let reader: pbs_pxar_fuse::Reader = Arc::new(BufferedDynamicReadAt::new(reader));
-        let decoder = pbs_pxar_fuse::Accessor::new(reader, archive_size, None).await?;
+            &manifest,
+            crypt_config.clone(),
+        )
+        .await?;
 
         let session =
             pbs_pxar_fuse::Session::mount(decoder, options, false, Path::new(target.unwrap()))
-- 
2.39.2




* [pbs-devel] [PATCH v5 proxmox-backup 26/62] api: datastore: refactor getting local chunk reader
  2024-05-07 15:51 [pbs-devel] [PATCH v5 pxar proxmox-backup 00/62] fix #3174: improve file-level backup Christian Ebner
                   ` (24 preceding siblings ...)
  2024-05-07 15:52 ` [pbs-devel] [PATCH v5 proxmox-backup 25/62] client: mount: make split pxar archives mountable Christian Ebner
@ 2024-05-07 15:52 ` Christian Ebner
  2024-05-07 15:52 ` [pbs-devel] [PATCH v5 proxmox-backup 27/62] api: datastore: attach optional payload " Christian Ebner
                   ` (36 subsequent siblings)
  62 siblings, 0 replies; 67+ messages in thread
From: Christian Ebner @ 2024-05-07 15:52 UTC (permalink / raw)
  To: pbs-devel

Move the code to get the local chunk reader to a dedicated function
to make it reusable. The same code is required to get the local chunk
reader for the payload stream for split stream archives.

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
changes since version 4:
- no changes

 src/api2/admin/datastore.rs | 39 ++++++++++++++++++++++++-------------
 1 file changed, 25 insertions(+), 14 deletions(-)

diff --git a/src/api2/admin/datastore.rs b/src/api2/admin/datastore.rs
index 068b6a61e..67330dd4f 100644
--- a/src/api2/admin/datastore.rs
+++ b/src/api2/admin/datastore.rs
@@ -1744,6 +1744,29 @@ pub const API_METHOD_PXAR_FILE_DOWNLOAD: ApiMethod = ApiMethod::new(
     &Permission::Anybody,
 );
 
+fn get_local_pxar_reader(
+    datastore: Arc<DataStore>,
+    manifest: &BackupManifest,
+    backup_dir: &BackupDir,
+    pxar_name: &str,
+) -> Result<(LocalDynamicReadAt<LocalChunkReader>, u64), Error> {
+    let mut path = datastore.base_path();
+    path.push(backup_dir.relative_path());
+    path.push(pxar_name);
+
+    let index = DynamicIndexReader::open(&path)
+        .map_err(|err| format_err!("unable to read dynamic index '{:?}' - {}", &path, err))?;
+
+    let (csum, size) = index.compute_csum();
+    manifest.verify_file(pxar_name, &csum, size)?;
+
+    let chunk_reader = LocalChunkReader::new(datastore, None, CryptMode::None);
+    let reader = BufferedDynamicReader::new(index, chunk_reader);
+    let archive_size = reader.archive_size();
+
+    Ok((LocalDynamicReadAt::new(reader), archive_size))
+}
+
 pub fn pxar_file_download(
     _parts: Parts,
     _req_body: Body,
@@ -1788,20 +1811,8 @@ pub fn pxar_file_download(
             }
         }
 
-        let mut path = datastore.base_path();
-        path.push(backup_dir.relative_path());
-        path.push(pxar_name);
-
-        let index = DynamicIndexReader::open(&path)
-            .map_err(|err| format_err!("unable to read dynamic index '{:?}' - {}", &path, err))?;
-
-        let (csum, size) = index.compute_csum();
-        manifest.verify_file(pxar_name, &csum, size)?;
-
-        let chunk_reader = LocalChunkReader::new(datastore, None, CryptMode::None);
-        let reader = BufferedDynamicReader::new(index, chunk_reader);
-        let archive_size = reader.archive_size();
-        let reader = LocalDynamicReadAt::new(reader);
+        let (reader, archive_size) =
+            get_local_pxar_reader(datastore.clone(), &manifest, &backup_dir, pxar_name)?;
 
         let decoder = Accessor::new(reader, archive_size, None).await?;
         let root = decoder.open_root().await?;
-- 
2.39.2




* [pbs-devel] [PATCH v5 proxmox-backup 27/62] api: datastore: attach optional payload chunk reader
  2024-05-07 15:51 [pbs-devel] [PATCH v5 pxar proxmox-backup 00/62] fix #3174: improve file-level backup Christian Ebner
                   ` (25 preceding siblings ...)
  2024-05-07 15:52 ` [pbs-devel] [PATCH v5 proxmox-backup 26/62] api: datastore: refactor getting local chunk reader Christian Ebner
@ 2024-05-07 15:52 ` Christian Ebner
  2024-05-07 15:52 ` [pbs-devel] [PATCH v5 proxmox-backup 28/62] catalog: shell: make split pxar archives accessible Christian Ebner
                   ` (35 subsequent siblings)
  62 siblings, 0 replies; 67+ messages in thread
From: Christian Ebner @ 2024-05-07 15:52 UTC (permalink / raw)
  To: pbs-devel

Attach the payload chunk reader for pxar archives which have been
uploaded using split streams for metadata and payload data.

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
changes since version 4:
- no changes
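
The name derivation used in the diff below can be sketched in
isolation. `payload_archive_name` is a hypothetical helper introduced
here for illustration only; the patch inlines the same logic.

```rust
/// Derive the payload index name from a metadata index name.
/// Returns `None` for anything that is not a `.mpxar.didx` name, in which
/// case the accessor is created without a payload reader.
fn payload_archive_name(pxar_name: &str) -> Option<String> {
    pxar_name
        .strip_suffix(".mpxar.didx")
        .map(|base| format!("{base}.ppxar.didx"))
}
```

Unlike the client-side `get_pxar_archive_names` helper, a `.ppxar.didx`
name yields `None` here, so this code path expects to be given the
metadata archive name.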

 src/api2/admin/datastore.rs | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/src/api2/admin/datastore.rs b/src/api2/admin/datastore.rs
index 67330dd4f..9e8a06671 100644
--- a/src/api2/admin/datastore.rs
+++ b/src/api2/admin/datastore.rs
@@ -1814,7 +1814,15 @@ pub fn pxar_file_download(
         let (reader, archive_size) =
             get_local_pxar_reader(datastore.clone(), &manifest, &backup_dir, pxar_name)?;
 
-        let decoder = Accessor::new(reader, archive_size, None).await?;
+        let decoder = if let Some(archive_base_name) = pxar_name.strip_suffix(".mpxar.didx") {
+            let payload_archive_name = format!("{archive_base_name}.ppxar.didx");
+            let (payload_reader, payload_size) =
+                get_local_pxar_reader(datastore, &manifest, &backup_dir, &payload_archive_name)?;
+            Accessor::new(reader, archive_size, Some((payload_reader, payload_size))).await?
+        } else {
+            Accessor::new(reader, archive_size, None).await?
+        };
+
         let root = decoder.open_root().await?;
         let path = OsStr::from_bytes(file_path).to_os_string();
         let file = root
-- 
2.39.2




* [pbs-devel] [PATCH v5 proxmox-backup 28/62] catalog: shell: make split pxar archives accessible
  2024-05-07 15:51 [pbs-devel] [PATCH v5 pxar proxmox-backup 00/62] fix #3174: improve file-level backup Christian Ebner
                   ` (26 preceding siblings ...)
  2024-05-07 15:52 ` [pbs-devel] [PATCH v5 proxmox-backup 27/62] api: datastore: attach optional payload " Christian Ebner
@ 2024-05-07 15:52 ` Christian Ebner
  2024-05-07 15:52 ` [pbs-devel] [PATCH v5 proxmox-backup 29/62] www: cover metadata extension for pxar archives Christian Ebner
                   ` (34 subsequent siblings)
  62 siblings, 0 replies; 67+ messages in thread
From: Christian Ebner @ 2024-05-07 15:52 UTC (permalink / raw)
  To: pbs-devel

Cover the cases where the pxar archive was uploaded as split payload
data and metadata streams. Instantiate the required reader and
decoder instances to access the metadata and payload data archives,
using the corresponding helper methods.
This allows restoring split metadata and payload stream pxar archives
via the catalog shell.

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
changes since version 4:
- no changes

 proxmox-backup-client/src/catalog.rs | 30 +++++++++++++---------------
 1 file changed, 14 insertions(+), 16 deletions(-)

diff --git a/proxmox-backup-client/src/catalog.rs b/proxmox-backup-client/src/catalog.rs
index db919477f..0f56e1224 100644
--- a/proxmox-backup-client/src/catalog.rs
+++ b/proxmox-backup-client/src/catalog.rs
@@ -14,12 +14,13 @@ use pbs_client::{BackupReader, RemoteChunkReader};
 use pbs_tools::crypt_config::CryptConfig;
 use pbs_tools::json::required_string_param;
 
+use crate::helper;
 use crate::{
     complete_backup_snapshot, complete_group_or_snapshot, complete_namespace,
     complete_pxar_archive_name, complete_repository, connect, crypto_parameters, decrypt_key,
     dir_or_last_from_group, extract_repository_from_value, format_key_source, optional_ns_param,
-    record_repository, BackupDir, BufferedDynamicReadAt, BufferedDynamicReader, CatalogReader,
-    DynamicIndexReader, IndexFile, Shell, CATALOG_NAME, KEYFD_SCHEMA, REPO_URL_SCHEMA,
+    record_repository, BackupDir, BufferedDynamicReader, CatalogReader, DynamicIndexReader,
+    IndexFile, Shell, CATALOG_NAME, KEYFD_SCHEMA, REPO_URL_SCHEMA,
 };
 
 #[api(
@@ -180,7 +181,10 @@ async fn catalog_shell(param: Value) -> Result<(), Error> {
         }
     };
 
-    let server_archive_name = if archive_name.ends_with(".pxar") {
+    let server_archive_name = if archive_name.ends_with(".pxar")
+        || archive_name.ends_with(".mpxar")
+        || archive_name.ends_with(".ppxar")
+    {
         format!("{}.didx", archive_name)
     } else {
         bail!("Can only mount pxar archives.");
@@ -205,22 +209,16 @@ async fn catalog_shell(param: Value) -> Result<(), Error> {
     let (manifest, _) = client.download_manifest().await?;
     manifest.check_fingerprint(crypt_config.as_ref().map(Arc::as_ref))?;
 
-    let index = client
-        .download_dynamic_index(&manifest, &server_archive_name)
-        .await?;
-    let most_used = index.find_most_used_chunks(8);
+    let (archive_name, payload_archive_name) = helper::get_pxar_archive_names(&server_archive_name);
 
-    let file_info = manifest.lookup_file_info(&server_archive_name)?;
-    let chunk_reader = RemoteChunkReader::new(
+    let decoder = helper::get_pxar_fuse_accessor(
+        &archive_name,
+        payload_archive_name.as_ref().map(|x| x.as_str()),
         client.clone(),
+        &manifest,
         crypt_config.clone(),
-        file_info.chunk_crypt_mode(),
-        most_used,
-    );
-    let reader = BufferedDynamicReader::new(index, chunk_reader);
-    let archive_size = reader.archive_size();
-    let reader: pbs_pxar_fuse::Reader = Arc::new(BufferedDynamicReadAt::new(reader));
-    let decoder = pbs_pxar_fuse::Accessor::new(reader, archive_size, None).await?;
+    )
+    .await?;
 
     client.download(CATALOG_NAME, &mut tmpfile).await?;
     let index = DynamicIndexReader::new(tmpfile)
-- 
2.39.2




* [pbs-devel] [PATCH v5 proxmox-backup 29/62] www: cover metadata extension for pxar archives
  2024-05-07 15:51 [pbs-devel] [PATCH v5 pxar proxmox-backup 00/62] fix #3174: improve file-level backup Christian Ebner
                   ` (27 preceding siblings ...)
  2024-05-07 15:52 ` [pbs-devel] [PATCH v5 proxmox-backup 28/62] catalog: shell: make split pxar archives accessible Christian Ebner
@ 2024-05-07 15:52 ` Christian Ebner
  2024-05-07 15:52 ` [pbs-devel] [PATCH v5 proxmox-backup 30/62] file restore: factor out getting pxar reader Christian Ebner
                   ` (33 subsequent siblings)
  62 siblings, 0 replies; 67+ messages in thread
From: Christian Ebner @ 2024-05-07 15:52 UTC (permalink / raw)
  To: pbs-devel

Allow accessing the pxar metadata archives for navigation and
download via the Proxmox Backup Server web UI.

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
changes since version 4:
- no changes

 www/datastore/Content.js | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/www/datastore/Content.js b/www/datastore/Content.js
index c2403ff9c..6dd1ab319 100644
--- a/www/datastore/Content.js
+++ b/www/datastore/Content.js
@@ -1050,7 +1050,7 @@ Ext.define('PBS.DataStoreContent', {
 		    tooltip: gettext('Browse'),
 		    getClass: (v, m, { data }) => {
 			if (
-			    (data.ty === 'file' && data.filename.endsWith('pxar.didx')) ||
+			    (data.ty === 'file' && (data.filename.endsWith('.pxar.didx') || data.filename.endsWith('.mpxar.didx'))) ||
 			    (data.ty === 'ns' && !data.root)
 			) {
 			    return 'fa fa-folder-open-o';
@@ -1058,7 +1058,9 @@ Ext.define('PBS.DataStoreContent', {
 			return 'pmx-hidden';
 		    },
 		    isActionDisabled: (v, r, c, i, { data }) =>
-			!(data.ty === 'file' && data.filename.endsWith('pxar.didx') && data['crypt-mode'] < 3) && data.ty !== 'ns',
+			!(data.ty === 'file' &&
+			(data.filename.endsWith('.pxar.didx') || data.filename.endsWith('.mpxar.didx')) &&
+			data['crypt-mode'] < 3) && data.ty !== 'ns',
 		},
 	    ],
 	},
-- 
2.39.2




* [pbs-devel] [PATCH v5 proxmox-backup 30/62] file restore: factor out getting pxar reader
  2024-05-07 15:51 [pbs-devel] [PATCH v5 pxar proxmox-backup 00/62] fix #3174: improve file-level backup Christian Ebner
                   ` (28 preceding siblings ...)
  2024-05-07 15:52 ` [pbs-devel] [PATCH v5 proxmox-backup 29/62] www: cover metadata extension for pxar archives Christian Ebner
@ 2024-05-07 15:52 ` Christian Ebner
  2024-05-07 15:52 ` [pbs-devel] [PATCH v5 proxmox-backup 31/62] file restore: cover split metadata and payload archives Christian Ebner
                   ` (32 subsequent siblings)
  62 siblings, 0 replies; 67+ messages in thread
From: Christian Ebner @ 2024-05-07 15:52 UTC (permalink / raw)
  To: pbs-devel

Factor out the logic to get the pxar reader into a dedicated function
so it can be reused to get the payload data archive reader instance.

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
changes since version 4:
- no changes

 proxmox-file-restore/src/main.rs | 44 ++++++++++++++++++++------------
 1 file changed, 28 insertions(+), 16 deletions(-)

diff --git a/proxmox-file-restore/src/main.rs b/proxmox-file-restore/src/main.rs
index 685ce34d9..8a11cff65 100644
--- a/proxmox-file-restore/src/main.rs
+++ b/proxmox-file-restore/src/main.rs
@@ -34,7 +34,7 @@ use pbs_client::{BackupReader, BackupRepository, RemoteChunkReader};
 use pbs_datastore::catalog::{ArchiveEntry, CatalogReader, DirEntryAttribute};
 use pbs_datastore::dynamic_index::{BufferedDynamicReader, LocalDynamicReadAt};
 use pbs_datastore::index::IndexFile;
-use pbs_datastore::CATALOG_NAME;
+use pbs_datastore::{BackupManifest, CATALOG_NAME};
 use pbs_key_config::decrypt_key;
 use pbs_tools::crypt_config::CryptConfig;
 
@@ -335,6 +335,31 @@ async fn list(
     Ok(())
 }
 
+async fn get_local_pxar_reader(
+    archive_name: &str,
+    client: Arc<BackupReader>,
+    manifest: &BackupManifest,
+    crypt_config: Option<Arc<CryptConfig>>,
+) -> Result<(LocalDynamicReadAt<RemoteChunkReader>, u64), Error> {
+    let index = client
+        .download_dynamic_index(&manifest, &archive_name)
+        .await?;
+    let most_used = index.find_most_used_chunks(8);
+
+    let file_info = manifest.lookup_file_info(&archive_name)?;
+    let chunk_reader = RemoteChunkReader::new(
+        client.clone(),
+        crypt_config,
+        file_info.chunk_crypt_mode(),
+        most_used,
+    );
+
+    let reader = BufferedDynamicReader::new(index, chunk_reader);
+    let archive_size = reader.archive_size();
+
+    Ok((LocalDynamicReadAt::new(reader), archive_size))
+}
+
 #[api(
     input: {
         properties: {
@@ -452,21 +477,8 @@ async fn extract(
 
     match path {
         ExtractPath::Pxar(archive_name, path) => {
-            let file_info = manifest.lookup_file_info(&archive_name)?;
-            let index = client
-                .download_dynamic_index(&manifest, &archive_name)
-                .await?;
-            let most_used = index.find_most_used_chunks(8);
-            let chunk_reader = RemoteChunkReader::new(
-                client.clone(),
-                crypt_config,
-                file_info.chunk_crypt_mode(),
-                most_used,
-            );
-            let reader = BufferedDynamicReader::new(index, chunk_reader);
-
-            let archive_size = reader.archive_size();
-            let reader = LocalDynamicReadAt::new(reader);
+            let (reader, archive_size) =
+                get_local_pxar_reader(&archive_name, client, &manifest, crypt_config).await?;
             let decoder = Accessor::new(reader, archive_size, None).await?;
             extract_to_target(decoder, &path, target, format, zstd).await?;
         }
-- 
2.39.2




* [pbs-devel] [PATCH v5 proxmox-backup 31/62] file restore: cover split metadata and payload archives
  2024-05-07 15:51 [pbs-devel] [PATCH v5 pxar proxmox-backup 00/62] fix #3174: improve file-level backup Christian Ebner
                   ` (29 preceding siblings ...)
  2024-05-07 15:52 ` [pbs-devel] [PATCH v5 proxmox-backup 30/62] file restore: factor out getting pxar reader Christian Ebner
@ 2024-05-07 15:52 ` Christian Ebner
  2024-05-07 15:52 ` [pbs-devel] [PATCH v5 proxmox-backup 32/62] file restore: show more error context when extraction fails Christian Ebner
                   ` (31 subsequent siblings)
  62 siblings, 0 replies; 67+ messages in thread
From: Christian Ebner @ 2024-05-07 15:52 UTC (permalink / raw)
  To: pbs-devel

Attach the payload data archive as input stream to the decoder
and accessor instances for split archives.
This allows restoring contents from split archives via the
`proxmox-file-restore extract` command by passing the metadata
archive name.

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
changes since version 4:
- no changes

 proxmox-file-restore/src/main.rs | 22 +++++++++++++++++++---
 1 file changed, 19 insertions(+), 3 deletions(-)

diff --git a/proxmox-file-restore/src/main.rs b/proxmox-file-restore/src/main.rs
index 8a11cff65..36a988708 100644
--- a/proxmox-file-restore/src/main.rs
+++ b/proxmox-file-restore/src/main.rs
@@ -477,9 +477,25 @@ async fn extract(
 
     match path {
         ExtractPath::Pxar(archive_name, path) => {
-            let (reader, archive_size) =
-                get_local_pxar_reader(&archive_name, client, &manifest, crypt_config).await?;
-            let decoder = Accessor::new(reader, archive_size, None).await?;
+            let (reader, archive_size) = get_local_pxar_reader(
+                &archive_name,
+                client.clone(),
+                &manifest,
+                crypt_config.clone(),
+            )
+            .await?;
+
+            let decoder = if let Some(archive_base_name) = archive_name.strip_suffix(".mpxar.didx")
+            {
+                let payload_archive_name = format!("{archive_base_name}.ppxar.didx");
+                let (payload_reader, payload_size) =
+                    get_local_pxar_reader(&payload_archive_name, client, &manifest, crypt_config)
+                        .await?;
+                Accessor::new(reader, archive_size, Some((payload_reader, payload_size))).await?
+            } else {
+                Accessor::new(reader, archive_size, None).await?
+            };
+
             extract_to_target(decoder, &path, target, format, zstd).await?;
         }
         ExtractPath::VM(file, path) => {
-- 
2.39.2



_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel



* [pbs-devel] [PATCH v5 proxmox-backup 32/62] file restore: show more error context when extraction fails
  2024-05-07 15:51 [pbs-devel] [PATCH v5 pxar proxmox-backup 00/62] fix #3174: improve file-level backup Christian Ebner
                   ` (30 preceding siblings ...)
  2024-05-07 15:52 ` [pbs-devel] [PATCH v5 proxmox-backup 31/62] file restore: cover split metadata and payload archives Christian Ebner
@ 2024-05-07 15:52 ` Christian Ebner
  2024-05-07 15:52 ` [pbs-devel] [PATCH v5 proxmox-backup 33/62] pxar: add optional payload input for archive restore Christian Ebner
                   ` (30 subsequent siblings)
  62 siblings, 0 replies; 67+ messages in thread
From: Christian Ebner @ 2024-05-07 15:52 UTC (permalink / raw)
  To: pbs-devel

Otherwise the context swallows the actual, underlying error message.

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
changes since version 4:
- no changes
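
To illustrate why a bare context message can hide the cause, here is a
std-only sketch. The patch itself uses anyhow's alternate format `{err:#}`,
which prints the chain of causes; the `WithContext` type and `format_chain`
function below are hypothetical stand-ins written for this example only.

```rust
use std::error::Error;
use std::fmt;

/// Hypothetical error type wrapping a lower-level cause with a context message.
#[derive(Debug)]
struct WithContext {
    context: String,
    source: Box<dyn Error + 'static>,
}

impl fmt::Display for WithContext {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Only the context is printed; the cause is reachable via `source()`.
        write!(f, "{}", self.context)
    }
}

impl Error for WithContext {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        Some(&*self.source)
    }
}

/// Join an error and all of its causes into a single line.
fn format_chain(err: &(dyn Error + 'static)) -> String {
    let mut message = err.to_string();
    let mut current = err.source();
    while let Some(cause) = current {
        message.push_str(": ");
        message.push_str(&cause.to_string());
        current = cause.source();
    }
    message
}
```

Printing the error with plain `to_string()` yields only the outer context,
while walking the chain also surfaces the underlying message.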

 proxmox-file-restore/src/main.rs | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/proxmox-file-restore/src/main.rs b/proxmox-file-restore/src/main.rs
index 36a988708..36dd14ff4 100644
--- a/proxmox-file-restore/src/main.rs
+++ b/proxmox-file-restore/src/main.rs
@@ -496,7 +496,9 @@ async fn extract(
                 Accessor::new(reader, archive_size, None).await?
             };
 
-            extract_to_target(decoder, &path, target, format, zstd).await?;
+            extract_to_target(decoder, &path, target, format, zstd)
+                .await
+                .map_err(|err| format_err!("error extracting archive - {err:#}"))?;
         }
         ExtractPath::VM(file, path) => {
             let details = SnapRestoreDetails {
-- 
2.39.2




* [pbs-devel] [PATCH v5 proxmox-backup 33/62] pxar: add optional payload input for archive restore
  2024-05-07 15:51 [pbs-devel] [PATCH v5 pxar proxmox-backup 00/62] fix #3174: improve file-level backup Christian Ebner
                   ` (31 preceding siblings ...)
  2024-05-07 15:52 ` [pbs-devel] [PATCH v5 proxmox-backup 32/62] file restore: show more error context when extraction fails Christian Ebner
@ 2024-05-07 15:52 ` Christian Ebner
  2024-05-07 15:52 ` [pbs-devel] [PATCH v5 proxmox-backup 34/62] pxar: add more context to extraction error Christian Ebner
                   ` (29 subsequent siblings)
  62 siblings, 0 replies; 67+ messages in thread
From: Christian Ebner @ 2024-05-07 15:52 UTC (permalink / raw)
  To: pbs-devel

Allow passing an optional payload input to the restore for cases where the
regular file payloads are stored in a split archive.

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
changes since version 4:
- no changes
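
The optional second reader is passed as `Option<&mut R>`. A minimal sketch of
that calling pattern, using in-memory cursors instead of files; `read_all`
and `restore` are hypothetical names and merely stand in for the decoder and
the CLI handler.

```rust
use std::io::{Cursor, Read};

/// Hypothetical stand-in for the decoder: drains the metadata stream and,
/// if present, the separate payload stream, returning both byte counts.
fn read_all<R: Read>(
    reader: &mut R,
    payload_reader: Option<&mut R>,
) -> std::io::Result<(usize, Option<usize>)> {
    let mut buf = Vec::new();
    let meta_len = reader.read_to_end(&mut buf)?;
    let payload_len = match payload_reader {
        Some(payload) => {
            buf.clear();
            Some(payload.read_to_end(&mut buf)?)
        }
        None => None,
    };
    Ok((meta_len, payload_len))
}

/// Create the optional payload reader only if a payload input was given.
fn restore(meta: &[u8], payload: Option<&[u8]>) -> std::io::Result<(usize, Option<usize>)> {
    let mut reader = Cursor::new(meta);
    // The `Option<Cursor<_>>` is owned here; `as_mut()` lends it out
    // as `Option<&mut Cursor<_>>` for the duration of the call.
    let mut payload_reader = payload.map(Cursor::new);
    read_all(&mut reader, payload_reader.as_mut())
}
```

Keeping the owned reader in the caller and lending it via `as_mut()` is the
same shape as `payload_reader.as_mut()` in the diff below.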

 pxar-bin/src/main.rs | 24 +++++++++++++++++++++---
 1 file changed, 21 insertions(+), 3 deletions(-)

diff --git a/pxar-bin/src/main.rs b/pxar-bin/src/main.rs
index 34944cf16..ac0acad0e 100644
--- a/pxar-bin/src/main.rs
+++ b/pxar-bin/src/main.rs
@@ -25,9 +25,10 @@ fn extract_archive_from_reader<R: std::io::Read>(
     target: &str,
     feature_flags: Flags,
     options: PxarExtractOptions,
+    payload_reader: Option<&mut R>,
 ) -> Result<(), Error> {
     pbs_client::pxar::extract_archive(
-        pxar::decoder::Decoder::from_std(reader, None)?,
+        pxar::decoder::Decoder::from_std(reader, payload_reader)?,
         Path::new(target),
         feature_flags,
         |path| {
@@ -120,6 +121,10 @@ fn extract_archive_from_reader<R: std::io::Read>(
                 optional: true,
                 default: false,
             },
+            "payload-input": {
+                description: "'ppxar' payload input data file to restore split archive.",
+                optional: true,
+            },
         },
     },
 )]
@@ -142,6 +147,7 @@ fn extract_archive(
     no_fifos: bool,
     no_sockets: bool,
     strict: bool,
+    payload_input: Option<String>,
 ) -> Result<(), Error> {
     let mut feature_flags = Flags::DEFAULT;
     if no_xattrs {
@@ -220,12 +226,24 @@ fn extract_archive(
     if archive == "-" {
         let stdin = std::io::stdin();
         let mut reader = stdin.lock();
-        extract_archive_from_reader(&mut reader, target, feature_flags, options)?;
+        extract_archive_from_reader(&mut reader, target, feature_flags, options, None)?;
     } else {
         log::debug!("PXAR extract: {}", archive);
         let file = std::fs::File::open(archive)?;
         let mut reader = std::io::BufReader::new(file);
-        extract_archive_from_reader(&mut reader, target, feature_flags, options)?;
+        let mut payload_reader = if let Some(payload_input) = payload_input {
+            let file = std::fs::File::open(payload_input)?;
+            Some(std::io::BufReader::new(file))
+        } else {
+            None
+        };
+        extract_archive_from_reader(
+            &mut reader,
+            target,
+            feature_flags,
+            options,
+            payload_reader.as_mut(),
+        )?;
     }
 
     if !was_ok.load(Ordering::Acquire) {
-- 
2.39.2




* [pbs-devel] [PATCH v5 proxmox-backup 34/62] pxar: add more context to extraction error
  2024-05-07 15:51 [pbs-devel] [PATCH v5 pxar proxmox-backup 00/62] fix #3174: improve file-level backup Christian Ebner
                   ` (32 preceding siblings ...)
  2024-05-07 15:52 ` [pbs-devel] [PATCH v5 proxmox-backup 33/62] pxar: add optional payload input for archive restore Christian Ebner
@ 2024-05-07 15:52 ` Christian Ebner
  2024-05-07 15:52 ` [pbs-devel] [PATCH v5 proxmox-backup 35/62] client: pxar: include payload offset in entry listing Christian Ebner
                   ` (28 subsequent siblings)
  62 siblings, 0 replies; 67+ messages in thread
From: Christian Ebner @ 2024-05-07 15:52 UTC (permalink / raw)
  To: pbs-devel

Show more of the extraction error context provided by the pxar decoder.

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
changes since version 4:
- no changes

 pxar-bin/src/main.rs | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/pxar-bin/src/main.rs b/pxar-bin/src/main.rs
index ac0acad0e..44a6fa8a1 100644
--- a/pxar-bin/src/main.rs
+++ b/pxar-bin/src/main.rs
@@ -226,7 +226,8 @@ fn extract_archive(
     if archive == "-" {
         let stdin = std::io::stdin();
         let mut reader = stdin.lock();
-        extract_archive_from_reader(&mut reader, target, feature_flags, options, None)?;
+        extract_archive_from_reader(&mut reader, target, feature_flags, options, None)
+            .map_err(|err| format_err!("error extracting archive - {err:#}"))?;
     } else {
         log::debug!("PXAR extract: {}", archive);
         let file = std::fs::File::open(archive)?;
@@ -243,7 +244,8 @@ fn extract_archive(
             feature_flags,
             options,
             payload_reader.as_mut(),
-        )?;
+        )
+        .map_err(|err| format_err!("error extracting archive - {err:#}"))?
     }
 
     if !was_ok.load(Ordering::Acquire) {
-- 
2.39.2




* [pbs-devel] [PATCH v5 proxmox-backup 35/62] client: pxar: include payload offset in entry listing
  2024-05-07 15:51 [pbs-devel] [PATCH v5 pxar proxmox-backup 00/62] fix #3174: improve file-level backup Christian Ebner
                   ` (33 preceding siblings ...)
  2024-05-07 15:52 ` [pbs-devel] [PATCH v5 proxmox-backup 34/62] pxar: add more context to extraction error Christian Ebner
@ 2024-05-07 15:52 ` Christian Ebner
  2024-05-07 15:52 ` [pbs-devel] [PATCH v5 proxmox-backup 36/62] pxar: show padding in debug output on archive list Christian Ebner
                   ` (27 subsequent siblings)
  62 siblings, 0 replies; 67+ messages in thread
From: Christian Ebner @ 2024-05-07 15:52 UTC (permalink / raw)
  To: pbs-devel

Also display the payload offset in the listing output when the regular
file entry has a payload reference rather than the payload encoded in
the archive. This allows for debugging by inspecting the raw payload
data file at the given offset.

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
changes since version 4:
- no changes
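
A reduced sketch of the listing behaviour described above: the offset is
appended only for entries that carry a payload reference. The function
`format_listing_line` and its reduced set of columns are hypothetical; the
real formatter also prints mode, owner and mtime.

```rust
/// Hypothetical, reduced listing line: size, path and an optional payload offset.
fn format_listing_line(size: u64, path: &str, payload_offset: Option<u64>) -> String {
    let mut line = format!("{size:>8} {path:?}");
    if let Some(offset) = payload_offset {
        // Only entries referencing the payload archive carry an offset.
        line.push_str(&format!(" {offset}"));
    }
    line
}
```

Building the common prefix once and appending the offset avoids duplicating
the format string; whether that is preferable to the two explicit `format!`
calls in the patch is a matter of taste.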

 pbs-client/src/pxar/tools.rs | 116 ++++++++++++++++++++++++-----------
 1 file changed, 80 insertions(+), 36 deletions(-)

diff --git a/pbs-client/src/pxar/tools.rs b/pbs-client/src/pxar/tools.rs
index 0cfbaf5b9..459951d50 100644
--- a/pbs-client/src/pxar/tools.rs
+++ b/pbs-client/src/pxar/tools.rs
@@ -128,25 +128,42 @@ pub fn format_single_line_entry(entry: &Entry) -> String {
 
     let meta = entry.metadata();
 
-    let (size, link) = match entry.kind() {
-        EntryKind::File { size, .. } => (format!("{}", *size), String::new()),
-        EntryKind::Symlink(link) => ("0".to_string(), format!(" -> {:?}", link.as_os_str())),
-        EntryKind::Hardlink(link) => ("0".to_string(), format!(" -> {:?}", link.as_os_str())),
-        EntryKind::Device(dev) => (format!("{},{}", dev.major, dev.minor), String::new()),
-        _ => ("0".to_string(), String::new()),
+    let (size, link, payload_offset) = match entry.kind() {
+        EntryKind::File {
+            size,
+            payload_offset,
+            ..
+        } => (format!("{}", *size), String::new(), *payload_offset),
+        EntryKind::Symlink(link) => ("0".to_string(), format!(" -> {:?}", link.as_os_str()), None),
+        EntryKind::Hardlink(link) => ("0".to_string(), format!(" -> {:?}", link.as_os_str()), None),
+        EntryKind::Device(dev) => (format!("{},{}", dev.major, dev.minor), String::new(), None),
+        _ => ("0".to_string(), String::new(), None),
     };
 
     let owner_string = format!("{}/{}", meta.stat.uid, meta.stat.gid);
 
-    format!(
-        "{} {:<13} {} {:>8} {:?}{}",
-        mode_string,
-        owner_string,
-        format_mtime(&meta.stat.mtime),
-        size,
-        entry.path(),
-        link,
-    )
+    if let Some(offset) = payload_offset {
+        format!(
+            "{} {:<13} {} {:>8} {:?}{} {}",
+            mode_string,
+            owner_string,
+            format_mtime(&meta.stat.mtime),
+            size,
+            entry.path(),
+            link,
+            offset,
+        )
+    } else {
+        format!(
+            "{} {:<13} {} {:>8} {:?}{}",
+            mode_string,
+            owner_string,
+            format_mtime(&meta.stat.mtime),
+            size,
+            entry.path(),
+            link,
+        )
+    }
 }
 
 pub fn format_multi_line_entry(entry: &Entry) -> String {
@@ -154,17 +171,23 @@ pub fn format_multi_line_entry(entry: &Entry) -> String {
 
     let meta = entry.metadata();
 
-    let (size, link, type_name) = match entry.kind() {
-        EntryKind::File { size, .. } => (format!("{}", *size), String::new(), "file"),
+    let (size, link, type_name, payload_offset) = match entry.kind() {
+        EntryKind::File {
+            size,
+            payload_offset,
+            ..
+        } => (format!("{}", *size), String::new(), "file", *payload_offset),
         EntryKind::Symlink(link) => (
             "0".to_string(),
             format!(" -> {:?}", link.as_os_str()),
             "symlink",
+            None,
         ),
         EntryKind::Hardlink(link) => (
             "0".to_string(),
             format!(" -> {:?}", link.as_os_str()),
             "symlink",
+            None,
         ),
         EntryKind::Device(dev) => (
             format!("{},{}", dev.major, dev.minor),
@@ -176,11 +199,12 @@ pub fn format_multi_line_entry(entry: &Entry) -> String {
             } else {
                 "device"
             },
+            None,
         ),
-        EntryKind::Socket => ("0".to_string(), String::new(), "socket"),
-        EntryKind::Fifo => ("0".to_string(), String::new(), "fifo"),
-        EntryKind::Directory => ("0".to_string(), String::new(), "directory"),
-        EntryKind::GoodbyeTable => ("0".to_string(), String::new(), "bad entry"),
+        EntryKind::Socket => ("0".to_string(), String::new(), "socket", None),
+        EntryKind::Fifo => ("0".to_string(), String::new(), "fifo", None),
+        EntryKind::Directory => ("0".to_string(), String::new(), "directory", None),
+        EntryKind::GoodbyeTable => ("0".to_string(), String::new(), "bad entry", None),
     };
 
     let file_name = match std::str::from_utf8(entry.path().as_os_str().as_bytes()) {
@@ -188,19 +212,39 @@ pub fn format_multi_line_entry(entry: &Entry) -> String {
         Err(_) => std::borrow::Cow::Owned(format!("{:?}", entry.path())),
     };
 
-    format!(
-        "  File: {}{}\n  \
-           Size: {:<13} Type: {}\n\
-         Access: ({:o}/{})  Uid: {:<5} Gid: {:<5}\n\
-         Modify: {}\n",
-        file_name,
-        link,
-        size,
-        type_name,
-        meta.file_mode(),
-        mode_string,
-        meta.stat.uid,
-        meta.stat.gid,
-        format_mtime(&meta.stat.mtime),
-    )
+    if let Some(offset) = payload_offset {
+        format!(
+            "  File: {}{}\n  \
+               Size: {:<13} Type: {}\n\
+             Access: ({:o}/{})  Uid: {:<5} Gid: {:<5}\n\
+             Modify: {}\n\
+             PayloadOffset: {}\n",
+            file_name,
+            link,
+            size,
+            type_name,
+            meta.file_mode(),
+            mode_string,
+            meta.stat.uid,
+            meta.stat.gid,
+            format_mtime(&meta.stat.mtime),
+            offset,
+        )
+    } else {
+        format!(
+            "  File: {}{}\n  \
+               Size: {:<13} Type: {}\n\
+             Access: ({:o}/{})  Uid: {:<5} Gid: {:<5}\n\
+             Modify: {}\n",
+            file_name,
+            link,
+            size,
+            type_name,
+            meta.file_mode(),
+            mode_string,
+            meta.stat.uid,
+            meta.stat.gid,
+            format_mtime(&meta.stat.mtime),
+        )
+    }
 }
-- 
2.39.2




* [pbs-devel] [PATCH v5 proxmox-backup 36/62] pxar: show padding in debug output on archive list
  2024-05-07 15:51 [pbs-devel] [PATCH v5 pxar proxmox-backup 00/62] fix #3174: improve file-level backup Christian Ebner
                   ` (34 preceding siblings ...)
  2024-05-07 15:52 ` [pbs-devel] [PATCH v5 proxmox-backup 35/62] client: pxar: include payload offset in entry listing Christian Ebner
@ 2024-05-07 15:52 ` Christian Ebner
  2024-05-07 15:52 ` [pbs-devel] [PATCH v5 proxmox-backup 37/62] datastore: dynamic index: add method to get digest Christian Ebner
                   ` (26 subsequent siblings)
  62 siblings, 0 replies; 67+ messages in thread
From: Christian Ebner @ 2024-05-07 15:52 UTC (permalink / raw)
  To: pbs-devel

In addition to the entries, also show the padding encountered in-between
referenced payloads.

Example invocation: `PXAR_LOG=debug pxar list archive.mpxar`

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
changes since version 4:
- no changes
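
The padding arithmetic can be sketched in isolation. The constant
`HEADER_SIZE` is an assumption (a header made of two `u64` fields); the patch
uses `size_of::<pxar::format::Header>()` instead. The function
`padding_between` is hypothetical.

```rust
/// Assumed size of the header preceding each payload (two `u64` fields).
/// The patch derives this from `pxar::format::Header` instead.
const HEADER_SIZE: u64 = 16;

/// Given `(payload_offset, size)` pairs in archive order, return the padding
/// found in front of each entry after the first one.
fn padding_between(entries: &[(u64, u64)]) -> Vec<u64> {
    let mut last_end: Option<u64> = None;
    let mut padding = Vec::new();
    for &(offset, size) in entries {
        if let Some(end) = last_end {
            // saturating_sub guards against unordered input in this sketch
            padding.push(offset.saturating_sub(end));
        }
        last_end = Some(offset + size + HEADER_SIZE);
    }
    padding
}
```

For example, an entry at offset 0 with 100 bytes of payload ends at 116, so a
following entry at offset 116 has no padding in front of it.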

 pxar-bin/src/main.rs | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/pxar-bin/src/main.rs b/pxar-bin/src/main.rs
index 44a6fa8a1..58c9d2cfd 100644
--- a/pxar-bin/src/main.rs
+++ b/pxar-bin/src/main.rs
@@ -9,6 +9,7 @@ use std::sync::Arc;
 use anyhow::{bail, format_err, Error};
 use futures::future::FutureExt;
 use futures::select;
+use pxar::EntryKind;
 use tokio::signal::unix::{signal, SignalKind};
 
 use pathpatterns::{MatchEntry, MatchType, PatternFlag};
@@ -456,10 +457,28 @@ async fn mount_archive(archive: String, mountpoint: String, verbose: bool) -> Re
 )]
 /// List the contents of an archive.
 fn dump_archive(archive: String) -> Result<(), Error> {
+    let mut last = None;
     for entry in pxar::decoder::Decoder::open(archive)? {
         let entry = entry?;
 
         if log::log_enabled!(log::Level::Debug) {
+            match entry.kind() {
+                EntryKind::File {
+                    payload_offset: Some(offset),
+                    size,
+                    ..
+                } => {
+                    if let Some(last) = last {
+                        let skipped = offset - last;
+                        if skipped > 0 {
+                            log::debug!("Encountered padding of {skipped} bytes");
+                        }
+                    }
+                    last = Some(offset + size + std::mem::size_of::<pxar::format::Header>() as u64);
+                }
+                _ => (),
+            }
+
             log::debug!("{}", format_single_line_entry(&entry));
         } else {
             log::info!("{:?}", entry.path());
-- 
2.39.2




* [pbs-devel] [PATCH v5 proxmox-backup 37/62] datastore: dynamic index: add method to get digest
  2024-05-07 15:51 [pbs-devel] [PATCH v5 pxar proxmox-backup 00/62] fix #3174: improve file-level backup Christian Ebner
                   ` (35 preceding siblings ...)
  2024-05-07 15:52 ` [pbs-devel] [PATCH v5 proxmox-backup 36/62] pxar: show padding in debug output on archive list Christian Ebner
@ 2024-05-07 15:52 ` Christian Ebner
  2024-05-07 15:52 ` [pbs-devel] [PATCH v5 proxmox-backup 38/62] client: pxar: helper for lookup of reusable dynamic entries Christian Ebner
                   ` (25 subsequent siblings)
  62 siblings, 0 replies; 67+ messages in thread
From: Christian Ebner @ 2024-05-07 15:52 UTC (permalink / raw)
  To: pbs-devel

In preparation for injecting reused payload chunks into payload streams
for regular files with unchanged metadata. Allows getting the digest
of a dynamic index entry in order to construct a reusable dynamic entry
from it.

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
changes since version 4:
- no changes

 pbs-datastore/src/dynamic_index.rs | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/pbs-datastore/src/dynamic_index.rs b/pbs-datastore/src/dynamic_index.rs
index 71a5082e1..b8047b5b1 100644
--- a/pbs-datastore/src/dynamic_index.rs
+++ b/pbs-datastore/src/dynamic_index.rs
@@ -72,6 +72,11 @@ impl DynamicEntry {
     pub fn end(&self) -> u64 {
         u64::from_le(self.end_le)
     }
+
+    #[inline]
+    pub fn digest(&self) -> [u8; 32] {
+        self.digest
+    }
 }
 
 pub struct DynamicIndexReader {
-- 
2.39.2




* [pbs-devel] [PATCH v5 proxmox-backup 38/62] client: pxar: helper for lookup of reusable dynamic entries
  2024-05-07 15:51 [pbs-devel] [PATCH v5 pxar proxmox-backup 00/62] fix #3174: improve file-level backup Christian Ebner
                   ` (36 preceding siblings ...)
  2024-05-07 15:52 ` [pbs-devel] [PATCH v5 proxmox-backup 37/62] datastore: dynamic index: add method to get digest Christian Ebner
@ 2024-05-07 15:52 ` Christian Ebner
  2024-05-07 15:52 ` [pbs-devel] [PATCH v5 proxmox-backup 39/62] upload stream: implement reused chunk injector Christian Ebner
                   ` (24 subsequent siblings)
  62 siblings, 0 replies; 67+ messages in thread
From: Christian Ebner @ 2024-05-07 15:52 UTC (permalink / raw)
  To: pbs-devel

The helper method allows looking up the entries of a dynamic index
which fully cover a given offset range. Further, the helper returns
the start padding, from the start offset of the first dynamic index
entry to the start offset of the given range, and the end padding.

This will be used to look up the size and digest of chunks covering the
payload range of a regular file, in order to re-use found chunks by
indexing them in the archive's index file instead of re-encoding the
payload.

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
changes since version 4:
- no changes
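
A simplified model of the lookup, operating on a plain slice of chunk end
offsets rather than a `DynamicIndexReader`. The function `lookup_chunks` is
hypothetical. Note that this sketch stops at the first chunk whose end
reaches the end of the range (`<=`), and returns `None` if the range starts
past the last chunk; both are choices made for this example and may differ
from the helper in the patch.

```rust
use std::ops::Range;

/// Hypothetical, simplified lookup: `chunk_ends[i]` is the end offset of
/// chunk `i`. Returns the sizes of the chunks covering `range`, plus the
/// start and end padding.
fn lookup_chunks(chunk_ends: &[u64], range: Range<u64>) -> Option<(Vec<u64>, u64, u64)> {
    // Index of the first chunk whose end lies beyond the start of the range.
    let start = chunk_ends.partition_point(|&end| end <= range.start);
    if start == chunk_ends.len() {
        return None; // range starts past the last chunk
    }

    let mut prev_end = if start == 0 { 0 } else { chunk_ends[start - 1] };
    let padding_start = range.start - prev_end;
    let mut padding_end = 0;
    let mut sizes = Vec::new();

    for &end in &chunk_ends[start..] {
        sizes.push(end - prev_end);
        if range.end <= end {
            padding_end = end - range.end;
            break;
        }
        prev_end = end;
    }
    Some((sizes, padding_start, padding_end))
}
```

With chunks ending at 100, 250 and 400, the range 120..300 is covered by the
second and third chunk, with 20 bytes of start padding and 100 bytes of end
padding.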

 pbs-client/src/pxar/create.rs | 70 +++++++++++++++++++++++++++++++++++
 1 file changed, 70 insertions(+)

diff --git a/pbs-client/src/pxar/create.rs b/pbs-client/src/pxar/create.rs
index 2bb5a6253..0f32efcce 100644
--- a/pbs-client/src/pxar/create.rs
+++ b/pbs-client/src/pxar/create.rs
@@ -2,6 +2,7 @@ use std::collections::{HashMap, HashSet};
 use std::ffi::{CStr, CString, OsStr};
 use std::fmt;
 use std::io::{self, Read};
+use std::ops::Range;
 use std::os::unix::ffi::OsStrExt;
 use std::os::unix::io::{AsRawFd, FromRawFd, IntoRawFd, OwnedFd, RawFd};
 use std::path::{Path, PathBuf};
@@ -25,6 +26,8 @@ use proxmox_lang::c_str;
 use proxmox_sys::fs::{self, acl, xattr};
 
 use pbs_datastore::catalog::BackupCatalogWriter;
+use pbs_datastore::dynamic_index::DynamicIndexReader;
+use pbs_datastore::index::IndexFile;
 
 use crate::pxar::metadata::errno_is_unsupported;
 use crate::pxar::tools::assert_single_path_component;
@@ -791,6 +794,73 @@ impl Archiver {
     }
 }
 
+/// Dynamic entry reusable by payload references
+#[derive(Clone, Debug)]
+#[repr(C)]
+pub struct ReusableDynamicEntry {
+    size: u64,
+    padding: u64,
+    digest: [u8; 32],
+}
+
+impl ReusableDynamicEntry {
+    #[inline]
+    pub fn size(&self) -> u64 {
+        self.size
+    }
+
+    #[inline]
+    pub fn digest(&self) -> [u8; 32] {
+        self.digest
+    }
+}
+
+/// List of dynamic entries containing the data given by an offset range
+fn lookup_dynamic_entries(
+    index: &DynamicIndexReader,
+    range: Range<u64>,
+) -> Result<(Vec<ReusableDynamicEntry>, u64, u64), Error> {
+    let end_idx = index.index_count() - 1;
+    let chunk_end = index.chunk_end(end_idx);
+    let start = index.binary_search(0, 0, end_idx, chunk_end, range.start)?;
+
+    let mut prev_end = if start == 0 {
+        0
+    } else {
+        index.chunk_end(start - 1)
+    };
+    let padding_start = range.start - prev_end;
+    let mut padding_end = 0;
+
+    let mut indices = Vec::new();
+    for dynamic_entry in &index.index()[start..] {
+        let end = dynamic_entry.end();
+
+        let reusable_dynamic_entry = ReusableDynamicEntry {
+            size: (end - prev_end),
+            padding: 0,
+            digest: dynamic_entry.digest(),
+        };
+        indices.push(reusable_dynamic_entry);
+
+        if range.end < end {
+            padding_end = end - range.end;
+            break;
+        }
+        prev_end = end;
+    }
+
+    if let Some(first) = indices.first_mut() {
+        first.padding += padding_start;
+    }
+
+    if let Some(last) = indices.last_mut() {
+        last.padding += padding_end;
+    }
+
+    Ok((indices, padding_start, padding_end))
+}
+
 fn get_metadata(
     fd: RawFd,
     stat: &FileStat,
-- 
2.39.2




* [pbs-devel] [PATCH v5 proxmox-backup 39/62] upload stream: implement reused chunk injector
  2024-05-07 15:51 [pbs-devel] [PATCH v5 pxar proxmox-backup 00/62] fix #3174: improve file-level backup Christian Ebner
                   ` (37 preceding siblings ...)
  2024-05-07 15:52 ` [pbs-devel] [PATCH v5 proxmox-backup 38/62] client: pxar: helper for lookup of reusable dynamic entries Christian Ebner
@ 2024-05-07 15:52 ` Christian Ebner
  2024-05-07 15:52 ` [pbs-devel] [PATCH v5 proxmox-backup 40/62] client: chunk stream: add struct to hold injection state Christian Ebner
                   ` (23 subsequent siblings)
  62 siblings, 0 replies; 67+ messages in thread
From: Christian Ebner @ 2024-05-07 15:52 UTC (permalink / raw)
  To: pbs-devel

In order to be included in the backup's index file, reused payload
chunks have to be injected into the payload upload stream at a
forced boundary. The chunker forces a chunk boundary and sends the
list of reusable dynamic entries to be uploaded.

This implements the logic to receive these dynamic entries via the
corresponding communication channel from the chunker and inject the
entries into the backup upload stream by looking for the matching
chunk boundary, already forced by the chunker.

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
changes since version 4:
- no changes
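
The boundary matching can be modelled synchronously, without streams or
channels. Everything below is a hypothetical simplification: `Injection`,
`Item` and `interleave` are made-up names, and the sketch assumes that the
stream offset advances by the size of injected chunks as well as raw chunks,
which this patch on its own does not show.

```rust
/// Hypothetical stand-in for a list of reused chunks to inject at `boundary`.
#[derive(Debug, PartialEq)]
struct Injection {
    boundary: u64,
    reused_size: u64,
}

#[derive(Debug, PartialEq)]
enum Item {
    Raw(u64),    // size of a freshly encoded chunk
    Reused(u64), // cumulative size of the injected chunks
}

/// Interleave raw chunk sizes with injections, placing each injection at the
/// stream offset matching its boundary.
fn interleave(raw: &[u64], injections: Vec<Injection>) -> Result<Vec<Item>, String> {
    let mut out = Vec::new();
    let mut offset = 0u64;
    let mut pending = injections.into_iter().peekable();
    let mut raw_iter = raw.iter();

    loop {
        if let Some(boundary) = pending.peek().map(|inject| inject.boundary) {
            if boundary == offset {
                // inject now
                if let Some(inject) = pending.next() {
                    offset += inject.reused_size;
                    out.push(Item::Reused(inject.reused_size));
                }
                continue;
            }
            if boundary < offset {
                return Err("invalid injection boundary".to_string());
            }
            // boundary > offset: inject later
        }
        match raw_iter.next() {
            Some(&size) => {
                offset += size;
                out.push(Item::Raw(size));
            }
            None => {
                return if pending.peek().is_some() {
                    Err("injection queue not fully consumed".to_string())
                } else {
                    Ok(out)
                };
            }
        }
    }
}
```

The three outcomes mirror the `Equal`, `Greater` and `Less` arms of the
`match` in `poll_next` below, plus the check for leftover injections once the
input is exhausted.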

 pbs-client/src/inject_reused_chunks.rs | 129 +++++++++++++++++++++++++
 pbs-client/src/lib.rs                  |   1 +
 2 files changed, 130 insertions(+)
 create mode 100644 pbs-client/src/inject_reused_chunks.rs

diff --git a/pbs-client/src/inject_reused_chunks.rs b/pbs-client/src/inject_reused_chunks.rs
new file mode 100644
index 000000000..ed147f5fb
--- /dev/null
+++ b/pbs-client/src/inject_reused_chunks.rs
@@ -0,0 +1,129 @@
+use std::cmp;
+use std::pin::Pin;
+use std::sync::atomic::{AtomicUsize, Ordering};
+use std::sync::{mpsc, Arc};
+use std::task::{Context, Poll};
+
+use anyhow::{anyhow, Error};
+use futures::{ready, Stream};
+use pin_project_lite::pin_project;
+
+use crate::pxar::create::ReusableDynamicEntry;
+
+pin_project! {
+    pub struct InjectReusedChunksQueue<S> {
+        #[pin]
+        input: S,
+        next_injection: Option<InjectChunks>,
+        buffer: Option<bytes::BytesMut>,
+        injections: Option<mpsc::Receiver<InjectChunks>>,
+        stream_len: Arc<AtomicUsize>,
+    }
+}
+
+type StreamOffset = u64;
+#[derive(Debug)]
+/// Holds a list of chunks to inject at the given boundary by forcing a chunk boundary.
+pub struct InjectChunks {
+    /// Offset at which to force the boundary
+    pub boundary: StreamOffset,
+    /// List of chunks to inject
+    pub chunks: Vec<ReusableDynamicEntry>,
+    /// Cumulative size of the chunks in the list
+    pub size: usize,
+}
+
+/// Variants for stream consumer to distinguish between raw data chunks and injected ones.
+pub enum InjectedChunksInfo {
+    Known(Vec<ReusableDynamicEntry>),
+    Raw(bytes::BytesMut),
+}
+
+pub trait InjectReusedChunks: Sized {
+    fn inject_reused_chunks(
+        self,
+        injections: Option<mpsc::Receiver<InjectChunks>>,
+        stream_len: Arc<AtomicUsize>,
+    ) -> InjectReusedChunksQueue<Self>;
+}
+
+impl<S> InjectReusedChunks for S
+where
+    S: Stream<Item = Result<bytes::BytesMut, Error>>,
+{
+    fn inject_reused_chunks(
+        self,
+        injections: Option<mpsc::Receiver<InjectChunks>>,
+        stream_len: Arc<AtomicUsize>,
+    ) -> InjectReusedChunksQueue<Self> {
+        InjectReusedChunksQueue {
+            input: self,
+            next_injection: None,
+            injections,
+            buffer: None,
+            stream_len,
+        }
+    }
+}
+
+impl<S> Stream for InjectReusedChunksQueue<S>
+where
+    S: Stream<Item = Result<bytes::BytesMut, Error>>,
+{
+    type Item = Result<InjectedChunksInfo, Error>;
+
+    fn poll_next(self: Pin<&mut Self>, cx: &mut Context) -> Poll<Option<Self::Item>> {
+        let mut this = self.project();
+
+        // loop to skip over possible empty chunks
+        loop {
+            if this.next_injection.is_none() {
+                if let Some(injections) = this.injections.as_mut() {
+                    if let Ok(injection) = injections.try_recv() {
+                        *this.next_injection = Some(injection);
+                    }
+                }
+            }
+
+            if let Some(inject) = this.next_injection.take() {
+                // got reusable dynamic entries to inject
+                let offset = this.stream_len.load(Ordering::SeqCst) as u64;
+
+                match inject.boundary.cmp(&offset) {
+                    // inject now
+                    cmp::Ordering::Equal => {
+                        let chunk_info = InjectedChunksInfo::Known(inject.chunks);
+                        return Poll::Ready(Some(Ok(chunk_info)));
+                    }
+                    // inject later
+                    cmp::Ordering::Greater => *this.next_injection = Some(inject),
+                    // incoming new chunks and injections didn't line up?
+                    cmp::Ordering::Less => {
+                        return Poll::Ready(Some(Err(anyhow!("invalid injection boundary"))))
+                    }
+                }
+            }
+
+            // nothing to inject now, await further input
+            match ready!(this.input.as_mut().poll_next(cx)) {
+                None => {
+                    if let Some(injections) = this.injections.as_mut() {
+                        if this.next_injection.is_some() || injections.try_recv().is_ok() {
+                            // stream finished, but remaining dynamic entries to inject
+                            return Poll::Ready(Some(Err(anyhow!(
+                                "injection queue not fully consumed"
+                            ))));
+                        }
+                    }
+                    // stream finished and all dynamic entries already injected
+                    return Poll::Ready(None);
+                }
+                Some(Err(err)) => return Poll::Ready(Some(Err(err))),
+                // skip empty chunks: entries were injected at a forced boundary, but forcing it
+                // did not require splitting the raw stream buffer
+                Some(Ok(raw)) if raw.is_empty() => continue,
+                Some(Ok(raw)) => return Poll::Ready(Some(Ok(InjectedChunksInfo::Raw(raw)))),
+            }
+        }
+    }
+}
diff --git a/pbs-client/src/lib.rs b/pbs-client/src/lib.rs
index 21cf8556b..3e7bd2a8b 100644
--- a/pbs-client/src/lib.rs
+++ b/pbs-client/src/lib.rs
@@ -7,6 +7,7 @@ pub mod catalog_shell;
 pub mod pxar;
 pub mod tools;
 
+mod inject_reused_chunks;
 mod merge_known_chunks;
 pub mod pipe_to_stream;
 
-- 
2.39.2




* [pbs-devel] [PATCH v5 proxmox-backup 40/62] client: chunk stream: add struct to hold injection state
  2024-05-07 15:51 [pbs-devel] [PATCH v5 pxar proxmox-backup 00/62] fix #3174: improve file-level backup Christian Ebner
                   ` (38 preceding siblings ...)
  2024-05-07 15:52 ` [pbs-devel] [PATCH v5 proxmox-backup 39/62] upload stream: implement reused chunk injector Christian Ebner
@ 2024-05-07 15:52 ` Christian Ebner
  2024-05-07 15:52 ` [pbs-devel] [PATCH v5 proxmox-backup 41/62] client: streams: add channels for dynamic entry injection Christian Ebner
                   ` (22 subsequent siblings)
  62 siblings, 0 replies; 67+ messages in thread
From: Christian Ebner @ 2024-05-07 15:52 UTC (permalink / raw)
  To: pbs-devel

Adds a dedicated structure to hold the optional sender and receiver
instances and state for injection of reused dynamic entries in the
payload stream for split stream pxar archives.

The asynchronous channels must only be attached to the payload
archive, leaving the current behaviour unchanged for the metadata
archive and for the current default encoding, which does not reuse
payload chunks of previous snapshots.
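
As an illustration only (not part of the patch), the wiring of the two
queues might be sketched as below. `InjectChunks` is reduced to a
stand-in with the `boundary` and `size` fields, and `forward_pending`,
`consumed()` and `wire` are hypothetical helpers introduced for this
sketch.

```rust
use std::sync::mpsc;

/// Stand-in for `InjectChunks` (assumption: reduced to the two fields the
/// chunk stream reads, the real type also carries the chunks to inject).
#[derive(Debug, PartialEq)]
pub struct InjectChunks {
    pub boundary: u64,
    pub size: usize,
}

/// Same layout as the struct added by this patch.
pub struct InjectionData {
    boundaries: mpsc::Receiver<InjectChunks>,
    injections: mpsc::Sender<InjectChunks>,
    consumed: u64,
}

impl InjectionData {
    pub fn new(
        boundaries: mpsc::Receiver<InjectChunks>,
        injections: mpsc::Sender<InjectChunks>,
    ) -> Self {
        Self { boundaries, injections, consumed: 0 }
    }

    /// Hypothetical helper, not part of the patch: forward one queued
    /// boundary to the upload side without blocking.
    pub fn forward_pending(&mut self) -> bool {
        match self.boundaries.try_recv() {
            Ok(inject) => {
                // offsets continue after the boundary plus the injected size
                self.consumed = inject.boundary + inject.size as u64;
                self.injections.send(inject).is_ok()
            }
            // queue empty or archiver side dropped: nothing to forward
            Err(_) => false,
        }
    }

    pub fn consumed(&self) -> u64 {
        self.consumed
    }
}

/// Create both channels; returns (archiver end, chunk stream state, upload end).
pub fn wire() -> (
    mpsc::Sender<InjectChunks>,
    InjectionData,
    mpsc::Receiver<InjectChunks>,
) {
    let (boundaries_tx, boundaries_rx) = mpsc::channel();
    let (injections_tx, injections_rx) = mpsc::channel();
    let data = InjectionData::new(boundaries_rx, injections_tx);
    (boundaries_tx, data, injections_rx)
}

fn main() {
    let (archiver, mut data, upload) = wire();
    archiver.send(InjectChunks { boundary: 100, size: 50 }).unwrap();
    assert!(data.forward_pending());
    println!("{:?}, consumed {}", upload.recv().unwrap(), data.consumed());
}
```

The metadata archive and the default encoding would simply not create
such a state object, which is why the struct is meant to be optional.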

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
changes since version 4:
- no changes

 pbs-client/src/chunk_stream.rs | 23 +++++++++++++++++++++++
 pbs-client/src/lib.rs          |  2 +-
 2 files changed, 24 insertions(+), 1 deletion(-)

diff --git a/pbs-client/src/chunk_stream.rs b/pbs-client/src/chunk_stream.rs
index 895f6eae2..83c75ba28 100644
--- a/pbs-client/src/chunk_stream.rs
+++ b/pbs-client/src/chunk_stream.rs
@@ -1,4 +1,5 @@
 use std::pin::Pin;
+use std::sync::mpsc;
 use std::task::{Context, Poll};
 
 use anyhow::Error;
@@ -8,6 +9,28 @@ use futures::stream::{Stream, TryStream};
 
 use pbs_datastore::Chunker;
 
+use crate::inject_reused_chunks::InjectChunks;
+
+/// Holds the queues for optional injection of reused dynamic index entries
+pub struct InjectionData {
+    boundaries: mpsc::Receiver<InjectChunks>,
+    injections: mpsc::Sender<InjectChunks>,
+    consumed: u64,
+}
+
+impl InjectionData {
+    pub fn new(
+        boundaries: mpsc::Receiver<InjectChunks>,
+        injections: mpsc::Sender<InjectChunks>,
+    ) -> Self {
+        Self {
+            boundaries,
+            injections,
+            consumed: 0,
+        }
+    }
+}
+
 /// Split input stream into dynamic sized chunks
 pub struct ChunkStream<S: Unpin> {
     input: S,
diff --git a/pbs-client/src/lib.rs b/pbs-client/src/lib.rs
index 3e7bd2a8b..3d2da27b9 100644
--- a/pbs-client/src/lib.rs
+++ b/pbs-client/src/lib.rs
@@ -39,6 +39,6 @@ mod backup_specification;
 pub use backup_specification::*;
 
 mod chunk_stream;
-pub use chunk_stream::{ChunkStream, FixedChunkStream};
+pub use chunk_stream::{ChunkStream, FixedChunkStream, InjectionData};
 
 pub const PROXMOX_BACKUP_TCP_KEEPALIVE_TIME: u32 = 120;
-- 
2.39.2




* [pbs-devel] [PATCH v5 proxmox-backup 41/62] client: streams: add channels for dynamic entry injection
  2024-05-07 15:51 [pbs-devel] [PATCH v5 pxar proxmox-backup 00/62] fix #3174: improve file-level backup Christian Ebner
                   ` (39 preceding siblings ...)
  2024-05-07 15:52 ` [pbs-devel] [PATCH v5 proxmox-backup 40/62] client: chunk stream: add struct to hold injection state Christian Ebner
@ 2024-05-07 15:52 ` Christian Ebner
  2024-05-07 15:52 ` [pbs-devel] [PATCH v5 proxmox-backup 42/62] specs: add backup detection mode specification Christian Ebner
                   ` (21 subsequent siblings)
  62 siblings, 0 replies; 67+ messages in thread
From: Christian Ebner @ 2024-05-07 15:52 UTC (permalink / raw)
  To: pbs-devel

To reuse dynamic entries of a previous backup run and index them for
the new snapshot, add a non-blocking channel between the pxar
archiver and the chunk stream, as well as between the chunk stream
and the backup writer.

The archiver sends forced boundary positions and the dynamic
entries to inject into the chunk stream following this boundary.

The chunk stream consumes this channel's inputs as receiver whenever a
new chunk is requested by the upload stream, forcing a non-regular
chunk boundary in the pxar stream at the requested positions.
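
For illustration, the decision between a forced and a regular boundary
can be modelled as a pure function. This is a sketch that mirrors the
three branches in the diff below; the `Split` enum and the `decide`
function are names made up for the example, and it assumes the forced
boundary never lies before the already consumed offset.

```rust
/// Outcome of comparing a forced boundary with the next regular one.
#[derive(Debug, PartialEq)]
pub enum Split {
    /// Cut at the forced boundary; the raw chunk has this size (may be 0).
    Forced(usize),
    /// A regular boundary comes first; cut a chunk of this size.
    Regular(usize),
    /// Forced boundary lies beyond the buffered data; read more input.
    NeedMoreData,
}

/// `consumed`: stream offset of the buffer start, `pos`: result of the
/// chunker scan relative to `scan_pos` (0 means no boundary was found),
/// `forced`: absolute stream offset of the requested boundary.
pub fn decide(consumed: u64, buffer_len: usize, scan_pos: usize, pos: usize, forced: u64) -> Split {
    // without a regular boundary, the end of the buffer is the limit
    let chunk_boundary = if pos == 0 {
        consumed + buffer_len as u64
    } else {
        consumed + (scan_pos + pos) as u64
    };

    if forced <= chunk_boundary {
        // assumption: forced >= consumed, otherwise this would underflow
        Split::Forced((forced - consumed) as usize)
    } else if pos != 0 {
        Split::Regular(scan_pos + pos)
    } else {
        Split::NeedMoreData
    }
}

fn main() {
    // 500 buffered bytes starting at offset 1000, no regular boundary yet
    println!("{:?}", decide(1000, 500, 0, 0, 1200));
    println!("{:?}", decide(1000, 500, 0, 300, 1400));
    println!("{:?}", decide(1000, 500, 0, 0, 2000));
}
```

A forced boundary that coincides with the start of the buffer yields an
empty raw chunk, which the upload stream is expected to skip.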

The dynamic entries to inject and the boundary are then sent via the
second asynchronous channel to the backup writer's upload stream,
indexing them by inserting the dynamic entries as known chunks into
the upload stream.
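
The offset accounting for injected entries could look roughly like the
following sketch. `ReusedChunk` and `index_reused` are hypothetical
names, and a plain byte vector stands in for the SHA-256 index checksum
so that the example needs no external crates.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical stand-in for a reused dynamic index entry.
pub struct ReusedChunk {
    pub size: u64,
    pub digest: [u8; 32],
}

/// Assign stream offsets to reused chunks and collect the bytes that would
/// be fed to the index checksum (end offset, then digest, per chunk).
pub fn index_reused(
    stream_len: &AtomicUsize,
    chunks: &[ReusedChunk],
    csum_input: &mut Vec<u8>,
) -> Vec<(u64, [u8; 32])> {
    let mut known = Vec::with_capacity(chunks.len());
    for chunk in chunks {
        // fetch_add returns the previous value, which is the chunk's start offset
        let offset = stream_len.fetch_add(chunk.size as usize, Ordering::SeqCst) as u64;
        let end_offset = offset + chunk.size;
        csum_input.extend_from_slice(&end_offset.to_le_bytes());
        csum_input.extend_from_slice(&chunk.digest);
        known.push((offset, chunk.digest));
    }
    known
}

fn main() {
    let stream_len = AtomicUsize::new(100);
    let chunks = [
        ReusedChunk { size: 10, digest: [1u8; 32] },
        ReusedChunk { size: 20, digest: [2u8; 32] },
    ];
    let mut csum_input = Vec::new();
    let known = index_reused(&stream_len, &chunks, &mut csum_input);
    for (offset, digest) in &known {
        println!("offset {offset}, digest starts with {:02x}", digest[0]);
    }
}
```

Since the injected chunks are already present on the server, only their
digests and offsets enter the index; no chunk data is uploaded for them.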

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
changes since version 4:
- no changes

 examples/test_chunk_speed2.rs                 |   2 +-
 pbs-client/src/backup_writer.rs               | 110 ++++++++++++------
 pbs-client/src/chunk_stream.rs                |  79 ++++++++++++-
 pbs-client/src/pxar/create.rs                 |   6 +-
 pbs-client/src/pxar_backup_stream.rs          |   8 +-
 proxmox-backup-client/src/main.rs             |  28 +++--
 .../src/proxmox_restore_daemon/api.rs         |   1 +
 pxar-bin/src/main.rs                          |   1 +
 tests/catar.rs                                |   1 +
 9 files changed, 181 insertions(+), 55 deletions(-)

diff --git a/examples/test_chunk_speed2.rs b/examples/test_chunk_speed2.rs
index 3f69b436d..22dd14ce2 100644
--- a/examples/test_chunk_speed2.rs
+++ b/examples/test_chunk_speed2.rs
@@ -26,7 +26,7 @@ async fn run() -> Result<(), Error> {
         .map_err(Error::from);
 
     //let chunk_stream = FixedChunkStream::new(stream, 4*1024*1024);
-    let mut chunk_stream = ChunkStream::new(stream, None);
+    let mut chunk_stream = ChunkStream::new(stream, None, None);
 
     let start_time = std::time::Instant::now();
 
diff --git a/pbs-client/src/backup_writer.rs b/pbs-client/src/backup_writer.rs
index dc9aa569f..66f209fed 100644
--- a/pbs-client/src/backup_writer.rs
+++ b/pbs-client/src/backup_writer.rs
@@ -23,6 +23,7 @@ use pbs_tools::crypt_config::CryptConfig;
 
 use proxmox_human_byte::HumanByte;
 
+use super::inject_reused_chunks::{InjectChunks, InjectReusedChunks, InjectedChunksInfo};
 use super::merge_known_chunks::{MergeKnownChunks, MergedChunkInfo};
 
 use super::{H2Client, HttpClient};
@@ -265,6 +266,7 @@ impl BackupWriter {
         archive_name: &str,
         stream: impl Stream<Item = Result<bytes::BytesMut, Error>>,
         options: UploadOptions,
+        injections: Option<std::sync::mpsc::Receiver<InjectChunks>>,
     ) -> Result<BackupStats, Error> {
         let known_chunks = Arc::new(Mutex::new(HashSet::new()));
 
@@ -341,6 +343,7 @@ impl BackupWriter {
                 None
             },
             options.compress,
+            injections,
         )
         .await?;
 
@@ -636,6 +639,7 @@ impl BackupWriter {
         known_chunks: Arc<Mutex<HashSet<[u8; 32]>>>,
         crypt_config: Option<Arc<CryptConfig>>,
         compress: bool,
+        injections: Option<std::sync::mpsc::Receiver<InjectChunks>>,
     ) -> impl Future<Output = Result<UploadStats, Error>> {
         let total_chunks = Arc::new(AtomicUsize::new(0));
         let total_chunks2 = total_chunks.clone();
@@ -643,10 +647,12 @@ impl BackupWriter {
         let known_chunk_count2 = known_chunk_count.clone();
 
         let stream_len = Arc::new(AtomicUsize::new(0));
+        let stream_len1 = stream_len.clone();
         let stream_len2 = stream_len.clone();
         let compressed_stream_len = Arc::new(AtomicU64::new(0));
         let compressed_stream_len2 = compressed_stream_len.clone();
         let reused_len = Arc::new(AtomicUsize::new(0));
+        let reused_len1 = reused_len.clone();
         let reused_len2 = reused_len.clone();
 
         let append_chunk_path = format!("{}_index", prefix);
@@ -658,52 +664,79 @@ impl BackupWriter {
 
         let start_time = std::time::Instant::now();
 
-        let index_csum = Arc::new(Mutex::new(Some(openssl::sha::Sha256::new())));
+        let index_csum = Arc::new(Mutex::new(openssl::sha::Sha256::new()));
+        let index_csum_1 = index_csum.clone();
         let index_csum_2 = index_csum.clone();
 
         stream
-            .and_then(move |data| {
-                let chunk_len = data.len();
+            .inject_reused_chunks(injections, stream_len)
+            .and_then(move |chunk_info| match chunk_info {
+                InjectedChunksInfo::Known(chunks) => {
+                    // account for injected chunks
+                    let count = chunks.len();
+                    total_chunks.fetch_add(count, Ordering::SeqCst);
+
+                    let mut known = Vec::new();
+                    let mut csum = index_csum_1.lock().unwrap();
+                    for chunk in chunks {
+                        let offset =
+                            stream_len1.fetch_add(chunk.size() as usize, Ordering::SeqCst) as u64;
+                        reused_len1.fetch_add(chunk.size() as usize, Ordering::SeqCst);
+                        let digest = chunk.digest();
+                        known.push((offset, digest));
+                        let end_offset = offset + chunk.size();
+                        csum.update(&end_offset.to_le_bytes());
+                        csum.update(&digest);
+                    }
+                    future::ok(MergedChunkInfo::Known(known))
+                }
+                InjectedChunksInfo::Raw(raw) => {
+                    // account for not injected chunks (new and known)
+                    let offset = stream_len1.fetch_add(raw.len(), Ordering::SeqCst) as u64;
+                    let chunk_len = raw.len() as u64;
 
-                total_chunks.fetch_add(1, Ordering::SeqCst);
-                let offset = stream_len.fetch_add(chunk_len, Ordering::SeqCst) as u64;
+                    total_chunks.fetch_add(1, Ordering::SeqCst);
 
-                let mut chunk_builder = DataChunkBuilder::new(data.as_ref()).compress(compress);
+                    let mut chunk_builder = DataChunkBuilder::new(raw.as_ref()).compress(compress);
 
-                if let Some(ref crypt_config) = crypt_config {
-                    chunk_builder = chunk_builder.crypt_config(crypt_config);
-                }
+                    if let Some(ref crypt_config) = crypt_config {
+                        chunk_builder = chunk_builder.crypt_config(crypt_config);
+                    }
 
-                let mut known_chunks = known_chunks.lock().unwrap();
-                let digest = chunk_builder.digest();
+                    let mut known_chunks = known_chunks.lock().unwrap();
 
-                let mut guard = index_csum.lock().unwrap();
-                let csum = guard.as_mut().unwrap();
+                    let digest = chunk_builder.digest();
 
-                let chunk_end = offset + chunk_len as u64;
+                    let mut csum = index_csum.lock().unwrap();
 
-                if !is_fixed_chunk_size {
-                    csum.update(&chunk_end.to_le_bytes());
-                }
-                csum.update(digest);
+                    let chunk_end = offset + chunk_len;
 
-                let chunk_is_known = known_chunks.contains(digest);
-                if chunk_is_known {
-                    known_chunk_count.fetch_add(1, Ordering::SeqCst);
-                    reused_len.fetch_add(chunk_len, Ordering::SeqCst);
-                    future::ok(MergedChunkInfo::Known(vec![(offset, *digest)]))
-                } else {
-                    let compressed_stream_len2 = compressed_stream_len.clone();
-                    known_chunks.insert(*digest);
-                    future::ready(chunk_builder.build().map(move |(chunk, digest)| {
-                        compressed_stream_len2.fetch_add(chunk.raw_size(), Ordering::SeqCst);
-                        MergedChunkInfo::New(ChunkInfo {
-                            chunk,
-                            digest,
-                            chunk_len: chunk_len as u64,
-                            offset,
-                        })
-                    }))
+                    if !is_fixed_chunk_size {
+                        csum.update(&chunk_end.to_le_bytes());
+                    }
+                    csum.update(digest);
+
+                    let chunk_is_known = known_chunks.contains(digest);
+                    if chunk_is_known {
+                        known_chunk_count.fetch_add(1, Ordering::SeqCst);
+                        reused_len.fetch_add(chunk_len as usize, Ordering::SeqCst);
+
+                        future::ok(MergedChunkInfo::Known(vec![(offset, *digest)]))
+                    } else {
+                        let compressed_stream_len2 = compressed_stream_len.clone();
+                        known_chunks.insert(*digest);
+
+                        future::ready(chunk_builder.build().map(move |(chunk, digest)| {
+                            compressed_stream_len2.fetch_add(chunk.raw_size(), Ordering::SeqCst);
+
+                            MergedChunkInfo::New(ChunkInfo {
+                                chunk,
+                                digest,
+                                chunk_len,
+                                offset,
+                            })
+                        }))
+                    }
                 }
             })
             .merge_known_chunks()
@@ -771,8 +804,11 @@ impl BackupWriter {
                 let size_reused = reused_len2.load(Ordering::SeqCst);
                 let size_compressed = compressed_stream_len2.load(Ordering::SeqCst) as usize;
 
-                let mut guard = index_csum_2.lock().unwrap();
-                let csum = guard.take().unwrap().finish();
+                let csum = Arc::into_inner(index_csum_2)
+                    .unwrap()
+                    .into_inner()
+                    .unwrap()
+                    .finish();
 
                 futures::future::ok(UploadStats {
                     chunk_count,
diff --git a/pbs-client/src/chunk_stream.rs b/pbs-client/src/chunk_stream.rs
index 83c75ba28..728c0a88d 100644
--- a/pbs-client/src/chunk_stream.rs
+++ b/pbs-client/src/chunk_stream.rs
@@ -14,6 +14,7 @@ use crate::inject_reused_chunks::InjectChunks;
 /// Holds the queues for optional injection of reused dynamic index entries
 pub struct InjectionData {
     boundaries: mpsc::Receiver<InjectChunks>,
+    next_boundary: Option<InjectChunks>,
     injections: mpsc::Sender<InjectChunks>,
     consumed: u64,
 }
@@ -25,6 +26,7 @@ impl InjectionData {
     ) -> Self {
         Self {
             boundaries,
+            next_boundary: None,
             injections,
             consumed: 0,
         }
@@ -37,15 +39,17 @@ pub struct ChunkStream<S: Unpin> {
     chunker: Chunker,
     buffer: BytesMut,
     scan_pos: usize,
+    injection_data: Option<InjectionData>,
 }
 
 impl<S: Unpin> ChunkStream<S> {
-    pub fn new(input: S, chunk_size: Option<usize>) -> Self {
+    pub fn new(input: S, chunk_size: Option<usize>, injection_data: Option<InjectionData>) -> Self {
         Self {
             input,
             chunker: Chunker::new(chunk_size.unwrap_or(4 * 1024 * 1024)),
             buffer: BytesMut::new(),
             scan_pos: 0,
+            injection_data,
         }
     }
 }
@@ -62,19 +66,82 @@ where
 
     fn poll_next(self: Pin<&mut Self>, cx: &mut Context) -> Poll<Option<Self::Item>> {
         let this = self.get_mut();
+
         loop {
+            if let Some(InjectionData {
+                boundaries,
+                next_boundary,
+                injections,
+                consumed,
+            }) = this.injection_data.as_mut()
+            {
+                if next_boundary.is_none() {
+                    if let Ok(boundary) = boundaries.try_recv() {
+                        *next_boundary = Some(boundary);
+                    }
+                }
+
+                if let Some(inject) = next_boundary.take() {
+                    // require forced boundary, lookup next regular boundary
+                    let pos = this.chunker.scan(&this.buffer[this.scan_pos..]);
+
+                    let chunk_boundary = if pos == 0 {
+                        *consumed + this.buffer.len() as u64
+                    } else {
+                        *consumed + (this.scan_pos + pos) as u64
+                    };
+
+                    if inject.boundary <= chunk_boundary {
+                        // forced boundary is before next boundary, force within current buffer
+                        let chunk_size = (inject.boundary - *consumed) as usize;
+                        let raw_chunk = this.buffer.split_to(chunk_size);
+                        *consumed += chunk_size as u64;
+                        this.scan_pos = 0;
+
+                        // add the size of the injected chunks to consumed, so chunk stream offsets
+                        // are in sync with the rest of the archive.
+                        *consumed += inject.size as u64;
+
+                        injections.send(inject).unwrap();
+
+                        // the chunk can be empty, return nevertheless to allow the caller to
+                        // make progress by consuming from the injection queue
+                        return Poll::Ready(Some(Ok(raw_chunk)));
+                    } else if pos != 0 {
+                        *next_boundary = Some(inject);
+                        // forced boundary is after next boundary, split off chunk from buffer
+                        let chunk_size = this.scan_pos + pos;
+                        let raw_chunk = this.buffer.split_to(chunk_size);
+                        *consumed += chunk_size as u64;
+                        this.scan_pos = 0;
+
+                        return Poll::Ready(Some(Ok(raw_chunk)));
+                    } else {
+                        // forced boundary is after current buffer length, continue reading
+                        *next_boundary = Some(inject);
+                        this.scan_pos = this.buffer.len();
+                    }
+                }
+            }
+
             if this.scan_pos < this.buffer.len() {
+                // look for next chunk boundary, starting from scan_pos
                 let boundary = this.chunker.scan(&this.buffer[this.scan_pos..]);
 
                 let chunk_size = this.scan_pos + boundary;
 
                 if boundary == 0 {
+                    // no new chunk boundary, update position for next boundary lookup
                     this.scan_pos = this.buffer.len();
-                    // continue poll
                 } else if chunk_size <= this.buffer.len() {
-                    let result = this.buffer.split_to(chunk_size);
+                    // found new chunk boundary inside buffer, split off chunk from buffer
+                    let raw_chunk = this.buffer.split_to(chunk_size);
+                    if let Some(InjectionData { consumed, .. }) = this.injection_data.as_mut() {
+                        *consumed += chunk_size as u64;
+                    }
                     this.scan_pos = 0;
-                    return Poll::Ready(Some(Ok(result)));
+
+                    return Poll::Ready(Some(Ok(raw_chunk)));
                 } else {
                     panic!("got unexpected chunk boundary from chunker");
                 }
@@ -82,10 +149,11 @@ where
 
             match ready!(Pin::new(&mut this.input).try_poll_next(cx)) {
                 Some(Err(err)) => {
+                    // got error in byte stream, pass to consumer
                     return Poll::Ready(Some(Err(err.into())));
                 }
                 None => {
-                    this.scan_pos = 0;
+                    // end of stream reached, flush remaining bytes in buffer
                     if !this.buffer.is_empty() {
                         return Poll::Ready(Some(Ok(this.buffer.split())));
                     } else {
@@ -93,6 +161,7 @@ where
                     }
                 }
                 Some(Ok(data)) => {
+                    // got new data, add to buffer
                     this.buffer.extend_from_slice(data.as_ref());
                 }
             }
diff --git a/pbs-client/src/pxar/create.rs b/pbs-client/src/pxar/create.rs
index 0f32efcce..dd3c64525 100644
--- a/pbs-client/src/pxar/create.rs
+++ b/pbs-client/src/pxar/create.rs
@@ -6,7 +6,7 @@ use std::ops::Range;
 use std::os::unix::ffi::OsStrExt;
 use std::os::unix::io::{AsRawFd, FromRawFd, IntoRawFd, OwnedFd, RawFd};
 use std::path::{Path, PathBuf};
-use std::sync::{Arc, Mutex};
+use std::sync::{mpsc, Arc, Mutex};
 
 use anyhow::{bail, Context, Error};
 use futures::future::BoxFuture;
@@ -29,6 +29,7 @@ use pbs_datastore::catalog::BackupCatalogWriter;
 use pbs_datastore::dynamic_index::DynamicIndexReader;
 use pbs_datastore::index::IndexFile;
 
+use crate::inject_reused_chunks::InjectChunks;
 use crate::pxar::metadata::errno_is_unsupported;
 use crate::pxar::tools::assert_single_path_component;
 use crate::pxar::Flags;
@@ -134,6 +135,7 @@ struct Archiver {
     hardlinks: HashMap<HardLinkInfo, (PathBuf, LinkOffset)>,
     file_copy_buffer: Vec<u8>,
     skip_e2big_xattr: bool,
+    forced_boundaries: Option<mpsc::Sender<InjectChunks>>,
 }
 
 type Encoder<'a, T> = pxar::encoder::aio::Encoder<'a, T>;
@@ -164,6 +166,7 @@ pub async fn create_archive<T, F>(
     feature_flags: Flags,
     callback: F,
     options: PxarCreateOptions,
+    forced_boundaries: Option<mpsc::Sender<InjectChunks>>,
 ) -> Result<(), Error>
 where
     T: SeqWrite + Send,
@@ -224,6 +227,7 @@ where
         hardlinks: HashMap::new(),
         file_copy_buffer: vec::undefined(4 * 1024 * 1024),
         skip_e2big_xattr: options.skip_e2big_xattr,
+        forced_boundaries,
     };
 
     archiver
diff --git a/pbs-client/src/pxar_backup_stream.rs b/pbs-client/src/pxar_backup_stream.rs
index 95145cb0d..9d2cb41d6 100644
--- a/pbs-client/src/pxar_backup_stream.rs
+++ b/pbs-client/src/pxar_backup_stream.rs
@@ -2,7 +2,7 @@ use std::io::Write;
 //use std::os::unix::io::FromRawFd;
 use std::path::Path;
 use std::pin::Pin;
-use std::sync::{Arc, Mutex};
+use std::sync::{mpsc, Arc, Mutex};
 use std::task::{Context, Poll};
 
 use anyhow::{format_err, Error};
@@ -17,6 +17,7 @@ use proxmox_io::StdChannelWriter;
 
 use pbs_datastore::catalog::CatalogWriter;
 
+use crate::inject_reused_chunks::InjectChunks;
 use crate::pxar::create::PxarWriters;
 
 /// Stream implementation to encode and upload .pxar archives.
@@ -42,6 +43,7 @@ impl PxarBackupStream {
         dir: Dir,
         catalog: Arc<Mutex<CatalogWriter<W>>>,
         options: crate::pxar::PxarCreateOptions,
+        boundaries: Option<mpsc::Sender<InjectChunks>>,
         separate_payload_stream: bool,
     ) -> Result<(Self, Option<Self>), Error> {
         let buffer_size = 256 * 1024;
@@ -79,6 +81,7 @@ impl PxarBackupStream {
                     Ok(())
                 },
                 options,
+                boundaries,
             )
             .await
             {
@@ -110,11 +113,12 @@ impl PxarBackupStream {
         dirname: &Path,
         catalog: Arc<Mutex<CatalogWriter<W>>>,
         options: crate::pxar::PxarCreateOptions,
+        boundaries: Option<mpsc::Sender<InjectChunks>>,
         separate_payload_stream: bool,
     ) -> Result<(Self, Option<Self>), Error> {
         let dir = nix::dir::Dir::open(dirname, OFlag::O_DIRECTORY, Mode::empty())?;
 
-        Self::new(dir, catalog, options, separate_payload_stream)
+        Self::new(dir, catalog, options, boundaries, separate_payload_stream)
     }
 }
 
diff --git a/proxmox-backup-client/src/main.rs b/proxmox-backup-client/src/main.rs
index 821777d66..5e93f9542 100644
--- a/proxmox-backup-client/src/main.rs
+++ b/proxmox-backup-client/src/main.rs
@@ -45,8 +45,8 @@ use pbs_client::tools::{
 use pbs_client::{
     delete_ticket_info, parse_backup_specification, view_task_result, BackupReader,
     BackupRepository, BackupSpecificationType, BackupStats, BackupWriter, ChunkStream,
-    FixedChunkStream, HttpClient, PxarBackupStream, RemoteChunkReader, UploadOptions,
-    BACKUP_SOURCE_SCHEMA,
+    FixedChunkStream, HttpClient, InjectionData, PxarBackupStream, RemoteChunkReader,
+    UploadOptions, BACKUP_SOURCE_SCHEMA,
 };
 use pbs_datastore::catalog::{BackupCatalogWriter, CatalogReader, CatalogWriter};
 use pbs_datastore::chunk_store::verify_chunk_size;
@@ -199,14 +199,16 @@ async fn backup_directory<P: AsRef<Path>>(
         bail!("cannot backup directory with fixed chunk size!");
     }
 
+    let (payload_boundaries_tx, payload_boundaries_rx) = std::sync::mpsc::channel();
     let (pxar_stream, payload_stream) = PxarBackupStream::open(
         dir_path.as_ref(),
         catalog,
         pxar_create_options,
+        Some(payload_boundaries_tx),
         payload_target.is_some(),
     )?;
 
-    let mut chunk_stream = ChunkStream::new(pxar_stream, chunk_size);
+    let mut chunk_stream = ChunkStream::new(pxar_stream, chunk_size, None);
     let (tx, rx) = mpsc::channel(10); // allow to buffer 10 chunks
 
     let stream = ReceiverStream::new(rx).map_err(Error::from);
@@ -218,13 +220,16 @@ async fn backup_directory<P: AsRef<Path>>(
         }
     });
 
-    let stats = client.upload_stream(archive_name, stream, upload_options.clone());
+    let stats = client.upload_stream(archive_name, stream, upload_options.clone(), None);
 
     if let Some(payload_stream) = payload_stream {
         let payload_target = payload_target
             .ok_or_else(|| format_err!("got payload stream, but no target archive name"))?;
 
-        let mut payload_chunk_stream = ChunkStream::new(payload_stream, chunk_size);
+        let (payload_injections_tx, payload_injections_rx) = std::sync::mpsc::channel();
+        let injection_data = InjectionData::new(payload_boundaries_rx, payload_injections_tx);
+        let mut payload_chunk_stream =
+            ChunkStream::new(payload_stream, chunk_size, Some(injection_data));
         let (payload_tx, payload_rx) = mpsc::channel(10); // allow to buffer 10 chunks
         let stream = ReceiverStream::new(payload_rx).map_err(Error::from);
 
@@ -235,7 +240,12 @@ async fn backup_directory<P: AsRef<Path>>(
             }
         });
 
-        let payload_stats = client.upload_stream(&payload_target, stream, upload_options);
+        let payload_stats = client.upload_stream(
+            &payload_target,
+            stream,
+            upload_options,
+            Some(payload_injections_rx),
+        );
 
         match futures::join!(stats, payload_stats) {
             (Ok(stats), Ok(payload_stats)) => Ok((stats, Some(payload_stats))),
@@ -271,7 +281,7 @@ async fn backup_image<P: AsRef<Path>>(
     }
 
     let stats = client
-        .upload_stream(archive_name, stream, upload_options)
+        .upload_stream(archive_name, stream, upload_options, None)
         .await?;
 
     Ok(stats)
@@ -562,7 +572,7 @@ fn spawn_catalog_upload(
     let (catalog_tx, catalog_rx) = std::sync::mpsc::sync_channel(10); // allow to buffer 10 writes
     let catalog_stream = proxmox_async::blocking::StdChannelStream(catalog_rx);
     let catalog_chunk_size = 512 * 1024;
-    let catalog_chunk_stream = ChunkStream::new(catalog_stream, Some(catalog_chunk_size));
+    let catalog_chunk_stream = ChunkStream::new(catalog_stream, Some(catalog_chunk_size), None);
 
     let catalog_writer = Arc::new(Mutex::new(CatalogWriter::new(TokioWriterAdapter::new(
         StdChannelWriter::new(catalog_tx),
@@ -578,7 +588,7 @@ fn spawn_catalog_upload(
 
     tokio::spawn(async move {
         let catalog_upload_result = client
-            .upload_stream(CATALOG_NAME, catalog_chunk_stream, upload_options)
+            .upload_stream(CATALOG_NAME, catalog_chunk_stream, upload_options, None)
             .await;
 
         if let Err(ref err) = catalog_upload_result {
diff --git a/proxmox-restore-daemon/src/proxmox_restore_daemon/api.rs b/proxmox-restore-daemon/src/proxmox_restore_daemon/api.rs
index ea97976e6..0883d6cda 100644
--- a/proxmox-restore-daemon/src/proxmox_restore_daemon/api.rs
+++ b/proxmox-restore-daemon/src/proxmox_restore_daemon/api.rs
@@ -364,6 +364,7 @@ fn extract(
                         Flags::DEFAULT,
                         |_| Ok(()),
                         options,
+                        None,
                     )
                     .await
                 }
diff --git a/pxar-bin/src/main.rs b/pxar-bin/src/main.rs
index 58c9d2cfd..d46c98d2b 100644
--- a/pxar-bin/src/main.rs
+++ b/pxar-bin/src/main.rs
@@ -405,6 +405,7 @@ async fn create_archive(
             Ok(())
         },
         options,
+        None,
     )
     .await?;
 
diff --git a/tests/catar.rs b/tests/catar.rs
index 9e96a8610..d5ef85ffe 100644
--- a/tests/catar.rs
+++ b/tests/catar.rs
@@ -39,6 +39,7 @@ fn run_test(dir_name: &str) -> Result<(), Error> {
         Flags::DEFAULT,
         |_| Ok(()),
         options,
+        None,
     ))?;
 
     Command::new("cmp")
-- 
2.39.2




* [pbs-devel] [PATCH v5 proxmox-backup 42/62] specs: add backup detection mode specification
  2024-05-07 15:51 [pbs-devel] [PATCH v5 pxar proxmox-backup 00/62] fix #3174: improve file-level backup Christian Ebner
                   ` (40 preceding siblings ...)
  2024-05-07 15:52 ` [pbs-devel] [PATCH v5 proxmox-backup 41/62] client: streams: add channels for dynamic entry injection Christian Ebner
@ 2024-05-07 15:52 ` Christian Ebner
  2024-05-07 15:52 ` [pbs-devel] [PATCH v5 proxmox-backup 43/62] client: implement prepare reference method Christian Ebner
                   ` (20 subsequent siblings)
  62 siblings, 0 replies; 67+ messages in thread
From: Christian Ebner @ 2024-05-07 15:52 UTC (permalink / raw)
  To: pbs-devel

Adds the specification for switching the detection mode used to
identify regular files which changed since a reference backup run.

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
changes since version 4:
- no changes

 pbs-client/src/backup_specification.rs | 44 ++++++++++++++++++++++++++
 1 file changed, 44 insertions(+)

diff --git a/pbs-client/src/backup_specification.rs b/pbs-client/src/backup_specification.rs
index 619a3a9da..b6b0f5199 100644
--- a/pbs-client/src/backup_specification.rs
+++ b/pbs-client/src/backup_specification.rs
@@ -4,6 +4,7 @@ use proxmox_schema::*;
 
 const_regex! {
     BACKUPSPEC_REGEX = r"^([a-zA-Z0-9_-]+\.(pxar|img|conf|log)):(.+)$";
+    DETECTION_MODE_REGEX = r"^(default|data|metadata)$";
 }
 
 pub const BACKUP_SOURCE_SCHEMA: Schema =
@@ -11,6 +12,11 @@ pub const BACKUP_SOURCE_SCHEMA: Schema =
         .format(&ApiStringFormat::Pattern(&BACKUPSPEC_REGEX))
         .schema();
 
+pub const BACKUP_DETECTION_MODE_SPEC: Schema =
+    StringSchema::new("Backup detection mode specification (default|data|metadata).")
+        .format(&ApiStringFormat::Pattern(&DETECTION_MODE_REGEX))
+        .schema();
+
 pub enum BackupSpecificationType {
     PXAR,
     IMAGE,
@@ -45,3 +51,41 @@ pub fn parse_backup_specification(value: &str) -> Result<BackupSpecification, Er
 
     bail!("unable to parse backup source specification '{}'", value);
 }
+
+/// Mode to detect file changes since last backup run
+pub enum BackupDetectionMode {
+    /// Encode backup as self contained pxar archive
+    Default,
+    /// Split backup mode, re-encode payload data
+    Data,
+    /// Compare metadata, reuse payload chunks if metadata unchanged
+    Metadata,
+}
+
+impl BackupDetectionMode {
+    /// Selected mode is data based file change detection with split meta/payload streams
+    pub fn is_data(&self) -> bool {
+        matches!(self, Self::Data)
+    }
+    /// Selected mode is metadata based file change detection
+    pub fn is_metadata(&self) -> bool {
+        matches!(self, Self::Metadata)
+    }
+}
+
+pub fn parse_backup_detection_mode_specification(
+    value: &str,
+) -> Result<BackupDetectionMode, Error> {
+    match (DETECTION_MODE_REGEX.regex_obj)().captures(value) {
+        Some(caps) => {
+            let mode = match caps.get(1).unwrap().as_str() {
+                "default" => BackupDetectionMode::Default,
+                "data" => BackupDetectionMode::Data,
+                "metadata" => BackupDetectionMode::Metadata,
+                _ => bail!("invalid backup detection mode"),
+            };
+            Ok(mode)
+        }
+        None => bail!("unable to parse backup detection mode specification '{value}'"),
+    }
+}
-- 
2.39.2




* [pbs-devel] [PATCH v5 proxmox-backup 43/62] client: implement prepare reference method
  2024-05-07 15:51 [pbs-devel] [PATCH v5 pxar proxmox-backup 00/62] fix #3174: improve file-level backup Christian Ebner
                   ` (41 preceding siblings ...)
  2024-05-07 15:52 ` [pbs-devel] [PATCH v5 proxmox-backup 42/62] specs: add backup detection mode specification Christian Ebner
@ 2024-05-07 15:52 ` Christian Ebner
  2024-05-07 15:52 ` [pbs-devel] [PATCH v5 proxmox-backup 44/62] client: pxar: add method for metadata comparison Christian Ebner
                   ` (19 subsequent siblings)
  62 siblings, 0 replies; 67+ messages in thread
From: Christian Ebner @ 2024-05-07 15:52 UTC (permalink / raw)
  To: pbs-devel

Implement a method that prepares the decoder instance to access a
previous snapshot's metadata index and payload index in order to
pass them to the pxar archiver. The archiver can then use these
to compare the metadata of files to the previous state and gather
reusable chunks.
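
As a rough illustration of the fallback behaviour described above, the
following sketch decides whether a previous snapshot can serve as a
reference. It is not the code from this patch: `PreviousManifest`,
`ReferenceNames`, `split_archive_names` and `select_reference` are
hypothetical names, and the manifest is reduced to a set of archive
names. It assumes a reference is only usable if both the metadata and
the payload archive are listed.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for the previous snapshot's manifest:
/// only the names of the archives it lists.
struct PreviousManifest {
    files: HashSet<String>,
}

/// Names of the two archives a split pxar backup consists of.
#[derive(Debug, PartialEq)]
struct ReferenceNames {
    metadata: String,
    payload: String,
}

/// Derive the split archive names from a `<base>.pxar.<extension>` target.
fn split_archive_names(target: &str) -> Option<ReferenceNames> {
    let (base, extension) = target.split_once(".pxar.")?;
    Some(ReferenceNames {
        metadata: format!("{base}.mpxar.{extension}"),
        payload: format!("{base}.ppxar.{extension}"),
    })
}

/// Return the reference archive names, or `None` to continue without
/// a reference (no previous snapshot, or one without split archives).
fn select_reference(
    previous: Option<&PreviousManifest>,
    target: &str,
) -> Option<ReferenceNames> {
    let manifest = previous?;
    let names = split_archive_names(target)?;
    if manifest.files.contains(&names.metadata) && manifest.files.contains(&names.payload) {
        Some(names)
    } else {
        None
    }
}
```

A previous snapshot created in the default mode only lists a single
`.pxar` archive, so the sketch falls back to a regular backup run in
that case.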

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
changes since version 4:
- introduce `MetadataArchiveReader` type to make code usable for testing
  with local accessor instance

 pbs-client/src/pxar/create.rs                 |  67 +++++++++-
 pbs-client/src/pxar/mod.rs                    |   4 +-
 proxmox-backup-client/src/main.rs             | 122 +++++++++++++++---
 .../src/proxmox_restore_daemon/api.rs         |   1 +
 pxar-bin/src/main.rs                          |   1 +
 5 files changed, 170 insertions(+), 25 deletions(-)

diff --git a/pbs-client/src/pxar/create.rs b/pbs-client/src/pxar/create.rs
index dd3c64525..3248fd307 100644
--- a/pbs-client/src/pxar/create.rs
+++ b/pbs-client/src/pxar/create.rs
@@ -18,6 +18,8 @@ use nix::sys::stat::{FileStat, Mode};
 
 use pathpatterns::{MatchEntry, MatchFlag, MatchList, MatchType, PatternFlag};
 use proxmox_sys::error::SysError;
+use pxar::accessor::aio::{Accessor, Directory};
+use pxar::accessor::ReadAt;
 use pxar::encoder::{LinkOffset, SeqWrite};
 use pxar::Metadata;
 
@@ -35,7 +37,7 @@ use crate::pxar::tools::assert_single_path_component;
 use crate::pxar::Flags;
 
 /// Pxar options for creating a pxar archive/stream
-#[derive(Default, Clone)]
+#[derive(Default)]
 pub struct PxarCreateOptions {
     /// Device/mountpoint st_dev numbers that should be included. None for no limitation.
     pub device_set: Option<HashSet<u64>>,
@@ -47,6 +49,20 @@ pub struct PxarCreateOptions {
     pub skip_lost_and_found: bool,
     /// Skip xattrs of files that return E2BIG error
     pub skip_e2big_xattr: bool,
+    /// Reference state for partial backups
+    pub previous_ref: Option<PxarPrevRef>,
+}
+
+pub type MetadataArchiveReader = Arc<dyn ReadAt + Send + Sync + 'static>;
+
+/// Stateful information of previous backup snapshots for partial backups
+pub struct PxarPrevRef {
+    /// Reference accessor for metadata comparison
+    pub accessor: Accessor<MetadataArchiveReader>,
+    /// Reference index for reusing payload chunks
+    pub payload_index: DynamicIndexReader,
+    /// Reference archive name for partial backups
+    pub archive_name: String,
 }
 
 fn detect_fs_type(fd: RawFd) -> Result<i64, Error> {
@@ -136,6 +152,7 @@ struct Archiver {
     file_copy_buffer: Vec<u8>,
     skip_e2big_xattr: bool,
     forced_boundaries: Option<mpsc::Sender<InjectChunks>>,
+    previous_payload_index: Option<DynamicIndexReader>,
 }
 
 type Encoder<'a, T> = pxar::encoder::aio::Encoder<'a, T>;
@@ -211,6 +228,15 @@ where
             MatchType::Exclude,
         )?);
     }
+    let (previous_payload_index, previous_metadata_accessor) =
+        if let Some(refs) = options.previous_ref {
+            (
+                Some(refs.payload_index),
+                refs.accessor.open_root().await.ok(),
+            )
+        } else {
+            (None, None)
+        };
 
     let mut archiver = Archiver {
         feature_flags,
@@ -228,10 +254,11 @@ where
         file_copy_buffer: vec::undefined(4 * 1024 * 1024),
         skip_e2big_xattr: options.skip_e2big_xattr,
         forced_boundaries,
+        previous_payload_index,
     };
 
     archiver
-        .archive_dir_contents(&mut encoder, source_dir, true)
+        .archive_dir_contents(&mut encoder, previous_metadata_accessor, source_dir, true)
         .await?;
     encoder.finish().await?;
     encoder.close().await?;
@@ -263,6 +290,7 @@ impl Archiver {
     fn archive_dir_contents<'a, T: SeqWrite + Send>(
         &'a mut self,
         encoder: &'a mut Encoder<'_, T>,
+        mut previous_metadata_accessor: Option<Directory<MetadataArchiveReader>>,
         mut dir: Dir,
         is_root: bool,
     ) -> BoxFuture<'a, Result<(), Error>> {
@@ -297,9 +325,15 @@ impl Archiver {
 
                 (self.callback)(&file_entry.path)?;
                 self.path = file_entry.path;
-                self.add_entry(encoder, dir_fd, &file_entry.name, &file_entry.stat)
-                    .await
-                    .map_err(|err| self.wrap_err(err))?;
+                self.add_entry(
+                    encoder,
+                    &mut previous_metadata_accessor,
+                    dir_fd,
+                    &file_entry.name,
+                    &file_entry.stat,
+                )
+                .await
+                .map_err(|err| self.wrap_err(err))?;
             }
             self.path = old_path;
             self.entry_counter = entry_counter;
@@ -547,6 +581,7 @@ impl Archiver {
     async fn add_entry<T: SeqWrite + Send>(
         &mut self,
         encoder: &mut Encoder<'_, T>,
+        previous_metadata: &mut Option<Directory<MetadataArchiveReader>>,
         parent: RawFd,
         c_file_name: &CStr,
         stat: &FileStat,
@@ -636,7 +671,14 @@ impl Archiver {
                     catalog.lock().unwrap().start_directory(c_file_name)?;
                 }
                 let result = self
-                    .add_directory(encoder, dir, c_file_name, &metadata, stat)
+                    .add_directory(
+                        encoder,
+                        previous_metadata,
+                        dir,
+                        c_file_name,
+                        &metadata,
+                        stat,
+                    )
                     .await;
                 if let Some(ref catalog) = self.catalog {
                     catalog.lock().unwrap().end_directory()?;
@@ -689,6 +731,7 @@ impl Archiver {
     async fn add_directory<T: SeqWrite + Send>(
         &mut self,
         encoder: &mut Encoder<'_, T>,
+        previous_metadata_accessor: &mut Option<Directory<MetadataArchiveReader>>,
         dir: Dir,
         dir_name: &CStr,
         metadata: &Metadata,
@@ -719,7 +762,17 @@ impl Archiver {
             log::info!("skipping mount point: {:?}", self.path);
             Ok(())
         } else {
-            self.archive_dir_contents(encoder, dir, false).await
+            let mut dir_accessor = None;
+            if let Some(accessor) = previous_metadata_accessor.as_mut() {
+                if let Some(file_entry) = accessor.lookup(dir_name).await? {
+                    if file_entry.entry().is_dir() {
+                        let dir = file_entry.enter_directory().await?;
+                        dir_accessor = Some(dir);
+                    }
+                }
+            }
+            self.archive_dir_contents(encoder, dir_accessor, dir, false)
+                .await
         };
 
         self.fs_magic = old_fs_magic;
diff --git a/pbs-client/src/pxar/mod.rs b/pbs-client/src/pxar/mod.rs
index b7dcf8362..5248a1956 100644
--- a/pbs-client/src/pxar/mod.rs
+++ b/pbs-client/src/pxar/mod.rs
@@ -56,7 +56,9 @@ pub(crate) mod tools;
 mod flags;
 pub use flags::Flags;
 
-pub use create::{create_archive, PxarCreateOptions, PxarWriters};
+pub use create::{
+    create_archive, MetadataArchiveReader, PxarCreateOptions, PxarPrevRef, PxarWriters,
+};
 pub use extract::{
     create_tar, create_zip, extract_archive, extract_sub_dir, extract_sub_dir_seq, ErrorHandler,
     OverwriteFlags, PxarExtractContext, PxarExtractOptions,
diff --git a/proxmox-backup-client/src/main.rs b/proxmox-backup-client/src/main.rs
index 5e93f9542..d620083e1 100644
--- a/proxmox-backup-client/src/main.rs
+++ b/proxmox-backup-client/src/main.rs
@@ -21,6 +21,7 @@ use proxmox_router::{cli::*, ApiMethod, RpcEnvironment};
 use proxmox_schema::api;
 use proxmox_sys::fs::{file_get_json, image_size, replace_file, CreateOptions};
 use proxmox_time::{epoch_i64, strftime_local};
+use pxar::accessor::aio::Accessor;
 use pxar::accessor::{MaybeReady, ReadAt, ReadAtOperation};
 
 use pbs_api_types::{
@@ -30,7 +31,7 @@ use pbs_api_types::{
     BACKUP_TYPE_SCHEMA, TRAFFIC_CONTROL_BURST_SCHEMA, TRAFFIC_CONTROL_RATE_SCHEMA,
 };
 use pbs_client::catalog_shell::Shell;
-use pbs_client::pxar::ErrorHandler as PxarErrorHandler;
+use pbs_client::pxar::{ErrorHandler as PxarErrorHandler, MetadataArchiveReader, PxarPrevRef};
 use pbs_client::tools::{
     complete_archive_name, complete_auth_id, complete_backup_group, complete_backup_snapshot,
     complete_backup_source, complete_chunk_size, complete_group_or_snapshot,
@@ -43,14 +44,14 @@ use pbs_client::tools::{
     CHUNK_SIZE_SCHEMA, REPO_URL_SCHEMA,
 };
 use pbs_client::{
-    delete_ticket_info, parse_backup_specification, view_task_result, BackupReader,
-    BackupRepository, BackupSpecificationType, BackupStats, BackupWriter, ChunkStream,
-    FixedChunkStream, HttpClient, InjectionData, PxarBackupStream, RemoteChunkReader,
-    UploadOptions, BACKUP_SOURCE_SCHEMA,
+    delete_ticket_info, parse_backup_detection_mode_specification, parse_backup_specification,
+    view_task_result, BackupReader, BackupRepository, BackupSpecificationType, BackupStats,
+    BackupWriter, ChunkStream, FixedChunkStream, HttpClient, InjectionData, PxarBackupStream,
+    RemoteChunkReader, UploadOptions, BACKUP_DETECTION_MODE_SPEC, BACKUP_SOURCE_SCHEMA,
 };
 use pbs_datastore::catalog::{BackupCatalogWriter, CatalogReader, CatalogWriter};
 use pbs_datastore::chunk_store::verify_chunk_size;
-use pbs_datastore::dynamic_index::{BufferedDynamicReader, DynamicIndexReader};
+use pbs_datastore::dynamic_index::{BufferedDynamicReader, DynamicIndexReader, LocalDynamicReadAt};
 use pbs_datastore::fixed_index::FixedIndexReader;
 use pbs_datastore::index::IndexFile;
 use pbs_datastore::manifest::{
@@ -687,6 +688,10 @@ fn spawn_catalog_upload(
                schema: TRAFFIC_CONTROL_BURST_SCHEMA,
                optional: true,
            },
+           "change-detection-mode": {
+               schema: BACKUP_DETECTION_MODE_SPEC,
+               optional: true,
+           },
            "exclude": {
                type: Array,
                description: "List of paths or patterns for matching files to exclude.",
@@ -881,6 +886,9 @@ async fn create_backup(
 
     let backup_time = backup_time_opt.unwrap_or_else(epoch_i64);
 
+    let detection_mode = param["change-detection-mode"].as_str().unwrap_or("default");
+    let detection_mode = parse_backup_detection_mode_specification(detection_mode)?;
+
     let http_client = connect_rate_limited(&repo, rate_limit)?;
     record_repository(&repo);
 
@@ -981,7 +989,7 @@ async fn create_backup(
         None
     };
 
-    let mut manifest = BackupManifest::new(snapshot);
+    let mut manifest = BackupManifest::new(snapshot.clone());
 
     let mut catalog = None;
     let mut catalog_result_rx = None;
@@ -1028,22 +1036,21 @@ async fn create_backup(
                 manifest.add_file(target, stats.size, stats.csum, crypto.mode)?;
             }
             (BackupSpecificationType::PXAR, false) => {
-                let metadata_mode = false; // Until enabled via param
-
                 let target_base = if let Some(base) = target_base.strip_suffix(".pxar") {
                     base.to_string()
                 } else {
                     bail!("unexpected suffix in target: {target_base}");
                 };
 
-                let (target, payload_target) = if metadata_mode {
-                    (
-                        format!("{target_base}.mpxar.{extension}"),
-                        Some(format!("{target_base}.ppxar.{extension}")),
-                    )
-                } else {
-                    (target, None)
-                };
+                let (target, payload_target) =
+                    if detection_mode.is_metadata() || detection_mode.is_data() {
+                        (
+                            format!("{target_base}.mpxar.{extension}"),
+                            Some(format!("{target_base}.ppxar.{extension}")),
+                        )
+                    } else {
+                        (target, None)
+                    };
 
                 // start catalog upload on first use
                 if catalog.is_none() {
@@ -1060,12 +1067,41 @@ async fn create_backup(
                     .unwrap()
                     .start_directory(std::ffi::CString::new(target.as_str())?.as_c_str())?;
 
+                let mut previous_ref = None;
+                if detection_mode.is_metadata() {
+                    if let Some(ref manifest) = previous_manifest {
+                        // BackupWriter::start created a new snapshot, get the one before
+                        if let Some(backup_time) = client.previous_backup_time().await? {
+                            let backup_dir: BackupDir =
+                                (snapshot.group.clone(), backup_time).into();
+                            let backup_reader = BackupReader::start(
+                                &http_client,
+                                crypt_config.clone(),
+                                repo.store(),
+                                &backup_ns,
+                                &backup_dir,
+                                true,
+                            )
+                            .await?;
+                            previous_ref = prepare_reference(
+                                &target,
+                                manifest.clone(),
+                                &client,
+                                backup_reader.clone(),
+                                crypt_config.clone(),
+                            )
+                            .await?
+                        }
+                    }
+                }
+
                 let pxar_options = pbs_client::pxar::PxarCreateOptions {
                     device_set: devices.clone(),
                     patterns: pattern_list.clone(),
                     entries_max: entries_max as usize,
                     skip_lost_and_found,
                     skip_e2big_xattr,
+                    previous_ref,
                 };
 
                 let upload_options = UploadOptions {
@@ -1177,6 +1213,58 @@ async fn create_backup(
     Ok(Value::Null)
 }
 
+async fn prepare_reference(
+    target: &str,
+    manifest: Arc<BackupManifest>,
+    backup_writer: &BackupWriter,
+    backup_reader: Arc<BackupReader>,
+    crypt_config: Option<Arc<CryptConfig>>,
+) -> Result<Option<PxarPrevRef>, Error> {
+    let (target, payload_target) = helper::get_pxar_archive_names(target);
+    let payload_target = payload_target.unwrap_or_default();
+
+    let metadata_ref_index = if let Ok(index) = backup_reader
+        .download_dynamic_index(&manifest, &target)
+        .await
+    {
+        index
+    } else {
+        log::info!("No previous metadata index, continue without reference");
+        return Ok(None);
+    };
+
+    if manifest.lookup_file_info(&payload_target).is_err() {
+        log::info!("No previous payload index found in manifest, continue without reference");
+        return Ok(None);
+    }
+
+    let known_payload_chunks = Arc::new(Mutex::new(HashSet::new()));
+    let payload_ref_index = backup_writer
+        .download_previous_dynamic_index(&payload_target, &manifest, known_payload_chunks)
+        .await?;
+
+    log::info!("Using previous index as metadata reference for '{target}'");
+
+    let most_used = metadata_ref_index.find_most_used_chunks(8);
+    let file_info = manifest.lookup_file_info(&target)?;
+    let chunk_reader = RemoteChunkReader::new(
+        backup_reader.clone(),
+        crypt_config.clone(),
+        file_info.chunk_crypt_mode(),
+        most_used,
+    );
+    let reader = BufferedDynamicReader::new(metadata_ref_index, chunk_reader);
+    let archive_size = reader.archive_size();
+    let reader: MetadataArchiveReader = Arc::new(LocalDynamicReadAt::new(reader));
+    let accessor = Accessor::new(reader, archive_size, None).await?;
+
+    Ok(Some(pbs_client::pxar::PxarPrevRef {
+        accessor,
+        payload_index: payload_ref_index,
+        archive_name: target,
+    }))
+}
+
 async fn dump_image<W: Write>(
     client: Arc<BackupReader>,
     crypt_config: Option<Arc<CryptConfig>>,
diff --git a/proxmox-restore-daemon/src/proxmox_restore_daemon/api.rs b/proxmox-restore-daemon/src/proxmox_restore_daemon/api.rs
index 0883d6cda..e50cb8184 100644
--- a/proxmox-restore-daemon/src/proxmox_restore_daemon/api.rs
+++ b/proxmox-restore-daemon/src/proxmox_restore_daemon/api.rs
@@ -355,6 +355,7 @@ fn extract(
                         patterns,
                         skip_lost_and_found: false,
                         skip_e2big_xattr: false,
+                        previous_ref: None,
                     };
 
                     let pxar_writer = TokioWriter::new(writer);
diff --git a/pxar-bin/src/main.rs b/pxar-bin/src/main.rs
index d46c98d2b..c6d3794bb 100644
--- a/pxar-bin/src/main.rs
+++ b/pxar-bin/src/main.rs
@@ -358,6 +358,7 @@ async fn create_archive(
         patterns,
         skip_lost_and_found: false,
         skip_e2big_xattr: false,
+        previous_ref: None,
     };
 
     let source = PathBuf::from(source);
-- 
2.39.2




* [pbs-devel] [PATCH v5 proxmox-backup 44/62] client: pxar: add method for metadata comparison
  2024-05-07 15:51 [pbs-devel] [PATCH v5 pxar proxmox-backup 00/62] fix #3174: improve file-level backup Christian Ebner
                   ` (42 preceding siblings ...)
  2024-05-07 15:52 ` [pbs-devel] [PATCH v5 proxmox-backup 43/62] client: implement prepare reference method Christian Ebner
@ 2024-05-07 15:52 ` Christian Ebner
  2024-05-07 15:52 ` [pbs-devel] [PATCH v5 proxmox-backup 45/62] pxar: caching: add look-ahead cache types Christian Ebner
                   ` (18 subsequent siblings)
  62 siblings, 0 replies; 67+ messages in thread
From: Christian Ebner @ 2024-05-07 15:52 UTC (permalink / raw)
  To: pbs-devel

Add a method to compare the metadata of the current file entry against
the metadata of the entry looked up in the previous backup snapshot. If
the metadata match, the range in the payload stream covering the
file's payload header and payload is returned.

This is in preparation for reusing payload chunks for unchanged files.
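
The comparison and range calculation can be sketched in isolation as
follows. This is a simplified model, not the patch itself: `Meta`,
`PrevKind`, `PrevEntry` and `reusable_range` are hypothetical names,
the metadata is reduced to four fields, and `HEADER_SIZE` assumes a
16 byte pxar header, whereas the patch uses
`size_of::<pxar::format::Header>()`.

```rust
use std::ops::Range;

/// Assumed size of a pxar item header (two u64 fields).
const HEADER_SIZE: u64 = 16;

/// Reduced metadata; the real `pxar::Metadata` also covers xattrs, ACLs, etc.
#[derive(Clone, Debug, PartialEq)]
struct Meta {
    mode: u32,
    uid: u32,
    gid: u32,
    mtime: i64,
}

/// Kind of the entry found in the previous metadata archive.
enum PrevKind {
    File { payload_offset: Option<u64>, size: u64 },
    Other,
}

struct PrevEntry {
    meta: Meta,
    kind: PrevKind,
}

/// Return the payload stream range (header plus payload) of an unchanged
/// regular file, or `None` if the entry has to be re-encoded.
fn reusable_range(current: &Meta, previous: Option<&PrevEntry>) -> Option<Range<u64>> {
    let entry = previous?; // not found in the previous archive
    if *current != entry.meta {
        return None; // metadata did not match
    }
    match entry.kind {
        PrevKind::File {
            payload_offset: Some(offset),
            size,
        } => Some(offset..offset + size + HEADER_SIZE),
        _ => None, // not a regular file with split payload
    }
}
```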

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
changes since version 4:
- refactor to `MetadataArchiveReader` type, reusable by tests with local
  accessor

 pbs-client/src/pxar/create.rs | 34 +++++++++++++++++++++++++++++++++-
 1 file changed, 33 insertions(+), 1 deletion(-)

diff --git a/pbs-client/src/pxar/create.rs b/pbs-client/src/pxar/create.rs
index 3248fd307..7e6402de5 100644
--- a/pbs-client/src/pxar/create.rs
+++ b/pbs-client/src/pxar/create.rs
@@ -2,6 +2,7 @@ use std::collections::{HashMap, HashSet};
 use std::ffi::{CStr, CString, OsStr};
 use std::fmt;
 use std::io::{self, Read};
+use std::mem::size_of;
 use std::ops::Range;
 use std::os::unix::ffi::OsStrExt;
 use std::os::unix::io::{AsRawFd, FromRawFd, IntoRawFd, OwnedFd, RawFd};
@@ -21,7 +22,7 @@ use proxmox_sys::error::SysError;
 use pxar::accessor::aio::{Accessor, Directory};
 use pxar::accessor::ReadAt;
 use pxar::encoder::{LinkOffset, SeqWrite};
-use pxar::Metadata;
+use pxar::{EntryKind, Metadata};
 
 use proxmox_io::vec;
 use proxmox_lang::c_str;
@@ -344,6 +345,37 @@ impl Archiver {
         .boxed()
     }
 
+    async fn is_reusable_entry(
+        &mut self,
+        previous_metadata_accessor: &mut Directory<MetadataArchiveReader>,
+        file_name: &Path,
+        metadata: &Metadata,
+    ) -> Result<Option<Range<u64>>, Error> {
+        if let Some(file_entry) = previous_metadata_accessor.lookup(file_name).await? {
+            if metadata == file_entry.metadata() {
+                if let EntryKind::File {
+                    payload_offset: Some(offset),
+                    size,
+                    ..
+                } = file_entry.entry().kind()
+                {
+                    let range = *offset..*offset + size + size_of::<pxar::format::Header>() as u64;
+                    log::debug!(
+                        "reusable: {file_name:?} at range {range:?} has unchanged metadata."
+                    );
+                    return Ok(Some(range));
+                }
+                log::debug!("reencode: {file_name:?} not a regular file.");
+                return Ok(None);
+            }
+            log::debug!("reencode: {file_name:?} metadata did not match.");
+            return Ok(None);
+        }
+
+        log::debug!("reencode: {file_name:?} not found in previous archive.");
+        Ok(None)
+    }
+
     /// openat() wrapper which allows but logs `EACCES` and turns `ENOENT` into `None`.
     ///
     /// The `existed` flag is set when iterating through a directory to note that we know the file
-- 
2.39.2




* [pbs-devel] [PATCH v5 proxmox-backup 45/62] pxar: caching: add look-ahead cache types
  2024-05-07 15:51 [pbs-devel] [PATCH v5 pxar proxmox-backup 00/62] fix #3174: improve file-level backup Christian Ebner
                   ` (43 preceding siblings ...)
  2024-05-07 15:52 ` [pbs-devel] [PATCH v5 proxmox-backup 44/62] client: pxar: add method for metadata comparison Christian Ebner
@ 2024-05-07 15:52 ` Christian Ebner
  2024-05-07 15:52 ` [pbs-devel] [PATCH v5 proxmox-backup 46/62] fix #3174: client: pxar: enable caching and meta comparison Christian Ebner
                   ` (17 subsequent siblings)
  62 siblings, 0 replies; 67+ messages in thread
From: Christian Ebner @ 2024-05-07 15:52 UTC (permalink / raw)
  To: pbs-devel

These types allow storing the needed data and keeping track of
directory boundaries while traversing the filesystem tree, in order
to postpone the decision whether to reuse or re-encode a given regular
file with unchanged metadata.
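
To illustrate why directory boundaries have to be cached alongside the
entries, the sketch below replays a flat cache and rebuilds the path of
each regular file. It is a simplified model: the `CacheEntry` variants
only carry a name here instead of `CacheEntryData`, and `replay` is a
hypothetical helper, not part of this patch.

```rust
/// Simplified cache entry; the patch stores file descriptor, stat and
/// metadata per entry, here only the name is kept.
#[derive(Debug)]
enum CacheEntry {
    RegEntry(String),
    DirEntry(String),
    DirEnd,
}

/// Replay the cached entries in order and rebuild the full path of every
/// regular file from the recorded directory boundaries.
fn replay(entries: &[CacheEntry]) -> Vec<String> {
    let mut stack: Vec<&str> = Vec::new();
    let mut paths = Vec::new();
    for entry in entries {
        match entry {
            // entering a directory: following entries belong to it
            CacheEntry::DirEntry(name) => stack.push(name),
            // leaving the innermost directory again
            CacheEntry::DirEnd => {
                stack.pop();
            }
            CacheEntry::RegEntry(name) => {
                let mut path = stack.join("/");
                if !path.is_empty() {
                    path.push('/');
                }
                path.push_str(name);
                paths.push(path);
            }
        }
    }
    paths
}
```

Without the `DirEnd` marker, a flat list could not tell whether an
entry following a directory belongs to that directory or to its parent.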

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
changes since version 4:
- no changes

 pbs-client/src/pxar/look_ahead_cache.rs | 38 +++++++++++++++++++++++++
 pbs-client/src/pxar/mod.rs              |  1 +
 2 files changed, 39 insertions(+)
 create mode 100644 pbs-client/src/pxar/look_ahead_cache.rs

diff --git a/pbs-client/src/pxar/look_ahead_cache.rs b/pbs-client/src/pxar/look_ahead_cache.rs
new file mode 100644
index 000000000..68f3fd1f2
--- /dev/null
+++ b/pbs-client/src/pxar/look_ahead_cache.rs
@@ -0,0 +1,38 @@
+use nix::sys::stat::FileStat;
+use pxar::encoder::PayloadOffset;
+use std::ffi::CString;
+use std::os::unix::io::OwnedFd;
+
+use pxar::Metadata;
+
+pub(crate) struct CacheEntryData {
+    pub(crate) fd: OwnedFd,
+    pub(crate) c_file_name: CString,
+    pub(crate) stat: FileStat,
+    pub(crate) metadata: Metadata,
+    pub(crate) payload_offset: PayloadOffset,
+}
+
+impl CacheEntryData {
+    pub(crate) fn new(
+        fd: OwnedFd,
+        c_file_name: CString,
+        stat: FileStat,
+        metadata: Metadata,
+        payload_offset: PayloadOffset,
+    ) -> Self {
+        Self {
+            fd,
+            c_file_name,
+            stat,
+            metadata,
+            payload_offset,
+        }
+    }
+}
+
+pub(crate) enum CacheEntry {
+    RegEntry(CacheEntryData),
+    DirEntry(CacheEntryData),
+    DirEnd,
+}
diff --git a/pbs-client/src/pxar/mod.rs b/pbs-client/src/pxar/mod.rs
index 5248a1956..334759df6 100644
--- a/pbs-client/src/pxar/mod.rs
+++ b/pbs-client/src/pxar/mod.rs
@@ -50,6 +50,7 @@
 pub(crate) mod create;
 pub(crate) mod dir_stack;
 pub(crate) mod extract;
+pub(crate) mod look_ahead_cache;
 pub(crate) mod metadata;
 pub(crate) mod tools;
 
-- 
2.39.2




* [pbs-devel] [PATCH v5 proxmox-backup 46/62] fix #3174: client: pxar: enable caching and meta comparison
  2024-05-07 15:51 [pbs-devel] [PATCH v5 pxar proxmox-backup 00/62] fix #3174: improve file-level backup Christian Ebner
                   ` (44 preceding siblings ...)
  2024-05-07 15:52 ` [pbs-devel] [PATCH v5 proxmox-backup 45/62] pxar: caching: add look-ahead cache types Christian Ebner
@ 2024-05-07 15:52 ` Christian Ebner
  2024-05-07 15:52 ` [pbs-devel] [PATCH v5 proxmox-backup 47/62] client: backup writer: add injected chunk count to stats Christian Ebner
                   ` (16 subsequent siblings)
  62 siblings, 0 replies; 67+ messages in thread
From: Christian Ebner @ 2024-05-07 15:52 UTC (permalink / raw)
  To: pbs-devel

When walking the file system tree, check for each entry whether it is
reusable, meaning that the metadata did not change and the payload
chunks can be reindexed instead of re-encoding the whole data.

If the metadata matched, the range of the dynamic index entries for
that file is looked up in the previous payload data index.
Use the range and possible padding introduced by partial reuse of
chunks to decide whether to reuse the dynamic entries and encode
the file payloads as payload reference right away or cache the entry
for now and keep looking ahead.
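
The padding decision can be pictured with the following sketch. It is
an assumption-laden model rather than the code of this patch: chunks
are given by their end offsets in the previous payload stream, the
helpers `covering_chunk_bytes`, `padding_ratio` and `should_reuse` are
hypothetical, and the ratio is taken as padding bytes over the total
size of the covering chunks.

```rust
use std::ops::Range;

/// Same value as the constant introduced by the patch.
const CHUNK_PADDING_THRESHOLD: f64 = 0.1;

/// Sum the sizes of all chunks overlapping `payload`, with chunks given
/// as consecutive end offsets starting at offset 0.
fn covering_chunk_bytes(chunk_ends: &[u64], payload: &Range<u64>) -> u64 {
    let mut start = 0;
    let mut total = 0;
    for &end in chunk_ends {
        if start < payload.end && end > payload.start {
            total += end - start;
        }
        start = end;
    }
    total
}

/// Fraction of the covering chunks that is not part of `payload`.
fn padding_ratio(chunk_ends: &[u64], payload: &Range<u64>) -> f64 {
    let total = covering_chunk_bytes(chunk_ends, payload);
    if total == 0 {
        return 1.0; // no chunk covers the range, nothing to reuse
    }
    let reused = payload.end - payload.start;
    total.saturating_sub(reused) as f64 / total as f64
}

/// Reuse the chunks only if the padding stays below the threshold.
fn should_reuse(chunk_ends: &[u64], payload: &Range<u64>) -> bool {
    padding_ratio(chunk_ends, payload) < CHUNK_PADDING_THRESHOLD
}
```

A small file in the middle of a large chunk yields a high ratio, which
is why such entries are cached first: following reusable files may fill
up the same chunks and bring the ratio below the threshold.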

If however a non-reusable (because changed) entry is encountered
before the padding threshold is reached, the entries in the cache are
flushed to the archive by re-encoding them, resetting the cached state.

Reusable chunk digests and sizes, as well as reference offsets to the
start of regular files' payloads within the payload stream, are injected
into the backup stream by sending them to the chunker via a dedicated
channel, forcing a chunk boundary and inserting the chunks.

If the threshold value for reuse is reached, the chunks are injected
into the payload stream and the references with the corresponding
offsets are encoded in the metadata stream.

Since multiple files might be contained within a single chunk,
deduplication of chunks is ensured by keeping back the last chunk, so
that following files can reuse that same chunk without indexing it
twice. This chunk is also injected into the stream if the following
lookups lead to a cache clear and re-encoding.
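
The idea of keeping back the last chunk can be sketched as below. This
is a minimal model, not the patch's implementation: chunks are
identified by their position in the previous payload index, and
`ReuseQueue` with its methods is a hypothetical type.

```rust
/// Collects the chunks to inject, keeping back the most recent one so
/// that the next file can share it without it being indexed twice.
#[derive(Default)]
struct ReuseQueue {
    kept_back: Option<u32>,
    injected: Vec<u32>,
}

impl ReuseQueue {
    /// Register the chunks covering one reused file, in stream order.
    fn push_file(&mut self, chunks: &[u32]) {
        for &chunk in chunks {
            if self.kept_back == Some(chunk) {
                continue; // shared with the previous file, already kept back
            }
            // keep back the new chunk, release the previously kept one
            if let Some(last) = self.kept_back.replace(chunk) {
                self.injected.push(last);
            }
        }
    }

    /// Release the kept-back chunk as well, e.g. before re-encoding
    /// cached entries or at the end of the archive.
    fn finish(mut self) -> Vec<u32> {
        if let Some(last) = self.kept_back.take() {
            self.injected.push(last);
        }
        self.injected
    }
}
```

Calling `finish` in every code path corresponds to the guarantee stated
above: the kept-back chunk must not be lost when the cache is cleared.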

Directory boundaries are cached as well, and written as part of the
encoding when flushing.

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
changes since version 4:
- fix kept back chunk injection when a range discontinuity is encountered
- adapt to `MetadataArchiveReader`

 pbs-client/src/pxar/create.rs | 494 +++++++++++++++++++++++++++++++---
 1 file changed, 460 insertions(+), 34 deletions(-)

diff --git a/pbs-client/src/pxar/create.rs b/pbs-client/src/pxar/create.rs
index 7e6402de5..b2932c973 100644
--- a/pbs-client/src/pxar/create.rs
+++ b/pbs-client/src/pxar/create.rs
@@ -21,9 +21,10 @@ use pathpatterns::{MatchEntry, MatchFlag, MatchList, MatchType, PatternFlag};
 use proxmox_sys::error::SysError;
 use pxar::accessor::aio::{Accessor, Directory};
 use pxar::accessor::ReadAt;
-use pxar::encoder::{LinkOffset, SeqWrite};
+use pxar::encoder::{LinkOffset, PayloadOffset, SeqWrite};
 use pxar::{EntryKind, Metadata};
 
+use proxmox_human_byte::HumanByte;
 use proxmox_io::vec;
 use proxmox_lang::c_str;
 use proxmox_sys::fs::{self, acl, xattr};
@@ -33,10 +34,14 @@ use pbs_datastore::dynamic_index::DynamicIndexReader;
 use pbs_datastore::index::IndexFile;
 
 use crate::inject_reused_chunks::InjectChunks;
+use crate::pxar::look_ahead_cache::{CacheEntry, CacheEntryData};
 use crate::pxar::metadata::errno_is_unsupported;
 use crate::pxar::tools::assert_single_path_component;
 use crate::pxar::Flags;
 
+const CHUNK_PADDING_THRESHOLD: f64 = 0.1;
+const MAX_CACHE_SIZE: usize = 512;
+
 /// Pxar options for creating a pxar archive/stream
 #[derive(Default)]
 pub struct PxarCreateOptions {
@@ -154,6 +159,11 @@ struct Archiver {
     skip_e2big_xattr: bool,
     forced_boundaries: Option<mpsc::Sender<InjectChunks>>,
     previous_payload_index: Option<DynamicIndexReader>,
+    cached_entries: Vec<CacheEntry>,
+    cached_hardlinks: HashSet<HardLinkInfo>,
+    cached_range: Range<u64>,
+    cached_last_chunk: Option<ReusableDynamicEntry>,
+    caching_enabled: bool,
 }
 
 type Encoder<'a, T> = pxar::encoder::aio::Encoder<'a, T>;
@@ -213,6 +223,8 @@ where
         set.insert(stat.st_dev);
     }
 
+    let metadata_mode = options.previous_ref.is_some() && writers.payload_writer.is_some();
+
     let mut encoder = Encoder::new(
         &mut writers.writer,
         &metadata,
@@ -256,11 +268,23 @@ where
         skip_e2big_xattr: options.skip_e2big_xattr,
         forced_boundaries,
         previous_payload_index,
+        cached_entries: Vec::new(),
+        cached_range: Range::default(),
+        cached_last_chunk: None,
+        cached_hardlinks: HashSet::new(),
+        caching_enabled: false,
     };
 
     archiver
         .archive_dir_contents(&mut encoder, previous_metadata_accessor, source_dir, true)
         .await?;
+
+    if metadata_mode {
+        archiver
+            .flush_cached_reusing_if_below_threshold(&mut encoder, false)
+            .await?;
+    }
+
     encoder.finish().await?;
     encoder.close().await?;
 
@@ -318,7 +342,10 @@ impl Archiver {
             for file_entry in file_list {
                 let file_name = file_entry.name.to_bytes();
 
-                if is_root && file_name == b".pxarexclude-cli" {
+                if is_root
+                    && file_name == b".pxarexclude-cli"
+                    && previous_metadata_accessor.is_none()
+                {
                     self.encode_pxarexclude_cli(encoder, &file_entry.name, old_patterns_count)
                         .await?;
                     continue;
@@ -336,6 +363,7 @@ impl Archiver {
                 .await
                 .map_err(|err| self.wrap_err(err))?;
             }
+
             self.path = old_path;
             self.entry_counter = entry_counter;
             self.patterns.truncate(old_patterns_count);
@@ -618,8 +646,6 @@ impl Archiver {
         c_file_name: &CStr,
         stat: &FileStat,
     ) -> Result<(), Error> {
-        use pxar::format::mode;
-
         let file_mode = stat.st_mode & libc::S_IFMT;
         let open_mode = if file_mode == libc::S_IFREG || file_mode == libc::S_IFDIR {
             OFlag::empty()
@@ -657,6 +683,96 @@ impl Archiver {
             self.skip_e2big_xattr,
         )?;
 
+        if self.previous_payload_index.is_none() {
+            return self
+                .add_entry_to_archive(
+                    encoder,
+                    previous_metadata,
+                    c_file_name,
+                    stat,
+                    fd,
+                    &metadata,
+                    None,
+                )
+                .await;
+        }
+
+        // Avoid having too many open file handles in cached entries
+        if self.cached_entries.len() > MAX_CACHE_SIZE {
+            log::debug!("Max cache size reached, reuse cached entries");
+            self.flush_cached_reusing_if_below_threshold(encoder, true)
+                .await?;
+        }
+
+        if metadata.is_regular_file() {
+            self.cache_or_flush_entries(
+                encoder,
+                previous_metadata,
+                c_file_name,
+                stat,
+                fd,
+                &metadata,
+            )
+            .await
+        } else if self.caching_enabled {
+            if stat.st_mode & libc::S_IFMT == libc::S_IFDIR {
+                let fd_clone = fd.try_clone()?;
+                let cache_entry = CacheEntry::DirEntry(CacheEntryData::new(
+                    fd,
+                    c_file_name.into(),
+                    *stat,
+                    metadata.clone(),
+                    PayloadOffset::default(),
+                ));
+                self.cached_entries.push(cache_entry);
+
+                let dir = Dir::from_fd(fd_clone.into_raw_fd())?;
+                self.add_directory(
+                    encoder,
+                    previous_metadata,
+                    dir,
+                    c_file_name,
+                    &metadata,
+                    stat,
+                )
+                .await?;
+            } else {
+                let cache_entry = CacheEntry::RegEntry(CacheEntryData::new(
+                    fd,
+                    c_file_name.into(),
+                    *stat,
+                    metadata,
+                    PayloadOffset::default(),
+                ));
+                self.cached_entries.push(cache_entry);
+            }
+            Ok(())
+        } else {
+            self.add_entry_to_archive(
+                encoder,
+                previous_metadata,
+                c_file_name,
+                stat,
+                fd,
+                &metadata,
+                None,
+            )
+            .await
+        }
+    }
+
+    async fn add_entry_to_archive<T: SeqWrite + Send>(
+        &mut self,
+        encoder: &mut Encoder<'_, T>,
+        previous_metadata: &mut Option<Directory<MetadataArchiveReader>>,
+        c_file_name: &CStr,
+        stat: &FileStat,
+        fd: OwnedFd,
+        metadata: &Metadata,
+        payload_offset: Option<PayloadOffset>,
+    ) -> Result<(), Error> {
+        use pxar::format::mode;
+
         let file_name: &Path = OsStr::from_bytes(c_file_name.to_bytes()).as_ref();
         match metadata.file_type() {
             mode::IFREG => {
@@ -685,9 +801,14 @@ impl Archiver {
                         .add_file(c_file_name, file_size, stat.st_mtime)?;
                 }
 
-                let offset: LinkOffset = self
-                    .add_regular_file(encoder, fd, file_name, &metadata, file_size)
-                    .await?;
+                let offset: LinkOffset = if let Some(payload_offset) = payload_offset {
+                    encoder
+                        .add_payload_ref(metadata, file_name, file_size, payload_offset)
+                        .await?
+                } else {
+                    self.add_regular_file(encoder, fd, file_name, metadata, file_size)
+                        .await?
+                };
 
                 if stat.st_nlink > 1 {
                     self.hardlinks
@@ -698,59 +819,43 @@ impl Archiver {
             }
             mode::IFDIR => {
                 let dir = Dir::from_fd(fd.into_raw_fd())?;
-
-                if let Some(ref catalog) = self.catalog {
-                    catalog.lock().unwrap().start_directory(c_file_name)?;
-                }
-                let result = self
-                    .add_directory(
-                        encoder,
-                        previous_metadata,
-                        dir,
-                        c_file_name,
-                        &metadata,
-                        stat,
-                    )
-                    .await;
-                if let Some(ref catalog) = self.catalog {
-                    catalog.lock().unwrap().end_directory()?;
-                }
-                result
+                self.add_directory(encoder, previous_metadata, dir, c_file_name, metadata, stat)
+                    .await
             }
             mode::IFSOCK => {
                 if let Some(ref catalog) = self.catalog {
                     catalog.lock().unwrap().add_socket(c_file_name)?;
                 }
 
-                Ok(encoder.add_socket(&metadata, file_name).await?)
+                Ok(encoder.add_socket(metadata, file_name).await?)
             }
             mode::IFIFO => {
                 if let Some(ref catalog) = self.catalog {
                     catalog.lock().unwrap().add_fifo(c_file_name)?;
                 }
 
-                Ok(encoder.add_fifo(&metadata, file_name).await?)
+                Ok(encoder.add_fifo(metadata, file_name).await?)
             }
             mode::IFLNK => {
                 if let Some(ref catalog) = self.catalog {
                     catalog.lock().unwrap().add_symlink(c_file_name)?;
                 }
 
-                self.add_symlink(encoder, fd, file_name, &metadata).await
+                self.add_symlink(encoder, fd, file_name, metadata).await
             }
             mode::IFBLK => {
                 if let Some(ref catalog) = self.catalog {
                     catalog.lock().unwrap().add_block_device(c_file_name)?;
                 }
 
-                self.add_device(encoder, file_name, &metadata, stat).await
+                self.add_device(encoder, file_name, metadata, stat).await
             }
             mode::IFCHR => {
                 if let Some(ref catalog) = self.catalog {
                     catalog.lock().unwrap().add_char_device(c_file_name)?;
                 }
 
-                self.add_device(encoder, file_name, &metadata, stat).await
+                self.add_device(encoder, file_name, metadata, stat).await
             }
             other => bail!(
                 "encountered unknown file type: 0x{:x} (0o{:o})",
@@ -760,18 +865,329 @@ impl Archiver {
         }
     }
 
+    async fn cache_or_flush_entries<T: SeqWrite + Send>(
+        &mut self,
+        encoder: &mut Encoder<'_, T>,
+        previous_metadata_accessor: &mut Option<Directory<MetadataArchiveReader>>,
+        c_file_name: &CStr,
+        stat: &FileStat,
+        fd: OwnedFd,
+        metadata: &Metadata,
+    ) -> Result<(), Error> {
+        let file_name: &Path = OsStr::from_bytes(c_file_name.to_bytes()).as_ref();
+        let reusable = if let Some(accessor) = previous_metadata_accessor {
+            self.is_reusable_entry(accessor, file_name, metadata)
+                .await?
+        } else {
+            None
+        };
+
+        if stat.st_nlink > 1 {
+            let link_info = HardLinkInfo {
+                st_dev: stat.st_dev,
+                st_ino: stat.st_ino,
+            };
+            if self.cached_hardlinks.contains(&link_info) {
+                // This hardlink has been seen by the lookahead cache already, put it on the cache
+                // with a dummy offset and continue without lookup and chunk injection.
+                // On flushing or re-encoding, the logic there will store the actual hardlink with
+                // offset.
+                self.caching_enabled = true;
+                let cache_entry = CacheEntry::RegEntry(CacheEntryData::new(
+                    fd,
+                    c_file_name.into(),
+                    *stat,
+                    metadata.clone(),
+                    PayloadOffset::default(),
+                ));
+                self.cached_entries.push(cache_entry);
+                return Ok(());
+            } else {
+                // mark this hardlink as seen by the lookahead cache
+                self.cached_hardlinks.insert(link_info);
+            }
+        }
+
+        if let Some(payload_range) = reusable {
+            // check for range continuation in payload archive
+            if self.cached_range.end == 0 {
+                // initialize first range to start and end with start of new range
+                self.cached_range.start = payload_range.start;
+                self.cached_range.end = payload_range.start;
+            }
+
+            if self.cached_range.end == payload_range.start {
+                self.cached_range.end = payload_range.end;
+                log::debug!(
+                    "Cache range continuation, new range: {:?}",
+                    self.cached_range
+                );
+            } else {
+                log::debug!("Cache range has hole, new range: {payload_range:?}");
+                self.flush_cached_reusing_if_below_threshold(encoder, true)
+                    .await?;
+                // range has to be set after flushing of cached entries, which resets the range
+                self.cached_range = payload_range.clone();
+            }
+
+            // offset relative to start of current range, does not include possible padding of
+            // actual chunks, which needs to be added before encoding the payload reference
+            let offset =
+                PayloadOffset::default().add(payload_range.start - self.cached_range.start);
+            log::debug!("Offset relative to range start: {offset:?}");
+
+            self.caching_enabled = true;
+            self.cached_entries
+                .push(CacheEntry::RegEntry(CacheEntryData::new(
+                    fd,
+                    c_file_name.into(),
+                    *stat,
+                    metadata.clone(),
+                    offset,
+                )));
+
+            return Ok(());
+        }
+
+        self.flush_cached_reencoding(encoder).await?;
+        self.add_entry_to_archive(
+            encoder,
+            previous_metadata_accessor,
+            c_file_name,
+            stat,
+            fd,
+            metadata,
+            None,
+        )
+        .await
+    }
+
+    async fn flush_cached_reusing_if_below_threshold<T: SeqWrite + Send>(
+        &mut self,
+        encoder: &mut Encoder<'_, T>,
+        keep_last_chunk: bool,
+    ) -> Result<(), Error> {
+        let mut prev_last_chunk = self.cached_last_chunk.take();
+
+        if self.cached_range.is_empty() {
+            if let Some(prev) = prev_last_chunk {
+                // make sure to inject previous last
+                self.inject_chunks_at_current_payload_position(encoder, vec![prev].as_slice())?;
+            }
+            // only non-regular file entries (directories) in cache, which allows regular encoding
+            self.encode_entries_to_archive(encoder, None).await?;
+            return Ok(());
+        }
+
+        if let Some(ref ref_payload_index) = self.previous_payload_index {
+            let (mut indices, start_padding, end_padding) =
+                lookup_dynamic_entries(ref_payload_index, self.cached_range.clone())?;
+            let mut padding = start_padding + end_padding;
+            let total_size = (self.cached_range.end - self.cached_range.start) + padding;
+
+            // take into account used bytes of kept back chunk for padding
+            if let (Some(first), Some(last)) = (indices.first_mut(), prev_last_chunk.as_mut()) {
+                if last.digest() == first.digest() {
+                    // Update padding used for threshold calculation only
+                    let used = last.size() - last.padding;
+                    padding -= used;
+                }
+            }
+
+            let ratio = padding as f64 / total_size as f64;
+
+            if ratio > CHUNK_PADDING_THRESHOLD {
+                log::debug!(
+                    "Padding ratio: {ratio} > {CHUNK_PADDING_THRESHOLD}, padding: {}, total {}, chunks: {}",
+                    HumanByte::from(padding),
+                    HumanByte::from(total_size),
+                    indices.len(),
+                );
+                // do not reuse chunks if introduced padding higher than threshold
+                // opt for re-encoding in that case
+                if let Some(prev) = prev_last_chunk {
+                    // make sure to inject previous last
+                    self.inject_chunks_at_current_payload_position(encoder, vec![prev].as_slice())?;
+                }
+                self.encode_entries_to_archive(encoder, None).await?;
+            } else {
+                log::debug!(
+                    "Padding ratio: {ratio} < {CHUNK_PADDING_THRESHOLD}, padding: {}, total {}, chunks: {}",
+                    HumanByte::from(padding),
+                    HumanByte::from(total_size),
+                    indices.len(),
+                );
+
+                // check for cases where the kept back last chunk is not equal to the first chunk
+                // because the range end aligned with a chunk boundary, so it needs to be injected
+                if let (Some(first), Some(last)) = (indices.first_mut(), prev_last_chunk) {
+                    if last.digest() != first.digest() {
+                        // make sure to inject previous last
+                        self.inject_chunks_at_current_payload_position(
+                            encoder,
+                            vec![last].as_slice(),
+                        )?;
+                    } else {
+                        let used = last.size() - last.padding;
+                        first.padding -= used;
+                    }
+                }
+
+                let base_offset = Some(encoder.payload_position()?.add(start_padding));
+                self.encode_entries_to_archive(encoder, base_offset).await?;
+
+                if keep_last_chunk {
+                    self.cached_last_chunk = indices.pop();
+                }
+
+                self.inject_chunks_at_current_payload_position(encoder, indices.as_slice())?;
+            }
+
+            // clear range while keeping end for possible continuation if entries have been flushed
+            // because the max cache size was reached
+            self.cached_range = self.cached_range.end..self.cached_range.end;
+            self.caching_enabled = false;
+
+            Ok(())
+        } else {
+            bail!("cannot reuse chunks without previous index reader");
+        }
+    }
+
+    // Clear the cache and reencode all cached entries
+    // Make sure to inject a possibly kept back chunk from a previous chunk continuation attempt
+    async fn flush_cached_reencoding<T: SeqWrite + Send>(
+        &mut self,
+        encoder: &mut Encoder<'_, T>,
+    ) -> Result<(), Error> {
+        if let Some(prev) = self.cached_last_chunk.take() {
+            // make sure to inject previous last
+            self.inject_chunks_at_current_payload_position(encoder, vec![prev].as_slice())?;
+        }
+
+        self.encode_entries_to_archive(encoder, None).await?;
+
+        self.cached_range = self.cached_range.end..self.cached_range.end;
+        self.caching_enabled = false;
+        Ok(())
+    }
+
+    // Take ownership of cached entries and encode them to the archive
+    // Encode with reused payload chunks when base offset is some, reencode otherwise
+    async fn encode_entries_to_archive<T: SeqWrite + Send>(
+        &mut self,
+        encoder: &mut Encoder<'_, T>,
+        base_offset: Option<PayloadOffset>,
+    ) -> Result<(), Error> {
+        // take ownership of cached entries, leaving new empty cache behind
+        let entries = std::mem::take(&mut self.cached_entries);
+        log::debug!(
+            "Got {} cache entries to encode: reuse is {}",
+            entries.len(),
+            base_offset.is_some()
+        );
+
+        for entry in entries {
+            match entry {
+                CacheEntry::RegEntry(CacheEntryData {
+                    fd,
+                    c_file_name,
+                    stat,
+                    metadata,
+                    payload_offset,
+                }) => {
+                    self.add_entry_to_archive(
+                        encoder,
+                        &mut None,
+                        &c_file_name,
+                        &stat,
+                        fd,
+                        &metadata,
+                        base_offset.map(|base_offset| payload_offset.add(base_offset.raw())),
+                    )
+                    .await?
+                }
+                CacheEntry::DirEntry(CacheEntryData {
+                    c_file_name,
+                    metadata,
+                    ..
+                }) => {
+                    if let Some(ref catalog) = self.catalog {
+                        catalog.lock().unwrap().start_directory(&c_file_name)?;
+                    }
+                    let dir_name = OsStr::from_bytes(c_file_name.to_bytes());
+                    encoder.create_directory(dir_name, &metadata).await?;
+                }
+                CacheEntry::DirEnd => {
+                    encoder.finish().await?;
+                    if let Some(ref catalog) = self.catalog {
+                        catalog.lock().unwrap().end_directory()?;
+                    }
+                }
+            }
+        }
+
+        Ok(())
+    }
+
+    fn inject_chunks_at_current_payload_position<T: SeqWrite + Send>(
+        &mut self,
+        encoder: &mut Encoder<'_, T>,
+        reused_chunks: &[ReusableDynamicEntry],
+    ) -> Result<(), Error> {
+        let mut injection_boundary = encoder.payload_position()?;
+
+        for chunks in reused_chunks.chunks(128) {
+            let mut chunk_list = Vec::with_capacity(128);
+            let mut size = PayloadOffset::default();
+
+            for chunk in chunks.iter() {
+                log::debug!(
+                    "Injecting chunk with {} padding (chunk size {})",
+                    HumanByte::from(chunk.padding),
+                    HumanByte::from(chunk.size()),
+                );
+                size = size.add(chunk.size());
+                chunk_list.push(chunk.clone());
+            }
+
+            let inject_chunks = InjectChunks {
+                boundary: injection_boundary.raw(),
+                chunks: chunk_list,
+                size: size.raw() as usize,
+            };
+
+            if let Some(sender) = self.forced_boundaries.as_mut() {
+                sender.send(inject_chunks)?;
+            } else {
+                bail!("missing injection queue");
+            };
+
+            injection_boundary = injection_boundary.add(size.raw());
+            log::debug!("Advance payload position by: {size:?}");
+            encoder.advance(size)?;
+        }
+
+        Ok(())
+    }
+
     async fn add_directory<T: SeqWrite + Send>(
         &mut self,
         encoder: &mut Encoder<'_, T>,
         previous_metadata_accessor: &mut Option<Directory<MetadataArchiveReader>>,
         dir: Dir,
-        dir_name: &CStr,
+        c_dir_name: &CStr,
         metadata: &Metadata,
         stat: &FileStat,
     ) -> Result<(), Error> {
-        let dir_name = OsStr::from_bytes(dir_name.to_bytes());
+        let dir_name = OsStr::from_bytes(c_dir_name.to_bytes());
 
-        encoder.create_directory(dir_name, metadata).await?;
+        if !self.caching_enabled {
+            if let Some(ref catalog) = self.catalog {
+                catalog.lock().unwrap().start_directory(c_dir_name)?;
+            }
+            encoder.create_directory(dir_name, metadata).await?;
+        }
 
         let old_fs_magic = self.fs_magic;
         let old_fs_feature_flags = self.fs_feature_flags;
@@ -811,7 +1227,17 @@ impl Archiver {
         self.fs_feature_flags = old_fs_feature_flags;
         self.current_st_dev = old_st_dev;
 
-        encoder.finish().await?;
+        if !self.caching_enabled {
+            encoder.finish().await?;
+            if let Some(ref catalog) = self.catalog {
+                if !self.caching_enabled {
+                    catalog.lock().unwrap().end_directory()?;
+                }
+            }
+        } else {
+            self.cached_entries.push(CacheEntry::DirEnd);
+        }
+
         result
     }
 
-- 
2.39.2

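Note (not part of the patch): the reuse decision made in
flush_cached_reusing_if_below_threshold() can be sketched standalone as
below. This is a minimal sketch only. The threshold value of 0.1 and the
function name should_reuse() are assumptions made for illustration; the
actual CHUNK_PADDING_THRESHOLD constant is defined outside of the hunks
shown here. The correction for a kept back last chunk is left out.

```rust
use std::ops::Range;

/// Hypothetical value, for this sketch only.
const CHUNK_PADDING_THRESHOLD: f64 = 0.1;

/// Returns true if the chunks covering `range` should be reused,
/// false if the cached entries should be re-encoded instead.
#[allow(dead_code)]
fn should_reuse(range: &Range<u64>, start_padding: u64, end_padding: u64) -> bool {
    // Padding is the part of the covering chunks that lies outside of the
    // requested payload range, before its start and after its end.
    let padding = start_padding + end_padding;
    let total_size = (range.end - range.start) + padding;
    if total_size == 0 {
        // Nothing to reuse, and avoids a division by zero below.
        return false;
    }

    let ratio = padding as f64 / total_size as f64;
    // Same comparison direction as in the patch: re-encode only if the
    // ratio is strictly above the threshold.
    ratio <= CHUNK_PADDING_THRESHOLD
}
```

With these assumed values, a 1000 byte range with 50 bytes of padding is
reused, while a 100 byte range dragging along 900 bytes of padding is
re-encoded.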

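Note (not part of the patch): the range continuation check in
cache_or_flush_entries() can be sketched in isolation as below. This is
a simplified sketch under assumptions: the enum RangeUpdate and the
function update_cached_range() are hypothetical names, and the flush of
cached entries that the patch performs on a hole is only indicated by a
comment.

```rust
use std::ops::Range;

/// Hypothetical result type, used by this sketch only.
#[allow(dead_code)]
#[derive(Debug, PartialEq)]
enum RangeUpdate {
    /// The new range starts where the cached range ended.
    Extended,
    /// The new range is not adjacent to the cached range.
    HoleDetected,
}

#[allow(dead_code)]
fn update_cached_range(cached: &mut Range<u64>, new: &Range<u64>) -> RangeUpdate {
    if cached.end == 0 {
        // Initialize the first range to start and end at the start of the new range.
        cached.start = new.start;
        cached.end = new.start;
    }

    if cached.end == new.start {
        cached.end = new.end;
        RangeUpdate::Extended
    } else {
        // The patch flushes the cached entries at this point and sets the
        // range only afterwards, since flushing resets the range.
        *cached = new.clone();
        RangeUpdate::HoleDetected
    }
}
```

Starting from an empty range, offering 64..128 and then 128..200 grows
the cached range to 64..200, while a following 300..400 is reported as a
hole and replaces it.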

_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel


^ permalink raw reply	[flat|nested] 67+ messages in thread

* [pbs-devel] [PATCH v5 proxmox-backup 47/62] client: backup writer: add injected chunk count to stats
  2024-05-07 15:51 [pbs-devel] [PATCH v5 pxar proxmox-backup 00/62] fix #3174: improve file-level backup Christian Ebner
                   ` (45 preceding siblings ...)
  2024-05-07 15:52 ` [pbs-devel] [PATCH v5 proxmox-backup 46/62] fix #3174: client: pxar: enable caching and meta comparison Christian Ebner
@ 2024-05-07 15:52 ` Christian Ebner
  2024-05-07 15:52 ` [pbs-devel] [PATCH v5 proxmox-backup 48/62] pxar: create: keep track of reused chunks and files Christian Ebner
                   ` (15 subsequent siblings)
  62 siblings, 0 replies; 67+ messages in thread
From: Christian Ebner @ 2024-05-07 15:52 UTC (permalink / raw)
  To: pbs-devel

Track the number of injected chunks and show it in the debug output.

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
changes since version 4:
- no changes

 pbs-client/src/backup_writer.rs | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/pbs-client/src/backup_writer.rs b/pbs-client/src/backup_writer.rs
index 66f209fed..3d1b60d59 100644
--- a/pbs-client/src/backup_writer.rs
+++ b/pbs-client/src/backup_writer.rs
@@ -57,6 +57,7 @@ pub struct UploadOptions {
 struct UploadStats {
     chunk_count: usize,
     chunk_reused: usize,
+    chunk_injected: usize,
     size: usize,
     size_reused: usize,
     size_compressed: usize,
@@ -400,6 +401,10 @@ impl BackupWriter {
                 archive,
                 (upload_stats.duration.as_micros()) / (upload_stats.chunk_count as u128)
             );
+            log::debug!(
+                "{archive}: Injected {} chunks from previous snapshot.",
+                upload_stats.chunk_injected,
+            );
         }
 
         let param = json!({
@@ -645,6 +650,8 @@ impl BackupWriter {
         let total_chunks2 = total_chunks.clone();
         let known_chunk_count = Arc::new(AtomicUsize::new(0));
         let known_chunk_count2 = known_chunk_count.clone();
+        let injected_chunk_count = Arc::new(AtomicUsize::new(0));
+        let injected_chunk_count2 = injected_chunk_count.clone();
 
         let stream_len = Arc::new(AtomicUsize::new(0));
         let stream_len1 = stream_len.clone();
@@ -675,6 +682,7 @@ impl BackupWriter {
                     // account for injected chunks
                     let count = chunks.len();
                     total_chunks.fetch_add(count, Ordering::SeqCst);
+                    injected_chunk_count.fetch_add(count, Ordering::SeqCst);
 
                     let mut known = Vec::new();
                     let mut csum = index_csum_1.lock().unwrap();
@@ -800,6 +808,7 @@ impl BackupWriter {
                 let duration = start_time.elapsed();
                 let chunk_count = total_chunks2.load(Ordering::SeqCst);
                 let chunk_reused = known_chunk_count2.load(Ordering::SeqCst);
+                let chunk_injected = injected_chunk_count2.load(Ordering::SeqCst);
                 let size = stream_len2.load(Ordering::SeqCst);
                 let size_reused = reused_len2.load(Ordering::SeqCst);
                 let size_compressed = compressed_stream_len2.load(Ordering::SeqCst) as usize;
@@ -813,6 +822,7 @@ impl BackupWriter {
                 futures::future::ok(UploadStats {
                     chunk_count,
                     chunk_reused,
+                    chunk_injected,
                     size,
                     size_reused,
                     size_compressed,
-- 
2.39.2

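Note (not part of the patch): the counter handling follows the usual
pattern of two handles to one shared atomic, one for the writer side and
one for the reader side. A minimal sketch of that pattern, assuming plain
threads instead of the upload stream; the function count_injected() is a
hypothetical name used for illustration only.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

/// Adds up the given batch sizes from several threads and reads the
/// total back through a second handle to the same counter.
#[allow(dead_code)]
fn count_injected(batches: Vec<usize>) -> usize {
    let injected_chunk_count = Arc::new(AtomicUsize::new(0));
    // The reading handle has to be cloned from the counter it is meant
    // to observe, otherwise it reports an unrelated value.
    let injected_chunk_count2 = Arc::clone(&injected_chunk_count);

    let handles: Vec<_> = batches
        .into_iter()
        .map(|count| {
            let counter = Arc::clone(&injected_chunk_count);
            thread::spawn(move || {
                counter.fetch_add(count, Ordering::SeqCst);
            })
        })
        .collect();

    for handle in handles {
        handle.join().expect("counter thread panicked");
    }

    injected_chunk_count2.load(Ordering::SeqCst)
}
```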




^ permalink raw reply	[flat|nested] 67+ messages in thread

* [pbs-devel] [PATCH v5 proxmox-backup 48/62] pxar: create: keep track of reused chunks and files
  2024-05-07 15:51 [pbs-devel] [PATCH v5 pxar proxmox-backup 00/62] fix #3174: improve file-level backup Christian Ebner
                   ` (46 preceding siblings ...)
  2024-05-07 15:52 ` [pbs-devel] [PATCH v5 proxmox-backup 47/62] client: backup writer: add injected chunk count to stats Christian Ebner
@ 2024-05-07 15:52 ` Christian Ebner
  2024-05-07 15:52 ` [pbs-devel] [PATCH v5 proxmox-backup 49/62] pxar: create: show chunk injection stats debug output Christian Ebner
                   ` (14 subsequent siblings)
  62 siblings, 0 replies; 67+ messages in thread
From: Christian Ebner @ 2024-05-07 15:52 UTC (permalink / raw)
  To: pbs-devel

Track and log reused or reencoded files as well as the reused chunks
and their paddings.

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
changes since version 4:
- no changes

 pbs-client/src/pxar/create.rs | 27 +++++++++++++++++++++++++++
 1 file changed, 27 insertions(+)

diff --git a/pbs-client/src/pxar/create.rs b/pbs-client/src/pxar/create.rs
index b2932c973..b03bd5a17 100644
--- a/pbs-client/src/pxar/create.rs
+++ b/pbs-client/src/pxar/create.rs
@@ -141,6 +141,17 @@ struct HardLinkInfo {
     st_ino: u64,
 }
 
+#[derive(Default)]
+struct ReuseStats {
+    files_reused_count: u64,
+    files_hardlink_count: u64,
+    files_reencoded_count: u64,
+    total_injected_count: u64,
+    partial_chunks_count: u64,
+    total_injected_size: u64,
+    total_reused_payload_size: u64,
+}
+
 struct Archiver {
     feature_flags: Flags,
     fs_feature_flags: Flags,
@@ -164,6 +175,7 @@ struct Archiver {
     cached_range: Range<u64>,
     cached_last_chunk: Option<ReusableDynamicEntry>,
     caching_enabled: bool,
+    reuse_stats: ReuseStats,
 }
 
 type Encoder<'a, T> = pxar::encoder::aio::Encoder<'a, T>;
@@ -273,6 +285,7 @@ where
         cached_last_chunk: None,
         cached_hardlinks: HashSet::new(),
         caching_enabled: false,
+        reuse_stats: ReuseStats::default(),
     };
 
     archiver
@@ -802,15 +815,22 @@ impl Archiver {
                 }
 
                 let offset: LinkOffset = if let Some(payload_offset) = payload_offset {
+                    self.reuse_stats.total_reused_payload_size +=
+                        file_size + size_of::<pxar::format::Header>() as u64;
+                    self.reuse_stats.files_reused_count += 1;
+
                     encoder
                         .add_payload_ref(metadata, file_name, file_size, payload_offset)
                         .await?
                 } else {
+                    self.reuse_stats.files_reencoded_count += 1;
+
                     self.add_regular_file(encoder, fd, file_name, metadata, file_size)
                         .await?
                 };
 
                 if stat.st_nlink > 1 {
+                    self.reuse_stats.files_hardlink_count += 1;
                     self.hardlinks
                         .insert(link_info, (self.path.clone(), offset));
                 }
@@ -1147,6 +1167,13 @@ impl Archiver {
                     HumanByte::from(chunk.padding),
                     HumanByte::from(chunk.size()),
                 );
+                self.reuse_stats.total_injected_size += chunk.size();
+                self.reuse_stats.total_injected_count += 1;
+
+                if chunk.padding > 0 {
+                    self.reuse_stats.partial_chunks_count += 1;
+                }
+
                 size = size.add(chunk.size());
                 chunk_list.push(chunk.clone());
             }
-- 
2.39.2

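Note (not part of the patch): the per chunk accounting added to
inject_chunks_at_current_payload_position() can be sketched on its own
as below. The struct only carries the three fields touched there, and
the method account_chunk() is a hypothetical helper for illustration;
the patch updates the fields inline.

```rust
/// Reduced copy of the stats struct, for this sketch only.
#[allow(dead_code)]
#[derive(Debug, Default, PartialEq)]
struct ReuseStats {
    total_injected_count: u64,
    partial_chunks_count: u64,
    total_injected_size: u64,
}

#[allow(dead_code)]
impl ReuseStats {
    /// Account for one injected chunk of `size` bytes, `padding` bytes of
    /// which are not referenced by any reused file payload.
    fn account_chunk(&mut self, size: u64, padding: u64) {
        self.total_injected_size += size;
        self.total_injected_count += 1;

        // A chunk with padding is only partially reused.
        if padding > 0 {
            self.partial_chunks_count += 1;
        }
    }
}
```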




^ permalink raw reply	[flat|nested] 67+ messages in thread

* [pbs-devel] [PATCH v5 proxmox-backup 49/62] pxar: create: show chunk injection stats debug output
  2024-05-07 15:51 [pbs-devel] [PATCH v5 pxar proxmox-backup 00/62] fix #3174: improve file-level backup Christian Ebner
                   ` (47 preceding siblings ...)
  2024-05-07 15:52 ` [pbs-devel] [PATCH v5 proxmox-backup 48/62] pxar: create: keep track of reused chunks and files Christian Ebner
@ 2024-05-07 15:52 ` Christian Ebner
  2024-05-07 15:52 ` [pbs-devel] [PATCH v5 proxmox-backup 50/62] client: pxar: add helper to handle optional preludes Christian Ebner
                   ` (13 subsequent siblings)
  62 siblings, 0 replies; 67+ messages in thread
From: Christian Ebner @ 2024-05-07 15:52 UTC (permalink / raw)
  To: pbs-devel

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
changes since version 4:
- no changes

 pbs-client/src/pxar/create.rs | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)

diff --git a/pbs-client/src/pxar/create.rs b/pbs-client/src/pxar/create.rs
index b03bd5a17..6dd0f3106 100644
--- a/pbs-client/src/pxar/create.rs
+++ b/pbs-client/src/pxar/create.rs
@@ -301,6 +301,27 @@ where
     encoder.finish().await?;
     encoder.close().await?;
 
+    if metadata_mode {
+        log::info!(
+            "Change detection: processed {} files: {} reencoded, {} reused, {} hardlinks",
+            archiver.reuse_stats.files_reused_count
+                + archiver.reuse_stats.files_reencoded_count
+                + archiver.reuse_stats.files_hardlink_count,
+            archiver.reuse_stats.files_reencoded_count,
+            archiver.reuse_stats.files_reused_count,
+            archiver.reuse_stats.files_hardlink_count,
+        );
+        if archiver.reuse_stats.total_reused_payload_size > 0 {
+            log::info!(
+                "Change detection: reused {} data, {} padding: total {} in {} chunks ({} partial chunks)",
+                HumanByte::from(archiver.reuse_stats.total_reused_payload_size),
+                HumanByte::from(archiver.reuse_stats.total_injected_size - archiver.reuse_stats.total_reused_payload_size),
+                HumanByte::from(archiver.reuse_stats.total_injected_size),
+                archiver.reuse_stats.total_injected_count,
+                archiver.reuse_stats.partial_chunks_count,
+            );
+        }
+    }
     Ok(())
 }
 
-- 
2.39.2

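Note (not part of the patch): the padding shown in the summary is the
difference between the injected size and the reused payload size. A
minimal sketch of that calculation follows. It prints plain byte counts
instead of using HumanByte, the function reuse_summary() is a
hypothetical name, and the use of saturating_sub() is an assumption of
this sketch only; the patch uses a plain subtraction.

```rust
/// Builds the summary line, or returns None if no payload was reused.
#[allow(dead_code)]
fn reuse_summary(
    reused_payload_size: u64,
    injected_size: u64,
    injected_count: u64,
    partial_chunks_count: u64,
) -> Option<String> {
    if reused_payload_size == 0 {
        return None;
    }

    // Everything injected but not referenced by a reused file is padding.
    let padding = injected_size.saturating_sub(reused_payload_size);

    Some(format!(
        "reused {reused_payload_size} B data, {padding} B padding: \
         total {injected_size} B in {injected_count} chunks \
         ({partial_chunks_count} partial chunks)"
    ))
}
```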




^ permalink raw reply	[flat|nested] 67+ messages in thread

* [pbs-devel] [PATCH v5 proxmox-backup 50/62] client: pxar: add helper to handle optional preludes
  2024-05-07 15:51 [pbs-devel] [PATCH v5 pxar proxmox-backup 00/62] fix #3174: improve file-level backup Christian Ebner
                   ` (48 preceding siblings ...)
  2024-05-07 15:52 ` [pbs-devel] [PATCH v5 proxmox-backup 49/62] pxar: create: show chunk injection stats debug output Christian Ebner
@ 2024-05-07 15:52 ` Christian Ebner
  2024-05-07 15:52 ` [pbs-devel] [PATCH v5 proxmox-backup 51/62] client: pxar: opt encode cli exclude patterns as Prelude Christian Ebner
                   ` (12 subsequent siblings)
  62 siblings, 0 replies; 67+ messages in thread
From: Christian Ebner @ 2024-05-07 15:52 UTC (permalink / raw)
  To: pbs-devel

Pxar archives with format version 2 allow storing optional file
format version and prelude entries.

Cover the case for these entries: the file format version entry was
introduced to distinguish between different file formats used for
encoding, while the prelude entry is used to store optional metadata
such as the pxar cli exclude parameters.

Add the logic to accept and decode these prelude entries when
accessing the archive via a decoder instance.

For now simply ignore them.

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
changes since version 4:
- no changes
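
For readers following along, the entry ordering the new helper accepts
(optional version, then optional prelude, then the root directory) can be
sketched without the pxar crate. This is only an illustration: `Kind` and
`split_leading` are hypothetical stand-ins, not the real `pxar::EntryKind` or
decoder API. As in the helper in this patch, the sketch does not check the
kind of the entry following a prelude; callers are expected to do that.

```rust
/// Stand-in for the entry kinds relevant here; NOT the real `pxar::EntryKind`.
#[derive(Debug, Clone, PartialEq)]
enum Kind {
    Version(u64),
    Prelude(Vec<u8>),
    Directory,
    File,
}

/// Root entry plus the optional version and prelude entries preceding it.
type Leading = (Kind, Option<Kind>, Option<Kind>);

/// Consume the leading entries of a (hypothetical) entry stream.
fn split_leading(entries: &mut impl Iterator<Item = Kind>) -> Result<Leading, String> {
    let mut next = || entries.next().ok_or_else(|| "missing root entry".to_string());

    let first = next()?;
    match first {
        // Format version 1: the archive starts directly with the root directory.
        Kind::Directory => Ok((first, None, None)),
        // Format version 2: a version entry, optionally followed by a prelude.
        Kind::Version(_) => {
            let second = next()?;
            match second {
                Kind::Directory => Ok((second, Some(first), None)),
                Kind::Prelude(_) => {
                    // Not validated here, mirroring the helper in the patch.
                    let third = next()?;
                    Ok((third, Some(first), Some(second)))
                }
                _ => Err(format!("unexpected entry kind {second:?}")),
            }
        }
        _ => Err(format!("unexpected entry kind {first:?}")),
    }
}
```

After the call, the iterator is positioned at the first entry below the root,
which is what the extract and tape restore callers rely on.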

 pbs-client/src/pxar/create.rs             |  1 +
 pbs-client/src/pxar/extract.rs            |  7 +++---
 pbs-client/src/pxar/tools.rs              |  7 ++++++
 pbs-client/src/tools/mod.rs               | 27 +++++++++++++++++++++++
 src/api2/tape/restore.rs                  | 17 +++++---------
 src/tape/file_formats/snapshot_archive.rs |  8 +++++--
 6 files changed, 51 insertions(+), 16 deletions(-)

diff --git a/pbs-client/src/pxar/create.rs b/pbs-client/src/pxar/create.rs
index 6dd0f3106..153c71349 100644
--- a/pbs-client/src/pxar/create.rs
+++ b/pbs-client/src/pxar/create.rs
@@ -241,6 +241,7 @@ where
         &mut writers.writer,
         &metadata,
         writers.payload_writer.as_mut(),
+        None,
     )
     .await?;
 
diff --git a/pbs-client/src/pxar/extract.rs b/pbs-client/src/pxar/extract.rs
index 5f5ac6188..23b2f6ba5 100644
--- a/pbs-client/src/pxar/extract.rs
+++ b/pbs-client/src/pxar/extract.rs
@@ -29,6 +29,7 @@ use proxmox_compression::zip::{ZipEncoder, ZipEntry};
 use crate::pxar::dir_stack::PxarDirStack;
 use crate::pxar::metadata;
 use crate::pxar::Flags;
+use crate::tools::handle_root_with_optional_format_version_prelude;
 
 pub struct PxarExtractOptions<'a> {
     pub match_list: &'a [MatchEntry],
@@ -124,9 +125,7 @@ where
         // we use this to keep track of our directory-traversal
         decoder.enable_goodbye_entries(true);
 
-        let root = decoder
-            .next()
-            .context("found empty pxar archive")?
+        let (root, _, _) = handle_root_with_optional_format_version_prelude(&mut decoder)
             .context("error reading pxar archive")?;
 
         if !root.is_dir() {
@@ -267,6 +266,8 @@ where
         };
 
         let extract_res = match (did_match, entry.kind()) {
+            (_, EntryKind::Version(_version)) => Ok(()),
+            (_, EntryKind::Prelude(_prelude)) => Ok(()),
             (_, EntryKind::Directory) => {
                 self.callback(entry.path());
 
diff --git a/pbs-client/src/pxar/tools.rs b/pbs-client/src/pxar/tools.rs
index 459951d50..27e5185a3 100644
--- a/pbs-client/src/pxar/tools.rs
+++ b/pbs-client/src/pxar/tools.rs
@@ -172,6 +172,13 @@ pub fn format_multi_line_entry(entry: &Entry) -> String {
     let meta = entry.metadata();
 
     let (size, link, type_name, payload_offset) = match entry.kind() {
+        EntryKind::Version(version) => (format!("{version:?}"), String::new(), "version", None),
+        EntryKind::Prelude(prelude) => (
+            "0".to_string(),
+            format!("raw data: {:?} bytes", prelude.data.len()),
+            "prelude",
+            None,
+        ),
         EntryKind::File {
             size,
             payload_offset,
diff --git a/pbs-client/src/tools/mod.rs b/pbs-client/src/tools/mod.rs
index f8d3102d1..e6cf066e4 100644
--- a/pbs-client/src/tools/mod.rs
+++ b/pbs-client/src/tools/mod.rs
@@ -529,3 +529,30 @@ pub fn place_xdg_file(
         .and_then(|base| base.place_config_file(file_name).map_err(Error::from))
         .with_context(|| format!("failed to place {} in xdg home", description))
 }
+
+pub fn handle_root_with_optional_format_version_prelude<R: pxar::decoder::SeqRead>(
+    decoder: &mut pxar::decoder::sync::Decoder<R>,
+) -> Result<(pxar::Entry, Option<pxar::Entry>, Option<pxar::Entry>), Error> {
+    let first = decoder
+        .next()
+        .ok_or_else(|| format_err!("missing root entry"))??;
+    match first.kind() {
+        pxar::EntryKind::Directory => Ok((first, None, None)),
+        pxar::EntryKind::Version(_version) => {
+            let second = decoder
+                .next()
+                .ok_or_else(|| format_err!("missing root entry"))??;
+            match second.kind() {
+                pxar::EntryKind::Directory => Ok((second, Some(first), None)),
+                pxar::EntryKind::Prelude(_prelude) => {
+                    let third = decoder
+                        .next()
+                        .ok_or_else(|| format_err!("missing root entry"))??;
+                    Ok((third, Some(first), Some(second)))
+                }
+                _ => bail!("unexpected entry kind {:?}", second.kind()),
+            }
+        }
+        _ => bail!("unexpected entry kind {:?}", first.kind()),
+    }
+}
diff --git a/src/api2/tape/restore.rs b/src/api2/tape/restore.rs
index 11fb2b4cd..46093c7b0 100644
--- a/src/api2/tape/restore.rs
+++ b/src/api2/tape/restore.rs
@@ -23,6 +23,7 @@ use pbs_api_types::{
     PRIV_DATASTORE_MODIFY, PRIV_TAPE_READ, TAPE_RESTORE_NAMESPACE_SCHEMA,
     TAPE_RESTORE_SNAPSHOT_SCHEMA, UPID_SCHEMA,
 };
+use pbs_client::tools::handle_root_with_optional_format_version_prelude;
 use pbs_config::CachedUserInfo;
 use pbs_datastore::dynamic_index::DynamicIndexReader;
 use pbs_datastore::fixed_index::FixedIndexReader;
@@ -1712,17 +1713,11 @@ fn try_restore_snapshot_archive<R: pxar::decoder::SeqRead>(
     decoder: &mut pxar::decoder::sync::Decoder<R>,
     snapshot_path: &Path,
 ) -> Result<BackupManifest, Error> {
-    let _root = match decoder.next() {
-        None => bail!("missing root entry"),
-        Some(root) => {
-            let root = root?;
-            match root.kind() {
-                pxar::EntryKind::Directory => { /* Ok */ }
-                _ => bail!("wrong root entry type"),
-            }
-            root
-        }
-    };
+    let (root, _, _) = handle_root_with_optional_format_version_prelude(decoder)?;
+    match root.kind() {
+        pxar::EntryKind::Directory => { /* Ok */ }
+        _ => bail!("wrong root entry type"),
+    }
 
     let root_path = Path::new("/");
     let manifest_file_name = OsStr::new(MANIFEST_BLOB_NAME);
diff --git a/src/tape/file_formats/snapshot_archive.rs b/src/tape/file_formats/snapshot_archive.rs
index 43d1cf9c3..7e052919b 100644
--- a/src/tape/file_formats/snapshot_archive.rs
+++ b/src/tape/file_formats/snapshot_archive.rs
@@ -58,8 +58,12 @@ pub fn tape_write_snapshot_archive<'a>(
             ));
         }
 
-        let mut encoder =
-            pxar::encoder::sync::Encoder::new(PxarTapeWriter::new(writer), &root_metadata, None)?;
+        let mut encoder = pxar::encoder::sync::Encoder::new(
+            PxarTapeWriter::new(writer),
+            &root_metadata,
+            None,
+            None,
+        )?;
 
         for filename in file_list.iter() {
             let mut file = snapshot_reader.open_file(filename).map_err(|err| {
-- 
2.39.2




* [pbs-devel] [PATCH v5 proxmox-backup 51/62] client: pxar: opt encode cli exclude patterns as Prelude
  2024-05-07 15:51 [pbs-devel] [PATCH v5 pxar proxmox-backup 00/62] fix #3174: improve file-level backup Christian Ebner
                   ` (49 preceding siblings ...)
  2024-05-07 15:52 ` [pbs-devel] [PATCH v5 proxmox-backup 50/62] client: pxar: add helper to handle optional preludes Christian Ebner
@ 2024-05-07 15:52 ` Christian Ebner
  2024-05-07 15:52 ` [pbs-devel] [PATCH v5 proxmox-backup 52/62] docs: file formats: describe split pxar archive file layout Christian Ebner
                   ` (11 subsequent siblings)
  62 siblings, 0 replies; 67+ messages in thread
From: Christian Ebner @ 2024-05-07 15:52 UTC (permalink / raw)
  To: pbs-devel

Instead of encoding the pxar cli exclude patterns as a regular file
within the root directory of an archive, store this information
directly after the pxar format version entry in the entry of kind
Prelude.

This behaviour is however currently exclusive to archives written
with format version 2 in the split metadata and payload case.

This is a breaking change for the encoding of new cli exclude
parameters. Any new exclude parameter will not be added to an already
present .pxarexclude-cli file, and the file will not be created if not
present.

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
changes since version 4:
- no changes

 pbs-client/src/pxar/create.rs | 29 ++++++++++++++++++-----------
 1 file changed, 18 insertions(+), 11 deletions(-)

diff --git a/pbs-client/src/pxar/create.rs b/pbs-client/src/pxar/create.rs
index 153c71349..19f2349fa 100644
--- a/pbs-client/src/pxar/create.rs
+++ b/pbs-client/src/pxar/create.rs
@@ -235,16 +235,6 @@ where
         set.insert(stat.st_dev);
     }
 
-    let metadata_mode = options.previous_ref.is_some() && writers.payload_writer.is_some();
-
-    let mut encoder = Encoder::new(
-        &mut writers.writer,
-        &metadata,
-        writers.payload_writer.as_mut(),
-        None,
-    )
-    .await?;
-
     let mut patterns = options.patterns;
 
     if options.skip_lost_and_found {
@@ -254,6 +244,15 @@ where
             MatchType::Exclude,
         )?);
     }
+
+    let cli_params_content = generate_pxar_excludes_cli(&patterns[..]);
+    let cli_params = if options.previous_ref.is_some() {
+        Some(cli_params_content.as_slice())
+    } else {
+        None
+    };
+
+    let metadata_mode = options.previous_ref.is_some() && writers.payload_writer.is_some();
     let (previous_payload_index, previous_metadata_accessor) =
         if let Some(refs) = options.previous_ref {
             (
@@ -264,6 +263,14 @@ where
             (None, None)
         };
 
+    let mut encoder = Encoder::new(
+        &mut writers.writer,
+        &metadata,
+        writers.payload_writer.as_mut(),
+        cli_params,
+    )
+    .await?;
+
     let mut archiver = Archiver {
         feature_flags,
         fs_feature_flags,
@@ -362,7 +369,7 @@ impl Archiver {
 
             let mut file_list = self.generate_directory_file_list(&mut dir, is_root)?;
 
-            if is_root && old_patterns_count > 0 {
+            if is_root && old_patterns_count > 0 && previous_metadata_accessor.is_none() {
                 file_list.push(FileListEntry {
                     name: CString::new(".pxarexclude-cli").unwrap(),
                     path: PathBuf::new(),
-- 
2.39.2




* [pbs-devel] [PATCH v5 proxmox-backup 52/62] docs: file formats: describe split pxar archive file layout
  2024-05-07 15:51 [pbs-devel] [PATCH v5 pxar proxmox-backup 00/62] fix #3174: improve file-level backup Christian Ebner
                   ` (50 preceding siblings ...)
  2024-05-07 15:52 ` [pbs-devel] [PATCH v5 proxmox-backup 51/62] client: pxar: opt encode cli exclude patterns as Prelude Christian Ebner
@ 2024-05-07 15:52 ` Christian Ebner
  2024-05-07 15:52 ` [pbs-devel] [PATCH v5 proxmox-backup 53/62] docs: add section describing change detection mode Christian Ebner
                   ` (10 subsequent siblings)
  62 siblings, 0 replies; 67+ messages in thread
From: Christian Ebner @ 2024-05-07 15:52 UTC (permalink / raw)
  To: pbs-devel

Describes the pxar metadata archive and the corresponding pxar payload
file-format layout.

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
changes since version 4:
- no changes
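
To make the documented header layout concrete, here is a minimal sketch of a
16 byte little-endian header made of a type hash followed by a size. The
constant value and both function names are hypothetical; the real type hashes
are defined in the pxar crate, and whether the size field counts the header
itself is not stated in this patch, so the sketch leaves that to the caller.

```rust
/// Hypothetical type hash, for illustration only. The real constants are
/// defined in the pxar crate.
const EXAMPLE_PAYLOAD_HTYPE: u64 = 0x0123_4567_89ab_cdef;

/// Size of a header in bytes, as documented: `[u8; 16]`.
const HEADER_SIZE: usize = 16;

/// Encode a header as two little-endian u64 values: type hash, then size.
fn encode_header(htype: u64, size: u64) -> [u8; HEADER_SIZE] {
    let mut buf = [0u8; HEADER_SIZE];
    buf[..8].copy_from_slice(&htype.to_le_bytes());
    buf[8..].copy_from_slice(&size.to_le_bytes());
    buf
}

/// Decode a header back into its (type hash, size) pair.
fn decode_header(buf: &[u8; HEADER_SIZE]) -> (u64, u64) {
    let mut htype = [0u8; 8];
    let mut size = [0u8; 8];
    htype.copy_from_slice(&buf[..8]);
    size.copy_from_slice(&buf[8..]);
    (u64::from_le_bytes(htype), u64::from_le_bytes(size))
}
```

A reader walking the payload file would decode one such header, compare the
type hash against the expected marker, and then skip or read the payload bytes
that follow.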

 docs/file-formats.rst         | 46 ++++++++++++++++++++++++++++++++
 docs/meta-format-overview.dot | 50 +++++++++++++++++++++++++++++++++++
 2 files changed, 96 insertions(+)
 create mode 100644 docs/meta-format-overview.dot

diff --git a/docs/file-formats.rst b/docs/file-formats.rst
index 43ecfefce..77d55b5ef 100644
--- a/docs/file-formats.rst
+++ b/docs/file-formats.rst
@@ -8,7 +8,53 @@ Proxmox File Archive Format (``.pxar``)
 
 .. graphviz:: pxar-format-overview.dot
 
+.. _pxar-meta-format:
 
+Proxmox File Archive Format - Meta (``.mpxar``)
+-----------------------------------------------
+
+Pxar metadata archive with the same structure as a regular pxar archive, except
+that regular file payloads are not contained within the archive itself. They
+are instead stored as payload references to the corresponding pxar payload
+(``.ppxar``) file.
+
+Can be used to look up all the archive entries and metadata without the size
+overhead introduced by the file payloads.
+
+.. graphviz:: meta-format-overview.dot
+
+.. _ppxar-format:
+
+Proxmox File Archive Format - Payload (``.ppxar``)
+--------------------------------------------------
+
+Pxar payload file storing regular file payloads to be referenced and accessed by
+the corresponding pxar metadata (``.mpxar``) archive. Contains a concatenation
+of regular file payloads, each prefixed by a ``PAYLOAD`` header. Further, the
+actual referenced payload entries might be separated by padding (full/partial
+payloads not referenced), introduced when reusing chunks of a previous backup
+run, when chunk boundaries did not align to payload entry offsets.
+
+All headers are stored as little-endian.
+
+.. list-table::
+   :widths: auto
+
+   * - ``PAYLOAD_START_MARKER``
+     - header of ``[u8; 16]`` consisting of type hash and size;
+       marks start
+   * - ``PAYLOAD``
+     - header of ``[u8; 16]`` consisting of type hash and size;
+       referenced by metadata archive
+   * - Payload
+     - raw regular file payload
+   * - Padding
+     - partial/full unreferenced payloads, caused by unaligned chunk boundary
+   * - ...
+     - further concatenation of payload header, payload and padding
+   * - ``PAYLOAD_TAIL_MARKER``
+     - header of ``[u8; 16]`` consisting of type hash and size;
+       marks end
 .. _data-blob-format:
 
 Data Blob Format (``.blob``)
diff --git a/docs/meta-format-overview.dot b/docs/meta-format-overview.dot
new file mode 100644
index 000000000..7eea4b55b
--- /dev/null
+++ b/docs/meta-format-overview.dot
@@ -0,0 +1,50 @@
+digraph g {
+graph [
+rankdir = "LR"
+fontname="Helvetica"
+];
+node [
+fontsize = "16"
+shape = "record"
+];
+edge [
+];
+
+"archive" [
+label = "archive.mpxar"
+shape = "record"
+];
+
+"rootdir" [
+label = "<fv>FORMAT_VERSION\l|PRELUDE\l|<f0>ENTRY\l|\{XATTR\}\* extended attribute list\l|\{ACL_USER\}\* USER ACL entries\l|\{ACL_GROUP\}\* GROUP ACL entries\l|\[ACL_GROUP_OBJ\] the ACL_GROUP_OBJ \l|\[ACL_DEFAULT\] the various default ACL fields\l|\{ACL_DEFAULT_USER\}\* USER ACL entries\l|\{ACL_DEFAULT_GROUP\}\* GROUP ACL entries\l|\[FCAPS\] file capability in Linux disk format\l|\[QUOTA_PROJECT_ID\] the ext4/xfs quota project ID\l|{<pl> PAYLOAD_REF|SYMLINK|DEVICE|{<de> \{DirectoryEntries\}\*|GOODBYE}}"
+shape = "record"
+];
+
+
+"entry" [
+label = "<f0> size: u64 = 64\l|type: u64 = ENTRY\l|feature_flags: u64\l|mode: u64\l|flags: u64\l|uid: u64\l|gid: u64\l|mtime: u64\l"
+labeljust = "l"
+shape = "record"
+];
+
+
+
+"direntry" [
+label = "<f0> FILENAME\l|{ENTRY\l|HARDLINK\l}"
+shape = "record"
+];
+
+"payloadrefentry" [
+label = "<f0> offset: u64\l|size: u64\l"
+shape = "record"
+];
+
+"archive" -> "rootdir":fv
+
+"rootdir":f0 -> "entry":f0
+
+"rootdir":de -> "direntry":f0
+
+"rootdir":pl -> "payloadrefentry":f0
+
+}
-- 
2.39.2




* [pbs-devel] [PATCH v5 proxmox-backup 53/62] docs: add section describing change detection mode
  2024-05-07 15:51 [pbs-devel] [PATCH v5 pxar proxmox-backup 00/62] fix #3174: improve file-level backup Christian Ebner
                   ` (51 preceding siblings ...)
  2024-05-07 15:52 ` [pbs-devel] [PATCH v5 proxmox-backup 52/62] docs: file formats: describe split pxar archive file layout Christian Ebner
@ 2024-05-07 15:52 ` Christian Ebner
  2024-05-07 15:52 ` [pbs-devel] [PATCH v5 proxmox-backup 54/62] test-suite: add detection mode change benchmark Christian Ebner
                   ` (9 subsequent siblings)
  62 siblings, 0 replies; 67+ messages in thread
From: Christian Ebner @ 2024-05-07 15:52 UTC (permalink / raw)
  To: pbs-devel

Describe the motivation and basic principle of the clients change
detection mode and show an example invocation.

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
changes since version 4:
- no changes
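
The re-use decision described in the new docs section can be sketched as
follows. This is not the client's implementation: the struct, its field names
and the 10 % threshold are assumptions chosen for illustration. Only the idea
is taken from this series, namely that padding is the difference between the
injected chunk size and the payload actually re-used, and that chunks are
re-used only while the padding stays below a threshold.

```rust
/// Hypothetical threshold: padding must stay below 10 % of the injected bytes.
const MAX_PADDING_RATIO: f64 = 0.10;

/// Sizes gathered for the currently cached entries (illustrative names).
struct ReuseCandidate {
    /// Bytes of file payload that would actually be referenced.
    reused_payload_size: u64,
    /// Total size of the chunks that would have to be injected.
    injected_size: u64,
}

impl ReuseCandidate {
    /// Bytes carried along in the re-used chunks without being referenced.
    fn padding(&self) -> u64 {
        self.injected_size.saturating_sub(self.reused_payload_size)
    }

    /// Re-use the chunks only if the padding ratio is below the threshold.
    fn should_reuse(&self) -> bool {
        if self.injected_size == 0 {
            return false;
        }
        (self.padding() as f64) / (self.injected_size as f64) < MAX_PADDING_RATIO
    }
}
```

With these assumed numbers, a candidate re-using 950 of 1000 injected bytes
would be referenced, while one re-using only 500 of 1000 bytes would be
re-encoded instead.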

 docs/backup-client.rst | 41 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 41 insertions(+)

diff --git a/docs/backup-client.rst b/docs/backup-client.rst
index 00a1abbb3..e48b5dd60 100644
--- a/docs/backup-client.rst
+++ b/docs/backup-client.rst
@@ -280,6 +280,47 @@ Multiple paths can be excluded like this:
 
     # proxmox-backup-client backup.pxar:./linux --exclude=/usr --exclude=/rust
 
+.. _client_change_detection_mode:
+
+Change detection mode
+~~~~~~~~~~~~~~~~~~~~~
+
+File-based backups containing a lot of data can take a long time, as the default
+behavior for the Proxmox backup client is to read all data and re-encode it.
+The encoded stream is split into variable sized chunks for efficient
+deduplication and based on the chunk digest a decision can be made whether a
+given chunk needs to be uploaded or can be indexed without upload as it is
+already available on the server (and therefore deduplicated). For use-cases
+where files do not change frequently, re-reading all data on every backup run
+is expensive and undesired.
+
+The backup client's ``change-detection-mode`` can be switched from the default
+to ``metadata`` based detection to avoid this overhead, instructing the client
+to avoid re-reading files with unchanged metadata whenever possible. When using
+this mode, instead of the regular pxar archive, the backup snapshot is stored
+in two separate files: the ``mpxar`` containing the archive's metadata and the
+``ppxar`` containing a concatenation of the file contents. This splitting
+allows for metadata lookups without the overhead of the file contents. Setting
+the ``change-detection-mode`` to ``data`` creates the same split archive as the
+``metadata`` mode, but without using a previous reference and therefore
+re-encoding all file payloads.
+
+When creating the backup archives, the current file metadata is compared to the
+one looked up in the previous ``mpxar`` archive. If it is unchanged, the entry
+is cached for possible re-use of content chunks without re-reading, by indexing
+the already present chunks containing the contents from the previous backup
+snapshot. Since the file might only partially re-use chunks (thereby
+introducing wasted space in the form of padding), the decision whether to
+re-use or re-encode the currently cached entries is deferred until enough
+information is available, comparing the possible padding to a threshold value.
+
+The following shows an example for the client invocation with the ``metadata``
+mode:
+
+.. code-block:: console
+
+    # proxmox-backup-client backup.pxar:./linux --change-detection-mode=metadata
+
 .. _client_encryption:
 
 Encryption
-- 
2.39.2




* [pbs-devel] [PATCH v5 proxmox-backup 54/62] test-suite: add detection mode change benchmark
  2024-05-07 15:51 [pbs-devel] [PATCH v5 pxar proxmox-backup 00/62] fix #3174: improve file-level backup Christian Ebner
                   ` (52 preceding siblings ...)
  2024-05-07 15:52 ` [pbs-devel] [PATCH v5 proxmox-backup 53/62] docs: add section describing change detection mode Christian Ebner
@ 2024-05-07 15:52 ` Christian Ebner
  2024-05-07 15:52 ` [pbs-devel] [PATCH v5 proxmox-backup 55/62] test-suite: add bin to deb, add shell completions Christian Ebner
                   ` (8 subsequent siblings)
  62 siblings, 0 replies; 67+ messages in thread
From: Christian Ebner @ 2024-05-07 15:52 UTC (permalink / raw)
  To: pbs-devel

Introduces the proxmox-backup-test-suite crate intended for
benchmarking and high level user facing testing.

The initial code includes a benchmark intended for regression testing of
the proxmox-backup-client when using different file detection modes
during backup.

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
changes since version 4:
- no changes
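
As a side note on the statistics reported by the benchmark: the sample
standard deviation divides by n - 1 and is therefore undefined for a single
run, and the uncertainty of the difference of two averages is the root of the
summed squares. A minimal sketch of both follows. The function names are
hypothetical, and returning 0.0 for a single sample is a choice made for this
sketch, not what the patch does.

```rust
use std::time::Duration;

/// Mean and sample standard deviation (Bessel-corrected) in seconds.
/// Returns `None` for an empty slice. For a single sample the spread is
/// reported as 0.0 instead of dividing by zero (a choice of this sketch).
fn mean_and_stddev(timings: &[Duration]) -> Option<(f64, f64)> {
    let n = timings.len();
    if n == 0 {
        return None;
    }
    let mean = timings.iter().map(Duration::as_secs_f64).sum::<f64>() / n as f64;
    if n == 1 {
        return Some((mean, 0.0));
    }
    let sq_sum = timings
        .iter()
        .map(|t| {
            let diff = t.as_secs_f64() - mean;
            diff * diff
        })
        .sum::<f64>();
    Some((mean, (sq_sum / (n - 1) as f64).sqrt()))
}

/// Standard deviation of the difference of two independent measurements:
/// sqrt(a^2 + b^2).
fn combined_stddev(a: f64, b: f64) -> f64 {
    a.hypot(b)
}
```

For timings of 1 s, 2 s and 3 s this gives a mean of 2 s with a standard
deviation of 1 s.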

 Cargo.toml                                    |   1 +
 proxmox-backup-test-suite/Cargo.toml          |  18 ++
 .../src/detection_mode_bench.rs               | 294 ++++++++++++++++++
 proxmox-backup-test-suite/src/main.rs         |  17 +
 4 files changed, 330 insertions(+)
 create mode 100644 proxmox-backup-test-suite/Cargo.toml
 create mode 100644 proxmox-backup-test-suite/src/detection_mode_bench.rs
 create mode 100644 proxmox-backup-test-suite/src/main.rs

diff --git a/Cargo.toml b/Cargo.toml
index 5758b37bc..950dd9671 100644
--- a/Cargo.toml
+++ b/Cargo.toml
@@ -45,6 +45,7 @@ members = [
     "proxmox-restore-daemon",
 
     "pxar-bin",
+    "proxmox-backup-test-suite",
 ]
 
 [lib]
diff --git a/proxmox-backup-test-suite/Cargo.toml b/proxmox-backup-test-suite/Cargo.toml
new file mode 100644
index 000000000..3f899e9bc
--- /dev/null
+++ b/proxmox-backup-test-suite/Cargo.toml
@@ -0,0 +1,18 @@
+[package]
+name = "proxmox-backup-test-suite"
+version = "0.1.0"
+authors.workspace = true
+edition.workspace = true
+
+[dependencies]
+anyhow.workspace = true
+futures.workspace = true
+serde.workspace = true
+serde_json.workspace = true
+
+pbs-client.workspace = true
+pbs-key-config.workspace = true
+pbs-tools.workspace = true
+proxmox-async.workspace = true
+proxmox-router = { workspace = true, features = ["cli"] }
+proxmox-schema = { workspace = true, features = [ "api-macro" ] }
diff --git a/proxmox-backup-test-suite/src/detection_mode_bench.rs b/proxmox-backup-test-suite/src/detection_mode_bench.rs
new file mode 100644
index 000000000..9a3c76802
--- /dev/null
+++ b/proxmox-backup-test-suite/src/detection_mode_bench.rs
@@ -0,0 +1,294 @@
+use std::path::Path;
+use std::process::Command;
+use std::{thread, time};
+
+use anyhow::{bail, format_err, Error};
+use serde_json::Value;
+
+use pbs_client::{
+    tools::{complete_repository, key_source::KEYFILE_SCHEMA, REPO_URL_SCHEMA},
+    BACKUP_SOURCE_SCHEMA,
+};
+use pbs_tools::json;
+use proxmox_router::cli::*;
+use proxmox_schema::api;
+
+const DEFAULT_NUMBER_OF_RUNS: u64 = 5;
+// Homepage https://cocodataset.org/
+const COCO_DATASET_SRC_URL: &'static str = "http://images.cocodataset.org/zips/unlabeled2017.zip";
+// Homepage https://kernel.org/
+const LINUX_GIT_REPOSITORY: &'static str =
+    "git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git";
+const LINUX_GIT_TAG: &'static str = "v6.5.5";
+
+pub(crate) fn detection_mode_bench_mgtm_cli() -> CliCommandMap {
+    let run_cmd_def = CliCommand::new(&API_METHOD_DETECTION_MODE_BENCH_RUN)
+        .arg_param(&["backupspec"])
+        .completion_cb("repository", complete_repository)
+        .completion_cb("keyfile", complete_file_name);
+
+    let prepare_cmd_def = CliCommand::new(&API_METHOD_DETECTION_MODE_BENCH_PREPARE);
+    CliCommandMap::new()
+        .insert("prepare", prepare_cmd_def)
+        .insert("run", run_cmd_def)
+}
+
+#[api(
+   input: {
+       properties: {
+           backupspec: {
+               type: Array,
+               description: "List of backup source specifications ([<label.ext>:<path>] ...)",
+               items: {
+                   schema: BACKUP_SOURCE_SCHEMA,
+               }
+           },
+           repository: {
+               schema: REPO_URL_SCHEMA,
+               optional: true,
+           },
+           keyfile: {
+               schema: KEYFILE_SCHEMA,
+               optional: true,
+           },
+           "number-of-runs": {
+               description: "Number of times to repeat the run",
+               type: Integer,
+               optional: true,
+           },
+       }
+   }
+)]
+/// Run benchmark to compare performance for backups using different change detection modes.
+fn detection_mode_bench_run(param: Value) -> Result<(), Error> {
+    let mut pbc = Command::new("proxmox-backup-client");
+    pbc.arg("backup");
+
+    let backupspec_list = json::required_array_param(&param, "backupspec")?;
+    for backupspec in backupspec_list {
+        let arg = backupspec
+            .as_str()
+            .ok_or_else(|| format_err!("failed to parse backupspec"))?;
+        pbc.arg(arg);
+    }
+
+    if let Some(repo) = param["repository"].as_str() {
+        pbc.arg("--repository");
+        pbc.arg::<&str>(repo);
+    }
+
+    if let Some(keyfile) = param["keyfile"].as_str() {
+        pbc.arg("--keyfile");
+        pbc.arg::<&str>(keyfile);
+    }
+
+    let number_of_runs = match param["number-of-runs"].as_u64() {
+        Some(n) => n,
+        None => DEFAULT_NUMBER_OF_RUNS,
+    };
+    if number_of_runs < 1 {
+        bail!("Number of runs must be at least 1, aborting.");
+    }
+
+    // First run is an initial run to make sure all chunks are present already, reduce side effects
+    // by filesystem caches etc.
+    let _stats_initial = do_run(&mut pbc, 1)?;
+
+    println!("\nStarting benchmarking backups with regular detection mode...\n");
+    let stats_reg = do_run(&mut pbc, number_of_runs)?;
+
+    // Make sure to have a valid reference with catalog format version 2
+    pbc.arg("--change-detection-mode=metadata");
+    let _stats_initial = do_run(&mut pbc, 1)?;
+
+    println!("\nStarting benchmarking backups with metadata detection mode...\n");
+    let stats_meta = do_run(&mut pbc, number_of_runs)?;
+
+    println!("\nCompleted benchmark with {number_of_runs} runs for each tested mode.");
+    println!("\nCompleted regular backup with:");
+    println!("Total runtime: {:.2} s", stats_reg.total);
+    println!("Average: {:.2} ± {:.2} s", stats_reg.avg, stats_reg.stddev);
+    println!("Min: {:.2} s", stats_reg.min);
+    println!("Max: {:.2} s", stats_reg.max);
+
+    println!("\nCompleted metadata detection mode backup with:");
+    println!("Total runtime: {:.2} s", stats_meta.total);
+    println!(
+        "Average: {:.2} ± {:.2} s",
+        stats_meta.avg, stats_meta.stddev
+    );
+    println!("Min: {:.2} s", stats_meta.min);
+    println!("Max: {:.2} s", stats_meta.max);
+
+    let diff_stddev =
+        ((stats_meta.stddev * stats_meta.stddev) + (stats_reg.stddev * stats_reg.stddev)).sqrt();
+    println!("\nDifferences (metadata based - regular):");
+    println!(
+        "Delta total runtime: {:.2} s ({:.2} %)",
+        stats_meta.total - stats_reg.total,
+        100.0 * (stats_meta.total / stats_reg.total - 1.0),
+    );
+    println!(
+        "Delta average: {:.2} ± {:.2} s ({:.2} %)",
+        stats_meta.avg - stats_reg.avg,
+        diff_stddev,
+        100.0 * (stats_meta.avg / stats_reg.avg - 1.0),
+    );
+    println!(
+        "Delta min: {:.2} s ({:.2} %)",
+        stats_meta.min - stats_reg.min,
+        100.0 * (stats_meta.min / stats_reg.min - 1.0),
+    );
+    println!(
+        "Delta max: {:.2} s ({:.2} %)",
+        stats_meta.max - stats_reg.max,
+        100.0 * (stats_meta.max / stats_reg.max - 1.0),
+    );
+
+    Ok(())
+}
+
+fn do_run(cmd: &mut Command, n_runs: u64) -> Result<Statistics, Error> {
+    // Avoid consecutive snapshot timestamps collision
+    thread::sleep(time::Duration::from_millis(1000));
+    let mut timings = Vec::with_capacity(n_runs as usize);
+    for iteration in 1..n_runs + 1 {
+        let start = time::Instant::now();
+        let mut child = cmd.spawn()?;
+        let exit_code = child.wait()?;
+        let elapsed = start.elapsed();
+        timings.push(elapsed);
+        if !exit_code.success() {
+            bail!("Run number {iteration} of {n_runs} failed, aborting.");
+        }
+    }
+
+    Ok(statistics(timings))
+}
+
+struct Statistics {
+    total: f64,
+    avg: f64,
+    stddev: f64,
+    min: f64,
+    max: f64,
+}
+
+fn statistics(timings: Vec<std::time::Duration>) -> Statistics {
+    let total = timings
+        .iter()
+        .fold(0f64, |sum, time| sum + time.as_secs_f64());
+    let avg = total / timings.len() as f64;
+    let var = 1f64 / (timings.len() - 1) as f64
+        * timings.iter().fold(0f64, |sq_sum, time| {
+            let diff = time.as_secs_f64() - avg;
+            sq_sum + diff * diff
+        });
+    let stddev = var.sqrt();
+    let min = timings.iter().min().unwrap().as_secs_f64();
+    let max = timings.iter().max().unwrap().as_secs_f64();
+
+    Statistics {
+        total,
+        avg,
+        stddev,
+        min,
+        max,
+    }
+}
+
+#[api(
+    input: {
+        properties: {
+            target: {
+                description: "target path to prepare test data.",
+            },
+        },
+    },
+)]
+/// Prepare files required for detection mode backup benchmarks.
+fn detection_mode_bench_prepare(target: String) -> Result<(), Error> {
+    let linux_repo_target = format!("{target}/linux");
+    let coco_dataset_target = format!("{target}/coco");
+    git_clone(LINUX_GIT_REPOSITORY, linux_repo_target.as_str())?;
+    git_checkout(LINUX_GIT_TAG, linux_repo_target.as_str())?;
+    wget_download(COCO_DATASET_SRC_URL, coco_dataset_target.as_str())?;
+
+    Ok(())
+}
+
+fn git_clone(repo: &str, target: &str) -> Result<(), Error> {
+    println!("Calling git clone for '{repo}'.");
+    let target_git = format!("{target}/.git");
+    let path = Path::new(&target_git);
+    if let Ok(true) = path.try_exists() {
+        println!("Target '{target}' already contains a git repository, skip.");
+        return Ok(());
+    }
+
+    let mut git = Command::new("git");
+    git.args(["clone", repo, target]);
+
+    let mut child = git.spawn()?;
+    let exit_code = child.wait()?;
+    if exit_code.success() {
+        println!("git clone finished with success.");
+    } else {
+        bail!("git clone failed for '{target}'.");
+    }
+
+    Ok(())
+}
+
+fn git_checkout(checkout_target: &str, target: &str) -> Result<(), Error> {
+    println!("Calling git checkout '{checkout_target}'.");
+    let mut git = Command::new("git");
+    git.args(["-C", target, "checkout", checkout_target]);
+
+    let mut child = git.spawn()?;
+    let exit_code = child.wait()?;
+    if exit_code.success() {
+        println!("git checkout finished with success.");
+    } else {
+        bail!("git checkout '{checkout_target}' failed for '{target}'.");
+    }
+    Ok(())
+}
+
+fn wget_download(source_url: &str, target: &str) -> Result<(), Error> {
+    let path = Path::new(&target);
+    if let Ok(true) = path.try_exists() {
+        println!("Target '{target}' already exists, skip.");
+        return Ok(());
+    }
+    let zip = format!("{}/unlabeled2017.zip", target);
+    let path = Path::new(&zip);
+    if !path.try_exists()? {
+        println!("Download archive using wget from '{source_url}' to '{target}'.");
+        let mut wget = Command::new("wget");
+        wget.args(["-P", target, source_url]);
+
+        let mut child = wget.spawn()?;
+        let exit_code = child.wait()?;
+        if exit_code.success() {
+            println!("Download finished with success.");
+        } else {
+            bail!("Failed to download '{source_url}' to '{target}'.");
+        }
+        // Fall through to extract the freshly downloaded archive below.
+    } else {
+        println!("Target '{target}' already contains download, skip download.");
+    }
+
+    let mut unzip = Command::new("unzip");
+    unzip.args([&zip, "-d", target]);
+
+    let mut child = unzip.spawn()?;
+    let exit_code = child.wait()?;
+    if exit_code.success() {
+        println!("Extracting zip archive finished with success.");
+    } else {
+        bail!("Failed to extract zip archive '{zip}' to '{target}'.");
+    }
+    Ok(())
+}
diff --git a/proxmox-backup-test-suite/src/main.rs b/proxmox-backup-test-suite/src/main.rs
new file mode 100644
index 000000000..0a5b436a8
--- /dev/null
+++ b/proxmox-backup-test-suite/src/main.rs
@@ -0,0 +1,17 @@
+use proxmox_router::cli::*;
+
+mod detection_mode_bench;
+
+fn main() {
+    let cmd_def = CliCommandMap::new().insert(
+        "detection-mode-bench",
+        detection_mode_bench::detection_mode_bench_mgtm_cli(),
+    );
+
+    let rpcenv = CliEnvironment::new();
+    run_cli_command(
+        cmd_def,
+        rpcenv,
+        Some(|future| proxmox_async::runtime::main(future)),
+    );
+}
-- 
2.39.2



_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel


* [pbs-devel] [PATCH v5 proxmox-backup 55/62] test-suite: add bin to deb, add shell completions
  2024-05-07 15:51 [pbs-devel] [PATCH v5 pxar proxmox-backup 00/62] fix #3174: improve file-level backup Christian Ebner
                   ` (53 preceding siblings ...)
  2024-05-07 15:52 ` [pbs-devel] [PATCH v5 proxmox-backup 54/62] test-suite: add detection mode change benchmark Christian Ebner
@ 2024-05-07 15:52 ` Christian Ebner
  2024-05-07 15:52 ` [pbs-devel] [PATCH v5 proxmox-backup 56/62] datastore: chunker: add Chunker trait Christian Ebner
                   ` (7 subsequent siblings)
  62 siblings, 0 replies; 67+ messages in thread
From: Christian Ebner @ 2024-05-07 15:52 UTC (permalink / raw)
  To: pbs-devel

Adds the required files for bash and zsh completion and packages the
binary to be included in the proxmox-backup-client debian package.

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
changes since version 4:
- no changes

 Makefile                                     | 13 ++++++++-----
 debian/proxmox-backup-client.bash-completion |  1 +
 debian/proxmox-backup-client.install         |  2 ++
 debian/proxmox-backup-test-suite.bc          |  8 ++++++++
 zsh-completions/_proxmox-backup-test-suite   | 13 +++++++++++++
 5 files changed, 32 insertions(+), 5 deletions(-)
 create mode 100644 debian/proxmox-backup-test-suite.bc
 create mode 100644 zsh-completions/_proxmox-backup-test-suite

diff --git a/Makefile b/Makefile
index 03e938767..76529921b 100644
--- a/Makefile
+++ b/Makefile
@@ -8,11 +8,12 @@ SUBDIRS := etc www docs templates
 
 # Binaries usable by users
 USR_BIN := \
-	proxmox-backup-client 	\
-	proxmox-file-restore	\
-	pxar			\
-	proxmox-tape		\
-	pmtx			\
+	proxmox-backup-client 		\
+	proxmox-backup-test-suite 	\
+	proxmox-file-restore		\
+	pxar				\
+	proxmox-tape			\
+	pmtx				\
 	pmt
 
 # Binaries usable by admins
@@ -165,6 +166,8 @@ $(COMPILED_BINS) $(COMPILEDIR)/dump-catalog-shell-cli $(COMPILEDIR)/docgen: .do-
 	    --bin proxmox-backup-client \
 	    --bin dump-catalog-shell-cli \
 	    --bin proxmox-backup-debug \
+	    --package proxmox-backup-test-suite \
+	    --bin proxmox-backup-test-suite \
 	    --package proxmox-file-restore \
 	    --bin proxmox-file-restore \
 	    --package pxar-bin \
diff --git a/debian/proxmox-backup-client.bash-completion b/debian/proxmox-backup-client.bash-completion
index 437360175..c4ff02ae6 100644
--- a/debian/proxmox-backup-client.bash-completion
+++ b/debian/proxmox-backup-client.bash-completion
@@ -1,2 +1,3 @@
 debian/proxmox-backup-client.bc proxmox-backup-client
+debian/proxmox-backup-test-suite.bc proxmox-backup-test-suite
 debian/pxar.bc pxar
diff --git a/debian/proxmox-backup-client.install b/debian/proxmox-backup-client.install
index 74b568f17..0eb859757 100644
--- a/debian/proxmox-backup-client.install
+++ b/debian/proxmox-backup-client.install
@@ -1,6 +1,8 @@
 usr/bin/proxmox-backup-client
+usr/bin/proxmox-backup-test-suite
 usr/bin/pxar
 usr/share/man/man1/proxmox-backup-client.1
 usr/share/man/man1/pxar.1
 usr/share/zsh/vendor-completions/_proxmox-backup-client
+usr/share/zsh/vendor-completions/_proxmox-backup-test-suite
 usr/share/zsh/vendor-completions/_pxar
diff --git a/debian/proxmox-backup-test-suite.bc b/debian/proxmox-backup-test-suite.bc
new file mode 100644
index 000000000..2686d7eaa
--- /dev/null
+++ b/debian/proxmox-backup-test-suite.bc
@@ -0,0 +1,8 @@
+# proxmox-backup-test-suite bash completion
+
+# see http://tiswww.case.edu/php/chet/bash/FAQ
+# and __ltrim_colon_completions() in /usr/share/bash-completion/bash_completion
+# this modifies global var, but I found no better way
+COMP_WORDBREAKS=${COMP_WORDBREAKS//:}
+
+complete -C 'proxmox-backup-test-suite bashcomplete' proxmox-backup-test-suite
diff --git a/zsh-completions/_proxmox-backup-test-suite b/zsh-completions/_proxmox-backup-test-suite
new file mode 100644
index 000000000..72ebcea5f
--- /dev/null
+++ b/zsh-completions/_proxmox-backup-test-suite
@@ -0,0 +1,13 @@
+#compdef _proxmox-backup-test-suite() proxmox-backup-test-suite
+
+function _proxmox-backup-test-suite() {
+    local cwords line point cmd curr prev
+    cwords=${#words[@]}
+    line=$words
+    point=${#line}
+    cmd=${words[1]}
+    curr=${words[cwords]}
+    prev=${words[cwords-1]}
+    compadd -- $(COMP_CWORD="$cwords" COMP_LINE="$line" COMP_POINT="$point" \
+        proxmox-backup-test-suite bashcomplete "$cmd" "$curr" "$prev")
+}
-- 
2.39.2




* [pbs-devel] [PATCH v5 proxmox-backup 56/62] datastore: chunker: add Chunker trait
  2024-05-07 15:51 [pbs-devel] [PATCH v5 pxar proxmox-backup 00/62] fix #3174: improve file-level backup Christian Ebner
                   ` (54 preceding siblings ...)
  2024-05-07 15:52 ` [pbs-devel] [PATCH v5 proxmox-backup 55/62] test-suite: add bin to deb, add shell completions Christian Ebner
@ 2024-05-07 15:52 ` Christian Ebner
  2024-05-07 15:52 ` [pbs-devel] [PATCH v5 proxmox-backup 57/62] datastore: chunker: implement chunker for payload stream Christian Ebner
                   ` (6 subsequent siblings)
  62 siblings, 0 replies; 67+ messages in thread
From: Christian Ebner @ 2024-05-07 15:52 UTC (permalink / raw)
  To: pbs-devel

Add the Chunker trait and rename the current Chunker to ChunkerImpl,
which now implements the trait. This allows using different chunker
implementations via dynamic dispatch and prepares for a dedicated
payload chunker.

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
changes since version 4:
- no changes
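
The trait split described in the commit message can be illustrated
outside of the PBS code base. The following is only a minimal sketch:
`Context` and `Chunker` mirror the definitions added by this patch,
while `FixedChunker`, `ZeroByteChunker` and `make_chunker` are
hypothetical stand-ins invented for the example and are not part of
the series.

```rust
/// Extra context handed to every scan call (same fields as in the patch).
#[derive(Default)]
pub struct Context {
    pub base: u64,
    pub total: u64,
}

pub trait Chunker {
    /// Returns 0 if no boundary was found, otherwise the boundary
    /// position relative to the start of `data`.
    fn scan(&mut self, data: &[u8], ctx: &Context) -> usize;
}

/// Hypothetical stand-in: cuts after a fixed number of bytes.
pub struct FixedChunker {
    size: usize,
    seen: usize,
}

impl FixedChunker {
    pub fn new(size: usize) -> Self {
        assert!(size > 0, "chunk size must be non-zero");
        Self { size, seen: 0 }
    }
}

impl Chunker for FixedChunker {
    fn scan(&mut self, data: &[u8], _ctx: &Context) -> usize {
        // bytes still missing until the current chunk is complete
        let missing = self.size - self.seen;
        if data.len() >= missing {
            self.seen = 0;
            missing
        } else {
            self.seen += data.len();
            0
        }
    }
}

/// Hypothetical stand-in: cuts right after the first zero byte.
pub struct ZeroByteChunker;

impl Chunker for ZeroByteChunker {
    fn scan(&mut self, data: &[u8], _ctx: &Context) -> usize {
        data.iter().position(|&b| b == 0).map_or(0, |i| i + 1)
    }
}

/// Pick an implementation at runtime; callers only see the trait object.
pub fn make_chunker(fixed_size: Option<usize>) -> Box<dyn Chunker + Send> {
    match fixed_size {
        Some(size) => Box::new(FixedChunker::new(size)),
        None => Box::new(ZeroByteChunker),
    }
}
```

A consumer such as ChunkStream then stores the returned
`Box<dyn Chunker + Send>` and calls `scan()` without knowing which
implementation is behind it.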

 examples/test_chunk_size.rs        |  9 +--
 examples/test_chunk_speed.rs       |  7 ++-
 pbs-client/src/chunk_stream.rs     | 37 ++++++------
 pbs-datastore/src/chunker.rs       | 92 ++++++++++++++++++------------
 pbs-datastore/src/dynamic_index.rs |  9 +--
 pbs-datastore/src/lib.rs           |  2 +-
 6 files changed, 89 insertions(+), 67 deletions(-)

diff --git a/examples/test_chunk_size.rs b/examples/test_chunk_size.rs
index a01a5e640..2ebc22f64 100644
--- a/examples/test_chunk_size.rs
+++ b/examples/test_chunk_size.rs
@@ -5,10 +5,10 @@ extern crate proxmox_backup;
 use anyhow::Error;
 use std::io::{Read, Write};
 
-use pbs_datastore::Chunker;
+use pbs_datastore::{Chunker, ChunkerImpl};
 
 struct ChunkWriter {
-    chunker: Chunker,
+    chunker: ChunkerImpl,
     last_chunk: usize,
     chunk_offset: usize,
 
@@ -23,7 +23,7 @@ struct ChunkWriter {
 impl ChunkWriter {
     fn new(chunk_size: usize) -> Self {
         ChunkWriter {
-            chunker: Chunker::new(chunk_size),
+            chunker: ChunkerImpl::new(chunk_size),
             last_chunk: 0,
             chunk_offset: 0,
             chunk_count: 0,
@@ -69,7 +69,8 @@ impl Write for ChunkWriter {
     fn write(&mut self, data: &[u8]) -> std::result::Result<usize, std::io::Error> {
         let chunker = &mut self.chunker;
 
-        let pos = chunker.scan(data);
+        let ctx = pbs_datastore::chunker::Context::default();
+        let pos = chunker.scan(data, &ctx);
 
         if pos > 0 {
             self.chunk_offset += pos;
diff --git a/examples/test_chunk_speed.rs b/examples/test_chunk_speed.rs
index 37e13e0de..2d79604ab 100644
--- a/examples/test_chunk_speed.rs
+++ b/examples/test_chunk_speed.rs
@@ -1,6 +1,6 @@
 extern crate proxmox_backup;
 
-use pbs_datastore::Chunker;
+use pbs_datastore::{Chunker, ChunkerImpl};
 
 fn main() {
     let mut buffer = Vec::new();
@@ -12,7 +12,7 @@ fn main() {
             buffer.push(byte);
         }
     }
-    let mut chunker = Chunker::new(64 * 1024);
+    let mut chunker = ChunkerImpl::new(64 * 1024);
 
     let count = 5;
 
@@ -23,8 +23,9 @@ fn main() {
     for _i in 0..count {
         let mut pos = 0;
         let mut _last = 0;
+        let ctx = pbs_datastore::chunker::Context::default();
         while pos < buffer.len() {
-            let k = chunker.scan(&buffer[pos..]);
+            let k = chunker.scan(&buffer[pos..], &ctx);
             if k == 0 {
                 //println!("LAST {}", pos);
                 break;
diff --git a/pbs-client/src/chunk_stream.rs b/pbs-client/src/chunk_stream.rs
index 728c0a88d..a32ecfd15 100644
--- a/pbs-client/src/chunk_stream.rs
+++ b/pbs-client/src/chunk_stream.rs
@@ -7,7 +7,7 @@ use bytes::BytesMut;
 use futures::ready;
 use futures::stream::{Stream, TryStream};
 
-use pbs_datastore::Chunker;
+use pbs_datastore::{Chunker, ChunkerImpl};
 
 use crate::inject_reused_chunks::InjectChunks;
 
@@ -16,7 +16,6 @@ pub struct InjectionData {
     boundaries: mpsc::Receiver<InjectChunks>,
     next_boundary: Option<InjectChunks>,
     injections: mpsc::Sender<InjectChunks>,
-    consumed: u64,
 }
 
 impl InjectionData {
@@ -28,7 +27,6 @@ impl InjectionData {
             boundaries,
             next_boundary: None,
             injections,
-            consumed: 0,
         }
     }
 }
@@ -36,19 +34,22 @@ impl InjectionData {
 /// Split input stream into dynamic sized chunks
 pub struct ChunkStream<S: Unpin> {
     input: S,
-    chunker: Chunker,
+    chunker: Box<dyn Chunker + Send>,
     buffer: BytesMut,
     scan_pos: usize,
+    consumed: u64,
     injection_data: Option<InjectionData>,
 }
 
 impl<S: Unpin> ChunkStream<S> {
     pub fn new(input: S, chunk_size: Option<usize>, injection_data: Option<InjectionData>) -> Self {
+        let chunk_size = chunk_size.unwrap_or(4 * 1024 * 1024);
         Self {
             input,
-            chunker: Chunker::new(chunk_size.unwrap_or(4 * 1024 * 1024)),
+            chunker: Box::new(ChunkerImpl::new(chunk_size)),
             buffer: BytesMut::new(),
             scan_pos: 0,
+            consumed: 0,
             injection_data,
         }
     }
@@ -68,11 +69,15 @@ where
         let this = self.get_mut();
 
         loop {
+            let ctx = pbs_datastore::chunker::Context {
+                base: this.consumed,
+                total: this.buffer.len() as u64,
+            };
+
             if let Some(InjectionData {
                 boundaries,
                 next_boundary,
                 injections,
-                consumed,
             }) = this.injection_data.as_mut()
             {
                 if next_boundary.is_none() {
@@ -83,24 +88,24 @@ where
 
                 if let Some(inject) = next_boundary.take() {
                     // require forced boundary, lookup next regular boundary
-                    let pos = this.chunker.scan(&this.buffer[this.scan_pos..]);
+                    let pos = this.chunker.scan(&this.buffer[this.scan_pos..], &ctx);
 
                     let chunk_boundary = if pos == 0 {
-                        *consumed + this.buffer.len() as u64
+                        this.consumed + this.buffer.len() as u64
                     } else {
-                        *consumed + (this.scan_pos + pos) as u64
+                        this.consumed + (this.scan_pos + pos) as u64
                     };
 
                     if inject.boundary <= chunk_boundary {
                         // forced boundary is before next boundary, force within current buffer
-                        let chunk_size = (inject.boundary - *consumed) as usize;
+                        let chunk_size = (inject.boundary - this.consumed) as usize;
                         let raw_chunk = this.buffer.split_to(chunk_size);
-                        *consumed += chunk_size as u64;
+                        this.consumed += chunk_size as u64;
                         this.scan_pos = 0;
 
                         // add the size of the injected chunks to consumed, so chunk stream offsets
                         // are in sync with the rest of the archive.
-                        *consumed += inject.size as u64;
+                        this.consumed += inject.size as u64;
 
                         injections.send(inject).unwrap();
 
@@ -112,7 +117,7 @@ where
                         // forced boundary is after next boundary, split off chunk from buffer
                         let chunk_size = this.scan_pos + pos;
                         let raw_chunk = this.buffer.split_to(chunk_size);
-                        *consumed += chunk_size as u64;
+                        this.consumed += chunk_size as u64;
                         this.scan_pos = 0;
 
                         return Poll::Ready(Some(Ok(raw_chunk)));
@@ -126,7 +131,7 @@ where
 
             if this.scan_pos < this.buffer.len() {
                 // look for next chunk boundary, starting from scan_pos
-                let boundary = this.chunker.scan(&this.buffer[this.scan_pos..]);
+                let boundary = this.chunker.scan(&this.buffer[this.scan_pos..], &ctx);
 
                 let chunk_size = this.scan_pos + boundary;
 
@@ -136,9 +141,7 @@ where
                 } else if chunk_size <= this.buffer.len() {
                     // found new chunk boundary inside buffer, split off chunk from buffer
                     let raw_chunk = this.buffer.split_to(chunk_size);
-                    if let Some(InjectionData { consumed, .. }) = this.injection_data.as_mut() {
-                        *consumed += chunk_size as u64;
-                    }
+                    this.consumed += chunk_size as u64;
                     this.scan_pos = 0;
 
                     return Poll::Ready(Some(Ok(raw_chunk)));
diff --git a/pbs-datastore/src/chunker.rs b/pbs-datastore/src/chunker.rs
index 712751829..119b88a03 100644
--- a/pbs-datastore/src/chunker.rs
+++ b/pbs-datastore/src/chunker.rs
@@ -5,6 +5,19 @@
 /// use hash value 0 to detect a boundary.
 const CA_CHUNKER_WINDOW_SIZE: usize = 64;
 
+/// Additional context for chunker to find possible boundaries in payload streams
+#[derive(Default)]
+pub struct Context {
+    /// Already consumed bytes of the chunk stream consumer
+    pub base: u64,
+    /// Total size currently buffered
+    pub total: u64,
+}
+
+pub trait Chunker {
+    fn scan(&mut self, data: &[u8], ctx: &Context) -> usize;
+}
+
 /// Sliding window chunker (Buzhash)
 ///
 /// This is a rewrite of *casync* chunker (cachunker.h) in rust.
@@ -15,7 +28,7 @@ const CA_CHUNKER_WINDOW_SIZE: usize = 64;
 /// Hash](https://en.wikipedia.org/wiki/Rolling_hash) article from
 /// Wikipedia.
 
-pub struct Chunker {
+pub struct ChunkerImpl {
     h: u32,
     window_size: usize,
     chunk_size: usize,
@@ -67,7 +80,7 @@ const BUZHASH_TABLE: [u32; 256] = [
     0x5eff22f4, 0x6027f4cc, 0x77178b3c, 0xae507131, 0x7bf7cabc, 0xf9c18d66, 0x593ade65, 0xd95ddf11,
 ];
 
-impl Chunker {
+impl ChunkerImpl {
     /// Create a new Chunker instance, which produces and average
     /// chunk size of `chunk_size_avg` (need to be a power of two). We
     /// allow variation from `chunk_size_avg/4` up to a maximum of
@@ -105,11 +118,44 @@ impl Chunker {
         }
     }
 
+    // fast implementation avoiding modulo
+    // #[inline(always)]
+    fn shall_break(&self) -> bool {
+        if self.chunk_size >= self.chunk_size_max {
+            return true;
+        }
+
+        if self.chunk_size < self.chunk_size_min {
+            return false;
+        }
+
+        //(self.h & 0x1ffff) <= 2 //THIS IS SLOW!!!
+
+        //(self.h & self.break_test_mask) <= 2 // Bad on 0 streams
+
+        (self.h & self.break_test_mask) >= self.break_test_minimum
+    }
+
+    // This is the original implementation from casync
+    /*
+    #[inline(always)]
+    fn shall_break_orig(&self) -> bool {
+
+        if self.chunk_size >= self.chunk_size_max { return true; }
+
+        if self.chunk_size < self.chunk_size_min { return false; }
+
+        (self.h % self.discriminator) == (self.discriminator - 1)
+    }
+     */
+}
+
+impl Chunker for ChunkerImpl {
     /// Scans the specified data for a chunk border. Returns 0 if none
     /// was found (and the function should be called with more data
     /// later on), or another value indicating the position of a
     /// border.
-    pub fn scan(&mut self, data: &[u8]) -> usize {
+    fn scan(&mut self, data: &[u8], _ctx: &Context) -> usize {
         let window_len = self.window.len();
         let data_len = data.len();
 
@@ -166,37 +212,6 @@ impl Chunker {
 
         0
     }
-
-    // fast implementation avoiding modulo
-    // #[inline(always)]
-    fn shall_break(&self) -> bool {
-        if self.chunk_size >= self.chunk_size_max {
-            return true;
-        }
-
-        if self.chunk_size < self.chunk_size_min {
-            return false;
-        }
-
-        //(self.h & 0x1ffff) <= 2 //THIS IS SLOW!!!
-
-        //(self.h & self.break_test_mask) <= 2 // Bad on 0 streams
-
-        (self.h & self.break_test_mask) >= self.break_test_minimum
-    }
-
-    // This is the original implementation from casync
-    /*
-    #[inline(always)]
-    fn shall_break_orig(&self) -> bool {
-
-        if self.chunk_size >= self.chunk_size_max { return true; }
-
-        if self.chunk_size < self.chunk_size_min { return false; }
-
-        (self.h % self.discriminator) == (self.discriminator - 1)
-    }
-     */
 }
 
 #[test]
@@ -209,17 +224,18 @@ fn test_chunker1() {
             buffer.push(byte);
         }
     }
-    let mut chunker = Chunker::new(64 * 1024);
+    let mut chunker = ChunkerImpl::new(64 * 1024);
 
     let mut pos = 0;
     let mut last = 0;
 
     let mut chunks1: Vec<(usize, usize)> = vec![];
     let mut chunks2: Vec<(usize, usize)> = vec![];
+    let ctx = Context::default();
 
     // test1: feed single bytes
     while pos < buffer.len() {
-        let k = chunker.scan(&buffer[pos..pos + 1]);
+        let k = chunker.scan(&buffer[pos..pos + 1], &ctx);
         pos += 1;
         if k != 0 {
             let prev = last;
@@ -229,13 +245,13 @@ fn test_chunker1() {
     }
     chunks1.push((last, buffer.len() - last));
 
-    let mut chunker = Chunker::new(64 * 1024);
+    let mut chunker = ChunkerImpl::new(64 * 1024);
 
     let mut pos = 0;
 
     // test2: feed with whole buffer
     while pos < buffer.len() {
-        let k = chunker.scan(&buffer[pos..]);
+        let k = chunker.scan(&buffer[pos..], &ctx);
         if k != 0 {
             chunks2.push((pos, k));
             pos += k;
diff --git a/pbs-datastore/src/dynamic_index.rs b/pbs-datastore/src/dynamic_index.rs
index b8047b5b1..dc9eee050 100644
--- a/pbs-datastore/src/dynamic_index.rs
+++ b/pbs-datastore/src/dynamic_index.rs
@@ -23,7 +23,7 @@ use crate::data_blob::{DataBlob, DataChunkBuilder};
 use crate::file_formats;
 use crate::index::{ChunkReadInfo, IndexFile};
 use crate::read_chunk::ReadChunk;
-use crate::Chunker;
+use crate::{Chunker, ChunkerImpl};
 
 /// Header format definition for dynamic index files (`.dixd`)
 #[repr(C)]
@@ -397,7 +397,7 @@ impl DynamicIndexWriter {
 pub struct DynamicChunkWriter {
     index: DynamicIndexWriter,
     closed: bool,
-    chunker: Chunker,
+    chunker: ChunkerImpl,
     stat: ChunkStat,
     chunk_offset: usize,
     last_chunk: usize,
@@ -409,7 +409,7 @@ impl DynamicChunkWriter {
         Self {
             index,
             closed: false,
-            chunker: Chunker::new(chunk_size),
+            chunker: ChunkerImpl::new(chunk_size),
             stat: ChunkStat::new(0),
             chunk_offset: 0,
             last_chunk: 0,
@@ -494,7 +494,8 @@ impl Write for DynamicChunkWriter {
     fn write(&mut self, data: &[u8]) -> std::result::Result<usize, std::io::Error> {
         let chunker = &mut self.chunker;
 
-        let pos = chunker.scan(data);
+        let ctx = crate::chunker::Context::default();
+        let pos = chunker.scan(data, &ctx);
 
         if pos > 0 {
             self.chunk_buffer.extend_from_slice(&data[0..pos]);
diff --git a/pbs-datastore/src/lib.rs b/pbs-datastore/src/lib.rs
index 43050162f..24429626c 100644
--- a/pbs-datastore/src/lib.rs
+++ b/pbs-datastore/src/lib.rs
@@ -196,7 +196,7 @@ pub use backup_info::{BackupDir, BackupGroup, BackupInfo};
 pub use checksum_reader::ChecksumReader;
 pub use checksum_writer::ChecksumWriter;
 pub use chunk_store::ChunkStore;
-pub use chunker::Chunker;
+pub use chunker::{Chunker, ChunkerImpl};
 pub use crypt_reader::CryptReader;
 pub use crypt_writer::CryptWriter;
 pub use data_blob::DataBlob;
-- 
2.39.2




* [pbs-devel] [PATCH v5 proxmox-backup 57/62] datastore: chunker: implement chunker for payload stream
  2024-05-07 15:51 [pbs-devel] [PATCH v5 pxar proxmox-backup 00/62] fix #3174: improve file-level backup Christian Ebner
                   ` (55 preceding siblings ...)
  2024-05-07 15:52 ` [pbs-devel] [PATCH v5 proxmox-backup 56/62] datastore: chunker: add Chunker trait Christian Ebner
@ 2024-05-07 15:52 ` Christian Ebner
  2024-05-07 15:52 ` [pbs-devel] [PATCH v5 proxmox-backup 58/62] client: chunk stream: switch payload stream chunker Christian Ebner
                   ` (5 subsequent siblings)
  62 siblings, 0 replies; 67+ messages in thread
From: Christian Ebner @ 2024-05-07 15:52 UTC (permalink / raw)
  To: pbs-devel

Implement the Chunker trait for a dedicated payload stream chunker,
which extends the regular chunker by the option to suggest boundaries
to be used over the hash-based boundaries whenever possible.

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
changes since version 4:
- fix issue with scan already consuming full buffer, now only scan up
  until suggested boundary
- add more debug log output
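
To make the boundary handling easier to follow, the checks performed
on a single suggested boundary can be written as a pure function. This
is a simplified sketch and not the code from this patch: the `Decision`
enum and the `decide` function are hypothetical names, and the sketch
assumes `total >= data_len`, as is the case when `data` is a tail slice
of the buffered bytes. The order of the comparisons follows
`PayloadChunker::scan()`.

```rust
#[derive(Debug, PartialEq)]
pub enum Decision {
    /// Boundary lies before the data still to be scanned, drop it.
    Past,
    /// Boundary lies beyond the buffered data, use the hash-based scan for now.
    Future,
    /// Resulting chunk would be below the minimum size, drop the boundary.
    TooSmall,
    /// Resulting chunk would exceed the maximum size, use the hash-based scan.
    TooBig,
    /// Cut here; the value is relative to the start of the data slice.
    CutAt(usize),
}

/// Classify one suggested boundary.
///
/// `base` is the number of bytes already consumed by the chunk stream,
/// `total` the number of bytes currently buffered and `data_len` the
/// length of the slice handed to the scan (the not yet scanned tail).
pub fn decide(
    boundary: u64,
    base: u64,
    total: u64,
    data_len: usize,
    min: usize,
    max: usize,
) -> Decision {
    // offset of the data slice within the buffer
    let pos = total - data_len as u64;

    if boundary < base + pos {
        return Decision::Past;
    }
    if boundary > base + total {
        return Decision::Future;
    }

    // size the chunk would have if it ended at the suggested boundary
    let chunk_size = (boundary - base) as usize;
    if chunk_size < min {
        return Decision::TooSmall;
    }
    if chunk_size <= max {
        return Decision::CutAt(chunk_size - pos as usize);
    }
    Decision::TooBig
}
```

For example, with `base = 1000`, `total = 100` and a 60 byte slice, a
suggested boundary at 1050 yields a cut 10 bytes into the slice,
provided 50 bytes is within the allowed chunk size range.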

 pbs-datastore/src/chunker.rs | 81 ++++++++++++++++++++++++++++++++++++
 pbs-datastore/src/lib.rs     |  2 +-
 2 files changed, 82 insertions(+), 1 deletion(-)

diff --git a/pbs-datastore/src/chunker.rs b/pbs-datastore/src/chunker.rs
index 119b88a03..ceda2d7de 100644
--- a/pbs-datastore/src/chunker.rs
+++ b/pbs-datastore/src/chunker.rs
@@ -1,3 +1,5 @@
+use std::sync::mpsc::Receiver;
+
 /// Note: window size 32 or 64, is faster because we can
 /// speedup modulo operations, but always computes hash 0
 /// for constant data streams .. 0,0,0,0,0,0
@@ -45,6 +47,16 @@ pub struct ChunkerImpl {
     window: [u8; CA_CHUNKER_WINDOW_SIZE],
 }
 
+/// Sliding window chunker (Buzhash) with boundary suggestions
+///
+/// Suggest to chunk at a given boundary instead of the regular chunk boundary for better alignment
+/// with file payload boundaries.
+pub struct PayloadChunker {
+    chunker: ChunkerImpl,
+    current_suggested: Option<u64>,
+    suggested_boundaries: Receiver<u64>,
+}
+
 const BUZHASH_TABLE: [u32; 256] = [
     0x458be752, 0xc10748cc, 0xfbbcdbb8, 0x6ded5b68, 0xb10a82b5, 0x20d75648, 0xdfc5665f, 0xa8428801,
     0x7ebf5191, 0x841135c7, 0x65cc53b3, 0x280a597c, 0x16f60255, 0xc78cbc3e, 0x294415f5, 0xb938d494,
@@ -214,6 +226,75 @@ impl Chunker for ChunkerImpl {
     }
 }
 
+impl PayloadChunker {
+    /// Create a new PayloadChunker instance, which produces an average
+    /// chunk size of `chunk_size_avg` (needs to be a power of two), if no
+    /// suggested boundaries are provided.
+    /// Use suggested boundaries instead, whenever the chunk size is within
+    /// the min - max range.
+    pub fn new(chunk_size_avg: usize, suggested_boundaries: Receiver<u64>) -> Self {
+        Self {
+            chunker: ChunkerImpl::new(chunk_size_avg),
+            current_suggested: None,
+            suggested_boundaries,
+        }
+    }
+}
+
+impl Chunker for PayloadChunker {
+    fn scan(&mut self, data: &[u8], ctx: &Context) -> usize {
+        let pos = ctx.total - data.len() as u64;
+
+        loop {
+            if let Some(boundary) = self.current_suggested {
+                if boundary < ctx.base + pos {
+                    log::debug!("Boundary {boundary} in past");
+                    // ignore past boundaries
+                    self.current_suggested = None;
+                    continue;
+                }
+
+                if boundary > ctx.base + ctx.total {
+                    log::debug!("Boundary {boundary} in future");
+                    // boundary in future, cannot decide yet
+                    return self.chunker.scan(data, ctx);
+                }
+
+                let chunk_size = (boundary - ctx.base) as usize;
+                if chunk_size < self.chunker.chunk_size_min {
+                    log::debug!("Chunk size {chunk_size} below minimum chunk size");
+                    // chunk too small, ignore boundary
+                    self.current_suggested = None;
+                    continue;
+                }
+
+                if chunk_size <= self.chunker.chunk_size_max {
+                    log::debug!("Chunk at suggested boundary: {boundary}, {chunk_size}");
+                    self.current_suggested = None;
+                    // calculate boundary relative to start of given data buffer
+                    let len = chunk_size - pos as usize;
+                    log::debug!("Chunk at suggested boundary: {boundary}, chunk size {chunk_size}, len {len}");
+                    // although we ignore the output, consume the data with the chunker
+                    let _ignore = self.chunker.scan(&data[..len], ctx);
+                    return len;
+                }
+
+                log::debug!("Chunk {chunk_size} too big, regular scan");
+                // chunk too big, cannot decide yet
+                // scan for hash based chunk boundary instead
+                return self.chunker.scan(data, ctx);
+            }
+
+            if let Ok(boundary) = self.suggested_boundaries.try_recv() {
+                self.current_suggested = Some(boundary);
+            } else {
+                log::debug!("No suggested boundary, regular scan");
+                return self.chunker.scan(data, ctx);
+            }
+        }
+    }
+}
+
 #[test]
 fn test_chunker1() {
     let mut buffer = Vec::new();
diff --git a/pbs-datastore/src/lib.rs b/pbs-datastore/src/lib.rs
index 24429626c..3e4aa34c2 100644
--- a/pbs-datastore/src/lib.rs
+++ b/pbs-datastore/src/lib.rs
@@ -196,7 +196,7 @@ pub use backup_info::{BackupDir, BackupGroup, BackupInfo};
 pub use checksum_reader::ChecksumReader;
 pub use checksum_writer::ChecksumWriter;
 pub use chunk_store::ChunkStore;
-pub use chunker::{Chunker, ChunkerImpl};
+pub use chunker::{Chunker, ChunkerImpl, PayloadChunker};
 pub use crypt_reader::CryptReader;
 pub use crypt_writer::CryptWriter;
 pub use data_blob::DataBlob;
-- 
2.39.2




* [pbs-devel] [PATCH v5 proxmox-backup 58/62] client: chunk stream: switch payload stream chunker
  2024-05-07 15:51 [pbs-devel] [PATCH v5 pxar proxmox-backup 00/62] fix #3174: improve file-level backup Christian Ebner
                   ` (56 preceding siblings ...)
  2024-05-07 15:52 ` [pbs-devel] [PATCH v5 proxmox-backup 57/62] datastore: chunker: implement chunker for payload stream Christian Ebner
@ 2024-05-07 15:52 ` Christian Ebner
  2024-05-07 15:52 ` [pbs-devel] [PATCH v5 proxmox-backup 59/62] client: pxar: allow to restore prelude to optional path Christian Ebner
                   ` (4 subsequent siblings)
  62 siblings, 0 replies; 67+ messages in thread
From: Christian Ebner @ 2024-05-07 15:52 UTC (permalink / raw)
  To: pbs-devel

Use the dedicated chunker with boundary suggestions for the payload
stream, by attaching the channel sender to the archiver and the
channel receiver to the payload stream chunker.

The archiver sends the file boundaries for the chunker to consume.

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
changes since version 4:
- no changes
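
The sender/receiver wiring described in the commit message boils down
to a plain `std::sync::mpsc` channel. The sketch below only illustrates
that pattern and is not the code from this patch: the functions
`announce_file_starts`, `drain_suggestions` and `wire_up` are
hypothetical, and the offsets are computed from file sizes alone,
whereas the archiver in this series takes them from the encoder's
payload position.

```rust
use std::sync::mpsc::{channel, Receiver, SendError, Sender};

/// Hypothetical stand-in for the archiver side: report the payload
/// offset at which each file starts.
pub fn announce_file_starts(
    sender: &Sender<u64>,
    file_sizes: &[u64],
) -> Result<(), SendError<u64>> {
    let mut offset = 0u64;
    for size in file_sizes {
        sender.send(offset)?;
        offset += size;
    }
    Ok(())
}

/// Hypothetical stand-in for the chunker side: fetch all suggestions
/// that are currently queued, without blocking.
pub fn drain_suggestions(receiver: &Receiver<u64>) -> Vec<u64> {
    let mut boundaries = Vec::new();
    while let Ok(boundary) = receiver.try_recv() {
        boundaries.push(boundary);
    }
    boundaries
}

/// Create the channel, hand one end to each side and return what the
/// consumer received.
pub fn wire_up(file_sizes: &[u64]) -> Vec<u64> {
    let (tx, rx) = channel();
    announce_file_starts(&tx, file_sizes).expect("receiver is still alive");
    drain_suggestions(&rx)
}
```

Using `try_recv()` on the consumer side keeps the scan from blocking
when no suggestion is queued, which matches the fallback to the regular
hash-based scan in the payload chunker.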

 examples/test_chunk_speed2.rs                 |  2 +-
 pbs-client/src/chunk_stream.rs                | 15 ++++++--
 pbs-client/src/pxar/create.rs                 |  8 +++++
 pbs-client/src/pxar_backup_stream.rs          | 34 ++++++++++++-------
 proxmox-backup-client/src/main.rs             | 16 ++++++---
 .../src/proxmox_restore_daemon/api.rs         |  1 +
 pxar-bin/src/main.rs                          |  1 +
 tests/catar.rs                                |  1 +
 8 files changed, 56 insertions(+), 22 deletions(-)

diff --git a/examples/test_chunk_speed2.rs b/examples/test_chunk_speed2.rs
index 22dd14ce2..f2963746a 100644
--- a/examples/test_chunk_speed2.rs
+++ b/examples/test_chunk_speed2.rs
@@ -26,7 +26,7 @@ async fn run() -> Result<(), Error> {
         .map_err(Error::from);
 
     //let chunk_stream = FixedChunkStream::new(stream, 4*1024*1024);
-    let mut chunk_stream = ChunkStream::new(stream, None, None);
+    let mut chunk_stream = ChunkStream::new(stream, None, None, None);
 
     let start_time = std::time::Instant::now();
 
diff --git a/pbs-client/src/chunk_stream.rs b/pbs-client/src/chunk_stream.rs
index a32ecfd15..ab7b70d17 100644
--- a/pbs-client/src/chunk_stream.rs
+++ b/pbs-client/src/chunk_stream.rs
@@ -7,7 +7,7 @@ use bytes::BytesMut;
 use futures::ready;
 use futures::stream::{Stream, TryStream};
 
-use pbs_datastore::{Chunker, ChunkerImpl};
+use pbs_datastore::{Chunker, ChunkerImpl, PayloadChunker};
 
 use crate::inject_reused_chunks::InjectChunks;
 
@@ -42,11 +42,20 @@ pub struct ChunkStream<S: Unpin> {
 }
 
 impl<S: Unpin> ChunkStream<S> {
-    pub fn new(input: S, chunk_size: Option<usize>, injection_data: Option<InjectionData>) -> Self {
+    pub fn new(
+        input: S,
+        chunk_size: Option<usize>,
+        injection_data: Option<InjectionData>,
+        suggested_boundaries: Option<mpsc::Receiver<u64>>,
+    ) -> Self {
         let chunk_size = chunk_size.unwrap_or(4 * 1024 * 1024);
         Self {
             input,
-            chunker: Box::new(ChunkerImpl::new(chunk_size)),
+            chunker: if let Some(suggested) = suggested_boundaries {
+                Box::new(PayloadChunker::new(chunk_size, suggested))
+            } else {
+                Box::new(ChunkerImpl::new(chunk_size))
+            },
             buffer: BytesMut::new(),
             scan_pos: 0,
             consumed: 0,
diff --git a/pbs-client/src/pxar/create.rs b/pbs-client/src/pxar/create.rs
index 19f2349fa..287e47655 100644
--- a/pbs-client/src/pxar/create.rs
+++ b/pbs-client/src/pxar/create.rs
@@ -169,6 +169,7 @@ struct Archiver {
     file_copy_buffer: Vec<u8>,
     skip_e2big_xattr: bool,
     forced_boundaries: Option<mpsc::Sender<InjectChunks>>,
+    suggested_boundaries: Option<mpsc::Sender<u64>>,
     previous_payload_index: Option<DynamicIndexReader>,
     cached_entries: Vec<CacheEntry>,
     cached_hardlinks: HashSet<HardLinkInfo>,
@@ -207,6 +208,7 @@ pub async fn create_archive<T, F>(
     callback: F,
     options: PxarCreateOptions,
     forced_boundaries: Option<mpsc::Sender<InjectChunks>>,
+    suggested_boundaries: Option<mpsc::Sender<u64>>,
 ) -> Result<(), Error>
 where
     T: SeqWrite + Send,
@@ -288,6 +290,7 @@ where
         skip_e2big_xattr: options.skip_e2big_xattr,
         forced_boundaries,
         previous_payload_index,
+        suggested_boundaries,
         cached_entries: Vec::new(),
         cached_range: Range::default(),
         cached_last_chunk: None,
@@ -843,6 +846,11 @@ impl Archiver {
                         .add_file(c_file_name, file_size, stat.st_mtime)?;
                 }
 
+                if let Some(sender) = self.suggested_boundaries.as_mut() {
+                    let offset = encoder.payload_position()?.raw();
+                    sender.send(offset)?;
+                }
+
                 let offset: LinkOffset = if let Some(payload_offset) = payload_offset {
                     self.reuse_stats.total_reused_payload_size +=
                         file_size + size_of::<pxar::format::Header>() as u64;
diff --git a/pbs-client/src/pxar_backup_stream.rs b/pbs-client/src/pxar_backup_stream.rs
index 9d2cb41d6..59ccbd631 100644
--- a/pbs-client/src/pxar_backup_stream.rs
+++ b/pbs-client/src/pxar_backup_stream.rs
@@ -27,6 +27,7 @@ use crate::pxar::create::PxarWriters;
 /// consumer.
 pub struct PxarBackupStream {
     rx: Option<std::sync::mpsc::Receiver<Result<Vec<u8>, Error>>>,
+    pub suggested_boundaries: Option<std::sync::mpsc::Receiver<u64>>,
     handle: Option<AbortHandle>,
     error: Arc<Mutex<Option<String>>>,
 }
@@ -55,19 +56,23 @@ impl PxarBackupStream {
         ));
         let writer = pxar::encoder::sync::StandardWriter::new(writer);
 
-        let (payload_writer, payload_rx) = if separate_payload_stream {
-            let (tx, rx) = std::sync::mpsc::sync_channel(10);
-            let payload_writer = TokioWriterAdapter::new(std::io::BufWriter::with_capacity(
-                buffer_size,
-                StdChannelWriter::new(tx),
-            ));
-            (
-                Some(pxar::encoder::sync::StandardWriter::new(payload_writer)),
-                Some(rx),
-            )
-        } else {
-            (None, None)
-        };
+        let (payload_writer, payload_rx, suggested_boundaries_tx, suggested_boundaries_rx) =
+            if separate_payload_stream {
+                let (tx, rx) = std::sync::mpsc::sync_channel(10);
+                let (suggested_boundaries_tx, suggested_boundaries_rx) = std::sync::mpsc::channel();
+                let payload_writer = TokioWriterAdapter::new(std::io::BufWriter::with_capacity(
+                    buffer_size,
+                    StdChannelWriter::new(tx),
+                ));
+                (
+                    Some(pxar::encoder::sync::StandardWriter::new(payload_writer)),
+                    Some(rx),
+                    Some(suggested_boundaries_tx),
+                    Some(suggested_boundaries_rx),
+                )
+            } else {
+                (None, None, None, None)
+            };
 
         let error = Arc::new(Mutex::new(None));
         let error2 = Arc::clone(&error);
@@ -82,6 +87,7 @@ impl PxarBackupStream {
                 },
                 options,
                 boundaries,
+                suggested_boundaries_tx,
             )
             .await
             {
@@ -96,12 +102,14 @@ impl PxarBackupStream {
 
         let backup_stream = Self {
             rx: Some(rx),
+            suggested_boundaries: None,
             handle: Some(handle.clone()),
             error: Arc::clone(&error),
         };
 
         let backup_payload_stream = payload_rx.map(|rx| Self {
             rx: Some(rx),
+            suggested_boundaries: suggested_boundaries_rx,
             handle: Some(handle),
             error,
         });
diff --git a/proxmox-backup-client/src/main.rs b/proxmox-backup-client/src/main.rs
index d620083e1..dccea230e 100644
--- a/proxmox-backup-client/src/main.rs
+++ b/proxmox-backup-client/src/main.rs
@@ -209,7 +209,7 @@ async fn backup_directory<P: AsRef<Path>>(
         payload_target.is_some(),
     )?;
 
-    let mut chunk_stream = ChunkStream::new(pxar_stream, chunk_size, None);
+    let mut chunk_stream = ChunkStream::new(pxar_stream, chunk_size, None, None);
     let (tx, rx) = mpsc::channel(10); // allow to buffer 10 chunks
 
     let stream = ReceiverStream::new(rx).map_err(Error::from);
@@ -223,14 +223,19 @@ async fn backup_directory<P: AsRef<Path>>(
 
     let stats = client.upload_stream(archive_name, stream, upload_options.clone(), None);
 
-    if let Some(payload_stream) = payload_stream {
+    if let Some(mut payload_stream) = payload_stream {
         let payload_target = payload_target
             .ok_or_else(|| format_err!("got payload stream, but no target archive name"))?;
 
         let (payload_injections_tx, payload_injections_rx) = std::sync::mpsc::channel();
         let injection_data = InjectionData::new(payload_boundaries_rx, payload_injections_tx);
-        let mut payload_chunk_stream =
-            ChunkStream::new(payload_stream, chunk_size, Some(injection_data));
+        let suggested_boundaries = payload_stream.suggested_boundaries.take();
+        let mut payload_chunk_stream = ChunkStream::new(
+            payload_stream,
+            chunk_size,
+            Some(injection_data),
+            suggested_boundaries,
+        );
         let (payload_tx, payload_rx) = mpsc::channel(10); // allow to buffer 10 chunks
         let stream = ReceiverStream::new(payload_rx).map_err(Error::from);
 
@@ -573,7 +578,8 @@ fn spawn_catalog_upload(
     let (catalog_tx, catalog_rx) = std::sync::mpsc::sync_channel(10); // allow to buffer 10 writes
     let catalog_stream = proxmox_async::blocking::StdChannelStream(catalog_rx);
     let catalog_chunk_size = 512 * 1024;
-    let catalog_chunk_stream = ChunkStream::new(catalog_stream, Some(catalog_chunk_size), None);
+    let catalog_chunk_stream =
+        ChunkStream::new(catalog_stream, Some(catalog_chunk_size), None, None);
 
     let catalog_writer = Arc::new(Mutex::new(CatalogWriter::new(TokioWriterAdapter::new(
         StdChannelWriter::new(catalog_tx),
diff --git a/proxmox-restore-daemon/src/proxmox_restore_daemon/api.rs b/proxmox-restore-daemon/src/proxmox_restore_daemon/api.rs
index e50cb8184..956c3246a 100644
--- a/proxmox-restore-daemon/src/proxmox_restore_daemon/api.rs
+++ b/proxmox-restore-daemon/src/proxmox_restore_daemon/api.rs
@@ -366,6 +366,7 @@ fn extract(
                         |_| Ok(()),
                         options,
                         None,
+                        None,
                     )
                     .await
                 }
diff --git a/pxar-bin/src/main.rs b/pxar-bin/src/main.rs
index c6d3794bb..85f96ad2c 100644
--- a/pxar-bin/src/main.rs
+++ b/pxar-bin/src/main.rs
@@ -407,6 +407,7 @@ async fn create_archive(
         },
         options,
         None,
+        None,
     )
     .await?;
 
diff --git a/tests/catar.rs b/tests/catar.rs
index d5ef85ffe..3f5b22177 100644
--- a/tests/catar.rs
+++ b/tests/catar.rs
@@ -40,6 +40,7 @@ fn run_test(dir_name: &str) -> Result<(), Error> {
         |_| Ok(()),
         options,
         None,
+        None,
     ))?;
 
     Command::new("cmp")
-- 
2.39.2



_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel


^ permalink raw reply	[flat|nested] 67+ messages in thread

* [pbs-devel] [PATCH v5 proxmox-backup 59/62] client: pxar: allow to restore prelude to optional path
  2024-05-07 15:51 [pbs-devel] [PATCH v5 pxar proxmox-backup 00/62] fix #3174: improve file-level backup Christian Ebner
                   ` (57 preceding siblings ...)
  2024-05-07 15:52 ` [pbs-devel] [PATCH v5 proxmox-backup 58/62] client: chunk stream: switch payload stream chunker Christian Ebner
@ 2024-05-07 15:52 ` Christian Ebner
  2024-05-07 15:52 ` [pbs-devel] [PATCH v5 proxmox-backup 60/62] client: pxar: add archive creation with reference test Christian Ebner
                   ` (3 subsequent siblings)
  62 siblings, 0 replies; 67+ messages in thread
From: Christian Ebner @ 2024-05-07 15:52 UTC (permalink / raw)
  To: pbs-devel

Since pxar format version 2, archives can store additional information
in a prelude entry.

Add an optional `restore-prelude-to` parameter to `pxar` and
`proxmox-backup-client` that specifies the path to restore the prelude
to. The path is passed to the archive extraction via a new
`prelude_path` field in `PxarExtractOptions`. If no path is given, the
prelude is skipped during restore.

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
changes since version 4:
- not present in previous version
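
For illustration only, the restore step can be reduced to the std-only
sketch below. `restore_prelude` and its parameters are hypothetical
names and not part of this patch. Unlike the patch, the sketch truncates
an already existing target file; that is an assumption about the desired
behaviour, not something the patch does.

```rust
use std::fs::OpenOptions;
use std::io::{self, Write};
use std::path::Path;

/// Write the raw prelude bytes to `path` if both a path and a prelude
/// are present. Returns `Ok(true)` if a file was written and `Ok(false)`
/// if the restore was skipped.
fn restore_prelude(path: Option<&Path>, prelude: Option<&[u8]>) -> io::Result<bool> {
    let (path, prelude) = match (path, prelude) {
        (Some(path), Some(prelude)) => (path, prelude),
        // No target path requested or no prelude in the archive: skip.
        _ => return Ok(false),
    };

    let mut file = OpenOptions::new()
        .create(true)
        .write(true)
        // Assumption: drop stale content of an existing file.
        .truncate(true)
        .open(path)?;
    file.write_all(prelude)?;

    Ok(true)
}
```

A caller would pass `options.prelude_path.as_deref()` together with the
bytes of the prelude entry, if any.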

 pbs-client/src/pxar/extract.rs    | 26 +++++++++++++++++++++++---
 proxmox-backup-client/src/main.rs | 12 +++++++++++-
 pxar-bin/src/main.rs              |  6 ++++++
 3 files changed, 40 insertions(+), 4 deletions(-)

diff --git a/pbs-client/src/pxar/extract.rs b/pbs-client/src/pxar/extract.rs
index 23b2f6ba5..0fa3d48d7 100644
--- a/pbs-client/src/pxar/extract.rs
+++ b/pbs-client/src/pxar/extract.rs
@@ -2,7 +2,8 @@
 
 use std::collections::HashMap;
 use std::ffi::{CStr, CString, OsStr, OsString};
-use std::io;
+use std::fs::OpenOptions;
+use std::io::{self, Write};
 use std::os::unix::ffi::OsStrExt;
 use std::os::unix::io::{AsRawFd, FromRawFd, RawFd};
 use std::path::{Path, PathBuf};
@@ -37,6 +38,7 @@ pub struct PxarExtractOptions<'a> {
     pub allow_existing_dirs: bool,
     pub overwrite_flags: OverwriteFlags,
     pub on_error: Option<ErrorHandler>,
+    pub prelude_path: Option<PathBuf>,
 }
 
 bitflags! {
@@ -125,8 +127,26 @@ where
         // we use this to keep track of our directory-traversal
         decoder.enable_goodbye_entries(true);
 
-        let (root, _, _) = handle_root_with_optional_format_version_prelude(&mut decoder)
-            .context("error reading pxar archive")?;
+        let (root, _version, prelude) =
+            handle_root_with_optional_format_version_prelude(&mut decoder)
+                .context("error reading pxar archive")?;
+
+        if let Some(ref path) = options.prelude_path {
+            if let Some(entry) = prelude {
+                let mut prelude_file = OpenOptions::new()
+                    .create(true)
+                    .write(true)
+                    .open(path)
+                    .with_context(|| format!("error creating prelude file '{path:?}'"))?;
+                if let pxar::EntryKind::Prelude(ref prelude) = entry.kind() {
+                    prelude_file.write_all(prelude.as_os_str().as_bytes())?;
+                } else {
+                    log::info!("unexpected entry kind for prelude");
+                }
+            } else {
+                log::info!("No prelude entry found, skip prelude restore.");
+            }
+        }
 
         if !root.is_dir() {
             bail!("pxar archive does not start with a directory entry!");
diff --git a/proxmox-backup-client/src/main.rs b/proxmox-backup-client/src/main.rs
index dccea230e..1423e6f0f 100644
--- a/proxmox-backup-client/src/main.rs
+++ b/proxmox-backup-client/src/main.rs
@@ -1432,7 +1432,12 @@ We do not extract '.pxar' archives when writing to standard output.
                 description: "ignore errors that occur during device node extraction",
                 optional: true,
                 default: false,
-            }
+            },
+            "restore-prelude-to": {
+                description: "Path to restore prelude to, (pxar v2 archives only).",
+                type: String,
+                optional: true,
+            },
         }
     }
 )]
@@ -1593,12 +1598,17 @@ async fn restore(
             overwrite_flags.insert(pbs_client::pxar::OverwriteFlags::all());
         }
 
+        let prelude_path = param["restore-prelude-to"]
+            .as_str()
+            .map(|path| PathBuf::from(path));
+
         let options = pbs_client::pxar::PxarExtractOptions {
             match_list: &[],
             extract_match_default: true,
             allow_existing_dirs,
             overwrite_flags,
             on_error,
+            prelude_path,
         };
 
         let mut feature_flags = pbs_client::pxar::Flags::DEFAULT;
diff --git a/pxar-bin/src/main.rs b/pxar-bin/src/main.rs
index 85f96ad2c..2a5403467 100644
--- a/pxar-bin/src/main.rs
+++ b/pxar-bin/src/main.rs
@@ -126,6 +126,10 @@ fn extract_archive_from_reader<R: std::io::Read>(
                 description: "'ppxar' payload input data file to restore split archive.",
                 optional: true,
             },
+            "restore-prelude-to": {
+                description: "Path to restore pxar archive prelude to.",
+                optional: true,
+            },
         },
     },
 )]
@@ -149,6 +153,7 @@ fn extract_archive(
     no_sockets: bool,
     strict: bool,
     payload_input: Option<String>,
+    restore_prelude_to: Option<String>,
 ) -> Result<(), Error> {
     let mut feature_flags = Flags::DEFAULT;
     if no_xattrs {
@@ -222,6 +227,7 @@ fn extract_archive(
         overwrite_flags,
         extract_match_default,
         on_error,
+        prelude_path: restore_prelude_to.map(|path| PathBuf::from(path)),
     };
 
     if archive == "-" {
-- 
2.39.2




* [pbs-devel] [PATCH v5 proxmox-backup 60/62] client: pxar: add archive creation with reference test
  2024-05-07 15:51 [pbs-devel] [PATCH v5 pxar proxmox-backup 00/62] fix #3174: improve file-level backup Christian Ebner
                   ` (58 preceding siblings ...)
  2024-05-07 15:52 ` [pbs-devel] [PATCH v5 proxmox-backup 59/62] client: pxar: allow to restore prelude to optional path Christian Ebner
@ 2024-05-07 15:52 ` Christian Ebner
  2024-05-07 15:52 ` [pbs-devel] [PATCH v5 proxmox-backup 61/62] client: tools: add helper to raise nofile rlimit Christian Ebner
                   ` (2 subsequent siblings)
  62 siblings, 0 replies; 67+ messages in thread
From: Christian Ebner @ 2024-05-07 15:52 UTC (permalink / raw)
  To: pbs-devel

Add a basic regression test for archive creation using a reference
metadata archive and payload index.

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
changes since version 4:
- not present in previous version

 pbs-client/src/pxar/create.rs                 | 242 ++++++++++++++++++
 tests/pxar/backup-client-pxar-data.mpxar      | Bin 0 -> 15070 bytes
 tests/pxar/backup-client-pxar-data.ppxar.didx | Bin 0 -> 8096 bytes
 tests/pxar/backup-client-pxar-expected.mpxar  | Bin 0 -> 15086 bytes
 4 files changed, 242 insertions(+)
 create mode 100644 tests/pxar/backup-client-pxar-data.mpxar
 create mode 100644 tests/pxar/backup-client-pxar-data.ppxar.didx
 create mode 100644 tests/pxar/backup-client-pxar-expected.mpxar

diff --git a/pbs-client/src/pxar/create.rs b/pbs-client/src/pxar/create.rs
index 287e47655..9349dd8bb 100644
--- a/pbs-client/src/pxar/create.rs
+++ b/pbs-client/src/pxar/create.rs
@@ -1811,3 +1811,245 @@ fn generate_pxar_excludes_cli(patterns: &[MatchEntry]) -> Vec<u8> {
 
     content
 }
+
+#[cfg(test)]
+mod tests {
+    use std::ffi::OsString;
+    use std::fs::File;
+    use std::fs::OpenOptions;
+    use std::io::{self, BufReader, Seek, SeekFrom, Write};
+    use std::pin::Pin;
+    use std::process::Command;
+    use std::sync::mpsc;
+    use std::task::{Context, Poll};
+
+    use pbs_datastore::dynamic_index::DynamicIndexReader;
+    use pxar::accessor::sync::FileReader;
+    use pxar::encoder::SeqWrite;
+
+    use crate::pxar::extract::Extractor;
+    use crate::pxar::OverwriteFlags;
+
+    use super::*;
+
+    struct DummyWriter {
+        file: Option<File>,
+    }
+
+    impl DummyWriter {
+        fn new<P: AsRef<Path>>(path: Option<P>) -> Result<Self, Error> {
+            let file = if let Some(path) = path {
+                Some(
+                    OpenOptions::new()
+                        .read(true)
+                        .write(true)
+                        .truncate(true)
+                        .create(true)
+                        .open(path)?,
+                )
+            } else {
+                None
+            };
+            Ok(Self { file })
+        }
+    }
+
+    impl Write for DummyWriter {
+        fn write(&mut self, data: &[u8]) -> io::Result<usize> {
+            if let Some(file) = self.file.as_mut() {
+                file.write_all(data)?;
+            }
+            Ok(data.len())
+        }
+
+        fn flush(&mut self) -> io::Result<()> {
+            if let Some(file) = self.file.as_mut() {
+                file.flush()?;
+            }
+            Ok(())
+        }
+    }
+
+    impl SeqWrite for DummyWriter {
+        fn poll_seq_write(
+            mut self: Pin<&mut Self>,
+            _cx: &mut Context,
+            buf: &[u8],
+        ) -> Poll<io::Result<usize>> {
+            Poll::Ready(self.as_mut().write(buf))
+        }
+
+        fn poll_flush(mut self: Pin<&mut Self>, _cx: &mut Context) -> Poll<Result<(), io::Error>> {
+            Poll::Ready(self.as_mut().flush())
+        }
+    }
+
+    fn prepare<P: AsRef<Path>>(dir_path: P) -> Result<(), Error> {
+        let dir = nix::dir::Dir::open(dir_path.as_ref(), OFlag::O_DIRECTORY, Mode::empty())?;
+
+        let fs_magic = detect_fs_type(dir.as_raw_fd()).unwrap();
+        let stat = nix::sys::stat::fstat(dir.as_raw_fd()).unwrap();
+        let mut fs_feature_flags = Flags::from_magic(fs_magic);
+        let metadata = get_metadata(
+            dir.as_raw_fd(),
+            &stat,
+            fs_feature_flags,
+            fs_magic,
+            &mut fs_feature_flags,
+            false,
+        )?;
+
+        let mut extractor = Extractor::new(
+            dir,
+            metadata.clone(),
+            true,
+            OverwriteFlags::empty(),
+            fs_feature_flags,
+        );
+
+        let dir_metadata = Metadata {
+            stat: pxar::Stat::default().mode(0o777u64).set_dir().gid(0).uid(0),
+            ..Default::default()
+        };
+
+        let file_metadata = Metadata {
+            stat: pxar::Stat::default()
+                .mode(0o777u64)
+                .set_regular_file()
+                .gid(0)
+                .uid(0),
+            ..Default::default()
+        };
+
+        extractor.enter_directory(
+            OsString::from(format!("testdir")),
+            dir_metadata.clone(),
+            true,
+        )?;
+
+        let size = 1024 * 1024;
+        let mut cursor = BufReader::new(std::io::Cursor::new(vec![0u8; size]));
+        for i in 0..10 {
+            extractor.enter_directory(
+                OsString::from(format!("folder_{i}")),
+                dir_metadata.clone(),
+                true,
+            )?;
+            for j in 0..10 {
+                cursor.seek(SeekFrom::Start(0))?;
+                extractor.extract_file(
+                    CString::new(format!("file_{j}").as_str())?.as_c_str(),
+                    &file_metadata,
+                    size as u64,
+                    &mut cursor,
+                    true,
+                )?;
+            }
+            extractor.leave_directory()?;
+        }
+
+        extractor.leave_directory()?;
+
+        Ok(())
+    }
+
+    #[test]
+    fn test_create_archive_with_reference() -> Result<(), Error> {
+        let mut testdir = PathBuf::from("./target/testout");
+        testdir.push(std::module_path!());
+
+        let _ = std::fs::remove_dir_all(&testdir);
+        let _ = std::fs::create_dir_all(&testdir);
+
+        prepare(testdir.as_path())?;
+
+        let previous_payload_index = Some(DynamicIndexReader::new(File::open(
+            "../tests/pxar/backup-client-pxar-data.ppxar.didx",
+        )?)?);
+        let metadata_archive = File::open("../tests/pxar/backup-client-pxar-data.mpxar").unwrap();
+        let metadata_size = metadata_archive.metadata()?.len();
+        let reader: MetadataArchiveReader = Arc::new(FileReader::new(metadata_archive));
+
+        let rt = tokio::runtime::Runtime::new().unwrap();
+        let (suggested_boundaries, _rx) = mpsc::channel();
+        let (forced_boundaries, _rx) = mpsc::channel();
+
+        rt.block_on(async move {
+            testdir.push("testdir");
+            let source_dir =
+                nix::dir::Dir::open(testdir.as_path(), OFlag::O_DIRECTORY, Mode::empty()).unwrap();
+
+            let fs_magic = detect_fs_type(source_dir.as_raw_fd()).unwrap();
+            let stat = nix::sys::stat::fstat(source_dir.as_raw_fd()).unwrap();
+            let mut fs_feature_flags = Flags::from_magic(fs_magic);
+
+            let metadata = get_metadata(
+                source_dir.as_raw_fd(),
+                &stat,
+                fs_feature_flags,
+                fs_magic,
+                &mut fs_feature_flags,
+                false,
+            )?;
+
+            let mut writer =
+                DummyWriter::new(Some("./target/backup-client-pxar-run.mpxar")).unwrap();
+            let mut payload_writer = Some(DummyWriter::new::<PathBuf>(None).unwrap());
+
+            let mut encoder =
+                Encoder::new(&mut writer, &metadata, payload_writer.as_mut(), Some(&[])).await?;
+
+            let mut archiver = Archiver {
+                feature_flags: Flags::from_magic(fs_magic),
+                fs_feature_flags: Flags::from_magic(fs_magic),
+                fs_magic,
+                callback: Box::new(|_| Ok(())),
+                patterns: Vec::new(),
+                catalog: None,
+                path: PathBuf::new(),
+                entry_counter: 0,
+                entry_limit: 1024,
+                current_st_dev: stat.st_dev,
+                device_set: None,
+                hardlinks: HashMap::new(),
+                file_copy_buffer: vec::undefined(4 * 1024 * 1024),
+                skip_e2big_xattr: false,
+                forced_boundaries: Some(forced_boundaries),
+                previous_payload_index,
+                suggested_boundaries: Some(suggested_boundaries),
+                cached_entries: Vec::new(),
+                cached_range: Range::default(),
+                cached_last_chunk: None,
+                cached_hardlinks: HashSet::new(),
+                caching_enabled: false,
+                reuse_stats: ReuseStats::default(),
+            };
+
+            let accessor = Accessor::new(reader, metadata_size, None).await.unwrap();
+            let root = accessor.open_root().await.ok();
+            archiver
+                .archive_dir_contents(&mut encoder, root, source_dir, true)
+                .await
+                .unwrap();
+
+            archiver
+                .flush_cached_reusing_if_below_threshold(&mut encoder, false)
+                .await
+                .unwrap();
+
+            encoder.finish().await.unwrap();
+            encoder.close().await.unwrap();
+
+            let status = Command::new("diff")
+                .args([
+                    "../tests/pxar/backup-client-pxar-expected.mpxar",
+                    "./target/backup-client-pxar-run.mpxar",
+                ])
+                .status()
+                .expect("failed to execute diff");
+            assert!(status.success());
+
+            Ok::<(), Error>(())
+        })
+    }
+}
diff --git a/tests/pxar/backup-client-pxar-data.mpxar b/tests/pxar/backup-client-pxar-data.mpxar
new file mode 100644
index 0000000000000000000000000000000000000000..00f3dc295fb38062c23e6cf7cac9ae110beb0a65
GIT binary patch
literal 15070
zcmeI3ZD<@t7{_Pd4&n>F7EIfqb#1^FO6}HK%_)tWO2s0r+iLp7VpnN`+SqK3@uiTk
z0g)mo3n~RgSX7jP#Z`-;wUEWA!B4JW5kF|x5580@T|bDmXzQ8GN@tzklRN*x`)~`#
z+|A9+Z+4!U-!r*zm%i41e0X5q&>}W-sk}V(=Du$q+4;h;F8=yl4}ZdoA2i1PeiW~F
z7gkDF&G*_D^Edhj2X^*7yu)JuwZnyZhYt+&$+{a8M{=R@jqX44;jVQr_n5qS`Ja!?
zJj=%~;8y>8^bO)nmIG_xu7%+&Cf=v??$*F?HnW6jmEx|0;T&euxV12x%N!baJq+hD
zm&V-y!}-jkaa}N6z<e54f#E_H2)HYTj;(+Fj+3hvDKpg*+uC;ZZa5IH;Qkxr`*hRW
zPwf8mu1mHb<?ZtNpKkkPvV3Urmo~1zy#C9{-_I`n@$iyOh4%etZrOKo^U622=`*~%
zeP!$T7i*WVKk;~>pO3D*w{v9smfybSqt4rptjHd`|MLm<Vqu)Wo?ABc{O-rP2Mg^x
z7k_iMt1TS)zR-W~W!=Mf|9sD>XZd*YdB}HcLEjPq_HYs}F67(1L&2w#Y%n&v?uz=3
zSjazEo-U<0$><xz#Vn$6IDIE9rg1oZr!1jyIDKa<rExfYGbN*OIDMBD#vM>&W#aU0
zDplb0RRf39x22dg4ySKhu>@R8-*xF*Vx%6v7kKeM>Dy6kA+B?*Z&z_>oMf`bW;a>I
z<m4$Xjl=2NS3DYr(|4fwG!CclPzh)pPT!Fd(m0&HV<n<-IDIEdOyh9+PL)K!we($=
zz9ow2nVpfOKE<8BGbI(`D#hVW-%QPD98TY5mGQr_Y8<H~u^F3PY>L^!RI9-0s|F6I
zZ%Z|498TZ1YSB2Hz8%%3aX5Xuszc*&`u0?p#^LnstDb;s>ANm{OZIGY=sQq-A+B?*
z?@$eB98TYn8qzqNzGF3_agwFbV75rqn8xAsovI0q!|6LyQyPcUH`6j2htqdiWBlvb
z8kruaZ&RxR&pTMO^j(*}C7Y-@^lfRT5Z5`@x2;(;4ySKNvuPYo->&A+IGnyc&82aY
zmDgal@HLOd;q)D7K8?faJJbRihtqeYg)|PQ?^ufjTua||>07d@n?v7;77KBmV|}Mu
zLgR4y&a{-q;q=Y)jK<;gUDg@@$9att98TY+UIm_af|D*4$wF^1TUfeD<8b=6b&JN~
z^zG<2jl=2N)g1xX(sy0@mMpX8(6^_%LR_VL68GJ=uX{8Or|&@bX&g@9p&rmUoW3JH
zq;WWX$9hELaQaU4n8r!=RfE|g)e{<r(|4w)G!Cb4W@G}crSH1*Es1+`=(}t%gFI5<
z^lchdAa#Pn>Dw|)8i&)jZCEr8r*FrwX&g@9uHn!)oW4E7rExfY`-Vs3B-^;bY!Mhf
zjl=0XGy(zF(sy0@mIR_X^c@+Y5Z5_AeaA*b<8b;;jF`sZ^qm?Bjl=0XGg2Cd(>E(+
zG!Ccla*375OpnvIS*il5g9T3CR>`Ds5^FS=E$osd;9B~Y>$^BF%I$l%bRK>#eY7&O
zHYWHE=v;8~)Sh?xU(H|VW$)?(9aB$LPP~7)*uLYlq5Vhi+jHX|?PC2`OI{f&Z9MeB
z>6K!Ahu7bI_2$0s#@C4T7d>;$s;8fP>(9z^v3}X<gBuoTx3=wFD%QVullIW@VSRMn
ee6jw{9WR|3pSSVg=*41v{&S{}`TgcUXZj0DmM>rc

literal 0
HcmV?d00001

diff --git a/tests/pxar/backup-client-pxar-data.ppxar.didx b/tests/pxar/backup-client-pxar-data.ppxar.didx
new file mode 100644
index 0000000000000000000000000000000000000000..a646218b5d504196443b17d62f3b22d171f011b8
GIT binary patch
literal 8096
zcmeIw&x=k`9LMqR`E|?Ga2Ha_;uI^9yBJNz!Z9f+>6R!pO*b*jh^~^|JnlRqSshV|
z%`G8TSEgiYa;C7OyU<J|9b=l(l;<{OBpc7nKj5>pIN$Ya@$KDb%dI01H%~o(*K{tQ
zJ~q6+^=PT==^y&xJ9`F3sC!@cUYwa2-S=W{^1`mZr(YJX_P=dkztmp8Zd>P0&yH!m
zYQlvAp+G1Q3WNfoKqwFjgaV;JC=d#S0-?bFT|iU3_TbPp*KU10`+DQ}SnEW!^~!_6
zmE|WN7T?dES$;Kh@A!s<^qNZtPc0p`(~IYOPkMW9;>Of`Ykp<tpHIL1?CtT1yY~$x
zkW0xxE~6B3Ic1P5D2JS-0&*o;$W>HA&QS%qnjGXj)sSn*LylMjxtI}Kh5y=%W?c!m
zglWhbmOw6L267ooA(yiZas|sFXITNcl3B=Atc09n736B>Am>>PxrTYj5pN(DbK=OZ
zH1A4ee_TV(@C0%xH;~JC3b~wTkSll&Im-*kmE1zE;w9u9uOL@*2RYAc$Ti$Ujzj~w
zSdc(=rA1dF`x6>+MkJ6+g@IfqQpn{ZgIpnU$XQW9t`rt>l_(+SL<PB8ILLWXL#`1X
zawHqb#gZhlD=oVc*`L&qGcti(Dh=c^nL;j?8RQC?L(a+qa;3D8t7Hi|Co9O+(m~G4
z8gh;FkR#PVE>@(FU1`;o$o`auoKXqnQe_~QsT6X#${<&$9CB6_kSmpiT%}6LIaNWf
zRt|Dr)sSnHha5!><l=}TWLG-sN@RbLhMb8K$YqgPb4Mq~o{fC&+#LA+d)TeK+0;Jx
Tczf@GpWkK|cE3#e4vqc=((xEu

literal 0
HcmV?d00001

diff --git a/tests/pxar/backup-client-pxar-expected.mpxar b/tests/pxar/backup-client-pxar-expected.mpxar
new file mode 100644
index 0000000000000000000000000000000000000000..ae4a18c89749f3d7ec82623e84509df19943d03e
GIT binary patch
literal 15086
zcmeI3Z-^9S9LJyew{ZQzQRvjGZ1W%mF~`ihExcw8BMEJ+&NoR;;T@Hij$M~!+%X3c
z5)=a!LLm(mg^)C*cv!*>U2*iP2{P$LIT8J_45t^7Nokw=%!_Ax+~4i?J=zyLa6C89
zJ@<T`XP)Qz{C>O3UiwDo@!`Q)L-SbmQh9m#&Zl18d#vMIli#0ud-r#bZF%Wv55GTG
z=D+abM~$(6erm4+b4!J*XM3IV`5y+h4{qsybhE|&Yln054j&rqmvuKLj^sk)8{PB%
zM_X6zEf;z7ykx98^L+dQZu!4Q-z3iBn7X*@U^tuQ^Q$wv6)>E`EdE&Q;I4<^TxQd_
zl`x#g92$264CgbK#@z_R1<a#yJuqCzd>U7R;UX3YxGRT_u72~*lgs8Q)#{0j9b5a>
z?2DIhA8zNZ*S-7XwomW5WYZDeF0cRj_D?3wgOk5@a0TY|UrzpUcHvKl7p$vkKXB&O
z-6z*CeQTp$?Kp2=x@-K{%EhZsJW<on$5-9oJ+f)T?_cwA<n2e6WDh_1`2>5pW}LsB
zTQv3Jww=9syS(h4|IOK+j&S6Mn*RGP>m9!Lm-|jV&&QKLhg^R(`j!Z=%tywH3;8zh
zQ1GcF8jMY^yIOt6Ead-K$2gMFH;GGFMB{M!PFYOjaQe<zLgR4yW=cxqaQZftjK<;g
zT~ru%K%Je5)3>FVG!Cb4TdB<N{8eXmIDI>cCE(inZb;t}BbE7C;Kl!>Z&$H}b(Ka7
zoW4E9p>dLjH8#D6RU4dq#iemLeFut1<8b;86`#i8^c^Vyjl=0XRzezw(|4joG!Ccl
zREcREPT!f52)MSs8`8H#5#{L_N$OKv_RZ8(SXU_yr*BiuXdF)8MV0YaS#@$8$=Zxf
zZ*6L$g{7J_4ySKht<3NIRcCfMeLJc}<8b<RRh!1)^zEq*jl=2NS6v#1(|4eH0<Nv^
zhV(5tv#p`;Q1yj%ond`PYCz*~`i|9*#^Lmxs1c2mY=tJHMXJU$4yW%-O=uiW-%Lws
z98TY+meDwzzKa^;Z^zaNy*PbanknGg`ff<yl0(!Q`nI*oMxrt}T=wl~7LCK{+tq9u
zhts#GIW!KZZ(nn1oMh)U87%_Mqj5NWhni30aQcq4fX3nU9cv+t!|6NGA_3RdcSHJ?
z?CRFgcdEt0y3TO+ooNY;!|9vpDUHMF+tf1}htqdaXZ(ZnIvE^J-<EFDILStDGFsSr
zWqwp*fz!96TQm-*Z&$Zz98TY!?g+THz8liFWSg~yzJ1*l)^&#U9q1m7!|6NJeHw?;
zccce24yW%}4{01u--#a4IGnyyJ*IJzbJb+D$n=E9;q=Xnl*ZxoZ5o+?YwNoqeM{E8
zHS}FHm_g<^xHnHM!=!OIecMK5epCjB)3;+-G!Cb4*RW|EPT!v4&^VmFeZ!@3IDH3(
zN8=>NxXEY{8a|D~={qt40oT@dL;991L~H0fHbP-tXE^&#jEKhJ^qm?pjl=0XGZGqy
z(>E)mG!Cb4vyjm^oW6?%Rv<$!PTy9+q;WWX+l9*fsKi2IjV7aoQ?LYFTi<eh*FG2J
zj$IqN55JH;UaBtE1U~`Yb8ea1@!r7e`F&pYE#KEQ^-Sr+2Um#gyFMG*bL4>?H~rZu
z)_=9&wV}e=gCCw=D%N*-1HIR*@Be;$g;;;lbJs3=_UU*2DlHc47oFa}W{!4S$F7B9
o{h^z+M~)BcqpN0%^>=T6<;?i3wfjde7VGn`GkwA5n}40@Z-J;Sk^lez

literal 0
HcmV?d00001

-- 
2.39.2




* [pbs-devel] [PATCH v5 proxmox-backup 61/62] client: tools: add helper to raise nofile rlimit
  2024-05-07 15:51 [pbs-devel] [PATCH v5 pxar proxmox-backup 00/62] fix #3174: improve file-level backup Christian Ebner
                   ` (59 preceding siblings ...)
  2024-05-07 15:52 ` [pbs-devel] [PATCH v5 proxmox-backup 60/62] client: pxar: add archive creation with reference test Christian Ebner
@ 2024-05-07 15:52 ` Christian Ebner
  2024-05-07 15:52 ` [pbs-devel] [PATCH v5 proxmox-backup 62/62] client: pxar: set cache limit based on " Christian Ebner
  2024-05-14 10:52 ` [pbs-devel] [PATCH v5 pxar proxmox-backup 00/62] fix #3174: improve file-level backup Christian Ebner
  62 siblings, 0 replies; 67+ messages in thread
From: Christian Ebner @ 2024-05-07 15:52 UTC (permalink / raw)
  To: pbs-devel

The default soft limit for open file handles is rather low, as some
APIs (e.g. the POSIX `select(2)` syscall) do not work with higher file
descriptor numbers [0].

The lookahead cache used during the backup client's metadata comparison
to reuse unchanged files, however, requires much higher limits to work
effectively.

This helper function raises the soft limit to the hard limit, as
provided by the `getrlimit(2)` syscall.

[0] https://0pointer.net/blog/file-descriptor-limits.html

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
changes since version 4:
- not present in previous version

 pbs-client/src/tools/mod.rs | 23 +++++++++++++++++++++++
 1 file changed, 23 insertions(+)

diff --git a/pbs-client/src/tools/mod.rs b/pbs-client/src/tools/mod.rs
index e6cf066e4..8de8b6b80 100644
--- a/pbs-client/src/tools/mod.rs
+++ b/pbs-client/src/tools/mod.rs
@@ -556,3 +556,26 @@ pub fn handle_root_with_optional_format_version_prelude<R: pxar::decoder::SeqRea
         _ => bail!("unexpected entry kind {:?}", first.kind()),
     }
 }
+
+/// Raise the soft limit for open file handles to the hard limit
+///
+/// Returns the values set before raising the limit as libc::rlimit64
+pub fn raise_nofile_limit() -> Result<libc::rlimit64, Error> {
+    let mut old = libc::rlimit64 {
+        rlim_cur: 0,
+        rlim_max: 0,
+    };
+    if 0 != unsafe { libc::getrlimit64(libc::RLIMIT_NOFILE, &mut old as *mut libc::rlimit64) } {
+        bail!("Failed to get nofile rlimit");
+    }
+
+    let mut new = libc::rlimit64 {
+        rlim_cur: old.rlim_max,
+        rlim_max: old.rlim_max,
+    };
+    if 0 != unsafe { libc::setrlimit64(libc::RLIMIT_NOFILE, &mut new as *mut libc::rlimit64) } {
+        bail!("Failed to set nofile rlimit");
+    }
+
+    Ok(old)
+}
-- 
2.39.2




* [pbs-devel] [PATCH v5 proxmox-backup 62/62] client: pxar: set cache limit based on nofile rlimit
  2024-05-07 15:51 [pbs-devel] [PATCH v5 pxar proxmox-backup 00/62] fix #3174: improve file-level backup Christian Ebner
                   ` (60 preceding siblings ...)
  2024-05-07 15:52 ` [pbs-devel] [PATCH v5 proxmox-backup 61/62] client: tools: add helper to raise nofile rlimit Christian Ebner
@ 2024-05-07 15:52 ` Christian Ebner
  2024-05-14 10:52 ` [pbs-devel] [PATCH v5 pxar proxmox-backup 00/62] fix #3174: improve file-level backup Christian Ebner
  62 siblings, 0 replies; 67+ messages in thread
From: Christian Ebner @ 2024-05-07 15:52 UTC (permalink / raw)
  To: pbs-devel

The lookahead cache size requires the resource limit for open file
handles to be high in order to allow for efficient reuse of unchanged
file payloads.

Increase the nofile soft limit to the hard limit and dynamically adapt
the cache size to the new soft limit minus half of the previous soft
limit.

The `PxarCreateOptions` and the `Archiver` are therefore extended by
an additional field to store the maximum cache size, with fallback to
a default size of 512 entries.

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
changes since version 4:
- not present in previous version
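
For illustration only, the cache size derivation described in the commit
message can be sketched as standalone Rust. The function names and
`DEFAULT_MAX_CACHE_SIZE` are hypothetical stand-ins (the patch itself
uses `raise_nofile_limit()` and `MAX_CACHE_SIZE`), and the limits are
passed in as plain integers instead of being queried via libc:

```rust
/// Hypothetical stand-in for the fallback size (512 entries according to
/// the commit message).
const DEFAULT_MAX_CACHE_SIZE: usize = 512;

/// Derive the cache size from the nofile limits as they were *before*
/// the soft limit was raised to the hard limit.
fn cache_size_from_rlimit(old_soft: u64, hard: u64) -> Option<usize> {
    // The new soft limit equals `hard`; half of the old soft limit is
    // kept in reserve. Returns None instead of underflowing.
    let entries = hard.checked_sub(old_soft / 2)?;
    usize::try_from(entries).ok()
}

/// Resolve the effective limit, falling back to the default if unset.
fn effective_cache_size(configured: Option<usize>) -> usize {
    configured.unwrap_or(DEFAULT_MAX_CACHE_SIZE)
}

fn main() {
    // Old soft limit 1024, hard limit 524288: 524288 - 512 = 523776
    println!("{:?}", cache_size_from_rlimit(1024, 524_288)); // prints Some(523776)
    // No explicit limit configured, e.g. outside of metadata mode
    println!("{}", effective_cache_size(None)); // prints 512
}
```

With a low hard limit such as 4096 and an old soft limit of 1024, the
same arithmetic yields 3584 entries.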

 pbs-client/src/pxar/create.rs                 |  7 ++++++-
 proxmox-backup-client/src/main.rs             | 21 ++++++++++++++++---
 .../src/proxmox_restore_daemon/api.rs         |  1 +
 pxar-bin/src/main.rs                          |  1 +
 4 files changed, 26 insertions(+), 4 deletions(-)

diff --git a/pbs-client/src/pxar/create.rs b/pbs-client/src/pxar/create.rs
index 9349dd8bb..d326d5ac0 100644
--- a/pbs-client/src/pxar/create.rs
+++ b/pbs-client/src/pxar/create.rs
@@ -57,6 +57,8 @@ pub struct PxarCreateOptions {
     pub skip_e2big_xattr: bool,
     /// Reference state for partial backups
     pub previous_ref: Option<PxarPrevRef>,
+    /// Maximum number of lookahead cache entries
+    pub max_cache_size: Option<usize>,
 }
 
 pub type MetadataArchiveReader = Arc<dyn ReadAt + Send + Sync + 'static>;
@@ -172,6 +174,7 @@ struct Archiver {
     suggested_boundaries: Option<mpsc::Sender<u64>>,
     previous_payload_index: Option<DynamicIndexReader>,
     cached_entries: Vec<CacheEntry>,
+    max_cache_size: usize,
     cached_hardlinks: HashSet<HardLinkInfo>,
     cached_range: Range<u64>,
     cached_last_chunk: Option<ReusableDynamicEntry>,
@@ -292,6 +295,7 @@ where
         previous_payload_index,
         suggested_boundaries,
         cached_entries: Vec::new(),
+        max_cache_size: options.max_cache_size.unwrap_or(MAX_CACHE_SIZE),
         cached_range: Range::default(),
         cached_last_chunk: None,
         cached_hardlinks: HashSet::new(),
@@ -743,7 +747,7 @@ impl Archiver {
         }
 
         // Avoid having to many open file handles in cached entries
-        if self.cached_entries.len() > MAX_CACHE_SIZE {
+        if self.cached_entries.len() > self.max_cache_size {
             log::debug!("Max cache size reached, reuse cached entries");
             self.flush_cached_reusing_if_below_threshold(encoder, true)
                 .await?;
@@ -2018,6 +2022,7 @@ mod tests {
                 previous_payload_index,
                 suggested_boundaries: Some(suggested_boundaries),
                 cached_entries: Vec::new(),
+                max_cache_size: 512,
                 cached_range: Range::default(),
                 cached_last_chunk: None,
                 cached_hardlinks: HashSet::new(),
diff --git a/proxmox-backup-client/src/main.rs b/proxmox-backup-client/src/main.rs
index 1423e6f0f..359b2afcf 100644
--- a/proxmox-backup-client/src/main.rs
+++ b/proxmox-backup-client/src/main.rs
@@ -41,7 +41,7 @@ use pbs_client::tools::{
         crypto_parameters, format_key_source, get_encryption_key_password, KEYFD_SCHEMA,
         KEYFILE_SCHEMA, MASTER_PUBKEY_FD_SCHEMA, MASTER_PUBKEY_FILE_SCHEMA,
     },
-    CHUNK_SIZE_SCHEMA, REPO_URL_SCHEMA,
+    raise_nofile_limit, CHUNK_SIZE_SCHEMA, REPO_URL_SCHEMA,
 };
 use pbs_client::{
     delete_ticket_info, parse_backup_detection_mode_specification, parse_backup_specification,
@@ -1074,7 +1074,8 @@ async fn create_backup(
                     .start_directory(std::ffi::CString::new(target.as_str())?.as_c_str())?;
 
                 let mut previous_ref = None;
-                if detection_mode.is_metadata() {
+                let max_cache_size = if detection_mode.is_metadata() {
+                    let old_rlimit = raise_nofile_limit()?;
                     if let Some(ref manifest) = previous_manifest {
                         // BackupWriter::start created a new snapshot, get the one before
                         if let Some(backup_time) = client.previous_backup_time().await? {
@@ -1099,7 +1100,20 @@ async fn create_backup(
                             .await?
                         }
                     }
-                }
+
+                    if old_rlimit.rlim_max <= 4096 {
+                        log::info!(
+                            "resource limit for open file handles low: {}",
+                            old_rlimit.rlim_max,
+                        );
+                    }
+
+                    Some(usize::try_from(
+                        old_rlimit.rlim_max - old_rlimit.rlim_cur / 2,
+                    )?)
+                } else {
+                    None
+                };
 
                 let pxar_options = pbs_client::pxar::PxarCreateOptions {
                     device_set: devices.clone(),
@@ -1108,6 +1122,7 @@ async fn create_backup(
                     skip_lost_and_found,
                     skip_e2big_xattr,
                     previous_ref,
+                    max_cache_size,
                 };
 
                 let upload_options = UploadOptions {
diff --git a/proxmox-restore-daemon/src/proxmox_restore_daemon/api.rs b/proxmox-restore-daemon/src/proxmox_restore_daemon/api.rs
index 956c3246a..49aa96b58 100644
--- a/proxmox-restore-daemon/src/proxmox_restore_daemon/api.rs
+++ b/proxmox-restore-daemon/src/proxmox_restore_daemon/api.rs
@@ -356,6 +356,7 @@ fn extract(
                         skip_lost_and_found: false,
                         skip_e2big_xattr: false,
                         previous_ref: None,
+                        max_cache_size: None,
                     };
 
                     let pxar_writer = TokioWriter::new(writer);
diff --git a/pxar-bin/src/main.rs b/pxar-bin/src/main.rs
index 2a5403467..c08aee4f6 100644
--- a/pxar-bin/src/main.rs
+++ b/pxar-bin/src/main.rs
@@ -365,6 +365,7 @@ async fn create_archive(
         skip_lost_and_found: false,
         skip_e2big_xattr: false,
         previous_ref: None,
+        max_cache_size: None,
     };
 
     let source = PathBuf::from(source);
-- 
2.39.2




* Re: [pbs-devel] [PATCH v5 pxar proxmox-backup 00/62] fix #3174: improve file-level backup
  2024-05-07 15:51 [pbs-devel] [PATCH v5 pxar proxmox-backup 00/62] fix #3174: improve file-level backup Christian Ebner
                   ` (61 preceding siblings ...)
  2024-05-07 15:52 ` [pbs-devel] [PATCH v5 proxmox-backup 62/62] client: pxar: set cache limit based on " Christian Ebner
@ 2024-05-14 10:52 ` Christian Ebner
  62 siblings, 0 replies; 67+ messages in thread
From: Christian Ebner @ 2024-05-14 10:52 UTC (permalink / raw)
  To: pbs-devel

Superseded by version 6, which can be found here:
https://lists.proxmox.com/pipermail/pbs-devel/2024-May/009312.html

Please note that I unfortunately sent this with the wrong cover letter;
the correct cover letter can be found here:
http://lists.proxmox.com/pipermail/pbs-devel/attachments/20240514/a0440508/attachment.patch



* Re: [pbs-devel] [PATCH v5 pxar proxmox-backup 00/62] fix #3174: improve file-level backup
  2024-05-14 10:33 Christian Ebner
  2024-05-14 10:45 ` Christian Ebner
@ 2024-05-27 14:35 ` Christian Ebner
  1 sibling, 0 replies; 67+ messages in thread
From: Christian Ebner @ 2024-05-27 14:35 UTC (permalink / raw)
  To: pbs-devel

Superseded by version 7 of the patch series:
https://lists.proxmox.com/pipermail/pbs-devel/2024-May/009452.html



* Re: [pbs-devel] [PATCH v5 pxar proxmox-backup 00/62] fix #3174: improve file-level backup
  2024-05-14 10:33 Christian Ebner
@ 2024-05-14 10:45 ` Christian Ebner
  2024-05-27 14:35 ` Christian Ebner
  1 sibling, 0 replies; 67+ messages in thread
From: Christian Ebner @ 2024-05-14 10:45 UTC (permalink / raw)
  To: pbs-devel

[-- Attachment #1: Type: text/plain, Size: 104 bytes --]

Unfortunately, I sent this with the wrong cover letter; please find
the correct version 6 attached.

[-- Attachment #2: 0000-cover-letter.patch --]
[-- Type: text/x-diff, Size: 14492 bytes --]

From ee0797da6dc59c4f4115b7d2583fcba3916017f1 Mon Sep 17 00:00:00 2001
From: Christian Ebner <c.ebner@proxmox.com>
Date: Tue, 14 May 2024 12:37:00 +0200
Subject: [PATCH v6 pxar proxmox-backup 00/65] fix #3174: improve file-level backup

This series of patches implements a metadata-based file change
detection mechanism to speed up pxar file-level backup creation for
unchanged files.

The chosen approach is to split pxar archives on creation via the
proxmox-backup-client into two separate data and upload streams,
one exclusive for regular file payloads, the other one for the rest
of the pxar archive, which is mostly metadata.

On consecutive runs, the metadata archive of the previous backup run,
which is limited in size and therefore rapidly accessed, is used to
look up and compare the metadata for entries to encode.
This assumes that the connection speed to the Proxmox Backup Server is
sufficiently fast, allowing the download and caching of the chunks for
that index.

Changes to regular files are detected by comparing the file's complete
metadata object, including mtime, ACLs, etc. If no changes are detected,
the previous payload index is used to look up chunks to possibly reuse
in the payload stream of the new archive.
In order to reduce possible chunk fragmentation, the decision whether to
reuse or reencode a file payload is deferred until enough information
is gathered by adding entries to a look-ahead cache. If the padding
introduced by reusing chunks falls below a threshold, the entries are
referenced, the chunks are reused and injected into the pxar payload
upload stream; otherwise they are discarded and the files are encoded
regularly.
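
As a rough sketch of the padding-based decision described above (not the
code from this series): the type `CachedRange`, the function `decide`,
and the threshold value of 10% are assumptions made for illustration,
as is the choice to treat a ratio equal to the threshold as reusable.

```rust
/// Hypothetical summary of the entries currently held in the look-ahead cache.
struct CachedRange {
    /// Total size in bytes of the chunks that would be referenced.
    chunk_bytes: u64,
    /// Bytes within those chunks that belong to unchanged file payloads.
    payload_bytes: u64,
}

#[derive(Debug, PartialEq)]
enum Decision {
    /// Reference the existing chunks and inject them into the payload stream.
    Reuse,
    /// Drop the cached entries and encode the files regularly.
    Reencode,
}

/// Decide based on the fraction of referenced chunk data that is padding,
/// i.e. data in the reused chunks not belonging to any cached file.
fn decide(range: &CachedRange, max_padding_ratio: f64) -> Decision {
    if range.chunk_bytes == 0 {
        // Nothing to reuse
        return Decision::Reencode;
    }
    let padding = range.chunk_bytes.saturating_sub(range.payload_bytes);
    let ratio = padding as f64 / range.chunk_bytes as f64;
    if ratio <= max_padding_ratio {
        Decision::Reuse
    } else {
        Decision::Reencode
    }
}

fn main() {
    // 200 kB of padding in 4 MB of chunks: ratio 0.05, within the threshold
    let dense = CachedRange { chunk_bytes: 4_000_000, payload_bytes: 3_800_000 };
    println!("{:?}", decide(&dense, 0.1)); // prints Reuse

    // 3 MB of padding in 4 MB of chunks: ratio 0.75, above the threshold
    let sparse = CachedRange { chunk_bytes: 4_000_000, payload_bytes: 1_000_000 };
    println!("{:?}", decide(&sparse, 0.1)); // prints Reencode
}
```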

Patches 13 and 14 are to be applied to the pxar repository only after
patch 50 in the series, for the patches to compile in a sequential
chain.

The following lists the most notable changes included in this series since
version 5:
- Fix an issue where the payload chunker was not correctly reset after
  suggested or forced boundaries.
- Added regression tests for payload chunker and chunk stream.

The following lists the most notable changes included in this series since
version 4:
- Increase the open file handle limit to the hard limit and adapt the
  lookahead cache size dynamically (thanks a lot to Thomas for pointing
  this out and providing the necessary background information). This
  helps with the reuse of multiple entries contained within the same
  chunk, which would otherwise exceed the padding threshold and
  therefore be reencoded instead.
- Fix payload chunker scan to only scan up until chunk pos in case a
  suggested boundary is chosen.
- Fix issue with decoder state not being set to the correct `InDirectory`
  after reading prelude and getting root directory entry.
- Fix issue with kept back chunk injection when the chunk follows a
  range discontinuity.
- Add regression test for pxar create with metadata archive and payload
  index reference.

The following lists the most notable changes included in this series since
version 3:
- Rework the whole reused chunk injection and accounting logic and use
  lockless async `mpsc::channel`s instead of `Arc<Mutex<VecDeque<..>>>`.
- Rework the lookahead caching logic to use payload ranges and check for
  possible range continuation instead of looking up the reusable dynamic
  entries immediately in case of a reusable entry chain. This also
  avoids edge cases not covered in the previous version of the patch series.
  The current version therefore tends to reencode small files more
  aggressively, since they might introduce additional unwanted padding.
- Correctly cover hardlinks in the reuse logic as well, avoiding
  reencoding of these entries.
- Add a dedicated chunker implementation for the payload data stream,
  allowing the archiver to suggest boundaries to the chunker to reduce
  padding for reused chunks.
- Add `change-detection-mode=data`, in order to allow creating split
  archives with fully reencoded payload data.
- Add payload input readers for pxar accessor type implementations
  where needed.
- Add a consistency check in the pxar encoder when dropping state or
  the encoder instance.
- Rename CliParams to the more opaque Prelude, since the pxar archive
  does not care about its contents and this might be extended to store
  other information about the archive as well.
- Add missing proxmox-file-restore support for split archives and fix
  restore of tar/zip archives via WebUI. This is handled by the same
  decoder logic, and needed an updated payload input content range to
  read the data from the correct location in the payload data archive.
- Additional refactoring to use the pxar reader helpers where possible.
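
To illustrate the first item, the following minimal sketch passes reused
chunk references through a channel instead of a shared
`Arc<Mutex<VecDeque<..>>>`. It uses the synchronous `std::sync::mpsc`
channel and threads as a stand-in for the async setup of the series; the
`ReusedChunk` type and the `forward_chunks` function are hypothetical.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical stand-in for a reference to a reusable chunk.
#[derive(Debug, PartialEq)]
struct ReusedChunk {
    offset: u64,
    size: u64,
}

/// Send chunk references from a producer thread (the archiver side) to the
/// consumer (the upload stream side) without a shared, locked queue.
fn forward_chunks(chunks: Vec<ReusedChunk>) -> Vec<ReusedChunk> {
    let (tx, rx) = mpsc::channel();

    let producer = thread::spawn(move || {
        for chunk in chunks {
            tx.send(chunk).expect("receiver dropped");
        }
        // `tx` is dropped here, which closes the channel
    });

    // Iteration ends once all senders are dropped; order is preserved
    let received: Vec<ReusedChunk> = rx.iter().collect();
    producer.join().expect("producer panicked");
    received
}

fn main() {
    let chunks = vec![
        ReusedChunk { offset: 0, size: 4096 },
        ReusedChunk { offset: 4096, size: 8192 },
    ];
    for chunk in forward_chunks(chunks) {
        println!("inject chunk at offset {} with size {}", chunk.offset, chunk.size);
    }
}
```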

The following lists the most notable changes included in this series since
version 2:
- many bugfixes regarding incorrect archive encoding caused by wrong
  offset generation, adding additional sanity checks to rather fail on
  encoding than produce an incorrectly encoded archive
- different approach for deciding whether to reuse or reencode the
  entries. Previously, the entries were encoded when a cached
  payload size threshold was reached. Now, the padding introduced by
  reusable chunks is tracked, and the entries are reused only if the
  padding does not exceed the set threshold. This reduces the possible
  padding, at the cost of reencoding more entries. It also avoids
  reusing chunks which now have large padding holes because of
  moved/removed files contained within.
- added headers for metadata archive and payload file
- added documentation

An invocation of a backup run with these patches now looks like:
```bash
proxmox-backup-client backup <label>.pxar:<source-path> --change-detection-mode=metadata
```
During the first run, no reference index is available; the pxar archive
is nevertheless split into the two parts.
Subsequent backups use the pxar archive accessor and index files of the
previous run to perform file change detection.

As benchmarks, the Linux source code as well as the COCO dataset for
computer vision and pattern recognition can be used.
The benchmarks can be performed by running:
```bash
proxmox-backup-test-suite detection-mode-bench prepare --target /<path-to-bench-source-target>
proxmox-backup-test-suite detection-mode-bench run linux.pxar:/<path-to-bench-source-target>/linux
proxmox-backup-test-suite detection-mode-bench run coco.pxar:/<path-to-bench-source-target>/coco
```

The command invocations above assume the default repository and
credentials to be set as environment variables; they may alternatively
be passed as additional optional parameters.

pxar:

Christian Ebner (14):
  format/examples: add header type `PXAR_PAYLOAD_REF`
  decoder: add method to read payload references
  decoder: factor out skip part from skip_entry
  encoder: add optional output writer for file payloads
  encoder: move to stack based state tracking
  decoder/accessor: add optional payload input stream
  decoder: set payload input range when decoding via accessor
  encoder: add payload reference capability
  encoder: add payload position capability
  encoder: add payload advance capability
  encoder/format: finish payload stream with marker
  format: add payload stream start marker
  format/encoder/decoder: new pxar entry type `Version`
  format/encoder/decoder: new pxar entry type `Prelude`

 examples/apxar.rs            |   2 +-
 examples/mk-format-hashes.rs |  21 ++
 examples/pxarcmd.rs          |   7 +-
 src/accessor/aio.rs          |  10 +-
 src/accessor/mod.rs          | 116 +++++++-
 src/accessor/sync.rs         |   8 +-
 src/decoder/aio.rs           |  14 +-
 src/decoder/mod.rs           | 212 +++++++++++++--
 src/decoder/sync.rs          |  15 +-
 src/encoder/aio.rs           |  87 ++++--
 src/encoder/mod.rs           | 497 ++++++++++++++++++++++++++---------
 src/encoder/sync.rs          |  67 ++++-
 src/format/mod.rs            |  63 +++++
 src/lib.rs                   |   9 +
 tests/compat.rs              |   3 +-
 tests/simple/fs.rs           |   8 +-
 tests/simple/main.rs         |   8 +-
 17 files changed, 935 insertions(+), 212 deletions(-)

proxmox-backup:

Christian Ebner (51):
  client: pxar: switch to stack based encoder state
  client: backup: factor out extension from backup target
  client: pxar: combine writers into struct
  client: pxar: add optional pxar payload writer instance
  client: pxar: optionally split metadata and payload streams
  client: helper: add helpers for creating reader instances
  client: helper: add method for split archive name mapping
  client: restore: read payload from dedicated index
  tools: cover extension for split pxar archives
  restore: cover extension for split pxar archives
  client: mount: make split pxar archives mountable
  api: datastore: refactor getting local chunk reader
  api: datastore: attach optional payload chunk reader
  catalog: shell: make split pxar archives accessible
  www: cover metadata extension for pxar archives
  file restore: factor out getting pxar reader
  file restore: cover split metadata and payload archives
  file restore: show more error context when extraction fails
  pxar: add optional payload input for achive restore
  pxar: add more context to extraction error
  client: pxar: include payload offset in entry listing
  pxar: show padding in debug output on archive list
  datastore: dynamic index: add method to get digest
  client: pxar: helper for lookup of reusable dynamic entries
  upload stream: implement reused chunk injector
  client: chunk stream: add struct to hold injection state
  chunker: add method to reset chunker state
  client: streams: add channels for dynamic entry injection
  specs: add backup detection mode specification
  client: implement prepare reference method
  client: pxar: add method for metadata comparison
  pxar: caching: add look-ahead cache types
  fix #3174: client: pxar: enable caching and meta comparison
  client: backup writer: add injected chunk count to stats
  pxar: create: keep track of reused chunks and files
  pxar: create: show chunk injection stats debug output
  client: pxar: add helper to handle optional preludes
  client: pxar: opt encode cli exclude patterns as Prelude
  docs: file formats: describe split pxar archive file layout
  docs: add section describing change detection mode
  test-suite: add detection mode change benchmark
  test-suite: add bin to deb, add shell completions
  datastore: chunker: add Chunker trait
  datastore: chunker: implement chunker for payload stream
  client: chunk stream: switch payload stream chunker
  client: pxar: allow to restore prelude to optional path
  client: pxar: add archive creation with reference test
  client: tools: add helper to raise nofile rlimit
  client: pxar: set cache limit based on nofile rlimit
  chunker: tests: add regression tests for payload chunker
  chunk stream: tests: add regression tests for payload chunker

 Cargo.toml                                    |    1 +
 Makefile                                      |   13 +-
 debian/proxmox-backup-client.bash-completion  |    1 +
 debian/proxmox-backup-client.install          |    2 +
 debian/proxmox-backup-test-suite.bc           |    8 +
 docs/backup-client.rst                        |   41 +
 docs/file-formats.rst                         |   46 +
 docs/meta-format-overview.dot                 |   50 +
 examples/test_chunk_size.rs                   |    9 +-
 examples/test_chunk_speed.rs                  |    7 +-
 examples/test_chunk_speed2.rs                 |    2 +-
 pbs-client/src/backup_specification.rs        |   44 +
 pbs-client/src/backup_writer.rs               |  120 +-
 pbs-client/src/chunk_stream.rs                |  211 +++-
 pbs-client/src/inject_reused_chunks.rs        |  129 +++
 pbs-client/src/lib.rs                         |    3 +-
 pbs-client/src/pxar/create.rs                 | 1004 ++++++++++++++++-
 pbs-client/src/pxar/extract.rs                |   31 +-
 pbs-client/src/pxar/look_ahead_cache.rs       |   38 +
 pbs-client/src/pxar/mod.rs                    |    5 +-
 pbs-client/src/pxar/tools.rs                  |  123 +-
 pbs-client/src/pxar_backup_stream.rs          |   68 +-
 pbs-client/src/tools/mod.rs                   |   55 +-
 pbs-datastore/src/chunker.rs                  |  264 ++++-
 pbs-datastore/src/dynamic_index.rs            |   14 +-
 pbs-datastore/src/lib.rs                      |    2 +-
 pbs-pxar-fuse/src/lib.rs                      |    2 +-
 proxmox-backup-client/src/catalog.rs          |   30 +-
 proxmox-backup-client/src/helper.rs           |   96 ++
 proxmox-backup-client/src/main.rs             |  284 ++++-
 proxmox-backup-client/src/mount.rs            |   34 +-
 proxmox-backup-test-suite/Cargo.toml          |   18 +
 .../src/detection_mode_bench.rs               |  294 +++++
 proxmox-backup-test-suite/src/main.rs         |   17 +
 proxmox-file-restore/src/main.rs              |   80 +-
 .../src/proxmox_restore_daemon/api.rs         |   18 +-
 pxar-bin/src/main.rs                          |   61 +-
 src/api2/admin/datastore.rs                   |   47 +-
 src/api2/tape/restore.rs                      |   21 +-
 src/bin/proxmox_backup_debug/diff.rs          |    2 +-
 src/tape/file_formats/snapshot_archive.rs     |    9 +-
 tests/catar.rs                                |    5 +-
 tests/pxar/backup-client-pxar-data.mpxar      |  Bin 0 -> 15070 bytes
 tests/pxar/backup-client-pxar-data.ppxar.didx |  Bin 0 -> 8096 bytes
 tests/pxar/backup-client-pxar-expected.mpxar  |  Bin 0 -> 15086 bytes
 www/datastore/Content.js                      |    6 +-
 zsh-completions/_proxmox-backup-test-suite    |   13 +
 47 files changed, 2996 insertions(+), 332 deletions(-)
 create mode 100644 debian/proxmox-backup-test-suite.bc
 create mode 100644 docs/meta-format-overview.dot
 create mode 100644 pbs-client/src/inject_reused_chunks.rs
 create mode 100644 pbs-client/src/pxar/look_ahead_cache.rs
 create mode 100644 proxmox-backup-client/src/helper.rs
 create mode 100644 proxmox-backup-test-suite/Cargo.toml
 create mode 100644 proxmox-backup-test-suite/src/detection_mode_bench.rs
 create mode 100644 proxmox-backup-test-suite/src/main.rs
 create mode 100644 tests/pxar/backup-client-pxar-data.mpxar
 create mode 100644 tests/pxar/backup-client-pxar-data.ppxar.didx
 create mode 100644 tests/pxar/backup-client-pxar-expected.mpxar
 create mode 100644 zsh-completions/_proxmox-backup-test-suite

-- 
2.39.2



* [pbs-devel] [PATCH v5 pxar proxmox-backup 00/62] fix #3174: improve file-level backup
@ 2024-05-14 10:33 Christian Ebner
  2024-05-14 10:45 ` Christian Ebner
  2024-05-27 14:35 ` Christian Ebner
  0 siblings, 2 replies; 67+ messages in thread
From: Christian Ebner @ 2024-05-14 10:33 UTC (permalink / raw)
  To: pbs-devel

This series of patches implements a metadata-based file change
detection mechanism to speed up pxar file-level backup creation for
unchanged files.

The chosen approach is to split pxar archives on creation via the
proxmox-backup-client into two separate data and upload streams,
one exclusive for regular file payloads, the other one for the rest
of the pxar archive, which is mostly metadata.

On consecutive runs, the metadata archive of the previous backup run,
which is limited in size and therefore rapidly accessed, is used to
look up and compare the metadata for entries to encode.
This assumes that the connection speed to the Proxmox Backup Server is
sufficiently fast, allowing the download and caching of the chunks for
that index.

Changes to regular files are detected by comparing the file's complete
metadata object, including mtime, ACLs, etc. If no changes are detected,
the previous payload index is used to look up chunks to possibly reuse
in the payload stream of the new archive.
In order to reduce possible chunk fragmentation, the decision whether to
reuse or reencode a file payload is deferred until enough information
is gathered by adding entries to a look-ahead cache. If the padding
introduced by reusing chunks falls below a threshold, the entries are
referenced, the chunks are reused and injected into the pxar payload
upload stream; otherwise they are discarded and the files are encoded
regularly.
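
The metadata comparison mentioned above could look roughly like the
following sketch. The `EntryMetadata` struct and its fields are a
simplified, hypothetical stand-in for the pxar metadata object, and
`is_unchanged` is not a function from this series; the point is only
that any difference in any field marks the file as changed.

```rust
/// Simplified, hypothetical stand-in for the metadata stored per entry.
#[derive(Debug, Clone, PartialEq, Eq)]
struct EntryMetadata {
    mode: u32,
    uid: u32,
    gid: u32,
    size: u64,
    mtime_secs: i64,
    mtime_nanos: u32,
    acls: Vec<String>,
}

/// A file counts as unchanged only if an entry exists in the previous
/// metadata archive and every metadata field is identical.
fn is_unchanged(previous: Option<&EntryMetadata>, current: &EntryMetadata) -> bool {
    previous.map_or(false, |prev| prev == current)
}

fn main() {
    let previous = EntryMetadata {
        mode: 0o100644,
        uid: 1000,
        gid: 1000,
        size: 4096,
        mtime_secs: 1_715_000_000,
        mtime_nanos: 0,
        acls: Vec::new(),
    };

    // Identical metadata: the payload is a candidate for chunk reuse
    let same = previous.clone();
    println!("{}", is_unchanged(Some(&previous), &same)); // prints true

    // A newer mtime alone is enough to trigger reencoding
    let mut touched = previous.clone();
    touched.mtime_secs += 60;
    println!("{}", is_unchanged(Some(&previous), &touched)); // prints false

    // No entry in the previous archive: the file is new
    println!("{}", is_unchanged(None, &same)); // prints false
}
```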

Patches 13 and 14 are to be applied to the pxar repository only after
patch 50 in the series, for the patches to compile in a sequential
chain.

The following lists the most notable changes included in this series since
version 4:
- Increase the open file handle limit to the hard limit and adapt the
  lookahead cache size dynamically (thanks a lot to Thomas for pointing
  this out and providing the necessary background information). This
  helps with the reuse of multiple entries contained within the same
  chunk, which would otherwise exceed the padding threshold and
  therefore be reencoded instead.
- Fix payload chunker scan to only scan up until chunk pos in case a
  suggested boundary is chosen.
- Fix issue with decoder state not being set to the correct `InDirectory`
  after reading prelude and getting root directory entry.
- Fix issue with kept back chunk injection when the chunk follows a
  range discontinuity.
- Add regression test for pxar create with metadata archive and payload
  index reference.

The following lists the most notable changes included in this series since
version 3:
- Rework the whole reused chunk injection and accounting logic and use
  lockless async `mpsc::channel`s instead of `Arc<Mutex<VecDeque<..>>>`.
- Rework the lookahead caching logic to use payload ranges and check for
  possible range continuation instead of looking up the reusable dynamic
  entries immediately in case of a reusable entry chain. This also
  avoids edge cases not covered in the previous version of the patch series.
  The current version therefore tends to reencode small files more
  aggressively, since they might introduce additional unwanted padding.
- Correctly cover hardlinks in the reuse logic as well, avoiding
  reencoding of these entries.
- Add a dedicated chunker implementation for the payload data stream,
  allowing the archiver to suggest boundaries to the chunker to reduce
  padding for reused chunks.
- Add `change-detection-mode=data`, in order to allow creating split
  archives with fully reencoded payload data.
- Add payload input readers for pxar accessor type implementations
  where needed.
- Add a consistency check in the pxar encoder when dropping state or
  the encoder instance.
- Rename CliParams to the more opaque Prelude, since the pxar archive
  does not care about its contents and this might be extended to store
  other information about the archive as well.
- Add missing proxmox-file-restore support for split archives and fix
  restore of tar/zip archives via WebUI. This is handled by the same
  decoder logic, and needed an updated payload input content range to
  read the data from the correct location in the payload data archive.
- Additional refactoring to use the pxar reader helpers where possible.

The following lists the most notable changes included in this series since
version 2:
- many bugfixes regarding incorrect archive encoding caused by wrong
  offset generation, adding additional sanity checks to rather fail on
  encoding than produce an incorrectly encoded archive
- different approach for deciding whether to reuse or reencode the
  entries. Previously, the entries were encoded when a cached
  payload size threshold was reached. Now, the padding introduced by
  reusable chunks is tracked, and the entries are reused only if the
  padding does not exceed the set threshold. This reduces the possible
  padding, at the cost of reencoding more entries. It also avoids
  reusing chunks which now have large padding holes because of
  moved/removed files contained within.
- added headers for metadata archive and payload file
- added documentation

An invocation of a backup run with these patches now looks like:
```bash
proxmox-backup-client backup <label>.pxar:<source-path> --change-detection-mode=metadata
```
During the first run, no reference index is available; the pxar archive
is nevertheless split into the two parts.
Subsequent backups use the pxar archive accessor and index files of the
previous run to perform file change detection.

As benchmarks, the Linux source code as well as the COCO dataset for
computer vision and pattern recognition can be used.
The benchmarks can be performed by running:
```bash
proxmox-backup-test-suite detection-mode-bench prepare --target /<path-to-bench-source-target>
proxmox-backup-test-suite detection-mode-bench run linux.pxar:/<path-to-bench-source-target>/linux
proxmox-backup-test-suite detection-mode-bench run coco.pxar:/<path-to-bench-source-target>/coco
```

The command invocations above assume the default repository and
credentials to be set as environment variables; they may alternatively
be passed as additional optional parameters.

pxar:

Christian Ebner (14):
  format/examples: add header type `PXAR_PAYLOAD_REF`
  decoder: add method to read payload references
  decoder: factor out skip part from skip_entry
  encoder: add optional output writer for file payloads
  encoder: move to stack based state tracking
  decoder/accessor: add optional payload input stream
  decoder: set payload input range when decoding via accessor
  encoder: add payload reference capability
  encoder: add payload position capability
  encoder: add payload advance capability
  encoder/format: finish payload stream with marker
  format: add payload stream start marker
  format/encoder/decoder: new pxar entry type `Version`
  format/encoder/decoder: new pxar entry type `Prelude`

 examples/apxar.rs            |   2 +-
 examples/mk-format-hashes.rs |  21 ++
 examples/pxarcmd.rs          |   7 +-
 src/accessor/aio.rs          |  10 +-
 src/accessor/mod.rs          | 116 +++++++-
 src/accessor/sync.rs         |   8 +-
 src/decoder/aio.rs           |  14 +-
 src/decoder/mod.rs           | 212 +++++++++++++--
 src/decoder/sync.rs          |  15 +-
 src/encoder/aio.rs           |  87 ++++--
 src/encoder/mod.rs           | 497 ++++++++++++++++++++++++++---------
 src/encoder/sync.rs          |  67 ++++-
 src/format/mod.rs            |  63 +++++
 src/lib.rs                   |   9 +
 tests/compat.rs              |   3 +-
 tests/simple/fs.rs           |   8 +-
 tests/simple/main.rs         |   8 +-
 17 files changed, 935 insertions(+), 212 deletions(-)

proxmox-backup:

Christian Ebner (48):
  client: pxar: switch to stack based encoder state
  client: backup: factor out extension from backup target
  client: pxar: combine writers into struct
  client: pxar: add optional pxar payload writer instance
  client: pxar: optionally split metadata and payload streams
  client: helper: add helpers for creating reader instances
  client: helper: add method for split archive name mapping
  client: restore: read payload from dedicated index
  tools: cover extension for split pxar archives
  restore: cover extension for split pxar archives
  client: mount: make split pxar archives mountable
  api: datastore: refactor getting local chunk reader
  api: datastore: attach optional payload chunk reader
  catalog: shell: make split pxar archives accessible
  www: cover metadata extension for pxar archives
  file restore: factor out getting pxar reader
  file restore: cover split metadata and payload archives
  file restore: show more error context when extraction fails
  pxar: add optional payload input for archive restore
  pxar: add more context to extraction error
  client: pxar: include payload offset in entry listing
  pxar: show padding in debug output on archive list
  datastore: dynamic index: add method to get digest
  client: pxar: helper for lookup of reusable dynamic entries
  upload stream: implement reused chunk injector
  client: chunk stream: add struct to hold injection state
  client: streams: add channels for dynamic entry injection
  specs: add backup detection mode specification
  client: implement prepare reference method
  client: pxar: add method for metadata comparison
  pxar: caching: add look-ahead cache types
  fix #3174: client: pxar: enable caching and meta comparison
  client: backup writer: add injected chunk count to stats
  pxar: create: keep track of reused chunks and files
  pxar: create: show chunk injection stats debug output
  client: pxar: add helper to handle optional preludes
  client: pxar: opt encode cli exclude patterns as Prelude
  docs: file formats: describe split pxar archive file layout
  docs: add section describing change detection mode
  test-suite: add detection mode change benchmark
  test-suite: add bin to deb, add shell completions
  datastore: chunker: add Chunker trait
  datastore: chunker: implement chunker for payload stream
  client: chunk stream: switch payload stream chunker
  client: pxar: allow to restore prelude to optional path
  client: pxar: add archive creation with reference test
  client: tools: add helper to raise nofile rlimit
  client: pxar: set cache limit based on nofile rlimit

 Cargo.toml                                    |    1 +
 Makefile                                      |   13 +-
 debian/proxmox-backup-client.bash-completion  |    1 +
 debian/proxmox-backup-client.install          |    2 +
 debian/proxmox-backup-test-suite.bc           |    8 +
 docs/backup-client.rst                        |   41 +
 docs/file-formats.rst                         |   46 +
 docs/meta-format-overview.dot                 |   50 +
 examples/test_chunk_size.rs                   |    9 +-
 examples/test_chunk_speed.rs                  |    7 +-
 examples/test_chunk_speed2.rs                 |    2 +-
 pbs-client/src/backup_specification.rs        |   44 +
 pbs-client/src/backup_writer.rs               |  120 +-
 pbs-client/src/chunk_stream.rs                |  122 +-
 pbs-client/src/inject_reused_chunks.rs        |  129 +++
 pbs-client/src/lib.rs                         |    3 +-
 pbs-client/src/pxar/create.rs                 | 1004 ++++++++++++++++-
 pbs-client/src/pxar/extract.rs                |   31 +-
 pbs-client/src/pxar/look_ahead_cache.rs       |   38 +
 pbs-client/src/pxar/mod.rs                    |    5 +-
 pbs-client/src/pxar/tools.rs                  |  123 +-
 pbs-client/src/pxar_backup_stream.rs          |   68 +-
 pbs-client/src/tools/mod.rs                   |   55 +-
 pbs-datastore/src/chunker.rs                  |  161 ++-
 pbs-datastore/src/dynamic_index.rs            |   14 +-
 pbs-datastore/src/lib.rs                      |    2 +-
 pbs-pxar-fuse/src/lib.rs                      |    2 +-
 proxmox-backup-client/src/catalog.rs          |   30 +-
 proxmox-backup-client/src/helper.rs           |   96 ++
 proxmox-backup-client/src/main.rs             |  284 ++++-
 proxmox-backup-client/src/mount.rs            |   34 +-
 proxmox-backup-test-suite/Cargo.toml          |   18 +
 .../src/detection_mode_bench.rs               |  294 +++++
 proxmox-backup-test-suite/src/main.rs         |   17 +
 proxmox-file-restore/src/main.rs              |   80 +-
 .../src/proxmox_restore_daemon/api.rs         |   18 +-
 pxar-bin/src/main.rs                          |   61 +-
 src/api2/admin/datastore.rs                   |   47 +-
 src/api2/tape/restore.rs                      |   21 +-
 src/bin/proxmox_backup_debug/diff.rs          |    2 +-
 src/tape/file_formats/snapshot_archive.rs     |    9 +-
 tests/catar.rs                                |    5 +-
 tests/pxar/backup-client-pxar-data.mpxar      |  Bin 0 -> 15070 bytes
 tests/pxar/backup-client-pxar-data.ppxar.didx |  Bin 0 -> 8096 bytes
 tests/pxar/backup-client-pxar-expected.mpxar  |  Bin 0 -> 15086 bytes
 www/datastore/Content.js                      |    6 +-
 zsh-completions/_proxmox-backup-test-suite    |   13 +
 47 files changed, 2802 insertions(+), 334 deletions(-)
 create mode 100644 debian/proxmox-backup-test-suite.bc
 create mode 100644 docs/meta-format-overview.dot
 create mode 100644 pbs-client/src/inject_reused_chunks.rs
 create mode 100644 pbs-client/src/pxar/look_ahead_cache.rs
 create mode 100644 proxmox-backup-client/src/helper.rs
 create mode 100644 proxmox-backup-test-suite/Cargo.toml
 create mode 100644 proxmox-backup-test-suite/src/detection_mode_bench.rs
 create mode 100644 proxmox-backup-test-suite/src/main.rs
 create mode 100644 tests/pxar/backup-client-pxar-data.mpxar
 create mode 100644 tests/pxar/backup-client-pxar-data.ppxar.didx
 create mode 100644 tests/pxar/backup-client-pxar-expected.mpxar
 create mode 100644 zsh-completions/_proxmox-backup-test-suite

-- 
2.39.2






end of thread, other threads:[~2024-05-27 14:35 UTC | newest]

Thread overview: 67+ messages
2024-05-07 15:51 [pbs-devel] [PATCH v5 pxar proxmox-backup 00/62] fix #3174: improve file-level backup Christian Ebner
2024-05-07 15:51 ` [pbs-devel] [PATCH v5 pxar 01/62] format/examples: add header type `PXAR_PAYLOAD_REF` Christian Ebner
2024-05-07 15:51 ` [pbs-devel] [PATCH v5 pxar 02/62] decoder: add method to read payload references Christian Ebner
2024-05-07 15:51 ` [pbs-devel] [PATCH v5 pxar 03/62] decoder: factor out skip part from skip_entry Christian Ebner
2024-05-07 15:51 ` [pbs-devel] [PATCH v5 pxar 04/62] encoder: add optional output writer for file payloads Christian Ebner
2024-05-07 15:51 ` [pbs-devel] [PATCH v5 pxar 05/62] encoder: move to stack based state tracking Christian Ebner
2024-05-07 15:51 ` [pbs-devel] [PATCH v5 pxar 06/62] decoder/accessor: add optional payload input stream Christian Ebner
2024-05-07 15:51 ` [pbs-devel] [PATCH v5 pxar 07/62] decoder: set payload input range when decoding via accessor Christian Ebner
2024-05-07 15:51 ` [pbs-devel] [PATCH v5 pxar 08/62] encoder: add payload reference capability Christian Ebner
2024-05-07 15:51 ` [pbs-devel] [PATCH v5 pxar 09/62] encoder: add payload position capability Christian Ebner
2024-05-07 15:51 ` [pbs-devel] [PATCH v5 pxar 10/62] encoder: add payload advance capability Christian Ebner
2024-05-07 15:51 ` [pbs-devel] [PATCH v5 pxar 11/62] encoder/format: finish payload stream with marker Christian Ebner
2024-05-07 15:51 ` [pbs-devel] [PATCH v5 pxar 12/62] format: add payload stream start marker Christian Ebner
2024-05-07 15:51 ` [pbs-devel] [PATCH v5 pxar 13/62] format/encoder/decoder: new pxar entry type `Version` Christian Ebner
2024-05-07 15:51 ` [pbs-devel] [PATCH v5 pxar 14/62] format/encoder/decoder: new pxar entry type `Prelude` Christian Ebner
2024-05-07 15:51 ` [pbs-devel] [PATCH v5 proxmox-backup 15/62] client: pxar: switch to stack based encoder state Christian Ebner
2024-05-07 15:51 ` [pbs-devel] [PATCH v5 proxmox-backup 16/62] client: backup: factor out extension from backup target Christian Ebner
2024-05-07 15:51 ` [pbs-devel] [PATCH v5 proxmox-backup 17/62] client: pxar: combine writers into struct Christian Ebner
2024-05-07 15:52 ` [pbs-devel] [PATCH v5 proxmox-backup 18/62] client: pxar: add optional pxar payload writer instance Christian Ebner
2024-05-07 15:52 ` [pbs-devel] [PATCH v5 proxmox-backup 19/62] client: pxar: optionally split metadata and payload streams Christian Ebner
2024-05-07 15:52 ` [pbs-devel] [PATCH v5 proxmox-backup 20/62] client: helper: add helpers for creating reader instances Christian Ebner
2024-05-07 15:52 ` [pbs-devel] [PATCH v5 proxmox-backup 21/62] client: helper: add method for split archive name mapping Christian Ebner
2024-05-07 15:52 ` [pbs-devel] [PATCH v5 proxmox-backup 22/62] client: restore: read payload from dedicated index Christian Ebner
2024-05-07 15:52 ` [pbs-devel] [PATCH v5 proxmox-backup 23/62] tools: cover extension for split pxar archives Christian Ebner
2024-05-07 15:52 ` [pbs-devel] [PATCH v5 proxmox-backup 24/62] restore: " Christian Ebner
2024-05-07 15:52 ` [pbs-devel] [PATCH v5 proxmox-backup 25/62] client: mount: make split pxar archives mountable Christian Ebner
2024-05-07 15:52 ` [pbs-devel] [PATCH v5 proxmox-backup 26/62] api: datastore: refactor getting local chunk reader Christian Ebner
2024-05-07 15:52 ` [pbs-devel] [PATCH v5 proxmox-backup 27/62] api: datastore: attach optional payload " Christian Ebner
2024-05-07 15:52 ` [pbs-devel] [PATCH v5 proxmox-backup 28/62] catalog: shell: make split pxar archives accessible Christian Ebner
2024-05-07 15:52 ` [pbs-devel] [PATCH v5 proxmox-backup 29/62] www: cover metadata extension for pxar archives Christian Ebner
2024-05-07 15:52 ` [pbs-devel] [PATCH v5 proxmox-backup 30/62] file restore: factor out getting pxar reader Christian Ebner
2024-05-07 15:52 ` [pbs-devel] [PATCH v5 proxmox-backup 31/62] file restore: cover split metadata and payload archives Christian Ebner
2024-05-07 15:52 ` [pbs-devel] [PATCH v5 proxmox-backup 32/62] file restore: show more error context when extraction fails Christian Ebner
2024-05-07 15:52 ` [pbs-devel] [PATCH v5 proxmox-backup 33/62] pxar: add optional payload input for archive restore Christian Ebner
2024-05-07 15:52 ` [pbs-devel] [PATCH v5 proxmox-backup 34/62] pxar: add more context to extraction error Christian Ebner
2024-05-07 15:52 ` [pbs-devel] [PATCH v5 proxmox-backup 35/62] client: pxar: include payload offset in entry listing Christian Ebner
2024-05-07 15:52 ` [pbs-devel] [PATCH v5 proxmox-backup 36/62] pxar: show padding in debug output on archive list Christian Ebner
2024-05-07 15:52 ` [pbs-devel] [PATCH v5 proxmox-backup 37/62] datastore: dynamic index: add method to get digest Christian Ebner
2024-05-07 15:52 ` [pbs-devel] [PATCH v5 proxmox-backup 38/62] client: pxar: helper for lookup of reusable dynamic entries Christian Ebner
2024-05-07 15:52 ` [pbs-devel] [PATCH v5 proxmox-backup 39/62] upload stream: implement reused chunk injector Christian Ebner
2024-05-07 15:52 ` [pbs-devel] [PATCH v5 proxmox-backup 40/62] client: chunk stream: add struct to hold injection state Christian Ebner
2024-05-07 15:52 ` [pbs-devel] [PATCH v5 proxmox-backup 41/62] client: streams: add channels for dynamic entry injection Christian Ebner
2024-05-07 15:52 ` [pbs-devel] [PATCH v5 proxmox-backup 42/62] specs: add backup detection mode specification Christian Ebner
2024-05-07 15:52 ` [pbs-devel] [PATCH v5 proxmox-backup 43/62] client: implement prepare reference method Christian Ebner
2024-05-07 15:52 ` [pbs-devel] [PATCH v5 proxmox-backup 44/62] client: pxar: add method for metadata comparison Christian Ebner
2024-05-07 15:52 ` [pbs-devel] [PATCH v5 proxmox-backup 45/62] pxar: caching: add look-ahead cache types Christian Ebner
2024-05-07 15:52 ` [pbs-devel] [PATCH v5 proxmox-backup 46/62] fix #3174: client: pxar: enable caching and meta comparison Christian Ebner
2024-05-07 15:52 ` [pbs-devel] [PATCH v5 proxmox-backup 47/62] client: backup writer: add injected chunk count to stats Christian Ebner
2024-05-07 15:52 ` [pbs-devel] [PATCH v5 proxmox-backup 48/62] pxar: create: keep track of reused chunks and files Christian Ebner
2024-05-07 15:52 ` [pbs-devel] [PATCH v5 proxmox-backup 49/62] pxar: create: show chunk injection stats debug output Christian Ebner
2024-05-07 15:52 ` [pbs-devel] [PATCH v5 proxmox-backup 50/62] client: pxar: add helper to handle optional preludes Christian Ebner
2024-05-07 15:52 ` [pbs-devel] [PATCH v5 proxmox-backup 51/62] client: pxar: opt encode cli exclude patterns as Prelude Christian Ebner
2024-05-07 15:52 ` [pbs-devel] [PATCH v5 proxmox-backup 52/62] docs: file formats: describe split pxar archive file layout Christian Ebner
2024-05-07 15:52 ` [pbs-devel] [PATCH v5 proxmox-backup 53/62] docs: add section describing change detection mode Christian Ebner
2024-05-07 15:52 ` [pbs-devel] [PATCH v5 proxmox-backup 54/62] test-suite: add detection mode change benchmark Christian Ebner
2024-05-07 15:52 ` [pbs-devel] [PATCH v5 proxmox-backup 55/62] test-suite: add bin to deb, add shell completions Christian Ebner
2024-05-07 15:52 ` [pbs-devel] [PATCH v5 proxmox-backup 56/62] datastore: chunker: add Chunker trait Christian Ebner
2024-05-07 15:52 ` [pbs-devel] [PATCH v5 proxmox-backup 57/62] datastore: chunker: implement chunker for payload stream Christian Ebner
2024-05-07 15:52 ` [pbs-devel] [PATCH v5 proxmox-backup 58/62] client: chunk stream: switch payload stream chunker Christian Ebner
2024-05-07 15:52 ` [pbs-devel] [PATCH v5 proxmox-backup 59/62] client: pxar: allow to restore prelude to optional path Christian Ebner
2024-05-07 15:52 ` [pbs-devel] [PATCH v5 proxmox-backup 60/62] client: pxar: add archive creation with reference test Christian Ebner
2024-05-07 15:52 ` [pbs-devel] [PATCH v5 proxmox-backup 61/62] client: tools: add helper to raise nofile rlimit Christian Ebner
2024-05-07 15:52 ` [pbs-devel] [PATCH v5 proxmox-backup 62/62] client: pxar: set cache limit based on " Christian Ebner
2024-05-14 10:52 ` [pbs-devel] [PATCH v5 pxar proxmox-backup 00/62] fix #3174: improve file-level backup Christian Ebner
2024-05-14 10:33 Christian Ebner
2024-05-14 10:45 ` Christian Ebner
2024-05-27 14:35 ` Christian Ebner
