public inbox for pbs-devel@lists.proxmox.com
* [pbs-devel] [PATCH-SERIES v5 pxar proxmox-backup proxmox-widget-toolkit 00/28] fix #3174: improve file-level backup
@ 2023-11-15 15:47 Christian Ebner
  2023-11-15 15:47 ` [pbs-devel] [PATCH v5 pxar 1/28] fix #3174: decoder: factor out skip_bytes from skip_entry Christian Ebner
                   ` (27 more replies)
  0 siblings, 28 replies; 29+ messages in thread
From: Christian Ebner @ 2023-11-15 15:47 UTC (permalink / raw)
  To: pbs-devel

Changes to the patch series since version 4 are based on the feedback
obtained via internal communication channels. Many thanks to Thomas,
Fabian, Wolfgang and Dominik for their continuous feedback up until
now.

This series of patches implements a metadata-based file change
detection mechanism to improve pxar file-level backup creation speed
for unchanged files.

The chosen approach is to skip encoding of regular file payloads whose
metadata (currently ctime and size) did not change compared to a
previous backup run. Instead of re-encoding the files, a reference to a
newly introduced appendix section of the pxar archive is written. The
appendix section is created as a concatenation of indexed chunks from
the previous backup run and thereby contains the sequential file
payload at a calculated offset with respect to the starting point of
the appendix section.
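In essence, for each regular file the archiver performs a check along
the following lines (a minimal sketch with hypothetical type and helper
names, not the actual implementation):

```rust
/// Sketch: the reference catalog fields relevant for change detection
/// (hypothetical struct; see the catalog format version 2 patches).
struct ReferenceEntry {
    ctime: i64, // change time recorded in the reference catalog
    size: u64,  // file size recorded in the reference catalog
}

/// Reuse the payload only if ctime and size are unchanged compared to
/// the entry found in the reference catalog of the previous run.
fn can_reuse(prev: Option<&ReferenceEntry>, ctime: i64, size: u64) -> bool {
    matches!(prev, Some(e) if e.ctime == ctime && e.size == size)
}
```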

Metadata comparison and calculation of the chunks to be indexed for the
appendix section are performed using the catalog of a previous backup
as reference. In order to be able to calculate the offsets, an updated
catalog file format version 2 is introduced which extends the previous
version by including the file offset with respect to the pxar archive
byte stream, as well as the file's ctime. This makes it possible to
find the required chunk indexes and the start padding within the
concatenated chunks. The catalog reader remains backwards compatible
with catalog file format version 1.
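The offset calculation boils down to locating the chunk covering a
file's offset within the previous archive's byte stream; a minimal
sketch (hypothetical types, the real code operates on dynamic index
entries):

```rust
/// Byte range a chunk covers within the archive stream (hypothetical).
struct ChunkRange {
    start: u64,
    end: u64,
}

/// Find the index of the chunk covering `archive_offset` and the start
/// padding of the file payload within that chunk.
fn find_chunk(chunks: &[ChunkRange], archive_offset: u64) -> Option<(usize, u64)> {
    let idx = chunks
        .iter()
        .position(|c| c.start <= archive_offset && archive_offset < c.end)?;
    Some((idx, archive_offset - chunks[idx].start))
}
```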

During encoding, the chunks needed for the appendix section are
injected into the backup upload stream after forcing a chunk boundary
once regular pxar encoding is finished. Finally, pxar archives
containing an appendix section are marked as such by appending a final
pxar goodbye lookup table containing only the offset to the appendix
section start and the total size of that section, which is needed for
random access, e.g. to mount the archive via the FUSE filesystem
implementation.
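Schematically, the tail marker consists of three little-endian u64
values wrapped in a final goodbye entry (see patch 7; the sketch below
assumes the appendix start offset and total size are already known):

```rust
// Sketch: the tail appended to archives with an appendix section, with
// the same size as a regular goodbye tail marker item.
let mut tail = Vec::new();
tail.extend_from_slice(&format::PXAR_APPENDIX_TAIL.to_le_bytes()); // marker
tail.extend_from_slice(&appendix_start_offset.to_le_bytes()); // section start
tail.extend_from_slice(&appendix_total_size.to_le_bytes()); // section size
```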

The following lists the most notable changes included in this series
since version 4:

- fix an issue with premature injection queue chunk insertion on initialization
- fix an issue with the decoder state not being correctly set at the start of
  the appendix section, leading to decoding errors in special cases.
- avoid double injection of chunks in cases where the first chunk of
  the list to insert is already present but the subsequent ones are not
- refactoring and renaming of the Encoder's `bytes_len` to `encoded_size`

The following lists the most notable changes included in this series
since version 3:

- count appendix chunks as reused chunks, as they are not re-encoded and
  re-uploaded.
- add a heuristic to reduce chunk fragmentation for appendix chunks for
  multiple consecutive backup runs with metadata based file change detection.
- refactor the appendix list generation during pxar encoding in the archiver.
- switch from Vec to BTreeMap for restoring the appendix entries, so
  entries are inserted in sorted order based on their offset, making it
  unnecessary to sort afterwards.
- fix an issue in the chunk injection code which led to data corruption
  in some edge cases; add additional checks as fortification.

The following lists the most notable changes included in this series
since version 2:

- avoid re-indexing the same chunks multiple times in the appendix
  section by looking them up in the already present appendix chunks
  list and calculating the appendix reference offset accordingly. This
  now requires sorting entries by their appendix start offset for
  sequential restore.
- reduce appendix reference chunk fragmentation by increasing the file
  size threshold to 1k.
- fix the WebUI's file browser and single file restore, broken in the
  previous patch series.
- fix the download of the previous catalog and/or dynamic index when
  either the backup group was empty or no archive with the same name
  was present in the backup

The following lists the most notable changes included in this series
since version 1:

- fixes the missing chunk issue by refactoring the logic to avoid
  re-appending the same chunks multiple times if they are referenced by
  multiple consecutive files.
- fixes a performance issue with catalog lookup being the bottleneck
  for cases with directories with many entries, resulting in the
  metadata based file change detection performing worse than the regular
  mode.
- fixes the creation of multi-archive backups; all of the archives use
  the same reference catalog.
- the catalog file format is pushed to version 2, including the needed
  archive offsets as well as ctime for file change detection.
- the catalog is fully backward compatible with catalog file format
  version 1, so both versions can be decoded by the reader. However,
  the new version of the catalog file format will be used for all new
  backups.
- it is now possible to perform multiple consecutive runs of the backup
  with metadata based file change detection; there is no more need to
  perform a regular run beforehand.
- change from `incremental` flag to enum based `BackupDetectionMode`
  parameter for command invocations.
- includes a new `proxmox-backup-test-suite` binary to create and run
  benchmarks to compare the performance of the different detection modes.

An invocation of a backup run with these patches now is:
```bash
proxmox-backup-client backup <label>.pxar:<source-path> --change-detection-mode=metadata
```
During the first run, no reference catalog can be used, so the backup
will be a regular run. Subsequent backups will, however, utilize the
catalog and index files of the previous run to perform file change
detection.

As benchmarks, the Linux source code as well as the COCO dataset for
computer vision and pattern recognition can be used.
The benchmarks can be performed by running:
```bash
proxmox-backup-test-suite detection-mode-bench prepare --target /<path-to-bench-source-target>
proxmox-backup-test-suite detection-mode-bench run linux.pxar:/<path-to-bench-source-target>/linux
proxmox-backup-test-suite detection-mode-bench run coco.pxar:/<path-to-bench-source-target>/coco
```

The above command invocations assume the default repository and
credentials to be set as environment variables; they may, however, be
passed as additional optional parameters instead.
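For example (values are placeholders):

```bash
export PBS_REPOSITORY='backup-user@pbs@localhost:datastore'
export PBS_PASSWORD='...'
```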

Benchmark runs using these test data show a significant improvement in
the time needed for the backups:

For the linux source code backup:
    Completed benchmark with 5 runs for each tested mode.

    Completed regular backup with:
    Total runtime: 46.40 s
    Average: 9.28 ± 0.04 s
    Min: 9.25 s
    Max: 9.34 s

    Completed metadata detection mode backup with:
    Total runtime: 9.07 s
    Average: 1.81 ± 0.09 s
    Min: 1.70 s
    Max: 1.92 s

    Differences (metadata based - regular):
    Delta total runtime: -37.34 s (-80.46 %)
    Delta average: -7.47 ± 0.10 s (-80.46 %)
    Delta min: -7.55 s (-81.63 %)
    Delta max: -7.43 s (-79.49 %)

For the coco dataset backup:
    Completed benchmark with 5 runs for each tested mode.

    Completed regular backup with:
    Total runtime: 586.37 s
    Average: 117.27 ± 5.26 s
    Min: 108.51 s
    Max: 121.13 s

    Completed metadata detection mode backup with:
    Total runtime: 124.33 s
    Average: 24.87 ± 1.00 s
    Min: 23.71 s
    Max: 26.19 s

    Differences (metadata based - regular):
    Delta total runtime: -462.04 s (-78.80 %)
    Delta average: -92.41 ± 5.35 s (-78.80 %)
    Delta min: -84.79 s (-78.15 %)
    Delta max: -94.94 s (-78.38 %)

The above tests were performed inside a test VM backing up to a local
datastore.

pxar:

Christian Ebner (8):
  fix #3174: decoder: factor out skip_bytes from skip_entry
  fix #3174: decoder: impl skip_bytes for sync dec
  fix #3174: encoder: calc filename + metadata byte size
  fix #3174: enc/dec: impl PXAR_APPENDIX_REF entrytype
  fix #3174: enc/dec: impl PXAR_APPENDIX entrytype
  fix #3174: encoder: helper to add to encoder position
  fix #3174: enc/dec: impl PXAR_APPENDIX_TAIL entrytype
  fix #3174: enc/dec: introduce pxar format version 2

 examples/mk-format-hashes.rs |  16 +++
 examples/pxarcmd.rs          |   4 +-
 src/accessor/mod.rs          | 132 +++++++++++++++++++++----
 src/decoder/mod.rs           |  83 ++++++++++++++--
 src/decoder/sync.rs          |   5 +
 src/encoder/aio.rs           |  60 ++++++++++--
 src/encoder/mod.rs           | 183 ++++++++++++++++++++++++++++++++++-
 src/encoder/sync.rs          |  54 +++++++++--
 src/format/mod.rs            |  29 +++++-
 src/lib.rs                   |  10 ++
 10 files changed, 526 insertions(+), 50 deletions(-)

proxmox-backup:

Christian Ebner (19):
  fix #3174: index: add fn index list from start/end-offsets
  fix #3174: index: add fn digest for DynamicEntry
  fix #3174: api: double catalog upload size
  fix #3174: catalog: introduce extended format v2
  fix #3174: archiver/extractor: impl appendix ref
  fix #3174: catalog: add specialized Archive entry
  fix #3174: extractor: impl seq restore from appendix
  fix #3174: archiver: store ref to previous backup
  fix #3174: upload stream: impl reused chunk injector
  fix #3174: chunker: add forced boundaries
  fix #3174: backup writer: inject queued chunk in upload stream
  fix #3174: archiver: reuse files with unchanged metadata
  fix #3174: specs: add backup detection mode specification
  fix #3174: client: Add detection mode to backup creation
  test-suite: add detection mode change benchmark
  test-suite: Add bin to deb, add shell completions
  catalog: fetch offset and size for files and refs
  pxar: add heuristic to reduce reused chunk fragmentation
  catalog: use format version 2 conditionally

 Cargo.toml                                    |   1 +
 Makefile                                      |  13 +-
 debian/proxmox-backup-client.bash-completion  |   1 +
 debian/proxmox-backup-client.install          |   2 +
 debian/proxmox-backup-test-suite.bc           |   8 +
 examples/test_chunk_speed2.rs                 |   9 +-
 pbs-client/src/backup_specification.rs        |  30 +
 pbs-client/src/backup_writer.rs               |  89 ++-
 pbs-client/src/catalog_shell.rs               |   3 +-
 pbs-client/src/chunk_stream.rs                |  42 +-
 pbs-client/src/inject_reused_chunks.rs        | 153 ++++
 pbs-client/src/lib.rs                         |   1 +
 pbs-client/src/pxar/create.rs                 | 401 +++++++++-
 pbs-client/src/pxar/extract.rs                | 149 +++-
 pbs-client/src/pxar/mod.rs                    |   2 +-
 pbs-client/src/pxar/tools.rs                  |   9 +
 pbs-client/src/pxar_backup_stream.rs          |   8 +-
 pbs-datastore/src/catalog.rs                  | 693 ++++++++++++++++--
 pbs-datastore/src/dynamic_index.rs            |  38 +
 pbs-datastore/src/file_formats.rs             |   3 +
 proxmox-backup-client/src/main.rs             | 190 ++++-
 proxmox-backup-test-suite/Cargo.toml          |  18 +
 .../src/detection_mode_bench.rs               | 291 ++++++++
 proxmox-backup-test-suite/src/main.rs         |  17 +
 .../src/proxmox_restore_daemon/api.rs         |  21 +-
 pxar-bin/src/main.rs                          |  23 +-
 src/api2/backup/upload_chunk.rs               |   4 +-
 src/tape/file_formats/snapshot_archive.rs     |   9 +-
 tests/catar.rs                                |   3 +
 zsh-completions/_proxmox-backup-test-suite    |  13 +
 30 files changed, 2059 insertions(+), 185 deletions(-)
 create mode 100644 debian/proxmox-backup-test-suite.bc
 create mode 100644 pbs-client/src/inject_reused_chunks.rs
 create mode 100644 proxmox-backup-test-suite/Cargo.toml
 create mode 100644 proxmox-backup-test-suite/src/detection_mode_bench.rs
 create mode 100644 proxmox-backup-test-suite/src/main.rs
 create mode 100644 zsh-completions/_proxmox-backup-test-suite

proxmox-widget-toolkit:

Christian Ebner (1):
  file-browser: support pxar archive and fileref types

 src/Schema.js             |  2 ++
 src/window/FileBrowser.js | 16 ++++++++++++----
 2 files changed, 14 insertions(+), 4 deletions(-)

-- 
2.39.2






* [pbs-devel] [PATCH v5 pxar 1/28] fix #3174: decoder: factor out skip_bytes from skip_entry
  2023-11-15 15:47 [pbs-devel] [PATCH-SERIES v5 pxar proxmox-backup proxmox-widget-toolkit 00/28] fix #3174: improve file-level backup Christian Ebner
@ 2023-11-15 15:47 ` Christian Ebner
  2023-11-15 15:47 ` [pbs-devel] [PATCH v5 pxar 2/28] fix #3174: decoder: impl skip_bytes for sync dec Christian Ebner
                   ` (26 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: Christian Ebner @ 2023-11-15 15:47 UTC (permalink / raw)
  To: pbs-devel

Allows skipping bytes independently of the current entry's header, as
previously only implemented by skip_entry.

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
Changes since v4:
- no changes

Changes since v3:
- no changes

Changes since v2:
- no changes

Changes since v1:
- use `u64` instead of `usize` as type for len parameter of `skip_bytes`

 src/decoder/mod.rs | 19 ++++++++++++-------
 1 file changed, 12 insertions(+), 7 deletions(-)

diff --git a/src/decoder/mod.rs b/src/decoder/mod.rs
index d1fb911..a094324 100644
--- a/src/decoder/mod.rs
+++ b/src/decoder/mod.rs
@@ -562,20 +562,25 @@ impl<I: SeqRead> DecoderImpl<I> {
     // These utilize additional information and hence are not part of the `dyn SeqRead` impl.
     //
 
-    async fn skip_entry(&mut self, offset: u64) -> io::Result<()> {
-        let mut len = self.current_header.content_size() - offset;
+    async fn skip_bytes(&mut self, len: u64) -> io::Result<()> {
+        let mut remaining = len;
         let scratch = scratch_buffer();
-        while len >= (scratch.len() as u64) {
+        while remaining >= (scratch.len() as u64) {
             seq_read_exact(&mut self.input, scratch).await?;
-            len -= scratch.len() as u64;
+            remaining -= scratch.len() as u64;
         }
-        let len = len as usize;
-        if len > 0 {
-            seq_read_exact(&mut self.input, &mut scratch[..len]).await?;
+        let remaining = remaining as usize;
+        if remaining > 0 {
+            seq_read_exact(&mut self.input, &mut scratch[..remaining]).await?;
         }
         Ok(())
     }
 
+    async fn skip_entry(&mut self, offset: u64) -> io::Result<()> {
+        let len = self.current_header.content_size() - offset;
+        self.skip_bytes(len).await
+    }
+
     async fn read_entry_as_bytes(&mut self) -> io::Result<Vec<u8>> {
         let size = usize::try_from(self.current_header.content_size()).map_err(io_err_other)?;
         let data = seq_read_exact_data(&mut self.input, size).await?;
-- 
2.39.2






* [pbs-devel] [PATCH v5 pxar 2/28] fix #3174: decoder: impl skip_bytes for sync dec
  2023-11-15 15:47 [pbs-devel] [PATCH-SERIES v5 pxar proxmox-backup proxmox-widget-toolkit 00/28] fix #3174: improve file-level backup Christian Ebner
  2023-11-15 15:47 ` [pbs-devel] [PATCH v5 pxar 1/28] fix #3174: decoder: factor out skip_bytes from skip_entry Christian Ebner
@ 2023-11-15 15:47 ` Christian Ebner
  2023-11-15 15:47 ` [pbs-devel] [PATCH v5 pxar 3/28] fix #3174: encoder: calc filename + metadata byte size Christian Ebner
                   ` (25 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: Christian Ebner @ 2023-11-15 15:47 UTC (permalink / raw)
  To: pbs-devel

Allows skipping over a given number of bytes during pxar archive
decoding. This is used to skip over unneeded contents in a pxar
appendix.
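A minimal usage sketch for the new public method (assuming a plain
file as input):

```rust
use pxar::decoder::sync::Decoder;

fn main() -> std::io::Result<()> {
    let file = std::fs::File::open("archive.pxar")?;
    let mut decoder = Decoder::from_std(file)?;
    // Skip 512 bytes in the decoder's input stream.
    decoder.skip_bytes(512)?;
    Ok(())
}
```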

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
Changes since v4:
- no changes

Changes since v3:
- no changes

Changes since v2:
- no changes

Changes since v1:
- remove type cast to `usize`, not needed anymore

 src/decoder/sync.rs | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/src/decoder/sync.rs b/src/decoder/sync.rs
index 5597a03..bdfe2f4 100644
--- a/src/decoder/sync.rs
+++ b/src/decoder/sync.rs
@@ -81,6 +81,11 @@ impl<T: SeqRead> Decoder<T> {
     pub fn enable_goodbye_entries(&mut self, on: bool) {
         self.inner.with_goodbye_tables = on;
     }
+
+    /// Skip bytes in decoder stream.
+    pub fn skip_bytes(&mut self, len: u64) -> io::Result<()> {
+        poll_result_once(self.inner.skip_bytes(len))
+    }
 }
 
 impl<T: SeqRead> Iterator for Decoder<T> {
-- 
2.39.2






* [pbs-devel] [PATCH v5 pxar 3/28] fix #3174: encoder: calc filename + metadata byte size
  2023-11-15 15:47 [pbs-devel] [PATCH-SERIES v5 pxar proxmox-backup proxmox-widget-toolkit 00/28] fix #3174: improve file-level backup Christian Ebner
  2023-11-15 15:47 ` [pbs-devel] [PATCH v5 pxar 1/28] fix #3174: decoder: factor out skip_bytes from skip_entry Christian Ebner
  2023-11-15 15:47 ` [pbs-devel] [PATCH v5 pxar 2/28] fix #3174: decoder: impl skip_bytes for sync dec Christian Ebner
@ 2023-11-15 15:47 ` Christian Ebner
  2023-11-15 15:47 ` [pbs-devel] [PATCH v5 pxar 4/28] fix #3174: enc/dec: impl PXAR_APPENDIX_REF entrytype Christian Ebner
                   ` (24 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: Christian Ebner @ 2023-11-15 15:47 UTC (permalink / raw)
  To: pbs-devel

Introduce SeqSink and implement SeqWrite for it in order to create an
encoder implementation which, instead of writing data to a stream,
consumes the encoded stream and returns the number of consumed bytes.

Based on this, implement a helper function `encoded_size` which returns
the byte size of the filename entry and metadata entry as encoded by
the archive.
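A minimal usage sketch for the sync wrapper added below:

```rust
use std::ffi::CString;

fn main() -> std::io::Result<()> {
    let filename = CString::new("example.txt").expect("no interior NUL");
    // Byte size the filename + metadata pair would occupy in the archive.
    let len = pxar::encoder::sync::encoded_size(&filename, &pxar::Metadata::default())?;
    println!("encoded size: {len} bytes");
    Ok(())
}
```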

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
Changes since v4:
- Refactor and rename `bytes_len` to `encoded_size`

Changes since v3:
- no changes

Changes since v2:
- no changes

Changes since v1:
- Instead of calculating the metadata size based on the known encoding
  sizes, implement an Encoder instance which counts the encoded bytes.

 src/encoder/mod.rs  | 35 +++++++++++++++++++++++++++++++++++
 src/encoder/sync.rs |  5 +++++
 2 files changed, 40 insertions(+)

diff --git a/src/encoder/mod.rs b/src/encoder/mod.rs
index 0d342ec..860c21f 100644
--- a/src/encoder/mod.rs
+++ b/src/encoder/mod.rs
@@ -85,6 +85,24 @@ where
     }
 }
 
+#[derive(Default)]
+/// Sink to consume sequential byte stream
+pub struct SeqSink;
+
+impl SeqWrite for SeqSink {
+    fn poll_seq_write(
+        self: Pin<&mut Self>,
+        _cx: &mut Context,
+        buf: &[u8],
+    ) -> Poll<io::Result<usize>> {
+        Poll::Ready(Ok(buf.len()))
+    }
+
+    fn poll_flush(self: Pin<&mut Self>, _cx: &mut Context) -> Poll<io::Result<()>> {
+        Poll::Ready(Ok(()))
+    }
+}
+
 /// awaitable verison of `poll_seq_write`.
 async fn seq_write<T: SeqWrite + ?Sized>(
     output: &mut T,
@@ -183,6 +201,23 @@ where
     seq_write_pxar_entry(output, htype, buf, position).await
 }
 
+/// Calculate the encoded byte len of filename and metadata struct
+pub async fn encoded_size(filename: &std::ffi::CStr, metadata: &Metadata) -> io::Result<u64> {
+    let mut this = EncoderImpl {
+        output: EncoderOutput::Owned(SeqSink::default()),
+        state: EncoderState::default(),
+        parent: None,
+        finished: false,
+        file_copy_buffer: Arc::new(Mutex::new(unsafe {
+            crate::util::vec_new_uninitialized(1024 * 1024)
+        })),
+    };
+
+    this.start_file_do(Some(metadata), filename.to_bytes())
+        .await?;
+    Ok(this.position())
+}
+
 /// Error conditions caused by wrong usage of this crate.
 #[derive(Clone, Copy, Debug, Eq, PartialEq)]
 pub enum EncodeError {
diff --git a/src/encoder/sync.rs b/src/encoder/sync.rs
index 1ec91b8..93c3b2c 100644
--- a/src/encoder/sync.rs
+++ b/src/encoder/sync.rs
@@ -228,3 +228,8 @@ impl<T: io::Write> SeqWrite for StandardWriter<T> {
         Poll::Ready(self.pin_to_inner().and_then(|inner| inner.flush()))
     }
 }
+
+/// Calculate the encoded byte len of filename and metadata struct
+pub fn encoded_size(filename: &std::ffi::CStr, metadata: &Metadata) -> io::Result<u64> {
+    poll_result_once(crate::encoder::encoded_size(filename, metadata))
+}
-- 
2.39.2






* [pbs-devel] [PATCH v5 pxar 4/28] fix #3174: enc/dec: impl PXAR_APPENDIX_REF entrytype
  2023-11-15 15:47 [pbs-devel] [PATCH-SERIES v5 pxar proxmox-backup proxmox-widget-toolkit 00/28] fix #3174: improve file-level backup Christian Ebner
                   ` (2 preceding siblings ...)
  2023-11-15 15:47 ` [pbs-devel] [PATCH v5 pxar 3/28] fix #3174: encoder: calc filename + metadata byte size Christian Ebner
@ 2023-11-15 15:47 ` Christian Ebner
  2023-11-15 15:47 ` [pbs-devel] [PATCH v5 pxar 5/28] fix #3174: enc/dec: impl PXAR_APPENDIX entrytype Christian Ebner
                   ` (23 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: Christian Ebner @ 2023-11-15 15:47 UTC (permalink / raw)
  To: pbs-devel

Add an additional entry type for regular files that stores a reference
to the appendix section of the pxar archive. The position relative to
the appendix start is stored in order to be able to access the file
payload within the appendix.

This new entry type is used to reference the contents of existing file
payload chunks for unchanged file payloads.
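A usage sketch, assuming an async `encoder` in scope and the referenced
payload starting 1024 bytes after the appendix start:

```rust
// Reference an unchanged file payload instead of re-encoding it; the
// offset is relative to the (future) appendix section start.
let appendix_ref_offset = AppendixRefOffset::default().add(1024);
encoder
    .add_appendix_ref("some/file.txt", appendix_ref_offset, file_size)
    .await?;
```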

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
Changes since v4:
- no changes

Changes since v3:
- no changes

Changes since v2:
- no changes

Changes since v1:
- Do not re-encode the filename and metadata in add_appendix_ref by
  removing the call to start_file_do
- Use dedicated types for archive offsets instead of raw `u64` values.

 examples/mk-format-hashes.rs |  5 +++
 src/decoder/mod.rs           | 22 ++++++++++++-
 src/encoder/aio.rs           | 14 ++++++++-
 src/encoder/mod.rs           | 61 ++++++++++++++++++++++++++++++++++++
 src/encoder/sync.rs          | 16 +++++++++-
 src/format/mod.rs            |  5 +++
 src/lib.rs                   |  6 ++++
 7 files changed, 126 insertions(+), 3 deletions(-)

diff --git a/examples/mk-format-hashes.rs b/examples/mk-format-hashes.rs
index 1ad606c..8b4f5de 100644
--- a/examples/mk-format-hashes.rs
+++ b/examples/mk-format-hashes.rs
@@ -41,6 +41,11 @@ const CONSTANTS: &[(&str, &str, &str)] = &[
         "PXAR_PAYLOAD",
         "__PROXMOX_FORMAT_PXAR_PAYLOAD__",
     ),
+    (
+        "Marks the beginnig of an appendix reference for regular files",
+        "PXAR_APPENDIX_REF",
+        "__PROXMOX_FORMAT_PXAR_APPENDIX_REF__",
+    ),
     (
         "Marks item as entry of goodbye table",
         "PXAR_GOODBYE",
diff --git a/src/decoder/mod.rs b/src/decoder/mod.rs
index a094324..70a8697 100644
--- a/src/decoder/mod.rs
+++ b/src/decoder/mod.rs
@@ -276,6 +276,10 @@ impl<I: SeqRead> DecoderImpl<I> {
 
             match self.current_header.htype {
                 format::PXAR_FILENAME => return self.handle_file_entry().await,
+                format::PXAR_APPENDIX_REF => {
+                    self.state = State::Default;
+                    return self.handle_appendix_ref_entry().await
+                }
                 format::PXAR_GOODBYE => {
                     self.state = State::InGoodbyeTable;
 
@@ -334,6 +338,22 @@ impl<I: SeqRead> DecoderImpl<I> {
         self.read_next_entry().await.map(Some)
     }
 
+    async fn handle_appendix_ref_entry(&mut self) -> io::Result<Option<Entry>> {
+        let bytes = self.read_entry_as_bytes().await?;
+        let appendix_offset = u64::from_le_bytes(bytes[0..8].try_into().unwrap());
+        let file_size = u64::from_le_bytes(bytes[8..16].try_into().unwrap());
+
+        self.reset_path()?;
+        return Ok(Some(Entry {
+            path: self.entry.path.clone(),
+            metadata: Metadata::default(),
+            kind: EntryKind::AppendixRef {
+                appendix_offset,
+                file_size,
+            }
+        }));
+    }
+
     fn reset_path(&mut self) -> io::Result<()> {
         let path_len = *self
             .path_lengths
@@ -535,7 +555,7 @@ impl<I: SeqRead> DecoderImpl<I> {
                 self.state = State::InPayload { offset: 0 };
                 return Ok(ItemResult::Entry);
             }
-            format::PXAR_FILENAME | format::PXAR_GOODBYE => {
+            format::PXAR_FILENAME | format::PXAR_GOODBYE | format::PXAR_APPENDIX_REF => {
                 if self.entry.metadata.is_fifo() {
                     self.state = State::InSpecialFile;
                     self.entry.kind = EntryKind::Fifo;
diff --git a/src/encoder/aio.rs b/src/encoder/aio.rs
index ad25fea..66ea535 100644
--- a/src/encoder/aio.rs
+++ b/src/encoder/aio.rs
@@ -5,7 +5,7 @@ use std::path::Path;
 use std::pin::Pin;
 use std::task::{Context, Poll};
 
-use crate::encoder::{self, LinkOffset, SeqWrite};
+use crate::encoder::{self, AppendixRefOffset, LinkOffset, SeqWrite};
 use crate::format;
 use crate::Metadata;
 
@@ -112,6 +112,18 @@ impl<'a, T: SeqWrite + 'a> Encoder<'a, T> {
         self.inner.finish().await
     }
 
+    /// Add reference to archive appendix
+    pub async fn add_appendix_ref<PF: AsRef<Path>>(
+        &mut self,
+        file_name: PF,
+        appendix_ref_offset: AppendixRefOffset,
+        file_size: u64,
+    ) -> io::Result<()> {
+        self.inner
+            .add_appendix_ref(file_name.as_ref(), appendix_ref_offset, file_size)
+            .await
+    }
+
     /// Add a symbolic link to the archive.
     pub async fn add_symlink<PF: AsRef<Path>, PT: AsRef<Path>>(
         &mut self,
diff --git a/src/encoder/mod.rs b/src/encoder/mod.rs
index 860c21f..982e3f9 100644
--- a/src/encoder/mod.rs
+++ b/src/encoder/mod.rs
@@ -38,6 +38,33 @@ impl LinkOffset {
     }
 }
 
+/// File reference used to create appendix references.
+#[derive(Clone, Copy, Debug, Default, Eq, PartialEq, Ord, PartialOrd)]
+pub struct AppendixRefOffset(u64);
+
+impl AppendixRefOffset {
+    /// Get the raw byte offset for this appendix reference.
+    #[inline]
+    pub fn raw(self) -> u64 {
+        self.0
+    }
+
+    /// Return a new AppendixRefOffset, positively shifted by offset
+    #[inline]
+    pub fn add(&self, offset: u64) -> Self {
+        Self(self.0 + offset)
+    }
+
+    /// Return a new AppendixRefOffset, negatively shifted by offset
+    #[inline]
+    pub fn sub(&self, offset: u64) -> Self {
+        Self(self.0 - offset)
+    }
+}
+
+/// Offset pointing to the start of the appendix section of the archive.
+#[derive(Clone, Copy, Debug, Eq, PartialEq, Ord, PartialOrd)]
+pub struct AppendixStartOffset(u64);
 /// Sequential write interface used by the encoder's state machine.
 ///
 /// This is our internal writer trait which is available for `std::io::Write` types in the
@@ -466,6 +493,40 @@ impl<'a, T: SeqWrite + 'a> EncoderImpl<'a, T> {
         Ok(offset)
     }
 
+    /// Add reference to pxar archive appendix
+    pub async fn add_appendix_ref(
+        &mut self,
+        file_name: &Path,
+        appendix_ref_offset: AppendixRefOffset,
+        file_size: u64,
+    ) -> io::Result<()> {
+        self.check()?;
+
+        let offset = self.position();
+        let file_name = file_name.as_os_str().as_bytes();
+
+        let mut data = Vec::with_capacity(2 * 8);
+        data.extend(&appendix_ref_offset.raw().to_le_bytes());
+        data.extend(&file_size.to_le_bytes());
+        seq_write_pxar_entry(
+            self.output.as_mut(),
+            format::PXAR_APPENDIX_REF,
+            &data,
+            &mut self.state.write_position,
+        )
+        .await?;
+
+        let end_offset = self.position();
+
+        self.state.items.push(GoodbyeItem {
+            hash: format::hash_filename(file_name),
+            offset: offset,
+            size: end_offset - offset,
+        });
+
+        Ok(())
+    }
+
     /// Return a file offset usable with `add_hardlink`.
     pub async fn add_symlink(
         &mut self,
diff --git a/src/encoder/sync.rs b/src/encoder/sync.rs
index 93c3b2c..370a219 100644
--- a/src/encoder/sync.rs
+++ b/src/encoder/sync.rs
@@ -6,7 +6,7 @@ use std::pin::Pin;
 use std::task::{Context, Poll};
 
 use crate::decoder::sync::StandardReader;
-use crate::encoder::{self, LinkOffset, SeqWrite};
+use crate::encoder::{self, AppendixRefOffset, LinkOffset, SeqWrite};
 use crate::format;
 use crate::util::poll_result_once;
 use crate::Metadata;
@@ -110,6 +110,20 @@ impl<'a, T: SeqWrite + 'a> Encoder<'a, T> {
         poll_result_once(self.inner.finish())
     }
 
+    /// Add reference to archive appendix
+    pub async fn add_appendix_ref<PF: AsRef<Path>>(
+        &mut self,
+        file_name: PF,
+        appendix_ref_offset: AppendixRefOffset,
+        file_size: u64,
+    ) -> io::Result<()> {
+        poll_result_once(self.inner.add_appendix_ref(
+            file_name.as_ref(),
+            appendix_ref_offset,
+            file_size,
+        ))
+    }
+
     /// Add a symbolic link to the archive.
     pub fn add_symlink<PF: AsRef<Path>, PT: AsRef<Path>>(
         &mut self,
diff --git a/src/format/mod.rs b/src/format/mod.rs
index 72a193c..5eb7562 100644
--- a/src/format/mod.rs
+++ b/src/format/mod.rs
@@ -22,6 +22,7 @@
 //!   * `FCAPS`             -- file capability in Linux disk format
 //!   * `QUOTA_PROJECT_ID`  -- the ext4/xfs quota project ID
 //!   * `PAYLOAD`           -- file contents, if it is one
+//!   * `APPENDIX_REF`      -- start offset and size of a file entry relative to the appendix start
 //!   * `SYMLINK`           -- symlink target, if it is one
 //!   * `DEVICE`            -- device major/minor, if it is a block/char device
 //!
@@ -99,6 +100,8 @@ pub const PXAR_QUOTA_PROJID: u64 = 0xe07540e82f7d1cbb;
 pub const PXAR_HARDLINK: u64 = 0x51269c8422bd7275;
 /// Marks the beginnig of the payload (actual content) of regular files
 pub const PXAR_PAYLOAD: u64 = 0x28147a1b0b7c1a25;
+/// Marks the beginning of an appendix reference for regular files
+pub const PXAR_APPENDIX_REF: u64 = 0x849b4a17e0234f8e;
 /// Marks item as entry of goodbye table
 pub const PXAR_GOODBYE: u64 = 0x2fec4fa642d5731d;
 /// The end marker used in the GOODBYE object
@@ -151,6 +154,7 @@ impl Header {
             PXAR_ACL_GROUP_OBJ => size_of::<acl::GroupObject>() as u64,
             PXAR_QUOTA_PROJID => size_of::<QuotaProjectId>() as u64,
             PXAR_ENTRY => size_of::<Stat>() as u64,
+            PXAR_APPENDIX_REF => u64::MAX - (size_of::<Self>() as u64),
             PXAR_PAYLOAD | PXAR_GOODBYE => u64::MAX - (size_of::<Self>() as u64),
             _ => u64::MAX - (size_of::<Self>() as u64),
         }
@@ -191,6 +195,7 @@ impl Display for Header {
             PXAR_ACL_GROUP_OBJ => "ACL_GROUP_OBJ",
             PXAR_QUOTA_PROJID => "QUOTA_PROJID",
             PXAR_ENTRY => "ENTRY",
+            PXAR_APPENDIX_REF => "APPENDIX_REF",
             PXAR_PAYLOAD => "PAYLOAD",
             PXAR_GOODBYE => "GOODBYE",
             _ => "UNKNOWN",
diff --git a/src/lib.rs b/src/lib.rs
index 210c4b1..fa84e7a 100644
--- a/src/lib.rs
+++ b/src/lib.rs
@@ -366,6 +366,12 @@ pub enum EntryKind {
         offset: Option<u64>,
     },
 
+    /// Reference to pxar archive appendix
+    AppendixRef {
+        appendix_offset: u64,
+        file_size: u64,
+    },
+
     /// Directory entry. When iterating through an archive, the contents follow next.
     Directory,
 
-- 
2.39.2






* [pbs-devel] [PATCH v5 pxar 5/28] fix #3174: enc/dec: impl PXAR_APPENDIX entrytype
  2023-11-15 15:47 [pbs-devel] [PATCH-SERIES v5 pxar proxmox-backup proxmox-widget-toolkit 00/28] fix #3174: improve file-level backup Christian Ebner
                   ` (3 preceding siblings ...)
  2023-11-15 15:47 ` [pbs-devel] [PATCH v5 pxar 4/28] fix #3174: enc/dec: impl PXAR_APPENDIX_REF entrytype Christian Ebner
@ 2023-11-15 15:47 ` Christian Ebner
  2023-11-15 15:47 ` [pbs-devel] [PATCH v5 pxar 6/28] fix #3174: encoder: helper to add to encoder position Christian Ebner
                   ` (22 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: Christian Ebner @ 2023-11-15 15:47 UTC (permalink / raw)
  To: pbs-devel

Add an additional entry type for marking the start of a pxar archive
appendix section. The appendix is a concatenation of possibly
uncorrelated chunks and therefore no longer follows the pxar archive
format. The appendix is only used to access the file metadata and
payloads when a PXAR_APPENDIX_REF entry was encountered in the archive
before this point.
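A usage sketch, assuming an async `encoder` and that the total size of
the appendix section to be injected is already known:

```rust
// Write the appendix marker once regular encoding is done; the returned
// offset points to the start of the archive's appendix section.
let full_size = AppendixRefOffset::default().add(appendix_total_size);
let appendix_start: AppendixStartOffset = encoder.add_appendix(full_size).await?;
```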

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
Changes since v4:
- set decoder state to default at the beginning of the appendix section

Changes since v3:
- specify u64 as AppendixRefOffset for full_size in add_appendix

Changes since v2:
- no changes

Changes since v1:
- Use custom type for appendix start offset instead of raw `u64`

 examples/mk-format-hashes.rs |  1 +
 src/decoder/mod.rs           | 12 ++++++++++++
 src/encoder/aio.rs           | 12 +++++++++++-
 src/encoder/mod.rs           | 33 +++++++++++++++++++++++++++++++++
 src/encoder/sync.rs          | 12 +++++++++++-
 src/format/mod.rs            |  7 +++++++
 src/lib.rs                   |  4 ++++
 7 files changed, 79 insertions(+), 2 deletions(-)

diff --git a/examples/mk-format-hashes.rs b/examples/mk-format-hashes.rs
index 8b4f5de..f068edd 100644
--- a/examples/mk-format-hashes.rs
+++ b/examples/mk-format-hashes.rs
@@ -12,6 +12,7 @@ const CONSTANTS: &[(&str, &str, &str)] = &[
         "__PROXMOX_FORMAT_ENTRY__",
     ),
     ("", "PXAR_FILENAME", "__PROXMOX_FORMAT_FILENAME__"),
+    ("", "PXAR_APPENDIX", "__PROXMOX_FORMAT_APPENDIX__"),
     ("", "PXAR_SYMLINK", "__PROXMOX_FORMAT_SYMLINK__"),
     ("", "PXAR_DEVICE", "__PROXMOX_FORMAT_DEVICE__"),
     ("", "PXAR_XATTR", "__PROXMOX_FORMAT_XATTR__"),
diff --git a/src/decoder/mod.rs b/src/decoder/mod.rs
index 70a8697..4eea633 100644
--- a/src/decoder/mod.rs
+++ b/src/decoder/mod.rs
@@ -295,6 +295,10 @@ impl<I: SeqRead> DecoderImpl<I> {
                         continue;
                     }
                 }
+                format::PXAR_APPENDIX => {
+                    self.state = State::Default;
+                    return Ok(Some(self.entry.take()));
+                }
                 _ => io_bail!(
                     "expected filename or directory-goodbye pxar entry, got: {}",
                     self.current_header,
@@ -546,6 +550,14 @@ impl<I: SeqRead> DecoderImpl<I> {
                 self.entry.kind = EntryKind::Device(self.read_device().await?);
                 return Ok(ItemResult::Entry);
             }
+            format::PXAR_APPENDIX => {
+                let bytes = self.read_entry_as_bytes().await?;
+                let total = u64::from_le_bytes(bytes[0..8].try_into().unwrap());
+                self.entry.kind = EntryKind::Appendix {
+                    total,
+                };
+                return Ok(ItemResult::Entry);
+            }
             format::PXAR_PAYLOAD => {
                 let offset = seq_read_position(&mut self.input).await.transpose()?;
                 self.entry.kind = EntryKind::File {
diff --git a/src/encoder/aio.rs b/src/encoder/aio.rs
index 66ea535..9cc26e0 100644
--- a/src/encoder/aio.rs
+++ b/src/encoder/aio.rs
@@ -5,7 +5,7 @@ use std::path::Path;
 use std::pin::Pin;
 use std::task::{Context, Poll};
 
-use crate::encoder::{self, AppendixRefOffset, LinkOffset, SeqWrite};
+use crate::encoder::{self, AppendixRefOffset, AppendixStartOffset, LinkOffset, SeqWrite};
 use crate::format;
 use crate::Metadata;
 
@@ -124,6 +124,16 @@ impl<'a, T: SeqWrite + 'a> Encoder<'a, T> {
             .await
     }
 
+    /// Add the appendix start entry marker
+    ///
+    /// Returns the AppendixStartOffset pointing after the entry, the appendix start offset
+    pub async fn add_appendix(
+        &mut self,
+        full_size: AppendixRefOffset,
+    ) -> io::Result<AppendixStartOffset> {
+        self.inner.add_appendix(full_size).await
+    }
+
     /// Add a symbolic link to the archive.
     pub async fn add_symlink<PF: AsRef<Path>, PT: AsRef<Path>>(
         &mut self,
diff --git a/src/encoder/mod.rs b/src/encoder/mod.rs
index 982e3f9..6745b8b 100644
--- a/src/encoder/mod.rs
+++ b/src/encoder/mod.rs
@@ -65,6 +65,15 @@ impl AppendixRefOffset {
 /// Offset pointing to the start of the appendix section of the archive.
 #[derive(Clone, Copy, Debug, Eq, PartialEq, Ord, PartialOrd)]
 pub struct AppendixStartOffset(u64);
+
+impl AppendixStartOffset {
+    /// Get the raw byte start offset for this appendix section.
+    #[inline]
+    pub fn raw(self) -> u64 {
+        self.0
+    }
+}
+
 /// Sequential write interface used by the encoder's state machine.
 ///
 /// This is our internal writer trait which is available for `std::io::Write` types in the
@@ -527,6 +536,30 @@ impl<'a, T: SeqWrite + 'a> EncoderImpl<'a, T> {
         Ok(())
     }
 
+    /// Add the appendix start entry marker
+    ///
+    /// Returns the AppendixStartOffset pointing after the entry, the start of the appendix
+    /// section of the archive.
+    pub async fn add_appendix(
+        &mut self,
+        full_size: AppendixRefOffset,
+    ) -> io::Result<AppendixStartOffset> {
+        self.check()?;
+
+        let data = &full_size.raw().to_le_bytes().to_vec();
+        seq_write_pxar_entry(
+            self.output.as_mut(),
+            format::PXAR_APPENDIX,
+            &data,
+            &mut self.state.write_position,
+        )
+        .await?;
+
+        let offset = self.position();
+
+        Ok(AppendixStartOffset(offset))
+    }
+
     /// Return a file offset usable with `add_hardlink`.
     pub async fn add_symlink(
         &mut self,
diff --git a/src/encoder/sync.rs b/src/encoder/sync.rs
index 370a219..54e33a2 100644
--- a/src/encoder/sync.rs
+++ b/src/encoder/sync.rs
@@ -6,7 +6,7 @@ use std::pin::Pin;
 use std::task::{Context, Poll};
 
 use crate::decoder::sync::StandardReader;
-use crate::encoder::{self, AppendixRefOffset, LinkOffset, SeqWrite};
+use crate::encoder::{self, AppendixRefOffset, AppendixStartOffset, LinkOffset, SeqWrite};
 use crate::format;
 use crate::util::poll_result_once;
 use crate::Metadata;
@@ -124,6 +124,16 @@ impl<'a, T: SeqWrite + 'a> Encoder<'a, T> {
         ))
     }
 
+    /// Add the appendix start entry marker
+    ///
+    /// Returns the AppendixStartOffset pointing after the entry, the appendix start offset
+    pub async fn add_appendix(
+        &mut self,
+        full_size: AppendixRefOffset,
+    ) -> io::Result<AppendixStartOffset> {
+        poll_result_once(self.inner.add_appendix(full_size))
+    }
+
     /// Add a symbolic link to the archive.
     pub fn add_symlink<PF: AsRef<Path>, PT: AsRef<Path>>(
         &mut self,
diff --git a/src/format/mod.rs b/src/format/mod.rs
index 5eb7562..8254df9 100644
--- a/src/format/mod.rs
+++ b/src/format/mod.rs
@@ -35,6 +35,12 @@
 //!   * `<archive>`         -- serialization of the second directory entry
 //!   * ...
 //!   * `GOODBYE`           -- lookup table at the end of a list of directory entries
+//!
+//! For backups referencing previous backups to skip file payloads, the archive is followed by an
+//! appendix marker after which the concatenated pxar archive fragments containing the file payloads
+//! are appended. They are NOT guaranteed to follow the full pxar structure and should only be
+//! used to extract the file payloads by given offset.
+//!   * `APPENDIX`          -- pxar archive fragments containing file payloads
 
 use std::cmp::Ordering;
 use std::ffi::{CStr, OsStr};
@@ -85,6 +91,7 @@ pub const PXAR_ENTRY: u64 = 0xd5956474e588acef;
 /// Previous version of the entry struct
 pub const PXAR_ENTRY_V1: u64 = 0x11da850a1c1cceff;
 pub const PXAR_FILENAME: u64 = 0x16701121063917b3;
+pub const PXAR_APPENDIX: u64 = 0x9ff6c9507864b38d;
 pub const PXAR_SYMLINK: u64 = 0x27f971e7dbf5dc5f;
 pub const PXAR_DEVICE: u64 = 0x9fc9e906586d5ce9;
 pub const PXAR_XATTR: u64 = 0x0dab0229b57dcd03;
diff --git a/src/lib.rs b/src/lib.rs
index fa84e7a..035f995 100644
--- a/src/lib.rs
+++ b/src/lib.rs
@@ -372,6 +372,10 @@ pub enum EntryKind {
         file_size: u64,
     },
 
+    Appendix {
+        total: u64,
+    },
+
     /// Directory entry. When iterating through an archive, the contents follow next.
     Directory,
 
-- 
2.39.2






* [pbs-devel] [PATCH v5 pxar 6/28] fix #3174: encoder: helper to add to encoder position
  2023-11-15 15:47 [pbs-devel] [PATCH-SERIES v5 pxar proxmox-backup proxmox-widget-toolkit 00/28] fix #3174: improve file-level backup Christian Ebner
                   ` (4 preceding siblings ...)
  2023-11-15 15:47 ` [pbs-devel] [PATCH v5 pxar 5/28] fix #3174: enc/dec: impl PXAR_APPENDIX entrytype Christian Ebner
@ 2023-11-15 15:47 ` Christian Ebner
  2023-11-15 15:47 ` [pbs-devel] [PATCH v5 pxar 7/28] fix #3174: enc/dec: impl PXAR_APPENDIX_TAIL entrytype Christian Ebner
                   ` (21 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: Christian Ebner @ 2023-11-15 15:47 UTC (permalink / raw)
  To: pbs-devel

Adds a helper to increase the encoder position by a given size. This is
used to increase the position when adding an appendix section to the
pxar stream, as these bytes are never encoded directly but rather
referenced by already existing chunks.
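A usage sketch (the caller must uphold the documented invariant):

```rust
// Account for an injected appendix section without encoding its bytes.
// Unsafe: passing a size that does not match the bytes actually injected
// into the stream breaks the archive's offset accounting.
let new_position = unsafe { encoder.position_add(appendix_section_size) };
```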

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
Changes since v4:
- no changes

Changes since v3:
- no changes

Changes since v2:
- no changes

Changes since v1:
- add unsafe function signature in order to make it clear that calling
  `position_add` with incorrect size values will break the pxar
  archive encoding.

 src/encoder/aio.rs  | 5 +++++
 src/encoder/mod.rs  | 6 ++++++
 src/encoder/sync.rs | 5 +++++
 3 files changed, 16 insertions(+)

diff --git a/src/encoder/aio.rs b/src/encoder/aio.rs
index 9cc26e0..7379940 100644
--- a/src/encoder/aio.rs
+++ b/src/encoder/aio.rs
@@ -112,6 +112,11 @@ impl<'a, T: SeqWrite + 'a> Encoder<'a, T> {
         self.inner.finish().await
     }
 
+    /// Add size to encoders position and return new position.
+    pub unsafe fn position_add(&mut self, size: u64) -> u64 {
+        unsafe { self.inner.position_add(size) }
+    }
+
     /// Add reference to archive appendix
     pub async fn add_appendix_ref<PF: AsRef<Path>>(
         &mut self,
diff --git a/src/encoder/mod.rs b/src/encoder/mod.rs
index 6745b8b..98f6521 100644
--- a/src/encoder/mod.rs
+++ b/src/encoder/mod.rs
@@ -693,6 +693,12 @@ impl<'a, T: SeqWrite + 'a> EncoderImpl<'a, T> {
         self.state.write_position
     }
 
+    #[inline]
+    pub unsafe fn position_add(&mut self, size: u64) -> u64 {
+        self.state.write_position += size;
+        self.state.write_position
+    }
+
     pub async fn create_directory(
         &mut self,
         file_name: &Path,
diff --git a/src/encoder/sync.rs b/src/encoder/sync.rs
index 54e33a2..5d8db25 100644
--- a/src/encoder/sync.rs
+++ b/src/encoder/sync.rs
@@ -110,6 +110,11 @@ impl<'a, T: SeqWrite + 'a> Encoder<'a, T> {
         poll_result_once(self.inner.finish())
     }
 
+    /// Add size to encoders position and return new position.
+    pub unsafe fn position_add(&mut self, size: u64) -> u64 {
+        unsafe { self.inner.position_add(size) }
+    }
+
     /// Add reference to archive appendix
     pub async fn add_appendix_ref<PF: AsRef<Path>>(
         &mut self,
-- 
2.39.2






* [pbs-devel] [PATCH v5 pxar 7/28] fix #3174: enc/dec: impl PXAR_APPENDIX_TAIL entrytype
  2023-11-15 15:47 [pbs-devel] [PATCH-SERIES v5 pxar proxmox-backup proxmox-widget-toolkit 00/28] fix #3174: improve file-level backup Christian Ebner
                   ` (5 preceding siblings ...)
  2023-11-15 15:47 ` [pbs-devel] [PATCH v5 pxar 6/28] fix #3174: encoder: helper to add to encoder position Christian Ebner
@ 2023-11-15 15:47 ` Christian Ebner
  2023-11-15 15:47 ` [pbs-devel] [PATCH v5 pxar 8/28] fix #3174: enc/dec: introduce pxar format version 2 Christian Ebner
                   ` (20 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: Christian Ebner @ 2023-11-15 15:47 UTC (permalink / raw)
  To: pbs-devel

The PXAR_APPENDIX_TAIL entry marks pxar archives containing an appendix
section. It has the same size as a goodbye tail marker item in order to
be able to easily read and distinguish archives with and without such a
section.

This also implements the accessor used by e.g. the FUSE implementation
to perform random I/O on the archive. The accessor reads the last entry
and stores the appendix offset if needed, in order to recalculate the
actual file payload offset within the archive when encountering an
appendix reference entry in the archive.
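A usage sketch of the extended `finish` signature:

```rust
// Archives with an appendix section pass its start offset and total
// size so the tail marker gets written; plain archives pass `None`.
encoder.finish(Some((appendix_start, appendix_total_size))).await?;
```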

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
Changes since v4:
- no changes

Changes since v3:
- correctly propagate appendix_offset to all accessor impls which depend
  on it

Changes since v2:
- fix get_cursor for files located in the appendix

Changes since v1:
- adapt to custom type for appendix start offset

 examples/mk-format-hashes.rs |   5 ++
 examples/pxarcmd.rs          |   4 +-
 src/accessor/mod.rs          | 132 +++++++++++++++++++++++++++++------
 src/encoder/aio.rs           |   9 ++-
 src/encoder/mod.rs           |  19 ++++-
 src/encoder/sync.rs          |   7 +-
 src/format/mod.rs            |   4 ++
 7 files changed, 152 insertions(+), 28 deletions(-)

diff --git a/examples/mk-format-hashes.rs b/examples/mk-format-hashes.rs
index f068edd..7fb938d 100644
--- a/examples/mk-format-hashes.rs
+++ b/examples/mk-format-hashes.rs
@@ -57,6 +57,11 @@ const CONSTANTS: &[(&str, &str, &str)] = &[
         "PXAR_GOODBYE_TAIL_MARKER",
         "__PROXMOX_FORMAT_PXAR_GOODBYE_TAIL_MARKER__",
     ),
+    (
+        "Marks the end of an archive containing an appendix section",
+        "PXAR_APPENDIX_TAIL",
+        "__PROXMOX_FORMAT_APPENDIX_TAIL__",
+    ),
 ];
 
 fn main() {
diff --git a/examples/pxarcmd.rs b/examples/pxarcmd.rs
index e0c779d..c7848cc 100644
--- a/examples/pxarcmd.rs
+++ b/examples/pxarcmd.rs
@@ -105,7 +105,7 @@ fn cmd_create(mut args: std::env::ArgsOs) -> Result<(), Error> {
 
     let mut encoder = Encoder::create(file, &meta)?;
     add_directory(&mut encoder, dir, &dir_path, &mut HashMap::new())?;
-    encoder.finish()?;
+    encoder.finish(None)?;
 
     Ok(())
 }
@@ -145,7 +145,7 @@ fn add_directory<'a, T: SeqWrite + 'a>(
                 root_path,
                 &mut *hardlinks,
             )?;
-            dir.finish()?;
+            dir.finish(None)?;
         } else if file_type.is_symlink() {
             todo!("symlink handling");
         } else if file_type.is_file() {
diff --git a/src/accessor/mod.rs b/src/accessor/mod.rs
index 6a2de73..dd5b559 100644
--- a/src/accessor/mod.rs
+++ b/src/accessor/mod.rs
@@ -182,6 +182,7 @@ pub(crate) struct AccessorImpl<T> {
     input: T,
     size: u64,
     caches: Arc<Caches>,
+    appendix_offset: Option<u64>,
 }
 
 impl<T: ReadAt> AccessorImpl<T> {
@@ -190,10 +191,22 @@ impl<T: ReadAt> AccessorImpl<T> {
             io_bail!("too small to contain a pxar archive");
         }
 
+        let tail_offset = size - (size_of::<GoodbyeItem>() as u64);
+        let tail: GoodbyeItem = read_entry_at(&input, tail_offset).await?;
+
+        let (appendix_offset, size) = if tail.hash == format::PXAR_APPENDIX_TAIL {
+            (Some(tail.offset), size - 40)
+        } else if tail.hash != format::PXAR_GOODBYE_TAIL_MARKER {
+            io_bail!("no goodbye tail marker found");
+        } else {
+            (None, size)
+        };
+
         Ok(Self {
             input,
             size,
             caches: Arc::new(Caches::default()),
+            appendix_offset,
         })
     }
 
@@ -207,6 +220,7 @@ impl<T: ReadAt> AccessorImpl<T> {
             self.size,
             "/".into(),
             Arc::clone(&self.caches),
+            self.appendix_offset,
         )
         .await
     }
@@ -263,6 +277,7 @@ impl<T: Clone + ReadAt> AccessorImpl<T> {
             self.size,
             "/".into(),
             Arc::clone(&self.caches),
+            self.appendix_offset,
         )
         .await
     }
@@ -274,6 +289,7 @@ impl<T: Clone + ReadAt> AccessorImpl<T> {
             offset,
             "/".into(),
             Arc::clone(&self.caches),
+            self.appendix_offset,
         )
         .await
     }
@@ -298,6 +314,7 @@ impl<T: Clone + ReadAt> AccessorImpl<T> {
             entry,
             entry_range_info: entry_range_info.clone(),
             caches: Arc::clone(&self.caches),
+            appendix_offset: self.appendix_offset,
         })
     }
 
@@ -353,6 +370,7 @@ impl<T: Clone + ReadAt> AccessorImpl<T> {
                         entry_range: entry_offset..entry_end,
                     },
                     caches: Arc::clone(&self.caches),
+                    appendix_offset: self.appendix_offset,
                 })
             }
             _ => io_bail!("hardlink does not point to a regular file"),
@@ -369,6 +387,7 @@ pub(crate) struct DirectoryImpl<T> {
     table: Arc<[GoodbyeItem]>,
     path: PathBuf,
     caches: Arc<Caches>,
+    appendix_offset: Option<u64>,
 }
 
 impl<T: Clone + ReadAt> DirectoryImpl<T> {
@@ -378,6 +397,7 @@ impl<T: Clone + ReadAt> DirectoryImpl<T> {
         end_offset: u64,
         path: PathBuf,
         caches: Arc<Caches>,
+        appendix_offset: Option<u64>,
     ) -> io::Result<DirectoryImpl<T>> {
         let tail = Self::read_tail_entry(&input, end_offset).await?;
 
@@ -407,6 +427,7 @@ impl<T: Clone + ReadAt> DirectoryImpl<T> {
             table: table.as_ref().map_or_else(|| Arc::new([]), Arc::clone),
             path,
             caches,
+            appendix_offset,
         };
 
         // sanity check:
@@ -516,6 +537,32 @@ impl<T: Clone + ReadAt> DirectoryImpl<T> {
             .next()
             .await
             .ok_or_else(|| io_format_err!("unexpected EOF while decoding directory entry"))??;
+
+        if let EntryKind::AppendixRef {
+            appendix_offset,
+            file_size,
+        } = entry.kind()
+        {
+            let appendix_start = match self.appendix_offset {
+                Some(appendix_start) => appendix_start,
+                None => io_bail!("missing required appendix start offset information"),
+            };
+
+            let name = file_name.ok_or_else(|| io_format_err!("missing required filename"))?;
+            let c_string = std::ffi::CString::new(name.as_os_str().as_bytes())?;
+            let start =
+                appendix_start + appendix_offset + 16 + c_string.as_bytes_with_nul().len() as u64;
+            let end = start + file_size;
+            decoder = self.get_decoder(start..end, file_name).await?;
+
+            let entry = decoder
+                .next()
+                .await
+                .ok_or_else(|| io_format_err!("unexpected EOF while decoding directory entry"))??;
+
+            return Ok((entry, decoder));
+        }
+
         Ok((entry, decoder))
     }
 
@@ -533,6 +580,7 @@ impl<T: Clone + ReadAt> DirectoryImpl<T> {
                 entry_range: self.entry_range(),
             },
             caches: Arc::clone(&self.caches),
+            appendix_offset: self.appendix_offset,
         })
     }
 
@@ -616,36 +664,76 @@ impl<T: Clone + ReadAt> DirectoryImpl<T> {
         }
 
         let file_ofs = self.goodbye_ofs - file_goodbye_ofs;
-        let (file_name, entry_ofs) = self.read_filename_entry(file_ofs).await?;
 
-        let entry_range = Range {
-            start: entry_ofs,
-            end: file_ofs + entry.size,
+        let mut head: format::Header = read_entry_at(&self.input, file_ofs).await?;
+        let (file_name, entry_range_info) = if head.htype == format::PXAR_APPENDIX_REF {
+            let appendix_start = match self.appendix_offset {
+                Some(appendix_start) => appendix_start,
+                None => io_bail!("missing required appendix start offset information"),
+            };
+            let bytes = read_exact_data_at(
+                &self.input,
+                head.content_size() as usize,
+                file_ofs + (size_of_val(&head) as u64),
+            )
+            .await?;
+            let appendix_offset = u64::from_le_bytes(bytes[0..8].try_into().unwrap());
+            let offset = appendix_start + appendix_offset;
+            let size = u64::from_le_bytes(bytes[8..16].try_into().unwrap());
+
+            head = read_entry_at(&self.input, offset).await?;
+            let (file_name, entry_ofs) = self.read_filename_entry(head, offset).await?;
+
+            let c_string = std::ffi::CString::new(file_name.as_os_str().as_bytes())?;
+            let start = offset + 16 + c_string.as_bytes_with_nul().len() as u64;
+            if start + size < start {
+                io_bail!(
+                    "bad file: invalid entry ranges for {:?}: \
+                     start=0x{:x}, file_ofs=0x{:x}, size=0x{:x}",
+                    file_name,
+                    entry_ofs,
+                    offset,
+                    size,
+                );
+            }
+            (file_name, EntryRangeInfo {
+                filename_header_offset: Some(offset),
+                entry_range: Range {
+                    start,
+                    end: start + size,
+                },
+            })
+        } else {
+            let (file_name, entry_ofs) = self.read_filename_entry(head, file_ofs).await?;
+            if file_ofs + entry.size < entry_ofs {
+                io_bail!(
+                    "bad file: invalid entry ranges for {:?}: \
+                     start=0x{:x}, file_ofs=0x{:x}, size=0x{:x}",
+                    file_name,
+                    entry_ofs,
+                    file_ofs,
+                    entry.size,
+                );
+            }
+            (file_name, EntryRangeInfo {
+                filename_header_offset: Some(file_ofs),
+                entry_range: Range {
+                    start: entry_ofs,
+                    end: file_ofs + entry.size,
+                },
+            })
         };
-        if entry_range.end < entry_range.start {
-            io_bail!(
-                "bad file: invalid entry ranges for {:?}: \
-                 start=0x{:x}, file_ofs=0x{:x}, size=0x{:x}",
-                file_name,
-                entry_ofs,
-                file_ofs,
-                entry.size,
-            );
-        }
 
         Ok(DirEntryImpl {
             dir: self,
             file_name,
-            entry_range_info: EntryRangeInfo {
-                filename_header_offset: Some(file_ofs),
-                entry_range,
-            },
+            entry_range_info,
             caches: Arc::clone(&self.caches),
+            appendix_offset: self.appendix_offset,
         })
     }
 
-    async fn read_filename_entry(&self, file_ofs: u64) -> io::Result<(PathBuf, u64)> {
-        let head: format::Header = read_entry_at(&self.input, file_ofs).await?;
+    async fn read_filename_entry(&self, head: format::Header, file_ofs: u64) -> io::Result<(PathBuf, u64)> {
         if head.htype != format::PXAR_FILENAME {
             io_bail!("expected PXAR_FILENAME header, found: {}", head);
         }
@@ -685,6 +773,7 @@ pub(crate) struct FileEntryImpl<T: Clone + ReadAt> {
     entry: Entry,
     entry_range_info: EntryRangeInfo,
     caches: Arc<Caches>,
+    appendix_offset: Option<u64>,
 }
 
 impl<T: Clone + ReadAt> FileEntryImpl<T> {
@@ -698,6 +787,7 @@ impl<T: Clone + ReadAt> FileEntryImpl<T> {
             self.entry_range_info.entry_range.end,
             self.entry.path.clone(),
             Arc::clone(&self.caches),
+            self.appendix_offset,
         )
         .await
     }
@@ -787,6 +877,7 @@ pub(crate) struct DirEntryImpl<'a, T: Clone + ReadAt> {
     file_name: PathBuf,
     entry_range_info: EntryRangeInfo,
     caches: Arc<Caches>,
+    appendix_offset: Option<u64>,
 }
 
 impl<'a, T: Clone + ReadAt> DirEntryImpl<'a, T> {
@@ -808,6 +899,7 @@ impl<'a, T: Clone + ReadAt> DirEntryImpl<'a, T> {
             entry,
             entry_range_info: self.entry_range_info.clone(),
             caches: Arc::clone(&self.caches),
+            appendix_offset: self.appendix_offset,
         })
     }
 
diff --git a/src/encoder/aio.rs b/src/encoder/aio.rs
index 7379940..5a833c5 100644
--- a/src/encoder/aio.rs
+++ b/src/encoder/aio.rs
@@ -108,8 +108,11 @@ impl<'a, T: SeqWrite + 'a> Encoder<'a, T> {
     }
 
     /// Finish this directory. This is mandatory, otherwise the `Drop` handler will `panic!`.
-    pub async fn finish(self) -> io::Result<()> {
-        self.inner.finish().await
+    pub async fn finish(
+        self,
+        appendix_tail: Option<(AppendixStartOffset, AppendixRefOffset)>,
+    ) -> io::Result<()> {
+        self.inner.finish(appendix_tail).await
     }
 
     /// Add size to encoders position and return new position.
@@ -330,7 +333,7 @@ mod test {
                     .await
                     .unwrap();
             }
-            encoder.finish().await.unwrap();
+            encoder.finish(None).await.unwrap();
         };
 
         fn test_send<T: Send>(_: T) {}
diff --git a/src/encoder/mod.rs b/src/encoder/mod.rs
index 98f6521..c33b2c3 100644
--- a/src/encoder/mod.rs
+++ b/src/encoder/mod.rs
@@ -889,7 +889,10 @@ impl<'a, T: SeqWrite + 'a> EncoderImpl<'a, T> {
         .await
     }
 
-    pub async fn finish(mut self) -> io::Result<()> {
+    pub async fn finish(
+        mut self,
+        appendix_tail: Option<(AppendixStartOffset, AppendixRefOffset)>,
+    ) -> io::Result<()> {
         let tail_bytes = self.finish_goodbye_table().await?;
         seq_write_pxar_entry(
             self.output.as_mut(),
@@ -899,6 +902,20 @@ impl<'a, T: SeqWrite + 'a> EncoderImpl<'a, T> {
         )
         .await?;
 
+        if let Some((appendix_start_offset, size)) = appendix_tail {
+            let mut appendix_tail = Vec::new();
+            appendix_tail.append(&mut format::PXAR_APPENDIX_TAIL.to_le_bytes().to_vec());
+            appendix_tail.append(&mut appendix_start_offset.raw().to_le_bytes().to_vec());
+            appendix_tail.append(&mut size.raw().to_le_bytes().to_vec());
+            seq_write_pxar_entry(
+                self.output.as_mut(),
+                format::PXAR_GOODBYE,
+                &appendix_tail,
+                &mut self.state.write_position,
+            )
+            .await?;
+        }
+
         if let EncoderOutput::Owned(output) = &mut self.output {
             flush(output).await?;
         }
diff --git a/src/encoder/sync.rs b/src/encoder/sync.rs
index 5d8db25..5ede554 100644
--- a/src/encoder/sync.rs
+++ b/src/encoder/sync.rs
@@ -106,8 +106,11 @@ impl<'a, T: SeqWrite + 'a> Encoder<'a, T> {
     }
 
     /// Finish this directory. This is mandatory, otherwise the `Drop` handler will `panic!`.
-    pub fn finish(self) -> io::Result<()> {
-        poll_result_once(self.inner.finish())
+    pub fn finish(
+        self,
+        appendix_tail: Option<(AppendixStartOffset, AppendixRefOffset)>,
+    ) -> io::Result<()> {
+        poll_result_once(self.inner.finish(appendix_tail))
     }
 
     /// Add size to encoders position and return new position.
diff --git a/src/format/mod.rs b/src/format/mod.rs
index 8254df9..8016ab1 100644
--- a/src/format/mod.rs
+++ b/src/format/mod.rs
@@ -41,6 +41,8 @@
 //! are appended. They are NOT guaranteed to follow the full pxar structure and should only be
 //! used to extract the file payloads by given offset.
 //!   * `APPENDIX`          -- pxar archive fragments containing file payloads
+//!   * final goodbye table
+//!   * `APPENDIX_TAIL`     -- marks the end of an archive containing an APPENDIX section
 
 use std::cmp::Ordering;
 use std::ffi::{CStr, OsStr};
@@ -113,6 +115,8 @@ pub const PXAR_APPENDIX_REF: u64 = 0x849b4a17e0234f8e;
 pub const PXAR_GOODBYE: u64 = 0x2fec4fa642d5731d;
 /// The end marker used in the GOODBYE object
 pub const PXAR_GOODBYE_TAIL_MARKER: u64 = 0xef5eed5b753e1555;
+/// Marks the end of an archive containing an appendix section
+pub const PXAR_APPENDIX_TAIL: u64 = 0x5b1b9abb7ae454f1;
 
 #[derive(Debug, Endian)]
 #[repr(C)]
-- 
2.39.2
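
For reference, the appendix tail written by the updated `finish()` above
boils down to the following sketch; the raw u64 parameters stand in for
the `AppendixStartOffset`/`AppendixRefOffset` wrapper types, and
`seq_write_pxar_entry` additionally prefixes the usual 16 byte
PXAR_GOODBYE entry header:

    const PXAR_APPENDIX_TAIL: u64 = 0x5b1b9abb7ae454f1;

    // Build the 24 byte payload wrapped into the trailing PXAR_GOODBYE
    // entry: marker, appendix start offset and appendix size, all
    // encoded as u64 little-endian.
    fn appendix_tail_payload(appendix_start_offset: u64, size: u64) -> Vec<u8> {
        let mut payload = Vec::with_capacity(24);
        payload.extend_from_slice(&PXAR_APPENDIX_TAIL.to_le_bytes());
        payload.extend_from_slice(&appendix_start_offset.to_le_bytes());
        payload.extend_from_slice(&size.to_le_bytes());
        payload
    }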

* [pbs-devel] [PATCH v5 pxar 8/28] fix #3174: enc/dec: introduce pxar format version 2
  2023-11-15 15:47 [pbs-devel] [PATCH-SERIES v5 pxar proxmox-backup proxmox-widget-toolkit 00/28] fix #3174: improve file-level backup Christian Ebner
                   ` (6 preceding siblings ...)
  2023-11-15 15:47 ` [pbs-devel] [PATCH v5 pxar 7/28] fix #3174: enc/dec: impl PXAR_APPENDIX_TAIL entrytype Christian Ebner
@ 2023-11-15 15:47 ` Christian Ebner
  2023-11-15 15:47 ` [pbs-devel] [PATCH v5 proxmox-backup 09/28] fix #3174: index: add fn index list from start/end-offsets Christian Ebner
                   ` (19 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: Christian Ebner @ 2023-11-15 15:47 UTC (permalink / raw)
  To: pbs-devel

Prefix pxar archives in format version 2 with a header entry containing
the corresponding version 2 hash.

The main intention is to let older pxar binaries, which are not
compatible with this format version, detect the incompatibility early.

Further, the encoder and decoder states are extended to include the
version and to check for consistency accordingly.
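
A minimal sketch, not the pxar crate API, of how the version entry can
be detected on a raw archive byte stream; a pxar header is 16 bytes,
htype followed by size, both u64 little-endian, and the constant
matches the one introduced below:

    const PXAR_FORMAT_VERSION_2: u64 = 0xa0c3af8478917dbb;

    fn detect_version(archive: &[u8]) -> &'static str {
        if archive.len() >= 16 {
            let htype = u64::from_le_bytes(archive[0..8].try_into().unwrap());
            if htype == PXAR_FORMAT_VERSION_2 {
                return "version 2";
            }
        }
        // No version entry present, fall back to version 1.
        "version 1"
    }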

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
Changes since version 4:
- not present in version 4

 examples/mk-format-hashes.rs |  5 +++++
 src/decoder/mod.rs           | 30 +++++++++++++++++++++++++++++-
 src/encoder/aio.rs           | 22 ++++++++++++++++------
 src/encoder/mod.rs           | 29 +++++++++++++++++++++++++++++
 src/encoder/sync.rs          | 11 ++++++++---
 src/format/mod.rs            | 11 ++++++++++-
 6 files changed, 97 insertions(+), 11 deletions(-)

diff --git a/examples/mk-format-hashes.rs b/examples/mk-format-hashes.rs
index 7fb938d..61f4773 100644
--- a/examples/mk-format-hashes.rs
+++ b/examples/mk-format-hashes.rs
@@ -1,6 +1,11 @@
 use pxar::format::hash_filename;
 
 const CONSTANTS: &[(&str, &str, &str)] = &[
+    (
+        "Pxar format version 2 entry, fallback to version 1 if not present",
+        "PXAR_FORMAT_VERSION_2",
+        "__PROXMOX_FORMAT_VERSION_V2__",
+    ),
     (
         "Beginning of an entry (current version).",
         "PXAR_ENTRY",
diff --git a/src/decoder/mod.rs b/src/decoder/mod.rs
index 4eea633..b7f6c39 100644
--- a/src/decoder/mod.rs
+++ b/src/decoder/mod.rs
@@ -160,6 +160,7 @@ pub(crate) struct DecoderImpl<T> {
     /// The random access code uses decoders for sub-ranges which may not end in a `PAYLOAD` for
     /// entries like FIFOs or sockets, so there we explicitly allow an item to terminate with EOF.
     eof_after_entry: bool,
+    version: format::FormatVersion,
 }
 
 enum State {
@@ -220,6 +221,7 @@ impl<I: SeqRead> DecoderImpl<I> {
             state: State::Begin,
             with_goodbye_tables: false,
             eof_after_entry,
+            version: format::FormatVersion::default(),
         };
 
         // this.read_next_entry().await?;
@@ -236,7 +238,16 @@ impl<I: SeqRead> DecoderImpl<I> {
         loop {
             match self.state {
                 State::Eof => return Ok(None),
-                State::Begin => return self.read_next_entry().await.map(Some),
+                State::Begin => {
+                    match self.read_next_entry_header_or_eof().await? {
+                        Some(header) if header.htype == format::PXAR_FORMAT_VERSION_2 => {
+                            self.version = format::FormatVersion::V2;
+                            return self.read_next_entry().await.map(Some);
+                        }
+                        Some(header) => return self.read_next_entry_payload_or_eof(header).await,
+                        None => return Err(io_format_err!("unexpected EOF")),
+                    }
+                }
                 State::Default => {
                     // we completely finished an entry, so now we're going "up" in the directory
                     // hierarchy and parse the next PXAR_FILENAME or the PXAR_GOODBYE:
@@ -277,6 +288,9 @@ impl<I: SeqRead> DecoderImpl<I> {
             match self.current_header.htype {
                 format::PXAR_FILENAME => return self.handle_file_entry().await,
                 format::PXAR_APPENDIX_REF => {
+                    if self.version == format::FormatVersion::Default {
+                        io_bail!("unsupported appendix reference in default version");
+                    }
                     self.state = State::Default;
                     return self.handle_appendix_ref_entry().await
                 }
@@ -296,6 +310,9 @@ impl<I: SeqRead> DecoderImpl<I> {
                     }
                 }
                 format::PXAR_APPENDIX => {
+                    if self.version == format::FormatVersion::Default {
+                        io_bail!("unsupported appendix in default version");
+                    }
                     self.state = State::Default;
                     return Ok(Some(self.entry.take()));
                 }
@@ -378,6 +395,14 @@ impl<I: SeqRead> DecoderImpl<I> {
     }
 
     async fn read_next_entry_or_eof(&mut self) -> io::Result<Option<Entry>> {
+        if let Some(header) = self.read_next_entry_header_or_eof().await? {
+            self.read_next_entry_payload_or_eof(header).await
+        } else {
+            Ok(None)
+        }
+    }
+
+    async fn read_next_entry_header_or_eof(&mut self) -> io::Result<Option<Header>> {
         self.state = State::Default;
         self.entry.clear_data();
 
@@ -387,7 +412,10 @@ impl<I: SeqRead> DecoderImpl<I> {
         };
 
         header.check_header_size()?;
+        Ok(Some(header))
+    }
 
+    async fn read_next_entry_payload_or_eof(&mut self, header: Header) -> io::Result<Option<Entry>> {
         if header.htype == format::PXAR_HARDLINK {
             // The only "dangling" header without an 'Entry' in front of it because it does not
             // carry its own metadata.
diff --git a/src/encoder/aio.rs b/src/encoder/aio.rs
index 5a833c5..b750c8d 100644
--- a/src/encoder/aio.rs
+++ b/src/encoder/aio.rs
@@ -24,8 +24,9 @@ impl<'a, T: tokio::io::AsyncWrite + 'a> Encoder<'a, TokioWriter<T>> {
     pub async fn from_tokio(
         output: T,
         metadata: &Metadata,
+        version: format::FormatVersion,
     ) -> io::Result<Encoder<'a, TokioWriter<T>>> {
-        Encoder::new(TokioWriter::new(output), metadata).await
+        Encoder::new(TokioWriter::new(output), metadata, version).await
     }
 }
 
@@ -46,9 +47,13 @@ impl<'a> Encoder<'a, TokioWriter<tokio::fs::File>> {
 
 impl<'a, T: SeqWrite + 'a> Encoder<'a, T> {
     /// Create an asynchronous encoder for an output implementing our internal write interface.
-    pub async fn new(output: T, metadata: &Metadata) -> io::Result<Encoder<'a, T>> {
+    pub async fn new(
+        output: T,
+        metadata: &Metadata,
+        version: format::FormatVersion,
+    ) -> io::Result<Encoder<'a, T>> {
         Ok(Self {
-            inner: encoder::EncoderImpl::new(output.into(), metadata).await?,
+            inner: encoder::EncoderImpl::new(output.into(), metadata, version).await?,
         })
     }
 
@@ -299,6 +304,7 @@ mod test {
     use std::task::{Context, Poll};
 
     use super::Encoder;
+    use crate::format;
     use crate::Metadata;
 
     struct DummyOutput;
@@ -321,9 +327,13 @@ mod test {
     /// Assert that `Encoder` is `Send`
     fn send_test() {
         let test = async {
-            let mut encoder = Encoder::new(DummyOutput, &Metadata::dir_builder(0o700).build())
-                .await
-                .unwrap();
+            let mut encoder = Encoder::new(
+                DummyOutput,
+                &Metadata::dir_builder(0o700).build(),
+                format::FormatVersion::Default,
+            )
+            .await
+            .unwrap();
             {
                 let mut dir = encoder
                     .create_directory("baba", &Metadata::dir_builder(0o700).build())
diff --git a/src/encoder/mod.rs b/src/encoder/mod.rs
index c33b2c3..b3c1a89 100644
--- a/src/encoder/mod.rs
+++ b/src/encoder/mod.rs
@@ -247,6 +247,7 @@ pub async fn encoded_size(filename: &std::ffi::CStr, metadata: &Metadata) -> io:
         file_copy_buffer: Arc::new(Mutex::new(unsafe {
             crate::util::vec_new_uninitialized(1024 * 1024)
         })),
+        version: format::FormatVersion::Default,
     };
 
     this.start_file_do(Some(metadata), filename.to_bytes())
@@ -356,6 +357,8 @@ pub(crate) struct EncoderImpl<'a, T: SeqWrite + 'a> {
     /// Since only the "current" entry can be actively writing files, we share the file copy
     /// buffer.
     file_copy_buffer: Arc<Mutex<Vec<u8>>>,
+    /// Pxar format version to encode
+    version: format::FormatVersion,
 }
 
 impl<'a, T: SeqWrite + 'a> Drop for EncoderImpl<'a, T> {
@@ -377,6 +380,7 @@ impl<'a, T: SeqWrite + 'a> EncoderImpl<'a, T> {
     pub async fn new(
         output: EncoderOutput<'a, T>,
         metadata: &Metadata,
+        version: format::FormatVersion,
     ) -> io::Result<EncoderImpl<'a, T>> {
         if !metadata.is_dir() {
             io_bail!("directory metadata must contain the directory mode flag");
@@ -389,8 +393,10 @@ impl<'a, T: SeqWrite + 'a> EncoderImpl<'a, T> {
             file_copy_buffer: Arc::new(Mutex::new(unsafe {
                 crate::util::vec_new_uninitialized(1024 * 1024)
             })),
+            version,
         };
 
+        this.encode_format_version().await?;
         this.encode_metadata(metadata).await?;
         this.state.files_offset = this.position();
 
@@ -509,6 +515,9 @@ impl<'a, T: SeqWrite + 'a> EncoderImpl<'a, T> {
         appendix_ref_offset: AppendixRefOffset,
         file_size: u64,
     ) -> io::Result<()> {
+        if self.version == format::FormatVersion::Default {
+            io_bail!("unable to add appendix reference for default format version");
+        }
         self.check()?;
 
         let offset = self.position();
@@ -544,6 +553,9 @@ impl<'a, T: SeqWrite + 'a> EncoderImpl<'a, T> {
         &mut self,
         full_size: AppendixRefOffset,
     ) -> io::Result<AppendixStartOffset> {
+        if self.version == format::FormatVersion::Default {
+            io_bail!("unable to add appendix for default format version");
+        }
         self.check()?;
 
         let data = &full_size.raw().to_le_bytes().to_vec();
@@ -740,6 +752,7 @@ impl<'a, T: SeqWrite + 'a> EncoderImpl<'a, T> {
             parent: Some(&mut self.state),
             finished: false,
             file_copy_buffer,
+            version: self.version.clone(),
         })
     }
 
@@ -755,6 +768,22 @@ impl<'a, T: SeqWrite + 'a> EncoderImpl<'a, T> {
         Ok(())
     }
 
+    async fn encode_format_version(&mut self) -> io::Result<()> {
+        if self.state.write_position != 0 {
+            io_bail!("format version must be encoded at the beginning of an archive");
+        }
+
+        let version = match self.version {
+            format::FormatVersion::Default => return Ok(()),
+            format::FormatVersion::V2 => format::PXAR_FORMAT_VERSION_2,
+        };
+
+        let header = format::Header::with_content_size(version, 0);
+        header.check_header_size()?;
+
+        seq_write_struct(self.output.as_mut(), header, &mut self.state.write_position).await
+    }
+
     async fn encode_metadata(&mut self, metadata: &Metadata) -> io::Result<()> {
         seq_write_pxar_struct_entry(
             self.output.as_mut(),
diff --git a/src/encoder/sync.rs b/src/encoder/sync.rs
index 5ede554..f25afb7 100644
--- a/src/encoder/sync.rs
+++ b/src/encoder/sync.rs
@@ -28,7 +28,11 @@ impl<'a, T: io::Write + 'a> Encoder<'a, StandardWriter<T>> {
     /// Encode a `pxar` archive into a regular `std::io::Write` output.
     #[inline]
     pub fn from_std(output: T, metadata: &Metadata) -> io::Result<Encoder<'a, StandardWriter<T>>> {
-        Encoder::new(StandardWriter::new(output), metadata)
+        Encoder::new(
+            StandardWriter::new(output),
+            metadata,
+            format::FormatVersion::Default,
+        )
     }
 }
 
@@ -41,6 +45,7 @@ impl<'a> Encoder<'a, StandardWriter<std::fs::File>> {
         Encoder::new(
             StandardWriter::new(std::fs::File::create(path.as_ref())?),
             metadata,
+            format::FormatVersion::Default,
         )
     }
 }
@@ -50,9 +55,9 @@ impl<'a, T: SeqWrite + 'a> Encoder<'a, T> {
     ///
     /// Note that the `output`'s `SeqWrite` implementation must always return `Poll::Ready` and is
     /// not allowed to use the `Waker`, as this will cause a `panic!`.
-    pub fn new(output: T, metadata: &Metadata) -> io::Result<Self> {
+    pub fn new(output: T, metadata: &Metadata, version: format::FormatVersion) -> io::Result<Self> {
         Ok(Self {
-            inner: poll_result_once(encoder::EncoderImpl::new(output.into(), metadata))?,
+            inner: poll_result_once(encoder::EncoderImpl::new(output.into(), metadata, version))?,
         })
     }
 
diff --git a/src/format/mod.rs b/src/format/mod.rs
index 8016ab1..7bffe98 100644
--- a/src/format/mod.rs
+++ b/src/format/mod.rs
@@ -44,7 +44,7 @@
 //!   * final goodbye table
 //!   * `APPENDIX_TAIL`     -- marks the end of an archive containing an APPENDIX section
 
-use std::cmp::Ordering;
+use std::cmp::{Ordering, PartialEq};
 use std::ffi::{CStr, OsStr};
 use std::fmt;
 use std::fmt::Display;
@@ -88,6 +88,8 @@ pub mod mode {
 }
 
 // Generated by `cargo run --example mk-format-hashes`
+/// Pxar format version 2 entry, fallback to version 1 if not present
+pub const PXAR_FORMAT_VERSION_2: u64 = 0xa0c3af8478917dbb;
 /// Beginning of an entry (current version).
 pub const PXAR_ENTRY: u64 = 0xd5956474e588acef;
 /// Previous version of the entry struct
@@ -118,6 +120,13 @@ pub const PXAR_GOODBYE_TAIL_MARKER: u64 = 0xef5eed5b753e1555;
 /// Marks the end of an archive containing an appendix section
 pub const PXAR_APPENDIX_TAIL: u64 = 0x5b1b9abb7ae454f1;
 
+#[derive(Clone, Default, PartialEq)]
+pub enum FormatVersion {
+    #[default]
+    Default,
+    V2,
+}
+
 #[derive(Debug, Endian)]
 #[repr(C)]
 pub struct Header {
-- 
2.39.2

* [pbs-devel] [PATCH v5 proxmox-backup 09/28] fix #3174: index: add fn index list from start/end-offsets
  2023-11-15 15:47 [pbs-devel] [PATCH-SERIES v5 pxar proxmox-backup proxmox-widget-toolkit 00/28] fix #3174: improve file-level backup Christian Ebner
                   ` (7 preceding siblings ...)
  2023-11-15 15:47 ` [pbs-devel] [PATCH v5 pxar 8/28] fix #3174: enc/dec: introduce pxar format version 2 Christian Ebner
@ 2023-11-15 15:47 ` Christian Ebner
  2023-11-15 15:47 ` [pbs-devel] [PATCH v5 proxmox-backup 10/28] fix #3174: index: add fn digest for DynamicEntry Christian Ebner
                   ` (18 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: Christian Ebner @ 2023-11-15 15:47 UTC (permalink / raw)
  To: pbs-devel

Adds a function to get the list of DynamicEntry's from a chunk index
covering a given start and end offset, such that the requested byte
range is fully contained within the returned chunks.
This is needed in order to reference file payloads and reuse the
chunks containing them from a previous backup run.

The index entries are normalized, meaning each entry's end() is equal
to the chunk size.

In addition to the list of index entries, the padding from the start
of the first chunk to the requested start offset is returned. This is
needed for the calculation of the appendix section offsets during
decoding.
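
A minimal sketch of how a caller might use the new helper and of the
invariant between the normalized entries and the two paddings, assuming
the pbs-datastore types as introduced below:

    use anyhow::Error;
    use pbs_datastore::dynamic_index::{DynamicEntry, DynamicIndexReader};

    fn reusable_chunks(
        index: &DynamicIndexReader,
        start_offset: u64,
        end_offset: u64,
    ) -> Result<Vec<DynamicEntry>, Error> {
        let (entries, padding_start, padding_end) = index.indices(start_offset, end_offset)?;
        // Entries are normalized, so end() equals each chunk's size and the
        // concatenated chunks cover the requested range plus both paddings.
        let total: u64 = entries.iter().map(|entry| entry.end()).sum();
        assert_eq!(total, (end_offset - start_offset) + padding_start + padding_end);
        Ok(entries)
    }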

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
Changes since version 4:
- no changes

Changes since version 3:
- also return the padding from the requested end offset to the end of
  the last chunk, needed to easily get the full chunk size

Changes since version 2:
- no changes

Changes since version 1:
- remove the total size calculation, as it is not needed

 pbs-datastore/src/dynamic_index.rs | 33 ++++++++++++++++++++++++++++++
 1 file changed, 33 insertions(+)

diff --git a/pbs-datastore/src/dynamic_index.rs b/pbs-datastore/src/dynamic_index.rs
index 71a5082e..c2bf44eb 100644
--- a/pbs-datastore/src/dynamic_index.rs
+++ b/pbs-datastore/src/dynamic_index.rs
@@ -188,6 +188,39 @@ impl DynamicIndexReader {
             self.binary_search(middle_idx + 1, middle_end, end_idx, end, offset)
         }
     }
+
+    /// List of chunk indices containing the data from start_offset to end_offset
+    pub fn indices(
+        &self,
+        start_offset: u64,
+        end_offset: u64,
+    ) -> Result<(Vec<DynamicEntry>, u64, u64), Error> {
+        let end_idx = self.index.len() - 1;
+        let chunk_end = self.chunk_end(end_idx);
+        let start = self.binary_search(0, 0, end_idx, chunk_end, start_offset)?;
+        let end = self.binary_search(0, 0, end_idx, chunk_end, end_offset)?;
+
+        let offset_first = if start == 0 {
+            0
+        } else {
+            self.index[start - 1].end()
+        };
+
+        let padding_start = start_offset - offset_first;
+        let padding_end = self.index[end].end() - end_offset;
+
+        let mut indices = Vec::new();
+        let mut prev_end = offset_first;
+        for dynamic_entry in &self.index[start..end + 1] {
+            let mut dynamic_entry = dynamic_entry.clone();
+            // Normalize end to be equal to size
+            dynamic_entry.end_le -= prev_end;
+            prev_end += dynamic_entry.end_le;
+            indices.push(dynamic_entry);
+        }
+
+        Ok((indices, padding_start, padding_end))
+    }
 }
 
 impl IndexFile for DynamicIndexReader {
-- 
2.39.2

* [pbs-devel] [PATCH v5 proxmox-backup 10/28] fix #3174: index: add fn digest for DynamicEntry
  2023-11-15 15:47 [pbs-devel] [PATCH-SERIES v5 pxar proxmox-backup proxmox-widget-toolkit 00/28] fix #3174: improve file-level backup Christian Ebner
                   ` (8 preceding siblings ...)
  2023-11-15 15:47 ` [pbs-devel] [PATCH v5 proxmox-backup 09/28] fix #3174: index: add fn index list from start/end-offsets Christian Ebner
@ 2023-11-15 15:47 ` Christian Ebner
  2023-11-15 15:47 ` [pbs-devel] [PATCH v5 proxmox-backup 11/28] fix #3174: api: double catalog upload size Christian Ebner
                   ` (17 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: Christian Ebner @ 2023-11-15 15:47 UTC (permalink / raw)
  To: pbs-devel

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
Changes since version 4:
- no changes

Changes since version 3:
- no changes

Changes since version 2:
- no changes

Changes since version 1:
- no changes

 pbs-datastore/src/dynamic_index.rs | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/pbs-datastore/src/dynamic_index.rs b/pbs-datastore/src/dynamic_index.rs
index c2bf44eb..51269250 100644
--- a/pbs-datastore/src/dynamic_index.rs
+++ b/pbs-datastore/src/dynamic_index.rs
@@ -72,6 +72,11 @@ impl DynamicEntry {
     pub fn end(&self) -> u64 {
         u64::from_le(self.end_le)
     }
+
+    #[inline]
+    pub fn digest(&self) -> [u8; 32] {
+        self.digest.clone()
+    }
 }
 
 pub struct DynamicIndexReader {
-- 
2.39.2

* [pbs-devel] [PATCH v5 proxmox-backup 11/28] fix #3174: api: double catalog upload size
  2023-11-15 15:47 [pbs-devel] [PATCH-SERIES v5 pxar proxmox-backup proxmox-widget-toolkit 00/28] fix #3174: improve file-level backup Christian Ebner
                   ` (9 preceding siblings ...)
  2023-11-15 15:47 ` [pbs-devel] [PATCH v5 proxmox-backup 10/28] fix #3174: index: add fn digest for DynamicEntry Christian Ebner
@ 2023-11-15 15:47 ` Christian Ebner
  2023-11-15 15:47 ` [pbs-devel] [PATCH v5 proxmox-backup 12/28] fix #3174: catalog: introduce extended format v2 Christian Ebner
                   ` (16 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: Christian Ebner @ 2023-11-15 15:47 UTC (permalink / raw)
  To: pbs-devel

Double the maximum catalog upload size, as the catalog now has to
store the additional archive offset information required for
referencing file payloads in an appendix when the catalog is used to
perform backups without re-reading file payloads.

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
Changes since version 4:
- no changes

Changes since version 3:
- no changes

Changes since version 2:
- no changes

Changes since version 1:
- no changes

 src/api2/backup/upload_chunk.rs | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/api2/backup/upload_chunk.rs b/src/api2/backup/upload_chunk.rs
index 20259660..84485bfd 100644
--- a/src/api2/backup/upload_chunk.rs
+++ b/src/api2/backup/upload_chunk.rs
@@ -126,7 +126,7 @@ pub const API_METHOD_UPLOAD_FIXED_CHUNK: ApiMethod = ApiMethod::new(
                 false,
                 &IntegerSchema::new("Chunk size.")
                     .minimum(1)
-                    .maximum(1024 * 1024 * 16)
+                    .maximum(1024 * 1024 * 32)
                     .schema()
             ),
             (
@@ -135,7 +135,7 @@ pub const API_METHOD_UPLOAD_FIXED_CHUNK: ApiMethod = ApiMethod::new(
                 &IntegerSchema::new("Encoded chunk size.")
                     .minimum((std::mem::size_of::<DataBlobHeader>() as isize) + 1)
                     .maximum(
-                        1024 * 1024 * 16
+                        1024 * 1024 * 32
                             + (std::mem::size_of::<EncryptedDataBlobHeader>() as isize)
                     )
                     .schema()
-- 
2.39.2

* [pbs-devel] [PATCH v5 proxmox-backup 12/28] fix #3174: catalog: introduce extended format v2
  2023-11-15 15:47 [pbs-devel] [PATCH-SERIES v5 pxar proxmox-backup proxmox-widget-toolkit 00/28] fix #3174: improve file-level backup Christian Ebner
                   ` (10 preceding siblings ...)
  2023-11-15 15:47 ` [pbs-devel] [PATCH v5 proxmox-backup 11/28] fix #3174: api: double catalog upload size Christian Ebner
@ 2023-11-15 15:47 ` Christian Ebner
  2023-11-15 15:47 ` [pbs-devel] [PATCH v5 proxmox-backup 13/28] fix #3174: archiver/extractor: impl appendix ref Christian Ebner
                   ` (15 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: Christian Ebner @ 2023-11-15 15:47 UTC (permalink / raw)
  To: pbs-devel

Increments the catalog file format to version 2. The new catalog
format introduces an extension to file entries in order to store the
additional ctime and pxar archive file offset, needed for metadata
based file change detection and pxar file entry reuse via the pxar
appendix section.

ctime is included in order to also detect file status changes since
the last backup run, e.g. caused by updated extended attributes, which
do not change mtime.

Inclusion of the pxar archive file offset allows to calculate the size
of a file entry in the archive, needed for re-indexing the chunks
containing that entry, as well as its offset relative to the appendix
start, needed to calculate the bytes to skip over during sequential
decoding while restoring the pxar archive.
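
For reference, a sketch of the resulting on-disk layout of a version 2
file entry, derived from the encoder changes below; all integers use
the catalog's variable-length encoding:

    b'f'          entry type
    name_len      length of the file name
    name          file name bytes
    size          file size
    mtime         modification time
    ctime         status change time     (new in version 2)
    file_offset   pxar archive offset    (new in version 2)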

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
Changes since version 4:
- no changes

Changes since version 3:
- no changes

Changes since version 2:
- no changes

Changes since version 1:
- not present in previous version

 pbs-client/src/pxar/create.rs                 |  31 +-
 pbs-datastore/src/catalog.rs                  | 323 ++++++++++++++----
 pbs-datastore/src/file_formats.rs             |   3 +
 .../src/proxmox_restore_daemon/api.rs         |   6 +-
 4 files changed, 282 insertions(+), 81 deletions(-)

diff --git a/pbs-client/src/pxar/create.rs b/pbs-client/src/pxar/create.rs
index e7053d9e..a2338218 100644
--- a/pbs-client/src/pxar/create.rs
+++ b/pbs-client/src/pxar/create.rs
@@ -390,12 +390,6 @@ impl Archiver {
         patterns_count: usize,
     ) -> Result<(), Error> {
         let content = generate_pxar_excludes_cli(&self.patterns[..patterns_count]);
-        if let Some(ref catalog) = self.catalog {
-            catalog
-                .lock()
-                .unwrap()
-                .add_file(file_name, content.len() as u64, 0)?;
-        }
 
         let mut metadata = Metadata::default();
         metadata.stat.mode = pxar::format::mode::IFREG | 0o600;
@@ -405,6 +399,14 @@ impl Archiver {
             .await?;
         file.write_all(&content).await?;
 
+        if let Some(ref catalog) = self.catalog {
+            let link_offset = file.file_offset();
+            catalog
+                .lock()
+                .unwrap()
+                .add_file(file_name, content.len() as u64, 0, 0, link_offset)?;
+        }
+
         Ok(())
     }
 
@@ -572,17 +574,20 @@ impl Archiver {
                 }
 
                 let file_size = stat.st_size as u64;
-                if let Some(ref catalog) = self.catalog {
-                    catalog
-                        .lock()
-                        .unwrap()
-                        .add_file(c_file_name, file_size, stat.st_mtime)?;
-                }
-
                 let offset: LinkOffset = self
                     .add_regular_file(encoder, fd, file_name, &metadata, file_size)
                     .await?;
 
+                if let Some(ref catalog) = self.catalog {
+                    catalog.lock().unwrap().add_file(
+                        c_file_name,
+                        file_size,
+                        stat.st_mtime,
+                        stat.st_ctime,
+                        offset,
+                    )?;
+                }
+
                 if stat.st_nlink > 1 {
                     self.hardlinks
                         .insert(link_info, (self.path.clone(), offset));
diff --git a/pbs-datastore/src/catalog.rs b/pbs-datastore/src/catalog.rs
index eb531837..746b493b 100644
--- a/pbs-datastore/src/catalog.rs
+++ b/pbs-datastore/src/catalog.rs
@@ -11,7 +11,7 @@ use pathpatterns::{MatchList, MatchType};
 use proxmox_io::ReadExt;
 use proxmox_schema::api;
 
-use crate::file_formats::PROXMOX_CATALOG_FILE_MAGIC_1_0;
+use crate::file_formats::{PROXMOX_CATALOG_FILE_MAGIC_1_0, PROXMOX_CATALOG_FILE_MAGIC_2_0};
 
 /// Trait for writing file list catalogs.
 ///
@@ -20,7 +20,14 @@ use crate::file_formats::PROXMOX_CATALOG_FILE_MAGIC_1_0;
 pub trait BackupCatalogWriter {
     fn start_directory(&mut self, name: &CStr) -> Result<(), Error>;
     fn end_directory(&mut self) -> Result<(), Error>;
-    fn add_file(&mut self, name: &CStr, size: u64, mtime: i64) -> Result<(), Error>;
+    fn add_file(
+        &mut self,
+        name: &CStr,
+        size: u64,
+        mtime: i64,
+        ctime: i64,
+        file_offset: pxar::encoder::LinkOffset,
+    ) -> Result<(), Error>;
     fn add_symlink(&mut self, name: &CStr) -> Result<(), Error>;
     fn add_hardlink(&mut self, name: &CStr) -> Result<(), Error>;
     fn add_block_device(&mut self, name: &CStr) -> Result<(), Error>;
@@ -81,6 +88,21 @@ impl fmt::Display for CatalogEntryType {
     }
 }
 
+#[derive(Clone, Copy, Debug, Default, Eq, PartialEq, Ord, PartialOrd)]
+pub struct FileOffset {
+    offset: u64,
+}
+
+impl FileOffset {
+    pub fn raw(&self) -> u64 {
+        self.offset
+    }
+}
+#[derive(Clone, Copy, Debug, Eq, PartialEq, Ord, PartialOrd)]
+pub enum Offset {
+    FileOffset { offset: u64 },
+}
+
 /// Represents a named directory entry
 ///
 /// The ``attr`` property contain the exact type with type specific
@@ -91,11 +113,23 @@ pub struct DirEntry {
     pub attr: DirEntryAttribute,
 }
 
+#[derive(Clone, Debug, PartialEq, Eq)]
+pub struct CatalogV2Extension {
+    pub ctime: i64,
+    pub file_offset: FileOffset,
+}
+
 /// Used to specify additional attributes inside DirEntry
 #[derive(Clone, Debug, PartialEq, Eq)]
 pub enum DirEntryAttribute {
-    Directory { start: u64 },
-    File { size: u64, mtime: i64 },
+    Directory {
+        start: u64,
+    },
+    File {
+        size: u64,
+        mtime: i64,
+        extension: Option<CatalogV2Extension>,
+    },
     Symlink,
     Hardlink,
     BlockDevice,
@@ -105,40 +139,63 @@ pub enum DirEntryAttribute {
 }
 
 impl DirEntry {
-    fn new(etype: CatalogEntryType, name: Vec<u8>, start: u64, size: u64, mtime: i64) -> Self {
-        match etype {
-            CatalogEntryType::Directory => DirEntry {
+    fn new(
+        etype: CatalogEntryType,
+        name: Vec<u8>,
+        start: u64,
+        size: u64,
+        mtime: i64,
+        ctime: i64,
+        offset: Option<Offset>,
+    ) -> Self {
+        match (etype, offset) {
+            (CatalogEntryType::Directory, None) => DirEntry {
                 name,
                 attr: DirEntryAttribute::Directory { start },
             },
-            CatalogEntryType::File => DirEntry {
-                name,
-                attr: DirEntryAttribute::File { size, mtime },
-            },
-            CatalogEntryType::Symlink => DirEntry {
+            (CatalogEntryType::File, offset) => {
+                let extension = if let Some(Offset::FileOffset { offset }) = offset {
+                    Some(CatalogV2Extension {
+                        ctime,
+                        file_offset: FileOffset { offset },
+                    })
+                } else {
+                    None
+                };
+                DirEntry {
+                    name,
+                    attr: DirEntryAttribute::File {
+                        size,
+                        mtime,
+                        extension,
+                    },
+                }
+            }
+            (CatalogEntryType::Symlink, None) => DirEntry {
                 name,
                 attr: DirEntryAttribute::Symlink,
             },
-            CatalogEntryType::Hardlink => DirEntry {
+            (CatalogEntryType::Hardlink, None) => DirEntry {
                 name,
                 attr: DirEntryAttribute::Hardlink,
             },
-            CatalogEntryType::BlockDevice => DirEntry {
+            (CatalogEntryType::BlockDevice, None) => DirEntry {
                 name,
                 attr: DirEntryAttribute::BlockDevice,
             },
-            CatalogEntryType::CharDevice => DirEntry {
+            (CatalogEntryType::CharDevice, None) => DirEntry {
                 name,
                 attr: DirEntryAttribute::CharDevice,
             },
-            CatalogEntryType::Fifo => DirEntry {
+            (CatalogEntryType::Fifo, None) => DirEntry {
                 name,
                 attr: DirEntryAttribute::Fifo,
             },
-            CatalogEntryType::Socket => DirEntry {
+            (CatalogEntryType::Socket, None) => DirEntry {
                 name,
                 attr: DirEntryAttribute::Socket,
             },
+            _ => panic!("unexpected parameters '{etype}' and '{offset:?}'"),
         }
     }
 
@@ -197,13 +254,22 @@ impl DirInfo {
             }
             DirEntry {
                 name,
-                attr: DirEntryAttribute::File { size, mtime },
+                attr:
+                    DirEntryAttribute::File {
+                        size,
+                        mtime,
+                        extension,
+                    },
             } => {
                 writer.write_all(&[CatalogEntryType::File as u8])?;
                 catalog_encode_u64(writer, name.len() as u64)?;
                 writer.write_all(name)?;
                 catalog_encode_u64(writer, *size)?;
                 catalog_encode_i64(writer, *mtime)?;
+                if let Some(CatalogV2Extension { ctime, file_offset }) = extension {
+                    catalog_encode_i64(writer, *ctime)?;
+                    catalog_encode_u64(writer, file_offset.raw())?;
+                }
             }
             DirEntry {
                 name,
@@ -271,8 +337,11 @@ impl DirInfo {
         Ok((self.name, data))
     }
 
-    fn parse<C: FnMut(CatalogEntryType, &[u8], u64, u64, i64) -> Result<bool, Error>>(
+    fn parse<
+        C: FnMut(CatalogEntryType, &[u8], u64, u64, i64, i64, Option<Offset>) -> Result<bool, Error>,
+    >(
         data: &[u8],
+        catalog_version: Option<[u8; 8]>,
         mut callback: C,
     ) -> Result<(), Error> {
         let mut cursor = data;
@@ -300,14 +369,28 @@ impl DirInfo {
             let cont = match etype {
                 CatalogEntryType::Directory => {
                     let offset = catalog_decode_u64(&mut cursor)?;
-                    callback(etype, name, offset, 0, 0)?
+                    callback(etype, name, offset, 0, 0, 0, None)?
                 }
                 CatalogEntryType::File => {
                     let size = catalog_decode_u64(&mut cursor)?;
                     let mtime = catalog_decode_i64(&mut cursor)?;
-                    callback(etype, name, 0, size, mtime)?
+                    let (ctime, offset) = if let Some(version) = catalog_version {
+                        let mut ctime = 0;
+                        let mut offset = None;
+                        if version == PROXMOX_CATALOG_FILE_MAGIC_2_0 {
+                            ctime = catalog_decode_i64(&mut cursor)?;
+                            let file_offset = catalog_decode_u64(&mut cursor)?;
+                            offset = Some(Offset::FileOffset {
+                                offset: file_offset,
+                            })
+                        }
+                        (ctime, offset)
+                    } else {
+                        (0, None)
+                    };
+                    callback(etype, name, 0, size, mtime, ctime, offset)?
                 }
-                _ => callback(etype, name, 0, 0, 0)?,
+                _ => callback(etype, name, 0, 0, 0, 0, None)?,
             };
             if !cont {
                 return Ok(());
@@ -342,7 +425,7 @@ impl<W: Write> CatalogWriter<W> {
             dirstack: vec![DirInfo::new_rootdir()],
             pos: 0,
         };
-        me.write_all(&PROXMOX_CATALOG_FILE_MAGIC_1_0)?;
+        me.write_all(&PROXMOX_CATALOG_FILE_MAGIC_2_0)?;
         Ok(me)
     }
 
@@ -407,15 +490,29 @@ impl<W: Write> BackupCatalogWriter for CatalogWriter<W> {
         Ok(())
     }
 
-    fn add_file(&mut self, name: &CStr, size: u64, mtime: i64) -> Result<(), Error> {
+    fn add_file(
+        &mut self,
+        name: &CStr,
+        size: u64,
+        mtime: i64,
+        ctime: i64,
+        file_offset: pxar::encoder::LinkOffset,
+    ) -> Result<(), Error> {
         let dir = self
             .dirstack
             .last_mut()
             .ok_or_else(|| format_err!("outside root"))?;
         let name = name.to_bytes().to_vec();
+        let file_offset = FileOffset {
+            offset: file_offset.raw(),
+        };
         dir.entries.push(DirEntry {
             name,
-            attr: DirEntryAttribute::File { size, mtime },
+            attr: DirEntryAttribute::File {
+                size,
+                mtime,
+                extension: Some(CatalogV2Extension { ctime, file_offset }),
+            },
         });
         Ok(())
     }
@@ -502,12 +599,16 @@ impl<W: Write> BackupCatalogWriter for CatalogWriter<W> {
 /// Read Catalog files
 pub struct CatalogReader<R> {
     reader: R,
+    magic: Option<[u8; 8]>,
 }
 
 impl<R: Read + Seek> CatalogReader<R> {
     /// Create a new CatalogReader instance
     pub fn new(reader: R) -> Self {
-        Self { reader }
+        Self {
+            reader,
+            magic: None,
+        }
     }
 
     /// Print whole catalog to stdout
@@ -528,8 +629,11 @@ impl<R: Read + Seek> CatalogReader<R> {
         self.reader.seek(SeekFrom::Start(0))?;
         let mut magic = [0u8; 8];
         self.reader.read_exact(&mut magic)?;
-        if magic != PROXMOX_CATALOG_FILE_MAGIC_1_0 {
-            bail!("got unexpected magic number for catalog");
+        match magic {
+            PROXMOX_CATALOG_FILE_MAGIC_1_0 | PROXMOX_CATALOG_FILE_MAGIC_2_0 => {
+                self.magic = Some(magic)
+            }
+            _ => bail!("got unexpected magic number for catalog"),
         }
         self.reader.seek(SeekFrom::End(-8))?;
         let start = unsafe { self.reader.read_le_value::<u64>()? };
@@ -550,11 +654,23 @@ impl<R: Read + Seek> CatalogReader<R> {
 
         let mut entry_list = Vec::new();
 
-        DirInfo::parse(&data, |etype, name, offset, size, mtime| {
-            let entry = DirEntry::new(etype, name.to_vec(), start - offset, size, mtime);
-            entry_list.push(entry);
-            Ok(true)
-        })?;
+        DirInfo::parse(
+            &data,
+            self.magic,
+            |etype, name, offset, size, mtime, ctime, link_offset| {
+                let entry = DirEntry::new(
+                    etype,
+                    name.to_vec(),
+                    start - offset,
+                    size,
+                    mtime,
+                    ctime,
+                    link_offset,
+                );
+                entry_list.push(entry);
+                Ok(true)
+            },
+        )?;
 
         Ok(entry_list)
     }
@@ -600,15 +716,27 @@ impl<R: Read + Seek> CatalogReader<R> {
         let data = self.read_raw_dirinfo_block(start)?;
 
         let mut item = None;
-        DirInfo::parse(&data, |etype, name, offset, size, mtime| {
-            if name != filename {
-                return Ok(true);
-            }
+        DirInfo::parse(
+            &data,
+            self.magic,
+            |etype, name, offset, size, mtime, ctime, link_offset| {
+                if name != filename {
+                    return Ok(true);
+                }
 
-            let entry = DirEntry::new(etype, name.to_vec(), start - offset, size, mtime);
-            item = Some(entry);
-            Ok(false) // stop parsing
-        })?;
+                let entry = DirEntry::new(
+                    etype,
+                    name.to_vec(),
+                    start - offset,
+                    size,
+                    mtime,
+                    ctime,
+                    link_offset,
+                );
+                item = Some(entry);
+                Ok(false) // stop parsing
+            },
+        )?;
 
         Ok(item)
     }
@@ -628,35 +756,51 @@ impl<R: Read + Seek> CatalogReader<R> {
     pub fn dump_dir(&mut self, prefix: &std::path::Path, start: u64) -> Result<(), Error> {
         let data = self.read_raw_dirinfo_block(start)?;
 
-        DirInfo::parse(&data, |etype, name, offset, size, mtime| {
-            let mut path = std::path::PathBuf::from(prefix);
-            let name: &OsStr = OsStrExt::from_bytes(name);
-            path.push(name);
-
-            match etype {
-                CatalogEntryType::Directory => {
-                    log::info!("{} {:?}", etype, path);
-                    if offset > start {
-                        bail!("got wrong directory offset ({} > {})", offset, start);
+        DirInfo::parse(
+            &data,
+            self.magic,
+            |etype, name, offset, size, mtime, ctime, link_offset| {
+                let mut path = std::path::PathBuf::from(prefix);
+                let name: &OsStr = OsStrExt::from_bytes(name);
+                path.push(name);
+
+                match etype {
+                    CatalogEntryType::Directory => {
+                        log::info!("{} {:?}", etype, path);
+                        if offset > start {
+                            bail!("got wrong directory offset ({} > {})", offset, start);
+                        }
+                        let pos = start - offset;
+                        self.dump_dir(&path, pos)?;
                     }
-                    let pos = start - offset;
-                    self.dump_dir(&path, pos)?;
-                }
-                CatalogEntryType::File => {
-                    let mut mtime_string = mtime.to_string();
-                    if let Ok(s) = proxmox_time::strftime_local("%FT%TZ", mtime) {
-                        mtime_string = s;
+                    CatalogEntryType::File => {
+                        let mut mtime_string = mtime.to_string();
+                        let mut ctime_string = ctime.to_string();
+                        if let Ok(s) = proxmox_time::strftime_local("%FT%TZ", mtime) {
+                            mtime_string = s;
+                        }
+                        if let Ok(s) = proxmox_time::strftime_local("%FT%TZ", ctime) {
+                            ctime_string = s;
+                        }
+
+                        log::info!(
+                            "{} {:?} {} {} {} {:?}",
+                            etype,
+                            path,
+                            size,
+                            mtime_string,
+                            ctime_string,
+                            link_offset
+                        );
+                    }
+                    _ => {
+                        log::info!("{} {:?}", etype, path);
                     }
-
-                    log::info!("{} {:?} {} {}", etype, path, size, mtime_string,);
-                }
-                _ => {
-                    log::info!("{} {:?}", etype, path);
                 }
-            }
 
-            Ok(true)
-        })
+                Ok(true)
+            },
+        )
     }
 
     /// Finds all entries matching the given match patterns and calls the
@@ -705,9 +849,24 @@ impl<R: Read + Seek> CatalogReader<R> {
             components.push(b'/');
             components.extend(&direntry.name);
             let mut entry = ArchiveEntry::new(&components, Some(&direntry.attr));
-            if let DirEntryAttribute::File { size, mtime } = direntry.attr {
+            if let DirEntryAttribute::File {
+                size,
+                mtime,
+                extension,
+            } = direntry.attr
+            {
                 entry.size = size.into();
                 entry.mtime = mtime.into();
+                entry.ctime = None;
+                entry.file_offset = None;
+                if let Some(CatalogV2Extension {
+                    ctime,
+                    file_offset: FileOffset { offset },
+                }) = extension
+                {
+                    entry.ctime = ctime.into();
+                    entry.file_offset = offset.into();
+                }
             }
             res.push(entry);
         }
@@ -916,6 +1075,12 @@ pub struct ArchiveEntry {
     /// The file "last modified" time stamp, if entry_type is 'f' (file)
     #[serde(skip_serializing_if = "Option::is_none")]
     pub mtime: Option<i64>,
+    /// The file "last status change" time stamp, if entry_type is 'f' (file)
+    #[serde(skip_serializing_if = "Option::is_none")]
+    pub ctime: Option<i64>,
+    /// The file archive offset, if entry_type is 'f' (file)
+    #[serde(skip_serializing_if = "Option::is_none")]
+    pub file_offset: Option<u64>,
 }
 
 impl ArchiveEntry {
@@ -946,6 +1111,30 @@ impl ArchiveEntry {
                 Some(DirEntryAttribute::File { mtime, .. }) => Some(*mtime),
                 _ => None,
             },
+            ctime: match entry_type {
+                Some(DirEntryAttribute::File { extension, .. }) => {
+                    if let Some(CatalogV2Extension { ctime, .. }) = extension {
+                        Some(*ctime)
+                    } else {
+                        None
+                    }
+                }
+                _ => None,
+            },
+            file_offset: match entry_type {
+                Some(DirEntryAttribute::File { extension, .. }) => {
+                    if let Some(CatalogV2Extension {
+                        file_offset: FileOffset { offset },
+                        ..
+                    }) = extension
+                    {
+                        Some(*offset)
+                    } else {
+                        None
+                    }
+                }
+                _ => None,
+            },
         }
     }
 }
diff --git a/pbs-datastore/src/file_formats.rs b/pbs-datastore/src/file_formats.rs
index 73d67e20..4181f0f1 100644
--- a/pbs-datastore/src/file_formats.rs
+++ b/pbs-datastore/src/file_formats.rs
@@ -5,6 +5,9 @@ use endian_trait::Endian;
 // openssl::sha::sha256(b"Proxmox Backup Catalog file v1.0")[0..8]
 pub const PROXMOX_CATALOG_FILE_MAGIC_1_0: [u8; 8] = [145, 253, 96, 249, 196, 103, 88, 213];
 
+// openssl::sha::sha256(b"Proxmox Backup Catalog file v2.0")[0..8]
+pub const PROXMOX_CATALOG_FILE_MAGIC_2_0: [u8; 8] = [204, 223, 24, 211, 187, 125, 183, 226];
+
 // openssl::sha::sha256(b"Proxmox Backup uncompressed blob v1.0")[0..8]
 pub const UNCOMPRESSED_BLOB_MAGIC_1_0: [u8; 8] = [66, 171, 56, 7, 190, 131, 112, 161];
 
diff --git a/proxmox-restore-daemon/src/proxmox_restore_daemon/api.rs b/proxmox-restore-daemon/src/proxmox_restore_daemon/api.rs
index c4e97d33..fb12befa 100644
--- a/proxmox-restore-daemon/src/proxmox_restore_daemon/api.rs
+++ b/proxmox-restore-daemon/src/proxmox_restore_daemon/api.rs
@@ -24,7 +24,7 @@ use proxmox_sys::fs::read_subdir;
 
 use pbs_api_types::file_restore::{FileRestoreFormat, RestoreDaemonStatus};
 use pbs_client::pxar::{create_archive, Flags, PxarCreateOptions, ENCODER_MAX_ENTRIES};
-use pbs_datastore::catalog::{ArchiveEntry, DirEntryAttribute};
+use pbs_datastore::catalog::{ArchiveEntry, CatalogV2Extension, DirEntryAttribute, FileOffset};
 use pbs_tools::json::required_string_param;
 
 use pxar::encoder::aio::TokioWriter;
@@ -109,6 +109,10 @@ fn get_dir_entry(path: &Path) -> Result<DirEntryAttribute, Error> {
         libc::S_IFREG => DirEntryAttribute::File {
             size: stat.st_size as u64,
             mtime: stat.st_mtime,
+            extension: Some(CatalogV2Extension {
+                ctime: stat.st_ctime,
+                file_offset: FileOffset::default(),
+            }),
         },
         libc::S_IFDIR => DirEntryAttribute::Directory { start: 0 },
         _ => bail!("unsupported file type: {}", stat.st_mode),
-- 
2.39.2

* [pbs-devel] [PATCH v5 proxmox-backup 13/28] fix #3174: archiver/extractor: impl appendix ref
  2023-11-15 15:47 [pbs-devel] [PATCH-SERIES v5 pxar proxmox-backup proxmox-widget-toolkit 00/28] fix #3174: improve file-level backup Christian Ebner
                   ` (11 preceding siblings ...)
  2023-11-15 15:47 ` [pbs-devel] [PATCH v5 proxmox-backup 12/28] fix #3174: catalog: introduce extended format v2 Christian Ebner
@ 2023-11-15 15:47 ` Christian Ebner
  2023-11-15 15:47 ` [pbs-devel] [PATCH v5 proxmox-backup 14/28] fix #3174: catalog: add specialized Archive entry Christian Ebner
                   ` (14 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: Christian Ebner @ 2023-11-15 15:47 UTC (permalink / raw)
  To: pbs-devel

Implements the functionality to create and extract appendix references
via the pbs client.

This reuses the pxar encoder's functionality to write appendix
reference entries and adds the implementation to store and access them
in the catalog.
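
A hypothetical call site in the archiver, assuming the metadata
comparison found the file unchanged and `appendix_offset` as well as
`file_size` were taken from the previous catalog; the variable names
follow pbs-client/src/pxar/create.rs:

    // Write the reference into the pxar archive instead of re-encoding
    // the file payload ...
    self.add_appendix_ref(encoder, file_name, appendix_offset, file_size)
        .await?;
    // ... and record it in the new catalog, so the next run can find it.
    if let Some(ref catalog) = self.catalog {
        catalog.lock().unwrap().add_appendix_ref(
            c_file_name,
            file_size,
            stat.st_mtime,
            stat.st_ctime,
            appendix_offset,
        )?;
    }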

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
Changes since version 4:
- no changes

Changes since version 3:
- no changes

Changes since version 2:
- no changes

Changes since version 1:
- Use custom types for appendix ref archive offsets
- Include catalog implementation in this patch

 pbs-client/src/catalog_shell.rs |   2 +-
 pbs-client/src/pxar/create.rs   |  12 ++++
 pbs-client/src/pxar/extract.rs  |  16 +++++
 pbs-client/src/pxar/tools.rs    |   8 +++
 pbs-datastore/src/catalog.rs    | 103 ++++++++++++++++++++++++++++++++
 5 files changed, 140 insertions(+), 1 deletion(-)

diff --git a/pbs-client/src/catalog_shell.rs b/pbs-client/src/catalog_shell.rs
index f53b3cc5..99416d2f 100644
--- a/pbs-client/src/catalog_shell.rs
+++ b/pbs-client/src/catalog_shell.rs
@@ -1147,7 +1147,7 @@ impl<'a> ExtractorState<'a> {
             (_, DirEntryAttribute::Directory { .. }) => {
                 self.handle_new_directory(entry, match_result?).await?;
             }
-            (true, DirEntryAttribute::File { .. }) => {
+            (true, DirEntryAttribute::File { .. } | DirEntryAttribute::AppendixRef { .. }) => {
                 self.dir_stack.push(PathStackEntry::new(entry));
                 let file = Shell::walk_pxar_archive(self.accessor, &mut self.dir_stack).await?;
                 self.extract_file(file).await?;
diff --git a/pbs-client/src/pxar/create.rs b/pbs-client/src/pxar/create.rs
index a2338218..611d7421 100644
--- a/pbs-client/src/pxar/create.rs
+++ b/pbs-client/src/pxar/create.rs
@@ -735,6 +735,18 @@ impl Archiver {
         Ok(out.file_offset())
     }
 
+    async fn add_appendix_ref<T: SeqWrite + Send>(
+        &mut self,
+        encoder: &mut Encoder<'_, T>,
+        file_name: &Path,
+        appendix_offset: pxar::encoder::AppendixRefOffset,
+        file_size: u64,
+    ) -> Result<(), Error> {
+        Ok(encoder
+            .add_appendix_ref(file_name, appendix_offset, file_size)
+            .await?)
+    }
+
     async fn add_symlink<T: SeqWrite + Send>(
         &mut self,
         encoder: &mut Encoder<'_, T>,
diff --git a/pbs-client/src/pxar/extract.rs b/pbs-client/src/pxar/extract.rs
index f78e06c2..d2d42749 100644
--- a/pbs-client/src/pxar/extract.rs
+++ b/pbs-client/src/pxar/extract.rs
@@ -74,6 +74,7 @@ struct ExtractorIterState {
     err_path_stack: Vec<OsString>,
     current_match: bool,
     end_reached: bool,
+    appendix_list: Vec<(PathBuf, u64, u64)>,
 }
 
 /// An [`Iterator`] that encapsulates the process of extraction in [extract_archive].
@@ -98,6 +99,7 @@ impl ExtractorIterState {
             err_path_stack: Vec::new(),
             current_match: options.extract_match_default,
             end_reached: false,
+            appendix_list: Vec::new(),
         }
     }
 }
@@ -373,6 +375,20 @@ where
                 }
                 .context(PxarExtractContext::ExtractFile)
             }
+            (
+                true,
+                EntryKind::AppendixRef {
+                    appendix_offset,
+                    file_size,
+                },
+            ) => {
+                self.state.appendix_list.push((
+                    entry.path().to_path_buf(),
+                    *appendix_offset,
+                    *file_size,
+                ));
+                Ok(())
+            }
             (false, _) => Ok(()), // skip this
         };
 
diff --git a/pbs-client/src/pxar/tools.rs b/pbs-client/src/pxar/tools.rs
index 0cfbaf5b..aac5a1e7 100644
--- a/pbs-client/src/pxar/tools.rs
+++ b/pbs-client/src/pxar/tools.rs
@@ -156,6 +156,14 @@ pub fn format_multi_line_entry(entry: &Entry) -> String {
 
     let (size, link, type_name) = match entry.kind() {
         EntryKind::File { size, .. } => (format!("{}", *size), String::new(), "file"),
+        EntryKind::AppendixRef {
+            appendix_offset,
+            file_size,
+        } => (
+            format!("{} {}", appendix_offset, file_size),
+            String::new(),
+            "appendix ref",
+        ),
         EntryKind::Symlink(link) => (
             "0".to_string(),
             format!(" -> {:?}", link.as_os_str()),
diff --git a/pbs-datastore/src/catalog.rs b/pbs-datastore/src/catalog.rs
index 746b493b..8ae7c661 100644
--- a/pbs-datastore/src/catalog.rs
+++ b/pbs-datastore/src/catalog.rs
@@ -28,6 +28,14 @@ pub trait BackupCatalogWriter {
         ctime: i64,
         file_offset: pxar::encoder::LinkOffset,
     ) -> Result<(), Error>;
+    fn add_appendix_ref(
+        &mut self,
+        name: &CStr,
+        size: u64,
+        mtime: i64,
+        ctime: i64,
+        appendix_ref_offset: pxar::encoder::AppendixRefOffset,
+    ) -> Result<(), Error>;
     fn add_symlink(&mut self, name: &CStr) -> Result<(), Error>;
     fn add_hardlink(&mut self, name: &CStr) -> Result<(), Error>;
     fn add_block_device(&mut self, name: &CStr) -> Result<(), Error>;
@@ -41,6 +49,7 @@ pub trait BackupCatalogWriter {
 pub enum CatalogEntryType {
     Directory = b'd',
     File = b'f',
+    AppendixRef = b'r',
     Symlink = b'l',
     Hardlink = b'h',
     BlockDevice = b'b',
@@ -56,6 +65,7 @@ impl TryFrom<u8> for CatalogEntryType {
         Ok(match value {
             b'd' => CatalogEntryType::Directory,
             b'f' => CatalogEntryType::File,
+            b'r' => CatalogEntryType::AppendixRef,
             b'l' => CatalogEntryType::Symlink,
             b'h' => CatalogEntryType::Hardlink,
             b'b' => CatalogEntryType::BlockDevice,
@@ -72,6 +82,7 @@ impl From<&DirEntryAttribute> for CatalogEntryType {
         match value {
             DirEntryAttribute::Directory { .. } => CatalogEntryType::Directory,
             DirEntryAttribute::File { .. } => CatalogEntryType::File,
+            DirEntryAttribute::AppendixRef { .. } => CatalogEntryType::AppendixRef,
             DirEntryAttribute::Symlink => CatalogEntryType::Symlink,
             DirEntryAttribute::Hardlink => CatalogEntryType::Hardlink,
             DirEntryAttribute::BlockDevice => CatalogEntryType::BlockDevice,
@@ -98,9 +109,22 @@ impl FileOffset {
         self.offset
     }
 }
+
+#[derive(Clone, Copy, Debug, Eq, PartialEq, Ord, PartialOrd)]
+pub struct AppendixRefOffset {
+    offset: u64,
+}
+
+impl AppendixRefOffset {
+    pub fn raw(&self) -> u64 {
+        self.offset
+    }
+}
+
 #[derive(Clone, Copy, Debug, Eq, PartialEq, Ord, PartialOrd)]
 pub enum Offset {
     FileOffset { offset: u64 },
+    AppendixRefOffset { offset: u64 },
 }
 
 /// Represents a named directory entry
@@ -130,6 +154,12 @@ pub enum DirEntryAttribute {
         mtime: i64,
         extension: Option<CatalogV2Extension>,
     },
+    AppendixRef {
+        size: u64,
+        mtime: i64,
+        ctime: i64,
+        appendix_ref_offset: AppendixRefOffset,
+    },
     Symlink,
     Hardlink,
     BlockDevice,
@@ -195,6 +225,17 @@ impl DirEntry {
                 name,
                 attr: DirEntryAttribute::Socket,
             },
+            (CatalogEntryType::AppendixRef, Some(Offset::AppendixRefOffset { offset })) => {
+                DirEntry {
+                    name,
+                    attr: DirEntryAttribute::AppendixRef {
+                        size,
+                        mtime,
+                        ctime,
+                        appendix_ref_offset: AppendixRefOffset { offset },
+                    },
+                }
+            }
             _ => panic!("unexpected parameters '{etype}' and '{offset:?}'"),
         }
     }
@@ -204,6 +245,7 @@ impl DirEntry {
         Some(match self.attr {
             DirEntryAttribute::Directory { .. } => pxar::mode::IFDIR,
             DirEntryAttribute::File { .. } => pxar::mode::IFREG,
+            DirEntryAttribute::AppendixRef { .. } => pxar::mode::IFREG,
             DirEntryAttribute::Symlink => pxar::mode::IFLNK,
             DirEntryAttribute::Hardlink => return None,
             DirEntryAttribute::BlockDevice => pxar::mode::IFBLK,
@@ -271,6 +313,24 @@ impl DirInfo {
                     catalog_encode_u64(writer, file_offset.raw())?;
                 }
             }
+            DirEntry {
+                name,
+                attr:
+                    DirEntryAttribute::AppendixRef {
+                        size,
+                        mtime,
+                        ctime,
+                        appendix_ref_offset,
+                    },
+            } => {
+                writer.write_all(&[CatalogEntryType::AppendixRef as u8])?;
+                catalog_encode_u64(writer, name.len() as u64)?;
+                writer.write_all(name)?;
+                catalog_encode_u64(writer, *size)?;
+                catalog_encode_i64(writer, *mtime)?;
+                catalog_encode_i64(writer, *ctime)?;
+                catalog_encode_u64(writer, appendix_ref_offset.raw())?;
+            }
             DirEntry {
                 name,
                 attr: DirEntryAttribute::Symlink,
@@ -390,6 +450,21 @@ impl DirInfo {
                     };
                     callback(etype, name, 0, size, mtime, ctime, offset)?
                 }
+                CatalogEntryType::AppendixRef => {
+                    let size = catalog_decode_u64(&mut cursor)?;
+                    let mtime = catalog_decode_i64(&mut cursor)?;
+                    let ctime = catalog_decode_i64(&mut cursor)?;
+                    let offset = catalog_decode_u64(&mut cursor)?;
+                    callback(
+                        etype,
+                        name,
+                        0,
+                        size,
+                        mtime,
+                        ctime,
+                        Some(Offset::AppendixRefOffset { offset }),
+                    )?
+                }
                 _ => callback(etype, name, 0, 0, 0, 0, None)?,
             };
             if !cont {
@@ -517,6 +592,34 @@ impl<W: Write> BackupCatalogWriter for CatalogWriter<W> {
         Ok(())
     }
 
+    fn add_appendix_ref(
+        &mut self,
+        name: &CStr,
+        size: u64,
+        mtime: i64,
+        ctime: i64,
+        appendix_ref_offset: pxar::encoder::AppendixRefOffset,
+    ) -> Result<(), Error> {
+        let dir = self
+            .dirstack
+            .last_mut()
+            .ok_or_else(|| format_err!("outside root"))?;
+        let name = name.to_bytes().to_vec();
+        let appendix_ref_offset = AppendixRefOffset {
+            offset: appendix_ref_offset.raw(),
+        };
+        dir.entries.push(DirEntry {
+            name,
+            attr: DirEntryAttribute::AppendixRef {
+                size,
+                mtime,
+                ctime,
+                appendix_ref_offset,
+            },
+        });
+        Ok(())
+    }
+
     fn add_symlink(&mut self, name: &CStr) -> Result<(), Error> {
         let dir = self
             .dirstack
-- 
2.39.2

* [pbs-devel] [PATCH v5 proxmox-backup 14/28] fix #3174: catalog: add specialized Archive entry
  2023-11-15 15:47 [pbs-devel] [PATCH-SERIES v5 pxar proxmox-backup proxmox-widget-toolkit 00/28] fix #3174: improve file-level backup Christian Ebner
                   ` (12 preceding siblings ...)
  2023-11-15 15:47 ` [pbs-devel] [PATCH v5 proxmox-backup 13/28] fix #3174: archiver/extractor: impl appendix ref Christian Ebner
@ 2023-11-15 15:47 ` Christian Ebner
  2023-11-15 15:48 ` [pbs-devel] [PATCH v5 proxmox-backup 15/28] fix #3174: extractor: impl seq restore from appendix Christian Ebner
                   ` (13 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: Christian Ebner @ 2023-11-15 15:47 UTC (permalink / raw)
  To: pbs-devel

Introduces a specialized pxar directory entry type Archive,
which extends the regular directory entry by storing an additional
optional appendix start offset.

The archive entry type is only used for the topmost entries in the
catalog, replacing the directory entry used until now. If the entry
was created by reusing pxar file entries in an appendix section, the
appendix start offset is present and can be used to locate the
referenced file entries within the appendix section and access them
from the catalog shell.

Since the catalog might contain multiple archives, each archive entry
stores its individual appendix start offset.
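
As a rough illustration (not part of the patch, helper name purely
hypothetical): resolving an entry referenced from the appendix boils
down to adding its appendix-relative offset to the archive's appendix
start offset:

    // Hypothetical sketch: the Archive catalog entry records where the
    // appendix section starts in the pxar byte stream, each AppendixRef
    // stores its offset relative to that start.
    fn appendix_entry_position(appendix_start: u64, appendix_ref_offset: u64) -> u64 {
        appendix_start + appendix_ref_offset
    }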

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
Changes since version 4:
- no changes

Changes since version 3:
- no changes

Changes since version 2:
- Make sure DirEntryAttribute::Archive is not flagged as leaf node

Changes since version 1:
- This reworks the Appendix Offset impl of version 1 completely

 pbs-client/src/catalog_shell.rs |   1 +
 pbs-datastore/src/catalog.rs    | 152 +++++++++++++++++++++++++++++++-
 2 files changed, 150 insertions(+), 3 deletions(-)

diff --git a/pbs-client/src/catalog_shell.rs b/pbs-client/src/catalog_shell.rs
index 99416d2f..7deb9d9a 100644
--- a/pbs-client/src/catalog_shell.rs
+++ b/pbs-client/src/catalog_shell.rs
@@ -1144,6 +1144,7 @@ impl<'a> ExtractorState<'a> {
         };
 
         match (did_match, &entry.attr) {
+            (_, DirEntryAttribute::Archive { .. }) |
             (_, DirEntryAttribute::Directory { .. }) => {
                 self.handle_new_directory(entry, match_result?).await?;
             }
diff --git a/pbs-datastore/src/catalog.rs b/pbs-datastore/src/catalog.rs
index 8ae7c661..220313c6 100644
--- a/pbs-datastore/src/catalog.rs
+++ b/pbs-datastore/src/catalog.rs
@@ -18,6 +18,11 @@ use crate::file_formats::{PROXMOX_CATALOG_FILE_MAGIC_1_0, PROXMOX_CATALOG_FILE_M
 /// A file list catalog simply stores a directory tree. Such catalogs may be used as index to do a
 /// fast search for files.
 pub trait BackupCatalogWriter {
+    fn start_archive(&mut self, name: &CStr) -> Result<(), Error>;
+    fn end_archive(
+        &mut self,
+        appendix: Option<pxar::encoder::AppendixStartOffset>,
+    ) -> Result<(), Error>;
     fn start_directory(&mut self, name: &CStr) -> Result<(), Error>;
     fn end_directory(&mut self) -> Result<(), Error>;
     fn add_file(
@@ -50,6 +55,7 @@ pub enum CatalogEntryType {
     Directory = b'd',
     File = b'f',
     AppendixRef = b'r',
+    Archive = b'a',
     Symlink = b'l',
     Hardlink = b'h',
     BlockDevice = b'b',
@@ -66,6 +72,7 @@ impl TryFrom<u8> for CatalogEntryType {
             b'd' => CatalogEntryType::Directory,
             b'f' => CatalogEntryType::File,
             b'r' => CatalogEntryType::AppendixRef,
+            b'a' => CatalogEntryType::Archive,
             b'l' => CatalogEntryType::Symlink,
             b'h' => CatalogEntryType::Hardlink,
             b'b' => CatalogEntryType::BlockDevice,
@@ -83,6 +90,7 @@ impl From<&DirEntryAttribute> for CatalogEntryType {
             DirEntryAttribute::Directory { .. } => CatalogEntryType::Directory,
             DirEntryAttribute::File { .. } => CatalogEntryType::File,
             DirEntryAttribute::AppendixRef { .. } => CatalogEntryType::AppendixRef,
+            DirEntryAttribute::Archive { .. } => CatalogEntryType::Archive,
             DirEntryAttribute::Symlink => CatalogEntryType::Symlink,
             DirEntryAttribute::Hardlink => CatalogEntryType::Hardlink,
             DirEntryAttribute::BlockDevice => CatalogEntryType::BlockDevice,
@@ -121,10 +129,22 @@ impl AppendixRefOffset {
     }
 }
 
+#[derive(Clone, Copy, Debug, Eq, PartialEq, Ord, PartialOrd)]
+pub struct AppendixStartOffset {
+    offset: u64,
+}
+
+impl AppendixStartOffset {
+    pub fn raw(&self) -> u64 {
+        self.offset
+    }
+}
+
 #[derive(Clone, Copy, Debug, Eq, PartialEq, Ord, PartialOrd)]
 pub enum Offset {
     FileOffset { offset: u64 },
     AppendixRefOffset { offset: u64 },
+    AppendixStartOffset { offset: u64 },
 }
 
 /// Represents a named directory entry
@@ -160,6 +180,10 @@ pub enum DirEntryAttribute {
         ctime: i64,
         appendix_ref_offset: AppendixRefOffset,
     },
+    Archive {
+        start: u64,
+        appendix_offset: AppendixStartOffset,
+    },
     Symlink,
     Hardlink,
     BlockDevice,
@@ -236,6 +260,13 @@ impl DirEntry {
                     },
                 }
             }
+            (CatalogEntryType::Archive, Some(Offset::AppendixStartOffset { offset })) => DirEntry {
+                name,
+                attr: DirEntryAttribute::Archive {
+                    start,
+                    appendix_offset: AppendixStartOffset { offset },
+                },
+            },
             _ => panic!("unexpected parameters '{etype}' and '{offset:?}'"),
         }
     }
@@ -246,6 +277,7 @@ impl DirEntry {
             DirEntryAttribute::Directory { .. } => pxar::mode::IFDIR,
             DirEntryAttribute::File { .. } => pxar::mode::IFREG,
             DirEntryAttribute::AppendixRef { .. } => pxar::mode::IFREG,
+            DirEntryAttribute::Archive { .. } => pxar::mode::IFDIR,
             DirEntryAttribute::Symlink => pxar::mode::IFLNK,
             DirEntryAttribute::Hardlink => return None,
             DirEntryAttribute::BlockDevice => pxar::mode::IFBLK,
@@ -258,6 +290,12 @@ impl DirEntry {
     /// Check if DirEntry is a directory
     pub fn is_directory(&self) -> bool {
         matches!(self.attr, DirEntryAttribute::Directory { .. })
+            || matches!(self.attr, DirEntryAttribute::Archive { .. })
+    }
+
+    /// Check if DirEntry is an archive
+    pub fn is_archive(&self) -> bool {
+        matches!(self.attr, DirEntryAttribute::Archive { .. })
     }
 
     /// Check if DirEntry is a symlink
@@ -285,6 +323,20 @@ impl DirInfo {
 
     fn encode_entry<W: Write>(writer: &mut W, entry: &DirEntry, pos: u64) -> Result<(), Error> {
         match entry {
+            DirEntry {
+                name,
+                attr:
+                    DirEntryAttribute::Archive {
+                        start,
+                        appendix_offset,
+                    },
+            } => {
+                writer.write_all(&[CatalogEntryType::Archive as u8])?;
+                catalog_encode_u64(writer, name.len() as u64)?;
+                writer.write_all(name)?;
+                catalog_encode_u64(writer, appendix_offset.raw())?;
+                catalog_encode_u64(writer, pos - start)?;
+            }
             DirEntry {
                 name,
                 attr: DirEntryAttribute::Directory { start },
@@ -427,6 +479,19 @@ impl DirInfo {
             cursor.read_exact(name)?;
 
             let cont = match etype {
+                CatalogEntryType::Archive => {
+                    let offset = catalog_decode_u64(&mut cursor)?;
+                    let start = catalog_decode_u64(&mut cursor)?;
+                    callback(
+                        etype,
+                        name,
+                        start,
+                        0,
+                        0,
+                        0,
+                        Some(Offset::AppendixStartOffset { offset }),
+                    )?
+                }
                 CatalogEntryType::Directory => {
                     let offset = catalog_decode_u64(&mut cursor)?;
                     callback(etype, name, offset, 0, 0, 0, None)?
@@ -533,6 +598,51 @@ impl<W: Write> CatalogWriter<W> {
 }
 
 impl<W: Write> BackupCatalogWriter for CatalogWriter<W> {
+    fn start_archive(&mut self, name: &CStr) -> Result<(), Error> {
+        let new = DirInfo::new(name.to_owned());
+        self.dirstack.push(new);
+        Ok(())
+    }
+
+    fn end_archive(
+        &mut self,
+        appendix: Option<pxar::encoder::AppendixStartOffset>,
+    ) -> Result<(), Error> {
+        let (start, name) = match self.dirstack.pop() {
+            Some(dir) => {
+                let start = self.pos;
+                let (name, data) = dir.encode(start)?;
+                self.write_all(&data)?;
+                (start, name)
+            }
+            None => {
+                bail!("got unexpected end_directory level 0");
+            }
+        };
+
+        let current = self
+            .dirstack
+            .last_mut()
+            .ok_or_else(|| format_err!("outside root"))?;
+        let name = name.to_bytes().to_vec();
+        let appendix_offset = if let Some(appendix) = appendix {
+            AppendixStartOffset {
+                offset: appendix.raw(),
+            }
+        } else {
+            AppendixStartOffset { offset: 0 }
+        };
+        current.entries.push(DirEntry {
+            name,
+            attr: DirEntryAttribute::Archive {
+                start,
+                appendix_offset,
+            },
+        });
+
+        Ok(())
+    }
+
     fn start_directory(&mut self, name: &CStr) -> Result<(), Error> {
         let new = DirInfo::new(name.to_owned());
         self.dirstack.push(new);
@@ -746,10 +856,33 @@ impl<R: Read + Seek> CatalogReader<R> {
         })
     }
 
+    pub fn appendix_offset(
+        &mut self,
+        archive_name: &[u8],
+    ) -> Result<Option<AppendixStartOffset>, Error> {
+        let root = self.root()?;
+        let dir_entry = self.lookup(&root, archive_name)?.unwrap();
+        if let DirEntry {
+            attr: DirEntryAttribute::Archive {
+                appendix_offset, ..
+            },
+            ..
+        } = dir_entry
+        {
+            if appendix_offset.raw() != 0 {
+                return Ok(Some(appendix_offset));
+            } else {
+                return Ok(None);
+            }
+        }
+        Ok(None)
+    }
+
     /// Read all directory entries
     pub fn read_dir(&mut self, parent: &DirEntry) -> Result<Vec<DirEntry>, Error> {
         let start = match parent.attr {
             DirEntryAttribute::Directory { start } => start,
+            DirEntryAttribute::Archive { start, .. } => start,
             _ => bail!("parent is not a directory - internal error"),
         };
 
@@ -813,6 +946,7 @@ impl<R: Read + Seek> CatalogReader<R> {
     ) -> Result<Option<DirEntry>, Error> {
         let start = match parent.attr {
             DirEntryAttribute::Directory { start } => start,
+            DirEntryAttribute::Archive { start, .. } => start,
             _ => bail!("parent is not a directory - internal error"),
         };
 
@@ -822,7 +956,7 @@ impl<R: Read + Seek> CatalogReader<R> {
         DirInfo::parse(
             &data,
             self.magic,
-            |etype, name, offset, size, mtime, ctime, link_offset| {
+            |etype, name, offset, size, mtime, ctime, archive_offset| {
                 if name != filename {
                     return Ok(true);
                 }
@@ -834,7 +968,7 @@ impl<R: Read + Seek> CatalogReader<R> {
                     size,
                     mtime,
                     ctime,
-                    link_offset,
+                    archive_offset,
                 );
                 item = Some(entry);
                 Ok(false) // stop parsing
@@ -868,6 +1002,14 @@ impl<R: Read + Seek> CatalogReader<R> {
                 path.push(name);
 
                 match etype {
+                    CatalogEntryType::Archive => {
+                        log::info!("{} {:?}", etype, path);
+                        if offset > start {
+                            bail!("got wrong archive offset ({} > {})", offset, start);
+                        }
+                        let pos = start - offset;
+                        self.dump_dir(&path, pos)?;
+                    }
                     CatalogEntryType::Directory => {
                         log::info!("{} {:?}", etype, path);
                         if offset > start {
@@ -1208,7 +1350,11 @@ impl ArchiveEntry {
                 Some(entry_type) => CatalogEntryType::from(entry_type).to_string(),
                 None => "v".to_owned(),
             },
-            leaf: !matches!(entry_type, None | Some(DirEntryAttribute::Directory { .. })),
+            leaf: !matches!(
+                entry_type,
+                None | Some(DirEntryAttribute::Directory { .. })
+                    | Some(DirEntryAttribute::Archive { .. })
+            ),
             size,
             mtime: match entry_type {
                 Some(DirEntryAttribute::File { mtime, .. }) => Some(*mtime),
-- 
2.39.2

* [pbs-devel] [PATCH v5 proxmox-backup 15/28] fix #3174: extractor: impl seq restore from appendix
  2023-11-15 15:47 [pbs-devel] [PATCH-SERIES v5 pxar proxmox-backup proxmox-widget-toolkit 00/28] fix #3174: improve file-level backup Christian Ebner
                   ` (13 preceding siblings ...)
  2023-11-15 15:47 ` [pbs-devel] [PATCH v5 proxmox-backup 14/28] fix #3174: catalog: add specialized Archive entry Christian Ebner
@ 2023-11-15 15:48 ` Christian Ebner
  2023-11-15 15:48 ` [pbs-devel] [PATCH v5 proxmox-backup 16/28] fix #3174: archiver: store ref to previous backup Christian Ebner
                   ` (12 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: Christian Ebner @ 2023-11-15 15:48 UTC (permalink / raw)
  To: pbs-devel

Restores the file payloads for all AppendixRef entries encountered
during the sequential restore of the pxar archive.
This is done by iterating over all the files listed in the
corresponding state variable, opening each parent directory while
storing its metadata for later restore, creating the file and then
writing its contents.

When leaving the directories, their metadata is restored.
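
To make the bookkeeping explicit, a minimal self-contained sketch
(names are illustrative, not part of the patch): entries are keyed by
their appendix-relative offset, so iterating the BTreeMap visits them
in stream order and the decoder only ever skips forward:

    use std::collections::BTreeMap;

    // Map from appendix-relative offset to (encoded metadata size
    // including the payload header, payload size) of each entry.
    fn skip_amounts(refs: &BTreeMap<u64, (u64, u64)>) -> Vec<u64> {
        let mut consumed = 0u64;
        let mut skips = Vec::new();
        for (offset, (encoded_size, payload_size)) in refs {
            let skip = offset - consumed; // gap up to this entry
            skips.push(skip);
            consumed += skip + encoded_size + payload_size;
        }
        skips
    }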

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
Changes since version 4:
- Refactor and rename `byte_len` to `encoded_size`

Changes since version 3:
- Use BTreeMap and sorted insert instead of Vec and sorting afterwards

Changes since version 2:
- Sort entries by their appendix start offset for restore. Required
  since chunks are now normalized during upload.

Changes since version 1:
- Use the Encoder<SeqSink> to get encoded metadata byte size

 pbs-client/src/pxar/create.rs  |   4 +-
 pbs-client/src/pxar/extract.rs | 147 +++++++++++++++++++++++++++++++--
 pbs-client/src/pxar/tools.rs   |   1 +
 3 files changed, 142 insertions(+), 10 deletions(-)

diff --git a/pbs-client/src/pxar/create.rs b/pbs-client/src/pxar/create.rs
index 611d7421..50bba4e6 100644
--- a/pbs-client/src/pxar/create.rs
+++ b/pbs-client/src/pxar/create.rs
@@ -43,7 +43,7 @@ pub struct PxarCreateOptions {
     pub skip_lost_and_found: bool,
 }
 
-fn detect_fs_type(fd: RawFd) -> Result<i64, Error> {
+pub fn detect_fs_type(fd: RawFd) -> Result<i64, Error> {
     let mut fs_stat = std::mem::MaybeUninit::uninit();
     let res = unsafe { libc::fstatfs(fd, fs_stat.as_mut_ptr()) };
     Errno::result(res)?;
@@ -776,7 +776,7 @@ impl Archiver {
     }
 }
 
-fn get_metadata(
+pub fn get_metadata(
     fd: RawFd,
     stat: &FileStat,
     flags: Flags,
diff --git a/pbs-client/src/pxar/extract.rs b/pbs-client/src/pxar/extract.rs
index d2d42749..aa6d4e4d 100644
--- a/pbs-client/src/pxar/extract.rs
+++ b/pbs-client/src/pxar/extract.rs
@@ -1,6 +1,6 @@
 //! Code for extraction of pxar contents onto the file system.
 
-use std::collections::HashMap;
+use std::collections::{BTreeMap, HashMap};
 use std::ffi::{CStr, CString, OsStr, OsString};
 use std::io;
 use std::os::unix::ffi::OsStrExt;
@@ -74,7 +74,7 @@ struct ExtractorIterState {
     err_path_stack: Vec<OsString>,
     current_match: bool,
     end_reached: bool,
-    appendix_list: Vec<(PathBuf, u64, u64)>,
+    appendix_refs: BTreeMap<u64, (PathBuf, u64)>,
 }
 
 /// An [`Iterator`] that encapsulates the process of extraction in [extract_archive].
@@ -99,7 +99,7 @@ impl ExtractorIterState {
             err_path_stack: Vec::new(),
             current_match: options.extract_match_default,
             end_reached: false,
-            appendix_list: Vec::new(),
+            appendix_refs: BTreeMap::new(),
         }
     }
 }
@@ -313,6 +313,139 @@ where
 
                 res
             }
+            (_, EntryKind::Appendix { total }) => {
+                // Bytes consumed in decoder since encountering the appendix marker
+                let mut consumed = 0;
+                for (offset, (path, size)) in &self.state.appendix_refs {
+                    self.extractor.allow_existing_dirs = true;
+
+                    // Open dir path components, skipping the root component, get metadata
+                    for dir in path.iter().skip(1) {
+                        let parent_fd = match self.extractor.dir_stack.last_dir_fd(true) {
+                            Ok(parent_fd) => parent_fd,
+                            Err(err) => return Some(Err(err.into())),
+                        };
+                        let fs_magic =
+                            match crate::pxar::create::detect_fs_type(parent_fd.as_raw_fd()) {
+                                Ok(fs_magic) => fs_magic,
+                                Err(err) => return Some(Err(err.into())),
+                            };
+
+                        let mut fs_feature_flags = Flags::from_magic(fs_magic);
+                        let file_name = match CString::new(dir.as_bytes()) {
+                            Ok(file_name) => file_name,
+                            Err(err) => return Some(Err(err.into())),
+                        };
+                        let fd = match proxmox_sys::fd::openat(
+                            &parent_fd,
+                            file_name.as_ref(),
+                            OFlag::O_NOATIME,
+                            Mode::empty(),
+                        ) {
+                            Ok(fd) => fd,
+                            Err(err) => return Some(Err(err.into())),
+                        };
+
+                        let stat = match nix::sys::stat::fstat(fd.as_raw_fd()) {
+                            Ok(stat) => stat,
+                            Err(err) => return Some(Err(err.into())),
+                        };
+                        let metadata = match crate::pxar::create::get_metadata(
+                            fd.as_raw_fd(),
+                            &stat,
+                            fs_feature_flags,
+                            fs_magic,
+                            &mut fs_feature_flags,
+                        ) {
+                            Ok(metadata) => metadata,
+                            Err(err) => return Some(Err(err)),
+                        };
+
+                        match self.extractor.enter_directory(
+                            dir.to_os_string(),
+                            metadata.clone(),
+                            true,
+                        ) {
+                            Ok(()) => (),
+                            Err(err) => return Some(Err(err)),
+                        };
+                    }
+
+                    let skip = *offset - consumed;
+                    match self.decoder.skip_bytes(skip) {
+                        Ok(()) => (),
+                        Err(err) => return Some(Err(err.into())),
+                    };
+
+                    let entry = match self.decoder.next() {
+                        Some(Ok(entry)) => entry,
+                        Some(Err(err)) => return Some(Err(err.into())),
+                        None => return Some(Err(format_err!("expected entry"))),
+                    };
+
+                    let file_name_os = entry.file_name();
+                    let file_name_bytes = file_name_os.as_bytes();
+
+                    let file_name = match CString::new(file_name_bytes) {
+                        Ok(file_name_ref) => file_name_ref,
+                        Err(err) => return Some(Err(err.into())),
+                    };
+
+                    let metadata = entry.metadata();
+
+                    self.extractor.set_path(path.as_os_str().to_owned());
+
+                    let contents = self.decoder.contents();
+                    match contents {
+                        None => {
+                            return Some(Err(format_err!(
+                                "found regular file entry without contents in archive"
+                            )))
+                        }
+                        Some(mut contents) => {
+                            let result = self
+                                .extractor
+                                .extract_file(
+                                    &file_name,
+                                    metadata,
+                                    *size,
+                                    &mut contents,
+                                    self.extractor
+                                        .overwrite_flags
+                                        .contains(OverwriteFlags::FILE),
+                                )
+                                .context(PxarExtractContext::ExtractFile);
+                            if let Err(err) = result {
+                                return Some(Err(err.into()));
+                            }
+                        }
+                    }
+
+                    // Iter over all dir path components, skipping the root component, set metadata
+                    for _dir in path.iter().skip(1) {
+                        if let Err(err) = self.extractor.leave_directory() {
+                            return Some(Err(err.into()));
+                        }
+                    }
+
+                    let mut bytes =
+                        match pxar::encoder::sync::encoded_size(file_name.as_c_str(), &metadata) {
+                            Ok(bytes) => bytes,
+                            Err(err) => return Some(Err(err.into())),
+                        };
+                    // payload header size
+                    bytes += std::mem::size_of::<pxar::format::Header>() as u64;
+
+                    consumed += skip + bytes + *size;
+                }
+
+                let skip = *total - consumed;
+                if let Err(err) = self.decoder.skip_bytes(skip) {
+                    return Some(Err(err.into()));
+                }
+
+                Ok(())
+            }
             (true, EntryKind::Symlink(link)) => {
                 self.callback(entry.path());
                 self.extractor
@@ -382,11 +515,9 @@ where
                     file_size,
                 },
             ) => {
-                self.state.appendix_list.push((
-                    entry.path().to_path_buf(),
-                    *appendix_offset,
-                    *file_size,
-                ));
+                self.state
+                    .appendix_refs
+                    .insert(*appendix_offset, (entry.path().to_path_buf(), *file_size));
                 Ok(())
             }
             (false, _) => Ok(()), // skip this
diff --git a/pbs-client/src/pxar/tools.rs b/pbs-client/src/pxar/tools.rs
index aac5a1e7..174a7351 100644
--- a/pbs-client/src/pxar/tools.rs
+++ b/pbs-client/src/pxar/tools.rs
@@ -156,6 +156,7 @@ pub fn format_multi_line_entry(entry: &Entry) -> String {
 
     let (size, link, type_name) = match entry.kind() {
         EntryKind::File { size, .. } => (format!("{}", *size), String::new(), "file"),
+        EntryKind::Appendix { total } => (format!("{total}"), String::new(), "appendix"),
         EntryKind::AppendixRef {
             appendix_offset,
             file_size,
-- 
2.39.2

* [pbs-devel] [PATCH v5 proxmox-backup 16/28] fix #3174: archiver: store ref to previous backup
  2023-11-15 15:47 [pbs-devel] [PATCH-SERIES v5 pxar proxmox-backup proxmox-widget-toolkit 00/28] fix #3174: improve file-level backup Christian Ebner
                   ` (14 preceding siblings ...)
  2023-11-15 15:48 ` [pbs-devel] [PATCH v5 proxmox-backup 15/28] fix #3174: extractor: impl seq restore from appendix Christian Ebner
@ 2023-11-15 15:48 ` Christian Ebner
  2023-11-15 15:48 ` [pbs-devel] [PATCH v5 proxmox-backup 17/28] fix #3174: upload stream: impl reused chunk injector Christian Ebner
                   ` (11 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: Christian Ebner @ 2023-11-15 15:48 UTC (permalink / raw)
  To: pbs-devel

Adds the stateful information needed to access a previous backup's
data, such as its index and catalog.

This patch only introduces the struct and the interface; by setting
all occurrences to None, it remains unused for now.
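
For context, a rough usage sketch (purely illustrative: `index` and
`catalog` stand in for readers opened from a previous snapshot, the
archive name is made up):

    // Hand the archiver a reference to the previous snapshot; passing
    // None keeps the current re-encoding behaviour.
    let options = pbs_client::pxar::PxarCreateOptions {
        previous_ref: Some(pbs_client::pxar::PxarPrevRef {
            index,   // DynamicIndexReader of the previous archive
            catalog, // CatalogReader over the previous catalog file
            archive_name: "root.pxar".to_string(),
        }),
        ..Default::default()
    };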

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
Changes since version 4:
- use pxar format version only for metadata based detection mode

Changes since version 3:
- no changes

Changes since version 2:
- no changes

Changes since version 1:
- no changes

 pbs-client/src/pxar/create.rs                 | 25 ++++++++++++++++---
 pbs-client/src/pxar/mod.rs                    |  2 +-
 proxmox-backup-client/src/main.rs             |  1 +
 .../src/proxmox_restore_daemon/api.rs         |  1 +
 pxar-bin/src/main.rs                          | 17 +++++++------
 src/tape/file_formats/snapshot_archive.rs     |  7 ++++--
 6 files changed, 39 insertions(+), 14 deletions(-)

diff --git a/pbs-client/src/pxar/create.rs b/pbs-client/src/pxar/create.rs
index 50bba4e6..2b542972 100644
--- a/pbs-client/src/pxar/create.rs
+++ b/pbs-client/src/pxar/create.rs
@@ -24,14 +24,15 @@ use proxmox_io::vec;
 use proxmox_lang::c_str;
 use proxmox_sys::fs::{self, acl, xattr};
 
-use pbs_datastore::catalog::BackupCatalogWriter;
+use pbs_datastore::catalog::{BackupCatalogWriter, CatalogReader};
+use pbs_datastore::dynamic_index::DynamicIndexReader;
 
 use crate::pxar::metadata::errno_is_unsupported;
 use crate::pxar::tools::assert_single_path_component;
 use crate::pxar::Flags;
 
 /// Pxar options for creating a pxar archive/stream
-#[derive(Default, Clone)]
+#[derive(Default)]
 pub struct PxarCreateOptions {
     /// Device/mountpoint st_dev numbers that should be included. None for no limitation.
     pub device_set: Option<HashSet<u64>>,
@@ -41,6 +42,18 @@ pub struct PxarCreateOptions {
     pub entries_max: usize,
     /// Skip lost+found directory
     pub skip_lost_and_found: bool,
+    /// Reference state for partial backups
+    pub previous_ref: Option<PxarPrevRef>,
+}
+
+/// Contains stateful information of previous backup snapshots for partial backups
+pub struct PxarPrevRef {
+    /// Reference index for partial backups
+    pub index: DynamicIndexReader,
+    /// Reference catalog for partial backups
+    pub catalog: CatalogReader<std::fs::File>,
+    /// Reference archive name for partial backups
+    pub archive_name: String,
 }
 
 pub fn detect_fs_type(fd: RawFd) -> Result<i64, Error> {
@@ -128,6 +141,7 @@ struct Archiver {
     device_set: Option<HashSet<u64>>,
     hardlinks: HashMap<HardLinkInfo, (PathBuf, LinkOffset)>,
     file_copy_buffer: Vec<u8>,
+    previous_ref: Option<PxarPrevRef>,
 }
 
 type Encoder<'a, T> = pxar::encoder::aio::Encoder<'a, T>;
@@ -166,7 +180,11 @@ where
         set.insert(stat.st_dev);
     }
 
-    let mut encoder = Encoder::new(&mut writer, &metadata).await?;
+    let pxar_format_version = match options.previous_ref {
+        Some(_) => pxar::format::FormatVersion::V2,
+        None => pxar::format::FormatVersion::Default,
+    };
+    let mut encoder = Encoder::new(&mut writer, &metadata, pxar_format_version).await?;
 
     let mut patterns = options.patterns;
 
@@ -192,6 +210,7 @@ where
         device_set,
         hardlinks: HashMap::new(),
         file_copy_buffer: vec::undefined(4 * 1024 * 1024),
+        previous_ref: options.previous_ref,
     };
 
     archiver
diff --git a/pbs-client/src/pxar/mod.rs b/pbs-client/src/pxar/mod.rs
index 14674b9b..24315f5f 100644
--- a/pbs-client/src/pxar/mod.rs
+++ b/pbs-client/src/pxar/mod.rs
@@ -56,7 +56,7 @@ pub(crate) mod tools;
 mod flags;
 pub use flags::Flags;
 
-pub use create::{create_archive, PxarCreateOptions};
+pub use create::{create_archive, PxarCreateOptions, PxarPrevRef};
 pub use extract::{
     create_tar, create_zip, extract_archive, extract_sub_dir, extract_sub_dir_seq, ErrorHandler,
     OverwriteFlags, PxarExtractContext, PxarExtractOptions,
diff --git a/proxmox-backup-client/src/main.rs b/proxmox-backup-client/src/main.rs
index 1a13291a..509fa22c 100644
--- a/proxmox-backup-client/src/main.rs
+++ b/proxmox-backup-client/src/main.rs
@@ -993,6 +993,7 @@ async fn create_backup(
                     patterns: pattern_list.clone(),
                     entries_max: entries_max as usize,
                     skip_lost_and_found,
+                    previous_ref: None,
                 };
 
                 let upload_options = UploadOptions {
diff --git a/proxmox-restore-daemon/src/proxmox_restore_daemon/api.rs b/proxmox-restore-daemon/src/proxmox_restore_daemon/api.rs
index fb12befa..f89b0ab4 100644
--- a/proxmox-restore-daemon/src/proxmox_restore_daemon/api.rs
+++ b/proxmox-restore-daemon/src/proxmox_restore_daemon/api.rs
@@ -356,6 +356,7 @@ fn extract(
                         device_set: None,
                         patterns,
                         skip_lost_and_found: false,
+                        previous_ref: None,
                     };
 
                     let pxar_writer = TokioWriter::new(writer);
diff --git a/pxar-bin/src/main.rs b/pxar-bin/src/main.rs
index bc044035..9376a2c1 100644
--- a/pxar-bin/src/main.rs
+++ b/pxar-bin/src/main.rs
@@ -330,13 +330,6 @@ async fn create_archive(
         Some(HashSet::new())
     };
 
-    let options = pbs_client::pxar::PxarCreateOptions {
-        entries_max: entries_max as usize,
-        device_set,
-        patterns,
-        skip_lost_and_found: false,
-    };
-
     let source = PathBuf::from(source);
 
     let dir = nix::dir::Dir::open(
@@ -349,7 +342,7 @@ async fn create_archive(
         .create_new(true)
         .write(true)
         .mode(0o640)
-        .open(archive)?;
+        .open(archive.clone())?;
 
     let writer = std::io::BufWriter::with_capacity(1024 * 1024, file);
     let mut feature_flags = Flags::DEFAULT;
@@ -372,6 +365,14 @@ async fn create_archive(
         feature_flags.remove(Flags::WITH_SOCKETS);
     }
 
+    let options = pbs_client::pxar::PxarCreateOptions {
+        entries_max: entries_max as usize,
+        device_set,
+        patterns,
+        skip_lost_and_found: false,
+        previous_ref: None,
+    };
+
     let writer = pxar::encoder::sync::StandardWriter::new(writer);
     pbs_client::pxar::create_archive(
         dir,
diff --git a/src/tape/file_formats/snapshot_archive.rs b/src/tape/file_formats/snapshot_archive.rs
index 252384b5..cc5c5885 100644
--- a/src/tape/file_formats/snapshot_archive.rs
+++ b/src/tape/file_formats/snapshot_archive.rs
@@ -58,8 +58,11 @@ pub fn tape_write_snapshot_archive<'a>(
             ));
         }
 
-        let mut encoder =
-            pxar::encoder::sync::Encoder::new(PxarTapeWriter::new(writer), &root_metadata)?;
+        let mut encoder = pxar::encoder::sync::Encoder::new(
+            PxarTapeWriter::new(writer),
+            &root_metadata,
+            pxar::format::FormatVersion::Default,
+        )?;
 
         for filename in file_list.iter() {
             let mut file = snapshot_reader.open_file(filename).map_err(|err| {
-- 
2.39.2

* [pbs-devel] [PATCH v5 proxmox-backup 17/28] fix #3174: upload stream: impl reused chunk injector
  2023-11-15 15:47 [pbs-devel] [PATCH-SERIES v5 pxar proxmox-backup proxmox-widget-toolkit 00/28] fix #3174: improve file-level backup Christian Ebner
                   ` (15 preceding siblings ...)
  2023-11-15 15:48 ` [pbs-devel] [PATCH v5 proxmox-backup 16/28] fix #3174: archiver: store ref to previous backup Christian Ebner
@ 2023-11-15 15:48 ` Christian Ebner
  2023-11-15 15:48 ` [pbs-devel] [PATCH v5 proxmox-backup 18/28] fix #3174: chunker: add forced boundaries Christian Ebner
                   ` (10 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: Christian Ebner @ 2023-11-15 15:48 UTC (permalink / raw)
  To: pbs-devel

In order to be included in the backup's index file, the reused chunks
which store the payloads of files skipped during pxar encoding have to
be inserted after the encoder has written the pxar appendix entry
type.

The chunker forces a chunk boundary after this marker and queues the
list of chunks to be uploaded thereafter.
This patch implements the logic to inject those chunks into the chunk
upload stream once such a boundary is requested, by looping over the
queued chunks and inserting them into the stream.
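
The per-chunk decision made by the stream adapter below can be
summarized with this self-contained sketch (names are illustrative):

    // An injection applies exactly at its recorded stream offset.
    enum Action {
        InjectNow,   // boundary sits at the current stream offset
        InjectAfter, // boundary sits right after the polled chunk
        Invalid,     // boundary falls inside the chunk: hard error
        KeepQueued,  // boundary lies further ahead in the stream
    }

    fn decide(offset: u64, chunk_len: u64, boundary: u64) -> Action {
        if boundary == offset {
            Action::InjectNow
        } else if boundary == offset + chunk_len {
            Action::InjectAfter
        } else if boundary < offset + chunk_len {
            Action::Invalid
        } else {
            Action::KeepQueued
        }
    }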

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
Changes since version 4:
- fix premature chunk injection by incorrect initialization

Changes since version 3:
- count appendix chunks as reused chunks
- fix issue with stream being corrupted by missing buffering of data
  when injecting

Changes since version 2:
- no changes

Changes since version 1:
- no changes

 pbs-client/src/inject_reused_chunks.rs | 153 +++++++++++++++++++++++++
 pbs-client/src/lib.rs                  |   1 +
 2 files changed, 154 insertions(+)
 create mode 100644 pbs-client/src/inject_reused_chunks.rs

diff --git a/pbs-client/src/inject_reused_chunks.rs b/pbs-client/src/inject_reused_chunks.rs
new file mode 100644
index 00000000..5811648c
--- /dev/null
+++ b/pbs-client/src/inject_reused_chunks.rs
@@ -0,0 +1,153 @@
+use std::collections::VecDeque;
+use std::pin::Pin;
+use std::sync::atomic::{AtomicUsize, Ordering};
+use std::sync::{Arc, Mutex};
+use std::task::{Context, Poll};
+
+use anyhow::{anyhow, Error};
+use futures::{ready, Stream};
+use pin_project_lite::pin_project;
+
+use pbs_datastore::dynamic_index::DynamicEntry;
+
+pin_project! {
+    pub struct InjectReusedChunksQueue<S> {
+        #[pin]
+        input: S,
+        current: Option<InjectChunks>,
+        buffer: Option<bytes::BytesMut>,
+        injection_queue: Arc<Mutex<VecDeque<InjectChunks>>>,
+        stream_len: Arc<AtomicUsize>,
+        reused_len: Arc<AtomicUsize>,
+        index_csum: Arc<Mutex<Option<openssl::sha::Sha256>>>,
+    }
+}
+
+#[derive(Debug)]
+pub struct InjectChunks {
+    pub boundary: u64,
+    pub chunks: Vec<DynamicEntry>,
+    pub size: usize,
+}
+
+pub enum InjectedChunksInfo {
+    Known(Vec<(u64, [u8; 32])>),
+    Raw((u64, bytes::BytesMut)),
+}
+
+pub trait InjectReusedChunks: Sized {
+    fn inject_reused_chunks(
+        self,
+        injection_queue: Arc<Mutex<VecDeque<InjectChunks>>>,
+        stream_len: Arc<AtomicUsize>,
+        reused_len: Arc<AtomicUsize>,
+        index_csum: Arc<Mutex<Option<openssl::sha::Sha256>>>,
+    ) -> InjectReusedChunksQueue<Self>;
+}
+
+impl<S> InjectReusedChunks for S
+where
+    S: Stream<Item = Result<bytes::BytesMut, Error>>,
+{
+    fn inject_reused_chunks(
+        self,
+        injection_queue: Arc<Mutex<VecDeque<InjectChunks>>>,
+        stream_len: Arc<AtomicUsize>,
+        reused_len: Arc<AtomicUsize>,
+        index_csum: Arc<Mutex<Option<openssl::sha::Sha256>>>,
+    ) -> InjectReusedChunksQueue<Self> {
+        InjectReusedChunksQueue {
+            input: self,
+            current: None,
+            injection_queue,
+            buffer: None,
+            stream_len,
+            reused_len,
+            index_csum,
+        }
+    }
+}
+
+impl<S> Stream for InjectReusedChunksQueue<S>
+where
+    S: Stream<Item = Result<bytes::BytesMut, Error>>,
+{
+    type Item = Result<InjectedChunksInfo, Error>;
+
+    fn poll_next(self: Pin<&mut Self>, cx: &mut Context) -> Poll<Option<Self::Item>> {
+        let mut this = self.project();
+        loop {
+            let current = this.current.take();
+            if let Some(current) = current {
+                let mut chunks = Vec::new();
+                let mut guard = this.index_csum.lock().unwrap();
+                let csum = guard.as_mut().unwrap();
+
+                for chunk in current.chunks {
+                    let offset = this
+                        .stream_len
+                        .fetch_add(chunk.end() as usize, Ordering::SeqCst)
+                        as u64;
+                    this.reused_len
+                        .fetch_add(chunk.end() as usize, Ordering::SeqCst);
+                    let digest = chunk.digest();
+                    chunks.push((offset, digest));
+                    // Chunk end is assumed to be normalized to chunk size here
+                    let end_offset = offset + chunk.end();
+                    csum.update(&end_offset.to_le_bytes());
+                    csum.update(&digest);
+                }
+                let chunk_info = InjectedChunksInfo::Known(chunks);
+                return Poll::Ready(Some(Ok(chunk_info)));
+            }
+
+            let buffer = this.buffer.take();
+            if let Some(buffer) = buffer {
+                let offset = this.stream_len.fetch_add(buffer.len(), Ordering::SeqCst) as u64;
+                let data = InjectedChunksInfo::Raw((offset, buffer));
+                return Poll::Ready(Some(Ok(data)));
+            }
+
+            match ready!(this.input.as_mut().poll_next(cx)) {
+                None => return Poll::Ready(None),
+                Some(Err(err)) => return Poll::Ready(Some(Err(err))),
+                Some(Ok(raw)) => {
+                    let chunk_size = raw.len();
+                    let offset = this.stream_len.load(Ordering::SeqCst) as u64;
+                    let mut injections = this.injection_queue.lock().unwrap();
+
+                    if let Some(inject) = injections.pop_front() {
+                        if inject.boundary == offset {
+                            if this.current.replace(inject).is_some() {
+                                return Poll::Ready(Some(Err(anyhow!(
+                                    "replaced injection queue not empty"
+                                ))));
+                            }
+                            if chunk_size > 0 && this.buffer.replace(raw).is_some() {
+                                return Poll::Ready(Some(Err(anyhow!(
+                                    "replaced buffer not empty"
+                                ))));
+                            }
+                            continue;
+                        } else if inject.boundary == offset + chunk_size as u64 {
+                            let _ = this.current.insert(inject);
+                        } else if inject.boundary < offset + chunk_size as u64 {
+                            return Poll::Ready(Some(Err(anyhow!("invalid injection boundary"))));
+                        } else {
+                            injections.push_front(inject);
+                        }
+                    }
+
+                    if chunk_size == 0 {
+                        return Poll::Ready(Some(Err(anyhow!("unexpected empty raw data"))));
+                    }
+
+                    let offset = this.stream_len.fetch_add(chunk_size, Ordering::SeqCst) as u64;
+                    let data = InjectedChunksInfo::Raw((offset, raw));
+
+                    return Poll::Ready(Some(Ok(data)));
+                }
+            }
+        }
+    }
+}
diff --git a/pbs-client/src/lib.rs b/pbs-client/src/lib.rs
index 21cf8556..8bf26381 100644
--- a/pbs-client/src/lib.rs
+++ b/pbs-client/src/lib.rs
@@ -8,6 +8,7 @@ pub mod pxar;
 pub mod tools;
 
 mod merge_known_chunks;
+mod inject_reused_chunks;
 pub mod pipe_to_stream;
 
 mod http_client;
-- 
2.39.2

* [pbs-devel] [PATCH v5 proxmox-backup 18/28] fix #3174: chunker: add forced boundaries
  2023-11-15 15:47 [pbs-devel] [PATCH-SERIES v5 pxar proxmox-backup proxmox-widget-toolkit 00/28] fix #3174: improve file-level backup Christian Ebner
                   ` (16 preceding siblings ...)
  2023-11-15 15:48 ` [pbs-devel] [PATCH v5 proxmox-backup 17/28] fix #3174: upload stream: impl reused chunk injector Christian Ebner
@ 2023-11-15 15:48 ` Christian Ebner
  2023-11-15 15:48 ` [pbs-devel] [PATCH v5 proxmox-backup 19/28] fix #3174: backup writer: inject queued chunk in upload steam Christian Ebner
                   ` (9 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: Christian Ebner @ 2023-11-15 15:48 UTC (permalink / raw)
  To: pbs-devel

Allow forcing a boundary while chunking and injecting reused chunks
into the stream. Double-ended queues are used to coordinate the
boundaries and chunk injection between archiver, chunker and uploader.

The archiver gets an interface to request a boundary and push a list
of chunks to inject following that boundary. The chunker reads this
queue, creates the boundary and passes the list of chunks to inject on
to the uploader via a second, dedicated double-ended queue.
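
Schematically, the wiring looks like this sketch (reusing the
InjectChunks type from the previous patch; the comments mark which
component touches which queue):

    use std::collections::VecDeque;
    use std::sync::{Arc, Mutex};

    let boundaries: Arc<Mutex<VecDeque<InjectChunks>>> =
        Arc::new(Mutex::new(VecDeque::new()));
    let injections: Arc<Mutex<VecDeque<InjectChunks>>> =
        Arc::new(Mutex::new(VecDeque::new()));
    // archiver: boundaries.lock().unwrap().push_back(inject)
    // chunker:  boundaries.lock().unwrap().pop_front(), split its
    //           buffer at the requested boundary, then forward the
    //           entry via injections.lock().unwrap().push_back(inject)
    // uploader: injections.lock().unwrap().pop_front() and emit the
    //           reused chunks into the dynamic index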

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
Changes since version 4:
- no changes

Changes since version 3:
- no changes

Changes since version 2:
- no changes

Changes since version 1:
- no changes

 examples/test_chunk_speed2.rs                 |  9 +++-
 pbs-client/src/backup_writer.rs               |  6 ++-
 pbs-client/src/chunk_stream.rs                | 42 ++++++++++++++++++-
 pbs-client/src/pxar/create.rs                 | 10 ++++-
 pbs-client/src/pxar_backup_stream.rs          |  8 +++-
 proxmox-backup-client/src/main.rs             | 36 ++++++++++++----
 .../src/proxmox_restore_daemon/api.rs         | 13 +++++-
 pxar-bin/src/main.rs                          |  5 ++-
 tests/catar.rs                                |  3 ++
 9 files changed, 115 insertions(+), 17 deletions(-)

diff --git a/examples/test_chunk_speed2.rs b/examples/test_chunk_speed2.rs
index 3f69b436..e8bac726 100644
--- a/examples/test_chunk_speed2.rs
+++ b/examples/test_chunk_speed2.rs
@@ -1,5 +1,7 @@
 use anyhow::Error;
 use futures::*;
+use std::collections::VecDeque;
+use std::sync::{Arc, Mutex};
 
 extern crate proxmox_backup;
 
@@ -26,7 +28,12 @@ async fn run() -> Result<(), Error> {
         .map_err(Error::from);
 
     //let chunk_stream = FixedChunkStream::new(stream, 4*1024*1024);
-    let mut chunk_stream = ChunkStream::new(stream, None);
+    let mut chunk_stream = ChunkStream::new(
+        stream,
+        None,
+        Arc::new(Mutex::new(VecDeque::new())),
+        Arc::new(Mutex::new(VecDeque::new())),
+    );
 
     let start_time = std::time::Instant::now();
 
diff --git a/pbs-client/src/backup_writer.rs b/pbs-client/src/backup_writer.rs
index 8a03d8ea..cc6dd49a 100644
--- a/pbs-client/src/backup_writer.rs
+++ b/pbs-client/src/backup_writer.rs
@@ -1,4 +1,4 @@
-use std::collections::HashSet;
+use std::collections::{HashSet, VecDeque};
 use std::future::Future;
 use std::os::unix::fs::OpenOptionsExt;
 use std::sync::atomic::{AtomicU64, AtomicUsize, Ordering};
@@ -23,6 +23,7 @@ use pbs_tools::crypt_config::CryptConfig;
 
 use proxmox_human_byte::HumanByte;
 
+use super::inject_reused_chunks::InjectChunks;
 use super::merge_known_chunks::{MergeKnownChunks, MergedChunkInfo};
 
 use super::{H2Client, HttpClient};
@@ -265,6 +266,7 @@ impl BackupWriter {
         archive_name: &str,
         stream: impl Stream<Item = Result<bytes::BytesMut, Error>>,
         options: UploadOptions,
+        injection_queue: Arc<Mutex<VecDeque<InjectChunks>>>,
     ) -> Result<BackupStats, Error> {
         let known_chunks = Arc::new(Mutex::new(HashSet::new()));
 
@@ -341,6 +343,7 @@ impl BackupWriter {
                 None
             },
             options.compress,
+            injection_queue,
         )
         .await?;
 
@@ -637,6 +640,7 @@ impl BackupWriter {
         known_chunks: Arc<Mutex<HashSet<[u8; 32]>>>,
         crypt_config: Option<Arc<CryptConfig>>,
         compress: bool,
+        injection_queue: Arc<Mutex<VecDeque<InjectChunks>>>,
     ) -> impl Future<Output = Result<UploadStats, Error>> {
         let total_chunks = Arc::new(AtomicUsize::new(0));
         let total_chunks2 = total_chunks.clone();
diff --git a/pbs-client/src/chunk_stream.rs b/pbs-client/src/chunk_stream.rs
index 895f6eae..5cdf6916 100644
--- a/pbs-client/src/chunk_stream.rs
+++ b/pbs-client/src/chunk_stream.rs
@@ -1,5 +1,7 @@
 use std::pin::Pin;
 use std::task::{Context, Poll};
+use std::sync::{Arc, Mutex};
+use std::collections::VecDeque;
 
 use anyhow::Error;
 use bytes::BytesMut;
@@ -8,21 +10,34 @@ use futures::stream::{Stream, TryStream};
 
 use pbs_datastore::Chunker;
 
+use crate::inject_reused_chunks::InjectChunks;
+
 /// Split input stream into dynamic sized chunks
 pub struct ChunkStream<S: Unpin> {
     input: S,
     chunker: Chunker,
     buffer: BytesMut,
     scan_pos: usize,
+    consumed: u64,
+    boundaries: Arc<Mutex<VecDeque<InjectChunks>>>,
+    injections: Arc<Mutex<VecDeque<InjectChunks>>>,
 }
 
 impl<S: Unpin> ChunkStream<S> {
-    pub fn new(input: S, chunk_size: Option<usize>) -> Self {
+    pub fn new(
+        input: S,
+        chunk_size: Option<usize>,
+        boundaries: Arc<Mutex<VecDeque<InjectChunks>>>,
+        injections: Arc<Mutex<VecDeque<InjectChunks>>>,
+    ) -> Self {
         Self {
             input,
             chunker: Chunker::new(chunk_size.unwrap_or(4 * 1024 * 1024)),
             buffer: BytesMut::new(),
             scan_pos: 0,
+            consumed: 0,
+            boundaries,
+            injections,
         }
     }
 }
@@ -40,6 +55,29 @@ where
     fn poll_next(self: Pin<&mut Self>, cx: &mut Context) -> Poll<Option<Self::Item>> {
         let this = self.get_mut();
         loop {
+            {
+                // Make sure to release this lock and don't hold it longer than required
+                let mut boundaries = this.boundaries.lock().unwrap();
+                if let Some(inject) = boundaries.pop_front() {
+                    let max = this.consumed + this.buffer.len() as u64;
+                    if inject.boundary <= max {
+                        let chunk_size = (inject.boundary - this.consumed) as usize;
+                        let result = this.buffer.split_to(chunk_size);
+                        this.consumed += chunk_size as u64;
+                        this.scan_pos = 0;
+
+                        // Add the size of the injected chunks to consumed, so chunk stream offsets
+                        // are in sync with the rest of the archive.
+                        this.consumed += inject.size as u64;
+
+                        this.injections.lock().unwrap().push_back(inject);
+
+                        return Poll::Ready(Some(Ok(result)));
+                    }
+                    boundaries.push_front(inject);
+                }
+            }
+
             if this.scan_pos < this.buffer.len() {
                 let boundary = this.chunker.scan(&this.buffer[this.scan_pos..]);
 
@@ -50,7 +88,9 @@ where
                     // continue poll
                 } else if chunk_size <= this.buffer.len() {
                     let result = this.buffer.split_to(chunk_size);
+                    this.consumed += chunk_size as u64;
                     this.scan_pos = 0;
+
                     return Poll::Ready(Some(Ok(result)));
                 } else {
                     panic!("got unexpected chunk boundary from chunker");
diff --git a/pbs-client/src/pxar/create.rs b/pbs-client/src/pxar/create.rs
index 2b542972..c4d207d8 100644
--- a/pbs-client/src/pxar/create.rs
+++ b/pbs-client/src/pxar/create.rs
@@ -1,4 +1,4 @@
-use std::collections::{HashMap, HashSet};
+use std::collections::{HashMap, HashSet, VecDeque};
 use std::ffi::{CStr, CString, OsStr};
 use std::fmt;
 use std::io::{self, Read};
@@ -25,8 +25,9 @@ use proxmox_lang::c_str;
 use proxmox_sys::fs::{self, acl, xattr};
 
 use pbs_datastore::catalog::{BackupCatalogWriter, CatalogReader};
-use pbs_datastore::dynamic_index::DynamicIndexReader;
+use pbs_datastore::dynamic_index::{DynamicEntry, DynamicIndexReader};
 
+use crate::inject_reused_chunks::InjectChunks;
 use crate::pxar::metadata::errno_is_unsupported;
 use crate::pxar::tools::assert_single_path_component;
 use crate::pxar::Flags;
@@ -142,6 +143,8 @@ struct Archiver {
     hardlinks: HashMap<HardLinkInfo, (PathBuf, LinkOffset)>,
     file_copy_buffer: Vec<u8>,
     previous_ref: Option<PxarPrevRef>,
+    forced_boundaries: Arc<Mutex<VecDeque<InjectChunks>>>,
+    inject: (usize, Vec<DynamicEntry>),
 }
 
 type Encoder<'a, T> = pxar::encoder::aio::Encoder<'a, T>;
@@ -153,6 +156,7 @@ pub async fn create_archive<T, F>(
     callback: F,
     catalog: Option<Arc<Mutex<dyn BackupCatalogWriter + Send>>>,
     options: PxarCreateOptions,
+    forced_boundaries: Arc<Mutex<VecDeque<InjectChunks>>>,
 ) -> Result<(), Error>
 where
     T: SeqWrite + Send,
@@ -211,6 +215,8 @@ where
         hardlinks: HashMap::new(),
         file_copy_buffer: vec::undefined(4 * 1024 * 1024),
         previous_ref: options.previous_ref,
+        forced_boundaries,
+        inject: (0, Vec::new()),
     };
 
     archiver
diff --git a/pbs-client/src/pxar_backup_stream.rs b/pbs-client/src/pxar_backup_stream.rs
index 22a6ffdc..d18ba470 100644
--- a/pbs-client/src/pxar_backup_stream.rs
+++ b/pbs-client/src/pxar_backup_stream.rs
@@ -1,3 +1,4 @@
+use std::collections::VecDeque;
 use std::io::Write;
 //use std::os::unix::io::FromRawFd;
 use std::path::Path;
@@ -17,6 +18,8 @@ use proxmox_io::StdChannelWriter;
 
 use pbs_datastore::catalog::CatalogWriter;
 
+use crate::inject_reused_chunks::InjectChunks;
+
 /// Stream implementation to encode and upload .pxar archives.
 ///
 /// The hyper client needs an async Stream for file upload, so we
@@ -40,6 +43,7 @@ impl PxarBackupStream {
         dir: Dir,
         catalog: Arc<Mutex<CatalogWriter<W>>>,
         options: crate::pxar::PxarCreateOptions,
+        boundaries: Arc<Mutex<VecDeque<InjectChunks>>>,
     ) -> Result<Self, Error> {
         let (tx, rx) = std::sync::mpsc::sync_channel(10);
 
@@ -64,6 +68,7 @@ impl PxarBackupStream {
                 },
                 Some(catalog),
                 options,
+                boundaries,
             )
             .await
             {
@@ -87,10 +92,11 @@ impl PxarBackupStream {
         dirname: &Path,
         catalog: Arc<Mutex<CatalogWriter<W>>>,
         options: crate::pxar::PxarCreateOptions,
+        boundaries: Arc<Mutex<VecDeque<InjectChunks>>>,
     ) -> Result<Self, Error> {
         let dir = nix::dir::Dir::open(dirname, OFlag::O_DIRECTORY, Mode::empty())?;
 
-        Self::new(dir, catalog, options)
+        Self::new(dir, catalog, options, boundaries)
     }
 }
 
diff --git a/proxmox-backup-client/src/main.rs b/proxmox-backup-client/src/main.rs
index 509fa22c..5945ae5d 100644
--- a/proxmox-backup-client/src/main.rs
+++ b/proxmox-backup-client/src/main.rs
@@ -1,4 +1,4 @@
-use std::collections::HashSet;
+use std::collections::{HashSet, VecDeque};
 use std::io::{self, Read, Seek, SeekFrom, Write};
 use std::path::{Path, PathBuf};
 use std::pin::Pin;
@@ -192,8 +192,17 @@ async fn backup_directory<P: AsRef<Path>>(
     pxar_create_options: pbs_client::pxar::PxarCreateOptions,
     upload_options: UploadOptions,
 ) -> Result<BackupStats, Error> {
-    let pxar_stream = PxarBackupStream::open(dir_path.as_ref(), catalog, pxar_create_options)?;
-    let mut chunk_stream = ChunkStream::new(pxar_stream, chunk_size);
+    let boundaries = Arc::new(Mutex::new(VecDeque::new()));
+    let pxar_stream = PxarBackupStream::open(
+        dir_path.as_ref(),
+        catalog,
+        pxar_create_options,
+        boundaries.clone(),
+    )?;
+
+    let injections = Arc::new(Mutex::new(VecDeque::new()));
+    let mut chunk_stream =
+        ChunkStream::new(pxar_stream, chunk_size, boundaries, injections.clone());
 
     let (tx, rx) = mpsc::channel(10); // allow to buffer 10 chunks
 
@@ -211,7 +220,7 @@ async fn backup_directory<P: AsRef<Path>>(
     }
 
     let stats = client
-        .upload_stream(archive_name, stream, upload_options)
+        .upload_stream(archive_name, stream, upload_options, injections)
         .await?;
 
     Ok(stats)
@@ -237,8 +246,9 @@ async fn backup_image<P: AsRef<Path>>(
         bail!("cannot backup image with dynamic chunk size!");
     }
 
+    let injection_queue = Arc::new(Mutex::new(VecDeque::new()));
     let stats = client
-        .upload_stream(archive_name, stream, upload_options)
+        .upload_stream(archive_name, stream, upload_options, injection_queue)
         .await?;
 
     Ok(stats)
@@ -529,7 +539,14 @@ fn spawn_catalog_upload(
     let (catalog_tx, catalog_rx) = std::sync::mpsc::sync_channel(10); // allow to buffer 10 writes
     let catalog_stream = proxmox_async::blocking::StdChannelStream(catalog_rx);
     let catalog_chunk_size = 512 * 1024;
-    let catalog_chunk_stream = ChunkStream::new(catalog_stream, Some(catalog_chunk_size));
+    let boundaries = Arc::new(Mutex::new(VecDeque::new()));
+    let injections = Arc::new(Mutex::new(VecDeque::new()));
+    let catalog_chunk_stream = ChunkStream::new(
+        catalog_stream,
+        Some(catalog_chunk_size),
+        boundaries,
+        injections.clone(),
+    );
 
     let catalog_writer = Arc::new(Mutex::new(CatalogWriter::new(TokioWriterAdapter::new(
         StdChannelWriter::new(catalog_tx),
@@ -545,7 +562,12 @@ fn spawn_catalog_upload(
 
     tokio::spawn(async move {
         let catalog_upload_result = client
-            .upload_stream(CATALOG_NAME, catalog_chunk_stream, upload_options)
+            .upload_stream(
+                CATALOG_NAME,
+                catalog_chunk_stream,
+                upload_options,
+                injections,
+            )
             .await;
 
         if let Err(ref err) = catalog_upload_result {
diff --git a/proxmox-restore-daemon/src/proxmox_restore_daemon/api.rs b/proxmox-restore-daemon/src/proxmox_restore_daemon/api.rs
index f89b0ab4..5eff673e 100644
--- a/proxmox-restore-daemon/src/proxmox_restore_daemon/api.rs
+++ b/proxmox-restore-daemon/src/proxmox_restore_daemon/api.rs
@@ -1,8 +1,10 @@
 ///! File-restore API running inside the restore VM
+use std::collections::VecDeque;
 use std::ffi::OsStr;
 use std::fs;
 use std::os::unix::ffi::OsStrExt;
 use std::path::{Path, PathBuf};
+use std::sync::{Arc, Mutex};
 
 use anyhow::{bail, Error};
 use futures::FutureExt;
@@ -360,8 +362,15 @@ fn extract(
                     };
 
                     let pxar_writer = TokioWriter::new(writer);
-                    create_archive(dir, pxar_writer, Flags::DEFAULT, |_| Ok(()), None, options)
-                        .await
+                    create_archive(
+                        dir,
+                        pxar_writer,
+                        Flags::DEFAULT,
+                        |_| Ok(()),
+                        None,
+                        options,
+                        Arc::new(Mutex::new(VecDeque::new())),
+                    ).await
                 }
                 .await;
                 if let Err(err) = result {
diff --git a/pxar-bin/src/main.rs b/pxar-bin/src/main.rs
index 9376a2c1..c019f3e4 100644
--- a/pxar-bin/src/main.rs
+++ b/pxar-bin/src/main.rs
@@ -1,10 +1,10 @@
-use std::collections::HashSet;
+use std::collections::{HashSet, VecDeque};
 use std::ffi::OsStr;
 use std::fs::OpenOptions;
 use std::os::unix::fs::OpenOptionsExt;
 use std::path::{Path, PathBuf};
 use std::sync::atomic::{AtomicBool, Ordering};
-use std::sync::Arc;
+use std::sync::{Arc, Mutex};
 
 use anyhow::{bail, format_err, Error};
 use futures::future::FutureExt;
@@ -384,6 +384,7 @@ async fn create_archive(
         },
         None,
         options,
+        Arc::new(Mutex::new(VecDeque::new())),
     )
     .await?;
 
diff --git a/tests/catar.rs b/tests/catar.rs
index 36bb4f3b..d69cb37b 100644
--- a/tests/catar.rs
+++ b/tests/catar.rs
@@ -1,4 +1,6 @@
 use std::process::Command;
+use std::sync::{Arc, Mutex};
+use std::collections::VecDeque;
 
 use anyhow::Error;
 
@@ -40,6 +42,7 @@ fn run_test(dir_name: &str) -> Result<(), Error> {
         |_| Ok(()),
         None,
         options,
+        Arc::new(Mutex::new(VecDeque::new())),
     ))?;
 
     Command::new("cmp")
-- 
2.39.2

* [pbs-devel] [PATCH v5 proxmox-backup 19/28] fix #3174: backup writer: inject queued chunks in upload stream
  2023-11-15 15:47 [pbs-devel] [PATCH-SERIES v5 pxar proxmox-backup proxmox-widget-toolkit 00/28] fix #3174: improve file-level backup Christian Ebner
                   ` (17 preceding siblings ...)
  2023-11-15 15:48 ` [pbs-devel] [PATCH v5 proxmox-backup 18/28] fix #3174: chunker: add forced boundaries Christian Ebner
@ 2023-11-15 15:48 ` Christian Ebner
  2023-11-15 15:48 ` [pbs-devel] [PATCH v5 proxmox-backup 20/28] fix #3174: archiver: reuse files with unchanged metadata Christian Ebner
                   ` (8 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: Christian Ebner @ 2023-11-15 15:48 UTC (permalink / raw)
  To: pbs-devel

Inject the queued chunks into the backup writer's upload stream,
thereby including them in the index file.
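
For illustration, a minimal self-contained sketch of the two item kinds
the wrapped stream yields after inject_reused_chunks(); the enum mirrors
the InjectedChunksInfo variants used in the diff below, while Digest and
the demo values are simplified stand-ins:

    type Digest = [u8; 32];

    enum InjectedChunksInfo {
        // chunks pulled from the injection queue: already on the server,
        // only their (offset, digest) pairs go into the dynamic index
        Known(Vec<(u64, Digest)>),
        // freshly chunked payload data that still has to be compressed,
        // optionally encrypted and uploaded as a new chunk
        Raw((u64, Vec<u8>)),
    }

    fn describe(item: &InjectedChunksInfo) -> String {
        match item {
            InjectedChunksInfo::Known(chunks) => {
                format!("reuse {} known chunk(s)", chunks.len())
            }
            InjectedChunksInfo::Raw((offset, data)) => {
                format!("new chunk of {} bytes at offset {offset}", data.len())
            }
        }
    }

    fn main() {
        println!("{}", describe(&InjectedChunksInfo::Raw((0, vec![0u8; 4096]))));
        println!("{}", describe(&InjectedChunksInfo::Known(vec![(4096, [0u8; 32])])));
    }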

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
Changes since version 4:
- no changes

Changes since version 3:
- reformatting

Changes since version 2:
- no changes

Changes since version 1:
- no changes

 pbs-client/src/backup_writer.rs | 85 +++++++++++++++++++--------------
 1 file changed, 50 insertions(+), 35 deletions(-)

diff --git a/pbs-client/src/backup_writer.rs b/pbs-client/src/backup_writer.rs
index cc6dd49a..e8c2ec11 100644
--- a/pbs-client/src/backup_writer.rs
+++ b/pbs-client/src/backup_writer.rs
@@ -23,7 +23,7 @@ use pbs_tools::crypt_config::CryptConfig;
 
 use proxmox_human_byte::HumanByte;
 
-use super::inject_reused_chunks::InjectChunks;
+use super::inject_reused_chunks::{InjectChunks, InjectReusedChunks, InjectedChunksInfo};
 use super::merge_known_chunks::{MergeKnownChunks, MergedChunkInfo};
 
 use super::{H2Client, HttpClient};
@@ -667,48 +667,63 @@ impl BackupWriter {
         let index_csum_2 = index_csum.clone();
 
         stream
-            .and_then(move |data| {
-                let chunk_len = data.len();
+            .inject_reused_chunks(
+                injection_queue,
+                stream_len,
+                reused_len.clone(),
+                index_csum.clone(),
+            )
+            .and_then(move |chunk_info| match chunk_info {
+                InjectedChunksInfo::Known(chunks) => {
+                    total_chunks.fetch_add(chunks.len(), Ordering::SeqCst);
+                    future::ok(MergedChunkInfo::Known(chunks))
+                }
+                InjectedChunksInfo::Raw((offset, data)) => {
+                    let chunk_len = data.len();
 
-                total_chunks.fetch_add(1, Ordering::SeqCst);
-                let offset = stream_len.fetch_add(chunk_len, Ordering::SeqCst) as u64;
+                    total_chunks.fetch_add(1, Ordering::SeqCst);
 
-                let mut chunk_builder = DataChunkBuilder::new(data.as_ref()).compress(compress);
+                    let mut chunk_builder = DataChunkBuilder::new(data.as_ref()).compress(compress);
 
-                if let Some(ref crypt_config) = crypt_config {
-                    chunk_builder = chunk_builder.crypt_config(crypt_config);
-                }
+                    if let Some(ref crypt_config) = crypt_config {
+                        chunk_builder = chunk_builder.crypt_config(crypt_config);
+                    }
 
-                let mut known_chunks = known_chunks.lock().unwrap();
-                let digest = chunk_builder.digest();
+                    let mut known_chunks = known_chunks.lock().unwrap();
 
-                let mut guard = index_csum.lock().unwrap();
-                let csum = guard.as_mut().unwrap();
+                    let digest = chunk_builder.digest();
 
-                let chunk_end = offset + chunk_len as u64;
+                    let mut guard = index_csum.lock().unwrap();
+                    let csum = guard.as_mut().unwrap();
 
-                if !is_fixed_chunk_size {
-                    csum.update(&chunk_end.to_le_bytes());
-                }
-                csum.update(digest);
+                    let chunk_end = offset + chunk_len as u64;
 
-                let chunk_is_known = known_chunks.contains(digest);
-                if chunk_is_known {
-                    known_chunk_count.fetch_add(1, Ordering::SeqCst);
-                    reused_len.fetch_add(chunk_len, Ordering::SeqCst);
-                    future::ok(MergedChunkInfo::Known(vec![(offset, *digest)]))
-                } else {
-                    let compressed_stream_len2 = compressed_stream_len.clone();
-                    known_chunks.insert(*digest);
-                    future::ready(chunk_builder.build().map(move |(chunk, digest)| {
-                        compressed_stream_len2.fetch_add(chunk.raw_size(), Ordering::SeqCst);
-                        MergedChunkInfo::New(ChunkInfo {
-                            chunk,
-                            digest,
-                            chunk_len: chunk_len as u64,
-                            offset,
-                        })
-                    }))
+                    if !is_fixed_chunk_size {
+                        csum.update(&chunk_end.to_le_bytes());
+                    }
+                    csum.update(digest);
+
+                    let chunk_is_known = known_chunks.contains(digest);
+                    if chunk_is_known {
+                        known_chunk_count.fetch_add(1, Ordering::SeqCst);
+                        reused_len.fetch_add(chunk_len, Ordering::SeqCst);
+
+                        future::ok(MergedChunkInfo::Known(vec![(offset, *digest)]))
+                    } else {
+                        let compressed_stream_len2 = compressed_stream_len.clone();
+                        known_chunks.insert(*digest);
+
+                        future::ready(chunk_builder.build().map(move |(chunk, digest)| {
+                            compressed_stream_len2.fetch_add(chunk.raw_size(), Ordering::SeqCst);
+
+                            MergedChunkInfo::New(ChunkInfo {
+                                chunk,
+                                digest,
+                                chunk_len: chunk_len as u64,
+                                offset,
+                            })
+                        }))
+                    }
                 }
             })
             .merge_known_chunks()
-- 
2.39.2

* [pbs-devel] [PATCH v5 proxmox-backup 20/28] fix #3174: archiver: reuse files with unchanged metadata
  2023-11-15 15:47 [pbs-devel] [PATCH-SERIES v5 pxar proxmox-backup proxmox-widget-toolkit 00/28] fix #3174: improve file-level backup Christian Ebner
                   ` (18 preceding siblings ...)
  2023-11-15 15:48 ` [pbs-devel] [PATCH v5 proxmox-backup 19/28] fix #3174: backup writer: inject queued chunks in upload stream Christian Ebner
@ 2023-11-15 15:48 ` Christian Ebner
  2023-11-15 15:48 ` [pbs-devel] [PATCH v5 proxmox-backup 21/28] fix #3174: specs: add backup detection mode specification Christian Ebner
                   ` (7 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: Christian Ebner @ 2023-11-15 15:48 UTC (permalink / raw)
  To: pbs-devel

During pxar archive encoding, check regular files against their
metadata in the previous backup's catalog, if present.

For files with unchanged metadata and a file size above a given
threshold, do not re-encode the payload; instead, mark the entries as
appendix references in the pxar archive and append the chunks
containing the file payload to the archive's appendix section.
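
As a hedged sketch of that decision (names and types are simplified
stand-ins; only MAX_FILE_SIZE and the ctime/size comparison correspond
directly to the code in the diff below):

    const MAX_FILE_SIZE: u64 = 1024;

    struct CatalogFileEntry {
        size: u64,
        ctime: i64,
    }

    // A payload is only referenced via the appendix when the file is large
    // enough to be worth it and neither ctime nor size changed since the
    // reference run; everything else is re-encoded as usual.
    fn can_reuse(prev: Option<&CatalogFileEntry>, st_size: u64, st_ctime: i64) -> bool {
        match prev {
            Some(entry) if st_size > MAX_FILE_SIZE => {
                entry.ctime == st_ctime && entry.size == st_size
            }
            _ => false,
        }
    }

    fn main() {
        let prev = CatalogFileEntry { size: 4096, ctime: 1_700_000_000 };
        assert!(can_reuse(Some(&prev), 4096, 1_700_000_000));
        assert!(!can_reuse(Some(&prev), 4096, 1_700_000_001)); // ctime changed
        assert!(!can_reuse(Some(&prev), 512, 1_700_000_000)); // below threshold
    }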

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
Changes since version 4:
- restructured patch by squashing changes from v3 patch 0021 into this
  one, as the logic should be in this patch

Changes since version 3:
- no changes

Changes since version 2:
- Increase file size threshold for metadata based detection to 1k to
  reduce chunk fragmentation.
- Normalize appendix section chunks to inject in order to minimize
  artifical bloating of the index. This further drastically reduces
  restore time for a fragmented index

Changes since version 1:
- Refactor to use new catalog file format and types
- fix issue with injected chunks not being included in the list to
  upload in case of consecutive files referencing the same chunk
- remove restriction of needing the previous backup run to be regular in
  order to perform metadata based file change detection. This can be
  done now, since the catalog stores the appendix start offset for each
  pxar archive in an easily accessible way.

 pbs-client/src/pxar/create.rs                 | 306 +++++++++++++++++-
 proxmox-backup-client/src/main.rs             |   8 +-
 .../src/proxmox_restore_daemon/api.rs         |   1 +
 pxar-bin/src/main.rs                          |   1 +
 src/tape/file_formats/snapshot_archive.rs     |   2 +-
 5 files changed, 298 insertions(+), 20 deletions(-)

diff --git a/pbs-client/src/pxar/create.rs b/pbs-client/src/pxar/create.rs
index c4d207d8..8d3db5e8 100644
--- a/pbs-client/src/pxar/create.rs
+++ b/pbs-client/src/pxar/create.rs
@@ -17,14 +17,17 @@ use nix::sys::stat::{FileStat, Mode};
 
 use pathpatterns::{MatchEntry, MatchFlag, MatchList, MatchType, PatternFlag};
 use proxmox_sys::error::SysError;
-use pxar::encoder::{LinkOffset, SeqWrite};
+use pxar::encoder::{AppendixRefOffset, LinkOffset, SeqWrite};
 use pxar::Metadata;
 
 use proxmox_io::vec;
 use proxmox_lang::c_str;
 use proxmox_sys::fs::{self, acl, xattr};
 
-use pbs_datastore::catalog::{BackupCatalogWriter, CatalogReader};
+use pbs_datastore::catalog::{
+    AppendixStartOffset, BackupCatalogWriter, CatalogReader, CatalogV2Extension, DirEntry,
+    DirEntryAttribute,
+};
 use pbs_datastore::dynamic_index::{DynamicEntry, DynamicIndexReader};
 
 use crate::inject_reused_chunks::InjectChunks;
@@ -32,6 +35,8 @@ use crate::pxar::metadata::errno_is_unsupported;
 use crate::pxar::tools::assert_single_path_component;
 use crate::pxar::Flags;
 
+const MAX_FILE_SIZE: u64 = 1024;
+
 /// Pxar options for creating a pxar archive/stream
 #[derive(Default)]
 pub struct PxarCreateOptions {
@@ -45,6 +50,8 @@ pub struct PxarCreateOptions {
     pub skip_lost_and_found: bool,
     /// Reference state for partial backups
     pub previous_ref: Option<PxarPrevRef>,
+    /// Catalog archive name
+    pub archive_name: Option<CString>,
 }
 
 /// Contains statefull information of previous backups snapshots for partial backups
@@ -127,6 +134,89 @@ struct HardLinkInfo {
     st_ino: u64,
 }
 
+struct Appendix {
+    total: AppendixRefOffset,
+    chunks: Vec<DynamicEntry>,
+}
+
+impl Appendix {
+    fn new() -> Self {
+        Self {
+            total: AppendixRefOffset::default(),
+            chunks: Vec::new(),
+        }
+    }
+
+    fn is_empty(&self) -> bool {
+        self.chunks.is_empty()
+    }
+
+    fn insert(&mut self, indices: Vec<DynamicEntry>, start_padding: u64) -> AppendixRefOffset {
+        if let Some(offset) = self.digest_sequence_contained(&indices) {
+            AppendixRefOffset::default().add(offset + start_padding)
+        } else if let Some(offset) = self.last_digest_matched(&indices) {
+            for chunk in indices.into_iter().skip(1) {
+                self.total = self.total.add(chunk.end());
+                self.chunks.push(chunk);
+            }
+            AppendixRefOffset::default().add(offset + start_padding)
+        } else {
+            let offset = self.total;
+            for chunk in indices.into_iter() {
+                self.total = self.total.add(chunk.end());
+                self.chunks.push(chunk);
+            }
+            offset.add(start_padding)
+        }
+    }
+
+    fn digest_sequence_contained(&self, indices: &[DynamicEntry]) -> Option<u64> {
+        let digest = if let Some(first) = indices.first() {
+            first.digest()
+        } else {
+            return None;
+        };
+
+        // Scan all candidate start positions for a chunk sequence matching
+        // `indices`, tracking the byte offset of each candidate within the
+        // appendix. Comparing the full sequence (instead of breaking out of
+        // a partial comparison) avoids false positives on a prefix match;
+        // candidates that would run past the end of the appendix cannot match.
+        let mut offset = 0;
+        for (position, entry) in self.chunks.iter().enumerate() {
+            if entry.digest() == digest
+                && position + indices.len() <= self.chunks.len()
+                && indices
+                    .iter()
+                    .zip(&self.chunks[position..])
+                    .all(|(a, b)| a.digest() == b.digest())
+            {
+                // `offset` points at the start of the matched sequence
+                return Some(offset);
+            }
+            offset += entry.end();
+        }
+
+        None
+    }
+
+    fn last_digest_matched(&self, indices: &[DynamicEntry]) -> Option<u64> {
+        let digest = if let Some(first) = indices.first() {
+            first.digest()
+        } else {
+            return None;
+        };
+
+        if let Some(last) = self.chunks.last() {
+            if last.digest() == digest {
+                return Some(self.total.raw() - last.end());
+            }
+        }
+
+        None
+    }
+}
+
 struct Archiver {
     feature_flags: Flags,
     fs_feature_flags: Flags,
@@ -144,7 +234,8 @@ struct Archiver {
     file_copy_buffer: Vec<u8>,
     previous_ref: Option<PxarPrevRef>,
     forced_boundaries: Arc<Mutex<VecDeque<InjectChunks>>>,
-    inject: (usize, Vec<DynamicEntry>),
+    appendix: Appendix,
+    prev_appendix: Option<AppendixStartOffset>,
 }
 
 type Encoder<'a, T> = pxar::encoder::aio::Encoder<'a, T>;
@@ -155,7 +246,7 @@ pub async fn create_archive<T, F>(
     feature_flags: Flags,
     callback: F,
     catalog: Option<Arc<Mutex<dyn BackupCatalogWriter + Send>>>,
-    options: PxarCreateOptions,
+    mut options: PxarCreateOptions,
     forced_boundaries: Arc<Mutex<VecDeque<InjectChunks>>>,
 ) -> Result<(), Error>
 where
@@ -200,6 +291,22 @@ where
         )?);
     }
 
+    let (appendix_start, prev_cat_parent) = if let Some(ref mut prev_ref) = options.previous_ref {
+        let entry = prev_ref
+            .catalog
+            .lookup_recursive(prev_ref.archive_name.as_bytes())?;
+        let parent = match entry.attr {
+            DirEntryAttribute::Archive { .. } => Some(entry),
+            _ => None,
+        };
+        let appendix_start = prev_ref
+            .catalog
+            .appendix_offset(prev_ref.archive_name.as_bytes())?;
+        (appendix_start, parent)
+    } else {
+        (None, None)
+    };
+
     let mut archiver = Archiver {
         feature_flags,
         fs_feature_flags,
@@ -216,13 +323,42 @@ where
         file_copy_buffer: vec::undefined(4 * 1024 * 1024),
         previous_ref: options.previous_ref,
         forced_boundaries,
-        inject: (0, Vec::new()),
+        appendix: Appendix::new(),
+        prev_appendix: appendix_start,
     };
 
+    if let Some(ref mut catalog) = archiver.catalog {
+        if let Some(ref archive_name) = options.archive_name {
+            catalog
+                .lock()
+                .unwrap()
+                .start_archive(archive_name.as_c_str())?;
+        }
+    }
+
     archiver
-        .archive_dir_contents(&mut encoder, source_dir, true)
+        .archive_dir_contents(&mut encoder, source_dir, prev_cat_parent.as_ref(), true)
         .await?;
-    encoder.finish().await?;
+
+    if archiver.appendix.is_empty() {
+        encoder.finish(None).await?;
+        if let Some(ref mut catalog) = archiver.catalog {
+            if options.archive_name.is_some() {
+                catalog.lock().unwrap().end_archive(None)?;
+            }
+        }
+    } else {
+        let (appendix_offset, appendix_size) = archiver.add_appendix(&mut encoder).await?;
+        encoder
+            .finish(Some((appendix_offset, appendix_size)))
+            .await?;
+        if let Some(catalog) = archiver.catalog {
+            if options.archive_name.is_some() {
+                catalog.lock().unwrap().end_archive(Some(appendix_offset))?;
+            }
+        }
+    }
+
     Ok(())
 }
 
@@ -251,6 +387,7 @@ impl Archiver {
         &'a mut self,
         encoder: &'a mut Encoder<'b, T>,
         mut dir: Dir,
+        prev_cat_parent: Option<&'a DirEntry>,
         is_root: bool,
     ) -> BoxFuture<'a, Result<(), Error>> {
         async move {
@@ -273,6 +410,13 @@ impl Archiver {
 
             let old_path = std::mem::take(&mut self.path);
 
+            let mut prev_catalog_dir_entries = Vec::new();
+            if let Some(ref mut prev_ref) = self.previous_ref {
+                if let Some(ref parent) = prev_cat_parent {
+                    prev_catalog_dir_entries = prev_ref.catalog.read_dir(parent)?;
+                }
+            }
+
             for file_entry in file_list {
                 let file_name = file_entry.name.to_bytes();
 
@@ -284,9 +428,15 @@ impl Archiver {
 
                 (self.callback)(&file_entry.path)?;
                 self.path = file_entry.path;
-                self.add_entry(encoder, dir_fd, &file_entry.name, &file_entry.stat)
-                    .await
-                    .map_err(|err| self.wrap_err(err))?;
+                self.add_entry(
+                    encoder,
+                    dir_fd,
+                    &file_entry.name,
+                    &file_entry.stat,
+                    &prev_catalog_dir_entries,
+                )
+                .await
+                .map_err(|err| self.wrap_err(err))?;
             }
             self.path = old_path;
             self.entry_counter = entry_counter;
@@ -533,12 +683,119 @@ impl Archiver {
         Ok(())
     }
 
+    async fn add_appendix<T: SeqWrite + Send>(
+        &mut self,
+        encoder: &mut Encoder<'_, T>,
+    ) -> Result<(pxar::encoder::AppendixStartOffset, AppendixRefOffset), Error> {
+        let total = self.appendix.total;
+        let appendix_offset = encoder.add_appendix(total).await?;
+        let mut boundaries = self.forced_boundaries.lock().unwrap();
+        let mut position = unsafe { encoder.position_add(0) };
+
+        // Inject reused chunks in batches of 128 to stay below the upload POST request size limit
+        for chunks in self.appendix.chunks.chunks(128) {
+            let size = chunks
+                .iter()
+                .fold(0, |sum, chunk| sum + chunk.end() as usize);
+            let inject_chunks = InjectChunks {
+                boundary: position,
+                chunks: chunks.to_vec(),
+                size,
+            };
+            boundaries.push_back(inject_chunks);
+            position = unsafe { encoder.position_add(size as u64) };
+        }
+
+        Ok((appendix_offset, total))
+    }
+
+    async fn reuse_if_metadata_unchanged<T: SeqWrite + Send>(
+        &mut self,
+        encoder: &mut Encoder<'_, T>,
+        c_file_name: &CStr,
+        metadata: &Metadata,
+        stat: &FileStat,
+        catalog_entries: &[DirEntry],
+    ) -> Result<bool, Error> {
+        let prev_ref = match self.previous_ref {
+            None => return Ok(false),
+            Some(ref mut prev_ref) => prev_ref,
+        };
+
+        let catalog_entry = catalog_entries
+            .iter()
+            .find(|entry| entry.name == c_file_name.to_bytes());
+
+        let (size, ctime, start_offset) = match catalog_entry {
+            Some(&DirEntry {
+                attr:
+                    DirEntryAttribute::File {
+                        size,
+                        mtime: _,
+                        extension: Some(CatalogV2Extension { ctime, file_offset }),
+                    },
+                ..
+            }) => (size, ctime, file_offset.raw()),
+            Some(&DirEntry {
+                attr:
+                    DirEntryAttribute::AppendixRef {
+                        size,
+                        mtime: _,
+                        ctime,
+                        appendix_ref_offset,
+                    },
+                ..
+            }) => (
+                size,
+                ctime,
+                self.prev_appendix.unwrap().raw() + appendix_ref_offset.raw(),
+            ),
+            // The entry type found in the catalog is not a regular file,
+            // do not reuse entry but rather encode it again.
+            _ => return Ok(false),
+        };
+
+        let file_size = stat.st_size as u64;
+        if ctime != stat.st_ctime || size != file_size {
+            return Ok(false);
+        }
+
+        // Since ctime is unchanged, the encoded size can be calculated from the
+        // current metadata and yields the end offset of the file payload in the
+        // reference archive, which locates the existing chunks for the new index file.
+        let mut bytes = pxar::encoder::encoded_size(c_file_name, metadata).await?;
+        // payload header size
+        bytes += std::mem::size_of::<pxar::format::Header>() as u64;
+
+        let end_offset = start_offset + bytes + file_size;
+        let (indices, start_padding, end_padding) =
+            prev_ref.index.indices(start_offset, end_offset)?;
+
+        let appendix_ref_offset = self.appendix.insert(indices, start_padding);
+        let file_name: &Path = OsStr::from_bytes(c_file_name.to_bytes()).as_ref();
+        self.add_appendix_ref(encoder, file_name, appendix_ref_offset, file_size)
+            .await?;
+
+        if let Some(ref catalog) = self.catalog {
+            catalog.lock().unwrap().add_appendix_ref(
+                c_file_name,
+                file_size,
+                stat.st_mtime,
+                stat.st_ctime,
+                appendix_ref_offset,
+            )?;
+        }
+
+        Ok(true)
+    }
+
     async fn add_entry<T: SeqWrite + Send>(
         &mut self,
         encoder: &mut Encoder<'_, T>,
         parent: RawFd,
         c_file_name: &CStr,
         stat: &FileStat,
+        prev_cat_entries: &[DirEntry],
     ) -> Result<(), Error> {
         use pxar::format::mode;
 
@@ -599,6 +856,20 @@ impl Archiver {
                 }
 
                 let file_size = stat.st_size as u64;
+                if file_size > MAX_FILE_SIZE
+                    && self
+                        .reuse_if_metadata_unchanged(
+                            encoder,
+                            c_file_name,
+                            &metadata,
+                            stat,
+                            prev_cat_entries,
+                        )
+                        .await?
+                {
+                    return Ok(());
+                }
+
                 let offset: LinkOffset = self
                     .add_regular_file(encoder, fd, file_name, &metadata, file_size)
                     .await?;
@@ -626,8 +897,15 @@ impl Archiver {
                 if let Some(ref catalog) = self.catalog {
                     catalog.lock().unwrap().start_directory(c_file_name)?;
                 }
+                let prev_cat_entry = prev_cat_entries.iter().find(|entry| match entry {
+                    DirEntry {
+                        name,
+                        attr: DirEntryAttribute::Directory { .. },
+                    } => name == c_file_name.to_bytes(),
+                    _ => false,
+                });
                 let result = self
-                    .add_directory(encoder, dir, c_file_name, &metadata, stat)
+                    .add_directory(encoder, dir, c_file_name, &metadata, stat, prev_cat_entry)
                     .await;
                 if let Some(ref catalog) = self.catalog {
                     catalog.lock().unwrap().end_directory()?;
@@ -684,6 +962,7 @@ impl Archiver {
         dir_name: &CStr,
         metadata: &Metadata,
         stat: &FileStat,
+        prev_cat_parent: Option<&DirEntry>,
     ) -> Result<(), Error> {
         let dir_name = OsStr::from_bytes(dir_name.to_bytes());
 
@@ -710,14 +989,15 @@ impl Archiver {
             log::info!("skipping mount point: {:?}", self.path);
             Ok(())
         } else {
-            self.archive_dir_contents(&mut encoder, dir, false).await
+            self.archive_dir_contents(&mut encoder, dir, prev_cat_parent, false)
+                .await
         };
 
         self.fs_magic = old_fs_magic;
         self.fs_feature_flags = old_fs_feature_flags;
         self.current_st_dev = old_st_dev;
 
-        encoder.finish().await?;
+        encoder.finish(None).await?;
         result
     }
 
diff --git a/proxmox-backup-client/src/main.rs b/proxmox-backup-client/src/main.rs
index 5945ae5d..cbdd9f43 100644
--- a/proxmox-backup-client/src/main.rs
+++ b/proxmox-backup-client/src/main.rs
@@ -48,7 +48,7 @@ use pbs_client::{
     FixedChunkStream, HttpClient, PxarBackupStream, RemoteChunkReader, UploadOptions,
     BACKUP_SOURCE_SCHEMA,
 };
-use pbs_datastore::catalog::{BackupCatalogWriter, CatalogReader, CatalogWriter};
+use pbs_datastore::catalog::{CatalogReader, CatalogWriter};
 use pbs_datastore::chunk_store::verify_chunk_size;
 use pbs_datastore::dynamic_index::{BufferedDynamicReader, DynamicIndexReader};
 use pbs_datastore::fixed_index::FixedIndexReader;
@@ -1005,10 +1005,6 @@ async fn create_backup(
                 let catalog = catalog.as_ref().unwrap();
 
                 log_file("directory", &filename, &target);
-                catalog
-                    .lock()
-                    .unwrap()
-                    .start_directory(std::ffi::CString::new(target.as_str())?.as_c_str())?;
 
                 let pxar_options = pbs_client::pxar::PxarCreateOptions {
                     device_set: devices.clone(),
@@ -1016,6 +1012,7 @@ async fn create_backup(
                     entries_max: entries_max as usize,
                     skip_lost_and_found,
                     previous_ref: None,
+                    archive_name: Some(std::ffi::CString::new(target.as_str())?),
                 };
 
                 let upload_options = UploadOptions {
@@ -1036,7 +1033,6 @@ async fn create_backup(
                 )
                 .await?;
                 manifest.add_file(target, stats.size, stats.csum, crypto.mode)?;
-                catalog.lock().unwrap().end_directory()?;
             }
             (BackupSpecificationType::IMAGE, false) => {
                 log_file("image", &filename, &target);
diff --git a/proxmox-restore-daemon/src/proxmox_restore_daemon/api.rs b/proxmox-restore-daemon/src/proxmox_restore_daemon/api.rs
index 5eff673e..734fa976 100644
--- a/proxmox-restore-daemon/src/proxmox_restore_daemon/api.rs
+++ b/proxmox-restore-daemon/src/proxmox_restore_daemon/api.rs
@@ -359,6 +359,7 @@ fn extract(
                         patterns,
                         skip_lost_and_found: false,
                         previous_ref: None,
+                        archive_name: None,
                     };
 
                     let pxar_writer = TokioWriter::new(writer);
diff --git a/pxar-bin/src/main.rs b/pxar-bin/src/main.rs
index c019f3e4..6d365115 100644
--- a/pxar-bin/src/main.rs
+++ b/pxar-bin/src/main.rs
@@ -371,6 +371,7 @@ async fn create_archive(
         patterns,
         skip_lost_and_found: false,
         previous_ref: None,
+        archive_name: None,
     };
 
     let writer = pxar::encoder::sync::StandardWriter::new(writer);
diff --git a/src/tape/file_formats/snapshot_archive.rs b/src/tape/file_formats/snapshot_archive.rs
index cc5c5885..0ae2c6f4 100644
--- a/src/tape/file_formats/snapshot_archive.rs
+++ b/src/tape/file_formats/snapshot_archive.rs
@@ -91,7 +91,7 @@ pub fn tape_write_snapshot_archive<'a>(
                 proxmox_lang::io_bail!("file '{}' shrunk while reading", filename);
             }
         }
-        encoder.finish()?;
+        encoder.finish(None)?;
         Ok(())
     });
 
-- 
2.39.2

* [pbs-devel] [PATCH v5 proxmox-backup 21/28] fix #3174: specs: add backup detection mode specification
  2023-11-15 15:47 [pbs-devel] [PATCH-SERIES v5 pxar proxmox-backup proxmox-widget-toolkit 00/28] fix #3174: improve file-level backup Christian Ebner
                   ` (19 preceding siblings ...)
  2023-11-15 15:48 ` [pbs-devel] [PATCH v5 proxmox-backup 20/28] fix #3174: archiver: reuse files with unchanged metadata Christian Ebner
@ 2023-11-15 15:48 ` Christian Ebner
  2023-11-15 15:48 ` [pbs-devel] [PATCH v5 proxmox-backup 22/28] fix #3174: client: Add detection mode to backup creation Christian Ebner
                   ` (6 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: Christian Ebner @ 2023-11-15 15:48 UTC (permalink / raw)
  To: pbs-devel

Adds the specification for switching the detection mode used to
identify regular files which changed since a reference backup run.
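
A few examples of values the new DETECTION_MODE_REGEX is meant to accept
or reject (a sketch using the regex crate directly as a stand-in for the
const_regex! machinery):

    fn main() {
        let re = regex::Regex::new(r"^(data|metadata(:[a-zA-Z0-9_-]+\.pxar)*)$").unwrap();
        for (value, expected) in [
            ("data", true),                         // current default behavior
            ("metadata", true),                     // metadata mode for all archives
            ("metadata:root.pxar", true),           // restricted to a single archive
            ("metadata:root.pxar:home.pxar", true), // or to a list of archives
            ("metadata:root.img", false),           // only .pxar archives qualify
            ("incremental", false),
        ] {
            assert_eq!(re.is_match(value), expected, "{value}");
        }
    }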

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
Changes since version 4:
- no changes

Changes since version 3:
- no changes

Changes since version 2:
- use a backup specification instead of a plain schema. The intention is
  to not only switch metadata based file change detection on or off, but
  to also allow specifying which method should be used for which
  archives.

Changes since version 1:
- not present in version 1

 pbs-client/src/backup_specification.rs | 30 ++++++++++++++++++++++++++
 1 file changed, 30 insertions(+)

diff --git a/pbs-client/src/backup_specification.rs b/pbs-client/src/backup_specification.rs
index 619a3a9d..3f9e6d02 100644
--- a/pbs-client/src/backup_specification.rs
+++ b/pbs-client/src/backup_specification.rs
@@ -4,6 +4,7 @@ use proxmox_schema::*;
 
 const_regex! {
     BACKUPSPEC_REGEX = r"^([a-zA-Z0-9_-]+\.(pxar|img|conf|log)):(.+)$";
+    DETECTION_MODE_REGEX = r"^(data|metadata(:[a-zA-Z0-9_-]+\.pxar)*)$";
 }
 
 pub const BACKUP_SOURCE_SCHEMA: Schema =
@@ -11,6 +12,11 @@ pub const BACKUP_SOURCE_SCHEMA: Schema =
         .format(&ApiStringFormat::Pattern(&BACKUPSPEC_REGEX))
         .schema();
 
+pub const BACKUP_DETECTION_MODE_SPEC: Schema =
+    StringSchema::new("Backup detection mode specification (data|metadata(:<archive-name>.pxar)...).")
+        .format(&ApiStringFormat::Pattern(&DETECTION_MODE_REGEX))
+        .schema();
+
 pub enum BackupSpecificationType {
     PXAR,
     IMAGE,
@@ -45,3 +51,27 @@ pub fn parse_backup_specification(value: &str) -> Result<BackupSpecification, Er
 
     bail!("unable to parse backup source specification '{}'", value);
 }
+
+pub enum BackupDetectionMode {
+    Data,
+    Metadata(Vec<String>),
+}
+
+pub fn parse_backup_detection_mode_specification(
+    value: &str,
+) -> Result<BackupDetectionMode, Error> {
+    if let Some(caps) = (DETECTION_MODE_REGEX.regex_obj)().captures(value) {
+        let mode = match caps.get(1).unwrap().as_str() {
+            "data" => BackupDetectionMode::Data,
+            ty if ty.starts_with("metadata") => {
+                let archives = ty.split(':').skip(1).map(|s| format!("{s}.didx")).collect();
+                BackupDetectionMode::Metadata(archives)
+            },
+            _ => bail!("invalid backup detection mode"),
+        };
+
+        return Ok(mode);
+    }
+
+    bail!("unable to parse backup detection mode specification '{}'", value);
+}
-- 
2.39.2

* [pbs-devel] [PATCH v5 proxmox-backup 22/28] fix #3174: client: Add detection mode to backup creation
  2023-11-15 15:47 [pbs-devel] [PATCH-SERIES v5 pxar proxmox-backup proxmox-widget-toolkit 00/28] fix #3174: improve file-level backup Christian Ebner
                   ` (20 preceding siblings ...)
  2023-11-15 15:48 ` [pbs-devel] [PATCH v5 proxmox-backup 21/28] fix #3174: specs: add backup detection mode specification Christian Ebner
@ 2023-11-15 15:48 ` Christian Ebner
  2023-11-15 15:48 ` [pbs-devel] [PATCH v5 proxmox-backup 23/28] test-suite: add detection mode change benchmark Christian Ebner
                   ` (5 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: Christian Ebner @ 2023-11-15 15:48 UTC (permalink / raw)
  To: pbs-devel

Introduces the `change-detection-mode` parameter to change file
encoding behavior.

When set to `metadata`, the catalog for the previous backup run and
the corresponding index file are fetched from the server and used as
reference during pxar archive creation.
This allows the archiver to skip encoding of file payloads for
unchanged regular files and instead reference their existing chunks
for inclusion in the new backup's index file, creating a pxar archive
whose appendix section contains the payloads as a concatenation of
chunks.
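
Example invocations (hypothetical paths; listing archive names after
`metadata:` restricts the mode to those archives, as allowed by the
specification from the previous patch):

    proxmox-backup-client backup root.pxar:/ --change-detection-mode=metadata
    proxmox-backup-client backup root.pxar:/ home.pxar:/home \
        --change-detection-mode=metadata:root.pxar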

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
Changes since version 4:
- moved some logic to patch v5 0020, as it better fits there

Changes since version 3:
- refactor appendix section creation code

Changes since version 2:
- Fix issue with reference catalog and index download when either the
  backup group contains no snapshots or the snapshot does not contain an
  archive with the given name.

Changes since version 1:
- Replace `incremental` flag with `change-detection-mode` param

 proxmox-backup-client/src/main.rs | 122 ++++++++++++++++++++++++++++--
 1 file changed, 114 insertions(+), 8 deletions(-)

diff --git a/proxmox-backup-client/src/main.rs b/proxmox-backup-client/src/main.rs
index cbdd9f43..65d3308a 100644
--- a/proxmox-backup-client/src/main.rs
+++ b/proxmox-backup-client/src/main.rs
@@ -1,5 +1,6 @@
 use std::collections::{HashSet, VecDeque};
 use std::io::{self, Read, Seek, SeekFrom, Write};
+use std::os::unix::fs::OpenOptionsExt;
 use std::path::{Path, PathBuf};
 use std::pin::Pin;
 use std::sync::{Arc, Mutex};
@@ -43,10 +44,10 @@ use pbs_client::tools::{
     CHUNK_SIZE_SCHEMA, REPO_URL_SCHEMA,
 };
 use pbs_client::{
-    delete_ticket_info, parse_backup_specification, view_task_result, BackupReader,
-    BackupRepository, BackupSpecificationType, BackupStats, BackupWriter, ChunkStream,
-    FixedChunkStream, HttpClient, PxarBackupStream, RemoteChunkReader, UploadOptions,
-    BACKUP_SOURCE_SCHEMA,
+    delete_ticket_info, parse_backup_detection_mode_specification, parse_backup_specification,
+    view_task_result, BackupDetectionMode, BackupReader, BackupRepository, BackupSpecificationType,
+    BackupStats, BackupWriter, ChunkStream, FixedChunkStream, HttpClient, PxarBackupStream,
+    RemoteChunkReader, UploadOptions, BACKUP_DETECTION_MODE_SPEC, BACKUP_SOURCE_SCHEMA,
 };
 use pbs_datastore::catalog::{CatalogReader, CatalogWriter};
 use pbs_datastore::chunk_store::verify_chunk_size;
@@ -666,6 +667,10 @@ fn spawn_catalog_upload(
                schema: TRAFFIC_CONTROL_BURST_SCHEMA,
                optional: true,
            },
+           "change-detection-mode": {
+               schema: BACKUP_DETECTION_MODE_SPEC,
+               optional: true,
+           },
            "exclude": {
                type: Array,
                description: "List of paths or patterns for matching files to exclude.",
@@ -849,7 +854,20 @@ async fn create_backup(
 
     let backup_time = backup_time_opt.unwrap_or_else(epoch_i64);
 
-    let client = connect_rate_limited(&repo, rate_limit)?;
+    let cd_mode = param["change-detection-mode"].as_str().unwrap_or("data");
+    let detection_mode = parse_backup_detection_mode_specification(cd_mode)?;
+
+    let client = connect_rate_limited(&repo, rate_limit.clone())?;
+    let backup_group = BackupGroup::new(backup_type, backup_id);
+
+    let previous_snapshot = if let BackupDetectionMode::Metadata(_) = detection_mode {
+        api_datastore_latest_snapshot(&client, &repo.store(), &backup_ns, backup_group)
+            .await
+            .ok()
+    } else {
+        None
+    };
+
     record_repository(&repo);
 
     let snapshot = BackupDir::from((backup_type, backup_id.to_owned(), backup_time));
@@ -959,8 +977,8 @@ async fn create_backup(
         log::info!("{} {} '{}' to '{}' as {}", what, desc, file, repo, target);
     };
 
-    for (backup_type, filename, target, size) in upload_list {
-        match (backup_type, dry_run) {
+    for (backup_spec_type, filename, target, size) in upload_list {
+        match (backup_spec_type, dry_run) {
             // dry-run
             (BackupSpecificationType::CONFIG, true) => log_file("config file", &filename, &target),
             (BackupSpecificationType::LOGFILE, true) => log_file("log file", &filename, &target),
@@ -1006,12 +1024,51 @@ async fn create_backup(
 
                 log_file("directory", &filename, &target);
 
+                let known_chunks = Arc::new(Mutex::new(HashSet::new()));
+                let mut previous_ref = None;
+
+                if let BackupDetectionMode::Metadata(ref archives) = detection_mode {
+                    if archives.is_empty() || archives.contains(&target) {
+                        if let Some(ref manifest) = previous_manifest {
+                            let reference_catalog = download_reference_catalog(
+                                &repo,
+                                previous_snapshot.as_ref().unwrap(),
+                                &backup_ns,
+                                crypt_config.clone(),
+                            )
+                            .await;
+
+                            if let Ok(reference_catalog) = reference_catalog {
+                                let reference_index = client.download_previous_dynamic_index(
+                                    &target,
+                                    &manifest,
+                                    known_chunks.clone(),
+                                )
+                                .await;
+
+                                if let Ok(reference_index) = reference_index {
+                                    log::info!(
+                                        "Using previous catalog as metadata reference for '{target}'"
+                                    );
+                                    previous_ref = Some(pbs_client::pxar::PxarPrevRef {
+                                        index: reference_index,
+                                        catalog: reference_catalog,
+                                        archive_name: target.clone(),
+                                    });
+                                }
+                            }
+                        } else {
+                            log::info!("No previous manifest, falling back to regular mode");
+                        };
+                    }
+                }
+
                 let pxar_options = pbs_client::pxar::PxarCreateOptions {
                     device_set: devices.clone(),
                     patterns: pattern_list.clone(),
                     entries_max: entries_max as usize,
                     skip_lost_and_found,
-                    previous_ref: None,
+                    previous_ref,
                     archive_name: Some(std::ffi::CString::new(target.as_str())?),
                 };
 
@@ -1112,6 +1169,55 @@ async fn create_backup(
     Ok(Value::Null)
 }
 
+async fn download_reference_catalog(
+    repo: &BackupRepository,
+    previous_snapshot: &BackupDir,
+    backup_ns: &BackupNamespace,
+    crypt_config: Option<Arc<CryptConfig>>,
+) -> Result<CatalogReader<std::fs::File>, Error> {
+    let http_reader_client = connect(&repo)?;
+    let backup_reader = BackupReader::start(
+        http_reader_client,
+        crypt_config.clone(),
+        repo.store(),
+        &backup_ns,
+        &previous_snapshot,
+        true,
+    )
+    .await?;
+
+    let (manifest, _) = backup_reader.download_manifest().await?;
+    manifest.check_fingerprint(crypt_config.as_ref().map(Arc::as_ref))?;
+
+    let index = backup_reader
+        .download_dynamic_index(&manifest, CATALOG_NAME)
+        .await?;
+    let most_used = index.find_most_used_chunks(8);
+    let file_info = manifest.lookup_file_info(CATALOG_NAME)?;
+
+    let chunk_reader = RemoteChunkReader::new(
+        backup_reader,
+        crypt_config.clone(),
+        file_info.chunk_crypt_mode(),
+        most_used,
+    );
+
+    let mut reader = BufferedDynamicReader::new(index, chunk_reader);
+
+    let mut catalogfile = std::fs::OpenOptions::new()
+        .write(true)
+        .read(true)
+        .custom_flags(libc::O_TMPFILE)
+        .open("/tmp")?;
+
+    std::io::copy(&mut reader, &mut catalogfile)
+        .map_err(|err| format_err!("failed to download reference catalog - {}", err))?;
+
+    catalogfile.seek(SeekFrom::Start(0))?;
+
+    Ok(CatalogReader::new(catalogfile))
+}
+
 async fn dump_image<W: Write>(
     client: Arc<BackupReader>,
     crypt_config: Option<Arc<CryptConfig>>,
-- 
2.39.2

* [pbs-devel] [PATCH v5 proxmox-backup 23/28] test-suite: add detection mode change benchmark
  2023-11-15 15:47 [pbs-devel] [PATCH-SERIES v5 pxar proxmox-backup proxmox-widget-toolkit 00/28] fix #3174: improve file-level backup Christian Ebner
                   ` (21 preceding siblings ...)
  2023-11-15 15:48 ` [pbs-devel] [PATCH v5 proxmox-backup 22/28] fix #3174: client: Add detection mode to backup creation Christian Ebner
@ 2023-11-15 15:48 ` Christian Ebner
  2023-11-15 15:48 ` [pbs-devel] [PATCH v5 proxmox-backup 24/28] test-suite: Add bin to deb, add shell completions Christian Ebner
                   ` (4 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: Christian Ebner @ 2023-11-15 15:48 UTC (permalink / raw)
  To: pbs-devel

Introduces the proxmox-backup-test-suite crate, intended for
benchmarking and high-level, user-facing testing.

The initial code includes a benchmark intended for regression testing of
the proxmox-backup-client when using different file detection modes
during backup.
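
Example usage (hypothetical paths; `prepare` fetches the test data
described below, `run` repeatedly invokes proxmox-backup-client):

    proxmox-backup-test-suite detection-mode-bench prepare --target /path/to/testdata
    proxmox-backup-test-suite detection-mode-bench run testdata.pxar:/path/to/testdata \
        --repository <repository>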

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
Changes since version 4:
- add a metadata based backup run, as only such runs can be used as
  reference

Changes since version 3:
- no changes

Changes since version 2:
- no changes

Changes since version 1:
- not present in version 1

 Cargo.toml                                    |   1 +
 proxmox-backup-test-suite/Cargo.toml          |  18 ++
 .../src/detection_mode_bench.rs               | 291 ++++++++++++++++++
 proxmox-backup-test-suite/src/main.rs         |  17 +
 4 files changed, 327 insertions(+)
 create mode 100644 proxmox-backup-test-suite/Cargo.toml
 create mode 100644 proxmox-backup-test-suite/src/detection_mode_bench.rs
 create mode 100644 proxmox-backup-test-suite/src/main.rs

diff --git a/Cargo.toml b/Cargo.toml
index db18cd45..674d395e 100644
--- a/Cargo.toml
+++ b/Cargo.toml
@@ -46,6 +46,7 @@ members = [
     "proxmox-rrd",
 
     "pxar-bin",
+    "proxmox-backup-test-suite",
 ]
 
 [lib]
diff --git a/proxmox-backup-test-suite/Cargo.toml b/proxmox-backup-test-suite/Cargo.toml
new file mode 100644
index 00000000..3f899e9b
--- /dev/null
+++ b/proxmox-backup-test-suite/Cargo.toml
@@ -0,0 +1,18 @@
+[package]
+name = "proxmox-backup-test-suite"
+version = "0.1.0"
+authors.workspace = true
+edition.workspace = true
+
+[dependencies]
+anyhow.workspace = true
+futures.workspace = true
+serde.workspace = true
+serde_json.workspace = true
+
+pbs-client.workspace = true
+pbs-key-config.workspace = true
+pbs-tools.workspace = true
+proxmox-async.workspace = true
+proxmox-router = { workspace = true, features = ["cli"] }
+proxmox-schema = { workspace = true, features = [ "api-macro" ] }
diff --git a/proxmox-backup-test-suite/src/detection_mode_bench.rs b/proxmox-backup-test-suite/src/detection_mode_bench.rs
new file mode 100644
index 00000000..2f891990
--- /dev/null
+++ b/proxmox-backup-test-suite/src/detection_mode_bench.rs
@@ -0,0 +1,291 @@
+use std::path::Path;
+use std::process::Command;
+
+use anyhow::{bail, format_err, Error};
+use serde_json::Value;
+
+use pbs_client::{
+    tools::{complete_repository, key_source::KEYFILE_SCHEMA, REPO_URL_SCHEMA},
+    BACKUP_SOURCE_SCHEMA,
+};
+use pbs_tools::json;
+use proxmox_router::cli::*;
+use proxmox_schema::api;
+
+const DEFAULT_NUMBER_OF_RUNS: u64 = 5;
+// Homepage https://cocodataset.org/
+const COCO_DATASET_SRC_URL: &'static str = "http://images.cocodataset.org/zips/unlabeled2017.zip";
+// Homepage https://kernel.org/
+const LINUX_GIT_REPOSITORY: &'static str =
+    "git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git";
+const LINUX_GIT_TAG: &'static str = "v6.5.5";
+
+pub(crate) fn detection_mode_bench_mgmt_cli() -> CliCommandMap {
+    let run_cmd_def = CliCommand::new(&API_METHOD_DETECTION_MODE_BENCH_RUN)
+        .arg_param(&["backupspec"])
+        .completion_cb("repository", complete_repository)
+        .completion_cb("keyfile", complete_file_name);
+
+    let prepare_cmd_def = CliCommand::new(&API_METHOD_DETECTION_MODE_BENCH_PREPARE);
+    CliCommandMap::new()
+        .insert("prepare", prepare_cmd_def)
+        .insert("run", run_cmd_def)
+}
+
+#[api(
+   input: {
+       properties: {
+           backupspec: {
+               type: Array,
+               description: "List of backup source specifications ([<label.ext>:<path>] ...)",
+               items: {
+                   schema: BACKUP_SOURCE_SCHEMA,
+               }
+           },
+           repository: {
+               schema: REPO_URL_SCHEMA,
+               optional: true,
+           },
+           keyfile: {
+               schema: KEYFILE_SCHEMA,
+               optional: true,
+           },
+           "number-of-runs": {
+               description: "Number of times to repeat the run",
+               type: Integer,
+               optional: true,
+           },
+       }
+   }
+)]
+/// Run benchmark to compare performance for backups using different change detection modes.
+fn detection_mode_bench_run(param: Value) -> Result<(), Error> {
+    let mut pbc = Command::new("proxmox-backup-client");
+    pbc.arg("backup");
+
+    let backupspec_list = json::required_array_param(&param, "backupspec")?;
+    for backupspec in backupspec_list {
+        let arg = backupspec
+            .as_str()
+            .ok_or_else(|| format_err!("failed to parse backupspec"))?;
+        pbc.arg(arg);
+    }
+
+    if let Some(repo) = param["repository"].as_str() {
+        pbc.arg("--repository");
+        pbc.arg::<&str>(repo);
+    }
+
+    if let Some(keyfile) = param["keyfile"].as_str() {
+        pbc.arg("--keyfile");
+        pbc.arg::<&str>(keyfile);
+    }
+
+    let number_of_runs = match param["number-of-runs"].as_u64() {
+        Some(n) => n,
+        None => DEFAULT_NUMBER_OF_RUNS,
+    };
+    if number_of_runs < 2 {
+        bail!("Number of runs must be greater than 1, aborting.");
+    }
+
+    // An initial warm-up run makes sure all chunks are already present on the
+    // server and reduces side effects caused by filesystem caches etc.
+    let _stats_initial = do_run(&mut pbc, 1)?;
+
+    println!("\nStarting benchmarking backups with regular detection mode...\n");
+    let stats_reg = do_run(&mut pbc, number_of_runs)?;
+
+    // Make sure to have a valid reference with catalog format version 2
+    pbc.arg("--change-detection-mode=metadata");
+    let _stats_initial = do_run(&mut pbc, 1)?;
+
+    println!("\nStarting benchmarking backups with metadata detection mode...\n");
+    let stats_meta = do_run(&mut pbc, number_of_runs)?;
+
+    println!("\nCompleted benchmark with {number_of_runs} runs for each tested mode.");
+    println!("\nCompleted regular backup with:");
+    println!("Total runtime: {:.2} s", stats_reg.total);
+    println!("Average: {:.2} ± {:.2} s", stats_reg.avg, stats_reg.stddev);
+    println!("Min: {:.2} s", stats_reg.min);
+    println!("Max: {:.2} s", stats_reg.max);
+
+    println!("\nCompleted metadata detection mode backup with:");
+    println!("Total runtime: {:.2} s", stats_meta.total);
+    println!(
+        "Average: {:.2} ± {:.2} s",
+        stats_meta.avg, stats_meta.stddev
+    );
+    println!("Min: {:.2} s", stats_meta.min);
+    println!("Max: {:.2} s", stats_meta.max);
+
+    let diff_stddev =
+        ((stats_meta.stddev * stats_meta.stddev) + (stats_reg.stddev * stats_reg.stddev)).sqrt();
+    println!("\nDifferences (metadata based - regular):");
+    println!(
+        "Delta total runtime: {:.2} s ({:.2} %)",
+        stats_meta.total - stats_reg.total,
+        100.0 * (stats_meta.total / stats_reg.total - 1.0),
+    );
+    println!(
+        "Delta average: {:.2} ± {:.2} s ({:.2} %)",
+        stats_meta.avg - stats_reg.avg,
+        diff_stddev,
+        100.0 * (stats_meta.avg / stats_reg.avg - 1.0),
+    );
+    println!(
+        "Delta min: {:.2} s ({:.2} %)",
+        stats_meta.min - stats_reg.min,
+        100.0 * (stats_meta.min / stats_reg.min - 1.0),
+    );
+    println!(
+        "Delta max: {:.2} s ({:.2} %)",
+        stats_meta.max - stats_reg.max,
+        100.0 * (stats_meta.max / stats_reg.max - 1.0),
+    );
+
+    Ok(())
+}
+
+fn do_run(cmd: &mut Command, n_runs: u64) -> Result<Statistics, Error> {
+    let mut timings = Vec::with_capacity(n_runs as usize);
+    for iteration in 1..n_runs + 1 {
+        let start = std::time::SystemTime::now();
+        let mut child = cmd.spawn()?;
+        let exit_code = child.wait()?;
+        let elapsed = start.elapsed()?;
+        timings.push(elapsed);
+        if !exit_code.success() {
+            bail!("Run number {iteration} of {n_runs} failed, aborting.");
+        }
+    }
+
+    Ok(statistics(timings))
+}
+
+struct Statistics {
+    total: f64,
+    avg: f64,
+    stddev: f64,
+    min: f64,
+    max: f64,
+}
+
+fn statistics(timings: Vec<std::time::Duration>) -> Statistics {
+    let total = timings
+        .iter()
+        .fold(0f64, |sum, time| sum + time.as_secs_f64());
+    let avg = total / timings.len() as f64;
+    let var = 1f64 / (timings.len() - 1) as f64
+        * timings.iter().fold(0f64, |sq_sum, time| {
+            let diff = time.as_secs_f64() - avg;
+            sq_sum + diff * diff
+        });
+    let stddev = var.sqrt();
+    let min = timings.iter().min().unwrap().as_secs_f64();
+    let max = timings.iter().max().unwrap().as_secs_f64();
+
+    Statistics {
+        total,
+        avg,
+        stddev,
+        min,
+        max,
+    }
+}
+
+#[api(
+    input: {
+        properties: {
+            target: {
+                description: "target path to prepare test data.",
+            },
+        },
+    },
+)]
+/// Prepare files required for detection mode backup benchmarks.
+fn detection_mode_bench_prepare(target: String) -> Result<(), Error> {
+    let linux_repo_target = format!("{target}/linux");
+    let coco_dataset_target = format!("{target}/coco");
+    git_clone(LINUX_GIT_REPOSITORY, linux_repo_target.as_str())?;
+    git_checkout(LINUX_GIT_TAG, linux_repo_target.as_str())?;
+    wget_download(COCO_DATASET_SRC_URL, coco_dataset_target.as_str())?;
+
+    Ok(())
+}
+
+fn git_clone(repo: &str, target: &str) -> Result<(), Error> {
+    println!("Calling git clone for '{repo}'.");
+    let target_git = format!("{target}/.git");
+    let path = Path::new(&target_git);
+    if let Ok(true) = path.try_exists() {
+        println!("Target '{target}' already contains a git repository, skip.");
+        return Ok(());
+    }
+
+    let mut git = Command::new("git");
+    git.args(["clone", repo, target]);
+
+    let mut child = git.spawn()?;
+    let exit_code = child.wait()?;
+    if exit_code.success() {
+        println!("git clone finished with success.");
+    } else {
+        bail!("git clone failed for '{target}'.");
+    }
+
+    Ok(())
+}
+
+fn git_checkout(checkout_target: &str, target: &str) -> Result<(), Error> {
+    println!("Calling git checkout '{checkout_target}'.");
+    let mut git = Command::new("git");
+    git.args(["-C", target, "checkout", checkout_target]);
+
+    let mut child = git.spawn()?;
+    let exit_code = child.wait()?;
+    if exit_code.success() {
+        println!("git checkout finished with success.");
+    } else {
+        bail!("git checkout '{checkout_target}' failed for '{target}'.");
+    }
+    Ok(())
+}
+
+fn wget_download(source_url: &str, target: &str) -> Result<(), Error> {
+    let path = Path::new(&target);
+    if let Ok(true) = path.try_exists() {
+        println!("Target '{target}' already exists, skip.");
+        return Ok(());
+    }
+    let zip = format!("{}/unlabeled2017.zip", target);
+    let path = Path::new(&zip);
+    if !path.try_exists()? {
+        println!("Download archive using wget from '{source_url}' to '{target}'.");
+        let mut wget = Command::new("wget");
+        wget.args(["-P", target, source_url]);
+
+        let mut child = wget.spawn()?;
+        let exit_code = child.wait()?;
+        if exit_code.success() {
+            println!("Download finished with success.");
+        } else {
+            bail!("Failed to download '{source_url}' to '{target}'.");
+        }
+    } else {
+        println!("Target '{target}' already contains download, skip download.");
+    }
+
+    let mut unzip = Command::new("unzip");
+    unzip.args([&zip, "-d", target]);
+
+    let mut child = unzip.spawn()?;
+    let exit_code = child.wait()?;
+    if exit_code.success() {
+        println!("Extracting zip archive finished with success.");
+    } else {
+        bail!("Failed to extract zip archive '{zip}' to '{target}'.");
+    }
+    Ok(())
+}
diff --git a/proxmox-backup-test-suite/src/main.rs b/proxmox-backup-test-suite/src/main.rs
new file mode 100644
index 00000000..0a5b436a
--- /dev/null
+++ b/proxmox-backup-test-suite/src/main.rs
@@ -0,0 +1,17 @@
+use proxmox_router::cli::*;
+
+mod detection_mode_bench;
+
+fn main() {
+    let cmd_def = CliCommandMap::new().insert(
+        "detection-mode-bench",
+        detection_mode_bench::detection_mode_bench_mgtm_cli(),
+    );
+
+    let rpcenv = CliEnvironment::new();
+    run_cli_command(
+        cmd_def,
+        rpcenv,
+        Some(|future| proxmox_async::runtime::main(future)),
+    );
+}
-- 
2.39.2





^ permalink raw reply	[flat|nested] 29+ messages in thread

* [pbs-devel] [PATCH v5 proxmox-backup 24/28] test-suite: Add bin to deb, add shell completions
  2023-11-15 15:47 [pbs-devel] [PATCH-SERIES v5 pxar proxmox-backup proxmox-widget-toolkit 00/28] fix #3174: improve file-level backup Christian Ebner
                   ` (22 preceding siblings ...)
  2023-11-15 15:48 ` [pbs-devel] [PATCH v5 proxmox-backup 23/28] test-suite: add detection mode change benchmark Christian Ebner
@ 2023-11-15 15:48 ` Christian Ebner
  2023-11-15 15:48 ` [pbs-devel] [PATCH v5 proxmox-backup 25/28] catalog: fetch offset and size for files and refs Christian Ebner
                   ` (3 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: Christian Ebner @ 2023-11-15 15:48 UTC (permalink / raw)
  To: pbs-devel

Adds the required files for bash and zsh completion and packages the
binary to be included in the proxmox-backup-client debian package.

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
Changes since version 4:
- no changes

Changes since version 3:
- no changes

Changes since version 2:
- no changes

Changes since version 1:
- not present in version 1

 Makefile                                     | 13 ++++++++-----
 debian/proxmox-backup-client.bash-completion |  1 +
 debian/proxmox-backup-client.install         |  2 ++
 debian/proxmox-backup-test-suite.bc          |  8 ++++++++
 zsh-completions/_proxmox-backup-test-suite   | 13 +++++++++++++
 5 files changed, 32 insertions(+), 5 deletions(-)
 create mode 100644 debian/proxmox-backup-test-suite.bc
 create mode 100644 zsh-completions/_proxmox-backup-test-suite

diff --git a/Makefile b/Makefile
index 0317dd5e..acaac3f7 100644
--- a/Makefile
+++ b/Makefile
@@ -8,11 +8,12 @@ SUBDIRS := etc www docs
 
 # Binaries usable by users
 USR_BIN := \
-	proxmox-backup-client 	\
-	proxmox-file-restore	\
-	pxar			\
-	proxmox-tape		\
-	pmtx			\
+	proxmox-backup-client 		\
+	proxmox-backup-test-suite 	\
+	proxmox-file-restore		\
+	pxar				\
+	proxmox-tape			\
+	pmtx				\
 	pmt
 
 # Binaries usable by admins
@@ -165,6 +166,8 @@ $(COMPILED_BINS) $(COMPILEDIR)/dump-catalog-shell-cli $(COMPILEDIR)/docgen: .do-
 	    --bin proxmox-backup-client \
 	    --bin dump-catalog-shell-cli \
 	    --bin proxmox-backup-debug \
+	    --package proxmox-backup-test-suite \
+	    --bin proxmox-backup-test-suite \
 	    --package proxmox-file-restore \
 	    --bin proxmox-file-restore \
 	    --package pxar-bin \
diff --git a/debian/proxmox-backup-client.bash-completion b/debian/proxmox-backup-client.bash-completion
index 43736017..c4ff02ae 100644
--- a/debian/proxmox-backup-client.bash-completion
+++ b/debian/proxmox-backup-client.bash-completion
@@ -1,2 +1,3 @@
 debian/proxmox-backup-client.bc proxmox-backup-client
+debian/proxmox-backup-test-suite.bc proxmox-backup-test-suite
 debian/pxar.bc pxar
diff --git a/debian/proxmox-backup-client.install b/debian/proxmox-backup-client.install
index 74b568f1..0eb85975 100644
--- a/debian/proxmox-backup-client.install
+++ b/debian/proxmox-backup-client.install
@@ -1,6 +1,8 @@
 usr/bin/proxmox-backup-client
+usr/bin/proxmox-backup-test-suite
 usr/bin/pxar
 usr/share/man/man1/proxmox-backup-client.1
 usr/share/man/man1/pxar.1
 usr/share/zsh/vendor-completions/_proxmox-backup-client
+usr/share/zsh/vendor-completions/_proxmox-backup-test-suite
 usr/share/zsh/vendor-completions/_pxar
diff --git a/debian/proxmox-backup-test-suite.bc b/debian/proxmox-backup-test-suite.bc
new file mode 100644
index 00000000..2686d7ea
--- /dev/null
+++ b/debian/proxmox-backup-test-suite.bc
@@ -0,0 +1,8 @@
+# proxmox-backup-test-suite bash completion
+
+# see http://tiswww.case.edu/php/chet/bash/FAQ
+# and __ltrim_colon_completions() in /usr/share/bash-completion/bash_completion
+# this modifies global var, but I found no better way
+COMP_WORDBREAKS=${COMP_WORDBREAKS//:}
+
+complete -C 'proxmox-backup-test-suite bashcomplete' proxmox-backup-test-suite
diff --git a/zsh-completions/_proxmox-backup-test-suite b/zsh-completions/_proxmox-backup-test-suite
new file mode 100644
index 00000000..72ebcea5
--- /dev/null
+++ b/zsh-completions/_proxmox-backup-test-suite
@@ -0,0 +1,13 @@
+#compdef _proxmox-backup-test-suite() proxmox-backup-test-suite
+
+function _proxmox-backup-test-suite() {
+    local cwords line point cmd curr prev
+    cwords=${#words[@]}
+    line=$words
+    point=${#line}
+    cmd=${words[1]}
+    curr=${words[cwords]}
+    prev=${words[cwords-1]}
+    compadd -- $(COMP_CWORD="$cwords" COMP_LINE="$line" COMP_POINT="$point" \
+        proxmox-backup-test-suite bashcomplete "$cmd" "$curr" "$prev")
+}
-- 
2.39.2





^ permalink raw reply	[flat|nested] 29+ messages in thread

* [pbs-devel] [PATCH v5 proxmox-backup 25/28] catalog: fetch offset and size for files and refs
  2023-11-15 15:47 [pbs-devel] [PATCH-SERIES v5 pxar proxmox-backup proxmox-widget-toolkit 00/28] fix #3174: improve file-level backup Christian Ebner
                   ` (23 preceding siblings ...)
  2023-11-15 15:48 ` [pbs-devel] [PATCH v5 proxmox-backup 24/28] test-suite: Add bin to deb, add shell completions Christian Ebner
@ 2023-11-15 15:48 ` Christian Ebner
  2023-11-15 15:48 ` [pbs-devel] [PATCH v5 proxmox-backup 26/28] pxar: add heuristic to reduce reused chunk fragmentation Christian Ebner
                   ` (2 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: Christian Ebner @ 2023-11-15 15:48 UTC (permalink / raw)
  To: pbs-devel

Allows fetching the pxar archive offsets and file sizes for regular
files and appendix referenced files.

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
Changes since version 4:
- no changes

Changes since version 3:
- not present in version 3

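For illustration, a minimal sketch of how the new helper could be
consumed; the catalog path is a made-up example and error handling is
elided:

    // Sketch only: open a catalog and list payload offsets.
    let file = std::fs::File::open("/path/to/catalog.pcat1")?;
    let mut reader = CatalogReader::new(file);
    // BTreeMap of pxar archive byte offset -> file size, covering both
    // regular File entries and AppendixRef entries.
    let offsets = reader.fetch_offsets()?;
    for (offset, size) in &offsets {
        println!("payload at offset {offset}, {size} bytes");
    }
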
 pbs-datastore/src/catalog.rs | 70 ++++++++++++++++++++++++++++++++++++
 1 file changed, 70 insertions(+)

diff --git a/pbs-datastore/src/catalog.rs b/pbs-datastore/src/catalog.rs
index 220313c6..fe076a94 100644
--- a/pbs-datastore/src/catalog.rs
+++ b/pbs-datastore/src/catalog.rs
@@ -1,3 +1,4 @@
+use std::collections::BTreeMap;
 use std::ffi::{CStr, CString, OsStr};
 use std::fmt;
 use std::io::{Read, Seek, SeekFrom, Write};
@@ -1118,6 +1119,75 @@ impl<R: Read + Seek> CatalogReader<R> {
 
         Ok(res)
     }
+
+    /// Get all File and AppendixRef entries with their pxar archive offset and size
+    pub fn fetch_offsets(&mut self) -> Result<BTreeMap<u64, u64>, Error> {
+        let root = self.root()?;
+        let mut list = BTreeMap::new();
+        match root {
+            DirEntry {
+                attr: DirEntryAttribute::Directory { start },
+                ..
+            } => self.fetch_offsets_from_dir(std::path::Path::new("./"), start, &mut list, None)?,
+            _ => bail!("unexpected root entry type, not a directory!"),
+        }
+        Ok(list)
+    }
+
+    fn fetch_offsets_from_dir(
+        &mut self,
+        prefix: &std::path::Path,
+        start: u64,
+        list: &mut BTreeMap<u64, u64>,
+        appendix_start: Option<AppendixStartOffset>,
+    ) -> Result<(), Error> {
+        let data = self.read_raw_dirinfo_block(start)?;
+
+        DirInfo::parse(
+            &data,
+            self.magic,
+            |etype, name_bytes, offset, size, _mtime, _ctime, link_offset| {
+                let mut path = std::path::PathBuf::from(prefix);
+                let name: &OsStr = OsStrExt::from_bytes(name_bytes);
+                path.push(name);
+
+                match etype {
+                    CatalogEntryType::Archive => {
+                        if offset > start {
+                            bail!("got wrong archive offset ({} > {})", offset, start);
+                        }
+                        let pos = start - offset;
+                        let appendix_start = self.appendix_offset(name_bytes)?;
+                        self.fetch_offsets_from_dir(&path, pos, list, appendix_start)?;
+                    }
+                    CatalogEntryType::Directory => {
+                        if offset > start {
+                            bail!("got wrong directory offset ({} > {})", offset, start);
+                        }
+                        let pos = start - offset;
+                        self.fetch_offsets_from_dir(&path, pos, list, appendix_start)?;
+                    }
+                    CatalogEntryType::AppendixRef => {
+                        if let Some(Offset::AppendixRefOffset { offset }) = link_offset {
+                            if let Some(appendix_start) = appendix_start {
+                                list.insert(appendix_start.raw() + offset, size);
+                            } else {
+                                bail!("missing required appendix start offset");
+                            }
+                        }
+                    }
+                    CatalogEntryType::File => {
+                        if let Some(Offset::FileOffset { offset }) = link_offset {
+                            list.insert(offset, size);
+                        }
+                    }
+                    _ => {}
+                }
+                Ok(true)
+            },
+        )?;
+        Ok(())
+    }
 }
 
 /// Serialize i64 as short, variable length byte sequence
-- 
2.39.2





^ permalink raw reply	[flat|nested] 29+ messages in thread

* [pbs-devel] [PATCH v5 proxmox-backup 26/28] pxar: add heuristic to reduce reused chunk fragmentation
  2023-11-15 15:47 [pbs-devel] [PATCH-SERIES v5 pxar proxmox-backup proxmox-widget-toolkit 00/28] fix #3174: improve file-level backup Christian Ebner
                   ` (24 preceding siblings ...)
  2023-11-15 15:48 ` [pbs-devel] [PATCH v5 proxmox-backup 25/28] catalog: fetch offset and size for files and refs Christian Ebner
@ 2023-11-15 15:48 ` Christian Ebner
  2023-11-15 15:48 ` [pbs-devel] [PATCH v5 proxmox-backup 27/28] catalog: use format version 2 conditionally Christian Ebner
  2023-11-15 15:48 ` [pbs-devel] [PATCH v5 proxmox-widget-toolkit 28/28] file-browser: support pxar archive and fileref types Christian Ebner
  27 siblings, 0 replies; 29+ messages in thread
From: Christian Ebner @ 2023-11-15 15:48 UTC (permalink / raw)
  To: pbs-devel

For multiple consecutive runs with metadata based file change detection,
referencing existing chunks can lead to fragmentation and an increased
size of the index file.

In order to reduce this, check what fraction of the covering chunks was
referenced by the previous run and re-chunk files where that fraction
falls below a constant threshold.

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
Changes since version 4:
- no changes

Changes since version 3:
- not present in version 3

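To make the threshold concrete, a small standalone sketch with made-up
numbers (a file covered by two 4 MiB chunks, of which the previous run
referenced 5 MiB in total):

    // Sketch only: mirrors the FRAGMENTATION_THRESHOLD check in the diff below.
    const FRAGMENTATION_THRESHOLD: f64 = 0.75;
    let chunk_sum = 2.0 * 4.0 * 1024.0 * 1024.0; // total size of the covering chunks
    let refs_sum = 5.0 * 1024.0 * 1024.0; // bytes referenced by the previous run
    if refs_sum / chunk_sum < FRAGMENTATION_THRESHOLD {
        // 0.625 < 0.75: too fragmented, re-encode the file instead of referencing it
    }
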
 pbs-client/src/pxar/create.rs | 55 +++++++++++++++++++++++++----------
 1 file changed, 40 insertions(+), 15 deletions(-)

diff --git a/pbs-client/src/pxar/create.rs b/pbs-client/src/pxar/create.rs
index 8d3db5e8..baeba859 100644
--- a/pbs-client/src/pxar/create.rs
+++ b/pbs-client/src/pxar/create.rs
@@ -1,7 +1,8 @@
-use std::collections::{HashMap, HashSet, VecDeque};
+use std::collections::{BTreeMap, HashMap, HashSet, VecDeque};
 use std::ffi::{CStr, CString, OsStr};
 use std::fmt;
 use std::io::{self, Read};
+use std::ops::Bound::Included;
 use std::os::unix::ffi::OsStrExt;
 use std::os::unix::io::{AsRawFd, FromRawFd, IntoRawFd, OwnedFd, RawFd};
 use std::path::{Path, PathBuf};
@@ -36,6 +37,7 @@ use crate::pxar::tools::assert_single_path_component;
 use crate::pxar::Flags;
 
 const MAX_FILE_SIZE: u64 = 1024;
+const FRAGMENTATION_THRESHOLD: f64 = 0.75;
 
 /// Pxar options for creating a pxar archive/stream
 #[derive(Default)]
@@ -236,6 +238,7 @@ struct Archiver {
     forced_boundaries: Arc<Mutex<VecDeque<InjectChunks>>>,
     appendix: Appendix,
     prev_appendix: Option<AppendixStartOffset>,
+    ref_offsets: BTreeMap<u64, u64>,
 }
 
 type Encoder<'a, T> = pxar::encoder::aio::Encoder<'a, T>;
@@ -291,21 +294,23 @@ where
         )?);
     }
 
-    let (appendix_start, prev_cat_parent) = if let Some(ref mut prev_ref) = options.previous_ref {
-        let entry = prev_ref
-            .catalog
-            .lookup_recursive(prev_ref.archive_name.as_bytes())?;
-        let parent = match entry.attr {
-            DirEntryAttribute::Archive { .. } => Some(entry),
-            _ => None,
+    let (appendix_start, prev_cat_parent, ref_offsets) =
+        if let Some(ref mut prev_ref) = options.previous_ref {
+            let entry = prev_ref
+                .catalog
+                .lookup_recursive(prev_ref.archive_name.as_bytes())?;
+            let parent = match entry.attr {
+                DirEntryAttribute::Archive { .. } => Some(entry),
+                _ => None,
+            };
+            let appendix_start = prev_ref
+                .catalog
+                .appendix_offset(prev_ref.archive_name.as_bytes())?;
+            let ref_offsets = prev_ref.catalog.fetch_offsets()?;
+            (appendix_start, parent, ref_offsets)
+        } else {
+            (None, None, BTreeMap::new())
         };
-        let appendix_start = prev_ref
-            .catalog
-            .appendix_offset(prev_ref.archive_name.as_bytes())?;
-        (appendix_start, parent)
-    } else {
-        (None, None)
-    };
 
     let mut archiver = Archiver {
         feature_flags,
@@ -325,6 +330,7 @@ where
         forced_boundaries,
         appendix: Appendix::new(),
         prev_appendix: appendix_start,
+        ref_offsets,
     };
 
     if let Some(ref mut catalog) = archiver.catalog {
@@ -771,6 +777,25 @@ impl Archiver {
         let (indices, start_padding, end_padding) =
             prev_ref.index.indices(start_offset, end_offset)?;
 
+        // Heuristic to reduce chunk fragmentation by only referencing files if the previous
+        // run also referenced them. If the file spans more than 2 chunks, it is always
+        // referenced, otherwise check if at least 3/4 of the chunks were referenced in the
+        // last backup run.
+        if indices.len() < 3 {
+            let chunk_sum = indices.iter().fold(0f64, |sum, ind| sum + ind.end() as f64);
+            let chunks_start = start_offset - start_padding;
+            let chunks_end = end_offset + end_padding;
+            let refs_sum = self
+                .ref_offsets
+                .range((Included(chunks_start), Included(chunks_end)))
+                .fold(0f64, |sum, (_, size)| sum + *size as f64);
+
+            // Does not cover the attribute size information
+            if refs_sum / chunk_sum < FRAGMENTATION_THRESHOLD {
+                return Ok(false);
+            }
+        }
+
         let appendix_ref_offset = self.appendix.insert(indices, start_padding);
         let file_name: &Path = OsStr::from_bytes(c_file_name.to_bytes()).as_ref();
         self.add_appendix_ref(encoder, file_name, appendix_ref_offset, file_size)
-- 
2.39.2





^ permalink raw reply	[flat|nested] 29+ messages in thread

* [pbs-devel] [PATCH v5 proxmox-backup 27/28] catalog: use format version 2 conditionally
  2023-11-15 15:47 [pbs-devel] [PATCH-SERIES v5 pxar proxmox-backup proxmox-widget-toolkit 00/28] fix #3174: improve file-level backup Christian Ebner
                   ` (25 preceding siblings ...)
  2023-11-15 15:48 ` [pbs-devel] [PATCH v5 proxmox-backup 26/28] pxar: add heuristic to reduce reused chunk fragmentation Christian Ebner
@ 2023-11-15 15:48 ` Christian Ebner
  2023-11-15 15:48 ` [pbs-devel] [PATCH v5 proxmox-widget-toolkit 28/28] file-browser: support pxar archive and fileref types Christian Ebner
  27 siblings, 0 replies; 29+ messages in thread
From: Christian Ebner @ 2023-11-15 15:48 UTC (permalink / raw)
  To: pbs-devel

Only use catalog format version 2 when performing backups with the
change detection mode set to metadata.

This ensures that backups created using the regular mode remain
compatible with older clients, which do not support format version 2.

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
Changes since version 4:
- not present in version 4

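A short sketch of the resulting client-side logic; the names follow the
diff below and the match form is illustrative:

    // Sketch only: pick the catalog format from the detection mode ...
    let catalog_format_version = match detection_mode {
        BackupDetectionMode::Metadata(_) => CatalogFormatVersion::V2,
        _ => CatalogFormatVersion::V1,
    };
    // ... and verify a downloaded reference catalog before relying on
    // version 2 features such as file offsets and ctime.
    match reader.format_version()? {
        CatalogFormatVersion::V2 => { /* offsets and ctime available */ }
        CatalogFormatVersion::V1 => bail!("unsupported catalog format version"),
    }
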
 pbs-datastore/src/catalog.rs      | 71 +++++++++++++++++++++++++------
 proxmox-backup-client/src/main.rs | 29 ++++++++++---
 2 files changed, 81 insertions(+), 19 deletions(-)

diff --git a/pbs-datastore/src/catalog.rs b/pbs-datastore/src/catalog.rs
index fe076a94..93457302 100644
--- a/pbs-datastore/src/catalog.rs
+++ b/pbs-datastore/src/catalog.rs
@@ -65,6 +65,12 @@ pub enum CatalogEntryType {
     Socket = b's',
 }
 
+#[derive(PartialEq)]
+pub enum CatalogFormatVersion {
+    V1,
+    V2,
+}
+
 impl TryFrom<u8> for CatalogEntryType {
     type Error = Error;
 
@@ -556,17 +562,22 @@ pub struct CatalogWriter<W> {
     writer: W,
     dirstack: Vec<DirInfo>,
     pos: u64,
+    version: CatalogFormatVersion,
 }
 
 impl<W: Write> CatalogWriter<W> {
     /// Create a new  CatalogWriter instance
-    pub fn new(writer: W) -> Result<Self, Error> {
+    pub fn new(writer: W, version: CatalogFormatVersion) -> Result<Self, Error> {
         let mut me = Self {
             writer,
             dirstack: vec![DirInfo::new_rootdir()],
             pos: 0,
+            version,
         };
-        me.write_all(&PROXMOX_CATALOG_FILE_MAGIC_2_0)?;
+        match me.version {
+            CatalogFormatVersion::V1 => me.write_all(&PROXMOX_CATALOG_FILE_MAGIC_1_0)?,
+            CatalogFormatVersion::V2 => me.write_all(&PROXMOX_CATALOG_FILE_MAGIC_2_0)?,
+        }
         Ok(me)
     }
 
@@ -600,6 +611,10 @@ impl<W: Write> CatalogWriter<W> {
 
 impl<W: Write> BackupCatalogWriter for CatalogWriter<W> {
     fn start_archive(&mut self, name: &CStr) -> Result<(), Error> {
+        if self.version == CatalogFormatVersion::V1 {
+            return self.start_directory(name);
+        }
+
         let new = DirInfo::new(name.to_owned());
         self.dirstack.push(new);
         Ok(())
@@ -609,6 +624,9 @@ impl<W: Write> BackupCatalogWriter for CatalogWriter<W> {
         &mut self,
         appendix: Option<pxar::encoder::AppendixStartOffset>,
     ) -> Result<(), Error> {
+        if self.version == CatalogFormatVersion::V1 {
+            return self.end_directory();
+        }
         let (start, name) = match self.dirstack.pop() {
             Some(dir) => {
                 let start = self.pos;
@@ -689,17 +707,32 @@ impl<W: Write> BackupCatalogWriter for CatalogWriter<W> {
             .last_mut()
             .ok_or_else(|| format_err!("outside root"))?;
         let name = name.to_bytes().to_vec();
-        let file_offset = FileOffset {
-            offset: file_offset.raw(),
+        let entry = match self.version {
+            CatalogFormatVersion::V1 => {
+                DirEntry {
+                    name,
+                    attr: DirEntryAttribute::File {
+                        size,
+                        mtime,
+                        extension: None,
+                    },
+                }
+            }
+            CatalogFormatVersion::V2 => {
+                let file_offset = FileOffset {
+                    offset: file_offset.raw(),
+                };
+                DirEntry {
+                    name,
+                    attr: DirEntryAttribute::File {
+                        size,
+                        mtime,
+                        extension: Some(CatalogV2Extension { ctime, file_offset }),
+                    },
+                }
+            }
         };
-        dir.entries.push(DirEntry {
-            name,
-            attr: DirEntryAttribute::File {
-                size,
-                mtime,
-                extension: Some(CatalogV2Extension { ctime, file_offset }),
-            },
-        });
+        dir.entries.push(entry);
         Ok(())
     }
 
@@ -711,6 +744,9 @@ impl<W: Write> BackupCatalogWriter for CatalogWriter<W> {
         ctime: i64,
         appendix_ref_offset: pxar::encoder::AppendixRefOffset,
     ) -> Result<(), Error> {
+        if self.version == CatalogFormatVersion::V1 {
+            bail!("unsupported by catalog format version 1");
+        }
         let dir = self
             .dirstack
             .last_mut()
@@ -857,6 +893,17 @@ impl<R: Read + Seek> CatalogReader<R> {
         })
     }
 
+    pub fn format_version(&mut self) -> Result<CatalogFormatVersion, Error> {
+        self.reader.seek(SeekFrom::Start(0))?;
+        let mut magic = [0u8; 8];
+        self.reader.read_exact(&mut magic)?;
+        match magic {
+            PROXMOX_CATALOG_FILE_MAGIC_1_0 => Ok(CatalogFormatVersion::V1),
+            PROXMOX_CATALOG_FILE_MAGIC_2_0 => Ok(CatalogFormatVersion::V2),
+            _ => bail!("got unexpected magic number for catalog"),
+        }
+    }
+
     pub fn appendix_offset(
         &mut self,
         archive_name: &[u8],
diff --git a/proxmox-backup-client/src/main.rs b/proxmox-backup-client/src/main.rs
index 65d3308a..0a2e0c8b 100644
--- a/proxmox-backup-client/src/main.rs
+++ b/proxmox-backup-client/src/main.rs
@@ -49,7 +49,7 @@ use pbs_client::{
     BackupStats, BackupWriter, ChunkStream, FixedChunkStream, HttpClient, PxarBackupStream,
     RemoteChunkReader, UploadOptions, BACKUP_DETECTION_MODE_SPEC, BACKUP_SOURCE_SCHEMA,
 };
-use pbs_datastore::catalog::{CatalogReader, CatalogWriter};
+use pbs_datastore::catalog::{CatalogFormatVersion, CatalogReader, CatalogWriter};
 use pbs_datastore::chunk_store::verify_chunk_size;
 use pbs_datastore::dynamic_index::{BufferedDynamicReader, DynamicIndexReader};
 use pbs_datastore::fixed_index::FixedIndexReader;
@@ -536,6 +536,7 @@ struct CatalogUploadResult {
 fn spawn_catalog_upload(
     client: Arc<BackupWriter>,
     encrypt: bool,
+    catalog_format_version: CatalogFormatVersion,
 ) -> Result<CatalogUploadResult, Error> {
     let (catalog_tx, catalog_rx) = std::sync::mpsc::sync_channel(10); // allow to buffer 10 writes
     let catalog_stream = proxmox_async::blocking::StdChannelStream(catalog_rx);
@@ -549,9 +550,10 @@ fn spawn_catalog_upload(
         injections.clone(),
     );
 
-    let catalog_writer = Arc::new(Mutex::new(CatalogWriter::new(TokioWriterAdapter::new(
-        StdChannelWriter::new(catalog_tx),
-    ))?));
+    let catalog_writer = Arc::new(Mutex::new(CatalogWriter::new(
+        TokioWriterAdapter::new(StdChannelWriter::new(catalog_tx)),
+        catalog_format_version,
+    )?));
 
     let (catalog_result_tx, catalog_result_rx) = tokio::sync::oneshot::channel();
 
@@ -1015,8 +1017,16 @@ async fn create_backup(
             (BackupSpecificationType::PXAR, false) => {
                 // start catalog upload on first use
                 if catalog.is_none() {
-                    let catalog_upload_res =
-                        spawn_catalog_upload(client.clone(), crypto.mode == CryptMode::Encrypt)?;
+                    let catalog_format_version = if let BackupDetectionMode::Metadata(_) = detection_mode {
+                        CatalogFormatVersion::V2
+                    } else {
+                        CatalogFormatVersion::V1
+                    };
+                    let catalog_upload_res = spawn_catalog_upload(
+                        client.clone(),
+                        crypto.mode == CryptMode::Encrypt,
+                        catalog_format_version,
+                    )?;
                     catalog = Some(catalog_upload_res.catalog_writer);
                     catalog_result_rx = Some(catalog_upload_res.result);
                 }
@@ -1214,8 +1224,13 @@ async fn download_reference_catalog(
         .map_err(|err| format_err!("failed to download reference catalog - {}", err))?;
 
     catalogfile.seek(SeekFrom::Start(0))?;
+    let mut reader = CatalogReader::new(catalogfile);
 
-    Ok(CatalogReader::new(catalogfile))
+    match reader.format_version() {
+        Ok(CatalogFormatVersion::V2) => Ok(reader),
+        Ok(CatalogFormatVersion::V1) => Err(format_err!("unsupported catalog format version")),
+        Err(err) => Err(err),
+    }
 }
 
 async fn dump_image<W: Write>(
-- 
2.39.2





^ permalink raw reply	[flat|nested] 29+ messages in thread

* [pbs-devel] [PATCH v5 proxmox-widget-toolkit 28/28] file-browser: support pxar archive and fileref types
  2023-11-15 15:47 [pbs-devel] [PATCH-SERIES v5 pxar proxmox-backup proxmox-widget-toolkit 00/28] fix #3174: improve file-level backup Christian Ebner
                   ` (26 preceding siblings ...)
  2023-11-15 15:48 ` [pbs-devel] [PATCH v5 proxmox-backup 27/28] catalog: use format version 2 conditionally Christian Ebner
@ 2023-11-15 15:48 ` Christian Ebner
  27 siblings, 0 replies; 29+ messages in thread
From: Christian Ebner @ 2023-11-15 15:48 UTC (permalink / raw)
  To: pbs-devel

Covers the special archive directory type and the file reference
type introduced by pxar format version 2.

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
Changes since version 4:
- rebased to current master

Changes since version 3:
- no changes

Changes since version 2:
- not present in version 2

 src/Schema.js             |  2 ++
 src/window/FileBrowser.js | 15 +++++++++++----
 2 files changed, 13 insertions(+), 4 deletions(-)

diff --git a/src/Schema.js b/src/Schema.js
index e0f583a..b680f70 100644
--- a/src/Schema.js
+++ b/src/Schema.js
@@ -59,7 +59,9 @@ Ext.define('Proxmox.Schema', { // a singleton
 	b: { icon: 'cube', label: gettext('Block Device') },
 	c: { icon: 'tty', label: gettext('Character Device') },
 	d: { icon: 'folder-o', label: gettext('Directory') },
+	a: { icon: 'folder-o', label: gettext('Archive') },
 	f: { icon: 'file-text-o', label: gettext('File') },
+	r: { icon: 'file-text-o', label: gettext('FileRef') },
 	h: { icon: 'file-o', label: gettext('Hardlink') },
 	l: { icon: 'link', label: gettext('Softlink') },
 	p: { icon: 'exchange', label: gettext('Pipe/Fifo') },
diff --git a/src/window/FileBrowser.js b/src/window/FileBrowser.js
index e036d9f..4a288e6 100644
--- a/src/window/FileBrowser.js
+++ b/src/window/FileBrowser.js
@@ -8,7 +8,7 @@ Ext.define('proxmox-file-tree', {
 	    calculate: data => {
 		if (data.size === undefined) {
 		    return '';
-		} else if (false && data.type === 'd') { // eslint-disable-line no-constant-condition
+		} else if (false && (data.type === 'd' || data.type === 'a')) { // eslint-disable-line no-constant-condition
 		    // FIXME: enable again once we fixed trouble with confusing size vs item #
 		    let fs = data.size === 1 ? gettext('{0} Item') : gettext('{0} Items');
 		    return Ext.String.format(fs, data.size);
@@ -26,7 +26,7 @@ Ext.define('proxmox-file-tree', {
 	    name: 'iconCls',
 	    calculate: function(data) {
 		let icon = Proxmox.Schema.pxarFileTypes[data.type]?.icon ?? 'file-o';
-		if (data.expanded && data.type === 'd') {
+		if (data.expanded && (data.type === 'd' || data.type === 'a')) {
 		    icon = 'folder-open-o';
 		}
 		return `fa fa-${icon}`;
@@ -58,7 +58,9 @@ Ext.define("Proxmox.window.FileBrowser", {
 	downloadableFileTypes: {
 	    'h': true, // hardlinks
 	    'f': true, // "normal" files
+	    'r': true, // ref files
 	    'd': true, // directories
+	    'a': true, // archive
 	},
 
 	// prefix to prepend to downloaded file names
@@ -98,7 +100,7 @@ Ext.define("Proxmox.window.FileBrowser", {
 	    params.filepath = data.filepath;
 
 	    let filename = view.downloadPrefix + data.text;
-	    if (data.type === 'd') {
+	    if (data.type === 'd' || data.type === 'a') {
 		if (tar) {
 		    params.tar = 1;
 		    filename += ".tar.zst";
@@ -122,7 +124,7 @@ Ext.define("Proxmox.window.FileBrowser", {
 	    view.lookup('selectText').setText(st);
 
 	    let canDownload = view.downloadURL && view.downloadableFileTypes[data.type];
-	    let enableMenu = data.type === 'd';
+	    let enableMenu = data.type === 'd' || data.type === 'a';
 
 	    let downloadBtn = view.lookup('downloadBtn');
 	    downloadBtn.setDisabled(!canDownload || enableMenu);
@@ -269,6 +271,11 @@
 			    } else if (a.data.type !== 'd' && b.data.type === 'd') {
 				return 1;
 			    }
+			    if (a.data.type === 'a' && b.data.type !== 'a') {
+				return -1;
+			    } else if (a.data.type !== 'a' && b.data.type === 'a') {
+				return 1;
+			    }
 
 			    let asize = a.data.size || 0;
 			    let bsize = b.data.size || 0;
-- 
2.39.2





^ permalink raw reply	[flat|nested] 29+ messages in thread

end of thread, other threads:[~2023-11-15 15:49 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-11-15 15:47 [pbs-devel] [PATCH-SERIES v5 pxar proxmox-backup proxmox-widget-toolkit 00/28] fix #3174: improve file-level backup Christian Ebner
2023-11-15 15:47 ` [pbs-devel] [PATCH v5 pxar 1/28] fix #3174: decoder: factor out skip_bytes from skip_entry Christian Ebner
2023-11-15 15:47 ` [pbs-devel] [PATCH v5 pxar 2/28] fix #3174: decoder: impl skip_bytes for sync dec Christian Ebner
2023-11-15 15:47 ` [pbs-devel] [PATCH v5 pxar 3/28] fix #3174: encoder: calc filename + metadata byte size Christian Ebner
2023-11-15 15:47 ` [pbs-devel] [PATCH v5 pxar 4/28] fix #3174: enc/dec: impl PXAR_APPENDIX_REF entrytype Christian Ebner
2023-11-15 15:47 ` [pbs-devel] [PATCH v5 pxar 5/28] fix #3174: enc/dec: impl PXAR_APPENDIX entrytype Christian Ebner
2023-11-15 15:47 ` [pbs-devel] [PATCH v5 pxar 6/28] fix #3174: encoder: helper to add to encoder position Christian Ebner
2023-11-15 15:47 ` [pbs-devel] [PATCH v5 pxar 7/28] fix #3174: enc/dec: impl PXAR_APPENDIX_TAIL entrytype Christian Ebner
2023-11-15 15:47 ` [pbs-devel] [PATCH v5 pxar 8/28] fix #3174: enc/dec: introduce pxar format version 2 Christian Ebner
2023-11-15 15:47 ` [pbs-devel] [PATCH v5 proxmox-backup 09/28] fix #3174: index: add fn index list from start/end-offsets Christian Ebner
2023-11-15 15:47 ` [pbs-devel] [PATCH v5 proxmox-backup 10/28] fix #3174: index: add fn digest for DynamicEntry Christian Ebner
2023-11-15 15:47 ` [pbs-devel] [PATCH v5 proxmox-backup 11/28] fix #3174: api: double catalog upload size Christian Ebner
2023-11-15 15:47 ` [pbs-devel] [PATCH v5 proxmox-backup 12/28] fix #3174: catalog: introduce extended format v2 Christian Ebner
2023-11-15 15:47 ` [pbs-devel] [PATCH v5 proxmox-backup 13/28] fix #3174: archiver/extractor: impl appendix ref Christian Ebner
2023-11-15 15:47 ` [pbs-devel] [PATCH v5 proxmox-backup 14/28] fix #3174: catalog: add specialized Archive entry Christian Ebner
2023-11-15 15:48 ` [pbs-devel] [PATCH v5 proxmox-backup 15/28] fix #3174: extractor: impl seq restore from appendix Christian Ebner
2023-11-15 15:48 ` [pbs-devel] [PATCH v5 proxmox-backup 16/28] fix #3174: archiver: store ref to previous backup Christian Ebner
2023-11-15 15:48 ` [pbs-devel] [PATCH v5 proxmox-backup 17/28] fix #3174: upload stream: impl reused chunk injector Christian Ebner
2023-11-15 15:48 ` [pbs-devel] [PATCH v5 proxmox-backup 18/28] fix #3174: chunker: add forced boundaries Christian Ebner
2023-11-15 15:48 ` [pbs-devel] [PATCH v5 proxmox-backup 19/28] fix #3174: backup writer: inject queued chunk in upload steam Christian Ebner
2023-11-15 15:48 ` [pbs-devel] [PATCH v5 proxmox-backup 20/28] fix #3174: archiver: reuse files with unchanged metadata Christian Ebner
2023-11-15 15:48 ` [pbs-devel] [PATCH v5 proxmox-backup 21/28] fix #3174: specs: add backup detection mode specification Christian Ebner
2023-11-15 15:48 ` [pbs-devel] [PATCH v5 proxmox-backup 22/28] fix #3174: client: Add detection mode to backup creation Christian Ebner
2023-11-15 15:48 ` [pbs-devel] [PATCH v5 proxmox-backup 23/28] test-suite: add detection mode change benchmark Christian Ebner
2023-11-15 15:48 ` [pbs-devel] [PATCH v5 proxmox-backup 24/28] test-suite: Add bin to deb, add shell completions Christian Ebner
2023-11-15 15:48 ` [pbs-devel] [PATCH v5 proxmox-backup 25/28] catalog: fetch offset and size for files and refs Christian Ebner
2023-11-15 15:48 ` [pbs-devel] [PATCH v5 proxmox-backup 26/28] pxar: add heuristic to reduce reused chunk fragmentation Christian Ebner
2023-11-15 15:48 ` [pbs-devel] [PATCH v5 proxmox-backup 27/28] catalog: use format version 2 conditionally Christian Ebner
2023-11-15 15:48 ` [pbs-devel] [PATCH v5 proxmox-widget-toolkit 28/28] file-browser: support pxar archive and fileref types Christian Ebner

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox
Service provided by Proxmox Server Solutions GmbH | Privacy | Legal