* [pbs-devel] [PATCH-SERIES v6 pxar proxmox-backup proxmox-widget-toolkit 0/29] fix #3174: improve file-level backup
@ 2024-01-25 13:25 Christian Ebner
2024-01-25 13:25 ` [pbs-devel] [PATCH v6 pxar 1/29] fix #3174: decoder: factor out skip_bytes from skip_entry Christian Ebner
` (28 more replies)
0 siblings, 29 replies; 30+ messages in thread
From: Christian Ebner @ 2024-01-25 13:25 UTC (permalink / raw)
To: pbs-devel
Changes to the patch series since version 5 are based on the feedback
obtained via internal communication channels. Many thanks to Thomas,
Fabian, Wolfgang and Dominik for their continuous feedback.
This series of patches implements a metadata-based file change detection
mechanism to improve pxar file-level backup creation speed for unchanged
files.
The chosen approach is to skip encoding the payloads of regular files
whose metadata (currently ctime and size) did not change compared to a
previous backup run. Instead of re-encoding such files, a reference to a
newly introduced appendix section of the pxar archive is written. The
appendix section is created as a concatenation of indexed chunks from the
previous backup run and therefore contains the sequential file payload at
a calculable offset relative to the start of the appendix section.
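For illustration, a minimal sketch of that offset calculation (type and
function names are assumptions, not the actual implementation):
```rust
/// Assumed chunk descriptor: a dynamic-index chunk from the previous backup,
/// identified by where its payload starts in the previous archive stream.
struct ReusedChunk {
    archive_start: u64, // offset of the chunk within the previous pxar stream
    size: u64,          // chunk payload size
}

/// Offset of a file entry relative to the appendix start, given the chunks
/// that are concatenated (in order) to form the appendix section.
fn appendix_ref_offset(chunks: &[ReusedChunk], file_archive_offset: u64) -> Option<u64> {
    let mut appendix_pos = 0u64;
    for chunk in chunks {
        if (chunk.archive_start..chunk.archive_start + chunk.size).contains(&file_archive_offset) {
            // bytes of all preceding chunks plus the start padding within this chunk
            return Some(appendix_pos + (file_archive_offset - chunk.archive_start));
        }
        appendix_pos += chunk.size;
    }
    None
}
```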
Metadata comparison and calculation of the chunks to be indexed for the
appendix section are performed using the catalog of a previous backup as
reference. In order to be able to calculate the offsets, an updated
catalog file format version 2 is introduced, which extends the previous
version by including the file offset with respect to the pxar archive
byte stream as well as the file's ctime. This allows finding the required
chunk indexes and the start padding within the concatenated chunks.
The catalog reader remains backward compatible with catalog file
format version 1.
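Roughly, version 2 records the following per regular file (field names are
assumptions, not the actual proxmox-backup catalog types):
```rust
/// Assumed shape of a regular-file record in catalog format version 2.
struct CatalogFileV2 {
    name: Vec<u8>,
    size: u64,
    mtime: i64,
    ctime: i64,          // new in v2: change time for the metadata comparison
    archive_offset: u64, // new in v2: file offset in the pxar archive byte stream
}

/// The metadata comparison described above: reuse the payload if neither
/// size nor ctime changed since the referenced backup run.
fn payload_unchanged(prev: &CatalogFileV2, size: u64, ctime: i64) -> bool {
    prev.size == size && prev.ctime == ctime
}
```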
During encoding, the chunks needed for the appendix section are injected
into the backup upload stream after forcing a chunk boundary once regular
pxar encoding is finished. Finally, pxar archives containing an
appendix section are marked as such by appending a final pxar goodbye
lookup table that only contains the offset to the appendix section start and
the total size of that section, needed for random access, e.g. to mount
the archive via the FUSE filesystem implementation.
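A reader can thus distinguish the two archive flavours from the trailing
entry alone; a sketch based on the accessor logic in patch 7/29 (the
surrounding reader code is assumed):
```rust
use std::io;

/// GoodbyeItem as laid out by the pxar format: (hash, offset, size).
struct GoodbyeItem {
    hash: u64,
    offset: u64,
    size: u64,
}

// Constant values as introduced/used by the pxar patches in this series.
const PXAR_GOODBYE_TAIL_MARKER: u64 = 0xef5eed5b753e1555;
const PXAR_APPENDIX_TAIL: u64 = 0x5b1b9abb7ae454f1;

/// For appendix archives the last item carries the appendix start offset and
/// total size instead of the regular goodbye tail marker.
fn appendix_info(tail: &GoodbyeItem) -> io::Result<Option<(u64, u64)>> {
    match tail.hash {
        PXAR_APPENDIX_TAIL => Ok(Some((tail.offset, tail.size))),
        PXAR_GOODBYE_TAIL_MARKER => Ok(None),
        _ => Err(io::Error::new(io::ErrorKind::InvalidData, "no goodbye tail marker found")),
    }
}
```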
The following lists the most notable changes included in this series since
version 5:
- the archiver now implements lookahead caching in order to decide whether to
reuse or re-encode an entry, replacing the previously used heuristic.
This reduces chunk fragmentation while being more predictable
and stable than the previous approach (see the sketch after this list)
- the pxar format version is now encoded as a prefix entry in version 2 pxar
archives, so that encoder/decoder implementations can detect the correct
format early.
- the encoder now stores an internal stack of encoder states, instead of
creating new EncoderImpl instances for each directory level. This was a
necessary change in order to handle the entry caching for the lookahead
cache.
- Appendix chunks now have a dedicated type `AppendableDynamicEntry` to
clearly distinguish them from regular chunks.
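A much simplified sketch of the lookahead idea mentioned above (names and
structure are assumed; the actual implementation lives in
`pbs-client/src/pxar/lookahead_cache.rs`):
```rust
/// Entries are held back until it is known whether the whole cached run can
/// be reused from the previous backup; one changed entry forces re-encoding
/// of everything still in the cache.
struct LookaheadCache<E> {
    entries: Vec<E>,
    all_reusable: bool,
}

enum Flush<E> {
    Reuse(Vec<E>),    // reference all cached entries via the appendix
    Reencode(Vec<E>), // encode all cached entries regularly
}

impl<E> LookaheadCache<E> {
    fn new() -> Self {
        Self { entries: Vec::new(), all_reusable: true }
    }
    fn push(&mut self, entry: E, unchanged: bool) {
        self.all_reusable &= unchanged;
        self.entries.push(entry);
    }
    fn flush(&mut self) -> Flush<E> {
        let entries = std::mem::take(&mut self.entries);
        if std::mem::replace(&mut self.all_reusable, true) {
            Flush::Reuse(entries)
        } else {
            Flush::Reencode(entries)
        }
    }
}
```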
The following lists the most notable changes included in this series since
version 4:
- fix an issue with premature injection queue chunk insertion on initialization
- fix an issue with the decoder state not being correctly set at the start of
the appendix section, leading to decoding errors in special cases.
- avoid double injection of chunks in cases where the first chunk of the
list to insert is already present, but the subsequent ones are not
- refactoring and renaming of the Encoder's `bytes_len` to `encoded_size`
The following lists the most notable changes included in this series since
version 3:
- count appendix chunks as reused chunks, as they are not re-encoded and
re-uploaded.
- add a heuristic to reduce chunk fragmentation for appendix chunks for
multiple consecutive backup runs with metadata based file change detection.
- refactor the appendix list generation during pxar encoding in the archiver.
- switch from Vec to BTreeMap for restoring the appendix entries, so
entries are inserted in sorted order based on their offset, making it
unnecessary to sort afterwards.
- fix an issue with the chunk injection code which led to data corruption in
some edge cases; add additional checks as fortification.
The following lists the most notable changes included in this series since
version 2:
- avoid re-indexing the same chunks multiple times in the appendix
section by looking them up in the already present appendix chunks list
and calculating the appendix reference offset accordingly. This now
requires sorting entries by their appendix start offset for sequential
restore.
- reduce appendix reference chunk fragmentation by increasing the file
size threshold to 1k.
- fix the WebUI's file browser and single file restore, broken by the
previous patch series.
- fix the download of the previous catalog and/or dynamic index when either
the backup group was empty or no archive with the same name was present in
the backup
The following lists the most notable changes included in this series since
version 1:
- fixes and refactors the missing chunk issue by modifying the logic to
avoid re-appending the same chunks multiple times if referenced by
multiple consecutive files.
- fixes a performance issue with catalog lookup being the bottleneck
for directories with many entries, which resulted in the
metadata based file change detection performing worse than the regular
mode.
- fixes the creation of multi-archive backups; all of the archives use
the same reference catalog.
- the catalog file format is pushed to version 2, including the needed
archive offsets as well as ctime for file change detection.
- the catalog is fully backward compatible with catalog file format
version 1, so both can be decoded by the reader. However, the new
version of the catalog file format will be used for all new backups
- it is now possible to perform multiple consecutive runs of the backup
with metadata based file change detection; there is no longer a need to
perform a regular run before the metadata based one.
- change from the `incremental` flag to an enum based `BackupDetectionMode`
parameter for command invocations (illustrated after this list).
- includes a new `proxmox-backup-test-suite` binary to create and run
benchmarks to compare the performance of the different detection modes.
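The assumed shape of the new parameter, for illustration only (the actual
definition is presumably part of `pbs-client/src/backup_specification.rs`):
```rust
/// Assumed variants: a regular run reads and re-encodes every file payload,
/// while metadata mode skips payloads of files with unchanged size and ctime.
enum BackupDetectionMode {
    Data,
    Metadata,
}
```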
A backup run with these patches is now invoked as:
```bash
proxmox-backup-client backup <label>.pxar:<source-path> --change-detection-mode=metadata
```
During the first run, no reference catalog can be used, so the backup will
be a regular run. Subsequent backups will, however, utilize the catalog and
index files of the previous run to perform file change detection.
As benchmarks, the Linux source code as well as the COCO dataset for
computer vision and pattern recognition can be used.
The benchmarks can be performed by running:
```bash
proxmox-backup-test-suite detection-mode-bench prepare --target /<path-to-bench-source-target>
proxmox-backup-test-suite detection-mode-bench run linux.pxar:/<path-to-bench-source-target>/linux
proxmox-backup-test-suite detection-mode-bench run coco.pxar:/<path-to-bench-source-target>/coco
```
The above command invocations assume the default repository and credentials
to be set as environment variables; they might, however, be passed as
additional optional parameters instead.
Benchmark runs using these test data show a significant improvement in
the time needed for the backups:
For the linux source code backup:
Completed benchmark with 5 runs for each tested mode.
Completed regular backup with:
Total runtime: 49.20 s
Average: 9.84 ± 0.07 s
Min: 9.74 s
Max: 9.92 s
Completed metadata detection mode backup with:
Total runtime: 12.20 s
Average: 2.44 ± 0.06 s
Min: 2.34 s
Max: 2.48 s
Differences (metadata based - regular):
Delta total runtime: -37.00 s (-75.20 %)
Delta average: -7.40 ± 0.09 s (-75.20 %)
Delta min: -7.40 s (-75.98 %)
Delta max: -7.43 s (-74.97 %)
For the coco dataset backup:
Completed benchmark with 5 runs for each tested mode.
Completed regular backup with:
Total runtime: 513.10 s
Average: 102.62 ± 0.46 s
Min: 102.15 s
Max: 103.23 s
Completed metadata detection mode backup with:
Total runtime: 114.32 s
Average: 22.86 ± 0.58 s
Min: 22.40 s
Max: 23.86 s
Differences (metadata based - regular):
Delta total runtime: -398.78 s (-77.72 %)
Delta average: -79.76 ± 0.74 s (-77.72 %)
Delta min: -79.76 s (-78.07 %)
Delta max: -79.37 s (-76.89 %)
The above tests were performed inside a test VM backing up to a local
datastore.
pxar:
Christian Ebner (9):
fix #3174: decoder: factor out skip_bytes from skip_entry
fix #3174: decoder: impl skip_bytes for sync dec
fix #3174: encoder: calc filename + metadata byte size
fix #3174: enc/dec: impl PXAR_APPENDIX_REF entrytype
fix #3174: enc/dec: impl PXAR_APPENDIX entrytype
fix #3174: encoder: helper to add to encoder position
fix #3174: enc/dec: impl PXAR_APPENDIX_TAIL entrytype
fix #3174: add pxar format version entry
fix #3174: enc: move from instance per dir to state stack
examples/mk-format-hashes.rs | 16 ++
examples/pxarcmd.rs | 8 +-
src/accessor/mod.rs | 142 ++++++++++--
src/decoder/mod.rs | 83 ++++++-
src/decoder/sync.rs | 5 +
src/encoder/aio.rs | 77 +++++--
src/encoder/mod.rs | 435 +++++++++++++++++++++++++++--------
src/encoder/sync.rs | 60 ++++-
src/format/mod.rs | 29 ++-
src/lib.rs | 13 ++
10 files changed, 706 insertions(+), 162 deletions(-)
proxmox-backup:
Christian Ebner (19):
fix #3174: index: add fn index list from start/end-offsets
fix #3174: api: double catalog upload size
fix #3174: catalog: introduce extended format v2
fix #3174: archiver/extractor: impl appendix ref
fix #3174: catalog: add specialized Archive entry
fix #3174: extractor: impl seq restore from appendix
fix #3174: archiver: store ref to previous backup
fix #3174: upload stream: impl reused chunk injector
fix #3174: chunker: add forced boundaries
fix #3174: backup writer: inject queued chunk in upload steam
fix #3174: archiver: reuse files with unchanged metadata
fix #3174: specs: add backup detection mode specification
fix #3174: client: Add detection mode to backup creation
catalog: use format version 2 conditionally
tools: add optional path prefix to line based output
pxar-bin: implement listing for appendix entries
test-suite: add detection mode change benchmark
test-suite: Add bin to deb, add shell completions
pxar: add lookahead caching to reduce chunk fragmentation
Cargo.toml | 1 +
Makefile | 13 +-
debian/proxmox-backup-client.bash-completion | 1 +
debian/proxmox-backup-client.install | 2 +
debian/proxmox-backup-test-suite.bc | 8 +
examples/test_chunk_speed2.rs | 9 +-
pbs-client/src/backup_specification.rs | 33 +
pbs-client/src/backup_writer.rs | 89 ++-
pbs-client/src/catalog_shell.rs | 4 +-
pbs-client/src/chunk_stream.rs | 42 +-
pbs-client/src/inject_reused_chunks.rs | 152 ++++
pbs-client/src/lib.rs | 1 +
pbs-client/src/pxar/create.rs | 707 ++++++++++++++++--
pbs-client/src/pxar/extract.rs | 150 +++-
pbs-client/src/pxar/lookahead_cache.rs | 41 +
pbs-client/src/pxar/mod.rs | 3 +-
pbs-client/src/pxar/tools.rs | 24 +-
pbs-client/src/pxar_backup_stream.rs | 8 +-
pbs-datastore/src/catalog.rs | 621 +++++++++++++--
pbs-datastore/src/dynamic_index.rs | 55 ++
pbs-datastore/src/file_formats.rs | 3 +
proxmox-backup-client/src/main.rs | 192 ++++-
proxmox-backup-test-suite/Cargo.toml | 18 +
.../src/detection_mode_bench.rs | 291 +++++++
proxmox-backup-test-suite/src/main.rs | 17 +
.../src/proxmox_restore_daemon/api.rs | 21 +-
pxar-bin/src/main.rs | 78 +-
src/api2/backup/upload_chunk.rs | 4 +-
src/tape/file_formats/snapshot_archive.rs | 9 +-
tests/catar.rs | 3 +
zsh-completions/_proxmox-backup-test-suite | 13 +
31 files changed, 2395 insertions(+), 218 deletions(-)
create mode 100644 debian/proxmox-backup-test-suite.bc
create mode 100644 pbs-client/src/inject_reused_chunks.rs
create mode 100644 pbs-client/src/pxar/lookahead_cache.rs
create mode 100644 proxmox-backup-test-suite/Cargo.toml
create mode 100644 proxmox-backup-test-suite/src/detection_mode_bench.rs
create mode 100644 proxmox-backup-test-suite/src/main.rs
create mode 100644 zsh-completions/_proxmox-backup-test-suite
proxmox-widget-toolkit:
Christian Ebner (1):
file-browser: support pxar archive and fileref types
src/Schema.js | 2 ++
src/window/FileBrowser.js | 16 ++++++++++++----
2 files changed, 14 insertions(+), 4 deletions(-)
--
2.39.2
^ permalink raw reply [flat|nested] 30+ messages in thread
* [pbs-devel] [PATCH v6 pxar 1/29] fix #3174: decoder: factor out skip_bytes from skip_entry
2024-01-25 13:25 [pbs-devel] [PATCH-SERIES v6 pxar proxmox-backup proxmox-widget-toolkit 0/29] fix #3174: improve file-level backup Christian Ebner
@ 2024-01-25 13:25 ` Christian Ebner
2024-01-25 13:25 ` [pbs-devel] [PATCH v6 pxar 2/29] fix #3174: decoder: impl skip_bytes for sync dec Christian Ebner
` (27 subsequent siblings)
28 siblings, 0 replies; 30+ messages in thread
From: Christian Ebner @ 2024-01-25 13:25 UTC (permalink / raw)
To: pbs-devel
Allows skipping bytes independently of the current entry's header, on which
skip_entry relies; skip_entry is now implemented on top of skip_bytes.
Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
Changes since v5:
- no changes
src/decoder/mod.rs | 19 ++++++++++++-------
1 file changed, 12 insertions(+), 7 deletions(-)
diff --git a/src/decoder/mod.rs b/src/decoder/mod.rs
index d1fb911..a094324 100644
--- a/src/decoder/mod.rs
+++ b/src/decoder/mod.rs
@@ -562,20 +562,25 @@ impl<I: SeqRead> DecoderImpl<I> {
// These utilize additional information and hence are not part of the `dyn SeqRead` impl.
//
- async fn skip_entry(&mut self, offset: u64) -> io::Result<()> {
- let mut len = self.current_header.content_size() - offset;
+ async fn skip_bytes(&mut self, len: u64) -> io::Result<()> {
+ let mut remaining = len;
let scratch = scratch_buffer();
- while len >= (scratch.len() as u64) {
+ while remaining >= (scratch.len() as u64) {
seq_read_exact(&mut self.input, scratch).await?;
- len -= scratch.len() as u64;
+ remaining -= scratch.len() as u64;
}
- let len = len as usize;
- if len > 0 {
- seq_read_exact(&mut self.input, &mut scratch[..len]).await?;
+ let remaining = remaining as usize;
+ if remaining > 0 {
+ seq_read_exact(&mut self.input, &mut scratch[..remaining]).await?;
}
Ok(())
}
+ async fn skip_entry(&mut self, offset: u64) -> io::Result<()> {
+ let len = self.current_header.content_size() - offset;
+ self.skip_bytes(len).await
+ }
+
async fn read_entry_as_bytes(&mut self) -> io::Result<Vec<u8>> {
let size = usize::try_from(self.current_header.content_size()).map_err(io_err_other)?;
let data = seq_read_exact_data(&mut self.input, size).await?;
--
2.39.2
^ permalink raw reply [flat|nested] 30+ messages in thread
* [pbs-devel] [PATCH v6 pxar 2/29] fix #3174: decoder: impl skip_bytes for sync dec
2024-01-25 13:25 [pbs-devel] [PATCH-SERIES v6 pxar proxmox-backup proxmox-widget-toolkit 0/29] fix #3174: improve file-level backup Christian Ebner
2024-01-25 13:25 ` [pbs-devel] [PATCH v6 pxar 1/29] fix #3174: decoder: factor out skip_bytes from skip_entry Christian Ebner
@ 2024-01-25 13:25 ` Christian Ebner
2024-01-25 13:25 ` [pbs-devel] [PATCH v6 pxar 3/29] fix #3174: encoder: calc filename + metadata byte size Christian Ebner
` (26 subsequent siblings)
28 siblings, 0 replies; 30+ messages in thread
From: Christian Ebner @ 2024-01-25 13:25 UTC (permalink / raw)
To: pbs-devel
Allows skipping over a given number of bytes during pxar archive
decoding. This is used to skip over unneeded contents in a pxar
appendix.
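A hypothetical use of the new public API, assuming this patch is applied
(`from_std` is part of the existing sync decoder):
```rust
use std::fs::File;

fn skip_example(path: &str) -> std::io::Result<()> {
    let file = File::open(path)?;
    let mut decoder = pxar::decoder::sync::Decoder::from_std(file)?;
    // e.g. skip over unneeded payload bytes in an appendix
    decoder.skip_bytes(4096)?;
    Ok(())
}
```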
Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
Changes since v5:
- no changes
src/decoder/sync.rs | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/src/decoder/sync.rs b/src/decoder/sync.rs
index 5597a03..bdfe2f4 100644
--- a/src/decoder/sync.rs
+++ b/src/decoder/sync.rs
@@ -81,6 +81,11 @@ impl<T: SeqRead> Decoder<T> {
pub fn enable_goodbye_entries(&mut self, on: bool) {
self.inner.with_goodbye_tables = on;
}
+
+ /// Skip bytes in decoder stream.
+ pub fn skip_bytes(&mut self, len: u64) -> io::Result<()> {
+ poll_result_once(self.inner.skip_bytes(len))
+ }
}
impl<T: SeqRead> Iterator for Decoder<T> {
--
2.39.2
^ permalink raw reply [flat|nested] 30+ messages in thread
* [pbs-devel] [PATCH v6 pxar 3/29] fix #3174: encoder: calc filename + metadata byte size
2024-01-25 13:25 [pbs-devel] [PATCH-SERIES v6 pxar proxmox-backup proxmox-widget-toolkit 0/29] fix #3174: improve file-level backup Christian Ebner
2024-01-25 13:25 ` [pbs-devel] [PATCH v6 pxar 1/29] fix #3174: decoder: factor out skip_bytes from skip_entry Christian Ebner
2024-01-25 13:25 ` [pbs-devel] [PATCH v6 pxar 2/29] fix #3174: decoder: impl skip_bytes for sync dec Christian Ebner
@ 2024-01-25 13:25 ` Christian Ebner
2024-01-25 13:25 ` [pbs-devel] [PATCH v6 pxar 4/29] fix #3174: enc/dec: impl PXAR_APPENDIX_REF entrytype Christian Ebner
` (25 subsequent siblings)
28 siblings, 0 replies; 30+ messages in thread
From: Christian Ebner @ 2024-01-25 13:25 UTC (permalink / raw)
To: pbs-devel
Introduce SeqSink, implementing SeqWrite, in order to create an encoder
implementation which, instead of writing data to a stream, consumes
the encoded stream and returns the number of bytes consumed for that stream.
Based on this, implement a helper function `encoded_size` which returns the
byte size of the filename entry and metadata entry as encoded for the
archive.
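A hypothetical caller of the sync wrapper added below:
```rust
fn entry_header_size(name: &str, metadata: &pxar::Metadata) -> std::io::Result<u64> {
    let filename = std::ffi::CString::new(name)
        .map_err(|err| std::io::Error::new(std::io::ErrorKind::InvalidInput, err))?;
    // byte size of the filename + metadata entries as they would be encoded
    pxar::encoder::sync::encoded_size(&filename, metadata)
}
```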
Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
Changes since v5:
- no changes
src/encoder/mod.rs | 35 +++++++++++++++++++++++++++++++++++
src/encoder/sync.rs | 5 +++++
2 files changed, 40 insertions(+)
diff --git a/src/encoder/mod.rs b/src/encoder/mod.rs
index 0d342ec..860c21f 100644
--- a/src/encoder/mod.rs
+++ b/src/encoder/mod.rs
@@ -85,6 +85,24 @@ where
}
}
+#[derive(Default)]
+/// Sink to consume sequential byte stream
+pub struct SeqSink;
+
+impl SeqWrite for SeqSink {
+ fn poll_seq_write(
+ self: Pin<&mut Self>,
+ _cx: &mut Context,
+ buf: &[u8],
+ ) -> Poll<io::Result<usize>> {
+ Poll::Ready(Ok(buf.len()))
+ }
+
+ fn poll_flush(self: Pin<&mut Self>, _cx: &mut Context) -> Poll<io::Result<()>> {
+ Poll::Ready(Ok(()))
+ }
+}
+
/// awaitable verison of `poll_seq_write`.
async fn seq_write<T: SeqWrite + ?Sized>(
output: &mut T,
@@ -183,6 +201,23 @@ where
seq_write_pxar_entry(output, htype, buf, position).await
}
+/// Calculate the encoded byte len of filename and metadata struct
+pub async fn encoded_size(filename: &std::ffi::CStr, metadata: &Metadata) -> io::Result<u64> {
+ let mut this = EncoderImpl {
+ output: EncoderOutput::Owned(SeqSink::default()),
+ state: EncoderState::default(),
+ parent: None,
+ finished: false,
+ file_copy_buffer: Arc::new(Mutex::new(unsafe {
+ crate::util::vec_new_uninitialized(1024 * 1024)
+ })),
+ };
+
+ this.start_file_do(Some(metadata), filename.to_bytes())
+ .await?;
+ Ok(this.position())
+}
+
/// Error conditions caused by wrong usage of this crate.
#[derive(Clone, Copy, Debug, Eq, PartialEq)]
pub enum EncodeError {
diff --git a/src/encoder/sync.rs b/src/encoder/sync.rs
index 1ec91b8..93c3b2c 100644
--- a/src/encoder/sync.rs
+++ b/src/encoder/sync.rs
@@ -228,3 +228,8 @@ impl<T: io::Write> SeqWrite for StandardWriter<T> {
Poll::Ready(self.pin_to_inner().and_then(|inner| inner.flush()))
}
}
+
+/// Calculate the encoded byte len of filename and metadata struct
+pub fn encoded_size(filename: &std::ffi::CStr, metadata: &Metadata) -> io::Result<u64> {
+ poll_result_once(crate::encoder::encoded_size(filename, metadata))
+}
--
2.39.2
^ permalink raw reply [flat|nested] 30+ messages in thread
* [pbs-devel] [PATCH v6 pxar 4/29] fix #3174: enc/dec: impl PXAR_APPENDIX_REF entrytype
2024-01-25 13:25 [pbs-devel] [PATCH-SERIES v6 pxar proxmox-backup proxmox-widget-toolkit 0/29] fix #3174: improve file-level backup Christian Ebner
` (2 preceding siblings ...)
2024-01-25 13:25 ` [pbs-devel] [PATCH v6 pxar 3/29] fix #3174: encoder: calc filename + metadata byte size Christian Ebner
@ 2024-01-25 13:25 ` Christian Ebner
2024-01-25 13:25 ` [pbs-devel] [PATCH v6 pxar 5/29] fix #3174: enc/dec: impl PXAR_APPENDIX entrytype Christian Ebner
` (24 subsequent siblings)
28 siblings, 0 replies; 30+ messages in thread
From: Christian Ebner @ 2024-01-25 13:25 UTC (permalink / raw)
To: pbs-devel
Add an additional entry type for regular files that stores a reference to
the appendix section of the pxar archive. The position relative to the
appendix start is stored, in order to be able to access the file payload
within the appendix.
This new entry type is used to reference the contents of existing file
payload chunks for unchanged file payloads.
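The 16-byte entry payload written by `add_appendix_ref` below, as a
standalone sketch:
```rust
/// PXAR_APPENDIX_REF payload: two little-endian u64 values.
/// bytes 0..8  -> offset of the file entry relative to the appendix start
/// bytes 8..16 -> size of the referenced file payload
fn appendix_ref_payload(appendix_ref_offset: u64, file_size: u64) -> Vec<u8> {
    let mut data = Vec::with_capacity(16);
    data.extend(appendix_ref_offset.to_le_bytes());
    data.extend(file_size.to_le_bytes());
    data
}
```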
Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
Changes since v5:
- no changes
examples/mk-format-hashes.rs | 5 +++
src/decoder/mod.rs | 22 ++++++++++++-
src/encoder/aio.rs | 14 ++++++++-
src/encoder/mod.rs | 61 ++++++++++++++++++++++++++++++++++++
src/encoder/sync.rs | 16 +++++++++-
src/format/mod.rs | 5 +++
src/lib.rs | 6 ++++
7 files changed, 126 insertions(+), 3 deletions(-)
diff --git a/examples/mk-format-hashes.rs b/examples/mk-format-hashes.rs
index 1ad606c..8b4f5de 100644
--- a/examples/mk-format-hashes.rs
+++ b/examples/mk-format-hashes.rs
@@ -41,6 +41,11 @@ const CONSTANTS: &[(&str, &str, &str)] = &[
"PXAR_PAYLOAD",
"__PROXMOX_FORMAT_PXAR_PAYLOAD__",
),
+ (
+ "Marks the beginnig of an appendix reference for regular files",
+ "PXAR_APPENDIX_REF",
+ "__PROXMOX_FORMAT_PXAR_APPENDIX_REF__",
+ ),
(
"Marks item as entry of goodbye table",
"PXAR_GOODBYE",
diff --git a/src/decoder/mod.rs b/src/decoder/mod.rs
index a094324..c30fa14 100644
--- a/src/decoder/mod.rs
+++ b/src/decoder/mod.rs
@@ -276,6 +276,10 @@ impl<I: SeqRead> DecoderImpl<I> {
match self.current_header.htype {
format::PXAR_FILENAME => return self.handle_file_entry().await,
+ format::PXAR_APPENDIX_REF => {
+ self.state = State::Default;
+ return self.handle_appendix_ref_entry().await;
+ }
format::PXAR_GOODBYE => {
self.state = State::InGoodbyeTable;
@@ -334,6 +338,22 @@ impl<I: SeqRead> DecoderImpl<I> {
self.read_next_entry().await.map(Some)
}
+ async fn handle_appendix_ref_entry(&mut self) -> io::Result<Option<Entry>> {
+ let bytes = self.read_entry_as_bytes().await?;
+ let appendix_offset = u64::from_le_bytes(bytes[0..8].try_into().unwrap());
+ let file_size = u64::from_le_bytes(bytes[8..16].try_into().unwrap());
+
+ self.reset_path()?;
+ return Ok(Some(Entry {
+ path: self.entry.path.clone(),
+ metadata: Metadata::default(),
+ kind: EntryKind::AppendixRef {
+ appendix_offset,
+ file_size,
+ },
+ }));
+ }
+
fn reset_path(&mut self) -> io::Result<()> {
let path_len = *self
.path_lengths
@@ -535,7 +555,7 @@ impl<I: SeqRead> DecoderImpl<I> {
self.state = State::InPayload { offset: 0 };
return Ok(ItemResult::Entry);
}
- format::PXAR_FILENAME | format::PXAR_GOODBYE => {
+ format::PXAR_FILENAME | format::PXAR_GOODBYE | format::PXAR_APPENDIX_REF => {
if self.entry.metadata.is_fifo() {
self.state = State::InSpecialFile;
self.entry.kind = EntryKind::Fifo;
diff --git a/src/encoder/aio.rs b/src/encoder/aio.rs
index ad25fea..66ea535 100644
--- a/src/encoder/aio.rs
+++ b/src/encoder/aio.rs
@@ -5,7 +5,7 @@ use std::path::Path;
use std::pin::Pin;
use std::task::{Context, Poll};
-use crate::encoder::{self, LinkOffset, SeqWrite};
+use crate::encoder::{self, AppendixRefOffset, LinkOffset, SeqWrite};
use crate::format;
use crate::Metadata;
@@ -112,6 +112,18 @@ impl<'a, T: SeqWrite + 'a> Encoder<'a, T> {
self.inner.finish().await
}
+ /// Add reference to archive appendix
+ pub async fn add_appendix_ref<PF: AsRef<Path>>(
+ &mut self,
+ file_name: PF,
+ appendix_ref_offset: AppendixRefOffset,
+ file_size: u64,
+ ) -> io::Result<()> {
+ self.inner
+ .add_appendix_ref(file_name.as_ref(), appendix_ref_offset, file_size)
+ .await
+ }
+
/// Add a symbolic link to the archive.
pub async fn add_symlink<PF: AsRef<Path>, PT: AsRef<Path>>(
&mut self,
diff --git a/src/encoder/mod.rs b/src/encoder/mod.rs
index 860c21f..982e3f9 100644
--- a/src/encoder/mod.rs
+++ b/src/encoder/mod.rs
@@ -38,6 +38,33 @@ impl LinkOffset {
}
}
+/// File reference used to create appendix references.
+#[derive(Clone, Copy, Debug, Default, Eq, PartialEq, Ord, PartialOrd)]
+pub struct AppendixRefOffset(u64);
+
+impl AppendixRefOffset {
+ /// Get the raw byte offset for this appendix reference.
+ #[inline]
+ pub fn raw(self) -> u64 {
+ self.0
+ }
+
+ /// Return a new AppendixRefOffset, positively shifted by offset
+ #[inline]
+ pub fn add(&self, offset: u64) -> Self {
+ Self(self.0 + offset)
+ }
+
+ /// Return a new AppendixRefOffset, negatively shifted by offset
+ #[inline]
+ pub fn sub(&self, offset: u64) -> Self {
+ Self(self.0 - offset)
+ }
+}
+
+/// Offset pointing to the start of the appendix section of the archive.
+#[derive(Clone, Copy, Debug, Eq, PartialEq, Ord, PartialOrd)]
+pub struct AppendixStartOffset(u64);
/// Sequential write interface used by the encoder's state machine.
///
/// This is our internal writer trait which is available for `std::io::Write` types in the
@@ -466,6 +493,40 @@ impl<'a, T: SeqWrite + 'a> EncoderImpl<'a, T> {
Ok(offset)
}
+ /// Add reference to pxar archive appendix
+ pub async fn add_appendix_ref(
+ &mut self,
+ file_name: &Path,
+ appendix_ref_offset: AppendixRefOffset,
+ file_size: u64,
+ ) -> io::Result<()> {
+ self.check()?;
+
+ let offset = self.position();
+ let file_name = file_name.as_os_str().as_bytes();
+
+ let mut data = Vec::with_capacity(2 * 8);
+ data.extend(&appendix_ref_offset.raw().to_le_bytes());
+ data.extend(&file_size.to_le_bytes());
+ seq_write_pxar_entry(
+ self.output.as_mut(),
+ format::PXAR_APPENDIX_REF,
+ &data,
+ &mut self.state.write_position,
+ )
+ .await?;
+
+ let end_offset = self.position();
+
+ self.state.items.push(GoodbyeItem {
+ hash: format::hash_filename(file_name),
+ offset: offset,
+ size: end_offset - offset,
+ });
+
+ Ok(())
+ }
+
/// Return a file offset usable with `add_hardlink`.
pub async fn add_symlink(
&mut self,
diff --git a/src/encoder/sync.rs b/src/encoder/sync.rs
index 93c3b2c..370a219 100644
--- a/src/encoder/sync.rs
+++ b/src/encoder/sync.rs
@@ -6,7 +6,7 @@ use std::pin::Pin;
use std::task::{Context, Poll};
use crate::decoder::sync::StandardReader;
-use crate::encoder::{self, LinkOffset, SeqWrite};
+use crate::encoder::{self, AppendixRefOffset, LinkOffset, SeqWrite};
use crate::format;
use crate::util::poll_result_once;
use crate::Metadata;
@@ -110,6 +110,20 @@ impl<'a, T: SeqWrite + 'a> Encoder<'a, T> {
poll_result_once(self.inner.finish())
}
+ /// Add reference to archive appendix
+ pub async fn add_appendix_ref<PF: AsRef<Path>>(
+ &mut self,
+ file_name: PF,
+ appendix_ref_offset: AppendixRefOffset,
+ file_size: u64,
+ ) -> io::Result<()> {
+ poll_result_once(self.inner.add_appendix_ref(
+ file_name.as_ref(),
+ appendix_ref_offset,
+ file_size,
+ ))
+ }
+
/// Add a symbolic link to the archive.
pub fn add_symlink<PF: AsRef<Path>, PT: AsRef<Path>>(
&mut self,
diff --git a/src/format/mod.rs b/src/format/mod.rs
index 72a193c..5eb7562 100644
--- a/src/format/mod.rs
+++ b/src/format/mod.rs
@@ -22,6 +22,7 @@
//! * `FCAPS` -- file capability in Linux disk format
//! * `QUOTA_PROJECT_ID` -- the ext4/xfs quota project ID
//! * `PAYLOAD` -- file contents, if it is one
+//! * `APPENDIX_REF` -- start offset and size of a file entry relative to the appendix start
//! * `SYMLINK` -- symlink target, if it is one
//! * `DEVICE` -- device major/minor, if it is a block/char device
//!
@@ -99,6 +100,8 @@ pub const PXAR_QUOTA_PROJID: u64 = 0xe07540e82f7d1cbb;
pub const PXAR_HARDLINK: u64 = 0x51269c8422bd7275;
/// Marks the beginnig of the payload (actual content) of regular files
pub const PXAR_PAYLOAD: u64 = 0x28147a1b0b7c1a25;
+/// Marks the beginning of an appendix reference for regular files
+pub const PXAR_APPENDIX_REF: u64 = 0x849b4a17e0234f8e;
/// Marks item as entry of goodbye table
pub const PXAR_GOODBYE: u64 = 0x2fec4fa642d5731d;
/// The end marker used in the GOODBYE object
@@ -151,6 +154,7 @@ impl Header {
PXAR_ACL_GROUP_OBJ => size_of::<acl::GroupObject>() as u64,
PXAR_QUOTA_PROJID => size_of::<QuotaProjectId>() as u64,
PXAR_ENTRY => size_of::<Stat>() as u64,
+ PXAR_APPENDIX_REF => u64::MAX - (size_of::<Self>() as u64),
PXAR_PAYLOAD | PXAR_GOODBYE => u64::MAX - (size_of::<Self>() as u64),
_ => u64::MAX - (size_of::<Self>() as u64),
}
@@ -191,6 +195,7 @@ impl Display for Header {
PXAR_ACL_GROUP_OBJ => "ACL_GROUP_OBJ",
PXAR_QUOTA_PROJID => "QUOTA_PROJID",
PXAR_ENTRY => "ENTRY",
+ PXAR_APPENDIX_REF => "APPENDIX_REF",
PXAR_PAYLOAD => "PAYLOAD",
PXAR_GOODBYE => "GOODBYE",
_ => "UNKNOWN",
diff --git a/src/lib.rs b/src/lib.rs
index 210c4b1..fa84e7a 100644
--- a/src/lib.rs
+++ b/src/lib.rs
@@ -366,6 +366,12 @@ pub enum EntryKind {
offset: Option<u64>,
},
+ /// Reference to pxar archive appendix
+ AppendixRef {
+ appendix_offset: u64,
+ file_size: u64,
+ },
+
/// Directory entry. When iterating through an archive, the contents follow next.
Directory,
--
2.39.2
^ permalink raw reply [flat|nested] 30+ messages in thread
* [pbs-devel] [PATCH v6 pxar 5/29] fix #3174: enc/dec: impl PXAR_APPENDIX entrytype
2024-01-25 13:25 [pbs-devel] [PATCH-SERIES v6 pxar proxmox-backup proxmox-widget-toolkit 0/29] fix #3174: improve file-level backup Christian Ebner
` (3 preceding siblings ...)
2024-01-25 13:25 ` [pbs-devel] [PATCH v6 pxar 4/29] fix #3174: enc/dec: impl PXAR_APPENDIX_REF entrytype Christian Ebner
@ 2024-01-25 13:25 ` Christian Ebner
2024-01-25 13:25 ` [pbs-devel] [PATCH v6 pxar 6/29] fix #3174: encoder: helper to add to encoder position Christian Ebner
` (23 subsequent siblings)
28 siblings, 0 replies; 30+ messages in thread
From: Christian Ebner @ 2024-01-25 13:25 UTC (permalink / raw)
To: pbs-devel
Add an additional entry type marking the start of a pxar archive
appendix section. The appendix is a concatenation of possibly
uncorrelated chunks and therefore no longer follows the pxar archive
format. The appendix is only used to access the file metadata and
payloads when a PXAR_APPENDIX_REF entry is encountered in the archive
before this point.
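The entry payload is a single little-endian u64 holding the full size of the
appendix section, as the decoder hunk below reads it; a standalone sketch:
```rust
/// Decode the PXAR_APPENDIX entry payload: the total appendix size.
fn appendix_total_size(payload: &[u8]) -> std::io::Result<u64> {
    let bytes: [u8; 8] = payload
        .get(0..8)
        .and_then(|b| b.try_into().ok())
        .ok_or_else(|| {
            std::io::Error::new(std::io::ErrorKind::UnexpectedEof, "short appendix entry")
        })?;
    Ok(u64::from_le_bytes(bytes))
}
```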
Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
Changes since v5:
- no changes
examples/mk-format-hashes.rs | 1 +
src/decoder/mod.rs | 10 ++++++++++
src/encoder/aio.rs | 12 +++++++++++-
src/encoder/mod.rs | 33 +++++++++++++++++++++++++++++++++
src/encoder/sync.rs | 12 +++++++++++-
src/format/mod.rs | 7 +++++++
src/lib.rs | 4 ++++
7 files changed, 77 insertions(+), 2 deletions(-)
diff --git a/examples/mk-format-hashes.rs b/examples/mk-format-hashes.rs
index 8b4f5de..f068edd 100644
--- a/examples/mk-format-hashes.rs
+++ b/examples/mk-format-hashes.rs
@@ -12,6 +12,7 @@ const CONSTANTS: &[(&str, &str, &str)] = &[
"__PROXMOX_FORMAT_ENTRY__",
),
("", "PXAR_FILENAME", "__PROXMOX_FORMAT_FILENAME__"),
+ ("", "PXAR_APPENDIX", "__PROXMOX_FORMAT_APPENDIX__"),
("", "PXAR_SYMLINK", "__PROXMOX_FORMAT_SYMLINK__"),
("", "PXAR_DEVICE", "__PROXMOX_FORMAT_DEVICE__"),
("", "PXAR_XATTR", "__PROXMOX_FORMAT_XATTR__"),
diff --git a/src/decoder/mod.rs b/src/decoder/mod.rs
index c30fa14..a0f322e 100644
--- a/src/decoder/mod.rs
+++ b/src/decoder/mod.rs
@@ -295,6 +295,10 @@ impl<I: SeqRead> DecoderImpl<I> {
continue;
}
}
+ format::PXAR_APPENDIX => {
+ self.state = State::Default;
+ return Ok(Some(self.entry.take()));
+ }
_ => io_bail!(
"expected filename or directory-goodbye pxar entry, got: {}",
self.current_header,
@@ -546,6 +550,12 @@ impl<I: SeqRead> DecoderImpl<I> {
self.entry.kind = EntryKind::Device(self.read_device().await?);
return Ok(ItemResult::Entry);
}
+ format::PXAR_APPENDIX => {
+ let bytes = self.read_entry_as_bytes().await?;
+ let total = u64::from_le_bytes(bytes[0..8].try_into().unwrap());
+ self.entry.kind = EntryKind::Appendix { total };
+ return Ok(ItemResult::Entry);
+ }
format::PXAR_PAYLOAD => {
let offset = seq_read_position(&mut self.input).await.transpose()?;
self.entry.kind = EntryKind::File {
diff --git a/src/encoder/aio.rs b/src/encoder/aio.rs
index 66ea535..9cc26e0 100644
--- a/src/encoder/aio.rs
+++ b/src/encoder/aio.rs
@@ -5,7 +5,7 @@ use std::path::Path;
use std::pin::Pin;
use std::task::{Context, Poll};
-use crate::encoder::{self, AppendixRefOffset, LinkOffset, SeqWrite};
+use crate::encoder::{self, AppendixRefOffset, AppendixStartOffset, LinkOffset, SeqWrite};
use crate::format;
use crate::Metadata;
@@ -124,6 +124,16 @@ impl<'a, T: SeqWrite + 'a> Encoder<'a, T> {
.await
}
+ /// Add the appendix start entry marker
+ ///
+ /// Returns the LinkOffset pointing after the entry, the appendix start offset
+ pub async fn add_appendix(
+ &mut self,
+ full_size: AppendixRefOffset,
+ ) -> io::Result<AppendixStartOffset> {
+ self.inner.add_appendix(full_size).await
+ }
+
/// Add a symbolic link to the archive.
pub async fn add_symlink<PF: AsRef<Path>, PT: AsRef<Path>>(
&mut self,
diff --git a/src/encoder/mod.rs b/src/encoder/mod.rs
index 982e3f9..6745b8b 100644
--- a/src/encoder/mod.rs
+++ b/src/encoder/mod.rs
@@ -65,6 +65,15 @@ impl AppendixRefOffset {
/// Offset pointing to the start of the appendix section of the archive.
#[derive(Clone, Copy, Debug, Eq, PartialEq, Ord, PartialOrd)]
pub struct AppendixStartOffset(u64);
+
+impl AppendixStartOffset {
+ /// Get the raw byte start offset for this appendix section.
+ #[inline]
+ pub fn raw(self) -> u64 {
+ self.0
+ }
+}
+
/// Sequential write interface used by the encoder's state machine.
///
/// This is our internal writer trait which is available for `std::io::Write` types in the
@@ -527,6 +536,30 @@ impl<'a, T: SeqWrite + 'a> EncoderImpl<'a, T> {
Ok(())
}
+ /// Add the appendix start entry marker
+ ///
+ /// Returns the AppendixStartOffset pointing after the entry, the start of the appendix
+ /// section of the archive.
+ pub async fn add_appendix(
+ &mut self,
+ full_size: AppendixRefOffset,
+ ) -> io::Result<AppendixStartOffset> {
+ self.check()?;
+
+ let data = &full_size.raw().to_le_bytes().to_vec();
+ seq_write_pxar_entry(
+ self.output.as_mut(),
+ format::PXAR_APPENDIX,
+ &data,
+ &mut self.state.write_position,
+ )
+ .await?;
+
+ let offset = self.position();
+
+ Ok(AppendixStartOffset(offset))
+ }
+
/// Return a file offset usable with `add_hardlink`.
pub async fn add_symlink(
&mut self,
diff --git a/src/encoder/sync.rs b/src/encoder/sync.rs
index 370a219..54e33a2 100644
--- a/src/encoder/sync.rs
+++ b/src/encoder/sync.rs
@@ -6,7 +6,7 @@ use std::pin::Pin;
use std::task::{Context, Poll};
use crate::decoder::sync::StandardReader;
-use crate::encoder::{self, AppendixRefOffset, LinkOffset, SeqWrite};
+use crate::encoder::{self, AppendixRefOffset, AppendixStartOffset, LinkOffset, SeqWrite};
use crate::format;
use crate::util::poll_result_once;
use crate::Metadata;
@@ -124,6 +124,16 @@ impl<'a, T: SeqWrite + 'a> Encoder<'a, T> {
))
}
+ /// Add the appendix start entry marker
+ ///
+ /// Returns the LinkOffset pointing after the entry, the appendix start offset
+ pub async fn add_appendix(
+ &mut self,
+ full_size: AppendixRefOffset,
+ ) -> io::Result<AppendixStartOffset> {
+ poll_result_once(self.inner.add_appendix(full_size))
+ }
+
/// Add a symbolic link to the archive.
pub fn add_symlink<PF: AsRef<Path>, PT: AsRef<Path>>(
&mut self,
diff --git a/src/format/mod.rs b/src/format/mod.rs
index 5eb7562..8254df9 100644
--- a/src/format/mod.rs
+++ b/src/format/mod.rs
@@ -35,6 +35,12 @@
//! * `<archive>` -- serialization of the second directory entry
//! * ...
//! * `GOODBYE` -- lookup table at the end of a list of directory entries
+//!
+//! For backups referencing previous backups to skip file payloads, the archive is followed by an
+//! appendix marker after which the concatenated pxar archive fragments containing the file payloads
+//! are appended. They are NOT guaranteed to follow the full pxar structure and should only be
+//! used to extract the file payloads by given offset.
+//! * `APPENDIX` -- pxar archive fragments containing file payloads
use std::cmp::Ordering;
use std::ffi::{CStr, OsStr};
@@ -85,6 +91,7 @@ pub const PXAR_ENTRY: u64 = 0xd5956474e588acef;
/// Previous version of the entry struct
pub const PXAR_ENTRY_V1: u64 = 0x11da850a1c1cceff;
pub const PXAR_FILENAME: u64 = 0x16701121063917b3;
+pub const PXAR_APPENDIX: u64 = 0x9ff6c9507864b38d;
pub const PXAR_SYMLINK: u64 = 0x27f971e7dbf5dc5f;
pub const PXAR_DEVICE: u64 = 0x9fc9e906586d5ce9;
pub const PXAR_XATTR: u64 = 0x0dab0229b57dcd03;
diff --git a/src/lib.rs b/src/lib.rs
index fa84e7a..035f995 100644
--- a/src/lib.rs
+++ b/src/lib.rs
@@ -372,6 +372,10 @@ pub enum EntryKind {
file_size: u64,
},
+ Appendix {
+ total: u64,
+ },
+
/// Directory entry. When iterating through an archive, the contents follow next.
Directory,
--
2.39.2
^ permalink raw reply [flat|nested] 30+ messages in thread
* [pbs-devel] [PATCH v6 pxar 6/29] fix #3174: encoder: helper to add to encoder position
2024-01-25 13:25 [pbs-devel] [PATCH-SERIES v6 pxar proxmox-backup proxmox-widget-toolkit 0/29] fix #3174: improve file-level backup Christian Ebner
` (4 preceding siblings ...)
2024-01-25 13:25 ` [pbs-devel] [PATCH v6 pxar 5/29] fix #3174: enc/dec: impl PXAR_APPENDIX entrytype Christian Ebner
@ 2024-01-25 13:25 ` Christian Ebner
2024-01-25 13:25 ` [pbs-devel] [PATCH v6 pxar 7/29] fix #3174: enc/dec: impl PXAR_APPENDIX_TAIL entrytype Christian Ebner
` (22 subsequent siblings)
28 siblings, 0 replies; 30+ messages in thread
From: Christian Ebner @ 2024-01-25 13:25 UTC (permalink / raw)
To: pbs-devel
Adds a helper to increase the encoder position by a given size. This is
used to increase the position when adding an appendix section to the pxar
stream, as these bytes are never encoded directly but rather referenced by
already existing chunks.
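A hypothetical call site in the archiver, assuming this patch is applied:
```rust
use pxar::encoder::{sync::Encoder, SeqWrite};

/// Account for `appendix_size` bytes of reused chunks that are injected into
/// the upload stream instead of being encoded again.
fn account_injected_bytes<T: SeqWrite>(encoder: &mut Encoder<T>, appendix_size: u64) -> u64 {
    // Safety (assumed contract): the caller guarantees the skipped bytes are
    // actually inserted into the stream by the chunk injection machinery.
    unsafe { encoder.position_add(appendix_size) }
}
```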
Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
Changes since v5:
- no changes
src/encoder/aio.rs | 5 +++++
src/encoder/mod.rs | 6 ++++++
src/encoder/sync.rs | 5 +++++
3 files changed, 16 insertions(+)
diff --git a/src/encoder/aio.rs b/src/encoder/aio.rs
index 9cc26e0..7379940 100644
--- a/src/encoder/aio.rs
+++ b/src/encoder/aio.rs
@@ -112,6 +112,11 @@ impl<'a, T: SeqWrite + 'a> Encoder<'a, T> {
self.inner.finish().await
}
+ /// Add size to encoders position and return new position.
+ pub unsafe fn position_add(&mut self, size: u64) -> u64 {
+ unsafe { self.inner.position_add(size) }
+ }
+
/// Add reference to archive appendix
pub async fn add_appendix_ref<PF: AsRef<Path>>(
&mut self,
diff --git a/src/encoder/mod.rs b/src/encoder/mod.rs
index 6745b8b..98f6521 100644
--- a/src/encoder/mod.rs
+++ b/src/encoder/mod.rs
@@ -693,6 +693,12 @@ impl<'a, T: SeqWrite + 'a> EncoderImpl<'a, T> {
self.state.write_position
}
+ #[inline]
+ pub unsafe fn position_add(&mut self, size: u64) -> u64 {
+ self.state.write_position += size;
+ self.state.write_position
+ }
+
pub async fn create_directory(
&mut self,
file_name: &Path,
diff --git a/src/encoder/sync.rs b/src/encoder/sync.rs
index 54e33a2..5d8db25 100644
--- a/src/encoder/sync.rs
+++ b/src/encoder/sync.rs
@@ -110,6 +110,11 @@ impl<'a, T: SeqWrite + 'a> Encoder<'a, T> {
poll_result_once(self.inner.finish())
}
+ /// Add size to encoders position and return new position.
+ pub unsafe fn position_add(&mut self, size: u64) -> u64 {
+ unsafe { self.inner.position_add(size) }
+ }
+
/// Add reference to archive appendix
pub async fn add_appendix_ref<PF: AsRef<Path>>(
&mut self,
--
2.39.2
^ permalink raw reply [flat|nested] 30+ messages in thread
* [pbs-devel] [PATCH v6 pxar 7/29] fix #3174: enc/dec: impl PXAR_APPENDIX_TAIL entrytype
2024-01-25 13:25 [pbs-devel] [PATCH-SERIES v6 pxar proxmox-backup proxmox-widget-toolkit 0/29] fix #3174: improve file-level backup Christian Ebner
` (5 preceding siblings ...)
2024-01-25 13:25 ` [pbs-devel] [PATCH v6 pxar 6/29] fix #3174: encoder: helper to add to encoder position Christian Ebner
@ 2024-01-25 13:25 ` Christian Ebner
2024-01-25 13:25 ` [pbs-devel] [PATCH v6 pxar 8/29] fix #3174: add pxar format version entry Christian Ebner
` (21 subsequent siblings)
28 siblings, 0 replies; 30+ messages in thread
From: Christian Ebner @ 2024-01-25 13:25 UTC (permalink / raw)
To: pbs-devel
The PXAR_APPENDIX_TAIL entry marks pxar archives containing an appendix
section. It has the same size as a goodbye tail marker item, in order to
be able to easily read and distinguish archives with and without such a
section.
This also implements the accessor used by e.g. the FUSE implementation
to perform random I/O on the archive. The accessor reads the last entry
and stores the appendix offset if needed, in order to recalculate the
actual file payload offset within the archive when encountering an
appendix reference entry in the archive.
Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
Changes since v5:
- fix formatting using `cargo fmt`
examples/mk-format-hashes.rs | 5 ++
examples/pxarcmd.rs | 4 +-
src/accessor/mod.rs | 142 ++++++++++++++++++++++++++++++-----
src/encoder/aio.rs | 9 ++-
src/encoder/mod.rs | 19 ++++-
src/encoder/sync.rs | 7 +-
src/format/mod.rs | 4 +
7 files changed, 162 insertions(+), 28 deletions(-)
diff --git a/examples/mk-format-hashes.rs b/examples/mk-format-hashes.rs
index f068edd..7fb938d 100644
--- a/examples/mk-format-hashes.rs
+++ b/examples/mk-format-hashes.rs
@@ -57,6 +57,11 @@ const CONSTANTS: &[(&str, &str, &str)] = &[
"PXAR_GOODBYE_TAIL_MARKER",
"__PROXMOX_FORMAT_PXAR_GOODBYE_TAIL_MARKER__",
),
+ (
+ "Marks the end of an archive containing an appendix section",
+ "PXAR_APPENDIX_TAIL",
+ "__PROXMOX_FORMAT_APPENDIX_TAIL__",
+ ),
];
fn main() {
diff --git a/examples/pxarcmd.rs b/examples/pxarcmd.rs
index e0c779d..c7848cc 100644
--- a/examples/pxarcmd.rs
+++ b/examples/pxarcmd.rs
@@ -105,7 +105,7 @@ fn cmd_create(mut args: std::env::ArgsOs) -> Result<(), Error> {
let mut encoder = Encoder::create(file, &meta)?;
add_directory(&mut encoder, dir, &dir_path, &mut HashMap::new())?;
- encoder.finish()?;
+ encoder.finish(None)?;
Ok(())
}
@@ -145,7 +145,7 @@ fn add_directory<'a, T: SeqWrite + 'a>(
root_path,
&mut *hardlinks,
)?;
- dir.finish()?;
+ dir.finish(None)?;
} else if file_type.is_symlink() {
todo!("symlink handling");
} else if file_type.is_file() {
diff --git a/src/accessor/mod.rs b/src/accessor/mod.rs
index 6a2de73..21b62d0 100644
--- a/src/accessor/mod.rs
+++ b/src/accessor/mod.rs
@@ -182,6 +182,7 @@ pub(crate) struct AccessorImpl<T> {
input: T,
size: u64,
caches: Arc<Caches>,
+ appendix_offset: Option<u64>,
}
impl<T: ReadAt> AccessorImpl<T> {
@@ -190,10 +191,22 @@ impl<T: ReadAt> AccessorImpl<T> {
io_bail!("too small to contain a pxar archive");
}
+ let tail_offset = size - (size_of::<GoodbyeItem>() as u64);
+ let tail: GoodbyeItem = read_entry_at(&input, tail_offset).await?;
+
+ let (appendix_offset, size) = if tail.hash == format::PXAR_APPENDIX_TAIL {
+ (Some(tail.offset), size - 40)
+ } else if tail.hash != format::PXAR_GOODBYE_TAIL_MARKER {
+ io_bail!("no goodbye tail marker found");
+ } else {
+ (None, size)
+ };
+
Ok(Self {
input,
size,
caches: Arc::new(Caches::default()),
+ appendix_offset,
})
}
@@ -207,6 +220,7 @@ impl<T: ReadAt> AccessorImpl<T> {
self.size,
"/".into(),
Arc::clone(&self.caches),
+ self.appendix_offset,
)
.await
}
@@ -263,6 +277,7 @@ impl<T: Clone + ReadAt> AccessorImpl<T> {
self.size,
"/".into(),
Arc::clone(&self.caches),
+ self.appendix_offset,
)
.await
}
@@ -274,6 +289,7 @@ impl<T: Clone + ReadAt> AccessorImpl<T> {
offset,
"/".into(),
Arc::clone(&self.caches),
+ self.appendix_offset,
)
.await
}
@@ -298,6 +314,7 @@ impl<T: Clone + ReadAt> AccessorImpl<T> {
entry,
entry_range_info: entry_range_info.clone(),
caches: Arc::clone(&self.caches),
+ appendix_offset: self.appendix_offset,
})
}
@@ -353,6 +370,7 @@ impl<T: Clone + ReadAt> AccessorImpl<T> {
entry_range: entry_offset..entry_end,
},
caches: Arc::clone(&self.caches),
+ appendix_offset: self.appendix_offset,
})
}
_ => io_bail!("hardlink does not point to a regular file"),
@@ -369,6 +387,7 @@ pub(crate) struct DirectoryImpl<T> {
table: Arc<[GoodbyeItem]>,
path: PathBuf,
caches: Arc<Caches>,
+ appendix_offset: Option<u64>,
}
impl<T: Clone + ReadAt> DirectoryImpl<T> {
@@ -378,6 +397,7 @@ impl<T: Clone + ReadAt> DirectoryImpl<T> {
end_offset: u64,
path: PathBuf,
caches: Arc<Caches>,
+ appendix_offset: Option<u64>,
) -> io::Result<DirectoryImpl<T>> {
let tail = Self::read_tail_entry(&input, end_offset).await?;
@@ -407,6 +427,7 @@ impl<T: Clone + ReadAt> DirectoryImpl<T> {
table: table.as_ref().map_or_else(|| Arc::new([]), Arc::clone),
path,
caches,
+ appendix_offset,
};
// sanity check:
@@ -516,6 +537,32 @@ impl<T: Clone + ReadAt> DirectoryImpl<T> {
.next()
.await
.ok_or_else(|| io_format_err!("unexpected EOF while decoding directory entry"))??;
+
+ if let EntryKind::AppendixRef {
+ appendix_offset,
+ file_size,
+ } = entry.kind()
+ {
+ let appendix_start = match self.appendix_offset {
+ Some(appendix_start) => appendix_start,
+ None => io_bail!("missing required appendix start offset information"),
+ };
+
+ let name = file_name.ok_or_else(|| io_format_err!("missing required filename"))?;
+ let c_string = std::ffi::CString::new(name.as_os_str().as_bytes())?;
+ let start =
+ appendix_start + appendix_offset + 16 + c_string.as_bytes_with_nul().len() as u64;
+ let end = start + file_size;
+ decoder = self.get_decoder(start..end, file_name).await?;
+
+ let entry = decoder
+ .next()
+ .await
+ .ok_or_else(|| io_format_err!("unexpected EOF while decoding directory entry"))??;
+
+ return Ok((entry, decoder));
+ }
+
Ok((entry, decoder))
}
@@ -533,6 +580,7 @@ impl<T: Clone + ReadAt> DirectoryImpl<T> {
entry_range: self.entry_range(),
},
caches: Arc::clone(&self.caches),
+ appendix_offset: self.appendix_offset,
})
}
@@ -616,36 +664,86 @@ impl<T: Clone + ReadAt> DirectoryImpl<T> {
}
let file_ofs = self.goodbye_ofs - file_goodbye_ofs;
- let (file_name, entry_ofs) = self.read_filename_entry(file_ofs).await?;
- let entry_range = Range {
- start: entry_ofs,
- end: file_ofs + entry.size,
- };
- if entry_range.end < entry_range.start {
- io_bail!(
- "bad file: invalid entry ranges for {:?}: \
- start=0x{:x}, file_ofs=0x{:x}, size=0x{:x}",
+ let mut head: format::Header = read_entry_at(&self.input, file_ofs).await?;
+ let (file_name, entry_range_info) = if head.htype == format::PXAR_APPENDIX_REF {
+ let appendix_start = match self.appendix_offset {
+ Some(appendix_start) => appendix_start,
+ None => io_bail!("missing required appendix start offset information"),
+ };
+ let bytes = read_exact_data_at(
+ &self.input,
+ head.content_size() as usize,
+ file_ofs + (size_of_val(&head) as u64),
+ )
+ .await?;
+ let appendix_offset = u64::from_le_bytes(bytes[0..8].try_into().unwrap());
+ let offset = appendix_start + appendix_offset;
+ let size = u64::from_le_bytes(bytes[8..16].try_into().unwrap());
+
+ head = read_entry_at(&self.input, offset).await?;
+ let (file_name, entry_ofs) = self.read_filename_entry(head, offset).await?;
+
+ let c_string = std::ffi::CString::new(file_name.as_os_str().as_bytes())?;
+ let start = offset + 16 + c_string.as_bytes_with_nul().len() as u64;
+ if start + size < start {
+ io_bail!(
+ "bad file: invalid entry ranges for {:?}: \
+ start=0x{:x}, file_ofs=0x{:x}, size=0x{:x}",
+ file_name,
+ entry_ofs,
+ offset,
+ size,
+ );
+ }
+ (
file_name,
- entry_ofs,
- file_ofs,
- entry.size,
- );
- }
+ EntryRangeInfo {
+ filename_header_offset: Some(offset),
+ entry_range: Range {
+ start,
+ end: start + size,
+ },
+ },
+ )
+ } else {
+ let (file_name, entry_ofs) = self.read_filename_entry(head, file_ofs).await?;
+ if file_ofs + entry.size < entry_ofs {
+ io_bail!(
+ "bad file: invalid entry ranges for {:?}: \
+ start=0x{:x}, file_ofs=0x{:x}, size=0x{:x}",
+ file_name,
+ entry_ofs,
+ file_ofs,
+ entry.size,
+ );
+ }
+ (
+ file_name,
+ EntryRangeInfo {
+ filename_header_offset: Some(file_ofs),
+ entry_range: Range {
+ start: entry_ofs,
+ end: file_ofs + entry.size,
+ },
+ },
+ )
+ };
Ok(DirEntryImpl {
dir: self,
file_name,
- entry_range_info: EntryRangeInfo {
- filename_header_offset: Some(file_ofs),
- entry_range,
- },
+ entry_range_info,
caches: Arc::clone(&self.caches),
+ appendix_offset: self.appendix_offset,
})
}
- async fn read_filename_entry(&self, file_ofs: u64) -> io::Result<(PathBuf, u64)> {
- let head: format::Header = read_entry_at(&self.input, file_ofs).await?;
+ async fn read_filename_entry(
+ &self,
+ head: format::Header,
+ file_ofs: u64,
+ ) -> io::Result<(PathBuf, u64)> {
if head.htype != format::PXAR_FILENAME {
io_bail!("expected PXAR_FILENAME header, found: {}", head);
}
@@ -685,6 +783,7 @@ pub(crate) struct FileEntryImpl<T: Clone + ReadAt> {
entry: Entry,
entry_range_info: EntryRangeInfo,
caches: Arc<Caches>,
+ appendix_offset: Option<u64>,
}
impl<T: Clone + ReadAt> FileEntryImpl<T> {
@@ -698,6 +797,7 @@ impl<T: Clone + ReadAt> FileEntryImpl<T> {
self.entry_range_info.entry_range.end,
self.entry.path.clone(),
Arc::clone(&self.caches),
+ self.appendix_offset,
)
.await
}
@@ -787,6 +887,7 @@ pub(crate) struct DirEntryImpl<'a, T: Clone + ReadAt> {
file_name: PathBuf,
entry_range_info: EntryRangeInfo,
caches: Arc<Caches>,
+ appendix_offset: Option<u64>,
}
impl<'a, T: Clone + ReadAt> DirEntryImpl<'a, T> {
@@ -808,6 +909,7 @@ impl<'a, T: Clone + ReadAt> DirEntryImpl<'a, T> {
entry,
entry_range_info: self.entry_range_info.clone(),
caches: Arc::clone(&self.caches),
+ appendix_offset: self.appendix_offset,
})
}
diff --git a/src/encoder/aio.rs b/src/encoder/aio.rs
index 7379940..5a833c5 100644
--- a/src/encoder/aio.rs
+++ b/src/encoder/aio.rs
@@ -108,8 +108,11 @@ impl<'a, T: SeqWrite + 'a> Encoder<'a, T> {
}
/// Finish this directory. This is mandatory, otherwise the `Drop` handler will `panic!`.
- pub async fn finish(self) -> io::Result<()> {
- self.inner.finish().await
+ pub async fn finish(
+ self,
+ appendix_tail: Option<(AppendixStartOffset, AppendixRefOffset)>,
+ ) -> io::Result<()> {
+ self.inner.finish(appendix_tail).await
}
/// Add size to encoders position and return new position.
@@ -330,7 +333,7 @@ mod test {
.await
.unwrap();
}
- encoder.finish().await.unwrap();
+ encoder.finish(None).await.unwrap();
};
fn test_send<T: Send>(_: T) {}
diff --git a/src/encoder/mod.rs b/src/encoder/mod.rs
index 98f6521..c33b2c3 100644
--- a/src/encoder/mod.rs
+++ b/src/encoder/mod.rs
@@ -889,7 +889,10 @@ impl<'a, T: SeqWrite + 'a> EncoderImpl<'a, T> {
.await
}
- pub async fn finish(mut self) -> io::Result<()> {
+ pub async fn finish(
+ mut self,
+ appendix_tail: Option<(AppendixStartOffset, AppendixRefOffset)>,
+ ) -> io::Result<()> {
let tail_bytes = self.finish_goodbye_table().await?;
seq_write_pxar_entry(
self.output.as_mut(),
@@ -899,6 +902,20 @@ impl<'a, T: SeqWrite + 'a> EncoderImpl<'a, T> {
)
.await?;
+ if let Some((appendix_start_offset, size)) = appendix_tail {
+ let mut appendix_tail = Vec::new();
+ appendix_tail.append(&mut format::PXAR_APPENDIX_TAIL.to_le_bytes().to_vec());
+ appendix_tail.append(&mut appendix_start_offset.raw().to_le_bytes().to_vec());
+ appendix_tail.append(&mut size.raw().to_le_bytes().to_vec());
+ seq_write_pxar_entry(
+ self.output.as_mut(),
+ format::PXAR_GOODBYE,
+ &appendix_tail,
+ &mut self.state.write_position,
+ )
+ .await?;
+ }
+
if let EncoderOutput::Owned(output) = &mut self.output {
flush(output).await?;
}
diff --git a/src/encoder/sync.rs b/src/encoder/sync.rs
index 5d8db25..5ede554 100644
--- a/src/encoder/sync.rs
+++ b/src/encoder/sync.rs
@@ -106,8 +106,11 @@ impl<'a, T: SeqWrite + 'a> Encoder<'a, T> {
}
/// Finish this directory. This is mandatory, otherwise the `Drop` handler will `panic!`.
- pub fn finish(self) -> io::Result<()> {
- poll_result_once(self.inner.finish())
+ pub fn finish(
+ self,
+ appendix_tail: Option<(AppendixStartOffset, AppendixRefOffset)>,
+ ) -> io::Result<()> {
+ poll_result_once(self.inner.finish(appendix_tail))
}
/// Add size to encoders position and return new position.
diff --git a/src/format/mod.rs b/src/format/mod.rs
index 8254df9..8016ab1 100644
--- a/src/format/mod.rs
+++ b/src/format/mod.rs
@@ -41,6 +41,8 @@
//! are appended. They are NOT guaranteed to follow the full pxar structure and should only be
//! used to extract the file payloads by given offset.
//! * `APPENDIX` -- pxar archive fragments containing file payloads
+//! * final goodbye table
+//! * `APPENDIX_TAIL` -- marks the end of an archive containing a APPENDIX section
use std::cmp::Ordering;
use std::ffi::{CStr, OsStr};
@@ -113,6 +115,8 @@ pub const PXAR_APPENDIX_REF: u64 = 0x849b4a17e0234f8e;
pub const PXAR_GOODBYE: u64 = 0x2fec4fa642d5731d;
/// The end marker used in the GOODBYE object
pub const PXAR_GOODBYE_TAIL_MARKER: u64 = 0xef5eed5b753e1555;
+/// Marks the end of an archive containing an appendix section
+pub const PXAR_APPENDIX_TAIL: u64 = 0x5b1b9abb7ae454f1;
#[derive(Debug, Endian)]
#[repr(C)]
--
2.39.2
^ permalink raw reply [flat|nested] 30+ messages in thread
* [pbs-devel] [PATCH v6 pxar 8/29] fix #3174: add pxar format version entry
2024-01-25 13:25 [pbs-devel] [PATCH-SERIES v6 pxar proxmox-backup proxmox-widget-toolkit 0/29] fix #3174: improve file-level backup Christian Ebner
` (6 preceding siblings ...)
2024-01-25 13:25 ` [pbs-devel] [PATCH v6 pxar 7/29] fix #3174: enc/dec: impl PXAR_APPENDIX_TAIL entrytype Christian Ebner
@ 2024-01-25 13:25 ` Christian Ebner
2024-01-25 13:25 ` [pbs-devel] [PATCH v6 pxar 9/29] fix #3174: enc: move from instance per dir to state stack Christian Ebner
` (20 subsequent siblings)
28 siblings, 0 replies; 30+ messages in thread
From: Christian Ebner @ 2024-01-25 13:25 UTC (permalink / raw)
To: pbs-devel
Adds an additional entry type at the start of each pxar archive
signaling the encoding format version. If not present, the default
version 1 is assumed.
This allows detecting the pxar encoding version early on, allowing tools
to switch mode or bail on incompatible encoder/decoder functionality.
Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
Changes since v5:
- pxar archives in version 2 are prefixed by a format version entry,
allowing encoders/decoders to detect the used format version early on.
examples/mk-format-hashes.rs | 5 +++++
src/decoder/mod.rs | 32 +++++++++++++++++++++++++++++++-
src/encoder/aio.rs | 23 ++++++++++++++++-------
src/encoder/mod.rs | 32 ++++++++++++++++++++++++++++++++
src/encoder/sync.rs | 7 ++++---
src/format/mod.rs | 11 ++++++++++-
src/lib.rs | 3 +++
7 files changed, 101 insertions(+), 12 deletions(-)
diff --git a/examples/mk-format-hashes.rs b/examples/mk-format-hashes.rs
index 7fb938d..8a43d81 100644
--- a/examples/mk-format-hashes.rs
+++ b/examples/mk-format-hashes.rs
@@ -1,6 +1,11 @@
use pxar::format::hash_filename;
const CONSTANTS: &[(&str, &str, &str)] = &[
+ (
+ "Pxar format version entry, fallback to version 1 if not present",
+ "PXAR_FORMAT_VERSION",
+ "__PROXMOX_FORMAT_VERSION__",
+ ),
(
"Beginning of an entry (current version).",
"PXAR_ENTRY",
diff --git a/src/decoder/mod.rs b/src/decoder/mod.rs
index a0f322e..ec2f079 100644
--- a/src/decoder/mod.rs
+++ b/src/decoder/mod.rs
@@ -160,6 +160,7 @@ pub(crate) struct DecoderImpl<T> {
/// The random access code uses decoders for sub-ranges which may not end in a `PAYLOAD` for
/// entries like FIFOs or sockets, so there we explicitly allow an item to terminate with EOF.
eof_after_entry: bool,
+ version: format::FormatVersion,
}
enum State {
@@ -220,6 +221,7 @@ impl<I: SeqRead> DecoderImpl<I> {
state: State::Begin,
with_goodbye_tables: false,
eof_after_entry,
+ version: format::FormatVersion::default(),
};
// this.read_next_entry().await?;
@@ -236,7 +238,16 @@ impl<I: SeqRead> DecoderImpl<I> {
loop {
match self.state {
State::Eof => return Ok(None),
- State::Begin => return self.read_next_entry().await.map(Some),
+ State::Begin => {
+ let entry = self.read_next_entry().await.map(Some);
+ if let Ok(Some(ref entry)) = entry {
+ if let EntryKind::Version(version) = entry.kind() {
+ self.version = version.clone();
+ return self.read_next_entry().await.map(Some);
+ }
+ }
+ return entry;
+ }
State::Default => {
// we completely finished an entry, so now we're going "up" in the directory
// hierarchy and parse the next PXAR_FILENAME or the PXAR_GOODBYE:
@@ -277,6 +288,9 @@ impl<I: SeqRead> DecoderImpl<I> {
match self.current_header.htype {
format::PXAR_FILENAME => return self.handle_file_entry().await,
format::PXAR_APPENDIX_REF => {
+ if self.version == format::FormatVersion::Version1 {
+ io_bail!("appendix reference not supported in pxar format version 1");
+ }
self.state = State::Default;
return self.handle_appendix_ref_entry().await;
}
@@ -296,6 +310,9 @@ impl<I: SeqRead> DecoderImpl<I> {
}
}
format::PXAR_APPENDIX => {
+ if self.version == format::FormatVersion::Version1 {
+ io_bail!("appendix not supported in pxar format version 1");
+ }
self.state = State::Default;
return Ok(Some(self.entry.take()));
}
@@ -397,6 +414,11 @@ impl<I: SeqRead> DecoderImpl<I> {
self.entry.metadata = Metadata::default();
self.entry.kind = EntryKind::Hardlink(self.read_hardlink().await?);
+ Ok(Some(self.entry.take()))
+ } else if header.htype == format::PXAR_FORMAT_VERSION {
+ self.current_header = header;
+ self.entry.kind = EntryKind::Version(self.read_format_version().await?);
+
Ok(Some(self.entry.take()))
} else if header.htype == format::PXAR_ENTRY || header.htype == format::PXAR_ENTRY_V1 {
if header.htype == format::PXAR_ENTRY {
@@ -696,6 +718,14 @@ impl<I: SeqRead> DecoderImpl<I> {
async fn read_quota_project_id(&mut self) -> io::Result<format::QuotaProjectId> {
self.read_simple_entry("quota project id").await
}
+
+ async fn read_format_version(&mut self) -> io::Result<format::FormatVersion> {
+ match seq_read_entry(&mut self.input).await? {
+ 1u64 => Ok(format::FormatVersion::Version1),
+ 2u64 => Ok(format::FormatVersion::Version2),
+ _ => io_bail!("unexpected pxar format version"),
+ }
+ }
}
/// Reader for file contents inside a pxar archive.
diff --git a/src/encoder/aio.rs b/src/encoder/aio.rs
index 5a833c5..8f586b6 100644
--- a/src/encoder/aio.rs
+++ b/src/encoder/aio.rs
@@ -24,8 +24,9 @@ impl<'a, T: tokio::io::AsyncWrite + 'a> Encoder<'a, TokioWriter<T>> {
pub async fn from_tokio(
output: T,
metadata: &Metadata,
+ version: format::FormatVersion,
) -> io::Result<Encoder<'a, TokioWriter<T>>> {
- Encoder::new(TokioWriter::new(output), metadata).await
+ Encoder::new(TokioWriter::new(output), metadata, version).await
}
}
@@ -46,9 +47,13 @@ impl<'a> Encoder<'a, TokioWriter<tokio::fs::File>> {
impl<'a, T: SeqWrite + 'a> Encoder<'a, T> {
/// Create an asynchronous encoder for an output implementing our internal write interface.
- pub async fn new(output: T, metadata: &Metadata) -> io::Result<Encoder<'a, T>> {
+ pub async fn new(
+ output: T,
+ metadata: &Metadata,
+ version: format::FormatVersion,
+ ) -> io::Result<Encoder<'a, T>> {
Ok(Self {
- inner: encoder::EncoderImpl::new(output.into(), metadata).await?,
+ inner: encoder::EncoderImpl::new(output.into(), metadata, version).await?,
})
}
@@ -299,7 +304,7 @@ mod test {
use std::task::{Context, Poll};
use super::Encoder;
- use crate::Metadata;
+ use crate::{format, Metadata};
struct DummyOutput;
@@ -321,9 +326,13 @@ mod test {
/// Assert that `Encoder` is `Send`
fn send_test() {
let test = async {
- let mut encoder = Encoder::new(DummyOutput, &Metadata::dir_builder(0o700).build())
- .await
- .unwrap();
+ let mut encoder = Encoder::new(
+ DummyOutput,
+ &Metadata::dir_builder(0o700).build(),
+ format::FormatVersion::default(),
+ )
+ .await
+ .unwrap();
{
let mut dir = encoder
.create_directory("baba", &Metadata::dir_builder(0o700).build())
diff --git a/src/encoder/mod.rs b/src/encoder/mod.rs
index c33b2c3..7e18dc8 100644
--- a/src/encoder/mod.rs
+++ b/src/encoder/mod.rs
@@ -247,6 +247,7 @@ pub async fn encoded_size(filename: &std::ffi::CStr, metadata: &Metadata) -> io:
file_copy_buffer: Arc::new(Mutex::new(unsafe {
crate::util::vec_new_uninitialized(1024 * 1024)
})),
+ version: format::FormatVersion::default(),
};
this.start_file_do(Some(metadata), filename.to_bytes())
@@ -356,6 +357,8 @@ pub(crate) struct EncoderImpl<'a, T: SeqWrite + 'a> {
/// Since only the "current" entry can be actively writing files, we share the file copy
/// buffer.
file_copy_buffer: Arc<Mutex<Vec<u8>>>,
+ /// Pxar format version to encode
+ version: format::FormatVersion,
}
impl<'a, T: SeqWrite + 'a> Drop for EncoderImpl<'a, T> {
@@ -377,6 +380,7 @@ impl<'a, T: SeqWrite + 'a> EncoderImpl<'a, T> {
pub async fn new(
output: EncoderOutput<'a, T>,
metadata: &Metadata,
+ version: format::FormatVersion,
) -> io::Result<EncoderImpl<'a, T>> {
if !metadata.is_dir() {
io_bail!("directory metadata must contain the directory mode flag");
@@ -389,8 +393,10 @@ impl<'a, T: SeqWrite + 'a> EncoderImpl<'a, T> {
file_copy_buffer: Arc::new(Mutex::new(unsafe {
crate::util::vec_new_uninitialized(1024 * 1024)
})),
+ version,
};
+ this.encode_format_version().await?;
this.encode_metadata(metadata).await?;
this.state.files_offset = this.position();
@@ -509,6 +515,9 @@ impl<'a, T: SeqWrite + 'a> EncoderImpl<'a, T> {
appendix_ref_offset: AppendixRefOffset,
file_size: u64,
) -> io::Result<()> {
+ if self.version == format::FormatVersion::Version1 {
+ io_bail!("unable to add appendix reference for pxar format version 1");
+ }
self.check()?;
let offset = self.position();
@@ -544,6 +553,9 @@ impl<'a, T: SeqWrite + 'a> EncoderImpl<'a, T> {
&mut self,
full_size: AppendixRefOffset,
) -> io::Result<AppendixStartOffset> {
+ if self.version == format::FormatVersion::Version1 {
+ io_bail!("unable to add appendix for pxar format version 1");
+ }
self.check()?;
let data = &full_size.raw().to_le_bytes().to_vec();
@@ -740,6 +752,7 @@ impl<'a, T: SeqWrite + 'a> EncoderImpl<'a, T> {
parent: Some(&mut self.state),
finished: false,
file_copy_buffer,
+ version: self.version.clone(),
})
}
@@ -755,6 +768,25 @@ impl<'a, T: SeqWrite + 'a> EncoderImpl<'a, T> {
Ok(())
}
+ async fn encode_format_version(&mut self) -> io::Result<()> {
+ if self.state.write_position != 0 {
+ io_bail!("pxar format version must be encoded at the beginning of an archive");
+ }
+
+ let version_bytes = match self.version {
+ format::FormatVersion::Version1 => return Ok(()),
+ format::FormatVersion::Version2 => 2u64.to_le_bytes(),
+ };
+
+ seq_write_pxar_entry(
+ self.output.as_mut(),
+ format::PXAR_FORMAT_VERSION,
+ &version_bytes,
+ &mut self.state.write_position,
+ )
+ .await
+ }
+
async fn encode_metadata(&mut self, metadata: &Metadata) -> io::Result<()> {
seq_write_pxar_struct_entry(
self.output.as_mut(),
diff --git a/src/encoder/sync.rs b/src/encoder/sync.rs
index 5ede554..8516662 100644
--- a/src/encoder/sync.rs
+++ b/src/encoder/sync.rs
@@ -28,7 +28,7 @@ impl<'a, T: io::Write + 'a> Encoder<'a, StandardWriter<T>> {
/// Encode a `pxar` archive into a regular `std::io::Write` output.
#[inline]
pub fn from_std(output: T, metadata: &Metadata) -> io::Result<Encoder<'a, StandardWriter<T>>> {
- Encoder::new(StandardWriter::new(output), metadata)
+ Encoder::new(StandardWriter::new(output), metadata, format::FormatVersion::default())
}
}
@@ -41,6 +41,7 @@ impl<'a> Encoder<'a, StandardWriter<std::fs::File>> {
Encoder::new(
StandardWriter::new(std::fs::File::create(path.as_ref())?),
metadata,
+ format::FormatVersion::default(),
)
}
}
@@ -50,9 +51,9 @@ impl<'a, T: SeqWrite + 'a> Encoder<'a, T> {
///
/// Note that the `output`'s `SeqWrite` implementation must always return `Poll::Ready` and is
/// not allowed to use the `Waker`, as this will cause a `panic!`.
- pub fn new(output: T, metadata: &Metadata) -> io::Result<Self> {
+ pub fn new(output: T, metadata: &Metadata, version: format::FormatVersion) -> io::Result<Self> {
Ok(Self {
- inner: poll_result_once(encoder::EncoderImpl::new(output.into(), metadata))?,
+ inner: poll_result_once(encoder::EncoderImpl::new(output.into(), metadata, version))?,
})
}
diff --git a/src/format/mod.rs b/src/format/mod.rs
index 8016ab1..5154a0a 100644
--- a/src/format/mod.rs
+++ b/src/format/mod.rs
@@ -44,7 +44,7 @@
//! * final goodbye table
//! * `APPENDIX_TAIL` -- marks the end of an archive containing a APPENDIX section
-use std::cmp::Ordering;
+use std::cmp::{Ordering, PartialEq};
use std::ffi::{CStr, OsStr};
use std::fmt;
use std::fmt::Display;
@@ -88,6 +88,8 @@ pub mod mode {
}
// Generated by `cargo run --example mk-format-hashes`
+/// Pxar format version entry, fallback to version 1 if not present
+pub const PXAR_FORMAT_VERSION: u64 = 0x730f6c75df16a40d;
/// Beginning of an entry (current version).
pub const PXAR_ENTRY: u64 = 0xd5956474e588acef;
/// Previous version of the entry struct
@@ -561,6 +563,13 @@ pub struct Filename {
pub name: Vec<u8>,
}
+#[derive(Clone, Debug, Default, PartialEq)]
+pub enum FormatVersion {
+ #[default]
+ Version1,
+ Version2,
+}
+
#[derive(Clone, Debug)]
pub struct Symlink {
pub data: Vec<u8>,
diff --git a/src/lib.rs b/src/lib.rs
index 035f995..a08055e 100644
--- a/src/lib.rs
+++ b/src/lib.rs
@@ -342,6 +342,9 @@ impl Acl {
/// Identifies whether the entry is a file, symlink, directory, etc.
#[derive(Clone, Debug)]
pub enum EntryKind {
+ /// Pxar file format version
+ Version(format::FormatVersion),
+
/// Symbolic links.
Symlink(format::Symlink),
--
2.39.2
* [pbs-devel] [PATCH v6 pxar 9/29] fix #3174: enc: move from instance per dir to state stack
2024-01-25 13:25 [pbs-devel] [PATCH-SERIES v6 pxar proxmox-backup proxmox-widget-toolkit 0/29] fix #3174: improve file-level backup Christian Ebner
` (7 preceding siblings ...)
2024-01-25 13:25 ` [pbs-devel] [PATCH v6 pxar 8/29] fix #3174: add pxar format version entry Christian Ebner
@ 2024-01-25 13:25 ` Christian Ebner
2024-01-25 13:25 ` [pbs-devel] [PATCH v6 proxmox-backup 10/29] fix #3174: index: add fn index list from start/end-offsets Christian Ebner
` (19 subsequent siblings)
28 siblings, 0 replies; 30+ messages in thread
From: Christian Ebner @ 2024-01-25 13:25 UTC (permalink / raw)
To: pbs-devel
The current encoder implementation creates an EncoderImpl for each
directory, storing its state and references to the parent and output,
thereby limiting possible encoding errors caused by misusing the API.
This API is, however, too strict when state has to be tracked across
directory boundaries, e.g. when trying to encode cached entries, as
multiple encoder instances with internal references then have to be
handled.
This patch therefore moves to an internal stack to keep track of the
encoder states for the different directory levels, thereby removing the
internal parent-child borrowing. Only a single EncoderImpl instance is
created, with the `create_directory` and `finish` calls operating on
the internal state stack instead of creating/consuming EncoderImpl
instances.
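The resulting call pattern can be sketched as follows (directory and
file names are hypothetical; compare the updated pxarcmd example and
encoder test below):
    // create_directory() now pushes a new EncoderState onto the stack and
    // finish() pops it again, instead of creating/consuming nested encoders:
    encoder.create_directory("subdir", &dir_metadata)?;
    {
        let file = encoder.create_file(&file_metadata, "file.txt", 4)?;
        // ... write the 4 payload bytes to `file` ...
    }
    encoder.finish(None)?; // closes "subdir", back to the parent state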
Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
Changes since v5:
- not present in previous version
examples/pxarcmd.rs | 6 +-
src/encoder/aio.rs | 20 ++-
src/encoder/mod.rs | 305 ++++++++++++++++++++++++++------------------
src/encoder/sync.rs | 16 ++-
4 files changed, 202 insertions(+), 145 deletions(-)
diff --git a/examples/pxarcmd.rs b/examples/pxarcmd.rs
index c7848cc..0530e20 100644
--- a/examples/pxarcmd.rs
+++ b/examples/pxarcmd.rs
@@ -138,14 +138,14 @@ fn add_directory<'a, T: SeqWrite + 'a>(
let meta = Metadata::from(&file_meta);
if file_type.is_dir() {
- let mut dir = encoder.create_directory(file_name, &meta)?;
+ encoder.create_directory(file_name, &meta)?;
add_directory(
- &mut dir,
+ encoder,
std::fs::read_dir(file_path)?,
root_path,
&mut *hardlinks,
)?;
- dir.finish(None)?;
+ encoder.finish(None)?;
} else if file_type.is_symlink() {
todo!("symlink handling");
} else if file_type.is_file() {
diff --git a/src/encoder/aio.rs b/src/encoder/aio.rs
index 8f586b6..1870c7d 100644
--- a/src/encoder/aio.rs
+++ b/src/encoder/aio.rs
@@ -103,25 +103,22 @@ impl<'a, T: SeqWrite + 'a> Encoder<'a, T> {
&mut self,
file_name: P,
metadata: &Metadata,
- ) -> io::Result<Encoder<'_, T>> {
- Ok(Encoder {
- inner: self
- .inner
- .create_directory(file_name.as_ref(), metadata)
- .await?,
- })
+ ) -> io::Result<()> {
+ self.inner
+ .create_directory(file_name.as_ref(), metadata)
+ .await
}
/// Finish this directory. This is mandatory, otherwise the `Drop` handler will `panic!`.
pub async fn finish(
- self,
+ &mut self,
appendix_tail: Option<(AppendixStartOffset, AppendixRefOffset)>,
) -> io::Result<()> {
self.inner.finish(appendix_tail).await
}
/// Add size to encoders position and return new position.
- pub unsafe fn position_add(&mut self, size: u64) -> u64 {
+ pub unsafe fn position_add(&mut self, size: u64) -> io::Result<u64> {
unsafe { self.inner.position_add(size) }
}
@@ -334,11 +331,12 @@ mod test {
.await
.unwrap();
{
- let mut dir = encoder
+ encoder
.create_directory("baba", &Metadata::dir_builder(0o700).build())
.await
.unwrap();
- dir.create_file(&Metadata::file_builder(0o755).build(), "abab", 1024)
+ encoder
+ .create_file(&Metadata::file_builder(0o755).build(), "abab", 1024)
.await
.unwrap();
}
diff --git a/src/encoder/mod.rs b/src/encoder/mod.rs
index 7e18dc8..113dd9f 100644
--- a/src/encoder/mod.rs
+++ b/src/encoder/mod.rs
@@ -241,8 +241,7 @@ where
pub async fn encoded_size(filename: &std::ffi::CStr, metadata: &Metadata) -> io::Result<u64> {
let mut this = EncoderImpl {
output: EncoderOutput::Owned(SeqSink::default()),
- state: EncoderState::default(),
- parent: None,
+ state: vec![EncoderState::default()],
finished: false,
file_copy_buffer: Arc::new(Mutex::new(unsafe {
crate::util::vec_new_uninitialized(1024 * 1024)
@@ -252,7 +251,8 @@ pub async fn encoded_size(filename: &std::ffi::CStr, metadata: &Metadata) -> io:
this.start_file_do(Some(metadata), filename.to_bytes())
.await?;
- Ok(this.position())
+ let state = this.state.last().ok_or_else(|| io_format_err!("encoder state stack underflow"))?;
+ Ok(state.position())
}
/// Error conditions caused by wrong usage of this crate.
@@ -296,6 +296,11 @@ struct EncoderState {
}
impl EncoderState {
+ #[inline]
+ fn position(&self) -> u64 {
+ self.write_position
+ }
+
fn merge_error(&mut self, error: Option<EncodeError>) {
// one error is enough:
if self.encode_error.is_none() {
@@ -313,16 +318,6 @@ pub(crate) enum EncoderOutput<'a, T> {
Borrowed(&'a mut T),
}
-impl<'a, T> EncoderOutput<'a, T> {
- #[inline]
- fn to_borrowed_mut<'s>(&'s mut self) -> EncoderOutput<'s, T>
- where
- 'a: 's,
- {
- EncoderOutput::Borrowed(self.as_mut())
- }
-}
-
impl<'a, T> std::convert::AsMut<T> for EncoderOutput<'a, T> {
fn as_mut(&mut self) -> &mut T {
match self {
@@ -350,8 +345,8 @@ impl<'a, T> std::convert::From<&'a mut T> for EncoderOutput<'a, T> {
/// synchronous or `async` I/O objects in as output.
pub(crate) struct EncoderImpl<'a, T: SeqWrite + 'a> {
output: EncoderOutput<'a, T>,
- state: EncoderState,
- parent: Option<&'a mut EncoderState>,
+ /// EncoderState stack storing the state for each directory level
+ state: Vec<EncoderState>,
finished: bool,
/// Since only the "current" entry can be actively writing files, we share the file copy
@@ -361,21 +356,6 @@ pub(crate) struct EncoderImpl<'a, T: SeqWrite + 'a> {
version: format::FormatVersion,
}
-impl<'a, T: SeqWrite + 'a> Drop for EncoderImpl<'a, T> {
- fn drop(&mut self) {
- if let Some(ref mut parent) = self.parent {
- // propagate errors:
- parent.merge_error(self.state.encode_error);
- if !self.finished {
- parent.add_error(EncodeError::IncompleteDirectory);
- }
- } else if !self.finished {
- // FIXME: how do we deal with this?
- // eprintln!("Encoder dropped without finishing!");
- }
- }
-}
-
impl<'a, T: SeqWrite + 'a> EncoderImpl<'a, T> {
pub async fn new(
output: EncoderOutput<'a, T>,
@@ -387,8 +367,7 @@ impl<'a, T: SeqWrite + 'a> EncoderImpl<'a, T> {
}
let mut this = Self {
output,
- state: EncoderState::default(),
- parent: None,
+ state: vec![EncoderState::default()],
finished: false,
file_copy_buffer: Arc::new(Mutex::new(unsafe {
crate::util::vec_new_uninitialized(1024 * 1024)
@@ -398,19 +377,41 @@ impl<'a, T: SeqWrite + 'a> EncoderImpl<'a, T> {
this.encode_format_version().await?;
this.encode_metadata(metadata).await?;
- this.state.files_offset = this.position();
+ let state = this.state
+ .last_mut()
+ .ok_or_else(|| io_format_err!("encoder state stack underflow"))?;
+ state.file_offset = Some(state.position());
Ok(this)
}
fn check(&self) -> io::Result<()> {
- match self.state.encode_error {
+ if self.finished {
+ io_bail!("unexpected encoder finished state");
+ }
+ let state = self
+ .state
+ .last()
+ .ok_or_else(|| io_format_err!("encoder state stack underflow"))?;
+ match state.encode_error {
Some(EncodeError::IncompleteFile) => io_bail!("incomplete file"),
Some(EncodeError::IncompleteDirectory) => io_bail!("directory not finalized"),
None => Ok(()),
}
}
+ fn state(&self) -> io::Result<&EncoderState> {
+ self.state
+ .last()
+ .ok_or_else(|| io_format_err!("encoder state stack underflow"))
+ }
+
+ fn state_mut(&mut self) -> io::Result<&mut EncoderState> {
+ self.state
+ .last_mut()
+ .ok_or_else(|| io_format_err!("encoder state stack underflow"))
+ }
+
pub async fn create_file<'b>(
&'b mut self,
metadata: &Metadata,
@@ -434,16 +435,18 @@ impl<'a, T: SeqWrite + 'a> EncoderImpl<'a, T> {
'a: 'b,
{
self.check()?;
-
- let file_offset = self.position();
+ let file_offset = self.state()?.position();
self.start_file_do(Some(metadata), file_name).await?;
let header = format::Header::with_content_size(format::PXAR_PAYLOAD, file_size);
header.check_header_size()?;
- seq_write_struct(self.output.as_mut(), header, &mut self.state.write_position).await?;
+ let state = self.state
+ .last_mut()
+ .ok_or_else(|| io_format_err!("encoder state stack underflow"))?;
+ seq_write_struct(self.output.as_mut(), header, &mut state.write_position).await?;
- let payload_data_offset = self.position();
+ let payload_data_offset = state.position();
let meta_size = payload_data_offset - file_offset;
@@ -520,23 +523,29 @@ impl<'a, T: SeqWrite + 'a> EncoderImpl<'a, T> {
}
self.check()?;
- let offset = self.position();
+ let offset = self.state()?.position();
let file_name = file_name.as_os_str().as_bytes();
let mut data = Vec::with_capacity(2 * 8);
data.extend(&appendix_ref_offset.raw().to_le_bytes());
data.extend(&file_size.to_le_bytes());
+
+
+ let state = self
+ .state
+ .last_mut()
+ .ok_or_else(|| io_format_err!("encoder state stack underflow"))?;
seq_write_pxar_entry(
self.output.as_mut(),
format::PXAR_APPENDIX_REF,
&data,
- &mut self.state.write_position,
+ &mut state.write_position,
)
.await?;
- let end_offset = self.position();
+ let end_offset = state.write_position;
- self.state.items.push(GoodbyeItem {
+ state.items.push(GoodbyeItem {
hash: format::hash_filename(file_name),
offset: offset,
size: end_offset - offset,
@@ -559,17 +568,19 @@ impl<'a, T: SeqWrite + 'a> EncoderImpl<'a, T> {
self.check()?;
let data = &full_size.raw().to_le_bytes().to_vec();
+ let state = self
+ .state
+ .last_mut()
+ .ok_or_else(|| io_format_err!("encoder state stack underflow"))?;
seq_write_pxar_entry(
self.output.as_mut(),
format::PXAR_APPENDIX,
&data,
- &mut self.state.write_position,
+ &mut state.write_position,
)
.await?;
- let offset = self.position();
-
- Ok(AppendixStartOffset(offset))
+ Ok(AppendixStartOffset(state.write_position))
}
/// Return a file offset usable with `add_hardlink`.
@@ -600,7 +611,7 @@ impl<'a, T: SeqWrite + 'a> EncoderImpl<'a, T> {
target: &Path,
target_offset: LinkOffset,
) -> io::Result<()> {
- let current_offset = self.position();
+ let current_offset = self.state()?.position();
if current_offset <= target_offset.0 {
io_bail!("invalid hardlink offset, can only point to prior files");
}
@@ -673,25 +684,28 @@ impl<'a, T: SeqWrite + 'a> EncoderImpl<'a, T> {
entry_htype_data: Option<(u64, &[u8])>,
) -> io::Result<LinkOffset> {
self.check()?;
-
- let file_offset = self.position();
+ let file_offset = self.state()?.position();
let file_name = file_name.as_os_str().as_bytes();
self.start_file_do(metadata, file_name).await?;
+ let state = self
+ .state
+ .last_mut()
+ .ok_or_else(|| io_format_err!("encoder state stack underflow"))?;
if let Some((htype, entry_data)) = entry_htype_data {
seq_write_pxar_entry(
self.output.as_mut(),
htype,
entry_data,
- &mut self.state.write_position,
+ &mut state.write_position,
)
.await?;
}
- let end_offset = self.position();
+ let end_offset = state.position();
- self.state.items.push(GoodbyeItem {
+ state.items.push(GoodbyeItem {
hash: format::hash_filename(file_name),
offset: file_offset,
size: end_offset - file_offset,
@@ -701,21 +715,17 @@ impl<'a, T: SeqWrite + 'a> EncoderImpl<'a, T> {
}
#[inline]
- fn position(&mut self) -> u64 {
- self.state.write_position
- }
-
- #[inline]
- pub unsafe fn position_add(&mut self, size: u64) -> u64 {
- self.state.write_position += size;
- self.state.write_position
+ pub unsafe fn position_add(&mut self, size: u64) -> io::Result<u64> {
+ let state = self.state_mut()?;
+ state.write_position += size;
+ Ok(state.write_position)
}
pub async fn create_directory(
&mut self,
file_name: &Path,
metadata: &Metadata,
- ) -> io::Result<EncoderImpl<'_, T>> {
+ ) -> io::Result<()> {
self.check()?;
if !metadata.is_dir() {
@@ -725,35 +735,28 @@ impl<'a, T: SeqWrite + 'a> EncoderImpl<'a, T> {
let file_name = file_name.as_os_str().as_bytes();
let file_hash = format::hash_filename(file_name);
- let file_offset = self.position();
+ let file_offset = self.state()?.position();
self.encode_filename(file_name).await?;
- let entry_offset = self.position();
+ let entry_offset = self.state()?.position();
self.encode_metadata(metadata).await?;
- let files_offset = self.position();
+ let state = self.state_mut()?;
+ let files_offset = state.position();
// the child will write to OUR state now:
- let write_position = self.position();
-
- let file_copy_buffer = Arc::clone(&self.file_copy_buffer);
-
- Ok(EncoderImpl {
- // always forward as Borrowed(), to avoid stacking references on nested calls
- output: self.output.to_borrowed_mut(),
- state: EncoderState {
- entry_offset,
- files_offset,
- file_offset: Some(file_offset),
- file_hash,
- write_position,
- ..Default::default()
- },
- parent: Some(&mut self.state),
- finished: false,
- file_copy_buffer,
- version: self.version.clone(),
- })
+ let write_position = state.position();
+
+ self.state.push(EncoderState {
+ entry_offset,
+ files_offset,
+ file_offset: Some(file_offset),
+ file_hash,
+ write_position,
+ ..Default::default()
+ });
+
+ Ok(())
}
async fn start_file_do(
@@ -769,30 +772,38 @@ impl<'a, T: SeqWrite + 'a> EncoderImpl<'a, T> {
}
async fn encode_format_version(&mut self) -> io::Result<()> {
- if self.state.write_position != 0 {
- io_bail!("pxar format version must be encoded at the beginning of an archive");
- }
+ let state = self
+ .state
+ .last_mut()
+ .ok_or_else(|| io_format_err!("encoder state stack underflow"))?;
+ if state.write_position != 0 {
+ io_bail!("pxar format version must be encoded at the beginning of an archive");
+ }
- let version_bytes = match self.version {
- format::FormatVersion::Version1 => return Ok(()),
- format::FormatVersion::Version2 => 2u64.to_le_bytes(),
- };
+ let version_bytes = match self.version {
+ format::FormatVersion::Version1 => return Ok(()),
+ format::FormatVersion::Version2 => 2u64.to_le_bytes(),
+ };
seq_write_pxar_entry(
self.output.as_mut(),
format::PXAR_FORMAT_VERSION,
&version_bytes,
- &mut self.state.write_position,
+ &mut state.write_position,
)
.await
}
async fn encode_metadata(&mut self, metadata: &Metadata) -> io::Result<()> {
+ let state = self
+ .state
+ .last_mut()
+ .ok_or_else(|| io_format_err!("encoder state stack underflow"))?;
seq_write_pxar_struct_entry(
self.output.as_mut(),
format::PXAR_ENTRY,
metadata.stat.clone(),
- &mut self.state.write_position,
+ &mut state.write_position,
)
.await?;
@@ -814,22 +825,30 @@ impl<'a, T: SeqWrite + 'a> EncoderImpl<'a, T> {
}
async fn write_xattr(&mut self, xattr: &format::XAttr) -> io::Result<()> {
+ let state = self
+ .state
+ .last_mut()
+ .ok_or_else(|| io_format_err!("encoder state stack underflow"))?;
seq_write_pxar_entry(
self.output.as_mut(),
format::PXAR_XATTR,
&xattr.data,
- &mut self.state.write_position,
+ &mut state.write_position,
)
.await
}
async fn write_acls(&mut self, acl: &crate::Acl) -> io::Result<()> {
+ let state = self
+ .state
+ .last_mut()
+ .ok_or_else(|| io_format_err!("encoder state stack underflow"))?;
for acl in &acl.users {
seq_write_pxar_struct_entry(
self.output.as_mut(),
format::PXAR_ACL_USER,
acl.clone(),
- &mut self.state.write_position,
+ &mut state.write_position,
)
.await?;
}
@@ -839,7 +858,7 @@ impl<'a, T: SeqWrite + 'a> EncoderImpl<'a, T> {
self.output.as_mut(),
format::PXAR_ACL_GROUP,
acl.clone(),
- &mut self.state.write_position,
+ &mut state.write_position,
)
.await?;
}
@@ -849,7 +868,7 @@ impl<'a, T: SeqWrite + 'a> EncoderImpl<'a, T> {
self.output.as_mut(),
format::PXAR_ACL_GROUP_OBJ,
acl.clone(),
- &mut self.state.write_position,
+ &mut state.write_position,
)
.await?;
}
@@ -859,7 +878,7 @@ impl<'a, T: SeqWrite + 'a> EncoderImpl<'a, T> {
self.output.as_mut(),
format::PXAR_ACL_DEFAULT,
acl.clone(),
- &mut self.state.write_position,
+ &mut state.write_position,
)
.await?;
}
@@ -869,7 +888,7 @@ impl<'a, T: SeqWrite + 'a> EncoderImpl<'a, T> {
self.output.as_mut(),
format::PXAR_ACL_DEFAULT_USER,
acl.clone(),
- &mut self.state.write_position,
+ &mut state.write_position,
)
.await?;
}
@@ -879,7 +898,7 @@ impl<'a, T: SeqWrite + 'a> EncoderImpl<'a, T> {
self.output.as_mut(),
format::PXAR_ACL_DEFAULT_GROUP,
acl.clone(),
- &mut self.state.write_position,
+ &mut state.write_position,
)
.await?;
}
@@ -888,11 +907,15 @@ impl<'a, T: SeqWrite + 'a> EncoderImpl<'a, T> {
}
async fn write_file_capabilities(&mut self, fcaps: &format::FCaps) -> io::Result<()> {
+ let state = self
+ .state
+ .last_mut()
+ .ok_or_else(|| io_format_err!("encoder state stack underflow"))?;
seq_write_pxar_entry(
self.output.as_mut(),
format::PXAR_FCAPS,
&fcaps.data,
- &mut self.state.write_position,
+ &mut state.write_position,
)
.await
}
@@ -901,36 +924,49 @@ impl<'a, T: SeqWrite + 'a> EncoderImpl<'a, T> {
&mut self,
quota_project_id: &format::QuotaProjectId,
) -> io::Result<()> {
+ let state = self
+ .state
+ .last_mut()
+ .ok_or_else(|| io_format_err!("encoder state stack underflow"))?;
seq_write_pxar_struct_entry(
self.output.as_mut(),
format::PXAR_QUOTA_PROJID,
*quota_project_id,
- &mut self.state.write_position,
+ &mut state.write_position,
)
.await
}
async fn encode_filename(&mut self, file_name: &[u8]) -> io::Result<()> {
crate::util::validate_filename(file_name)?;
+ let state = self
+ .state
+ .last_mut()
+ .ok_or_else(|| io_format_err!("encoder state stack underflow"))?;
seq_write_pxar_entry_zero(
self.output.as_mut(),
format::PXAR_FILENAME,
file_name,
- &mut self.state.write_position,
+ &mut state.write_position,
)
.await
}
pub async fn finish(
- mut self,
+ &mut self,
appendix_tail: Option<(AppendixStartOffset, AppendixRefOffset)>,
) -> io::Result<()> {
let tail_bytes = self.finish_goodbye_table().await?;
+
+ let mut state = self
+ .state
+ .pop()
+ .ok_or_else(|| io_format_err!("encoder state stack underflow"))?;
seq_write_pxar_entry(
self.output.as_mut(),
format::PXAR_GOODBYE,
&tail_bytes,
- &mut self.state.write_position,
+ &mut state.write_position,
)
.await?;
@@ -943,7 +979,7 @@ impl<'a, T: SeqWrite + 'a> EncoderImpl<'a, T> {
self.output.as_mut(),
format::PXAR_GOODBYE,
&appendix_tail,
- &mut self.state.write_position,
+ &mut state.write_position,
)
.await?;
}
@@ -952,32 +988,33 @@ impl<'a, T: SeqWrite + 'a> EncoderImpl<'a, T> {
flush(output).await?;
}
- // done up here because of the self-borrow and to propagate
- let end_offset = self.position();
+ let end_offset = state.write_position;
- if let Some(parent) = &mut self.parent {
+ if let Some(parent) = self.state.last_mut() {
parent.write_position = end_offset;
- let file_offset = self
- .state
- .file_offset
- .expect("internal error: parent set but no file_offset?");
+ let file_offset = state.file_offset.expect("internal error: no file offset?");
parent.items.push(GoodbyeItem {
- hash: self.state.file_hash,
+ hash: state.file_hash,
offset: file_offset,
size: end_offset - file_offset,
});
+ // propagate errors:
+ parent.merge_error(state.encode_error);
+ } else {
+ self.finished = true;
}
- self.finished = true;
+
Ok(())
}
async fn finish_goodbye_table(&mut self) -> io::Result<Vec<u8>> {
- let goodbye_offset = self.position();
+ let state = self.state_mut()?;
+ let goodbye_offset = state.position();
// "take" out the tail (to not leave an array of endian-swapped structs in `self`)
- let mut tail = take(&mut self.state.items);
+ let mut tail = take(&mut state.items);
let tail_size = (tail.len() + 1) * size_of::<GoodbyeItem>();
let goodbye_size = tail_size as u64 + size_of::<format::Header>() as u64;
@@ -1002,7 +1039,7 @@ impl<'a, T: SeqWrite + 'a> EncoderImpl<'a, T> {
bst.push(
GoodbyeItem {
hash: format::PXAR_GOODBYE_TAIL_MARKER,
- offset: goodbye_offset - self.state.entry_offset,
+ offset: goodbye_offset - state.entry_offset,
size: goodbye_size,
}
.to_le(),
@@ -1029,18 +1066,26 @@ pub(crate) struct FileImpl<'a, S: SeqWrite> {
/// exactly zero.
remaining_size: u64,
+ ///FIXME: Update comments
/// The directory containing this file. This is where we propagate the `IncompleteFile` error
/// to, and where we insert our `GoodbyeItem`.
- parent: &'a mut EncoderState,
+ parent: &'a mut Vec<EncoderState>,
}
impl<'a, S: SeqWrite> Drop for FileImpl<'a, S> {
fn drop(&mut self) {
if self.remaining_size != 0 {
- self.parent.add_error(EncodeError::IncompleteFile);
+ self.parent
+ .last_mut()
+ .unwrap()
+ .add_error(EncodeError::IncompleteFile);
}
- self.parent.items.push(self.goodbye_item.clone());
+ self.parent
+ .last_mut()
+ .unwrap()
+ .items
+ .push(self.goodbye_item.clone());
}
}
@@ -1071,7 +1116,11 @@ impl<'a, S: SeqWrite> FileImpl<'a, S> {
match output.poll_seq_write(cx, data) {
Poll::Ready(Ok(put)) => {
this.remaining_size -= put as u64;
- this.parent.write_position += put as u64;
+ let mut parent = this
+ .parent
+ .last_mut()
+ .ok_or_else(|| io_format_err!("encoder state stack underflow"))?;
+ parent.write_position += put as u64;
Poll::Ready(Ok(put))
}
other => other,
@@ -1105,14 +1154,22 @@ impl<'a, S: SeqWrite> FileImpl<'a, S> {
.await?;
//let put = seq_write(self.output.as_mut().unwrap(), data).await?;
self.remaining_size -= put as u64;
- self.parent.write_position += put as u64;
+ let parent = self
+ .parent
+ .last_mut()
+ .ok_or_else(|| io_format_err!("encoder state stack underflow"))?;
+ parent.write_position += put as u64;
Ok(put)
}
/// Completely write file data for the current file entry in a pxar archive.
pub async fn write_all(&mut self, data: &[u8]) -> io::Result<()> {
self.check_remaining(data.len())?;
- seq_write_all(self.output, data, &mut self.parent.write_position).await?;
+ let parent = self
+ .parent
+ .last_mut()
+ .ok_or_else(|| io_format_err!("encoder state stack underflow"))?;
+ seq_write_all(self.output, data, &mut parent.write_position).await?;
self.remaining_size -= data.len() as u64;
Ok(())
}
diff --git a/src/encoder/sync.rs b/src/encoder/sync.rs
index 8516662..fc338b2 100644
--- a/src/encoder/sync.rs
+++ b/src/encoder/sync.rs
@@ -28,7 +28,11 @@ impl<'a, T: io::Write + 'a> Encoder<'a, StandardWriter<T>> {
/// Encode a `pxar` archive into a regular `std::io::Write` output.
#[inline]
pub fn from_std(output: T, metadata: &Metadata) -> io::Result<Encoder<'a, StandardWriter<T>>> {
- Encoder::new(StandardWriter::new(output), metadata, format::FormatVersion::default())
+ Encoder::new(
+ StandardWriter::new(output),
+ metadata,
+ format::FormatVersion::default(),
+ )
}
}
@@ -100,22 +104,20 @@ impl<'a, T: SeqWrite + 'a> Encoder<'a, T> {
&mut self,
file_name: P,
metadata: &Metadata,
- ) -> io::Result<Encoder<'_, T>> {
- Ok(Encoder {
- inner: poll_result_once(self.inner.create_directory(file_name.as_ref(), metadata))?,
- })
+ ) -> io::Result<()> {
+ poll_result_once(self.inner.create_directory(file_name.as_ref(), metadata))
}
/// Finish this directory. This is mandatory, otherwise the `Drop` handler will `panic!`.
pub fn finish(
- self,
+ &mut self,
appendix_tail: Option<(AppendixStartOffset, AppendixRefOffset)>,
) -> io::Result<()> {
poll_result_once(self.inner.finish(appendix_tail))
}
/// Add size to encoders position and return new position.
- pub unsafe fn position_add(&mut self, size: u64) -> u64 {
+ pub unsafe fn position_add(&mut self, size: u64) -> io::Result<u64> {
unsafe { self.inner.position_add(size) }
}
--
2.39.2
* [pbs-devel] [PATCH v6 proxmox-backup 10/29] fix #3174: index: add fn index list from start/end-offsets
2024-01-25 13:25 [pbs-devel] [PATCH-SERIES v6 pxar proxmox-backup proxmox-widget-toolkit 0/29] fix #3174: improve file-level backup Christian Ebner
` (8 preceding siblings ...)
2024-01-25 13:25 ` [pbs-devel] [PATCH v6 pxar 9/29] fix #3174: enc: move from instance per dir to state stack Christian Ebner
@ 2024-01-25 13:25 ` Christian Ebner
2024-01-25 13:25 ` [pbs-devel] [PATCH v6 proxmox-backup 11/29] fix #3174: api: double catalog upload size Christian Ebner
` (18 subsequent siblings)
28 siblings, 0 replies; 30+ messages in thread
From: Christian Ebner @ 2024-01-25 13:25 UTC (permalink / raw)
To: pbs-devel
Adds a function to get a list of `AppendableDynamicEntry` items from a
chunk index for given start and end offsets, which should be contained
within these chunks.
This is needed in order to reference file payloads and reuse the
chunks containing them from a previous backup run.
The returned index entries are normalized, meaning each entry's size()
equals the size of the corresponding chunk rather than a cumulative end
offset.
In addition to the list of index entries, the paddings from the start
of the first chunk to the requested start offset and from the requested
end offset to the end of the last chunk are returned. These are needed
for the calculation of the appendix section offsets during decoding.
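As a sketch, the returned values satisfy the following invariant (the
offsets are made up):
    // The listed chunks cover padding_start bytes in front of the requested
    // range, the requested byte range itself, and padding_end bytes after it:
    let (entries, padding_start, padding_end) = index.indices(1000, 4096)?;
    let covered: u64 = entries.iter().map(|e| e.size()).sum();
    assert_eq!(covered, padding_start + (4096 - 1000) + padding_end);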
Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
Changes since v5:
- Introduce dedicated `AppendableDynamicEntry` to distinguish from
regular `DynamicEntry` objects. Refactor accordingly.
pbs-datastore/src/dynamic_index.rs | 55 ++++++++++++++++++++++++++++++
1 file changed, 55 insertions(+)
diff --git a/pbs-datastore/src/dynamic_index.rs b/pbs-datastore/src/dynamic_index.rs
index 71a5082e..a4d8ceec 100644
--- a/pbs-datastore/src/dynamic_index.rs
+++ b/pbs-datastore/src/dynamic_index.rs
@@ -74,6 +74,26 @@ impl DynamicEntry {
}
}
+/// Dynamic Entry appendable as pxar Appendix entry
+#[derive(Clone, Debug)]
+#[repr(C)]
+pub struct AppendableDynamicEntry {
+ size_le: u64,
+ digest: [u8; 32],
+}
+
+impl AppendableDynamicEntry {
+ #[inline]
+ pub fn size(&self) -> u64 {
+ u64::from_le(self.size_le)
+ }
+
+ #[inline]
+ pub fn digest(&self) -> [u8; 32] {
+ self.digest.clone()
+ }
+}
+
pub struct DynamicIndexReader {
_file: File,
pub size: usize,
@@ -188,6 +208,41 @@ impl DynamicIndexReader {
self.binary_search(middle_idx + 1, middle_end, end_idx, end, offset)
}
}
+
+ /// List of chunk indices containing the data from start_offset to end_offset
+ pub fn indices(
+ &self,
+ start_offset: u64,
+ end_offset: u64,
+ ) -> Result<(Vec<AppendableDynamicEntry>, u64, u64), Error> {
+ let end_idx = self.index.len() - 1;
+ let chunk_end = self.chunk_end(end_idx);
+ let start = self.binary_search(0, 0, end_idx, chunk_end, start_offset)?;
+ let end = self.binary_search(0, 0, end_idx, chunk_end, end_offset)?;
+
+ let offset_first = if start == 0 {
+ 0
+ } else {
+ self.index[start - 1].end()
+ };
+
+ let padding_start = start_offset - offset_first;
+ let padding_end = self.index[end].end() - end_offset;
+
+ let mut indices = Vec::new();
+ let mut prev_end = offset_first;
+ for dynamic_entry in &self.index[start..end + 1] {
+ let size = dynamic_entry.end() - prev_end;
+ let appendable_dynamic_entry = AppendableDynamicEntry {
+ size_le: size.to_le(),
+ digest: dynamic_entry.digest.clone(),
+ };
+ prev_end += size;
+ indices.push(appendable_dynamic_entry);
+ }
+
+ Ok((indices, padding_start, padding_end))
+ }
}
impl IndexFile for DynamicIndexReader {
--
2.39.2
* [pbs-devel] [PATCH v6 proxmox-backup 11/29] fix #3174: api: double catalog upload size
2024-01-25 13:25 [pbs-devel] [PATCH-SERIES v6 pxar proxmox-backup proxmox-widget-toolkit 0/29] fix #3174: improve file-level backup Christian Ebner
` (9 preceding siblings ...)
2024-01-25 13:25 ` [pbs-devel] [PATCH v6 proxmox-backup 10/29] fix #3174: index: add fn index list from start/end-offsets Christian Ebner
@ 2024-01-25 13:25 ` Christian Ebner
2024-01-25 13:25 ` [pbs-devel] [PATCH v6 proxmox-backup 12/29] fix #3174: catalog: introduce extended format v2 Christian Ebner
` (17 subsequent siblings)
28 siblings, 0 replies; 30+ messages in thread
From: Christian Ebner @ 2024-01-25 13:25 UTC (permalink / raw)
To: pbs-devel
Double the catalog upload size limit, as the catalog needs to contain
the additional archive offset information required for referencing file
payloads in an appendix when the catalog is used to perform backups
without re-reading file payloads.
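For illustration, the resulting schema maxima are:
    // Raw chunk data may now be up to 32 MiB, the encoded size additionally
    // accounting for the encrypted blob header (values as in the schema below):
    let max_chunk_size: usize = 1024 * 1024 * 32;
    let max_encoded_size = max_chunk_size + std::mem::size_of::<EncryptedDataBlobHeader>();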
Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
Changes since v5:
- no changes
src/api2/backup/upload_chunk.rs | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/src/api2/backup/upload_chunk.rs b/src/api2/backup/upload_chunk.rs
index 20259660..84485bfd 100644
--- a/src/api2/backup/upload_chunk.rs
+++ b/src/api2/backup/upload_chunk.rs
@@ -126,7 +126,7 @@ pub const API_METHOD_UPLOAD_FIXED_CHUNK: ApiMethod = ApiMethod::new(
false,
&IntegerSchema::new("Chunk size.")
.minimum(1)
- .maximum(1024 * 1024 * 16)
+ .maximum(1024 * 1024 * 32)
.schema()
),
(
@@ -135,7 +135,7 @@ pub const API_METHOD_UPLOAD_FIXED_CHUNK: ApiMethod = ApiMethod::new(
&IntegerSchema::new("Encoded chunk size.")
.minimum((std::mem::size_of::<DataBlobHeader>() as isize) + 1)
.maximum(
- 1024 * 1024 * 16
+ 1024 * 1024 * 32
+ (std::mem::size_of::<EncryptedDataBlobHeader>() as isize)
)
.schema()
--
2.39.2
* [pbs-devel] [PATCH v6 proxmox-backup 12/29] fix #3174: catalog: introduce extended format v2
2024-01-25 13:25 [pbs-devel] [PATCH-SERIES v6 pxar proxmox-backup proxmox-widget-toolkit 0/29] fix #3174: improve file-level backup Christian Ebner
` (10 preceding siblings ...)
2024-01-25 13:25 ` [pbs-devel] [PATCH v6 proxmox-backup 11/29] fix #3174: api: double catalog upload size Christian Ebner
@ 2024-01-25 13:25 ` Christian Ebner
2024-01-25 13:25 ` [pbs-devel] [PATCH v6 proxmox-backup 13/29] fix #3174: archiver/extractor: impl appendix ref Christian Ebner
` (16 subsequent siblings)
28 siblings, 0 replies; 30+ messages in thread
From: Christian Ebner @ 2024-01-25 13:25 UTC (permalink / raw)
To: pbs-devel
Increments the catalog file format to version 2. The new catalog
format introduces an extension to file entries in order to store the
additional ctime and pxar archive file offsets, needed for
metadata-based file change detection and pxar file entry reuse in the
pxar appendix section.
ctime is introduced in order to also detect file status changes since
the last backup run, e.g. caused by updated extended attributes, which
do not increment mtime.
Inclusion of the pxar archive file offset allows calculating both the
file entry size in the archive, needed for re-indexing the chunks
containing this entry, and the offset relative to the appendix start in
the pxar archive, needed to calculate the bytes to skip over during
sequential decoding while restoring the pxar archive.
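The on-disk layout of a version 2 file entry, as written by the updated
DirInfo encoder below, can be sketched as:
    // Catalog v2 file entry (integers use the catalog's variable-length
    // encoding via catalog_encode_u64/catalog_encode_i64):
    //   b'f'         entry type
    //   name_len     u64
    //   name         name_len bytes
    //   size         u64
    //   mtime        i64
    //   ctime        i64   <- new in v2
    //   file_offset  u64   <- new in v2
Readers distinguish the two layouts via the catalog file magic and fall
back to the version 1 layout (without ctime and file_offset) for old
catalogs.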
Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
Changes since v5:
- no changes
pbs-client/src/pxar/create.rs | 31 +-
pbs-datastore/src/catalog.rs | 323 ++++++++++++++----
pbs-datastore/src/file_formats.rs | 3 +
.../src/proxmox_restore_daemon/api.rs | 6 +-
4 files changed, 282 insertions(+), 81 deletions(-)
diff --git a/pbs-client/src/pxar/create.rs b/pbs-client/src/pxar/create.rs
index e7053d9e..a2338218 100644
--- a/pbs-client/src/pxar/create.rs
+++ b/pbs-client/src/pxar/create.rs
@@ -390,12 +390,6 @@ impl Archiver {
patterns_count: usize,
) -> Result<(), Error> {
let content = generate_pxar_excludes_cli(&self.patterns[..patterns_count]);
- if let Some(ref catalog) = self.catalog {
- catalog
- .lock()
- .unwrap()
- .add_file(file_name, content.len() as u64, 0)?;
- }
let mut metadata = Metadata::default();
metadata.stat.mode = pxar::format::mode::IFREG | 0o600;
@@ -405,6 +399,14 @@ impl Archiver {
.await?;
file.write_all(&content).await?;
+ if let Some(ref catalog) = self.catalog {
+ let link_offset = file.file_offset();
+ catalog
+ .lock()
+ .unwrap()
+ .add_file(file_name, content.len() as u64, 0, 0, link_offset)?;
+ }
+
Ok(())
}
@@ -572,17 +574,20 @@ impl Archiver {
}
let file_size = stat.st_size as u64;
- if let Some(ref catalog) = self.catalog {
- catalog
- .lock()
- .unwrap()
- .add_file(c_file_name, file_size, stat.st_mtime)?;
- }
-
let offset: LinkOffset = self
.add_regular_file(encoder, fd, file_name, &metadata, file_size)
.await?;
+ if let Some(ref catalog) = self.catalog {
+ catalog.lock().unwrap().add_file(
+ c_file_name,
+ file_size,
+ stat.st_mtime,
+ stat.st_ctime,
+ offset,
+ )?;
+ }
+
if stat.st_nlink > 1 {
self.hardlinks
.insert(link_info, (self.path.clone(), offset));
diff --git a/pbs-datastore/src/catalog.rs b/pbs-datastore/src/catalog.rs
index eb531837..746b493b 100644
--- a/pbs-datastore/src/catalog.rs
+++ b/pbs-datastore/src/catalog.rs
@@ -11,7 +11,7 @@ use pathpatterns::{MatchList, MatchType};
use proxmox_io::ReadExt;
use proxmox_schema::api;
-use crate::file_formats::PROXMOX_CATALOG_FILE_MAGIC_1_0;
+use crate::file_formats::{PROXMOX_CATALOG_FILE_MAGIC_1_0, PROXMOX_CATALOG_FILE_MAGIC_2_0};
/// Trait for writing file list catalogs.
///
@@ -20,7 +20,14 @@ use crate::file_formats::PROXMOX_CATALOG_FILE_MAGIC_1_0;
pub trait BackupCatalogWriter {
fn start_directory(&mut self, name: &CStr) -> Result<(), Error>;
fn end_directory(&mut self) -> Result<(), Error>;
- fn add_file(&mut self, name: &CStr, size: u64, mtime: i64) -> Result<(), Error>;
+ fn add_file(
+ &mut self,
+ name: &CStr,
+ size: u64,
+ mtime: i64,
+ ctime: i64,
+ file_offset: pxar::encoder::LinkOffset,
+ ) -> Result<(), Error>;
fn add_symlink(&mut self, name: &CStr) -> Result<(), Error>;
fn add_hardlink(&mut self, name: &CStr) -> Result<(), Error>;
fn add_block_device(&mut self, name: &CStr) -> Result<(), Error>;
@@ -81,6 +88,21 @@ impl fmt::Display for CatalogEntryType {
}
}
+#[derive(Clone, Copy, Debug, Default, Eq, PartialEq, Ord, PartialOrd)]
+pub struct FileOffset {
+ offset: u64,
+}
+
+impl FileOffset {
+ pub fn raw(&self) -> u64 {
+ self.offset
+ }
+}
+#[derive(Clone, Copy, Debug, Eq, PartialEq, Ord, PartialOrd)]
+pub enum Offset {
+ FileOffset { offset: u64 },
+}
+
/// Represents a named directory entry
///
/// The ``attr`` property contain the exact type with type specific
@@ -91,11 +113,23 @@ pub struct DirEntry {
pub attr: DirEntryAttribute,
}
+#[derive(Clone, Debug, PartialEq, Eq)]
+pub struct CatalogV2Extension {
+ pub ctime: i64,
+ pub file_offset: FileOffset,
+}
+
/// Used to specific additional attributes inside DirEntry
#[derive(Clone, Debug, PartialEq, Eq)]
pub enum DirEntryAttribute {
- Directory { start: u64 },
- File { size: u64, mtime: i64 },
+ Directory {
+ start: u64,
+ },
+ File {
+ size: u64,
+ mtime: i64,
+ extension: Option<CatalogV2Extension>,
+ },
Symlink,
Hardlink,
BlockDevice,
@@ -105,40 +139,63 @@ pub enum DirEntryAttribute {
}
impl DirEntry {
- fn new(etype: CatalogEntryType, name: Vec<u8>, start: u64, size: u64, mtime: i64) -> Self {
- match etype {
- CatalogEntryType::Directory => DirEntry {
+ fn new(
+ etype: CatalogEntryType,
+ name: Vec<u8>,
+ start: u64,
+ size: u64,
+ mtime: i64,
+ ctime: i64,
+ offset: Option<Offset>,
+ ) -> Self {
+ match (etype, offset) {
+ (CatalogEntryType::Directory, None) => DirEntry {
name,
attr: DirEntryAttribute::Directory { start },
},
- CatalogEntryType::File => DirEntry {
- name,
- attr: DirEntryAttribute::File { size, mtime },
- },
- CatalogEntryType::Symlink => DirEntry {
+ (CatalogEntryType::File, offset) => {
+ let extension = if let Some(Offset::FileOffset { offset }) = offset {
+ Some(CatalogV2Extension {
+ ctime,
+ file_offset: FileOffset { offset },
+ })
+ } else {
+ None
+ };
+ DirEntry {
+ name,
+ attr: DirEntryAttribute::File {
+ size,
+ mtime,
+ extension,
+ },
+ }
+ }
+ (CatalogEntryType::Symlink, None) => DirEntry {
name,
attr: DirEntryAttribute::Symlink,
},
- CatalogEntryType::Hardlink => DirEntry {
+ (CatalogEntryType::Hardlink, None) => DirEntry {
name,
attr: DirEntryAttribute::Hardlink,
},
- CatalogEntryType::BlockDevice => DirEntry {
+ (CatalogEntryType::BlockDevice, None) => DirEntry {
name,
attr: DirEntryAttribute::BlockDevice,
},
- CatalogEntryType::CharDevice => DirEntry {
+ (CatalogEntryType::CharDevice, None) => DirEntry {
name,
attr: DirEntryAttribute::CharDevice,
},
- CatalogEntryType::Fifo => DirEntry {
+ (CatalogEntryType::Fifo, None) => DirEntry {
name,
attr: DirEntryAttribute::Fifo,
},
- CatalogEntryType::Socket => DirEntry {
+ (CatalogEntryType::Socket, None) => DirEntry {
name,
attr: DirEntryAttribute::Socket,
},
+ _ => panic!("unexpected parameters '{etype}' and '{offset:?}'"),
}
}
@@ -197,13 +254,22 @@ impl DirInfo {
}
DirEntry {
name,
- attr: DirEntryAttribute::File { size, mtime },
+ attr:
+ DirEntryAttribute::File {
+ size,
+ mtime,
+ extension,
+ },
} => {
writer.write_all(&[CatalogEntryType::File as u8])?;
catalog_encode_u64(writer, name.len() as u64)?;
writer.write_all(name)?;
catalog_encode_u64(writer, *size)?;
catalog_encode_i64(writer, *mtime)?;
+ if let Some(CatalogV2Extension { ctime, file_offset }) = extension {
+ catalog_encode_i64(writer, *ctime)?;
+ catalog_encode_u64(writer, file_offset.raw())?;
+ }
}
DirEntry {
name,
@@ -271,8 +337,11 @@ impl DirInfo {
Ok((self.name, data))
}
- fn parse<C: FnMut(CatalogEntryType, &[u8], u64, u64, i64) -> Result<bool, Error>>(
+ fn parse<
+ C: FnMut(CatalogEntryType, &[u8], u64, u64, i64, i64, Option<Offset>) -> Result<bool, Error>,
+ >(
data: &[u8],
+ catalog_version: Option<[u8; 8]>,
mut callback: C,
) -> Result<(), Error> {
let mut cursor = data;
@@ -300,14 +369,28 @@ impl DirInfo {
let cont = match etype {
CatalogEntryType::Directory => {
let offset = catalog_decode_u64(&mut cursor)?;
- callback(etype, name, offset, 0, 0)?
+ callback(etype, name, offset, 0, 0, 0, None)?
}
CatalogEntryType::File => {
let size = catalog_decode_u64(&mut cursor)?;
let mtime = catalog_decode_i64(&mut cursor)?;
- callback(etype, name, 0, size, mtime)?
+ let (ctime, offset) = if let Some(version) = catalog_version {
+ let mut ctime = 0;
+ let mut offset = None;
+ if version == PROXMOX_CATALOG_FILE_MAGIC_2_0 {
+ ctime = catalog_decode_i64(&mut cursor)?;
+ let file_offset = catalog_decode_u64(&mut cursor)?;
+ offset = Some(Offset::FileOffset {
+ offset: file_offset,
+ })
+ }
+ (ctime, offset)
+ } else {
+ (0, None)
+ };
+ callback(etype, name, 0, size, mtime, ctime, offset)?
}
- _ => callback(etype, name, 0, 0, 0)?,
+ _ => callback(etype, name, 0, 0, 0, 0, None)?,
};
if !cont {
return Ok(());
@@ -342,7 +425,7 @@ impl<W: Write> CatalogWriter<W> {
dirstack: vec![DirInfo::new_rootdir()],
pos: 0,
};
- me.write_all(&PROXMOX_CATALOG_FILE_MAGIC_1_0)?;
+ me.write_all(&PROXMOX_CATALOG_FILE_MAGIC_2_0)?;
Ok(me)
}
@@ -407,15 +490,29 @@ impl<W: Write> BackupCatalogWriter for CatalogWriter<W> {
Ok(())
}
- fn add_file(&mut self, name: &CStr, size: u64, mtime: i64) -> Result<(), Error> {
+ fn add_file(
+ &mut self,
+ name: &CStr,
+ size: u64,
+ mtime: i64,
+ ctime: i64,
+ file_offset: pxar::encoder::LinkOffset,
+ ) -> Result<(), Error> {
let dir = self
.dirstack
.last_mut()
.ok_or_else(|| format_err!("outside root"))?;
let name = name.to_bytes().to_vec();
+ let file_offset = FileOffset {
+ offset: file_offset.raw(),
+ };
dir.entries.push(DirEntry {
name,
- attr: DirEntryAttribute::File { size, mtime },
+ attr: DirEntryAttribute::File {
+ size,
+ mtime,
+ extension: Some(CatalogV2Extension { ctime, file_offset }),
+ },
});
Ok(())
}
@@ -502,12 +599,16 @@ impl<W: Write> BackupCatalogWriter for CatalogWriter<W> {
/// Read Catalog files
pub struct CatalogReader<R> {
reader: R,
+ magic: Option<[u8; 8]>,
}
impl<R: Read + Seek> CatalogReader<R> {
/// Create a new CatalogReader instance
pub fn new(reader: R) -> Self {
- Self { reader }
+ Self {
+ reader,
+ magic: None,
+ }
}
/// Print whole catalog to stdout
@@ -528,8 +629,11 @@ impl<R: Read + Seek> CatalogReader<R> {
self.reader.seek(SeekFrom::Start(0))?;
let mut magic = [0u8; 8];
self.reader.read_exact(&mut magic)?;
- if magic != PROXMOX_CATALOG_FILE_MAGIC_1_0 {
- bail!("got unexpected magic number for catalog");
+ match magic {
+ PROXMOX_CATALOG_FILE_MAGIC_1_0 | PROXMOX_CATALOG_FILE_MAGIC_2_0 => {
+ self.magic = Some(magic)
+ }
+ _ => bail!("got unexpected magic number for catalog"),
}
self.reader.seek(SeekFrom::End(-8))?;
let start = unsafe { self.reader.read_le_value::<u64>()? };
@@ -550,11 +654,23 @@ impl<R: Read + Seek> CatalogReader<R> {
let mut entry_list = Vec::new();
- DirInfo::parse(&data, |etype, name, offset, size, mtime| {
- let entry = DirEntry::new(etype, name.to_vec(), start - offset, size, mtime);
- entry_list.push(entry);
- Ok(true)
- })?;
+ DirInfo::parse(
+ &data,
+ self.magic,
+ |etype, name, offset, size, mtime, ctime, link_offset| {
+ let entry = DirEntry::new(
+ etype,
+ name.to_vec(),
+ start - offset,
+ size,
+ mtime,
+ ctime,
+ link_offset,
+ );
+ entry_list.push(entry);
+ Ok(true)
+ },
+ )?;
Ok(entry_list)
}
@@ -600,15 +716,27 @@ impl<R: Read + Seek> CatalogReader<R> {
let data = self.read_raw_dirinfo_block(start)?;
let mut item = None;
- DirInfo::parse(&data, |etype, name, offset, size, mtime| {
- if name != filename {
- return Ok(true);
- }
+ DirInfo::parse(
+ &data,
+ self.magic,
+ |etype, name, offset, size, mtime, ctime, link_offset| {
+ if name != filename {
+ return Ok(true);
+ }
- let entry = DirEntry::new(etype, name.to_vec(), start - offset, size, mtime);
- item = Some(entry);
- Ok(false) // stop parsing
- })?;
+ let entry = DirEntry::new(
+ etype,
+ name.to_vec(),
+ start - offset,
+ size,
+ mtime,
+ ctime,
+ link_offset,
+ );
+ item = Some(entry);
+ Ok(false) // stop parsing
+ },
+ )?;
Ok(item)
}
@@ -628,35 +756,51 @@ impl<R: Read + Seek> CatalogReader<R> {
pub fn dump_dir(&mut self, prefix: &std::path::Path, start: u64) -> Result<(), Error> {
let data = self.read_raw_dirinfo_block(start)?;
- DirInfo::parse(&data, |etype, name, offset, size, mtime| {
- let mut path = std::path::PathBuf::from(prefix);
- let name: &OsStr = OsStrExt::from_bytes(name);
- path.push(name);
-
- match etype {
- CatalogEntryType::Directory => {
- log::info!("{} {:?}", etype, path);
- if offset > start {
- bail!("got wrong directory offset ({} > {})", offset, start);
+ DirInfo::parse(
+ &data,
+ self.magic,
+ |etype, name, offset, size, mtime, ctime, link_offset| {
+ let mut path = std::path::PathBuf::from(prefix);
+ let name: &OsStr = OsStrExt::from_bytes(name);
+ path.push(name);
+
+ match etype {
+ CatalogEntryType::Directory => {
+ log::info!("{} {:?}", etype, path);
+ if offset > start {
+ bail!("got wrong directory offset ({} > {})", offset, start);
+ }
+ let pos = start - offset;
+ self.dump_dir(&path, pos)?;
}
- let pos = start - offset;
- self.dump_dir(&path, pos)?;
- }
- CatalogEntryType::File => {
- let mut mtime_string = mtime.to_string();
- if let Ok(s) = proxmox_time::strftime_local("%FT%TZ", mtime) {
- mtime_string = s;
+ CatalogEntryType::File => {
+ let mut mtime_string = mtime.to_string();
+ let mut ctime_string = ctime.to_string();
+ if let Ok(s) = proxmox_time::strftime_local("%FT%TZ", mtime) {
+ mtime_string = s;
+ }
+ if let Ok(s) = proxmox_time::strftime_local("%FT%TZ", ctime) {
+ ctime_string = s;
+ }
+
+ log::info!(
+ "{} {:?} {} {} {} {:?}",
+ etype,
+ path,
+ size,
+ mtime_string,
+ ctime_string,
+ link_offset
+ );
+ }
+ _ => {
+ log::info!("{} {:?}", etype, path);
}
-
- log::info!("{} {:?} {} {}", etype, path, size, mtime_string,);
- }
- _ => {
- log::info!("{} {:?}", etype, path);
}
- }
- Ok(true)
- })
+ Ok(true)
+ },
+ )
}
/// Finds all entries matching the given match patterns and calls the
@@ -705,9 +849,24 @@ impl<R: Read + Seek> CatalogReader<R> {
components.push(b'/');
components.extend(&direntry.name);
let mut entry = ArchiveEntry::new(&components, Some(&direntry.attr));
- if let DirEntryAttribute::File { size, mtime } = direntry.attr {
+ if let DirEntryAttribute::File {
+ size,
+ mtime,
+ extension,
+ } = direntry.attr
+ {
entry.size = size.into();
entry.mtime = mtime.into();
+ entry.ctime = None;
+ entry.file_offset = None;
+ if let Some(CatalogV2Extension {
+ ctime,
+ file_offset: FileOffset { offset },
+ }) = extension
+ {
+ entry.ctime = ctime.into();
+ entry.file_offset = offset.into();
+ }
}
res.push(entry);
}
@@ -916,6 +1075,12 @@ pub struct ArchiveEntry {
/// The file "last modified" time stamp, if entry_type is 'f' (file)
#[serde(skip_serializing_if = "Option::is_none")]
pub mtime: Option<i64>,
+ /// The file "last status change" time stamp, if entry_type is 'f' (file)
+ #[serde(skip_serializing_if = "Option::is_none")]
+ pub ctime: Option<i64>,
+ /// The file archive offset, if entry_type is 'f' (file)
+ #[serde(skip_serializing_if = "Option::is_none")]
+ pub file_offset: Option<u64>,
}
impl ArchiveEntry {
@@ -946,6 +1111,30 @@ impl ArchiveEntry {
Some(DirEntryAttribute::File { mtime, .. }) => Some(*mtime),
_ => None,
},
+ ctime: match entry_type {
+ Some(DirEntryAttribute::File { extension, .. }) => {
+ if let Some(CatalogV2Extension { ctime, .. }) = extension {
+ Some(*ctime)
+ } else {
+ None
+ }
+ }
+ _ => None,
+ },
+ file_offset: match entry_type {
+ Some(DirEntryAttribute::File { extension, .. }) => {
+ if let Some(CatalogV2Extension {
+ file_offset: FileOffset { offset },
+ ..
+ }) = extension
+ {
+ Some(*offset)
+ } else {
+ None
+ }
+ }
+ _ => None,
+ },
}
}
}
diff --git a/pbs-datastore/src/file_formats.rs b/pbs-datastore/src/file_formats.rs
index 73d67e20..4181f0f1 100644
--- a/pbs-datastore/src/file_formats.rs
+++ b/pbs-datastore/src/file_formats.rs
@@ -5,6 +5,9 @@ use endian_trait::Endian;
// openssl::sha::sha256(b"Proxmox Backup Catalog file v1.0")[0..8]
pub const PROXMOX_CATALOG_FILE_MAGIC_1_0: [u8; 8] = [145, 253, 96, 249, 196, 103, 88, 213];
+// openssl::sha::sha256(b"Proxmox Backup Catalog file v2.0")[0..8]
+pub const PROXMOX_CATALOG_FILE_MAGIC_2_0: [u8; 8] = [204, 223, 24, 211, 187, 125, 183, 226];
+
// openssl::sha::sha256(b"Proxmox Backup uncompressed blob v1.0")[0..8]
pub const UNCOMPRESSED_BLOB_MAGIC_1_0: [u8; 8] = [66, 171, 56, 7, 190, 131, 112, 161];
diff --git a/proxmox-restore-daemon/src/proxmox_restore_daemon/api.rs b/proxmox-restore-daemon/src/proxmox_restore_daemon/api.rs
index c4e97d33..fb12befa 100644
--- a/proxmox-restore-daemon/src/proxmox_restore_daemon/api.rs
+++ b/proxmox-restore-daemon/src/proxmox_restore_daemon/api.rs
@@ -24,7 +24,7 @@ use proxmox_sys::fs::read_subdir;
use pbs_api_types::file_restore::{FileRestoreFormat, RestoreDaemonStatus};
use pbs_client::pxar::{create_archive, Flags, PxarCreateOptions, ENCODER_MAX_ENTRIES};
-use pbs_datastore::catalog::{ArchiveEntry, DirEntryAttribute};
+use pbs_datastore::catalog::{ArchiveEntry, CatalogV2Extension, DirEntryAttribute, FileOffset};
use pbs_tools::json::required_string_param;
use pxar::encoder::aio::TokioWriter;
@@ -109,6 +109,10 @@ fn get_dir_entry(path: &Path) -> Result<DirEntryAttribute, Error> {
libc::S_IFREG => DirEntryAttribute::File {
size: stat.st_size as u64,
mtime: stat.st_mtime,
+ extension: Some(CatalogV2Extension {
+ ctime: stat.st_ctime,
+ file_offset: FileOffset::default(),
+ }),
},
libc::S_IFDIR => DirEntryAttribute::Directory { start: 0 },
_ => bail!("unsupported file type: {}", stat.st_mode),
--
2.39.2
* [pbs-devel] [PATCH v6 proxmox-backup 13/29] fix #3174: archiver/extractor: impl appendix ref
2024-01-25 13:25 [pbs-devel] [PATCH-SERIES v6 pxar proxmox-backup proxmox-widget-toolkit 0/29] fix #3174: improve file-level backup Christian Ebner
` (11 preceding siblings ...)
2024-01-25 13:25 ` [pbs-devel] [PATCH v6 proxmox-backup 12/29] fix #3174: catalog: introduce extended format v2 Christian Ebner
@ 2024-01-25 13:25 ` Christian Ebner
2024-01-25 13:25 ` [pbs-devel] [PATCH v6 proxmox-backup 14/29] fix #3174: catalog: add specialized Archive entry Christian Ebner
` (15 subsequent siblings)
28 siblings, 0 replies; 30+ messages in thread
From: Christian Ebner @ 2024-01-25 13:25 UTC (permalink / raw)
To: pbs-devel
Implements the functionality to create and extract appendix references
via the PBS client.
This reuses the pxar encoder's functionality to write appendix
reference entries and adds the implementation to store and access them
in the catalog.
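On the archiver side, reusing an unchanged file then boils down to the
following sketch (variable names as in the surrounding archiver code,
offsets hypothetical):
    // Instead of re-encoding the payload, emit an appendix reference entry
    // and record it in the catalog via the new add_appendix_ref() method:
    self.add_appendix_ref(encoder, file_name, appendix_ref_offset, file_size)
        .await?;
    if let Some(ref catalog) = self.catalog {
        catalog.lock().unwrap().add_appendix_ref(
            c_file_name, file_size, stat.st_mtime, stat.st_ctime, appendix_ref_offset,
        )?;
    }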
Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
Changes since v5:
- no changes
pbs-client/src/catalog_shell.rs | 2 +-
pbs-client/src/pxar/create.rs | 12 ++++
pbs-client/src/pxar/extract.rs | 16 +++++
pbs-client/src/pxar/tools.rs | 8 +++
pbs-datastore/src/catalog.rs | 103 ++++++++++++++++++++++++++++++++
5 files changed, 140 insertions(+), 1 deletion(-)
diff --git a/pbs-client/src/catalog_shell.rs b/pbs-client/src/catalog_shell.rs
index 349bb7cb..8629a3a5 100644
--- a/pbs-client/src/catalog_shell.rs
+++ b/pbs-client/src/catalog_shell.rs
@@ -1085,7 +1085,7 @@ impl<'a> ExtractorState<'a> {
(_, DirEntryAttribute::Directory { .. }) => {
self.handle_new_directory(entry, match_result?).await?;
}
- (true, DirEntryAttribute::File { .. }) => {
+ (true, DirEntryAttribute::File { .. } | DirEntryAttribute::AppendixRef { .. }) => {
self.dir_stack.push(PathStackEntry::new(entry));
let file = Shell::walk_pxar_archive(self.accessor, &mut self.dir_stack).await?;
self.extract_file(file).await?;
diff --git a/pbs-client/src/pxar/create.rs b/pbs-client/src/pxar/create.rs
index a2338218..611d7421 100644
--- a/pbs-client/src/pxar/create.rs
+++ b/pbs-client/src/pxar/create.rs
@@ -735,6 +735,18 @@ impl Archiver {
Ok(out.file_offset())
}
+ async fn add_appendix_ref<T: SeqWrite + Send>(
+ &mut self,
+ encoder: &mut Encoder<'_, T>,
+ file_name: &Path,
+ appendix_offset: pxar::encoder::AppendixRefOffset,
+ file_size: u64,
+ ) -> Result<(), Error> {
+ Ok(encoder
+ .add_appendix_ref(file_name, appendix_offset, file_size)
+ .await?)
+ }
+
async fn add_symlink<T: SeqWrite + Send>(
&mut self,
encoder: &mut Encoder<'_, T>,
diff --git a/pbs-client/src/pxar/extract.rs b/pbs-client/src/pxar/extract.rs
index af18ecfc..f56c8f24 100644
--- a/pbs-client/src/pxar/extract.rs
+++ b/pbs-client/src/pxar/extract.rs
@@ -74,6 +74,7 @@ struct ExtractorIterState {
err_path_stack: Vec<OsString>,
current_match: bool,
end_reached: bool,
+ appendix_list: Vec<(PathBuf, u64, u64)>,
}
/// An [`Iterator`] that encapsulates the process of extraction in [extract_archive].
@@ -98,6 +99,7 @@ impl ExtractorIterState {
err_path_stack: Vec::new(),
current_match: options.extract_match_default,
end_reached: false,
+ appendix_list: Vec::new(),
}
}
}
@@ -373,6 +375,20 @@ where
}
.context(PxarExtractContext::ExtractFile)
}
+ (
+ true,
+ EntryKind::AppendixRef {
+ appendix_offset,
+ file_size,
+ },
+ ) => {
+ self.state.appendix_list.push((
+ entry.path().to_path_buf(),
+ *appendix_offset,
+ *file_size,
+ ));
+ Ok(())
+ }
(false, _) => Ok(()), // skip this
};
diff --git a/pbs-client/src/pxar/tools.rs b/pbs-client/src/pxar/tools.rs
index 0cfbaf5b..aac5a1e7 100644
--- a/pbs-client/src/pxar/tools.rs
+++ b/pbs-client/src/pxar/tools.rs
@@ -156,6 +156,14 @@ pub fn format_multi_line_entry(entry: &Entry) -> String {
let (size, link, type_name) = match entry.kind() {
EntryKind::File { size, .. } => (format!("{}", *size), String::new(), "file"),
+ EntryKind::AppendixRef {
+ appendix_offset,
+ file_size,
+ } => (
+ format!("{} {}", appendix_offset, file_size),
+ String::new(),
+ "appendix ref",
+ ),
EntryKind::Symlink(link) => (
"0".to_string(),
format!(" -> {:?}", link.as_os_str()),
diff --git a/pbs-datastore/src/catalog.rs b/pbs-datastore/src/catalog.rs
index 746b493b..8ae7c661 100644
--- a/pbs-datastore/src/catalog.rs
+++ b/pbs-datastore/src/catalog.rs
@@ -28,6 +28,14 @@ pub trait BackupCatalogWriter {
ctime: i64,
file_offset: pxar::encoder::LinkOffset,
) -> Result<(), Error>;
+ fn add_appendix_ref(
+ &mut self,
+ name: &CStr,
+ size: u64,
+ mtime: i64,
+ ctime: i64,
+ appendix_ref_offset: pxar::encoder::AppendixRefOffset,
+ ) -> Result<(), Error>;
fn add_symlink(&mut self, name: &CStr) -> Result<(), Error>;
fn add_hardlink(&mut self, name: &CStr) -> Result<(), Error>;
fn add_block_device(&mut self, name: &CStr) -> Result<(), Error>;
@@ -41,6 +49,7 @@ pub trait BackupCatalogWriter {
pub enum CatalogEntryType {
Directory = b'd',
File = b'f',
+ AppendixRef = b'r',
Symlink = b'l',
Hardlink = b'h',
BlockDevice = b'b',
@@ -56,6 +65,7 @@ impl TryFrom<u8> for CatalogEntryType {
Ok(match value {
b'd' => CatalogEntryType::Directory,
b'f' => CatalogEntryType::File,
+ b'r' => CatalogEntryType::AppendixRef,
b'l' => CatalogEntryType::Symlink,
b'h' => CatalogEntryType::Hardlink,
b'b' => CatalogEntryType::BlockDevice,
@@ -72,6 +82,7 @@ impl From<&DirEntryAttribute> for CatalogEntryType {
match value {
DirEntryAttribute::Directory { .. } => CatalogEntryType::Directory,
DirEntryAttribute::File { .. } => CatalogEntryType::File,
+ DirEntryAttribute::AppendixRef { .. } => CatalogEntryType::AppendixRef,
DirEntryAttribute::Symlink => CatalogEntryType::Symlink,
DirEntryAttribute::Hardlink => CatalogEntryType::Hardlink,
DirEntryAttribute::BlockDevice => CatalogEntryType::BlockDevice,
@@ -98,9 +109,22 @@ impl FileOffset {
self.offset
}
}
+
+#[derive(Clone, Copy, Debug, Eq, PartialEq, Ord, PartialOrd)]
+pub struct AppendixRefOffset {
+ offset: u64,
+}
+
+impl AppendixRefOffset {
+ pub fn raw(&self) -> u64 {
+ self.offset
+ }
+}
+
#[derive(Clone, Copy, Debug, Eq, PartialEq, Ord, PartialOrd)]
pub enum Offset {
FileOffset { offset: u64 },
+ AppendixRefOffset { offset: u64 },
}
/// Represents a named directory entry
@@ -130,6 +154,12 @@ pub enum DirEntryAttribute {
mtime: i64,
extension: Option<CatalogV2Extension>,
},
+ AppendixRef {
+ size: u64,
+ mtime: i64,
+ ctime: i64,
+ appendix_ref_offset: AppendixRefOffset,
+ },
Symlink,
Hardlink,
BlockDevice,
@@ -195,6 +225,17 @@ impl DirEntry {
name,
attr: DirEntryAttribute::Socket,
},
+ (CatalogEntryType::AppendixRef, Some(Offset::AppendixRefOffset { offset })) => {
+ DirEntry {
+ name,
+ attr: DirEntryAttribute::AppendixRef {
+ size,
+ mtime,
+ ctime,
+ appendix_ref_offset: AppendixRefOffset { offset },
+ },
+ }
+ }
_ => panic!("unexpected parameters '{etype}' and '{offset:?}'"),
}
}
@@ -204,6 +245,7 @@ impl DirEntry {
Some(match self.attr {
DirEntryAttribute::Directory { .. } => pxar::mode::IFDIR,
DirEntryAttribute::File { .. } => pxar::mode::IFREG,
+ DirEntryAttribute::AppendixRef { .. } => pxar::mode::IFREG,
DirEntryAttribute::Symlink => pxar::mode::IFLNK,
DirEntryAttribute::Hardlink => return None,
DirEntryAttribute::BlockDevice => pxar::mode::IFBLK,
@@ -271,6 +313,24 @@ impl DirInfo {
catalog_encode_u64(writer, file_offset.raw())?;
}
}
+ DirEntry {
+ name,
+ attr:
+ DirEntryAttribute::AppendixRef {
+ size,
+ mtime,
+ ctime,
+ appendix_ref_offset,
+ },
+ } => {
+ writer.write_all(&[CatalogEntryType::AppendixRef as u8])?;
+ catalog_encode_u64(writer, name.len() as u64)?;
+ writer.write_all(name)?;
+ catalog_encode_u64(writer, *size)?;
+ catalog_encode_i64(writer, *mtime)?;
+ catalog_encode_i64(writer, *ctime)?;
+ catalog_encode_u64(writer, appendix_ref_offset.raw())?;
+ }
DirEntry {
name,
attr: DirEntryAttribute::Symlink,
@@ -390,6 +450,21 @@ impl DirInfo {
};
callback(etype, name, 0, size, mtime, ctime, offset)?
}
+ CatalogEntryType::AppendixRef => {
+ let size = catalog_decode_u64(&mut cursor)?;
+ let mtime = catalog_decode_i64(&mut cursor)?;
+ let ctime = catalog_decode_i64(&mut cursor)?;
+ let offset = catalog_decode_u64(&mut cursor)?;
+ callback(
+ etype,
+ name,
+ 0,
+ size,
+ mtime,
+ ctime,
+ Some(Offset::AppendixRefOffset { offset }),
+ )?
+ }
_ => callback(etype, name, 0, 0, 0, 0, None)?,
};
if !cont {
@@ -517,6 +592,34 @@ impl<W: Write> BackupCatalogWriter for CatalogWriter<W> {
Ok(())
}
+ fn add_appendix_ref(
+ &mut self,
+ name: &CStr,
+ size: u64,
+ mtime: i64,
+ ctime: i64,
+ appendix_ref_offset: pxar::encoder::AppendixRefOffset,
+ ) -> Result<(), Error> {
+ let dir = self
+ .dirstack
+ .last_mut()
+ .ok_or_else(|| format_err!("outside root"))?;
+ let name = name.to_bytes().to_vec();
+ let appendix_ref_offset = AppendixRefOffset {
+ offset: appendix_ref_offset.raw(),
+ };
+ dir.entries.push(DirEntry {
+ name,
+ attr: DirEntryAttribute::AppendixRef {
+ size,
+ mtime,
+ ctime,
+ appendix_ref_offset,
+ },
+ });
+ Ok(())
+ }
+
fn add_symlink(&mut self, name: &CStr) -> Result<(), Error> {
let dir = self
.dirstack
--
2.39.2
^ permalink raw reply [flat|nested] 30+ messages in thread
* [pbs-devel] [PATCH v6 proxmox-backup 14/29] fix #3174: catalog: add specialized Archive entry
2024-01-25 13:25 [pbs-devel] [PATCH-SERIES v6 pxar proxmox-backup proxmox-widget-toolkit 0/29] fix #3174: improve file-level backup Christian Ebner
` (12 preceding siblings ...)
2024-01-25 13:25 ` [pbs-devel] [PATCH v6 proxmox-backup 13/29] fix #3174: archiver/extractor: impl appendix ref Christian Ebner
@ 2024-01-25 13:25 ` Christian Ebner
2024-01-25 13:25 ` [pbs-devel] [PATCH v6 proxmox-backup 15/29] fix #3174: extractor: impl seq restore from appendix Christian Ebner
` (14 subsequent siblings)
28 siblings, 0 replies; 30+ messages in thread
From: Christian Ebner @ 2024-01-25 13:25 UTC (permalink / raw)
To: pbs-devel
Introduces a specialized catalog directory entry type Archive, which
extends the regular directory entry by storing an additional optional
appendix start offset.
The archive entry type is only used for the topmost entries in the
catalog, replacing the previously used directory entry. If the entry
was created by reusing pxar file entries in an appendix section, the
appendix start offset is present and can be used to locate the
referenced file entries within the appendix section and access them
from the catalog shell.
Since the catalog might contain multiple archives, each archive entry
stores its individual appendix start offset.
Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
Changes since v5:
- no changes
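
A note on the sentinel used below: the catalog has no dedicated
representation for a missing appendix, so end_archive() stores an
offset of 0 and appendix_offset() maps it back to an Option. A minimal
standalone model (the struct is a stand-in for the catalog type):

    #[derive(Clone, Copy, Debug, PartialEq)]
    struct AppendixStartOffset(u64);

    /// Offset 0 is reserved to mean "no appendix present".
    fn appendix_start(raw: u64) -> Option<AppendixStartOffset> {
        if raw != 0 {
            Some(AppendixStartOffset(raw))
        } else {
            None
        }
    }

    fn main() {
        assert_eq!(appendix_start(0), None);
        assert_eq!(appendix_start(4096), Some(AppendixStartOffset(4096)));
    }
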
pbs-client/src/catalog_shell.rs | 2 +-
pbs-datastore/src/catalog.rs | 152 +++++++++++++++++++++++++++++++-
2 files changed, 150 insertions(+), 4 deletions(-)
diff --git a/pbs-client/src/catalog_shell.rs b/pbs-client/src/catalog_shell.rs
index 8629a3a5..c9b49a9a 100644
--- a/pbs-client/src/catalog_shell.rs
+++ b/pbs-client/src/catalog_shell.rs
@@ -1082,7 +1082,7 @@ impl<'a> ExtractorState<'a> {
};
match (did_match, &entry.attr) {
- (_, DirEntryAttribute::Directory { .. }) => {
+ (_, DirEntryAttribute::Archive { .. }) | (_, DirEntryAttribute::Directory { .. }) => {
self.handle_new_directory(entry, match_result?).await?;
}
(true, DirEntryAttribute::File { .. } | DirEntryAttribute::AppendixRef { .. }) => {
diff --git a/pbs-datastore/src/catalog.rs b/pbs-datastore/src/catalog.rs
index 8ae7c661..220313c6 100644
--- a/pbs-datastore/src/catalog.rs
+++ b/pbs-datastore/src/catalog.rs
@@ -18,6 +18,11 @@ use crate::file_formats::{PROXMOX_CATALOG_FILE_MAGIC_1_0, PROXMOX_CATALOG_FILE_M
/// A file list catalog simply stores a directory tree. Such catalogs may be used as index to do a
/// fast search for files.
pub trait BackupCatalogWriter {
+ fn start_archive(&mut self, name: &CStr) -> Result<(), Error>;
+ fn end_archive(
+ &mut self,
+ appendix: Option<pxar::encoder::AppendixStartOffset>,
+ ) -> Result<(), Error>;
fn start_directory(&mut self, name: &CStr) -> Result<(), Error>;
fn end_directory(&mut self) -> Result<(), Error>;
fn add_file(
@@ -50,6 +55,7 @@ pub enum CatalogEntryType {
Directory = b'd',
File = b'f',
AppendixRef = b'r',
+ Archive = b'a',
Symlink = b'l',
Hardlink = b'h',
BlockDevice = b'b',
@@ -66,6 +72,7 @@ impl TryFrom<u8> for CatalogEntryType {
b'd' => CatalogEntryType::Directory,
b'f' => CatalogEntryType::File,
b'r' => CatalogEntryType::AppendixRef,
+ b'a' => CatalogEntryType::Archive,
b'l' => CatalogEntryType::Symlink,
b'h' => CatalogEntryType::Hardlink,
b'b' => CatalogEntryType::BlockDevice,
@@ -83,6 +90,7 @@ impl From<&DirEntryAttribute> for CatalogEntryType {
DirEntryAttribute::Directory { .. } => CatalogEntryType::Directory,
DirEntryAttribute::File { .. } => CatalogEntryType::File,
DirEntryAttribute::AppendixRef { .. } => CatalogEntryType::AppendixRef,
+ DirEntryAttribute::Archive { .. } => CatalogEntryType::Archive,
DirEntryAttribute::Symlink => CatalogEntryType::Symlink,
DirEntryAttribute::Hardlink => CatalogEntryType::Hardlink,
DirEntryAttribute::BlockDevice => CatalogEntryType::BlockDevice,
@@ -121,10 +129,22 @@ impl AppendixRefOffset {
}
}
+#[derive(Clone, Copy, Debug, Eq, PartialEq, Ord, PartialOrd)]
+pub struct AppendixStartOffset {
+ offset: u64,
+}
+
+impl AppendixStartOffset {
+ pub fn raw(&self) -> u64 {
+ self.offset
+ }
+}
+
#[derive(Clone, Copy, Debug, Eq, PartialEq, Ord, PartialOrd)]
pub enum Offset {
FileOffset { offset: u64 },
AppendixRefOffset { offset: u64 },
+ AppendixStartOffset { offset: u64 },
}
/// Represents a named directory entry
@@ -160,6 +180,10 @@ pub enum DirEntryAttribute {
ctime: i64,
appendix_ref_offset: AppendixRefOffset,
},
+ Archive {
+ start: u64,
+ appendix_offset: AppendixStartOffset,
+ },
Symlink,
Hardlink,
BlockDevice,
@@ -236,6 +260,13 @@ impl DirEntry {
},
}
}
+ (CatalogEntryType::Archive, Some(Offset::AppendixStartOffset { offset })) => DirEntry {
+ name,
+ attr: DirEntryAttribute::Archive {
+ start,
+ appendix_offset: AppendixStartOffset { offset },
+ },
+ },
_ => panic!("unexpected parameters '{etype}' and '{offset:?}'"),
}
}
@@ -246,6 +277,7 @@ impl DirEntry {
DirEntryAttribute::Directory { .. } => pxar::mode::IFDIR,
DirEntryAttribute::File { .. } => pxar::mode::IFREG,
DirEntryAttribute::AppendixRef { .. } => pxar::mode::IFREG,
+ DirEntryAttribute::Archive { .. } => pxar::mode::IFDIR,
DirEntryAttribute::Symlink => pxar::mode::IFLNK,
DirEntryAttribute::Hardlink => return None,
DirEntryAttribute::BlockDevice => pxar::mode::IFBLK,
@@ -258,6 +290,12 @@ impl DirEntry {
/// Check if DirEntry is a directory
pub fn is_directory(&self) -> bool {
matches!(self.attr, DirEntryAttribute::Directory { .. })
+ || matches!(self.attr, DirEntryAttribute::Archive { .. })
+ }
+
+ /// Check if DirEntry is an archive
+ pub fn is_archive(&self) -> bool {
+ matches!(self.attr, DirEntryAttribute::Archive { .. })
}
/// Check if DirEntry is a symlink
@@ -285,6 +323,20 @@ impl DirInfo {
fn encode_entry<W: Write>(writer: &mut W, entry: &DirEntry, pos: u64) -> Result<(), Error> {
match entry {
+ DirEntry {
+ name,
+ attr:
+ DirEntryAttribute::Archive {
+ start,
+ appendix_offset,
+ },
+ } => {
+ writer.write_all(&[CatalogEntryType::Archive as u8])?;
+ catalog_encode_u64(writer, name.len() as u64)?;
+ writer.write_all(name)?;
+ catalog_encode_u64(writer, appendix_offset.raw())?;
+ catalog_encode_u64(writer, pos - start)?;
+ }
DirEntry {
name,
attr: DirEntryAttribute::Directory { start },
@@ -427,6 +479,19 @@ impl DirInfo {
cursor.read_exact(name)?;
let cont = match etype {
+ CatalogEntryType::Archive => {
+ let offset = catalog_decode_u64(&mut cursor)?;
+ let start = catalog_decode_u64(&mut cursor)?;
+ callback(
+ etype,
+ name,
+ start,
+ 0,
+ 0,
+ 0,
+ Some(Offset::AppendixStartOffset { offset }),
+ )?
+ }
CatalogEntryType::Directory => {
let offset = catalog_decode_u64(&mut cursor)?;
callback(etype, name, offset, 0, 0, 0, None)?
@@ -533,6 +598,51 @@ impl<W: Write> CatalogWriter<W> {
}
impl<W: Write> BackupCatalogWriter for CatalogWriter<W> {
+ fn start_archive(&mut self, name: &CStr) -> Result<(), Error> {
+ let new = DirInfo::new(name.to_owned());
+ self.dirstack.push(new);
+ Ok(())
+ }
+
+ fn end_archive(
+ &mut self,
+ appendix: Option<pxar::encoder::AppendixStartOffset>,
+ ) -> Result<(), Error> {
+ let (start, name) = match self.dirstack.pop() {
+ Some(dir) => {
+ let start = self.pos;
+ let (name, data) = dir.encode(start)?;
+ self.write_all(&data)?;
+ (start, name)
+ }
+ None => {
+ bail!("got unexpected end_directory level 0");
+ }
+ };
+
+ let current = self
+ .dirstack
+ .last_mut()
+ .ok_or_else(|| format_err!("outside root"))?;
+ let name = name.to_bytes().to_vec();
+ let appendix_offset = if let Some(appendix) = appendix {
+ AppendixStartOffset {
+ offset: appendix.raw(),
+ }
+ } else {
+ AppendixStartOffset { offset: 0 }
+ };
+ current.entries.push(DirEntry {
+ name,
+ attr: DirEntryAttribute::Archive {
+ start,
+ appendix_offset,
+ },
+ });
+
+ Ok(())
+ }
+
fn start_directory(&mut self, name: &CStr) -> Result<(), Error> {
let new = DirInfo::new(name.to_owned());
self.dirstack.push(new);
@@ -746,10 +856,33 @@ impl<R: Read + Seek> CatalogReader<R> {
})
}
+ pub fn appendix_offset(
+ &mut self,
+ archive_name: &[u8],
+ ) -> Result<Option<AppendixStartOffset>, Error> {
+ let root = self.root()?;
+ let dir_entry = self.lookup(&root, archive_name)?.unwrap();
+ if let DirEntry {
+ attr: DirEntryAttribute::Archive {
+ appendix_offset, ..
+ },
+ ..
+ } = dir_entry
+ {
+ if appendix_offset.raw() != 0 {
+ return Ok(Some(appendix_offset));
+ } else {
+ return Ok(None);
+ }
+ }
+ Ok(None)
+ }
+
/// Read all directory entries
pub fn read_dir(&mut self, parent: &DirEntry) -> Result<Vec<DirEntry>, Error> {
let start = match parent.attr {
DirEntryAttribute::Directory { start } => start,
+ DirEntryAttribute::Archive { start, .. } => start,
_ => bail!("parent is not a directory - internal error"),
};
@@ -813,6 +946,7 @@ impl<R: Read + Seek> CatalogReader<R> {
) -> Result<Option<DirEntry>, Error> {
let start = match parent.attr {
DirEntryAttribute::Directory { start } => start,
+ DirEntryAttribute::Archive { start, .. } => start,
_ => bail!("parent is not a directory - internal error"),
};
@@ -822,7 +956,7 @@ impl<R: Read + Seek> CatalogReader<R> {
DirInfo::parse(
&data,
self.magic,
- |etype, name, offset, size, mtime, ctime, link_offset| {
+ |etype, name, offset, size, mtime, ctime, archive_offset| {
if name != filename {
return Ok(true);
}
@@ -834,7 +968,7 @@ impl<R: Read + Seek> CatalogReader<R> {
size,
mtime,
ctime,
- link_offset,
+ archive_offset,
);
item = Some(entry);
Ok(false) // stop parsing
@@ -868,6 +1002,14 @@ impl<R: Read + Seek> CatalogReader<R> {
path.push(name);
match etype {
+ CatalogEntryType::Archive => {
+ log::info!("{} {:?}", etype, path);
+ if offset > start {
+ bail!("got wrong archive offset ({} > {})", offset, start);
+ }
+ let pos = start - offset;
+ self.dump_dir(&path, pos)?;
+ }
CatalogEntryType::Directory => {
log::info!("{} {:?}", etype, path);
if offset > start {
@@ -1208,7 +1350,11 @@ impl ArchiveEntry {
Some(entry_type) => CatalogEntryType::from(entry_type).to_string(),
None => "v".to_owned(),
},
- leaf: !matches!(entry_type, None | Some(DirEntryAttribute::Directory { .. })),
+ leaf: !matches!(
+ entry_type,
+ None | Some(DirEntryAttribute::Directory { .. })
+ | Some(DirEntryAttribute::Archive { .. })
+ ),
size,
mtime: match entry_type {
Some(DirEntryAttribute::File { mtime, .. }) => Some(*mtime),
--
2.39.2
^ permalink raw reply [flat|nested] 30+ messages in thread
* [pbs-devel] [PATCH v6 proxmox-backup 15/29] fix #3174: extractor: impl seq restore from appendix
2024-01-25 13:25 [pbs-devel] [PATCH-SERIES v6 pxar proxmox-backup proxmox-widget-toolkit 0/29] fix #3174: improve file-level backup Christian Ebner
` (13 preceding siblings ...)
2024-01-25 13:25 ` [pbs-devel] [PATCH v6 proxmox-backup 14/29] fix #3174: catalog: add specialized Archive entry Christian Ebner
@ 2024-01-25 13:25 ` Christian Ebner
2024-01-25 13:25 ` [pbs-devel] [PATCH v6 proxmox-backup 16/29] fix #3174: archiver: store ref to previous backup Christian Ebner
` (13 subsequent siblings)
28 siblings, 0 replies; 30+ messages in thread
From: Christian Ebner @ 2024-01-25 13:25 UTC (permalink / raw)
To: pbs-devel
Restores the file payloads for all AppendixRef entries encountered
during the sequential restore of the pxar archive.
This is done by iterating over all the files listed in the
corresponding state variable, opening each of the parent directories
while storing their metadata for later restore, then creating the file
and writing its contents.
When leaving the directories, their metadata is restored.
Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
Changes since v5:
- no changes
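
The offset bookkeeping of the restore loop is easier to see in
isolation. Below is a toy model with invented numbers; the real code
computes the per-entry encoded size via pxar's encoded_size() plus the
payload header, while here both sizes are simply given:

    use std::collections::BTreeMap;

    fn main() {
        // offset within the appendix -> (encoded entry size, payload size)
        let refs = BTreeMap::from([(0u64, (64u64, 100u64)), (300, (64, 50))]);
        let total = 500u64; // total size taken from the Appendix entry

        let mut consumed = 0u64;
        for (offset, (entry_bytes, payload)) in &refs {
            let skip = offset - consumed; // gap up to this entry
            consumed += skip + entry_bytes + payload;
            println!("skip {skip}, then decode entry and {payload} payload bytes");
        }
        // trailing skip so the decoder ends up right after the appendix
        println!("final skip: {}", total - consumed);
    }
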
pbs-client/src/pxar/create.rs | 4 +-
pbs-client/src/pxar/extract.rs | 147 +++++++++++++++++++++++++++++++--
pbs-client/src/pxar/tools.rs | 1 +
3 files changed, 142 insertions(+), 10 deletions(-)
diff --git a/pbs-client/src/pxar/create.rs b/pbs-client/src/pxar/create.rs
index 611d7421..50bba4e6 100644
--- a/pbs-client/src/pxar/create.rs
+++ b/pbs-client/src/pxar/create.rs
@@ -43,7 +43,7 @@ pub struct PxarCreateOptions {
pub skip_lost_and_found: bool,
}
-fn detect_fs_type(fd: RawFd) -> Result<i64, Error> {
+pub fn detect_fs_type(fd: RawFd) -> Result<i64, Error> {
let mut fs_stat = std::mem::MaybeUninit::uninit();
let res = unsafe { libc::fstatfs(fd, fs_stat.as_mut_ptr()) };
Errno::result(res)?;
@@ -776,7 +776,7 @@ impl Archiver {
}
}
-fn get_metadata(
+pub fn get_metadata(
fd: RawFd,
stat: &FileStat,
flags: Flags,
diff --git a/pbs-client/src/pxar/extract.rs b/pbs-client/src/pxar/extract.rs
index f56c8f24..e7dc5b02 100644
--- a/pbs-client/src/pxar/extract.rs
+++ b/pbs-client/src/pxar/extract.rs
@@ -1,6 +1,6 @@
//! Code for extraction of pxar contents onto the file system.
-use std::collections::HashMap;
+use std::collections::{BTreeMap, HashMap};
use std::ffi::{CStr, CString, OsStr, OsString};
use std::io;
use std::os::unix::ffi::OsStrExt;
@@ -74,7 +74,7 @@ struct ExtractorIterState {
err_path_stack: Vec<OsString>,
current_match: bool,
end_reached: bool,
- appendix_list: Vec<(PathBuf, u64, u64)>,
+ appendix_refs: BTreeMap<u64, (PathBuf, u64)>,
}
/// An [`Iterator`] that encapsulates the process of extraction in [extract_archive].
@@ -99,7 +99,7 @@ impl ExtractorIterState {
err_path_stack: Vec::new(),
current_match: options.extract_match_default,
end_reached: false,
- appendix_list: Vec::new(),
+ appendix_refs: BTreeMap::new(),
}
}
}
@@ -313,6 +313,139 @@ where
res
}
+ (_, EntryKind::Appendix { total }) => {
+ // Bytes consumed in decoder since encountering the appendix marker
+ let mut consumed = 0;
+ for (offset, (path, size)) in &self.state.appendix_refs {
+ self.extractor.allow_existing_dirs = true;
+
+ // Open dir path components, skipping the root component, get metadata
+ for dir in path.iter().skip(1) {
+ let parent_fd = match self.extractor.dir_stack.last_dir_fd(true) {
+ Ok(parent_fd) => parent_fd,
+ Err(err) => return Some(Err(err.into())),
+ };
+ let fs_magic =
+ match crate::pxar::create::detect_fs_type(parent_fd.as_raw_fd()) {
+ Ok(fs_magic) => fs_magic,
+ Err(err) => return Some(Err(err.into())),
+ };
+
+ let mut fs_feature_flags = Flags::from_magic(fs_magic);
+ let file_name = match CString::new(dir.as_bytes()) {
+ Ok(file_name) => file_name,
+ Err(err) => return Some(Err(err.into())),
+ };
+ let fd = match proxmox_sys::fd::openat(
+ &parent_fd,
+ file_name.as_ref(),
+ OFlag::O_NOATIME,
+ Mode::empty(),
+ ) {
+ Ok(fd) => fd,
+ Err(err) => return Some(Err(err.into())),
+ };
+
+ let stat = match nix::sys::stat::fstat(fd.as_raw_fd()) {
+ Ok(stat) => stat,
+ Err(err) => return Some(Err(err.into())),
+ };
+ let metadata = match crate::pxar::create::get_metadata(
+ fd.as_raw_fd(),
+ &stat,
+ fs_feature_flags,
+ fs_magic,
+ &mut fs_feature_flags,
+ ) {
+ Ok(metadata) => metadata,
+ Err(err) => return Some(Err(err)),
+ };
+
+ match self.extractor.enter_directory(
+ dir.to_os_string(),
+ metadata.clone(),
+ true,
+ ) {
+ Ok(()) => (),
+ Err(err) => return Some(Err(err)),
+ };
+ }
+
+ let skip = *offset - consumed;
+ match self.decoder.skip_bytes(skip) {
+ Ok(()) => (),
+ Err(err) => return Some(Err(err.into())),
+ };
+
+ let entry = match self.decoder.next() {
+ Some(Ok(entry)) => entry,
+ Some(Err(err)) => return Some(Err(err.into())),
+ None => return Some(Err(format_err!("expected entry"))),
+ };
+
+ let file_name_os = entry.file_name();
+ let file_name_bytes = file_name_os.as_bytes();
+
+ let file_name = match CString::new(file_name_bytes) {
+ Ok(file_name_ref) => file_name_ref,
+ Err(err) => return Some(Err(err.into())),
+ };
+
+ let metadata = entry.metadata();
+
+ self.extractor.set_path(path.as_os_str().to_owned());
+
+ let contents = self.decoder.contents();
+ match contents {
+ None => {
+ return Some(Err(format_err!(
+ "found regular file entry without contents in archive"
+ )))
+ }
+ Some(mut contents) => {
+ let result = self
+ .extractor
+ .extract_file(
+ &file_name,
+ metadata,
+ *size,
+ &mut contents,
+ self.extractor
+ .overwrite_flags
+ .contains(OverwriteFlags::FILE),
+ )
+ .context(PxarExtractContext::ExtractFile);
+ if let Err(err) = result {
+ return Some(Err(err.into()));
+ }
+ }
+ }
+
+ // Iter over all dir path components, skipping the root component, set metadata
+ for _dir in path.iter().skip(1) {
+ if let Err(err) = self.extractor.leave_directory() {
+ return Some(Err(err.into()));
+ }
+ }
+
+ let mut bytes =
+ match pxar::encoder::sync::encoded_size(file_name.as_c_str(), &metadata) {
+ Ok(bytes) => bytes,
+ Err(err) => return Some(Err(err.into())),
+ };
+ // payload header size
+ bytes += std::mem::size_of::<pxar::format::Header>() as u64;
+
+ consumed += skip + bytes + *size;
+ }
+
+ let skip = *total - consumed;
+ if let Err(err) = self.decoder.skip_bytes(skip) {
+ return Some(Err(err.into()));
+ }
+
+ Ok(())
+ }
(true, EntryKind::Symlink(link)) => {
self.callback(entry.path());
self.extractor
@@ -382,11 +515,9 @@ where
file_size,
},
) => {
- self.state.appendix_list.push((
- entry.path().to_path_buf(),
- *appendix_offset,
- *file_size,
- ));
+ self.state
+ .appendix_refs
+ .insert(*appendix_offset, (entry.path().to_path_buf(), *file_size));
Ok(())
}
(false, _) => Ok(()), // skip this
diff --git a/pbs-client/src/pxar/tools.rs b/pbs-client/src/pxar/tools.rs
index aac5a1e7..174a7351 100644
--- a/pbs-client/src/pxar/tools.rs
+++ b/pbs-client/src/pxar/tools.rs
@@ -156,6 +156,7 @@ pub fn format_multi_line_entry(entry: &Entry) -> String {
let (size, link, type_name) = match entry.kind() {
EntryKind::File { size, .. } => (format!("{}", *size), String::new(), "file"),
+ EntryKind::Appendix { total } => (format!("{total}"), String::new(), "appendix"),
EntryKind::AppendixRef {
appendix_offset,
file_size,
--
2.39.2
^ permalink raw reply [flat|nested] 30+ messages in thread
* [pbs-devel] [PATCH v6 proxmox-backup 16/29] fix #3174: archiver: store ref to previous backup
2024-01-25 13:25 [pbs-devel] [PATCH-SERIES v6 pxar proxmox-backup proxmox-widget-toolkit 0/29] fix #3174: improve file-level backup Christian Ebner
` (14 preceding siblings ...)
2024-01-25 13:25 ` [pbs-devel] [PATCH v6 proxmox-backup 15/29] fix #3174: extractor: impl seq restore from appendix Christian Ebner
@ 2024-01-25 13:25 ` Christian Ebner
2024-01-25 13:25 ` [pbs-devel] [PATCH v6 proxmox-backup 17/29] fix #3174: upload stream: impl reused chunk injector Christian Ebner
` (12 subsequent siblings)
28 siblings, 0 replies; 30+ messages in thread
From: Christian Ebner @ 2024-01-25 13:25 UTC (permalink / raw)
To: pbs-devel
Adds the stateful information needed to access a previous backup's
data, such as its index and catalog.
This patch only introduces the struct and interface; since all
occurrences are set to None, it is not used for now.
Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
Changes since v5:
- read and decode pxar format version entry, if present
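
The version selection itself is a one-liner; the following standalone
model mirrors it (the enum is a stand-in for
pxar::format::FormatVersion, and treating version 1 as the default is
an assumption of this sketch):

    #[derive(Debug, Default, PartialEq)]
    enum FormatVersion {
        #[default]
        Version1,
        Version2,
    }

    /// Archives referencing a previous backup need the version 2 prefix
    /// entry; everything else stays on the default format.
    fn select_version(has_previous_ref: bool) -> FormatVersion {
        if has_previous_ref {
            FormatVersion::Version2
        } else {
            FormatVersion::default()
        }
    }

    fn main() {
        assert_eq!(select_version(true), FormatVersion::Version2);
        assert_eq!(select_version(false), FormatVersion::Version1);
    }
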
pbs-client/src/pxar/create.rs | 25 ++++++++++++++++---
pbs-client/src/pxar/extract.rs | 1 +
pbs-client/src/pxar/mod.rs | 2 +-
pbs-client/src/pxar/tools.rs | 1 +
proxmox-backup-client/src/main.rs | 1 +
.../src/proxmox_restore_daemon/api.rs | 1 +
pxar-bin/src/main.rs | 17 +++++++------
src/tape/file_formats/snapshot_archive.rs | 7 ++++--
8 files changed, 41 insertions(+), 14 deletions(-)
diff --git a/pbs-client/src/pxar/create.rs b/pbs-client/src/pxar/create.rs
index 50bba4e6..4f6ffc78 100644
--- a/pbs-client/src/pxar/create.rs
+++ b/pbs-client/src/pxar/create.rs
@@ -24,14 +24,15 @@ use proxmox_io::vec;
use proxmox_lang::c_str;
use proxmox_sys::fs::{self, acl, xattr};
-use pbs_datastore::catalog::BackupCatalogWriter;
+use pbs_datastore::catalog::{BackupCatalogWriter, CatalogReader};
+use pbs_datastore::dynamic_index::DynamicIndexReader;
use crate::pxar::metadata::errno_is_unsupported;
use crate::pxar::tools::assert_single_path_component;
use crate::pxar::Flags;
/// Pxar options for creating a pxar archive/stream
-#[derive(Default, Clone)]
+#[derive(Default)]
pub struct PxarCreateOptions {
/// Device/mountpoint st_dev numbers that should be included. None for no limitation.
pub device_set: Option<HashSet<u64>>,
@@ -41,6 +42,18 @@ pub struct PxarCreateOptions {
pub entries_max: usize,
/// Skip lost+found directory
pub skip_lost_and_found: bool,
+ /// Reference state for partial backups
+ pub previous_ref: Option<PxarPrevRef>,
+}
+
+/// Contains stateful information of previous backup snapshots for partial backups
+pub struct PxarPrevRef {
+ /// Reference index for partial backups
+ pub index: DynamicIndexReader,
+ /// Reference catalog for partial backups
+ pub catalog: CatalogReader<std::fs::File>,
+ /// Reference archive name for partial backups
+ pub archive_name: String,
}
pub fn detect_fs_type(fd: RawFd) -> Result<i64, Error> {
@@ -128,6 +141,7 @@ struct Archiver {
device_set: Option<HashSet<u64>>,
hardlinks: HashMap<HardLinkInfo, (PathBuf, LinkOffset)>,
file_copy_buffer: Vec<u8>,
+ previous_ref: Option<PxarPrevRef>,
}
type Encoder<'a, T> = pxar::encoder::aio::Encoder<'a, T>;
@@ -166,7 +180,11 @@ where
set.insert(stat.st_dev);
}
- let mut encoder = Encoder::new(&mut writer, &metadata).await?;
+ let pxar_format_version = match options.previous_ref {
+ Some(_) => pxar::format::FormatVersion::Version2,
+ None => pxar::format::FormatVersion::default(),
+ };
+ let mut encoder = Encoder::new(&mut writer, &metadata, pxar_format_version).await?;
let mut patterns = options.patterns;
@@ -192,6 +210,7 @@ where
device_set,
hardlinks: HashMap::new(),
file_copy_buffer: vec::undefined(4 * 1024 * 1024),
+ previous_ref: options.previous_ref,
};
archiver
diff --git a/pbs-client/src/pxar/extract.rs b/pbs-client/src/pxar/extract.rs
index e7dc5b02..0ded62ac 100644
--- a/pbs-client/src/pxar/extract.rs
+++ b/pbs-client/src/pxar/extract.rs
@@ -269,6 +269,7 @@ where
};
let extract_res = match (did_match, entry.kind()) {
+ (_, EntryKind::Version(_)) => Ok(()),
(_, EntryKind::Directory) => {
self.callback(entry.path());
diff --git a/pbs-client/src/pxar/mod.rs b/pbs-client/src/pxar/mod.rs
index 14674b9b..24315f5f 100644
--- a/pbs-client/src/pxar/mod.rs
+++ b/pbs-client/src/pxar/mod.rs
@@ -56,7 +56,7 @@ pub(crate) mod tools;
mod flags;
pub use flags::Flags;
-pub use create::{create_archive, PxarCreateOptions};
+pub use create::{create_archive, PxarCreateOptions, PxarPrevRef};
pub use extract::{
create_tar, create_zip, extract_archive, extract_sub_dir, extract_sub_dir_seq, ErrorHandler,
OverwriteFlags, PxarExtractContext, PxarExtractOptions,
diff --git a/pbs-client/src/pxar/tools.rs b/pbs-client/src/pxar/tools.rs
index 174a7351..52e025e4 100644
--- a/pbs-client/src/pxar/tools.rs
+++ b/pbs-client/src/pxar/tools.rs
@@ -155,6 +155,7 @@ pub fn format_multi_line_entry(entry: &Entry) -> String {
let meta = entry.metadata();
let (size, link, type_name) = match entry.kind() {
+ EntryKind::Version(version) => (format!("{version:?}"), String::new(), "version"),
EntryKind::File { size, .. } => (format!("{}", *size), String::new(), "file"),
EntryKind::Appendix { total } => (format!("{total}"), String::new(), "appendix"),
EntryKind::AppendixRef {
diff --git a/proxmox-backup-client/src/main.rs b/proxmox-backup-client/src/main.rs
index e5caf87d..f575ccd0 100644
--- a/proxmox-backup-client/src/main.rs
+++ b/proxmox-backup-client/src/main.rs
@@ -993,6 +993,7 @@ async fn create_backup(
patterns: pattern_list.clone(),
entries_max: entries_max as usize,
skip_lost_and_found,
+ previous_ref: None,
};
let upload_options = UploadOptions {
diff --git a/proxmox-restore-daemon/src/proxmox_restore_daemon/api.rs b/proxmox-restore-daemon/src/proxmox_restore_daemon/api.rs
index fb12befa..f89b0ab4 100644
--- a/proxmox-restore-daemon/src/proxmox_restore_daemon/api.rs
+++ b/proxmox-restore-daemon/src/proxmox_restore_daemon/api.rs
@@ -356,6 +356,7 @@ fn extract(
device_set: None,
patterns,
skip_lost_and_found: false,
+ previous_ref: None,
};
let pxar_writer = TokioWriter::new(writer);
diff --git a/pxar-bin/src/main.rs b/pxar-bin/src/main.rs
index bc044035..9376a2c1 100644
--- a/pxar-bin/src/main.rs
+++ b/pxar-bin/src/main.rs
@@ -330,13 +330,6 @@ async fn create_archive(
Some(HashSet::new())
};
- let options = pbs_client::pxar::PxarCreateOptions {
- entries_max: entries_max as usize,
- device_set,
- patterns,
- skip_lost_and_found: false,
- };
-
let source = PathBuf::from(source);
let dir = nix::dir::Dir::open(
@@ -349,7 +342,7 @@ async fn create_archive(
.create_new(true)
.write(true)
.mode(0o640)
- .open(archive)?;
+ .open(archive.clone())?;
let writer = std::io::BufWriter::with_capacity(1024 * 1024, file);
let mut feature_flags = Flags::DEFAULT;
@@ -372,6 +365,14 @@ async fn create_archive(
feature_flags.remove(Flags::WITH_SOCKETS);
}
+ let options = pbs_client::pxar::PxarCreateOptions {
+ entries_max: entries_max as usize,
+ device_set,
+ patterns,
+ skip_lost_and_found: false,
+ previous_ref: None,
+ };
+
let writer = pxar::encoder::sync::StandardWriter::new(writer);
pbs_client::pxar::create_archive(
dir,
diff --git a/src/tape/file_formats/snapshot_archive.rs b/src/tape/file_formats/snapshot_archive.rs
index 252384b5..abeefbd3 100644
--- a/src/tape/file_formats/snapshot_archive.rs
+++ b/src/tape/file_formats/snapshot_archive.rs
@@ -58,8 +58,11 @@ pub fn tape_write_snapshot_archive<'a>(
));
}
- let mut encoder =
- pxar::encoder::sync::Encoder::new(PxarTapeWriter::new(writer), &root_metadata)?;
+ let mut encoder = pxar::encoder::sync::Encoder::new(
+ PxarTapeWriter::new(writer),
+ &root_metadata,
+ pxar::format::FormatVersion::default(),
+ )?;
for filename in file_list.iter() {
let mut file = snapshot_reader.open_file(filename).map_err(|err| {
--
2.39.2
^ permalink raw reply [flat|nested] 30+ messages in thread
* [pbs-devel] [PATCH v6 proxmox-backup 17/29] fix #3174: upload stream: impl reused chunk injector
2024-01-25 13:25 [pbs-devel] [PATCH-SERIES v6 pxar proxmox-backup proxmox-widget-toolkit 0/29] fix #3174: improve file-level backup Christian Ebner
` (15 preceding siblings ...)
2024-01-25 13:25 ` [pbs-devel] [PATCH v6 proxmox-backup 16/29] fix #3174: archiver: store ref to previous backup Christian Ebner
@ 2024-01-25 13:25 ` Christian Ebner
2024-01-25 13:25 ` [pbs-devel] [PATCH v6 proxmox-backup 18/29] fix #3174: chunker: add forced boundaries Christian Ebner
` (11 subsequent siblings)
28 siblings, 0 replies; 30+ messages in thread
From: Christian Ebner @ 2024-01-25 13:25 UTC (permalink / raw)
To: pbs-devel
In order to be included in the backup's index file, the reused chunks
which store the payloads of files skipped during pxar encoding have to
be inserted after the encoder has written the pxar appendix entry type.
The chunker forces a chunk boundary after this marker and queues the
list of chunks to be uploaded thereafter.
This implements the logic to inject the chunks into the chunk upload
stream after such a boundary is requested, by looping over the queued
chunks and inserting them into the stream.
Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
Changes since v5:
- refactor to use AppendableDynamicEntry
- fix formatting via `cargo fmt`
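
The boundary matching in poll_next() distinguishes four cases, modeled
here with simplified types (the real code operates on the queued
InjectChunks entries and the running stream offset):

    #[derive(Debug, PartialEq)]
    enum Action {
        InjectNow,        // boundary sits exactly at the current offset
        InjectAfterChunk, // boundary right after the chunk just produced
        KeepQueued,       // boundary lies further ahead in the stream
        Invalid,          // boundary falls inside the chunk
    }

    fn classify(boundary: u64, offset: u64, chunk_size: u64) -> Action {
        if boundary == offset {
            Action::InjectNow
        } else if boundary == offset + chunk_size {
            Action::InjectAfterChunk
        } else if boundary < offset + chunk_size {
            Action::Invalid
        } else {
            Action::KeepQueued
        }
    }

    fn main() {
        assert_eq!(classify(0, 0, 4096), Action::InjectNow);
        assert_eq!(classify(4096, 0, 4096), Action::InjectAfterChunk);
        assert_eq!(classify(100, 0, 4096), Action::Invalid);
        assert_eq!(classify(8192, 0, 4096), Action::KeepQueued);
    }
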
pbs-client/src/inject_reused_chunks.rs | 152 +++++++++++++++++++++++++
pbs-client/src/lib.rs | 1 +
2 files changed, 153 insertions(+)
create mode 100644 pbs-client/src/inject_reused_chunks.rs
diff --git a/pbs-client/src/inject_reused_chunks.rs b/pbs-client/src/inject_reused_chunks.rs
new file mode 100644
index 00000000..7c0f7780
--- /dev/null
+++ b/pbs-client/src/inject_reused_chunks.rs
@@ -0,0 +1,152 @@
+use std::collections::VecDeque;
+use std::pin::Pin;
+use std::sync::atomic::{AtomicUsize, Ordering};
+use std::sync::{Arc, Mutex};
+use std::task::{Context, Poll};
+
+use anyhow::{anyhow, Error};
+use futures::{ready, Stream};
+use pin_project_lite::pin_project;
+
+use pbs_datastore::dynamic_index::AppendableDynamicEntry;
+
+pin_project! {
+ pub struct InjectReusedChunksQueue<S> {
+ #[pin]
+ input: S,
+ current: Option<InjectChunks>,
+ buffer: Option<bytes::BytesMut>,
+ injection_queue: Arc<Mutex<VecDeque<InjectChunks>>>,
+ stream_len: Arc<AtomicUsize>,
+ reused_len: Arc<AtomicUsize>,
+ index_csum: Arc<Mutex<Option<openssl::sha::Sha256>>>,
+ }
+}
+
+#[derive(Debug)]
+pub struct InjectChunks {
+ pub boundary: u64,
+ pub chunks: Vec<AppendableDynamicEntry>,
+ pub size: usize,
+}
+
+pub enum InjectedChunksInfo {
+ Known(Vec<(u64, [u8; 32])>),
+ Raw((u64, bytes::BytesMut)),
+}
+
+pub trait InjectReusedChunks: Sized {
+ fn inject_reused_chunks(
+ self,
+ injection_queue: Arc<Mutex<VecDeque<InjectChunks>>>,
+ stream_len: Arc<AtomicUsize>,
+ reused_len: Arc<AtomicUsize>,
+ index_csum: Arc<Mutex<Option<openssl::sha::Sha256>>>,
+ ) -> InjectReusedChunksQueue<Self>;
+}
+
+impl<S> InjectReusedChunks for S
+where
+ S: Stream<Item = Result<bytes::BytesMut, Error>>,
+{
+ fn inject_reused_chunks(
+ self,
+ injection_queue: Arc<Mutex<VecDeque<InjectChunks>>>,
+ stream_len: Arc<AtomicUsize>,
+ reused_len: Arc<AtomicUsize>,
+ index_csum: Arc<Mutex<Option<openssl::sha::Sha256>>>,
+ ) -> InjectReusedChunksQueue<Self> {
+ InjectReusedChunksQueue {
+ input: self,
+ current: None,
+ injection_queue,
+ buffer: None,
+ stream_len,
+ reused_len,
+ index_csum,
+ }
+ }
+}
+
+impl<S> Stream for InjectReusedChunksQueue<S>
+where
+ S: Stream<Item = Result<bytes::BytesMut, Error>>,
+{
+ type Item = Result<InjectedChunksInfo, Error>;
+
+ fn poll_next(self: Pin<&mut Self>, cx: &mut Context) -> Poll<Option<Self::Item>> {
+ let mut this = self.project();
+ loop {
+ let current = this.current.take();
+ if let Some(current) = current {
+ let mut chunks = Vec::new();
+ let mut guard = this.index_csum.lock().unwrap();
+ let csum = guard.as_mut().unwrap();
+
+ for chunk in current.chunks {
+ let offset = this
+ .stream_len
+ .fetch_add(chunk.size() as usize, Ordering::SeqCst)
+ as u64;
+ this.reused_len
+ .fetch_add(chunk.size() as usize, Ordering::SeqCst);
+ let digest = chunk.digest();
+ chunks.push((offset, digest));
+ let end_offset = offset + chunk.size();
+ csum.update(&end_offset.to_le_bytes());
+ csum.update(&digest);
+ }
+ let chunk_info = InjectedChunksInfo::Known(chunks);
+ return Poll::Ready(Some(Ok(chunk_info)));
+ }
+
+ let buffer = this.buffer.take();
+ if let Some(buffer) = buffer {
+ let offset = this.stream_len.fetch_add(buffer.len(), Ordering::SeqCst) as u64;
+ let data = InjectedChunksInfo::Raw((offset, buffer));
+ return Poll::Ready(Some(Ok(data)));
+ }
+
+ match ready!(this.input.as_mut().poll_next(cx)) {
+ None => return Poll::Ready(None),
+ Some(Err(err)) => return Poll::Ready(Some(Err(err))),
+ Some(Ok(raw)) => {
+ let chunk_size = raw.len();
+ let offset = this.stream_len.load(Ordering::SeqCst) as u64;
+ let mut injections = this.injection_queue.lock().unwrap();
+
+ if let Some(inject) = injections.pop_front() {
+ if inject.boundary == offset {
+ if this.current.replace(inject).is_some() {
+ return Poll::Ready(Some(Err(anyhow!(
+ "replaced injection queue not empty"
+ ))));
+ }
+ if chunk_size > 0 && this.buffer.replace(raw).is_some() {
+ return Poll::Ready(Some(Err(anyhow!(
+ "replaced buffer not empty"
+ ))));
+ }
+ continue;
+ } else if inject.boundary == offset + chunk_size as u64 {
+ let _ = this.current.insert(inject);
+ } else if inject.boundary < offset + chunk_size as u64 {
+ return Poll::Ready(Some(Err(anyhow!("invalid injection boundary"))));
+ } else {
+ injections.push_front(inject);
+ }
+ }
+
+ if chunk_size == 0 {
+ return Poll::Ready(Some(Err(anyhow!("unexpected empty raw data"))));
+ }
+
+ let offset = this.stream_len.fetch_add(chunk_size, Ordering::SeqCst) as u64;
+ let data = InjectedChunksInfo::Raw((offset, raw));
+
+ return Poll::Ready(Some(Ok(data)));
+ }
+ }
+ }
+ }
+}
diff --git a/pbs-client/src/lib.rs b/pbs-client/src/lib.rs
index 21cf8556..3e7bd2a8 100644
--- a/pbs-client/src/lib.rs
+++ b/pbs-client/src/lib.rs
@@ -7,6 +7,7 @@ pub mod catalog_shell;
pub mod pxar;
pub mod tools;
+mod inject_reused_chunks;
mod merge_known_chunks;
pub mod pipe_to_stream;
--
2.39.2
^ permalink raw reply [flat|nested] 30+ messages in thread
* [pbs-devel] [PATCH v6 proxmox-backup 18/29] fix #3174: chunker: add forced boundaries
2024-01-25 13:25 [pbs-devel] [PATCH-SERIES v6 pxar proxmox-backup proxmox-widget-toolkit 0/29] fix #3174: improve file-level backup Christian Ebner
` (16 preceding siblings ...)
2024-01-25 13:25 ` [pbs-devel] [PATCH v6 proxmox-backup 17/29] fix #3174: upload stream: impl reused chunk injector Christian Ebner
@ 2024-01-25 13:25 ` Christian Ebner
2024-01-25 13:25 ` [pbs-devel] [PATCH v6 proxmox-backup 19/29] fix #3174: backup writer: inject queued chunk in upload stream Christian Ebner
` (10 subsequent siblings)
28 siblings, 0 replies; 30+ messages in thread
From: Christian Ebner @ 2024-01-25 13:25 UTC (permalink / raw)
To: pbs-devel
Allow forcing a boundary while chunking and injecting reused chunks
into the stream. Double-ended queues are used to coordinate the
boundaries and chunk injection between archiver, chunker and uploader.
The archiver gets an interface to request a boundary and push a list of
chunks to inject following that boundary. The chunker reads this queue,
creates the boundary and passes the list of chunks to inject on to the
uploader via a second, dedicated double-ended queue.
Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
Changes since v5:
- fix formatting via `cargo fmt`
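
The core of the forced-boundary path can be modeled without the
surrounding stream machinery. Note how `consumed` also advances by the
size of the injected chunks so later offsets stay in sync with the
archive byte stream (a sketch, using Vec<u8> instead of BytesMut):

    fn force_boundary(
        buffer: &mut Vec<u8>,
        consumed: &mut u64,
        boundary: u64,
        injected_size: u64,
    ) -> Option<Vec<u8>> {
        if boundary > *consumed + buffer.len() as u64 {
            return None; // boundary not buffered yet, chunk normally
        }
        let chunk_size = (boundary - *consumed) as usize;
        let chunk: Vec<u8> = buffer.drain(..chunk_size).collect();
        *consumed += chunk_size as u64 + injected_size;
        Some(chunk)
    }

    fn main() {
        let mut buffer = vec![0u8; 8192];
        let mut consumed = 0u64;
        // boundary at 4096, followed by 1000 bytes of injected chunks
        let chunk = force_boundary(&mut buffer, &mut consumed, 4096, 1000);
        assert_eq!(chunk.map(|c| c.len()), Some(4096));
        assert_eq!(consumed, 5096);
    }
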
examples/test_chunk_speed2.rs | 9 +++-
pbs-client/src/backup_writer.rs | 6 ++-
pbs-client/src/chunk_stream.rs | 42 ++++++++++++++++++-
pbs-client/src/pxar/create.rs | 10 ++++-
pbs-client/src/pxar_backup_stream.rs | 8 +++-
proxmox-backup-client/src/main.rs | 36 ++++++++++++----
.../src/proxmox_restore_daemon/api.rs | 13 +++++-
pxar-bin/src/main.rs | 5 ++-
tests/catar.rs | 3 ++
9 files changed, 115 insertions(+), 17 deletions(-)
diff --git a/examples/test_chunk_speed2.rs b/examples/test_chunk_speed2.rs
index 3f69b436..e8bac726 100644
--- a/examples/test_chunk_speed2.rs
+++ b/examples/test_chunk_speed2.rs
@@ -1,5 +1,7 @@
use anyhow::Error;
use futures::*;
+use std::collections::VecDeque;
+use std::sync::{Arc, Mutex};
extern crate proxmox_backup;
@@ -26,7 +28,12 @@ async fn run() -> Result<(), Error> {
.map_err(Error::from);
//let chunk_stream = FixedChunkStream::new(stream, 4*1024*1024);
- let mut chunk_stream = ChunkStream::new(stream, None);
+ let mut chunk_stream = ChunkStream::new(
+ stream,
+ None,
+ Arc::new(Mutex::new(VecDeque::new())),
+ Arc::new(Mutex::new(VecDeque::new())),
+ );
let start_time = std::time::Instant::now();
diff --git a/pbs-client/src/backup_writer.rs b/pbs-client/src/backup_writer.rs
index 8a03d8ea..cc6dd49a 100644
--- a/pbs-client/src/backup_writer.rs
+++ b/pbs-client/src/backup_writer.rs
@@ -1,4 +1,4 @@
-use std::collections::HashSet;
+use std::collections::{HashSet, VecDeque};
use std::future::Future;
use std::os::unix::fs::OpenOptionsExt;
use std::sync::atomic::{AtomicU64, AtomicUsize, Ordering};
@@ -23,6 +23,7 @@ use pbs_tools::crypt_config::CryptConfig;
use proxmox_human_byte::HumanByte;
+use super::inject_reused_chunks::InjectChunks;
use super::merge_known_chunks::{MergeKnownChunks, MergedChunkInfo};
use super::{H2Client, HttpClient};
@@ -265,6 +266,7 @@ impl BackupWriter {
archive_name: &str,
stream: impl Stream<Item = Result<bytes::BytesMut, Error>>,
options: UploadOptions,
+ injection_queue: Arc<Mutex<VecDeque<InjectChunks>>>,
) -> Result<BackupStats, Error> {
let known_chunks = Arc::new(Mutex::new(HashSet::new()));
@@ -341,6 +343,7 @@ impl BackupWriter {
None
},
options.compress,
+ injection_queue,
)
.await?;
@@ -637,6 +640,7 @@ impl BackupWriter {
known_chunks: Arc<Mutex<HashSet<[u8; 32]>>>,
crypt_config: Option<Arc<CryptConfig>>,
compress: bool,
+ injection_queue: Arc<Mutex<VecDeque<InjectChunks>>>,
) -> impl Future<Output = Result<UploadStats, Error>> {
let total_chunks = Arc::new(AtomicUsize::new(0));
let total_chunks2 = total_chunks.clone();
diff --git a/pbs-client/src/chunk_stream.rs b/pbs-client/src/chunk_stream.rs
index 895f6eae..4a6301c3 100644
--- a/pbs-client/src/chunk_stream.rs
+++ b/pbs-client/src/chunk_stream.rs
@@ -1,4 +1,6 @@
+use std::collections::VecDeque;
use std::pin::Pin;
+use std::sync::{Arc, Mutex};
use std::task::{Context, Poll};
use anyhow::Error;
@@ -8,21 +10,34 @@ use futures::stream::{Stream, TryStream};
use pbs_datastore::Chunker;
+use crate::inject_reused_chunks::InjectChunks;
+
/// Split input stream into dynamic sized chunks
pub struct ChunkStream<S: Unpin> {
input: S,
chunker: Chunker,
buffer: BytesMut,
scan_pos: usize,
+ consumed: u64,
+ boundaries: Arc<Mutex<VecDeque<InjectChunks>>>,
+ injections: Arc<Mutex<VecDeque<InjectChunks>>>,
}
impl<S: Unpin> ChunkStream<S> {
- pub fn new(input: S, chunk_size: Option<usize>) -> Self {
+ pub fn new(
+ input: S,
+ chunk_size: Option<usize>,
+ boundaries: Arc<Mutex<VecDeque<InjectChunks>>>,
+ injections: Arc<Mutex<VecDeque<InjectChunks>>>,
+ ) -> Self {
Self {
input,
chunker: Chunker::new(chunk_size.unwrap_or(4 * 1024 * 1024)),
buffer: BytesMut::new(),
scan_pos: 0,
+ consumed: 0,
+ boundaries,
+ injections,
}
}
}
@@ -40,6 +55,29 @@ where
fn poll_next(self: Pin<&mut Self>, cx: &mut Context) -> Poll<Option<Self::Item>> {
let this = self.get_mut();
loop {
+ {
+ // Make sure to release this lock and don't hold it longer than required
+ let mut boundaries = this.boundaries.lock().unwrap();
+ if let Some(inject) = boundaries.pop_front() {
+ let max = this.consumed + this.buffer.len() as u64;
+ if inject.boundary <= max {
+ let chunk_size = (inject.boundary - this.consumed) as usize;
+ let result = this.buffer.split_to(chunk_size);
+ this.consumed += chunk_size as u64;
+ this.scan_pos = 0;
+
+ // Add the size of the injected chunks to consumed, so chunk stream offsets
+ // are in sync with the rest of the archive.
+ this.consumed += inject.size as u64;
+
+ this.injections.lock().unwrap().push_back(inject);
+
+ return Poll::Ready(Some(Ok(result)));
+ }
+ boundaries.push_front(inject);
+ }
+ }
+
if this.scan_pos < this.buffer.len() {
let boundary = this.chunker.scan(&this.buffer[this.scan_pos..]);
@@ -50,7 +88,9 @@ where
// continue poll
} else if chunk_size <= this.buffer.len() {
let result = this.buffer.split_to(chunk_size);
+ this.consumed += chunk_size as u64;
this.scan_pos = 0;
+
return Poll::Ready(Some(Ok(result)));
} else {
panic!("got unexpected chunk boundary from chunker");
diff --git a/pbs-client/src/pxar/create.rs b/pbs-client/src/pxar/create.rs
index 4f6ffc78..9980c393 100644
--- a/pbs-client/src/pxar/create.rs
+++ b/pbs-client/src/pxar/create.rs
@@ -1,4 +1,4 @@
-use std::collections::{HashMap, HashSet};
+use std::collections::{HashMap, HashSet, VecDeque};
use std::ffi::{CStr, CString, OsStr};
use std::fmt;
use std::io::{self, Read};
@@ -25,8 +25,9 @@ use proxmox_lang::c_str;
use proxmox_sys::fs::{self, acl, xattr};
use pbs_datastore::catalog::{BackupCatalogWriter, CatalogReader};
-use pbs_datastore::dynamic_index::DynamicIndexReader;
+use pbs_datastore::dynamic_index::{DynamicEntry, DynamicIndexReader};
+use crate::inject_reused_chunks::InjectChunks;
use crate::pxar::metadata::errno_is_unsupported;
use crate::pxar::tools::assert_single_path_component;
use crate::pxar::Flags;
@@ -142,6 +143,8 @@ struct Archiver {
hardlinks: HashMap<HardLinkInfo, (PathBuf, LinkOffset)>,
file_copy_buffer: Vec<u8>,
previous_ref: Option<PxarPrevRef>,
+ forced_boundaries: Arc<Mutex<VecDeque<InjectChunks>>>,
+ inject: (usize, Vec<DynamicEntry>),
}
type Encoder<'a, T> = pxar::encoder::aio::Encoder<'a, T>;
@@ -153,6 +156,7 @@ pub async fn create_archive<T, F>(
callback: F,
catalog: Option<Arc<Mutex<dyn BackupCatalogWriter + Send>>>,
options: PxarCreateOptions,
+ forced_boundaries: Arc<Mutex<VecDeque<InjectChunks>>>,
) -> Result<(), Error>
where
T: SeqWrite + Send,
@@ -211,6 +215,8 @@ where
hardlinks: HashMap::new(),
file_copy_buffer: vec::undefined(4 * 1024 * 1024),
previous_ref: options.previous_ref,
+ forced_boundaries,
+ inject: (0, Vec::new()),
};
archiver
diff --git a/pbs-client/src/pxar_backup_stream.rs b/pbs-client/src/pxar_backup_stream.rs
index 22a6ffdc..d18ba470 100644
--- a/pbs-client/src/pxar_backup_stream.rs
+++ b/pbs-client/src/pxar_backup_stream.rs
@@ -1,3 +1,4 @@
+use std::collections::VecDeque;
use std::io::Write;
//use std::os::unix::io::FromRawFd;
use std::path::Path;
@@ -17,6 +18,8 @@ use proxmox_io::StdChannelWriter;
use pbs_datastore::catalog::CatalogWriter;
+use crate::inject_reused_chunks::InjectChunks;
+
/// Stream implementation to encode and upload .pxar archives.
///
/// The hyper client needs an async Stream for file upload, so we
@@ -40,6 +43,7 @@ impl PxarBackupStream {
dir: Dir,
catalog: Arc<Mutex<CatalogWriter<W>>>,
options: crate::pxar::PxarCreateOptions,
+ boundaries: Arc<Mutex<VecDeque<InjectChunks>>>,
) -> Result<Self, Error> {
let (tx, rx) = std::sync::mpsc::sync_channel(10);
@@ -64,6 +68,7 @@ impl PxarBackupStream {
},
Some(catalog),
options,
+ boundaries,
)
.await
{
@@ -87,10 +92,11 @@ impl PxarBackupStream {
dirname: &Path,
catalog: Arc<Mutex<CatalogWriter<W>>>,
options: crate::pxar::PxarCreateOptions,
+ boundaries: Arc<Mutex<VecDeque<InjectChunks>>>,
) -> Result<Self, Error> {
let dir = nix::dir::Dir::open(dirname, OFlag::O_DIRECTORY, Mode::empty())?;
- Self::new(dir, catalog, options)
+ Self::new(dir, catalog, options, boundaries)
}
}
diff --git a/proxmox-backup-client/src/main.rs b/proxmox-backup-client/src/main.rs
index f575ccd0..442616b7 100644
--- a/proxmox-backup-client/src/main.rs
+++ b/proxmox-backup-client/src/main.rs
@@ -1,4 +1,4 @@
-use std::collections::HashSet;
+use std::collections::{HashSet, VecDeque};
use std::io::{self, Read, Seek, SeekFrom, Write};
use std::path::{Path, PathBuf};
use std::pin::Pin;
@@ -192,8 +192,17 @@ async fn backup_directory<P: AsRef<Path>>(
pxar_create_options: pbs_client::pxar::PxarCreateOptions,
upload_options: UploadOptions,
) -> Result<BackupStats, Error> {
- let pxar_stream = PxarBackupStream::open(dir_path.as_ref(), catalog, pxar_create_options)?;
- let mut chunk_stream = ChunkStream::new(pxar_stream, chunk_size);
+ let boundaries = Arc::new(Mutex::new(VecDeque::new()));
+ let pxar_stream = PxarBackupStream::open(
+ dir_path.as_ref(),
+ catalog,
+ pxar_create_options,
+ boundaries.clone(),
+ )?;
+
+ let injections = Arc::new(Mutex::new(VecDeque::new()));
+ let mut chunk_stream =
+ ChunkStream::new(pxar_stream, chunk_size, boundaries, injections.clone());
let (tx, rx) = mpsc::channel(10); // allow to buffer 10 chunks
@@ -211,7 +220,7 @@ async fn backup_directory<P: AsRef<Path>>(
}
let stats = client
- .upload_stream(archive_name, stream, upload_options)
+ .upload_stream(archive_name, stream, upload_options, injections)
.await?;
Ok(stats)
@@ -237,8 +246,9 @@ async fn backup_image<P: AsRef<Path>>(
bail!("cannot backup image with dynamic chunk size!");
}
+ let injection_queue = Arc::new(Mutex::new(VecDeque::new()));
let stats = client
- .upload_stream(archive_name, stream, upload_options)
+ .upload_stream(archive_name, stream, upload_options, injection_queue)
.await?;
Ok(stats)
@@ -529,7 +539,14 @@ fn spawn_catalog_upload(
let (catalog_tx, catalog_rx) = std::sync::mpsc::sync_channel(10); // allow to buffer 10 writes
let catalog_stream = proxmox_async::blocking::StdChannelStream(catalog_rx);
let catalog_chunk_size = 512 * 1024;
- let catalog_chunk_stream = ChunkStream::new(catalog_stream, Some(catalog_chunk_size));
+ let boundaries = Arc::new(Mutex::new(VecDeque::new()));
+ let injections = Arc::new(Mutex::new(VecDeque::new()));
+ let catalog_chunk_stream = ChunkStream::new(
+ catalog_stream,
+ Some(catalog_chunk_size),
+ boundaries,
+ injections.clone(),
+ );
let catalog_writer = Arc::new(Mutex::new(CatalogWriter::new(TokioWriterAdapter::new(
StdChannelWriter::new(catalog_tx),
@@ -545,7 +562,12 @@ fn spawn_catalog_upload(
tokio::spawn(async move {
let catalog_upload_result = client
- .upload_stream(CATALOG_NAME, catalog_chunk_stream, upload_options)
+ .upload_stream(
+ CATALOG_NAME,
+ catalog_chunk_stream,
+ upload_options,
+ injections,
+ )
.await;
if let Err(ref err) = catalog_upload_result {
diff --git a/proxmox-restore-daemon/src/proxmox_restore_daemon/api.rs b/proxmox-restore-daemon/src/proxmox_restore_daemon/api.rs
index f89b0ab4..5eff673e 100644
--- a/proxmox-restore-daemon/src/proxmox_restore_daemon/api.rs
+++ b/proxmox-restore-daemon/src/proxmox_restore_daemon/api.rs
@@ -1,8 +1,10 @@
///! File-restore API running inside the restore VM
+use std::collections::VecDeque;
use std::ffi::OsStr;
use std::fs;
use std::os::unix::ffi::OsStrExt;
use std::path::{Path, PathBuf};
+use std::sync::{Arc, Mutex};
use anyhow::{bail, Error};
use futures::FutureExt;
@@ -360,8 +362,15 @@ fn extract(
};
let pxar_writer = TokioWriter::new(writer);
- create_archive(dir, pxar_writer, Flags::DEFAULT, |_| Ok(()), None, options)
- .await
+ create_archive(
+ dir,
+ pxar_writer,
+ Flags::DEFAULT,
+ |_| Ok(()),
+ None,
+ options,
+ Arc::new(Mutex::new(VecDeque::new())),
+ ).await
}
.await;
if let Err(err) = result {
diff --git a/pxar-bin/src/main.rs b/pxar-bin/src/main.rs
index 9376a2c1..c019f3e4 100644
--- a/pxar-bin/src/main.rs
+++ b/pxar-bin/src/main.rs
@@ -1,10 +1,10 @@
-use std::collections::HashSet;
+use std::collections::{HashSet, VecDeque};
use std::ffi::OsStr;
use std::fs::OpenOptions;
use std::os::unix::fs::OpenOptionsExt;
use std::path::{Path, PathBuf};
use std::sync::atomic::{AtomicBool, Ordering};
-use std::sync::Arc;
+use std::sync::{Arc, Mutex};
use anyhow::{bail, format_err, Error};
use futures::future::FutureExt;
@@ -384,6 +384,7 @@ async fn create_archive(
},
None,
options,
+ Arc::new(Mutex::new(VecDeque::new())),
)
.await?;
diff --git a/tests/catar.rs b/tests/catar.rs
index 36bb4f3b..bb9da2fa 100644
--- a/tests/catar.rs
+++ b/tests/catar.rs
@@ -1,4 +1,6 @@
+use std::collections::VecDeque;
use std::process::Command;
+use std::sync::{Arc, Mutex};
use anyhow::Error;
@@ -40,6 +42,7 @@ fn run_test(dir_name: &str) -> Result<(), Error> {
|_| Ok(()),
None,
options,
+ Arc::new(Mutex::new(VecDeque::new())),
))?;
Command::new("cmp")
--
2.39.2
^ permalink raw reply [flat|nested] 30+ messages in thread
* [pbs-devel] [PATCH v6 proxmox-backup 19/29] fix #3174: backup writer: inject queued chunk in upload stream
2024-01-25 13:25 [pbs-devel] [PATCH-SERIES v6 pxar proxmox-backup proxmox-widget-toolkit 0/29] fix #3174: improve file-level backup Christian Ebner
` (17 preceding siblings ...)
2024-01-25 13:25 ` [pbs-devel] [PATCH v6 proxmox-backup 18/29] fix #3174: chunker: add forced boundaries Christian Ebner
@ 2024-01-25 13:25 ` Christian Ebner
2024-01-25 13:25 ` [pbs-devel] [PATCH v6 proxmox-backup 20/29] fix #3174: archiver: reuse files with unchanged metadata Christian Ebner
` (9 subsequent siblings)
28 siblings, 0 replies; 30+ messages in thread
From: Christian Ebner @ 2024-01-25 13:25 UTC (permalink / raw)
To: pbs-devel
Inject the queued chunks into the backup writer's upload stream,
thereby including them in the index file.
Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
Changes since v5:
- no changes
pbs-client/src/backup_writer.rs | 85 +++++++++++++++++++--------------
1 file changed, 50 insertions(+), 35 deletions(-)
diff --git a/pbs-client/src/backup_writer.rs b/pbs-client/src/backup_writer.rs
index cc6dd49a..e8c2ec11 100644
--- a/pbs-client/src/backup_writer.rs
+++ b/pbs-client/src/backup_writer.rs
@@ -23,7 +23,7 @@ use pbs_tools::crypt_config::CryptConfig;
use proxmox_human_byte::HumanByte;
-use super::inject_reused_chunks::InjectChunks;
+use super::inject_reused_chunks::{InjectChunks, InjectReusedChunks, InjectedChunksInfo};
use super::merge_known_chunks::{MergeKnownChunks, MergedChunkInfo};
use super::{H2Client, HttpClient};
@@ -667,48 +667,63 @@ impl BackupWriter {
let index_csum_2 = index_csum.clone();
stream
- .and_then(move |data| {
- let chunk_len = data.len();
+ .inject_reused_chunks(
+ injection_queue,
+ stream_len,
+ reused_len.clone(),
+ index_csum.clone(),
+ )
+ .and_then(move |chunk_info| match chunk_info {
+ InjectedChunksInfo::Known(chunks) => {
+ total_chunks.fetch_add(chunks.len(), Ordering::SeqCst);
+ future::ok(MergedChunkInfo::Known(chunks))
+ }
+ InjectedChunksInfo::Raw((offset, data)) => {
+ let chunk_len = data.len();
- total_chunks.fetch_add(1, Ordering::SeqCst);
- let offset = stream_len.fetch_add(chunk_len, Ordering::SeqCst) as u64;
+ total_chunks.fetch_add(1, Ordering::SeqCst);
- let mut chunk_builder = DataChunkBuilder::new(data.as_ref()).compress(compress);
+ let mut chunk_builder = DataChunkBuilder::new(data.as_ref()).compress(compress);
- if let Some(ref crypt_config) = crypt_config {
- chunk_builder = chunk_builder.crypt_config(crypt_config);
- }
+ if let Some(ref crypt_config) = crypt_config {
+ chunk_builder = chunk_builder.crypt_config(crypt_config);
+ }
- let mut known_chunks = known_chunks.lock().unwrap();
- let digest = chunk_builder.digest();
+ let mut known_chunks = known_chunks.lock().unwrap();
- let mut guard = index_csum.lock().unwrap();
- let csum = guard.as_mut().unwrap();
+ let digest = chunk_builder.digest();
- let chunk_end = offset + chunk_len as u64;
+ let mut guard = index_csum.lock().unwrap();
+ let csum = guard.as_mut().unwrap();
- if !is_fixed_chunk_size {
- csum.update(&chunk_end.to_le_bytes());
- }
- csum.update(digest);
+ let chunk_end = offset + chunk_len as u64;
- let chunk_is_known = known_chunks.contains(digest);
- if chunk_is_known {
- known_chunk_count.fetch_add(1, Ordering::SeqCst);
- reused_len.fetch_add(chunk_len, Ordering::SeqCst);
- future::ok(MergedChunkInfo::Known(vec![(offset, *digest)]))
- } else {
- let compressed_stream_len2 = compressed_stream_len.clone();
- known_chunks.insert(*digest);
- future::ready(chunk_builder.build().map(move |(chunk, digest)| {
- compressed_stream_len2.fetch_add(chunk.raw_size(), Ordering::SeqCst);
- MergedChunkInfo::New(ChunkInfo {
- chunk,
- digest,
- chunk_len: chunk_len as u64,
- offset,
- })
- }))
+ if !is_fixed_chunk_size {
+ csum.update(&chunk_end.to_le_bytes());
+ }
+ csum.update(digest);
+
+ let chunk_is_known = known_chunks.contains(digest);
+ if chunk_is_known {
+ known_chunk_count.fetch_add(1, Ordering::SeqCst);
+ reused_len.fetch_add(chunk_len, Ordering::SeqCst);
+
+ future::ok(MergedChunkInfo::Known(vec![(offset, *digest)]))
+ } else {
+ let compressed_stream_len2 = compressed_stream_len.clone();
+ known_chunks.insert(*digest);
+
+ future::ready(chunk_builder.build().map(move |(chunk, digest)| {
+ compressed_stream_len2.fetch_add(chunk.raw_size(), Ordering::SeqCst);
+
+ MergedChunkInfo::New(ChunkInfo {
+ chunk,
+ digest,
+ chunk_len: chunk_len as u64,
+ offset,
+ })
+ }))
+ }
}
})
.merge_known_chunks()
--
2.39.2
^ permalink raw reply [flat|nested] 30+ messages in thread
* [pbs-devel] [PATCH v6 proxmox-backup 20/29] fix #3174: archiver: reuse files with unchanged metadata
2024-01-25 13:25 [pbs-devel] [PATCH-SERIES v6 pxar proxmox-backup proxmox-widget-toolkit 0/29] fix #3174: improve file-level backup Christian Ebner
` (18 preceding siblings ...)
2024-01-25 13:25 ` [pbs-devel] [PATCH v6 proxmox-backup 19/29] fix #3174: backup writer: inject queued chunk in upload stream Christian Ebner
@ 2024-01-25 13:25 ` Christian Ebner
2024-01-25 13:26 ` [pbs-devel] [PATCH v6 proxmox-backup 21/29] fix #3174: specs: add backup detection mode specification Christian Ebner
` (8 subsequent siblings)
28 siblings, 0 replies; 30+ messages in thread
From: Christian Ebner @ 2024-01-25 13:25 UTC (permalink / raw)
To: pbs-devel
During pxar archive encoding, check regular files against the
metadata stored in the previous backup's catalog, if present.
Instead of re-encoding files with unchanged metadata and a file size
above a given threshold, mark the entries as appendix references in
the pxar archive and append the chunks containing the file payloads
to the appendix.
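The decision itself boils down to a lookup-and-compare against the previous catalog entry. A minimal sketch of that check, using simplified stand-in types rather than the actual catalog structs:

    /// Stand-in for the catalog metadata relevant to the reuse check.
    struct PrevFileInfo {
        size: u64,
        ctime: i64,
    }

    /// Reuse is only possible if a previous catalog entry exists and neither
    /// ctime nor size changed, mirroring reuse_if_metadata_unchanged.
    fn can_reuse(prev: Option<&PrevFileInfo>, st_size: u64, st_ctime: i64) -> bool {
        match prev {
            Some(p) => p.ctime == st_ctime && p.size == st_size,
            None => false,
        }
    }

    fn main() {
        let prev = PrevFileInfo { size: 4096, ctime: 1_700_000_000 };
        assert!(can_reuse(Some(&prev), 4096, 1_700_000_000)); // unchanged: reuse
        assert!(!can_reuse(Some(&prev), 4097, 1_700_000_000)); // size changed: re-encode
    }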
Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
Changes since v5:
- fix formatting using `cargo fmt`
pbs-client/src/pxar/create.rs | 312 +++++++++++++++++-
proxmox-backup-client/src/main.rs | 8 +-
.../src/proxmox_restore_daemon/api.rs | 1 +
pxar-bin/src/main.rs | 1 +
src/tape/file_formats/snapshot_archive.rs | 2 +-
5 files changed, 303 insertions(+), 21 deletions(-)
diff --git a/pbs-client/src/pxar/create.rs b/pbs-client/src/pxar/create.rs
index 9980c393..a8e70651 100644
--- a/pbs-client/src/pxar/create.rs
+++ b/pbs-client/src/pxar/create.rs
@@ -17,21 +17,26 @@ use nix::sys::stat::{FileStat, Mode};
use pathpatterns::{MatchEntry, MatchFlag, MatchList, MatchType, PatternFlag};
use proxmox_sys::error::SysError;
-use pxar::encoder::{LinkOffset, SeqWrite};
+use pxar::encoder::{AppendixRefOffset, LinkOffset, SeqWrite};
use pxar::Metadata;
use proxmox_io::vec;
use proxmox_lang::c_str;
use proxmox_sys::fs::{self, acl, xattr};
-use pbs_datastore::catalog::{BackupCatalogWriter, CatalogReader};
-use pbs_datastore::dynamic_index::{DynamicEntry, DynamicIndexReader};
+use pbs_datastore::catalog::{
+ AppendixStartOffset, BackupCatalogWriter, CatalogReader, CatalogV2Extension, DirEntry,
+ DirEntryAttribute,
+};
+use pbs_datastore::dynamic_index::{AppendableDynamicEntry, DynamicIndexReader};
use crate::inject_reused_chunks::InjectChunks;
use crate::pxar::metadata::errno_is_unsupported;
use crate::pxar::tools::assert_single_path_component;
use crate::pxar::Flags;
+const MAX_FILE_SIZE: u64 = 1024;
+
/// Pxar options for creating a pxar archive/stream
#[derive(Default)]
pub struct PxarCreateOptions {
@@ -45,6 +50,8 @@ pub struct PxarCreateOptions {
pub skip_lost_and_found: bool,
/// Reference state for partial backups
pub previous_ref: Option<PxarPrevRef>,
+ /// Catalog archive name
+ pub archive_name: Option<CString>,
}
/// Contains stateful information about previous backup snapshots for partial backups
@@ -127,6 +134,93 @@ struct HardLinkInfo {
st_ino: u64,
}
+struct Appendix {
+ total: AppendixRefOffset,
+ chunks: Vec<AppendableDynamicEntry>,
+}
+
+impl Appendix {
+ fn new() -> Self {
+ Self {
+ total: AppendixRefOffset::default(),
+ chunks: Vec::new(),
+ }
+ }
+
+ fn is_empty(&self) -> bool {
+ self.chunks.is_empty()
+ }
+
+ fn insert(
+ &mut self,
+ indices: Vec<AppendableDynamicEntry>,
+ start_padding: u64,
+ ) -> AppendixRefOffset {
+ if let Some(offset) = self.digest_sequence_contained(&indices) {
+ AppendixRefOffset::default().add(offset + start_padding)
+ } else if let Some(offset) = self.last_digest_matched(&indices) {
+ for chunk in indices.into_iter().skip(1) {
+ self.total = self.total.add(chunk.size());
+ self.chunks.push(chunk);
+ }
+ AppendixRefOffset::default().add(offset + start_padding)
+ } else {
+ let offset = self.total;
+ for chunk in indices.into_iter() {
+ self.total = self.total.add(chunk.size());
+ self.chunks.push(chunk);
+ }
+ offset.add(start_padding)
+ }
+ }
+
+ fn digest_sequence_contained(&self, indices: &[AppendableDynamicEntry]) -> Option<u64> {
+ let digest = if let Some(first) = indices.first() {
+ first.digest()
+ } else {
+ return None;
+ };
+
+ let mut offset = 0;
+ let mut iter = self.chunks.iter();
+ while let Some(position) = iter.position(|e| {
+ offset += e.size();
+ e.digest() == digest
+ }) {
+ if indices.len() + position > self.chunks.len() {
+ return None;
+ }
+
+ for (ind, chunk) in indices.iter().skip(1).enumerate() {
+ if chunk.digest() != self.chunks[ind + position].digest() {
+ return None;
+ }
+ }
+
+ offset -= self.chunks[position].size();
+ return Some(offset);
+ }
+
+ None
+ }
+
+ fn last_digest_matched(&self, indices: &[AppendableDynamicEntry]) -> Option<u64> {
+ let digest = if let Some(first) = indices.first() {
+ first.digest()
+ } else {
+ return None;
+ };
+
+ if let Some(last) = self.chunks.last() {
+ if last.digest() == digest {
+ return Some(self.total.raw() - last.size());
+ }
+ }
+
+ None
+ }
+}
+
struct Archiver {
feature_flags: Flags,
fs_feature_flags: Flags,
@@ -144,7 +238,8 @@ struct Archiver {
file_copy_buffer: Vec<u8>,
previous_ref: Option<PxarPrevRef>,
forced_boundaries: Arc<Mutex<VecDeque<InjectChunks>>>,
- inject: (usize, Vec<DynamicEntry>),
+ appendix: Appendix,
+ prev_appendix: Option<AppendixStartOffset>,
}
type Encoder<'a, T> = pxar::encoder::aio::Encoder<'a, T>;
@@ -155,7 +250,7 @@ pub async fn create_archive<T, F>(
feature_flags: Flags,
callback: F,
catalog: Option<Arc<Mutex<dyn BackupCatalogWriter + Send>>>,
- options: PxarCreateOptions,
+ mut options: PxarCreateOptions,
forced_boundaries: Arc<Mutex<VecDeque<InjectChunks>>>,
) -> Result<(), Error>
where
@@ -200,6 +295,22 @@ where
)?);
}
+ let (appendix_start, prev_cat_parent) = if let Some(ref mut prev_ref) = options.previous_ref {
+ let entry = prev_ref
+ .catalog
+ .lookup_recursive(prev_ref.archive_name.as_bytes())?;
+ let parent = match entry.attr {
+ DirEntryAttribute::Archive { .. } => Some(entry),
+ _ => None,
+ };
+ let appendix_start = prev_ref
+ .catalog
+ .appendix_offset(prev_ref.archive_name.as_bytes())?;
+ (appendix_start, parent)
+ } else {
+ (None, None)
+ };
+
let mut archiver = Archiver {
feature_flags,
fs_feature_flags,
@@ -216,13 +327,42 @@ where
file_copy_buffer: vec::undefined(4 * 1024 * 1024),
previous_ref: options.previous_ref,
forced_boundaries,
- inject: (0, Vec::new()),
+ appendix: Appendix::new(),
+ prev_appendix: appendix_start,
};
+ if let Some(ref mut catalog) = archiver.catalog {
+ if let Some(ref archive_name) = options.archive_name {
+ catalog
+ .lock()
+ .unwrap()
+ .start_archive(archive_name.as_c_str())?;
+ }
+ }
+
archiver
- .archive_dir_contents(&mut encoder, source_dir, true)
+ .archive_dir_contents(&mut encoder, source_dir, prev_cat_parent.as_ref(), true)
.await?;
- encoder.finish().await?;
+
+ if archiver.appendix.is_empty() {
+ encoder.finish(None).await?;
+ if let Some(ref mut catalog) = archiver.catalog {
+ if options.archive_name.is_some() {
+ catalog.lock().unwrap().end_archive(None)?;
+ }
+ }
+ } else {
+ let (appendix_offset, appendix_size) = archiver.add_appendix(&mut encoder).await?;
+ encoder
+ .finish(Some((appendix_offset, appendix_size)))
+ .await?;
+ if let Some(catalog) = archiver.catalog {
+ if options.archive_name.is_some() {
+ catalog.lock().unwrap().end_archive(Some(appendix_offset))?;
+ }
+ }
+ }
+
Ok(())
}
@@ -251,6 +391,7 @@ impl Archiver {
&'a mut self,
encoder: &'a mut Encoder<'b, T>,
mut dir: Dir,
+ prev_cat_parent: Option<&'a DirEntry>,
is_root: bool,
) -> BoxFuture<'a, Result<(), Error>> {
async move {
@@ -273,6 +414,13 @@ impl Archiver {
let old_path = std::mem::take(&mut self.path);
+ let mut prev_catalog_dir_entries = Vec::new();
+ if let Some(ref mut prev_ref) = self.previous_ref {
+ if let Some(ref parent) = prev_cat_parent {
+ prev_catalog_dir_entries = prev_ref.catalog.read_dir(parent)?;
+ }
+ }
+
for file_entry in file_list {
let file_name = file_entry.name.to_bytes();
@@ -284,9 +432,15 @@ impl Archiver {
(self.callback)(&file_entry.path)?;
self.path = file_entry.path;
- self.add_entry(encoder, dir_fd, &file_entry.name, &file_entry.stat)
- .await
- .map_err(|err| self.wrap_err(err))?;
+ self.add_entry(
+ encoder,
+ dir_fd,
+ &file_entry.name,
+ &file_entry.stat,
+ &prev_catalog_dir_entries,
+ )
+ .await
+ .map_err(|err| self.wrap_err(err))?;
}
self.path = old_path;
self.entry_counter = entry_counter;
@@ -533,12 +687,119 @@ impl Archiver {
Ok(())
}
+ async fn add_appendix<T: SeqWrite + Send>(
+ &mut self,
+ encoder: &mut Encoder<'_, T>,
+ ) -> Result<(pxar::encoder::AppendixStartOffset, AppendixRefOffset), Error> {
+ let total = self.appendix.total;
+ let appendix_offset = encoder.add_appendix(total).await?;
+ let mut boundaries = self.forced_boundaries.lock().unwrap();
+ let mut position = unsafe { encoder.position_add(0) };
+
+ // Inject reused chunks in batches of 128 to not exceed the upload post request size limit
+ for chunks in self.appendix.chunks.chunks(128) {
+ let size = chunks
+ .iter()
+ .fold(0, |sum, chunk| sum + chunk.size() as usize);
+ let inject_chunks = InjectChunks {
+ boundary: position,
+ chunks: chunks.to_vec(),
+ size,
+ };
+ boundaries.push_back(inject_chunks);
+ position = unsafe { encoder.position_add(size as u64) };
+ }
+
+ Ok((appendix_offset, total))
+ }
+
+ async fn reuse_if_metadata_unchanged<T: SeqWrite + Send>(
+ &mut self,
+ encoder: &mut Encoder<'_, T>,
+ c_file_name: &CStr,
+ metadata: &Metadata,
+ stat: &FileStat,
+ catalog_entries: &[DirEntry],
+ ) -> Result<bool, Error> {
+ let prev_ref = match self.previous_ref {
+ None => return Ok(false),
+ Some(ref mut prev_ref) => prev_ref,
+ };
+
+ let catalog_entry = catalog_entries
+ .iter()
+ .find(|entry| entry.name == c_file_name.to_bytes());
+
+ let (size, ctime, start_offset) = match catalog_entry {
+ Some(&DirEntry {
+ attr:
+ DirEntryAttribute::File {
+ size,
+ mtime: _,
+ extension: Some(CatalogV2Extension { ctime, file_offset }),
+ },
+ ..
+ }) => (size, ctime, file_offset.raw()),
+ Some(&DirEntry {
+ attr:
+ DirEntryAttribute::AppendixRef {
+ size,
+ mtime: _,
+ ctime,
+ appendix_ref_offset,
+ },
+ ..
+ }) => (
+ size,
+ ctime,
+ self.prev_appendix.unwrap().raw() + appendix_ref_offset.raw(),
+ ),
+ // The entry type found in the catalog is not a regular file,
+ // do not reuse the entry but rather encode it again.
+ _ => return Ok(false),
+ };
+
+ let file_size = stat.st_size as u64;
+ if ctime != stat.st_ctime || size != file_size {
+ return Ok(false);
+ }
+
+ // Since ctime is unchanged, use the current metadata to calculate the encoded size and
+ // thereby the end offset of the file payload in the reference archive.
+ // This is required to find the existing indexes to be included in the new index file.
+ let mut bytes = pxar::encoder::encoded_size(c_file_name, metadata).await?;
+ // payload header size
+ bytes += std::mem::size_of::<pxar::format::Header>() as u64;
+
+ let end_offset = start_offset + bytes + file_size;
+ let (indices, start_padding, end_padding) =
+ prev_ref.index.indices(start_offset, end_offset)?;
+
+ let appendix_ref_offset = self.appendix.insert(indices, start_padding);
+ let file_name: &Path = OsStr::from_bytes(c_file_name.to_bytes()).as_ref();
+ self.add_appendix_ref(encoder, file_name, appendix_ref_offset, file_size)
+ .await?;
+
+ if let Some(ref catalog) = self.catalog {
+ catalog.lock().unwrap().add_appendix_ref(
+ c_file_name,
+ file_size,
+ stat.st_mtime,
+ stat.st_ctime,
+ appendix_ref_offset,
+ )?;
+ }
+
+ Ok(true)
+ }
+
async fn add_entry<T: SeqWrite + Send>(
&mut self,
encoder: &mut Encoder<'_, T>,
parent: RawFd,
c_file_name: &CStr,
stat: &FileStat,
+ prev_cat_entries: &[DirEntry],
) -> Result<(), Error> {
use pxar::format::mode;
@@ -599,6 +860,20 @@ impl Archiver {
}
let file_size = stat.st_size as u64;
+ if file_size > MAX_FILE_SIZE
+ && self
+ .reuse_if_metadata_unchanged(
+ encoder,
+ c_file_name,
+ &metadata,
+ stat,
+ prev_cat_entries,
+ )
+ .await?
+ {
+ return Ok(());
+ }
+
let offset: LinkOffset = self
.add_regular_file(encoder, fd, file_name, &metadata, file_size)
.await?;
@@ -626,8 +901,15 @@ impl Archiver {
if let Some(ref catalog) = self.catalog {
catalog.lock().unwrap().start_directory(c_file_name)?;
}
+ let prev_cat_entry = prev_cat_entries.iter().find(|entry| match entry {
+ DirEntry {
+ name,
+ attr: DirEntryAttribute::Directory { .. },
+ } => name == c_file_name.to_bytes(),
+ _ => false,
+ });
let result = self
- .add_directory(encoder, dir, c_file_name, &metadata, stat)
+ .add_directory(encoder, dir, c_file_name, &metadata, stat, prev_cat_entry)
.await;
if let Some(ref catalog) = self.catalog {
catalog.lock().unwrap().end_directory()?;
@@ -684,6 +966,7 @@ impl Archiver {
dir_name: &CStr,
metadata: &Metadata,
stat: &FileStat,
+ prev_cat_parent: Option<&DirEntry>,
) -> Result<(), Error> {
let dir_name = OsStr::from_bytes(dir_name.to_bytes());
@@ -710,14 +993,15 @@ impl Archiver {
log::info!("skipping mount point: {:?}", self.path);
Ok(())
} else {
- self.archive_dir_contents(&mut encoder, dir, false).await
+ self.archive_dir_contents(&mut encoder, dir, prev_cat_parent, false)
+ .await
};
self.fs_magic = old_fs_magic;
self.fs_feature_flags = old_fs_feature_flags;
self.current_st_dev = old_st_dev;
- encoder.finish().await?;
+ encoder.finish(None).await?;
result
}
diff --git a/proxmox-backup-client/src/main.rs b/proxmox-backup-client/src/main.rs
index 442616b7..a087b43b 100644
--- a/proxmox-backup-client/src/main.rs
+++ b/proxmox-backup-client/src/main.rs
@@ -48,7 +48,7 @@ use pbs_client::{
FixedChunkStream, HttpClient, PxarBackupStream, RemoteChunkReader, UploadOptions,
BACKUP_SOURCE_SCHEMA,
};
-use pbs_datastore::catalog::{BackupCatalogWriter, CatalogReader, CatalogWriter};
+use pbs_datastore::catalog::{CatalogReader, CatalogWriter};
use pbs_datastore::chunk_store::verify_chunk_size;
use pbs_datastore::dynamic_index::{BufferedDynamicReader, DynamicIndexReader};
use pbs_datastore::fixed_index::FixedIndexReader;
@@ -1005,10 +1005,6 @@ async fn create_backup(
let catalog = catalog.as_ref().unwrap();
log_file("directory", &filename, &target);
- catalog
- .lock()
- .unwrap()
- .start_directory(std::ffi::CString::new(target.as_str())?.as_c_str())?;
let pxar_options = pbs_client::pxar::PxarCreateOptions {
device_set: devices.clone(),
@@ -1016,6 +1012,7 @@ async fn create_backup(
entries_max: entries_max as usize,
skip_lost_and_found,
previous_ref: None,
+ archive_name: Some(std::ffi::CString::new(target.as_str())?),
};
let upload_options = UploadOptions {
@@ -1036,7 +1033,6 @@ async fn create_backup(
)
.await?;
manifest.add_file(target, stats.size, stats.csum, crypto.mode)?;
- catalog.lock().unwrap().end_directory()?;
}
(BackupSpecificationType::IMAGE, false) => {
log_file("image", &filename, &target);
diff --git a/proxmox-restore-daemon/src/proxmox_restore_daemon/api.rs b/proxmox-restore-daemon/src/proxmox_restore_daemon/api.rs
index 5eff673e..734fa976 100644
--- a/proxmox-restore-daemon/src/proxmox_restore_daemon/api.rs
+++ b/proxmox-restore-daemon/src/proxmox_restore_daemon/api.rs
@@ -359,6 +359,7 @@ fn extract(
patterns,
skip_lost_and_found: false,
previous_ref: None,
+ archive_name: None,
};
let pxar_writer = TokioWriter::new(writer);
diff --git a/pxar-bin/src/main.rs b/pxar-bin/src/main.rs
index c019f3e4..6d365115 100644
--- a/pxar-bin/src/main.rs
+++ b/pxar-bin/src/main.rs
@@ -371,6 +371,7 @@ async fn create_archive(
patterns,
skip_lost_and_found: false,
previous_ref: None,
+ archive_name: None,
};
let writer = pxar::encoder::sync::StandardWriter::new(writer);
diff --git a/src/tape/file_formats/snapshot_archive.rs b/src/tape/file_formats/snapshot_archive.rs
index abeefbd3..686eddc1 100644
--- a/src/tape/file_formats/snapshot_archive.rs
+++ b/src/tape/file_formats/snapshot_archive.rs
@@ -91,7 +91,7 @@ pub fn tape_write_snapshot_archive<'a>(
proxmox_lang::io_bail!("file '{}' shrunk while reading", filename);
}
}
- encoder.finish()?;
+ encoder.finish(None)?;
Ok(())
});
--
2.39.2
^ permalink raw reply [flat|nested] 30+ messages in thread
* [pbs-devel] [PATCH v6 proxmox-backup 21/29] fix #3174: specs: add backup detection mode specification
2024-01-25 13:25 [pbs-devel] [PATCH-SERIES v6 pxar proxmox-backup proxmox-widget-toolkit 0/29] fix #3174: improve file-level backup Christian Ebner
` (19 preceding siblings ...)
2024-01-25 13:25 ` [pbs-devel] [PATCH v6 proxmox-backup 20/29] fix #3174: archiver: reuse files with unchanged metadata Christian Ebner
@ 2024-01-25 13:26 ` Christian Ebner
2024-01-25 13:26 ` [pbs-devel] [PATCH v6 proxmox-backup 22/29] fix #3174: client: Add detection mode to backup creation Christian Ebner
` (7 subsequent siblings)
28 siblings, 0 replies; 30+ messages in thread
From: Christian Ebner @ 2024-01-25 13:26 UTC (permalink / raw)
To: pbs-devel
Adds the specification for switching the detection mode used to
identify regular files which changed since a reference backup run.
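For illustration, the grammar accepted by DETECTION_MODE_REGEX can be re-implemented without the regex machinery roughly as follows (a simplified sketch, not the actual schema code):

    #[derive(Debug, PartialEq)]
    enum DetectionMode {
        Data,
        Metadata(Vec<String>),
    }

    /// Accepts "data", "metadata" or "metadata:<label>.pxar[:<label>.pxar ...]".
    fn parse(value: &str) -> Option<DetectionMode> {
        if value == "data" {
            return Some(DetectionMode::Data);
        }
        let mut parts = value.split(':');
        if parts.next() != Some("metadata") {
            return None;
        }
        let mut archives = Vec::new();
        for label in parts {
            // the real schema enforces this suffix via DETECTION_MODE_REGEX
            if !label.ends_with(".pxar") {
                return None;
            }
            // as in the patch, labels are mapped to the matching dynamic index file
            archives.push(format!("{label}.didx"));
        }
        Some(DetectionMode::Metadata(archives))
    }

    fn main() {
        assert_eq!(parse("data"), Some(DetectionMode::Data));
        assert_eq!(
            parse("metadata:root.pxar"),
            Some(DetectionMode::Metadata(vec!["root.pxar.didx".into()]))
        );
        assert_eq!(parse("bogus"), None);
    }

An empty archive list ("metadata" without labels) means the metadata mode applies to all pxar archives of the backup, matching the archives.is_empty() check in the client.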
Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
Changes since v5:
- fix formatting using `cargo fmt`
pbs-client/src/backup_specification.rs | 33 ++++++++++++++++++++++++++
1 file changed, 33 insertions(+)
diff --git a/pbs-client/src/backup_specification.rs b/pbs-client/src/backup_specification.rs
index 619a3a9d..3db1a7a4 100644
--- a/pbs-client/src/backup_specification.rs
+++ b/pbs-client/src/backup_specification.rs
@@ -4,6 +4,7 @@ use proxmox_schema::*;
const_regex! {
BACKUPSPEC_REGEX = r"^([a-zA-Z0-9_-]+\.(pxar|img|conf|log)):(.+)$";
+ DETECTION_MODE_REGEX = r"^(data|metadata(:[a-zA-Z0-9_-]+\.pxar)*)$";
}
pub const BACKUP_SOURCE_SCHEMA: Schema =
@@ -11,6 +12,11 @@ pub const BACKUP_SOURCE_SCHEMA: Schema =
.format(&ApiStringFormat::Pattern(&BACKUPSPEC_REGEX))
.schema();
+pub const BACKUP_DETECTION_MODE_SPEC: Schema =
+ StringSchema::new("Backup source specification ([data|metadata(:<label>,...)]).")
+ .format(&ApiStringFormat::Pattern(&DETECTION_MODE_REGEX))
+ .schema();
+
pub enum BackupSpecificationType {
PXAR,
IMAGE,
@@ -45,3 +51,30 @@ pub fn parse_backup_specification(value: &str) -> Result<BackupSpecification, Er
bail!("unable to parse backup source specification '{}'", value);
}
+
+pub enum BackupDetectionMode {
+ Data,
+ Metadata(Vec<String>),
+}
+
+pub fn parse_backup_detection_mode_specification(
+ value: &str,
+) -> Result<BackupDetectionMode, Error> {
+ if let Some(caps) = (DETECTION_MODE_REGEX.regex_obj)().captures(value) {
+ let mode = match caps.get(1).unwrap().as_str() {
+ "data" => BackupDetectionMode::Data,
+ ty if ty.starts_with("metadata") => {
+ let archives = ty.split(':').skip(1).map(|s| format!("{s}.didx")).collect();
+ BackupDetectionMode::Metadata(archives)
+ }
+ _ => bail!("invalid backup detection mode"),
+ };
+
+ return Ok(mode);
+ }
+
+ bail!(
+ "unable to parse backup detection mode specification '{}'",
+ value
+ );
+}
--
2.39.2
^ permalink raw reply [flat|nested] 30+ messages in thread
* [pbs-devel] [PATCH v6 proxmox-backup 22/29] fix #3174: client: Add detection mode to backup creation
2024-01-25 13:25 [pbs-devel] [PATCH-SERIES v6 pxar proxmox-backup proxmox-widget-toolkit 0/29] fix #3174: improve file-level backup Christian Ebner
` (20 preceding siblings ...)
2024-01-25 13:26 ` [pbs-devel] [PATCH v6 proxmox-backup 21/29] fix #3174: specs: add backup detection mode specification Christian Ebner
@ 2024-01-25 13:26 ` Christian Ebner
2024-01-25 13:26 ` [pbs-devel] [PATCH v6 proxmox-backup 23/29] catalog: use format version 2 conditionally Christian Ebner
` (6 subsequent siblings)
28 siblings, 0 replies; 30+ messages in thread
From: Christian Ebner @ 2024-01-25 13:26 UTC (permalink / raw)
To: pbs-devel
Introduces the `change-detection-mode` parameter to change file
encoding behavior.
When set to `metadata`, the catalog for the previous backup run and
the corresponding index file are fetched from the server and used as
reference during pxar archive creation.
This allows the archiver to skip encoding the payloads of unchanged
regular files and instead reference their existing chunks for
inclusion in the new backup's index file, creating a pxar archive
with an appendix section that contains the payloads as a
concatenation of chunks.
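On the command line this would look like e.g. `proxmox-backup-client backup root.pxar:/ --change-detection-mode metadata`. A detail worth noting in download_reference_catalog below: the catalog is spooled into an unnamed temporary file via O_TMPFILE. A minimal, Linux-only sketch of that pattern (assumes the libc crate as a dependency):

    use std::io::{Seek, SeekFrom, Write};
    use std::os::unix::fs::OpenOptionsExt;

    /// Create an unnamed temporary file in /tmp; it is cleaned up
    /// automatically as soon as the handle is dropped.
    fn anonymous_tmpfile() -> std::io::Result<std::fs::File> {
        std::fs::OpenOptions::new()
            .write(true)
            .read(true)
            .custom_flags(libc::O_TMPFILE)
            .open("/tmp")
    }

    fn main() -> std::io::Result<()> {
        let mut file = anonymous_tmpfile()?;
        file.write_all(b"downloaded catalog bytes would go here")?;
        file.seek(SeekFrom::Start(0))?; // rewind before handing off to CatalogReader::new
        Ok(())
    }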
Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
Changes since v5:
- fix formatting using `cargo fmt`
proxmox-backup-client/src/main.rs | 123 ++++++++++++++++++++++++++++--
1 file changed, 115 insertions(+), 8 deletions(-)
diff --git a/proxmox-backup-client/src/main.rs b/proxmox-backup-client/src/main.rs
index a087b43b..470399e8 100644
--- a/proxmox-backup-client/src/main.rs
+++ b/proxmox-backup-client/src/main.rs
@@ -1,5 +1,6 @@
use std::collections::{HashSet, VecDeque};
use std::io::{self, Read, Seek, SeekFrom, Write};
+use std::os::unix::fs::OpenOptionsExt;
use std::path::{Path, PathBuf};
use std::pin::Pin;
use std::sync::{Arc, Mutex};
@@ -43,10 +44,10 @@ use pbs_client::tools::{
CHUNK_SIZE_SCHEMA, REPO_URL_SCHEMA,
};
use pbs_client::{
- delete_ticket_info, parse_backup_specification, view_task_result, BackupReader,
- BackupRepository, BackupSpecificationType, BackupStats, BackupWriter, ChunkStream,
- FixedChunkStream, HttpClient, PxarBackupStream, RemoteChunkReader, UploadOptions,
- BACKUP_SOURCE_SCHEMA,
+ delete_ticket_info, parse_backup_detection_mode_specification, parse_backup_specification,
+ view_task_result, BackupDetectionMode, BackupReader, BackupRepository, BackupSpecificationType,
+ BackupStats, BackupWriter, ChunkStream, FixedChunkStream, HttpClient, PxarBackupStream,
+ RemoteChunkReader, UploadOptions, BACKUP_DETECTION_MODE_SPEC, BACKUP_SOURCE_SCHEMA,
};
use pbs_datastore::catalog::{CatalogReader, CatalogWriter};
use pbs_datastore::chunk_store::verify_chunk_size;
@@ -666,6 +667,10 @@ fn spawn_catalog_upload(
schema: TRAFFIC_CONTROL_BURST_SCHEMA,
optional: true,
},
+ "change-detection-mode": {
+ schema: BACKUP_DETECTION_MODE_SPEC,
+ optional: true,
+ },
"exclude": {
type: Array,
description: "List of paths or patterns for matching files to exclude.",
@@ -849,7 +854,20 @@ async fn create_backup(
let backup_time = backup_time_opt.unwrap_or_else(epoch_i64);
- let client = connect_rate_limited(&repo, rate_limit)?;
+ let cd_mode = param["change-detection-mode"].as_str().unwrap_or("data");
+ let detection_mode = parse_backup_detection_mode_specification(cd_mode)?;
+
+ let client = connect_rate_limited(&repo, rate_limit.clone())?;
+ let backup_group = BackupGroup::new(backup_type, backup_id);
+
+ let previous_snapshot = if let BackupDetectionMode::Metadata(_) = detection_mode {
+ api_datastore_latest_snapshot(&client, &repo.store(), &backup_ns, backup_group)
+ .await
+ .ok()
+ } else {
+ None
+ };
+
record_repository(&repo);
let snapshot = BackupDir::from((backup_type, backup_id.to_owned(), backup_time));
@@ -959,8 +977,8 @@ async fn create_backup(
log::info!("{} {} '{}' to '{}' as {}", what, desc, file, repo, target);
};
- for (backup_type, filename, target, size) in upload_list {
- match (backup_type, dry_run) {
+ for (backup_spec_type, filename, target, size) in upload_list {
+ match (backup_spec_type, dry_run) {
// dry-run
(BackupSpecificationType::CONFIG, true) => log_file("config file", &filename, &target),
(BackupSpecificationType::LOGFILE, true) => log_file("log file", &filename, &target),
@@ -1006,12 +1024,52 @@ async fn create_backup(
log_file("directory", &filename, &target);
+ let known_chunks = Arc::new(Mutex::new(HashSet::new()));
+ let mut previous_ref = None;
+
+ if let BackupDetectionMode::Metadata(ref archives) = detection_mode {
+ if archives.is_empty() || archives.contains(&target) {
+ if let Some(ref manifest) = previous_manifest {
+ let reference_catalog = download_reference_catalog(
+ &repo,
+ previous_snapshot.as_ref().unwrap(),
+ &backup_ns,
+ crypt_config.clone(),
+ )
+ .await;
+
+ if let Ok(reference_catalog) = reference_catalog {
+ let reference_index = client
+ .download_previous_dynamic_index(
+ &target,
+ &manifest,
+ known_chunks.clone(),
+ )
+ .await;
+
+ if let Ok(reference_index) = reference_index {
+ log::info!(
+ "Using previous catalog as metadata reference for '{target}'"
+ );
+ previous_ref = Some(pbs_client::pxar::PxarPrevRef {
+ index: reference_index,
+ catalog: reference_catalog,
+ archive_name: target.clone(),
+ });
+ }
+ }
+ } else {
+ log::info!("No previous manifest, fallback to regular mode");
+ };
+ }
+ }
+
let pxar_options = pbs_client::pxar::PxarCreateOptions {
device_set: devices.clone(),
patterns: pattern_list.clone(),
entries_max: entries_max as usize,
skip_lost_and_found,
- previous_ref: None,
+ previous_ref,
archive_name: Some(std::ffi::CString::new(target.as_str())?),
};
@@ -1112,6 +1170,55 @@ async fn create_backup(
Ok(Value::Null)
}
+async fn download_reference_catalog(
+ repo: &BackupRepository,
+ previous_snapshot: &BackupDir,
+ backup_ns: &BackupNamespace,
+ crypt_config: Option<Arc<CryptConfig>>,
+) -> Result<CatalogReader<std::fs::File>, Error> {
+ let http_reader_client = connect(&repo)?;
+ let backup_reader = BackupReader::start(
+ &http_reader_client,
+ crypt_config.clone(),
+ repo.store(),
+ &backup_ns,
+ &previous_snapshot,
+ true,
+ )
+ .await?;
+
+ let (manifest, _) = backup_reader.download_manifest().await?;
+ manifest.check_fingerprint(crypt_config.as_ref().map(Arc::as_ref))?;
+
+ let index = backup_reader
+ .download_dynamic_index(&manifest, CATALOG_NAME)
+ .await?;
+ let most_used = index.find_most_used_chunks(8);
+ let file_info = manifest.lookup_file_info(CATALOG_NAME)?;
+
+ let chunk_reader = RemoteChunkReader::new(
+ backup_reader,
+ crypt_config.clone(),
+ file_info.chunk_crypt_mode(),
+ most_used,
+ );
+
+ let mut reader = BufferedDynamicReader::new(index, chunk_reader);
+
+ let mut catalogfile = std::fs::OpenOptions::new()
+ .write(true)
+ .read(true)
+ .custom_flags(libc::O_TMPFILE)
+ .open("/tmp")?;
+
+ std::io::copy(&mut reader, &mut catalogfile)
+ .map_err(|err| format_err!("failed to download reference catalog - {}", err))?;
+
+ catalogfile.seek(SeekFrom::Start(0))?;
+
+ Ok(CatalogReader::new(catalogfile))
+}
+
async fn dump_image<W: Write>(
client: Arc<BackupReader>,
crypt_config: Option<Arc<CryptConfig>>,
--
2.39.2
^ permalink raw reply [flat|nested] 30+ messages in thread
* [pbs-devel] [PATCH v6 proxmox-backup 23/29] catalog: use format version 2 conditionally
2024-01-25 13:25 [pbs-devel] [PATCH-SERIES v6 pxar proxmox-backup proxmox-widget-toolkit 0/29] fix #3174: improve file-level backup Christian Ebner
` (21 preceding siblings ...)
2024-01-25 13:26 ` [pbs-devel] [PATCH v6 proxmox-backup 22/29] fix #3174: client: Add detection mode to backup creation Christian Ebner
@ 2024-01-25 13:26 ` Christian Ebner
2024-01-25 13:26 ` [pbs-devel] [PATCH v6 proxmox-backup 24/29] tools: add optional path prefix to line based output Christian Ebner
` (5 subsequent siblings)
28 siblings, 0 replies; 30+ messages in thread
From: Christian Ebner @ 2024-01-25 13:26 UTC (permalink / raw)
To: pbs-devel
Only use the catalog format version 2 when performing backups with the
change detection mode set to metadata.
This makes sure that backups created using the regular mode remain
compatible with older clients, which do not support format version 2.
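Version selection on the read side is decided purely by the 8-byte magic at offset 0. A sketch of that detection with made-up magic values (the real constants are PROXMOX_CATALOG_FILE_MAGIC_1_0 and _2_0 from pbs-datastore):

    use std::io::{Read, Seek, SeekFrom};

    // hypothetical placeholder magics, not the real values
    const MAGIC_V1: [u8; 8] = *b"PBCAT1\0\0";
    const MAGIC_V2: [u8; 8] = *b"PBCAT2\0\0";

    #[derive(Debug, PartialEq)]
    enum FormatVersion {
        V1,
        V2,
    }

    /// Mirrors CatalogReader::format_version: seek to the start and match
    /// the magic; anything unknown is rejected (here: None).
    fn format_version<R: Read + Seek>(reader: &mut R) -> std::io::Result<Option<FormatVersion>> {
        reader.seek(SeekFrom::Start(0))?;
        let mut magic = [0u8; 8];
        reader.read_exact(&mut magic)?;
        Ok(match magic {
            MAGIC_V1 => Some(FormatVersion::V1),
            MAGIC_V2 => Some(FormatVersion::V2),
            _ => None,
        })
    }

    fn main() -> std::io::Result<()> {
        let mut reader = std::io::Cursor::new(MAGIC_V2.to_vec());
        assert_eq!(format_version(&mut reader)?, Some(FormatVersion::V2));
        Ok(())
    }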
Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
Changes since v5:
- fix formatting using `cargo fmt`
- changed patch ordering for a more logical flow
pbs-datastore/src/catalog.rs | 69 +++++++++++++++++++++++++------
proxmox-backup-client/src/main.rs | 30 ++++++++++----
2 files changed, 80 insertions(+), 19 deletions(-)
diff --git a/pbs-datastore/src/catalog.rs b/pbs-datastore/src/catalog.rs
index 220313c6..0ca2ce23 100644
--- a/pbs-datastore/src/catalog.rs
+++ b/pbs-datastore/src/catalog.rs
@@ -64,6 +64,12 @@ pub enum CatalogEntryType {
Socket = b's',
}
+#[derive(PartialEq)]
+pub enum CatalogFormatVersion {
+ V1,
+ V2,
+}
+
impl TryFrom<u8> for CatalogEntryType {
type Error = Error;
@@ -555,17 +561,22 @@ pub struct CatalogWriter<W> {
writer: W,
dirstack: Vec<DirInfo>,
pos: u64,
+ version: CatalogFormatVersion,
}
impl<W: Write> CatalogWriter<W> {
/// Create a new CatalogWriter instance
- pub fn new(writer: W) -> Result<Self, Error> {
+ pub fn new(writer: W, version: CatalogFormatVersion) -> Result<Self, Error> {
let mut me = Self {
writer,
dirstack: vec![DirInfo::new_rootdir()],
pos: 0,
+ version,
};
- me.write_all(&PROXMOX_CATALOG_FILE_MAGIC_2_0)?;
+ match me.version {
+ CatalogFormatVersion::V1 => me.write_all(&PROXMOX_CATALOG_FILE_MAGIC_1_0)?,
+ CatalogFormatVersion::V2 => me.write_all(&PROXMOX_CATALOG_FILE_MAGIC_2_0)?,
+ }
Ok(me)
}
@@ -599,6 +610,10 @@ impl<W: Write> CatalogWriter<W> {
impl<W: Write> BackupCatalogWriter for CatalogWriter<W> {
fn start_archive(&mut self, name: &CStr) -> Result<(), Error> {
+ if self.version == CatalogFormatVersion::V1 {
+ return self.start_directory(name);
+ }
+
let new = DirInfo::new(name.to_owned());
self.dirstack.push(new);
Ok(())
@@ -608,6 +623,9 @@ impl<W: Write> BackupCatalogWriter for CatalogWriter<W> {
&mut self,
appendix: Option<pxar::encoder::AppendixStartOffset>,
) -> Result<(), Error> {
+ if self.version == CatalogFormatVersion::V1 {
+ return self.end_directory();
+ }
let (start, name) = match self.dirstack.pop() {
Some(dir) => {
let start = self.pos;
@@ -688,17 +706,30 @@ impl<W: Write> BackupCatalogWriter for CatalogWriter<W> {
.last_mut()
.ok_or_else(|| format_err!("outside root"))?;
let name = name.to_bytes().to_vec();
- let file_offset = FileOffset {
- offset: file_offset.raw(),
- };
- dir.entries.push(DirEntry {
- name,
- attr: DirEntryAttribute::File {
- size,
- mtime,
- extension: Some(CatalogV2Extension { ctime, file_offset }),
+ let entry = match self.version {
+ CatalogFormatVersion::V1 => DirEntry {
+ name,
+ attr: DirEntryAttribute::File {
+ size,
+ mtime,
+ extension: None,
+ },
},
- });
+ CatalogFormatVersion::V2 => {
+ let file_offset = FileOffset {
+ offset: file_offset.raw(),
+ };
+ DirEntry {
+ name,
+ attr: DirEntryAttribute::File {
+ size,
+ mtime,
+ extension: Some(CatalogV2Extension { ctime, file_offset }),
+ },
+ }
+ }
+ };
+ dir.entries.push(entry);
Ok(())
}
@@ -710,6 +741,9 @@ impl<W: Write> BackupCatalogWriter for CatalogWriter<W> {
ctime: i64,
appendix_ref_offset: pxar::encoder::AppendixRefOffset,
) -> Result<(), Error> {
+ if self.version == CatalogFormatVersion::V1 {
+ bail!("unsupported by catalog format version 1");
+ }
let dir = self
.dirstack
.last_mut()
@@ -856,6 +890,17 @@ impl<R: Read + Seek> CatalogReader<R> {
})
}
+ pub fn format_version(&mut self) -> Result<CatalogFormatVersion, Error> {
+ self.reader.seek(SeekFrom::Start(0))?;
+ let mut magic = [0u8; 8];
+ self.reader.read_exact(&mut magic)?;
+ match magic {
+ PROXMOX_CATALOG_FILE_MAGIC_1_0 => Ok(CatalogFormatVersion::V1),
+ PROXMOX_CATALOG_FILE_MAGIC_2_0 => Ok(CatalogFormatVersion::V2),
+ _ => bail!("got unexpected magic number for catalog"),
+ }
+ }
+
pub fn appendix_offset(
&mut self,
archive_name: &[u8],
diff --git a/proxmox-backup-client/src/main.rs b/proxmox-backup-client/src/main.rs
index 470399e8..0a3f24c0 100644
--- a/proxmox-backup-client/src/main.rs
+++ b/proxmox-backup-client/src/main.rs
@@ -49,7 +49,7 @@ use pbs_client::{
BackupStats, BackupWriter, ChunkStream, FixedChunkStream, HttpClient, PxarBackupStream,
RemoteChunkReader, UploadOptions, BACKUP_DETECTION_MODE_SPEC, BACKUP_SOURCE_SCHEMA,
};
-use pbs_datastore::catalog::{CatalogReader, CatalogWriter};
+use pbs_datastore::catalog::{CatalogFormatVersion, CatalogReader, CatalogWriter};
use pbs_datastore::chunk_store::verify_chunk_size;
use pbs_datastore::dynamic_index::{BufferedDynamicReader, DynamicIndexReader};
use pbs_datastore::fixed_index::FixedIndexReader;
@@ -536,6 +536,7 @@ struct CatalogUploadResult {
fn spawn_catalog_upload(
client: Arc<BackupWriter>,
encrypt: bool,
+ catalog_format_version: CatalogFormatVersion,
) -> Result<CatalogUploadResult, Error> {
let (catalog_tx, catalog_rx) = std::sync::mpsc::sync_channel(10); // allow to buffer 10 writes
let catalog_stream = proxmox_async::blocking::StdChannelStream(catalog_rx);
@@ -549,9 +550,10 @@ fn spawn_catalog_upload(
injections.clone(),
);
- let catalog_writer = Arc::new(Mutex::new(CatalogWriter::new(TokioWriterAdapter::new(
- StdChannelWriter::new(catalog_tx),
- ))?));
+ let catalog_writer = Arc::new(Mutex::new(CatalogWriter::new(
+ TokioWriterAdapter::new(StdChannelWriter::new(catalog_tx)),
+ catalog_format_version,
+ )?));
let (catalog_result_tx, catalog_result_rx) = tokio::sync::oneshot::channel();
@@ -1015,8 +1017,17 @@ async fn create_backup(
(BackupSpecificationType::PXAR, false) => {
// start catalog upload on first use
if catalog.is_none() {
- let catalog_upload_res =
- spawn_catalog_upload(client.clone(), crypto.mode == CryptMode::Encrypt)?;
+ let catalog_format_version =
+ if let BackupDetectionMode::Metadata(_) = detection_mode {
+ CatalogFormatVersion::V2
+ } else {
+ CatalogFormatVersion::V1
+ };
+ let catalog_upload_res = spawn_catalog_upload(
+ client.clone(),
+ crypto.mode == CryptMode::Encrypt,
+ catalog_format_version,
+ )?;
catalog = Some(catalog_upload_res.catalog_writer);
catalog_result_rx = Some(catalog_upload_res.result);
}
@@ -1215,8 +1226,13 @@ async fn download_reference_catalog(
.map_err(|err| format_err!("failed to download reference catalog - {}", err))?;
catalogfile.seek(SeekFrom::Start(0))?;
+ let mut reader = CatalogReader::new(catalogfile);
- Ok(CatalogReader::new(catalogfile))
+ match reader.format_version() {
+ Ok(CatalogFormatVersion::V2) => Ok(reader),
+ Ok(CatalogFormatVersion::V1) => Err(format_err!("unsupported catalog format version")),
+ Err(err) => Err(err),
+ }
}
async fn dump_image<W: Write>(
--
2.39.2
^ permalink raw reply [flat|nested] 30+ messages in thread
* [pbs-devel] [PATCH v6 proxmox-backup 24/29] tools: add optional path prefix to line based output
2024-01-25 13:25 [pbs-devel] [PATCH-SERIES v6 pxar proxmox-backup proxmox-widget-toolkit 0/29] fix #3174: improve file-level backup Christian Ebner
` (22 preceding siblings ...)
2024-01-25 13:26 ` [pbs-devel] [PATCH v6 proxmox-backup 23/29] catalog: use format version 2 conditionally Christian Ebner
@ 2024-01-25 13:26 ` Christian Ebner
2024-01-25 13:26 ` [pbs-devel] [PATCH v6 proxmox-backup 25/29] pxar-bin: implement listing for appendix entries Christian Ebner
` (4 subsequent siblings)
28 siblings, 0 replies; 30+ messages in thread
From: Christian Ebner @ 2024-01-25 13:26 UTC (permalink / raw)
To: pbs-devel
By default, the shown path is the one stored in the entry.
When printing pxar entries from the appendix section, however, that
path is incomplete and has to be prefixed by the base path.
Allow passing this base path, which is then shown as a prefix to the
path stored in the entry itself.
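The prefixing itself is a strip-and-join on the entry path. A self-contained sketch (using unwrap_or instead of unwrap, so paths without a leading slash pass through unchanged; slightly more defensive than the patch):

    use std::path::{Path, PathBuf};

    fn display_path(entry_path: &Path, base_path: Option<&Path>) -> PathBuf {
        match base_path {
            Some(base) => base.join(entry_path.strip_prefix("/").unwrap_or(entry_path)),
            None => entry_path.to_path_buf(),
        }
    }

    fn main() {
        let shown = display_path(Path::new("/file.txt"), Some(Path::new("/backup/root")));
        assert_eq!(shown, PathBuf::from("/backup/root/file.txt"));
    }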
Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
Changes since v5:
- not present in previous version
pbs-client/src/pxar/tools.rs | 14 +++++++-------
1 file changed, 7 insertions(+), 7 deletions(-)
diff --git a/pbs-client/src/pxar/tools.rs b/pbs-client/src/pxar/tools.rs
index 52e025e4..0f44b5dd 100644
--- a/pbs-client/src/pxar/tools.rs
+++ b/pbs-client/src/pxar/tools.rs
@@ -123,7 +123,7 @@ fn format_mtime(mtime: &StatxTimestamp) -> String {
format!("{}.{}", mtime.secs, mtime.nanos)
}
-pub fn format_single_line_entry(entry: &Entry) -> String {
+pub fn format_single_line_entry(entry: &Entry, base_path: Option<&Path>) -> String {
let mode_string = mode_string(entry);
let meta = entry.metadata();
@@ -138,14 +138,14 @@ pub fn format_single_line_entry(entry: &Entry) -> String {
let owner_string = format!("{}/{}", meta.stat.uid, meta.stat.gid);
+ let path = match base_path {
+ Some(base_path) => base_path.join(entry.path().strip_prefix("/").unwrap()),
+ None => entry.path().to_path_buf(),
+ };
+
format!(
- "{} {:<13} {} {:>8} {:?}{}",
- mode_string,
- owner_string,
+ "{mode_string} {owner_string:<13} {} {size:>8} {path:?}{link}",
format_mtime(&meta.stat.mtime),
- size,
- entry.path(),
- link,
)
}
--
2.39.2
^ permalink raw reply [flat|nested] 30+ messages in thread
* [pbs-devel] [PATCH v6 proxmox-backup 25/29] pxar-bin: implement listing for appendix entries
2024-01-25 13:25 [pbs-devel] [PATCH-SERIES v6 pxar proxmox-backup proxmox-widget-toolkit 0/29] fix #3174: improve file-level backup Christian Ebner
` (23 preceding siblings ...)
2024-01-25 13:26 ` [pbs-devel] [PATCH v6 proxmox-backup 24/29] tools: add optional path prefix to line based output Christian Ebner
@ 2024-01-25 13:26 ` Christian Ebner
2024-01-25 13:26 ` [pbs-devel] [PATCH v6 proxmox-backup 26/29] test-suite: add detection mode change benchmark Christian Ebner
` (3 subsequent siblings)
28 siblings, 0 replies; 30+ messages in thread
From: Christian Ebner @ 2024-01-25 13:26 UTC (permalink / raw)
To: pbs-devel
Fix the output of pxar list <archive.pxar> by also dumping the
entries referenced in the appendix section.
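The appendix is consumed strictly sequentially: the references are collected sorted by offset (the BTreeMap key), and the decoder only ever skips forward by the delta between the bytes consumed so far and the next reference offset. A simplified model of that accounting, folding header, metadata and payload into a single per-entry size:

    /// Returns how many bytes to skip before each entry; offsets must be
    /// sorted ascending and non-overlapping, as guaranteed by the BTreeMap.
    fn skip_deltas(sorted_offsets: &[u64], entry_sizes: &[u64]) -> Vec<u64> {
        let mut consumed = 0u64;
        sorted_offsets
            .iter()
            .zip(entry_sizes)
            .map(|(&offset, &size)| {
                let skip = offset - consumed;
                consumed = offset + size;
                skip
            })
            .collect()
    }

    fn main() {
        // three entries at offsets 0, 120 and 300, occupying 100, 150 and 50 bytes
        assert_eq!(skip_deltas(&[0, 120, 300], &[100, 150, 50]), vec![0, 20, 30]);
    }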
Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
Changes since v5:
- not present in previous version
pxar-bin/src/main.rs | 57 ++++++++++++++++++++++++++++++++++++++++----
1 file changed, 52 insertions(+), 5 deletions(-)
diff --git a/pxar-bin/src/main.rs b/pxar-bin/src/main.rs
index 6d365115..066b6212 100644
--- a/pxar-bin/src/main.rs
+++ b/pxar-bin/src/main.rs
@@ -1,6 +1,7 @@
-use std::collections::{HashSet, VecDeque};
-use std::ffi::OsStr;
+use std::collections::{BTreeMap, HashSet, VecDeque};
+use std::ffi::{CString, OsStr};
use std::fs::OpenOptions;
+use std::os::unix::ffi::OsStrExt;
use std::os::unix::fs::OpenOptionsExt;
use std::path::{Path, PathBuf};
use std::sync::atomic::{AtomicBool, Ordering};
@@ -15,6 +16,7 @@ use pathpatterns::{MatchEntry, MatchType, PatternFlag};
use pbs_client::pxar::{
format_single_line_entry, Flags, OverwriteFlags, PxarExtractOptions, ENCODER_MAX_ENTRIES,
};
+use pxar::EntryKind;
use proxmox_router::cli::*;
use proxmox_schema::api;
@@ -438,15 +440,60 @@ async fn mount_archive(archive: String, mountpoint: String, verbose: bool) -> Re
)]
/// List the contents of an archive.
fn dump_archive(archive: String) -> Result<(), Error> {
- for entry in pxar::decoder::Decoder::open(archive)? {
+ let mut appendix_refs = BTreeMap::new();
+ let mut decoder = pxar::decoder::Decoder::open(archive)?;
+
+ while let Some(entry) = decoder.next() {
let entry = entry?;
+ match entry.kind() {
+ EntryKind::AppendixRef {
+ appendix_offset,
+ file_size,
+ } => {
+ appendix_refs.insert(*appendix_offset, (entry.path().to_path_buf(), *file_size));
+ }
+ EntryKind::Appendix { .. } => {
+ break;
+ }
+ _ => {
+ if log::log_enabled!(log::Level::Debug) {
+ log::debug!("{}", format_single_line_entry(&entry, None));
+ } else {
+ log::info!("{:?}", entry.path());
+ }
+ }
+ }
+ }
+
+ // Dump entries in appendix section
+ let mut consumed = 0;
+ for (offset, (base_path, size)) in appendix_refs {
+ let skip = offset - consumed;
+ decoder.skip_bytes(skip)?;
+
+ let entry = match decoder.next() {
+ Some(Ok(entry)) => entry,
+ Some(Err(err)) => return Err(err.into()),
+ None => return Err(format_err!("expected entry")),
+ };
if log::log_enabled!(log::Level::Debug) {
- log::debug!("{}", format_single_line_entry(&entry));
+ log::debug!("{}", format_single_line_entry(&entry, Some(&base_path)));
} else {
- log::info!("{:?}", entry.path());
+ log::info!("{:?}", base_path.join(entry.path().strip_prefix("/")?));
}
+
+ let metadata = entry.metadata();
+ let file_name_os = entry.file_name();
+ let file_name_bytes = file_name_os.as_bytes();
+ let file_name = CString::new(file_name_bytes)?;
+ let mut bytes = pxar::encoder::sync::encoded_size(file_name.as_c_str(), &metadata)?;
+ // payload header size
+ bytes += std::mem::size_of::<pxar::format::Header>() as u64;
+
+ consumed += skip + bytes + size;
}
+
Ok(())
}
--
2.39.2
^ permalink raw reply [flat|nested] 30+ messages in thread
* [pbs-devel] [PATCH v6 proxmox-backup 26/29] test-suite: add detection mode change benchmark
2024-01-25 13:25 [pbs-devel] [PATCH-SERIES v6 pxar proxmox-backup proxmox-widget-toolkit 0/29] fix #3174: improve file-level backup Christian Ebner
` (24 preceding siblings ...)
2024-01-25 13:26 ` [pbs-devel] [PATCH v6 proxmox-backup 25/29] pxar-bin: implement listing for appendix entries Christian Ebner
@ 2024-01-25 13:26 ` Christian Ebner
2024-01-25 13:26 ` [pbs-devel] [PATCH v6 proxmox-backup 27/29] test-suite: Add bin to deb, add shell completions Christian Ebner
` (2 subsequent siblings)
28 siblings, 0 replies; 30+ messages in thread
From: Christian Ebner @ 2024-01-25 13:26 UTC (permalink / raw)
To: pbs-devel
Introduces the proxmox-backup-test-suite crate, intended for
benchmarking and high-level, user-facing testing.
The initial code includes a benchmark intended for regression testing of
the proxmox-backup-client when using different file detection modes
during backup.
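The reported numbers use the sample standard deviation (Bessel's correction), and the uncertainty of the metadata-vs-regular delta is combined in quadrature, which assumes the two sets of runs are independent. In compact form:

    /// Sample mean and standard deviation, as in the statistics() helper below.
    fn stats(samples: &[f64]) -> (f64, f64) {
        let n = samples.len() as f64;
        let avg = samples.iter().sum::<f64>() / n;
        let var = samples.iter().map(|x| (x - avg).powi(2)).sum::<f64>() / (n - 1.0);
        (avg, var.sqrt())
    }

    /// Standard deviation of the difference of two independent means.
    fn delta_stddev(stddev_a: f64, stddev_b: f64) -> f64 {
        (stddev_a * stddev_a + stddev_b * stddev_b).sqrt()
    }

    fn main() {
        let (avg, stddev) = stats(&[10.0, 12.0, 11.0]);
        assert!((avg - 11.0).abs() < 1e-9 && (stddev - 1.0).abs() < 1e-9);
        assert!((delta_stddev(3.0, 4.0) - 5.0).abs() < 1e-9);
    }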
Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
Changes since v5:
- patch order changed to follow a more logical flow
Cargo.toml | 1 +
proxmox-backup-test-suite/Cargo.toml | 18 ++
.../src/detection_mode_bench.rs | 291 ++++++++++++++++++
proxmox-backup-test-suite/src/main.rs | 17 +
4 files changed, 327 insertions(+)
create mode 100644 proxmox-backup-test-suite/Cargo.toml
create mode 100644 proxmox-backup-test-suite/src/detection_mode_bench.rs
create mode 100644 proxmox-backup-test-suite/src/main.rs
diff --git a/Cargo.toml b/Cargo.toml
index ecf7f4ce..374089f6 100644
--- a/Cargo.toml
+++ b/Cargo.toml
@@ -46,6 +46,7 @@ members = [
"proxmox-rrd",
"pxar-bin",
+ "proxmox-backup-test-suite",
]
[lib]
diff --git a/proxmox-backup-test-suite/Cargo.toml b/proxmox-backup-test-suite/Cargo.toml
new file mode 100644
index 00000000..3f899e9b
--- /dev/null
+++ b/proxmox-backup-test-suite/Cargo.toml
@@ -0,0 +1,18 @@
+[package]
+name = "proxmox-backup-test-suite"
+version = "0.1.0"
+authors.workspace = true
+edition.workspace = true
+
+[dependencies]
+anyhow.workspace = true
+futures.workspace = true
+serde.workspace = true
+serde_json.workspace = true
+
+pbs-client.workspace = true
+pbs-key-config.workspace = true
+pbs-tools.workspace = true
+proxmox-async.workspace = true
+proxmox-router = { workspace = true, features = ["cli"] }
+proxmox-schema = { workspace = true, features = [ "api-macro" ] }
diff --git a/proxmox-backup-test-suite/src/detection_mode_bench.rs b/proxmox-backup-test-suite/src/detection_mode_bench.rs
new file mode 100644
index 00000000..e2e8dcf1
--- /dev/null
+++ b/proxmox-backup-test-suite/src/detection_mode_bench.rs
@@ -0,0 +1,291 @@
+use std::path::Path;
+use std::process::Command;
+
+use anyhow::{bail, format_err, Error};
+use serde_json::Value;
+
+use pbs_client::{
+ tools::{complete_repository, key_source::KEYFILE_SCHEMA, REPO_URL_SCHEMA},
+ BACKUP_SOURCE_SCHEMA,
+};
+use pbs_tools::json;
+use proxmox_router::cli::*;
+use proxmox_schema::api;
+
+const DEFAULT_NUMBER_OF_RUNS: u64 = 5;
+// Homepage https://cocodataset.org/
+const COCO_DATASET_SRC_URL: &'static str = "http://images.cocodataset.org/zips/unlabeled2017.zip";
+// Homepage https://kernel.org/
+const LINUX_GIT_REPOSITORY: &'static str =
+ "git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git";
+const LINUX_GIT_TAG: &'static str = "v6.5.5";
+
+pub(crate) fn detection_mode_bench_mgtm_cli() -> CliCommandMap {
+ let run_cmd_def = CliCommand::new(&API_METHOD_DETECTION_MODE_BENCH_RUN)
+ .arg_param(&["backupspec"])
+ .completion_cb("repository", complete_repository)
+ .completion_cb("keyfile", complete_file_name);
+
+ let prepare_cmd_def = CliCommand::new(&API_METHOD_DETECTION_MODE_BENCH_PREPARE);
+ CliCommandMap::new()
+ .insert("prepare", prepare_cmd_def)
+ .insert("run", run_cmd_def)
+}
+
+#[api(
+ input: {
+ properties: {
+ backupspec: {
+ type: Array,
+ description: "List of backup source specifications ([<label.ext>:<path>] ...)",
+ items: {
+ schema: BACKUP_SOURCE_SCHEMA,
+ }
+ },
+ repository: {
+ schema: REPO_URL_SCHEMA,
+ optional: true,
+ },
+ keyfile: {
+ schema: KEYFILE_SCHEMA,
+ optional: true,
+ },
+ "number-of-runs": {
+ description: "Number of times to repeat the run",
+ type: Integer,
+ optional: true,
+ },
+ }
+ }
+)]
+/// Run benchmark to compare performance for backups using different change detection modes.
+fn detection_mode_bench_run(param: Value) -> Result<(), Error> {
+ let mut pbc = Command::new("proxmox-backup-client");
+ pbc.arg("backup");
+
+ let backupspec_list = json::required_array_param(¶m, "backupspec")?;
+ for backupspec in backupspec_list {
+ let arg = backupspec
+ .as_str()
+ .ok_or_else(|| format_err!("failed to parse backupspec"))?;
+ pbc.arg(arg);
+ }
+
+ if let Some(repo) = param["repository"].as_str() {
+ pbc.arg("--repository");
+ pbc.arg::<&str>(repo);
+ }
+
+ if let Some(keyfile) = param["keyfile"].as_str() {
+ pbc.arg("--keyfile");
+ pbc.arg::<&str>(keyfile);
+ }
+
+ let number_of_runs = match param["number-of-runs"].as_u64() {
+ Some(n) => n,
+ None => DEFAULT_NUMBER_OF_RUNS,
+ };
+ if number_of_runs < 2 {
+ bail!("Number of runs must be greater than 1, aborting.");
+ }
+
+ // The first run is an initial run to make sure all chunks are already present, reducing
+ // side effects from filesystem caches etc.
+ let _stats_initial = do_run(&mut pbc, 1)?;
+
+ println!("\nStarting benchmarking backups with regular detection mode...\n");
+ let stats_reg = do_run(&mut pbc, number_of_runs)?;
+
+ // Make sure to have a valid reference with catalog format version 2
+ pbc.arg("--change-detection-mode=metadata");
+ let _stats_initial = do_run(&mut pbc, 1)?;
+
+ println!("\nStarting benchmarking backups with metadata detection mode...\n");
+ let stats_meta = do_run(&mut pbc, number_of_runs)?;
+
+ println!("\nCompleted benchmark with {number_of_runs} runs for each tested mode.");
+ println!("\nCompleted regular backup with:");
+ println!("Total runtime: {:.2} s", stats_reg.total);
+ println!("Average: {:.2} ± {:.2} s", stats_reg.avg, stats_reg.stddev);
+ println!("Min: {:.2} s", stats_reg.min);
+ println!("Max: {:.2} s", stats_reg.max);
+
+ println!("\nCompleted metadata detection mode backup with:");
+ println!("Total runtime: {:.2} s", stats_meta.total);
+ println!(
+ "Average: {:.2} ± {:.2} s",
+ stats_meta.avg, stats_meta.stddev
+ );
+ println!("Min: {:.2} s", stats_meta.min);
+ println!("Max: {:.2} s", stats_meta.max);
+
+ let diff_stddev =
+ ((stats_meta.stddev * stats_meta.stddev) + (stats_reg.stddev * stats_reg.stddev)).sqrt();
+ println!("\nDifferences (metadata based - regular):");
+ println!(
+ "Delta total runtime: {:.2} s ({:.2} %)",
+ stats_meta.total - stats_reg.total,
+ 100.0 * (stats_meta.total / stats_reg.total - 1.0),
+ );
+ println!(
+ "Delta average: {:.2} ± {:.2} s ({:.2} %)",
+ stats_meta.avg - stats_reg.avg,
+ diff_stddev,
+ 100.0 * (stats_meta.avg / stats_reg.avg - 1.0),
+ );
+ println!(
+ "Delta min: {:.2} s ({:.2} %)",
+ stats_meta.min - stats_reg.min,
+ 100.0 * (stats_meta.min / stats_reg.min - 1.0),
+ );
+ println!(
+ "Delta max: {:.2} s ({:.2} %)",
+ stats_meta.max - stats_reg.max,
+ 100.0 * (stats_meta.max / stats_reg.max - 1.0),
+ );
+
+ Ok(())
+}
+
+fn do_run(cmd: &mut Command, n_runs: u64) -> Result<Statistics, Error> {
+ let mut timings = Vec::with_capacity(n_runs as usize);
+ for iteration in 1..n_runs + 1 {
+ let start = std::time::SystemTime::now();
+ let mut child = cmd.spawn()?;
+ let exit_code = child.wait()?;
+ let elapsed = start.elapsed()?;
+ timings.push(elapsed);
+ if !exit_code.success() {
+ bail!("Run number {iteration} of {n_runs} failed, aborting.");
+ }
+ }
+
+ Ok(statistics(timings))
+}
+
+struct Statistics {
+ total: f64,
+ avg: f64,
+ stddev: f64,
+ min: f64,
+ max: f64,
+}
+
+fn statistics(timings: Vec<std::time::Duration>) -> Statistics {
+ let total = timings
+ .iter()
+ .fold(0f64, |sum, time| sum + time.as_secs_f64());
+ let avg = total / timings.len() as f64;
+ let var = 1f64 / (timings.len() - 1) as f64
+ * timings.iter().fold(0f64, |sq_sum, time| {
+ let diff = time.as_secs_f64() - avg;
+ sq_sum + diff * diff
+ });
+ let stddev = var.sqrt();
+ let min = timings.iter().min().unwrap().as_secs_f64();
+ let max = timings.iter().max().unwrap().as_secs_f64();
+
+ Statistics {
+ total,
+ avg,
+ stddev,
+ min,
+ max,
+ }
+}
+
+#[api(
+ input: {
+ properties: {
+ target: {
+ description: "target path to prepare test data.",
+ },
+ },
+ },
+)]
+/// Prepare files required for detection mode backup benchmarks.
+fn detection_mode_bench_prepare(target: String) -> Result<(), Error> {
+ let linux_repo_target = format!("{target}/linux");
+ let coco_dataset_target = format!("{target}/coco");
+ git_clone(LINUX_GIT_REPOSITORY, linux_repo_target.as_str())?;
+ git_checkout(LINUX_GIT_TAG, linux_repo_target.as_str())?;
+ wget_download(COCO_DATASET_SRC_URL, coco_dataset_target.as_str())?;
+
+ Ok(())
+}
+
+fn git_clone(repo: &str, target: &str) -> Result<(), Error> {
+ println!("Calling git clone for '{repo}'.");
+ let target_git = format!("{target}/.git");
+ let path = Path::new(&target_git);
+ if let Ok(true) = path.try_exists() {
+ println!("Target '{target}' already contains a git repository, skip.");
+ return Ok(());
+ }
+
+ let mut git = Command::new("git");
+ git.args(["clone", repo, target]);
+
+ let mut child = git.spawn()?;
+ let exit_code = child.wait()?;
+ if exit_code.success() {
+ println!("git clone finished with success.");
+ } else {
+ bail!("git clone failed for '{target}'.");
+ }
+
+ Ok(())
+}
+
+fn git_checkout(checkout_target: &str, target: &str) -> Result<(), Error> {
+ println!("Calling git checkout '{checkout_target}'.");
+ let mut git = Command::new("git");
+ git.args(["-C", target, "checkout", checkout_target]);
+
+ let mut child = git.spawn()?;
+ let exit_code = child.wait()?;
+ if exit_code.success() {
+ println!("git checkout finished with success.");
+ } else {
+ bail!("git checkout '{checkout_target}' failed for '{target}'.");
+ }
+ Ok(())
+}
+
+fn wget_download(source_url: &str, target: &str) -> Result<(), Error> {
+ let path = Path::new(&target);
+ if let Ok(true) = path.try_exists() {
+ println!("Target '{target}' already exists, skip.");
+ return Ok(());
+ }
+ let zip = format!("{}/unlabeled2017.zip", target);
+ let path = Path::new(&zip);
+ if !path.try_exists()? {
+ println!("Download archive using wget from '{source_url}' to '{target}'.");
+ let mut wget = Command::new("wget");
+ wget.args(["-P", target, source_url]);
+
+ let mut child = wget.spawn()?;
+ let exit_code = child.wait()?;
+ if exit_code.success() {
+ println!("Download finished with success.");
+ } else {
+ bail!("Failed to download '{source_url}' to '{target}'.");
+ }
+ return Ok(());
+ } else {
+ println!("Target '{target}' already contains download, skip download.");
+ }
+
+ let mut unzip = Command::new("unzip");
+ unzip.args([&zip, "-d", target]);
+
+ let mut child = unzip.spawn()?;
+ let exit_code = child.wait()?;
+ if exit_code.success() {
+ println!("Extracting zip archive finished with success.");
+ } else {
+ bail!("Failed to extract zip archive '{zip}' to '{target}'.");
+ }
+ Ok(())
+}
diff --git a/proxmox-backup-test-suite/src/main.rs b/proxmox-backup-test-suite/src/main.rs
new file mode 100644
index 00000000..0a5b436a
--- /dev/null
+++ b/proxmox-backup-test-suite/src/main.rs
@@ -0,0 +1,17 @@
+use proxmox_router::cli::*;
+
+mod detection_mode_bench;
+
+fn main() {
+ let cmd_def = CliCommandMap::new().insert(
+ "detection-mode-bench",
+ detection_mode_bench::detection_mode_bench_mgtm_cli(),
+ );
+
+ let rpcenv = CliEnvironment::new();
+ run_cli_command(
+ cmd_def,
+ rpcenv,
+ Some(|future| proxmox_async::runtime::main(future)),
+ );
+}
--
2.39.2
^ permalink raw reply [flat|nested] 30+ messages in thread
* [pbs-devel] [PATCH v6 proxmox-backup 27/29] test-suite: Add bin to deb, add shell completions
2024-01-25 13:25 [pbs-devel] [PATCH-SERIES v6 pxar proxmox-backup proxmox-widget-toolkit 0/29] fix #3174: improve file-level backup Christian Ebner
` (25 preceding siblings ...)
2024-01-25 13:26 ` [pbs-devel] [PATCH v6 proxmox-backup 26/29] test-suite: add detection mode change benchmark Christian Ebner
@ 2024-01-25 13:26 ` Christian Ebner
2024-01-25 13:26 ` [pbs-devel] [PATCH v6 proxmox-backup 28/29] pxar: add lookahead caching to reduce chunk fragmentation Christian Ebner
2024-01-25 13:26 ` [pbs-devel] [PATCH v6 proxmox-widget-toolkit 29/29] file-browser: support pxar archive and fileref types Christian Ebner
28 siblings, 0 replies; 30+ messages in thread
From: Christian Ebner @ 2024-01-25 13:26 UTC (permalink / raw)
To: pbs-devel
Adds the required files for bash and zsh completion and packages the
binary to be included in the proxmox-backup-client debian package.
Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
Changes since v5:
- patch order changed for a more logical flow
Makefile | 13 ++++++++-----
debian/proxmox-backup-client.bash-completion | 1 +
debian/proxmox-backup-client.install | 2 ++
debian/proxmox-backup-test-suite.bc | 8 ++++++++
zsh-completions/_proxmox-backup-test-suite | 13 +++++++++++++
5 files changed, 32 insertions(+), 5 deletions(-)
create mode 100644 debian/proxmox-backup-test-suite.bc
create mode 100644 zsh-completions/_proxmox-backup-test-suite
diff --git a/Makefile b/Makefile
index 0317dd5e..acaac3f7 100644
--- a/Makefile
+++ b/Makefile
@@ -8,11 +8,12 @@ SUBDIRS := etc www docs
# Binaries usable by users
USR_BIN := \
- proxmox-backup-client \
- proxmox-file-restore \
- pxar \
- proxmox-tape \
- pmtx \
+ proxmox-backup-client \
+ proxmox-backup-test-suite \
+ proxmox-file-restore \
+ pxar \
+ proxmox-tape \
+ pmtx \
pmt
# Binaries usable by admins
@@ -165,6 +166,8 @@ $(COMPILED_BINS) $(COMPILEDIR)/dump-catalog-shell-cli $(COMPILEDIR)/docgen: .do-
--bin proxmox-backup-client \
--bin dump-catalog-shell-cli \
--bin proxmox-backup-debug \
+ --package proxmox-backup-test-suite \
+ --bin proxmox-backup-test-suite \
--package proxmox-file-restore \
--bin proxmox-file-restore \
--package pxar-bin \
diff --git a/debian/proxmox-backup-client.bash-completion b/debian/proxmox-backup-client.bash-completion
index 43736017..c4ff02ae 100644
--- a/debian/proxmox-backup-client.bash-completion
+++ b/debian/proxmox-backup-client.bash-completion
@@ -1,2 +1,3 @@
debian/proxmox-backup-client.bc proxmox-backup-client
+debian/proxmox-backup-test-suite.bc proxmox-backup-test-suite
debian/pxar.bc pxar
diff --git a/debian/proxmox-backup-client.install b/debian/proxmox-backup-client.install
index 74b568f1..0eb85975 100644
--- a/debian/proxmox-backup-client.install
+++ b/debian/proxmox-backup-client.install
@@ -1,6 +1,8 @@
usr/bin/proxmox-backup-client
+usr/bin/proxmox-backup-test-suite
usr/bin/pxar
usr/share/man/man1/proxmox-backup-client.1
usr/share/man/man1/pxar.1
usr/share/zsh/vendor-completions/_proxmox-backup-client
+usr/share/zsh/vendor-completions/_proxmox-backup-test-suite
usr/share/zsh/vendor-completions/_pxar
diff --git a/debian/proxmox-backup-test-suite.bc b/debian/proxmox-backup-test-suite.bc
new file mode 100644
index 00000000..2686d7ea
--- /dev/null
+++ b/debian/proxmox-backup-test-suite.bc
@@ -0,0 +1,8 @@
+# proxmox-backup-test-suite bash completion
+
+# see http://tiswww.case.edu/php/chet/bash/FAQ
+# and __ltrim_colon_completions() in /usr/share/bash-completion/bash_completion
+# this modifies global var, but I found no better way
+COMP_WORDBREAKS=${COMP_WORDBREAKS//:}
+
+complete -C 'proxmox-backup-test-suite bashcomplete' proxmox-backup-test-suite
diff --git a/zsh-completions/_proxmox-backup-test-suite b/zsh-completions/_proxmox-backup-test-suite
new file mode 100644
index 00000000..72ebcea5
--- /dev/null
+++ b/zsh-completions/_proxmox-backup-test-suite
@@ -0,0 +1,13 @@
+#compdef _proxmox-backup-test-suite() proxmox-backup-test-suite
+
+function _proxmox-backup-test-suite() {
+ local cwords line point cmd curr prev
+ cwords=${#words[@]}
+ line=$words
+ point=${#line}
+ cmd=${words[1]}
+ curr=${words[cwords]}
+ prev=${words[cwords-1]}
+ compadd -- $(COMP_CWORD="$cwords" COMP_LINE="$line" COMP_POINT="$point" \
+ proxmox-backup-test-suite bashcomplete "$cmd" "$curr" "$prev")
+}
--
2.39.2
^ permalink raw reply [flat|nested] 30+ messages in thread
* [pbs-devel] [PATCH v6 proxmox-backup 28/29] pxar: add lookahead caching to reduce chunk fragmentation
2024-01-25 13:25 [pbs-devel] [PATCH-SERIES v6 pxar proxmox-backup proxmox-widget-toolkit 0/29] fix #3174: improve file-level backup Christian Ebner
` (26 preceding siblings ...)
2024-01-25 13:26 ` [pbs-devel] [PATCH v6 proxmox-backup 27/29] test-suite: Add bin to deb, add shell completions Christian Ebner
@ 2024-01-25 13:26 ` Christian Ebner
2024-01-25 13:26 ` [pbs-devel] [PATCH v6 proxmox-widget-toolkit 29/29] file-browser: support pxar archive and fileref types Christian Ebner
28 siblings, 0 replies; 30+ messages in thread
From: Christian Ebner @ 2024-01-25 13:26 UTC (permalink / raw)
To: pbs-devel
Over multiple consecutive runs with metadata-based file change
detection, referencing existing chunks can lead to fragmentation and
an increased index file size.
In order to reduce this, look ahead before adding files as references
and only reference the chunks if excessive fragmentation can be
avoided; otherwise, re-encode the files regularly.
Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
Changes since v5:
- not present in previous version
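A standalone toy model of the flush-or-cache decision implemented by
cache_or_flush_entries() below; illustrative only, the threshold is the
constant introduced by this patch, everything else is simplified:

const CACHED_PAYLOAD_THRESHOLD: u64 = 2 * 1024 * 1024;

enum Action {
    Cache,         // defer the decision, keep entry in the lookahead cache
    FlushAsRefs,   // threshold reached, emit cached entries as appendix refs
    FlushReencode, // entry not reusable, re-encode cached entries regularly
}

fn decide(reusable: bool, cached_payload: u64, file_size: u64) -> Action {
    if !reusable {
        Action::FlushReencode
    } else if cached_payload + file_size >= CACHED_PAYLOAD_THRESHOLD {
        Action::FlushAsRefs
    } else {
        Action::Cache
    }
}

fn main() {
    // small reusable files keep accumulating in the cache ...
    assert!(matches!(decide(true, 0, 4096), Action::Cache));
    // ... until enough payload is cached to avoid excessive fragmentation
    assert!(matches!(
        decide(true, CACHED_PAYLOAD_THRESHOLD - 1, 4096),
        Action::FlushAsRefs
    ));
    // a file with changed metadata flushes the cache re-encoded
    assert!(matches!(decide(false, 4096, 123), Action::FlushReencode));
}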
pbs-client/src/pxar/create.rs | 435 ++++++++++++++++++++-----
pbs-client/src/pxar/lookahead_cache.rs | 41 +++
pbs-client/src/pxar/mod.rs | 1 +
3 files changed, 402 insertions(+), 75 deletions(-)
create mode 100644 pbs-client/src/pxar/lookahead_cache.rs
diff --git a/pbs-client/src/pxar/create.rs b/pbs-client/src/pxar/create.rs
index a8e70651..e29401ee 100644
--- a/pbs-client/src/pxar/create.rs
+++ b/pbs-client/src/pxar/create.rs
@@ -31,11 +31,13 @@ use pbs_datastore::catalog::{
use pbs_datastore::dynamic_index::{AppendableDynamicEntry, DynamicIndexReader};
use crate::inject_reused_chunks::InjectChunks;
+use crate::pxar::lookahead_cache::{CacheEntry, CacheEntryData};
use crate::pxar::metadata::errno_is_unsupported;
use crate::pxar::tools::assert_single_path_component;
use crate::pxar::Flags;
const MAX_FILE_SIZE: u64 = 1024;
+const CACHED_PAYLOAD_THRESHOLD: u64 = 2 * 1024 * 1024;
/// Pxar options for creating a pxar archive/stream
#[derive(Default)]
@@ -240,6 +242,9 @@ struct Archiver {
forced_boundaries: Arc<Mutex<VecDeque<InjectChunks>>>,
appendix: Appendix,
prev_appendix: Option<AppendixStartOffset>,
+ cached_entries: Vec<CacheEntry>,
+ caching_enabled: bool,
+ cached_payload_size: u64,
}
type Encoder<'a, T> = pxar::encoder::aio::Encoder<'a, T>;
@@ -329,6 +334,9 @@ where
forced_boundaries,
appendix: Appendix::new(),
prev_appendix: appendix_start,
+ cached_entries: Vec::new(),
+ caching_enabled: false,
+ cached_payload_size: 0,
};
if let Some(ref mut catalog) = archiver.catalog {
@@ -344,6 +352,17 @@ where
.archive_dir_contents(&mut encoder, source_dir, prev_cat_parent.as_ref(), true)
.await?;
+ if let Some(last) = archiver.cached_entries.pop() {
+ match last {
+ CacheEntry::DirEnd => {}
+ _ => archiver.cached_entries.push(last),
+ }
+ }
+
+ archiver
+ .flush_cached_to_archive(&mut encoder, false)
+ .await?;
+
if archiver.appendix.is_empty() {
encoder.finish(None).await?;
if let Some(ref mut catalog) = archiver.catalog {
@@ -442,6 +461,11 @@ impl Archiver {
.await
.map_err(|err| self.wrap_err(err))?;
}
+
+ if self.caching_enabled {
+ self.cached_entries.push(CacheEntry::DirEnd);
+ }
+
self.path = old_path;
self.entry_counter = entry_counter;
self.patterns.truncate(old_patterns_count);
@@ -694,7 +718,7 @@ impl Archiver {
let total = self.appendix.total;
let appendix_offset = encoder.add_appendix(total).await?;
let mut boundaries = self.forced_boundaries.lock().unwrap();
- let mut position = unsafe { encoder.position_add(0) };
+ let mut position = unsafe { encoder.position_add(0)? };
// Inject reused chunks in batches of 128 to not exceed the upload post request size limit
for chunks in self.appendix.chunks.chunks(128) {
@@ -707,24 +731,21 @@ impl Archiver {
size,
};
boundaries.push_back(inject_chunks);
- position = unsafe { encoder.position_add(size as u64) };
+ position = unsafe { encoder.position_add(size as u64)? };
}
Ok((appendix_offset, total))
}
- async fn reuse_if_metadata_unchanged<T: SeqWrite + Send>(
+ async fn is_reusable_entry(
&mut self,
- encoder: &mut Encoder<'_, T>,
c_file_name: &CStr,
- metadata: &Metadata,
stat: &FileStat,
catalog_entries: &[DirEntry],
- ) -> Result<bool, Error> {
- let prev_ref = match self.previous_ref {
- None => return Ok(false),
- Some(ref mut prev_ref) => prev_ref,
- };
+ ) -> Result<Option<u64>, Error> {
+ if stat.st_nlink > 1 || (stat.st_size as u64) < MAX_FILE_SIZE {
+ return Ok(None);
+ }
let catalog_entry = catalog_entries
.iter()
@@ -754,43 +775,17 @@ impl Archiver {
ctime,
self.prev_appendix.unwrap().raw() + appendix_ref_offset.raw(),
),
- // The entry type found in the catalog is not a regular file,
+ // Entry not found or the entry type found in the catalog is not a regular file,
// do not reuse entry but rather encode it again.
- _ => return Ok(false),
+ _ => return Ok(None),
};
let file_size = stat.st_size as u64;
if ctime != stat.st_ctime || size != file_size {
- return Ok(false);
+ return Ok(None);
}
- // Since ctime is unchanged, use current metadata size to calculate size and thereby
- // end offset for the file payload in the reference archive.
- // This is required to find the existing indexes to be included in the new index file.
- let mut bytes = pxar::encoder::encoded_size(c_file_name, metadata).await?;
- // payload header size
- bytes += std::mem::size_of::<pxar::format::Header>() as u64;
-
- let end_offset = start_offset + bytes + file_size;
- let (indices, start_padding, end_padding) =
- prev_ref.index.indices(start_offset, end_offset)?;
-
- let appendix_ref_offset = self.appendix.insert(indices, start_padding);
- let file_name: &Path = OsStr::from_bytes(c_file_name.to_bytes()).as_ref();
- self.add_appendix_ref(encoder, file_name, appendix_ref_offset, file_size)
- .await?;
-
- if let Some(ref catalog) = self.catalog {
- catalog.lock().unwrap().add_appendix_ref(
- c_file_name,
- file_size,
- stat.st_mtime,
- stat.st_ctime,
- appendix_ref_offset,
- )?;
- }
-
- Ok(true)
+ Ok(Some(start_offset))
}
async fn add_entry<T: SeqWrite + Send>(
@@ -801,8 +796,6 @@ impl Archiver {
stat: &FileStat,
prev_cat_entries: &[DirEntry],
) -> Result<(), Error> {
- use pxar::format::mode;
-
let file_mode = stat.st_mode & libc::S_IFMT;
let open_mode = if file_mode == libc::S_IFREG || file_mode == libc::S_IFDIR {
OFlag::empty()
@@ -822,6 +815,15 @@ impl Archiver {
None => return Ok(()),
};
+ let match_path = PathBuf::from("/").join(self.path.clone());
+ if self
+ .patterns
+ .matches(match_path.as_os_str().as_bytes(), stat.st_mode)?
+ == Some(MatchType::Exclude)
+ {
+ return Ok(());
+ }
+
let metadata = get_metadata(
fd.as_raw_fd(),
stat,
@@ -830,16 +832,148 @@ impl Archiver {
&mut self.fs_feature_flags,
)?;
- let match_path = PathBuf::from("/").join(self.path.clone());
- if self
- .patterns
- .matches(match_path.as_os_str().as_bytes(), stat.st_mode)?
- == Some(MatchType::Exclude)
- {
- return Ok(());
+ if self.previous_ref.is_none() {
+ return self
+ .add_entry_to_archive(encoder, c_file_name, stat, fd, &metadata, prev_cat_entries)
+ .await;
}
+ if metadata.is_regular_file() {
+ self.cache_or_flush_entries(encoder, c_file_name, stat, fd, &metadata, prev_cat_entries)
+ .await
+ } else {
+ if self.caching_enabled {
+ if stat.st_mode & libc::S_IFMT == libc::S_IFDIR {
+ let fd_clone = fd.try_clone()?;
+ let cache_entry = CacheEntry::DirEntry(CacheEntryData::new(
+ fd,
+ c_file_name.into(),
+ stat.clone(),
+ metadata.clone(),
+ Vec::new(),
+ 0,
+ ));
+ self.cached_entries.push(cache_entry);
+ let prev_cat_entry = prev_cat_entries.iter().find(|entry| match entry {
+ DirEntry {
+ name,
+ attr: DirEntryAttribute::Directory { .. },
+ } => name == c_file_name.to_bytes(),
+ _ => false,
+ });
+
+ let dir = Dir::from_fd(fd_clone.into_raw_fd())?;
+ self.add_directory(encoder, dir, c_file_name, &metadata, stat, prev_cat_entry)
+ .await?;
+ if let Some(ref catalog) = self.catalog {
+ if !self.caching_enabled {
+ catalog.lock().unwrap().end_directory()?;
+ }
+ }
+ } else {
+ let cache_entry = CacheEntry::RegEntry(CacheEntryData::new(
+ fd,
+ c_file_name.into(),
+ stat.clone(),
+ metadata,
+ Vec::new(),
+ 0,
+ ));
+ self.cached_entries.push(cache_entry);
+ }
+ Ok(())
+ } else {
+ self.add_entry_to_archive(
+ encoder,
+ c_file_name,
+ stat,
+ fd,
+ &metadata,
+ prev_cat_entries,
+ )
+ .await
+ }
+ }
+ }
+
+ async fn cache_or_flush_entries<T: SeqWrite + Send>(
+ &mut self,
+ encoder: &mut Encoder<'_, T>,
+ c_file_name: &CStr,
+ stat: &FileStat,
+ fd: OwnedFd,
+ metadata: &Metadata,
+ prev_cat_entries: &[DirEntry],
+ ) -> Result<(), Error> {
+ let reusable = self
+ .is_reusable_entry(c_file_name, stat, prev_cat_entries)
+ .await?;
+
let file_name: &Path = OsStr::from_bytes(c_file_name.to_bytes()).as_ref();
+ let file_size = stat.st_size as u64;
+ if let Some(start_offset) = reusable {
+ if let Some(ref prev_ref) = self.previous_ref {
+ // Since ctime is unchanged, use current metadata size to calculate size and thereby
+ // end offset for the file payload in the reference archive.
+ // This is required to find the existing indexes to be included in the new index file.
+ let mut bytes = pxar::encoder::encoded_size(c_file_name, metadata).await?;
+ // payload header size
+ bytes += std::mem::size_of::<pxar::format::Header>() as u64;
+
+ let end_offset = start_offset + bytes + file_size;
+ let (indices, start_padding, _end_padding) =
+ prev_ref.index.indices(start_offset, end_offset)?;
+
+ if self.cached_payload_size + file_size >= CACHED_PAYLOAD_THRESHOLD {
+ self.flush_cached_to_archive(encoder, true).await?;
+ let appendix_ref_offset = self.appendix.insert(indices, start_padding);
+ self.add_appendix_ref(encoder, file_name, appendix_ref_offset, file_size)
+ .await?;
+
+ if let Some(ref catalog) = self.catalog {
+ catalog.lock().unwrap().add_appendix_ref(
+ c_file_name,
+ file_size,
+ stat.st_mtime,
+ stat.st_ctime,
+ appendix_ref_offset,
+ )?;
+ }
+ } else {
+ self.caching_enabled = true;
+ self.cached_payload_size += file_size;
+ let cache_entry = CacheEntry::RegEntry(CacheEntryData::new(
+ fd,
+ c_file_name.into(),
+ stat.clone(),
+ metadata.clone(),
+ indices,
+ start_padding,
+ ));
+ self.cached_entries.push(cache_entry);
+ }
+
+ return Ok(());
+ }
+ }
+
+ self.flush_cached_to_archive(encoder, false).await?;
+ self.add_entry_to_archive(encoder, c_file_name, stat, fd, &metadata, prev_cat_entries)
+ .await
+ }
+
+ async fn add_entry_to_archive<T: SeqWrite + Send>(
+ &mut self,
+ encoder: &mut Encoder<'_, T>,
+ c_file_name: &CStr,
+ stat: &FileStat,
+ fd: OwnedFd,
+ metadata: &Metadata,
+ prev_cat_entries: &[DirEntry],
+ ) -> Result<(), Error> {
+ let file_name: &Path = OsStr::from_bytes(c_file_name.to_bytes()).as_ref();
+
+ use pxar::format::mode;
match metadata.file_type() {
mode::IFREG => {
let link_info = HardLinkInfo {
@@ -852,28 +986,12 @@ impl Archiver {
if let Some(ref catalog) = self.catalog {
catalog.lock().unwrap().add_hardlink(c_file_name)?;
}
-
encoder.add_hardlink(file_name, path, *offset).await?;
-
return Ok(());
}
}
let file_size = stat.st_size as u64;
- if file_size > MAX_FILE_SIZE
- && self
- .reuse_if_metadata_unchanged(
- encoder,
- c_file_name,
- &metadata,
- stat,
- prev_cat_entries,
- )
- .await?
- {
- return Ok(());
- }
-
let offset: LinkOffset = self
.add_regular_file(encoder, fd, file_name, &metadata, file_size)
.await?;
@@ -892,7 +1010,6 @@ impl Archiver {
self.hardlinks
.insert(link_info, (self.path.clone(), offset));
}
-
Ok(())
}
mode::IFDIR => {
@@ -912,7 +1029,9 @@ impl Archiver {
.add_directory(encoder, dir, c_file_name, &metadata, stat, prev_cat_entry)
.await;
if let Some(ref catalog) = self.catalog {
- catalog.lock().unwrap().end_directory()?;
+ if !self.caching_enabled {
+ catalog.lock().unwrap().end_directory()?;
+ }
}
result
}
@@ -920,35 +1039,30 @@ impl Archiver {
if let Some(ref catalog) = self.catalog {
catalog.lock().unwrap().add_socket(c_file_name)?;
}
-
Ok(encoder.add_socket(&metadata, file_name).await?)
}
mode::IFIFO => {
if let Some(ref catalog) = self.catalog {
catalog.lock().unwrap().add_fifo(c_file_name)?;
}
-
Ok(encoder.add_fifo(&metadata, file_name).await?)
}
mode::IFLNK => {
if let Some(ref catalog) = self.catalog {
catalog.lock().unwrap().add_symlink(c_file_name)?;
}
-
self.add_symlink(encoder, fd, file_name, &metadata).await
}
mode::IFBLK => {
if let Some(ref catalog) = self.catalog {
catalog.lock().unwrap().add_block_device(c_file_name)?;
}
-
self.add_device(encoder, file_name, &metadata, stat).await
}
mode::IFCHR => {
if let Some(ref catalog) = self.catalog {
catalog.lock().unwrap().add_char_device(c_file_name)?;
}
-
self.add_device(encoder, file_name, &metadata, stat).await
}
other => bail!(
@@ -959,6 +1073,172 @@ impl Archiver {
}
}
+ async fn flush_cached_to_archive<T: SeqWrite + Send>(
+ &mut self,
+ encoder: &mut Encoder<'_, T>,
+ as_appendix_refs: bool,
+ ) -> Result<(), Error> {
+ let entries = std::mem::take(&mut self.cached_entries);
+
+ self.caching_enabled = false;
+ self.cached_payload_size = 0;
+
+ for entry in entries {
+ match entry {
+ CacheEntry::RegEntry(data) => {
+ self.flush_entry_to_archive(encoder, data, as_appendix_refs)
+ .await?
+ }
+ CacheEntry::DirEntry(data) => {
+ self.flush_directory_to_archive(encoder, data).await?
+ }
+ CacheEntry::DirEnd => {
+ let result = encoder.finish(None).await?;
+ if let Some(ref catalog) = self.catalog {
+ catalog.lock().unwrap().end_directory()?;
+ }
+ result
+ }
+ }
+ }
+
+ Ok(())
+ }
+
+ async fn flush_directory_to_archive<'a, 'b, T: SeqWrite + Send>(
+ &'a mut self,
+ encoder: &'a mut Encoder<'b, T>,
+ entry_data: CacheEntryData,
+ ) -> Result<(), Error> {
+ let CacheEntryData {
+ c_file_name,
+ metadata,
+ ..
+ } = entry_data;
+ let dir_name = OsStr::from_bytes(c_file_name.to_bytes());
+
+ if let Some(ref catalog) = self.catalog {
+ catalog.lock().unwrap().start_directory(&c_file_name)?;
+ }
+
+ encoder.create_directory(dir_name, &metadata).await?;
+
+ Ok(())
+ }
+
+ async fn flush_entry_to_archive<T: SeqWrite + Send>(
+ &mut self,
+ encoder: &mut Encoder<'_, T>,
+ entry_data: CacheEntryData,
+ as_appendix_refs: bool,
+ ) -> Result<(), Error> {
+ use pxar::format::mode;
+
+ let CacheEntryData {
+ fd,
+ c_file_name,
+ stat,
+ metadata,
+ indices,
+ start_padding,
+ } = entry_data;
+ let file_name: &Path = OsStr::from_bytes(c_file_name.to_bytes()).as_ref();
+
+ match metadata.file_type() {
+ mode::IFREG => {
+ let link_info = HardLinkInfo {
+ st_dev: stat.st_dev,
+ st_ino: stat.st_ino,
+ };
+
+ if stat.st_nlink > 1 {
+ if let Some((path, offset)) = self.hardlinks.get(&link_info) {
+ if let Some(ref catalog) = self.catalog {
+ catalog.lock().unwrap().add_hardlink(&c_file_name)?;
+ }
+ encoder.add_hardlink(file_name, path, *offset).await?;
+ return Ok(());
+ }
+ }
+
+ let file_size = stat.st_size as u64;
+ if as_appendix_refs {
+ let appendix_ref_offset = self.appendix.insert(indices, start_padding);
+ let file_name: &Path = OsStr::from_bytes(c_file_name.to_bytes()).as_ref();
+ self.add_appendix_ref(encoder, file_name, appendix_ref_offset, file_size)
+ .await?;
+ if let Some(ref catalog) = self.catalog {
+ catalog.lock().unwrap().add_appendix_ref(
+ &c_file_name,
+ file_size,
+ stat.st_mtime,
+ stat.st_ctime,
+ appendix_ref_offset,
+ )?;
+ }
+ } else {
+ let offset: LinkOffset = self
+ .add_regular_file(encoder, fd, file_name, &metadata, file_size)
+ .await?;
+
+ if let Some(ref catalog) = self.catalog {
+ catalog.lock().unwrap().add_file(
+ &c_file_name,
+ file_size,
+ stat.st_mtime,
+ stat.st_ctime,
+ offset,
+ )?;
+ }
+
+ if stat.st_nlink > 1 {
+ self.hardlinks
+ .insert(link_info, (self.path.clone(), offset));
+ }
+ }
+ }
+ mode::IFSOCK => {
+ if let Some(ref catalog) = self.catalog {
+ catalog.lock().unwrap().add_socket(&c_file_name)?;
+ }
+ encoder.add_socket(&metadata, file_name).await?;
+ }
+ mode::IFIFO => {
+ if let Some(ref catalog) = self.catalog {
+ catalog.lock().unwrap().add_fifo(&c_file_name)?;
+ }
+ encoder.add_fifo(&metadata, file_name).await?;
+ }
+ mode::IFLNK => {
+ if let Some(ref catalog) = self.catalog {
+ catalog.lock().unwrap().add_symlink(&c_file_name)?;
+ }
+ self.add_symlink(encoder, fd, file_name, &metadata).await?;
+ }
+ mode::IFBLK => {
+ if let Some(ref catalog) = self.catalog {
+ catalog.lock().unwrap().add_block_device(&c_file_name)?;
+ }
+ self.add_device(encoder, file_name, &metadata, &stat)
+ .await?;
+ }
+ mode::IFCHR => {
+ if let Some(ref catalog) = self.catalog {
+ catalog.lock().unwrap().add_char_device(&c_file_name)?;
+ }
+ self.add_device(encoder, file_name, &metadata, &stat)
+ .await?;
+ }
+ other => bail!(
+ "encountered unknown file type: 0x{:x} (0o{:o})",
+ other,
+ other
+ ),
+ }
+
+ Ok(())
+ }
+
async fn add_directory<T: SeqWrite + Send>(
&mut self,
encoder: &mut Encoder<'_, T>,
@@ -970,7 +1250,9 @@ impl Archiver {
) -> Result<(), Error> {
let dir_name = OsStr::from_bytes(dir_name.to_bytes());
- let mut encoder = encoder.create_directory(dir_name, metadata).await?;
+ if !self.caching_enabled {
+ encoder.create_directory(dir_name, metadata).await?;
+ }
let old_fs_magic = self.fs_magic;
let old_fs_feature_flags = self.fs_feature_flags;
@@ -993,7 +1275,7 @@ impl Archiver {
log::info!("skipping mount point: {:?}", self.path);
Ok(())
} else {
- self.archive_dir_contents(&mut encoder, dir, prev_cat_parent, false)
+ self.archive_dir_contents(encoder, dir, prev_cat_parent, false)
.await
};
@@ -1001,7 +1283,10 @@ impl Archiver {
self.fs_feature_flags = old_fs_feature_flags;
self.current_st_dev = old_st_dev;
- encoder.finish(None).await?;
+ if !self.caching_enabled {
+ encoder.finish(None).await?;
+ }
+
result
}
diff --git a/pbs-client/src/pxar/lookahead_cache.rs b/pbs-client/src/pxar/lookahead_cache.rs
new file mode 100644
index 00000000..b664250b
--- /dev/null
+++ b/pbs-client/src/pxar/lookahead_cache.rs
@@ -0,0 +1,41 @@
+use nix::sys::stat::FileStat;
+use std::ffi::CString;
+use std::os::unix::io::OwnedFd;
+
+use pbs_datastore::dynamic_index::AppendableDynamicEntry;
+use pxar::Metadata;
+
+pub(crate) struct CacheEntryData {
+ pub(crate) fd: OwnedFd,
+ pub(crate) c_file_name: CString,
+ pub(crate) stat: FileStat,
+ pub(crate) metadata: Metadata,
+ pub(crate) indices: Vec<AppendableDynamicEntry>,
+ pub(crate) start_padding: u64,
+}
+
+impl CacheEntryData {
+ pub(crate) fn new(
+ fd: OwnedFd,
+ c_file_name: CString,
+ stat: FileStat,
+ metadata: Metadata,
+ indices: Vec<AppendableDynamicEntry>,
+ start_padding: u64,
+ ) -> Self {
+ Self {
+ fd,
+ c_file_name,
+ stat,
+ metadata,
+ indices,
+ start_padding,
+ }
+ }
+}
+
+pub(crate) enum CacheEntry {
+ RegEntry(CacheEntryData),
+ DirEntry(CacheEntryData),
+ DirEnd,
+}
diff --git a/pbs-client/src/pxar/mod.rs b/pbs-client/src/pxar/mod.rs
index 24315f5f..3729ab10 100644
--- a/pbs-client/src/pxar/mod.rs
+++ b/pbs-client/src/pxar/mod.rs
@@ -50,6 +50,7 @@
pub(crate) mod create;
pub(crate) mod dir_stack;
pub(crate) mod extract;
+pub(crate) mod lookahead_cache;
pub(crate) mod metadata;
pub(crate) mod tools;
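The offset arithmetic used above by is_reusable_entry() and
cache_or_flush_entries() reduces to: end offset = catalog start offset
+ encoded filename/metadata size + payload header size + file size. A
self-contained numeric sketch, with all byte values invented purely for
illustration:

fn main() {
    // invented example values; the real code obtains them from the
    // catalog (format v2), stat and pxar::encoder::encoded_size()
    let start_offset: u64 = 1_048_576; // file offset in the previous pxar archive
    let encoded_size: u64 = 64;        // filename + metadata as encoded by pxar
    let header_size: u64 = 16;         // payload header preceding the file contents
    let file_size: u64 = 8_192;        // from stat, valid since ctime is unchanged

    // the byte range [start_offset, end_offset) covers everything that
    // belongs to this file and selects the chunk indices to re-inject
    // along with the start/end paddings within those chunks
    let end_offset = start_offset + encoded_size + header_size + file_size;
    assert_eq!(end_offset, 1_056_848);
}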
--
2.39.2
* [pbs-devel] [PATCH v6 proxmox-widget-toolkit 29/29] file-browser: support pxar archive and fileref types
2024-01-25 13:25 [pbs-devel] [PATCH-SERIES v6 pxar proxmox-backup proxmox-widget-toolkit 0/29] fix #3174: improve file-level backup Christian Ebner
` (27 preceding siblings ...)
2024-01-25 13:26 ` [pbs-devel] [PATCH v6 proxmox-backup 28/29] pxar: add lookahead caching to reduce chunk fragmentation Christian Ebner
@ 2024-01-25 13:26 ` Christian Ebner
28 siblings, 0 replies; 30+ messages in thread
From: Christian Ebner @ 2024-01-25 13:26 UTC (permalink / raw)
To: pbs-devel
Cover the special 'archive' directory type and the file reference
entry type introduced by pxar format version 2.
Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
Changes since v5:
- no changes
src/Schema.js | 2 ++
src/window/FileBrowser.js | 15 +++++++++++----
2 files changed, 13 insertions(+), 4 deletions(-)
diff --git a/src/Schema.js b/src/Schema.js
index 841527f..e8908cf 100644
--- a/src/Schema.js
+++ b/src/Schema.js
@@ -59,7 +59,9 @@ Ext.define('Proxmox.Schema', { // a singleton
b: { icon: 'cube', label: gettext('Block Device') },
c: { icon: 'tty', label: gettext('Character Device') },
d: { icon: 'folder-o', label: gettext('Directory') },
+ a: { icon: 'folder-o', label: gettext('Archive') },
f: { icon: 'file-text-o', label: gettext('File') },
+ r: { icon: 'file-text-o', label: gettext('FileRef') },
h: { icon: 'file-o', label: gettext('Hardlink') },
l: { icon: 'link', label: gettext('Softlink') },
p: { icon: 'exchange', label: gettext('Pipe/Fifo') },
diff --git a/src/window/FileBrowser.js b/src/window/FileBrowser.js
index e036d9f..4a288e6 100644
--- a/src/window/FileBrowser.js
+++ b/src/window/FileBrowser.js
@@ -8,7 +8,7 @@ Ext.define('proxmox-file-tree', {
calculate: data => {
if (data.size === undefined) {
return '';
- } else if (false && data.type === 'd') { // eslint-disable-line no-constant-condition
+ } else if (false && (data.type === 'd' || data.type === 'a')) { // eslint-disable-line no-constant-condition
// FIXME: enable again once we fixed trouble with confusing size vs item #
let fs = data.size === 1 ? gettext('{0} Item') : gettext('{0} Items');
return Ext.String.format(fs, data.size);
@@ -26,7 +26,7 @@ Ext.define('proxmox-file-tree', {
name: 'iconCls',
calculate: function(data) {
let icon = Proxmox.Schema.pxarFileTypes[data.type]?.icon ?? 'file-o';
- if (data.expanded && data.type === 'd') {
+ if (data.expanded && (data.type === 'd' || data.type === 'a')) {
icon = 'folder-open-o';
}
return `fa fa-${icon}`;
@@ -58,7 +58,9 @@ Ext.define("Proxmox.window.FileBrowser", {
downloadableFileTypes: {
'h': true, // hardlinks
'f': true, // "normal" files
+ 'r': true, // ref files
'd': true, // directories
+ 'a': true, // archive
},
// prefix to prepend to downloaded file names
@@ -98,7 +100,7 @@ Ext.define("Proxmox.window.FileBrowser", {
params.filepath = data.filepath;
let filename = view.downloadPrefix + data.text;
- if (data.type === 'd') {
+ if (data.type === 'd' || data.type === 'a') {
if (tar) {
params.tar = 1;
filename += ".tar.zst";
@@ -122,7 +124,7 @@ Ext.define("Proxmox.window.FileBrowser", {
view.lookup('selectText').setText(st);
let canDownload = view.downloadURL && view.downloadableFileTypes[data.type];
- let enableMenu = data.type === 'd';
+ let enableMenu = data.type === 'd' || data.type === 'a';
let downloadBtn = view.lookup('downloadBtn');
downloadBtn.setDisabled(!canDownload || enableMenu);
@@ -269,6 +271,11 @@
} else if (a.data.type !== 'd' && b.data.type === 'd') {
return 1;
}
+ if (a.data.type === 'a' && b.data.type !== 'a') {
+ return -1;
+ } else if (a.data.type !== 'a' && b.data.type === 'a') {
+ return 1;
+ }
let asize = a.data.size || 0;
let bsize = b.data.size || 0;
--
2.39.2