From: Christian Ebner <c.ebner@proxmox.com>
To: pbs-devel@lists.proxmox.com
Subject: [pbs-devel] [PATCH v3 pxar proxmox-backup 00/58] fix #3174: improve file-level backup
Date: Thu, 28 Mar 2024 13:36:09 +0100
Message-ID: <20240328123707.336951-1-c.ebner@proxmox.com>
A big thank you to Dietmar and Fabian for the review of the previous
version and Fabian for extensive testing and help during debugging.
This series of patches implements a metadata-based file change
detection mechanism to speed up pxar file-level backup creation
for unchanged files.
The chosen approach is to split pxar archives on creation via the
proxmox-backup-client into two separate data and upload streams,
one exclusive for regular file payloads, the other one for the rest
of the pxar archive, which is mostly metadata.
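The split can be pictured with the following minimal sketch. It is not
the pxar encoder API: `PayloadRef`, `MetaEntry` and `SplitArchive` are
hypothetical names, and two in-memory buffers stand in for the two
upload streams. The only idea taken from the series is that regular
file contents go to a dedicated payload stream, while the metadata
stream keeps a reference (offset and size) to them.

```rust
/// Hypothetical reference from a metadata entry into the payload stream.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct PayloadRef {
    offset: u64,
    size: u64,
}

/// Hypothetical, heavily simplified entry of the metadata stream.
#[derive(Debug, PartialEq, Eq)]
enum MetaEntry {
    Directory { name: String },
    File { name: String, payload: PayloadRef },
}

/// Two in-memory buffers standing in for the two upload streams.
#[derive(Debug, Default)]
struct SplitArchive {
    metadata: Vec<MetaEntry>,
    payload: Vec<u8>,
}

impl SplitArchive {
    /// Directories carry no payload, so they only touch the metadata stream.
    fn add_directory(&mut self, name: &str) {
        self.metadata.push(MetaEntry::Directory { name: name.to_string() });
    }

    /// Append the file contents to the payload stream and record a
    /// reference to them in the metadata stream.
    fn add_file(&mut self, name: &str, content: &[u8]) -> PayloadRef {
        let payload = PayloadRef {
            offset: self.payload.len() as u64,
            size: content.len() as u64,
        };
        self.payload.extend_from_slice(content);
        self.metadata.push(MetaEntry::File { name: name.to_string(), payload });
        payload
    }
}
```

Because the metadata buffer never contains file contents, it stays
small compared to the payload buffer, which is the property the
following paragraphs rely on.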
On consecutive runs, the metadata archive of the previous backup run,
which is limited in size and can therefore be accessed rapidly, is used
to look up and compare the metadata for entries to encode.
This assumes that the connection speed to the Proxmox Backup Server is
sufficiently fast, allowing the download and caching of the chunks for
that index.
Changes to regular files are detected by comparing all of the file's
metadata object, including mtime, ACLs, etc. If no changes are detected,
the previous payload index is used to look up chunks to possibly re-use
in the payload stream of the new archive.
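As a rough illustration of this comparison, consider the sketch below.
`EntryMetadata`, `Decision` and `detect_change` are hypothetical names
and the field set is an assumption; the actual pxar metadata object
contains more than is shown here. The point is only that any difference
in the metadata, or a missing previous entry, leads to a regular
re-encode.

```rust
/// Hypothetical, simplified stand-in for the per-entry metadata object.
/// The selection of fields is an assumption made for this sketch.
#[derive(Debug, Clone, PartialEq, Eq)]
struct EntryMetadata {
    mode: u32,
    uid: u32,
    gid: u32,
    mtime_secs: i64,
    mtime_nanos: u32,
    size: u64,
    acls: Vec<String>,
    xattrs: Vec<(String, Vec<u8>)>,
}

#[derive(Debug, PartialEq, Eq)]
enum Decision {
    /// Metadata is identical: chunks found via the previous payload
    /// index are candidates for re-use.
    ReuseCandidate,
    /// Metadata differs or no previous entry exists: encode regularly.
    Reencode,
}

/// Compare the whole metadata object of the previous and current entry.
fn detect_change(previous: Option<&EntryMetadata>, current: &EntryMetadata) -> Decision {
    match previous {
        Some(prev) if prev == current => Decision::ReuseCandidate,
        _ => Decision::Reencode,
    }
}
```

Passing `None` as the previous entry models the first backup run, where
no reference archive exists and every file is encoded regularly.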
In order to reduce possible chunk fragmentation, the decision whether to
re-use or re-encode a file payload is deferred until enough information
is gathered by adding entries to a look-ahead cache. If the padding
introduced by reusing chunks stays below a threshold, the entries are
referenced and the chunks are re-used and injected into the pxar payload
upload stream; otherwise they are discarded and the files are encoded
regularly.
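The decision rule can be sketched as follows. This is not the code from
the series: `CachedRange`, `padding_ratio` and `should_reuse` are
hypothetical names, and expressing the threshold as a ratio of padding
to re-used chunk bytes is an assumption, since the cover letter does
not state whether the threshold is relative or absolute.

```rust
/// Hypothetical summary of the chunks covered by the look-ahead cache.
struct CachedRange {
    /// Total size in bytes of the chunks that would be re-used.
    chunk_bytes: u64,
    /// Bytes within those chunks that cached file payloads still reference.
    referenced_bytes: u64,
}

/// Fraction of the re-used chunk bytes that would be unreferenced padding.
fn padding_ratio(range: &CachedRange) -> f64 {
    if range.chunk_bytes == 0 {
        return 0.0;
    }
    let padding = range.chunk_bytes.saturating_sub(range.referenced_bytes);
    padding as f64 / range.chunk_bytes as f64
}

/// Re-use the chunks only if the padding does not exceed the threshold
/// (assumed here to be a ratio between 0.0 and 1.0).
fn should_reuse(range: &CachedRange, threshold: f64) -> bool {
    padding_ratio(range) <= threshold
}
```

With an assumed threshold of 0.1, a 4096 byte chunk range of which
4000 bytes are still referenced would be re-used, while one with only
1024 referenced bytes would be discarded and its files re-encoded.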
The following lists the most notable changes included in this series
since version 2:
- many bugfixes regarding incorrect archive encoding caused by wrong
offset generation, adding additional sanity checks so that encoding
fails rather than producing an incorrectly encoded archive
- different approach for deciding whether to re-use or re-encode the
entries. Previously, the entries were encoded when a cached
payload size threshold was reached. Now, the padding introduced by
reusable chunks is tracked, and the entries are re-used only if the
padding does not exceed the set threshold. This reduces the possible
padding, at the cost of re-encoding more entries. It also avoids
re-using chunks which now have large padding holes because of
moved/removed files contained within.
- added headers for metadata archive and payload file
- added documentation
With these patches, a backup run is now invoked as:
```bash
proxmox-backup-client backup <label>.pxar:<source-path> --change-detection-mode=metadata
```
During the first run, no reference index is available; the pxar archive
will nevertheless be split into the two parts.
Subsequent backups will utilize the pxar archive accessor and
index files of the previous run to perform file change detection.
As benchmarks, the Linux source code as well as the COCO dataset for
computer vision and pattern recognition can be used.
The benchmarks can be performed by running:
```bash
proxmox-backup-test-suite detection-mode-bench prepare --target /<path-to-bench-source-target>
proxmox-backup-test-suite detection-mode-bench run linux.pxar:/<path-to-bench-source-target>/linux
proxmox-backup-test-suite detection-mode-bench run coco.pxar:/<path-to-bench-source-target>/coco
```
The above command invocations assume the default repository and
credentials to be set as environment variables; they can however be
passed as additional optional parameters instead.
pxar:
Christian Ebner (14):
encoder: fix two typos in comments
format/examples: add PXAR_PAYLOAD_REF entry header
decoder: add method to read payload references
decoder: factor out skip part from skip_entry
encoder: add optional output writer for file payloads
encoder: move to stack based state tracking
decoder/accessor: add optional payload input stream
encoder: add payload reference capability
encoder: add payload position capability
encoder: add payload advance capability
encoder/format: finish payload stream with marker
format: add payload stream start marker
format: add pxar format version entry
format/encoder/decoder: add entry type cli params
examples/apxar.rs | 2 +-
examples/mk-format-hashes.rs | 21 ++
examples/pxarcmd.rs | 7 +-
src/accessor/aio.rs | 10 +-
src/accessor/mod.rs | 52 +++-
src/accessor/sync.rs | 8 +-
src/decoder/aio.rs | 14 +-
src/decoder/mod.rs | 191 ++++++++++++--
src/decoder/sync.rs | 15 +-
src/encoder/aio.rs | 87 +++++--
src/encoder/mod.rs | 475 +++++++++++++++++++++++++----------
src/encoder/sync.rs | 67 ++++-
src/format/mod.rs | 63 +++++
src/lib.rs | 9 +
tests/simple/main.rs | 3 +
15 files changed, 827 insertions(+), 197 deletions(-)
proxmox-backup:
Christian Ebner (44):
client: pxar: switch to stack based encoder state
client: backup writer: only borrow http client
client: backup: factor out extension from backup target
client: backup: early check for fixed index type
client: pxar: combine writer params into struct
client: backup: split payload to dedicated stream
client: helper: add helpers for creating reader instances
client: helper: add method for split archive name mapping
client: restore: read payload from dedicated index
tools: cover meta extension for pxar archives
restore: cover meta extension for pxar archives
client: mount: make split pxar archives mountable
api: datastore: refactor getting local chunk reader
api: datastore: attach optional payload chunk reader
catalog: shell: factor out pxar fuse reader instantiation
catalog: shell: redirect payload reader for split streams
www: cover meta extension for pxar archives
pxar: add optional payload input for achive restore
pxar: add more context to extraction error
client: pxar: include payload offset in output
pxar: show padding in debug output on archive list
datastore: dynamic index: add method to get digest
client: pxar: helper for lookup of reusable dynamic entries
upload stream: impl reused chunk injector
client: chunk stream: add struct to hold injection state
client: chunk stream: add dynamic entries injection queues
specs: add backup detection mode specification
client: implement prepare reference method
client: pxar: implement store to insert chunks on caching
client: pxar: add previous reference to archiver
client: pxar: add method for metadata comparison
pxar: caching: add look-ahead cache types
client: pxar: add look-ahead caching
fix #3174: client: pxar: enable caching and meta comparison
client: backup: increase average chunk size for metadata
client: backup writer: add injected chunk count to stats
pxar: create: show chunk injection stats debug output
client: pxar: add entry kind format version
client: pxar: opt encode cli exclude patterns as CliParams
client: pxar: add flow chart for metadata change detection
docs: describe file format for split payload files
docs: add section describing change detection mode
test-suite: add detection mode change benchmark
test-suite: add bin to deb, add shell completions
Cargo.toml | 1 +
Makefile | 13 +-
debian/proxmox-backup-client.bash-completion | 1 +
debian/proxmox-backup-client.install | 2 +
debian/proxmox-backup-test-suite.bc | 8 +
docs/backup-client.rst | 33 +
docs/file-formats.rst | 32 +
docs/meta-format-overview.dot | 50 ++
examples/test_chunk_speed2.rs | 2 +-
examples/upload-speed.rs | 2 +-
pbs-client/src/backup_specification.rs | 40 +
pbs-client/src/backup_writer.rs | 103 ++-
pbs-client/src/chunk_stream.rs | 60 +-
pbs-client/src/inject_reused_chunks.rs | 152 ++++
pbs-client/src/lib.rs | 3 +-
pbs-client/src/pxar/create.rs | 779 +++++++++++++++++-
pbs-client/src/pxar/extract.rs | 2 +
...t-metadata-based-file-change-detection.svg | 1 +
...t-metadata-based-file-change-detection.txt | 12 +
pbs-client/src/pxar/look_ahead_cache.rs | 38 +
pbs-client/src/pxar/mod.rs | 3 +-
pbs-client/src/pxar/tools.rs | 123 ++-
pbs-client/src/pxar_backup_stream.rs | 57 +-
pbs-client/src/tools/mod.rs | 5 +-
pbs-datastore/src/dynamic_index.rs | 5 +
pbs-pxar-fuse/src/lib.rs | 2 +-
proxmox-backup-client/src/benchmark.rs | 2 +-
proxmox-backup-client/src/catalog.rs | 42 +-
proxmox-backup-client/src/helper.rs | 64 ++
proxmox-backup-client/src/main.rs | 281 ++++++-
proxmox-backup-client/src/mount.rs | 54 +-
proxmox-backup-test-suite/Cargo.toml | 18 +
.../src/detection_mode_bench.rs | 294 +++++++
proxmox-backup-test-suite/src/main.rs | 17 +
proxmox-file-restore/src/main.rs | 20 +-
.../src/proxmox_restore_daemon/api.rs | 16 +-
pxar-bin/src/main.rs | 53 +-
src/api2/admin/datastore.rs | 47 +-
src/api2/tape/restore.rs | 4 +-
src/bin/proxmox_backup_debug/diff.rs | 2 +-
src/tape/file_formats/snapshot_archive.rs | 9 +-
tests/catar.rs | 4 +-
www/datastore/Content.js | 6 +-
zsh-completions/_proxmox-backup-test-suite | 13 +
44 files changed, 2219 insertions(+), 256 deletions(-)
create mode 100644 debian/proxmox-backup-test-suite.bc
create mode 100644 docs/meta-format-overview.dot
create mode 100644 pbs-client/src/inject_reused_chunks.rs
create mode 100644 pbs-client/src/pxar/flow-chart-metadata-based-file-change-detection.svg
create mode 100644 pbs-client/src/pxar/flow-chart-metadata-based-file-change-detection.txt
create mode 100644 pbs-client/src/pxar/look_ahead_cache.rs
create mode 100644 proxmox-backup-client/src/helper.rs
create mode 100644 proxmox-backup-test-suite/Cargo.toml
create mode 100644 proxmox-backup-test-suite/src/detection_mode_bench.rs
create mode 100644 proxmox-backup-test-suite/src/main.rs
create mode 100644 zsh-completions/_proxmox-backup-test-suite
--
2.39.2
Thread overview: 122+ messages
2024-03-28 12:36 Christian Ebner [this message]
2024-03-28 12:36 ` [pbs-devel] [PATCH v3 pxar 01/58] encoder: fix two typos in comments Christian Ebner
2024-04-03 9:12 ` [pbs-devel] applied: " Fabian Grünbichler
2024-03-28 12:36 ` [pbs-devel] [PATCH v3 pxar 02/58] format/examples: add PXAR_PAYLOAD_REF entry header Christian Ebner
2024-03-28 12:36 ` [pbs-devel] [PATCH v3 pxar 03/58] decoder: add method to read payload references Christian Ebner
2024-03-28 12:36 ` [pbs-devel] [PATCH v3 pxar 04/58] decoder: factor out skip part from skip_entry Christian Ebner
2024-04-03 9:18 ` Fabian Grünbichler
2024-04-03 11:02 ` Christian Ebner
2024-03-28 12:36 ` [pbs-devel] [PATCH v3 pxar 05/58] encoder: add optional output writer for file payloads Christian Ebner
2024-03-28 12:36 ` [pbs-devel] [PATCH v3 pxar 06/58] encoder: move to stack based state tracking Christian Ebner
2024-04-03 9:54 ` Fabian Grünbichler
2024-04-03 11:01 ` Christian Ebner
2024-04-04 8:48 ` Fabian Grünbichler
2024-04-04 9:04 ` Christian Ebner
2024-03-28 12:36 ` [pbs-devel] [PATCH v3 pxar 07/58] decoder/accessor: add optional payload input stream Christian Ebner
2024-04-03 10:38 ` Fabian Grünbichler
2024-04-03 11:47 ` Christian Ebner
2024-04-03 12:18 ` Christian Ebner
2024-04-04 8:46 ` Fabian Grünbichler
2024-04-04 9:49 ` Christian Ebner
2024-03-28 12:36 ` [pbs-devel] [PATCH v3 pxar 08/58] encoder: add payload reference capability Christian Ebner
2024-03-28 12:36 ` [pbs-devel] [PATCH v3 pxar 09/58] encoder: add payload position capability Christian Ebner
2024-03-28 12:36 ` [pbs-devel] [PATCH v3 pxar 10/58] encoder: add payload advance capability Christian Ebner
2024-03-28 12:36 ` [pbs-devel] [PATCH v3 pxar 11/58] encoder/format: finish payload stream with marker Christian Ebner
2024-03-28 12:36 ` [pbs-devel] [PATCH v3 pxar 12/58] format: add payload stream start marker Christian Ebner
2024-03-28 12:36 ` [pbs-devel] [PATCH v3 pxar 13/58] format: add pxar format version entry Christian Ebner
2024-04-03 11:41 ` Fabian Grünbichler
2024-04-03 13:31 ` Christian Ebner
2024-03-28 12:36 ` [pbs-devel] [PATCH v3 pxar 14/58] format/encoder/decoder: add entry type cli params Christian Ebner
2024-04-03 12:01 ` Fabian Grünbichler
2024-04-03 14:41 ` Christian Ebner
2024-03-28 12:36 ` [pbs-devel] [PATCH v3 proxmox-backup 15/58] client: pxar: switch to stack based encoder state Christian Ebner
2024-03-28 12:36 ` [pbs-devel] [PATCH v3 proxmox-backup 16/58] client: backup writer: only borrow http client Christian Ebner
2024-04-08 9:04 ` [pbs-devel] applied: " Fabian Grünbichler
2024-04-08 9:17 ` Christian Ebner
2024-03-28 12:36 ` [pbs-devel] [PATCH v3 proxmox-backup 17/58] client: backup: factor out extension from backup target Christian Ebner
2024-03-28 12:36 ` [pbs-devel] [PATCH v3 proxmox-backup 18/58] client: backup: early check for fixed index type Christian Ebner
2024-04-08 9:05 ` [pbs-devel] applied: " Fabian Grünbichler
2024-03-28 12:36 ` [pbs-devel] [PATCH v3 proxmox-backup 19/58] client: pxar: combine writer params into struct Christian Ebner
2024-03-28 12:36 ` [pbs-devel] [PATCH v3 proxmox-backup 20/58] client: backup: split payload to dedicated stream Christian Ebner
2024-03-28 12:36 ` [pbs-devel] [PATCH v3 proxmox-backup 21/58] client: helper: add helpers for creating reader instances Christian Ebner
2024-03-28 12:36 ` [pbs-devel] [PATCH v3 proxmox-backup 22/58] client: helper: add method for split archive name mapping Christian Ebner
2024-03-28 12:36 ` [pbs-devel] [PATCH v3 proxmox-backup 23/58] client: restore: read payload from dedicated index Christian Ebner
2024-03-28 12:36 ` [pbs-devel] [PATCH v3 proxmox-backup 24/58] tools: cover meta extension for pxar archives Christian Ebner
2024-04-04 9:01 ` Fabian Grünbichler
2024-04-04 9:06 ` Christian Ebner
2024-04-04 9:10 ` Christian Ebner
2024-03-28 12:36 ` [pbs-devel] [PATCH v3 proxmox-backup 25/58] restore: " Christian Ebner
2024-04-04 9:02 ` Fabian Grünbichler
2024-03-28 12:36 ` [pbs-devel] [PATCH v3 proxmox-backup 26/58] client: mount: make split pxar archives mountable Christian Ebner
2024-04-04 9:43 ` Fabian Grünbichler
2024-04-04 13:29 ` Christian Ebner
2024-03-28 12:36 ` [pbs-devel] [PATCH v3 proxmox-backup 27/58] api: datastore: refactor getting local chunk reader Christian Ebner
2024-03-28 12:36 ` [pbs-devel] [PATCH v3 proxmox-backup 28/58] api: datastore: attach optional payload " Christian Ebner
2024-03-28 12:36 ` [pbs-devel] [PATCH v3 proxmox-backup 29/58] catalog: shell: factor out pxar fuse reader instantiation Christian Ebner
2024-03-28 12:36 ` [pbs-devel] [PATCH v3 proxmox-backup 30/58] catalog: shell: redirect payload reader for split streams Christian Ebner
2024-04-04 9:49 ` Fabian Grünbichler
2024-04-04 15:52 ` Christian Ebner
2024-03-28 12:36 ` [pbs-devel] [PATCH v3 proxmox-backup 31/58] www: cover meta extension for pxar archives Christian Ebner
2024-04-04 10:01 ` Fabian Grünbichler
2024-04-04 14:51 ` Christian Ebner
2024-03-28 12:36 ` [pbs-devel] [PATCH v3 proxmox-backup 32/58] pxar: add optional payload input for achive restore Christian Ebner
2024-03-28 12:36 ` [pbs-devel] [PATCH v3 proxmox-backup 33/58] pxar: add more context to extraction error Christian Ebner
2024-03-28 12:36 ` [pbs-devel] [PATCH v3 proxmox-backup 34/58] client: pxar: include payload offset in output Christian Ebner
2024-03-28 12:36 ` [pbs-devel] [PATCH v3 proxmox-backup 35/58] pxar: show padding in debug output on archive list Christian Ebner
2024-03-28 12:36 ` [pbs-devel] [PATCH v3 proxmox-backup 36/58] datastore: dynamic index: add method to get digest Christian Ebner
2024-03-28 12:36 ` [pbs-devel] [PATCH v3 proxmox-backup 37/58] client: pxar: helper for lookup of reusable dynamic entries Christian Ebner
2024-04-04 12:54 ` Fabian Grünbichler
2024-04-04 17:13 ` Christian Ebner
2024-04-05 7:22 ` Christian Ebner
2024-04-05 11:28 ` Fabian Grünbichler
2024-03-28 12:36 ` [pbs-devel] [PATCH v3 proxmox-backup 38/58] upload stream: impl reused chunk injector Christian Ebner
2024-04-04 14:24 ` Fabian Grünbichler
2024-04-05 10:26 ` Christian Ebner
2024-03-28 12:36 ` [pbs-devel] [PATCH v3 proxmox-backup 39/58] client: chunk stream: add struct to hold injection state Christian Ebner
2024-03-28 12:36 ` [pbs-devel] [PATCH v3 proxmox-backup 40/58] client: chunk stream: add dynamic entries injection queues Christian Ebner
2024-04-04 14:52 ` Fabian Grünbichler
2024-04-08 13:54 ` Christian Ebner
2024-04-09 7:19 ` Fabian Grünbichler
2024-03-28 12:36 ` [pbs-devel] [PATCH v3 proxmox-backup 41/58] specs: add backup detection mode specification Christian Ebner
2024-04-04 14:54 ` Fabian Grünbichler
2024-04-08 13:36 ` Christian Ebner
2024-03-28 12:36 ` [pbs-devel] [PATCH v3 proxmox-backup 42/58] client: implement prepare reference method Christian Ebner
2024-04-05 8:01 ` Fabian Grünbichler
2024-03-28 12:36 ` [pbs-devel] [PATCH v3 proxmox-backup 43/58] client: pxar: implement store to insert chunks on caching Christian Ebner
2024-04-05 7:52 ` Fabian Grünbichler
2024-04-09 9:12 ` Christian Ebner
2024-03-28 12:36 ` [pbs-devel] [PATCH v3 proxmox-backup 44/58] client: pxar: add previous reference to archiver Christian Ebner
2024-04-04 15:04 ` Fabian Grünbichler
2024-03-28 12:36 ` [pbs-devel] [PATCH v3 proxmox-backup 45/58] client: pxar: add method for metadata comparison Christian Ebner
2024-04-05 8:08 ` Fabian Grünbichler
2024-04-05 8:14 ` Christian Ebner
2024-04-09 12:52 ` Christian Ebner
2024-03-28 12:36 ` [pbs-devel] [PATCH v3 proxmox-backup 46/58] pxar: caching: add look-ahead cache types Christian Ebner
2024-03-28 12:36 ` [pbs-devel] [PATCH v3 proxmox-backup 47/58] client: pxar: add look-ahead caching Christian Ebner
2024-04-05 8:33 ` Fabian Grünbichler
2024-04-09 14:53 ` Christian Ebner
[not found] ` <<dce38c53-f3e7-47ac-b1fd-a63daaabbcec@proxmox.com>
2024-04-10 7:03 ` Fabian Grünbichler
2024-04-10 7:11 ` Christian Ebner
2024-03-28 12:36 ` [pbs-devel] [PATCH v3 proxmox-backup 48/58] fix #3174: client: pxar: enable caching and meta comparison Christian Ebner
2024-03-28 12:36 ` [pbs-devel] [PATCH v3 proxmox-backup 49/58] client: backup: increase average chunk size for metadata Christian Ebner
2024-04-05 9:42 ` Fabian Grünbichler
2024-04-05 10:49 ` Dietmar Maurer
2024-04-08 8:28 ` Fabian Grünbichler
2024-03-28 12:36 ` [pbs-devel] [PATCH v3 proxmox-backup 50/58] client: backup writer: add injected chunk count to stats Christian Ebner
2024-03-28 12:37 ` [pbs-devel] [PATCH v3 proxmox-backup 51/58] pxar: create: show chunk injection stats debug output Christian Ebner
2024-04-05 9:47 ` Fabian Grünbichler
2024-04-10 10:00 ` Christian Ebner
2024-03-28 12:37 ` [pbs-devel] [PATCH v3 proxmox-backup 52/58] client: pxar: add entry kind format version Christian Ebner
2024-03-28 12:37 ` [pbs-devel] [PATCH v3 proxmox-backup 53/58] client: pxar: opt encode cli exclude patterns as CliParams Christian Ebner
2024-04-05 9:49 ` Fabian Grünbichler
2024-03-28 12:37 ` [pbs-devel] [PATCH v3 proxmox-backup 54/58] client: pxar: add flow chart for metadata change detection Christian Ebner
2024-04-05 10:16 ` Fabian Grünbichler
2024-04-10 10:04 ` Christian Ebner
2024-03-28 12:37 ` [pbs-devel] [PATCH v3 proxmox-backup 55/58] docs: describe file format for split payload files Christian Ebner
2024-04-05 10:26 ` Fabian Grünbichler
2024-03-28 12:37 ` [pbs-devel] [PATCH v3 proxmox-backup 56/58] docs: add section describing change detection mode Christian Ebner
2024-04-05 11:22 ` Fabian Grünbichler
2024-03-28 12:37 ` [pbs-devel] [PATCH v3 proxmox-backup 57/58] test-suite: add detection mode change benchmark Christian Ebner
2024-03-28 12:37 ` [pbs-devel] [PATCH v3 proxmox-backup 58/58] test-suite: add bin to deb, add shell completions Christian Ebner
2024-04-05 11:39 ` [pbs-devel] [PATCH v3 pxar proxmox-backup 00/58] fix #3174: improve file-level backup Fabian Grünbichler
2024-04-29 12:13 ` Christian Ebner