From: Christian Ebner <c.ebner@proxmox.com>
To: pbs-devel@lists.proxmox.com
Subject: Re: [pbs-devel] [PATCH v3 pxar proxmox-backup 00/58] fix #3174: improve file-level backup
Date: Mon, 29 Apr 2024 14:13:07 +0200
Message-ID: <7ebd7071-7a53-4b83-8333-f05d61c9f868@proxmox.com>
In-Reply-To: <20240328123707.336951-1-c.ebner@proxmox.com>
On 3/28/24 13:36, Christian Ebner wrote:
> A big thank you to Dietmar and Fabian for the review of the previous
> version and Fabian for extensive testing and help during debugging.
>
> This series of patches implements a metadata-based file change
> detection mechanism to speed up pxar file-level backup creation for
> unchanged files.
>
> The chosen approach is to split pxar archives on creation via the
> proxmox-backup-client into two separate data and upload streams,
> one exclusive for regular file payloads, the other one for the rest
> of the pxar archive, which is mostly metadata.
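
As a toy illustration of the split described above (not the actual pxar
encoder API or on-disk format; `SplitArchive`, `PayloadRef` and the record
layout are hypothetical names chosen for this sketch), file contents go to
one stream while the metadata stream only keeps an offset/size reference:

```rust
/// Hypothetical two-stream archive: `meta` holds everything except regular
/// file contents, `payload` holds only regular file contents.
#[derive(Debug, Default)]
pub struct SplitArchive {
    pub meta: Vec<u8>,
    pub payload: Vec<u8>,
}

/// Location of a file's content inside the payload stream.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct PayloadRef {
    pub offset: u64,
    pub size: u64,
}

impl SplitArchive {
    /// Append a regular file: the content goes to the payload stream, the
    /// metadata stream only records the name and a reference to the content.
    pub fn add_file(&mut self, name: &str, content: &[u8]) -> PayloadRef {
        let reference = PayloadRef {
            offset: self.payload.len() as u64,
            size: content.len() as u64,
        };
        self.payload.extend_from_slice(content);

        // Toy metadata record (assumption, not the real format):
        // NUL-terminated name, then offset and size as little-endian u64.
        self.meta.extend_from_slice(name.as_bytes());
        self.meta.push(0);
        self.meta.extend_from_slice(&reference.offset.to_le_bytes());
        self.meta.extend_from_slice(&reference.size.to_le_bytes());
        reference
    }
}
```

Because the metadata stream stays small, it is the part that can be fetched
and compared cheaply on the next run; the sketch ignores headers, markers
and chunking entirely.
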
>
> On consecutive runs, the metadata archive of the previous backup run,
> which is limited in size and can therefore be accessed quickly, is used
> to look up and compare the metadata of the entries to encode.
> This assumes that the connection speed to the Proxmox Backup Server is
> sufficiently fast, allowing the download and caching of the chunks for
> that index.
>
> Changes to regular files are detected by comparing the file's entire
> metadata object, including mtime, ACLs, etc. If no changes are detected,
> the previous payload index is used to look up chunks to possibly re-use
> in the payload stream of the new archive.
> In order to reduce possible chunk fragmentation, the decision whether to
> re-use or re-encode a file payload is deferred until enough information
> is gathered by adding entries to a look-ahead cache. If the padding
> introduced by reusing chunks does not exceed a threshold, the entries are
> referenced, the chunks are re-used and injected into the pxar payload
> upload stream; otherwise they are discarded and the files are encoded
> regularly.
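
A minimal sketch of the re-use decision described above, under simplifying
assumptions: the `Metadata` fields, the `CachedEntry` type, the `decide`
function and the ratio-based threshold are hypothetical and only illustrate
the idea; they are not the types or the threshold used in the patches.

```rust
/// Simplified stand-in for a file's metadata (assumption: the real
/// comparison covers more fields, e.g. ACLs and xattrs).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Metadata {
    pub mode: u32,
    pub uid: u32,
    pub gid: u32,
    pub mtime_ns: i64,
    pub size: u64,
}

/// One entry held back in the look-ahead cache.
pub struct CachedEntry {
    /// Metadata found in the previous backup, if the entry existed there.
    pub previous: Option<Metadata>,
    /// Metadata as seen on the filesystem now.
    pub current: Metadata,
}

#[derive(Debug, PartialEq, Eq)]
pub enum Decision {
    Reuse,
    Reencode,
}

/// Decide for a batch of cached entries whether to re-use the previous
/// chunks. `chunk_bytes` is the total size of the chunks that would have to
/// be injected; bytes in those chunks not referenced by any entry count as
/// padding.
pub fn decide(entries: &[CachedEntry], chunk_bytes: u64, max_padding_ratio: f64) -> Decision {
    // Any changed or new entry forces a regular re-encode in this sketch.
    if entries.is_empty()
        || entries.iter().any(|e| e.previous.as_ref() != Some(&e.current))
    {
        return Decision::Reencode;
    }
    let referenced: u64 = entries.iter().map(|e| e.current.size).sum();
    if chunk_bytes == 0 || referenced > chunk_bytes {
        return Decision::Reencode;
    }
    let padding = chunk_bytes - referenced;
    if (padding as f64) / (chunk_bytes as f64) <= max_padding_ratio {
        Decision::Reuse
    } else {
        Decision::Reencode
    }
}
```

For example, two unchanged files of 400 and 500 bytes covered by 1000 bytes
of chunks leave 100 bytes of padding, which would be re-used with a ratio
limit of 0.2; the same files covered by 4000 bytes of chunks would be
re-encoded instead.
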
>
> The following lists the most notable changes included in this series
> since version 2:
> - many bugfixes for incorrect archive encoding caused by wrong offset
>   generation; added additional sanity checks so that encoding fails
>   rather than producing an incorrectly encoded archive
> - different approach for deciding whether to re-use or re-encode the
>   entries. Previously, the entries were encoded when a cached
>   payload size threshold was reached. Now, the padding introduced by
>   reusable chunks is tracked, and the entries are re-used only if the
>   padding does not exceed the set threshold. This reduces the possible
>   padding, at the cost of re-encoding more entries. It also avoids
>   re-using chunks that now contain large padding holes because files
>   contained within them were moved or removed.
> - added headers for metadata archive and payload file
> - added documentation
>
> With these patches, a backup run is invoked as:
> ```bash
> proxmox-backup-client backup <label>.pxar:<source-path> --change-detection-mode=metadata
> ```
> During the first run, no reference index is available; the pxar archive
> will nevertheless be split into the two parts.
> Subsequent backups will then use the pxar archive accessor and
> index files of the previous run to perform file change detection.
>
> As benchmarks, the Linux source code as well as the COCO dataset for
> computer vision and pattern recognition can be used.
> The benchmarks can be performed by running:
> ```bash
> proxmox-backup-test-suite detection-mode-bench prepare --target /<path-to-bench-source-target>
> proxmox-backup-test-suite detection-mode-bench run linux.pxar:/<path-to-bench-source-target>/linux
> proxmox-backup-test-suite detection-mode-bench run coco.pxar:/<path-to-bench-source-target>/coco
> ```
>
> The above command invocations assume that the default repository and
> credentials are set as environment variables; alternatively, they can
> be passed as additional optional parameters.
>
> pxar:
>
> Christian Ebner (14):
> encoder: fix two typos in comments
> format/examples: add PXAR_PAYLOAD_REF entry header
> decoder: add method to read payload references
> decoder: factor out skip part from skip_entry
> encoder: add optional output writer for file payloads
> encoder: move to stack based state tracking
> decoder/accessor: add optional payload input stream
> encoder: add payload reference capability
> encoder: add payload position capability
> encoder: add payload advance capability
> encoder/format: finish payload stream with marker
> format: add payload stream start marker
> format: add pxar format version entry
> format/encoder/decoder: add entry type cli params
>
> examples/apxar.rs | 2 +-
> examples/mk-format-hashes.rs | 21 ++
> examples/pxarcmd.rs | 7 +-
> src/accessor/aio.rs | 10 +-
> src/accessor/mod.rs | 52 +++-
> src/accessor/sync.rs | 8 +-
> src/decoder/aio.rs | 14 +-
> src/decoder/mod.rs | 191 ++++++++++++--
> src/decoder/sync.rs | 15 +-
> src/encoder/aio.rs | 87 +++++--
> src/encoder/mod.rs | 475 +++++++++++++++++++++++++----------
> src/encoder/sync.rs | 67 ++++-
> src/format/mod.rs | 63 +++++
> src/lib.rs | 9 +
> tests/simple/main.rs | 3 +
> 15 files changed, 827 insertions(+), 197 deletions(-)
>
> proxmox-backup:
>
> Christian Ebner (44):
> client: pxar: switch to stack based encoder state
> client: backup writer: only borrow http client
> client: backup: factor out extension from backup target
> client: backup: early check for fixed index type
> client: pxar: combine writer params into struct
> client: backup: split payload to dedicated stream
> client: helper: add helpers for creating reader instances
> client: helper: add method for split archive name mapping
> client: restore: read payload from dedicated index
> tools: cover meta extension for pxar archives
> restore: cover meta extension for pxar archives
> client: mount: make split pxar archives mountable
> api: datastore: refactor getting local chunk reader
> api: datastore: attach optional payload chunk reader
> catalog: shell: factor out pxar fuse reader instantiation
> catalog: shell: redirect payload reader for split streams
> www: cover meta extension for pxar archives
> pxar: add optional payload input for achive restore
> pxar: add more context to extraction error
> client: pxar: include payload offset in output
> pxar: show padding in debug output on archive list
> datastore: dynamic index: add method to get digest
> client: pxar: helper for lookup of reusable dynamic entries
> upload stream: impl reused chunk injector
> client: chunk stream: add struct to hold injection state
> client: chunk stream: add dynamic entries injection queues
> specs: add backup detection mode specification
> client: implement prepare reference method
> client: pxar: implement store to insert chunks on caching
> client: pxar: add previous reference to archiver
> client: pxar: add method for metadata comparison
> pxar: caching: add look-ahead cache types
> client: pxar: add look-ahead caching
> fix #3174: client: pxar: enable caching and meta comparison
> client: backup: increase average chunk size for metadata
> client: backup writer: add injected chunk count to stats
> pxar: create: show chunk injection stats debug output
> client: pxar: add entry kind format version
> client: pxar: opt encode cli exclude patterns as CliParams
> client: pxar: add flow chart for metadata change detection
> docs: describe file format for split payload files
> docs: add section describing change detection mode
> test-suite: add detection mode change benchmark
> test-suite: add bin to deb, add shell completions
>
> Cargo.toml | 1 +
> Makefile | 13 +-
> debian/proxmox-backup-client.bash-completion | 1 +
> debian/proxmox-backup-client.install | 2 +
> debian/proxmox-backup-test-suite.bc | 8 +
> docs/backup-client.rst | 33 +
> docs/file-formats.rst | 32 +
> docs/meta-format-overview.dot | 50 ++
> examples/test_chunk_speed2.rs | 2 +-
> examples/upload-speed.rs | 2 +-
> pbs-client/src/backup_specification.rs | 40 +
> pbs-client/src/backup_writer.rs | 103 ++-
> pbs-client/src/chunk_stream.rs | 60 +-
> pbs-client/src/inject_reused_chunks.rs | 152 ++++
> pbs-client/src/lib.rs | 3 +-
> pbs-client/src/pxar/create.rs | 779 +++++++++++++++++-
> pbs-client/src/pxar/extract.rs | 2 +
> ...t-metadata-based-file-change-detection.svg | 1 +
> ...t-metadata-based-file-change-detection.txt | 12 +
> pbs-client/src/pxar/look_ahead_cache.rs | 38 +
> pbs-client/src/pxar/mod.rs | 3 +-
> pbs-client/src/pxar/tools.rs | 123 ++-
> pbs-client/src/pxar_backup_stream.rs | 57 +-
> pbs-client/src/tools/mod.rs | 5 +-
> pbs-datastore/src/dynamic_index.rs | 5 +
> pbs-pxar-fuse/src/lib.rs | 2 +-
> proxmox-backup-client/src/benchmark.rs | 2 +-
> proxmox-backup-client/src/catalog.rs | 42 +-
> proxmox-backup-client/src/helper.rs | 64 ++
> proxmox-backup-client/src/main.rs | 281 ++++++-
> proxmox-backup-client/src/mount.rs | 54 +-
> proxmox-backup-test-suite/Cargo.toml | 18 +
> .../src/detection_mode_bench.rs | 294 +++++++
> proxmox-backup-test-suite/src/main.rs | 17 +
> proxmox-file-restore/src/main.rs | 20 +-
> .../src/proxmox_restore_daemon/api.rs | 16 +-
> pxar-bin/src/main.rs | 53 +-
> src/api2/admin/datastore.rs | 47 +-
> src/api2/tape/restore.rs | 4 +-
> src/bin/proxmox_backup_debug/diff.rs | 2 +-
> src/tape/file_formats/snapshot_archive.rs | 9 +-
> tests/catar.rs | 4 +-
> www/datastore/Content.js | 6 +-
> zsh-completions/_proxmox-backup-test-suite | 13 +
> 44 files changed, 2219 insertions(+), 256 deletions(-)
> create mode 100644 debian/proxmox-backup-test-suite.bc
> create mode 100644 docs/meta-format-overview.dot
> create mode 100644 pbs-client/src/inject_reused_chunks.rs
> create mode 100644 pbs-client/src/pxar/flow-chart-metadata-based-file-change-detection.svg
> create mode 100644 pbs-client/src/pxar/flow-chart-metadata-based-file-change-detection.txt
> create mode 100644 pbs-client/src/pxar/look_ahead_cache.rs
> create mode 100644 proxmox-backup-client/src/helper.rs
> create mode 100644 proxmox-backup-test-suite/Cargo.toml
> create mode 100644 proxmox-backup-test-suite/src/detection_mode_bench.rs
> create mode 100644 proxmox-backup-test-suite/src/main.rs
> create mode 100644 zsh-completions/_proxmox-backup-test-suite
>
An updated version of the patch series is available at:
https://lists.proxmox.com/pipermail/pbs-devel/2024-April/009104.html