From: Christian Ebner <c.ebner@proxmox.com>
To: pbs-devel@lists.proxmox.com
Date: Tue, 7 May 2024 17:51:42 +0200
Message-Id: <20240507155244.793819-1-c.ebner@proxmox.com>
Subject: [pbs-devel] [PATCH v5 pxar proxmox-backup 00/62] fix #3174: improve file-level backup

This series of patches implements a metadata-based file change detection
mechanism to speed up pxar file-level backup creation for unchanged files.

The chosen approach is to split pxar archives on creation via the
proxmox-backup-client into two separate data and upload streams, one
exclusively for regular file payloads, the other for the rest of the pxar
archive, which is mostly metadata.

On consecutive runs, the metadata archive of the previous backup run, which is
limited in size and can therefore be accessed rapidly, is used to look up and
compare the metadata of the entries to encode. This assumes that the connection
to the Proxmox Backup Server is sufficiently fast to allow downloading and
caching the chunks for that index.

Changes to regular files are detected by comparing the file's whole metadata
object, including mtime, ACLs, etc. If no changes are detected, the previous
payload index is used to look up chunks to possibly reuse in the payload stream
of the new archive.

In order to reduce possible chunk fragmentation, the decision whether to reuse
or reencode a file payload is deferred until enough information has been
gathered by adding entries to a look-ahead cache. If the padding introduced by
reusing chunks stays below a threshold, the entries are referenced and the
chunks are reused and injected into the pxar payload upload stream; otherwise
they are discarded and the files are encoded regularly.

Patches 13 and 14 are to be applied to the pxar repository only after patch 49
in the series, for the patches to compile in a sequential chain.

The following lists the most notable changes included in this series since
version 4:
- Increase the open file handle limit to the hard limit and adapt the lookahead
  cache size dynamically (thanks a lot to Thomas for pointing this out and
  providing the necessary background information).
  This helps with the reuse of multiple entries contained within the same
  chunk, which would otherwise exceed the padding threshold and therefore be
  reencoded instead.
- Fix the payload chunker scan to only scan up to the chunk position in case a
  suggested boundary is chosen.
- Fix an issue with the decoder state not being set to the correct
  `InDirectory` after reading the prelude and getting the root directory entry.
- Fix an issue with kept-back chunk injection when the chunk follows a range
  discontinuity.
- Add a regression test for pxar create with metadata archive and payload index
  reference.

The following lists the most notable changes included in this series since
version 3:
- Rework the whole reused chunk injection and accounting logic and use lockless
  async `mpsc::channel`s instead of `Arc<Mutex<VecDeque<..>>>`.
- Rework the lookahead caching logic to use payload ranges and check for
  possible range continuation instead of looking up the reusable dynamic
  entries immediately in case of a reusable entry chain. This also avoids edge
  cases not covered in the previous version of the patch series. The current
  version therefore tends to reencode small files more aggressively, since
  they might introduce additional unwanted padding.
- Correctly cover hardlinks in the reuse logic as well, avoiding reencoding of
  these entries.
- Add a dedicated chunker implementation for the payload data stream, allowing
  the archiver to suggest boundaries to the chunker to reduce padding for
  reused chunks.
- Add `change-detection-mode=data`, in order to allow creating split archives
  with fully reencoded payload data.
- Add payload input readers for pxar accessor type implementations where
  needed.
- Add a consistency check in the pxar encoder when dropping the state or
  encoder instance.
- CliParams was renamed to the more opaque Prelude, since the pxar archive does
  not care about its contents and this might be extended to store other
  information about the archive as well.
- Add missing proxmox-file-restore support for split archives and fix the
  restore of tar/zip archives via the WebUI. This is handled by the same
  decoder logic, and needed an updated payload input content range to read the
  data from the correct location in the payload data archive.
- Additional refactoring to use the pxar reader helpers where possible.

The following lists the most notable changes included in this series since
version 2:
- many bugfixes regarding incorrect archive encoding caused by wrong offset
  generation, adding additional sanity checks to rather fail on encoding than
  produce an incorrectly encoded archive
- different approach for deciding whether to reuse or reencode the entries.
  Previously, the entries were encoded when a cached payload size threshold
  was reached. Now, the padding introduced by reusable chunks is tracked, and
  the entries are reused only if the padding does not exceed the set
  threshold. This reduces the possible padding, at the cost of reencoding more
  entries. It also avoids reusing chunks which now have large padding holes
  because of moved/removed files contained within.
- added headers for metadata archive and payload file
- added documentation

An invocation of a backup run with these patches now is:
```bash
proxmox-backup-client backup <label>.pxar:<source-path> --change-detection-mode=metadata
```
During the first run, no reference index is available; the pxar archive will
nevertheless be split into the two parts. Following backups will then utilize
the pxar archive accessor and index files of the previous run to perform file
change detection.

As benchmarks, the Linux source code as well as the COCO dataset for computer
vision and pattern recognition can be used.
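
For reference, the padding-based reuse decision described above can be sketched
roughly as follows. This is only an illustration, not the code from the series:
`ReuseCandidate`, `Decision`, `decide_reuse` and the percentage-based threshold
are hypothetical names and assumptions made for this sketch; the patches may
account for padding differently.

```rust
/// Hypothetical summary of a run of cached, unchanged entries.
struct ReuseCandidate {
    /// Payload bytes of the unchanged files that would be referenced.
    payload_bytes: u64,
    /// Total size of the previous run's chunks covering that payload.
    chunk_bytes: u64,
}

/// Outcome of the deferred decision for the cached entries.
#[derive(Debug, PartialEq)]
enum Decision {
    /// Reference the entries and inject the existing chunks.
    Reuse,
    /// Drop the candidate chunks and encode the files regularly.
    Reencode,
}

/// Reuse only if the padding (chunk bytes not referenced by any entry)
/// is at most `max_padding_percent` of the chunk bytes (assumed metric).
fn decide_reuse(candidate: &ReuseCandidate, max_padding_percent: u64) -> Decision {
    if candidate.chunk_bytes == 0 || candidate.payload_bytes > candidate.chunk_bytes {
        // Nothing to reuse, or inconsistent accounting: fall back to encoding.
        return Decision::Reencode;
    }
    // Use u128 so the multiplications below cannot overflow.
    let padding = u128::from(candidate.chunk_bytes - candidate.payload_bytes);
    let limit = u128::from(candidate.chunk_bytes) * u128::from(max_padding_percent);
    if padding * 100 <= limit {
        Decision::Reuse
    } else {
        Decision::Reencode
    }
}
```

With an assumed threshold of 10 percent, a candidate referencing 950 of 1000
chunk bytes would be reused, while one referencing only 400 of 1000 would be
reencoded.
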
The benchmarks can be performed by running:
```bash
proxmox-backup-test-suite detection-mode-bench prepare --target /<path-to-bench-source-target>
proxmox-backup-test-suite detection-mode-bench run linux.pxar:/<path-to-bench-source-target>/linux
proxmox-backup-test-suite detection-mode-bench run coco.pxar:/<path-to-bench-source-target>/coco
```

The above command invocations assume the default repository and credentials to
be set as environment variables; they might however be passed as additional
optional parameters instead.

pxar:

Christian Ebner (14):
  format/examples: add header type `PXAR_PAYLOAD_REF`
  decoder: add method to read payload references
  decoder: factor out skip part from skip_entry
  encoder: add optional output writer for file payloads
  encoder: move to stack based state tracking
  decoder/accessor: add optional payload input stream
  decoder: set payload input range when decoding via accessor
  encoder: add payload reference capability
  encoder: add payload position capability
  encoder: add payload advance capability
  encoder/format: finish payload stream with marker
  format: add payload stream start marker
  format/encoder/decoder: new pxar entry type `Version`
  format/encoder/decoder: new pxar entry type `Prelude`

 examples/apxar.rs            |   2 +-
 examples/mk-format-hashes.rs |  21 ++
 examples/pxarcmd.rs          |   7 +-
 src/accessor/aio.rs          |  10 +-
 src/accessor/mod.rs          | 116 +++++++-
 src/accessor/sync.rs         |   8 +-
 src/decoder/aio.rs           |  14 +-
 src/decoder/mod.rs           | 212 +++++++++++++--
 src/decoder/sync.rs          |  15 +-
 src/encoder/aio.rs           |  87 ++++--
 src/encoder/mod.rs           | 497 ++++++++++++++++++++++++++---------
 src/encoder/sync.rs          |  67 ++++-
 src/format/mod.rs            |  63 +++++
 src/lib.rs                   |   9 +
 tests/compat.rs              |   3 +-
 tests/simple/fs.rs           |   8 +-
 tests/simple/main.rs         |   8 +-
 17 files changed, 935 insertions(+), 212 deletions(-)

proxmox-backup:

Christian Ebner (48):
  client: pxar: switch to stack based encoder state
  client: backup: factor out extension from backup target
  client: pxar: combine writers into struct
  client: pxar: add optional pxar payload writer instance
  client: pxar: optionally split metadata and payload streams
  client: helper: add helpers for creating reader instances
  client: helper: add method for split archive name mapping
  client: restore: read payload from dedicated index
  tools: cover extension for split pxar archives
  restore: cover extension for split pxar archives
  client: mount: make split pxar archives mountable
  api: datastore: refactor getting local chunk reader
  api: datastore: attach optional payload chunk reader
  catalog: shell: make split pxar archives accessible
  www: cover metadata extension for pxar archives
  file restore: factor out getting pxar reader
  file restore: cover split metadata and payload archives
  file restore: show more error context when extraction fails
  pxar: add optional payload input for achive restore
  pxar: add more context to extraction error
  client: pxar: include payload offset in entry listing
  pxar: show padding in debug output on archive list
  datastore: dynamic index: add method to get digest
  client: pxar: helper for lookup of reusable dynamic entries
  upload stream: implement reused chunk injector
  client: chunk stream: add struct to hold injection state
  client: streams: add channels for dynamic entry injection
  specs: add backup detection mode specification
  client: implement prepare reference method
  client: pxar: add method for metadata comparison
  pxar: caching: add look-ahead cache types
  fix #3174: client: pxar: enable caching and meta comparison
  client: backup writer: add injected chunk count to stats
  pxar: create: keep track of reused chunks and files
  pxar: create: show chunk injection stats debug output
  client: pxar: add helper to handle optional preludes
  client: pxar: opt encode cli exclude patterns as Prelude
  docs: file formats: describe split pxar archive file layout
  docs: add section describing change detection mode
  test-suite: add detection mode change benchmark
  test-suite: add bin to deb, add shell completions
  datastore: chunker: add Chunker trait
  datastore: chunker: implement chunker for payload stream
  client: chunk stream: switch payload stream chunker
  client: pxar: allow to restore prelude to optional path
  client: pxar: add archive creation with reference test
  client: tools: add helper to raise nofile rlimit
  client: pxar: set cache limit based on nofile rlimit

 Cargo.toml                                    |    1 +
 Makefile                                      |   13 +-
 debian/proxmox-backup-client.bash-completion  |    1 +
 debian/proxmox-backup-client.install          |    2 +
 debian/proxmox-backup-test-suite.bc           |    8 +
 docs/backup-client.rst                        |   41 +
 docs/file-formats.rst                         |   46 +
 docs/meta-format-overview.dot                 |   50 +
 examples/test_chunk_size.rs                   |    9 +-
 examples/test_chunk_speed.rs                  |    7 +-
 examples/test_chunk_speed2.rs                 |    2 +-
 pbs-client/src/backup_specification.rs        |   44 +
 pbs-client/src/backup_writer.rs               |  120 +-
 pbs-client/src/chunk_stream.rs                |  122 +-
 pbs-client/src/inject_reused_chunks.rs        |  129 +++
 pbs-client/src/lib.rs                         |    3 +-
 pbs-client/src/pxar/create.rs                 | 1004 ++++++++++++++++-
 pbs-client/src/pxar/extract.rs                |   31 +-
 pbs-client/src/pxar/look_ahead_cache.rs       |   38 +
 pbs-client/src/pxar/mod.rs                    |    5 +-
 pbs-client/src/pxar/tools.rs                  |  123 +-
 pbs-client/src/pxar_backup_stream.rs          |   68 +-
 pbs-client/src/tools/mod.rs                   |   55 +-
 pbs-datastore/src/chunker.rs                  |  161 ++-
 pbs-datastore/src/dynamic_index.rs            |   14 +-
 pbs-datastore/src/lib.rs                      |    2 +-
 pbs-pxar-fuse/src/lib.rs                      |    2 +-
 proxmox-backup-client/src/catalog.rs          |   30 +-
 proxmox-backup-client/src/helper.rs           |   96 ++
 proxmox-backup-client/src/main.rs             |  284 ++++-
 proxmox-backup-client/src/mount.rs            |   34 +-
 proxmox-backup-test-suite/Cargo.toml          |   18 +
 .../src/detection_mode_bench.rs               |  294 +++++
 proxmox-backup-test-suite/src/main.rs         |   17 +
 proxmox-file-restore/src/main.rs              |   80 +-
 .../src/proxmox_restore_daemon/api.rs         |   18 +-
 pxar-bin/src/main.rs                          |   61 +-
 src/api2/admin/datastore.rs                   |   47 +-
 src/api2/tape/restore.rs                      |   21 +-
 src/bin/proxmox_backup_debug/diff.rs          |    2 +-
 src/tape/file_formats/snapshot_archive.rs     |    9 +-
 tests/catar.rs                                |    5 +-
 tests/pxar/backup-client-pxar-data.mpxar      | Bin 0 -> 15070 bytes
 tests/pxar/backup-client-pxar-data.ppxar.didx | Bin 0 -> 8096 bytes
 tests/pxar/backup-client-pxar-expected.mpxar  | Bin 0 -> 15086 bytes
 www/datastore/Content.js                      |    6 +-
 zsh-completions/_proxmox-backup-test-suite    |   13 +
 47 files changed, 2802 insertions(+), 334 deletions(-)
 create mode 100644 debian/proxmox-backup-test-suite.bc
 create mode 100644 docs/meta-format-overview.dot
 create mode 100644 pbs-client/src/inject_reused_chunks.rs
 create mode 100644 pbs-client/src/pxar/look_ahead_cache.rs
 create mode 100644 proxmox-backup-client/src/helper.rs
 create mode 100644 proxmox-backup-test-suite/Cargo.toml
 create mode 100644 proxmox-backup-test-suite/src/detection_mode_bench.rs
 create mode 100644 proxmox-backup-test-suite/src/main.rs
 create mode 100644 tests/pxar/backup-client-pxar-data.mpxar
 create mode 100644 tests/pxar/backup-client-pxar-data.ppxar.didx
 create mode 100644 tests/pxar/backup-client-pxar-expected.mpxar
 create mode 100644 zsh-completions/_proxmox-backup-test-suite

-- 
2.39.2