From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path: <c.ebner@proxmox.com>
Received: from firstgate.proxmox.com (firstgate.proxmox.com [212.224.123.68])
 (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
 key-exchange X25519 server-signature RSA-PSS (2048 bits))
 (No client certificate requested)
 by lists.proxmox.com (Postfix) with ESMTPS id 8975CBC09B
 for <pbs-devel@lists.proxmox.com>; Thu, 28 Mar 2024 13:38:06 +0100 (CET)
Received: from firstgate.proxmox.com (localhost [127.0.0.1])
 by firstgate.proxmox.com (Proxmox) with ESMTP id 6C6B29D16
 for <pbs-devel@lists.proxmox.com>; Thu, 28 Mar 2024 13:37:36 +0100 (CET)
Received: from proxmox-new.maurer-it.com (proxmox-new.maurer-it.com [94.136.29.106])
 (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
 key-exchange X25519 server-signature RSA-PSS (2048 bits))
 (No client certificate requested)
 by firstgate.proxmox.com (Proxmox) with ESMTPS
 for <pbs-devel@lists.proxmox.com>; Thu, 28 Mar 2024 13:37:35 +0100 (CET)
Received: from proxmox-new.maurer-it.com (localhost.localdomain [127.0.0.1])
 by proxmox-new.maurer-it.com (Proxmox) with ESMTP id 001FA428FC
 for <pbs-devel@lists.proxmox.com>; Thu, 28 Mar 2024 13:37:35 +0100 (CET)
From: Christian Ebner <c.ebner@proxmox.com>
To: pbs-devel@lists.proxmox.com
Date: Thu, 28 Mar 2024 13:36:09 +0100
Message-Id: <20240328123707.336951-1-c.ebner@proxmox.com>
X-Mailer: git-send-email 2.39.2
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-SPAM-LEVEL: Spam detection results: 0
 AWL               0.031 Adjusted score from AWL reputation of From: address
 BAYES_00           -1.9 Bayes spam probability is 0 to 1%
 DMARC_MISSING       0.1 Missing DMARC policy
 KAM_DMARC_STATUS   0.01 Test Rule for DKIM or SPF Failure with Strict Alignment
 SPF_HELO_NONE     0.001 SPF: HELO does not publish an SPF Record
 SPF_PASS         -0.001 SPF: sender matches SPF record
 URIBL_BLOCKED     0.001 ADMINISTRATOR NOTICE: The query to URIBL was blocked. See http://wiki.apache.org/spamassassin/DnsBlocklists#dnsbl-block for more information.
 [meta-format-overview.dot, mk-format-hashes.rs]
Subject: [pbs-devel] [PATCH v3 pxar proxmox-backup 00/58] fix #3174: improve file-level backup
X-BeenThere: pbs-devel@lists.proxmox.com
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Proxmox Backup Server development discussion <pbs-devel.lists.proxmox.com>
List-Unsubscribe: <https://lists.proxmox.com/cgi-bin/mailman/options/pbs-devel>,
 <mailto:pbs-devel-request@lists.proxmox.com?subject=unsubscribe>
List-Archive: <http://lists.proxmox.com/pipermail/pbs-devel/>
List-Post: <mailto:pbs-devel@lists.proxmox.com>
List-Help: <mailto:pbs-devel-request@lists.proxmox.com?subject=help>
List-Subscribe: <https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel>,
 <mailto:pbs-devel-request@lists.proxmox.com?subject=subscribe>
X-List-Received-Date: Thu, 28 Mar 2024 12:38:06 -0000

A big thank you to Dietmar and Fabian for the review of the previous
version, and to Fabian for extensive testing and help during debugging.

This series of patches implements a metadata-based file change detection
mechanism to improve pxar file-level backup creation speed for unchanged
files.

The chosen approach is to split pxar archives on creation via the
proxmox-backup-client into two separate data and upload streams: one
exclusively for regular file payloads, the other for the rest of the
pxar archive, which is mostly metadata.

On consecutive runs, the metadata archive of the previous backup run,
which is limited in size and can therefore be accessed rapidly, is used
to look up and compare the metadata for entries to encode.
This assumes that the connection to the Proxmox Backup Server is
sufficiently fast to allow downloading and caching the chunks for that
index.

Changes to regular files are detected by comparing all of the file's
metadata, including mtime, ACLs, etc. If no changes are detected, the
previous payload index is used to look up chunks to possibly re-use in
the payload stream of the new archive.
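The metadata comparison described above can be sketched roughly as
follows. This is a minimal illustration, not the code from this series:
`EntryMetadata`, `Decision` and `detect_change` are hypothetical names,
and the field set is a simplified assumption (the actual pxar metadata
also carries ACLs, xattrs and more, all of which are compared).

```rust
/// Hypothetical, simplified stand-in for the metadata stored per file entry.
/// The real pxar metadata object has more fields (ACLs, xattrs, ...).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct EntryMetadata {
    pub mode: u32,
    pub uid: u32,
    pub gid: u32,
    pub mtime_secs: i64,
    pub mtime_nanos: u32,
    pub size: u64,
}

/// Outcome of the change detection for a single regular file.
#[derive(Debug, PartialEq, Eq)]
pub enum Decision {
    /// Metadata is identical: chunks referenced by the previous payload
    /// index are candidates for re-use.
    ReuseCandidate,
    /// No previous entry exists or the metadata differs: encode the
    /// file payload again.
    Reencode,
}

/// Compare the entry found in the previous metadata archive (if any)
/// with the metadata of the file currently being backed up.
pub fn detect_change(previous: Option<&EntryMetadata>, current: &EntryMetadata) -> Decision {
    match previous {
        // Every field must match, not only mtime.
        Some(prev) if prev == current => Decision::ReuseCandidate,
        _ => Decision::Reencode,
    }
}
```

A `ReuseCandidate` result only marks the entry as eligible; whether its
chunks are actually re-used is decided later, once enough entries have
been collected.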

In order to reduce possible chunk fragmentation, the decision whether to
re-use or re-encode a file payload is deferred until enough information
has been gathered by adding entries to a look-ahead cache. If the padding
introduced by re-using chunks stays below a threshold, the entries are
referenced and the chunks are re-used and injected into the pxar payload
upload stream; otherwise they are discarded and the files are encoded
regularly.

The following lists the most notable changes included in this series
since version 2:
- many bugfixes regarding incorrect archive encoding caused by wrong
  offset generation, adding additional sanity checks and rather failing
  on encoding than producing an incorrectly encoded archive
- different approach for deciding whether to re-use or re-encode the
  entries. Previously, the entries were encoded when a cached payload
  size threshold was reached. Now, the padding introduced by reusable
  chunks is tracked, and the entries are re-used only if the padding
  does not exceed the set threshold. This reduces the possible padding,
  at the cost of re-encoding more entries. It also avoids re-using
  chunks which now have large padding holes because of moved/removed
  files contained within.
- added headers for metadata archive and payload file
- added documentation

An invocation of a backup run with these patches now is:
```bash
proxmox-backup-client backup <label>.pxar:<source-path> --change-detection-mode=metadata
```
During the first run, no reference index is available; the pxar archive
will nevertheless be split into the two parts.
Following backups will utilize the pxar archive accessor and index files
of the previous run to perform file change detection.

As benchmarks, the Linux source code as well as the COCO dataset for
computer vision and pattern recognition can be used.
The benchmarks can be performed by running:
```bash
proxmox-backup-test-suite detection-mode-bench prepare --target /<path-to-bench-source-target>
proxmox-backup-test-suite detection-mode-bench run linux.pxar:/<path-to-bench-source-target>/linux
proxmox-backup-test-suite detection-mode-bench run coco.pxar:/<path-to-bench-source-target>/coco
```

The above command invocations assume that the default repository and
credentials are set as environment variables; they can however be passed
as additional optional parameters instead.

pxar:

Christian Ebner (14):
  encoder: fix two typos in comments
  format/examples: add PXAR_PAYLOAD_REF entry header
  decoder: add method to read payload references
  decoder: factor out skip part from skip_entry
  encoder: add optional output writer for file payloads
  encoder: move to stack based state tracking
  decoder/accessor: add optional payload input stream
  encoder: add payload reference capability
  encoder: add payload position capability
  encoder: add payload advance capability
  encoder/format: finish payload stream with marker
  format: add payload stream start marker
  format: add pxar format version entry
  format/encoder/decoder: add entry type cli params

 examples/apxar.rs            |   2 +-
 examples/mk-format-hashes.rs |  21 ++
 examples/pxarcmd.rs          |   7 +-
 src/accessor/aio.rs          |  10 +-
 src/accessor/mod.rs          |  52 +++-
 src/accessor/sync.rs         |   8 +-
 src/decoder/aio.rs           |  14 +-
 src/decoder/mod.rs           | 191 ++++++++++++--
 src/decoder/sync.rs          |  15 +-
 src/encoder/aio.rs           |  87 +++++--
 src/encoder/mod.rs           | 475 +++++++++++++++++++++++++----------
 src/encoder/sync.rs          |  67 ++++-
 src/format/mod.rs            |  63 +++++
 src/lib.rs                   |   9 +
 tests/simple/main.rs         |   3 +
 15 files changed, 827 insertions(+), 197 deletions(-)

proxmox-backup:

Christian Ebner (44):
  client: pxar: switch to stack based encoder state
  client: backup writer: only borrow http client
  client: backup: factor out extension from backup target
  client: backup: early check for fixed index type
  client: pxar: combine writer params into struct
  client: backup: split payload to dedicated stream
  client: helper: add helpers for creating reader instances
  client: helper: add method for split archive name mapping
  client: restore: read payload from dedicated index
  tools: cover meta extension for pxar archives
  restore: cover meta extension for pxar archives
  client: mount: make split pxar archives mountable
  api: datastore: refactor getting local chunk reader
  api: datastore: attach optional payload chunk reader
  catalog: shell: factor out pxar fuse reader instantiation
  catalog: shell: redirect payload reader for split streams
  www: cover meta extension for pxar archives
  pxar: add optional payload input for achive restore
  pxar: add more context to extraction error
  client: pxar: include payload offset in output
  pxar: show padding in debug output on archive list
  datastore: dynamic index: add method to get digest
  client: pxar: helper for lookup of reusable dynamic entries
  upload stream: impl reused chunk injector
  client: chunk stream: add struct to hold injection state
  client: chunk stream: add dynamic entries injection queues
  specs: add backup detection mode specification
  client: implement prepare reference method
  client: pxar: implement store to insert chunks on caching
  client: pxar: add previous reference to archiver
  client: pxar: add method for metadata comparison
  pxar: caching: add look-ahead cache types
  client: pxar: add look-ahead caching
  fix #3174: client: pxar: enable caching and meta comparison
  client: backup: increase average chunk size for metadata
  client: backup writer: add injected chunk count to stats
  pxar: create: show chunk injection stats debug output
  client: pxar: add entry kind format version
  client: pxar: opt encode cli exclude patterns as CliParams
  client: pxar: add flow chart for metadata change detection
  docs: describe file format for split payload files
  docs: add section describing change detection mode
  test-suite: add detection mode change benchmark
  test-suite: add bin to deb, add shell
    completions

 Cargo.toml                                    |   1 +
 Makefile                                      |  13 +-
 debian/proxmox-backup-client.bash-completion  |   1 +
 debian/proxmox-backup-client.install          |   2 +
 debian/proxmox-backup-test-suite.bc           |   8 +
 docs/backup-client.rst                        |  33 +
 docs/file-formats.rst                         |  32 +
 docs/meta-format-overview.dot                 |  50 ++
 examples/test_chunk_speed2.rs                 |   2 +-
 examples/upload-speed.rs                      |   2 +-
 pbs-client/src/backup_specification.rs        |  40 +
 pbs-client/src/backup_writer.rs               | 103 ++-
 pbs-client/src/chunk_stream.rs                |  60 +-
 pbs-client/src/inject_reused_chunks.rs        | 152 ++++
 pbs-client/src/lib.rs                         |   3 +-
 pbs-client/src/pxar/create.rs                 | 779 +++++++++++++++++-
 pbs-client/src/pxar/extract.rs                |   2 +
 ...t-metadata-based-file-change-detection.svg |   1 +
 ...t-metadata-based-file-change-detection.txt |  12 +
 pbs-client/src/pxar/look_ahead_cache.rs       |  38 +
 pbs-client/src/pxar/mod.rs                    |   3 +-
 pbs-client/src/pxar/tools.rs                  | 123 ++-
 pbs-client/src/pxar_backup_stream.rs          |  57 +-
 pbs-client/src/tools/mod.rs                   |   5 +-
 pbs-datastore/src/dynamic_index.rs            |   5 +
 pbs-pxar-fuse/src/lib.rs                      |   2 +-
 proxmox-backup-client/src/benchmark.rs        |   2 +-
 proxmox-backup-client/src/catalog.rs          |  42 +-
 proxmox-backup-client/src/helper.rs           |  64 ++
 proxmox-backup-client/src/main.rs             | 281 ++++++-
 proxmox-backup-client/src/mount.rs            |  54 +-
 proxmox-backup-test-suite/Cargo.toml          |  18 +
 .../src/detection_mode_bench.rs               | 294 +++++++
 proxmox-backup-test-suite/src/main.rs         |  17 +
 proxmox-file-restore/src/main.rs              |  20 +-
 .../src/proxmox_restore_daemon/api.rs         |  16 +-
 pxar-bin/src/main.rs                          |  53 +-
 src/api2/admin/datastore.rs                   |  47 +-
 src/api2/tape/restore.rs                      |   4 +-
 src/bin/proxmox_backup_debug/diff.rs          |   2 +-
 src/tape/file_formats/snapshot_archive.rs     |   9 +-
 tests/catar.rs                                |   4 +-
 www/datastore/Content.js                      |   6 +-
 zsh-completions/_proxmox-backup-test-suite    |  13 +
 44 files changed, 2219 insertions(+), 256 deletions(-)
 create mode 100644 debian/proxmox-backup-test-suite.bc
 create mode 100644 docs/meta-format-overview.dot
 create mode 100644 pbs-client/src/inject_reused_chunks.rs
 create mode 100644 pbs-client/src/pxar/flow-chart-metadata-based-file-change-detection.svg
 create mode 100644 pbs-client/src/pxar/flow-chart-metadata-based-file-change-detection.txt
 create mode 100644 pbs-client/src/pxar/look_ahead_cache.rs
 create mode 100644 proxmox-backup-client/src/helper.rs
 create mode 100644 proxmox-backup-test-suite/Cargo.toml
 create mode 100644 proxmox-backup-test-suite/src/detection_mode_bench.rs
 create mode 100644 proxmox-backup-test-suite/src/main.rs
 create mode 100644 zsh-completions/_proxmox-backup-test-suite

--
2.39.2