From: Fabian Grünbichler
To: Proxmox Backup Server development discussion
Date: Wed, 05 Jun 2024 10:51:41 +0200
Subject: [pbs-devel] partially-applied: [PATCH v8 pxar proxmox-backup 00/69] fix #3174: improve file-level backup
References: <20240528094303.309806-1-c.ebner@proxmox.com>
In-Reply-To: <20240528094303.309806-1-c.ebner@proxmox.com>
Message-Id: <1717577288.2jcotivs19.astroid@yuna.none>

applied the pxar patches + follow-ups, and patches 16, 17 and 39 for PBS.
while most of the rest of the patches LGTM as well, there are too many
inter-dependencies to just pick a few, and quite a lot would be required
to make pbs buildable again with a bumped pxar, so I left the rest even
if most of them will likely be unchanged for v9, and skipped the bump of
pxar as well for now.

On May 28, 2024 11:41 am, Christian Ebner wrote:
> This series of patches implements a metadata-based file change
> detection mechanism for improved pxar file-level backup creation speed
> for unchanged files.
>
> The chosen approach is to split pxar archives on creation via the
> proxmox-backup-client into two separate data and upload streams,
> one exclusive for regular file payloads, the other one for the rest
> of the pxar archive, which is mostly metadata.
>
> On consecutive runs, the metadata archive of the previous backup run,
> which is limited in size and therefore rapidly accessed, is used to
> look up and compare the metadata for entries to encode.
> This assumes that the connection speed to the Proxmox Backup Server is
> sufficiently fast, allowing the download and caching of the chunks for
> that index.
>
> Changes to regular files are detected by comparing all of the file's
> metadata object, including mtime, acls, etc. If no changes are detected,
> the previous payload index is used to look up chunks to possibly re-use
> in the payload stream of the new archive.
> In order to reduce possible chunk fragmentation, the decision whether to
> reuse or reencode a file payload is deferred until enough information
> is gathered by adding entries to a look-ahead cache. If the padding
> introduced by reusing chunks falls below a threshold, the entries are
> referenced, the chunks are reused and injected into the pxar payload
> upload stream, otherwise they are discarded and the files encoded
> regularly.
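
To make the reuse decision described above a bit more concrete, here is a
rough sketch in Rust. It is not the code from this series: `CachedRange`,
`Decision`, `decide` and the ratio-based `max_padding_ratio` are hypothetical
names and assumptions picked for illustration only. The cover letter just
states that the padding introduced by reusing chunks is compared against a
threshold.

```rust
/// Hypothetical summary of the entries currently held in the look-ahead cache.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct CachedRange {
    /// Total size in bytes of the chunks that would have to be referenced.
    pub chunk_bytes: u64,
    /// Bytes inside those chunks that belong to unchanged file payloads.
    pub payload_bytes: u64,
}

/// Outcome for the cached entries once enough information was gathered.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Decision {
    /// Reference the entries and inject the existing chunks.
    Reuse,
    /// Drop the cached references and encode the files regularly.
    Reencode,
}

/// Decide whether the cached entries should reuse chunks or be reencoded.
///
/// Assumption: the threshold is expressed as the fraction of referenced
/// chunk bytes that would be padding (data not belonging to reused files).
pub fn decide(range: CachedRange, max_padding_ratio: f64) -> Decision {
    if range.chunk_bytes == 0 {
        // Nothing to reference, so there is nothing to reuse.
        return Decision::Reencode;
    }
    // Padding is whatever the referenced chunks contain beyond the payloads.
    let padding = range.chunk_bytes.saturating_sub(range.payload_bytes);
    let ratio = padding as f64 / range.chunk_bytes as f64;
    if ratio < max_padding_ratio {
        Decision::Reuse
    } else {
        Decision::Reencode
    }
}
```

With a threshold of 0.1, a cached range covering 4096 chunk bytes of which
4000 belong to unchanged files would be reused, while one where only 1024
bytes are still needed would be reencoded.
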
>
> Patches 16 and 17 are to be applied before the patches to the pxar
> repository, while patches 14 and 15 are to be applied to the pxar repository
> only after patch 52 in the series, for the patches to compile in a sequential
> chain.
>
> The following lists the most notable changes included in this series since
> version 7:
> - Fixed incorrectly squashed patches during rebase
>
> The following lists the most notable changes included in this series since
> version 6:
> - Allow using the `.pxar` extension in cli commands for convenience
> - Refactor the input/output interface for the pxar encoder, decoder and
>   accessor to use a `PxarVariant` enum, in order to guarantee the
>   payload-related input/output is always attached for split archives.
> - Refactor the lookahead caching logic in pxar's `Archiver` to
>   improve overall code readability.
> - Add helper method for file name matching and use it where possible,
>   for it to be handled in a single place.
> - Extend documentation to include additional information about which
>   metadata is compared to the previous snapshot
> - Fix an issue with `pxar list`, which failed in case of metadata-only
>   pxar archives.
> - Fix an issue in the payload chunker test where the context was not
>   updated accordingly.
> - Various clippy fixes, smaller refactoring and reordering of patches
>
> The following lists the most notable changes included in this series since
> version 5:
> - Fix an issue where the payload chunker was not correctly reset after
>   suggested or forced boundaries.
> - Added regression tests for payload chunker and chunk stream.
>
> The following lists the most notable changes included in this series since
> version 4:
> - Increase open file handle limit to hard limit and adapt lookahead
>   cache size dynamically (thanks a lot to Thomas for pointing this out
>   and providing the necessary background information).
>   This helps with the reuse of multiple entries being contained within
>   the same chunk, otherwise exceeding the padding threshold and
>   therefore being reencoded instead.
> - Fix payload chunker scan to only scan up until chunk pos in case a
>   suggested boundary is chosen.
> - Fix issue with decoder state not being set to the correct `InDirectory`
>   after reading prelude and getting root directory entry.
> - Fix issue with kept back chunk injection when the chunk follows a
>   range discontinuity.
> - Add regression test for pxar create with metadata archive and payload
>   index reference.
>
> The following lists the most notable changes included in this series since
> version 3:
> - Rework the whole reused chunk injection and accounting logic and use
>   lockless async `mpsc::channel`s instead of `Arc>>`.
> - Reworked lookahead caching logic to use payload ranges and check for
>   possible range continuation instead of looking up the reusable dynamic
>   entries immediately in case of a reusable entry chain. This also
>   avoids edge cases not covered in the previous version of the patch series.
>   This current version therefore tends to reencode small files more
>   aggressively, since they might introduce additional unwanted paddings.
> - Correctly cover hardlinks in the reuse logic as well, avoiding
>   reencoding of these entries.
> - Add additional dedicated chunker implementation for payload data
>   stream, allowing the archiver to suggest boundaries to the chunker to
>   reduce padding for reused chunks.
> - Add additional `change-detection-mode=data`, in order to allow
>   creating split archives with fully reencoded payload data.
> - Add additional payload input readers for pxar accessor type
>   implementations where needed.
> - Add additional consistency check in pxar encoder when dropping state
>   or encoder instance.
> - CliParams was renamed to the more opaque Prelude, since the pxar
>   archive does not care about its contents and this might be extended to
>   store other information about the archive as well.
> - Add missing proxmox-file-restore for split archives and fix restore of
>   tar/zip archives via WebUI. This is handled by the same decoder logic,
>   and needed an updated payload input content range to read the data
>   from the correct location in the payload data archive.
> - Additional refactoring to use the pxar reader helpers where possible.
>
> The following lists the most notable changes included in this series since
> version 2:
> - many bugfixes regarding incorrect archive encoding by wrong offset
>   generation, adding additional sanity checks and rather fail on
>   encoding than produce an incorrectly encoded archive
> - different approach for deciding whether to reuse or reencode the
>   entries. Previously, the entries have been encoded when a cached
>   payload size threshold was reached. Now, the padding introduced by
>   reusable chunks is tracked, and only if the padding does not exceed
>   the set threshold, the entries are reused. This reduces the possible
>   padding, at the cost of reencoding more entries. It also avoids
>   re-using chunks which now have large padding holes because of
>   moved/removed files contained within.
> - added headers for metadata archive and payload file
> - added documentation
>
> An invocation of a backup run with these patches now is:
> ```bash
> proxmox-backup-client backup