From mboxrd@z Thu Jan 1 00:00:00 1970
From: Christian Ebner
To: pbs-devel@lists.proxmox.com
Date: Thu, 25 Jan 2024 14:25:39 +0100
Message-Id: <20240125132608.1172472-1-c.ebner@proxmox.com>
X-Mailer: git-send-email 2.39.2
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Subject: [pbs-devel] [PATCH-SERIES v6 pxar proxmox-backup proxmox-widget-toolkit 0/29] fix #3174: improve file-level backup
X-BeenThere: pbs-devel@lists.proxmox.com
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Proxmox Backup Server development
discussion
X-List-Received-Date: Thu, 25 Jan 2024 13:26:26 -0000

Changes to the patch series since version 5 are based on the feedback obtained via internal communication channels. Many thanks to Thomas, Fabian, Wolfgang and Dominik for their continuous feedback.

This series of patches implements a metadata-based file change detection mechanism to speed up pxar file-level backup creation for unchanged files.

The chosen approach is to skip encoding of regular file payloads for which the metadata (currently ctime and size) did not change compared to a previous backup run. Instead of re-encoding the files, a reference to a newly introduced appendix section of the pxar archive is written. The appendix section is created as a concatenation of indexed chunks from the previous backup run, and therefore contains the sequential file payload at a calculated offset relative to the starting point of the appendix section.

Metadata comparison and calculation of the chunks to be indexed for the appendix section are performed using the catalog of a previous backup as reference. In order to be able to calculate the offsets, an updated catalog file format version 2 is introduced, which extends the previous version by including the file offset with respect to the pxar archive byte stream, as well as the file's ctime. This makes it possible to find the required chunk indexes and the start padding within the concatenated chunks. The catalog reader remains backwards compatible with catalog file format version 1.

During encoding, the chunks needed for the appendix section are injected into the backup upload stream after forcing a chunk boundary when regular pxar encoding is finished.
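
The metadata comparison and the offset lookup described above can be sketched roughly as follows. This is only an illustration and not the code from this series: the type and function names (`FileEntry`, `ChunkRange`, `is_unchanged`, `locate_payload`) are hypothetical, and the sketch assumes that the previous index can be viewed as a sorted list of exclusive chunk end offsets within the archive byte stream.

```rust
/// Hypothetical catalog entry for a regular file. The names are illustrative
/// only and do not mirror the actual proxmox-backup types.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct FileEntry {
    size: u64,
    ctime: i64,
    /// Offset of the file payload within the previous pxar archive byte stream.
    offset: u64,
}

/// A file counts as unchanged if size and ctime match the previous backup run.
fn is_unchanged(previous: &FileEntry, current_size: u64, current_ctime: i64) -> bool {
    previous.size == current_size && previous.ctime == current_ctime
}

/// Chunks covering a payload, plus the padding from the start of the first
/// chunk to the first payload byte.
#[derive(Debug, PartialEq, Eq)]
struct ChunkRange {
    first: usize,
    /// Inclusive index of the last chunk needed.
    last: usize,
    start_padding: u64,
}

/// Locate the chunks covering `size` bytes starting at `offset`.
///
/// Assumption: `chunk_ends[i]` is the exclusive end offset of chunk `i`,
/// and the slice is strictly increasing.
fn locate_payload(chunk_ends: &[u64], offset: u64, size: u64) -> Option<ChunkRange> {
    if size == 0 {
        return None; // nothing to reference for empty payloads
    }
    let end = offset.checked_add(size)?;
    // First chunk whose end lies beyond the payload start.
    let first = chunk_ends.partition_point(|&e| e <= offset);
    // First chunk whose end reaches the payload end.
    let last = chunk_ends.partition_point(|&e| e < end);
    if last >= chunk_ends.len() {
        return None; // payload extends past the indexed data
    }
    let chunk_start = if first == 0 { 0 } else { chunk_ends[first - 1] };
    Some(ChunkRange {
        first,
        last,
        start_padding: offset - chunk_start,
    })
}
```

With chunk ends `[100, 250, 400]`, a payload at offset 120 with size 200 spans chunks 1 to 2 with a start padding of 20 bytes. The binary search via `partition_point` keeps the lookup logarithmic in the number of chunks, which matters for archives with many chunks.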
Finally, pxar archives containing an appendix section are marked as such by appending a final pxar goodbye lookup table that contains only the offset to the start of the appendix section and the total size of that section. Both are needed for random access, e.g. to mount the archive via the FUSE filesystem implementation.

The following lists the most notable changes included in this series since version 5:
- the archiver now implements lookahead caching in order to decide whether to reuse or re-encode an entry, replacing the previously used heuristic. This reduces chunk fragmentation while being more predictable and stable than the previous approach.
- the pxar format version is now encoded as a prefix entry in pxar archives with version 2, so that encoder/decoder implementations can detect the correct format early.
- the encoder now stores an internal stack of encoder states, instead of creating new EncoderImpl instances for each directory level. This was a necessary change in order to handle the entry caching for the lookahead cache.
- appendix chunks now have a dedicated type `AppendableDynamicEntry` to clearly distinguish them from regular chunks.

The following lists the most notable changes included in this series since version 4:
- fix an issue with premature injection queue chunk insertion on initialization
- fix an issue with the decoder state not being correctly set at the start of the appendix section, leading to decoding errors in special cases.
- avoid double injection of chunks in cases where the first chunk of the list to insert is already present, but the subsequent ones are not
- refactoring and renaming of the Encoder's `bytes_len` to `encoded_size`

The following lists the most notable changes included in this series since version 3:
- count appendix chunks as reused chunks, as they are not re-encoded and re-uploaded.
- add a heuristic to reduce chunk fragmentation for appendix chunks for multiple consecutive backup runs with metadata-based file change detection.
- refactor the appendix list generation during pxar encoding in the archiver.
- switch from Vec to BTreeMap for restoring the appendix entries, so entries are inserted in sorted order based on their offset, making it unnecessary to sort afterwards.
- fix an issue with the chunk injection code which led to data corruption in some edge cases, and add additional checks as fortification.

The following lists the most notable changes included in this series since version 2:
- avoid re-indexing the same chunks multiple times in the appendix section by looking them up in the already present appendix chunks list and calculating the appendix reference offset accordingly. This now requires sorting entries by their appendix start offset for sequential restore.
- reduce appendix reference chunk fragmentation by increasing the file size threshold to 1k.
- fix the WebUI's file browser and single file restore, broken in the previous patch series.
- fix previous catalog and/or dynamic index downloads when either the backup group was empty or no archive with the same name was present in the backup

The following lists the most notable changes included in this series since version 1:
- fixes and refactors the missing chunk issue by modifying the logic to avoid re-appending the same chunks multiple times if referenced by multiple consecutive files.
- fixes a performance issue where the catalog lookup was the bottleneck for directories with many entries, resulting in the metadata-based file change detection performing worse than the regular mode.
- fixes the creation of multi-archive backups. All of the archives use the same reference catalog.
- the catalog file format is pushed to version 2, including the needed archive offsets as well as ctime for file change detection.
- the catalog is fully backward compatible with catalog file format version 1, so both can be decoded by the reader. However, the new version of the catalog file format will be used for all new backups
- it is now possible to perform multiple consecutive runs of the backup with metadata-based file change detection; a regular run no longer has to precede each of them.
- change from the `incremental` flag to an enum-based `BackupDetectionMode` parameter for command invocations.
- includes a new `proxmox-backup-test-suite` binary to create and run benchmarks to compare the performance of the different detection modes.

An invocation of a backup run with these patches now is:
```bash
proxmox-backup-client backup