From: Christian Ebner
To: pbs-devel@lists.proxmox.com
Date: Fri, 22 Sep 2023 09:16:01 +0200
Message-Id: <20230922071621.12670-1-c.ebner@proxmox.com>
Subject: [pbs-devel] [RFC pxar proxmox-backup 00/20] fix #3174: improve file-level backup
List-Id: Proxmox Backup Server development discussion
X-List-Received-Date: Fri, 22 Sep 2023 07:16:55 -0000

This (still rather rough) series of patches prototypes a possible approach to improve the creation speed of pxar file-level backups. The series is intended to get first feedback on the implementation approach and to find possible pitfalls I might not be aware of.

The current approach is to skip encoding of regular file payloads for which the metadata (currently mtime and size) did not change compared to a previous backup run. Instead of re-encoding the files, a reference to a newly introduced appendix section of the pxar archive is written. The appendix section is created as a concatenation of indexed chunks from the previous backup run, and therefore contains the sequential file payload at a calculated offset with respect to the starting point of the appendix section.

Metadata comparison and calculation of the chunks to be indexed for the appendix section are performed using the catalog of a previous backup as reference. In order to be able to calculate the offsets, the current catalog format is extended to include the file offset with respect to the pxar archive byte stream. This makes it possible to find the required chunk indexes, the start padding within the concatenated chunks and the total bytes introduced by the chunks.

During encoding, the chunks needed for the appendix section are injected into the pxar archive after forcing a chunk boundary once regular pxar encoding is finished. Finally, a pxar archive containing an appendix section is marked as such by appending a final pxar goodbye lookup table containing only the offset to the start of the appendix section and the total size of that section. This is needed for random access, e.g. when mounting the archive via the fuse filesystem implementation.

Currently, the code assumes the reference backup (for which the previous run is used) to be a regular backup without an appendix section, and the catalog for that backup to already contain the required additional offset information.
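To make the offset bookkeeping described above more concrete, here is a minimal sketch of how the chunk indexes, the start padding and the total bytes could be derived from a file offset stored in the catalog. It is not the code from this series: `ChunkRange`, `locate_payload` and the `chunk_ends` slice (the exclusive end offset of each chunk in the previous archive's byte stream) are hypothetical names and assumptions made for illustration only.

```rust
/// Location of a file payload within the chunks of a previous archive.
/// Hypothetical type for illustration, not taken from the patch series.
#[derive(Debug, PartialEq, Eq)]
pub struct ChunkRange {
    /// Index of the first chunk holding payload bytes.
    pub first_chunk: usize,
    /// Index of the last chunk holding payload bytes (inclusive).
    pub last_chunk: usize,
    /// Bytes between the start of the first chunk and the payload start.
    pub start_padding: u64,
    /// Sum of the sizes of all chunks in the range.
    pub total_bytes: u64,
}

/// Find the chunks covering `size` payload bytes starting at `offset`.
///
/// Assumption: `chunk_ends[i]` is the exclusive end offset of chunk `i`
/// in the archive byte stream, and the values are strictly increasing.
/// Returns `None` for empty payloads or ranges outside the archive.
pub fn locate_payload(chunk_ends: &[u64], offset: u64, size: u64) -> Option<ChunkRange> {
    if size == 0 {
        return None;
    }
    let end = offset.checked_add(size)?;
    let archive_len = *chunk_ends.last()?;
    if end > archive_len {
        return None;
    }

    // First chunk whose end lies strictly after the payload start.
    let first_chunk = chunk_ends.partition_point(|&e| e <= offset);
    // First chunk whose end reaches the payload end.
    let last_chunk = chunk_ends.partition_point(|&e| e < end);

    // Start offset of the first chunk is the end of its predecessor.
    let first_start = if first_chunk == 0 {
        0
    } else {
        chunk_ends[first_chunk - 1]
    };

    Some(ChunkRange {
        first_chunk,
        last_chunk,
        start_padding: offset - first_start,
        total_bytes: chunk_ends[last_chunk] - first_start,
    })
}
```

With chunks ending at offsets 100, 250 and 400, a payload of 200 bytes at offset 120 would span chunks 1 and 2, with a start padding of 20 bytes and 300 bytes contributed by the reused chunks. Under the same assumptions, the position of the payload within the appendix would then be the number of appendix bytes already accounted for plus the start padding.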
An invocation therefore looks like:
```bash
proxmox-backup-client backup