From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Tue, 26 Sep 2023 09:15:50 +0200 (CEST)
From: Christian Ebner <c.ebner@proxmox.com>
To: pbs-devel@lists.proxmox.com
Message-ID: <1301290754.4714.1695712550183@webmail.proxmox.com>
In-Reply-To: <20230922071621.12670-1-c.ebner@proxmox.com>
References: <20230922071621.12670-1-c.ebner@proxmox.com>
Subject: Re: [pbs-devel] [RFC pxar proxmox-backup 00/20] fix #3174: improve file-level backup
List-Id: Proxmox Backup Server development discussion <pbs-devel.lists.proxmox.com>
List-Archive: <http://lists.proxmox.com/pipermail/pbs-devel/>

Thomas suggested including some form of benchmark. It might be useful
not only for measuring performance, but also as a regression test in a
CI pipeline and/or for optimizing possible tunable parameters.

> On 22.09.2023 09:16 CEST Christian Ebner <c.ebner@proxmox.com> wrote:
>
>
> This (still rather rough) series of patches prototypes a possible
> approach to improve the pxar file level backup creation speed.
> The series is intended to get first feedback on the implementation
> approach and to find possible pitfalls I might not be aware of.
>
> The current approach is to skip encoding of regular file payloads
> for which the metadata (currently mtime and size) did not change
> compared to a previous backup run. Instead of re-encoding the files, a
> reference to a newly introduced appendix section of the pxar archive
> will be written. The appendix section will be created as a concatenation
> of indexed chunks from the previous backup run, thereby containing the
> sequential file payload at a calculated offset with respect to the
> starting point of the appendix section.
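
To make the reuse condition described above a bit more concrete, here
is a minimal Rust sketch of the metadata check. The names `FileMeta`,
`Action` and `decide` are hypothetical and chosen for illustration
only, they are not taken from the patches. The only thing taken from
the description is that mtime and size are the fields being compared.

```rust
/// Hypothetical, reduced view of the metadata recorded for a regular file.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct FileMeta {
    /// File size in bytes.
    pub size: u64,
    /// Modification time in seconds since the epoch.
    pub mtime: i64,
}

/// What the archiver would do with a regular file (illustrative only).
#[derive(Debug, PartialEq, Eq)]
pub enum Action {
    /// Encode the file payload again.
    Encode,
    /// Skip the payload and write a reference into the appendix section.
    Reuse,
}

/// Decide whether a payload can be reused, based on the entry found for
/// the same file in the previous backup (if any).
pub fn decide(previous: Option<&FileMeta>, current: &FileMeta) -> Action {
    match previous {
        // Both mtime and size are unchanged, so the payload is assumed unchanged.
        Some(prev) if prev.size == current.size && prev.mtime == current.mtime => Action::Reuse,
        // New file or changed metadata: encode as usual.
        _ => Action::Encode,
    }
}
```

A file missing from the previous backup falls through to `Encode`, so
the first backup of a source behaves like a regular one in this sketch.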
>
> Metadata comparison and calculation of the chunks to be indexed for the
> appendix section are performed using the catalog of a previous backup as
> reference. In order to be able to calculate the offsets, the current
> catalog format is extended to include the file offset with respect to
> the pxar archive byte stream. This allows finding the required chunk
> indexes, the start padding within the concatenated chunks and the total
> bytes introduced by the chunks.
>
> During encoding, the chunks needed for the appendix section are injected
> in the pxar archive after forcing a chunk boundary when regular pxar
> encoding is finished. Finally, a pxar archive containing an appendix
> section is marked as such by appending a final pxar goodbye lookup
> table only containing the offset to the appendix section start and total
> size of that section, needed for random access as e.g. for mounting the
> archive via the fuse filesystem implementation.
>
> Currently, the code assumes the reference backup (for which the previous
> run is used) to be a regular backup without appendix section, and the
> catalog for that backup to already contain the required additional
> offset information.
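
The offset calculation mentioned above (required chunk indexes, start
padding, total bytes) can be sketched as follows. This is only an
illustration under stated assumptions: the previous archive's index is
modeled as a plain, strictly increasing list of chunk end offsets, and
`ChunkRange` and `chunks_for_range` are hypothetical names that do not
appear in the patches.

```rust
/// Result of mapping a payload byte range onto the chunks of an archive.
#[derive(Debug, PartialEq, Eq)]
pub struct ChunkRange {
    /// Index of the first chunk containing payload bytes.
    pub first: usize,
    /// Index of the last chunk containing payload bytes (inclusive).
    pub last: usize,
    /// Bytes between the start of the first chunk and the payload start.
    pub start_padding: u64,
    /// Total size in bytes of the chunks `first..=last`.
    pub total_size: u64,
}

/// Find the chunks covering `size` bytes starting at `offset` in the
/// archive byte stream. `chunk_ends` holds the end offset of each chunk
/// and is assumed to be strictly increasing.
pub fn chunks_for_range(chunk_ends: &[u64], offset: u64, size: u64) -> Option<ChunkRange> {
    if size == 0 {
        return None; // nothing to reference
    }
    let end = offset.checked_add(size)?;
    if end > *chunk_ends.last()? {
        return None; // range reaches past the end of the archive
    }
    // First chunk whose end lies beyond the payload start.
    let first = chunk_ends.partition_point(|&e| e <= offset);
    // First chunk whose end reaches the payload end.
    let last = chunk_ends.partition_point(|&e| e < end);
    // A chunk starts where its predecessor ends; the first one starts at 0.
    let first_start = if first == 0 { 0 } else { chunk_ends[first - 1] };
    Some(ChunkRange {
        first,
        last,
        start_padding: offset - first_start,
        total_size: chunk_ends[last] - first_start,
    })
}
```

With chunks ending at 100, 250 and 400, a 200 byte payload at offset
120 maps to chunks 1 and 2, with a start padding of 20 bytes and 300
bytes in total added to the appendix.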
>
> An invocation therefore looks like:
> ```bash
> proxmox-backup-client backup <label>.pxar:<source-path>
> proxmox-backup-client backup <label>.pxar:<source-path> --incremental
> ```
>
> pxar:
>
> Christian Ebner (8):
>   fix #3174: encoder: impl fn new for LinkOffset
>   fix #3174: decoder: factor out skip_bytes from skip_entry
>   fix #3174: decoder: impl skip_bytes for sync dec
>   fix #3174: metadata: impl fn to calc byte size
>   fix #3174: enc/dec: impl PXAR_APPENDIX_REF entrytype
>   fix #3174: enc/dec: impl PXAR_APPENDIX entrytype
>   fix #3174: encoder: add helper to incr encoder pos
>   fix #3174: enc/dec: impl PXAR_APPENDIX_TAIL entrytype
>
>  examples/mk-format-hashes.rs | 11 +++++
>  examples/pxarcmd.rs          |  4 +-
>  src/accessor/mod.rs          | 46 ++++++++++++++++++++
>  src/decoder/mod.rs           | 38 +++++++++++++---
>  src/decoder/sync.rs          |  6 +++
>  src/encoder/aio.rs           | 36 ++++++++++++++--
>  src/encoder/mod.rs           | 84 +++++++++++++++++++++++++++++++++++-
>  src/encoder/sync.rs          | 32 +++++++++++++-
>  src/format/mod.rs            | 16 +++++++
>  src/lib.rs                   | 54 +++++++++++++++++++++++
>  10 files changed, 312 insertions(+), 15 deletions(-)
>
> proxmox-backup:
>
> Christian Ebner (12):
>   fix #3174: index: add fn index list from start/end-offsets
>   fix #3174: index: add fn digest for DynamicEntry
>   fix #3174: api: double catalog upload size
>   fix #3174: catalog: incl pxar archives file offset
>   fix #3174: archiver/extractor: impl appendix ref
>   fix #3174: extractor: impl seq restore from appendix
>   fix #3174: archiver: store ref to previous backup
>   fix #3174: upload stream: impl reused chunk injector
>   fix #3174: chunker: add forced boundaries
>   fix #3174: backup writer: inject queued chunk in upload steam
>   fix #3174: archiver: reuse files with unchanged metadata
>   fix #3174: client: Add incremental flag to backup creation
>
>  examples/test_chunk_speed2.rs             |   9 +-
>  pbs-client/src/backup_writer.rs           |  88 ++++---
>  pbs-client/src/chunk_stream.rs            |  41 +++-
>  pbs-client/src/inject_reused_chunks.rs    | 123 ++++++++++
>  pbs-client/src/lib.rs                     |   1 +
>  pbs-client/src/pxar/create.rs             | 217 ++++++++++++++++--
>  pbs-client/src/pxar/extract.rs            | 141 ++++++++++++
>  pbs-client/src/pxar/mod.rs                |   2 +-
>  pbs-client/src/pxar/tools.rs              |   9 +
>  pbs-client/src/pxar_backup_stream.rs      |   8 +-
>  pbs-datastore/src/catalog.rs              | 122 ++++++++--
>  pbs-datastore/src/dynamic_index.rs        |  38 +++
>  proxmox-backup-client/src/main.rs         | 142 +++++++++++-
>  .../src/proxmox_restore_daemon/api.rs     |  15 +-
>  pxar-bin/src/main.rs                      |  22 +-
>  src/api2/backup/upload_chunk.rs           |   4 +-
>  src/tape/file_formats/snapshot_archive.rs |   2 +-
>  tests/catar.rs                            |   3 +
>  18 files changed, 886 insertions(+), 101 deletions(-)
>  create mode 100644 pbs-client/src/inject_reused_chunks.rs
>
> --
> 2.39.2