From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <c.ebner@proxmox.com>
Received: from firstgate.proxmox.com (firstgate.proxmox.com [212.224.123.68])
 (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
 key-exchange X25519 server-signature RSA-PSS (2048 bits))
 (No client certificate requested)
 by lists.proxmox.com (Postfix) with ESMTPS id D2245E600
 for <pbs-devel@lists.proxmox.com>; Tue, 26 Sep 2023 09:15:52 +0200 (CEST)
Received: from firstgate.proxmox.com (localhost [127.0.0.1])
 by firstgate.proxmox.com (Proxmox) with ESMTP id B45AF3599A
 for <pbs-devel@lists.proxmox.com>; Tue, 26 Sep 2023 09:15:52 +0200 (CEST)
Received: from proxmox-new.maurer-it.com (proxmox-new.maurer-it.com
 [94.136.29.106])
 (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
 key-exchange X25519 server-signature RSA-PSS (2048 bits))
 (No client certificate requested)
 by firstgate.proxmox.com (Proxmox) with ESMTPS
 for <pbs-devel@lists.proxmox.com>; Tue, 26 Sep 2023 09:15:51 +0200 (CEST)
Received: from proxmox-new.maurer-it.com (localhost.localdomain [127.0.0.1])
 by proxmox-new.maurer-it.com (Proxmox) with ESMTP id 6823D446C5
 for <pbs-devel@lists.proxmox.com>; Tue, 26 Sep 2023 09:15:51 +0200 (CEST)
Date: Tue, 26 Sep 2023 09:15:50 +0200 (CEST)
From: Christian Ebner <c.ebner@proxmox.com>
To: pbs-devel@lists.proxmox.com
Message-ID: <1301290754.4714.1695712550183@webmail.proxmox.com>
In-Reply-To: <20230922071621.12670-1-c.ebner@proxmox.com>
References: <20230922071621.12670-1-c.ebner@proxmox.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit
X-Priority: 3
Importance: Normal
X-Mailer: Open-Xchange Mailer v7.10.6-Rev50
X-Originating-Client: open-xchange-appsuite
X-SPAM-LEVEL: Spam detection results:  0
 AWL 0.100 Adjusted score from AWL reputation of From: address
 BAYES_00                 -1.9 Bayes spam probability is 0 to 1%
 DMARC_MISSING             0.1 Missing DMARC policy
 KAM_DMARC_STATUS 0.01 Test Rule for DKIM or SPF Failure with Strict Alignment
 SPF_HELO_NONE           0.001 SPF: HELO does not publish an SPF Record
 SPF_PASS               -0.001 SPF: sender matches SPF record
Subject: Re: [pbs-devel] [RFC pxar proxmox-backup 00/20] fix #3174: improve
 file-level backup
X-BeenThere: pbs-devel@lists.proxmox.com
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Proxmox Backup Server development discussion
 <pbs-devel.lists.proxmox.com>
List-Unsubscribe: <https://lists.proxmox.com/cgi-bin/mailman/options/pbs-devel>, 
 <mailto:pbs-devel-request@lists.proxmox.com?subject=unsubscribe>
List-Archive: <http://lists.proxmox.com/pipermail/pbs-devel/>
List-Post: <mailto:pbs-devel@lists.proxmox.com>
List-Help: <mailto:pbs-devel-request@lists.proxmox.com?subject=help>
List-Subscribe: <https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel>, 
 <mailto:pbs-devel-request@lists.proxmox.com?subject=subscribe>
X-List-Received-Date: Tue, 26 Sep 2023 07:15:52 -0000

Thomas suggested including some form of benchmark. Besides measuring performance, it could serve as a regression test in a CI pipeline and/or help to optimize tunable parameters.
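
A minimal sketch of such a CI regression check, written in Rust like the rest of the code base: it times a workload several times, takes the median and flags a regression if the median exceeds a stored baseline by more than a relative tolerance. The helpers `median_runtime` and `is_regression`, the placeholder workload, the 50 ms baseline and the 10 % tolerance are all hypothetical and not part of the series.

```rust
use std::time::{Duration, Instant};

/// Runs `f` `runs` times and returns the median wall-clock duration
/// (the upper median for an even number of runs).
fn median_runtime<F: FnMut()>(runs: usize, mut f: F) -> Duration {
    assert!(runs > 0, "need at least one run");
    let mut samples: Vec<Duration> = (0..runs)
        .map(|_| {
            let start = Instant::now();
            f();
            start.elapsed()
        })
        .collect();
    samples.sort();
    samples[runs / 2]
}

/// Returns true if `measured` is slower than `baseline` by more than the
/// relative `tolerance` (0.10 means 10 %).
fn is_regression(baseline: Duration, measured: Duration, tolerance: f64) -> bool {
    measured.as_secs_f64() > baseline.as_secs_f64() * (1.0 + tolerance)
}

fn main() {
    // Hypothetical placeholder workload; a real benchmark would run a
    // regular and an incremental pxar backup of a fixed test data set.
    let measured = median_runtime(5, || {
        let sum: u64 = (0..1_000_000u64).sum();
        std::hint::black_box(sum);
    });
    // Hypothetical baseline, e.g. recorded by a previous CI run.
    let baseline = Duration::from_millis(50);
    println!(
        "median: {:?}, regression: {}",
        measured,
        is_regression(baseline, measured, 0.10)
    );
}
```

Using the median rather than a single run keeps one-off outliers, such as a cold page cache on the first run, from failing the pipeline.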

> On 22.09.2023 09:16 CEST Christian Ebner <c.ebner@proxmox.com> wrote:
> 
>  
> This (still rather rough) series of patches prototypes a possible
> approach to improve the speed of pxar file-level backup creation.
> The series is intended to gather initial feedback on the implementation
> approach and to find possible pitfalls I might not be aware of.
> 
> The current approach is to skip encoding of regular file payloads
> whose metadata (currently mtime and size) did not change compared to
> a previous backup run. Instead of re-encoding such files, a reference
> to a newly introduced appendix section of the pxar archive is written.
> The appendix section is created as a concatenation of indexed chunks
> from the previous backup run, thereby containing the sequential file
> payload at a calculated offset relative to the starting point of the
> appendix section.
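
A minimal sketch of the per-file reuse decision, assuming the previous catalog has been loaded into a map from path to metadata and payload offset. The types `FileMeta`, `PrevEntry`, `Action` and the function `decide` are hypothetical illustrations, not the code from the series.

```rust
use std::collections::HashMap;

/// Metadata compared between runs (the series currently uses mtime and size).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct FileMeta {
    mtime_ns: i64,
    size: u64,
}

/// Hypothetical catalog entry of the previous backup: the file's metadata
/// plus the offset of its payload within the previous pxar byte stream.
#[derive(Clone, Copy, Debug)]
struct PrevEntry {
    meta: FileMeta,
    payload_offset: u64,
}

#[derive(Debug, PartialEq, Eq)]
enum Action {
    /// New file or changed metadata: encode the payload as usual.
    Encode,
    /// Unchanged metadata: write a reference instead; the payload bytes
    /// are taken from the previous archive via the appendix section.
    Reuse { prev_offset: u64, size: u64 },
}

fn decide(path: &str, current: FileMeta, prev: &HashMap<String, PrevEntry>) -> Action {
    match prev.get(path) {
        Some(entry) if entry.meta == current => Action::Reuse {
            prev_offset: entry.payload_offset,
            size: current.size,
        },
        _ => Action::Encode,
    }
}

fn main() {
    let mut prev = HashMap::new();
    prev.insert(
        "etc/hosts".to_string(),
        PrevEntry { meta: FileMeta { mtime_ns: 1_000, size: 220 }, payload_offset: 4096 },
    );
    // Same mtime and size: reused. Newer mtime: re-encoded.
    println!("{:?}", decide("etc/hosts", FileMeta { mtime_ns: 1_000, size: 220 }, &prev));
    println!("{:?}", decide("etc/hosts", FileMeta { mtime_ns: 2_000, size: 220 }, &prev));
}
```

Note that a content change which leaves both mtime and size untouched is not detected by a check of this kind.
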
> 
> The metadata comparison and the calculation of the chunks to be indexed
> for the appendix section are performed using the catalog of a previous
> backup as reference. In order to calculate the offsets, the current
> catalog format is extended to include each file's offset within the
> pxar archive byte stream. This allows finding the required chunk
> indexes, the start padding within the concatenated chunks and the total
> number of bytes introduced by the chunks.
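
A minimal sketch of this offset arithmetic, assuming the previous archive's dynamic index is available as a sorted list of cumulative chunk end offsets and that the catalog provides the start and end offset of the byte range to reuse. `ChunkRange` and `chunks_for_range` are hypothetical names; the series adds its own helper for this to the dynamic index.

```rust
/// Which chunks of the previous archive must be injected.
#[derive(Debug, PartialEq, Eq)]
struct ChunkRange {
    first: usize,       // index of the first chunk (inclusive)
    last: usize,        // index of the last chunk (inclusive)
    start_padding: u64, // bytes before `start` within the first chunk
    total_bytes: u64,   // bytes covered by chunks first..=last
}

/// `chunk_ends` holds the cumulative end offset of every chunk in ascending
/// order: chunk i covers [chunk_ends[i - 1], chunk_ends[i]), chunk 0 starts at 0.
/// Returns None for an empty range or one extending past the archive end.
fn chunks_for_range(chunk_ends: &[u64], start: u64, end: u64) -> Option<ChunkRange> {
    if start >= end || chunk_ends.last().map_or(true, |&e| end > e) {
        return None;
    }
    // First chunk whose end lies beyond `start`, i.e. the chunk containing it.
    let first = chunk_ends.partition_point(|&e| e <= start);
    // First chunk whose end reaches `end`, i.e. the chunk containing byte end - 1.
    let last = chunk_ends.partition_point(|&e| e < end);
    let first_start = if first == 0 { 0 } else { chunk_ends[first - 1] };
    Some(ChunkRange {
        first,
        last,
        start_padding: start - first_start,
        total_bytes: chunk_ends[last] - first_start,
    })
}

fn main() {
    // Four chunks covering [0,100), [100,250), [250,400), [400,600).
    let ends = [100, 250, 400, 600];
    // A file payload stored at bytes [120, 380) of the previous archive.
    let range = chunks_for_range(&ends, 120, 380).expect("range within archive");
    println!("{:?}", range);
}
```

For this example, chunks 1 and 2 are needed, with a start padding of 20 bytes and 300 bytes in total, so the payload starts 20 bytes after the beginning of the appendix section.
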
> 
> During encoding, the chunks needed for the appendix section are injected
> into the pxar archive after forcing a chunk boundary once regular pxar
> encoding is finished. Finally, a pxar archive containing an appendix
> section is marked as such by appending a final pxar goodbye lookup
> table containing only the offset to the start of the appendix section
> and the total size of that section. This is needed for random access,
> e.g. for mounting the archive via the FUSE filesystem implementation.
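
A minimal sketch of how a reader could detect such a trailer and obtain the appendix location. The 24-byte layout (marker, offset and size as little-endian `u64` values, loosely modeled on pxar goodbye table entries), the marker constant and the names `AppendixTail`, `encode_tail` and `decode_tail` are assumptions for illustration; the actual on-disk encoding is defined by the `PXAR_APPENDIX_TAIL` patch.

```rust
/// Hypothetical marker value; the series defines its own constant.
const APPENDIX_TAIL_MARKER: u64 = 0x0123_4567_89ab_cdef;
const TAIL_SIZE: usize = 24;

#[derive(Debug, PartialEq, Eq)]
struct AppendixTail {
    appendix_offset: u64, // start of the appendix section in the archive
    appendix_size: u64,   // total size of the appendix section
}

/// Serializes the tail as marker, offset, size (little-endian u64 each).
fn encode_tail(tail: &AppendixTail) -> [u8; TAIL_SIZE] {
    let mut buf = [0u8; TAIL_SIZE];
    buf[0..8].copy_from_slice(&APPENDIX_TAIL_MARKER.to_le_bytes());
    buf[8..16].copy_from_slice(&tail.appendix_offset.to_le_bytes());
    buf[16..24].copy_from_slice(&tail.appendix_size.to_le_bytes());
    buf
}

fn read_u64_le(bytes: &[u8]) -> u64 {
    let mut buf = [0u8; 8];
    buf.copy_from_slice(&bytes[..8]);
    u64::from_le_bytes(buf)
}

/// Inspects only the last TAIL_SIZE bytes of the archive; returns None
/// for archives without an appendix tail.
fn decode_tail(archive: &[u8]) -> Option<AppendixTail> {
    if archive.len() < TAIL_SIZE {
        return None;
    }
    let tail = &archive[archive.len() - TAIL_SIZE..];
    if read_u64_le(&tail[0..8]) != APPENDIX_TAIL_MARKER {
        return None;
    }
    Some(AppendixTail {
        appendix_offset: read_u64_le(&tail[8..16]),
        appendix_size: read_u64_le(&tail[16..24]),
    })
}

fn main() {
    // Fake archive: 4096 bytes of regular content followed by the tail.
    let mut archive = vec![0u8; 4096];
    archive.extend_from_slice(&encode_tail(&AppendixTail { appendix_offset: 1024, appendix_size: 3072 }));
    match decode_tail(&archive) {
        Some(t) => println!("appendix at {} ({} bytes)", t.appendix_offset, t.appendix_size),
        None => println!("regular archive without appendix"),
    }
}
```

Reading only the end of the archive lets a random-access user such as the FUSE mount locate the appendix without scanning the whole stream.
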
> 
> Currently, the code assumes that the reference backup (the previous
> run) is a regular backup without an appendix section, and that the
> catalog for that backup already contains the required additional
> offset information.
> 
> An invocation therefore looks like:
> ```bash
> proxmox-backup-client backup <label>.pxar:<source-path>
> proxmox-backup-client backup <label>.pxar:<source-path> --incremental
> ```
> 
> pxar:
> 
> Christian Ebner (8):
>   fix #3174: encoder: impl fn new for LinkOffset
>   fix #3174: decoder: factor out skip_bytes from skip_entry
>   fix #3174: decoder: impl skip_bytes for sync dec
>   fix #3174: metadata: impl fn to calc byte size
>   fix #3174: enc/dec: impl PXAR_APPENDIX_REF entrytype
>   fix #3174: enc/dec: impl PXAR_APPENDIX entrytype
>   fix #3174: encoder: add helper to incr encoder pos
>   fix #3174: enc/dec: impl PXAR_APPENDIX_TAIL entrytype
> 
>  examples/mk-format-hashes.rs | 11 +++++
>  examples/pxarcmd.rs          |  4 +-
>  src/accessor/mod.rs          | 46 ++++++++++++++++++++
>  src/decoder/mod.rs           | 38 +++++++++++++---
>  src/decoder/sync.rs          |  6 +++
>  src/encoder/aio.rs           | 36 ++++++++++++++--
>  src/encoder/mod.rs           | 84 +++++++++++++++++++++++++++++++++++-
>  src/encoder/sync.rs          | 32 +++++++++++++-
>  src/format/mod.rs            | 16 +++++++
>  src/lib.rs                   | 54 +++++++++++++++++++++++
>  10 files changed, 312 insertions(+), 15 deletions(-)
> 
> proxmox-backup:
> 
> Christian Ebner (12):
>   fix #3174: index: add fn index list from start/end-offsets
>   fix #3174: index: add fn digest for DynamicEntry
>   fix #3174: api: double catalog upload size
>   fix #3174: catalog: incl pxar archives file offset
>   fix #3174: archiver/extractor: impl appendix ref
>   fix #3174: extractor: impl seq restore from appendix
>   fix #3174: archiver: store ref to previous backup
>   fix #3174: upload stream: impl reused chunk injector
>   fix #3174: chunker: add forced boundaries
>   fix #3174: backup writer: inject queued chunk in upload stream
>   fix #3174: archiver: reuse files with unchanged metadata
>   fix #3174: client: Add incremental flag to backup creation
> 
>  examples/test_chunk_speed2.rs                 |   9 +-
>  pbs-client/src/backup_writer.rs               |  88 ++++---
>  pbs-client/src/chunk_stream.rs                |  41 +++-
>  pbs-client/src/inject_reused_chunks.rs        | 123 ++++++++++
>  pbs-client/src/lib.rs                         |   1 +
>  pbs-client/src/pxar/create.rs                 | 217 ++++++++++++++++--
>  pbs-client/src/pxar/extract.rs                | 141 ++++++++++++
>  pbs-client/src/pxar/mod.rs                    |   2 +-
>  pbs-client/src/pxar/tools.rs                  |   9 +
>  pbs-client/src/pxar_backup_stream.rs          |   8 +-
>  pbs-datastore/src/catalog.rs                  | 122 ++++++++--
>  pbs-datastore/src/dynamic_index.rs            |  38 +++
>  proxmox-backup-client/src/main.rs             | 142 +++++++++++-
>  .../src/proxmox_restore_daemon/api.rs         |  15 +-
>  pxar-bin/src/main.rs                          |  22 +-
>  src/api2/backup/upload_chunk.rs               |   4 +-
>  src/tape/file_formats/snapshot_archive.rs     |   2 +-
>  tests/catar.rs                                |   3 +
>  18 files changed, 886 insertions(+), 101 deletions(-)
>  create mode 100644 pbs-client/src/inject_reused_chunks.rs
> 