From: Christian Ebner <c.ebner@proxmox.com>
To: pbs-devel@lists.proxmox.com
Subject: Re: [pbs-devel] [RFC pxar proxmox-backup 00/20] fix #3174: improve file-level backup
Date: Tue, 26 Sep 2023 09:15:50 +0200 (CEST)
Message-ID: <1301290754.4714.1695712550183@webmail.proxmox.com>
In-Reply-To: <20230922071621.12670-1-c.ebner@proxmox.com>

Thomas suggested including some form of benchmark, which would be useful not only for measuring performance but also as a regression test in a CI pipeline and/or for optimizing tunable parameters.

> On 22.09.2023 09:16 CEST Christian Ebner <c.ebner@proxmox.com> wrote:
> 
>  
> This (still rather rough) series of patches prototypes a possible
> approach to improve the pxar file-level backup creation speed.
> The series is intended to get initial feedback on the implementation
> approach and to find possible pitfalls I might not be aware of.
> 
> The current approach is to skip encoding of regular file payloads
> for which the metadata (currently mtime and size) did not change
> compared to a previous backup run. Instead of re-encoding the files, a
> reference to a newly introduced appendix section of the pxar archive
> is written. The appendix section is created as a concatenation
> of indexed chunks from the previous backup run, thereby containing the
> sequential file payload at a calculated offset with respect to the
> starting point of the appendix section.
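
To illustrate the reuse decision described above, here is a minimal
sketch. It is not the code from the series: `FileMeta`, `Action` and
`decide` are hypothetical names, and the previous catalog is modelled
as a plain `HashMap` keyed by path, which is an assumption made only
for this example.

```rust
use std::collections::HashMap;

/// Metadata compared between backup runs (hypothetical type for this sketch).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct FileMeta {
    mtime: i64, // modification time in seconds since the epoch
    size: u64,  // file size in bytes
}

/// What the archiver would do with a regular file (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
enum Action {
    /// Metadata changed or file is new: encode the payload again.
    Reencode,
    /// Metadata unchanged: write a reference into the appendix instead.
    ReuseFromAppendix,
}

/// Decide by comparing mtime and size against the previous run's entry.
/// `previous` stands in for the catalog of the reference backup.
fn decide(previous: &HashMap<String, FileMeta>, path: &str, current: FileMeta) -> Action {
    match previous.get(path) {
        Some(old) if *old == current => Action::ReuseFromAppendix,
        _ => Action::Reencode,
    }
}
```

A file that is missing from the previous catalog falls through to
`Action::Reencode`, the same as a file whose mtime or size differs.
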
> 
> Metadata comparison and calculation of the chunks to be indexed for the
> appendix section are performed using the catalog of a previous backup as
> reference. In order to be able to calculate the offsets, the current
> catalog format is extended to include the file offset with respect to
> the pxar archive byte stream. This makes it possible to find the
> required chunk indexes, the start padding within the concatenated
> chunks and the total bytes introduced by the chunks.
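
The offset calculation described above could be sketched roughly as
follows. This is an illustration under assumptions, not the
implementation from the series: `ChunkRange` and `chunks_for_range`
are hypothetical names, and the index is reduced to a sorted slice of
exclusive chunk end offsets within the archive byte stream.

```rust
/// Hypothetical result: which chunks to reuse and where the payload sits.
#[derive(Debug, PartialEq)]
struct ChunkRange {
    first: usize,       // index of the first chunk covering the range
    last: usize,        // index of the last chunk covering the range (inclusive)
    start_padding: u64, // bytes in the first chunk before `start`
    total_bytes: u64,   // combined size of chunks first..=last
}

/// `chunk_ends[i]` is the exclusive end offset of chunk `i` in the archive
/// byte stream and must be strictly increasing. `start..end` is the byte
/// range to reuse. Returns `None` for an empty or out-of-bounds range.
fn chunks_for_range(chunk_ends: &[u64], start: u64, end: u64) -> Option<ChunkRange> {
    if start >= end || end > *chunk_ends.last()? {
        return None;
    }
    // The first chunk whose end lies beyond `start` contains byte `start`.
    let first = chunk_ends.partition_point(|&e| e <= start);
    // The first chunk whose end reaches `end` contains byte `end - 1`.
    let last = chunk_ends.partition_point(|&e| e < end);
    // Start offset of the first reused chunk in the archive byte stream.
    let first_begin = if first == 0 { 0 } else { chunk_ends[first - 1] };
    Some(ChunkRange {
        first,
        last,
        start_padding: start - first_begin,
        total_bytes: chunk_ends[last] - first_begin,
    })
}
```

For chunks ending at 100, 250 and 400, the byte range 120..300 maps
to chunks 1 and 2, with a start padding of 20 bytes and 300 bytes
introduced in total.
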
> 
> During encoding, the chunks needed for the appendix section are injected
> into the pxar archive after forcing a chunk boundary once regular pxar
> encoding is finished. Finally, pxar archives containing an appendix
> section are marked as such by appending a final pxar goodbye lookup
> table containing only the offset to the appendix section start and the
> total size of that section, needed for random access, e.g. for mounting
> the archive via the fuse filesystem implementation.
> 
> Currently, the code assumes the reference backup (for which the previous
> run is used) to be a regular backup without appendix section, and the
> catalog for that backup to already contain the required additional
> offset information.
> 
> An invocation therefore looks like:
> ```bash
> proxmox-backup-client backup <label>.pxar:<source-path>
> proxmox-backup-client backup <label>.pxar:<source-path> --incremental
> ```
> 
> pxar:
> 
> Christian Ebner (8):
>   fix #3174: encoder: impl fn new for LinkOffset
>   fix #3174: decoder: factor out skip_bytes from skip_entry
>   fix #3174: decoder: impl skip_bytes for sync dec
>   fix #3174: metadata: impl fn to calc byte size
>   fix #3174: enc/dec: impl PXAR_APPENDIX_REF entrytype
>   fix #3174: enc/dec: impl PXAR_APPENDIX entrytype
>   fix #3174: encoder: add helper to incr encoder pos
>   fix #3174: enc/dec: impl PXAR_APPENDIX_TAIL entrytype
> 
>  examples/mk-format-hashes.rs | 11 +++++
>  examples/pxarcmd.rs          |  4 +-
>  src/accessor/mod.rs          | 46 ++++++++++++++++++++
>  src/decoder/mod.rs           | 38 +++++++++++++---
>  src/decoder/sync.rs          |  6 +++
>  src/encoder/aio.rs           | 36 ++++++++++++++--
>  src/encoder/mod.rs           | 84 +++++++++++++++++++++++++++++++++++-
>  src/encoder/sync.rs          | 32 +++++++++++++-
>  src/format/mod.rs            | 16 +++++++
>  src/lib.rs                   | 54 +++++++++++++++++++++++
>  10 files changed, 312 insertions(+), 15 deletions(-)
> 
> proxmox-backup:
> 
> Christian Ebner (12):
>   fix #3174: index: add fn index list from start/end-offsets
>   fix #3174: index: add fn digest for DynamicEntry
>   fix #3174: api: double catalog upload size
>   fix #3174: catalog: incl pxar archives file offset
>   fix #3174: archiver/extractor: impl appendix ref
>   fix #3174: extractor: impl seq restore from appendix
>   fix #3174: archiver: store ref to previous backup
>   fix #3174: upload stream: impl reused chunk injector
>   fix #3174: chunker: add forced boundaries
>   fix #3174: backup writer: inject queued chunk in upload steam
>   fix #3174: archiver: reuse files with unchanged metadata
>   fix #3174: client: Add incremental flag to backup creation
> 
>  examples/test_chunk_speed2.rs                 |   9 +-
>  pbs-client/src/backup_writer.rs               |  88 ++++---
>  pbs-client/src/chunk_stream.rs                |  41 +++-
>  pbs-client/src/inject_reused_chunks.rs        | 123 ++++++++++
>  pbs-client/src/lib.rs                         |   1 +
>  pbs-client/src/pxar/create.rs                 | 217 ++++++++++++++++--
>  pbs-client/src/pxar/extract.rs                | 141 ++++++++++++
>  pbs-client/src/pxar/mod.rs                    |   2 +-
>  pbs-client/src/pxar/tools.rs                  |   9 +
>  pbs-client/src/pxar_backup_stream.rs          |   8 +-
>  pbs-datastore/src/catalog.rs                  | 122 ++++++++--
>  pbs-datastore/src/dynamic_index.rs            |  38 +++
>  proxmox-backup-client/src/main.rs             | 142 +++++++++++-
>  .../src/proxmox_restore_daemon/api.rs         |  15 +-
>  pxar-bin/src/main.rs                          |  22 +-
>  src/api2/backup/upload_chunk.rs               |   4 +-
>  src/tape/file_formats/snapshot_archive.rs     |   2 +-
>  tests/catar.rs                                |   3 +
>  18 files changed, 886 insertions(+), 101 deletions(-)
>  create mode 100644 pbs-client/src/inject_reused_chunks.rs
> 
> -- 
> 2.39.2




