From: Christian Ebner <c.ebner@proxmox.com>
To: pbs-devel@lists.proxmox.com
Subject: Re: [pbs-devel] [RFC pxar proxmox-backup 00/20] fix #3174: improve file-level backup
Date: Tue, 26 Sep 2023 09:15:50 +0200 (CEST)
Message-ID: <1301290754.4714.1695712550183@webmail.proxmox.com>
In-Reply-To: <20230922071621.12670-1-c.ebner@proxmox.com>

Thomas suggested including some form of benchmark. It would be useful
not only for measuring performance, but also as a regression test in a
CI pipeline and/or for optimizing tunable parameters.

> On 22.09.2023 09:16 CEST Christian Ebner <c.ebner@proxmox.com> wrote:
>
>
> This (still rather rough) series of patches prototypes a possible
> approach to improve the pxar file level backup creation speed.
> The series is intended to gather initial feedback on the implementation
> approach and to find possible pitfalls I might not be aware of.
>
> The current approach is to skip encoding of regular file payloads
> for which the metadata (currently mtime and size) did not change
> compared to a previous backup run. Instead of re-encoding the files, a
> reference to a newly introduced appendix section of the pxar archive
> is written. The appendix section is created as a concatenation
> of indexed chunks from the previous backup run, thereby containing the
> sequential file payload at a calculated offset with respect to the
> starting point of the appendix section.
>
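
For illustration, the reuse decision described above could be sketched
as follows. This is a minimal sketch and not code from the series:
`FileMeta`, `PrevEntry`, `Action` and `decide` are hypothetical names,
and only mtime and size are compared, as stated in the cover letter.

```rust
/// Hypothetical, reduced view of the metadata that gets compared.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct FileMeta {
    mtime: i64,
    size: u64,
}

/// Hypothetical entry from the previous backup's catalog: the metadata
/// plus the file's offset within the previous pxar byte stream.
#[derive(Debug, Clone, Copy)]
struct PrevEntry {
    meta: FileMeta,
    offset: u64,
}

/// What to do with a regular file during archive creation.
#[derive(Debug, PartialEq, Eq)]
enum Action {
    /// Metadata unchanged: write a reference into the appendix instead
    /// of re-encoding the payload.
    Reuse { prev_offset: u64 },
    /// New or changed file: encode the payload as usual.
    Encode,
}

/// Decide whether a file's payload can be reused from the previous run.
fn decide(current: FileMeta, previous: Option<PrevEntry>) -> Action {
    match previous {
        Some(prev) if prev.meta == current => Action::Reuse {
            prev_offset: prev.offset,
        },
        _ => Action::Encode,
    }
}
```

A file that is missing from the previous catalog, or whose mtime or
size differs, falls through to regular encoding.
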
> Metadata comparison and calculation of the chunks to be indexed for the
> appendix section is performed using the catalog of a previous backup as
> reference. In order to be able to calculate the offsets, the current
> catalog format is extended to include the file offset with respect to
> the pxar archive byte stream. This allows finding the required chunk
> indexes, the start padding within the concatenated chunks and the total
> bytes introduced by the chunks.
>
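
The offset calculation described above can be sketched like this. It is
only an illustration under stated assumptions: the index is modelled as
a sorted slice of chunk end offsets, and `ChunkRange` and
`chunks_for_range` are hypothetical names, not functions from the
series.

```rust
/// Chunks covering a byte range of the previous pxar stream.
#[derive(Debug, PartialEq, Eq)]
struct ChunkRange {
    /// Index of the first chunk needed.
    first: usize,
    /// Index of the last chunk needed (inclusive).
    last: usize,
    /// Bytes between the start of the first chunk and the range start.
    start_padding: u64,
    /// Total bytes introduced by all chunks from `first` to `last`.
    total_bytes: u64,
}

/// Map the range `[start, start + len)` onto chunks.
///
/// `chunk_ends[i]` is the end offset (exclusive) of chunk `i`; the slice
/// is assumed to be strictly ascending. Returns `None` for an empty
/// range or a range that extends past the last chunk.
fn chunks_for_range(chunk_ends: &[u64], start: u64, len: u64) -> Option<ChunkRange> {
    if len == 0 {
        return None;
    }
    let end = start.checked_add(len)?;

    // First chunk whose end lies beyond `start`, so it contains `start`.
    let first = chunk_ends.partition_point(|&e| e <= start);
    // First chunk whose end reaches `end`, so it contains the last byte.
    let last = chunk_ends.partition_point(|&e| e < end);
    if last >= chunk_ends.len() {
        return None;
    }

    // A chunk starts where its predecessor ends; chunk 0 starts at 0.
    let first_start = if first == 0 { 0 } else { chunk_ends[first - 1] };

    Some(ChunkRange {
        first,
        last,
        start_padding: start - first_start,
        total_bytes: chunk_ends[last] - first_start,
    })
}
```

With chunk ends at 100, 250 and 400, a 200 byte file at offset 120
needs chunks 1 and 2, with a start padding of 20 bytes and 300 bytes
introduced in total.
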
> During encoding, the chunks needed for the appendix section are injected
> into the pxar archive after forcing a chunk boundary when regular pxar
> encoding is finished. Finally, a pxar archive containing an appendix
> section is marked as such by appending a final pxar goodbye lookup
> table containing only the offset to the appendix section start and the
> total size of that section, needed for random access, e.g. for mounting
> the archive via the fuse filesystem implementation.
>
> Currently, the code assumes the reference backup (for which the previous
> run is used) to be a regular backup without appendix section, and the
> catalog for that backup to already contain the required additional
> offset information.
>
> An invocation therefore looks like:
> ```bash
> proxmox-backup-client backup <label>.pxar:<source-path>
> proxmox-backup-client backup <label>.pxar:<source-path> --incremental
> ```
>
> pxar:
>
> Christian Ebner (8):
> fix #3174: encoder: impl fn new for LinkOffset
> fix #3174: decoder: factor out skip_bytes from skip_entry
> fix #3174: decoder: impl skip_bytes for sync dec
> fix #3174: metadata: impl fn to calc byte size
> fix #3174: enc/dec: impl PXAR_APPENDIX_REF entrytype
> fix #3174: enc/dec: impl PXAR_APPENDIX entrytype
> fix #3174: encoder: add helper to incr encoder pos
> fix #3174: enc/dec: impl PXAR_APPENDIX_TAIL entrytype
>
> examples/mk-format-hashes.rs | 11 +++++
> examples/pxarcmd.rs | 4 +-
> src/accessor/mod.rs | 46 ++++++++++++++++++++
> src/decoder/mod.rs | 38 +++++++++++++---
> src/decoder/sync.rs | 6 +++
> src/encoder/aio.rs | 36 ++++++++++++++--
> src/encoder/mod.rs | 84 +++++++++++++++++++++++++++++++++++-
> src/encoder/sync.rs | 32 +++++++++++++-
> src/format/mod.rs | 16 +++++++
> src/lib.rs | 54 +++++++++++++++++++++++
> 10 files changed, 312 insertions(+), 15 deletions(-)
>
> proxmox-backup:
>
> Christian Ebner (12):
> fix #3174: index: add fn index list from start/end-offsets
> fix #3174: index: add fn digest for DynamicEntry
> fix #3174: api: double catalog upload size
> fix #3174: catalog: incl pxar archives file offset
> fix #3174: archiver/extractor: impl appendix ref
> fix #3174: extractor: impl seq restore from appendix
> fix #3174: archiver: store ref to previous backup
> fix #3174: upload stream: impl reused chunk injector
> fix #3174: chunker: add forced boundaries
> fix #3174: backup writer: inject queued chunk in upload steam
> fix #3174: archiver: reuse files with unchanged metadata
> fix #3174: client: Add incremental flag to backup creation
>
> examples/test_chunk_speed2.rs | 9 +-
> pbs-client/src/backup_writer.rs | 88 ++++---
> pbs-client/src/chunk_stream.rs | 41 +++-
> pbs-client/src/inject_reused_chunks.rs | 123 ++++++++++
> pbs-client/src/lib.rs | 1 +
> pbs-client/src/pxar/create.rs | 217 ++++++++++++++++--
> pbs-client/src/pxar/extract.rs | 141 ++++++++++++
> pbs-client/src/pxar/mod.rs | 2 +-
> pbs-client/src/pxar/tools.rs | 9 +
> pbs-client/src/pxar_backup_stream.rs | 8 +-
> pbs-datastore/src/catalog.rs | 122 ++++++++--
> pbs-datastore/src/dynamic_index.rs | 38 +++
> proxmox-backup-client/src/main.rs | 142 +++++++++++-
> .../src/proxmox_restore_daemon/api.rs | 15 +-
> pxar-bin/src/main.rs | 22 +-
> src/api2/backup/upload_chunk.rs | 4 +-
> src/tape/file_formats/snapshot_archive.rs | 2 +-
> tests/catar.rs | 3 +
> 18 files changed, 886 insertions(+), 101 deletions(-)
> create mode 100644 pbs-client/src/inject_reused_chunks.rs
>
> --
> 2.39.2

Thread overview: 40+ messages
2023-09-22 7:16 Christian Ebner
2023-09-22 7:16 ` [pbs-devel] [RFC pxar 1/20] fix #3174: encoder: impl fn new for LinkOffset Christian Ebner
2023-09-27 12:08 ` Wolfgang Bumiller
2023-09-27 12:26 ` Christian Ebner
2023-09-28 6:49 ` Wolfgang Bumiller
2023-09-28 7:52 ` Christian Ebner
2023-09-22 7:16 ` [pbs-devel] [RFC pxar 2/20] fix #3174: decoder: factor out skip_bytes from skip_entry Christian Ebner
2023-09-27 11:32 ` Wolfgang Bumiller
2023-09-27 11:53 ` Christian Ebner
2023-09-22 7:16 ` [pbs-devel] [RFC pxar 3/20] fix #3174: decoder: impl skip_bytes for sync dec Christian Ebner
2023-09-22 7:16 ` [pbs-devel] [RFC pxar 4/20] fix #3174: metadata: impl fn to calc byte size Christian Ebner
2023-09-27 11:38 ` Wolfgang Bumiller
2023-09-27 11:55 ` Christian Ebner
2023-09-28 8:07 ` Christian Ebner
2023-09-28 9:00 ` Wolfgang Bumiller
2023-09-28 9:27 ` Christian Ebner
2023-09-22 7:16 ` [pbs-devel] [RFC pxar 5/20] fix #3174: enc/dec: impl PXAR_APPENDIX_REF entrytype Christian Ebner
2023-09-22 7:16 ` [pbs-devel] [RFC pxar 6/20] fix #3174: enc/dec: impl PXAR_APPENDIX entrytype Christian Ebner
2023-09-22 7:16 ` [pbs-devel] [RFC pxar 7/20] fix #3174: encoder: add helper to incr encoder pos Christian Ebner
2023-09-27 12:07 ` Wolfgang Bumiller
2023-09-27 12:20 ` Christian Ebner
2023-09-28 7:04 ` Wolfgang Bumiller
2023-09-28 7:50 ` Christian Ebner
2023-09-28 8:32 ` Wolfgang Bumiller
2023-09-22 7:16 ` [pbs-devel] [RFC pxar 8/20] fix #3174: enc/dec: impl PXAR_APPENDIX_TAIL entrytype Christian Ebner
2023-09-22 7:16 ` [pbs-devel] [RFC proxmox-backup 09/20] fix #3174: index: add fn index list from start/end-offsets Christian Ebner
2023-09-22 7:16 ` [pbs-devel] [RFC proxmox-backup 10/20] fix #3174: index: add fn digest for DynamicEntry Christian Ebner
2023-09-22 7:16 ` [pbs-devel] [RFC proxmox-backup 11/20] fix #3174: api: double catalog upload size Christian Ebner
2023-09-22 7:16 ` [pbs-devel] [RFC proxmox-backup 12/20] fix #3174: catalog: incl pxar archives file offset Christian Ebner
2023-09-22 7:16 ` [pbs-devel] [RFC proxmox-backup 13/20] fix #3174: archiver/extractor: impl appendix ref Christian Ebner
2023-09-22 7:16 ` [pbs-devel] [RFC proxmox-backup 14/20] fix #3174: extractor: impl seq restore from appendix Christian Ebner
2023-09-22 7:16 ` [pbs-devel] [RFC proxmox-backup 15/20] fix #3174: archiver: store ref to previous backup Christian Ebner
2023-09-22 7:16 ` [pbs-devel] [RFC proxmox-backup 16/20] fix #3174: upload stream: impl reused chunk injector Christian Ebner
2023-09-22 7:16 ` [pbs-devel] [RFC proxmox-backup 17/20] fix #3174: chunker: add forced boundaries Christian Ebner
2023-09-22 7:16 ` [pbs-devel] [RFC proxmox-backup 18/20] fix #3174: backup writer: inject queued chunk in upload steam Christian Ebner
2023-09-22 7:16 ` [pbs-devel] [RFC proxmox-backup 19/20] fix #3174: archiver: reuse files with unchanged metadata Christian Ebner
2023-09-26 7:01 ` Christian Ebner
2023-09-22 7:16 ` [pbs-devel] [RFC proxmox-backup 20/20] fix #3174: client: Add incremental flag to backup creation Christian Ebner
2023-09-26 7:11 ` Christian Ebner
2023-09-26 7:15 ` Christian Ebner [this message]