From: Christian Ebner <c.ebner@proxmox.com>
To: pbs-devel@lists.proxmox.com
Subject: Re: [pbs-devel] [RFC pxar proxmox-backup 00/20] fix #3174: improve file-level backup
Date: Tue, 26 Sep 2023 09:15:50 +0200 (CEST)
Message-ID: <1301290754.4714.1695712550183@webmail.proxmox.com>
In-Reply-To: <20230922071621.12670-1-c.ebner@proxmox.com>
Thomas suggested including some form of benchmark. Besides measuring performance, it could serve as a regression test in a CI pipeline and/or help optimize tunable parameters.
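To illustrate the regression-test idea, here is a minimal Rust sketch. It is not part of the series, and `time_runs`, `median`, `is_regression` and the tolerance parameter are hypothetical names. It times repeated runs of a workload and flags a regression when the median run time exceeds a stored baseline by more than a relative tolerance:

```rust
use std::time::{Duration, Instant};

/// Run `f` `runs` times and record the wall-clock duration of each run.
/// In a real benchmark `f` would e.g. trigger a backup of a fixed test tree.
pub fn time_runs<F: FnMut()>(runs: usize, mut f: F) -> Vec<Duration> {
    (0..runs)
        .map(|_| {
            let start = Instant::now();
            f();
            start.elapsed()
        })
        .collect()
}

/// Median of the samples (upper median for even counts); less sensitive
/// to outliers such as cold caches than the mean.
pub fn median(samples: &[Duration]) -> Option<Duration> {
    if samples.is_empty() {
        return None;
    }
    let mut sorted = samples.to_vec();
    sorted.sort();
    Some(sorted[sorted.len() / 2])
}

/// Report a regression if the median exceeds `baseline` by more than
/// `tolerance` (relative, e.g. 0.10 for 10%).
pub fn is_regression(baseline: Duration, samples: &[Duration], tolerance: f64) -> bool {
    match median(samples) {
        Some(m) => m.as_secs_f64() > baseline.as_secs_f64() * (1.0 + tolerance),
        None => false,
    }
}
```

In a CI pipeline the workload could be run once with and once without `--incremental`. The baseline would have to be recorded on comparable hardware for the comparison to be meaningful.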
> On 22.09.2023 09:16 CEST Christian Ebner <c.ebner@proxmox.com> wrote:
>
>
> This (still rather rough) series of patches prototypes a possible
> approach to improve the speed of pxar file-level backup creation.
> The series is intended to gather first feedback on the implementation
> approach and to uncover pitfalls I might not be aware of.
>
> The current approach is to skip encoding of regular file payloads
> for which the metadata (currently mtime and size) did not change
> compared to a previous backup run. Instead of re-encoding such files, a
> reference to a newly introduced appendix section of the pxar archive
> is written. The appendix section is created as a concatenation
> of indexed chunks from the previous backup run, thereby containing the
> sequential file payload at a calculated offset with respect to the
> starting point of the appendix section.
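A minimal Rust sketch of the per-file decision described above. `FileMeta`, `Encode` and `AppendixPlanner` are hypothetical names, not the series' actual types. The sketch is deliberately simplified: it assumes reused payloads are packed back to back, so each reference offset is the running sum of previously reused sizes. The real appendix is a concatenation of whole chunks, so its offsets must also account for chunk padding.

```rust
/// Metadata compared against the previous backup's catalog entry
/// (only mtime and size, as in the current prototype).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct FileMeta {
    pub mtime: i64,
    pub size: u64,
}

/// What to emit for a regular file.
#[derive(Debug, PartialEq, Eq)]
pub enum Encode {
    /// Metadata changed or file is new: encode the payload as usual.
    Payload,
    /// Unchanged: write a reference into the appendix section instead.
    AppendixRef { offset: u64, size: u64 },
}

/// Hypothetical planner tracking where the next reused payload lands,
/// relative to the start of the appendix section.
#[derive(Default)]
pub struct AppendixPlanner {
    next_offset: u64,
}

impl AppendixPlanner {
    pub fn new() -> Self {
        Self::default()
    }

    pub fn plan(&mut self, current: FileMeta, previous: Option<FileMeta>) -> Encode {
        match previous {
            Some(prev) if prev == current => {
                let offset = self.next_offset;
                self.next_offset += current.size;
                Encode::AppendixRef { offset, size: current.size }
            }
            _ => Encode::Payload,
        }
    }

    /// Total payload bytes referenced so far.
    pub fn appendix_len(&self) -> u64 {
        self.next_offset
    }
}
```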
>
> Metadata comparison and calculation of the chunks to be indexed for the
> appendix section are performed using the catalog of a previous backup as
> reference. In order to calculate the offsets, the current catalog
> format is extended to include each file's offset within the pxar
> archive byte stream. This allows finding the required chunk indexes,
> the start padding within the concatenated chunks, and the total number
> of bytes introduced by the chunks.
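The lookup described above can be sketched in Rust. This sketch assumes that each chunk of the previous archive is described by its cumulative end offset in the archive byte stream, as in a dynamic index. `chunk_range` and `ChunkRange` are hypothetical names. Given a file's payload range `[start, end)` taken from the catalog, the sketch returns the chunk indexes to inject, the start padding and the total bytes those chunks contribute:

```rust
/// Chunks needed to cover the byte range [start, end) of the previous archive.
#[derive(Debug, PartialEq, Eq)]
pub struct ChunkRange {
    /// Index of the first chunk containing `start`.
    pub first: usize,
    /// Index of the last chunk containing `end - 1` (inclusive).
    pub last: usize,
    /// Bytes to skip in the first chunk to reach `start`.
    pub padding: u64,
    /// Total bytes contributed by chunks `first..=last`.
    pub total: u64,
}

/// `ends[i]` is the cumulative end offset of chunk `i` (strictly increasing).
pub fn chunk_range(ends: &[u64], start: u64, end: u64) -> Option<ChunkRange> {
    if start >= end || end > *ends.last()? {
        return None;
    }
    // First chunk whose end lies beyond `start` contains `start`.
    let first = ends.partition_point(|&e| e <= start);
    // First chunk whose end is at or beyond `end` contains byte `end - 1`.
    let last = ends.partition_point(|&e| e < end);
    let chunk_start = if first == 0 { 0 } else { ends[first - 1] };
    Some(ChunkRange {
        first,
        last,
        padding: start - chunk_start,
        total: ends[last] - chunk_start,
    })
}
```

With chunks ending at 100, 250, 400 and 600, a payload at `[120, 380)` maps to chunks 1..=2 with 20 bytes of padding and 300 injected bytes. Chunks shared by neighbouring files are ignored here.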
>
> During encoding, the chunks needed for the appendix section are injected
> into the pxar archive after forcing a chunk boundary once regular pxar
> encoding is finished. Finally, a pxar archive containing an appendix
> section is marked as such by appending a final pxar goodbye lookup
> table containing only the offset to the start of the appendix section
> and its total size. These are needed for random access, e.g. when
> mounting the archive via the FUSE filesystem implementation.
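A Rust sketch of how a reader might use such a tail for random access. This is a simplified stand-in and not the actual `PXAR_APPENDIX_TAIL` encoding. It assumes the tail is just two little-endian `u64` values (appendix offset and size) at the very end of the archive. `AppendixTail` and `TAIL_LEN` are hypothetical names:

```rust
use std::io::{self, Read, Seek, SeekFrom, Write};

/// Size of the hypothetical tail record: two little-endian u64 values.
pub const TAIL_LEN: u64 = 16;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct AppendixTail {
    /// Byte offset of the appendix section start within the archive.
    pub offset: u64,
    /// Total size of the appendix section in bytes.
    pub size: u64,
}

impl AppendixTail {
    /// Append the tail record after all other archive content.
    pub fn write_to<W: Write>(&self, w: &mut W) -> io::Result<()> {
        w.write_all(&self.offset.to_le_bytes())?;
        w.write_all(&self.size.to_le_bytes())
    }

    /// Read the tail record from the end of a seekable archive, as a
    /// random-access reader (e.g. a FUSE mount) would before lookups.
    pub fn read_from_end<R: Read + Seek>(r: &mut R) -> io::Result<Self> {
        let tail_pos = r.seek(SeekFrom::End(-(TAIL_LEN as i64)))?;
        let mut word = [0u8; 8];
        r.read_exact(&mut word)?;
        let offset = u64::from_le_bytes(word);
        r.read_exact(&mut word)?;
        let size = u64::from_le_bytes(word);
        // The appendix must end at or before the start of the tail record.
        if offset.checked_add(size).map_or(true, |end| end > tail_pos) {
            return Err(io::Error::new(io::ErrorKind::InvalidData, "appendix tail out of range"));
        }
        Ok(AppendixTail { offset, size })
    }
}
```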
>
> Currently, the code assumes that the reference backup (the previous
> run) is a regular backup without an appendix section, and that the
> catalog of that backup already contains the required additional
> offset information.
>
> An invocation therefore looks like:
> ```bash
> proxmox-backup-client backup <label>.pxar:<source-path>
> proxmox-backup-client backup <label>.pxar:<source-path> --incremental
> ```
>
> pxar:
>
> Christian Ebner (8):
> fix #3174: encoder: impl fn new for LinkOffset
> fix #3174: decoder: factor out skip_bytes from skip_entry
> fix #3174: decoder: impl skip_bytes for sync dec
> fix #3174: metadata: impl fn to calc byte size
> fix #3174: enc/dec: impl PXAR_APPENDIX_REF entrytype
> fix #3174: enc/dec: impl PXAR_APPENDIX entrytype
> fix #3174: encoder: add helper to incr encoder pos
> fix #3174: enc/dec: impl PXAR_APPENDIX_TAIL entrytype
>
> examples/mk-format-hashes.rs | 11 +++++
> examples/pxarcmd.rs | 4 +-
> src/accessor/mod.rs | 46 ++++++++++++++++++++
> src/decoder/mod.rs | 38 +++++++++++++---
> src/decoder/sync.rs | 6 +++
> src/encoder/aio.rs | 36 ++++++++++++++--
> src/encoder/mod.rs | 84 +++++++++++++++++++++++++++++++++++-
> src/encoder/sync.rs | 32 +++++++++++++-
> src/format/mod.rs | 16 +++++++
> src/lib.rs | 54 +++++++++++++++++++++++
> 10 files changed, 312 insertions(+), 15 deletions(-)
>
> proxmox-backup:
>
> Christian Ebner (12):
> fix #3174: index: add fn index list from start/end-offsets
> fix #3174: index: add fn digest for DynamicEntry
> fix #3174: api: double catalog upload size
> fix #3174: catalog: incl pxar archives file offset
> fix #3174: archiver/extractor: impl appendix ref
> fix #3174: extractor: impl seq restore from appendix
> fix #3174: archiver: store ref to previous backup
> fix #3174: upload stream: impl reused chunk injector
> fix #3174: chunker: add forced boundaries
> fix #3174: backup writer: inject queued chunk in upload stream
> fix #3174: archiver: reuse files with unchanged metadata
> fix #3174: client: Add incremental flag to backup creation
>
> examples/test_chunk_speed2.rs | 9 +-
> pbs-client/src/backup_writer.rs | 88 ++++---
> pbs-client/src/chunk_stream.rs | 41 +++-
> pbs-client/src/inject_reused_chunks.rs | 123 ++++++++++
> pbs-client/src/lib.rs | 1 +
> pbs-client/src/pxar/create.rs | 217 ++++++++++++++++--
> pbs-client/src/pxar/extract.rs | 141 ++++++++++++
> pbs-client/src/pxar/mod.rs | 2 +-
> pbs-client/src/pxar/tools.rs | 9 +
> pbs-client/src/pxar_backup_stream.rs | 8 +-
> pbs-datastore/src/catalog.rs | 122 ++++++++--
> pbs-datastore/src/dynamic_index.rs | 38 +++
> proxmox-backup-client/src/main.rs | 142 +++++++++++-
> .../src/proxmox_restore_daemon/api.rs | 15 +-
> pxar-bin/src/main.rs | 22 +-
> src/api2/backup/upload_chunk.rs | 4 +-
> src/tape/file_formats/snapshot_archive.rs | 2 +-
> tests/catar.rs | 3 +
> 18 files changed, 886 insertions(+), 101 deletions(-)
> create mode 100644 pbs-client/src/inject_reused_chunks.rs
>
> --
> 2.39.2
Thread overview: 40+ messages
2023-09-22 7:16 Christian Ebner
2023-09-22 7:16 ` [pbs-devel] [RFC pxar 1/20] fix #3174: encoder: impl fn new for LinkOffset Christian Ebner
2023-09-27 12:08 ` Wolfgang Bumiller
2023-09-27 12:26 ` Christian Ebner
2023-09-28 6:49 ` Wolfgang Bumiller
2023-09-28 7:52 ` Christian Ebner
2023-09-22 7:16 ` [pbs-devel] [RFC pxar 2/20] fix #3174: decoder: factor out skip_bytes from skip_entry Christian Ebner
2023-09-27 11:32 ` Wolfgang Bumiller
2023-09-27 11:53 ` Christian Ebner
2023-09-22 7:16 ` [pbs-devel] [RFC pxar 3/20] fix #3174: decoder: impl skip_bytes for sync dec Christian Ebner
2023-09-22 7:16 ` [pbs-devel] [RFC pxar 4/20] fix #3174: metadata: impl fn to calc byte size Christian Ebner
2023-09-27 11:38 ` Wolfgang Bumiller
2023-09-27 11:55 ` Christian Ebner
2023-09-28 8:07 ` Christian Ebner
2023-09-28 9:00 ` Wolfgang Bumiller
2023-09-28 9:27 ` Christian Ebner
2023-09-22 7:16 ` [pbs-devel] [RFC pxar 5/20] fix #3174: enc/dec: impl PXAR_APPENDIX_REF entrytype Christian Ebner
2023-09-22 7:16 ` [pbs-devel] [RFC pxar 6/20] fix #3174: enc/dec: impl PXAR_APPENDIX entrytype Christian Ebner
2023-09-22 7:16 ` [pbs-devel] [RFC pxar 7/20] fix #3174: encoder: add helper to incr encoder pos Christian Ebner
2023-09-27 12:07 ` Wolfgang Bumiller
2023-09-27 12:20 ` Christian Ebner
2023-09-28 7:04 ` Wolfgang Bumiller
2023-09-28 7:50 ` Christian Ebner
2023-09-28 8:32 ` Wolfgang Bumiller
2023-09-22 7:16 ` [pbs-devel] [RFC pxar 8/20] fix #3174: enc/dec: impl PXAR_APPENDIX_TAIL entrytype Christian Ebner
2023-09-22 7:16 ` [pbs-devel] [RFC proxmox-backup 09/20] fix #3174: index: add fn index list from start/end-offsets Christian Ebner
2023-09-22 7:16 ` [pbs-devel] [RFC proxmox-backup 10/20] fix #3174: index: add fn digest for DynamicEntry Christian Ebner
2023-09-22 7:16 ` [pbs-devel] [RFC proxmox-backup 11/20] fix #3174: api: double catalog upload size Christian Ebner
2023-09-22 7:16 ` [pbs-devel] [RFC proxmox-backup 12/20] fix #3174: catalog: incl pxar archives file offset Christian Ebner
2023-09-22 7:16 ` [pbs-devel] [RFC proxmox-backup 13/20] fix #3174: archiver/extractor: impl appendix ref Christian Ebner
2023-09-22 7:16 ` [pbs-devel] [RFC proxmox-backup 14/20] fix #3174: extractor: impl seq restore from appendix Christian Ebner
2023-09-22 7:16 ` [pbs-devel] [RFC proxmox-backup 15/20] fix #3174: archiver: store ref to previous backup Christian Ebner
2023-09-22 7:16 ` [pbs-devel] [RFC proxmox-backup 16/20] fix #3174: upload stream: impl reused chunk injector Christian Ebner
2023-09-22 7:16 ` [pbs-devel] [RFC proxmox-backup 17/20] fix #3174: chunker: add forced boundaries Christian Ebner
2023-09-22 7:16 ` [pbs-devel] [RFC proxmox-backup 18/20] fix #3174: backup writer: inject queued chunk in upload stream Christian Ebner
2023-09-22 7:16 ` [pbs-devel] [RFC proxmox-backup 19/20] fix #3174: archiver: reuse files with unchanged metadata Christian Ebner
2023-09-26 7:01 ` Christian Ebner
2023-09-22 7:16 ` [pbs-devel] [RFC proxmox-backup 20/20] fix #3174: client: Add incremental flag to backup creation Christian Ebner
2023-09-26 7:11 ` Christian Ebner
2023-09-26 7:15 ` Christian Ebner [this message]