public inbox for pbs-devel@lists.proxmox.com
* [pbs-devel] [RFC pxar proxmox-backup 00/20] fix #3174: improve file-level backup
@ 2023-09-22  7:16 Christian Ebner
From: Christian Ebner @ 2023-09-22  7:16 UTC
  To: pbs-devel

This (still rather rough) series of patches prototypes a possible
approach to improve the pxar file-level backup creation speed.
The series is intended to get initial feedback on the implementation
approach and to find possible pitfalls I might not be aware of.

The current approach is to skip encoding of regular file payloads
for which the metadata (currently mtime and size) did not change
compared to a previous backup run. Instead of re-encoding the files, a
reference to a newly introduced appendix section of the pxar archive
will be written. The appendix section will be created as a concatenation
of indexed chunks from the previous backup run, thereby containing the
sequential file payload at a calculated offset with respect to the
starting point of the appendix section.
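
To illustrate the decision described above, here is a minimal sketch of
the per-file check. The names `PrevEntry`, `Action` and `decide` are
hypothetical and only serve the illustration; they are not the types or
functions used in the patches, and the real catalog lookup is not a flat
map keyed by path.

```rust
use std::collections::HashMap;

/// Metadata remembered for a regular file from the previous backup run.
/// Hypothetical type for illustration; not the actual catalog layout.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct PrevEntry {
    mtime: i64,
    size: u64,
    /// Offset of the file within the previous pxar archive byte stream.
    offset: u64,
}

/// What the archiver does with a regular file in the current run.
#[derive(Debug, PartialEq, Eq)]
enum Action {
    /// Metadata changed or file is new: encode the payload as usual.
    Encode,
    /// Metadata unchanged: write a reference into the appendix instead.
    AppendixRef { prev_offset: u64, size: u64 },
}

/// Compare current mtime and size against the previous run's entry.
fn decide(prev: &HashMap<String, PrevEntry>, path: &str, mtime: i64, size: u64) -> Action {
    match prev.get(path) {
        Some(e) if e.mtime == mtime && e.size == size => Action::AppendixRef {
            prev_offset: e.offset,
            size,
        },
        _ => Action::Encode,
    }
}
```

Only mtime and size are compared here because those are the criteria
named above; any other change to a file would go unnoticed by such a
check.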

Metadata comparison and calculation of the chunks to be indexed for the
appendix section are performed using the catalog of a previous backup as
reference. In order to be able to calculate the offsets, the current
catalog format is extended to include the file offset with respect to
the pxar archive byte stream. This allows finding the required chunk
indexes, the start padding within the concatenated chunks and the total
bytes introduced by the chunks.
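
The offset calculation can be sketched as follows, assuming the index of
the previous archive is available as a sorted list of exclusive chunk end
offsets (with the first chunk starting at offset 0). `ChunkRange` and
`chunks_for_range` are hypothetical names for this illustration and do
not correspond to the functions added in the series.

```rust
/// Result of mapping a byte range of the previous archive onto its chunks.
#[derive(Debug, PartialEq, Eq)]
struct ChunkRange {
    /// Index of the first chunk covering the range.
    first: usize,
    /// Index of the last chunk covering the range (inclusive).
    last: usize,
    /// Bytes between the start of the first chunk and the range start.
    padding: u64,
    /// Total size in bytes of all selected chunks.
    total: u64,
}

/// Map the byte range `start..end` onto chunk indexes.
/// `chunk_ends[i]` is assumed to be the exclusive end offset of chunk `i`.
fn chunks_for_range(chunk_ends: &[u64], start: u64, end: u64) -> Option<ChunkRange> {
    // Reject empty ranges, an empty index, and ranges past the archive end.
    if start >= end || end > *chunk_ends.last()? {
        return None;
    }
    // The first chunk whose end lies beyond `start` contains the first byte.
    let first = chunk_ends.partition_point(|&e| e <= start);
    // The first chunk whose end reaches `end` contains the last byte.
    let last = chunk_ends.partition_point(|&e| e < end);
    let first_start = if first == 0 { 0 } else { chunk_ends[first - 1] };
    Some(ChunkRange {
        first,
        last,
        padding: start - first_start,
        total: chunk_ends[last] - first_start,
    })
}
```

For chunk end offsets `[100, 250, 400]` and a file occupying bytes
`120..300`, this selects chunks 1 and 2, with a start padding of 20 bytes
and 300 bytes introduced in total.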

During encoding, the chunks needed for the appendix section are injected
into the pxar archive after forcing a chunk boundary when regular pxar
encoding is finished. Finally, pxar archives containing an appendix
section are marked as such by appending a final pxar goodbye lookup
table containing only the offset to the appendix section start and the
total size of that section, needed for random access, e.g. for mounting
the archive via the fuse filesystem implementation.
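
As a rough sketch of why the start offset and size are sufficient for
random access: a reference stored relative to the appendix start can be
turned into an absolute position in the archive with them. `AppendixTail`
and `resolve` are hypothetical names, and the actual on-disk encoding of
the tail entry is not shown here.

```rust
/// Information assumed to be recoverable from the final lookup table.
/// Hypothetical type for illustration only.
struct AppendixTail {
    /// Absolute offset of the appendix section start in the archive.
    start: u64,
    /// Total size of the appendix section in bytes.
    size: u64,
}

/// Translate a payload reference, given relative to the appendix start,
/// into an absolute archive offset. Returns `None` if the referenced
/// range does not fit into the appendix section or on overflow.
fn resolve(tail: &AppendixTail, ref_offset: u64, len: u64) -> Option<u64> {
    let ref_end = ref_offset.checked_add(len)?;
    if ref_end > tail.size {
        return None;
    }
    tail.start.checked_add(ref_offset)
}
```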

Currently, the code assumes the reference backup (for which the previous
run is used) to be a regular backup without appendix section, and the
catalog for that backup to already contain the required additional
offset information.

An invocation therefore looks like:
```bash
proxmox-backup-client backup <label>.pxar:<source-path>
proxmox-backup-client backup <label>.pxar:<source-path> --incremental
```

pxar:

Christian Ebner (8):
  fix #3174: encoder: impl fn new for LinkOffset
  fix #3174: decoder: factor out skip_bytes from skip_entry
  fix #3174: decoder: impl skip_bytes for sync dec
  fix #3174: metadata: impl fn to calc byte size
  fix #3174: enc/dec: impl PXAR_APPENDIX_REF entrytype
  fix #3174: enc/dec: impl PXAR_APPENDIX entrytype
  fix #3174: encoder: add helper to incr encoder pos
  fix #3174: enc/dec: impl PXAR_APPENDIX_TAIL entrytype

 examples/mk-format-hashes.rs | 11 +++++
 examples/pxarcmd.rs          |  4 +-
 src/accessor/mod.rs          | 46 ++++++++++++++++++++
 src/decoder/mod.rs           | 38 +++++++++++++---
 src/decoder/sync.rs          |  6 +++
 src/encoder/aio.rs           | 36 ++++++++++++++--
 src/encoder/mod.rs           | 84 +++++++++++++++++++++++++++++++++++-
 src/encoder/sync.rs          | 32 +++++++++++++-
 src/format/mod.rs            | 16 +++++++
 src/lib.rs                   | 54 +++++++++++++++++++++++
 10 files changed, 312 insertions(+), 15 deletions(-)

proxmox-backup:

Christian Ebner (12):
  fix #3174: index: add fn index list from start/end-offsets
  fix #3174: index: add fn digest for DynamicEntry
  fix #3174: api: double catalog upload size
  fix #3174: catalog: incl pxar archives file offset
  fix #3174: archiver/extractor: impl appendix ref
  fix #3174: extractor: impl seq restore from appendix
  fix #3174: archiver: store ref to previous backup
  fix #3174: upload stream: impl reused chunk injector
  fix #3174: chunker: add forced boundaries
  fix #3174: backup writer: inject queued chunk in upload steam
  fix #3174: archiver: reuse files with unchanged metadata
  fix #3174: client: Add incremental flag to backup creation

 examples/test_chunk_speed2.rs                 |   9 +-
 pbs-client/src/backup_writer.rs               |  88 ++++---
 pbs-client/src/chunk_stream.rs                |  41 +++-
 pbs-client/src/inject_reused_chunks.rs        | 123 ++++++++++
 pbs-client/src/lib.rs                         |   1 +
 pbs-client/src/pxar/create.rs                 | 217 ++++++++++++++++--
 pbs-client/src/pxar/extract.rs                | 141 ++++++++++++
 pbs-client/src/pxar/mod.rs                    |   2 +-
 pbs-client/src/pxar/tools.rs                  |   9 +
 pbs-client/src/pxar_backup_stream.rs          |   8 +-
 pbs-datastore/src/catalog.rs                  | 122 ++++++++--
 pbs-datastore/src/dynamic_index.rs            |  38 +++
 proxmox-backup-client/src/main.rs             | 142 +++++++++++-
 .../src/proxmox_restore_daemon/api.rs         |  15 +-
 pxar-bin/src/main.rs                          |  22 +-
 src/api2/backup/upload_chunk.rs               |   4 +-
 src/tape/file_formats/snapshot_archive.rs     |   2 +-
 tests/catar.rs                                |   3 +
 18 files changed, 886 insertions(+), 101 deletions(-)
 create mode 100644 pbs-client/src/inject_reused_chunks.rs

-- 
2.39.2

Thread overview: 40+ messages
2023-09-22  7:16 [pbs-devel] [RFC pxar proxmox-backup 00/20] fix #3174: improve file-level backup Christian Ebner
2023-09-22  7:16 ` [pbs-devel] [RFC pxar 1/20] fix #3174: encoder: impl fn new for LinkOffset Christian Ebner
2023-09-27 12:08   ` Wolfgang Bumiller
2023-09-27 12:26     ` Christian Ebner
2023-09-28  6:49       ` Wolfgang Bumiller
2023-09-28  7:52         ` Christian Ebner
2023-09-22  7:16 ` [pbs-devel] [RFC pxar 2/20] fix #3174: decoder: factor out skip_bytes from skip_entry Christian Ebner
2023-09-27 11:32   ` Wolfgang Bumiller
2023-09-27 11:53     ` Christian Ebner
2023-09-22  7:16 ` [pbs-devel] [RFC pxar 3/20] fix #3174: decoder: impl skip_bytes for sync dec Christian Ebner
2023-09-22  7:16 ` [pbs-devel] [RFC pxar 4/20] fix #3174: metadata: impl fn to calc byte size Christian Ebner
2023-09-27 11:38   ` Wolfgang Bumiller
2023-09-27 11:55     ` Christian Ebner
2023-09-28  8:07       ` Christian Ebner
2023-09-28  9:00         ` Wolfgang Bumiller
2023-09-28  9:27           ` Christian Ebner
2023-09-22  7:16 ` [pbs-devel] [RFC pxar 5/20] fix #3174: enc/dec: impl PXAR_APPENDIX_REF entrytype Christian Ebner
2023-09-22  7:16 ` [pbs-devel] [RFC pxar 6/20] fix #3174: enc/dec: impl PXAR_APPENDIX entrytype Christian Ebner
2023-09-22  7:16 ` [pbs-devel] [RFC pxar 7/20] fix #3174: encoder: add helper to incr encoder pos Christian Ebner
2023-09-27 12:07   ` Wolfgang Bumiller
2023-09-27 12:20     ` Christian Ebner
2023-09-28  7:04       ` Wolfgang Bumiller
2023-09-28  7:50         ` Christian Ebner
2023-09-28  8:32           ` Wolfgang Bumiller
2023-09-22  7:16 ` [pbs-devel] [RFC pxar 8/20] fix #3174: enc/dec: impl PXAR_APPENDIX_TAIL entrytype Christian Ebner
2023-09-22  7:16 ` [pbs-devel] [RFC proxmox-backup 09/20] fix #3174: index: add fn index list from start/end-offsets Christian Ebner
2023-09-22  7:16 ` [pbs-devel] [RFC proxmox-backup 10/20] fix #3174: index: add fn digest for DynamicEntry Christian Ebner
2023-09-22  7:16 ` [pbs-devel] [RFC proxmox-backup 11/20] fix #3174: api: double catalog upload size Christian Ebner
2023-09-22  7:16 ` [pbs-devel] [RFC proxmox-backup 12/20] fix #3174: catalog: incl pxar archives file offset Christian Ebner
2023-09-22  7:16 ` [pbs-devel] [RFC proxmox-backup 13/20] fix #3174: archiver/extractor: impl appendix ref Christian Ebner
2023-09-22  7:16 ` [pbs-devel] [RFC proxmox-backup 14/20] fix #3174: extractor: impl seq restore from appendix Christian Ebner
2023-09-22  7:16 ` [pbs-devel] [RFC proxmox-backup 15/20] fix #3174: archiver: store ref to previous backup Christian Ebner
2023-09-22  7:16 ` [pbs-devel] [RFC proxmox-backup 16/20] fix #3174: upload stream: impl reused chunk injector Christian Ebner
2023-09-22  7:16 ` [pbs-devel] [RFC proxmox-backup 17/20] fix #3174: chunker: add forced boundaries Christian Ebner
2023-09-22  7:16 ` [pbs-devel] [RFC proxmox-backup 18/20] fix #3174: backup writer: inject queued chunk in upload steam Christian Ebner
2023-09-22  7:16 ` [pbs-devel] [RFC proxmox-backup 19/20] fix #3174: archiver: reuse files with unchanged metadata Christian Ebner
2023-09-26  7:01   ` Christian Ebner
2023-09-22  7:16 ` [pbs-devel] [RFC proxmox-backup 20/20] fix #3174: client: Add incremental flag to backup creation Christian Ebner
2023-09-26  7:11   ` Christian Ebner
2023-09-26  7:15 ` [pbs-devel] [RFC pxar proxmox-backup 00/20] fix #3174: improve file-level backup Christian Ebner
