public inbox for pbs-devel@lists.proxmox.com
* [pbs-devel] [RFC pxar proxmox-backup 00/36] fix #3174: improve file-level backup
@ 2024-02-28 14:01 Christian Ebner
From: Christian Ebner @ 2024-02-28 14:01 UTC (permalink / raw)
  To: pbs-devel

Disclaimer: These patches are a work in progress and not intended for
production use just yet. Their purpose is initial testing and review.

This series of patches implements a metadata-based file change
detection mechanism that speeds up pxar file-level backup creation
for unchanged files.

The chosen approach is to split pxar archives created by the
proxmox-backup-client into two separate archives and upload streams:
one exclusively for regular file payloads, the other for the rest
of the pxar archive, which is mostly metadata.
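To illustrate the split, the following minimal sketch writes each
file's contents to a separate payload stream and keeps only an
(offset, size) reference next to the file's metadata, in the spirit of
the PXAR_PAYLOAD_REF entry header added by this series. `SplitWriter`
and `PayloadRef` are hypothetical names, and the text-based metadata
record is a simplification; the actual pxar encoder writes a binary
format.

```rust
use std::io::{self, Write};

/// Hypothetical reference stored in the metadata stream instead of the
/// file contents: where the payload lives in the payload stream.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct PayloadRef {
    pub offset: u64,
    pub size: u64,
}

/// Illustrative split writer: metadata goes to `meta`, contents to `payload`.
pub struct SplitWriter<M: Write, P: Write> {
    meta: M,
    payload: P,
    payload_pos: u64,
}

impl<M: Write, P: Write> SplitWriter<M, P> {
    pub fn new(meta: M, payload: P) -> Self {
        Self { meta, payload, payload_pos: 0 }
    }

    /// Appends the file contents to the payload stream and records only a
    /// reference to them alongside the entry in the metadata stream.
    pub fn add_file(&mut self, name: &str, contents: &[u8]) -> io::Result<PayloadRef> {
        let reference = PayloadRef { offset: self.payload_pos, size: contents.len() as u64 };
        self.payload.write_all(contents)?;
        self.payload_pos += reference.size;
        // Simplified line-based record; not the real pxar entry encoding.
        writeln!(self.meta, "{} offset={} size={}", name, reference.offset, reference.size)?;
        Ok(reference)
    }

    pub fn into_inner(self) -> (M, P) {
        (self.meta, self.payload)
    }
}

fn main() -> io::Result<()> {
    let mut writer = SplitWriter::new(Vec::<u8>::new(), Vec::<u8>::new());
    writer.add_file("a.txt", b"hello")?;
    writer.add_file("b.txt", b"world!")?;
    let (meta, payload) = writer.into_inner();
    // The payload stream holds only file contents, back to back.
    assert_eq!(payload, b"helloworld!".to_vec());
    print!("{}", String::from_utf8(meta).unwrap());
    Ok(())
}
```

Because the metadata stream stays small, a later run can fetch it
cheaply and decide per file whether the referenced payload range can be
reused.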

On subsequent runs, the metadata archive of the previous backup run,
which is limited in size and can therefore be accessed quickly, is used
to look up and compare the metadata of the entries to encode.
This assumes that the connection to the Proxmox Backup Server is
sufficiently fast to allow downloading and caching the chunks of
that index.
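The lookup and comparison step can be sketched roughly as follows. This
is a minimal illustration, assuming the previous metadata archive has
been read into a map keyed by path; `Metadata`, `PreviousEntry`,
`Action` and `decide` are hypothetical names, not the comparison method
added in this series, and the exact set of compared fields is an
assumption beyond the mtime and ACLs named above.

```rust
use std::collections::HashMap;

/// Hypothetical subset of the metadata compared per file.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Metadata {
    pub mode: u32,
    pub uid: u32,
    pub gid: u32,
    pub size: u64,
    pub mtime_ns: i64,
    pub xattrs: Vec<(String, Vec<u8>)>,
    pub acls: Vec<String>,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct PayloadRef {
    pub offset: u64,
    pub size: u64,
}

/// Entry as found in the previous run's metadata archive.
pub struct PreviousEntry {
    pub metadata: Metadata,
    pub payload: PayloadRef,
}

#[derive(Debug, PartialEq, Eq)]
pub enum Action {
    /// Unchanged: the previous payload range is a reuse candidate.
    Reuse(PayloadRef),
    /// New or changed (or first run): encode the file contents.
    Encode,
}

/// `previous` is `None` on the first run, when no reference index exists.
pub fn decide(previous: Option<&HashMap<String, PreviousEntry>>, path: &str, current: &Metadata) -> Action {
    match previous.and_then(|entries| entries.get(path)) {
        // Every compared field must match; any difference forces re-encoding.
        Some(prev) if prev.metadata == *current => Action::Reuse(prev.payload),
        _ => Action::Encode,
    }
}

fn main() {
    let meta = Metadata {
        mode: 0o100644, uid: 1000, gid: 1000, size: 5,
        mtime_ns: 1_700_000_000_000_000_000, xattrs: vec![], acls: vec![],
    };
    let mut previous = HashMap::new();
    previous.insert(
        "etc/hostname".to_string(),
        PreviousEntry { metadata: meta.clone(), payload: PayloadRef { offset: 4096, size: 5 } },
    );

    let reused = PayloadRef { offset: 4096, size: 5 };
    assert_eq!(decide(Some(&previous), "etc/hostname", &meta), Action::Reuse(reused));

    // A changed mtime alone is enough to re-encode the file.
    let mut touched = meta.clone();
    touched.mtime_ns += 1;
    assert_eq!(decide(Some(&previous), "etc/hostname", &touched), Action::Encode);

    // First run: nothing to compare against.
    assert_eq!(decide(None, "etc/hostname", &meta), Action::Encode);
}
```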

Changes to regular files are detected by comparing all of the file's
metadata, including mtime, ACLs, etc. If no changes are detected,
the previous payload index is used to look up chunks that can possibly
be reused in the payload stream of the new archive.
In order to reduce possible chunk fragmentation, the decision whether to
reuse or re-encode a file payload is deferred until enough information
has been gathered by adding entries to a look-ahead cache. If enough
payload is referenced, the chunks are reused and injected into the pxar
payload upload stream; otherwise they are discarded and the files are
encoded regularly.
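The deferred reuse decision can be sketched as follows. This is a
minimal sketch under assumptions: the previous payload index is modeled
as a list of cumulative chunk end offsets, and "enough payload is
referenced" is interpreted as a hypothetical minimum percentage of
referenced bytes within the chunks the cached files touch. `PrevIndex`,
`LookaheadCache`, `Decision` and `min_percent` are illustrative names,
not the types added by this series, which may use a different criterion.

```rust
use std::collections::BTreeSet;
use std::ops::Range;

/// Previous payload index as cumulative chunk end offsets (assumption).
pub struct PrevIndex {
    pub ends: Vec<u64>,
}

impl PrevIndex {
    fn chunk_start(&self, i: usize) -> u64 {
        if i == 0 { 0 } else { self.ends[i - 1] }
    }

    /// Indices of the chunks overlapping the byte range [start, end).
    pub fn chunks_for(&self, start: u64, end: u64) -> Range<usize> {
        if start >= end {
            return 0..0;
        }
        // First chunk that ends after `start`.
        let first = self.ends.partition_point(|&e| e <= start);
        // First chunk whose end reaches `end`, i.e. holds the last byte.
        let last = self.ends.partition_point(|&e| e < end);
        first..(last + 1).min(self.ends.len())
    }
}

#[derive(Debug, PartialEq, Eq)]
pub enum Decision {
    /// Inject these chunks of the previous index into the new payload stream.
    ReuseChunks(Vec<usize>),
    /// Discard the cached references and encode the files regularly.
    Reencode,
}

/// Payload ranges of unchanged files whose reuse is not yet decided.
pub struct LookaheadCache {
    ranges: Vec<(u64, u64)>,
    referenced: u64,
}

impl LookaheadCache {
    pub fn new() -> Self {
        Self { ranges: Vec::new(), referenced: 0 }
    }

    pub fn push(&mut self, start: u64, end: u64) {
        self.ranges.push((start, end));
        self.referenced += end - start;
    }

    /// Reuse only if the cached files reference at least `min_percent` of
    /// the bytes in the touched chunks; otherwise reuse would pull in
    /// mostly unrelated data and fragment the new archive.
    pub fn decide(&self, index: &PrevIndex, min_percent: u64) -> Decision {
        let mut chunks = BTreeSet::new();
        for &(start, end) in &self.ranges {
            chunks.extend(index.chunks_for(start, end));
        }
        let total: u64 = chunks.iter().map(|&i| index.ends[i] - index.chunk_start(i)).sum();
        if total > 0 && self.referenced * 100 >= total * min_percent {
            Decision::ReuseChunks(chunks.into_iter().collect())
        } else {
            Decision::Reencode
        }
    }
}

fn main() {
    // Three chunks: [0, 4), [4, 10), [10, 16).
    let index = PrevIndex { ends: vec![4, 10, 16] };

    let mut dense = LookaheadCache::new();
    dense.push(4, 10); // covers chunk 1 completely
    dense.push(10, 15); // covers most of chunk 2
    assert_eq!(dense.decide(&index, 50), Decision::ReuseChunks(vec![1, 2]));

    let mut sparse = LookaheadCache::new();
    sparse.push(5, 6); // 1 of 6 bytes in chunk 1
    assert_eq!(sparse.decide(&index, 50), Decision::Reencode);
}
```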

With these patches, a backup run is invoked as:
```bash
proxmox-backup-client backup <label>.pxar:<source-path> --change-detection-mode=metadata
```
During the first run, no reference index is available; the pxar archive
is nevertheless split into the two parts.
Subsequent backups then use the pxar archive accessor and the index
files of the previous run to perform file change detection.

The linux source code and the COCO dataset for computer vision and
pattern recognition can be used as benchmarks.
The benchmarks can be performed by running:
```bash
proxmox-backup-test-suite detection-mode-bench prepare --target /<path-to-bench-source-target>
proxmox-backup-test-suite detection-mode-bench run linux.pxar:/<path-to-bench-source-target>/linux
proxmox-backup-test-suite detection-mode-bench run coco.pxar:/<path-to-bench-source-target>/coco
```

The above invocations assume that the default repository and credentials
are set as environment variables; alternatively, they can be passed as
additional optional parameters.

Benchmark runs using this test data show a significant reduction in
the time needed for the backups. Note that all of these backups were made
to a local PBS instance running within a VM, thereby minimizing possible
network influences.

For the linux source code backup:
    Completed benchmark with 5 runs for each tested mode.

    Completed regular backup with:
    Total runtime: 51.31 s
    Average: 10.26 ± 0.12 s
    Min: 10.16 s
    Max: 10.46 s

    Completed metadata detection mode backup with:
    Total runtime: 4.89 s
    Average: 0.98 ± 0.02 s
    Min: 0.95 s
    Max: 1.00 s

    Differences (metadata based - regular):
    Delta total runtime: -46.42 s (-90.47 %)
    Delta average: -9.28 ± 0.12 s (-90.47 %)
    Delta min: -9.21 s (-90.64 %)
    Delta max: -9.46 s (-90.44 %)

For the coco dataset backup:
    Completed benchmark with 5 runs for each tested mode.

    Completed regular backup with:
    Total runtime: 520.72 s
    Average: 104.14 ± 0.79 s
    Min: 103.44 s
    Max: 105.49 s

    Completed metadata detection mode backup with:
    Total runtime: 6.95 s
    Average: 1.39 ± 0.23 s
    Min: 1.26 s
    Max: 1.79 s

    Differences (metadata based - regular):
    Delta total runtime: -513.76 s (-98.66 %)
    Delta average: -102.75 ± 0.83 s (-98.66 %)
    Delta min: -102.18 s (-98.78 %)
    Delta max: -103.69 s (-98.30 %)
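The relative changes in the "Differences" blocks follow from the totals
as (metadata based - regular) / regular * 100. A minimal sketch that
recomputes them from the rounded figures reported above, together with
the resulting speedup factor (a derived value, not part of the
benchmark output):

```rust
/// Relative change in percent: (metadata based - regular) / regular * 100.
fn relative_change(regular: f64, metadata: f64) -> f64 {
    (metadata - regular) / regular * 100.0
}

/// How many times faster the metadata based run was.
fn speedup(regular: f64, metadata: f64) -> f64 {
    regular / metadata
}

fn main() {
    // Total runtimes over 5 runs, in seconds, as reported above.
    let totals = [("linux", 51.31, 4.89), ("coco", 520.72, 6.95)];
    for (name, regular, metadata) in totals {
        println!(
            "{}: delta {:.2} s ({:.2} %), speedup {:.1}x",
            name,
            metadata - regular,
            relative_change(regular, metadata),
            speedup(regular, metadata)
        );
    }
}
```

Since the inputs are already rounded to two decimals, the recomputed
coco total delta (-98.67 %) can differ in the last digit from the
reported -98.66 %, while the linux value matches (-90.47 %). The
speedups come out at roughly 10.5x and 74.9x.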

This series of patches implements an alternative, but more promising
approach than the series presented previously [0], with the intention of
solving the same issue with fewer changes to the pxar format and with
greater efficiency.

[0] https://lists.proxmox.com/pipermail/pbs-devel/2024-January/007693.html

pxar:

Christian Ebner (10):
  format/examples: Fix typo in PXAR_PAYLOAD description
  format/examples: add PXAR_PAYLOAD_REF entry header
  encoder: add optional output writer for file payloads
  decoder: add optional payload input stream
  accessor: add optional payload input stream
  encoder: move to stack based state tracking
  encoder: add payload reference capability
  encoder: add payload position capability
  encoder: add payload advance capabilty
  encoder/format: finish payload stream with marker

 examples/mk-format-hashes.rs |  12 +-
 examples/pxarcmd.rs          |   6 +-
 src/accessor/aio.rs          |   7 +
 src/accessor/mod.rs          |  85 +++++++-
 src/decoder/mod.rs           |  82 +++++++-
 src/decoder/sync.rs          |   7 +
 src/encoder/aio.rs           |  50 +++--
 src/encoder/mod.rs           | 363 +++++++++++++++++++++++++----------
 src/encoder/sync.rs          |  43 ++++-
 src/format/mod.rs            |   6 +-
 src/lib.rs                   |   3 +
 11 files changed, 524 insertions(+), 140 deletions(-)

proxmox-backup:

Christian Ebner (26):
  client: pxar: switch to stack based encoder state
  client: backup: factor out extension from backup target
  client: backup: early check for fixed index type
  client: backup: split payload to dedicated stream
  client: restore: read payload from dedicated index
  tools: cover meta extension for pxar archives
  restore: cover meta extension for pxar archives
  client: mount: make split pxar archives mountable
  api: datastore: refactor getting local chunk reader
  api: datastore: attach optional payload chunk reader
  catalog: shell: factor out pxar fuse reader instantiation
  catalog: shell: redirect payload reader for split streams
  www: cover meta extension for pxar archives
  index: fetch chunk form index by start/end-offset
  upload stream: impl reused chunk injector
  client: chunk stream: add chunk injection queues
  client: implement prepare reference method
  client: pxar: implement store to insert chunks on caching
  client: pxar: add previous reference to archiver
  client: pxar: add method for metadata comparison
  specs: add backup detection mode specification
  pxar: caching: add look-ahead cache types
  client: pxar: add look-ahead caching
  fix #3174: client: pxar: enable caching and meta comparison
  test-suite: add detection mode change benchmark
  test-suite: Add bin to deb, add shell completions

 Cargo.toml                                    |   1 +
 Makefile                                      |  13 +-
 debian/proxmox-backup-client.bash-completion  |   1 +
 debian/proxmox-backup-client.install          |   2 +
 debian/proxmox-backup-test-suite.bc           |   8 +
 examples/test_chunk_speed2.rs                 |  10 +-
 pbs-client/src/backup_specification.rs        |  53 ++
 pbs-client/src/backup_writer.rs               |  89 ++-
 pbs-client/src/chunk_stream.rs                |  42 +-
 pbs-client/src/inject_reused_chunks.rs        | 152 +++++
 pbs-client/src/lib.rs                         |   1 +
 pbs-client/src/pxar/create.rs                 | 597 +++++++++++++++++-
 pbs-client/src/pxar/lookahead_cache.rs        |  38 ++
 pbs-client/src/pxar/mod.rs                    |   3 +-
 pbs-client/src/pxar_backup_stream.rs          |  61 +-
 pbs-client/src/tools/mod.rs                   |   2 +-
 pbs-datastore/src/dynamic_index.rs            |  55 ++
 proxmox-backup-client/src/catalog.rs          |  71 ++-
 proxmox-backup-client/src/main.rs             | 280 +++++++-
 proxmox-backup-client/src/mount.rs            |  56 +-
 proxmox-backup-test-suite/Cargo.toml          |  18 +
 .../src/detection_mode_bench.rs               | 294 +++++++++
 proxmox-backup-test-suite/src/main.rs         |  17 +
 proxmox-file-restore/src/main.rs              |  11 +-
 .../src/proxmox_restore_daemon/api.rs         |  16 +-
 pxar-bin/src/main.rs                          |   7 +-
 src/api2/admin/datastore.rs                   |  45 +-
 tests/catar.rs                                |   4 +
 www/datastore/Content.js                      |   6 +-
 zsh-completions/_proxmox-backup-test-suite    |  13 +
 30 files changed, 1807 insertions(+), 159 deletions(-)
 create mode 100644 debian/proxmox-backup-test-suite.bc
 create mode 100644 pbs-client/src/inject_reused_chunks.rs
 create mode 100644 pbs-client/src/pxar/lookahead_cache.rs
 create mode 100644 proxmox-backup-test-suite/Cargo.toml
 create mode 100644 proxmox-backup-test-suite/src/detection_mode_bench.rs
 create mode 100644 proxmox-backup-test-suite/src/main.rs
 create mode 100644 zsh-completions/_proxmox-backup-test-suite


Thread overview: 39+ messages in thread:
2024-02-28 14:01 [pbs-devel] [RFC pxar proxmox-backup 00/36] fix #3174: improve file-level backup Christian Ebner
2024-02-28 14:01 ` [pbs-devel] [RFC pxar 01/36] format/examples: Fix typo in PXAR_PAYLOAD description Christian Ebner
2024-02-28 18:09   ` [pbs-devel] applied: " Thomas Lamprecht
2024-02-28 14:01 ` [pbs-devel] [RFC pxar 02/36] format/examples: add PXAR_PAYLOAD_REF entry header Christian Ebner
2024-02-28 14:01 ` [pbs-devel] [RFC pxar 03/36] encoder: add optional output writer for file payloads Christian Ebner
2024-02-28 14:01 ` [pbs-devel] [RFC pxar 04/36] decoder: add optional payload input stream Christian Ebner
2024-02-28 14:01 ` [pbs-devel] [RFC pxar 05/36] accessor: " Christian Ebner
2024-02-28 14:01 ` [pbs-devel] [RFC pxar 06/36] encoder: move to stack based state tracking Christian Ebner
2024-03-12 10:12   ` Dietmar Maurer
2024-02-28 14:01 ` [pbs-devel] [RFC pxar 07/36] encoder: add payload reference capability Christian Ebner
2024-02-28 14:01 ` [pbs-devel] [RFC pxar 08/36] encoder: add payload position capability Christian Ebner
2024-02-28 14:01 ` [pbs-devel] [RFC pxar 09/36] encoder: add payload advance capabilty Christian Ebner
2024-02-28 14:02 ` [pbs-devel] [RFC pxar 10/36] encoder/format: finish payload stream with marker Christian Ebner
2024-02-28 14:02 ` [pbs-devel] [RFC proxmox-backup 11/36] client: pxar: switch to stack based encoder state Christian Ebner
2024-02-28 14:02 ` [pbs-devel] [RFC proxmox-backup 12/36] client: backup: factor out extension from backup target Christian Ebner
2024-02-28 14:02 ` [pbs-devel] [RFC proxmox-backup 13/36] client: backup: early check for fixed index type Christian Ebner
2024-02-28 14:02 ` [pbs-devel] [RFC proxmox-backup 14/36] client: backup: split payload to dedicated stream Christian Ebner
2024-02-28 14:02 ` [pbs-devel] [RFC proxmox-backup 15/36] client: restore: read payload from dedicated index Christian Ebner
2024-02-28 14:02 ` [pbs-devel] [RFC proxmox-backup 16/36] tools: cover meta extension for pxar archives Christian Ebner
2024-02-28 14:02 ` [pbs-devel] [RFC proxmox-backup 17/36] restore: " Christian Ebner
2024-02-28 14:02 ` [pbs-devel] [RFC proxmox-backup 18/36] client: mount: make split pxar archives mountable Christian Ebner
2024-02-28 14:02 ` [pbs-devel] [RFC proxmox-backup 19/36] api: datastore: refactor getting local chunk reader Christian Ebner
2024-02-28 14:02 ` [pbs-devel] [RFC proxmox-backup 20/36] api: datastore: attach optional payload " Christian Ebner
2024-02-28 14:02 ` [pbs-devel] [RFC proxmox-backup 21/36] catalog: shell: factor out pxar fuse reader instantiation Christian Ebner
2024-02-28 14:02 ` [pbs-devel] [RFC proxmox-backup 22/36] catalog: shell: redirect payload reader for split streams Christian Ebner
2024-02-28 14:02 ` [pbs-devel] [RFC proxmox-backup 23/36] www: cover meta extension for pxar archives Christian Ebner
2024-02-28 14:02 ` [pbs-devel] [RFC proxmox-backup 24/36] index: fetch chunk form index by start/end-offset Christian Ebner
2024-02-28 14:02 ` [pbs-devel] [RFC proxmox-backup 25/36] upload stream: impl reused chunk injector Christian Ebner
2024-02-28 14:02 ` [pbs-devel] [RFC proxmox-backup 26/36] client: chunk stream: add chunk injection queues Christian Ebner
2024-02-28 14:02 ` [pbs-devel] [RFC proxmox-backup 27/36] client: implement prepare reference method Christian Ebner
2024-02-28 14:02 ` [pbs-devel] [RFC proxmox-backup 28/36] client: pxar: implement store to insert chunks on caching Christian Ebner
2024-02-28 14:02 ` [pbs-devel] [RFC proxmox-backup 29/36] client: pxar: add previous reference to archiver Christian Ebner
2024-02-28 14:02 ` [pbs-devel] [RFC proxmox-backup 30/36] client: pxar: add method for metadata comparison Christian Ebner
2024-02-28 14:02 ` [pbs-devel] [RFC proxmox-backup 31/36] specs: add backup detection mode specification Christian Ebner
2024-02-28 14:02 ` [pbs-devel] [RFC proxmox-backup 32/36] pxar: caching: add look-ahead cache types Christian Ebner
2024-02-28 14:02 ` [pbs-devel] [RFC proxmox-backup 33/36] client: pxar: add look-ahead caching Christian Ebner
2024-02-28 14:02 ` [pbs-devel] [RFC proxmox-backup 34/36] fix #3174: client: pxar: enable caching and meta comparison Christian Ebner
2024-02-28 14:02 ` [pbs-devel] [RFC proxmox-backup 35/36] test-suite: add detection mode change benchmark Christian Ebner
2024-02-28 14:02 ` [pbs-devel] [RFC proxmox-backup 36/36] test-suite: Add bin to deb, add shell completions Christian Ebner
