public inbox for pbs-devel@lists.proxmox.com
From: Christian Ebner <c.ebner@proxmox.com>
To: pbs-devel@lists.proxmox.com
Subject: [pbs-devel] [PATCH v7 proxmox-backup 58/69] docs: add section describing change detection mode
Date: Mon, 27 May 2024 16:33:12 +0200
Message-ID: <20240527143323.456002-59-c.ebner@proxmox.com>
In-Reply-To: <20240527143323.456002-1-c.ebner@proxmox.com>

Describe the motivation and basic principle of the client's change
detection mode and show an example invocation.

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
changes since version 6:
- add more information on metadata being compared
- adapt and link from technical overview
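
For illustration, the principle described in the new section can be
sketched roughly as follows. This is a simplified model, not the client's
implementation: `FileMeta`, `metadata_unchanged`, `should_reuse` and the
threshold value of 0.1 are assumptions made for this example, and ACLs and
extended attributes are left out for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileMeta:
    """Hypothetical subset of the metadata fields listed in the docs."""
    size: int
    file_type: str
    uid: int
    gid: int
    mode: int
    mtime_ns: int


# Illustrative value only; the real threshold is defined by the client.
PADDING_THRESHOLD = 0.1


def metadata_unchanged(current: FileMeta, previous: FileMeta) -> bool:
    """A file is a re-use candidate only if every compared field matches."""
    return current == previous


def padding_ratio(chunk_bytes: int, referenced_bytes: int) -> float:
    """Fraction of the re-used chunks not covered by referenced payload."""
    if chunk_bytes == 0:
        return 0.0
    return (chunk_bytes - referenced_bytes) / chunk_bytes


def should_reuse(chunk_bytes: int, referenced_bytes: int,
                 threshold: float = PADDING_THRESHOLD) -> bool:
    """Re-use cached entries only if the wasted space stays below the threshold."""
    return padding_ratio(chunk_bytes, referenced_bytes) <= threshold
```

With these assumptions, a cached range that references 950 of 1000 chunk
bytes would be re-used, while one that references only 500 bytes would be
re-encoded.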

 docs/backup-client.rst      | 45 +++++++++++++++++++++++++++++++++++++
 docs/technical-overview.rst |  3 +++
 2 files changed, 48 insertions(+)

diff --git a/docs/backup-client.rst b/docs/backup-client.rst
index 00a1abbb3..58fcd79f0 100644
--- a/docs/backup-client.rst
+++ b/docs/backup-client.rst
@@ -280,6 +280,51 @@ Multiple paths can be excluded like this:
 
     # proxmox-backup-client backup.pxar:./linux --exclude=/usr --exclude=/rust
 
+.. _client_change_detection_mode:
+
+Change Detection Mode
+~~~~~~~~~~~~~~~~~~~~~
+
+File-based backups containing a lot of data can take a long time, as the default
+behavior of the Proxmox backup client is to read all data and re-encode it.
+The encoded stream is split into variable-sized chunks for efficient
+deduplication. Based on the chunk digest, the client decides whether a given
+chunk needs to be uploaded or can be indexed without upload because it is
+already available on the server (and therefore deduplicated). For use cases
+where files do not change frequently, re-reading all data on every backup is
+neither feasible nor desirable.
+
+The backup client's `change-detection-mode` can be switched from default to
+`metadata` based detection to mitigate the limitation described above. This
+instructs the client to avoid re-reading files with unchanged metadata whenever
+possible. In this mode, instead of the regular pxar archive, the backup snapshot
+is stored in two separate files: the `mpxar` containing the archive's metadata
+and the `ppxar` containing a concatenation of the file contents. This split
+allows for metadata lookups without the overhead of the file contents.
+Setting the `change-detection-mode` to `data` creates the same split archive as
+the `metadata` mode, but without using a previous reference, and therefore
+re-encodes all file payloads.
+
+When creating the backup archives, the current file metadata is compared to the
+metadata looked up in the previous `mpxar` archive.
+The comparison includes file size, file type, ownership and permission
+information, ACLs and attributes, and most importantly the file's mtime. For
+details, see the :ref:`pxar metadata archive format <pxar-meta-format>`.
+
+If unchanged, the entry is cached for possible re-use of content chunks without
+re-reading, by indexing the already present chunks containing the contents from
+the previous backup snapshot. Since the file might only partially re-use chunks
+(thereby introducing wasted space in the form of padding), the decision whether
+to re-use or re-encode the currently cached entries is deferred until enough
+information is available, comparing the possible padding to a threshold value.
+
+The following shows an example client invocation using the `metadata`
+mode:
+
+.. code-block:: console
+
+    # proxmox-backup-client backup linux.pxar:./linux --change-detection-mode=metadata
+
 .. _client_encryption:
 
 Encryption
diff --git a/docs/technical-overview.rst b/docs/technical-overview.rst
index 89835a7cc..a8b1c7268 100644
--- a/docs/technical-overview.rst
+++ b/docs/technical-overview.rst
@@ -28,6 +28,9 @@ which are not chunked, e.g. the client log), or one or more indexes
 
 When uploading an index, the client first has to read the source data, chunk it
 and send the data as chunks with their identifying checksum to the server.
+When using the :ref:`change detection mode <client_change_detection_mode>`,
+payload chunks for unchanged files are reused from the previous snapshot, so
+the source data is not read again.
 
 If there is a previous Snapshot in the backup group, the client can first
 download the chunk list of the previous Snapshot. If it detects a chunk that
-- 
2.39.2



_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel

