From: Christian Ebner
To: pbs-devel@lists.proxmox.com
Date: Wed, 5 Jun 2024 12:54:14 +0200
Message-Id: <20240605105416.278748-57-c.ebner@proxmox.com>
In-Reply-To: <20240605105416.278748-1-c.ebner@proxmox.com>
References: <20240605105416.278748-1-c.ebner@proxmox.com>
Subject: [pbs-devel] [PATCH v9 proxmox-backup 56/58] docs: add section describing change detection mode

Describe the motivation and basic principle of the client's change
detection mode and show an example invocation.

Signed-off-by: Christian Ebner
---
changes since version 8:
- adapted to suggested rewording

 docs/backup-client.rst      | 47 +++++++++++++++++++++++++++++++++++++
 docs/technical-overview.rst |  3 +++
 2 files changed, 50 insertions(+)

diff --git a/docs/backup-client.rst b/docs/backup-client.rst
index 00a1abbb3..e541c5537 100644
--- a/docs/backup-client.rst
+++ b/docs/backup-client.rst
@@ -280,6 +280,53 @@ Multiple paths can be excluded like this:
 
   # proxmox-backup-client backup.pxar:./linux --exclude=/usr --exclude=/rust
 
+.. _client_change_detection_mode:
+
+Change Detection Mode
+~~~~~~~~~~~~~~~~~~~~~
+
+File-based backups containing a lot of data can take a long time, as the default
+behavior for the Proxmox backup client is to read all data and encode it into a
+pxar archive.
+The encoded stream is split into variable-sized chunks. For each chunk, a digest
+is calculated and used to decide whether the chunk needs to be uploaded or can
+be indexed without upload, as it is already available on the server (and
+therefore deduplicated). If the backed-up files are largely unchanged,
+re-reading them only to detect that the corresponding chunks don't need to be
+uploaded after all is time-consuming and undesirable.
+
+The backup client's `change-detection-mode` can be switched from the default to
+`metadata`-based detection to mitigate the limitations described above,
+instructing the client to avoid re-reading files with unchanged metadata
+whenever possible. When using this mode, instead of the regular pxar archive,
+the backup snapshot is stored in two separate files: the `mpxar` containing the
+archive's metadata and the `ppxar` containing a concatenation of the file
+contents. This split allows for efficient metadata lookups.
+
+Setting the `change-detection-mode` to `data` creates the same split archive as
+the `metadata` mode, but without using a previous snapshot as reference,
+therefore re-encoding all file payloads.
+When creating the backup archives, the current file metadata is compared to the
+metadata looked up in the previous `mpxar` archive.
+The metadata comparison includes file size, file type, ownership and permission
+information, as well as ACLs and attributes and, most importantly, the file's
+mtime. For details, see the
+:ref:`pxar metadata archive format `.
+
+If unchanged, the entry is cached for possible re-use of content chunks without
+re-reading, by indexing the already present chunks containing the contents from
+the previous backup snapshot. Since the file might only partially re-use chunks
+(thereby introducing wasted space in the form of padding), the decision whether
+to re-use or re-encode the currently cached entries is postponed until enough
+information is available, comparing the possible padding to a threshold value.
+
+The following shows an example of the client invocation with the `metadata`
+mode:
+
+.. code-block:: console
+
+  # proxmox-backup-client backup.pxar:./linux --change-detection-mode=metadata
+
 .. _client_encryption:
 
 Encryption
diff --git a/docs/technical-overview.rst b/docs/technical-overview.rst
index 89835a7cc..a8b1c7268 100644
--- a/docs/technical-overview.rst
+++ b/docs/technical-overview.rst
@@ -28,6 +28,9 @@ which are not chunked, e.g. the client log), or one or more indexes
 
 When uploading an index, the client first has to read the source data, chunk it
 and send the data as chunks with their identifying checksum to the server.
+When using the :ref:`change detection mode <client_change_detection_mode>`,
+payload chunks for unchanged files are reused from the previous snapshot, so the
+source data is not read again.
 
 If there is a previous Snapshot in the backup group, the client can first
 download the chunk list of the previous Snapshot. If it detects a chunk that
-- 
2.39.2
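
For illustration only, and not part of the patch: the decision flow described in the new section can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not the client's implementation. The names `FileMetadata`, `is_unchanged` and `should_reuse` are hypothetical, the 10% padding threshold is an arbitrary placeholder (the patch does not state the actual value), and ACLs and extended attributes are left out for brevity.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class FileMetadata:
    """Hypothetical subset of the attributes named in the section above."""
    size: int
    file_type: str
    uid: int
    gid: int
    mode: int
    mtime_ns: int


def is_unchanged(current: FileMetadata, previous: Optional[FileMetadata]) -> bool:
    """A file counts as unchanged only if a previous entry exists and every
    compared field, including mtime, is identical."""
    return previous is not None and current == previous


def should_reuse(payload_bytes: int, chunk_bytes: int,
                 max_padding_ratio: float = 0.1) -> bool:
    """Decide between re-using previously uploaded chunks and re-encoding.

    payload_bytes: bytes of the referenced chunks that belong to cached files.
    chunk_bytes:   total size of the referenced chunks.
    The difference is padding; the 0.1 default is an assumed placeholder.
    """
    if chunk_bytes <= 0:
        return False
    padding = chunk_bytes - payload_bytes
    return padding / chunk_bytes <= max_padding_ratio


previous = FileMetadata(size=4096, file_type="file", uid=0, gid=0,
                        mode=0o644, mtime_ns=1_717_584_854_000_000_000)
touched = replace(previous, mtime_ns=previous.mtime_ns + 1)

print(is_unchanged(previous, previous))                    # True
print(is_unchanged(touched, previous))                     # False
print(should_reuse(payload_bytes=950, chunk_bytes=1000))   # True
print(should_reuse(payload_bytes=500, chunk_bytes=1000))   # False
```

In this model a changed mtime alone is enough to force a re-read, and a set of cached entries that would drag in too much unrelated chunk data is re-encoded instead of re-used.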