From: Christian Ebner
To: pbs-devel@lists.proxmox.com
Date: Mon, 27 May 2024 16:33:12 +0200
Message-Id: <20240527143323.456002-59-c.ebner@proxmox.com>
Subject: [pbs-devel] [PATCH v7 proxmox-backup 58/69] docs: add section describing change detection mode

Describe the motivation and basic principle of the client's change detection
mode and show an example invocation.

Signed-off-by: Christian Ebner
---
changes since version 6:
- add more information on metadata being compared
- adapt and link from technical overview

 docs/backup-client.rst      | 45 +++++++++++++++++++++++++++++++++++++
 docs/technical-overview.rst |  3 +++
 2 files changed, 48 insertions(+)

diff --git a/docs/backup-client.rst b/docs/backup-client.rst
index 00a1abbb3..58fcd79f0 100644
--- a/docs/backup-client.rst
+++ b/docs/backup-client.rst
@@ -280,6 +280,51 @@ Multiple paths can be excluded like this:
 
   # proxmox-backup-client backup.pxar:./linux --exclude=/usr --exclude=/rust
 
+.. _client_change_detection_mode:
+
+Change Detection Mode
+~~~~~~~~~~~~~~~~~~~~~
+
+File-based backups containing a lot of data can take a long time, as the default
+behavior of the Proxmox backup client is to read all data and re-encode it.
+The encoded stream is split into variable-sized chunks for efficient
+deduplication. Based on the chunk digest, a decision is made whether a given
+chunk needs to be uploaded or can be indexed without upload, because it is
+already available on the server (and is therefore deduplicated). For some use
+cases, for example when files change only rarely, re-reading all data is
+undesirable or even infeasible.
+
+The backup client's `change-detection-mode` can be switched from the default to
+`metadata`-based detection to mitigate the limitations described above. This
+instructs the client to avoid re-reading files with unchanged metadata whenever
+possible. In this mode, instead of a regular pxar archive, the backup snapshot
+is stored in two separate files: the `mpxar`, containing the archive metadata,
+and the `ppxar`, containing a concatenation of the file contents. This split
+allows metadata lookups without the overhead of reading the file contents.
+Setting `change-detection-mode` to `data` creates the same split archive as
+the `metadata` mode, but without using a previous snapshot as reference, and
+therefore re-encodes all file payloads.
+
+When creating the backup archives, the current file metadata is compared to the
+one looked up in the previous `mpxar` archive. The comparison includes file
+size, file type, ownership and permission information, ACLs, attributes and,
+most importantly, the file's mtime. For details, see the
+:ref:`pxar metadata archive format `.
+
+If the metadata is unchanged, the entry is cached for possible re-use of its
+content chunks: instead of re-reading the file, the already present chunks of
+the previous snapshot containing its contents are indexed. Since a file might
+only partially re-use chunks (thereby introducing wasted space as padding), the
+decision whether to re-use or re-encode the cached entries is deferred until
+enough information is available to compare the possible padding to a threshold.
+
+The following shows an example client invocation using the `metadata`
+mode:
+
+.. code-block:: console
+
+  # proxmox-backup-client backup.pxar:./linux --change-detection-mode=metadata
+
 .. _client_encryption:
 
 Encryption
diff --git a/docs/technical-overview.rst b/docs/technical-overview.rst
index 89835a7cc..a8b1c7268 100644
--- a/docs/technical-overview.rst
+++ b/docs/technical-overview.rst
@@ -28,6 +28,9 @@ which are not chunked, e.g. the client log), or one or more indexes
 
 When uploading an index, the client first has to read the source data, chunk it
 and send the data as chunks with their identifying checksum to the server.
+When using the :ref:`change detection mode `,
+payload chunks of unchanged files are re-used from the previous snapshot, so
+their source data does not need to be read again.
 
 If there is a previous Snapshot in the backup group, the client can first
 download the chunk list of the previous Snapshot. If it detects a chunk that
-- 
2.39.2
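The metadata comparison described in the documentation above can be illustrated with a minimal Rust sketch. The `EntryMetadata` and `FileType` types and the `metadata_unchanged` function are hypothetical names chosen for illustration and do not reflect the actual pxar or proxmox-backup-client API; ACLs and attributes are simplified to plain lists, and restricting payload re-use to regular files is an assumption of this sketch.

```rust
// Hypothetical sketch of the metadata comparison used to detect unchanged
// files; the types below are illustrative, not the real pxar types.

#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum FileType {
    Regular,
    Directory,
    Symlink,
}

#[derive(Debug, Clone, PartialEq, Eq)]
struct EntryMetadata {
    file_type: FileType,
    size: u64,
    uid: u32,
    gid: u32,
    mode: u32,                      // permission bits
    acl: Vec<String>,               // simplified ACL entries
    xattrs: Vec<(String, Vec<u8>)>, // simplified attributes (name, value)
    mtime_ns: i64,                  // modification time in nanoseconds
}

/// Returns `true` if the current entry matches the one looked up in the
/// previous metadata archive, making its payload a candidate for re-use.
fn metadata_unchanged(current: &EntryMetadata, previous: &EntryMetadata) -> bool {
    // Assumption of this sketch: only regular files carry a payload to re-use.
    current.file_type == FileType::Regular
        && current.file_type == previous.file_type
        // A changed mtime is the most important hint that contents changed.
        && current.mtime_ns == previous.mtime_ns
        && current.size == previous.size
        && current.uid == previous.uid
        && current.gid == previous.gid
        && current.mode == previous.mode
        && current.acl == previous.acl
        && current.xattrs == previous.xattrs
}

fn main() {
    let previous = EntryMetadata {
        file_type: FileType::Regular,
        size: 4096,
        uid: 1000,
        gid: 1000,
        mode: 0o644,
        acl: Vec::new(),
        xattrs: Vec::new(),
        mtime_ns: 1_716_820_000_000_000_000,
    };

    let mut current = previous.clone();
    println!("unchanged: {}", metadata_unchanged(&current, &previous)); // prints "unchanged: true"

    // Touching the file updates its mtime, so the payload has to be re-read.
    current.mtime_ns += 1;
    println!("after touch: {}", metadata_unchanged(&current, &previous)); // prints "after touch: false"
}
```

Comparing every listed field, rather than the mtime alone, means that a metadata-only change such as `chmod` also causes the entry to be re-encoded, which keeps the new archive's metadata accurate.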
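The deferred decision between re-using and re-encoding cached entries can likewise be sketched in Rust. This sketch assumes that the previous snapshot's payload stream is described by a list of chunk byte ranges, that the payload ranges of the cached entries do not overlap, and that the threshold is expressed as a maximum ratio of padding bytes to indexed chunk bytes. `PrevChunk`, `CachedEntry`, `padding_ratio` and `decide` are hypothetical names, and the threshold value used by the actual client is not given in this patch.

```rust
use std::ops::Range;

// Hypothetical sketch: decide whether to re-use the previous snapshot's
// payload chunks for a batch of cached (unchanged) entries, or to re-encode.

/// A chunk of the previous payload stream, as a byte range in that stream.
struct PrevChunk {
    range: Range<u64>,
}

/// An unchanged file whose contents live at `payload` in the previous stream.
struct CachedEntry {
    payload: Range<u64>,
}

#[derive(Debug, PartialEq, Eq)]
enum Decision {
    ReuseChunks,
    ReEncode,
}

fn overlaps(a: &Range<u64>, b: &Range<u64>) -> bool {
    a.start < b.end && b.start < a.end
}

/// Fraction of the bytes in the re-used chunks that belong to no cached
/// entry, i.e. the padding that re-using those chunks would introduce.
fn padding_ratio(chunks: &[PrevChunk], cached: &[CachedEntry]) -> f64 {
    // Total size of every chunk holding data of at least one cached entry.
    let indexed: u64 = chunks
        .iter()
        .filter(|c| cached.iter().any(|e| overlaps(&c.range, &e.payload)))
        .map(|c| c.range.end - c.range.start)
        .sum();
    if indexed == 0 {
        return 0.0; // nothing to index, so no padding
    }
    // Bytes actually needed (assumes non-overlapping payload ranges).
    let needed: u64 = cached.iter().map(|e| e.payload.end - e.payload.start).sum();
    indexed.saturating_sub(needed) as f64 / indexed as f64
}

/// Re-use the chunks only if the padding stays within `max_padding_ratio`.
fn decide(chunks: &[PrevChunk], cached: &[CachedEntry], max_padding_ratio: f64) -> Decision {
    if padding_ratio(chunks, cached) <= max_padding_ratio {
        Decision::ReuseChunks
    } else {
        Decision::ReEncode
    }
}

fn main() {
    // Three 100-byte chunks from the previous snapshot's payload stream.
    let chunks = vec![
        PrevChunk { range: 0..100 },
        PrevChunk { range: 100..200 },
        PrevChunk { range: 200..300 },
    ];

    // 180 of 200 indexed bytes are needed: 10% padding.
    let dense = vec![CachedEntry { payload: 10..190 }];
    // 10 of 100 indexed bytes are needed: 90% padding.
    let sparse = vec![CachedEntry { payload: 150..160 }];

    println!("{:?}", decide(&chunks, &dense, 0.2)); // prints "ReuseChunks"
    println!("{:?}", decide(&chunks, &sparse, 0.2)); // prints "ReEncode"
}
```

Deciding per batch of cached entries, rather than per file, lets several small unchanged files share the same re-used chunks, so the padding only has to be paid once for the whole batch.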