From: Dominik Csapak
To: Proxmox Backup Server development discussion, Christian Ebner
Subject: Re: [pbs-devel] [PATCH v6 proxmox-backup 54/65] docs: add section describing change detection mode
Date: Thu, 23 May 2024 11:28:42 +0200
Message-ID: <317ff75b-14e2-49a8-b47d-95247cb55b00@proxmox.com>
In-Reply-To: <20240514103421.289431-55-c.ebner@proxmox.com>
References: <20240514103421.289431-1-c.ebner@proxmox.com> <20240514103421.289431-55-c.ebner@proxmox.com>

two comments here,

* i'd like for the docs to go a bit more into detail what the metadata *is*
  (or link to a section where it's explained, e.g. in the mpxar format?)
  because metadata can be mtime,size,inode,ctime,etc. and e.g.
  in borg backup you can even choose which you want

* the 'technical overview' part still mentions that all data has to be read
  so a short mention of the change detection mode with link here would be good

On 5/14/24 12:34, Christian Ebner wrote:
> Describe the motivation and basic principle of the clients change
> detection mode and show an example invocation.
>
> Signed-off-by: Christian Ebner
> ---
>  docs/backup-client.rst | 41 +++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 41 insertions(+)
>
> diff --git a/docs/backup-client.rst b/docs/backup-client.rst
> index 00a1abbb3..e48b5dd60 100644
> --- a/docs/backup-client.rst
> +++ b/docs/backup-client.rst
> @@ -280,6 +280,47 @@ Multiple paths can be excluded like this:
>
>    # proxmox-backup-client backup.pxar:./linux --exclude=/usr --exclude=/rust
>
> +.. _client_change_detection_mode:
> +
> +Change detection mode
> +~~~~~~~~~~~~~~~~~~~~~
> +
> +File-based backups containing a lot of data can take a long time, as the default
> +behavior for the Proxmox backup client is to read all data and re-encode it.
> +The encoded stream is split into variable sized chunks for efficient
> +deduplication and based on the chunk digest a decision can be made whether a
> +given chunk needs to be uploaded or can be indexed without upload as it is
> +already available on the server (and therefore deduplicated). For some
> +use-cases, where files do not change frequently the full re-reading is not
> +feasible and undesired.
> +
> +The backup clients `change-detection-mode` can be switched from default to
> +`metadata` based detection to reduce limitations as described above, instructing
> +the client to avoid re-reading files with unchanged metadata whenever possible.
> +When using this mode, instead of the regular pxar archive, the backup snapshot
> +is stored into two separate files: the `mpxar` containing the archives metadata
> +and the `ppxar` containing a concatenation of the file contents.
This splitting
> +allows for metadata lookups without the overhead of the file contents.
> +Using the `change-detection-mode` set to `data` allows to create the same split
> +archive as when using the `metadata` mode, but without using a previous
> +reference and therefore reencoding all file payloads.
> +
> +When creating the backup archives, the current file metadata is compared to the
> +one looked up in the previous `mpxar` archive, and if unchanged the entry cached
> +for possible re-use of content chunks without re-reading, by indexing the
> +already present chunks containing the contents from the previous backup
> +snapshot. Since the file might only partially re-use chunks (thereby introducing
> +wasted space in the form of padding), the decision whether to re-use or
> +re-encode the currently cached entries is delegated to when enough information
> +is available, comparing the possible padding a threshold value.
> +
> +The following shows an example for the client invocation with the `metadata`
> +mode:
> +
> +.. code-block:: console
> +
> +  # proxmox-backup-client backup.pxar:./linux --change-detection-mode=metadata
> +
>  .. _client_encryption:
>
>  Encryption