Date: Tue, 04 Jun 2024 14:07:55 +0200
From: Fabian Grünbichler
To: Proxmox Backup Server development discussion
References: <20240528094303.309806-1-c.ebner@proxmox.com> <20240528094303.309806-59-c.ebner@proxmox.com>
In-Reply-To: <20240528094303.309806-59-c.ebner@proxmox.com>
Message-Id: <1717502521.11jolrj11q.astroid@yuna.none>
Subject: Re: [pbs-devel] [PATCH v8 proxmox-backup 58/69] docs: add section describing change detection mode

On May 28, 2024 11:42 am, Christian Ebner wrote:
> Describe the motivation and basic principle of the clients change
> detection mode and show an example invocation.
>
> Signed-off-by: Christian Ebner
> ---
> changes since version 7:
> - no changes
>
> changes since version 6:
> - add more information on metadata being compared
> - adapt and link from technical overview
>
> docs/backup-client.rst | 45 +++++++++++++++++++++++++++++++++++++
> docs/technical-overview.rst | 3 +++
> 2 files changed, 48 insertions(+)
>
> diff --git a/docs/backup-client.rst b/docs/backup-client.rst
> index 00a1abbb3..58fcd79f0 100644
> --- a/docs/backup-client.rst
> +++ b/docs/backup-client.rst
> @@ -280,6 +280,51 @@ Multiple paths can be excluded like this:
>
>   # proxmox-backup-client backup.pxar:./linux --exclude=/usr --exclude=/rust
>
> +.. _client_change_detection_mode:
> +
> +Change Detection Mode
> +~~~~~~~~~~~~~~~~~~~~~
> +
> +File-based backups containing a lot of data can take a long time, as the default
> +behavior for the Proxmox backup client is to read all data and re-encode it.

read all data and encode it into a pxar archive.

> +The encoded stream is split into variable sized chunks for efficient
> +deduplication and based on the chunk digest a decision can be made whether a

I think I'd drop the efficient deduplication, the whole point of this
section is that it is not that efficient :-P

is split into variable sized chunks.
For each chunk, a digest is calculated and used to decide whether the chunk needs
..

> +given chunk needs to be uploaded or can be indexed without upload as it is
> +already available on the server (and therefore deduplicated). For some
> +use-cases, where files do not change frequently the full re-reading is not
> +feasible and undesired.

If the backed up files are largely unchanged, re-reading and then
deciding the corresponding chunks don't need to be uploaded at all (..
something something undesired ;))

> +
> +The backup clients `change-detection-mode` can be switched from default to

client's

> +`metadata` based detection to reduce limitations as described above, instructing
> +the client to avoid re-reading files with unchanged metadata whenever possible.
> +When using this mode, instead of the regular pxar archive, the backup snapshot
> +is stored into two separate files: the `mpxar` containing the archives metadata

archive's

> +and the `ppxar` containing a concatenation of the file contents. This splitting
> +allows for metadata lookups without the overhead of the file contents.

for efficient metadata lookups. ?

> +Using the `change-detection-mode` set to `data` allows to create the same split
> +archive as when using the `metadata` mode, but without using a previous
> +reference and therefore reencoding all file payloads.

this part should move below, since the next paragraphs describe the
metadata mode?

> +
> +When creating the backup archives, the current file metadata is compared to the
> +one looked up in the previous `mpxar` archive.
> +The metadata comparison includes file size, file type, ownership and permission
> +information acls and attributes and most importantly the files mtime, for

something here is missing (a comma?), and s/files/file's/

> +details see the :ref:`pxar metadata archive format `.
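For illustration only, a minimal Rust sketch of the comparison this paragraph describes. All names (`EntryMetadata`, `FileKind`, `is_unchanged`) are hypothetical and do not mirror the actual pxar types; ACLs and xattrs are reduced to plain values. Under those assumptions, an entry counts as unchanged only if every listed field matches:

```rust
/// Hypothetical, simplified file type; the real archive format distinguishes more kinds.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum FileKind {
    Regular,
    Directory,
    Symlink,
}

/// Hypothetical, simplified per-file metadata covering the fields listed in the text.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct EntryMetadata {
    pub kind: FileKind,
    pub size: u64,
    pub uid: u32,
    pub gid: u32,
    /// Permission bits, e.g. 0o644.
    pub mode: u32,
    /// Modification time in nanoseconds since the Unix epoch.
    pub mtime_ns: i128,
    /// ACL entries, reduced to opaque strings for this sketch.
    pub acl: Vec<String>,
    /// Extended attributes as (name, value) pairs.
    pub xattrs: Vec<(String, Vec<u8>)>,
}

/// Returns true if the entry found in the previous metadata archive matches
/// the current file, i.e. its contents are candidates for re-use.
pub fn is_unchanged(previous: &EntryMetadata, current: &EntryMetadata) -> bool {
    // `&&` short-circuits, so checking mtime first skips the remaining
    // comparisons as soon as the modification time differs.
    previous.mtime_ns == current.mtime_ns
        && previous.size == current.size
        && previous.kind == current.kind
        && previous.uid == current.uid
        && previous.gid == current.gid
        && previous.mode == current.mode
        && previous.acl == current.acl
        && previous.xattrs == current.xattrs
}
```

The field order only affects how early a mismatch is detected; the result is the same as comparing all fields.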
> +
> +If unchanged, the entry is cached for possible re-use of content chunks without
> +re-reading, by indexing the already present chunks containing the contents from
> +the previous backup snapshot. Since the file might only partially re-use chunks
> +(thereby introducing wasted space in the form of padding), the decision whether
> +to re-use or re-encode the currently cached entries is delegated to when enough

is delayed/postponed

> +information is available, comparing the possible padding a threshold value.

to a

> +
> +The following shows an example for the client invocation with the `metadata`
> +mode:
> +
> +.. code-block:: console
> +
> +  # proxmox-backup-client backup.pxar:./linux --change-detection-mode=metadata
> +
> .. _client_encryption:
>
> Encryption
> diff --git a/docs/technical-overview.rst b/docs/technical-overview.rst
> index 89835a7cc..a8b1c7268 100644
> --- a/docs/technical-overview.rst
> +++ b/docs/technical-overview.rst
> @@ -28,6 +28,9 @@ which are not chunked, e.g. the client log), or one or more indexes
>
> When uploading an index, the client first has to read the source data, chunk it
> and send the data as chunks with their identifying checksum to the server.
> +When using the :ref:`change detection mode ` payload
> +chunks for unchanged files are reused from the previous snapshot, thereby not
> +reading the source data again.
>
> If there is a previous Snapshot in the backup group, the client can first
> download the chunk list of the previous Snapshot. If it detects a chunk that
> --
> 2.39.2
>
>
>
> _______________________________________________
> pbs-devel mailing list
> pbs-devel@lists.proxmox.com
> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
>
>
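A minimal Rust sketch of the re-use decision from the patch paragraph starting "If unchanged, the entry is cached": a batch of cached, unchanged files is only re-used if the padding it would pull in stays within a threshold. Everything here is an assumption for illustration, not the client's actual implementation: the names `decide` and `Decision`, modelling files and chunks as byte ranges in the previous payload stream, passing the whole cached batch at once to stand in for "postponed until enough information is available", and using an absolute byte threshold.

```rust
use std::ops::Range;

/// Outcome of the (hypothetical) re-use decision for a batch of cached entries.
#[derive(Debug, PartialEq, Eq)]
pub enum Decision {
    /// Index the previous snapshot's chunks, without re-reading the files.
    Reuse,
    /// Re-read the cached files and encode their contents again.
    ReEncode,
}

/// True if the half-open byte ranges `a` and `b` share at least one byte.
fn overlaps(a: &Range<u64>, b: &Range<u64>) -> bool {
    a.start < b.end && b.start < a.end
}

/// Decides whether to re-use previous payload chunks for a batch of cached,
/// unchanged files.
///
/// * `entries`: byte ranges of the cached files' contents in the previous
///   payload stream (assumed well-formed and non-overlapping).
/// * `chunks`: byte ranges of the previous snapshot's payload chunks.
/// * `max_padding`: hypothetical padding threshold in bytes.
pub fn decide(entries: &[Range<u64>], chunks: &[Range<u64>], max_padding: u64) -> Decision {
    // Bytes actually needed for the cached file contents.
    let used: u64 = entries.iter().map(|e| e.end - e.start).sum();
    // Bytes of every previous chunk touched by a non-empty entry; all of
    // them would have to be indexed to re-use the contents.
    let referenced: u64 = chunks
        .iter()
        .filter(|c| entries.iter().any(|e| e.start < e.end && overlaps(c, e)))
        .map(|c| c.end - c.start)
        .sum();
    // Referenced bytes not belonging to any cached entry are padding.
    let padding = referenced.saturating_sub(used);
    if padding <= max_padding {
        Decision::Reuse
    } else {
        Decision::ReEncode
    }
}
```

For example, with previous chunks `0..100`, `100..200`, `200..300` and a single cached file at `10..90`, re-using means indexing the whole first chunk, i.e. 20 bytes of padding: that batch is re-used with a threshold of 32 bytes and re-encoded with 16.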