* [pbs-devel] [PATCH docs 1/3] docs: explain the working principle of the change detection modes
2024-11-18 9:24 [pbs-devel] [PATCH docs 0/3] extend documentation for change detection mode Christian Ebner
@ 2024-11-18 9:24 ` Christian Ebner
2024-11-18 9:24 ` [pbs-devel] [PATCH docs 2/3] docs: reference technical change detection mode section for client Christian Ebner
` (2 subsequent siblings)
3 siblings, 0 replies; 6+ messages in thread
From: Christian Ebner @ 2024-11-18 9:24 UTC (permalink / raw)
To: pbs-devel
Describe in more detail how the different change detection modes
operate and give insights into the inner workings, especially for the
more complex `metadata` mode, which involves lookahead caching and
padding calculation for reused payload chunks.
Suggested-by: Dietmar Maurer <dietmar@proxmox.com>
Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
docs/technical-overview.rst | 108 ++++++++++++++++++++++++++++++++++++
1 file changed, 108 insertions(+)
diff --git a/docs/technical-overview.rst b/docs/technical-overview.rst
index f79deff38..21793c5c5 100644
--- a/docs/technical-overview.rst
+++ b/docs/technical-overview.rst
@@ -134,6 +134,111 @@ This is done to speed up the client part of the backup, since it only needs to
encrypt chunks that are actually getting uploaded. Chunks that exist already in
the previous backup, do not need to be encrypted and uploaded.
+Change Detection Mode for File-Based Backups
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The change detection mode controls how files that did not change between
+subsequent backup runs are detected and handled, as well as the archive file
+format used to encode the directory entries.
+
+.. _change-detection-mode-legacy:
+
+Legacy Mode
++++++++++++
+
+Backup snapshots of filesystems are created by recursively scanning the
+directory entries. All entries to be included in the snapshot are read and
+serialized by encoding them using the ``pxar``
+:ref:`archive format <pxar-format>`. The resulting stream is chunked into
+:ref:`dynamically sized chunks <dynamically-sized-chunks>` and uploaded to the
+Proxmox Backup Server, deduplicating chunks based on their content digest for
+space efficient storage.
+File contents are read and chunked unconditionally; no check is performed to
+detect unchanged files.
+
+.. _change-detection-mode-data:
+
+Data Mode
++++++++++
+
+As in ``legacy`` mode, file contents are read and chunked unconditionally; no
+check is performed to detect unchanged files.
+
+However, in contrast to ``legacy`` mode, which stores entry metadata and data
+in a single self-contained ``pxar`` archive, the ``data`` mode encodes metadata
+and file contents into two separate streams. The resulting backup snapshots
+therefore contain split archives: an archive in ``mpxar``
+:ref:`format <pxar-meta-format>` containing the entry metadata, and an archive
+in ``ppxar`` :ref:`format <ppxar-format>` containing the actual file
+contents, separated by payload headers for consistency checks. The metadata
+archive stores a reference offset to the corresponding payload archive entry so
+the file contents can be accessed. Both of these archives are chunked and
+uploaded by the Proxmox backup client, resulting in separate indices and
+independent chunks.
+
+The ``mpxar`` archive can be used to efficiently fetch the associated metadata
+for archive entries without the overhead of payload data stored within the same
+chunks. This is used for example for entry lookups to list the archive contents
+or to navigate the mounted filesystem via the FUSE implementation. No dedicated
+catalog is therefore created for archives encoded using this mode.
+
+.. _change-detection-mode-metadata:
+
+Metadata Mode
++++++++++++++
+
+The ``metadata`` mode detects files whose file metadata did not change
+in-between subsequent backup runs. The metadata comparison includes file size,
+file type, ownership and permission information, as well as ACLs and attributes
+and, most importantly, the file's mtime. For details, see the
+:ref:`pxar metadata archive format <pxar-meta-format>`. This mode will avoid
+reading and rechunking the file contents whenever possible by reusing the file
+content chunks of unchanged files from the previous backup snapshot.
+
+To compare the metadata, the previous snapshot's ``mpxar`` metadata archive is
+downloaded at the start of the backup run and used as a reference. Further, the
+index of the payload archive ``ppxar`` is fetched and used to look up the file
+content chunks' digests, which will be used to reindex pre-existing chunks
+without the need to reread and rechunk the file contents.
+
+During backup, the metadata and payload archives are encoded in the same manner
+as for the ``data`` mode, but for the ``metadata`` mode each entry is
+additionally looked up in the metadata reference archive for comparison first.
+If the file did not change as compared to the reference, the file is considered
+unchanged and the Proxmox backup client enters a look-ahead caching mode. In
+this mode, the client will keep reading and comparing the following entries in
+the filesystem as long as they are reusable. Further, it keeps track of the
+payload archive offset range these file contents are stored in. The additional
+look-ahead caching is needed because file boundaries are not required to be
+aligned with chunk boundaries; reused chunks can therefore contain wasted chunk
+content (also called padding) if reused unconditionally.
+
+The look-ahead cache will greedily cache all unchanged entries up to the point
+where either the cache size limit is reached, a file entry with changed
+metadata is encountered, or the range of payload chunks considered for reuse is
+not contiguous. An example of the latter is a file which disappeared in-between
+subsequent backup runs, leaving a hole in the range. At this point, the caching
+mode is disabled and the client calculates the wasted padding size which would
+be introduced by reusing the payload chunks for all the unchanged files cached
+up to this point. If the padding is acceptable (below a preset limit of 10% of
+the actually reused chunk content), the files are reused by encoding them in the
+metadata archive using updated offset references to the contents and reindexing
+the pre-existing chunks in the new ``ppxar`` archive. If however the padding is
+not acceptable, exceeding the limit, all cached entries are reencoded, not
+reusing any of the pre-existing data. The metadata as cached will be encoded in
+the metadata archive, no matter if cached file contents are to be reused or
+reencoded.
+
+This combination of look-ahead caching and reuse of pre-existing payload archive
+chunks for files with unchanged contents therefore speeds up the backup
+process by avoiding rereading and rechunking file contents whenever possible.
+
+To reduce padding and increase chunk reusability, during creation of the
+archives in ``data`` mode and ``metadata`` mode the pxar encoder signals
+encountered file boundaries as suggested chunk boundaries to the sliding window
+chunker. The chunker then decides, based on its internal state, whether the
+suggested boundary is accepted or disregarded.
+
Caveats and Limitations
-----------------------
@@ -184,6 +289,9 @@ read all files again for every backup, otherwise it would not be possible to
generate a consistent, independent pxar archive where the original chunks can be
reused. Note that in spite of this, only new or changed chunks will be uploaded.
+In order to avoid these limitations, the Change Detection Mode ``metadata`` was
+introduced.
+
Verification of Encrypted Chunks
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
--
2.39.5
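
Illustration of the padding decision described in the Metadata Mode section
of the patch above. This is only a minimal sketch under stated assumptions,
not the client's actual implementation: the function names, the
representation of chunks and file contents as (start, end) byte ranges in
the payload archive, and reading "below a preset limit of 10% of the
actually reused chunk content" as padding divided by reused bytes are all
hypothetical. Payload headers are ignored for simplicity.

```python
def padding_ratio(chunk_ranges, file_ranges):
    """Return wasted bytes relative to reused bytes.

    chunk_ranges: (start, end) offsets of the payload chunks considered
                  for reuse (hypothetical representation).
    file_ranges:  (start, end) offsets of the unchanged file contents
                  that lie inside those chunks.
    """
    total = sum(end - start for start, end in chunk_ranges)
    reused = sum(end - start for start, end in file_ranges)
    if reused == 0:
        # Nothing would be reused, so every byte would be padding.
        return float("inf")
    return (total - reused) / reused


def should_reuse(chunk_ranges, file_ranges, limit=0.10):
    """Reuse the cached entries only if the padding stays below the limit."""
    return padding_ratio(chunk_ranges, file_ranges) < limit
```

With two 100-byte chunks and a single cached file covering bytes 5 to 195,
10 bytes of padding stand against 190 reused bytes, so the chunks would be
reused. If the file only covered bytes 50 to 150, half of the chunk content
would be padding and the cached entries would be reencoded instead.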
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
^ permalink raw reply [flat|nested] 6+ messages in thread
* [pbs-devel] [PATCH docs 2/3] docs: reference technical change detection mode section for client
2024-11-18 9:24 [pbs-devel] [PATCH docs 0/3] extend documentation for change detection mode Christian Ebner
2024-11-18 9:24 ` [pbs-devel] [PATCH docs 1/3] docs: explain the working principle of the change detection modes Christian Ebner
@ 2024-11-18 9:24 ` Christian Ebner
2024-11-18 9:24 ` [pbs-devel] [PATCH docs 3/3] docs: client: fix formatting by using double ticks Christian Ebner
2024-11-21 16:04 ` [pbs-devel] applied-series: [PATCH docs 0/3] extend documentation for change detection mode Thomas Lamprecht
3 siblings, 0 replies; 6+ messages in thread
From: Christian Ebner @ 2024-11-18 9:24 UTC (permalink / raw)
To: pbs-devel
Currently, the change detection modes are described in the client
usage section, which is not intended as an in-depth explanation of how
these client options work, but rather focuses on how to use them.
Therefore, add a reference to the more detailed technical section
regarding the change detection modes and reduce duplicate
explanations.
Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
docs/backup-client.rst | 21 ++++++---------------
1 file changed, 6 insertions(+), 15 deletions(-)
diff --git a/docs/backup-client.rst b/docs/backup-client.rst
index e56e0625b..78e856979 100644
--- a/docs/backup-client.rst
+++ b/docs/backup-client.rst
@@ -301,24 +301,15 @@ the client to avoid re-reading files with unchanged metadata whenever possible.
When using this mode, instead of the regular pxar archive, the backup snapshot
is stored into two separate files: the `mpxar` containing the archive's metadata
and the `ppxar` containing a concatenation of the file contents. This splitting
-allows for efficient metadata lookups.
+allows for efficient metadata lookups. When creating the backup archives, the
+current file metadata is compared to the one looked up in the previous `mpxar`
+archive. The operational details are explained more in depth in the
+:ref:`technical documentation <change-detection-mode-metadata>`.
Using the `change-detection-mode` set to `data` allows to create the same split
archive as when using the `metadata` mode, but without using a previous
-reference and therefore reencoding all file payloads.
-When creating the backup archives, the current file metadata is compared to the
-one looked up in the previous `mpxar` archive.
-The metadata comparison includes file size, file type, ownership and permission
-information, as well as acls and attributes and most importantly the file's
-mtime, for details see the
-:ref:`pxar metadata archive format <pxar-meta-format>`.
-
-If unchanged, the entry is cached for possible re-use of content chunks without
-re-reading, by indexing the already present chunks containing the contents from
-the previous backup snapshot. Since the file might only partially re-use chunks
-(thereby introducing wasted space in the form of padding), the decision whether
-to re-use or re-encode the currently cached entries is postponed to when enough
-information is available, comparing the possible padding to a threshold value.
+reference and therefore reencoding all file payloads. For details of this mode
+please see the :ref:`technical documentation <change-detection-mode-data>`.
.. _client_change_detection_mode_table:
--
2.39.5
* [pbs-devel] [PATCH docs 3/3] docs: client: fix formatting by using double ticks
2024-11-18 9:24 [pbs-devel] [PATCH docs 0/3] extend documentation for change detection mode Christian Ebner
2024-11-18 9:24 ` [pbs-devel] [PATCH docs 1/3] docs: explain the working principle of the change detection modes Christian Ebner
2024-11-18 9:24 ` [pbs-devel] [PATCH docs 2/3] docs: reference technical change detection mode section for client Christian Ebner
@ 2024-11-18 9:24 ` Christian Ebner
2024-11-18 15:04 ` Shannon Sterz
2024-11-21 16:04 ` [pbs-devel] applied-series: [PATCH docs 0/3] extend documentation for change detection mode Thomas Lamprecht
3 siblings, 1 reply; 6+ messages in thread
From: Christian Ebner @ 2024-11-18 9:24 UTC (permalink / raw)
To: pbs-devel
With single ticks the containing modes and archive formats are
displayed cursive, to be consistent with other sections of the
documentation use inline blocks.
Adapted line wrappings to the additional line length.
Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
docs/backup-client.rst | 25 +++++++++++++------------
1 file changed, 13 insertions(+), 12 deletions(-)
diff --git a/docs/backup-client.rst b/docs/backup-client.rst
index 78e856979..45df440c9 100644
--- a/docs/backup-client.rst
+++ b/docs/backup-client.rst
@@ -295,19 +295,20 @@ therefore deduplicated). If the backed up files are largely unchanged,
re-reading and then detecting the corresponding chunks don't need to be uploaded
after all is time consuming and undesired.
-The backup client's `change-detection-mode` can be switched from default to
-`metadata` based detection to reduce limitations as described above, instructing
-the client to avoid re-reading files with unchanged metadata whenever possible.
+The backup client's ``change-detection-mode`` can be switched from default to
+``metadata`` based detection to reduce limitations as described above,
+instructing the client to avoid re-reading files with unchanged metadata
+whenever possible.
When using this mode, instead of the regular pxar archive, the backup snapshot
-is stored into two separate files: the `mpxar` containing the archive's metadata
-and the `ppxar` containing a concatenation of the file contents. This splitting
-allows for efficient metadata lookups. When creating the backup archives, the
-current file metadata is compared to the one looked up in the previous `mpxar`
-archive. The operational details are explained more in depth in the
-:ref:`technical documentation <change-detection-mode-metadata>`.
-
-Using the `change-detection-mode` set to `data` allows to create the same split
-archive as when using the `metadata` mode, but without using a previous
+is stored into two separate files: the ``mpxar`` containing the archive's
+metadata and the ``ppxar`` containing a concatenation of the file contents. This
+splitting allows for efficient metadata lookups. When creating the backup
+archives, the current file metadata is compared to the one looked up in the
+previous ``mpxar`` archive. The operational details are explained more in depth
+in the :ref:`technical documentation <change-detection-mode-metadata>`.
+
+Using the ``change-detection-mode`` set to ``data`` allows to create the same
+split archive as when using the ``metadata`` mode, but without using a previous
reference and therefore reencoding all file payloads. For details of this mode
please see the :ref:`technical documentation <change-detection-mode-data>`.
--
2.39.5
* Re: [pbs-devel] [PATCH docs 3/3] docs: client: fix formatting by using double ticks
2024-11-18 9:24 ` [pbs-devel] [PATCH docs 3/3] docs: client: fix formatting by using double ticks Christian Ebner
@ 2024-11-18 15:04 ` Shannon Sterz
0 siblings, 0 replies; 6+ messages in thread
From: Shannon Sterz @ 2024-11-18 15:04 UTC (permalink / raw)
To: Proxmox Backup Server development discussion
On Mon Nov 18, 2024 at 10:24 AM CET, Christian Ebner wrote:
> With single ticks the containing modes and archive formats are
> displayed cursive, to be consistent with other sections of the
> documentation use inline blocks.
>
> Adapted line wrappings to the additional line length.
>
> Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
> ---
> docs/backup-client.rst | 25 +++++++++++++------------
> 1 file changed, 13 insertions(+), 12 deletions(-)
>
> diff --git a/docs/backup-client.rst b/docs/backup-client.rst
> index 78e856979..45df440c9 100644
> --- a/docs/backup-client.rst
> +++ b/docs/backup-client.rst
> @@ -295,19 +295,20 @@ therefore deduplicated). If the backed up files are largely unchanged,
> re-reading and then detecting the corresponding chunks don't need to be uploaded
> after all is time consuming and undesired.
>
> -The backup client's `change-detection-mode` can be switched from default to
> -`metadata` based detection to reduce limitations as described above, instructing
> -the client to avoid re-reading files with unchanged metadata whenever possible.
> +The backup client's ``change-detection-mode`` can be switched from default to
> +``metadata`` based detection to reduce limitations as described above,
> +instructing the client to avoid re-reading files with unchanged metadata
> +whenever possible.
> When using this mode, instead of the regular pxar archive, the backup snapshot
> -is stored into two separate files: the `mpxar` containing the archive's metadata
> -and the `ppxar` containing a concatenation of the file contents. This splitting
> -allows for efficient metadata lookups. When creating the backup archives, the
> -current file metadata is compared to the one looked up in the previous `mpxar`
> -archive. The operational details are explained more in depth in the
> -:ref:`technical documentation <change-detection-mode-metadata>`.
> -
> -Using the `change-detection-mode` set to `data` allows to create the same split
> -archive as when using the `metadata` mode, but without using a previous
> +is stored into two separate files: the ``mpxar`` containing the archive's
> +metadata and the ``ppxar`` containing a concatenation of the file contents. This
> +splitting allows for efficient metadata lookups. When creating the backup
> +archives, the current file metadata is compared to the one looked up in the
> +previous ``mpxar`` archive. The operational details are explained more in depth
> +in the :ref:`technical documentation <change-detection-mode-metadata>`.
> +
> +Using the ``change-detection-mode`` set to ``data`` allows to create the same
> +split archive as when using the ``metadata`` mode, but without using a previous
> reference and therefore reencoding all file payloads. For details of this mode
> please see the :ref:`technical documentation <change-detection-mode-data>`.
>
read through all these doc patches, they sound good to me
* [pbs-devel] applied-series: [PATCH docs 0/3] extend documentation for change detection mode
2024-11-18 9:24 [pbs-devel] [PATCH docs 0/3] extend documentation for change detection mode Christian Ebner
` (2 preceding siblings ...)
2024-11-18 9:24 ` [pbs-devel] [PATCH docs 3/3] docs: client: fix formatting by using double ticks Christian Ebner
@ 2024-11-21 16:04 ` Thomas Lamprecht
3 siblings, 0 replies; 6+ messages in thread
From: Thomas Lamprecht @ 2024-11-21 16:04 UTC (permalink / raw)
To: Proxmox Backup Server development discussion, Christian Ebner
On 18.11.24 at 10:24, Christian Ebner wrote:
> Add sections explaining the change detection modes in more technical
> detail and reference these sections in the client usage section,
> which should cover more the how-to-use than the how-it-works.
>
> Christian Ebner (3):
> docs: explain the working principle of the change detection modes
> docs: reference technical change detection mode section for client
> docs: client: fix formatting by using double ticks
>
> docs/backup-client.rst | 38 +++++--------
> docs/technical-overview.rst | 108 ++++++++++++++++++++++++++++++++++++
> 2 files changed, 123 insertions(+), 23 deletions(-)
>
applied series, thanks!
I took the liberty of transforming Shannon's "sounds good to me" into a
Reviewed-by, holler at me if I should not do that anymore in the future.