public inbox for pbs-devel@lists.proxmox.com
 help / color / mirror / Atom feed
* [pbs-devel] [PATCH proxmox-backup] docs: explain some technical details about datastores/chunks
@ 2020-12-09 15:25 Dominik Csapak
  2020-12-09 15:56 ` Oguz Bektas
  2020-12-09 16:23 ` Lubomir Apostolov
  0 siblings, 2 replies; 7+ messages in thread
From: Dominik Csapak @ 2020-12-09 15:25 UTC (permalink / raw)
  To: pbs-devel

adds explanations for:
* what datastores are
* their relation with snapshots/chunks
* basic information about chunk directory structures
* fixed-/dynamically-sized chunks
* special handling of encrypted chunks
* hash collision probability
* limitation of file-based backups

Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
---
 docs/index.rst              |   1 +
 docs/technical-overview.rst | 152 ++++++++++++++++++++++++++++++++++++
 docs/terminology.rst        |   3 +
 3 files changed, 156 insertions(+)
 create mode 100644 docs/technical-overview.rst

diff --git a/docs/index.rst b/docs/index.rst
index fffcb4fd..f3e6bf0c 100644
--- a/docs/index.rst
+++ b/docs/index.rst
@@ -33,6 +33,7 @@ in the section entitled "GNU Free Documentation License".
    pve-integration.rst
    pxar-tool.rst
    sysadmin.rst
+   technical-overview.rst
    faq.rst
 
 .. raw:: latex
diff --git a/docs/technical-overview.rst b/docs/technical-overview.rst
new file mode 100644
index 00000000..20f937bd
--- /dev/null
+++ b/docs/technical-overview.rst
@@ -0,0 +1,152 @@
+Technical Overview
+==================
+
+.. _technical_overview:
+
+Datastores
+----------
+
+A Datastore is the logical place where :ref:`Backup Snapshots <backup_snapshot>`
+and their chunks are stored. Snapshots consist of a manifest, blobs,
+dynamic- and fixed-indexes (see :ref:`terminology`), and are stored in the following directory structure:
+
+ <datastore-root>/<type>/<id>/<time>/
+
+The indexes contained in a snapshot reference chunks on which the deduplication
+of datastores is based.
+
+Chunks
+------
+
+A chunk is some (sometimes encrypted) data with a CRC-32 checksum at
+the end and a type marker at the beginning. It is identified by a
+SHA-256 checksum of its content.
+
+To generate such chunks, backup data is split either into fixed-size or
+dynamically-sized chunks. The same content will be hashed to the same
+checksum.
+
+The chunks of a datastore are found in
+
+ <datastore-root>/.chunks/
+
+This chunk directory is further subdivided by the first four byte of the
+chunks checksum, so the chunk with the checksum
+
+ a342e8151cbf439ce65f3df696b54c67a114982cc0aa751f2852c2f7acc19a8b
+
+lives in
+
+ <datastore-root>/.chunks/a342/
+
+This is done to reduce the number of files per directory, as having
+many files per directory can be bad for file system performance.
+
+These chunk directories('0000'-'ffff') will be preallocated when the datastore is
+created.
+
+Fixed-sized Chunks
+^^^^^^^^^^^^^^^^^^
+
+For block based backups (like VMs), fixed-sized chunks are used. The content
+(disk image), is split into chunks of the same length (typically 4 MiB).
+
+This works very well for VM images, since the file system on the guest,
+most often tries to allocate files in contiguous pieces, so new files get
+new blocks, and changing existing files changes only their own blocks.
+
+As an optimization, VMs in `Proxmox VE`_ can make use of 'dirty bitmaps',
+which can track the changed blocks of an image. Since these bitmap
+are also a representation of the image split into chunks, we have
+a direct relation between dirty blocks of the image and chunks we have
+to upload, so only touched chunks of the disk have to be uploaded for a backup.
+
+Dynamically-sized Chunks
+^^^^^^^^^^^^^^^^^^^^^^^^
+
+If does not want to backup block-based systems but file-based systems,
+using fixed-sized chunks is not a good idea, since every time a file
+would change in size, the remaining data gets shifted around and this
+would result in many chunks changing, reducing the effect of deduplication.
+
+To improve this, `Proxmox Backup`_ Server uses dynamically-sized chunks
+instead. Instead of splitting an image into fixed sizes, it generates
+a consistent file archive and uses a rolling hash over the data
+to calculate chunk boundaries. We use a variant of Buzhash which
+is a cyclic polynomial algorithm.
+
+It works by continuously calculating a hashsum while iterating over the
+data, and on certain conditions it triggers a hash boundary.
+
+Assuming that there is not a change in every file of the to system to
+be backed up, eventually the algorithm triggers the boundary on the
+same data as a previous backup, resulting in a chunk that can be reused.
+
+Encrypted Chunks
+^^^^^^^^^^^^^^^^
+
+A special case are encrypted chunks. Both fixed- and dynamically-sized
+chunks can be encrypted, and their handling is slightly different
+from normal chunks.
+
+The hash of encrypted chunks are calculated not with the actual (encrypted)
+chunk content, but with the plaintext content concatenated with
+the encryption key. This way, two chunks of the same data encrypted with
+different keys generate two different checksums and no collisions occur for
+multiple encryption keys.
+
+This is done to speed up the client part of the backup, since it only needs
+to encrypt chunks that are actually getting uploaded and chunks that exist
+already in the previous backup, do not need to be encrypted and uploaded.
+
+Caveats and Limitations
+-----------------------
+
+Notes on hash collisions
+^^^^^^^^^^^^^^^^^^^^^^^^
+Every hash has a chance to produce collisions, meaning two (or more) inputs
+generating the same hashsum. For SHA-256, this chance is negligible.
+To calculate such a collision, one can use the ideas of the 'birthday problem'
+from probability theory. For big numbers, this is actually not feasible to
+calculate with normal computers, but there is a good approximation:
+
+.. math::
+
+ p(n, d) = 1 - e^{-n^2/(2d)}
+
+Where `n` is the number of tries, and `d` is the number of possibilities.
+So for example, if we assume a large datastore of 1 PiB, and an average chunk
+size of 4 MiB, we have :math:`n = 268435456` tries, and :math:`d = 2^{256}`
+possibilities.  Using the above formula we get that the probability of a
+collision in that scenario is:
+
+.. math::
+
+ 3.1115 * 10^{-61}
+
+For context, in a lottery game of 6 of 45, the chance to correctly guess all
+6 numbers is only :math:`1.2277 * 10^{-7}`.
+
+So it is extremely unlikely that such a collision would occur by accident
+in a normal datastore.
+
+Additionally, SHA-256 is prone to length extension attacks, but since
+there is an upper limit for how big the chunk sizes are, this is not
+problem, since a potential attacker cannot arbitrarily add the content.
+
+
+File-based Backup
+^^^^^^^^^^^^^^^^^
+
+Since there is no relation between files and dynamically-sized chunks,
+we have to read all files again in every backup. Otherwise it would not be
+possible to generate a format where the original chunks can be reused.
+
+Verification of encrypted chunks
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+For encrypted chunks, only the checksum of the original (plaintext) data
+is available, making it impossible for the server (without the encryption key),
+to verify its content against it. Instead only the CRC-32 checksum gets checked.
+
+
diff --git a/docs/terminology.rst b/docs/terminology.rst
index 89ec7646..3eff9780 100644
--- a/docs/terminology.rst
+++ b/docs/terminology.rst
@@ -1,3 +1,5 @@
+.. _terminology:
+
 Terminology
 ===========
 
@@ -99,6 +101,7 @@ Backup Group
 The tuple ``<type>/<ID>`` is called a backup group. Such a group
 may contain one or more backup snapshots.
 
+.. _backup_snapshot:
 
 Backup Snapshot
 ---------------
-- 
2.20.1





^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [pbs-devel] [PATCH proxmox-backup] docs: explain some technical details about datastores/chunks
  2020-12-09 15:25 [pbs-devel] [PATCH proxmox-backup] docs: explain some technical details about datastores/chunks Dominik Csapak
@ 2020-12-09 15:56 ` Oguz Bektas
  2020-12-09 16:23 ` Lubomir Apostolov
  1 sibling, 0 replies; 7+ messages in thread
From: Oguz Bektas @ 2020-12-09 15:56 UTC (permalink / raw)
  To: Proxmox Backup Server development discussion

hi,

thanks for the patch!

some comments inline

On Wed, Dec 09, 2020 at 04:25:53PM +0100, Dominik Csapak wrote:
> adds explanations for:
> * what datastores are
> * their relation with snapshots/chunks
> * basic information about chunk directory structures
> * fixed-/dynamically-sized chunks
> * special handling of encrypted chunks
> * hash collision probability
> * limitation of file-based backups
> 
> Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
> ---
>  docs/index.rst              |   1 +
>  docs/technical-overview.rst | 152 ++++++++++++++++++++++++++++++++++++
>  docs/terminology.rst        |   3 +
>  3 files changed, 156 insertions(+)
>  create mode 100644 docs/technical-overview.rst
> 
> diff --git a/docs/index.rst b/docs/index.rst
> index fffcb4fd..f3e6bf0c 100644
> --- a/docs/index.rst
> +++ b/docs/index.rst
> @@ -33,6 +33,7 @@ in the section entitled "GNU Free Documentation License".
>     pve-integration.rst
>     pxar-tool.rst
>     sysadmin.rst
> +   technical-overview.rst
>     faq.rst
>  
>  .. raw:: latex
> diff --git a/docs/technical-overview.rst b/docs/technical-overview.rst
> new file mode 100644
> index 00000000..20f937bd
> --- /dev/null
> +++ b/docs/technical-overview.rst
> @@ -0,0 +1,152 @@
> +Technical Overview
> +==================
> +
> +.. _technical_overview:
> +
> +Datastores
> +----------
> +
> +A Datastore is the logical place where :ref:`Backup Snapshots <backup_snapshot>`
> +and their chunks are stored. Snapshots consist of a manifest, blobs,
> +dynamic- and fixed-indexes (see :ref:`terminology`), and are stored in the following directory structure:
> +
> + <datastore-root>/<type>/<id>/<time>/
> +
> +The indexes contained in a snapshot reference chunks on which the deduplication
> +of datastores is based.

imo it's a bit confusing to read this sentence. maybe something like:

"The deduplication of datastores is based on chunks, which are
referenced by the indexes contained in a snapshot."

> +
> +Chunks
> +------
> +
> +A chunk is some (sometimes encrypted) data with a CRC-32 checksum at

s/sometimes/possibly ?

> +the end and a type marker at the beginning. It is identified by a
> +SHA-256 checksum of its content.
> +
> +To generate such chunks, backup data is split either into fixed-size or
> +dynamically-sized chunks. The same content will be hashed to the same
> +checksum.
> +
> +The chunks of a datastore are found in
> +
> + <datastore-root>/.chunks/
> +
> +This chunk directory is further subdivided by the first four byte of the
> +chunks checksum, so the chunk with the checksum
> +
> + a342e8151cbf439ce65f3df696b54c67a114982cc0aa751f2852c2f7acc19a8b
> +
> +lives in
> +
> + <datastore-root>/.chunks/a342/
> +
> +This is done to reduce the number of files per directory, as having
> +many files per directory can be bad for file system performance.
> +
> +These chunk directories('0000'-'ffff') will be preallocated when the datastore is

s/the/a

> +created.
> +
> +Fixed-sized Chunks
> +^^^^^^^^^^^^^^^^^^
> +
> +For block based backups (like VMs), fixed-sized chunks are used. The content
> +(disk image), is split into chunks of the same length (typically 4 MiB).
> +
> +This works very well for VM images, since the file system on the guest,
> +most often tries to allocate files in contiguous pieces, so new files get
> +new blocks, and changing existing files changes only their own blocks.

s/guest,/guest


> +
> +As an optimization, VMs in `Proxmox VE`_ can make use of 'dirty bitmaps',
> +which can track the changed blocks of an image. Since these bitmap
> +are also a representation of the image split into chunks, we have
> +a direct relation between dirty blocks of the image and chunks we have
> +to upload, so only touched chunks of the disk have to be uploaded for a backup.

s/touched/modified ?

> +
> +Dynamically-sized Chunks
> +^^^^^^^^^^^^^^^^^^^^^^^^
> +
> +If does not want to backup block-based systems but file-based systems,

s/If/If one

> +using fixed-sized chunks is not a good idea, since every time a file
> +would change in size, the remaining data gets shifted around and this
> +would result in many chunks changing, reducing the effect of deduplication.
> +
> +To improve this, `Proxmox Backup`_ Server uses dynamically-sized chunks
> +instead. Instead of splitting an image into fixed sizes, it generates
> +a consistent file archive and uses a rolling hash over the data
> +to calculate chunk boundaries. We use a variant of Buzhash which
> +is a cyclic polynomial algorithm.
> +
> +It works by continuously calculating a hashsum while iterating over the
s/hashsum/checksum
> +data, and on certain conditions it triggers a hash boundary.
> +
> +Assuming that there is not a change in every file of the to system to
s/not a/no
s/to//
> +be backed up, eventually the algorithm triggers the boundary on the
> +same data as a previous backup, resulting in a chunk that can be reused.
> +
> +Encrypted Chunks
> +^^^^^^^^^^^^^^^^
> +
> +A special case are encrypted chunks. Both fixed- and dynamically-sized
> +chunks can be encrypted, and their handling is slightly different
> +from normal chunks.
> +
> +The hash of encrypted chunks are calculated not with the actual (encrypted)
> +chunk content, but with the plaintext content concatenated with
> +the encryption key. This way, two chunks of the same data encrypted with
> +different keys generate two different checksums and no collisions occur for
> +multiple encryption keys.
> +
> +This is done to speed up the client part of the backup, since it only needs
> +to encrypt chunks that are actually getting uploaded and chunks that exist
> +already in the previous backup, do not need to be encrypted and uploaded.
> +
> +Caveats and Limitations
> +-----------------------
> +
> +Notes on hash collisions
> +^^^^^^^^^^^^^^^^^^^^^^^^
> +Every hash has a chance to produce collisions, meaning two (or more) inputs
> +generating the same hashsum. For SHA-256, this chance is negligible.
s/hashsum/checksum
> +To calculate such a collision, one can use the ideas of the 'birthday problem'
> +from probability theory. For big numbers, this is actually not feasible to
s/not feasible/infeasible
> +calculate with normal computers, but there is a good approximation:
> +
> +.. math::
> +
> + p(n, d) = 1 - e^{-n^2/(2d)}
> +
> +Where `n` is the number of tries, and `d` is the number of possibilities.
> +So for example, if we assume a large datastore of 1 PiB, and an average chunk
> +size of 4 MiB, we have :math:`n = 268435456` tries, and :math:`d = 2^{256}`
> +possibilities.  Using the above formula we get that the probability of a
> +collision in that scenario is:
> +
> +.. math::
> +
> + 3.1115 * 10^{-61}
> +
> +For context, in a lottery game of 6 of 45, the chance to correctly guess all
> +6 numbers is only :math:`1.2277 * 10^{-7}`.
> +
> +So it is extremely unlikely that such a collision would occur by accident
> +in a normal datastore.
> +
> +Additionally, SHA-256 is prone to length extension attacks, but since
> +there is an upper limit for how big the chunk sizes are, this is not
> +problem, since a potential attacker cannot arbitrarily add the content.
s/problem/a problem

or

s/problem/problematic
> +
> +
> +File-based Backup
> +^^^^^^^^^^^^^^^^^
> +
> +Since there is no relation between files and dynamically-sized chunks,
> +we have to read all files again in every backup. Otherwise it would not be
> +possible to generate a format where the original chunks can be reused.
> +
> +Verification of encrypted chunks
> +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> +
> +For encrypted chunks, only the checksum of the original (plaintext) data
> +is available, making it impossible for the server (without the encryption key),
> +to verify its content against it. Instead only the CRC-32 checksum gets checked.
> +
> +
> diff --git a/docs/terminology.rst b/docs/terminology.rst
> index 89ec7646..3eff9780 100644
> --- a/docs/terminology.rst
> +++ b/docs/terminology.rst
> @@ -1,3 +1,5 @@
> +.. _terminology:
> +
>  Terminology
>  ===========
>  
> @@ -99,6 +101,7 @@ Backup Group
>  The tuple ``<type>/<ID>`` is called a backup group. Such a group
>  may contain one or more backup snapshots.
>  
> +.. _backup_snapshot:
>  
>  Backup Snapshot
>  ---------------
> -- 
> 2.20.1
> 
> 
> 
> _______________________________________________
> pbs-devel mailing list
> pbs-devel@lists.proxmox.com
> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
> 
> 




^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [pbs-devel] [PATCH proxmox-backup] docs: explain some technical details about datastores/chunks
  2020-12-09 15:25 [pbs-devel] [PATCH proxmox-backup] docs: explain some technical details about datastores/chunks Dominik Csapak
  2020-12-09 15:56 ` Oguz Bektas
@ 2020-12-09 16:23 ` Lubomir Apostolov
  2020-12-10  7:37   ` Dominik Csapak
  1 sibling, 1 reply; 7+ messages in thread
From: Lubomir Apostolov @ 2020-12-09 16:23 UTC (permalink / raw)
  To: Proxmox Backup Server development discussion

Hi,

After reading https://bugzilla.proxmox.com/show_bug.cgi?id=3138 and your mail, I'd like to discuss the following statement :
"we have to read all files again in every backup" for both cases - file and image based backup.

The image-backup seems simpler - fixed-size chunks. 
PBS should be able to link a snapshot with it's backup, and backup only differencies with parent snapshot as for example zfs send/recv works.
Every backup knows the chunk order and chunk size, so it can map every chunk to the original image extents.
The snapshot differences gives extents, which PBS can map to overlapped chunks, and send only those chunks, while referencing unchanged chunks from previous backup chunks list.

The variable-size chunks based on files needs another mapping between chunks and filenames. 
The rolling hash over the data may be linked a list containing the filenames inside, and then the snapshot diff containing files can flag the chunks to be saved.

So where's the catch ?

Best regards,
Lubomir Apostolov

-----Message d'origine-----
De : pbs-devel [mailto:pbs-devel-bounces@lists.proxmox.com] De la part de Dominik Csapak
Envoyé : mercredi 9 décembre 2020 16:26
À : pbs-devel@lists.proxmox.com
Objet : [pbs-devel] [PATCH proxmox-backup] docs: explain some technical details about datastores/chunks

adds explanations for:
* what datastores are
* their relation with snapshots/chunks
* basic information about chunk directory structures
* fixed-/dynamically-sized chunks
* special handling of encrypted chunks
* hash collision probability
* limitation of file-based backups

Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
---
 docs/index.rst              |   1 +
 docs/technical-overview.rst | 152 ++++++++++++++++++++++++++++++++++++
 docs/terminology.rst        |   3 +
 3 files changed, 156 insertions(+)
 create mode 100644 docs/technical-overview.rst

diff --git a/docs/index.rst b/docs/index.rst
index fffcb4fd..f3e6bf0c 100644
--- a/docs/index.rst
+++ b/docs/index.rst
@@ -33,6 +33,7 @@ in the section entitled "GNU Free Documentation License".
    pve-integration.rst
    pxar-tool.rst
    sysadmin.rst
+   technical-overview.rst
    faq.rst
 
 .. raw:: latex
diff --git a/docs/technical-overview.rst b/docs/technical-overview.rst
new file mode 100644
index 00000000..20f937bd
--- /dev/null
+++ b/docs/technical-overview.rst
@@ -0,0 +1,152 @@
+Technical Overview
+==================
+
+.. _technical_overview:
+
+Datastores
+----------
+
+A Datastore is the logical place where :ref:`Backup Snapshots <backup_snapshot>`
+and their chunks are stored. Snapshots consist of a manifest, blobs,
+dynamic- and fixed-indexes (see :ref:`terminology`), and are stored in the following directory structure:
+
+ <datastore-root>/<type>/<id>/<time>/
+
+The indexes contained in a snapshot reference chunks on which the deduplication
+of datastores is based.
+
+Chunks
+------
+
+A chunk is some (sometimes encrypted) data with a CRC-32 checksum at
+the end and a type marker at the beginning. It is identified by a
+SHA-256 checksum of its content.
+
+To generate such chunks, backup data is split either into fixed-size or
+dynamically-sized chunks. The same content will be hashed to the same
+checksum.
+
+The chunks of a datastore are found in
+
+ <datastore-root>/.chunks/
+
+This chunk directory is further subdivided by the first four byte of the
+chunks checksum, so the chunk with the checksum
+
+ a342e8151cbf439ce65f3df696b54c67a114982cc0aa751f2852c2f7acc19a8b
+
+lives in
+
+ <datastore-root>/.chunks/a342/
+
+This is done to reduce the number of files per directory, as having
+many files per directory can be bad for file system performance.
+
+These chunk directories('0000'-'ffff') will be preallocated when the datastore is
+created.
+
+Fixed-sized Chunks
+^^^^^^^^^^^^^^^^^^
+
+For block based backups (like VMs), fixed-sized chunks are used. The content
+(disk image), is split into chunks of the same length (typically 4 MiB).
+
+This works very well for VM images, since the file system on the guest,
+most often tries to allocate files in contiguous pieces, so new files get
+new blocks, and changing existing files changes only their own blocks.
+
+As an optimization, VMs in `Proxmox VE`_ can make use of 'dirty bitmaps',
+which can track the changed blocks of an image. Since these bitmap
+are also a representation of the image split into chunks, we have
+a direct relation between dirty blocks of the image and chunks we have
+to upload, so only touched chunks of the disk have to be uploaded for a backup.
+
+Dynamically-sized Chunks
+^^^^^^^^^^^^^^^^^^^^^^^^
+
+If does not want to backup block-based systems but file-based systems,
+using fixed-sized chunks is not a good idea, since every time a file
+would change in size, the remaining data gets shifted around and this
+would result in many chunks changing, reducing the effect of deduplication.
+
+To improve this, `Proxmox Backup`_ Server uses dynamically-sized chunks
+instead. Instead of splitting an image into fixed sizes, it generates
+a consistent file archive and uses a rolling hash over the data
+to calculate chunk boundaries. We use a variant of Buzhash which
+is a cyclic polynomial algorithm.
+
+It works by continuously calculating a hashsum while iterating over the
+data, and on certain conditions it triggers a hash boundary.
+
+Assuming that there is not a change in every file of the to system to
+be backed up, eventually the algorithm triggers the boundary on the
+same data as a previous backup, resulting in a chunk that can be reused.
+
+Encrypted Chunks
+^^^^^^^^^^^^^^^^
+
+A special case are encrypted chunks. Both fixed- and dynamically-sized
+chunks can be encrypted, and their handling is slightly different
+from normal chunks.
+
+The hash of encrypted chunks are calculated not with the actual (encrypted)
+chunk content, but with the plaintext content concatenated with
+the encryption key. This way, two chunks of the same data encrypted with
+different keys generate two different checksums and no collisions occur for
+multiple encryption keys.
+
+This is done to speed up the client part of the backup, since it only needs
+to encrypt chunks that are actually getting uploaded and chunks that exist
+already in the previous backup, do not need to be encrypted and uploaded.
+
+Caveats and Limitations
+-----------------------
+
+Notes on hash collisions
+^^^^^^^^^^^^^^^^^^^^^^^^
+Every hash has a chance to produce collisions, meaning two (or more) inputs
+generating the same hashsum. For SHA-256, this chance is negligible.
+To calculate such a collision, one can use the ideas of the 'birthday problem'
+from probability theory. For big numbers, this is actually not feasible to
+calculate with normal computers, but there is a good approximation:
+
+.. math::
+
+ p(n, d) = 1 - e^{-n^2/(2d)}
+
+Where `n` is the number of tries, and `d` is the number of possibilities.
+So for example, if we assume a large datastore of 1 PiB, and an average chunk
+size of 4 MiB, we have :math:`n = 268435456` tries, and :math:`d = 2^{256}`
+possibilities.  Using the above formula we get that the probability of a
+collision in that scenario is:
+
+.. math::
+
+ 3.1115 * 10^{-61}
+
+For context, in a lottery game of 6 of 45, the chance to correctly guess all
+6 numbers is only :math:`1.2277 * 10^{-7}`.
+
+So it is extremely unlikely that such a collision would occur by accident
+in a normal datastore.
+
+Additionally, SHA-256 is prone to length extension attacks, but since
+there is an upper limit for how big the chunk sizes are, this is not
+problem, since a potential attacker cannot arbitrarily add the content.
+
+
+File-based Backup
+^^^^^^^^^^^^^^^^^
+
+Since there is no relation between files and dynamically-sized chunks,
+we have to read all files again in every backup. Otherwise it would not be
+possible to generate a format where the original chunks can be reused.
+
+Verification of encrypted chunks
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+For encrypted chunks, only the checksum of the original (plaintext) data
+is available, making it impossible for the server (without the encryption key),
+to verify its content against it. Instead only the CRC-32 checksum gets checked.
+
+
diff --git a/docs/terminology.rst b/docs/terminology.rst
index 89ec7646..3eff9780 100644
--- a/docs/terminology.rst
+++ b/docs/terminology.rst
@@ -1,3 +1,5 @@
+.. _terminology:
+
 Terminology
 ===========
 
@@ -99,6 +101,7 @@ Backup Group
 The tuple ``<type>/<ID>`` is called a backup group. Such a group
 may contain one or more backup snapshots.
 
+.. _backup_snapshot:
 
 Backup Snapshot
 ---------------
-- 
2.20.1



_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel




^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [pbs-devel] [PATCH proxmox-backup] docs: explain some technical details about datastores/chunks
  2020-12-09 16:23 ` Lubomir Apostolov
@ 2020-12-10  7:37   ` Dominik Csapak
  2020-12-10 11:42     ` Lubomir Apostolov
  0 siblings, 1 reply; 7+ messages in thread
From: Dominik Csapak @ 2020-12-10  7:37 UTC (permalink / raw)
  To: Proxmox Backup Server development discussion, Lubomir Apostolov

On 12/9/20 5:23 PM, Lubomir Apostolov wrote:
> Hi,

Hi,

thanks for your message, your questions mean i did not
write the documentation clear enough, i try to answer here,
but i'll also incorporate that info in a v2

> 
> After reading https://bugzilla.proxmox.com/show_bug.cgi?id=3138 and your mail, I'd like to discuss the following statement :
> "we have to read all files again in every backup" for both cases - file and image based backup.

just to clarify, that sentence only relates to 'file-based backups', 
since for block based backups, we do not care about files at all,
just about the 'block image' we try back up.

> 
> The image-backup seems simpler - fixed-size chunks.
> PBS should be able to link a snapshot with it's backup, and backup only differencies with parent snapshot as for example zfs send/recv works.
> Every backup knows the chunk order and chunk size, so it can map every chunk to the original image extents.
> The snapshot differences gives extents, which PBS can map to overlapped chunks, and send only those chunks, while referencing unchanged chunks from previous backup chunks list.

while something like that *could* work, it is not how we do it.
instead of relying on storage snapshots (which may
not be available, e.g. for '.raw' files)
we simply iterate over the content of the block level image,
and create the chunks hashes. this normally means that we have
to read the whole image, but in case of pve qemu vms,
we use (as written in my patch) dirty-bitmaps
which keeps track of the changed blocks
(only those have to be hashed and backed up)

> 
> The variable-size chunks based on files needs another mapping between chunks and filenames.
> The rolling hash over the data may be linked a list containing the filenames inside, and then the snapshot diff containing files can flag the chunks to be saved.
> 
> So where's the catch ?

again, we do not rely on storage snapshots, since those may
not be available (e.g. ext4)

so, first we iterate over the filesystem/directory, from there,
we create a consistent archive format (pxar). this is something
like 'tar' but can be created in a streaming fashion (which we
need).

only over that archive, we create the chunks. so the data that
the chunker gets, has no direct relation to any files

hope that explains it better

> 
> Best regards,
> Lubomir Apostolov
> 
> -----Message d'origine-----
> De : pbs-devel [mailto:pbs-devel-bounces@lists.proxmox.com] De la part de Dominik Csapak
> Envoyé : mercredi 9 décembre 2020 16:26
> À : pbs-devel@lists.proxmox.com
> Objet : [pbs-devel] [PATCH proxmox-backup] docs: explain some technical details about datastores/chunks
> 
> adds explanations for:
> * what datastores are
> * their relation with snapshots/chunks
> * basic information about chunk directory structures
> * fixed-/dynamically-sized chunks
> * special handling of encrypted chunks
> * hash collision probability
> * limitation of file-based backups
> 
> Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
> ---
>   docs/index.rst              |   1 +
>   docs/technical-overview.rst | 152 ++++++++++++++++++++++++++++++++++++
>   docs/terminology.rst        |   3 +
>   3 files changed, 156 insertions(+)
>   create mode 100644 docs/technical-overview.rst
> 
> diff --git a/docs/index.rst b/docs/index.rst
> index fffcb4fd..f3e6bf0c 100644
> --- a/docs/index.rst
> +++ b/docs/index.rst
> @@ -33,6 +33,7 @@ in the section entitled "GNU Free Documentation License".
>      pve-integration.rst
>      pxar-tool.rst
>      sysadmin.rst
> +   technical-overview.rst
>      faq.rst
>   
>   .. raw:: latex
> diff --git a/docs/technical-overview.rst b/docs/technical-overview.rst
> new file mode 100644
> index 00000000..20f937bd
> --- /dev/null
> +++ b/docs/technical-overview.rst
> @@ -0,0 +1,152 @@
> +Technical Overview
> +==================
> +
> +.. _technical_overview:
> +
> +Datastores
> +----------
> +
> +A Datastore is the logical place where :ref:`Backup Snapshots <backup_snapshot>`
> +and their chunks are stored. Snapshots consist of a manifest, blobs,
> +dynamic- and fixed-indexes (see :ref:`terminology`), and are stored in the following directory structure:
> +
> + <datastore-root>/<type>/<id>/<time>/
> +
> +The indexes contained in a snapshot reference chunks on which the deduplication
> +of datastores is based.
> +
> +Chunks
> +------
> +
> +A chunk is some (sometimes encrypted) data with a CRC-32 checksum at
> +the end and a type marker at the beginning. It is identified by a
> +SHA-256 checksum of its content.
> +
> +To generate such chunks, backup data is split either into fixed-size or
> +dynamically-sized chunks. The same content will be hashed to the same
> +checksum.
> +
> +The chunks of a datastore are found in
> +
> + <datastore-root>/.chunks/
> +
> +This chunk directory is further subdivided by the first four byte of the
> +chunks checksum, so the chunk with the checksum
> +
> + a342e8151cbf439ce65f3df696b54c67a114982cc0aa751f2852c2f7acc19a8b
> +
> +lives in
> +
> + <datastore-root>/.chunks/a342/
> +
> +This is done to reduce the number of files per directory, as having
> +many files per directory can be bad for file system performance.
> +
> +These chunk directories('0000'-'ffff') will be preallocated when the datastore is
> +created.
> +
> +Fixed-sized Chunks
> +^^^^^^^^^^^^^^^^^^
> +
> +For block based backups (like VMs), fixed-sized chunks are used. The content
> +(disk image), is split into chunks of the same length (typically 4 MiB).
> +
> +This works very well for VM images, since the file system on the guest,
> +most often tries to allocate files in contiguous pieces, so new files get
> +new blocks, and changing existing files changes only their own blocks.
> +
> +As an optimization, VMs in `Proxmox VE`_ can make use of 'dirty bitmaps',
> +which can track the changed blocks of an image. Since these bitmap
> +are also a representation of the image split into chunks, we have
> +a direct relation between dirty blocks of the image and chunks we have
> +to upload, so only touched chunks of the disk have to be uploaded for a backup.
> +
> +Dynamically-sized Chunks
> +^^^^^^^^^^^^^^^^^^^^^^^^
> +
> +If does not want to backup block-based systems but file-based systems,
> +using fixed-sized chunks is not a good idea, since every time a file
> +would change in size, the remaining data gets shifted around and this
> +would result in many chunks changing, reducing the effect of deduplication.
> +
> +To improve this, `Proxmox Backup`_ Server uses dynamically-sized chunks
> +instead. Instead of splitting an image into fixed sizes, it generates
> +a consistent file archive and uses a rolling hash over the data
> +to calculate chunk boundaries. We use a variant of Buzhash which
> +is a cyclic polynomial algorithm.
> +
> +It works by continuously calculating a hashsum while iterating over the
> +data, and on certain conditions it triggers a hash boundary.
> +
> +Assuming that there is not a change in every file of the to system to
> +be backed up, eventually the algorithm triggers the boundary on the
> +same data as a previous backup, resulting in a chunk that can be reused.
> +
> +Encrypted Chunks
> +^^^^^^^^^^^^^^^^
> +
> +A special case are encrypted chunks. Both fixed- and dynamically-sized
> +chunks can be encrypted, and their handling is slightly different
> +from normal chunks.
> +
> +The hash of encrypted chunks are calculated not with the actual (encrypted)
> +chunk content, but with the plaintext content concatenated with
> +the encryption key. This way, two chunks of the same data encrypted with
> +different keys generate two different checksums and no collisions occur for
> +multiple encryption keys.
> +
> +This is done to speed up the client part of the backup, since it only needs
> +to encrypt chunks that are actually getting uploaded and chunks that exist
> +already in the previous backup, do not need to be encrypted and uploaded.
> +
> +Caveats and Limitations
> +-----------------------
> +
> +Notes on hash collisions
> +^^^^^^^^^^^^^^^^^^^^^^^^
> +Every hash has a chance to produce collisions, meaning two (or more) inputs
> +generating the same hashsum. For SHA-256, this chance is negligible.
> +To calculate such a collision, one can use the ideas of the 'birthday problem'
> +from probability theory. For big numbers, this is actually not feasible to
> +calculate with normal computers, but there is a good approximation:
> +
> +.. math::
> +
> + p(n, d) = 1 - e^{-n^2/(2d)}
> +
> +Where `n` is the number of tries, and `d` is the number of possibilities.
> +So for example, if we assume a large datastore of 1 PiB, and an average chunk
> +size of 4 MiB, we have :math:`n = 268435456` tries, and :math:`d = 2^{256}`
> +possibilities.  Using the above formula we get that the probability of a
> +collision in that scenario is:
> +
> +.. math::
> +
> + 3.1115 * 10^{-61}
> +
> +For context, in a lottery game of 6 of 45, the chance to correctly guess all
> +6 numbers is only :math:`1.2277 * 10^{-7}`.
> +
> +So it is extremely unlikely that such a collision would occur by accident
> +in a normal datastore.
> +
> +Additionally, SHA-256 is prone to length extension attacks, but since
> +there is an upper limit for how big the chunk sizes are, this is not
> +problem, since a potential attacker cannot arbitrarily add the content.
> +
> +
> +File-based Backup
> +^^^^^^^^^^^^^^^^^
> +
> +Since there is no relation between files and dynamically-sized chunks,
> +we have to read all files again in every backup. Otherwise it would not be
> +possible to generate a format where the original chunks can be reused.
> +
> +Verification of encrypted chunks
> +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> +
> +For encrypted chunks, only the checksum of the original (plaintext) data
> +is available, making it impossible for the server (without the encryption key),
> +to verify its content against it. Instead only the CRC-32 checksum gets checked.
> +
> +
> diff --git a/docs/terminology.rst b/docs/terminology.rst
> index 89ec7646..3eff9780 100644
> --- a/docs/terminology.rst
> +++ b/docs/terminology.rst
> @@ -1,3 +1,5 @@
> +.. _terminology:
> +
>   Terminology
>   ===========
>   
> @@ -99,6 +101,7 @@ Backup Group
>   The tuple ``<type>/<ID>`` is called a backup group. Such a group
>   may contain one or more backup snapshots.
>   
> +.. _backup_snapshot:
>   
>   Backup Snapshot
>   ---------------
> 





^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [pbs-devel] [PATCH proxmox-backup] docs: explain some technical details about datastores/chunks
  2020-12-10  7:37   ` Dominik Csapak
@ 2020-12-10 11:42     ` Lubomir Apostolov
  2020-12-10 13:37       ` Dominik Csapak
  0 siblings, 1 reply; 7+ messages in thread
From: Lubomir Apostolov @ 2020-12-10 11:42 UTC (permalink / raw)
  To: Dominik Csapak, Proxmox Backup Server development discussion

Hi,

Thank you for the reply.

> thanks for your message, your questions mean i did not
> write the documentation clear enough, i try to answer here,
> but i'll also incorporate that info in a v2

My questions are more about flaws in design than documentation.

I should also state that currently PBS is so highly inefficient that from
our point of view (but we're not alone as you can see it in the forum) 
it's unsuitable in production :
* Reading the all data every day for a backup is an overkill, which 
  leads to wasted energy and premature hardware failures.
* With big clusters of data, it can't be done on a daily basis, 
  so risk of loosing data is higher

> just to clarify, that sentence only relates to 'file-based backups', 
> since for block based backups, we do not care about files at all,
> just about the 'block image' we try back up.

Agreed.

> while something like that *could* work, it is not how we do it.

So there's no catch, PBS only took the simplest path ?

> instead of relying on storage snapshots (which may
> not be available, e.g. for '.raw' files)
> we simply iterate over the content of the block level image,
> and create the chunks hashes. this normally means that we have
> to read the whole image, but in case of pve qemu vms,
> we use (as written in my patch) dirty-bitmaps
> which keeps track of the changed blocks
> (only those have to be hashed and backed up)

If snapshots aren't available, then you have to read all the data, agreed.
But when snapshots are available, not using them is problematic as 
previously stated. Using dirty-bitmaps means you already implemented 
part of the changed blocks algorithm.

> again, we do not rely on storage snapshots, since those may
> not be available (e.g. ext4)

Same comments as above.

> so, first we iterate over the filesystem/directory, from there,
> we create a consistent archive format (pxar). this is something
> like 'tar' but can be created in a streaming fashion (which we
> need).
> only over that archive, we create the chunks. so the data that
> the chunker gets, has no direct relation to any files

When iterating through filesystem in a tar fashion, you have all the 
filenames you read so far. 
You can map data extents to files precisely.
Also I saw that you can mount pxar archives, which means you 
alread save the mapping inside.

I hope my point of view is clearer now.

Best regards,
Lubomir Apostolov

-----Message d'origine-----
De : Dominik Csapak [mailto:d.csapak@proxmox.com] 
Envoyé : jeudi 10 décembre 2020 08:38
À : Proxmox Backup Server development discussion; Lubomir Apostolov
Objet : Re: [pbs-devel] [PATCH proxmox-backup] docs: explain some technical details about datastores/chunks

On 12/9/20 5:23 PM, Lubomir Apostolov wrote:
> Hi,

Hi,

thanks for your message, your questions mean i did not
write the documentation clear enough, i try to answer here,
but i'll also incorporate that info in a v2

> 
> After reading https://bugzilla.proxmox.com/show_bug.cgi?id=3138 and your mail, I'd like to discuss the following statement :
> "we have to read all files again in every backup" for both cases - file and image based backup.

just to clarify, that sentence only relates to 'file-based backups', 
since for block based backups, we do not care about files at all,
just about the 'block image' we try back up.


> 
> The image-backup seems simpler - fixed-size chunks.
> PBS should be able to link a snapshot with it's backup, and backup only differencies with parent snapshot as for example zfs send/recv works.
> Every backup knows the chunk order and chunk size, so it can map every chunk to the original image extents.
> The snapshot differences gives extents, which PBS can map to overlapped chunks, and send only those chunks, while referencing unchanged chunks from previous backup chunks list.

while something like that *could* work, it is not how we do it.
instead of relying on storage snapshots (which may
not be available, e.g. for '.raw' files)
we simply iterate over the content of the block level image,
and create the chunks hashes. this normally means that we have
to read the whole image, but in case of pve qemu vms,
we use (as written in my patch) dirty-bitmaps
which keeps track of the changed blocks
(only those have to be hashed and backed up)

> 
> The variable-size chunks based on files needs another mapping between chunks and filenames.
> The rolling hash over the data may be linked a list containing the filenames inside, and then the snapshot diff containing files can flag the chunks to be saved.
> 
> So where's the catch ?

again, we do not rely on storage snapshots, since those may
not be available (e.g. ext4)

so, first we iterate over the filesystem/directory, from there,
we create a consistent archive format (pxar). this is something
like 'tar' but can be created in a streaming fashion (which we
need).

only over that archive, we create the chunks. so the data that
the chunker gets, has no direct relation to any files

hope that explains it better

> 
> Best regards,
> Lubomir Apostolov
> 
> -----Message d'origine-----
> De : pbs-devel [mailto:pbs-devel-bounces@lists.proxmox.com] De la part de Dominik Csapak
> Envoyé : mercredi 9 décembre 2020 16:26
> À : pbs-devel@lists.proxmox.com
> Objet : [pbs-devel] [PATCH proxmox-backup] docs: explain some technical details about datastores/chunks
> 
> adds explanations for:
> * what datastores are
> * their relation with snapshots/chunks
> * basic information about chunk directory structures
> * fixed-/dynamically-sized chunks
> * special handling of encrypted chunks
> * hash collision probability
> * limitation of file-based backups
> 
> Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
> ---
>   docs/index.rst              |   1 +
>   docs/technical-overview.rst | 152 ++++++++++++++++++++++++++++++++++++
>   docs/terminology.rst        |   3 +
>   3 files changed, 156 insertions(+)
>   create mode 100644 docs/technical-overview.rst
> 
> diff --git a/docs/index.rst b/docs/index.rst
> index fffcb4fd..f3e6bf0c 100644
> --- a/docs/index.rst
> +++ b/docs/index.rst
> @@ -33,6 +33,7 @@ in the section entitled "GNU Free Documentation License".
>      pve-integration.rst
>      pxar-tool.rst
>      sysadmin.rst
> +   technical-overview.rst
>      faq.rst
>   
>   .. raw:: latex
> diff --git a/docs/technical-overview.rst b/docs/technical-overview.rst
> new file mode 100644
> index 00000000..20f937bd
> --- /dev/null
> +++ b/docs/technical-overview.rst
> @@ -0,0 +1,152 @@
> +Technical Overview
> +==================
> +
> +.. _technical_overview:
> +
> +Datastores
> +----------
> +
> +A Datastore is the logical place where :ref:`Backup Snapshots <backup_snapshot>`
> +and their chunks are stored. Snapshots consist of a manifest, blobs,
> +dynamic- and fixed-indexes (see :ref:`terminology`), and are stored in the following directory structure:
> +
> + <datastore-root>/<type>/<id>/<time>/
> +
> +The indexes contained in a snapshot reference chunks on which the deduplication
> +of datastores is based.
> +
> +Chunks
> +------
> +
> +A chunk is some (sometimes encrypted) data with a CRC-32 checksum at
> +the end and a type marker at the beginning. It is identified by a
> +SHA-256 checksum of its content.
> +
> +To generate such chunks, backup data is split either into fixed-size or
> +dynamically-sized chunks. The same content will be hashed to the same
> +checksum.
> +
> +The chunks of a datastore are found in
> +
> + <datastore-root>/.chunks/
> +
> +This chunk directory is further subdivided by the first four byte of the
> +chunks checksum, so the chunk with the checksum
> +
> + a342e8151cbf439ce65f3df696b54c67a114982cc0aa751f2852c2f7acc19a8b
> +
> +lives in
> +
> + <datastore-root>/.chunks/a342/
> +
> +This is done to reduce the number of files per directory, as having
> +many files per directory can be bad for file system performance.
> +
> +These chunk directories('0000'-'ffff') will be preallocated when the datastore is
> +created.
> +
> +Fixed-sized Chunks
> +^^^^^^^^^^^^^^^^^^
> +
> +For block based backups (like VMs), fixed-sized chunks are used. The content
> +(disk image), is split into chunks of the same length (typically 4 MiB).
> +
> +This works very well for VM images, since the file system on the guest,
> +most often tries to allocate files in contiguous pieces, so new files get
> +new blocks, and changing existing files changes only their own blocks.
> +
> +As an optimization, VMs in `Proxmox VE`_ can make use of 'dirty bitmaps',
> +which can track the changed blocks of an image. Since these bitmap
> +are also a representation of the image split into chunks, we have
> +a direct relation between dirty blocks of the image and chunks we have
> +to upload, so only touched chunks of the disk have to be uploaded for a backup.
> +
> +Dynamically-sized Chunks
> +^^^^^^^^^^^^^^^^^^^^^^^^
> +
> +If does not want to backup block-based systems but file-based systems,
> +using fixed-sized chunks is not a good idea, since every time a file
> +would change in size, the remaining data gets shifted around and this
> +would result in many chunks changing, reducing the effect of deduplication.
> +
> +To improve this, `Proxmox Backup`_ Server uses dynamically-sized chunks
> +instead. Instead of splitting an image into fixed sizes, it generates
> +a consistent file archive and uses a rolling hash over the data
> +to calculate chunk boundaries. We use a variant of Buzhash which
> +is a cyclic polynomial algorithm.
> +
> +It works by continuously calculating a hashsum while iterating over the
> +data, and on certain conditions it triggers a hash boundary.
> +
> +Assuming that there is not a change in every file of the to system to
> +be backed up, eventually the algorithm triggers the boundary on the
> +same data as a previous backup, resulting in a chunk that can be reused.
> +
> +Encrypted Chunks
> +^^^^^^^^^^^^^^^^
> +
> +A special case are encrypted chunks. Both fixed- and dynamically-sized
> +chunks can be encrypted, and their handling is slightly different
> +from normal chunks.
> +
> +The hash of encrypted chunks are calculated not with the actual (encrypted)
> +chunk content, but with the plaintext content concatenated with
> +the encryption key. This way, two chunks of the same data encrypted with
> +different keys generate two different checksums and no collisions occur for
> +multiple encryption keys.
> +
> +This is done to speed up the client part of the backup, since it only needs
> +to encrypt chunks that are actually getting uploaded and chunks that exist
> +already in the previous backup, do not need to be encrypted and uploaded.
> +
> +Caveats and Limitations
> +-----------------------
> +
> +Notes on hash collisions
> +^^^^^^^^^^^^^^^^^^^^^^^^
> +Every hash has a chance to produce collisions, meaning two (or more) inputs
> +generating the same hashsum. For SHA-256, this chance is negligible.
> +To calculate such a collision, one can use the ideas of the 'birthday problem'
> +from probability theory. For big numbers, this is actually not feasible to
> +calculate with normal computers, but there is a good approximation:
> +
> +.. math::
> +
> + p(n, d) = 1 - e^{-n^2/(2d)}
> +
> +Where `n` is the number of tries, and `d` is the number of possibilities.
> +So for example, if we assume a large datastore of 1 PiB, and an average chunk
> +size of 4 MiB, we have :math:`n = 268435456` tries, and :math:`d = 2^{256}`
> +possibilities.  Using the above formula we get that the probability of a
> +collision in that scenario is:
> +
> +.. math::
> +
> + 3.1115 * 10^{-61}
> +
> +For context, in a lottery game of 6 of 45, the chance to correctly guess all
> +6 numbers is only :math:`1.2277 * 10^{-7}`.
> +
> +So it is extremely unlikely that such a collision would occur by accident
> +in a normal datastore.
> +
> +Additionally, SHA-256 is prone to length extension attacks, but since
> +there is an upper limit for how big the chunk sizes are, this is not
> +problem, since a potential attacker cannot arbitrarily add the content.
> +
> +
> +File-based Backup
> +^^^^^^^^^^^^^^^^^
> +
> +Since there is no relation between files and dynamically-sized chunks,
> +we have to read all files again in every backup. Otherwise it would not be
> +possible to generate a format where the original chunks can be reused.
> +
> +Verification of encrypted chunks
> +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> +
> +For encrypted chunks, only the checksum of the original (plaintext) data
> +is available, making it impossible for the server (without the encryption key),
> +to verify its content against it. Instead only the CRC-32 checksum gets checked.
> +
> +
> diff --git a/docs/terminology.rst b/docs/terminology.rst
> index 89ec7646..3eff9780 100644
> --- a/docs/terminology.rst
> +++ b/docs/terminology.rst
> @@ -1,3 +1,5 @@
> +.. _terminology:
> +
>   Terminology
>   ===========
>   
> @@ -99,6 +101,7 @@ Backup Group
>   The tuple ``<type>/<ID>`` is called a backup group. Such a group
>   may contain one or more backup snapshots.
>   
> +.. _backup_snapshot:
>   
>   Backup Snapshot
>   ---------------
> 



^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [pbs-devel] [PATCH proxmox-backup] docs: explain some technical details about datastores/chunks
  2020-12-10 11:42     ` Lubomir Apostolov
@ 2020-12-10 13:37       ` Dominik Csapak
  2020-12-10 13:52         ` Dominik Csapak
  0 siblings, 1 reply; 7+ messages in thread
From: Dominik Csapak @ 2020-12-10 13:37 UTC (permalink / raw)
  To: Lubomir Apostolov, Proxmox Backup Server development discussion

On 12/10/20 12:42 PM, Lubomir Apostolov wrote:
> Hi,

hi,

> 
> Thank you for the reply.
> 
>> thanks for your message, your questions mean i did not
>> write the documentation clear enough, i try to answer here,
>> but i'll also incorporate that info in a v2
> 
> My questions are more about flaws in design than documentation.

well, all designs come with tradeoffs, i'll try to explain our choices
(and also add that to the docs in the future maybe)

> 
> I should also state that currently PBS is so highly inefficient that from
> our point of view (but we're not alone as you can see it in the forum)
> it's unsuitable in production :
> * Reading the all data every day for a backup is an overkill, which
>    leads to wasted energy and premature hardware failures.

but the only way to get 100% consistent backups
e.g. if an application sets the mtime/ctime
back to the original date, restic/borg would not
backup that file, even when the data changes

> * With big clusters of data, it can't be done on a daily basis,
>    so risk of loosing data is higher

if you have much data that changes every day, no other
backup solution will bring much benefit, as they also have
to read that data

if you have much data that does not change every day,
why do you want to back it up every day?

if your guest has a mixed use-case (some files change every day,
some files only change rarely) you can split them
into different directories and backup them separately

if you really need fast complete backups for big guests
that you do often, vms are the way to go

> 
>> just to clarify, that sentence only relates to 'file-based backups',
>> since for block based backups, we do not care about files at all,
>> just about the 'block image' we try back up.
> 
> Agreed.
> 
>> while something like that *could* work, it is not how we do it.
> 
> So there's no catch, PBS only took the simplest path ?

*could* as in theoretically, if all parts of the software
play well. for example, you wrote about snapshot diffs

those do not exists for all storages, or are not
easiliy accessible. writing backup for every specific
storage technology is really not sensible.

also you now would have to leave the snapshots on your source
storage intact, and in the case of zfs for example, you cannot
rollback without deleting them. in that case you'd have
to read the data to get the diff anyway..

if you want backups based on zfs, you can of course use
zfs send/receive (or pves replication) which is a perfectly
valid choice.

> 
>> instead of relying on storage snapshots (which may
>> not be available, e.g. for '.raw' files)
>> we simply iterate over the content of the block level image,
>> and create the chunks hashes. this normally means that we have
>> to read the whole image, but in case of pve qemu vms,
>> we use (as written in my patch) dirty-bitmaps
>> which keeps track of the changed blocks
>> (only those have to be hashed and backed up)
> 
> If snapshots aren't available, then you have to read all the data, agreed.
> But when snapshots are available, not using them is problematic as
> previously stated. Using dirty-bitmaps means you already implemented
> part of the changed blocks algorithm.

since with qemu backups we already use the dirty-bitmap,
we already only read what changed, or am i missing something here?

> 
>> again, we do not rely on storage snapshots, since those may
>> not be available (e.g. ext4)
> 
> Same comments as above.
> 
>> so, first we iterate over the filesystem/directory, from there,
>> we create a consistent archive format (pxar). this is something
>> like 'tar' but can be created in a streaming fashion (which we
>> need).
>> only over that archive, we create the chunks. so the data that
>> the chunker gets, has no direct relation to any files
> 
> When iterating through filesystem in a tar fashion, you have all the
> filenames you read so far.
> You can map data extents to files precisely.

i do not completely understand what you mean here, but
afaiu, you mean the chunks <-> file mapping?

if yes, then that could work but that comes with *very* big drawbacks: 
chunk size/speed/deduplication

if you have many small chunks, read and write speed will be
absolutely horrible

the readme in proxmox-backup[0] has some calculations of that
but the gist is that for big chunk sizes (we average here to about
4Mb/chunk, which we get with
the archive format and iterating over that) we get
speeds several magnitudes faster than with a small average
chunk size (e.g. 1kb)

if you would combine the files into chunks, deduplication would
suffer greatly, since the changed files gets combined
in a new way for every backup, in contrast if we combine it
consistently across backups, we can reuse many chunks

also the backups are not incremential or differential, but
every snapshot references the complete data of the source
meaning that you can delete every backup snapshot impacting
any other backups. this would not be easily possible
if we would introduce e.g. incremental backups
(since those would depend on a 'full' backup somehow)

also dynamically merging incremental backups while doing
so, costs in backup time and cpu power and is not trivial
to implement in a safe and efficient way

> Also I saw that you can mount pxar archives, which means you
> alread save the mapping inside.
> 

this has nothing to do with backing up really

when mapping, we have the complete list of chunks that
are referenced, and we dynamically look them up if we
need them, but that does not give us any file <=> relation

a simple example:

we have some pxar archive:

|start|file1_metadata|file1_data|file2_metadata|....|end|

now a dynamic chunk can begin anywhere not only on the entry boundaries
for example:

chunk1:
|start|file1_meta

chunk2:
data|file1_data|fil

chunk3:
e2_metadata|file2_

...

chunkX:
_data|end|

so if file1 gets added one byte it could look like

chunk1*:
|start|file1*_meta

chunk2*:
data|file1*_data|fil

chunk3:
e2_metadat|file2_
..

all chunks from chunk3 to chunkX stay the same
and only 2 new chunks get added

(ofc this is a very oversimplified example)

> I hope my point of view is clearer now.

i get where you are coming from, and i hope
i could explain the rationale behind the
design well enough

> 
> Best regards,
> Lubomir Apostolov
> 

kind regards,
Dominik

0: 
https://git.proxmox.com/?p=proxmox-backup.git;a=blob;f=README.rst;h=5c42ad10f58d259606b643df542e2664b08009fe;hb=refs/heads/master




^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [pbs-devel] [PATCH proxmox-backup] docs: explain some technical details about datastores/chunks
  2020-12-10 13:37       ` Dominik Csapak
@ 2020-12-10 13:52         ` Dominik Csapak
  0 siblings, 0 replies; 7+ messages in thread
From: Dominik Csapak @ 2020-12-10 13:52 UTC (permalink / raw)
  To: Lubomir Apostolov, Proxmox Backup Server development discussion

[snip]
> 
> also the backups are not incremential or differential, but
> every snapshot references the complete data of the source
> meaning that you can delete every backup snapshot impacting

ofc i mean 'without impacting' here....




^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2020-12-10 13:52 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-12-09 15:25 [pbs-devel] [PATCH proxmox-backup] docs: explain some technical details about datastores/chunks Dominik Csapak
2020-12-09 15:56 ` Oguz Bektas
2020-12-09 16:23 ` Lubomir Apostolov
2020-12-10  7:37   ` Dominik Csapak
2020-12-10 11:42     ` Lubomir Apostolov
2020-12-10 13:37       ` Dominik Csapak
2020-12-10 13:52         ` Dominik Csapak

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox
Service provided by Proxmox Server Solutions GmbH | Privacy | Legal