From: Dominik Csapak
To: pve-devel@lists.proxmox.com
Date: Fri, 20 Nov 2020 09:27:27 +0100
Subject: Re: [pve-devel] Improve container backup speed dramatically (factor 100-1000)

hi,

it seems there are some misunderstandings about how the backup actually
works, i'll try to clear that up

On 11/20/20 8:18 AM, Carsten Härle wrote:
>>> Yes, that is how the current variable sized chunking algorithm works.
>>> ...
>>> "zfs diff" does not provide the information needed for our
>>> deduplication algorithm, so we cannot use that.
>
> 1) Can you please outline the algorithm?

we have 2 different chunking methods:

* fixed-size chunks
* dynamic-size chunks

fixed-size chunks, as the name implies, have a predefined, fixed size
(e.g. 4M). in vm backups we split the disk image into such blocks and
calculate a hash over each of them. this works well in that case, since
filesystems on a disk tend not to move data around: if you change a
byte in a file, the one chunk containing it will be different, but the
rest will stay the same.

for dynamic-sized chunks, we calculate a 'rolling hash'[0] over a
window on the data, and under certain circumstances a chunk boundary
is triggered, generating a chunk.

neither of those chunking methods has any awareness of, or reference
to, files.

for container backups we use this in the following way: while iterating
over the filesystem/directories, we generate a so-called 'pxar'
archive, a streaming format that contains metadata+data for a directory
structure. while generating this data stream, we use the dynamic chunk
algorithm to generate chunks on that stream.

this works well here, since if you modify/add a byte in a file, all
remaining data in the stream gets shifted over, but the rolling hash
will, with a high probability, soon find a boundary it has found
before, and the remaining chunks will be the same again.
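to illustrate the principle, here is a simplified sketch in rust (to be
clear: this is *not* our actual chunker, which iirc is a buzhash
variant with minimum/maximum chunk sizes derived from casync; the
function name and the constants below are made up for the example):

const WINDOW: usize = 64;           // rolling hash window in bytes
const BASE: u64 = 0x0100_0000_01b3; // arbitrary odd multiplier
const MASK: u64 = (1 << 21) - 1;    // boundary ~ every 2 MiB on average

// returns the stream offsets at which chunk boundaries trigger
fn chunk_boundaries(data: &[u8]) -> Vec<usize> {
    // BASE^WINDOW (wrapping), needed to drop the byte leaving the window
    let base_pow = (0..WINDOW).fold(1u64, |acc, _| acc.wrapping_mul(BASE));

    let mut boundaries = Vec::new();
    let mut hash = 0u64;

    for (i, &byte) in data.iter().enumerate() {
        // slide the window: take in the new byte ...
        hash = hash.wrapping_mul(BASE).wrapping_add(byte as u64);
        // ... and drop the byte that falls out of the window
        if i >= WINDOW {
            hash = hash
                .wrapping_sub((data[i - WINDOW] as u64).wrapping_mul(base_pow));
        }
        // the hash only depends on the last WINDOW bytes, so the same
        // content produces the same boundaries even after being shifted
        if i >= WINDOW && (hash & MASK) == 0 {
            boundaries.push(i + 1);
        }
    }
    boundaries
}

the important property is the last comment: because the boundary
decision only looks at a small window of content, inserting a byte
early in the stream only changes the chunks around the insertion; the
boundaries (and thus the chunks) after it re-align and deduplicate
against the previous backup.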
> 2) Why do you think it is not possible to use the change information
> of the file system?

1. we would like to avoid making backup features dependent on the
underlying storage

2. even if we had that data, we'd have to completely read the stream of
the previous backup to insert the changes at the right positions and
generate a pxar stream that can be chunked. but then we have read the
whole tree again, only this time from the backup server over the
network (probably slower than the local fs) and possibly had to
decrypt it (not necessary when reading again from the local fs)

so with the current pxar+dynamic chunking, this is really not feasible

what could be possible (but is much work) is to create a new
archive+chunking method where the relation files<->chunks is a bit
more relevant, but i'd guess this would blow up our index file size
(if you have a million small files, you'd now have a million more
chunks to reference, whereas before there would be fewer but bigger
chunks that combined that data)

> 3) Why does differential backup work with VMs?

for vms we can have a 'dirty bitmap' which tracks which fixed-size
blocks were written to. since we split the disk image into chunks of
the same size for the backup, there is a 1-to-1 mapping between blocks
written to and chunks we have to back up.

i hope this makes it clearer, if you have any questions, ideas, etc.
feel free to ask

later today/next week, i'll take the time to write what i have written
above into the documentation, so that we have a single point of
reference we can point to in the future

kind regards
Dominik

[0] https://en.wikipedia.org/wiki/Rolling_hash
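P.S.: the 1-to-1 mapping from 3) is simple enough to sketch too (again
just hypothetical rust, assuming the bitmap granularity equals the
chunk size):

const CHUNK_SIZE: u64 = 4 * 1024 * 1024; // 4M blocks and chunks

// with equal block and chunk size, the dirty bitmap directly tells us
// which chunks have to be re-read and uploaded; every clean chunk can
// be referenced from the previous backup without touching the disk
fn chunks_to_upload(dirty_bitmap: &[bool]) -> Vec<u64> {
    dirty_bitmap
        .iter()
        .enumerate()
        .filter(|(_, &dirty)| dirty)
        // chunk i covers bytes [i * CHUNK_SIZE, (i + 1) * CHUNK_SIZE)
        .map(|(i, _)| i as u64 * CHUNK_SIZE)
        .collect()
}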