From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <d.csapak@proxmox.com>
Received: from firstgate.proxmox.com (firstgate.proxmox.com [212.224.123.68])
 (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
 key-exchange X25519 server-signature RSA-PSS (2048 bits))
 (No client certificate requested)
 by lists.proxmox.com (Postfix) with UTF8SMTPS id AA2786B101
 for <pbs-devel@lists.proxmox.com>; Thu, 10 Dec 2020 14:37:46 +0100 (CET)
Received: from firstgate.proxmox.com (localhost [127.0.0.1])
 by firstgate.proxmox.com (Proxmox) with UTF8SMTP id 98E0615AE3
 for <pbs-devel@lists.proxmox.com>; Thu, 10 Dec 2020 14:37:16 +0100 (CET)
Received: from proxmox-new.maurer-it.com (proxmox-new.maurer-it.com
 [212.186.127.180])
 (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
 key-exchange X25519 server-signature RSA-PSS (2048 bits))
 (No client certificate requested)
 by firstgate.proxmox.com (Proxmox) with UTF8SMTPS id 9DD1615AD4
 for <pbs-devel@lists.proxmox.com>; Thu, 10 Dec 2020 14:37:14 +0100 (CET)
Received: from proxmox-new.maurer-it.com (localhost.localdomain [127.0.0.1])
 by proxmox-new.maurer-it.com (Proxmox) with UTF8SMTP id 643A444F71;
 Thu, 10 Dec 2020 14:37:14 +0100 (CET)
To: Lubomir Apostolov <Lubomir.Apostolov@directique.com>,
 Proxmox Backup Server development discussion <pbs-devel@lists.proxmox.com>
References: <20201209152553.8752-1-d.csapak@proxmox.com>
 <718D0AF11703FA4C85B0535448A05610038195BAB5@SCOM4.directique.net>
 <8a9b4fcc-cbaf-02ec-d0fa-e9ea396a3463@proxmox.com>
 <718D0AF11703FA4C85B0535448A05610038195BD47@SCOM4.directique.net>
From: Dominik Csapak <d.csapak@proxmox.com>
Message-ID: <2a5b802e-e5f3-59d9-cda2-f94ffe176dde@proxmox.com>
Date: Thu, 10 Dec 2020 14:37:12 +0100
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:84.0) Gecko/20100101
 Thunderbird/84.0
MIME-Version: 1.0
In-Reply-To: <718D0AF11703FA4C85B0535448A05610038195BD47@SCOM4.directique.net>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Language: en-US
Content-Transfer-Encoding: 7bit
X-SPAM-LEVEL: Spam detection results:  0
 AWL 0.290 Adjusted score from AWL reputation of From: address
 KAM_DMARC_STATUS 0.01 Test Rule for DKIM or SPF Failure with Strict Alignment
 NICE_REPLY_A           -0.001 Looks like a legit reply (A)
 RCVD_IN_DNSWL_MED        -2.3 Sender listed at https://www.dnswl.org/,
 medium trust
 SPF_HELO_NONE           0.001 SPF: HELO does not publish an SPF Record
 SPF_PASS               -0.001 SPF: sender matches SPF record
Subject: Re: [pbs-devel] [PATCH proxmox-backup] docs: explain some technical
 details about datastores/chunks
X-BeenThere: pbs-devel@lists.proxmox.com
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Proxmox Backup Server development discussion
 <pbs-devel.lists.proxmox.com>
List-Unsubscribe: <https://lists.proxmox.com/cgi-bin/mailman/options/pbs-devel>, 
 <mailto:pbs-devel-request@lists.proxmox.com?subject=unsubscribe>
List-Archive: <http://lists.proxmox.com/pipermail/pbs-devel/>
List-Post: <mailto:pbs-devel@lists.proxmox.com>
List-Help: <mailto:pbs-devel-request@lists.proxmox.com?subject=help>
List-Subscribe: <https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel>, 
 <mailto:pbs-devel-request@lists.proxmox.com?subject=subscribe>
X-List-Received-Date: Thu, 10 Dec 2020 13:37:46 -0000

On 12/10/20 12:42 PM, Lubomir Apostolov wrote:
> Hi,

hi,

> 
> Thank you for the reply.
> 
>> thanks for your message, your questions mean i did not
>> write the documentation clear enough, i try to answer here,
>> but i'll also incorporate that info in a v2
> 
> My questions are more about flaws in design than documentation.

well, all designs come with tradeoffs, i'll try to explain our choices
(and maybe also add that to the docs in the future)

> 
> I should also state that currently PBS is so highly inefficient that from
> our point of view (but we're not alone as you can see it in the forum)
> it's unsuitable in production :
> * Reading the all data every day for a backup is an overkill, which
>    leads to wasted energy and premature hardware failures.

but reading all the data is the only way to get 100% consistent
backups. e.g. if an application sets the mtime/ctime back to the
original date, restic/borg would not back up that file, even when
the data changed
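
to illustrate (just a sketch i made up, not actual restic/borg
code): a metadata-only check looks roughly like this, and it skips
the file as long as mtime and size still match the previous run,
no matter what happened to the content

use std::fs;
use std::time::SystemTime;

// what a previous backup run remembered about the file
struct PrevEntry {
    mtime: SystemTime,
    size: u64,
}

// metadata-only change detection: cheap, but fooled by anything
// that restores the original mtime after changing the content
fn looks_unchanged(path: &str, prev: &PrevEntry) -> std::io::Result<bool> {
    let meta = fs::metadata(path)?;
    Ok(meta.modified()? == prev.mtime && meta.len() == prev.size)
}

reading (and hashing) the actual data is the only check that cannot
be fooled this way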

> * With big clusters of data, it can't be done on a daily basis,
>    so risk of loosing data is higher

if you have a lot of data that changes every day, other
backup solutions will not bring much benefit either, as they
also have to read that data

if you have a lot of data that does not change every day,
why do you want to back it up every day?

if your guest has a mixed use-case (some files change every day,
some files change only rarely) you can split them
into different directories and back them up separately

if you really need fast, complete backups of big guests
that you do often, vms are the way to go

> 
>> just to clarify, that sentence only relates to 'file-based backups',
>> since for block based backups, we do not care about files at all,
>> just about the 'block image' we try back up.
> 
> Agreed.
> 
>> while something like that *could* work, it is not how we do it.
> 
> So there's no catch, PBS only took the simplest path ?

*could* as in theoretically, if all parts of the software
play well together. for example, you wrote about snapshot diffs

those do not exist for all storages, or are not
easily accessible. writing backup code for every specific
storage technology is really not sensible.

also, you would now have to leave the snapshots on your source
storage intact, and in the case of zfs for example, you cannot
roll back without deleting them. in that case you'd have
to read the data to get the diff anyway...

if you want backups based on zfs, you can of course use
zfs send/receive (or pve's replication), which is a perfectly
valid choice.

> 
>> instead of relying on storage snapshots (which may
>> not be available, e.g. for '.raw' files)
>> we simply iterate over the content of the block level image,
>> and create the chunks hashes. this normally means that we have
>> to read the whole image, but in case of pve qemu vms,
>> we use (as written in my patch) dirty-bitmaps
>> which keeps track of the changed blocks
>> (only those have to be hashed and backed up)
> 
> If snapshots aren't available, then you have to read all the data, agreed.
> But when snapshots are available, not using them is problematic as
> previously stated. Using dirty-bitmaps means you already implemented
> part of the changed blocks algorithm.

since we already use the dirty-bitmap for qemu backups,
we only read what changed, or am i missing something here?
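
just to make that concrete, a minimal sketch of the idea (made-up
names, not the actual qemu/pbs api): the bitmap flags which
fixed-size clusters changed since the last backup, only those get
read, hashed and uploaded, and for the clean ones the chunks of
the previous backup are simply reused

// one flag per cluster comes from the dirty bitmap
const CLUSTER_SIZE: u64 = 4 * 1024 * 1024; // 4 MiB fixed-size chunks

fn backup_dirty_clusters(
    dirty: &[bool],                           // the bitmap
    read_cluster: impl Fn(u64) -> Vec<u8>,    // read one cluster at a byte offset
    hash_chunk: impl Fn(&[u8]) -> [u8; 32],   // e.g. sha256
    upload_chunk: impl Fn([u8; 32], Vec<u8>), // store (digest, data) if not known yet
) {
    for (idx, is_dirty) in dirty.iter().enumerate() {
        if !*is_dirty {
            continue; // clean cluster: the chunk from the last backup is reused
        }
        let data = read_cluster(idx as u64 * CLUSTER_SIZE);
        let digest = hash_chunk(&data);
        upload_chunk(digest, data);
    }
}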

> 
>> again, we do not rely on storage snapshots, since those may
>> not be available (e.g. ext4)
> 
> Same comments as above.
> 
>> so, first we iterate over the filesystem/directory, from there,
>> we create a consistent archive format (pxar). this is something
>> like 'tar' but can be created in a streaming fashion (which we
>> need).
>> only over that archive, we create the chunks. so the data that
>> the chunker gets, has no direct relation to any files
> 
> When iterating through filesystem in a tar fashion, you have all the
> filenames you read so far.
> You can map data extents to files precisely.

i do not completely understand what you mean here, but
afaiu you mean the chunk <-> file mapping?

if yes, that could work, but it comes with *very* big drawbacks
regarding chunk size, speed and deduplication

if you have many small chunks, read and write speed will be
absolutely horrible

the readme in proxmox-backup[0] has some calculations on that,
but the gist is that with a big average chunk size (we land at
about 4 MiB/chunk here, which we get from the archive format and
iterating over it) we get speeds several orders of magnitude
faster than with a small average chunk size (e.g. 1 KiB)
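
as a rough back-of-the-envelope illustration (my numbers here, the
readme has the detailed calculation):

100 GiB image / 4 MiB per chunk  ->  ~25,600 chunks
100 GiB image / 1 KiB per chunk  ->  ~105,000,000 chunks

every chunk costs a hash computation, an index entry and a file on
the datastore, so roughly 4000 times more chunks means roughly 4000
times more of that per-chunk overhead, which kills throughput long
before the raw disk bandwidth matters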

if you would combine the files into chunks, deduplication would
suffer greatly, since the changed files get combined
in a new way for every backup; in contrast, if we combine them
consistently across backups, we can reuse many chunks

also, the backups are not incremental or differential:
every snapshot references the complete data of the source,
meaning that you can delete any backup snapshot without impacting
any other backup. this would not be easily possible
if we introduced e.g. incremental backups
(since those would depend on a 'full' backup somehow)
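
in a simplified model (a sketch i made up, the real index files
store more than this), a snapshot is just its own list of chunk
digests, and garbage collection only ever removes chunks that no
remaining snapshot references, which is why deleting one snapshot
never touches another:

use std::collections::HashSet;

type Digest = [u8; 32];

// a snapshot is modeled as nothing but the chunk digests it references
struct Snapshot {
    chunks: Vec<Digest>,
}

// a chunk may only be removed if *no* remaining snapshot references it,
// so removing any single snapshot is always safe for the others
fn unreferenced_chunks(all_chunks: &[Digest], snapshots: &[Snapshot]) -> Vec<Digest> {
    let referenced: HashSet<&Digest> =
        snapshots.iter().flat_map(|s| s.chunks.iter()).collect();
    all_chunks
        .iter()
        .filter(|d| !referenced.contains(d))
        .copied()
        .collect()
}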

also, dynamically merging incremental backups when snapshots
get removed costs backup time and cpu power, and is not trivial
to implement in a safe and efficient way

> Also I saw that you can mount pxar archives, which means you
> alread save the mapping inside.
> 

this has nothing to do with backing up really

when mounting, we have the complete list of chunks that
are referenced, and we dynamically look them up if we
need them, but that does not give us any file <=> chunk relation
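
to show what 'look them up dynamically' means, a simplified sketch
(made-up types, not the real index format): for every chunk the
index stores the archive offset where that chunk ends plus its
digest, so finding the chunk for a given archive offset is just a
binary search, and files never enter the picture

// one entry per chunk in the (dynamic) index
struct IndexEntry {
    end: u64,         // archive offset just past this chunk's data
    digest: [u8; 32], // which chunk on the datastore holds that data
}

// find the chunk containing the byte at `offset` in the archive;
// entries are sorted by `end`, so this is a plain binary search
fn chunk_for_offset(index: &[IndexEntry], offset: u64) -> Option<&IndexEntry> {
    let idx = index.partition_point(|e| e.end <= offset);
    index.get(idx)
}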

a simple example of how the chunks relate to the archive:

we have some pxar archive:

|start|file1_metadata|file1_data|file2_metadata|....|end|

now a dynamic chunk can begin anywhere, not only on the entry boundaries
for example:

chunk1:
|start|file1_meta

chunk2:
data|file1_data|fil

chunk3:
e2_metadata|file2_

...

chunkX:
_data|end|

so if one byte gets added to file1, it could look like this:

chunk1*:
|start|file1*_meta

chunk2*:
data|file1*_data|fil

chunk3:
e2_metadata|file2_
...

all chunks from chunk3 to chunkX stay the same
and only 2 new chunks get added

(ofc this is a very oversimplified example)
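
the reason the later chunks survive is that the boundaries are
content-defined: a rolling hash over a small sliding window decides
where a chunk ends, so a boundary depends only on the bytes around
it. after the insertion all later boundaries shift by one byte
together with the data, so the chunk contents, and therefore their
digests, stay identical. a generic toy chunker (not the exact
algorithm pbs uses, which among other things also enforces min/max
chunk sizes) could look roughly like this:

// toy content-defined chunker: cut whenever the low bits of a
// rolling hash over the last WINDOW bytes hit a fixed pattern
const WINDOW: usize = 64;
const MASK: u64 = (1 << 22) - 1; // ~4 MiB average chunk size

fn chunk_boundaries(data: &[u8]) -> Vec<usize> {
    let pow_out: u64 = 31u64.wrapping_pow(WINDOW as u32); // weight of the byte leaving the window
    let mut boundaries = Vec::new();
    let mut hash: u64 = 0;
    for (i, &byte) in data.iter().enumerate() {
        // polynomial rolling hash: mix in the new byte, drop the oldest one
        hash = hash.wrapping_mul(31).wrapping_add(byte as u64);
        if i >= WINDOW {
            hash = hash.wrapping_sub((data[i - WINDOW] as u64).wrapping_mul(pow_out));
        }
        // the cut decision only looks at the last WINDOW bytes,
        // never at anything earlier in the stream
        if i >= WINDOW && hash & MASK == 0 {
            boundaries.push(i + 1);
        }
    }
    boundaries.push(data.len()); // the final (partial) chunk
    boundaries
}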

> I hope my point of view is clearer now.

i get where you are coming from, and i hope
i could explain the rationale behind the
design well enough

> 
> Best regards,
> Lubomir Apostolov
> 

kind regards,
Dominik

0: 
https://git.proxmox.com/?p=proxmox-backup.git;a=blob;f=README.rst;h=5c42ad10f58d259606b643df542e2664b08009fe;hb=refs/heads/master