From: Dominik Csapak
To: Proxmox Backup Server development discussion, Lubomir Apostolov
Date: Thu, 10 Dec 2020 08:37:39 +0100
Subject: Re: [pbs-devel] [PATCH proxmox-backup] docs: explain some technical details about datastores/chunks
Message-ID: <8a9b4fcc-cbaf-02ec-d0fa-e9ea396a3463@proxmox.com>
In-Reply-To: <718D0AF11703FA4C85B0535448A05610038195BAB5@SCOM4.directique.net>
References: <20201209152553.8752-1-d.csapak@proxmox.com> <718D0AF11703FA4C85B0535448A05610038195BAB5@SCOM4.directique.net>

On 12/9/20 5:23 PM, Lubomir Apostolov wrote:
> Hi,

Hi,

thanks for your message. Your questions show that I did not write the
documentation clearly enough. I will try to answer here, and I will also
incorporate that information in a v2.

>
> After reading https://bugzilla.proxmox.com/show_bug.cgi?id=3138 and your mail, I'd like to discuss the following statement:
> "we have to read all files again in every backup" for both cases - file and image based backup.

Just to clarify: that sentence only relates to file-based backups. For
block-based backups we do not care about files at all, only about the
block image we are trying to back up.

>
> The image-backup seems simpler - fixed-size chunks.
> PBS should be able to link a snapshot with its backup, and back up only the differences from the parent snapshot, as for example zfs send/recv works.
> Every backup knows the chunk order and chunk size, so it can map every chunk to the original image extents.
> The snapshot differences give extents, which PBS can map to overlapping chunks, and send only those chunks, while referencing unchanged chunks from the previous backup's chunk list.

While something like that *could* work, it is not how we do it. Instead of
relying on storage snapshots (which may not be available, e.g. for '.raw'
files), we simply iterate over the content of the block-level image and
create the chunk hashes.
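
To make the block-based case a bit more concrete, here is a minimal sketch
in Python. It is not the actual proxmox-backup code: the function name
`chunk_digests`, the tiny 4-byte chunk size and the use of SHA-256 are
assumptions chosen for illustration. It reads an image sequentially, cuts
it into fixed-size chunks and hashes each one:

```python
import hashlib
import io


def chunk_digests(image, chunk_size):
    """Read `image` sequentially and return one SHA-256 hex digest per
    fixed-size chunk (hypothetical helper, for illustration only)."""
    digests = []
    while True:
        chunk = image.read(chunk_size)
        if not chunk:  # end of image reached
            break
        digests.append(hashlib.sha256(chunk).hexdigest())
    return digests


# Two toy "images" that differ only inside the second 4-byte chunk.
old = chunk_digests(io.BytesIO(b"AAAABBBBCCCC"), 4)
new = chunk_digests(io.BytesIO(b"AAAAXXXXCCCC"), 4)

# Indexes of the chunks whose digest differs between the two images.
changed = [i for i, (a, b) in enumerate(zip(old, new)) if a != b]
```

Because the chunks are cut at fixed offsets, a change inside one chunk only
alters the digest of that chunk; the digests of all other chunks stay the
same.
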
This normally means that we have to read the whole image, but in the case of
PVE QEMU VMs we use (as written in my patch) dirty bitmaps, which keep track
of the changed blocks (only those have to be hashed and backed up).

>
> The variable-size chunks based on files need another mapping between chunks and filenames.
> The rolling hash over the data may be linked to a list containing the filenames inside, and then the snapshot diff containing files can flag the chunks to be saved.
>
> So where's the catch?

Again, we do not rely on storage snapshots, since those may not be available
(e.g. on ext4). So first we iterate over the filesystem/directory, and from
that we create a consistent archive format (pxar). This is something like
'tar', but it can be created in a streaming fashion (which we need). Only
over that archive do we create the chunks, so the data that the chunker
gets has no direct relation to any files.

Hope that explains it better.

>
> Best regards,
> Lubomir Apostolov
>
> -----Original Message-----
> From: pbs-devel [mailto:pbs-devel-bounces@lists.proxmox.com] On Behalf Of Dominik Csapak
> Sent: Wednesday, 9 December 2020 16:26
> To: pbs-devel@lists.proxmox.com
> Subject: [pbs-devel] [PATCH proxmox-backup] docs: explain some technical details about datastores/chunks
>
> adds explanations for:
> * what datastores are
> * their relation with snapshots/chunks
> * basic information about chunk directory structures
> * fixed-/dynamically-sized chunks
> * special handling of encrypted chunks
> * hash collision probability
> * limitation of file-based backups
>
> Signed-off-by: Dominik Csapak
> ---
> docs/index.rst | 1 +
> docs/technical-overview.rst | 152 ++++++++++++++++++++++++++++++++++++
> docs/terminology.rst | 3 +
> 3 files changed, 156 insertions(+)
> create mode 100644 docs/technical-overview.rst
>
> diff --git a/docs/index.rst b/docs/index.rst
> index fffcb4fd..f3e6bf0c 100644
> --- a/docs/index.rst
> +++ b/docs/index.rst
> @@ -33,6 +33,7 @@ in the section entitled "GNU Free Documentation License".
> pve-integration.rst
> pxar-tool.rst
> sysadmin.rst
> + technical-overview.rst
> faq.rst
>
> .. raw:: latex
> diff --git a/docs/technical-overview.rst b/docs/technical-overview.rst
> new file mode 100644
> index 00000000..20f937bd
> --- /dev/null
> +++ b/docs/technical-overview.rst
> @@ -0,0 +1,152 @@
> +Technical Overview
> +==================
> +
> +.. _technical_overview:
> +
> +Datastores
> +----------
> +
> +A Datastore is the logical place where :ref:`Backup Snapshots `
> +and their chunks are stored. Snapshots consist of a manifest, blobs,
> +dynamic- and fixed-indexes (see :ref:`terminology`), and are stored in the following directory structure:
> +
> + ///