From: Fabian Grünbichler
To: Proxmox VE user list
Subject: Re: [PVE-User] pbs incremental backups
Date: Tue, 02 Mar 2021 11:48:00 +0100
Message-Id: <1614681375.t5hirejgvw.astroid@nora.none>
In-Reply-To: <82a59e22-1a52-e104-1217-ac904d9223ca@merit.unu.edu>

On March 2, 2021 11:22 am, mj wrote:
> Hi,
>
> Testing PBS backups taken from PVE VMs on ceph rbd now. Very nice, very
> quick, very cool. :-)
>
> We have a question. Something we wonder about.
>
> In our current backup software, we make weekly full_system backups, and
> daily incremental_system backups, each incremental based on the same
> full_system backup. So: each daily incremental backup becomes bigger,
> until the weekend. Then we make a new full_system backup to base the
> next set of incrementals on.
>
> In PBS I cannot specify if a backup is full or incremental, we assume
> this means that automatically the first backup is a full_system backup,
> and subsequent backups are incremental. The PBS backup logs confirm this
> assumption, saying: "scsi0: dirty-bitmap status: created new" vs "scsi0:
> dirty-bitmap status: OK (7.3 GiB of 501.0 GiB dirty)"
>
> And now the question: At what point in time is a new full_system backup
> created, to rebase incremental backups on?
> Or is each incremental backup based on the previous incremental? And if
> that is the case, how will we ever be able to delete one of the
> in-between incrementals, because that would then break to whole chain of
> incremental_backup-based-on-incremental_backup...?
>
> We have read the page
> https://qemu.readthedocs.io/en/latest/interop/bitmaps.html but it does
> not seem to answer this.
>
> Anyone care to share some insight on this logic and how PBS works?

you might want to take a look at
https://pbs.proxmox.com/docs/technical-overview.html

but, the short summary:

PBS does not do full or incremental backups in the classical sense, it
uses a chunk-based deduplicated approach. the backup content is split
into chunks, those chunks are then hashed to get a chunk ID. the
incremental part happens on different levels:

- if the backup is of a VM that has been backed up before, and that
  previous backup still exists on the server, and the VM has not been
  stopped in the meantime, only chunks which contain changed blocks
  (tracked by QEMU with a dirty bitmap) are read, hashed and uploaded,
  the rest is re-used. (FAST incremental)
- for all backups, if a previous backup exists, its index is
  downloaded, all local data is read and hashed, but only chunks which
  are missing on the server are actually uploaded (incremental)
- if no previous backup exists, all local data is read and hashed and
  uploaded ("full" backup)

additionally, a chunk that is being uploaded might already exist on the
server (e.g., from backups in another backup group), in which case the
server will re-use the existing chunk instead of writing it again.

so, fast incremental does the least work, incremental does full reading
and hashing but less uploading, and we try to avoid unnecessary writes
on the server side in all cases. all of the above is also true when you
add in encryption, although obviously changing encryption mode or keys
will invalidate previous backups for purposes of reusing chunks (and
thus lead to a single full backup even if previous ones exist).

on the server side, a backup snapshot does not consist of a base and a
series of incremental diffs, it always references all the chunks that
represent this full snapshot.
the magic is in the chunking and
deduplication, which allows us to store all those snapshots efficiently.
ALL snapshots are equivalent, whether it was the first one or not has no
bearing on how the snapshot or its referenced data is stored, just on
how much work it was to create and transfer it in the first place.
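
to make the above a bit more concrete, here is a toy sketch in Python. it
is not PBS code and does not use any PBS API: the names ChunkStore,
backup, restore and prune are made up for illustration, the chunks are
fixed-size and only 4 bytes long, the "dirty" argument is a stand-in for
the dirty bitmap, and prune removes unreferenced chunks right away, which
is a simplification. it only shows that every snapshot is a complete list
of chunk IDs, so removing an in-between snapshot does not break the
others.

```python
import hashlib

CHUNK_SIZE = 4  # toy value for illustration only


class ChunkStore:
    """Hypothetical in-memory stand-in for a deduplicating backup server."""

    def __init__(self):
        self.chunks = {}     # chunk ID (SHA-256 hex digest) -> chunk data
        self.snapshots = {}  # snapshot name -> full ordered list of chunk IDs

    def backup(self, name, data, previous=None, dirty=None):
        """Store a snapshot and return (chunks_read, chunks_uploaded).

        previous: name of an existing snapshot whose index may be re-used.
        dirty: set of chunk positions changed since `previous` (assumed
               stand-in for the dirty bitmap); None means read everything.
        """
        pieces = [data[i:i + CHUNK_SIZE] for i in range(0, len(data), CHUNK_SIZE)]
        old_index = self.snapshots.get(previous, [])
        index, read, uploaded = [], 0, 0
        for pos, piece in enumerate(pieces):
            if dirty is not None and pos not in dirty and pos < len(old_index):
                # fast incremental: unchanged chunk, re-use its ID unread
                index.append(old_index[pos])
                continue
            read += 1
            chunk_id = hashlib.sha256(piece).hexdigest()
            if chunk_id not in self.chunks:
                # only chunks missing on the server are stored
                self.chunks[chunk_id] = piece
                uploaded += 1
            index.append(chunk_id)
        self.snapshots[name] = index  # always a complete index, never a diff
        return read, uploaded

    def restore(self, name):
        """Rebuild the data of one snapshot from its own index alone."""
        return b"".join(self.chunks[cid] for cid in self.snapshots[name])

    def prune(self, name):
        """Drop a snapshot, then drop chunks no remaining snapshot references."""
        del self.snapshots[name]
        live = {cid for idx in self.snapshots.values() for cid in idx}
        self.chunks = {cid: d for cid, d in self.chunks.items() if cid in live}


store = ChunkStore()
day1 = b"AAAABBBBCCCCDDDD"
day2 = b"AAAAXXXXCCCCDDDD"  # second chunk changed
day3 = b"AAAAXXXXCCCCYYYY"  # fourth chunk changed

r1 = store.backup("day1", day1)                              # "full"
r2 = store.backup("day2", day2, previous="day1")             # incremental
r3 = store.backup("day3", day3, previous="day2", dirty={3})  # fast incremental

store.prune("day2")  # remove the in-between snapshot
print(r1, r2, r3)
print(store.restore("day1") == day1, store.restore("day3") == day3)
```

in this sketch the first backup reads and uploads everything, the second
reads everything but uploads only the changed chunk, and the third reads
and uploads only the chunk marked dirty. after pruning "day2", both
"day1" and "day3" still restore completely, since each index lists all
of its own chunks.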