Re: [PVE-User] pbs incremental backups

From: mj <lists@merit.unu.edu>
To: pve-user@lists.proxmox.com
Subject: Re: [PVE-User] pbs incremental backups
Date: Wed, 3 Mar 2021 08:48:20 +0100
Message-ID: <244b6de7-3720-04f3-0fb3-f693137fe4c0@merit.unu.edu>
In-Reply-To: <1614681375.t5hirejgvw.astroid@nora.none>

Hi Arjen, Andreas and especially Fabian for your elaborate reply,

Thanks! It all makes much more sense now.

As said before, PBS is a great addition to the Proxmox line of products.

Thanks!

MJ

On 3/2/21 11:48 AM, Fabian Grünbichler wrote:
> On March 2, 2021 11:22 am, mj wrote:
>> Hi,
>>
>> Testing PBS backups taken from PVE VMs on ceph rbd now. Very nice, very
>> quick, very cool. :-)
>>
>> We have a question. Something we wonder about.
>>
>> In our current backup software, we make weekly full_system backups, and
>> daily incremental_system backups, each incremental based on the same
>> full_system backup. So: each daily incremental backup becomes bigger,
>> until the weekend. Then we make a new full_system backup to base the
>> next set of incrementals on.
>>
>> In PBS I cannot specify if a backup is full or incremental, we assume
>> this means that automatically the first backup is a full_system backup,
>> and subsequent backups are incremental. The PBS backup logs confirm this
>> assumption, saying: "scsi0: dirty-bitmap status: created new" vs "scsi0:
>> dirty-bitmap status: OK (7.3 GiB of 501.0 GiB dirty)"
>>
>> And now the question: At what point in time is a new full_system backup
>> created, to rebase incremental backups on?
>> Or is each incremental backup based on the previous incremental? And if
>> that is the case, how will we ever be able to delete one of the
>> in-between incrementals, because that would then break the whole chain of
>> incremental_backup-based-on-incremental_backup...?
>>
>> We have read the page
>> https://qemu.readthedocs.io/en/latest/interop/bitmaps.html but it does
>> not seem to answer this.
>>
>> Anyone care to share some insight on this logic and how PBS works?
> 
> you might want to take a look at
> 
> https://pbs.proxmox.com/docs/technical-overview.html
> 
> but, the short summary:
> 
> PBS does not do full or incremental backups in the classical sense, it
> uses a chunk-based deduplicated approach. the backup content is split
> into chunks, those chunks are then hashed to get a chunk ID. the
> incremental part happens on different levels:
> 
> - if the backup is of a VM that has been backed up before, and that
>    previous backup still exists on the server, and the VM has not been
>    stopped in the meantime, only chunks which contain changed blocks
>    (tracked by QEMU with a dirty bitmap) are read, hashed and uploaded,
>    the rest is re-used. (FAST incremental)
> - for all backups, if a previous backup exists, its index is
>    downloaded, all local data is read and hashed, but only chunks which
>    are missing on the server are actually uploaded (incremental)
> - if no previous backup exists, all local data is read and hashed and
>    uploaded ("full" backup)
> 
> additionally, an uploaded chunk might already exist on the server (e.g.,
> from backups in another backup group), in which case the server will
> still re-use the existing chunk.
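
To make the three cases above concrete, here is a minimal sketch in Python.
It is not PBS code: the function names, the 4-byte chunk size, the dict
standing in for the server's chunk store, and the set of "dirty" chunk
positions standing in for the QEMU dirty bitmap are all assumptions made
for illustration.

```python
import hashlib

CHUNK_SIZE = 4  # bytes; deliberately tiny and purely illustrative


def chunk_id(chunk: bytes) -> str:
    """The chunk ID is simply a hash of the chunk's content."""
    return hashlib.sha256(chunk).hexdigest()


def split(data: bytes) -> list:
    """Split the backup content into fixed-size chunks."""
    return [data[i:i + CHUNK_SIZE] for i in range(0, len(data), CHUNK_SIZE)]


def backup(data, store, previous_index=None, dirty=None):
    """Back up `data` into `store` (a dict: chunk ID -> chunk bytes).

    previous_index: list of chunk IDs of the previous snapshot, or None.
    dirty: set of chunk positions changed since that snapshot (a stand-in
           for the dirty bitmap), or None if no valid bitmap is available.
    Returns (index, stats); the index lists ALL chunk IDs of this snapshot.
    """
    stats = {"read": 0, "uploaded": 0}
    known = set(previous_index or [])
    index = []
    for pos, chunk in enumerate(split(data)):
        can_skip = (previous_index is not None and dirty is not None
                    and pos < len(previous_index) and pos not in dirty)
        if can_skip:
            # Fast incremental: chunk is unchanged, reuse its ID unread.
            index.append(previous_index[pos])
            continue
        stats["read"] += 1
        cid = chunk_id(chunk)
        index.append(cid)
        if cid in known:
            # Incremental: read and hashed, but already on the server.
            continue
        stats["uploaded"] += 1
        # The server keeps an existing copy if it already has this chunk.
        store.setdefault(cid, chunk)
    return index, stats


store = {}
idx1, s1 = backup(b"AAAABBBBCCCC", store)                       # "full"
idx2, s2 = backup(b"AAAAXXXXCCCC", store, previous_index=idx1)  # incremental
idx3, s3 = backup(b"AAAAXXXXDDDD", store, previous_index=idx2,
                  dirty={2})                                    # fast incremental
print(s1, s2, s3)
```

The first run reads and uploads everything, the second reads everything
but uploads only the changed chunk, and the third reads and uploads only
the chunk marked dirty.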
> 
> so, fast incremental does the least work, incremental does full reading
> and hashing but less uploading, and we try to avoid unnecessary writes
> on the server side in all cases. all of the above is also true when you
> add in encryption, although obviously changing encryption mode or keys
> will invalidate previous backups for purposes of reusing chunks (and
> thus lead to a single full backup even if previous ones exist).
> 
> on the server side, a backup snapshot does not consist of a base and a
> series of incremental diffs, it always references all the chunks that
> represent this full snapshot. the magic is in the chunking and
> deduplication, which allows us to store all those snapshots efficiently.
> ALL snapshots are equivalent, whether it was the first one or not has no
> bearing on how the snapshot or its referenced data is stored, just on
> how much work it was to create and transfer it in the first place.
> 
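
This also answers the worry about deleting an in-between backup. Since
every snapshot lists all of its chunks, removing one snapshot cannot break
another. A rough sketch in Python follows. It is not how PBS is
implemented: the dict-based store, the snapshot names and the
garbage_collect() helper are hypothetical, and only illustrate the idea
of dropping chunks that no remaining snapshot references.

```python
import hashlib


def chunk_id(chunk: bytes) -> str:
    return hashlib.sha256(chunk).hexdigest()


def snapshot(data: bytes, store: dict, size: int = 4) -> list:
    """Store the chunks of `data` and return the snapshot's full index."""
    index = []
    for i in range(0, len(data), size):
        chunk = data[i:i + size]
        cid = chunk_id(chunk)
        store.setdefault(cid, chunk)  # deduplication: store each chunk once
        index.append(cid)
    return index


def restore(index: list, store: dict) -> bytes:
    """A snapshot is restored from its own index alone, no chain needed."""
    return b"".join(store[cid] for cid in index)


def garbage_collect(store: dict, snapshots: dict) -> int:
    """Delete chunks referenced by no remaining snapshot; return the count."""
    referenced = {cid for index in snapshots.values() for cid in index}
    unreferenced = [cid for cid in store if cid not in referenced]
    for cid in unreferenced:
        del store[cid]
    return len(unreferenced)


store = {}
snapshots = {
    "mon": snapshot(b"AAAABBBBCCCC", store),
    "tue": snapshot(b"AAAAXXXXCCCC", store),
    "wed": snapshot(b"AAAAXXXXDDDD", store),
}

del snapshots["tue"]                           # drop the in-between snapshot
removed_after_tue = garbage_collect(store, snapshots)
wed_ok = restore(snapshots["wed"], store) == b"AAAAXXXXDDDD"

del snapshots["mon"]                           # drop the very first snapshot
removed_after_mon = garbage_collect(store, snapshots)
wed_still_ok = restore(snapshots["wed"], store) == b"AAAAXXXXDDDD"
print(removed_after_tue, wed_ok, removed_after_mon, wed_still_ok)
```

Deleting "tue" frees nothing here because "mon" and "wed" still reference
all of its chunks, and even after the first snapshot is gone "wed"
restores completely.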