From: mj <lists@merit.unu.edu>
To: pve-user@lists.proxmox.com
Subject: Re: [PVE-User] pbs incremental backups
Date: Wed, 3 Mar 2021 08:48:20 +0100
Message-ID: <244b6de7-3720-04f3-0fb3-f693137fe4c0@merit.unu.edu>
In-Reply-To: <1614681375.t5hirejgvw.astroid@nora.none>
Hi Arjen, Andreas and especially Fabian for your elaborate reply,
Thanks! It all makes much more sense now.
As said before, PBS is a great addition to the Proxmox line of products.
Thanks!
MJ
On 3/2/21 11:48 AM, Fabian Grünbichler wrote:
> On March 2, 2021 11:22 am, mj wrote:
>> Hi,
>>
>> Testing PBS backups taken from PVE VMs on ceph rbd now. Very nice, very
>> quick, very cool. :-)
>>
>> We have a question. Something we wonder about.
>>
>> In our current backup software, we make weekly full_system backups, and
>> daily incremental_system backups, each incremental based on the same
>> full_system backup. So: each daily incremental backup becomes bigger,
>> until the weekend. Then we make a new full_system backup to base the
>> next set of incrementals on.
>>
>> In PBS I cannot specify if a backup is full or incremental; we assume
>> this means that automatically the first backup is a full_system backup,
>> and subsequent backups are incremental. The PBS backup logs confirm this
>> assumption, saying: "scsi0: dirty-bitmap status: created new" vs "scsi0:
>> dirty-bitmap status: OK (7.3 GiB of 501.0 GiB dirty)"
>>
>> And now the question: At what point in time is a new full_system backup
>> created, to rebase incremental backups on?
>> Or is each incremental backup based on the previous incremental? And if
>> that is the case, how will we ever be able to delete one of the
>> in-between incrementals, because that would then break the whole chain of
>> incremental_backup-based-on-incremental_backup...?
>>
>> We have read the page
>> https://qemu.readthedocs.io/en/latest/interop/bitmaps.html but it does
>> not seem to answer this.
>>
>> Anyone care to share some insight on this logic and how PBS works?
>
> you might want to take a look at
>
> https://pbs.proxmox.com/docs/technical-overview.html
>
> but, the short summary:
>
> PBS does not do full or incremental backups in the classical sense, it
> uses a chunk-based deduplicated approach. the backup content is split
> into chunks, those chunks are then hashed to get a chunk ID. the
> incremental part happens on different levels:
>
> - if the backup is of a VM that has been backed up before, and that
> previous backup still exists on the server, and the VM has not been
> stopped in the meantime, only chunks which contain changed blocks
> (tracked by Qemu with a dirty bitmap) are read, hashed and uploaded,
> the rest is re-used. (FAST incremental)
> - for all backups, if a previous backup exists, its index is
> downloaded, all local data is read and hashed, but only chunks which
> are missing on the server are actually uploaded (incremental)
> - if no previous backup exists, all local data is read and hashed and
> uploaded ("full" backup)
>
> additionally, an uploaded chunk might already exist on the server (e.g.,
> from backups in another backup group), in which case the server will
> still re-use the existing chunk.
>
> so, fast incremental does the least work, incremental does full reading
> and hashing but less uploading, and we try to avoid unnecessary writes
> on the server side in all cases. all of the above is also true when you
> add in encryption, although obviously changing encryption mode or keys
> will invalidate previous backups for purposes of reusing chunks (and
> thus lead to a single full backup even if previous ones exist).
>
> on the server side, a backup snapshot does not consist of a base and a
> series of incremental diffs, it always references all the chunks that
> represent this full snapshot. the magic is in the chunking and
> deduplication, which allows us to store all those snapshots efficiently.
> ALL snapshots are equivalent, whether it was the first one or not has no
> bearing on how the snapshot or its referenced data is stored, just on
> how much work it was to create and transfer it in the first place.
>
>
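To make the point about deleting in-between backups concrete, here is a minimal toy sketch of a chunk-based, deduplicated store. It is not PBS code: the `Datastore` class, its method names, and the 4-byte chunk size are all made up for illustration, and only the general idea (split into chunks, hash each chunk to get an ID, let every snapshot reference all of its chunks) is taken from the explanation above.

```python
import hashlib

# Hypothetical, tiny chunk size so the example stays readable.
# This is an illustration value, not the size PBS uses.
CHUNK_SIZE = 4


def split(image: bytes) -> list[bytes]:
    """Cut the backup content into fixed-size chunks."""
    return [image[i:i + CHUNK_SIZE] for i in range(0, len(image), CHUNK_SIZE)]


def chunk_id(chunk: bytes) -> str:
    """Hash a chunk to get its ID."""
    return hashlib.sha256(chunk).hexdigest()


class Datastore:
    """Toy model: one shared chunk pool plus one full index per snapshot."""

    def __init__(self) -> None:
        self.chunks: dict[str, bytes] = {}        # chunk ID -> data
        self.snapshots: dict[str, list[str]] = {}  # name -> ordered chunk IDs

    def backup(self, name: str, image: bytes) -> int:
        """Store a snapshot and return how many chunks were newly written."""
        index, written = [], 0
        for chunk in split(image):
            cid = chunk_id(chunk)
            if cid not in self.chunks:  # chunks already present are re-used
                self.chunks[cid] = chunk
                written += 1
            index.append(cid)
        self.snapshots[name] = index  # always a complete list, never a diff
        return written

    def restore(self, name: str) -> bytes:
        """Rebuild a snapshot from its own index alone."""
        return b"".join(self.chunks[cid] for cid in self.snapshots[name])

    def forget(self, name: str) -> None:
        """Drop a snapshot's index; chunk data is left for garbage collection."""
        del self.snapshots[name]

    def garbage_collect(self) -> int:
        """Delete chunks that no remaining snapshot references."""
        live = {cid for index in self.snapshots.values() for cid in index}
        dead = [cid for cid in self.chunks if cid not in live]
        for cid in dead:
            del self.chunks[cid]
        return len(dead)


store = Datastore()
print(store.backup("mon", b"AAAABBBBCCCCDDDD"))  # 4: nothing to re-use yet
print(store.backup("tue", b"AAAAXXXXCCCCDDDD"))  # 1: only the changed chunk
print(store.backup("wed", b"AAAAXXXXCCCCYYYY"))  # 1

store.forget("tue")                # delete the in-between snapshot
print(store.garbage_collect())     # 0: all its chunks are still referenced
print(store.restore("wed") == b"AAAAXXXXCCCCYYYY")  # True
```

In this model, forgetting "tue" cannot break "wed", because "wed" carries its own complete list of chunk IDs and never refers to "tue". A chunk only disappears once no remaining snapshot references it.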