From: "Fabian Grünbichler" <f.gruenbichler@proxmox.com>
To: Proxmox VE user list <pve-user@lists.proxmox.com>
Subject: Re: [PVE-User] pbs incremental backups
Date: Tue, 02 Mar 2021 11:48:00 +0100
Message-ID: <1614681375.t5hirejgvw.astroid@nora.none>
In-Reply-To: <82a59e22-1a52-e104-1217-ac904d9223ca@merit.unu.edu>
On March 2, 2021 11:22 am, mj wrote:
> Hi,
>
> Testing PBS backups taken from PVE VMs on ceph rbd now. Very nice, very
> quick, very cool. :-)
>
> We have a question. Something we wonder about.
>
> In our current backup software, we make weekly full_system backups, and
> daily incremental_system backups, each incremental based on the same
> full_system backup. So: each daily incremental backup becomes bigger,
> until the weekend. Then we make a new full_system backup to base the
> next set of incrementals on.
>
> In PBS I cannot specify if a backup is full or incremental, we assume
> this means that automatically the first backup is a full_system backup,
> and subsequent backups are incremental. The PBS backup logs confirm this
> assumption, saying: "scsi0: dirty-bitmap status: created new" vs "scsi0:
> dirty-bitmap status: OK (7.3 GiB of 501.0 GiB dirty)"
>
> And now the question: At what point in time is a new full_system backup
> created, to rebase incremental backups on?
> Or is each incremental backup based on the previous incremental? And if
> that is the case, how will we ever be able to delete one of the
> in-between incrementals, because that would then break the whole chain of
> incremental_backup-based-on-incremental_backup...?
>
> We have read the page
> https://qemu.readthedocs.io/en/latest/interop/bitmaps.html but it does
> not seem to answer this.
>
> Anyone care to share some insight on this logic and how PBS works?
you might want to take a look at
https://pbs.proxmox.com/docs/technical-overview.html

but, the short summary:

PBS does not do full or incremental backups in the classical sense; it
uses a chunk-based deduplicated approach. the backup content is split
into chunks, and those chunks are then hashed to get a chunk ID. the
incremental part happens on different levels:
- if the backup is of a VM that has been backed up before, and that
  previous backup still exists on the server, and the VM has not been
  stopped in the meantime, only chunks which contain changed blocks
  (tracked by QEMU with a dirty bitmap) are read, hashed and uploaded,
  the rest is re-used. (FAST incremental)
- for all backups, if a previous backup exists, its index is
  downloaded, all local data is read and hashed, but only chunks which
  are missing on the server are actually uploaded (incremental)
- if no previous backup exists, all local data is read and hashed and
  uploaded ("full" backup)
additionally, an uploaded chunk might already exist on the server (e.g.,
from backups in another backup group), in which case the server will
still re-use the existing chunk.
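
to make the three levels a bit more concrete, here is a rough sketch in Python. it is not how PBS is actually implemented: the tiny chunk size, the use of plain SHA-256 as chunk ID, and the names `Server`, `backup`, `prev_index` and `dirty` are all assumptions made up for illustration.

```python
import hashlib

CHUNK_SIZE = 4  # bytes; unrealistically small, chosen only for the example


def chunk_id(data):
    # Assumption: the chunk ID is simply the SHA-256 digest of the chunk.
    return hashlib.sha256(data).hexdigest()


def split(image):
    # Fixed-size chunking of the whole image.
    return [image[i:i + CHUNK_SIZE] for i in range(0, len(image), CHUNK_SIZE)]


class Server:
    """Hypothetical chunk store: only writes chunks it does not have yet."""

    def __init__(self):
        self.chunks = {}  # chunk ID -> chunk data
        self.writes = 0   # number of chunks actually written

    def upload(self, cid, data):
        if cid not in self.chunks:
            self.chunks[cid] = data
            self.writes += 1


def backup(image, server, prev_index=None, dirty=None):
    """Return (index, stats); the index is the list of chunk IDs.

    prev_index: index of the previous snapshot, if one exists.
    dirty: set of changed chunk positions (stand-in for the dirty bitmap).
    """
    stats = {"read": 0, "uploaded": 0}
    known = set(prev_index or [])
    index = []
    for pos, data in enumerate(split(image)):
        fast = (prev_index is not None and dirty is not None
                and pos not in dirty and pos < len(prev_index))
        if fast:
            # FAST incremental: unchanged chunk, re-use the old ID unread.
            index.append(prev_index[pos])
            continue
        stats["read"] += 1
        cid = chunk_id(data)
        if cid not in known:
            # Not referenced by the previous snapshot: send it to the server.
            server.upload(cid, data)
            stats["uploaded"] += 1
        index.append(cid)
    return index, stats


server = Server()
# "full" backup: no previous snapshot, everything is read and uploaded
idx1, full = backup(b"AAAABBBBCCCCDDDD", server)
# incremental: everything is read and hashed, only the new chunk is uploaded
idx2, incr = backup(b"AAAAXXXXCCCCDDDD", server, prev_index=idx1)
# fast incremental: only the chunk marked dirty is read at all
idx3, fast = backup(b"AAAAXXXXCCCCYYYY", server, prev_index=idx2, dirty={3})
# another backup group with identical data: the client uploads everything,
# but the server already has all of these chunks and writes nothing new
writes_before = server.writes
idx4, other = backup(b"AAAABBBBCCCCDDDD", server)
```

in this toy run the last call still counts four uploads on the client side, while `server.writes` stays unchanged, which is the server-side re-use described above.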
so, fast incremental does the least work, incremental does full reading
and hashing but less uploading, and we try to avoid unnecessary writes
on the server side in all cases. all of the above is also true when you
add in encryption, although obviously changing encryption mode or keys
will invalidate previous backups for purposes of reusing chunks (and
thus lead to a single full backup even if previous ones exist).
on the server side, a backup snapshot does not consist of a base and a
series of incremental diffs; it always references all the chunks that
represent this full snapshot. the magic is in the chunking and
deduplication, which allows us to store all those snapshots efficiently.
ALL snapshots are equivalent, whether it was the first one or not has no
bearing on how the snapshot or its referenced data is stored, just on
how much work it was to create and transfer it in the first place.
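
as an illustration of why removing an in-between snapshot does not break anything, here is a minimal sketch. it assumes a snapshot is just a list of chunk IDs pointing into a shared chunk store, and that unreferenced chunks are cleaned up in a separate pass. the names `store_snapshot`, `garbage_collect` and `restore` are hypothetical, and the real garbage collection on the server is more involved than this.

```python
import hashlib


def store_snapshot(chunk_store, chunks):
    """Store chunks (deduplicated) and return the snapshot's full index."""
    index = []
    for data in chunks:
        cid = hashlib.sha256(data).hexdigest()
        chunk_store.setdefault(cid, data)  # keep the existing chunk if present
        index.append(cid)
    return index


def garbage_collect(chunk_store, snapshots):
    """Drop every chunk that no remaining snapshot references."""
    referenced = set()
    for index in snapshots.values():
        referenced.update(index)
    for cid in list(chunk_store):
        if cid not in referenced:
            del chunk_store[cid]


def restore(chunk_store, index):
    """Rebuild the data of one snapshot from its own index alone."""
    return b"".join(chunk_store[cid] for cid in index)


chunk_store = {}
snapshots = {
    "mon": store_snapshot(chunk_store, [b"AAAA", b"BBBB"]),
    "tue": store_snapshot(chunk_store, [b"AAAA", b"XXXX"]),
    "wed": store_snapshot(chunk_store, [b"AAAA", b"YYYY"]),
}
chunks_before = len(chunk_store)  # AAAA is stored once, not three times

# remove the in-between snapshot, then clean up unreferenced chunks
del snapshots["tue"]
garbage_collect(chunk_store, snapshots)

mon_data = restore(chunk_store, snapshots["mon"])
wed_data = restore(chunk_store, snapshots["wed"])
```

both remaining snapshots restore completely, because each index lists all of its chunks by itself; only the chunk that was unique to the removed snapshot disappears.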