public inbox for pve-user@lists.proxmox.com
* [PVE-User] pbs incremental backups
@ 2021-03-02 10:22 mj
  2021-03-02 10:37 ` Andreas Heinlein
  2021-03-02 10:48 ` Fabian Grünbichler
  0 siblings, 2 replies; 4+ messages in thread
From: mj @ 2021-03-02 10:22 UTC (permalink / raw)
  To: Proxmox VE user list

Hi,

Testing PBS backups taken from PVE VMs on ceph rbd now. Very nice, very 
quick, very cool. :-)

We have a question. Something we wonder about.

In our current backup software, we make weekly full_system backups, and 
daily incremental_system backups, each incremental based on the same 
full_system backup. So: each daily incremental backup becomes bigger, 
until the weekend. Then we make a new full_system backup to base the 
next set of incrementals on.

In PBS I cannot specify whether a backup is full or incremental. We assume 
this means that the first backup is automatically a full_system backup, 
and subsequent backups are incremental. The PBS backup logs confirm this 
assumption, saying: "scsi0: dirty-bitmap status: created new" vs "scsi0: 
dirty-bitmap status: OK (7.3 GiB of 501.0 GiB dirty)"

And now the question: At what point in time is a new full_system backup 
created, to rebase incremental backups on?
Or is each incremental backup based on the previous incremental? And if 
that is the case, how will we ever be able to delete one of the 
in-between incrementals, because that would then break the whole chain of 
incremental_backup-based-on-incremental_backup...?

We have read the page 
https://qemu.readthedocs.io/en/latest/interop/bitmaps.html but it does 
not seem to answer this.

Anyone care to share some insight on this logic and how PBS works?

MJ





* Re: [PVE-User] pbs incremental backups
  2021-03-02 10:22 [PVE-User] pbs incremental backups mj
@ 2021-03-02 10:37 ` Andreas Heinlein
  2021-03-02 10:48 ` Fabian Grünbichler
  1 sibling, 0 replies; 4+ messages in thread
From: Andreas Heinlein @ 2021-03-02 10:37 UTC (permalink / raw)
  To: pve-user

Hello,

this is probably a bit difficult to understand at first, but I will try 
to explain.

Each backup is a full backup, even though only the differences from the 
last backup are actually transferred. The remaining unchanged blocks are 
just referenced from the last backup. When you delete a previous 
backup, only those blocks which are not referenced by another backup 
will actually be deleted.

So each backup is independent, and you can delete all 
previous backups without losing any data.
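
To make the referencing idea concrete, here is a minimal sketch of a 
content-addressed store. It is a toy model, not PBS code: the class 
ChunkStore, its method names, the 4-byte chunk size and the use of 
SHA-256 as chunk ID are all assumptions made for illustration.

```python
import hashlib


class ChunkStore:
    """Toy content-addressed store (hypothetical, not the PBS implementation)."""

    def __init__(self):
        self.chunks = {}     # chunk ID (hex digest) -> chunk data
        self.snapshots = {}  # snapshot name -> complete list of chunk IDs

    def backup(self, name, data, chunk_size=4):
        index = []
        for off in range(0, len(data), chunk_size):
            chunk = data[off:off + chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            # Store the chunk only if it is not already present.
            self.chunks.setdefault(digest, chunk)
            index.append(digest)
        # Every snapshot keeps a full index, never a diff against another one.
        self.snapshots[name] = index

    def forget(self, name):
        # Deleting a snapshot only drops its index.
        del self.snapshots[name]

    def garbage_collect(self):
        # Remove chunks that no remaining snapshot references.
        referenced = {d for idx in self.snapshots.values() for d in idx}
        for digest in list(self.chunks):
            if digest not in referenced:
                del self.chunks[digest]

    def restore(self, name):
        return b"".join(self.chunks[d] for d in self.snapshots[name])


store = ChunkStore()
store.backup("monday", b"AAAABBBBCCCC")
store.backup("tuesday", b"AAAAXXXXCCCC")  # only the middle chunk changed
store.forget("monday")
store.garbage_collect()
restored = store.restore("tuesday")
```

After the older snapshot is forgotten and unreferenced chunks are 
collected, the newer snapshot still restores completely, because its 
index lists every chunk it needs.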

Hope this clears it up a bit.

Bye,
Andreas


* Re: [PVE-User] pbs incremental backups
  2021-03-02 10:22 [PVE-User] pbs incremental backups mj
  2021-03-02 10:37 ` Andreas Heinlein
@ 2021-03-02 10:48 ` Fabian Grünbichler
  2021-03-03  7:48   ` mj
  1 sibling, 1 reply; 4+ messages in thread
From: Fabian Grünbichler @ 2021-03-02 10:48 UTC (permalink / raw)
  To: Proxmox VE user list

you might want to take a look at 

https://pbs.proxmox.com/docs/technical-overview.html

but, the short summary:

PBS does not do full or incremental backups in the classical sense, it 
uses a chunk-based deduplicated approach. the backup content is split 
into chunks, those chunks are then hashed to get a chunk ID. the 
incremental part happens on different levels:

- if the backup is of a VM that has been backed up before, and that 
  previous backup still exists on the server, and the VM has not been 
  stopped in the meantime, only chunks which contain changed blocks 
  (tracked by Qemu with a dirty bitmap) are read, hashed and uploaded, 
  the rest is re-used. (FAST incremental)
- for all backups, if a previous backup exists, its index is 
  downloaded, all local data is read and hashed, but only chunks which 
  are missing on the server are actually uploaded (incremental)
- if no previous backup exists, all local data is read and hashed and 
  uploaded ("full" backup)

additionally, an uploaded chunk might already exist on the server (e.g., 
from backups in another backup group), in which case the server will 
still re-use the existing chunk.
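
the three levels above, plus the server-side re-use, can be sketched 
roughly as follows. this is a toy model under stated assumptions, not 
the actual client code: the names backup, split and chunk_id are 
hypothetical, chunks are 4 bytes, the server is a plain dict, the disk 
size is assumed not to change between backups, and a set of dirty chunk 
positions stands in for the Qemu dirty bitmap.

```python
import hashlib

CHUNK_SIZE = 4  # tiny on purpose, for illustration only


def chunk_id(chunk):
    # Assumption: the chunk ID is a SHA-256 digest of the chunk content.
    return hashlib.sha256(chunk).hexdigest()


def split(data, size=CHUNK_SIZE):
    return [data[i:i + size] for i in range(0, len(data), size)]


def backup(disk, server, previous_index=None, dirty=None):
    """Return (index, stats) for one backup run.

    previous_index: index of the previous snapshot, or None if there is none.
    dirty: set of changed chunk positions, or None if no valid bitmap exists.
    """
    known = set(previous_index or [])
    stats = {"read": 0, "uploaded": 0, "written": 0}
    index = []
    for pos, chunk in enumerate(split(disk)):
        if previous_index is not None and dirty is not None and pos not in dirty:
            # FAST incremental: unchanged chunk, not even read from disk.
            index.append(previous_index[pos])
            continue
        stats["read"] += 1
        digest = chunk_id(chunk)
        index.append(digest)
        if digest in known:
            continue  # incremental: already in the previous snapshot
        stats["uploaded"] += 1
        if digest not in server:
            server[digest] = chunk  # server writes only chunks it lacks
            stats["written"] += 1
    return index, stats


server = {}
idx1, full = backup(b"AAAABBBBCCCC", server)
idx2, incr = backup(b"AAAAXXXXCCCC", server, previous_index=idx1)
idx3, fast = backup(b"AAAAXXXXYYYY", server, previous_index=idx2, dirty={2})
# A first backup in a different group: uploaded, but not written again.
_, other = backup(b"AAAA", server)
```

in this model the "full" run reads and uploads everything, the 
incremental run reads everything but uploads one chunk, and the fast 
incremental run reads and uploads only the dirty chunk. the last call 
shows the cross-group case: the chunk is uploaded, but the server 
already has it and writes nothing.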

so, fast incremental does the least work, incremental does full reading 
and hashing but less uploading, and we try to avoid unnecessary writes 
on the server side in all cases. all of the above is also true when you 
add in encryption, although obviously changing encryption mode or keys 
will invalidate previous backups for purposes of reusing chunks (and 
thus lead to a single full backup even if previous ones exist).
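
why a key change leads to one "full" upload can be seen in a toy model 
where the chunk ID depends on the key. HMAC-SHA256 is used here purely 
as a stand-in; it is an assumption for illustration and not necessarily 
how PBS derives IDs for encrypted chunks. the function name chunk_id 
and the key values are hypothetical as well.

```python
import hashlib
import hmac


def chunk_id(chunk, key=None):
    # Unencrypted: plain content hash. Encrypted: keyed hash (stand-in).
    if key is None:
        return hashlib.sha256(chunk).hexdigest()
    return hmac.new(key, chunk, hashlib.sha256).hexdigest()


chunks = [b"AAAA", b"BBBB", b"CCCC"]

old_ids = {chunk_id(c, b"old-key") for c in chunks}
same_key_ids = {chunk_id(c, b"old-key") for c in chunks}
new_key_ids = {chunk_id(c, b"new-key") for c in chunks}

# Number of chunks from the previous backup that can be re-used.
reusable_same_key = len(old_ids & same_key_ids)
reusable_new_key = len(old_ids & new_key_ids)
```

with the same key all IDs match and nothing needs to be uploaded again; 
with a new key none of the IDs match, so every chunk counts as missing 
even though the underlying data is identical.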

on the server side, a backup snapshot does not consist of a base and a 
series of incremental diffs, it always references all the chunks that 
represent this full snapshot. the magic is in the chunking and 
deduplication, which allows us to store all those snapshots efficiently. 
ALL snapshots are equivalent, whether it was the first one or not has no 
bearing on how the snapshot or its referenced data is stored, just on 
how much work it was to create and transfer it in the first place.
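
a small sketch of what "all snapshots are equivalent" means for 
storage, assuming fixed 4-byte chunks and SHA-256 chunk IDs (both 
chosen for illustration; index_of is a hypothetical helper):

```python
import hashlib


def index_of(disk, size=4):
    # A snapshot index: the full list of chunk IDs for the whole disk.
    return [hashlib.sha256(disk[i:i + size]).hexdigest()
            for i in range(0, len(disk), size)]


snapshots = {
    "day1": index_of(b"AAAABBBBCCCC"),
    "day2": index_of(b"AAAAXXXXCCCC"),
    "day3": index_of(b"AAAAXXXXYYYY"),
}

# Every snapshot references the same number of chunks, first or not.
index_lengths = {name: len(idx) for name, idx in snapshots.items()}

# Chunks referenced in total vs. chunks the server actually has to keep.
logical_chunks = sum(index_lengths.values())
stored_chunks = len({d for idx in snapshots.values() for d in idx})
```

each of the three indexes lists three chunks, so nine chunks are 
referenced in total, but only five distinct chunks need to be stored.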





* Re: [PVE-User] pbs incremental backups
  2021-03-02 10:48 ` Fabian Grünbichler
@ 2021-03-03  7:48   ` mj
  0 siblings, 0 replies; 4+ messages in thread
From: mj @ 2021-03-03  7:48 UTC (permalink / raw)
  To: pve-user

Hi Arjen, Andreas and especially Fabian, thank you for your elaborate reply,

Thanks! It all makes much more sense now.

As said before, PBS is a great addition to the Proxmox line of products.

Thanks!

MJ



