public inbox for pbs-devel@lists.proxmox.com
 help / color / mirror / Atom feed
* [pbs-devel] [RFC proxmox-backup] fix #5710: api: backup: stat known chunks during backup
@ 2024-09-13 13:59 Christian Ebner
  2024-09-13 15:23 ` Dietmar Maurer
  2024-09-16 10:17 ` Gabriel Goller
  0 siblings, 2 replies; 7+ messages in thread
From: Christian Ebner @ 2024-09-13 13:59 UTC (permalink / raw)
  To: pbs-devel

Known chunks are expected to be present on the datastore a-priori,
allowing clients to only re-index these chunks without uploading the
raw chunk data. The list of reusable known chunks is send to the
client by the server, deduced from the indexed chunks of the previous
backup snapshot of the group.

If however such a known chunk disappeared (the previous backup
snapshot having been verified before that or not verified just yet),
the backup will finish just fine, leading to a seemingly successful
backup. Only a subsequent verification job will detect the backup
snapshot as being corrupt.

In order to reduce the impact, stat the list of known chunks twice
during backup:
- during registration of a known chunk
- when finishing the backup

The first stat is to early on detect missing chunks, the latter
assures that all (known and newly registered) chunks are present on
the datastore after the backup run.

If a missing chunk is detected, the backup run itself will fail and
the previous backup snapshots verify state is set to failed.
This prevents the same snapshot from being reused by another,
subsequent backup job.

link to issue in bugtracker:
https://bugzilla.proxmox.com/show_bug.cgi?id=5710

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
Sending this as RFC because of the possible performance impact,
especially for users with datastores located on storages with limited
I/0.

Test on my side did not show a significant performance difference as
compared to an unpatched server.
I did perform vzdump backups of a VM with a 32G disk attached and a
LXC container with a Debian install and rootfs of ca. 400M (both off,
no changes in data in-between backup runs). PBS datastore located on
ZFS backed by an NVME SSD, 5 runs each after an initial run to assure
full chunk presence on server and valid previous snapshot.

Here some ballpark figures:

-----------------------------------------------------------
patched                    | unpatched
-----------------------------------------------------------
VM           | LXC         | VM           | LXC
-----------------------------------------------------------
15.3s ± 1.4s | 1.73 ± 0.02 | 14.7s ± 1.8s | 1.77s ± 0.05s
-----------------------------------------------------------

 src/api2/backup/environment.rs | 12 ++++++++++
 src/api2/backup/mod.rs         | 44 ++++++++++++++++++++++++++++++++++
 2 files changed, 56 insertions(+)

diff --git a/src/api2/backup/environment.rs b/src/api2/backup/environment.rs
index 99d885e2e..90fd91d63 100644
--- a/src/api2/backup/environment.rs
+++ b/src/api2/backup/environment.rs
@@ -259,6 +259,18 @@ impl BackupEnvironment {
         state.known_chunks.get(digest).copied()
     }
 
+    /// Stat all currently registered chunks
+    ///
+    pub fn stat_registered_chunks(&self) -> Result<(), Error> {
+        let state = self.state.lock().unwrap();
+        for digest in state.known_chunks.keys() {
+            self.datastore
+                .stat_chunk(&digest)
+                .map_err(|err| format_err!("stat failed on {} - {err}", hex::encode(digest)))?;
+        }
+        Ok(())
+    }
+
     /// Store the writer with an unique ID
     pub fn register_dynamic_writer(
         &self,
diff --git a/src/api2/backup/mod.rs b/src/api2/backup/mod.rs
index ea0d0292e..c308066b6 100644
--- a/src/api2/backup/mod.rs
+++ b/src/api2/backup/mod.rs
@@ -579,6 +579,8 @@ fn dynamic_append(
             .lookup_chunk(&digest)
             .ok_or_else(|| format_err!("no such chunk {}", digest_str))?;
 
+        stat_known_chunk(env, wid, &digest)?;
+
         env.dynamic_writer_append_chunk(wid, offset, size, &digest)?;
 
         env.debug(format!(
@@ -653,6 +655,8 @@ fn fixed_append(
             .lookup_chunk(&digest)
             .ok_or_else(|| format_err!("no such chunk {}", digest_str))?;
 
+        stat_known_chunk(env, wid, &digest)?;
+
         env.fixed_writer_append_chunk(wid, offset, size, &digest)?;
 
         env.debug(format!(
@@ -785,6 +789,12 @@ fn finish_backup(
 ) -> Result<Value, Error> {
     let env: &BackupEnvironment = rpcenv.as_ref();
 
+    if let Err(err) = env.stat_registered_chunks() {
+        env.debug(format!("stat registered chunks failed - {err}"));
+        update_previous_manifest_to_verify_failed(env)?;
+        bail!("stat registered chunks failed - {err}");
+    }
+
     env.finish_backup()?;
     env.log("successfully finished backup");
 
@@ -872,3 +882,37 @@ fn download_previous(
     }
     .boxed()
 }
+
+// Check if known chunk is still on disk
+//
+// If not, fail and mark previous backup snapshot as invalid so the next backup run will not use it.
+fn stat_known_chunk(env: &BackupEnvironment, wid: usize, digest: &[u8; 32]) -> Result<(), Error> {
+    if let Err(err) = env.datastore.stat_chunk(&digest) {
+        let digest_str = hex::encode(digest);
+        env.debug(format!(
+            "known chunk stat failed on {digest_str} indexed by {wid} - {err}",
+        ));
+        update_previous_manifest_to_verify_failed(env)?;
+        bail!("known chunk stat failed on {digest_str} - {err}");
+    }
+    Ok(())
+}
+
+// Update the manifest of the previous backup snapshot to be in `VerifyState::Failed`, using the
+// current backup writer tasks UPID.
+fn update_previous_manifest_to_verify_failed(env: &BackupEnvironment) -> Result<(), Error> {
+    if let Some(last) = env.last_backup.as_ref() {
+        // No need to acquire snapshot lock, already locked when starting the backup
+        let verify_state = SnapshotVerifyState {
+            state: VerifyState::Failed,
+            upid: env.worker.upid().clone(), // backp writer UPID
+        };
+        let verify_state = serde_json::to_value(verify_state)?;
+        last.backup_dir
+            .update_manifest(|manifest| {
+                manifest.unprotected["verify_state"] = verify_state;
+            })
+            .map_err(|err| format_err!("manifest update failed - {err}"))?;
+    }
+    Ok(())
+}
-- 
2.39.2



_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [pbs-devel] [RFC proxmox-backup] fix #5710: api: backup: stat known chunks during backup
  2024-09-13 13:59 [pbs-devel] [RFC proxmox-backup] fix #5710: api: backup: stat known chunks during backup Christian Ebner
@ 2024-09-13 15:23 ` Dietmar Maurer
  2024-09-17  7:19   ` Christian Ebner
  2024-09-16 10:17 ` Gabriel Goller
  1 sibling, 1 reply; 7+ messages in thread
From: Dietmar Maurer @ 2024-09-13 15:23 UTC (permalink / raw)
  To: Proxmox Backup Server development discussion, Christian Ebner


> Known chunks are expected to be present on the datastore a-priori,
> allowing clients to only re-index these chunks without uploading the
> raw chunk data. The list of reusable known chunks is send to the
> client by the server, deduced from the indexed chunks of the previous
> backup snapshot of the group.

Is it possible to only stat actually reused chunks at the end?


_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [pbs-devel] [RFC proxmox-backup] fix #5710: api: backup: stat known chunks during backup
  2024-09-13 13:59 [pbs-devel] [RFC proxmox-backup] fix #5710: api: backup: stat known chunks during backup Christian Ebner
  2024-09-13 15:23 ` Dietmar Maurer
@ 2024-09-16 10:17 ` Gabriel Goller
  2024-09-17  7:23   ` Christian Ebner
  1 sibling, 1 reply; 7+ messages in thread
From: Gabriel Goller @ 2024-09-16 10:17 UTC (permalink / raw)
  To: Proxmox Backup Server development discussion

On 13.09.2024 15:59, Christian Ebner wrote:
>In order to reduce the impact, stat the list of known chunks twice
>during backup:
>- during registration of a known chunk
>- when finishing the backup
>
>The first stat is to early on detect missing chunks, the latter
>assures that all (known and newly registered) chunks are present on
>the datastore after the backup run.
>
>If a missing chunk is detected, the backup run itself will fail and
>the previous backup snapshots verify state is set to failed.
>This prevents the same snapshot from being reused by another,
>subsequent backup job.

I wonder if we need the first option. We could only `stat` the chunks at
the end of the backup to improve performance.


_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [pbs-devel] [RFC proxmox-backup] fix #5710: api: backup: stat known chunks during backup
  2024-09-13 15:23 ` Dietmar Maurer
@ 2024-09-17  7:19   ` Christian Ebner
  2024-09-17 11:40     ` Christian Ebner
  0 siblings, 1 reply; 7+ messages in thread
From: Christian Ebner @ 2024-09-17  7:19 UTC (permalink / raw)
  To: Dietmar Maurer, Proxmox Backup Server development discussion

On 9/13/24 17:23, Dietmar Maurer wrote:
> 
>> Known chunks are expected to be present on the datastore a-priori,
>> allowing clients to only re-index these chunks without uploading the
>> raw chunk data. The list of reusable known chunks is send to the
>> client by the server, deduced from the indexed chunks of the previous
>> backup snapshot of the group.
> 
> Is it possible to only stat actually reused chunks at the end?

Yes, as is all the chunks, even not reused but indexed by the previous 
backup snapshot are stat'ed.

I can add an additional flag to the hashmap's value storing known chunks
to differentiate and only stat these.

Will send a new version of the patch.


_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [pbs-devel] [RFC proxmox-backup] fix #5710: api: backup: stat known chunks during backup
  2024-09-16 10:17 ` Gabriel Goller
@ 2024-09-17  7:23   ` Christian Ebner
  0 siblings, 0 replies; 7+ messages in thread
From: Christian Ebner @ 2024-09-17  7:23 UTC (permalink / raw)
  To: pbs-devel

On 9/16/24 12:17, Gabriel Goller wrote:
> On 13.09.2024 15:59, Christian Ebner wrote:
>> In order to reduce the impact, stat the list of known chunks twice
>> during backup:
>> - during registration of a known chunk
>> - when finishing the backup
>>
>> The first stat is to early on detect missing chunks, the latter
>> assures that all (known and newly registered) chunks are present on
>> the datastore after the backup run.
>>
>> If a missing chunk is detected, the backup run itself will fail and
>> the previous backup snapshots verify state is set to failed.
>> This prevents the same snapshot from being reused by another,
>> subsequent backup job.
> 
> I wonder if we need the first option. We could only `stat` the chunks at
> the end of the backup to improve performance.

The first stat is intended to early detect and fail the backup run on 
missing chunks, but you are right, to minimize the performance impact 
only the final stat when finishing the backup run is enough.

Will drop the first check altogether in a new version of the patch.



_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [pbs-devel] [RFC proxmox-backup] fix #5710: api: backup: stat known chunks during backup
  2024-09-17  7:19   ` Christian Ebner
@ 2024-09-17 11:40     ` Christian Ebner
  2024-09-18  8:54       ` Dietmar Maurer
  0 siblings, 1 reply; 7+ messages in thread
From: Christian Ebner @ 2024-09-17 11:40 UTC (permalink / raw)
  To: Dietmar Maurer, Proxmox Backup Server development discussion

On 9/17/24 09:19, Christian Ebner wrote:
> On 9/13/24 17:23, Dietmar Maurer wrote:
>>
>>
>> Is it possible to only stat actually reused chunks at the end?
> 
> Yes, as is all the chunks, even not reused but indexed by the previous 
> backup snapshot are stat'ed.
> 
> I can add an additional flag to the hashmap's value storing known chunks
> to differentiate and only stat these.
> 
> Will send a new version of the patch.

Unfortunately, the accounting of known chunks for fixed indices when 
using the incremental mode turns out to be harder than expected.

For this case, the index is initially copied over from the previous 
snapshot and chunk digests inserted into the known chunks hashmap.
On client side, only changed chunks are uploaded via the backup api, and 
inserted in the fixed index on the corresponding index position. The 
information which previously indexed chunk it replaced is however lacking.

Because of this, I currently tend to rather not modify the current 
behavior and stat all of the known chunks.

Any thoughts on this? Is there a better approach I currently am failing 
to see?



_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [pbs-devel] [RFC proxmox-backup] fix #5710: api: backup: stat known chunks during backup
  2024-09-17 11:40     ` Christian Ebner
@ 2024-09-18  8:54       ` Dietmar Maurer
  0 siblings, 0 replies; 7+ messages in thread
From: Dietmar Maurer @ 2024-09-18  8:54 UTC (permalink / raw)
  To: Proxmox Backup Server development discussion, Christian Ebner

> For this case, the index is initially copied over from the previous 
> snapshot and chunk digests inserted into the known chunks hashmap.
> On client side, only changed chunks are uploaded via the backup api, and 
> inserted in the fixed index on the corresponding index position. The 
> information which previously indexed chunk it replaced is however lacking.
> 
> Because of this, I currently tend to rather not modify the current 
> behavior and stat all of the known chunks.
> 
> Any thoughts on this? Is there a better approach I currently am failing 
> to see?

No, sorry. But Gabriel pointed out that this will have a negligible effect in most cases, so we can keep the current code if too difficult to implement.


_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel


^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2024-09-18  8:54 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-09-13 13:59 [pbs-devel] [RFC proxmox-backup] fix #5710: api: backup: stat known chunks during backup Christian Ebner
2024-09-13 15:23 ` Dietmar Maurer
2024-09-17  7:19   ` Christian Ebner
2024-09-17 11:40     ` Christian Ebner
2024-09-18  8:54       ` Dietmar Maurer
2024-09-16 10:17 ` Gabriel Goller
2024-09-17  7:23   ` Christian Ebner

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox
Service provided by Proxmox Server Solutions GmbH | Privacy | Legal