public inbox for pbs-devel@lists.proxmox.com
* [pbs-devel] [PATCH v4 proxmox-backup] fix #5710: api: backup: stat known chunks on backup finish
@ 2024-10-08  9:46 Christian Ebner
  2024-11-22  9:36 ` [pbs-devel] applied: " Fabian Grünbichler
  2024-11-25 21:42 ` [pbs-devel] " Thomas Lamprecht
  0 siblings, 2 replies; 5+ messages in thread
From: Christian Ebner @ 2024-10-08  9:46 UTC (permalink / raw)
  To: pbs-devel

Known chunks are expected to be present on the datastore a priori,
allowing clients to only re-index these chunks without uploading the
raw chunk data. The list of reusable known chunks is sent to the
client by the server, deduced from the indexed chunks of the previous
backup snapshot of the group.
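
As a rough illustration of that reuse mechanism, a minimal sketch
(made-up types and helper names, not the actual PBS code):

    use std::collections::HashSet;

    /// Hypothetical previous-snapshot index: (digest, length) pairs.
    fn reusable_digests(prev_index: &[([u8; 32], u32)]) -> HashSet<[u8; 32]> {
        // Every digest indexed by the previous snapshot is announced to the
        // client as "known", i.e. reusable without uploading the chunk data.
        prev_index.iter().map(|(digest, _len)| *digest).collect()
    }

    fn must_upload(known: &HashSet<[u8; 32]>, digest: &[u8; 32]) -> bool {
        // The client only uploads data for digests the server did not
        // announce as known; known digests are merely re-indexed.
        !known.contains(digest)
    }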

If, however, such a known chunk has disappeared (because the previous
backup snapshot was verified before the chunk went missing, or has not
been verified yet), the backup will still finish just fine, leading to
a seemingly successful backup. Only a subsequent verification job will
detect the backup snapshot as being corrupt.

In order to reduce the impact, stat the list of previously known
chunks when finishing the backup. If a missing chunk is detected, the
backup run itself fails and the previous backup snapshot's verify
state is set to failed.
This prevents the same snapshot from being reused by another,
subsequent backup job.

Note:
The current backup run itself might have been just fine, if the now
missing known chunk is not actually indexed by it. But since there is
no straightforward way to detect which known chunks have not been
reused in the fast incremental mode for fixed index backups, the
backup run is considered failed.

link to issue in bugtracker:
https://bugzilla.proxmox.com/show_bug.cgi?id=5710

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
Tested-by: Gabriel Goller <g.goller@proxmox.com>
Reviewed-by: Gabriel Goller <g.goller@proxmox.com>
---
Changes since version 3, thanks to Gabriel for additional comments:
- Use anyhow error context also for manifest update error
- Use `with_context` over mapping the error, which is more concise

Changes since version 2, thanks to Gabriel for testing and review:
- Use and display anyhow error context
- s/backp/backup/

Changes since version 1, thanks to Dietmar and Gabriel for feedback:
- Only stat on backup finish
- Distinguish newly uploaded from previously known chunks, to be able
  to only stat the latter.

New tests on my side show a performance degradation of ~2% for the VM
backup and about ~10% for the LXC backup compared to an unpatched
server.
In contrast to version 1 of the patches, the PBS datastore this time
was located on an NFS share backed by an NVMe SSD.

I performed vzdump backups of a VM with a 32G disk attached and an
LXC container with a Debian install and a rootfs of ca. 400M (both
powered off, no changes in data between backup runs).
Again, 5 runs each were performed after an initial run to ensure full
chunk presence on the server and a valid previous snapshot.

Here are the updated figures:

-----------------------------------------------------------
patched                    | unpatched
-----------------------------------------------------------
VM           | LXC         | VM           | LXC
-----------------------------------------------------------
14.0s ± 0.8s | 2.2s ± 0.1s | 13.7s ± 0.5s | 2.0s ± 0.03s
-----------------------------------------------------------
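
The ± values are read here as mean ± sample standard deviation over
the five runs; a minimal sketch of that computation (the sample values
are placeholders, not the measured times):

    fn mean_and_stddev(samples: &[f64]) -> (f64, f64) {
        let n = samples.len() as f64;
        let mean = samples.iter().sum::<f64>() / n;
        // Sample variance (n - 1), reasonable for only five runs.
        let var = samples.iter().map(|x| (x - mean).powi(2)).sum::<f64>() / (n - 1.0);
        (mean, var.sqrt())
    }

    fn main() {
        let vm_patched = [13.2_f64, 14.5, 14.0, 14.8, 13.5]; // placeholder run times
        let (mean, sd) = mean_and_stddev(&vm_patched);
        println!("{mean:.1}s ± {sd:.1}s");
    }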

 src/api2/backup/environment.rs | 54 +++++++++++++++++++++++++++++-----
 src/api2/backup/mod.rs         | 22 +++++++++++++-
 2 files changed, 68 insertions(+), 8 deletions(-)

diff --git a/src/api2/backup/environment.rs b/src/api2/backup/environment.rs
index 99d885e2e..19624fae3 100644
--- a/src/api2/backup/environment.rs
+++ b/src/api2/backup/environment.rs
@@ -1,4 +1,4 @@
-use anyhow::{bail, format_err, Error};
+use anyhow::{bail, format_err, Context, Error};
 use nix::dir::Dir;
 use std::collections::HashMap;
 use std::sync::{Arc, Mutex};
@@ -72,8 +72,14 @@ struct FixedWriterState {
     incremental: bool,
 }
 
-// key=digest, value=length
-type KnownChunksMap = HashMap<[u8; 32], u32>;
+#[derive(Copy, Clone)]
+struct KnownChunkInfo {
+    uploaded: bool,
+    length: u32,
+}
+
+// key=digest, value=KnownChunkInfo
+type KnownChunksMap = HashMap<[u8; 32], KnownChunkInfo>;
 
 struct SharedBackupState {
     finished: bool,
@@ -159,7 +165,13 @@ impl BackupEnvironment {
 
         state.ensure_unfinished()?;
 
-        state.known_chunks.insert(digest, length);
+        state.known_chunks.insert(
+            digest,
+            KnownChunkInfo {
+                uploaded: false,
+                length,
+            },
+        );
 
         Ok(())
     }
@@ -213,7 +225,13 @@ impl BackupEnvironment {
         }
 
         // register chunk
-        state.known_chunks.insert(digest, size);
+        state.known_chunks.insert(
+            digest,
+            KnownChunkInfo {
+                uploaded: true,
+                length: size,
+            },
+        );
 
         Ok(())
     }
@@ -248,7 +266,13 @@ impl BackupEnvironment {
         }
 
         // register chunk
-        state.known_chunks.insert(digest, size);
+        state.known_chunks.insert(
+            digest,
+            KnownChunkInfo {
+                uploaded: true,
+                length: size,
+            },
+        );
 
         Ok(())
     }
@@ -256,7 +280,23 @@ impl BackupEnvironment {
     pub fn lookup_chunk(&self, digest: &[u8; 32]) -> Option<u32> {
         let state = self.state.lock().unwrap();
 
-        state.known_chunks.get(digest).copied()
+        state
+            .known_chunks
+            .get(digest)
+            .map(|known_chunk_info| known_chunk_info.length)
+    }
+
+    /// stat known chunks from previous backup, so excluding newly uploaded ones
+    pub fn stat_prev_known_chunks(&self) -> Result<(), Error> {
+        let state = self.state.lock().unwrap();
+        for (digest, known_chunk_info) in &state.known_chunks {
+            if !known_chunk_info.uploaded {
+                self.datastore
+                    .stat_chunk(digest)
+                    .with_context(|| format!("stat failed on {}", hex::encode(digest)))?;
+            }
+        }
+        Ok(())
     }
 
     /// Store the writer with an unique ID
diff --git a/src/api2/backup/mod.rs b/src/api2/backup/mod.rs
index ea0d0292e..63c49f653 100644
--- a/src/api2/backup/mod.rs
+++ b/src/api2/backup/mod.rs
@@ -1,6 +1,6 @@
 //! Backup protocol (HTTP2 upgrade)
 
-use anyhow::{bail, format_err, Error};
+use anyhow::{bail, format_err, Context, Error};
 use futures::*;
 use hex::FromHex;
 use hyper::header::{HeaderValue, CONNECTION, UPGRADE};
@@ -785,6 +785,26 @@ fn finish_backup(
 ) -> Result<Value, Error> {
     let env: &BackupEnvironment = rpcenv.as_ref();
 
+    if let Err(err) = env.stat_prev_known_chunks() {
+        env.debug(format!("stat registered chunks failed - {err:?}"));
+
+        if let Some(last) = env.last_backup.as_ref() {
+            // No need to acquire snapshot lock, already locked when starting the backup
+            let verify_state = SnapshotVerifyState {
+                state: VerifyState::Failed,
+                upid: env.worker.upid().clone(), // backup writer UPID
+            };
+            let verify_state = serde_json::to_value(verify_state)?;
+            last.backup_dir
+                .update_manifest(|manifest| {
+                    manifest.unprotected["verify_state"] = verify_state;
+                })
+                .with_context(|| "manifest update failed")?;
+        }
+
+        bail!("stat known chunks failed - {err:?}");
+    }
+
     env.finish_backup()?;
     env.log("successfully finished backup");
 
-- 
2.39.5



_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel


* [pbs-devel] applied: [PATCH v4 proxmox-backup] fix #5710: api: backup: stat known chunks on backup finish
  2024-10-08  9:46 [pbs-devel] [PATCH v4 proxmox-backup] fix #5710: api: backup: stat known chunks on backup finish Christian Ebner
@ 2024-11-22  9:36 ` Fabian Grünbichler
  2024-11-25 21:42 ` [pbs-devel] " Thomas Lamprecht
  1 sibling, 0 replies; 5+ messages in thread
From: Fabian Grünbichler @ 2024-11-22  9:36 UTC (permalink / raw)
  To: Christian Ebner, pbs-devel

Quoting Christian Ebner (2024-10-08 11:46:17)
> Known chunks are expected to be present on the datastore a priori,
> allowing clients to only re-index these chunks without uploading the
> raw chunk data. The list of reusable known chunks is sent to the
> client by the server, deduced from the indexed chunks of the previous
> backup snapshot of the group.
>
> [...]

* Re: [pbs-devel] [PATCH v4 proxmox-backup] fix #5710: api: backup: stat known chunks on backup finish
  2024-10-08  9:46 [pbs-devel] [PATCH v4 proxmox-backup] fix #5710: api: backup: stat known chunks on backup finish Christian Ebner
  2024-11-22  9:36 ` [pbs-devel] applied: " Fabian Grünbichler
@ 2024-11-25 21:42 ` Thomas Lamprecht
  2024-11-26  7:36   ` Christian Ebner
  1 sibling, 1 reply; 5+ messages in thread
From: Thomas Lamprecht @ 2024-11-25 21:42 UTC (permalink / raw)
  To: Proxmox Backup Server development discussion, Christian Ebner

On 08.10.24 at 11:46, Christian Ebner wrote:
> [...]
>
> New tests on my side show a performance degradation of ~2% for the VM
> backup and about ~10% for the LXC backup compared to an unpatched
> server.
>
> [...]
>
> Here are the updated figures:
>
> -----------------------------------------------------------
> patched                    | unpatched
> -----------------------------------------------------------
> VM           | LXC         | VM           | LXC
> -----------------------------------------------------------
> 14.0s ± 0.8s | 2.2s ± 0.1s | 13.7s ± 0.5s | 2.0s ± 0.03s
> -----------------------------------------------------------

please include this stuff in the actual commit message, it's nice to see as
a point-in-time sample when reading the git log.
A comparison with bigger disks, say 1 TB, would additionally be great to see
how this scales with disk size.
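
As a back-of-envelope for that scaling, a sketch with assumed numbers
only (fixed-index backups use 4 MiB chunks; the per-chunk stat cost is
purely hypothetical and depends heavily on the storage):

    fn main() {
        let disk_bytes: u64 = 1 << 40; // 1 TiB disk
        let chunk_bytes: u64 = 4 * 1024 * 1024; // 4 MiB fixed-size chunks
        let chunks = disk_bytes / chunk_bytes; // 262144 chunks to stat on finish
        let stat_us: f64 = 50.0; // assumed per-chunk stat cost in microseconds
        println!(
            "{chunks} chunks, ~{:.1}s additional finish time",
            chunks as f64 * stat_us / 1_000_000.0
        );
    }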




* Re: [pbs-devel] [PATCH v4 proxmox-backup] fix #5710: api: backup: stat known chunks on backup finish
  2024-11-25 21:42 ` [pbs-devel] " Thomas Lamprecht
@ 2024-11-26  7:36   ` Christian Ebner
  2024-11-26 10:12     ` Thomas Lamprecht
  0 siblings, 1 reply; 5+ messages in thread
From: Christian Ebner @ 2024-11-26  7:36 UTC (permalink / raw)
  To: Thomas Lamprecht, Proxmox Backup Server development discussion

On 11/25/24 22:42, Thomas Lamprecht wrote:
> please include this stuff in the actual commit message, it's nice to see as
> a point-in-time sample when reading the git log.
> A comparison with bigger disks, say 1 TB, would additionally be great to see
> how this scales with disk size.

Thanks for the feedback!

I decided not to include these in the patch directly, as the tests
performed were limited in extent and setup, so I was unsure how
representative they actually are.

I will, however, keep this in mind for next time, as this has already
been applied as is.





* Re: [pbs-devel] [PATCH v4 proxmox-backup] fix #5710: api: backup: stat known chunks on backup finish
  2024-11-26  7:36   ` Christian Ebner
@ 2024-11-26 10:12     ` Thomas Lamprecht
  0 siblings, 0 replies; 5+ messages in thread
From: Thomas Lamprecht @ 2024-11-26 10:12 UTC (permalink / raw)
  To: Christian Ebner, Proxmox Backup Server development discussion

On 26.11.24 at 08:36, Christian Ebner wrote:
> On 11/25/24 22:42, Thomas Lamprecht wrote:
>> please include this stuff in the actual commit message, it's nice to see as
>> a point-in-time sample when reading the git log.
>> A comparison with bigger disks, say 1 TB, would additionally be great to see
>> how this scales with disk size.
> 
> Thanks for the feedback!
>
> I decided not to include these in the patch directly, as the tests
> performed were limited in extent and setup, so I was unsure how
> representative they actually are.

In such cases it's fine to include them with exactly such a disclaimer.

> I will, however, keep this in mind for next time, as this has already
> been applied as is.

The data is out there, so that's already good; I just have a strong
preference for having these things in the commit log too.





