* [pbs-devel] [PATCH proxmox-backup] Revert "fix #5710: api: backup: stat known chunks on backup finish"
@ 2024-12-12 7:52 Christian Ebner
2025-01-07 8:36 ` Christian Ebner
2025-01-13 10:36 ` [pbs-devel] applied: " Fabian Grünbichler
0 siblings, 2 replies; 7+ messages in thread
From: Christian Ebner @ 2024-12-12 7:52 UTC (permalink / raw)
To: pbs-devel
Commit da11d226 ("fix #5710: api: backup: stat known chunks on backup
finish") introduced a seemingly cheap server-side check to verify the
existence of known chunks in the chunk store by stat'ing them. This
check, however, does not scale for large backup snapshots, which might
contain millions of known chunks, as reported in the community forum [0].

Revert the changes for now instead of making them opt-in/opt-out; a
more general approach has to be worked out to mark backup snapshots
which fail verification.
Link to the report in the forum:
[0] https://forum.proxmox.com/threads/158812/
Fixes: da11d226 ("fix #5710: api: backup: stat known chunks on backup finish")
Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
src/api2/backup/environment.rs | 54 +++++-----------------------------
src/api2/backup/mod.rs | 22 +-------------
2 files changed, 8 insertions(+), 68 deletions(-)
diff --git a/src/api2/backup/environment.rs b/src/api2/backup/environment.rs
index 19624fae3..99d885e2e 100644
--- a/src/api2/backup/environment.rs
+++ b/src/api2/backup/environment.rs
@@ -1,4 +1,4 @@
-use anyhow::{bail, format_err, Context, Error};
+use anyhow::{bail, format_err, Error};
use nix::dir::Dir;
use std::collections::HashMap;
use std::sync::{Arc, Mutex};
@@ -72,14 +72,8 @@ struct FixedWriterState {
incremental: bool,
}
-#[derive(Copy, Clone)]
-struct KnownChunkInfo {
- uploaded: bool,
- length: u32,
-}
-
-// key=digest, value=KnownChunkInfo
-type KnownChunksMap = HashMap<[u8; 32], KnownChunkInfo>;
+// key=digest, value=length
+type KnownChunksMap = HashMap<[u8; 32], u32>;
struct SharedBackupState {
finished: bool,
@@ -165,13 +159,7 @@ impl BackupEnvironment {
state.ensure_unfinished()?;
- state.known_chunks.insert(
- digest,
- KnownChunkInfo {
- uploaded: false,
- length,
- },
- );
+ state.known_chunks.insert(digest, length);
Ok(())
}
@@ -225,13 +213,7 @@ impl BackupEnvironment {
}
// register chunk
- state.known_chunks.insert(
- digest,
- KnownChunkInfo {
- uploaded: true,
- length: size,
- },
- );
+ state.known_chunks.insert(digest, size);
Ok(())
}
@@ -266,13 +248,7 @@ impl BackupEnvironment {
}
// register chunk
- state.known_chunks.insert(
- digest,
- KnownChunkInfo {
- uploaded: true,
- length: size,
- },
- );
+ state.known_chunks.insert(digest, size);
Ok(())
}
@@ -280,23 +256,7 @@ impl BackupEnvironment {
pub fn lookup_chunk(&self, digest: &[u8; 32]) -> Option<u32> {
let state = self.state.lock().unwrap();
- state
- .known_chunks
- .get(digest)
- .map(|known_chunk_info| known_chunk_info.length)
- }
-
- /// stat known chunks from previous backup, so excluding newly uploaded ones
- pub fn stat_prev_known_chunks(&self) -> Result<(), Error> {
- let state = self.state.lock().unwrap();
- for (digest, known_chunk_info) in &state.known_chunks {
- if !known_chunk_info.uploaded {
- self.datastore
- .stat_chunk(digest)
- .with_context(|| format!("stat failed on {}", hex::encode(digest)))?;
- }
- }
- Ok(())
+ state.known_chunks.get(digest).copied()
}
/// Store the writer with an unique ID
diff --git a/src/api2/backup/mod.rs b/src/api2/backup/mod.rs
index 31334b59c..0373d135b 100644
--- a/src/api2/backup/mod.rs
+++ b/src/api2/backup/mod.rs
@@ -1,6 +1,6 @@
//! Backup protocol (HTTP2 upgrade)
-use anyhow::{bail, format_err, Context, Error};
+use anyhow::{bail, format_err, Error};
use futures::*;
use hex::FromHex;
use hyper::header::{HeaderValue, CONNECTION, UPGRADE};
@@ -788,26 +788,6 @@ fn finish_backup(
) -> Result<Value, Error> {
let env: &BackupEnvironment = rpcenv.as_ref();
- if let Err(err) = env.stat_prev_known_chunks() {
- env.debug(format!("stat registered chunks failed - {err:?}"));
-
- if let Some(last) = env.last_backup.as_ref() {
- // No need to acquire snapshot lock, already locked when starting the backup
- let verify_state = SnapshotVerifyState {
- state: VerifyState::Failed,
- upid: env.worker.upid().clone(), // backup writer UPID
- };
- let verify_state = serde_json::to_value(verify_state)?;
- last.backup_dir
- .update_manifest(|manifest| {
- manifest.unprotected["verify_state"] = verify_state;
- })
- .with_context(|| "manifest update failed")?;
- }
-
- bail!("stat known chunks failed - {err:?}");
- }
-
env.finish_backup()?;
env.log("successfully finished backup");
--
2.39.5
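For context on the scaling concern above: the reverted check boils down to one
stat() per previously-known chunk when a backup finishes. A minimal sketch of
that access pattern follows; the chunk-store layout and the helper below are
illustrative assumptions, not the actual PBS implementation:

use std::path::Path;

/// Illustration only: one metadata lookup (stat) per known chunk.
/// With millions of known chunks and a cold cache, this means millions
/// of random disk accesses at the very end of every backup.
fn stat_known_chunks(chunk_dir: &Path, digests: &[[u8; 32]]) -> std::io::Result<()> {
    for digest in digests {
        let hex = hex::encode(digest); // the `hex` crate, as already used in the patch
        // assumed layout: <chunk_dir>/<first four hex chars>/<full hex digest>
        let path = chunk_dir.join(&hex[..4]).join(&hex);
        std::fs::metadata(&path)?; // the per-chunk stat() call
    }
    Ok(())
}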
* Re: [pbs-devel] [PATCH proxmox-backup] Revert "fix #5710: api: backup: stat known chunks on backup finish"
2024-12-12 7:52 [pbs-devel] [PATCH proxmox-backup] Revert "fix #5710: api: backup: stat known chunks on backup finish" Christian Ebner
@ 2025-01-07 8:36 ` Christian Ebner
2025-01-09 10:29 ` Mark Schouten
2025-01-13 10:36 ` [pbs-devel] applied: " Fabian Grünbichler
1 sibling, 1 reply; 7+ messages in thread
From: Christian Ebner @ 2025-01-07 8:36 UTC (permalink / raw)
To: pbs-devel
Ping,
more users are running into this:
https://forum.proxmox.com/threads/158566/
https://forum.proxmox.com/threads/158812/
https://forum.proxmox.com/threads/160097/
* Re: [pbs-devel] [PATCH proxmox-backup] Revert "fix #5710: api: backup: stat known chunks on backup finish"
2025-01-07 8:36 ` Christian Ebner
@ 2025-01-09 10:29 ` Mark Schouten
0 siblings, 0 replies; 7+ messages in thread
From: Mark Schouten @ 2025-01-09 10:29 UTC (permalink / raw)
To: Proxmox Backup Server development discussion, pbs-devel
As do I. 🙂
—
Mark Schouten
CTO, Tuxis B.V.
+31 318 200208 / mark@tuxis.nl
------ Original Message ------
From "Christian Ebner" <c.ebner@proxmox.com>
To pbs-devel@lists.proxmox.com
Date 07/01/2025 09:36:09
Subject Re: [pbs-devel] [PATCH proxmox-backup] Revert "fix #5710: api:
backup: stat known chunks on backup finish"
>Ping,
>
>more users are running into this:
>https://forum.proxmox.com/threads/158566/
>https://forum.proxmox.com/threads/158812/
>https://forum.proxmox.com/threads/160097/
>
* [pbs-devel] applied: [PATCH proxmox-backup] Revert "fix #5710: api: backup: stat known chunks on backup finish"
2024-12-12 7:52 [pbs-devel] [PATCH proxmox-backup] Revert "fix #5710: api: backup: stat known chunks on backup finish" Christian Ebner
2025-01-07 8:36 ` Christian Ebner
@ 2025-01-13 10:36 ` Fabian Grünbichler
2025-01-13 10:47 ` Christian Ebner
1 sibling, 1 reply; 7+ messages in thread
From: Fabian Grünbichler @ 2025-01-13 10:36 UTC (permalink / raw)
To: Proxmox Backup Server development discussion
we should probably spec out some potential replacement approaches?
On December 12, 2024 8:52 am, Christian Ebner wrote:
> Commit da11d226 ("fix #5710: api: backup: stat known chunks on backup
> finish") introduced a seemingly cheap server-side check to verify the
> existence of known chunks in the chunk store by stat'ing them. This
> check, however, does not scale for large backup snapshots, which might
> contain millions of known chunks, as reported in the community forum [0].
>
> Revert the changes for now instead of making them opt-in/opt-out; a
> more general approach has to be worked out to mark backup snapshots
> which fail verification.
>
> Link to the report in the forum:
> [0] https://forum.proxmox.com/threads/158812/
>
> Fixes: da11d226 ("fix #5710: api: backup: stat known chunks on backup finish")
> Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
> ---
> src/api2/backup/environment.rs | 54 +++++-----------------------------
> src/api2/backup/mod.rs | 22 +-------------
> 2 files changed, 8 insertions(+), 68 deletions(-)
>
> diff --git a/src/api2/backup/environment.rs b/src/api2/backup/environment.rs
> index 19624fae3..99d885e2e 100644
> --- a/src/api2/backup/environment.rs
> +++ b/src/api2/backup/environment.rs
> @@ -1,4 +1,4 @@
> -use anyhow::{bail, format_err, Context, Error};
> +use anyhow::{bail, format_err, Error};
> use nix::dir::Dir;
> use std::collections::HashMap;
> use std::sync::{Arc, Mutex};
> @@ -72,14 +72,8 @@ struct FixedWriterState {
> incremental: bool,
> }
>
> -#[derive(Copy, Clone)]
> -struct KnownChunkInfo {
> - uploaded: bool,
> - length: u32,
> -}
> -
> -// key=digest, value=KnownChunkInfo
> -type KnownChunksMap = HashMap<[u8; 32], KnownChunkInfo>;
> +// key=digest, value=length
> +type KnownChunksMap = HashMap<[u8; 32], u32>;
>
> struct SharedBackupState {
> finished: bool,
> @@ -165,13 +159,7 @@ impl BackupEnvironment {
>
> state.ensure_unfinished()?;
>
> - state.known_chunks.insert(
> - digest,
> - KnownChunkInfo {
> - uploaded: false,
> - length,
> - },
> - );
> + state.known_chunks.insert(digest, length);
>
> Ok(())
> }
> @@ -225,13 +213,7 @@ impl BackupEnvironment {
> }
>
> // register chunk
> - state.known_chunks.insert(
> - digest,
> - KnownChunkInfo {
> - uploaded: true,
> - length: size,
> - },
> - );
> + state.known_chunks.insert(digest, size);
>
> Ok(())
> }
> @@ -266,13 +248,7 @@ impl BackupEnvironment {
> }
>
> // register chunk
> - state.known_chunks.insert(
> - digest,
> - KnownChunkInfo {
> - uploaded: true,
> - length: size,
> - },
> - );
> + state.known_chunks.insert(digest, size);
>
> Ok(())
> }
> @@ -280,23 +256,7 @@ impl BackupEnvironment {
> pub fn lookup_chunk(&self, digest: &[u8; 32]) -> Option<u32> {
> let state = self.state.lock().unwrap();
>
> - state
> - .known_chunks
> - .get(digest)
> - .map(|known_chunk_info| known_chunk_info.length)
> - }
> -
> - /// stat known chunks from previous backup, so excluding newly uploaded ones
> - pub fn stat_prev_known_chunks(&self) -> Result<(), Error> {
> - let state = self.state.lock().unwrap();
> - for (digest, known_chunk_info) in &state.known_chunks {
> - if !known_chunk_info.uploaded {
> - self.datastore
> - .stat_chunk(digest)
> - .with_context(|| format!("stat failed on {}", hex::encode(digest)))?;
> - }
> - }
> - Ok(())
> + state.known_chunks.get(digest).copied()
> }
>
> /// Store the writer with an unique ID
> diff --git a/src/api2/backup/mod.rs b/src/api2/backup/mod.rs
> index 31334b59c..0373d135b 100644
> --- a/src/api2/backup/mod.rs
> +++ b/src/api2/backup/mod.rs
> @@ -1,6 +1,6 @@
> //! Backup protocol (HTTP2 upgrade)
>
> -use anyhow::{bail, format_err, Context, Error};
> +use anyhow::{bail, format_err, Error};
> use futures::*;
> use hex::FromHex;
> use hyper::header::{HeaderValue, CONNECTION, UPGRADE};
> @@ -788,26 +788,6 @@ fn finish_backup(
> ) -> Result<Value, Error> {
> let env: &BackupEnvironment = rpcenv.as_ref();
>
> - if let Err(err) = env.stat_prev_known_chunks() {
> - env.debug(format!("stat registered chunks failed - {err:?}"));
> -
> - if let Some(last) = env.last_backup.as_ref() {
> - // No need to acquire snapshot lock, already locked when starting the backup
> - let verify_state = SnapshotVerifyState {
> - state: VerifyState::Failed,
> - upid: env.worker.upid().clone(), // backup writer UPID
> - };
> - let verify_state = serde_json::to_value(verify_state)?;
> - last.backup_dir
> - .update_manifest(|manifest| {
> - manifest.unprotected["verify_state"] = verify_state;
> - })
> - .with_context(|| "manifest update failed")?;
> - }
> -
> - bail!("stat known chunks failed - {err:?}");
> - }
> -
> env.finish_backup()?;
> env.log("successfully finished backup");
>
> --
> 2.39.5
>
* Re: [pbs-devel] applied: [PATCH proxmox-backup] Revert "fix #5710: api: backup: stat known chunks on backup finish"
2025-01-13 10:36 ` [pbs-devel] applied: " Fabian Grünbichler
@ 2025-01-13 10:47 ` Christian Ebner
2025-01-13 12:48 ` Fabian Grünbichler
0 siblings, 1 reply; 7+ messages in thread
From: Christian Ebner @ 2025-01-13 10:47 UTC (permalink / raw)
To: Proxmox Backup Server development discussion, Fabian Grünbichler
On 1/13/25 11:36, Fabian Grünbichler wrote:
> we should probably spec out some potential replacement approaches?
Agreed, but I would nevertheless argue for reverting this for the time
being: a new approach will not use the same logic. What do you suggest?
* Re: [pbs-devel] applied: [PATCH proxmox-backup] Revert "fix #5710: api: backup: stat known chunks on backup finish"
2025-01-13 10:47 ` Christian Ebner
@ 2025-01-13 12:48 ` Fabian Grünbichler
2025-01-13 13:10 ` Christian Ebner
0 siblings, 1 reply; 7+ messages in thread
From: Fabian Grünbichler @ 2025-01-13 12:48 UTC (permalink / raw)
To: Christian Ebner, Proxmox Backup Server development discussion
On January 13, 2025 11:47 am, Christian Ebner wrote:
> On 1/13/25 11:36, Fabian Grünbichler wrote:
>> we should probably spec out some potential replacement approaches?
>
> Agreed, but I would nevertheless argue for reverting this for the time
> being: a new approach will not use the same logic. What do you suggest?
yes, the revert is already applied now :)
we discussed some potential approaches internally a few weeks back; IIRC,
some of the potential mechanisms were:
A) actively mark all snapshots referencing a certain chunk as corrupt
when we detect the chunk corruption; this probably needs to be combined
with some sort of reverse map from chunk to list of snapshots in order
to scale (a rough sketch follows below)
B) mark all snapshots in the chain after the corruption as potentially
corrupt (new state) and incorporate that into the reverify logic to
detect other snapshots with a high likelihood of being affected quickly
C) notify active backup writers of chunks or snapshots being marked as
corrupt. chunks can be cross-referenced at the end with re-used chunks,
and snapshots could trigger stat logic similar to the one we just
reverted in case the snapshot is in the same group as the backup writer
D) ..?
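For A), a minimal sketch of what such a reverse map could look like; the types
and names below are hypothetical and not the existing PBS datastore API:

use std::collections::{HashMap, HashSet};
use std::path::PathBuf;

/// Hypothetical reverse index: chunk digest -> snapshots referencing it.
/// Would have to be built and kept up to date alongside the index files.
#[derive(Default)]
struct ChunkReverseMap {
    by_digest: HashMap<[u8; 32], HashSet<PathBuf>>,
}

impl ChunkReverseMap {
    /// Record that `snapshot` references the chunk with `digest`.
    fn insert(&mut self, digest: [u8; 32], snapshot: PathBuf) {
        self.by_digest.entry(digest).or_default().insert(snapshot);
    }

    /// Snapshots that would have to be marked corrupt once `digest` is
    /// detected as corrupt (approach A).
    fn referencing_snapshots(&self, digest: &[u8; 32]) -> Option<&HashSet<PathBuf>> {
        self.by_digest.get(digest)
    }
}

marking would then amount to updating verify_state in each referenced
snapshot's manifest, similar to what the reverted finish_backup() hunk did for
the previous snapshot; persisting and maintaining such an index cheaply is the
part that needs speccing out.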
* Re: [pbs-devel] applied: [PATCH proxmox-backup] Revert "fix #5710: api: backup: stat known chunks on backup finish"
2025-01-13 12:48 ` Fabian Grünbichler
@ 2025-01-13 13:10 ` Christian Ebner
0 siblings, 0 replies; 7+ messages in thread
From: Christian Ebner @ 2025-01-13 13:10 UTC (permalink / raw)
To: Fabian Grünbichler, Proxmox Backup Server development discussion
On 1/13/25 13:48, Fabian Grünbichler wrote:
> On January 13, 2025 11:47 am, Christian Ebner wrote:
>> On 1/13/25 11:36, Fabian Grünbichler wrote:
>>> we should probably spec out some potential replacement approaches?
>>
>> Agreed, but I would nevertheless argue for reverting this for the time
>> being: a new approach will not use the same logic. What do you suggest?
>
> yes, the revert is already applied now :)
>
> we discussed some potential approaches internally a few weeks back; IIRC,
> some of the potential mechanisms were:
>
> A) actively mark all snapshots referencing a certain chunk as corrupt
> when we detect the chunk corruption; this probably needs to be combined
> with some sort of reverse map from chunk to list of snapshots in order
> to scale
>
>
> B) mark all snapshots in the chain after the corruption as potentially
> corrupt (new state) and incorporate that into the reverify logic to
> detect other snapshots with a high likelihood of being affected quickly
>
> C) notify active backup writers of chunks or snapshots being marked as
> corrupt. chunks can be cross-referenced at the end with re-used chunks,
> and snapshots could trigger stat logic similar to the one we just
> reverted in case the snapshot is in the same group as the backup writer
>
> D) ..?
Thanks!
I did update the issue in the bugtracker accordingly and linked to your
suggestions for reference; I will have a closer look at how to approach
this once again.