public inbox for pbs-devel@lists.proxmox.com
* [pbs-devel] [PATCH proxmox-backup v3 0/3] fix #3786: resync corrupt chunks in sync-job
@ 2024-11-05 10:40 Gabriel Goller
  2024-11-05 10:40 ` [pbs-devel] [PATCH proxmox-backup v3 1/3] fix #3786: api: add resync-corrupt option to sync jobs Gabriel Goller
                   ` (3 more replies)
  0 siblings, 4 replies; 11+ messages in thread
From: Gabriel Goller @ 2024-11-05 10:40 UTC (permalink / raw)
  To: pbs-devel

Add an option `resync-corrupt` that resyncs corrupt snapshots when running
a sync-job. This option checks if the local snapshot failed the last
verification and, if it did, overwrites the local snapshot with the
remote one.

This is quite useful, as we currently don't have any way to "fix"
broken chunks/snapshots, even if a healthy version is available on
another (e.g. offsite) instance.

Important things to note: this has a slight performance penalty, as all
the manifests have to be inspected, and a verification job has to be run
beforehand, otherwise we do not know whether a snapshot is healthy.
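
Once applied, the option can also be toggled on an existing sync-job via
the CLI, along these lines (the job id here is made up):

    # proxmox-backup-manager sync-job update s-7c9a3f21 --resync-corrupt true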

Note: This series was originally written by Shannon! I just picked it
up, rebased it, and addressed the obvious comments on the last series.

Changelog v3 (thanks @Fabian):
 - filter out snapshots earlier in the pull_group function
 - move verify_state to BackupManifest and fix its invocations
 - revert verify_state Option -> Result change (it doesn't matter if we get an
   error, we get that quite often, e.g. in new backups)
 - remove some unnecessary log lines
 - remove some unnecessary imports and modifications
 - rebase to current master

Changelog v2 (thanks @Thomas):
 - order git trailers
 - adjusted schema description to include broken indexes
 - change verify_state to return a Result<_,_>
 - print error if verify_state is not able to read the state
 - update docs on pull_snapshot function
 - simplify logic by combining flags
 - move log line out of loop to only print once that we resync the snapshot

Changelog since RFC (Shannon's work):
 - rename option from deep-sync to resync-corrupt
 - rebase on latest master (and change implementation details, as a 
     lot has changed around sync-jobs)

proxmox-backup:

Gabriel Goller (3):
  fix #3786: api: add resync-corrupt option to sync jobs
  fix #3786: ui/cli: add resync-corrupt option on sync-jobs
  fix #3786: docs: add resync-corrupt option to sync-job

 docs/managing-remotes.rst         |  6 +++
 pbs-api-types/src/jobs.rs         | 10 +++++
 pbs-datastore/src/backup_info.rs  | 12 +++++-
 pbs-datastore/src/manifest.rs     | 13 ++++++-
 src/api2/config/sync.rs           |  4 ++
 src/api2/pull.rs                  |  9 ++++-
 src/bin/proxmox-backup-manager.rs | 13 ++++++-
 src/server/pull.rs                | 62 +++++++++++++++++++++++--------
 www/window/SyncJobEdit.js         | 11 ++++++
 9 files changed, 119 insertions(+), 21 deletions(-)


Summary over all repositories:
  9 files changed, 119 insertions(+), 21 deletions(-)

-- 
Generated by git-murpp 0.7.1



* [pbs-devel] [PATCH proxmox-backup v3 1/3] fix #3786: api: add resync-corrupt option to sync jobs
  2024-11-05 10:40 [pbs-devel] [PATCH proxmox-backup v3 0/3] fix #3786: resync corrupt chunks in sync-job Gabriel Goller
@ 2024-11-05 10:40 ` Gabriel Goller
  2024-11-20 13:11   ` Fabian Grünbichler
  2024-11-05 10:40 ` [pbs-devel] [PATCH proxmox-backup v3 2/3] fix #3786: ui/cli: add resync-corrupt option on sync-jobs Gabriel Goller
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 11+ messages in thread
From: Gabriel Goller @ 2024-11-05 10:40 UTC (permalink / raw)
  To: pbs-devel

This option allows us to "fix" corrupt snapshots (and/or their chunks)
by pulling them from another remote. When traversing the remote
snapshots, we check if each one exists locally, and if it does, whether
its last verification failed. If the local snapshot is broken and the
`resync-corrupt` option is turned on, we pull in the remote snapshot,
overwriting the local one.

This is very useful and has been requested a lot, as there is currently
no way to "fix" corrupt chunks/snapshots even if the user has a healthy
version of them on their offsite instance.

Originally-by: Shannon Sterz <s.sterz@proxmox.com>
Signed-off-by: Gabriel Goller <g.goller@proxmox.com>
---
 pbs-api-types/src/jobs.rs         | 10 +++++
 pbs-datastore/src/backup_info.rs  | 12 +++++-
 pbs-datastore/src/manifest.rs     | 13 ++++++-
 src/api2/config/sync.rs           |  4 ++
 src/api2/pull.rs                  |  9 ++++-
 src/bin/proxmox-backup-manager.rs |  4 +-
 src/server/pull.rs                | 62 +++++++++++++++++++++++--------
 7 files changed, 93 insertions(+), 21 deletions(-)

diff --git a/pbs-api-types/src/jobs.rs b/pbs-api-types/src/jobs.rs
index 868702bc059e..58f739ad00b5 100644
--- a/pbs-api-types/src/jobs.rs
+++ b/pbs-api-types/src/jobs.rs
@@ -498,6 +498,10 @@ pub const TRANSFER_LAST_SCHEMA: Schema =
         .minimum(1)
         .schema();
 
+pub const RESYNC_CORRUPT_SCHEMA: Schema =
+    BooleanSchema::new("If the verification failed for a local snapshot, try to pull it again.")
+        .schema();
+
 #[api(
     properties: {
         id: {
@@ -552,6 +556,10 @@ pub const TRANSFER_LAST_SCHEMA: Schema =
             schema: TRANSFER_LAST_SCHEMA,
             optional: true,
         },
+        "resync-corrupt": {
+            schema: RESYNC_CORRUPT_SCHEMA,
+            optional: true,
+        }
     }
 )]
 #[derive(Serialize, Deserialize, Clone, Updater, PartialEq)]
@@ -585,6 +593,8 @@ pub struct SyncJobConfig {
     pub limit: RateLimitConfig,
     #[serde(skip_serializing_if = "Option::is_none")]
     pub transfer_last: Option<usize>,
+    #[serde(skip_serializing_if = "Option::is_none")]
+    pub resync_corrupt: Option<bool>,
 }
 
 impl SyncJobConfig {
diff --git a/pbs-datastore/src/backup_info.rs b/pbs-datastore/src/backup_info.rs
index 414ec878d01a..e6174322dad6 100644
--- a/pbs-datastore/src/backup_info.rs
+++ b/pbs-datastore/src/backup_info.rs
@@ -8,7 +8,8 @@ use anyhow::{bail, format_err, Error};
 use proxmox_sys::fs::{lock_dir_noblock, replace_file, CreateOptions};
 
 use pbs_api_types::{
-    Authid, BackupNamespace, BackupType, GroupFilter, BACKUP_DATE_REGEX, BACKUP_FILE_REGEX,
+    Authid, BackupNamespace, BackupType, GroupFilter, VerifyState, BACKUP_DATE_REGEX,
+    BACKUP_FILE_REGEX,
 };
 use pbs_config::{open_backup_lockfile, BackupLockGuard};
 
@@ -583,6 +584,15 @@ impl BackupDir {
 
         Ok(())
     }
+
+    /// Load the verify state from the manifest.
+    pub fn verify_state(&self) -> Option<VerifyState> {
+        if let Ok(manifest) = self.load_manifest() {
+            manifest.0.verify_state()
+        } else {
+            None
+        }
+    }
 }
 
 impl AsRef<pbs_api_types::BackupNamespace> for BackupDir {
diff --git a/pbs-datastore/src/manifest.rs b/pbs-datastore/src/manifest.rs
index c3df014272a0..623c1499c0bb 100644
--- a/pbs-datastore/src/manifest.rs
+++ b/pbs-datastore/src/manifest.rs
@@ -5,7 +5,7 @@ use anyhow::{bail, format_err, Error};
 use serde::{Deserialize, Serialize};
 use serde_json::{json, Value};
 
-use pbs_api_types::{BackupType, CryptMode, Fingerprint};
+use pbs_api_types::{BackupType, CryptMode, Fingerprint, SnapshotVerifyState, VerifyState};
 use pbs_tools::crypt_config::CryptConfig;
 
 pub const MANIFEST_BLOB_NAME: &str = "index.json.blob";
@@ -242,6 +242,17 @@ impl BackupManifest {
         let manifest: BackupManifest = serde_json::from_value(json)?;
         Ok(manifest)
     }
+
+    /// Get the verify state of the snapshot
+    ///
+    /// Note: New snapshots, which have not been verified yet, do not have a status and this
+    /// function will return `None`.
+    pub fn verify_state(&self) -> Option<VerifyState> {
+        let verify = self.unprotected["verify_state"].clone();
+        serde_json::from_value::<SnapshotVerifyState>(verify)
+            .map(|svs| svs.state)
+            .ok()
+    }
 }
 
 impl TryFrom<super::DataBlob> for BackupManifest {
diff --git a/src/api2/config/sync.rs b/src/api2/config/sync.rs
index 6fdc69a9e645..fa9db92f3d11 100644
--- a/src/api2/config/sync.rs
+++ b/src/api2/config/sync.rs
@@ -368,6 +368,9 @@ pub fn update_sync_job(
     if let Some(transfer_last) = update.transfer_last {
         data.transfer_last = Some(transfer_last);
     }
+    if let Some(resync_corrupt) = update.resync_corrupt {
+        data.resync_corrupt = Some(resync_corrupt);
+    }
 
     if update.limit.rate_in.is_some() {
         data.limit.rate_in = update.limit.rate_in;
@@ -527,6 +530,7 @@ acl:1:/remote/remote1/remotestore1:write@pbs:RemoteSyncOperator
         ns: None,
         owner: Some(write_auth_id.clone()),
         comment: None,
+        resync_corrupt: None,
         remove_vanished: None,
         max_depth: None,
         group_filter: None,
diff --git a/src/api2/pull.rs b/src/api2/pull.rs
index e733c9839e3a..0d4be0e2d228 100644
--- a/src/api2/pull.rs
+++ b/src/api2/pull.rs
@@ -10,7 +10,7 @@ use pbs_api_types::{
     Authid, BackupNamespace, GroupFilter, RateLimitConfig, SyncJobConfig, DATASTORE_SCHEMA,
     GROUP_FILTER_LIST_SCHEMA, NS_MAX_DEPTH_REDUCED_SCHEMA, PRIV_DATASTORE_BACKUP,
     PRIV_DATASTORE_PRUNE, PRIV_REMOTE_READ, REMOTE_ID_SCHEMA, REMOVE_VANISHED_BACKUPS_SCHEMA,
-    TRANSFER_LAST_SCHEMA,
+    RESYNC_CORRUPT_SCHEMA, TRANSFER_LAST_SCHEMA,
 };
 use pbs_config::CachedUserInfo;
 use proxmox_human_byte::HumanByte;
@@ -89,6 +89,7 @@ impl TryFrom<&SyncJobConfig> for PullParameters {
             sync_job.group_filter.clone(),
             sync_job.limit.clone(),
             sync_job.transfer_last,
+            sync_job.resync_corrupt,
         )
     }
 }
@@ -240,6 +241,10 @@ pub fn do_sync_job(
                 schema: TRANSFER_LAST_SCHEMA,
                 optional: true,
             },
+            "resync-corrupt": {
+                schema: RESYNC_CORRUPT_SCHEMA,
+                optional: true,
+            },
         },
     },
     access: {
@@ -264,6 +269,7 @@ async fn pull(
     group_filter: Option<Vec<GroupFilter>>,
     limit: RateLimitConfig,
     transfer_last: Option<usize>,
+    resync_corrupt: Option<bool>,
     rpcenv: &mut dyn RpcEnvironment,
 ) -> Result<String, Error> {
     let auth_id: Authid = rpcenv.get_auth_id().unwrap().parse()?;
@@ -301,6 +307,7 @@ async fn pull(
         group_filter,
         limit,
         transfer_last,
+        resync_corrupt,
     )?;
 
     // fixme: set to_stdout to false?
diff --git a/src/bin/proxmox-backup-manager.rs b/src/bin/proxmox-backup-manager.rs
index 420e96665662..38a1cf0f5881 100644
--- a/src/bin/proxmox-backup-manager.rs
+++ b/src/bin/proxmox-backup-manager.rs
@@ -14,8 +14,8 @@ use pbs_api_types::percent_encoding::percent_encode_component;
 use pbs_api_types::{
     BackupNamespace, GroupFilter, RateLimitConfig, SyncJobConfig, DATASTORE_SCHEMA,
     GROUP_FILTER_LIST_SCHEMA, IGNORE_VERIFIED_BACKUPS_SCHEMA, NS_MAX_DEPTH_SCHEMA,
-    REMOTE_ID_SCHEMA, REMOVE_VANISHED_BACKUPS_SCHEMA, TRANSFER_LAST_SCHEMA, UPID_SCHEMA,
-    VERIFICATION_OUTDATED_AFTER_SCHEMA,
+    REMOTE_ID_SCHEMA, REMOVE_VANISHED_BACKUPS_SCHEMA, RESYNC_CORRUPT_SCHEMA, TRANSFER_LAST_SCHEMA,
+    UPID_SCHEMA, VERIFICATION_OUTDATED_AFTER_SCHEMA,
 };
 use pbs_client::{display_task_log, view_task_result};
 use pbs_config::sync;
diff --git a/src/server/pull.rs b/src/server/pull.rs
index d9584776ee7f..11a0a9d74cf3 100644
--- a/src/server/pull.rs
+++ b/src/server/pull.rs
@@ -12,7 +12,8 @@ use tracing::info;
 
 use pbs_api_types::{
     print_store_and_ns, Authid, BackupDir, BackupGroup, BackupNamespace, GroupFilter, Operation,
-    RateLimitConfig, Remote, MAX_NAMESPACE_DEPTH, PRIV_DATASTORE_AUDIT, PRIV_DATASTORE_BACKUP,
+    RateLimitConfig, Remote, VerifyState, MAX_NAMESPACE_DEPTH, PRIV_DATASTORE_AUDIT,
+    PRIV_DATASTORE_BACKUP,
 };
 use pbs_client::BackupRepository;
 use pbs_config::CachedUserInfo;
@@ -55,6 +56,8 @@ pub(crate) struct PullParameters {
     group_filter: Vec<GroupFilter>,
     /// How many snapshots should be transferred at most (taking the newest N snapshots)
     transfer_last: Option<usize>,
+    /// Whether to re-sync corrupted snapshots
+    resync_corrupt: bool,
 }
 
 impl PullParameters {
@@ -72,12 +75,14 @@ impl PullParameters {
         group_filter: Option<Vec<GroupFilter>>,
         limit: RateLimitConfig,
         transfer_last: Option<usize>,
+        resync_corrupt: Option<bool>,
     ) -> Result<Self, Error> {
         if let Some(max_depth) = max_depth {
             ns.check_max_depth(max_depth)?;
             remote_ns.check_max_depth(max_depth)?;
         };
         let remove_vanished = remove_vanished.unwrap_or(false);
+        let resync_corrupt = resync_corrupt.unwrap_or(false);
 
         let source: Arc<dyn SyncSource> = if let Some(remote) = remote {
             let (remote_config, _digest) = pbs_config::remote::config()?;
@@ -116,6 +121,7 @@ impl PullParameters {
             max_depth,
             group_filter,
             transfer_last,
+            resync_corrupt,
         })
     }
 }
@@ -323,7 +329,7 @@ async fn pull_single_archive<'a>(
 ///
 /// Pulling a snapshot consists of the following steps:
 /// - (Re)download the manifest
-/// -- if it matches, only download log and treat snapshot as already synced
+/// -- if it matches and is not corrupt, only download log and treat snapshot as already synced
 /// - Iterate over referenced files
 /// -- if file already exists, verify contents
 /// -- if not, pull it from the remote
@@ -332,6 +338,7 @@ async fn pull_snapshot<'a>(
     reader: Arc<dyn SyncSourceReader + 'a>,
     snapshot: &'a pbs_datastore::BackupDir,
     downloaded_chunks: Arc<Mutex<HashSet<[u8; 32]>>>,
+    corrupt: bool,
 ) -> Result<SyncStats, Error> {
     let mut sync_stats = SyncStats::default();
     let mut manifest_name = snapshot.full_path();
@@ -352,7 +359,7 @@ async fn pull_snapshot<'a>(
         return Ok(sync_stats);
     }
 
-    if manifest_name.exists() {
+    if manifest_name.exists() && !corrupt {
         let manifest_blob = proxmox_lang::try_block!({
             let mut manifest_file = std::fs::File::open(&manifest_name).map_err(|err| {
                 format_err!("unable to open local manifest {manifest_name:?} - {err}")
@@ -381,7 +388,7 @@ async fn pull_snapshot<'a>(
         let mut path = snapshot.full_path();
         path.push(&item.filename);
 
-        if path.exists() {
+        if !corrupt && path.exists() {
             match ArchiveType::from_path(&item.filename)? {
                 ArchiveType::DynamicIndex => {
                     let index = DynamicIndexReader::open(&path)?;
@@ -443,6 +450,7 @@ async fn pull_snapshot_from<'a>(
     reader: Arc<dyn SyncSourceReader + 'a>,
     snapshot: &'a pbs_datastore::BackupDir,
     downloaded_chunks: Arc<Mutex<HashSet<[u8; 32]>>>,
+    corrupt: bool,
 ) -> Result<SyncStats, Error> {
     let (_path, is_new, _snap_lock) = snapshot
         .datastore()
@@ -451,7 +459,7 @@ async fn pull_snapshot_from<'a>(
     let sync_stats = if is_new {
         info!("sync snapshot {}", snapshot.dir());
 
-        match pull_snapshot(reader, snapshot, downloaded_chunks).await {
+        match pull_snapshot(reader, snapshot, downloaded_chunks, corrupt).await {
             Err(err) => {
                 if let Err(cleanup_err) = snapshot.datastore().remove_backup_dir(
                     snapshot.backup_ns(),
@@ -468,8 +476,15 @@ async fn pull_snapshot_from<'a>(
             }
         }
     } else {
-        info!("re-sync snapshot {}", snapshot.dir());
-        pull_snapshot(reader, snapshot, downloaded_chunks).await?
+        if corrupt {
+            info!(
+                "re-sync snapshot {} due to bad verification result",
+                snapshot.dir()
+            );
+        } else {
+            info!("re-sync snapshot {}", snapshot.dir());
+        }
+        pull_snapshot(reader, snapshot, downloaded_chunks, corrupt).await?
     };
 
     Ok(sync_stats)
@@ -523,26 +538,40 @@ async fn pull_group(
         .last_successful_backup(&target_ns, group)?
         .unwrap_or(i64::MIN);
 
-    let list: Vec<BackupDir> = raw_list
+    // Filter remote BackupDirs to include in pull
+    // Also stores if the snapshot is corrupt (verification job failed)
+    let list: Vec<(BackupDir, bool)> = raw_list
         .into_iter()
         .enumerate()
-        .filter(|&(pos, ref dir)| {
+        .filter_map(|(pos, dir)| {
             source_snapshots.insert(dir.time);
+            // If resync_corrupt is set, check if the corresponding local snapshot failed
+            // verification
+            if params.resync_corrupt {
+                let local_dir = params
+                    .target
+                    .store
+                    .backup_dir(target_ns.clone(), dir.clone());
+                if let Ok(local_dir) = local_dir {
+                    let verify_state = local_dir.verify_state();
+                    if verify_state == Some(VerifyState::Failed) {
+                        return Some((dir, true));
+                    }
+                }
+            }
             // Note: the snapshot represented by `last_sync_time` might be missing its backup log
             // or post-backup verification state if those were not yet available during the last
             // sync run, always resync it
             if last_sync_time > dir.time {
                 already_synced_skip_info.update(dir.time);
-                return false;
+                return None;
             }
-
             if pos < cutoff && last_sync_time != dir.time {
                 transfer_last_skip_info.update(dir.time);
-                return false;
+                return None;
             }
-            true
+            Some((dir, false))
         })
-        .map(|(_, dir)| dir)
         .collect();
 
     if already_synced_skip_info.count > 0 {
@@ -561,7 +590,7 @@ async fn pull_group(
 
     let mut sync_stats = SyncStats::default();
 
-    for (pos, from_snapshot) in list.into_iter().enumerate() {
+    for (pos, (from_snapshot, corrupt)) in list.into_iter().enumerate() {
         let to_snapshot = params
             .target
             .store
@@ -571,7 +600,8 @@ async fn pull_group(
             .source
             .reader(source_namespace, &from_snapshot)
             .await?;
-        let result = pull_snapshot_from(reader, &to_snapshot, downloaded_chunks.clone()).await;
+        let result =
+            pull_snapshot_from(reader, &to_snapshot, downloaded_chunks.clone(), corrupt).await;
 
         progress.done_snapshots = pos as u64 + 1;
         info!("percentage done: {progress}");
-- 
2.39.5




* [pbs-devel] [PATCH proxmox-backup v3 2/3] fix #3786: ui/cli: add resync-corrupt option on sync-jobs
  2024-11-05 10:40 [pbs-devel] [PATCH proxmox-backup v3 0/3] fix #3786: resync corrupt chunks in sync-job Gabriel Goller
  2024-11-05 10:40 ` [pbs-devel] [PATCH proxmox-backup v3 1/3] fix #3786: api: add resync-corrupt option to sync jobs Gabriel Goller
@ 2024-11-05 10:40 ` Gabriel Goller
  2024-11-20 13:12   ` Fabian Grünbichler
  2024-11-05 10:40 ` [pbs-devel] [PATCH proxmox-backup v3 3/3] fix #3786: docs: add resync-corrupt option to sync-job Gabriel Goller
  2024-11-21 13:34 ` [pbs-devel] [PATCH proxmox-backup v3 0/3] fix #3786: resync corrupt chunks in sync-job Gabriel Goller
  3 siblings, 1 reply; 11+ messages in thread
From: Gabriel Goller @ 2024-11-05 10:40 UTC (permalink / raw)
  To: pbs-devel

Add the `resync-corrupt` option to the ui and the
`proxmox-backup-manager` cli. It is listed in the `Advanced` section,
because it slows the sync-job down and is useless if no verification
job was run beforehand.

Originally-by: Shannon Sterz <s.sterz@proxmox.com>
Signed-off-by: Gabriel Goller <g.goller@proxmox.com>
---
 src/bin/proxmox-backup-manager.rs |  9 +++++++++
 www/window/SyncJobEdit.js         | 11 +++++++++++
 2 files changed, 20 insertions(+)

diff --git a/src/bin/proxmox-backup-manager.rs b/src/bin/proxmox-backup-manager.rs
index 38a1cf0f5881..08728e9d7250 100644
--- a/src/bin/proxmox-backup-manager.rs
+++ b/src/bin/proxmox-backup-manager.rs
@@ -339,6 +339,10 @@ fn task_mgmt_cli() -> CommandLineInterface {
                 schema: TRANSFER_LAST_SCHEMA,
                 optional: true,
             },
+            "resync-corrupt": {
+                schema: RESYNC_CORRUPT_SCHEMA,
+                optional: true,
+            },
         }
    }
 )]
@@ -355,6 +359,7 @@ async fn pull_datastore(
     group_filter: Option<Vec<GroupFilter>>,
     limit: RateLimitConfig,
     transfer_last: Option<usize>,
+    resync_corrupt: Option<bool>,
     param: Value,
 ) -> Result<Value, Error> {
     let output_format = get_output_format(&param);
@@ -391,6 +396,10 @@ async fn pull_datastore(
         args["transfer-last"] = json!(transfer_last)
     }
 
+    if let Some(resync_corrupt) = resync_corrupt {
+        args["resync-corrupt"] = Value::from(resync_corrupt);
+    }
+
     let mut limit_json = json!(limit);
     let limit_map = limit_json
         .as_object_mut()
diff --git a/www/window/SyncJobEdit.js b/www/window/SyncJobEdit.js
index 6543995e8800..a3c497fc2185 100644
--- a/www/window/SyncJobEdit.js
+++ b/www/window/SyncJobEdit.js
@@ -321,6 +321,17 @@ Ext.define('PBS.window.SyncJobEdit', {
 			    deleteEmpty: '{!isCreate}',
 			},
 		    },
+		    {
+			fieldLabel: gettext('Resync corrupt snapshots'),
+			xtype: 'proxmoxcheckbox',
+			name: 'resync-corrupt',
+			autoEl: {
+			    tag: 'div',
+			    'data-qtip': gettext('Re-sync snapshots whose verification failed.'),
+			},
+			uncheckedValue: false,
+			value: false,
+		    },
 		],
 	    },
 	    {
-- 
2.39.5




* [pbs-devel] [PATCH proxmox-backup v3 3/3] fix #3786: docs: add resync-corrupt option to sync-job
  2024-11-05 10:40 [pbs-devel] [PATCH proxmox-backup v3 0/3] fix #3786: resync corrupt chunks in sync-job Gabriel Goller
  2024-11-05 10:40 ` [pbs-devel] [PATCH proxmox-backup v3 1/3] fix #3786: api: add resync-corrupt option to sync jobs Gabriel Goller
  2024-11-05 10:40 ` [pbs-devel] [PATCH proxmox-backup v3 2/3] fix #3786: ui/cli: add resync-corrupt option on sync-jobs Gabriel Goller
@ 2024-11-05 10:40 ` Gabriel Goller
  2024-11-21 13:34 ` [pbs-devel] [PATCH proxmox-backup v3 0/3] fix #3786: resync corrupt chunks in sync-job Gabriel Goller
  3 siblings, 0 replies; 11+ messages in thread
From: Gabriel Goller @ 2024-11-05 10:40 UTC (permalink / raw)
  To: pbs-devel

Add short section explaining the `resync-corrupt` option on the
sync-job.

Originally-by: Shannon Sterz <s.sterz@proxmox.com>
Signed-off-by: Gabriel Goller <g.goller@proxmox.com>
---
 docs/managing-remotes.rst | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/docs/managing-remotes.rst b/docs/managing-remotes.rst
index dd43ccd2b79b..e8013b6e2113 100644
--- a/docs/managing-remotes.rst
+++ b/docs/managing-remotes.rst
@@ -135,6 +135,12 @@ For mixing include and exclude filter, following rules apply:
 
 .. note:: The ``protected`` flag of remote backup snapshots will not be synced.
 
+Enabling the advanced option 'resync-corrupt' will re-sync all snapshots that have
+failed to verify during the last :ref:`maintenance_verification`. Hence, a verification
+job needs to be run before a sync job with 'resync-corrupt' can be carried out. Be aware
+that a sync job with 'resync-corrupt' enabled needs to check the manifests of all
+snapshots in a datastore and might take much longer than regular sync jobs.
+
 Namespace Support
 ^^^^^^^^^^^^^^^^^
 
-- 
2.39.5




* Re: [pbs-devel] [PATCH proxmox-backup v3 1/3] fix #3786: api: add resync-corrupt option to sync jobs
  2024-11-05 10:40 ` [pbs-devel] [PATCH proxmox-backup v3 1/3] fix #3786: api: add resync-corrupt option to sync jobs Gabriel Goller
@ 2024-11-20 13:11   ` Fabian Grünbichler
  2024-11-21 10:04     ` Gabriel Goller
  0 siblings, 1 reply; 11+ messages in thread
From: Fabian Grünbichler @ 2024-11-20 13:11 UTC (permalink / raw)
  To: Gabriel Goller, pbs-devel

a few small nits inline, looks good to me otherwise, but given the size of this
and the size of the push series, I'd rather this be rebased on top of the other
one ;)

Quoting Gabriel Goller (2024-11-05 11:40:13)
> This option allows us to "fix" corrupt snapshots (and/or their chunks)
> by pulling them from another remote. When traversing the remote
> snapshots, we check if each one exists locally, and if it does, whether
> its last verification failed. If the local snapshot is broken and the
> `resync-corrupt` option is turned on, we pull in the remote snapshot,
> overwriting the local one.
> 
> This is very useful and has been requested a lot, as there is currently
> no way to "fix" corrupt chunks/snapshots even if the user has a healthy
> version of them on their offsite instance.
> 
> Originally-by: Shannon Sterz <s.sterz@proxmox.com>
> Signed-off-by: Gabriel Goller <g.goller@proxmox.com>
> ---
>  pbs-api-types/src/jobs.rs         | 10 +++++
>  pbs-datastore/src/backup_info.rs  | 12 +++++-
>  pbs-datastore/src/manifest.rs     | 13 ++++++-
>  src/api2/config/sync.rs           |  4 ++
>  src/api2/pull.rs                  |  9 ++++-
>  src/bin/proxmox-backup-manager.rs |  4 +-
>  src/server/pull.rs                | 62 +++++++++++++++++++++++--------
>  7 files changed, 93 insertions(+), 21 deletions(-)
> 
> diff --git a/pbs-api-types/src/jobs.rs b/pbs-api-types/src/jobs.rs
> index 868702bc059e..58f739ad00b5 100644
> --- a/pbs-api-types/src/jobs.rs
> +++ b/pbs-api-types/src/jobs.rs
> @@ -498,6 +498,10 @@ pub const TRANSFER_LAST_SCHEMA: Schema =
>          .minimum(1)
>          .schema();
>  
> +pub const RESYNC_CORRUPT_SCHEMA: Schema =
> +    BooleanSchema::new("If the verification failed for a local snapshot, try to pull it again.")
> +        .schema();
> +
>  #[api(
>      properties: {
>          id: {
> @@ -552,6 +556,10 @@ pub const TRANSFER_LAST_SCHEMA: Schema =
>              schema: TRANSFER_LAST_SCHEMA,
>              optional: true,
>          },
> +        "resync-corrupt": {
> +            schema: RESYNC_CORRUPT_SCHEMA,
> +            optional: true,
> +        }
>      }
>  )]
>  #[derive(Serialize, Deserialize, Clone, Updater, PartialEq)]
> @@ -585,6 +593,8 @@ pub struct SyncJobConfig {
>      pub limit: RateLimitConfig,
>      #[serde(skip_serializing_if = "Option::is_none")]
>      pub transfer_last: Option<usize>,
> +    #[serde(skip_serializing_if = "Option::is_none")]
> +    pub resync_corrupt: Option<bool>,
>  }
>  
>  impl SyncJobConfig {
> diff --git a/pbs-datastore/src/backup_info.rs b/pbs-datastore/src/backup_info.rs
> index 414ec878d01a..e6174322dad6 100644
> --- a/pbs-datastore/src/backup_info.rs
> +++ b/pbs-datastore/src/backup_info.rs
> @@ -8,7 +8,8 @@ use anyhow::{bail, format_err, Error};
>  use proxmox_sys::fs::{lock_dir_noblock, replace_file, CreateOptions};
>  
>  use pbs_api_types::{
> -    Authid, BackupNamespace, BackupType, GroupFilter, BACKUP_DATE_REGEX, BACKUP_FILE_REGEX,
> +    Authid, BackupNamespace, BackupType, GroupFilter, VerifyState, BACKUP_DATE_REGEX,
> +    BACKUP_FILE_REGEX,
>  };
>  use pbs_config::{open_backup_lockfile, BackupLockGuard};
>  
> @@ -583,6 +584,15 @@ impl BackupDir {
>  
>          Ok(())
>      }
> +
> +    /// Load the verify state from the manifest.
> +    pub fn verify_state(&self) -> Option<VerifyState> {

should this be a Result<Option<..>> to allow differentiation between no
verification state, and failure to parse or load the manifest? that would allow
us to resync totally corrupted snapshots as well (although that might be
considered out of scope based on the parameter description ;))

> +        if let Ok(manifest) = self.load_manifest() {
> +            manifest.0.verify_state()
> +        } else {
> +            None
> +        }
> +    }
>  }
>  
>  impl AsRef<pbs_api_types::BackupNamespace> for BackupDir {
> diff --git a/pbs-datastore/src/manifest.rs b/pbs-datastore/src/manifest.rs
> index c3df014272a0..623c1499c0bb 100644
> --- a/pbs-datastore/src/manifest.rs
> +++ b/pbs-datastore/src/manifest.rs
> @@ -5,7 +5,7 @@ use anyhow::{bail, format_err, Error};
>  use serde::{Deserialize, Serialize};
>  use serde_json::{json, Value};
>  
> -use pbs_api_types::{BackupType, CryptMode, Fingerprint};
> +use pbs_api_types::{BackupType, CryptMode, Fingerprint, SnapshotVerifyState, VerifyState};
>  use pbs_tools::crypt_config::CryptConfig;
>  
>  pub const MANIFEST_BLOB_NAME: &str = "index.json.blob";
> @@ -242,6 +242,17 @@ impl BackupManifest {
>          let manifest: BackupManifest = serde_json::from_value(json)?;
>          Ok(manifest)
>      }
> +
> +    /// Get the verify state of the snapshot
> +    ///
> +    /// Note: New snapshots, which have not been verified yet, do not have a status and this
> +    /// function will return `None`.
> +    pub fn verify_state(&self) -> Option<VerifyState> {

should this be a Result<Option<..>> to allow differentiation between no
verification state, and failure to parse?

also, it would be great if existing code retrieving this could be adapted to
use these new helpers, which would require having the Result there as well..

> +        let verify = self.unprotected["verify_state"].clone();
> +        serde_json::from_value::<SnapshotVerifyState>(verify)
> +            .map(|svs| svs.state)
> +            .ok()
> +    }
>  }
>  
>  impl TryFrom<super::DataBlob> for BackupManifest {
> diff --git a/src/api2/config/sync.rs b/src/api2/config/sync.rs
> index 6fdc69a9e645..fa9db92f3d11 100644
> --- a/src/api2/config/sync.rs
> +++ b/src/api2/config/sync.rs
> @@ -368,6 +368,9 @@ pub fn update_sync_job(
>      if let Some(transfer_last) = update.transfer_last {
>          data.transfer_last = Some(transfer_last);
>      }
> +    if let Some(resync_corrupt) = update.resync_corrupt {
> +        data.resync_corrupt = Some(resync_corrupt);
> +    }
>  
>      if update.limit.rate_in.is_some() {
>          data.limit.rate_in = update.limit.rate_in;
> @@ -527,6 +530,7 @@ acl:1:/remote/remote1/remotestore1:write@pbs:RemoteSyncOperator
>          ns: None,
>          owner: Some(write_auth_id.clone()),
>          comment: None,
> +        resync_corrupt: None,
>          remove_vanished: None,
>          max_depth: None,
>          group_filter: None,
> diff --git a/src/api2/pull.rs b/src/api2/pull.rs
> index e733c9839e3a..0d4be0e2d228 100644
> --- a/src/api2/pull.rs
> +++ b/src/api2/pull.rs
> @@ -10,7 +10,7 @@ use pbs_api_types::{
>      Authid, BackupNamespace, GroupFilter, RateLimitConfig, SyncJobConfig, DATASTORE_SCHEMA,
>      GROUP_FILTER_LIST_SCHEMA, NS_MAX_DEPTH_REDUCED_SCHEMA, PRIV_DATASTORE_BACKUP,
>      PRIV_DATASTORE_PRUNE, PRIV_REMOTE_READ, REMOTE_ID_SCHEMA, REMOVE_VANISHED_BACKUPS_SCHEMA,
> -    TRANSFER_LAST_SCHEMA,
> +    RESYNC_CORRUPT_SCHEMA, TRANSFER_LAST_SCHEMA,
>  };
>  use pbs_config::CachedUserInfo;
>  use proxmox_human_byte::HumanByte;
> @@ -89,6 +89,7 @@ impl TryFrom<&SyncJobConfig> for PullParameters {
>              sync_job.group_filter.clone(),
>              sync_job.limit.clone(),
>              sync_job.transfer_last,
> +            sync_job.resync_corrupt,
>          )
>      }
>  }
> @@ -240,6 +241,10 @@ pub fn do_sync_job(
>                  schema: TRANSFER_LAST_SCHEMA,
>                  optional: true,
>              },
> +            "resync-corrupt": {
> +                schema: RESYNC_CORRUPT_SCHEMA,
> +                optional: true,
> +            },
>          },
>      },
>      access: {
> @@ -264,6 +269,7 @@ async fn pull(
>      group_filter: Option<Vec<GroupFilter>>,
>      limit: RateLimitConfig,
>      transfer_last: Option<usize>,
> +    resync_corrupt: Option<bool>,
>      rpcenv: &mut dyn RpcEnvironment,
>  ) -> Result<String, Error> {
>      let auth_id: Authid = rpcenv.get_auth_id().unwrap().parse()?;
> @@ -301,6 +307,7 @@ async fn pull(
>          group_filter,
>          limit,
>          transfer_last,
> +        resync_corrupt,
>      )?;
>  
>      // fixme: set to_stdout to false?
> diff --git a/src/bin/proxmox-backup-manager.rs b/src/bin/proxmox-backup-manager.rs
> index 420e96665662..38a1cf0f5881 100644
> --- a/src/bin/proxmox-backup-manager.rs
> +++ b/src/bin/proxmox-backup-manager.rs
> @@ -14,8 +14,8 @@ use pbs_api_types::percent_encoding::percent_encode_component;
>  use pbs_api_types::{
>      BackupNamespace, GroupFilter, RateLimitConfig, SyncJobConfig, DATASTORE_SCHEMA,
>      GROUP_FILTER_LIST_SCHEMA, IGNORE_VERIFIED_BACKUPS_SCHEMA, NS_MAX_DEPTH_SCHEMA,
> -    REMOTE_ID_SCHEMA, REMOVE_VANISHED_BACKUPS_SCHEMA, TRANSFER_LAST_SCHEMA, UPID_SCHEMA,
> -    VERIFICATION_OUTDATED_AFTER_SCHEMA,
> +    REMOTE_ID_SCHEMA, REMOVE_VANISHED_BACKUPS_SCHEMA, RESYNC_CORRUPT_SCHEMA, TRANSFER_LAST_SCHEMA,
> +    UPID_SCHEMA, VERIFICATION_OUTDATED_AFTER_SCHEMA,
>  };
>  use pbs_client::{display_task_log, view_task_result};
>  use pbs_config::sync;
> diff --git a/src/server/pull.rs b/src/server/pull.rs
> index d9584776ee7f..11a0a9d74cf3 100644
> --- a/src/server/pull.rs
> +++ b/src/server/pull.rs
> @@ -12,7 +12,8 @@ use tracing::info;
>  
>  use pbs_api_types::{
>      print_store_and_ns, Authid, BackupDir, BackupGroup, BackupNamespace, GroupFilter, Operation,
> -    RateLimitConfig, Remote, MAX_NAMESPACE_DEPTH, PRIV_DATASTORE_AUDIT, PRIV_DATASTORE_BACKUP,
> +    RateLimitConfig, Remote, VerifyState, MAX_NAMESPACE_DEPTH, PRIV_DATASTORE_AUDIT,
> +    PRIV_DATASTORE_BACKUP,
>  };
>  use pbs_client::BackupRepository;
>  use pbs_config::CachedUserInfo;
> @@ -55,6 +56,8 @@ pub(crate) struct PullParameters {
>      group_filter: Vec<GroupFilter>,
>      /// How many snapshots should be transferred at most (taking the newest N snapshots)
>      transfer_last: Option<usize>,
> +    /// Whether to re-sync corrupted snapshots
> +    resync_corrupt: bool,
>  }
>  
>  impl PullParameters {
> @@ -72,12 +75,14 @@ impl PullParameters {
>          group_filter: Option<Vec<GroupFilter>>,
>          limit: RateLimitConfig,
>          transfer_last: Option<usize>,
> +        resync_corrupt: Option<bool>,
>      ) -> Result<Self, Error> {
>          if let Some(max_depth) = max_depth {
>              ns.check_max_depth(max_depth)?;
>              remote_ns.check_max_depth(max_depth)?;
>          };
>          let remove_vanished = remove_vanished.unwrap_or(false);
> +        let resync_corrupt = resync_corrupt.unwrap_or(false);
>  
>          let source: Arc<dyn SyncSource> = if let Some(remote) = remote {
>              let (remote_config, _digest) = pbs_config::remote::config()?;
> @@ -116,6 +121,7 @@ impl PullParameters {
>              max_depth,
>              group_filter,
>              transfer_last,
> +            resync_corrupt,
>          })
>      }
>  }
> @@ -323,7 +329,7 @@ async fn pull_single_archive<'a>(
>  ///
>  /// Pulling a snapshot consists of the following steps:
>  /// - (Re)download the manifest
> -/// -- if it matches, only download log and treat snapshot as already synced
> +/// -- if it matches and is not corrupt, only download log and treat snapshot as already synced
>  /// - Iterate over referenced files
>  /// -- if file already exists, verify contents
>  /// -- if not, pull it from the remote
> @@ -332,6 +338,7 @@ async fn pull_snapshot<'a>(
>      reader: Arc<dyn SyncSourceReader + 'a>,
>      snapshot: &'a pbs_datastore::BackupDir,
>      downloaded_chunks: Arc<Mutex<HashSet<[u8; 32]>>>,
> +    corrupt: bool,
>  ) -> Result<SyncStats, Error> {
>      let mut sync_stats = SyncStats::default();
>      let mut manifest_name = snapshot.full_path();
> @@ -352,7 +359,7 @@ async fn pull_snapshot<'a>(
>          return Ok(sync_stats);
>      }
>  
> -    if manifest_name.exists() {
> +    if manifest_name.exists() && !corrupt {
>          let manifest_blob = proxmox_lang::try_block!({
>              let mut manifest_file = std::fs::File::open(&manifest_name).map_err(|err| {
>                  format_err!("unable to open local manifest {manifest_name:?} - {err}")
> @@ -381,7 +388,7 @@ async fn pull_snapshot<'a>(
>          let mut path = snapshot.full_path();
>          path.push(&item.filename);
>  
> -        if path.exists() {
> +        if !corrupt && path.exists() {
>              match ArchiveType::from_path(&item.filename)? {
>                  ArchiveType::DynamicIndex => {
>                      let index = DynamicIndexReader::open(&path)?;
> @@ -443,6 +450,7 @@ async fn pull_snapshot_from<'a>(
>      reader: Arc<dyn SyncSourceReader + 'a>,
>      snapshot: &'a pbs_datastore::BackupDir,
>      downloaded_chunks: Arc<Mutex<HashSet<[u8; 32]>>>,
> +    corrupt: bool,
>  ) -> Result<SyncStats, Error> {
>      let (_path, is_new, _snap_lock) = snapshot
>          .datastore()
> @@ -451,7 +459,7 @@ async fn pull_snapshot_from<'a>(
>      let sync_stats = if is_new {

is_new and corrupt are never both true..

>          info!("sync snapshot {}", snapshot.dir());
>  
> -        match pull_snapshot(reader, snapshot, downloaded_chunks).await {
> +        match pull_snapshot(reader, snapshot, downloaded_chunks, corrupt).await {

so this should always be false ;)

>              Err(err) => {
>                  if let Err(cleanup_err) = snapshot.datastore().remove_backup_dir(
>                      snapshot.backup_ns(),
> @@ -468,8 +476,15 @@ async fn pull_snapshot_from<'a>(
>              }
>          }
>      } else {
> -        info!("re-sync snapshot {}", snapshot.dir());
> -        pull_snapshot(reader, snapshot, downloaded_chunks).await?
> +        if corrupt {
> +            info!(
> +                "re-sync snapshot {} due to bad verification result",

nit: why not call it "corrupt", since that is what the parameter is called?

> +                snapshot.dir()
> +            );
> +        } else {
> +            info!("re-sync snapshot {}", snapshot.dir());
> +        }
> +        pull_snapshot(reader, snapshot, downloaded_chunks, corrupt).await?
>      };
>  
>      Ok(sync_stats)
> @@ -523,26 +538,40 @@ async fn pull_group(
>          .last_successful_backup(&target_ns, group)?
>          .unwrap_or(i64::MIN);
>  
> -    let list: Vec<BackupDir> = raw_list
> +    // Filter remote BackupDirs to include in pull
> +    // Also stores if the snapshot is corrupt (verification job failed)
> +    let list: Vec<(BackupDir, bool)> = raw_list
>          .into_iter()
>          .enumerate()
> -        .filter(|&(pos, ref dir)| {
> +        .filter_map(|(pos, dir)| {
>              source_snapshots.insert(dir.time);
> +            // If resync_corrupt is set, check if the corresponding local snapshot failed
> +            // verification
> +            if params.resync_corrupt {
> +                let local_dir = params
> +                    .target
> +                    .store
> +                    .backup_dir(target_ns.clone(), dir.clone());
> +                if let Ok(local_dir) = local_dir {
> +                    let verify_state = local_dir.verify_state();
> +                    if verify_state == Some(VerifyState::Failed) {
> +                        return Some((dir, true));
> +                    }
> +                }
> +            }
>              // Note: the snapshot represented by `last_sync_time` might be missing its backup log
>              // or post-backup verification state if those were not yet available during the last
>              // sync run, always resync it
>              if last_sync_time > dir.time {
>                  already_synced_skip_info.update(dir.time);
> -                return false;
> +                return None;
>              }
> -
>              if pos < cutoff && last_sync_time != dir.time {
>                  transfer_last_skip_info.update(dir.time);
> -                return false;
> +                return None;
>              }
> -            true
> +            Some((dir, false))
>          })
> -        .map(|(_, dir)| dir)
>          .collect();
>  
>      if already_synced_skip_info.count > 0 {
> @@ -561,7 +590,7 @@ async fn pull_group(
>  
>      let mut sync_stats = SyncStats::default();
>  
> -    for (pos, from_snapshot) in list.into_iter().enumerate() {
> +    for (pos, (from_snapshot, corrupt)) in list.into_iter().enumerate() {
>          let to_snapshot = params
>              .target
>              .store
> @@ -571,7 +600,8 @@ async fn pull_group(
>              .source
>              .reader(source_namespace, &from_snapshot)
>              .await?;
> -        let result = pull_snapshot_from(reader, &to_snapshot, downloaded_chunks.clone()).await;
> +        let result =
> +            pull_snapshot_from(reader, &to_snapshot, downloaded_chunks.clone(), corrupt).await;
>  
>          progress.done_snapshots = pos as u64 + 1;
>          info!("percentage done: {progress}");
> -- 
> 2.39.5
> 
> 
> 



* Re: [pbs-devel] [PATCH proxmox-backup v3 2/3] fix #3786: ui/cli: add resync-corrupt option on sync-jobs
  2024-11-05 10:40 ` [pbs-devel] [PATCH proxmox-backup v3 2/3] fix #3786: ui/cli: add resync-corrupt option on sync-jobs Gabriel Goller
@ 2024-11-20 13:12   ` Fabian Grünbichler
  2024-11-21 10:18     ` Gabriel Goller
  0 siblings, 1 reply; 11+ messages in thread
From: Fabian Grünbichler @ 2024-11-20 13:12 UTC (permalink / raw)
  To: Gabriel Goller, pbs-devel

nit: might look better moved to the right side of the window, but no hard
feelings either way

Quoting Gabriel Goller (2024-11-05 11:40:14)
> Add the `resync-corrupt` option to the ui and the
> `proxmox-backup-manager` cli. It is listed in the `Advanced` section,
> because it slows the sync-job down and is useless if no verification
> job was run beforehand.
> 
> Originally-by: Shannon Sterz <s.sterz@proxmox.com>
> Signed-off-by: Gabriel Goller <g.goller@proxmox.com>
> ---
>  src/bin/proxmox-backup-manager.rs |  9 +++++++++
>  www/window/SyncJobEdit.js         | 11 +++++++++++
>  2 files changed, 20 insertions(+)
> 
> diff --git a/src/bin/proxmox-backup-manager.rs b/src/bin/proxmox-backup-manager.rs
> index 38a1cf0f5881..08728e9d7250 100644
> --- a/src/bin/proxmox-backup-manager.rs
> +++ b/src/bin/proxmox-backup-manager.rs
> @@ -339,6 +339,10 @@ fn task_mgmt_cli() -> CommandLineInterface {
>                  schema: TRANSFER_LAST_SCHEMA,
>                  optional: true,
>              },
> +            "resync-corrupt": {
> +                schema: RESYNC_CORRUPT_SCHEMA,
> +                optional: true,
> +            },
>          }
>     }
>  )]
> @@ -355,6 +359,7 @@ async fn pull_datastore(
>      group_filter: Option<Vec<GroupFilter>>,
>      limit: RateLimitConfig,
>      transfer_last: Option<usize>,
> +    resync_corrupt: Option<bool>,
>      param: Value,
>  ) -> Result<Value, Error> {
>      let output_format = get_output_format(&param);
> @@ -391,6 +396,10 @@ async fn pull_datastore(
>          args["transfer-last"] = json!(transfer_last)
>      }
>  
> +    if let Some(resync_corrupt) = resync_corrupt {
> +        args["resync-corrupt"] = Value::from(resync_corrupt);
> +    }
> +
>      let mut limit_json = json!(limit);
>      let limit_map = limit_json
>          .as_object_mut()
> diff --git a/www/window/SyncJobEdit.js b/www/window/SyncJobEdit.js
> index 6543995e8800..a3c497fc2185 100644
> --- a/www/window/SyncJobEdit.js
> +++ b/www/window/SyncJobEdit.js
> @@ -321,6 +321,17 @@ Ext.define('PBS.window.SyncJobEdit', {
>                             deleteEmpty: '{!isCreate}',
>                         },
>                     },
> +                   {
> +                       fieldLabel: gettext('Resync corrupt snapshots'),
> +                       xtype: 'proxmoxcheckbox',
> +                       name: 'resync-corrupt',
> +                       autoEl: {
> +                           tag: 'div',
> +                       'data-qtip': gettext('Re-sync snapshots whose verification failed.'),
> +                       },
> +                       uncheckedValue: false,
> +                       value: false,
> +                   },
>                 ],
>             },
>             {
> -- 
> 2.39.5
> 
> 
> 



* Re: [pbs-devel] [PATCH proxmox-backup v3 1/3] fix #3786: api: add resync-corrupt option to sync jobs
  2024-11-20 13:11   ` Fabian Grünbichler
@ 2024-11-21 10:04     ` Gabriel Goller
  2024-11-21 10:09       ` Fabian Grünbichler
  0 siblings, 1 reply; 11+ messages in thread
From: Gabriel Goller @ 2024-11-21 10:04 UTC (permalink / raw)
  To: Fabian Grünbichler; +Cc: pbs-devel

On 20.11.2024 14:11, Fabian Grünbichler wrote:
>a few small nits inline, looks good to me otherwise, but given the size of this
>and the size of the push series, I'd rather this be rebased on top of the other
>one ;)

Sure, shouldn't be a lot of work. Should I send a rebased version on top
of the current push series as a v4?

>Quoting Gabriel Goller (2024-11-05 11:40:13)
>> diff --git a/pbs-datastore/src/backup_info.rs b/pbs-datastore/src/backup_info.rs
>> index 414ec878d01a..e6174322dad6 100644
>> --- a/pbs-datastore/src/backup_info.rs
>> +++ b/pbs-datastore/src/backup_info.rs
>> @@ -8,7 +8,8 @@ use anyhow::{bail, format_err, Error};
>>  use proxmox_sys::fs::{lock_dir_noblock, replace_file, CreateOptions};
>>
>>  use pbs_api_types::{
>> -    Authid, BackupNamespace, BackupType, GroupFilter, BACKUP_DATE_REGEX, BACKUP_FILE_REGEX,
>> +    Authid, BackupNamespace, BackupType, GroupFilter, VerifyState, BACKUP_DATE_REGEX,
>> +    BACKUP_FILE_REGEX,
>>  };
>>  use pbs_config::{open_backup_lockfile, BackupLockGuard};
>>
>> @@ -583,6 +584,15 @@ impl BackupDir {
>>
>>          Ok(())
>>      }
>> +
>> +    /// Load the verify state from the manifest.
>> +    pub fn verify_state(&self) -> Option<VerifyState> {
>
>should this be a Result<Option<..>> to allow differentiation between no
>verification state, and failure to parse or load the manifest? that would allow
>us to resync totally corrupted snapshots as well (although that might be
>considered out of scope based on the parameter description ;))

Yep, it was already like this in the first version, no idea why I changed
it. Like this we can return the load_manifest error with the Result and
swallow the inner error with an `.ok()`, as it doesn't matter anymore.

     pub fn verify_state(&self) -> Result<Option<VerifyState>, anyhow::Error> {
         let manifest = self.load_manifest()?;
         Ok(manifest.0.verify_state().ok().flatten().map(|svs| svs.state))
     }


I think we also want to resync on errors when reading the manifest, I'll
include that in the next version! Something like this maybe:

     match local_dir.verify_state() {
         Ok(Some(state)) => {
             if state == VerifyState::Failed {
                 return Some((dir, true));
             }
         },
         Ok(None) => {
             // This means there either was an error parsing the manifest, or the 
             // verify_state item was not found. This could be a new backup.
         }
         Err(_) => {
             // There was an error loading the manifest, probably better if we
             // resync.
             return Some((dir, true));
         }
     }

>> +        if let Ok(manifest) = self.load_manifest() {
>> +            manifest.0.verify_state()
>> +        } else {
>> +            None
>> +        }
>> +    }
>>  }
>>
>>  impl AsRef<pbs_api_types::BackupNamespace> for BackupDir {
>> diff --git a/pbs-datastore/src/manifest.rs b/pbs-datastore/src/manifest.rs
>> index c3df014272a0..623c1499c0bb 100644
>> --- a/pbs-datastore/src/manifest.rs
>> +++ b/pbs-datastore/src/manifest.rs
>> @@ -5,7 +5,7 @@ use anyhow::{bail, format_err, Error};
>>  use serde::{Deserialize, Serialize};
>>  use serde_json::{json, Value};
>>
>> -use pbs_api_types::{BackupType, CryptMode, Fingerprint};
>> +use pbs_api_types::{BackupType, CryptMode, Fingerprint, SnapshotVerifyState, VerifyState};
>>  use pbs_tools::crypt_config::CryptConfig;
>>
>>  pub const MANIFEST_BLOB_NAME: &str = "index.json.blob";
>> @@ -242,6 +242,17 @@ impl BackupManifest {
>>          let manifest: BackupManifest = serde_json::from_value(json)?;
>>          Ok(manifest)
>>      }
>> +
>> +    /// Get the verify state of the snapshot
>> +    ///
>> +    /// Note: New snapshots, which have not been verified yet, do not have a status and this
>> +    /// function will return `None`.
>> +    pub fn verify_state(&self) -> Option<VerifyState> {
>
>should this be a Result<Option<..>> to allow differentiation between no
>verification state, and failure to parse?

Hmm, so I could return a Result<Option<..>> by checking the error of the
serde_json::from_value call: if the "verify_state" value wasn't found in
the manifest, which we can detect by calling `is_eof` [0], return
Ok(None), otherwise return an Error. This will make it more complicated
for all the callers though – also 99% of the callers will treat Err the
same as Ok(None) anyways. Let me know what you think!

     /// Get the verify state of the snapshot
     ///
     /// Note: New snapshots, which have not been verified yet, do not have a status and this
     /// function will return `Ok(None)`.
     pub fn verify_state(&self) -> Result<Option<SnapshotVerifyState>, anyhow::Error> {
         let verify = self.unprotected["verify_state"].clone();
         match serde_json::from_value::<SnapshotVerifyState>(verify) {
             Err(err) => {
                 // `verify_state` item has not been found
                 if err.is_eof() {
                     Ok(None)
                 } else {
                     Err(err.into())
                 }
             },
             Ok(svs) => {
                 Ok(Some(svs))
             }
         }
     }


Else I could just return a Result<SnapshotVerifyState>.

[0]: https://docs.rs/serde_json/latest/serde_json/struct.Error.html#method.is_eof

>also, it would be great if existing code retrieving this could be adapted to
>use these new helpers, which would require having the Result there as well..

Yep, overlooked those, my bad.

>> @@ -381,7 +388,7 @@ async fn pull_snapshot<'a>(
>>          let mut path = snapshot.full_path();
>>          path.push(&item.filename);
>>
>> -        if path.exists() {
>> +        if !corrupt && path.exists() {
>>              match ArchiveType::from_path(&item.filename)? {
>>                  ArchiveType::DynamicIndex => {
>>                      let index = DynamicIndexReader::open(&path)?;
>> @@ -443,6 +450,7 @@ async fn pull_snapshot_from<'a>(
>>      reader: Arc<dyn SyncSourceReader + 'a>,
>>      snapshot: &'a pbs_datastore::BackupDir,
>>      downloaded_chunks: Arc<Mutex<HashSet<[u8; 32]>>>,
>> +    corrupt: bool,
>>  ) -> Result<SyncStats, Error> {
>>      let (_path, is_new, _snap_lock) = snapshot
>>          .datastore()
>> @@ -451,7 +459,7 @@ async fn pull_snapshot_from<'a>(
>>      let sync_stats = if is_new {
>
>is_new and corrupt are never both true..
>
>>          info!("sync snapshot {}", snapshot.dir());
>>
>> -        match pull_snapshot(reader, snapshot, downloaded_chunks).await {
>> +        match pull_snapshot(reader, snapshot, downloaded_chunks, corrupt).await {
>
>so this should be always false ;)

Agree, wrote a comment and passed directly `false`.

>>              Err(err) => {
>>                  if let Err(cleanup_err) = snapshot.datastore().remove_backup_dir(
>>                      snapshot.backup_ns(),
>> @@ -468,8 +476,15 @@ async fn pull_snapshot_from<'a>(
>>              }
>>          }
>>      } else {
>> -        info!("re-sync snapshot {}", snapshot.dir());
>> -        pull_snapshot(reader, snapshot, downloaded_chunks).await?
>> +        if corrupt {
>> +            info!(
>> +                "re-sync snapshot {} due to bad verification result",
>
>nit: why not call it "corrupt", since that is what the parameter is called?

ack

>> +                snapshot.dir()
>> +            );
>> +        } else {
>> +            info!("re-sync snapshot {}", snapshot.dir());
>> +        }
>> +        pull_snapshot(reader, snapshot, downloaded_chunks, corrupt).await?
>>      };
>>
>>      Ok(sync_stats)



* Re: [pbs-devel] [PATCH proxmox-backup v3 1/3] fix #3786: api: add resync-corrupt option to sync jobs
  2024-11-21 10:04     ` Gabriel Goller
@ 2024-11-21 10:09       ` Fabian Grünbichler
  2024-11-21 10:30         ` Gabriel Goller
  0 siblings, 1 reply; 11+ messages in thread
From: Fabian Grünbichler @ 2024-11-21 10:09 UTC (permalink / raw)
  To: Gabriel Goller; +Cc: pbs-devel


> Gabriel Goller <g.goller@proxmox.com> hat am 21.11.2024 11:04 CET geschrieben:
> 
>  
> On 20.11.2024 14:11, Fabian Grünbichler wrote:
> >a few small nits inline, looks good to me otherwise, but given the size of this
> >and the size of the push series, I'd rather this be rebased on top of the other
> >one ;)
> 
> Sure, shouldn't be a lot of work. Should I send a rebased version on top
> of the current push series as a v4?

yes, but please wait until it's applied (there have been a few changes queued on top, and I am not sure whether they might cause more conflicts ;))

> 
> >Quoting Gabriel Goller (2024-11-05 11:40:13)
> >> diff --git a/pbs-datastore/src/backup_info.rs b/pbs-datastore/src/backup_info.rs
> >> index 414ec878d01a..e6174322dad6 100644
> >> --- a/pbs-datastore/src/backup_info.rs
> >> +++ b/pbs-datastore/src/backup_info.rs
> >> @@ -8,7 +8,8 @@ use anyhow::{bail, format_err, Error};
> >>  use proxmox_sys::fs::{lock_dir_noblock, replace_file, CreateOptions};
> >>
> >>  use pbs_api_types::{
> >> -    Authid, BackupNamespace, BackupType, GroupFilter, BACKUP_DATE_REGEX, BACKUP_FILE_REGEX,
> >> +    Authid, BackupNamespace, BackupType, GroupFilter, VerifyState, BACKUP_DATE_REGEX,
> >> +    BACKUP_FILE_REGEX,
> >>  };
> >>  use pbs_config::{open_backup_lockfile, BackupLockGuard};
> >>
> >> @@ -583,6 +584,15 @@ impl BackupDir {
> >>
> >>          Ok(())
> >>      }
> >> +
> >> +    /// Load the verify state from the manifest.
> >> +    pub fn verify_state(&self) -> Option<VerifyState> {
> >
> >should this be a Result<Option<..>> to allow differentiation between no
> >verification state, and failure to parse or load the manifest? that would allow
> >us to resync totally corrupted snapshots as well (although that might be
> >considered out of scope based on the parameter description ;))
> 
> Yep, it was already like this in the first version, no idea why I changed
> it. Like this we can return the load_manifest error with the Result and
> swallow the inner error with an `.ok()`, as it doesn't matter anymore.
> 
>      pub fn verify_state(&self) -> Result<Option<VerifyState>, anyhow::Error> {
>          let manifest = self.load_manifest()?;
>          Ok(manifest.0.verify_state().ok().flatten().map(|svs| svs.state))
>      }
> 
> 
> I think we also want to resync on errors when reading the manifest, I'll
> include that in the next version! Something like this maybe:
> 
>      match local_dir.verify_state() {
>          Ok(Some(state)) => {
>              if state == VerifyState::Failed {
>                  return Some((dir, true));
>              }
>          },
>          Ok(None) => {
>              // This means there either was an error parsing the manifest, or the 
>              // verify_state item was not found. This could be a new backup.

IMHO this should only be reached if no verification state is in the manifest (because no verification has happened yet), but the manifest was otherwise completely parseable. this can be treated the same as an okay verify state, since we can't know any better.

>          }
>          Err(_) => {
>              // There was an error loading the manifest, probably better if we
>              // resync.
>              return Some((dir, true));
>          }
>      }
> 
> >> +        if let Ok(manifest) = self.load_manifest() {
> >> +            manifest.0.verify_state()
> >> +        } else {
> >> +            None
> >> +        }
> >> +    }
> >>  }
> >>
> >>  impl AsRef<pbs_api_types::BackupNamespace> for BackupDir {
> >> diff --git a/pbs-datastore/src/manifest.rs b/pbs-datastore/src/manifest.rs
> >> index c3df014272a0..623c1499c0bb 100644
> >> --- a/pbs-datastore/src/manifest.rs
> >> +++ b/pbs-datastore/src/manifest.rs
> >> @@ -5,7 +5,7 @@ use anyhow::{bail, format_err, Error};
> >>  use serde::{Deserialize, Serialize};
> >>  use serde_json::{json, Value};
> >>
> >> -use pbs_api_types::{BackupType, CryptMode, Fingerprint};
> >> +use pbs_api_types::{BackupType, CryptMode, Fingerprint, SnapshotVerifyState, VerifyState};
> >>  use pbs_tools::crypt_config::CryptConfig;
> >>
> >>  pub const MANIFEST_BLOB_NAME: &str = "index.json.blob";
> >> @@ -242,6 +242,17 @@ impl BackupManifest {
> >>          let manifest: BackupManifest = serde_json::from_value(json)?;
> >>          Ok(manifest)
> >>      }
> >> +
> >> +    /// Get the verify state of the snapshot
> >> +    ///
> >> +    /// Note: New snapshots, which have not been verified yet, do not have a status and this
> >> +    /// function will return `None`.
> >> +    pub fn verify_state(&self) -> Option<VerifyState> {
> >
> >should this be a Result<Option<..>> to allow differentiation between no
> >verification state, and failure to parse?
> 
> Hmm, so I could return a Result<Option<..>> by checking the error of the
> serde_json::from_value call. I could check whether the "verify_state" value
> wasn't found in the manifest by calling `is_eof` [0] and, if so, return
> `Ok(None)`, otherwise return an error. This will make it more complicated
> for all the callers though – also 99% of the callers will treat Err the
> same as Ok(None) anyway. LMK what you think!
> 
>      /// Get the verify state of the snapshot
>      ///
>      /// Note: New snapshots, which have not been verified yet, do not have a status and this
>      /// function will return `Ok(None)`.
>      pub fn verify_state(&self) -> Result<Option<SnapshotVerifyState>, anyhow::Error> {
>          let verify = self.unprotected["verify_state"].clone();

can't you just check here whether we have a value and return None otherwise?
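
e.g. something like this right after the `let verify = ...` line (untested
sketch):

     if verify.is_null() {
         // not verified yet, nothing to parse
         return Ok(None);
     }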

>          match serde_json::from_value::<SnapshotVerifyState>(verify) {
>              Err(err) => {

then this can just bubble up the error?

>                  // `verify_state` item has not been found
>                  if err.is_eof() {
>                      Ok(None)
>                  } else {
>                      Err(err.into())
>                  }
>              },
>              Ok(svs) => {
>                  Ok(Some(svs))
>              }
>          }
>      }
> 
> 
> Alternatively, I could just return a Result<SnapshotVerifyState>.

I think differentiating between Ok(Some(state)), Ok(None) and Err(err) is important here, so I'd rather not do that ;)

> 
> [0]: https://docs.rs/serde_json/latest/serde_json/struct.Error.html#method.is_eof
> 
> >also, it would be great if existing code retrieving this could be adapted to
> >use these new helpers, which would require having the Result there as well..
> 
> Yep, overlooked those, my bad.
> 
> >> @@ -381,7 +388,7 @@ async fn pull_snapshot<'a>(
> >>          let mut path = snapshot.full_path();
> >>          path.push(&item.filename);
> >>
> >> -        if path.exists() {
> >> +        if !corrupt && path.exists() {
> >>              match ArchiveType::from_path(&item.filename)? {
> >>                  ArchiveType::DynamicIndex => {
> >>                      let index = DynamicIndexReader::open(&path)?;
> >> @@ -443,6 +450,7 @@ async fn pull_snapshot_from<'a>(
> >>      reader: Arc<dyn SyncSourceReader + 'a>,
> >>      snapshot: &'a pbs_datastore::BackupDir,
> >>      downloaded_chunks: Arc<Mutex<HashSet<[u8; 32]>>>,
> >> +    corrupt: bool,
> >>  ) -> Result<SyncStats, Error> {
> >>      let (_path, is_new, _snap_lock) = snapshot
> >>          .datastore()
> >> @@ -451,7 +459,7 @@ async fn pull_snapshot_from<'a>(
> >>      let sync_stats = if is_new {
> >
> >is_new and corrupt are never both true..
> >
> >>          info!("sync snapshot {}", snapshot.dir());
> >>
> >> -        match pull_snapshot(reader, snapshot, downloaded_chunks).await {
> >> +        match pull_snapshot(reader, snapshot, downloaded_chunks, corrupt).await {
> >
> >so this should be always false ;)
> 
> Agree, wrote a comment and passed directly `false`.
> 
> >>              Err(err) => {
> >>                  if let Err(cleanup_err) = snapshot.datastore().remove_backup_dir(
> >>                      snapshot.backup_ns(),
> >> @@ -468,8 +476,15 @@ async fn pull_snapshot_from<'a>(
> >>              }
> >>          }
> >>      } else {
> >> -        info!("re-sync snapshot {}", snapshot.dir());
> >> -        pull_snapshot(reader, snapshot, downloaded_chunks).await?
> >> +        if corrupt {
> >> +            info!(
> >> +                "re-sync snapshot {} due to bad verification result",
> >
> >nit: why not call it "corrupt", since that is what the parameter is called?
> 
> ack
> 
> >> +                snapshot.dir()
> >> +            );
> >> +        } else {
> >> +            info!("re-sync snapshot {}", snapshot.dir());
> >> +        }
> >> +        pull_snapshot(reader, snapshot, downloaded_chunks, corrupt).await?
> >>      };
> >>
> >>      Ok(sync_stats)



* Re: [pbs-devel] [PATCH proxmox-backup v3 2/3] fix #3786: ui/cli: add resync-corrupt option on sync-jobs
  2024-11-20 13:12   ` Fabian Grünbichler
@ 2024-11-21 10:18     ` Gabriel Goller
  0 siblings, 0 replies; 11+ messages in thread
From: Gabriel Goller @ 2024-11-21 10:18 UTC (permalink / raw)
  To: Fabian Grünbichler; +Cc: pbs-devel

On 20.11.2024 14:12, Fabian Grünbichler wrote:
>nit: might look better moved to the right side of the window, but no hard
>feelings either way

Every "advanced options" popout of ours (at least pbs) currently aligns
to left, so I wouldn't change that :)




* Re: [pbs-devel] [PATCH proxmox-backup v3 1/3] fix #3786: api: add resync-corrupt option to sync jobs
  2024-11-21 10:09       ` Fabian Grünbichler
@ 2024-11-21 10:30         ` Gabriel Goller
  0 siblings, 0 replies; 11+ messages in thread
From: Gabriel Goller @ 2024-11-21 10:30 UTC (permalink / raw)
  To: Fabian Grünbichler; +Cc: pbs-devel

On 21.11.2024 11:09, Fabian Grünbichler wrote:
>> Gabriel Goller <g.goller@proxmox.com> wrote on 21.11.2024 11:04 CET:
>>
>>
>> On 20.11.2024 14:11, Fabian Grünbichler wrote:
>> >a few small nits inline, looks good to me otherwise, but given the size of this
>> >and the size of the push series, I'd rather this be rebased on top of the other
>> >one ;)
>>
>> Sure, shouldn't be a lot of work. Should I send a rebased version on top
>> of the current push series as a v4?
>
>yes, but please wait until it's applied (there have been a few changes queued on top, and I am not sure whether they might cause more conflicts ;))

Ok, I'll hold off on the next version until that series is applied!

>> >Quoting Gabriel Goller (2024-11-05 11:40:13)
>> >> diff --git a/pbs-datastore/src/backup_info.rs b/pbs-datastore/src/backup_info.rs
>> >> index 414ec878d01a..e6174322dad6 100644
>> >> --- a/pbs-datastore/src/backup_info.rs
>> >> +++ b/pbs-datastore/src/backup_info.rs
>> >> @@ -8,7 +8,8 @@ use anyhow::{bail, format_err, Error};
>> >>  use proxmox_sys::fs::{lock_dir_noblock, replace_file, CreateOptions};
>> >>
>> >>  use pbs_api_types::{
>> >> -    Authid, BackupNamespace, BackupType, GroupFilter, BACKUP_DATE_REGEX, BACKUP_FILE_REGEX,
>> >> +    Authid, BackupNamespace, BackupType, GroupFilter, VerifyState, BACKUP_DATE_REGEX,
>> >> +    BACKUP_FILE_REGEX,
>> >>  };
>> >>  use pbs_config::{open_backup_lockfile, BackupLockGuard};
>> >>
>> >> @@ -583,6 +584,15 @@ impl BackupDir {
>> >>
>> >>          Ok(())
>> >>      }
>> >> +
>> >> +    /// Load the verify state from the manifest.
>> >> +    pub fn verify_state(&self) -> Option<VerifyState> {
>> >
>> >should this be a Result<Option<..>> to allow differentiation between no
>> >verification state, and failure to parse or load the manifest? that would allow
>> >us to resync totally corrupted snapshots as well (although that might be
>> >considered out of scope based on the parameter description ;))
>>
>> Yep, it was already like this in the first version, no idea why I changed
>> it. Like this we can return the load_manifest error with the Result and
>> swallow the inner error with `.ok()` as it doesn't matter anymore.
>>
>>      pub fn verify_state(&self) -> Result<Option<VerifyState>, anyhow::Error> {
>>          let manifest = self.load_manifest()?;
>>          Ok(manifest.0.verify_state().ok().flatten().map(|svs| svs.state))
>>      }
>>
>>
>> I think we also want to resync on errors when reading the manifest; I'll
>> include that in the next version! Something like this maybe:
>>
>>      match local_dir.verify_state() {
>>          Ok(Some(state)) => {
>>              if state == VerifyState::Failed {
>>                  return Some((dir, true));
>>              }
>>          },
>>          Ok(None) => {
>>              // This means there either was an error parsing the manifest, or the
>>              // verify_state item was not found. This could be a new backup.
>
>IMHO this should only be reached if no verification state is in the manifest (because no verification has happened yet), but the manifest was otherwise completely parseable. this can be treated the same as an okay verify state, since we can't know any better.

Oh, right. I'll treat that case like a good verify state then; sketch
below, after the quoted match.

>>          }
>>          Err(_) => {
>>              // There was an error loading the manifest, probably better if we
>>              // resync.
>>              return Some((dir, true));
>>          }
>>      }
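
So with a missing verify_state treated like a good one, the filtering
would become something like this (untested sketch; assuming the
`VerifyState` variants can be matched on directly):

     match local_dir.verify_state() {
         // only a verification that actually ran and failed marks the
         // snapshot as corrupt
         Ok(Some(VerifyState::Failed)) => return Some((dir, true)),
         // verified fine, or never verified (e.g. a new backup):
         // nothing to resync
         Ok(_) => {}
         // manifest could not be loaded or parsed at all, better to resync
         Err(_) => return Some((dir, true)),
     }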
>>
>> >> +        if let Ok(manifest) = self.load_manifest() {
>> >> +            manifest.0.verify_state()
>> >> +        } else {
>> >> +            None
>> >> +        }
>> >> +    }
>> >>  }
>> >>
>> >>  impl AsRef<pbs_api_types::BackupNamespace> for BackupDir {
>> >> diff --git a/pbs-datastore/src/manifest.rs b/pbs-datastore/src/manifest.rs
>> >> index c3df014272a0..623c1499c0bb 100644
>> >> --- a/pbs-datastore/src/manifest.rs
>> >> +++ b/pbs-datastore/src/manifest.rs
>> >> @@ -5,7 +5,7 @@ use anyhow::{bail, format_err, Error};
>> >>  use serde::{Deserialize, Serialize};
>> >>  use serde_json::{json, Value};
>> >>
>> >> -use pbs_api_types::{BackupType, CryptMode, Fingerprint};
>> >> +use pbs_api_types::{BackupType, CryptMode, Fingerprint, SnapshotVerifyState, VerifyState};
>> >>  use pbs_tools::crypt_config::CryptConfig;
>> >>
>> >>  pub const MANIFEST_BLOB_NAME: &str = "index.json.blob";
>> >> @@ -242,6 +242,17 @@ impl BackupManifest {
>> >>          let manifest: BackupManifest = serde_json::from_value(json)?;
>> >>          Ok(manifest)
>> >>      }
>> >> +
>> >> +    /// Get the verify state of the snapshot
>> >> +    ///
>> >> +    /// Note: New snapshots, which have not been verified yet, do not have a status and this
>> >> +    /// function will return `None`.
>> >> +    pub fn verify_state(&self) -> Option<VerifyState> {
>> >
>> >should this be a Result<Option<..>> to allow differentiation between no
>> >verification state, and failure to parse?
>>
>> Hmm, so I could return a Result<Option<..>> by checking the error of the
>> serde_json::from_value call. I could check whether the "verify_state" value
>> wasn't found in the manifest by calling `is_eof` [0] and, if so, return
>> `Ok(None)`, otherwise return an error. This will make it more complicated
>> for all the callers though – also 99% of the callers will treat Err the
>> same as Ok(None) anyway. LMK what you think!
>>
>>      /// Get the verify state of the snapshot
>>      ///
>>      /// Note: New snapshots, which have not been verified yet, do not have a status and this
>>      /// function will return `Ok(None)`.
>>      pub fn verify_state(&self) -> Result<Option<SnapshotVerifyState>, anyhow::Error> {
>>          let verify = self.unprotected["verify_state"].clone();
>
>can't you just check here whether we have a value and return None otherwise?

Yep, I can check with `value.is_null()`.

>>          match serde_json::from_value::<SnapshotVerifyState>(verify) {
>>              Err(err) => {
>
>then this can just bubble up the error?

ack

>>                  // `verify_state` item has not been found
>>                  if err.is_eof() {
>>                      Ok(None)
>>                  } else {
>>                      Err(err.into())
>>                  }
>>              },
>>              Ok(svs) => {
>>                  Ok(Some(svs))
>>              }
>>          }
>>      }
>>
>>
>> Alternatively, I could just return a Result<SnapshotVerifyState>.
>
>I think differentiating between Ok(Some(state)), Ok(None) and Err(err) is important here, so I'd rather not do that ;)

ack.
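
Putting both points together, something like this then (untested sketch):

     /// Get the verify state of the snapshot
     ///
     /// Note: New snapshots, which have not been verified yet, do not have
     /// a state and this function will return `Ok(None)`.
     pub fn verify_state(&self) -> Result<Option<SnapshotVerifyState>, anyhow::Error> {
         let verify = self.unprotected["verify_state"].clone();
         if verify.is_null() {
             // no verification has happened yet
             return Ok(None);
         }
         // a genuine parse failure simply bubbles up to the caller
         Ok(Some(serde_json::from_value::<SnapshotVerifyState>(verify)?))
     }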

Thanks for the review!



* Re: [pbs-devel] [PATCH proxmox-backup v3 0/3] fix #3786: resync corrupt chunks in sync-job
  2024-11-05 10:40 [pbs-devel] [PATCH proxmox-backup v3 0/3] fix #3786: resync corrupt chunks in sync-job Gabriel Goller
                   ` (2 preceding siblings ...)
  2024-11-05 10:40 ` [pbs-devel] [PATCH proxmox-backup v3 3/3] fix #3786: docs: add resync-corrupt option to sync-job Gabriel Goller
@ 2024-11-21 13:34 ` Gabriel Goller
  3 siblings, 0 replies; 11+ messages in thread
From: Gabriel Goller @ 2024-11-21 13:34 UTC (permalink / raw)
  To: pbs-devel

Submitted v4!



end of thread

Thread overview: 11+ messages
2024-11-05 10:40 [pbs-devel] [PATCH proxmox-backup v3 0/3] fix #3786: resync corrupt chunks in sync-job Gabriel Goller
2024-11-05 10:40 ` [pbs-devel] [PATCH proxmox-backup v3 1/3] fix #3786: api: add resync-corrupt option to sync jobs Gabriel Goller
2024-11-20 13:11   ` Fabian Grünbichler
2024-11-21 10:04     ` Gabriel Goller
2024-11-21 10:09       ` Fabian Grünbichler
2024-11-21 10:30         ` Gabriel Goller
2024-11-05 10:40 ` [pbs-devel] [PATCH proxmox-backup v3 2/3] fix #3786: ui/cli: add resync-corrupt option on sync-jobs Gabriel Goller
2024-11-20 13:12   ` Fabian Grünbichler
2024-11-21 10:18     ` Gabriel Goller
2024-11-05 10:40 ` [pbs-devel] [PATCH proxmox-backup v3 3/3] fix #3786: docs: add resync-corrupt option to sync-job Gabriel Goller
2024-11-21 13:34 ` [pbs-devel] [PATCH proxmox-backup v3 0/3] fix #3786: resync corrupt chunks in sync-job Gabriel Goller
