public inbox for pbs-devel@lists.proxmox.com
* [pbs-devel] [RFC PATCH 0/4] fix #3786: add a "deep sync" option to sync jobs
@ 2022-06-15  8:20 Stefan Sterz
  2022-06-15  8:20 ` [pbs-devel] [RFC PATCH 1/4] fix #3786: api: add deep sync parameter Stefan Sterz
                   ` (4 more replies)
  0 siblings, 5 replies; 8+ messages in thread
From: Stefan Sterz @ 2022-06-15  8:20 UTC (permalink / raw)
  To: pbs-devel

this series adds a "deep sync" option to sync jobs. a deep sync uses
the information from a previous verification job to re-sync snapshots
that have corrupted chunks.

the deep sync option is intentionally added to the "advanced" section
of the sync job configuration; for the reasoning, see commit 3.
however, i am not entirely sure that's the best way of handling
this.

sending this as an rfc because:

  a) this is a fairly minimalistic and unoptimized implementation. a
    deep sync currently just looks at _every_ manifest and, if it's
    corrupt, re-syncs the snapshot. as described in the bug report,
    this could be improved by flagging groups during gc /
    verification / etc. jobs and only looking at groups with corrupt
    snapshots.

    however, this would require a place to store this information.
    afaict currently the only metadata we have about groups is their
    owner and that has a dedicated "owner" file. i could just add
    a "corrupted" file, but if we need further information about
    groups in the future it might make sense to think about a more
    flexible solution.

    hence, i see three options: a) stick with this simple
    implementation, b) leverage gc / verification / etc. jobs further
    and flag groups with a dedicated "corrupt" file, or c) add a yet
    to be determined meta-data file for groups and store the flag
    there. any input would be greatly appreciated.

  b) while i tested this code quite a bit, i am unsure whether i
    missed scenarios in which corrupted snapshots might not be
    re-synced.
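
for illustration, option b) could be sketched roughly like this. note
that this is just a sketch with made-up names and layout -- pbs does
not store a "corrupt" marker file for groups today, and a real
implementation would need proper locking and error handling:

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical marker-file helpers for option b): flag a backup group
/// as containing corrupt snapshots, so a deep sync only has to scan
/// flagged groups instead of every manifest. The "corrupt" file name
/// and its placement in the group directory are assumptions.
fn corrupt_marker(group_dir: &Path) -> PathBuf {
    group_dir.join("corrupt")
}

/// Called by gc / verification jobs when they find a corrupt snapshot.
fn flag_group_corrupt(group_dir: &Path) -> io::Result<()> {
    // an empty file is enough; its mere presence carries the flag
    fs::File::create(corrupt_marker(group_dir)).map(|_| ())
}

/// Called once a deep sync has repaired all snapshots in the group.
fn clear_corrupt_flag(group_dir: &Path) -> io::Result<()> {
    match fs::remove_file(corrupt_marker(group_dir)) {
        Err(e) if e.kind() == io::ErrorKind::NotFound => Ok(()),
        other => other,
    }
}

/// Checked by the pull code to decide whether a group needs a deep scan.
fn group_flagged_corrupt(group_dir: &Path) -> bool {
    corrupt_marker(group_dir).exists()
}
```

with something like this, pull_group could skip the per-manifest check
entirely for groups without the marker.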

Stefan Sterz (4):
  fix #3786: api: add deep sync parameter
  fix #3786: server/datastore: add deep sync parameter to pull sync jobs
  fix #3786: ui/cli: add deep sync option to ui and cli
  fix #3786: docs: document deep sync behavior and prerequisites

 docs/managing-remotes.rst         |  6 ++++++
 pbs-api-types/src/jobs.rs         | 12 ++++++++++++
 pbs-datastore/src/backup_info.rs  | 22 +++++++++++++++++++++-
 src/api2/config/sync.rs           | 10 ++++++++++
 src/api2/pull.rs                  | 12 ++++++++++--
 src/bin/proxmox-backup-manager.rs | 13 +++++++++++--
 src/server/pull.rs                | 29 +++++++++++++++++++++--------
 www/window/SyncJobEdit.js         | 11 +++++++++++
 8 files changed, 102 insertions(+), 13 deletions(-)

-- 
2.30.2





^ permalink raw reply	[flat|nested] 8+ messages in thread

* [pbs-devel] [RFC PATCH 1/4] fix #3786: api: add deep sync parameter
  2022-06-15  8:20 [pbs-devel] [RFC PATCH 0/4] fix #3786: add a "deep sync" option to sync jobs Stefan Sterz
@ 2022-06-15  8:20 ` Stefan Sterz
  2022-08-02  9:19   ` Fabian Grünbichler
  2022-06-15  8:20 ` [pbs-devel] [RFC PATCH 2/4] fix #3786: server/datastore: add deep sync parameter to pull sync jobs Stefan Sterz
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 8+ messages in thread
From: Stefan Sterz @ 2022-06-15  8:20 UTC (permalink / raw)
  To: pbs-devel

allows configuring a sync job as a "deep" sync job, which will re-sync
corrupted snapshots

Signed-off-by: Stefan Sterz <s.sterz@proxmox.com>
---
 pbs-api-types/src/jobs.rs | 12 ++++++++++++
 src/api2/config/sync.rs   | 10 ++++++++++
 src/api2/pull.rs          | 12 ++++++++++--
 src/server/pull.rs        |  5 +++++
 4 files changed, 37 insertions(+), 2 deletions(-)

diff --git a/pbs-api-types/src/jobs.rs b/pbs-api-types/src/jobs.rs
index 925a1829..87de8803 100644
--- a/pbs-api-types/src/jobs.rs
+++ b/pbs-api-types/src/jobs.rs
@@ -63,6 +63,12 @@ pub const REMOVE_VANISHED_BACKUPS_SCHEMA: Schema = BooleanSchema::new(
 .default(false)
 .schema();
 
+pub const DEEP_SYNC_BACKUPS_SCHEMA: Schema = BooleanSchema::new(
+    "Checks if snapshots passed the last verification and if not re-syncs them.",
+)
+.default(false)
+.schema();
+
 #[api(
     properties: {
         "next-run": {
@@ -461,6 +467,10 @@ pub const GROUP_FILTER_LIST_SCHEMA: Schema =
             schema: NS_MAX_DEPTH_REDUCED_SCHEMA,
             optional: true,
         },
+        "deep-sync":{
+            schema: DEEP_SYNC_BACKUPS_SCHEMA,
+            optional: true,
+        },
         comment: {
             optional: true,
             schema: SINGLE_LINE_COMMENT_SCHEMA,
@@ -498,6 +508,8 @@ pub struct SyncJobConfig {
     #[serde(skip_serializing_if = "Option::is_none")]
     pub max_depth: Option<usize>,
     #[serde(skip_serializing_if = "Option::is_none")]
+    pub deep_sync: Option<bool>,
+    #[serde(skip_serializing_if = "Option::is_none")]
     pub comment: Option<String>,
     #[serde(skip_serializing_if = "Option::is_none")]
     pub schedule: Option<String>,
diff --git a/src/api2/config/sync.rs b/src/api2/config/sync.rs
index 7b9e9542..ee7f6175 100644
--- a/src/api2/config/sync.rs
+++ b/src/api2/config/sync.rs
@@ -200,6 +200,8 @@ pub enum DeletableProperty {
     schedule,
     /// Delete the remove-vanished flag.
     remove_vanished,
+    /// Delete the deep-sync flag.
+    deep_sync,
     /// Delete the group_filter property.
     group_filter,
     /// Delete the rate_in property.
@@ -286,6 +288,9 @@ pub fn update_sync_job(
                 DeletableProperty::remove_vanished => {
                     data.remove_vanished = None;
                 }
+                DeletableProperty::deep_sync => {
+                    data.deep_sync = None;
+                }
                 DeletableProperty::group_filter => {
                     data.group_filter = None;
                 }
@@ -381,6 +386,10 @@ pub fn update_sync_job(
         }
     }
 
+    if let Some(deep_sync) = update.deep_sync {
+        data.deep_sync = Some(deep_sync);
+    }
+
     if !check_sync_job_modify_access(&user_info, &auth_id, &data) {
         bail!("permission check failed");
     }
@@ -504,6 +513,7 @@ acl:1:/remote/remote1/remotestore1:write@pbs:RemoteSyncOperator
         owner: Some(write_auth_id.clone()),
         comment: None,
         remove_vanished: None,
+        deep_sync: None,
         max_depth: None,
         group_filter: None,
         schedule: None,
diff --git a/src/api2/pull.rs b/src/api2/pull.rs
index e05e946e..424f9962 100644
--- a/src/api2/pull.rs
+++ b/src/api2/pull.rs
@@ -10,8 +10,9 @@ use proxmox_sys::task_log;
 
 use pbs_api_types::{
     Authid, BackupNamespace, GroupFilter, RateLimitConfig, SyncJobConfig, DATASTORE_SCHEMA,
-    GROUP_FILTER_LIST_SCHEMA, NS_MAX_DEPTH_REDUCED_SCHEMA, PRIV_DATASTORE_BACKUP,
-    PRIV_DATASTORE_PRUNE, PRIV_REMOTE_READ, REMOTE_ID_SCHEMA, REMOVE_VANISHED_BACKUPS_SCHEMA,
+    DEEP_SYNC_BACKUPS_SCHEMA, GROUP_FILTER_LIST_SCHEMA, NS_MAX_DEPTH_REDUCED_SCHEMA,
+    PRIV_DATASTORE_BACKUP, PRIV_DATASTORE_PRUNE, PRIV_REMOTE_READ, REMOTE_ID_SCHEMA,
+    REMOVE_VANISHED_BACKUPS_SCHEMA,
 };
 use pbs_config::CachedUserInfo;
 use proxmox_rest_server::WorkerTask;
@@ -76,6 +77,7 @@ impl TryFrom<&SyncJobConfig> for PullParameters {
                 .clone(),
             sync_job.remove_vanished,
             sync_job.max_depth,
+            sync_job.deep_sync,
             sync_job.group_filter.clone(),
             sync_job.limit.clone(),
         )
@@ -196,6 +198,10 @@ pub fn do_sync_job(
                 schema: NS_MAX_DEPTH_REDUCED_SCHEMA,
                 optional: true,
             },
+            "deep-sync": {
+                schema: DEEP_SYNC_BACKUPS_SCHEMA,
+                optional: true,
+            },
             "group-filter": {
                 schema: GROUP_FILTER_LIST_SCHEMA,
                 optional: true,
@@ -224,6 +230,7 @@ async fn pull(
     remote_ns: Option<BackupNamespace>,
     remove_vanished: Option<bool>,
     max_depth: Option<usize>,
+    deep_sync: Option<bool>,
     group_filter: Option<Vec<GroupFilter>>,
     limit: RateLimitConfig,
     _info: &ApiMethod,
@@ -257,6 +264,7 @@ async fn pull(
         auth_id.clone(),
         remove_vanished,
         max_depth,
+        deep_sync,
         group_filter,
         limit,
     )?;
diff --git a/src/server/pull.rs b/src/server/pull.rs
index b159c75d..6778c66b 100644
--- a/src/server/pull.rs
+++ b/src/server/pull.rs
@@ -56,6 +56,8 @@ pub struct PullParameters {
     remove_vanished: bool,
     /// How many levels of sub-namespaces to pull (0 == no recursion, None == maximum recursion)
     max_depth: Option<usize>,
+    /// Whether to re-sync corrupted snapshots
+    _deep_sync: bool,
     /// Filters for reducing the pull scope
     group_filter: Option<Vec<GroupFilter>>,
     /// Rate limits for all transfers from `remote`
@@ -76,6 +78,7 @@ impl PullParameters {
         owner: Authid,
         remove_vanished: Option<bool>,
         max_depth: Option<usize>,
+        deep_sync: Option<bool>,
         group_filter: Option<Vec<GroupFilter>>,
         limit: RateLimitConfig,
     ) -> Result<Self, Error> {
@@ -90,6 +93,7 @@ impl PullParameters {
         let remote: Remote = remote_config.lookup("remote", remote)?;
 
         let remove_vanished = remove_vanished.unwrap_or(false);
+        let deep_sync = deep_sync.unwrap_or(false);
 
         let source = BackupRepository::new(
             Some(remote.config.auth_id.clone()),
@@ -107,6 +111,7 @@ impl PullParameters {
             owner,
             remove_vanished,
             max_depth,
+            _deep_sync: deep_sync,
             group_filter,
             limit,
         })
-- 
2.30.2





^ permalink raw reply	[flat|nested] 8+ messages in thread

* [pbs-devel] [RFC PATCH 2/4] fix #3786: server/datastore: add deep sync parameter to pull sync jobs
  2022-06-15  8:20 [pbs-devel] [RFC PATCH 0/4] fix #3786: add a "deep sync" option to sync jobs Stefan Sterz
  2022-06-15  8:20 ` [pbs-devel] [RFC PATCH 1/4] fix #3786: api: add deep sync parameter Stefan Sterz
@ 2022-06-15  8:20 ` Stefan Sterz
  2022-08-02 10:07   ` Fabian Grünbichler
  2022-06-15  8:20 ` [pbs-devel] [RFC PATCH 3/4] fix #3786: ui/cli: add deep sync option to ui and cli Stefan Sterz
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 8+ messages in thread
From: Stefan Sterz @ 2022-06-15  8:20 UTC (permalink / raw)
  To: pbs-devel

Signed-off-by: Stefan Sterz <s.sterz@proxmox.com>
---
 pbs-datastore/src/backup_info.rs | 22 +++++++++++++++++++++-
 src/server/pull.rs               | 28 ++++++++++++++++++----------
 2 files changed, 39 insertions(+), 11 deletions(-)

diff --git a/pbs-datastore/src/backup_info.rs b/pbs-datastore/src/backup_info.rs
index 10320a35..89461c66 100644
--- a/pbs-datastore/src/backup_info.rs
+++ b/pbs-datastore/src/backup_info.rs
@@ -9,7 +9,8 @@ use anyhow::{bail, format_err, Error};
 use proxmox_sys::fs::{lock_dir_noblock, replace_file, CreateOptions};
 
 use pbs_api_types::{
-    Authid, BackupNamespace, BackupType, GroupFilter, BACKUP_DATE_REGEX, BACKUP_FILE_REGEX,
+    Authid, BackupNamespace, BackupType, GroupFilter, SnapshotVerifyState, VerifyState,
+    BACKUP_DATE_REGEX, BACKUP_FILE_REGEX,
 };
 use pbs_config::{open_backup_lockfile, BackupLockGuard};
 
@@ -544,6 +545,25 @@ impl BackupDir {
 
         Ok(())
     }
+
+    /// Returns true if the last verification of the snapshot failed and false otherwise.
+    ///
+    /// Note that a snapshot that has not been verified will also return false.
+    pub fn is_corrupt(&self) -> bool {
+        let mut to_return = false;
+
+        let _ = self.update_manifest(|m| {
+            let verify = m.unprotected["verify_state"].clone();
+
+            if let Ok(verify) = serde_json::from_value::<SnapshotVerifyState>(verify) {
+                if verify.state == VerifyState::Failed {
+                    to_return = true;
+                }
+            }
+        });
+
+        to_return
+    }
 }
 
 impl AsRef<pbs_api_types::BackupNamespace> for BackupDir {
diff --git a/src/server/pull.rs b/src/server/pull.rs
index 6778c66b..767b394c 100644
--- a/src/server/pull.rs
+++ b/src/server/pull.rs
@@ -57,7 +57,7 @@ pub struct PullParameters {
     /// How many levels of sub-namespaces to pull (0 == no recursion, None == maximum recursion)
     max_depth: Option<usize>,
     /// Whether to re-sync corrupted snapshots
-    _deep_sync: bool,
+    deep_sync: bool,
     /// Filters for reducing the pull scope
     group_filter: Option<Vec<GroupFilter>>,
     /// Rate limits for all transfers from `remote`
@@ -111,7 +111,7 @@ impl PullParameters {
             owner,
             remove_vanished,
             max_depth,
-            _deep_sync: deep_sync,
+            deep_sync,
             group_filter,
             limit,
         })
@@ -371,6 +371,7 @@ async fn pull_snapshot(
     worker: &WorkerTask,
     reader: Arc<BackupReader>,
     snapshot: &pbs_datastore::BackupDir,
+    params: &PullParameters,
     downloaded_chunks: Arc<Mutex<HashSet<[u8; 32]>>>,
 ) -> Result<(), Error> {
     let mut manifest_name = snapshot.full_path();
@@ -437,7 +438,10 @@ async fn pull_snapshot(
         let mut path = snapshot.full_path();
         path.push(&item.filename);
 
-        if path.exists() {
+        // if a snapshot could not be verified, the index file will stay the same, but it'll point
+        // to at least one corrupted chunk. hence, skip this check if the last verification job
+        // failed and we are running a deep sync.
+        if !(params.deep_sync && snapshot.is_corrupt()) && path.exists() {
             match archive_type(&item.filename)? {
                 ArchiveType::DynamicIndex => {
                     let index = DynamicIndexReader::open(&path)?;
@@ -513,6 +517,7 @@ async fn pull_snapshot_from(
     worker: &WorkerTask,
     reader: Arc<BackupReader>,
     snapshot: &pbs_datastore::BackupDir,
+    params: &PullParameters,
     downloaded_chunks: Arc<Mutex<HashSet<[u8; 32]>>>,
 ) -> Result<(), Error> {
     let (_path, is_new, _snap_lock) = snapshot
@@ -522,7 +527,7 @@ async fn pull_snapshot_from(
     if is_new {
         task_log!(worker, "sync snapshot {}", snapshot.dir());
 
-        if let Err(err) = pull_snapshot(worker, reader, snapshot, downloaded_chunks).await {
+        if let Err(err) = pull_snapshot(worker, reader, snapshot, params, downloaded_chunks).await {
             if let Err(cleanup_err) = snapshot.datastore().remove_backup_dir(
                 snapshot.backup_ns(),
                 snapshot.as_ref(),
@@ -535,7 +540,7 @@ async fn pull_snapshot_from(
         task_log!(worker, "sync snapshot {} done", snapshot.dir());
     } else {
         task_log!(worker, "re-sync snapshot {}", snapshot.dir());
-        pull_snapshot(worker, reader, snapshot, downloaded_chunks).await?;
+        pull_snapshot(worker, reader, snapshot, params, downloaded_chunks).await?;
         task_log!(worker, "re-sync snapshot {} done", snapshot.dir());
     }
 
@@ -666,10 +671,12 @@ async fn pull_group(
 
         remote_snapshots.insert(snapshot.time);
 
-        if let Some(last_sync_time) = last_sync {
-            if last_sync_time > snapshot.time {
-                skip_info.update(snapshot.time);
-                continue;
+        if !params.deep_sync {
+            if let Some(last_sync_time) = last_sync {
+                if last_sync_time > snapshot.time {
+                    skip_info.update(snapshot.time);
+                    continue;
+                }
             }
         }
 
@@ -699,7 +706,8 @@ async fn pull_group(
 
         let snapshot = params.store.backup_dir(target_ns.clone(), snapshot)?;
 
-        let result = pull_snapshot_from(worker, reader, &snapshot, downloaded_chunks.clone()).await;
+        let result =
+            pull_snapshot_from(worker, reader, &snapshot, params, downloaded_chunks.clone()).await;
 
         progress.done_snapshots = pos as u64 + 1;
         task_log!(worker, "percentage done: {}", progress);
-- 
2.30.2





^ permalink raw reply	[flat|nested] 8+ messages in thread

* [pbs-devel] [RFC PATCH 3/4] fix #3786: ui/cli: add deep sync option to ui and cli
  2022-06-15  8:20 [pbs-devel] [RFC PATCH 0/4] fix #3786: add a "deep sync" option to sync jobs Stefan Sterz
  2022-06-15  8:20 ` [pbs-devel] [RFC PATCH 1/4] fix #3786: api: add deep sync parameter Stefan Sterz
  2022-06-15  8:20 ` [pbs-devel] [RFC PATCH 2/4] fix #3786: server/datastore: add deep sync parameter to pull sync jobs Stefan Sterz
@ 2022-06-15  8:20 ` Stefan Sterz
  2022-06-15  8:20 ` [pbs-devel] [RFC PATCH 4/4] fix #3786: docs: document deep sync behavior and prerequisites Stefan Sterz
  2022-06-22 10:52 ` [pbs-devel] [RFC PATCH 0/4] fix #3786: add a "deep sync" option to sync jobs Matthias Heiserer
  4 siblings, 0 replies; 8+ messages in thread
From: Stefan Sterz @ 2022-06-15  8:20 UTC (permalink / raw)
  To: pbs-devel

make "deep sync" an advanced option because: a) you need to know that
it has no effect unless a previous verify job has been executed, b) it
might take a lot more time to finish since it has to check each
manifest, and c) the default "off" should be what most users want.

Signed-off-by: Stefan Sterz <s.sterz@proxmox.com>
---
 src/bin/proxmox-backup-manager.rs | 13 +++++++++++--
 www/window/SyncJobEdit.js         | 11 +++++++++++
 2 files changed, 22 insertions(+), 2 deletions(-)

diff --git a/src/bin/proxmox-backup-manager.rs b/src/bin/proxmox-backup-manager.rs
index 89598c90..0c0191dd 100644
--- a/src/bin/proxmox-backup-manager.rs
+++ b/src/bin/proxmox-backup-manager.rs
@@ -12,8 +12,8 @@ use proxmox_sys::fs::CreateOptions;
 use pbs_api_types::percent_encoding::percent_encode_component;
 use pbs_api_types::{
     BackupNamespace, GroupFilter, RateLimitConfig, SyncJobConfig, DATASTORE_SCHEMA,
-    GROUP_FILTER_LIST_SCHEMA, IGNORE_VERIFIED_BACKUPS_SCHEMA, NS_MAX_DEPTH_SCHEMA,
-    REMOTE_ID_SCHEMA, REMOVE_VANISHED_BACKUPS_SCHEMA, UPID_SCHEMA,
+    DEEP_SYNC_BACKUPS_SCHEMA, GROUP_FILTER_LIST_SCHEMA, IGNORE_VERIFIED_BACKUPS_SCHEMA,
+    NS_MAX_DEPTH_SCHEMA, REMOTE_ID_SCHEMA, REMOVE_VANISHED_BACKUPS_SCHEMA, UPID_SCHEMA,
     VERIFICATION_OUTDATED_AFTER_SCHEMA,
 };
 use pbs_client::{display_task_log, view_task_result};
@@ -260,6 +260,10 @@ fn task_mgmt_cli() -> CommandLineInterface {
                 schema: NS_MAX_DEPTH_SCHEMA,
                 optional: true,
             },
+            "deep-sync": {
+                schema: DEEP_SYNC_BACKUPS_SCHEMA,
+                optional: true,
+            },
             "group-filter": {
                 schema: GROUP_FILTER_LIST_SCHEMA,
                 optional: true,
@@ -284,6 +288,7 @@ async fn pull_datastore(
     ns: Option<BackupNamespace>,
     remove_vanished: Option<bool>,
     max_depth: Option<usize>,
+    deep_sync: Option<bool>,
     group_filter: Option<Vec<GroupFilter>>,
     limit: RateLimitConfig,
     param: Value,
@@ -319,6 +324,10 @@ async fn pull_datastore(
         args["remove-vanished"] = Value::from(remove_vanished);
     }
 
+    if let Some(deep_sync) = deep_sync {
+        args["deep-sync"] = Value::from(deep_sync);
+    }
+
     let result = client.post("api2/json/pull", Some(args)).await?;
 
     view_task_result(&client, result, &output_format).await?;
diff --git a/www/window/SyncJobEdit.js b/www/window/SyncJobEdit.js
index 948ad5da..da1b0bed 100644
--- a/www/window/SyncJobEdit.js
+++ b/www/window/SyncJobEdit.js
@@ -232,6 +232,17 @@ Ext.define('PBS.window.SyncJobEdit', {
 			    editable: '{isCreate}',
 			},
 		    },
+		    {
+			fieldLabel: gettext('Deep sync'),
+			xtype: 'proxmoxcheckbox',
+			name: 'deep-sync',
+			autoEl: {
+			    tag: 'div',
+			    'data-qtip': gettext('Check if all snapshots passed the last verification job and if not re-sync them?'),
+			},
+			uncheckedValue: false,
+			value: false,
+		    },
 		],
 	    },
 	    {
-- 
2.30.2





^ permalink raw reply	[flat|nested] 8+ messages in thread

* [pbs-devel] [RFC PATCH 4/4] fix #3786: docs: document deep sync behavior and prerequisites
  2022-06-15  8:20 [pbs-devel] [RFC PATCH 0/4] fix #3786: add a "deep sync" option to sync jobs Stefan Sterz
                   ` (2 preceding siblings ...)
  2022-06-15  8:20 ` [pbs-devel] [RFC PATCH 3/4] fix #3786: ui/cli: add deep sync option to ui and cli Stefan Sterz
@ 2022-06-15  8:20 ` Stefan Sterz
  2022-06-22 10:52 ` [pbs-devel] [RFC PATCH 0/4] fix #3786: add a "deep sync" option to sync jobs Matthias Heiserer
  4 siblings, 0 replies; 8+ messages in thread
From: Stefan Sterz @ 2022-06-15  8:20 UTC (permalink / raw)
  To: pbs-devel

Signed-off-by: Stefan Sterz <s.sterz@proxmox.com>
---
 docs/managing-remotes.rst | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/docs/managing-remotes.rst b/docs/managing-remotes.rst
index 0af3dd60..ff639054 100644
--- a/docs/managing-remotes.rst
+++ b/docs/managing-remotes.rst
@@ -95,6 +95,12 @@ the local datastore as well. If the ``owner`` option is not set (defaulting to
 ``root@pam``) or is set to something other than the configuring user,
 ``Datastore.Modify`` is required as well.
 
+Enabling the advanced option ``deep-sync`` will re-sync all snapshots that have
+failed to verify during the last :ref:`maintenance_verification`. Hence, a
+verification job needs to be run before a deep sync job can be carried out. Be
+aware that a deep sync needs to check the manifests of all snapshots in a
+datastore and might take much longer than regular sync jobs.
+
 If the ``group-filter`` option is set, only backup groups matching at least one
 of the specified criteria are synced. The available criteria are:
 
-- 
2.30.2





^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [pbs-devel] [RFC PATCH 0/4] fix #3786: add a "deep sync" option to sync jobs
  2022-06-15  8:20 [pbs-devel] [RFC PATCH 0/4] fix #3786: add a "deep sync" option to sync jobs Stefan Sterz
                   ` (3 preceding siblings ...)
  2022-06-15  8:20 ` [pbs-devel] [RFC PATCH 4/4] fix #3786: docs: document deep sync behavior and prerequisites Stefan Sterz
@ 2022-06-22 10:52 ` Matthias Heiserer
  4 siblings, 0 replies; 8+ messages in thread
From: Matthias Heiserer @ 2022-06-22 10:52 UTC (permalink / raw)
  To: pbs-devel

On 15.06.2022 10:20, Stefan Sterz wrote:
> this series adds a "deep sync" option to sync jobs. a deep sync uses
> the information from a previous verification job to re-sync snapshots
> that have corrupted chunks.
> 
> the deep sync option is added to the "advanced" section of the sync
> job configuration intentionally for the reasoning see commit 3.
> however, i am not entirely sure if that's the best way of handling
> this.
> 
Tested on a PBS VM with two data stores, works great!
I quite like it.




^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [pbs-devel] [RFC PATCH 1/4] fix #3786: api: add deep sync parameter
  2022-06-15  8:20 ` [pbs-devel] [RFC PATCH 1/4] fix #3786: api: add deep sync parameter Stefan Sterz
@ 2022-08-02  9:19   ` Fabian Grünbichler
  0 siblings, 0 replies; 8+ messages in thread
From: Fabian Grünbichler @ 2022-08-02  9:19 UTC (permalink / raw)
  To: Proxmox Backup Server development discussion

On June 15, 2022 10:20 am, Stefan Sterz wrote:
> allows configuring a sync job as a "deep" sync job, which will re-sync
> corrupted snapshots

I am not too happy about the naming, "deep" could mean basically 
anything, including re-syncing all missing snapshots, (re-?)syncing 
notes/verification-state/protection/..

maybe it makes sense to either make this more verbose, or already make 
it a property string "resync" with
- missing (to re-sync missing, older snapshots)
- corrupt (to re-sync corrupt snapshots)
?
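
as a rough sketch (not an existing pbs type; the flag names only
follow the suggestion above), such a "resync" property string could
parse into something like:

```rust
use std::str::FromStr;

/// Hypothetical "resync" options as suggested above: a comma-separated
/// list of flags selecting which snapshots a sync job re-syncs.
/// Names and semantics are illustrative only.
#[derive(Debug, Default, PartialEq)]
struct ResyncOptions {
    /// re-sync snapshots that are missing locally, even if older
    /// than the last synced snapshot
    missing: bool,
    /// re-sync snapshots whose last verification failed
    corrupt: bool,
}

impl FromStr for ResyncOptions {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        let mut opts = ResyncOptions::default();
        // an empty string means "no re-syncing", i.e. current behavior
        for part in s.split(',').map(str::trim).filter(|p| !p.is_empty()) {
            match part {
                "missing" => opts.missing = true,
                "corrupt" => opts.corrupt = true,
                other => return Err(format!("unknown resync flag '{}'", other)),
            }
        }
        Ok(opts)
    }
}
```

in the real config this would presumably be a proper api-typed
property string rather than a hand-rolled FromStr, but the shape of
the flags would be the same.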

> 
> Signed-off-by: Stefan Sterz <s.sterz@proxmox.com>
> ---
>  pbs-api-types/src/jobs.rs | 12 ++++++++++++
>  src/api2/config/sync.rs   | 10 ++++++++++
>  src/api2/pull.rs          | 12 ++++++++++--
>  src/server/pull.rs        |  5 +++++
>  4 files changed, 37 insertions(+), 2 deletions(-)
> 
> diff --git a/pbs-api-types/src/jobs.rs b/pbs-api-types/src/jobs.rs
> index 925a1829..87de8803 100644
> --- a/pbs-api-types/src/jobs.rs
> +++ b/pbs-api-types/src/jobs.rs
> @@ -63,6 +63,12 @@ pub const REMOVE_VANISHED_BACKUPS_SCHEMA: Schema = 
> BooleanSchema::new(
>  .default(false)
>  .schema();
>  
> +pub const DEEP_SYNC_BACKUPS_SCHEMA: Schema = BooleanSchema::new(
> +    "Checks if snapshots passed the last verification and if not 
> re-syncs them.",
> +)
> +.default(false)
> +.schema();
> +
>  #[api(
>      properties: {
>          "next-run": {
> @@ -461,6 +467,10 @@ pub const GROUP_FILTER_LIST_SCHEMA: Schema =
>              schema: NS_MAX_DEPTH_REDUCED_SCHEMA,
>              optional: true,
>          },
> +        "deep-sync":{
> +            schema: DEEP_SYNC_BACKUPS_SCHEMA,
> +            optional: true,
> +        },
>          comment: {
>              optional: true,
>              schema: SINGLE_LINE_COMMENT_SCHEMA,
> @@ -498,6 +508,8 @@ pub struct SyncJobConfig {
>      #[serde(skip_serializing_if = "Option::is_none")]
>      pub max_depth: Option<usize>,
>      #[serde(skip_serializing_if = "Option::is_none")]
> +    pub deep_sync: Option<bool>,
> +    #[serde(skip_serializing_if = "Option::is_none")]
>      pub comment: Option<String>,
>      #[serde(skip_serializing_if = "Option::is_none")]
>      pub schedule: Option<String>,
> diff --git a/src/api2/config/sync.rs b/src/api2/config/sync.rs
> index 7b9e9542..ee7f6175 100644
> --- a/src/api2/config/sync.rs
> +++ b/src/api2/config/sync.rs
> @@ -200,6 +200,8 @@ pub enum DeletableProperty {
>      schedule,
>      /// Delete the remove-vanished flag.
>      remove_vanished,
> +    /// Delete the deep-sync flag.
> +    deep_sync,
>      /// Delete the group_filter property.
>      group_filter,
>      /// Delete the rate_in property.
> @@ -286,6 +288,9 @@ pub fn update_sync_job(
>                  DeletableProperty::remove_vanished => {
>                      data.remove_vanished = None;
>                  }
> +                DeletableProperty::deep_sync => {
> +                    data.deep_sync = None;
> +                }
>                  DeletableProperty::group_filter => {
>                      data.group_filter = None;
>                  }
> @@ -381,6 +386,10 @@ pub fn update_sync_job(
>          }
>      }
>  
> +    if let Some(deep_sync) = update.deep_sync {
> +        data.deep_sync = Some(deep_sync);
> +    }
> +
>      if !check_sync_job_modify_access(&user_info, &auth_id, &data) {
>          bail!("permission check failed");
>      }
> @@ -504,6 +513,7 @@ 
> acl:1:/remote/remote1/remotestore1:write@pbs:RemoteSyncOperator
>          owner: Some(write_auth_id.clone()),
>          comment: None,
>          remove_vanished: None,
> +        deep_sync: None,
>          max_depth: None,
>          group_filter: None,
>          schedule: None,
> diff --git a/src/api2/pull.rs b/src/api2/pull.rs
> index e05e946e..424f9962 100644
> --- a/src/api2/pull.rs
> +++ b/src/api2/pull.rs
> @@ -10,8 +10,9 @@ use proxmox_sys::task_log;
>  
>  use pbs_api_types::{
>      Authid, BackupNamespace, GroupFilter, RateLimitConfig, SyncJobConfig, DATASTORE_SCHEMA,
> -    GROUP_FILTER_LIST_SCHEMA, NS_MAX_DEPTH_REDUCED_SCHEMA, PRIV_DATASTORE_BACKUP,
> -    PRIV_DATASTORE_PRUNE, PRIV_REMOTE_READ, REMOTE_ID_SCHEMA, REMOVE_VANISHED_BACKUPS_SCHEMA,
> +    DEEP_SYNC_BACKUPS_SCHEMA, GROUP_FILTER_LIST_SCHEMA, NS_MAX_DEPTH_REDUCED_SCHEMA,
> +    PRIV_DATASTORE_BACKUP, PRIV_DATASTORE_PRUNE, PRIV_REMOTE_READ, REMOTE_ID_SCHEMA,
> +    REMOVE_VANISHED_BACKUPS_SCHEMA,
>  };
>  use pbs_config::CachedUserInfo;
>  use proxmox_rest_server::WorkerTask;
> @@ -76,6 +77,7 @@ impl TryFrom<&SyncJobConfig> for PullParameters {
>                  .clone(),
>              sync_job.remove_vanished,
>              sync_job.max_depth,
> +            sync_job.deep_sync,
>              sync_job.group_filter.clone(),
>              sync_job.limit.clone(),
>          )
> @@ -196,6 +198,10 @@ pub fn do_sync_job(
>                  schema: NS_MAX_DEPTH_REDUCED_SCHEMA,
>                  optional: true,
>              },
> +            "deep-sync": {
> +                schema: DEEP_SYNC_BACKUPS_SCHEMA,
> +                optional: true,
> +            },
>              "group-filter": {
>                  schema: GROUP_FILTER_LIST_SCHEMA,
>                  optional: true,
> @@ -224,6 +230,7 @@ async fn pull(
>      remote_ns: Option<BackupNamespace>,
>      remove_vanished: Option<bool>,
>      max_depth: Option<usize>,
> +    deep_sync: Option<bool>,
>      group_filter: Option<Vec<GroupFilter>>,
>      limit: RateLimitConfig,
>      _info: &ApiMethod,
> @@ -257,6 +264,7 @@ async fn pull(
>          auth_id.clone(),
>          remove_vanished,
>          max_depth,
> +        deep_sync,
>          group_filter,
>          limit,
>      )?;
> diff --git a/src/server/pull.rs b/src/server/pull.rs
> index b159c75d..6778c66b 100644
> --- a/src/server/pull.rs
> +++ b/src/server/pull.rs
> @@ -56,6 +56,8 @@ pub struct PullParameters {
>      remove_vanished: bool,
>      /// How many levels of sub-namespaces to pull (0 == no recursion, None == maximum recursion)
>      max_depth: Option<usize>,
> +    /// Whether to re-sync corrupted snapshots
> +    _deep_sync: bool,

nit: IMHO it's better to have this commit generate a small warning here 
instead of the churn of renaming that field in the next patch ;) but 
it's a matter of taste I guess :)

>      /// Filters for reducing the pull scope
>      group_filter: Option<Vec<GroupFilter>>,
>      /// Rate limits for all transfers from `remote`
> @@ -76,6 +78,7 @@ impl PullParameters {
>          owner: Authid,
>          remove_vanished: Option<bool>,
>          max_depth: Option<usize>,
> +        deep_sync: Option<bool>,
>          group_filter: Option<Vec<GroupFilter>>,
>          limit: RateLimitConfig,
>      ) -> Result<Self, Error> {
> @@ -90,6 +93,7 @@ impl PullParameters {
>          let remote: Remote = remote_config.lookup("remote", remote)?;
>  
>          let remove_vanished = remove_vanished.unwrap_or(false);
> +        let deep_sync = deep_sync.unwrap_or(false);
>  
>          let source = BackupRepository::new(
>              Some(remote.config.auth_id.clone()),
> @@ -107,6 +111,7 @@ impl PullParameters {
>              owner,
>              remove_vanished,
>              max_depth,
> +            _deep_sync: deep_sync,
>              group_filter,
>              limit,
>          })
> -- 
> 2.30.2
> 
> 
> 
> _______________________________________________
> pbs-devel mailing list
> pbs-devel@lists.proxmox.com
> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
> 
> 
> 




^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [pbs-devel] [RFC PATCH 2/4] fix #3786: server/datastore: add deep sync parameter to pull sync jobs
  2022-06-15  8:20 ` [pbs-devel] [RFC PATCH 2/4] fix #3786: server/datastore: add deep sync parameter to pull sync jobs Stefan Sterz
@ 2022-08-02 10:07   ` Fabian Grünbichler
  0 siblings, 0 replies; 8+ messages in thread
From: Fabian Grünbichler @ 2022-08-02 10:07 UTC (permalink / raw)
  To: Proxmox Backup Server development discussion

On June 15, 2022 10:20 am, Stefan Sterz wrote:
> Signed-off-by: Stefan Sterz <s.sterz@proxmox.com>
> ---
>  pbs-datastore/src/backup_info.rs | 22 +++++++++++++++++++++-
>  src/server/pull.rs               | 28 ++++++++++++++++++----------
>  2 files changed, 39 insertions(+), 11 deletions(-)
> 
> diff --git a/pbs-datastore/src/backup_info.rs b/pbs-datastore/src/backup_info.rs
> index 10320a35..89461c66 100644
> --- a/pbs-datastore/src/backup_info.rs
> +++ b/pbs-datastore/src/backup_info.rs
> @@ -9,7 +9,8 @@ use anyhow::{bail, format_err, Error};
>  use proxmox_sys::fs::{lock_dir_noblock, replace_file, CreateOptions};
>  
>  use pbs_api_types::{
> -    Authid, BackupNamespace, BackupType, GroupFilter, BACKUP_DATE_REGEX, BACKUP_FILE_REGEX,
> +    Authid, BackupNamespace, BackupType, GroupFilter, SnapshotVerifyState, VerifyState,
> +    BACKUP_DATE_REGEX, BACKUP_FILE_REGEX,
>  };
>  use pbs_config::{open_backup_lockfile, BackupLockGuard};
>  
> @@ -544,6 +545,25 @@ impl BackupDir {
>  
>          Ok(())
>      }
> +
> +    /// Returns true if the last verification of the snapshot failed and false otherwise.
> +    ///
> +    /// Note that a snapshot that has not been verified will also return false.
> +    pub fn is_corrupt(&self) -> bool {
> +        let mut to_return = false;
> +
> +        let _ = self.update_manifest(|m| {
> +            let verify = m.unprotected["verify_state"].clone();
> +
> +            if let Ok(verify) = serde_json::from_value::<SnapshotVerifyState>(verify) {
> +                if verify.state == VerifyState::Failed {
> +                    to_return = true;
> +                }
> +            }
> +        });
> +
> +        to_return
> +    }

I am not sure whether this is the right place for this helper (mainly 
because of the assumption that something named "is_corrupt" in such a 
high-level place is allowed to return "false" in the error path..)

maybe move it to the pull code as an internal helper there for now?

alternatively, something like this might be more appropriate if we think 
we'll add more places where we just want to base actions on "definitely 
corrupt/ok/.. last time we checked":

 pub fn verify_state(&self) -> Option<VerifyState> {
     self.load_manifest().ok().and_then(|(m, _)| {
         let verify = m.unprotected["verify_state"].clone();
         serde_json::from_value::<SnapshotVerifyState>(verify).ok().map(|svs| svs.state)
     })
 }

or the same slightly adapted and wrapped in Result if we want to handle 
no verify state differently from broken verify state/failure to load 
manifest.

no locking just for reading, shorter code, more widely-usable return 
value => pull code can then still map that so everything except 
Some(Failed) means it's okay ;)
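
As a rough sketch of that consumption on the pull side (stub types only: `VerifyState` here stands in for the pbs-api-types enum, and the helper name is illustrative, not actual PBS code):

```rust
// Stand-in for pbs-api-types' VerifyState; variants are illustrative.
#[derive(Debug, Clone, Copy, PartialEq)]
enum VerifyState {
    Ok,
    Failed,
}

/// Everything except `Some(Failed)` counts as "not known to be corrupt":
/// a missing manifest, an unparsable state and a never-verified snapshot
/// all map to `false` here.
fn known_corrupt(state: Option<VerifyState>) -> bool {
    matches!(state, Some(VerifyState::Failed))
}

fn main() {
    assert!(!known_corrupt(None));
    assert!(!known_corrupt(Some(VerifyState::Ok)));
    assert!(known_corrupt(Some(VerifyState::Failed)));
    println!("ok");
}
```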

or, given that the current patch only checks this when a manifest is 
already available, it could also move to the manifest to remove the 
outer layer of error-source ;) but see below for other potential changes 
that might make this moot again.

>  }
>  
>  impl AsRef<pbs_api_types::BackupNamespace> for BackupDir {
> diff --git a/src/server/pull.rs b/src/server/pull.rs
> index 6778c66b..767b394c 100644
> --- a/src/server/pull.rs
> +++ b/src/server/pull.rs
> @@ -57,7 +57,7 @@ pub struct PullParameters {
>      /// How many levels of sub-namespaces to pull (0 == no recursion, None == maximum recursion)
>      max_depth: Option<usize>,
>      /// Whether to re-sync corrupted snapshots
> -    _deep_sync: bool,
> +    deep_sync: bool,
>      /// Filters for reducing the pull scope
>      group_filter: Option<Vec<GroupFilter>>,
>      /// Rate limits for all transfers from `remote`
> @@ -111,7 +111,7 @@ impl PullParameters {
>              owner,
>              remove_vanished,
>              max_depth,
> -            _deep_sync: deep_sync,
> +            deep_sync,
>              group_filter,
>              limit,
>          })
> @@ -371,6 +371,7 @@ async fn pull_snapshot(
>      worker: &WorkerTask,
>      reader: Arc<BackupReader>,
>      snapshot: &pbs_datastore::BackupDir,
> +    params: &PullParameters,
>      downloaded_chunks: Arc<Mutex<HashSet<[u8; 32]>>>,

this could just be a `force` or `force-corrupt` boolean? (either meaning 
"force re-sync if corrupt" if the check stays here, or "force re-sync 
because corrupt" if the check moves up a layer)

>  ) -> Result<(), Error> {
>      let mut manifest_name = snapshot.full_path();
> @@ -437,7 +438,10 @@ async fn pull_snapshot(
>          let mut path = snapshot.full_path();
>          path.push(&item.filename);
>  
> -        if path.exists() {
> +        // if a snapshot could not be verified, the index file will stay the same, but it'll point
> +        // to at least one corrupted chunk. hence, skip this check if the last verification job
> +        // failed and we are running a deep sync.
> +        if !(params.deep_sync && snapshot.is_corrupt()) && path.exists() {

this decision could be made in pull_snapshot_from (only for already 
existing snapshots), setting force accordingly. or it could stay here, 
and be called on the manifest directly. in any case it only needs to be 
called once, not once for each referenced file.
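
Hoisting that out of the per-file loop could look like this minimal sketch (the stub `Manifest` and its field are hypothetical, not the real pbs-datastore types):

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum VerifyState {
    Ok,
    Failed,
}

// Hypothetical stand-in for the backup manifest's cached verify result.
struct Manifest {
    verify_state: Option<VerifyState>,
}

/// Decide once per snapshot whether existing index files may be reused;
/// the per-file loop then only needs `!force && path.exists()`.
fn force_resync(deep_sync: bool, manifest: &Manifest) -> bool {
    deep_sync && manifest.verify_state == Some(VerifyState::Failed)
}

fn main() {
    let corrupt = Manifest { verify_state: Some(VerifyState::Failed) };
    let clean = Manifest { verify_state: Some(VerifyState::Ok) };
    assert!(force_resync(true, &corrupt));
    assert!(!force_resync(false, &corrupt)); // no deep sync => trust existing files
    assert!(!force_resync(true, &clean));
    println!("ok");
}
```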

>              match archive_type(&item.filename)? {
>                  ArchiveType::DynamicIndex => {
>                      let index = DynamicIndexReader::open(&path)?;
> @@ -513,6 +517,7 @@ async fn pull_snapshot_from(
>      worker: &WorkerTask,
>      reader: Arc<BackupReader>,
>      snapshot: &pbs_datastore::BackupDir,
> +    params: &PullParameters,
>      downloaded_chunks: Arc<Mutex<HashSet<[u8; 32]>>>,
>  ) -> Result<(), Error> {
>      let (_path, is_new, _snap_lock) = snapshot
> @@ -522,7 +527,7 @@ async fn pull_snapshot_from(
>      if is_new {
>          task_log!(worker, "sync snapshot {}", snapshot.dir());
>  
> -        if let Err(err) = pull_snapshot(worker, reader, snapshot, downloaded_chunks).await {
> +        if let Err(err) = pull_snapshot(worker, reader, snapshot, params, downloaded_chunks).await {

this would then pass false, since a new snapshot would never need to be 
force-resynced ;)

>              if let Err(cleanup_err) = snapshot.datastore().remove_backup_dir(
>                  snapshot.backup_ns(),
>                  snapshot.as_ref(),
> @@ -535,7 +540,7 @@ async fn pull_snapshot_from(
>          task_log!(worker, "sync snapshot {} done", snapshot.dir());
>      } else {
>          task_log!(worker, "re-sync snapshot {}", snapshot.dir());
> -        pull_snapshot(worker, reader, snapshot, downloaded_chunks).await?;
> +        pull_snapshot(worker, reader, snapshot, params, downloaded_chunks).await?;

here we could decide whether to force-resync based on verify state and 
deep_sync flag, or just pass the deep_sync flag and leave the corruption 
check in pull_snapshot.

>          task_log!(worker, "re-sync snapshot {} done", snapshot.dir());
>      }
>  
> @@ -666,10 +671,12 @@ async fn pull_group(
>  
>          remote_snapshots.insert(snapshot.time);
>  
> -        if let Some(last_sync_time) = last_sync {
> -            if last_sync_time > snapshot.time {
> -                skip_info.update(snapshot.time);
> -                continue;
> +        if !params.deep_sync {

and this here would be where we could handle the second "re-sync missing 
snapshots" part, if we want to ;) it would likely require pulling up the 
snapshot lock somewhere here, and passing is_new to pull_snapshot_from?
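
The resulting skip decision in `pull_group` would then boil down to something like this (plain `i64` backup times and illustrative names, not the actual signatures):

```rust
/// A snapshot older than the last synced one is skipped only
/// when no deep sync was requested.
fn skip_existing(deep_sync: bool, last_sync: Option<i64>, snapshot_time: i64) -> bool {
    !deep_sync && last_sync.map_or(false, |t| t > snapshot_time)
}

fn main() {
    assert!(skip_existing(false, Some(100), 50)); // normal sync skips old snapshot
    assert!(!skip_existing(true, Some(100), 50)); // deep sync re-examines it
    assert!(!skip_existing(false, None, 50));     // first sync never skips
    println!("ok");
}
```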

> +            if let Some(last_sync_time) = last_sync {
> +                if last_sync_time > snapshot.time {
> +                    skip_info.update(snapshot.time);
> +                    continue;
> +                }
>              }
>          }
>  
> @@ -699,7 +706,8 @@ async fn pull_group(
>  
>          let snapshot = params.store.backup_dir(target_ns.clone(), snapshot)?;
>  
> -        let result = pull_snapshot_from(worker, reader, &snapshot, downloaded_chunks.clone()).await;
> +        let result =
> +            pull_snapshot_from(worker, reader, &snapshot, params, downloaded_chunks.clone()).await;
>  
>          progress.done_snapshots = pos as u64 + 1;
>          task_log!(worker, "percentage done: {}", progress);
> -- 
> 2.30.2

end of thread, other threads:[~2022-08-02 10:08 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-06-15  8:20 [pbs-devel] [RFC PATCH 0/4] fix #3786: add a "deep sync" option to sync jobs Stefan Sterz
2022-06-15  8:20 ` [pbs-devel] [RFC PATCH 1/4] fix #3786: api: add deep sync parameter Stefan Sterz
2022-08-02  9:19   ` Fabian Grünbichler
2022-06-15  8:20 ` [pbs-devel] [RFC PATCH 2/4] fix #3786: server/datastore: add deep sync parameter to pull sync jobs Stefan Sterz
2022-08-02 10:07   ` Fabian Grünbichler
2022-06-15  8:20 ` [pbs-devel] [RFC PATCH 3/4] fix #3786: ui/cli: add deep sync option to ui and cli Stefan Sterz
2022-06-15  8:20 ` [pbs-devel] [RFC PATCH 4/4] fix #3786: docs: document deep sync behavior and prerequisites Stefan Sterz
2022-06-22 10:52 ` [pbs-devel] [RFC PATCH 0/4] fix #3786: add a "deep sync" option to sync jobs Matthias Heiserer