From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <pbs-devel-bounces@lists.proxmox.com>
Received: from firstgate.proxmox.com (firstgate.proxmox.com [212.224.123.68])
	by lore.proxmox.com (Postfix) with ESMTPS id 14A8A1FF15D
	for <inbox@lore.proxmox.com>; Thu, 17 Oct 2024 08:15:04 +0200 (CEST)
Received: from firstgate.proxmox.com (localhost [127.0.0.1])
	by firstgate.proxmox.com (Proxmox) with ESMTP id D6AAC9429;
	Thu, 17 Oct 2024 08:15:37 +0200 (CEST)
Message-ID: <6376441e-f2ac-4cb6-ae63-39f21bb2c4c3@proxmox.com>
Date: Thu, 17 Oct 2024 08:15:29 +0200
MIME-Version: 1.0
User-Agent: Mozilla Thunderbird Beta
To: Proxmox Backup Server development discussion
 <pbs-devel@lists.proxmox.com>, Gabriel Goller <g.goller@proxmox.com>
References: <20241015131823.338766-1-g.goller@proxmox.com>
 <20241015131823.338766-2-g.goller@proxmox.com>
Content-Language: en-GB, de-AT
From: Thomas Lamprecht <t.lamprecht@proxmox.com>
Autocrypt: addr=t.lamprecht@proxmox.com; keydata=
 xsFNBFsLjcYBEACsaQP6uTtw/xHTUCKF4VD4/Wfg7gGn47+OfCKJQAD+Oyb3HSBkjclopC5J
 uXsB1vVOfqVYE6PO8FlD2L5nxgT3SWkc6Ka634G/yGDU3ZC3C/7NcDVKhSBI5E0ww4Qj8s9w
 OQRloemb5LOBkJNEUshkWRTHHOmk6QqFB/qBPW2COpAx6oyxVUvBCgm/1S0dAZ9gfkvpqFSD
 90B5j3bL6i9FIv3YGUCgz6Ue3f7u+HsEAew6TMtlt90XV3vT4M2IOuECG/pXwTy7NtmHaBQ7
 UJBcwSOpDEweNob50+9B4KbnVn1ydx+K6UnEcGDvUWBkREccvuExvupYYYQ5dIhRFf3fkS4+
 wMlyAFh8PQUgauod+vqs45FJaSgTqIALSBsEHKEs6IoTXtnnpbhu3p6XBin4hunwoBFiyYt6
 YHLAM1yLfCyX510DFzX/Ze2hLqatqzY5Wa7NIXqYYelz7tXiuCLHP84+sV6JtEkeSUCuOiUY
 virj6nT/nJK8m0BzdR6FgGtNxp7RVXFRz/+mwijJVLpFsyG1i0Hmv2zTn3h2nyGK/I6yhFNt
 dX69y5hbo6LAsRjLUvZeHXpTU4TrpN/WiCjJblbj5um5eEr4yhcwhVmG102puTtuCECsDucZ
 jpKpUqzXlpLbzG/dp9dXFH3MivvfuaHrg3MtjXY1i+/Oxyp5iwARAQABzTNUaG9tYXMgTGFt
 cHJlY2h0IChBdXRoLTQpIDx0LmxhbXByZWNodEBwcm94bW94LmNvbT7CwY4EEwEIADgWIQQO
 R4qbEl/pah9K6VrTZCM6gDZWBgUCWwuNxgIbAwULCQgHAgYVCAkKCwIEFgIDAQIeAQIXgAAK
 CRDTZCM6gDZWBm/jD/4+6JB2s67eaqoP6x9VGaXNGJPCscwzLuxDTCG90G9FYu29VcXtubH/
 bPwsyBbNUQpqTm/s4XboU2qpS5ykCuTjqavrcP33tdkYfGcItj2xMipJ1i3TWvpikQVsX42R
 G64wovLs/dvpTYphRZkg5DwhgTmy3mRkmofFCTa+//MOcNOORltemp984tWjpR3bUJETNWpF
 sKGZHa3N4kCNxb7A+VMsJZ/1gN3jbQbQG7GkJtnHlWkw9rKCYqBtWrnrHa4UAvSa9M/XCIAB
 FThFGqZI1ojdVlv5gd6b/nWxfOPrLlSxbUo5FZ1i/ycj7/24nznW1V4ykG9iUld4uYUY86bB
 UGSjew1KYp9FmvKiwEoB+zxNnuEQfS7/Bj1X9nxizgweiHIyFsRqgogTvLh403QMSGNSoArk
 tqkorf1U+VhEncIn4H3KksJF0njZKfilrieOO7Vuot1xKr9QnYrZzJ7m7ZxJ/JfKGaRHXkE1
 feMmrvZD1AtdUATZkoeQtTOpMu4r6IQRfSdwm/CkppZXfDe50DJxAMDWwfK2rr2bVkNg/yZI
 tKLBS0YgRTIynkvv0h8d9dIjiicw3RMeYXyqOnSWVva2r+tl+JBaenr8YTQw0zARrhC0mttu
 cIZGnVEvQuDwib57QLqMjQaC1gazKHvhA15H5MNxUhwm229UmdH3KM7BTQRbC43GARAAyTkR
 D6KRJ9Xa2fVMh+6f186q0M3ni+5tsaVhUiykxjsPgkuWXWW9MbLpYXkzX6h/RIEKlo2BGA95
 QwG5+Ya2Bo3g7FGJHAkXY6loq7DgMp5/TVQ8phsSv3WxPTJLCBq6vNBamp5hda4cfXFUymsy
 HsJy4dtgkrPQ/bnsdFDCRUuhJHopnAzKHN8APXpKU6xV5e3GE4LwFsDhNHfH/m9+2yO/trcD
 txSFpyftbK2gaMERHgA8SKkzRhiwRTt9w5idOfpJVkYRsgvuSGZ0pcD4kLCOIFrer5xXudk6
 NgJc36XkFRMnwqrL/bB4k6Pi2u5leyqcXSLyBgeHsZJxg6Lcr2LZ35+8RQGPOw9C0ItmRjtY
 ZpGKPlSxjxA1WHT2YlF9CEt3nx7c4C3thHHtqBra6BGPyW8rvtq4zRqZRLPmZ0kt/kiMPhTM
 8wZAlObbATVrUMcZ/uNjRv2vU9O5aTAD9E5r1B0dlqKgxyoImUWB0JgpILADaT3VybDd3C8X
 s6Jt8MytUP+1cEWt9VKo4vY4Jh5vwrJUDLJvzpN+TsYCZPNVj18+jf9uGRaoK6W++DdMAr5l
 gQiwsNgf9372dbMI7pt2gnT5/YdG+ZHnIIlXC6OUonA1Ro/Itg90Q7iQySnKKkqqnWVc+qO9
 GJbzcGykxD6EQtCSlurt3/5IXTA7t6sAEQEAAcLBdgQYAQgAIBYhBA5HipsSX+lqH0rpWtNk
 IzqANlYGBQJbC43GAhsMAAoJENNkIzqANlYGD1sP/ikKgHgcspEKqDED9gQrTBvipH85si0j
 /Jwu/tBtnYjLgKLh2cjv1JkgYYjb3DyZa1pLsIv6rGnPX9bH9IN03nqirC/Q1Y1lnbNTynPk
 IflgvsJjoTNZjgu1wUdQlBgL/JhUp1sIYID11jZphgzfDgp/E6ve/8xE2HMAnf4zAfJaKgD0
 F+fL1DlcdYUditAiYEuN40Ns/abKs8I1MYx7Yglu3RzJfBzV4t86DAR+OvuF9v188WrFwXCS
 RSf4DmJ8tntyNej+DVGUnmKHupLQJO7uqCKB/1HLlMKc5G3GLoGqJliHjUHUAXNzinlpE2Vj
 C78pxpwxRNg2ilE3AhPoAXrY5qED5PLE9sLnmQ9AzRcMMJUXjTNEDxEYbF55SdGBHHOAcZtA
 kEQKub86e+GHA+Z8oXQSGeSGOkqHi7zfgW1UexddTvaRwE6AyZ6FxTApm8wq8NT2cryWPWTF
 BDSGB3ujWHMM8ERRYJPcBSjTvt0GcEqnd+OSGgxTkGOdufn51oz82zfpVo1t+J/FNz6MRMcg
 8nEC+uKvgzH1nujxJ5pRCBOquFZaGn/p71Yr0oVitkttLKblFsqwa+10Lt6HBxm+2+VLp4Ja
 0WZNncZciz3V3cuArpan/ZhhyiWYV5FD0pOXPCJIx7WS9PTtxiv0AOS4ScWEUmBxyhFeOpYa DrEx
In-Reply-To: <20241015131823.338766-2-g.goller@proxmox.com>
X-SPAM-LEVEL: Spam detection results:  0
 AWL -0.051 Adjusted score from AWL reputation of From: address
 BAYES_00                 -1.9 Bayes spam probability is 0 to 1%
 DMARC_MISSING             0.1 Missing DMARC policy
 KAM_DMARC_STATUS 0.01 Test Rule for DKIM or SPF Failure with Strict Alignment
 RCVD_IN_VALIDITY_CERTIFIED_BLOCKED 0.001 ADMINISTRATOR NOTICE: The query to
 Validity was blocked. See
 https://knowledge.validity.com/hc/en-us/articles/20961730681243 for more
 information.
 RCVD_IN_VALIDITY_RPBL_BLOCKED 0.001 ADMINISTRATOR NOTICE: The query to
 Validity was blocked. See
 https://knowledge.validity.com/hc/en-us/articles/20961730681243 for more
 information.
 RCVD_IN_VALIDITY_SAFE_BLOCKED 0.001 ADMINISTRATOR NOTICE: The query to
 Validity was blocked. See
 https://knowledge.validity.com/hc/en-us/articles/20961730681243 for more
 information.
 SPF_HELO_NONE           0.001 SPF: HELO does not publish an SPF Record
 SPF_PASS               -0.001 SPF: sender matches SPF record
Subject: Re: [pbs-devel] [PATCH proxmox-backup 1/3] fix #3786: api: add
 resync-corrupt option to sync jobs
X-BeenThere: pbs-devel@lists.proxmox.com
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Proxmox Backup Server development discussion
 <pbs-devel.lists.proxmox.com>
List-Unsubscribe: <https://lists.proxmox.com/cgi-bin/mailman/options/pbs-devel>, 
 <mailto:pbs-devel-request@lists.proxmox.com?subject=unsubscribe>
List-Archive: <http://lists.proxmox.com/pipermail/pbs-devel/>
List-Post: <mailto:pbs-devel@lists.proxmox.com>
List-Help: <mailto:pbs-devel-request@lists.proxmox.com?subject=help>
List-Subscribe: <https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel>, 
 <mailto:pbs-devel-request@lists.proxmox.com?subject=subscribe>
Reply-To: Proxmox Backup Server development discussion
 <pbs-devel@lists.proxmox.com>
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Errors-To: pbs-devel-bounces@lists.proxmox.com
Sender: "pbs-devel" <pbs-devel-bounces@lists.proxmox.com>

On 15/10/2024 at 15:18, Gabriel Goller wrote:
> This option allows us to "fix" corrupt snapshots (and/or their chunks)
> by pulling them from another remote. When traversing the remote
> snapshots, we check if it exists locally, and if it is, we check if the
> last verification of it failed. If the local snapshot is broken and the
> `resync-corrupt` option is turned on, we pull in the remote snapshot,
> overwriting the local one.
> 
> This is very useful and has been requested a lot, as there is currently
> no way to "fix" corrupt chunks/snapshots even if the user has a healthy
> version of it on their offsite instance.
> 
> Signed-off-by: Gabriel Goller <g.goller@proxmox.com>
> Originally-by: Shannon Sterz <s.sterz@proxmox.com>

Those two trailers should probably be switched to uphold causal ordering.


In general, it would be _really_ nice to get some regression testing for
sync jobs, covering the snapshot selection with different matching criteria
and options. The lack of that doesn't have to block your/Shannon's patch,
but IMO it should be added sooner rather than later. Sync is becoming more
and more complex, and the selection of snapshots is a very important and
fundamental part of it.
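
To illustrate the direction I have in mind, here is a rough sketch of such a
test, assuming the selection logic got factored out of pull_group() into a
pure helper first (all names below are made up, nothing like this exists yet):

// Hypothetical helper: the filter closure from pull_group(), extracted so it
// can be exercised without a datastore or a remote.
fn select_snapshot(
    pos: usize,
    snapshot_time: i64,
    last_sync_time: i64,
    cutoff: usize,
    resync_corrupt: bool,
) -> bool {
    if resync_corrupt {
        return true; // always consider all remote snapshots when resyncing
    }
    if last_sync_time > snapshot_time {
        return false; // already synced
    }
    if pos < cutoff && last_sync_time != snapshot_time {
        return false; // skipped due to transfer-last
    }
    true
}

#[test]
fn resync_corrupt_bypasses_sync_time_and_transfer_last() {
    // with resync-corrupt, even snapshots older than the last sync get selected
    assert!(select_snapshot(0, 100, 200, 5, true));
    // without it, snapshots older than the last sync are skipped
    assert!(!select_snapshot(0, 100, 200, 5, false));
    // the transfer-last cutoff still applies in the normal case
    assert!(!select_snapshot(1, 300, 200, 5, false));
}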

> ---
>  pbs-api-types/src/jobs.rs         | 11 +++++
>  pbs-datastore/src/backup_info.rs  | 13 +++++-
>  src/api2/config/sync.rs           |  4 ++
>  src/api2/pull.rs                  |  9 +++-
>  src/bin/proxmox-backup-manager.rs |  4 +-
>  src/server/pull.rs                | 68 +++++++++++++++++++++----------
>  6 files changed, 84 insertions(+), 25 deletions(-)
> 
> diff --git a/pbs-api-types/src/jobs.rs b/pbs-api-types/src/jobs.rs
> index 868702bc059e..a3cd68bf5afd 100644
> --- a/pbs-api-types/src/jobs.rs
> +++ b/pbs-api-types/src/jobs.rs
> @@ -498,6 +498,11 @@ pub const TRANSFER_LAST_SCHEMA: Schema =
>          .minimum(1)
>          .schema();
>  
> +pub const RESYNC_CORRUPT_SCHEMA: Schema = BooleanSchema::new(
> +    "If the verification failed for a local snapshot, try to pull its chunks again.",

Not only its chunks, but also its indexes (which I think might also be a
source of a bad verification result).

> +)
> +.schema();
> +
>  #[api(
>      properties: {
>          id: {
> @@ -552,6 +557,10 @@ pub const TRANSFER_LAST_SCHEMA: Schema =
>              schema: TRANSFER_LAST_SCHEMA,
>              optional: true,
>          },
> +        "resync-corrupt": {
> +            schema: RESYNC_CORRUPT_SCHEMA,
> +            optional: true,
> +        }
>      }
>  )]
>  #[derive(Serialize, Deserialize, Clone, Updater, PartialEq)]
> @@ -585,6 +594,8 @@ pub struct SyncJobConfig {
>      pub limit: RateLimitConfig,
>      #[serde(skip_serializing_if = "Option::is_none")]
>      pub transfer_last: Option<usize>,
> +    #[serde(skip_serializing_if = "Option::is_none")]
> +    pub resync_corrupt: Option<bool>,
>  }
>  
>  impl SyncJobConfig {
> diff --git a/pbs-datastore/src/backup_info.rs b/pbs-datastore/src/backup_info.rs
> index 414ec878d01a..2d283dbe5cd9 100644
> --- a/pbs-datastore/src/backup_info.rs
> +++ b/pbs-datastore/src/backup_info.rs
> @@ -8,7 +8,8 @@ use anyhow::{bail, format_err, Error};
>  use proxmox_sys::fs::{lock_dir_noblock, replace_file, CreateOptions};
>  
>  use pbs_api_types::{
> -    Authid, BackupNamespace, BackupType, GroupFilter, BACKUP_DATE_REGEX, BACKUP_FILE_REGEX,
> +    Authid, BackupNamespace, BackupType, GroupFilter, SnapshotVerifyState, VerifyState,
> +    BACKUP_DATE_REGEX, BACKUP_FILE_REGEX,
>  };
>  use pbs_config::{open_backup_lockfile, BackupLockGuard};
>  
> @@ -583,6 +584,16 @@ impl BackupDir {
>  
>          Ok(())
>      }
> +
> +    /// Load the verify state from the manifest.
> +    pub fn verify_state(&self) -> Option<VerifyState> {
> +        self.load_manifest().ok().and_then(|(m, _)| {

Hmm, just ignoring errors here seems a bit odd; a failure to load the
manifest might itself be a case where one wants to re-sync. So I'd prefer
wrapping this in a Result, even if it's a bit tedious.
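
Untested, but roughly something along these lines is what I have in mind
(signature and error handling are only a sketch):

/// Load the verify state from the manifest.
pub fn verify_state(&self) -> Result<Option<VerifyState>, Error> {
    let (manifest, _) = self.load_manifest()?;
    let verify = manifest.unprotected["verify_state"].clone();
    if verify.is_null() {
        // the snapshot was never verified
        return Ok(None);
    }
    let state = serde_json::from_value::<SnapshotVerifyState>(verify)?;
    Ok(Some(state.state))
}

The caller in the pull code could then decide explicitly whether a manifest
that cannot be read or parsed should also count as "needs re-sync".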

> +            let verify = m.unprotected["verify_state"].clone();
> +            serde_json::from_value::<SnapshotVerifyState>(verify)
> +                .ok()
> +                .map(|svs| svs.state)
> +        })
> +    }
>  }
>  
>  impl AsRef<pbs_api_types::BackupNamespace> for BackupDir {
> diff --git a/src/api2/config/sync.rs b/src/api2/config/sync.rs
> index 6fdc69a9e645..fa9db92f3d11 100644
> --- a/src/api2/config/sync.rs
> +++ b/src/api2/config/sync.rs
> @@ -368,6 +368,9 @@ pub fn update_sync_job(
>      if let Some(transfer_last) = update.transfer_last {
>          data.transfer_last = Some(transfer_last);
>      }
> +    if let Some(resync_corrupt) = update.resync_corrupt {
> +        data.resync_corrupt = Some(resync_corrupt);
> +    }
>  
>      if update.limit.rate_in.is_some() {
>          data.limit.rate_in = update.limit.rate_in;
> @@ -527,6 +530,7 @@ acl:1:/remote/remote1/remotestore1:write@pbs:RemoteSyncOperator
>          ns: None,
>          owner: Some(write_auth_id.clone()),
>          comment: None,
> +        resync_corrupt: None,
>          remove_vanished: None,
>          max_depth: None,
>          group_filter: None,
> diff --git a/src/api2/pull.rs b/src/api2/pull.rs
> index e733c9839e3a..0d4be0e2d228 100644
> --- a/src/api2/pull.rs
> +++ b/src/api2/pull.rs
> @@ -10,7 +10,7 @@ use pbs_api_types::{
>      Authid, BackupNamespace, GroupFilter, RateLimitConfig, SyncJobConfig, DATASTORE_SCHEMA,
>      GROUP_FILTER_LIST_SCHEMA, NS_MAX_DEPTH_REDUCED_SCHEMA, PRIV_DATASTORE_BACKUP,
>      PRIV_DATASTORE_PRUNE, PRIV_REMOTE_READ, REMOTE_ID_SCHEMA, REMOVE_VANISHED_BACKUPS_SCHEMA,
> -    TRANSFER_LAST_SCHEMA,
> +    RESYNC_CORRUPT_SCHEMA, TRANSFER_LAST_SCHEMA,
>  };
>  use pbs_config::CachedUserInfo;
>  use proxmox_human_byte::HumanByte;
> @@ -89,6 +89,7 @@ impl TryFrom<&SyncJobConfig> for PullParameters {
>              sync_job.group_filter.clone(),
>              sync_job.limit.clone(),
>              sync_job.transfer_last,
> +            sync_job.resync_corrupt,
>          )
>      }
>  }
> @@ -240,6 +241,10 @@ pub fn do_sync_job(
>                  schema: TRANSFER_LAST_SCHEMA,
>                  optional: true,
>              },
> +            "resync-corrupt": {
> +                schema: RESYNC_CORRUPT_SCHEMA,
> +                optional: true,
> +            },
>          },
>      },
>      access: {
> @@ -264,6 +269,7 @@ async fn pull(
>      group_filter: Option<Vec<GroupFilter>>,
>      limit: RateLimitConfig,
>      transfer_last: Option<usize>,
> +    resync_corrupt: Option<bool>,
>      rpcenv: &mut dyn RpcEnvironment,
>  ) -> Result<String, Error> {
>      let auth_id: Authid = rpcenv.get_auth_id().unwrap().parse()?;
> @@ -301,6 +307,7 @@ async fn pull(
>          group_filter,
>          limit,
>          transfer_last,
> +        resync_corrupt,
>      )?;
>  
>      // fixme: set to_stdout to false?
> diff --git a/src/bin/proxmox-backup-manager.rs b/src/bin/proxmox-backup-manager.rs
> index 420e96665662..38a1cf0f5881 100644
> --- a/src/bin/proxmox-backup-manager.rs
> +++ b/src/bin/proxmox-backup-manager.rs
> @@ -14,8 +14,8 @@ use pbs_api_types::percent_encoding::percent_encode_component;
>  use pbs_api_types::{
>      BackupNamespace, GroupFilter, RateLimitConfig, SyncJobConfig, DATASTORE_SCHEMA,
>      GROUP_FILTER_LIST_SCHEMA, IGNORE_VERIFIED_BACKUPS_SCHEMA, NS_MAX_DEPTH_SCHEMA,
> -    REMOTE_ID_SCHEMA, REMOVE_VANISHED_BACKUPS_SCHEMA, TRANSFER_LAST_SCHEMA, UPID_SCHEMA,
> -    VERIFICATION_OUTDATED_AFTER_SCHEMA,
> +    REMOTE_ID_SCHEMA, REMOVE_VANISHED_BACKUPS_SCHEMA, RESYNC_CORRUPT_SCHEMA, TRANSFER_LAST_SCHEMA,
> +    UPID_SCHEMA, VERIFICATION_OUTDATED_AFTER_SCHEMA,
>  };
>  use pbs_client::{display_task_log, view_task_result};
>  use pbs_config::sync;
> diff --git a/src/server/pull.rs b/src/server/pull.rs
> index 3117f7d2c960..67881f83b5e3 100644
> --- a/src/server/pull.rs
> +++ b/src/server/pull.rs
> @@ -12,7 +12,8 @@ use tracing::info;
>  
>  use pbs_api_types::{
>      print_store_and_ns, Authid, BackupDir, BackupGroup, BackupNamespace, GroupFilter, Operation,
> -    RateLimitConfig, Remote, MAX_NAMESPACE_DEPTH, PRIV_DATASTORE_AUDIT, PRIV_DATASTORE_BACKUP,
> +    RateLimitConfig, Remote, VerifyState, MAX_NAMESPACE_DEPTH, PRIV_DATASTORE_AUDIT,
> +    PRIV_DATASTORE_BACKUP,
>  };
>  use pbs_client::BackupRepository;
>  use pbs_config::CachedUserInfo;
> @@ -55,6 +56,8 @@ pub(crate) struct PullParameters {
>      group_filter: Vec<GroupFilter>,
>      /// How many snapshots should be transferred at most (taking the newest N snapshots)
>      transfer_last: Option<usize>,
> +    /// Whether to re-sync corrupted snapshots
> +    resync_corrupt: bool,
>  }
>  
>  impl PullParameters {
> @@ -72,12 +75,14 @@ impl PullParameters {
>          group_filter: Option<Vec<GroupFilter>>,
>          limit: RateLimitConfig,
>          transfer_last: Option<usize>,
> +        resync_corrupt: Option<bool>,
>      ) -> Result<Self, Error> {
>          if let Some(max_depth) = max_depth {
>              ns.check_max_depth(max_depth)?;
>              remote_ns.check_max_depth(max_depth)?;
>          };
>          let remove_vanished = remove_vanished.unwrap_or(false);
> +        let resync_corrupt = resync_corrupt.unwrap_or(false);
>  
>          let source: Arc<dyn SyncSource> = if let Some(remote) = remote {
>              let (remote_config, _digest) = pbs_config::remote::config()?;
> @@ -116,6 +121,7 @@ impl PullParameters {
>              max_depth,
>              group_filter,
>              transfer_last,
> +            resync_corrupt,
>          })
>      }
>  }
> @@ -175,9 +181,10 @@ async fn pull_index_chunks<I: IndexFile>(
>                      target.cond_touch_chunk(&info.digest, false)
>                  })?;
>                  if chunk_exists {
> -                    //info!("chunk {} exists {}", pos, hex::encode(digest));
> +                    //info!("chunk exists {}", hex::encode(info.digest));
>                      return Ok::<_, Error>(());
>                  }
> +
>                  //info!("sync {} chunk {}", pos, hex::encode(digest));
>                  let chunk = chunk_reader.read_raw_chunk(&info.digest).await?;
>                  let raw_size = chunk.raw_size() as usize;
> @@ -332,6 +339,7 @@ async fn pull_snapshot<'a>(
>      reader: Arc<dyn SyncSourceReader + 'a>,
>      snapshot: &'a pbs_datastore::BackupDir,
>      downloaded_chunks: Arc<Mutex<HashSet<[u8; 32]>>>,
> +    resync_corrupt: bool,

It would be good to update the doc comment of this method to mention the new
parameter in the description of steps.
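
E.g., something along the lines of (assuming the existing step list there):

/// - if `resync_corrupt` is set and the local snapshot failed its last
///   verification, re-download all archives even if they already exist locally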

>  ) -> Result<SyncStats, Error> {
>      let mut sync_stats = SyncStats::default();
>      let mut manifest_name = snapshot.full_path();
> @@ -352,6 +360,8 @@ async fn pull_snapshot<'a>(
>          return Ok(sync_stats);
>      }
>  
> +    let snapshot_is_corrupt = snapshot.verify_state() == Some(VerifyState::Failed);

Hmm, but currently you'd only need to get the state if resync_corrupt is true?
Or at least only if the local manifest already exists, in case we want to allow
re-syncing under some other conditions (but that can be adapted once required).

e.g.:

let must_resync_existing = resync_corrupt && snapshot.verify_state() == Some(VerifyState::Failed);

That would make the checks below a bit easier to read.

And would it make sense to also check the downloaded manifest's verify state?
There is no point in re-pulling if the other side is corrupt as well, or is that
possibility already excluded somewhere else in this code path?

> +
>      if manifest_name.exists() {
>          let manifest_blob = proxmox_lang::try_block!({
>              let mut manifest_file = std::fs::File::open(&manifest_name).map_err(|err| {
> @@ -365,7 +375,9 @@ async fn pull_snapshot<'a>(
>              format_err!("unable to read local manifest {manifest_name:?} - {err}")
>          })?;
>  
> -        if manifest_blob.raw_data() == tmp_manifest_blob.raw_data() {
> +        if (manifest_blob.raw_data() == tmp_manifest_blob.raw_data())

Unnecessary addition of parentheses; `a == b && (!c && !d)` would be fine too
and is less noisy to read, so to speak.

> +            && (!resync_corrupt && !snapshot_is_corrupt)

This is equal to `.. && !(resync_corrupt || snapshot_is_corrupt)`, which makes
it slightly easier to read for me, but I'm slightly confused as to why it's not
`.. && !(resync_corrupt && snapshot_is_corrupt)`, i.e., with the new variable
from above:

if manifest_blob.raw_data() == tmp_manifest_blob.raw_data() && !must_resync_existing {
    // ...
}

Or why would the expression need to be different here?

> +        {
>              if !client_log_name.exists() {
>                  reader.try_download_client_log(&client_log_name).await?;
>              };
> @@ -381,7 +393,7 @@ async fn pull_snapshot<'a>(
>          let mut path = snapshot.full_path();
>          path.push(&item.filename);
>  
> -        if path.exists() {
> +        if !(resync_corrupt && snapshot_is_corrupt) && path.exists() {
>              match ArchiveType::from_path(&item.filename)? {
>                  ArchiveType::DynamicIndex => {
>                      let index = DynamicIndexReader::open(&path)?;
> @@ -416,6 +428,10 @@ async fn pull_snapshot<'a>(
>              }
>          }
>  
> +        if resync_corrupt && snapshot_is_corrupt {
> +            info!("pulling snapshot {} again", snapshot.dir());

Maybe include the reason for re-pulling this snapshot here, e.g. something like:

"re-syncing snapshot {} due to bad verification result"

And wouldn't that get logged multiple times for backups with multiple archives,
e.g. guests with multiple disks? It probably makes more sense to log that once,
before the for loop.
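
Untested, but moved up it could look roughly like this (reusing the
must_resync_existing variable proposed above):

if must_resync_existing {
    info!(
        "re-syncing snapshot {} due to bad verification result",
        snapshot.dir()
    );
}

// ... then iterate over the manifest's archives as before ...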


> +        }
> +
>          let stats =
>              pull_single_archive(reader.clone(), snapshot, item, downloaded_chunks.clone()).await?;
>          sync_stats.add(stats);
> @@ -443,6 +459,7 @@ async fn pull_snapshot_from<'a>(
>      reader: Arc<dyn SyncSourceReader + 'a>,
>      snapshot: &'a pbs_datastore::BackupDir,
>      downloaded_chunks: Arc<Mutex<HashSet<[u8; 32]>>>,
> +    resync_corrupt: bool,
>  ) -> Result<SyncStats, Error> {
>      let (_path, is_new, _snap_lock) = snapshot
>          .datastore()
> @@ -451,7 +468,7 @@ async fn pull_snapshot_from<'a>(
>      let sync_stats = if is_new {
>          info!("sync snapshot {}", snapshot.dir());
>  
> -        match pull_snapshot(reader, snapshot, downloaded_chunks).await {
> +        match pull_snapshot(reader, snapshot, downloaded_chunks, resync_corrupt).await {
>              Err(err) => {
>                  if let Err(cleanup_err) = snapshot.datastore().remove_backup_dir(
>                      snapshot.backup_ns(),
> @@ -469,7 +486,7 @@ async fn pull_snapshot_from<'a>(
>          }
>      } else {
>          info!("re-sync snapshot {}", snapshot.dir());
> -        pull_snapshot(reader, snapshot, downloaded_chunks).await?
> +        pull_snapshot(reader, snapshot, downloaded_chunks, resync_corrupt).await?
>      };
>  
>      Ok(sync_stats)
> @@ -528,21 +545,24 @@ async fn pull_group(
>          .enumerate()
>          .filter(|&(pos, ref dir)| {
>              source_snapshots.insert(dir.time);
> -            if last_sync_time > dir.time {
> -                already_synced_skip_info.update(dir.time);
> -                return false;
> -            } else if already_synced_skip_info.count > 0 {
> -                info!("{already_synced_skip_info}");
> -                already_synced_skip_info.reset();
> -                return true;
> -            }
> +            // If resync_corrupt is set, we go through all the remote snapshots
> +            if !params.resync_corrupt {

Maybe reverse that and use an explicit return, i.e.:

if params.resync_corrupt {
    return true; // always check all remote snapshots when resyncing
}

// rest of the filter

That would avoid an indentation level for the non-bypass path and be slightly
easier to read, at least to me.
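
To spell it out, the whole closure could then stay flat, with the same logic
as in your patch, just using early returns:

.filter(|&(pos, ref dir)| {
    source_snapshots.insert(dir.time);

    if params.resync_corrupt {
        return true; // always check all remote snapshots when resyncing
    }

    if last_sync_time > dir.time {
        already_synced_skip_info.update(dir.time);
        return false;
    } else if already_synced_skip_info.count > 0 {
        info!("{already_synced_skip_info}");
        already_synced_skip_info.reset();
        return true;
    }

    if pos < cutoff && last_sync_time != dir.time {
        transfer_last_skip_info.update(dir.time);
        return false;
    } else if transfer_last_skip_info.count > 0 {
        info!("{transfer_last_skip_info}");
        transfer_last_skip_info.reset();
    }

    true
})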

> +                if last_sync_time > dir.time {
> +                    already_synced_skip_info.update(dir.time);
> +                    return false;
> +                } else if already_synced_skip_info.count > 0 {
> +                    info!("{already_synced_skip_info}");
> +                    already_synced_skip_info.reset();
> +                    return true;
> +                }
>  
> -            if pos < cutoff && last_sync_time != dir.time {
> -                transfer_last_skip_info.update(dir.time);
> -                return false;
> -            } else if transfer_last_skip_info.count > 0 {
> -                info!("{transfer_last_skip_info}");
> -                transfer_last_skip_info.reset();
> +                if pos < cutoff && last_sync_time != dir.time {
> +                    transfer_last_skip_info.update(dir.time);
> +                    return false;
> +                } else if transfer_last_skip_info.count > 0 {
> +                    info!("{transfer_last_skip_info}");
> +                    transfer_last_skip_info.reset();
> +                }
>              }
>              true
>          })
> @@ -566,7 +586,13 @@ async fn pull_group(
>              .source
>              .reader(source_namespace, &from_snapshot)
>              .await?;
> -        let result = pull_snapshot_from(reader, &to_snapshot, downloaded_chunks.clone()).await;
> +        let result = pull_snapshot_from(
> +            reader,
> +            &to_snapshot,
> +            downloaded_chunks.clone(),
> +            params.resync_corrupt,
> +        )
> +        .await;
>  
>          progress.done_snapshots = pos as u64 + 1;
>          info!("percentage done: {progress}");



_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel