public inbox for pbs-devel@lists.proxmox.com
 help / color / mirror / Atom feed
From: Christian Ebner <c.ebner@proxmox.com>
To: Proxmox Backup Server development discussion
	<pbs-devel@lists.proxmox.com>, Nicolas Frey <n.frey@proxmox.com>
Subject: Re: [pbs-devel] [PATCH proxmox-backup v2 1/4] api: verify: move chunk loading into parallel handler
Date: Fri, 7 Nov 2025 09:39:11 +0100	[thread overview]
Message-ID: <49ae4fa9-fd40-45ca-a058-4fd1962871af@proxmox.com> (raw)
In-Reply-To: <20251106161316.528349-5-n.frey@proxmox.com>

Bundling the parameters makes this easier to digest, thanks.

Nevertheless, which I failed to mention on the previous version of the 
patch, this should better be done as preparatory patch, so the 
introduction of the new `IndexVerifyState` and all the adaption required 
just for that should go into one, the actual changes to the chunk 
loading in a subsequent patch. That makes it easier to diget.

some more comments inline

On 11/6/25 5:13 PM, Nicolas Frey wrote:
> This way, the chunks will be loaded in parallel in addition to being
> checked in parallel.
> 
> Depending on the underlying storage, this can speed up reading chunks
> from disk, especially when the underlying storage is IO depth
> dependent, and the CPU is faster than the storage.
> 
> Introduces a new state struct `IndexVerifyState` so that we only need
> to pass around and clone one `Arc`.
> 
> Originally-by: Dominik Csapak <d.csapak@proxmox.com>
> Signed-off-by: Nicolas Frey <n.frey@proxmox.com>
> ---
>   src/backup/verify.rs | 134 +++++++++++++++++++++++++------------------
>   1 file changed, 78 insertions(+), 56 deletions(-)
> 
> diff --git a/src/backup/verify.rs b/src/backup/verify.rs
> index 31c03891..910a3ed5 100644
> --- a/src/backup/verify.rs
> +++ b/src/backup/verify.rs
> @@ -1,6 +1,6 @@
>   use pbs_config::BackupLockGuard;
>   use std::collections::HashSet;
> -use std::sync::atomic::{AtomicUsize, Ordering};
> +use std::sync::atomic::{AtomicU64, AtomicUsize, Ordering};
>   use std::sync::{Arc, Mutex};
>   use std::time::Instant;
>   
> @@ -20,7 +20,7 @@ use pbs_datastore::index::{ChunkReadInfo, IndexFile};
>   use pbs_datastore::manifest::{BackupManifest, FileInfo};
>   use pbs_datastore::{DataBlob, DataStore, DatastoreBackend, StoreProgress};
>   
> -use crate::tools::parallel_handler::ParallelHandler;
> +use crate::tools::parallel_handler::{ParallelHandler, SendHandle};
>   
>   use crate::backup::hierarchy::ListAccessibleBackupGroups;
>   
> @@ -34,6 +34,15 @@ pub struct VerifyWorker {
>       backend: DatastoreBackend,
>   }
>   
> +struct IndexVerifyState {
> +    read_bytes: AtomicU64,
> +    decoded_bytes: AtomicU64,
> +    errors: AtomicUsize,
> +    datastore: Arc<DataStore>,
> +    corrupt_chunks: Arc<Mutex<HashSet<[u8; 32]>>>,
> +    verified_chunks: Arc<Mutex<HashSet<[u8; 32]>>>,
> +}
> +
>   impl VerifyWorker {
>       /// Creates a new VerifyWorker for a given task worker and datastore.
>       pub fn new(
> @@ -81,27 +90,25 @@ impl VerifyWorker {
>           index: Box<dyn IndexFile + Send>,
>           crypt_mode: CryptMode,
>       ) -> Result<(), Error> {
> -        let errors = Arc::new(AtomicUsize::new(0));
> -
>           let start_time = Instant::now();

nit: The start_time could be moved to the `IndexVerifyState` as well

>   
> -        let mut read_bytes = 0;
> -        let mut decoded_bytes = 0;
> -
> -        let datastore2 = Arc::clone(&self.datastore);
> -        let corrupt_chunks2 = Arc::clone(&self.corrupt_chunks);
> -        let verified_chunks2 = Arc::clone(&self.verified_chunks);
> -        let errors2 = Arc::clone(&errors);
> -
> -        let decoder_pool = ParallelHandler::new(
> -            "verify chunk decoder",
> -            4,
> +        let verify_state = Arc::new(IndexVerifyState {
> +            read_bytes: AtomicU64::new(0),
> +            decoded_bytes: AtomicU64::new(0),
> +            errors: AtomicUsize::new(0),
> +            datastore: Arc::clone(&self.datastore),
> +            corrupt_chunks: Arc::clone(&self.corrupt_chunks),
> +            verified_chunks: Arc::clone(&self.verified_chunks),
> +        });

This could be moved to a constructor and instantiated via 
`IndexVerifyState::new(&self.datastore, &self.corrupt_chunks, 
&self.verified_chunks)`, which sets all the field values and gathers the 
start time as well.

> +
> +        let decoder_pool = ParallelHandler::new("verify chunk decoder", 4, {
> +            let verify_state = Arc::clone(&verify_state);
>               move |(chunk, digest, size): (DataBlob, [u8; 32], u64)| {
>                   let chunk_crypt_mode = match chunk.crypt_mode() {
>                       Err(err) => {
> -                        corrupt_chunks2.lock().unwrap().insert(digest);
> +                        verify_state.corrupt_chunks.lock().unwrap().insert(digest);
>                           info!("can't verify chunk, unknown CryptMode - {err}");
> -                        errors2.fetch_add(1, Ordering::SeqCst);
> +                        verify_state.errors.fetch_add(1, Ordering::SeqCst);
>                           return Ok(());
>                       }
>                       Ok(mode) => mode,
> @@ -111,25 +118,25 @@ impl VerifyWorker {
>                       info!(
>                       "chunk CryptMode {chunk_crypt_mode:?} does not match index CryptMode {crypt_mode:?}"
>                   );
> -                    errors2.fetch_add(1, Ordering::SeqCst);
> +                    verify_state.errors.fetch_add(1, Ordering::SeqCst);
>                   }
>   
>                   if let Err(err) = chunk.verify_unencrypted(size as usize, &digest) {
> -                    corrupt_chunks2.lock().unwrap().insert(digest);
> +                    verify_state.corrupt_chunks.lock().unwrap().insert(digest);
>                       info!("{err}");
> -                    errors2.fetch_add(1, Ordering::SeqCst);
> -                    match datastore2.rename_corrupt_chunk(&digest) {
> +                    verify_state.errors.fetch_add(1, Ordering::SeqCst);
> +                    match verify_state.datastore.rename_corrupt_chunk(&digest) {
>                           Ok(Some(new_path)) => info!("corrupt chunk renamed to {new_path:?}"),
>                           Err(err) => info!("{err}"),
>                           _ => (),
>                       }
>                   } else {
> -                    verified_chunks2.lock().unwrap().insert(digest);
> +                    verify_state.verified_chunks.lock().unwrap().insert(digest);
>                   }
>   
>                   Ok(())
> -            },
> -        );
> +            }
> +        });
>   
>           let skip_chunk = |digest: &[u8; 32]| -> bool {
>               if self.verified_chunks.lock().unwrap().contains(digest) {
> @@ -137,7 +144,7 @@ impl VerifyWorker {
>               } else if self.corrupt_chunks.lock().unwrap().contains(digest) {
>                   let digest_str = hex::encode(digest);
>                   info!("chunk {digest_str} was marked as corrupt");
> -                errors.fetch_add(1, Ordering::SeqCst);
> +                verify_state.errors.fetch_add(1, Ordering::SeqCst);
>                   true
>               } else {
>                   false
> @@ -156,6 +163,21 @@ impl VerifyWorker {
>               .datastore
>               .get_chunks_in_order(&*index, skip_chunk, check_abort)?;
>   
> +        let reader_pool = ParallelHandler::new("read chunks", 4, {
> +            let decoder_pool = decoder_pool.channel();
> +            let verify_state = Arc::clone(&verify_state);
> +            let backend = self.backend.clone();
> +
> +            move |info: ChunkReadInfo| {
> +                Self::verify_chunk_by_backend(
> +                    &backend,
> +                    Arc::clone(&verify_state),
> +                    &decoder_pool,
> +                    &info,
> +                )
> +            }
> +        });
> +
>           for (pos, _) in chunk_list {
>               self.worker.check_abort()?;
>               self.worker.fail_on_shutdown()?;
> @@ -167,58 +189,56 @@ impl VerifyWorker {
>                   continue; // already verified or marked corrupt
>               }
>   
> -            self.verify_chunk_by_backend(
> -                &info,
> -                &mut read_bytes,
> -                &mut decoded_bytes,
> -                Arc::clone(&errors),
> -                &decoder_pool,
> -            )?;
> +            reader_pool.send(info)?;
>           }
>   
> -        decoder_pool.complete()?;
> +        reader_pool.complete()?;
>   
>           let elapsed = start_time.elapsed().as_secs_f64();

could get the time from the `verify_state` then

>   
> +        let read_bytes = verify_state.read_bytes.load(Ordering::SeqCst);
> +        let decoded_bytes = verify_state.decoded_bytes.load(Ordering::SeqCst);
> +
>           let read_bytes_mib = (read_bytes as f64) / (1024.0 * 1024.0);
>           let decoded_bytes_mib = (decoded_bytes as f64) / (1024.0 * 1024.0);
>   
>           let read_speed = read_bytes_mib / elapsed;
>           let decode_speed = decoded_bytes_mib / elapsed;
>   
> -        let error_count = errors.load(Ordering::SeqCst);
> +        let error_count = verify_state.errors.load(Ordering::SeqCst);
>   
>           info!(
>               "  verified {read_bytes_mib:.2}/{decoded_bytes_mib:.2} MiB in {elapsed:.2} seconds, speed {read_speed:.2}/{decode_speed:.2} MiB/s ({error_count} errors)"
>           );
>   
> -        if errors.load(Ordering::SeqCst) > 0 {
> +        if verify_state.errors.load(Ordering::SeqCst) > 0 {
>               bail!("chunks could not be verified");
>           }
>   
>           Ok(())
>       }
>   
> +    #[allow(clippy::too_many_arguments)]

nit: above is no longer needed?

>       fn verify_chunk_by_backend(
> -        &self,
> +        backend: &DatastoreBackend,
> +        verify_state: Arc<IndexVerifyState>,
> +        decoder_pool: &SendHandle<(DataBlob, [u8; 32], u64)>,
>           info: &ChunkReadInfo,
> -        read_bytes: &mut u64,
> -        decoded_bytes: &mut u64,
> -        errors: Arc<AtomicUsize>,
> -        decoder_pool: &ParallelHandler<(DataBlob, [u8; 32], u64)>,
>       ) -> Result<(), Error> {
> -        match &self.backend {
> -            DatastoreBackend::Filesystem => match self.datastore.load_chunk(&info.digest) {
> -                Err(err) => self.add_corrupt_chunk(
> +        match backend {
> +            DatastoreBackend::Filesystem => match verify_state.datastore.load_chunk(&info.digest) {
> +                Err(err) => Self::add_corrupt_chunk(
> +                    verify_state,
>                       info.digest,
> -                    errors,
>                       &format!("can't verify chunk, load failed - {err}"),
>                   ),
>                   Ok(chunk) => {
>                       let size = info.size();
> -                    *read_bytes += chunk.raw_size();
> +                    verify_state
> +                        .read_bytes
> +                        .fetch_add(chunk.raw_size(), Ordering::SeqCst);
>                       decoder_pool.send((chunk, info.digest, size))?;
> -                    *decoded_bytes += size;
> +                    verify_state.decoded_bytes.fetch_add(size, Ordering::SeqCst);
>                   }
>               },
>               DatastoreBackend::S3(s3_client) => {
> @@ -235,26 +255,28 @@ impl VerifyWorker {
>                           match chunk_result {
>                               Ok(chunk) => {
>                                   let size = info.size();
> -                                *read_bytes += chunk.raw_size();
> +                                verify_state
> +                                    .read_bytes
> +                                    .fetch_add(chunk.raw_size(), Ordering::SeqCst);
>                                   decoder_pool.send((chunk, info.digest, size))?;
> -                                *decoded_bytes += size;
> +                                verify_state.decoded_bytes.fetch_add(size, Ordering::SeqCst);
>                               }
>                               Err(err) => {
> -                                errors.fetch_add(1, Ordering::SeqCst);
> +                                verify_state.errors.fetch_add(1, Ordering::SeqCst);
>                                   error!("can't verify chunk, load failed - {err}");
>                               }
>                           }
>                       }
> -                    Ok(None) => self.add_corrupt_chunk(
> +                    Ok(None) => Self::add_corrupt_chunk(
> +                        verify_state,
>                           info.digest,
> -                        errors,
>                           &format!(
>                               "can't verify missing chunk with digest {}",
>                               hex::encode(info.digest)
>                           ),
>                       ),
>                       Err(err) => {
> -                        errors.fetch_add(1, Ordering::SeqCst);
> +                        verify_state.errors.fetch_add(1, Ordering::SeqCst);
>                           error!("can't verify chunk, load failed - {err}");
>                       }
>                   }
> @@ -263,13 +285,13 @@ impl VerifyWorker {
>           Ok(())
>       }
>   
> -    fn add_corrupt_chunk(&self, digest: [u8; 32], errors: Arc<AtomicUsize>, message: &str) {
> +    fn add_corrupt_chunk(verify_state: Arc<IndexVerifyState>, digest: [u8; 32], message: &str) {
>           // Panic on poisoned mutex
> -        let mut corrupt_chunks = self.corrupt_chunks.lock().unwrap();
> +        let mut corrupt_chunks = verify_state.corrupt_chunks.lock().unwrap();
>           corrupt_chunks.insert(digest);
>           error!(message);
> -        errors.fetch_add(1, Ordering::SeqCst);
> -        match self.datastore.rename_corrupt_chunk(&digest) {
> +        verify_state.errors.fetch_add(1, Ordering::SeqCst);
> +        match verify_state.datastore.rename_corrupt_chunk(&digest) {
>               Ok(Some(new_path)) => info!("corrupt chunk renamed to {new_path:?}"),
>               Err(err) => info!("{err}"),
>               _ => (),



_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel


  reply	other threads:[~2025-11-07  8:39 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-11-06 16:13 [pbs-devel] [PATCH proxmox{, -backup} v2 0/7] parallelize chunk reads in verification Nicolas Frey
2025-11-06 16:13 ` [pbs-devel] [PATCH proxmox v2 1/3] pbs-api-types: add schema for {worker, read, verify}-threads Nicolas Frey
2025-11-06 16:13 ` [pbs-devel] [PATCH proxmox v2 2/3] pbs-api-types: jobs: add {read, verify}-threads to VerificationJobConfig Nicolas Frey
2025-11-06 17:44   ` Thomas Lamprecht
2025-11-07  7:47     ` Christian Ebner
2025-11-06 16:13 ` [pbs-devel] [PATCH proxmox v2 3/3] pbs-api-types: use worker-threads schema for TapeBackupJobSetup Nicolas Frey
2025-11-06 16:13 ` [pbs-devel] [PATCH proxmox-backup v2 1/4] api: verify: move chunk loading into parallel handler Nicolas Frey
2025-11-07  8:39   ` Christian Ebner [this message]
2025-11-06 16:13 ` [pbs-devel] [PATCH proxmox-backup v2 2/4] api: verify: determine the number of threads to use with {read, verify}-threads Nicolas Frey
2025-11-07  9:31   ` Christian Ebner
2025-11-07  9:41     ` Christian Ebner
2025-11-06 16:13 ` [pbs-devel] [PATCH proxmox-backup v2 3/4] api: verify: add {read, verify}-threads to update endpoint Nicolas Frey
2025-11-06 16:13 ` [pbs-devel] [PATCH proxmox-backup v2 4/4] ui: verify: add option to set number of threads for job Nicolas Frey

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=49ae4fa9-fd40-45ca-a058-4fd1962871af@proxmox.com \
    --to=c.ebner@proxmox.com \
    --cc=n.frey@proxmox.com \
    --cc=pbs-devel@lists.proxmox.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox
Service provided by Proxmox Server Solutions GmbH | Privacy | Legal