From: Christian Ebner <c.ebner@proxmox.com>
To: Proxmox Backup Server development discussion
<pbs-devel@lists.proxmox.com>, Nicolas Frey <n.frey@proxmox.com>
Subject: Re: [pbs-devel] [PATCH proxmox-backup v2 1/4] api: verify: move chunk loading into parallel handler
Date: Fri, 7 Nov 2025 09:39:11 +0100 [thread overview]
Message-ID: <49ae4fa9-fd40-45ca-a058-4fd1962871af@proxmox.com> (raw)
In-Reply-To: <20251106161316.528349-5-n.frey@proxmox.com>
Bundling the parameters makes this easier to digest, thanks.
Nevertheless, which I failed to mention on the previous version of the
patch, this should better be done as preparatory patch, so the
introduction of the new `IndexVerifyState` and all the adaption required
just for that should go into one, the actual changes to the chunk
loading in a subsequent patch. That makes it easier to diget.
some more comments inline
On 11/6/25 5:13 PM, Nicolas Frey wrote:
> This way, the chunks will be loaded in parallel in addition to being
> checked in parallel.
>
> Depending on the underlying storage, this can speed up reading chunks
> from disk, especially when the underlying storage is IO depth
> dependent, and the CPU is faster than the storage.
>
> Introduces a new state struct `IndexVerifyState` so that we only need
> to pass around and clone one `Arc`.
>
> Originally-by: Dominik Csapak <d.csapak@proxmox.com>
> Signed-off-by: Nicolas Frey <n.frey@proxmox.com>
> ---
> src/backup/verify.rs | 134 +++++++++++++++++++++++++------------------
> 1 file changed, 78 insertions(+), 56 deletions(-)
>
> diff --git a/src/backup/verify.rs b/src/backup/verify.rs
> index 31c03891..910a3ed5 100644
> --- a/src/backup/verify.rs
> +++ b/src/backup/verify.rs
> @@ -1,6 +1,6 @@
> use pbs_config::BackupLockGuard;
> use std::collections::HashSet;
> -use std::sync::atomic::{AtomicUsize, Ordering};
> +use std::sync::atomic::{AtomicU64, AtomicUsize, Ordering};
> use std::sync::{Arc, Mutex};
> use std::time::Instant;
>
> @@ -20,7 +20,7 @@ use pbs_datastore::index::{ChunkReadInfo, IndexFile};
> use pbs_datastore::manifest::{BackupManifest, FileInfo};
> use pbs_datastore::{DataBlob, DataStore, DatastoreBackend, StoreProgress};
>
> -use crate::tools::parallel_handler::ParallelHandler;
> +use crate::tools::parallel_handler::{ParallelHandler, SendHandle};
>
> use crate::backup::hierarchy::ListAccessibleBackupGroups;
>
> @@ -34,6 +34,15 @@ pub struct VerifyWorker {
> backend: DatastoreBackend,
> }
>
> +struct IndexVerifyState {
> + read_bytes: AtomicU64,
> + decoded_bytes: AtomicU64,
> + errors: AtomicUsize,
> + datastore: Arc<DataStore>,
> + corrupt_chunks: Arc<Mutex<HashSet<[u8; 32]>>>,
> + verified_chunks: Arc<Mutex<HashSet<[u8; 32]>>>,
> +}
> +
> impl VerifyWorker {
> /// Creates a new VerifyWorker for a given task worker and datastore.
> pub fn new(
> @@ -81,27 +90,25 @@ impl VerifyWorker {
> index: Box<dyn IndexFile + Send>,
> crypt_mode: CryptMode,
> ) -> Result<(), Error> {
> - let errors = Arc::new(AtomicUsize::new(0));
> -
> let start_time = Instant::now();
nit: The start_time could be moved to the `IndexVerifyState` as well
>
> - let mut read_bytes = 0;
> - let mut decoded_bytes = 0;
> -
> - let datastore2 = Arc::clone(&self.datastore);
> - let corrupt_chunks2 = Arc::clone(&self.corrupt_chunks);
> - let verified_chunks2 = Arc::clone(&self.verified_chunks);
> - let errors2 = Arc::clone(&errors);
> -
> - let decoder_pool = ParallelHandler::new(
> - "verify chunk decoder",
> - 4,
> + let verify_state = Arc::new(IndexVerifyState {
> + read_bytes: AtomicU64::new(0),
> + decoded_bytes: AtomicU64::new(0),
> + errors: AtomicUsize::new(0),
> + datastore: Arc::clone(&self.datastore),
> + corrupt_chunks: Arc::clone(&self.corrupt_chunks),
> + verified_chunks: Arc::clone(&self.verified_chunks),
> + });
This could be moved to a constructor and instantiated via
`IndexVerifyState::new(&self.datastore, &self.corrupt_chunks,
&self.verified_chunks)`, which sets all the field values and gathers the
start time as well.
> +
> + let decoder_pool = ParallelHandler::new("verify chunk decoder", 4, {
> + let verify_state = Arc::clone(&verify_state);
> move |(chunk, digest, size): (DataBlob, [u8; 32], u64)| {
> let chunk_crypt_mode = match chunk.crypt_mode() {
> Err(err) => {
> - corrupt_chunks2.lock().unwrap().insert(digest);
> + verify_state.corrupt_chunks.lock().unwrap().insert(digest);
> info!("can't verify chunk, unknown CryptMode - {err}");
> - errors2.fetch_add(1, Ordering::SeqCst);
> + verify_state.errors.fetch_add(1, Ordering::SeqCst);
> return Ok(());
> }
> Ok(mode) => mode,
> @@ -111,25 +118,25 @@ impl VerifyWorker {
> info!(
> "chunk CryptMode {chunk_crypt_mode:?} does not match index CryptMode {crypt_mode:?}"
> );
> - errors2.fetch_add(1, Ordering::SeqCst);
> + verify_state.errors.fetch_add(1, Ordering::SeqCst);
> }
>
> if let Err(err) = chunk.verify_unencrypted(size as usize, &digest) {
> - corrupt_chunks2.lock().unwrap().insert(digest);
> + verify_state.corrupt_chunks.lock().unwrap().insert(digest);
> info!("{err}");
> - errors2.fetch_add(1, Ordering::SeqCst);
> - match datastore2.rename_corrupt_chunk(&digest) {
> + verify_state.errors.fetch_add(1, Ordering::SeqCst);
> + match verify_state.datastore.rename_corrupt_chunk(&digest) {
> Ok(Some(new_path)) => info!("corrupt chunk renamed to {new_path:?}"),
> Err(err) => info!("{err}"),
> _ => (),
> }
> } else {
> - verified_chunks2.lock().unwrap().insert(digest);
> + verify_state.verified_chunks.lock().unwrap().insert(digest);
> }
>
> Ok(())
> - },
> - );
> + }
> + });
>
> let skip_chunk = |digest: &[u8; 32]| -> bool {
> if self.verified_chunks.lock().unwrap().contains(digest) {
> @@ -137,7 +144,7 @@ impl VerifyWorker {
> } else if self.corrupt_chunks.lock().unwrap().contains(digest) {
> let digest_str = hex::encode(digest);
> info!("chunk {digest_str} was marked as corrupt");
> - errors.fetch_add(1, Ordering::SeqCst);
> + verify_state.errors.fetch_add(1, Ordering::SeqCst);
> true
> } else {
> false
> @@ -156,6 +163,21 @@ impl VerifyWorker {
> .datastore
> .get_chunks_in_order(&*index, skip_chunk, check_abort)?;
>
> + let reader_pool = ParallelHandler::new("read chunks", 4, {
> + let decoder_pool = decoder_pool.channel();
> + let verify_state = Arc::clone(&verify_state);
> + let backend = self.backend.clone();
> +
> + move |info: ChunkReadInfo| {
> + Self::verify_chunk_by_backend(
> + &backend,
> + Arc::clone(&verify_state),
> + &decoder_pool,
> + &info,
> + )
> + }
> + });
> +
> for (pos, _) in chunk_list {
> self.worker.check_abort()?;
> self.worker.fail_on_shutdown()?;
> @@ -167,58 +189,56 @@ impl VerifyWorker {
> continue; // already verified or marked corrupt
> }
>
> - self.verify_chunk_by_backend(
> - &info,
> - &mut read_bytes,
> - &mut decoded_bytes,
> - Arc::clone(&errors),
> - &decoder_pool,
> - )?;
> + reader_pool.send(info)?;
> }
>
> - decoder_pool.complete()?;
> + reader_pool.complete()?;
>
> let elapsed = start_time.elapsed().as_secs_f64();
could get the time from the `verify_state` then
>
> + let read_bytes = verify_state.read_bytes.load(Ordering::SeqCst);
> + let decoded_bytes = verify_state.decoded_bytes.load(Ordering::SeqCst);
> +
> let read_bytes_mib = (read_bytes as f64) / (1024.0 * 1024.0);
> let decoded_bytes_mib = (decoded_bytes as f64) / (1024.0 * 1024.0);
>
> let read_speed = read_bytes_mib / elapsed;
> let decode_speed = decoded_bytes_mib / elapsed;
>
> - let error_count = errors.load(Ordering::SeqCst);
> + let error_count = verify_state.errors.load(Ordering::SeqCst);
>
> info!(
> " verified {read_bytes_mib:.2}/{decoded_bytes_mib:.2} MiB in {elapsed:.2} seconds, speed {read_speed:.2}/{decode_speed:.2} MiB/s ({error_count} errors)"
> );
>
> - if errors.load(Ordering::SeqCst) > 0 {
> + if verify_state.errors.load(Ordering::SeqCst) > 0 {
> bail!("chunks could not be verified");
> }
>
> Ok(())
> }
>
> + #[allow(clippy::too_many_arguments)]
nit: above is no longer needed?
> fn verify_chunk_by_backend(
> - &self,
> + backend: &DatastoreBackend,
> + verify_state: Arc<IndexVerifyState>,
> + decoder_pool: &SendHandle<(DataBlob, [u8; 32], u64)>,
> info: &ChunkReadInfo,
> - read_bytes: &mut u64,
> - decoded_bytes: &mut u64,
> - errors: Arc<AtomicUsize>,
> - decoder_pool: &ParallelHandler<(DataBlob, [u8; 32], u64)>,
> ) -> Result<(), Error> {
> - match &self.backend {
> - DatastoreBackend::Filesystem => match self.datastore.load_chunk(&info.digest) {
> - Err(err) => self.add_corrupt_chunk(
> + match backend {
> + DatastoreBackend::Filesystem => match verify_state.datastore.load_chunk(&info.digest) {
> + Err(err) => Self::add_corrupt_chunk(
> + verify_state,
> info.digest,
> - errors,
> &format!("can't verify chunk, load failed - {err}"),
> ),
> Ok(chunk) => {
> let size = info.size();
> - *read_bytes += chunk.raw_size();
> + verify_state
> + .read_bytes
> + .fetch_add(chunk.raw_size(), Ordering::SeqCst);
> decoder_pool.send((chunk, info.digest, size))?;
> - *decoded_bytes += size;
> + verify_state.decoded_bytes.fetch_add(size, Ordering::SeqCst);
> }
> },
> DatastoreBackend::S3(s3_client) => {
> @@ -235,26 +255,28 @@ impl VerifyWorker {
> match chunk_result {
> Ok(chunk) => {
> let size = info.size();
> - *read_bytes += chunk.raw_size();
> + verify_state
> + .read_bytes
> + .fetch_add(chunk.raw_size(), Ordering::SeqCst);
> decoder_pool.send((chunk, info.digest, size))?;
> - *decoded_bytes += size;
> + verify_state.decoded_bytes.fetch_add(size, Ordering::SeqCst);
> }
> Err(err) => {
> - errors.fetch_add(1, Ordering::SeqCst);
> + verify_state.errors.fetch_add(1, Ordering::SeqCst);
> error!("can't verify chunk, load failed - {err}");
> }
> }
> }
> - Ok(None) => self.add_corrupt_chunk(
> + Ok(None) => Self::add_corrupt_chunk(
> + verify_state,
> info.digest,
> - errors,
> &format!(
> "can't verify missing chunk with digest {}",
> hex::encode(info.digest)
> ),
> ),
> Err(err) => {
> - errors.fetch_add(1, Ordering::SeqCst);
> + verify_state.errors.fetch_add(1, Ordering::SeqCst);
> error!("can't verify chunk, load failed - {err}");
> }
> }
> @@ -263,13 +285,13 @@ impl VerifyWorker {
> Ok(())
> }
>
> - fn add_corrupt_chunk(&self, digest: [u8; 32], errors: Arc<AtomicUsize>, message: &str) {
> + fn add_corrupt_chunk(verify_state: Arc<IndexVerifyState>, digest: [u8; 32], message: &str) {
> // Panic on poisoned mutex
> - let mut corrupt_chunks = self.corrupt_chunks.lock().unwrap();
> + let mut corrupt_chunks = verify_state.corrupt_chunks.lock().unwrap();
> corrupt_chunks.insert(digest);
> error!(message);
> - errors.fetch_add(1, Ordering::SeqCst);
> - match self.datastore.rename_corrupt_chunk(&digest) {
> + verify_state.errors.fetch_add(1, Ordering::SeqCst);
> + match verify_state.datastore.rename_corrupt_chunk(&digest) {
> Ok(Some(new_path)) => info!("corrupt chunk renamed to {new_path:?}"),
> Err(err) => info!("{err}"),
> _ => (),
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
next prev parent reply other threads:[~2025-11-07 8:39 UTC|newest]
Thread overview: 13+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-11-06 16:13 [pbs-devel] [PATCH proxmox{, -backup} v2 0/7] parallelize chunk reads in verification Nicolas Frey
2025-11-06 16:13 ` [pbs-devel] [PATCH proxmox v2 1/3] pbs-api-types: add schema for {worker, read, verify}-threads Nicolas Frey
2025-11-06 16:13 ` [pbs-devel] [PATCH proxmox v2 2/3] pbs-api-types: jobs: add {read, verify}-threads to VerificationJobConfig Nicolas Frey
2025-11-06 17:44 ` Thomas Lamprecht
2025-11-07 7:47 ` Christian Ebner
2025-11-06 16:13 ` [pbs-devel] [PATCH proxmox v2 3/3] pbs-api-types: use worker-threads schema for TapeBackupJobSetup Nicolas Frey
2025-11-06 16:13 ` [pbs-devel] [PATCH proxmox-backup v2 1/4] api: verify: move chunk loading into parallel handler Nicolas Frey
2025-11-07 8:39 ` Christian Ebner [this message]
2025-11-06 16:13 ` [pbs-devel] [PATCH proxmox-backup v2 2/4] api: verify: determine the number of threads to use with {read, verify}-threads Nicolas Frey
2025-11-07 9:31 ` Christian Ebner
2025-11-07 9:41 ` Christian Ebner
2025-11-06 16:13 ` [pbs-devel] [PATCH proxmox-backup v2 3/4] api: verify: add {read, verify}-threads to update endpoint Nicolas Frey
2025-11-06 16:13 ` [pbs-devel] [PATCH proxmox-backup v2 4/4] ui: verify: add option to set number of threads for job Nicolas Frey
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=49ae4fa9-fd40-45ca-a058-4fd1962871af@proxmox.com \
--to=c.ebner@proxmox.com \
--cc=n.frey@proxmox.com \
--cc=pbs-devel@lists.proxmox.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox