From: Dominik Csapak <d.csapak@proxmox.com>
To: pbs-devel@lists.proxmox.com
Subject: [pbs-devel] [RFC PATCH proxmox-backup 2/3] verify: move chunk loading into parallel handler
Date: Mon, 7 Jul 2025 15:27:05 +0200
Message-ID: <20250707132706.2854973-3-d.csapak@proxmox.com>
In-Reply-To: <20250707132706.2854973-1-d.csapak@proxmox.com>
This way, the chunks will be loaded in parallel in addition to being
checked in parallel.
Depending on the underlying storage, this can speed up reading chunks
from disk, especially when the storage's throughput scales with IO
depth and the CPU can verify faster than the storage can deliver data.
In my local tests I measured the following speed difference when
verifying a single ~64 GiB snapshot (4x the RAM size) on 12 cores:

current:    ~550 MiB/s
this patch: ~950 MiB/s

As expected, IO and CPU load increased in line with the throughput.
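The pattern can be sketched outside the PBS codebase with only std
threads and channels standing in for ParallelHandler; load_chunk here is
a hypothetical stub, not the real datastore method:

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::{mpsc, Arc, Mutex};
use std::thread;

// Hypothetical stand-in for datastore.load_chunk(): returns the chunk's
// raw (on-disk) size for a given digest.
fn load_chunk(_digest: [u8; 32]) -> Result<u64, String> {
    Ok(4096) // pretend every chunk is 4 KiB on disk
}

// Minimal sketch: the main loop only sends (digest, size) pairs; each
// worker loads the chunk itself and bumps shared atomic counters, so
// both loading and counting happen inside the pool.
fn verify_parallel(chunks: Vec<([u8; 32], u64)>, threads: usize) -> (u64, u64) {
    let read_bytes = Arc::new(AtomicU64::new(0));
    let decoded_bytes = Arc::new(AtomicU64::new(0));
    let (tx, rx) = mpsc::channel::<([u8; 32], u64)>();
    let rx = Arc::new(Mutex::new(rx));

    let mut handles = Vec::new();
    for _ in 0..threads {
        let rx = Arc::clone(&rx);
        let read_bytes = Arc::clone(&read_bytes);
        let decoded_bytes = Arc::clone(&decoded_bytes);
        handles.push(thread::spawn(move || loop {
            // lock is released as soon as recv() returns
            let msg = rx.lock().unwrap().recv();
            match msg {
                Ok((digest, size)) => {
                    if let Ok(raw) = load_chunk(digest) {
                        // counters are updated inside the worker now
                        read_bytes.fetch_add(raw, Ordering::SeqCst);
                        decoded_bytes.fetch_add(size, Ordering::SeqCst);
                    }
                }
                Err(_) => break, // channel closed: all work sent
            }
        }));
    }

    for item in chunks {
        tx.send(item).unwrap();
    }
    drop(tx); // close the channel so workers exit

    for h in handles {
        h.join().unwrap();
    }
    (
        read_bytes.load(Ordering::SeqCst),
        decoded_bytes.load(Ordering::SeqCst),
    )
}
```

The key difference to the pre-patch code: the sender no longer touches
chunk data at all, so slow per-chunk reads overlap across the pool.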
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
---
src/backup/verify.rs | 48 +++++++++++++++++++++++---------------------
1 file changed, 25 insertions(+), 23 deletions(-)
diff --git a/src/backup/verify.rs b/src/backup/verify.rs
index ba4ca4d2f..83dd0d9a3 100644
--- a/src/backup/verify.rs
+++ b/src/backup/verify.rs
@@ -1,6 +1,6 @@
use pbs_config::BackupLockGuard;
use std::collections::HashSet;
-use std::sync::atomic::{AtomicUsize, Ordering};
+use std::sync::atomic::{AtomicU64, AtomicUsize, Ordering};
use std::sync::{Arc, Mutex};
use std::time::Instant;
@@ -17,7 +17,7 @@ use pbs_api_types::{
use pbs_datastore::backup_info::{BackupDir, BackupGroup, BackupInfo};
use pbs_datastore::index::IndexFile;
use pbs_datastore::manifest::{BackupManifest, FileInfo};
-use pbs_datastore::{DataBlob, DataStore, StoreProgress};
+use pbs_datastore::{DataStore, StoreProgress};
use crate::tools::parallel_handler::ParallelHandler;
@@ -106,16 +106,32 @@ fn verify_index_chunks(
let start_time = Instant::now();
- let mut read_bytes = 0;
- let mut decoded_bytes = 0;
+ let read_bytes = Arc::new(AtomicU64::new(0));
+ let decoded_bytes = Arc::new(AtomicU64::new(0));
let decoder_pool = ParallelHandler::new("verify chunk decoder", 4, {
let datastore = Arc::clone(&verify_worker.datastore);
let corrupt_chunks = Arc::clone(&verify_worker.corrupt_chunks);
let verified_chunks = Arc::clone(&verify_worker.verified_chunks);
let errors = Arc::clone(&errors);
+ let read_bytes = Arc::clone(&read_bytes);
+ let decoded_bytes = Arc::clone(&decoded_bytes);
- move |(chunk, digest, size): (DataBlob, [u8; 32], u64)| {
+ move |(digest, size): ([u8; 32], u64)| {
+ let chunk = match datastore.load_chunk(&digest) {
+ Err(err) => {
+ corrupt_chunks.lock().unwrap().insert(digest);
+ error!("can't verify chunk, load failed - {err}");
+ errors.fetch_add(1, Ordering::SeqCst);
+ rename_corrupted_chunk(datastore.clone(), &digest);
+ return Ok(());
+ }
+ Ok(chunk) => {
+ read_bytes.fetch_add(chunk.raw_size(), Ordering::SeqCst);
+ decoded_bytes.fetch_add(size, Ordering::SeqCst);
+ chunk
+ }
+ };
let chunk_crypt_mode = match chunk.crypt_mode() {
Err(err) => {
corrupt_chunks.lock().unwrap().insert(digest);
@@ -193,30 +209,16 @@ fn verify_index_chunks(
continue; // already verified or marked corrupt
}
- match verify_worker.datastore.load_chunk(&info.digest) {
- Err(err) => {
- verify_worker
- .corrupt_chunks
- .lock()
- .unwrap()
- .insert(info.digest);
- error!("can't verify chunk, load failed - {err}");
- errors.fetch_add(1, Ordering::SeqCst);
- rename_corrupted_chunk(verify_worker.datastore.clone(), &info.digest);
- }
- Ok(chunk) => {
- let size = info.size();
- read_bytes += chunk.raw_size();
- decoder_pool.send((chunk, info.digest, size))?;
- decoded_bytes += size;
- }
- }
+ decoder_pool.send((info.digest, info.size()))?;
}
decoder_pool.complete()?;
let elapsed = start_time.elapsed().as_secs_f64();
+ let read_bytes = read_bytes.load(Ordering::SeqCst);
+ let decoded_bytes = decoded_bytes.load(Ordering::SeqCst);
+
let read_bytes_mib = (read_bytes as f64) / (1024.0 * 1024.0);
let decoded_bytes_mib = (decoded_bytes as f64) / (1024.0 * 1024.0);
--
2.39.5