public inbox for pbs-devel@lists.proxmox.com
* [pbs-devel] [RFC PATCH proxmox-backup stable-3 0/3] improve verify speed under certain conditions
@ 2025-07-07 13:27 Dominik Csapak
From: Dominik Csapak @ 2025-07-07 13:27 UTC
  To: pbs-devel

by parallelizing the chunk loading.

While talking off-list with f.gruenbichler about the pbs-restore speed
improvements, it came up that we don't parallelize the reads when
verifying snapshots.

This series contains two patches that do this (plus a preparatory rename):
the first moves the loading into the worker threads, and the second creates
a separate thread pool just for reading.

It was developed on stable-3; if this kind of change is wanted, I would of
course send it for master too. (Some structure seems to have changed there,
so it is not a straightforward cherry-pick/port.)
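
For illustration, here is a self-contained sketch of the idea using only std
(the actual patches use the crate-internal ParallelHandler); load_chunk() and
verify_chunk() below are hypothetical stand-ins for the real datastore and
verification calls:

use std::sync::mpsc::{sync_channel, Receiver};
use std::sync::{Arc, Mutex};
use std::thread;

type Digest = [u8; 32];

fn load_chunk(digest: &Digest) -> Vec<u8> {
    digest.to_vec() // placeholder for datastore.load_chunk()
}

fn verify_chunk(_digest: &Digest, _data: &[u8]) {
    // placeholder for the digest/crypt-mode checks
}

// spawn `n` worker threads that pull items from a shared receiver
fn spawn_pool<T: Send + 'static>(
    n: usize,
    rx: Receiver<T>,
    work: impl Fn(T) + Clone + Send + 'static,
) -> Vec<thread::JoinHandle<()>> {
    let rx = Arc::new(Mutex::new(rx));
    (0..n)
        .map(|_| {
            let rx = Arc::clone(&rx);
            let work = work.clone();
            thread::spawn(move || loop {
                // hold the lock only around recv(), process outside of it
                let item = match rx.lock().unwrap().recv() {
                    Ok(item) => item,
                    Err(_) => break, // channel closed, pool is done
                };
                work(item);
            })
        })
        .collect()
}

fn main() {
    // bounded channels, so only a handful of chunks are read ahead
    let (digest_tx, digest_rx) = sync_channel::<Digest>(4);
    let (chunk_tx, chunk_rx) = sync_channel::<(Digest, Vec<u8>)>(4);

    // verifier pool: CPU bound
    let verifiers = spawn_pool(4, chunk_rx, |(digest, data): (Digest, Vec<u8>)| {
        verify_chunk(&digest, &data);
    });

    // reader pool: mostly waiting on disk IO, feeds the verifiers
    let readers = spawn_pool(4, digest_rx, move |digest: Digest| {
        let data = load_chunk(&digest);
        let _ = chunk_tx.send((digest, data));
    });

    // the main loop only pushes digests, like the index iteration in verify
    for i in 0..16u8 {
        digest_tx.send([i; 32]).unwrap();
    }
    drop(digest_tx); // close the pipeline so the pools drain and exit

    for handle in readers.into_iter().chain(verifiers) {
        handle.join().unwrap();
    }
}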

Dominik Csapak (3):
  verify: rename variables
  verify: move chunk loading into parallel handler
  verify: use separate read pool for reading chunks

 src/backup/verify.rs | 85 +++++++++++++++++++++++++-------------------
 1 file changed, 48 insertions(+), 37 deletions(-)

-- 
2.39.5




* [pbs-devel] [RFC PATCH proxmox-backup 1/3] verify: rename variables
From: Dominik Csapak @ 2025-07-07 13:27 UTC
  To: pbs-devel

give the cloned Arc variables better names by moving the cloning into a
block just before the closure; inside that block there is no naming
conflict, so the '2' suffixes can be dropped.

Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
---
this patch is just there so one can better see the changes in 2/3
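
A tiny standalone example of the pattern (with a hypothetical datastore
variable, not taken from the patch):

use std::sync::Arc;

fn main() {
    let datastore = Arc::new(String::from("store1"));

    let task = {
        // the clone shadows the outer name only inside this block,
        // so no 'datastore2'-style suffix is needed
        let datastore = Arc::clone(&datastore);
        move || println!("worker uses {datastore}")
    };

    task();
    println!("outer Arc is still usable: {datastore}");
}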

 src/backup/verify.rs | 32 +++++++++++++++-----------------
 1 file changed, 15 insertions(+), 17 deletions(-)

diff --git a/src/backup/verify.rs b/src/backup/verify.rs
index 3d2cba8ac..ba4ca4d2f 100644
--- a/src/backup/verify.rs
+++ b/src/backup/verify.rs
@@ -109,20 +109,18 @@ fn verify_index_chunks(
     let mut read_bytes = 0;
     let mut decoded_bytes = 0;
 
-    let datastore2 = Arc::clone(&verify_worker.datastore);
-    let corrupt_chunks2 = Arc::clone(&verify_worker.corrupt_chunks);
-    let verified_chunks2 = Arc::clone(&verify_worker.verified_chunks);
-    let errors2 = Arc::clone(&errors);
-
-    let decoder_pool = ParallelHandler::new(
-        "verify chunk decoder",
-        4,
+    let decoder_pool = ParallelHandler::new("verify chunk decoder", 4, {
+        let datastore = Arc::clone(&verify_worker.datastore);
+        let corrupt_chunks = Arc::clone(&verify_worker.corrupt_chunks);
+        let verified_chunks = Arc::clone(&verify_worker.verified_chunks);
+        let errors = Arc::clone(&errors);
+
         move |(chunk, digest, size): (DataBlob, [u8; 32], u64)| {
             let chunk_crypt_mode = match chunk.crypt_mode() {
                 Err(err) => {
-                    corrupt_chunks2.lock().unwrap().insert(digest);
+                    corrupt_chunks.lock().unwrap().insert(digest);
                     info!("can't verify chunk, unknown CryptMode - {err}");
-                    errors2.fetch_add(1, Ordering::SeqCst);
+                    errors.fetch_add(1, Ordering::SeqCst);
                     return Ok(());
                 }
                 Ok(mode) => mode,
@@ -132,21 +130,21 @@ fn verify_index_chunks(
                 info!(
                     "chunk CryptMode {chunk_crypt_mode:?} does not match index CryptMode {crypt_mode:?}"
                 );
-                errors2.fetch_add(1, Ordering::SeqCst);
+                errors.fetch_add(1, Ordering::SeqCst);
             }
 
             if let Err(err) = chunk.verify_unencrypted(size as usize, &digest) {
-                corrupt_chunks2.lock().unwrap().insert(digest);
+                corrupt_chunks.lock().unwrap().insert(digest);
                 info!("{err}");
-                errors2.fetch_add(1, Ordering::SeqCst);
-                rename_corrupted_chunk(datastore2.clone(), &digest);
+                errors.fetch_add(1, Ordering::SeqCst);
+                rename_corrupted_chunk(datastore.clone(), &digest);
             } else {
-                verified_chunks2.lock().unwrap().insert(digest);
+                verified_chunks.lock().unwrap().insert(digest);
             }
 
             Ok(())
-        },
-    );
+        }
+    });
 
     let skip_chunk = |digest: &[u8; 32]| -> bool {
         if verify_worker
-- 
2.39.5




* [pbs-devel] [RFC PATCH proxmox-backup 2/3] verify: move chunk loading into parallel handler
From: Dominik Csapak @ 2025-07-07 13:27 UTC
  To: pbs-devel

This way, the chunks will be loaded in parallel in addition to being
checked in parallel.

Depending on the underlying storage, this can speed up reading chunks
from disk, especially when the storage benefits from a higher IO depth
and the CPU is faster than the storage.

In my local tests I measured the following speed difference when verifying
a single snapshot of ~64 GiB (4x the RAM size) on 12 cores:

current:    ~550 MiB/s
this patch: ~950 MiB/s

Obviously it increased the IO and CPU load in line with the throughput.

Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
---
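
Not part of the patch itself, just a self-contained illustration of why the
byte counters change type: once the loading happens inside the worker
closures, several threads update the counters concurrently, so the plain u64
locals become shared Arc<AtomicU64> values that are read once after the pool
has completed.

use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;

fn main() {
    let read_bytes = Arc::new(AtomicU64::new(0));

    let handles: Vec<_> = (0..4)
        .map(|_| {
            let read_bytes = Arc::clone(&read_bytes);
            thread::spawn(move || {
                // each worker adds the size of the chunks it loaded
                read_bytes.fetch_add(4 * 1024 * 1024, Ordering::SeqCst);
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }

    // read once after all workers finished, like after decoder_pool.complete()
    println!("read {} bytes", read_bytes.load(Ordering::SeqCst));
}
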
 src/backup/verify.rs | 48 +++++++++++++++++++++++---------------------
 1 file changed, 25 insertions(+), 23 deletions(-)

diff --git a/src/backup/verify.rs b/src/backup/verify.rs
index ba4ca4d2f..83dd0d9a3 100644
--- a/src/backup/verify.rs
+++ b/src/backup/verify.rs
@@ -1,6 +1,6 @@
 use pbs_config::BackupLockGuard;
 use std::collections::HashSet;
-use std::sync::atomic::{AtomicUsize, Ordering};
+use std::sync::atomic::{AtomicU64, AtomicUsize, Ordering};
 use std::sync::{Arc, Mutex};
 use std::time::Instant;
 
@@ -17,7 +17,7 @@ use pbs_api_types::{
 use pbs_datastore::backup_info::{BackupDir, BackupGroup, BackupInfo};
 use pbs_datastore::index::IndexFile;
 use pbs_datastore::manifest::{BackupManifest, FileInfo};
-use pbs_datastore::{DataBlob, DataStore, StoreProgress};
+use pbs_datastore::{DataStore, StoreProgress};
 
 use crate::tools::parallel_handler::ParallelHandler;
 
@@ -106,16 +106,32 @@ fn verify_index_chunks(
 
     let start_time = Instant::now();
 
-    let mut read_bytes = 0;
-    let mut decoded_bytes = 0;
+    let read_bytes = Arc::new(AtomicU64::new(0));
+    let decoded_bytes = Arc::new(AtomicU64::new(0));
 
     let decoder_pool = ParallelHandler::new("verify chunk decoder", 4, {
         let datastore = Arc::clone(&verify_worker.datastore);
         let corrupt_chunks = Arc::clone(&verify_worker.corrupt_chunks);
         let verified_chunks = Arc::clone(&verify_worker.verified_chunks);
         let errors = Arc::clone(&errors);
+        let read_bytes = Arc::clone(&read_bytes);
+        let decoded_bytes = Arc::clone(&decoded_bytes);
 
-        move |(chunk, digest, size): (DataBlob, [u8; 32], u64)| {
+        move |(digest, size): ([u8; 32], u64)| {
+            let chunk = match datastore.load_chunk(&digest) {
+                Err(err) => {
+                    corrupt_chunks.lock().unwrap().insert(digest);
+                    error!("can't verify chunk, load failed - {err}");
+                    errors.fetch_add(1, Ordering::SeqCst);
+                    rename_corrupted_chunk(datastore.clone(), &digest);
+                    return Ok(());
+                }
+                Ok(chunk) => {
+                    read_bytes.fetch_add(chunk.raw_size(), Ordering::SeqCst);
+                    decoded_bytes.fetch_add(size, Ordering::SeqCst);
+                    chunk
+                }
+            };
             let chunk_crypt_mode = match chunk.crypt_mode() {
                 Err(err) => {
                     corrupt_chunks.lock().unwrap().insert(digest);
@@ -193,30 +209,16 @@ fn verify_index_chunks(
             continue; // already verified or marked corrupt
         }
 
-        match verify_worker.datastore.load_chunk(&info.digest) {
-            Err(err) => {
-                verify_worker
-                    .corrupt_chunks
-                    .lock()
-                    .unwrap()
-                    .insert(info.digest);
-                error!("can't verify chunk, load failed - {err}");
-                errors.fetch_add(1, Ordering::SeqCst);
-                rename_corrupted_chunk(verify_worker.datastore.clone(), &info.digest);
-            }
-            Ok(chunk) => {
-                let size = info.size();
-                read_bytes += chunk.raw_size();
-                decoder_pool.send((chunk, info.digest, size))?;
-                decoded_bytes += size;
-            }
-        }
+        decoder_pool.send((info.digest, info.size()))?;
     }
 
     decoder_pool.complete()?;
 
     let elapsed = start_time.elapsed().as_secs_f64();
 
+    let read_bytes = read_bytes.load(Ordering::SeqCst);
+    let decoded_bytes = decoded_bytes.load(Ordering::SeqCst);
+
     let read_bytes_mib = (read_bytes as f64) / (1024.0 * 1024.0);
     let decoded_bytes_mib = (decoded_bytes as f64) / (1024.0 * 1024.0);
 
-- 
2.39.5




* [pbs-devel] [RFC PATCH proxmox-backup 3/3] verify: use separate read pool for reading chunks
From: Dominik Csapak @ 2025-07-07 13:27 UTC
  To: pbs-devel

instead of having each 'worker thread' read and then verify its chunk, use
a separate 'reader pool' that reads chunks in parallel, but independently of
the verification.

While this does introduce 4 new threads, they should be mostly busy reading
from disk and not doing anything CPU intensive.

The advantage over the current system is that the reader threads can
already fetch the next chunks while the previous ones are still being
verified.

Due to the nature of the ParallelHandler, the channel is bounded by the
number of threads, so no more than 4 chunks will be read in advance.

In my local tests I measured the following speed difference when verifying
a single snapshot of ~64 GiB (4x the RAM size) on 12 cores:

current:                                        ~550 MiB/s
previous patch (loading moved into threads):    ~950 MiB/s
this patch:                                    ~1150 MiB/s

Obviously it increased the IO and CPU load in line with the throughput.

Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
---
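
As a side note, the bounded read-ahead mentioned above can be demonstrated
with a tiny standalone program (using std's sync_channel as a stand-in for
the ParallelHandler channel): with a capacity of 4 the sender blocks as soon
as 4 items are queued, so the readers can never get far ahead of the
verifiers.

use std::sync::mpsc::sync_channel;
use std::thread;
use std::time::Duration;

fn main() {
    let (tx, rx) = sync_channel::<u32>(4);

    let producer = thread::spawn(move || {
        for i in 0..8 {
            tx.send(i).unwrap(); // blocks while 4 items are already queued
            println!("queued chunk {i}");
        }
    });

    // slow consumer, standing in for the verification work
    for chunk in rx {
        thread::sleep(Duration::from_millis(100));
        println!("verified chunk {chunk}");
    }

    producer.join().unwrap();
}
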
 src/backup/verify.rs | 49 +++++++++++++++++++++++++++-----------------
 1 file changed, 30 insertions(+), 19 deletions(-)

diff --git a/src/backup/verify.rs b/src/backup/verify.rs
index 83dd0d9a3..b139819a6 100644
--- a/src/backup/verify.rs
+++ b/src/backup/verify.rs
@@ -17,7 +17,7 @@ use pbs_api_types::{
 use pbs_datastore::backup_info::{BackupDir, BackupGroup, BackupInfo};
 use pbs_datastore::index::IndexFile;
 use pbs_datastore::manifest::{BackupManifest, FileInfo};
-use pbs_datastore::{DataStore, StoreProgress};
+use pbs_datastore::{DataBlob, DataStore, StoreProgress};
 
 use crate::tools::parallel_handler::ParallelHandler;
 
@@ -114,24 +114,8 @@ fn verify_index_chunks(
         let corrupt_chunks = Arc::clone(&verify_worker.corrupt_chunks);
         let verified_chunks = Arc::clone(&verify_worker.verified_chunks);
         let errors = Arc::clone(&errors);
-        let read_bytes = Arc::clone(&read_bytes);
-        let decoded_bytes = Arc::clone(&decoded_bytes);
 
-        move |(digest, size): ([u8; 32], u64)| {
-            let chunk = match datastore.load_chunk(&digest) {
-                Err(err) => {
-                    corrupt_chunks.lock().unwrap().insert(digest);
-                    error!("can't verify chunk, load failed - {err}");
-                    errors.fetch_add(1, Ordering::SeqCst);
-                    rename_corrupted_chunk(datastore.clone(), &digest);
-                    return Ok(());
-                }
-                Ok(chunk) => {
-                    read_bytes.fetch_add(chunk.raw_size(), Ordering::SeqCst);
-                    decoded_bytes.fetch_add(size, Ordering::SeqCst);
-                    chunk
-                }
-            };
+        move |(chunk, digest, size): (DataBlob, [u8; 32], u64)| {
             let chunk_crypt_mode = match chunk.crypt_mode() {
                 Err(err) => {
                     corrupt_chunks.lock().unwrap().insert(digest);
@@ -162,6 +146,32 @@ fn verify_index_chunks(
         }
     });
 
+    let reader_pool = ParallelHandler::new("read chunks", 4, {
+        let datastore = Arc::clone(&verify_worker.datastore);
+        let corrupt_chunks = Arc::clone(&verify_worker.corrupt_chunks);
+        let errors = Arc::clone(&errors);
+        let read_bytes = Arc::clone(&read_bytes);
+        let decoded_bytes = Arc::clone(&decoded_bytes);
+        let decoder_pool = decoder_pool.channel();
+
+        move |(digest, size): ([u8; 32], u64)| {
+            match datastore.load_chunk(&digest) {
+                Err(err) => {
+                    corrupt_chunks.lock().unwrap().insert(digest);
+                    error!("can't verify chunk, load failed - {err}");
+                    errors.fetch_add(1, Ordering::SeqCst);
+                    rename_corrupted_chunk(datastore.clone(), &digest);
+                }
+                Ok(chunk) => {
+                    read_bytes.fetch_add(chunk.raw_size(), Ordering::SeqCst);
+                    decoded_bytes.fetch_add(size, Ordering::SeqCst);
+                    decoder_pool.send((chunk, digest, size))?;
+                }
+            }
+            Ok(())
+        }
+    });
+
     let skip_chunk = |digest: &[u8; 32]| -> bool {
         if verify_worker
             .verified_chunks
@@ -209,9 +219,10 @@ fn verify_index_chunks(
             continue; // already verified or marked corrupt
         }
 
-        decoder_pool.send((info.digest, info.size()))?;
+        reader_pool.send((info.digest, info.size()))?;
     }
 
+    reader_pool.complete()?;
     decoder_pool.complete()?;
 
     let elapsed = start_time.elapsed().as_secs_f64();
-- 
2.39.5



