From: Dominik Csapak <d.csapak@proxmox.com>
To: pbs-devel@lists.proxmox.com
Subject: [pbs-devel] [RFC PATCH proxmox-backup v2 3/3] verify: move chunk loading into the worker threads
Date: Tue,  7 May 2024 09:29:55 +0200	[thread overview]
Message-ID: <20240507072955.364206-4-d.csapak@proxmox.com> (raw)
In-Reply-To: <20240507072955.364206-1-d.csapak@proxmox.com>

so that the chunk loading can also be done in parallel.

This can have a very large impact on verification speed, depending on
the underlying storage.

I measured on the following setups:

* Setup A: spread out backup on virtualized PBS on single HDD
* Setup B: backup with mostly sequential chunks on virtualized PBS on HDDs
* Setup C: backup on virtualized PBS on a fast NVME

(values are MiB/s read speed from the task log; caches were cleared
between runs, and the CPU was mostly idle for the HDD tests)

setup  baseline(current code)  1 thread  2 threads  4 threads  8 threads
A      89                      75        73         79         85
B      67                      56        61         67         72
C      1133                    616       1133       1850       2558

This data shows that on spinning disks, a single read thread that
reads continuously can outperform multiple threads reading from disk
concurrently.

On fast disks, though, reading in parallel makes verification much
faster, even at the same total thread count as the baseline.

Since the results vary so much across backups/storages/etc., I opted
not to calculate the default automatically (e.g. from disk type/CPU
cores/etc.).

Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
---
As mentioned in the cover letter, we could go even further and have
a separate reading thread pool (that's configurable), so we could
keep the defaults we have now while users can still tune it to
their liking.

 pbs-api-types/src/datastore.rs |  2 +-
 src/backup/verify.rs           | 65 ++++++++++++++--------------------
 2 files changed, 28 insertions(+), 39 deletions(-)

diff --git a/pbs-api-types/src/datastore.rs b/pbs-api-types/src/datastore.rs
index 3fb9ff766..6278a6d99 100644
--- a/pbs-api-types/src/datastore.rs
+++ b/pbs-api-types/src/datastore.rs
@@ -233,7 +233,7 @@ pub struct DatastoreTuning {
     #[serde(skip_serializing_if = "Option::is_none")]
     /// Configures how many threads to use to read from the datastore while backing up to tape.
     pub tape_backup_read_threads: Option<usize>,
-    /// Configures how many threads to use for hashing on verify.
+    /// Configures how many threads to use for reading and hashing on verify.
     pub verification_threads: Option<usize>,
 }
 
diff --git a/src/backup/verify.rs b/src/backup/verify.rs
index c9fbb2e33..e2789a865 100644
--- a/src/backup/verify.rs
+++ b/src/backup/verify.rs
@@ -1,6 +1,6 @@
 use nix::dir::Dir;
 use std::collections::HashSet;
-use std::sync::atomic::{AtomicUsize, Ordering};
+use std::sync::atomic::{AtomicU64, AtomicUsize, Ordering};
 use std::sync::{Arc, Mutex};
 use std::time::Instant;
 
@@ -15,7 +15,7 @@ use pbs_api_types::{
 use pbs_datastore::backup_info::{BackupDir, BackupGroup, BackupInfo};
 use pbs_datastore::index::IndexFile;
 use pbs_datastore::manifest::{archive_type, ArchiveType, BackupManifest, FileInfo};
-use pbs_datastore::{DataBlob, DataStore, StoreProgress};
+use pbs_datastore::{DataStore, StoreProgress};
 use proxmox_sys::fs::lock_dir_noblock_shared;
 
 use crate::tools::parallel_handler::ParallelHandler;
@@ -114,8 +114,8 @@ fn verify_index_chunks(
 
     let start_time = Instant::now();
 
-    let mut read_bytes = 0;
-    let mut decoded_bytes = 0;
+    let read_bytes = Arc::new(AtomicU64::new(0));
+    let decoded_bytes = Arc::new(AtomicU64::new(0));
 
     let worker2 = Arc::clone(&verify_worker.worker);
     let datastore2 = Arc::clone(&verify_worker.datastore);
@@ -125,10 +125,23 @@ fn verify_index_chunks(
 
     let thread_count = datastore2.get_thread_configuration().verification_threads;
 
-    let decoder_pool = ParallelHandler::new(
-        "verify chunk decoder",
-        thread_count,
-        move |(chunk, digest, size): (DataBlob, [u8; 32], u64)| {
+    let decoder_pool = ParallelHandler::new("verify chunk decoder", thread_count, {
+        let read_bytes = read_bytes.clone();
+        let decoded_bytes = decoded_bytes.clone();
+        move |(digest, size): ([u8; 32], u64)| {
+            let chunk = match datastore2.load_chunk(&digest) {
+                Err(err) => {
+                    corrupt_chunks2.lock().unwrap().insert(digest);
+                    task_log!(worker2, "can't verify chunk, load failed - {}", err);
+                    errors2.fetch_add(1, Ordering::SeqCst);
+                    rename_corrupted_chunk(datastore2.clone(), &digest, &worker2);
+                    return Ok(());
+                }
+                Ok(chunk) => {
+                    read_bytes.fetch_add(chunk.raw_size(), Ordering::SeqCst);
+                    chunk
+                }
+            };
             let chunk_crypt_mode = match chunk.crypt_mode() {
                 Err(err) => {
                     corrupt_chunks2.lock().unwrap().insert(digest);
@@ -157,10 +170,11 @@ fn verify_index_chunks(
             } else {
                 verified_chunks2.lock().unwrap().insert(digest);
             }
+            decoded_bytes.fetch_add(size, Ordering::SeqCst);
 
             Ok(())
-        },
-    );
+        }
+    });
 
     let skip_chunk = |digest: &[u8; 32]| -> bool {
         if verify_worker
@@ -213,40 +227,15 @@ fn verify_index_chunks(
             continue; // already verified or marked corrupt
         }
 
-        match verify_worker.datastore.load_chunk(&info.digest) {
-            Err(err) => {
-                verify_worker
-                    .corrupt_chunks
-                    .lock()
-                    .unwrap()
-                    .insert(info.digest);
-                task_log!(
-                    verify_worker.worker,
-                    "can't verify chunk, load failed - {}",
-                    err
-                );
-                errors.fetch_add(1, Ordering::SeqCst);
-                rename_corrupted_chunk(
-                    verify_worker.datastore.clone(),
-                    &info.digest,
-                    &verify_worker.worker,
-                );
-            }
-            Ok(chunk) => {
-                let size = info.size();
-                read_bytes += chunk.raw_size();
-                decoder_pool.send((chunk, info.digest, size))?;
-                decoded_bytes += size;
-            }
-        }
+        decoder_pool.send((info.digest, info.size()))?;
     }
 
     decoder_pool.complete()?;
 
     let elapsed = start_time.elapsed().as_secs_f64();
 
-    let read_bytes_mib = (read_bytes as f64) / (1024.0 * 1024.0);
-    let decoded_bytes_mib = (decoded_bytes as f64) / (1024.0 * 1024.0);
+    let read_bytes_mib = (read_bytes.load(Ordering::SeqCst) as f64) / (1024.0 * 1024.0);
+    let decoded_bytes_mib = (decoded_bytes.load(Ordering::SeqCst) as f64) / (1024.0 * 1024.0);
 
     let read_speed = read_bytes_mib / elapsed;
     let decode_speed = decoded_bytes_mib / elapsed;
-- 
2.39.2



_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel

