public inbox for pbs-devel@lists.proxmox.com
From: Dominik Csapak <d.csapak@proxmox.com>
To: pbs-devel@lists.proxmox.com
Subject: [pbs-devel] [PATCH proxmox-backup 2/2] tape: use datastores 'read-thread' for tape backup
Date: Tue, 30 Apr 2024 11:39:39 +0200	[thread overview]
Message-ID: <20240430093939.1318786-2-d.csapak@proxmox.com> (raw)
In-Reply-To: <20240430093939.1318786-1-d.csapak@proxmox.com>

Using a single thread for reading is not optimal in some cases, e.g.
when the underlying storage can handle more parallel reads than a
single thread can issue.

How much this helps depends largely on the storage and the CPU.

Use the ParallelHandler to perform the actual reads. Make the
sync_channel buffer size depend on the number of threads so there is
space for two chunks per thread.

I did some benchmarks on my (virtual) PBS with a real tape drive (LTO-8
tape in an LTO-9 drive):

For my NVMe datastore it did not matter much how many threads were
used, so I guess the bottleneck was in the HBA/drive or the cable
rather than in the disks/CPU. (It always reached around ~300 MB/s
according to the task log.)

For a datastore on a single HDD, the results are much more interesting:

1 Thread:  ~55MB/s
2 Threads: ~70MB/s
4 Threads: ~80MB/s
8 Threads: ~95MB/s

So issuing multiple I/O requests in parallel does speed up the tape
backup in general.

Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
---

Although I did benchmark this, I would be very grateful if other
people could test this (and the previous) change on their varying disk
setups, so we can verify that it really makes a difference and is
worth making configurable.

 pbs-api-types/src/datastore.rs              |  2 +-
 src/tape/pool_writer/new_chunks_iterator.rs | 42 +++++++++++++--------
 2 files changed, 27 insertions(+), 17 deletions(-)

diff --git a/pbs-api-types/src/datastore.rs b/pbs-api-types/src/datastore.rs
index 2ad2ae063..243c4759f 100644
--- a/pbs-api-types/src/datastore.rs
+++ b/pbs-api-types/src/datastore.rs
@@ -210,7 +210,7 @@ pub enum DatastoreFSyncLevel {
             optional: true,
         },
         "read-threads": {
-            description: "Controls how many threads are used for reading from the datastore for verification.",
+            description: "Controls how many threads are used for reading from the datastore for verify and tape backup.",
             type: usize,
             optional: true,
             minimum: 1,
diff --git a/src/tape/pool_writer/new_chunks_iterator.rs b/src/tape/pool_writer/new_chunks_iterator.rs
index 1454b33d2..63b10c9f8 100644
--- a/src/tape/pool_writer/new_chunks_iterator.rs
+++ b/src/tape/pool_writer/new_chunks_iterator.rs
@@ -6,8 +6,9 @@ use anyhow::{format_err, Error};
 use pbs_datastore::{DataBlob, DataStore, SnapshotReader};
 
 use crate::tape::CatalogSet;
+use crate::tools::parallel_handler::ParallelHandler;
 
-/// Chunk iterator which use a separate thread to read chunks
+/// Chunk iterator which uses separate threads to read chunks
 ///
 /// The iterator skips duplicate chunks and chunks already in the
 /// catalog.
@@ -25,7 +26,8 @@ impl NewChunksIterator {
         snapshot_reader: Arc<Mutex<SnapshotReader>>,
         catalog_set: Arc<Mutex<CatalogSet>>,
     ) -> Result<(std::thread::JoinHandle<()>, Self), Error> {
-        let (tx, rx) = std::sync::mpsc::sync_channel(3);
+        let read_threads = datastore.get_read_threads();
+        let (tx, rx) = std::sync::mpsc::sync_channel(read_threads * 2);
 
         let reader_thread = std::thread::spawn(move || {
             let snapshot_reader = snapshot_reader.lock().unwrap();
@@ -35,36 +37,44 @@ impl NewChunksIterator {
             let datastore_name = snapshot_reader.datastore_name().to_string();
 
             let result: Result<(), Error> = proxmox_lang::try_block!({
-                let mut chunk_iter = snapshot_reader.chunk_iterator(move |digest| {
+                let chunk_iter = snapshot_reader.chunk_iterator(move |digest| {
                     catalog_set
                         .lock()
                         .unwrap()
                         .contains_chunk(&datastore_name, digest)
                 })?;
 
-                loop {
-                    let digest = match chunk_iter.next() {
-                        None => {
-                            let _ = tx.send(Ok(None)); // ignore send error
-                            break;
+                let reader_pool =
+                    ParallelHandler::new("tape backup chunk reader pool", read_threads, {
+                        let tx = tx.clone();
+                        move |digest| {
+                            let blob = datastore.load_chunk(&digest)?;
+                            //println!("LOAD CHUNK {}", hex::encode(&digest));
+
+                            tx.send(Ok(Some((digest, blob)))).map_err(|err| {
+                                format_err!("error sending result from reader thread: {err}")
+                            })?;
+
+                            Ok(())
                         }
-                        Some(digest) => digest?,
-                    };
+                    });
+
+                for digest in chunk_iter {
+                    let digest = digest?;
 
                     if chunk_index.contains(&digest) {
                         continue;
                     }
 
-                    let blob = datastore.load_chunk(&digest)?;
-                    //println!("LOAD CHUNK {}", hex::encode(&digest));
-                    if let Err(err) = tx.send(Ok(Some((digest, blob)))) {
-                        eprintln!("could not send chunk to reader thread: {err}");
-                        break;
-                    }
+                    reader_pool.send(digest)?;
 
                     chunk_index.insert(digest);
                 }
 
+                reader_pool.complete()?;
+
+                let _ = tx.send(Ok(None)); // ignore send error
+
                 Ok(())
             });
             if let Err(err) = result {
-- 
2.39.2




Thread overview: 5+ messages
2024-04-30  9:39 [pbs-devel] [PATCH proxmox-backup 1/2] datastore: add tuning options for the number of reading threads Dominik Csapak
2024-04-30  9:39 ` Dominik Csapak [this message]
2024-04-30 11:00   ` [pbs-devel] [PATCH proxmox-backup 2/2] tape: use datastores 'read-thread' for tape backup Dominik Csapak
2024-05-02 12:22 ` [pbs-devel] [PATCH proxmox-backup 1/2] datastore: add tuning options for the number of reading threads Thomas Lamprecht
2024-05-02 13:14   ` Dominik Csapak
