From: Dominik Csapak <d.csapak@proxmox.com>
To: pbs-devel@lists.proxmox.com
Subject: [pbs-devel] [PATCH proxmox-backup 2/2] tape: use datastores 'read-thread' for tape backup
Date: Tue, 30 Apr 2024 11:39:39 +0200 [thread overview]
Message-ID: <20240430093939.1318786-2-d.csapak@proxmox.com> (raw)
In-Reply-To: <20240430093939.1318786-1-d.csapak@proxmox.com>
using a single thread for reading is not optimal in some cases, e.g.
when the underlying storage can handle more reads in parallel than
with a single thread.
This depends largely on the storage and cpu.
We use the ParallelHandler to handle the actual reads.
Make the sync_channel buffer size depending on the number of threads
so we have space for two chunks per thread.
Did some benchmarks on my (virtual) pbs with a real tape drive (lto8
tape in an lto9 drive):
For my NVME datastore it did not matter much how many threads were used
so i guess the bottleneck was either in the hba/drive or cable rather
than the disks/cpu. (Always got around ~300MB/s from the task log)
For a datastore on a single HDD, the results are much more interesting:
1 Thread: ~55MB/s
2 Threads: ~70MB/s
4 Threads: ~80MB/s
8 Threads: ~95MB/s
So the fact that multiple IO request are done in parallel does speed up
the tape backup in general.
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
---
altough i did benchmark, i would be very grateful if other people could
test this (and the previous) change in their varying disk setups, so we
can verify that it really makes a difference and is worth it to have it
configurable
pbs-api-types/src/datastore.rs | 2 +-
src/tape/pool_writer/new_chunks_iterator.rs | 42 +++++++++++++--------
2 files changed, 27 insertions(+), 17 deletions(-)
diff --git a/pbs-api-types/src/datastore.rs b/pbs-api-types/src/datastore.rs
index 2ad2ae063..243c4759f 100644
--- a/pbs-api-types/src/datastore.rs
+++ b/pbs-api-types/src/datastore.rs
@@ -210,7 +210,7 @@ pub enum DatastoreFSyncLevel {
optional: true,
},
"read-threads": {
- description: "Controls how many threads are used for reading from the datastore for verification.",
+ description: "Controls how many threads are used for reading from the datastore for verify and tape backup.",
type: usize,
optional: true,
minimum: 1,
diff --git a/src/tape/pool_writer/new_chunks_iterator.rs b/src/tape/pool_writer/new_chunks_iterator.rs
index 1454b33d2..63b10c9f8 100644
--- a/src/tape/pool_writer/new_chunks_iterator.rs
+++ b/src/tape/pool_writer/new_chunks_iterator.rs
@@ -6,8 +6,9 @@ use anyhow::{format_err, Error};
use pbs_datastore::{DataBlob, DataStore, SnapshotReader};
use crate::tape::CatalogSet;
+use crate::tools::parallel_handler::ParallelHandler;
-/// Chunk iterator which use a separate thread to read chunks
+/// Chunk iterator which use separate threads to read chunks
///
/// The iterator skips duplicate chunks and chunks already in the
/// catalog.
@@ -25,7 +26,8 @@ impl NewChunksIterator {
snapshot_reader: Arc<Mutex<SnapshotReader>>,
catalog_set: Arc<Mutex<CatalogSet>>,
) -> Result<(std::thread::JoinHandle<()>, Self), Error> {
- let (tx, rx) = std::sync::mpsc::sync_channel(3);
+ let read_threads = datastore.get_read_threads();
+ let (tx, rx) = std::sync::mpsc::sync_channel(read_threads * 2);
let reader_thread = std::thread::spawn(move || {
let snapshot_reader = snapshot_reader.lock().unwrap();
@@ -35,36 +37,44 @@ impl NewChunksIterator {
let datastore_name = snapshot_reader.datastore_name().to_string();
let result: Result<(), Error> = proxmox_lang::try_block!({
- let mut chunk_iter = snapshot_reader.chunk_iterator(move |digest| {
+ let chunk_iter = snapshot_reader.chunk_iterator(move |digest| {
catalog_set
.lock()
.unwrap()
.contains_chunk(&datastore_name, digest)
})?;
- loop {
- let digest = match chunk_iter.next() {
- None => {
- let _ = tx.send(Ok(None)); // ignore send error
- break;
+ let reader_pool =
+ ParallelHandler::new("tape backup chunk reader pool", read_threads, {
+ let tx = tx.clone();
+ move |digest| {
+ let blob = datastore.load_chunk(&digest)?;
+ //println!("LOAD CHUNK {}", hex::encode(&digest));
+
+ tx.send(Ok(Some((digest, blob)))).map_err(|err| {
+ format_err!("error sending result from reader thread: {err}")
+ })?;
+
+ Ok(())
}
- Some(digest) => digest?,
- };
+ });
+
+ for digest in chunk_iter {
+ let digest = digest?;
if chunk_index.contains(&digest) {
continue;
}
- let blob = datastore.load_chunk(&digest)?;
- //println!("LOAD CHUNK {}", hex::encode(&digest));
- if let Err(err) = tx.send(Ok(Some((digest, blob)))) {
- eprintln!("could not send chunk to reader thread: {err}");
- break;
- }
+ reader_pool.send(digest)?;
chunk_index.insert(digest);
}
+ reader_pool.complete()?;
+
+ let _ = tx.send(Ok(None)); // ignore send error
+
Ok(())
});
if let Err(err) = result {
--
2.39.2
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
next prev parent reply other threads:[~2024-04-30 9:39 UTC|newest]
Thread overview: 5+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-04-30 9:39 [pbs-devel] [PATCH proxmox-backup 1/2] datastore: add tuning options for the number of reading threads Dominik Csapak
2024-04-30 9:39 ` Dominik Csapak [this message]
2024-04-30 11:00 ` [pbs-devel] [PATCH proxmox-backup 2/2] tape: use datastores 'read-thread' for tape backup Dominik Csapak
2024-05-02 12:22 ` [pbs-devel] [PATCH proxmox-backup 1/2] datastore: add tuning options for the number of reading threads Thomas Lamprecht
2024-05-02 13:14 ` Dominik Csapak
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20240430093939.1318786-2-d.csapak@proxmox.com \
--to=d.csapak@proxmox.com \
--cc=pbs-devel@lists.proxmox.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox