public inbox for pbs-devel@lists.proxmox.com
From: Dominik Csapak <d.csapak@proxmox.com>
To: Proxmox Backup Server development discussion
	<pbs-devel@lists.proxmox.com>,
	Max Carrara <m.carrara@proxmox.com>
Subject: Re: [pbs-devel] [PATCH proxmox-backup v2 1/3] tape: introduce a tape backup read thread tuning option
Date: Wed, 8 May 2024 08:56:32 +0200	[thread overview]
Message-ID: <2d13fe7c-25be-4eff-9cc6-85b65e829c36@proxmox.com> (raw)
In-Reply-To: <D13IAVJ7MM88.ZZT9M6HM142C@proxmox.com>

On 5/7/24 17:10, Max Carrara wrote:
> On Tue May 7, 2024 at 9:29 AM CEST, Dominik Csapak wrote:
>> using a single thread for reading is not optimal in some cases, e.g.
>> when the underlying storage can handle more reads in parallel than with
>> a single thread.
>>
> 
> Got only a few tiny notes that are otherwise unremarkable:
> 
> Would prefer to reword this as:
> 
> Using a single thread for reading is not optimal in some cases, e.g.
> when the underlying storage can handle reads from multiple threads in
> parallel.
> 
>> We use the ParallelHandler to handle the actual reads. Make the
>> sync_channel buffer size depending on the number of threads so we have
>> space for two chunks per thread.
> 
> Would s/depending/depend/ for the above paragraph.
> 

OK to both comments, I will update in a v3 (if necessary),
or, if preferred, whoever commits this could fix up the commit message
when applying.

>>
>> How this impacts the backup speed largely depends on the underlying
>> storage and how the backup is laid out on it.
>>
>> I benchmarked the following setups:
>>
>> * Setup A: relatively spread out backup on a virtualized pbs on single HDDs
>> * Setup B: mostly sequential chunks on a virtualized pbs on single HDDs
>> * Setup C: backup on virtualized pbs on a fast NVME
>> * Setup D: backup on bare metal pbs with ZFS in a RAID10 with 6 HDDs
>>    and 2 fast special devices in a mirror
>>
>> (values are reported in MB/s as seen in the task log, caches were
>> cleared between runs, backups were bigger than the memory available)
>>
>> setup  1 thread  2 threads  4 threads  8 threads
>> A      55        70         80         95
>> B      110       89         100        108
>> C      294       294        294        294
>> D      118       180        300        300
>>
>> So there are cases where multiple read threads speed up the tape backup
>> (dramatically). On the other hand there are situations where reading
>> from a single thread is actually faster, probably because we can read
>> from the HDD sequentially.
>>
>> I used a new default value of '4' here since that gave good performance
>> on all setups (not necessarily the best) and we also have a default
>> value of '4' for verification threads.
>>
>> We use a struct here for the datastore since we want to introduce other
>> thread tuning options too.
>>
>> Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
> 
> The patch is pretty straightforward; there's one question inline, but
> otherwise this looks fine to me. Solid work!
> 
> Wasn't able to test this yet as setting up a VTL on Bookworm failed for
> me, unfortunately. Will try to test this tomorrow if possible.

It's probably obvious, but just to be sure:

make sure the target storage for the VTL is faster than your
source storage, otherwise you'll always be limited by that...

> 
> For now:
> 
> Reviewed-by: Max Carrara <m.carrara@proxmox.com>
> 
[snip]
>> diff --git a/src/tape/pool_writer/new_chunks_iterator.rs b/src/tape/pool_writer/new_chunks_iterator.rs
>> index 1454b33d2..a2a8091f6 100644
>> --- a/src/tape/pool_writer/new_chunks_iterator.rs
>> +++ b/src/tape/pool_writer/new_chunks_iterator.rs
>> @@ -6,8 +6,9 @@ use anyhow::{format_err, Error};
>>   use pbs_datastore::{DataBlob, DataStore, SnapshotReader};
>>   
>>   use crate::tape::CatalogSet;
>> +use crate::tools::parallel_handler::ParallelHandler;
>>   
>> -/// Chunk iterator which use a separate thread to read chunks
>> +/// Chunk iterator which uses separate threads to read chunks
>>   ///
>>   /// The iterator skips duplicate chunks and chunks already in the
>>   /// catalog.
>> @@ -25,7 +26,11 @@ impl NewChunksIterator {
>>           snapshot_reader: Arc<Mutex<SnapshotReader>>,
>>           catalog_set: Arc<Mutex<CatalogSet>>,
>>       ) -> Result<(std::thread::JoinHandle<()>, Self), Error> {
>> -        let (tx, rx) = std::sync::mpsc::sync_channel(3);
>> +        let read_threads = datastore
>> +            .get_thread_configuration()
>> +            .tape_backup_read_threads;
>> +
>> +        let (tx, rx) = std::sync::mpsc::sync_channel(read_threads * 2);
> 
> Is there any reason you're using `* 2` here? For example, is the
> throughput unaffected if you use a larger value, like `* 8`?
> 
> If the constant has an effect like that it should IMO be documented, but
> if not, then it can just stay like it is.

I did not benchmark that much here (it would have added another
dimension to the testing, with many possible values), but my thought
process was this:

If the source storage is much slower than what we can write to tape,
that buffer will be empty most of the time.

If the source storage is much faster than what we can write to tape, it
will be full most of the time.

In both cases the buffer size won't matter much (as long as there is a
bit of buffer).

The sweet spot is when the storage is about as fast as the tape writes.
In that case you want a bit more buffer than there are threads reading
from the storage, so that there is always a read going on and the
buffer never runs empty (an empty buffer slows down the tape write).

So 1x the number of threads seemed too low, but I think anything bigger
than 2x is just a waste of memory.

Does that make sense?

If we really want, I can run additional benchmarks, but my guess is
that it won't make much difference in practice.
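
To give a feeling for the setup I have in mind, here is a rough
standalone sketch (plain std only, not the actual PBS code; the chunk
sizes, sleep times and the reader/tape-writer stand-ins are made up
purely for illustration):

use std::sync::mpsc::sync_channel;
use std::thread;
use std::time::Duration;

fn main() {
    let read_threads: usize = 4;
    // two chunks of buffer per reader thread: enough that the writer
    // rarely finds the channel empty, without holding many full
    // chunks in memory
    let (tx, rx) = sync_channel::<Vec<u8>>(read_threads * 2);

    let mut readers = Vec::new();
    for id in 0..read_threads {
        let tx = tx.clone();
        readers.push(thread::spawn(move || {
            for _ in 0..16 {
                // stand-in for reading a chunk from the datastore
                thread::sleep(Duration::from_millis(5));
                let chunk = vec![id as u8; 4096];
                // blocks as soon as read_threads * 2 chunks are queued
                if tx.send(chunk).is_err() {
                    break;
                }
            }
        }));
    }
    drop(tx); // the readers hold the remaining senders

    // stand-in for the tape writer draining the buffer
    let mut written = 0;
    for chunk in rx {
        thread::sleep(Duration::from_millis(2));
        written += chunk.len();
    }
    println!("wrote {written} bytes");

    for r in readers {
        r.join().unwrap();
    }
}

With a capacity of read_threads * 2, send() in the readers only starts
to block once the writer falls behind, which is roughly the behaviour
I was aiming for with the patch.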


_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel


