public inbox for pbs-devel@lists.proxmox.com
From: "Max Carrara" <m.carrara@proxmox.com>
To: "Dominik Csapak" <d.csapak@proxmox.com>,
	"Proxmox Backup Server development discussion"
	<pbs-devel@lists.proxmox.com>
Subject: Re: [pbs-devel] [PATCH proxmox-backup v2 1/3] tape: introduce a tape backup read thread tuning option
Date: Wed, 08 May 2024 15:37:16 +0200
Message-ID: <D14AY5X8H2A6.13Z06BU3IRB9G@proxmox.com>
In-Reply-To: <2d13fe7c-25be-4eff-9cc6-85b65e829c36@proxmox.com>

On Wed May 8, 2024 at 8:56 AM CEST, Dominik Csapak wrote:
> On 5/7/24 17:10, Max Carrara wrote:
> > On Tue May 7, 2024 at 9:29 AM CEST, Dominik Csapak wrote:
> >> using a single thread for reading is not optimal in some cases, e.g.
> >> when the underlying storage can handle more reads in parallel than with
> >> a single thread.
> >>
> > 
> > Got only a few tiny notes that are otherwise unremarkable:
> > 
> > Would prefer to reword this as:
> > 
> > Using a single thread for reading is not optimal in some cases, e.g.
> > when the underlying storage can handle reads from multiple threads in
> > parallel.
> > 
> >> We use the ParallelHandler to handle the actual reads. Make the
> >> sync_channel buffer size depending on the number of threads so we have
> >> space for two chunks per thread.
> > 
> > Would s/depending/depend/ for the above paragraph.
> > 
>
> ok to both comments, will update in a v3 (if necessary),
> or, if preferred, whoever commits this could fix up the commit message when applying
>
> >>
> >> How this impacts the backup speed largely depends on the underlying
> >> storage and how the backup is laid out on it.
> >>
> >> I benchmarked the following setups:
> >>
> >> * Setup A: relatively spread out backup on a virtualized pbs on single HDDs
> >> * Setup B: mostly sequential chunks on a virtualized pbs on single HDDs
> >> * Setup C: backup on virtualized pbs on a fast NVME
> >> * Setup D: backup on bare metal pbs with ZFS in a RAID10 with 6 HDDs
> >>    and 2 fast special devices in a mirror
> >>
> >> (values are reported in MB/s as seen in the task log, caches were
> >> cleared between runs, backups were bigger than the memory available)
> >>
> >> setup  1 thread  2 threads  4 threads  8 threads
> >> A      55        70         80         95
> >> B      110       89         100        108
> >> C      294       294        294        294
> >> D      118       180        300        300
> >>
> >> So there are cases where multiple read threads speed up the tape backup
> >> (dramatically). On the other hand there are situations where reading
> >> from a single thread is actually faster, probably because we can read
> >> from the HDD sequentially.
> >>
> >> I used a new default value of '4' here since that gave good performance
> >> on all setups (not necessarily the best) and we also have a default
> >> value of '4' for verification threads.
> >>
> >> We use a struct here for the datastore since we want to introduce other
> >> thread tuning options too.
> >>
> >> Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
> > 
> > The patch is pretty straightforward; there's one question inline, but
> > otherwise this looks fine to me. Solid work!
> > 
> > Wasn't able to test this yet as setting up a VTL on Bookworm failed for
> > me, unfortunately. Will try to test this tomorrow if possible.
>
> it's probably obvious, but just to be sure:
>
> make sure the target storage for the vtl is faster than your
> source storage, otherwise you'll always be limited by that...

Thanks for the tip! Will consider it. Am in the process of spinning up
another VM.

>
> > 
> > For now:
> > 
> > Reviewed-by: Max Carrara <m.carrara@proxmox.com>
> > 
> [snip]
> >> diff --git a/src/tape/pool_writer/new_chunks_iterator.rs b/src/tape/pool_writer/new_chunks_iterator.rs
> >> index 1454b33d2..a2a8091f6 100644
> >> --- a/src/tape/pool_writer/new_chunks_iterator.rs
> >> +++ b/src/tape/pool_writer/new_chunks_iterator.rs
> >> @@ -6,8 +6,9 @@ use anyhow::{format_err, Error};
> >>   use pbs_datastore::{DataBlob, DataStore, SnapshotReader};
> >>   
> >>   use crate::tape::CatalogSet;
> >> +use crate::tools::parallel_handler::ParallelHandler;
> >>   
> >> -/// Chunk iterator which use a separate thread to read chunks
> >> +/// Chunk iterator which uses separate threads to read chunks
> >>   ///
> >>   /// The iterator skips duplicate chunks and chunks already in the
> >>   /// catalog.
> >> @@ -25,7 +26,11 @@ impl NewChunksIterator {
> >>           snapshot_reader: Arc<Mutex<SnapshotReader>>,
> >>           catalog_set: Arc<Mutex<CatalogSet>>,
> >>       ) -> Result<(std::thread::JoinHandle<()>, Self), Error> {
> >> -        let (tx, rx) = std::sync::mpsc::sync_channel(3);
> >> +        let read_threads = datastore
> >> +            .get_thread_configuration()
> >> +            .tape_backup_read_threads;
> >> +
> >> +        let (tx, rx) = std::sync::mpsc::sync_channel(read_threads * 2);
> > 
> > Is there any reason you're using `* 2` here? For example, is the
> > throughput unaffected if you use a larger value, like `* 8`?
> > 
> > If the constant has an effect like that it should IMO be documented, but
> > if not, then it can just stay like it is.
>
> I did not benchmark that much here (it would have increased my benchmark time, since
> that's another dimension in the testing with many possible values)
>
> but my thought process was this:
>
> if the source storage is much slower than what we can write to tape, that
> buffer will be empty most of the time
>
> if the source storage is much faster than what we can write to tape, it
> will be full most of the time
>
> in both cases the buffer size won't matter much (as long as there is a bit of
> buffer)
>
> the sweet spot is when the storage is about as fast as the tape writes,
> and in that case you want just a bit more buffer
> than threads reading from the storage, so there is always a read
> going on (so the buffer never runs empty, since that would slow
> down the tape write)
>
> so 1* the number of threads seemed too low, but I think anything bigger
> than 2* is just a waste of memory
>
> does that make sense?
>
> if we really want, I can run additional benchmarks, but my guess
> is that it won't make much difference in practice

Okay, that makes a lot of sense and clears everything up for me.
Thanks!

I don't think this needs to be changed at all, then. LGTM
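For anyone following along, here is a minimal, self-contained sketch of the
pattern being discussed, using only std. The type names and the fake
read_chunk() are illustrative stand-ins, not the actual PBS types; the real
patch goes through the ParallelHandler and the datastore. The point is the
bounded channel: with 4 reader threads it holds at most 8 chunks, which,
assuming roughly 4 MiB per chunk, comes to about 32 MiB of buffer memory,
so anything beyond two per thread mostly just costs memory, as argued above.

use std::sync::mpsc::sync_channel;
use std::sync::{Arc, Mutex};
use std::thread;

// Illustrative stand-ins for the real PBS types (digests, DataBlob, ...).
type Digest = [u8; 32];
type Chunk = Vec<u8>;

// Placeholder for the actual (slow) datastore read.
fn read_chunk(_digest: &Digest) -> Chunk {
    vec![0u8; 4 * 1024 * 1024]
}

fn main() {
    let read_threads = 4;

    // Two chunks of headroom per reader thread, mirroring the patch:
    // enough that the consumer rarely starves while reads are in
    // flight, without buffering more memory than necessary.
    let (tx, rx) = sync_channel::<(Digest, Chunk)>(read_threads * 2);

    // Shared work list of chunk digests to load.
    let digests: Vec<Digest> = (0..32u8).map(|i| [i; 32]).collect();
    let work = Arc::new(Mutex::new(digests.into_iter()));

    // Spawn the reader threads; each pulls digests from the shared
    // iterator and pushes the loaded chunks into the bounded channel.
    let mut handles = Vec::new();
    for _ in 0..read_threads {
        let tx = tx.clone();
        let work = Arc::clone(&work);
        handles.push(thread::spawn(move || loop {
            let digest = match work.lock().unwrap().next() {
                Some(digest) => digest,
                None => break,
            };
            let chunk = read_chunk(&digest);
            if tx.send((digest, chunk)).is_err() {
                break; // receiver hung up, stop reading
            }
        }));
    }
    drop(tx); // close the channel once all readers are done

    // The "tape writer" side: consume chunks as they become available.
    for (digest, chunk) in rx {
        println!("chunk {:02x?}.. ({} bytes)", &digest[..2], chunk.len());
    }

    for handle in handles {
        handle.join().unwrap();
    }
}

If the source storage is much slower than the drive the channel sits empty,
and if it is much faster it sits full; the capacity only really matters in
the in-between case, which matches the reasoning above.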



_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel


Thread overview: 8+ messages
2024-05-07  7:29 [pbs-devel] [PATCH proxmox-backup v2 0/3] tape/verify: improve multithreaded Dominik Csapak
2024-05-07  7:29 ` [pbs-devel] [PATCH proxmox-backup v2 1/3] tape: introduce a tape backup read thread tuning option Dominik Csapak
2024-05-07 15:10   ` Max Carrara
2024-05-08  6:56     ` [pbs-devel] [PATCH proxmox-backup v2 1/3] tape: introduce a tape backup read thread tuning option Dominik Csapak
2024-05-08 13:37       ` Max Carrara [this message]
2024-05-08 17:47   ` [pbs-devel] [PATCH proxmox-backup v2 1/3] tape: introduce a tape backup read thread tuning option Max Carrara
2024-05-07  7:29 ` [pbs-devel] [RFC PATCH proxmox-backup v2 2/3] verify: add tuning option for number of threads to use Dominik Csapak
2024-05-07  7:29 ` [pbs-devel] [RFC PATCH proxmox-backup v2 3/3] verify: move chunk loading into the worker threads Dominik Csapak
