public inbox for pbs-devel@lists.proxmox.com
From: Thomas Lamprecht <t.lamprecht@proxmox.com>
To: Proxmox Backup Server development discussion
	<pbs-devel@lists.proxmox.com>,
	Dominik Csapak <d.csapak@proxmox.com>
Subject: Re: [pbs-devel] [PATCH proxmox-backup v3 1/1] tape: introduce a tape backup job worker thread option
Date: Thu, 20 Mar 2025 17:30:35 +0100
Message-ID: <e5a2dcac-630e-4797-bbbf-f38bc260c2ca@proxmox.com>
In-Reply-To: <20250221150631.3791658-3-d.csapak@proxmox.com>

On 21.02.25 at 16:06, Dominik Csapak wrote:
> Using a single thread for reading is not optimal in some cases, e.g.
> when the underlying storage can handle reads from multiple threads in
> parallel.
> 
> We use the ParallelHandler to handle the actual reads. Make the
> sync_channel buffer size depend on the number of threads so we have
> space for two chunks per thread. (But keep the minimum at 3 like
> before).
> 
> How this impacts the backup speed largely depends on the underlying
> storage and how the backup is laid out on it.

And on the number of tasks running at the same time, which can wildly
influence the result and is not fully under the admin's control, as the
duration of tasks is not exactly fixed.
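
For anyone reading along later, the shape of the change is roughly:
several reader threads drain a shared list of chunk digests and feed a
single bounded channel that the tape writer then consumes. A minimal
std-only sketch of that pattern (not the actual ParallelHandler API;
all names and sizes here are made up for illustration):

use std::collections::VecDeque;
use std::sync::{mpsc, Arc, Mutex};
use std::thread;

fn main() {
    let read_threads = 4;
    // stand-in for the list of chunk digests that need to be read
    let work: Arc<Mutex<VecDeque<u32>>> = Arc::new(Mutex::new((0..32).collect()));

    // two chunks of buffer per reader thread, but at least 3, as in the patch
    let (tx, rx) = mpsc::sync_channel::<(u32, Vec<u8>)>((read_threads * 2).max(3));

    let mut handles = Vec::new();
    for _ in 0..read_threads {
        let (work, tx) = (Arc::clone(&work), tx.clone());
        handles.push(thread::spawn(move || loop {
            // take the next digest; the lock is released at the end of this
            // statement, before the (potentially slow) read happens
            let digest = match work.lock().unwrap().pop_front() {
                Some(digest) => digest,
                None => break, // no work left
            };
            let data = vec![0u8; 4096]; // stand-in for the actual chunk read
            if tx.send((digest, data)).is_err() {
                break; // consumer hung up, stop reading
            }
        }));
    }
    drop(tx); // drop the original sender so `rx` terminates once all readers finish

    for (digest, data) in rx {
        println!("chunk {digest}: {} bytes", data.len());
    }
    for handle in handles {
        handle.join().unwrap();
    }
}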

FWIW, I think this is an OK stop-gap, as it's really simple. If the
admin is somewhat careful with configuring this and the task schedules,
it might already help them quite a bit, as your benchmark shows. And
again, it _really_ is simple, and the whole tape subsystem is a bit
more specific and contained.

That said, in the long term this would probably be better replaced with
a global scheduling approach that respects this and other tasks'
workloads and resource usage. That is certainly not an easy thing to
do, as there are many aspects one needs to think through, and
schedulers are not really a solved problem in academic research either,
especially not general ones. I'm mostly mentioning this to avoid
proliferation of such a mechanism to other task types, as that would
result in configuration hell for admins, where they can hardly tune for
their workloads sanely anymore.

Code looks OK to me; one tiny nit about a comment inline, though.

> diff --git a/src/tape/pool_writer/new_chunks_iterator.rs b/src/tape/pool_writer/new_chunks_iterator.rs
> index 1454b33d2..de847b3c9 100644
> --- a/src/tape/pool_writer/new_chunks_iterator.rs
> +++ b/src/tape/pool_writer/new_chunks_iterator.rs
> @@ -6,8 +6,9 @@ use anyhow::{format_err, Error};
>  use pbs_datastore::{DataBlob, DataStore, SnapshotReader};
>  
>  use crate::tape::CatalogSet;
> +use crate::tools::parallel_handler::ParallelHandler;
>  
> -/// Chunk iterator which use a separate thread to read chunks
> +/// Chunk iterator which uses separate threads to read chunks
>  ///
>  /// The iterator skips duplicate chunks and chunks already in the
>  /// catalog.
> @@ -24,8 +25,11 @@ impl NewChunksIterator {
>          datastore: Arc<DataStore>,
>          snapshot_reader: Arc<Mutex<SnapshotReader>>,
>          catalog_set: Arc<Mutex<CatalogSet>>,
> +        read_threads: usize,
>      ) -> Result<(std::thread::JoinHandle<()>, Self), Error> {
> -        let (tx, rx) = std::sync::mpsc::sync_channel(3);
> +        // use twice the threadcount for the channel, so the read thread can already send another
> +        // one when the previous one was not consumed yet, but keep the minimum at 3

this reads a bit confusingly, as if the channel could grow up to 2 x
the thread count. While that reading does not make much sense to anyone
well acquainted with the matter at hand, it'd IMO still be nicer to
clarify it for others stumbling into this, maybe something like:


// set the buffer size of the channel queues to twice the number of threads or 3, whichever
// is greater, to reduce the chance of a reader thread (producer) being blocked.

Can be fixed up on applying though, if you agree (or propose something better).

> +        let (tx, rx) = std::sync::mpsc::sync_channel((read_threads * 2).max(3));
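
And one note for readers who have not used sync_channel before: send()
blocks as soon as the buffer holds `cap` unconsumed items, which is why
tying the capacity to the reader count matters here. A tiny standalone
illustration (plain std, nothing PBS-specific; numbers made up):

use std::sync::mpsc;
use std::thread;
use std::time::Duration;

fn main() {
    let (tx, rx) = mpsc::sync_channel::<u32>(2);
    let producer = thread::spawn(move || {
        for i in 0..5 {
            tx.send(i).unwrap(); // blocks at i == 2 until the consumer drains one
            println!("sent {i}");
        }
    });
    thread::sleep(Duration::from_millis(100)); // simulate a slow consumer
    for v in rx {
        println!("received {v}");
    }
    producer.join().unwrap();
}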




Thread overview: 9+ messages
2025-02-21 15:06 [pbs-devel] [PATCH proxmox/proxmox-backup v3] tape: implement multithreaded read Dominik Csapak
2025-02-21 15:06 ` [pbs-devel] [PATCH proxmox v3 1/1] pbs api types: tape backup job: add worker threads option Dominik Csapak
2025-02-21 15:06 ` [pbs-devel] [PATCH proxmox-backup v3 1/1] tape: introduce a tape backup job worker thread option Dominik Csapak
2025-03-20 16:30   ` Thomas Lamprecht [this message]
2025-03-21  8:31     ` Dominik Csapak
2025-03-21 15:14     ` Laurențiu Leahu-Vlăducu
2025-03-19 14:12 ` [pbs-devel] [PATCH proxmox/proxmox-backup v3] tape: implement multithreaded read Dominik Csapak
2025-03-20  6:39   ` Dietmar Maurer
2025-03-20  8:03     ` Dietmar Maurer
