From: Dominik Csapak <d.csapak@proxmox.com>
To: Thomas Lamprecht <t.lamprecht@proxmox.com>,
	Proxmox Backup Server development discussion
	<pbs-devel@lists.proxmox.com>
Subject: Re: [pbs-devel] [PATCH proxmox-backup v3 1/1] tape: introduce a tape backup job worker thread option
Date: Fri, 21 Mar 2025 09:31:40 +0100
Message-ID: <b4cdb6f9-7276-45e3-b386-758164831eb0@proxmox.com>
In-Reply-To: <e5a2dcac-630e-4797-bbbf-f38bc260c2ca@proxmox.com>

On 3/20/25 17:30, Thomas Lamprecht wrote:
> On 21.02.25 at 16:06, Dominik Csapak wrote:
>> Using a single thread for reading is not optimal in some cases, e.g.
>> when the underlying storage can handle reads from multiple threads in
>> parallel.
>>
>> We use the ParallelHandler to handle the actual reads. Make the
>> sync_channel buffer size depend on the number of threads so we have
>> space for two chunks per thread. (But keep the minimum to 3 like
>> before).
>>
>> How this impacts the backup speed largely depends on the underlying
>> storage and how the backup is laid out on it.
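
As an illustration only (not part of the patch), here is a minimal,
self-contained sketch of the reader-threads-plus-bounded-channel shape
described above; the names and the dummy chunk data are placeholders,
not the actual NewChunksIterator/ParallelHandler code:

use std::sync::mpsc::sync_channel;
use std::thread;

fn main() {
    let read_threads: usize = 4;
    // two buffered chunks per reader thread, but never fewer than 3 slots
    let (tx, rx) = sync_channel::<Vec<u8>>((read_threads * 2).max(3));

    let mut handles = Vec::new();
    for t in 0..read_threads {
        let tx = tx.clone();
        handles.push(thread::spawn(move || {
            for _ in 0..10 {
                // stand-in for reading one chunk from the datastore
                let chunk = vec![t as u8; 64];
                // send() blocks once the bounded channel is full,
                // which keeps memory usage in check
                tx.send(chunk).unwrap();
            }
        }));
    }
    drop(tx); // drop the original sender so rx ends once all readers finish

    // consumer side: drains chunks in whatever order they become available
    for chunk in rx {
        let _ = chunk.len();
    }

    for handle in handles {
        handle.join().unwrap();
    }
}
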
> 
> And the number of tasks going on at the same time, which can wildly
> influence the result and is not fully under the admin's control, as the
> duration of tasks is not exactly fixed.
> 
> FWIW, I think this is an OK stop-gap as it's really simple, and if the
> admin is somewhat careful with configuring this and the task schedules,
> it might already help them quite a bit, as your benchmark shows. And
> again, it _really_ is simple, and the whole tape subsystem is a bit more
> specific and contained.
> 
> That said, in the long term this would probably be better replaced with
> a global scheduling approach that respects this and other tasks' workloads
> and resource usage. That is certainly not an easy thing to do, as there
> are many aspects one needs to think through, and schedulers are not really
> a solved problem in academic research either, especially not general ones.
> I'm mostly mentioning this to avoid proliferation of such a mechanism to
> other tasks, as that would result in a configuration hell for admins, where
> they can hardly tune for their workloads sanely anymore.
> 

I fully agree with you here. I sent it this way again because we have users
running into this now who can't (or don't want to) wait for the proper fix
with a scheduler.

> Code looks OK to me, one tiny nit about a comment inline though.
> 
>> diff --git a/src/tape/pool_writer/new_chunks_iterator.rs b/src/tape/pool_writer/new_chunks_iterator.rs
>> index 1454b33d2..de847b3c9 100644
>> --- a/src/tape/pool_writer/new_chunks_iterator.rs
>> +++ b/src/tape/pool_writer/new_chunks_iterator.rs
>> @@ -6,8 +6,9 @@ use anyhow::{format_err, Error};
>>   use pbs_datastore::{DataBlob, DataStore, SnapshotReader};
>>   
>>   use crate::tape::CatalogSet;
>> +use crate::tools::parallel_handler::ParallelHandler;
>>   
>> -/// Chunk iterator which use a separate thread to read chunks
>> +/// Chunk iterator which uses separate threads to read chunks
>>   ///
>>   /// The iterator skips duplicate chunks and chunks already in the
>>   /// catalog.
>> @@ -24,8 +25,11 @@ impl NewChunksIterator {
>>           datastore: Arc<DataStore>,
>>           snapshot_reader: Arc<Mutex<SnapshotReader>>,
>>           catalog_set: Arc<Mutex<CatalogSet>>,
>> +        read_threads: usize,
>>       ) -> Result<(std::thread::JoinHandle<()>, Self), Error> {
>> -        let (tx, rx) = std::sync::mpsc::sync_channel(3);
>> +        // use twice the threadcount for the channel, so the read thread can already send another
>> +        // one when the previous one was not consumed yet, but keep the minimum at 3
> 
> this reads a bit confusingly, as if the channel could go up to 2 x the
> thread count. While that does not make much sense to anyone well acquainted
> with the matter at hand, it'd IMO still be nicer to clarify it for
> others stumbling into this, maybe something like:
> 
> 
> // set the buffer size of the channel queues to twice the number of threads or 3, whichever
> // is greater, to reduce the chance of a reader thread (producer) being blocked.
> 
> Can be fixed up on applying though, if you agree (or propose something better).

Sounds fine to me.

> 
>> +        let (tx, rx) = std::sync::mpsc::sync_channel((read_threads * 2).max(3));
> 
> 
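
For completeness, this is how the hunk might read with the suggested comment
applied (assuming it is indeed fixed up on applying):

    // set the buffer size of the channel queues to twice the number of threads or 3, whichever
    // is greater, to reduce the chance of a reader thread (producer) being blocked.
    let (tx, rx) = std::sync::mpsc::sync_channel((read_threads * 2).max(3));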



