From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <pbs-devel-bounces@lists.proxmox.com>
Received: from firstgate.proxmox.com (firstgate.proxmox.com [IPv6:2a01:7e0:0:424::9])
	by lore.proxmox.com (Postfix) with ESMTPS id D3DE71FF189
	for <inbox@lore.proxmox.com>; Fri, 21 Mar 2025 09:31:45 +0100 (CET)
Received: from firstgate.proxmox.com (localhost [127.0.0.1])
	by firstgate.proxmox.com (Proxmox) with ESMTP id 4C40618EC4;
	Fri, 21 Mar 2025 09:31:44 +0100 (CET)
Message-ID: <b4cdb6f9-7276-45e3-b386-758164831eb0@proxmox.com>
Date: Fri, 21 Mar 2025 09:31:40 +0100
MIME-Version: 1.0
User-Agent: Mozilla Thunderbird Beta
To: Thomas Lamprecht <t.lamprecht@proxmox.com>,
 Proxmox Backup Server development discussion <pbs-devel@lists.proxmox.com>
References: <20250221150631.3791658-1-d.csapak@proxmox.com>
 <20250221150631.3791658-3-d.csapak@proxmox.com>
 <e5a2dcac-630e-4797-bbbf-f38bc260c2ca@proxmox.com>
Content-Language: en-US
From: Dominik Csapak <d.csapak@proxmox.com>
In-Reply-To: <e5a2dcac-630e-4797-bbbf-f38bc260c2ca@proxmox.com>
X-SPAM-LEVEL: Spam detection results:  0
 AWL 0.021 Adjusted score from AWL reputation of From: address
 BAYES_00                 -1.9 Bayes spam probability is 0 to 1%
 DMARC_MISSING             0.1 Missing DMARC policy
 KAM_DMARC_STATUS 0.01 Test Rule for DKIM or SPF Failure with Strict Alignment
 RCVD_IN_VALIDITY_CERTIFIED_BLOCKED 0.001 ADMINISTRATOR NOTICE: The query to
 Validity was blocked. See
 https://knowledge.validity.com/hc/en-us/articles/20961730681243 for more
 information.
 RCVD_IN_VALIDITY_RPBL_BLOCKED 0.001 ADMINISTRATOR NOTICE: The query to
 Validity was blocked. See
 https://knowledge.validity.com/hc/en-us/articles/20961730681243 for more
 information.
 RCVD_IN_VALIDITY_SAFE_BLOCKED 0.001 ADMINISTRATOR NOTICE: The query to
 Validity was blocked. See
 https://knowledge.validity.com/hc/en-us/articles/20961730681243 for more
 information.
 SPF_HELO_NONE           0.001 SPF: HELO does not publish an SPF Record
 SPF_PASS               -0.001 SPF: sender matches SPF record
Subject: Re: [pbs-devel] [PATCH proxmox-backup v3 1/1] tape: introduce a
 tape backup job worker thread option
X-BeenThere: pbs-devel@lists.proxmox.com
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Proxmox Backup Server development discussion
 <pbs-devel.lists.proxmox.com>
List-Unsubscribe: <https://lists.proxmox.com/cgi-bin/mailman/options/pbs-devel>, 
 <mailto:pbs-devel-request@lists.proxmox.com?subject=unsubscribe>
List-Archive: <http://lists.proxmox.com/pipermail/pbs-devel/>
List-Post: <mailto:pbs-devel@lists.proxmox.com>
List-Help: <mailto:pbs-devel-request@lists.proxmox.com?subject=help>
List-Subscribe: <https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel>, 
 <mailto:pbs-devel-request@lists.proxmox.com?subject=subscribe>
Reply-To: Proxmox Backup Server development discussion
 <pbs-devel@lists.proxmox.com>
Content-Transfer-Encoding: 7bit
Content-Type: text/plain; charset="us-ascii"; Format="flowed"
Errors-To: pbs-devel-bounces@lists.proxmox.com
Sender: "pbs-devel" <pbs-devel-bounces@lists.proxmox.com>

On 3/20/25 17:30, Thomas Lamprecht wrote:
> Am 21.02.25 um 16:06 schrieb Dominik Csapak:
>> Using a single thread for reading is not optimal in some cases, e.g.
>> when the underlying storage can handle reads from multiple threads in
>> parallel.
>>
>> We use the ParallelHandler to handle the actual reads. Make the
>> sync_channel buffer size depend on the number of threads so we have
>> space for two chunks per thread. (But keep the minimum at 3, like
>> before.)
>>
>> How this impacts the backup speed largely depends on the underlying
>> storage and how the backup is laid out on it.
> 
> And the number of tasks going on at the same time, which can wildly
> influence the result and is not fully under the admin's control, as the
> duration of tasks is not exactly fixed.
> 
> FWIW, I think this is an OK stop-gap as it's really simple, and if the
> admin is somewhat careful with configuring this and the task schedules,
> it might already help them quite a bit, as your benchmark shows. And
> again, it _really_ is simple, and the whole tape subsystem is a bit more
> specific and contained.
> 
> That said, in the long term this would probably be better replaced with
> a global scheduling approach that respects this and other tasks' workloads
> and resource usage. That is certainly not an easy thing to do, as there
> are many aspects one needs to think through, and schedulers are not really
> a solved problem in academic research either, especially not general ones.
> I'm mostly mentioning this to avoid proliferation of such a mechanism to
> other tasks, as that would result in a configuration hell for admins, where
> they can hardly tune for their workloads sanely anymore.
> 

I fully agree with you here. I sent it this way again because we have users
running into this now who don't want to, or can't, wait for the proper fix
with a scheduler.
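
To make the intended flow a bit more concrete for anyone following along,
here is a rough, self-contained sketch of the producer/consumer setup the
commit message describes, using plain std primitives only. The names
(spawn_readers, read_chunk) and the round-robin split are made up for the
example; the real code goes through ParallelHandler and the SnapshotReader
and yields (digest, blob) pairs instead:

use std::sync::mpsc::{sync_channel, Receiver};
use std::thread;

// Stand-in for the real chunk data; the actual iterator yields
// (digest, DataBlob) pairs read via the SnapshotReader.
type Chunk = Vec<u8>;

// Placeholder for the actual datastore read.
fn read_chunk(_id: u64) -> Chunk {
    vec![0u8; 4 * 1024 * 1024]
}

fn spawn_readers(read_threads: usize, chunk_ids: Vec<u64>) -> Receiver<Chunk> {
    let read_threads = read_threads.max(1);

    // Buffer size: twice the number of reader threads, or 3, whichever is
    // greater, to reduce the chance of a reader (producer) being blocked.
    let (tx, rx) = sync_channel((read_threads * 2).max(3));

    for worker in 0..read_threads {
        let tx = tx.clone();
        // Static round-robin split of the chunk list, just for the sketch.
        let ids: Vec<u64> = chunk_ids
            .iter()
            .copied()
            .skip(worker)
            .step_by(read_threads)
            .collect();
        thread::spawn(move || {
            for id in ids {
                if tx.send(read_chunk(id)).is_err() {
                    break; // consumer dropped the receiver, stop reading
                }
            }
        });
    }
    drop(tx); // channel closes once all worker threads are done
    rx
}

The consumer side (the tape writer) then just iterates the receiver and
writes chunks in whatever order they arrive, much like before with the
single reader thread.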

> Code looks OK to me, one tiny nit about a comment inline though.
> 
>> diff --git a/src/tape/pool_writer/new_chunks_iterator.rs b/src/tape/pool_writer/new_chunks_iterator.rs
>> index 1454b33d2..de847b3c9 100644
>> --- a/src/tape/pool_writer/new_chunks_iterator.rs
>> +++ b/src/tape/pool_writer/new_chunks_iterator.rs
>> @@ -6,8 +6,9 @@ use anyhow::{format_err, Error};
>>   use pbs_datastore::{DataBlob, DataStore, SnapshotReader};
>>   
>>   use crate::tape::CatalogSet;
>> +use crate::tools::parallel_handler::ParallelHandler;
>>   
>> -/// Chunk iterator which use a separate thread to read chunks
>> +/// Chunk iterator which uses separate threads to read chunks
>>   ///
>>   /// The iterator skips duplicate chunks and chunks already in the
>>   /// catalog.
>> @@ -24,8 +25,11 @@ impl NewChunksIterator {
>>           datastore: Arc<DataStore>,
>>           snapshot_reader: Arc<Mutex<SnapshotReader>>,
>>           catalog_set: Arc<Mutex<CatalogSet>>,
>> +        read_threads: usize,
>>       ) -> Result<(std::thread::JoinHandle<()>, Self), Error> {
>> -        let (tx, rx) = std::sync::mpsc::sync_channel(3);
>> +        // use twice the threadcount for the channel, so the read thread can already send another
>> +        // one when the previous one was not consumed yet, but keep the minimum at 3
> 
> this reads a bit confusingly, as if the channel could go up to 2 x the
> thread count, while that does not make much sense. Even if it is clear to
> someone well acquainted with the matter at hand, it'd IMO still be nicer
> to clarify that for others stumbling into this, maybe something like:
> 
> 
> // set the buffer size of the channel queues to twice the number of threads or 3, whichever
> // is greater, to reduce the chance of a reader thread (producer) being blocked.
> 
> Can be fixed up on applying though, if you agree (or propose something better).

Sounds fine to me.

> 
>> +        let (tx, rx) = std::sync::mpsc::sync_channel((read_threads * 2).max(3));
> 
> 
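
For reference, with that expression the queue size works out like this
(just spelling out the (read_threads * 2).max(3) arithmetic):

// buffer = (read_threads * 2).max(3), i.e. "2x threads, but at least 3"
assert_eq!((1usize * 2).max(3), 3); // 1 read thread  -> 3 (the minimum)
assert_eq!((2usize * 2).max(3), 4); // 2 read threads -> 4
assert_eq!((4usize * 2).max(3), 8); // 4 read threads -> 8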



_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel