From: Dominik Csapak
To: Proxmox Backup Server development discussion, Max Carrara
Date: Wed, 8 May 2024 08:56:32 +0200
Subject: Re: [pbs-devel] [PATCH proxmox-backup v2 1/3] tape: introduce a tape backup read thread tuning option

On 5/7/24 17:10, Max Carrara wrote:
> On Tue May 7, 2024 at 9:29 AM CEST, Dominik Csapak wrote:
>> using a single thread for reading is not optimal in some cases, e.g.
>> when the underlying storage can handle more reads in parallel than with
>> a single thread.
>
> Got only a few tiny notes that are otherwise unremarkable:
>
> Would prefer to reword this as:
>
>     Using a single thread for reading is not optimal in some cases, e.g.
>     when the underlying storage can handle reads from multiple threads in
>     parallel.
>
>> We use the ParallelHandler to handle the actual reads. Make the
>> sync_channel buffer size depending on the number of threads so we have
>> space for two chunks per thread.
>
> Would s/depending/depend/ for the above paragraph.

OK to both comments, I will update them in a v3 (if necessary), or, if
preferred, whoever commits this could fix up the commit message when
applying.

>> How this impacts the backup speed largely depends on the underlying
>> storage and how the backup is laid out on it.
>>
>> I benchmarked the following setups:
>>
>> * Setup A: relatively spread out backup on a virtualized pbs on single HDDs
>> * Setup B: mostly sequential chunks on a virtualized pbs on single HDDs
>> * Setup C: backup on virtualized pbs on a fast NVME
>> * Setup D: backup on bare metal pbs with ZFS in a RAID10 with 6 HDDs
>>   and 2 fast special devices in a mirror
>>
>> (values are reported in MB/s as seen in the task log, caches were
>> cleared between runs, backups were bigger than the memory available)
>>
>> setup   1 thread   2 threads   4 threads   8 threads
>> A        55         70          80          95
>> B       110         89         100         108
>> C       294        294         294         294
>> D       118        180         300         300
>>
>> So there are cases where multiple read threads speed up the tape backup
>> (dramatically).
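A minimal, self-contained sketch of the scheme the commit message above
describes: several reader threads feed a bounded channel that holds two
chunks per thread, and a single consumer stands in for the tape writer.
This is only an illustration, not the actual ParallelHandler or datastore
code from proxmox-backup; read_chunk, the thread count and the chunk size
are invented.

use std::sync::mpsc::sync_channel;
use std::thread;
use std::time::Duration;

const READ_THREADS: usize = 4; // would come from the datastore tuning option
const CHUNKS_PER_THREAD: usize = 25;

// Stand-in for reading one chunk from the source storage.
fn read_chunk(chunk_no: usize) -> Vec<u8> {
    thread::sleep(Duration::from_millis(5)); // simulated storage latency
    vec![(chunk_no % 256) as u8; 4096]
}

fn main() {
    // Bounded queue with room for two chunks per reader thread,
    // mirroring the `read_threads * 2` sizing from the patch.
    let (tx, rx) = sync_channel::<Vec<u8>>(READ_THREADS * 2);

    let mut readers = Vec::new();
    for t in 0..READ_THREADS {
        let tx = tx.clone();
        readers.push(thread::spawn(move || {
            for i in 0..CHUNKS_PER_THREAD {
                let chunk = read_chunk(t * CHUNKS_PER_THREAD + i);
                // Blocks while the buffer is full, i.e. while the
                // "tape writer" below is the bottleneck.
                tx.send(chunk).expect("receiver dropped");
            }
        }));
    }
    drop(tx); // close the channel once all reader threads have finished

    // Single consumer draining the buffer, playing the role of the tape writer.
    let mut written = 0usize;
    for chunk in rx {
        written += chunk.len();
    }

    for r in readers {
        r.join().expect("reader thread panicked");
    }
    println!("wrote {written} bytes in total");
}

Because the channel is bounded, senders block as soon as two chunks per
reader are queued, so a slow tape drive throttles the readers instead of
letting buffered chunks pile up in memory.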
>> On the other hand there are situations where reading
>> from a single thread is actually faster, probably because we can read
>> from the HDD sequentially.
>>
>> I used a new default value of '4' here since that gave good performance
>> on all setups (not necessarily the best) and we also have a default
>> value of '4' for verification threads.
>>
>> We use a struct here for the datastore since we want to introduce other
>> thread tuning options too.
>>
>> Signed-off-by: Dominik Csapak
>
> The patch is pretty straightforward; there's one question inline, but
> otherwise this looks fine to me. Solid work!
>
> Wasn't able to test this yet as setting up a VTL on Bookworm failed for
> me, unfortunately. Will try to test this tomorrow if possible.

It's probably obvious, but just to be sure: make sure the target storage
for the VTL is faster than your source storage, otherwise you'll always be
limited by that...

> For now:
>
> Reviewed-by: Max Carrara

[snip]

>> diff --git a/src/tape/pool_writer/new_chunks_iterator.rs b/src/tape/pool_writer/new_chunks_iterator.rs
>> index 1454b33d2..a2a8091f6 100644
>> --- a/src/tape/pool_writer/new_chunks_iterator.rs
>> +++ b/src/tape/pool_writer/new_chunks_iterator.rs
>> @@ -6,8 +6,9 @@ use anyhow::{format_err, Error};
>>  use pbs_datastore::{DataBlob, DataStore, SnapshotReader};
>>
>>  use crate::tape::CatalogSet;
>> +use crate::tools::parallel_handler::ParallelHandler;
>>
>> -/// Chunk iterator which use a separate thread to read chunks
>> +/// Chunk iterator which uses separate threads to read chunks
>>  ///
>>  /// The iterator skips duplicate chunks and chunks already in the
>>  /// catalog.
>> @@ -25,7 +26,11 @@ impl NewChunksIterator {
>>          snapshot_reader: Arc<Mutex<SnapshotReader>>,
>>          catalog_set: Arc<Mutex<CatalogSet>>,
>>      ) -> Result<(std::thread::JoinHandle<()>, Self), Error> {
>> -        let (tx, rx) = std::sync::mpsc::sync_channel(3);
>> +        let read_threads = datastore
>> +            .get_thread_configuration()
>> +            .tape_backup_read_threads;
>> +
>> +        let (tx, rx) = std::sync::mpsc::sync_channel(read_threads * 2);
>
> Is there any reason you're using `* 2` here? For example, is the
> throughput unaffected if you use a larger value, like `* 8`?
>
> If the constant has an effect like that it should IMO be documented, but
> if not, then it can just stay like it is.

I did not benchmark that much here (it would have increased my benchmark
time, since that's another dimension in the testing with many possible
values), but my thought process was this:

If the source storage is much slower than what we can write to tape, that
buffer will be empty most of the time.

If the source storage is much faster than what we can write to tape, it
will be full most of the time.

In both cases the buffer size won't matter much (as long as there is a bit
of buffer).

The sweet spot is when the storage is about as fast as the tape writes, and
in that case you want to have just a bit more buffer than threads reading
from the storage, so there is always a read going on (and the buffer never
empties, since that slows down the tape write).

So 1 * the number of threads seemed too low, but I think anything bigger
than 2 * is just a waste of memory.

Does that make sense? If we really want, I can make additional benchmarks,
but my guess is that it won't make much difference in practice.
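To make that reasoning a bit more tangible, here is a rough, self-contained
simulation; it is not proxmox-backup code, and the thread count, timings and
names are all invented. It samples how full a `read_threads * 2` buffer is
when the simulated storage is slower than, faster than, or roughly matched
to the simulated tape drive: near empty in the first case, near full in the
second, and only in the balanced case does the per-thread slack decide
whether the writer ever has to wait.

use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::mpsc::sync_channel;
use std::sync::Arc;
use std::thread;
use std::time::Duration;

// One scenario: `readers` threads that each need `read_ms` per chunk feed a
// buffer of `readers * 2`; one consumer needs `write_ms` per chunk.
// Returns the average buffer fill level observed by the consumer.
fn scenario(readers: usize, read_ms: u64, write_ms: u64, chunks_per_reader: usize) -> f64 {
    let (tx, rx) = sync_channel::<usize>(readers * 2);
    // Approximate fill level: chunks produced (queued or waiting to be queued)
    // minus chunks consumed.
    let fill = Arc::new(AtomicUsize::new(0));

    for _ in 0..readers {
        let tx = tx.clone();
        let fill = Arc::clone(&fill);
        thread::spawn(move || {
            for i in 0..chunks_per_reader {
                thread::sleep(Duration::from_millis(read_ms)); // simulated storage read
                fill.fetch_add(1, Ordering::SeqCst);
                if tx.send(i).is_err() {
                    return; // consumer went away
                }
            }
        });
    }
    drop(tx);

    let (mut total_fill, mut samples) = (0usize, 0usize);
    for _chunk in rx {
        total_fill += fill.fetch_sub(1, Ordering::SeqCst); // fill level before this receive
        samples += 1;
        thread::sleep(Duration::from_millis(write_ms)); // simulated tape write
    }
    total_fill as f64 / samples as f64
}

fn main() {
    // Slow storage, fast tape: the buffer stays mostly empty.
    println!("slow reads: avg fill {:.1}", scenario(4, 40, 2, 30));
    // Fast storage, slow tape: the buffer (plus blocked senders) stays full.
    println!("fast reads: avg fill {:.1}", scenario(4, 2, 20, 30));
    // Roughly balanced: 4 readers at 20 ms each vs. one writer at 5 ms per chunk.
    println!("balanced:   avg fill {:.1}", scenario(4, 20, 5, 30));
}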