From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <pbs-devel-bounces@lists.proxmox.com>
Received: from firstgate.proxmox.com (firstgate.proxmox.com [IPv6:2a01:7e0:0:424::9])
	by lore.proxmox.com (Postfix) with ESMTPS id C83F41FF18C
	for <inbox@lore.proxmox.com>; Mon,  7 Apr 2025 09:21:44 +0200 (CEST)
Received: from firstgate.proxmox.com (localhost [127.0.0.1])
	by firstgate.proxmox.com (Proxmox) with ESMTP id A4A6F36F50;
	Mon,  7 Apr 2025 09:21:42 +0200 (CEST)
Date: Mon, 07 Apr 2025 09:21:04 +0200
From: Fabian Grünbichler <f.gruenbichler@proxmox.com>
To: Proxmox Backup Server development discussion <pbs-devel@lists.proxmox.com>
References: <20250404134936.425392-1-c.ebner@proxmox.com>
 <20250404134936.425392-5-c.ebner@proxmox.com>
 <D8Y1V8BY8F6I.25I1HV13SAN63@proxmox.com>
In-Reply-To: <D8Y1V8BY8F6I.25I1HV13SAN63@proxmox.com>
MIME-Version: 1.0
User-Agent: astroid/0.16.0 (https://github.com/astroidmail/astroid)
Message-Id: <1744009968.ip6vyc3mbq.astroid@yuna.none>
Subject: Re: [pbs-devel] [PATCH v4 proxmox-backup 4/7] fix #4182: server:
 sync: allow pulling groups concurrently
X-BeenThere: pbs-devel@lists.proxmox.com
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Proxmox Backup Server development discussion
 <pbs-devel.lists.proxmox.com>
List-Unsubscribe: <https://lists.proxmox.com/cgi-bin/mailman/options/pbs-devel>, 
 <mailto:pbs-devel-request@lists.proxmox.com?subject=unsubscribe>
List-Archive: <http://lists.proxmox.com/pipermail/pbs-devel/>
List-Post: <mailto:pbs-devel@lists.proxmox.com>
List-Help: <mailto:pbs-devel-request@lists.proxmox.com?subject=help>
List-Subscribe: <https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel>, 
 <mailto:pbs-devel-request@lists.proxmox.com?subject=subscribe>
Reply-To: Proxmox Backup Server development discussion
 <pbs-devel@lists.proxmox.com>
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Errors-To: pbs-devel-bounces@lists.proxmox.com
Sender: "pbs-devel" <pbs-devel-bounces@lists.proxmox.com>

On April 4, 2025 8:02 pm, Max Carrara wrote:
> On Fri Apr 4, 2025 at 3:49 PM CEST, Christian Ebner wrote:
>> Currently, a sync job pulls the backup groups and the snapshots
>> contained within them sequentially, so for remote syncs the download
>> speed is limited by the http2 connection of the source reader
>> instance. High-latency networks suffer from this in particular.
>>
>> Improve the throughput by allowing up to a configured number of backup
>> groups to be pulled concurrently, creating tasks which connect to and
>> pull from the remote source in parallel.
>>
>> Make the error handling and accounting logic for each group pull
>> reusable by moving it into its own helper function, returning the
>> future.
>>
>> The store progress is placed behind an atomically reference-counted
>> mutex to allow for concurrent status updates.
> 
> Yeah, so... I've got some thoughts about this:
> 
> First of all, I think that that's *fine* here, as the
> `Arc<Mutex<StoreProgress>>` probably isn't going to face that much lock
> contention or something anyway. So to get that out of the way, IMO we
> can keep that here as it is right now.
> 
> But in the future I do think that we should check the locations where we
> have that kind of concurrent data access / modification, because I feel
> the number of mutexes is only going to continue growing. That's not a
> bad thing per se, but it does come with a few risks / drawbacks (e.g. a
> higher risk of deadlocks).
> 
> Without going too deep into the whole "how to avoid deadlocks" discussion
> and other things, here's an alternative I want to propose that could
> perhaps be done in (or as part of) a different series, since it's a bit
> out of scope for this one here. (Though, if you do wanna do it in this
> one, I certainly won't complain!)
> 
> First, since `StoreProgress` only contains four plain integer counters,
> it should be fairly easy to convert the ones being modified into
> `AtomicUsize`s and perhaps add helper methods to increment their
> respective values; something like this:
> 
>     use std::sync::atomic::{AtomicUsize, Ordering};
> 
>     #[derive(Debug, Default)]
>     /// Tracker for progress of operations iterating over `Datastore` contents.
>     pub struct StoreProgress {
>         /// Completed groups
>         pub done_groups: AtomicUsize,
>         /// Total groups
>         pub total_groups: u64,
>         /// Completed snapshots within current group
>         pub done_snapshots: AtomicUsize,
>         /// Total snapshots in current group
>         pub group_snapshots: u64,
>     }
>     
>     // [...]
>     
>     impl StoreProgress {
>         pub fn add_done_group(&self) {
>             let _ = self.done_groups.fetch_add(1, Ordering::Relaxed);
>         }
>     
>         // [...]
>     }
> 
> (of course, what it all should look like in detail is up to bikeshedding :P)
> 
> Something like that would probably be nicer here, because:
> 
> - You won't need to wrap `StoreProgress` within an `Arc<Mutex<T>>`
>   anymore -- a shared reference is enough, since ...
> - Operations on atomics take &self (that's the whole point of them ofc ;p )
> 
> This means that:
> - Cloning an `Arc<T>` is not necessary anymore
>   --> roughly two fewer atomic ops per `Arc` used
>       (one on create/clone and one on drop of each `Arc`)
> - Locking the `Mutex<T>` is also not necessary anymore, which means
>   --> two fewer atomic ops for each call to `.lock()`
>       (acquire and release)
> 
> In turn, this is replaced by a single atomic call with
> `Ordering::Relaxed` (which is fine for counters [0]). So, something like
> 
>     progress.lock().unwrap().done_groups += 1;
> 
> would just become
> 
>     progress.add_done_group();
> 
> which is also quite neat.
> 
> Note however that we might have to split that struct into a "local" and
> "shared" version or whatever in order to adapt it all to the current
> code (a locally-used struct ofc doesn't need atomics).
> 
> Again, I think what you're doing here is perfectly fine; I just think
> that we should have a look at all of those concurrent data accesses and
> see whether we can slim some stuff down or perhaps have some kind of
> statically enforced mutex ordering for deadlock prevention [1].
> 
> [0]: https://doc.rust-lang.org/nomicon/atomics.html#relaxed
> [1]: https://www.youtube.com/watch?v=Ba7fajt4l1M
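
to make the shared-reference point above concrete, here's a minimal,
self-contained sketch (hypothetical names, plain scoped threads instead
of the actual sync tasks) - with atomic counters a plain `&` borrow is
enough, no Arc or Mutex required:

    use std::sync::atomic::{AtomicUsize, Ordering};
    use std::thread;

    /// Hypothetical stand-in for a progress tracker with atomic counters.
    #[derive(Debug, Default)]
    struct Progress {
        done_groups: AtomicUsize,
    }

    impl Progress {
        /// Relaxed ordering is enough for a plain statistics counter.
        fn add_done_group(&self) {
            self.done_groups.fetch_add(1, Ordering::Relaxed);
        }
    }

    fn main() {
        let progress = Progress::default();

        // scoped threads can borrow `&progress` directly - no Arc, no Mutex
        thread::scope(|s| {
            for _ in 0..4 {
                s.spawn(|| progress.add_done_group());
            }
        });

        assert_eq!(progress.done_groups.load(Ordering::Relaxed), 4);
    }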

we do have a similar construct in the client (although still with Arc<>,
maybe we could eliminate it there? ;)):

https://git.proxmox.com/?p=proxmox-backup.git;a=blob;f=pbs-client/src/backup_stats.rs;h=f0563a0011b1117e32027a8a88f9d6f65db591f0;hb=HEAD#l45
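
(the Arc in there is presumably just needed because the counters are
handed to spawned upload futures, which require 'static - a rough sketch
of that shape, with hypothetical names rather than the actual
backup_stats types, assuming tokio:)

    use std::sync::{
        atomic::{AtomicU64, Ordering},
        Arc,
    };

    /// Hypothetical stand-in for atomic upload counters shared with a
    /// spawned task - not the actual pbs-client types.
    #[derive(Debug, Default)]
    struct UploadCounters {
        bytes: AtomicU64,
    }

    #[tokio::main]
    async fn main() {
        let counters = Arc::new(UploadCounters::default());

        // tokio::spawn wants 'static, so the Arc is what lets the counters
        // outlive this stack frame - but no Mutex is needed around them
        let task = tokio::spawn({
            let counters = Arc::clone(&counters);
            async move {
                counters.bytes.fetch_add(4096, Ordering::Relaxed);
            }
        });

        task.await.unwrap();
        assert_eq!(counters.bytes.load(Ordering::Relaxed), 4096);
    }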


_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel