Date: Mon, 07 Apr 2025 09:21:04 +0200
From: Fabian Grünbichler <f.gruenbichler@proxmox.com>
To: Proxmox Backup Server development discussion <pbs-devel@lists.proxmox.com>
In-Reply-To: <D8Y1V8BY8F6I.25I1HV13SAN63@proxmox.com>
Message-Id: <1744009968.ip6vyc3mbq.astroid@yuna.none>
Subject: Re: [pbs-devel] [PATCH v4 proxmox-backup 4/7] fix #4182: server: sync: allow pulling groups concurrently

On April 4, 2025 8:02 pm, Max Carrara wrote:
> On Fri Apr 4, 2025 at 3:49 PM CEST, Christian Ebner wrote:
>> Currently, a sync job sequentially pulls the backup groups and the
>> snapshots contained within them, therefore being limited in download
>> speed by the http2 connection of the source reader instance in case
>> of remote syncs. High latency networks suffer from limited download
>> speed.
>>
>> Improve the throughput by allowing to pull up to a configured number
>> of backup groups concurrently, by creating tasks connecting and
>> pulling from the remote source in parallel.
>>
>> Make the error handling and accounting logic for each group pull
>> reusable by moving it into its own helper function, returning the
>> future.
>>
>> The store progress is placed behind an atomic reference counted mutex
>> to allow for concurrent access of status updates.
>
> Yeah, so... I've got some thoughts about this:
>
> First of all, I think that that's *fine* here, as the
> `Arc<Mutex<StoreProgress>>` probably isn't going to face that much lock
> contention or something anyway. So to get that out of the way, IMO we
> can keep that here as it is right now.
>
> But in the future I do think that we should check the locations where we
> have that kind of concurrent data access / modification, because I feel
> the amount of mutexes is only going to continue growing. That's not a
> bad thing per se, but it does come with a few risks / drawbacks (e.g.
> higher risk for deadlocks).
>
> Without going too deep into the whole "how to avoid deadlocks" discussion
> and other things, here's an alternative I want to propose that could
> perhaps be done in (or as part of a) different series, since it's a bit
> out of scope for this one here. (Though, if you do wanna do it in this
> one, I certainly won't complain!)
>
> First, since `StoreProgress` only contains four `usize`s, it should be
> fairly easy to convert the ones being modified into `AtomicUsize`s and
> perhaps add helper methods to increase their respective values;
> something like this:
>
> #[derive(Debug, Default)]
> /// Tracker for progress of operations iterating over `Datastore` contents.
> pub struct StoreProgress {
>     /// Completed groups
>     pub done_groups: AtomicUsize,
>     /// Total groups
>     pub total_groups: u64,
>     /// Completed snapshots within current group
>     pub done_snapshots: AtomicUsize,
>     /// Total snapshots in current group
>     pub group_snapshots: u64,
> }
>
> // [...]
>
> impl StoreProgress {
>     pub fn add_done_group(&self) {
>         let _ = self.done_groups.fetch_add(1, Ordering::Relaxed);
>     }
>
>     // [...]
> }
>
> (of course, what it all should look like in detail is up to bikeshedding :P)
>
> Something like that would probably be nicer here, because:
>
> - You won't need to wrap `StoreProgress` within an `Arc<Mutex<T>>`
>   anymore -- a shared reference is enough, since ...
> - Operations on atomics take &self (that's the whole point of them ofc ;p )
>
> This means that:
> - Cloning an `Arc<T>` is not necessary anymore
>   --> should be approx. two atomic ops less times the amount of `Arc`s used
>       (on create/clone and drop for each `Arc`)
> - Locking the `Mutex<T>` is also not necessary anymore, which means
>   --> should be two atomic ops less for each call to `.lock()`
>       (acquire and release)
>
> In turn, this is replaced by a single atomic call with
> `Ordering::Relaxed` (which is fine for counters [0]). So, something like
>
>     progress.lock().unwrap().done_groups += 1;
>
> would just become
>
>     progress.add_done_group();
>
> which is also quite neat.
>
> Note however that we might have to split that struct into a "local" and
> "shared" version or whatever in order to adapt it all to the current
> code (a locally-used struct ofc doesn't need atomics).
>
> Again, I think what you're doing here is perfectly fine; I just think
> that we should have a look at all of those concurrent data accesses and
> see whether we can slim some stuff down or perhaps have some kind of
> statically enforced mutex ordering for deadlock prevention [1].
>
> [0]: https://doc.rust-lang.org/nomicon/atomics.html#relaxed
> [1]: https://www.youtube.com/watch?v=Ba7fajt4l1M
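for illustration, here is a minimal, self-contained sketch of the counter
pattern proposed above (the `Progress` struct and its method are just
placeholders, not the actual `StoreProgress` code): atomics let concurrent
workers bump the counters through a shared reference, so no `Mutex` is
required, and with scoped threads not even an `Arc` -- detached async
tasks would of course still need one.

use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;

/// Placeholder progress counter -- atomics allow updates via `&self`,
/// so no `Mutex` wrapper is needed around the counters.
#[derive(Debug, Default)]
struct Progress {
    done_groups: AtomicUsize,
}

impl Progress {
    fn add_done_group(&self) {
        // Relaxed is enough for a plain event counter with no ordering
        // requirements between threads.
        self.done_groups.fetch_add(1, Ordering::Relaxed);
    }
}

fn main() {
    let progress = Progress::default();

    // Scoped threads can borrow `progress` directly; spawned async tasks
    // would still need an `Arc<Progress>`, but no locking.
    thread::scope(|s| {
        for _ in 0..4 {
            s.spawn(|| progress.add_done_group());
        }
    });

    assert_eq!(progress.done_groups.load(Ordering::Relaxed), 4);
}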
we do have a similar construct in the client (although still with Arc<>,
maybe we could eliminate it there? ;)):

https://git.proxmox.com/?p=proxmox-backup.git;a=blob;f=pbs-client/src/backup_stats.rs;h=f0563a0011b1117e32027a8a88f9d6f65db591f0;hb=HEAD#l45
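and as a general reference for the concurrency side of the patch (not its
actual implementation): the "pull up to a configured number of backup
groups concurrently" behaviour described in the quoted commit message is
the usual bounded-concurrency pattern, which can be sketched with the
futures crate roughly like this (`pull_group`, `parallel_groups` and the
error type are made up for the example):

use futures::stream::{self, StreamExt};

// Hypothetical stand-in for pulling one backup group from the remote source.
async fn pull_group(group: u32) -> Result<u32, String> {
    // ... connect to the source reader and pull the group's snapshots ...
    Ok(group)
}

#[tokio::main]
async fn main() {
    let groups: Vec<u32> = (0..10).collect();
    let parallel_groups = 4; // configured concurrency limit

    // Keep at most `parallel_groups` pull futures in flight and collect
    // the per-group results so errors can be accounted for afterwards.
    let results: Vec<Result<u32, String>> = stream::iter(groups)
        .map(pull_group)
        .buffer_unordered(parallel_groups)
        .collect()
        .await;

    for res in results {
        match res {
            Ok(group) => println!("pulled group {group}"),
            Err(err) => eprintln!("pulling a group failed: {err}"),
        }
    }
}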