From: Fabian Grünbichler <f.gruenbichler@proxmox.com>
Subject: Re: [PATCH proxmox-backup] api: backup: cleanup backup group created by benchmark
To: Christian Ebner <c.ebner@proxmox.com>, pbs-devel@lists.proxmox.com
Date: Mon, 04 May 2026 10:24:22 +0200
Message-Id: <1777881516.a5cvx4evyt.astroid@yuna.none>
In-Reply-To: <20260430135931.722979-1-c.ebner@proxmox.com>

On April 30, 2026 3:59 pm, Christian Ebner wrote:
> The benchmark creates its own backup group host/benchmark, but failed
> to auto-cleanup the group after itself: since commit 23be00a42
> ("fix #3336: datastore: remove group if the last snapshot is
> removed"), cleanup requires an exclusive lock on the backup group for
> destroying it, while the backup environment already holds that
> exclusive lock to disallow concurrent backups to the same group.
>
> To fix this, drop the locks held in the backup environment by
> dropping the environment itself and rely on the cleanup to reacquire
> them.
>
> Fixes: https://forum.proxmox.com/threads/183138/
> Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
> ---
>  src/api2/backup/mod.rs | 11 ++++++++---
>  1 file changed, 8 insertions(+), 3 deletions(-)
>
> diff --git a/src/api2/backup/mod.rs b/src/api2/backup/mod.rs
> index 86ec49487..8848ca99c 100644
> --- a/src/api2/backup/mod.rs
> +++ b/src/api2/backup/mod.rs
> @@ -288,9 +288,14 @@ fn upgrade_to_backup_protocol(
>              if benchmark {
>                  env.log("benchmark finished successfully");
>                  proxmox_async::runtime::block_in_place(|| {
> -                    env.datastore.remove_backup_dir(
> -                        env.backup_dir.backup_ns(),
> -                        env.backup_dir.as_ref(),
> +                    let datastore = env.datastore.clone();
> +                    let namespace = env.backup_dir.backup_ns().clone();
> +                    let snapshot = env.backup_dir.dir().clone();
> +                    // draps all locks

nit: `draps` ;)

> +                    drop(env);
> +                    datastore.remove_backup_dir(
> +                        &namespace,
> +                        &snapshot,
>                          true,
>                      )

doesn't this also affect the "cleanup-on-error" paths a few lines below
this?

dropping the full env is also a bit problematic, because it opens up a
race condition if there are back-to-back benchmarks (or backups):

- benchmark starts
- benchmark is finished, drops env
- next benchmark starts and locks the group and the previous "snapshot"
- cleanup fails to obtain the lock(s) and doesn't run

for benchmarks that is not so bad, but for backups it would leave
half-written backup snapshots around (until they are cleaned up by other
means?).

ideally, we would not drop the locks here, but just run the cleanup
using the locks we already hold, which is what `force` is doing. we
currently only set force

- for the three calls here in the backup env
- when cleaning up a newly created snapshot as part of pull error
  handling

and in all those cases we are already holding an exclusive lock on the
group and on the snapshot, so we could just skip the group locking as
well when force is set?

(ideally we'd find a way to actually encode this in the signature, e.g.
by replacing `force` with references to the lock guards? rough sketch of
what I mean below.)
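
to illustrate, a standalone sketch, not actual PBS code: `Datastore`,
`remove_group` and the `Mutex<()>` lock are made-up stand-ins for our
real datastore and lock guard types. the caller passes a reference to
the guard it already holds instead of `force: bool`, and the callee only
acquires the lock itself if it got `None`:

    use std::sync::{Mutex, MutexGuard};

    // made-up stand-in for a datastore with a per-group lock
    struct Datastore {
        group_lock: Mutex<()>,
    }

    impl Datastore {
        // instead of an unchecked `force: bool`, the caller proves lock
        // ownership by passing the guard it holds; `None` means "please
        // acquire the lock for me"
        fn remove_group(&self, held: Option<&MutexGuard<'_, ()>>) {
            let _own_guard = match held {
                // caller's guard already keeps the lock held for us
                Some(_guard) => None,
                // no guard passed, so lock ourselves for the duration
                None => Some(self.group_lock.lock().unwrap()),
            };
            // ... actual group removal would happen here, with the
            // lock held either way ...
            println!("removing group");
        }
    }

    fn main() {
        let ds = Datastore {
            group_lock: Mutex::new(()),
        };

        // backup-env-style caller: already holds the exclusive lock
        let guard = ds.group_lock.lock().unwrap();
        ds.remove_group(Some(&guard));
        drop(guard);

        // regular caller: lets remove_group do the locking itself
        ds.remove_group(None);
    }

that way "I already hold the locks" is no longer a boolean claim the
callee has to trust, but something the caller demonstrates by handing
over guard references.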