From: Fabian Grünbichler <f.gruenbichler@proxmox.com>
Subject: Re: [PATCH proxmox-backup] api: backup: cleanup backup group created by benchmark
To: Christian Ebner <c.ebner@proxmox.com>, pbs-devel@lists.proxmox.com
Date: Mon, 04 May 2026 10:24:22 +0200
Message-Id: <1777881516.a5cvx4evyt.astroid@yuna.none>
In-Reply-To: <20260430135931.722979-1-c.ebner@proxmox.com>

On April 30, 2026 3:59 pm, Christian Ebner wrote:
> The benchmark creates its own backup group host/benchmark, but failed
> to auto-cleanup the group after itself: since commit 23be00a42
> ("fix #3336: datastore: remove group if the last snapshot is
> removed"), cleanup requires an exclusive lock on the backup group for
> destroying it, while the backup environment already holds that
> exclusive lock to disallow concurrent backups to the same group.
>
> To fix this, drop the locks held in the backup environment by
> dropping the environment itself and rely on the cleanup to reacquire
> them.
>
> Fixes: https://forum.proxmox.com/threads/183138/
> Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
> ---
>  src/api2/backup/mod.rs | 11 ++++++++---
>  1 file changed, 8 insertions(+), 3 deletions(-)
>
> diff --git a/src/api2/backup/mod.rs b/src/api2/backup/mod.rs
> index 86ec49487..8848ca99c 100644
> --- a/src/api2/backup/mod.rs
> +++ b/src/api2/backup/mod.rs
> @@ -288,9 +288,14 @@ fn upgrade_to_backup_protocol(
>              if benchmark {
>                  env.log("benchmark finished successfully");
>                  proxmox_async::runtime::block_in_place(|| {
> -                    env.datastore.remove_backup_dir(
> -                        env.backup_dir.backup_ns(),
> -                        env.backup_dir.as_ref(),
> +                    let datastore = env.datastore.clone();
> +                    let namespace = env.backup_dir.backup_ns().clone();
> +                    let snapshot = env.backup_dir.dir().clone();
> +                    // draps all locks

nit: `draps` ;)

> +                    drop(env);
> +                    datastore.remove_backup_dir(
> +                        &namespace,
> +                        &snapshot,
>                          true,
>                      )

doesn't this also affect the "cleanup-on-error" paths a few lines below
this?

dropping the full env is also a bit problematic, because it opens up a
race condition if there are back-to-back benchmarks (or backups):

- benchmark starts
- benchmark is finished, drops env
- next benchmark starts and locks the group and the previous "snapshot"
- cleanup fails to obtain the lock(s) and doesn't run

for benchmarks that is not so bad, but for backups it would leave
half-written backup snapshots around (until they are cleaned up by other
means?).

ideally, we would not drop the locks here, but just run the cleanup
using the locks we already hold, which is what `force` is doing. we
currently only set force

- for the three calls here in the backup env
- when cleaning up a newly created snapshot as part of pull error
  handling

and in all those cases we are already holding an exclusive lock on the
group and on the snapshot, so we could just skip the group locking as
well when force is set?

(ideally we'd find a way to actually encode this in the signature, e.g.
by replacing `force` with references to the lock guards? rough sketch of
what I mean below.)
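
to illustrate, a standalone sketch, not actual PBS code: `Datastore`,
`remove_group` and the `Mutex<()>` lock are made-up stand-ins for our
real datastore and lock guard types. the caller passes a reference to
the guard it already holds instead of `force: bool`, and the callee only
acquires the lock itself if it got `None`:

    use std::sync::{Mutex, MutexGuard};

    // made-up stand-in for a datastore with a per-group lock
    struct Datastore {
        group_lock: Mutex<()>,
    }

    impl Datastore {
        // instead of an unchecked `force: bool`, the caller proves lock
        // ownership by passing the guard it holds; `None` means "please
        // acquire the lock for me"
        fn remove_group(&self, held: Option<&MutexGuard<'_, ()>>) {
            let _own_guard = match held {
                // caller's guard already keeps the lock held for us
                Some(_guard) => None,
                // no guard passed, so lock ourselves for the duration
                None => Some(self.group_lock.lock().unwrap()),
            };
            // ... actual group removal would happen here, with the
            // lock held either way ...
            println!("removing group");
        }
    }

    fn main() {
        let ds = Datastore {
            group_lock: Mutex::new(()),
        };

        // backup-env-style caller: already holds the exclusive lock
        let guard = ds.group_lock.lock().unwrap();
        ds.remove_group(Some(&guard));
        drop(guard);

        // regular caller: lets remove_group do the locking itself
        ds.remove_group(None);
    }

that way "I already hold the locks" is no longer a boolean claim the
callee has to trust, but something the caller demonstrates by handing
over guard references.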