public inbox for pbs-devel@lists.proxmox.com
 help / color / mirror / Atom feed
From: Christian Ebner <c.ebner@proxmox.com>
To: "Proxmox Backup Server development discussion"
	<pbs-devel@lists.proxmox.com>,
	"Fabian Grünbichler" <f.gruenbichler@proxmox.com>
Subject: Re: [pbs-devel] [PATCH proxmox-backup v2] fix #6566: backup: api: conditionally drop group and snapshot locks
Date: Tue, 30 Sep 2025 11:21:11 +0200	[thread overview]
Message-ID: <f4e4c13e-0e9d-4a13-9237-847ef3676694@proxmox.com> (raw)
In-Reply-To: <1759221679.xvpl4bbm4p.astroid@yuna.none>

On 9/30/25 10:44 AM, Fabian Grünbichler wrote:
> On September 29, 2025 5:17 pm, Christian Ebner wrote:
>> To guarantee consistency by possible concurrent operations, the
>> backup protocol locks the backup group, the previous backup
>> snapshot (if any) and holds a lock for the newly created backup
>> snapshot. All of these are currently stored in the backup worker
>> task, only released on its destruction.
>>
>> The backup API however signals a successful backup via the return
>> status of the `finish` call, while still holding the locks.
>> Therefore, an immediate subsequent backup of the client to the same
>> group can fail because the locks cannot be acquired until the previous
>> backup task is completely destroyed, which can however outlive the
>> `finish` return for some time. This manifests in e.g. a push sync job
>> failing.
>>
>> To fix this, store the lock guards inside the RPC environments shared
>> state instead, allowing to selectively drop the locks on successful
>> backup finish. On error, hold the locks until the cleanup was
>> successful.
>>
>> Immediate verification of new snapshots already downgraded the lock
>> by dropping the exclusive lock and getting a shared lock. Since the
>> dropping is now already handled by the finish call, only gathering
>> the shared lock is required. While there is now a larger time window
>> for concurrent prunes, the underlying possible race between
>> verification and prune remains in place.
>>
>> Fixes: https://bugzilla.proxmox.com/show_bug.cgi?id=6566
>> Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
> 
> Reviewed-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
> 
> it should be very rare that a snapshot is attempted to be removed right
> as it was created, so the slightly increased race window there should be
> okay.
> 
> did you test this with vzdump's `protected` option and verify after
> completion?

I did not manage to race the backup finish <-> verify after finished.

But there is the (pre-existing) race between pruning and setting the 
protected marker. This can be triggered if timed right, e.g. just did 
with the latest rebased version. The backup task then fails with, e.g.:
```
...
INFO: stopping kvm after backup task
INFO: adding notes to backup
WARN: unable to add notes - proxmox-backup-client failed: Error: unable 
to update manifest blob - unable to load blob 
'"/devel/datastore/vm/100/2025-09-30T09:09:40Z/index.json.blob"' - No 
such file or directory (os error 2)
INFO: marking backup as protected
ERROR: Backup of VM 100 failed - unable to set protected flag - 400 Bad 
Request
INFO: Failed at 2025-09-30 11:09:52
...
```

To fix that, we would need to extend the backup upgrade api endpoint by 
a protected marker flag, so this can be set a-priori and written when 
the manifest is persisted to disk.



_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel

  reply	other threads:[~2025-09-30  9:21 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-09-29 15:17 Christian Ebner
2025-09-30  8:44 ` Fabian Grünbichler
2025-09-30  9:21   ` Christian Ebner [this message]
2025-10-01  8:59 ` [pbs-devel] applied: " Fabian Grünbichler

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=f4e4c13e-0e9d-4a13-9237-847ef3676694@proxmox.com \
    --to=c.ebner@proxmox.com \
    --cc=f.gruenbichler@proxmox.com \
    --cc=pbs-devel@lists.proxmox.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox
Service provided by Proxmox Server Solutions GmbH | Privacy | Legal