public inbox for pbs-devel@lists.proxmox.com
 help / color / mirror / Atom feed
From: "Fabian Grünbichler" <f.gruenbichler@proxmox.com>
To: Christian Ebner <c.ebner@proxmox.com>, pbs-devel@lists.proxmox.com
Subject: Re: [PATCH proxmox-backup 4/4] api/datastore: use maintenance-mode lock to protect against changes
Date: Wed, 06 May 2026 08:30:23 +0200	[thread overview]
Message-ID: <1778048819.s4fbqpt0x2.astroid@yuna.none> (raw)
In-Reply-To: <189d0714-90c1-4607-95d0-3ae74febef59@proxmox.com>

On May 5, 2026 3:02 pm, Christian Ebner wrote:
> On 5/5/26 2:12 PM, Fabian Grünbichler wrote:
>> On May 5, 2026 10:11 am, Christian Ebner wrote:

> [..]

>>> diff --git a/pbs-datastore/src/lib.rs b/pbs-datastore/src/lib.rs
>>> index 48acba4c8..823f3bf2f 100644
>>> --- a/pbs-datastore/src/lib.rs
>>> +++ b/pbs-datastore/src/lib.rs
>>> @@ -159,8 +159,9 @@
>>>   
>>>   use std::os::unix::io::AsRawFd;
>>>   use std::path::Path;
>>> +use std::time::Duration;
>>>   
>>> -use anyhow::{bail, Error};
>>> +use anyhow::{bail, Context, Error};
>>>   
>>>   use pbs_config::BackupLockGuard;
>>>   
>>> @@ -271,3 +272,11 @@ where
>>>   
>>>       Ok(lock)
>>>   }
>>> +
>>> +/// Acquire an exclusive lock for the datastore's maintenance-mode
>>> +pub fn maintenance_mode_lock(store: &str) -> Result<BackupLockGuard, Error> {
>>> +    lock_helper(store, Path::new("maintenance-mode.lck"), |p| {
>>> +        pbs_config::open_backup_lockfile(p, Some(Duration::from_secs(0)), true)
>> 
>> this might warrant a comment, usually we open config related locks with
>> a timeout to prevent flaky locking in case of concurrency, waiting a few
>> seconds is normally fine.
>> 
>> but here, we must always first acquire the datastore config lock (should
>> we encode that in the signature??) anyway, which makes the 0
>> timeout/non-blocking behaviour okay, if I understand it correctly?
> 
> Right, it makes sense to pass in the config lock to "prove" that it was 
> acquired first.
> 
> But it could make sense to add the short delay here nevertheless. After 
> all, another operation might have dropped the config lock already, while 
> still holding onto the maintenance-mode lock. OTOH the operations which 
> do hold it for longer probably will outlive the timeout, so a bit torn 
> as I'm not to sure if there is much benefit.

but the only time that happens is if the operation holding *only* the
maintainence mode lock is doing a long-running maintenance operation on
that datastore (the code modified below). and we don't want a long
timeout, because that means holding up the much more important global
datastore.cfg lock (that we must be holding to reach this fn).

I'd keep this non-blocking but add a comment *why*.

>>> +            .context("unable to acquire exclusive datastore's maintenance-mode lock")
>>> +    })
>>> +}
>>> diff --git a/src/api2/admin/datastore.rs b/src/api2/admin/datastore.rs
>>> index 1e0a78f3a..883091e6f 100644
>>> --- a/src/api2/admin/datastore.rs
>>> +++ b/src/api2/admin/datastore.rs
>>> @@ -61,8 +61,8 @@ use pbs_datastore::index::IndexFile;
>>>   use pbs_datastore::manifest::BackupManifest;
>>>   use pbs_datastore::prune::compute_prune_info;
>>>   use pbs_datastore::{
>>> -    check_backup_owner, ensure_datastore_is_mounted, task_tracking, BackupDir, DataStore,
>>> -    LocalChunkReader, StoreProgress,
>>> +    check_backup_owner, ensure_datastore_is_mounted, maintenance_mode_lock, task_tracking,
>>> +    BackupDir, DataStore, LocalChunkReader, StoreProgress,
>>>   };
>>>   use pbs_tools::json::required_string_param;
>>>   use proxmox_rest_server::{formatter, worker_is_active, WorkerTask};
>>> @@ -2705,9 +2705,13 @@ fn expect_maintenance_type(
>>>   }
>>>   
>>>   fn unset_maintenance(
>>> -    _lock: pbs_config::BackupLockGuard,
>>> +    lock: Option<pbs_config::BackupLockGuard>,
>> 
>> I think this should take Option<(lock, lock)>, and
>> expect_maintenance_type should acquire and return both locks, but it
>> needs to be adapted further, because atm it is broken:
>> 
>> 1. start s3-refresh while active operations are running
>> 2. s3-refresh will be in loop waiting for operations to disappear
>> 3. roughly every second, it will call expect_maintenance_type
>> 4. if somebody else is holding the datastore config lock for more than
>>     ten seconds (let's say you started a re-use existing datastore
>>     creation with S3 ;)), expect_maintenance_type will error out!
> 
> Yeah, point 4 is broken in any case... :/ But will fix according to the 
> suggestions above.

it could also happen cause of lock contention, or something else
blocking for more than a few seconds, or a combination of those factors.
the datastore creation I just stumbled upon and seemed like an obvious
way to trigger it :)




  reply	other threads:[~2026-05-06  6:30 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-05-05  8:11 [PATCH proxmox-backup 0/4] don't hold datastore config lock during s3-refresh Christian Ebner
2026-05-05  8:11 ` [PATCH proxmox-backup 1/4] pbs-config: reorganize crate use statements Christian Ebner
2026-05-05 12:15   ` applied: " Fabian Grünbichler
2026-05-05  8:11 ` [PATCH proxmox-backup 2/4] datastore: move lock files base path constant to central location Christian Ebner
2026-05-05  8:11 ` [PATCH proxmox-backup 3/4] datastore: move file lock helper to centralized place Christian Ebner
2026-05-05 12:15   ` Fabian Grünbichler
2026-05-05  8:11 ` [PATCH proxmox-backup 4/4] api/datastore: use maintenance-mode lock to protect against changes Christian Ebner
2026-05-05 12:14   ` Fabian Grünbichler
2026-05-05 13:02     ` Christian Ebner
2026-05-06  6:30       ` Fabian Grünbichler [this message]
2026-05-06 16:58 ` superseded: [PATCH proxmox-backup 0/4] don't hold datastore config lock during s3-refresh Christian Ebner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1778048819.s4fbqpt0x2.astroid@yuna.none \
    --to=f.gruenbichler@proxmox.com \
    --cc=c.ebner@proxmox.com \
    --cc=pbs-devel@lists.proxmox.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox
Service provided by Proxmox Server Solutions GmbH | Privacy | Legal