* [PATCH proxmox-backup 0/4] don't hold datastore config lock during s3-refresh
@ 2026-05-05 8:11 Christian Ebner
2026-05-05 8:11 ` [PATCH proxmox-backup 1/4] pbs-config: reorganize crate use statements Christian Ebner
` (3 more replies)
0 siblings, 4 replies; 9+ messages in thread
From: Christian Ebner @ 2026-05-05 8:11 UTC (permalink / raw)
To: pbs-devel
Currently an s3-refresh runs while holding the datastore config lock,
with the intention of preventing the corresponding maintenance mode
from being left accidentally, thereby assuring consistency.
However, this also blocks config access for all unrelated datastores,
similar to mount/unmount for removable datastores. For s3-refresh this
is not acceptable, since it can take significantly longer or block on
network issues until timeouts are reached.
Instead, implement per-datastore locks for maintenance-mode changes
and rely on these to prevent the mode from being changed unexpectedly.
The first three patches are code style and refactoring changes only;
the last patch introduces the maintenance-mode locks.
Reported in the community forum: https://forum.proxmox.com/threads/183244/
Christian Ebner (4):
pbs-config: reorganize crate use statements
datastore: move lock files base path constant to central location
datastore: move file lock helper to centralized place
api/datastore: use maintenance-mode lock to protect against changes
pbs-config/src/lib.rs | 18 +++++++------
pbs-datastore/src/backup_info.rs | 36 ++-----------------------
pbs-datastore/src/chunk_store.rs | 5 ++--
pbs-datastore/src/datastore.rs | 2 ++
pbs-datastore/src/lib.rs | 45 +++++++++++++++++++++++++++++++
pbs-datastore/src/move_journal.rs | 2 +-
src/api2/admin/datastore.rs | 26 +++++++++++++-----
src/api2/config/datastore.rs | 6 ++++-
8 files changed, 87 insertions(+), 53 deletions(-)
--
2.47.3
^ permalink raw reply [flat|nested] 9+ messages in thread
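The per-datastore scheme outlined above keeps lock files below a common base directory with one subdirectory per datastore, so acquiring a lock for one store never touches any other store's locks. A minimal sketch of deriving such a lock path (the base directory matches the constant introduced later in the series; the helper itself is illustrative only):

```rust
use std::path::{Path, PathBuf};

// Base path for datastore related file locks, as used by this series.
const DATASTORE_LOCKS_DIR: &str = "/run/proxmox-backup/locks";

// Build the path of a per-datastore lock file, e.g. a maintenance-mode
// lock, without ever involving the global datastore config lock.
fn datastore_lock_path(store: &str, lock_name: &str) -> PathBuf {
    Path::new(DATASTORE_LOCKS_DIR).join(store).join(lock_name)
}

fn main() {
    let path = datastore_lock_path("store-a", "maintenance-mode.lck");
    // prints "/run/proxmox-backup/locks/store-a/maintenance-mode.lck"
    println!("{}", path.display());
}
```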
* [PATCH proxmox-backup 1/4] pbs-config: reorganize crate use statements
2026-05-05 8:11 [PATCH proxmox-backup 0/4] don't hold datastore config lock during s3-refresh Christian Ebner
@ 2026-05-05 8:11 ` Christian Ebner
2026-05-05 12:15 ` applied: " Fabian Grünbichler
2026-05-05 8:11 ` [PATCH proxmox-backup 2/4] datastore: move lock files base path constant to central location Christian Ebner
` (2 subsequent siblings)
3 siblings, 1 reply; 9+ messages in thread
From: Christian Ebner @ 2026-05-05 8:11 UTC (permalink / raw)
To: pbs-devel
Follow the style used throughout the rest of the codebase, grouping
and sorting use statements by crate provenance.
Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
pbs-config/src/lib.rs | 18 ++++++++++--------
1 file changed, 10 insertions(+), 8 deletions(-)
diff --git a/pbs-config/src/lib.rs b/pbs-config/src/lib.rs
index d1bc9f862..454eda2c9 100644
--- a/pbs-config/src/lib.rs
+++ b/pbs-config/src/lib.rs
@@ -1,3 +1,13 @@
+use std::os::unix::prelude::AsRawFd;
+
+use anyhow::{bail, format_err, Error};
+use hex::FromHex;
+use nix::unistd::{Gid, Group, Uid, User};
+
+use proxmox_sys::fs::DirLockGuard;
+
+pub use pbs_buildcfg::{BACKUP_GROUP_NAME, BACKUP_USER_NAME};
+
pub mod acl;
mod cached_user_info;
pub use cached_user_info::CachedUserInfo;
@@ -23,14 +33,6 @@ pub mod verify;
mod config_version_cache;
pub use config_version_cache::ConfigVersionCache;
-use anyhow::{bail, format_err, Error};
-use hex::FromHex;
-use nix::unistd::{Gid, Group, Uid, User};
-use proxmox_sys::fs::DirLockGuard;
-use std::os::unix::prelude::AsRawFd;
-
-pub use pbs_buildcfg::{BACKUP_GROUP_NAME, BACKUP_USER_NAME};
-
/// Return User info for the 'backup' user (``getpwnam_r(3)``)
pub fn backup_user() -> Result<nix::unistd::User, Error> {
if cfg!(test) {
--
2.47.3
^ permalink raw reply related [flat|nested] 9+ messages in thread
* [PATCH proxmox-backup 2/4] datastore: move lock files base path constant to central location
2026-05-05 8:11 [PATCH proxmox-backup 0/4] don't hold datastore config lock during s3-refresh Christian Ebner
2026-05-05 8:11 ` [PATCH proxmox-backup 1/4] pbs-config: reorganize crate use statements Christian Ebner
@ 2026-05-05 8:11 ` Christian Ebner
2026-05-05 8:11 ` [PATCH proxmox-backup 3/4] datastore: move file lock helper to centralized place Christian Ebner
2026-05-05 8:11 ` [PATCH proxmox-backup 4/4] api/datastore: use maintenance-mode lock to protect against changes Christian Ebner
3 siblings, 0 replies; 9+ messages in thread
From: Christian Ebner @ 2026-05-05 8:11 UTC (permalink / raw)
To: pbs-devel
For better organization of the locking-related helpers: this constant
is used for locking in various places, therefore move it to the more
fitting, central crate lib.
No functional changes.
Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
pbs-datastore/src/backup_info.rs | 3 +--
pbs-datastore/src/chunk_store.rs | 3 +--
pbs-datastore/src/lib.rs | 2 ++
pbs-datastore/src/move_journal.rs | 2 +-
4 files changed, 5 insertions(+), 5 deletions(-)
diff --git a/pbs-datastore/src/backup_info.rs b/pbs-datastore/src/backup_info.rs
index 223a8cbd4..9919b908a 100644
--- a/pbs-datastore/src/backup_info.rs
+++ b/pbs-datastore/src/backup_info.rs
@@ -24,9 +24,8 @@ use crate::datastore::{GROUP_NOTES_FILE_NAME, GROUP_OWNER_FILE_NAME};
use crate::manifest::{BackupManifest, MANIFEST_LOCK_NAME};
use crate::move_journal;
use crate::s3::S3_CONTENT_PREFIX;
-use crate::{DataBlob, DataStore, DatastoreBackend};
+use crate::{DataBlob, DataStore, DatastoreBackend, DATASTORE_LOCKS_DIR};
-pub const DATASTORE_LOCKS_DIR: &str = "/run/proxmox-backup/locks";
pub const PROTECTED_MARKER_FILENAME: &str = ".protected";
proxmox_schema::const_regex! {
diff --git a/pbs-datastore/src/chunk_store.rs b/pbs-datastore/src/chunk_store.rs
index 68db88eab..5b07ebdaa 100644
--- a/pbs-datastore/src/chunk_store.rs
+++ b/pbs-datastore/src/chunk_store.rs
@@ -18,12 +18,11 @@ use proxmox_sys::process_locker::{
};
use proxmox_worker_task::WorkerTaskContext;
-use crate::backup_info::DATASTORE_LOCKS_DIR;
use crate::data_blob::DataChunkBuilder;
use crate::file_formats::{
COMPRESSED_BLOB_MAGIC_1_0, ENCRYPTED_BLOB_MAGIC_1_0, UNCOMPRESSED_BLOB_MAGIC_1_0,
};
-use crate::{DataBlob, LocalDatastoreLruCache};
+use crate::{DataBlob, LocalDatastoreLruCache, DATASTORE_LOCKS_DIR};
const USING_MARKER_FILENAME_EXT: &str = "using";
diff --git a/pbs-datastore/src/lib.rs b/pbs-datastore/src/lib.rs
index 6647ee2b6..29d203f87 100644
--- a/pbs-datastore/src/lib.rs
+++ b/pbs-datastore/src/lib.rs
@@ -162,6 +162,8 @@ pub const ACTIVE_OPERATIONS_DIR: &str = concat!(
pbs_buildcfg::PROXMOX_BACKUP_RUN_DIR_M!(),
"/active-operations"
);
+/// Base path for datastore related file locks.
+pub const DATASTORE_LOCKS_DIR: &str = "/run/proxmox-backup/locks";
#[macro_export]
macro_rules! PROXMOX_BACKUP_PROTOCOL_ID_V1 {
diff --git a/pbs-datastore/src/move_journal.rs b/pbs-datastore/src/move_journal.rs
index 891644d7a..fb41e863c 100644
--- a/pbs-datastore/src/move_journal.rs
+++ b/pbs-datastore/src/move_journal.rs
@@ -54,7 +54,7 @@ use proxmox_sys::fs::{open_file_locked, CreateOptions};
use pbs_config::backup_user;
-use crate::backup_info::DATASTORE_LOCKS_DIR;
+use crate::DATASTORE_LOCKS_DIR;
const JOURNAL_FILENAME: &str = "move-journal";
const APPEND_LOCK_TIMEOUT: Duration = Duration::from_secs(10);
--
2.47.3
^ permalink raw reply related [flat|nested] 9+ messages in thread
* [PATCH proxmox-backup 3/4] datastore: move file lock helper to centralized place
2026-05-05 8:11 [PATCH proxmox-backup 0/4] don't hold datastore config lock during s3-refresh Christian Ebner
2026-05-05 8:11 ` [PATCH proxmox-backup 1/4] pbs-config: reorganize crate use statements Christian Ebner
2026-05-05 8:11 ` [PATCH proxmox-backup 2/4] datastore: move lock files base path constant to central location Christian Ebner
@ 2026-05-05 8:11 ` Christian Ebner
2026-05-05 12:15 ` Fabian Grünbichler
2026-05-05 8:11 ` [PATCH proxmox-backup 4/4] api/datastore: use maintenance-mode lock to protect against changes Christian Ebner
3 siblings, 1 reply; 9+ messages in thread
From: Christian Ebner @ 2026-05-05 8:11 UTC (permalink / raw)
To: pbs-devel
Move the implementation to a more central location, as it is used not
only for datastore contents, but also by the chunk store.
No functional changes.
Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
pbs-datastore/src/backup_info.rs | 35 ++------------------------------
pbs-datastore/src/chunk_store.rs | 2 +-
pbs-datastore/src/lib.rs | 34 +++++++++++++++++++++++++++++++
3 files changed, 37 insertions(+), 34 deletions(-)
diff --git a/pbs-datastore/src/backup_info.rs b/pbs-datastore/src/backup_info.rs
index 9919b908a..bae136d3c 100644
--- a/pbs-datastore/src/backup_info.rs
+++ b/pbs-datastore/src/backup_info.rs
@@ -1,5 +1,5 @@
use std::fmt;
-use std::os::unix::io::{AsRawFd, RawFd};
+use std::os::unix::io::RawFd;
use std::os::unix::prelude::OsStrExt;
use std::path::Path;
use std::path::PathBuf;
@@ -24,7 +24,7 @@ use crate::datastore::{GROUP_NOTES_FILE_NAME, GROUP_OWNER_FILE_NAME};
use crate::manifest::{BackupManifest, MANIFEST_LOCK_NAME};
use crate::move_journal;
use crate::s3::S3_CONTENT_PREFIX;
-use crate::{DataBlob, DataStore, DatastoreBackend, DATASTORE_LOCKS_DIR};
+use crate::{lock_helper, DataBlob, DataStore, DatastoreBackend, DATASTORE_LOCKS_DIR};
pub const PROTECTED_MARKER_FILENAME: &str = ".protected";
@@ -1210,34 +1210,3 @@ fn lock_file_path_helper(ns: &BackupNamespace, path: PathBuf) -> PathBuf {
to_return.join(format!("{first_eigthy}...{last_eighty}-{hash}"))
}
-
-/// Helps implement the double stat'ing procedure. It avoids certain race conditions upon lock
-/// deletion.
-///
-/// It also creates the base directory for lock files.
-pub(crate) fn lock_helper<F>(
- store_name: &str,
- path: &std::path::Path,
- lock_fn: F,
-) -> Result<BackupLockGuard, Error>
-where
- F: Fn(&std::path::Path) -> Result<BackupLockGuard, Error>,
-{
- let mut lock_dir = Path::new(DATASTORE_LOCKS_DIR).join(store_name);
-
- if let Some(parent) = path.parent() {
- lock_dir = lock_dir.join(parent);
- };
-
- std::fs::create_dir_all(&lock_dir)?;
-
- let lock = lock_fn(path)?;
-
- let inode = nix::sys::stat::fstat(lock.as_raw_fd())?.st_ino;
-
- if nix::sys::stat::stat(path).map_or(true, |st| inode != st.st_ino) {
- bail!("could not acquire lock, another thread modified the lock file");
- }
-
- Ok(lock)
-}
diff --git a/pbs-datastore/src/chunk_store.rs b/pbs-datastore/src/chunk_store.rs
index 5b07ebdaa..778897974 100644
--- a/pbs-datastore/src/chunk_store.rs
+++ b/pbs-datastore/src/chunk_store.rs
@@ -944,7 +944,7 @@ impl ChunkStore {
timeout: Duration,
) -> Result<BackupLockGuard, Error> {
let lock_path = self.chunk_lock_path(digest);
- let guard = crate::backup_info::lock_helper(self.name(), &lock_path, |path| {
+ let guard = crate::lock_helper(self.name(), &lock_path, |path| {
pbs_config::open_backup_lockfile(path, Some(timeout), true)
})?;
Ok(guard)
diff --git a/pbs-datastore/src/lib.rs b/pbs-datastore/src/lib.rs
index 29d203f87..48acba4c8 100644
--- a/pbs-datastore/src/lib.rs
+++ b/pbs-datastore/src/lib.rs
@@ -157,6 +157,13 @@
#![deny(unsafe_op_in_unsafe_fn)]
+use std::os::unix::io::AsRawFd;
+use std::path::Path;
+
+use anyhow::{bail, Error};
+
+use pbs_config::BackupLockGuard;
+
/// Directory path where active operations counters are saved.
pub const ACTIVE_OPERATIONS_DIR: &str = concat!(
pbs_buildcfg::PROXMOX_BACKUP_RUN_DIR_M!(),
@@ -237,3 +244,30 @@ pub use local_chunk_reader::LocalChunkReader;
mod local_datastore_lru_cache;
pub use local_datastore_lru_cache::LocalDatastoreLruCache;
+
+/// Helps implement the double stat'ing procedure. It avoids certain race conditions upon lock
+/// deletion.
+///
+/// It also creates the base directory for lock files.
+pub fn lock_helper<F>(store_name: &str, path: &Path, lock_fn: F) -> Result<BackupLockGuard, Error>
+where
+ F: Fn(&Path) -> Result<BackupLockGuard, Error>,
+{
+ let mut lock_dir = Path::new(DATASTORE_LOCKS_DIR).join(store_name);
+
+ if let Some(parent) = path.parent() {
+ lock_dir = lock_dir.join(parent);
+ };
+
+ std::fs::create_dir_all(&lock_dir)?;
+
+ let lock = lock_fn(path)?;
+
+ let inode = nix::sys::stat::fstat(lock.as_raw_fd())?.st_ino;
+
+ if nix::sys::stat::stat(path).map_or(true, |st| inode != st.st_ino) {
+ bail!("could not acquire lock, another thread modified the lock file");
+ }
+
+ Ok(lock)
+}
--
2.47.3
^ permalink raw reply related [flat|nested] 9+ messages in thread
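The lock_helper() moved in this patch implements the double stat'ing check: after the lock function returns, the inode of the open descriptor is compared against a fresh stat of the path, which detects the lock file having been deleted and recreated concurrently. A simplified, std-only illustration of that inode comparison (the real helper uses nix and a flock-based BackupLockGuard; this sketch skips the actual flock call):

```rust
use std::fs::{self, File};
use std::os::unix::fs::MetadataExt;
use std::path::Path;

// After acquiring a lock on an open file, compare the inode of the open
// descriptor against a fresh stat of the path: if they differ, or the path
// is gone, the lock file was replaced underneath us and the lock is stale.
fn lock_still_valid(lock: &File, path: &Path) -> std::io::Result<bool> {
    let fd_ino = lock.metadata()?.ino();
    Ok(fs::metadata(path).map_or(false, |st| st.ino() == fd_ino))
}

fn main() -> std::io::Result<()> {
    let path = Path::new("/tmp/double-stat-demo.lck");
    let lock = File::create(path)?;
    // prints "valid: true"
    println!("valid: {}", lock_still_valid(&lock, path)?);

    // simulate the race: the file is removed and recreated by someone else;
    // the new file must get a different inode while the old fd is still open
    fs::remove_file(path)?;
    let _other = File::create(path)?;
    // prints "valid after replace: false"
    println!("valid after replace: {}", lock_still_valid(&lock, path)?);
    fs::remove_file(path)?;
    Ok(())
}
```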
* [PATCH proxmox-backup 4/4] api/datastore: use maintenance-mode lock to protect against changes
2026-05-05 8:11 [PATCH proxmox-backup 0/4] don't hold datastore config lock during s3-refresh Christian Ebner
` (2 preceding siblings ...)
2026-05-05 8:11 ` [PATCH proxmox-backup 3/4] datastore: move file lock helper to centralized place Christian Ebner
@ 2026-05-05 8:11 ` Christian Ebner
2026-05-05 12:14 ` Fabian Grünbichler
3 siblings, 1 reply; 9+ messages in thread
From: Christian Ebner @ 2026-05-05 8:11 UTC (permalink / raw)
To: pbs-devel
Instead of locking the whole config, which also makes unrelated
datastores unavailable, use a per-datastore maintenance-mode lock
to protect against changes during potentially long-running
operations such as s3-refresh or removable datastore mount/unmount.
However, keep the current semantics: first set the maintenance
mode, still allow opting out of the mode while waiting for active
operations to finish, and only then enforce the maintenance mode
during the s3-refresh or removable datastore mount operation.
unset_maintenance() is adapted to either reacquire the datastore
config lock or use the optional lock guard.
Fixes: https://forum.proxmox.com/threads/183244/
Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
pbs-datastore/src/datastore.rs | 2 ++
pbs-datastore/src/lib.rs | 11 ++++++++++-
src/api2/admin/datastore.rs | 26 ++++++++++++++++++++------
src/api2/config/datastore.rs | 6 +++++-
4 files changed, 37 insertions(+), 8 deletions(-)
diff --git a/pbs-datastore/src/datastore.rs b/pbs-datastore/src/datastore.rs
index 34efcd398..aa372cd73 100644
--- a/pbs-datastore/src/datastore.rs
+++ b/pbs-datastore/src/datastore.rs
@@ -3036,6 +3036,7 @@ impl DataStore {
let (mut config, _digest) = pbs_config::datastore::config()?;
let mut datastore_config: DataStoreConfig = config.lookup("datastore", name)?;
+ let maintenance_mode_lock = crate::maintenance_mode_lock(name)?;
datastore_config.set_maintenance_mode(Some(MaintenanceMode {
ty: MaintenanceType::Delete,
message: None,
@@ -3043,6 +3044,7 @@ impl DataStore {
config.set_data(name, "datastore", &datastore_config)?;
pbs_config::datastore::save_config(&config)?;
+ drop(maintenance_mode_lock);
drop(config_lock);
let (operations, _lock) = task_tracking::get_active_operations_locked(name)?;
diff --git a/pbs-datastore/src/lib.rs b/pbs-datastore/src/lib.rs
index 48acba4c8..823f3bf2f 100644
--- a/pbs-datastore/src/lib.rs
+++ b/pbs-datastore/src/lib.rs
@@ -159,8 +159,9 @@
use std::os::unix::io::AsRawFd;
use std::path::Path;
+use std::time::Duration;
-use anyhow::{bail, Error};
+use anyhow::{bail, Context, Error};
use pbs_config::BackupLockGuard;
@@ -271,3 +272,11 @@ where
Ok(lock)
}
+
+/// Acquire an exclusive lock for the datastore's maintenance-mode
+pub fn maintenance_mode_lock(store: &str) -> Result<BackupLockGuard, Error> {
+ lock_helper(store, Path::new("maintenance-mode.lck"), |p| {
+ pbs_config::open_backup_lockfile(p, Some(Duration::from_secs(0)), true)
+ .context("unable to acquire exclusive datastore's maintenance-mode lock")
+ })
+}
diff --git a/src/api2/admin/datastore.rs b/src/api2/admin/datastore.rs
index 1e0a78f3a..883091e6f 100644
--- a/src/api2/admin/datastore.rs
+++ b/src/api2/admin/datastore.rs
@@ -61,8 +61,8 @@ use pbs_datastore::index::IndexFile;
use pbs_datastore::manifest::BackupManifest;
use pbs_datastore::prune::compute_prune_info;
use pbs_datastore::{
- check_backup_owner, ensure_datastore_is_mounted, task_tracking, BackupDir, DataStore,
- LocalChunkReader, StoreProgress,
+ check_backup_owner, ensure_datastore_is_mounted, maintenance_mode_lock, task_tracking,
+ BackupDir, DataStore, LocalChunkReader, StoreProgress,
};
use pbs_tools::json::required_string_param;
use proxmox_rest_server::{formatter, worker_is_active, WorkerTask};
@@ -2705,9 +2705,13 @@ fn expect_maintenance_type(
}
fn unset_maintenance(
- _lock: pbs_config::BackupLockGuard,
+ lock: Option<pbs_config::BackupLockGuard>,
mut config: DataStoreConfig,
) -> Result<(), Error> {
+ let _lock = match lock {
+ Some(lock) => lock,
+ None => pbs_config::datastore::lock_config()?,
+ };
let (mut section_config, _digest) = pbs_config::datastore::config()?;
config.maintenance_mode = None;
section_config.set_data(&config.name, "datastore", &config)?;
@@ -2763,6 +2767,7 @@ async fn do_unmount(store: String, auth_id: Authid, to_stdout: bool) -> Result<V
let (mut section_config, _digest) = pbs_config::datastore::config()?;
let mut datastore: DataStoreConfig = section_config.lookup("datastore", &store)?;
+ let maintenance_mode_lock = maintenance_mode_lock(&store)?;
datastore.set_maintenance_mode(Some(MaintenanceMode {
ty: MaintenanceType::Unmount,
message: None,
@@ -2770,6 +2775,7 @@ async fn do_unmount(store: String, auth_id: Authid, to_stdout: bool) -> Result<V
section_config.set_data(&store, "datastore", &datastore)?;
pbs_config::datastore::save_config(&section_config)?;
+ drop(maintenance_mode_lock);
drop(_lock);
if let Ok(proxy_pid) = proxmox_rest_server::read_pid(pbs_buildcfg::PROXMOX_BACKUP_PROXY_PID_FN)
@@ -2845,6 +2851,7 @@ pub fn s3_refresh(store: String, rpcenv: &mut dyn RpcEnvironment) -> Result<Valu
let (mut section_config, _digest) = pbs_config::datastore::config()?;
let mut datastore_config: DataStoreConfig = section_config.lookup("datastore", &store)?;
+ let maintenance_mode_lock = maintenance_mode_lock(&store)?;
datastore_config.set_maintenance_mode(Some(MaintenanceMode {
ty: MaintenanceType::S3Refresh,
message: None,
@@ -2852,6 +2859,7 @@ pub fn s3_refresh(store: String, rpcenv: &mut dyn RpcEnvironment) -> Result<Valu
section_config.set_data(&store, "datastore", &datastore_config)?;
pbs_config::datastore::save_config(&section_config)?;
+ drop(maintenance_mode_lock);
drop(_lock);
let upid = WorkerTask::new_thread(
@@ -2875,7 +2883,8 @@ pub(crate) fn do_s3_refresh(store: &str, worker: &dyn WorkerTaskContext) -> Resu
}
/// Wait for no more active operations on the given datastore and run the provided callback in
-/// a datastore locked context, protecting against maintenance mode changes.
+/// a maintenance-mode locked context, protecting against leaving the mode for the function's
+/// lifetime.
fn run_maintenance_locked(
store: &str,
maintenance_expected: MaintenanceType,
@@ -2908,15 +2917,20 @@ fn run_maintenance_locked(
let _ = expect_maintenance_type(store, maintenance_expected)
.inspect_err(|e| warn!("maintenance mode was not as expected: {e}"))
.and_then(|(lock, config)| {
- unset_maintenance(lock, config)
+ unset_maintenance(Some(lock), config)
.inspect_err(|e| warn!("could not reset maintenance mode: {e}"))
});
bail!("aborted, due to user request");
}
let (lock, config) = expect_maintenance_type(store, maintenance_expected)?;
+ let _maintenance_mode_lock = maintenance_mode_lock(&config.name)?;
+ // avoid blocking config access/updates, but keep maintenance-mode locked
+ // until done.
+ drop(lock);
+
callback()?;
- unset_maintenance(lock, config)
+ unset_maintenance(None, config)
.map_err(|e| format_err!("could not reset maintenance mode: {e}"))?;
Ok(())
diff --git a/src/api2/config/datastore.rs b/src/api2/config/datastore.rs
index 16e85a636..b07e7fc7e 100644
--- a/src/api2/config/datastore.rs
+++ b/src/api2/config/datastore.rs
@@ -31,7 +31,9 @@ use crate::api2::config::tape_backup_job::{delete_tape_backup_job, list_tape_bac
use crate::api2::config::verify::delete_verification_job;
use pbs_config::CachedUserInfo;
-use pbs_datastore::{get_datastore_mount_status, DataStore, S3_DATASTORE_IN_USE_MARKER};
+use pbs_datastore::{
+ get_datastore_mount_status, maintenance_mode_lock, DataStore, S3_DATASTORE_IN_USE_MARKER,
+};
use proxmox_rest_server::WorkerTask;
use proxmox_s3_client::{S3ObjectKey, S3_HTTP_REQUEST_TIMEOUT};
@@ -540,6 +542,7 @@ pub fn update_datastore(
data.tuning = None;
}
DeletableProperty::MaintenanceMode => {
+ let _maintenance_mode_lock = maintenance_mode_lock(&name)?;
data.set_maintenance_mode(None)?;
}
DeletableProperty::NotificationThresholds => {
@@ -638,6 +641,7 @@ pub fn update_datastore(
)?),
None => None,
};
+ let _maintenance_mode_lock = maintenance_mode_lock(&name)?;
data.set_maintenance_mode(maintenance_mode)?;
}
--
2.47.3
^ permalink raw reply related [flat|nested] 9+ messages in thread
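The adapted unset_maintenance() either reuses a guard the caller already holds or reacquires the lock itself. That pattern is independent of the file-based locking; a small sketch with a std mutex standing in for the BackupLockGuard (names and return value are illustrative):

```rust
use std::sync::{Mutex, MutexGuard};

// Stand-in for the datastore config lock; in the patch this is a file lock.
static CONFIG_LOCK: Mutex<()> = Mutex::new(());

// Either reuse a guard the caller already holds, or acquire it here. In
// both cases the guard lives until the end of the critical section.
fn unset_maintenance(lock: Option<MutexGuard<'static, ()>>) -> &'static str {
    let _lock = match lock {
        Some(guard) => guard,
        None => CONFIG_LOCK.lock().unwrap(),
    };
    // ... clear the maintenance mode in the config while holding _lock ...
    "maintenance mode cleared"
}

fn main() {
    // caller already holds the lock and hands it over
    let held = CONFIG_LOCK.lock().unwrap();
    println!("{}", unset_maintenance(Some(held)));
    // caller holds nothing; the function acquires the lock itself
    println!("{}", unset_maintenance(None));
}
```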
* Re: [PATCH proxmox-backup 4/4] api/datastore: use maintenance-mode lock to protect against changes
2026-05-05 8:11 ` [PATCH proxmox-backup 4/4] api/datastore: use maintenance-mode lock to protect against changes Christian Ebner
@ 2026-05-05 12:14 ` Fabian Grünbichler
2026-05-05 13:02 ` Christian Ebner
0 siblings, 1 reply; 9+ messages in thread
From: Fabian Grünbichler @ 2026-05-05 12:14 UTC (permalink / raw)
To: Christian Ebner, pbs-devel
On May 5, 2026 10:11 am, Christian Ebner wrote:
> Instead of locking the whole config, which causes also unrelated
> datastores to not be available, use a per-datastore maintenance mode
> lock to protect against changes during potentially longer running
> operations such as s3-refresh or removable datastore mount/unmount.
>
> However keep the current semantic of first setting the maintenance
> mode, allow to still opt out from the mode while waiting for active
> operations to finish, and only then enforce the maintenance mode
> during the s3 refresh or removable datastore mount operation.
>
> unset_maintenance() is adapted to either reacquire the datastore
> config lock or use the optional lock guard.
>
> Fixes: https://forum.proxmox.com/threads/183244/
> Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
> ---
> pbs-datastore/src/datastore.rs | 2 ++
> pbs-datastore/src/lib.rs | 11 ++++++++++-
> src/api2/admin/datastore.rs | 26 ++++++++++++++++++++------
> src/api2/config/datastore.rs | 6 +++++-
> 4 files changed, 37 insertions(+), 8 deletions(-)
>
> diff --git a/pbs-datastore/src/datastore.rs b/pbs-datastore/src/datastore.rs
> index 34efcd398..aa372cd73 100644
> --- a/pbs-datastore/src/datastore.rs
> +++ b/pbs-datastore/src/datastore.rs
> @@ -3036,6 +3036,7 @@ impl DataStore {
> let (mut config, _digest) = pbs_config::datastore::config()?;
> let mut datastore_config: DataStoreConfig = config.lookup("datastore", name)?;
>
> + let maintenance_mode_lock = crate::maintenance_mode_lock(name)?;
> datastore_config.set_maintenance_mode(Some(MaintenanceMode {
meh that this is in pbs-api-types, else we could require the lock to be
passed in to ensure we never miss call sites..
right now, datastore creation doesn't take the maintenance_mode_lock
despite setting it (in the reuse-existing S3 case), and it suffers from
a similar problem - it holds the lock for the whole duration of the
creation, but that also entails S3 requests that are allowed to take up
to 30m(!) - and other expensive things. have we maybe found our second
use case for this lock? if we hold it during creation of datastores, we
could drop the config lock before re-acquiring it for persisting the
config entry? or we need to move this into a two-phase creation, first
add the entry with maintenance mode "creating", then while only holding
the maintenance_mode_lock do all the prep work, and then at the end
re-acquire the config lock and unset the maintenance mode? this would
have considerable overlap with how we handle s3 refreshing..
IMHO moving to a two-step creation would be cleanest, since we've also
had reports in the past about a long chunk store initialization being
cumbersome (since it effectively blocks all modifications of the config
while it is going on).
> ty: MaintenanceType::Delete,
> message: None,
> @@ -3043,6 +3044,7 @@ impl DataStore {
>
> config.set_data(name, "datastore", &datastore_config)?;
> pbs_config::datastore::save_config(&config)?;
> + drop(maintenance_mode_lock);
> drop(config_lock);
>
> let (operations, _lock) = task_tracking::get_active_operations_locked(name)?;
> diff --git a/pbs-datastore/src/lib.rs b/pbs-datastore/src/lib.rs
> index 48acba4c8..823f3bf2f 100644
> --- a/pbs-datastore/src/lib.rs
> +++ b/pbs-datastore/src/lib.rs
> @@ -159,8 +159,9 @@
>
> use std::os::unix::io::AsRawFd;
> use std::path::Path;
> +use std::time::Duration;
>
> -use anyhow::{bail, Error};
> +use anyhow::{bail, Context, Error};
>
> use pbs_config::BackupLockGuard;
>
> @@ -271,3 +272,11 @@ where
>
> Ok(lock)
> }
> +
> +/// Acquire an exclusive lock for the datastore's maintenance-mode
> +pub fn maintenance_mode_lock(store: &str) -> Result<BackupLockGuard, Error> {
> + lock_helper(store, Path::new("maintenance-mode.lck"), |p| {
> + pbs_config::open_backup_lockfile(p, Some(Duration::from_secs(0)), true)
this might warrant a comment, usually we open config related locks with
a timeout to prevent flaky locking in case of concurrency, waiting a few
seconds is normally fine.
but here, we must always first acquire the datastore config lock (should
we encode that in the signature??) anyway, which makes the 0
timeout/non-blocking behaviour okay, if I understand it correctly?
> + .context("unable to acquire exclusive datastore's maintenance-mode lock")
> + })
> +}
> diff --git a/src/api2/admin/datastore.rs b/src/api2/admin/datastore.rs
> index 1e0a78f3a..883091e6f 100644
> --- a/src/api2/admin/datastore.rs
> +++ b/src/api2/admin/datastore.rs
> @@ -61,8 +61,8 @@ use pbs_datastore::index::IndexFile;
> use pbs_datastore::manifest::BackupManifest;
> use pbs_datastore::prune::compute_prune_info;
> use pbs_datastore::{
> - check_backup_owner, ensure_datastore_is_mounted, task_tracking, BackupDir, DataStore,
> - LocalChunkReader, StoreProgress,
> + check_backup_owner, ensure_datastore_is_mounted, maintenance_mode_lock, task_tracking,
> + BackupDir, DataStore, LocalChunkReader, StoreProgress,
> };
> use pbs_tools::json::required_string_param;
> use proxmox_rest_server::{formatter, worker_is_active, WorkerTask};
> @@ -2705,9 +2705,13 @@ fn expect_maintenance_type(
> }
>
> fn unset_maintenance(
> - _lock: pbs_config::BackupLockGuard,
> + lock: Option<pbs_config::BackupLockGuard>,
I think this should take Option<(lock, lock)>, and
expect_maintenance_type should acquire and return both locks, but it
needs to be adapted further, because atm it is broken:
1. start s3-refresh while active operations are running
2. s3-refresh will be in loop waiting for operations to disappear
3. roughly every second, it will call expect_maintenance_type
4. if somebody else is holding the datastore config lock for more than
ten seconds (let's say you started a re-use existing datastore
creation with S3 ;)), expect_maintenance_type will error out!
we currently call expect_maintenance_type in three places:
1. to check whether the maintenance mode changed under us while waiting
for operations to stop (no need for locking here)
2. to check whether the maintenance mode changed under us if we abort
waiting - we don't really need to lock here if we move this call to
unset_maintenance
3. to do a final check before we do the actual work - this one uses the
locks then, but we could just as well lock first, then check?
what do you think about the following (untested!):
----8<----
diff --git a/src/api2/admin/datastore.rs b/src/api2/admin/datastore.rs
index 883091e6f..1acb0baf3 100644
--- a/src/api2/admin/datastore.rs
+++ b/src/api2/admin/datastore.rs
@@ -49,7 +49,7 @@ use pbs_api_types::{
VERIFY_JOB_READ_THREADS_SCHEMA, VERIFY_JOB_VERIFY_THREADS_SCHEMA,
};
use pbs_client::pxar::{create_tar, create_zip};
-use pbs_config::CachedUserInfo;
+use pbs_config::{BackupLockGuard, CachedUserInfo};
use pbs_datastore::backup_info::BackupInfo;
use pbs_datastore::cached_chunk_reader::CachedChunkReader;
use pbs_datastore::catalog::{ArchiveEntry, CatalogReader};
@@ -2686,11 +2686,7 @@ pub fn mount(store: String, rpcenv: &mut dyn RpcEnvironment) -> Result<Value, Er
Ok(json!(upid))
}
-fn expect_maintenance_type(
- store: &str,
- maintenance_type: MaintenanceType,
-) -> Result<(pbs_config::BackupLockGuard, DataStoreConfig), Error> {
- let lock = pbs_config::datastore::lock_config()?;
+fn expect_maintenance_type(store: &str, maintenance_type: MaintenanceType) -> Result<(), Error> {
let (section_config, _digest) = pbs_config::datastore::config()?;
let store_config: DataStoreConfig = section_config.lookup("datastore", store)?;
@@ -2701,20 +2697,24 @@ fn expect_maintenance_type(
bail!("maintenance mode is not '{maintenance_type}'");
}
- Ok((lock, store_config))
+ Ok(())
}
fn unset_maintenance(
- lock: Option<pbs_config::BackupLockGuard>,
- mut config: DataStoreConfig,
+ maintenance_lock: Option<BackupLockGuard>,
+ store: &str,
+ expected: MaintenanceType,
) -> Result<(), Error> {
- let _lock = match lock {
+ let _lock = pbs_config::datastore::lock_config()?;
+ let _maintenance_mode_lock = match maintenance_lock {
Some(lock) => lock,
- None => pbs_config::datastore::lock_config()?,
+ None => maintenance_mode_lock(store)?,
};
+ expect_maintenance_type(store, expected)?;
let (mut section_config, _digest) = pbs_config::datastore::config()?;
- config.maintenance_mode = None;
- section_config.set_data(&config.name, "datastore", &config)?;
+ let mut store_config: DataStoreConfig = section_config.lookup("datastore", store)?;
+ store_config.maintenance_mode = None;
+ section_config.set_data(store, "datastore", &store_config)?;
pbs_config::datastore::save_config(&section_config)?;
Ok(())
}
@@ -2914,23 +2914,20 @@ fn run_maintenance_locked(
}
if aborted || worker.abort_requested() {
- let _ = expect_maintenance_type(store, maintenance_expected)
- .inspect_err(|e| warn!("maintenance mode was not as expected: {e}"))
- .and_then(|(lock, config)| {
- unset_maintenance(Some(lock), config)
- .inspect_err(|e| warn!("could not reset maintenance mode: {e}"))
- });
+ let _ = unset_maintenance(None, store, maintenance_expected)
+ .inspect_err(|e| warn!("could not reset maintenance mode: {e}"));
bail!("aborted, due to user request");
}
- let (lock, config) = expect_maintenance_type(store, maintenance_expected)?;
- let _maintenance_mode_lock = maintenance_mode_lock(&config.name)?;
+ let lock = pbs_config::datastore::lock_config()?;
+ let maintenance_mode_lock = maintenance_mode_lock(store)?;
+ expect_maintenance_type(store, maintenance_expected)?;
// avoid blocking config access/updates, but keep maintenance-mode locked
// until done.
drop(lock);
callback()?;
- unset_maintenance(None, config)
+ unset_maintenance(Some(maintenance_mode_lock), store, maintenance_expected)
.map_err(|e| format_err!("could not reset maintenance mode: {e}"))?;
Ok(())
---->8----
the tiny bit of double work of reparsing the config is IMHO not that
bad, since this is not on any hot path..
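The waiting behaviour under discussion — polling roughly once per second until no active operations remain, re-checking the expected state each round — can be reduced to a small schematic loop. This sketch uses an atomic counter in place of the task-tracking files and shortened sleeps; the real code additionally re-checks the maintenance mode and honours abort requests on every iteration:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::time::{Duration, Instant};

// Stand-in for the per-datastore active-operations counter.
static ACTIVE_OPERATIONS: AtomicUsize = AtomicUsize::new(0);

// Poll until no operations remain or the deadline passes.
fn wait_for_idle(timeout: Duration) -> Result<(), String> {
    let deadline = Instant::now() + timeout;
    while ACTIVE_OPERATIONS.load(Ordering::SeqCst) > 0 {
        if Instant::now() >= deadline {
            return Err("timed out waiting for active operations".into());
        }
        std::thread::sleep(Duration::from_millis(10));
    }
    Ok(())
}

fn main() {
    ACTIVE_OPERATIONS.store(0, Ordering::SeqCst);
    // prints "idle: Ok(())"
    println!("idle: {:?}", wait_for_idle(Duration::from_millis(50)));
}
```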
> mut config: DataStoreConfig,
> ) -> Result<(), Error> {
> + let _lock = match lock {
> + Some(lock) => lock,
> + None => pbs_config::datastore::lock_config()?,
> + };
> let (mut section_config, _digest) = pbs_config::datastore::config()?;
> config.maintenance_mode = None;
> section_config.set_data(&config.name, "datastore", &config)?;
> @@ -2763,6 +2767,7 @@ async fn do_unmount(store: String, auth_id: Authid, to_stdout: bool) -> Result<V
> let (mut section_config, _digest) = pbs_config::datastore::config()?;
> let mut datastore: DataStoreConfig = section_config.lookup("datastore", &store)?;
>
> + let maintenance_mode_lock = maintenance_mode_lock(&store)?;
> datastore.set_maintenance_mode(Some(MaintenanceMode {
> ty: MaintenanceType::Unmount,
> message: None,
> @@ -2770,6 +2775,7 @@ async fn do_unmount(store: String, auth_id: Authid, to_stdout: bool) -> Result<V
> section_config.set_data(&store, "datastore", &datastore)?;
> pbs_config::datastore::save_config(&section_config)?;
>
> + drop(maintenance_mode_lock);
> drop(_lock);
>
> if let Ok(proxy_pid) = proxmox_rest_server::read_pid(pbs_buildcfg::PROXMOX_BACKUP_PROXY_PID_FN)
> @@ -2845,6 +2851,7 @@ pub fn s3_refresh(store: String, rpcenv: &mut dyn RpcEnvironment) -> Result<Valu
> let (mut section_config, _digest) = pbs_config::datastore::config()?;
> let mut datastore_config: DataStoreConfig = section_config.lookup("datastore", &store)?;
>
> + let maintenance_mode_lock = maintenance_mode_lock(&store)?;
> datastore_config.set_maintenance_mode(Some(MaintenanceMode {
> ty: MaintenanceType::S3Refresh,
> message: None,
> @@ -2852,6 +2859,7 @@ pub fn s3_refresh(store: String, rpcenv: &mut dyn RpcEnvironment) -> Result<Valu
>
> section_config.set_data(&store, "datastore", &datastore_config)?;
> pbs_config::datastore::save_config(&section_config)?;
> + drop(maintenance_mode_lock);
> drop(_lock);
>
> let upid = WorkerTask::new_thread(
> @@ -2875,7 +2883,8 @@ pub(crate) fn do_s3_refresh(store: &str, worker: &dyn WorkerTaskContext) -> Resu
> }
>
> /// Wait for no more active operations on the given datastore and run the provided callback in
> -/// a datastore locked context, protecting against maintenance mode changes.
> +/// a maintenance-mode locked context, protecting against leaving the mode for the function's
> +/// lifetime.
> fn run_maintenance_locked(
> store: &str,
> maintenance_expected: MaintenanceType,
> @@ -2908,15 +2917,20 @@ fn run_maintenance_locked(
> let _ = expect_maintenance_type(store, maintenance_expected)
> .inspect_err(|e| warn!("maintenance mode was not as expected: {e}"))
> .and_then(|(lock, config)| {
> - unset_maintenance(lock, config)
> + unset_maintenance(Some(lock), config)
> .inspect_err(|e| warn!("could not reset maintenance mode: {e}"))
> });
> bail!("aborted, due to user request");
> }
>
> let (lock, config) = expect_maintenance_type(store, maintenance_expected)?;
> + let _maintenance_mode_lock = maintenance_mode_lock(&config.name)?;
> + // avoid blocking config access/updates, but keep maintenance-mode locked
> + // until done.
> + drop(lock);
> +
> callback()?;
> - unset_maintenance(lock, config)
> + unset_maintenance(None, config)
> .map_err(|e| format_err!("could not reset maintenance mode: {e}"))?;
>
> Ok(())
> diff --git a/src/api2/config/datastore.rs b/src/api2/config/datastore.rs
> index 16e85a636..b07e7fc7e 100644
> --- a/src/api2/config/datastore.rs
> +++ b/src/api2/config/datastore.rs
> @@ -31,7 +31,9 @@ use crate::api2::config::tape_backup_job::{delete_tape_backup_job, list_tape_bac
> use crate::api2::config::verify::delete_verification_job;
> use pbs_config::CachedUserInfo;
>
> -use pbs_datastore::{get_datastore_mount_status, DataStore, S3_DATASTORE_IN_USE_MARKER};
> +use pbs_datastore::{
> + get_datastore_mount_status, maintenance_mode_lock, DataStore, S3_DATASTORE_IN_USE_MARKER,
> +};
> use proxmox_rest_server::WorkerTask;
> use proxmox_s3_client::{S3ObjectKey, S3_HTTP_REQUEST_TIMEOUT};
>
> @@ -540,6 +542,7 @@ pub fn update_datastore(
> data.tuning = None;
> }
> DeletableProperty::MaintenanceMode => {
> + let _maintenance_mode_lock = maintenance_mode_lock(&name)?;
> data.set_maintenance_mode(None)?;
> }
> DeletableProperty::NotificationThresholds => {
> @@ -638,6 +641,7 @@ pub fn update_datastore(
> )?),
> None => None,
> };
> + let _maintenance_mode_lock = maintenance_mode_lock(&name)?;
> data.set_maintenance_mode(maintenance_mode)?;
> }
>
> --
> 2.47.3
^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH proxmox-backup 3/4] datastore: move file lock helper to centralized place
2026-05-05 8:11 ` [PATCH proxmox-backup 3/4] datastore: move file lock helper to centralized place Christian Ebner
@ 2026-05-05 12:15 ` Fabian Grünbichler
0 siblings, 0 replies; 9+ messages in thread
From: Fabian Grünbichler @ 2026-05-05 12:15 UTC (permalink / raw)
To: Christian Ebner, pbs-devel
On May 5, 2026 10:11 am, Christian Ebner wrote:
> Moves the implementation to a more central location, as it is not
> only used by the datastore contents, but also by the chunk store.
we could also consider moving it to its own place?
we have:
- lock_chunk in the chunk_store (has no dependency on the
chunk/datastore other than its name)
- lock_file_path_helper in backup_info
- lock_helper
- the DIR constant
- lock_path implementations for
-- chunk (digest)
-- group
-- snapshot
-- manifests
I've also wondered in the past a few times whether we'd not want to make
the BackupLockGuard generic to encode *what* kind of entity is being
locked, to improve function signatures.. though that might be a bigger
change ;)
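A minimal sketch of what such a generic guard could look like, using a zero-sized type parameter to encode the locked entity in the signature (the marker type names are illustrative, not actual pbs-datastore types):

```rust
use std::marker::PhantomData;

// Illustrative marker types for lockable entities; these names are
// assumptions, not the real pbs-datastore types.
pub struct Chunk;
pub struct Snapshot;
pub struct MaintenanceMode;

// A guard carrying the locked entity in its type. PhantomData keeps the
// type parameter without storing a value; the real guard would also
// hold the open lock-file descriptor.
pub struct BackupLockGuard<T> {
    _entity: PhantomData<T>,
}

impl<T> BackupLockGuard<T> {
    pub fn new() -> Self {
        BackupLockGuard {
            _entity: PhantomData,
        }
    }
}

// Functions can now demand proof that a *specific* entity is locked;
// passing a BackupLockGuard<Chunk> here is a compile-time error.
pub fn unset_maintenance(_proof: &BackupLockGuard<MaintenanceMode>) -> Result<(), String> {
    Ok(())
}

fn main() {
    let guard = BackupLockGuard::<MaintenanceMode>::new();
    assert!(unset_maintenance(&guard).is_ok());
    println!("accepted maintenance-mode guard");
}
```

The guard stays zero-cost at runtime; only the function signatures gain precision.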
>
> No functional changes.
>
> Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
> ---
> pbs-datastore/src/backup_info.rs | 35 ++------------------------------
> pbs-datastore/src/chunk_store.rs | 2 +-
> pbs-datastore/src/lib.rs | 34 +++++++++++++++++++++++++++++++
> 3 files changed, 37 insertions(+), 34 deletions(-)
>
> diff --git a/pbs-datastore/src/backup_info.rs b/pbs-datastore/src/backup_info.rs
> index 9919b908a..bae136d3c 100644
> --- a/pbs-datastore/src/backup_info.rs
> +++ b/pbs-datastore/src/backup_info.rs
> @@ -1,5 +1,5 @@
> use std::fmt;
> -use std::os::unix::io::{AsRawFd, RawFd};
> +use std::os::unix::io::RawFd;
> use std::os::unix::prelude::OsStrExt;
> use std::path::Path;
> use std::path::PathBuf;
> @@ -24,7 +24,7 @@ use crate::datastore::{GROUP_NOTES_FILE_NAME, GROUP_OWNER_FILE_NAME};
> use crate::manifest::{BackupManifest, MANIFEST_LOCK_NAME};
> use crate::move_journal;
> use crate::s3::S3_CONTENT_PREFIX;
> -use crate::{DataBlob, DataStore, DatastoreBackend, DATASTORE_LOCKS_DIR};
> +use crate::{lock_helper, DataBlob, DataStore, DatastoreBackend, DATASTORE_LOCKS_DIR};
>
> pub const PROTECTED_MARKER_FILENAME: &str = ".protected";
>
> @@ -1210,34 +1210,3 @@ fn lock_file_path_helper(ns: &BackupNamespace, path: PathBuf) -> PathBuf {
>
> to_return.join(format!("{first_eigthy}...{last_eighty}-{hash}"))
> }
> -
> -/// Helps implement the double stat'ing procedure. It avoids certain race conditions upon lock
> -/// deletion.
> -///
> -/// It also creates the base directory for lock files.
> -pub(crate) fn lock_helper<F>(
> - store_name: &str,
> - path: &std::path::Path,
> - lock_fn: F,
> -) -> Result<BackupLockGuard, Error>
> -where
> - F: Fn(&std::path::Path) -> Result<BackupLockGuard, Error>,
> -{
> - let mut lock_dir = Path::new(DATASTORE_LOCKS_DIR).join(store_name);
> -
> - if let Some(parent) = path.parent() {
> - lock_dir = lock_dir.join(parent);
> - };
> -
> - std::fs::create_dir_all(&lock_dir)?;
> -
> - let lock = lock_fn(path)?;
> -
> - let inode = nix::sys::stat::fstat(lock.as_raw_fd())?.st_ino;
> -
> - if nix::sys::stat::stat(path).map_or(true, |st| inode != st.st_ino) {
> - bail!("could not acquire lock, another thread modified the lock file");
> - }
> -
> - Ok(lock)
> -}
> diff --git a/pbs-datastore/src/chunk_store.rs b/pbs-datastore/src/chunk_store.rs
> index 5b07ebdaa..778897974 100644
> --- a/pbs-datastore/src/chunk_store.rs
> +++ b/pbs-datastore/src/chunk_store.rs
> @@ -944,7 +944,7 @@ impl ChunkStore {
> timeout: Duration,
> ) -> Result<BackupLockGuard, Error> {
> let lock_path = self.chunk_lock_path(digest);
> - let guard = crate::backup_info::lock_helper(self.name(), &lock_path, |path| {
> + let guard = crate::lock_helper(self.name(), &lock_path, |path| {
> pbs_config::open_backup_lockfile(path, Some(timeout), true)
> })?;
> Ok(guard)
> diff --git a/pbs-datastore/src/lib.rs b/pbs-datastore/src/lib.rs
> index 29d203f87..48acba4c8 100644
> --- a/pbs-datastore/src/lib.rs
> +++ b/pbs-datastore/src/lib.rs
> @@ -157,6 +157,13 @@
>
> #![deny(unsafe_op_in_unsafe_fn)]
>
> +use std::os::unix::io::AsRawFd;
> +use std::path::Path;
> +
> +use anyhow::{bail, Error};
> +
> +use pbs_config::BackupLockGuard;
> +
> /// Directory path where active operations counters are saved.
> pub const ACTIVE_OPERATIONS_DIR: &str = concat!(
> pbs_buildcfg::PROXMOX_BACKUP_RUN_DIR_M!(),
> @@ -237,3 +244,30 @@ pub use local_chunk_reader::LocalChunkReader;
>
> mod local_datastore_lru_cache;
> pub use local_datastore_lru_cache::LocalDatastoreLruCache;
> +
> +/// Helps implement the double stat'ing procedure. It avoids certain race conditions upon lock
> +/// deletion.
> +///
> +/// It also creates the base directory for lock files.
> +pub fn lock_helper<F>(store_name: &str, path: &Path, lock_fn: F) -> Result<BackupLockGuard, Error>
in any case, this does not need to be pub, `pub(crate)` is enough..
> +where
> + F: Fn(&Path) -> Result<BackupLockGuard, Error>,
> +{
> + let mut lock_dir = Path::new(DATASTORE_LOCKS_DIR).join(store_name);
> +
> + if let Some(parent) = path.parent() {
> + lock_dir = lock_dir.join(parent);
> + };
> +
> + std::fs::create_dir_all(&lock_dir)?;
> +
> + let lock = lock_fn(path)?;
> +
> + let inode = nix::sys::stat::fstat(lock.as_raw_fd())?.st_ino;
> +
> + if nix::sys::stat::stat(path).map_or(true, |st| inode != st.st_ino) {
> + bail!("could not acquire lock, another thread modified the lock file");
> + }
> +
> + Ok(lock)
> +}
> --
> 2.47.3
* applied: [PATCH proxmox-backup 1/4] pbs-config: reorganize crate use statements
2026-05-05 8:11 ` [PATCH proxmox-backup 1/4] pbs-config: reorganize crate use statements Christian Ebner
@ 2026-05-05 12:15 ` Fabian Grünbichler
0 siblings, 0 replies; 9+ messages in thread
From: Fabian Grünbichler @ 2026-05-05 12:15 UTC (permalink / raw)
To: Christian Ebner, pbs-devel
On May 5, 2026 10:11 am, Christian Ebner wrote:
> Follow along the style used throughout the rest of the codebase,
> grouping and sorting use statements for crates by provenance.
>
> Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
Applied this one, thanks!
[1/4] pbs-config: reorganize crate use statements
commit: 5965b6fd07dbeda95c5cf6344b3c917efc746544
* Re: [PATCH proxmox-backup 4/4] api/datastore: use maintenance-mode lock to protect against changes
2026-05-05 12:14 ` Fabian Grünbichler
@ 2026-05-05 13:02 ` Christian Ebner
0 siblings, 0 replies; 9+ messages in thread
From: Christian Ebner @ 2026-05-05 13:02 UTC (permalink / raw)
To: Fabian Grünbichler, pbs-devel
On 5/5/26 2:12 PM, Fabian Grünbichler wrote:
> On May 5, 2026 10:11 am, Christian Ebner wrote:
>> Instead of locking the whole config, which causes also unrelated
>> datastores to not be available, use a per-datastore maintenance mode
>> lock to protect against changes during potentially longer running
>> operations such as s3-refresh or removable datastore mount/unmount.
>>
>> However keep the current semantic of first setting the maintenance
>> mode, allow to still opt out from the mode while waiting for active
>> operations to finish, and only then enforce the maintenance mode
>> during the s3 refresh or removable datastore mount operation.
>>
>> unset_maintenance() is adapted to either reacquire the datastore
>> config lock or use the optional lock guard.
>>
>> Fixes: https://forum.proxmox.com/threads/183244/
>> Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
>> ---
>> pbs-datastore/src/datastore.rs | 2 ++
>> pbs-datastore/src/lib.rs | 11 ++++++++++-
>> src/api2/admin/datastore.rs | 26 ++++++++++++++++++++------
>> src/api2/config/datastore.rs | 6 +++++-
>> 4 files changed, 37 insertions(+), 8 deletions(-)
>>
>> diff --git a/pbs-datastore/src/datastore.rs b/pbs-datastore/src/datastore.rs
>> index 34efcd398..aa372cd73 100644
>> --- a/pbs-datastore/src/datastore.rs
>> +++ b/pbs-datastore/src/datastore.rs
>> @@ -3036,6 +3036,7 @@ impl DataStore {
>> let (mut config, _digest) = pbs_config::datastore::config()?;
>> let mut datastore_config: DataStoreConfig = config.lookup("datastore", name)?;
>>
>> + let maintenance_mode_lock = crate::maintenance_mode_lock(name)?;
>> datastore_config.set_maintenance_mode(Some(MaintenanceMode {
>
> meh that this is in pbs-api-types, else we could require it being passed
> the lock to ensure we never miss call sites..
Yeah, unfortunately. That is the reason I kept it independent.
>
> right now, datastore creation doesn't take the maintenance_mode_lock
> despite setting it (in the reuse-existing S3 case), and it suffers from
> a similar problem - it holds the lock for the whole duration of the
> creation, but that also entails S3 requests that are allowed to take up
> to 30m(!) - and other expensive things. have we maybe found our second
> use case for this lock? if we hold it during creation of datastores, we
> could drop the config lock before re-acquiring it for persisting the
> config entry? or we need to move this into a two-phase creation, first
> add the entry with maintenance mode "creating", then while only holding
> the maintenance_mode_lock do all the prep work, and then at the end
> re-acquire the config lock and unset the maintenance mode? this would
> have considerable overlap with how we handle s3 refreshing..
>
> IMHO moving to a two-step creation would be cleanest, since we've also
> had reports in the past about a long chunk store initialization being
> cumbersome (since it effectively blocks all modifications of the config
> while it is going on).
Good catch, the s3 requests there and the creation are indeed worth the
suggested 2-step approach with a corresponding maintenance mode and
holding its lock.
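The two-phase creation being discussed could look roughly like this; a sketch with placeholder lock types and a hypothetical "creating" maintenance mode, not the actual PBS implementation:

```rust
// Stand-in guard types; the real ones come from pbs-config/pbs-datastore.
pub struct ConfigLock;
pub struct MaintenanceLock;

fn lock_config() -> ConfigLock {
    ConfigLock
}
fn maintenance_mode_lock(_store: &str) -> MaintenanceLock {
    MaintenanceLock
}

fn expensive_prep(_store: &str) -> Result<(), String> {
    // chunk store initialization, S3 requests (may take up to 30m), ...
    Ok(())
}

pub fn create_datastore(store: &str) -> Result<(), String> {
    // Phase 1: short critical section on the shared config,
    // persisting the entry with a "creating" maintenance mode.
    let config_lock = lock_config();
    let mm_lock = maintenance_mode_lock(store);
    // ... section_config.set_data(...), save_config(...) ...
    drop(config_lock); // unrelated datastores stay unblocked

    // Phase 2: expensive work guarded only by the per-datastore lock.
    expensive_prep(store)?;

    // Phase 3: re-acquire the config lock and clear the maintenance mode.
    let _config_lock = lock_config();
    // ... unset maintenance mode, save_config(...) ...
    drop(mm_lock);
    Ok(())
}

fn main() {
    assert!(create_datastore("store1").is_ok());
    println!("two-phase creation completed");
}
```

The shared config lock is only held for the two short persistence steps, while the per-datastore lock spans the whole creation.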
>
>> ty: MaintenanceType::Delete,
>> message: None,
>> @@ -3043,6 +3044,7 @@ impl DataStore {
>>
>> config.set_data(name, "datastore", &datastore_config)?;
>> pbs_config::datastore::save_config(&config)?;
>> + drop(maintenance_mode_lock);
>> drop(config_lock);
>>
>> let (operations, _lock) = task_tracking::get_active_operations_locked(name)?;
>> diff --git a/pbs-datastore/src/lib.rs b/pbs-datastore/src/lib.rs
>> index 48acba4c8..823f3bf2f 100644
>> --- a/pbs-datastore/src/lib.rs
>> +++ b/pbs-datastore/src/lib.rs
>> @@ -159,8 +159,9 @@
>>
>> use std::os::unix::io::AsRawFd;
>> use std::path::Path;
>> +use std::time::Duration;
>>
>> -use anyhow::{bail, Error};
>> +use anyhow::{bail, Context, Error};
>>
>> use pbs_config::BackupLockGuard;
>>
>> @@ -271,3 +272,11 @@ where
>>
>> Ok(lock)
>> }
>> +
>> +/// Acquire an exclusive lock for the datastore's maintenance-mode
>> +pub fn maintenance_mode_lock(store: &str) -> Result<BackupLockGuard, Error> {
>> + lock_helper(store, Path::new("maintenance-mode.lck"), |p| {
>> + pbs_config::open_backup_lockfile(p, Some(Duration::from_secs(0)), true)
>
> this might warrant a comment, usually we open config related locks with
> a timeout to prevent flaky locking in case of concurrency, waiting a few
> seconds is normally fine.
>
> but here, we must always first acquire the datastore config lock (should
> we encode that in the signature??) anyway, which makes the 0
> timeout/non-blocking behaviour okay, if I understand it correctly?
Right, it makes sense to pass in the config lock to "prove" that it was
acquired first.
But it could make sense to add the short delay here nevertheless. After
all, another operation might have dropped the config lock already, while
still holding onto the maintenance-mode lock. OTOH the operations which
do hold it for longer will probably outlive the timeout, so I'm a bit
torn, as I'm not too sure there is much benefit.
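Encoding the "config lock first" ordering in the signature could look like the following sketch; the guard types are stand-ins, not the real pbs-config API:

```rust
// Stand-in guard types; the real ones come from pbs-config.
pub struct ConfigLockGuard; // guards datastore.cfg
pub struct MaintenanceModeLockGuard; // per-datastore maintenance-mode lock

// Requiring a &ConfigLockGuard means no caller can try to take the
// maintenance-mode lock without already holding the config lock, so a
// non-blocking (zero timeout) acquisition cannot race a well-behaved
// competitor following the same ordering.
pub fn maintenance_mode_lock(
    _held_config_lock: &ConfigLockGuard,
    _store: &str,
) -> Result<MaintenanceModeLockGuard, String> {
    // real code: open_backup_lockfile(path, Some(Duration::ZERO), true)
    Ok(MaintenanceModeLockGuard)
}

fn main() {
    let config_lock = ConfigLockGuard; // pbs_config::datastore::lock_config()
    let mm_lock = maintenance_mode_lock(&config_lock, "store1").expect("locked");
    // The config lock may be dropped early while the maintenance-mode
    // lock outlives it, keeping unrelated datastores unblocked.
    drop(config_lock);
    drop(mm_lock);
    println!("lock ordering enforced by signature");
}
```

This makes the ordering a compile-time property instead of a convention to remember at every call site.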
>
>> + .context("unable to acquire exclusive datastore's maintenance-mode lock")
>> + })
>> +}
>> diff --git a/src/api2/admin/datastore.rs b/src/api2/admin/datastore.rs
>> index 1e0a78f3a..883091e6f 100644
>> --- a/src/api2/admin/datastore.rs
>> +++ b/src/api2/admin/datastore.rs
>> @@ -61,8 +61,8 @@ use pbs_datastore::index::IndexFile;
>> use pbs_datastore::manifest::BackupManifest;
>> use pbs_datastore::prune::compute_prune_info;
>> use pbs_datastore::{
>> - check_backup_owner, ensure_datastore_is_mounted, task_tracking, BackupDir, DataStore,
>> - LocalChunkReader, StoreProgress,
>> + check_backup_owner, ensure_datastore_is_mounted, maintenance_mode_lock, task_tracking,
>> + BackupDir, DataStore, LocalChunkReader, StoreProgress,
>> };
>> use pbs_tools::json::required_string_param;
>> use proxmox_rest_server::{formatter, worker_is_active, WorkerTask};
>> @@ -2705,9 +2705,13 @@ fn expect_maintenance_type(
>> }
>>
>> fn unset_maintenance(
>> - _lock: pbs_config::BackupLockGuard,
>> + lock: Option<pbs_config::BackupLockGuard>,
>
> I think this should take Option<(lock, lock)>, and
> expect_maintenance_type should acquire and return both locks, but it
> needs to be adapted further, because atm it is broken:
>
> 1. start s3-refresh while active operations are running
> 2. s3-refresh will be in loop waiting for operations to disappear
> 3. roughly every second, it will call expect_maintenance_type
> 4. if somebody else is holding the datastore config lock for more than
> ten seconds (let's say you started a re-use existing datastore
> creation with S3 ;)), expect_maintenance_type will error out!
Yeah, point 4 is broken in any case... :/ But will fix according to the
suggestions above.
>
> we currently call expect_maintenance_type in three places:
> 1. to check whether the maintenance mode changed under us while waiting
> for operations to stop (no need for locking here)
> 2. to check whether the maintenance mode changed under us if we abort
> waiting - we don't really need to lock here if we move this call to
> unset_maintenance
> 3. to do a final check before we do the actual work - this one uses the
> locks then, but we could just as well lock first, then check?
>
> what do you think about the following (untested!):
>
> ----8<----
> diff --git a/src/api2/admin/datastore.rs b/src/api2/admin/datastore.rs
> index 883091e6f..1acb0baf3 100644
> --- a/src/api2/admin/datastore.rs
> +++ b/src/api2/admin/datastore.rs
> @@ -49,7 +49,7 @@ use pbs_api_types::{
> VERIFY_JOB_READ_THREADS_SCHEMA, VERIFY_JOB_VERIFY_THREADS_SCHEMA,
> };
> use pbs_client::pxar::{create_tar, create_zip};
> -use pbs_config::CachedUserInfo;
> +use pbs_config::{BackupLockGuard, CachedUserInfo};
> use pbs_datastore::backup_info::BackupInfo;
> use pbs_datastore::cached_chunk_reader::CachedChunkReader;
> use pbs_datastore::catalog::{ArchiveEntry, CatalogReader};
> @@ -2686,11 +2686,7 @@ pub fn mount(store: String, rpcenv: &mut dyn RpcEnvironment) -> Result<Value, Er
> Ok(json!(upid))
> }
>
> -fn expect_maintenance_type(
> - store: &str,
> - maintenance_type: MaintenanceType,
> -) -> Result<(pbs_config::BackupLockGuard, DataStoreConfig), Error> {
> - let lock = pbs_config::datastore::lock_config()?;
> +fn expect_maintenance_type(store: &str, maintenance_type: MaintenanceType) -> Result<(), Error> {
> let (section_config, _digest) = pbs_config::datastore::config()?;
> let store_config: DataStoreConfig = section_config.lookup("datastore", store)?;
>
> @@ -2701,20 +2697,24 @@ fn expect_maintenance_type(
> bail!("maintenance mode is not '{maintenance_type}'");
> }
>
> - Ok((lock, store_config))
> + Ok(())
> }
>
> fn unset_maintenance(
> - lock: Option<pbs_config::BackupLockGuard>,
> - mut config: DataStoreConfig,
> + maintenance_lock: Option<BackupLockGuard>,
> + store: &str,
> + expected: MaintenanceType,
> ) -> Result<(), Error> {
> - let _lock = match lock {
> + let _lock = pbs_config::datastore::lock_config()?;
> + let _maintenance_mode_lock = match maintenance_lock {
> Some(lock) => lock,
> - None => pbs_config::datastore::lock_config()?,
> + None => maintenance_mode_lock(store)?,
> };
> + expect_maintenance_type(store, expected)?;
> let (mut section_config, _digest) = pbs_config::datastore::config()?;
> - config.maintenance_mode = None;
> - section_config.set_data(&config.name, "datastore", &config)?;
> + let mut store_config: DataStoreConfig = section_config.lookup("datastore", store)?;
> + store_config.maintenance_mode = None;
> + section_config.set_data(store, "datastore", &store_config)?;
> pbs_config::datastore::save_config(&section_config)?;
> Ok(())
> }
> @@ -2914,23 +2914,20 @@ fn run_maintenance_locked(
> }
>
> if aborted || worker.abort_requested() {
> - let _ = expect_maintenance_type(store, maintenance_expected)
> - .inspect_err(|e| warn!("maintenance mode was not as expected: {e}"))
> - .and_then(|(lock, config)| {
> - unset_maintenance(Some(lock), config)
> - .inspect_err(|e| warn!("could not reset maintenance mode: {e}"))
> - });
> + let _ = unset_maintenance(None, store, maintenance_expected)
> + .inspect_err(|e| warn!("could not reset maintenance mode: {e}"));
> bail!("aborted, due to user request");
> }
>
> - let (lock, config) = expect_maintenance_type(store, maintenance_expected)?;
> - let _maintenance_mode_lock = maintenance_mode_lock(&config.name)?;
> + let lock = pbs_config::datastore::lock_config()?;
> + let maintenance_mode_lock = maintenance_mode_lock(store)?;
> + expect_maintenance_type(store, maintenance_expected)?;
> // avoid blocking config access/updates, but keep maintenance-mode locked
> // until done.
> drop(lock);
>
> callback()?;
> - unset_maintenance(None, config)
> + unset_maintenance(Some(maintenance_mode_lock), store, maintenance_expected)
> .map_err(|e| format_err!("could not reset maintenance mode: {e}"))?;
>
> Ok(())
> ---->8----
>
> the tiny bit of double work of reparsing the config is IMHO not that
> bad, since this is not on any hot path..
Hmm, yes, I can incorporate this; moving the locking and expected mode
check into the unset helper indeed makes sense.
>
>
>
>
>> mut config: DataStoreConfig,
>> ) -> Result<(), Error> {
>> + let _lock = match lock {
>> + Some(lock) => lock,
>> + None => pbs_config::datastore::lock_config()?,
>> + };
>> let (mut section_config, _digest) = pbs_config::datastore::config()?;
>> config.maintenance_mode = None;
>> section_config.set_data(&config.name, "datastore", &config)?;
>> @@ -2763,6 +2767,7 @@ async fn do_unmount(store: String, auth_id: Authid, to_stdout: bool) -> Result<V
>> let (mut section_config, _digest) = pbs_config::datastore::config()?;
>> let mut datastore: DataStoreConfig = section_config.lookup("datastore", &store)?;
>>
>> + let maintenance_mode_lock = maintenance_mode_lock(&store)?;
>> datastore.set_maintenance_mode(Some(MaintenanceMode {
>> ty: MaintenanceType::Unmount,
>> message: None,
>> @@ -2770,6 +2775,7 @@ async fn do_unmount(store: String, auth_id: Authid, to_stdout: bool) -> Result<V
>> section_config.set_data(&store, "datastore", &datastore)?;
>> pbs_config::datastore::save_config(&section_config)?;
>>
>> + drop(maintenance_mode_lock);
>> drop(_lock);
>>
>> if let Ok(proxy_pid) = proxmox_rest_server::read_pid(pbs_buildcfg::PROXMOX_BACKUP_PROXY_PID_FN)
>> @@ -2845,6 +2851,7 @@ pub fn s3_refresh(store: String, rpcenv: &mut dyn RpcEnvironment) -> Result<Valu
>> let (mut section_config, _digest) = pbs_config::datastore::config()?;
>> let mut datastore_config: DataStoreConfig = section_config.lookup("datastore", &store)?;
>>
>> + let maintenance_mode_lock = maintenance_mode_lock(&store)?;
>> datastore_config.set_maintenance_mode(Some(MaintenanceMode {
>> ty: MaintenanceType::S3Refresh,
>> message: None,
>> @@ -2852,6 +2859,7 @@ pub fn s3_refresh(store: String, rpcenv: &mut dyn RpcEnvironment) -> Result<Valu
>>
>> section_config.set_data(&store, "datastore", &datastore_config)?;
>> pbs_config::datastore::save_config(&section_config)?;
>> + drop(maintenance_mode_lock);
>> drop(_lock);
>>
>> let upid = WorkerTask::new_thread(
>> @@ -2875,7 +2883,8 @@ pub(crate) fn do_s3_refresh(store: &str, worker: &dyn WorkerTaskContext) -> Resu
>> }
>>
>> /// Wait for no more active operations on the given datastore and run the provided callback in
>> -/// a datastore locked context, protecting against maintenance mode changes.
>> +/// a maintenance-mode locked context, protecting against leaving the mode for the function's
>> +/// lifetime.
>> fn run_maintenance_locked(
>> store: &str,
>> maintenance_expected: MaintenanceType,
>> @@ -2908,15 +2917,20 @@ fn run_maintenance_locked(
>> let _ = expect_maintenance_type(store, maintenance_expected)
>> .inspect_err(|e| warn!("maintenance mode was not as expected: {e}"))
>> .and_then(|(lock, config)| {
>> - unset_maintenance(lock, config)
>> + unset_maintenance(Some(lock), config)
>> .inspect_err(|e| warn!("could not reset maintenance mode: {e}"))
>> });
>> bail!("aborted, due to user request");
>> }
>>
>> let (lock, config) = expect_maintenance_type(store, maintenance_expected)?;
>> + let _maintenance_mode_lock = maintenance_mode_lock(&config.name)?;
>> + // avoid blocking config access/updates, but keep maintenance-mode locked
>> + // until done.
>> + drop(lock);
>> +
>> callback()?;
>> - unset_maintenance(lock, config)
>> + unset_maintenance(None, config)
>> .map_err(|e| format_err!("could not reset maintenance mode: {e}"))?;
>>
>> Ok(())
>> diff --git a/src/api2/config/datastore.rs b/src/api2/config/datastore.rs
>> index 16e85a636..b07e7fc7e 100644
>> --- a/src/api2/config/datastore.rs
>> +++ b/src/api2/config/datastore.rs
>> @@ -31,7 +31,9 @@ use crate::api2::config::tape_backup_job::{delete_tape_backup_job, list_tape_bac
>> use crate::api2::config::verify::delete_verification_job;
>> use pbs_config::CachedUserInfo;
>>
>> -use pbs_datastore::{get_datastore_mount_status, DataStore, S3_DATASTORE_IN_USE_MARKER};
>> +use pbs_datastore::{
>> + get_datastore_mount_status, maintenance_mode_lock, DataStore, S3_DATASTORE_IN_USE_MARKER,
>> +};
>> use proxmox_rest_server::WorkerTask;
>> use proxmox_s3_client::{S3ObjectKey, S3_HTTP_REQUEST_TIMEOUT};
>>
>> @@ -540,6 +542,7 @@ pub fn update_datastore(
>> data.tuning = None;
>> }
>> DeletableProperty::MaintenanceMode => {
>> + let _maintenance_mode_lock = maintenance_mode_lock(&name)?;
>> data.set_maintenance_mode(None)?;
>> }
>> DeletableProperty::NotificationThresholds => {
>> @@ -638,6 +641,7 @@ pub fn update_datastore(
>> )?),
>> None => None,
>> };
>> + let _maintenance_mode_lock = maintenance_mode_lock(&name)?;
>> data.set_maintenance_mode(maintenance_mode)?;
>> }
>>
>> --
>> 2.47.3
end of thread, other threads:[~2026-05-05 13:02 UTC | newest]
Thread overview: 9+ messages
-- links below jump to the message on this page --
2026-05-05 8:11 [PATCH proxmox-backup 0/4] don't hold datastore config lock during s3-refresh Christian Ebner
2026-05-05 8:11 ` [PATCH proxmox-backup 1/4] pbs-config: reorganize crate use statements Christian Ebner
2026-05-05 12:15 ` applied: " Fabian Grünbichler
2026-05-05 8:11 ` [PATCH proxmox-backup 2/4] datastore: move lock files base path constant to central location Christian Ebner
2026-05-05 8:11 ` [PATCH proxmox-backup 3/4] datastore: move file lock helper to centralized place Christian Ebner
2026-05-05 12:15 ` Fabian Grünbichler
2026-05-05 8:11 ` [PATCH proxmox-backup 4/4] api/datastore: use maintenance-mode lock to protect against changes Christian Ebner
2026-05-05 12:14 ` Fabian Grünbichler
2026-05-05 13:02 ` Christian Ebner