From: Christian Ebner <c.ebner@proxmox.com>
To: Proxmox Backup Server development discussion
<pbs-devel@lists.proxmox.com>,
Shannon Sterz <s.sterz@proxmox.com>
Subject: Re: [pbs-devel] [PATCH proxmox-backup v7 1/4] datastore/api/backup: prepare for fix of #3935 by adding lock helpers
Date: Fri, 14 Mar 2025 09:30:02 +0100 [thread overview]
Message-ID: <10c3183b-4c24-43f7-ba10-1fdfadbd9690@proxmox.com> (raw)
In-Reply-To: <20250311135200.309896-2-s.sterz@proxmox.com>
On 3/11/25 14:51, Shannon Sterz wrote:
> to avoid duplicate code, add helpers for locking groups and snapshots
> to the BackupGroup and BackupDir traits respectively and refactor
> existing code to use them.
>
> this also changes the error semantics of the refactored
> functions as locking related error messages are now unified and, thus,
> less specific.
The callside specific context could however be readded to some extend by
using anyhow::Context, adapt the callside error formatting and keeping
the would_bock_msg even more generic like `locked by another operation`?
This would help narrowing down issues more easily.
>
> Signed-off-by: Shannon Sterz <s.sterz@proxmox.com>
> ---
> pbs-datastore/src/backup_info.rs | 39 +++++++++++++---
> pbs-datastore/src/datastore.rs | 66 ++++++++++------------------
> pbs-datastore/src/snapshot_reader.rs | 7 +--
> src/api2/backup/environment.rs | 9 +---
> src/api2/backup/mod.rs | 8 +---
> src/api2/reader/mod.rs | 7 +--
> src/backup/verify.rs | 12 ++---
> src/server/sync.rs | 10 ++---
> 8 files changed, 68 insertions(+), 90 deletions(-)
>
> diff --git a/pbs-datastore/src/backup_info.rs b/pbs-datastore/src/backup_info.rs
> index ef2b982c..42ce74e9 100644
> --- a/pbs-datastore/src/backup_info.rs
> +++ b/pbs-datastore/src/backup_info.rs
> @@ -5,7 +5,9 @@ use std::sync::Arc;
>
> use anyhow::{bail, format_err, Error};
>
> -use proxmox_sys::fs::{lock_dir_noblock, replace_file, CreateOptions};
> +use proxmox_sys::fs::{
> + lock_dir_noblock, lock_dir_noblock_shared, replace_file, CreateOptions, DirLockGuard,
> +};
>
> use pbs_api_types::{
> Authid, BackupGroupDeleteStats, BackupNamespace, BackupType, GroupFilter, VerifyState,
> @@ -199,9 +201,8 @@ impl BackupGroup {
> /// Returns `BackupGroupDeleteStats`, containing the number of deleted snapshots
> /// and number of protected snaphsots, which therefore were not removed.
> pub fn destroy(&self) -> Result<BackupGroupDeleteStats, Error> {
> + let _guard = self.lock()?;
> let path = self.full_group_path();
> - let _guard =
> - proxmox_sys::fs::lock_dir_noblock(&path, "backup group", "possible running backup")?;
>
> log::info!("removing backup group {:?}", path);
> let mut delete_stats = BackupGroupDeleteStats::default();
> @@ -237,6 +238,15 @@ impl BackupGroup {
> self.store
> .set_owner(&self.ns, self.as_ref(), auth_id, force)
> }
> +
> + /// Lock a group exclusively
> + pub fn lock(&self) -> Result<DirLockGuard, Error> {
> + lock_dir_noblock(
> + &self.full_group_path(),
> + "backup group",
> + "possible running backup",
> + )
> + }
> }
>
> impl AsRef<pbs_api_types::BackupNamespace> for BackupGroup {
> @@ -442,15 +452,31 @@ impl BackupDir {
> .map_err(|err| format_err!("unable to acquire manifest lock {:?} - {}", &path, err))
> }
>
> + /// Lock this snapshot exclusively
> + pub fn lock(&self) -> Result<DirLockGuard, Error> {
> + lock_dir_noblock(
> + &self.full_path(),
> + "snapshot",
> + "backup is running or snapshot is in use",
> + )
> + }
> +
> + /// Acquire a shared lock on this snapshot
> + pub fn lock_shared(&self) -> Result<DirLockGuard, Error> {
> + lock_dir_noblock_shared(
> + &self.full_path(),
> + "snapshot",
> + "backup is running or snapshot is in use, could not acquire shared lock",
> + )
> + }
> +
> /// Destroy the whole snapshot, bails if it's protected
> ///
> /// Setting `force` to true skips locking and thus ignores if the backup is currently in use.
> pub fn destroy(&self, force: bool) -> Result<(), Error> {
> - let full_path = self.full_path();
> -
> let (_guard, _manifest_guard);
> if !force {
> - _guard = lock_dir_noblock(&full_path, "snapshot", "possibly running or in use")?;
> + _guard = self.lock()?;
> _manifest_guard = self.lock_manifest()?;
> }
>
> @@ -458,6 +484,7 @@ impl BackupDir {
> bail!("cannot remove protected snapshot"); // use special error type?
> }
>
> + let full_path = self.full_path();
> log::info!("removing backup snapshot {:?}", full_path);
> std::fs::remove_dir_all(&full_path).map_err(|err| {
> format_err!("removing backup snapshot {:?} failed - {}", full_path, err,)
> diff --git a/pbs-datastore/src/datastore.rs b/pbs-datastore/src/datastore.rs
> index 75c0c16a..b3804b5a 100644
> --- a/pbs-datastore/src/datastore.rs
> +++ b/pbs-datastore/src/datastore.rs
> @@ -13,8 +13,8 @@ use proxmox_human_byte::HumanByte;
> use proxmox_schema::ApiType;
>
> use proxmox_sys::error::SysError;
> +use proxmox_sys::fs::DirLockGuard;
> use proxmox_sys::fs::{file_read_optional_string, replace_file, CreateOptions};
> -use proxmox_sys::fs::{lock_dir_noblock, DirLockGuard};
> use proxmox_sys::linux::procfs::MountInfo;
> use proxmox_sys::process_locker::ProcessLockSharedGuard;
> use proxmox_worker_task::WorkerTaskContext;
> @@ -774,41 +774,31 @@ impl DataStore {
> ///
> /// This also acquires an exclusive lock on the directory and returns the lock guard.
> pub fn create_locked_backup_group(
> - &self,
> + self: &Arc<Self>,
> ns: &BackupNamespace,
> backup_group: &pbs_api_types::BackupGroup,
> auth_id: &Authid,
> ) -> Result<(Authid, DirLockGuard), Error> {
> - // create intermediate path first:
> - let mut full_path = self.base_path();
> - for ns in ns.components() {
> - full_path.push("ns");
> - full_path.push(ns);
> - }
> - full_path.push(backup_group.ty.as_str());
> - std::fs::create_dir_all(&full_path)?;
> + let backup_group = self.backup_group(ns.clone(), backup_group.clone());
>
> - full_path.push(&backup_group.id);
> + // create intermediate path first
> + let full_path = backup_group.full_group_path();
>
> - // create the last component now
> + std::fs::create_dir_all(full_path.parent().ok_or_else(|| {
> + format_err!("could not construct parent path for group {backup_group:?}")
> + })?)?;
> +
> + // now create the group, this allows us to check whether it existed before
> match std::fs::create_dir(&full_path) {
> Ok(_) => {
> - let guard = lock_dir_noblock(
> - &full_path,
> - "backup group",
> - "another backup is already running",
> - )?;
> - self.set_owner(ns, backup_group, auth_id, false)?;
> - let owner = self.get_owner(ns, backup_group)?; // just to be sure
> + let guard = backup_group.lock()?;
> + self.set_owner(ns, backup_group.group(), auth_id, false)?;
> + let owner = self.get_owner(ns, backup_group.group())?; // just to be sure
> Ok((owner, guard))
> }
> Err(ref err) if err.kind() == io::ErrorKind::AlreadyExists => {
> - let guard = lock_dir_noblock(
> - &full_path,
> - "backup group",
> - "another backup is already running",
> - )?;
> - let owner = self.get_owner(ns, backup_group)?; // just to be sure
> + let guard = backup_group.lock()?;
> + let owner = self.get_owner(ns, backup_group.group())?; // just to be sure
> Ok((owner, guard))
> }
> Err(err) => bail!("unable to create backup group {:?} - {}", full_path, err),
> @@ -819,29 +809,17 @@ impl DataStore {
> ///
> /// The BackupGroup directory needs to exist.
> pub fn create_locked_backup_dir(
> - &self,
> + self: &Arc<Self>,
> ns: &BackupNamespace,
> backup_dir: &pbs_api_types::BackupDir,
> ) -> Result<(PathBuf, bool, DirLockGuard), Error> {
> - let full_path = self.snapshot_path(ns, backup_dir);
> - let relative_path = full_path.strip_prefix(self.base_path()).map_err(|err| {
> - format_err!(
> - "failed to produce correct path for backup {backup_dir} in namespace {ns}: {err}"
> - )
> - })?;
> + let backup_dir = self.backup_dir(ns.clone(), backup_dir.clone())?;
> + let relative_path = backup_dir.relative_path();
>
> - let lock = || {
> - lock_dir_noblock(
> - &full_path,
> - "snapshot",
> - "internal error - tried creating snapshot that's already in use",
> - )
> - };
> -
> - match std::fs::create_dir(&full_path) {
> - Ok(_) => Ok((relative_path.to_owned(), true, lock()?)),
> + match std::fs::create_dir(backup_dir.full_path()) {
> + Ok(_) => Ok((relative_path, true, backup_dir.lock()?)),
> Err(ref e) if e.kind() == io::ErrorKind::AlreadyExists => {
> - Ok((relative_path.to_owned(), false, lock()?))
> + Ok((relative_path, false, backup_dir.lock()?))
> }
> Err(e) => Err(e.into()),
> }
> @@ -1305,7 +1283,7 @@ impl DataStore {
> bail!("snapshot {} does not exist!", backup_dir.dir());
> }
>
> - let _guard = lock_dir_noblock(&full_path, "snapshot", "possibly running or in use")?;
> + let _guard = backup_dir.lock()?;
>
> let protected_path = backup_dir.protected_file();
> if protection {
> diff --git a/pbs-datastore/src/snapshot_reader.rs b/pbs-datastore/src/snapshot_reader.rs
> index 36349888..604d35c1 100644
> --- a/pbs-datastore/src/snapshot_reader.rs
> +++ b/pbs-datastore/src/snapshot_reader.rs
> @@ -1,14 +1,12 @@
> use std::fs::File;
> use std::os::unix::io::{AsRawFd, FromRawFd};
> use std::path::Path;
> -use std::sync::Arc;
> use std::rc::Rc;
> +use std::sync::Arc;
nit: unrelated hunk, does not apply cleanly anymore
>
> use anyhow::{bail, Error};
> use nix::dir::Dir;
>
> -use proxmox_sys::fs::lock_dir_noblock_shared;
> -
> use pbs_api_types::{
> print_store_and_ns, ArchiveType, BackupNamespace, Operation, CLIENT_LOG_BLOB_NAME,
> MANIFEST_BLOB_NAME,
> @@ -48,8 +46,7 @@ impl SnapshotReader {
> bail!("snapshot {} does not exist!", snapshot.dir());
> }
>
> - let locked_dir =
> - lock_dir_noblock_shared(&snapshot_path, "snapshot", "locked by another operation")?;
> + let locked_dir = snapshot.lock_shared()?;
>
> let datastore_name = datastore.name().to_string();
> let manifest = match snapshot.load_manifest() {
> diff --git a/src/api2/backup/environment.rs b/src/api2/backup/environment.rs
> index 99d885e2..133932b3 100644
> --- a/src/api2/backup/environment.rs
> +++ b/src/api2/backup/environment.rs
> @@ -8,7 +8,7 @@ use ::serde::Serialize;
> use serde_json::{json, Value};
>
> use proxmox_router::{RpcEnvironment, RpcEnvironmentType};
> -use proxmox_sys::fs::{lock_dir_noblock_shared, replace_file, CreateOptions};
> +use proxmox_sys::fs::{replace_file, CreateOptions};
>
> use pbs_api_types::Authid;
> use pbs_datastore::backup_info::{BackupDir, BackupInfo};
> @@ -645,12 +645,7 @@ impl BackupEnvironment {
>
> // Downgrade to shared lock, the backup itself is finished
> drop(excl_snap_lock);
> - let snap_lock = lock_dir_noblock_shared(
> - &self.backup_dir.full_path(),
> - "snapshot",
> - "snapshot is already locked by another operation",
> - )?;
> -
> + let snap_lock = self.backup_dir.lock_shared()?;
> let worker_id = format!(
> "{}:{}/{}/{:08X}",
> self.datastore.name(),
> diff --git a/src/api2/backup/mod.rs b/src/api2/backup/mod.rs
> index 82c6438a..d5513893 100644
> --- a/src/api2/backup/mod.rs
> +++ b/src/api2/backup/mod.rs
> @@ -17,7 +17,6 @@ use proxmox_router::{
> };
> use proxmox_schema::*;
> use proxmox_sortable_macro::sortable;
> -use proxmox_sys::fs::lock_dir_noblock_shared;
>
> use pbs_api_types::{
> ArchiveType, Authid, BackupNamespace, BackupType, Operation, VerifyState,
> @@ -186,12 +185,7 @@ fn upgrade_to_backup_protocol(
> }
>
> // lock last snapshot to prevent forgetting/pruning it during backup
> - let full_path = last.backup_dir.full_path();
> - Some(lock_dir_noblock_shared(
> - &full_path,
> - "snapshot",
> - "base snapshot is already locked by another operation",
> - )?)
> + Some(last.backup_dir.lock_shared()?)
> } else {
> None
> };
> diff --git a/src/api2/reader/mod.rs b/src/api2/reader/mod.rs
> index 328141c8..5a9678ff 100644
> --- a/src/api2/reader/mod.rs
> +++ b/src/api2/reader/mod.rs
> @@ -16,7 +16,6 @@ use proxmox_router::{
> };
> use proxmox_schema::{BooleanSchema, ObjectSchema};
> use proxmox_sortable_macro::sortable;
> -use proxmox_sys::fs::lock_dir_noblock_shared;
>
> use pbs_api_types::{
> ArchiveType, Authid, Operation, BACKUP_ARCHIVE_NAME_SCHEMA, BACKUP_ID_SCHEMA,
> @@ -129,11 +128,7 @@ fn upgrade_to_backup_reader_protocol(
> bail!("snapshot {} does not exist.", backup_dir.dir());
> }
>
> - let _guard = lock_dir_noblock_shared(
> - &backup_dir.full_path(),
> - "snapshot",
> - "locked by another operation",
> - )?;
> + let _guard = backup_dir.lock_shared()?;
>
> let path = datastore.base_path();
>
> diff --git a/src/backup/verify.rs b/src/backup/verify.rs
> index f652c262..10a64fda 100644
> --- a/src/backup/verify.rs
> +++ b/src/backup/verify.rs
> @@ -4,10 +4,9 @@ use std::sync::{Arc, Mutex};
> use std::time::Instant;
>
> use anyhow::{bail, Error};
> -use nix::dir::Dir;
> +use proxmox_sys::fs::DirLockGuard;
> use tracing::{error, info, warn};
>
> -use proxmox_sys::fs::lock_dir_noblock_shared;
> use proxmox_worker_task::WorkerTaskContext;
>
> use pbs_api_types::{
> @@ -307,11 +306,8 @@ pub fn verify_backup_dir(
> return Ok(true);
> }
>
> - let snap_lock = lock_dir_noblock_shared(
> - &backup_dir.full_path(),
> - "snapshot",
> - "locked by another operation",
> - );
> + let snap_lock = backup_dir.lock_shared();
> +
> match snap_lock {
> Ok(snap_lock) => {
> verify_backup_dir_with_lock(verify_worker, backup_dir, upid, filter, snap_lock)
> @@ -334,7 +330,7 @@ pub fn verify_backup_dir_with_lock(
> backup_dir: &BackupDir,
> upid: UPID,
> filter: Option<&dyn Fn(&BackupManifest) -> bool>,
> - _snap_lock: Dir,
> + _snap_lock: DirLockGuard,
> ) -> Result<bool, Error> {
> let datastore_name = verify_worker.datastore.name();
> let backup_dir_name = backup_dir.dir();
> diff --git a/src/server/sync.rs b/src/server/sync.rs
> index 4dd46c5a..abe7316a 100644
> --- a/src/server/sync.rs
> +++ b/src/server/sync.rs
> @@ -10,6 +10,7 @@ use std::time::Duration;
> use anyhow::{bail, format_err, Context, Error};
> use futures::{future::FutureExt, select};
> use hyper::http::StatusCode;
> +use proxmox_sys::fs::DirLockGuard;
> use serde_json::json;
> use tracing::{info, warn};
>
> @@ -105,7 +106,7 @@ pub(crate) struct RemoteSourceReader {
> }
>
> pub(crate) struct LocalSourceReader {
> - pub(crate) _dir_lock: Arc<Mutex<proxmox_sys::fs::DirLockGuard>>,
> + pub(crate) _dir_lock: Arc<Mutex<DirLockGuard>>,
> pub(crate) path: PathBuf,
> pub(crate) datastore: Arc<DataStore>,
> }
> @@ -478,13 +479,8 @@ impl SyncSource for LocalSource {
> dir: &BackupDir,
> ) -> Result<Arc<dyn SyncSourceReader>, Error> {
> let dir = self.store.backup_dir(ns.clone(), dir.clone())?;
> - let dir_lock = proxmox_sys::fs::lock_dir_noblock_shared(
> - &dir.full_path(),
> - "snapshot",
> - "locked by another operation",
> - )?;
> Ok(Arc::new(LocalSourceReader {
> - _dir_lock: Arc::new(Mutex::new(dir_lock)),
> + _dir_lock: Arc::new(Mutex::new(dir.lock()?)),
> path: dir.full_path(),
> datastore: dir.datastore().clone(),
> }))
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
next prev parent reply other threads:[~2025-03-14 8:30 UTC|newest]
Thread overview: 6+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-03-11 13:51 [pbs-devel] [PATCH proxmox-backup v7 0/4] refactor datastore locking to use tmpfs Shannon Sterz
2025-03-11 13:51 ` [pbs-devel] [PATCH proxmox-backup v7 1/4] datastore/api/backup: prepare for fix of #3935 by adding lock helpers Shannon Sterz
2025-03-14 8:30 ` Christian Ebner [this message]
2025-03-11 13:51 ` [pbs-devel] [PATCH proxmox-backup v7 2/4] fix #3935: datastore/api/backup: move datastore locking to '/run' Shannon Sterz
2025-03-11 13:51 ` [pbs-devel] [PATCH proxmox-backup v7 3/4] fix #3935: datastore: move manifest locking to new locking method Shannon Sterz
2025-03-11 13:52 ` [pbs-devel] [PATCH proxmox-backup v7 4/4] fix: api: avoid race condition in set_backup_owner Shannon Sterz
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=10c3183b-4c24-43f7-ba10-1fdfadbd9690@proxmox.com \
--to=c.ebner@proxmox.com \
--cc=pbs-devel@lists.proxmox.com \
--cc=s.sterz@proxmox.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox
Service provided by Proxmox Server Solutions GmbH | Privacy | Legal