public inbox for pbs-devel@lists.proxmox.com
 help / color / mirror / Atom feed
* [pbs-devel] [PATCH proxmox-backup v4 0/5] refactor datastore locking to use tmpfs
@ 2022-07-05 14:54 Stefan Sterz
  2022-07-05 14:54 ` [pbs-devel] [PATCH proxmox-backup v4 1/5] fix: datastore: make relative_group_path() return relative path Stefan Sterz
                   ` (4 more replies)
  0 siblings, 5 replies; 6+ messages in thread
From: Stefan Sterz @ 2022-07-05 14:54 UTC (permalink / raw)
  To: pbs-devel

The first commit fixes a bug in the `BackupGroup`'s
`relative_group_path()` method. It would return the full path instead
of a path relative to a datastore. Since this function wasn't used
before this series, afaict, this fix should be unproblematic.

The rests of this series refactors the locking mechanism inside the
`DataStore`, `BackupDir` and `BackupGroup traits. In a first step
locking methods are added and the existing code is refactored to use
them.  Commit three uses the fixed `relative_group_path()` method to
derive a lock file name under '/run' for each group/snapshot. It also
adds double stat'ing.

The fourth commit refactors locking for manifests and brings it
in-line with the group/snapshot locks. Finally the fifth commit fixes
a race condition when changing the owner of a datastore.

As far as I can tell, the addition of namespaces did not create a need
for more locking.

changes from v3:
* moved patch 2 to the front so it can be applied separatelly more
  easily
* rebased on current master

changes from v2:

* different encoding scheme for lock file names
* refactored locking methods to be used by the new BackupDir and
  BackupGroup traits
* adapted lock file names to include namespaces

changes from v1 (thanks @ Wolfgang Bumiller & Thomas Lamprecht):

* split adding locking helpers and move '/run' into two commits
* instead of stat'ing the path of lock file twice, only use the file
  descriptor for one of the stat'ing procedures instead
* several improvements to helper functions and documentation

Stefan Sterz (5):
  fix: datastore: make relative_group_path() return relative path
  fix #3935: datastore/api/backup: add lock helpers to backup dir/group
  fix #3935: datastore/api/backup: move datastore locking to '/run'
  fix #3935: datastore: move manifest locking to new locking method
  fix: api: avoid race condition in set_backup_owner

 pbs-config/src/lib.rs                |   7 ++
 pbs-datastore/src/backup_info.rs     | 153 ++++++++++++++++++++++-----
 pbs-datastore/src/datastore.rs       |  67 ++++--------
 pbs-datastore/src/snapshot_reader.rs |  23 ++--
 src/api2/admin/datastore.rs          |   9 +-
 src/api2/backup/environment.rs       |   5 +-
 src/api2/backup/mod.rs               |   8 +-
 src/api2/reader/mod.rs               |   7 +-
 src/backup/verify.rs                 |  11 +-
 9 files changed, 189 insertions(+), 101 deletions(-)

-- 
2.30.2





^ permalink raw reply	[flat|nested] 6+ messages in thread

* [pbs-devel] [PATCH proxmox-backup v4 1/5] fix: datastore: make relative_group_path() return relative path
  2022-07-05 14:54 [pbs-devel] [PATCH proxmox-backup v4 0/5] refactor datastore locking to use tmpfs Stefan Sterz
@ 2022-07-05 14:54 ` Stefan Sterz
  2022-07-05 14:54 ` [pbs-devel] [PATCH proxmox-backup v4 2/5] fix #3935: datastore/api/backup: add lock helpers to backup dir/group Stefan Sterz
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: Stefan Sterz @ 2022-07-05 14:54 UTC (permalink / raw)
  To: pbs-devel

previously the BackGroup trait used the datastore's
namespace_path() method to construct a base path. this would result in
it returning an absolute path equivalent to full_group_path(). use
the namspace's path() method instead to get a relative path, in-line
with backup_dir's relative_path().

Signed-off-by: Stefan Sterz <s.sterz@proxmox.com>
---
 pbs-datastore/src/backup_info.rs | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/pbs-datastore/src/backup_info.rs b/pbs-datastore/src/backup_info.rs
index 10320a35..74432dc7 100644
--- a/pbs-datastore/src/backup_info.rs
+++ b/pbs-datastore/src/backup_info.rs
@@ -72,7 +72,7 @@ impl BackupGroup {
     }
 
     pub fn relative_group_path(&self) -> PathBuf {
-        let mut path = self.store.namespace_path(&self.ns);
+        let mut path = self.ns.path();
         path.push(self.group.ty.as_str());
         path.push(&self.group.id);
         path
-- 
2.30.2





^ permalink raw reply	[flat|nested] 6+ messages in thread

* [pbs-devel] [PATCH proxmox-backup v4 2/5] fix #3935: datastore/api/backup: add lock helpers to backup dir/group
  2022-07-05 14:54 [pbs-devel] [PATCH proxmox-backup v4 0/5] refactor datastore locking to use tmpfs Stefan Sterz
  2022-07-05 14:54 ` [pbs-devel] [PATCH proxmox-backup v4 1/5] fix: datastore: make relative_group_path() return relative path Stefan Sterz
@ 2022-07-05 14:54 ` Stefan Sterz
  2022-07-05 14:54 ` [pbs-devel] [PATCH proxmox-backup v4 3/5] fix #3935: datastore/api/backup: move datastore locking to '/run' Stefan Sterz
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: Stefan Sterz @ 2022-07-05 14:54 UTC (permalink / raw)
  To: pbs-devel

to avoid duplicate code, add helpers for locking groups and snapshots
to the BackupGroup and BackupDir traits respectively and refactor
existing code to use them.

Signed-off-by: Stefan Sterz <s.sterz@proxmox.com>
---
 pbs-datastore/src/backup_info.rs     | 31 +++++++++++---
 pbs-datastore/src/datastore.rs       | 63 +++++++++-------------------
 pbs-datastore/src/snapshot_reader.rs |  8 +---
 src/api2/backup/mod.rs               |  8 +---
 src/api2/reader/mod.rs               |  7 +---
 src/backup/verify.rs                 |  7 +---
 6 files changed, 50 insertions(+), 74 deletions(-)

diff --git a/pbs-datastore/src/backup_info.rs b/pbs-datastore/src/backup_info.rs
index 74432dc7..e171ac21 100644
--- a/pbs-datastore/src/backup_info.rs
+++ b/pbs-datastore/src/backup_info.rs
@@ -6,7 +6,9 @@ use std::sync::Arc;
 
 use anyhow::{bail, format_err, Error};
 
-use proxmox_sys::fs::{lock_dir_noblock, replace_file, CreateOptions};
+use proxmox_sys::fs::{
+    lock_dir_noblock, lock_dir_noblock_shared, replace_file, CreateOptions, DirLockGuard,
+};
 
 use pbs_api_types::{
     Authid, BackupNamespace, BackupType, GroupFilter, BACKUP_DATE_REGEX, BACKUP_FILE_REGEX,
@@ -194,9 +196,8 @@ impl BackupGroup {
     ///
     /// Returns true if all snapshots were removed, and false if some were protected
     pub fn destroy(&self) -> Result<bool, Error> {
+        let _guard = self.lock()?;
         let path = self.full_group_path();
-        let _guard =
-            proxmox_sys::fs::lock_dir_noblock(&path, "backup group", "possible running backup")?;
 
         log::info!("removing backup group {:?}", path);
         let mut removed_all_snaps = true;
@@ -230,6 +231,15 @@ impl BackupGroup {
         self.store
             .set_owner(&self.ns, self.as_ref(), auth_id, force)
     }
+
+    /// Lock a group exclusively
+    pub fn lock(&self) -> Result<DirLockGuard, Error> {
+        lock_dir_noblock(
+            &self.full_group_path(),
+            "backup group",
+            "possible running backup",
+        )
+    }
 }
 
 impl AsRef<pbs_api_types::BackupNamespace> for BackupGroup {
@@ -433,15 +443,23 @@ impl BackupDir {
             .map_err(|err| format_err!("unable to acquire manifest lock {:?} - {}", &path, err))
     }
 
+    /// Lock this snapshot exclusively
+    pub fn lock(&self) -> Result<DirLockGuard, Error> {
+        lock_dir_noblock(&self.full_path(), "snapshot", "possibly running or in use")
+    }
+
+    /// Acquire a shared lock on this snapshot
+    pub fn lock_shared(&self) -> Result<DirLockGuard, Error> {
+        lock_dir_noblock_shared(&self.full_path(), "snapshot", "possibly running or in use")
+    }
+
     /// Destroy the whole snapshot, bails if it's protected
     ///
     /// Setting `force` to true skips locking and thus ignores if the backup is currently in use.
     pub fn destroy(&self, force: bool) -> Result<(), Error> {
-        let full_path = self.full_path();
-
         let (_guard, _manifest_guard);
         if !force {
-            _guard = lock_dir_noblock(&full_path, "snapshot", "possibly running or in use")?;
+            _guard = self.lock()?;
             _manifest_guard = self.lock_manifest()?;
         }
 
@@ -449,6 +467,7 @@ impl BackupDir {
             bail!("cannot remove protected snapshot"); // use special error type?
         }
 
+        let full_path = self.full_path();
         log::info!("removing backup snapshot {:?}", full_path);
         std::fs::remove_dir_all(&full_path).map_err(|err| {
             format_err!("removing backup snapshot {:?} failed - {}", full_path, err,)
diff --git a/pbs-datastore/src/datastore.rs b/pbs-datastore/src/datastore.rs
index 1ae5bcb6..2cfcbcaa 100644
--- a/pbs-datastore/src/datastore.rs
+++ b/pbs-datastore/src/datastore.rs
@@ -11,8 +11,8 @@ use nix::unistd::{unlinkat, UnlinkatFlags};
 
 use proxmox_schema::ApiType;
 
+use proxmox_sys::fs::DirLockGuard;
 use proxmox_sys::fs::{file_read_optional_string, replace_file, CreateOptions};
-use proxmox_sys::fs::{lock_dir_noblock, DirLockGuard};
 use proxmox_sys::process_locker::ProcessLockSharedGuard;
 use proxmox_sys::WorkerTaskContext;
 use proxmox_sys::{task_log, task_warn};
@@ -628,36 +628,27 @@ impl DataStore {
         backup_group: &pbs_api_types::BackupGroup,
         auth_id: &Authid,
     ) -> Result<(Authid, DirLockGuard), Error> {
+        let backup_group =
+            BackupGroup::new(Arc::new(self.clone()), ns.clone(), backup_group.clone());
+
         // create intermediate path first:
-        let mut full_path = self.base_path();
-        for ns in ns.components() {
-            full_path.push("ns");
-            full_path.push(ns);
-        }
-        full_path.push(backup_group.ty.as_str());
-        std::fs::create_dir_all(&full_path)?;
+        let full_path = backup_group.full_group_path();
 
-        full_path.push(&backup_group.id);
+        std::fs::create_dir_all(&full_path.parent().ok_or(format_err!(
+            "could not construct parent path for group {backup_group:?}"
+        ))?)?;
 
-        // create the last component now
+        // now create group, this allows us to check whether it existed before
         match std::fs::create_dir(&full_path) {
             Ok(_) => {
-                let guard = lock_dir_noblock(
-                    &full_path,
-                    "backup group",
-                    "another backup is already running",
-                )?;
-                self.set_owner(ns, backup_group, auth_id, false)?;
-                let owner = self.get_owner(ns, backup_group)?; // just to be sure
+                let guard = backup_group.lock()?;
+                self.set_owner(ns, backup_group.group(), auth_id, false)?;
+                let owner = self.get_owner(ns, backup_group.group())?; // just to be sure
                 Ok((owner, guard))
             }
             Err(ref err) if err.kind() == io::ErrorKind::AlreadyExists => {
-                let guard = lock_dir_noblock(
-                    &full_path,
-                    "backup group",
-                    "another backup is already running",
-                )?;
-                let owner = self.get_owner(ns, backup_group)?; // just to be sure
+                let guard = backup_group.lock()?;
+                let owner = self.get_owner(ns, backup_group.group())?; // just to be sure
                 Ok((owner, guard))
             }
             Err(err) => bail!("unable to create backup group {:?} - {}", full_path, err),
@@ -672,25 +663,13 @@ impl DataStore {
         ns: &BackupNamespace,
         backup_dir: &pbs_api_types::BackupDir,
     ) -> Result<(PathBuf, bool, DirLockGuard), Error> {
-        let full_path = self.snapshot_path(ns, backup_dir);
-        let relative_path = full_path.strip_prefix(self.base_path()).map_err(|err| {
-            format_err!(
-                "failed to produce correct path for backup {backup_dir} in namespace {ns}: {err}"
-            )
-        })?;
+        let backup_dir = Arc::new(self.clone()).backup_dir(ns.clone(), backup_dir.clone())?;
+        let relative_path = backup_dir.relative_path();
 
-        let lock = || {
-            lock_dir_noblock(
-                &full_path,
-                "snapshot",
-                "internal error - tried creating snapshot that's already in use",
-            )
-        };
-
-        match std::fs::create_dir(&full_path) {
-            Ok(_) => Ok((relative_path.to_owned(), true, lock()?)),
+        match std::fs::create_dir(&backup_dir.full_path()) {
+            Ok(_) => Ok((relative_path.to_owned(), true, backup_dir.lock()?)),
             Err(ref e) if e.kind() == io::ErrorKind::AlreadyExists => {
-                Ok((relative_path.to_owned(), false, lock()?))
+                Ok((relative_path.to_owned(), false, backup_dir.lock()?))
             }
             Err(e) => Err(e.into()),
         }
@@ -1132,9 +1111,7 @@ impl DataStore {
 
     /// Updates the protection status of the specified snapshot.
     pub fn update_protection(&self, backup_dir: &BackupDir, protection: bool) -> Result<(), Error> {
-        let full_path = backup_dir.full_path();
-
-        let _guard = lock_dir_noblock(&full_path, "snapshot", "possibly running or in use")?;
+        let _guard = backup_dir.lock()?;
 
         let protected_path = backup_dir.protected_file();
         if protection {
diff --git a/pbs-datastore/src/snapshot_reader.rs b/pbs-datastore/src/snapshot_reader.rs
index a2bb6aa1..22308f42 100644
--- a/pbs-datastore/src/snapshot_reader.rs
+++ b/pbs-datastore/src/snapshot_reader.rs
@@ -6,8 +6,6 @@ use std::sync::Arc;
 use anyhow::{bail, Error};
 use nix::dir::Dir;
 
-use proxmox_sys::fs::lock_dir_noblock_shared;
-
 use pbs_api_types::{print_store_and_ns, BackupNamespace, Operation};
 
 use crate::backup_info::BackupDir;
@@ -39,10 +37,8 @@ impl SnapshotReader {
 
     pub(crate) fn new_do(snapshot: BackupDir) -> Result<Self, Error> {
         let datastore = snapshot.datastore();
-        let snapshot_path = snapshot.full_path();
 
-        let locked_dir =
-            lock_dir_noblock_shared(&snapshot_path, "snapshot", "locked by another operation")?;
+        let locked_dir = snapshot.lock_shared()?;
 
         let datastore_name = datastore.name().to_string();
         let manifest = match snapshot.load_manifest() {
@@ -57,7 +53,7 @@ impl SnapshotReader {
             }
         };
 
-        let mut client_log_path = snapshot_path;
+        let mut client_log_path = snapshot.full_path();
         client_log_path.push(CLIENT_LOG_BLOB_NAME);
 
         let mut file_list = vec![MANIFEST_BLOB_NAME.to_string()];
diff --git a/src/api2/backup/mod.rs b/src/api2/backup/mod.rs
index e519a200..077d70f9 100644
--- a/src/api2/backup/mod.rs
+++ b/src/api2/backup/mod.rs
@@ -27,7 +27,6 @@ use pbs_datastore::manifest::{archive_type, ArchiveType};
 use pbs_datastore::{DataStore, PROXMOX_BACKUP_PROTOCOL_ID_V1};
 use pbs_tools::json::{required_array_param, required_integer_param, required_string_param};
 use proxmox_rest_server::{H2Service, WorkerTask};
-use proxmox_sys::fs::lock_dir_noblock_shared;
 
 mod environment;
 use environment::*;
@@ -181,12 +180,7 @@ fn upgrade_to_backup_protocol(
             }
 
             // lock last snapshot to prevent forgetting/pruning it during backup
-            let full_path = last.backup_dir.full_path();
-            Some(lock_dir_noblock_shared(
-                &full_path,
-                "snapshot",
-                "base snapshot is already locked by another operation",
-            )?)
+            Some(last.backup_dir.lock_shared()?)
         } else {
             None
         };
diff --git a/src/api2/reader/mod.rs b/src/api2/reader/mod.rs
index e2a10da3..ea7c4aa7 100644
--- a/src/api2/reader/mod.rs
+++ b/src/api2/reader/mod.rs
@@ -27,7 +27,6 @@ use pbs_datastore::manifest::{archive_type, ArchiveType};
 use pbs_datastore::{DataStore, PROXMOX_BACKUP_READER_PROTOCOL_ID_V1};
 use pbs_tools::json::required_string_param;
 use proxmox_rest_server::{H2Service, WorkerTask};
-use proxmox_sys::fs::lock_dir_noblock_shared;
 
 use crate::api2::backup::optional_ns_param;
 use crate::api2::helpers;
@@ -125,11 +124,7 @@ fn upgrade_to_backup_reader_protocol(
             }
         }
 
-        let _guard = lock_dir_noblock_shared(
-            &backup_dir.full_path(),
-            "snapshot",
-            "locked by another operation",
-        )?;
+        let _guard = backup_dir.lock_shared()?;
 
         let path = datastore.base_path();
 
diff --git a/src/backup/verify.rs b/src/backup/verify.rs
index 9ad45e5e..e482e309 100644
--- a/src/backup/verify.rs
+++ b/src/backup/verify.rs
@@ -16,7 +16,6 @@ use pbs_datastore::backup_info::{BackupDir, BackupGroup, BackupInfo};
 use pbs_datastore::index::IndexFile;
 use pbs_datastore::manifest::{archive_type, ArchiveType, BackupManifest, FileInfo};
 use pbs_datastore::{DataBlob, DataStore, StoreProgress};
-use proxmox_sys::fs::lock_dir_noblock_shared;
 
 use crate::tools::parallel_handler::ParallelHandler;
 
@@ -328,11 +327,7 @@ pub fn verify_backup_dir(
     upid: UPID,
     filter: Option<&dyn Fn(&BackupManifest) -> bool>,
 ) -> Result<bool, Error> {
-    let snap_lock = lock_dir_noblock_shared(
-        &backup_dir.full_path(),
-        "snapshot",
-        "locked by another operation",
-    );
+    let snap_lock = backup_dir.lock_shared();
     match snap_lock {
         Ok(snap_lock) => {
             verify_backup_dir_with_lock(verify_worker, backup_dir, upid, filter, snap_lock)
-- 
2.30.2





^ permalink raw reply	[flat|nested] 6+ messages in thread

* [pbs-devel] [PATCH proxmox-backup v4 3/5] fix #3935: datastore/api/backup: move datastore locking to '/run'
  2022-07-05 14:54 [pbs-devel] [PATCH proxmox-backup v4 0/5] refactor datastore locking to use tmpfs Stefan Sterz
  2022-07-05 14:54 ` [pbs-devel] [PATCH proxmox-backup v4 1/5] fix: datastore: make relative_group_path() return relative path Stefan Sterz
  2022-07-05 14:54 ` [pbs-devel] [PATCH proxmox-backup v4 2/5] fix #3935: datastore/api/backup: add lock helpers to backup dir/group Stefan Sterz
@ 2022-07-05 14:54 ` Stefan Sterz
  2022-07-05 14:54 ` [pbs-devel] [PATCH proxmox-backup v4 4/5] fix #3935: datastore: move manifest locking to new locking method Stefan Sterz
  2022-07-05 14:54 ` [pbs-devel] [PATCH proxmox-backup v4 5/5] fix: api: avoid race condition in set_backup_owner Stefan Sterz
  4 siblings, 0 replies; 6+ messages in thread
From: Stefan Sterz @ 2022-07-05 14:54 UTC (permalink / raw)
  To: pbs-devel

to avoid issues when removing a group or snapshot directory where two
threads hold a lock to the same directory, move locking to the tmpfs
backed '/run' directory and use a flat file structure. also adds
double stat'ing to make it possible to remove locks without certain
race condition issues.

Signed-off-by: Stefan Sterz <s.sterz@proxmox.com>
---
 pbs-config/src/lib.rs                |   7 ++
 pbs-datastore/src/backup_info.rs     | 121 +++++++++++++++++++++++----
 pbs-datastore/src/datastore.rs       |   6 +-
 pbs-datastore/src/snapshot_reader.rs |  17 +++-
 src/api2/backup/environment.rs       |   5 +-
 src/backup/verify.rs                 |   4 +-
 6 files changed, 133 insertions(+), 27 deletions(-)

diff --git a/pbs-config/src/lib.rs b/pbs-config/src/lib.rs
index a83db4e1..baeda8dc 100644
--- a/pbs-config/src/lib.rs
+++ b/pbs-config/src/lib.rs
@@ -23,6 +23,7 @@ pub use config_version_cache::ConfigVersionCache;
 
 use anyhow::{format_err, Error};
 use nix::unistd::{Gid, Group, Uid, User};
+use std::os::unix::prelude::AsRawFd;
 
 pub use pbs_buildcfg::{BACKUP_GROUP_NAME, BACKUP_USER_NAME};
 
@@ -48,6 +49,12 @@ pub fn backup_group() -> Result<nix::unistd::Group, Error> {
 
 pub struct BackupLockGuard(Option<std::fs::File>);
 
+impl AsRawFd for BackupLockGuard {
+    fn as_raw_fd(&self) -> i32 {
+        self.0.as_ref().map_or(-1, |f| f.as_raw_fd())
+    }
+}
+
 #[doc(hidden)]
 /// Note: do not use for production code, this is only intended for tests
 pub unsafe fn create_mocked_lock() -> BackupLockGuard {
diff --git a/pbs-datastore/src/backup_info.rs b/pbs-datastore/src/backup_info.rs
index e171ac21..046ed6e9 100644
--- a/pbs-datastore/src/backup_info.rs
+++ b/pbs-datastore/src/backup_info.rs
@@ -1,14 +1,15 @@
 use std::convert::TryFrom;
 use std::fmt;
-use std::os::unix::io::RawFd;
+use std::os::unix::io::{AsRawFd, RawFd};
+use std::path::Path;
 use std::path::PathBuf;
 use std::sync::Arc;
+use std::time::Duration;
 
 use anyhow::{bail, format_err, Error};
 
-use proxmox_sys::fs::{
-    lock_dir_noblock, lock_dir_noblock_shared, replace_file, CreateOptions, DirLockGuard,
-};
+use proxmox_sys::fs::{replace_file, CreateOptions};
+use proxmox_sys::systemd::escape_unit;
 
 use pbs_api_types::{
     Authid, BackupNamespace, BackupType, GroupFilter, BACKUP_DATE_REGEX, BACKUP_FILE_REGEX,
@@ -20,6 +21,8 @@ use crate::manifest::{
 };
 use crate::{DataBlob, DataStore};
 
+pub const DATASTORE_LOCKS_DIR: &str = "/run/proxmox-backup/locks";
+
 /// BackupGroup is a directory containing a list of BackupDir
 #[derive(Clone)]
 pub struct BackupGroup {
@@ -216,6 +219,8 @@ impl BackupGroup {
             })?;
         }
 
+        let _ = std::fs::remove_file(self.lock_path());
+
         Ok(removed_all_snaps)
     }
 
@@ -232,13 +237,29 @@ impl BackupGroup {
             .set_owner(&self.ns, self.as_ref(), auth_id, force)
     }
 
-    /// Lock a group exclusively
-    pub fn lock(&self) -> Result<DirLockGuard, Error> {
-        lock_dir_noblock(
-            &self.full_group_path(),
-            "backup group",
-            "possible running backup",
-        )
+    /// Returns a file name for locking a group.
+    ///
+    /// The lock file will be located in:
+    /// `${DATASTORE_LOCKS_DIR}/${datastore name}/${lock_file_name_helper(rpath)}`
+    /// where `rpath` is the relative path of the group.
+    ///
+    /// If the group's relative path contains non-Unicode sequences they will be replaced via
+    /// [std::ffi::OsStr::to_string_lossy()].
+    fn lock_path(&self) -> PathBuf {
+        let path = Path::new(DATASTORE_LOCKS_DIR).join(self.store.name());
+
+        let rpath = self.relative_group_path();
+        let rpath = rpath.as_os_str().to_string_lossy();
+
+        path.join(lock_file_name_helper(&rpath))
+    }
+
+    /// Locks a group exclusively.
+    pub fn lock(&self) -> Result<BackupLockGuard, Error> {
+        lock_helper(self.store.name(), &self.lock_path(), |p| {
+            open_backup_lockfile(p, Some(Duration::from_secs(0)), true)
+                .map_err(|err| format_err!("unable to acquire backup group lock {p:?} - {err}"))
+        })
     }
 }
 
@@ -443,14 +464,37 @@ impl BackupDir {
             .map_err(|err| format_err!("unable to acquire manifest lock {:?} - {}", &path, err))
     }
 
-    /// Lock this snapshot exclusively
-    pub fn lock(&self) -> Result<DirLockGuard, Error> {
-        lock_dir_noblock(&self.full_path(), "snapshot", "possibly running or in use")
+    /// Returns a file name for locking a snapshot.
+    ///
+    /// The lock file will be located in:
+    /// `${DATASTORE_LOCKS_DIR}/${datastore name}/${lock_file_name_helper(rpath)}`
+    /// where `rpath` is the relative path of the snapshot.
+    ///
+    /// If the snapshot's relative path contains non-Unicode sequences they will be replaced via
+    /// [std::ffi::OsStr::to_string_lossy()].
+    fn lock_path(&self) -> PathBuf {
+        let path = Path::new(DATASTORE_LOCKS_DIR).join(self.store.name());
+
+        let rpath = self.relative_path();
+        let rpath = rpath.as_os_str().to_string_lossy();
+
+        path.join(lock_file_name_helper(&rpath))
+    }
+
+    /// Locks a snapshot exclusively.
+    pub fn lock(&self) -> Result<BackupLockGuard, Error> {
+        lock_helper(self.store.name(), &self.lock_path(), |p| {
+            open_backup_lockfile(p, Some(Duration::from_secs(0)), true)
+                .map_err(|err| format_err!("unable to acquire snapshot lock {p:?} - {err}"))
+        })
     }
 
-    /// Acquire a shared lock on this snapshot
-    pub fn lock_shared(&self) -> Result<DirLockGuard, Error> {
-        lock_dir_noblock_shared(&self.full_path(), "snapshot", "possibly running or in use")
+    /// Acquires a shared lock on a snapshot.
+    pub fn lock_shared(&self) -> Result<BackupLockGuard, Error> {
+        lock_helper(self.store.name(), &self.lock_path(), |p| {
+            open_backup_lockfile(p, Some(Duration::from_secs(0)), false)
+                .map_err(|err| format_err!("unable to acquire shared snapshot lock {p:?} - {err}"))
+        })
     }
 
     /// Destroy the whole snapshot, bails if it's protected
@@ -473,11 +517,13 @@ impl BackupDir {
             format_err!("removing backup snapshot {:?} failed - {}", full_path, err,)
         })?;
 
-        // the manifest doesn't exist anymore, no need to keep the lock (already done by guard?)
+        // remove no longer needed lock files
         if let Ok(path) = self.manifest_lock_path() {
             let _ = std::fs::remove_file(path); // ignore errors
         }
 
+        let _ = std::fs::remove_file(self.lock_path()); // ignore errors
+
         Ok(())
     }
 
@@ -664,3 +710,42 @@ fn list_backup_files<P: ?Sized + nix::NixPath>(
 
     Ok(files)
 }
+
+/// Encodes a string so it can be used as a lock file name.
+///
+/// The first 64 characters will be the sha256 hash of `path` then a hyphen followed by up to 100
+/// characters of the unit encoded version of `path`.
+fn lock_file_name_helper(path: &str) -> String {
+    let enc = escape_unit(path, true);
+    let from = enc.len().checked_sub(100).unwrap_or(0);
+    let enc = enc[from..].to_string();
+    let hash = hex::encode(openssl::sha::sha256(path.as_bytes()));
+
+    format!("{hash}-{enc}")
+}
+
+/// Helps implement the double stat'ing procedure. It avoids certain race conditions upon lock
+/// deletion.
+///
+/// It also creates the base directory for lock files.
+fn lock_helper<F>(
+    store_name: &str,
+    path: &std::path::Path,
+    lock_fn: F,
+) -> Result<BackupLockGuard, Error>
+where
+    F: Fn(&std::path::Path) -> Result<BackupLockGuard, Error>,
+{
+    let lock_dir = Path::new(DATASTORE_LOCKS_DIR).join(store_name);
+    std::fs::create_dir_all(&lock_dir)?;
+
+    let lock = lock_fn(path)?;
+
+    let stat = nix::sys::stat::fstat(lock.as_raw_fd())?.st_ino;
+
+    if nix::sys::stat::stat(path).map_or(true, |st| stat != st.st_ino) {
+        bail!("could not acquire lock, another thread modified the lock file");
+    }
+
+    Ok(lock)
+}
diff --git a/pbs-datastore/src/datastore.rs b/pbs-datastore/src/datastore.rs
index 2cfcbcaa..85273fc6 100644
--- a/pbs-datastore/src/datastore.rs
+++ b/pbs-datastore/src/datastore.rs
@@ -11,7 +11,6 @@ use nix::unistd::{unlinkat, UnlinkatFlags};
 
 use proxmox_schema::ApiType;
 
-use proxmox_sys::fs::DirLockGuard;
 use proxmox_sys::fs::{file_read_optional_string, replace_file, CreateOptions};
 use proxmox_sys::process_locker::ProcessLockSharedGuard;
 use proxmox_sys::WorkerTaskContext;
@@ -21,6 +20,7 @@ use pbs_api_types::{
     Authid, BackupNamespace, BackupType, ChunkOrder, DataStoreConfig, DatastoreTuning,
     GarbageCollectionStatus, HumanByte, Operation, UPID,
 };
+use pbs_config::BackupLockGuard;
 
 use crate::backup_info::{BackupDir, BackupGroup};
 use crate::chunk_store::ChunkStore;
@@ -627,7 +627,7 @@ impl DataStore {
         ns: &BackupNamespace,
         backup_group: &pbs_api_types::BackupGroup,
         auth_id: &Authid,
-    ) -> Result<(Authid, DirLockGuard), Error> {
+    ) -> Result<(Authid, BackupLockGuard), Error> {
         let backup_group =
             BackupGroup::new(Arc::new(self.clone()), ns.clone(), backup_group.clone());
 
@@ -662,7 +662,7 @@ impl DataStore {
         &self,
         ns: &BackupNamespace,
         backup_dir: &pbs_api_types::BackupDir,
-    ) -> Result<(PathBuf, bool, DirLockGuard), Error> {
+    ) -> Result<(PathBuf, bool, BackupLockGuard), Error> {
         let backup_dir = Arc::new(self.clone()).backup_dir(ns.clone(), backup_dir.clone())?;
         let relative_path = backup_dir.relative_path();
 
diff --git a/pbs-datastore/src/snapshot_reader.rs b/pbs-datastore/src/snapshot_reader.rs
index 22308f42..340f30db 100644
--- a/pbs-datastore/src/snapshot_reader.rs
+++ b/pbs-datastore/src/snapshot_reader.rs
@@ -3,8 +3,11 @@ use std::os::unix::io::{AsRawFd, FromRawFd};
 use std::path::Path;
 use std::sync::Arc;
 
-use anyhow::{bail, Error};
+use anyhow::{bail, format_err, Error};
 use nix::dir::Dir;
+use nix::fcntl::OFlag;
+use nix::sys::stat::Mode;
+use pbs_config::BackupLockGuard;
 
 use pbs_api_types::{print_store_and_ns, BackupNamespace, Operation};
 
@@ -23,6 +26,10 @@ pub struct SnapshotReader {
     datastore_name: String,
     file_list: Vec<String>,
     locked_dir: Dir,
+
+    // while this is never read, the lock needs to be kept until the
+    // reader is dropped to ensure valid locking semantics
+    _lock: BackupLockGuard,
 }
 
 impl SnapshotReader {
@@ -38,7 +45,12 @@ impl SnapshotReader {
     pub(crate) fn new_do(snapshot: BackupDir) -> Result<Self, Error> {
         let datastore = snapshot.datastore();
 
-        let locked_dir = snapshot.lock_shared()?;
+        let lock = snapshot.lock_shared()?;
+
+        let path = snapshot.full_path();
+
+        let locked_dir = Dir::open(&path, OFlag::O_RDONLY, Mode::empty())
+            .map_err(|err| format_err!("unable to open snapshot directory {path:?} - {err}"))?;
 
         let datastore_name = datastore.name().to_string();
         let manifest = match snapshot.load_manifest() {
@@ -69,6 +81,7 @@ impl SnapshotReader {
             datastore_name,
             file_list,
             locked_dir,
+            _lock: lock,
         })
     }
 
diff --git a/src/api2/backup/environment.rs b/src/api2/backup/environment.rs
index 8c1c42db..c4280b3f 100644
--- a/src/api2/backup/environment.rs
+++ b/src/api2/backup/environment.rs
@@ -1,5 +1,6 @@
 use anyhow::{bail, format_err, Error};
-use nix::dir::Dir;
+
+use pbs_config::BackupLockGuard;
 use std::collections::HashMap;
 use std::sync::{Arc, Mutex};
 
@@ -632,7 +633,7 @@ impl BackupEnvironment {
     /// If verify-new is set on the datastore, this will run a new verify task
     /// for the backup. If not, this will return and also drop the passed lock
     /// immediately.
-    pub fn verify_after_complete(&self, snap_lock: Dir) -> Result<(), Error> {
+    pub fn verify_after_complete(&self, snap_lock: BackupLockGuard) -> Result<(), Error> {
         self.ensure_finished()?;
 
         if !self.datastore.verify_new() {
diff --git a/src/backup/verify.rs b/src/backup/verify.rs
index e482e309..f0c8d950 100644
--- a/src/backup/verify.rs
+++ b/src/backup/verify.rs
@@ -1,4 +1,4 @@
-use nix::dir::Dir;
+use pbs_config::BackupLockGuard;
 use std::collections::HashSet;
 use std::sync::atomic::{AtomicUsize, Ordering};
 use std::sync::{Arc, Mutex};
@@ -351,7 +351,7 @@ pub fn verify_backup_dir_with_lock(
     backup_dir: &BackupDir,
     upid: UPID,
     filter: Option<&dyn Fn(&BackupManifest) -> bool>,
-    _snap_lock: Dir,
+    _snap_lock: BackupLockGuard,
 ) -> Result<bool, Error> {
     let manifest = match backup_dir.load_manifest() {
         Ok((manifest, _)) => manifest,
-- 
2.30.2





^ permalink raw reply	[flat|nested] 6+ messages in thread

* [pbs-devel] [PATCH proxmox-backup v4 4/5] fix #3935: datastore: move manifest locking to new locking method
  2022-07-05 14:54 [pbs-devel] [PATCH proxmox-backup v4 0/5] refactor datastore locking to use tmpfs Stefan Sterz
                   ` (2 preceding siblings ...)
  2022-07-05 14:54 ` [pbs-devel] [PATCH proxmox-backup v4 3/5] fix #3935: datastore/api/backup: move datastore locking to '/run' Stefan Sterz
@ 2022-07-05 14:54 ` Stefan Sterz
  2022-07-05 14:54 ` [pbs-devel] [PATCH proxmox-backup v4 5/5] fix: api: avoid race condition in set_backup_owner Stefan Sterz
  4 siblings, 0 replies; 6+ messages in thread
From: Stefan Sterz @ 2022-07-05 14:54 UTC (permalink / raw)
  To: pbs-devel

adds double stat'ing and removes directory hierarchy to bring manifest
locking in-line with other locks used by the BackupDir trait.

Signed-off-by: Stefan Sterz <s.sterz@proxmox.com>
---
 pbs-datastore/src/backup_info.rs | 35 ++++++++++++++++----------------
 1 file changed, 18 insertions(+), 17 deletions(-)

diff --git a/pbs-datastore/src/backup_info.rs b/pbs-datastore/src/backup_info.rs
index 046ed6e9..43712700 100644
--- a/pbs-datastore/src/backup_info.rs
+++ b/pbs-datastore/src/backup_info.rs
@@ -443,25 +443,29 @@ impl BackupDir {
     /// Returns the filename to lock a manifest
     ///
     /// Also creates the basedir. The lockfile is located in
-    /// '/run/proxmox-backup/locks/{datastore}/[ns/{ns}/]+{type}/{id}/{timestamp}.index.json.lck'
-    fn manifest_lock_path(&self) -> Result<PathBuf, Error> {
-        let mut path = PathBuf::from(&format!("/run/proxmox-backup/locks/{}", self.store.name()));
-        path.push(self.relative_path());
+    /// `${DATASTORE_LOCKS_DIR}/${datastore name}/${lock_file_name_helper(rpath)}.index.json.lck`
+    /// where rpath is the relative path of the snapshot.
+    ///
+    /// If the snapshot's relative path contains non-Unicode sequences they will be replaced via
+    /// [std::ffi::OsStr::to_string_lossy()].
+    fn manifest_lock_path(&self) -> PathBuf {
+        let path = Path::new(DATASTORE_LOCKS_DIR).join(self.store.name());
 
-        std::fs::create_dir_all(&path)?;
-        let ts = self.backup_time_string();
-        path.push(&format!("{ts}{MANIFEST_LOCK_NAME}"));
+        let rpath = self.relative_path().join(&format!("{MANIFEST_LOCK_NAME}"));
+        let rpath = rpath.as_os_str().to_string_lossy();
 
-        Ok(path)
+        path.join(lock_file_name_helper(&rpath))
     }
 
     /// Locks the manifest of a snapshot, for example, to update or delete it.
     pub(crate) fn lock_manifest(&self) -> Result<BackupLockGuard, Error> {
-        let path = self.manifest_lock_path()?;
-
-        // actions locking the manifest should be relatively short, only wait a few seconds
-        open_backup_lockfile(&path, Some(std::time::Duration::from_secs(5)), true)
-            .map_err(|err| format_err!("unable to acquire manifest lock {:?} - {}", &path, err))
+        lock_helper(self.store.name(), &self.manifest_lock_path(), |p| {
+            // update_manifest should never take a long time, so if
+            // someone else has the lock we can simply block a bit
+            // and should get it soon
+            open_backup_lockfile(&p, Some(Duration::from_secs(5)), true)
+                .map_err(|err| format_err!("unable to acquire manifest lock {p:?} - {err}"))
+        })
     }
 
     /// Returns a file name for locking a snapshot.
@@ -518,10 +522,7 @@ impl BackupDir {
         })?;
 
         // remove no longer needed lock files
-        if let Ok(path) = self.manifest_lock_path() {
-            let _ = std::fs::remove_file(path); // ignore errors
-        }
-
+        let _ = std::fs::remove_file(self.manifest_lock_path()); // ignore errors
         let _ = std::fs::remove_file(self.lock_path()); // ignore errors
 
         Ok(())
-- 
2.30.2





^ permalink raw reply	[flat|nested] 6+ messages in thread

* [pbs-devel] [PATCH proxmox-backup v4 5/5] fix: api: avoid race condition in set_backup_owner
  2022-07-05 14:54 [pbs-devel] [PATCH proxmox-backup v4 0/5] refactor datastore locking to use tmpfs Stefan Sterz
                   ` (3 preceding siblings ...)
  2022-07-05 14:54 ` [pbs-devel] [PATCH proxmox-backup v4 4/5] fix #3935: datastore: move manifest locking to new locking method Stefan Sterz
@ 2022-07-05 14:54 ` Stefan Sterz
  4 siblings, 0 replies; 6+ messages in thread
From: Stefan Sterz @ 2022-07-05 14:54 UTC (permalink / raw)
  To: pbs-devel

when two clients change the owner of a backup store, a race condition
arose. add locking to avoid this.

Signed-off-by: Stefan Sterz <s.sterz@proxmox.com>
---
 src/api2/admin/datastore.rs | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/src/api2/admin/datastore.rs b/src/api2/admin/datastore.rs
index 5448fa10..4fdc1742 100644
--- a/src/api2/admin/datastore.rs
+++ b/src/api2/admin/datastore.rs
@@ -2135,10 +2135,9 @@ pub fn set_backup_owner(
     let datastore = DataStore::lookup_datastore(&store, Some(Operation::Write))?;
 
     let backup_group = datastore.backup_group(ns, backup_group);
+    let owner = backup_group.get_owner()?;
 
     if owner_check_required {
-        let owner = backup_group.get_owner()?;
-
         let allowed = match (owner.is_token(), new_owner.is_token()) {
             (true, true) => {
                 // API token to API token, owned by same user
@@ -2185,6 +2184,12 @@ pub fn set_backup_owner(
         );
     }
 
+    let _guard = backup_group.lock()?;
+
+    if owner != backup_group.get_owner()? {
+        bail!("{owner} does not own this group anymore");
+    }
+
     backup_group.set_owner(&new_owner, true)?;
 
     Ok(())
-- 
2.30.2





^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2022-07-05 14:54 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-07-05 14:54 [pbs-devel] [PATCH proxmox-backup v4 0/5] refactor datastore locking to use tmpfs Stefan Sterz
2022-07-05 14:54 ` [pbs-devel] [PATCH proxmox-backup v4 1/5] fix: datastore: make relative_group_path() return relative path Stefan Sterz
2022-07-05 14:54 ` [pbs-devel] [PATCH proxmox-backup v4 2/5] fix #3935: datastore/api/backup: add lock helpers to backup dir/group Stefan Sterz
2022-07-05 14:54 ` [pbs-devel] [PATCH proxmox-backup v4 3/5] fix #3935: datastore/api/backup: move datastore locking to '/run' Stefan Sterz
2022-07-05 14:54 ` [pbs-devel] [PATCH proxmox-backup v4 4/5] fix #3935: datastore: move manifest locking to new locking method Stefan Sterz
2022-07-05 14:54 ` [pbs-devel] [PATCH proxmox-backup v4 5/5] fix: api: avoid race condition in set_backup_owner Stefan Sterz

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox
Service provided by Proxmox Server Solutions GmbH | Privacy | Legal