* [pbs-devel] [RFC PATCH 0/5] fix #3935: refactor datastore locking to use tmpfs
  2022-03-18 14:06 UTC
  From: Stefan Sterz
  To: pbs-devel

This RFC overhauls the locking mechanism for snapshots and backup
groups in pbs-datastore and wherever groups and snapshots are
accessed. Sending this as an RFC in case I missed places where
locking occurs on groups, snapshots or manifests.

The first patch in this series moves most of the locking into new
functions that are part of the `datastore` trait. These functions
create a mirror of the directory structure under
`/run/proxmox-backup/locks` that is used for locking. The second
patch adds one more locking function that enables shared locks on
groups. This functionality has not been needed so far, but might be
useful in the future. I am unsure whether it should be included.

The third patch moves the `/run/proxmox-backup/locks` path into a
constant in order to be able to switch the lock location more easily
later. Sending this as its own patch as it, unlike the previous
patches, also affects manifest locking in a minor way.

Lastly, patches four and five fix locking-related issues that are
currently present in the codebase. For patch four, see my comment
there. Patch five resolves a race condition when two clients try to
change the owner of a datastore via the API. Patch five depends on
patch one, but patch four could be applied independently.
Stefan Sterz (5):
  fix #3935: datastore/api2/backup: move datastore locking to '/run'
  fix #3935: datastore: add shared group lock to datastore
  fix #3935: datastore/config: add a constant for the locking directory
  fix #3935: datastore: keep manifest locks, avoid similar locking issue
  fix: api: avoid race condition in set_backup_owner

 pbs-datastore/src/datastore.rs       | 95 +++++++++++++++++++++-------
 pbs-datastore/src/snapshot_reader.rs | 12 +---
 src/api2/admin/datastore.rs          |  9 ++-
 src/api2/backup/mod.rs               |  5 +-
 src/api2/reader/mod.rs               |  7 +-
 src/backup/verify.rs                 |  8 +--
 6 files changed, 88 insertions(+), 48 deletions(-)

-- 
2.30.2
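To illustrate the mirroring idea from the cover letter: each group or snapshot path inside a datastore gets a counterpart under the tmpfs-backed lock directory, and locking happens on that counterpart instead of the real directory. A minimal sketch (the helper name and example paths are made up for illustration; the actual functions live in patch 1):

```rust
use std::path::{Path, PathBuf};

// Lock directory on the tmpfs-backed /run (see patch 3, which turns
// this into a shared constant in the real code).
const LOCKS_DIR: &str = "/run/proxmox-backup/locks";

// Map a datastore name plus a store-relative group/snapshot path to
// its mirror under the lock directory. Illustrative helper only.
fn lock_path(store: &str, relative: &Path) -> PathBuf {
    Path::new(LOCKS_DIR).join(store).join(relative)
}

fn main() {
    let p = lock_path("store1", Path::new("vm/100/2022-03-18T14:06:00Z"));
    assert_eq!(
        p,
        Path::new("/run/proxmox-backup/locks/store1/vm/100/2022-03-18T14:06:00Z")
    );
    println!("{}", p.display());
}
```

Because the lock target lives on tmpfs, deleting the real group or snapshot directory no longer invalidates the directory that the flock was taken on.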
* [pbs-devel] [RFC PATCH 1/5] fix #3935: datastore/api2/backup: move datastore locking to '/run'
  2022-03-18 14:06 UTC
  From: Stefan Sterz
  To: pbs-devel

to avoid issues when removing a group or snapshot directory where two
threads hold a lock to the same directory, move locking to the tmpfs
backed '/run' directory

Signed-off-by: Stefan Sterz <s.sterz@proxmox.com>
---
 pbs-datastore/src/datastore.rs       | 74 ++++++++++++++++++++++------
 pbs-datastore/src/snapshot_reader.rs | 12 +----
 src/api2/backup/mod.rs               |  5 +-
 src/api2/reader/mod.rs               |  7 +--
 src/backup/verify.rs                 |  8 +--
 5 files changed, 66 insertions(+), 40 deletions(-)

diff --git a/pbs-datastore/src/datastore.rs b/pbs-datastore/src/datastore.rs
index d416c8d8..389ac298 100644
--- a/pbs-datastore/src/datastore.rs
+++ b/pbs-datastore/src/datastore.rs
@@ -15,7 +15,7 @@ use proxmox_sys::fs::{replace_file, file_read_optional_string, CreateOptions};
 use proxmox_sys::process_locker::ProcessLockSharedGuard;
 use proxmox_sys::WorkerTaskContext;
 use proxmox_sys::{task_log, task_warn};
-use proxmox_sys::fs::{lock_dir_noblock, DirLockGuard};
+use proxmox_sys::fs::DirLockGuard;
 
 use pbs_api_types::{
     UPID, DataStoreConfig, Authid, GarbageCollectionStatus, HumanByte,
@@ -288,11 +288,9 @@ impl DataStore {
     /// Remove a complete backup group including all snapshots, returns true
     /// if all snapshots were removed, and false if some were protected
     pub fn remove_backup_group(&self, backup_group: &BackupGroup) -> Result<bool, Error> {
-
+        let _guard = self.lock_group(backup_group)?;
         let full_path = self.group_path(backup_group);
 
-        let _guard = proxmox_sys::fs::lock_dir_noblock(&full_path, "backup group", "possible running backup")?;
-
         log::info!("removing backup group {:?}", full_path);
 
         let mut removed_all = true;
@@ -322,13 +320,11 @@ impl DataStore {
     }
 
     /// Remove a backup directory including all content
-    pub fn remove_backup_dir(&self, backup_dir: &BackupDir, force: bool) -> Result<(), Error> {
-
-        let full_path = self.snapshot_path(backup_dir);
-
+    pub fn remove_backup_dir(&self, backup_dir: &BackupDir, force: bool) -> Result<(), Error> {
         let (_guard, _manifest_guard);
+
         if !force {
-            _guard = lock_dir_noblock(&full_path, "snapshot", "possibly running or in use")?;
+            _guard = self.lock_snapshot(backup_dir)?;
             _manifest_guard = self.lock_manifest(backup_dir)?;
         }
 
@@ -336,6 +332,8 @@ impl DataStore {
             bail!("cannot remove protected snapshot");
         }
 
+        let full_path = self.snapshot_path(backup_dir);
+
         log::info!("removing backup snapshot {:?}", full_path);
         std::fs::remove_dir_all(&full_path)
             .map_err(|err| {
@@ -438,13 +436,13 @@ impl DataStore {
         // create the last component now
         match std::fs::create_dir(&full_path) {
             Ok(_) => {
-                let guard = lock_dir_noblock(&full_path, "backup group", "another backup is already running")?;
+                let guard = self.lock_group(backup_group)?;
                 self.set_owner(backup_group, auth_id, false)?;
                 let owner = self.get_owner(backup_group)?; // just to be sure
                 Ok((owner, guard))
             }
             Err(ref err) if err.kind() == io::ErrorKind::AlreadyExists => {
-                let guard = lock_dir_noblock(&full_path, "backup group", "another backup is already running")?;
+                let guard = self.lock_group(backup_group)?;
                 let owner = self.get_owner(backup_group)?; // just to be sure
                 Ok((owner, guard))
             }
@@ -462,8 +460,7 @@ impl DataStore {
         let mut full_path = self.base_path();
         full_path.push(&relative_path);
 
-        let lock = ||
-            lock_dir_noblock(&full_path, "snapshot", "internal error - tried creating snapshot that's already in use");
+        let lock = || self.lock_snapshot(backup_dir);
 
         match std::fs::create_dir(&full_path) {
             Ok(_) => Ok((relative_path, true, lock()?)),
@@ -883,9 +880,7 @@ impl DataStore {
         backup_dir: &BackupDir,
         protection: bool
     ) -> Result<(), Error> {
-        let full_path = self.snapshot_path(backup_dir);
-
-        let _guard = lock_dir_noblock(&full_path, "snapshot", "possibly running or in use")?;
+        let _guard = self.lock_snapshot(backup_dir)?;
 
         let protected_path = backup_dir.protected_file(self.base_path());
         if protection {
@@ -952,4 +947,51 @@ impl DataStore {
         Ok(chunk_list)
     }
+
+    fn snapshot_lock_path(&self, backup_dir: &BackupDir) -> Result<PathBuf, Error> {
+        let path = Path::new("/run/proxmox-backup/locks")
+            .join(self.name())
+            .join(backup_dir.relative_path());
+
+        std::fs::create_dir_all(&path)?;
+
+        Ok(path)
+    }
+
+    fn group_lock_path(&self, group: &BackupGroup) -> Result<PathBuf, Error> {
+        let path = Path::new("/run/proxmox-backup/locks")
+            .join(self.name())
+            .join(group.group_path());
+
+        std::fs::create_dir_all(&path)?;
+
+        Ok(path)
+    }
+
+    /// Lock a snapshot exclusively
+    pub fn lock_snapshot(&self, backup_dir: &BackupDir) -> Result<DirLockGuard, Error> {
+        proxmox_sys::fs::lock_dir_noblock(
+            &self.snapshot_lock_path(backup_dir)?,
+            "snapshot",
+            "possibly running or in use",
+        )
+    }
+
+    /// Acquire a shared lock on a snapshot
+    pub fn lock_snapshot_shared(&self, backup_dir: &BackupDir) -> Result<DirLockGuard, Error> {
+        proxmox_sys::fs::lock_dir_noblock_shared(
+            &self.snapshot_lock_path(backup_dir)?,
+            "snapshot",
+            "possibly running or in use",
+        )
+    }
+
+    /// Lock a group exclusively
+    pub fn lock_group(&self, group: &BackupGroup) -> Result<DirLockGuard, Error> {
+        proxmox_sys::fs::lock_dir_noblock(
+            &self.group_lock_path(group)?,
+            "backup group",
+            "another backup is already running",
+        )
+    }
 }
diff --git a/pbs-datastore/src/snapshot_reader.rs b/pbs-datastore/src/snapshot_reader.rs
index 1bbf57e7..30993b0d 100644
--- a/pbs-datastore/src/snapshot_reader.rs
+++ b/pbs-datastore/src/snapshot_reader.rs
@@ -6,8 +6,6 @@ use std::fs::File;
 use anyhow::{bail, Error};
 use nix::dir::Dir;
 
-use proxmox_sys::fs::lock_dir_noblock_shared;
-
 use crate::backup_info::BackupDir;
 use crate::index::IndexFile;
 use crate::fixed_index::FixedIndexReader;
@@ -29,13 +27,7 @@ impl SnapshotReader {
 
     /// Lock snapshot, reads the manifest and returns a new instance
     pub fn new(datastore: Arc<DataStore>, snapshot: BackupDir) -> Result<Self, Error> {
-
-        let snapshot_path = datastore.snapshot_path(&snapshot);
-
-        let locked_dir = lock_dir_noblock_shared(
-            &snapshot_path,
-            "snapshot",
-            "locked by another operation")?;
+        let locked_dir = datastore.lock_snapshot_shared(&snapshot)?;
 
         let datastore_name = datastore.name().to_string();
 
@@ -47,7 +39,7 @@ impl SnapshotReader {
             }
         };
 
-        let mut client_log_path = snapshot_path;
+        let mut client_log_path = datastore.snapshot_path(&snapshot);
         client_log_path.push(CLIENT_LOG_BLOB_NAME);
 
         let mut file_list = Vec::new();
diff --git a/src/api2/backup/mod.rs b/src/api2/backup/mod.rs
index 395edd3d..6e38f3da 100644
--- a/src/api2/backup/mod.rs
+++ b/src/api2/backup/mod.rs
@@ -20,7 +20,7 @@ use pbs_api_types::{
     BACKUP_ID_SCHEMA, BACKUP_TIME_SCHEMA, BACKUP_TYPE_SCHEMA, DATASTORE_SCHEMA,
     CHUNK_DIGEST_SCHEMA, PRIV_DATASTORE_BACKUP, BACKUP_ARCHIVE_NAME_SCHEMA,
 };
-use proxmox_sys::fs::lock_dir_noblock_shared;
+
 use pbs_tools::json::{required_array_param, required_integer_param, required_string_param};
 use pbs_config::CachedUserInfo;
 use pbs_datastore::{DataStore, PROXMOX_BACKUP_PROTOCOL_ID_V1};
@@ -157,8 +157,7 @@ async move {
             }
 
             // lock last snapshot to prevent forgetting/pruning it during backup
-            let full_path = datastore.snapshot_path(&last.backup_dir);
-            Some(lock_dir_noblock_shared(&full_path, "snapshot", "base snapshot is already locked by another operation")?)
+            Some(datastore.lock_snapshot_shared(&last.backup_dir)?)
         } else {
             None
         };
diff --git a/src/api2/reader/mod.rs b/src/api2/reader/mod.rs
index 2b11d1b1..f472686a 100644
--- a/src/api2/reader/mod.rs
+++ b/src/api2/reader/mod.rs
@@ -20,7 +20,7 @@ use pbs_api_types::{
     CHUNK_DIGEST_SCHEMA, PRIV_DATASTORE_READ, PRIV_DATASTORE_BACKUP,
     BACKUP_ARCHIVE_NAME_SCHEMA,
 };
-use proxmox_sys::fs::lock_dir_noblock_shared;
+
 use pbs_tools::json::{required_integer_param, required_string_param};
 use pbs_datastore::{DataStore, PROXMOX_BACKUP_READER_PROTOCOL_ID_V1};
 use pbs_datastore::backup_info::BackupDir;
@@ -114,10 +114,7 @@ fn upgrade_to_backup_reader_protocol(
         }
     }
 
-    let _guard = lock_dir_noblock_shared(
-        &datastore.snapshot_path(&backup_dir),
-        "snapshot",
-        "locked by another operation")?;
+    let _guard = datastore.lock_snapshot_shared(&backup_dir)?;
 
     let path = datastore.base_path();
diff --git a/src/backup/verify.rs b/src/backup/verify.rs
index 307d366c..eea0e15d 100644
--- a/src/backup/verify.rs
+++ b/src/backup/verify.rs
@@ -13,7 +13,6 @@ use pbs_datastore::{DataStore, DataBlob, StoreProgress};
 use pbs_datastore::backup_info::{BackupGroup, BackupDir, BackupInfo};
 use pbs_datastore::index::IndexFile;
 use pbs_datastore::manifest::{archive_type, ArchiveType, BackupManifest, FileInfo};
-use proxmox_sys::fs::lock_dir_noblock_shared;
 
 use crate::tools::parallel_handler::ParallelHandler;
 
@@ -300,11 +299,8 @@ pub fn verify_backup_dir(
     upid: UPID,
     filter: Option<&dyn Fn(&BackupManifest) -> bool>,
 ) -> Result<bool, Error> {
-    let snap_lock = lock_dir_noblock_shared(
-        &verify_worker.datastore.snapshot_path(backup_dir),
-        "snapshot",
-        "locked by another operation",
-    );
+    let snap_lock = verify_worker.datastore.lock_snapshot_shared(&backup_dir);
+
     match snap_lock {
         Ok(snap_lock) => {
             verify_backup_dir_with_lock(verify_worker, backup_dir, upid, filter, snap_lock)
-- 
2.30.2
* [pbs-devel] [RFC PATCH 2/5] fix #3935: datastore: add shared group lock to datastore
  2022-03-18 14:06 UTC
  From: Stefan Sterz
  To: pbs-devel

so far this has not been needed, added for completeness' sake

Signed-off-by: Stefan Sterz <s.sterz@proxmox.com>
---
 pbs-datastore/src/datastore.rs | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/pbs-datastore/src/datastore.rs b/pbs-datastore/src/datastore.rs
index 389ac298..c1304d1a 100644
--- a/pbs-datastore/src/datastore.rs
+++ b/pbs-datastore/src/datastore.rs
@@ -994,4 +994,14 @@ impl DataStore {
             "another backup is already running",
         )
     }
+
+    /// Acquire a shared lock on a group
+    #[allow(dead_code)]
+    pub fn lock_group_shared(&self, group: &BackupGroup) -> Result<DirLockGuard, Error> {
+        proxmox_sys::fs::lock_dir_noblock_shared(
+            &self.group_lock_path(group)?,
+            "backup group",
+            "another backup is already running",
+        )
+    }
 }
-- 
2.30.2
* [pbs-devel] [RFC PATCH 3/5] fix #3935: datastore/config: add a constant for the locking directory
  2022-03-18 14:06 UTC
  From: Stefan Sterz
  To: pbs-devel

adds a constant to make switching the datastore lock directory
location easier

Signed-off-by: Stefan Sterz <s.sterz@proxmox.com>
---
 pbs-datastore/src/datastore.rs | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/pbs-datastore/src/datastore.rs b/pbs-datastore/src/datastore.rs
index c1304d1a..11bd578d 100644
--- a/pbs-datastore/src/datastore.rs
+++ b/pbs-datastore/src/datastore.rs
@@ -39,6 +39,8 @@ lazy_static! {
     static ref DATASTORE_MAP: Mutex<HashMap<String, Arc<DataStore>>> = Mutex::new(HashMap::new());
 }
 
+pub const DATASTORE_LOCKS_DIR: &str = "/run/proxmox-backup/locks";
+
 /// checks if auth_id is owner, or, if owner is a token, if
 /// auth_id is the user of the token
 pub fn check_backup_owner(
@@ -807,7 +809,8 @@ impl DataStore {
         backup_dir: &BackupDir,
     ) -> Result<String, Error> {
         let mut path = format!(
-            "/run/proxmox-backup/locks/{}/{}/{}",
+            "{}/{}/{}/{}",
+            DATASTORE_LOCKS_DIR,
             self.name(),
             backup_dir.group().backup_type(),
             backup_dir.group().backup_id(),
@@ -949,7 +952,7 @@ impl DataStore {
     }
 
     fn snapshot_lock_path(&self, backup_dir: &BackupDir) -> Result<PathBuf, Error> {
-        let path = Path::new("/run/proxmox-backup/locks")
+        let path = Path::new(DATASTORE_LOCKS_DIR)
            .join(self.name())
            .join(backup_dir.relative_path());
 
@@ -959,7 +962,7 @@ impl DataStore {
     }
 
     fn group_lock_path(&self, group: &BackupGroup) -> Result<PathBuf, Error> {
-        let path = Path::new("/run/proxmox-backup/locks")
+        let path = Path::new(DATASTORE_LOCKS_DIR)
            .join(self.name())
            .join(group.group_path());
-- 
2.30.2
* [pbs-devel] [RFC PATCH 4/5] fix #3935: datastore: keep manifest locks, avoid similar locking issue
  2022-03-18 14:06 UTC
  From: Stefan Sterz
  To: pbs-devel

just as removing the directories we locked on is an issue, so is
removing the locks for the manifest.

Signed-off-by: Stefan Sterz <s.sterz@proxmox.com>
---
Similarly to the issue that led to this patch series, the following
scenario seems plausible to me:

- A wants to remove a group; locks the group, locks the manifest (via
  `remove_group`)
- B wants to interact with the manifest; begins locking the manifest
  via `lock_manifest` and opens a file handle to the lock file
- A deletes the group and with it the manifest
- C wants to interact with the manifest; re-creates the lock file via
  `lock_manifest` and now holds a lock on the manifest
- A drops the group and manifest locks
- B acquires its lock
- B and C now hold the lock on the same manifest

I am not sure how likely this is or how "bad", but given that
`lock_manifest` has a 5 second timeout, it might be more likely than
it appears at first. Correct me if I am wrong though.

If I am not mistaken, this is also the reason why we can basically
_never_ remove a file/directory that acts as a lock. One solution
would be to require a higher lock protecting the lower lock, which
would need to be acquired by all threads interacting with the given
object. However, this basically negates the principle of acquiring
locks with the least privileges and would more or less converge
towards a global lock. For example: to remove a snapshot lock, we
need to lock the group. If it's the last snapshot and we want to
remove the group, we'd need to lock the datastore. Now every thread
that wants to access a snapshot needs to acquire a datastore lock.

There might be a nicer solution that I am missing. If I am wrong
please tell me :)

 pbs-datastore/src/datastore.rs | 6 ------
 1 file changed, 6 deletions(-)

diff --git a/pbs-datastore/src/datastore.rs b/pbs-datastore/src/datastore.rs
index 11bd578d..60383860 100644
--- a/pbs-datastore/src/datastore.rs
+++ b/pbs-datastore/src/datastore.rs
@@ -346,12 +346,6 @@ impl DataStore {
             )
         })?;
 
-        // the manifest does not exists anymore, we do not need to keep the lock
-        if let Ok(path) = self.manifest_lock_path(backup_dir) {
-            // ignore errors
-            let _ = std::fs::remove_file(path);
-        }
-
         Ok(())
     }
-- 
2.30.2
* [pbs-devel] [RFC PATCH 5/5] fix: api: avoid race condition in set_backup_owner
  2022-03-18 14:06 UTC
  From: Stefan Sterz
  To: pbs-devel

when two clients change the owner of a backup group concurrently, a
race condition can arise. add locking and re-check the owner under
the lock to avoid this.

Signed-off-by: Stefan Sterz <s.sterz@proxmox.com>
---
 src/api2/admin/datastore.rs | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/src/api2/admin/datastore.rs b/src/api2/admin/datastore.rs
index ef82b426..0c297937 100644
--- a/src/api2/admin/datastore.rs
+++ b/src/api2/admin/datastore.rs
@@ -1889,11 +1889,12 @@ pub fn set_backup_owner(
 
     let privs = user_info.lookup_privs(&auth_id, &["datastore", &store]);
 
+    let owner = datastore.get_owner(&backup_group)?;
+
     let allowed = if (privs & PRIV_DATASTORE_MODIFY) != 0 {
         // High-privilege user/token
         true
     } else if (privs & PRIV_DATASTORE_BACKUP) != 0 {
-        let owner = datastore.get_owner(&backup_group)?;
 
         match (owner.is_token(), new_owner.is_token()) {
             (true, true) => {
@@ -1940,6 +1941,12 @@ pub fn set_backup_owner(
                   new_owner);
     }
 
+    let _guard = datastore.lock_group(&backup_group)?;
+
+    if owner != datastore.get_owner(&backup_group)? {
+        bail!("{} does not own this group anymore", owner);
+    }
+
     datastore.set_owner(&backup_group, &new_owner, true)?;
 
     Ok(())
-- 
2.30.2
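The fix above follows a general read/lock/re-check pattern: read the state for the unlocked permission check, then take the lock and verify the state has not changed before mutating. A minimal, self-contained sketch of that pattern with toy types (not the PBS API; an in-process `Mutex` stands in for the on-disk group lock):

```rust
use std::sync::Mutex;

// Toy stand-in for a backup group's owner record; in PBS this lives
// on disk and is guarded by the group lock.
struct Group {
    owner: Mutex<String>,
}

// Change the owner only if it still matches what we saw before taking
// the lock; otherwise another client won the race and we bail out.
fn set_owner_checked(group: &Group, seen_owner: &str, new_owner: &str) -> Result<(), String> {
    // ... unlocked permission checks would use `seen_owner` here ...
    let mut owner = group.owner.lock().unwrap(); // take the lock
    if *owner != seen_owner {
        // the owner changed between the read and the lock: refuse
        return Err(format!("{} does not own this group anymore", seen_owner));
    }
    *owner = new_owner.to_string();
    Ok(())
}

fn main() {
    let group = Group { owner: Mutex::new("alice".into()) };
    assert!(set_owner_checked(&group, "alice", "bob").is_ok());
    // a second client that still believes "alice" owns the group fails
    assert!(set_owner_checked(&group, "alice", "carol").is_err());
    println!("owner is now {}", group.owner.lock().unwrap());
}
```

Without the re-check, two concurrent requests could both pass the permission check against the same stale owner and the last writer would silently win.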
* Re: [pbs-devel] [RFC PATCH 0/5] fix #3935: refactor datastore locking to use tmpfs
  2022-03-22 9:38 UTC
  From: Wolfgang Bumiller
  To: Stefan Sterz; +Cc: pbs-devel

On Fri, Mar 18, 2022 at 03:06:50PM +0100, Stefan Sterz wrote:
> This RFC overhauls the locking mechanism for snapshots and backup
> groups in pbs-datastore and wherever groups and snapshots are
> accessed. Sending this as an RFC in case I missed places where
> locking occurs on groups, snapshots or manifests.
>
> The first patch in this series moves most of the locking into new
> functions that are part of the `datastore` trait. These functions
> create a mirror of the directory structure under
> `/run/proxmox-backup/locks` that is used for locking. The second
> patch adds one more locking function that enables shared locks on
> groups. This functionality has not been needed so far, but might be
> useful in the future. I am unsure whether it should be included.
>
> The third patch moves the `/run/proxmox-backup/locks` path into a
> constant in order to be able to switch the lock location more easily
> latter. Sending this as its own patch as it also affects manifest
> locking in a minor way, unlike the previous patches.
>
> Lastly, patch four and five fix issues that are locking related and
> currently present in the codebase. For patch four, see my comment
> there. Patch five resolves a race condition when two clients try to
> change the owner of a data store via the API. Patch five depends on
> patch one, but patch four could be applied independently.

So I think we'll move ahead with this series with minor changes and
possibly extend it a little further.

We had an off-list discussion and basically came to the following
conclusion:

Since this series moves the locks into a tmpfs, adding `stat()` calls
on the locked fd and on the path afterwards should be cheap enough. By
comparing the inodes we can make sure the file is still the same one
and we didn't race against a deletion. This way we don't need to leave
the files lying around.

From what I can tell you currently use directories for locks. But
since the locks aren't actually the backup group directory anymore, I
think they should be files instead. Also, since we'd like to actually
clean up the locks, the lock paths shouldn't be nested directories as
they are now, but flat files (still based on the path but escaped,
otherwise cleaning up would become even more complex ;-) ).

Other ideas which floated around:

- A lock daemon (or simply a fuse implementation where 'unlink'
  cancels pending flocks). This could be extended in various ways for
  async locking, timeouts, etc.
- A "hashmap lock file": an in-memory file filled with robust futexes
  where the group name simply gets hashed to find a position in the
  file. Hash clashes aren't an issue, since all they'd do is cause
  multiple groups to be guarded by the same lock, which wouldn't
  technically be wrong.

But the tmpfs+stat approach seems good enough to - at least for now -
not really bother with these other options.

I also think patch 3 should be merged into 1 (the constant should be
there right away), and patch 2 (dead_code) can just be dropped for
now. As for the manifest lock (patch 4), we could probably switch to
doing a double-stat() there as well instead.
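The tmpfs+stat idea described above could look roughly like this sketch, using only the Rust standard library. The actual `flock()` call (which needs the `nix` or `libc` crate) is elided and marked in a comment; the point here is the inode comparison that detects an unlink/re-create race:

```rust
use std::fs::{self, File, OpenOptions};
use std::io;
use std::os::unix::fs::MetadataExt;
use std::path::Path;

/// Open (or create) a lock file, then verify that the path still
/// refers to the inode we hold a handle to. If another process
/// unlinked and re-created the file in between, the inode numbers
/// differ and we retry. In the real implementation an flock() on
/// `file` would sit between the open and the stat comparison.
fn open_lock_checked(path: &Path) -> io::Result<File> {
    loop {
        let file = OpenOptions::new().write(true).create(true).open(path)?;
        // ... flock(file, LOCK_EX) would go here (elided) ...
        let fd_ino = file.metadata()?.ino(); // inode of the fd we hold
        match fs::metadata(path) {
            // still the same file on disk: the lock is valid
            Ok(meta) if meta.ino() == fd_ino => return Ok(file),
            // raced against an unlink + re-create (or deletion): retry
            _ => continue,
        }
    }
}

fn main() -> io::Result<()> {
    let path = Path::new("/tmp/pbs-lock-sketch");
    let lock = open_lock_checked(path)?;
    println!("holding inode {}", lock.metadata()?.ino());
    fs::remove_file(path)
}
```

Because the check happens after the lock would be acquired, a holder that passes it knows no concurrent deletion invalidated its lock, which is what allows the lock files to be cleaned up at all.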
end of thread, other threads:[~2022-03-22 9:38 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-03-18 14:06 [pbs-devel] [RFC PATCH 0/5] fix #3935: refactor datastore locking to use tmpfs Stefan Sterz
2022-03-18 14:06 ` [pbs-devel] [RFC PATCH 1/5] fix #3935: datastore/api2/backup: move datastore locking to '/run' Stefan Sterz
2022-03-18 14:06 ` [pbs-devel] [RFC PATCH 2/5] fix #3935: datastore: add shared group lock to datastore Stefan Sterz
2022-03-18 14:06 ` [pbs-devel] [RFC PATCH 3/5] fix #3935: datastore/config: add a constant for the locking directory Stefan Sterz
2022-03-18 14:06 ` [pbs-devel] [RFC PATCH 4/5] fix #3935: datastore: keep manifest locks, avoid similar locking issue Stefan Sterz
2022-03-18 14:06 ` [pbs-devel] [RFC PATCH 5/5] fix: api: avoid race condition in set_backup_owner Stefan Sterz
2022-03-22  9:38 ` [pbs-devel] [RFC PATCH 0/5] fix #3935: refactor datastore locking to use tmpfs Wolfgang Bumiller