public inbox for pbs-devel@lists.proxmox.com
* [pbs-devel] [PATCH v2 0/7] More flocking and race elimination
@ 2020-08-11  8:50 Stefan Reiter
  2020-08-11  8:50 ` [pbs-devel] [PATCH v2 proxmox-backup 1/7] src/tools/fs.rs: new helper lock_dir_noblock Stefan Reiter
                   ` (6 more replies)
  0 siblings, 7 replies; 9+ messages in thread
From: Stefan Reiter @ 2020-08-11  8:50 UTC (permalink / raw)
  To: pbs-devel

The previous series[0] already fixed some bugs, but it also introduced new
ones; e.g. incomplete but no-longer-running backups could not be removed.

This is the next step to race-free™ backups, this time going all-in on flocks.

The 6th patch introduces locking for base backups, but also rewrites the
previous (unnecessarily complex) base-snapshot existence check, which is
therefore reverted in patch 5.

v2:
* drop applied patches
* add cleanup patches from Dietmar and rewrite the series on top [1]
* use new lock_dir_noblock function
* rethink prune logic with feedback from Fabi


[0] https://lists.proxmox.com/pipermail/pbs-devel/2020-July/000233.html
[1] https://lists.proxmox.com/pipermail/pbs-devel/2020-August/000311.html


proxmox-backup: Dietmar Maurer (2):
  src/tools/fs.rs: new helper lock_dir_noblock
  src/backup/backup_info.rs: remove BackupGroup lock()

Stefan Reiter (5):
  datastore: prevent in-use deletion with locks instead of heuristic
  backup: flock snapshot on backup start
  Revert "backup: ensure base snapshots are still available after
    backup"
  backup: lock base snapshot and ensure existance on finish
  prune: also check backup snapshot locks

 src/api2/admin/datastore.rs     |  2 +-
 src/api2/backup.rs              | 16 ++++++---
 src/api2/backup/environment.rs  | 17 +++-------
 src/backup/backup_info.rs       | 43 +----------------------
 src/backup/datastore.rs         | 60 ++++++++++-----------------------
 src/backup/prune.rs             | 50 +++++++++++++++++----------
 src/bin/proxmox-backup-proxy.rs |  2 +-
 src/client/pull.rs              |  2 +-
 src/tools/fs.rs                 | 39 +++++++++++++++++++--
 tests/prune.rs                  |  6 ++--
 10 files changed, 110 insertions(+), 127 deletions(-)

-- 
2.20.1




^ permalink raw reply	[flat|nested] 9+ messages in thread

* [pbs-devel] [PATCH v2 proxmox-backup 1/7] src/tools/fs.rs: new helper lock_dir_noblock
  2020-08-11  8:50 [pbs-devel] [PATCH v2 0/7] More flocking and race elimination Stefan Reiter
@ 2020-08-11  8:50 ` Stefan Reiter
  2020-08-11  8:50 ` [pbs-devel] [PATCH v2 proxmox-backup 2/7] src/backup/backup_info.rs: remove BackupGroup lock() Stefan Reiter
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 9+ messages in thread
From: Stefan Reiter @ 2020-08-11  8:50 UTC (permalink / raw)
  To: pbs-devel

From: Dietmar Maurer <dietmar@proxmox.com>

Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
---
 src/tools/fs.rs | 39 ++++++++++++++++++++++++++++++++++++---
 1 file changed, 36 insertions(+), 3 deletions(-)

diff --git a/src/tools/fs.rs b/src/tools/fs.rs
index ff625d30..5aaaecda 100644
--- a/src/tools/fs.rs
+++ b/src/tools/fs.rs
@@ -7,10 +7,18 @@ use std::os::unix::io::{AsRawFd, RawFd};
 use anyhow::{format_err, Error};
 use nix::dir;
 use nix::dir::Dir;
+use nix::fcntl::OFlag;
+use nix::sys::stat::Mode;
+
 use regex::Regex;
 
+use proxmox::sys::error::SysError;
+
+
 use crate::tools::borrow::Tied;
 
+pub type DirLockGuard = Dir;
+
 /// This wraps nix::dir::Entry with the parent directory's file descriptor.
 pub struct ReadDirEntry {
     entry: dir::Entry,
@@ -94,9 +102,6 @@ impl Iterator for ReadDir {
 /// Create an iterator over sub directory entries.
 /// This uses `openat` on `dirfd`, so `path` can be relative to that or an absolute path.
 pub fn read_subdir<P: ?Sized + nix::NixPath>(dirfd: RawFd, path: &P) -> nix::Result<ReadDir> {
-    use nix::fcntl::OFlag;
-    use nix::sys::stat::Mode;
-
     let dir = Dir::openat(dirfd, path, OFlag::O_RDONLY, Mode::empty())?;
     let fd = dir.as_raw_fd();
     let iter = Tied::new(dir, |dir| {
@@ -259,3 +264,31 @@ impl Default for FSXAttr {
         }
     }
 }
+
+
+pub fn lock_dir_noblock(
+    path: &std::path::Path,
+    what: &str,
+    would_block_msg: &str,
+) -> Result<DirLockGuard, Error> {
+    let mut handle = Dir::open(path, OFlag::O_RDONLY, Mode::empty())
+        .map_err(|err| {
+            format_err!("unable to open {} directory {:?} for locking - {}", what, path, err)
+        })?;
+
+    // acquire in non-blocking mode, no point in waiting here since other
+    // backups could still take a very long time
+    proxmox::tools::fs::lock_file(&mut handle, true, Some(std::time::Duration::from_nanos(0)))
+        .map_err(|err| {
+            format_err!(
+                "unable to acquire lock on {} directory {:?} - {}", what, path,
+                if err.would_block() {
+                    String::from(would_block_msg)
+                } else {
+                    err.to_string()
+                }
+            )
+        })?;
+
+    Ok(handle)
+}
-- 
2.20.1






* [pbs-devel] [PATCH v2 proxmox-backup 2/7] src/backup/backup_info.rs: remove BackupGroup lock()
  2020-08-11  8:50 [pbs-devel] [PATCH v2 0/7] More flocking and race elimination Stefan Reiter
  2020-08-11  8:50 ` [pbs-devel] [PATCH v2 proxmox-backup 1/7] src/tools/fs.rs: new helper lock_dir_noblock Stefan Reiter
@ 2020-08-11  8:50 ` Stefan Reiter
  2020-08-11  8:50 ` [pbs-devel] [PATCH v2 proxmox-backup 3/7] datastore: prevent in-use deletion with locks instead of heuristic Stefan Reiter
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 9+ messages in thread
From: Stefan Reiter @ 2020-08-11  8:50 UTC (permalink / raw)
  To: pbs-devel

From: Dietmar Maurer <dietmar@proxmox.com>

Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
---
 src/backup/backup_info.rs | 41 ---------------------------------------
 src/backup/datastore.rs   |  9 +++++----
 2 files changed, 5 insertions(+), 45 deletions(-)

diff --git a/src/backup/backup_info.rs b/src/backup/backup_info.rs
index ea917d3c..26b57fae 100644
--- a/src/backup/backup_info.rs
+++ b/src/backup/backup_info.rs
@@ -3,16 +3,12 @@ use crate::tools;
 use anyhow::{bail, format_err, Error};
 use regex::Regex;
 use std::os::unix::io::RawFd;
-use nix::dir::Dir;
 
-use std::time::Duration;
 use chrono::{DateTime, TimeZone, SecondsFormat, Utc};
 
 use std::path::{PathBuf, Path};
 use lazy_static::lazy_static;
 
-use proxmox::sys::error::SysError;
-
 use super::manifest::MANIFEST_BLOB_NAME;
 
 macro_rules! BACKUP_ID_RE { () => (r"[A-Za-z0-9][A-Za-z0-9_-]+") }
@@ -40,9 +36,6 @@ lazy_static!{
 
 }
 
-/// Opaque type releasing the corresponding flock when dropped
-pub type BackupGroupGuard = Dir;
-
 /// BackupGroup is a directory containing a list of BackupDir
 #[derive(Debug, Eq, PartialEq, Hash, Clone)]
 pub struct BackupGroup {
@@ -137,40 +130,6 @@ impl BackupGroup {
         Ok(last)
     }
 
-    pub fn lock(&self, base_path: &Path) -> Result<BackupGroupGuard, Error> {
-        use nix::fcntl::OFlag;
-        use nix::sys::stat::Mode;
-
-        let mut path = base_path.to_owned();
-        path.push(self.group_path());
-
-        let mut handle = Dir::open(&path, OFlag::O_RDONLY, Mode::empty())
-            .map_err(|err| {
-                format_err!(
-                    "unable to open backup group directory {:?} for locking - {}",
-                    self.group_path(),
-                    err,
-                )
-            })?;
-
-        // acquire in non-blocking mode, no point in waiting here since other
-        // backups could still take a very long time
-        proxmox::tools::fs::lock_file(&mut handle, true, Some(Duration::from_nanos(0)))
-            .map_err(|err| {
-                format_err!(
-                    "unable to acquire lock on backup group {:?} - {}",
-                    self.group_path(),
-                    if err.would_block() {
-                        String::from("another backup is already running")
-                    } else {
-                        err.to_string()
-                    }
-                )
-            })?;
-
-        Ok(handle)
-    }
-
     pub fn list_groups(base_path: &Path) -> Result<Vec<BackupGroup>, Error> {
         let mut list = Vec::new();
 
diff --git a/src/backup/datastore.rs b/src/backup/datastore.rs
index afdff224..01695f48 100644
--- a/src/backup/datastore.rs
+++ b/src/backup/datastore.rs
@@ -11,7 +11,7 @@ use serde_json::Value;
 
 use proxmox::tools::fs::{replace_file, CreateOptions};
 
-use super::backup_info::{BackupGroup, BackupGroupGuard, BackupDir, BackupInfo};
+use super::backup_info::{BackupGroup, BackupDir, BackupInfo};
 use super::chunk_store::ChunkStore;
 use super::dynamic_index::{DynamicIndexReader, DynamicIndexWriter};
 use super::fixed_index::{FixedIndexReader, FixedIndexWriter};
@@ -21,6 +21,7 @@ use super::{DataBlob, ArchiveType, archive_type};
 use crate::config::datastore;
 use crate::server::WorkerTask;
 use crate::tools;
+use crate::tools::fs::{lock_dir_noblock, DirLockGuard};
 use crate::api2::types::{GarbageCollectionStatus, Userid};
 
 lazy_static! {
@@ -335,7 +336,7 @@ impl DataStore {
         &self,
         backup_group: &BackupGroup,
         userid: &Userid,
-    ) -> Result<(Userid, BackupGroupGuard), Error> {
+    ) -> Result<(Userid, DirLockGuard), Error> {
         // create intermediate path first:
         let base_path = self.base_path();
 
@@ -348,13 +349,13 @@ impl DataStore {
         // create the last component now
         match std::fs::create_dir(&full_path) {
             Ok(_) => {
-                let guard = backup_group.lock(&base_path)?;
+                let guard = lock_dir_noblock(&full_path, "backup group", "another backup is already running")?;
                 self.set_owner(backup_group, userid, false)?;
                 let owner = self.get_owner(backup_group)?; // just to be sure
                 Ok((owner, guard))
             }
             Err(ref err) if err.kind() == io::ErrorKind::AlreadyExists => {
-                let guard = backup_group.lock(&base_path)?;
+                let guard = lock_dir_noblock(&full_path, "backup group", "another backup is already running")?;
                 let owner = self.get_owner(backup_group)?; // just to be sure
                 Ok((owner, guard))
             }
-- 
2.20.1






* [pbs-devel] [PATCH v2 proxmox-backup 3/7] datastore: prevent in-use deletion with locks instead of heuristic
  2020-08-11  8:50 [pbs-devel] [PATCH v2 0/7] More flocking and race elimination Stefan Reiter
  2020-08-11  8:50 ` [pbs-devel] [PATCH v2 proxmox-backup 1/7] src/tools/fs.rs: new helper lock_dir_noblock Stefan Reiter
  2020-08-11  8:50 ` [pbs-devel] [PATCH v2 proxmox-backup 2/7] src/backup/backup_info.rs: remove BackupGroup lock() Stefan Reiter
@ 2020-08-11  8:50 ` Stefan Reiter
  2020-08-11  8:50 ` [pbs-devel] [PATCH v2 proxmox-backup 4/7] backup: flock snapshot on backup start Stefan Reiter
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 9+ messages in thread
From: Stefan Reiter @ 2020-08-11  8:50 UTC (permalink / raw)
  To: pbs-devel

Attempt to lock the backup directory to be deleted, if it works keep the
lock until the deletion is complete. This way we ensure that no other
locking operation (e.g. using a snapshot as base for another backup) can
happen concurrently.

Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
---

For this to actually work the following patches are obviously necessary, but I
wanted to keep them separate for review.

 src/backup/datastore.rs | 40 ++++------------------------------------
 1 file changed, 4 insertions(+), 36 deletions(-)

diff --git a/src/backup/datastore.rs b/src/backup/datastore.rs
index 01695f48..71544d20 100644
--- a/src/backup/datastore.rs
+++ b/src/backup/datastore.rs
@@ -11,7 +11,7 @@ use serde_json::Value;
 
 use proxmox::tools::fs::{replace_file, CreateOptions};
 
-use super::backup_info::{BackupGroup, BackupDir, BackupInfo};
+use super::backup_info::{BackupGroup, BackupDir};
 use super::chunk_store::ChunkStore;
 use super::dynamic_index::{DynamicIndexReader, DynamicIndexWriter};
 use super::fixed_index::{FixedIndexReader, FixedIndexWriter};
@@ -200,19 +200,7 @@ impl DataStore {
 
         let full_path = self.group_path(backup_group);
 
-        let mut snap_list = backup_group.list_backups(&self.base_path())?;
-        BackupInfo::sort_list(&mut snap_list, false);
-        for snap in snap_list {
-            if snap.is_finished() {
-                break;
-            } else {
-                bail!(
-                    "cannot remove backup group {:?}, contains potentially running backup: {}",
-                    full_path,
-                    snap.backup_dir
-                );
-            }
-        }
+        let _guard = tools::fs::lock_dir_noblock(&full_path, "backup group", "possible running backup")?;
 
         log::info!("removing backup group {:?}", full_path);
         std::fs::remove_dir_all(&full_path)
@@ -232,29 +220,9 @@ impl DataStore {
 
         let full_path = self.snapshot_path(backup_dir);
 
+        let _guard;
         if !force {
-            let mut snap_list = backup_dir.group().list_backups(&self.base_path())?;
-            BackupInfo::sort_list(&mut snap_list, false);
-            let mut prev_snap_finished = true;
-            for snap in snap_list {
-                let cur_snap_finished = snap.is_finished();
-                if &snap.backup_dir == backup_dir {
-                    if !cur_snap_finished {
-                        bail!(
-                            "cannot remove currently running snapshot: {:?}",
-                            backup_dir
-                        );
-                    }
-                    if !prev_snap_finished {
-                        bail!(
-                            "cannot remove snapshot {:?}, successor is currently running and potentially based on it",
-                            backup_dir
-                        );
-                    }
-                    break;
-                }
-                prev_snap_finished = cur_snap_finished;
-            }
+            _guard = lock_dir_noblock(&full_path, "snapshot", "possibly running or used as base")?;
         }
 
         log::info!("removing backup snapshot {:?}", full_path);
-- 
2.20.1
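The `let _guard;` declared before the `if !force` block in the hunk above is what keeps the optionally-acquired lock alive until the function returns. A minimal sketch of that idiom (toy `Guard`/`acquire`, hypothetical names):

```rust
struct Guard;

fn acquire() -> Result<Guard, String> {
    Ok(Guard)
}

// Declaring `_guard` before the `if` extends the lock's lifetime to the end
// of the function, so the directory stays locked while it is being removed.
// The name matters: `let _ = acquire()?;` would drop the guard immediately.
fn remove_snapshot(force: bool) -> Result<bool, String> {
    let _guard;
    let mut locked = false;
    if !force {
        _guard = acquire()?;
        locked = true;
    }
    // ... std::fs::remove_dir_all() would run here, lock still held ...
    Ok(locked)
}
```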






* [pbs-devel] [PATCH v2 proxmox-backup 4/7] backup: flock snapshot on backup start
  2020-08-11  8:50 [pbs-devel] [PATCH v2 0/7] More flocking and race elimination Stefan Reiter
                   ` (2 preceding siblings ...)
  2020-08-11  8:50 ` [pbs-devel] [PATCH v2 proxmox-backup 3/7] datastore: prevent in-use deletion with locks instead of heuristic Stefan Reiter
@ 2020-08-11  8:50 ` Stefan Reiter
  2020-08-11  8:50 ` [pbs-devel] [PATCH v2 proxmox-backup 5/7] Revert "backup: ensure base snapshots are still available after backup" Stefan Reiter
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 9+ messages in thread
From: Stefan Reiter @ 2020-08-11  8:50 UTC (permalink / raw)
  To: pbs-devel

A flock on the snapshot dir itself is used in addition to the group dir
lock. The lock is used to avoid races with forget and prune, while
providing more granularity than the group lock (i.e. the group lock is
necessary to prevent more than one backup per group, but the snapshot
lock still allows backups unrelated to the currently running one to be
forgotten/pruned).

Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
---
 src/api2/backup.rs      |  3 ++-
 src/backup/datastore.rs | 13 +++++++++----
 src/client/pull.rs      |  2 +-
 3 files changed, 12 insertions(+), 6 deletions(-)

diff --git a/src/api2/backup.rs b/src/api2/backup.rs
index 4b019007..f541eb97 100644
--- a/src/api2/backup.rs
+++ b/src/api2/backup.rs
@@ -106,7 +106,7 @@ async move {
         }
     }
 
-    let (path, is_new) = datastore.create_backup_dir(&backup_dir)?;
+    let (path, is_new, _snap_guard) = datastore.create_locked_backup_dir(&backup_dir)?;
     if !is_new { bail!("backup directory already exists."); }
 
     WorkerTask::spawn("backup", Some(worker_id), userid.clone(), true, move |worker| {
@@ -146,6 +146,7 @@ async move {
         async move {
             // keep flock until task ends
             let _group_guard = _group_guard;
+            let _snap_guard = _snap_guard;
 
             let res = select!{
                 req = req_fut => req,
diff --git a/src/backup/datastore.rs b/src/backup/datastore.rs
index 71544d20..37f581bc 100644
--- a/src/backup/datastore.rs
+++ b/src/backup/datastore.rs
@@ -334,15 +334,20 @@ impl DataStore {
     /// Creates a new backup snapshot inside a BackupGroup
     ///
     /// The BackupGroup directory needs to exist.
-    pub fn create_backup_dir(&self, backup_dir: &BackupDir) ->  Result<(PathBuf, bool), io::Error> {
+    pub fn create_locked_backup_dir(&self, backup_dir: &BackupDir)
+        -> Result<(PathBuf, bool, DirLockGuard), Error>
+    {
         let relative_path = backup_dir.relative_path();
         let mut full_path = self.base_path();
         full_path.push(&relative_path);
 
+        let lock = ||
+            lock_dir_noblock(&full_path, "snapshot", "internal error - tried creating snapshot that's already in use");
+
         match std::fs::create_dir(&full_path) {
-            Ok(_) => Ok((relative_path, true)),
-            Err(ref e) if e.kind() == io::ErrorKind::AlreadyExists => Ok((relative_path, false)),
-            Err(e) => Err(e)
+            Ok(_) => Ok((relative_path, true, lock()?)),
+            Err(ref e) if e.kind() == io::ErrorKind::AlreadyExists => Ok((relative_path, false, lock()?)),
+            Err(e) => Err(e.into())
         }
     }
 
diff --git a/src/client/pull.rs b/src/client/pull.rs
index 05b7c66c..2428051a 100644
--- a/src/client/pull.rs
+++ b/src/client/pull.rs
@@ -297,7 +297,7 @@ pub async fn pull_snapshot_from(
     snapshot: &BackupDir,
 ) -> Result<(), Error> {
 
-    let (_path, is_new) = tgt_store.create_backup_dir(&snapshot)?;
+    let (_path, is_new, _snap_lock) = tgt_store.create_locked_backup_dir(&snapshot)?;
 
     if is_new {
         worker.log(format!("sync snapshot {:?}", snapshot.relative_path()));
-- 
2.20.1
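The `let _group_guard = _group_guard;` / `let _snap_guard = _snap_guard;` lines move the guards into the task future, so the flocks are released only when the whole backup task ends rather than when the setup code returns. A std-only sketch of that move-to-extend-lifetime pattern, with Drop logging standing in for real flocks (toy names, plain closure instead of an async block):

```rust
use std::sync::{Arc, Mutex};

struct Guard {
    log: Arc<Mutex<Vec<&'static str>>>,
    name: &'static str,
}

impl Drop for Guard {
    fn drop(&mut self) {
        self.log.lock().unwrap().push(self.name);
    }
}

fn run_backup_task(log: Arc<Mutex<Vec<&'static str>>>) {
    let group_guard = Guard { log: log.clone(), name: "group lock released" };
    let snap_guard = Guard { log: log.clone(), name: "snapshot lock released" };

    // stands in for the spawned async task in the patch
    let task = move || {
        // keep flocks until task ends (mirrors the rebinding in the patch)
        let _group_guard = group_guard;
        let _snap_guard = snap_guard;
        log.lock().unwrap().push("task body ran");
        // guards drop here, at the end of the task, not earlier
    };
    task();
}
```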






* [pbs-devel] [PATCH v2 proxmox-backup 5/7] Revert "backup: ensure base snapshots are still available after backup"
  2020-08-11  8:50 [pbs-devel] [PATCH v2 0/7] More flocking and race elimination Stefan Reiter
                   ` (3 preceding siblings ...)
  2020-08-11  8:50 ` [pbs-devel] [PATCH v2 proxmox-backup 4/7] backup: flock snapshot on backup start Stefan Reiter
@ 2020-08-11  8:50 ` Stefan Reiter
  2020-08-11  8:50 ` [pbs-devel] [PATCH v2 proxmox-backup 6/7] backup: lock base snapshot and ensure existance on finish Stefan Reiter
  2020-08-11  8:50 ` [pbs-devel] [PATCH v2 proxmox-backup 7/7] prune: also check backup snapshot locks Stefan Reiter
  6 siblings, 0 replies; 9+ messages in thread
From: Stefan Reiter @ 2020-08-11  8:50 UTC (permalink / raw)
  To: pbs-devel

This reverts commit d53fbe24741ed3bd920ee1c61bc163b1874b7c82.

The HashSet and "register" function are unnecessary, as we already know
which backup is the one we need to check: the last one, stored as
'last_backup'.

Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
---

The next patch restores this functionality in a much simpler way.

 src/api2/backup.rs             |  1 -
 src/api2/backup/environment.rs | 21 +--------------------
 src/backup/backup_info.rs      |  2 +-
 3 files changed, 2 insertions(+), 22 deletions(-)

diff --git a/src/api2/backup.rs b/src/api2/backup.rs
index f541eb97..eda83bb3 100644
--- a/src/api2/backup.rs
+++ b/src/api2/backup.rs
@@ -661,7 +661,6 @@ fn download_previous(
             };
             if let Some(index) = index {
                 env.log(format!("register chunks in '{}' from previous backup.", archive_name));
-                env.register_base_snapshot(last_backup.backup_dir.clone());
 
                 for pos in 0..index.index_count() {
                     let info = index.chunk_info(pos).unwrap();
diff --git a/src/api2/backup/environment.rs b/src/api2/backup/environment.rs
index f8c3f651..e4d280a4 100644
--- a/src/api2/backup/environment.rs
+++ b/src/api2/backup/environment.rs
@@ -1,6 +1,6 @@
 use anyhow::{bail, format_err, Error};
 use std::sync::{Arc, Mutex};
-use std::collections::{HashMap, HashSet};
+use std::collections::HashMap;
 
 use ::serde::{Serialize};
 use serde_json::{json, Value};
@@ -73,7 +73,6 @@ struct SharedBackupState {
     dynamic_writers: HashMap<usize, DynamicWriterState>,
     fixed_writers: HashMap<usize, FixedWriterState>,
     known_chunks: HashMap<[u8;32], u32>,
-    base_snapshots: HashSet<BackupDir>,
     backup_size: u64, // sums up size of all files
     backup_stat: UploadStatistic,
 }
@@ -127,7 +126,6 @@ impl BackupEnvironment {
             dynamic_writers: HashMap::new(),
             fixed_writers: HashMap::new(),
             known_chunks: HashMap::new(),
-            base_snapshots: HashSet::new(),
             backup_size: 0,
             backup_stat: UploadStatistic::new(),
         };
@@ -146,13 +144,6 @@ impl BackupEnvironment {
         }
     }
 
-    /// Register a snapshot as a predecessor of the current backup.
-    /// It's existance will be ensured on finishing.
-    pub fn register_base_snapshot(&self, snap: BackupDir) {
-        let mut state = self.state.lock().unwrap();
-        state.base_snapshots.insert(snap);
-    }
-
     /// Register a Chunk with associated length.
     ///
     /// We do not fully trust clients, so a client may only use registered
@@ -489,16 +480,6 @@ impl BackupEnvironment {
         self.datastore.store_manifest(&self.backup_dir, manifest)
             .map_err(|err| format_err!("unable to store manifest blob - {}", err))?;
 
-        for snap in &state.base_snapshots {
-            let path = self.datastore.snapshot_path(snap);
-            if !path.exists() {
-                bail!(
-                    "base snapshot {} was removed during backup, cannot finish as chunks might be missing",
-                    snap
-                );
-            }
-        }
-
         // marks the backup as successful
         state.finished = true;
 
diff --git a/src/backup/backup_info.rs b/src/backup/backup_info.rs
index 26b57fae..1d42d79d 100644
--- a/src/backup/backup_info.rs
+++ b/src/backup/backup_info.rs
@@ -173,7 +173,7 @@ impl std::str::FromStr for BackupGroup {
 /// Uniquely identify a Backup (relative to data store)
 ///
 /// We also call this a backup snaphost.
-#[derive(Debug, Eq, PartialEq, Hash, Clone)]
+#[derive(Debug, Eq, PartialEq, Clone)]
 pub struct BackupDir {
     /// Backup group
     group: BackupGroup,
-- 
2.20.1






* [pbs-devel] [PATCH v2 proxmox-backup 6/7] backup: lock base snapshot and ensure existance on finish
  2020-08-11  8:50 [pbs-devel] [PATCH v2 0/7] More flocking and race elimination Stefan Reiter
                   ` (4 preceding siblings ...)
  2020-08-11  8:50 ` [pbs-devel] [PATCH v2 proxmox-backup 5/7] Revert "backup: ensure base snapshots are still available after backup" Stefan Reiter
@ 2020-08-11  8:50 ` Stefan Reiter
  2020-08-11  8:50 ` [pbs-devel] [PATCH v2 proxmox-backup 7/7] prune: also check backup snapshot locks Stefan Reiter
  6 siblings, 0 replies; 9+ messages in thread
From: Stefan Reiter @ 2020-08-11  8:50 UTC (permalink / raw)
  To: pbs-devel

This prevents forgetting the base snapshot of a running backup, and catches
the case where it still happens anyway (e.g. via manual rm) so we at least
error out instead of storing a potentially invalid backup.

Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
---
 src/api2/backup.rs             | 12 ++++++++++--
 src/api2/backup/environment.rs | 10 ++++++++++
 2 files changed, 20 insertions(+), 2 deletions(-)

diff --git a/src/api2/backup.rs b/src/api2/backup.rs
index eda83bb3..ad608d85 100644
--- a/src/api2/backup.rs
+++ b/src/api2/backup.rs
@@ -16,6 +16,7 @@ use crate::backup::*;
 use crate::api2::types::*;
 use crate::config::acl::PRIV_DATASTORE_BACKUP;
 use crate::config::cached_user_info::CachedUserInfo;
+use crate::tools::fs::lock_dir_noblock;
 
 mod environment;
 use environment::*;
@@ -100,11 +101,17 @@ async move {
     let last_backup = BackupInfo::last_backup(&datastore.base_path(), &backup_group, true).unwrap_or(None);
     let backup_dir = BackupDir::new_with_group(backup_group.clone(), backup_time);
 
-    if let Some(last) = &last_backup {
+    let _last_guard = if let Some(last) = &last_backup {
         if backup_dir.backup_time() <= last.backup_dir.backup_time() {
             bail!("backup timestamp is older than last backup.");
         }
-    }
+
+        // lock last snapshot to prevent forgetting/pruning it during backup
+        let full_path = datastore.snapshot_path(&last.backup_dir);
+        Some(lock_dir_noblock(&full_path, "snapshot", "base snapshot is already locked by another operation")?)
+    } else {
+        None
+    };
 
     let (path, is_new, _snap_guard) = datastore.create_locked_backup_dir(&backup_dir)?;
     if !is_new { bail!("backup directory already exists."); }
@@ -147,6 +154,7 @@ async move {
             // keep flock until task ends
             let _group_guard = _group_guard;
             let _snap_guard = _snap_guard;
+            let _last_guard = _last_guard;
 
             let res = select!{
                 req = req_fut => req,
diff --git a/src/api2/backup/environment.rs b/src/api2/backup/environment.rs
index e4d280a4..973563d3 100644
--- a/src/api2/backup/environment.rs
+++ b/src/api2/backup/environment.rs
@@ -480,6 +480,16 @@ impl BackupEnvironment {
         self.datastore.store_manifest(&self.backup_dir, manifest)
             .map_err(|err| format_err!("unable to store manifest blob - {}", err))?;
 
+        if let Some(base) = &self.last_backup {
+            let path = self.datastore.snapshot_path(&base.backup_dir);
+            if !path.exists() {
+                bail!(
+                    "base snapshot {} was removed during backup, cannot finish as chunks might be missing",
+                    base.backup_dir
+                );
+            }
+        }
+
         // marks the backup as successful
         state.finished = true;
 
-- 
2.20.1
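The finish-time check added to `finish_backup` boils down to a simple existence test on the base snapshot's path. As a sketch (hypothetical free function; in the patch this lives inside `BackupEnvironment` and uses `bail!`):

```rust
use std::path::Path;

// Refuse to mark the backup successful if the base snapshot vanished,
// since chunks reused from it may already be garbage-collected away.
fn ensure_base_snapshot_present(path: &Path) -> Result<(), String> {
    if path.exists() {
        Ok(())
    } else {
        Err(format!(
            "base snapshot {:?} was removed during backup, cannot finish as chunks might be missing",
            path
        ))
    }
}
```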






* [pbs-devel] [PATCH v2 proxmox-backup 7/7] prune: also check backup snapshot locks
  2020-08-11  8:50 [pbs-devel] [PATCH v2 0/7] More flocking and race elimination Stefan Reiter
                   ` (5 preceding siblings ...)
  2020-08-11  8:50 ` [pbs-devel] [PATCH v2 proxmox-backup 6/7] backup: lock base snapshot and ensure existance on finish Stefan Reiter
@ 2020-08-11  8:50 ` Stefan Reiter
  2020-08-11  9:35   ` Dietmar Maurer
  6 siblings, 1 reply; 9+ messages in thread
From: Stefan Reiter @ 2020-08-11  8:50 UTC (permalink / raw)
  To: pbs-devel

...to avoid pruning running backups or backups used as base for others.

Unfinished backups that have no lock are considered broken and will
always be pruned.

The ignore_locks parameter is used for testing, since flocks are not
available in the test environment.

Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
---

Once again, feel free to remove my comments, I just can't wrap my head around
that logic without them ;)

 src/api2/admin/datastore.rs     |  2 +-
 src/backup/prune.rs             | 50 ++++++++++++++++++++++-----------
 src/bin/proxmox-backup-proxy.rs |  2 +-
 tests/prune.rs                  |  6 ++--
 4 files changed, 39 insertions(+), 21 deletions(-)

diff --git a/src/api2/admin/datastore.rs b/src/api2/admin/datastore.rs
index fe72ea32..a987416e 100644
--- a/src/api2/admin/datastore.rs
+++ b/src/api2/admin/datastore.rs
@@ -636,7 +636,7 @@ fn prune(
 
     let list = group.list_backups(&datastore.base_path())?;
 
-    let mut prune_info = compute_prune_info(list, &prune_options)?;
+    let mut prune_info = compute_prune_info(list, &prune_options, &datastore.base_path(), false)?;
 
     prune_info.reverse(); // delete older snapshots first
 
diff --git a/src/backup/prune.rs b/src/backup/prune.rs
index f7a87c5a..f642e7b3 100644
--- a/src/backup/prune.rs
+++ b/src/backup/prune.rs
@@ -1,12 +1,13 @@
 use anyhow::{Error};
 use std::collections::{HashMap, HashSet};
-use std::path::PathBuf;
+use std::path::{PathBuf, Path};
 
 use chrono::{DateTime, Timelike, Datelike, Local};
 
 use super::{BackupDir, BackupInfo};
+use crate::tools::fs::lock_dir_noblock;
 
-enum PruneMark { Keep, KeepPartial, Remove }
+enum PruneMark { Keep, Ignored, Remove }
 
 fn mark_selections<F: Fn(DateTime<Local>, &BackupInfo) -> String> (
     mark: &mut HashMap<PathBuf, PruneMark>,
@@ -45,26 +46,39 @@ fn mark_selections<F: Fn(DateTime<Local>, &BackupInfo) -> String> (
     }
 }
 
-fn remove_incomplete_snapshots(
+fn mark_incomplete_and_in_use_snapshots(
     mark: &mut HashMap<PathBuf, PruneMark>,
     list: &Vec<BackupInfo>,
+    base_path: &Path,
+    ignore_locks: bool,
 ) {
 
-    let mut keep_unfinished = true;
     for info in list.iter() {
-        // backup is considered unfinished if there is no manifest
-        if info.is_finished() {
-            // There is a new finished backup, so there is no need
-            // to keep older unfinished backups.
-            keep_unfinished = false;
-        } else {
-            let backup_id = info.backup_dir.relative_path();
-            if keep_unfinished { // keep first unfinished
-                mark.insert(backup_id, PruneMark::KeepPartial);
-            } else {
+        let backup_id = info.backup_dir.relative_path();
+        let mut full_path = base_path.to_owned();
+        full_path.push(backup_id.clone());
+
+        if ignore_locks || lock_dir_noblock(&full_path, "snapshot", "").is_ok() {
+            if !info.is_finished() {
+                // incomplete backup, but we can lock it, so it's not running -
+                // always remove to clean up broken backups
                 mark.insert(backup_id, PruneMark::Remove);
             }
-            keep_unfinished = false;
+            // if locking succeeds and the backup is complete, let the regular
+            // marking logic figure it out
+            // note that we don't need to keep any locks either - any newly
+            // started backup can only ever use the latest finished backup as
+            // base, which is kept anyway (since we always keep the latest, and
+            // this function already filters out any unfinished ones)
+        } else {
+            // backups we can't lock we have to ignore - note that this means in
+            // case a backup is running in a group, the first three snapshots
+            // will always be kept
+            // we cannot special case that though, since we don't know if a lock
+            // is in place for backup or forget, where in the latter case we
+            // must not mark it as "Keep", as otherwise we might end up with
+            // no backup for a given selection period
+            mark.insert(backup_id, PruneMark::Ignored);
         }
     }
 }
@@ -173,13 +187,15 @@ impl PruneOptions {
 pub fn compute_prune_info(
     mut list: Vec<BackupInfo>,
     options: &PruneOptions,
+    base_path: &Path,
+    ignore_locks: bool,
 ) -> Result<Vec<(BackupInfo, bool)>, Error> {
 
     let mut mark = HashMap::new();
 
     BackupInfo::sort_list(&mut list, false);
 
-    remove_incomplete_snapshots(&mut mark, &list);
+    mark_incomplete_and_in_use_snapshots(&mut mark, &list, base_path, ignore_locks);
 
     if let Some(keep_last) = options.keep_last {
         mark_selections(&mut mark, &list, keep_last as usize, |_local_time, info| {
@@ -227,7 +243,7 @@ pub fn compute_prune_info(
             let backup_id = info.backup_dir.relative_path();
             let keep = match mark.get(&backup_id) {
                 Some(PruneMark::Keep) => true,
-                Some(PruneMark::KeepPartial) => true,
+                Some(PruneMark::Ignored) => true,
                _ => false,
             };
             (info, keep)
diff --git a/src/bin/proxmox-backup-proxy.rs b/src/bin/proxmox-backup-proxy.rs
index 3f7bf3ec..8cd2e105 100644
--- a/src/bin/proxmox-backup-proxy.rs
+++ b/src/bin/proxmox-backup-proxy.rs
@@ -442,7 +442,7 @@ async fn schedule_datastore_prune() {
                 let groups = BackupGroup::list_groups(&base_path)?;
                 for group in groups {
                     let list = group.list_backups(&base_path)?;
-                    let mut prune_info = compute_prune_info(list, &prune_options)?;
+                    let mut prune_info = compute_prune_info(list, &prune_options, &base_path, false)?;
                     prune_info.reverse(); // delete older snapshots first
 
                     worker.log(format!("Starting prune on store \"{}\" group \"{}/{}\"",
diff --git a/tests/prune.rs b/tests/prune.rs
index d9758ea7..31f48970 100644
--- a/tests/prune.rs
+++ b/tests/prune.rs
@@ -1,5 +1,5 @@
 use anyhow::{Error};
-use std::path::PathBuf;
+use std::path::{PathBuf, Path};
 
 use proxmox_backup::backup::*;
 
@@ -9,7 +9,9 @@ fn get_prune_list(
     options: &PruneOptions,
 ) -> Vec<PathBuf> {
 
-    let mut prune_info = compute_prune_info(list, options).unwrap();
+    // ignore locks, would always fail in test environment
+    // base_path is only used for locking so leave empty
+    let mut prune_info = compute_prune_info(list, options, &Path::new(""), true).unwrap();
 
     prune_info.reverse();
 
-- 
2.20.1
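The marking logic in the hunk above can be sketched in isolation. The following is a simplified, self-contained model of the patch's `mark_incomplete_and_in_use_snapshots()`: the names `Snapshot`, `try_lock` and `mark_incomplete_and_in_use` are hypothetical stand-ins, and the directory flock (`lock_dir_noblock`) is abstracted as a closure so the sketch runs without a real datastore.

```rust
use std::collections::HashMap;
use std::path::PathBuf;

// Simplified mirror of the patch's marking states (Keep is assigned later
// by the regular selection logic and omitted here).
#[derive(Debug, PartialEq)]
enum PruneMark { Ignored, Remove }

// Stand-in for BackupInfo: a snapshot is "finished" when its manifest exists.
struct Snapshot {
    path: PathBuf,
    finished: bool,
}

// `try_lock` stands in for lock_dir_noblock(): it returns true when the
// snapshot directory could be locked, i.e. no backup or forget holds it.
fn mark_incomplete_and_in_use(
    mark: &mut HashMap<PathBuf, PruneMark>,
    list: &[Snapshot],
    try_lock: impl Fn(&PathBuf) -> bool,
    ignore_locks: bool,
) {
    for info in list {
        if ignore_locks || try_lock(&info.path) {
            // Lockable but unfinished: a broken leftover, always remove.
            if !info.finished {
                mark.insert(info.path.clone(), PruneMark::Remove);
            }
            // Lockable and finished: leave unmarked so the regular
            // keep-last/keep-daily/... selection decides.
        } else {
            // In use (running backup or forget): ignore, which the caller
            // later treats as "keep" so an in-use base is never pruned.
            mark.insert(info.path.clone(), PruneMark::Ignored);
        }
    }
}

fn main() {
    let list = vec![
        Snapshot { path: PathBuf::from("2020-08-10"), finished: true },
        Snapshot { path: PathBuf::from("2020-08-11"), finished: false },
    ];
    let mut mark = HashMap::new();
    // Everything lockable: only the unfinished snapshot gets marked.
    mark_incomplete_and_in_use(&mut mark, &list, |_| true, false);
    println!("{:?}", mark);
}
```

Note that finished, lockable snapshots deliberately stay unmarked: nothing about them is decided at this stage, matching the comment in the patch that the regular marking logic "figures it out".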

* Re: [pbs-devel] [PATCH v2 proxmox-backup 7/7] prune: also check backup snapshot locks
  2020-08-11  8:50 ` [pbs-devel] [PATCH v2 proxmox-backup 7/7] prune: also check backup snapshot locks Stefan Reiter
@ 2020-08-11  9:35   ` Dietmar Maurer
  0 siblings, 0 replies; 9+ messages in thread
From: Dietmar Maurer @ 2020-08-11  9:35 UTC (permalink / raw)
  To: Proxmox Backup Server development discussion, Stefan Reiter

I think this is not required. The old code also works now.

> On 08/11/2020 10:50 AM Stefan Reiter <s.reiter@proxmox.com> wrote:
> 
>  
> ...to avoid pruning running backups or backups used as a base for others.
> 
> Unfinished backups that have no lock are considered broken and will
> always be pruned.
> 
> The ignore_locks parameter is used for testing, since flocks are not
> available in the test environment.
> 
> Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
> ---
> 
> Once again, feel free to remove my comments; I just can't wrap my head around
> that logic without them ;)
> 
>  src/api2/admin/datastore.rs     |  2 +-
>  src/backup/prune.rs             | 50 ++++++++++++++++++++++-----------
>  src/bin/proxmox-backup-proxy.rs |  2 +-
>  tests/prune.rs                  |  6 ++--
>  4 files changed, 39 insertions(+), 21 deletions(-)
> 
> diff --git a/src/api2/admin/datastore.rs b/src/api2/admin/datastore.rs
> index fe72ea32..a987416e 100644
> --- a/src/api2/admin/datastore.rs
> +++ b/src/api2/admin/datastore.rs
> @@ -636,7 +636,7 @@ fn prune(
>  
>      let list = group.list_backups(&datastore.base_path())?;
>  
> -    let mut prune_info = compute_prune_info(list, &prune_options)?;
> +    let mut prune_info = compute_prune_info(list, &prune_options, &datastore.base_path(), false)?;
>  
>      prune_info.reverse(); // delete older snapshots first
>  
> diff --git a/src/backup/prune.rs b/src/backup/prune.rs
> index f7a87c5a..f642e7b3 100644
> --- a/src/backup/prune.rs
> +++ b/src/backup/prune.rs
> @@ -1,12 +1,13 @@
>  use anyhow::{Error};
>  use std::collections::{HashMap, HashSet};
> -use std::path::PathBuf;
> +use std::path::{PathBuf, Path};
>  
>  use chrono::{DateTime, Timelike, Datelike, Local};
>  
>  use super::{BackupDir, BackupInfo};
> +use crate::tools::fs::lock_dir_noblock;
>  
> -enum PruneMark { Keep, KeepPartial, Remove }
> +enum PruneMark { Keep, Ignored, Remove }
>  
>  fn mark_selections<F: Fn(DateTime<Local>, &BackupInfo) -> String> (
>      mark: &mut HashMap<PathBuf, PruneMark>,
> @@ -45,26 +46,39 @@ fn mark_selections<F: Fn(DateTime<Local>, &BackupInfo) -> String> (
>      }
>  }
>  
> -fn remove_incomplete_snapshots(
> +fn mark_incomplete_and_in_use_snapshots(
>      mark: &mut HashMap<PathBuf, PruneMark>,
>      list: &Vec<BackupInfo>,
> +    base_path: &Path,
> +    ignore_locks: bool,
>  ) {
>  
> -    let mut keep_unfinished = true;
>      for info in list.iter() {
> -        // backup is considered unfinished if there is no manifest
> -        if info.is_finished() {
> -            // There is a new finished backup, so there is no need
> -            // to keep older unfinished backups.
> -            keep_unfinished = false;
> -        } else {
> -            let backup_id = info.backup_dir.relative_path();
> -            if keep_unfinished { // keep first unfinished
> -                mark.insert(backup_id, PruneMark::KeepPartial);
> -            } else {
> +        let backup_id = info.backup_dir.relative_path();
> +        let mut full_path = base_path.to_owned();
> +        full_path.push(backup_id.clone());
> +
> +        if ignore_locks || lock_dir_noblock(&full_path, "snapshot", "").is_ok() {
> +            if !info.is_finished() {
> +                // incomplete backup, but we can lock it, so it's not running -
> +                // always remove to clean up broken backups
>                  mark.insert(backup_id, PruneMark::Remove);
>              }
> -            keep_unfinished = false;
> +            // if locking succeeds and the backup is complete, let the regular
> +            // marking logic figure it out
> +            // note that we don't need to keep any locks either - any newly
> +            // started backup can only ever use the latest finished backup as
> +            // base, which is kept anyway (since we always keep the latest, and
> +            // this function already filters out any unfinished ones)
> +        } else {
> +            // backups we can't lock we have to ignore - note that this means in
> +            // case a backup is running in a group, the first three snapshots
> +            // will always be kept
> +            // we cannot special case that though, since we don't know if a lock
> +            // is in place for backup or forget, where in the latter case we
> +            // must not mark it as "Keep", as otherwise we might end up with
> +            // no backup for a given selection period
> +            mark.insert(backup_id, PruneMark::Ignored);
>          }
>      }
>  }
> @@ -173,13 +187,15 @@ impl PruneOptions {
>  pub fn compute_prune_info(
>      mut list: Vec<BackupInfo>,
>      options: &PruneOptions,
> +    base_path: &Path,
> +    ignore_locks: bool,
>  ) -> Result<Vec<(BackupInfo, bool)>, Error> {
>  
>      let mut mark = HashMap::new();
>  
>      BackupInfo::sort_list(&mut list, false);
>  
> -    remove_incomplete_snapshots(&mut mark, &list);
> +    mark_incomplete_and_in_use_snapshots(&mut mark, &list, base_path, ignore_locks);
>  
>      if let Some(keep_last) = options.keep_last {
>          mark_selections(&mut mark, &list, keep_last as usize, |_local_time, info| {
> @@ -227,7 +243,7 @@ pub fn compute_prune_info(
>              let backup_id = info.backup_dir.relative_path();
>              let keep = match mark.get(&backup_id) {
>                  Some(PruneMark::Keep) => true,
> -                Some(PruneMark::KeepPartial) => true,
> +                Some(PruneMark::Ignored) => true,
>                 _ => false,
>              };
>              (info, keep)
> diff --git a/src/bin/proxmox-backup-proxy.rs b/src/bin/proxmox-backup-proxy.rs
> index 3f7bf3ec..8cd2e105 100644
> --- a/src/bin/proxmox-backup-proxy.rs
> +++ b/src/bin/proxmox-backup-proxy.rs
> @@ -442,7 +442,7 @@ async fn schedule_datastore_prune() {
>                  let groups = BackupGroup::list_groups(&base_path)?;
>                  for group in groups {
>                      let list = group.list_backups(&base_path)?;
> -                    let mut prune_info = compute_prune_info(list, &prune_options)?;
> +                    let mut prune_info = compute_prune_info(list, &prune_options, &base_path, false)?;
>                      prune_info.reverse(); // delete older snapshots first
>  
>                      worker.log(format!("Starting prune on store \"{}\" group \"{}/{}\"",
> diff --git a/tests/prune.rs b/tests/prune.rs
> index d9758ea7..31f48970 100644
> --- a/tests/prune.rs
> +++ b/tests/prune.rs
> @@ -1,5 +1,5 @@
>  use anyhow::{Error};
> -use std::path::PathBuf;
> +use std::path::{PathBuf, Path};
>  
>  use proxmox_backup::backup::*;
>  
> @@ -9,7 +9,9 @@ fn get_prune_list(
>      options: &PruneOptions,
>  ) -> Vec<PathBuf> {
>  
> -    let mut prune_info = compute_prune_info(list, options).unwrap();
> +    // ignore locks, would always fail in test environment
> +    // base_path is only used for locking so leave empty
> +    let mut prune_info = compute_prune_info(list, options, &Path::new(""), true).unwrap();
>  
>      prune_info.reverse();
>  
> -- 
> 2.20.1
> 
> 
> 
> _______________________________________________
> pbs-devel mailing list
> pbs-devel@lists.proxmox.com
> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
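The quoted hunk at the end of compute_prune_info() treats `Ignored` exactly like `Keep` when producing the final `(info, keep)` pairs. A minimal sketch of that mapping (the function name `keep_flag` is hypothetical):

```rust
#[derive(Debug, PartialEq)]
enum PruneMark { Keep, Ignored, Remove }

// Mirrors the match in compute_prune_info(): Keep and Ignored both translate
// to keep == true, so a snapshot that could not be locked is never pruned;
// everything else (Remove or unmarked) is a prune candidate.
fn keep_flag(mark: Option<&PruneMark>) -> bool {
    matches!(mark, Some(PruneMark::Keep) | Some(PruneMark::Ignored))
}

fn main() {
    // A snapshot locked by a running backup was marked Ignored -> kept.
    println!("{}", keep_flag(Some(&PruneMark::Ignored)));
}
```

This is why an unlockable snapshot is safe by construction: whatever the keep-selection options say, the lock alone forces `keep == true`.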


end of thread, other threads:[~2020-08-11  9:35 UTC | newest]

Thread overview: 9+ messages
2020-08-11  8:50 [pbs-devel] [PATCH v2 0/7] More flocking and race elimination Stefan Reiter
2020-08-11  8:50 ` [pbs-devel] [PATCH v2 proxmox-backup 1/7] src/tools/fs.rs: new helper lock_dir_noblock Stefan Reiter
2020-08-11  8:50 ` [pbs-devel] [PATCH v2 proxmox-backup 2/7] src/backup/backup_info.rs: remove BackupGroup lock() Stefan Reiter
2020-08-11  8:50 ` [pbs-devel] [PATCH v2 proxmox-backup 3/7] datastore: prevent in-use deletion with locks instead of heuristic Stefan Reiter
2020-08-11  8:50 ` [pbs-devel] [PATCH v2 proxmox-backup 4/7] backup: flock snapshot on backup start Stefan Reiter
2020-08-11  8:50 ` [pbs-devel] [PATCH v2 proxmox-backup 5/7] Revert "backup: ensure base snapshots are still available after backup" Stefan Reiter
2020-08-11  8:50 ` [pbs-devel] [PATCH v2 proxmox-backup 6/7] backup: lock base snapshot and ensure existance on finish Stefan Reiter
2020-08-11  8:50 ` [pbs-devel] [PATCH v2 proxmox-backup 7/7] prune: also check backup snapshot locks Stefan Reiter
2020-08-11  9:35   ` Dietmar Maurer
