public inbox for pbs-devel@lists.proxmox.com
* [pbs-devel] [PATCH 0/7] More flocking and race elimination
@ 2020-08-04 10:41 Stefan Reiter
  2020-08-04 10:41 ` [pbs-devel] [PATCH proxmox-backup 1/7] finish_backup: mark backup as finished only after checks have passed Stefan Reiter
                   ` (6 more replies)
  0 siblings, 7 replies; 15+ messages in thread
From: Stefan Reiter @ 2020-08-04 10:41 UTC (permalink / raw)
  To: pbs-devel

The previous series[0] already fixed some bugs, but it also introduced some new
ones, e.g. incomplete but not currently running backups could not be removed.

This is the next step to race-free™ backups, this time going all-in on flocks.

Patches 1 and 2 are technically unrelated (though necessary), so could be
committed standalone.

The last patch introduces locking for base backups, but also rewrites the
previous (unnecessarily complex) base-snapshot existence check, which is
therefore reverted in patch 6.

[0] https://lists.proxmox.com/pipermail/pbs-devel/2020-July/000233.html
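
For orientation, the flock pattern the series converges on looks roughly like the
sketch below (it mirrors the helper added in patch 3, with error handling trimmed;
the function name here is just illustrative): open the directory itself and take a
non-blocking exclusive lock on its file descriptor, so concurrent users fail
immediately instead of waiting.

    use std::path::Path;
    use std::time::Duration;

    use anyhow::{format_err, Error};
    use nix::dir::Dir;
    use nix::fcntl::OFlag;
    use nix::sys::stat::Mode;

    /// Sketch: open a directory and take a non-blocking exclusive flock on it.
    /// Dropping the returned handle closes the fd and releases the lock.
    fn lock_dir(path: &Path) -> Result<Dir, Error> {
        let mut handle = Dir::open(path, OFlag::O_RDONLY, Mode::empty())
            .map_err(|err| format_err!("unable to open {:?} for locking - {}", path, err))?;

        // timeout of 0: fail immediately if somebody else already holds the lock
        proxmox::tools::fs::lock_file(&mut handle, true, Some(Duration::from_nanos(0)))
            .map_err(|err| format_err!("unable to acquire lock on {:?} - {}", path, err))?;

        Ok(handle)
    }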


proxmox-backup: Stefan Reiter (7):
  finish_backup: mark backup as finished only after checks have passed
  backup: only allow finished backups as base snapshot
  datastore: prevent in-use deletion with locks instead of heuristic
  prune: also check backup snapshot locks
  backup: flock snapshot on backup start
  Revert "backup: ensure base snapshots are still available after
    backup"
  backup: lock base snapshot and ensure existance on finish

 src/api2/admin/datastore.rs     |  2 +-
 src/api2/backup.rs              | 16 +++++---
 src/api2/backup/environment.rs  | 21 +++--------
 src/backup/backup_info.rs       | 65 +++++++++++++++++++++++++++++----
 src/backup/datastore.rs         | 62 +++++++++++--------------------
 src/backup/prune.rs             | 48 +++++++++++++++++-------
 src/bin/proxmox-backup-proxy.rs |  2 +-
 src/client/pull.rs              |  2 +-
 tests/prune.rs                  |  6 ++-
 9 files changed, 136 insertions(+), 88 deletions(-)

-- 
2.20.1




^ permalink raw reply	[flat|nested] 15+ messages in thread

* [pbs-devel] [PATCH proxmox-backup 1/7] finish_backup: mark backup as finished only after checks have passed
  2020-08-04 10:41 [pbs-devel] [PATCH 0/7] More flocking and race elimination Stefan Reiter
@ 2020-08-04 10:41 ` Stefan Reiter
  2020-08-06  4:39   ` [pbs-devel] applied: " Dietmar Maurer
  2020-08-04 10:42 ` [pbs-devel] [PATCH proxmox-backup 2/7] backup: only allow finished backups as base snapshot Stefan Reiter
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 15+ messages in thread
From: Stefan Reiter @ 2020-08-04 10:41 UTC (permalink / raw)
  To: pbs-devel

Commit 9fa55e09 "finish_backup: test/verify manifest at server side"
moved the finished-marking above some checks, which means that if those
fail, the backup would still be marked as successful on the server.

Revert that part and comment the line for the future.

Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
---
 src/api2/backup/environment.rs | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/api2/backup/environment.rs b/src/api2/backup/environment.rs
index 02aa0c40..bc780d20 100644
--- a/src/api2/backup/environment.rs
+++ b/src/api2/backup/environment.rs
@@ -477,8 +477,6 @@ impl BackupEnvironment {
             bail!("backup does not contain valid files (file count == 0)");
         }
 
-        state.finished = true;
-
         // check manifest
         let mut manifest = self.datastore.load_manifest_json(&self.backup_dir)
             .map_err(|err| format_err!("unable to load manifest blob - {}", err))?;
@@ -500,6 +498,8 @@ impl BackupEnvironment {
             }
         }
 
+        // marks the backup as successful
+        state.finished = true;
 
         Ok(())
     }
-- 
2.20.1





^ permalink raw reply	[flat|nested] 15+ messages in thread

* [pbs-devel] [PATCH proxmox-backup 2/7] backup: only allow finished backups as base snapshot
  2020-08-04 10:41 [pbs-devel] [PATCH 0/7] More flocking and race elimination Stefan Reiter
  2020-08-04 10:41 ` [pbs-devel] [PATCH proxmox-backup 1/7] finish_backup: mark backup as finished only after checks have passed Stefan Reiter
@ 2020-08-04 10:42 ` Stefan Reiter
  2020-08-06  4:45   ` Dietmar Maurer
  2020-08-04 10:42 ` [pbs-devel] [PATCH proxmox-backup 3/7] datastore: prevent in-use deletion with locks instead of heuristic Stefan Reiter
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 15+ messages in thread
From: Stefan Reiter @ 2020-08-04 10:42 UTC (permalink / raw)
  To: pbs-devel

If the datastore holds broken backups for some reason, do not attempt to
base following snapshots on those. This would lead to an error on
/previous, leaving the client no choice but to upload all chunks, even
though there might be potential for incremental savings.

Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
---
 src/api2/backup.rs        | 2 +-
 src/backup/backup_info.rs | 8 ++++++--
 2 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/src/api2/backup.rs b/src/api2/backup.rs
index ad13faa5..aafea8fa 100644
--- a/src/api2/backup.rs
+++ b/src/api2/backup.rs
@@ -97,7 +97,7 @@ async move {
         bail!("backup owner check failed ({} != {})", username, owner);
     }
 
-    let last_backup = BackupInfo::last_backup(&datastore.base_path(), &backup_group).unwrap_or(None);
+    let last_backup = BackupInfo::last_backup(&datastore.base_path(), &backup_group, true).unwrap_or(None);
     let backup_dir = BackupDir::new_with_group(backup_group.clone(), backup_time);
 
     if let Some(last) = &last_backup {
diff --git a/src/backup/backup_info.rs b/src/backup/backup_info.rs
index 37dc7aa1..ea917d3c 100644
--- a/src/backup/backup_info.rs
+++ b/src/backup/backup_info.rs
@@ -313,9 +313,13 @@ impl BackupInfo {
     }
 
     /// Finds the latest backup inside a backup group
-    pub fn last_backup(base_path: &Path, group: &BackupGroup) -> Result<Option<BackupInfo>, Error> {
+    pub fn last_backup(base_path: &Path, group: &BackupGroup, only_finished: bool)
+        -> Result<Option<BackupInfo>, Error>
+    {
         let backups = group.list_backups(base_path)?;
-        Ok(backups.into_iter().max_by_key(|item| item.backup_dir.backup_time()))
+        Ok(backups.into_iter()
+            .filter(|item| !only_finished || item.is_finished())
+            .max_by_key(|item| item.backup_dir.backup_time()))
     }
 
     pub fn sort_list(list: &mut Vec<BackupInfo>, ascendending: bool) {
-- 
2.20.1





^ permalink raw reply	[flat|nested] 15+ messages in thread

* [pbs-devel] [PATCH proxmox-backup 3/7] datastore: prevent in-use deletion with locks instead of heuristic
  2020-08-04 10:41 [pbs-devel] [PATCH 0/7] More flocking and race elimination Stefan Reiter
  2020-08-04 10:41 ` [pbs-devel] [PATCH proxmox-backup 1/7] finish_backup: mark backup as finished only after checks have passed Stefan Reiter
  2020-08-04 10:42 ` [pbs-devel] [PATCH proxmox-backup 2/7] backup: only allow finished backups as base snapshot Stefan Reiter
@ 2020-08-04 10:42 ` Stefan Reiter
  2020-08-06  4:51   ` Dietmar Maurer
  2020-08-04 10:42 ` [pbs-devel] [PATCH proxmox-backup 4/7] prune: also check backup snapshot locks Stefan Reiter
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 15+ messages in thread
From: Stefan Reiter @ 2020-08-04 10:42 UTC (permalink / raw)
  To: pbs-devel

Attempt to lock the backup directory that is to be deleted; if that succeeds,
keep the lock until the deletion is complete. This way we ensure that no other
locking operation (e.g. using a snapshot as the base for another backup) can
happen concurrently.

Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
---

For this to actually work, the following patches are obviously necessary, but I
wanted to keep them separate for review.
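
As a rough illustration of the intended call pattern (a sketch only, using the
names introduced by this patch - the function name and signature here are made up
for illustration, the real code is remove_backup_dir() in the diff below):

    // sketch: hold the snapshot flock across the removal (unless `force` is set),
    // so concurrent lockers fail instead of racing the delete
    fn remove_snapshot(base_path: &std::path::Path, backup_dir: &BackupDir, force: bool)
        -> Result<(), anyhow::Error>
    {
        let _guard = if !force {
            Some(BackupInfo::new(base_path, backup_dir.clone())?.lock()?)
        } else {
            None
        };

        let mut full_path = base_path.to_owned();
        full_path.push(backup_dir.relative_path());
        std::fs::remove_dir_all(&full_path)?; // flock released when _guard drops
        Ok(())
    }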

 src/backup/backup_info.rs | 55 +++++++++++++++++++++++++++++++++++----
 src/backup/datastore.rs   | 48 ++++++++--------------------------
 2 files changed, 61 insertions(+), 42 deletions(-)

diff --git a/src/backup/backup_info.rs b/src/backup/backup_info.rs
index ea917d3c..c35928ce 100644
--- a/src/backup/backup_info.rs
+++ b/src/backup/backup_info.rs
@@ -41,7 +41,7 @@ lazy_static!{
 }
 
 /// Opaque type releasing the corresponding flock when dropped
-pub type BackupGroupGuard = Dir;
+pub type BackupLockGuard = Dir;
 
 /// BackupGroup is a directory containing a list of BackupDir
 #[derive(Debug, Eq, PartialEq, Hash, Clone)]
@@ -91,7 +91,11 @@ impl BackupGroup {
             let backup_dir = BackupDir::new(self.backup_type.clone(), self.backup_id.clone(), dt.timestamp());
             let files = list_backup_files(l2_fd, backup_time)?;
 
-            list.push(BackupInfo { backup_dir, files });
+            list.push(BackupInfo {
+                backup_dir,
+                files,
+                base_path: base_path.to_owned()
+            });
 
             Ok(())
         })?;
@@ -137,7 +141,7 @@ impl BackupGroup {
         Ok(last)
     }
 
-    pub fn lock(&self, base_path: &Path) -> Result<BackupGroupGuard, Error> {
+    pub fn lock(&self, base_path: &Path) -> Result<BackupLockGuard, Error> {
         use nix::fcntl::OFlag;
         use nix::sys::stat::Mode;
 
@@ -299,6 +303,8 @@ pub struct BackupInfo {
     pub backup_dir: BackupDir,
     /// List of data files
     pub files: Vec<String>,
+    /// Full path to dir containing backup_dir
+    pub base_path: PathBuf,
 }
 
 impl BackupInfo {
@@ -309,7 +315,7 @@ impl BackupInfo {
 
         let files = list_backup_files(libc::AT_FDCWD, &path)?;
 
-        Ok(BackupInfo { backup_dir, files })
+        Ok(BackupInfo { backup_dir, files, base_path: base_path.to_owned() })
     }
 
     /// Finds the latest backup inside a backup group
@@ -354,7 +360,11 @@ impl BackupInfo {
 
                     let files = list_backup_files(l2_fd, backup_time)?;
 
-                    list.push(BackupInfo { backup_dir, files });
+                    list.push(BackupInfo {
+                        backup_dir,
+                        files,
+                        base_path: base_path.to_owned()
+                    });
 
                     Ok(())
                 })
@@ -367,6 +377,41 @@ impl BackupInfo {
         // backup is considered unfinished if there is no manifest
         self.files.iter().any(|name| name == super::MANIFEST_BLOB_NAME)
     }
+
+    pub fn lock(&self) -> Result<BackupLockGuard, Error> {
+        use nix::fcntl::OFlag;
+        use nix::sys::stat::Mode;
+
+        let mut path = self.base_path.clone();
+        let dir = self.backup_dir.relative_path();
+        path.push(&dir);
+
+        let mut handle = Dir::open(&path, OFlag::O_RDONLY, Mode::empty())
+            .map_err(|err| {
+                format_err!(
+                    "unable to open snapshot directory {:?} for locking - {}",
+                    &dir,
+                    err,
+                )
+            })?;
+
+        // acquire in non-blocking mode, no point in waiting here since other
+        // backups could still take a very long time
+        proxmox::tools::fs::lock_file(&mut handle, true, Some(Duration::from_nanos(0)))
+            .map_err(|err| {
+                format_err!(
+                    "unable to acquire lock on snapshot {:?} - {}",
+                    &dir,
+                    if err.would_block() {
+                        String::from("snapshot is running or being used as base")
+                    } else {
+                        err.to_string()
+                    }
+                )
+            })?;
+
+        Ok(handle)
+    }
 }
 
 fn list_backup_files<P: ?Sized + nix::NixPath>(dirfd: RawFd, path: &P) -> Result<Vec<String>, Error> {
diff --git a/src/backup/datastore.rs b/src/backup/datastore.rs
index ffd64b81..3c374302 100644
--- a/src/backup/datastore.rs
+++ b/src/backup/datastore.rs
@@ -11,7 +11,7 @@ use serde_json::Value;
 
 use proxmox::tools::fs::{replace_file, CreateOptions};
 
-use super::backup_info::{BackupGroup, BackupGroupGuard, BackupDir, BackupInfo};
+use super::backup_info::{BackupGroup, BackupLockGuard, BackupDir, BackupInfo};
 use super::chunk_store::ChunkStore;
 use super::dynamic_index::{DynamicIndexReader, DynamicIndexWriter};
 use super::fixed_index::{FixedIndexReader, FixedIndexWriter};
@@ -199,19 +199,13 @@ impl DataStore {
 
         let full_path = self.group_path(backup_group);
 
-        let mut snap_list = backup_group.list_backups(&self.base_path())?;
-        BackupInfo::sort_list(&mut snap_list, false);
-        for snap in snap_list {
-            if snap.is_finished() {
-                break;
-            } else {
-                bail!(
-                    "cannot remove backup group {:?}, contains potentially running backup: {}",
-                    full_path,
-                    snap.backup_dir
-                );
-            }
-        }
+        let _guard = backup_group.lock(&self.base_path()).map_err(|err| {
+            format_err!(
+                "cannot acquire lock on backup group {}: {}",
+                backup_group,
+                err
+            )
+        })?;
 
         log::info!("removing backup group {:?}", full_path);
         std::fs::remove_dir_all(&full_path)
@@ -231,29 +225,9 @@ impl DataStore {
 
         let full_path = self.snapshot_path(backup_dir);
 
+        let _guard;
         if !force {
-            let mut snap_list = backup_dir.group().list_backups(&self.base_path())?;
-            BackupInfo::sort_list(&mut snap_list, false);
-            let mut prev_snap_finished = true;
-            for snap in snap_list {
-                let cur_snap_finished = snap.is_finished();
-                if &snap.backup_dir == backup_dir {
-                    if !cur_snap_finished {
-                        bail!(
-                            "cannot remove currently running snapshot: {:?}",
-                            backup_dir
-                        );
-                    }
-                    if !prev_snap_finished {
-                        bail!(
-                            "cannot remove snapshot {:?}, successor is currently running and potentially based on it",
-                            backup_dir
-                        );
-                    }
-                    break;
-                }
-                prev_snap_finished = cur_snap_finished;
-            }
+            _guard = BackupInfo::new(&self.base_path(), backup_dir.clone())?.lock()?;
         }
 
         log::info!("removing backup snapshot {:?}", full_path);
@@ -326,7 +300,7 @@ impl DataStore {
     /// current owner (instead of setting the owner).
     ///
     /// This also aquires an exclusive lock on the directory and returns the lock guard.
-    pub fn create_locked_backup_group(&self, backup_group: &BackupGroup, userid: &str) -> Result<(String, BackupGroupGuard), Error> {
+    pub fn create_locked_backup_group(&self, backup_group: &BackupGroup, userid: &str) -> Result<(String, BackupLockGuard), Error> {
 
         // create intermediate path first:
         let base_path = self.base_path();
-- 
2.20.1





^ permalink raw reply	[flat|nested] 15+ messages in thread

* [pbs-devel] [PATCH proxmox-backup 4/7] prune: also check backup snapshot locks
  2020-08-04 10:41 [pbs-devel] [PATCH 0/7] More flocking and race elimination Stefan Reiter
                   ` (2 preceding siblings ...)
  2020-08-04 10:42 ` [pbs-devel] [PATCH proxmox-backup 3/7] datastore: prevent in-use deletion with locks instead of heuristic Stefan Reiter
@ 2020-08-04 10:42 ` Stefan Reiter
  2020-08-05  7:23   ` Fabian Ebner
  2020-08-04 10:42 ` [pbs-devel] [PATCH proxmox-backup 5/7] backup: flock snapshot on backup start Stefan Reiter
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 15+ messages in thread
From: Stefan Reiter @ 2020-08-04 10:42 UTC (permalink / raw)
  To: pbs-devel

...to avoid pruning running backups or backups used as base for others.

Unfinished backups that have no lock are considered broken and will
always be pruned.

The ignore_locks parameter is used for testing, since flocks are not
available in test environment.

Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
---

This one I'm a bit scared of: The prune logic is astonishingly confusing IMHO,
but I hope I worked out how this makes the most sense.

I left my long-form comments in, even though they might seem a bit excessive -
they helped me make light of the situation, and might help in the future as
well. But if it's too much, feel free to shorten/remove them.
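
Condensed, the per-snapshot decision this patch adds looks roughly like the
following sketch (not the literal patch code; PruneMark and BackupInfo are the
existing types from this series, while the function and parameter names are only
illustrative):

    // None = leave the snapshot to the regular keep-last/keep-daily/... selection
    fn mark_for(info: &BackupInfo, is_latest_finished: bool, ignore_locks: bool) -> Option<PruneMark> {
        let lockable = ignore_locks || info.lock().is_ok();
        match (lockable, info.is_finished()) {
            // lockable + unfinished: nothing is running, so it is a broken leftover
            (true, false) => Some(PruneMark::Remove),
            // lockable + finished: no special handling needed
            (true, true) => None,
            // locked elsewhere + finished: in use as a base, keep it (the latest
            // finished snapshot gets no mark since the keep-* logic keeps it anyway)
            (false, true) => if is_latest_finished { None } else { Some(PruneMark::Keep) },
            // locked elsewhere + unfinished: most likely a currently running backup
            (false, false) => Some(PruneMark::KeepPartial),
        }
    }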

 src/api2/admin/datastore.rs     |  2 +-
 src/backup/prune.rs             | 48 +++++++++++++++++++++++----------
 src/bin/proxmox-backup-proxy.rs |  2 +-
 tests/prune.rs                  |  6 +++--
 4 files changed, 40 insertions(+), 18 deletions(-)

diff --git a/src/api2/admin/datastore.rs b/src/api2/admin/datastore.rs
index 97fd9093..6194c9c7 100644
--- a/src/api2/admin/datastore.rs
+++ b/src/api2/admin/datastore.rs
@@ -621,7 +621,7 @@ fn prune(
 
     let list = group.list_backups(&datastore.base_path())?;
 
-    let mut prune_info = compute_prune_info(list, &prune_options)?;
+    let mut prune_info = compute_prune_info(list, &prune_options, false)?;
 
     prune_info.reverse(); // delete older snapshots first
 
diff --git a/src/backup/prune.rs b/src/backup/prune.rs
index f7a87c5a..1bc5b9e8 100644
--- a/src/backup/prune.rs
+++ b/src/backup/prune.rs
@@ -45,26 +45,45 @@ fn mark_selections<F: Fn(DateTime<Local>, &BackupInfo) -> String> (
     }
 }
 
-fn remove_incomplete_snapshots(
+fn mark_incomplete_and_in_use_snapshots(
     mark: &mut HashMap<PathBuf, PruneMark>,
     list: &Vec<BackupInfo>,
+    ignore_locks: bool,
 ) {
 
-    let mut keep_unfinished = true;
+    let mut first_finished = true;
     for info in list.iter() {
-        // backup is considered unfinished if there is no manifest
-        if info.is_finished() {
-            // There is a new finished backup, so there is no need
-            // to keep older unfinished backups.
-            keep_unfinished = false;
-        } else {
-            let backup_id = info.backup_dir.relative_path();
-            if keep_unfinished { // keep first unfinished
-                mark.insert(backup_id, PruneMark::KeepPartial);
-            } else {
+        let backup_id = info.backup_dir.relative_path();
+        let finished = info.is_finished();
+        if ignore_locks || info.lock().is_ok() {
+            if !finished {
+                // incomplete backup, but we can lock it, so it's not running -
+                // always remove to clean up broken backups
                 mark.insert(backup_id, PruneMark::Remove);
             }
-            keep_unfinished = false;
+            // if locking succeeds and the backup is complete, let the regular
+            // marking logic figure it out
+            // note that we don't need to keep any locks either - any newly
+            // started backup can only ever use the latest finished backup as
+            // base, which is kept anyway (since we always keep the latest, and
+            // this function already filters out any unfinished ones)
+        } else {
+            if finished {
+                // if the backup is finished but locking fails, we have to keep
+                // it, however, if it's the latest finished one, it will be kept
+                // anyway, so don't mark it at all
+                if !first_finished {
+                    mark.insert(backup_id, PruneMark::Keep);
+                }
+            } else {
+                // ...if it's incomplete, and locking fails, that probably means
+                // it's currently running, so keep it, but do not include it in
+                // mark computation
+                mark.insert(backup_id, PruneMark::KeepPartial);
+            };
+        }
+        if first_finished && finished {
+            first_finished = false;
         }
     }
 }
@@ -173,13 +192,14 @@ impl PruneOptions {
 pub fn compute_prune_info(
     mut list: Vec<BackupInfo>,
     options: &PruneOptions,
+    ignore_locks: bool,
 ) -> Result<Vec<(BackupInfo, bool)>, Error> {
 
     let mut mark = HashMap::new();
 
     BackupInfo::sort_list(&mut list, false);
 
-    remove_incomplete_snapshots(&mut mark, &list);
+    mark_incomplete_and_in_use_snapshots(&mut mark, &list, ignore_locks);
 
     if let Some(keep_last) = options.keep_last {
         mark_selections(&mut mark, &list, keep_last as usize, |_local_time, info| {
diff --git a/src/bin/proxmox-backup-proxy.rs b/src/bin/proxmox-backup-proxy.rs
index dc0d16d2..694e4b60 100644
--- a/src/bin/proxmox-backup-proxy.rs
+++ b/src/bin/proxmox-backup-proxy.rs
@@ -441,7 +441,7 @@ async fn schedule_datastore_prune() {
                 let groups = BackupGroup::list_groups(&base_path)?;
                 for group in groups {
                     let list = group.list_backups(&base_path)?;
-                    let mut prune_info = compute_prune_info(list, &prune_options)?;
+                    let mut prune_info = compute_prune_info(list, &prune_options, false)?;
                     prune_info.reverse(); // delete older snapshots first
 
                     worker.log(format!("Starting prune on store \"{}\" group \"{}/{}\"",
diff --git a/tests/prune.rs b/tests/prune.rs
index d9758ea7..923a94fd 100644
--- a/tests/prune.rs
+++ b/tests/prune.rs
@@ -9,7 +9,8 @@ fn get_prune_list(
     options: &PruneOptions,
 ) -> Vec<PathBuf> {
 
-    let mut prune_info = compute_prune_info(list, options).unwrap();
+    // ignore locks, would always fail in test environment
+    let mut prune_info = compute_prune_info(list, options, true).unwrap();
 
     prune_info.reverse();
 
@@ -38,7 +39,8 @@ fn create_info(
         files.push(String::from(MANIFEST_BLOB_NAME));
     }
 
-    BackupInfo { backup_dir, files }
+    // note: base_path is only used for locking, which we ignore here anyway
+    BackupInfo { backup_dir, files, base_path: PathBuf::from("/tmp/pbs-test") }
 }
 
 #[test]
-- 
2.20.1





^ permalink raw reply	[flat|nested] 15+ messages in thread

* [pbs-devel] [PATCH proxmox-backup 5/7] backup: flock snapshot on backup start
  2020-08-04 10:41 [pbs-devel] [PATCH 0/7] More flocking and race elimination Stefan Reiter
                   ` (3 preceding siblings ...)
  2020-08-04 10:42 ` [pbs-devel] [PATCH proxmox-backup 4/7] prune: also check backup snapshot locks Stefan Reiter
@ 2020-08-04 10:42 ` Stefan Reiter
  2020-08-04 10:42 ` [pbs-devel] [PATCH proxmox-backup 6/7] Revert "backup: ensure base snapshots are still available after backup" Stefan Reiter
  2020-08-04 10:42 ` [pbs-devel] [PATCH proxmox-backup 7/7] backup: lock base snapshot and ensure existance on finish Stefan Reiter
  6 siblings, 0 replies; 15+ messages in thread
From: Stefan Reiter @ 2020-08-04 10:42 UTC (permalink / raw)
  To: pbs-devel

A flock on the snapshot dir itself is used in addition to the group dir
lock. The lock is used to avoid races with forget and prune, while
having more granularity than the group lock (i.e. the group lock is
necessary to prevent more than one backup per group, but the snapshot
lock still allows backups unrelated to the currently running one to be
forgotten/pruned).
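
Sketched, the backup API call now ends up holding two flocks for the lifetime of
the worker task (shape only, as a fragment; the exact wiring is in the diff below):

    // group lock = at most one backup per group, snapshot lock = this snapshot
    // cannot be forgotten/pruned; both guards move into the worker task so the
    // flocks are only released when the backup ends
    let _group_guard = backup_group.lock(&datastore.base_path())?;
    let (path, is_new, _snap_guard) = datastore.create_locked_backup_dir(&backup_dir)?;
    if !is_new {
        anyhow::bail!("backup directory already exists.");
    }
    // ... spawn the worker and move `_group_guard`/`_snap_guard` into it ...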

Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
---
 src/api2/backup.rs      |  3 ++-
 src/backup/datastore.rs | 14 ++++++++++----
 src/client/pull.rs      |  2 +-
 3 files changed, 13 insertions(+), 6 deletions(-)

diff --git a/src/api2/backup.rs b/src/api2/backup.rs
index aafea8fa..29b1a606 100644
--- a/src/api2/backup.rs
+++ b/src/api2/backup.rs
@@ -106,7 +106,7 @@ async move {
         }
     }
 
-    let (path, is_new) = datastore.create_backup_dir(&backup_dir)?;
+    let (path, is_new, _snap_guard) = datastore.create_locked_backup_dir(&backup_dir)?;
     if !is_new { bail!("backup directory already exists."); }
 
     WorkerTask::spawn("backup", Some(worker_id), &username.clone(), true, move |worker| {
@@ -146,6 +146,7 @@ async move {
         async move {
             // keep flock until task ends
             let _group_guard = _group_guard;
+            let _snap_guard = _snap_guard;
 
             let res = select!{
                 req = req_fut => req,
diff --git a/src/backup/datastore.rs b/src/backup/datastore.rs
index 3c374302..19fad4e1 100644
--- a/src/backup/datastore.rs
+++ b/src/backup/datastore.rs
@@ -331,15 +331,21 @@ impl DataStore {
     /// Creates a new backup snapshot inside a BackupGroup
     ///
     /// The BackupGroup directory needs to exist.
-    pub fn create_backup_dir(&self, backup_dir: &BackupDir) ->  Result<(PathBuf, bool), io::Error> {
+    pub fn create_locked_backup_dir(&self, backup_dir: &BackupDir)
+        -> Result<(PathBuf, bool, BackupLockGuard), Error>
+    {
         let relative_path = backup_dir.relative_path();
         let mut full_path = self.base_path();
         full_path.push(&relative_path);
 
+        let lock = || {
+            BackupInfo::new(&self.base_path(), backup_dir.clone())?.lock()
+        };
+
         match std::fs::create_dir(&full_path) {
-            Ok(_) => Ok((relative_path, true)),
-            Err(ref e) if e.kind() == io::ErrorKind::AlreadyExists => Ok((relative_path, false)),
-            Err(e) => Err(e)
+            Ok(_) => Ok((relative_path, true, lock()?)),
+            Err(ref e) if e.kind() == io::ErrorKind::AlreadyExists => Ok((relative_path, false, lock()?)),
+            Err(e) => Err(e.into())
         }
     }
 
diff --git a/src/client/pull.rs b/src/client/pull.rs
index 429ab458..2363dbfa 100644
--- a/src/client/pull.rs
+++ b/src/client/pull.rs
@@ -297,7 +297,7 @@ pub async fn pull_snapshot_from(
     snapshot: &BackupDir,
 ) -> Result<(), Error> {
 
-    let (_path, is_new) = tgt_store.create_backup_dir(&snapshot)?;
+    let (_path, is_new, _snap_lock) = tgt_store.create_locked_backup_dir(&snapshot)?;
 
     if is_new {
         worker.log(format!("sync snapshot {:?}", snapshot.relative_path()));
-- 
2.20.1





^ permalink raw reply	[flat|nested] 15+ messages in thread

* [pbs-devel] [PATCH proxmox-backup 6/7] Revert "backup: ensure base snapshots are still available after backup"
  2020-08-04 10:41 [pbs-devel] [PATCH 0/7] More flocking and race elimination Stefan Reiter
                   ` (4 preceding siblings ...)
  2020-08-04 10:42 ` [pbs-devel] [PATCH proxmox-backup 5/7] backup: flock snapshot on backup start Stefan Reiter
@ 2020-08-04 10:42 ` Stefan Reiter
  2020-08-04 10:42 ` [pbs-devel] [PATCH proxmox-backup 7/7] backup: lock base snapshot and ensure existance on finish Stefan Reiter
  6 siblings, 0 replies; 15+ messages in thread
From: Stefan Reiter @ 2020-08-04 10:42 UTC (permalink / raw)
  To: pbs-devel

This reverts commit d53fbe24741ed3bd920ee1c61bc163b1874b7c82.

The HashSet and "register" function are unnecessary, as we already know
which backup is the one we need to check: the last one, stored as
'last_backup'.

Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
---

The next patch restores this functionality in a much simpler way.

 src/api2/backup.rs             |  1 -
 src/api2/backup/environment.rs | 21 +--------------------
 src/backup/backup_info.rs      |  2 +-
 3 files changed, 2 insertions(+), 22 deletions(-)

diff --git a/src/api2/backup.rs b/src/api2/backup.rs
index 29b1a606..4b751e3e 100644
--- a/src/api2/backup.rs
+++ b/src/api2/backup.rs
@@ -661,7 +661,6 @@ fn download_previous(
             };
             if let Some(index) = index {
                 env.log(format!("register chunks in '{}' from previous backup.", archive_name));
-                env.register_base_snapshot(last_backup.backup_dir.clone());
 
                 for pos in 0..index.index_count() {
                     let info = index.chunk_info(pos).unwrap();
diff --git a/src/api2/backup/environment.rs b/src/api2/backup/environment.rs
index bc780d20..aa039cd9 100644
--- a/src/api2/backup/environment.rs
+++ b/src/api2/backup/environment.rs
@@ -1,6 +1,6 @@
 use anyhow::{bail, format_err, Error};
 use std::sync::{Arc, Mutex};
-use std::collections::{HashMap, HashSet};
+use std::collections::HashMap;
 
 use ::serde::{Serialize};
 use serde_json::{json, Value};
@@ -72,7 +72,6 @@ struct SharedBackupState {
     dynamic_writers: HashMap<usize, DynamicWriterState>,
     fixed_writers: HashMap<usize, FixedWriterState>,
     known_chunks: HashMap<[u8;32], u32>,
-    base_snapshots: HashSet<BackupDir>,
     backup_size: u64, // sums up size of all files
     backup_stat: UploadStatistic,
 }
@@ -126,7 +125,6 @@ impl BackupEnvironment {
             dynamic_writers: HashMap::new(),
             fixed_writers: HashMap::new(),
             known_chunks: HashMap::new(),
-            base_snapshots: HashSet::new(),
             backup_size: 0,
             backup_stat: UploadStatistic::new(),
         };
@@ -145,13 +143,6 @@ impl BackupEnvironment {
         }
     }
 
-    /// Register a snapshot as a predecessor of the current backup.
-    /// It's existance will be ensured on finishing.
-    pub fn register_base_snapshot(&self, snap: BackupDir) {
-        let mut state = self.state.lock().unwrap();
-        state.base_snapshots.insert(snap);
-    }
-
     /// Register a Chunk with associated length.
     ///
     /// We do not fully trust clients, so a client may only use registered
@@ -488,16 +479,6 @@ impl BackupEnvironment {
         self.datastore.store_manifest(&self.backup_dir, manifest)
             .map_err(|err| format_err!("unable to store manifest blob - {}", err))?;
 
-        for snap in &state.base_snapshots {
-            let path = self.datastore.snapshot_path(snap);
-            if !path.exists() {
-                bail!(
-                    "base snapshot {} was removed during backup, cannot finish as chunks might be missing",
-                    snap
-                );
-            }
-        }
-
         // marks the backup as successful
         state.finished = true;
 
diff --git a/src/backup/backup_info.rs b/src/backup/backup_info.rs
index c35928ce..5bd8b8d5 100644
--- a/src/backup/backup_info.rs
+++ b/src/backup/backup_info.rs
@@ -218,7 +218,7 @@ impl std::str::FromStr for BackupGroup {
 /// Uniquely identify a Backup (relative to data store)
 ///
 /// We also call this a backup snaphost.
-#[derive(Debug, Eq, PartialEq, Hash, Clone)]
+#[derive(Debug, Eq, PartialEq, Clone)]
 pub struct BackupDir {
     /// Backup group
     group: BackupGroup,
-- 
2.20.1





^ permalink raw reply	[flat|nested] 15+ messages in thread

* [pbs-devel] [PATCH proxmox-backup 7/7] backup: lock base snapshot and ensure existance on finish
  2020-08-04 10:41 [pbs-devel] [PATCH 0/7] More flocking and race elimination Stefan Reiter
                   ` (5 preceding siblings ...)
  2020-08-04 10:42 ` [pbs-devel] [PATCH proxmox-backup 6/7] Revert "backup: ensure base snapshots are still available after backup" Stefan Reiter
@ 2020-08-04 10:42 ` Stefan Reiter
  6 siblings, 0 replies; 15+ messages in thread
From: Stefan Reiter @ 2020-08-04 10:42 UTC (permalink / raw)
  To: pbs-devel

To prevent the base snapshot of a running backup from being forgotten, and
to catch the case where it still happens (e.g. via a manual rm) so we at
least error out instead of storing a potentially invalid backup.
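
Sketched, the two halves of this patch look roughly like this (a fragment using
names from the series; the exact code is in the diff below):

    // 1) on backup start: lock the base snapshot so forget/prune cannot remove it
    let _last_guard = match &last_backup {
        Some(last) => Some(last.lock()?),
        None => None,
    };

    // 2) on finish, before `state.finished = true`: catch e.g. a manual `rm`
    if let Some(base) = &last_backup {
        let path = datastore.snapshot_path(&base.backup_dir);
        if !path.exists() {
            anyhow::bail!(
                "base snapshot {} was removed during backup, cannot finish as chunks might be missing",
                base.backup_dir
            );
        }
    }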

Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
---
 src/api2/backup.rs             | 10 ++++++++--
 src/api2/backup/environment.rs | 10 ++++++++++
 2 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/src/api2/backup.rs b/src/api2/backup.rs
index 4b751e3e..2973864a 100644
--- a/src/api2/backup.rs
+++ b/src/api2/backup.rs
@@ -100,11 +100,16 @@ async move {
     let last_backup = BackupInfo::last_backup(&datastore.base_path(), &backup_group, true).unwrap_or(None);
     let backup_dir = BackupDir::new_with_group(backup_group.clone(), backup_time);
 
-    if let Some(last) = &last_backup {
+    let _last_guard = if let Some(last) = &last_backup {
         if backup_dir.backup_time() <= last.backup_dir.backup_time() {
             bail!("backup timestamp is older than last backup.");
         }
-    }
+
+        // lock last snapshot to prevent forgetting/pruning it during backup
+        Some(last.lock())
+    } else {
+        None
+    };
 
     let (path, is_new, _snap_guard) = datastore.create_locked_backup_dir(&backup_dir)?;
     if !is_new { bail!("backup directory already exists."); }
@@ -147,6 +152,7 @@ async move {
             // keep flock until task ends
             let _group_guard = _group_guard;
             let _snap_guard = _snap_guard;
+            let _last_guard = _last_guard;
 
             let res = select!{
                 req = req_fut => req,
diff --git a/src/api2/backup/environment.rs b/src/api2/backup/environment.rs
index aa039cd9..dffca562 100644
--- a/src/api2/backup/environment.rs
+++ b/src/api2/backup/environment.rs
@@ -479,6 +479,16 @@ impl BackupEnvironment {
         self.datastore.store_manifest(&self.backup_dir, manifest)
             .map_err(|err| format_err!("unable to store manifest blob - {}", err))?;
 
+        if let Some(base) = &self.last_backup {
+            let path = self.datastore.snapshot_path(&base.backup_dir);
+            if !path.exists() {
+                bail!(
+                    "base snapshot {} was removed during backup, cannot finish as chunks might be missing",
+                    base.backup_dir
+                );
+            }
+        }
+
         // marks the backup as successful
         state.finished = true;
 
-- 
2.20.1





^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [pbs-devel] [PATCH proxmox-backup 4/7] prune: also check backup snapshot locks
  2020-08-04 10:42 ` [pbs-devel] [PATCH proxmox-backup 4/7] prune: also check backup snapshot locks Stefan Reiter
@ 2020-08-05  7:23   ` Fabian Ebner
  2020-08-05  8:34     ` Stefan Reiter
  0 siblings, 1 reply; 15+ messages in thread
From: Fabian Ebner @ 2020-08-05  7:23 UTC (permalink / raw)
  To: pbs-devel

Am 04.08.20 um 12:42 schrieb Stefan Reiter:
> ...to avoid pruning running backups or backups used as base for others.
> 
> Unfinished backups that have no lock are considered broken and will
> always be pruned.
> 
> The ignore_locks parameter is used for testing, since flocks are not
> available in test environment.
> 
> Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
> ---
> 
> This one I'm a bit scared of: The prune logic is astonishingly confusing IMHO,
> but I hope I worked out how this makes the most sense.
> 
> I left my long-form comments in, even though they might seem a bit excessive -
> they helped me make light of the situation, and might help in the future as
> well. But if it's too much, feel free to shorten/remove them.
> 
>   src/api2/admin/datastore.rs     |  2 +-
>   src/backup/prune.rs             | 48 +++++++++++++++++++++++----------
>   src/bin/proxmox-backup-proxy.rs |  2 +-
>   tests/prune.rs                  |  6 +++--
>   4 files changed, 40 insertions(+), 18 deletions(-)
> 
> diff --git a/src/api2/admin/datastore.rs b/src/api2/admin/datastore.rs
> index 97fd9093..6194c9c7 100644
> --- a/src/api2/admin/datastore.rs
> +++ b/src/api2/admin/datastore.rs
> @@ -621,7 +621,7 @@ fn prune(
>   
>       let list = group.list_backups(&datastore.base_path())?;
>   
> -    let mut prune_info = compute_prune_info(list, &prune_options)?;
> +    let mut prune_info = compute_prune_info(list, &prune_options, false)?;
>   
>       prune_info.reverse(); // delete older snapshots first
>   
> diff --git a/src/backup/prune.rs b/src/backup/prune.rs
> index f7a87c5a..1bc5b9e8 100644
> --- a/src/backup/prune.rs
> +++ b/src/backup/prune.rs
> @@ -45,26 +45,45 @@ fn mark_selections<F: Fn(DateTime<Local>, &BackupInfo) -> String> (
>       }
>   }
>   
> -fn remove_incomplete_snapshots(
> +fn mark_incomplete_and_in_use_snapshots(
>       mark: &mut HashMap<PathBuf, PruneMark>,
>       list: &Vec<BackupInfo>,
> +    ignore_locks: bool,
>   ) {
>   
> -    let mut keep_unfinished = true;
> +    let mut first_finished = true;
>       for info in list.iter() {
> -        // backup is considered unfinished if there is no manifest
> -        if info.is_finished() {
> -            // There is a new finished backup, so there is no need
> -            // to keep older unfinished backups.
> -            keep_unfinished = false;
> -        } else {
> -            let backup_id = info.backup_dir.relative_path();
> -            if keep_unfinished { // keep first unfinished
> -                mark.insert(backup_id, PruneMark::KeepPartial);
> -            } else {
> +        let backup_id = info.backup_dir.relative_path();
> +        let finished = info.is_finished();
> +        if ignore_locks || info.lock().is_ok() {
> +            if !finished {
> +                // incomplete backup, but we can lock it, so it's not running -
> +                // always remove to clean up broken backups
>                   mark.insert(backup_id, PruneMark::Remove);
>               }
> -            keep_unfinished = false;
> +            // if locking succeeds and the backup is complete, let the regular
> +            // marking logic figure it out
> +            // note that we don't need to keep any locks either - any newly
> +            // started backup can only ever use the latest finished backup as
> +            // base, which is kept anyway (since we always keep the latest, and
> +            // this function already filters out any unfinished ones)
> +        } else {
> +            if finished {
> +                // if the backup is finished but locking fails, we have to keep
> +                // it, however, if it's the latest finished one, it will be kept
> +                // anyway, so don't mark it at all
> +                if !first_finished {
> +                    mark.insert(backup_id, PruneMark::Keep);
> +                }
> +            } else {
> +                // ...if it's incomplete, and locking fails, that probably means
> +                // it's currently running, so keep it, but do not include it in
> +                // mark computation
> +                mark.insert(backup_id, PruneMark::KeepPartial);
> +            };
> +        }
> +        if first_finished && finished {
> +            first_finished = false;
>           }
>       }
>   }

Is there any downside to treating all finished backups where the lock 
can't be acquired the same way? Not treating first_finished differently
might save a few lines.

Another thing is, by marking finished backups where we cannot acquire 
the lock as Keep, they will count towards the covered time periods in 
the Prune selection. This might cause the following, IMHO unwanted, 
behavior:
Say there's two finished backups A and B for a given month (and they're 
not the first finished one). A is about to be removed with forget. While 
A is being removed, the prune operation starts, say with --keep-monthly, 
and it cannot acquire the lock on A. But the prune selection considers 
the month to be already covered, because A is marked as Keep, so it will 
mark B with Remove. In the end there is no backup for that month remaining.

Marking these backups with KeepPartial would work around the issue, 
because those are not counted towards covering time periods. Of course 
the name doesn't really fit, maybe we should introduce an 
Ignore/Protected mark?

I haven't thought too much about it yet, but what about selecting 
normally and only removing the ones where we have the locks and warning 
about the others?

> @@ -173,13 +192,14 @@ impl PruneOptions {
>   pub fn compute_prune_info(
>       mut list: Vec<BackupInfo>,
>       options: &PruneOptions,
> +    ignore_locks: bool,
>   ) -> Result<Vec<(BackupInfo, bool)>, Error> {
>   
>       let mut mark = HashMap::new();
>   
>       BackupInfo::sort_list(&mut list, false);
>   
> -    remove_incomplete_snapshots(&mut mark, &list);
> +    mark_incomplete_and_in_use_snapshots(&mut mark, &list, ignore_locks);
>   
>       if let Some(keep_last) = options.keep_last {
>           mark_selections(&mut mark, &list, keep_last as usize, |_local_time, info| {
> diff --git a/src/bin/proxmox-backup-proxy.rs b/src/bin/proxmox-backup-proxy.rs
> index dc0d16d2..694e4b60 100644
> --- a/src/bin/proxmox-backup-proxy.rs
> +++ b/src/bin/proxmox-backup-proxy.rs
> @@ -441,7 +441,7 @@ async fn schedule_datastore_prune() {
>                   let groups = BackupGroup::list_groups(&base_path)?;
>                   for group in groups {
>                       let list = group.list_backups(&base_path)?;
> -                    let mut prune_info = compute_prune_info(list, &prune_options)?;
> +                    let mut prune_info = compute_prune_info(list, &prune_options, false)?;
>                       prune_info.reverse(); // delete older snapshots first
>   
>                       worker.log(format!("Starting prune on store \"{}\" group \"{}/{}\"",
> diff --git a/tests/prune.rs b/tests/prune.rs
> index d9758ea7..923a94fd 100644
> --- a/tests/prune.rs
> +++ b/tests/prune.rs
> @@ -9,7 +9,8 @@ fn get_prune_list(
>       options: &PruneOptions,
>   ) -> Vec<PathBuf> {
>   
> -    let mut prune_info = compute_prune_info(list, options).unwrap();
> +    // ignore locks, would always fail in test environment
> +    let mut prune_info = compute_prune_info(list, options, true).unwrap();
>   
>       prune_info.reverse();
>   
> @@ -38,7 +39,8 @@ fn create_info(
>           files.push(String::from(MANIFEST_BLOB_NAME));
>       }
>   
> -    BackupInfo { backup_dir, files }
> +    // note: base_path is only used for locking, which we ignore here anyway
> +    BackupInfo { backup_dir, files, base_path: PathBuf::from("/tmp/pbs-test") }
>   }
>   
>   #[test]
> 




^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [pbs-devel] [PATCH proxmox-backup 4/7] prune: also check backup snapshot locks
  2020-08-05  7:23   ` Fabian Ebner
@ 2020-08-05  8:34     ` Stefan Reiter
  0 siblings, 0 replies; 15+ messages in thread
From: Stefan Reiter @ 2020-08-05  8:34 UTC (permalink / raw)
  To: Fabian Ebner, pbs-devel

On 8/5/20 9:23 AM, Fabian Ebner wrote:
> Am 04.08.20 um 12:42 schrieb Stefan Reiter:
>> ...to avoid pruning running backups or backups used as base for others.
>>
>> Unfinished backups that have no lock are considered broken and will
>> always be pruned.
>>
>> The ignore_locks parameter is used for testing, since flocks are not
>> available in test environment.
>>
>> Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
>> ---
>>
>> This one I'm a bit scared of: The prune logic is astonishingly 
>> confusing IMHO,
>> but I hope I worked out how this makes the most sense.
>>
>> I left my long-form comments in, even though they might seem a bit 
>> excessive -
>> they helped me make light of the situation, and might help in the 
>> future as
>> well. But if it's too much, feel free to shorten/remove them.
>>
>>   src/api2/admin/datastore.rs     |  2 +-
>>   src/backup/prune.rs             | 48 +++++++++++++++++++++++----------
>>   src/bin/proxmox-backup-proxy.rs |  2 +-
>>   tests/prune.rs                  |  6 +++--
>>   4 files changed, 40 insertions(+), 18 deletions(-)
>>
>> diff --git a/src/api2/admin/datastore.rs b/src/api2/admin/datastore.rs
>> index 97fd9093..6194c9c7 100644
>> --- a/src/api2/admin/datastore.rs
>> +++ b/src/api2/admin/datastore.rs
>> @@ -621,7 +621,7 @@ fn prune(
>>       let list = group.list_backups(&datastore.base_path())?;
>> -    let mut prune_info = compute_prune_info(list, &prune_options)?;
>> +    let mut prune_info = compute_prune_info(list, &prune_options, 
>> false)?;
>>       prune_info.reverse(); // delete older snapshots first
>> diff --git a/src/backup/prune.rs b/src/backup/prune.rs
>> index f7a87c5a..1bc5b9e8 100644
>> --- a/src/backup/prune.rs
>> +++ b/src/backup/prune.rs
>> @@ -45,26 +45,45 @@ fn mark_selections<F: Fn(DateTime<Local>, 
>> &BackupInfo) -> String> (
>>       }
>>   }
>> -fn remove_incomplete_snapshots(
>> +fn mark_incomplete_and_in_use_snapshots(
>>       mark: &mut HashMap<PathBuf, PruneMark>,
>>       list: &Vec<BackupInfo>,
>> +    ignore_locks: bool,
>>   ) {
>> -    let mut keep_unfinished = true;
>> +    let mut first_finished = true;
>>       for info in list.iter() {
>> -        // backup is considered unfinished if there is no manifest
>> -        if info.is_finished() {
>> -            // There is a new finished backup, so there is no need
>> -            // to keep older unfinished backups.
>> -            keep_unfinished = false;
>> -        } else {
>> -            let backup_id = info.backup_dir.relative_path();
>> -            if keep_unfinished { // keep first unfinished
>> -                mark.insert(backup_id, PruneMark::KeepPartial);
>> -            } else {
>> +        let backup_id = info.backup_dir.relative_path();
>> +        let finished = info.is_finished();
>> +        if ignore_locks || info.lock().is_ok() {
>> +            if !finished {
>> +                // incomplete backup, but we can lock it, so it's not 
>> running -
>> +                // always remove to clean up broken backups
>>                   mark.insert(backup_id, PruneMark::Remove);
>>               }
>> -            keep_unfinished = false;
>> +            // if locking succeeds and the backup is complete, let 
>> the regular
>> +            // marking logic figure it out
>> +            // note that we don't need to keep any locks either - any 
>> newly
>> +            // started backup can only ever use the latest finished 
>> backup as
>> +            // base, which is kept anyway (since we always keep the 
>> latest, and
>> +            // this function already filters out any unfinished ones)
>> +        } else {
>> +            if finished {
>> +                // if the backup is finished but locking fails, we 
>> have to keep
>> +                // it, however, if it's the latest finished one, it 
>> will be kept
>> +                // anyway, so don't mark it at all
>> +                if !first_finished {
>> +                    mark.insert(backup_id, PruneMark::Keep);
>> +                }
>> +            } else {
>> +                // ...if it's incomplete, and locking fails, that 
>> probably means
>> +                // it's currently running, so keep it, but do not 
>> include it in
>> +                // mark computation
>> +                mark.insert(backup_id, PruneMark::KeepPartial);
>> +            };
>> +        }
>> +        if first_finished && finished {
>> +            first_finished = false;
>>           }
>>       }
>>   }
> 
> Is there any downside to treating all finished backups where the lock 
> can't be acquired the same way? Not treating first_finished differently
> might save a few lines.
> 

I did it exactly for this reason, so the last backup does in fact count 
towards the kept backups, since it's guaranteed to be the newest in its 
selection (month, week, etc.), however...

> Another thing is, by marking finished backups where we cannot acquire 
> the lock as Keep, they will count towards the covered time periods in 
> the Prune selection. This might cause the following, IMHO unwanted, 
> behavior:
> Say there's two finished backups A and B for a given month (and they're 
> not the first finished one). A is about to be removed with forget. While 
> A is being removed, the prune operation starts, say with --keep-monthly, 
> and it cannot acquire the lock on A. But the prune selection considers 
> the month to be already covered, because A is marked as Keep, so it will 
> mark B with Remove. In the end there is no backup for that month remaining.
> 

...I didn't consider a lock conflict with forget instead of backup, so I 
suppose that is indeed wrong.

> Marking these backups with KeepPartial would work around the issue, 
> because those are not counted towards covering time periods. Of course 
> the name doesn't really fit, maybe we should introduce an 
> Ignore/Protected mark?
> 

That still doesn't make the behaviour clearer: If a backup is running, 
the first *three* snapshots will now never be pruned (first two are 
locked and the third is treated as the first, so always kept). At least 
it's more correct I guess, and I don't really see an easier way to deal 
with the 'forget' race.

AFAICT this is the only place we use KeepPartial anyway, so we could 
just rename it. We should probably also introduce a RemoveBroken (or 
similar) to distinguish the reasons why a backup is removed...
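
Purely as an illustration of that idea (a hypothetical sketch, nothing like this
exists in the series yet):

    // hypothetical: split "remove" into two variants so logs/UI can show why
    enum PruneMark {
        Keep,         // selected by a keep-* option, or finished but in use
        KeepPartial,  // unfinished but locked, i.e. probably still running
        Remove,       // not selected by any keep-* option
        RemoveBroken, // unfinished and not locked, i.e. cleanup of a broken backup
    }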

> I haven't thought too much about it yet, but what about selecting 
> normally and only removing the ones where we have the locks and warning 
> about the others?
> 

...though this only makes sense if we update the GUI to show the 
reasons, as well as any warnings like the one you mentioned, which 
probably makes sense to do in a separate series.

>> @@ -173,13 +192,14 @@ impl PruneOptions {
>>   pub fn compute_prune_info(
>>       mut list: Vec<BackupInfo>,
>>       options: &PruneOptions,
>> +    ignore_locks: bool,
>>   ) -> Result<Vec<(BackupInfo, bool)>, Error> {
>>       let mut mark = HashMap::new();
>>       BackupInfo::sort_list(&mut list, false);
>> -    remove_incomplete_snapshots(&mut mark, &list);
>> +    mark_incomplete_and_in_use_snapshots(&mut mark, &list, 
>> ignore_locks);
>>       if let Some(keep_last) = options.keep_last {
>>           mark_selections(&mut mark, &list, keep_last as usize, 
>> |_local_time, info| {
>> diff --git a/src/bin/proxmox-backup-proxy.rs 
>> b/src/bin/proxmox-backup-proxy.rs
>> index dc0d16d2..694e4b60 100644
>> --- a/src/bin/proxmox-backup-proxy.rs
>> +++ b/src/bin/proxmox-backup-proxy.rs
>> @@ -441,7 +441,7 @@ async fn schedule_datastore_prune() {
>>                   let groups = BackupGroup::list_groups(&base_path)?;
>>                   for group in groups {
>>                       let list = group.list_backups(&base_path)?;
>> -                    let mut prune_info = compute_prune_info(list, 
>> &prune_options)?;
>> +                    let mut prune_info = compute_prune_info(list, 
>> &prune_options, false)?;
>>                       prune_info.reverse(); // delete older snapshots 
>> first
>>                       worker.log(format!("Starting prune on store 
>> \"{}\" group \"{}/{}\"",
>> diff --git a/tests/prune.rs b/tests/prune.rs
>> index d9758ea7..923a94fd 100644
>> --- a/tests/prune.rs
>> +++ b/tests/prune.rs
>> @@ -9,7 +9,8 @@ fn get_prune_list(
>>       options: &PruneOptions,
>>   ) -> Vec<PathBuf> {
>> -    let mut prune_info = compute_prune_info(list, options).unwrap();
>> +    // ignore locks, would always fail in test environment
>> +    let mut prune_info = compute_prune_info(list, options, 
>> true).unwrap();
>>       prune_info.reverse();
>> @@ -38,7 +39,8 @@ fn create_info(
>>           files.push(String::from(MANIFEST_BLOB_NAME));
>>       }
>> -    BackupInfo { backup_dir, files }
>> +    // note: base_path is only used for locking, which we ignore here 
>> anyway
>> +    BackupInfo { backup_dir, files, base_path: 
>> PathBuf::from("/tmp/pbs-test") }
>>   }
>>   #[test]
>>




^ permalink raw reply	[flat|nested] 15+ messages in thread

* [pbs-devel] applied: [PATCH proxmox-backup 1/7] finish_backup: mark backup as finished only after checks have passed
  2020-08-04 10:41 ` [pbs-devel] [PATCH proxmox-backup 1/7] finish_backup: mark backup as finished only after checks have passed Stefan Reiter
@ 2020-08-06  4:39   ` Dietmar Maurer
  0 siblings, 0 replies; 15+ messages in thread
From: Dietmar Maurer @ 2020-08-06  4:39 UTC (permalink / raw)
  To: Proxmox Backup Server development discussion, Stefan Reiter

applied




^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [pbs-devel] [PATCH proxmox-backup 2/7] backup: only allow finished backups as base snapshot
  2020-08-04 10:42 ` [pbs-devel] [PATCH proxmox-backup 2/7] backup: only allow finished backups as base snapshot Stefan Reiter
@ 2020-08-06  4:45   ` Dietmar Maurer
  2020-08-06  7:58     ` Stefan Reiter
  0 siblings, 1 reply; 15+ messages in thread
From: Dietmar Maurer @ 2020-08-06  4:45 UTC (permalink / raw)
  To: Proxmox Backup Server development discussion, Stefan Reiter

This breaks the assumption that the current backup only references chunks from
the previous one.

Instead, we can simply fail if the previous backup is not complete yet?

> On 08/04/2020 12:42 PM Stefan Reiter <s.reiter@proxmox.com> wrote:
> 
>  
> If the datastore holds broken backups for some reason, do not attempt to
> base following snapshots on those. This would lead to an error on
> /previous, leaving the client no choice but to upload all chunks, even
> though there might be potential for incremental savings.
> 
> Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
> ---
>  src/api2/backup.rs        | 2 +-
>  src/backup/backup_info.rs | 8 ++++++--
>  2 files changed, 7 insertions(+), 3 deletions(-)
> 
> diff --git a/src/api2/backup.rs b/src/api2/backup.rs
> index ad13faa5..aafea8fa 100644
> --- a/src/api2/backup.rs
> +++ b/src/api2/backup.rs
> @@ -97,7 +97,7 @@ async move {
>          bail!("backup owner check failed ({} != {})", username, owner);
>      }
>  
> -    let last_backup = BackupInfo::last_backup(&datastore.base_path(), &backup_group).unwrap_or(None);
> +    let last_backup = BackupInfo::last_backup(&datastore.base_path(), &backup_group, true).unwrap_or(None);
>      let backup_dir = BackupDir::new_with_group(backup_group.clone(), backup_time);
>  
>      if let Some(last) = &last_backup {
> diff --git a/src/backup/backup_info.rs b/src/backup/backup_info.rs
> index 37dc7aa1..ea917d3c 100644
> --- a/src/backup/backup_info.rs
> +++ b/src/backup/backup_info.rs
> @@ -313,9 +313,13 @@ impl BackupInfo {
>      }
>  
>      /// Finds the latest backup inside a backup group
> -    pub fn last_backup(base_path: &Path, group: &BackupGroup) -> Result<Option<BackupInfo>, Error> {
> +    pub fn last_backup(base_path: &Path, group: &BackupGroup, only_finished: bool)
> +        -> Result<Option<BackupInfo>, Error>
> +    {
>          let backups = group.list_backups(base_path)?;
> -        Ok(backups.into_iter().max_by_key(|item| item.backup_dir.backup_time()))
> +        Ok(backups.into_iter()
> +            .filter(|item| !only_finished || item.is_finished())
> +            .max_by_key(|item| item.backup_dir.backup_time()))
>      }
>  
>      pub fn sort_list(list: &mut Vec<BackupInfo>, ascendending: bool) {
> -- 
> 2.20.1
> 
> 
> 
> _______________________________________________
> pbs-devel mailing list
> pbs-devel@lists.proxmox.com
> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel




^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [pbs-devel] [PATCH proxmox-backup 3/7] datastore: prevent in-use deletion with locks instead of heuristic
  2020-08-04 10:42 ` [pbs-devel] [PATCH proxmox-backup 3/7] datastore: prevent in-use deletion with locks instead of heuristic Stefan Reiter
@ 2020-08-06  4:51   ` Dietmar Maurer
  0 siblings, 0 replies; 15+ messages in thread
From: Dietmar Maurer @ 2020-08-06  4:51 UTC (permalink / raw)
  To: Proxmox Backup Server development discussion, Stefan Reiter


> @@ -299,6 +303,8 @@ pub struct BackupInfo {
>      pub backup_dir: BackupDir,
>      /// List of data files
>      pub files: Vec<String>,
> +    /// Full path to dir containing backup_dir
> +    pub base_path: PathBuf,
>  }

I intentionally avoided encoding the full path. Instead, I pass it as a
parameter when it's needed.




^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [pbs-devel] [PATCH proxmox-backup 2/7] backup: only allow finished backups as base snapshot
  2020-08-06  4:45   ` Dietmar Maurer
@ 2020-08-06  7:58     ` Stefan Reiter
  2020-08-07  5:40       ` [pbs-devel] applied: " Dietmar Maurer
  0 siblings, 1 reply; 15+ messages in thread
From: Stefan Reiter @ 2020-08-06  7:58 UTC (permalink / raw)
  To: Dietmar Maurer, Proxmox Backup Server development discussion

On 8/6/20 6:45 AM, Dietmar Maurer wrote:
> This breaks the assumption that the current backup only references chunks from
> the previous one.
> 

Why do we need that assumption?

> Instead, we can simply fail if the previous backup is not complete yet?
> 

"Not complete *yet*" can't happen, since only one backup can run at a 
time. So if this case occurs, it's because the server has broken backups.

>> On 08/04/2020 12:42 PM Stefan Reiter <s.reiter@proxmox.com> wrote:
>>
>>   
>> If the datastore holds broken backups for some reason, do not attempt to
>> base following snapshots on those. This would lead to an error on
>> /previous, leaving the client no choice but to upload all chunks, even
>> though there might be potential for incremental savings.
>>
>> Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
>> ---
>>   src/api2/backup.rs        | 2 +-
>>   src/backup/backup_info.rs | 8 ++++++--
>>   2 files changed, 7 insertions(+), 3 deletions(-)
>>
>> diff --git a/src/api2/backup.rs b/src/api2/backup.rs
>> index ad13faa5..aafea8fa 100644
>> --- a/src/api2/backup.rs
>> +++ b/src/api2/backup.rs
>> @@ -97,7 +97,7 @@ async move {
>>           bail!("backup owner check failed ({} != {})", username, owner);
>>       }
>>   
>> -    let last_backup = BackupInfo::last_backup(&datastore.base_path(), &backup_group).unwrap_or(None);
>> +    let last_backup = BackupInfo::last_backup(&datastore.base_path(), &backup_group, true).unwrap_or(None);
>>       let backup_dir = BackupDir::new_with_group(backup_group.clone(), backup_time);
>>   
>>       if let Some(last) = &last_backup {
>> diff --git a/src/backup/backup_info.rs b/src/backup/backup_info.rs
>> index 37dc7aa1..ea917d3c 100644
>> --- a/src/backup/backup_info.rs
>> +++ b/src/backup/backup_info.rs
>> @@ -313,9 +313,13 @@ impl BackupInfo {
>>       }
>>   
>>       /// Finds the latest backup inside a backup group
>> -    pub fn last_backup(base_path: &Path, group: &BackupGroup) -> Result<Option<BackupInfo>, Error> {
>> +    pub fn last_backup(base_path: &Path, group: &BackupGroup, only_finished: bool)
>> +        -> Result<Option<BackupInfo>, Error>
>> +    {
>>           let backups = group.list_backups(base_path)?;
>> -        Ok(backups.into_iter().max_by_key(|item| item.backup_dir.backup_time()))
>> +        Ok(backups.into_iter()
>> +            .filter(|item| !only_finished || item.is_finished())
>> +            .max_by_key(|item| item.backup_dir.backup_time()))
>>       }
>>   
>>       pub fn sort_list(list: &mut Vec<BackupInfo>, ascendending: bool) {
>> -- 
>> 2.20.1
>>
>>
>>
>> _______________________________________________
>> pbs-devel mailing list
>> pbs-devel@lists.proxmox.com
>> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel




^ permalink raw reply	[flat|nested] 15+ messages in thread

* [pbs-devel] applied: [PATCH proxmox-backup 2/7] backup: only allow finished backups as base snapshot
  2020-08-06  7:58     ` Stefan Reiter
@ 2020-08-07  5:40       ` Dietmar Maurer
  0 siblings, 0 replies; 15+ messages in thread
From: Dietmar Maurer @ 2020-08-07  5:40 UTC (permalink / raw)
  To: Stefan Reiter, Proxmox Backup Server development discussion

OK, applied.

> On 08/06/2020 9:58 AM Stefan Reiter <s.reiter@proxmox.com> wrote:
> 
>  
> On 8/6/20 6:45 AM, Dietmar Maurer wrote:
> > This breaks the assumption that the current backup only references chunks from
> > the previous one.
> > 
> 
> Why do we need that assumption?
> 
> > Instead, we can simply fail if the previous backup is not complete yet?
> > 
> 
> "Not complete *yet*" can't happen, since only one backup can run at a 
> time. So if this case occurs, it's because the server has broken backups.




^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2020-08-07  5:41 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-08-04 10:41 [pbs-devel] [PATCH 0/7] More flocking and race elimination Stefan Reiter
2020-08-04 10:41 ` [pbs-devel] [PATCH proxmox-backup 1/7] finish_backup: mark backup as finished only after checks have passed Stefan Reiter
2020-08-06  4:39   ` [pbs-devel] applied: " Dietmar Maurer
2020-08-04 10:42 ` [pbs-devel] [PATCH proxmox-backup 2/7] backup: only allow finished backups as base snapshot Stefan Reiter
2020-08-06  4:45   ` Dietmar Maurer
2020-08-06  7:58     ` Stefan Reiter
2020-08-07  5:40       ` [pbs-devel] applied: " Dietmar Maurer
2020-08-04 10:42 ` [pbs-devel] [PATCH proxmox-backup 3/7] datastore: prevent in-use deletion with locks instead of heuristic Stefan Reiter
2020-08-06  4:51   ` Dietmar Maurer
2020-08-04 10:42 ` [pbs-devel] [PATCH proxmox-backup 4/7] prune: also check backup snapshot locks Stefan Reiter
2020-08-05  7:23   ` Fabian Ebner
2020-08-05  8:34     ` Stefan Reiter
2020-08-04 10:42 ` [pbs-devel] [PATCH proxmox-backup 5/7] backup: flock snapshot on backup start Stefan Reiter
2020-08-04 10:42 ` [pbs-devel] [PATCH proxmox-backup 6/7] Revert "backup: ensure base snapshots are still available after backup" Stefan Reiter
2020-08-04 10:42 ` [pbs-devel] [PATCH proxmox-backup 7/7] backup: lock base snapshot and ensure existance on finish Stefan Reiter
