public inbox for pbs-devel@lists.proxmox.com
* [pbs-devel] [PATCH 0/5] fix #2881: protect base snapshots and avoid races
@ 2020-07-29 12:33 Stefan Reiter
  2020-07-29 12:33 ` [pbs-devel] [PATCH proxmox-backup 1/5] fix typo: avgerage to average Stefan Reiter
                   ` (4 more replies)
  0 siblings, 5 replies; 16+ messages in thread
From: Stefan Reiter @ 2020-07-29 12:33 UTC (permalink / raw)
  To: pbs-devel

Avoid races with garbage collection (GC) and between concurrent backups by
limiting the number of concurrent backups per group to one and by forbidding
deletion of the last finished backup in a group while a new one is still
running.

This fixes several issues that could lead to failed backups or, even worse,
"successful" backups that are actually missing chunks.


proxmox-backup: Stefan Reiter (5):
  fix typo: avgerage to average
  datastore: prevent deletion of snaps in use as "previous backup"
  tools: add nonblocking mode to lock_file
  backup: use flock on backup group to forbid multiple backups at once
  backup: ensure base snapshots are still available after backup

 src/api2/admin/datastore.rs     |  4 +--
 src/api2/backup.rs              | 12 +++++---
 src/api2/backup/environment.rs  | 23 +++++++++++++--
 src/backup/backup_info.rs       | 51 ++++++++++++++++++++++++++++++++-
 src/backup/datastore.rs         | 43 +++++++++++++++++++++++++--
 src/backup/prune.rs             |  2 +-
 src/bin/proxmox-backup-proxy.rs |  2 +-
 src/client/backup_writer.rs     |  2 +-
 src/client/pull.rs              |  4 +--
 src/tools.rs                    | 13 +++++++++
 10 files changed, 140 insertions(+), 16 deletions(-)

-- 
2.20.1





* [pbs-devel] [PATCH proxmox-backup 1/5] fix typo: avgerage to average
  2020-07-29 12:33 [pbs-devel] [PATCH 0/5] fix #2881: protect base snapshots and avoid races Stefan Reiter
@ 2020-07-29 12:33 ` Stefan Reiter
  2020-07-30  5:25   ` [pbs-devel] applied: " Dietmar Maurer
  2020-07-29 12:33 ` [pbs-devel] [PATCH proxmox-backup 2/5] datastore: prevent deletion of snaps in use as "previous backup" Stefan Reiter
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 16+ messages in thread
From: Stefan Reiter @ 2020-07-29 12:33 UTC (permalink / raw)
  To: pbs-devel

Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
---
 src/client/backup_writer.rs | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/client/backup_writer.rs b/src/client/backup_writer.rs
index b201ef2f..0b0ef93b 100644
--- a/src/client/backup_writer.rs
+++ b/src/client/backup_writer.rs
@@ -266,7 +266,7 @@ impl BackupWriter {
         if archive_name != CATALOG_NAME {
             let speed: HumanByte = ((uploaded * 1_000_000) / (duration.as_micros() as usize)).into();
             let uploaded: HumanByte = uploaded.into();
-            println!("{}: had to upload {} of {} in {:.2}s, avgerage speed {}/s).", archive, uploaded, vsize_h, duration.as_secs_f64(), speed);
+            println!("{}: had to upload {} of {} in {:.2}s, average speed {}/s).", archive, uploaded, vsize_h, duration.as_secs_f64(), speed);
         } else {
             println!("Uploaded backup catalog ({})", vsize_h);
         }
-- 
2.20.1






* [pbs-devel] [PATCH proxmox-backup 2/5] datastore: prevent deletion of snaps in use as "previous backup"
  2020-07-29 12:33 [pbs-devel] [PATCH 0/5] fix #2881: protect base snapshots and avoid races Stefan Reiter
  2020-07-29 12:33 ` [pbs-devel] [PATCH proxmox-backup 1/5] fix typo: avgerage to average Stefan Reiter
@ 2020-07-29 12:33 ` Stefan Reiter
  2020-07-30  6:37   ` [pbs-devel] applied: " Dietmar Maurer
  2020-07-30  6:40   ` [pbs-devel] " Fabian Grünbichler
  2020-07-29 12:33 ` [pbs-devel] [PATCH proxmox-backup 3/5] tools: add nonblocking mode to lock_file Stefan Reiter
                   ` (2 subsequent siblings)
  4 siblings, 2 replies; 16+ messages in thread
From: Stefan Reiter @ 2020-07-29 12:33 UTC (permalink / raw)
  To: pbs-devel

To prevent a race with a background GC operation, do not allow deletion
of backups whose index might currently be referenced as the "known chunk
list" for successive backups. Otherwise the GC could delete chunks it
thinks are no longer referenced, while at the same time telling the
client that it doesn't need to upload said chunks because they already
exist.

Additionally, prevent deletion of whole backup groups if they contain
snapshots that appear to be currently in progress. This is unlikely to
trigger at the moment, as that function is only used for sync jobs, but
it is a useful safeguard either way.

Deleting a single snapshot now takes a 'force' parameter, which is
necessary to allow removing the incomplete snapshot of an aborted
backup. Pruning also sets force=true to skip the check, since it
calculates on its own which snapshots to keep.

To avoid code duplication, the is_finished method is factored out.

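As an illustration only, here is a standalone sketch of the rule this check
enforces, reduced to a newest-first list of (snapshot, finished) pairs; the
function and names are made up for the example, the real code operates on
BackupInfo lists sorted via BackupInfo::sort_list:

    // A snapshot may not be removed if it is unfinished itself (possibly
    // still running), or if its immediate successor is unfinished (running
    // and potentially using it as its known-chunk base).
    fn may_delete(snapshots: &[(&str, bool)], target: &str) -> Result<(), String> {
        let mut succ_finished = true; // finished-state of the next newer snapshot
        for (name, finished) in snapshots {
            if *name == target {
                if !*finished {
                    return Err(format!("{} is currently running", name));
                }
                if !succ_finished {
                    return Err(format!("successor of {} is running and may be based on it", name));
                }
                return Ok(());
            }
            succ_finished = *finished;
        }
        Ok(()) // target not in the list - nothing to protect
    }

    fn main() {
        // newest first: the 07-29 backup is still running, based on 07-28
        let snaps = [("2020-07-29", false), ("2020-07-28", true), ("2020-07-27", true)];
        assert!(may_delete(&snaps, "2020-07-28").is_err()); // base of a running backup
        assert!(may_delete(&snaps, "2020-07-27").is_ok());  // safe to remove
    }
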
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
---
 src/api2/admin/datastore.rs     |  4 +--
 src/api2/backup/environment.rs  |  2 +-
 src/backup/backup_info.rs       |  7 +++++-
 src/backup/datastore.rs         | 43 +++++++++++++++++++++++++++++++--
 src/backup/prune.rs             |  2 +-
 src/bin/proxmox-backup-proxy.rs |  2 +-
 src/client/pull.rs              |  4 +--
 7 files changed, 54 insertions(+), 10 deletions(-)

diff --git a/src/api2/admin/datastore.rs b/src/api2/admin/datastore.rs
index 17e31f1e..f207e5ea 100644
--- a/src/api2/admin/datastore.rs
+++ b/src/api2/admin/datastore.rs
@@ -272,7 +272,7 @@ fn delete_snapshot(
     let allowed = (user_privs & PRIV_DATASTORE_MODIFY) != 0;
     if !allowed { check_backup_owner(&datastore, snapshot.group(), &username)?; }
 
-    datastore.remove_backup_dir(&snapshot)?;
+    datastore.remove_backup_dir(&snapshot, false)?;
 
     Ok(Value::Null)
 }
@@ -660,7 +660,7 @@ fn prune(
             }));
 
             if !(dry_run || keep) {
-                datastore.remove_backup_dir(&info.backup_dir)?;
+                datastore.remove_backup_dir(&info.backup_dir, true)?;
             }
         }
 
diff --git a/src/api2/backup/environment.rs b/src/api2/backup/environment.rs
index 2a3632a4..f2ebd24f 100644
--- a/src/api2/backup/environment.rs
+++ b/src/api2/backup/environment.rs
@@ -479,7 +479,7 @@ impl BackupEnvironment {
         let mut state = self.state.lock().unwrap();
         state.finished = true;
 
-        self.datastore.remove_backup_dir(&self.backup_dir)?;
+        self.datastore.remove_backup_dir(&self.backup_dir, true)?;
 
         Ok(())
     }
diff --git a/src/backup/backup_info.rs b/src/backup/backup_info.rs
index f8ea11bd..b4f671bd 100644
--- a/src/backup/backup_info.rs
+++ b/src/backup/backup_info.rs
@@ -173,7 +173,7 @@ impl std::str::FromStr for BackupGroup {
 /// Uniquely identify a Backup (relative to data store)
 ///
 /// We also call this a backup snaphost.
-#[derive(Debug, Clone)]
+#[derive(Debug, Eq, PartialEq, Clone)]
 pub struct BackupDir {
     /// Backup group
     group: BackupGroup,
@@ -317,6 +317,11 @@ impl BackupInfo {
         })?;
         Ok(list)
     }
+
+    pub fn is_finished(&self) -> bool {
+        // backup is considered unfinished if there is no manifest
+        self.files.iter().any(|name| name == super::MANIFEST_BLOB_NAME)
+    }
 }
 
 fn list_backup_files<P: ?Sized + nix::NixPath>(dirfd: RawFd, path: &P) -> Result<Vec<String>, Error> {
diff --git a/src/backup/datastore.rs b/src/backup/datastore.rs
index 92f7b069..9c2f2851 100644
--- a/src/backup/datastore.rs
+++ b/src/backup/datastore.rs
@@ -8,7 +8,7 @@ use anyhow::{bail, format_err, Error};
 use lazy_static::lazy_static;
 use chrono::{DateTime, Utc};
 
-use super::backup_info::{BackupGroup, BackupDir};
+use super::backup_info::{BackupGroup, BackupDir, BackupInfo};
 use super::chunk_store::ChunkStore;
 use super::dynamic_index::{DynamicIndexReader, DynamicIndexWriter};
 use super::fixed_index::{FixedIndexReader, FixedIndexWriter};
@@ -197,6 +197,20 @@ impl DataStore {
 
         let full_path = self.group_path(backup_group);
 
+        let mut snap_list = backup_group.list_backups(&self.base_path())?;
+        BackupInfo::sort_list(&mut snap_list, false);
+        for snap in snap_list {
+            if snap.is_finished() {
+                break;
+            } else {
+                bail!(
+                    "cannot remove backup group {:?}, contains potentially running backup: {}",
+                    full_path,
+                    snap.backup_dir
+                );
+            }
+        }
+
         log::info!("removing backup group {:?}", full_path);
         std::fs::remove_dir_all(&full_path)
             .map_err(|err| {
@@ -211,10 +225,35 @@ impl DataStore {
     }
 
     /// Remove a backup directory including all content
-    pub fn remove_backup_dir(&self, backup_dir: &BackupDir) ->  Result<(), Error> {
+    pub fn remove_backup_dir(&self, backup_dir: &BackupDir, force: bool) ->  Result<(), Error> {
 
         let full_path = self.snapshot_path(backup_dir);
 
+        if !force {
+            let mut snap_list = backup_dir.group().list_backups(&self.base_path())?;
+            BackupInfo::sort_list(&mut snap_list, false);
+            let mut prev_snap_finished = true;
+            for snap in snap_list {
+                let cur_snap_finished = snap.is_finished();
+                if &snap.backup_dir == backup_dir {
+                    if !cur_snap_finished {
+                        bail!(
+                            "cannot remove currently running snapshot: {:?}",
+                            backup_dir
+                        );
+                    }
+                    if !prev_snap_finished {
+                        bail!(
+                            "cannot remove snapshot {:?}, successor is currently running and potentially based on it",
+                            backup_dir
+                        );
+                    }
+                    break;
+                }
+                prev_snap_finished = cur_snap_finished;
+            }
+        }
+
         log::info!("removing backup snapshot {:?}", full_path);
         std::fs::remove_dir_all(&full_path)
             .map_err(|err| {
diff --git a/src/backup/prune.rs b/src/backup/prune.rs
index bd7cf3b1..f7a87c5a 100644
--- a/src/backup/prune.rs
+++ b/src/backup/prune.rs
@@ -53,7 +53,7 @@ fn remove_incomplete_snapshots(
     let mut keep_unfinished = true;
     for info in list.iter() {
         // backup is considered unfinished if there is no manifest
-        if info.files.iter().any(|name| name == super::MANIFEST_BLOB_NAME) {
+        if info.is_finished() {
             // There is a new finished backup, so there is no need
             // to keep older unfinished backups.
             keep_unfinished = false;
diff --git a/src/bin/proxmox-backup-proxy.rs b/src/bin/proxmox-backup-proxy.rs
index a3033916..dc0d16d2 100644
--- a/src/bin/proxmox-backup-proxy.rs
+++ b/src/bin/proxmox-backup-proxy.rs
@@ -455,7 +455,7 @@ async fn schedule_datastore_prune() {
                             BackupDir::backup_time_to_string(info.backup_dir.backup_time())));
 
                         if !keep {
-                            datastore.remove_backup_dir(&info.backup_dir)?;
+                            datastore.remove_backup_dir(&info.backup_dir, true)?;
                         }
                     }
                 }
diff --git a/src/client/pull.rs b/src/client/pull.rs
index 758cb574..c44cb9f6 100644
--- a/src/client/pull.rs
+++ b/src/client/pull.rs
@@ -277,7 +277,7 @@ pub async fn pull_snapshot_from(
         worker.log(format!("sync snapshot {:?}", snapshot.relative_path()));
 
         if let Err(err) = pull_snapshot(worker, reader, tgt_store.clone(), &snapshot).await {
-            if let Err(cleanup_err) = tgt_store.remove_backup_dir(&snapshot) {
+            if let Err(cleanup_err) = tgt_store.remove_backup_dir(&snapshot, true) {
                 worker.log(format!("cleanup error - {}", cleanup_err));
             }
             return Err(err);
@@ -362,7 +362,7 @@ pub async fn pull_group(
             let backup_time = info.backup_dir.backup_time();
             if remote_snapshots.contains(&backup_time) { continue; }
             worker.log(format!("delete vanished snapshot {:?}", info.backup_dir.relative_path()));
-            tgt_store.remove_backup_dir(&info.backup_dir)?;
+            tgt_store.remove_backup_dir(&info.backup_dir, false)?;
         }
     }
 
-- 
2.20.1






* [pbs-devel] [PATCH proxmox-backup 3/5] tools: add nonblocking mode to lock_file
  2020-07-29 12:33 [pbs-devel] [PATCH 0/5] fix #2881: protect base snapshots and avoid races Stefan Reiter
  2020-07-29 12:33 ` [pbs-devel] [PATCH proxmox-backup 1/5] fix typo: avgerage to average Stefan Reiter
  2020-07-29 12:33 ` [pbs-devel] [PATCH proxmox-backup 2/5] datastore: prevent deletion of snaps in use as "previous backup" Stefan Reiter
@ 2020-07-29 12:33 ` Stefan Reiter
  2020-07-30  6:23   ` [pbs-devel] applied: " Dietmar Maurer
  2020-07-29 12:33 ` [pbs-devel] [PATCH proxmox-backup 4/5] backup: use flock on backup group to forbid multiple backups at once Stefan Reiter
  2020-07-29 12:33 ` [pbs-devel] [PATCH proxmox-backup 5/5] backup: ensure base snapshots are still available after backup Stefan Reiter
  4 siblings, 1 reply; 16+ messages in thread
From: Stefan Reiter @ 2020-07-29 12:33 UTC (permalink / raw)
  To: pbs-devel

Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
---
 src/tools.rs | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/src/tools.rs b/src/tools.rs
index 44db796d..d7b72a73 100644
--- a/src/tools.rs
+++ b/src/tools.rs
@@ -91,6 +91,9 @@ pub fn map_struct_mut<T>(buffer: &mut [u8]) -> Result<&mut T, Error> {
 
 /// Create a file lock using fntl. This function allows you to specify
 /// a timeout if you want to avoid infinite blocking.
+///
+/// With timeout set to 0, non-blocking mode is used and the function
+/// will fail immediately if the lock can't be acquired.
 pub fn lock_file<F: AsRawFd>(
     file: &mut F,
     exclusive: bool,
@@ -110,6 +113,16 @@ pub fn lock_file<F: AsRawFd>(
         Some(t) => t,
     };
 
+    if timeout.as_nanos() == 0 {
+        let lockarg = if exclusive {
+            nix::fcntl::FlockArg::LockExclusiveNonblock
+        } else {
+            nix::fcntl::FlockArg::LockSharedNonblock
+        };
+        nix::fcntl::flock(file.as_raw_fd(), lockarg)?;
+        return Ok(());
+    }
+
     // unblock the timeout signal temporarily
     let _sigblock_guard = timer::unblock_timeout_signal();
 
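For reference, a hedged usage sketch of the new non-blocking path; the caller
and paths are made up, only the lock_file signature from the hunk above (and
the crate's use of anyhow errors) is assumed:

    use std::fs::File;
    use std::time::Duration;

    use crate::tools;

    fn try_exclusive_lock(path: &str) -> Result<File, anyhow::Error> {
        let mut file = File::open(path)?;
        // timeout of 0 -> LockExclusiveNonblock: fail right away with EAGAIN
        // instead of waiting for the current holder to release the lock
        tools::lock_file(&mut file, true, Some(Duration::from_nanos(0)))?;
        Ok(file) // keep the handle alive for as long as the lock is needed
    }
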
-- 
2.20.1






* [pbs-devel] [PATCH proxmox-backup 4/5] backup: use flock on backup group to forbid multiple backups at once
  2020-07-29 12:33 [pbs-devel] [PATCH 0/5] fix #2881: protect base snapshots and avoid races Stefan Reiter
                   ` (2 preceding siblings ...)
  2020-07-29 12:33 ` [pbs-devel] [PATCH proxmox-backup 3/5] tools: add nonblocking mode to lock_file Stefan Reiter
@ 2020-07-29 12:33 ` Stefan Reiter
  2020-07-30  5:50   ` Dietmar Maurer
  2020-07-30  6:38   ` [pbs-devel] applied: " Dietmar Maurer
  2020-07-29 12:33 ` [pbs-devel] [PATCH proxmox-backup 5/5] backup: ensure base snapshots are still available after backup Stefan Reiter
  4 siblings, 2 replies; 16+ messages in thread
From: Stefan Reiter @ 2020-07-29 12:33 UTC (permalink / raw)
  To: pbs-devel

Multiple simultaneous backups within one backup group don't really make
sense and break all sorts of guarantees: for example, a second backup
started after a first one would use the "known-chunks" list from the
previous, unfinished one, which would be empty. Using the list from the
last finished backup is not a fix either, as that one could be deleted
or pruned once the first of the simultaneous backups finishes.

Fix this by only allowing one backup per backup group at a time. This is
done via a flock on the backup group directory, so the lock stays intact
even across a reload.
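
The property this relies on, sketched as a self-contained example (locking
/tmp purely for illustration, calling nix::fcntl::flock directly instead of
going through tools::lock_file): the lock is tied to the open directory
handle, so it is held for exactly as long as the guard object lives, which
is also why the guard is moved into the backup task's future below.

    use std::os::unix::io::AsRawFd;
    use nix::dir::Dir;
    use nix::fcntl::{flock, FlockArg, OFlag};
    use nix::sys::stat::Mode;

    fn main() -> Result<(), nix::Error> {
        // the first handle takes the exclusive lock on the directory
        let guard = Dir::open("/tmp", OFlag::O_RDONLY, Mode::empty())?;
        flock(guard.as_raw_fd(), FlockArg::LockExclusiveNonblock)?;

        // a second, independent handle cannot lock it while the guard lives ...
        let other = Dir::open("/tmp", OFlag::O_RDONLY, Mode::empty())?;
        assert!(flock(other.as_raw_fd(), FlockArg::LockExclusiveNonblock).is_err());

        // ... but succeeds as soon as the guard (and with it the fd) is dropped
        drop(guard);
        flock(other.as_raw_fd(), FlockArg::LockExclusiveNonblock)?;
        Ok(())
    }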

Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
---
 src/api2/backup.rs        | 11 ++++++----
 src/backup/backup_info.rs | 44 +++++++++++++++++++++++++++++++++++++++
 2 files changed, 51 insertions(+), 4 deletions(-)

diff --git a/src/api2/backup.rs b/src/api2/backup.rs
index 05978bf2..621e8c07 100644
--- a/src/api2/backup.rs
+++ b/src/api2/backup.rs
@@ -95,17 +95,17 @@ async move {
     }
 
     let last_backup = BackupInfo::last_backup(&datastore.base_path(), &backup_group).unwrap_or(None);
-    let backup_dir = BackupDir::new_with_group(backup_group, backup_time);
+    let backup_dir = BackupDir::new_with_group(backup_group.clone(), backup_time);
 
     if let Some(last) = &last_backup {
         if backup_dir.backup_time() <= last.backup_dir.backup_time() {
             bail!("backup timestamp is older than last backup.");
         }
-        // fixme: abort if last backup is still running - howto test?
-        // Idea: write upid into a file inside snapshot dir. then test if
-        // it is still running here.
     }
 
+    // lock backup group to only allow one backup per group at a time
+    let _group_guard = backup_group.lock(&datastore.base_path())?;
+
     let (path, is_new) = datastore.create_backup_dir(&backup_dir)?;
     if !is_new { bail!("backup directory already exists."); }
 
@@ -144,6 +144,9 @@ async move {
             .map(|_| Err(format_err!("task aborted")));
 
         async move {
+            // keep flock until task ends
+            let _group_guard = _group_guard;
+
             let res = select!{
                 req = req_fut => req,
                 abrt = abort_future => abrt,
diff --git a/src/backup/backup_info.rs b/src/backup/backup_info.rs
index b4f671bd..041f5785 100644
--- a/src/backup/backup_info.rs
+++ b/src/backup/backup_info.rs
@@ -3,7 +3,9 @@ use crate::tools;
 use anyhow::{bail, format_err, Error};
 use regex::Regex;
 use std::os::unix::io::RawFd;
+use nix::dir::Dir;
 
+use std::time::Duration;
 use chrono::{DateTime, TimeZone, SecondsFormat, Utc};
 
 use std::path::{PathBuf, Path};
@@ -36,6 +38,9 @@ lazy_static!{
 
 }
 
+/// Opaque type releasing the corresponding flock when dropped
+pub type BackupGroupGuard = Dir;
+
 /// BackupGroup is a directory containing a list of BackupDir
 #[derive(Debug, Eq, PartialEq, Hash, Clone)]
 pub struct BackupGroup {
@@ -130,6 +135,45 @@ impl BackupGroup {
         Ok(last)
     }
 
+    pub fn lock(&self, base_path: &Path) -> Result<BackupGroupGuard, Error> {
+        use nix::fcntl::OFlag;
+        use nix::sys::stat::Mode;
+
+        let mut path = base_path.to_owned();
+        path.push(self.group_path());
+
+        let mut handle = Dir::open(&path, OFlag::O_RDONLY, Mode::empty())
+            .map_err(|err| {
+                format_err!(
+                    "unable to open backup group directory {:?} for locking - {}",
+                    self.group_path(),
+                    err,
+                )
+            })?;
+
+        // acquire in non-blocking mode, no point in waiting here since other
+        // backups could still take a very long time
+        tools::lock_file(&mut handle, true, Some(Duration::from_nanos(0)))
+            .map_err(|err| {
+                match err.downcast_ref::<nix::Error>() {
+                    Some(nix::Error::Sys(nix::errno::Errno::EAGAIN)) => {
+                        return format_err!(
+                            "unable to acquire lock on backup group {:?} - another backup is already running",
+                            self.group_path(),
+                        );
+                    },
+                    _ => ()
+                }
+                format_err!(
+                    "unable to acquire lock on backup group {:?} - {}",
+                    self.group_path(),
+                    err,
+                )
+            })?;
+
+        Ok(handle)
+    }
+
     pub fn list_groups(base_path: &Path) -> Result<Vec<BackupGroup>, Error> {
         let mut list = Vec::new();
 
-- 
2.20.1






* [pbs-devel] [PATCH proxmox-backup 5/5] backup: ensure base snapshots are still available after backup
  2020-07-29 12:33 [pbs-devel] [PATCH 0/5] fix #2881: protect base snapshots and avoid races Stefan Reiter
                   ` (3 preceding siblings ...)
  2020-07-29 12:33 ` [pbs-devel] [PATCH proxmox-backup 4/5] backup: use flock on backup group to forbid multiple backups at once Stefan Reiter
@ 2020-07-29 12:33 ` Stefan Reiter
  2020-07-30  6:38   ` [pbs-devel] applied: " Dietmar Maurer
  4 siblings, 1 reply; 16+ messages in thread
From: Stefan Reiter @ 2020-07-29 12:33 UTC (permalink / raw)
  To: pbs-devel

This should never trigger if everything else works correctly, but it is
still a very cheap check to avoid wrongly marking a backup as "OK" when
in fact some chunks might be missing.

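For illustration, a minimal sketch of the bookkeeping this adds, with the
types reduced to plain paths (the real code stores BackupDir values in the
shared backup state):

    use std::collections::HashSet;
    use std::path::PathBuf;

    #[derive(Default)]
    struct BaseSnapshots(HashSet<PathBuf>);

    impl BaseSnapshots {
        // called when a previous snapshot's index is used as a known-chunk source
        fn register(&mut self, snapshot_path: PathBuf) {
            self.0.insert(snapshot_path);
        }

        // called right before the backup is marked as finished
        fn verify(&self) -> Result<(), String> {
            for path in &self.0 {
                if !path.exists() {
                    return Err(format!(
                        "base snapshot {:?} was removed during backup, chunks might be missing",
                        path
                    ));
                }
            }
            Ok(())
        }
    }
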
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
---
 src/api2/backup.rs             |  1 +
 src/api2/backup/environment.rs | 21 ++++++++++++++++++++-
 src/backup/backup_info.rs      |  2 +-
 3 files changed, 22 insertions(+), 2 deletions(-)

diff --git a/src/api2/backup.rs b/src/api2/backup.rs
index 621e8c07..b9dff1fc 100644
--- a/src/api2/backup.rs
+++ b/src/api2/backup.rs
@@ -660,6 +660,7 @@ fn download_previous(
             };
             if let Some(index) = index {
                 env.log(format!("register chunks in '{}' from previous backup.", archive_name));
+                env.register_base_snapshot(last_backup.backup_dir.clone());
 
                 for pos in 0..index.index_count() {
                     let info = index.chunk_info(pos).unwrap();
diff --git a/src/api2/backup/environment.rs b/src/api2/backup/environment.rs
index f2ebd24f..c8417ef9 100644
--- a/src/api2/backup/environment.rs
+++ b/src/api2/backup/environment.rs
@@ -1,6 +1,6 @@
 use anyhow::{bail, Error};
 use std::sync::{Arc, Mutex};
-use std::collections::HashMap;
+use std::collections::{HashMap, HashSet};
 
 use serde_json::{json, Value};
 
@@ -57,6 +57,7 @@ struct SharedBackupState {
     dynamic_writers: HashMap<usize, DynamicWriterState>,
     fixed_writers: HashMap<usize, FixedWriterState>,
     known_chunks: HashMap<[u8;32], u32>,
+    base_snapshots: HashSet<BackupDir>,
 }
 
 impl SharedBackupState {
@@ -108,6 +109,7 @@ impl BackupEnvironment {
             dynamic_writers: HashMap::new(),
             fixed_writers: HashMap::new(),
             known_chunks: HashMap::new(),
+            base_snapshots: HashSet::new(),
         };
 
         Self {
@@ -124,6 +126,13 @@ impl BackupEnvironment {
         }
     }
 
+    /// Register a snapshot as a predecessor of the current backup.
+    /// Its existence will be ensured on finishing.
+    pub fn register_base_snapshot(&self, snap: BackupDir) {
+        let mut state = self.state.lock().unwrap();
+        state.base_snapshots.insert(snap);
+    }
+
     /// Register a Chunk with associated length.
     ///
     /// We do not fully trust clients, so a client may only use registered
@@ -445,6 +454,16 @@ impl BackupEnvironment {
             bail!("backup does not contain valid files (file count == 0)");
         }
 
+        for snap in &state.base_snapshots {
+            let path = self.datastore.snapshot_path(snap);
+            if !path.exists() {
+                bail!(
+                    "base snapshot {} was removed during backup, cannot finish as chunks might be missing",
+                    snap
+                );
+            }
+        }
+
         state.finished = true;
 
         Ok(())
diff --git a/src/backup/backup_info.rs b/src/backup/backup_info.rs
index 041f5785..e77cc787 100644
--- a/src/backup/backup_info.rs
+++ b/src/backup/backup_info.rs
@@ -217,7 +217,7 @@ impl std::str::FromStr for BackupGroup {
 /// Uniquely identify a Backup (relative to data store)
 ///
 /// We also call this a backup snaphost.
-#[derive(Debug, Eq, PartialEq, Clone)]
+#[derive(Debug, Eq, PartialEq, Hash, Clone)]
 pub struct BackupDir {
     /// Backup group
     group: BackupGroup,
-- 
2.20.1






* [pbs-devel] applied: [PATCH proxmox-backup 1/5] fix typo: avgerage to average
  2020-07-29 12:33 ` [pbs-devel] [PATCH proxmox-backup 1/5] fix typo: avgerage to average Stefan Reiter
@ 2020-07-30  5:25   ` Dietmar Maurer
  0 siblings, 0 replies; 16+ messages in thread
From: Dietmar Maurer @ 2020-07-30  5:25 UTC (permalink / raw)
  To: Proxmox Backup Server development discussion, Stefan Reiter

applied





* Re: [pbs-devel] [PATCH proxmox-backup 4/5] backup: use flock on backup group to forbid multiple backups at once
  2020-07-29 12:33 ` [pbs-devel] [PATCH proxmox-backup 4/5] backup: use flock on backup group to forbid multiple backups at once Stefan Reiter
@ 2020-07-30  5:50   ` Dietmar Maurer
  2020-07-30  7:36     ` Stefan Reiter
  2020-07-30  6:38   ` [pbs-devel] applied: " Dietmar Maurer
  1 sibling, 1 reply; 16+ messages in thread
From: Dietmar Maurer @ 2020-07-30  5:50 UTC (permalink / raw)
  To: Proxmox Backup Server development discussion, Stefan Reiter

> +        // acquire in non-blocking mode, no point in waiting here since other
> +        // backups could still take a very long time
> +        tools::lock_file(&mut handle, true, Some(Duration::from_nanos(0)))
> +            .map_err(|err| {
> +                match err.downcast_ref::<nix::Error>() {
> +                    Some(nix::Error::Sys(nix::errno::Errno::EAGAIN)) => {

Honestly, I would remove this special case with downcast - the default error message is good enough. Isn't it?

> +                        return format_err!(
> +                            "unable to acquire lock on backup group {:?} - another backup is already running",
> +                            self.group_path(),
> +                        );
> +                    },
> +                    _ => ()
> +                }
> +                format_err!(
> +                    "unable to acquire lock on backup group {:?} - {}",
> +                    self.group_path(),
> +                    err,
> +                )
> +            })?;
> +
> +        Ok(handle)
> +    }
> +





* [pbs-devel] applied: [PATCH proxmox-backup 3/5] tools: add nonblocking mode to lock_file
  2020-07-29 12:33 ` [pbs-devel] [PATCH proxmox-backup 3/5] tools: add nonblocking mode to lock_file Stefan Reiter
@ 2020-07-30  6:23   ` Dietmar Maurer
  0 siblings, 0 replies; 16+ messages in thread
From: Dietmar Maurer @ 2020-07-30  6:23 UTC (permalink / raw)
  To: Proxmox Backup Server development discussion, Stefan Reiter

applied





* [pbs-devel] applied: [PATCH proxmox-backup 2/5] datastore: prevent deletion of snaps in use as "previous backup"
  2020-07-29 12:33 ` [pbs-devel] [PATCH proxmox-backup 2/5] datastore: prevent deletion of snaps in use as "previous backup" Stefan Reiter
@ 2020-07-30  6:37   ` Dietmar Maurer
  2020-07-30  6:40   ` [pbs-devel] " Fabian Grünbichler
  1 sibling, 0 replies; 16+ messages in thread
From: Dietmar Maurer @ 2020-07-30  6:37 UTC (permalink / raw)
  To: Proxmox Backup Server development discussion, Stefan Reiter

applied





* [pbs-devel] applied: [PATCH proxmox-backup 4/5] backup: use flock on backup group to forbid multiple backups at once
  2020-07-29 12:33 ` [pbs-devel] [PATCH proxmox-backup 4/5] backup: use flock on backup group to forbid multiple backups at once Stefan Reiter
  2020-07-30  5:50   ` Dietmar Maurer
@ 2020-07-30  6:38   ` Dietmar Maurer
  1 sibling, 0 replies; 16+ messages in thread
From: Dietmar Maurer @ 2020-07-30  6:38 UTC (permalink / raw)
  To: Proxmox Backup Server development discussion, Stefan Reiter

applied





* [pbs-devel] applied: [PATCH proxmox-backup 5/5] backup: ensure base snapshots are still available after backup
  2020-07-29 12:33 ` [pbs-devel] [PATCH proxmox-backup 5/5] backup: ensure base snapshots are still available after backup Stefan Reiter
@ 2020-07-30  6:38   ` Dietmar Maurer
  0 siblings, 0 replies; 16+ messages in thread
From: Dietmar Maurer @ 2020-07-30  6:38 UTC (permalink / raw)
  To: Proxmox Backup Server development discussion, Stefan Reiter

applied





* Re: [pbs-devel] [PATCH proxmox-backup 2/5] datastore: prevent deletion of snaps in use as "previous backup"
  2020-07-29 12:33 ` [pbs-devel] [PATCH proxmox-backup 2/5] datastore: prevent deletion of snaps in use as "previous backup" Stefan Reiter
  2020-07-30  6:37   ` [pbs-devel] applied: " Dietmar Maurer
@ 2020-07-30  6:40   ` Fabian Grünbichler
  1 sibling, 0 replies; 16+ messages in thread
From: Fabian Grünbichler @ 2020-07-30  6:40 UTC (permalink / raw)
  To: Proxmox Backup Server development discussion

for the record, this is https://bugzilla.proxmox.com/show_bug.cgi?id=2881

On July 29, 2020 2:33 pm, Stefan Reiter wrote:
> To prevent a race with a background GC operation, do not allow deletion
> of backups whose index might currently be referenced as the "known chunk
> list" for successive backups. Otherwise the GC could delete chunks it
> thinks are no longer referenced, while at the same time telling the
> client that it doesn't need to upload said chunks because they already
> exist.

* Re: [pbs-devel] [PATCH proxmox-backup 4/5] backup: use flock on backup group to forbid multiple backups at once
  2020-07-30  5:50   ` Dietmar Maurer
@ 2020-07-30  7:36     ` Stefan Reiter
  2020-07-30  7:41       ` Dietmar Maurer
  0 siblings, 1 reply; 16+ messages in thread
From: Stefan Reiter @ 2020-07-30  7:36 UTC (permalink / raw)
  To: Dietmar Maurer, Proxmox Backup Server development discussion

On 7/30/20 7:50 AM, Dietmar Maurer wrote:
>> +        // acquire in non-blocking mode, no point in waiting here since other
>> +        // backups could still take a very long time
>> +        tools::lock_file(&mut handle, true, Some(Duration::from_nanos(0)))
>> +            .map_err(|err| {
>> +                match err.downcast_ref::<nix::Error>() {
>> +                    Some(nix::Error::Sys(nix::errno::Errno::EAGAIN)) => {
> 
> Honestly, I would remove this special case with downcast - the default error message is good enough. Isn't it?
> 

The original message ends in "EAGAIN - Try again", which to me sounded 
like inviting the user to just run the command again, instead of telling 
them there's a backup running.

>> +                        return format_err!(
>> +                            "unable to acquire lock on backup group {:?} - another backup is already running",
>> +                            self.group_path(),
>> +                        );
>> +                    },
>> +                    _ => ()
>> +                }
>> +                format_err!(
>> +                    "unable to acquire lock on backup group {:?} - {}",
>> +                    self.group_path(),
>> +                    err,
>> +                )
>> +            })?;
>> +
>> +        Ok(handle)
>> +    }
>> +





* Re: [pbs-devel] [PATCH proxmox-backup 4/5] backup: use flock on backup group to forbid multiple backups at once
  2020-07-30  7:36     ` Stefan Reiter
@ 2020-07-30  7:41       ` Dietmar Maurer
  2020-07-30  8:02         ` Stefan Reiter
  0 siblings, 1 reply; 16+ messages in thread
From: Dietmar Maurer @ 2020-07-30  7:41 UTC (permalink / raw)
  To: Stefan Reiter, Proxmox Backup Server development discussion


> On 07/30/2020 9:36 AM Stefan Reiter <s.reiter@proxmox.com> wrote:
> 
>  
> On 7/30/20 7:50 AM, Dietmar Maurer wrote:
> >> +        // acquire in non-blocking mode, no point in waiting here since other
> >> +        // backups could still take a very long time
> >> +        tools::lock_file(&mut handle, true, Some(Duration::from_nanos(0)))
> >> +            .map_err(|err| {
> >> +                match err.downcast_ref::<nix::Error>() {
> >> +                    Some(nix::Error::Sys(nix::errno::Errno::EAGAIN)) => {
> > 
> > Honestly, I would remove this special case with downcast - the default error message is good enough. Isn't it?
> > 
> 
> The original message ends in "EAGAIN - Try again", which to me sounded 
> like inviting the user to just run the command again, instead of telling 
> them there's a backup running.

The suggestion is to use: "unable to acquire lock on backup group {:?} - {}",

> 
> >> +                        return format_err!(
> >> +                            "unable to acquire lock on backup group {:?} - another backup is already running",
> >> +                            self.group_path(),
> >> +                        );
> >> +                    },
> >> +                    _ => ()
> >> +                }
> >> +                format_err!(
> >> +                    "unable to acquire lock on backup group {:?} - {}",
> >> +                    self.group_path(),
> >> +                    err,
> >> +                )
> >> +            })?;
> >> +
> >> +        Ok(handle)
> >> +    }
> >> +





* Re: [pbs-devel] [PATCH proxmox-backup 4/5] backup: use flock on backup group to forbid multiple backups at once
  2020-07-30  7:41       ` Dietmar Maurer
@ 2020-07-30  8:02         ` Stefan Reiter
  0 siblings, 0 replies; 16+ messages in thread
From: Stefan Reiter @ 2020-07-30  8:02 UTC (permalink / raw)
  To: Dietmar Maurer, Proxmox Backup Server development discussion

On 7/30/20 9:41 AM, Dietmar Maurer wrote:
> 
>> On 07/30/2020 9:36 AM Stefan Reiter <s.reiter@proxmox.com> wrote:
>>
>>   
>> On 7/30/20 7:50 AM, Dietmar Maurer wrote:
>>>> +        // acquire in non-blocking mode, no point in waiting here since other
>>>> +        // backups could still take a very long time
>>>> +        tools::lock_file(&mut handle, true, Some(Duration::from_nanos(0)))
>>>> +            .map_err(|err| {
>>>> +                match err.downcast_ref::<nix::Error>() {
>>>> +                    Some(nix::Error::Sys(nix::errno::Errno::EAGAIN)) => {
>>>
>>> Honestly, I would remove this special case with downcast - the default error message is good enough. Isn't it?
>>>
>>
>> The original message ends in "EAGAIN - Try again", which to me sounded
>> like inviting the user to just run the command again, instead of telling
>> them there's a backup running.
> 
> The suggestion is to use: "unable to acquire lock on backup group {:?} - {}",
> 

Yes, and with that the error message reads:

"Error: unable to acquire lock on backup group "host/xyz" - EAGAIN: Try 
again"

...which I think might confuse some users.
>>
>>>> +                        return format_err!(
>>>> +                            "unable to acquire lock on backup group {:?} - another backup is already running",
>>>> +                            self.group_path(),
>>>> +                        );
>>>> +                    },
>>>> +                    _ => ()
>>>> +                }
>>>> +                format_err!(
>>>> +                    "unable to acquire lock on backup group {:?} - {}",
>>>> +                    self.group_path(),
>>>> +                    err,
>>>> +                )
>>>> +            })?;
>>>> +
>>>> +        Ok(handle)
>>>> +    }
>>>> +





end of thread

Thread overview: 16+ messages
2020-07-29 12:33 [pbs-devel] [PATCH 0/5] fix #2881: protect base snapshots and avoid races Stefan Reiter
2020-07-29 12:33 ` [pbs-devel] [PATCH proxmox-backup 1/5] fix typo: avgerage to average Stefan Reiter
2020-07-30  5:25   ` [pbs-devel] applied: " Dietmar Maurer
2020-07-29 12:33 ` [pbs-devel] [PATCH proxmox-backup 2/5] datastore: prevent deletion of snaps in use as "previous backup" Stefan Reiter
2020-07-30  6:37   ` [pbs-devel] applied: " Dietmar Maurer
2020-07-30  6:40   ` [pbs-devel] " Fabian Grünbichler
2020-07-29 12:33 ` [pbs-devel] [PATCH proxmox-backup 3/5] tools: add nonblocking mode to lock_file Stefan Reiter
2020-07-30  6:23   ` [pbs-devel] applied: " Dietmar Maurer
2020-07-29 12:33 ` [pbs-devel] [PATCH proxmox-backup 4/5] backup: use flock on backup group to forbid multiple backups at once Stefan Reiter
2020-07-30  5:50   ` Dietmar Maurer
2020-07-30  7:36     ` Stefan Reiter
2020-07-30  7:41       ` Dietmar Maurer
2020-07-30  8:02         ` Stefan Reiter
2020-07-30  6:38   ` [pbs-devel] applied: " Dietmar Maurer
2020-07-29 12:33 ` [pbs-devel] [PATCH proxmox-backup 5/5] backup: ensure base snapshots are still available after backup Stefan Reiter
2020-07-30  6:38   ` [pbs-devel] applied: " Dietmar Maurer
