public inbox for pbs-devel@lists.proxmox.com
* [pbs-devel] [PATCH v2 0/4] Locking and rustdoc improvements
@ 2020-10-15 10:49 Stefan Reiter
  2020-10-15 10:49 ` [pbs-devel] [PATCH v2 proxmox-backup 1/4] gc: avoid race between phase1 and forget/prune Stefan Reiter
                   ` (3 more replies)
  0 siblings, 4 replies; 11+ messages in thread
From: Stefan Reiter @ 2020-10-15 10:49 UTC (permalink / raw)
  To: pbs-devel

Getting our locking into better shape, now with at least 10% more flocks/mutexes
and 14% fewer bugs! *

Order now, and you will even receive updated documentation providing lots of
detail on how said locking actually works!

* terms and conditions may apply


With these patches applied, all interactions displayed in the table in patch 4
resolve fine in theory; in practice I couldn't trigger any of the previously known
races anymore either. Extra thorough review/testing is still appreciated,
locking is hard...

v2:
* Drop applied patches
* Use lock file and update_manifest method for manifest
* Don't hold manifest lock across verify
* Drop Error.context() from patch 1, use File::open directly instead
* Update docs for v2 changes


proxmox-backup: Stefan Reiter (4):
  gc: avoid race between phase1 and forget/prune
  datastore: add manifest locking
  rustdoc: add crate level doc
  rustdoc: overhaul backup rustdoc and add locking table

 src/api2/admin/datastore.rs    |   8 +-
 src/api2/backup/environment.rs |  13 +--
 src/backup.rs                  | 199 ++++++++++++++++++++-------------
 src/backup/datastore.rs        | 105 ++++++++++++++---
 src/backup/manifest.rs         |   1 +
 src/backup/verify.rs           |   9 +-
 src/lib.rs                     |   5 +
 7 files changed, 229 insertions(+), 111 deletions(-)

-- 
2.20.1




^ permalink raw reply	[flat|nested] 11+ messages in thread

* [pbs-devel] [PATCH v2 proxmox-backup 1/4] gc: avoid race between phase1 and forget/prune
  2020-10-15 10:49 [pbs-devel] [PATCH v2 0/4] Locking and rustdoc improvements Stefan Reiter
@ 2020-10-15 10:49 ` Stefan Reiter
  2020-10-16  6:26   ` Dietmar Maurer
  2020-10-15 10:49 ` [pbs-devel] [PATCH v2 proxmox-backup 2/4] datastore: add manifest locking Stefan Reiter
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 11+ messages in thread
From: Stefan Reiter @ 2020-10-15 10:49 UTC (permalink / raw)
  To: pbs-devel

...by saving all forgotten chunks into a HashSet when we detect a GC
phase1 currently running. If GC then hits a NotFound, it can check if
the snapshot was simply forgotten behind its back, or if an actual error
occurred and it needs to abort (since it might delete still referenced
chunks, if the error is transient and the index file is still there).

We have to attach the error message in {fixed,dynamic}_index via the
.context() method, otherwise the original std::io::Error gets lost and
we can't check for NotFound.

Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
---

v2:
* use File::open directly and catch std::io::Error that way, avoid Error.context
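
For reviewers, here is a minimal, self-contained sketch of the synchronization
pattern the diff below implements. The GcTracker type and its method names are
invented for illustration; in the actual patch this state lives directly on
DataStore:

use std::collections::HashSet;
use std::path::PathBuf;
use std::sync::Mutex;

// Invented type for illustration; the real state lives on DataStore.
struct GcTracker {
    // bool: "GC phase1 currently running"; set: snapshots removed meanwhile
    removed_during_gc: Mutex<(bool, HashSet<PathBuf>)>,
}

impl GcTracker {
    // forget/prune side: record the snapshot path and keep holding the lock
    // while the directory is actually removed, so phase1 cannot start in
    // between the insert and the unlink
    fn remove_snapshot(&self, snapshot: PathBuf) {
        let mut guard = self.removed_during_gc.lock().unwrap();
        if guard.0 {
            guard.1.insert(snapshot.clone());
        }
        // std::fs::remove_dir_all(&snapshot) would happen here, under the lock
    }

    // GC phase1 side: when an index file turns up NotFound, only continue if
    // the snapshot was removed by a concurrent forget; otherwise abort
    fn was_forgotten(&self, snapshot: &PathBuf) -> bool {
        self.removed_during_gc.lock().unwrap().1.contains(snapshot)
    }
}

fn main() {
    let tracker = GcTracker {
        removed_during_gc: Mutex::new((true, HashSet::new())), // phase1 active
    };
    let snap = PathBuf::from("vm/100/2020-10-15T10:49:00Z");
    tracker.remove_snapshot(snap.clone());
    assert!(tracker.was_forgotten(&snap));
}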

 src/backup/datastore.rs | 62 ++++++++++++++++++++++++++++++++++++-----
 1 file changed, 55 insertions(+), 7 deletions(-)

diff --git a/src/backup/datastore.rs b/src/backup/datastore.rs
index 7dd2624c..ca8ca438 100644
--- a/src/backup/datastore.rs
+++ b/src/backup/datastore.rs
@@ -36,6 +36,9 @@ pub struct DataStore {
     chunk_store: Arc<ChunkStore>,
     gc_mutex: Mutex<bool>,
     last_gc_status: Mutex<GarbageCollectionStatus>,
+
+    // bool indicates if phase1 is currently active
+    removed_during_gc: Mutex<(bool, HashSet<PathBuf>)>,
 }
 
 impl DataStore {
@@ -82,6 +85,7 @@ impl DataStore {
             chunk_store: Arc::new(chunk_store),
             gc_mutex: Mutex::new(false),
             last_gc_status: Mutex::new(gc_status),
+            removed_during_gc: Mutex::new((false, HashSet::new())),
         })
     }
 
@@ -236,6 +240,14 @@ impl DataStore {
             _guard = lock_dir_noblock(&full_path, "snapshot", "possibly running or in use")?;
         }
 
+        // Acquire lock and keep it during remove operation, so there's no
+        // chance for a race between adding to the hash and actually removing
+        // the dir (phase1 might otherwise start in-between)
+        let mut removed_guard = self.removed_during_gc.lock().unwrap();
+        if removed_guard.0 {
+            removed_guard.1.insert(self.snapshot_path(&backup_dir));
+        }
+
         log::info!("removing backup snapshot {:?}", full_path);
         std::fs::remove_dir_all(&full_path)
             .map_err(|err| {
@@ -461,12 +473,35 @@ impl DataStore {
             tools::fail_on_shutdown()?;
 
             if let Ok(archive_type) = archive_type(&path) {
-                if archive_type == ArchiveType::FixedIndex {
-                    let index = self.open_fixed_reader(&path)?;
-                    self.index_mark_used_chunks(index, &path, status, worker)?;
-                } else if archive_type == ArchiveType::DynamicIndex {
-                    let index = self.open_dynamic_reader(&path)?;
-                    self.index_mark_used_chunks(index, &path, status, worker)?;
+                let full_path =  self.chunk_store.relative_path(&path);
+
+                match std::fs::File::open(&full_path) {
+                    Ok(file) => {
+                        if archive_type == ArchiveType::FixedIndex {
+                            let index = FixedIndexReader::new(file)?;
+                            self.index_mark_used_chunks(index, &path, status, worker)?;
+                        } else if archive_type == ArchiveType::DynamicIndex {
+                            let index = DynamicIndexReader::new(file)?;
+                            self.index_mark_used_chunks(index, &path, status, worker)?;
+                        }
+                    },
+                    Err(err) => {
+                        if err.kind() == std::io::ErrorKind::NotFound {
+                            let (_, removed_hash) = &*self.removed_during_gc.lock().unwrap();
+                            let backup_dir = path.parent();
+                            if backup_dir.is_some() && removed_hash.contains(backup_dir.unwrap()) {
+                                // index file not found, but we know that it was deleted by
+                                // a concurrent 'forget', so we can safely ignore
+                            } else {
+                                bail!(
+                                    "index file not found but hasn't been removed by forget/prune, aborting GC - {}",
+                                    err
+                                )
+                            }
+                        } else {
+                            bail!(err)
+                        }
+                    }
                 }
             }
             done += 1;
@@ -512,7 +547,20 @@ impl DataStore {
 
             crate::task_log!(worker, "Start GC phase1 (mark used chunks)");
 
-            self.mark_used_chunks(&mut gc_status, worker)?;
+            {
+                let mut guard = self.removed_during_gc.lock().unwrap();
+                guard.0 = true;
+            }
+            let mark_res = self.mark_used_chunks(&mut gc_status, worker);
+            {
+                let mut guard = self.removed_during_gc.lock().unwrap();
+                guard.0 = false;
+                guard.1.clear();
+            }
+
+            if let Err(err) = mark_res {
+                bail!(err);
+            }
 
             crate::task_log!(worker, "Start GC phase2 (sweep unused chunks)");
             self.chunk_store.sweep_unused_chunks(
-- 
2.20.1





^ permalink raw reply	[flat|nested] 11+ messages in thread

* [pbs-devel] [PATCH v2 proxmox-backup 2/4] datastore: add manifest locking
  2020-10-15 10:49 [pbs-devel] [PATCH v2 0/4] Locking and rustdoc improvements Stefan Reiter
  2020-10-15 10:49 ` [pbs-devel] [PATCH v2 proxmox-backup 1/4] gc: avoid race between phase1 and forget/prune Stefan Reiter
@ 2020-10-15 10:49 ` Stefan Reiter
  2020-10-16  6:33   ` Dietmar Maurer
  2020-10-16  7:39   ` [pbs-devel] applied: " Dietmar Maurer
  2020-10-15 10:49 ` [pbs-devel] [PATCH v2 proxmox-backup 3/4] rustdoc: add crate level doc Stefan Reiter
  2020-10-15 10:49 ` [pbs-devel] [PATCH v2 proxmox-backup 4/4] rustdoc: overhaul backup rustdoc and add locking table Stefan Reiter
  3 siblings, 2 replies; 11+ messages in thread
From: Stefan Reiter @ 2020-10-15 10:49 UTC (permalink / raw)
  To: pbs-devel

Avoid races when updating manifest data by flocking a lock file.
update_manifest is used to ensure updates always happen with the lock
held.

Snapshot deletion also acquires the lock, so it cannot interfere with an
outstanding manifest write.

Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
---

v2:
* Use separate manifest lock file
* Change store_manifest to update_manifest to force consumers to use correct
  locking
* Don't hold lock across verify, reload manifest at the end
* Update comments
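
The reason a separate .index.json.lck file is locked (rather than the manifest
blob itself) is that the manifest is rewritten via an atomic rename, which
swaps the inode and would silently invalidate any flock held on the old file.
Below is a rough standalone sketch of that pattern, using the fs2 crate for
flock and plain JSON instead of the real DataBlob encoding - both are
assumptions for illustration only; the actual code uses proxmox's
open_file_locked and replace_file helpers:

use std::fs::{self, File, OpenOptions};
use std::io::Write;
use std::path::Path;

use anyhow::Result;
use fs2::FileExt; // assumed dependency, provides flock on std::fs::File

fn update_manifest(
    snapshot_dir: &Path,
    update_fn: impl FnOnce(&mut serde_json::Value),
) -> Result<()> {
    // 1. lock a file that is never replaced, so the flock stays valid
    let lock_path = snapshot_dir.join(".index.json.lck");
    let lock = OpenOptions::new().create(true).write(true).open(&lock_path)?;
    lock.lock_exclusive()?; // blocks until a concurrent writer is done

    // 2. read-modify-write the manifest while holding the lock
    let manifest_path = snapshot_dir.join("index.json.blob");
    let mut manifest: serde_json::Value =
        serde_json::from_slice(&fs::read(&manifest_path)?)?;
    update_fn(&mut manifest);

    // 3. write a temp file and atomically rename it over the manifest; the
    //    old inode (and any flock on it) is gone after this point, which is
    //    why no further writes may happen here
    let tmp_path = snapshot_dir.join("index.json.blob.tmp");
    File::create(&tmp_path)?.write_all(manifest.to_string().as_bytes())?;
    fs::rename(&tmp_path, &manifest_path)?;
    Ok(())
}

Callers then only ever go through update_manifest(&backup_dir, |manifest| {
... }), which is what set_notes, the backup environment and verify do in the
diff below.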

 src/api2/admin/datastore.rs    |  8 +++----
 src/api2/backup/environment.rs | 13 ++++------
 src/backup/datastore.rs        | 43 +++++++++++++++++++++++++++++-----
 src/backup/manifest.rs         |  1 +
 src/backup/verify.rs           |  9 +++----
 5 files changed, 50 insertions(+), 24 deletions(-)

diff --git a/src/api2/admin/datastore.rs b/src/api2/admin/datastore.rs
index de39cd1b..4f15c1cd 100644
--- a/src/api2/admin/datastore.rs
+++ b/src/api2/admin/datastore.rs
@@ -1481,11 +1481,9 @@ fn set_notes(
     let allowed = (user_privs & PRIV_DATASTORE_READ) != 0;
     if !allowed { check_backup_owner(&datastore, backup_dir.group(), &userid)?; }
 
-    let (mut manifest, _) = datastore.load_manifest(&backup_dir)?;
-
-    manifest.unprotected["notes"] = notes.into();
-
-    datastore.store_manifest(&backup_dir, manifest)?;
+    datastore.update_manifest(&backup_dir,|manifest| {
+        manifest.unprotected["notes"] = notes.into();
+    }).map_err(|err| format_err!("unable to update manifest blob - {}", err))?;
 
     Ok(())
 }
diff --git a/src/api2/backup/environment.rs b/src/api2/backup/environment.rs
index f00c2cd3..c4f166df 100644
--- a/src/api2/backup/environment.rs
+++ b/src/api2/backup/environment.rs
@@ -472,16 +472,11 @@ impl BackupEnvironment {
             bail!("backup does not contain valid files (file count == 0)");
         }
 
-        // check manifest
-        let (mut manifest, _) = self.datastore.load_manifest(&self.backup_dir)
-            .map_err(|err| format_err!("unable to load manifest blob - {}", err))?;
-
+        // check for valid manifest and store stats
         let stats = serde_json::to_value(state.backup_stat)?;
-
-        manifest.unprotected["chunk_upload_stats"] = stats;
-
-        self.datastore.store_manifest(&self.backup_dir, manifest)
-            .map_err(|err| format_err!("unable to store manifest blob - {}", err))?;
+        self.datastore.update_manifest(&self.backup_dir, |manifest| {
+            manifest.unprotected["chunk_upload_stats"] = stats;
+        }).map_err(|err| format_err!("unable to update manifest blob - {}", err))?;
 
         if let Some(base) = &self.last_backup {
             let path = self.datastore.snapshot_path(&base.backup_dir);
diff --git a/src/backup/datastore.rs b/src/backup/datastore.rs
index ca8ca438..7b2bee7e 100644
--- a/src/backup/datastore.rs
+++ b/src/backup/datastore.rs
@@ -3,17 +3,19 @@ use std::io::{self, Write};
 use std::path::{Path, PathBuf};
 use std::sync::{Arc, Mutex};
 use std::convert::TryFrom;
+use std::time::Duration;
+use std::fs::File;
 
 use anyhow::{bail, format_err, Error};
 use lazy_static::lazy_static;
 
-use proxmox::tools::fs::{replace_file, CreateOptions};
+use proxmox::tools::fs::{replace_file, CreateOptions, open_file_locked};
 
 use super::backup_info::{BackupGroup, BackupDir};
 use super::chunk_store::ChunkStore;
 use super::dynamic_index::{DynamicIndexReader, DynamicIndexWriter};
 use super::fixed_index::{FixedIndexReader, FixedIndexWriter};
-use super::manifest::{MANIFEST_BLOB_NAME, CLIENT_LOG_BLOB_NAME, BackupManifest};
+use super::manifest::{MANIFEST_BLOB_NAME, MANIFEST_LOCK_NAME, CLIENT_LOG_BLOB_NAME, BackupManifest};
 use super::index::*;
 use super::{DataBlob, ArchiveType, archive_type};
 use crate::config::datastore;
@@ -235,9 +237,10 @@ impl DataStore {
 
         let full_path = self.snapshot_path(backup_dir);
 
-        let _guard;
+        let (_guard, _manifest_guard);
         if !force {
             _guard = lock_dir_noblock(&full_path, "snapshot", "possibly running or in use")?;
+            _manifest_guard = self.lock_manifest(backup_dir);
         }
 
         // Acquire lock and keep it during remove operation, so there's no
@@ -665,8 +668,27 @@ impl DataStore {
             digest_str,
             err,
         ))
-     }
+    }
 
+    fn lock_manifest(
+        &self,
+        backup_dir: &BackupDir,
+    ) -> Result<File, Error> {
+        let mut path = self.base_path();
+        path.push(backup_dir.relative_path());
+        path.push(&MANIFEST_LOCK_NAME);
+
+        // update_manifest should never take a long time, so if someone else has
+        // the lock we can simply block a bit and should get it soon
+        open_file_locked(&path, Duration::from_secs(5), true)
+            .map_err(|err| {
+                format_err!(
+                    "unable to acquire manifest lock {:?} - {}", &path, err
+                )
+            })
+    }
+
+    /// Load the manifest without a lock. Must not be written back.
     pub fn load_manifest(
         &self,
         backup_dir: &BackupDir,
@@ -677,11 +699,19 @@ impl DataStore {
         Ok((manifest, raw_size))
     }
 
-    pub fn store_manifest(
+    /// Update the manifest of the specified snapshot. Never write a manifest directly,
+    /// only use this method - anything else may break locking guarantees.
+    pub fn update_manifest(
         &self,
         backup_dir: &BackupDir,
-        manifest: BackupManifest,
+        update_fn: impl FnOnce(&mut BackupManifest),
     ) -> Result<(), Error> {
+
+        let _guard = self.lock_manifest(backup_dir)?;
+        let (mut manifest, _) = self.load_manifest(&backup_dir)?;
+
+        update_fn(&mut manifest);
+
         let manifest = serde_json::to_value(manifest)?;
         let manifest = serde_json::to_string_pretty(&manifest)?;
         let blob = DataBlob::encode(manifest.as_bytes(), None, true)?;
@@ -691,6 +721,7 @@ impl DataStore {
         path.push(backup_dir.relative_path());
         path.push(MANIFEST_BLOB_NAME);
 
+        // atomic replace invalidates flock - no other writes past this point!
         replace_file(&path, raw_data, CreateOptions::new())?;
 
         Ok(())
diff --git a/src/backup/manifest.rs b/src/backup/manifest.rs
index 609cc998..51980a07 100644
--- a/src/backup/manifest.rs
+++ b/src/backup/manifest.rs
@@ -8,6 +8,7 @@ use ::serde::{Deserialize, Serialize};
 use crate::backup::{BackupDir, CryptMode, CryptConfig};
 
 pub const MANIFEST_BLOB_NAME: &str = "index.json.blob";
+pub const MANIFEST_LOCK_NAME: &str = ".index.json.lck";
 pub const CLIENT_LOG_BLOB_NAME: &str = "client.log.blob";
 
 mod hex_csum {
diff --git a/src/backup/verify.rs b/src/backup/verify.rs
index 05b6ba86..ea3fa760 100644
--- a/src/backup/verify.rs
+++ b/src/backup/verify.rs
@@ -300,7 +300,7 @@ pub fn verify_backup_dir(
         return Ok(true);
     }
 
-    let mut manifest = match datastore.load_manifest(&backup_dir) {
+    let manifest = match datastore.load_manifest(&backup_dir) {
         Ok((manifest, _)) => manifest,
         Err(err) => {
             task_log!(
@@ -367,9 +367,10 @@ pub fn verify_backup_dir(
         state: verify_result,
         upid,
     };
-    manifest.unprotected["verify_state"] = serde_json::to_value(verify_state)?;
-    datastore.store_manifest(&backup_dir, manifest)
-        .map_err(|err| format_err!("unable to store manifest blob - {}", err))?;
+    let verify_state = serde_json::to_value(verify_state)?;
+    datastore.update_manifest(&backup_dir, |manifest| {
+        manifest.unprotected["verify_state"] = verify_state;
+    }).map_err(|err| format_err!("unable to update manifest blob - {}", err))?;
 
     Ok(error_count == 0)
 }
-- 
2.20.1





^ permalink raw reply	[flat|nested] 11+ messages in thread

* [pbs-devel] [PATCH v2 proxmox-backup 3/4] rustdoc: add crate level doc
  2020-10-15 10:49 [pbs-devel] [PATCH v2 0/4] Locking and rustdoc improvements Stefan Reiter
  2020-10-15 10:49 ` [pbs-devel] [PATCH v2 proxmox-backup 1/4] gc: avoid race between phase1 and forget/prune Stefan Reiter
  2020-10-15 10:49 ` [pbs-devel] [PATCH v2 proxmox-backup 2/4] datastore: add manifest locking Stefan Reiter
@ 2020-10-15 10:49 ` Stefan Reiter
  2020-10-16  7:47   ` [pbs-devel] applied: " Dietmar Maurer
  2020-10-15 10:49 ` [pbs-devel] [PATCH v2 proxmox-backup 4/4] rustdoc: overhaul backup rustdoc and add locking table Stefan Reiter
  3 siblings, 1 reply; 11+ messages in thread
From: Stefan Reiter @ 2020-10-15 10:49 UTC (permalink / raw)
  To: pbs-devel

Contains a link to the 'backup' module's doc, as that explains a lot
about the inner workings of PBS and probably marks a good entry point
for new readers.

Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
---

v2:
* unchanged

 src/lib.rs | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/src/lib.rs b/src/lib.rs
index c44c9ae1..cc51b2e0 100644
--- a/src/lib.rs
+++ b/src/lib.rs
@@ -1,3 +1,8 @@
+//! See the different modules for documentation on their usage.
+//!
+//! The [backup](backup/index.html) module contains some detailed information
+//! on the inner workings of the backup server regarding data storage.
+
 pub mod task;
 
 #[macro_use]
-- 
2.20.1





^ permalink raw reply	[flat|nested] 11+ messages in thread

* [pbs-devel] [PATCH v2 proxmox-backup 4/4] rustdoc: overhaul backup rustdoc and add locking table
  2020-10-15 10:49 [pbs-devel] [PATCH v2 0/4] Locking and rustdoc improvements Stefan Reiter
                   ` (2 preceding siblings ...)
  2020-10-15 10:49 ` [pbs-devel] [PATCH v2 proxmox-backup 3/4] rustdoc: add crate level doc Stefan Reiter
@ 2020-10-15 10:49 ` Stefan Reiter
  2020-10-16  7:47   ` [pbs-devel] applied: " Dietmar Maurer
  3 siblings, 1 reply; 11+ messages in thread
From: Stefan Reiter @ 2020-10-15 10:49 UTC (permalink / raw)
  To: pbs-devel

Rewrite most of the documentation to be more readable and correct
(according to the current implementations).

Add a table visualizing all different locks used to synchronize
concurrent operations.

Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
---

FYI: I used https://www.tablesgenerator.com/markdown_tables for the table

v2:
* Update table to reflect update_manifest changes
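
One detail from the atime-based GC text below that may be easier to see with
numbers: the sweep cutoff is the older of the 24h relatime grace period and
the start time of the oldest still-running backup (known via the oldest shared
ProcessLocker lock). A tiny sketch of that rule, with purely illustrative
timestamps:

// Sketch only; timestamps are unix seconds and purely illustrative.
fn sweep_cutoff(now: i64, oldest_backup_start: Option<i64>) -> i64 {
    let relatime_cutoff = now - 24 * 3600; // relatime may delay atime updates up to 24h
    match oldest_backup_start {
        // a still-running backup may have touched chunks as early as its start
        Some(start) => relatime_cutoff.min(start),
        None => relatime_cutoff,
    }
}

fn main() {
    let now = 1_602_758_940;
    // backup started 30h ago and still running -> cutoff moves back to its start
    assert_eq!(sweep_cutoff(now, Some(now - 30 * 3600)), now - 30 * 3600);
    // no running backups -> plain 24h cutoff
    assert_eq!(sweep_cutoff(now, None), now - 24 * 3600);
}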

 src/backup.rs | 199 ++++++++++++++++++++++++++++++--------------------
 1 file changed, 119 insertions(+), 80 deletions(-)

diff --git a/src/backup.rs b/src/backup.rs
index 1b2180bc..577fdf40 100644
--- a/src/backup.rs
+++ b/src/backup.rs
@@ -1,107 +1,146 @@
-//! This module implements the proxmox backup data storage
+//! This module implements the data storage and access layer.
 //!
-//! Proxmox backup splits large files into chunks, and stores them
-//! deduplicated using a content addressable storage format.
+//! # Data formats
 //!
-//! A chunk is simply defined as binary blob, which is stored inside a
-//! `ChunkStore`, addressed by the SHA256 digest of the binary blob.
+//! PBS splits large files into chunks, and stores them deduplicated using
+//! a content addressable storage format.
 //!
-//! Index files are used to reconstruct the original file. They
-//! basically contain a list of SHA256 checksums. The `DynamicIndex*`
-//! format is able to deal with dynamic chunk sizes, whereas the
-//! `FixedIndex*` format is an optimization to store a list of equal
-//! sized chunks.
+//! Backup snapshots are stored as folders containing a manifest file and
+//! potentially one or more index or blob files.
 //!
-//! # ChunkStore Locking
+//! The manifest contains hashes of all other files and can be signed by
+//! the client.
 //!
-//! We need to be able to restart the proxmox-backup service daemons,
-//! so that we can update the software without rebooting the host. But
-//! such restarts must not abort running backup jobs, so we need to
-//! keep the old service running until those jobs are finished. This
-//! implies that we need some kind of locking for the
-//! ChunkStore. Please note that it is perfectly valid to have
-//! multiple parallel ChunkStore writers, even when they write the
-//! same chunk (because the chunk would have the same name and the
-//! same data). The only real problem is garbage collection, because
-//! we need to avoid deleting chunks which are still referenced.
+//! Blob files contain data directly. They are used for config files and
+//! the like.
 //!
-//! * Read Index Files:
+//! Index files are used to reconstruct an original file. They contain a
+//! list of SHA256 checksums. The `DynamicIndex*` format is able to deal
+//! with dynamic chunk sizes (CT and host backups), whereas the
+//! `FixedIndex*` format is an optimization to store a list of equal sized
+//! chunks (VMs, whole block devices).
 //!
-//!   Acquire shared lock for .idx files.
-//!
-//!
-//! * Delete Index Files:
-//!
-//!   Acquire exclusive lock for .idx files. This makes sure that we do
-//!   not delete index files while they are still in use.
-//!
-//!
-//! * Create Index Files:
-//!
-//!   Acquire shared lock for ChunkStore (process wide).
-//!
-//!   Note: When creating .idx files, we create temporary a (.tmp) file,
-//!   then do an atomic rename ...
-//!
-//!
-//! * Garbage Collect:
-//!
-//!   Acquire exclusive lock for ChunkStore (process wide). If we have
-//!   already a shared lock for the ChunkStore, try to upgrade that
-//!   lock.
-//!
-//!
-//! * Server Restart
-//!
-//!   Try to abort the running garbage collection to release exclusive
-//!   ChunkStore locks ASAP. Start the new service with the existing listening
-//!   socket.
+//! A chunk is defined as a binary blob, which is stored inside a
+//! [ChunkStore](struct.ChunkStore.html) instead of the backup directory
+//! directly, and can be addressed by its SHA256 digest.
 //!
 //!
 //! # Garbage Collection (GC)
 //!
-//! Deleting backups is as easy as deleting the corresponding .idx
-//! files. Unfortunately, this does not free up any storage, because
-//! those files just contain references to chunks.
+//! Deleting backups is as easy as deleting the corresponding .idx files.
+//! However, this does not free up any storage, because those files just
+//! contain references to chunks.
 //!
 //! To free up some storage, we run a garbage collection process at
-//! regular intervals. The collector uses a mark and sweep
-//! approach. In the first phase, it scans all .idx files to mark used
-//! chunks. The second phase then removes all unmarked chunks from the
-//! store.
+//! regular intervals. The collector uses a mark and sweep approach. In
+//! the first phase, it scans all .idx files to mark used chunks. The
+//! second phase then removes all unmarked chunks from the store.
 //!
-//! The above locking mechanism makes sure that we are the only
-//! process running GC. But we still want to be able to create backups
-//! during GC, so there may be multiple backup threads/tasks
-//! running. Either started before GC started, or started while GC is
-//! running.
+//! The locking mechanisms mentioned below make sure that we are the only
+//! process running GC. We still want to be able to create backups during
+//! GC, so there may be multiple backup threads/tasks running, either
+//! started before GC, or while GC is running.
 //!
 //! ## `atime` based GC
 //!
 //! The idea here is to mark chunks by updating the `atime` (access
-//! timestamp) on the chunk file. This is quite simple and does not
-//! need additional RAM.
+//! timestamp) on the chunk file. This is quite simple and does not need
+//! additional RAM.
 //!
 //! One minor problem is that recent Linux versions use the `relatime`
-//! mount flag by default for performance reasons (yes, we want
-//! that). When enabled, `atime` data is written to the disk only if
-//! the file has been modified since the `atime` data was last updated
-//! (`mtime`), or if the file was last accessed more than a certain
-//! amount of time ago (by default 24h). So we may only delete chunks
-//! with `atime` older than 24 hours.
-//!
-//! Another problem arises from running backups. The mark phase does
-//! not find any chunks from those backups, because there is no .idx
-//! file for them (created after the backup). Chunks created or
-//! touched by those backups may have an `atime` as old as the start
-//! time of those backups. Please note that the backup start time may
-//! predate the GC start time. So we may only delete chunks older than
-//! the start time of those running backup jobs.
+//! mount flag by default for performance reasons (and we want that). When
+//! enabled, `atime` data is written to the disk only if the file has been
+//! modified since the `atime` data was last updated (`mtime`), or if the
+//! file was last accessed more than a certain amount of time ago (by
+//! default 24h). So we may only delete chunks with `atime` older than 24
+//! hours.
 //!
+//! Another problem arises from running backups. The mark phase does not
+//! find any chunks from those backups, because there is no .idx file for
+//! them (created after the backup). Chunks created or touched by those
+//! backups may have an `atime` as old as the start time of those backups.
+//! Please note that the backup start time may predate the GC start time.
+//! So we may only delete chunks older than the start time of those
+//! running backup jobs, which might be more than 24h back (this is the
+//! reason why ProcessLocker exclusive locks only have to be exclusive
+//! between processes, since within one we can determine the age of the
+//! oldest shared lock).
 //!
 //! ## Store `marks` in RAM using a HASH
 //!
-//! Not sure if this is better. TODO
+//! Might be better. Under investigation.
+//!
+//!
+//! # Locking
+//!
+//! Since PBS allows multiple potentially interfering operations at the
+//! same time (e.g. garbage collect, prune, multiple backup creations
+//! (only in separate groups), forget, ...), these need to lock against
+//! each other in certain scenarios. There is no overarching global lock
+//! though, instead always the finest grained lock possible is used,
+//! because running these operations concurrently is treated as a feature
+//! on its own.
+//!
+//! ## Inter-process Locking
+//!
+//! We need to be able to restart the proxmox-backup service daemons, so
+//! that we can update the software without rebooting the host. But such
+//! restarts must not abort running backup jobs, so we need to keep the
+//! old service running until those jobs are finished. This implies that
+//! we need some kind of locking for modifying chunks and indices in the
+//! ChunkStore.
+//!
+//! Please note that it is perfectly valid to have multiple
+//! parallel ChunkStore writers, even when they write the same chunk
+//! (because the chunk would have the same name and the same data, and
+//! writes are completed atomically via a rename). The only problem is
+//! garbage collection, because we need to avoid deleting chunks which are
+//! still referenced.
+//!
+//! To do this we use the
+//! [ProcessLocker](../tools/struct.ProcessLocker.html).
+//!
+//! ### ChunkStore-wide
+//!
+//! * Create Index Files:
+//!
+//!   Acquire shared lock for ChunkStore.
+//!
+//!   Note: When creating .idx files, we create a temporary .tmp file,
+//!   then do an atomic rename.
+//!
+//! * Garbage Collect:
+//!
+//!   Acquire exclusive lock for ChunkStore. If we have
+//!   already a shared lock for the ChunkStore, try to upgrade that
+//!   lock.
+//!
+//! Exclusive locks only work _between processes_. It is valid to have an
+//! exclusive and one or more shared locks held within one process. Writing
+//! chunks within one process is synchronized using the gc_mutex.
+//!
+//! On server restart, we stop any running GC in the old process to avoid
+//! having the exclusive lock held for too long.
+//!
+//! ## Locking table
+//!
+//! Below table shows all operations that play a role in locking, and which
+//! mechanisms are used to make their concurrent usage safe.
+//!
+//! | starting ><br>v during | read index file | create index file | GC mark | GC sweep | update manifest | forget | prune | create backup | verify | reader api |
+//! |-|-|-|-|-|-|-|-|-|-|-|
+//! | **read index file** | / | / | / | / | / | mmap stays valid, oldest_shared_lock prevents GC | see forget column | / | / | / |
+//! | **create index file** | / | / | / | / | / | / | / | /, happens at the end, after all chunks are touched | /, only happens without a manifest | / |
+//! | **GC mark** | / | Datastore process-lock shared | gc_mutex, exclusive ProcessLocker | gc_mutex | /, GC only cares about index files, not manifests | tells GC about removed chunks | see forget column | /, index files don’t exist yet | / | / |
+//! | **GC sweep** | / | Datastore process-lock shared | gc_mutex, exclusive ProcessLocker | gc_mutex | / | /, chunks already marked | see forget column | chunks get touched; chunk_store.mutex; oldest PL lock | / | / |
+//! | **update manifest** | / | / | / | / | update_manifest lock | update_manifest lock, remove dir under lock | see forget column | /, “write manifest” happens at the end | /, can call “write manifest”, see that column | / |
+//! | **forget** | / | / | removed_during_gc mutex is held during unlink | marking done, doesn’t matter if forgotten now | update_manifest lock, forget waits for lock | /, unlink is atomic | causes forget to fail, but that’s OK | running backup has snapshot flock | /, potentially detects missing folder | shared snap flock |
+//! | **prune** | / | / | see forget row | see forget row | see forget row | causes warn in prune, but no error | see forget column | running and last non-running can’t be pruned | see forget row | shared snap flock |
+//! | **create backup** | / | only time this happens, thus has snapshot flock | / | chunks get touched; chunk_store.mutex; oldest PL lock | no lock, but cannot exist beforehand | snapshot flock, can’t be forgotten | running and last non-running can’t be pruned | snapshot group flock, only one running per group | /, won’t be verified since manifest missing | / |
+//! | **verify** | / | / | / | / | see “update manifest” row | /, potentially detects missing folder | see forget column | / | /, but useless (“update manifest” protects itself) | / |
+//! | **reader api** | / | / | / | /, open snap can’t be forgotten, so ref must exist | / | prevented by shared snap flock | prevented by shared snap flock | / | / | /, lock is shared |
+//! * / = no interaction
+//! * shared/exclusive from POV of 'starting' process
 
 use anyhow::{bail, Error};
 
-- 
2.20.1





^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [pbs-devel] [PATCH v2 proxmox-backup 1/4] gc: avoid race between phase1 and forget/prune
  2020-10-15 10:49 ` [pbs-devel] [PATCH v2 proxmox-backup 1/4] gc: avoid race between phase1 and forget/prune Stefan Reiter
@ 2020-10-16  6:26   ` Dietmar Maurer
  0 siblings, 0 replies; 11+ messages in thread
From: Dietmar Maurer @ 2020-10-16  6:26 UTC (permalink / raw)
  To: Proxmox Backup Server development discussion, Stefan Reiter

> ...by saving all forgotten chunks into a HashSet when we detect a GC

forgotten chunks? I think you store index file names in that hash...

> phase1 currently running. If GC then hits a NotFound, it can check if
> the snapshot was simply forgotten behind its back, or if an actual error
> occurred and it needs to abort (since it might delete still referenced
> chunks, if the error is transient and the index file is still there).

Thought more about this, and this "transient" error model makes no sense to me.
You can get such "transient" errors during list_images(), so this approach does not really
provide any protection for that.

So for now, I applied a simpler version which simply ignores vanished files.

We can still apply further safety guards if it turns out we need them ...

> We have to attach the error message in {fixed,dynamic}_index via the
> .context() method, otherwise the original std::io::Error gets lost and
> we can't check for NotFound.

above comment is obsolete for v2




^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [pbs-devel] [PATCH v2 proxmox-backup 2/4] datastore: add manifest locking
  2020-10-15 10:49 ` [pbs-devel] [PATCH v2 proxmox-backup 2/4] datastore: add manifest locking Stefan Reiter
@ 2020-10-16  6:33   ` Dietmar Maurer
  2020-10-16  7:37     ` Dietmar Maurer
  2020-10-16  7:39   ` [pbs-devel] applied: " Dietmar Maurer
  1 sibling, 1 reply; 11+ messages in thread
From: Dietmar Maurer @ 2020-10-16  6:33 UTC (permalink / raw)
  To: Proxmox Backup Server development discussion, Stefan Reiter

comments inline:

> On 10/15/2020 12:49 PM Stefan Reiter <s.reiter@proxmox.com> wrote:
> 
>  
> Avoid races when updating manifest data by flocking a lock file.
> update_manifest is used to ensure updates always happen with the lock
> held.
> 
> Snapshot deletion also acquires the lock, so it cannot interfere with an
> outstanding manifest write.
> 
> Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
> ---
> 
> v2:
> * Use separate manifest lock file
> * Change store_manifest to update_manifest to force consumers to use correct
>   locking
> * Don't hold lock across verify, reload manifest at the end
> * Update comments
> 
>  src/api2/admin/datastore.rs    |  8 +++----
>  src/api2/backup/environment.rs | 13 ++++------
>  src/backup/datastore.rs        | 43 +++++++++++++++++++++++++++++-----
>  src/backup/manifest.rs         |  1 +
>  src/backup/verify.rs           |  9 +++----
>  5 files changed, 50 insertions(+), 24 deletions(-)
> 
> diff --git a/src/api2/admin/datastore.rs b/src/api2/admin/datastore.rs
> index de39cd1b..4f15c1cd 100644
> --- a/src/api2/admin/datastore.rs
> +++ b/src/api2/admin/datastore.rs
> @@ -1481,11 +1481,9 @@ fn set_notes(
>      let allowed = (user_privs & PRIV_DATASTORE_READ) != 0;
>      if !allowed { check_backup_owner(&datastore, backup_dir.group(), &userid)?; }
>  
> -    let (mut manifest, _) = datastore.load_manifest(&backup_dir)?;
> -
> -    manifest.unprotected["notes"] = notes.into();
> -
> -    datastore.store_manifest(&backup_dir, manifest)?;
> +    datastore.update_manifest(&backup_dir,|manifest| {
> +        manifest.unprotected["notes"] = notes.into();
> +    }).map_err(|err| format_err!("unable to update manifest blob - {}", err))?;
>  
>      Ok(())
>  }
> diff --git a/src/api2/backup/environment.rs b/src/api2/backup/environment.rs
> index f00c2cd3..c4f166df 100644
> --- a/src/api2/backup/environment.rs
> +++ b/src/api2/backup/environment.rs
> @@ -472,16 +472,11 @@ impl BackupEnvironment {
>              bail!("backup does not contain valid files (file count == 0)");
>          }
>  
> -        // check manifest
> -        let (mut manifest, _) = self.datastore.load_manifest(&self.backup_dir)
> -            .map_err(|err| format_err!("unable to load manifest blob - {}", err))?;
> -
> +        // check for valid manifest and store stats
>          let stats = serde_json::to_value(state.backup_stat)?;
> -
> -        manifest.unprotected["chunk_upload_stats"] = stats;
> -
> -        self.datastore.store_manifest(&self.backup_dir, manifest)
> -            .map_err(|err| format_err!("unable to store manifest blob - {}", err))?;
> +        self.datastore.update_manifest(&self.backup_dir, |manifest| {
> +            manifest.unprotected["chunk_upload_stats"] = stats;
> +        }).map_err(|err| format_err!("unable to update manifest blob - {}", err))?;
>  
>          if let Some(base) = &self.last_backup {
>              let path = self.datastore.snapshot_path(&base.backup_dir);
> diff --git a/src/backup/datastore.rs b/src/backup/datastore.rs
> index ca8ca438..7b2bee7e 100644
> --- a/src/backup/datastore.rs
> +++ b/src/backup/datastore.rs
> @@ -3,17 +3,19 @@ use std::io::{self, Write};
>  use std::path::{Path, PathBuf};
>  use std::sync::{Arc, Mutex};
>  use std::convert::TryFrom;
> +use std::time::Duration;
> +use std::fs::File;
>  
>  use anyhow::{bail, format_err, Error};
>  use lazy_static::lazy_static;
>  
> -use proxmox::tools::fs::{replace_file, CreateOptions};
> +use proxmox::tools::fs::{replace_file, CreateOptions, open_file_locked};
>  
>  use super::backup_info::{BackupGroup, BackupDir};
>  use super::chunk_store::ChunkStore;
>  use super::dynamic_index::{DynamicIndexReader, DynamicIndexWriter};
>  use super::fixed_index::{FixedIndexReader, FixedIndexWriter};
> -use super::manifest::{MANIFEST_BLOB_NAME, CLIENT_LOG_BLOB_NAME, BackupManifest};
> +use super::manifest::{MANIFEST_BLOB_NAME, MANIFEST_LOCK_NAME, CLIENT_LOG_BLOB_NAME, BackupManifest};
>  use super::index::*;
>  use super::{DataBlob, ArchiveType, archive_type};
>  use crate::config::datastore;
> @@ -235,9 +237,10 @@ impl DataStore {
>  
>          let full_path = self.snapshot_path(backup_dir);
>  
> -        let _guard;
> +        let (_guard, _manifest_guard);
>          if !force {
>              _guard = lock_dir_noblock(&full_path, "snapshot", "possibly running or in use")?;
> +            _manifest_guard = self.lock_manifest(backup_dir);

I think this is unnecessary. An update manifest should not block a remove_backup_dir.
What for exactly?

>          }
>  
>          // Acquire lock and keep it during remove operation, so there's no
> @@ -665,8 +668,27 @@ impl DataStore {
>              digest_str,
>              err,
>          ))
> -     }
> +    }
>  
> +    fn lock_manifest(
> +        &self,
> +        backup_dir: &BackupDir,
> +    ) -> Result<File, Error> {
> +        let mut path = self.base_path();
> +        path.push(backup_dir.relative_path());
> +        path.push(&MANIFEST_LOCK_NAME);
> +
> +        // update_manifest should never take a long time, so if someone else has
> +        // the lock we can simply block a bit and should get it soon
> +        open_file_locked(&path, Duration::from_secs(5), true)
> +            .map_err(|err| {
> +                format_err!(
> +                    "unable to acquire manifest lock {:?} - {}", &path, err
> +                )
> +            })
> +    }
> +
> +    /// Load the manifest without a lock. Must not be written back.
>      pub fn load_manifest(
>          &self,
>          backup_dir: &BackupDir,
> @@ -677,11 +699,19 @@ impl DataStore {
>          Ok((manifest, raw_size))
>      }
>  
> -    pub fn store_manifest(
> +    /// Update the manifest of the specified snapshot. Never write a manifest directly,
> +    /// only use this method - anything else may break locking guarantees.
> +    pub fn update_manifest(
>          &self,
>          backup_dir: &BackupDir,
> -        manifest: BackupManifest,
> +        update_fn: impl FnOnce(&mut BackupManifest),
>      ) -> Result<(), Error> {

It should not be possible to update anything outside the "unprotected" property.

> +
> +        let _guard = self.lock_manifest(backup_dir)?;
> +        let (mut manifest, _) = self.load_manifest(&backup_dir)?;
> +
> +        update_fn(&mut manifest);
> +
>          let manifest = serde_json::to_value(manifest)?;
>          let manifest = serde_json::to_string_pretty(&manifest)?;
>          let blob = DataBlob::encode(manifest.as_bytes(), None, true)?;
> @@ -691,6 +721,7 @@ impl DataStore {
>          path.push(backup_dir.relative_path());
>          path.push(MANIFEST_BLOB_NAME);
>  
> +        // atomic replace invalidates flock - no other writes past this point!
>          replace_file(&path, raw_data, CreateOptions::new())?;
>  
>          Ok(())
> diff --git a/src/backup/manifest.rs b/src/backup/manifest.rs
> index 609cc998..51980a07 100644
> --- a/src/backup/manifest.rs
> +++ b/src/backup/manifest.rs
> @@ -8,6 +8,7 @@ use ::serde::{Deserialize, Serialize};
>  use crate::backup::{BackupDir, CryptMode, CryptConfig};
>  
>  pub const MANIFEST_BLOB_NAME: &str = "index.json.blob";
> +pub const MANIFEST_LOCK_NAME: &str = ".index.json.lck";
>  pub const CLIENT_LOG_BLOB_NAME: &str = "client.log.blob";
>  
>  mod hex_csum {
> diff --git a/src/backup/verify.rs b/src/backup/verify.rs
> index 05b6ba86..ea3fa760 100644
> --- a/src/backup/verify.rs
> +++ b/src/backup/verify.rs
> @@ -300,7 +300,7 @@ pub fn verify_backup_dir(
>          return Ok(true);
>      }
>  
> -    let mut manifest = match datastore.load_manifest(&backup_dir) {
> +    let manifest = match datastore.load_manifest(&backup_dir) {
>          Ok((manifest, _)) => manifest,
>          Err(err) => {
>              task_log!(
> @@ -367,9 +367,10 @@ pub fn verify_backup_dir(
>          state: verify_result,
>          upid,
>      };
> -    manifest.unprotected["verify_state"] = serde_json::to_value(verify_state)?;
> -    datastore.store_manifest(&backup_dir, manifest)
> -        .map_err(|err| format_err!("unable to store manifest blob - {}", err))?;
> +    let verify_state = serde_json::to_value(verify_state)?;
> +    datastore.update_manifest(&backup_dir, |manifest| {
> +        manifest.unprotected["verify_state"] = verify_state;
> +    }).map_err(|err| format_err!("unable to update manifest blob - {}", err))?;
>  
>      Ok(error_count == 0)
>  }
> -- 
> 2.20.1
> 
> 
> 
> _______________________________________________
> pbs-devel mailing list
> pbs-devel@lists.proxmox.com
> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel




^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [pbs-devel] [PATCH v2 proxmox-backup 2/4] datastore: add manifest locking
  2020-10-16  6:33   ` Dietmar Maurer
@ 2020-10-16  7:37     ` Dietmar Maurer
  0 siblings, 0 replies; 11+ messages in thread
From: Dietmar Maurer @ 2020-10-16  7:37 UTC (permalink / raw)
  To: Proxmox Backup Server development discussion, Stefan Reiter


> > -        let _guard;
> > +        let (_guard, _manifest_guard);
> >          if !force {
> >              _guard = lock_dir_noblock(&full_path, "snapshot", "possibly running or in use")?;
> > +            _manifest_guard = self.lock_manifest(backup_dir);
> 
> I think this is unnecessary. An update manifest should not block a remove_backup_dir.
> What for exactly?

Ok, Fabian just told me that this is required to avoid unexpected error messages,
so I will apply this now.




^ permalink raw reply	[flat|nested] 11+ messages in thread

* [pbs-devel] applied: [PATCH v2 proxmox-backup 2/4] datastore: add manifest locking
  2020-10-15 10:49 ` [pbs-devel] [PATCH v2 proxmox-backup 2/4] datastore: add manifest locking Stefan Reiter
  2020-10-16  6:33   ` Dietmar Maurer
@ 2020-10-16  7:39   ` Dietmar Maurer
  1 sibling, 0 replies; 11+ messages in thread
From: Dietmar Maurer @ 2020-10-16  7:39 UTC (permalink / raw)
  To: Proxmox Backup Server development discussion, Stefan Reiter

applied




^ permalink raw reply	[flat|nested] 11+ messages in thread

* [pbs-devel] applied: [PATCH v2 proxmox-backup 3/4] rustdoc: add crate level doc
  2020-10-15 10:49 ` [pbs-devel] [PATCH v2 proxmox-backup 3/4] rustdoc: add crate level doc Stefan Reiter
@ 2020-10-16  7:47   ` Dietmar Maurer
  0 siblings, 0 replies; 11+ messages in thread
From: Dietmar Maurer @ 2020-10-16  7:47 UTC (permalink / raw)
  To: Proxmox Backup Server development discussion, Stefan Reiter

applied




^ permalink raw reply	[flat|nested] 11+ messages in thread

* [pbs-devel] applied: [PATCH v2 proxmox-backup 4/4] rustdoc: overhaul backup rustdoc and add locking table
  2020-10-15 10:49 ` [pbs-devel] [PATCH v2 proxmox-backup 4/4] rustdoc: overhaul backup rustdoc and add locking table Stefan Reiter
@ 2020-10-16  7:47   ` Dietmar Maurer
  0 siblings, 0 replies; 11+ messages in thread
From: Dietmar Maurer @ 2020-10-16  7:47 UTC (permalink / raw)
  To: Proxmox Backup Server development discussion, Stefan Reiter

applied




^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2020-10-16  7:48 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-10-15 10:49 [pbs-devel] [PATCH v2 0/4] Locking and rustdoc improvements Stefan Reiter
2020-10-15 10:49 ` [pbs-devel] [PATCH v2 proxmox-backup 1/4] gc: avoid race between phase1 and forget/prune Stefan Reiter
2020-10-16  6:26   ` Dietmar Maurer
2020-10-15 10:49 ` [pbs-devel] [PATCH v2 proxmox-backup 2/4] datastore: add manifest locking Stefan Reiter
2020-10-16  6:33   ` Dietmar Maurer
2020-10-16  7:37     ` Dietmar Maurer
2020-10-16  7:39   ` [pbs-devel] applied: " Dietmar Maurer
2020-10-15 10:49 ` [pbs-devel] [PATCH v2 proxmox-backup 3/4] rustdoc: add crate level doc Stefan Reiter
2020-10-16  7:47   ` [pbs-devel] applied: " Dietmar Maurer
2020-10-15 10:49 ` [pbs-devel] [PATCH v2 proxmox-backup 4/4] rustdoc: overhaul backup rustdoc and add locking table Stefan Reiter
2020-10-16  7:47   ` [pbs-devel] applied: " Dietmar Maurer
