public inbox for pbs-devel@lists.proxmox.com
 help / color / mirror / Atom feed
* [pbs-devel] [RFC proxmox-backup 0/4] implement trash can for snapshots
@ 2025-04-16 14:17 Christian Ebner
  2025-04-16 14:18 ` [pbs-devel] [RFC proxmox-backup 1/4] datastore: always skip over base directory when listing index files Christian Ebner
                   ` (4 more replies)
  0 siblings, 5 replies; 21+ messages in thread
From: Christian Ebner @ 2025-04-16 14:17 UTC (permalink / raw)
  To: pbs-devel

In an effort to simplify the GC phase 1 logic introduced by commit
cb9814e3 ("garbage collection: fix rare race in chunk marking phase"),
this patch series implements a trash can functionality for snapshots.

The main intention is to allow index files of snapshots pruned while
phase 1 of garbage collection is ongoing to still be read, so that
their chunks are marked as in use as well. This allows getting rid of
the currently implemented and rather complex retry-loop logic, which
could in theory lead to failing GC or backups when trying to lock the
whole group exclusively after the 10th retry.

To achieve this, pruning a snapshot does not remove it immediately,
but rather moves it to a `.trash` subfolder in the datastore's base
directory. This directory is then cleared before GC phase 1 starts,
meaning that any index file can be restored until the next GC run.

This however comes with its own set of issues, therefore these
patches are sent as RFC for now. Open questions and known limitations
are:
- Pruning does not free up any space; on the contrary, it might
  require additional space on COW filesystems. Should there be a flag
  to bypass the trash, also given that sometimes users truly want to
  remove a snapshot immediately? Although that would re-introduce the
  issue with new snapshot creation and concurrent GC on a last
  snapshot.
- Prune + sync + prune might lead to the same snapshot being pruned
  multiple times; currently any second prune on a snapshot would
  fail. Should this overwrite the trashed snapshot?
- GC might now read the same index twice: once before it was pruned
  (with the prune happening while phase 1 is still ongoing) and a
  second time when read from the trash. Not really an issue, but
  rather a limitation.
- Further issues I'm currently overlooking

Christian Ebner (4):
  datastore: always skip over base directory when listing index files
  datastore: allow to specify sub-directory for index file listing
  datastore: move snapshots to trash folder on destroy
  garbage collection: read pruned snapshot index files from trash

 pbs-datastore/src/backup_info.rs |  14 ++-
 pbs-datastore/src/datastore.rs   | 158 +++++++++++++++----------------
 2 files changed, 89 insertions(+), 83 deletions(-)

-- 
2.39.5



_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel


^ permalink raw reply	[flat|nested] 21+ messages in thread

* [pbs-devel] [RFC proxmox-backup 1/4] datastore: always skip over base directory when listing index files
  2025-04-16 14:17 [pbs-devel] [RFC proxmox-backup 0/4] implement trash can for snapshots Christian Ebner
@ 2025-04-16 14:18 ` Christian Ebner
  2025-04-17  9:29   ` Fabian Grünbichler
  2025-04-16 14:18 ` [pbs-devel] [RFC proxmox-backup 2/4] datastore: allow to specify sub-directory for index file listing Christian Ebner
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 21+ messages in thread
From: Christian Ebner @ 2025-04-16 14:18 UTC (permalink / raw)
  To: pbs-devel

The base is a directory and not an index file anyway, so there is no
need to apply the filtering and archive type matching on it.
Further, this allows using a hidden folder as base, which is otherwise
not possible since it would be excluded by the filtering; this will be
useful when listing index files in a `.trash` folder.

No functional change intended.

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
 pbs-datastore/src/datastore.rs | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/pbs-datastore/src/datastore.rs b/pbs-datastore/src/datastore.rs
index 309137e00..3fde8b871 100644
--- a/pbs-datastore/src/datastore.rs
+++ b/pbs-datastore/src/datastore.rs
@@ -979,7 +979,9 @@ impl DataStore {
 
         use walkdir::WalkDir;
 
-        let walker = WalkDir::new(base).into_iter();
+        let mut walker = WalkDir::new(base).into_iter();
+        // always ignore the base directory itself, so a hidden folder can be used as base as well
+        walker.next();
 
         // make sure we skip .chunks (and other hidden files to keep it simple)
         fn is_hidden(entry: &walkdir::DirEntry) -> bool {
-- 
2.39.5



_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel


^ permalink raw reply	[flat|nested] 21+ messages in thread

* [pbs-devel] [RFC proxmox-backup 2/4] datastore: allow to specify sub-directory for index file listing
  2025-04-16 14:17 [pbs-devel] [RFC proxmox-backup 0/4] implement trash can for snapshots Christian Ebner
  2025-04-16 14:18 ` [pbs-devel] [RFC proxmox-backup 1/4] datastore: always skip over base directory when listing index files Christian Ebner
@ 2025-04-16 14:18 ` Christian Ebner
  2025-04-18  9:38   ` Thomas Lamprecht
  2025-04-16 14:18 ` [pbs-devel] [RFC proxmox-backup 3/4] datastore: move snapshots to trash folder on destroy Christian Ebner
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 21+ messages in thread
From: Christian Ebner @ 2025-04-16 14:18 UTC (permalink / raw)
  To: pbs-devel

Extend the list_index_files helper to allow passing a sub-directory
to it. This will be used to fetch only index files from the `.trash`
directory, which is ignored by the regular listing since hidden
folders are not considered.

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
 pbs-datastore/src/datastore.rs | 13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/pbs-datastore/src/datastore.rs b/pbs-datastore/src/datastore.rs
index 3fde8b871..97b78f000 100644
--- a/pbs-datastore/src/datastore.rs
+++ b/pbs-datastore/src/datastore.rs
@@ -972,8 +972,15 @@ impl DataStore {
     /// The filesystem is walked recursevly to detect index files based on their archive type based
     /// on the filename. This however excludes the chunks folder, hidden files and does not follow
     /// symlinks.
-    fn list_index_files(&self) -> Result<HashSet<PathBuf>, Error> {
-        let base = self.base_path();
+    /// If a subdirectory is provided, only that is scanned for index files.
+    fn list_index_files(&self, subdir: Option<PathBuf>) -> Result<HashSet<PathBuf>, Error> {
+        let mut base = self.base_path();
+        if let Some(subdir) = subdir {
+            base.push(subdir);
+            if !base.exists() {
+                return Ok(HashSet::new());
+            }
+        }
 
         let mut list = HashSet::new();
 
@@ -1132,7 +1139,7 @@ impl DataStore {
         // seen by the regular logic and the user is informed by the garbage collection run about
         // the detected index files not following the iterators logic.
 
-        let mut unprocessed_index_list = self.list_index_files()?;
+        let mut unprocessed_index_list = self.list_index_files(None)?;
         let mut index_count = unprocessed_index_list.len();
 
         let mut chunk_lru_cache = LruCache::new(cache_capacity);
-- 
2.39.5



_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel


^ permalink raw reply	[flat|nested] 21+ messages in thread

* [pbs-devel] [RFC proxmox-backup 3/4] datastore: move snapshots to trash folder on destroy
  2025-04-16 14:17 [pbs-devel] [RFC proxmox-backup 0/4] implement trash can for snapshots Christian Ebner
  2025-04-16 14:18 ` [pbs-devel] [RFC proxmox-backup 1/4] datastore: always skip over base directory when listing index files Christian Ebner
  2025-04-16 14:18 ` [pbs-devel] [RFC proxmox-backup 2/4] datastore: allow to specify sub-directory for index file listing Christian Ebner
@ 2025-04-16 14:18 ` Christian Ebner
  2025-04-17  9:29   ` Fabian Grünbichler
  2025-04-16 14:18 ` [pbs-devel] [RFC proxmox-backup 4/4] garbage collection: read pruned snapshot index files from trash Christian Ebner
  2025-04-17  9:29 ` [pbs-devel] [RFC proxmox-backup 0/4] implement trash can for snapshots Fabian Grünbichler
  4 siblings, 1 reply; 21+ messages in thread
From: Christian Ebner @ 2025-04-16 14:18 UTC (permalink / raw)
  To: pbs-devel

Instead of directly deleting the snapshot directory and its contents
on a prune, move the snapshot directory into the `.trash` subfolder
of the datastore.

This allows marking the chunks referenced by these index files if
the snapshot was pruned during an ongoing garbage collection.
Garbage collection will clean up these files before starting with
marking phase 1 and read all trashed index files after completing
that phase, touching these chunks as well.

Suggested-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
 pbs-datastore/src/backup_info.rs | 14 +++++++++++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/pbs-datastore/src/backup_info.rs b/pbs-datastore/src/backup_info.rs
index d4732fdd9..cd0d521c9 100644
--- a/pbs-datastore/src/backup_info.rs
+++ b/pbs-datastore/src/backup_info.rs
@@ -588,11 +588,19 @@ impl BackupDir {
             bail!("cannot remove protected snapshot"); // use special error type?
         }
 
+        let relative_path = self.relative_path();
+        let mut trash_path = self.store.base_path();
+        trash_path.push(".trash/");
+        trash_path.push(relative_path);
+        if let Some(parent) = trash_path.parent() {
+            std::fs::create_dir_all(&parent)
+                .with_context(|| format!("creating trash folders for {trash_path:?} failed"))?;
+        }
+
         let full_path = self.full_path();
         log::info!("removing backup snapshot {:?}", full_path);
-        std::fs::remove_dir_all(&full_path).map_err(|err| {
-            format_err!("removing backup snapshot {:?} failed - {}", full_path, err,)
-        })?;
+        std::fs::rename(&full_path, trash_path)
+            .with_context(|| format!("moving backup snapshot {full_path:?} to trash failed"))?;
 
         // remove no longer needed lock files
         let _ = std::fs::remove_file(self.manifest_lock_path()); // ignore errors
-- 
2.39.5



_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel


^ permalink raw reply	[flat|nested] 21+ messages in thread

* [pbs-devel] [RFC proxmox-backup 4/4] garbage collection: read pruned snapshot index files from trash
  2025-04-16 14:17 [pbs-devel] [RFC proxmox-backup 0/4] implement trash can for snapshots Christian Ebner
                   ` (2 preceding siblings ...)
  2025-04-16 14:18 ` [pbs-devel] [RFC proxmox-backup 3/4] datastore: move snapshots to trash folder on destroy Christian Ebner
@ 2025-04-16 14:18 ` Christian Ebner
  2025-04-17  9:29   ` Fabian Grünbichler
  2025-04-17  9:29 ` [pbs-devel] [RFC proxmox-backup 0/4] implement trash can for snapshots Fabian Grünbichler
  4 siblings, 1 reply; 21+ messages in thread
From: Christian Ebner @ 2025-04-16 14:18 UTC (permalink / raw)
  To: pbs-devel

Snapshots pruned during phase 1 are now also assured to be included
in the marking phase, by reading their index files from the trash and
touching these chunks as well.

Clear any trash before starting with phase 1, so that only snapshots
pruned during GC are considered.

Further, drop the retry logic previously used to ensure that newly
written index files are included in the marking phase if the
previously last snapshot was pruned. This is no longer necessary, as
the previously last snapshot will now be read from the trash.

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
 pbs-datastore/src/datastore.rs | 141 +++++++++++++++------------------
 1 file changed, 65 insertions(+), 76 deletions(-)

diff --git a/pbs-datastore/src/datastore.rs b/pbs-datastore/src/datastore.rs
index 97b78f000..688e65247 100644
--- a/pbs-datastore/src/datastore.rs
+++ b/pbs-datastore/src/datastore.rs
@@ -1137,7 +1137,13 @@ impl DataStore {
         //
         // By this it is assured that all index files are used, even if they would not have been
         // seen by the regular logic and the user is informed by the garbage collection run about
-        // the detected index files not following the iterators logic.
+        // the detected index files not following the iterators logic. Further, include trashed
+        // snapshots which have been pruned during garbage collections marking phase.
+
+        let trash = PathBuf::from(".trash/");
+        let mut full_trash_path = self.base_path();
+        full_trash_path.push(&trash);
+        let _ = std::fs::remove_dir_all(full_trash_path);
 
         let mut unprocessed_index_list = self.list_index_files(None)?;
         let mut index_count = unprocessed_index_list.len();
@@ -1154,88 +1160,63 @@ impl DataStore {
             let namespace = namespace.context("iterating namespaces failed")?;
             for group in arc_self.iter_backup_groups(namespace)? {
                 let group = group.context("iterating backup groups failed")?;
+                let mut snapshots = match group.list_backups() {
+                    Ok(snapshots) => snapshots,
+                    Err(err) => {
+                        if group.exists() {
+                            return Err(err).context("listing snapshots failed")?;
+                        }
+                        // vanished, will be covered by trashed list below to avoid
+                        // not touching known chunks.
+                        continue;
+                    }
+                };
 
-                // Avoid race between listing/marking of snapshots by GC and pruning the last
-                // snapshot in the group, following a new snapshot creation. Otherwise known chunks
-                // might only be referenced by the new snapshot, so it must be read as well.
-                let mut retry_counter = 0;
-                'retry: loop {
-                    let _lock = match retry_counter {
-                        0..=9 => None,
-                        10 => Some(
-                            group
-                                .lock()
-                                .context("exhausted retries and failed to lock group")?,
-                        ),
-                        _ => bail!("exhausted retries and unexpected counter overrun"),
-                    };
-
-                    let mut snapshots = match group.list_backups() {
-                        Ok(snapshots) => snapshots,
-                        Err(err) => {
-                            if group.exists() {
-                                return Err(err).context("listing snapshots failed")?;
-                            }
-                            break 'retry;
+                BackupInfo::sort_list(&mut snapshots, true);
+                for snapshot in snapshots.into_iter() {
+                    for file in snapshot.files {
+                        worker.check_abort()?;
+                        worker.fail_on_shutdown()?;
+
+                        match ArchiveType::from_path(&file) {
+                            Ok(ArchiveType::FixedIndex) | Ok(ArchiveType::DynamicIndex) => (),
+                            Ok(ArchiveType::Blob) | Err(_) => continue,
                         }
-                    };
-
-                    // Always start iteration with the last snapshot of the group to reduce race
-                    // window with concurrent backup+prune previous last snapshot. Allows to retry
-                    // without the need to keep track of already processed index files for the
-                    // current group.
-                    BackupInfo::sort_list(&mut snapshots, true);
-                    for (count, snapshot) in snapshots.into_iter().rev().enumerate() {
-                        for file in snapshot.files {
-                            worker.check_abort()?;
-                            worker.fail_on_shutdown()?;
-
-                            match ArchiveType::from_path(&file) {
-                                Ok(ArchiveType::FixedIndex) | Ok(ArchiveType::DynamicIndex) => (),
-                                Ok(ArchiveType::Blob) | Err(_) => continue,
-                            }
 
-                            let mut path = snapshot.backup_dir.full_path();
-                            path.push(file);
-
-                            let index = match self.open_index_reader(&path)? {
-                                Some(index) => index,
-                                None => {
-                                    unprocessed_index_list.remove(&path);
-                                    if count == 0 {
-                                        retry_counter += 1;
-                                        continue 'retry;
-                                    }
-                                    continue;
-                                }
-                            };
-
-                            self.index_mark_used_chunks(
-                                index,
-                                &path,
-                                &mut chunk_lru_cache,
-                                status,
-                                worker,
-                            )?;
-
-                            if !unprocessed_index_list.remove(&path) {
-                                info!("Encountered new index file '{path:?}', increment total index file count");
-                                index_count += 1;
-                            }
+                        let mut path = snapshot.backup_dir.full_path();
+                        path.push(file);
 
-                            let percentage = (processed_index_files + 1) * 100 / index_count;
-                            if percentage > last_percentage {
-                                info!(
-                                    "marked {percentage}% ({} of {index_count} index files)",
-                                    processed_index_files + 1,
-                                );
-                                last_percentage = percentage;
+                        let index = match self.open_index_reader(&path)? {
+                            Some(index) => index,
+                            None => {
+                                unprocessed_index_list.remove(&path);
+                                continue;
                             }
-                            processed_index_files += 1;
+                        };
+
+                        self.index_mark_used_chunks(
+                            index,
+                            &path,
+                            &mut chunk_lru_cache,
+                            status,
+                            worker,
+                        )?;
+
+                        if !unprocessed_index_list.remove(&path) {
+                            info!("Encountered new index file '{path:?}', increment total index file count");
+                            index_count += 1;
                         }
-                    }
 
-                    break;
+                        let percentage = (processed_index_files + 1) * 100 / index_count;
+                        if percentage > last_percentage {
+                            info!(
+                                "marked {percentage}% ({} of {index_count} index files)",
+                                processed_index_files + 1,
+                            );
+                            last_percentage = percentage;
+                        }
+                        processed_index_files += 1;
+                    }
                 }
             }
         }
@@ -1257,6 +1238,14 @@ impl DataStore {
             warn!("Found {strange_paths_count} index files outside of expected directory scheme");
         }
 
+        let trashed_index_list = self.list_index_files(Some(trash))?;
+        for path in trashed_index_list {
+            if let Some(index) = self.open_index_reader(&path)? {
+                info!("Mark chunks for pruned index file found in {path:?}");
+                self.index_mark_used_chunks(index, &path, &mut chunk_lru_cache, status, worker)?;
+            };
+        }
+
         Ok(())
     }
 
-- 
2.39.5



_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [pbs-devel] [RFC proxmox-backup 1/4] datastore: always skip over base directory when listing index files
  2025-04-16 14:18 ` [pbs-devel] [RFC proxmox-backup 1/4] datastore: always skip over base directory when listing index files Christian Ebner
@ 2025-04-17  9:29   ` Fabian Grünbichler
  2025-04-17 10:27     ` Christian Ebner
  0 siblings, 1 reply; 21+ messages in thread
From: Fabian Grünbichler @ 2025-04-17  9:29 UTC (permalink / raw)
  To: Proxmox Backup Server development discussion

the commit subject could maybe indicate somehow that this is for the
GC-specific helper, and not some generic index listing code ;)

On April 16, 2025 4:18 pm, Christian Ebner wrote:
> The base is a directory and not an index file anyways, so there is no
> need to apply the filtering and archive type matching on it.
> Further, this will allow to use a hidden folder as base, otherwise
> not possible as excluded by the filtering, which will be useful when
> listing index files in a `.trash` folder.
> 
> No functional change intended.
> 
> Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
> ---
>  pbs-datastore/src/datastore.rs | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/pbs-datastore/src/datastore.rs b/pbs-datastore/src/datastore.rs
> index 309137e00..3fde8b871 100644
> --- a/pbs-datastore/src/datastore.rs
> +++ b/pbs-datastore/src/datastore.rs
> @@ -979,7 +979,9 @@ impl DataStore {
>  
>          use walkdir::WalkDir;
>  
> -        let walker = WalkDir::new(base).into_iter();
> +        let mut walker = WalkDir::new(base).into_iter();
> +        // always ignore the base directory itself, so a hidden folder can be used as base as well
> +        walker.next();

this should check for errors?

>  
>          // make sure we skip .chunks (and other hidden files to keep it simple)
>          fn is_hidden(entry: &walkdir::DirEntry) -> bool {
> -- 
> 2.39.5
> 
> 
> 
> _______________________________________________
> pbs-devel mailing list
> pbs-devel@lists.proxmox.com
> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
> 
> 
> 


_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [pbs-devel] [RFC proxmox-backup 3/4] datastore: move snapshots to trash folder on destroy
  2025-04-16 14:18 ` [pbs-devel] [RFC proxmox-backup 3/4] datastore: move snapshots to trash folder on destroy Christian Ebner
@ 2025-04-17  9:29   ` Fabian Grünbichler
  2025-04-18 11:06     ` Thomas Lamprecht
  0 siblings, 1 reply; 21+ messages in thread
From: Fabian Grünbichler @ 2025-04-17  9:29 UTC (permalink / raw)
  To: Proxmox Backup Server development discussion

On April 16, 2025 4:18 pm, Christian Ebner wrote:
> Instead of directly deleting the snapshot directory and it's contents
> on a prune, move the snapshot directory into the `.trash` subfolder
> of the datastore.
> 
> This allows to mark chunks which were used by these index files if
> the snapshot was pruned during an ongoing garbage collection.
> Garbage collection will clean up these files before starting with the
> marking phase 1 and read all index files after completing that phase,
> touching these chunks as well.

some other variants to maybe consider:

marking the snapshot itself as trash (in the manifest, or by adding a
trash marker file inside the dir) - this would mean that there is no
iterator race issue when undoing a prune, no double-pruning collisions,
.. - but it also means we need to adapt all call sites that should skip
trashed snapshots (most existing ones), which is more churn.

having a trash dir per group instead of a global one for the whole
datastore (less likely to incur extra costs in case somebody has a weird
setup where namespaces/.. are symlinked or bindmounted or similar
shenanigans). would need to postpone group removal to GC in case all
snapshots are pruned.
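
A rough sketch of how the first variant (trash marker inside the
snapshot directory) could look, purely for illustration; the marker
file name and helper functions are made up and not part of this
series:

use std::path::Path;

use anyhow::{Context, Error};

const TRASH_MARKER: &str = ".trashed";

/// Mark a snapshot directory as trashed instead of removing it.
fn trash_snapshot(snapshot_path: &Path) -> Result<(), Error> {
    let marker = snapshot_path.join(TRASH_MARKER);
    std::fs::File::create(&marker)
        .with_context(|| format!("creating trash marker {marker:?} failed"))?;
    Ok(())
}

/// Regular listings would skip snapshots with the marker set, while GC
/// would still read their index files during the marking phase.
fn is_trashed(snapshot_path: &Path) -> bool {
    snapshot_path.join(TRASH_MARKER).exists()
}

/// Undoing a prune only removes the marker again, so no rename is
/// involved, no iterator race and no double-prune collision.
fn restore_snapshot(snapshot_path: &Path) -> Result<(), Error> {
    let marker = snapshot_path.join(TRASH_MARKER);
    std::fs::remove_file(&marker)
        .with_context(|| format!("removing trash marker {marker:?} failed"))?;
    Ok(())
}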

> 
> Suggested-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
> Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
> ---
>  pbs-datastore/src/backup_info.rs | 14 +++++++++++---
>  1 file changed, 11 insertions(+), 3 deletions(-)
> 
> diff --git a/pbs-datastore/src/backup_info.rs b/pbs-datastore/src/backup_info.rs
> index d4732fdd9..cd0d521c9 100644
> --- a/pbs-datastore/src/backup_info.rs
> +++ b/pbs-datastore/src/backup_info.rs
> @@ -588,11 +588,19 @@ impl BackupDir {
>              bail!("cannot remove protected snapshot"); // use special error type?
>          }
>  
> +        let relative_path = self.relative_path();
> +        let mut trash_path = self.store.base_path();
> +        trash_path.push(".trash/");
> +        trash_path.push(relative_path);
> +        if let Some(parent) = trash_path.parent() {
> +            std::fs::create_dir_all(&parent)
> +                .with_context(|| format!("creating trash folders for {trash_path:?} failed"))?;
> +        }
> +
>          let full_path = self.full_path();
>          log::info!("removing backup snapshot {:?}", full_path);
> -        std::fs::remove_dir_all(&full_path).map_err(|err| {
> -            format_err!("removing backup snapshot {:?} failed - {}", full_path, err,)
> -        })?;
> +        std::fs::rename(&full_path, trash_path)
> +            .with_context(|| format!("moving backup snapshot {full_path:?} to trash failed"))?;
>  
>          // remove no longer needed lock files
>          let _ = std::fs::remove_file(self.manifest_lock_path()); // ignore errors
> -- 
> 2.39.5
> 
> 
> 
> _______________________________________________
> pbs-devel mailing list
> pbs-devel@lists.proxmox.com
> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
> 
> 
> 


_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [pbs-devel] [RFC proxmox-backup 4/4] garbage collection: read pruned snapshot index files from trash
  2025-04-16 14:18 ` [pbs-devel] [RFC proxmox-backup 4/4] garbage collection: read pruned snapshot index files from trash Christian Ebner
@ 2025-04-17  9:29   ` Fabian Grünbichler
  2025-04-17 10:38     ` Christian Ebner
  0 siblings, 1 reply; 21+ messages in thread
From: Fabian Grünbichler @ 2025-04-17  9:29 UTC (permalink / raw)
  To: Proxmox Backup Server development discussion

On April 16, 2025 4:18 pm, Christian Ebner wrote:
> Snapshots pruned during phase 1 are now also assured to be included
> in the marking phase by reading the index files from trash and
> touching these chunks as well.
> 
> Clear any trash before starting with phase 1, so that only snapshots
> pruned during GC are consided.
> 
> Further, drop the retry logic used before to assure eventually newly
> written index files are included in the marking phase, if the
> previously last one was pruned. This is not necessary anymore as the
> previously last one will be read from trash.
> 
> Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
> ---
>  pbs-datastore/src/datastore.rs | 141 +++++++++++++++------------------
>  1 file changed, 65 insertions(+), 76 deletions(-)
> 
> diff --git a/pbs-datastore/src/datastore.rs b/pbs-datastore/src/datastore.rs
> index 97b78f000..688e65247 100644
> --- a/pbs-datastore/src/datastore.rs
> +++ b/pbs-datastore/src/datastore.rs
> @@ -1137,7 +1137,13 @@ impl DataStore {
>          //
>          // By this it is assured that all index files are used, even if they would not have been
>          // seen by the regular logic and the user is informed by the garbage collection run about
> -        // the detected index files not following the iterators logic.
> +        // the detected index files not following the iterators logic. Further, include trashed
> +        // snapshots which have been pruned during garbage collections marking phase.
> +
> +        let trash = PathBuf::from(".trash/");
> +        let mut full_trash_path = self.base_path();
> +        full_trash_path.push(&trash);
> +        let _ = std::fs::remove_dir_all(full_trash_path);

I think this would need some locking, else we start recursively deleting
here while at the same time a prune operation is moving something into
the trash..

>  
>          let mut unprocessed_index_list = self.list_index_files(None)?;
>          let mut index_count = unprocessed_index_list.len();
> @@ -1154,88 +1160,63 @@ impl DataStore {
>              let namespace = namespace.context("iterating namespaces failed")?;
>              for group in arc_self.iter_backup_groups(namespace)? {
>                  let group = group.context("iterating backup groups failed")?;
> +                let mut snapshots = match group.list_backups() {
> +                    Ok(snapshots) => snapshots,
> +                    Err(err) => {
> +                        if group.exists() {
> +                            return Err(err).context("listing snapshots failed")?;
> +                        }
> +                        // vanished, will be covered by trashed list below to avoid
> +                        // not touching known chunks.
> +                        continue;
> +                    }
> +                };
>  
> -                // Avoid race between listing/marking of snapshots by GC and pruning the last
> -                // snapshot in the group, following a new snapshot creation. Otherwise known chunks
> -                // might only be referenced by the new snapshot, so it must be read as well.
> -                let mut retry_counter = 0;
> -                'retry: loop {
> -                    let _lock = match retry_counter {
> -                        0..=9 => None,
> -                        10 => Some(
> -                            group
> -                                .lock()
> -                                .context("exhausted retries and failed to lock group")?,
> -                        ),
> -                        _ => bail!("exhausted retries and unexpected counter overrun"),
> -                    };
> -
> -                    let mut snapshots = match group.list_backups() {
> -                        Ok(snapshots) => snapshots,
> -                        Err(err) => {
> -                            if group.exists() {
> -                                return Err(err).context("listing snapshots failed")?;
> -                            }
> -                            break 'retry;
> +                BackupInfo::sort_list(&mut snapshots, true);
> +                for snapshot in snapshots.into_iter() {
> +                    for file in snapshot.files {
> +                        worker.check_abort()?;
> +                        worker.fail_on_shutdown()?;
> +
> +                        match ArchiveType::from_path(&file) {
> +                            Ok(ArchiveType::FixedIndex) | Ok(ArchiveType::DynamicIndex) => (),
> +                            Ok(ArchiveType::Blob) | Err(_) => continue,
>                          }
> -                    };
> -
> -                    // Always start iteration with the last snapshot of the group to reduce race
> -                    // window with concurrent backup+prune previous last snapshot. Allows to retry
> -                    // without the need to keep track of already processed index files for the
> -                    // current group.
> -                    BackupInfo::sort_list(&mut snapshots, true);
> -                    for (count, snapshot) in snapshots.into_iter().rev().enumerate() {
> -                        for file in snapshot.files {
> -                            worker.check_abort()?;
> -                            worker.fail_on_shutdown()?;
> -
> -                            match ArchiveType::from_path(&file) {
> -                                Ok(ArchiveType::FixedIndex) | Ok(ArchiveType::DynamicIndex) => (),
> -                                Ok(ArchiveType::Blob) | Err(_) => continue,
> -                            }
>  
> -                            let mut path = snapshot.backup_dir.full_path();
> -                            path.push(file);
> -
> -                            let index = match self.open_index_reader(&path)? {
> -                                Some(index) => index,
> -                                None => {
> -                                    unprocessed_index_list.remove(&path);
> -                                    if count == 0 {
> -                                        retry_counter += 1;
> -                                        continue 'retry;
> -                                    }
> -                                    continue;
> -                                }
> -                            };
> -
> -                            self.index_mark_used_chunks(
> -                                index,
> -                                &path,
> -                                &mut chunk_lru_cache,
> -                                status,
> -                                worker,
> -                            )?;
> -
> -                            if !unprocessed_index_list.remove(&path) {
> -                                info!("Encountered new index file '{path:?}', increment total index file count");
> -                                index_count += 1;
> -                            }
> +                        let mut path = snapshot.backup_dir.full_path();
> +                        path.push(file);
>  
> -                            let percentage = (processed_index_files + 1) * 100 / index_count;
> -                            if percentage > last_percentage {
> -                                info!(
> -                                    "marked {percentage}% ({} of {index_count} index files)",
> -                                    processed_index_files + 1,
> -                                );
> -                                last_percentage = percentage;
> +                        let index = match self.open_index_reader(&path)? {
> +                            Some(index) => index,
> +                            None => {
> +                                unprocessed_index_list.remove(&path);
> +                                continue;
>                              }
> -                            processed_index_files += 1;
> +                        };
> +
> +                        self.index_mark_used_chunks(
> +                            index,
> +                            &path,
> +                            &mut chunk_lru_cache,
> +                            status,
> +                            worker,
> +                        )?;
> +
> +                        if !unprocessed_index_list.remove(&path) {
> +                            info!("Encountered new index file '{path:?}', increment total index file count");
> +                            index_count += 1;
>                          }
> -                    }
>  
> -                    break;
> +                        let percentage = (processed_index_files + 1) * 100 / index_count;
> +                        if percentage > last_percentage {
> +                            info!(
> +                                "marked {percentage}% ({} of {index_count} index files)",
> +                                processed_index_files + 1,
> +                            );
> +                            last_percentage = percentage;
> +                        }
> +                        processed_index_files += 1;
> +                    }
>                  }
>              }
>          }
> @@ -1257,6 +1238,14 @@ impl DataStore {
>              warn!("Found {strange_paths_count} index files outside of expected directory scheme");
>          }
>  
> +        let trashed_index_list = self.list_index_files(Some(trash))?;
> +        for path in trashed_index_list {
> +            if let Some(index) = self.open_index_reader(&path)? {
> +                info!("Mark chunks for pruned index file found in {path:?}");
> +                self.index_mark_used_chunks(index, &path, &mut chunk_lru_cache, status, worker)?;
> +            };
> +        }
> +

I think we'd want to support undoing moving a snapshot to trash, if we
have a trash can (that's the other half of wanting this feature after
all). if we do so, we need to be careful to not re-introduce a race here
(e.g., by keeping a copy in the trash can when undoing, or by using a
trash can mechanism that doesn't require separate iteration over regular
and trashed snapshots).

>          Ok(())
>      }
>  
> -- 
> 2.39.5
> 
> 
> 
> _______________________________________________
> pbs-devel mailing list
> pbs-devel@lists.proxmox.com
> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
> 
> 
> 


_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [pbs-devel] [RFC proxmox-backup 0/4] implement trash can for snapshots
  2025-04-16 14:17 [pbs-devel] [RFC proxmox-backup 0/4] implement trash can for snapshots Christian Ebner
                   ` (3 preceding siblings ...)
  2025-04-16 14:18 ` [pbs-devel] [RFC proxmox-backup 4/4] garbage collection: read pruned snapshot index files from trash Christian Ebner
@ 2025-04-17  9:29 ` Fabian Grünbichler
  4 siblings, 0 replies; 21+ messages in thread
From: Fabian Grünbichler @ 2025-04-17  9:29 UTC (permalink / raw)
  To: Proxmox Backup Server development discussion

On April 16, 2025 4:17 pm, Christian Ebner wrote:
> In an effort to simplify the GC phase 1 logic introduced by commit
> cb9814e3 ("garbage collection: fix rare race in chunk marking phase")
> this patch series implement a trash can functionality for snapshots.

that was fast ;)

> The main intention is to allow snapshot's index files, pruned while
> ongoing phase 1 of garbage collection, to be read and their chunks
> marked as in use as well. This will allow to get rid of the currently
> implemented and rather complex retry looping logic, which could in
> theory lead to failing GC or backups when trying to lock the whole
> group exclusively following the 10-th retry.

I think the other, not really smaller intention is to allow undoing an
accidental/premature deletion/pruning. So we need to consider this
use case as well when designing the trash can semantics, and ideally
introduce that at the same time so we can properly rule out problems..

> To achieve this, pruning of snapshots does not remove them
> immediately, but rather moves them to a `.trash` subfolder in the
> datastores base directory. This directory will then be cleared before
> starting of GC phase 1, meaning that any index file could be restored
> until the next GC run.

see my comment on patch #3

> This however comes with it's own set of issues, therefore sending
> these patches as RFC for now. Open questions and known limitations
> are:
> - Pruning does not cleanup any space, on the contrary it might
>   require additional space on COW filesystem. Should there be a flag
>   to bypass the trash, also given that sometimes users truly want to
>   remove a snapshot immediately? Although that would re-introduce the
>   issue with new snapshot ceration and concurrent GC on a last
>   snapshot.

I think it might make sense, but I am not sure how we could avoid the GC
issue (although we could perhaps design the trash can feature in a way
that keeps the retry logic in GC, but only ever triggers it in case such
a skip-trash pruning took place in a group).

> - Prune + sync + prune might lead to the same snapshot being pruned
>   multiple times, currently any second prune on a snapshot would
>   fail. Should this overwrite the trashed snapshot?

this depends on how the trash feature is implemented:
- if it's a mark on the snapshot, then attempting to write the snapshot
  again could either fail or overwrite the trashed snapshot
- if the snapshot is moved to a trash can, then we could keep multiple
  copies there

> - GC might now read the same index twice, once before it was pruned
>   followed by a prune while phase 1 is still ongoing and the second
>   time as read from the trash. Not really an issue, but rather a
>   limitation.

reading twice is a lot better than never reading ;) I don't think this
should be particularly problematic.

> - Further issues I'm currently overlooking
> 
> Christian Ebner (4):
>   datastore: always skip over base directory when listing index files
>   datastore: allow to specify sub-directory for index file listing
>   datastore: move snapshots to trash folder on destroy
>   garbage collection: read pruned snapshot index files from trash
> 
>  pbs-datastore/src/backup_info.rs |  14 ++-
>  pbs-datastore/src/datastore.rs   | 158 +++++++++++++++----------------
>  2 files changed, 89 insertions(+), 83 deletions(-)
> 
> -- 
> 2.39.5
> 
> 
> 
> _______________________________________________
> pbs-devel mailing list
> pbs-devel@lists.proxmox.com
> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
> 
> 
> 


_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [pbs-devel] [RFC proxmox-backup 1/4] datastore: always skip over base directory when listing index files
  2025-04-17  9:29   ` Fabian Grünbichler
@ 2025-04-17 10:27     ` Christian Ebner
  0 siblings, 0 replies; 21+ messages in thread
From: Christian Ebner @ 2025-04-17 10:27 UTC (permalink / raw)
  To: Proxmox Backup Server development discussion, Fabian Grünbichler

On 4/17/25 11:29, Fabian Grünbichler wrote:
> the commit subject could maybe indicate somehow that this is for the
> GC-specific helper, and not some generic index listing code ;)
> 
> On April 16, 2025 4:18 pm, Christian Ebner wrote:
>> The base is a directory and not an index file anyways, so there is no
>> need to apply the filtering and archive type matching on it.
>> Further, this will allow to use a hidden folder as base, otherwise
>> not possible as excluded by the filtering, which will be useful when
>> listing index files in a `.trash` folder.
>>
>> No functional change intended.
>>
>> Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
>> ---
>>   pbs-datastore/src/datastore.rs | 4 +++-
>>   1 file changed, 3 insertions(+), 1 deletion(-)
>>
>> diff --git a/pbs-datastore/src/datastore.rs b/pbs-datastore/src/datastore.rs
>> index 309137e00..3fde8b871 100644
>> --- a/pbs-datastore/src/datastore.rs
>> +++ b/pbs-datastore/src/datastore.rs
>> @@ -979,7 +979,9 @@ impl DataStore {
>>   
>>           use walkdir::WalkDir;
>>   
>> -        let walker = WalkDir::new(base).into_iter();
>> +        let mut walker = WalkDir::new(base).into_iter();
>> +        // always ignore the base directory itself, so a hidden folder can be used as base as well
>> +        walker.next();
> 
> this should check for errors?

Agreed, makes sense: otherwise the index file listing might return an 
empty list without an error if e.g. the directory cannot be read for 
some reason.

Will adapt accordingly, thanks!
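
For illustration, the error check inside the existing helper could look 
roughly like this (sketch only, assuming anyhow's Context is in scope as 
in the surrounding code):

let mut walker = WalkDir::new(base).into_iter();
// the first entry is the base directory itself: skip it, but do not
// swallow errors, e.g. if the base directory cannot be read at all
if let Some(entry) = walker.next() {
    entry.context("failed to read base directory for index file listing")?;
}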



_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [pbs-devel] [RFC proxmox-backup 4/4] garbage collection: read pruned snapshot index files from trash
  2025-04-17  9:29   ` Fabian Grünbichler
@ 2025-04-17 10:38     ` Christian Ebner
  2025-04-17 11:27       ` Fabian Grünbichler
  0 siblings, 1 reply; 21+ messages in thread
From: Christian Ebner @ 2025-04-17 10:38 UTC (permalink / raw)
  To: Proxmox Backup Server development discussion, Fabian Grünbichler

On 4/17/25 11:29, Fabian Grünbichler wrote:
> On April 16, 2025 4:18 pm, Christian Ebner wrote:
>> Snapshots pruned during phase 1 are now also assured to be included
>> in the marking phase by reading the index files from trash and
>> touching these chunks as well.
>>
>> Clear any trash before starting with phase 1, so that only snapshots
>> pruned during GC are consided.
>>
>> Further, drop the retry logic used before to assure eventually newly
>> written index files are included in the marking phase, if the
>> previously last one was pruned. This is not necessary anymore as the
>> previously last one will be read from trash.
>>
>> Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
>> ---
>>   pbs-datastore/src/datastore.rs | 141 +++++++++++++++------------------
>>   1 file changed, 65 insertions(+), 76 deletions(-)
>>
>> diff --git a/pbs-datastore/src/datastore.rs b/pbs-datastore/src/datastore.rs
>> index 97b78f000..688e65247 100644
>> --- a/pbs-datastore/src/datastore.rs
>> +++ b/pbs-datastore/src/datastore.rs
>> @@ -1137,7 +1137,13 @@ impl DataStore {
>>           //
>>           // By this it is assured that all index files are used, even if they would not have been
>>           // seen by the regular logic and the user is informed by the garbage collection run about
>> -        // the detected index files not following the iterators logic.
>> +        // the detected index files not following the iterators logic. Further, include trashed
>> +        // snapshots which have been pruned during garbage collections marking phase.
>> +
>> +        let trash = PathBuf::from(".trash/");
>> +        let mut full_trash_path = self.base_path();
>> +        full_trash_path.push(&trash);
>> +        let _ = std::fs::remove_dir_all(full_trash_path);
> 
> I think this would need some locking, else we start recursively deleting
> here while at the same time a prune operation is moving something into
> the trash..

True, if there is a concurrent insert, then deletion will probably fail 
because the sub-directory is then not empty anymore. Instead of locking, 
I could do an atomic rename of the whole trash can and clean up the 
renamed one? Should be not only more efficient but also easier to 
implement.
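
Something along these lines (sketch only, the helper name and the suffix 
of the renamed directory are made up):

use std::path::Path;

use anyhow::{Context, Error};

// atomically move `.trash` aside before GC phase 1 starts and remove the
// renamed copy afterwards, so a concurrent prune simply re-creates a
// fresh `.trash` instead of racing the recursive removal
fn clear_trash(base_path: &Path) -> Result<(), Error> {
    let trash_path = base_path.join(".trash");
    let stale_path = base_path.join(format!(".trash.gc.{}", std::process::id()));
    match std::fs::rename(&trash_path, &stale_path) {
        Ok(()) => std::fs::remove_dir_all(&stale_path)
            .context("failed to clean up renamed trash directory"),
        // no trash directory present yet, nothing to do
        Err(err) if err.kind() == std::io::ErrorKind::NotFound => Ok(()),
        Err(err) => Err(err).context("failed to rename trash directory"),
    }
}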

> 
>>   
>>           let mut unprocessed_index_list = self.list_index_files(None)?;
>>           let mut index_count = unprocessed_index_list.len();
>> @@ -1154,88 +1160,63 @@ impl DataStore {
>>               let namespace = namespace.context("iterating namespaces failed")?;
>>               for group in arc_self.iter_backup_groups(namespace)? {
>>                   let group = group.context("iterating backup groups failed")?;
>> +                let mut snapshots = match group.list_backups() {
>> +                    Ok(snapshots) => snapshots,
>> +                    Err(err) => {
>> +                        if group.exists() {
>> +                            return Err(err).context("listing snapshots failed")?;
>> +                        }
>> +                        // vanished, will be covered by trashed list below to avoid
>> +                        // not touching known chunks.
>> +                        continue;
>> +                    }
>> +                };
>>   
>> -                // Avoid race between listing/marking of snapshots by GC and pruning the last
>> -                // snapshot in the group, following a new snapshot creation. Otherwise known chunks
>> -                // might only be referenced by the new snapshot, so it must be read as well.
>> -                let mut retry_counter = 0;
>> -                'retry: loop {
>> -                    let _lock = match retry_counter {
>> -                        0..=9 => None,
>> -                        10 => Some(
>> -                            group
>> -                                .lock()
>> -                                .context("exhausted retries and failed to lock group")?,
>> -                        ),
>> -                        _ => bail!("exhausted retries and unexpected counter overrun"),
>> -                    };
>> -
>> -                    let mut snapshots = match group.list_backups() {
>> -                        Ok(snapshots) => snapshots,
>> -                        Err(err) => {
>> -                            if group.exists() {
>> -                                return Err(err).context("listing snapshots failed")?;
>> -                            }
>> -                            break 'retry;
>> +                BackupInfo::sort_list(&mut snapshots, true);
>> +                for snapshot in snapshots.into_iter() {
>> +                    for file in snapshot.files {
>> +                        worker.check_abort()?;
>> +                        worker.fail_on_shutdown()?;
>> +
>> +                        match ArchiveType::from_path(&file) {
>> +                            Ok(ArchiveType::FixedIndex) | Ok(ArchiveType::DynamicIndex) => (),
>> +                            Ok(ArchiveType::Blob) | Err(_) => continue,
>>                           }
>> -                    };
>> -
>> -                    // Always start iteration with the last snapshot of the group to reduce race
>> -                    // window with concurrent backup+prune previous last snapshot. Allows to retry
>> -                    // without the need to keep track of already processed index files for the
>> -                    // current group.
>> -                    BackupInfo::sort_list(&mut snapshots, true);
>> -                    for (count, snapshot) in snapshots.into_iter().rev().enumerate() {
>> -                        for file in snapshot.files {
>> -                            worker.check_abort()?;
>> -                            worker.fail_on_shutdown()?;
>> -
>> -                            match ArchiveType::from_path(&file) {
>> -                                Ok(ArchiveType::FixedIndex) | Ok(ArchiveType::DynamicIndex) => (),
>> -                                Ok(ArchiveType::Blob) | Err(_) => continue,
>> -                            }
>>   
>> -                            let mut path = snapshot.backup_dir.full_path();
>> -                            path.push(file);
>> -
>> -                            let index = match self.open_index_reader(&path)? {
>> -                                Some(index) => index,
>> -                                None => {
>> -                                    unprocessed_index_list.remove(&path);
>> -                                    if count == 0 {
>> -                                        retry_counter += 1;
>> -                                        continue 'retry;
>> -                                    }
>> -                                    continue;
>> -                                }
>> -                            };
>> -
>> -                            self.index_mark_used_chunks(
>> -                                index,
>> -                                &path,
>> -                                &mut chunk_lru_cache,
>> -                                status,
>> -                                worker,
>> -                            )?;
>> -
>> -                            if !unprocessed_index_list.remove(&path) {
>> -                                info!("Encountered new index file '{path:?}', increment total index file count");
>> -                                index_count += 1;
>> -                            }
>> +                        let mut path = snapshot.backup_dir.full_path();
>> +                        path.push(file);
>>   
>> -                            let percentage = (processed_index_files + 1) * 100 / index_count;
>> -                            if percentage > last_percentage {
>> -                                info!(
>> -                                    "marked {percentage}% ({} of {index_count} index files)",
>> -                                    processed_index_files + 1,
>> -                                );
>> -                                last_percentage = percentage;
>> +                        let index = match self.open_index_reader(&path)? {
>> +                            Some(index) => index,
>> +                            None => {
>> +                                unprocessed_index_list.remove(&path);
>> +                                continue;
>>                               }
>> -                            processed_index_files += 1;
>> +                        };
>> +
>> +                        self.index_mark_used_chunks(
>> +                            index,
>> +                            &path,
>> +                            &mut chunk_lru_cache,
>> +                            status,
>> +                            worker,
>> +                        )?;
>> +
>> +                        if !unprocessed_index_list.remove(&path) {
>> +                            info!("Encountered new index file '{path:?}', increment total index file count");
>> +                            index_count += 1;
>>                           }
>> -                    }
>>   
>> -                    break;
>> +                        let percentage = (processed_index_files + 1) * 100 / index_count;
>> +                        if percentage > last_percentage {
>> +                            info!(
>> +                                "marked {percentage}% ({} of {index_count} index files)",
>> +                                processed_index_files + 1,
>> +                            );
>> +                            last_percentage = percentage;
>> +                        }
>> +                        processed_index_files += 1;
>> +                    }
>>                   }
>>               }
>>           }
>> @@ -1257,6 +1238,14 @@ impl DataStore {
>>               warn!("Found {strange_paths_count} index files outside of expected directory scheme");
>>           }
>>   
>> +        let trashed_index_list = self.list_index_files(Some(trash))?;
>> +        for path in trashed_index_list {
>> +            if let Some(index) = self.open_index_reader(&path)? {
>> +                info!("Mark chunks for pruned index file found in {path:?}");
>> +                self.index_mark_used_chunks(index, &path, &mut chunk_lru_cache, status, worker)?;
>> +            };
>> +        }
>> +
> 
> I think we'd want to support undoing moving a snapshot to trash, if we
> have a trash can (that's the other half of wanting this feature after
> all). if we do so, we need to be careful to not re-introduce a race here
> (e.g., by keeping a copy in the trash can when undoing, or by using a
> trash can mechanism that doesn't require separate iteration over regular
> and trashed snapshots).

Hmm, that's right. So keeping a copy in the trash might be a good 
approach here, that is, if one sticks to the trash folder mechanism. I 
first have to look into the implications of your other suggested 
approaches.
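
As a rough idea of the "keep a copy" approach (restore helper and layout 
are hypothetical, and snapshot directories are assumed to only contain 
regular files):

use std::path::Path;

use anyhow::{Context, Error};

/// Restore a pruned snapshot from the trash while keeping the trashed
/// copy around, so a GC run concurrently iterating the trash still sees
/// the index files and no marking race is re-introduced.
fn restore_from_trash(trashed: &Path, original: &Path) -> Result<(), Error> {
    std::fs::create_dir_all(original)
        .with_context(|| format!("creating snapshot directory {original:?} failed"))?;
    for entry in std::fs::read_dir(trashed)
        .with_context(|| format!("reading trashed snapshot {trashed:?} failed"))?
    {
        let entry = entry?;
        let target = original.join(entry.file_name());
        std::fs::copy(entry.path(), &target)
            .with_context(|| format!("restoring {target:?} from trash failed"))?;
    }
    Ok(())
}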


_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [pbs-devel] [RFC proxmox-backup 4/4] garbage collection: read pruned snapshot index files from trash
  2025-04-17 10:38     ` Christian Ebner
@ 2025-04-17 11:27       ` Fabian Grünbichler
  0 siblings, 0 replies; 21+ messages in thread
From: Fabian Grünbichler @ 2025-04-17 11:27 UTC (permalink / raw)
  To: Christian Ebner, Proxmox Backup Server development discussion


> Christian Ebner <c.ebner@proxmox.com> hat am 17.04.2025 12:38 CEST geschrieben:
> On 4/17/25 11:29, Fabian Grünbichler wrote:
> > On April 16, 2025 4:18 pm, Christian Ebner wrote:
> >> Snapshots pruned during phase 1 are now also assured to be included
> >> in the marking phase by reading the index files from trash and
> >> touching these chunks as well.
> >>
> >> Clear any trash before starting with phase 1, so that only snapshots
> >> pruned during GC are consided.
> >>
> >> Further, drop the retry logic used before to assure eventually newly
> >> written index files are included in the marking phase, if the
> >> previously last one was pruned. This is not necessary anymore as the
> >> previously last one will be read from trash.
> >>
> >> Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
> >> ---
> >>   pbs-datastore/src/datastore.rs | 141 +++++++++++++++------------------
> >>   1 file changed, 65 insertions(+), 76 deletions(-)
> >>
> >> diff --git a/pbs-datastore/src/datastore.rs b/pbs-datastore/src/datastore.rs
> >> index 97b78f000..688e65247 100644
> >> --- a/pbs-datastore/src/datastore.rs
> >> +++ b/pbs-datastore/src/datastore.rs
> >> @@ -1137,7 +1137,13 @@ impl DataStore {
> >>           //
> >>           // By this it is assured that all index files are used, even if they would not have been
> >>           // seen by the regular logic and the user is informed by the garbage collection run about
> >> -        // the detected index files not following the iterators logic.
> >> +        // the detected index files not following the iterators logic. Further, include trashed
> >> +        // snapshots which have been pruned during garbage collections marking phase.
> >> +
> >> +        let trash = PathBuf::from(".trash/");
> >> +        let mut full_trash_path = self.base_path();
> >> +        full_trash_path.push(&trash);
> >> +        let _ = std::fs::remove_dir_all(full_trash_path);
> > 
> > I think this would need some locking, else we start recursively deleting
> > here while at the same time a prune operation is moving something into
> > the trash..
> 
> True, if there is a concurrent insert, then deletion will probably fail 
> because the sub-directory is then not empty anymore. Instead of locking, 
> I could do an atomic rename of the whole trash can and clean up the 
> renamed one? That should be not only more efficient but also easier to 
> implement.
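
A minimal sketch of that rename-then-clear idea (the helper name and the
unique suffix are invented here, and it deliberately ignores already open
file descriptors, see the concern right below):

use std::fs;
use std::io;
use std::path::Path;

/// Atomically move the whole trash directory aside, then remove the
/// renamed copy. Sketch only; no locking and no handling of callers that
/// still hold file descriptors inside the old trash directory.
fn clear_trash(base_path: &Path) -> io::Result<()> {
    let trash = base_path.join(".trash");
    let doomed = base_path.join(format!(".trash-removing-{}", std::process::id()));
    match fs::rename(&trash, &doomed) {
        Ok(()) => fs::remove_dir_all(&doomed),
        // No trash directory present yet: nothing to clear.
        Err(err) if err.kind() == io::ErrorKind::NotFound => Ok(()),
        Err(err) => Err(err),
    }
}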

that only works if you assume that nothing operates on (or rather, inside of)
the trash dir via an already opened FD.. which seems like a dangerous and
easy to miss assumption..


_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [pbs-devel] [RFC proxmox-backup 2/4] datastore: allow to specify sub-directory for index file listing
  2025-04-16 14:18 ` [pbs-devel] [RFC proxmox-backup 2/4] datastore: allow to specify sub-directory for index file listing Christian Ebner
@ 2025-04-18  9:38   ` Thomas Lamprecht
  2025-04-18  9:55     ` Christian Ebner
  0 siblings, 1 reply; 21+ messages in thread
From: Thomas Lamprecht @ 2025-04-18  9:38 UTC (permalink / raw)
  To: Proxmox Backup Server development discussion, Christian Ebner

Am 16.04.25 um 16:18 schrieb Christian Ebner:
> Extend the list_index_files helper to allow passing a sub-directory
> to it. This will be used to fetch only index files from the `.trash`
> directory, ignored by regular listing since hidden folders are not
> considered.
> 
> Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
> ---
>  pbs-datastore/src/datastore.rs | 13 ++++++++++---
>  1 file changed, 10 insertions(+), 3 deletions(-)
> 
> diff --git a/pbs-datastore/src/datastore.rs b/pbs-datastore/src/datastore.rs
> index 3fde8b871..97b78f000 100644
> --- a/pbs-datastore/src/datastore.rs
> +++ b/pbs-datastore/src/datastore.rs
> @@ -972,8 +972,15 @@ impl DataStore {
>      /// The filesystem is walked recursevly to detect index files based on their archive type based
>      /// on the filename. This however excludes the chunks folder, hidden files and does not follow
>      /// symlinks.
> -    fn list_index_files(&self) -> Result<HashSet<PathBuf>, Error> {
> -        let base = self.base_path();
> +    /// If a subdirectory is provided, only that is scanned for index files.
> +    fn list_index_files(&self, subdir: Option<PathBuf>) -> Result<HashSet<PathBuf>, Error> {

it might be a tiny bit nicer to add a new list_index_files_from_path
that shares the implementation with this one here. While it's not pub
and there is only one use site, it's still minimally nicer to have this
encoded in the method name, at least when reading the call site and
wondering what None means here.
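
A possible shape for that split, sketched against the signatures visible
in the quoted diff; the method names and the elided walk are assumptions
only, not part of the series:

use std::collections::HashSet;
use std::path::{Path, PathBuf};

use anyhow::Error;

impl DataStore {
    /// Walk the whole datastore base directory, as before.
    fn list_index_files(&self) -> Result<HashSet<PathBuf>, Error> {
        self.list_index_files_impl(self.base_path())
    }

    /// Same walk, but restricted to a sub-directory such as `.trash`.
    fn list_index_files_from_path(&self, subdir: &Path) -> Result<HashSet<PathBuf>, Error> {
        let base = self.base_path().join(subdir);
        if !base.exists() {
            return Ok(HashSet::new());
        }
        self.list_index_files_impl(base)
    }

    fn list_index_files_impl(&self, base: PathBuf) -> Result<HashSet<PathBuf>, Error> {
        // ... the existing recursive walk over `base` goes here, unchanged ...
        unimplemented!("see the original list_index_files body")
    }
}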

> +        let mut base = self.base_path();
> +        if let Some(subdir) = subdir {
> +            base.push(subdir);
> +            if !base.exists() {
> +                return Ok(HashSet::new());
> +            }
> +        }
>  
>          let mut list = HashSet::new();
>  
> @@ -1132,7 +1139,7 @@ impl DataStore {
>          // seen by the regular logic and the user is informed by the garbage collection run about
>          // the detected index files not following the iterators logic.
>  
> -        let mut unprocessed_index_list = self.list_index_files()?;
> +        let mut unprocessed_index_list = self.list_index_files(None)?;
>          let mut index_count = unprocessed_index_list.len();
>  
>          let mut chunk_lru_cache = LruCache::new(cache_capacity);



_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [pbs-devel] [RFC proxmox-backup 2/4] datastore: allow to specify sub-directory for index file listing
  2025-04-18  9:38   ` Thomas Lamprecht
@ 2025-04-18  9:55     ` Christian Ebner
  0 siblings, 0 replies; 21+ messages in thread
From: Christian Ebner @ 2025-04-18  9:55 UTC (permalink / raw)
  To: Thomas Lamprecht, Proxmox Backup Server development discussion

On 4/18/25 11:38, Thomas Lamprecht wrote:
> Am 16.04.25 um 16:18 schrieb Christian Ebner:
>> Extend the list_index_files helper to allow passing a sub-directory
>> to it. This will be used to fetch only index files from the `.trash`
>> directory, ignored by regular listing since hidden folders are not
>> considered.
>>
>> Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
>> ---
>>   pbs-datastore/src/datastore.rs | 13 ++++++++++---
>>   1 file changed, 10 insertions(+), 3 deletions(-)
>>
>> diff --git a/pbs-datastore/src/datastore.rs b/pbs-datastore/src/datastore.rs
>> index 3fde8b871..97b78f000 100644
>> --- a/pbs-datastore/src/datastore.rs
>> +++ b/pbs-datastore/src/datastore.rs
>> @@ -972,8 +972,15 @@ impl DataStore {
>>       /// The filesystem is walked recursevly to detect index files based on their archive type based
>>       /// on the filename. This however excludes the chunks folder, hidden files and does not follow
>>       /// symlinks.
>> -    fn list_index_files(&self) -> Result<HashSet<PathBuf>, Error> {
>> -        let base = self.base_path();
>> +    /// If a subdirectory is provided, only that is scanned for index files.
>> +    fn list_index_files(&self, subdir: Option<PathBuf>) -> Result<HashSet<PathBuf>, Error> {
> 
> it might be a tiny bit nicer to add a new list_index_files_from_path
> that shares the implementation with this one here. While it's not pub
> and there is only one use site, it's still minimally nicer to have this
> encoded in the method name, at least when reading the call site and
> wondering what None means here.

Okay, will adapt this accordingly, thanks!


_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [pbs-devel] [RFC proxmox-backup 3/4] datastore: move snapshots to trash folder on destroy
  2025-04-17  9:29   ` Fabian Grünbichler
@ 2025-04-18 11:06     ` Thomas Lamprecht
  2025-04-18 11:49       ` Christian Ebner
  2025-04-18 11:51       ` Fabian Grünbichler
  0 siblings, 2 replies; 21+ messages in thread
From: Thomas Lamprecht @ 2025-04-18 11:06 UTC (permalink / raw)
  To: Proxmox Backup Server development discussion, Fabian Grünbichler

Am 17.04.25 um 11:29 schrieb Fabian Grünbichler:
> On April 16, 2025 4:18 pm, Christian Ebner wrote:
>> Instead of directly deleting the snapshot directory and it's contents
>> on a prune, move the snapshot directory into the `.trash` subfolder
>> of the datastore.
>>
>> This allows to mark chunks which were used by these index files if
>> the snapshot was pruned during an ongoing garbage collection.
>> Garbage collection will clean up these files before starting with the
>> marking phase 1 and read all index files after completing that phase,
>> touching these chunks as well.
> 
> some other variants to maybe consider:
> 
> marking the snapshot itself as trash (in the manifest, or by adding a
> trash marker file inside the dir) - this would mean that there is no
> iterator race issue when undoing a prune, no double-pruning collisions,
> .. - but it also means we need to adapt all call sites that should skip
> trashed snapshots (most existing ones), which is more churn.

Shouldn't we use the central iterators implementations to query indexes?
I.e., any call site that doesn't should probably be switched over to
those, just like Chris did for GC recently.
Then it could be defined if trashed indexes should be skipped – the
default – or included when instantiating that iterator, e.g. through a
parameter or probably a dedicated "with_trash" fn – but that's details.
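
Something like the following is what that opt-in could look like; all
names here are invented and do not match the existing iterator types:

/// Whether an iterator yields snapshots that were moved to the trash.
#[derive(Clone, Copy, PartialEq, Eq)]
enum TrashFilter {
    Skip,    // default for all existing call sites
    Include, // GC marking, undo-prune, trash view in the UI
}

struct SnapshotIter {
    filter: TrashFilter,
    // ... directory iteration state ...
}

impl SnapshotIter {
    fn new() -> Self {
        Self { filter: TrashFilter::Skip }
    }

    /// Dedicated constructor, so call sites spell out that they want trash.
    fn with_trash() -> Self {
        Self { filter: TrashFilter::Include }
    }
}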

> 
> having a trash dir per group instead of a global one for the whole
> datastore (less likely to incur extra costs in case somebody has a weird
> setup where namespaces/.. are symlinked or bindmounted or similar
> shenanigans). would need to postpone group removal to GC in case all
> snapshots are pruned.

Both of those variants would make restore simpler too; it depends IMO a
bit on whether we already read the manifest everywhere the info is needed
anyway. In that case I'd slightly favor that as the place to store the
info whether a backup is trash or not, as that would avoid the need for
another directory or file to store (inode exhaustion) and manage.
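
If the manifest route is taken, the flag could live in the unprotected
section roughly like this; the key name and the helpers are assumptions,
not something the series defines:

use serde_json::{json, Value};

/// Mark a snapshot as trashed by storing the prune time in the manifest's
/// unprotected section; absence of the key means "not trashed".
fn set_trashed(unprotected: &mut Value, prune_time: i64) {
    unprotected["trashed"] = json!(prune_time);
}

fn is_trashed(unprotected: &Value) -> bool {
    unprotected.get("trashed").is_some()
}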


_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [pbs-devel] [RFC proxmox-backup 3/4] datastore: move snapshots to trash folder on destroy
  2025-04-18 11:06     ` Thomas Lamprecht
@ 2025-04-18 11:49       ` Christian Ebner
  2025-04-18 12:03         ` Fabian Grünbichler
  2025-04-18 11:51       ` Fabian Grünbichler
  1 sibling, 1 reply; 21+ messages in thread
From: Christian Ebner @ 2025-04-18 11:49 UTC (permalink / raw)
  To: Proxmox Backup Server development discussion, Thomas Lamprecht,
	Fabian Grünbichler

On 4/18/25 13:06, Thomas Lamprecht wrote:
> Am 17.04.25 um 11:29 schrieb Fabian Grünbichler:
>> On April 16, 2025 4:18 pm, Christian Ebner wrote:
>>> Instead of directly deleting the snapshot directory and it's contents
>>> on a prune, move the snapshot directory into the `.trash` subfolder
>>> of the datastore.
>>>
>>> This allows to mark chunks which were used by these index files if
>>> the snapshot was pruned during an ongoing garbage collection.
>>> Garbage collection will clean up these files before starting with the
>>> marking phase 1 and read all index files after completing that phase,
>>> touching these chunks as well.
>>
>> some other variants to maybe consider:
>>
>> marking the snapshot itself as trash (in the manifest, or by adding a
>> trash marker file inside the dir) - this would mean that there is no
>> iterator race issue when undoing a prune, no double-pruning collisions,
>> .. - but it also means we need to adapt all call sites that should skip
>> trashed snapshots (most existing ones), which is more churn.
> 
> Shouldn't we use the central iterators implementations to query indexes?

Yes, correct me if I'm wrong, I have not checked all call sites yet, but 
index files are mostly accessed by going through the manifest, either via 
BackupManifest::files or at least verified via 
BackupManifest::verify_file, as that's also where encryption and 
verification state are stored.

So adding a label to store a trashed state there would work out just 
fine, and filtering these snapshots for listing, sync jobs, etc. is then 
fine as well. Also, fetching the previous backup snapshot for fast 
incremental mode will work, although it would require additional filtering.
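
That additional filtering could be as simple as the following; the types
are simplified and made up for illustration:

struct SnapshotInfo {
    backup_time: i64,
    trashed: bool,
}

/// Pick the most recent snapshot of a group that is not trashed, to use
/// as the base for the next fast incremental backup.
fn last_usable_snapshot(snapshots: &[SnapshotInfo]) -> Option<&SnapshotInfo> {
    snapshots
        .iter()
        .filter(|snapshot| !snapshot.trashed)
        .max_by_key(|snapshot| snapshot.backup_time)
}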

Although, I'm a bit concerned about performance for the content listing 
if we keep and iterate all of the pruned snapshots. After all they will 
persist until next GC, which could lead to a lot of accumulated snapshots.

One further issue I see with that approach is again sync jobs, which now 
do not see the trashed snapshot on the target and try to re-sync it? Or 
would we include that information for the sync jobs to skip over? Would 
be a bit strange however if the snapshot is not trashed on the source side.

Also, thinking about UI to recover from trash: Might it be good to still 
show the snapshots while listing, but marked with an icon, just like for 
e.g. encryption state? Or create a dedicated window/tab to only show 
trashed items.

All in all storing the trash information on the manifest might not be 
the better option. Given the above issues, I'm leaning more towards a 
separate folder structure for this.

What's your opinion on these concerns?





_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [pbs-devel] [RFC proxmox-backup 3/4] datastore: move snapshots to trash folder on destroy
  2025-04-18 11:06     ` Thomas Lamprecht
  2025-04-18 11:49       ` Christian Ebner
@ 2025-04-18 11:51       ` Fabian Grünbichler
  1 sibling, 0 replies; 21+ messages in thread
From: Fabian Grünbichler @ 2025-04-18 11:51 UTC (permalink / raw)
  To: Proxmox Backup Server development discussion, Thomas Lamprecht

On April 18, 2025 1:06 pm, Thomas Lamprecht wrote:
> Am 17.04.25 um 11:29 schrieb Fabian Grünbichler:
>> On April 16, 2025 4:18 pm, Christian Ebner wrote:
>>> Instead of directly deleting the snapshot directory and it's contents
>>> on a prune, move the snapshot directory into the `.trash` subfolder
>>> of the datastore.
>>>
>>> This allows to mark chunks which were used by these index files if
>>> the snapshot was pruned during an ongoing garbage collection.
>>> Garbage collection will clean up these files before starting with the
>>> marking phase 1 and read all index files after completing that phase,
>>> touching these chunks as well.
>> 
>> some other variants to maybe consider:
>> 
>> marking the snapshot itself as trash (in the manifest, or by adding a
>> trash marker file inside the dir) - this would mean that there is no
>> iterator race issue when undoing a prune, no double-pruning collisions,
>> .. - but it also means we need to adapt all call sites that should skip
>> trashed snapshots (most existing ones), which is more churn.
> 
> Shouldn't we use the central iterators implementations to query indexes?
> I.e., any call site that doesn't should probably be switched over to
> those, just like Chris did for GC recently.
> Then it could be defined if trashed indexes should be skipped – the
> default – or included when instantiating that iterator, e.g. through a
> parameter or probably a dedicated "with_trash" fn – but that's details.

yes, the parts where we iterate are fairly easy to handle, but we do
have quite a few where we access a snapshot directly and might not want
to treat a trashed one like a non-trashed one ;)

>> having a trash dir per group instead of a global one for the whole
>> datastore (less likely to incur extra costs in case somebody has a weird
>> setup where namespaces/.. are symlinked or bindmounted or similar
>> shenanigans). would need to postpone group removal to GC in case all
>> snapshots are pruned.
> 
> Both of those variants would make restore simpler too; it depends IMO a
> bit on whether we already read the manifest everywhere the info is needed
> anyway. In that case I'd slightly favor that as the place to store the
> info whether a backup is trash or not, as that would avoid the need for
> another directory or file to store (inode exhaustion) and manage.

the iterator itself doesn't yet read the manifest, neither does the
group's `list_backups` fn.. so the "manifest-only" part would
potentially add quite a bit of overhead there.. but of course, things
like verification and snapshot listing over the API already do read the
manifest later...

another variant would be to have a per-group list of trashed snapshots
*and* a marker in the manifest - that way both use cases would be
supported, with the caveat that they'd have to be kept in sync (but that
only affects prune/undo_prune itself, so might not be *that* bad?)


_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [pbs-devel] [RFC proxmox-backup 3/4] datastore: move snapshots to trash folder on destroy
  2025-04-18 11:49       ` Christian Ebner
@ 2025-04-18 12:03         ` Fabian Grünbichler
  2025-04-18 12:45           ` Christian Ebner
  0 siblings, 1 reply; 21+ messages in thread
From: Fabian Grünbichler @ 2025-04-18 12:03 UTC (permalink / raw)
  To: Christian Ebner, Proxmox Backup Server development discussion,
	Thomas Lamprecht

On April 18, 2025 1:49 pm, Christian Ebner wrote:
> On 4/18/25 13:06, Thomas Lamprecht wrote:
>> Am 17.04.25 um 11:29 schrieb Fabian Grünbichler:
>>> On April 16, 2025 4:18 pm, Christian Ebner wrote:
>>>> Instead of directly deleting the snapshot directory and it's contents
>>>> on a prune, move the snapshot directory into the `.trash` subfolder
>>>> of the datastore.
>>>>
>>>> This allows to mark chunks which were used by these index files if
>>>> the snapshot was pruned during an ongoing garbage collection.
>>>> Garbage collection will clean up these files before starting with the
>>>> marking phase 1 and read all index files after completing that phase,
>>>> touching these chunks as well.
>>>
>>> some other variants to maybe consider:
>>>
>>> marking the snapshot itself as trash (in the manifest, or by adding a
>>> trash marker file inside the dir) - this would mean that there is no
>>> iterator race issue when undoing a prune, no double-pruning collisions,
>>> .. - but it also means we need to adapt all call sites that should skip
>>> trashed snapshots (most existing ones), which is more churn.
>> 
>> Shouldn't we use the central iterators implementations to query indexes?
> 
> Yes, correct me if I'm wrong, I have not checked all call sites yet, but 
> index files are mostly accessed by going through the manifest, either via 
> BackupManifest::files or at least verified via 
> BackupManifest::verify_file, as that's also where encryption and 
> verification state are stored.
> 
> So adding a label to store a trashed state there would work out just 
> fine, and filtering these snapshots for listing, sync jobs, etc. is then 
> fine as well. Also, fetching the previous backup snapshot for fast 
> incremental mode will work, although it would require additional filtering.
> 
> Although, I'm a bit concerned about performance for the content listing 
> if we keep and iterate all of the pruned snapshots. After all they will 
> persist until next GC, which could lead to a lot of accumulated snapshots.

that's a fair point, in some environments this might be cumbersome..
OTOH, those are exactly the environments that would/should run GC often
I guess, so maybe it's not that bad?

> One further issue I see with that approach is again sync jobs, which now 
> do not see the trashed snapshot on the target and try to re-sync it? Or 
> would we include that information for the sync jobs to skip over? Would 
> be a bit strange however if the snapshot is not trashed on the source side.

they'd only resync it if it's after the last local one, unless it's a
"sync missing" special sync, so this is not different to the current
state? at least, if syncing a trashed snapshot using the same snapshot
is allowed and just undoes the trashing?

> Also, thinking about UI to recover from trash: Might it be good to still 
> show the snapshots while listing, but marked with an icon, just like for 
> e.g. encryption state? Or create a dedicated window/tab to only show 
> trashed items.

yes, the snapshot list needs to get an option to include trashed ones,
and the UI should set and handle that appropriately ;)

> All in all storing the trash information on the manifest might not be 
> the better option. Given the above issues, I'm leaning more towards a 
> separate folder structure for this.

most of the above issues apply to both variants anyway - the main
difference is that with the separate folder iterating access needs to
opt-into including trashed snapshots, so only does extra work in case
that is desired, whereas in the manifest one the extra work is already
done by the time we can decide to skip/filter out a snapshot because
it's trash.

maybe a summary would be:

pro separate folder:
- less work when only iterating over non-trash or only over trash
- no need to parse manifest where it is currently not parsed

con separate folder:
- more work/complicated handling when iterating over both trash and non-trash
- more work to put something into/out of the trash

?


_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [pbs-devel] [RFC proxmox-backup 3/4] datastore: move snapshots to trash folder on destroy
  2025-04-18 12:03         ` Fabian Grünbichler
@ 2025-04-18 12:45           ` Christian Ebner
  2025-04-22  7:54             ` Fabian Grünbichler
  0 siblings, 1 reply; 21+ messages in thread
From: Christian Ebner @ 2025-04-18 12:45 UTC (permalink / raw)
  To: Fabian Grünbichler,
	Proxmox Backup Server development discussion, Thomas Lamprecht

On 4/18/25 14:03, Fabian Grünbichler wrote:
> On April 18, 2025 1:49 pm, Christian Ebner wrote:
>> On 4/18/25 13:06, Thomas Lamprecht wrote:
>>> Am 17.04.25 um 11:29 schrieb Fabian Grünbichler:
>>>> On April 16, 2025 4:18 pm, Christian Ebner wrote:
>>>>> Instead of directly deleting the snapshot directory and it's contents
>>>>> on a prune, move the snapshot directory into the `.trash` subfolder
>>>>> of the datastore.
>>>>>
>>>>> This allows to mark chunks which were used by these index files if
>>>>> the snapshot was pruned during an ongoing garbage collection.
>>>>> Garbage collection will clean up these files before starting with the
>>>>> marking phase 1 and read all index files after completing that phase,
>>>>> touching these chunks as well.
>>>>
>>>> some other variants to maybe consider:
>>>>
>>>> marking the snapshot itself as trash (in the manifest, or by adding a
>>>> trash marker file inside the dir) - this would mean that there is no
>>>> iterator race issue when undoing a prune, no double-pruning collisions,
>>>> .. - but it also means we need to adapt all call sites that should skip
>>>> trashed snapshots (most existing ones), which is more churn.
>>>
>>> Shouldn't we use the central iterators implementations to query indexes?
>>
>> Yes, correct me if I'm wrong, I have not checked all call sites yet, but
>> index files are mostly accessed by going through the manifest, either via
>> BackupManifest::files or at least verified via
>> BackupManifest::verify_file, as that's also where encryption and
>> verification state are stored.
>>
>> So adding a label to store a trashed state there would work out just
>> fine, and filtering these snapshots for listing, sync jobs, etc. is then
>> fine as well. Also, fetching the previous backup snapshot for fast
>> incremental mode will work, although it would require additional filtering.
>>
>> Although, I'm a bit concerned about performance for the content listing
>> if we keep and iterate all of the pruned snapshots. After all they will
>> persist until next GC, which could lead to a lot of accumulated snapshots.
> 
> that's a fair point, in some environments this might be cumbersome..
> OTOH, those are exactly the environments that would/should run GC often
> I guess, so maybe it's not that bad?

Well, the content listing performance is already problematic for some 
setups, so I would like to avoid adding to that problem :)

>> One further issue I see with that approach is again sync jobs, which now
>> do not see the trashed snapshot on the target and try to re-sync it? Or
>> would we include that information for the sync jobs to skip over? Would
>> be a bit strange however if the snapshot is not trashed on the source side.
> 
> they'd only resync it if it's after the last local one, unless it's a
> "sync missing" special sync, so this is not different to the current
> state? at least, if syncing a trashed snapshot using the same snapshot
> is allowed and just undoes the trashing?
> 
>> Also, thinking about UI to recover from trash: Might it be good to still
>> show the snapshots while listing, but marked with an icon, just like for
>> e.g. encryption state? Or create a dedicated window/tab to only show
>> trashed items.
> 
> yes, the snapshot list needs to get an option to include trashed ones,
> and the UI should set and handle that appropriately ;)
> 
>> All in all storing the trash information on the manifest might not be
>> the better option. Given the above issues, I'm leaning more towards a
>> separate folder structure for this.
> 
> most of the above issues apply to both variants anyway - the main
> difference is that with the separate folder iterating access needs to
> opt-into including trashed snapshots, so only does extra work in case
> that is desired, whereas in the manifest one the extra work is already
> done by the time we can decide to skip/filter out a snapshot because
> it's trash.
> 
> maybe a summary would be:
> 
> pro separate folder:
> - less work when only iterating over non-trash or only over trash
> - no need to parse manifest where it is currently not parsed
> con separate folder:
> - more work/complicated handling when iterating over both trash and non-trash
> - more work to put something into/out of the trash
> 
> ?

Hmm, maybe a variant where we do not rely on the manifest or a dedicated 
folder marker at all, but rather rename the snapshot folder itself to 
either be hidden or carry a dedicated pre/postfix? Similar to your 
marker file suggestion, but without the marker file, and with an easy way 
to skip reading such snapshot folders to begin with. It would then require 
the snapshot creation to perform some additional checking and the 
iterators to get variants with and without the hidden entries, but it 
should reduce the issues from both variants discussed above?
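
A minimal sketch of that rename-in-place variant; the ".trashed-" prefix
is an invented convention and all locking is omitted:

use std::fs;
use std::io;
use std::path::Path;

/// Hide a snapshot directory by renaming it in place, so the default
/// iterators skip it like any other hidden entry; undoing the prune is
/// the reverse rename. Sketch only, without the additional checks
/// mentioned above.
fn trash_in_place(snapshot_dir: &Path) -> io::Result<()> {
    let name = snapshot_dir
        .file_name()
        .and_then(|name| name.to_str())
        .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidInput, "invalid snapshot dir"))?;
    let hidden = snapshot_dir.with_file_name(format!(".trashed-{name}"));
    fs::rename(snapshot_dir, hidden)
}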


_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [pbs-devel] [RFC proxmox-backup 3/4] datastore: move snapshots to trash folder on destroy
  2025-04-18 12:45           ` Christian Ebner
@ 2025-04-22  7:54             ` Fabian Grünbichler
  2025-04-29 11:27               ` Christian Ebner
  0 siblings, 1 reply; 21+ messages in thread
From: Fabian Grünbichler @ 2025-04-22  7:54 UTC (permalink / raw)
  To: Christian Ebner, Proxmox Backup Server development discussion,
	Thomas Lamprecht


> Christian Ebner <c.ebner@proxmox.com> hat am 18.04.2025 14:45 CEST geschrieben:
> 
>  
> On 4/18/25 14:03, Fabian Grünbichler wrote:
> > On April 18, 2025 1:49 pm, Christian Ebner wrote:
> >> On 4/18/25 13:06, Thomas Lamprecht wrote:
> >>> Am 17.04.25 um 11:29 schrieb Fabian Grünbichler:
> >>>> On April 16, 2025 4:18 pm, Christian Ebner wrote:
> >>>>> Instead of directly deleting the snapshot directory and it's contents
> >>>>> on a prune, move the snapshot directory into the `.trash` subfolder
> >>>>> of the datastore.
> >>>>>
> >>>>> This allows to mark chunks which were used by these index files if
> >>>>> the snapshot was pruned during an ongoing garbage collection.
> >>>>> Garbage collection will clean up these files before starting with the
> >>>>> marking phase 1 and read all index files after completing that phase,
> >>>>> touching these chunks as well.
> >>>>
> >>>> some other variants to maybe consider:
> >>>>
> >>>> marking the snapshot itself as trash (in the manifest, or by adding a
> >>>> trash marker file inside the dir) - this would mean that there is no
> >>>> iterator race issue when undoing a prune, no double-pruning collisions,
> >>>> .. - but it also means we need to adapt all call sites that should skip
> >>>> trashed snapshots (most existing ones), which is more churn.
> >>>
> >>> Shouldn't we use the central iterators implementations to query indexes?
> >>
> >> Yes, correct me if I'm wrong, I have not checked all call sites yet, but
> >> index files are mostly accessed by going through the manifest, either via
> >> BackupManifest::files or at least verified via
> >> BackupManifest::verify_file, as that's also where encryption and
> >> verification state are stored.
> >>
> >> So adding a label to store a trashed state there would work out just
> >> fine, and filtering these snapshots for listing, sync jobs, etc. is then
> >> fine as well. Also, fetching the previous backup snapshot for fast
> >> incremental mode will work, although it would require additional filtering.
> >>
> >> Although, I'm a bit concerned about performance for the content listing
> >> if we keep and iterate all of the pruned snapshots. After all they will
> >> persist until next GC, which could lead to a lot of accumulated snapshots.
> > 
> > that's a fair point, in some environments this might be cumbersome..
> > OTOH, those are exactly the environments that would/should run GC often
> > I guess, so maybe it's not that bad?
> 
> Well, the content listing performance is already problematic for some 
> setups, so I would like to avoid adding to that problem :)

sure, but that is a two-fold problem:
- our UI code (PBS and PVE) not caching the result properly/using a streaming
  API to access the content tree
- slow storage and big datastores

> >> One further issue I see with that approach is again sync jobs, which now
> >> do not see the trashed snapshot on the target and try to re-sync it? Or
> >> would we include that information for the sync jobs to skip over? Would
> >> be a bit strange however if the snapshot is not trashed on the source side.
> > 
> > they'd only resync it if it's after the last local one, unless it's a
> > "sync missing" special sync, so this is not different to the current
> > state? at least, if syncing a trashed snapshot using the same snapshot
> > is allowed and just undoes the trashing?
> > 
> >> Also, thinking about UI to recover from trash: Might it be good to still
> >> show the snapshots while listing, but marked with an icon, just like for
> >> e.g. encryption state? Or create a dedicated window/tab to only show
> >> trashed items.
> > 
> > yes, the snapshot list needs to get an option to include trashed ones,
> > and the UI should set and handle that appropriately ;)
> > 
> >> All in all storing the trash information on the manifest might not be
> >> the better option. Given the above issues, I'm leaning more towards a
> >> separate folder structure for this.
> > 
> > most of the above issues apply to both variants anyway - the main
> > difference is that with the separate folder iterating access needs to
> > opt-into including trashed snapshots, so only does extra work in case
> > that is desired, whereas in the manifest one the extra work is already
> > done by the time we can decide to skip/filter out a snapshot because
> > it's trash.
> > 
> > maybe a summary would be:
> > 
> > pro separate folder:
> > - less work when only iterating over non-trash or only over trash
> > - no need to parse manifest where it is currently not parsed
> > con separate folder:
> > - more work/complicated handling when iterating over both trash and non-trash
> > - more work to put something into/out of the trash
> > 
> > ?
> 
> Hmm, maybe a variant where we do not rely on the manifest or a dedicated 
> folder marker at all, but rather rename the snapshot folder itself to 
> either be hidden or carry a dedicated pre/postfix? Similar to your 
> marker file suggestion, but without the marker file, and with an easy way 
> to skip reading such snapshot folders to begin with. It would then require 
> the snapshot creation to perform some additional checking and the 
> iterators to get variants with and without the hidden entries, but it 
> should reduce the issues from both variants discussed above?

renaming the snapdir or moving it to an explicit (per-group?) trash dir are
the same solution with all the same upsides and downsides, AFAICT. you still
need to carefully reason about both directions (moving to trash and undoing),
in particular about interactions with GC. you still have the issue of multiple
cycles of moving to trash, adding the snapshot again, moving to trash again
(which can be handled, but is more involved compared to the flag approach),
when to lock both the trashed and the regular snapdir (or whether there is
a single lock covering both?), etc.pp.

I don't think there is any strict technical blocker for either approach, and
both have pros and cons that need to be weighed. I don't see a clear winner
(yet), maybe others have a strong(er) opinion?


_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [pbs-devel] [RFC proxmox-backup 3/4] datastore: move snapshots to trash folder on destroy
  2025-04-22  7:54             ` Fabian Grünbichler
@ 2025-04-29 11:27               ` Christian Ebner
  0 siblings, 0 replies; 21+ messages in thread
From: Christian Ebner @ 2025-04-29 11:27 UTC (permalink / raw)
  To: Fabian Grünbichler,
	Proxmox Backup Server development discussion, Thomas Lamprecht

On 4/22/25 09:54, Fabian Grünbichler wrote:
> 
> renaming the snapdir or moving it to an explicit (per-group?) trash dir are
> the same solution with all the same upsides and downsides, AFAICT. you still
> need to carefully reason about both directions (moving to trash and undoing),
> in particular about interactions with GC. you still have the issue of multiple
> cycles of moving to trash, adding the snapshot again, moving to trash again
> (which can be handled, but is more involved compared to the flag approach),
> when to lock both the trashed and the regular snapdir (or whether there is
> a single lock covering both?), etc.pp.
> 
> I don't think there is any strict technical blocker for either approach, and
> both have pros and cons that need to be weighed. I don't see a clear winner
> (yet), maybe others have a strong(er) opinion?

Since the backup groups and namespaces also need to be kept around when 
pruning, the suggested approach of using a marker file for trashed 
snapshots seems best. This allows to apply the same marker approach to 
groups and namespaces as well.

Re-creation of the same group would however require clearing all the 
trash content for that backup group, in order to not run into 
ownership/consistency issues.
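
A rough sketch of how such a marker file could be handled; the file name,
the helper names and the cleanup-on-recreate step are assumptions, not the
final implementation:

use std::fs;
use std::io;
use std::path::Path;

/// Trash a snapshot by dropping an empty ".trashed" marker into its
/// directory; untrashing just removes the marker again.
fn mark_trashed(snapshot_dir: &Path) -> io::Result<()> {
    fs::File::create(snapshot_dir.join(".trashed")).map(|_| ())
}

fn is_trashed(snapshot_dir: &Path) -> bool {
    snapshot_dir.join(".trashed").exists()
}

/// Before re-creating a group (possibly with a new owner), clear any
/// trashed snapshots left in it to avoid ownership/consistency issues.
fn clear_trashed_snapshots(group_dir: &Path) -> io::Result<()> {
    for entry in fs::read_dir(group_dir)? {
        let snapshot_dir = entry?.path();
        if snapshot_dir.is_dir() && is_trashed(&snapshot_dir) {
            fs::remove_dir_all(&snapshot_dir)?;
        }
    }
    Ok(())
}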


_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel

^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2025-04-29 11:27 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-04-16 14:17 [pbs-devel] [RFC proxmox-backup 0/4] implement trash can for snapshots Christian Ebner
2025-04-16 14:18 ` [pbs-devel] [RFC proxmox-backup 1/4] datastore: always skip over base directory when listing index files Christian Ebner
2025-04-17  9:29   ` Fabian Grünbichler
2025-04-17 10:27     ` Christian Ebner
2025-04-16 14:18 ` [pbs-devel] [RFC proxmox-backup 2/4] datastore: allow to specify sub-directory for index file listing Christian Ebner
2025-04-18  9:38   ` Thomas Lamprecht
2025-04-18  9:55     ` Christian Ebner
2025-04-16 14:18 ` [pbs-devel] [RFC proxmox-backup 3/4] datastore: move snapshots to trash folder on destroy Christian Ebner
2025-04-17  9:29   ` Fabian Grünbichler
2025-04-18 11:06     ` Thomas Lamprecht
2025-04-18 11:49       ` Christian Ebner
2025-04-18 12:03         ` Fabian Grünbichler
2025-04-18 12:45           ` Christian Ebner
2025-04-22  7:54             ` Fabian Grünbichler
2025-04-29 11:27               ` Christian Ebner
2025-04-18 11:51       ` Fabian Grünbichler
2025-04-16 14:18 ` [pbs-devel] [RFC proxmox-backup 4/4] garbage collection: read pruned snapshot index files from trash Christian Ebner
2025-04-17  9:29   ` Fabian Grünbichler
2025-04-17 10:38     ` Christian Ebner
2025-04-17 11:27       ` Fabian Grünbichler
2025-04-17  9:29 ` [pbs-devel] [RFC proxmox-backup 0/4] implement trash can for snapshots Fabian Grünbichler

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox
Service provided by Proxmox Server Solutions GmbH | Privacy | Legal