public inbox for pbs-devel@lists.proxmox.com
* [pbs-devel] [PATCH proxmox-backup] garbage collection: fix rare race in chunk marking phase
@ 2025-04-14 13:38 Christian Ebner
  2025-04-15 10:55 ` [pbs-devel] superseded: " Christian Ebner
From: Christian Ebner @ 2025-04-14 13:38 UTC (permalink / raw)
  To: pbs-devel

During phase 1 of garbage collection, referenced chunks are marked as
in use by iterating over all index files and updating the atime of the
chunks they reference.
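
In essence, phase 1 boils down to the following (a minimal sketch; the
names `chunk_digests` and `update_chunk_atime` are illustrative
placeholders, the real code goes through `index_mark_used_chunks`):

    // Simplified phase 1: touch every chunk referenced by any index
    // file, so phase 2 can later sweep chunks whose atime stayed old.
    for path in datastore.list_index_files()? {
        if let Some(index) = datastore.open_index_reader(&path)? {
            for digest in index.chunk_digests() {
                // Updating the atime marks the chunk as in use.
                datastore.update_chunk_atime(&digest)?;
            }
        }
    }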

In an edge case for long-running garbage collection jobs, a snapshot
created after the start of GC may reuse known chunks from a previous
snapshot, while the index of that previous snapshot disappears before
the marking phase can reach it (e.g. pruned because the retention
setting keeps only 1 snapshot). Known chunks from that previous index
file might then never be marked, given that no other index file
references them.

Since commit 74361da8 ("garbage collection: generate index file list
via datastore iterators") this is even less likely, as the iteration
now also reads index files added during phase 1, so either the new or
the previous index file will account for these chunks (the previous
backup snapshot can only be pruned after the new one has finished,
since it is locked). A small race window remains, however, between
reading the snapshots in the backup group and reading the actual index
files for marking.

Fix this race by:
1. checking whether the last snapshot of a group disappeared, and if so
2. generating the list again, looking for new index files not
   previously accounted for,
3. locking the group exclusively if the snapshot list still changed
   after the 10th retry, to avoid possible endless looping (this causes
   concurrent operations on this group to fail).
A condensed sketch of the resulting per-group control flow is shown
below.
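
Condensed, the per-group control flow added below looks roughly like
this (`mark_group_indices` is an illustrative placeholder for the
marking loop in the actual diff, which additionally tracks already
processed index files across retries):

    let mut retry_counter = 0;
    loop {
        // Last resort on the final attempt: hold the exclusive group
        // lock so the snapshot list can no longer change underneath.
        let _lock = (retry_counter == 10).then(|| group.lock());
        // Reports true if the last snapshot of the group vanished
        // while its index files were being read.
        let needs_retry = mark_group_indices(&group)?;
        if !needs_retry {
            break;
        }
        retry_counter += 1;
    }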

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
 pbs-datastore/src/datastore.rs | 110 +++++++++++++++++++++++----------
 1 file changed, 76 insertions(+), 34 deletions(-)

diff --git a/pbs-datastore/src/datastore.rs b/pbs-datastore/src/datastore.rs
index fbd9eddba..dba80412f 100644
--- a/pbs-datastore/src/datastore.rs
+++ b/pbs-datastore/src/datastore.rs
@@ -1129,7 +1129,7 @@ impl DataStore {
         // the detected index files not following the iterators logic.
 
         let mut unprocessed_index_list = self.list_index_files()?;
-        let index_count = unprocessed_index_list.len();
+        let mut index_count = unprocessed_index_list.len();
 
         let mut chunk_lru_cache = LruCache::new(cache_capacity);
         let mut processed_index_files = 0;
@@ -1143,41 +1143,83 @@ impl DataStore {
             let namespace = namespace.context("iterating namespaces failed")?;
             for group in arc_self.iter_backup_groups(namespace)? {
                 let group = group.context("iterating backup groups failed")?;
-                let mut snapshots = group.list_backups().context("listing snapshots failed")?;
-                // Sort by snapshot timestamp to iterate over consecutive snapshots for each image.
-                BackupInfo::sort_list(&mut snapshots, true);
-                for snapshot in snapshots {
-                    for file in snapshot.files {
-                        worker.check_abort()?;
-                        worker.fail_on_shutdown()?;
-
-                        let mut path = snapshot.backup_dir.full_path();
-                        path.push(file);
-
-                        let index = match self.open_index_reader(&path)? {
-                            Some(index) => index,
-                            None => continue,
-                        };
-                        self.index_mark_used_chunks(
-                            index,
-                            &path,
-                            &mut chunk_lru_cache,
-                            status,
-                            worker,
-                        )?;
-
-                        unprocessed_index_list.remove(&path);
-
-                        let percentage = (processed_index_files + 1) * 100 / index_count;
-                        if percentage > last_percentage {
-                            info!(
-                                "marked {percentage}% ({} of {index_count} index files)",
-                                processed_index_files + 1,
-                            );
-                            last_percentage = percentage;
+
+                // Avoid race between listing/marking of snapshots by GC and pruning the last
+                // snapshot in the group, following a new snapshot creation. Otherwise known chunks
+                // might only be referenced by the new snapshot, so it must be read as well.
+                let mut retry_counter = 0;
+                let mut processed_group_indices = HashSet::new();
+                let mut needs_retry;
+                loop {
+                    needs_retry = false;
+                    let _lock = if retry_counter == 10 {
+                        // Last resort, exclusively lock the backup group to make sure no new
+                        // backups can be written. This will only trigger if none of the previous
+                        // retries converged successfully, which is very unlikely.
+                        Some(group.lock())
+                    } else {
+                        None
+                    };
+
+                    let mut snapshots = group.list_backups().context("listing snapshots failed")?;
+                    let snapshot_count = snapshots.len();
+                    // Sort by snapshot timestamp to iterate over consecutive snapshots for each image.
+                    BackupInfo::sort_list(&mut snapshots, true);
+                    for (count, snapshot) in snapshots.into_iter().enumerate() {
+                        for file in snapshot.files {
+                            worker.check_abort()?;
+                            worker.fail_on_shutdown()?;
+
+                            let mut path = snapshot.backup_dir.full_path();
+                            path.push(file);
+
+                            // Avoid reprocessing of already seen index files on retry
+                            if retry_counter > 0 && processed_group_indices.contains(&path) {
+                                continue;
+                            }
+
+                            let index = match self.open_index_reader(&path)? {
+                                Some(index) => index,
+                                None => {
+                                    if count + 1 == snapshot_count && !snapshot.backup_dir.full_path().exists() {
+                                        needs_retry = true;
+                                    }
+                                    unprocessed_index_list.remove(&path);
+                                    continue;
+                                }
+                            };
+                            self.index_mark_used_chunks(
+                                index,
+                                &path,
+                                &mut chunk_lru_cache,
+                                status,
+                                worker,
+                            )?;
+                            processed_group_indices.insert(path.clone());
+
+                            if !unprocessed_index_list.remove(&path) {
+                                info!("Encountered new index file '{path:?}', increment total index file count");
+                                index_count += 1;
+                            }
+
+                            let percentage = (processed_index_files + 1) * 100 / index_count;
+                            if percentage > last_percentage {
+                                info!(
+                                    "marked {percentage}% ({} of {index_count} index files)",
+                                    processed_index_files + 1,
+                                );
+                                last_percentage = percentage;
+                            }
+                            processed_index_files += 1;
                         }
-                        processed_index_files += 1;
                     }
+
+                    if needs_retry {
+                        retry_counter += 1;
+                        continue;
+                    }
+
+                    break;
                 }
             }
         }
-- 
2.39.5



_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel


* [pbs-devel] superseded: [PATCH proxmox-backup] garbage collection: fix rare race in chunk marking phase
  2025-04-14 13:38 [pbs-devel] [PATCH proxmox-backup] garbage collection: fix rare race in chunk marking phase Christian Ebner
@ 2025-04-15 10:55 ` Christian Ebner
From: Christian Ebner @ 2025-04-15 10:55 UTC (permalink / raw)
  To: pbs-devel

superseded-by version 2:
https://lore.proxmox.com/pbs-devel/20250415105416.215805-1-c.ebner@proxmox.com/T/


