public inbox for pbs-devel@lists.proxmox.com
* [pbs-devel] [PATCH v2 proxmox-backup 0/4] GC: avoid multiple atime updates
@ 2025-03-10 11:16 Christian Ebner
  2025-03-10 11:16 ` [pbs-devel] [PATCH v2 proxmox-backup 1/4] datastore: restrict datastores list_images method scope to module Christian Ebner
                   ` (3 more replies)
  0 siblings, 4 replies; 14+ messages in thread
From: Christian Ebner @ 2025-03-10 11:16 UTC (permalink / raw)
  To: pbs-devel

This patch series implements logic to greatly improve the performance
of phase 1 garbage collection by avoiding repeated atime updates on
the same chunk.

Currently, phase 1 GC iterates over all folders in the datastore,
collecting all image index files without making any assumptions about
the logical layout (e.g. namespaces, groups, snapshots, ...). This
avoids accidentally missing image index files located in unexpected
paths, which would leave their chunks unmarked and lead to potential
data loss.

This series improves phase 1 by iterating over index images using the
datastore's iterators and keeping track of already touched chunks across
consecutive backup snapshots, following the same principle as incremental
backup snapshots. Paths outside of the iterator logic are still taken
into account and processed: a list of all found images is generated
first, index files encountered while iterating are removed from it, and
the remaining entries are indexes at unexpected paths. These unexpected
paths are now also logged, so the user can potentially take action.

By keeping track of already seen (and therefore already updated) chunk
atimes, the atime is no longer updated over and over again on chunks
shared by consecutive backup snapshots.
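The core deduplication idea can be sketched in standalone Rust (a simplified illustration only; `touch_chunk` and `mark_snapshot` are hypothetical stand-ins, with a plain `HashSet` in place of the series' actual tracking structure):

```rust
use std::collections::HashSet;

// Hypothetical stand-in for the real utimensat-based chunk touch.
fn touch_chunk(_digest: &[u8; 32]) {
    // the real code issues a cond_touch_chunk() -> utimensat() here
}

// Mark all chunks of one index file, skipping digests already touched
// while processing earlier (consecutive) snapshots of the same group.
fn mark_snapshot(index_digests: &[[u8; 32]], touched: &mut HashSet<[u8; 32]>) -> usize {
    let mut syscalls = 0;
    for digest in index_digests {
        // insert() returns true only for digests not seen before
        if touched.insert(*digest) {
            touch_chunk(digest);
            syscalls += 1;
        }
    }
    syscalls
}
```

Since consecutive snapshots of a group share most of their chunks, only the delta causes syscalls, which is where the roughly tenfold reduction in `utimensat` calls below comes from.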

Most notable changes since version 1 (thanks Fabian for the feedback):
- Logically iterate using pre-existing iterators instead of constructing
  data structure for iteration when listing images.
- Tested that double listing does not affect runtime.
- Chunks are now remembered for all archives per snapshot, not just a
  single archive per snapshot as previously. This mimics the backup
  behaviour more closely and gives additional gains in some cases.

Statistics were generated by averaging 3 GC runtimes, each measured after
an initial run to warm up caches. Datastores A and B (192 index files) are
unrelated, containing "real" backups. The syscall counts were generated
using `strace -f -e utimensat -p $(pidof proxmox-backup-proxy)` and (after
a small cleanup) `wc -l`.
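For reference, the cleanup-and-count step can be reproduced offline from a saved log (a sketch only; the sample lines below are fabricated stand-ins for a real capture, and `gc.strace` is a hypothetical file name):

```shell
# In practice the log would come from something like:
#   strace -f -e utimensat -o gc.strace -p "$(pidof proxmox-backup-proxy)"
# Fabricated sample lines standing in for a real capture:
cat > gc.strace <<'EOF'
1234  utimensat(AT_FDCWD, "/store/.chunks/0000/...", NULL, 0) = 0
1234  utimensat(AT_FDCWD, "/store/.chunks/0001/...", NULL, 0) = 0
1234  +++ exited with 0 +++
EOF
# the "small cleanup": count only the utimensat lines
grep -c 'utimensat(' gc.strace   # -> 2
```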

datastore A on spinning disk:
unpatched: 117 ± 4 s,    utimensat calls: 12059913
version 1: 27.6 ± 0.5 s, utimensat calls:  1178913
version 2: 24.3 ± 0.5 s, utimensat calls:  1120317

datastore B on SSD:
unpatched: 27 ± 1 s,     utimensat calls: 2032380
version 1: 14.3 ± 0.5 s, utimensat calls:  565417
version 2: 15.1 ± 0.2 s, utimensat calls:  564617

datastore B via NFS export:
unpatched: aborted after 10 min - no progress (while the patched versions did show progress)
version 1: 34 min 3 s
version 2: 32 min 19 s
Above results are from only 1 run, since GC takes a long time on the
NFS-shared datastore.

Christian Ebner (4):
  datastore: restrict datastores list_images method scope to module
  datastore: add helper method to open index reader from path
  garbage collection: allow to keep track of already touched chunks
  fix #5331: garbage collection: avoid multiple chunk atime updates

 pbs-datastore/src/datastore.rs | 206 ++++++++++++++++++++++++---------
 1 file changed, 154 insertions(+), 52 deletions(-)

-- 
2.39.5



_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [pbs-devel] [PATCH v2 proxmox-backup 1/4] datastore: restrict datastores list_images method scope to module
  2025-03-10 11:16 [pbs-devel] [PATCH v2 proxmox-backup 0/4] GC: avoid multiple atime updates Christian Ebner
@ 2025-03-10 11:16 ` Christian Ebner
  2025-03-17 15:00   ` [pbs-devel] applied: " Fabian Grünbichler
  2025-03-10 11:16 ` [pbs-devel] [PATCH v2 proxmox-backup 2/4] datastore: add helper method to open index reader from path Christian Ebner
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 14+ messages in thread
From: Christian Ebner @ 2025-03-10 11:16 UTC (permalink / raw)
  To: pbs-devel

Drop the pub scope of `DataStore`'s `list_images` method.

This method is only used to generate a list of index files found in
the datastore for iteration during garbage collection. There are no
other call sites, and it is intended to be used only within the
module itself. This allows more flexibility for future method
signature adaptations.

No functional changes.

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
changes since version 1:
- no changes

 pbs-datastore/src/datastore.rs | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/pbs-datastore/src/datastore.rs b/pbs-datastore/src/datastore.rs
index 75c0c16ab..a6a91ca79 100644
--- a/pbs-datastore/src/datastore.rs
+++ b/pbs-datastore/src/datastore.rs
@@ -970,7 +970,7 @@ impl DataStore {
         ListGroups::new(Arc::clone(self), ns)?.collect()
     }
 
-    pub fn list_images(&self) -> Result<Vec<PathBuf>, Error> {
+    fn list_images(&self) -> Result<Vec<PathBuf>, Error> {
         let base = self.base_path();
 
         let mut list = vec![];
-- 
2.39.5






* [pbs-devel] [PATCH v2 proxmox-backup 2/4] datastore: add helper method to open index reader from path
  2025-03-10 11:16 [pbs-devel] [PATCH v2 proxmox-backup 0/4] GC: avoid multiple atime updates Christian Ebner
  2025-03-10 11:16 ` [pbs-devel] [PATCH v2 proxmox-backup 1/4] datastore: restrict datastores list_images method scope to module Christian Ebner
@ 2025-03-10 11:16 ` Christian Ebner
  2025-03-17 14:59   ` Fabian Grünbichler
  2025-03-10 11:16 ` [pbs-devel] [PATCH v2 proxmox-backup 3/4] garbage collection: allow to keep track of already touched chunks Christian Ebner
  2025-03-10 11:16 ` [pbs-devel] [PATCH v2 proxmox-backup 4/4] fix #5331: garbage collection: avoid multiple chunk atime updates Christian Ebner
  3 siblings, 1 reply; 14+ messages in thread
From: Christian Ebner @ 2025-03-10 11:16 UTC (permalink / raw)
  To: pbs-devel

Refactor the archive type detection and index file reader opening,
including their error handling, into a helper method for better
reusability.

This allows using the same logic for both expected and unexpected
image paths when iterating through the datastore in a hierarchical
manner.

Improve error handling by switching to anyhow's error context.

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
changes since version 1:
- not present in previous version

 pbs-datastore/src/datastore.rs | 66 ++++++++++++++++++++++------------
 1 file changed, 44 insertions(+), 22 deletions(-)

diff --git a/pbs-datastore/src/datastore.rs b/pbs-datastore/src/datastore.rs
index a6a91ca79..72bc9f77f 100644
--- a/pbs-datastore/src/datastore.rs
+++ b/pbs-datastore/src/datastore.rs
@@ -5,7 +5,7 @@ use std::os::unix::io::AsRawFd;
 use std::path::{Path, PathBuf};
 use std::sync::{Arc, LazyLock, Mutex};
 
-use anyhow::{bail, format_err, Error};
+use anyhow::{bail, format_err, Context, Error};
 use nix::unistd::{unlinkat, UnlinkatFlags};
 use tracing::{info, warn};
 
@@ -1029,10 +1029,47 @@ impl DataStore {
         Ok(list)
     }
 
+    // Similar to open index, but ignore index files with blob or unknown archive type.
+    // Further, do not fail if file vanished.
+    fn open_index_reader(&self, path: &Path) -> Result<Option<Box<dyn IndexFile>>, Error> {
+        let archive_type = match ArchiveType::from_path(path) {
+            Ok(archive_type) => archive_type,
+            // ignore archives with unknown archive type
+            Err(_) => return Ok(None),
+        };
+
+        let file = match std::fs::File::open(path) {
+            Ok(file) => file,
+            // ignore vanished files
+            Err(err) if err.kind() == io::ErrorKind::NotFound => return Ok(None),
+            Err(err) => {
+                return Err(
+                    Error::from(err).context(format!("can't open file {}", path.to_string_lossy()))
+                )
+            }
+        };
+
+        match archive_type {
+            ArchiveType::FixedIndex => {
+                let reader = FixedIndexReader::new(file)
+                    .context(format!("can't open fixed index {}", path.to_string_lossy()))?;
+                Ok(Some(Box::new(reader)))
+            }
+            ArchiveType::DynamicIndex => {
+                let reader = DynamicIndexReader::new(file).context(format!(
+                    "can't open dynamic index {}",
+                    path.to_string_lossy()
+                ))?;
+                Ok(Some(Box::new(reader)))
+            }
+            ArchiveType::Blob => Ok(None),
+        }
+    }
+
     // mark chunks  used by ``index`` as used
-    fn index_mark_used_chunks<I: IndexFile>(
+    fn index_mark_used_chunks(
         &self,
-        index: I,
+        index: Box<dyn IndexFile>,
         file_name: &Path, // only used for error reporting
         status: &mut GarbageCollectionStatus,
         worker: &dyn WorkerTaskContext,
@@ -1090,24 +1127,8 @@ impl DataStore {
                 }
             }
 
-            match std::fs::File::open(&img) {
-                Ok(file) => {
-                    if let Ok(archive_type) = ArchiveType::from_path(&img) {
-                        if archive_type == ArchiveType::FixedIndex {
-                            let index = FixedIndexReader::new(file).map_err(|e| {
-                                format_err!("can't read index '{}' - {}", img.to_string_lossy(), e)
-                            })?;
-                            self.index_mark_used_chunks(index, &img, status, worker)?;
-                        } else if archive_type == ArchiveType::DynamicIndex {
-                            let index = DynamicIndexReader::new(file).map_err(|e| {
-                                format_err!("can't read index '{}' - {}", img.to_string_lossy(), e)
-                            })?;
-                            self.index_mark_used_chunks(index, &img, status, worker)?;
-                        }
-                    }
-                }
-                Err(err) if err.kind() == io::ErrorKind::NotFound => (), // ignore vanished files
-                Err(err) => bail!("can't open index {} - {}", img.to_string_lossy(), err),
+            if let Some(index) = self.open_index_reader(&img)? {
+                self.index_mark_used_chunks(index, &img, status, worker)?;
             }
 
             let percentage = (i + 1) * 100 / image_count;
@@ -1173,7 +1194,8 @@ impl DataStore {
 
             info!("Start GC phase1 (mark used chunks)");
 
-            self.mark_used_chunks(&mut gc_status, worker)?;
+            self.mark_used_chunks(&mut gc_status, worker)
+                .map_err(|err| format_err!("marking used chunks failed - {err:#}"))?;
 
             info!("Start GC phase2 (sweep unused chunks)");
             self.inner.chunk_store.sweep_unused_chunks(
-- 
2.39.5






* [pbs-devel] [PATCH v2 proxmox-backup 3/4] garbage collection: allow to keep track of already touched chunks
  2025-03-10 11:16 [pbs-devel] [PATCH v2 proxmox-backup 0/4] GC: avoid multiple atime updates Christian Ebner
  2025-03-10 11:16 ` [pbs-devel] [PATCH v2 proxmox-backup 1/4] datastore: restrict datastores list_images method scope to module Christian Ebner
  2025-03-10 11:16 ` [pbs-devel] [PATCH v2 proxmox-backup 2/4] datastore: add helper method to open index reader from path Christian Ebner
@ 2025-03-10 11:16 ` Christian Ebner
  2025-03-17 14:55   ` Fabian Grünbichler
  2025-03-10 11:16 ` [pbs-devel] [PATCH v2 proxmox-backup 4/4] fix #5331: garbage collection: avoid multiple chunk atime updates Christian Ebner
  3 siblings, 1 reply; 14+ messages in thread
From: Christian Ebner @ 2025-03-10 11:16 UTC (permalink / raw)
  To: pbs-devel

Implements the `TouchedChunks` struct and methods to keep track of
already touched chunks during garbage collection phase 1, in order to
avoid repeated, computationally and I/O intensive atime updates via
syscalls.

By inserting a digest, the chunk is considered touched and can be
ignored on subsequent encounters. To limit memory usage, the
structure allows resetting the chunk status, flagging entries as seen
previous to the reset. A subsequent insert will then flag them as
seen after the reset. Chunks not seen again after a reset will be
cleared from the structure by the next reset call, removing them from
memory.

This allows resetting the tracking state after each processed image
index file, mimicking the incremental backup behaviour for known
chunks and limiting the memory footprint.
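The insert/reset life cycle described above can be exercised with a standalone copy of the struct (same code as in the diff below, reproduced here only to illustrate the semantics):

```rust
use std::collections::HashMap;

struct TouchedChunks {
    // digest -> seen since the last reset
    list: HashMap<[u8; 32], bool>,
}

impl TouchedChunks {
    fn new() -> Self {
        Self {
            list: HashMap::new(),
        }
    }

    // Clear untouched chunks and reset the touched marker for others.
    fn reset(&mut self) {
        let mut new_list = HashMap::new();
        for (digest, touched) in self.list.drain() {
            if touched {
                new_list.insert(digest, false);
            }
        }
        self.list = new_list;
    }

    // Insert the digest in the list of touched chunks.
    // Returns true if the chunk was already present, false otherwise.
    fn insert(&mut self, digest: [u8; 32]) -> bool {
        self.list.insert(digest, true).is_some()
    }
}
```

A digest inserted before a reset is still reported as present on the first insert after that reset, but is dropped once two resets pass without it being seen again.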

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
changes since version 1:
- no changes

 pbs-datastore/src/datastore.rs | 29 +++++++++++++++++++++++++++++
 1 file changed, 29 insertions(+)

diff --git a/pbs-datastore/src/datastore.rs b/pbs-datastore/src/datastore.rs
index 72bc9f77f..fdbb33a98 100644
--- a/pbs-datastore/src/datastore.rs
+++ b/pbs-datastore/src/datastore.rs
@@ -1585,3 +1585,32 @@ impl DataStore {
         Ok(())
     }
 }
+
+struct TouchedChunks {
+    list: HashMap<[u8; 32], bool>,
+}
+
+impl TouchedChunks {
+    fn new() -> Self {
+        Self {
+            list: HashMap::new(),
+        }
+    }
+
+    // Clear untouched chunks and reset the touched marker for others.
+    fn reset(&mut self) {
+        let mut new_list = HashMap::new();
+        for (digest, touched) in self.list.drain() {
+            if touched {
+                new_list.insert(digest, false);
+            }
+        }
+        self.list = new_list;
+    }
+
+    // Insert the digest in the list of touched chunks.
+    // Returns true if the chunk was already present, false otherwise.
+    fn insert(&mut self, digest: [u8; 32]) -> bool {
+        self.list.insert(digest, true).is_some()
+    }
+}
-- 
2.39.5






* [pbs-devel] [PATCH v2 proxmox-backup 4/4] fix #5331: garbage collection: avoid multiple chunk atime updates
  2025-03-10 11:16 [pbs-devel] [PATCH v2 proxmox-backup 0/4] GC: avoid multiple atime updates Christian Ebner
                   ` (2 preceding siblings ...)
  2025-03-10 11:16 ` [pbs-devel] [PATCH v2 proxmox-backup 3/4] garbage collection: allow to keep track of already touched chunks Christian Ebner
@ 2025-03-10 11:16 ` Christian Ebner
  2025-03-10 11:40   ` Christian Ebner
  2025-03-17 14:55   ` Fabian Grünbichler
  3 siblings, 2 replies; 14+ messages in thread
From: Christian Ebner @ 2025-03-10 11:16 UTC (permalink / raw)
  To: pbs-devel

Reduce the number of atime updates on the same chunk by logically
iterating over image index files, following the incremental backup
logic. By inserting paths for encountered images during
`list_images` using the GroupedImageList structure, the iteration
now happens for the same image filenames in the same image namespace
and group, in an order based on the snapshot timestamp. For each
image, keep track of the encountered chunk digests and remember them
as seen for the next snapshot. Chunks which have been encountered in
the previous image index, but are no longer present, are removed from
the list after each image, in order to reduce the memory footprint.

Fixes: https://bugzilla.proxmox.com/show_bug.cgi?id=5331
Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
changes since version 1:
- Use pre-existing datastore iterator helpers, following the logic other
  datastore operations take.
- Chunks are now remembered for all archives per snapshot, not just a
  single archive per snapshot as previously, this mimics more closely
  the backup behaviour.

 pbs-datastore/src/datastore.rs | 117 +++++++++++++++++++++++----------
 1 file changed, 84 insertions(+), 33 deletions(-)

diff --git a/pbs-datastore/src/datastore.rs b/pbs-datastore/src/datastore.rs
index fdbb33a98..a80343d9b 100644
--- a/pbs-datastore/src/datastore.rs
+++ b/pbs-datastore/src/datastore.rs
@@ -25,7 +25,7 @@ use pbs_api_types::{
     MaintenanceMode, MaintenanceType, Operation, UPID,
 };
 
-use crate::backup_info::{BackupDir, BackupGroup};
+use crate::backup_info::{BackupDir, BackupGroup, BackupInfo};
 use crate::chunk_store::ChunkStore;
 use crate::dynamic_index::{DynamicIndexReader, DynamicIndexWriter};
 use crate::fixed_index::{FixedIndexReader, FixedIndexWriter};
@@ -970,10 +970,10 @@ impl DataStore {
         ListGroups::new(Arc::clone(self), ns)?.collect()
     }
 
-    fn list_images(&self) -> Result<Vec<PathBuf>, Error> {
+    fn list_images(&self) -> Result<HashSet<PathBuf>, Error> {
         let base = self.base_path();
 
-        let mut list = vec![];
+        let mut list = HashSet::new();
 
         use walkdir::WalkDir;
 
@@ -1021,7 +1021,7 @@ impl DataStore {
                 if archive_type == ArchiveType::FixedIndex
                     || archive_type == ArchiveType::DynamicIndex
                 {
-                    list.push(path);
+                    list.insert(path);
                 }
             }
         }
@@ -1071,6 +1071,7 @@ impl DataStore {
         &self,
         index: Box<dyn IndexFile>,
         file_name: &Path, // only used for error reporting
+        touched_chunks: &mut TouchedChunks,
         status: &mut GarbageCollectionStatus,
         worker: &dyn WorkerTaskContext,
     ) -> Result<(), Error> {
@@ -1081,6 +1082,12 @@ impl DataStore {
             worker.check_abort()?;
             worker.fail_on_shutdown()?;
             let digest = index.index_digest(pos).unwrap();
+
+            // Avoid multiple expensive atime updates by utimensat
+            if touched_chunks.insert(*digest) {
+                continue;
+            }
+
             if !self.inner.chunk_store.cond_touch_chunk(digest, false)? {
                 let hex = hex::encode(digest);
                 warn!(
@@ -1107,43 +1114,87 @@ impl DataStore {
         status: &mut GarbageCollectionStatus,
         worker: &dyn WorkerTaskContext,
     ) -> Result<(), Error> {
-        let image_list = self.list_images()?;
-        let image_count = image_list.len();
-
+        // Iterate twice over the datastore to fetch images, even if this comes with an additional
+        // runtime cost:
+        // - First iteration to find all index files, no matter if they are in a location expected
+        //   by the datastore's hierarchy
+        // - Iterate using the datastore's helpers, so the namespaces, groups and snapshots are
+        //   looked up given the expected hierarchy and iterator logic
+        //
+        // By this it is assured that all index files are used, even if they would not have been
+        // seen by the regular logic and the user is informed by the garbage collection run about
+        // the detected index files not following the iterators logic.
+
+        let mut unprocessed_image_list = self.list_images()?;
+        let image_count = unprocessed_image_list.len();
+
+        // Optimize for avoiding updates of chunks atime in same group for consecutive
+        // snapshots multiple times.
+        let mut touched_chunks = TouchedChunks::new();
+        let mut processed_images = 0;
         let mut last_percentage: usize = 0;
 
-        let mut strange_paths_count: u64 = 0;
-
-        for (i, img) in image_list.into_iter().enumerate() {
-            worker.check_abort()?;
-            worker.fail_on_shutdown()?;
-
-            if let Some(backup_dir_path) = img.parent() {
-                let backup_dir_path = backup_dir_path.strip_prefix(self.base_path())?;
-                if let Some(backup_dir_str) = backup_dir_path.to_str() {
-                    if pbs_api_types::parse_ns_and_snapshot(backup_dir_str).is_err() {
-                        strange_paths_count += 1;
+        let arc_self = Arc::new(self.clone());
+        for namespace in arc_self
+            .recursive_iter_backup_ns(BackupNamespace::root())
+            .context("creating namespace iterator failed")?
+        {
+            let namespace = namespace.context("iterating namespaces failed")?;
+            for group in arc_self.iter_backup_groups(namespace)? {
+                let group = group.context("iterating backup groups failed")?;
+                let mut snapshots = group.list_backups().context("listing snapshots failed")?;
+                // Sort by snapshot timestamp to iterate over consecutive snapshots for each image.
+                BackupInfo::sort_list(&mut snapshots, true);
+                for snapshot in snapshots {
+                    for file in snapshot.files {
+                        worker.check_abort()?;
+                        worker.fail_on_shutdown()?;
+
+                        let mut path = snapshot.backup_dir.full_path();
+                        path.push(file);
+
+                        let index = match self.open_index_reader(&path)? {
+                            Some(index) => index,
+                            None => continue,
+                        };
+                        self.index_mark_used_chunks(
+                            index,
+                            &path,
+                            &mut touched_chunks,
+                            status,
+                            worker,
+                        )?;
+
+                        unprocessed_image_list.remove(&path);
+
+                        let percentage = (processed_images + 1) * 100 / image_count;
+                        if percentage > last_percentage {
+                            info!(
+                                "marked {percentage}% ({} of {image_count} index files)",
+                                processed_images + 1,
+                            );
+                            last_percentage = percentage;
+                        }
+                        processed_images += 1;
                     }
+                    touched_chunks.reset();
                 }
             }
-
-            if let Some(index) = self.open_index_reader(&img)? {
-                self.index_mark_used_chunks(index, &img, status, worker)?;
-            }
-
-            let percentage = (i + 1) * 100 / image_count;
-            if percentage > last_percentage {
-                info!(
-                    "marked {percentage}% ({} of {image_count} index files)",
-                    i + 1,
-                );
-                last_percentage = percentage;
-            }
         }
 
+        let strange_paths_count = unprocessed_image_list.len();
         if strange_paths_count > 0 {
-            info!(
-                "found (and marked) {strange_paths_count} index files outside of expected directory scheme"
+            warn!("found {strange_paths_count} index files outside of expected directory scheme");
+        }
+        for path in unprocessed_image_list {
+            let index = match self.open_index_reader(&path)? {
+                Some(index) => index,
+                None => continue,
+            };
+            self.index_mark_used_chunks(index, &path, &mut touched_chunks, status, worker)?;
+            warn!(
+                "Marked chunks for unexpected index file at '{}'",
+                path.to_string_lossy()
             );
         }
 
-- 
2.39.5






* Re: [pbs-devel] [PATCH v2 proxmox-backup 4/4] fix #5331: garbage collection: avoid multiple chunk atime updates
  2025-03-10 11:16 ` [pbs-devel] [PATCH v2 proxmox-backup 4/4] fix #5331: garbage collection: avoid multiple chunk atime updates Christian Ebner
@ 2025-03-10 11:40   ` Christian Ebner
  2025-03-17 14:55   ` Fabian Grünbichler
  1 sibling, 0 replies; 14+ messages in thread
From: Christian Ebner @ 2025-03-10 11:40 UTC (permalink / raw)
  To: pbs-devel

Forgot to update the commit message on this patch, should be:
```
Reduce the number of atime updates on the same chunk by logically
iterating over image index files, following the datastore's iterator
logic. Index files in unexpected locations are still considered, by
generating a list of all images found in the datastore and removing
regular index files from that list, leaving the unexpected ones
behind.

To reduce the number of atime updates, keep track of the encountered
chunk digests for consecutive snapshots. Chunks which have been
encountered in the previous snapshot, but are not present in the
current one, are removed from the list in order to reduce the memory
footprint.
```
Will update this if a further iteration of the patches is requested;
please amend it accordingly otherwise.





* Re: [pbs-devel] [PATCH v2 proxmox-backup 3/4] garbage collection: allow to keep track of already touched chunks
  2025-03-10 11:16 ` [pbs-devel] [PATCH v2 proxmox-backup 3/4] garbage collection: allow to keep track of already touched chunks Christian Ebner
@ 2025-03-17 14:55   ` Fabian Grünbichler
  2025-03-17 15:39     ` Christian Ebner
  0 siblings, 1 reply; 14+ messages in thread
From: Fabian Grünbichler @ 2025-03-17 14:55 UTC (permalink / raw)
  To: Proxmox Backup Server development discussion

On March 10, 2025 12:16 pm, Christian Ebner wrote:
> Implements the `TouchedChunks` struct and methods to keep track of
> already touched chunks during garbage collection phase 1, to avoid
> multiple computational and I/O intensive atime updates via a syscall.
> 
> By inserting a digest, the chunk will be considered as touched and
> can be ignored for subsequent encounters. To limit memory usage, the
> structure allows to reset the chunk status, flagging them as seen
> previous to the reset. A subsequent insert will then flag it as seen
> after the reset. Chunks not seen after a reset, will be cleared from
> the structure by the next reset call, eliminating them from memory.
> 
> This allows to reset the tracking stat after each processes image
> index file, to mimic the incremental backup behaviour of known chunks
> and limit memory footprint.
> 
> Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
> ---
> changes since version 1:
> - no changes
> 
>  pbs-datastore/src/datastore.rs | 29 +++++++++++++++++++++++++++++
>  1 file changed, 29 insertions(+)
> 
> diff --git a/pbs-datastore/src/datastore.rs b/pbs-datastore/src/datastore.rs
> index 72bc9f77f..fdbb33a98 100644
> --- a/pbs-datastore/src/datastore.rs
> +++ b/pbs-datastore/src/datastore.rs
> @@ -1585,3 +1585,32 @@ impl DataStore {
>          Ok(())
>      }
>  }
> +
> +struct TouchedChunks {
> +    list: HashMap<[u8; 32], bool>,
> +}
> +
> +impl TouchedChunks {
> +    fn new() -> Self {
> +        Self {
> +            list: HashMap::new(),
> +        }
> +    }
> +
> +    // Clear untouched chunks and reset the touched marker for others.
> +    fn reset(&mut self) {
> +        let mut new_list = HashMap::new();
> +        for (digest, touched) in self.list.drain() {
> +            if touched {
> +                new_list.insert(digest, false);
> +            }
> +        }
> +        self.list = new_list;

this could avoid the memory allocation (and for bigger
indices/snapshots, probably multiple reallocations via the inserts) by
switching to `retain` (which gets a `&mut V` and can thus flip the value
of touched while filtering in a single pass).

despite the performance warning about it visiting empty buckets as well,
this seems to be (slightly) faster for my test datastore (would be
interesting if your test cases agree?) when benchmarking with a warmed
up cache.

I used the following on-top of your series (the shrink_to is to reduce
memory usage in case of outliers after they've been processed):

```
diff --git a/pbs-datastore/src/datastore.rs b/pbs-datastore/src/datastore.rs
index a80343d9b..d3c3f831f 100644
--- a/pbs-datastore/src/datastore.rs
+++ b/pbs-datastore/src/datastore.rs
@@ -1650,13 +1650,12 @@ impl TouchedChunks {
 
     // Clear untouched chunks and reset the touched marker for others.
     fn reset(&mut self) {
-        let mut new_list = HashMap::new();
-        for (digest, touched) in self.list.drain() {
-            if touched {
-                new_list.insert(digest, false);
-            }
-        }
-        self.list = new_list;
+        self.list.retain(|_digest, touched| {
+            *touched = !*touched;
+            !*touched
+        });
+        let max_capacity = self.list.len().saturating_add(self.list.len() / 3);
+        self.list.shrink_to(max_capacity);
     }
 
     // Insert the digest in the list of touched chunks.
```

if the above is slower for your test inputs, then at least initializing
the second HashMap with some initial capacity (the len of the previous
list?) is probably sensible..

alternatively, we could also explore (maybe as a follow-up, to
immediately realize the performance gains we already know we get from
the current approach?) some sort of LRU-based approach?

that might give us the option of:
- dropping the reset call/fn while still having bounded (even
  configurable?) memory overhead
- extend skipping touches across a broader range of snapshots/indices
  for additional performance gains (if the capacity is high enough)
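such a tracker could look roughly like this (a hypothetical sketch only,
not part of the series; a real implementation would presumably reuse an
existing LruCache type instead of the O(n) eviction shown here):

```rust
use std::collections::HashMap;

// Capacity-bounded tracker: no explicit reset() needed, the least
// recently seen digest is evicted once the capacity is exceeded.
struct LruTouched {
    map: HashMap<[u8; 32], u64>, // digest -> last-seen tick
    tick: u64,
    capacity: usize,
}

impl LruTouched {
    fn new(capacity: usize) -> Self {
        Self {
            map: HashMap::new(),
            tick: 0,
            capacity,
        }
    }

    // Returns true if the digest was already tracked (touch can be skipped).
    fn insert(&mut self, digest: [u8; 32]) -> bool {
        self.tick += 1;
        let seen = self.map.insert(digest, self.tick).is_some();
        if self.map.len() > self.capacity {
            // O(n) scan keeps the sketch short; a real LRU would track
            // recency with a linked list for O(1) eviction.
            let oldest = self.map.iter().min_by_key(|e| *e.1).map(|(k, _)| *k);
            if let Some(oldest) = oldest {
                self.map.remove(&oldest);
            }
        }
        seen
    }
}
```

this would skip touches across as many snapshots/indices as the chosen
capacity allows, instead of only across directly consecutive ones.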

> +    }
> +
> +    // Insert the digest in the list of touched chunks.
> +    // Returns true if the chunk was already present, false otherwise.
> +    fn insert(&mut self, digest: [u8; 32]) -> bool {
> +        self.list.insert(digest, true).is_some()
> +    }
> +}
> -- 
> 2.39.5
> 





* Re: [pbs-devel] [PATCH v2 proxmox-backup 4/4] fix #5331: garbage collection: avoid multiple chunk atime updates
  2025-03-10 11:16 ` [pbs-devel] [PATCH v2 proxmox-backup 4/4] fix #5331: garbage collection: avoid multiple chunk atime updates Christian Ebner
  2025-03-10 11:40   ` Christian Ebner
@ 2025-03-17 14:55   ` Fabian Grünbichler
  2025-03-17 15:43     ` Christian Ebner
  1 sibling, 1 reply; 14+ messages in thread
From: Fabian Grünbichler @ 2025-03-17 14:55 UTC (permalink / raw)
  To: Proxmox Backup Server development discussion

On March 10, 2025 12:16 pm, Christian Ebner wrote:
> Reduce the number of atime updates on the same chunk by logically
> iterating over image index files, following the incremental backup
> logic. By inserting paths for encountered images during
> `list_images` using the GroupedImageList structure, the iteration
> happens now for the same image filenames in the same image namespace
> and group in a order based on the snapshot timestamp. For each image,
> keep track of the encountered chunk digests, and remember these as
> seen for the next snapshot. Chunks which have been encountered in the
> previous image index, but are not present anymore are removed from
> the list after each image, in order to reduce memory footprint.
> 
> Fixes: https://bugzilla.proxmox.com/show_bug.cgi?id=5331
> Signed-off-by: Christian Ebner <c.ebner@proxmox.com>

technically this could be split into two parts - one for the iterating
change, one for the performance optimization? with the latter part maybe
being combined with patch#3 and the order switched around? ;)

> ---
> changes since version 1:
> - Use pre-existing datastore iterator helpers, following the logic other
>   datastore operations take.
> - Chunks are now remembered for all archives per snapshot, not just a
>   single archive per snapshot as previously, this mimics more closely
>   the backup behaviour.
> 
>  pbs-datastore/src/datastore.rs | 117 +++++++++++++++++++++++----------
>  1 file changed, 84 insertions(+), 33 deletions(-)
> 
> diff --git a/pbs-datastore/src/datastore.rs b/pbs-datastore/src/datastore.rs
> index fdbb33a98..a80343d9b 100644
> --- a/pbs-datastore/src/datastore.rs
> +++ b/pbs-datastore/src/datastore.rs
> @@ -25,7 +25,7 @@ use pbs_api_types::{
>      MaintenanceMode, MaintenanceType, Operation, UPID,
>  };
>  
> -use crate::backup_info::{BackupDir, BackupGroup};
> +use crate::backup_info::{BackupDir, BackupGroup, BackupInfo};
>  use crate::chunk_store::ChunkStore;
>  use crate::dynamic_index::{DynamicIndexReader, DynamicIndexWriter};
>  use crate::fixed_index::{FixedIndexReader, FixedIndexWriter};
> @@ -970,10 +970,10 @@ impl DataStore {
>          ListGroups::new(Arc::clone(self), ns)?.collect()
>      }
>  
> -    fn list_images(&self) -> Result<Vec<PathBuf>, Error> {
> +    fn list_images(&self) -> Result<HashSet<PathBuf>, Error> {
>          let base = self.base_path();
>  
> -        let mut list = vec![];
> +        let mut list = HashSet::new();
>  
>          use walkdir::WalkDir;
>  
> @@ -1021,7 +1021,7 @@ impl DataStore {
>                  if archive_type == ArchiveType::FixedIndex
>                      || archive_type == ArchiveType::DynamicIndex
>                  {
> -                    list.push(path);
> +                    list.insert(path);
>                  }
>              }
>          }
> @@ -1071,6 +1071,7 @@ impl DataStore {
>          &self,
>          index: Box<dyn IndexFile>,
>          file_name: &Path, // only used for error reporting
> +        touched_chunks: &mut TouchedChunks,
>          status: &mut GarbageCollectionStatus,
>          worker: &dyn WorkerTaskContext,
>      ) -> Result<(), Error> {
> @@ -1081,6 +1082,12 @@ impl DataStore {
>              worker.check_abort()?;
>              worker.fail_on_shutdown()?;
>              let digest = index.index_digest(pos).unwrap();
> +
> +            // Avoid multiple expensive atime updates by utimensat
> +            if touched_chunks.insert(*digest) {
> +                continue;
> +            }
> +
>              if !self.inner.chunk_store.cond_touch_chunk(digest, false)? {
>                  let hex = hex::encode(digest);
>                  warn!(
> @@ -1107,43 +1114,87 @@ impl DataStore {
>          status: &mut GarbageCollectionStatus,
>          worker: &dyn WorkerTaskContext,
>      ) -> Result<(), Error> {
> -        let image_list = self.list_images()?;
> -        let image_count = image_list.len();
> -
> +        // Iterate twice over the datastore to fetch images, even if this comes with an additional
> +        // runtime cost:
> +        // - First iteration to find all index files, no matter if they are in a location expected
> +        //   by the datastore's hierarchy
> +        // - Iterate using the datastore's helpers, so the namespaces, groups and snapshots are
> +        //   looked up given the expected hierarchy and iterator logic
> +        //
> +        // This ensures that all index files are used, even if they would not have been
> +        // seen by the regular logic, and that the user is informed by the garbage collection
> +        // run about any detected index files not following the iterator logic.
> +
> +        let mut unprocessed_image_list = self.list_images()?;
> +        let image_count = unprocessed_image_list.len();
> +
> +        // Avoid updating the atime of chunks shared by consecutive snapshots in the
> +        // same group multiple times.
> +        let mut touched_chunks = TouchedChunks::new();
> +        let mut processed_images = 0;
>          let mut last_percentage: usize = 0;
>  
> -        let mut strange_paths_count: u64 = 0;
> -
> -        for (i, img) in image_list.into_iter().enumerate() {
> -            worker.check_abort()?;
> -            worker.fail_on_shutdown()?;
> -
> -            if let Some(backup_dir_path) = img.parent() {
> -                let backup_dir_path = backup_dir_path.strip_prefix(self.base_path())?;
> -                if let Some(backup_dir_str) = backup_dir_path.to_str() {
> -                    if pbs_api_types::parse_ns_and_snapshot(backup_dir_str).is_err() {
> -                        strange_paths_count += 1;
> +        let arc_self = Arc::new(self.clone());
> +        for namespace in arc_self
> +            .recursive_iter_backup_ns(BackupNamespace::root())
> +            .context("creating namespace iterator failed")?
> +        {
> +            let namespace = namespace.context("iterating namespaces failed")?;
> +            for group in arc_self.iter_backup_groups(namespace)? {
> +                let group = group.context("iterating backup groups failed")?;
> +                let mut snapshots = group.list_backups().context("listing snapshots failed")?;
> +                // Sort by snapshot timestamp to iterate over consecutive snapshots for each image.
> +                BackupInfo::sort_list(&mut snapshots, true);
> +                for snapshot in snapshots {
> +                    for file in snapshot.files {
> +                        worker.check_abort()?;
> +                        worker.fail_on_shutdown()?;
> +
> +                        let mut path = snapshot.backup_dir.full_path();
> +                        path.push(file);
> +
> +                        let index = match self.open_index_reader(&path)? {
> +                            Some(index) => index,
> +                            None => continue,
> +                        };
> +                        self.index_mark_used_chunks(
> +                            index,
> +                            &path,
> +                            &mut touched_chunks,
> +                            status,
> +                            worker,
> +                        )?;
> +
> +                        unprocessed_image_list.remove(&path);
> +
> +                        let percentage = (processed_images + 1) * 100 / image_count;
> +                        if percentage > last_percentage {
> +                            info!(
> +                                "marked {percentage}% ({} of {image_count} index files)",
> +                                processed_images + 1,
> +                            );
> +                            last_percentage = percentage;
> +                        }
> +                        processed_images += 1;
>                      }
> +                    touched_chunks.reset();
>                  }
>              }
> -
> -            if let Some(index) = self.open_index_reader(&img)? {
> -                self.index_mark_used_chunks(index, &img, status, worker)?;
> -            }
> -
> -            let percentage = (i + 1) * 100 / image_count;
> -            if percentage > last_percentage {
> -                info!(
> -                    "marked {percentage}% ({} of {image_count} index files)",
> -                    i + 1,
> -                );
> -                last_percentage = percentage;
> -            }
>          }
>  
> +        let strange_paths_count = unprocessed_image_list.len();
>          if strange_paths_count > 0 {
> -            info!(
> -                "found (and marked) {strange_paths_count} index files outside of expected directory scheme"
> +            warn!("found {strange_paths_count} index files outside of expected directory scheme");
> +        }
> +        for path in unprocessed_image_list {
> +            let index = match self.open_index_reader(&path)? {
> +                Some(index) => index,
> +                None => continue,
> +            };
> +            self.index_mark_used_chunks(index, &path, &mut touched_chunks, status, worker)?;
> +            warn!(
> +                "Marked chunks for unexpected index file at '{}'",
> +                path.to_string_lossy()
>              );
>          }
>  
> -- 
> 2.39.5
> 
> 
> 
> 
> 
> 




^ permalink raw reply	[flat|nested] 14+ messages in thread
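
Stripped of the datastore types, the two-pass strategy from the patch above (walk everything first, then remove each index visited by the structured iteration, leaving the unexpected paths behind) boils down to the following sketch (`mark_phase1` is an illustrative stand-in, not a function from the series):

```rust
use std::collections::HashSet;
use std::path::PathBuf;

// First pass: every index file found by walking the whole datastore.
// Second pass: the paths reachable through the namespace/group/snapshot
// iterators; each visited index is removed from the set. Whatever remains
// is an index at an unexpected path that still needs marking, plus a warning.
fn mark_phase1(all_found: Vec<PathBuf>, iterated: Vec<PathBuf>) -> (usize, Vec<PathBuf>) {
    let mut unprocessed: HashSet<PathBuf> = all_found.into_iter().collect();
    let mut processed = 0;

    for path in iterated {
        // stand-in for open_index_reader() + index_mark_used_chunks()
        unprocessed.remove(&path);
        processed += 1;
    }

    // leftovers correspond to the "strange paths" warned about by GC
    let mut strange: Vec<PathBuf> = unprocessed.into_iter().collect();
    strange.sort();
    (processed, strange)
}
```

This is why the double listing is safe: nothing found by the walk can be lost, only reported late.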

* Re: [pbs-devel] [PATCH v2 proxmox-backup 2/4] datastore: add helper method to open index reader from path
  2025-03-10 11:16 ` [pbs-devel] [PATCH v2 proxmox-backup 2/4] datastore: add helper method to open index reader from path Christian Ebner
@ 2025-03-17 14:59   ` Fabian Grünbichler
  2025-03-17 15:41     ` Christian Ebner
  0 siblings, 1 reply; 14+ messages in thread
From: Fabian Grünbichler @ 2025-03-17 14:59 UTC (permalink / raw)
  To: Proxmox Backup Server development discussion

On March 10, 2025 12:16 pm, Christian Ebner wrote:
> Refactor the archive type and index file reader opening with its
> error handling into a helper method for better reusability.
> 
> This allows using the same logic for both expected and unexpected
> image paths when iterating through the datastore in a hierarchical
> manner.
> 
> Improve error handling by switching to anyhow's error context.
> 
> Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
> ---
> changes since version 1:
> - not present in previous version
> 
>  pbs-datastore/src/datastore.rs | 66 ++++++++++++++++++++++------------
>  1 file changed, 44 insertions(+), 22 deletions(-)
> 
> diff --git a/pbs-datastore/src/datastore.rs b/pbs-datastore/src/datastore.rs
> index a6a91ca79..72bc9f77f 100644
> --- a/pbs-datastore/src/datastore.rs
> +++ b/pbs-datastore/src/datastore.rs
> @@ -5,7 +5,7 @@ use std::os::unix::io::AsRawFd;
>  use std::path::{Path, PathBuf};
>  use std::sync::{Arc, LazyLock, Mutex};
>  
> -use anyhow::{bail, format_err, Error};
> +use anyhow::{bail, format_err, Context, Error};
>  use nix::unistd::{unlinkat, UnlinkatFlags};
>  use tracing::{info, warn};
>  
> @@ -1029,10 +1029,47 @@ impl DataStore {
>          Ok(list)
>      }
>  
> +    // Similar to open index, but ignore index files with blob or unknown archive type.
> +    // Further, do not fail if file vanished.

nit: but compared to open_index it takes an absolute path, not a relative one
to the base of the datastore? this should probably be made explicit and
checked?

> (it might also at some point make sense to pull out GC-related helpers
into a separate file to separate such things properly..)

> +    fn open_index_reader(&self, path: &Path) -> Result<Option<Box<dyn IndexFile>>, Error> {
> +        let archive_type = match ArchiveType::from_path(path) {
> +            Ok(archive_type) => archive_type,
> +            // ignore archives with unknown archive type
> +            Err(_) => return Ok(None),
> +        };
> +
> +        let file = match std::fs::File::open(path) {
> +            Ok(file) => file,
> +            // ignore vanished files
> +            Err(err) if err.kind() == io::ErrorKind::NotFound => return Ok(None),
> +            Err(err) => {
> +                return Err(
> +                    Error::from(err).context(format!("can't open file {}", path.to_string_lossy()))
> +                )
> +            }
> +        };
> +
> +        match archive_type {
> +            ArchiveType::FixedIndex => {
> +                let reader = FixedIndexReader::new(file)
> +                    .context(format!("can't open fixed index {}", path.to_string_lossy()))?;
> +                Ok(Some(Box::new(reader)))
> +            }
> +            ArchiveType::DynamicIndex => {
> +                let reader = DynamicIndexReader::new(file).context(format!(
> +                    "can't open dynamic index {}",
> +                    path.to_string_lossy()
> +                ))?;
> +                Ok(Some(Box::new(reader)))
> +            }
> +            ArchiveType::Blob => Ok(None),
> +        }
> +    }
> +
>      // mark chunks  used by ``index`` as used
> -    fn index_mark_used_chunks<I: IndexFile>(
> +    fn index_mark_used_chunks(
>          &self,
> -        index: I,
> +        index: Box<dyn IndexFile>,
>          file_name: &Path, // only used for error reporting
>          status: &mut GarbageCollectionStatus,
>          worker: &dyn WorkerTaskContext,
> @@ -1090,24 +1127,8 @@ impl DataStore {
>                  }
>              }
>  
> -            match std::fs::File::open(&img) {
> -                Ok(file) => {
> -                    if let Ok(archive_type) = ArchiveType::from_path(&img) {
> -                        if archive_type == ArchiveType::FixedIndex {
> -                            let index = FixedIndexReader::new(file).map_err(|e| {
> -                                format_err!("can't read index '{}' - {}", img.to_string_lossy(), e)
> -                            })?;
> -                            self.index_mark_used_chunks(index, &img, status, worker)?;
> -                        } else if archive_type == ArchiveType::DynamicIndex {
> -                            let index = DynamicIndexReader::new(file).map_err(|e| {
> -                                format_err!("can't read index '{}' - {}", img.to_string_lossy(), e)
> -                            })?;
> -                            self.index_mark_used_chunks(index, &img, status, worker)?;
> -                        }
> -                    }
> -                }
> -                Err(err) if err.kind() == io::ErrorKind::NotFound => (), // ignore vanished files
> -                Err(err) => bail!("can't open index {} - {}", img.to_string_lossy(), err),
> +            if let Some(index) = self.open_index_reader(&img)? {
> +                self.index_mark_used_chunks(index, &img, status, worker)?;
>              }
>  
>              let percentage = (i + 1) * 100 / image_count;
> @@ -1173,7 +1194,8 @@ impl DataStore {
>  
>              info!("Start GC phase1 (mark used chunks)");
>  
> -            self.mark_used_chunks(&mut gc_status, worker)?;
> +            self.mark_used_chunks(&mut gc_status, worker)
> +                .map_err(|err| format_err!("marking used chunks failed - {err:#}"))?;
>  
>              info!("Start GC phase2 (sweep unused chunks)");
>              self.inner.chunk_store.sweep_unused_chunks(
> -- 
> 2.39.5
> 
> 
> 
> 
> 
> 




^ permalink raw reply	[flat|nested] 14+ messages in thread
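
The nit about `open_index_reader` expecting an absolute path could be made explicit with a guard roughly like this (`check_abs_path` is a hypothetical helper name, not part of the series):

```rust
use std::path::Path;

// Hypothetical guard for open_index_reader: reject paths that are not
// absolute or that do not live below the datastore base directory.
fn check_abs_path(base: &Path, path: &Path) -> Result<(), String> {
    if !path.is_absolute() {
        return Err(format!("expected absolute path, got {path:?}"));
    }
    if !path.starts_with(base) {
        return Err(format!("path {path:?} outside of datastore base {base:?}"));
    }
    Ok(())
}
```

`Path::starts_with` compares whole components, so a prefix like `/datastore/store1x` would correctly not match a base of `/datastore/store1`.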

* [pbs-devel] applied: [PATCH v2 proxmox-backup 1/4] datastore: restrict datastores list_images method scope to module
  2025-03-10 11:16 ` [pbs-devel] [PATCH v2 proxmox-backup 1/4] datastore: restrict datastores list_images method scope to module Christian Ebner
@ 2025-03-17 15:00   ` Fabian Grünbichler
  0 siblings, 0 replies; 14+ messages in thread
From: Fabian Grünbichler @ 2025-03-17 15:00 UTC (permalink / raw)
  To: Proxmox Backup Server development discussion

applied this one ;)

On March 10, 2025 12:16 pm, Christian Ebner wrote:
> Drop the pub scope for `DataStore`s `list_images` method.
> 
> This method is only used to generate a list of index files found in
> the datastore for iteration during garbage collection. There are no
> other call sites and this is intended to only be used within the
> module itself. This allows more flexibility for future method
> signature adaptations.
> 
> No functional changes.
> 
> Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
> ---
> changes since version 1:
> - no changes
> 
>  pbs-datastore/src/datastore.rs | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/pbs-datastore/src/datastore.rs b/pbs-datastore/src/datastore.rs
> index 75c0c16ab..a6a91ca79 100644
> --- a/pbs-datastore/src/datastore.rs
> +++ b/pbs-datastore/src/datastore.rs
> @@ -970,7 +970,7 @@ impl DataStore {
>          ListGroups::new(Arc::clone(self), ns)?.collect()
>      }
>  
> -    pub fn list_images(&self) -> Result<Vec<PathBuf>, Error> {
> +    fn list_images(&self) -> Result<Vec<PathBuf>, Error> {
>          let base = self.base_path();
>  
>          let mut list = vec![];
> -- 
> 2.39.5
> 
> 
> 
> 
> 
> 




^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [pbs-devel] [PATCH v2 proxmox-backup 3/4] garbage collection: allow to keep track of already touched chunks
  2025-03-17 14:55   ` Fabian Grünbichler
@ 2025-03-17 15:39     ` Christian Ebner
  2025-03-18 16:53       ` Christian Ebner
  0 siblings, 1 reply; 14+ messages in thread
From: Christian Ebner @ 2025-03-17 15:39 UTC (permalink / raw)
  To: Proxmox Backup Server development discussion, Fabian Grünbichler

On 3/17/25 15:55, Fabian Grünbichler wrote:
> On March 10, 2025 12:16 pm, Christian Ebner wrote:
>> Implements the `TouchedChunks` struct and methods to keep track of
>> already touched chunks during garbage collection phase 1, to avoid
>> multiple computational and I/O intensive atime updates via a syscall.
>>
>> By inserting a digest, the chunk will be considered as touched and
>> can be ignored for subsequent encounters. To limit memory usage, the
>> structure allows resetting the chunk status, flagging chunks as seen
>> previous to the reset. A subsequent insert will then flag them as seen
>> after the reset. Chunks not seen after a reset will be cleared from
>> the structure by the next reset call, eliminating them from memory.
>>
>> This allows resetting the tracking state after each processed image
>> index file, to mimic the incremental backup behaviour of known chunks
>> and limit memory footprint.
>>
>> Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
>> ---
>> changes since version 1:
>> - no changes
>>
>>   pbs-datastore/src/datastore.rs | 29 +++++++++++++++++++++++++++++
>>   1 file changed, 29 insertions(+)
>>
>> diff --git a/pbs-datastore/src/datastore.rs b/pbs-datastore/src/datastore.rs
>> index 72bc9f77f..fdbb33a98 100644
>> --- a/pbs-datastore/src/datastore.rs
>> +++ b/pbs-datastore/src/datastore.rs
>> @@ -1585,3 +1585,32 @@ impl DataStore {
>>           Ok(())
>>       }
>>   }
>> +
>> +struct TouchedChunks {
>> +    list: HashMap<[u8; 32], bool>,
>> +}
>> +
>> +impl TouchedChunks {
>> +    fn new() -> Self {
>> +        Self {
>> +            list: HashMap::new(),
>> +        }
>> +    }
>> +
>> +    // Clear untouched chunks and reset the touched marker for others.
>> +    fn reset(&mut self) {
>> +        let mut new_list = HashMap::new();
>> +        for (digest, touched) in self.list.drain() {
>> +            if touched {
>> +                new_list.insert(digest, false);
>> +            }
>> +        }
>> +        self.list = new_list;
> 
> this could avoid the memory allocation (and for bigger
> indices/snapshots, probably multiple reallocations via the inserts) by
> switching to `retain` (which gets a `&mut V` and can thus flip the value
> of touched while filtering in a single pass).
> 
> despite the performance warning about it visiting empty buckets as well,
> this seems to be (slightly) faster for my test datastore (would be
> interesting if your test cases agree?) when benchmarking with a warmed
> up cache.
> 
> I used the following on-top of your series (the shrink_to is to reduce
> memory usage in case of outliers after they've been processed):
> 
> ```
> diff --git a/pbs-datastore/src/datastore.rs b/pbs-datastore/src/datastore.rs
> index a80343d9b..d3c3f831f 100644
> --- a/pbs-datastore/src/datastore.rs
> +++ b/pbs-datastore/src/datastore.rs
> @@ -1650,13 +1650,12 @@ impl TouchedChunks {
>   
>       // Clear untouched chunks and reset the touched marker for others.
>       fn reset(&mut self) {
> -        let mut new_list = HashMap::new();
> -        for (digest, touched) in self.list.drain() {
> -            if touched {
> -                new_list.insert(digest, false);
> -            }
> -        }
> -        self.list = new_list;
> +        self.list.retain(|_digest, touched| {
> +            *touched = !*touched;
> +            !*touched
> +        });
> +        let max_capacity = self.list.len().saturating_add(self.list.len() / 3);
> +        self.list.shrink_to(max_capacity);
>       }
>   
>       // Insert the digest in the list of touched chunks.
> ```
> 
> if the above is slower for your test inputs, then at least initializing
> the second HashMap with some initial capacity (the len of the previous
> list?) is probably sensible..

Okay, will check and incorporate these suggestions, thanks!

> 
> alternatively, we could also explore (maybe as a follow-up to
> immediately realize the performance gains we already know we get from
> the current aproach?) some sort of LRU-based approach?

Yeah, had something like that in mind as well because of the recent work 
on that. This would allow better control of the overall memory 
requirements for the garbage collection task as well, as it then does 
not depend on the index files. Keeping the cache capacity large is of 
course a pre-condition for that.

> 
> that might give us the option of:
> - dropping the reset call/fn while still having bounded (even
>    configurable?) memory overhead
> - extending the skipping of touches across a broader range of snapshots/indices
>    for additional performance gains (if the capacity is high enough)
> 
>> +    }
>> +
>> +    // Insert the digest in the list of touched chunks.
>> +    // Returns true if the chunk was already present, false otherwise.
>> +    fn insert(&mut self, digest: [u8; 32]) -> bool {
>> +        self.list.insert(digest, true).is_some()
>> +    }
>> +}
>> -- 
>> 2.39.5
>>
>>
>>
>>
>>
>>
> 
> 
> 
> 




^ permalink raw reply	[flat|nested] 14+ messages in thread
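
The drain-based reset from the patch and the retain-based reset suggested in the review are behaviourally identical; a quick equivalence check (simplified with `u64` keys standing in for the 32-byte digests):

```rust
use std::collections::HashMap;

// drain-based reset from the original patch: rebuilds the map
fn reset_drain(list: &mut HashMap<u64, bool>) {
    let mut new_list = HashMap::new();
    for (digest, touched) in list.drain() {
        if touched {
            new_list.insert(digest, false);
        }
    }
    *list = new_list;
}

// retain-based reset from the review: flips the marker in place, keeps only
// previously touched entries, and caps spare capacity after outliers
fn reset_retain(list: &mut HashMap<u64, bool>) {
    list.retain(|_digest, touched| {
        *touched = !*touched;
        !*touched
    });
    let max_capacity = list.len().saturating_add(list.len() / 3);
    list.shrink_to(max_capacity);
}
```

The retain version avoids the second allocation entirely, which is where the measured speedup on a warmed-up cache most likely comes from.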

* Re: [pbs-devel] [PATCH v2 proxmox-backup 2/4] datastore: add helper method to open index reader from path
  2025-03-17 14:59   ` Fabian Grünbichler
@ 2025-03-17 15:41     ` Christian Ebner
  0 siblings, 0 replies; 14+ messages in thread
From: Christian Ebner @ 2025-03-17 15:41 UTC (permalink / raw)
  To: Proxmox Backup Server development discussion, Fabian Grünbichler

On 3/17/25 15:59, Fabian Grünbichler wrote:
> On March 10, 2025 12:16 pm, Christian Ebner wrote:
>> Refactor the archive type and index file reader opening with its
>> error handling into a helper method for better reusability.
>>
>> This allows using the same logic for both expected and unexpected
>> image paths when iterating through the datastore in a hierarchical
>> manner.
>>
>> Improve error handling by switching to anyhow's error context.
>>
>> Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
>> ---
>> changes since version 1:
>> - not present in previous version
>>
>>   pbs-datastore/src/datastore.rs | 66 ++++++++++++++++++++++------------
>>   1 file changed, 44 insertions(+), 22 deletions(-)
>>
>> diff --git a/pbs-datastore/src/datastore.rs b/pbs-datastore/src/datastore.rs
>> index a6a91ca79..72bc9f77f 100644
>> --- a/pbs-datastore/src/datastore.rs
>> +++ b/pbs-datastore/src/datastore.rs
>> @@ -5,7 +5,7 @@ use std::os::unix::io::AsRawFd;
>>   use std::path::{Path, PathBuf};
>>   use std::sync::{Arc, LazyLock, Mutex};
>>   
>> -use anyhow::{bail, format_err, Error};
>> +use anyhow::{bail, format_err, Context, Error};
>>   use nix::unistd::{unlinkat, UnlinkatFlags};
>>   use tracing::{info, warn};
>>   
>> @@ -1029,10 +1029,47 @@ impl DataStore {
>>           Ok(list)
>>       }
>>   
>> +    // Similar to open index, but ignore index files with blob or unknown archive type.
>> +    // Further, do not fail if file vanished.
> 
> nit: but compared to open_index it takes an absolute path, not a relative one
> to the base of the datastore? this should probably be made explicit and
> checked?

Acked, will adapt this accordingly!

> 
> (it might also at some point make sense to pull out GC-related helpers
> into a separate file to separate such things properly..)

Ok, will do this as follow-up patches on top of this as well.

> 
>> +    fn open_index_reader(&self, path: &Path) -> Result<Option<Box<dyn IndexFile>>, Error> {
>> +        let archive_type = match ArchiveType::from_path(path) {
>> +            Ok(archive_type) => archive_type,
>> +            // ignore archives with unknown archive type
>> +            Err(_) => return Ok(None),
>> +        };
>> +
>> +        let file = match std::fs::File::open(path) {
>> +            Ok(file) => file,
>> +            // ignore vanished files
>> +            Err(err) if err.kind() == io::ErrorKind::NotFound => return Ok(None),
>> +            Err(err) => {
>> +                return Err(
>> +                    Error::from(err).context(format!("can't open file {}", path.to_string_lossy()))
>> +                )
>> +            }
>> +        };
>> +
>> +        match archive_type {
>> +            ArchiveType::FixedIndex => {
>> +                let reader = FixedIndexReader::new(file)
>> +                    .context(format!("can't open fixed index {}", path.to_string_lossy()))?;
>> +                Ok(Some(Box::new(reader)))
>> +            }
>> +            ArchiveType::DynamicIndex => {
>> +                let reader = DynamicIndexReader::new(file).context(format!(
>> +                    "can't open dynamic index {}",
>> +                    path.to_string_lossy()
>> +                ))?;
>> +                Ok(Some(Box::new(reader)))
>> +            }
>> +            ArchiveType::Blob => Ok(None),
>> +        }
>> +    }
>> +
>>       // mark chunks  used by ``index`` as used
>> -    fn index_mark_used_chunks<I: IndexFile>(
>> +    fn index_mark_used_chunks(
>>           &self,
>> -        index: I,
>> +        index: Box<dyn IndexFile>,
>>           file_name: &Path, // only used for error reporting
>>           status: &mut GarbageCollectionStatus,
>>           worker: &dyn WorkerTaskContext,
>> @@ -1090,24 +1127,8 @@ impl DataStore {
>>                   }
>>               }
>>   
>> -            match std::fs::File::open(&img) {
>> -                Ok(file) => {
>> -                    if let Ok(archive_type) = ArchiveType::from_path(&img) {
>> -                        if archive_type == ArchiveType::FixedIndex {
>> -                            let index = FixedIndexReader::new(file).map_err(|e| {
>> -                                format_err!("can't read index '{}' - {}", img.to_string_lossy(), e)
>> -                            })?;
>> -                            self.index_mark_used_chunks(index, &img, status, worker)?;
>> -                        } else if archive_type == ArchiveType::DynamicIndex {
>> -                            let index = DynamicIndexReader::new(file).map_err(|e| {
>> -                                format_err!("can't read index '{}' - {}", img.to_string_lossy(), e)
>> -                            })?;
>> -                            self.index_mark_used_chunks(index, &img, status, worker)?;
>> -                        }
>> -                    }
>> -                }
>> -                Err(err) if err.kind() == io::ErrorKind::NotFound => (), // ignore vanished files
>> -                Err(err) => bail!("can't open index {} - {}", img.to_string_lossy(), err),
>> +            if let Some(index) = self.open_index_reader(&img)? {
>> +                self.index_mark_used_chunks(index, &img, status, worker)?;
>>               }
>>   
>>               let percentage = (i + 1) * 100 / image_count;
>> @@ -1173,7 +1194,8 @@ impl DataStore {
>>   
>>               info!("Start GC phase1 (mark used chunks)");
>>   
>> -            self.mark_used_chunks(&mut gc_status, worker)?;
>> +            self.mark_used_chunks(&mut gc_status, worker)
>> +                .map_err(|err| format_err!("marking used chunks failed - {err:#}"))?;
>>   
>>               info!("Start GC phase2 (sweep unused chunks)");
>>               self.inner.chunk_store.sweep_unused_chunks(
>> -- 
>> 2.39.5
>>
>>
>>
>>
>>
>>
> 
> 
> 
> 




^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [pbs-devel] [PATCH v2 proxmox-backup 4/4] fix #5331: garbage collection: avoid multiple chunk atime updates
  2025-03-17 14:55   ` Fabian Grünbichler
@ 2025-03-17 15:43     ` Christian Ebner
  0 siblings, 0 replies; 14+ messages in thread
From: Christian Ebner @ 2025-03-17 15:43 UTC (permalink / raw)
  To: Proxmox Backup Server development discussion, Fabian Grünbichler

On 3/17/25 15:55, Fabian Grünbichler wrote:
> On March 10, 2025 12:16 pm, Christian Ebner wrote:
>> Reduce the number of atime updates on the same chunk by logically
>> iterating over image index files, following the incremental backup
>> logic. By inserting paths for encountered images during
>> `list_images` using the GroupedImageList structure, the iteration
>> happens now for the same image filenames in the same image namespace
>> and group in an order based on the snapshot timestamp. For each image,
>> keep track of the encountered chunk digests, and remember these as
>> seen for the next snapshot. Chunks which have been encountered in the
>> previous image index, but are not present anymore are removed from
>> the list after each image, in order to reduce memory footprint.
>>
>> Fixes: https://bugzilla.proxmox.com/show_bug.cgi?id=5331
>> Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
> 
> technically this could be split into two parts - one for the iterating
> change, one for the performance optimization? with the latter part maybe
> being combined with patch#3 and the order switched around? ;)

Okay, will reorganize the patches accordingly.

> 
>> ---
>> changes since version 1:
>> - Use pre-existing datastore iterator helpers, following the logic other
>>    datastore operations take.
>> - Chunks are now remembered for all archives per snapshot, not just a
>>    single archive per snapshot as previously, this mimics more closely
>>    the backup behaviour.
>>
>>   pbs-datastore/src/datastore.rs | 117 +++++++++++++++++++++++----------
>>   1 file changed, 84 insertions(+), 33 deletions(-)
>>
>> diff --git a/pbs-datastore/src/datastore.rs b/pbs-datastore/src/datastore.rs
>> index fdbb33a98..a80343d9b 100644
>> --- a/pbs-datastore/src/datastore.rs
>> +++ b/pbs-datastore/src/datastore.rs
>> @@ -25,7 +25,7 @@ use pbs_api_types::{
>>       MaintenanceMode, MaintenanceType, Operation, UPID,
>>   };
>>   
>> -use crate::backup_info::{BackupDir, BackupGroup};
>> +use crate::backup_info::{BackupDir, BackupGroup, BackupInfo};
>>   use crate::chunk_store::ChunkStore;
>>   use crate::dynamic_index::{DynamicIndexReader, DynamicIndexWriter};
>>   use crate::fixed_index::{FixedIndexReader, FixedIndexWriter};
>> @@ -970,10 +970,10 @@ impl DataStore {
>>           ListGroups::new(Arc::clone(self), ns)?.collect()
>>       }
>>   
>> -    fn list_images(&self) -> Result<Vec<PathBuf>, Error> {
>> +    fn list_images(&self) -> Result<HashSet<PathBuf>, Error> {
>>           let base = self.base_path();
>>   
>> -        let mut list = vec![];
>> +        let mut list = HashSet::new();
>>   
>>           use walkdir::WalkDir;
>>   
>> @@ -1021,7 +1021,7 @@ impl DataStore {
>>                   if archive_type == ArchiveType::FixedIndex
>>                       || archive_type == ArchiveType::DynamicIndex
>>                   {
>> -                    list.push(path);
>> +                    list.insert(path);
>>                   }
>>               }
>>           }
>> @@ -1071,6 +1071,7 @@ impl DataStore {
>>           &self,
>>           index: Box<dyn IndexFile>,
>>           file_name: &Path, // only used for error reporting
>> +        touched_chunks: &mut TouchedChunks,
>>           status: &mut GarbageCollectionStatus,
>>           worker: &dyn WorkerTaskContext,
>>       ) -> Result<(), Error> {
>> @@ -1081,6 +1082,12 @@ impl DataStore {
>>               worker.check_abort()?;
>>               worker.fail_on_shutdown()?;
>>               let digest = index.index_digest(pos).unwrap();
>> +
>> +            // Avoid multiple expensive atime updates by utimensat
>> +            if touched_chunks.insert(*digest) {
>> +                continue;
>> +            }
>> +
>>               if !self.inner.chunk_store.cond_touch_chunk(digest, false)? {
>>                   let hex = hex::encode(digest);
>>                   warn!(
>> @@ -1107,43 +1114,87 @@ impl DataStore {
>>           status: &mut GarbageCollectionStatus,
>>           worker: &dyn WorkerTaskContext,
>>       ) -> Result<(), Error> {
>> -        let image_list = self.list_images()?;
>> -        let image_count = image_list.len();
>> -
>> +        // Iterate twice over the datastore to fetch images, even if this comes with an additional
>> +        // runtime cost:
>> +        // - First iteration to find all index files, no matter if they are in a location expected
>> +        //   by the datastore's hierarchy
>> +        // - Iterate using the datastore's helpers, so the namespaces, groups and snapshots are
>> +        //   looked up given the expected hierarchy and iterator logic
>> +        //
>> +        // By this it is assured that all index files are used, even if they would not have been
>> +        // seen by the regular logic and the user is informed by the garbage collection run about
>> +        // the detected index files not following the iterators logic.
>> +
>> +        let mut unprocessed_image_list = self.list_images()?;
>> +        let image_count = unprocessed_image_list.len();
>> +
>> +        // Optimize for avoiding updates of chunks atime in same group for consecutive
>> +        // snapshots multiple times.
>> +        let mut touched_chunks = TouchedChunks::new();
>> +        let mut processed_images = 0;
>>           let mut last_percentage: usize = 0;
>>   
>> -        let mut strange_paths_count: u64 = 0;
>> -
>> -        for (i, img) in image_list.into_iter().enumerate() {
>> -            worker.check_abort()?;
>> -            worker.fail_on_shutdown()?;
>> -
>> -            if let Some(backup_dir_path) = img.parent() {
>> -                let backup_dir_path = backup_dir_path.strip_prefix(self.base_path())?;
>> -                if let Some(backup_dir_str) = backup_dir_path.to_str() {
>> -                    if pbs_api_types::parse_ns_and_snapshot(backup_dir_str).is_err() {
>> -                        strange_paths_count += 1;
>> +        let arc_self = Arc::new(self.clone());
>> +        for namespace in arc_self
>> +            .recursive_iter_backup_ns(BackupNamespace::root())
>> +            .context("creating namespace iterator failed")?
>> +        {
>> +            let namespace = namespace.context("iterating namespaces failed")?;
>> +            for group in arc_self.iter_backup_groups(namespace)? {
>> +                let group = group.context("iterating backup groups failed")?;
>> +                let mut snapshots = group.list_backups().context("listing snapshots failed")?;
>> +                // Sort by snapshot timestamp to iterate over consecutive snapshots for each image.
>> +                BackupInfo::sort_list(&mut snapshots, true);
>> +                for snapshot in snapshots {
>> +                    for file in snapshot.files {
>> +                        worker.check_abort()?;
>> +                        worker.fail_on_shutdown()?;
>> +
>> +                        let mut path = snapshot.backup_dir.full_path();
>> +                        path.push(file);
>> +
>> +                        let index = match self.open_index_reader(&path)? {
>> +                            Some(index) => index,
>> +                            None => continue,
>> +                        };
>> +                        self.index_mark_used_chunks(
>> +                            index,
>> +                            &path,
>> +                            &mut touched_chunks,
>> +                            status,
>> +                            worker,
>> +                        )?;
>> +
>> +                        unprocessed_image_list.remove(&path);
>> +
>> +                        let percentage = (processed_images + 1) * 100 / image_count;
>> +                        if percentage > last_percentage {
>> +                            info!(
>> +                                "marked {percentage}% ({} of {image_count} index files)",
>> +                                processed_images + 1,
>> +                            );
>> +                            last_percentage = percentage;
>> +                        }
>> +                        processed_images += 1;
>>                       }
>> +                    touched_chunks.reset();
>>                   }
>>               }
>> -
>> -            if let Some(index) = self.open_index_reader(&img)? {
>> -                self.index_mark_used_chunks(index, &img, status, worker)?;
>> -            }
>> -
>> -            let percentage = (i + 1) * 100 / image_count;
>> -            if percentage > last_percentage {
>> -                info!(
>> -                    "marked {percentage}% ({} of {image_count} index files)",
>> -                    i + 1,
>> -                );
>> -                last_percentage = percentage;
>> -            }
>>           }
>>   
>> +        let strange_paths_count = unprocessed_image_list.len();
>>           if strange_paths_count > 0 {
>> -            info!(
>> -                "found (and marked) {strange_paths_count} index files outside of expected directory scheme"
>> +            warn!("found {strange_paths_count} index files outside of expected directory scheme");
>> +        }
>> +        for path in unprocessed_image_list {
>> +            let index = match self.open_index_reader(&path)? {
>> +                Some(index) => index,
>> +                None => continue,
>> +            };
>> +            self.index_mark_used_chunks(index, &path, &mut touched_chunks, status, worker)?;
>> +            warn!(
>> +                "Marked chunks for unexpected index file at '{}'",
>> +                path.to_string_lossy()
>>               );
>>           }
>>   
>> -- 
>> 2.39.5
>>
>>
>>
>>
>>
>>
> 
> 
> 
> 





* Re: [pbs-devel] [PATCH v2 proxmox-backup 3/4] garbage collection: allow to keep track of already touched chunks
  2025-03-17 15:39     ` Christian Ebner
@ 2025-03-18 16:53       ` Christian Ebner
  0 siblings, 0 replies; 14+ messages in thread
From: Christian Ebner @ 2025-03-18 16:53 UTC (permalink / raw)
  To: Proxmox Backup Server development discussion, Fabian Grünbichler

On 3/17/25 16:39, Christian Ebner wrote:
> On 3/17/25 15:55, Fabian Grünbichler wrote:
>>
>> this could avoid the memory allocation (and for bigger
>> indices/snapshots, probably multiple reallocations via the inserts) by
>> switching to `retain` (which gets a `&mut V` and can thus flip the value
>> of touched while filtering in a single pass).
>>
>> despite the performance warning about it visiting empty buckets as well,
>> this seems to be (slightly) faster for my test datastore (would be
>> interesting if your test cases agree?) when benchmarking with a warmed
>> up cache.
>>
>> I used the following on-top of your series (the shrink_to is to reduce
>> memory usage in case of outliers after they've been processed):
>>
>> ```
>> diff --git a/pbs-datastore/src/datastore.rs b/pbs-datastore/src/ 
>> datastore.rs
>> index a80343d9b..d3c3f831f 100644
>> --- a/pbs-datastore/src/datastore.rs
>> +++ b/pbs-datastore/src/datastore.rs
>> @@ -1650,13 +1650,12 @@ impl TouchedChunks {
>>       // Clear untouched chunks and reset the touched marker for others.
>>       fn reset(&mut self) {
>> -        let mut new_list = HashMap::new();
>> -        for (digest, touched) in self.list.drain() {
>> -            if touched {
>> -                new_list.insert(digest, false);
>> -            }
>> -        }
>> -        self.list = new_list;
>> +        self.list.retain(|_digest, touched| {
>> +            *touched = !*touched;
>> +            !*touched
>> +        });
>> +        let max_capacity = 
>> self.list.len().saturating_add(self.list.len() / 3);
>> +        self.list.shrink_to(max_capacity);
>>       }
>>       // Insert the digest in the list of touched chunks.
>> ```
>>
>> if the above is slower for your test inputs, then at least initializing
>> the second HashMap with some initial capacity (the len of the previous
>> list?) is probably sensible..
> 
> Okay, will check and incorporate these suggestions, thanks!
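
[Editor's note] As a standalone illustration of the suggested single-pass
reset, here is a minimal sketch of such a touched-chunks tracker. It is a
simplified stand-in for the TouchedChunks helper in
pbs-datastore/src/datastore.rs, not the actual implementation; insert()
returning true means the chunk was already touched and the utimensat call
can be skipped, matching the hunk from patch 4/4 quoted above:

```rust
use std::collections::HashMap;

/// Simplified stand-in for the TouchedChunks helper: tracks chunk digests
/// seen in the current and the previous snapshot of a group.
pub struct TouchedChunks {
    list: HashMap<[u8; 32], bool>,
}

impl TouchedChunks {
    pub fn new() -> Self {
        Self { list: HashMap::new() }
    }

    /// Remember the digest, returning true if it was already present
    /// (meaning the expensive utimensat call can be skipped).
    pub fn insert(&mut self, digest: [u8; 32]) -> bool {
        self.list.insert(digest, true).is_some()
    }

    /// Single-pass reset via `retain`: digests not re-touched since the
    /// last reset are dropped, the marker of the remaining ones is flipped
    /// back to false, and excess capacity is released so that memory usage
    /// shrinks again after outlier snapshots.
    pub fn reset(&mut self) {
        self.list.retain(|_digest, touched| {
            *touched = !*touched;
            !*touched
        });
        let max_capacity = self.list.len().saturating_add(self.list.len() / 3);
        self.list.shrink_to(max_capacity);
    }

    pub fn len(&self) -> usize {
        self.list.len()
    }
}

fn main() {
    let mut touched = TouchedChunks::new();
    assert!(!touched.insert([1u8; 32])); // first encounter: touch chunk
    assert!(touched.insert([1u8; 32])); // duplicate: skip utimensat
    assert!(!touched.insert([2u8; 32]));
    touched.reset(); // both digests survive into the next snapshot
    assert!(touched.insert([1u8; 32])); // still known, skip utimensat
    touched.reset(); // [2u8; 32] was not re-touched, so it is dropped
    assert_eq!(touched.len(), 1);
    println!("ok");
}
```

(Only a sketch to make the reset semantics explicit; the real helper and
its integration with index_mark_used_chunks live in the patch itself.)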

Did some testing on this; the above changes do result in a slight 
improvement:

Without them I get about 26s with 1147680 utimensat calls on a datastore 
located on a spinning disk; with them it drops to about 24s (with the 
same number of utimensat calls).

On the datastore located on an SSD I get about 15s with 566341 utimensat 
calls without the modifications, and about 14s with them.

Note: These values are not comparable to the test results listed in the 
cover letter of v2, as the datastore contents have changed since.

So this also helps for my test case. However, see below...

>>
>> alternatively, we could also explore (maybe as a follow-up to
>> immediately realize the performance gains we already know we get from
>> the current approach?) some sort of LRU-based approach?
> 
> Yeah, had something like that in mind as well because of the recent work 
> on that. It would also allow better control of the overall memory 
> requirements for the garbage collection task, as these would then not 
> depend on the index files. Keeping the cache capacity large is of course 
> a pre-condition for that.

I did test this as well, storing up to 1024 * 1024 * 32 bytes of digests 
as keys in the LRU cache. While there is no significant change in runtime 
(this still needs some double checking) compared to the patched version 
shown above, the utimensat call count dropped to 1078946 for my 
particular test case on the spinning disk and to 560833 on the SSD. 
Therefore I will further pursue this approach, given the already 
mentioned advantages.

The question then remains: do we want to keep the changes to the image 
list iteration, or drop them as well in favor of reduced complexity and 
a slight increase in performance?

I do not see much gain from them anymore...
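
[Editor's note] For illustration only, the safety argument behind a
capacity-bounded cache can be sketched with a simple FIFO-evicting digest
set. This is a hypothetical stand-in, not the actual LRU implementation
that would be used; the point is that eviction is always harmless, since a
cache miss merely causes one additional utimensat call on a chunk that was
already marked, never a missed chunk:

```rust
use std::collections::{HashSet, VecDeque};

/// Hypothetical stand-in for a capacity-bounded digest cache: a HashSet
/// for membership tests plus a VecDeque for FIFO eviction order. A real
/// implementation would use a proper LRU cache, but the safety property
/// is the same: eviction can only cause redundant atime updates.
pub struct BoundedDigestCache {
    capacity: usize,
    set: HashSet<[u8; 32]>,
    order: VecDeque<[u8; 32]>,
}

impl BoundedDigestCache {
    pub fn new(capacity: usize) -> Self {
        Self {
            capacity,
            set: HashSet::with_capacity(capacity),
            order: VecDeque::with_capacity(capacity),
        }
    }

    /// Returns true if the digest is cached (utimensat can be skipped).
    /// Otherwise caches it, evicting the oldest entry once full.
    pub fn insert(&mut self, digest: [u8; 32]) -> bool {
        if self.set.contains(&digest) {
            return true;
        }
        if self.set.len() == self.capacity {
            if let Some(oldest) = self.order.pop_front() {
                self.set.remove(&oldest);
            }
        }
        self.set.insert(digest);
        self.order.push_back(digest);
        false
    }
}

fn main() {
    let mut cache = BoundedDigestCache::new(2);
    assert!(!cache.insert([1u8; 32])); // miss: touch chunk
    assert!(!cache.insert([2u8; 32])); // miss: touch chunk
    assert!(cache.insert([2u8; 32])); // hit: skip utimensat
    assert!(!cache.insert([3u8; 32])); // miss: evicts [1u8; 32]
    assert!(!cache.insert([1u8; 32])); // evicted: touched again, still safe
    println!("ok");
}
```

Memory use is then bounded by the configured capacity instead of by the
size of the largest index files, which is the advantage mentioned above.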




end of thread, other threads:[~2025-03-18 16:54 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-03-10 11:16 [pbs-devel] [PATCH v2 proxmox-backup 0/4] GC: avoid multiple atime updates Christian Ebner
2025-03-10 11:16 ` [pbs-devel] [PATCH v2 proxmox-backup 1/4] datastore: restrict datastores list_images method scope to module Christian Ebner
2025-03-17 15:00   ` [pbs-devel] applied: " Fabian Grünbichler
2025-03-10 11:16 ` [pbs-devel] [PATCH v2 proxmox-backup 2/4] datastore: add helper method to open index reader from path Christian Ebner
2025-03-17 14:59   ` Fabian Grünbichler
2025-03-17 15:41     ` Christian Ebner
2025-03-10 11:16 ` [pbs-devel] [PATCH v2 proxmox-backup 3/4] garbage collection: allow to keep track of already touched chunks Christian Ebner
2025-03-17 14:55   ` Fabian Grünbichler
2025-03-17 15:39     ` Christian Ebner
2025-03-18 16:53       ` Christian Ebner
2025-03-10 11:16 ` [pbs-devel] [PATCH v2 proxmox-backup 4/4] fix #5331: garbage collection: avoid multiple chunk atime updates Christian Ebner
2025-03-10 11:40   ` Christian Ebner
2025-03-17 14:55   ` Fabian Grünbichler
2025-03-17 15:43     ` Christian Ebner
