* [pbs-devel] [PATCH v5 proxmox-backup 0/5] fix #5331: GC: avoid multiple atime updates
@ 2025-03-26 10:03 Christian Ebner
2025-03-26 10:03 ` [pbs-devel] [PATCH v5 proxmox-backup 1/5] tools: lru cache: tell if node was already present or newly inserted Christian Ebner
` (5 more replies)
0 siblings, 6 replies; 12+ messages in thread
From: Christian Ebner @ 2025-03-26 10:03 UTC (permalink / raw)
To: pbs-devel
This patches implement the logic to greatly improve the performance
of phase 1 garbage collection by avoiding multiple atime updates on
the same chunk.
Currently, phase 1 GC iterates over all folders in the datastore
looking and collecting all image index files without taking any
logical assumptions (e.g. namespaces, groups, snapshots, ...). This
is to avoid accidentally missing image index files located in
unexpected paths and therefore not marking their chunks as in use,
leading to potential data losses.
This patches improve phase 1 by:
- Iterating index images using the datatstore's iterators for detecting
regular index files. Paths outside of the iterator logic are still taken
into account and processed as well by generating a list of all the found
images first, removing index files encountered while iterating, finally
leaving a list of indexes with unexpected paths. These unexpected paths
are now also logged, for the user to potentially take action.
- Keeping track of recently touched chunks by storing their digests in a
LRU cache, skipping over expensive atime updates for chunks already
present in the cache.
Most notably changes since version 4 (thanks Thomas for feedback):
- Added basic benchmark results to the respective commit messages
- Extend reasoning in commit messages
- Adapted variable name, fixed formatting issue
Most notably changes since version 3 (thanks Wolfgang for feedback):
- Use `with_context` over `context` to avoid possibly unnecessary allocation
- Align terminology with docs and rest of the codebase by using index
file instead of image in method and variable names.
Most notably changes since version 2 (thanks Fabian for feedback):
- Use LRU cache instead of keeping track of chunks from the previous
snapshot in the group.
- Split patches to logically separate iteration from caching logic
- Adapt for better anyhow context error propagation and formatting
Most notably changes since version 1 (thanks Fabian for feedback):
- Logically iterate using pre-existing iterators instead of constructing
data structure for iteration when listing images.
- Tested that double listing does not affect runtime.
- Chunks are now remembered for all archives per snapshot, not just a
single archive per snapshot as previously, this mimics more closely
the backup behaviour, this give some additional gains in some cases.
Christian Ebner (5):
tools: lru cache: tell if node was already present or newly inserted
garbage collection: format error including anyhow error context
datastore: add helper method to open index reader from path
garbage collection: generate index file list via datastore iterators
fix #5331: garbage collection: avoid multiple chunk atime updates
pbs-datastore/src/datastore.rs | 179 ++++++++++++++++++++++----------
pbs-tools/src/lru_cache.rs | 4 +-
src/api2/admin/datastore.rs | 6 +-
src/bin/proxmox-backup-proxy.rs | 2 +-
4 files changed, 131 insertions(+), 60 deletions(-)
--
2.39.5
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
^ permalink raw reply [flat|nested] 12+ messages in thread
* [pbs-devel] [PATCH v5 proxmox-backup 1/5] tools: lru cache: tell if node was already present or newly inserted
2025-03-26 10:03 [pbs-devel] [PATCH v5 proxmox-backup 0/5] fix #5331: GC: avoid multiple atime updates Christian Ebner
@ 2025-03-26 10:03 ` Christian Ebner
2025-03-26 10:03 ` [pbs-devel] [PATCH v5 proxmox-backup 2/5] garbage collection: format error including anyhow error context Christian Ebner
` (4 subsequent siblings)
5 siblings, 0 replies; 12+ messages in thread
From: Christian Ebner @ 2025-03-26 10:03 UTC (permalink / raw)
To: pbs-devel
Add a boolean return type to LruCache::insert(), telling if the node
was already present in the cache or if it was newly inserted.
This will allow to use the LRU cache for garbage collection, where
it is required to skip atime updates for chunks already marked in
use.
That improves phase 1 garbage collection performance by avoiding,
multiple atime updates.
Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
changes since version 4:
- no changes
pbs-tools/src/lru_cache.rs | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/pbs-tools/src/lru_cache.rs b/pbs-tools/src/lru_cache.rs
index 1befb1a0d..9e0112647 100644
--- a/pbs-tools/src/lru_cache.rs
+++ b/pbs-tools/src/lru_cache.rs
@@ -133,7 +133,7 @@ impl<K: std::cmp::Eq + std::hash::Hash + Copy, V> LruCache<K, V> {
/// Insert or update an entry identified by `key` with the given `value`.
/// This entry is placed as the most recently used node at the head.
- pub fn insert(&mut self, key: K, value: V) {
+ pub fn insert(&mut self, key: K, value: V) -> bool {
match self.map.entry(key) {
Entry::Occupied(mut o) => {
// Node present, update value
@@ -142,6 +142,7 @@ impl<K: std::cmp::Eq + std::hash::Hash + Copy, V> LruCache<K, V> {
let mut node = unsafe { Box::from_raw(node_ptr) };
node.value = value;
let _node_ptr = Box::into_raw(node);
+ true
}
Entry::Vacant(v) => {
// Node not present, insert a new one
@@ -159,6 +160,7 @@ impl<K: std::cmp::Eq + std::hash::Hash + Copy, V> LruCache<K, V> {
if self.map.len() > self.capacity {
self.pop_tail();
}
+ false
}
}
}
--
2.39.5
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
^ permalink raw reply [flat|nested] 12+ messages in thread
* [pbs-devel] [PATCH v5 proxmox-backup 2/5] garbage collection: format error including anyhow error context
2025-03-26 10:03 [pbs-devel] [PATCH v5 proxmox-backup 0/5] fix #5331: GC: avoid multiple atime updates Christian Ebner
2025-03-26 10:03 ` [pbs-devel] [PATCH v5 proxmox-backup 1/5] tools: lru cache: tell if node was already present or newly inserted Christian Ebner
@ 2025-03-26 10:03 ` Christian Ebner
2025-03-26 10:03 ` [pbs-devel] [PATCH v5 proxmox-backup 3/5] datastore: add helper method to open index reader from path Christian Ebner
` (3 subsequent siblings)
5 siblings, 0 replies; 12+ messages in thread
From: Christian Ebner @ 2025-03-26 10:03 UTC (permalink / raw)
To: pbs-devel
Until now errors are shown ignoring the anyhow error context. In
order to allow the garbage collection to return additional error
context, format the error including the context as single line.
Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
changes since version 4:
- no changes
src/api2/admin/datastore.rs | 6 +-----
src/bin/proxmox-backup-proxy.rs | 2 +-
2 files changed, 2 insertions(+), 6 deletions(-)
diff --git a/src/api2/admin/datastore.rs b/src/api2/admin/datastore.rs
index 483e595c1..4009f7b80 100644
--- a/src/api2/admin/datastore.rs
+++ b/src/api2/admin/datastore.rs
@@ -1218,11 +1218,7 @@ pub fn start_garbage_collection(
let upid_str =
crate::server::do_garbage_collection_job(job, datastore, &auth_id, None, to_stdout)
.map_err(|err| {
- format_err!(
- "unable to start garbage collection job on datastore {} - {}",
- store,
- err
- )
+ format_err!("unable to start garbage collection job on datastore {store} - {err:#}")
})?;
Ok(json!(upid_str))
diff --git a/src/bin/proxmox-backup-proxy.rs b/src/bin/proxmox-backup-proxy.rs
index c9a6032e6..1d4cf37c5 100644
--- a/src/bin/proxmox-backup-proxy.rs
+++ b/src/bin/proxmox-backup-proxy.rs
@@ -543,7 +543,7 @@ async fn schedule_datastore_garbage_collection() {
Some(event_str),
false,
) {
- eprintln!("unable to start garbage collection job on datastore {store} - {err}");
+ eprintln!("unable to start garbage collection job on datastore {store} - {err:#}");
}
}
}
--
2.39.5
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
^ permalink raw reply [flat|nested] 12+ messages in thread
* [pbs-devel] [PATCH v5 proxmox-backup 3/5] datastore: add helper method to open index reader from path
2025-03-26 10:03 [pbs-devel] [PATCH v5 proxmox-backup 0/5] fix #5331: GC: avoid multiple atime updates Christian Ebner
2025-03-26 10:03 ` [pbs-devel] [PATCH v5 proxmox-backup 1/5] tools: lru cache: tell if node was already present or newly inserted Christian Ebner
2025-03-26 10:03 ` [pbs-devel] [PATCH v5 proxmox-backup 2/5] garbage collection: format error including anyhow error context Christian Ebner
@ 2025-03-26 10:03 ` Christian Ebner
2025-03-26 10:03 ` [pbs-devel] [PATCH v5 proxmox-backup 4/5] garbage collection: generate index file list via datastore iterators Christian Ebner
` (2 subsequent siblings)
5 siblings, 0 replies; 12+ messages in thread
From: Christian Ebner @ 2025-03-26 10:03 UTC (permalink / raw)
To: pbs-devel
Refactor the archive type and index file reader opening with its
error handling into a helper method for better reusability.
This allows to use the same logic for both, expected image paths
and unexpected image paths when iterating trough the datastore
in a hierarchical manner.
Improve error handling by switching to anyhow's error context.
Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
changes since version 4:
- cargo fmt
pbs-datastore/src/datastore.rs | 66 ++++++++++++++++++++++------------
1 file changed, 44 insertions(+), 22 deletions(-)
diff --git a/pbs-datastore/src/datastore.rs b/pbs-datastore/src/datastore.rs
index a6a91ca79..6c3046ff6 100644
--- a/pbs-datastore/src/datastore.rs
+++ b/pbs-datastore/src/datastore.rs
@@ -5,7 +5,7 @@ use std::os::unix::io::AsRawFd;
use std::path::{Path, PathBuf};
use std::sync::{Arc, LazyLock, Mutex};
-use anyhow::{bail, format_err, Error};
+use anyhow::{bail, format_err, Context, Error};
use nix::unistd::{unlinkat, UnlinkatFlags};
use tracing::{info, warn};
@@ -1029,10 +1029,47 @@ impl DataStore {
Ok(list)
}
+ // Similar to open index, but ignore index files with blob or unknown archive type.
+ // Further, do not fail if file vanished.
+ fn open_index_reader(&self, absolute_path: &Path) -> Result<Option<Box<dyn IndexFile>>, Error> {
+ let archive_type = match ArchiveType::from_path(absolute_path) {
+ Ok(archive_type) => archive_type,
+ // ignore archives with unknown archive type
+ Err(_) => return Ok(None),
+ };
+
+ if absolute_path.is_relative() {
+ bail!("expected absolute path, got '{absolute_path:?}'");
+ }
+
+ let file = match std::fs::File::open(absolute_path) {
+ Ok(file) => file,
+ // ignore vanished files
+ Err(err) if err.kind() == io::ErrorKind::NotFound => return Ok(None),
+ Err(err) => {
+ return Err(Error::from(err).context(format!("can't open file '{absolute_path:?}'")))
+ }
+ };
+
+ match archive_type {
+ ArchiveType::FixedIndex => {
+ let reader = FixedIndexReader::new(file)
+ .with_context(|| format!("can't open fixed index '{absolute_path:?}'"))?;
+ Ok(Some(Box::new(reader)))
+ }
+ ArchiveType::DynamicIndex => {
+ let reader = DynamicIndexReader::new(file)
+ .with_context(|| format!("can't open dynamic index '{absolute_path:?}'"))?;
+ Ok(Some(Box::new(reader)))
+ }
+ ArchiveType::Blob => Ok(None),
+ }
+ }
+
// mark chunks used by ``index`` as used
- fn index_mark_used_chunks<I: IndexFile>(
+ fn index_mark_used_chunks(
&self,
- index: I,
+ index: Box<dyn IndexFile>,
file_name: &Path, // only used for error reporting
status: &mut GarbageCollectionStatus,
worker: &dyn WorkerTaskContext,
@@ -1090,24 +1127,8 @@ impl DataStore {
}
}
- match std::fs::File::open(&img) {
- Ok(file) => {
- if let Ok(archive_type) = ArchiveType::from_path(&img) {
- if archive_type == ArchiveType::FixedIndex {
- let index = FixedIndexReader::new(file).map_err(|e| {
- format_err!("can't read index '{}' - {}", img.to_string_lossy(), e)
- })?;
- self.index_mark_used_chunks(index, &img, status, worker)?;
- } else if archive_type == ArchiveType::DynamicIndex {
- let index = DynamicIndexReader::new(file).map_err(|e| {
- format_err!("can't read index '{}' - {}", img.to_string_lossy(), e)
- })?;
- self.index_mark_used_chunks(index, &img, status, worker)?;
- }
- }
- }
- Err(err) if err.kind() == io::ErrorKind::NotFound => (), // ignore vanished files
- Err(err) => bail!("can't open index {} - {}", img.to_string_lossy(), err),
+ if let Some(index) = self.open_index_reader(&img)? {
+ self.index_mark_used_chunks(index, &img, status, worker)?;
}
let percentage = (i + 1) * 100 / image_count;
@@ -1173,7 +1194,8 @@ impl DataStore {
info!("Start GC phase1 (mark used chunks)");
- self.mark_used_chunks(&mut gc_status, worker)?;
+ self.mark_used_chunks(&mut gc_status, worker)
+ .context("marking used chunks failed")?;
info!("Start GC phase2 (sweep unused chunks)");
self.inner.chunk_store.sweep_unused_chunks(
--
2.39.5
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
^ permalink raw reply [flat|nested] 12+ messages in thread
* [pbs-devel] [PATCH v5 proxmox-backup 4/5] garbage collection: generate index file list via datastore iterators
2025-03-26 10:03 [pbs-devel] [PATCH v5 proxmox-backup 0/5] fix #5331: GC: avoid multiple atime updates Christian Ebner
` (2 preceding siblings ...)
2025-03-26 10:03 ` [pbs-devel] [PATCH v5 proxmox-backup 3/5] datastore: add helper method to open index reader from path Christian Ebner
@ 2025-03-26 10:03 ` Christian Ebner
2025-04-02 17:26 ` Thomas Lamprecht
2025-03-26 10:03 ` [pbs-devel] [PATCH v5 proxmox-backup 5/5] fix #5331: garbage collection: avoid multiple chunk atime updates Christian Ebner
2025-04-02 17:45 ` [pbs-devel] applied-series: [PATCH v5 proxmox-backup 0/5] fix #5331: GC: avoid multiple " Thomas Lamprecht
5 siblings, 1 reply; 12+ messages in thread
From: Christian Ebner @ 2025-03-26 10:03 UTC (permalink / raw)
To: pbs-devel
Instead of iterating over all index files found in the datastore in
an unstructured manner, use the datastore iterators to logically
iterate over them as other datastore operations will.
This allows to better distinguish index files in unexpected locations
from ones in their expected location, warning the user of unexpected
ones to allow to act on possible missconfigurations. Further, this
will allow to integrate marking of snapshots with missing chunks as
incomplete/corrupt more easily and helps improve cache hits when
introducing LRU caching to avoid multiple atime updates in phase 1 of
garbage collection.
This now iterates twice over the index files, as indices in
unexpected locations are still considered by generating the list of
all index files to be found in the datastore and removing regular
index files from that list, leaving unexpected ones behind.
Further, align terminology by renaming the `list_images` method to
a more fitting `list_index_files` and the variable names accordingly.
This will reduce possible confusion since throughout the codebase and
in the documentation files referencing the data chunks are referred
to as index files. The term image on the other hand is associated
with virtual machine images and other large binary data stored as
fixed-size chunks.
Basic benchmarking:
Total GC runtime shows no significatn change (average of 3 runs):
unpatched: 155.4 ± 2.6 s
patched: 155.4 ± 3.5 s
VmPeak measured via /proc/self/status before and after
`mark_used_chunks` (proxmox-backup-proxy was restarted in between
for normalization, no changes for all 3 runs):
unpatched before: 1196032 kB
unpatched after: 1196032 kB
patched before: 1196028 kB
patched after: 1196028 kB
List image shows a slight increase due to the switch to a HashSet
(average of 3 runs):
unpatched: 64.2 ± 8.4 ms
patched: 72.8 ± 3.7 ms
Description of the PBS host and datastore:
CPU: Intel Xeon E5-2620
Datastore backing storage: ZFS RAID 10 with 3 mirrors of 2x
ST16000NM001G, mirror of 2x SAMSUNG_MZ1LB1T9HALS as special
Namespaces: 45
Groups: 182
Snapshots: 3184
Index files: 6875
Deduplication factor: 44.54
Original data usage: 120.742 TiB
On-Disk usage: 2.711 TiB (2.25%)
On-Disk chunks: 1494727
Average chunk size: 1.902 MiB
Distribution of snapshots (binned by month):
2023-11 11
2023-12 16
2024-01 30
2024-02 38
2024-03 17
2024-04 37
2024-05 17
2024-06 59
2024-07 99
2024-08 96
2024-09 115
2024-10 35
2024-11 42
2024-12 37
2025-01 162
2025-02 489
2025-03 1884
Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
changes since version 4:
- extended commit message
pbs-datastore/src/datastore.rs | 104 ++++++++++++++++++++++-----------
1 file changed, 70 insertions(+), 34 deletions(-)
diff --git a/pbs-datastore/src/datastore.rs b/pbs-datastore/src/datastore.rs
index 6c3046ff6..8ce98f1b3 100644
--- a/pbs-datastore/src/datastore.rs
+++ b/pbs-datastore/src/datastore.rs
@@ -25,7 +25,7 @@ use pbs_api_types::{
MaintenanceMode, MaintenanceType, Operation, UPID,
};
-use crate::backup_info::{BackupDir, BackupGroup};
+use crate::backup_info::{BackupDir, BackupGroup, BackupInfo};
use crate::chunk_store::ChunkStore;
use crate::dynamic_index::{DynamicIndexReader, DynamicIndexWriter};
use crate::fixed_index::{FixedIndexReader, FixedIndexWriter};
@@ -970,10 +970,15 @@ impl DataStore {
ListGroups::new(Arc::clone(self), ns)?.collect()
}
- fn list_images(&self) -> Result<Vec<PathBuf>, Error> {
+ /// Lookup all index files to be found in the datastore without taking any logical iteration
+ /// into account.
+ /// The filesystem is walked recursevly to detect index files based on their archive type based
+ /// on the filename. This however excludes the chunks folder, hidden files and does not follow
+ /// symlinks.
+ fn list_index_files(&self) -> Result<HashSet<PathBuf>, Error> {
let base = self.base_path();
- let mut list = vec![];
+ let mut list = HashSet::new();
use walkdir::WalkDir;
@@ -1021,7 +1026,7 @@ impl DataStore {
if archive_type == ArchiveType::FixedIndex
|| archive_type == ArchiveType::DynamicIndex
{
- list.push(path);
+ list.insert(path);
}
}
}
@@ -1107,44 +1112,75 @@ impl DataStore {
status: &mut GarbageCollectionStatus,
worker: &dyn WorkerTaskContext,
) -> Result<(), Error> {
- let image_list = self.list_images()?;
- let image_count = image_list.len();
-
+ // Iterate twice over the datastore to fetch index files, even if this comes with an
+ // additional runtime cost:
+ // - First iteration to find all index files, no matter if they are in a location expected
+ // by the datastore's hierarchy
+ // - Iterate using the datastore's helpers, so the namespaces, groups and snapshots are
+ // looked up given the expected hierarchy and iterator logic
+ //
+ // By this it is assured that all index files are used, even if they would not have been
+ // seen by the regular logic and the user is informed by the garbage collection run about
+ // the detected index files not following the iterators logic.
+
+ let mut unprocessed_index_list = self.list_index_files()?;
+ let index_count = unprocessed_index_list.len();
+
+ let mut processed_index_files = 0;
let mut last_percentage: usize = 0;
- let mut strange_paths_count: u64 = 0;
-
- for (i, img) in image_list.into_iter().enumerate() {
- worker.check_abort()?;
- worker.fail_on_shutdown()?;
-
- if let Some(backup_dir_path) = img.parent() {
- let backup_dir_path = backup_dir_path.strip_prefix(self.base_path())?;
- if let Some(backup_dir_str) = backup_dir_path.to_str() {
- if pbs_api_types::parse_ns_and_snapshot(backup_dir_str).is_err() {
- strange_paths_count += 1;
+ let arc_self = Arc::new(self.clone());
+ for namespace in arc_self
+ .recursive_iter_backup_ns(BackupNamespace::root())
+ .context("creating namespace iterator failed")?
+ {
+ let namespace = namespace.context("iterating namespaces failed")?;
+ for group in arc_self.iter_backup_groups(namespace)? {
+ let group = group.context("iterating backup groups failed")?;
+ let mut snapshots = group.list_backups().context("listing snapshots failed")?;
+ // Sort by snapshot timestamp to iterate over consecutive snapshots for each image.
+ BackupInfo::sort_list(&mut snapshots, true);
+ for snapshot in snapshots {
+ for file in snapshot.files {
+ worker.check_abort()?;
+ worker.fail_on_shutdown()?;
+
+ let mut path = snapshot.backup_dir.full_path();
+ path.push(file);
+
+ let index = match self.open_index_reader(&path)? {
+ Some(index) => index,
+ None => continue,
+ };
+ self.index_mark_used_chunks(index, &path, status, worker)?;
+
+ unprocessed_index_list.remove(&path);
+
+ let percentage = (processed_index_files + 1) * 100 / index_count;
+ if percentage > last_percentage {
+ info!(
+ "marked {percentage}% ({} of {index_count} index files)",
+ processed_index_files + 1,
+ );
+ last_percentage = percentage;
+ }
+ processed_index_files += 1;
}
}
}
-
- if let Some(index) = self.open_index_reader(&img)? {
- self.index_mark_used_chunks(index, &img, status, worker)?;
- }
-
- let percentage = (i + 1) * 100 / image_count;
- if percentage > last_percentage {
- info!(
- "marked {percentage}% ({} of {image_count} index files)",
- i + 1,
- );
- last_percentage = percentage;
- }
}
+ let strange_paths_count = unprocessed_index_list.len();
if strange_paths_count > 0 {
- info!(
- "found (and marked) {strange_paths_count} index files outside of expected directory scheme"
- );
+ warn!("found {strange_paths_count} index files outside of expected directory scheme");
+ }
+ for path in unprocessed_index_list {
+ let index = match self.open_index_reader(&path)? {
+ Some(index) => index,
+ None => continue,
+ };
+ self.index_mark_used_chunks(index, &path, status, worker)?;
+ warn!("Marked chunks for unexpected index file at '{path:?}'");
}
Ok(())
--
2.39.5
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
^ permalink raw reply [flat|nested] 12+ messages in thread
* [pbs-devel] [PATCH v5 proxmox-backup 5/5] fix #5331: garbage collection: avoid multiple chunk atime updates
2025-03-26 10:03 [pbs-devel] [PATCH v5 proxmox-backup 0/5] fix #5331: GC: avoid multiple atime updates Christian Ebner
` (3 preceding siblings ...)
2025-03-26 10:03 ` [pbs-devel] [PATCH v5 proxmox-backup 4/5] garbage collection: generate index file list via datastore iterators Christian Ebner
@ 2025-03-26 10:03 ` Christian Ebner
2025-04-02 15:57 ` Thomas Lamprecht
2025-04-02 17:45 ` [pbs-devel] applied-series: [PATCH v5 proxmox-backup 0/5] fix #5331: GC: avoid multiple " Thomas Lamprecht
5 siblings, 1 reply; 12+ messages in thread
From: Christian Ebner @ 2025-03-26 10:03 UTC (permalink / raw)
To: pbs-devel
To reduce the number of atimes updates, keep track of the recently
marked chunks in phase 1 of garbage to avoid multiple atime updates
via expensive utimensat() calls.
Recently touched chunks are tracked by storing the chunk digests in
an LRU cache of fixed capacity. By inserting a digest, the chunk will
be the most recently touched one and if already present in the cache
before insert, the atime update can be skipped. The cache capacity of
1024 * 1024 was chosen as compromise between required memory usage
and the size of an index file referencing a 4 TiB fixed size chunked
image (with 4MiB chunk size).
The previous change to iterate over the datastore contents using the
datastore's iterator helps for increased cache hits, as subsequent
snapshots are most likely to share common chunks.
Basic benchmarking:
Number of utimensat calls shows significatn reduction:
unpatched: 31591944
patched: 1495136
Total GC runtime shows significatn reduction (average of 3 runs):
unpatched: 155.4 ± 3.5 s
patched: 22.8 ± 0.5 s
VmPeak measured via /proc/self/status before and after
`mark_used_chunks` (proxmox-backup-proxy was restarted in between
for normalization, average of 3 runs):
unpatched before: 1196028 ± 0 kB
unpatched after: 1196028 ± 0 kB
unpatched before: 1163337 ± 28317 kB
unpatched after: 1330906 ± 29280 kB
delta: 167569 kB
Dependence on the cache capacity:
capacity runtime[s] VmPeakDiff[kB]
1*1024 66.221 0
10*1024 36.164 0
100*1024 23.141 0
1024*1024 22.188 101060
10*1024*1024 23.178 689660
100*1024*1024 25.135 5507292
Description of the PBS host and datastore:
CPU: Intel Xeon E5-2620
Datastore backing storage: ZFS RAID 10 with 3 mirrors of 2x
ST16000NM001G, mirror of 2x SAMSUNG_MZ1LB1T9HALS as special
Namespaces: 45
Groups: 182
Snapshots: 3184
Index files: 6875
Deduplication factor: 44.54
Original data usage: 120.742 TiB
On-Disk usage: 2.711 TiB (2.25%)
On-Disk chunks: 1494727
Average chunk size: 1.902 MiB
Distribution of snapshots (binned by month):
2023-11 11
2023-12 16
2024-01 30
2024-02 38
2024-03 17
2024-04 37
2024-05 17
2024-06 59
2024-07 99
2024-08 96
2024-09 115
2024-10 35
2024-11 42
2024-12 37
2025-01 162
2025-02 489
2025-03 1884
Fixes: https://bugzilla.proxmox.com/show_bug.cgi?id=5331
Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
changes since version 4:
- extended commit message
- s/recently_touched_chunks/chunk_lru_cache/
- dropped to generic comment
pbs-datastore/src/datastore.rs | 19 +++++++++++++++++--
1 file changed, 17 insertions(+), 2 deletions(-)
diff --git a/pbs-datastore/src/datastore.rs b/pbs-datastore/src/datastore.rs
index 8ce98f1b3..9cb78d741 100644
--- a/pbs-datastore/src/datastore.rs
+++ b/pbs-datastore/src/datastore.rs
@@ -7,6 +7,7 @@ use std::sync::{Arc, LazyLock, Mutex};
use anyhow::{bail, format_err, Context, Error};
use nix::unistd::{unlinkat, UnlinkatFlags};
+use pbs_tools::lru_cache::LruCache;
use tracing::{info, warn};
use proxmox_human_byte::HumanByte;
@@ -1076,6 +1077,7 @@ impl DataStore {
&self,
index: Box<dyn IndexFile>,
file_name: &Path, // only used for error reporting
+ chunk_lru_cache: &mut LruCache<[u8; 32], ()>,
status: &mut GarbageCollectionStatus,
worker: &dyn WorkerTaskContext,
) -> Result<(), Error> {
@@ -1086,6 +1088,12 @@ impl DataStore {
worker.check_abort()?;
worker.fail_on_shutdown()?;
let digest = index.index_digest(pos).unwrap();
+
+ // Avoid multiple expensive atime updates by utimensat
+ if chunk_lru_cache.insert(*digest, ()) {
+ continue;
+ }
+
if !self.inner.chunk_store.cond_touch_chunk(digest, false)? {
let hex = hex::encode(digest);
warn!(
@@ -1126,6 +1134,7 @@ impl DataStore {
let mut unprocessed_index_list = self.list_index_files()?;
let index_count = unprocessed_index_list.len();
+ let mut chunk_lru_cache = LruCache::new(1024 * 1024);
let mut processed_index_files = 0;
let mut last_percentage: usize = 0;
@@ -1152,7 +1161,13 @@ impl DataStore {
Some(index) => index,
None => continue,
};
- self.index_mark_used_chunks(index, &path, status, worker)?;
+ self.index_mark_used_chunks(
+ index,
+ &path,
+ &mut chunk_lru_cache,
+ status,
+ worker,
+ )?;
unprocessed_index_list.remove(&path);
@@ -1179,7 +1194,7 @@ impl DataStore {
Some(index) => index,
None => continue,
};
- self.index_mark_used_chunks(index, &path, status, worker)?;
+ self.index_mark_used_chunks(index, &path, &mut chunk_lru_cache, status, worker)?;
warn!("Marked chunks for unexpected index file at '{path:?}'");
}
--
2.39.5
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [pbs-devel] [PATCH v5 proxmox-backup 5/5] fix #5331: garbage collection: avoid multiple chunk atime updates
2025-03-26 10:03 ` [pbs-devel] [PATCH v5 proxmox-backup 5/5] fix #5331: garbage collection: avoid multiple chunk atime updates Christian Ebner
@ 2025-04-02 15:57 ` Thomas Lamprecht
2025-04-02 19:50 ` Christian Ebner
0 siblings, 1 reply; 12+ messages in thread
From: Thomas Lamprecht @ 2025-04-02 15:57 UTC (permalink / raw)
To: Proxmox Backup Server development discussion, Christian Ebner
Am 26.03.25 um 11:03 schrieb Christian Ebner:
> Basic benchmarking:
>
> Number of utimensat calls shows significatn reduction:
> unpatched: 31591944
> patched: 1495136
>
> Total GC runtime shows significatn reduction (average of 3 runs):
> unpatched: 155.4 ± 3.5 s
> patched: 22.8 ± 0.5 s
Thanks a lot for providing these numbers, and what a nice runtime
improvement!
>
> VmPeak measured via /proc/self/status before and after
> `mark_used_chunks` (proxmox-backup-proxy was restarted in between
> for normalization, average of 3 runs):
> unpatched before: 1196028 ± 0 kB
> unpatched after: 1196028 ± 0 kB
>
> unpatched before: 1163337 ± 28317 kB
> unpatched after: 1330906 ± 29280 kB
> delta: 167569 kB
VmPeak is virtual memory though, not something like resident set size,
or better proportional set size – but yeah that's harder to get.
Simplest way might be polling something like `ps -o pid,rss,pss -u backup`
in a shell alongside the GC run a few times per second, e.g.:
while :; do printf '%s ' $(date '+%T.%3N'); $(); ps -o pid,rss,pss -u backup --no-headers; sleep 0.5; done | tee gc-stats
And then get the highest PSS values via:
sort -nk4,4 gc-stats | tail
I do not think this needs to be redone, and a new revision needs to be
send though. But, it might be nice to do a quick test just for a rough
comparison to VmPeak delta.
>
> Dependence on the cache capacity:
> capacity runtime[s] VmPeakDiff[kB]
> 1*1024 66.221 0
> 10*1024 36.164 0
> 100*1024 23.141 0
> 1024*1024 22.188 101060
Hmm, seems like we could lower the cache size to something like 128*1024
or 256*1024 and get already most benefits for this workload.
What do you think about applying this as is and after doing a quick RSS
and/or PSS benchmark decide if it's worth to start out a bit smaller, as
167 MiB delta is a bit much for my taste if a quarter of that is enough
to get most benefits. If the actual used memory (not just virtual memory
mappings) is rather closer to the cache size without overhead (32 MiB),
I'd be fine with keeping this as is.
tuning option in MiB (i.e. 32 MiB / 32 B == 1024*1024 capacity) where the
admin can better control this themselves.
> 10*1024*1024 23.178 689660
> 100*1024*1024 25.135 5507292
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [pbs-devel] [PATCH v5 proxmox-backup 4/5] garbage collection: generate index file list via datastore iterators
2025-03-26 10:03 ` [pbs-devel] [PATCH v5 proxmox-backup 4/5] garbage collection: generate index file list via datastore iterators Christian Ebner
@ 2025-04-02 17:26 ` Thomas Lamprecht
2025-04-02 19:39 ` Christian Ebner
0 siblings, 1 reply; 12+ messages in thread
From: Thomas Lamprecht @ 2025-04-02 17:26 UTC (permalink / raw)
To: Proxmox Backup Server development discussion, Christian Ebner
Am 26.03.25 um 11:03 schrieb Christian Ebner:
> Instead of iterating over all index files found in the datastore in
> an unstructured manner, use the datastore iterators to logically
> iterate over them as other datastore operations will.
btw. With this we should also be able to relatively cheaply generate per
namespace and per backup group stats similar to the datastore wide GC stats,
i.e. also implement [#5799]. No pressure at all here, I just remembered
about this when re-checking your series here.
[#5799]: https://bugzilla.proxmox.com/show_bug.cgi?id=5799
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
^ permalink raw reply [flat|nested] 12+ messages in thread
* [pbs-devel] applied-series: [PATCH v5 proxmox-backup 0/5] fix #5331: GC: avoid multiple atime updates
2025-03-26 10:03 [pbs-devel] [PATCH v5 proxmox-backup 0/5] fix #5331: GC: avoid multiple atime updates Christian Ebner
` (4 preceding siblings ...)
2025-03-26 10:03 ` [pbs-devel] [PATCH v5 proxmox-backup 5/5] fix #5331: garbage collection: avoid multiple chunk atime updates Christian Ebner
@ 2025-04-02 17:45 ` Thomas Lamprecht
5 siblings, 0 replies; 12+ messages in thread
From: Thomas Lamprecht @ 2025-04-02 17:45 UTC (permalink / raw)
To: Proxmox Backup Server development discussion, Christian Ebner
Am 26.03.25 um 11:03 schrieb Christian Ebner:> This patches implement the logic to greatly improve the performance
> of phase 1 garbage collection by avoiding multiple atime updates on
> the same chunk.
>
> Currently, phase 1 GC iterates over all folders in the datastore
> looking and collecting all image index files without taking any
> logical assumptions (e.g. namespaces, groups, snapshots, ...). This
> is to avoid accidentally missing image index files located in
> unexpected paths and therefore not marking their chunks as in use,
> leading to potential data losses.
>
> This patches improve phase 1 by:
> - Iterating index images using the datatstore's iterators for detecting
> regular index files. Paths outside of the iterator logic are still taken
> into account and processed as well by generating a list of all the found
> images first, removing index files encountered while iterating, finally
> leaving a list of indexes with unexpected paths. These unexpected paths
> are now also logged, for the user to potentially take action.
> - Keeping track of recently touched chunks by storing their digests in a
> LRU cache, skipping over expensive atime updates for chunks already
> present in the cache.
> Christian Ebner (5):
> tools: lru cache: tell if node was already present or newly inserted
> garbage collection: format error including anyhow error context
> datastore: add helper method to open index reader from path
> garbage collection: generate index file list via datastore iterators
> fix #5331: garbage collection: avoid multiple chunk atime updates
>
> pbs-datastore/src/datastore.rs | 179 ++++++++++++++++++++++----------
> pbs-tools/src/lru_cache.rs | 4 +-
> src/api2/admin/datastore.rs | 6 +-
> src/bin/proxmox-backup-proxy.rs | 2 +-
> 4 files changed, 131 insertions(+), 60 deletions(-)
>
I decided to move this forward as is, getting the other metrics measured
like proposed in my reply to patch 5/5 might be still nice, if that really
results in us choosing a different default or what not we can record those
metrics in a new commit, else they are at least on the list and all is
fine anyway. So:
applied, thanks!
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [pbs-devel] [PATCH v5 proxmox-backup 4/5] garbage collection: generate index file list via datastore iterators
2025-04-02 17:26 ` Thomas Lamprecht
@ 2025-04-02 19:39 ` Christian Ebner
0 siblings, 0 replies; 12+ messages in thread
From: Christian Ebner @ 2025-04-02 19:39 UTC (permalink / raw)
To: Thomas Lamprecht, Proxmox Backup Server development discussion
On 4/2/25 19:26, Thomas Lamprecht wrote:
> Am 26.03.25 um 11:03 schrieb Christian Ebner:
>> Instead of iterating over all index files found in the datastore in
>> an unstructured manner, use the datastore iterators to logically
>> iterate over them as other datastore operations will.
>
> btw. With this we should also be able to relatively cheaply generate per
> namespace and per backup group stats similar to the datastore wide GC stats,
> i.e. also implement [#5799]. No pressure at all here, I just remembered
> about this when re-checking your series here.
>
> [#5799]: https://bugzilla.proxmox.com/show_bug.cgi?id=5799
Great, will have a go at this!
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [pbs-devel] [PATCH v5 proxmox-backup 5/5] fix #5331: garbage collection: avoid multiple chunk atime updates
2025-04-02 15:57 ` Thomas Lamprecht
@ 2025-04-02 19:50 ` Christian Ebner
2025-04-02 19:54 ` Thomas Lamprecht
0 siblings, 1 reply; 12+ messages in thread
From: Christian Ebner @ 2025-04-02 19:50 UTC (permalink / raw)
To: Thomas Lamprecht, Proxmox Backup Server development discussion
On 4/2/25 17:57, Thomas Lamprecht wrote:
> Am 26.03.25 um 11:03 schrieb Christian Ebner:
>> Basic benchmarking:
>>
>> Number of utimensat calls shows significatn reduction:
>> unpatched: 31591944
>> patched: 1495136
>>
>> Total GC runtime shows significatn reduction (average of 3 runs):
>> unpatched: 155.4 ± 3.5 s
>> patched: 22.8 ± 0.5 s
>
> Thanks a lot for providing these numbers, and what a nice runtime
> improvement!
>
>>
>> VmPeak measured via /proc/self/status before and after
>> `mark_used_chunks` (proxmox-backup-proxy was restarted in between
>> for normalization, average of 3 runs):
>> unpatched before: 1196028 ± 0 kB
>> unpatched after: 1196028 ± 0 kB
>>
>> unpatched before: 1163337 ± 28317 kB
>> unpatched after: 1330906 ± 29280 kB
>> delta: 167569 kB
>
> VmPeak is virtual memory though, not something like resident set size,
> or better proportional set size – but yeah that's harder to get.
> Simplest way might be polling something like `ps -o pid,rss,pss -u backup`
> in a shell alongside the GC run a few times per second, e.g.:
>
> while :; do printf '%s ' $(date '+%T.%3N'); $(); ps -o pid,rss,pss -u backup --no-headers; sleep 0.5; done | tee gc-stats
>
> And then get the highest PSS values via:
>
> sort -nk4,4 gc-stats | tail
>
> I do not think this needs to be redone, and a new revision needs to be
> send though. But, it might be nice to do a quick test just for a rough
> comparison to VmPeak delta.
Ah, thanks for the explanation and suggestion, was a bit unsure already
if the VmPeak is informative enough. Will re-check this with the
suggested metrics.
>
>>
>> Dependence on the cache capacity:
>> capacity runtime[s] VmPeakDiff[kB]
>> 1*1024 66.221 0
>> 10*1024 36.164 0
>> 100*1024 23.141 0
>> 1024*1024 22.188 101060
>
> Hmm, seems like we could lower the cache size to something like 128*1024
> or 256*1024 and get already most benefits for this workload.
>
> What do you think about applying this as is and after doing a quick RSS
> and/or PSS benchmark decide if it's worth to start out a bit smaller, as
> 167 MiB delta is a bit much for my taste if a quarter of that is enough
> to get most benefits. If the actual used memory (not just virtual memory
> mappings) is rather closer to the cache size without overhead (32 MiB),
> I'd be fine with keeping this as is.
Okay, yes will ask Stoiko to get access to the PBS instance once more to
have a similar datastore and gain some more data on this.
>
> tuning option in MiB (i.e. 32 MiB / 32 B == 1024*1024 capacity) where the
> admin can better control this themselves.
This I do not fully understand, I assume this sentence is cut off? But I
can send a patch to expose this as datastore tuning option as well.
>
>> 10*1024*1024 23.178 689660
>> 100*1024*1024 25.135 5507292
>
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [pbs-devel] [PATCH v5 proxmox-backup 5/5] fix #5331: garbage collection: avoid multiple chunk atime updates
2025-04-02 19:50 ` Christian Ebner
@ 2025-04-02 19:54 ` Thomas Lamprecht
0 siblings, 0 replies; 12+ messages in thread
From: Thomas Lamprecht @ 2025-04-02 19:54 UTC (permalink / raw)
To: Christian Ebner, Proxmox Backup Server development discussion
Am 02.04.25 um 21:50 schrieb Christian Ebner:
>> tuning option in MiB (i.e. 32 MiB / 32 B == 1024*1024 capacity) where the
>> admin can better control this themselves.
> This I do not fully understand, I assume this sentence is cut off? But I
> can send a patch to expose this as datastore tuning option as well.
Yeah, sorry about that, this was something I wrote but then thought it's
not worth it just now without further feedback but did not fully removed
it – I'd move the freshly uploaded 3.3.5-1 version to pbstest soonish and
then we can see how it plays out on our remaining production instances and
maybe we already get some user feedback. But yes, having this as tuning
option might be worth it in the long run for both users with constrained
memory and users with huge backup groups and enough memory available.
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2025-04-02 19:54 UTC | newest]
Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-03-26 10:03 [pbs-devel] [PATCH v5 proxmox-backup 0/5] fix #5331: GC: avoid multiple atime updates Christian Ebner
2025-03-26 10:03 ` [pbs-devel] [PATCH v5 proxmox-backup 1/5] tools: lru cache: tell if node was already present or newly inserted Christian Ebner
2025-03-26 10:03 ` [pbs-devel] [PATCH v5 proxmox-backup 2/5] garbage collection: format error including anyhow error context Christian Ebner
2025-03-26 10:03 ` [pbs-devel] [PATCH v5 proxmox-backup 3/5] datastore: add helper method to open index reader from path Christian Ebner
2025-03-26 10:03 ` [pbs-devel] [PATCH v5 proxmox-backup 4/5] garbage collection: generate index file list via datastore iterators Christian Ebner
2025-04-02 17:26 ` Thomas Lamprecht
2025-04-02 19:39 ` Christian Ebner
2025-03-26 10:03 ` [pbs-devel] [PATCH v5 proxmox-backup 5/5] fix #5331: garbage collection: avoid multiple chunk atime updates Christian Ebner
2025-04-02 15:57 ` Thomas Lamprecht
2025-04-02 19:50 ` Christian Ebner
2025-04-02 19:54 ` Thomas Lamprecht
2025-04-02 17:45 ` [pbs-devel] applied-series: [PATCH v5 proxmox-backup 0/5] fix #5331: GC: avoid multiple " Thomas Lamprecht
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox
Service provided by Proxmox Server Solutions GmbH | Privacy | Legal