From: Christian Ebner <c.ebner@proxmox.com>
To: pbs-devel@lists.proxmox.com
Subject: [pbs-devel] [PATCH proxmox-backup v11 22/46] datastore: implement garbage collection for s3 backend
Date: Tue, 22 Jul 2025 12:10:42 +0200
Message-ID: <20250722101106.526438-27-c.ebner@proxmox.com>
In-Reply-To: <20250722101106.526438-1-c.ebner@proxmox.com>
Implements the garbage collection for datastores backed by an s3
object store.
Take advantage of the local datastore by placing marker files in the
chunk store during phase 1 of the garbage collection, updating their
atime if already present.
This allows us to avoid making expensive API calls to update object
metadata, which would only be possible via a copy object operation.
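
A minimal sketch of the "touch or create" idea behind phase 1, using only the standard library. The helper name `touch_or_create_marker` is hypothetical; the patch itself relies on `cond_touch_chunk` and a `create_new` open, so this only illustrates the technique:

```rust
use std::fs::{FileTimes, OpenOptions};
use std::io;
use std::path::Path;
use std::time::SystemTime;

/// Hypothetical helper (not part of the patch): refresh the atime of an
/// existing marker file, or create an empty marker if none exists yet.
/// Returns Ok(true) if the marker was newly created.
fn touch_or_create_marker(path: &Path) -> io::Result<bool> {
    match OpenOptions::new().write(true).open(path) {
        Ok(file) => {
            // Marker already present: only bump its access time.
            file.set_times(FileTimes::new().set_accessed(SystemTime::now()))?;
            Ok(false)
        }
        Err(err) if err.kind() == io::ErrorKind::NotFound => {
            // No marker yet: create an empty file; its fresh atime marks
            // the chunk as in use.
            OpenOptions::new().write(true).create_new(true).open(path)?;
            Ok(true)
        }
        Err(err) => Err(err),
    }
}
```

Both branches only touch the local filesystem, so no request to the object store is needed to record that a chunk is still referenced.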
Phase 2 is implemented by fetching a list of all the chunks via
the ListObjectsV2 API call, filtered by the chunk folder prefix.
This operation has to be performed in batches of at most 1000
objects, as imposed by the API's response limit.
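
The batched listing can be sketched as a simple continuation loop. Everything here is a stand-in: `Page`, `list_page` and `for_each_batch` are hypothetical names, the "remote" listing is an in-memory slice, and the continuation token is modeled as an index, whereas a real ListObjectsV2 token is an opaque string:

```rust
/// One page of a listing, loosely modeled on a ListObjectsV2 response.
struct Page {
    keys: Vec<String>,
    /// Some(token) if the listing was truncated and more pages follow.
    next_token: Option<usize>,
}

/// Stand-in for the remote call: returns at most `max_keys` keys,
/// starting at the position encoded in `token`.
fn list_page(all: &[String], token: Option<usize>, max_keys: usize) -> Page {
    let start = token.unwrap_or(0);
    let end = (start + max_keys).min(all.len());
    Page {
        keys: all[start..end].to_vec(),
        next_token: (end < all.len()).then_some(end),
    }
}

/// Walk all pages, calling `visit` once per batch.
/// Returns the number of listing requests that were issued.
fn for_each_batch(all: &[String], max_keys: usize, mut visit: impl FnMut(&[String])) -> usize {
    assert!(max_keys > 0, "a page must hold at least one key");
    let mut token = None;
    let mut requests = 0;
    loop {
        let page = list_page(all, token, max_keys);
        requests += 1;
        visit(&page.keys);
        match page.next_token {
            Some(next) => token = Some(next),
            None => break,
        }
    }
    requests
}
```

With a page size of 1000, a store holding 6477 chunk objects needs 7 listing requests under this model.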
For each object key, look up the marker file and decide, based on the
marker's existence and its atime, whether the chunk object needs to be
removed. Deletion happens via the delete objects operation, which
allows deleting multiple chunks with a single request.
This makes it possible to efficiently identify chunks which are no
longer in use, while remaining performant and cost effective.
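
The per-object decision follows the comparisons made in `mark_chunk_for_object_key` in the diff further down. A condensed sketch, where the `Verdict` enum and the `classify` function are illustrative names rather than part of the patch:

```rust
/// Illustrative outcome of the phase 2 check for a single chunk object.
#[derive(Debug, PartialEq)]
enum Verdict {
    /// Marker missing or older than `min_atime`: queue the object for deletion.
    Delete,
    /// Old enough to look unused, but a writer older than the marker may
    /// still reference the chunk.
    Pending,
    /// Marker is recent: the chunk is in use.
    Keep,
}

/// `marker_atime` is None when no local marker file exists; as in the
/// patch, this is treated like an atime at the Unix epoch.
fn classify(marker_atime: Option<i64>, min_atime: i64, oldest_writer: i64) -> Verdict {
    let atime = marker_atime.unwrap_or(0);
    if atime < min_atime {
        Verdict::Delete
    } else if atime < oldest_writer {
        Verdict::Pending
    } else {
        Verdict::Keep
    }
}
```

Keys classified as `Delete` would be collected into a list and removed with one delete objects request per listing batch.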
Baseline runtime performance tests:
-----------------------------------
Three garbage collection runs were performed with hot filesystem
caches (warmed by an additional GC run before the test runs). The PBS
instance was virtualized, using the same virtualized disk with ZFS for
all the local cache stores.
All datastores contained the same encrypted data, with the following
content statistics:
Original data usage: 269.685 GiB
On-Disk usage: 9.018 GiB (3.34%)
On-Disk chunks: 6477
Deduplication factor: 29.90
Average chunk size: 1.426 MiB
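
As a plausibility check, the derived figures can be recomputed from the rounded values listed above. This is only a sketch; the constant and function names are made up for illustration, and small deviations in the last digit are expected because the inputs are already rounded:

```rust
/// Figures copied from the content statistics above.
const ORIGINAL_GIB: f64 = 269.685;
const ON_DISK_GIB: f64 = 9.018;
const ON_DISK_CHUNKS: f64 = 6477.0;

/// Original data usage divided by on-disk usage.
fn dedup_factor() -> f64 {
    ORIGINAL_GIB / ON_DISK_GIB
}

/// On-disk usage as a percentage of the original data usage.
fn on_disk_percent() -> f64 {
    100.0 * ON_DISK_GIB / ORIGINAL_GIB
}

/// Average chunk size in MiB (1 GiB = 1024 MiB).
fn avg_chunk_mib() -> f64 {
    ON_DISK_GIB * 1024.0 / ON_DISK_CHUNKS
}
```

All three results agree with the reported 29.90, 3.34% and 1.426 MiB to within rounding of the inputs.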
The results demonstrate the overhead caused by the additional
ListObjectsV2 API calls and their processing, which varies depending
on the object store backend.
Average garbage collection runtime:
Local datastore: (2.04 ± 0.01) s
Local RADOS gateway (Squid): (3.05 ± 0.01) s
AWS S3: (3.05 ± 0.01) s
Cloudflare R2: (6.71 ± 0.58) s
After pruning of all datastore contents (therefore including
DeleteObjects requests):
Local datastore: 3.04 s
Local RADOS gateway (Squid): 14.08 s
AWS S3: 13.06 s
Cloudflare R2: 78.21 s
Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
Reviewed-by: Lukas Wagner <l.wagner@proxmox.com>
Reviewed-by: Hannes Laimer <h.laimer@proxmox.com>
---
changes since version 10:
- no changes
pbs-datastore/src/chunk_store.rs | 4 +
pbs-datastore/src/datastore.rs | 252 +++++++++++++++++++++++++++----
2 files changed, 230 insertions(+), 26 deletions(-)
diff --git a/pbs-datastore/src/chunk_store.rs b/pbs-datastore/src/chunk_store.rs
index 8c195df54..95f00e8d5 100644
--- a/pbs-datastore/src/chunk_store.rs
+++ b/pbs-datastore/src/chunk_store.rs
@@ -353,6 +353,10 @@ impl ChunkStore {
ProcessLocker::oldest_shared_lock(self.locker.clone().unwrap())
}
+ pub fn mutex(&self) -> &std::sync::Mutex<()> {
+ &self.mutex
+ }
+
pub fn sweep_unused_chunks(
&self,
oldest_writer: i64,
diff --git a/pbs-datastore/src/datastore.rs b/pbs-datastore/src/datastore.rs
index ae8493a7c..04e54a10a 100644
--- a/pbs-datastore/src/datastore.rs
+++ b/pbs-datastore/src/datastore.rs
@@ -4,7 +4,7 @@ use std::os::unix::ffi::OsStrExt;
use std::os::unix::io::AsRawFd;
use std::path::{Path, PathBuf};
use std::sync::{Arc, LazyLock, Mutex};
-use std::time::Duration;
+use std::time::{Duration, SystemTime};
use anyhow::{bail, format_err, Context, Error};
use http_body_util::BodyExt;
@@ -13,7 +13,7 @@ use pbs_tools::lru_cache::LruCache;
use tracing::{info, warn};
use proxmox_human_byte::HumanByte;
-use proxmox_s3_client::{S3Client, S3ClientConfig, S3ClientOptions, S3PathPrefix};
+use proxmox_s3_client::{S3Client, S3ClientConfig, S3ClientOptions, S3ObjectKey, S3PathPrefix};
use proxmox_schema::ApiType;
use proxmox_sys::error::SysError;
@@ -1210,6 +1210,7 @@ impl DataStore {
chunk_lru_cache: &mut Option<LruCache<[u8; 32], ()>>,
status: &mut GarbageCollectionStatus,
worker: &dyn WorkerTaskContext,
+ s3_client: Option<Arc<S3Client>>,
) -> Result<(), Error> {
status.index_file_count += 1;
status.index_data_bytes += index.index_bytes();
@@ -1232,21 +1233,41 @@ impl DataStore {
}
}
- if !self.inner.chunk_store.cond_touch_chunk(digest, false)? {
- let hex = hex::encode(digest);
- warn!(
- "warning: unable to access non-existent chunk {hex}, required by {file_name:?}"
- );
-
- // touch any corresponding .bad files to keep them around, meaning if a chunk is
- // rewritten correctly they will be removed automatically, as well as if no index
- // file requires the chunk anymore (won't get to this loop then)
- for i in 0..=9 {
- let bad_ext = format!("{}.bad", i);
- let mut bad_path = PathBuf::new();
- bad_path.push(self.chunk_path(digest).0);
- bad_path.set_extension(bad_ext);
- self.inner.chunk_store.cond_touch_path(&bad_path, false)?;
+ match s3_client {
+ None => {
+ // Filesystem backend
+ if !self.inner.chunk_store.cond_touch_chunk(digest, false)? {
+ let hex = hex::encode(digest);
+ warn!(
+ "warning: unable to access non-existent chunk {hex}, required by {file_name:?}"
+ );
+
+ // touch any corresponding .bad files to keep them around, meaning if a chunk is
+ // rewritten correctly they will be removed automatically, as well as if no index
+ // file requires the chunk anymore (won't get to this loop then)
+ for i in 0..=9 {
+ let bad_ext = format!("{}.bad", i);
+ let mut bad_path = PathBuf::new();
+ bad_path.push(self.chunk_path(digest).0);
+ bad_path.set_extension(bad_ext);
+ self.inner.chunk_store.cond_touch_path(&bad_path, false)?;
+ }
+ }
+ }
+ Some(ref _s3_client) => {
+ // Update atime on local cache marker files.
+ if !self.inner.chunk_store.cond_touch_chunk(digest, false)? {
+ let (chunk_path, _digest) = self.chunk_path(digest);
+ // Insert empty file as marker to tell GC phase2 that this is
+ // a chunk still in-use, so to keep in the S3 object store.
+ std::fs::File::options()
+ .write(true)
+ .create_new(true)
+ .open(&chunk_path)
+ .with_context(|| {
+ format!("failed to create marker for chunk {}", hex::encode(digest))
+ })?;
+ }
}
}
}
@@ -1258,6 +1279,7 @@ impl DataStore {
status: &mut GarbageCollectionStatus,
worker: &dyn WorkerTaskContext,
cache_capacity: usize,
+ s3_client: Option<Arc<S3Client>>,
) -> Result<(), Error> {
// Iterate twice over the datastore to fetch index files, even if this comes with an
// additional runtime cost:
@@ -1351,6 +1373,7 @@ impl DataStore {
&mut chunk_lru_cache,
status,
worker,
+ s3_client.as_ref().cloned(),
)?;
if !unprocessed_index_list.remove(&path) {
@@ -1385,7 +1408,14 @@ impl DataStore {
continue;
}
};
- self.index_mark_used_chunks(index, &path, &mut chunk_lru_cache, status, worker)?;
+ self.index_mark_used_chunks(
+ index,
+ &path,
+ &mut chunk_lru_cache,
+ status,
+ worker,
+ s3_client.as_ref().cloned(),
+ )?;
warn!("Marked chunks for unexpected index file at '{path:?}'");
}
if strange_paths_count > 0 {
@@ -1484,18 +1514,104 @@ impl DataStore {
1024 * 1024
};
- info!("Start GC phase1 (mark used chunks)");
+ let s3_client = match self.backend()? {
+ DatastoreBackend::Filesystem => None,
+ DatastoreBackend::S3(s3_client) => {
+ proxmox_async::runtime::block_on(s3_client.head_bucket())
+ .context("failed to reach bucket")?;
+ Some(s3_client)
+ }
+ };
- self.mark_used_chunks(&mut gc_status, worker, gc_cache_capacity)
- .context("marking used chunks failed")?;
+ info!("Start GC phase1 (mark used chunks)");
- info!("Start GC phase2 (sweep unused chunks)");
- self.inner.chunk_store.sweep_unused_chunks(
- oldest_writer,
- min_atime,
+ self.mark_used_chunks(
&mut gc_status,
worker,
- )?;
+ gc_cache_capacity,
+ s3_client.as_ref().cloned(),
+ )
+ .context("marking used chunks failed")?;
+
+ info!("Start GC phase2 (sweep unused chunks)");
+
+ if let Some(ref s3_client) = s3_client {
+ let mut chunk_count = 0;
+ let prefix = S3PathPrefix::Some(".chunks/".to_string());
+ // Operates in batches of 1000 objects max per request
+ let mut list_bucket_result =
+ proxmox_async::runtime::block_on(s3_client.list_objects_v2(&prefix, None))
+ .context("failed to list chunk in s3 object store")?;
+
+ let mut delete_list = Vec::with_capacity(1000);
+ loop {
+ let lock = self.inner.chunk_store.mutex().lock().unwrap();
+
+ for content in list_bucket_result.contents {
+ if self
+ .mark_chunk_for_object_key(
+ &content.key,
+ content.size,
+ min_atime,
+ oldest_writer,
+ &mut delete_list,
+ &mut gc_status,
+ )
+ .with_context(|| {
+ format!("failed to mark chunk for object key {}", content.key)
+ })?
+ {
+ chunk_count += 1;
+ }
+ }
+
+ if !delete_list.is_empty() {
+ let delete_objects_result = proxmox_async::runtime::block_on(
+ s3_client.delete_objects(&delete_list),
+ )?;
+ if let Some(_err) = delete_objects_result.error {
+ bail!("failed to delete some objects");
+ }
+ delete_list.clear();
+ }
+
+ drop(lock);
+
+ // Process next batch of chunks if there is more
+ if list_bucket_result.is_truncated {
+ list_bucket_result =
+ proxmox_async::runtime::block_on(s3_client.list_objects_v2(
+ &prefix,
+ list_bucket_result.next_continuation_token.as_deref(),
+ ))?;
+ continue;
+ }
+
+ break;
+ }
+ info!("processed {chunk_count} total chunks");
+
+ // Phase 2 GC of Filesystem backed storage is phase 3 for S3 backed GC
+ info!("Start GC phase3 (sweep unused chunk markers)");
+
+ let mut tmp_gc_status = GarbageCollectionStatus {
+ upid: Some(upid.to_string()),
+ ..Default::default()
+ };
+ self.inner.chunk_store.sweep_unused_chunks(
+ oldest_writer,
+ min_atime,
+ &mut tmp_gc_status,
+ worker,
+ )?;
+ } else {
+ self.inner.chunk_store.sweep_unused_chunks(
+ oldest_writer,
+ min_atime,
+ &mut gc_status,
+ worker,
+ )?;
+ }
if let Some(cache_stats) = &gc_status.cache_stats {
let total_cache_counts = cache_stats.hits + cache_stats.misses;
@@ -1582,6 +1698,90 @@ impl DataStore {
Ok(())
}
+ // Mark the chunk marker in the local cache store for the given object key as in use
+ // by updating its atime.
+ // Returns Ok(true) if the chunk was updated and Ok(false) if the object was not a chunk.
+ fn mark_chunk_for_object_key(
+ &self,
+ object_key: &S3ObjectKey,
+ size: u64,
+ min_atime: i64,
+ oldest_writer: i64,
+ delete_list: &mut Vec<S3ObjectKey>,
+ gc_status: &mut GarbageCollectionStatus,
+ ) -> Result<bool, Error> {
+ let chunk_path = match self.chunk_path_from_object_key(&object_key) {
+ Some(path) => path,
+ None => return Ok(false),
+ };
+
+ // Check local markers (created or atime updated during phase1) and
+ // keep or delete chunk based on that.
+ let atime = match std::fs::metadata(&chunk_path) {
+ Ok(stat) => stat.accessed()?,
+ Err(err) if err.kind() == std::io::ErrorKind::NotFound => {
+ // File not found, delete by setting atime to unix epoch
+ info!("Not found, mark for deletion: {object_key}");
+ SystemTime::UNIX_EPOCH
+ }
+ Err(err) => return Err(err.into()),
+ };
+ let atime = atime.duration_since(SystemTime::UNIX_EPOCH)?.as_secs() as i64;
+
+ let bad = chunk_path.extension().is_some_and(|ext| ext == "bad");
+
+ if atime < min_atime {
+ delete_list.push(object_key.clone());
+ if bad {
+ gc_status.removed_bad += 1;
+ } else {
+ gc_status.removed_chunks += 1;
+ }
+ gc_status.removed_bytes += size;
+ } else if atime < oldest_writer {
+ if bad {
+ gc_status.still_bad += 1;
+ } else {
+ gc_status.pending_chunks += 1;
+ }
+ gc_status.pending_bytes += size;
+ } else {
+ if !bad {
+ gc_status.disk_chunks += 1;
+ }
+ gc_status.disk_bytes += size;
+ }
+
+ Ok(true)
+ }
+
+ // Check and generate a chunk path from given object key
+ fn chunk_path_from_object_key(&self, object_key: &S3ObjectKey) -> Option<PathBuf> {
+ // Check object is actually a chunk
+ let digest = match Path::new::<str>(object_key).file_name() {
+ Some(file_name) => file_name,
+ // should never be the case as objects will have a filename
+ None => return None,
+ };
+ let bytes = digest.as_bytes();
+ if bytes.len() != 64 && bytes.len() != 64 + ".0.bad".len() {
+ return None;
+ }
+ if !bytes.iter().take(64).all(u8::is_ascii_hexdigit) {
+ return None;
+ }
+
+ // Safe since contains valid ascii hexdigits only as checked above.
+ let digest_str = digest.to_string_lossy();
+ let hexdigit_prefix = unsafe { digest_str.get_unchecked(0..4) };
+ let mut chunk_path = self.base_path();
+ chunk_path.push(".chunks");
+ chunk_path.push(hexdigit_prefix);
+ chunk_path.push(digest);
+
+ Some(chunk_path)
+ }
+
pub fn try_shared_chunk_store_lock(&self) -> Result<ProcessLockSharedGuard, Error> {
self.inner.chunk_store.try_shared_lock()
}
--
2.47.2