public inbox for pbs-devel@lists.proxmox.com
From: Christian Ebner <c.ebner@proxmox.com>
To: pbs-devel@lists.proxmox.com
Subject: [pbs-devel] [RFC proxmox-backup 26/39] datastore: implement garbage collection for s3 backend
Date: Mon, 19 May 2025 13:46:27 +0200
Message-ID: <20250519114640.303640-27-c.ebner@proxmox.com>
In-Reply-To: <20250519114640.303640-1-c.ebner@proxmox.com>

Implement garbage collection for datastores backed by an S3 object
store.
Take advantage of the local datastore by placing marker files in the
chunk store during phase 1 of garbage collection, updating their atime
if they are already present. This avoids expensive API calls to update
the object metadata (only possible via a copy object operation).
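
A minimal sketch of the per-chunk phase 1 handling, assuming the
filetime crate for the atime update (the patch itself goes through the
chunk store's cond_touch_chunk helper instead):

    use std::io::ErrorKind;
    use std::path::Path;

    use filetime::FileTime;

    /// Refresh the atime of an existing local chunk/marker file, or create
    /// an empty marker so that GC phase 2 keeps the corresponding S3 object.
    fn touch_or_create_marker(chunk_path: &Path) -> std::io::Result<()> {
        match filetime::set_file_atime(chunk_path, FileTime::now()) {
            Ok(()) => Ok(()),
            Err(err) if err.kind() == ErrorKind::NotFound => {
                // Chunk not cached locally: an empty file is enough, only
                // its existence and atime matter to phase 2.
                std::fs::File::options()
                    .write(true)
                    .create_new(true)
                    .open(chunk_path)
                    .map(|_| ())
            }
            Err(err) => Err(err),
        }
    }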

Phase 2 is implemented by fetching the list of all chunk objects via
the ListObjectsV2 API call, filtered by the chunk folder prefix. This
operation has to be performed in batches of at most 1000 objects, given
the API's response limit.
For each object key, look up the local marker file and decide, based on
the marker's existence and its atime, whether the chunk object needs to
be removed. Deletion happens via the DeleteObjects operation, which
allows multiple chunks to be removed with a single request.

This allows chunks which are no longer in use to be looked up
efficiently, while remaining performant and cost effective.

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
 pbs-datastore/src/datastore.rs | 200 ++++++++++++++++++++++++++++-----
 1 file changed, 175 insertions(+), 25 deletions(-)

diff --git a/pbs-datastore/src/datastore.rs b/pbs-datastore/src/datastore.rs
index 4fc6fe9a5..68d3ac6e2 100644
--- a/pbs-datastore/src/datastore.rs
+++ b/pbs-datastore/src/datastore.rs
@@ -4,7 +4,7 @@ use std::os::unix::ffi::OsStrExt;
 use std::os::unix::io::AsRawFd;
 use std::path::{Path, PathBuf};
 use std::sync::{Arc, LazyLock, Mutex};
-use std::time::Duration;
+use std::time::{Duration, SystemTime};
 
 use anyhow::{bail, format_err, Context, Error};
 use nix::unistd::{unlinkat, UnlinkatFlags};
@@ -1145,6 +1145,7 @@ impl DataStore {
         chunk_lru_cache: &mut LruCache<[u8; 32], ()>,
         status: &mut GarbageCollectionStatus,
         worker: &dyn WorkerTaskContext,
+        s3_client: Option<Arc<S3Client>>,
     ) -> Result<(), Error> {
         status.index_file_count += 1;
         status.index_data_bytes += index.index_bytes();
@@ -1159,21 +1160,41 @@ impl DataStore {
                 continue;
             }
 
-            if !self.inner.chunk_store.cond_touch_chunk(digest, false)? {
-                let hex = hex::encode(digest);
-                warn!(
-                    "warning: unable to access non-existent chunk {hex}, required by {file_name:?}"
-                );
-
-                // touch any corresponding .bad files to keep them around, meaning if a chunk is
-                // rewritten correctly they will be removed automatically, as well as if no index
-                // file requires the chunk anymore (won't get to this loop then)
-                for i in 0..=9 {
-                    let bad_ext = format!("{}.bad", i);
-                    let mut bad_path = PathBuf::new();
-                    bad_path.push(self.chunk_path(digest).0);
-                    bad_path.set_extension(bad_ext);
-                    self.inner.chunk_store.cond_touch_path(&bad_path, false)?;
+            match s3_client {
+                None => {
+                    // Filesystem backend
+                    if !self.inner.chunk_store.cond_touch_chunk(digest, false)? {
+                        let hex = hex::encode(digest);
+                        warn!(
+                            "warning: unable to access non-existent chunk {hex}, required by {file_name:?}"
+                        );
+
+                        // touch any corresponding .bad files to keep them around, meaning if a chunk is
+                        // rewritten correctly they will be removed automatically, as well as if no index
+                        // file requires the chunk anymore (won't get to this loop then)
+                        for i in 0..=9 {
+                            let bad_ext = format!("{}.bad", i);
+                            let mut bad_path = PathBuf::new();
+                            bad_path.push(self.chunk_path(digest).0);
+                            bad_path.set_extension(bad_ext);
+                            self.inner.chunk_store.cond_touch_path(&bad_path, false)?;
+                        }
+                    }
+                }
+                Some(ref _s3_client) => {
+                    // Update atime on local cache marker files.
+                    if !self.inner.chunk_store.cond_touch_chunk(digest, false)? {
+                        let (chunk_path, _digest) = self.chunk_path(digest);
+                        // Insert empty file as marker to tell GC phase2 that this is
+                        // a chunk still in-use, so to keep in the S3 object store.
+                        std::fs::File::options()
+                            .write(true)
+                            .create_new(true)
+                            .open(chunk_path)
+                            .with_context(|| {
+                                format!("failed to create marker for chunk {}", hex::encode(digest))
+                            })?;
+                    }
                 }
             }
         }
@@ -1185,6 +1206,7 @@ impl DataStore {
         status: &mut GarbageCollectionStatus,
         worker: &dyn WorkerTaskContext,
         cache_capacity: usize,
+        s3_client: Option<Arc<S3Client>>,
     ) -> Result<(), Error> {
         // Iterate twice over the datastore to fetch index files, even if this comes with an
         // additional runtime cost:
@@ -1274,6 +1296,7 @@ impl DataStore {
                                 &mut chunk_lru_cache,
                                 status,
                                 worker,
+                                s3_client.as_ref().cloned(),
                             )?;
 
                             if !unprocessed_index_list.remove(&path) {
@@ -1308,7 +1331,14 @@ impl DataStore {
                     continue;
                 }
             };
-            self.index_mark_used_chunks(index, &path, &mut chunk_lru_cache, status, worker)?;
+            self.index_mark_used_chunks(
+                index,
+                &path,
+                &mut chunk_lru_cache,
+                status,
+                worker,
+                s3_client.as_ref().cloned(),
+            )?;
             warn!("Marked chunks for unexpected index file at '{path:?}'");
         }
         if strange_paths_count > 0 {
@@ -1406,18 +1436,138 @@ impl DataStore {
                 1024 * 1024
             };
 
-            info!("Start GC phase1 (mark used chunks)");
+            let s3_client = match self.backend()? {
+                DatastoreBackend::Filesystem => None,
+                DatastoreBackend::S3(s3_client) => {
+                    proxmox_async::runtime::block_on(s3_client.head_bucket())
+                        .context("failed to reach bucket")?;
+                    Some(s3_client)
+                }
+            };
 
-            self.mark_used_chunks(&mut gc_status, worker, gc_cache_capacity)
-                .context("marking used chunks failed")?;
+            info!("Start GC phase1 (mark used chunks)");
 
-            info!("Start GC phase2 (sweep unused chunks)");
-            self.inner.chunk_store.sweep_unused_chunks(
-                oldest_writer,
-                min_atime,
+            self.mark_used_chunks(
                 &mut gc_status,
                 worker,
-            )?;
+                gc_cache_capacity,
+                s3_client.as_ref().cloned(),
+            )
+            .context("marking used chunks failed")?;
+
+            info!("Start GC phase2 (sweep unused chunks)");
+
+            if let Some(ref s3_client) = s3_client {
+                let mut chunk_count = 0;
+                let prefix = Some(".chunks/");
+                // Operates in batches of 1000 objects max per request
+                let mut list_bucket_result =
+                    proxmox_async::runtime::block_on(s3_client.list_objects_v2(prefix, None, None))?;
+
+                let mut delete_list = Vec::with_capacity(1000);
+                loop {
+                    for content in list_bucket_result.contents {
+                        // Check object is actually a chunk
+                        let digest = match Path::new(&content.key).file_name() {
+                            Some(file_name) => file_name,
+                            // should never be the case as objects will have a filename
+                            None => continue,
+                        };
+                        let bytes = digest.as_bytes();
+                        if bytes.len() != 64 && bytes.len() != 64 + ".0.bad".len() {
+                            continue;
+                        }
+                        if !bytes.iter().take(64).all(u8::is_ascii_hexdigit) {
+                            continue;
+                        }
+
+                        let bad = bytes.ends_with(b".bad");
+
+                        // Check local markers (created or atime updated during phase1) and
+                        // keep or delete chunk based on that.
+
+                        let mut chunk_path = self.base_path();
+                        chunk_path.push(&content.key);
+                        let atime = match std::fs::metadata(chunk_path) {
+                            Ok(stat) => stat.accessed()?,
+                            Err(err) if err.kind() == std::io::ErrorKind::NotFound => {
+                                // File not found, delete by setting atime to unix epoch
+                                info!("Not found, mark for deletion: {}", content.key);
+                                SystemTime::UNIX_EPOCH
+                            }
+                            Err(err) => return Err(err.into()),
+                        };
+                        let atime = atime.duration_since(SystemTime::UNIX_EPOCH)?.as_secs() as i64;
+
+                        chunk_count += 1;
+
+                        if atime < min_atime {
+                            delete_list.push(content.key);
+                            if bad {
+                                gc_status.removed_bad += 1;
+                            } else {
+                                gc_status.removed_chunks += 1;
+                            }
+                            gc_status.removed_bytes += content.size;
+                        } else if atime < oldest_writer {
+                            if bad {
+                                gc_status.still_bad += 1;
+                            } else {
+                                gc_status.pending_chunks += 1;
+                            }
+                            gc_status.pending_bytes += content.size;
+                        } else {
+                            if !bad {
+                                gc_status.disk_chunks += 1;
+                            }
+                            gc_status.disk_bytes += content.size;
+                        }
+                    }
+
+                    if !delete_list.is_empty() {
+                        //TODO: error handling
+                        let _delete_objects_result = proxmox_async::runtime::block_on(
+                            s3_client.delete_objects(&delete_list),
+                        )?;
+                        delete_list.clear();
+                    }
+
+                    // Process next batch of chunks if there is more
+                    if list_bucket_result.is_truncated {
+                        list_bucket_result =
+                            proxmox_async::runtime::block_on(s3_client.list_objects_v2(
+                                prefix,
+                                None,
+                                list_bucket_result.next_continuation_token.as_deref(),
+                            ))?;
+                        continue;
+                    }
+
+                    break;
+                }
+                info!("processed {chunk_count} total chunks");
+
+                // Phase 2 GC of Filesystem backed storage is phase 3 for S3 backed GC
+                info!("Start GC phase3 (sweep unused chunk markers)");
+
+                let mut tmp_gc_status = GarbageCollectionStatus {
+                    upid: Some(upid.to_string()),
+                    ..Default::default()
+                };
+                self.inner.chunk_store.sweep_unused_chunks(
+                    oldest_writer,
+                    min_atime,
+                    &mut tmp_gc_status,
+                    worker,
+                )?;
+            } else {
+                self.inner.chunk_store.sweep_unused_chunks(
+                    oldest_writer,
+                    min_atime,
+                    &mut gc_status,
+                    worker,
+                )?;
+            }
 
             info!(
                 "Removed garbage: {}",
-- 
2.39.5




