all lists on lists.proxmox.com
 help / color / mirror / Atom feed
From: Christian Ebner <c.ebner@proxmox.com>
To: pbs-devel@lists.proxmox.com
Subject: [pbs-devel] [PATCH proxmox-backup 15/17] GC: fix race with chunk upload/insert on s3 backends
Date: Mon,  3 Nov 2025 12:31:18 +0100	[thread overview]
Message-ID: <20251103113120.239455-16-c.ebner@proxmox.com> (raw)
In-Reply-To: <20251103113120.239455-1-c.ebner@proxmox.com>

The previous approach relying soly on local marker files was flawed
as it could not protect against all the possible races between chunk
upload to the s3 object store, insertion in the local datastore cache
and in-memory LRU cache.

Since these operations are now protected by getting a per-chunk file
lock, use that to check if it is safe to remove the chunk and do so
in a consistent manner by holding the chunk lock guard until it is
actually removed from the s3 backend and the caches.

Since an error when removing the chunk from cache could lead to
inconsistencies, GC must now fail in that case.

The chunk store lock is now only required on cache remove.

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
 pbs-datastore/src/datastore.rs | 27 ++++++++++++++++++---------
 1 file changed, 18 insertions(+), 9 deletions(-)

diff --git a/pbs-datastore/src/datastore.rs b/pbs-datastore/src/datastore.rs
index b36424131..beca72e9a 100644
--- a/pbs-datastore/src/datastore.rs
+++ b/pbs-datastore/src/datastore.rs
@@ -1639,14 +1639,18 @@ impl DataStore {
 
             let mut delete_list = Vec::with_capacity(1000);
             loop {
-                let lock = self.inner.chunk_store.mutex().lock().unwrap();
-
                 for content in list_bucket_result.contents {
                     let (chunk_path, digest) = match self.chunk_path_from_object_key(&content.key) {
                         Some(path) => path,
                         None => continue,
                     };
 
+                    let timeout = std::time::Duration::from_secs(0);
+                    let _chunk_guard = match self.inner.chunk_store.lock_chunk(&digest, timeout) {
+                        Ok(guard) => guard,
+                        Err(_) => continue,
+                    };
+
                     // Check local markers (created or atime updated during phase1) and
                     // keep or delete chunk based on that.
                     let atime = match std::fs::metadata(&chunk_path) {
@@ -1675,10 +1679,10 @@ impl DataStore {
                             &mut gc_status,
                             || {
                                 if let Some(cache) = self.cache() {
-                                    // ignore errors, phase 3 will retry cleanup anyways
-                                    let _ = cache.remove(&digest);
+                                    let _guard = self.inner.chunk_store.mutex().lock().unwrap();
+                                    cache.remove(&digest)?;
                                 }
-                                delete_list.push(content.key);
+                                delete_list.push((content.key, _chunk_guard));
                                 Ok(())
                             },
                         )?;
@@ -1687,14 +1691,19 @@ impl DataStore {
                     chunk_count += 1;
                 }
 
-                drop(lock);
-
                 if !delete_list.is_empty() {
-                    let delete_objects_result =
-                        proxmox_async::runtime::block_on(s3_client.delete_objects(&delete_list))?;
+                    let delete_objects_result = proxmox_async::runtime::block_on(
+                        s3_client.delete_objects(
+                            &delete_list
+                                .iter()
+                                .map(|(key, _)| key.clone())
+                                .collect::<Vec<S3ObjectKey>>(),
+                        ),
+                    )?;
                     if let Some(_err) = delete_objects_result.error {
                         bail!("failed to delete some objects");
                     }
+                    // release all chunk guards
                     delete_list.clear();
                 }
 
-- 
2.47.3



_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel


  parent reply	other threads:[~2025-11-03 11:31 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-11-03 11:31 [pbs-devel] [PATCH proxmox-backup 00/17] fix chunk upload/insert, rename corrupt chunks and GC race conditions for s3 backend Christian Ebner
2025-11-03 11:31 ` [pbs-devel] [PATCH proxmox-backup 01/17] sync: pull: instantiate backend only once per sync job Christian Ebner
2025-11-03 14:51   ` Fabian Grünbichler
2025-11-03 11:31 ` [pbs-devel] [PATCH proxmox-backup 02/17] api/datastore: move group notes setting to the datastore Christian Ebner
2025-11-03 14:51   ` Fabian Grünbichler
2025-11-04  8:51     ` Christian Ebner
2025-11-04  9:13       ` Fabian Grünbichler
2025-11-04  9:37         ` Christian Ebner
2025-11-03 11:31 ` [pbs-devel] [PATCH proxmox-backup 03/17] api/datastore: move snapshot deletion into dedicated datastore helper Christian Ebner
2025-11-03 11:31 ` [pbs-devel] [PATCH proxmox-backup 04/17] api/datastore: move backup log upload by implementing " Christian Ebner
2025-11-03 14:51   ` Fabian Grünbichler
2025-11-04  8:47     ` Christian Ebner
2025-11-03 11:31 ` [pbs-devel] [PATCH proxmox-backup 05/17] api/datastore: add dedicated datastore helper to set snapshot notes Christian Ebner
2025-11-03 11:31 ` [pbs-devel] [PATCH proxmox-backup 06/17] datastore: refactor chunk insert based on backend Christian Ebner
2025-11-03 11:31 ` [pbs-devel] [PATCH proxmox-backup 07/17] verify: rename corrupted to corrupt in log output and function names Christian Ebner
2025-11-03 11:31 ` [pbs-devel] [PATCH proxmox-backup 08/17] verify/datastore: make rename corrupt chunk a datastore helper method Christian Ebner
2025-11-03 11:31 ` [pbs-devel] [PATCH proxmox-backup 09/17] datastore: refactor rename_corrupt_chunk error handling Christian Ebner
2025-11-03 11:31 ` [pbs-devel] [PATCH proxmox-backup 10/17] datastore: implement per-chunk file locking helper for s3 backend Christian Ebner
2025-11-03 14:51   ` Fabian Grünbichler
2025-11-04  8:45     ` Christian Ebner
2025-11-04  9:01       ` Fabian Grünbichler
2025-11-03 11:31 ` [pbs-devel] [PATCH proxmox-backup 11/17] datastore: acquire chunk store mutex lock when renaming corrupt chunk Christian Ebner
2025-11-03 11:31 ` [pbs-devel] [PATCH proxmox-backup 12/17] datastore: get per-chunk file lock for chunk rename on s3 backend Christian Ebner
2025-11-03 14:51   ` Fabian Grünbichler
2025-11-04  8:33     ` Christian Ebner
2025-11-03 11:31 ` [pbs-devel] [PATCH proxmox-backup 13/17] fix #6961: datastore: verify: evict corrupt chunks from in-memory LRU cache Christian Ebner
2025-11-03 11:31 ` [pbs-devel] [PATCH proxmox-backup 14/17] datastore: add locking to protect against races on chunk insert for s3 Christian Ebner
2025-11-03 11:31 ` Christian Ebner [this message]
2025-11-03 11:31 ` [pbs-devel] [PATCH proxmox-backup 16/17] GC: lock chunk marker before cleanup in phase 3 on s3 backends Christian Ebner
2025-11-03 11:31 ` [pbs-devel] [PATCH proxmox-backup 17/17] datastore: GC: drop overly verbose info message during s3 chunk sweep Christian Ebner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20251103113120.239455-16-c.ebner@proxmox.com \
    --to=c.ebner@proxmox.com \
    --cc=pbs-devel@lists.proxmox.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.
Service provided by Proxmox Server Solutions GmbH | Privacy | Legal