public inbox for pbs-devel@lists.proxmox.com
 help / color / mirror / Atom feed
From: Christian Ebner <c.ebner@proxmox.com>
To: pbs-devel@lists.proxmox.com
Subject: [pbs-devel] [RFC proxmox-backup] GC: fix deadlock for cache eviction and garbage collection
Date: Tue,  4 Nov 2025 18:52:08 +0100	[thread overview]
Message-ID: <20251104175208.872621-1-c.ebner@proxmox.com> (raw)

When inserting a chunk via the local datastore cache, first the
chunk is inserted into the chunk store and then into the in-memory
AsyncLruCache. If the cache capacity is reached, the AsycLruCache
will execute a callback on the evicted cache node, which in case of
the local datastore cache performs a clear chunk call. For this
codepath, the AsyncLruCache is guarded by locking a mutex to get
exclusive access on the cache, and then the chunk store mutex guard
is acquired for safe clearing of the chunk.

Garbage collection however tries the opposite if a chunk is no longer
present and should be cleaned up. It first guards the chunk store
mutex, only to then try and remove the chunk from the local chunk
store and the AsyncLruCache, the latter trying to guarantee
exclusive access by guarding its own mutex.

This therefore can result in a deadlock, further locking the whole
chunk store.

Fix this by locking the chunk store before even trying to clear the
chunk from the in-memory LRU cache, which is performed by the cache
eviction.

Reported-by: https://forum.proxmox.com/threads/174878/
Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
 pbs-datastore/src/chunk_store.rs               |  6 +++---
 pbs-datastore/src/local_datastore_lru_cache.rs | 14 ++++++++++----
 2 files changed, 13 insertions(+), 7 deletions(-)

diff --git a/pbs-datastore/src/chunk_store.rs b/pbs-datastore/src/chunk_store.rs
index ba7618e40..f9f13ec87 100644
--- a/pbs-datastore/src/chunk_store.rs
+++ b/pbs-datastore/src/chunk_store.rs
@@ -667,7 +667,9 @@ impl ChunkStore {
     ///
     /// Used to evict chunks from the local datastore cache, while keeping them as in-use markers
     /// for garbage collection. Returns with success also if chunk file is not pre-existing.
-    pub fn clear_chunk(&self, digest: &[u8; 32]) -> Result<(), Error> {
+    ///
+    /// Caller must hold the chunk store mutex lock.
+    pub unsafe fn clear_chunk(&self, digest: &[u8; 32]) -> Result<(), Error> {
         let (chunk_path, digest_str) = self.chunk_path(digest);
         let mut create_options = CreateOptions::new();
         if nix::unistd::Uid::effective().is_root() {
@@ -676,8 +678,6 @@ impl ChunkStore {
             create_options = create_options.owner(uid).group(gid);
         }
 
-        let _lock = self.mutex.lock();
-
         proxmox_sys::fs::replace_file(&chunk_path, &[], create_options, false)
             .map_err(|err| format_err!("clear chunk failed for {digest_str} - {err}"))?;
         Ok(())
diff --git a/pbs-datastore/src/local_datastore_lru_cache.rs b/pbs-datastore/src/local_datastore_lru_cache.rs
index fe3b51a55..2cab6b83d 100644
--- a/pbs-datastore/src/local_datastore_lru_cache.rs
+++ b/pbs-datastore/src/local_datastore_lru_cache.rs
@@ -35,8 +35,11 @@ impl LocalDatastoreLruCache {
     /// Fails if the chunk cannot be inserted successfully.
     pub fn insert(&self, digest: &[u8; 32], chunk: &DataBlob) -> Result<(), Error> {
         self.store.insert_chunk(chunk, digest)?;
-        self.cache
-            .insert(*digest, (), |digest| self.store.clear_chunk(&digest))
+        let _lock = self.store.mutex().lock();
+        self.cache.insert(*digest, (), |digest| unsafe {
+            // unsafe condition satisfied, holding chunk store mutex guard
+            self.store.clear_chunk(&digest)
+        })
     }
 
     /// Remove a chunk from the local datastore cache.
@@ -71,8 +74,11 @@ impl LocalDatastoreLruCache {
             Ok(mut file) => match DataBlob::load_from_reader(&mut file) {
                 // File was still cached with contents, load response from file
                 Ok(chunk) => {
-                    self.cache
-                        .insert(*digest, (), |digest| self.store.clear_chunk(&digest))?;
+                    let _lock = self.store.mutex().lock();
+                    self.cache.insert(*digest, (), |digest| unsafe {
+                        // unsafe condition satisfied, holding chunk store mutex guard
+                        self.store.clear_chunk(&digest)
+                    })?;
                     Ok(Some(chunk))
                 }
                 // File was empty, might have been evicted since
-- 
2.47.3



_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel


                 reply	other threads:[~2025-11-04 17:51 UTC|newest]

Thread overview: [no followups] expand[flat|nested]  mbox.gz  Atom feed

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20251104175208.872621-1-c.ebner@proxmox.com \
    --to=c.ebner@proxmox.com \
    --cc=pbs-devel@lists.proxmox.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox
Service provided by Proxmox Server Solutions GmbH | Privacy | Legal