From: Christian Ebner <c.ebner@proxmox.com>
To: pbs-devel@lists.proxmox.com
Subject: [pbs-devel] [PATCH proxmox-backup v3 22/23] GC: fix deadlock for cache eviction and garbage collection
Date: Wed, 5 Nov 2025 13:22:32 +0100
Message-ID: <20251105122233.439382-23-c.ebner@proxmox.com>
In-Reply-To: <20251105122233.439382-1-c.ebner@proxmox.com>

When inserting a chunk via the local datastore cache, the chunk is
first inserted into the chunk store and then into the in-memory
AsyncLruCache. If the cache capacity is reached, the AsyncLruCache
executes a callback on the evicted cache node, which in the case of
the local datastore cache performs a clear chunk call. On this code
path, the AsyncLruCache is guarded by locking a mutex to get exclusive
access to the cache, and only then is the chunk store mutex acquired
for safe clearing of the chunk.

Garbage collection, however, acquires the locks in the opposite order
when a chunk is no longer present and should be cleaned up. It first
locks the chunk store mutex, and only then tries to remove the chunk
from the local chunk store and from the AsyncLruCache, the latter
guaranteeing exclusive access by locking its own mutex.

This can therefore result in a deadlock, which in turn blocks the
whole chunk store.
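
The lock-order inversion can be illustrated with a minimal,
deterministic sketch. It assumes plain std::sync::Mutex values as
hypothetical stand-ins for the AsyncLruCache mutex and the chunk store
mutex (the real code uses different types). Each thread takes its first
lock, then probes the other lock with try_lock() instead of a blocking
lock(), so the sketch reports the deadlock instead of hanging:

```rust
use std::sync::{Arc, Barrier, Mutex};
use std::thread;

/// Hypothetical stand-ins for the cache mutex and the chunk store mutex.
struct Locks {
    cache: Mutex<()>,
    chunk_store: Mutex<()>,
}

/// Returns true if each thread holds the lock the other one needs next,
/// i.e. blocking lock() calls would have deadlocked.
fn inverted_order_would_deadlock() -> bool {
    let locks = Arc::new(Locks {
        cache: Mutex::new(()),
        chunk_store: Mutex::new(()),
    });
    let barrier = Arc::new(Barrier::new(2));

    // Cache eviction path: cache mutex first, then chunk store mutex.
    let (l, b) = (Arc::clone(&locks), Arc::clone(&barrier));
    let eviction = thread::spawn(move || {
        let _cache = l.cache.lock().unwrap();
        b.wait(); // both threads now hold their first lock
        let blocked = l.chunk_store.try_lock().is_err();
        b.wait(); // keep holding the first lock until the other thread has probed
        blocked
    });

    // Old GC path: chunk store mutex first, then cache mutex.
    let (l, b) = (Arc::clone(&locks), Arc::clone(&barrier));
    let gc = thread::spawn(move || {
        let _store = l.chunk_store.lock().unwrap();
        b.wait();
        let blocked = l.cache.try_lock().is_err();
        b.wait();
        blocked
    });

    eviction.join().unwrap() && gc.join().unwrap()
}
```

Calling inverted_order_would_deadlock() returns true: with the two
paths taking the locks in opposite order, neither could make progress
if they used blocking lock() calls.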

Fix this by acquiring the chunk store mutex within the remove_chunk()
helper method of the ChunkStore instead of in garbage collection itself
for this code path (which remains protected by the per-chunk file
lock). This way, the locks are acquired in the same order as on cache
eviction.
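
A simplified sketch of the resulting lock ordering follows. It assumes
hypothetical Cache and ChunkStore types (not the actual PBS structs)
with std::sync::Mutex in place of the real locks. Removal always takes
the cache mutex first and only reaches the chunk store mutex inside
remove_chunk(), so an eviction thread and a GC thread can run
concurrently without deadlocking:

```rust
use std::collections::HashSet;
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical stand-in for the chunk store; the mutex guards the set of
/// chunk digests present on disk.
struct ChunkStore {
    mutex: Mutex<HashSet<u32>>,
}

impl ChunkStore {
    /// Like the patched remove_chunk(): the chunk store mutex is taken
    /// inside the helper, so callers must not already hold it.
    fn remove_chunk(&self, digest: u32) -> bool {
        self.mutex.lock().unwrap().remove(&digest)
    }
}

/// Hypothetical stand-in for the in-memory LRU cache.
struct Cache {
    entries: Mutex<HashSet<u32>>,
    store: ChunkStore,
}

impl Cache {
    /// Lock order: cache mutex first, then (inside remove_chunk) the chunk
    /// store mutex. The cache guard stays alive until the function returns.
    fn remove(&self, digest: u32) -> bool {
        let mut entries = self.entries.lock().unwrap();
        entries.remove(&digest);
        self.store.remove_chunk(digest)
    }
}

/// Runs an "eviction" thread and a "GC" thread concurrently. Both go through
/// Cache::remove(), so they always take the locks in the same order.
fn run(n: u32) -> usize {
    let all: HashSet<u32> = (0..n).collect();
    let cache = Arc::new(Cache {
        entries: Mutex::new(all.clone()),
        store: ChunkStore { mutex: Mutex::new(all) },
    });

    // Eviction path: removes the even digests.
    let c = Arc::clone(&cache);
    let eviction = thread::spawn(move || {
        (0..n).filter(|d| d % 2 == 0).filter(|&d| c.remove(d)).count()
    });

    // GC path: removes the odd digests without holding the chunk store mutex.
    let c = Arc::clone(&cache);
    let gc = thread::spawn(move || {
        (0..n).filter(|d| d % 2 == 1).filter(|&d| c.remove(d)).count()
    });

    eviction.join().unwrap() + gc.join().unwrap()
}
```

Here run(1000) returns 1000 once both threads finish. Since
std::sync::Mutex is not reentrant, a GC thread that still locked the
chunk store mutex before calling Cache::remove() would deadlock or
panic in this sketch as soon as remove_chunk() tried to take the same
mutex again, which is why callers must no longer hold it.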
Reported-by: https://forum.proxmox.com/threads/174878/
Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
changes since version 2:
- not present in previous version
pbs-datastore/src/chunk_store.rs | 16 ++++++++++------
pbs-datastore/src/datastore.rs | 1 -
pbs-datastore/src/local_datastore_lru_cache.rs | 3 ++-
3 files changed, 12 insertions(+), 8 deletions(-)
diff --git a/pbs-datastore/src/chunk_store.rs b/pbs-datastore/src/chunk_store.rs
index 305ce2316..234b5e8e7 100644
--- a/pbs-datastore/src/chunk_store.rs
+++ b/pbs-datastore/src/chunk_store.rs
@@ -434,6 +434,8 @@ impl ChunkStore {
nix::fcntl::AtFlags::AT_SYMLINK_NOFOLLOW,
) {
Ok(stat) if stat.st_atime < min_atime => {
+ // still protected by per-chunk file lock
+ drop(lock);
let _ = cache.remove(&digest);
return Ok(());
}
@@ -445,19 +447,20 @@ impl ChunkStore {
}
}
- unlinkat(Some(dirfd), filename, UnlinkatFlags::NoRemoveDir).map_err(
- |err| {
- format_err!(
+ let result =
+ unlinkat(Some(dirfd), filename, UnlinkatFlags::NoRemoveDir)
+ .map_err(|err| {
+ format_err!(
"unlinking chunk {filename:?} failed on store '{}' - {err}",
self.name,
)
- },
- )
+ });
+ drop(lock);
+ result
},
)?;
}
}
- drop(lock);
}
Ok(())
@@ -717,6 +720,7 @@ impl ChunkStore {
/// chunks by verifications and chunk inserts by backups.
pub(crate) fn remove_chunk(&self, digest: &[u8; 32]) -> Result<(), Error> {
let (chunk_path, _digest_str) = self.chunk_path(digest);
+ let _lock = self.mutex.lock();
std::fs::remove_file(chunk_path).map_err(Error::from)
}
diff --git a/pbs-datastore/src/datastore.rs b/pbs-datastore/src/datastore.rs
index 8e2f31d7a..1ea86e019 100644
--- a/pbs-datastore/src/datastore.rs
+++ b/pbs-datastore/src/datastore.rs
@@ -1706,7 +1706,6 @@ impl DataStore {
&mut gc_status,
|| {
if let Some(cache) = self.cache() {
- let _guard = self.inner.chunk_store.mutex().lock().unwrap();
cache.remove(&digest)?;
}
delete_list.push((content.key, _chunk_guard));
diff --git a/pbs-datastore/src/local_datastore_lru_cache.rs b/pbs-datastore/src/local_datastore_lru_cache.rs
index 7b9d8caae..74647cdb2 100644
--- a/pbs-datastore/src/local_datastore_lru_cache.rs
+++ b/pbs-datastore/src/local_datastore_lru_cache.rs
@@ -42,7 +42,8 @@ impl LocalDatastoreLruCache {
/// Remove a chunk from the local datastore cache.
///
/// Callers to this method must assure that:
- /// - no concurrent insert is being performed, the chunk store's mutex must be held.
+ /// - the chunk store's mutex is not being held.
+ /// - no concurrent insert is being performed, the per-chunk file lock must be held.
/// - the chunk to be removed is no longer referenced by an index file.
/// - the chunk to be removed has not been inserted by an active writer (atime newer than
/// writer start time).
--
2.47.3
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
Thread overview: 24+ messages
2025-11-05 12:22 [pbs-devel] [PATCH proxmox-backup v3 00/23] fix chunk upload/insert, rename corrupt chunks and GC race conditions for s3 backend Christian Ebner
2025-11-05 12:22 ` [pbs-devel] [PATCH proxmox-backup v3 01/23] sync: pull: instantiate backend only once per sync job Christian Ebner
2025-11-05 12:22 ` [pbs-devel] [PATCH proxmox-backup v3 02/23] api/datastore: move group notes setting to the datastore Christian Ebner
2025-11-05 12:22 ` [pbs-devel] [PATCH proxmox-backup v3 03/23] api/datastore: move snapshot deletion into dedicated datastore helper Christian Ebner
2025-11-05 12:22 ` [pbs-devel] [PATCH proxmox-backup v3 04/23] api/datastore: move backup log upload by implementing " Christian Ebner
2025-11-05 12:22 ` [pbs-devel] [PATCH proxmox-backup v3 05/23] api: backup: use datastore add_blob helper for backup session Christian Ebner
2025-11-05 12:22 ` [pbs-devel] [PATCH proxmox-backup v3 06/23] api/datastore: add dedicated datastore helper to set snapshot notes Christian Ebner
2025-11-05 12:22 ` [pbs-devel] [PATCH proxmox-backup v3 07/23] api/datastore: move s3 index upload helper to datastore backend Christian Ebner
2025-11-05 12:22 ` [pbs-devel] [PATCH proxmox-backup v3 08/23] datastore: refactor chunk insert based on backend Christian Ebner
2025-11-05 12:22 ` [pbs-devel] [PATCH proxmox-backup v3 09/23] verify: rename corrupted to corrupt in log output and function names Christian Ebner
2025-11-05 12:22 ` [pbs-devel] [PATCH proxmox-backup v3 10/23] verify/datastore: make rename corrupt chunk a datastore helper method Christian Ebner
2025-11-05 12:22 ` [pbs-devel] [PATCH proxmox-backup v3 11/23] datastore: refactor rename_corrupt_chunk error handling Christian Ebner
2025-11-05 12:22 ` [pbs-devel] [PATCH proxmox-backup v3 12/23] chunk store: implement per-chunk file locking helper for s3 backend Christian Ebner
2025-11-05 12:22 ` [pbs-devel] [PATCH proxmox-backup v3 13/23] datastore: acquire chunk store mutex lock when renaming corrupt chunk Christian Ebner
2025-11-05 12:22 ` [pbs-devel] [PATCH proxmox-backup v3 14/23] datastore: get per-chunk file lock for chunk rename on s3 backend Christian Ebner
2025-11-05 12:22 ` [pbs-devel] [PATCH proxmox-backup v3 15/23] fix #6961: datastore: verify: evict corrupt chunks from in-memory LRU cache Christian Ebner
2025-11-05 12:22 ` [pbs-devel] [PATCH proxmox-backup v3 16/23] datastore: add locking to protect against races on chunk insert for s3 Christian Ebner
2025-11-05 12:22 ` [pbs-devel] [PATCH proxmox-backup v3 17/23] GC: fix race with chunk upload/insert on s3 backends Christian Ebner
2025-11-05 12:22 ` [pbs-devel] [PATCH proxmox-backup v3 18/23] GC: lock chunk marker before cleanup in phase 3 " Christian Ebner
2025-11-05 12:22 ` [pbs-devel] [PATCH proxmox-backup v3 19/23] datastore: GC: drop overly verbose info message during s3 chunk sweep Christian Ebner
2025-11-05 12:22 ` [pbs-devel] [PATCH proxmox-backup v3 20/23] chunk store: reduce exposure of clear_chunk() to crate only Christian Ebner
2025-11-05 12:22 ` [pbs-devel] [PATCH proxmox-backup v3 21/23] chunk store: make chunk removal a helper method of the chunk store Christian Ebner
2025-11-05 12:22 ` Christian Ebner [this message]
2025-11-05 12:22 ` [pbs-devel] [PATCH proxmox-backup v3 23/23] chunk store: never fail when trying to remove missing chunk file Christian Ebner