From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from firstgate.proxmox.com (firstgate.proxmox.com [IPv6:2a01:7e0:0:424::9]) by lore.proxmox.com (Postfix) with ESMTPS id 474431FF183 for ; Wed, 5 Nov 2025 13:32:21 +0100 (CET) Received: from firstgate.proxmox.com (localhost [127.0.0.1]) by firstgate.proxmox.com (Proxmox) with ESMTP id 11F0E22CB4; Wed, 5 Nov 2025 13:33:00 +0100 (CET) From: Christian Ebner To: pbs-devel@lists.proxmox.com Date: Wed, 5 Nov 2025 13:22:32 +0100 Message-ID: <20251105122233.439382-23-c.ebner@proxmox.com> X-Mailer: git-send-email 2.47.3 In-Reply-To: <20251105122233.439382-1-c.ebner@proxmox.com> References: <20251105122233.439382-1-c.ebner@proxmox.com> MIME-Version: 1.0 X-Bm-Milter-Handled: 55990f41-d878-4baa-be0a-ee34c49e34d2 X-Bm-Transport-Timestamp: 1762345369263 X-SPAM-LEVEL: Spam detection results: 0 AWL 0.045 Adjusted score from AWL reputation of From: address BAYES_00 -1.9 Bayes spam probability is 0 to 1% DMARC_MISSING 0.1 Missing DMARC policy KAM_DMARC_STATUS 0.01 Test Rule for DKIM or SPF Failure with Strict Alignment RCVD_IN_VALIDITY_CERTIFIED_BLOCKED 0.001 ADMINISTRATOR NOTICE: The query to Validity was blocked. See https://knowledge.validity.com/hc/en-us/articles/20961730681243 for more information. RCVD_IN_VALIDITY_RPBL_BLOCKED 0.001 ADMINISTRATOR NOTICE: The query to Validity was blocked. See https://knowledge.validity.com/hc/en-us/articles/20961730681243 for more information. RCVD_IN_VALIDITY_SAFE_BLOCKED 0.001 ADMINISTRATOR NOTICE: The query to Validity was blocked. See https://knowledge.validity.com/hc/en-us/articles/20961730681243 for more information. SPF_HELO_NONE 0.001 SPF: HELO does not publish an SPF Record SPF_PASS -0.001 SPF: sender matches SPF record URIBL_BLOCKED 0.001 ADMINISTRATOR NOTICE: The query to URIBL was blocked. See http://wiki.apache.org/spamassassin/DnsBlocklists#dnsbl-block for more information. [proxmox.com, datastore.rs] Subject: [pbs-devel] [PATCH proxmox-backup v3 22/23] GC: fix deadlock for cache eviction and garbage collection X-BeenThere: pbs-devel@lists.proxmox.com X-Mailman-Version: 2.1.29 Precedence: list List-Id: Proxmox Backup Server development discussion List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Reply-To: Proxmox Backup Server development discussion Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Errors-To: pbs-devel-bounces@lists.proxmox.com Sender: "pbs-devel" When inserting a chunk via the local datastore cache, first the chunk is inserted into the chunk store and then into the in-memory AsyncLruCache. If the cache capacity is reached, the AsycLruCache will execute a callback on the evicted cache node, which in case of the local datastore cache performs a clear chunk call. For this codepath, the AsyncLruCache is guarded by locking a mutex to get exclusive access on the cache, and then the chunk store mutex guard is acquired for safe clearing of the chunk. Garbage collection however tries the opposite if a chunk is no longer present and should be cleaned up. It first guards the chunk store mutex, only to then try and remove the chunk from the local chunk store and the AsyncLruCache, the latter trying to guarantee exclusive access by guarding its own mutex. This therefore can result in a deadlock, further locking the whole chunk store. Fix this by guarding the chunk store within the remove_chunk() helper method on the ChunkStore, not acquiring the lock in the garbage collection itself for this code path (still guarded by the per-chunk file lock). By this the order of locking is the same as on cache eviction. Reported-by: https://forum.proxmox.com/threads/174878/ Signed-off-by: Christian Ebner --- changes since version 2: - not present in previous version pbs-datastore/src/chunk_store.rs | 16 ++++++++++------ pbs-datastore/src/datastore.rs | 1 - pbs-datastore/src/local_datastore_lru_cache.rs | 3 ++- 3 files changed, 12 insertions(+), 8 deletions(-) diff --git a/pbs-datastore/src/chunk_store.rs b/pbs-datastore/src/chunk_store.rs index 305ce2316..234b5e8e7 100644 --- a/pbs-datastore/src/chunk_store.rs +++ b/pbs-datastore/src/chunk_store.rs @@ -434,6 +434,8 @@ impl ChunkStore { nix::fcntl::AtFlags::AT_SYMLINK_NOFOLLOW, ) { Ok(stat) if stat.st_atime < min_atime => { + // still protected by per-chunk file lock + drop(lock); let _ = cache.remove(&digest); return Ok(()); } @@ -445,19 +447,20 @@ impl ChunkStore { } } - unlinkat(Some(dirfd), filename, UnlinkatFlags::NoRemoveDir).map_err( - |err| { - format_err!( + let result = + unlinkat(Some(dirfd), filename, UnlinkatFlags::NoRemoveDir) + .map_err(|err| { + format_err!( "unlinking chunk {filename:?} failed on store '{}' - {err}", self.name, ) - }, - ) + }); + drop(lock); + result }, )?; } } - drop(lock); } Ok(()) @@ -717,6 +720,7 @@ impl ChunkStore { /// chunks by verifications and chunk inserts by backups. pub(crate) fn remove_chunk(&self, digest: &[u8; 32]) -> Result<(), Error> { let (chunk_path, _digest_str) = self.chunk_path(digest); + let _lock = self.mutex.lock(); std::fs::remove_file(chunk_path).map_err(Error::from) } diff --git a/pbs-datastore/src/datastore.rs b/pbs-datastore/src/datastore.rs index 8e2f31d7a..1ea86e019 100644 --- a/pbs-datastore/src/datastore.rs +++ b/pbs-datastore/src/datastore.rs @@ -1706,7 +1706,6 @@ impl DataStore { &mut gc_status, || { if let Some(cache) = self.cache() { - let _guard = self.inner.chunk_store.mutex().lock().unwrap(); cache.remove(&digest)?; } delete_list.push((content.key, _chunk_guard)); diff --git a/pbs-datastore/src/local_datastore_lru_cache.rs b/pbs-datastore/src/local_datastore_lru_cache.rs index 7b9d8caae..74647cdb2 100644 --- a/pbs-datastore/src/local_datastore_lru_cache.rs +++ b/pbs-datastore/src/local_datastore_lru_cache.rs @@ -42,7 +42,8 @@ impl LocalDatastoreLruCache { /// Remove a chunk from the local datastore cache. /// /// Callers to this method must assure that: - /// - no concurrent insert is being performed, the chunk store's mutex must be held. + /// - the chunk store's mutex is not being held. + /// - no concurrent insert is being performed, the per-chunk file lock must be held. /// - the chunk to be removed is no longer referenced by an index file. /// - the chunk to be removed has not been inserted by an active writer (atime newer than /// writer start time). -- 2.47.3 _______________________________________________ pbs-devel mailing list pbs-devel@lists.proxmox.com https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel