From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from firstgate.proxmox.com (firstgate.proxmox.com [212.224.123.68]) by lore.proxmox.com (Postfix) with ESMTPS id 207C71FF191 for ; Tue, 4 Nov 2025 18:51:48 +0100 (CET) Received: from firstgate.proxmox.com (localhost [127.0.0.1]) by firstgate.proxmox.com (Proxmox) with ESMTP id E492C134CB; Tue, 4 Nov 2025 18:52:26 +0100 (CET) From: Christian Ebner To: pbs-devel@lists.proxmox.com Date: Tue, 4 Nov 2025 18:52:08 +0100 Message-ID: <20251104175208.872621-1-c.ebner@proxmox.com> X-Mailer: git-send-email 2.47.3 MIME-Version: 1.0 X-Bm-Milter-Handled: 55990f41-d878-4baa-be0a-ee34c49e34d2 X-Bm-Transport-Timestamp: 1762278726407 X-SPAM-LEVEL: Spam detection results: 0 AWL 0.046 Adjusted score from AWL reputation of From: address BAYES_00 -1.9 Bayes spam probability is 0 to 1% DMARC_MISSING 0.1 Missing DMARC policy KAM_DMARC_STATUS 0.01 Test Rule for DKIM or SPF Failure with Strict Alignment RCVD_IN_VALIDITY_CERTIFIED_BLOCKED 0.001 ADMINISTRATOR NOTICE: The query to Validity was blocked. See https://knowledge.validity.com/hc/en-us/articles/20961730681243 for more information. RCVD_IN_VALIDITY_RPBL_BLOCKED 0.001 ADMINISTRATOR NOTICE: The query to Validity was blocked. See https://knowledge.validity.com/hc/en-us/articles/20961730681243 for more information. RCVD_IN_VALIDITY_SAFE_BLOCKED 0.001 ADMINISTRATOR NOTICE: The query to Validity was blocked. See https://knowledge.validity.com/hc/en-us/articles/20961730681243 for more information. SPF_HELO_NONE 0.001 SPF: HELO does not publish an SPF Record SPF_PASS -0.001 SPF: sender matches SPF record Subject: [pbs-devel] [RFC proxmox-backup] GC: fix deadlock for cache eviction and garbage collection X-BeenThere: pbs-devel@lists.proxmox.com X-Mailman-Version: 2.1.29 Precedence: list List-Id: Proxmox Backup Server development discussion List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Reply-To: Proxmox Backup Server development discussion Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Errors-To: pbs-devel-bounces@lists.proxmox.com Sender: "pbs-devel" When inserting a chunk via the local datastore cache, first the chunk is inserted into the chunk store and then into the in-memory AsyncLruCache. If the cache capacity is reached, the AsycLruCache will execute a callback on the evicted cache node, which in case of the local datastore cache performs a clear chunk call. For this codepath, the AsyncLruCache is guarded by locking a mutex to get exclusive access on the cache, and then the chunk store mutex guard is acquired for safe clearing of the chunk. Garbage collection however tries the opposite if a chunk is no longer present and should be cleaned up. It first guards the chunk store mutex, only to then try and remove the chunk from the local chunk store and the AsyncLruCache, the latter trying to guarantee exclusive access by guarding its own mutex. This therefore can result in a deadlock, further locking the whole chunk store. Fix this by locking the chunk store before even trying to clear the chunk from the in-memory LRU cache, which is performed by the cache eviction. Reported-by: https://forum.proxmox.com/threads/174878/ Signed-off-by: Christian Ebner --- pbs-datastore/src/chunk_store.rs | 6 +++--- pbs-datastore/src/local_datastore_lru_cache.rs | 14 ++++++++++---- 2 files changed, 13 insertions(+), 7 deletions(-) diff --git a/pbs-datastore/src/chunk_store.rs b/pbs-datastore/src/chunk_store.rs index ba7618e40..f9f13ec87 100644 --- a/pbs-datastore/src/chunk_store.rs +++ b/pbs-datastore/src/chunk_store.rs @@ -667,7 +667,9 @@ impl ChunkStore { /// /// Used to evict chunks from the local datastore cache, while keeping them as in-use markers /// for garbage collection. Returns with success also if chunk file is not pre-existing. - pub fn clear_chunk(&self, digest: &[u8; 32]) -> Result<(), Error> { + /// + /// Caller must hold the chunk store mutex lock. + pub unsafe fn clear_chunk(&self, digest: &[u8; 32]) -> Result<(), Error> { let (chunk_path, digest_str) = self.chunk_path(digest); let mut create_options = CreateOptions::new(); if nix::unistd::Uid::effective().is_root() { @@ -676,8 +678,6 @@ impl ChunkStore { create_options = create_options.owner(uid).group(gid); } - let _lock = self.mutex.lock(); - proxmox_sys::fs::replace_file(&chunk_path, &[], create_options, false) .map_err(|err| format_err!("clear chunk failed for {digest_str} - {err}"))?; Ok(()) diff --git a/pbs-datastore/src/local_datastore_lru_cache.rs b/pbs-datastore/src/local_datastore_lru_cache.rs index fe3b51a55..2cab6b83d 100644 --- a/pbs-datastore/src/local_datastore_lru_cache.rs +++ b/pbs-datastore/src/local_datastore_lru_cache.rs @@ -35,8 +35,11 @@ impl LocalDatastoreLruCache { /// Fails if the chunk cannot be inserted successfully. pub fn insert(&self, digest: &[u8; 32], chunk: &DataBlob) -> Result<(), Error> { self.store.insert_chunk(chunk, digest)?; - self.cache - .insert(*digest, (), |digest| self.store.clear_chunk(&digest)) + let _lock = self.store.mutex().lock(); + self.cache.insert(*digest, (), |digest| unsafe { + // unsafe condition satisfied, holding chunk store mutex guard + self.store.clear_chunk(&digest) + }) } /// Remove a chunk from the local datastore cache. @@ -71,8 +74,11 @@ impl LocalDatastoreLruCache { Ok(mut file) => match DataBlob::load_from_reader(&mut file) { // File was still cached with contents, load response from file Ok(chunk) => { - self.cache - .insert(*digest, (), |digest| self.store.clear_chunk(&digest))?; + let _lock = self.store.mutex().lock(); + self.cache.insert(*digest, (), |digest| unsafe { + // unsafe condition satisfied, holding chunk store mutex guard + self.store.clear_chunk(&digest) + })?; Ok(Some(chunk)) } // File was empty, might have been evicted since -- 2.47.3 _______________________________________________ pbs-devel mailing list pbs-devel@lists.proxmox.com https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel