From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from firstgate.proxmox.com (firstgate.proxmox.com [212.224.123.68]) by lore.proxmox.com (Postfix) with ESMTPS id B0C811FF15E for ; Mon, 10 Nov 2025 12:56:18 +0100 (CET) Received: from firstgate.proxmox.com (localhost [127.0.0.1]) by firstgate.proxmox.com (Proxmox) with ESMTP id 6C88D1428E; Mon, 10 Nov 2025 12:56:56 +0100 (CET) From: Christian Ebner To: pbs-devel@lists.proxmox.com Date: Mon, 10 Nov 2025 12:56:26 +0100 Message-ID: <20251110115627.280318-14-c.ebner@proxmox.com> X-Mailer: git-send-email 2.47.3 In-Reply-To: <20251110115627.280318-1-c.ebner@proxmox.com> References: <20251110115627.280318-1-c.ebner@proxmox.com> MIME-Version: 1.0 X-Bm-Milter-Handled: 55990f41-d878-4baa-be0a-ee34c49e34d2 X-Bm-Transport-Timestamp: 1762775787200 X-SPAM-LEVEL: Spam detection results: 0 AWL 0.048 Adjusted score from AWL reputation of From: address BAYES_00 -1.9 Bayes spam probability is 0 to 1% DMARC_MISSING 0.1 Missing DMARC policy KAM_DMARC_STATUS 0.01 Test Rule for DKIM or SPF Failure with Strict Alignment SPF_HELO_NONE 0.001 SPF: HELO does not publish an SPF Record SPF_PASS -0.001 SPF: sender matches SPF record Subject: [pbs-devel] [PATCH proxmox-backup v4 13/14] GC: fix deadlock for cache eviction and garbage collection X-BeenThere: pbs-devel@lists.proxmox.com X-Mailman-Version: 2.1.29 Precedence: list List-Id: Proxmox Backup Server development discussion List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Reply-To: Proxmox Backup Server development discussion Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Errors-To: pbs-devel-bounces@lists.proxmox.com Sender: "pbs-devel" When inserting a chunk via the local datastore cache, first the chunk is inserted into the chunk store and then into the in-memory AsyncLruCache. If the cache capacity is reached, the AsycLruCache will execute a callback on the evicted cache node, which in case of the local datastore cache performs a clear chunk call. For this codepath, the AsyncLruCache is guarded by locking a mutex to get exclusive access on the cache, and then the chunk store mutex guard is acquired for safe clearing of the chunk. Garbage collection however tries the opposite if a chunk is no longer present and should be cleaned up. It first guards the chunk store mutex, only to then try and remove the chunk from the local chunk store and the AsyncLruCache, the latter trying to guarantee exclusive access by guarding its own mutex. This therefore can result in a deadlock, further locking the whole chunk store. Fix this by guarding the chunk store within the remove_chunk() helper method on the ChunkStore, not acquiring the lock in the garbage collection itself for this code path (still guarded by the per-chunk file lock). By this the order of locking is the same as on cache eviction. Reported-by: https://forum.proxmox.com/threads/174878/ Signed-off-by: Christian Ebner --- pbs-datastore/src/chunk_store.rs | 4 +++- pbs-datastore/src/datastore.rs | 1 - pbs-datastore/src/local_datastore_lru_cache.rs | 3 ++- 3 files changed, 5 insertions(+), 3 deletions(-) diff --git a/pbs-datastore/src/chunk_store.rs b/pbs-datastore/src/chunk_store.rs index 879ac4923..273b17d5f 100644 --- a/pbs-datastore/src/chunk_store.rs +++ b/pbs-datastore/src/chunk_store.rs @@ -430,6 +430,8 @@ impl ChunkStore { if let Ok(_guard) = self.lock_chunk(&digest, Duration::from_secs(0)) { + // still protected by per-chunk file lock + drop(lock); cache.remove(&digest)?; } @@ -450,7 +452,6 @@ impl ChunkStore { )?; } } - drop(lock); } Ok(()) @@ -710,6 +711,7 @@ impl ChunkStore { /// chunks by verifications and chunk inserts by backups. pub(crate) fn remove_chunk(&self, digest: &[u8; 32]) -> Result<(), Error> { let (chunk_path, _digest_str) = self.chunk_path(digest); + let _lock = self.mutex.lock(); std::fs::remove_file(chunk_path).map_err(Error::from) } diff --git a/pbs-datastore/src/datastore.rs b/pbs-datastore/src/datastore.rs index 354caeae1..0a86aaede 100644 --- a/pbs-datastore/src/datastore.rs +++ b/pbs-datastore/src/datastore.rs @@ -1710,7 +1710,6 @@ impl DataStore { &mut gc_status, || { if let Some(cache) = self.cache() { - let _guard = self.inner.chunk_store.mutex().lock().unwrap(); cache.remove(&digest)?; } delete_list.push((content.key, _chunk_guard)); diff --git a/pbs-datastore/src/local_datastore_lru_cache.rs b/pbs-datastore/src/local_datastore_lru_cache.rs index 7b9d8caae..74647cdb2 100644 --- a/pbs-datastore/src/local_datastore_lru_cache.rs +++ b/pbs-datastore/src/local_datastore_lru_cache.rs @@ -42,7 +42,8 @@ impl LocalDatastoreLruCache { /// Remove a chunk from the local datastore cache. /// /// Callers to this method must assure that: - /// - no concurrent insert is being performed, the chunk store's mutex must be held. + /// - the chunk store's mutex is not being held. + /// - no concurrent insert is being performed, the per-chunk file lock must be held. /// - the chunk to be removed is no longer referenced by an index file. /// - the chunk to be removed has not been inserted by an active writer (atime newer than /// writer start time). -- 2.47.3 _______________________________________________ pbs-devel mailing list pbs-devel@lists.proxmox.com https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel