From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path: 
Received: from firstgate.proxmox.com (firstgate.proxmox.com [IPv6:2a01:7e0:0:424::9])
	by lore.proxmox.com (Postfix) with ESMTPS id 176E41FF191
	for ; Tue, 4 Nov 2025 14:07:50 +0100 (CET)
Received: from firstgate.proxmox.com (localhost [127.0.0.1])
	by firstgate.proxmox.com (Proxmox) with ESMTP id E8DA5D07C;
	Tue, 4 Nov 2025 14:08:22 +0100 (CET)
From: Christian Ebner 
To: pbs-devel@lists.proxmox.com
Date: Tue, 4 Nov 2025 14:06:57 +0100
Message-ID: <20251104130659.435139-18-c.ebner@proxmox.com>
X-Mailer: git-send-email 2.47.3
In-Reply-To: <20251104130659.435139-1-c.ebner@proxmox.com>
References: <20251104130659.435139-1-c.ebner@proxmox.com>
MIME-Version: 1.0
X-Bm-Milter-Handled: 55990f41-d878-4baa-be0a-ee34c49e34d2
X-Bm-Transport-Timestamp: 1762261622956
X-SPAM-LEVEL: Spam detection results: 0
	AWL 0.047 Adjusted score from AWL reputation of From: address
	BAYES_00 -1.9 Bayes spam probability is 0 to 1%
	DMARC_MISSING 0.1 Missing DMARC policy
	KAM_DMARC_STATUS 0.01 Test Rule for DKIM or SPF Failure with Strict Alignment
	SPF_HELO_NONE 0.001 SPF: HELO does not publish an SPF Record
	SPF_PASS -0.001 SPF: sender matches SPF record
Subject: [pbs-devel] [PATCH proxmox-backup v2 17/19] GC: fix race with chunk
 upload/insert on s3 backends
X-BeenThere: pbs-devel@lists.proxmox.com
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Proxmox Backup Server development discussion
List-Unsubscribe: ,
List-Archive: 
List-Post: 
List-Help: 
List-Subscribe: ,
Reply-To: Proxmox Backup Server development discussion
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Errors-To: pbs-devel-bounces@lists.proxmox.com
Sender: "pbs-devel" 

The previous approach, relying solely on local marker files, was flawed:
it could not protect against all possible races between chunk upload to
the s3 object store and insertion into the local datastore cache and the
in-memory LRU cache.
Since these operations are now protected by taking a per-chunk file
lock, use that lock to check whether it is safe to remove the chunk, and
remove it in a consistent manner by holding the chunk lock guard until
the chunk has actually been removed from the s3 backend and the caches.
Since an error while removing the chunk from the cache could lead to
inconsistencies, GC must now fail in that case. The chunk store lock is
now only required for the cache removal.

Signed-off-by: Christian Ebner 
---
Changes since version 1:
- no changes

 pbs-datastore/src/datastore.rs | 27 ++++++++++++++++++---------
 1 file changed, 18 insertions(+), 9 deletions(-)

diff --git a/pbs-datastore/src/datastore.rs b/pbs-datastore/src/datastore.rs
index f869320d8..8b7333d80 100644
--- a/pbs-datastore/src/datastore.rs
+++ b/pbs-datastore/src/datastore.rs
@@ -1667,14 +1667,18 @@ impl DataStore {
         let mut delete_list = Vec::with_capacity(1000);
 
         loop {
-            let lock = self.inner.chunk_store.mutex().lock().unwrap();
-
             for content in list_bucket_result.contents {
                 let (chunk_path, digest) = match self.chunk_path_from_object_key(&content.key) {
                     Some(path) => path,
                     None => continue,
                 };
 
+                let timeout = std::time::Duration::from_secs(0);
+                let _chunk_guard = match self.inner.chunk_store.lock_chunk(&digest, timeout) {
+                    Ok(guard) => guard,
+                    Err(_) => continue,
+                };
+
                 // Check local markers (created or atime updated during phase1) and
                 // keep or delete chunk based on that.
                 let atime = match std::fs::metadata(&chunk_path) {
@@ -1703,10 +1707,10 @@ impl DataStore {
                     &mut gc_status,
                     || {
                         if let Some(cache) = self.cache() {
-                            // ignore errors, phase 3 will retry cleanup anyways
-                            let _ = cache.remove(&digest);
+                            let _guard = self.inner.chunk_store.mutex().lock().unwrap();
+                            cache.remove(&digest)?;
                         }
-                        delete_list.push(content.key);
+                        delete_list.push((content.key, _chunk_guard));
                         Ok(())
                     },
                 )?;
@@ -1715,14 +1719,19 @@ impl DataStore {
                 chunk_count += 1;
             }
 
-            drop(lock);
-
             if !delete_list.is_empty() {
-                let delete_objects_result =
-                    proxmox_async::runtime::block_on(s3_client.delete_objects(&delete_list))?;
+                let delete_objects_result = proxmox_async::runtime::block_on(
+                    s3_client.delete_objects(
+                        &delete_list
+                            .iter()
+                            .map(|(key, _)| key.clone())
+                            .collect::<Vec<_>>(),
+                    ),
+                )?;
                 if let Some(_err) = delete_objects_result.error {
                     bail!("failed to delete some objects");
                 }
+                // release all chunk guards
                 delete_list.clear();
             }
-- 
2.47.3


_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel