From mboxrd@z Thu Jan 1 00:00:00 1970
From: Christian Ebner <c.ebner@proxmox.com>
To: pbs-devel@lists.proxmox.com
Date: Thu, 6 Nov 2025 18:13:57 +0100
Message-ID: <20251106171358.865503-3-c.ebner@proxmox.com>
X-Mailer: git-send-email 2.47.3
In-Reply-To: <20251106171358.865503-1-c.ebner@proxmox.com>
References: <20251106171358.865503-1-c.ebner@proxmox.com>
MIME-Version: 1.0
Subject: [pbs-devel] [PATCH proxmox-backup v2 2/3] chunk store: fix race window between chunk stat and gc cleanup
List-Id: Proxmox Backup Server development discussion
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit

Sweeping of unused chunks during garbage collection checks their atime to distinguish between chunks still in use and chunks no longer in use.
While garbage collection does lock the chunk store, guarding its mutex before reading file stats and deleting unused chunks, the conditional touch did not acquire the lock before updating a chunk's atime (which also checks for the chunk's presence). There is therefore a race window in which a chunk can be touched after its metadata has been read but before it is removed. The race is however rare: for it to happen, the chunk must be older than the cutoff time and not be referenced by any index file, otherwise its atime would already have been updated during phase 1.

Fix this by guarding the chunk store mutex before touching a chunk. To achieve this, rename and split off internal touch chunk helpers to reflect that the internal helpers do not acquire the chunk store lock, while the one exposed outside the chunk store module does.

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
changes since version 1:
- make sure internal helpers already holding the mutex guard do not try to lock it again

 pbs-datastore/src/chunk_store.rs | 23 ++++++++++++++++++-----
 1 file changed, 18 insertions(+), 5 deletions(-)

diff --git a/pbs-datastore/src/chunk_store.rs b/pbs-datastore/src/chunk_store.rs
index 1262377d5..b88a0a096 100644
--- a/pbs-datastore/src/chunk_store.rs
+++ b/pbs-datastore/src/chunk_store.rs
@@ -204,18 +204,31 @@ impl ChunkStore {
         })
     }
 
-    fn touch_chunk(&self, digest: &[u8; 32]) -> Result<(), Error> {
+    fn touch_chunk_no_lock(&self, digest: &[u8; 32]) -> Result<(), Error> {
         // unwrap: only `None` in unit tests
         assert!(self.locker.is_some());
-        self.cond_touch_chunk(digest, true)?;
+        self.cond_touch_chunk_no_lock(digest, true)?;
         Ok(())
     }
 
+    /// Update the chunk files atime if it exists.
+    ///
+    /// If the chunk file does not exist, return with error if assert_exists is true, with
+    /// Ok(false) otherwise.
     pub(super) fn cond_touch_chunk(
         &self,
         digest: &[u8; 32],
         assert_exists: bool,
+    ) -> Result<bool, Error> {
+        let _lock = self.mutex.lock();
+        self.cond_touch_chunk_no_lock(digest, assert_exists)
+    }
+
+    fn cond_touch_chunk_no_lock(
+        &self,
+        digest: &[u8; 32],
+        assert_exists: bool,
     ) -> Result<bool, Error> {
         // unwrap: only `None` in unit tests
         assert!(self.locker.is_some());
@@ -587,7 +600,7 @@ impl ChunkStore {
             }
             let old_size = metadata.len();
             if encoded_size == old_size {
-                self.touch_chunk(digest)?;
+                self.touch_chunk_no_lock(digest)?;
                 return Ok((true, old_size));
             } else if old_size == 0 {
                 log::warn!("found empty chunk '{digest_str}' in store {name}, overwriting");
@@ -612,11 +625,11 @@ impl ChunkStore {
                 // compressed, the size mismatch could be caused by different zstd versions
                 // so let's keep the one that was uploaded first, bit-rot is hopefully detected by
                 // verification at some point..
-                self.touch_chunk(digest)?;
+                self.touch_chunk_no_lock(digest)?;
                 return Ok((true, old_size));
             } else if old_size < encoded_size {
                 log::debug!("Got another copy of chunk with digest '{digest_str}', existing chunk is smaller, discarding uploaded one.");
-                self.touch_chunk(digest)?;
+                self.touch_chunk_no_lock(digest)?;
                 return Ok((true, old_size));
             } else {
                 log::debug!("Got another copy of chunk with digest '{digest_str}', existing chunk is bigger, replacing with uploaded one.");
-- 
2.47.3

_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
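The locking pattern the patch introduces can be sketched in isolation. This is a minimal stand-in, not the actual pbs-datastore API: a HashMap under a single mutex plays the role of the on-disk chunk files, and an integer counter plays the role of the atime. The point is the split into a public helper that takes the lock and a `_no_lock` variant for callers already holding the guard, so the GC sweep and a concurrent touch can never interleave between stat and removal.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

/// Hypothetical stand-in for the chunk store: one mutex guards both
/// atime updates and removals, closing the touch-vs-sweep race.
struct ChunkStore {
    chunks: Mutex<HashMap<[u8; 32], u64>>, // digest -> fake atime
}

impl ChunkStore {
    /// Public entry point: acquires the lock, then delegates.
    fn cond_touch_chunk(&self, digest: &[u8; 32], assert_exists: bool) -> Result<bool, String> {
        let mut guard = self.chunks.lock().unwrap();
        Self::cond_touch_chunk_no_lock(&mut guard, digest, assert_exists)
    }

    /// Internal helper for callers that already hold the guard, mirroring
    /// the `_no_lock` split; never locks, so no double-lock deadlock.
    fn cond_touch_chunk_no_lock(
        chunks: &mut HashMap<[u8; 32], u64>,
        digest: &[u8; 32],
        assert_exists: bool,
    ) -> Result<bool, String> {
        match chunks.get_mut(digest) {
            Some(atime) => {
                *atime += 1; // stand-in for utimensat() bumping the atime
                Ok(true)
            }
            None if assert_exists => Err("chunk missing".into()),
            None => Ok(false),
        }
    }

    /// GC sweep: holds the same lock across the "stat" and the removal,
    /// so a concurrent touch cannot slip in between the two.
    fn sweep_unused(&self, cutoff: u64) {
        let mut guard = self.chunks.lock().unwrap();
        guard.retain(|_, atime| *atime >= cutoff);
    }
}

fn main() {
    let digest = [0u8; 32];
    let store = ChunkStore {
        chunks: Mutex::new(HashMap::from([(digest, 0u64)])),
    };

    // touching an existing chunk succeeds and bumps its atime to 1
    assert!(store.cond_touch_chunk(&digest, true).unwrap());

    // sweep with cutoff 1 keeps the chunk (its atime is now 1)
    store.sweep_unused(1);
    assert!(store.cond_touch_chunk(&digest, false).unwrap());

    // a higher cutoff removes it; a non-asserting touch then reports false
    store.sweep_unused(10);
    assert!(!store.cond_touch_chunk(&digest, false).unwrap());
}
```

With `std::sync::Mutex` being non-reentrant, having the internal callers reuse an already-held guard (rather than calling the locking entry point again) is what the v2 changelog entry is about: locking the same mutex twice from one thread would deadlock.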