From: Christian Ebner <c.ebner@proxmox.com>
To: pbs-devel@lists.proxmox.com
Date: Mon, 3 Nov 2025 12:31:14 +0100
Message-ID: <20251103113120.239455-12-c.ebner@proxmox.com>
In-Reply-To: <20251103113120.239455-1-c.ebner@proxmox.com>
References: <20251103113120.239455-1-c.ebner@proxmox.com>
Subject: [pbs-devel] [PATCH proxmox-backup 11/17] datastore: acquire chunk store mutex lock when renaming corrupt chunk

When renaming a corrupt chunk in the chunk store, the chunk store mutex
is currently not held. This can race with other operations that hold the
lock and therefore assume exclusive access, such as garbage collection
or backup chunk inserts. This affects both filesystem and S3 backends.

To fix the race, acquire the lock and rearrange the code so that the
lock is never held when entering async code for the S3 backend.

This does not yet solve the race and cache consistency issues for S3
backends; those are addressed by introducing per-chunk file locking in
subsequent patches.
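
For illustration, a minimal sketch of the intended lock ordering (under
assumed names, not the datastore code itself; async_backend_update is a
hypothetical stand-in for the S3 backend call):

    use std::path::Path;
    use std::sync::Mutex;

    // Do the synchronous rename inside a short critical section, then
    // drop the guard before any async backend work: a std::sync
    // MutexGuard must not be held across an .await point (it is not
    // Send, and contention would block executor threads).
    async fn rename_then_update_backend(
        lock: &Mutex<()>,
        path: &Path,
        new_path: &Path,
    ) -> std::io::Result<()> {
        let result = {
            let _guard = lock.lock().unwrap(); // exclusive chunk store access
            std::fs::rename(path, new_path)
        }; // guard dropped here, before entering async code

        // hypothetical stand-in for the S3 backend update
        async_backend_update(new_path).await?;
        result
    }

    async fn async_backend_update(_path: &Path) -> std::io::Result<()> {
        Ok(())
    }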
Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
 pbs-datastore/src/datastore.rs | 16 +++++++++++-----
 1 file changed, 11 insertions(+), 5 deletions(-)

diff --git a/pbs-datastore/src/datastore.rs b/pbs-datastore/src/datastore.rs
index 32f3562b3..e7ec87a7f 100644
--- a/pbs-datastore/src/datastore.rs
+++ b/pbs-datastore/src/datastore.rs
@@ -2585,6 +2585,8 @@ impl DataStore {
     pub fn rename_corrupt_chunk(&self, digest: &[u8; 32]) -> Result<Option<PathBuf>, Error> {
         let (path, digest_str) = self.chunk_path(digest);
 
+        let _lock = self.inner.chunk_store.mutex().lock().unwrap();
+
         let mut counter = 0;
         let mut new_path = path.clone();
         loop {
@@ -2596,6 +2598,14 @@ impl DataStore {
             }
         }
 
+        let result = match std::fs::rename(&path, &new_path) {
+            Ok(_) => Ok(Some(new_path)),
+            Err(err) if err.kind() == std::io::ErrorKind::NotFound => Ok(None),
+            Err(err) => bail!("could not rename corrupt chunk {path:?} - {err}"),
+        };
+
+        drop(_lock);
+
         let backend = self.backend().map_err(|err| {
             format_err!(
                 "failed to get backend while trying to rename bad chunk: {digest_str} - {err}"
@@ -2626,10 +2636,6 @@ impl DataStore {
             )?;
         }
 
-        match std::fs::rename(&path, &new_path) {
-            Ok(_) => Ok(Some(new_path)),
-            Err(err) if err.kind() == std::io::ErrorKind::NotFound => Ok(None),
-            Err(err) => bail!("could not rename corrupt chunk {path:?} - {err}"),
-        }
+        result
     }
 }
-- 
2.47.3