* [pbs-devel] [PATCH proxmox-backup 0/2] fix 2 issues with s3 store verifies
@ 2025-10-29 11:06 Christian Ebner
  2025-10-29 11:06 ` [pbs-devel] [PATCH proxmox-backup 1/2] verify: never hold mutex lock in async scope on corrupt chunk rename Christian Ebner
  2025-10-29 11:06 ` [pbs-devel] [PATCH proxmox-backup 2/2] verify: distinguish s3 object fetching and chunk loading error Christian Ebner
  0 siblings, 2 replies; 3+ messages in thread
From: Christian Ebner @ 2025-10-29 11:06 UTC
To: pbs-devel

These patches were pulled out of the original patch series [0], since they
are independent of the bigger series attempting to fix the possible race
between corrupt chunk renaming and chunk insert/upload, and are better
reviewed and tested independently.

Patch 1 makes sure the mutex guard used to synchronize access to the corrupt
chunk list is dropped before attempting to rename a corrupt chunk, which
calls into async context on s3 stores. Otherwise a deadlock can arise.

Patch 2 is a followup to the bugfix for issue #6665, which did not
correctly distinguish between transient fetching errors and a possible
chunk DataBlob decoding error from the response body in case of a
successful response.

[0] https://lore.proxmox.com/pbs-devel/20251016131819.349049-6-c.ebner@proxmox.com/T/

Christian Ebner (2):
  verify: never hold mutex lock in async scope on corrupt chunk rename
  verify: distinguish s3 object fetching and chunk loading error

 src/backup/verify.rs | 34 +++++++++++++++++++---------------
 1 file changed, 19 insertions(+), 15 deletions(-)

-- 
2.47.3

_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel

* [pbs-devel] [PATCH proxmox-backup 1/2] verify: never hold mutex lock in async scope on corrupt chunk rename
  2025-10-29 11:06 [pbs-devel] [PATCH proxmox-backup 0/2] fix 2 issues with s3 store verifies Christian Ebner
@ 2025-10-29 11:06 ` Christian Ebner
  2025-10-29 11:06 ` [pbs-devel] [PATCH proxmox-backup 2/2] verify: distinguish s3 object fetching and chunk loading error Christian Ebner
  1 sibling, 0 replies; 3+ messages in thread
From: Christian Ebner @ 2025-10-29 11:06 UTC
To: pbs-devel

Holding a mutex lock across async await boundaries is prone to
deadlock [0]. Renaming a corrupt chunk, however, requires async API calls
for datastores backed by S3.

Fix this by not holding onto the mutex lock guarding the corrupt chunk
list when calling the rename method during chunk verification tasks. If
the chunk is already present in this list, no other verification task
will be operating on that exact chunk anyway.

[0] https://docs.rs/tokio/latest/tokio/sync/struct.Mutex.html#which-kind-of-mutex-should-you-use

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
Originally on list as
https://lore.proxmox.com/pbs-devel/20251016131819.349049-4-c.ebner@proxmox.com/

No changes to that patch.

 src/backup/verify.rs | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/src/backup/verify.rs b/src/backup/verify.rs
index bdbe3148b..7172e81e1 100644
--- a/src/backup/verify.rs
+++ b/src/backup/verify.rs
@@ -332,8 +332,7 @@ impl VerifyWorker {
 
     fn add_corrupt_chunk(&self, digest: [u8; 32], errors: Arc<AtomicUsize>, message: &str) {
         // Panic on poisoned mutex
-        let mut corrupt_chunks = self.corrupt_chunks.lock().unwrap();
-        corrupt_chunks.insert(digest);
+        self.corrupt_chunks.lock().unwrap().insert(digest);
         error!(message);
         errors.fetch_add(1, Ordering::SeqCst);
         Self::rename_corrupted_chunk(self.datastore.clone(), &digest);
-- 
2.47.3

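The guard-lifetime difference behind patch 1 can be illustrated with a small
standalone sketch. It uses only std and does not touch the real VerifyWorker:
the `Worker` type and the methods `list_is_unlocked`, `add_holding_guard` and
`add_dropping_guard` are hypothetical names made up for this illustration, and
a `try_lock` probe stands in for the rename call that would follow the insert.

```rust
use std::collections::HashSet;
use std::sync::Mutex;

/// Hypothetical stand-in for the verify worker; only the corrupt chunk
/// list is modelled here.
struct Worker {
    corrupt_chunks: Mutex<HashSet<[u8; 32]>>,
}

impl Worker {
    fn new() -> Self {
        Worker {
            corrupt_chunks: Mutex::new(HashSet::new()),
        }
    }

    /// Probe: true if the list lock can be taken right now, i.e. no guard
    /// for it is currently alive.
    fn list_is_unlocked(&self) -> bool {
        self.corrupt_chunks.try_lock().is_ok()
    }

    /// Shape of the code before the patch: the named guard lives until the
    /// end of the function, so it is still held at the point where the
    /// rename would be called.
    fn add_holding_guard(&self, digest: [u8; 32]) -> bool {
        let mut corrupt_chunks = self.corrupt_chunks.lock().unwrap();
        corrupt_chunks.insert(digest);
        // Stand-in for the rename call: the lock is still held here.
        self.list_is_unlocked()
    }

    /// Shape of the code after the patch: the guard is a temporary and is
    /// dropped at the end of the insert statement.
    fn add_dropping_guard(&self, digest: [u8; 32]) -> bool {
        self.corrupt_chunks.lock().unwrap().insert(digest);
        // Stand-in for the rename call: the lock has been released.
        self.list_is_unlocked()
    }
}
```

Calling `add_holding_guard` reports the lock as still held at the probe point,
while `add_dropping_guard` reports it as free. Whether holding the guard leads
to an actual deadlock depends on what the following call does; the sketch only
shows when the guard is released.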
* [pbs-devel] [PATCH proxmox-backup 2/2] verify: distinguish s3 object fetching and chunk loading error
  2025-10-29 11:06 [pbs-devel] [PATCH proxmox-backup 0/2] fix 2 issues with s3 store verifies Christian Ebner
  2025-10-29 11:06 ` [pbs-devel] [PATCH proxmox-backup 1/2] verify: never hold mutex lock in async scope on corrupt chunk rename Christian Ebner
@ 2025-10-29 11:06 ` Christian Ebner
  1 sibling, 0 replies; 3+ messages in thread
From: Christian Ebner @ 2025-10-29 11:06 UTC
To: pbs-devel

Errors while loading chunks from the object store might be caused by
transient issues, and must therefore be handled so that they do not
incorrectly mark chunks as corrupt.

Errors on creating the chunk from the response data, which includes the
chunk header and validity checks, must however lead to the chunk being
flagged as bad. Adapt the code so these errors are correctly
distinguished.

This is a followup to commit 3c350f35 ("fix #6665: never rename chunks on
s3 client fetch errors"), which did not take that into account.

Fixes: 3c350f35 ("fix #6665: never rename chunks on s3 client fetch errors")
Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
Originally on list as
https://lore.proxmox.com/pbs-devel/20251016131819.349049-7-c.ebner@proxmox.com/

Extended the commit message to include the commit ref since the original
version of the patch.

 src/backup/verify.rs | 31 ++++++++++++++++++-------------
 1 file changed, 18 insertions(+), 13 deletions(-)

diff --git a/src/backup/verify.rs b/src/backup/verify.rs
index 7172e81e1..f01345b04 100644
--- a/src/backup/verify.rs
+++ b/src/backup/verify.rs
@@ -292,19 +292,24 @@ impl VerifyWorker {
                 let object_key = pbs_datastore::s3::object_key_from_digest(&info.digest)?;
                 match proxmox_async::runtime::block_on(s3_client.get_object(object_key)) {
                     Ok(Some(response)) => {
-                        let chunk_result = proxmox_lang::try_block!({
-                            let bytes =
-                                proxmox_async::runtime::block_on(response.content.collect())?
-                                    .to_bytes();
-                            DataBlob::from_raw(bytes.to_vec())
-                        });
-
-                        match chunk_result {
-                            Ok(chunk) => {
-                                let size = info.size();
-                                *read_bytes += chunk.raw_size();
-                                decoder_pool.send((chunk, info.digest, size))?;
-                                *decoded_bytes += size;
+                        match proxmox_async::runtime::block_on(response.content.collect()) {
+                            Ok(raw_chunk) => {
+                                match DataBlob::from_raw(raw_chunk.to_bytes().to_vec()) {
+                                    Ok(chunk) => {
+                                        let size = info.size();
+                                        *read_bytes += chunk.raw_size();
+                                        decoder_pool.send((chunk, info.digest, size))?;
+                                        *decoded_bytes += size;
+                                    }
+                                    Err(err) => self.add_corrupt_chunk(
+                                        info.digest,
+                                        errors,
+                                        &format!(
+                                            "can't verify chunk with digest {} - {err}",
+                                            hex::encode(info.digest)
+                                        ),
+                                    ),
+                                }
                             }
                             Err(err) => {
                                 errors.fetch_add(1, Ordering::SeqCst);
-- 
2.47.3

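The distinction patch 2 draws can be sketched in isolation as two nested
matches: the outer one on the fetch result, the inner one on the decode
result. This is a simplified model under stated assumptions, not the code in
verify.rs. The `Outcome` enum, the `MAGIC` header, `decode_chunk` and
`classify` are all hypothetical, and `decode_chunk` is a toy check that only
stands in for `DataBlob::from_raw`.

```rust
/// Hypothetical classification of what happens to one chunk.
#[derive(Debug, PartialEq)]
enum Outcome {
    /// Chunk fetched and decoded; carries the payload length.
    Verified(usize),
    /// Fetching the body failed; counted as an error, chunk not flagged.
    Transient(String),
    /// Body was fetched but could not be decoded; chunk flagged as corrupt.
    Corrupt(String),
}

/// Made-up 4 byte header for this sketch, not the real DataBlob magic.
const MAGIC: [u8; 4] = *b"BLOB";

/// Toy stand-in for the chunk decoding step: check the header and return
/// the payload that follows it.
fn decode_chunk(raw: &[u8]) -> Result<&[u8], String> {
    if raw.starts_with(&MAGIC) {
        Ok(&raw[MAGIC.len()..])
    } else {
        Err("invalid chunk header".to_string())
    }
}

/// Keep the two error sources apart: only a decode failure on successfully
/// fetched data marks the chunk as corrupt.
fn classify(fetched: Result<Vec<u8>, String>) -> Outcome {
    match fetched {
        Ok(raw) => match decode_chunk(&raw) {
            Ok(payload) => Outcome::Verified(payload.len()),
            Err(err) => Outcome::Corrupt(err),
        },
        Err(err) => Outcome::Transient(err),
    }
}
```

Merging both steps into one fallible block, as the earlier try_block based
code did, loses the information about which step failed, so a decode failure
ends up handled like a fetch error.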