public inbox for pbs-devel@lists.proxmox.com
* [pbs-devel] [PATCH proxmox-backup 0/2] fix 2 issues with s3 store verifies
@ 2025-10-29 11:06 Christian Ebner
  2025-10-29 11:06 ` [pbs-devel] [PATCH proxmox-backup 1/2] verify: never hold mutex lock in async scope on corrupt chunk rename Christian Ebner
  2025-10-29 11:06 ` [pbs-devel] [PATCH proxmox-backup 2/2] verify: distinguish s3 object fetching and chunk loading error Christian Ebner
  0 siblings, 2 replies; 3+ messages in thread
From: Christian Ebner @ 2025-10-29 11:06 UTC (permalink / raw)
  To: pbs-devel

These patches were pulled out of the original patch series [0] since
they are independent of the bigger series, which attempts to fix the
possible race between corrupt chunk renaming and chunk insert/upload,
and are better reviewed and tested on their own.

Patch 1 makes sure the mutex guard synchronizing access to the corrupt
chunk list is dropped before attempting to rename a corrupt chunk,
which calls into async context on S3 stores. Otherwise a deadlock
can arise.

Patch 2 is a follow-up to the bugfix for issue #6665, which did not
correctly distinguish between transient fetching errors and a
possible chunk DataBlob decoding error from the response body in
case of a successful response.

[0] https://lore.proxmox.com/pbs-devel/20251016131819.349049-6-c.ebner@proxmox.com/T/

Christian Ebner (2):
  verify: never hold mutex lock in async scope on corrupt chunk rename
  verify: distinguish s3 object fetching and chunk loading error

 src/backup/verify.rs | 34 +++++++++++++++++++---------------
 1 file changed, 19 insertions(+), 15 deletions(-)

-- 
2.47.3



_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel



* [pbs-devel] [PATCH proxmox-backup 1/2] verify: never hold mutex lock in async scope on corrupt chunk rename
  2025-10-29 11:06 [pbs-devel] [PATCH proxmox-backup 0/2] fix 2 issues with s3 store verifies Christian Ebner
@ 2025-10-29 11:06 ` Christian Ebner
  2025-10-29 11:06 ` [pbs-devel] [PATCH proxmox-backup 2/2] verify: distinguish s3 object fetching and chunk loading error Christian Ebner
  1 sibling, 0 replies; 3+ messages in thread
From: Christian Ebner @ 2025-10-29 11:06 UTC (permalink / raw)
  To: pbs-devel

Holding a mutex lock across async await boundaries is prone to
deadlock [0]. Renaming a corrupt chunk, however, requires async API
calls in case of datastores backed by S3.

Fix this by simply not holding onto the mutex lock guarding the
corrupt chunk list during chunk verification tasks when calling the
rename method. If the chunk is already present in this list, there
will be no other verification task operating on that exact chunk
anyway.

[0] https://docs.rs/tokio/latest/tokio/sync/struct.Mutex.html#which-kind-of-mutex-should-you-use

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
Originally on list as https://lore.proxmox.com/pbs-devel/20251016131819.349049-4-c.ebner@proxmox.com/

No changes to that patch.
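
For readers unfamiliar with the guard lifetime involved, the following
is a minimal sketch of the difference between the old and the new
pattern, not the actual VerifyWorker code. The `Worker` type, the
method names and the `rename` callback are hypothetical stand-ins; the
callback plays the role of the rename call and merely reports whether
the lock is free at that point.

```rust
use std::collections::HashSet;
use std::sync::Mutex;

/// Hypothetical stand-in for the verify worker; only the lock handling is modeled.
struct Worker {
    corrupt_chunks: Mutex<HashSet<[u8; 32]>>,
}

impl Worker {
    /// Old pattern: the named guard lives until the end of the function,
    /// so the lock is still held while `rename` runs.
    fn add_holding_guard(&self, digest: [u8; 32], rename: impl FnOnce(&Self) -> bool) -> bool {
        // Panic on poisoned mutex
        let mut corrupt_chunks = self.corrupt_chunks.lock().unwrap();
        corrupt_chunks.insert(digest);
        rename(self)
    }

    /// New pattern: the guard is a temporary and is dropped at the end of
    /// the statement, so the lock is released before `rename` runs.
    fn add_releasing_guard(&self, digest: [u8; 32], rename: impl FnOnce(&Self) -> bool) -> bool {
        // Panic on poisoned mutex
        self.corrupt_chunks.lock().unwrap().insert(digest);
        rename(self)
    }

    /// Returns true if the lock can currently be acquired (non-blocking check).
    fn lock_is_free(&self) -> bool {
        self.corrupt_chunks.try_lock().is_ok()
    }
}
```

Passing `Worker::lock_is_free` as the callback returns false for
`add_holding_guard` and true for `add_releasing_guard`, which is the
property this patch relies on.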

 src/backup/verify.rs | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/src/backup/verify.rs b/src/backup/verify.rs
index bdbe3148b..7172e81e1 100644
--- a/src/backup/verify.rs
+++ b/src/backup/verify.rs
@@ -332,8 +332,7 @@ impl VerifyWorker {
 
     fn add_corrupt_chunk(&self, digest: [u8; 32], errors: Arc<AtomicUsize>, message: &str) {
         // Panic on poisoned mutex
-        let mut corrupt_chunks = self.corrupt_chunks.lock().unwrap();
-        corrupt_chunks.insert(digest);
+        self.corrupt_chunks.lock().unwrap().insert(digest);
         error!(message);
         errors.fetch_add(1, Ordering::SeqCst);
         Self::rename_corrupted_chunk(self.datastore.clone(), &digest);
-- 
2.47.3




* [pbs-devel] [PATCH proxmox-backup 2/2] verify: distinguish s3 object fetching and chunk loading error
  2025-10-29 11:06 [pbs-devel] [PATCH proxmox-backup 0/2] fix 2 issues with s3 store verifies Christian Ebner
  2025-10-29 11:06 ` [pbs-devel] [PATCH proxmox-backup 1/2] verify: never hold mutex lock in async scope on corrupt chunk rename Christian Ebner
@ 2025-10-29 11:06 ` Christian Ebner
  1 sibling, 0 replies; 3+ messages in thread
From: Christian Ebner @ 2025-10-29 11:06 UTC (permalink / raw)
  To: pbs-devel

Errors while loading chunks from the object store might be caused by
transient issues, and must therefore be handled so they do not
incorrectly mark chunks as corrupt.

Errors on creating the chunk from the response data, which includes
the chunk header and validity checks, must however lead to the chunk
being flagged as bad. Adapt the code so these errors are correctly
distinguished.

This is a followup to commit 3c350f35 ("fix #6665: never rename
chunks on s3 client fetch errors") which did not take that into
account.

Fixes: 3c350f35 ("fix #6665: never rename chunks on s3 client fetch errors")
Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
Originally on list as https://lore.proxmox.com/pbs-devel/20251016131819.349049-7-c.ebner@proxmox.com/

Changes since the original version of the patch: extended the commit
message to include the commit ref.
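
The intended error classification can be sketched in isolation as
follows. This is an illustration under stated assumptions, not the
code in verify.rs: `ChunkOutcome`, `decode_blob` and `classify` are
hypothetical names, and the 4-byte magic check merely stands in for
the header and validity checks done when creating the DataBlob.

```rust
/// Hypothetical classification of what happened to a single chunk.
#[derive(Debug, PartialEq)]
enum ChunkOutcome {
    /// Chunk decoded fine; carries the payload length.
    Verified(usize),
    /// Fetching failed, possibly transiently: count the error, do not mark corrupt.
    FetchFailed(String),
    /// Data was fetched but does not decode: the chunk must be flagged as bad.
    Corrupt(String),
}

/// Hypothetical stand-in for blob decoding: checks a 4-byte magic header.
fn decode_blob(raw: &[u8]) -> Result<&[u8], String> {
    const MAGIC: &[u8] = b"BLOB";
    match raw.strip_prefix(MAGIC) {
        Some(payload) => Ok(payload),
        None => Err("unexpected magic number".to_string()),
    }
}

/// Keep the two failure sources apart instead of folding them into one result.
fn classify(fetched: Result<Vec<u8>, String>) -> ChunkOutcome {
    match fetched {
        Err(err) => ChunkOutcome::FetchFailed(err),
        Ok(raw) => match decode_blob(&raw) {
            Ok(payload) => ChunkOutcome::Verified(payload.len()),
            Err(err) => ChunkOutcome::Corrupt(err),
        },
    }
}
```

Matching on the fetch result first and on the decode result second
mirrors the nested match introduced by this patch; with a single
combined result, as in the previous try_block, the two error sources
can no longer be told apart.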

 src/backup/verify.rs | 31 ++++++++++++++++++-------------
 1 file changed, 18 insertions(+), 13 deletions(-)

diff --git a/src/backup/verify.rs b/src/backup/verify.rs
index 7172e81e1..f01345b04 100644
--- a/src/backup/verify.rs
+++ b/src/backup/verify.rs
@@ -292,19 +292,24 @@ impl VerifyWorker {
                 let object_key = pbs_datastore::s3::object_key_from_digest(&info.digest)?;
                 match proxmox_async::runtime::block_on(s3_client.get_object(object_key)) {
                     Ok(Some(response)) => {
-                        let chunk_result = proxmox_lang::try_block!({
-                            let bytes =
-                                proxmox_async::runtime::block_on(response.content.collect())?
-                                    .to_bytes();
-                            DataBlob::from_raw(bytes.to_vec())
-                        });
-
-                        match chunk_result {
-                            Ok(chunk) => {
-                                let size = info.size();
-                                *read_bytes += chunk.raw_size();
-                                decoder_pool.send((chunk, info.digest, size))?;
-                                *decoded_bytes += size;
+                        match proxmox_async::runtime::block_on(response.content.collect()) {
+                            Ok(raw_chunk) => {
+                                match DataBlob::from_raw(raw_chunk.to_bytes().to_vec()) {
+                                    Ok(chunk) => {
+                                        let size = info.size();
+                                        *read_bytes += chunk.raw_size();
+                                        decoder_pool.send((chunk, info.digest, size))?;
+                                        *decoded_bytes += size;
+                                    }
+                                    Err(err) => self.add_corrupt_chunk(
+                                        info.digest,
+                                        errors,
+                                        &format!(
+                                            "can't verify chunk with digest {} - {err}",
+                                            hex::encode(info.digest)
+                                        ),
+                                    ),
+                                }
                             }
                             Err(err) => {
                                 errors.fetch_add(1, Ordering::SeqCst);
-- 
2.47.3


