* [pbs-devel] [PATCH proxmox{, -backup} 0/6] fix #6665: never mark chunks as bad on s3 client fetch error
@ 2025-08-25 10:32 Christian Ebner
  2025-08-25 10:32 ` [pbs-devel] [PATCH proxmox 1/5] proxmox-http: add method to share full body as contiguous bytes Christian Ebner
                   ` (6 more replies)
  0 siblings, 7 replies; 8+ messages in thread
From: Christian Ebner @ 2025-08-25 10:32 UTC (permalink / raw)
  To: pbs-devel

This patch series fixes an issue where chunks were incorrectly marked
as bad after s3 client fetch errors. In that case, only increase the
error counter and log the error, but leave the chunk as is.

To increase robustness, implement retry logic with linear backoff for
all transient s3 client request errors. Further, unconditionally bump
the get object request timeout from currently 1 minute to 30 minutes,
to greatly reduce possible timeouts on connections with low download
bandwidth.
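The retry-with-linear-backoff scheme described above can be sketched as
follows. This is a minimal, self-contained illustration; the names, the
closure-based interface and the exact schedule are illustrative, not
lifted from the patches themselves.

```rust
use std::time::Duration;

/// Pause before a given retry: grows linearly with the retry number.
/// retry 1 -> 200 ms, retry 2 -> 400 ms, ...
fn backoff(retry: usize) -> Duration {
    Duration::from_millis(100) * 2 * retry as u32
}

/// Run `op` up to three times, sleeping a linearly growing backoff
/// before each retry; return the last error if all attempts fail.
fn fetch_with_retry<T, E>(mut op: impl FnMut() -> Result<T, E>) -> Result<T, E> {
    const MAX_RETRY: usize = 3;
    let mut last_err = None;
    for retry in 0..MAX_RETRY {
        if retry > 0 {
            std::thread::sleep(backoff(retry));
        }
        match op() {
            Ok(value) => return Ok(value),
            Err(err) => last_err = Some(err),
        }
    }
    Err(last_err.expect("at least one attempt was made"))
}

fn main() {
    let mut attempts = 0;
    // Succeeds on the third attempt after two transient failures.
    let res: Result<u32, &str> = fetch_with_retry(|| {
        attempts += 1;
        if attempts < 3 { Err("transient") } else { Ok(42) }
    });
    assert_eq!(res, Ok(42));
    assert_eq!(attempts, 3);
}
```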

proxmox:

Christian Ebner (5):
  proxmox-http: add method to share full body as contiguous bytes
  s3-client: drop non-ambiguous mention of chunks in error message
  s3-client: fix unintended match statement being an expression
  s3-client: bump s3 request timeout from 1 minute to 30 minutes
  s3-client: add retry logic for transient client errors

 proxmox-http/src/body.rs        |  8 ++++
 proxmox-s3-client/src/client.rs | 70 +++++++++++++++++++++++----------
 2 files changed, 58 insertions(+), 20 deletions(-)


proxmox-backup:

Christian Ebner (1):
  fix #6665: never rename chunks on s3 client fetch errors

 src/backup/verify.rs | 35 +++++++++++++++++++++++------------
 1 file changed, 23 insertions(+), 12 deletions(-)


Summary over all repositories:
  3 files changed, 81 insertions(+), 32 deletions(-)

-- 
Generated by git-murpp 0.8.1


_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel


^ permalink raw reply	[flat|nested] 8+ messages in thread

* [pbs-devel] [PATCH proxmox 1/5] proxmox-http: add method to share full body as contiguous bytes
  2025-08-25 10:32 [pbs-devel] [PATCH proxmox{, -backup} 0/6] fix #6665: never mark chunks as bad on s3 client fetch error Christian Ebner
@ 2025-08-25 10:32 ` Christian Ebner
  2025-08-25 10:32 ` [pbs-devel] [PATCH proxmox 2/5] s3-client: drop non-ambiguous mention of chunks in error message Christian Ebner
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: Christian Ebner @ 2025-08-25 10:32 UTC (permalink / raw)
  To: pbs-devel

While already possible to get the body contents as bytes slice,
exposing it as `Bytes` object has the advantage to be able to
efficiently clone the body when a shared reference is not possible,
e.g. for request cloning.
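Why this makes cloning cheap: `bytes::Bytes` is essentially a
reference-counted handle to a buffer, so a clone shares the allocation
instead of copying it. A std-only analogy using `Arc<[u8]>` (not the
actual implementation):

```rust
use std::sync::Arc;

fn main() {
    // `bytes::Bytes` behaves much like this reference-counted slice:
    // cloning bumps a refcount rather than copying the payload, which
    // is what makes cloning a request body for retries inexpensive.
    let body: Arc<[u8]> = Arc::from(&b"chunk payload"[..]);
    let copy = Arc::clone(&body);
    // Both handles point at the very same buffer.
    assert!(Arc::ptr_eq(&body, &copy));
    assert_eq!(&copy[..], b"chunk payload");
}
```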

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
 proxmox-http/src/body.rs | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/proxmox-http/src/body.rs b/proxmox-http/src/body.rs
index 3eb17355..a1dade27 100644
--- a/proxmox-http/src/body.rs
+++ b/proxmox-http/src/body.rs
@@ -35,6 +35,14 @@ impl Body {
         }
     }
 
+    /// Returns the body contents as `Bytes` if it is a "full" body, `None` otherwise.
+    pub fn bytes(&self) -> Option<Bytes> {
+        match self.inner {
+            InnerBody::Full(ref bytes) => Some(bytes.clone()),
+            InnerBody::Streaming(_) => None,
+        }
+    }
+
     pub fn wrap_stream<S>(stream: S) -> Body
     where
         S: futures::stream::TryStream + Send + 'static,
-- 
2.47.2




* [pbs-devel] [PATCH proxmox 2/5] s3-client: drop non-ambiguous mention of chunks in error message
  2025-08-25 10:32 [pbs-devel] [PATCH proxmox{, -backup} 0/6] fix #6665: never mark chunks as bad on s3 client fetch error Christian Ebner
  2025-08-25 10:32 ` [pbs-devel] [PATCH proxmox 1/5] proxmox-http: add method to share full body as contiguous bytes Christian Ebner
@ 2025-08-25 10:32 ` Christian Ebner
  2025-08-25 10:32 ` [pbs-devel] [PATCH proxmox 3/5] s3-client: fix unintended match statement being an expression Christian Ebner
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: Christian Ebner @ 2025-08-25 10:32 UTC (permalink / raw)
  To: pbs-devel

The upload with retry helper method was initially implemented for
chunk uploads in PBS, before being moved to the dedicated s3 client
crate prior to being applied.

Since this helper is also used to upload other objects, not just data
chunks, adapt the error message to be more generic.

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
 proxmox-s3-client/src/client.rs | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/proxmox-s3-client/src/client.rs b/proxmox-s3-client/src/client.rs
index ec29d95a..4301f1af 100644
--- a/proxmox-s3-client/src/client.rs
+++ b/proxmox-s3-client/src/client.rs
@@ -684,12 +684,12 @@ impl S3Client {
                 Ok(PutObjectResponse::PreconditionFailed) => return Ok(true),
                 Ok(PutObjectResponse::NeedsRetry) => {
                     if retry >= MAX_S3_UPLOAD_RETRY - 1 {
-                        bail!("concurrent operation, chunk upload failed")
+                        bail!("concurrent operation, upload failed")
                     }
                 }
                 Err(err) => {
                     if retry >= MAX_S3_UPLOAD_RETRY - 1 {
-                        return Err(err.context("chunk upload failed"));
+                        return Err(err.context("upload failed"));
                     }
                 }
             };
-- 
2.47.2




* [pbs-devel] [PATCH proxmox 3/5] s3-client: fix unintended match statement being an expression
  2025-08-25 10:32 [pbs-devel] [PATCH proxmox{, -backup} 0/6] fix #6665: never mark chunks as bad on s3 client fetch error Christian Ebner
  2025-08-25 10:32 ` [pbs-devel] [PATCH proxmox 1/5] proxmox-http: add method to share full body as contiguous bytes Christian Ebner
  2025-08-25 10:32 ` [pbs-devel] [PATCH proxmox 2/5] s3-client: drop non-ambiguous mention of chunks in error message Christian Ebner
@ 2025-08-25 10:32 ` Christian Ebner
  2025-08-25 10:32 ` [pbs-devel] [PATCH proxmox 4/5] s3-client: bump s3 request timeout from 1 minute to 30 minutes Christian Ebner
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: Christian Ebner @ 2025-08-25 10:32 UTC (permalink / raw)
  To: pbs-devel

The match in question is not intended to be an expression, so drop
the block's trailing semicolon, adding one to the bail statement
instead.
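To illustrate the distinction (a generic Rust example, not code from
the patch): a `match` whose arms all have type `()` stands alone as a
statement, while a `match` used for its value sits inside a `let`
statement that does need the terminating semicolon.

```rust
fn main() {
    let n = 2;
    // Used as a statement: every arm has type `()`, so no trailing `;`
    // is needed after the closing brace.
    match n {
        2 => println!("two"),
        _ => println!("other"),
    }
    // Used as an expression: the match's value is bound, and the `let`
    // statement itself requires the `;`.
    let desc = match n {
        2 => "two",
        _ => "other",
    };
    assert_eq!(desc, "two");
}
```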

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
 proxmox-s3-client/src/client.rs | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/proxmox-s3-client/src/client.rs b/proxmox-s3-client/src/client.rs
index 4301f1af..ab2582bf 100644
--- a/proxmox-s3-client/src/client.rs
+++ b/proxmox-s3-client/src/client.rs
@@ -684,7 +684,7 @@ impl S3Client {
                 Ok(PutObjectResponse::PreconditionFailed) => return Ok(true),
                 Ok(PutObjectResponse::NeedsRetry) => {
                     if retry >= MAX_S3_UPLOAD_RETRY - 1 {
-                        bail!("concurrent operation, upload failed")
+                        bail!("concurrent operation, upload failed");
                     }
                 }
                 Err(err) => {
@@ -692,7 +692,7 @@ impl S3Client {
                         return Err(err.context("upload failed"));
                     }
                 }
-            };
+            }
         }
         Ok(false)
     }
-- 
2.47.2




* [pbs-devel] [PATCH proxmox 4/5] s3-client: bump s3 request timeout from 1 minute to 30 minutes
  2025-08-25 10:32 [pbs-devel] [PATCH proxmox{, -backup} 0/6] fix #6665: never mark chunks as bad on s3 client fetch error Christian Ebner
                   ` (2 preceding siblings ...)
  2025-08-25 10:32 ` [pbs-devel] [PATCH proxmox 3/5] s3-client: fix unintended match statement being an expression Christian Ebner
@ 2025-08-25 10:32 ` Christian Ebner
  2025-08-25 10:32 ` [pbs-devel] [PATCH proxmox 5/5] s3-client: add retry logic for transient client errors Christian Ebner
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: Christian Ebner @ 2025-08-25 10:32 UTC (permalink / raw)
  To: pbs-devel

The currently chosen value of 1 minute was too low for several
operations, e.g. fetching chunks over connections with low download
bandwidth. Since the download size is not known a priori and other
operations such as list objects v2 might also take considerably more
time, bump the timeout unconditionally. The TCP keepalive timeout for
the connection is bumped to the same value, so the connection is not
dropped before that for requests where no data is sent before getting
the response (e.g. list_object_v2() calls).

A 30 minute timeout translates to an average download rate of about
2.3 KiB/s for a 4 MiB data blob.
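That figure can be checked directly (a quick sanity-check, not code
from the patch):

```rust
// Average download rate implied by fetching a blob of `blob_mib` MiB
// within a `timeout_min` minute timeout, in KiB/s.
fn avg_rate_kib_per_s(blob_mib: f64, timeout_min: f64) -> f64 {
    blob_mib * 1024.0 / (timeout_min * 60.0)
}

fn main() {
    // 4 MiB over 30 minutes: 4096 KiB / 1800 s ~= 2.276 KiB/s,
    // matching the ~2.3 KiB/s quoted above.
    let rate = avg_rate_kib_per_s(4.0, 30.0);
    assert!((rate - 2.276).abs() < 0.001);
    println!("{rate:.3} KiB/s");
}
```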

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
 proxmox-s3-client/src/client.rs | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/proxmox-s3-client/src/client.rs b/proxmox-s3-client/src/client.rs
index ab2582bf..559990a1 100644
--- a/proxmox-s3-client/src/client.rs
+++ b/proxmox-s3-client/src/client.rs
@@ -32,10 +32,10 @@ use crate::response_reader::{
 };
 
 /// Default timeout for s3 api requests.
-pub const S3_HTTP_REQUEST_TIMEOUT: Duration = Duration::from_secs(60);
+pub const S3_HTTP_REQUEST_TIMEOUT: Duration = Duration::from_secs(30 * 60);
 
 const S3_HTTP_CONNECT_TIMEOUT: Duration = Duration::from_secs(10);
-const S3_TCP_KEEPALIVE_TIME: u32 = 120;
+const S3_TCP_KEEPALIVE_TIME: u32 = 30 * 60;
 const MAX_S3_UPLOAD_RETRY: usize = 3;
 // Assumed minimum upload rate of 1 KiB/s for dynamic put object request timeout calculation.
 const S3_MIN_ASSUMED_UPLOAD_RATE: u64 = 1024;
-- 
2.47.2




* [pbs-devel] [PATCH proxmox 5/5] s3-client: add retry logic for transient client errors
  2025-08-25 10:32 [pbs-devel] [PATCH proxmox{, -backup} 0/6] fix #6665: never mark chunks as bad on s3 client fetch error Christian Ebner
                   ` (3 preceding siblings ...)
  2025-08-25 10:32 ` [pbs-devel] [PATCH proxmox 4/5] s3-client: bump s3 request timeout from 1 minute to 30 minutes Christian Ebner
@ 2025-08-25 10:32 ` Christian Ebner
  2025-08-25 10:32 ` [pbs-devel] [PATCH proxmox-backup 1/1] fix #6665: never rename chunks on s3 client fetch errors Christian Ebner
  2025-08-25 13:11 ` [pbs-devel] superseded: [PATCH proxmox{, -backup} 0/6] fix #6665: never mark chunks as bad on s3 client fetch error Christian Ebner
  6 siblings, 0 replies; 8+ messages in thread
From: Christian Ebner @ 2025-08-25 10:32 UTC (permalink / raw)
  To: pbs-devel

Implements retry logic with a linearly increasing backoff time for
transient client errors.

For this, clone the request by destructuring it and efficiently
cloning its body, leveraging Bytes::clone(). Retry up to 3 times,
adding a backoff time that grows in 200 ms steps per retry, with the
intention of reducing network congestion and remote system
overload [0].

[0] https://aws.amazon.com/builders-library/timeouts-retries-and-backoff-with-jitter/
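As a sanity check, the schedule this yields (mirroring the
`base * 2 * retry` computation in the diff below, whose `* 2` factor
makes the steps 200 ms apart):

```rust
use std::time::Duration;

// Pause inserted before each retry: a 100 ms base, times 2, times
// the retry number -- i.e. 200 ms, 400 ms, ...
fn retry_backoff(retry: u32) -> Duration {
    const BASE: Duration = Duration::from_millis(100);
    BASE * 2 * retry
}

fn main() {
    // No pause before the first attempt (retry 0); then 200 ms and
    // 400 ms before the two retries.
    let schedule: Vec<Duration> = (1..3).map(retry_backoff).collect();
    assert_eq!(
        schedule,
        vec![Duration::from_millis(200), Duration::from_millis(400)]
    );
}
```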

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
 proxmox-s3-client/src/client.rs | 60 ++++++++++++++++++++++++---------
 1 file changed, 45 insertions(+), 15 deletions(-)

diff --git a/proxmox-s3-client/src/client.rs b/proxmox-s3-client/src/client.rs
index 559990a1..3c9ba009 100644
--- a/proxmox-s3-client/src/client.rs
+++ b/proxmox-s3-client/src/client.rs
@@ -39,6 +39,8 @@ const S3_TCP_KEEPALIVE_TIME: u32 = 30 * 60;
 const MAX_S3_UPLOAD_RETRY: usize = 3;
 // Assumed minimum upload rate of 1 KiB/s for dynamic put object request timeout calculation.
 const S3_MIN_ASSUMED_UPLOAD_RATE: u64 = 1024;
+const MAX_S3_HTTP_REQUEST_RETRY: usize = 3;
+const S3_HTTP_REQUEST_RETRY_BACKOFF_DEFAULT: Duration = Duration::from_millis(100);
 
 /// S3 object key path prefix without the context prefix as defined by the client options.
 ///
@@ -293,23 +295,51 @@ impl S3Client {
         timeout: Option<Duration>,
     ) -> Result<Response<Incoming>, Error> {
         let request = self.prepare(request).await?;
-        if request.method() == Method::PUT {
-            if let Some(limiter) = &self.put_rate_limiter {
-                let sleep = {
-                    let mut limiter = limiter.lock().unwrap();
-                    limiter.register_traffic(Instant::now(), 1)
-                };
-                tokio::time::sleep(sleep).await;
+
+        let (parts, body) = request.into_parts();
+        let body_bytes = body
+            .bytes()
+            .ok_or_else(|| format_err!("cannot prepare request with streaming body"))?;
+
+        for retry in 0..MAX_S3_HTTP_REQUEST_RETRY {
+            let request = Request::from_parts(parts.clone(), Body::from(body_bytes.clone()));
+            if parts.method == Method::PUT {
+                if let Some(limiter) = &self.put_rate_limiter {
+                    let sleep = {
+                        let mut limiter = limiter.lock().unwrap();
+                        limiter.register_traffic(Instant::now(), 1)
+                    };
+                    tokio::time::sleep(sleep).await;
+                }
+            }
+
+            if retry > 0 {
+                let backoff_millis = S3_HTTP_REQUEST_RETRY_BACKOFF_DEFAULT * 2 * retry as u32;
+                tokio::time::sleep(backoff_millis).await;
+            }
+
+            let response = if let Some(timeout) = timeout {
+                tokio::time::timeout(timeout, self.client.request(request)).await
+            } else {
+                Ok(self.client.request(request).await)
+            };
+
+            match response {
+                Ok(Ok(response)) => return Ok(response),
+                Ok(Err(err)) => {
+                    if retry >= MAX_S3_HTTP_REQUEST_RETRY - 1 {
+                        return Err(err.into());
+                    }
+                }
+                Err(_elapsed) => {
+                    if retry >= MAX_S3_HTTP_REQUEST_RETRY - 1 {
+                        bail!("request timed out exceeding retries");
+                    }
+                }
             }
         }
-        let response = if let Some(timeout) = timeout {
-            tokio::time::timeout(timeout, self.client.request(request))
-                .await
-                .context("request timeout")??
-        } else {
-            self.client.request(request).await?
-        };
-        Ok(response)
+
+        bail!("failed to send request exceeding retries");
     }
 
     /// Check if bucket exists and got permissions to access it.
-- 
2.47.2




* [pbs-devel] [PATCH proxmox-backup 1/1] fix #6665: never rename chunks on s3 client fetch errors
  2025-08-25 10:32 [pbs-devel] [PATCH proxmox{, -backup} 0/6] fix #6665: never mark chunks as bad on s3 client fetch error Christian Ebner
                   ` (4 preceding siblings ...)
  2025-08-25 10:32 ` [pbs-devel] [PATCH proxmox 5/5] s3-client: add retry logic for transient client errors Christian Ebner
@ 2025-08-25 10:32 ` Christian Ebner
  2025-08-25 13:11 ` [pbs-devel] superseded: [PATCH proxmox{, -backup} 0/6] fix #6665: never mark chunks as bad on s3 client fetch error Christian Ebner
  6 siblings, 0 replies; 8+ messages in thread
From: Christian Ebner @ 2025-08-25 10:32 UTC (permalink / raw)
  To: pbs-devel

The chunk verification for the s3 backend always fetches the chunk
directly from the backend and verifies it based on the received
response.

Currently, when fetching the chunk contents fails because of an
unrelated s3 client error, e.g. due to transient networking issues,
both the locally cached chunk and the chunk in the object store are
incorrectly marked as bad and renamed.

Instead, treat chunk fetching issues as errors, but do not treat the
chunk itself as bad.
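The resulting policy can be sketched like this (types and names are
hypothetical, not the actual verify code): only a definitively missing
object marks the chunk as corrupt, while fetch or decode failures are
merely counted.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Hypothetical outcome of fetching a chunk from the s3 backend.
enum Fetch {
    Found(Result<Vec<u8>, String>), // response received; decoding may still fail
    Missing,                        // object definitively absent on the backend
    Error(String),                  // transient client/network error
}

fn handle(outcome: Fetch, errors: &AtomicUsize, corrupt: &AtomicUsize) {
    match outcome {
        Fetch::Found(Ok(_chunk)) => { /* hand the chunk off to verification */ }
        Fetch::Missing => {
            // Only a confirmed missing object marks the chunk as bad.
            corrupt.fetch_add(1, Ordering::SeqCst);
        }
        Fetch::Found(Err(_)) | Fetch::Error(_) => {
            // Transient failure: count and log, but leave the chunk as is.
            errors.fetch_add(1, Ordering::SeqCst);
        }
    }
}

fn main() {
    let (errors, corrupt) = (AtomicUsize::new(0), AtomicUsize::new(0));
    handle(Fetch::Error("connection reset".into()), &errors, &corrupt);
    handle(Fetch::Missing, &errors, &corrupt);
    assert_eq!(errors.load(Ordering::SeqCst), 1);
    assert_eq!(corrupt.load(Ordering::SeqCst), 1);
}
```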

Fixes: https://bugzilla.proxmox.com/show_bug.cgi?id=6665
Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
 src/backup/verify.rs | 35 +++++++++++++++++++++++------------
 1 file changed, 23 insertions(+), 12 deletions(-)

diff --git a/src/backup/verify.rs b/src/backup/verify.rs
index b1452f267..069c3bdf9 100644
--- a/src/backup/verify.rs
+++ b/src/backup/verify.rs
@@ -284,13 +284,25 @@ impl VerifyWorker {
                 let object_key = pbs_datastore::s3::object_key_from_digest(&info.digest)?;
                 match proxmox_async::runtime::block_on(s3_client.get_object(object_key)) {
                     Ok(Some(response)) => {
-                        let bytes = proxmox_async::runtime::block_on(response.content.collect())?
-                            .to_bytes();
-                        let chunk = DataBlob::from_raw(bytes.to_vec())?;
-                        let size = info.size();
-                        *read_bytes += chunk.raw_size();
-                        decoder_pool.send((chunk, info.digest, size))?;
-                        *decoded_bytes += size;
+                        let chunk_result = proxmox_lang::try_block!({
+                            let bytes =
+                                proxmox_async::runtime::block_on(response.content.collect())?
+                                    .to_bytes();
+                            DataBlob::from_raw(bytes.to_vec())
+                        });
+
+                        match chunk_result {
+                            Ok(chunk) => {
+                                let size = info.size();
+                                *read_bytes += chunk.raw_size();
+                                decoder_pool.send((chunk, info.digest, size))?;
+                                *decoded_bytes += size;
+                            }
+                            Err(err) => {
+                                errors.fetch_add(1, Ordering::SeqCst);
+                                error!("can't verify chunk, load failed - {err}");
+                            }
+                        }
                     }
                     Ok(None) => self.add_corrupt_chunk(
                         info.digest,
@@ -300,11 +312,10 @@ impl VerifyWorker {
                             hex::encode(info.digest)
                         ),
                     ),
-                    Err(err) => self.add_corrupt_chunk(
-                        info.digest,
-                        errors,
-                        &format!("can't verify chunk, load failed - {err}"),
-                    ),
+                    Err(err) => {
+                        errors.fetch_add(1, Ordering::SeqCst);
+                        error!("can't verify chunk, load failed - {err}");
+                    }
                 }
             }
         }
-- 
2.47.2




* [pbs-devel] superseded: [PATCH proxmox{, -backup} 0/6] fix #6665: never mark chunks as bad on s3 client fetch error
  2025-08-25 10:32 [pbs-devel] [PATCH proxmox{, -backup} 0/6] fix #6665: never mark chunks as bad on s3 client fetch error Christian Ebner
                   ` (5 preceding siblings ...)
  2025-08-25 10:32 ` [pbs-devel] [PATCH proxmox-backup 1/1] fix #6665: never rename chunks on s3 client fetch errors Christian Ebner
@ 2025-08-25 13:11 ` Christian Ebner
  6 siblings, 0 replies; 8+ messages in thread
From: Christian Ebner @ 2025-08-25 13:11 UTC (permalink / raw)
  To: pbs-devel

superseded-by version 2:
https://lore.proxmox.com/pbs-devel/20250825131007.626777-1-c.ebner@proxmox.com/T/



end of thread, other threads:[~2025-08-25 13:11 UTC | newest]
