public inbox for pbs-devel@lists.proxmox.com
* [PATCH proxmox v2 0/3] fix #6858: implement retry logic for transient API errors
@ 2026-02-24 13:49 Christian Ebner
  2026-02-24 13:49 ` [PATCH proxmox v2 1/3] s3-client: early return when request timeout deadline reached Christian Ebner
                   ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: Christian Ebner @ 2026-02-24 13:49 UTC (permalink / raw)
  To: pbs-devel

These patches implement the best practice [0] for handling S3 API
response status codes 500 and 503 by retrying the requests after an
exponential backoff delay. Do the same for status code 504, as this
is returned by some storage providers when they are overloaded [1].
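As a rough illustration of this policy, the following Rust sketch classifies status codes and computes the backoff delay. It uses plain `u16` codes instead of the `StatusCode` type the client matches on, and the base delay is passed as a parameter because the value of `S3_HTTP_REQUEST_RETRY_BACKOFF_DEFAULT` is not shown in this series; `is_transient` and `backoff` are hypothetical helper names, not part of the client.

```rust
use std::time::Duration;

/// Transient status codes retried by this series:
/// 500 Internal Server Error, 503 Service Unavailable, 504 Gateway Timeout.
fn is_transient(status: u16) -> bool {
    matches!(status, 500 | 503 | 504)
}

/// Delay after failed attempt number `retry` (0-based), using the
/// `base * 3^retry` growth found in the patches.
fn backoff(base: Duration, retry: u32) -> Duration {
    base * 3_u32.pow(retry)
}
```

With a 1 second base, the delays grow as 1 s, 3 s, 9 s, and so on. Note that 502 Bad Gateway is not part of the retryable set in these patches.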

The first two patches contain a small fix to avoid additional response
latency once the request timeout is reached, and reorganize the code
for a better logical flow. The final patch then adds the checks on the
additional response status codes that trigger retries.

Link to the issue in bugzilla:
https://bugzilla.proxmox.com/show_bug.cgi?id=6858

[0] https://docs.aws.amazon.com/AmazonS3/latest/API/ErrorBestPractices.html
[1] https://forum.proxmox.com/threads/180956/

Changes since version 1 (thanks @Fabian for review):
- return the last error if retries are exhausted
- also consider 504 Gateway Timeout as retryable

Christian Ebner (3):
  s3-client: early return when request timeout deadline reached
  s3-client: move exponential backoff to after the response state check
  fix #6858: s3-client: retry request on 500, 503 and 504 status codes

 proxmox-s3-client/src/client.rs | 38 ++++++++++++++++-----------------
 1 file changed, 19 insertions(+), 19 deletions(-)

-- 
2.47.3

* [PATCH proxmox v2 1/3] s3-client: early return when request timeout deadline reached
  2026-02-24 13:49 [PATCH proxmox v2 0/3] fix #6858: implement retry logic for transient API errors Christian Ebner
@ 2026-02-24 13:49 ` Christian Ebner
  2026-02-24 13:49 ` [PATCH proxmox v2 2/3] s3-client: move exponential backoff to after the response state check Christian Ebner
  2026-02-24 13:49 ` [PATCH proxmox v2 3/3] fix #6858: s3-client: retry request on 500, 503 and 504 status codes Christian Ebner
  2 siblings, 0 replies; 4+ messages in thread
From: Christian Ebner @ 2026-02-24 13:49 UTC (permalink / raw)
  To: pbs-devel

The optional timeout value generates a deadline, after which the
request times out and fails, independent of retries.

The current implementation, however, needlessly continues to loop over
the remaining retries, including the potential put rate limit delay and
exponential backoff time, adding unjustified latency.

Fix this by returning early with an error once the deadline is reached.
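A minimal synchronous sketch of the resulting control flow. It assumes a hypothetical `Elapsed` type standing in for the error returned by `tokio::time::timeout_at`, a closure in place of the HTTP request, and a hypothetical `MAX_RETRY` of 3 in place of `MAX_S3_HTTP_REQUEST_RETRY`. A timeout at the outer layer aborts the loop at once, while only errors from the request itself consume a retry.

```rust
/// Hypothetical stand-in for the error returned by `tokio::time::timeout_at`.
#[derive(Debug)]
struct Elapsed;

/// Hypothetical stand-in for `MAX_S3_HTTP_REQUEST_RETRY`.
const MAX_RETRY: usize = 3;

/// The outer `Result` models the timeout, the inner one the request outcome.
fn send<F>(mut attempt: F) -> Result<u32, String>
where
    F: FnMut(usize) -> Result<Result<u32, String>, Elapsed>,
{
    for retry in 0..MAX_RETRY {
        // Deadline reached: fail right away, skipping any remaining retries.
        let response = attempt(retry).map_err(|_elapsed| "request timeout reached".to_string())?;
        match response {
            Ok(response) => return Ok(response),
            // Request error: retry unless this was the last attempt.
            Err(err) => {
                if retry >= MAX_RETRY - 1 {
                    return Err(err);
                }
            }
        }
    }
    Err("failed to send request exceeding retries".to_string())
}
```

A timeout on the first attempt therefore results in exactly one call, instead of running through all remaining retries and their delays.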

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
changes since version 1:
- no changes

 proxmox-s3-client/src/client.rs | 15 ++++++---------
 1 file changed, 6 insertions(+), 9 deletions(-)

diff --git a/proxmox-s3-client/src/client.rs b/proxmox-s3-client/src/client.rs
index 83176b39..5e30aa12 100644
--- a/proxmox-s3-client/src/client.rs
+++ b/proxmox-s3-client/src/client.rs
@@ -386,23 +386,20 @@ impl S3Client {
             }
 
             let response = if let Some(deadline) = deadline {
-                tokio::time::timeout_at(deadline, self.client.request(request)).await
+                tokio::time::timeout_at(deadline, self.client.request(request))
+                    .await
+                    .context("request timeout reached")?
             } else {
-                Ok(self.client.request(request).await)
+                self.client.request(request).await
             };
 
             match response {
-                Ok(Ok(response)) => return Ok(response),
-                Ok(Err(err)) => {
+                Ok(response) => return Ok(response),
+                Err(err) => {
                     if retry >= MAX_S3_HTTP_REQUEST_RETRY - 1 {
                         return Err(err.into());
                     }
                 }
-                Err(_elapsed) => {
-                    if retry >= MAX_S3_HTTP_REQUEST_RETRY - 1 {
-                        bail!("request timed out exceeding retries");
-                    }
-                }
             }
         }
 
-- 
2.47.3

* [PATCH proxmox v2 2/3] s3-client: move exponential backoff to after the response state check
  2026-02-24 13:49 [PATCH proxmox v2 0/3] fix #6858: implement retry logic for transient API errors Christian Ebner
  2026-02-24 13:49 ` [PATCH proxmox v2 1/3] s3-client: early return when request timeout deadline reached Christian Ebner
@ 2026-02-24 13:49 ` Christian Ebner
  2026-02-24 13:49 ` [PATCH proxmox v2 3/3] fix #6858: s3-client: retry request on 500, 503 and 504 status codes Christian Ebner
  2 siblings, 0 replies; 4+ messages in thread
From: Christian Ebner @ 2026-02-24 13:49 UTC (permalink / raw)
  To: pbs-devel

The exponential backoff must only be performed after transient error
states anyway, so move it to the end of the loop, which also avoids
an unneeded retry counter check.

Since the put rate limiter remains in place, it now also correctly
accounts for the preceding exponential backoff time, which may already
cover part of the required delay.
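To illustrate the new ordering, here is a sketch that records each backoff instead of sleeping. `BACKOFF_BASE` (1 second) and `MAX_RETRY` (3) are hypothetical values, and the closure stands in for sending the request. The point is that no delay precedes the first attempt and none follows the final one.

```rust
use std::time::Duration;

/// Hypothetical values; the real constants are not shown in this patch.
const BACKOFF_BASE: Duration = Duration::from_secs(1);
const MAX_RETRY: usize = 3;

/// Sends via `attempt`, recording each backoff in `slept` instead of sleeping.
fn send<F>(mut attempt: F, slept: &mut Vec<Duration>) -> Result<u32, String>
where
    F: FnMut(usize) -> Result<u32, String>,
{
    for retry in 0..MAX_RETRY {
        match attempt(retry) {
            Ok(response) => return Ok(response),
            // Last attempt failed: return its error without any further delay.
            Err(err) if retry >= MAX_RETRY - 1 => return Err(err),
            Err(_) => {}
        }
        // Only reached after a failed attempt with retries left:
        // base * 3^retry, as in the patch.
        slept.push(BACKOFF_BASE * 3_u32.pow(retry as u32));
    }
    Err("failed to send request exceeding retries".to_string())
}
```

Two failures followed by a success yield recorded delays of 1 s and 3 s; an immediate success records none.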

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
changes since version 1:
- no changes

 proxmox-s3-client/src/client.rs | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/proxmox-s3-client/src/client.rs b/proxmox-s3-client/src/client.rs
index 5e30aa12..f3e5eb45 100644
--- a/proxmox-s3-client/src/client.rs
+++ b/proxmox-s3-client/src/client.rs
@@ -380,11 +380,6 @@ impl S3Client {
                 }
             }
 
-            if retry > 0 {
-                let backoff_secs = S3_HTTP_REQUEST_RETRY_BACKOFF_DEFAULT * 3_u32.pow(retry as u32);
-                tokio::time::sleep(backoff_secs).await;
-            }
-
             let response = if let Some(deadline) = deadline {
                 tokio::time::timeout_at(deadline, self.client.request(request))
                     .await
@@ -401,6 +396,9 @@ impl S3Client {
                     }
                 }
             }
+
+            let backoff_secs = S3_HTTP_REQUEST_RETRY_BACKOFF_DEFAULT * 3_u32.pow(retry as u32);
+            tokio::time::sleep(backoff_secs).await;
         }
 
         bail!("failed to send request exceeding retries");
-- 
2.47.3

* [PATCH proxmox v2 3/3] fix #6858: s3-client: retry request on 500, 503 and 504 status codes
  2026-02-24 13:49 [PATCH proxmox v2 0/3] fix #6858: implement retry logic for transient API errors Christian Ebner
  2026-02-24 13:49 ` [PATCH proxmox v2 1/3] s3-client: early return when request timeout deadline reached Christian Ebner
  2026-02-24 13:49 ` [PATCH proxmox v2 2/3] s3-client: move exponential backoff to after the response state check Christian Ebner
@ 2026-02-24 13:49 ` Christian Ebner
  2 siblings, 0 replies; 4+ messages in thread
From: Christian Ebner @ 2026-02-24 13:49 UTC (permalink / raw)
  To: pbs-devel

Follow the best practices for AWS S3 error handling [0] and retry
requests whose response carries HTTP status code 500 or 503.

Further, do the same for 504 Gateway Timeout errors, which some users
in the community forum [1] encountered with Hetzner's S3 storage
offering.

This is done unconditionally for all requests, while honoring the
maximum number of retries and the optional request timeout.
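The retry decision can be sketched as follows, modelling a received response as `Ok(status)` and a transport error as `Err`. `MAX_RETRY` is a hypothetical stand-in for `MAX_S3_HTTP_REQUEST_RETRY`, and the backoff sleep is omitted. Once retries are exhausted, the last response is returned unchanged, so a caller sees the final 503 rather than a generic error.

```rust
/// Hypothetical stand-in for `MAX_S3_HTTP_REQUEST_RETRY`.
const MAX_RETRY: usize = 3;

/// `Ok(status)` models a received HTTP response, `Err` a transport error.
fn send<F>(mut attempt: F) -> Result<u16, String>
where
    F: FnMut(usize) -> Result<u16, String>,
{
    for retry in 0..MAX_RETRY {
        let response = attempt(retry);
        let do_retry = match &response {
            // Retry only on 500, 503 and 504; other statuses are final.
            Ok(status) => matches!(*status, 500 | 503 | 504),
            // Transport errors are always retried.
            Err(_) => true,
        };
        if !do_retry || retry >= MAX_RETRY - 1 {
            // Return the last response or error unchanged.
            return response;
        }
        // The exponential backoff sleep would happen here.
    }
    Err("failed to send request exceeding retries".to_string())
}
```

A 404 is returned after a single attempt, while a persistent 503 is returned as-is after all three attempts.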

[0] https://docs.aws.amazon.com/AmazonS3/latest/API/ErrorBestPractices.html
[1] https://forum.proxmox.com/threads/180956/

Fixes: https://bugzilla.proxmox.com/show_bug.cgi?id=6858
Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
changes since version 1:
- return the last error if retries are exhausted
- also consider 504 Gateway Timeout as retryable

 proxmox-s3-client/src/client.rs | 19 ++++++++++++-------
 1 file changed, 12 insertions(+), 7 deletions(-)

diff --git a/proxmox-s3-client/src/client.rs b/proxmox-s3-client/src/client.rs
index f3e5eb45..35c80948 100644
--- a/proxmox-s3-client/src/client.rs
+++ b/proxmox-s3-client/src/client.rs
@@ -388,13 +388,18 @@ impl S3Client {
                 self.client.request(request).await
             };
 
-            match response {
-                Ok(response) => return Ok(response),
-                Err(err) => {
-                    if retry >= MAX_S3_HTTP_REQUEST_RETRY - 1 {
-                        return Err(err.into());
-                    }
-                }
+            let do_retry = match &response {
+                Ok(response) => matches!(
+                    response.status(),
+                    StatusCode::INTERNAL_SERVER_ERROR
+                        | StatusCode::SERVICE_UNAVAILABLE
+                        | StatusCode::GATEWAY_TIMEOUT
+                ),
+                Err(_) => true,
+            };
+
+            if !do_retry || retry >= MAX_S3_HTTP_REQUEST_RETRY - 1 {
+                return Ok(response?);
             }
 
             let backoff_secs = S3_HTTP_REQUEST_RETRY_BACKOFF_DEFAULT * 3_u32.pow(retry as u32);
-- 
2.47.3
