* [pbs-devel] [PATCH proxmox{, -backup} v2 0/7] fix #6665: never mark chunks as bad on s3 client fetch error
@ 2025-08-25 13:10 Christian Ebner
2025-08-25 13:10 ` [pbs-devel] [PATCH proxmox v2 1/6] proxmox-http: add method to share full body as contiguous bytes Christian Ebner
` (6 more replies)
0 siblings, 7 replies; 8+ messages in thread
From: Christian Ebner @ 2025-08-25 13:10 UTC (permalink / raw)
To: pbs-devel
This patch series fixes an issue with chunks being incorrectly marked
as bad in case of s3 client fetch errors. In that case, only increase
the error counter and log the error, but leave the chunk as is.
To increase robusteness, implement a retry logic for all transient s3
client request errors with exponential backoff time. Further,
unconditionally increase the get object request timeout from currently
1 minute to 30 minutes, to greatly reduce possible timeouts on low
download bandwidth connections.
Changes since version 1 (thanks @Wolfgang for offlist feedback):
- Change timeout value to be independent of request retries
- Adapt backoff time to be exponential, increasing initial value to 1 second
- Do not increase tcp keepalive idle time, and rename const to avoid confusion
proxmox:
Christian Ebner (6):
proxmox-http: add method to share full body as contiguous bytes
s3-client: drop non-ambiguous mention of chunks in error message
s3-client: fix unintended match statement being an expression
s3-client: bump s3 request timeout from 1 minute to 30 minutes
s3-client: add retry logic for transient client errors
s3-client: use better fitting name for TCP idle time
proxmox-http/src/body.rs | 8 ++++
proxmox-s3-client/src/client.rs | 74 +++++++++++++++++++++++----------
2 files changed, 61 insertions(+), 21 deletions(-)
proxmox-backup:
Christian Ebner (1):
fix #6665: never rename chunks on s3 client fetch errors
src/backup/verify.rs | 35 +++++++++++++++++++++++------------
1 file changed, 23 insertions(+), 12 deletions(-)
Summary over all repositories:
3 files changed, 84 insertions(+), 33 deletions(-)
--
Generated by git-murpp 0.8.1
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
^ permalink raw reply [flat|nested] 8+ messages in thread
* [pbs-devel] [PATCH proxmox v2 1/6] proxmox-http: add method to share full body as contiguous bytes
2025-08-25 13:10 [pbs-devel] [PATCH proxmox{, -backup} v2 0/7] fix #6665: never mark chunks as bad on s3 client fetch error Christian Ebner
@ 2025-08-25 13:10 ` Christian Ebner
2025-08-25 13:10 ` [pbs-devel] [PATCH proxmox v2 2/6] s3-client: drop non-ambiguous mention of chunks in error message Christian Ebner
` (5 subsequent siblings)
6 siblings, 0 replies; 8+ messages in thread
From: Christian Ebner @ 2025-08-25 13:10 UTC (permalink / raw)
To: pbs-devel
While already possible to get the body contents as bytes slice,
exposing it as `Bytes` object has the advantage to be able to
efficiently clone the body when a shared reference is not possible,
e.g. for request cloning.
Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
proxmox-http/src/body.rs | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/proxmox-http/src/body.rs b/proxmox-http/src/body.rs
index 3eb17355..a1dade27 100644
--- a/proxmox-http/src/body.rs
+++ b/proxmox-http/src/body.rs
@@ -35,6 +35,14 @@ impl Body {
}
}
+ /// Returns the body contents as `Bytes` it is a "full" body, None otherwise.
+ pub fn bytes(&self) -> Option<Bytes> {
+ match self.inner {
+ InnerBody::Full(ref bytes) => Some(bytes.clone()),
+ InnerBody::Streaming(_) => None,
+ }
+ }
+
pub fn wrap_stream<S>(stream: S) -> Body
where
S: futures::stream::TryStream + Send + 'static,
--
2.47.2
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
^ permalink raw reply [flat|nested] 8+ messages in thread
* [pbs-devel] [PATCH proxmox v2 2/6] s3-client: drop non-ambiguous mention of chunks in error message
2025-08-25 13:10 [pbs-devel] [PATCH proxmox{, -backup} v2 0/7] fix #6665: never mark chunks as bad on s3 client fetch error Christian Ebner
2025-08-25 13:10 ` [pbs-devel] [PATCH proxmox v2 1/6] proxmox-http: add method to share full body as contiguous bytes Christian Ebner
@ 2025-08-25 13:10 ` Christian Ebner
2025-08-25 13:10 ` [pbs-devel] [PATCH proxmox v2 3/6] s3-client: fix unintended match statement being an expression Christian Ebner
` (4 subsequent siblings)
6 siblings, 0 replies; 8+ messages in thread
From: Christian Ebner @ 2025-08-25 13:10 UTC (permalink / raw)
To: pbs-devel
The upload with retry helper method was initially implemented for
chunk uploads in PBS, before being moved to the dedicated s3 client
crate prior to being applied.
Since this helper is also being used to upload other objects, non just
data chunks, adapt the error message to be more ambiguous.
Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
proxmox-s3-client/src/client.rs | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/proxmox-s3-client/src/client.rs b/proxmox-s3-client/src/client.rs
index ec29d95a..4301f1af 100644
--- a/proxmox-s3-client/src/client.rs
+++ b/proxmox-s3-client/src/client.rs
@@ -684,12 +684,12 @@ impl S3Client {
Ok(PutObjectResponse::PreconditionFailed) => return Ok(true),
Ok(PutObjectResponse::NeedsRetry) => {
if retry >= MAX_S3_UPLOAD_RETRY - 1 {
- bail!("concurrent operation, chunk upload failed")
+ bail!("concurrent operation, upload failed")
}
}
Err(err) => {
if retry >= MAX_S3_UPLOAD_RETRY - 1 {
- return Err(err.context("chunk upload failed"));
+ return Err(err.context("upload failed"));
}
}
};
--
2.47.2
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
^ permalink raw reply [flat|nested] 8+ messages in thread
* [pbs-devel] [PATCH proxmox v2 3/6] s3-client: fix unintended match statement being an expression
2025-08-25 13:10 [pbs-devel] [PATCH proxmox{, -backup} v2 0/7] fix #6665: never mark chunks as bad on s3 client fetch error Christian Ebner
2025-08-25 13:10 ` [pbs-devel] [PATCH proxmox v2 1/6] proxmox-http: add method to share full body as contiguous bytes Christian Ebner
2025-08-25 13:10 ` [pbs-devel] [PATCH proxmox v2 2/6] s3-client: drop non-ambiguous mention of chunks in error message Christian Ebner
@ 2025-08-25 13:10 ` Christian Ebner
2025-08-25 13:10 ` [pbs-devel] [PATCH proxmox v2 4/6] s3-client: bump s3 request timeout from 1 minute to 30 minutes Christian Ebner
` (3 subsequent siblings)
6 siblings, 0 replies; 8+ messages in thread
From: Christian Ebner @ 2025-08-25 13:10 UTC (permalink / raw)
To: pbs-devel
The match in question is not intendended to be an expression, so drop
the block trailing semicolon, adding it to the bail instead.
Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
proxmox-s3-client/src/client.rs | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/proxmox-s3-client/src/client.rs b/proxmox-s3-client/src/client.rs
index 4301f1af..ab2582bf 100644
--- a/proxmox-s3-client/src/client.rs
+++ b/proxmox-s3-client/src/client.rs
@@ -684,7 +684,7 @@ impl S3Client {
Ok(PutObjectResponse::PreconditionFailed) => return Ok(true),
Ok(PutObjectResponse::NeedsRetry) => {
if retry >= MAX_S3_UPLOAD_RETRY - 1 {
- bail!("concurrent operation, upload failed")
+ bail!("concurrent operation, upload failed");
}
}
Err(err) => {
@@ -692,7 +692,7 @@ impl S3Client {
return Err(err.context("upload failed"));
}
}
- };
+ }
}
Ok(false)
}
--
2.47.2
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
^ permalink raw reply [flat|nested] 8+ messages in thread
* [pbs-devel] [PATCH proxmox v2 4/6] s3-client: bump s3 request timeout from 1 minute to 30 minutes
2025-08-25 13:10 [pbs-devel] [PATCH proxmox{, -backup} v2 0/7] fix #6665: never mark chunks as bad on s3 client fetch error Christian Ebner
` (2 preceding siblings ...)
2025-08-25 13:10 ` [pbs-devel] [PATCH proxmox v2 3/6] s3-client: fix unintended match statement being an expression Christian Ebner
@ 2025-08-25 13:10 ` Christian Ebner
2025-08-25 13:10 ` [pbs-devel] [PATCH proxmox v2 5/6] s3-client: add retry logic for transient client errors Christian Ebner
` (2 subsequent siblings)
6 siblings, 0 replies; 8+ messages in thread
From: Christian Ebner @ 2025-08-25 13:10 UTC (permalink / raw)
To: pbs-devel
The currently chosen value of 1 minute was to low for several
operations such as, e.g. fetching of chunks over low download
bandwidth connections. Since the download size is not known a-priory
and other operations such as list objects v2 might also take
considerably more time, bump the timeout unconditionally.
The 30 minutes translate to an average download rate of about 2.3
KiB/s for a 4 MiB data blob.
Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
proxmox-s3-client/src/client.rs | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/proxmox-s3-client/src/client.rs b/proxmox-s3-client/src/client.rs
index ab2582bf..64d62c54 100644
--- a/proxmox-s3-client/src/client.rs
+++ b/proxmox-s3-client/src/client.rs
@@ -32,7 +32,7 @@ use crate::response_reader::{
};
/// Default timeout for s3 api requests.
-pub const S3_HTTP_REQUEST_TIMEOUT: Duration = Duration::from_secs(60);
+pub const S3_HTTP_REQUEST_TIMEOUT: Duration = Duration::from_secs(30 * 60);
const S3_HTTP_CONNECT_TIMEOUT: Duration = Duration::from_secs(10);
const S3_TCP_KEEPALIVE_TIME: u32 = 120;
--
2.47.2
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
^ permalink raw reply [flat|nested] 8+ messages in thread
* [pbs-devel] [PATCH proxmox v2 5/6] s3-client: add retry logic for transient client errors
2025-08-25 13:10 [pbs-devel] [PATCH proxmox{, -backup} v2 0/7] fix #6665: never mark chunks as bad on s3 client fetch error Christian Ebner
` (3 preceding siblings ...)
2025-08-25 13:10 ` [pbs-devel] [PATCH proxmox v2 4/6] s3-client: bump s3 request timeout from 1 minute to 30 minutes Christian Ebner
@ 2025-08-25 13:10 ` Christian Ebner
2025-08-25 13:10 ` [pbs-devel] [PATCH proxmox v2 6/6] s3-client: use better fitting name for TCP idle time Christian Ebner
2025-08-25 13:10 ` [pbs-devel] [PATCH proxmox-backup v2 1/1] fix #6665: never rename chunks on s3 client fetch errors Christian Ebner
6 siblings, 0 replies; 8+ messages in thread
From: Christian Ebner @ 2025-08-25 13:10 UTC (permalink / raw)
To: pbs-devel
Implements a retry logic with exponentially increasing backoff time
for transient client errors.
For this, clone the requests by destructuring and efficiently
cloneing its body, leveraging Bytes::clone(). Retry up to 3 times,
adding an exponentially increasing backoff time for each retry
starting at 1 second, with the intention to reduce network congestion
and remote system overload.
Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
proxmox-s3-client/src/client.rs | 62 +++++++++++++++++++++++++--------
1 file changed, 47 insertions(+), 15 deletions(-)
diff --git a/proxmox-s3-client/src/client.rs b/proxmox-s3-client/src/client.rs
index 64d62c54..e3845111 100644
--- a/proxmox-s3-client/src/client.rs
+++ b/proxmox-s3-client/src/client.rs
@@ -39,6 +39,8 @@ const S3_TCP_KEEPALIVE_TIME: u32 = 120;
const MAX_S3_UPLOAD_RETRY: usize = 3;
// Assumed minimum upload rate of 1 KiB/s for dynamic put object request timeout calculation.
const S3_MIN_ASSUMED_UPLOAD_RATE: u64 = 1024;
+const MAX_S3_HTTP_REQUEST_RETRY: usize = 3;
+const S3_HTTP_REQUEST_RETRY_BACKOFF_DEFAULT: Duration = Duration::from_secs(1);
/// S3 object key path prefix without the context prefix as defined by the client options.
///
@@ -293,23 +295,53 @@ impl S3Client {
timeout: Option<Duration>,
) -> Result<Response<Incoming>, Error> {
let request = self.prepare(request).await?;
- if request.method() == Method::PUT {
- if let Some(limiter) = &self.put_rate_limiter {
- let sleep = {
- let mut limiter = limiter.lock().unwrap();
- limiter.register_traffic(Instant::now(), 1)
- };
- tokio::time::sleep(sleep).await;
+
+ let (parts, body) = request.into_parts();
+ let body_bytes = body
+ .bytes()
+ .ok_or_else(|| format_err!("cannot prepare request with streaming body"))?;
+
+ let deadline = timeout.map(|timeout| tokio::time::Instant::now() + timeout);
+
+ for retry in 0..MAX_S3_HTTP_REQUEST_RETRY {
+ let request = Request::from_parts(parts.clone(), Body::from(body_bytes.clone()));
+ if parts.method == Method::PUT {
+ if let Some(limiter) = &self.put_rate_limiter {
+ let sleep = {
+ let mut limiter = limiter.lock().unwrap();
+ limiter.register_traffic(Instant::now(), 1)
+ };
+ tokio::time::sleep(sleep).await;
+ }
+ }
+
+ if retry > 0 {
+ let backoff_secs = S3_HTTP_REQUEST_RETRY_BACKOFF_DEFAULT * 3_u32.pow(retry as u32);
+ tokio::time::sleep(backoff_secs).await;
+ }
+
+ let response = if let Some(deadline) = deadline {
+ tokio::time::timeout_at(deadline, self.client.request(request)).await
+ } else {
+ Ok(self.client.request(request).await)
+ };
+
+ match response {
+ Ok(Ok(response)) => return Ok(response),
+ Ok(Err(err)) => {
+ if retry >= MAX_S3_HTTP_REQUEST_RETRY - 1 {
+ return Err(err.into());
+ }
+ }
+ Err(_elapsed) => {
+ if retry >= MAX_S3_HTTP_REQUEST_RETRY - 1 {
+ bail!("request timed out exceeding retries");
+ }
+ }
}
}
- let response = if let Some(timeout) = timeout {
- tokio::time::timeout(timeout, self.client.request(request))
- .await
- .context("request timeout")??
- } else {
- self.client.request(request).await?
- };
- Ok(response)
+
+ bail!("failed to send request exceeding retries");
}
/// Check if bucket exists and got permissions to access it.
--
2.47.2
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
^ permalink raw reply [flat|nested] 8+ messages in thread
* [pbs-devel] [PATCH proxmox v2 6/6] s3-client: use better fitting name for TCP idle time
2025-08-25 13:10 [pbs-devel] [PATCH proxmox{, -backup} v2 0/7] fix #6665: never mark chunks as bad on s3 client fetch error Christian Ebner
` (4 preceding siblings ...)
2025-08-25 13:10 ` [pbs-devel] [PATCH proxmox v2 5/6] s3-client: add retry logic for transient client errors Christian Ebner
@ 2025-08-25 13:10 ` Christian Ebner
2025-08-25 13:10 ` [pbs-devel] [PATCH proxmox-backup v2 1/1] fix #6665: never rename chunks on s3 client fetch errors Christian Ebner
6 siblings, 0 replies; 8+ messages in thread
From: Christian Ebner @ 2025-08-25 13:10 UTC (permalink / raw)
To: pbs-devel
Although called tcp_keepalive in the HttpsConnector, the parameter
sets the idle time before starting to send TCP keepalive requests [0].
Rename the constant to be better fitting, avoiding possible confusion
with tcp keepalive interval.
[0] https://git.proxmox.com/?p=proxmox.git;a=blob;f=proxmox-sys/src/linux/socket.rs;h=f1f108fb8c88c19cdd8850b17dc8191782d3d6ec;hb=HEAD#l13
Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
proxmox-s3-client/src/client.rs | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/proxmox-s3-client/src/client.rs b/proxmox-s3-client/src/client.rs
index e3845111..96a5878d 100644
--- a/proxmox-s3-client/src/client.rs
+++ b/proxmox-s3-client/src/client.rs
@@ -35,7 +35,7 @@ use crate::response_reader::{
pub const S3_HTTP_REQUEST_TIMEOUT: Duration = Duration::from_secs(30 * 60);
const S3_HTTP_CONNECT_TIMEOUT: Duration = Duration::from_secs(10);
-const S3_TCP_KEEPALIVE_TIME: u32 = 120;
+const S3_TCP_KEEPIDLE_TIME: u32 = 120;
const MAX_S3_UPLOAD_RETRY: usize = 3;
// Assumed minimum upload rate of 1 KiB/s for dynamic put object request timeout calculation.
const S3_MIN_ASSUMED_UPLOAD_RATE: u64 = 1024;
@@ -154,7 +154,7 @@ impl S3Client {
let https_connector = HttpsConnector::with_connector(
http_connector,
ssl_connector_builder.build(),
- S3_TCP_KEEPALIVE_TIME,
+ S3_TCP_KEEPIDLE_TIME,
);
let client = Client::builder(TokioExecutor::new()).build::<_, Body>(https_connector);
--
2.47.2
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
^ permalink raw reply [flat|nested] 8+ messages in thread
* [pbs-devel] [PATCH proxmox-backup v2 1/1] fix #6665: never rename chunks on s3 client fetch errors
2025-08-25 13:10 [pbs-devel] [PATCH proxmox{, -backup} v2 0/7] fix #6665: never mark chunks as bad on s3 client fetch error Christian Ebner
` (5 preceding siblings ...)
2025-08-25 13:10 ` [pbs-devel] [PATCH proxmox v2 6/6] s3-client: use better fitting name for TCP idle time Christian Ebner
@ 2025-08-25 13:10 ` Christian Ebner
6 siblings, 0 replies; 8+ messages in thread
From: Christian Ebner @ 2025-08-25 13:10 UTC (permalink / raw)
To: pbs-devel
The chunk verification for the s3 backend always fetches the chunk
directly from the backend and verifies it based on the received
response.
Currently, when fetching the chunk contents failed in case of an
unrelated s3 client error, e.g. due to transient networking issues,
both the locally cached and the chunk in the object store, were
incorrectly marked as bad and renamed.
Instead, treat chunk fetching issues as error but do not treat the
chunk itself as bad.
Fixes: https://bugzilla.proxmox.com/show_bug.cgi?id=6665
Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
src/backup/verify.rs | 35 +++++++++++++++++++++++------------
1 file changed, 23 insertions(+), 12 deletions(-)
diff --git a/src/backup/verify.rs b/src/backup/verify.rs
index b1452f267..069c3bdf9 100644
--- a/src/backup/verify.rs
+++ b/src/backup/verify.rs
@@ -284,13 +284,25 @@ impl VerifyWorker {
let object_key = pbs_datastore::s3::object_key_from_digest(&info.digest)?;
match proxmox_async::runtime::block_on(s3_client.get_object(object_key)) {
Ok(Some(response)) => {
- let bytes = proxmox_async::runtime::block_on(response.content.collect())?
- .to_bytes();
- let chunk = DataBlob::from_raw(bytes.to_vec())?;
- let size = info.size();
- *read_bytes += chunk.raw_size();
- decoder_pool.send((chunk, info.digest, size))?;
- *decoded_bytes += size;
+ let chunk_result = proxmox_lang::try_block!({
+ let bytes =
+ proxmox_async::runtime::block_on(response.content.collect())?
+ .to_bytes();
+ DataBlob::from_raw(bytes.to_vec())
+ });
+
+ match chunk_result {
+ Ok(chunk) => {
+ let size = info.size();
+ *read_bytes += chunk.raw_size();
+ decoder_pool.send((chunk, info.digest, size))?;
+ *decoded_bytes += size;
+ }
+ Err(err) => {
+ errors.fetch_add(1, Ordering::SeqCst);
+ error!("can't verify chunk, load failed - {err}");
+ }
+ }
}
Ok(None) => self.add_corrupt_chunk(
info.digest,
@@ -300,11 +312,10 @@ impl VerifyWorker {
hex::encode(info.digest)
),
),
- Err(err) => self.add_corrupt_chunk(
- info.digest,
- errors,
- &format!("can't verify chunk, load failed - {err}"),
- ),
+ Err(err) => {
+ errors.fetch_add(1, Ordering::SeqCst);
+ error!("can't verify chunk, load failed - {err}");
+ }
}
}
}
--
2.47.2
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2025-08-25 13:10 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-08-25 13:10 [pbs-devel] [PATCH proxmox{, -backup} v2 0/7] fix #6665: never mark chunks as bad on s3 client fetch error Christian Ebner
2025-08-25 13:10 ` [pbs-devel] [PATCH proxmox v2 1/6] proxmox-http: add method to share full body as contiguous bytes Christian Ebner
2025-08-25 13:10 ` [pbs-devel] [PATCH proxmox v2 2/6] s3-client: drop non-ambiguous mention of chunks in error message Christian Ebner
2025-08-25 13:10 ` [pbs-devel] [PATCH proxmox v2 3/6] s3-client: fix unintended match statement being an expression Christian Ebner
2025-08-25 13:10 ` [pbs-devel] [PATCH proxmox v2 4/6] s3-client: bump s3 request timeout from 1 minute to 30 minutes Christian Ebner
2025-08-25 13:10 ` [pbs-devel] [PATCH proxmox v2 5/6] s3-client: add retry logic for transient client errors Christian Ebner
2025-08-25 13:10 ` [pbs-devel] [PATCH proxmox v2 6/6] s3-client: use better fitting name for TCP idle time Christian Ebner
2025-08-25 13:10 ` [pbs-devel] [PATCH proxmox-backup v2 1/1] fix #6665: never rename chunks on s3 client fetch errors Christian Ebner
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox