public inbox for pbs-devel@lists.proxmox.com
* [pbs-devel] [PATCH proxmox{, -backup} v3 00/10] s3 store: fix chunk upload/insert and GC race condition for s3 backend
@ 2025-10-15 16:39 Christian Ebner
  2025-10-15 16:39 ` [pbs-devel] [PATCH proxmox v3 1/2] s3-client: add exponential backoff time for upload retries Christian Ebner
                   ` (9 more replies)
  0 siblings, 10 replies; 11+ messages in thread
From: Christian Ebner @ 2025-10-15 16:39 UTC (permalink / raw)
  To: pbs-devel

Patches 1 to 3 extend the current s3-client implementation with an
additional helper method, allowing chunks to be uploaded unconditionally
on the final retry of otherwise conditional uploads. This avoids failing
backups in case of concurrent uploads to the same chunk digest.
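
For illustration, a condensed sketch of the pattern the new helper
implements (simplified; the constants, signature and error type below are
placeholders rather than the actual proxmox-s3-client code):

    use std::time::Duration;

    const MAX_UPLOAD_RETRY: usize = 3; // placeholder, the real constant lives in the client
    const BACKOFF_BASE: Duration = Duration::from_secs(1); // placeholder base backoff

    // `put_object(replace)` stands in for a single put request; Err carries the HTTP status.
    async fn upload_replace_on_final_retry_sketch(
        mut put_object: impl FnMut(bool) -> Result<bool, u16>,
    ) -> Result<bool, u16> {
        let mut last_status = 0;
        for retry in 0..MAX_UPLOAD_RETRY {
            if retry > 0 {
                // exponential backoff (factor 3) to give a concurrent upload time to finish
                tokio::time::sleep(BACKOFF_BASE * 3_u32.pow(retry as u32)).await;
            }
            // all attempts are conditional (If-None-Match), except the very last one
            let replace = retry == MAX_UPLOAD_RETRY - 1;
            match put_object(replace) {
                Ok(is_duplicate) => return Ok(is_duplicate),
                // e.g. status 409 while a concurrent conditional upload is in flight
                Err(status) => last_status = status,
            }
        }
        Err(last_status)
    }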

The remaining patches fix a possible race condition between the s3
backend upload and garbage collection, which can result in chunk loss.
If garbage collection lists the chunk and checks its in-use marker after
the upload has finished, but just before the marker is written by the
cache insert following the upload, garbage collection can incorrectly
delete the chunk. This is circumvented by setting and checking an
additional chunk marker file, which is touched before starting the
upload and removed after the cache insert, assuring that these chunks
are not removed.

Chunk upload marker files will not get cleaned up if the upload fails,
to avoid issues in case of concurrent uploads of the same chunk.
Garbage collection therefore has to clean up these lingering markers as
well, based on their atime.
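
To make the flow concrete, a minimal standalone sketch of the marker
protocol (std-only; the file layout and the upload closure are
illustrative, and the real implementation additionally guards these
steps with the chunk store mutex and updates the marker's atime for
concurrent uploads):

    use std::fs;
    use std::io::ErrorKind;
    use std::path::Path;

    fn upload_with_marker(
        chunk_path: &Path,
        upload_to_s3: impl FnOnce() -> std::io::Result<()>,
    ) -> std::io::Result<()> {
        let marker = chunk_path.with_extension("backend-upload");
        // touch the marker before the upload so GC considers the chunk as in-use
        match fs::OpenOptions::new().write(true).create_new(true).open(&marker) {
            Ok(_) => {}
            Err(err) if err.kind() == ErrorKind::AlreadyExists => {} // concurrent upload in flight
            Err(err) => return Err(err),
        }
        // if this fails, the marker stays behind and GC removes it later based on its atime
        upload_to_s3()?;
        // stand-in for the local cache insert after a successful upload ...
        fs::write(chunk_path, b"")?;
        // ... which also removes the marker again
        fs::remove_file(&marker)
    }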

Changes since version 2 (thanks @Fabian):
- Also fix issues arising from the concurrent conditional s3 put requests.
- Only clean up the upload marker file on success, add GC logic to clean up
  lingering ones from failed uploads.
- Refactor chunk upload marker file related helpers.
- Split patches into smaller portions.

Changes since version 1 (thanks @Fabian):
- Refactor the garbage collection rework patches, using a callback to perform the
  chunk removal, so both filesystem and s3 backend can use the same logic without
  the need to readapt the gc status.
- Completely reworked the local datastore cache access method, so it not only
  serves the contents from the s3 backend if they need to be fetched, but also
  closes the download/insert race and drops quite some duplicate code,
  completely getting rid of the now obsolete S3Cacher.
- Rework the chunk insert for s3 to also cover cases where concurrent uploads
  of the same object/key occur, making sure that the upload marker creation
  will not lead to failure and that the upload marker cleanup is handled
  correctly as well. The only race still open is which of the two concurrent
  uploads inserts into the local cache, but since both versions must encode
  the same data (as they have the same digest), this is not an issue. If one
  of the uploads fails, however, both must be considered as failed, since then
  there is no guarantee anymore that garbage collection did not clean up the
  chunk from the s3 backend in the meantime.

proxmox:

Christian Ebner (2):
  s3-client: add exponential backoff time for upload retries
  s3-client: add helper method to force final unconditional upload on

 proxmox-s3-client/src/client.rs | 36 +++++++++++++++++++++++++++++----
 1 file changed, 32 insertions(+), 4 deletions(-)


proxmox-backup:

Christian Ebner (8):
  api/pull: avoid failing on concurrent conditional chunk uploads
  datastore: GC: drop overly verbose info message during s3 chunk sweep
  GC: refactor atime gathering for local chunk markers with s3 backend
  chunk store: refactor method for chunk insertion
  chunk store: add backend upload marker helpers for s3 backed stores
  api: chunk upload: fix race between chunk backend upload and insert
  api: chunk upload: fix race with garbage collection for no-cache on s3
  pull: guard chunk upload and only insert into cache after upload

 pbs-datastore/src/chunk_store.rs              | 200 +++++++++++++++++-
 pbs-datastore/src/datastore.rs                |  41 +++-
 .../src/local_datastore_lru_cache.rs          |   3 +-
 src/api2/backup/upload_chunk.rs               |  17 +-
 src/api2/config/datastore.rs                  |   2 +
 src/server/pull.rs                            |   7 +-
 6 files changed, 242 insertions(+), 28 deletions(-)


Summary over all repositories:
  7 files changed, 274 insertions(+), 32 deletions(-)

-- 
Generated by git-murpp 0.8.1



* [pbs-devel] [PATCH proxmox v3 1/2] s3-client: add exponential backoff time for upload retries
  2025-10-15 16:39 [pbs-devel] [PATCH proxmox{, -backup} v3 00/10] s3 store: fix chunk upload/insert and GC race condition for s3 backend Christian Ebner
@ 2025-10-15 16:39 ` Christian Ebner
  2025-10-15 16:40 ` [pbs-devel] [PATCH proxmox v3 2/2] s3-client: add helper method to force final unconditional upload on Christian Ebner
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: Christian Ebner @ 2025-10-15 16:39 UTC (permalink / raw)
  To: pbs-devel

The main purpose is to back off for some time in case of concurrent
conditional uploads where the If-None-Match header is set. For these,
the upload will fail with HTTP status code 409 as stated in [0]. The
backoff time increases the time window for the already ongoing upload
to finish.

[0] https://docs.aws.amazon.com/AmazonS3/latest/API/API_PutObject.html#API_PutObject_RequestSyntax
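
For illustration, the resulting wait schedule relative to the default
backoff (derived from the factor used in the hunk below; the base
constant S3_HTTP_REQUEST_RETRY_BACKOFF_DEFAULT is defined elsewhere in
the client):

    attempt 1: no sleep
    attempt 2: sleep of S3_HTTP_REQUEST_RETRY_BACKOFF_DEFAULT * 3
    attempt 3: sleep of S3_HTTP_REQUEST_RETRY_BACKOFF_DEFAULT * 9
    (and so on, up to MAX_S3_UPLOAD_RETRY attempts)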

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
 proxmox-s3-client/src/client.rs | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/proxmox-s3-client/src/client.rs b/proxmox-s3-client/src/client.rs
index abb35f2f..4ebd8c4b 100644
--- a/proxmox-s3-client/src/client.rs
+++ b/proxmox-s3-client/src/client.rs
@@ -715,6 +715,10 @@ impl S3Client {
             .max(S3_HTTP_REQUEST_TIMEOUT.as_secs());
         let timeout = Some(Duration::from_secs(timeout_secs));
         for retry in 0..MAX_S3_UPLOAD_RETRY {
+            if retry > 0 {
+                let backoff_secs = S3_HTTP_REQUEST_RETRY_BACKOFF_DEFAULT * 3_u32.pow(retry as u32);
+                tokio::time::sleep(backoff_secs).await;
+            }
             let body = Body::from(object_data.clone());
             match self
                 .put_object(object_key.clone(), body, timeout, replace)
-- 
2.47.3




* [pbs-devel] [PATCH proxmox v3 2/2] s3-client: add helper method to force final unconditional upload on
  2025-10-15 16:39 [pbs-devel] [PATCH proxmox{, -backup} v3 00/10] s3 store: fix chunk upload/insert and GC race condition for s3 backend Christian Ebner
  2025-10-15 16:39 ` [pbs-devel] [PATCH proxmox v3 1/2] s3-client: add exponential backoff time for upload retries Christian Ebner
@ 2025-10-15 16:40 ` Christian Ebner
  2025-10-15 16:40 ` [pbs-devel] [PATCH proxmox-backup v3 1/8] api/pull: avoid failing on concurrent conditional chunk uploads Christian Ebner
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: Christian Ebner @ 2025-10-15 16:40 UTC (permalink / raw)
  To: pbs-devel

Extend the currently implemented conditional/unconditional upload
helpers with an additional variant which performs conditional upload
requests up until the last one. The last one is sent unconditionally,
not setting the If-None-Match header. The use case for this is to not
fail in PBS during chunk upload if a concurrent upload of the same
chunk is in progress and does not finish within the upload retries
with backoff time.

Which put request results in the final object is, however, not clearly
specified in that case; the AWS docs mention contradicting behaviour
[0]. Quotes from different parts of the docs:

> If two PUT requests are simultaneously made to the same key, the
> request with the latest timestamp wins.
> [...]
> Amazon S3 internally uses last-writer-wins semantics to determine
> which write takes precedence.

[0] https://docs.aws.amazon.com/AmazonS3/latest/userguide/Welcome.html#ConsistencyModel

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
 proxmox-s3-client/src/client.rs | 32 ++++++++++++++++++++++++++++----
 1 file changed, 28 insertions(+), 4 deletions(-)

diff --git a/proxmox-s3-client/src/client.rs b/proxmox-s3-client/src/client.rs
index 4ebd8c4b..fae8a56f 100644
--- a/proxmox-s3-client/src/client.rs
+++ b/proxmox-s3-client/src/client.rs
@@ -684,7 +684,26 @@ impl S3Client {
         object_data: Bytes,
     ) -> Result<bool, Error> {
         let replace = false;
-        self.do_upload_with_retry(object_key, object_data, replace)
+        let finally_replace = false;
+        self.do_upload_with_retry(object_key, object_data, replace, finally_replace)
+            .await
+    }
+
+    /// Upload the given object via the S3 api, not replacing it if already present in the object
+    /// store. If a conditional upload leads to repeated failures with status code 409, do not set
+    /// the `If-None-Match` header for the final retry.
+    /// Retrying up to 3 times in case of error.
+    ///
+    /// Note: Which object results in the final version is not clearly specified.
+    #[inline(always)]
+    pub async fn upload_replace_on_final_retry(
+        &self,
+        object_key: S3ObjectKey,
+        object_data: Bytes,
+    ) -> Result<bool, Error> {
+        let replace = false;
+        let finally_replace = true;
+        self.do_upload_with_retry(object_key, object_data, replace, finally_replace)
             .await
     }
 
@@ -697,17 +716,19 @@ impl S3Client {
         object_data: Bytes,
     ) -> Result<bool, Error> {
         let replace = true;
-        self.do_upload_with_retry(object_key, object_data, replace)
+        let finally_replace = false;
+        self.do_upload_with_retry(object_key, object_data, replace, finally_replace)
             .await
     }
 
     /// Helper to perform the object upload and retry, wrapped by the corresponding methods
-    /// to mask the `replace` flag.
+    /// to mask the `replace` and `finally_replace` flag.
     async fn do_upload_with_retry(
         &self,
         object_key: S3ObjectKey,
         object_data: Bytes,
-        replace: bool,
+        mut replace: bool,
+        finally_replace: bool,
     ) -> Result<bool, Error> {
         let content_size = object_data.len() as u64;
         let timeout_secs = content_size
@@ -719,6 +740,9 @@ impl S3Client {
                 let backoff_secs = S3_HTTP_REQUEST_RETRY_BACKOFF_DEFAULT * 3_u32.pow(retry as u32);
                 tokio::time::sleep(backoff_secs).await;
             }
+            if retry == MAX_S3_UPLOAD_RETRY - 1 {
+                replace = finally_replace;
+            }
             let body = Body::from(object_data.clone());
             match self
                 .put_object(object_key.clone(), body, timeout, replace)
-- 
2.47.3




* [pbs-devel] [PATCH proxmox-backup v3 1/8] api/pull: avoid failing on concurrent conditional chunk uploads
  2025-10-15 16:39 [pbs-devel] [PATCH proxmox{, -backup} v3 00/10] s3 store: fix chunk upload/insert and GC race condition for s3 backend Christian Ebner
  2025-10-15 16:39 ` [pbs-devel] [PATCH proxmox v3 1/2] s3-client: add exponential backoff time for upload retries Christian Ebner
  2025-10-15 16:40 ` [pbs-devel] [PATCH proxmox v3 2/2] s3-client: add helper method to force final unconditional upload on Christian Ebner
@ 2025-10-15 16:40 ` Christian Ebner
  2025-10-15 16:40 ` [pbs-devel] [PATCH proxmox-backup v3 2/8] datastore: GC: drop overly verbose info message during s3 chunk sweep Christian Ebner
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: Christian Ebner @ 2025-10-15 16:40 UTC (permalink / raw)
  To: pbs-devel

Chunks are currently uploaded conditionally by setting the
`If-None-Match` header for the put request (if not disabled by the
provider quirks).
In that case, uploading to the s3 backend while a concurrent upload to
the same object is ongoing will lead to the request returning with
HTTP status code 409 [0]. While a retry logic with backoff time is
used, the concurrent upload might still not be finished after the
retries are exhausted.

Therefore, use the `upload_replace_on_final_retry` method instead,
which does not set the `If-None-Match` header on the last retry,
effectively re-uploading the object in that case. While it is not
specified which of the concurrent uploads will then be the resulting
object version, this is still fine as chunks with the same digest
encode the same data (modulo compression).

[0] https://docs.aws.amazon.com/AmazonS3/latest/API/API_PutObject.html#API_PutObject_RequestSyntax

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
 src/api2/backup/upload_chunk.rs | 4 ++--
 src/server/pull.rs              | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/src/api2/backup/upload_chunk.rs b/src/api2/backup/upload_chunk.rs
index 8dd7e4d52..64e8d6e63 100644
--- a/src/api2/backup/upload_chunk.rs
+++ b/src/api2/backup/upload_chunk.rs
@@ -263,7 +263,7 @@ async fn upload_to_backend(
             if env.no_cache {
                 let object_key = pbs_datastore::s3::object_key_from_digest(&digest)?;
                 let is_duplicate = s3_client
-                    .upload_no_replace_with_retry(object_key, data)
+                    .upload_replace_on_final_retry(object_key, data)
                     .await
                     .map_err(|err| format_err!("failed to upload chunk to s3 backend - {err:#}"))?;
                 return Ok((digest, size, encoded_size, is_duplicate));
@@ -287,7 +287,7 @@ async fn upload_to_backend(
             tracing::info!("Upload of new chunk {}", hex::encode(digest));
             let object_key = pbs_datastore::s3::object_key_from_digest(&digest)?;
             let is_duplicate = s3_client
-                .upload_no_replace_with_retry(object_key, data.clone())
+                .upload_replace_on_final_retry(object_key, data.clone())
                 .await
                 .map_err(|err| format_err!("failed to upload chunk to s3 backend - {err:#}"))?;
 
diff --git a/src/server/pull.rs b/src/server/pull.rs
index 817b57ac5..c0b6fef7c 100644
--- a/src/server/pull.rs
+++ b/src/server/pull.rs
@@ -181,7 +181,7 @@ async fn pull_index_chunks<I: IndexFile>(
                     let upload_data = hyper::body::Bytes::from(data);
                     let object_key = pbs_datastore::s3::object_key_from_digest(&digest)?;
                     let _is_duplicate = proxmox_async::runtime::block_on(
-                        s3_client.upload_no_replace_with_retry(object_key, upload_data),
+                        s3_client.upload_replace_on_final_retry(object_key, upload_data),
                     )
                     .context("failed to upload chunk to s3 backend")?;
                 }
-- 
2.47.3




* [pbs-devel] [PATCH proxmox-backup v3 2/8] datastore: GC: drop overly verbose info message during s3 chunk sweep
  2025-10-15 16:39 [pbs-devel] [PATCH proxmox{, -backup} v3 00/10] s3 store: fix chunk upload/insert and GC race condition for s3 backend Christian Ebner
                   ` (2 preceding siblings ...)
  2025-10-15 16:40 ` [pbs-devel] [PATCH proxmox-backup v3 1/8] api/pull: avoid failing on concurrent conditional chunk uploads Christian Ebner
@ 2025-10-15 16:40 ` Christian Ebner
  2025-10-15 16:40 ` [pbs-devel] [PATCH proxmox-backup v3 3/8] GC: refactor atime gathering for local chunk markers with s3 backend Christian Ebner
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: Christian Ebner @ 2025-10-15 16:40 UTC (permalink / raw)
  To: pbs-devel

This message does not provide much additional information and is shown
during correct operation anyway, while being open to misinterpretation
as an error/warning. Drop it in favor of being less verbose.

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
 pbs-datastore/src/datastore.rs | 1 -
 1 file changed, 1 deletion(-)

diff --git a/pbs-datastore/src/datastore.rs b/pbs-datastore/src/datastore.rs
index a6b17e3c3..b0386421f 100644
--- a/pbs-datastore/src/datastore.rs
+++ b/pbs-datastore/src/datastore.rs
@@ -1652,7 +1652,6 @@ impl DataStore {
                         Ok(stat) => stat.accessed()?,
                         Err(err) if err.kind() == std::io::ErrorKind::NotFound => {
                             // File not found, delete by setting atime to unix epoch
-                            info!("Not found, mark for deletion: {}", content.key);
                             SystemTime::UNIX_EPOCH
                         }
                         Err(err) => return Err(err.into()),
-- 
2.47.3




* [pbs-devel] [PATCH proxmox-backup v3 3/8] GC: refactor atime gathering for local chunk markers with s3 backend
  2025-10-15 16:39 [pbs-devel] [PATCH proxmox{, -backup} v3 00/10] s3 store: fix chunk upload/insert and GC race condition for s3 backend Christian Ebner
                   ` (3 preceding siblings ...)
  2025-10-15 16:40 ` [pbs-devel] [PATCH proxmox-backup v3 2/8] datastore: GC: drop overly verbose info message during s3 chunk sweep Christian Ebner
@ 2025-10-15 16:40 ` Christian Ebner
  2025-10-15 16:40 ` [pbs-devel] [PATCH proxmox-backup v3 4/8] chunk store: refactor method for chunk insertion Christian Ebner
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: Christian Ebner @ 2025-10-15 16:40 UTC (permalink / raw)
  To: pbs-devel

Instead of setting the access time to the unix epoch and calculating
the atime from that, directly set the value to 0 if the marker file is
not present.

This is in preparation for also checking the upload markers introduced
later in this series, to avoid GC races with upload/insert on s3
datastores.

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
 pbs-datastore/src/datastore.rs | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/pbs-datastore/src/datastore.rs b/pbs-datastore/src/datastore.rs
index b0386421f..45e079f1a 100644
--- a/pbs-datastore/src/datastore.rs
+++ b/pbs-datastore/src/datastore.rs
@@ -1649,14 +1649,16 @@ impl DataStore {
                     // Check local markers (created or atime updated during phase1) and
                     // keep or delete chunk based on that.
                     let atime = match std::fs::metadata(&chunk_path) {
-                        Ok(stat) => stat.accessed()?,
+                        Ok(stat) => stat
+                            .accessed()?
+                            .duration_since(SystemTime::UNIX_EPOCH)?
+                            .as_secs() as i64,
                         Err(err) if err.kind() == std::io::ErrorKind::NotFound => {
                             // File not found, delete by setting atime to unix epoch
-                            SystemTime::UNIX_EPOCH
+                            0
                         }
                         Err(err) => return Err(err.into()),
                     };
-                    let atime = atime.duration_since(SystemTime::UNIX_EPOCH)?.as_secs() as i64;
 
                     let bad = chunk_path
                         .as_path()
-- 
2.47.3




* [pbs-devel] [PATCH proxmox-backup v3 4/8] chunk store: refactor method for chunk insertion
  2025-10-15 16:39 [pbs-devel] [PATCH proxmox{, -backup} v3 00/10] s3 store: fix chunk upload/insert and GC race condition for s3 backend Christian Ebner
                   ` (4 preceding siblings ...)
  2025-10-15 16:40 ` [pbs-devel] [PATCH proxmox-backup v3 3/8] GC: refactor atime gathering for local chunk markers with s3 backend Christian Ebner
@ 2025-10-15 16:40 ` Christian Ebner
  2025-10-15 16:40 ` [pbs-devel] [PATCH proxmox-backup v3 5/8] chunk store: add backend upload marker helpers for s3 backed stores Christian Ebner
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: Christian Ebner @ 2025-10-15 16:40 UTC (permalink / raw)
  To: pbs-devel

Extend the current chunk insertion implementation with a method that
takes a callback to be executed before writing the chunk file, but
after checking whether the file is pre-existing.

This is in preparation for different chunk insertion methods, with
differentiated handling of the case where the chunk file does not yet
exist.

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
 pbs-datastore/src/chunk_store.rs | 18 ++++++++++++++++--
 1 file changed, 16 insertions(+), 2 deletions(-)

diff --git a/pbs-datastore/src/chunk_store.rs b/pbs-datastore/src/chunk_store.rs
index 6e50327cb..65d74d68e 100644
--- a/pbs-datastore/src/chunk_store.rs
+++ b/pbs-datastore/src/chunk_store.rs
@@ -558,6 +558,15 @@ impl ChunkStore {
     }
 
     pub fn insert_chunk(&self, chunk: &DataBlob, digest: &[u8; 32]) -> Result<(bool, u64), Error> {
+        self.insert_chunk_impl(chunk, digest, |_, _| Ok(()))
+    }
+
+    fn insert_chunk_impl(
+        &self,
+        chunk: &DataBlob,
+        digest: &[u8; 32],
+        before_create: impl FnOnce(&[u8; 32], bool) -> Result<(), Error>,
+    ) -> Result<(bool, u64), Error> {
         // unwrap: only `None` in unit tests
         assert!(self.locker.is_some());
 
@@ -572,7 +581,7 @@ impl ChunkStore {
 
         let name = &self.name;
 
-        if let Ok(metadata) = std::fs::metadata(&chunk_path) {
+        let pre_existing = if let Ok(metadata) = std::fs::metadata(&chunk_path) {
             if !metadata.is_file() {
                 bail!("got unexpected file type on store '{name}' for chunk {digest_str}");
             }
@@ -612,7 +621,12 @@ impl ChunkStore {
             } else {
                 log::debug!("Got another copy of chunk with digest '{digest_str}', existing chunk is bigger, replacing with uploaded one.");
             }
-        }
+            true
+        } else {
+            false
+        };
+
+        before_create(digest, pre_existing)?;
 
         let chunk_dir_path = chunk_path
             .parent()
-- 
2.47.3




* [pbs-devel] [PATCH proxmox-backup v3 5/8] chunk store: add backend upload marker helpers for s3 backed stores
  2025-10-15 16:39 [pbs-devel] [PATCH proxmox{, -backup} v3 00/10] s3 store: fix chunk upload/insert and GC race condition for s3 backend Christian Ebner
                   ` (5 preceding siblings ...)
  2025-10-15 16:40 ` [pbs-devel] [PATCH proxmox-backup v3 4/8] chunk store: refactor method for chunk insertion Christian Ebner
@ 2025-10-15 16:40 ` Christian Ebner
  2025-10-15 16:40 ` [pbs-devel] [PATCH proxmox-backup v3 6/8] api: chunk upload: fix race between chunk backend upload and insert Christian Ebner
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: Christian Ebner @ 2025-10-15 16:40 UTC (permalink / raw)
  To: pbs-devel

In preparation for fixing the race window between chunk upload/insert
and garbage collection for ongoing backup jobs.

Introduce helper methods to manipulate upload marker files for
chunks, guarded by the chunk store mutex for consistency in case of
concurrent operation.
Upload marker files use the chunk's filename (its digest) extended by
the `backend-upload` extension.

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
 pbs-datastore/src/chunk_store.rs | 156 +++++++++++++++++++++++++++++--
 pbs-datastore/src/datastore.rs   |  10 ++
 src/api2/config/datastore.rs     |   2 +
 3 files changed, 160 insertions(+), 8 deletions(-)

diff --git a/pbs-datastore/src/chunk_store.rs b/pbs-datastore/src/chunk_store.rs
index 65d74d68e..2693a1c11 100644
--- a/pbs-datastore/src/chunk_store.rs
+++ b/pbs-datastore/src/chunk_store.rs
@@ -2,12 +2,12 @@ use std::os::unix::fs::MetadataExt;
 use std::os::unix::io::AsRawFd;
 use std::path::{Path, PathBuf};
 use std::sync::{Arc, Mutex};
-use std::time::Duration;
+use std::time::{Duration, SystemTime};
 
 use anyhow::{bail, format_err, Context, Error};
 use tracing::{info, warn};
 
-use pbs_api_types::{DatastoreFSyncLevel, GarbageCollectionStatus};
+use pbs_api_types::{DatastoreBackendType, DatastoreFSyncLevel, GarbageCollectionStatus};
 use proxmox_io::ReadExt;
 use proxmox_s3_client::S3Client;
 use proxmox_sys::fs::{create_dir, create_path, file_type_from_file_stat, CreateOptions};
@@ -22,6 +22,10 @@ use crate::file_formats::{
 };
 use crate::DataBlob;
 
+/// Filename extension for local datastore cache marker files indicating
+/// the chunk being uploaded to the backend
+const BACKEND_UPLOAD_MARKER_EXT: &str = "backend-upload";
+
 /// File system based chunk store
 pub struct ChunkStore {
     name: String, // used for error reporting
@@ -30,6 +34,7 @@ pub struct ChunkStore {
     mutex: Mutex<()>,
     locker: Option<Arc<Mutex<ProcessLocker>>>,
     sync_level: DatastoreFSyncLevel,
+    datastore_backend_type: DatastoreBackendType,
 }
 
 // TODO: what about sysctl setting vm.vfs_cache_pressure (0 - 100) ?
@@ -77,6 +82,7 @@ impl ChunkStore {
             mutex: Mutex::new(()),
             locker: None,
             sync_level: Default::default(),
+            datastore_backend_type: DatastoreBackendType::default(),
         }
     }
 
@@ -97,6 +103,7 @@ impl ChunkStore {
         uid: nix::unistd::Uid,
         gid: nix::unistd::Gid,
         sync_level: DatastoreFSyncLevel,
+        datastore_backend_type: DatastoreBackendType,
     ) -> Result<Self, Error>
     where
         P: Into<PathBuf>,
@@ -151,7 +158,7 @@ impl ChunkStore {
             }
         }
 
-        Self::open(name, base, sync_level)
+        Self::open(name, base, sync_level, datastore_backend_type)
     }
 
     fn lockfile_path<P: Into<PathBuf>>(base: P) -> PathBuf {
@@ -185,6 +192,7 @@ impl ChunkStore {
         name: &str,
         base: P,
         sync_level: DatastoreFSyncLevel,
+        datastore_backend_type: DatastoreBackendType,
     ) -> Result<Self, Error> {
         let base: PathBuf = base.into();
 
@@ -201,6 +209,7 @@ impl ChunkStore {
             locker: Some(locker),
             mutex: Mutex::new(()),
             sync_level,
+            datastore_backend_type,
         })
     }
 
@@ -557,10 +566,114 @@ impl ChunkStore {
         Ok(())
     }
 
+    /// Insert a new backend upload marker or update the atime if pre-existing, signaling to garbage
+    /// collection that there is an in-progress upload for this chunk.
+    ///
+    /// Returns true if the marker was created or touched, returns false if the chunk has been
+    /// inserted since, the marker file not being created and the upload must be avoided.
+    pub(crate) fn touch_backend_upload_marker(&self, digest: &[u8; 32]) -> Result<bool, Error> {
+        if self.datastore_backend_type == DatastoreBackendType::Filesystem {
+            bail!("cannot create backend upload marker, not a cache store");
+        }
+        let (marker_path, digest_str) = self.chunk_backed_upload_marker_path(digest);
+
+        let _lock = self.mutex.lock();
+
+        if self.cond_touch_chunk(digest, false)? {
+            return Ok(false);
+        }
+
+        if let Err(err) = std::fs::File::options()
+            .write(true)
+            .create_new(true)
+            .open(&marker_path)
+        {
+            if err.kind() == std::io::ErrorKind::AlreadyExists {
+                // do not rely on any `File` method implementations such as set_time,
+                // rather update atime using the same logic as for chunks.
+                self.cond_touch_path(&marker_path, true).with_context(|| {
+                    format!("failed to update access time of backend upload marker for chunk {digest_str}")
+                })?;
+            } else {
+                return Err(err).with_context(|| {
+                    format!("failed to create backend upload marker for chunk {digest_str}")
+                });
+            }
+        }
+        Ok(true)
+    }
+
+    /// Sweep the chunk marker and upload chunk marker, gathering the access timestamp.
+    ///
+    /// The backup upload marker is removed if it has not been touched within
+    /// the cutoff time.
+    /// Note: caller must hold chunk store file lock.
+    pub(crate) unsafe fn sweep_chunk_marker_files(
+        &self,
+        digest: &[u8; 32],
+        min_atime: i64,
+    ) -> Result<i64, Error> {
+        let (chunk_path, _digest_str) = self.chunk_path(digest);
+        let atime = match std::fs::metadata(&chunk_path) {
+            Ok(stat) => stat
+                .accessed()?
+                .duration_since(SystemTime::UNIX_EPOCH)?
+                .as_secs() as i64,
+            Err(err) if err.kind() == std::io::ErrorKind::NotFound => {
+                let (marker_path, _digest_str) = self.chunk_backed_upload_marker_path(digest);
+                match std::fs::metadata(&marker_path) {
+                    Ok(upload_marker_stat) => {
+                        let atime = upload_marker_stat
+                            .accessed()?
+                            .duration_since(SystemTime::UNIX_EPOCH)?
+                            .as_secs() as i64;
+                        if atime < min_atime {
+                            std::fs::remove_file(&marker_path)?
+                        }
+                        atime
+                    }
+                    Err(err) => {
+                        if err.kind() != std::io::ErrorKind::NotFound {
+                            bail!("failed to check backend upload marker: {err}");
+                        } else {
+                            0
+                        }
+                    }
+                }
+            }
+            Err(err) => return Err(err.into()),
+        };
+        Ok(atime)
+    }
+
     pub fn insert_chunk(&self, chunk: &DataBlob, digest: &[u8; 32]) -> Result<(bool, u64), Error> {
         self.insert_chunk_impl(chunk, digest, |_, _| Ok(()))
     }
 
+    /// Insert the chunk into the chunk store and remove the backend upload marker file for
+    /// datastores with non-filesystem backend.
+    pub(crate) fn insert_chunk_and_remove_upload_marker(
+        &self,
+        chunk: &DataBlob,
+        digest: &[u8; 32],
+    ) -> Result<(bool, u64), Error> {
+        self.insert_chunk_impl(chunk, digest, |digest, pre_existing| {
+            if self.datastore_backend_type != DatastoreBackendType::Filesystem {
+                let (marker_path, digest_str) = self.chunk_backed_upload_marker_path(digest);
+                if let Err(err) = std::fs::remove_file(marker_path) {
+                    if !(pre_existing && err.kind() == std::io::ErrorKind::NotFound) {
+                        bail!(
+                            "vanished upload marker file on store '{}' for {digest_str} - {err}",
+                            self.name,
+                        )
+                    }
+                }
+            }
+
+            Ok(())
+        })
+    }
+
     fn insert_chunk_impl(
         &self,
         chunk: &DataBlob,
@@ -692,6 +805,15 @@ impl ChunkStore {
         Ok(())
     }
 
+    /// Generate file path for backend upload marker of given chunk digest.
+    fn chunk_backed_upload_marker_path(&self, digest: &[u8; 32]) -> (PathBuf, String) {
+        let (chunk_path, digest_str) = self.chunk_path(digest);
+        (
+            chunk_path.with_extension(BACKEND_UPLOAD_MARKER_EXT),
+            digest_str,
+        )
+    }
+
     pub fn relative_path(&self, path: &Path) -> PathBuf {
         // unwrap: only `None` in unit tests
         assert!(self.locker.is_some());
@@ -777,14 +899,26 @@ fn test_chunk_store1() {
 
     if let Err(_e) = std::fs::remove_dir_all(".testdir") { /* ignore */ }
 
-    let chunk_store = ChunkStore::open("test", &path, DatastoreFSyncLevel::None);
+    let chunk_store = ChunkStore::open(
+        "test",
+        &path,
+        DatastoreFSyncLevel::None,
+        DatastoreBackendType::Filesystem,
+    );
     assert!(chunk_store.is_err());
 
     let user = nix::unistd::User::from_uid(nix::unistd::Uid::current())
         .unwrap()
         .unwrap();
-    let chunk_store =
-        ChunkStore::create("test", &path, user.uid, user.gid, DatastoreFSyncLevel::None).unwrap();
+    let chunk_store = ChunkStore::create(
+        "test",
+        &path,
+        user.uid,
+        user.gid,
+        DatastoreFSyncLevel::None,
+        DatastoreBackendType::Filesystem,
+    )
+    .unwrap();
 
     let (chunk, digest) = crate::data_blob::DataChunkBuilder::new(&[0u8, 1u8])
         .build()
@@ -796,8 +930,14 @@ fn test_chunk_store1() {
     let (exists, _) = chunk_store.insert_chunk(&chunk, &digest).unwrap();
     assert!(exists);
 
-    let chunk_store =
-        ChunkStore::create("test", &path, user.uid, user.gid, DatastoreFSyncLevel::None);
+    let chunk_store = ChunkStore::create(
+        "test",
+        &path,
+        user.uid,
+        user.gid,
+        DatastoreFSyncLevel::None,
+        DatastoreBackendType::Filesystem,
+    );
     assert!(chunk_store.is_err());
 
     if let Err(_e) = std::fs::remove_dir_all(".testdir") { /* ignore */ }
diff --git a/pbs-datastore/src/datastore.rs b/pbs-datastore/src/datastore.rs
index 45e079f1a..ed994eb0b 100644
--- a/pbs-datastore/src/datastore.rs
+++ b/pbs-datastore/src/datastore.rs
@@ -336,10 +336,14 @@ impl DataStore {
                 DatastoreTuning::API_SCHEMA
                     .parse_property_string(config.tuning.as_deref().unwrap_or(""))?,
             )?;
+            let backend_config: DatastoreBackendConfig =
+                config.backend.as_deref().unwrap_or("").parse()?;
+            let backend_type = backend_config.ty.unwrap_or_default();
             Arc::new(ChunkStore::open(
                 name,
                 config.absolute_path(),
                 tuning.sync_level.unwrap_or_default(),
+                backend_type,
             )?)
         };
 
@@ -424,10 +428,16 @@ impl DataStore {
             DatastoreTuning::API_SCHEMA
                 .parse_property_string(config.tuning.as_deref().unwrap_or(""))?,
         )?;
+        let backend_config: DatastoreBackendConfig = serde_json::from_value(
+            DatastoreBackendConfig::API_SCHEMA
+                .parse_property_string(config.backend.as_deref().unwrap_or(""))?,
+        )?;
+        let backend_type = backend_config.ty.unwrap_or_default();
         let chunk_store = ChunkStore::open(
             &name,
             config.absolute_path(),
             tuning.sync_level.unwrap_or_default(),
+            backend_type,
         )?;
         let inner = Arc::new(Self::with_store_and_config(
             Arc::new(chunk_store),
diff --git a/src/api2/config/datastore.rs b/src/api2/config/datastore.rs
index 3b03c0466..541bd0a04 100644
--- a/src/api2/config/datastore.rs
+++ b/src/api2/config/datastore.rs
@@ -173,6 +173,7 @@ pub(crate) fn do_create_datastore(
                 &datastore.name,
                 &path,
                 tuning.sync_level.unwrap_or_default(),
+                backend_type,
             )
         })?
     } else {
@@ -204,6 +205,7 @@ pub(crate) fn do_create_datastore(
             backup_user.uid,
             backup_user.gid,
             tuning.sync_level.unwrap_or_default(),
+            backend_type,
         )?
     };
 
-- 
2.47.3




* [pbs-devel] [PATCH proxmox-backup v3 6/8] api: chunk upload: fix race between chunk backend upload and insert
  2025-10-15 16:39 [pbs-devel] [PATCH proxmox{, -backup} v3 00/10] s3 store: fix chunk upload/insert and GC race condition for s3 backend Christian Ebner
                   ` (6 preceding siblings ...)
  2025-10-15 16:40 ` [pbs-devel] [PATCH proxmox-backup v3 5/8] chunk store: add backend upload marker helpers for s3 backed stores Christian Ebner
@ 2025-10-15 16:40 ` Christian Ebner
  2025-10-15 16:40 ` [pbs-devel] [PATCH proxmox-backup v3 7/8] api: chunk upload: fix race with garbage collection for no-cache on s3 Christian Ebner
  2025-10-15 16:40 ` [pbs-devel] [PATCH proxmox-backup v3 8/8] pull: guard chunk upload and only insert into cache after upload Christian Ebner
  9 siblings, 0 replies; 11+ messages in thread
From: Christian Ebner @ 2025-10-15 16:40 UTC (permalink / raw)
  To: pbs-devel

For S3 backed datastores, chunks are first uploaded to the object store
and only then inserted into the local datastore cache. This ensures
that the chunk is only ever considered as valid after a successful
upload. Garbage collection does, however, rely on the local marker file
being present, and only then considers the chunk as in-use. While this
marker is created if not present during phase 1 of garbage collection,
this only happens for chunks which are already referenced by a complete
index file.

Therefore, there remains a race window between garbage collection
listing the chunks (upload completed) and looking up the local marker
file (chunk cache insert after upload). This can lead to chunks which
just finished uploading, but were not yet inserted into the local cache
store, being removed again.

To close this race window, mark chunks which are currently being
uploaded to the backend with an additional marker file, which garbage
collection also checks if the regular marker is not found.
The upload marker file is only removed after a successful chunk insert,
or by garbage collection if its atime is older than the cutoff (cleanup
in case of failed uploads).

Concurrent chunk uploads are still possible, updating the upload
marker file as it is then pre-existing. The upload marker file is not
removed in case of upload errors, assuring it remains in place for
the other upload.

Avoid this overhead for regular datastores by only performing these
operations on s3 backend datastores.

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
 pbs-datastore/src/datastore.rs                | 25 +++++++++++--------
 .../src/local_datastore_lru_cache.rs          |  3 ++-
 src/api2/backup/upload_chunk.rs               |  9 ++++---
 3 files changed, 22 insertions(+), 15 deletions(-)

diff --git a/pbs-datastore/src/datastore.rs b/pbs-datastore/src/datastore.rs
index ed994eb0b..aa34ab037 100644
--- a/pbs-datastore/src/datastore.rs
+++ b/pbs-datastore/src/datastore.rs
@@ -4,7 +4,7 @@ use std::os::unix::ffi::OsStrExt;
 use std::os::unix::io::AsRawFd;
 use std::path::{Path, PathBuf};
 use std::sync::{Arc, LazyLock, Mutex};
-use std::time::{Duration, SystemTime};
+use std::time::Duration;
 
 use anyhow::{bail, format_err, Context, Error};
 use http_body_util::BodyExt;
@@ -1658,16 +1658,10 @@ impl DataStore {
 
                     // Check local markers (created or atime updated during phase1) and
                     // keep or delete chunk based on that.
-                    let atime = match std::fs::metadata(&chunk_path) {
-                        Ok(stat) => stat
-                            .accessed()?
-                            .duration_since(SystemTime::UNIX_EPOCH)?
-                            .as_secs() as i64,
-                        Err(err) if err.kind() == std::io::ErrorKind::NotFound => {
-                            // File not found, delete by setting atime to unix epoch
-                            0
-                        }
-                        Err(err) => return Err(err.into()),
+                    let atime = unsafe {
+                        self.inner
+                            .chunk_store
+                            .sweep_chunk_marker_files(&digest, min_atime)?
                     };
 
                     let bad = chunk_path
@@ -1868,6 +1862,15 @@ impl DataStore {
         self.inner.chunk_store.insert_chunk(chunk, digest)
     }
 
+    /// Insert a new backend upload marker or update atime if pre-existing, signaling to garbage
+    /// collection that there is an in-progress upload for this chunk.
+    ///
+    /// Returns true if the marker was created or touched, returns false if the chunk has been
+    /// inserted since, the marker file not being created and the upload must be avoided.
+    pub fn touch_backend_upload_marker(&self, digest: &[u8; 32]) -> Result<bool, Error> {
+        self.inner.chunk_store.touch_backend_upload_marker(digest)
+    }
+
     pub fn stat_chunk(&self, digest: &[u8; 32]) -> Result<std::fs::Metadata, Error> {
         let (chunk_path, _digest_str) = self.inner.chunk_store.chunk_path(digest);
         std::fs::metadata(chunk_path).map_err(Error::from)
diff --git a/pbs-datastore/src/local_datastore_lru_cache.rs b/pbs-datastore/src/local_datastore_lru_cache.rs
index fe3b51a55..cdad77031 100644
--- a/pbs-datastore/src/local_datastore_lru_cache.rs
+++ b/pbs-datastore/src/local_datastore_lru_cache.rs
@@ -34,7 +34,8 @@ impl LocalDatastoreLruCache {
     ///
     /// Fails if the chunk cannot be inserted successfully.
     pub fn insert(&self, digest: &[u8; 32], chunk: &DataBlob) -> Result<(), Error> {
-        self.store.insert_chunk(chunk, digest)?;
+        self.store
+            .insert_chunk_and_remove_upload_marker(chunk, digest)?;
         self.cache
             .insert(*digest, (), |digest| self.store.clear_chunk(&digest))
     }
diff --git a/src/api2/backup/upload_chunk.rs b/src/api2/backup/upload_chunk.rs
index 64e8d6e63..0640f3652 100644
--- a/src/api2/backup/upload_chunk.rs
+++ b/src/api2/backup/upload_chunk.rs
@@ -259,6 +259,7 @@ async fn upload_to_backend(
                     data.len()
                 );
             }
+            let datastore = env.datastore.clone();
 
             if env.no_cache {
                 let object_key = pbs_datastore::s3::object_key_from_digest(&digest)?;
@@ -272,11 +273,11 @@ async fn upload_to_backend(
             // Avoid re-upload to S3 if the chunk is either present in the LRU cache or the chunk
             // file exists on filesystem. The latter means that the chunk has been present in the
             // past an was not cleaned up by garbage collection, so contained in the S3 object store.
-            if env.datastore.cache_contains(&digest) {
+            if datastore.cache_contains(&digest) {
                 tracing::info!("Skip upload of cached chunk {}", hex::encode(digest));
                 return Ok((digest, size, encoded_size, true));
             }
-            if let Ok(true) = env.datastore.cond_touch_chunk(&digest, false) {
+            if let Ok(true) = datastore.cond_touch_chunk(&digest, false) {
                 tracing::info!(
                     "Skip upload of already encountered chunk {}",
                     hex::encode(digest)
@@ -286,6 +287,9 @@ async fn upload_to_backend(
 
             tracing::info!("Upload of new chunk {}", hex::encode(digest));
             let object_key = pbs_datastore::s3::object_key_from_digest(&digest)?;
+            if !datastore.touch_backend_upload_marker(&digest)? {
+                return Ok((digest, size, encoded_size, true));
+            }
             let is_duplicate = s3_client
                 .upload_replace_on_final_retry(object_key, data.clone())
                 .await
@@ -295,7 +299,6 @@ async fn upload_to_backend(
             // Although less performant than doing this in parallel, it is required for consisency
             // since chunks are considered as present on the backend if the file exists in the local
             // cache store.
-            let datastore = env.datastore.clone();
             tracing::info!("Caching of chunk {}", hex::encode(digest));
             let _ = tokio::task::spawn_blocking(move || {
                 let chunk = DataBlob::from_raw(data.to_vec())?;
-- 
2.47.3




* [pbs-devel] [PATCH proxmox-backup v3 7/8] api: chunk upload: fix race with garbage collection for no-cache on s3
  2025-10-15 16:39 [pbs-devel] [PATCH proxmox{, -backup} v3 00/10] s3 store: fix chunk upload/insert and GC race condition for s3 backend Christian Ebner
                   ` (7 preceding siblings ...)
  2025-10-15 16:40 ` [pbs-devel] [PATCH proxmox-backup v3 6/8] api: chunk upload: fix race between chunk backend upload and insert Christian Ebner
@ 2025-10-15 16:40 ` Christian Ebner
  2025-10-15 16:40 ` [pbs-devel] [PATCH proxmox-backup v3 8/8] pull: guard chunk upload and only insert into cache after upload Christian Ebner
  9 siblings, 0 replies; 11+ messages in thread
From: Christian Ebner @ 2025-10-15 16:40 UTC (permalink / raw)
  To: pbs-devel

With the no-cache flag set, chunks uploaded to the s3 backend are never
inserted into the local datastore cache. The presence of the chunk
marker file is, however, required for garbage collection to not clean
up the chunks. While the marker files are created during phase 1 of
garbage collection for indexed chunks, this is not the case for
in-progress backups with the no-cache flag set.

Therefore, mark chunks as in-progress while being uploaded just like in
the regular mode with cache, but replace the upload marker with the
zero-sized chunk marker file after the upload finished, to avoid
incorrect garbage collection cleanup.
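
A minimal standalone sketch of this step (std-only; the path layout is
illustrative and the real helper additionally handles the case where
the chunk marker already exists): after a successful upload, the
in-flight marker is renamed into a zero-sized chunk marker, so garbage
collection treats the chunk as present without caching its payload
locally.

    use std::fs;
    use std::path::Path;

    fn persist_upload_marker(chunk_path: &Path) -> std::io::Result<()> {
        let marker = chunk_path.with_extension("backend-upload");
        // the zero-sized marker created before the upload now becomes the chunk marker
        fs::rename(&marker, chunk_path)
    }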

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
 pbs-datastore/src/chunk_store.rs | 26 ++++++++++++++++++++++++++
 pbs-datastore/src/datastore.rs   |  7 +++++++
 src/api2/backup/upload_chunk.rs  |  4 ++++
 3 files changed, 37 insertions(+)

diff --git a/pbs-datastore/src/chunk_store.rs b/pbs-datastore/src/chunk_store.rs
index 2693a1c11..1e71b2970 100644
--- a/pbs-datastore/src/chunk_store.rs
+++ b/pbs-datastore/src/chunk_store.rs
@@ -646,6 +646,32 @@ impl ChunkStore {
         Ok(atime)
     }
 
+    /// Transform the backend upload marker to be a chunk marker.
+    ///
+    /// If the chunk marker is already present, its atime will be updated instead.
+    pub(crate) fn persist_backend_upload_marker(&self, digest: &[u8; 32]) -> Result<(), Error> {
+        if self.datastore_backend_type == DatastoreBackendType::Filesystem {
+            bail!("cannot create backend upload marker, not a cache store");
+        }
+        let (marker_path, _digest_str) = self.chunk_backed_upload_marker_path(digest);
+        let (chunk_path, digest_str) = self.chunk_path(digest);
+        let _lock = self.mutex.lock();
+
+        if let Err(err) = std::fs::rename(&marker_path, chunk_path) {
+            // Assert the chunk has been inserted and it is therefore safe to cleanup
+            // the upload marker nevertheless.
+            if self.cond_touch_chunk(digest, false)? {
+                std::fs::remove_file(&marker_path)?;
+                return Ok(());
+            }
+
+            return Err(format_err!(
+                "persisting backup upload marker failed for {digest_str} - {err}"
+            ));
+        }
+        Ok(())
+    }
+
     pub fn insert_chunk(&self, chunk: &DataBlob, digest: &[u8; 32]) -> Result<(bool, u64), Error> {
         self.insert_chunk_impl(chunk, digest, |_, _| Ok(()))
     }
diff --git a/pbs-datastore/src/datastore.rs b/pbs-datastore/src/datastore.rs
index aa34ab037..69c87c336 100644
--- a/pbs-datastore/src/datastore.rs
+++ b/pbs-datastore/src/datastore.rs
@@ -1871,6 +1871,13 @@ impl DataStore {
         self.inner.chunk_store.touch_backend_upload_marker(digest)
     }
 
+    /// Persist the backend upload marker to be a zero size chunk marker.
+    ///
+    /// Marks the chunk as present in the local store cache without inserting its payload.
+    pub fn persist_backend_upload_marker(&self, digest: &[u8; 32]) -> Result<(), Error> {
+        self.inner.chunk_store.persist_backend_upload_marker(digest)
+    }
+
     pub fn stat_chunk(&self, digest: &[u8; 32]) -> Result<std::fs::Metadata, Error> {
         let (chunk_path, _digest_str) = self.inner.chunk_store.chunk_path(digest);
         std::fs::metadata(chunk_path).map_err(Error::from)
diff --git a/src/api2/backup/upload_chunk.rs b/src/api2/backup/upload_chunk.rs
index 0640f3652..bc64054a8 100644
--- a/src/api2/backup/upload_chunk.rs
+++ b/src/api2/backup/upload_chunk.rs
@@ -263,10 +263,14 @@ async fn upload_to_backend(
 
             if env.no_cache {
                 let object_key = pbs_datastore::s3::object_key_from_digest(&digest)?;
+                if !datastore.touch_backend_upload_marker(&digest)? {
+                    return Ok((digest, size, encoded_size, true));
+                }
                 let is_duplicate = s3_client
                     .upload_replace_on_final_retry(object_key, data)
                     .await
                     .map_err(|err| format_err!("failed to upload chunk to s3 backend - {err:#}"))?;
+                datastore.persist_backend_upload_marker(&digest)?;
                 return Ok((digest, size, encoded_size, is_duplicate));
             }
 
-- 
2.47.3




* [pbs-devel] [PATCH proxmox-backup v3 8/8] pull: guard chunk upload and only insert into cache after upload
  2025-10-15 16:39 [pbs-devel] [PATCH proxmox{, -backup} v3 00/10] s3 store: fix chunk upload/insert and GC race condition for s3 backend Christian Ebner
                   ` (8 preceding siblings ...)
  2025-10-15 16:40 ` [pbs-devel] [PATCH proxmox-backup v3 7/8] api: chunk upload: fix race with garbage collection for no-cache on s3 Christian Ebner
@ 2025-10-15 16:40 ` Christian Ebner
  9 siblings, 0 replies; 11+ messages in thread
From: Christian Ebner @ 2025-10-15 16:40 UTC (permalink / raw)
  To: pbs-devel

Inserting the chunk into the local datastore cache leads to it being
considered as present on the backend. The upload might, however, still
fail, leading to missing chunks.

Therefore, only insert the chunk after the upload has completed
successfully, and guard the upload with the backend upload marker file
to avoid races with garbage collection.

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
 src/server/pull.rs | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/src/server/pull.rs b/src/server/pull.rs
index c0b6fef7c..0bc619c68 100644
--- a/src/server/pull.rs
+++ b/src/server/pull.rs
@@ -176,14 +176,17 @@ async fn pull_index_chunks<I: IndexFile>(
                     if target2.cache_contains(&digest) {
                         return Ok(());
                     }
-                    target2.cache_insert(&digest, &chunk)?;
                     let data = chunk.raw_data().to_vec();
                     let upload_data = hyper::body::Bytes::from(data);
                     let object_key = pbs_datastore::s3::object_key_from_digest(&digest)?;
+                    if !target2.touch_backend_upload_marker(&digest)? {
+                        return Ok(());
+                    }
                     let _is_duplicate = proxmox_async::runtime::block_on(
                         s3_client.upload_replace_on_final_retry(object_key, upload_data),
                     )
                     .context("failed to upload chunk to s3 backend")?;
+                    target2.cache_insert(&digest, &chunk)?;
                 }
             }
             Ok(())
-- 
2.47.3




end of thread, other threads:[~2025-10-15 16:40 UTC | newest]

Thread overview: 11+ messages
2025-10-15 16:39 [pbs-devel] [PATCH proxmox{, -backup} v3 00/10] s3 store: fix chunk upload/insert and GC race condition for s3 backend Christian Ebner
2025-10-15 16:39 ` [pbs-devel] [PATCH proxmox v3 1/2] s3-client: add exponential backoff time for upload retries Christian Ebner
2025-10-15 16:40 ` [pbs-devel] [PATCH proxmox v3 2/2] s3-client: add helper method to force final unconditional upload on Christian Ebner
2025-10-15 16:40 ` [pbs-devel] [PATCH proxmox-backup v3 1/8] api/pull: avoid failing on concurrent conditional chunk uploads Christian Ebner
2025-10-15 16:40 ` [pbs-devel] [PATCH proxmox-backup v3 2/8] datastore: GC: drop overly verbose info message during s3 chunk sweep Christian Ebner
2025-10-15 16:40 ` [pbs-devel] [PATCH proxmox-backup v3 3/8] GC: refactor atime gathering for local chunk markers with s3 backend Christian Ebner
2025-10-15 16:40 ` [pbs-devel] [PATCH proxmox-backup v3 4/8] chunk store: refactor method for chunk insertion Christian Ebner
2025-10-15 16:40 ` [pbs-devel] [PATCH proxmox-backup v3 5/8] chunk store: add backend upload marker helpers for s3 backed stores Christian Ebner
2025-10-15 16:40 ` [pbs-devel] [PATCH proxmox-backup v3 6/8] api: chunk upload: fix race between chunk backend upload and insert Christian Ebner
2025-10-15 16:40 ` [pbs-devel] [PATCH proxmox-backup v3 7/8] api: chunk upload: fix race with garbage collection for no-cache on s3 Christian Ebner
2025-10-15 16:40 ` [pbs-devel] [PATCH proxmox-backup v3 8/8] pull: guard chunk upload and only insert into cache after upload Christian Ebner
