From: Christian Ebner <c.ebner@proxmox.com>
To: pbs-devel@lists.proxmox.com
Subject: [pbs-devel] [PATCH proxmox-backup v3 6/8] api: chunk upload: fix race between chunk backend upload and insert
Date: Wed, 15 Oct 2025 18:40:06 +0200
Message-ID: <20251015164008.975591-9-c.ebner@proxmox.com>
In-Reply-To: <20251015164008.975591-1-c.ebner@proxmox.com>
Chunks are first uploaded to the object store for S3 backed
datastores and only then inserted into the local datastore cache.
This assures that a chunk is only ever considered valid after a
successful upload. Garbage collection, however, relies on the local
marker file being present to consider a chunk as in-use. While phase 1
of garbage collection creates this marker if it is not present, it
does so only for chunks which are already referenced by a complete
index file.
This leaves a race window between garbage collection listing a chunk
on the backend (its upload having completed) and checking for the
local marker file (which is only created by the cache insert following
the upload). Garbage collection may therefore list a chunk right after
its upload finished, find no local marker because the cache insert has
not happened yet, and remove the chunk from the backend again.
To close this race window, mark chunks which are currently being
uploaded to the backend with an additional upload marker file, which
garbage collection checks as well if the regular marker is not found.
The upload marker file is only removed after a successful chunk
insert, or by garbage collection if its atime is older than the
cutoff (cleanup in case of failed uploads).
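
A minimal sketch of that sweep decision, assuming an illustrative
atime_secs() helper and a "<digest>.upload" marker path scheme (the
actual helper, sweep_chunk_marker_files(), is introduced earlier in
this series and additionally holds the chunk store lock):

    use std::io::ErrorKind;
    use std::path::Path;
    use std::time::SystemTime;

    use anyhow::Error;

    fn atime_secs(stat: &std::fs::Metadata) -> Result<i64, Error> {
        Ok(stat
            .accessed()?
            .duration_since(SystemTime::UNIX_EPOCH)?
            .as_secs() as i64)
    }

    // Returns the atime GC compares against the cutoff: the chunk file's
    // atime if present, otherwise the upload marker's atime, and the unix
    // epoch (forcing deletion) if neither exists or the marker is stale.
    fn sweep_atime(chunk_path: &Path, min_atime: i64) -> Result<i64, Error> {
        match std::fs::metadata(chunk_path) {
            Ok(stat) => return atime_secs(&stat),
            Err(err) if err.kind() != ErrorKind::NotFound => return Err(err.into()),
            Err(_) => (), // no regular marker, fall through to upload marker
        }
        let marker_path = chunk_path.with_extension("upload");
        match std::fs::metadata(&marker_path) {
            Ok(stat) => {
                let atime = atime_secs(&stat)?;
                if atime < min_atime {
                    // Stale marker from a failed upload: clean it up and
                    // report the epoch so the chunk gets deleted.
                    std::fs::remove_file(&marker_path)?;
                    return Ok(0);
                }
                // Upload still in flight: keep the chunk.
                Ok(atime)
            }
            // Neither marker exists: delete by reporting the unix epoch.
            Err(err) if err.kind() == ErrorKind::NotFound => Ok(0),
            Err(err) => Err(err.into()),
        }
    }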
Concurrent chunk uploads are still possible: they simply update the
atime of the then pre-existing upload marker file. The marker is
deliberately not removed on upload errors, assuring it remains in
place for any other in-flight upload.
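
The create-or-touch semantics can be sketched as follows (illustrative
only: the real touch_backend_upload_marker() helper lives on the chunk
store and holds its mutex, which closes the gap between the existence
check and the marker creation; the filetime crate stands in here for
the actual atime update):

    use std::io::ErrorKind;
    use std::path::Path;

    use anyhow::{format_err, Error};
    use filetime::{set_file_atime, FileTime};

    // Returns Ok(true) if the marker was created or touched, Ok(false) if
    // the chunk file already exists and the upload can be skipped.
    fn touch_upload_marker(chunk_path: &Path) -> Result<bool, Error> {
        // Chunk was inserted since the caller last checked: no marker needed.
        if chunk_path.exists() {
            return Ok(false);
        }
        let marker_path = chunk_path.with_extension("upload");
        match std::fs::OpenOptions::new()
            .write(true)
            .create_new(true)
            .open(&marker_path)
        {
            // First upload of this chunk: marker freshly created.
            Ok(_) => Ok(true),
            // Marker pre-exists, so a concurrent upload is in flight: only
            // update the atime so GC keeps considering it as in-progress.
            Err(err) if err.kind() == ErrorKind::AlreadyExists => {
                set_file_atime(&marker_path, FileTime::now())?;
                Ok(true)
            }
            Err(err) => Err(format_err!("failed to create upload marker - {err}")),
        }
    }

On upload errors the caller simply leaves the marker untouched, so a
concurrent upload (or the stale-marker cleanup in garbage collection)
remains covered.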
Avoid this overhead for regular datastores by only performing these
operations for datastores with an S3 backend.
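
For completeness, the insert side under the same assumptions (the real
insert_chunk_and_remove_upload_marker() helper does this under the
chunk store lock, after the chunk file has been written):

    use std::io::ErrorKind;
    use std::path::Path;

    use anyhow::Error;

    // Once the chunk file itself is persisted, the upload marker is
    // obsolete and removed; this sketch tolerates a missing marker.
    fn remove_upload_marker(chunk_path: &Path) -> Result<(), Error> {
        match std::fs::remove_file(chunk_path.with_extension("upload")) {
            Ok(()) => Ok(()),
            Err(err) if err.kind() == ErrorKind::NotFound => Ok(()),
            Err(err) => Err(err.into()),
        }
    }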
Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
pbs-datastore/src/datastore.rs | 25 +++++++++++--------
.../src/local_datastore_lru_cache.rs | 3 ++-
src/api2/backup/upload_chunk.rs | 9 ++++---
3 files changed, 22 insertions(+), 15 deletions(-)
diff --git a/pbs-datastore/src/datastore.rs b/pbs-datastore/src/datastore.rs
index ed994eb0b..aa34ab037 100644
--- a/pbs-datastore/src/datastore.rs
+++ b/pbs-datastore/src/datastore.rs
@@ -4,7 +4,7 @@ use std::os::unix::ffi::OsStrExt;
use std::os::unix::io::AsRawFd;
use std::path::{Path, PathBuf};
use std::sync::{Arc, LazyLock, Mutex};
-use std::time::{Duration, SystemTime};
+use std::time::Duration;
use anyhow::{bail, format_err, Context, Error};
use http_body_util::BodyExt;
@@ -1658,16 +1658,10 @@ impl DataStore {
// Check local markers (created or atime updated during phase1) and
// keep or delete chunk based on that.
- let atime = match std::fs::metadata(&chunk_path) {
- Ok(stat) => stat
- .accessed()?
- .duration_since(SystemTime::UNIX_EPOCH)?
- .as_secs() as i64,
- Err(err) if err.kind() == std::io::ErrorKind::NotFound => {
- // File not found, delete by setting atime to unix epoch
- 0
- }
- Err(err) => return Err(err.into()),
+ let atime = unsafe {
+ self.inner
+ .chunk_store
+ .sweep_chunk_marker_files(&digest, min_atime)?
};
let bad = chunk_path
@@ -1868,6 +1862,15 @@ impl DataStore {
self.inner.chunk_store.insert_chunk(chunk, digest)
}
+ /// Insert a new backend upload marker or update its atime if pre-existing, signaling to garbage
+ /// collection that there is an in-progress upload for this chunk.
+ ///
+ /// Returns true if the marker was created or touched. Returns false if the chunk has been
+ /// inserted in the meantime; then no marker is created and the upload must be skipped.
+ pub fn touch_backend_upload_marker(&self, digest: &[u8; 32]) -> Result<bool, Error> {
+ self.inner.chunk_store.touch_backend_upload_marker(digest)
+ }
+
pub fn stat_chunk(&self, digest: &[u8; 32]) -> Result<std::fs::Metadata, Error> {
let (chunk_path, _digest_str) = self.inner.chunk_store.chunk_path(digest);
std::fs::metadata(chunk_path).map_err(Error::from)
diff --git a/pbs-datastore/src/local_datastore_lru_cache.rs b/pbs-datastore/src/local_datastore_lru_cache.rs
index fe3b51a55..cdad77031 100644
--- a/pbs-datastore/src/local_datastore_lru_cache.rs
+++ b/pbs-datastore/src/local_datastore_lru_cache.rs
@@ -34,7 +34,8 @@ impl LocalDatastoreLruCache {
///
/// Fails if the chunk cannot be inserted successfully.
pub fn insert(&self, digest: &[u8; 32], chunk: &DataBlob) -> Result<(), Error> {
- self.store.insert_chunk(chunk, digest)?;
+ self.store
+ .insert_chunk_and_remove_upload_marker(chunk, digest)?;
self.cache
.insert(*digest, (), |digest| self.store.clear_chunk(&digest))
}
diff --git a/src/api2/backup/upload_chunk.rs b/src/api2/backup/upload_chunk.rs
index 64e8d6e63..0640f3652 100644
--- a/src/api2/backup/upload_chunk.rs
+++ b/src/api2/backup/upload_chunk.rs
@@ -259,6 +259,7 @@ async fn upload_to_backend(
data.len()
);
}
+ let datastore = env.datastore.clone();
if env.no_cache {
let object_key = pbs_datastore::s3::object_key_from_digest(&digest)?;
@@ -272,11 +273,11 @@ async fn upload_to_backend(
// Avoid re-upload to S3 if the chunk is either present in the LRU cache or the chunk
// file exists on filesystem. The latter means that the chunk has been present in the
// past and was not cleaned up by garbage collection, so contained in the S3 object store.
- if env.datastore.cache_contains(&digest) {
+ if datastore.cache_contains(&digest) {
tracing::info!("Skip upload of cached chunk {}", hex::encode(digest));
return Ok((digest, size, encoded_size, true));
}
- if let Ok(true) = env.datastore.cond_touch_chunk(&digest, false) {
+ if let Ok(true) = datastore.cond_touch_chunk(&digest, false) {
tracing::info!(
"Skip upload of already encountered chunk {}",
hex::encode(digest)
@@ -286,6 +287,9 @@ async fn upload_to_backend(
tracing::info!("Upload of new chunk {}", hex::encode(digest));
let object_key = pbs_datastore::s3::object_key_from_digest(&digest)?;
+ if !datastore.touch_backend_upload_marker(&digest)? {
+ return Ok((digest, size, encoded_size, true));
+ }
let is_duplicate = s3_client
.upload_replace_on_final_retry(object_key, data.clone())
.await
@@ -295,7 +299,6 @@ async fn upload_to_backend(
// Although less performant than doing this in parallel, it is required for consistency
// since chunks are considered as present on the backend if the file exists in the local
// cache store.
- let datastore = env.datastore.clone();
tracing::info!("Caching of chunk {}", hex::encode(digest));
let _ = tokio::task::spawn_blocking(move || {
let chunk = DataBlob::from_raw(data.to_vec())?;
--
2.47.3