* [pbs-devel] [PATCH proxmox-backup 0/7] s3 store: fix issues with chunk s3 backend upload and cache eviction
@ 2025-10-06 10:41 Christian Ebner
  2025-10-06 10:41 ` [pbs-devel] [PATCH proxmox-backup 1/7] datastore: gc: inline single callsite method Christian Ebner
                   ` (8 more replies)
  0 siblings, 9 replies; 19+ messages in thread
From: Christian Ebner @ 2025-10-06 10:41 UTC (permalink / raw)
  To: pbs-devel

These patches fix 2 issues with the current s3 backend implementation
and reduce code duplication.

Patches 1 to 3 rework garbage collection by deduplicating the common atime
check and garbage collection status update logic and by refactoring the chunk
removal from the local datastore cache for s3 backends.

Patch 4 fixes an issue which could lead to backup restores failing during
concurrent backups: if the local datastore cache was small, chunks were
evicted from the cache by truncating their contents while a chunk reader
still accessed them. This is now avoided by replacing the chunk file
instead, so the contents remain accessible to readers via the already
opened file handle.
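
For illustration, a minimal standalone sketch of the file semantics this
relies on (not part of the series, file names made up): a reader that already
holds an open handle keeps seeing the original contents when the path is
replaced, whereas truncating in place also empties what the open handle sees.

    use std::fs::{self, File};
    use std::io::Read;

    fn main() -> std::io::Result<()> {
        fs::write("chunk.bin", b"chunk payload")?;
        // a restore has already opened the chunk for reading
        let mut reader = File::open("chunk.bin")?;

        // evict by replacing the path: create an empty file and rename it over
        // the chunk; the reader's handle still refers to the old inode
        File::create("chunk.bin.empty")?;
        fs::rename("chunk.bin.empty", "chunk.bin")?;

        let mut buf = Vec::new();
        reader.read_to_end(&mut buf)?;
        assert_eq!(buf, b"chunk payload".to_vec()); // old contents still readable

        // evicting by truncating in place (the previous behaviour), e.g. via
        // nix::unistd::truncate("chunk.bin", 0), would instead shrink the very
        // inode the reader has open, so it would read incomplete data
        Ok(())
    }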

The remaining patches fix a possible race condition between s3 backend upload
and garbage collection, which can result in chunk loss. If garbage collection
lists an uploaded chunk and checks for its in-use marker just before that
marker is written by the cache insert following the upload, it can incorrectly
delete the chunk. This is avoided by an additional chunk marker file, created
before the upload starts and removed after the cache insert, which garbage
collection checks as well, assuring that such chunks are not removed.
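
Condensed from patches 5 and 7, the guarded upload path for a cached s3
datastore then looks roughly as follows (schematic sketch only, the wrapper
function is made up and error handling is trimmed):

    use anyhow::{bail, Error};
    use pbs_datastore::{DataBlob, DataStore};
    use proxmox_s3_client::S3Client;

    async fn upload_chunk_guarded(
        datastore: &DataStore,
        s3_client: &S3Client,
        digest: [u8; 32],
        chunk: DataBlob,
    ) -> Result<(), Error> {
        // 1. create the `<digest>.backend-upload` marker, so garbage collection
        //    treats the chunk as in use even before the cache insert happened
        datastore.insert_backend_upload_marker(&digest)?;

        // 2. upload to the s3 backend, removing the marker again on failure
        let object_key = pbs_datastore::s3::object_key_from_digest(&digest)?;
        if let Err(err) = s3_client
            .upload_no_replace_with_retry(object_key, chunk.raw_data().to_vec().into())
            .await
        {
            datastore.cleanup_backend_upload_marker(&digest)?;
            bail!("failed to upload chunk to s3 backend - {err:#}");
        }

        // 3. only now insert into the local cache; the insert also removes the
        //    upload marker, leaving the regular chunk file as in-use marker
        datastore.cache_insert(&digest, &chunk)?;
        Ok(())
    }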

proxmox-backup:

Christian Ebner (7):
  datastore: gc: inline single callsite method
  gc: chunk store: rework atime check and gc status into common helper
  chunk store: add and use method to remove chunks
  chunk store: fix: replace evicted cache chunks instead of truncate
  api: chunk upload: fix race between chunk backend upload and insert
  api: chunk upload: fix race with garbage collection for no-cache on s3
  pull: guard chunk upload and only insert into cache after upload

 pbs-datastore/src/chunk_store.rs              | 211 +++++++++++++++---
 pbs-datastore/src/datastore.rs                | 155 +++++++------
 .../src/local_datastore_lru_cache.rs          |  32 +--
 src/api2/backup/upload_chunk.rs               |  29 ++-
 src/api2/config/datastore.rs                  |   2 +
 src/server/pull.rs                            |  14 +-
 6 files changed, 299 insertions(+), 144 deletions(-)


Summary over all repositories:
  6 files changed, 299 insertions(+), 144 deletions(-)

-- 
Generated by git-murpp 0.8.1



* [pbs-devel] [PATCH proxmox-backup 1/7] datastore: gc: inline single callsite method
  2025-10-06 10:41 [pbs-devel] [PATCH proxmox-backup 0/7] s3 store: fix issues with chunk s3 backend upload and cache eviction Christian Ebner
@ 2025-10-06 10:41 ` Christian Ebner
  2025-10-06 10:41 ` [pbs-devel] [PATCH proxmox-backup 2/7] gc: chunk store: rework atime check and gc status into common helper Christian Ebner
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 19+ messages in thread
From: Christian Ebner @ 2025-10-06 10:41 UTC (permalink / raw)
  To: pbs-devel

This method only has a single callsite, and its logic shares common code
with `ChunkStore::sweep_unused_chunks` which is better deduplicated.

Therefore, inline the method in preparation for the following
code deduplication.

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
 pbs-datastore/src/datastore.rs | 127 +++++++++++++--------------------
 1 file changed, 49 insertions(+), 78 deletions(-)

diff --git a/pbs-datastore/src/datastore.rs b/pbs-datastore/src/datastore.rs
index 2e62590f9..c2a82b8b8 100644
--- a/pbs-datastore/src/datastore.rs
+++ b/pbs-datastore/src/datastore.rs
@@ -1653,21 +1653,56 @@ impl DataStore {
                 let lock = self.inner.chunk_store.mutex().lock().unwrap();
 
                 for content in list_bucket_result.contents {
-                    if self
-                        .mark_chunk_for_object_key(
-                            &content.key,
-                            content.size,
-                            min_atime,
-                            oldest_writer,
-                            &mut delete_list,
-                            &mut gc_status,
-                        )
-                        .with_context(|| {
-                            format!("failed to mark chunk for object key {}", content.key)
-                        })?
-                    {
-                        chunk_count += 1;
+                    let (chunk_path, digest) = match self.chunk_path_from_object_key(&content.key) {
+                        Some(path) => path,
+                        None => continue,
+                    };
+
+                    // Check local markers (created or atime updated during phase1) and
+                    // keep or delete chunk based on that.
+                    let atime = match std::fs::metadata(&chunk_path) {
+                        Ok(stat) => stat.accessed()?,
+                        Err(err) if err.kind() == std::io::ErrorKind::NotFound => {
+                            // File not found, delete by setting atime to unix epoch
+                            info!("Not found, mark for deletion: {}", content.key);
+                            SystemTime::UNIX_EPOCH
+                        }
+                        Err(err) => return Err(err.into()),
+                    };
+                    let atime = atime.duration_since(SystemTime::UNIX_EPOCH)?.as_secs() as i64;
+
+                    let bad = chunk_path
+                        .as_path()
+                        .extension()
+                        .is_some_and(|ext| ext == "bad");
+
+                    if atime < min_atime {
+                        if let Some(cache) = self.cache() {
+                            // ignore errors, phase 3 will retry cleanup anyways
+                            let _ = cache.remove(&digest);
+                        }
+                        delete_list.push(content.key.clone());
+                        if bad {
+                            gc_status.removed_bad += 1;
+                        } else {
+                            gc_status.removed_chunks += 1;
+                        }
+                        gc_status.removed_bytes += content.size;
+                    } else if atime < oldest_writer {
+                        if bad {
+                            gc_status.still_bad += 1;
+                        } else {
+                            gc_status.pending_chunks += 1;
+                        }
+                        gc_status.pending_bytes += content.size;
+                    } else {
+                        if !bad {
+                            gc_status.disk_chunks += 1;
+                        }
+                        gc_status.disk_bytes += content.size;
                     }
+
+                    chunk_count += 1;
                 }
 
                 drop(lock);
@@ -1796,70 +1831,6 @@ impl DataStore {
         Ok(())
     }
 
-    // Mark the chunk marker in the local cache store for the given object key as in use
-    // by updating it's atime.
-    // Returns Ok(true) if the chunk was updated and Ok(false) if the object was not a chunk.
-    fn mark_chunk_for_object_key(
-        &self,
-        object_key: &S3ObjectKey,
-        size: u64,
-        min_atime: i64,
-        oldest_writer: i64,
-        delete_list: &mut Vec<S3ObjectKey>,
-        gc_status: &mut GarbageCollectionStatus,
-    ) -> Result<bool, Error> {
-        let (chunk_path, digest) = match self.chunk_path_from_object_key(object_key) {
-            Some(path) => path,
-            None => return Ok(false),
-        };
-
-        // Check local markers (created or atime updated during phase1) and
-        // keep or delete chunk based on that.
-        let atime = match std::fs::metadata(&chunk_path) {
-            Ok(stat) => stat.accessed()?,
-            Err(err) if err.kind() == std::io::ErrorKind::NotFound => {
-                // File not found, delete by setting atime to unix epoch
-                info!("Not found, mark for deletion: {object_key}");
-                SystemTime::UNIX_EPOCH
-            }
-            Err(err) => return Err(err.into()),
-        };
-        let atime = atime.duration_since(SystemTime::UNIX_EPOCH)?.as_secs() as i64;
-
-        let bad = chunk_path
-            .as_path()
-            .extension()
-            .is_some_and(|ext| ext == "bad");
-
-        if atime < min_atime {
-            if let Some(cache) = self.cache() {
-                // ignore errors, phase 3 will retry cleanup anyways
-                let _ = cache.remove(&digest);
-            }
-            delete_list.push(object_key.clone());
-            if bad {
-                gc_status.removed_bad += 1;
-            } else {
-                gc_status.removed_chunks += 1;
-            }
-            gc_status.removed_bytes += size;
-        } else if atime < oldest_writer {
-            if bad {
-                gc_status.still_bad += 1;
-            } else {
-                gc_status.pending_chunks += 1;
-            }
-            gc_status.pending_bytes += size;
-        } else {
-            if !bad {
-                gc_status.disk_chunks += 1;
-            }
-            gc_status.disk_bytes += size;
-        }
-
-        Ok(true)
-    }
-
     // Check and generate a chunk path from given object key
     fn chunk_path_from_object_key(&self, object_key: &S3ObjectKey) -> Option<(PathBuf, [u8; 32])> {
         // Check object is actually a chunk
-- 
2.47.3




* [pbs-devel] [PATCH proxmox-backup 2/7] gc: chunk store: rework atime check and gc status into common helper
  2025-10-06 10:41 [pbs-devel] [PATCH proxmox-backup 0/7] s3 store: fix issues with chunk s3 backend upload and cache eviction Christian Ebner
  2025-10-06 10:41 ` [pbs-devel] [PATCH proxmox-backup 1/7] datastore: gc: inline single callsite method Christian Ebner
@ 2025-10-06 10:41 ` Christian Ebner
  2025-10-06 13:14   ` Fabian Grünbichler
  2025-10-06 10:41 ` [pbs-devel] [PATCH proxmox-backup 3/7] chunk store: add and use method to remove chunks Christian Ebner
                   ` (6 subsequent siblings)
  8 siblings, 1 reply; 19+ messages in thread
From: Christian Ebner @ 2025-10-06 10:41 UTC (permalink / raw)
  To: pbs-devel

Move the code paths shared by both the filesystem and the s3 backend into
a common helper to avoid code duplication and adapt the callsites
accordingly.

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
 pbs-datastore/src/chunk_store.rs | 69 ++++++++++++++++++++++----------
 pbs-datastore/src/datastore.rs   | 29 +++++---------
 2 files changed, 57 insertions(+), 41 deletions(-)

diff --git a/pbs-datastore/src/chunk_store.rs b/pbs-datastore/src/chunk_store.rs
index 3c59612bb..0725ca3a7 100644
--- a/pbs-datastore/src/chunk_store.rs
+++ b/pbs-datastore/src/chunk_store.rs
@@ -408,36 +408,27 @@ impl ChunkStore {
 
                 chunk_count += 1;
 
-                if stat.st_atime < min_atime {
-                    //let age = now - stat.st_atime;
-                    //println!("UNLINK {}  {:?}", age/(3600*24), filename);
+                if Self::check_atime_and_update_gc_status(
+                    stat.st_atime,
+                    min_atime,
+                    oldest_writer,
+                    stat.st_size as u64,
+                    bad,
+                    status,
+                ) {
                     if let Err(err) = unlinkat(Some(dirfd), filename, UnlinkatFlags::NoRemoveDir) {
                         if bad {
+                            status.removed_bad -= 1;
                             status.still_bad += 1;
+                        } else {
+                            status.removed_chunks += 1;
                         }
+                        status.removed_bytes -= stat.st_size as u64;
                         bail!(
                             "unlinking chunk {filename:?} failed on store '{}' - {err}",
                             self.name,
                         );
                     }
-                    if bad {
-                        status.removed_bad += 1;
-                    } else {
-                        status.removed_chunks += 1;
-                    }
-                    status.removed_bytes += stat.st_size as u64;
-                } else if stat.st_atime < oldest_writer {
-                    if bad {
-                        status.still_bad += 1;
-                    } else {
-                        status.pending_chunks += 1;
-                    }
-                    status.pending_bytes += stat.st_size as u64;
-                } else {
-                    if !bad {
-                        status.disk_chunks += 1;
-                    }
-                    status.disk_bytes += stat.st_size as u64;
                 }
             }
             drop(lock);
@@ -446,6 +437,42 @@ impl ChunkStore {
         Ok(())
     }
 
+    /// Check within what range the provided chunks atime falls and update the garbage collection
+    /// status accordingly.
+    ///
+    /// Returns true if the chunk file should be removed.
+    pub(super) fn check_atime_and_update_gc_status(
+        atime: i64,
+        min_atime: i64,
+        oldest_writer: i64,
+        size: u64,
+        bad: bool,
+        gc_status: &mut GarbageCollectionStatus,
+    ) -> bool {
+        if atime < min_atime {
+            if bad {
+                gc_status.removed_bad += 1;
+            } else {
+                gc_status.removed_chunks += 1;
+            }
+            gc_status.removed_bytes += size;
+            return true;
+        } else if atime < oldest_writer {
+            if bad {
+                gc_status.still_bad += 1;
+            } else {
+                gc_status.pending_chunks += 1;
+            }
+            gc_status.pending_bytes += size;
+        } else {
+            if !bad {
+                gc_status.disk_chunks += 1;
+            }
+            gc_status.disk_bytes += size;
+        }
+        false
+    }
+
     /// Check if atime updates are honored by the filesystem backing the chunk store.
     ///
     /// Checks if the atime is always updated by utimensat taking into consideration the Linux
diff --git a/pbs-datastore/src/datastore.rs b/pbs-datastore/src/datastore.rs
index c2a82b8b8..e36af68fc 100644
--- a/pbs-datastore/src/datastore.rs
+++ b/pbs-datastore/src/datastore.rs
@@ -1676,30 +1676,19 @@ impl DataStore {
                         .extension()
                         .is_some_and(|ext| ext == "bad");
 
-                    if atime < min_atime {
+                    if ChunkStore::check_atime_and_update_gc_status(
+                        atime,
+                        min_atime,
+                        oldest_writer,
+                        content.size,
+                        bad,
+                        &mut gc_status,
+                    ) {
                         if let Some(cache) = self.cache() {
                             // ignore errors, phase 3 will retry cleanup anyways
                             let _ = cache.remove(&digest);
                         }
-                        delete_list.push(content.key.clone());
-                        if bad {
-                            gc_status.removed_bad += 1;
-                        } else {
-                            gc_status.removed_chunks += 1;
-                        }
-                        gc_status.removed_bytes += content.size;
-                    } else if atime < oldest_writer {
-                        if bad {
-                            gc_status.still_bad += 1;
-                        } else {
-                            gc_status.pending_chunks += 1;
-                        }
-                        gc_status.pending_bytes += content.size;
-                    } else {
-                        if !bad {
-                            gc_status.disk_chunks += 1;
-                        }
-                        gc_status.disk_bytes += content.size;
+                        delete_list.push(content.key);
                     }
 
                     chunk_count += 1;
-- 
2.47.3




* [pbs-devel] [PATCH proxmox-backup 3/7] chunk store: add and use method to remove chunks
  2025-10-06 10:41 [pbs-devel] [PATCH proxmox-backup 0/7] s3 store: fix issues with chunk s3 backend upload and cache eviction Christian Ebner
  2025-10-06 10:41 ` [pbs-devel] [PATCH proxmox-backup 1/7] datastore: gc: inline single callsite method Christian Ebner
  2025-10-06 10:41 ` [pbs-devel] [PATCH proxmox-backup 2/7] gc: chunk store: rework atime check and gc status into common helper Christian Ebner
@ 2025-10-06 10:41 ` Christian Ebner
  2025-10-06 13:17   ` Fabian Grünbichler
  2025-10-06 10:41 ` [pbs-devel] [PATCH proxmox-backup 4/7] chunk store: fix: replace evicted cache chunks instead of truncate Christian Ebner
                   ` (5 subsequent siblings)
  8 siblings, 1 reply; 19+ messages in thread
From: Christian Ebner @ 2025-10-06 10:41 UTC (permalink / raw)
  To: pbs-devel

Rework the removal of cached chunks during phase 2 of garbage
collection for datastores backed by s3.

Move the actual chunk removal logic into a method of the chunk store
and require the mutex guard to be passed as shared reference,
signaling that the caller holds the store lock as required to avoid
races with chunk insert.
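
As a small standalone illustration of the guard pattern (names made up):
taking a `&MutexGuard` parameter means the method simply cannot be called
unless the caller currently holds a lock.

    use std::sync::{Mutex, MutexGuard};

    struct Store {
        mutex: Mutex<()>,
    }

    impl Store {
        // `_guard` is never used, it only serves as compile-time evidence that
        // the caller holds a lock on a `Mutex<()>` (by convention, this store's
        // mutex) for the duration of the call
        fn remove_entry(&self, name: &str, _guard: &MutexGuard<'_, ()>) {
            println!("removing {name} while the store lock is held");
        }
    }

    fn main() {
        let store = Store { mutex: Mutex::new(()) };
        let guard = store.mutex.lock().unwrap();
        store.remove_entry("some-chunk", &guard);
        drop(guard);
        // without a guard in scope, remove_entry() cannot be called at all
    }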

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
 pbs-datastore/src/chunk_store.rs               | 15 ++++++++++++++-
 pbs-datastore/src/datastore.rs                 |  2 +-
 pbs-datastore/src/local_datastore_lru_cache.rs |  7 +++----
 3 files changed, 18 insertions(+), 6 deletions(-)

diff --git a/pbs-datastore/src/chunk_store.rs b/pbs-datastore/src/chunk_store.rs
index 0725ca3a7..010785fbc 100644
--- a/pbs-datastore/src/chunk_store.rs
+++ b/pbs-datastore/src/chunk_store.rs
@@ -1,7 +1,7 @@
 use std::os::unix::fs::MetadataExt;
 use std::os::unix::io::AsRawFd;
 use std::path::{Path, PathBuf};
-use std::sync::{Arc, Mutex};
+use std::sync::{Arc, Mutex, MutexGuard};
 use std::time::Duration;
 
 use anyhow::{bail, format_err, Context, Error};
@@ -254,6 +254,19 @@ impl ChunkStore {
         Ok(true)
     }
 
+    /// Remove a chunk from the chunk store
+    ///
+    /// Used to remove chunks from the local datastore cache. Caller must signal to hold the chunk
+    /// store mutex lock.
+    pub fn remove_chunk(
+        &self,
+        digest: &[u8; 32],
+        _guard: &MutexGuard<'_, ()>,
+    ) -> Result<(), Error> {
+        let (path, _digest) = self.chunk_path(digest);
+        std::fs::remove_file(path).map_err(Error::from)
+    }
+
     pub fn get_chunk_iterator(
         &self,
     ) -> Result<
diff --git a/pbs-datastore/src/datastore.rs b/pbs-datastore/src/datastore.rs
index e36af68fc..4f55eb9db 100644
--- a/pbs-datastore/src/datastore.rs
+++ b/pbs-datastore/src/datastore.rs
@@ -1686,7 +1686,7 @@ impl DataStore {
                     ) {
                         if let Some(cache) = self.cache() {
                             // ignore errors, phase 3 will retry cleanup anyways
-                            let _ = cache.remove(&digest);
+                            let _ = cache.remove(&digest, &lock);
                         }
                         delete_list.push(content.key);
                     }
diff --git a/pbs-datastore/src/local_datastore_lru_cache.rs b/pbs-datastore/src/local_datastore_lru_cache.rs
index c0edd3619..1d2e87cb9 100644
--- a/pbs-datastore/src/local_datastore_lru_cache.rs
+++ b/pbs-datastore/src/local_datastore_lru_cache.rs
@@ -2,7 +2,7 @@
 //! a network layer (e.g. via the S3 backend).
 
 use std::future::Future;
-use std::sync::Arc;
+use std::sync::{Arc, MutexGuard};
 
 use anyhow::{bail, Error};
 use http_body_util::BodyExt;
@@ -87,10 +87,9 @@ impl LocalDatastoreLruCache {
     /// Remove a chunk from the local datastore cache.
     ///
     /// Fails if the chunk cannot be deleted successfully.
-    pub fn remove(&self, digest: &[u8; 32]) -> Result<(), Error> {
+    pub fn remove(&self, digest: &[u8; 32], guard: &MutexGuard<'_, ()>) -> Result<(), Error> {
         self.cache.remove(*digest);
-        let (path, _digest_str) = self.store.chunk_path(digest);
-        std::fs::remove_file(path).map_err(Error::from)
+        self.store.remove_chunk(digest, guard)
     }
 
     /// Access the locally cached chunk or fetch it from the S3 object store via the provided
-- 
2.47.3




* [pbs-devel] [PATCH proxmox-backup 4/7] chunk store: fix: replace evicted cache chunks instead of truncate
  2025-10-06 10:41 [pbs-devel] [PATCH proxmox-backup 0/7] s3 store: fix issues with chunk s3 backend upload and cache eviction Christian Ebner
                   ` (2 preceding siblings ...)
  2025-10-06 10:41 ` [pbs-devel] [PATCH proxmox-backup 3/7] chunk store: add and use method to remove chunks Christian Ebner
@ 2025-10-06 10:41 ` Christian Ebner
  2025-10-06 13:18   ` Fabian Grünbichler
  2025-10-06 10:41 ` [pbs-devel] [PATCH proxmox-backup 5/7] api: chunk upload: fix race between chunk backend upload and insert Christian Ebner
                   ` (4 subsequent siblings)
  8 siblings, 1 reply; 19+ messages in thread
From: Christian Ebner @ 2025-10-06 10:41 UTC (permalink / raw)
  To: pbs-devel

Evicted chunks used to be truncated to size zero, keeping the chunk
file in place as in-use marker for garbage collection but freeing
the chunk file contents. This can however lead to restores failing if
they already opened the chunk file for reading, as the contents they
read are now incomplete.

Fix this by instead replacing the chunk file with a zero-sized file,
leaving the contents accessible to chunk readers that already opened
the file.

By moving the logic from the local datastore cache to a helper method
on the chunk store, it is also ensured that the operation is guarded
by the chunk store mutex lock to avoid races with chunk re-insert.

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
 pbs-datastore/src/chunk_store.rs              | 20 +++++++++++++++
 .../src/local_datastore_lru_cache.rs          | 25 +++----------------
 2 files changed, 23 insertions(+), 22 deletions(-)

diff --git a/pbs-datastore/src/chunk_store.rs b/pbs-datastore/src/chunk_store.rs
index 010785fbc..74fa79db1 100644
--- a/pbs-datastore/src/chunk_store.rs
+++ b/pbs-datastore/src/chunk_store.rs
@@ -668,6 +668,26 @@ impl ChunkStore {
         (chunk_path, digest_str)
     }
 
+    /// Replace a chunk file with a zero size file in the chunk store.
+    ///
+    /// Used to evict chunks from the local datastore cache, while keeping them as in-use markers
+    /// for garbage collection. Returns with success also if chunk file is not pre-existing.
+    pub fn clear_chunk(&self, digest: &[u8; 32]) -> Result<(), Error> {
+        let (chunk_path, digest_str) = self.chunk_path(digest);
+        let mut create_options = CreateOptions::new();
+        if nix::unistd::Uid::effective().is_root() {
+            let uid = pbs_config::backup_user()?.uid;
+            let gid = pbs_config::backup_group()?.gid;
+            create_options = create_options.owner(uid).group(gid);
+        }
+
+        let _lock = self.mutex.lock();
+
+        proxmox_sys::fs::replace_file(&chunk_path, &[], create_options, false)
+            .map_err(|err| format_err!("clear chunk failed for {digest_str} - {err}"))?;
+        Ok(())
+    }
+
     pub fn relative_path(&self, path: &Path) -> PathBuf {
         // unwrap: only `None` in unit tests
         assert!(self.locker.is_some());
diff --git a/pbs-datastore/src/local_datastore_lru_cache.rs b/pbs-datastore/src/local_datastore_lru_cache.rs
index 1d2e87cb9..6f950f4b3 100644
--- a/pbs-datastore/src/local_datastore_lru_cache.rs
+++ b/pbs-datastore/src/local_datastore_lru_cache.rs
@@ -71,17 +71,8 @@ impl LocalDatastoreLruCache {
     /// Fails if the chunk cannot be inserted successfully.
     pub fn insert(&self, digest: &[u8; 32], chunk: &DataBlob) -> Result<(), Error> {
         self.store.insert_chunk(chunk, digest)?;
-        self.cache.insert(*digest, (), |digest| {
-            let (path, _digest_str) = self.store.chunk_path(&digest);
-            // Truncate to free up space but keep the inode around, since that
-            // is used as marker for chunks in use by garbage collection.
-            if let Err(err) = nix::unistd::truncate(&path, 0) {
-                if err != nix::errno::Errno::ENOENT {
-                    return Err(Error::from(err));
-                }
-            }
-            Ok(())
-        })
+        self.cache
+            .insert(*digest, (), |digest| self.store.clear_chunk(&digest))
     }
 
     /// Remove a chunk from the local datastore cache.
@@ -104,17 +95,7 @@ impl LocalDatastoreLruCache {
     ) -> Result<Option<DataBlob>, Error> {
         if self
             .cache
-            .access(*digest, cacher, |digest| {
-                let (path, _digest_str) = self.store.chunk_path(&digest);
-                // Truncate to free up space but keep the inode around, since that
-                // is used as marker for chunks in use by garbage collection.
-                if let Err(err) = nix::unistd::truncate(&path, 0) {
-                    if err != nix::errno::Errno::ENOENT {
-                        return Err(Error::from(err));
-                    }
-                }
-                Ok(())
-            })
+            .access(*digest, cacher, |digest| self.store.clear_chunk(&digest))
             .await?
             .is_some()
         {
-- 
2.47.3




* [pbs-devel] [PATCH proxmox-backup 5/7] api: chunk upload: fix race between chunk backend upload and insert
  2025-10-06 10:41 [pbs-devel] [PATCH proxmox-backup 0/7] s3 store: fix issues with chunk s3 backend upload and cache eviction Christian Ebner
                   ` (3 preceding siblings ...)
  2025-10-06 10:41 ` [pbs-devel] [PATCH proxmox-backup 4/7] chunk store: fix: replace evicted cache chunks instead of truncate Christian Ebner
@ 2025-10-06 10:41 ` Christian Ebner
  2025-10-06 13:18   ` Fabian Grünbichler
  2025-10-06 10:41 ` [pbs-devel] [PATCH proxmox-backup 6/7] api: chunk upload: fix race with garbage collection for no-cache on s3 Christian Ebner
                   ` (3 subsequent siblings)
  8 siblings, 1 reply; 19+ messages in thread
From: Christian Ebner @ 2025-10-06 10:41 UTC (permalink / raw)
  To: pbs-devel

For S3 backed datastores, chunks are first uploaded to the object
store and only then inserted into the local datastore cache. This
ensures that a chunk is only ever considered valid after a successful
upload. Garbage collection does however rely on the local marker file
being present to consider the chunk as in-use. While this marker is
created during phase 1 of garbage collection if not present, that only
happens for chunks which are already referenced by a complete index
file.

Therefore, there remains a race window between garbage collection
listing the chunks (upload completed) and the lookup of the local
marker file (chunk cache insert after upload). This can lead to chunks
which just finished upload, but were not yet inserted into the local
cache store, being removed again.

To close this race window, mark chunks which are currently being
uploaded to the backend with an additional marker file, which garbage
collection checks as well if the regular marker is not found, and
which is only removed after successful chunk insert.

Avoid this overhead for regular datastores by passing the backend type
to the chunk store and checking it there.

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
 pbs-datastore/src/chunk_store.rs | 94 +++++++++++++++++++++++++++++---
 pbs-datastore/src/datastore.rs   | 32 +++++++++++
 src/api2/backup/upload_chunk.rs  | 17 ++++--
 src/api2/config/datastore.rs     |  2 +
 4 files changed, 133 insertions(+), 12 deletions(-)

diff --git a/pbs-datastore/src/chunk_store.rs b/pbs-datastore/src/chunk_store.rs
index 74fa79db1..22efe4a32 100644
--- a/pbs-datastore/src/chunk_store.rs
+++ b/pbs-datastore/src/chunk_store.rs
@@ -7,7 +7,7 @@ use std::time::Duration;
 use anyhow::{bail, format_err, Context, Error};
 use tracing::{info, warn};
 
-use pbs_api_types::{DatastoreFSyncLevel, GarbageCollectionStatus};
+use pbs_api_types::{DatastoreBackendType, DatastoreFSyncLevel, GarbageCollectionStatus};
 use proxmox_io::ReadExt;
 use proxmox_s3_client::S3Client;
 use proxmox_sys::fs::{create_dir, create_path, file_type_from_file_stat, CreateOptions};
@@ -22,6 +22,10 @@ use crate::file_formats::{
 };
 use crate::DataBlob;
 
+/// Filename extension for local datastore cache marker files indicating
+/// the chunk being uploaded to the backend
+pub(crate) const BACKEND_UPLOAD_MARKER_EXT: &str = "backend-upload";
+
 /// File system based chunk store
 pub struct ChunkStore {
     name: String, // used for error reporting
@@ -30,6 +34,7 @@ pub struct ChunkStore {
     mutex: Mutex<()>,
     locker: Option<Arc<Mutex<ProcessLocker>>>,
     sync_level: DatastoreFSyncLevel,
+    datastore_backend_type: DatastoreBackendType,
 }
 
 // TODO: what about sysctl setting vm.vfs_cache_pressure (0 - 100) ?
@@ -77,6 +82,7 @@ impl ChunkStore {
             mutex: Mutex::new(()),
             locker: None,
             sync_level: Default::default(),
+            datastore_backend_type: DatastoreBackendType::default(),
         }
     }
 
@@ -97,6 +103,7 @@ impl ChunkStore {
         uid: nix::unistd::Uid,
         gid: nix::unistd::Gid,
         sync_level: DatastoreFSyncLevel,
+        datastore_backend_type: DatastoreBackendType,
     ) -> Result<Self, Error>
     where
         P: Into<PathBuf>,
@@ -151,7 +158,7 @@ impl ChunkStore {
             }
         }
 
-        Self::open(name, base, sync_level)
+        Self::open(name, base, sync_level, datastore_backend_type)
     }
 
     fn lockfile_path<P: Into<PathBuf>>(base: P) -> PathBuf {
@@ -185,6 +192,7 @@ impl ChunkStore {
         name: &str,
         base: P,
         sync_level: DatastoreFSyncLevel,
+        datastore_backend_type: DatastoreBackendType,
     ) -> Result<Self, Error> {
         let base: PathBuf = base.into();
 
@@ -201,6 +209,7 @@ impl ChunkStore {
             locker: Some(locker),
             mutex: Mutex::new(()),
             sync_level,
+            datastore_backend_type,
         })
     }
 
@@ -567,6 +576,40 @@ impl ChunkStore {
         Ok(())
     }
 
+    pub(crate) fn insert_backend_upload_marker(&self, digest: &[u8; 32]) -> Result<(), Error> {
+        if self.datastore_backend_type == DatastoreBackendType::Filesystem {
+            bail!("cannot create backend upload marker, not a cache store");
+        }
+        let (marker_path, digest_str) = self.chunk_backed_upload_marker_path(digest);
+
+        let _lock = self.mutex.lock();
+
+        std::fs::File::options()
+            .write(true)
+            .create_new(true)
+            .open(&marker_path)
+            .with_context(|| {
+                format!("failed to create backend upload marker for chunk {digest_str}")
+            })?;
+        Ok(())
+    }
+
+    pub(crate) fn cleanup_backend_upload_marker(&self, digest: &[u8; 32]) -> Result<(), Error> {
+        if self.datastore_backend_type == DatastoreBackendType::Filesystem {
+            bail!("cannot cleanup backend upload marker, not a cache store");
+        }
+        let (marker_path, _digest_str) = self.chunk_backed_upload_marker_path(digest);
+
+        let _lock = self.mutex.lock();
+
+        if let Err(err) = std::fs::remove_file(marker_path) {
+            if err.kind() != std::io::ErrorKind::NotFound {
+                bail!("failed to cleanup backend upload marker: {err}");
+            }
+        }
+        Ok(())
+    }
+
     pub fn insert_chunk(&self, chunk: &DataBlob, digest: &[u8; 32]) -> Result<(bool, u64), Error> {
         // unwrap: only `None` in unit tests
         assert!(self.locker.is_some());
@@ -644,6 +687,16 @@ impl ChunkStore {
             format_err!("inserting chunk on store '{name}' failed for {digest_str} - {err}")
         })?;
 
+        if self.datastore_backend_type != DatastoreBackendType::Filesystem {
+            if let Err(err) =
+                std::fs::remove_file(chunk_path.with_extension(BACKEND_UPLOAD_MARKER_EXT))
+            {
+                if err.kind() != std::io::ErrorKind::NotFound {
+                    bail!("failed to cleanup backend upload marker: {err}");
+                }
+            }
+        }
+
         if self.sync_level == DatastoreFSyncLevel::File {
             // fsync dir handle to persist the tmp rename
             let dir = std::fs::File::open(chunk_dir_path)?;
@@ -688,6 +741,15 @@ impl ChunkStore {
         Ok(())
     }
 
+    /// Generate file path for backend upload marker of given chunk digest.
+    pub(crate) fn chunk_backed_upload_marker_path(&self, digest: &[u8; 32]) -> (PathBuf, String) {
+        let (chunk_path, digest_str) = self.chunk_path(digest);
+        (
+            chunk_path.with_extension(BACKEND_UPLOAD_MARKER_EXT),
+            digest_str,
+        )
+    }
+
     pub fn relative_path(&self, path: &Path) -> PathBuf {
         // unwrap: only `None` in unit tests
         assert!(self.locker.is_some());
@@ -773,14 +835,26 @@ fn test_chunk_store1() {
 
     if let Err(_e) = std::fs::remove_dir_all(".testdir") { /* ignore */ }
 
-    let chunk_store = ChunkStore::open("test", &path, DatastoreFSyncLevel::None);
+    let chunk_store = ChunkStore::open(
+        "test",
+        &path,
+        DatastoreFSyncLevel::None,
+        DatastoreBackendType::Filesystem,
+    );
     assert!(chunk_store.is_err());
 
     let user = nix::unistd::User::from_uid(nix::unistd::Uid::current())
         .unwrap()
         .unwrap();
-    let chunk_store =
-        ChunkStore::create("test", &path, user.uid, user.gid, DatastoreFSyncLevel::None).unwrap();
+    let chunk_store = ChunkStore::create(
+        "test",
+        &path,
+        user.uid,
+        user.gid,
+        DatastoreFSyncLevel::None,
+        DatastoreBackendType::Filesystem,
+    )
+    .unwrap();
 
     let (chunk, digest) = crate::data_blob::DataChunkBuilder::new(&[0u8, 1u8])
         .build()
@@ -792,8 +866,14 @@ fn test_chunk_store1() {
     let (exists, _) = chunk_store.insert_chunk(&chunk, &digest).unwrap();
     assert!(exists);
 
-    let chunk_store =
-        ChunkStore::create("test", &path, user.uid, user.gid, DatastoreFSyncLevel::None);
+    let chunk_store = ChunkStore::create(
+        "test",
+        &path,
+        user.uid,
+        user.gid,
+        DatastoreFSyncLevel::None,
+        DatastoreBackendType::Filesystem,
+    );
     assert!(chunk_store.is_err());
 
     if let Err(_e) = std::fs::remove_dir_all(".testdir") { /* ignore */ }
diff --git a/pbs-datastore/src/datastore.rs b/pbs-datastore/src/datastore.rs
index 4f55eb9db..58fb863ec 100644
--- a/pbs-datastore/src/datastore.rs
+++ b/pbs-datastore/src/datastore.rs
@@ -348,10 +348,14 @@ impl DataStore {
                 DatastoreTuning::API_SCHEMA
                     .parse_property_string(config.tuning.as_deref().unwrap_or(""))?,
             )?;
+            let backend_config: DatastoreBackendConfig =
+                config.backend.as_deref().unwrap_or("").parse()?;
+            let backend_type = backend_config.ty.unwrap_or_default();
             Arc::new(ChunkStore::open(
                 name,
                 config.absolute_path(),
                 tuning.sync_level.unwrap_or_default(),
+                backend_type,
             )?)
         };
 
@@ -436,10 +440,16 @@ impl DataStore {
             DatastoreTuning::API_SCHEMA
                 .parse_property_string(config.tuning.as_deref().unwrap_or(""))?,
         )?;
+        let backend_config: DatastoreBackendConfig = serde_json::from_value(
+            DatastoreBackendConfig::API_SCHEMA
+                .parse_property_string(config.backend.as_deref().unwrap_or(""))?,
+        )?;
+        let backend_type = backend_config.ty.unwrap_or_default();
         let chunk_store = ChunkStore::open(
             &name,
             config.absolute_path(),
             tuning.sync_level.unwrap_or_default(),
+            backend_type,
         )?;
         let inner = Arc::new(Self::with_store_and_config(
             Arc::new(chunk_store),
@@ -1663,6 +1673,15 @@ impl DataStore {
                     let atime = match std::fs::metadata(&chunk_path) {
                         Ok(stat) => stat.accessed()?,
                         Err(err) if err.kind() == std::io::ErrorKind::NotFound => {
+                            if std::fs::metadata(
+                                chunk_path
+                                    .with_extension(crate::chunk_store::BACKEND_UPLOAD_MARKER_EXT),
+                            )
+                            .is_ok()
+                            {
+                                info!("keep in progress {}", content.key);
+                                continue;
+                            }
                             // File not found, delete by setting atime to unix epoch
                             info!("Not found, mark for deletion: {}", content.key);
                             SystemTime::UNIX_EPOCH
@@ -1867,6 +1886,19 @@ impl DataStore {
         self.inner.chunk_store.insert_chunk(chunk, digest)
     }
 
+    /// Inserts the marker file to signal an in progress upload to the backend
+    ///
+    /// The presence of the marker avoids a race between inserting the chunk into the
+    /// datastore and cleanup of the chunk by garbage collection.
+    pub fn insert_backend_upload_marker(&self, digest: &[u8; 32]) -> Result<(), Error> {
+        self.inner.chunk_store.insert_backend_upload_marker(digest)
+    }
+
+    /// Remove the marker file signaling an in-progress upload to the backend
+    pub fn cleanup_backend_upload_marker(&self, digest: &[u8; 32]) -> Result<(), Error> {
+        self.inner.chunk_store.cleanup_backend_upload_marker(digest)
+    }
+
     pub fn stat_chunk(&self, digest: &[u8; 32]) -> Result<std::fs::Metadata, Error> {
         let (chunk_path, _digest_str) = self.inner.chunk_store.chunk_path(digest);
         std::fs::metadata(chunk_path).map_err(Error::from)
diff --git a/src/api2/backup/upload_chunk.rs b/src/api2/backup/upload_chunk.rs
index 8dd7e4d52..d4b1850eb 100644
--- a/src/api2/backup/upload_chunk.rs
+++ b/src/api2/backup/upload_chunk.rs
@@ -259,6 +259,7 @@ async fn upload_to_backend(
                     data.len()
                 );
             }
+            let datastore = env.datastore.clone();
 
             if env.no_cache {
                 let object_key = pbs_datastore::s3::object_key_from_digest(&digest)?;
@@ -272,11 +273,11 @@ async fn upload_to_backend(
             // Avoid re-upload to S3 if the chunk is either present in the LRU cache or the chunk
             // file exists on filesystem. The latter means that the chunk has been present in the
             // past an was not cleaned up by garbage collection, so contained in the S3 object store.
-            if env.datastore.cache_contains(&digest) {
+            if datastore.cache_contains(&digest) {
                 tracing::info!("Skip upload of cached chunk {}", hex::encode(digest));
                 return Ok((digest, size, encoded_size, true));
             }
-            if let Ok(true) = env.datastore.cond_touch_chunk(&digest, false) {
+            if let Ok(true) = datastore.cond_touch_chunk(&digest, false) {
                 tracing::info!(
                     "Skip upload of already encountered chunk {}",
                     hex::encode(digest)
@@ -284,18 +285,24 @@ async fn upload_to_backend(
                 return Ok((digest, size, encoded_size, true));
             }
 
+            datastore.insert_backend_upload_marker(&digest)?;
             tracing::info!("Upload of new chunk {}", hex::encode(digest));
             let object_key = pbs_datastore::s3::object_key_from_digest(&digest)?;
-            let is_duplicate = s3_client
+            let is_duplicate = match s3_client
                 .upload_no_replace_with_retry(object_key, data.clone())
                 .await
-                .map_err(|err| format_err!("failed to upload chunk to s3 backend - {err:#}"))?;
+            {
+                Ok(is_duplicate) => is_duplicate,
+                Err(err) => {
+                    datastore.cleanup_backend_upload_marker(&digest)?;
+                    bail!("failed to upload chunk to s3 backend - {err:#}");
+                }
+            };
 
             // Only insert the chunk into the cache after it has been successufuly uploaded.
             // Although less performant than doing this in parallel, it is required for consisency
             // since chunks are considered as present on the backend if the file exists in the local
             // cache store.
-            let datastore = env.datastore.clone();
             tracing::info!("Caching of chunk {}", hex::encode(digest));
             let _ = tokio::task::spawn_blocking(move || {
                 let chunk = DataBlob::from_raw(data.to_vec())?;
diff --git a/src/api2/config/datastore.rs b/src/api2/config/datastore.rs
index 3b03c0466..541bd0a04 100644
--- a/src/api2/config/datastore.rs
+++ b/src/api2/config/datastore.rs
@@ -173,6 +173,7 @@ pub(crate) fn do_create_datastore(
                 &datastore.name,
                 &path,
                 tuning.sync_level.unwrap_or_default(),
+                backend_type,
             )
         })?
     } else {
@@ -204,6 +205,7 @@ pub(crate) fn do_create_datastore(
             backup_user.uid,
             backup_user.gid,
             tuning.sync_level.unwrap_or_default(),
+            backend_type,
         )?
     };
 
-- 
2.47.3




* [pbs-devel] [PATCH proxmox-backup 6/7] api: chunk upload: fix race with garbage collection for no-cache on s3
  2025-10-06 10:41 [pbs-devel] [PATCH proxmox-backup 0/7] s3 store: fix issues with chunk s3 backend upload and cache eviction Christian Ebner
                   ` (4 preceding siblings ...)
  2025-10-06 10:41 ` [pbs-devel] [PATCH proxmox-backup 5/7] api: chunk upload: fix race between chunk backend upload and insert Christian Ebner
@ 2025-10-06 10:41 ` Christian Ebner
  2025-10-06 13:18   ` Fabian Grünbichler
  2025-10-06 10:41 ` [pbs-devel] [PATCH proxmox-backup 7/7] pull: guard chunk upload and only insert into cache after upload Christian Ebner
                   ` (2 subsequent siblings)
  8 siblings, 1 reply; 19+ messages in thread
From: Christian Ebner @ 2025-10-06 10:41 UTC (permalink / raw)
  To: pbs-devel

With the no-cache flag set, chunks uploaded to the s3 backend are
never inserted into the local datastore cache. The presence of the
chunk marker file is however required for garbage collection to not
clean up the chunks. While the marker files are created during phase 1
of garbage collection for indexed chunks, this is not the case for
in-progress backups with the no-cache flag set.

Therefore, mark chunks as in-progress while they are being uploaded,
just like for the regular mode with cache, but replace this marker
with the zero-sized chunk marker file after the upload has finished to
avoid incorrect garbage collection cleanup.

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
 pbs-datastore/src/chunk_store.rs | 13 +++++++++++++
 pbs-datastore/src/datastore.rs   |  7 +++++++
 src/api2/backup/upload_chunk.rs  | 12 ++++++++++--
 3 files changed, 30 insertions(+), 2 deletions(-)

diff --git a/pbs-datastore/src/chunk_store.rs b/pbs-datastore/src/chunk_store.rs
index 22efe4a32..7fd92b626 100644
--- a/pbs-datastore/src/chunk_store.rs
+++ b/pbs-datastore/src/chunk_store.rs
@@ -594,6 +594,19 @@ impl ChunkStore {
         Ok(())
     }
 
+    pub(crate) fn persist_backend_upload_marker(&self, digest: &[u8; 32]) -> Result<(), Error> {
+        if self.datastore_backend_type == DatastoreBackendType::Filesystem {
+            bail!("cannot create backend upload marker, not a cache store");
+        }
+        let (marker_path, _digest_str) = self.chunk_backed_upload_marker_path(digest);
+        let (chunk_path, digest_str) = self.chunk_path(digest);
+        let _lock = self.mutex.lock();
+
+        std::fs::rename(marker_path, chunk_path).map_err(|err| {
+            format_err!("persisting backup upload marker failed for {digest_str} - {err}")
+        })
+    }
+
     pub(crate) fn cleanup_backend_upload_marker(&self, digest: &[u8; 32]) -> Result<(), Error> {
         if self.datastore_backend_type == DatastoreBackendType::Filesystem {
             bail!("cannot cleanup backend upload marker, not a cache store");
diff --git a/pbs-datastore/src/datastore.rs b/pbs-datastore/src/datastore.rs
index 58fb863ec..8b0d4ab5c 100644
--- a/pbs-datastore/src/datastore.rs
+++ b/pbs-datastore/src/datastore.rs
@@ -1894,6 +1894,13 @@ impl DataStore {
         self.inner.chunk_store.insert_backend_upload_marker(digest)
     }
 
+    /// Persist the backend upload marker to be a zero size chunk marker.
+    ///
+    /// Marks the chunk as present in the local store cache without inserting its payload.
+    pub fn persist_backend_upload_marker(&self, digest: &[u8; 32]) -> Result<(), Error> {
+        self.inner.chunk_store.persist_backend_upload_marker(digest)
+    }
+
     /// Remove the marker file signaling an in-progress upload to the backend
     pub fn cleanup_backend_upload_marker(&self, digest: &[u8; 32]) -> Result<(), Error> {
         self.inner.chunk_store.cleanup_backend_upload_marker(digest)
diff --git a/src/api2/backup/upload_chunk.rs b/src/api2/backup/upload_chunk.rs
index d4b1850eb..35d873ebf 100644
--- a/src/api2/backup/upload_chunk.rs
+++ b/src/api2/backup/upload_chunk.rs
@@ -263,10 +263,18 @@ async fn upload_to_backend(
 
             if env.no_cache {
                 let object_key = pbs_datastore::s3::object_key_from_digest(&digest)?;
-                let is_duplicate = s3_client
+                env.datastore.insert_backend_upload_marker(&digest)?;
+                let is_duplicate = match s3_client
                     .upload_no_replace_with_retry(object_key, data)
                     .await
-                    .map_err(|err| format_err!("failed to upload chunk to s3 backend - {err:#}"))?;
+                {
+                    Ok(is_duplicate) => is_duplicate,
+                    Err(err) => {
+                        datastore.cleanup_backend_upload_marker(&digest)?;
+                        bail!("failed to upload chunk to s3 backend - {err:#}");
+                    }
+                };
+                env.datastore.persist_backend_upload_marker(&digest)?;
                 return Ok((digest, size, encoded_size, is_duplicate));
             }
 
-- 
2.47.3




* [pbs-devel] [PATCH proxmox-backup 7/7] pull: guard chunk upload and only insert into cache after upload
  2025-10-06 10:41 [pbs-devel] [PATCH proxmox-backup 0/7] s3 store: fix issues with chunk s3 backend upload and cache eviction Christian Ebner
                   ` (5 preceding siblings ...)
  2025-10-06 10:41 ` [pbs-devel] [PATCH proxmox-backup 6/7] api: chunk upload: fix race with garbage collection for no-cache on s3 Christian Ebner
@ 2025-10-06 10:41 ` Christian Ebner
  2025-10-06 13:18   ` Fabian Grünbichler
  2025-10-06 13:18 ` [pbs-devel] [PATCH proxmox-backup 0/7] s3 store: fix issues with chunk s3 backend upload and cache eviction Fabian Grünbichler
  2025-10-08 15:22 ` [pbs-devel] superseded: " Christian Ebner
  8 siblings, 1 reply; 19+ messages in thread
From: Christian Ebner @ 2025-10-06 10:41 UTC (permalink / raw)
  To: pbs-devel

Inserting the chunk into the local datastore cache before the upload
leads to it being considered as present on the backend. The upload
might however still fail, leading to missing chunks.

Therefore, only insert the chunk after the upload has completed
successfully and guard the upload with the backend upload marker file
to avoid races with garbage collection.

Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
 src/server/pull.rs | 14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/src/server/pull.rs b/src/server/pull.rs
index 08e6e6b64..639415107 100644
--- a/src/server/pull.rs
+++ b/src/server/pull.rs
@@ -176,14 +176,20 @@ async fn pull_index_chunks<I: IndexFile>(
                     if target2.cache_contains(&digest) {
                         return Ok(());
                     }
-                    target2.cache_insert(&digest, &chunk)?;
                     let data = chunk.raw_data().to_vec();
                     let upload_data = hyper::body::Bytes::from(data);
                     let object_key = pbs_datastore::s3::object_key_from_digest(&digest)?;
-                    let _is_duplicate = proxmox_async::runtime::block_on(
+                    target2.insert_backend_upload_marker(&digest)?;
+                    match proxmox_async::runtime::block_on(
                         s3_client.upload_no_replace_with_retry(object_key, upload_data),
-                    )
-                    .context("failed to upload chunk to s3 backend")?;
+                    ) {
+                        Ok(_is_duplicate) => (),
+                        Err(err) => {
+                            target2.cleanup_backend_upload_marker(&digest)?;
+                            return Err(err.context("failed to upload chunk to s3 backend"));
+                        }
+                    }
+                    target2.cache_insert(&digest, &chunk)?;
                 }
             }
             Ok(())
-- 
2.47.3




* Re: [pbs-devel] [PATCH proxmox-backup 2/7] gc: chunk store: rework atime check and gc status into common helper
  2025-10-06 10:41 ` [pbs-devel] [PATCH proxmox-backup 2/7] gc: chunk store: rework atime check and gc status into common helper Christian Ebner
@ 2025-10-06 13:14   ` Fabian Grünbichler
  0 siblings, 0 replies; 19+ messages in thread
From: Fabian Grünbichler @ 2025-10-06 13:14 UTC (permalink / raw)
  To: Proxmox Backup Server development discussion

On October 6, 2025 12:41 pm, Christian Ebner wrote:
> Move the code paths shared by both the filesystem and the s3 backend into
> a common helper to avoid code duplication and adapt the callsites
> accordingly.
> 
> Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
> ---
>  pbs-datastore/src/chunk_store.rs | 69 ++++++++++++++++++++++----------
>  pbs-datastore/src/datastore.rs   | 29 +++++---------
>  2 files changed, 57 insertions(+), 41 deletions(-)
> 
> diff --git a/pbs-datastore/src/chunk_store.rs b/pbs-datastore/src/chunk_store.rs
> index 3c59612bb..0725ca3a7 100644
> --- a/pbs-datastore/src/chunk_store.rs
> +++ b/pbs-datastore/src/chunk_store.rs
> @@ -408,36 +408,27 @@ impl ChunkStore {
>  
>                  chunk_count += 1;
>  
> -                if stat.st_atime < min_atime {
> -                    //let age = now - stat.st_atime;
> -                    //println!("UNLINK {}  {:?}", age/(3600*24), filename);
> +                if Self::check_atime_and_update_gc_status(
> +                    stat.st_atime,
> +                    min_atime,
> +                    oldest_writer,
> +                    stat.st_size as u64,
> +                    bad,
> +                    status,
> +                ) {
>                      if let Err(err) = unlinkat(Some(dirfd), filename, UnlinkatFlags::NoRemoveDir) {

if this part were also handled by the helper (in a fashion that allows
using it for both the cache and the regular chunk store)

>                          if bad {
> +                            status.removed_bad -= 1;
>                              status.still_bad += 1;
> +                        } else {
> +                            status.removed_chunks += 1;
>                          }
> +                        status.removed_bytes -= stat.st_size as u64;

then this error handling here would not leak outside of the helper, and
we could "simply" call the helper and bubble up any error it returns..

>                          bail!(
>                              "unlinking chunk {filename:?} failed on store '{}' - {err}",
>                              self.name,
>                          );
>                      }
> -                    if bad {
> -                        status.removed_bad += 1;
> -                    } else {
> -                        status.removed_chunks += 1;
> -                    }
> -                    status.removed_bytes += stat.st_size as u64;
> -                } else if stat.st_atime < oldest_writer {
> -                    if bad {
> -                        status.still_bad += 1;
> -                    } else {
> -                        status.pending_chunks += 1;
> -                    }
> -                    status.pending_bytes += stat.st_size as u64;
> -                } else {
> -                    if !bad {
> -                        status.disk_chunks += 1;
> -                    }
> -                    status.disk_bytes += stat.st_size as u64;
>                  }
>              }
>              drop(lock);
> @@ -446,6 +437,42 @@ impl ChunkStore {
>          Ok(())
>      }
>  
> +    /// Check within what range the provided chunks atime falls and update the garbage collection
> +    /// status accordingly.
> +    ///
> +    /// Returns true if the chunk file should be removed.
> +    pub(super) fn check_atime_and_update_gc_status(
> +        atime: i64,
> +        min_atime: i64,
> +        oldest_writer: i64,
> +        size: u64,
> +        bad: bool,
> +        gc_status: &mut GarbageCollectionStatus,
> +    ) -> bool {
> +        if atime < min_atime {
> +            if bad {
> +                gc_status.removed_bad += 1;
> +            } else {
> +                gc_status.removed_chunks += 1;
> +            }
> +            gc_status.removed_bytes += size;
> +            return true;
> +        } else if atime < oldest_writer {
> +            if bad {
> +                gc_status.still_bad += 1;
> +            } else {
> +                gc_status.pending_chunks += 1;
> +            }
> +            gc_status.pending_bytes += size;
> +        } else {
> +            if !bad {
> +                gc_status.disk_chunks += 1;
> +            }
> +            gc_status.disk_bytes += size;
> +        }
> +        false
> +    }
> +
>      /// Check if atime updates are honored by the filesystem backing the chunk store.
>      ///
>      /// Checks if the atime is always updated by utimensat taking into consideration the Linux
> diff --git a/pbs-datastore/src/datastore.rs b/pbs-datastore/src/datastore.rs
> index c2a82b8b8..e36af68fc 100644
> --- a/pbs-datastore/src/datastore.rs
> +++ b/pbs-datastore/src/datastore.rs
> @@ -1676,30 +1676,19 @@ impl DataStore {
>                          .extension()
>                          .is_some_and(|ext| ext == "bad");
>  
> -                    if atime < min_atime {
> +                    if ChunkStore::check_atime_and_update_gc_status(
> +                        atime,
> +                        min_atime,
> +                        oldest_writer,
> +                        content.size,
> +                        bad,
> +                        &mut gc_status,
> +                    ) {
>                          if let Some(cache) = self.cache() {
>                              // ignore errors, phase 3 will retry cleanup anyways
>                              let _ = cache.remove(&digest);
>                          }
> -                        delete_list.push(content.key.clone());
> -                        if bad {
> -                            gc_status.removed_bad += 1;
> -                        } else {
> -                            gc_status.removed_chunks += 1;
> -                        }
> -                        gc_status.removed_bytes += content.size;
> -                    } else if atime < oldest_writer {
> -                        if bad {
> -                            gc_status.still_bad += 1;
> -                        } else {
> -                            gc_status.pending_chunks += 1;
> -                        }
> -                        gc_status.pending_bytes += content.size;
> -                    } else {
> -                        if !bad {
> -                            gc_status.disk_chunks += 1;
> -                        }
> -                        gc_status.disk_bytes += content.size;
> +                        delete_list.push(content.key);

nit: this removal of the clone could have happened in patch #1

>                      }
>  
>                      chunk_count += 1;
> -- 
> 2.47.3
> 
> 
> 


_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [pbs-devel] [PATCH proxmox-backup 3/7] chunk store: add and use method to remove chunks
  2025-10-06 10:41 ` [pbs-devel] [PATCH proxmox-backup 3/7] chunk store: add and use method to remove chunks Christian Ebner
@ 2025-10-06 13:17   ` Fabian Grünbichler
  0 siblings, 0 replies; 19+ messages in thread
From: Fabian Grünbichler @ 2025-10-06 13:17 UTC (permalink / raw)
  To: Proxmox Backup Server development discussion

On October 6, 2025 12:41 pm, Christian Ebner wrote:
> Reworks the removing of cached chunks during phase 2 of garbage
> collection for datastores backed by s3.
> 
> Move the actual chunk removal logic to be a method of the chunk store
> and require the mutex guard to be passed as a shared reference,
> signaling that the caller locked the store as required to avoid races
> with chunk insert.
> 
> Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
> ---
>  pbs-datastore/src/chunk_store.rs               | 15 ++++++++++++++-
>  pbs-datastore/src/datastore.rs                 |  2 +-
>  pbs-datastore/src/local_datastore_lru_cache.rs |  7 +++----
>  3 files changed, 18 insertions(+), 6 deletions(-)
> 
> diff --git a/pbs-datastore/src/chunk_store.rs b/pbs-datastore/src/chunk_store.rs
> index 0725ca3a7..010785fbc 100644
> --- a/pbs-datastore/src/chunk_store.rs
> +++ b/pbs-datastore/src/chunk_store.rs
> @@ -1,7 +1,7 @@
>  use std::os::unix::fs::MetadataExt;
>  use std::os::unix::io::AsRawFd;
>  use std::path::{Path, PathBuf};
> -use std::sync::{Arc, Mutex};
> +use std::sync::{Arc, Mutex, MutexGuard};
>  use std::time::Duration;
>  
>  use anyhow::{bail, format_err, Context, Error};
> @@ -254,6 +254,19 @@ impl ChunkStore {
>          Ok(true)
>      }
>  
> +    /// Remove a chunk from the chunk store
> +    ///
> +    /// Used to remove chunks from the local datastore cache. Caller must signal to hold the chunk
> +    /// store mutex lock.
> +    pub fn remove_chunk(
> +        &self,
> +        digest: &[u8; 32],
> +        _guard: &MutexGuard<'_, ()>,

if we do this, then this should be a proper type across the board..

but it also is a bit wrong interface-wise - just obtaining the chunk
store lock doesn't make it safe to remove chunks, it's still only GC
that is "allowed" to do that since it handles all the logic and
additional locking..

while that is the case in the call path here/now, it should be mentioned
in the doc comments at least, and the visibility restricted
accordingly..
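
e.g. something along these lines (only a sketch, the guard type and method
names are made up):

    /// Hypothetical proof-of-lock type: can only be constructed by the chunk
    /// store itself, so a caller cannot hand in some unrelated MutexGuard.
    pub(crate) struct ChunkStoreLockGuard<'a>(std::sync::MutexGuard<'a, ()>);

    impl ChunkStore {
        /// sketch: lock the store and hand out the typed guard
        /// (error handling elided)
        pub(crate) fn lock_store(&self) -> ChunkStoreLockGuard<'_> {
            ChunkStoreLockGuard(self.mutex.lock().unwrap())
        }

        /// sketch: removal takes the typed guard instead of a bare MutexGuard
        pub(crate) fn remove_chunk(
            &self,
            digest: &[u8; 32],
            _guard: &ChunkStoreLockGuard<'_>,
        ) -> Result<(), Error> {
            let (path, _digest) = self.chunk_path(digest);
            std::fs::remove_file(path).map_err(Error::from)
        }
    }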

> +    ) -> Result<(), Error> {
> +        let (path, _digest) = self.chunk_path(digest);
> +        std::fs::remove_file(path).map_err(Error::from)
> +    }
> +
>      pub fn get_chunk_iterator(
>          &self,
>      ) -> Result<
> diff --git a/pbs-datastore/src/datastore.rs b/pbs-datastore/src/datastore.rs
> index e36af68fc..4f55eb9db 100644
> --- a/pbs-datastore/src/datastore.rs
> +++ b/pbs-datastore/src/datastore.rs
> @@ -1686,7 +1686,7 @@ impl DataStore {
>                      ) {
>                          if let Some(cache) = self.cache() {
>                              // ignore errors, phase 3 will retry cleanup anyways
> -                            let _ = cache.remove(&digest);
> +                            let _ = cache.remove(&digest, &lock);

so this call site here is okay, because it happens in GC after all the
checks and additional locking has been done to ensure:
- chunks which are not yet referenced by visible indices are not removed
- GC is not running in a pre-reload process that doesn't "see" new
  backup writers
- GC is not running in a post-reload process while the old process still
  has writers
- ..

>                          }
>                          delete_list.push(content.key);
>                      }
> diff --git a/pbs-datastore/src/local_datastore_lru_cache.rs b/pbs-datastore/src/local_datastore_lru_cache.rs
> index c0edd3619..1d2e87cb9 100644
> --- a/pbs-datastore/src/local_datastore_lru_cache.rs
> +++ b/pbs-datastore/src/local_datastore_lru_cache.rs
> @@ -2,7 +2,7 @@
>  //! a network layer (e.g. via the S3 backend).
>  
>  use std::future::Future;
> -use std::sync::Arc;
> +use std::sync::{Arc, MutexGuard};
>  
>  use anyhow::{bail, Error};
>  use http_body_util::BodyExt;
> @@ -87,10 +87,9 @@ impl LocalDatastoreLruCache {
>      /// Remove a chunk from the local datastore cache.
>      ///
>      /// Fails if the chunk cannot be deleted successfully.
> -    pub fn remove(&self, digest: &[u8; 32]) -> Result<(), Error> {
> +    pub fn remove(&self, digest: &[u8; 32], guard: &MutexGuard<'_, ()>) -> Result<(), Error> {
>          self.cache.remove(*digest);
> -        let (path, _digest_str) = self.store.chunk_path(digest);
> -        std::fs::remove_file(path).map_err(Error::from)
> +        self.store.remove_chunk(digest, guard)

and this here is the only call site "forwarding" the removal called above
by GC from the cache to the underlying chunk store, but it should
probably also not be `pub`??

>      }
>  
>      /// Access the locally cached chunk or fetch it from the S3 object store via the provided
> -- 
> 2.47.3
> 
> 
> 


_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [pbs-devel] [PATCH proxmox-backup 4/7] chunk store: fix: replace evicted cache chunks instead of truncate
  2025-10-06 10:41 ` [pbs-devel] [PATCH proxmox-backup 4/7] chunk store: fix: replace evicted cache chunks instead of truncate Christian Ebner
@ 2025-10-06 13:18   ` Fabian Grünbichler
  2025-10-06 15:35     ` Christian Ebner
  0 siblings, 1 reply; 19+ messages in thread
From: Fabian Grünbichler @ 2025-10-06 13:18 UTC (permalink / raw)
  To: Proxmox Backup Server development discussion

On October 6, 2025 12:41 pm, Christian Ebner wrote:
> Evicted chunks have been truncated to size zero, keeping the chunk
> file in place as in-use marker for the garbage collection but freeing
> the chunk file contents. This can however lead to restores failing if
> they already opened the chunk file for reading, as their contents are
> now incomplete.
> 
> Fix this by instead replacing the chunk file with a zero sized file,
> leaving the contents accessible for the already opened chunk readers.
> 
> By moving the logic from the local datastore cache to a helper method
> on the chunk store, it is also assured that the operation is guarded
> by the chunk store mutex lock to avoid races with chunk re-insert.

AFAICT this is still racy even after this patch, because in cache.access
after a cache miss (either no local file, or empty local file) we fetch
the chunk from S3, insert it into the chunk store (cache), but then
instead of returning the chunk from the already in-memory chunk data, we
load it again from the path - without holding the lock that prevents the
chunk from being evicted again.. while unlikely to be hit in practice,
this is still wasteful because we could save a round-trip and just
return the chunk we've already constructed from the S3 response and hit
two birds with one stone?
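
roughly like this, as a sketch (the helper name and the fetch parameter are
made up, this is not the actual cacher interface):

    /// After a cache miss: fetch from S3, insert locally, and return the blob
    /// we already hold in memory instead of re-reading the chunk file, which
    /// another eviction could have cleared again in the meantime.
    async fn fetch_and_cache(
        store: &ChunkStore,
        digest: &[u8; 32],
        fetch_from_s3: impl std::future::Future<Output = Result<DataBlob, Error>>,
    ) -> Result<DataBlob, Error> {
        let blob = fetch_from_s3.await?;
        // insert_chunk() takes the chunk store mutex internally
        store.insert_chunk(&blob, digest)?;
        Ok(blob)
    }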

> 
> Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
> ---
>  pbs-datastore/src/chunk_store.rs              | 20 +++++++++++++++
>  .../src/local_datastore_lru_cache.rs          | 25 +++----------------
>  2 files changed, 23 insertions(+), 22 deletions(-)
> 
> diff --git a/pbs-datastore/src/chunk_store.rs b/pbs-datastore/src/chunk_store.rs
> index 010785fbc..74fa79db1 100644
> --- a/pbs-datastore/src/chunk_store.rs
> +++ b/pbs-datastore/src/chunk_store.rs
> @@ -668,6 +668,26 @@ impl ChunkStore {
>          (chunk_path, digest_str)
>      }
>  
> +    /// Replace a chunk file with a zero size file in the chunk store.
> +    ///
> +    /// Used to evict chunks from the local datastore cache, while keeping them as in-use markers
> +    /// for garbage collection. Returns with success also if chunk file is not pre-existing.
> +    pub fn clear_chunk(&self, digest: &[u8; 32]) -> Result<(), Error> {
> +        let (chunk_path, digest_str) = self.chunk_path(digest);
> +        let mut create_options = CreateOptions::new();
> +        if nix::unistd::Uid::effective().is_root() {
> +            let uid = pbs_config::backup_user()?.uid;
> +            let gid = pbs_config::backup_group()?.gid;
> +            create_options = create_options.owner(uid).group(gid);
> +        }
> +
> +        let _lock = self.mutex.lock();
> +
> +        proxmox_sys::fs::replace_file(&chunk_path, &[], create_options, false)
> +            .map_err(|err| format_err!("clear chunk failed for {digest_str} - {err}"))?;
> +        Ok(())
> +    }
> +
>      pub fn relative_path(&self, path: &Path) -> PathBuf {
>          // unwrap: only `None` in unit tests
>          assert!(self.locker.is_some());
> diff --git a/pbs-datastore/src/local_datastore_lru_cache.rs b/pbs-datastore/src/local_datastore_lru_cache.rs
> index 1d2e87cb9..6f950f4b3 100644
> --- a/pbs-datastore/src/local_datastore_lru_cache.rs
> +++ b/pbs-datastore/src/local_datastore_lru_cache.rs
> @@ -71,17 +71,8 @@ impl LocalDatastoreLruCache {
>      /// Fails if the chunk cannot be inserted successfully.
>      pub fn insert(&self, digest: &[u8; 32], chunk: &DataBlob) -> Result<(), Error> {
>          self.store.insert_chunk(chunk, digest)?;
> -        self.cache.insert(*digest, (), |digest| {
> -            let (path, _digest_str) = self.store.chunk_path(&digest);
> -            // Truncate to free up space but keep the inode around, since that
> -            // is used as marker for chunks in use by garbage collection.
> -            if let Err(err) = nix::unistd::truncate(&path, 0) {
> -                if err != nix::errno::Errno::ENOENT {
> -                    return Err(Error::from(err));
> -                }
> -            }
> -            Ok(())
> -        })
> +        self.cache
> +            .insert(*digest, (), |digest| self.store.clear_chunk(&digest))
>      }
>  
>      /// Remove a chunk from the local datastore cache.
> @@ -104,17 +95,7 @@ impl LocalDatastoreLruCache {
>      ) -> Result<Option<DataBlob>, Error> {
>          if self
>              .cache
> -            .access(*digest, cacher, |digest| {
> -                let (path, _digest_str) = self.store.chunk_path(&digest);
> -                // Truncate to free up space but keep the inode around, since that
> -                // is used as marker for chunks in use by garbage collection.
> -                if let Err(err) = nix::unistd::truncate(&path, 0) {
> -                    if err != nix::errno::Errno::ENOENT {
> -                        return Err(Error::from(err));
> -                    }
> -                }
> -                Ok(())
> -            })
> +            .access(*digest, cacher, |digest| self.store.clear_chunk(&digest))
>              .await?
>              .is_some()
>          {
> -- 
> 2.47.3
> 
> 
> 


_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [pbs-devel] [PATCH proxmox-backup 5/7] api: chunk upload: fix race between chunk backend upload and insert
  2025-10-06 10:41 ` [pbs-devel] [PATCH proxmox-backup 5/7] api: chunk upload: fix race between chunk backend upload and insert Christian Ebner
@ 2025-10-06 13:18   ` Fabian Grünbichler
  2025-10-07 10:15     ` Christian Ebner
  0 siblings, 1 reply; 19+ messages in thread
From: Fabian Grünbichler @ 2025-10-06 13:18 UTC (permalink / raw)
  To: Proxmox Backup Server development discussion

On October 6, 2025 12:41 pm, Christian Ebner wrote:
> Chunk are first uploaded to the object store for S3 backed
> datastores, only then inserted into the local datastore cache.
> By this, it is assured that the chunk is only ever considered as
> valid after successful upload. Garbage collection does however rely
> on the local marker file to be present, only then considering the chunk
> as in-use. While this marker is created if not present during phase 1
> of garbage collection, this only happens for chunks which are already
> referenced by a complete index file.
> 
> Therefore, there remains a race window between garbage collection
> listing the chunks (upload completed) and lookup of the local marker
> file being present (chunk cache insert after upload). This can lead to
> chunks which just finished upload, but were not yet inserted into the
> local cache store to be removed again.
> 
> To close this race window, mark chunks which are currently being
> uploaded to the backend by an additional marker file, checked by the
> garbage collection as well if the regular marker is not found, and
> only removed after successful chunk insert.
> 
> Avoid this overhead for regular datastores by passing and checking
> the backend type for the chunk store.
> 
> Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
> ---
>  pbs-datastore/src/chunk_store.rs | 94 +++++++++++++++++++++++++++++---
>  pbs-datastore/src/datastore.rs   | 32 +++++++++++
>  src/api2/backup/upload_chunk.rs  | 17 ++++--
>  src/api2/config/datastore.rs     |  2 +
>  4 files changed, 133 insertions(+), 12 deletions(-)
> 
> diff --git a/pbs-datastore/src/chunk_store.rs b/pbs-datastore/src/chunk_store.rs
> index 74fa79db1..22efe4a32 100644
> --- a/pbs-datastore/src/chunk_store.rs
> +++ b/pbs-datastore/src/chunk_store.rs
> @@ -7,7 +7,7 @@ use std::time::Duration;
>  use anyhow::{bail, format_err, Context, Error};
>  use tracing::{info, warn};
>  
> -use pbs_api_types::{DatastoreFSyncLevel, GarbageCollectionStatus};
> +use pbs_api_types::{DatastoreBackendType, DatastoreFSyncLevel, GarbageCollectionStatus};
>  use proxmox_io::ReadExt;
>  use proxmox_s3_client::S3Client;
>  use proxmox_sys::fs::{create_dir, create_path, file_type_from_file_stat, CreateOptions};
> @@ -22,6 +22,10 @@ use crate::file_formats::{
>  };
>  use crate::DataBlob;
>  
> +/// Filename extension for local datastore cache marker files indicating
> +/// the chunk being uploaded to the backend
> +pub(crate) const BACKEND_UPLOAD_MARKER_EXT: &str = "backend-upload";

I think this part here should be internal to the chunk store

> +
>  /// File system based chunk store
>  pub struct ChunkStore {
>      name: String, // used for error reporting
> @@ -30,6 +34,7 @@ pub struct ChunkStore {
>      mutex: Mutex<()>,
>      locker: Option<Arc<Mutex<ProcessLocker>>>,
>      sync_level: DatastoreFSyncLevel,
> +    datastore_backend_type: DatastoreBackendType,
>  }
>  
>  // TODO: what about sysctl setting vm.vfs_cache_pressure (0 - 100) ?
> @@ -77,6 +82,7 @@ impl ChunkStore {
>              mutex: Mutex::new(()),
>              locker: None,
>              sync_level: Default::default(),
> +            datastore_backend_type: DatastoreBackendType::default(),
>          }
>      }
>  
> @@ -97,6 +103,7 @@ impl ChunkStore {
>          uid: nix::unistd::Uid,
>          gid: nix::unistd::Gid,
>          sync_level: DatastoreFSyncLevel,
> +        datastore_backend_type: DatastoreBackendType,
>      ) -> Result<Self, Error>
>      where
>          P: Into<PathBuf>,
> @@ -151,7 +158,7 @@ impl ChunkStore {
>              }
>          }
>  
> -        Self::open(name, base, sync_level)
> +        Self::open(name, base, sync_level, datastore_backend_type)
>      }
>  
>      fn lockfile_path<P: Into<PathBuf>>(base: P) -> PathBuf {
> @@ -185,6 +192,7 @@ impl ChunkStore {
>          name: &str,
>          base: P,
>          sync_level: DatastoreFSyncLevel,
> +        datastore_backend_type: DatastoreBackendType,
>      ) -> Result<Self, Error> {
>          let base: PathBuf = base.into();
>  
> @@ -201,6 +209,7 @@ impl ChunkStore {
>              locker: Some(locker),
>              mutex: Mutex::new(()),
>              sync_level,
> +            datastore_backend_type,
>          })
>      }
>  
> @@ -567,6 +576,40 @@ impl ChunkStore {
>          Ok(())
>      }
>  
> +    pub(crate) fn insert_backend_upload_marker(&self, digest: &[u8; 32]) -> Result<(), Error> {
> +        if self.datastore_backend_type == DatastoreBackendType::Filesystem {
> +            bail!("cannot create backend upload marker, not a cache store");
> +        }
> +        let (marker_path, digest_str) = self.chunk_backed_upload_marker_path(digest);
> +
> +        let _lock = self.mutex.lock();
> +
> +        std::fs::File::options()
> +            .write(true)
> +            .create_new(true)
> +            .open(&marker_path)
> +            .with_context(|| {
> +                format!("failed to create backend upload marker for chunk {digest_str}")

A: reference for comment further below in upload_chunk

> +            })?;
> +        Ok(())
> +    }
> +
> +    pub(crate) fn cleanup_backend_upload_marker(&self, digest: &[u8; 32]) -> Result<(), Error> {
> +        if self.datastore_backend_type == DatastoreBackendType::Filesystem {
> +            bail!("cannot cleanup backend upload marker, not a cache store");
> +        }
> +        let (marker_path, _digest_str) = self.chunk_backed_upload_marker_path(digest);
> +
> +        let _lock = self.mutex.lock();
> +
> +        if let Err(err) = std::fs::remove_file(marker_path) {
> +            if err.kind() != std::io::ErrorKind::NotFound {
> +                bail!("failed to cleanup backend upload marker: {err}");
> +            }
> +        }
> +        Ok(())
> +    }
> +

and instead we should add a third helper here that allows querying
whether there exists a pending upload marker for a certain digest?
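
e.g. (sketch, the name is made up):

    /// Does a pending backend upload marker exist for this digest?
    pub(crate) fn has_backend_upload_marker(&self, digest: &[u8; 32]) -> bool {
        let (marker_path, _digest_str) = self.chunk_backed_upload_marker_path(digest);
        marker_path.exists()
    }

then GC could call that instead of open-coding the metadata lookup on the
marker path.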

>      pub fn insert_chunk(&self, chunk: &DataBlob, digest: &[u8; 32]) -> Result<(bool, u64), Error> {
>          // unwrap: only `None` in unit tests
>          assert!(self.locker.is_some());
> @@ -644,6 +687,16 @@ impl ChunkStore {
>              format_err!("inserting chunk on store '{name}' failed for {digest_str} - {err}")
>          })?;
>  
> +        if self.datastore_backend_type != DatastoreBackendType::Filesystem {
> +            if let Err(err) =
> +                std::fs::remove_file(chunk_path.with_extension(BACKEND_UPLOAD_MARKER_EXT))
> +            {
> +                if err.kind() != std::io::ErrorKind::NotFound {
> +                    bail!("failed to cleanup backend upload marker: {err}");
> +                }
> +            }
> +        }
> +
>          if self.sync_level == DatastoreFSyncLevel::File {
>              // fsync dir handle to persist the tmp rename
>              let dir = std::fs::File::open(chunk_dir_path)?;
> @@ -688,6 +741,15 @@ impl ChunkStore {
>          Ok(())
>      }
>  
> +    /// Generate file path for backend upload marker of given chunk digest.
> +    pub(crate) fn chunk_backed_upload_marker_path(&self, digest: &[u8; 32]) -> (PathBuf, String) {
> +        let (chunk_path, digest_str) = self.chunk_path(digest);
> +        (
> +            chunk_path.with_extension(BACKEND_UPLOAD_MARKER_EXT),
> +            digest_str,
> +        )
> +    }

this should be internal as well

> +
>      pub fn relative_path(&self, path: &Path) -> PathBuf {
>          // unwrap: only `None` in unit tests
>          assert!(self.locker.is_some());
> @@ -773,14 +835,26 @@ fn test_chunk_store1() {
>  
>      if let Err(_e) = std::fs::remove_dir_all(".testdir") { /* ignore */ }
>  
> -    let chunk_store = ChunkStore::open("test", &path, DatastoreFSyncLevel::None);
> +    let chunk_store = ChunkStore::open(
> +        "test",
> +        &path,
> +        DatastoreFSyncLevel::None,
> +        DatastoreBackendType::Filesystem,
> +    );
>      assert!(chunk_store.is_err());
>  
>      let user = nix::unistd::User::from_uid(nix::unistd::Uid::current())
>          .unwrap()
>          .unwrap();
> -    let chunk_store =
> -        ChunkStore::create("test", &path, user.uid, user.gid, DatastoreFSyncLevel::None).unwrap();
> +    let chunk_store = ChunkStore::create(
> +        "test",
> +        &path,
> +        user.uid,
> +        user.gid,
> +        DatastoreFSyncLevel::None,
> +        DatastoreBackendType::Filesystem,
> +    )
> +    .unwrap();
>  
>      let (chunk, digest) = crate::data_blob::DataChunkBuilder::new(&[0u8, 1u8])
>          .build()
> @@ -792,8 +866,14 @@ fn test_chunk_store1() {
>      let (exists, _) = chunk_store.insert_chunk(&chunk, &digest).unwrap();
>      assert!(exists);
>  
> -    let chunk_store =
> -        ChunkStore::create("test", &path, user.uid, user.gid, DatastoreFSyncLevel::None);
> +    let chunk_store = ChunkStore::create(
> +        "test",
> +        &path,
> +        user.uid,
> +        user.gid,
> +        DatastoreFSyncLevel::None,
> +        DatastoreBackendType::Filesystem,
> +    );
>      assert!(chunk_store.is_err());
>  
>      if let Err(_e) = std::fs::remove_dir_all(".testdir") { /* ignore */ }
> diff --git a/pbs-datastore/src/datastore.rs b/pbs-datastore/src/datastore.rs
> index 4f55eb9db..58fb863ec 100644
> --- a/pbs-datastore/src/datastore.rs
> +++ b/pbs-datastore/src/datastore.rs
> @@ -348,10 +348,14 @@ impl DataStore {
>                  DatastoreTuning::API_SCHEMA
>                      .parse_property_string(config.tuning.as_deref().unwrap_or(""))?,
>              )?;
> +            let backend_config: DatastoreBackendConfig =
> +                config.backend.as_deref().unwrap_or("").parse()?;
> +            let backend_type = backend_config.ty.unwrap_or_default();
>              Arc::new(ChunkStore::open(
>                  name,
>                  config.absolute_path(),
>                  tuning.sync_level.unwrap_or_default(),
> +                backend_type,
>              )?)
>          };
>  
> @@ -436,10 +440,16 @@ impl DataStore {
>              DatastoreTuning::API_SCHEMA
>                  .parse_property_string(config.tuning.as_deref().unwrap_or(""))?,
>          )?;
> +        let backend_config: DatastoreBackendConfig = serde_json::from_value(
> +            DatastoreBackendConfig::API_SCHEMA
> +                .parse_property_string(config.backend.as_deref().unwrap_or(""))?,
> +        )?;
> +        let backend_type = backend_config.ty.unwrap_or_default();
>          let chunk_store = ChunkStore::open(
>              &name,
>              config.absolute_path(),
>              tuning.sync_level.unwrap_or_default(),
> +            backend_type,
>          )?;
>          let inner = Arc::new(Self::with_store_and_config(
>              Arc::new(chunk_store),
> @@ -1663,6 +1673,15 @@ impl DataStore {
>                      let atime = match std::fs::metadata(&chunk_path) {
>                          Ok(stat) => stat.accessed()?,
>                          Err(err) if err.kind() == std::io::ErrorKind::NotFound => {
> +                            if std::fs::metadata(
> +                                chunk_path
> +                                    .with_extension(crate::chunk_store::BACKEND_UPLOAD_MARKER_EXT),
> +                            )
> +                            .is_ok()
> +                            {
> +                                info!("keep in progress {}", content.key);

what does `keep in progress` mean?

> +                                continue;
> +                            }

this should call the new helper.

>                              // File not found, delete by setting atime to unix epoch
>                              info!("Not found, mark for deletion: {}", content.key);
>                              SystemTime::UNIX_EPOCH
> @@ -1867,6 +1886,19 @@ impl DataStore {
>          self.inner.chunk_store.insert_chunk(chunk, digest)
>      }
>  
> +    /// Inserts the marker file to signal an in progress upload to the backend
> +    ///
> +    /// The presence of the marker avoids a race between inserting the chunk into the
> +    /// datastore and cleanup of the chunk by garbage collection.
> +    pub fn insert_backend_upload_marker(&self, digest: &[u8; 32]) -> Result<(), Error> {
> +        self.inner.chunk_store.insert_backend_upload_marker(digest)
> +    }
> +
> +    /// Remove the marker file signaling an in-progress upload to the backend
> +    pub fn cleanup_backend_upload_marker(&self, digest: &[u8; 32]) -> Result<(), Error> {
> +        self.inner.chunk_store.cleanup_backend_upload_marker(digest)
> +    }
> +
>      pub fn stat_chunk(&self, digest: &[u8; 32]) -> Result<std::fs::Metadata, Error> {
>          let (chunk_path, _digest_str) = self.inner.chunk_store.chunk_path(digest);
>          std::fs::metadata(chunk_path).map_err(Error::from)
> diff --git a/src/api2/backup/upload_chunk.rs b/src/api2/backup/upload_chunk.rs
> index 8dd7e4d52..d4b1850eb 100644
> --- a/src/api2/backup/upload_chunk.rs
> +++ b/src/api2/backup/upload_chunk.rs
> @@ -259,6 +259,7 @@ async fn upload_to_backend(
>                      data.len()
>                  );
>              }
> +            let datastore = env.datastore.clone();
>  
>              if env.no_cache {
>                  let object_key = pbs_datastore::s3::object_key_from_digest(&digest)?;
> @@ -272,11 +273,11 @@ async fn upload_to_backend(
>              // Avoid re-upload to S3 if the chunk is either present in the LRU cache or the chunk
>              // file exists on filesystem. The latter means that the chunk has been present in the
>              // past an was not cleaned up by garbage collection, so contained in the S3 object store.
> -            if env.datastore.cache_contains(&digest) {
> +            if datastore.cache_contains(&digest) {
>                  tracing::info!("Skip upload of cached chunk {}", hex::encode(digest));
>                  return Ok((digest, size, encoded_size, true));
>              }
> -            if let Ok(true) = env.datastore.cond_touch_chunk(&digest, false) {
> +            if let Ok(true) = datastore.cond_touch_chunk(&digest, false) {
>                  tracing::info!(
>                      "Skip upload of already encountered chunk {}",
>                      hex::encode(digest)
> @@ -284,18 +285,24 @@ async fn upload_to_backend(
>                  return Ok((digest, size, encoded_size, true));
>              }
>  
> +            datastore.insert_backend_upload_marker(&digest)?;

what stops two concurrent uploads from ending up here, with the second
client then getting an error back when it attempts to insert the marker
file? in A, we only take the mutex for inserting the marker file itself..

there's a second issue - what if we end up calling
datastore.insert_backend_upload_marker here, it blocks on the mutex,
because another insertion is going on that already holds it..

then we end up creating the marker file here, even though the chunk is
already in the chunk store..

mutexes are not (guaranteed to be) fair, so we can have the situation
that two backup clients concurrently attempt to upload the same chunk,
with one of them succeeding across the whole

- insert marker
- upload to S3
- insert chunk and remove marker

process, with the other one blocked on obtaining the mutex for inserting
the marker file if there is enough contention on the chunk store lock

>              tracing::info!("Upload of new chunk {}", hex::encode(digest));
>              let object_key = pbs_datastore::s3::object_key_from_digest(&digest)?;

(side note - if this fails, then the marker file is left around..)

> -            let is_duplicate = s3_client
> +            let is_duplicate = match s3_client
>                  .upload_no_replace_with_retry(object_key, data.clone())
>                  .await
> -                .map_err(|err| format_err!("failed to upload chunk to s3 backend - {err:#}"))?;
> +            {
> +                Ok(is_duplicate) => is_duplicate,
> +                Err(err) => {
> +                    datastore.cleanup_backend_upload_marker(&digest)?;
> +                    bail!("failed to upload chunk to s3 backend - {err:#}");
> +                }
> +            };
>  
>              // Only insert the chunk into the cache after it has been successufuly uploaded.
>              // Although less performant than doing this in parallel, it is required for consisency
>              // since chunks are considered as present on the backend if the file exists in the local
>              // cache store.
> -            let datastore = env.datastore.clone();
>              tracing::info!("Caching of chunk {}", hex::encode(digest));
>              let _ = tokio::task::spawn_blocking(move || {
>                  let chunk = DataBlob::from_raw(data.to_vec())?;

and down below here, if the chunk was inserted by somebody else, the
cache_insert call on the next line here will end up calling insert_chunk
on the chunk_store, which won't clear the marker unless it actually
freshly inserts the chunk, which doesn't happen if it already exists..
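
i.e. the pre-existing fast path in insert_chunk() would also need to drop a
pending marker, something along these lines (sketch, re-using the snippet
from the patch):

    // chunk file already exists -> nothing to insert, but a pending upload
    // marker from this (or a concurrent) upload still has to be cleaned up
    if self.datastore_backend_type != DatastoreBackendType::Filesystem {
        let marker = chunk_path.with_extension(BACKEND_UPLOAD_MARKER_EXT);
        if let Err(err) = std::fs::remove_file(&marker) {
            if err.kind() != std::io::ErrorKind::NotFound {
                bail!("failed to cleanup backend upload marker: {err}");
            }
        }
    }
    // ... existing early return for the already-present chunk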

> diff --git a/src/api2/config/datastore.rs b/src/api2/config/datastore.rs
> index 3b03c0466..541bd0a04 100644
> --- a/src/api2/config/datastore.rs
> +++ b/src/api2/config/datastore.rs
> @@ -173,6 +173,7 @@ pub(crate) fn do_create_datastore(
>                  &datastore.name,
>                  &path,
>                  tuning.sync_level.unwrap_or_default(),
> +                backend_type,
>              )
>          })?
>      } else {
> @@ -204,6 +205,7 @@ pub(crate) fn do_create_datastore(
>              backup_user.uid,
>              backup_user.gid,
>              tuning.sync_level.unwrap_or_default(),
> +            backend_type,
>          )?
>      };
>  
> -- 
> 2.47.3
> 
> 
> 


_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [pbs-devel] [PATCH proxmox-backup 6/7] api: chunk upload: fix race with garbage collection for no-cache on s3
  2025-10-06 10:41 ` [pbs-devel] [PATCH proxmox-backup 6/7] api: chunk upload: fix race with garbage collection for no-cache on s3 Christian Ebner
@ 2025-10-06 13:18   ` Fabian Grünbichler
  0 siblings, 0 replies; 19+ messages in thread
From: Fabian Grünbichler @ 2025-10-06 13:18 UTC (permalink / raw)
  To: Proxmox Backup Server development discussion

On October 6, 2025 12:41 pm, Christian Ebner wrote:
> Chunks uploaded to the s3 backend are never inserted into the local
> datastore cache. The presence of the chunk marker file is however
> required for garbage collection to not cleanup the chunks. While the
> marker files are created during phase 1 of the garbage collection for
> indexed chunks, this is not the case for in progress backups with the
> no-cache flag set.
> 
> Therefore, mark chunks as in-progress while being uploaded just like
> for the regular mode with cache, but replace this with the zero-sized
> chunk marker file after upload finished to avoid incorrect garbage
> collection cleanup.
> 
> Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
> ---
>  pbs-datastore/src/chunk_store.rs | 13 +++++++++++++
>  pbs-datastore/src/datastore.rs   |  7 +++++++
>  src/api2/backup/upload_chunk.rs  | 12 ++++++++++--
>  3 files changed, 30 insertions(+), 2 deletions(-)
> 
> diff --git a/pbs-datastore/src/chunk_store.rs b/pbs-datastore/src/chunk_store.rs
> index 22efe4a32..7fd92b626 100644
> --- a/pbs-datastore/src/chunk_store.rs
> +++ b/pbs-datastore/src/chunk_store.rs
> @@ -594,6 +594,19 @@ impl ChunkStore {
>          Ok(())
>      }
>  
> +    pub(crate) fn persist_backend_upload_marker(&self, digest: &[u8; 32]) -> Result<(), Error> {
> +        if self.datastore_backend_type == DatastoreBackendType::Filesystem {
> +            bail!("cannot create backend upload marker, not a cache store");
> +        }
> +        let (marker_path, _digest_str) = self.chunk_backed_upload_marker_path(digest);
> +        let (chunk_path, digest_str) = self.chunk_path(digest);
> +        let _lock = self.mutex.lock();
> +
> +        std::fs::rename(marker_path, chunk_path).map_err(|err| {
> +            format_err!("persisting backup upload marker failed for {digest_str} - {err}")
> +        })
> +    }
> +
>      pub(crate) fn cleanup_backend_upload_marker(&self, digest: &[u8; 32]) -> Result<(), Error> {
>          if self.datastore_backend_type == DatastoreBackendType::Filesystem {
>              bail!("cannot cleanup backend upload marker, not a cache store");
> diff --git a/pbs-datastore/src/datastore.rs b/pbs-datastore/src/datastore.rs
> index 58fb863ec..8b0d4ab5c 100644
> --- a/pbs-datastore/src/datastore.rs
> +++ b/pbs-datastore/src/datastore.rs
> @@ -1894,6 +1894,13 @@ impl DataStore {
>          self.inner.chunk_store.insert_backend_upload_marker(digest)
>      }
>  
> +    /// Persist the backend upload marker to be a zero size chunk marker.
> +    ///
> +    /// Marks the chunk as present in the local store cache without inserting its payload.
> +    pub fn persist_backend_upload_marker(&self, digest: &[u8; 32]) -> Result<(), Error> {
> +        self.inner.chunk_store.persist_backend_upload_marker(digest)
> +    }
> +
>      /// Remove the marker file signaling an in-progress upload to the backend
>      pub fn cleanup_backend_upload_marker(&self, digest: &[u8; 32]) -> Result<(), Error> {
>          self.inner.chunk_store.cleanup_backend_upload_marker(digest)
> diff --git a/src/api2/backup/upload_chunk.rs b/src/api2/backup/upload_chunk.rs
> index d4b1850eb..35d873ebf 100644
> --- a/src/api2/backup/upload_chunk.rs
> +++ b/src/api2/backup/upload_chunk.rs
> @@ -263,10 +263,18 @@ async fn upload_to_backend(
>  
>              if env.no_cache {
>                  let object_key = pbs_datastore::s3::object_key_from_digest(&digest)?;
> -                let is_duplicate = s3_client
> +                env.datastore.insert_backend_upload_marker(&digest)?;

this has the same issue as patch #5 - if two clients attempt to upload
the same digest concurrently, then one of them will fail and abort the
backup..

> +                let is_duplicate = match s3_client
>                      .upload_no_replace_with_retry(object_key, data)
>                      .await
> -                    .map_err(|err| format_err!("failed to upload chunk to s3 backend - {err:#}"))?;
> +                {
> +                    Ok(is_duplicate) => is_duplicate,
> +                    Err(err) => {
> +                        datastore.cleanup_backend_upload_marker(&digest)?;
> +                        bail!("failed to upload chunk to s3 backend - {err:#}");
> +                    }
> +                };
> +                env.datastore.persist_backend_upload_marker(&digest)?;

and if this fails, the corresponding chunk can never be uploaded again..

>                  return Ok((digest, size, encoded_size, is_duplicate));
>              }
>  
> -- 
> 2.47.3
> 
> 
> 


_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [pbs-devel] [PATCH proxmox-backup 7/7] pull: guard chunk upload and only insert into cache after upload
  2025-10-06 10:41 ` [pbs-devel] [PATCH proxmox-backup 7/7] pull: guard chunk upload and only insert into cache after upload Christian Ebner
@ 2025-10-06 13:18   ` Fabian Grünbichler
  0 siblings, 0 replies; 19+ messages in thread
From: Fabian Grünbichler @ 2025-10-06 13:18 UTC (permalink / raw)
  To: Proxmox Backup Server development discussion

On October 6, 2025 12:41 pm, Christian Ebner wrote:
> Inserting the chunk into the local datastore cache leads to it being
> considered as present on the backend. The upload might however still
> fail, leading to missing chunks.
> 
> Therefore, only insert the chunk after the upload completed with
> success and guard the upload by the backend upload marker file to
> avoid races with garbage collection.
> 
> Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
> ---
>  src/server/pull.rs | 14 ++++++++++----
>  1 file changed, 10 insertions(+), 4 deletions(-)
> 
> diff --git a/src/server/pull.rs b/src/server/pull.rs
> index 08e6e6b64..639415107 100644
> --- a/src/server/pull.rs
> +++ b/src/server/pull.rs
> @@ -176,14 +176,20 @@ async fn pull_index_chunks<I: IndexFile>(
>                      if target2.cache_contains(&digest) {
>                          return Ok(());
>                      }
> -                    target2.cache_insert(&digest, &chunk)?;
>                      let data = chunk.raw_data().to_vec();
>                      let upload_data = hyper::body::Bytes::from(data);
>                      let object_key = pbs_datastore::s3::object_key_from_digest(&digest)?;
> -                    let _is_duplicate = proxmox_async::runtime::block_on(
> +                    target2.insert_backend_upload_marker(&digest)?;
> +                    match proxmox_async::runtime::block_on(
>                          s3_client.upload_no_replace_with_retry(object_key, upload_data),
> -                    )
> -                    .context("failed to upload chunk to s3 backend")?;
> +                    ) {
> +                        Ok(_is_duplicate) => (),
> +                        Err(err) => {
> +                            target2.cleanup_backend_upload_marker(&digest)?;
> +                            return Err(err.context("failed to upload chunk to s3 backend"));
> +                        }
> +                    }
> +                    target2.cache_insert(&digest, &chunk)?;

same issue as described in patch #5 exists here as well..

>                  }
>              }
>              Ok(())
> -- 
> 2.47.3
> 
> 
> 


_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [pbs-devel] [PATCH proxmox-backup 0/7] s3 store: fix issues with chunk s3 backend upload and cache eviction
  2025-10-06 10:41 [pbs-devel] [PATCH proxmox-backup 0/7] s3 store: fix issues with chunk s3 backend upload and cache eviction Christian Ebner
                   ` (6 preceding siblings ...)
  2025-10-06 10:41 ` [pbs-devel] [PATCH proxmox-backup 7/7] pull: guard chunk upload and only insert into cache after upload Christian Ebner
@ 2025-10-06 13:18 ` Fabian Grünbichler
  2025-10-08 15:22 ` [pbs-devel] superseded: " Christian Ebner
  8 siblings, 0 replies; 19+ messages in thread
From: Fabian Grünbichler @ 2025-10-06 13:18 UTC (permalink / raw)
  To: Proxmox Backup Server development discussion

On October 6, 2025 12:41 pm, Christian Ebner wrote:
> These patches fix 2 issues with the current s3 backend implementation
> and reduce code duplication.
> 
> Patch 1 to 3 rework the garbage collection, deduplicating the common atime
> check and garbage collection status update logic and refactor the chunk removal
> from local datastore cache on s3 backends.
> 
> Patch 4 fixes an issue which could lead to backup restores failing
> during concurrent backups, if the local datastore cache was small, leading
> to chunks being evicted from the cache by truncating the contents, while the
> chunk reader still accessed the chunk. This is now circumvented by replacing
> the chunk file instead, leaving the contents on the already opened file
> handle for the reader still accessible.

these parts here are not 100% there yet (see comments), but I think the
approach taken is fine in principle!

> The remaining patches fix a possible race condition between s3 backend upload
> and garbage collection, which can result in chunk loss. If the chunk upload
> finished, garbage collection listed and checked the chunk's in-use marker, just
> before it being written by the cache insert after the upload, garbage collection
> can incorrectly delete the chunk. This is circumvented by setting and checking
> an additional chunk marker file, which is created before starting the upload
> and removed after cache insert, assuring that these chunks are not removed.

see detailed comments on individual patches, but I think the current
approach of using marker files guarded by the existing chunk store mutex
has a lot of subtle pitfalls surrounding concurrent insertions and error
handling..

one potential alternative would be to use an flock instead, but that
might lead to scalability issues if there are lots of concurrent chunk
uploads..

maybe we could instead do the following (rough sketch after the list below):
- lock the mutex & touch the pending marker file before uploading the
  chunk (but allow it to already exist, as that might be a concurrent or
  older failed upload)
- clean it up when inserting into the cache (but treat it not existing
  as benign?)
- in GC, lock and
  - check whether the chunk was properly inserted in the meantime (then
    we know it's fresh and mustn't be cleaned up)
  - if not, check the atime of the pending marker (if it is before the
    cutoff then it must be from an already aborted backup that we don't
    care about, and we can remove the pending marker and the chunk on S3)
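
very rough sketch of the marker handling part of the above (names made up,
error handling and the S3 side elided):

    impl ChunkStore {
        /// create the pending upload marker, tolerating that it already
        /// exists (concurrent upload or leftover from an older, failed one)
        fn touch_pending_upload_marker(&self, digest: &[u8; 32]) -> Result<(), Error> {
            let (marker_path, _digest_str) = self.chunk_backed_upload_marker_path(digest);
            let _lock = self.mutex.lock();
            // note: create, not create_new, so a second client does not error
            // out; a real version would probably also have to bump the
            // timestamp so the GC cutoff check below works for long running
            // backups
            std::fs::OpenOptions::new()
                .write(true)
                .create(true)
                .open(&marker_path)?;
            Ok(())
        }
    }

    /// GC phase 2, for a listed chunk without a regular marker file: decide
    /// whether a pending upload marker protects it (sketch)
    fn keep_pending_chunk(marker_path: &std::path::Path, min_atime: i64) -> bool {
        use std::os::unix::fs::MetadataExt;
        match std::fs::metadata(marker_path) {
            // marker fresh enough -> upload/insert still in flight, keep it
            Ok(stat) => stat.atime() >= min_atime,
            // no marker either -> nothing protects the chunk, delete candidate
            Err(_) => false,
        }
    }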

and I think at this point we should have a single helper for inserting
an in-memory chunk into an S3-backed datastore, including the upload
handling.. while atm we don't support restoring from tape to S3
directly, once we add that it would be a third place where we'd need the
same/similar code and potentially introduce bugs related to handling all
this (backup API, pull sync are the existing two).
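
something with roughly this shape (only a sketch, name and signature made up;
it doesn't solve the concurrency questions above by itself, it just
deduplicates the call sites):

    /// Single entry point for inserting an in-memory chunk into an S3-backed
    /// datastore: marker handling, backend upload and cache insert in one place.
    pub async fn insert_chunk_on_s3_backend(
        datastore: std::sync::Arc<DataStore>,
        s3_client: std::sync::Arc<S3Client>,
        digest: [u8; 32],
        chunk: DataBlob,
    ) -> Result<(), Error> {
        if datastore.cache_contains(&digest) {
            // already present locally, so already uploaded to the backend
            return Ok(());
        }
        datastore.insert_backend_upload_marker(&digest)?; // or the "touch" variant
        let object_key = pbs_datastore::s3::object_key_from_digest(&digest)?;
        let data = hyper::body::Bytes::from(chunk.raw_data().to_vec());
        match s3_client.upload_no_replace_with_retry(object_key, data).await {
            Ok(_is_duplicate) => datastore.cache_insert(&digest, &chunk),
            Err(err) => {
                datastore.cleanup_backend_upload_marker(&digest)?;
                Err(err.context("failed to upload chunk to s3 backend"))
            }
        }
    }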

> proxmox-backup:
> 
> Christian Ebner (7):
>   datastore: gc: inline single callsite method
>   gc: chunk store: rework atime check and gc status into common helper
>   chunk store: add and use method to remove chunks
>   chunk store: fix: replace evicted cache chunks instead of truncate
>   api: chunk upload: fix race between chunk backend upload and insert
>   api: chunk upload: fix race with garbage collection for no-cache on s3
>   pull: guard chunk upload and only insert into cache after upload
> 
>  pbs-datastore/src/chunk_store.rs              | 211 +++++++++++++++---
>  pbs-datastore/src/datastore.rs                | 155 +++++++------
>  .../src/local_datastore_lru_cache.rs          |  32 +--
>  src/api2/backup/upload_chunk.rs               |  29 ++-
>  src/api2/config/datastore.rs                  |   2 +
>  src/server/pull.rs                            |  14 +-
>  6 files changed, 299 insertions(+), 144 deletions(-)
> 
> 
> Summary over all repositories:
>   6 files changed, 299 insertions(+), 144 deletions(-)
> 
> -- 
> Generated by git-murpp 0.8.1
> 
> 


_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [pbs-devel] [PATCH proxmox-backup 4/7] chunk store: fix: replace evicted cache chunks instead of truncate
  2025-10-06 13:18   ` Fabian Grünbichler
@ 2025-10-06 15:35     ` Christian Ebner
  2025-10-06 16:14       ` Christian Ebner
  0 siblings, 1 reply; 19+ messages in thread
From: Christian Ebner @ 2025-10-06 15:35 UTC (permalink / raw)
  To: Proxmox Backup Server development discussion, Fabian Grünbichler

On 10/6/25 3:18 PM, Fabian Grünbichler wrote:
> On October 6, 2025 12:41 pm, Christian Ebner wrote:
>> Evicted chunks have been truncated to size zero, keeping the chunk
>> file in place as in-use marker for the garbage collection but freeing
>> the chunk file contents. This can however lead to restores failing if
>> they already opened the chunk file for reading, as their contents are
>> now incomplete.
>>
>> Fix this by instead replacing the chunk file with a zero sized file,
>> leaving the contents accessible for the already opened chunk readers.
>>
>> By moving the logic from the local datastore cache to a helper method
>> on the chunk store, it is also assured that the operation is guarded
>> by the chunk store mutex lock to avoid races with chunk re-insert.
> 
> AFAICT this is still racy even after this patch, because in cache.access
> after a cache miss (either no local file, or empty local file) we fetch
> the chunk from S3, insert it into the chunk store (cache), but then
> instead of returning the chunk from the already in-memory chunk data, we
> load it again from the path - without holding the lock that prevents the
> chunk from being evicted again.. while unlikely to be hit in practice,
> this is still wasteful because we could save a round-trip and just
> return the chunk we've already constructed from the S3 response and hit
> two birds with one stone?

Good catch! Indeed we can serve the response from the already in-memory 
data in that case.

> 
>>
>> Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
>> ---
>>   pbs-datastore/src/chunk_store.rs              | 20 +++++++++++++++
>>   .../src/local_datastore_lru_cache.rs          | 25 +++----------------
>>   2 files changed, 23 insertions(+), 22 deletions(-)
>>
>> diff --git a/pbs-datastore/src/chunk_store.rs b/pbs-datastore/src/chunk_store.rs
>> index 010785fbc..74fa79db1 100644
>> --- a/pbs-datastore/src/chunk_store.rs
>> +++ b/pbs-datastore/src/chunk_store.rs
>> @@ -668,6 +668,26 @@ impl ChunkStore {
>>           (chunk_path, digest_str)
>>       }
>>   
>> +    /// Replace a chunk file with a zero size file in the chunk store.
>> +    ///
>> +    /// Used to evict chunks from the local datastore cache, while keeping them as in-use markers
>> +    /// for garbage collection. Returns with success also if chunk file is not pre-existing.
>> +    pub fn clear_chunk(&self, digest: &[u8; 32]) -> Result<(), Error> {
>> +        let (chunk_path, digest_str) = self.chunk_path(digest);
>> +        let mut create_options = CreateOptions::new();
>> +        if nix::unistd::Uid::effective().is_root() {
>> +            let uid = pbs_config::backup_user()?.uid;
>> +            let gid = pbs_config::backup_group()?.gid;
>> +            create_options = create_options.owner(uid).group(gid);
>> +        }
>> +
>> +        let _lock = self.mutex.lock();
>> +
>> +        proxmox_sys::fs::replace_file(&chunk_path, &[], create_options, false)
>> +            .map_err(|err| format_err!("clear chunk failed for {digest_str} - {err}"))?;
>> +        Ok(())
>> +    }
>> +
>>       pub fn relative_path(&self, path: &Path) -> PathBuf {
>>           // unwrap: only `None` in unit tests
>>           assert!(self.locker.is_some());
>> diff --git a/pbs-datastore/src/local_datastore_lru_cache.rs b/pbs-datastore/src/local_datastore_lru_cache.rs
>> index 1d2e87cb9..6f950f4b3 100644
>> --- a/pbs-datastore/src/local_datastore_lru_cache.rs
>> +++ b/pbs-datastore/src/local_datastore_lru_cache.rs
>> @@ -71,17 +71,8 @@ impl LocalDatastoreLruCache {
>>       /// Fails if the chunk cannot be inserted successfully.
>>       pub fn insert(&self, digest: &[u8; 32], chunk: &DataBlob) -> Result<(), Error> {
>>           self.store.insert_chunk(chunk, digest)?;
>> -        self.cache.insert(*digest, (), |digest| {
>> -            let (path, _digest_str) = self.store.chunk_path(&digest);
>> -            // Truncate to free up space but keep the inode around, since that
>> -            // is used as marker for chunks in use by garbage collection.
>> -            if let Err(err) = nix::unistd::truncate(&path, 0) {
>> -                if err != nix::errno::Errno::ENOENT {
>> -                    return Err(Error::from(err));
>> -                }
>> -            }
>> -            Ok(())
>> -        })
>> +        self.cache
>> +            .insert(*digest, (), |digest| self.store.clear_chunk(&digest))
>>       }
>>   
>>       /// Remove a chunk from the local datastore cache.
>> @@ -104,17 +95,7 @@ impl LocalDatastoreLruCache {
>>       ) -> Result<Option<DataBlob>, Error> {
>>           if self
>>               .cache
>> -            .access(*digest, cacher, |digest| {
>> -                let (path, _digest_str) = self.store.chunk_path(&digest);
>> -                // Truncate to free up space but keep the inode around, since that
>> -                // is used as marker for chunks in use by garbage collection.
>> -                if let Err(err) = nix::unistd::truncate(&path, 0) {
>> -                    if err != nix::errno::Errno::ENOENT {
>> -                        return Err(Error::from(err));
>> -                    }
>> -                }
>> -                Ok(())
>> -            })
>> +            .access(*digest, cacher, |digest| self.store.clear_chunk(&digest))
>>               .await?
>>               .is_some()
>>           {
>> -- 
>> 2.47.3
>>
>>
>>
> 
> 



_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [pbs-devel] [PATCH proxmox-backup 4/7] chunk store: fix: replace evicted cache chunks instead of truncate
  2025-10-06 15:35     ` Christian Ebner
@ 2025-10-06 16:14       ` Christian Ebner
  0 siblings, 0 replies; 19+ messages in thread
From: Christian Ebner @ 2025-10-06 16:14 UTC (permalink / raw)
  To: Proxmox Backup Server development discussion, Fabian Grünbichler

On 10/6/25 5:35 PM, Christian Ebner wrote:
> On 10/6/25 3:18 PM, Fabian Grünbichler wrote:
>> On October 6, 2025 12:41 pm, Christian Ebner wrote:
>>> Evicted chunks have been truncated to size zero, keeping the chunk
>>> file in place as in-use marker for the garbage collection but freeing
>>> the chunk file contents. This can however lead to restores failing if
>>> they already opened the chunk file for reading, as their contents are
>>> now incomplete.
>>>
>>> Fix this by instead replacing the chunk file with a zero sized file,
>>> leaving the contents accessible for the already opened chunk readers.
>>>
>>> By moving the logic from the local datastore cache to a helper method
>>> on the chunk store, it is also assured that the operation is guarded
>>> by the chunk store mutex lock to avoid races with chunk re-insert.
>>
>> AFAICT this is still racy even after this patch, because in cache.access
>> after a cache miss (either no local file, or empty local file) we fetch
>> the chunk from S3, insert it into the chunk store (cache), but then
>> instead of returning the chunk from the already in-memory chunk data, we
>> load it again from the path - without holding the lock that prevents the
>> chunk from being evicted again.. while unlikely to be hit in practice,
>> this is still wasteful because we could save a round-trip and just
>> return the chunk we've already constructed from the S3 response and hit
>> two birds with one stone?
> 
> Good catch! Indeed we can serve the response from the already in-memory 
> data in that case.

Actually it is more subtle, as I cannot pass the response from the 
fetcher along to the access implementation of the 
LocalDatastoreLruCache. But I will rewrite the implementation for that,
making the fetcher basically a dummy and moving the insert logic to the 
access method instead.


_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [pbs-devel] [PATCH proxmox-backup 5/7] api: chunk upload: fix race between chunk backend upload and insert
  2025-10-06 13:18   ` Fabian Grünbichler
@ 2025-10-07 10:15     ` Christian Ebner
  0 siblings, 0 replies; 19+ messages in thread
From: Christian Ebner @ 2025-10-07 10:15 UTC (permalink / raw)
  To: Proxmox Backup Server development discussion, Fabian Grünbichler

On 10/6/25 3:17 PM, Fabian Grünbichler wrote:
> On October 6, 2025 12:41 pm, Christian Ebner wrote:
>> Chunk are first uploaded to the object store for S3 backed
>> datastores, only then inserted into the local datastore cache.
>> By this, it is assured that the chunk is only ever considered as
>> valid after successful upload. Garbage collection does however rely
>> on the local marker file to be present, only then considering the chunk
>> as in-use. While this marker is created if not present during phase 1
>> of garbage collection, this only happens for chunks which are already
>> referenced by a complete index file.
>>
>> Therefore, there remains a race window between garbage collection
>> listing the chunks (upload completed) and lookup of the local marker
>> file being present (chunk cache insert after upload). This can lead to
>> chunks which just finished upload, but were not yet inserted into the
>> local cache store to be removed again.
>>
>> To close this race window, mark chunks which are currently being
>> uploaded to the backend by an additional marker file, checked by the
>> garbage collection as well if the regular marker is not found, and
>> only removed after successful chunk insert.
>>
>> Avoid this overhead for regular datastores by passing and checking
>> the backend type for the chunk store.
>>
>> Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
>> ---
>>   pbs-datastore/src/chunk_store.rs | 94 +++++++++++++++++++++++++++++---
>>   pbs-datastore/src/datastore.rs   | 32 +++++++++++
>>   src/api2/backup/upload_chunk.rs  | 17 ++++--
>>   src/api2/config/datastore.rs     |  2 +
>>   4 files changed, 133 insertions(+), 12 deletions(-)
>>
>> diff --git a/pbs-datastore/src/chunk_store.rs b/pbs-datastore/src/chunk_store.rs
>> index 74fa79db1..22efe4a32 100644
>> --- a/pbs-datastore/src/chunk_store.rs
>> +++ b/pbs-datastore/src/chunk_store.rs
>> @@ -7,7 +7,7 @@ use std::time::Duration;
>>   use anyhow::{bail, format_err, Context, Error};
>>   use tracing::{info, warn};
>>   
>> -use pbs_api_types::{DatastoreFSyncLevel, GarbageCollectionStatus};
>> +use pbs_api_types::{DatastoreBackendType, DatastoreFSyncLevel, GarbageCollectionStatus};
>>   use proxmox_io::ReadExt;
>>   use proxmox_s3_client::S3Client;
>>   use proxmox_sys::fs::{create_dir, create_path, file_type_from_file_stat, CreateOptions};
>> @@ -22,6 +22,10 @@ use crate::file_formats::{
>>   };
>>   use crate::DataBlob;
>>   
>> +/// Filename extension for local datastore cache marker files indicating
>> +/// the chunk being uploaded to the backend
>> +pub(crate) const BACKEND_UPLOAD_MARKER_EXT: &str = "backend-upload";
> 
> I think this part here should be internal to the chunk store
> 
>> +
>>   /// File system based chunk store
>>   pub struct ChunkStore {
>>       name: String, // used for error reporting
>> @@ -30,6 +34,7 @@ pub struct ChunkStore {
>>       mutex: Mutex<()>,
>>       locker: Option<Arc<Mutex<ProcessLocker>>>,
>>       sync_level: DatastoreFSyncLevel,
>> +    datastore_backend_type: DatastoreBackendType,
>>   }
>>   
>>   // TODO: what about sysctl setting vm.vfs_cache_pressure (0 - 100) ?
>> @@ -77,6 +82,7 @@ impl ChunkStore {
>>               mutex: Mutex::new(()),
>>               locker: None,
>>               sync_level: Default::default(),
>> +            datastore_backend_type: DatastoreBackendType::default(),
>>           }
>>       }
>>   
>> @@ -97,6 +103,7 @@ impl ChunkStore {
>>           uid: nix::unistd::Uid,
>>           gid: nix::unistd::Gid,
>>           sync_level: DatastoreFSyncLevel,
>> +        datastore_backend_type: DatastoreBackendType,
>>       ) -> Result<Self, Error>
>>       where
>>           P: Into<PathBuf>,
>> @@ -151,7 +158,7 @@ impl ChunkStore {
>>               }
>>           }
>>   
>> -        Self::open(name, base, sync_level)
>> +        Self::open(name, base, sync_level, datastore_backend_type)
>>       }
>>   
>>       fn lockfile_path<P: Into<PathBuf>>(base: P) -> PathBuf {
>> @@ -185,6 +192,7 @@ impl ChunkStore {
>>           name: &str,
>>           base: P,
>>           sync_level: DatastoreFSyncLevel,
>> +        datastore_backend_type: DatastoreBackendType,
>>       ) -> Result<Self, Error> {
>>           let base: PathBuf = base.into();
>>   
>> @@ -201,6 +209,7 @@ impl ChunkStore {
>>               locker: Some(locker),
>>               mutex: Mutex::new(()),
>>               sync_level,
>> +            datastore_backend_type,
>>           })
>>       }
>>   
>> @@ -567,6 +576,40 @@ impl ChunkStore {
>>           Ok(())
>>       }
>>   
>> +    pub(crate) fn insert_backend_upload_marker(&self, digest: &[u8; 32]) -> Result<(), Error> {
>> +        if self.datastore_backend_type == DatastoreBackendType::Filesystem {
>> +            bail!("cannot create backend upload marker, not a cache store");
>> +        }
>> +        let (marker_path, digest_str) = self.chunk_backed_upload_marker_path(digest);
>> +
>> +        let _lock = self.mutex.lock();
>> +
>> +        std::fs::File::options()
>> +            .write(true)
>> +            .create_new(true)
>> +            .open(&marker_path)
>> +            .with_context(|| {
>> +                format!("failed to create backend upload marker for chunk {digest_str}")
> 
> A: reference for comment further below in upload_chunk
> 
>> +            })?;
>> +        Ok(())
>> +    }
>> +
>> +    pub(crate) fn cleanup_backend_upload_marker(&self, digest: &[u8; 32]) -> Result<(), Error> {
>> +        if self.datastore_backend_type == DatastoreBackendType::Filesystem {
>> +            bail!("cannot cleanup backend upload marker, not a cache store");
>> +        }
>> +        let (marker_path, _digest_str) = self.chunk_backed_upload_marker_path(digest);
>> +
>> +        let _lock = self.mutex.lock();
>> +
>> +        if let Err(err) = std::fs::remove_file(marker_path) {
>> +            if err.kind() != std::io::ErrorKind::NotFound {
>> +                bail!("failed to cleanup backend upload marker: {err}");
>> +            }
>> +        }
>> +        Ok(())
>> +    }
>> +
> 
> and instead we should add a third helper here that allows to query
> whether there exists a pending upload marker for a certain digest?
> 
>>       pub fn insert_chunk(&self, chunk: &DataBlob, digest: &[u8; 32]) -> Result<(bool, u64), Error> {
>>           // unwrap: only `None` in unit tests
>>           assert!(self.locker.is_some());
>> @@ -644,6 +687,16 @@ impl ChunkStore {
>>               format_err!("inserting chunk on store '{name}' failed for {digest_str} - {err}")
>>           })?;
>>   
>> +        if self.datastore_backend_type != DatastoreBackendType::Filesystem {
>> +            if let Err(err) =
>> +                std::fs::remove_file(chunk_path.with_extension(BACKEND_UPLOAD_MARKER_EXT))
>> +            {
>> +                if err.kind() != std::io::ErrorKind::NotFound {
>> +                    bail!("failed to cleanup backend upload marker: {err}");
>> +                }
>> +            }
>> +        }
>> +
>>           if self.sync_level == DatastoreFSyncLevel::File {
>>               // fsync dir handle to persist the tmp rename
>>               let dir = std::fs::File::open(chunk_dir_path)?;
>> @@ -688,6 +741,15 @@ impl ChunkStore {
>>           Ok(())
>>       }
>>   
>> +    /// Generate file path for backend upload marker of given chunk digest.
>> +    pub(crate) fn chunk_backed_upload_marker_path(&self, digest: &[u8; 32]) -> (PathBuf, String) {
>> +        let (chunk_path, digest_str) = self.chunk_path(digest);
>> +        (
>> +            chunk_path.with_extension(BACKEND_UPLOAD_MARKER_EXT),
>> +            digest_str,
>> +        )
>> +    }
> 
> this should be internal as well
> 
>> +
>>       pub fn relative_path(&self, path: &Path) -> PathBuf {
>>           // unwrap: only `None` in unit tests
>>           assert!(self.locker.is_some());
>> @@ -773,14 +835,26 @@ fn test_chunk_store1() {
>>   
>>       if let Err(_e) = std::fs::remove_dir_all(".testdir") { /* ignore */ }
>>   
>> -    let chunk_store = ChunkStore::open("test", &path, DatastoreFSyncLevel::None);
>> +    let chunk_store = ChunkStore::open(
>> +        "test",
>> +        &path,
>> +        DatastoreFSyncLevel::None,
>> +        DatastoreBackendType::Filesystem,
>> +    );
>>       assert!(chunk_store.is_err());
>>   
>>       let user = nix::unistd::User::from_uid(nix::unistd::Uid::current())
>>           .unwrap()
>>           .unwrap();
>> -    let chunk_store =
>> -        ChunkStore::create("test", &path, user.uid, user.gid, DatastoreFSyncLevel::None).unwrap();
>> +    let chunk_store = ChunkStore::create(
>> +        "test",
>> +        &path,
>> +        user.uid,
>> +        user.gid,
>> +        DatastoreFSyncLevel::None,
>> +        DatastoreBackendType::Filesystem,
>> +    )
>> +    .unwrap();
>>   
>>       let (chunk, digest) = crate::data_blob::DataChunkBuilder::new(&[0u8, 1u8])
>>           .build()
>> @@ -792,8 +866,14 @@ fn test_chunk_store1() {
>>       let (exists, _) = chunk_store.insert_chunk(&chunk, &digest).unwrap();
>>       assert!(exists);
>>   
>> -    let chunk_store =
>> -        ChunkStore::create("test", &path, user.uid, user.gid, DatastoreFSyncLevel::None);
>> +    let chunk_store = ChunkStore::create(
>> +        "test",
>> +        &path,
>> +        user.uid,
>> +        user.gid,
>> +        DatastoreFSyncLevel::None,
>> +        DatastoreBackendType::Filesystem,
>> +    );
>>       assert!(chunk_store.is_err());
>>   
>>       if let Err(_e) = std::fs::remove_dir_all(".testdir") { /* ignore */ }
>> diff --git a/pbs-datastore/src/datastore.rs b/pbs-datastore/src/datastore.rs
>> index 4f55eb9db..58fb863ec 100644
>> --- a/pbs-datastore/src/datastore.rs
>> +++ b/pbs-datastore/src/datastore.rs
>> @@ -348,10 +348,14 @@ impl DataStore {
>>                   DatastoreTuning::API_SCHEMA
>>                       .parse_property_string(config.tuning.as_deref().unwrap_or(""))?,
>>               )?;
>> +            let backend_config: DatastoreBackendConfig =
>> +                config.backend.as_deref().unwrap_or("").parse()?;
>> +            let backend_type = backend_config.ty.unwrap_or_default();
>>               Arc::new(ChunkStore::open(
>>                   name,
>>                   config.absolute_path(),
>>                   tuning.sync_level.unwrap_or_default(),
>> +                backend_type,
>>               )?)
>>           };
>>   
>> @@ -436,10 +440,16 @@ impl DataStore {
>>               DatastoreTuning::API_SCHEMA
>>                   .parse_property_string(config.tuning.as_deref().unwrap_or(""))?,
>>           )?;
>> +        let backend_config: DatastoreBackendConfig = serde_json::from_value(
>> +            DatastoreBackendConfig::API_SCHEMA
>> +                .parse_property_string(config.backend.as_deref().unwrap_or(""))?,
>> +        )?;
>> +        let backend_type = backend_config.ty.unwrap_or_default();
>>           let chunk_store = ChunkStore::open(
>>               &name,
>>               config.absolute_path(),
>>               tuning.sync_level.unwrap_or_default(),
>> +            backend_type,
>>           )?;
>>           let inner = Arc::new(Self::with_store_and_config(
>>               Arc::new(chunk_store),
>> @@ -1663,6 +1673,15 @@ impl DataStore {
>>                       let atime = match std::fs::metadata(&chunk_path) {
>>                           Ok(stat) => stat.accessed()?,
>>                           Err(err) if err.kind() == std::io::ErrorKind::NotFound => {
>> +                            if std::fs::metadata(
>> +                                chunk_path
>> +                                    .with_extension(crate::chunk_store::BACKEND_UPLOAD_MARKER_EXT),
>> +                            )
>> +                            .is_ok()
>> +                            {
>> +                                info!("keep in progress {}", content.key);
> 
> what does `keep in progress` mean?
> 
>> +                                continue;
>> +                            }
> 
> this should call the new helper.
> 
>>                               // File not found, delete by setting atime to unix epoch
>>                               info!("Not found, mark for deletion: {}", content.key);
>>                               SystemTime::UNIX_EPOCH
>> @@ -1867,6 +1886,19 @@ impl DataStore {
>>           self.inner.chunk_store.insert_chunk(chunk, digest)
>>       }
>>   
>> +    /// Inserts the marker file to signal an in progress upload to the backend
>> +    ///
>> +    /// The presence of the marker avoids a race between inserting the chunk into the
>> +    /// datastore and cleanup of the chunk by garbage collection.
>> +    pub fn insert_backend_upload_marker(&self, digest: &[u8; 32]) -> Result<(), Error> {
>> +        self.inner.chunk_store.insert_backend_upload_marker(digest)
>> +    }
>> +
>> +    /// Remove the marker file signaling an in-progress upload to the backend
>> +    pub fn cleanup_backend_upload_marker(&self, digest: &[u8; 32]) -> Result<(), Error> {
>> +        self.inner.chunk_store.cleanup_backend_upload_marker(digest)
>> +    }
>> +
>>       pub fn stat_chunk(&self, digest: &[u8; 32]) -> Result<std::fs::Metadata, Error> {
>>           let (chunk_path, _digest_str) = self.inner.chunk_store.chunk_path(digest);
>>           std::fs::metadata(chunk_path).map_err(Error::from)
>> diff --git a/src/api2/backup/upload_chunk.rs b/src/api2/backup/upload_chunk.rs
>> index 8dd7e4d52..d4b1850eb 100644
>> --- a/src/api2/backup/upload_chunk.rs
>> +++ b/src/api2/backup/upload_chunk.rs
>> @@ -259,6 +259,7 @@ async fn upload_to_backend(
>>                       data.len()
>>                   );
>>               }
>> +            let datastore = env.datastore.clone();
>>   
>>               if env.no_cache {
>>                   let object_key = pbs_datastore::s3::object_key_from_digest(&digest)?;
>> @@ -272,11 +273,11 @@ async fn upload_to_backend(
>>               // Avoid re-upload to S3 if the chunk is either present in the LRU cache or the chunk
>>               // file exists on filesystem. The latter means that the chunk has been present in the
>>               // past an was not cleaned up by garbage collection, so contained in the S3 object store.
>> -            if env.datastore.cache_contains(&digest) {
>> +            if datastore.cache_contains(&digest) {
>>                   tracing::info!("Skip upload of cached chunk {}", hex::encode(digest));
>>                   return Ok((digest, size, encoded_size, true));
>>               }
>> -            if let Ok(true) = env.datastore.cond_touch_chunk(&digest, false) {
>> +            if let Ok(true) = datastore.cond_touch_chunk(&digest, false) {
>>                   tracing::info!(
>>                       "Skip upload of already encountered chunk {}",
>>                       hex::encode(digest)
>> @@ -284,18 +285,24 @@ async fn upload_to_backend(
>>                   return Ok((digest, size, encoded_size, true));
>>               }
>>   
>> +            datastore.insert_backend_upload_marker(&digest)?;
> 
> what stops two concurrent uploads from ending up here and making this
> return an error for the second client to end up attempting to insert the
> marker file? in A, we only take the mutex for inserting the marker file
> itself..

This issue is indeed rather problematic... One could back off and busy 
wait, or use an inotify watch on the marker file to get notified once 
the concurrent upload has finished (or failed) and the upload marker has 
been cleaned up. The problem with that approach is a lingering upload 
marker which never gets cleaned up, e.g. after a crash...

The better option is probably to not fail on duplicate marker creation 
at all: allow concurrent uploads to the s3 backend and sync up on 
insert, and do not fail on marker removal if the chunk file already 
exists because it was inserted by the concurrent upload. I still have to 
do some research on how the s3 backend handles such concurrent object 
uploads.
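
For the marker creation itself that could look something like the 
following (untested sketch based on the helper above, error handling 
abbreviated):

pub(crate) fn insert_backend_upload_marker(&self, digest: &[u8; 32]) -> Result<(), Error> {
    if self.datastore_backend_type == DatastoreBackendType::Filesystem {
        bail!("cannot create backend upload marker, not a cache store");
    }
    let (marker_path, digest_str) = self.chunk_backed_upload_marker_path(digest);

    let _lock = self.mutex.lock();

    if let Err(err) = std::fs::File::options()
        .write(true)
        .create_new(true)
        .open(&marker_path)
    {
        if err.kind() != std::io::ErrorKind::AlreadyExists {
            return Err(err).with_context(|| {
                format!("failed to create backend upload marker for chunk {digest_str}")
            });
        }
        // marker already present: a concurrent upload of the same chunk
        // is in flight; both are no-replace uploads of identical content
        // and sync up again on the cache insert, so this is not an error
    }
    Ok(())
}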

> 
> there's a second issue - what if we end up calling
> datastore.insert_backend_upload_marker here, it blocks on the mutex,
> because another insertion is going on that already holds it..
> 
> then we end up creating the marker file here, even though the chunk is
> already in the chunk store..

True, but this is not as problematic, since one can re-check whether 
the chunk file already exists after acquiring the lock and before 
writing the marker, and short-circuit there, skipping the re-upload and 
re-insert.
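
E.g. something along these lines at the start of the helper (again just 
a sketch, the boolean return value telling the caller to skip the upload 
is hypothetical):

let _lock = self.mutex.lock();

// Re-check under the chunk store mutex: if the chunk file showed up in
// the meantime (inserted by a concurrent upload), there is nothing left
// to mark or upload.
let (chunk_path, _digest_str) = self.chunk_path(digest);
if chunk_path.exists() {
    return Ok(false);
}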

> 
> mutexes are not (guaranteed to be) fair, so we can have the situation
> that two backup clients concurrently attempt to upload the same chunk,
> with one of them succeeding across the whole
> 
> - insert marker
> - upload to S3
> - insert chunk and remove marker
> 
> process, with the other one blocked on obtaining the mutex for inserting
> the marker file if there is enough contention on the chunk store lock
> 
>>               tracing::info!("Upload of new chunk {}", hex::encode(digest));
>>               let object_key = pbs_datastore::s3::object_key_from_digest(&digest)?;
> 
> (side not - if this fails, then the marker file is left around..)

Moved the s3 key creation to be before the marker insertion, just like 
for the no-cache case, thanks.
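
So the cached branch now mirrors the no-cache branch, roughly:

// compute the object key first, so a failure here cannot leave a stale
// backend upload marker behind
let object_key = pbs_datastore::s3::object_key_from_digest(&digest)?;
datastore.insert_backend_upload_marker(&digest)?;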

>> -            let is_duplicate = s3_client
>> +            let is_duplicate = match s3_client
>>                   .upload_no_replace_with_retry(object_key, data.clone())
>>                   .await
>> -                .map_err(|err| format_err!("failed to upload chunk to s3 backend - {err:#}"))?;
>> +            {
>> +                Ok(is_duplicate) => is_duplicate,
>> +                Err(err) => {
>> +                    datastore.cleanup_backend_upload_marker(&digest)?;
>> +                    bail!("failed to upload chunk to s3 backend - {err:#}");
>> +                }
>> +            };
>>   
>>               // Only insert the chunk into the cache after it has been successufuly uploaded.
>>               // Although less performant than doing this in parallel, it is required for consisency
>>               // since chunks are considered as present on the backend if the file exists in the local
>>               // cache store.
>> -            let datastore = env.datastore.clone();
>>               tracing::info!("Caching of chunk {}", hex::encode(digest));
>>               let _ = tokio::task::spawn_blocking(move || {
>>                   let chunk = DataBlob::from_raw(data.to_vec())?;
> 
> and down below here, if the chunk was inserted by somebody else, the
> cache_insert call on the next line here will end up calling insert_chunk
> on the chunk_store, which won't clear the marker unless it actually
> freshly inserts the chunk, which doesn't happen if it already exists..
> 
>> diff --git a/src/api2/config/datastore.rs b/src/api2/config/datastore.rs
>> index 3b03c0466..541bd0a04 100644
>> --- a/src/api2/config/datastore.rs
>> +++ b/src/api2/config/datastore.rs
>> @@ -173,6 +173,7 @@ pub(crate) fn do_create_datastore(
>>                   &datastore.name,
>>                   &path,
>>                   tuning.sync_level.unwrap_or_default(),
>> +                backend_type,
>>               )
>>           })?
>>       } else {
>> @@ -204,6 +205,7 @@ pub(crate) fn do_create_datastore(
>>               backup_user.uid,
>>               backup_user.gid,
>>               tuning.sync_level.unwrap_or_default(),
>> +            backend_type,
>>           )?
>>       };
>>   
>> -- 
>> 2.47.3



_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [pbs-devel] superseded: [PATCH proxmox-backup 0/7] s3 store: fix issues with chunk s3 backend upload and cache eviction
  2025-10-06 10:41 [pbs-devel] [PATCH proxmox-backup 0/7] s3 store: fix issues with chunk s3 backend upload and cache eviction Christian Ebner
                   ` (7 preceding siblings ...)
  2025-10-06 13:18 ` [pbs-devel] [PATCH proxmox-backup 0/7] s3 store: fix issues with chunk s3 backend upload and cache eviction Fabian Grünbichler
@ 2025-10-08 15:22 ` Christian Ebner
  8 siblings, 0 replies; 19+ messages in thread
From: Christian Ebner @ 2025-10-08 15:22 UTC (permalink / raw)
  To: pbs-devel

superseded-by version 2:
https://lore.proxmox.com/pbs-devel/20251008152125.849216-1-c.ebner@proxmox.com/T/


_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel


^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2025-10-08 15:22 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-10-06 10:41 [pbs-devel] [PATCH proxmox-backup 0/7] s3 store: fix issues with chunk s3 backend upload and cache eviction Christian Ebner
2025-10-06 10:41 ` [pbs-devel] [PATCH proxmox-backup 1/7] datastore: gc: inline single callsite method Christian Ebner
2025-10-06 10:41 ` [pbs-devel] [PATCH proxmox-backup 2/7] gc: chunk store: rework atime check and gc status into common helper Christian Ebner
2025-10-06 13:14   ` Fabian Grünbichler
2025-10-06 10:41 ` [pbs-devel] [PATCH proxmox-backup 3/7] chunk store: add and use method to remove chunks Christian Ebner
2025-10-06 13:17   ` Fabian Grünbichler
2025-10-06 10:41 ` [pbs-devel] [PATCH proxmox-backup 4/7] chunk store: fix: replace evicted cache chunks instead of truncate Christian Ebner
2025-10-06 13:18   ` Fabian Grünbichler
2025-10-06 15:35     ` Christian Ebner
2025-10-06 16:14       ` Christian Ebner
2025-10-06 10:41 ` [pbs-devel] [PATCH proxmox-backup 5/7] api: chunk upload: fix race between chunk backend upload and insert Christian Ebner
2025-10-06 13:18   ` Fabian Grünbichler
2025-10-07 10:15     ` Christian Ebner
2025-10-06 10:41 ` [pbs-devel] [PATCH proxmox-backup 6/7] api: chunk upload: fix race with garbage collection for no-cache on s3 Christian Ebner
2025-10-06 13:18   ` Fabian Grünbichler
2025-10-06 10:41 ` [pbs-devel] [PATCH proxmox-backup 7/7] pull: guard chunk upload and only insert into cache after upload Christian Ebner
2025-10-06 13:18   ` Fabian Grünbichler
2025-10-06 13:18 ` [pbs-devel] [PATCH proxmox-backup 0/7] s3 store: fix issues with chunk s3 backend upload and cache eviction Fabian Grünbichler
2025-10-08 15:22 ` [pbs-devel] superseded: " Christian Ebner
