* [pbs-devel] [PATCH proxmox-backup v5 00/19] fix chunk upload/insert, rename corrupt chunks and GC race conditions for s3 backend
@ 2025-11-11 14:29 Christian Ebner
2025-11-11 14:29 ` [pbs-devel] [PATCH proxmox-backup v5 01/19] datastore: GC: drop overly verbose info message during s3 chunk sweep Christian Ebner
` (19 more replies)
0 siblings, 20 replies; 21+ messages in thread
From: Christian Ebner @ 2025-11-11 14:29 UTC (permalink / raw)
To: pbs-devel
These patches fix possible race conditions on datastores with s3 backend for
chunk insert, renaming of corrupt chunks during verification and cleanup during
garbage collection. Further, the patches ensure consistency between the chunk
marker file of the local datastore cache, the s3 object store and the in-memory
LRU cache during state changes caused by any of the above-mentioned operations.
Consistency is achieved by using a per-chunk file locking mechanism. File locks
are stored at the predefined location for datastore file locks, using the same
`.chunks/prefix/digest` folder layout for consistency and to keep readdir and
other fs operations performant.
The series also ensures that chunks which are removed from the local datastore
cache are dropped from its in-memory LRU cache as well, so that a consistent
state is reached. Further, bad chunks are now also touched during GC phase 1
for s3 backed datastores, and the creation of missing marker files is performed
conditionally, to avoid consistency issues.
Changes since version 4:
- Incorporated patches by Fabian for better handling of the chunk store mutex locking
- Add patches to fix missing marker file creation and keeping of bad chunks during
garbage collection for s3 backend
- Document locking order restrictions
Changes since version 3:
- Add patches to limit visibility of BackupDir and BackupGroup destroy
- Refactored s3 upload index helper
- Avoid unneeded double stat for GC phase 3 cleanups
Changes since version 2:
- Incorporate additional race fix as discussed in
https://lore.proxmox.com/pbs-devel/8ab74557-9592-43e7-8706-10fceaae31b7@proxmox.com/T/
and suggested offlist.
Changes since version 1 (thanks @Fabian for review):
- Fix lock inversion for renaming corrupt chunks.
- Inline the chunk lock helper, making it explicit and thereby avoiding calls
to the helper for regular datastores.
- Pass the backend to the add_blob datastore helper, so it can be reused for the
backup session and pull sync job.
- Move also the s3 index upload helper from the backup env to the datastore, and
reuse it for the sync job as well.
This patch series obsoletes two previous patch series with unfortunately
incomplete bugfix attempts found at:
- https://lore.proxmox.com/pbs-devel/8d711a20-b193-47a9-8f38-6ce800e6d0e8@proxmox.com/T/
- https://lore.proxmox.com/pbs-devel/20251015164008.975591-1-c.ebner@proxmox.com/T/
proxmox-backup:
Christian Ebner (16):
datastore: GC: drop overly verbose info message during s3 chunk sweep
chunk store: implement per-chunk file locking helper for s3 backend
datastore: acquire chunk store mutex lock when renaming corrupt chunk
datastore: get per-chunk file lock for chunk rename on s3 backend
fix #6961: datastore: verify: evict corrupt chunks from in-memory LRU
cache
datastore: add locking to protect against races on chunk insert for s3
GC: fix race with chunk upload/insert on s3 backends
chunk store: reduce exposure of clear_chunk() to crate only
chunk store: make chunk removal a helper method of the chunk store
GC: cleanup chunk markers from cache in phase 3 on s3 backends
GC: touch bad chunk files independent of backend type
GC: guard missing marker file insertion for s3 backed stores
GC: s3: track if a chunk marker file is missing since a bad chunk
chunk store: add helpers marking missing local chunk markers as
expected
GC: assure chunk exists on s3 store when creating missing chunk marker
datastore: document s3 backend specific locking restrictions
Fabian Grünbichler (3):
store: split insert_chunk into wrapper + unsafe locked implementation
store: cache: move Mutex acquire to cache insertion
chunk store: rename cache-specific helpers
pbs-datastore/src/backup_info.rs | 2 +-
pbs-datastore/src/chunk_store.rs | 124 ++++++++++++++--
pbs-datastore/src/datastore.rs | 140 ++++++++++++------
pbs-datastore/src/lib.rs | 13 ++
.../src/local_datastore_lru_cache.rs | 23 ++-
5 files changed, 236 insertions(+), 66 deletions(-)
Summary over all repositories:
5 files changed, 236 insertions(+), 66 deletions(-)
--
Generated by git-murpp 0.8.1
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
^ permalink raw reply [flat|nested] 21+ messages in thread
* [pbs-devel] [PATCH proxmox-backup v5 01/19] datastore: GC: drop overly verbose info message during s3 chunk sweep
2025-11-11 14:29 [pbs-devel] [PATCH proxmox-backup v5 00/19] fix chunk upload/insert, rename corrupt chunks and GC race conditions for s3 backend Christian Ebner
@ 2025-11-11 14:29 ` Christian Ebner
2025-11-11 14:29 ` [pbs-devel] [PATCH proxmox-backup v5 02/19] chunk store: implement per-chunk file locking helper for s3 backend Christian Ebner
` (18 subsequent siblings)
19 siblings, 0 replies; 21+ messages in thread
From: Christian Ebner @ 2025-11-11 14:29 UTC (permalink / raw)
To: pbs-devel
This message provides little additional information and is shown on
correct operation anyway, leaving it open to misinterpretation as an
error or warning. Drop it in favor of being less verbose and to avoid
potentially spamming the task log in case many chunks are removed; the
information remains visible in the final garbage collection stats
output.
Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
pbs-datastore/src/datastore.rs | 1 -
1 file changed, 1 deletion(-)
diff --git a/pbs-datastore/src/datastore.rs b/pbs-datastore/src/datastore.rs
index b9d4ed2d2..7e5c8b0f2 100644
--- a/pbs-datastore/src/datastore.rs
+++ b/pbs-datastore/src/datastore.rs
@@ -1674,7 +1674,6 @@ impl DataStore {
Ok(stat) => stat.accessed()?,
Err(err) if err.kind() == std::io::ErrorKind::NotFound => {
// File not found, delete by setting atime to unix epoch
- info!("Not found, mark for deletion: {}", content.key);
SystemTime::UNIX_EPOCH
}
Err(err) => return Err(err.into()),
--
2.47.3
* [pbs-devel] [PATCH proxmox-backup v5 02/19] chunk store: implement per-chunk file locking helper for s3 backend
2025-11-11 14:29 [pbs-devel] [PATCH proxmox-backup v5 00/19] fix chunk upload/insert, rename corrupt chunks and GC race conditions for s3 backend Christian Ebner
2025-11-11 14:29 ` [pbs-devel] [PATCH proxmox-backup v5 01/19] datastore: GC: drop overly verbose info message during s3 chunk sweep Christian Ebner
@ 2025-11-11 14:29 ` Christian Ebner
2025-11-11 14:29 ` [pbs-devel] [PATCH proxmox-backup v5 03/19] datastore: acquire chunk store mutex lock when renaming corrupt chunk Christian Ebner
` (17 subsequent siblings)
19 siblings, 0 replies; 21+ messages in thread
From: Christian Ebner @ 2025-11-11 14:29 UTC (permalink / raw)
To: pbs-devel
Adds chunk store helper methods to create per-chunk file locks. These
will be used to guard chunk operations on s3 backends to guarantee
exclusive access when performing cache and backend operations.
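The lock path layout used by the new helper can be sketched as follows. This is an illustrative stand-in, not the real pbs-datastore code: `digest_to_prefix`, the lock directory and the store name are hypothetical simplifications, assuming the usual four-hex-digit prefix directories of the `.chunks` layout.

```rust
use std::path::{Path, PathBuf};

// Stand-in for pbs-datastore's digest_to_prefix: the chunk store groups
// chunks into directories named after the first four hex digits of the digest.
fn digest_to_prefix(digest: &[u8]) -> String {
    format!("{:02x}{:02x}", digest[0], digest[1])
}

// Stand-in for the hex crate's encode().
fn hex_encode(digest: &[u8]) -> String {
    digest.iter().map(|b| format!("{:02x}", b)).collect()
}

// The per-chunk lock file mirrors the `.chunks/prefix/digest` layout of the
// datastore itself, rooted at the datastore lock directory.
fn chunk_lock_path(locks_dir: &Path, store_name: &str, digest: &[u8]) -> PathBuf {
    let mut lock_path = locks_dir.join(store_name);
    lock_path.push(".chunks");
    lock_path.push(digest_to_prefix(digest));
    lock_path.push(hex_encode(digest));
    lock_path
}
```

Reusing the `.chunks/prefix/digest` fan-out keeps the number of entries per lock directory bounded, so readdir on the lock tree stays cheap even for large stores.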
Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
pbs-datastore/src/backup_info.rs | 2 +-
pbs-datastore/src/chunk_store.rs | 26 ++++++++++++++++++++++++++
2 files changed, 27 insertions(+), 1 deletion(-)
diff --git a/pbs-datastore/src/backup_info.rs b/pbs-datastore/src/backup_info.rs
index 29ca6f901..859039cf4 100644
--- a/pbs-datastore/src/backup_info.rs
+++ b/pbs-datastore/src/backup_info.rs
@@ -939,7 +939,7 @@ fn lock_file_path_helper(ns: &BackupNamespace, path: PathBuf) -> PathBuf {
/// deletion.
///
/// It also creates the base directory for lock files.
-fn lock_helper<F>(
+pub(crate) fn lock_helper<F>(
store_name: &str,
path: &std::path::Path,
lock_fn: F,
diff --git a/pbs-datastore/src/chunk_store.rs b/pbs-datastore/src/chunk_store.rs
index b88a0a096..8261080aa 100644
--- a/pbs-datastore/src/chunk_store.rs
+++ b/pbs-datastore/src/chunk_store.rs
@@ -8,6 +8,7 @@ use anyhow::{bail, format_err, Context, Error};
use tracing::{info, warn};
use pbs_api_types::{DatastoreFSyncLevel, GarbageCollectionStatus};
+use pbs_config::BackupLockGuard;
use proxmox_io::ReadExt;
use proxmox_s3_client::S3Client;
use proxmox_sys::fs::{create_dir, create_path, file_type_from_file_stat, CreateOptions};
@@ -16,6 +17,7 @@ use proxmox_sys::process_locker::{
};
use proxmox_worker_task::WorkerTaskContext;
+use crate::backup_info::DATASTORE_LOCKS_DIR;
use crate::data_blob::DataChunkBuilder;
use crate::file_formats::{
COMPRESSED_BLOB_MAGIC_1_0, ENCRYPTED_BLOB_MAGIC_1_0, UNCOMPRESSED_BLOB_MAGIC_1_0,
@@ -776,6 +778,30 @@ impl ChunkStore {
ChunkStore::check_permissions(lockfile_path, 0o644)?;
Ok(())
}
+
+ /// Generates the path to the chunks lock file
+ pub(crate) fn chunk_lock_path(&self, digest: &[u8]) -> PathBuf {
+ let mut lock_path = Path::new(DATASTORE_LOCKS_DIR).join(self.name.clone());
+ let digest_str = hex::encode(digest);
+ lock_path.push(".chunks");
+ let prefix = digest_to_prefix(digest);
+ lock_path.push(&prefix);
+ lock_path.push(&digest_str);
+ lock_path
+ }
+
+ /// Get an exclusive lock on the chunks lock file
+ pub(crate) fn lock_chunk(
+ &self,
+ digest: &[u8],
+ timeout: Duration,
+ ) -> Result<BackupLockGuard, Error> {
+ let lock_path = self.chunk_lock_path(digest);
+ let guard = crate::backup_info::lock_helper(self.name(), &lock_path, |path| {
+ pbs_config::open_backup_lockfile(path, Some(timeout), true)
+ })?;
+ Ok(guard)
+ }
}
#[test]
--
2.47.3
* [pbs-devel] [PATCH proxmox-backup v5 03/19] datastore: acquire chunk store mutex lock when renaming corrupt chunk
2025-11-11 14:29 [pbs-devel] [PATCH proxmox-backup v5 00/19] fix chunk upload/insert, rename corrupt chunks and GC race conditions for s3 backend Christian Ebner
2025-11-11 14:29 ` [pbs-devel] [PATCH proxmox-backup v5 01/19] datastore: GC: drop overly verbose info message during s3 chunk sweep Christian Ebner
2025-11-11 14:29 ` [pbs-devel] [PATCH proxmox-backup v5 02/19] chunk store: implement per-chunk file locking helper for s3 backend Christian Ebner
@ 2025-11-11 14:29 ` Christian Ebner
2025-11-11 14:29 ` [pbs-devel] [PATCH proxmox-backup v5 04/19] datastore: get per-chunk file lock for chunk rename on s3 backend Christian Ebner
` (16 subsequent siblings)
19 siblings, 0 replies; 21+ messages in thread
From: Christian Ebner @ 2025-11-11 14:29 UTC (permalink / raw)
To: pbs-devel
When renaming a corrupt chunk in the chunk store, currently the chunk
store mutex lock is not held, leading to possible races with other
operations which hold the lock and therefore assume exclusive access
such as garbage collection or backup chunk inserts. This affects
both filesystem and S3 backends.
To fix the possible race, get the lock and rearrange the code such
that it is never held when entering async code for the s3 backend.
This does not yet solve the race and cache consistency for s3
backends, which is addressed by introducing per-chunk file locking
in subsequent patches.
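The rearranged control flow can be sketched roughly like this (illustrative types, not the real datastore API): the rename outcome is computed while the mutex guard is held, and the guard is explicitly dropped before any backend work, which in the real implementation is async s3 code, runs.

```rust
use std::sync::Mutex;

// Hypothetical, minimal stand-in for the chunk store.
struct Store {
    mutex: Mutex<()>,
}

// Do the local rename under the chunk store mutex, remember the outcome,
// then drop the guard *before* running the (potentially long) backend
// operation, so the sync mutex is never held across async code.
fn rename_then_backend<F>(store: &Store, backend_op: F) -> Result<&'static str, String>
where
    F: FnOnce() -> Result<(), String>,
{
    let guard = store.mutex.lock().unwrap();
    // ... local rename happens here, with exclusive access ...
    let result = Ok("renamed");
    drop(guard); // release before entering backend/async code

    backend_op()?; // e.g. s3 copy/delete, awaited in the real implementation
    result
}
```

Holding a `std::sync::Mutex` guard across an `.await` point would block the executor thread and can deadlock, which is why the guard's lifetime ends before the backend call.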
Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
pbs-datastore/src/datastore.rs | 16 +++++++++++-----
1 file changed, 11 insertions(+), 5 deletions(-)
diff --git a/pbs-datastore/src/datastore.rs b/pbs-datastore/src/datastore.rs
index 7e5c8b0f2..905d709e7 100644
--- a/pbs-datastore/src/datastore.rs
+++ b/pbs-datastore/src/datastore.rs
@@ -2596,6 +2596,8 @@ impl DataStore {
pub fn rename_corrupt_chunk(&self, digest: &[u8; 32]) -> Result<Option<PathBuf>, Error> {
let (path, digest_str) = self.chunk_path(digest);
+ let _lock = self.inner.chunk_store.mutex().lock().unwrap();
+
let mut counter = 0;
let mut new_path = path.clone();
loop {
@@ -2607,6 +2609,14 @@ impl DataStore {
}
}
+ let result = match std::fs::rename(&path, &new_path) {
+ Ok(_) => Ok(Some(new_path)),
+ Err(err) if err.kind() == std::io::ErrorKind::NotFound => Ok(None),
+ Err(err) => bail!("could not rename corrupt chunk {path:?} - {err}"),
+ };
+
+ drop(_lock);
+
let backend = self.backend().map_err(|err| {
format_err!(
"failed to get backend while trying to rename bad chunk: {digest_str} - {err}"
@@ -2637,10 +2647,6 @@ impl DataStore {
)?;
}
- match std::fs::rename(&path, &new_path) {
- Ok(_) => Ok(Some(new_path)),
- Err(err) if err.kind() == std::io::ErrorKind::NotFound => Ok(None),
- Err(err) => bail!("could not rename corrupt chunk {path:?} - {err}"),
- }
+ result
}
}
--
2.47.3
* [pbs-devel] [PATCH proxmox-backup v5 04/19] datastore: get per-chunk file lock for chunk rename on s3 backend
2025-11-11 14:29 [pbs-devel] [PATCH proxmox-backup v5 00/19] fix chunk upload/insert, rename corrupt chunks and GC race conditions for s3 backend Christian Ebner
` (2 preceding siblings ...)
2025-11-11 14:29 ` [pbs-devel] [PATCH proxmox-backup v5 03/19] datastore: acquire chunk store mutex lock when renaming corrupt chunk Christian Ebner
@ 2025-11-11 14:29 ` Christian Ebner
2025-11-11 14:29 ` [pbs-devel] [PATCH proxmox-backup v5 05/19] fix #6961: datastore: verify: evict corrupt chunks from in-memory LRU cache Christian Ebner
` (15 subsequent siblings)
19 siblings, 0 replies; 21+ messages in thread
From: Christian Ebner @ 2025-11-11 14:29 UTC (permalink / raw)
To: pbs-devel
To guarantee exclusive access during s3 object store and cache
operations, acquire the per-chunk file lock before renaming a chunk when
the datastore is backed by s3.
This does not yet cover locking for the GC and chunk insert, part
of subsequent changes.
Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
pbs-datastore/src/datastore.rs | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/pbs-datastore/src/datastore.rs b/pbs-datastore/src/datastore.rs
index 905d709e7..a3c0f32aa 100644
--- a/pbs-datastore/src/datastore.rs
+++ b/pbs-datastore/src/datastore.rs
@@ -56,6 +56,8 @@ pub const GROUP_OWNER_FILE_NAME: &str = "owner";
/// Filename for in-use marker stored on S3 object store backend
pub const S3_DATASTORE_IN_USE_MARKER: &str = ".in-use";
const NAMESPACE_MARKER_FILENAME: &str = ".namespace";
+// s3 put request times out after upload_size / 1 KiB/s, so about 2.3 hours for 8 MiB
+const CHUNK_LOCK_TIMEOUT: Duration = Duration::from_secs(3 * 60 * 60);
/// checks if auth_id is owner, or, if owner is a token, if
/// auth_id is the user of the token
@@ -2596,6 +2598,13 @@ impl DataStore {
pub fn rename_corrupt_chunk(&self, digest: &[u8; 32]) -> Result<Option<PathBuf>, Error> {
let (path, digest_str) = self.chunk_path(digest);
+ let _chunk_guard;
+ if let DatastoreBackendType::S3 = self.inner.backend_config.ty.unwrap_or_default() {
+ _chunk_guard = self
+ .inner
+ .chunk_store
+ .lock_chunk(digest, CHUNK_LOCK_TIMEOUT)?;
+ }
let _lock = self.inner.chunk_store.mutex().lock().unwrap();
let mut counter = 0;
--
2.47.3
* [pbs-devel] [PATCH proxmox-backup v5 05/19] fix #6961: datastore: verify: evict corrupt chunks from in-memory LRU cache
2025-11-11 14:29 [pbs-devel] [PATCH proxmox-backup v5 00/19] fix chunk upload/insert, rename corrupt chunks and GC race conditions for s3 backend Christian Ebner
` (3 preceding siblings ...)
2025-11-11 14:29 ` [pbs-devel] [PATCH proxmox-backup v5 04/19] datastore: get per-chunk file lock for chunk rename on s3 backend Christian Ebner
@ 2025-11-11 14:29 ` Christian Ebner
2025-11-11 14:29 ` [pbs-devel] [PATCH proxmox-backup v5 06/19] datastore: add locking to protect against races on chunk insert for s3 Christian Ebner
` (14 subsequent siblings)
19 siblings, 0 replies; 21+ messages in thread
From: Christian Ebner @ 2025-11-11 14:29 UTC (permalink / raw)
To: pbs-devel
Lock the chunk store on rename for fs backed datastores and take an
additional per-chunk file lock on the s3 backend.
Chunks detected as corrupt have been renamed on both the S3 backend
and the local datastore cache, but not evicted from the in-memory
cache containing the LRU chunk digests. This can lead to the chunks
being considered as already present if their digest is still cached,
and therefore not being re-inserted in the local store cache and S3
backend on backup upload.
Fix this by not only renaming the local datastore's chunk marker
file, but also removing it from the in-memory cache while holding the
chunk store mutex lock to exclude interference from concurrent chunk
inserts.
Reported-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
Fixes: https://bugzilla.proxmox.com/show_bug.cgi?id=6961
Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
pbs-datastore/src/datastore.rs | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/pbs-datastore/src/datastore.rs b/pbs-datastore/src/datastore.rs
index a3c0f32aa..5680382cd 100644
--- a/pbs-datastore/src/datastore.rs
+++ b/pbs-datastore/src/datastore.rs
@@ -2618,6 +2618,11 @@ impl DataStore {
}
}
+ if let Some(cache) = self.cache() {
+ // Locks are being held, so it is safe to call the method
+ let _ = unsafe { cache.remove(digest) };
+ }
+
let result = match std::fs::rename(&path, &new_path) {
Ok(_) => Ok(Some(new_path)),
Err(err) if err.kind() == std::io::ErrorKind::NotFound => Ok(None),
--
2.47.3
* [pbs-devel] [PATCH proxmox-backup v5 06/19] datastore: add locking to protect against races on chunk insert for s3
2025-11-11 14:29 [pbs-devel] [PATCH proxmox-backup v5 00/19] fix chunk upload/insert, rename corrupt chunks and GC race conditions for s3 backend Christian Ebner
` (4 preceding siblings ...)
2025-11-11 14:29 ` [pbs-devel] [PATCH proxmox-backup v5 05/19] fix #6961: datastore: verify: evict corrupt chunks from in-memory LRU cache Christian Ebner
@ 2025-11-11 14:29 ` Christian Ebner
2025-11-11 14:29 ` [pbs-devel] [PATCH proxmox-backup v5 07/19] GC: fix race with chunk upload/insert on s3 backends Christian Ebner
` (13 subsequent siblings)
19 siblings, 0 replies; 21+ messages in thread
From: Christian Ebner @ 2025-11-11 14:29 UTC (permalink / raw)
To: pbs-devel
Acquire the per-chunk file lock to get exclusive access to the chunk
on insert and make it possible to guarantee that s3 backend
operations and the local caches reflect the consistent state without
other operations such as verify chunk renaming or garbage collection
interfering.
Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
pbs-datastore/src/datastore.rs | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/pbs-datastore/src/datastore.rs b/pbs-datastore/src/datastore.rs
index 5680382cd..4654d4416 100644
--- a/pbs-datastore/src/datastore.rs
+++ b/pbs-datastore/src/datastore.rs
@@ -1913,6 +1913,10 @@ impl DataStore {
match backend {
DatastoreBackend::Filesystem => self.inner.chunk_store.insert_chunk(chunk, digest),
DatastoreBackend::S3(s3_client) => {
+ let _chunk_guard = self
+ .inner
+ .chunk_store
+ .lock_chunk(digest, CHUNK_LOCK_TIMEOUT)?;
let chunk_data: Bytes = chunk.raw_data().to_vec().into();
let chunk_size = chunk_data.len() as u64;
let object_key = crate::s3::object_key_from_digest(digest)?;
@@ -1933,6 +1937,10 @@ impl DataStore {
) -> Result<(bool, u64), Error> {
let chunk_data = chunk.raw_data();
let chunk_size = chunk_data.len() as u64;
+ let _chunk_guard = self
+ .inner
+ .chunk_store
+ .lock_chunk(digest, CHUNK_LOCK_TIMEOUT)?;
// Avoid re-upload to S3 if the chunk is either present in the in-memory LRU cache
// or the chunk marker file exists on filesystem. The latter means the chunk has
--
2.47.3
* [pbs-devel] [PATCH proxmox-backup v5 07/19] GC: fix race with chunk upload/insert on s3 backends
2025-11-11 14:29 [pbs-devel] [PATCH proxmox-backup v5 00/19] fix chunk upload/insert, rename corrupt chunks and GC race conditions for s3 backend Christian Ebner
` (5 preceding siblings ...)
2025-11-11 14:29 ` [pbs-devel] [PATCH proxmox-backup v5 06/19] datastore: add locking to protect against races on chunk insert for s3 Christian Ebner
@ 2025-11-11 14:29 ` Christian Ebner
2025-11-11 14:29 ` [pbs-devel] [PATCH proxmox-backup v5 08/19] chunk store: reduce exposure of clear_chunk() to crate only Christian Ebner
` (12 subsequent siblings)
19 siblings, 0 replies; 21+ messages in thread
From: Christian Ebner @ 2025-11-11 14:29 UTC (permalink / raw)
To: pbs-devel
The previous approach, relying solely on local marker files, was flawed
as it could not protect against all the possible races between chunk
upload to the s3 object store, insertion into the local datastore cache
and the in-memory LRU cache.
Since these operations are now protected by getting a per-chunk file
lock, use that to check if it is safe to remove the chunk and do so
in a consistent manner by holding the chunk lock guard until it is
actually removed from the s3 backend and the caches.
Since an error when removing the chunk from cache could lead to
inconsistencies, GC must now fail in that case.
The chunk store lock is now acquired for each chunk to follow the
required locking order to avoid deadlocks:
- per-chunk file lock
- chunk store mutex lock
- lru cache mutex lock
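A minimal sketch of that ordering discipline, with plain `Mutex`es standing in for the per-chunk file lock and the real cache internals (the names here are illustrative, not the actual pbs-datastore fields): every path that needs more than one of these locks must take them in the same fixed order, which rules out lock-order inversion deadlocks.

```rust
use std::sync::Mutex;

// Hypothetical stand-ins for the three locks involved.
struct Locks {
    per_chunk_file: Mutex<()>, // stands in for the per-chunk file lock
    chunk_store: Mutex<()>,    // chunk store mutex
    lru_cache: Mutex<()>,      // LRU cache internal mutex
}

// Acquire all three locks in the documented order, run `f`, then release
// in reverse order as the guards drop.
fn with_all_locks<R>(locks: &Locks, f: impl FnOnce() -> R) -> R {
    let _chunk = locks.per_chunk_file.lock().unwrap(); // 1. per-chunk file lock
    let _store = locks.chunk_store.lock().unwrap();    // 2. chunk store mutex
    let _cache = locks.lru_cache.lock().unwrap();      // 3. lru cache mutex
    f()
}
```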
Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
pbs-datastore/src/datastore.rs | 27 ++++++++++++++++++---------
1 file changed, 18 insertions(+), 9 deletions(-)
diff --git a/pbs-datastore/src/datastore.rs b/pbs-datastore/src/datastore.rs
index 4654d4416..13a33af1c 100644
--- a/pbs-datastore/src/datastore.rs
+++ b/pbs-datastore/src/datastore.rs
@@ -1662,14 +1662,19 @@ impl DataStore {
let mut delete_list = Vec::with_capacity(1000);
loop {
- let lock = self.inner.chunk_store.mutex().lock().unwrap();
-
for content in list_bucket_result.contents {
let (chunk_path, digest) = match self.chunk_path_from_object_key(&content.key) {
Some(path) => path,
None => continue,
};
+ let timeout = std::time::Duration::from_secs(0);
+ let _chunk_guard = match self.inner.chunk_store.lock_chunk(&digest, timeout) {
+ Ok(guard) => guard,
+ Err(_) => continue,
+ };
+ let _guard = self.inner.chunk_store.mutex().lock().unwrap();
+
// Check local markers (created or atime updated during phase1) and
// keep or delete chunk based on that.
let atime = match std::fs::metadata(&chunk_path) {
@@ -1697,10 +1702,9 @@ impl DataStore {
&mut gc_status,
|| {
if let Some(cache) = self.cache() {
- // ignore errors, phase 3 will retry cleanup anyways
- let _ = cache.remove(&digest);
+ cache.remove(&digest)?;
}
- delete_list.push(content.key);
+ delete_list.push((content.key, _chunk_guard));
Ok(())
},
)?;
@@ -1709,14 +1713,19 @@ impl DataStore {
chunk_count += 1;
}
- drop(lock);
-
if !delete_list.is_empty() {
- let delete_objects_result =
- proxmox_async::runtime::block_on(s3_client.delete_objects(&delete_list))?;
+ let delete_objects_result = proxmox_async::runtime::block_on(
+ s3_client.delete_objects(
+ &delete_list
+ .iter()
+ .map(|(key, _)| key.clone())
+ .collect::<Vec<S3ObjectKey>>(),
+ ),
+ )?;
if let Some(_err) = delete_objects_result.error {
bail!("failed to delete some objects");
}
+ // release all chunk guards
delete_list.clear();
}
--
2.47.3
* [pbs-devel] [PATCH proxmox-backup v5 08/19] chunk store: reduce exposure of clear_chunk() to crate only
2025-11-11 14:29 [pbs-devel] [PATCH proxmox-backup v5 00/19] fix chunk upload/insert, rename corrupt chunks and GC race conditions for s3 backend Christian Ebner
` (6 preceding siblings ...)
2025-11-11 14:29 ` [pbs-devel] [PATCH proxmox-backup v5 07/19] GC: fix race with chunk upload/insert on s3 backends Christian Ebner
@ 2025-11-11 14:29 ` Christian Ebner
2025-11-11 14:29 ` [pbs-devel] [PATCH proxmox-backup v5 09/19] chunk store: make chunk removal a helper method of the chunk store Christian Ebner
` (11 subsequent siblings)
19 siblings, 0 replies; 21+ messages in thread
From: Christian Ebner @ 2025-11-11 14:29 UTC (permalink / raw)
To: pbs-devel
This helper method is only used by the local datastore cache to clear
chunks when they are evicted from the cache, freeing up space. It should
not be called from outside the pbs-datastore crate.
Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
pbs-datastore/src/chunk_store.rs | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/pbs-datastore/src/chunk_store.rs b/pbs-datastore/src/chunk_store.rs
index 8261080aa..7099b6614 100644
--- a/pbs-datastore/src/chunk_store.rs
+++ b/pbs-datastore/src/chunk_store.rs
@@ -686,7 +686,7 @@ impl ChunkStore {
///
/// Used to evict chunks from the local datastore cache, while keeping them as in-use markers
/// for garbage collection. Returns with success also if chunk file is not pre-existing.
- pub fn clear_chunk(&self, digest: &[u8; 32]) -> Result<(), Error> {
+ pub(crate) fn clear_chunk(&self, digest: &[u8; 32]) -> Result<(), Error> {
let (chunk_path, digest_str) = self.chunk_path(digest);
let mut create_options = CreateOptions::new();
if nix::unistd::Uid::effective().is_root() {
--
2.47.3
* [pbs-devel] [PATCH proxmox-backup v5 09/19] chunk store: make chunk removal a helper method of the chunk store
2025-11-11 14:29 [pbs-devel] [PATCH proxmox-backup v5 00/19] fix chunk upload/insert, rename corrupt chunks and GC race conditions for s3 backend Christian Ebner
` (7 preceding siblings ...)
2025-11-11 14:29 ` [pbs-devel] [PATCH proxmox-backup v5 08/19] chunk store: reduce exposure of clear_chunk() to crate only Christian Ebner
@ 2025-11-11 14:29 ` Christian Ebner
2025-11-11 14:29 ` [pbs-devel] [PATCH proxmox-backup v5 10/19] store: split insert_chunk into wrapper + unsafe locked implementation Christian Ebner
` (10 subsequent siblings)
19 siblings, 0 replies; 21+ messages in thread
From: Christian Ebner @ 2025-11-11 14:29 UTC (permalink / raw)
To: pbs-devel
Better encapsulate functionality that touches the chunk store, to
ensure a common interface is used. The local datastore LRU cache will
call this during garbage collection to clean up chunk marker files
for chunks which have been removed from the S3 object store backend.
This is in preparation for fixing a deadlock, no functional changes
intended.
Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
pbs-datastore/src/chunk_store.rs | 9 +++++++++
pbs-datastore/src/local_datastore_lru_cache.rs | 3 +--
2 files changed, 10 insertions(+), 2 deletions(-)
diff --git a/pbs-datastore/src/chunk_store.rs b/pbs-datastore/src/chunk_store.rs
index 7099b6614..a7ae4fca5 100644
--- a/pbs-datastore/src/chunk_store.rs
+++ b/pbs-datastore/src/chunk_store.rs
@@ -702,6 +702,15 @@ impl ChunkStore {
Ok(())
}
+ /// Removes a chunk marker file from the `LocalDatastoreLruCache`s chunk store.
+ ///
+ /// Callers must hold the per-chunk file lock in order to avoid races with renaming of corrupt
+ /// chunks by verifications and chunk inserts by backups.
+ pub(crate) fn remove_chunk(&self, digest: &[u8; 32]) -> Result<(), Error> {
+ let (chunk_path, _digest_str) = self.chunk_path(digest);
+ std::fs::remove_file(chunk_path).map_err(Error::from)
+ }
+
pub fn relative_path(&self, path: &Path) -> PathBuf {
// unwrap: only `None` in unit tests
assert!(self.locker.is_some());
diff --git a/pbs-datastore/src/local_datastore_lru_cache.rs b/pbs-datastore/src/local_datastore_lru_cache.rs
index fe3b51a55..7b9d8caae 100644
--- a/pbs-datastore/src/local_datastore_lru_cache.rs
+++ b/pbs-datastore/src/local_datastore_lru_cache.rs
@@ -52,8 +52,7 @@ impl LocalDatastoreLruCache {
/// Fails if the chunk cannot be deleted successfully.
pub(crate) unsafe fn remove(&self, digest: &[u8; 32]) -> Result<(), Error> {
self.cache.remove(*digest);
- let (path, _digest_str) = self.store.chunk_path(digest);
- std::fs::remove_file(path).map_err(Error::from)
+ self.store.remove_chunk(digest)
}
/// Access the locally cached chunk or fetch it from the S3 object store via the provided
--
2.47.3
* [pbs-devel] [PATCH proxmox-backup v5 10/19] store: split insert_chunk into wrapper + unsafe locked implementation
2025-11-11 14:29 [pbs-devel] [PATCH proxmox-backup v5 00/19] fix chunk upload/insert, rename corrupt chunks and GC race conditions for s3 backend Christian Ebner
` (8 preceding siblings ...)
2025-11-11 14:29 ` [pbs-devel] [PATCH proxmox-backup v5 09/19] chunk store: make chunk removal a helper method of the chunk store Christian Ebner
@ 2025-11-11 14:29 ` Christian Ebner
2025-11-11 14:29 ` [pbs-devel] [PATCH proxmox-backup v5 11/19] store: cache: move Mutex acquire to cache insertion Christian Ebner
` (9 subsequent siblings)
19 siblings, 0 replies; 21+ messages in thread
From: Christian Ebner @ 2025-11-11 14:29 UTC (permalink / raw)
To: pbs-devel
From: Fabian Grünbichler <f.gruenbichler@proxmox.com>
to allow calling it with the chunk store Mutex already held, to synchronize the
LRU cache and the chunk store inserts.
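The wrapper/nolock split can be sketched like this (illustrative types, not the actual ChunkStore API): the public method takes the mutex and delegates to an unsafe variant that requires the caller to already hold it, so that cache insertion can lock once around both the cache and the store insert.

```rust
use std::sync::Mutex;

// Hypothetical, minimal stand-in for the chunk store.
struct ChunkStore {
    mutex: Mutex<()>,
}

impl ChunkStore {
    // Public wrapper: acquires the store mutex, then calls the locked impl.
    fn insert_chunk(&self, data: &[u8]) -> (bool, u64) {
        let _lock = self.mutex.lock().unwrap();
        // Safety: lock acquired above
        unsafe { self.insert_chunk_nolock(data) }
    }

    /// Safety: requires holding the chunk store mutex!
    unsafe fn insert_chunk_nolock(&self, data: &[u8]) -> (bool, u64) {
        // ... chunk file would be written here; exclusive access is
        // guaranteed by the caller holding the mutex ...
        (false, data.len() as u64)
    }
}
```

Marking the inner function `unsafe` encodes the locking precondition in the type system: callers must write an explicit `unsafe` block and justify that they hold the mutex.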
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
pbs-datastore/src/chunk_store.rs | 19 +++++++++++++++----
1 file changed, 15 insertions(+), 4 deletions(-)
diff --git a/pbs-datastore/src/chunk_store.rs b/pbs-datastore/src/chunk_store.rs
index a7ae4fca5..085816f42 100644
--- a/pbs-datastore/src/chunk_store.rs
+++ b/pbs-datastore/src/chunk_store.rs
@@ -587,9 +587,22 @@ impl ChunkStore {
//println!("DIGEST {}", hex::encode(digest));
- let (chunk_path, digest_str) = self.chunk_path(digest);
+ let _lock = self.mutex.lock();
- let lock = self.mutex.lock();
+ // Safety: lock acquired above
+ unsafe { self.insert_chunk_nolock(chunk, digest) }
+ }
+
+ /// Safety: requires holding the chunk store mutex!
+ pub(crate) unsafe fn insert_chunk_nolock(
+ &self,
+ chunk: &DataBlob,
+ digest: &[u8; 32],
+ ) -> Result<(bool, u64), Error> {
+ // unwrap: only `None` in unit tests
+ assert!(self.locker.is_some());
+
+ let (chunk_path, digest_str) = self.chunk_path(digest);
let raw_data = chunk.raw_data();
let encoded_size = raw_data.len() as u64;
@@ -665,8 +678,6 @@ impl ChunkStore {
.map_err(|err| format_err!("fsync failed: {err}"))?;
}
- drop(lock);
-
Ok((false, encoded_size))
}
--
2.47.3
* [pbs-devel] [PATCH proxmox-backup v5 11/19] store: cache: move Mutex acquire to cache insertion
2025-11-11 14:29 [pbs-devel] [PATCH proxmox-backup v5 00/19] fix chunk upload/insert, rename corrupt chunks and GC race conditions for s3 backend Christian Ebner
` (9 preceding siblings ...)
2025-11-11 14:29 ` [pbs-devel] [PATCH proxmox-backup v5 10/19] store: split insert_chunk into wrapper + unsafe locked implementation Christian Ebner
@ 2025-11-11 14:29 ` Christian Ebner
2025-11-11 14:29 ` [pbs-devel] [PATCH proxmox-backup v5 12/19] chunk store: rename cache-specific helpers Christian Ebner
` (8 subsequent siblings)
19 siblings, 0 replies; 21+ messages in thread
From: Christian Ebner @ 2025-11-11 14:29 UTC (permalink / raw)
To: pbs-devel
From: Fabian Grünbichler <f.gruenbichler@proxmox.com>
to avoid the lock ordering issue between the cache implementation's internal
Mutex and the chunk store Mutex, refactor the interface so that any cache
actions that modify the chunk store can acquire the chunk store Mutex first,
before locking the cache.
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
pbs-datastore/src/chunk_store.rs | 6 +++---
.../src/local_datastore_lru_cache.rs | 20 ++++++++++++++-----
2 files changed, 18 insertions(+), 8 deletions(-)
diff --git a/pbs-datastore/src/chunk_store.rs b/pbs-datastore/src/chunk_store.rs
index 085816f42..a17c258a7 100644
--- a/pbs-datastore/src/chunk_store.rs
+++ b/pbs-datastore/src/chunk_store.rs
@@ -697,7 +697,9 @@ impl ChunkStore {
///
/// Used to evict chunks from the local datastore cache, while keeping them as in-use markers
/// for garbage collection. Returns with success also if chunk file is not pre-existing.
- pub(crate) fn clear_chunk(&self, digest: &[u8; 32]) -> Result<(), Error> {
+ ///
+ /// Safety: chunk store mutex must be held!
+ pub(crate) unsafe fn clear_chunk(&self, digest: &[u8; 32]) -> Result<(), Error> {
let (chunk_path, digest_str) = self.chunk_path(digest);
let mut create_options = CreateOptions::new();
if nix::unistd::Uid::effective().is_root() {
@@ -706,8 +708,6 @@ impl ChunkStore {
create_options = create_options.owner(uid).group(gid);
}
- let _lock = self.mutex.lock();
-
proxmox_sys::fs::replace_file(&chunk_path, &[], create_options, false)
.map_err(|err| format_err!("clear chunk failed for {digest_str} - {err}"))?;
Ok(())
diff --git a/pbs-datastore/src/local_datastore_lru_cache.rs b/pbs-datastore/src/local_datastore_lru_cache.rs
index 7b9d8caae..8b2dbedfd 100644
--- a/pbs-datastore/src/local_datastore_lru_cache.rs
+++ b/pbs-datastore/src/local_datastore_lru_cache.rs
@@ -34,9 +34,16 @@ impl LocalDatastoreLruCache {
///
/// Fails if the chunk cannot be inserted successfully.
pub fn insert(&self, digest: &[u8; 32], chunk: &DataBlob) -> Result<(), Error> {
- self.store.insert_chunk(chunk, digest)?;
- self.cache
- .insert(*digest, (), |digest| self.store.clear_chunk(&digest))
+ let _lock = self.store.mutex().lock().unwrap();
+
+ // Safety: lock acquired above
+ unsafe {
+ self.store.insert_chunk_nolock(chunk, digest)?;
+ }
+ self.cache.insert(*digest, (), |digest| {
+ // Safety: lock acquired above, this is executed inline!
+ unsafe { self.store.clear_chunk(&digest) }
+ })
}
/// Remove a chunk from the local datastore cache.
@@ -70,8 +77,11 @@ impl LocalDatastoreLruCache {
Ok(mut file) => match DataBlob::load_from_reader(&mut file) {
// File was still cached with contents, load response from file
Ok(chunk) => {
- self.cache
- .insert(*digest, (), |digest| self.store.clear_chunk(&digest))?;
+ let _lock = self.store.mutex().lock().unwrap();
+ self.cache.insert(*digest, (), |digest| {
+ // Safety: lock acquired above, this is executed inline
+ unsafe { self.store.clear_chunk(&digest) }
+ })?;
Ok(Some(chunk))
}
// File was empty, might have been evicted since
--
2.47.3
* [pbs-devel] [PATCH proxmox-backup v5 12/19] chunk store: rename cache-specific helpers
2025-11-11 14:29 [pbs-devel] [PATCH proxmox-backup v5 00/19] fix chunk upload/insert, rename corrupt chunks and GC race conditions for s3 backend Christian Ebner
` (10 preceding siblings ...)
2025-11-11 14:29 ` [pbs-devel] [PATCH proxmox-backup v5 11/19] store: cache: move Mutex acquire to cache insertion Christian Ebner
@ 2025-11-11 14:29 ` Christian Ebner
2025-11-11 14:29 ` [pbs-devel] [PATCH proxmox-backup v5 13/19] GC: cleanup chunk markers from cache in phase 3 on s3 backends Christian Ebner
` (7 subsequent siblings)
19 siblings, 0 replies; 21+ messages in thread
From: Christian Ebner @ 2025-11-11 14:29 UTC (permalink / raw)
To: pbs-devel
From: Fabian Grünbichler <f.gruenbichler@proxmox.com>
to make it more explicit that these operate in cache context and are not
generic helpers, as the original names imply.
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
pbs-datastore/src/chunk_store.rs | 4 ++--
pbs-datastore/src/local_datastore_lru_cache.rs | 6 +++---
2 files changed, 5 insertions(+), 5 deletions(-)
diff --git a/pbs-datastore/src/chunk_store.rs b/pbs-datastore/src/chunk_store.rs
index a17c258a7..f5a77276d 100644
--- a/pbs-datastore/src/chunk_store.rs
+++ b/pbs-datastore/src/chunk_store.rs
@@ -699,7 +699,7 @@ impl ChunkStore {
/// for garbage collection. Returns with success also if chunk file is not pre-existing.
///
/// Safety: chunk store mutex must be held!
- pub(crate) unsafe fn clear_chunk(&self, digest: &[u8; 32]) -> Result<(), Error> {
+ pub(crate) unsafe fn replace_chunk_with_marker(&self, digest: &[u8; 32]) -> Result<(), Error> {
let (chunk_path, digest_str) = self.chunk_path(digest);
let mut create_options = CreateOptions::new();
if nix::unistd::Uid::effective().is_root() {
@@ -717,7 +717,7 @@ impl ChunkStore {
///
/// Callers must hold the per-chunk file lock in order to avoid races with renaming of corrupt
/// chunks by verifications and chunk inserts by backups.
- pub(crate) fn remove_chunk(&self, digest: &[u8; 32]) -> Result<(), Error> {
+ pub(crate) fn remove_chunk_marker(&self, digest: &[u8; 32]) -> Result<(), Error> {
let (chunk_path, _digest_str) = self.chunk_path(digest);
std::fs::remove_file(chunk_path).map_err(Error::from)
}
diff --git a/pbs-datastore/src/local_datastore_lru_cache.rs b/pbs-datastore/src/local_datastore_lru_cache.rs
index 8b2dbedfd..ae2215be1 100644
--- a/pbs-datastore/src/local_datastore_lru_cache.rs
+++ b/pbs-datastore/src/local_datastore_lru_cache.rs
@@ -42,7 +42,7 @@ impl LocalDatastoreLruCache {
}
self.cache.insert(*digest, (), |digest| {
// Safety: lock acquired above, this is executed inline!
- unsafe { self.store.clear_chunk(&digest) }
+ unsafe { self.store.replace_chunk_with_marker(&digest) }
})
}
@@ -59,7 +59,7 @@ impl LocalDatastoreLruCache {
/// Fails if the chunk cannot be deleted successfully.
pub(crate) unsafe fn remove(&self, digest: &[u8; 32]) -> Result<(), Error> {
self.cache.remove(*digest);
- self.store.remove_chunk(digest)
+ self.store.remove_chunk_marker(digest)
}
/// Access the locally cached chunk or fetch it from the S3 object store via the provided
@@ -80,7 +80,7 @@ impl LocalDatastoreLruCache {
let _lock = self.store.mutex().lock().unwrap();
self.cache.insert(*digest, (), |digest| {
// Safety: lock acquired above, this is executed inline
- unsafe { self.store.clear_chunk(&digest) }
+ unsafe { self.store.replace_chunk_with_marker(&digest) }
})?;
Ok(Some(chunk))
}
--
2.47.3
* [pbs-devel] [PATCH proxmox-backup v5 13/19] GC: cleanup chunk markers from cache in phase 3 on s3 backends
2025-11-11 14:29 [pbs-devel] [PATCH proxmox-backup v5 00/19] fix chunk upload/insert, rename corrupt chunks and GC race conditions for s3 backend Christian Ebner
` (11 preceding siblings ...)
2025-11-11 14:29 ` [pbs-devel] [PATCH proxmox-backup v5 12/19] chunk store: rename cache-specific helpers Christian Ebner
@ 2025-11-11 14:29 ` Christian Ebner
2025-11-11 14:29 ` [pbs-devel] [PATCH proxmox-backup v5 14/19] GC: touch bad chunk files independent of backend type Christian Ebner
` (6 subsequent siblings)
19 siblings, 0 replies; 21+ messages in thread
From: Christian Ebner @ 2025-11-11 14:29 UTC (permalink / raw)
To: pbs-devel
Pass along the in-memory cache when sweeping unused chunks in phase 3
of garbage collection for datastores with s3 backend.
When a dangling marker file is detected, which only happens if the chunk
was removed from the object store by some unexpected interaction (e.g.
manually deleted from the bucket), the marker must be removed to reach a
consistent state (snapshots referencing the chunk remain corrupt,
however).
Clear such a chunk from both the in-memory and the local datastore cache,
so it can be re-uploaded by future backup or sync jobs.
Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
pbs-datastore/src/chunk_store.rs | 22 +++++++++++++++++++++-
pbs-datastore/src/datastore.rs | 2 ++
2 files changed, 23 insertions(+), 1 deletion(-)
diff --git a/pbs-datastore/src/chunk_store.rs b/pbs-datastore/src/chunk_store.rs
index f5a77276d..20f71efef 100644
--- a/pbs-datastore/src/chunk_store.rs
+++ b/pbs-datastore/src/chunk_store.rs
@@ -5,6 +5,7 @@ use std::sync::{Arc, Mutex};
use std::time::Duration;
use anyhow::{bail, format_err, Context, Error};
+use hex::FromHex;
use tracing::{info, warn};
use pbs_api_types::{DatastoreFSyncLevel, GarbageCollectionStatus};
@@ -22,7 +23,7 @@ use crate::data_blob::DataChunkBuilder;
use crate::file_formats::{
COMPRESSED_BLOB_MAGIC_1_0, ENCRYPTED_BLOB_MAGIC_1_0, UNCOMPRESSED_BLOB_MAGIC_1_0,
};
-use crate::DataBlob;
+use crate::{DataBlob, LocalDatastoreLruCache};
/// File system based chunk store
pub struct ChunkStore {
@@ -383,6 +384,7 @@ impl ChunkStore {
min_atime: i64,
status: &mut GarbageCollectionStatus,
worker: &dyn WorkerTaskContext,
+ cache: Option<&LocalDatastoreLruCache>,
) -> Result<(), Error> {
// unwrap: only `None` in unit tests
assert!(self.locker.is_some());
@@ -436,6 +438,24 @@ impl ChunkStore {
bad,
status,
|| {
+ // non-bad S3 chunks need to be removed via cache
+ if let Some(cache) = cache {
+ if !bad {
+ let digest = <[u8; 32]>::from_hex(filename.to_bytes())?;
+
+ // unless there is a concurrent upload pending,
+ // must never block due to required locking order
+ if let Ok(_guard) =
+ self.lock_chunk(&digest, Duration::from_secs(0))
+ {
+ cache.remove(&digest)?;
+ }
+
+ return Ok(());
+ }
+ }
+
+ // bad or local chunks
unlinkat(Some(dirfd), filename, UnlinkatFlags::NoRemoveDir).map_err(
|err| {
format_err!(
diff --git a/pbs-datastore/src/datastore.rs b/pbs-datastore/src/datastore.rs
index 13a33af1c..a2ff8655e 100644
--- a/pbs-datastore/src/datastore.rs
+++ b/pbs-datastore/src/datastore.rs
@@ -1755,6 +1755,7 @@ impl DataStore {
min_atime,
&mut tmp_gc_status,
worker,
+ self.cache(),
)?;
} else {
self.inner.chunk_store.sweep_unused_chunks(
@@ -1762,6 +1763,7 @@ impl DataStore {
min_atime,
&mut gc_status,
worker,
+ None,
)?;
}
--
2.47.3
* [pbs-devel] [PATCH proxmox-backup v5 14/19] GC: touch bad chunk files independent of backend type
2025-11-11 14:29 [pbs-devel] [PATCH proxmox-backup v5 00/19] fix chunk upload/insert, rename corrupt chunks and GC race conditions for s3 backend Christian Ebner
` (12 preceding siblings ...)
2025-11-11 14:29 ` [pbs-devel] [PATCH proxmox-backup v5 13/19] GC: cleanup chunk markers from cache in phase 3 on s3 backends Christian Ebner
@ 2025-11-11 14:29 ` Christian Ebner
2025-11-11 14:29 ` [pbs-devel] [PATCH proxmox-backup v5 15/19] GC: guard missing marker file insertion for s3 backed stores Christian Ebner
` (5 subsequent siblings)
19 siblings, 0 replies; 21+ messages in thread
From: Christian Ebner @ 2025-11-11 14:29 UTC (permalink / raw)
To: pbs-devel
The bad chunk marker files should also be kept around for s3 backed
datastores, so GC does not clean them up unless a good chunk was
re-uploaded. This way, the renaming of bad chunks detects them and does
not run into conflicting renames.
Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
pbs-datastore/src/datastore.rs | 64 ++++++++++++++++------------------
1 file changed, 30 insertions(+), 34 deletions(-)
diff --git a/pbs-datastore/src/datastore.rs b/pbs-datastore/src/datastore.rs
index a2ff8655e..1c9466af7 100644
--- a/pbs-datastore/src/datastore.rs
+++ b/pbs-datastore/src/datastore.rs
@@ -1322,41 +1322,37 @@ impl DataStore {
}
}
- match s3_client {
- None => {
- // Filesystem backend
- if !self.inner.chunk_store.cond_touch_chunk(digest, false)? {
- let hex = hex::encode(digest);
- warn!(
- "warning: unable to access non-existent chunk {hex}, required by {file_name:?}"
- );
-
- // touch any corresponding .bad files to keep them around, meaning if a chunk is
- // rewritten correctly they will be removed automatically, as well as if no index
- // file requires the chunk anymore (won't get to this loop then)
- for i in 0..=9 {
- let bad_ext = format!("{i}.bad");
- let mut bad_path = PathBuf::new();
- bad_path.push(self.chunk_path(digest).0);
- bad_path.set_extension(bad_ext);
- self.inner.chunk_store.cond_touch_path(&bad_path, false)?;
- }
- }
+ if !self.inner.chunk_store.cond_touch_chunk(digest, false)? {
+ let hex = hex::encode(digest);
+ warn!(
+ "warning: unable to access non-existent chunk {hex}, required by {file_name:?}"
+ );
+
+ // touch any corresponding .bad files to keep them around, meaning if a chunk is
+ // rewritten correctly they will be removed automatically, as well as if no index
+ // file requires the chunk anymore (won't get to this loop then)
+ for i in 0..=9 {
+ let bad_ext = format!("{i}.bad");
+ let mut bad_path = PathBuf::new();
+ bad_path.push(self.chunk_path(digest).0);
+ bad_path.set_extension(bad_ext);
+ self.inner.chunk_store.cond_touch_path(&bad_path, false)?;
}
- Some(ref _s3_client) => {
- // Update atime on local cache marker files.
- if !self.inner.chunk_store.cond_touch_chunk(digest, false)? {
- let (chunk_path, _digest) = self.chunk_path(digest);
- // Insert empty file as marker to tell GC phase2 that this is
- // a chunk still in-use, so to keep in the S3 object store.
- std::fs::File::options()
- .write(true)
- .create_new(true)
- .open(&chunk_path)
- .with_context(|| {
- format!("failed to create marker for chunk {}", hex::encode(digest))
- })?;
- }
+ }
+
+ if let Some(ref _s3_client) = s3_client {
+ // Update atime on local cache marker files.
+ if !self.inner.chunk_store.cond_touch_chunk(digest, false)? {
+ let (chunk_path, _digest) = self.chunk_path(digest);
+ // Insert empty file as marker to tell GC phase2 that this is
+ // a chunk still in-use, so to keep in the S3 object store.
+ std::fs::File::options()
+ .write(true)
+ .create_new(true)
+ .open(&chunk_path)
+ .with_context(|| {
+ format!("failed to create marker for chunk {}", hex::encode(digest))
+ })?;
}
}
}
--
2.47.3
* [pbs-devel] [PATCH proxmox-backup v5 15/19] GC: guard missing marker file insertion for s3 backed stores
2025-11-11 14:29 [pbs-devel] [PATCH proxmox-backup v5 00/19] fix chunk upload/insert, rename corrupt chunks and GC race conditions for s3 backend Christian Ebner
` (13 preceding siblings ...)
2025-11-11 14:29 ` [pbs-devel] [PATCH proxmox-backup v5 14/19] GC: touch bad chunk files independent of backend type Christian Ebner
@ 2025-11-11 14:29 ` Christian Ebner
2025-11-11 14:29 ` [pbs-devel] [PATCH proxmox-backup v5 16/19] GC: s3: track if a chunk marker file is missing since a bad chunk Christian Ebner
` (4 subsequent siblings)
19 siblings, 0 replies; 21+ messages in thread
From: Christian Ebner @ 2025-11-11 14:29 UTC (permalink / raw)
To: pbs-devel
The chunk marker file should only ever be missing if the local
datastore cache has been recreated (e.g. after setup on a new PBS
instance while reusing the s3 bucket contents) or by manual user
interaction. Garbage collection re-creates the marker in these cases.
Guard this marker file creation with the per-chunk file lock to not run
into races with chunk insertion operations. Since this requires a stat
of the chunk marker file and locking is expensive, first check if the
marker exists without holding the lock; only then lock and re-check.
Since the first chunk file existence check is already performed anyway,
move the logic into its non-existing branch.
By performing this after touching potential bad chunks, these can be
checked beforehand.
Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
pbs-datastore/src/datastore.rs | 46 ++++++++++++++++++----------------
1 file changed, 25 insertions(+), 21 deletions(-)
diff --git a/pbs-datastore/src/datastore.rs b/pbs-datastore/src/datastore.rs
index 1c9466af7..0fa86c939 100644
--- a/pbs-datastore/src/datastore.rs
+++ b/pbs-datastore/src/datastore.rs
@@ -1323,36 +1323,40 @@ impl DataStore {
}
if !self.inner.chunk_store.cond_touch_chunk(digest, false)? {
- let hex = hex::encode(digest);
- warn!(
- "warning: unable to access non-existent chunk {hex}, required by {file_name:?}"
- );
-
+ let (chunk_path, _digest_str) = self.chunk_path(digest);
// touch any corresponding .bad files to keep them around, meaning if a chunk is
// rewritten correctly they will be removed automatically, as well as if no index
// file requires the chunk anymore (won't get to this loop then)
for i in 0..=9 {
let bad_ext = format!("{i}.bad");
- let mut bad_path = PathBuf::new();
- bad_path.push(self.chunk_path(digest).0);
+ let mut bad_path = chunk_path.clone();
bad_path.set_extension(bad_ext);
self.inner.chunk_store.cond_touch_path(&bad_path, false)?;
}
- }
- if let Some(ref _s3_client) = s3_client {
- // Update atime on local cache marker files.
- if !self.inner.chunk_store.cond_touch_chunk(digest, false)? {
- let (chunk_path, _digest) = self.chunk_path(digest);
- // Insert empty file as marker to tell GC phase2 that this is
- // a chunk still in-use, so to keep in the S3 object store.
- std::fs::File::options()
- .write(true)
- .create_new(true)
- .open(&chunk_path)
- .with_context(|| {
- format!("failed to create marker for chunk {}", hex::encode(digest))
- })?;
+ if let Some(ref _s3_client) = s3_client {
+ // Do not retry here, this is very unlikely to happen as chunk markers will
+ // most likely only be missing if the local cache store was recreated.
+ let _guard = self
+ .inner
+ .chunk_store
+ .lock_chunk(digest, CHUNK_LOCK_TIMEOUT)?;
+ if !self.inner.chunk_store.cond_touch_chunk(digest, false)? {
+ // Insert empty file as marker to tell GC phase2 that this is
+ // a chunk still in-use, so to keep in the S3 object store.
+ std::fs::File::options()
+ .write(true)
+ .create_new(true)
+ .open(&chunk_path)
+ .with_context(|| {
+ format!("failed to create marker for chunk {}", hex::encode(digest))
+ })?;
+ }
+ } else {
+ let hex = hex::encode(digest);
+ warn!(
+ "warning: unable to access non-existent chunk {hex}, required by {file_name:?}"
+ );
}
}
}
--
2.47.3
* [pbs-devel] [PATCH proxmox-backup v5 16/19] GC: s3: track if a chunk marker file is missing since a bad chunk
2025-11-11 14:29 [pbs-devel] [PATCH proxmox-backup v5 00/19] fix chunk upload/insert, rename corrupt chunks and GC race conditions for s3 backend Christian Ebner
` (14 preceding siblings ...)
2025-11-11 14:29 ` [pbs-devel] [PATCH proxmox-backup v5 15/19] GC: guard missing marker file insertion for s3 backed stores Christian Ebner
@ 2025-11-11 14:29 ` Christian Ebner
2025-11-11 14:30 ` [pbs-devel] [PATCH proxmox-backup v5 17/19] chunk store: add helpers marking missing local chunk markers as expected Christian Ebner
` (3 subsequent siblings)
19 siblings, 0 replies; 21+ messages in thread
From: Christian Ebner @ 2025-11-11 14:29 UTC (permalink / raw)
To: pbs-devel
For bad chunks the marker file must never be created, as otherwise the
chunk cannot be re-inserted by a chunk upload.
Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
pbs-datastore/src/datastore.rs | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/pbs-datastore/src/datastore.rs b/pbs-datastore/src/datastore.rs
index 0fa86c939..549bc3b41 100644
--- a/pbs-datastore/src/datastore.rs
+++ b/pbs-datastore/src/datastore.rs
@@ -1327,11 +1327,14 @@ impl DataStore {
// touch any corresponding .bad files to keep them around, meaning if a chunk is
// rewritten correctly they will be removed automatically, as well as if no index
// file requires the chunk anymore (won't get to this loop then)
+ let mut is_bad = false;
for i in 0..=9 {
let bad_ext = format!("{i}.bad");
let mut bad_path = chunk_path.clone();
bad_path.set_extension(bad_ext);
- self.inner.chunk_store.cond_touch_path(&bad_path, false)?;
+ if self.inner.chunk_store.cond_touch_path(&bad_path, false)? {
+ is_bad = true;
+ }
}
if let Some(ref _s3_client) = s3_client {
@@ -1341,7 +1344,7 @@ impl DataStore {
.inner
.chunk_store
.lock_chunk(digest, CHUNK_LOCK_TIMEOUT)?;
- if !self.inner.chunk_store.cond_touch_chunk(digest, false)? {
+ if !self.inner.chunk_store.cond_touch_chunk(digest, false)? && !is_bad {
// Insert empty file as marker to tell GC phase2 that this is
// a chunk still in-use, so to keep in the S3 object store.
std::fs::File::options()
--
2.47.3
* [pbs-devel] [PATCH proxmox-backup v5 17/19] chunk store: add helpers marking missing local chunk markers as expected
2025-11-11 14:29 [pbs-devel] [PATCH proxmox-backup v5 00/19] fix chunk upload/insert, rename corrupt chunks and GC race conditions for s3 backend Christian Ebner
` (15 preceding siblings ...)
2025-11-11 14:29 ` [pbs-devel] [PATCH proxmox-backup v5 16/19] GC: s3: track if a chunk marker file is missing since a bad chunk Christian Ebner
@ 2025-11-11 14:30 ` Christian Ebner
2025-11-11 14:30 ` [pbs-devel] [PATCH proxmox-backup v5 18/19] GC: assure chunk exists on s3 store when creating missing chunk marker Christian Ebner
` (2 subsequent siblings)
19 siblings, 0 replies; 21+ messages in thread
From: Christian Ebner @ 2025-11-11 14:30 UTC (permalink / raw)
To: pbs-devel
Add helpers that write or remove a `<digest>.using` marker file in place
of an expected but missing local store cache chunk marker file for s3
backed datastores.
These markers will be used to identify chunks which are still in-use
and not bad, but for which the local store does not contain the marker
file (e.g. because the cache might be empty due to datastore re-creation
from the s3 bucket).
These markers will be set in phase 1 of garbage collection and
replaced with the expected marker file either by a concurrent chunk
insert or if the listing of chunks in phase 2 of garbage collection
detects the chunk to be present on the s3 object store.
Phase 3 will clean up the files after they fall out of the cutoff window,
given that the extension is chosen to fit the filename criteria.
Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
pbs-datastore/src/chunk_store.rs | 42 ++++++++++++++++++++++++++++++--
1 file changed, 40 insertions(+), 2 deletions(-)
diff --git a/pbs-datastore/src/chunk_store.rs b/pbs-datastore/src/chunk_store.rs
index 20f71efef..f10415d1f 100644
--- a/pbs-datastore/src/chunk_store.rs
+++ b/pbs-datastore/src/chunk_store.rs
@@ -721,18 +721,56 @@ impl ChunkStore {
/// Safety: chunk store mutex must be held!
pub(crate) unsafe fn replace_chunk_with_marker(&self, digest: &[u8; 32]) -> Result<(), Error> {
let (chunk_path, digest_str) = self.chunk_path(digest);
+ Self::create_marker_file(&chunk_path)
+ .map_err(|err| format_err!("clear chunk failed for {digest_str} - {err}"))?;
+ Ok(())
+ }
+
+ /// Helper to generate new empty marker file
+ fn create_marker_file(path: &Path) -> Result<(), Error> {
let mut create_options = CreateOptions::new();
if nix::unistd::Uid::effective().is_root() {
let uid = pbs_config::backup_user()?.uid;
let gid = pbs_config::backup_group()?.gid;
create_options = create_options.owner(uid).group(gid);
}
+ proxmox_sys::fs::replace_file(&path, &[], create_options, false)
+ }
- proxmox_sys::fs::replace_file(&chunk_path, &[], create_options, false)
- .map_err(|err| format_err!("clear chunk failed for {digest_str} - {err}"))?;
+ /// Mark chunk as expected to be present by writing a marker file to the chunk store.
+ ///
+ /// Used to mark chunks which are found in index files during phase 1 of garbage collection
+ /// for s3 datastores, but the marker file is not present and it is seemingly not a bad chunk.
+ /// This might happen if the local store cache is empty after datastore re-creation.
+ pub(crate) fn mark_chunk_as_expected(&self, digest: &[u8; 32]) -> Result<(), Error> {
+ let marker_path = self.chunk_expected_marker_path(digest);
+ Self::create_marker_file(&marker_path)
+ .map_err(|err| format_err!("mark chunk failed for {} - {err}", hex::encode(digest)))?;
Ok(())
}
+ /// Remove the expected-chunk mark by removing its marker file from the chunk store.
+ ///
+ /// Returns true if the file was present and removed, false if the file did not exist.
+ pub(crate) fn clear_chunk_expected_mark(&self, digest: &[u8; 32]) -> Result<bool, Error> {
+ let marker_path = self.chunk_expected_marker_path(digest);
+ if let Err(err) = std::fs::remove_file(marker_path) {
+ if err.kind() != std::io::ErrorKind::NotFound {
+ return Err(err.into());
+ } else {
+ return Ok(false);
+ }
+ }
+ Ok(true)
+ }
+
+ /// Helper to generate marker file path for expected chunks
+ fn chunk_expected_marker_path(&self, digest: &[u8; 32]) -> PathBuf {
+ let (mut path, _digest_str) = self.chunk_path(digest);
+ path.set_extension("using");
+ path
+ }
+
/// Removes a chunk marker file from the `LocalDatastoreLruCache`s chunk store.
///
/// Callers must hold the per-chunk file lock in order to avoid races with renaming of corrupt
--
2.47.3
* [pbs-devel] [PATCH proxmox-backup v5 18/19] GC: assure chunk exists on s3 store when creating missing chunk marker
2025-11-11 14:29 [pbs-devel] [PATCH proxmox-backup v5 00/19] fix chunk upload/insert, rename corrupt chunks and GC race conditions for s3 backend Christian Ebner
` (16 preceding siblings ...)
2025-11-11 14:30 ` [pbs-devel] [PATCH proxmox-backup v5 17/19] chunk store: add helpers marking missing local chunk markers as expected Christian Ebner
@ 2025-11-11 14:30 ` Christian Ebner
2025-11-11 14:30 ` [pbs-devel] [PATCH proxmox-backup v5 19/19] datastore: document s3 backend specific locking restrictions Christian Ebner
2025-11-14 13:21 ` [pbs-devel] superseded: [PATCH proxmox-backup v5 00/19] fix chunk upload/insert, rename corrupt chunks and GC race conditions for s3 backend Christian Ebner
19 siblings, 0 replies; 21+ messages in thread
From: Christian Ebner @ 2025-11-11 14:30 UTC (permalink / raw)
To: pbs-devel
Currently it is not assured that the chunk is still present on the s3
object store before re-creating the chunk marker file. That, however,
leads to the chunk not being re-inserted if it is re-uploaded.
Since checking the presence right away is expensive, as it requires
additional api requests, mark the chunk as expected instead and
delay the existence check to phase 2, which must fetch the chunk
listing anyway.
Rely on the per-chunk file locks for consistency.
Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
pbs-datastore/src/datastore.rs | 23 +++++++++++++----------
1 file changed, 13 insertions(+), 10 deletions(-)
diff --git a/pbs-datastore/src/datastore.rs b/pbs-datastore/src/datastore.rs
index 549bc3b41..bf06d6fda 100644
--- a/pbs-datastore/src/datastore.rs
+++ b/pbs-datastore/src/datastore.rs
@@ -1347,13 +1347,7 @@ impl DataStore {
if !self.inner.chunk_store.cond_touch_chunk(digest, false)? && !is_bad {
// Insert empty file as marker to tell GC phase2 that this is
// a chunk still in-use, so to keep in the S3 object store.
- std::fs::File::options()
- .write(true)
- .create_new(true)
- .open(&chunk_path)
- .with_context(|| {
- format!("failed to create marker for chunk {}", hex::encode(digest))
- })?;
+ self.inner.chunk_store.mark_chunk_as_expected(digest)?;
}
} else {
let hex = hex::encode(digest);
@@ -1683,8 +1677,16 @@ impl DataStore {
let atime = match std::fs::metadata(&chunk_path) {
Ok(stat) => stat.accessed()?,
Err(err) if err.kind() == std::io::ErrorKind::NotFound => {
- // File not found, delete by setting atime to unix epoch
- SystemTime::UNIX_EPOCH
+ if self.inner.chunk_store.clear_chunk_expected_mark(&digest)? {
+ unsafe {
+ // chunk store lock held
+ self.inner.chunk_store.replace_chunk_with_marker(&digest)?;
+ }
+ SystemTime::now()
+ } else {
+ // File not found, delete by setting atime to unix epoch
+ SystemTime::UNIX_EPOCH
+ }
}
Err(err) => return Err(err.into()),
};
@@ -1980,7 +1982,8 @@ impl DataStore {
)
.map_err(|err| format_err!("failed to upload chunk to s3 backend - {err:#}"))?;
tracing::info!("Caching of chunk {}", hex::encode(digest));
- self.cache_insert(&digest, &chunk)?;
+ self.cache_insert(digest, chunk)?;
+ self.inner.chunk_store.clear_chunk_expected_mark(digest)?;
Ok((is_duplicate, chunk_size))
}
--
2.47.3
* [pbs-devel] [PATCH proxmox-backup v5 19/19] datastore: document s3 backend specific locking restrictions
2025-11-11 14:29 [pbs-devel] [PATCH proxmox-backup v5 00/19] fix chunk upload/insert, rename corrupt chunks and GC race conditions for s3 backend Christian Ebner
` (17 preceding siblings ...)
2025-11-11 14:30 ` [pbs-devel] [PATCH proxmox-backup v5 18/19] GC: assure chunk exists on s3 store when creating missing chunk marker Christian Ebner
@ 2025-11-11 14:30 ` Christian Ebner
2025-11-14 13:21 ` [pbs-devel] superseded: [PATCH proxmox-backup v5 00/19] fix chunk upload/insert, rename corrupt chunks and GC race conditions for s3 backend Christian Ebner
19 siblings, 0 replies; 21+ messages in thread
From: Christian Ebner @ 2025-11-11 14:30 UTC (permalink / raw)
To: pbs-devel
The requirements are stricter here: not only must holding
std::sync::Mutex guards across async contexts be avoided, but
consistency must also be maintained between the s3 object store, the
local datastore cache and the in-memory LRU cache.
Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
---
pbs-datastore/src/lib.rs | 13 +++++++++++++
1 file changed, 13 insertions(+)
diff --git a/pbs-datastore/src/lib.rs b/pbs-datastore/src/lib.rs
index 849078a8f..1f7c54ae8 100644
--- a/pbs-datastore/src/lib.rs
+++ b/pbs-datastore/src/lib.rs
@@ -81,6 +81,19 @@
//! because running these operations concurrently is treated as a feature
//! on its own.
//!
+//! For datastores with S3 backend there are further restrictions since
+//! there are 3 types of locking mechanisms involved:
+//! - per-chunk file lock
+//! - chunk store mutex lock
+//! - lru cache mutex lock
+//!
+//! Locks must always be acquired in this specific order to avoid deadlocks.
+//! The per-chunk file lock is used to avoid holding a mutex lock during calls
+//! into async contexts, which can deadlock otherwise. It must be held for the
+//! whole time from starting an operation on the chunk until it is persisted
+//! to s3 backend, local datastore cache and in-memory LRU cache where
+//! required.
+//!
//! ## Inter-process Locking
//!
//! We need to be able to restart the proxmox-backup service daemons, so
--
2.47.3
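The lock ordering described in the doc comment above can be sketched in code. This is a minimal illustration only, with hypothetical names (`DatastoreLocks`, `insert_chunk`) and a plain `Mutex<()>` standing in for the per-chunk *file* lock that the actual PBS implementation uses:

```rust
use std::sync::Mutex;

// Hypothetical stand-in for the three locks involved on s3-backed stores.
struct DatastoreLocks {
    per_chunk: Mutex<()>,    // stand-in for the per-chunk file lock
    chunk_store: Mutex<u64>, // stand-in for chunk store state guarded by its mutex
    lru: Mutex<u64>,         // stand-in for the in-memory LRU cache state
}

impl DatastoreLocks {
    // Acquire locks strictly in the documented order:
    // 1. per-chunk lock, 2. chunk store mutex, 3. LRU cache mutex.
    fn insert_chunk(&self) -> (u64, u64) {
        let _chunk_guard = self.per_chunk.lock().unwrap(); // 1. per-chunk lock
        let mut store = self.chunk_store.lock().unwrap();  // 2. chunk store mutex
        *store += 1;
        let store_count = *store;
        drop(store); // releasing early is fine; only the *acquisition* order matters
        let mut lru = self.lru.lock().unwrap();            // 3. LRU cache mutex
        *lru += 1;
        (store_count, *lru)
    }
}

fn main() {
    let locks = DatastoreLocks {
        per_chunk: Mutex::new(()),
        chunk_store: Mutex::new(0),
        lru: Mutex::new(0),
    };
    assert_eq!(locks.insert_chunk(), (1, 1));
    assert_eq!(locks.insert_chunk(), (2, 2));
}
```

Because every code path takes the locks in the same global order, no two paths can each hold one lock while waiting on the other, which is the deadlock scenario the doc comment rules out.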
* [pbs-devel] superseded: [PATCH proxmox-backup v5 00/19] fix chunk upload/insert, rename corrupt chunks and GC race conditions for s3 backend
2025-11-11 14:29 [pbs-devel] [PATCH proxmox-backup v5 00/19] fix chunk upload/insert, rename corrupt chunks and GC race conditions for s3 backend Christian Ebner
` (18 preceding siblings ...)
2025-11-11 14:30 ` [pbs-devel] [PATCH proxmox-backup v5 19/19] datastore: document s3 backend specific locking restrictions Christian Ebner
@ 2025-11-14 13:21 ` Christian Ebner
19 siblings, 0 replies; 21+ messages in thread
From: Christian Ebner @ 2025-11-14 13:21 UTC (permalink / raw)
To: pbs-devel
superseded-by version 6:
https://lore.proxmox.com/pbs-devel/20251114131901.441650-1-c.ebner@proxmox.com/T/
Thread overview: 21+ messages
2025-11-11 14:29 [pbs-devel] [PATCH proxmox-backup v5 00/19] fix chunk upload/insert, rename corrupt chunks and GC race conditions for s3 backend Christian Ebner
2025-11-11 14:29 ` [pbs-devel] [PATCH proxmox-backup v5 01/19] datastore: GC: drop overly verbose info message during s3 chunk sweep Christian Ebner
2025-11-11 14:29 ` [pbs-devel] [PATCH proxmox-backup v5 02/19] chunk store: implement per-chunk file locking helper for s3 backend Christian Ebner
2025-11-11 14:29 ` [pbs-devel] [PATCH proxmox-backup v5 03/19] datastore: acquire chunk store mutex lock when renaming corrupt chunk Christian Ebner
2025-11-11 14:29 ` [pbs-devel] [PATCH proxmox-backup v5 04/19] datastore: get per-chunk file lock for chunk rename on s3 backend Christian Ebner
2025-11-11 14:29 ` [pbs-devel] [PATCH proxmox-backup v5 05/19] fix #6961: datastore: verify: evict corrupt chunks from in-memory LRU cache Christian Ebner
2025-11-11 14:29 ` [pbs-devel] [PATCH proxmox-backup v5 06/19] datastore: add locking to protect against races on chunk insert for s3 Christian Ebner
2025-11-11 14:29 ` [pbs-devel] [PATCH proxmox-backup v5 07/19] GC: fix race with chunk upload/insert on s3 backends Christian Ebner
2025-11-11 14:29 ` [pbs-devel] [PATCH proxmox-backup v5 08/19] chunk store: reduce exposure of clear_chunk() to crate only Christian Ebner
2025-11-11 14:29 ` [pbs-devel] [PATCH proxmox-backup v5 09/19] chunk store: make chunk removal a helper method of the chunk store Christian Ebner
2025-11-11 14:29 ` [pbs-devel] [PATCH proxmox-backup v5 10/19] store: split insert_chunk into wrapper + unsafe locked implementation Christian Ebner
2025-11-11 14:29 ` [pbs-devel] [PATCH proxmox-backup v5 11/19] store: cache: move Mutex acquire to cache insertion Christian Ebner
2025-11-11 14:29 ` [pbs-devel] [PATCH proxmox-backup v5 12/19] chunk store: rename cache-specific helpers Christian Ebner
2025-11-11 14:29 ` [pbs-devel] [PATCH proxmox-backup v5 13/19] GC: cleanup chunk markers from cache in phase 3 on s3 backends Christian Ebner
2025-11-11 14:29 ` [pbs-devel] [PATCH proxmox-backup v5 14/19] GC: touch bad chunk files independent of backend type Christian Ebner
2025-11-11 14:29 ` [pbs-devel] [PATCH proxmox-backup v5 15/19] GC: guard missing marker file insertion for s3 backed stores Christian Ebner
2025-11-11 14:29 ` [pbs-devel] [PATCH proxmox-backup v5 16/19] GC: s3: track if a chunk marker file is missing since a bad chunk Christian Ebner
2025-11-11 14:30 ` [pbs-devel] [PATCH proxmox-backup v5 17/19] chunk store: add helpers marking missing local chunk markers as expected Christian Ebner
2025-11-11 14:30 ` [pbs-devel] [PATCH proxmox-backup v5 18/19] GC: assure chunk exists on s3 store when creating missing chunk marker Christian Ebner
2025-11-11 14:30 ` [pbs-devel] [PATCH proxmox-backup v5 19/19] datastore: document s3 backend specific locking restrictions Christian Ebner
2025-11-14 13:21 ` [pbs-devel] superseded: [PATCH proxmox-backup v5 00/19] fix chunk upload/insert, rename corrupt chunks and GC race conditions for s3 backend Christian Ebner