From: Christian Ebner <c.ebner@proxmox.com>
To: "Proxmox Backup Server development discussion"
	<pbs-devel@lists.proxmox.com>,
	"Fabian Grünbichler" <f.gruenbichler@proxmox.com>
Subject: Re: [pbs-devel] [PATCH proxmox-backup 10/17] datastore: implement per-chunk file locking helper for s3 backend
Date: Tue, 4 Nov 2025 09:45:41 +0100	[thread overview]
Message-ID: <876f89a6-a91a-4305-a848-42ec9d670842@proxmox.com> (raw)
In-Reply-To: <1762177869.199x08a4up.astroid@yuna.none>

On 11/3/25 3:50 PM, Fabian Grünbichler wrote:
> On November 3, 2025 12:31 pm, Christian Ebner wrote:
>> Adds a datastore helper method to create per-chunk file locks. These
>> will be used to guard chunk operations on s3 backends to guarantee
>> exclusive access when performing cache and backend operations.
>>
>> Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
>> ---
>>   pbs-datastore/src/backup_info.rs |  2 +-
>>   pbs-datastore/src/chunk_store.rs | 26 ++++++++++++++++++++++++++
>>   pbs-datastore/src/datastore.rs   | 12 ++++++++++++
>>   3 files changed, 39 insertions(+), 1 deletion(-)
>>
>> diff --git a/pbs-datastore/src/backup_info.rs b/pbs-datastore/src/backup_info.rs
>> index 4b10b6435..70c0fbe8a 100644
>> --- a/pbs-datastore/src/backup_info.rs
>> +++ b/pbs-datastore/src/backup_info.rs
>> @@ -936,7 +936,7 @@ fn lock_file_path_helper(ns: &BackupNamespace, path: PathBuf) -> PathBuf {
>>   /// deletion.
>>   ///
>>   /// It also creates the base directory for lock files.
>> -fn lock_helper<F>(
>> +pub(crate) fn lock_helper<F>(
>>       store_name: &str,
>>       path: &std::path::Path,
>>       lock_fn: F,
>> diff --git a/pbs-datastore/src/chunk_store.rs b/pbs-datastore/src/chunk_store.rs
>> index ba7618e40..49687b2fa 100644
>> --- a/pbs-datastore/src/chunk_store.rs
>> +++ b/pbs-datastore/src/chunk_store.rs
>> @@ -8,6 +8,7 @@ use anyhow::{bail, format_err, Context, Error};
>>   use tracing::{info, warn};
>>   
>>   use pbs_api_types::{DatastoreFSyncLevel, GarbageCollectionStatus};
>> +use pbs_config::BackupLockGuard;
>>   use proxmox_io::ReadExt;
>>   use proxmox_s3_client::S3Client;
>>   use proxmox_sys::fs::{create_dir, create_path, file_type_from_file_stat, CreateOptions};
>> @@ -16,6 +17,7 @@ use proxmox_sys::process_locker::{
>>   };
>>   use proxmox_worker_task::WorkerTaskContext;
>>   
>> +use crate::backup_info::DATASTORE_LOCKS_DIR;
>>   use crate::data_blob::DataChunkBuilder;
>>   use crate::file_formats::{
>>       COMPRESSED_BLOB_MAGIC_1_0, ENCRYPTED_BLOB_MAGIC_1_0, UNCOMPRESSED_BLOB_MAGIC_1_0,
>> @@ -759,6 +761,30 @@ impl ChunkStore {
>>           ChunkStore::check_permissions(lockfile_path, 0o644)?;
>>           Ok(())
>>       }
>> +
>> +    /// Generates the path to the chunks lock file
>> +    pub(crate) fn chunk_lock_path(&self, digest: &[u8]) -> PathBuf {
>> +        let mut lock_path = Path::new(DATASTORE_LOCKS_DIR).join(self.name.clone());
>> +        let digest_str = hex::encode(digest);
>> +        lock_path.push(".chunks");
>> +        let prefix = digest_to_prefix(digest);
>> +        lock_path.push(&prefix);
>> +        lock_path.push(&digest_str);
>> +        lock_path
> 
> should we add "s3" or some suffix here, so that if we add another
> backend in the future we already have specific paths?

But the backend is a property of the datastore, and the lock paths are 
already prefixed by the datastore name in any case. This would just add 
an additional directory level without much gain, so in my opinion this 
should stay as is.
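For illustration, the resulting per-store layout can be sketched like this (note: `LOCKS_DIR` below is a hypothetical stand-in for the real `DATASTORE_LOCKS_DIR` constant, and the 4-hex-char prefix only mimics what `digest_to_prefix()` does; `hex_encode` replaces the `hex` crate to keep the sketch dependency-free):

```rust
use std::path::{Path, PathBuf};

// Hypothetical stand-in for DATASTORE_LOCKS_DIR from pbs-datastore's
// backup_info module; the exact value does not matter for the layout.
const LOCKS_DIR: &str = "/run/proxmox-backup/locks";

// Minimal stand-in for hex::encode.
fn hex_encode(data: &[u8]) -> String {
    data.iter().map(|b| format!("{b:02x}")).collect()
}

// Mirrors chunk_lock_path(): lock files are namespaced per datastore,
// so two stores never share a chunk lock file even for equal digests.
fn chunk_lock_path(store_name: &str, digest: &[u8]) -> PathBuf {
    let digest_str = hex_encode(digest);
    let prefix = digest_str[..4].to_string(); // mimics digest_to_prefix()
    Path::new(LOCKS_DIR)
        .join(store_name)
        .join(".chunks")
        .join(prefix)
        .join(&digest_str)
}

fn main() {
    // e.g. /run/proxmox-backup/locks/store-a/.chunks/abcd/abcdef
    println!("{}", chunk_lock_path("store-a", &[0xab, 0xcd, 0xef]).display());
}
```

So the store name already provides the separation a backend suffix would add.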

> 
>> +    }
>> +
>> +    /// Get an exclusive lock on the chunks lock file
>> +    pub(crate) fn lock_chunk(
>> +        &self,
>> +        digest: &[u8],
>> +        timeout: Duration,
>> +    ) -> Result<BackupLockGuard, Error> {
>> +        let lock_path = self.chunk_lock_path(digest);
>> +        let guard = crate::backup_info::lock_helper(self.name(), &lock_path, |path| {
>> +            pbs_config::open_backup_lockfile(path, Some(timeout), true)
>> +        })?;
>> +        Ok(guard)
>> +    }
>>   }
>>   
>>   #[test]
>> diff --git a/pbs-datastore/src/datastore.rs b/pbs-datastore/src/datastore.rs
>> index 397c37e56..32f3562b3 100644
>> --- a/pbs-datastore/src/datastore.rs
>> +++ b/pbs-datastore/src/datastore.rs
>> @@ -2568,6 +2568,18 @@ impl DataStore {
>>           Ok(())
>>       }
>>   
>> +    /// Locks the per chunk lock file if the backend requires it
>> +    fn lock_chunk_for_backend(&self, digest: &[u8; 32]) -> Result<Option<BackupLockGuard>, Error> {
>> +        // s3 put request times out after upload_size / 1 Kib/s, so about 2.3 hours for 8 MiB
>> +        let timeout = Duration::from_secs(3 * 60 * 60);
> 
> could move into the S3 branch below.. or be made S3 specific in the
> first place, since it is only called/effective there? the renaming
> helper needs some rework then I guess..

I introduced this exactly so as not to have to move it into the rename 
helper, as that is already convoluted enough in my opinion. But I can 
have a look at whether it makes sense to inline this. Also, the Duration 
instantiation does not need to happen for both cases, so I will move 
that to the s3 specific part.
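Roughly what I have in mind, as a simplified sketch (the types below are stand-ins for the real pbs-datastore ones, not the actual implementation):

```rust
use std::time::Duration;

// Simplified stand-ins for the real pbs types, for illustration only.
enum DatastoreBackendType {
    Filesystem,
    S3,
}

struct LockGuard; // placeholder for BackupLockGuard

// Placeholder for ChunkStore::lock_chunk(); the real one opens the
// per-chunk lock file with the given timeout.
fn lock_chunk(_digest: &[u8; 32], _timeout: Duration) -> LockGuard {
    LockGuard
}

// Sketch of the refactor: the Duration is only constructed inside the
// S3 arm, since the filesystem backend never takes a per-chunk lock.
fn lock_chunk_for_backend(
    backend: &DatastoreBackendType,
    digest: &[u8; 32],
) -> Option<LockGuard> {
    match backend {
        DatastoreBackendType::Filesystem => None,
        DatastoreBackendType::S3 => {
            // s3 put requests time out after upload_size / 1 KiB/s,
            // i.e. roughly 2.3 hours for an 8 MiB chunk
            let timeout = Duration::from_secs(3 * 60 * 60);
            Some(lock_chunk(digest, timeout))
        }
    }
}

fn main() {
    let digest = [0u8; 32];
    assert!(lock_chunk_for_backend(&DatastoreBackendType::Filesystem, &digest).is_none());
    assert!(lock_chunk_for_backend(&DatastoreBackendType::S3, &digest).is_some());
}
```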

> 
> but I am not sure if this logic here is really sound (any individual
> caller waiting for longer than a single uploads max timeout might be
> valid, since the locking is not fair and multiple locking attempts might
> have queued up), I guess the instances where we end up taking this lock
> are few enough that no progress over such a long time makes any progress
> within reasonable time unlikely..
> 
> we currently take this lock for the duration of a chunk
> upload/insertion, for the duration of a chunk rename after corruption
> has been detected, and for a batch of GC chunk removal.

Hmm, what possible alternatives do I have here? Bypassing the lock 
helper and acquiring the file lock without a timeout? Also not ideal, I 
guess.
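One (untested) alternative sketch would be a bounded try-lock loop instead of a single blocking acquire, so each attempt's wait stays observable to the caller; a std Mutex stands in for the on-disk lock file here, whereas the real code would need a non-blocking flock attempt:

```rust
use std::sync::{Mutex, MutexGuard};
use std::thread;
use std::time::{Duration, Instant};

// Illustration only: try to acquire the lock repeatedly until a
// deadline passes, sleeping `backoff` between attempts. Returns None on
// timeout so the caller decides whether to give up, log and retry, or
// fall back to a blocking acquire.
fn lock_with_deadline<'a, T>(
    lock: &'a Mutex<T>,
    deadline: Duration,
    backoff: Duration,
) -> Option<MutexGuard<'a, T>> {
    let start = Instant::now();
    loop {
        if let Ok(guard) = lock.try_lock() {
            return Some(guard);
        }
        if start.elapsed() >= deadline {
            return None;
        }
        thread::sleep(backoff);
    }
}

fn main() {
    let lock = Mutex::new(());
    // uncontended: acquired immediately
    assert!(lock_with_deadline(&lock, Duration::from_millis(50), Duration::from_millis(5)).is_some());
    // held elsewhere: the attempt times out and returns None
    let _held = lock.lock().unwrap();
    assert!(lock_with_deadline(&lock, Duration::from_millis(50), Duration::from_millis(5)).is_none());
}
```

This does not make the locking fair either, though, so it would not really address the queueing concern.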

> 
>> +        match self.inner.backend_config.ty.unwrap_or_default() {
>> +            DatastoreBackendType::Filesystem => Ok(None),
>> +            DatastoreBackendType::S3 => {
>> +                self.inner.chunk_store.lock_chunk(digest, timeout).map(Some)
>> +            }
>> +        }
>> +    }
>> +
>>       /// Renames a corrupt chunk, returning the new path if the chunk was renamed successfully.
>>       /// Returns with `Ok(None)` if the chunk source was not found.
>>       pub fn rename_corrupt_chunk(&self, digest: &[u8; 32]) -> Result<Option<PathBuf>, Error> {
>> -- 
>> 2.47.3
>>
>>
>>
>> _______________________________________________
>> pbs-devel mailing list
>> pbs-devel@lists.proxmox.com
>> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel



