public inbox for pbs-devel@lists.proxmox.com
From: Christian Ebner <c.ebner@proxmox.com>
To: Robert Obkircher <r.obkircher@proxmox.com>, pbs-devel@lists.proxmox.com
Subject: Re: [PATCH proxmox-backup 4/4] datastore: fix sync level update propagation to chunk store
Date: Mon, 11 May 2026 14:56:12 +0200	[thread overview]
Message-ID: <0671c027-806a-4b12-b059-261ed33d4777@proxmox.com> (raw)
In-Reply-To: <76a86dc3-56f5-42b8-ae83-91cd98f47c3f@proxmox.com>

On 5/11/26 11:23 AM, Robert Obkircher wrote:
> 
> On 08.05.26 14:27, Christian Ebner wrote:
>> Changing the datastore tuning options triggers an invalidation of the
>> datastore cache entry, leading to re-instantiation with the new
>> config parameters on the next datastore lookup.
>> Since commit 0bd9c8701 ("datastore: lookup: reuse ChunkStore on stale
>> datastore re-open") this does however not lead to re-creation of the
>> chunk store instance in order to avoid dropping the process locker,
>> which would lead to losing any existing shared lock. However, as a
>> consequence the sync level is not update on the chunk store.
> typo: updated

acked!

>>
>> Fix this by:
>> - Storing the sync level as a runtime property of the chunk store,
>>    kept as state within the mutex that synchronizes concurrent
>>    modifying access. The mutex is held where needed anyways.
>> - Passing the mutex guard as an additional parameter to the methods
>>    requiring the locked state. This encodes the requirement for the
>>    mutex guard directly into the function signature instead of only
>>    labeling the methods as unsafe.
>> - Ensuring the previous sync level on config changes for consistency.
>>
>> Reported-by: Robert Obkircher <r.obkircher@proxmox.com>
>> Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
>> ---
>>   pbs-datastore/src/chunk_store.rs              | 59 +++++++++++--------
>>   pbs-datastore/src/datastore.rs                | 16 ++++-
>>   .../src/local_datastore_lru_cache.rs          | 12 ++--
>>   3 files changed, 56 insertions(+), 31 deletions(-)
>>
>> diff --git a/pbs-datastore/src/chunk_store.rs b/pbs-datastore/src/chunk_store.rs
>> index 2888dea39..e2219276d 100644
>> --- a/pbs-datastore/src/chunk_store.rs
>> +++ b/pbs-datastore/src/chunk_store.rs
>> @@ -1,7 +1,7 @@
>>   use std::os::unix::fs::MetadataExt;
>>   use std::os::unix::io::AsRawFd;
>>   use std::path::{Path, PathBuf};
>> -use std::sync::{Arc, Mutex};
>> +use std::sync::{Arc, Mutex, MutexGuard};
>>   use std::time::Duration;
>>   
>>   use anyhow::{bail, format_err, Context, Error};
>> @@ -27,14 +27,20 @@ use crate::{DataBlob, LocalDatastoreLruCache};
>>   
>>   const USING_MARKER_FILENAME_EXT: &str = "using";
>>   
>> +#[derive(Default)]
>> +/// Configurable runtime properties of a chunk store
>> +pub(crate) struct ChunkStoreProperties {
>> +    pub(crate) sync_level: DatastoreFSyncLevel,
>> +}
>> +
>>   /// File system based chunk store
>>   pub struct ChunkStore {
>>       name: String, // used for error reporting
>>       pub(crate) base: PathBuf,
>>       chunk_dir: PathBuf,
>> -    mutex: Mutex<()>,
>> +    // Mutex to sync chunk store access, including property updates
>> +    mutex: Arc<Mutex<ChunkStoreProperties>>,
>>       locker: Option<Arc<Mutex<ProcessLocker>>>,
>> -    sync_level: DatastoreFSyncLevel,
>>   }
>>   
>>   // TODO: what about sysctl setting vm.vfs_cache_pressure (0 - 100) ?
>> @@ -79,9 +85,8 @@ impl ChunkStore {
>>               name: String::new(),
>>               base: PathBuf::new(),
>>               chunk_dir: PathBuf::new(),
>> -            mutex: Mutex::new(()),
>> +            mutex: Arc::new(Mutex::new(Default::default())),
>>               locker: None,
>> -            sync_level: Default::default(),
>>           }
>>       }
>>   
>> @@ -204,16 +209,19 @@ impl ChunkStore {
>>               base,
>>               chunk_dir,
>>               locker: Some(locker),
>> -            mutex: Mutex::new(()),
>> -            sync_level,
>> +            mutex: Arc::new(Mutex::new(ChunkStoreProperties { sync_level })),
>>           })
>>       }
>>   
>> -    fn touch_chunk_no_lock(&self, digest: &[u8; 32]) -> Result<(), Error> {
>> +    fn touch_chunk_no_lock(
>> +        &self,
>> +        digest: &[u8; 32],
>> +        mutex_guard: MutexGuard<ChunkStoreProperties>,
>> +    ) -> Result<(), Error> {
>>           // unwrap: only `None` in unit tests
>>           assert!(self.locker.is_some());
>>   
>> -        self.cond_touch_chunk_no_lock(digest, true)?;
>> +        self.cond_touch_chunk_no_lock(digest, true, mutex_guard)?;
>>           Ok(())
>>       }
>>   
>> @@ -226,14 +234,15 @@ impl ChunkStore {
>>           digest: &[u8; 32],
>>           assert_exists: bool,
>>       ) -> Result<bool, Error> {
>> -        let _lock = self.mutex.lock();
>> -        self.cond_touch_chunk_no_lock(digest, assert_exists)
>> +        let lock = self.mutex.lock().unwrap();
>> +        self.cond_touch_chunk_no_lock(digest, assert_exists, lock)
>>       }
>>   
>>       fn cond_touch_chunk_no_lock(
>>           &self,
>>           digest: &[u8; 32],
>>           assert_exists: bool,
>> +        _mutex_guard: MutexGuard<ChunkStoreProperties>,
>>       ) -> Result<bool, Error> {
>>           // unwrap: only `None` in unit tests
>>           assert!(self.locker.is_some());
>> @@ -423,8 +432,12 @@ impl ChunkStore {
>>           ProcessLocker::oldest_shared_lock(self.locker.clone().unwrap())
>>       }
>>   
>> -    pub(crate) fn mutex(&self) -> &std::sync::Mutex<()> {
>> -        &self.mutex
>> +    /// Mutex to lock chunk store for exclusive access.
>> +    ///
>> +    /// Must be held when modifying chunk store contents and allows to update
>> +    /// chunk store runtime properties.
>> +    pub(crate) fn mutex(&self) -> Arc<Mutex<ChunkStoreProperties>> {
>> +        Arc::clone(&self.mutex)
> It's a bit unfortunate if we have to clone the Arc every time.
> 
> It looks like the methods that require the mutex guard all take self
> by shared reference, so it should be fine to keep returning that here
> as well, no?

Yes, should be avoided. Will fix for v2
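For illustration, a minimal sketch of the accessor shape suggested above (struct and methods reduced to stand-ins, `ChunkStoreProperties` simplified): since every call site takes `&self`, the accessor can keep returning `&Mutex<...>` and the borrow checker ties the guard's lifetime to that borrow, so no `Arc` clone per call is needed.

```rust
use std::sync::{Mutex, MutexGuard};

// Reduced stand-in for the properties struct from the patch.
#[derive(Default)]
pub struct ChunkStoreProperties {
    pub sync_level: u8, // placeholder for DatastoreFSyncLevel
}

pub struct ChunkStore {
    // No Arc wrapper needed: callers borrow the mutex through &self.
    mutex: Mutex<ChunkStoreProperties>,
}

impl ChunkStore {
    // Returning a reference avoids the per-call Arc clone.
    pub(crate) fn mutex(&self) -> &Mutex<ChunkStoreProperties> {
        &self.mutex
    }

    // The guard parameter encodes "lock must be held" in the
    // signature, replacing the previous `unsafe` labeling.
    fn touch_chunk_no_lock(
        &self,
        _digest: &[u8; 32],
        _guard: MutexGuard<ChunkStoreProperties>,
    ) {
        // ... touch the chunk file while the store is locked ...
    }
}

fn main() {
    let store = ChunkStore {
        mutex: Mutex::new(Default::default()),
    };
    let guard = store.mutex().lock().unwrap();
    store.touch_chunk_no_lock(&[0u8; 32], guard);
    // Guard was consumed above, so the mutex is free again here.
    assert!(store.mutex().lock().is_ok());
}
```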

> 
>>       }
>>   
>>       pub fn sweep_unused_chunks(
>> @@ -665,18 +678,17 @@ impl ChunkStore {
>>   
>>           //println!("DIGEST {}", hex::encode(digest));
>>   
>> -        let _lock = self.mutex.lock();
>> +        let lock = self.mutex.lock().unwrap();
>>   
>> -        // Safety: lock acquired above
>> -        unsafe { self.insert_chunk_nolock(chunk, digest, true) }
>> +        self.insert_chunk_nolock(chunk, digest, true, lock)
>>       }
>>   
>> -    /// Safety: requires holding the chunk store mutex!
>> -    pub(crate) unsafe fn insert_chunk_nolock(
>> +    pub(crate) fn insert_chunk_nolock(
>>           &self,
>>           chunk: &DataBlob,
>>           digest: &[u8; 32],
>>           warn_on_overwrite_empty: bool,
>> +        mutex_guard: MutexGuard<ChunkStoreProperties>,
>>       ) -> Result<(bool, u64), Error> {
>>           // unwrap: only `None` in unit tests
>>           assert!(self.locker.is_some());
>> @@ -694,7 +706,7 @@ impl ChunkStore {
>>               }
>>               let old_size = metadata.len();
>>               if encoded_size == old_size {
>> -                self.touch_chunk_no_lock(digest)?;
>> +                self.touch_chunk_no_lock(digest, mutex_guard)?;
>>                   return Ok((true, old_size));
>>               } else if old_size == 0 {
>>                   if warn_on_overwrite_empty {
>> @@ -721,11 +733,11 @@ impl ChunkStore {
>>                   // compressed, the size mismatch could be caused by different zstd versions
>>                   // so let's keep the one that was uploaded first, bit-rot is hopefully detected by
>>                   // verification at some point..
>> -                self.touch_chunk_no_lock(digest)?;
>> +                self.touch_chunk_no_lock(digest, mutex_guard)?;
>>                   return Ok((true, old_size));
>>               } else if old_size < encoded_size {
>>                   log::debug!("Got another copy of chunk with digest '{digest_str}', existing chunk is smaller, discarding uploaded one.");
>> -                self.touch_chunk_no_lock(digest)?;
>> +                self.touch_chunk_no_lock(digest, mutex_guard)?;
>>                   return Ok((true, old_size));
>>               } else {
>>                   log::debug!("Got another copy of chunk with digest '{digest_str}', existing chunk is bigger, replacing with uploaded one.");
>> @@ -742,17 +754,18 @@ impl ChunkStore {
>>               let gid = pbs_config::backup_group()?.gid;
>>               create_options = create_options.owner(uid).group(gid);
>>           }
>> +
>>           proxmox_sys::fs::replace_file(
>>               &chunk_path,
>>               raw_data,
>>               create_options,
>> -            self.sync_level == DatastoreFSyncLevel::File,
>> +            mutex_guard.sync_level == DatastoreFSyncLevel::File,
>>           )
>>           .map_err(|err| {
>>               format_err!("inserting chunk on store '{name}' failed for {digest_str} - {err}")
>>           })?;
>>   
>> -        if self.sync_level == DatastoreFSyncLevel::File {
>> +        if mutex_guard.sync_level == DatastoreFSyncLevel::File {
>>               // fsync dir handle to persist the tmp rename
>>               let dir = std::fs::File::open(chunk_dir_path)?;
>>               nix::unistd::fsync(dir.as_raw_fd())
>> diff --git a/pbs-datastore/src/datastore.rs b/pbs-datastore/src/datastore.rs
>> index 8f69dd7ac..379f797be 100644
>> --- a/pbs-datastore/src/datastore.rs
>> +++ b/pbs-datastore/src/datastore.rs
>> @@ -610,12 +610,22 @@ impl DataStore {
>>                       operation: Some(lookup.operation),
>>                   }));
>>               }
>> +
>> +            datastore.try_ensure_sync_level()?;
> This should only be called if the level actually changes.
> 
> There is also a small race condition with try_ensure_sync_level. It
> could be solved by locking the mutex in both cases.

Yes, will fix, thanks!
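One possible shape for that fix, sketched with placeholder types (`FsyncLevel` stands in for DatastoreFSyncLevel, the actual method names may differ): take the properties lock once, compare and update under that single guard, and report whether the level actually changed so the follow-up sync work runs only when needed.

```rust
use std::sync::Mutex;

// Placeholder for DatastoreFSyncLevel.
#[derive(Clone, Copy, PartialEq, Eq, Debug, Default)]
enum FsyncLevel {
    #[default]
    None,
    File,
    Filesystem,
}

struct ChunkStore {
    properties: Mutex<FsyncLevel>,
}

impl ChunkStore {
    /// Compare and update the sync level under one lock acquisition,
    /// closing the read-then-write race window, and return whether it
    /// changed so callers can skip try_ensure_sync_level otherwise.
    fn set_sync_level(&self, new_level: FsyncLevel) -> bool {
        let mut guard = self.properties.lock().unwrap();
        if *guard == new_level {
            return false; // unchanged: nothing to propagate
        }
        *guard = new_level;
        true
    }
}

fn main() {
    let store = ChunkStore {
        properties: Mutex::new(FsyncLevel::None),
    };
    assert!(store.set_sync_level(FsyncLevel::File)); // changed
    assert!(!store.set_sync_level(FsyncLevel::File)); // no-op
}
```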

>> +            let tuning: DatastoreTuning = serde_json::from_value(
>> +                DatastoreTuning::API_SCHEMA
>> +                    .parse_property_string(config.tuning.as_deref().unwrap_or(""))?,
>> +            )?;
>> +            let mutex = datastore.chunk_store.mutex();
>> +            let mut mutex_guard = mutex.lock().unwrap();
>> +            mutex_guard.sync_level = tuning.sync_level.unwrap_or_default();
>>               Arc::clone(&datastore.chunk_store)
>>           } else {
>>               let tuning: DatastoreTuning = serde_json::from_value(
>>                   DatastoreTuning::API_SCHEMA
>>                       .parse_property_string(config.tuning.as_deref().unwrap_or(""))?,
>>               )?;
>> +
>>               Arc::new(ChunkStore::open(
>>                   lookup.name,
>>                   config.absolute_path(),
>> @@ -2499,7 +2509,8 @@ impl DataStore {
>>                           Ok(guard) => guard,
>>                           Err(_) => continue,
>>                       };
>> -                    let _guard = self.inner.chunk_store.mutex().lock().unwrap();
>> +                    let mutex = self.inner.chunk_store.mutex();
>> +                    let _guard = mutex.lock().unwrap();
>>   
>>                       // Check local markers (created or atime updated during phase1) and
>>                       // keep or delete chunk based on that.
>> @@ -3450,7 +3461,8 @@ impl DataStore {
>>                   .chunk_store
>>                   .lock_chunk(digest, CHUNK_LOCK_TIMEOUT)?;
>>           }
>> -        let _lock = self.inner.chunk_store.mutex().lock().unwrap();
>> +        let mutex = self.inner.chunk_store.mutex();
>> +        let _lock = mutex.lock().unwrap();
>>   
>>           let (new_path, counter) = self.inner.chunk_store.next_bad_chunk_path(digest);
>>   
>> diff --git a/pbs-datastore/src/local_datastore_lru_cache.rs b/pbs-datastore/src/local_datastore_lru_cache.rs
>> index ac27d4637..cf38d4a57 100644
>> --- a/pbs-datastore/src/local_datastore_lru_cache.rs
>> +++ b/pbs-datastore/src/local_datastore_lru_cache.rs
>> @@ -34,12 +34,11 @@ impl LocalDatastoreLruCache {
>>       ///
>>       /// Fails if the chunk cannot be inserted successfully.
>>       pub fn insert(&self, digest: &[u8; 32], chunk: &DataBlob) -> Result<(), Error> {
>> -        let _lock = self.store.mutex().lock().unwrap();
>> +        let shared_mutex = self.store.mutex();
>> +        let lock = shared_mutex.lock().unwrap();
>> +
>> +        self.store.insert_chunk_nolock(chunk, digest, false, lock)?;
>>   
>> -        // Safety: lock acquire above
>> -        unsafe {
>> -            self.store.insert_chunk_nolock(chunk, digest, false)?;
>> -        }
>>           self.cache.insert(*digest, (), |digest| {
>>               // Safety: lock acquired above, this is executed inline!
>>               unsafe {
>> @@ -80,7 +79,8 @@ impl LocalDatastoreLruCache {
>>               Ok(mut file) => match DataBlob::load_from_reader(&mut file) {
>>                   // File was still cached with contents, load response from file
>>                   Ok(chunk) => {
>> -                    let _lock = self.store.mutex().lock().unwrap();
>> +                    let shared_mutex = self.store.mutex();
>> +                    let _lock = shared_mutex.lock().unwrap();
>>                       self.cache.insert(*digest, (), |digest| {
>>                           // Safety: lock acquired above, this is executed inline
>>                           unsafe {


Thread overview: 10+ messages
2026-05-08 12:29 [PATCH proxmox-backup 0/4] fix sync level updates for chunk store Christian Ebner
2026-05-08 12:29 ` [PATCH proxmox-backup 1/4] datastore: restrict chunk store mutex scope to crate only Christian Ebner
2026-05-08 12:29 ` [PATCH proxmox-backup 2/4] datastore: avoid useless double borrowing of datastore Christian Ebner
2026-05-08 12:29 ` [PATCH proxmox-backup 3/4] datastore: move try_ensure_sync_level() to DataStoreImpl Christian Ebner
2026-05-11  9:23   ` Robert Obkircher
2026-05-11 12:53     ` Christian Ebner
2026-05-08 12:29 ` [PATCH proxmox-backup 4/4] datastore: fix sync level update propagation to chunk store Christian Ebner
2026-05-11  9:25   ` Robert Obkircher
2026-05-11 12:56     ` Christian Ebner [this message]
2026-05-12  8:56 ` [PATCH proxmox-backup 0/4] fix sync level updates for " Christian Ebner
