public inbox for pbs-devel@lists.proxmox.com
From: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
To: "Proxmox Backup Server development discussion"
	<pbs-devel@lists.proxmox.com>,
	"Fabian Grünbichler" <f.gruenbichler@proxmox.com>
Subject: Re: [pbs-devel] [PATCH proxmox-backup v5 2/4] partial fix #6049: datastore: impl ConfigVersionCache fast path for lookups
Date: Wed, 26 Nov 2025 18:21:13 +0100	[thread overview]
Message-ID: <601f8abd-ea47-4c3b-9930-ff52b8e07675@proxmox.com>
In-Reply-To: <1764167253.w84bii3yf4.astroid@yuna.none>

On 11/26/25 4:15 PM, Fabian Grünbichler wrote:
> On November 24, 2025 6:04 pm, Samuel Rufinatscha wrote:
>> Repeated /status requests caused lookup_datastore() to re-read and
>> parse datastore.cfg on every call. The issue was mentioned in report
>> #6049 [1]. cargo-flamegraph [2] confirmed that the hot path is
>> dominated by pbs_config::datastore::config() (config parsing).
>>
>> This patch implements caching of the global datastore.cfg using the
>> generation numbers from the shared config version cache. It caches the
>> datastore.cfg along with the generation number and, when a subsequent
>> lookup sees the same generation, it reuses the cached config without
>> re-reading it from disk. If the generation differs
>> (or the cache is unavailable), the config is re-read from disk.
>> If `update_cache = true`, the new config and current generation are
>> persisted in the cache. In this case, callers must hold the datastore
>> config lock to avoid racing with concurrent config changes.
>> If `update_cache` is `false` and generation did not match, the freshly
>> read config is returned but the cache is left unchanged. If
>> `ConfigVersionCache` is not available, the config is always read from
>> disk and `None` is returned as generation.
>>
>> Behavioral notes
>>
>> - The generation is bumped via the existing save_config() path, so
>>    API-driven config changes are detected immediately.
>> - Manual edits to datastore.cfg are not detected; this is covered in a
>>    dedicated patch in this series.
>> - DataStore::drop still performs a config read on the common path;
>>    also covered in a dedicated patch in this series.
>>
>> Links
>>
>> [1] Bugzilla: https://bugzilla.proxmox.com/show_bug.cgi?id=6049
>> [2] cargo-flamegraph: https://github.com/flamegraph-rs/flamegraph
>>
>> Fixes: #6049
>> Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
> 
> style nits below, but otherwise
> 
> Reviewed-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
> 
>> ---
>> Changes:
>>
>> From v1 → v2, thanks @Fabian
>> - Moved the ConfigVersionCache changes into its own patch.
>> - Introduced the global static DATASTORE_CONFIG_CACHE to store the
>> fully parsed datastore.cfg instead, along with its generation number.
>>    Introduced DatastoreConfigCache struct to hold both.
>> - Removed and replaced the CachedDatastoreConfigTag field of
>>    DataStoreImpl with a generation number field only (Option<usize>)
>>    to validate DataStoreImpl reuse.
>> - Added DataStore::datastore_section_config_cached() helper function
>>    to encapsulate the caching logic and simplify reuse.
>> - Modified DataStore::lookup_datastore() to use the new helper.
>>
>> From v2 → v3
>> No changes
>>
>> From v3 → v4, thanks @Fabian
>> - Restructured the version cache checks in
>>    datastore_section_config_cached(), to simplify the logic.
>> - Added update_cache parameter to datastore_section_config_cached() to
>>    control cache updates.
>>
>> From v4 → v5
>> - Rebased only, no changes
>>
>>   pbs-datastore/Cargo.toml       |   1 +
>>   pbs-datastore/src/datastore.rs | 138 +++++++++++++++++++++++++--------
>>   2 files changed, 105 insertions(+), 34 deletions(-)
> 
> this could be
> 
>   2 files changed, 80 insertions(+), 17 deletions(-)
> 
> see below. this might sound nit-picky, but keeping diffs concise makes
> reviewing a lot easier because of the higher signal to noise ratio..
> 
>>
>> diff --git a/pbs-datastore/Cargo.toml b/pbs-datastore/Cargo.toml
>> index 8ce930a9..42f49a7b 100644
>> --- a/pbs-datastore/Cargo.toml
>> +++ b/pbs-datastore/Cargo.toml
>> @@ -40,6 +40,7 @@ proxmox-io.workspace = true
>>   proxmox-lang.workspace=true
>>   proxmox-s3-client = { workspace = true, features = [ "impl" ] }
>>   proxmox-schema = { workspace = true, features = [ "api-macro" ] }
>> +proxmox-section-config.workspace = true
>>   proxmox-serde = { workspace = true, features = [ "serde_json" ] }
>>   proxmox-sys.workspace = true
>>   proxmox-systemd.workspace = true
>> diff --git a/pbs-datastore/src/datastore.rs b/pbs-datastore/src/datastore.rs
>> index 36550ff6..c9cb5d65 100644
>> --- a/pbs-datastore/src/datastore.rs
>> +++ b/pbs-datastore/src/datastore.rs
>> @@ -34,7 +34,8 @@ use pbs_api_types::{
>>       MaintenanceType, Operation, UPID,
>>   };
>>   use pbs_config::s3::S3_CFG_TYPE_ID;
>> -use pbs_config::BackupLockGuard;
>> +use pbs_config::{BackupLockGuard, ConfigVersionCache};
>> +use proxmox_section_config::SectionConfigData;
>>   
>>   use crate::backup_info::{
>>       BackupDir, BackupGroup, BackupInfo, OLD_LOCKING, PROTECTED_MARKER_FILENAME,
>> @@ -48,6 +49,17 @@ use crate::s3::S3_CONTENT_PREFIX;
>>   use crate::task_tracking::{self, update_active_operations};
>>   use crate::{DataBlob, LocalDatastoreLruCache};
>>   
>> +// Cache for fully parsed datastore.cfg
>> +struct DatastoreConfigCache {
>> +    // Parsed datastore.cfg file
>> +    config: Arc<SectionConfigData>,
>> +    // Generation number from ConfigVersionCache
>> +    last_generation: usize,
>> +}
>> +
>> +static DATASTORE_CONFIG_CACHE: LazyLock<Mutex<Option<DatastoreConfigCache>>> =
>> +    LazyLock::new(|| Mutex::new(None));
>> +
>>   static DATASTORE_MAP: LazyLock<Mutex<HashMap<String, Arc<DataStoreImpl>>>> =
>>       LazyLock::new(|| Mutex::new(HashMap::new()));
>>   
>> @@ -149,11 +161,13 @@ pub struct DataStoreImpl {
>>       last_gc_status: Mutex<GarbageCollectionStatus>,
>>       verify_new: bool,
>>       chunk_order: ChunkOrder,
>> -    last_digest: Option<[u8; 32]>,
>>       sync_level: DatastoreFSyncLevel,
>>       backend_config: DatastoreBackendConfig,
>>       lru_store_caching: Option<LocalDatastoreLruCache>,
>>       thread_settings: DatastoreThreadSettings,
>> +    /// Datastore generation number from `ConfigVersionCache` at creation time, used to
> 
> datastore.cfg cache generation number at lookup time, used to invalidate
> this cached `DataStoreImpl`
> 
> creation time could also refer to the datastore creation time..
>

Good point, agree! Will apply your suggestion, I like it more :)
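
As an aside, for anyone skimming the thread: the generation-gated caching
pattern the patch implements can be sketched in isolation roughly as below
(a standalone, simplified sketch — `String` stands in for the parsed
`SectionConfigData`, and the function names are made up for illustration):

```rust
use std::sync::{Arc, Mutex};

// Cached parsed config plus the generation number it was read under.
struct Cached {
    config: Arc<String>,
    last_generation: usize,
}

// `Mutex::new` is const, so this works in a static (stand-in for the
// LazyLock-wrapped DATASTORE_CONFIG_CACHE in the patch).
static CACHE: Mutex<Option<Cached>> = Mutex::new(None);

fn lookup(current_gen: usize, read_from_disk: impl Fn() -> String, update_cache: bool) -> Arc<String> {
    let mut cache = CACHE.lock().unwrap();
    if let Some(c) = cache.as_ref() {
        if c.last_generation == current_gen {
            // Fast path: generation unchanged, no disk read, no re-parse.
            return c.config.clone();
        }
    }
    // Slow path: re-read; only persist the new entry when asked to
    // (callers doing so must hold the config lock in the real code).
    let config = Arc::new(read_from_disk());
    if update_cache {
        *cache = Some(Cached { config: config.clone(), last_generation: current_gen });
    }
    config
}

fn main() {
    let a = lookup(1, || "cfg-v1".into(), true);
    // Same generation: must hit the fast path and return the same Arc.
    let b = lookup(1, || unreachable!("fast path must not re-read"), false);
    assert!(Arc::ptr_eq(&a, &b));
    // Stale generation with update_cache = false: re-read, cache untouched.
    let c = lookup(2, || "cfg-v2".into(), false);
    assert_eq!(*c, "cfg-v2");
}
```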

>> +    /// validate reuse of this cached `DataStoreImpl`.
>> +    config_generation: Option<usize>,
>>   }
>>   
>>   impl DataStoreImpl {
>> @@ -166,11 +180,11 @@ impl DataStoreImpl {
>>               last_gc_status: Mutex::new(GarbageCollectionStatus::default()),
>>               verify_new: false,
>>               chunk_order: Default::default(),
>> -            last_digest: None,
>>               sync_level: Default::default(),
>>               backend_config: Default::default(),
>>               lru_store_caching: None,
>>               thread_settings: Default::default(),
>> +            config_generation: None,
>>           })
>>       }
>>   }
>> @@ -286,6 +300,55 @@ impl DatastoreThreadSettings {
> 
> so this new helper is already +49 lines, which means it's the bulk of
> the change if the noise below is removed..
> 
>>       }
>>   }
>>   
>> +/// Returns the parsed datastore config (`datastore.cfg`) and its
>> +/// generation.
>> +///
>> +/// Uses `ConfigVersionCache` to detect stale entries:
>> +/// - If the cached generation matches the current generation, the
>> +///   cached config is returned.
>> +/// - Otherwise the config is re-read from disk. If `update_cache` is
>> +///   `true`, the new config and current generation are stored in the
>> +///   cache. Callers that set `update_cache = true` must hold the
>> +///   datastore config lock to avoid racing with concurrent config
>> +///   changes.
>> +/// - If `update_cache` is `false`, the freshly read config is returned
>> +///   but the cache is left unchanged.
>> +///
>> +/// If `ConfigVersionCache` is not available, the config is always read
>> +/// from disk and `None` is returned as the generation.
>> +fn datastore_section_config_cached(
>> +    update_cache: bool,
>> +) -> Result<(Arc<SectionConfigData>, Option<usize>), Error> {
>> +    let mut config_cache = DATASTORE_CONFIG_CACHE.lock().unwrap();
>> +
>> +    if let Ok(version_cache) = ConfigVersionCache::new() {
>> +        let current_gen = version_cache.datastore_generation();
>> +        if let Some(cached) = config_cache.as_ref() {
>> +            // Fast path: re-use cached datastore.cfg
>> +            if cached.last_generation == current_gen {
>> +                return Ok((cached.config.clone(), Some(cached.last_generation)));
>> +            }
>> +        }
>> +        // Slow path: re-read datastore.cfg
>> +        let (config_raw, _digest) = pbs_config::datastore::config()?;
>> +        let config = Arc::new(config_raw);
>> +
>> +        if update_cache {
>> +            *config_cache = Some(DatastoreConfigCache {
>> +                config: config.clone(),
>> +                last_generation: current_gen,
>> +            });
>> +        }
>> +
>> +        Ok((config, Some(current_gen)))
>> +    } else {
>> +        // Fallback path, no config version cache: read datastore.cfg and return None as generation
>> +        *config_cache = None;
>> +        let (config_raw, _digest) = pbs_config::datastore::config()?;
>> +        Ok((Arc::new(config_raw), None))
>> +    }
>> +}
>> +
>>   impl DataStore {
>>       // This one just panics on everything
>>       #[doc(hidden)]
>> @@ -363,56 +426,63 @@ impl DataStore {
>>           name: &str,
>>           operation: Option<Operation>,
>>       ) -> Result<Arc<DataStore>, Error> {
> 
> the changes here contain a lot of churn that is not needed for the
> actual change that is done - e.g., there's lots of variables being
> needless renamed..
> 
>> -        // Avoid TOCTOU between checking maintenance mode and updating active operation counter, as
>> -        // we use it to decide whether it is okay to delete the datastore.
>> +        // Avoid TOCTOU between checking maintenance mode and updating active operations.
>>           let _config_lock = pbs_config::datastore::lock_config()?;
>>   
>> -        // we could use the ConfigVersionCache's generation for staleness detection, but  we load
>> -        // the config anyway -> just use digest, additional benefit: manual changes get detected
>> -        let (config, digest) = pbs_config::datastore::config()?;
>> -        let config: DataStoreConfig = config.lookup("datastore", name)?;
>> +        // Get the current datastore.cfg generation number and cached config
>> +        let (section_config, gen_num) = datastore_section_config_cached(true)?;
>> +
>> +        let datastore_cfg: DataStoreConfig = section_config.lookup("datastore", name)?;
> 
> renaming this variable here already causes a lot of noise - if you
> really want to do that, do it up-front or at the end as cleanup commit
> (depending on how sure you are the change is agree-able - if it's almost
> certain it is accepted, put it up front, then it can get applied
> already. if not, put it at the end -> then it can be skipped without
> affecting the other patches ;))
> 
> but going from config to datastore_cfg doesn't make the code any clearer
> - the latter can still refer to the whole config (datastore.cfg) or the
> individual datastore's section within.. since we have no further business
> with the whole section config here, I think keeping this as `config` is
> fine..
> 

Agree, will change back!

As you mentioned, the idea behind the rename was to better differentiate
between the global datastore.cfg and the individual datastore config.
Since we have two configs in the same block, I tried to avoid relying on
"config". As you said, datastore_cfg does not make it much better and
could be misinterpreted. Ultimately, I aligned directly with the type
names (datastore_cfg for DataStoreConfig).

I see your point and agree:

> since we have no further business
> with the whole section config here, I think keeping this as `config` is
> fine..

This is the best solution here.

>> +        let maintenance_mode = datastore_cfg.get_maintenance_mode();
>> +        let mount_status = get_datastore_mount_status(&datastore_cfg);
> 
> and things extracted into variables here despite being used only once
> (this also doesn't change during the rest of the series)
> 

Good point, will remove!

>>   
>> -        if let Some(maintenance_mode) = config.get_maintenance_mode() {
>> -            if let Err(error) = maintenance_mode.check(operation) {
>> +        if let Some(mm) = &maintenance_mode {
> 
> binding name changes
> 
>> +            if let Err(error) = mm.check(operation.clone()) {
> 
> new clone() is introduced but not needed

Nice catch!

> 
>>                   bail!("datastore '{name}' is unavailable: {error}");
>>               }
>>           }
>>   
>> -        if get_datastore_mount_status(&config) == Some(false) {
>> -            let mut datastore_cache = DATASTORE_MAP.lock().unwrap();
>> -            datastore_cache.remove(&config.name);
>> -            bail!("datastore '{}' is not mounted", config.name);
>> +        let mut datastore_cache = DATASTORE_MAP.lock().unwrap();
> 
> moving this is fine!
> 

Helps to avoid the duplicate checks :)

>> +
>> +        if mount_status == Some(false) {
>> +            datastore_cache.remove(&datastore_cfg.name);
>> +            bail!("datastore '{}' is not mounted", datastore_cfg.name);
>>           }
>>   
>> -        let mut datastore_cache = DATASTORE_MAP.lock().unwrap();
>> -        let entry = datastore_cache.get(name);
> 
> this is changed to doing it twice and cloning..
> 

Refactoring this!

>> -
>> -        // reuse chunk store so that we keep using the same process locker instance!
>> -        let chunk_store = if let Some(datastore) = &entry {
> 
> structure changed here and binding renamed, even though the old one works as-is
> 

Agree, will revert!

>> -            let last_digest = datastore.last_digest.as_ref();
>> -            if let Some(true) = last_digest.map(|last_digest| last_digest == &digest) {
>> -                if let Some(operation) = operation {
>> -                    update_active_operations(name, operation, 1)?;
>> +        // Re-use DataStoreImpl
>> +        if let Some(existing) = datastore_cache.get(name).cloned() {
>> +            if let (Some(last_generation), Some(gen_num)) = (existing.config_generation, gen_num) {
>> +                if last_generation == gen_num {
> 
> these two ifs can be collapsed into
> 
> if datastore.config_generation == gen_num && gen_num.is_some() {
> 
> since we only want to reuse the entry if the current gen num is the same
> as the last one.
> 

Will collapse the two ifs, good catch!
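
For the record, the collapsed condition works because comparing two
`Option<usize>` values directly is only true when both are `Some` with
equal values, or both `None` — and the extra `is_some()` guard excludes
the both-`None` case. A minimal standalone sketch (function name made up):

```rust
// Collapsed reuse check from the review suggestion: reuse the cached
// DataStoreImpl only when a valid generation is available and matches.
// Both arguments are Option<usize>, so the == already handles the
// Some/None mixing; is_some() rules out reuse when no version cache
// exists (both sides None).
fn should_reuse(config_generation: Option<usize>, gen_num: Option<usize>) -> bool {
    config_generation == gen_num && gen_num.is_some()
}

fn main() {
    assert!(should_reuse(Some(3), Some(3)));
    assert!(!should_reuse(Some(2), Some(3)));
    assert!(!should_reuse(None, None)); // cache unavailable: always rebuild
    assert!(!should_reuse(Some(3), None));
}
```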

>> +                    if let Some(op) = operation {
> 
> binding needlessly renamed
> 

Agree, will revert!

>> +                        update_active_operations(name, op, 1)?;
>> +                    }
>> +
>> +                    return Ok(Arc::new(Self {
>> +                        inner: existing,
>> +                        operation,
>> +                    }));
>>                   }
>> -                return Ok(Arc::new(Self {
>> -                    inner: Arc::clone(datastore),
>> -                    operation,
>> -                }));
>>               }
>> -            Arc::clone(&datastore.chunk_store)
>> +        }
>> +
>> +        // (Re)build DataStoreImpl
>> +
>> +        // Reuse chunk store so that we keep using the same process locker instance!
>> +        let chunk_store = if let Some(existing) = datastore_cache.get(name) {
>> +            Arc::clone(&existing.chunk_store)
> 
> this one is not needed at all, can just be combined with the previous if
> as before
>

Will refactor!
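
One possible shape for that refactor (type, enum, and function names are
hypothetical, just to illustrate the structure): a single `get` on the
cache decides whether the whole impl can be reused, or only its chunk
store, or neither.

```rust
use std::collections::HashMap;
use std::sync::Arc;

// Stand-in for the cached DataStoreImpl; Arc<str> stands in for the
// Arc<ChunkStore> that must be reused to keep the same process locker.
struct CachedImpl {
    config_generation: Option<usize>,
    chunk_store: Arc<str>,
}

enum Lookup {
    ReuseImpl(Arc<CachedImpl>),    // generation matches: reuse wholesale
    ReuseChunkStore(Arc<str>),     // stale entry: reuse chunk store only
    Fresh,                         // no entry: open a fresh chunk store
}

fn lookup_cached(
    cache: &HashMap<String, Arc<CachedImpl>>,
    name: &str,
    gen_num: Option<usize>,
) -> Lookup {
    match cache.get(name) {
        Some(e) if e.config_generation == gen_num && gen_num.is_some() => {
            Lookup::ReuseImpl(e.clone())
        }
        Some(e) => Lookup::ReuseChunkStore(e.chunk_store.clone()),
        None => Lookup::Fresh,
    }
}

fn main() {
    let mut cache = HashMap::new();
    cache.insert(
        "store1".to_string(),
        Arc::new(CachedImpl { config_generation: Some(1), chunk_store: Arc::from("cs") }),
    );
    assert!(matches!(lookup_cached(&cache, "store1", Some(1)), Lookup::ReuseImpl(_)));
    assert!(matches!(lookup_cached(&cache, "store1", Some(2)), Lookup::ReuseChunkStore(_)));
    assert!(matches!(lookup_cached(&cache, "store2", Some(1)), Lookup::Fresh));
}
```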

>>           } else {
>>               let tuning: DatastoreTuning = serde_json::from_value(
>>                   DatastoreTuning::API_SCHEMA
>> -                    .parse_property_string(config.tuning.as_deref().unwrap_or(""))?,
>> +                    .parse_property_string(datastore_cfg.tuning.as_deref().unwrap_or(""))?,
>>               )?;
>>               Arc::new(ChunkStore::open(
>>                   name,
>> -                config.absolute_path(),
>> +                datastore_cfg.absolute_path(),
>>                   tuning.sync_level.unwrap_or_default(),
>>               )?)
>>           };
>>   
>> -        let datastore = DataStore::with_store_and_config(chunk_store, config, Some(digest))?;
>> +        let datastore = DataStore::with_store_and_config(chunk_store, datastore_cfg, gen_num)?;
>>   
>>           let datastore = Arc::new(datastore);
>>           datastore_cache.insert(name.to_string(), datastore.clone());
>> @@ -514,7 +584,7 @@ impl DataStore {
>>       fn with_store_and_config(
>>           chunk_store: Arc<ChunkStore>,
>>           config: DataStoreConfig,
>> -        last_digest: Option<[u8; 32]>,
>> +        generation: Option<usize>,
>>       ) -> Result<DataStoreImpl, Error> {
>>           let mut gc_status_path = chunk_store.base_path();
>>           gc_status_path.push(".gc-status");
>> @@ -579,11 +649,11 @@ impl DataStore {
>>               last_gc_status: Mutex::new(gc_status),
>>               verify_new: config.verify_new.unwrap_or(false),
>>               chunk_order: tuning.chunk_order.unwrap_or_default(),
>> -            last_digest,
>>               sync_level: tuning.sync_level.unwrap_or_default(),
>>               backend_config,
>>               lru_store_caching,
>>               thread_settings,
>> +            config_generation: generation,
>>           })
>>       }
>>   
>> -- 
>> 2.47.3
>>
>>
>>
>> _______________________________________________
>> pbs-devel mailing list
>> pbs-devel@lists.proxmox.com
>> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
>>

Thread overview: 15+ messages
2025-11-24 17:04 [pbs-devel] [PATCH proxmox-backup v5 0/4] datastore: remove config reload on hot path Samuel Rufinatscha
2025-11-24 17:04 ` [pbs-devel] [PATCH proxmox-backup v5 1/4] partial fix #6049: config: enable config version cache for datastore Samuel Rufinatscha
2025-11-26 15:15   ` Fabian Grünbichler
2025-11-24 17:04 ` [pbs-devel] [PATCH proxmox-backup v5 2/4] partial fix #6049: datastore: impl ConfigVersionCache fast path for lookups Samuel Rufinatscha
2025-11-26 15:15   ` Fabian Grünbichler
2025-11-26 17:21     ` Samuel Rufinatscha [this message]
2025-11-24 17:04 ` [pbs-devel] [PATCH proxmox-backup v5 3/4] partial fix #6049: datastore: use config fast-path in Drop Samuel Rufinatscha
2025-11-26 15:15   ` Fabian Grünbichler
2025-11-28  9:03     ` Samuel Rufinatscha
2025-11-28 10:46       ` Fabian Grünbichler
2025-11-28 11:10         ` Samuel Rufinatscha
2025-11-24 17:04 ` [pbs-devel] [PATCH proxmox-backup v5 4/4] partial fix #6049: datastore: add TTL fallback to catch manual config edits Samuel Rufinatscha
2025-11-26 15:15   ` Fabian Grünbichler
2025-11-26 15:16 ` [pbs-devel] [PATCH proxmox-backup v5 0/4] datastore: remove config reload on hot path Fabian Grünbichler
2025-11-26 16:10   ` Samuel Rufinatscha
