From: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
To: "Proxmox Backup Server development discussion"
	<pbs-devel@lists.proxmox.com>,
	"Fabian Grünbichler" <f.gruenbichler@proxmox.com>
Subject: Re: [pbs-devel] [PATCH proxmox-backup v5 2/4] partial fix #6049: datastore: impl ConfigVersionCache fast path for lookups
Date: Wed, 26 Nov 2025 18:21:13 +0100	[thread overview]
Message-ID: <601f8abd-ea47-4c3b-9930-ff52b8e07675@proxmox.com> (raw)
In-Reply-To: <1764167253.w84bii3yf4.astroid@yuna.none>

On 11/26/25 4:15 PM, Fabian Grünbichler wrote:
> On November 24, 2025 6:04 pm, Samuel Rufinatscha wrote:
>> Repeated /status requests caused lookup_datastore() to re-read and
>> parse datastore.cfg on every call. The issue was mentioned in report
>> #6049 [1]. cargo-flamegraph [2] confirmed that the hot path is
>> dominated by pbs_config::datastore::config() (config parsing).
>>
>> This patch implements caching of the global datastore.cfg using the
>> generation numbers from the shared config version cache. It caches the
>> datastore.cfg along with the generation number and, when a subsequent
>> lookup sees the same generation, it reuses the cached config without
>> re-reading it from disk. If the generation differs
>> (or the cache is unavailable), the config is re-read from disk.
>> If `update_cache = true`, the new config and current generation are
>> persisted in the cache. In this case, callers must hold the datastore
>> config lock to avoid racing with concurrent config changes.
>> If `update_cache` is `false` and generation did not match, the freshly
>> read config is returned but the cache is left unchanged. If
>> `ConfigVersionCache` is not available, the config is always read from
>> disk and `None` is returned as generation.
>>
>> Behavioral notes
>>
>> - The generation is bumped via the existing save_config() path, so
>>    API-driven config changes are detected immediately.
>> - Manual edits to datastore.cfg are not detected; this is covered in a
>>    dedicated patch in this series.
>> - DataStore::drop still performs a config read on the common path;
>>    also covered in a dedicated patch in this series.
>>
>> Links
>>
>> [1] Bugzilla: https://bugzilla.proxmox.com/show_bug.cgi?id=6049
>> [2] cargo-flamegraph: https://github.com/flamegraph-rs/flamegraph
>>
>> Fixes: #6049
>> Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
> 
> style nits below, but otherwise
> 
> Reviewed-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
> 
>> ---
>> Changes:
>>
>>  From v1 → v2, thanks @Fabian
>> - Moved the ConfigVersionCache changes into its own patch.
>> - Introduced the global static DATASTORE_CONFIG_CACHE to store the
>> fully parsed datastore.cfg instead, along with its generation number.
>>    Introduced DatastoreConfigCache struct to hold both.
>> - Removed and replaced the CachedDatastoreConfigTag field of
>>    DataStoreImpl with a generation number field only (Option<usize>)
>>    to validate DataStoreImpl reuse.
>> - Added DataStore::datastore_section_config_cached() helper function
>>    to encapsulate the caching logic and simplify reuse.
>> - Modified DataStore::lookup_datastore() to use the new helper.
>>
>>  From v2 → v3
>> No changes
>>
>>  From v3 → v4, thanks @Fabian
>> - Restructured the version cache checks in
>>    datastore_section_config_cached(), to simplify the logic.
>> - Added update_cache parameter to datastore_section_config_cached() to
>>    control cache updates.
>>
>>  From v4 → v5
>> - Rebased only, no changes
>>
>>   pbs-datastore/Cargo.toml       |   1 +
>>   pbs-datastore/src/datastore.rs | 138 +++++++++++++++++++++++++--------
>>   2 files changed, 105 insertions(+), 34 deletions(-)
> 
> this could be
> 
>   2 files changed, 80 insertions(+), 17 deletions(-)
> 
> see below. this might sound nit-picky, but keeping diffs concise makes
> reviewing a lot easier because of the higher signal to noise ratio..
> 
>>
>> diff --git a/pbs-datastore/Cargo.toml b/pbs-datastore/Cargo.toml
>> index 8ce930a9..42f49a7b 100644
>> --- a/pbs-datastore/Cargo.toml
>> +++ b/pbs-datastore/Cargo.toml
>> @@ -40,6 +40,7 @@ proxmox-io.workspace = true
>>   proxmox-lang.workspace=true
>>   proxmox-s3-client = { workspace = true, features = [ "impl" ] }
>>   proxmox-schema = { workspace = true, features = [ "api-macro" ] }
>> +proxmox-section-config.workspace = true
>>   proxmox-serde = { workspace = true, features = [ "serde_json" ] }
>>   proxmox-sys.workspace = true
>>   proxmox-systemd.workspace = true
>> diff --git a/pbs-datastore/src/datastore.rs b/pbs-datastore/src/datastore.rs
>> index 36550ff6..c9cb5d65 100644
>> --- a/pbs-datastore/src/datastore.rs
>> +++ b/pbs-datastore/src/datastore.rs
>> @@ -34,7 +34,8 @@ use pbs_api_types::{
>>       MaintenanceType, Operation, UPID,
>>   };
>>   use pbs_config::s3::S3_CFG_TYPE_ID;
>> -use pbs_config::BackupLockGuard;
>> +use pbs_config::{BackupLockGuard, ConfigVersionCache};
>> +use proxmox_section_config::SectionConfigData;
>>   
>>   use crate::backup_info::{
>>       BackupDir, BackupGroup, BackupInfo, OLD_LOCKING, PROTECTED_MARKER_FILENAME,
>> @@ -48,6 +49,17 @@ use crate::s3::S3_CONTENT_PREFIX;
>>   use crate::task_tracking::{self, update_active_operations};
>>   use crate::{DataBlob, LocalDatastoreLruCache};
>>   
>> +// Cache for fully parsed datastore.cfg
>> +struct DatastoreConfigCache {
>> +    // Parsed datastore.cfg file
>> +    config: Arc<SectionConfigData>,
>> +    // Generation number from ConfigVersionCache
>> +    last_generation: usize,
>> +}
>> +
>> +static DATASTORE_CONFIG_CACHE: LazyLock<Mutex<Option<DatastoreConfigCache>>> =
>> +    LazyLock::new(|| Mutex::new(None));
>> +
>>   static DATASTORE_MAP: LazyLock<Mutex<HashMap<String, Arc<DataStoreImpl>>>> =
>>       LazyLock::new(|| Mutex::new(HashMap::new()));
>>   
>> @@ -149,11 +161,13 @@ pub struct DataStoreImpl {
>>       last_gc_status: Mutex<GarbageCollectionStatus>,
>>       verify_new: bool,
>>       chunk_order: ChunkOrder,
>> -    last_digest: Option<[u8; 32]>,
>>       sync_level: DatastoreFSyncLevel,
>>       backend_config: DatastoreBackendConfig,
>>       lru_store_caching: Option<LocalDatastoreLruCache>,
>>       thread_settings: DatastoreThreadSettings,
>> +    /// Datastore generation number from `ConfigVersionCache` at creation time, used to
> 
> datastore.cfg cache generation number at lookup time, used to invalidate
> this cached `DataStoreImpl`
> 
> creation time could also refer to the datastore creation time..
>

Good point, agree! Will apply your suggestion, I like it more :)

>> +    /// validate reuse of this cached `DataStoreImpl`.
>> +    config_generation: Option<usize>,
>>   }
>>   
>>   impl DataStoreImpl {
>> @@ -166,11 +180,11 @@ impl DataStoreImpl {
>>               last_gc_status: Mutex::new(GarbageCollectionStatus::default()),
>>               verify_new: false,
>>               chunk_order: Default::default(),
>> -            last_digest: None,
>>               sync_level: Default::default(),
>>               backend_config: Default::default(),
>>               lru_store_caching: None,
>>               thread_settings: Default::default(),
>> +            config_generation: None,
>>           })
>>       }
>>   }
>> @@ -286,6 +300,55 @@ impl DatastoreThreadSettings {
> 
> so this new helper is already +49 lines, which means it's the bulk of
> the change if the noise below is removed..
> 
>>       }
>>   }
>>   
>> +/// Returns the parsed datastore config (`datastore.cfg`) and its
>> +/// generation.
>> +///
>> +/// Uses `ConfigVersionCache` to detect stale entries:
>> +/// - If the cached generation matches the current generation, the
>> +///   cached config is returned.
>> +/// - Otherwise the config is re-read from disk. If `update_cache` is
>> +///   `true`, the new config and current generation are stored in the
>> +///   cache. Callers that set `update_cache = true` must hold the
>> +///   datastore config lock to avoid racing with concurrent config
>> +///   changes.
>> +/// - If `update_cache` is `false`, the freshly read config is returned
>> +///   but the cache is left unchanged.
>> +///
>> +/// If `ConfigVersionCache` is not available, the config is always read
>> +/// from disk and `None` is returned as the generation.
>> +fn datastore_section_config_cached(
>> +    update_cache: bool,
>> +) -> Result<(Arc<SectionConfigData>, Option<usize>), Error> {
>> +    let mut config_cache = DATASTORE_CONFIG_CACHE.lock().unwrap();
>> +
>> +    if let Ok(version_cache) = ConfigVersionCache::new() {
>> +        let current_gen = version_cache.datastore_generation();
>> +        if let Some(cached) = config_cache.as_ref() {
>> +            // Fast path: re-use cached datastore.cfg
>> +            if cached.last_generation == current_gen {
>> +                return Ok((cached.config.clone(), Some(cached.last_generation)));
>> +            }
>> +        }
>> +        // Slow path: re-read datastore.cfg
>> +        let (config_raw, _digest) = pbs_config::datastore::config()?;
>> +        let config = Arc::new(config_raw);
>> +
>> +        if update_cache {
>> +            *config_cache = Some(DatastoreConfigCache {
>> +                config: config.clone(),
>> +                last_generation: current_gen,
>> +            });
>> +        }
>> +
>> +        Ok((config, Some(current_gen)))
>> +    } else {
>> +        // Fallback path, no config version cache: read datastore.cfg and return None as generation
>> +        *config_cache = None;
>> +        let (config_raw, _digest) = pbs_config::datastore::config()?;
>> +        Ok((Arc::new(config_raw), None))
>> +    }
>> +}
>> +
>>   impl DataStore {
>>       // This one just panics on everything
>>       #[doc(hidden)]
>> @@ -363,56 +426,63 @@ impl DataStore {
>>           name: &str,
>>           operation: Option<Operation>,
>>       ) -> Result<Arc<DataStore>, Error> {
> 
> the changes here contain a lot of churn that is not needed for the
> actual change that is done - e.g., there's lots of variables being
> needless renamed..
> 
>> -        // Avoid TOCTOU between checking maintenance mode and updating active operation counter, as
>> -        // we use it to decide whether it is okay to delete the datastore.
>> +        // Avoid TOCTOU between checking maintenance mode and updating active operations.
>>           let _config_lock = pbs_config::datastore::lock_config()?;
>>   
>> -        // we could use the ConfigVersionCache's generation for staleness detection, but  we load
>> -        // the config anyway -> just use digest, additional benefit: manual changes get detected
>> -        let (config, digest) = pbs_config::datastore::config()?;
>> -        let config: DataStoreConfig = config.lookup("datastore", name)?;
>> +        // Get the current datastore.cfg generation number and cached config
>> +        let (section_config, gen_num) = datastore_section_config_cached(true)?;
>> +
>> +        let datastore_cfg: DataStoreConfig = section_config.lookup("datastore", name)?;
> 
> renaming this variable here already causes a lot of noise - if you
> really want to do that, do it up-front or at the end as a cleanup commit
> (depending on how sure you are the change is agreeable - if it's almost
> certain it is accepted, put it up front, then it can get applied
> already. if not, put it at the end -> then it can be skipped without
> affecting the other patches ;))
> 
> but going from config to datastore_cfg doesn't make the code any clearer
> - the latter can still refer to the whole config (datastore.cfg) or the
> individual datastore's section within.. since we have no further business
> with the whole section config here, I think keeping this as `config` is
> fine..
> 

Agree, will change back!

As you mentioned, the idea of the rename was to better differentiate
between the global datastore.cfg and the individual datastore config.
Since we have two configs in the same block, I tried to avoid relying on
"config". As you said, datastore_cfg does not make it much better and
could be misinterpreted. Ultimately, I aligned it directly with the type
names (datastore_cfg for DataStoreConfig).

I see your point and agree:

> since we have no further business with the whole section config here,
> I think keeping this as `config` is fine..

This is the best solution here.

>> +        let maintenance_mode = datastore_cfg.get_maintenance_mode();
>> +        let mount_status = get_datastore_mount_status(&datastore_cfg);
> 
> and things extracted into variables here despite being used only once
> (this also doesn't change during the rest of the series)
> 

Good point, will remove!

>>   
>> -        if let Some(maintenance_mode) = config.get_maintenance_mode() {
>> -            if let Err(error) = maintenance_mode.check(operation) {
>> +        if let Some(mm) = &maintenance_mode {
> 
> binding name changes
> 
>> +            if let Err(error) = mm.check(operation.clone()) {
> 
> new clone() is introduced but not needed

Nice catch!

> 
>>                   bail!("datastore '{name}' is unavailable: {error}");
>>               }
>>           }
>>   
>> -        if get_datastore_mount_status(&config) == Some(false) {
>> -            let mut datastore_cache = DATASTORE_MAP.lock().unwrap();
>> -            datastore_cache.remove(&config.name);
>> -            bail!("datastore '{}' is not mounted", config.name);
>> +        let mut datastore_cache = DATASTORE_MAP.lock().unwrap();
> 
> moving this is fine!
> 

Helps to avoid the duplicate checks :)

>> +
>> +        if mount_status == Some(false) {
>> +            datastore_cache.remove(&datastore_cfg.name);
>> +            bail!("datastore '{}' is not mounted", datastore_cfg.name);
>>           }
>>   
>> -        let mut datastore_cache = DATASTORE_MAP.lock().unwrap();
>> -        let entry = datastore_cache.get(name);
> 
> this is changed to doing it twice and cloning..
> 

Refactoring this!

>> -
>> -        // reuse chunk store so that we keep using the same process locker instance!
>> -        let chunk_store = if let Some(datastore) = &entry {
> 
> structure changed here and binding renamed, even though the old one works as-is
> 

Agree, will revert!

>> -            let last_digest = datastore.last_digest.as_ref();
>> -            if let Some(true) = last_digest.map(|last_digest| last_digest == &digest) {
>> -                if let Some(operation) = operation {
>> -                    update_active_operations(name, operation, 1)?;
>> +        // Re-use DataStoreImpl
>> +        if let Some(existing) = datastore_cache.get(name).cloned() {
>> +            if let (Some(last_generation), Some(gen_num)) = (existing.config_generation, gen_num) {
>> +                if last_generation == gen_num {
> 
> these two ifs can be collapsed into
> 
> if datastore.config_generation == gen_num && gen_num.is_some() {
> 
> since we only want to reuse the entry if the current gen num is the same
> as the last one.
> 

Will collapse the two ifs, good catch!
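For the record, comparing the two `Option<usize>` values directly behaves
as you describe: equality alone would also match `None == None`, so the
`is_some()` guard is what prevents reuse when the ConfigVersionCache is
unavailable. A standalone sketch (hypothetical helper, not the actual
datastore code):

```rust
/// Collapsed form of the two nested `if let` checks: reuse the cached
/// `DataStoreImpl` only when both generations are known AND equal.
fn can_reuse(cached_gen: Option<usize>, current_gen: Option<usize>) -> bool {
    // Without `is_some()`, `None == None` would wrongly allow reuse
    // when no generation information is available at all.
    cached_gen == current_gen && current_gen.is_some()
}

fn main() {
    assert!(can_reuse(Some(7), Some(7)));
    assert!(!can_reuse(Some(7), Some(8)));
    assert!(!can_reuse(None, Some(7)));
    assert!(!can_reuse(None, None)); // the case the is_some() guard rules out
    println!("ok");
}
```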

>> +                    if let Some(op) = operation {
> 
> binding needlessly renamed
> 

Agree, will revert!

>> +                        update_active_operations(name, op, 1)?;
>> +                    }
>> +
>> +                    return Ok(Arc::new(Self {
>> +                        inner: existing,
>> +                        operation,
>> +                    }));
>>                   }
>> -                return Ok(Arc::new(Self {
>> -                    inner: Arc::clone(datastore),
>> -                    operation,
>> -                }));
>>               }
>> -            Arc::clone(&datastore.chunk_store)
>> +        }
>> +
>> +        // (Re)build DataStoreImpl
>> +
>> +        // Reuse chunk store so that we keep using the same process locker instance!
>> +        let chunk_store = if let Some(existing) = datastore_cache.get(name) {
>> +            Arc::clone(&existing.chunk_store)
> 
> this one is not needed at all, can just be combined with the previous if
> as before
>

Will refactor!
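The combined shape could look roughly like this - a single cache lookup
that either returns early on a generation match or falls through while
keeping the existing chunk store alive. Rough sketch with simplified
stand-in types (`ChunkStore`, `StoreImpl` and `lookup` are placeholders,
not the actual pbs-datastore types):

```rust
use std::collections::HashMap;
use std::sync::Arc;

// Simplified stand-ins for the real ChunkStore / DataStoreImpl.
struct ChunkStore;
struct StoreImpl {
    generation: Option<usize>,
    chunk_store: Arc<ChunkStore>,
}

/// One cache lookup serves both purposes: early return on a generation
/// match, otherwise hand back the old chunk store so the same process
/// locker instance keeps being used during the rebuild.
fn lookup(
    cache: &HashMap<String, Arc<StoreImpl>>,
    name: &str,
    current_gen: Option<usize>,
) -> (bool, Option<Arc<ChunkStore>>) {
    if let Some(existing) = cache.get(name) {
        if existing.generation == current_gen && current_gen.is_some() {
            return (true, None); // reuse the cached impl as-is
        }
        // Stale entry: rebuild, but reuse the chunk store.
        return (false, Some(Arc::clone(&existing.chunk_store)));
    }
    (false, None) // cold start: caller opens a fresh chunk store
}

fn main() {
    let mut cache = HashMap::new();
    cache.insert(
        "store1".to_string(),
        Arc::new(StoreImpl {
            generation: Some(1),
            chunk_store: Arc::new(ChunkStore),
        }),
    );

    // Generation matches: reuse the cached impl, no chunk store handed out.
    let (reuse, cs) = lookup(&cache, "store1", Some(1));
    assert!(reuse && cs.is_none());
    // Generation changed: rebuild, but the chunk store is reused.
    let (reuse, cs) = lookup(&cache, "store1", Some(2));
    assert!(!reuse && cs.is_some());
    // Unknown name: cold start.
    let (reuse, cs) = lookup(&cache, "other", Some(1));
    assert!(!reuse && cs.is_none());
    println!("ok");
}
```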

>>           } else {
>>               let tuning: DatastoreTuning = serde_json::from_value(
>>                   DatastoreTuning::API_SCHEMA
>> -                    .parse_property_string(config.tuning.as_deref().unwrap_or(""))?,
>> +                    .parse_property_string(datastore_cfg.tuning.as_deref().unwrap_or(""))?,
>>               )?;
>>               Arc::new(ChunkStore::open(
>>                   name,
>> -                config.absolute_path(),
>> +                datastore_cfg.absolute_path(),
>>                   tuning.sync_level.unwrap_or_default(),
>>               )?)
>>           };
>>   
>> -        let datastore = DataStore::with_store_and_config(chunk_store, config, Some(digest))?;
>> +        let datastore = DataStore::with_store_and_config(chunk_store, datastore_cfg, gen_num)?;
>>   
>>           let datastore = Arc::new(datastore);
>>           datastore_cache.insert(name.to_string(), datastore.clone());
>> @@ -514,7 +584,7 @@ impl DataStore {
>>       fn with_store_and_config(
>>           chunk_store: Arc<ChunkStore>,
>>           config: DataStoreConfig,
>> -        last_digest: Option<[u8; 32]>,
>> +        generation: Option<usize>,
>>       ) -> Result<DataStoreImpl, Error> {
>>           let mut gc_status_path = chunk_store.base_path();
>>           gc_status_path.push(".gc-status");
>> @@ -579,11 +649,11 @@ impl DataStore {
>>               last_gc_status: Mutex::new(gc_status),
>>               verify_new: config.verify_new.unwrap_or(false),
>>               chunk_order: tuning.chunk_order.unwrap_or_default(),
>> -            last_digest,
>>               sync_level: tuning.sync_level.unwrap_or_default(),
>>               backend_config,
>>               lru_store_caching,
>>               thread_settings,
>> +            config_generation: generation,
>>           })
>>       }
>>   
>> -- 
>> 2.47.3
>>
>>
>>
>> _______________________________________________
>> pbs-devel mailing list
>> pbs-devel@lists.proxmox.com
>> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
>>
> 
> 




Thread overview: 15+ messages
2025-11-24 17:04 [pbs-devel] [PATCH proxmox-backup v5 0/4] datastore: remove config reload on hot path Samuel Rufinatscha
2025-11-24 17:04 ` [pbs-devel] [PATCH proxmox-backup v5 1/4] partial fix #6049: config: enable config version cache for datastore Samuel Rufinatscha
2025-11-26 15:15   ` Fabian Grünbichler
2025-11-24 17:04 ` [pbs-devel] [PATCH proxmox-backup v5 2/4] partial fix #6049: datastore: impl ConfigVersionCache fast path for lookups Samuel Rufinatscha
2025-11-26 15:15   ` Fabian Grünbichler
2025-11-26 17:21     ` Samuel Rufinatscha [this message]
2025-11-24 17:04 ` [pbs-devel] [PATCH proxmox-backup v5 3/4] partial fix #6049: datastore: use config fast-path in Drop Samuel Rufinatscha
2025-11-26 15:15   ` Fabian Grünbichler
2025-11-28  9:03     ` Samuel Rufinatscha
2025-11-28 10:46       ` Fabian Grünbichler
2025-11-28 11:10         ` Samuel Rufinatscha
2025-11-24 17:04 ` [pbs-devel] [PATCH proxmox-backup v5 4/4] partial fix #6049: datastore: add TTL fallback to catch manual config edits Samuel Rufinatscha
2025-11-26 15:15   ` Fabian Grünbichler
2025-11-26 15:16 ` [pbs-devel] [PATCH proxmox-backup v5 0/4] datastore: remove config reload on hot path Fabian Grünbichler
2025-11-26 16:10   ` Samuel Rufinatscha
