public inbox for pbs-devel@lists.proxmox.com
From: "Fabian Grünbichler" <f.gruenbichler@proxmox.com>
To: Proxmox Backup Server development discussion
	<pbs-devel@lists.proxmox.com>
Subject: Re: [pbs-devel] [PATCH proxmox-backup v5 2/4] partial fix #6049: datastore: impl ConfigVersionCache fast path for lookups
Date: Wed, 26 Nov 2025 16:15:52 +0100	[thread overview]
Message-ID: <1764167253.w84bii3yf4.astroid@yuna.none>
In-Reply-To: <20251124170423.303300-3-s.rufinatscha@proxmox.com>

On November 24, 2025 6:04 pm, Samuel Rufinatscha wrote:
> Repeated /status requests caused lookup_datastore() to re-read and
> parse datastore.cfg on every call. The issue was mentioned in report
> #6049 [1]. cargo-flamegraph [2] confirmed that the hot path is
> dominated by pbs_config::datastore::config() (config parsing).
> 
> This patch implements caching of the global datastore.cfg using the
> generation numbers from the shared config version cache. It caches the
> datastore.cfg along with the generation number and, when a subsequent
> lookup sees the same generation, it reuses the cached config without
> re-reading it from disk. If the generation differs
> (or the cache is unavailable), the config is re-read from disk.
> If `update_cache = true`, the new config and current generation are
> persisted in the cache. In this case, callers must hold the datastore
> config lock to avoid racing with concurrent config changes.
> If `update_cache` is `false` and generation did not match, the freshly
> read config is returned but the cache is left unchanged. If
> `ConfigVersionCache` is not available, the config is always read from
> disk and `None` is returned as generation.
> 
> Behavioral notes
> 
> - The generation is bumped via the existing save_config() path, so
>   API-driven config changes are detected immediately.
> - Manual edits to datastore.cfg are not detected; this is covered in a
>   dedicated patch in this series.
> - DataStore::drop still performs a config read on the common path;
>   also covered in a dedicated patch in this series.
> 
> Links
> 
> [1] Bugzilla: https://bugzilla.proxmox.com/show_bug.cgi?id=6049
> [2] cargo-flamegraph: https://github.com/flamegraph-rs/flamegraph
> 
> Fixes: #6049
> Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>

style nits below, but otherwise

Reviewed-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>

> ---
> Changes:
> 
> From v1 → v2, thanks @Fabian
> - Moved the ConfigVersionCache changes into its own patch.
> - Introduced the global static DATASTORE_CONFIG_CACHE to store the
> fully parsed datastore.cfg instead, along with its generation number.
>   Introduced DatastoreConfigCache struct to hold both.
> - Removed and replaced the CachedDatastoreConfigTag field of
>   DataStoreImpl with a generation number field only (Option<usize>)
>   to validate DataStoreImpl reuse.
> - Added DataStore::datastore_section_config_cached() helper function
>   to encapsulate the caching logic and simplify reuse.
> - Modified DataStore::lookup_datastore() to use the new helper.
> 
> From v2 → v3
> No changes
> 
> From v3 → v4, thanks @Fabian
> - Restructured the version cache checks in
>   datastore_section_config_cached(), to simplify the logic.
> - Added update_cache parameter to datastore_section_config_cached() to
>   control cache updates.
> 
> From v4 → v5
> - Rebased only, no changes
> 
>  pbs-datastore/Cargo.toml       |   1 +
>  pbs-datastore/src/datastore.rs | 138 +++++++++++++++++++++++++--------
>  2 files changed, 105 insertions(+), 34 deletions(-)

this could be

 2 files changed, 80 insertions(+), 17 deletions(-)

see below. this might sound nit-picky, but keeping diffs concise makes
reviewing a lot easier because of the higher signal-to-noise ratio..

> 
> diff --git a/pbs-datastore/Cargo.toml b/pbs-datastore/Cargo.toml
> index 8ce930a9..42f49a7b 100644
> --- a/pbs-datastore/Cargo.toml
> +++ b/pbs-datastore/Cargo.toml
> @@ -40,6 +40,7 @@ proxmox-io.workspace = true
>  proxmox-lang.workspace=true
>  proxmox-s3-client = { workspace = true, features = [ "impl" ] }
>  proxmox-schema = { workspace = true, features = [ "api-macro" ] }
> +proxmox-section-config.workspace = true
>  proxmox-serde = { workspace = true, features = [ "serde_json" ] }
>  proxmox-sys.workspace = true
>  proxmox-systemd.workspace = true
> diff --git a/pbs-datastore/src/datastore.rs b/pbs-datastore/src/datastore.rs
> index 36550ff6..c9cb5d65 100644
> --- a/pbs-datastore/src/datastore.rs
> +++ b/pbs-datastore/src/datastore.rs
> @@ -34,7 +34,8 @@ use pbs_api_types::{
>      MaintenanceType, Operation, UPID,
>  };
>  use pbs_config::s3::S3_CFG_TYPE_ID;
> -use pbs_config::BackupLockGuard;
> +use pbs_config::{BackupLockGuard, ConfigVersionCache};
> +use proxmox_section_config::SectionConfigData;
>  
>  use crate::backup_info::{
>      BackupDir, BackupGroup, BackupInfo, OLD_LOCKING, PROTECTED_MARKER_FILENAME,
> @@ -48,6 +49,17 @@ use crate::s3::S3_CONTENT_PREFIX;
>  use crate::task_tracking::{self, update_active_operations};
>  use crate::{DataBlob, LocalDatastoreLruCache};
>  
> +// Cache for fully parsed datastore.cfg
> +struct DatastoreConfigCache {
> +    // Parsed datastore.cfg file
> +    config: Arc<SectionConfigData>,
> +    // Generation number from ConfigVersionCache
> +    last_generation: usize,
> +}
> +
> +static DATASTORE_CONFIG_CACHE: LazyLock<Mutex<Option<DatastoreConfigCache>>> =
> +    LazyLock::new(|| Mutex::new(None));
> +
>  static DATASTORE_MAP: LazyLock<Mutex<HashMap<String, Arc<DataStoreImpl>>>> =
>      LazyLock::new(|| Mutex::new(HashMap::new()));
>  
> @@ -149,11 +161,13 @@ pub struct DataStoreImpl {
>      last_gc_status: Mutex<GarbageCollectionStatus>,
>      verify_new: bool,
>      chunk_order: ChunkOrder,
> -    last_digest: Option<[u8; 32]>,
>      sync_level: DatastoreFSyncLevel,
>      backend_config: DatastoreBackendConfig,
>      lru_store_caching: Option<LocalDatastoreLruCache>,
>      thread_settings: DatastoreThreadSettings,
> +    /// Datastore generation number from `ConfigVersionCache` at creation time, used to

datastore.cfg cache generation number at lookup time, used to invalidate
this cached `DataStoreImpl`

creation time could also refer to the datastore creation time..

> +    /// validate reuse of this cached `DataStoreImpl`.
> +    config_generation: Option<usize>,
>  }
>  
>  impl DataStoreImpl {
> @@ -166,11 +180,11 @@ impl DataStoreImpl {
>              last_gc_status: Mutex::new(GarbageCollectionStatus::default()),
>              verify_new: false,
>              chunk_order: Default::default(),
> -            last_digest: None,
>              sync_level: Default::default(),
>              backend_config: Default::default(),
>              lru_store_caching: None,
>              thread_settings: Default::default(),
> +            config_generation: None,
>          })
>      }
>  }
> @@ -286,6 +300,55 @@ impl DatastoreThreadSettings {

so this new helper is already +49 lines, which means it's the bulk of
the change if the noise below is removed..

>      }
>  }
>  
> +/// Returns the parsed datastore config (`datastore.cfg`) and its
> +/// generation.
> +///
> +/// Uses `ConfigVersionCache` to detect stale entries:
> +/// - If the cached generation matches the current generation, the
> +///   cached config is returned.
> +/// - Otherwise the config is re-read from disk. If `update_cache` is
> +///   `true`, the new config and current generation are stored in the
> +///   cache. Callers that set `update_cache = true` must hold the
> +///   datastore config lock to avoid racing with concurrent config
> +///   changes.
> +/// - If `update_cache` is `false`, the freshly read config is returned
> +///   but the cache is left unchanged.
> +///
> +/// If `ConfigVersionCache` is not available, the config is always read
> +/// from disk and `None` is returned as the generation.
> +fn datastore_section_config_cached(
> +    update_cache: bool,
> +) -> Result<(Arc<SectionConfigData>, Option<usize>), Error> {
> +    let mut config_cache = DATASTORE_CONFIG_CACHE.lock().unwrap();
> +
> +    if let Ok(version_cache) = ConfigVersionCache::new() {
> +        let current_gen = version_cache.datastore_generation();
> +        if let Some(cached) = config_cache.as_ref() {
> +            // Fast path: re-use cached datastore.cfg
> +            if cached.last_generation == current_gen {
> +                return Ok((cached.config.clone(), Some(cached.last_generation)));
> +            }
> +        }
> +        // Slow path: re-read datastore.cfg
> +        let (config_raw, _digest) = pbs_config::datastore::config()?;
> +        let config = Arc::new(config_raw);
> +
> +        if update_cache {
> +            *config_cache = Some(DatastoreConfigCache {
> +                config: config.clone(),
> +                last_generation: current_gen,
> +            });
> +        }
> +
> +        Ok((config, Some(current_gen)))
> +    } else {
> +        // Fallback path, no config version cache: read datastore.cfg and return None as generation
> +        *config_cache = None;
> +        let (config_raw, _digest) = pbs_config::datastore::config()?;
> +        Ok((Arc::new(config_raw), None))
> +    }
> +}
> +
>  impl DataStore {
>      // This one just panics on everything
>      #[doc(hidden)]
> @@ -363,56 +426,63 @@ impl DataStore {
>          name: &str,
>          operation: Option<Operation>,
>      ) -> Result<Arc<DataStore>, Error> {

the changes here contain a lot of churn that is not needed for the
actual change that is done - e.g., there are lots of variables being
needlessly renamed..

> -        // Avoid TOCTOU between checking maintenance mode and updating active operation counter, as
> -        // we use it to decide whether it is okay to delete the datastore.
> +        // Avoid TOCTOU between checking maintenance mode and updating active operations.
>          let _config_lock = pbs_config::datastore::lock_config()?;
>  
> -        // we could use the ConfigVersionCache's generation for staleness detection, but  we load
> -        // the config anyway -> just use digest, additional benefit: manual changes get detected
> -        let (config, digest) = pbs_config::datastore::config()?;
> -        let config: DataStoreConfig = config.lookup("datastore", name)?;
> +        // Get the current datastore.cfg generation number and cached config
> +        let (section_config, gen_num) = datastore_section_config_cached(true)?;
> +
> +        let datastore_cfg: DataStoreConfig = section_config.lookup("datastore", name)?;

renaming this variable here already causes a lot of noise - if you
really want to do that, do it up-front or at the end as a cleanup commit
(depending on how sure you are the change is agreeable - if it's almost
certain it is accepted, put it up front, then it can get applied
already. if not, put it at the end -> then it can be skipped without
affecting the other patches ;))

but going from config to datastore_cfg doesn't make the code any clearer
- the latter can still refer to the whole config (datastore.cfg) or the
individual datastore's section within.. since we have no further business
with the whole section config here, I think keeping this as `config` is
fine..

> +        let maintenance_mode = datastore_cfg.get_maintenance_mode();
> +        let mount_status = get_datastore_mount_status(&datastore_cfg);

and things are extracted into variables here despite being used only once
(this also doesn't change during the rest of the series)

>  
> -        if let Some(maintenance_mode) = config.get_maintenance_mode() {
> -            if let Err(error) = maintenance_mode.check(operation) {
> +        if let Some(mm) = &maintenance_mode {

binding name changes

> +            if let Err(error) = mm.check(operation.clone()) {

new clone() is introduced but not needed

>                  bail!("datastore '{name}' is unavailable: {error}");
>              }
>          }
>  
> -        if get_datastore_mount_status(&config) == Some(false) {
> -            let mut datastore_cache = DATASTORE_MAP.lock().unwrap();
> -            datastore_cache.remove(&config.name);
> -            bail!("datastore '{}' is not mounted", config.name);
> +        let mut datastore_cache = DATASTORE_MAP.lock().unwrap();

moving this is fine!

> +
> +        if mount_status == Some(false) {
> +            datastore_cache.remove(&datastore_cfg.name);
> +            bail!("datastore '{}' is not mounted", datastore_cfg.name);
>          }
>  
> -        let mut datastore_cache = DATASTORE_MAP.lock().unwrap();
> -        let entry = datastore_cache.get(name);

this is changed to doing it twice and cloning..

> -
> -        // reuse chunk store so that we keep using the same process locker instance!
> -        let chunk_store = if let Some(datastore) = &entry {

structure changed here and binding renamed, even though the old one works as-is

> -            let last_digest = datastore.last_digest.as_ref();
> -            if let Some(true) = last_digest.map(|last_digest| last_digest == &digest) {
> -                if let Some(operation) = operation {
> -                    update_active_operations(name, operation, 1)?;
> +        // Re-use DataStoreImpl
> +        if let Some(existing) = datastore_cache.get(name).cloned() {
> +            if let (Some(last_generation), Some(gen_num)) = (existing.config_generation, gen_num) {
> +                if last_generation == gen_num {

these two ifs can be collapsed into

if existing.config_generation == gen_num && gen_num.is_some() {

since we only want to reuse the entry if a generation number is
available and matches the cached one.
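fwiw, this works because `Option<usize>` compares element-wise, but
`None == None` is also true - that's why the extra `is_some()` guard is
needed. a quick standalone sketch (hypothetical helper name, just to
illustrate the semantics):

```rust
// Hypothetical helper mirroring the suggested collapsed condition: reuse
// the cached entry only if a generation is available and matches.
fn should_reuse(cached_gen: Option<usize>, current_gen: Option<usize>) -> bool {
    // Some(a) == Some(b) iff a == b, but None == None is also true --
    // without a version cache generation the entry must never be reused,
    // hence the is_some() guard.
    cached_gen == current_gen && current_gen.is_some()
}

fn main() {
    assert!(should_reuse(Some(3), Some(3)));
    assert!(!should_reuse(Some(3), Some(4)));
    assert!(!should_reuse(None, None)); // no version cache -> always rebuild
    assert!(!should_reuse(Some(3), None));
    println!("ok");
}
```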

> +                    if let Some(op) = operation {

binding needlessly renamed

> +                        update_active_operations(name, op, 1)?;
> +                    }
> +
> +                    return Ok(Arc::new(Self {
> +                        inner: existing,
> +                        operation,
> +                    }));
>                  }
> -                return Ok(Arc::new(Self {
> -                    inner: Arc::clone(datastore),
> -                    operation,
> -                }));
>              }
> -            Arc::clone(&datastore.chunk_store)
> +        }
> +
> +        // (Re)build DataStoreImpl
> +
> +        // Reuse chunk store so that we keep using the same process locker instance!
> +        let chunk_store = if let Some(existing) = datastore_cache.get(name) {
> +            Arc::clone(&existing.chunk_store)

this one is not needed at all, can just be combined with the previous if
as before
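i.e., a single cache lookup can drive both decisions. roughly, with toy
stand-in types instead of DataStoreImpl/ChunkStore (only to sketch the
control flow, not the actual implementation):

```rust
use std::collections::HashMap;
use std::sync::Arc;

// Toy stand-ins for the real datastore types.
struct ChunkStore;
struct Entry {
    chunk_store: Arc<ChunkStore>,
    config_generation: Option<usize>,
}

enum Lookup {
    Reused(Arc<Entry>),              // fast path: reuse the whole cached entry
    ChunkStoreOnly(Arc<ChunkStore>), // stale entry: rebuild, but keep the chunk store
    Fresh,                           // cache miss: open a new chunk store
}

// One cache lookup decides both whether the entry can be reused and,
// failing that, whether at least the chunk store can be kept.
fn lookup(cache: &HashMap<String, Arc<Entry>>, name: &str, gen_num: Option<usize>) -> Lookup {
    if let Some(entry) = cache.get(name) {
        if entry.config_generation == gen_num && gen_num.is_some() {
            return Lookup::Reused(Arc::clone(entry));
        }
        // keep using the same chunk store (and thus process locker instance)
        return Lookup::ChunkStoreOnly(Arc::clone(&entry.chunk_store));
    }
    Lookup::Fresh
}

fn main() {
    let mut cache = HashMap::new();
    cache.insert(
        "store1".to_string(),
        Arc::new(Entry {
            chunk_store: Arc::new(ChunkStore),
            config_generation: Some(7),
        }),
    );
    assert!(matches!(lookup(&cache, "store1", Some(7)), Lookup::Reused(_)));
    assert!(matches!(lookup(&cache, "store1", Some(8)), Lookup::ChunkStoreOnly(_)));
    assert!(matches!(lookup(&cache, "other", Some(7)), Lookup::Fresh));
    println!("ok");
}
```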

>          } else {
>              let tuning: DatastoreTuning = serde_json::from_value(
>                  DatastoreTuning::API_SCHEMA
> -                    .parse_property_string(config.tuning.as_deref().unwrap_or(""))?,
> +                    .parse_property_string(datastore_cfg.tuning.as_deref().unwrap_or(""))?,
>              )?;
>              Arc::new(ChunkStore::open(
>                  name,
> -                config.absolute_path(),
> +                datastore_cfg.absolute_path(),
>                  tuning.sync_level.unwrap_or_default(),
>              )?)
>          };
>  
> -        let datastore = DataStore::with_store_and_config(chunk_store, config, Some(digest))?;
> +        let datastore = DataStore::with_store_and_config(chunk_store, datastore_cfg, gen_num)?;
>  
>          let datastore = Arc::new(datastore);
>          datastore_cache.insert(name.to_string(), datastore.clone());
> @@ -514,7 +584,7 @@ impl DataStore {
>      fn with_store_and_config(
>          chunk_store: Arc<ChunkStore>,
>          config: DataStoreConfig,
> -        last_digest: Option<[u8; 32]>,
> +        generation: Option<usize>,
>      ) -> Result<DataStoreImpl, Error> {
>          let mut gc_status_path = chunk_store.base_path();
>          gc_status_path.push(".gc-status");
> @@ -579,11 +649,11 @@ impl DataStore {
>              last_gc_status: Mutex::new(gc_status),
>              verify_new: config.verify_new.unwrap_or(false),
>              chunk_order: tuning.chunk_order.unwrap_or_default(),
> -            last_digest,
>              sync_level: tuning.sync_level.unwrap_or_default(),
>              backend_config,
>              lru_store_caching,
>              thread_settings,
> +            config_generation: generation,
>          })
>      }
>  
> -- 
> 2.47.3
> 
> 
> 
> _______________________________________________
> pbs-devel mailing list
> pbs-devel@lists.proxmox.com
> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
> 



Thread overview: 15+ messages
2025-11-24 17:04 [pbs-devel] [PATCH proxmox-backup v5 0/4] datastore: remove config reload on hot path Samuel Rufinatscha
2025-11-24 17:04 ` [pbs-devel] [PATCH proxmox-backup v5 1/4] partial fix #6049: config: enable config version cache for datastore Samuel Rufinatscha
2025-11-26 15:15   ` Fabian Grünbichler
2025-11-24 17:04 ` [pbs-devel] [PATCH proxmox-backup v5 2/4] partial fix #6049: datastore: impl ConfigVersionCache fast path for lookups Samuel Rufinatscha
2025-11-26 15:15   ` Fabian Grünbichler [this message]
2025-11-26 17:21     ` Samuel Rufinatscha
2025-11-24 17:04 ` [pbs-devel] [PATCH proxmox-backup v5 3/4] partial fix #6049: datastore: use config fast-path in Drop Samuel Rufinatscha
2025-11-26 15:15   ` Fabian Grünbichler
2025-11-28  9:03     ` Samuel Rufinatscha
2025-11-28 10:46       ` Fabian Grünbichler
2025-11-28 11:10         ` Samuel Rufinatscha
2025-11-24 17:04 ` [pbs-devel] [PATCH proxmox-backup v5 4/4] partial fix #6049: datastore: add TTL fallback to catch manual config edits Samuel Rufinatscha
2025-11-26 15:15   ` Fabian Grünbichler
2025-11-26 15:16 ` [pbs-devel] [PATCH proxmox-backup v5 0/4] datastore: remove config reload on hot path Fabian Grünbichler
2025-11-26 16:10   ` Samuel Rufinatscha
