From: "Fabian Grünbichler" <f.gruenbichler@proxmox.com>
To: Proxmox Backup Server development discussion
<pbs-devel@lists.proxmox.com>
Subject: Re: [pbs-devel] [PATCH proxmox-backup v5 2/4] partial fix #6049: datastore: impl ConfigVersionCache fast path for lookups
Date: Wed, 26 Nov 2025 16:15:52 +0100
Message-ID: <1764167253.w84bii3yf4.astroid@yuna.none>
In-Reply-To: <20251124170423.303300-3-s.rufinatscha@proxmox.com>
On November 24, 2025 6:04 pm, Samuel Rufinatscha wrote:
> Repeated /status requests caused lookup_datastore() to re-read and
> parse datastore.cfg on every call. The issue was mentioned in report
> #6049 [1]. cargo-flamegraph [2] confirmed that the hot path is
> dominated by pbs_config::datastore::config() (config parsing).
>
> This patch implements caching of the global datastore.cfg using the
> generation numbers from the shared config version cache. It caches the
> datastore.cfg along with the generation number and, when a subsequent
> lookup sees the same generation, it reuses the cached config without
> re-reading it from disk. If the generation differs
> (or the cache is unavailable), the config is re-read from disk.
> If `update_cache = true`, the new config and current generation are
> persisted in the cache. In this case, callers must hold the datastore
> config lock to avoid racing with concurrent config changes.
> If `update_cache` is `false` and generation did not match, the freshly
> read config is returned but the cache is left unchanged. If
> `ConfigVersionCache` is not available, the config is always read from
> disk and `None` is returned as generation.
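as a rough illustration for readers following along: the generation-based
caching described above boils down to something like the sketch below.
`ParsedConfig`, `read_from_disk` and `lookup` here are simplified stand-ins,
not the actual PBS types or call signatures.

```rust
use std::sync::{Arc, Mutex, OnceLock};

// Hypothetical stand-in for the fully parsed datastore.cfg contents.
type ParsedConfig = String;

struct CacheEntry {
    config: Arc<ParsedConfig>,
    last_generation: usize,
}

static CACHE: OnceLock<Mutex<Option<CacheEntry>>> = OnceLock::new();

// `current_gen` plays the role of ConfigVersionCache's datastore
// generation; `read_from_disk` stands in for the actual config read.
fn lookup(
    current_gen: usize,
    update_cache: bool,
    read_from_disk: impl Fn() -> ParsedConfig,
) -> (Arc<ParsedConfig>, usize) {
    let mut guard = CACHE.get_or_init(|| Mutex::new(None)).lock().unwrap();
    if let Some(entry) = guard.as_ref() {
        // Fast path: generation unchanged, reuse the cached parse.
        if entry.last_generation == current_gen {
            return (entry.config.clone(), current_gen);
        }
    }
    // Slow path: re-read, and refresh the cache only when requested.
    let config = Arc::new(read_from_disk());
    if update_cache {
        *guard = Some(CacheEntry {
            config: config.clone(),
            last_generation: current_gen,
        });
    }
    (config, current_gen)
}
```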
>
> Behavioral notes
>
> - The generation is bumped via the existing save_config() path, so
> API-driven config changes are detected immediately.
> - Manual edits to datastore.cfg are not detected; this is covered in a
> dedicated patch in this series.
> - DataStore::drop still performs a config read on the common path;
> also covered in a dedicated patch in this series.
>
> Links
>
> [1] Bugzilla: https://bugzilla.proxmox.com/show_bug.cgi?id=6049
> [2] cargo-flamegraph: https://github.com/flamegraph-rs/flamegraph
>
> Fixes: #6049
> Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
style nits below, but otherwise
Reviewed-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
> ---
> Changes:
>
> From v1 → v2, thanks @Fabian
> - Moved the ConfigVersionCache changes into its own patch.
> - Introduced the global static DATASTORE_CONFIG_CACHE to store the
> fully parsed datastore.cfg instead, along with its generation number.
> Introduced DatastoreConfigCache struct to hold both.
> - Removed and replaced the CachedDatastoreConfigTag field of
> DataStoreImpl with a generation number field only (Option<usize>)
> to validate DataStoreImpl reuse.
> - Added DataStore::datastore_section_config_cached() helper function
> to encapsulate the caching logic and simplify reuse.
> - Modified DataStore::lookup_datastore() to use the new helper.
>
> From v2 → v3
> No changes
>
> From v3 → v4, thanks @Fabian
> - Restructured the version cache checks in
> datastore_section_config_cached(), to simplify the logic.
> - Added update_cache parameter to datastore_section_config_cached() to
> control cache updates.
>
> From v4 → v5
> - Rebased only, no changes
>
> pbs-datastore/Cargo.toml | 1 +
> pbs-datastore/src/datastore.rs | 138 +++++++++++++++++++++++++--------
> 2 files changed, 105 insertions(+), 34 deletions(-)
this could be
2 files changed, 80 insertions(+), 17 deletions(-)
see below. this might sound nit-picky, but keeping diffs concise makes
reviewing a lot easier because of the higher signal-to-noise ratio..
>
> diff --git a/pbs-datastore/Cargo.toml b/pbs-datastore/Cargo.toml
> index 8ce930a9..42f49a7b 100644
> --- a/pbs-datastore/Cargo.toml
> +++ b/pbs-datastore/Cargo.toml
> @@ -40,6 +40,7 @@ proxmox-io.workspace = true
> proxmox-lang.workspace=true
> proxmox-s3-client = { workspace = true, features = [ "impl" ] }
> proxmox-schema = { workspace = true, features = [ "api-macro" ] }
> +proxmox-section-config.workspace = true
> proxmox-serde = { workspace = true, features = [ "serde_json" ] }
> proxmox-sys.workspace = true
> proxmox-systemd.workspace = true
> diff --git a/pbs-datastore/src/datastore.rs b/pbs-datastore/src/datastore.rs
> index 36550ff6..c9cb5d65 100644
> --- a/pbs-datastore/src/datastore.rs
> +++ b/pbs-datastore/src/datastore.rs
> @@ -34,7 +34,8 @@ use pbs_api_types::{
> MaintenanceType, Operation, UPID,
> };
> use pbs_config::s3::S3_CFG_TYPE_ID;
> -use pbs_config::BackupLockGuard;
> +use pbs_config::{BackupLockGuard, ConfigVersionCache};
> +use proxmox_section_config::SectionConfigData;
>
> use crate::backup_info::{
> BackupDir, BackupGroup, BackupInfo, OLD_LOCKING, PROTECTED_MARKER_FILENAME,
> @@ -48,6 +49,17 @@ use crate::s3::S3_CONTENT_PREFIX;
> use crate::task_tracking::{self, update_active_operations};
> use crate::{DataBlob, LocalDatastoreLruCache};
>
> +// Cache for fully parsed datastore.cfg
> +struct DatastoreConfigCache {
> + // Parsed datastore.cfg file
> + config: Arc<SectionConfigData>,
> + // Generation number from ConfigVersionCache
> + last_generation: usize,
> +}
> +
> +static DATASTORE_CONFIG_CACHE: LazyLock<Mutex<Option<DatastoreConfigCache>>> =
> + LazyLock::new(|| Mutex::new(None));
> +
> static DATASTORE_MAP: LazyLock<Mutex<HashMap<String, Arc<DataStoreImpl>>>> =
> LazyLock::new(|| Mutex::new(HashMap::new()));
>
> @@ -149,11 +161,13 @@ pub struct DataStoreImpl {
> last_gc_status: Mutex<GarbageCollectionStatus>,
> verify_new: bool,
> chunk_order: ChunkOrder,
> - last_digest: Option<[u8; 32]>,
> sync_level: DatastoreFSyncLevel,
> backend_config: DatastoreBackendConfig,
> lru_store_caching: Option<LocalDatastoreLruCache>,
> thread_settings: DatastoreThreadSettings,
> + /// Datastore generation number from `ConfigVersionCache` at creation time, used to
maybe: "datastore.cfg cache generation number at lookup time, used to
invalidate this cached `DataStoreImpl`"? "creation time" could also
refer to the datastore creation time..
> + /// validate reuse of this cached `DataStoreImpl`.
> + config_generation: Option<usize>,
> }
>
> impl DataStoreImpl {
> @@ -166,11 +180,11 @@ impl DataStoreImpl {
> last_gc_status: Mutex::new(GarbageCollectionStatus::default()),
> verify_new: false,
> chunk_order: Default::default(),
> - last_digest: None,
> sync_level: Default::default(),
> backend_config: Default::default(),
> lru_store_caching: None,
> thread_settings: Default::default(),
> + config_generation: None,
> })
> }
> }
> @@ -286,6 +300,55 @@ impl DatastoreThreadSettings {
so this new helper is already +49 lines, which means it's the bulk of
the change if the noise below is removed..
> }
> }
>
> +/// Returns the parsed datastore config (`datastore.cfg`) and its
> +/// generation.
> +///
> +/// Uses `ConfigVersionCache` to detect stale entries:
> +/// - If the cached generation matches the current generation, the
> +/// cached config is returned.
> +/// - Otherwise the config is re-read from disk. If `update_cache` is
> +/// `true`, the new config and current generation are stored in the
> +/// cache. Callers that set `update_cache = true` must hold the
> +/// datastore config lock to avoid racing with concurrent config
> +/// changes.
> +/// - If `update_cache` is `false`, the freshly read config is returned
> +/// but the cache is left unchanged.
> +///
> +/// If `ConfigVersionCache` is not available, the config is always read
> +/// from disk and `None` is returned as the generation.
> +fn datastore_section_config_cached(
> + update_cache: bool,
> +) -> Result<(Arc<SectionConfigData>, Option<usize>), Error> {
> + let mut config_cache = DATASTORE_CONFIG_CACHE.lock().unwrap();
> +
> + if let Ok(version_cache) = ConfigVersionCache::new() {
> + let current_gen = version_cache.datastore_generation();
> + if let Some(cached) = config_cache.as_ref() {
> + // Fast path: re-use cached datastore.cfg
> + if cached.last_generation == current_gen {
> + return Ok((cached.config.clone(), Some(cached.last_generation)));
> + }
> + }
> + // Slow path: re-read datastore.cfg
> + let (config_raw, _digest) = pbs_config::datastore::config()?;
> + let config = Arc::new(config_raw);
> +
> + if update_cache {
> + *config_cache = Some(DatastoreConfigCache {
> + config: config.clone(),
> + last_generation: current_gen,
> + });
> + }
> +
> + Ok((config, Some(current_gen)))
> + } else {
> + // Fallback path, no config version cache: read datastore.cfg and return None as generation
> + *config_cache = None;
> + let (config_raw, _digest) = pbs_config::datastore::config()?;
> + Ok((Arc::new(config_raw), None))
> + }
> +}
> +
> impl DataStore {
> // This one just panics on everything
> #[doc(hidden)]
> @@ -363,56 +426,63 @@ impl DataStore {
> name: &str,
> operation: Option<Operation>,
> ) -> Result<Arc<DataStore>, Error> {
the changes here contain a lot of churn that is not needed for the
actual change that is done - e.g., lots of variables are needlessly
renamed..
> - // Avoid TOCTOU between checking maintenance mode and updating active operation counter, as
> - // we use it to decide whether it is okay to delete the datastore.
> + // Avoid TOCTOU between checking maintenance mode and updating active operations.
> let _config_lock = pbs_config::datastore::lock_config()?;
>
> - // we could use the ConfigVersionCache's generation for staleness detection, but we load
> - // the config anyway -> just use digest, additional benefit: manual changes get detected
> - let (config, digest) = pbs_config::datastore::config()?;
> - let config: DataStoreConfig = config.lookup("datastore", name)?;
> + // Get the current datastore.cfg generation number and cached config
> + let (section_config, gen_num) = datastore_section_config_cached(true)?;
> +
> + let datastore_cfg: DataStoreConfig = section_config.lookup("datastore", name)?;
renaming this variable here already causes a lot of noise - if you
really want to do that, do it up-front or at the end as a cleanup commit
(depending on how sure you are the change is agreeable - if it's almost
certain it gets accepted, put it up front, then it can get applied
already. if not, put it at the end -> then it can be skipped without
affecting the other patches ;))
but going from config to datastore_cfg doesn't make the code any clearer
- the latter can still refer to the whole config (datastore.cfg) or the
individual datastore's section within.. since we have no further business
with the whole section config here, I think keeping this as `config` is
fine..
> + let maintenance_mode = datastore_cfg.get_maintenance_mode();
> + let mount_status = get_datastore_mount_status(&datastore_cfg);
and things extracted into variables here despite being used only once
(this also doesn't change during the rest of the series)
>
> - if let Some(maintenance_mode) = config.get_maintenance_mode() {
> - if let Err(error) = maintenance_mode.check(operation) {
> + if let Some(mm) = &maintenance_mode {
binding name changes
> + if let Err(error) = mm.check(operation.clone()) {
new clone() is introduced but not needed
> bail!("datastore '{name}' is unavailable: {error}");
> }
> }
>
> - if get_datastore_mount_status(&config) == Some(false) {
> - let mut datastore_cache = DATASTORE_MAP.lock().unwrap();
> - datastore_cache.remove(&config.name);
> - bail!("datastore '{}' is not mounted", config.name);
> + let mut datastore_cache = DATASTORE_MAP.lock().unwrap();
moving this is fine!
> +
> + if mount_status == Some(false) {
> + datastore_cache.remove(&datastore_cfg.name);
> + bail!("datastore '{}' is not mounted", datastore_cfg.name);
> }
>
> - let mut datastore_cache = DATASTORE_MAP.lock().unwrap();
> - let entry = datastore_cache.get(name);
this is changed to doing it twice and cloning..
> -
> - // reuse chunk store so that we keep using the same process locker instance!
> - let chunk_store = if let Some(datastore) = &entry {
structure changed here and binding renamed, even though the old one works as-is
> - let last_digest = datastore.last_digest.as_ref();
> - if let Some(true) = last_digest.map(|last_digest| last_digest == &digest) {
> - if let Some(operation) = operation {
> - update_active_operations(name, operation, 1)?;
> + // Re-use DataStoreImpl
> + if let Some(existing) = datastore_cache.get(name).cloned() {
> + if let (Some(last_generation), Some(gen_num)) = (existing.config_generation, gen_num) {
> + if last_generation == gen_num {
these two ifs can be collapsed into
if existing.config_generation == gen_num && gen_num.is_some() {
since we only want to reuse the entry if the current gen num is Some and
the same as the last one.
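to spell out why the collapse is safe: `Option<usize>` equality is only
true when both sides are `Some` with equal values (or both `None`, which
the extra `is_some()` guard rejects). a tiny self-contained sketch, with
the two fields as plain parameters:

```rust
// true only if both generations are Some and equal; a missing
// generation on either side means the cached impl must not be reused.
fn should_reuse(config_generation: Option<usize>, gen_num: Option<usize>) -> bool {
    config_generation == gen_num && gen_num.is_some()
}
```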
> + if let Some(op) = operation {
binding needlessly renamed
> + update_active_operations(name, op, 1)?;
> + }
> +
> + return Ok(Arc::new(Self {
> + inner: existing,
> + operation,
> + }));
> }
> - return Ok(Arc::new(Self {
> - inner: Arc::clone(datastore),
> - operation,
> - }));
> }
> - Arc::clone(&datastore.chunk_store)
> + }
> +
> + // (Re)build DataStoreImpl
> +
> + // Reuse chunk store so that we keep using the same process locker instance!
> + let chunk_store = if let Some(existing) = datastore_cache.get(name) {
> + Arc::clone(&existing.chunk_store)
this one is not needed at all, can just be combined with the previous if
as before
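roughly, the combined shape could look like the following - a single
cache lookup deciding both questions. `Entry` and `Lookup` are
hypothetical simplifications of `DataStoreImpl` and the surrounding
control flow, not the real types:

```rust
use std::collections::HashMap;
use std::sync::Arc;

// Simplified stand-ins for DataStoreImpl / Arc<ChunkStore>.
struct Entry {
    config_generation: Option<usize>,
    chunk_store: Arc<String>,
}

enum Lookup {
    Reuse(Arc<Entry>),            // fast path: hand back the cached impl
    Rebuild(Option<Arc<String>>), // slow path: keep the chunk store if any
}

// One cache lookup decides both whether the cached impl is still valid
// and, if not, whether its chunk store (process locker!) can be shared.
fn lookup_once(
    cache: &HashMap<String, Arc<Entry>>,
    name: &str,
    gen_num: Option<usize>,
) -> Lookup {
    match cache.get(name) {
        Some(existing) if existing.config_generation == gen_num && gen_num.is_some() => {
            Lookup::Reuse(existing.clone())
        }
        Some(existing) => Lookup::Rebuild(Some(existing.chunk_store.clone())),
        None => Lookup::Rebuild(None),
    }
}
```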
> } else {
> let tuning: DatastoreTuning = serde_json::from_value(
> DatastoreTuning::API_SCHEMA
> - .parse_property_string(config.tuning.as_deref().unwrap_or(""))?,
> + .parse_property_string(datastore_cfg.tuning.as_deref().unwrap_or(""))?,
> )?;
> Arc::new(ChunkStore::open(
> name,
> - config.absolute_path(),
> + datastore_cfg.absolute_path(),
> tuning.sync_level.unwrap_or_default(),
> )?)
> };
>
> - let datastore = DataStore::with_store_and_config(chunk_store, config, Some(digest))?;
> + let datastore = DataStore::with_store_and_config(chunk_store, datastore_cfg, gen_num)?;
>
> let datastore = Arc::new(datastore);
> datastore_cache.insert(name.to_string(), datastore.clone());
> @@ -514,7 +584,7 @@ impl DataStore {
> fn with_store_and_config(
> chunk_store: Arc<ChunkStore>,
> config: DataStoreConfig,
> - last_digest: Option<[u8; 32]>,
> + generation: Option<usize>,
> ) -> Result<DataStoreImpl, Error> {
> let mut gc_status_path = chunk_store.base_path();
> gc_status_path.push(".gc-status");
> @@ -579,11 +649,11 @@ impl DataStore {
> last_gc_status: Mutex::new(gc_status),
> verify_new: config.verify_new.unwrap_or(false),
> chunk_order: tuning.chunk_order.unwrap_or_default(),
> - last_digest,
> sync_level: tuning.sync_level.unwrap_or_default(),
> backend_config,
> lru_store_caching,
> thread_settings,
> + config_generation: generation,
> })
> }
>
> --
> 2.47.3
>
>
>
> _______________________________________________
> pbs-devel mailing list
> pbs-devel@lists.proxmox.com
> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
>