From: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
To: pbs-devel@lists.proxmox.com
Subject: [pbs-devel] [PATCH proxmox-backup 1/3] partial fix #6049: datastore: impl ConfigVersionCache fast path for lookups
Date: Tue, 11 Nov 2025 13:29:39 +0100
Message-ID: <20251111122941.110412-2-s.rufinatscha@proxmox.com>
In-Reply-To: <20251111122941.110412-1-s.rufinatscha@proxmox.com>

Repeated /status requests caused lookup_datastore() to re-read and
parse datastore.cfg on every call, as reported in #6049 [1].
Profiling with cargo-flamegraph [2] confirmed that the hot path is
dominated by pbs_config::datastore::config(), i.e. config parsing.

This patch adds a fast path to lookup_datastore() based on the shared-
memory ConfigVersionCache generation counter for datastore.cfg. The
last seen generation number is stored alongside the cached
DataStoreImpl; when a subsequent lookup observes the same generation,
the cached instance is reused without re-reading the on-disk config.
If the generation differs, or the version cache is unavailable, the
lookup falls back to the existing slow path with no behavioral changes.
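
In essence, the fast path reduces to a generation-tagged map lookup.
A minimal sketch follows; the type and function names are simplified
stand-ins for illustration, not the actual PBS API (the real
implementation is in the diff below):

  use std::collections::HashMap;
  use std::sync::{Arc, Mutex};

  // Simplified stand-in for DataStoreImpl plus its cached tag.
  struct CachedStore {
      generation: Option<usize>, // generation seen at creation time
      // ... chunk store, tuning, cached maintenance mode, ...
  }

  // Reuse the cached instance only if it was created under the same
  // datastore.cfg generation that the version cache reports now. A
  // `None` generation (version cache unavailable) never matches, so
  // those lookups always take the slow path.
  fn try_fast_path(
      map: &Mutex<HashMap<String, Arc<CachedStore>>>,
      name: &str,
      current_gen: Option<usize>,
  ) -> Option<Arc<CachedStore>> {
      let map = map.lock().unwrap();
      map.get(name)
          .filter(|ds| current_gen.is_some() && ds.generation == current_gen)
          .cloned()
  }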

Behavioral notes

- The generation is bumped via the existing save_config() path, so
  API-driven config changes are detected immediately (see the counter
  model after this list).
- Manual edits to datastore.cfg are not detected by this commit; a TTL
  guard is introduced in a later patch to cover that case.
- DataStore::drop still performs a config read on the common path; this
  is addressed in the next patch.
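
For reference, the sketch below models how the writer side in
save_config() and the reader side in lookup_datastore() interact. It
is a simplified single-process model using a static atomic; the real
counter lives in the shared-memory page managed by ConfigVersionCache,
using the same memory orderings:

  use std::sync::atomic::{AtomicUsize, Ordering};

  static DATASTORE_GENERATION: AtomicUsize = AtomicUsize::new(0);

  // Writer side (save_config): publish "datastore.cfg changed" by
  // bumping the generation; returns the pre-increment value.
  fn increase_datastore_generation() -> usize {
      DATASTORE_GENERATION.fetch_add(1, Ordering::AcqRel)
  }

  // Reader side (lookup_datastore): observe the current generation
  // before consulting the cached datastore map.
  fn datastore_generation() -> usize {
      DATASTORE_GENERATION.load(Ordering::Acquire)
  }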

Testing

cargo-flamegraph confirms that the config-parse hotspot in
lookup_datastore() no longer shows up on the hot path.

Additionally, an end-to-end benchmark was performed using the
`/status?verbose=0` API with 1000 datastores, 5 requests per store,
and 16-way parallelism:

| Metric                       | Before | After  | Δ (abs) | Δ (%)   |
|------------------------------|:------:|:------:|:-------:|:-------:|
| Total time                   | 13s    | 11s    | −2s     | −15.38% |
| Throughput (RPS, all rounds) | 384.62 | 454.55 | +69.93  | +18.18% |
| Cold RPS (round #1)          | 76.92  | 90.91  | +13.99  | +18.19% |
| Warm RPS (rounds 2..N)       | 307.69 | 363.64 | +55.95  | +18.18% |

Throughput improved by ~18% overall, with total time reduced by ~15%.
Warm lookups now reuse cached datastore instances and skip redundant
config parsing entirely. The first-access round also shows a similar
improvement, likely due to reduced contention and I/O when many stores
are accessed in parallel.

Note: a second hotspot remains in DataStore::drop, where a config read
still occurs; that is addressed in the next commit.

Links

[1] Bugzilla: https://bugzilla.proxmox.com/show_bug.cgi?id=6049
[2] cargo-flamegraph: https://github.com/flamegraph-rs/flamegraph

Fixes: #6049
Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
---
 pbs-config/src/config_version_cache.rs | 10 +++-
 pbs-datastore/src/datastore.rs         | 77 +++++++++++++++++---------
 2 files changed, 59 insertions(+), 28 deletions(-)

diff --git a/pbs-config/src/config_version_cache.rs b/pbs-config/src/config_version_cache.rs
index e8fb994f..b875f7e0 100644
--- a/pbs-config/src/config_version_cache.rs
+++ b/pbs-config/src/config_version_cache.rs
@@ -26,7 +26,6 @@ struct ConfigVersionCacheDataInner {
     // Traffic control (traffic-control.cfg) generation/version.
     traffic_control_generation: AtomicUsize,
     // datastore (datastore.cfg) generation/version
-    // FIXME: remove with PBS 3.0
     datastore_generation: AtomicUsize,
     // Add further atomics here
 }
@@ -145,8 +144,15 @@ impl ConfigVersionCache {
             .fetch_add(1, Ordering::AcqRel);
     }
 
+    /// Returns the datastore generation number.
+    pub fn datastore_generation(&self) -> usize {
+        self.shmem
+            .data()
+            .datastore_generation
+            .load(Ordering::Acquire)
+    }
+
     /// Increase the datastore generation number.
-    // FIXME: remove with PBS 3.0 or make actually useful again in datastore lookup
     pub fn increase_datastore_generation(&self) -> usize {
         self.shmem
             .data()
diff --git a/pbs-datastore/src/datastore.rs b/pbs-datastore/src/datastore.rs
index 70af94d8..18eebb58 100644
--- a/pbs-datastore/src/datastore.rs
+++ b/pbs-datastore/src/datastore.rs
@@ -32,7 +32,7 @@ use pbs_api_types::{
     MaintenanceType, Operation, UPID,
 };
 use pbs_config::s3::S3_CFG_TYPE_ID;
-use pbs_config::BackupLockGuard;
+use pbs_config::{BackupLockGuard, ConfigVersionCache};
 
 use crate::backup_info::{
     BackupDir, BackupGroup, BackupInfo, OLD_LOCKING, PROTECTED_MARKER_FILENAME,
@@ -140,10 +140,10 @@ pub struct DataStoreImpl {
     last_gc_status: Mutex<GarbageCollectionStatus>,
     verify_new: bool,
     chunk_order: ChunkOrder,
-    last_digest: Option<[u8; 32]>,
     sync_level: DatastoreFSyncLevel,
     backend_config: DatastoreBackendConfig,
     lru_store_caching: Option<LocalDatastoreLruCache>,
+    cached_config_tag: Option<CachedDatastoreConfigTag>,
 }
 
 impl DataStoreImpl {
@@ -156,10 +156,10 @@ impl DataStoreImpl {
             last_gc_status: Mutex::new(GarbageCollectionStatus::default()),
             verify_new: false,
             chunk_order: Default::default(),
-            last_digest: None,
             sync_level: Default::default(),
             backend_config: Default::default(),
             lru_store_caching: None,
+            cached_config_tag: None,
         })
     }
 }
@@ -224,6 +224,15 @@ pub enum DatastoreBackend {
     S3(Arc<S3Client>),
 }
 
+/// Used to determine whether a cached datastore instance is still valid
+/// or needs to be reloaded after a config change.
+struct CachedDatastoreConfigTag {
+    /// Maintenance mode at the time the lookup hint was captured, if any.
+    last_maintenance_mode: Option<MaintenanceMode>,
+    /// Datastore generation number from `ConfigVersionCache`; `None` when the cache wasn't available.
+    last_generation: Option<usize>,
+}
+
 impl DataStore {
     // This one just panics on everything
     #[doc(hidden)]
@@ -299,13 +308,40 @@ impl DataStore {
         // we use it to decide whether it is okay to delete the datastore.
         let _config_lock = pbs_config::datastore::lock_config()?;
 
-        // we could use the ConfigVersionCache's generation for staleness detection, but  we load
-        // the config anyway -> just use digest, additional benefit: manual changes get detected
-        let (config, digest) = pbs_config::datastore::config()?;
+        // Get the current datastore.cfg generation number
+        let gen_num = ConfigVersionCache::new()
+            .ok()
+            .map(|c| c.datastore_generation());
+
+        // Fast-path: if we have a cached entry created under the same datastore.cfg generation number, reuse it.
+        if let (Some(gen_num), Some(ds)) =
+            (gen_num, DATASTORE_MAP.lock().unwrap().get(name).cloned())
+        {
+            if let Some(cached_tag) = &ds.cached_config_tag {
+                if cached_tag.last_generation == Some(gen_num) {
+                    if let Some(mm) = &cached_tag.last_maintenance_mode {
+                        if let Err(error) = mm.check(operation) {
+                            bail!("datastore '{name}' is unavailable: {error}");
+                        }
+                    }
+                    if let Some(operation) = operation {
+                        update_active_operations(name, operation, 1)?;
+                    }
+                    return Ok(Arc::new(Self {
+                        inner: ds,
+                        operation,
+                    }));
+                }
+            }
+        }
+
+        // Slow path: (re)load config
+        let (config, _digest) = pbs_config::datastore::config()?;
         let config: DataStoreConfig = config.lookup("datastore", name)?;
 
-        if let Some(maintenance_mode) = config.get_maintenance_mode() {
-            if let Err(error) = maintenance_mode.check(operation) {
+        let maintenance_mode = config.get_maintenance_mode();
+        if let Some(mm) = &maintenance_mode {
+            if let Err(error) = mm.check(operation) {
                 bail!("datastore '{name}' is unavailable: {error}");
             }
         }
@@ -321,16 +357,6 @@ impl DataStore {
 
         // reuse chunk store so that we keep using the same process locker instance!
         let chunk_store = if let Some(datastore) = &entry {
-            let last_digest = datastore.last_digest.as_ref();
-            if let Some(true) = last_digest.map(|last_digest| last_digest == &digest) {
-                if let Some(operation) = operation {
-                    update_active_operations(name, operation, 1)?;
-                }
-                return Ok(Arc::new(Self {
-                    inner: Arc::clone(datastore),
-                    operation,
-                }));
-            }
             Arc::clone(&datastore.chunk_store)
         } else {
             let tuning: DatastoreTuning = serde_json::from_value(
@@ -344,7 +370,11 @@ impl DataStore {
             )?)
         };
 
-        let datastore = DataStore::with_store_and_config(chunk_store, config, Some(digest))?;
+        let mut datastore = DataStore::with_store_and_config(chunk_store, config)?;
+        datastore.cached_config_tag = Some(CachedDatastoreConfigTag {
+            last_maintenance_mode: maintenance_mode,
+            last_generation: gen_num,
+        });
 
         let datastore = Arc::new(datastore);
         datastore_cache.insert(name.to_string(), datastore.clone());
@@ -430,11 +460,7 @@ impl DataStore {
             config.absolute_path(),
             tuning.sync_level.unwrap_or_default(),
         )?;
-        let inner = Arc::new(Self::with_store_and_config(
-            Arc::new(chunk_store),
-            config,
-            None,
-        )?);
+        let inner = Arc::new(Self::with_store_and_config(Arc::new(chunk_store), config)?);
 
         if let Some(operation) = operation {
             update_active_operations(&name, operation, 1)?;
@@ -446,7 +472,6 @@ impl DataStore {
     fn with_store_and_config(
         chunk_store: Arc<ChunkStore>,
         config: DataStoreConfig,
-        last_digest: Option<[u8; 32]>,
     ) -> Result<DataStoreImpl, Error> {
         let mut gc_status_path = chunk_store.base_path();
         gc_status_path.push(".gc-status");
@@ -506,10 +531,10 @@ impl DataStore {
             last_gc_status: Mutex::new(gc_status),
             verify_new: config.verify_new.unwrap_or(false),
             chunk_order: tuning.chunk_order.unwrap_or_default(),
-            last_digest,
             sync_level: tuning.sync_level.unwrap_or_default(),
             backend_config,
             lru_store_caching,
+            cached_config_tag: None,
         })
     }
 
-- 
2.47.3



_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
