* [pbs-devel] [PATCH proxmox-backup v5 0/4] datastore: remove config reload on hot path
@ 2025-11-24 17:04 Samuel Rufinatscha
2025-11-24 17:04 ` [pbs-devel] [PATCH proxmox-backup v5 1/4] partial fix #6049: config: enable config version cache for datastore Samuel Rufinatscha
` (4 more replies)
0 siblings, 5 replies; 15+ messages in thread
From: Samuel Rufinatscha @ 2025-11-24 17:04 UTC (permalink / raw)
To: pbs-devel
Hi,
this series reduces CPU time in datastore lookups by avoiding repeated
datastore.cfg reads/parses in both `lookup_datastore()` and
`DataStore::Drop`. It also adds a TTL so manual config edits are
noticed without reintroducing hashing on every request.
While investigating #6049 [1], cargo-flamegraph [2] showed hotspots
during repeated `/status` calls in `lookup_datastore()` and in `Drop`,
dominated by `pbs_config::datastore::config()` (config parse).
The parsing cost itself should be investigated in a future effort.
Furthermore, cargo-flamegraph showed that when using a token-based
auth method to access the API, a significant amount of time is spent
in validation on every request [3].
## Approach
[PATCH 1/4] Support datastore generation in ConfigVersionCache
[PATCH 2/4] Fast path for datastore lookups
Cache the parsed datastore.cfg keyed by the shared datastore
generation. lookup_datastore() reuses both the cached config and an
existing DataStoreImpl when the generation matches, and falls back
to the old slow path otherwise. The caching logic is implemented
using the datastore_section_config_cached(update_cache: bool) helper.
[PATCH 3/4] Fast path for Drop
Make DataStore::Drop use the datastore_section_config_cached()
helper to avoid re-reading/parsing datastore.cfg on every Drop.
Bump the generation not only on API config saves, but also on
slow-path lookups (if update_cache is true), so that Drop handlers
notice newer configs.
[PATCH 4/4] TTL to catch manual edits
Add a TTL to the cached config and bump the datastore generation iff
the digest changed but the generation stayed the same. This catches
manual edits to datastore.cfg without reintroducing hashing or config
parsing on every request.
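Stripped of the PBS specifics (shared-memory generation counter,
digest check, locking requirements), the pattern behind these patches
is a single-slot cache validated by a generation number plus a TTL.
A minimal self-contained sketch of that pattern (names made up, not
the actual code from the series):

use std::sync::{Arc, Mutex};
use std::time::{Duration, Instant};

// One cached value, tagged with the generation and store time.
struct Cached<T> {
    value: Arc<T>,
    generation: usize,
    stored_at: Instant,
}

// Single-slot cache validated by generation number and TTL.
struct GenTtlCache<T> {
    slot: Mutex<Option<Cached<T>>>,
    ttl: Duration,
}

impl<T> GenTtlCache<T> {
    fn get_or_load(&self, current_gen: usize, load: impl FnOnce() -> T) -> Arc<T> {
        let mut slot = self.slot.lock().unwrap();
        if let Some(cached) = slot.as_ref() {
            // fast path: generation unchanged and entry still fresh
            if cached.generation == current_gen && cached.stored_at.elapsed() < self.ttl {
                return cached.value.clone();
            }
        }
        // slow path: reload, then refresh the cache slot
        let value = Arc::new(load());
        *slot = Some(Cached {
            value: value.clone(),
            generation: current_gen,
            stored_at: Instant::now(),
        });
        value
    }
}

In the series itself, the generation comes from the shared
ConfigVersionCache and the loaded value is the parsed datastore.cfg.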
## Benchmark results
### End-to-end
Testing `/status?verbose=0` end-to-end with 1000 stores, 5 req/store
and parallel=16 before/after the series:
Metric                   Before     After
-----------------------------------------
Total time                  12s        9s
Throughput (all, req/s)  416.67    555.56
Cold RPS (round #1)       83.33    111.11
Warm RPS (#2..N)         333.33    444.44
Running under flamegraph [2], TLS appears to consume a significant
amount of CPU time, which blurs the results. Still, the series yields
~33% higher overall throughput and ~25% lower end-to-end time for
this workload.
### Isolated benchmarks (hyperfine)
In addition to the end-to-end tests, I measured two standalone
benchmarks with hyperfine, each using a config with 1000 datastores.
`M` is the number of distinct datastores looked up and
`N` is the number of lookups per datastore.
Drop-direct variant:
Drops the `DataStore` after every lookup, so the `Drop` path runs on
every iteration:
use anyhow::Error;
use pbs_api_types::Operation;
use pbs_datastore::DataStore;
fn main() -> Result<(), Error> {
    let mut args = std::env::args();
    args.next();
    let datastores = if let Some(n) = args.next() {
        n.parse::<usize>()?
    } else {
        1000
    };
    let iterations = if let Some(n) = args.next() {
        n.parse::<usize>()?
    } else {
        1000
    };
    for d in 1..=datastores {
        let name = format!("ds{:04}", d);
        for _ in 1..=iterations {
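            // the returned Arc<DataStore> is dropped immediately, so Drop runs on every iteration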
            DataStore::lookup_datastore(&name, Some(Operation::Write))?;
        }
    }
    Ok(())
}
+------+------+---------------+--------------+---------+
|  M   |  N   | Baseline mean | Patched mean | Speedup |
+------+------+---------------+--------------+---------+
|    1 | 1000 |       1.684 s |      35.3 ms |   47.7x |
|   10 |  100 |       1.689 s |      35.0 ms |   48.3x |
|  100 |   10 |       1.709 s |      35.8 ms |   47.7x |
| 1000 |    1 |       1.809 s |      39.0 ms |   46.4x |
+------+------+---------------+--------------+---------+
Bulk-drop variant:
Keeps the `DataStore` instances alive for
all `N` lookups of a given datastore and then drops them in bulk,
mimicking a task that performs many lookups while it is running and
only triggers the expensive `Drop` logic when the last user exits.
use anyhow::Error;
use pbs_api_types::Operation;
use pbs_datastore::DataStore;
fn main() -> Result<(), Error> {
    let mut args = std::env::args();
    args.next();
    let datastores = if let Some(n) = args.next() {
        n.parse::<usize>()?
    } else {
        1000
    };
    let iterations = if let Some(n) = args.next() {
        n.parse::<usize>()?
    } else {
        1000
    };
    for d in 1..=datastores {
        let name = format!("ds{:04}", d);
        let mut stores = Vec::with_capacity(iterations);
        for _ in 1..=iterations {
            stores.push(DataStore::lookup_datastore(&name, Some(Operation::Write))?);
        }
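        // `stores` goes out of scope here, dropping all instances for this datastore in bulk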
    }
    Ok(())
}
+------+------+---------------+--------------+---------+
|  M   |  N   | Baseline mean | Patched mean | Speedup |
+------+------+---------------+--------------+---------+
|    1 | 1000 |      890.6 ms |      35.5 ms |   25.1x |
|   10 |  100 |      891.3 ms |      35.1 ms |   25.4x |
|  100 |   10 |      983.9 ms |      35.6 ms |   27.6x |
| 1000 |    1 |     1829.0 ms |      45.2 ms |   40.5x |
+------+------+---------------+--------------+---------+
Both variants show that the combination of the cached config lookups
and the cheaper `Drop` handling reduces the hot-path cost from
~0.9-1.8 s per run to a few tens of milliseconds in these benchmarks.
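For reference, each (M, N) row above corresponds to hyperfine runs
along these lines; the binary name is a placeholder for the test
programs above, which take <datastores> <iterations> as arguments:

hyperfine --warmup 1 'target/release/lookup-bench 10 100'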
## Reproduction steps
VM: 4 vCPU, ~8 GiB RAM, VirtIO-SCSI; disks:
- scsi0 32G (OS)
- scsi1 1000G (datastores)
Install PBS from ISO on the VM.
Set up ZFS on /dev/sdb (adjust if different):
zpool create -f -o ashift=12 pbsbench /dev/sdb
zfs set mountpoint=/pbsbench pbsbench
zfs create pbsbench/pbs-bench
Raise file-descriptor limit:
sudo systemctl edit proxmox-backup-proxy.service
Add the following lines:
[Service]
LimitNOFILE=1048576
Reload systemd and restart the proxy:
sudo systemctl daemon-reload
sudo systemctl restart proxmox-backup-proxy.service
Verify the limit:
systemctl show proxmox-backup-proxy.service | grep LimitNOFILE
Create 1000 ZFS-backed datastores (as used in #6049 [1]):
seq -w 001 1000 | xargs -n1 -P1 bash -c '
id=$0
name="ds${id}"
dataset="pbsbench/pbs-bench/${name}"
path="/pbsbench/pbs-bench/${name}"
zfs create -o mountpoint="$path" "$dataset"
proxmox-backup-manager datastore create "$name" "$path" \
--comment "ZFS dataset-based datastore"
'
Build PBS from this series, then run the server manually under
flamegraph:
systemctl stop proxmox-backup-proxy
cargo flamegraph --release --bin proxmox-backup-proxy
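To generate the end-to-end load (5 requests per store, 16 in
parallel), a simple driver along these lines works; the API token and
its secret are placeholders for any credentials with Datastore.Audit
on the stores:

seq -w 1 1000 | xargs -P16 -I{} sh -c '
    for r in 1 2 3 4 5; do
        curl -sk \
            -H "Authorization: PBSAPIToken=root@pam!bench:<secret>" \
            "https://localhost:8007/api2/json/admin/datastore/ds{}/status?verbose=0" \
            >/dev/null
    done
'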
## Patch summary
[PATCH 1/4] partial fix #6049: config: enable config version cache for datastore
[PATCH 2/4] partial fix #6049: datastore: impl ConfigVersionCache fast path for lookups
[PATCH 3/4] partial fix #6049: datastore: use config fast-path in Drop
[PATCH 4/4] partial fix #6049: datastore: add TTL fallback to catch manual config edits
## Maintainer notes
No dependency bumps, no API changes and no breaking changes.
Thanks,
Samuel
Links
[1] Bugzilla #6049: https://bugzilla.proxmox.com/show_bug.cgi?id=6049
[2] cargo-flamegraph: https://github.com/flamegraph-rs/flamegraph
[3] Bugzilla #7017: https://bugzilla.proxmox.com/show_bug.cgi?id=7017
Samuel Rufinatscha (4):
partial fix #6049: config: enable config version cache for datastore
partial fix #6049: datastore: impl ConfigVersionCache fast path for
lookups
partial fix #6049: datastore: use config fast-path in Drop
partial fix #6049: datastore: add TTL fallback to catch manual config
edits
pbs-config/src/config_version_cache.rs | 10 +-
pbs-datastore/Cargo.toml | 1 +
pbs-datastore/src/datastore.rs | 213 ++++++++++++++++++++-----
3 files changed, 179 insertions(+), 45 deletions(-)
--
2.47.3
* [pbs-devel] [PATCH proxmox-backup v5 1/4] partial fix #6049: config: enable config version cache for datastore
2025-11-24 17:04 [pbs-devel] [PATCH proxmox-backup v5 0/4] datastore: remove config reload on hot path Samuel Rufinatscha
@ 2025-11-24 17:04 ` Samuel Rufinatscha
2025-11-26 15:15 ` Fabian Grünbichler
2025-11-24 17:04 ` [pbs-devel] [PATCH proxmox-backup v5 2/4] partial fix #6049: datastore: impl ConfigVersionCache fast path for lookups Samuel Rufinatscha
` (3 subsequent siblings)
4 siblings, 1 reply; 15+ messages in thread
From: Samuel Rufinatscha @ 2025-11-24 17:04 UTC (permalink / raw)
To: pbs-devel
Repeated /status requests caused lookup_datastore() to re-read and
parse datastore.cfg on every call. The issue was mentioned in report
#6049 [1]. cargo-flamegraph [2] confirmed that the hot path is
dominated by pbs_config::datastore::config() (config parsing).
To solve the issue, this patch prepares the config version cache so
that datastore config caching can be built on top of it.
This patch specifically:
(1) adds a datastore_generation() getter to read the current
    generation for staleness checks
(2) removes the obsolete FIXME comments
Links
[1] Bugzilla: https://bugzilla.proxmox.com/show_bug.cgi?id=6049
[2] cargo-flamegraph: https://github.com/flamegraph-rs/flamegraph
Fixes: #6049
Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
---
Changes:
From v1 → v2 (original introduction), thanks @Fabian
- Split the ConfigVersionCache changes out of the large datastore patch
into their own config-only patch
* removed the obsolete // FIXME comment on datastore_generation
* added ConfigVersionCache::datastore_generation() as getter
From v2 → v3
No changes
From v3 → v4
No changes
From v4 → v5
- Rebased only, no changes
pbs-config/src/config_version_cache.rs | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)
diff --git a/pbs-config/src/config_version_cache.rs b/pbs-config/src/config_version_cache.rs
index e8fb994f..b875f7e0 100644
--- a/pbs-config/src/config_version_cache.rs
+++ b/pbs-config/src/config_version_cache.rs
@@ -26,7 +26,6 @@ struct ConfigVersionCacheDataInner {
// Traffic control (traffic-control.cfg) generation/version.
traffic_control_generation: AtomicUsize,
// datastore (datastore.cfg) generation/version
- // FIXME: remove with PBS 3.0
datastore_generation: AtomicUsize,
// Add further atomics here
}
@@ -145,8 +144,15 @@ impl ConfigVersionCache {
.fetch_add(1, Ordering::AcqRel);
}
+ /// Returns the datastore generation number.
+ pub fn datastore_generation(&self) -> usize {
+ self.shmem
+ .data()
+ .datastore_generation
+ .load(Ordering::Acquire)
+ }
+
/// Increase the datastore generation number.
- // FIXME: remove with PBS 3.0 or make actually useful again in datastore lookup
pub fn increase_datastore_generation(&self) -> usize {
self.shmem
.data()
--
2.47.3
* [pbs-devel] [PATCH proxmox-backup v5 2/4] partial fix #6049: datastore: impl ConfigVersionCache fast path for lookups
2025-11-24 17:04 [pbs-devel] [PATCH proxmox-backup v5 0/4] datastore: remove config reload on hot path Samuel Rufinatscha
2025-11-24 17:04 ` [pbs-devel] [PATCH proxmox-backup v5 1/4] partial fix #6049: config: enable config version cache for datastore Samuel Rufinatscha
@ 2025-11-24 17:04 ` Samuel Rufinatscha
2025-11-26 15:15 ` Fabian Grünbichler
2025-11-24 17:04 ` [pbs-devel] [PATCH proxmox-backup v5 3/4] partial fix #6049: datastore: use config fast-path in Drop Samuel Rufinatscha
` (2 subsequent siblings)
4 siblings, 1 reply; 15+ messages in thread
From: Samuel Rufinatscha @ 2025-11-24 17:04 UTC (permalink / raw)
To: pbs-devel
Repeated /status requests caused lookup_datastore() to re-read and
parse datastore.cfg on every call. The issue was mentioned in report
#6049 [1]. cargo-flamegraph [2] confirmed that the hot path is
dominated by pbs_config::datastore::config() (config parsing).
This patch implements caching of the global datastore.cfg using the
generation numbers from the shared config version cache. It caches the
datastore.cfg along with the generation number and, when a subsequent
lookup sees the same generation, it reuses the cached config without
re-reading it from disk. If the generation differs
(or the cache is unavailable), the config is re-read from disk.
If `update_cache = true`, the new config and current generation are
persisted in the cache. In this case, callers must hold the datastore
config lock to avoid racing with concurrent config changes.
If `update_cache` is `false` and generation did not match, the freshly
read config is returned but the cache is left unchanged. If
`ConfigVersionCache` is not available, the config is always read from
disk and `None` is returned as generation.
Behavioral notes
- The generation is bumped via the existing save_config() path, so
API-driven config changes are detected immediately.
- Manual edits to datastore.cfg are not detected; this is covered in a
dedicated patch in this series.
- DataStore::drop still performs a config read on the common path;
also covered in a dedicated patch in this series.
Links
[1] Bugzilla: https://bugzilla.proxmox.com/show_bug.cgi?id=6049
[2] cargo-flamegraph: https://github.com/flamegraph-rs/flamegraph
Fixes: #6049
Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
---
Changes:
From v1 → v2, thanks @Fabian
- Moved the ConfigVersionCache changes into its own patch.
- Introduced the global static DATASTORE_CONFIG_CACHE to store the
fully parsed datastore.cfg instead, along with its generation number.
Introduced DatastoreConfigCache struct to hold both.
- Removed and replaced the CachedDatastoreConfigTag field of
DataStoreImpl with a generation number field only (Option<usize>)
to validate DataStoreImpl reuse.
- Added DataStore::datastore_section_config_cached() helper function
to encapsulate the caching logic and simplify reuse.
- Modified DataStore::lookup_datastore() to use the new helper.
From v2 → v3
No changes
From v3 → v4, thanks @Fabian
- Restructured the version cache checks in
datastore_section_config_cached(), to simplify the logic.
- Added update_cache parameter to datastore_section_config_cached() to
control cache updates.
From v4 → v5
- Rebased only, no changes
pbs-datastore/Cargo.toml | 1 +
pbs-datastore/src/datastore.rs | 138 +++++++++++++++++++++++++--------
2 files changed, 105 insertions(+), 34 deletions(-)
diff --git a/pbs-datastore/Cargo.toml b/pbs-datastore/Cargo.toml
index 8ce930a9..42f49a7b 100644
--- a/pbs-datastore/Cargo.toml
+++ b/pbs-datastore/Cargo.toml
@@ -40,6 +40,7 @@ proxmox-io.workspace = true
proxmox-lang.workspace=true
proxmox-s3-client = { workspace = true, features = [ "impl" ] }
proxmox-schema = { workspace = true, features = [ "api-macro" ] }
+proxmox-section-config.workspace = true
proxmox-serde = { workspace = true, features = [ "serde_json" ] }
proxmox-sys.workspace = true
proxmox-systemd.workspace = true
diff --git a/pbs-datastore/src/datastore.rs b/pbs-datastore/src/datastore.rs
index 36550ff6..c9cb5d65 100644
--- a/pbs-datastore/src/datastore.rs
+++ b/pbs-datastore/src/datastore.rs
@@ -34,7 +34,8 @@ use pbs_api_types::{
MaintenanceType, Operation, UPID,
};
use pbs_config::s3::S3_CFG_TYPE_ID;
-use pbs_config::BackupLockGuard;
+use pbs_config::{BackupLockGuard, ConfigVersionCache};
+use proxmox_section_config::SectionConfigData;
use crate::backup_info::{
BackupDir, BackupGroup, BackupInfo, OLD_LOCKING, PROTECTED_MARKER_FILENAME,
@@ -48,6 +49,17 @@ use crate::s3::S3_CONTENT_PREFIX;
use crate::task_tracking::{self, update_active_operations};
use crate::{DataBlob, LocalDatastoreLruCache};
+// Cache for fully parsed datastore.cfg
+struct DatastoreConfigCache {
+ // Parsed datastore.cfg file
+ config: Arc<SectionConfigData>,
+ // Generation number from ConfigVersionCache
+ last_generation: usize,
+}
+
+static DATASTORE_CONFIG_CACHE: LazyLock<Mutex<Option<DatastoreConfigCache>>> =
+ LazyLock::new(|| Mutex::new(None));
+
static DATASTORE_MAP: LazyLock<Mutex<HashMap<String, Arc<DataStoreImpl>>>> =
LazyLock::new(|| Mutex::new(HashMap::new()));
@@ -149,11 +161,13 @@ pub struct DataStoreImpl {
last_gc_status: Mutex<GarbageCollectionStatus>,
verify_new: bool,
chunk_order: ChunkOrder,
- last_digest: Option<[u8; 32]>,
sync_level: DatastoreFSyncLevel,
backend_config: DatastoreBackendConfig,
lru_store_caching: Option<LocalDatastoreLruCache>,
thread_settings: DatastoreThreadSettings,
+ /// Datastore generation number from `ConfigVersionCache` at creation time, used to
+ /// validate reuse of this cached `DataStoreImpl`.
+ config_generation: Option<usize>,
}
impl DataStoreImpl {
@@ -166,11 +180,11 @@ impl DataStoreImpl {
last_gc_status: Mutex::new(GarbageCollectionStatus::default()),
verify_new: false,
chunk_order: Default::default(),
- last_digest: None,
sync_level: Default::default(),
backend_config: Default::default(),
lru_store_caching: None,
thread_settings: Default::default(),
+ config_generation: None,
})
}
}
@@ -286,6 +300,55 @@ impl DatastoreThreadSettings {
}
}
+/// Returns the parsed datastore config (`datastore.cfg`) and its
+/// generation.
+///
+/// Uses `ConfigVersionCache` to detect stale entries:
+/// - If the cached generation matches the current generation, the
+/// cached config is returned.
+/// - Otherwise the config is re-read from disk. If `update_cache` is
+/// `true`, the new config and current generation are stored in the
+/// cache. Callers that set `update_cache = true` must hold the
+/// datastore config lock to avoid racing with concurrent config
+/// changes.
+/// - If `update_cache` is `false`, the freshly read config is returned
+/// but the cache is left unchanged.
+///
+/// If `ConfigVersionCache` is not available, the config is always read
+/// from disk and `None` is returned as the generation.
+fn datastore_section_config_cached(
+ update_cache: bool,
+) -> Result<(Arc<SectionConfigData>, Option<usize>), Error> {
+ let mut config_cache = DATASTORE_CONFIG_CACHE.lock().unwrap();
+
+ if let Ok(version_cache) = ConfigVersionCache::new() {
+ let current_gen = version_cache.datastore_generation();
+ if let Some(cached) = config_cache.as_ref() {
+ // Fast path: re-use cached datastore.cfg
+ if cached.last_generation == current_gen {
+ return Ok((cached.config.clone(), Some(cached.last_generation)));
+ }
+ }
+ // Slow path: re-read datastore.cfg
+ let (config_raw, _digest) = pbs_config::datastore::config()?;
+ let config = Arc::new(config_raw);
+
+ if update_cache {
+ *config_cache = Some(DatastoreConfigCache {
+ config: config.clone(),
+ last_generation: current_gen,
+ });
+ }
+
+ Ok((config, Some(current_gen)))
+ } else {
+ // Fallback path, no config version cache: read datastore.cfg and return None as generation
+ *config_cache = None;
+ let (config_raw, _digest) = pbs_config::datastore::config()?;
+ Ok((Arc::new(config_raw), None))
+ }
+}
+
impl DataStore {
// This one just panics on everything
#[doc(hidden)]
@@ -363,56 +426,63 @@ impl DataStore {
name: &str,
operation: Option<Operation>,
) -> Result<Arc<DataStore>, Error> {
- // Avoid TOCTOU between checking maintenance mode and updating active operation counter, as
- // we use it to decide whether it is okay to delete the datastore.
+ // Avoid TOCTOU between checking maintenance mode and updating active operations.
let _config_lock = pbs_config::datastore::lock_config()?;
- // we could use the ConfigVersionCache's generation for staleness detection, but we load
- // the config anyway -> just use digest, additional benefit: manual changes get detected
- let (config, digest) = pbs_config::datastore::config()?;
- let config: DataStoreConfig = config.lookup("datastore", name)?;
+ // Get the current datastore.cfg generation number and cached config
+ let (section_config, gen_num) = datastore_section_config_cached(true)?;
+
+ let datastore_cfg: DataStoreConfig = section_config.lookup("datastore", name)?;
+ let maintenance_mode = datastore_cfg.get_maintenance_mode();
+ let mount_status = get_datastore_mount_status(&datastore_cfg);
- if let Some(maintenance_mode) = config.get_maintenance_mode() {
- if let Err(error) = maintenance_mode.check(operation) {
+ if let Some(mm) = &maintenance_mode {
+ if let Err(error) = mm.check(operation.clone()) {
bail!("datastore '{name}' is unavailable: {error}");
}
}
- if get_datastore_mount_status(&config) == Some(false) {
- let mut datastore_cache = DATASTORE_MAP.lock().unwrap();
- datastore_cache.remove(&config.name);
- bail!("datastore '{}' is not mounted", config.name);
+ let mut datastore_cache = DATASTORE_MAP.lock().unwrap();
+
+ if mount_status == Some(false) {
+ datastore_cache.remove(&datastore_cfg.name);
+ bail!("datastore '{}' is not mounted", datastore_cfg.name);
}
- let mut datastore_cache = DATASTORE_MAP.lock().unwrap();
- let entry = datastore_cache.get(name);
-
- // reuse chunk store so that we keep using the same process locker instance!
- let chunk_store = if let Some(datastore) = &entry {
- let last_digest = datastore.last_digest.as_ref();
- if let Some(true) = last_digest.map(|last_digest| last_digest == &digest) {
- if let Some(operation) = operation {
- update_active_operations(name, operation, 1)?;
+ // Re-use DataStoreImpl
+ if let Some(existing) = datastore_cache.get(name).cloned() {
+ if let (Some(last_generation), Some(gen_num)) = (existing.config_generation, gen_num) {
+ if last_generation == gen_num {
+ if let Some(op) = operation {
+ update_active_operations(name, op, 1)?;
+ }
+
+ return Ok(Arc::new(Self {
+ inner: existing,
+ operation,
+ }));
}
- return Ok(Arc::new(Self {
- inner: Arc::clone(datastore),
- operation,
- }));
}
- Arc::clone(&datastore.chunk_store)
+ }
+
+ // (Re)build DataStoreImpl
+
+ // Reuse chunk store so that we keep using the same process locker instance!
+ let chunk_store = if let Some(existing) = datastore_cache.get(name) {
+ Arc::clone(&existing.chunk_store)
} else {
let tuning: DatastoreTuning = serde_json::from_value(
DatastoreTuning::API_SCHEMA
- .parse_property_string(config.tuning.as_deref().unwrap_or(""))?,
+ .parse_property_string(datastore_cfg.tuning.as_deref().unwrap_or(""))?,
)?;
Arc::new(ChunkStore::open(
name,
- config.absolute_path(),
+ datastore_cfg.absolute_path(),
tuning.sync_level.unwrap_or_default(),
)?)
};
- let datastore = DataStore::with_store_and_config(chunk_store, config, Some(digest))?;
+ let datastore = DataStore::with_store_and_config(chunk_store, datastore_cfg, gen_num)?;
let datastore = Arc::new(datastore);
datastore_cache.insert(name.to_string(), datastore.clone());
@@ -514,7 +584,7 @@ impl DataStore {
fn with_store_and_config(
chunk_store: Arc<ChunkStore>,
config: DataStoreConfig,
- last_digest: Option<[u8; 32]>,
+ generation: Option<usize>,
) -> Result<DataStoreImpl, Error> {
let mut gc_status_path = chunk_store.base_path();
gc_status_path.push(".gc-status");
@@ -579,11 +649,11 @@ impl DataStore {
last_gc_status: Mutex::new(gc_status),
verify_new: config.verify_new.unwrap_or(false),
chunk_order: tuning.chunk_order.unwrap_or_default(),
- last_digest,
sync_level: tuning.sync_level.unwrap_or_default(),
backend_config,
lru_store_caching,
thread_settings,
+ config_generation: generation,
})
}
--
2.47.3
* [pbs-devel] [PATCH proxmox-backup v5 3/4] partial fix #6049: datastore: use config fast-path in Drop
2025-11-24 17:04 [pbs-devel] [PATCH proxmox-backup v5 0/4] datastore: remove config reload on hot path Samuel Rufinatscha
2025-11-24 17:04 ` [pbs-devel] [PATCH proxmox-backup v5 1/4] partial fix #6049: config: enable config version cache for datastore Samuel Rufinatscha
2025-11-24 17:04 ` [pbs-devel] [PATCH proxmox-backup v5 2/4] partial fix #6049: datastore: impl ConfigVersionCache fast path for lookups Samuel Rufinatscha
@ 2025-11-24 17:04 ` Samuel Rufinatscha
2025-11-26 15:15 ` Fabian Grünbichler
2025-11-24 17:04 ` [pbs-devel] [PATCH proxmox-backup v5 4/4] partial fix #6049: datastore: add TTL fallback to catch manual config edits Samuel Rufinatscha
2025-11-26 15:16 ` [pbs-devel] [PATCH proxmox-backup v5 0/4] datastore: remove config reload on hot path Fabian Grünbichler
4 siblings, 1 reply; 15+ messages in thread
From: Samuel Rufinatscha @ 2025-11-24 17:04 UTC (permalink / raw)
To: pbs-devel
The Drop impl of DataStore re-read datastore.cfg to decide whether
the entry should be evicted from the in-process cache (based on
maintenance mode’s clear_from_cache). During the investigation of
issue #6049 [1], a flamegraph [2] showed that the config reload in Drop
accounted for a measurable share of CPU time under load.
This patch wires the datastore config fast path into the Drop impl,
so that checking the maintenance mode no longer reloads the config
from disk in the common case. Also, to ensure Drop handlers detect
that a newer config exists and do not rely on a stale cached entry,
the generation is bumped not only on config save, but also when the
config file is re-read on the slow path (if `update_cache = true`).
Links
[1] Bugzilla: https://bugzilla.proxmox.com/show_bug.cgi?id=6049
[2] cargo-flamegraph: https://github.com/flamegraph-rs/flamegraph
Fixes: #6049
Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
---
Changes:
From v1 → v2
- Replace caching logic with the datastore_section_config_cached()
helper.
From v2 → v3
No changes
From v3 → v4, thanks @Fabian
- Pass datastore_section_config_cached(false) in Drop to avoid
concurrent cache updates.
From v4 → v5
- Rebased only, no changes
pbs-datastore/src/datastore.rs | 60 ++++++++++++++++++++++++++--------
1 file changed, 47 insertions(+), 13 deletions(-)
diff --git a/pbs-datastore/src/datastore.rs b/pbs-datastore/src/datastore.rs
index c9cb5d65..7638a899 100644
--- a/pbs-datastore/src/datastore.rs
+++ b/pbs-datastore/src/datastore.rs
@@ -225,15 +225,40 @@ impl Drop for DataStore {
// remove datastore from cache iff
// - last task finished, and
// - datastore is in a maintenance mode that mandates it
- let remove_from_cache = last_task
- && pbs_config::datastore::config()
- .and_then(|(s, _)| s.lookup::<DataStoreConfig>("datastore", self.name()))
- .is_ok_and(|c| {
- c.get_maintenance_mode()
- .is_some_and(|m| m.clear_from_cache())
- });
-
- if remove_from_cache {
+
+ // first check: check if last task finished
+ if !last_task {
+ return;
+ }
+
+ let (section_config, _gen) = match datastore_section_config_cached(false) {
+ Ok(v) => v,
+ Err(err) => {
+ log::error!(
+ "failed to load datastore config in Drop for {} - {err}",
+ self.name()
+ );
+ return;
+ }
+ };
+
+ let datastore_cfg: DataStoreConfig =
+ match section_config.lookup("datastore", self.name()) {
+ Ok(cfg) => cfg,
+ Err(err) => {
+ log::error!(
+ "failed to look up datastore '{}' in Drop - {err}",
+ self.name()
+ );
+ return;
+ }
+ };
+
+ // second check: check maintenance mode mandate
+ if datastore_cfg
+ .get_maintenance_mode()
+ .is_some_and(|m| m.clear_from_cache())
+ {
DATASTORE_MAP.lock().unwrap().remove(self.name());
}
}
@@ -307,12 +332,12 @@ impl DatastoreThreadSettings {
/// - If the cached generation matches the current generation, the
/// cached config is returned.
/// - Otherwise the config is re-read from disk. If `update_cache` is
-/// `true`, the new config and current generation are stored in the
+/// `true`, the new config and bumped generation are stored in the
/// cache. Callers that set `update_cache = true` must hold the
/// datastore config lock to avoid racing with concurrent config
/// changes.
/// - If `update_cache` is `false`, the freshly read config is returned
-/// but the cache is left unchanged.
+/// but the cache and generation are left unchanged.
///
/// If `ConfigVersionCache` is not available, the config is always read
/// from disk and `None` is returned as the generation.
@@ -333,14 +358,23 @@ fn datastore_section_config_cached(
let (config_raw, _digest) = pbs_config::datastore::config()?;
let config = Arc::new(config_raw);
+ let mut effective_gen = current_gen;
if update_cache {
+ // Bump the generation. This ensures that Drop
+ // handlers will detect that a newer config exists
+ // and will not rely on a stale cached entry for
+ // maintenance mandate.
+ let prev_gen = version_cache.increase_datastore_generation();
+ effective_gen = prev_gen + 1;
+
+ // Persist
*config_cache = Some(DatastoreConfigCache {
config: config.clone(),
- last_generation: current_gen,
+ last_generation: effective_gen,
});
}
- Ok((config, Some(current_gen)))
+ Ok((config, Some(effective_gen)))
} else {
// Fallback path, no config version cache: read datastore.cfg and return None as generation
*config_cache = None;
--
2.47.3
* [pbs-devel] [PATCH proxmox-backup v5 4/4] partial fix #6049: datastore: add TTL fallback to catch manual config edits
2025-11-24 17:04 [pbs-devel] [PATCH proxmox-backup v5 0/4] datastore: remove config reload on hot path Samuel Rufinatscha
` (2 preceding siblings ...)
2025-11-24 17:04 ` [pbs-devel] [PATCH proxmox-backup v5 3/4] partial fix #6049: datastore: use config fast-path in Drop Samuel Rufinatscha
@ 2025-11-24 17:04 ` Samuel Rufinatscha
2025-11-26 15:15 ` Fabian Grünbichler
2025-11-26 15:16 ` [pbs-devel] [PATCH proxmox-backup v5 0/4] datastore: remove config reload on hot path Fabian Grünbichler
4 siblings, 1 reply; 15+ messages in thread
From: Samuel Rufinatscha @ 2025-11-24 17:04 UTC (permalink / raw)
To: pbs-devel
The lookup fast path reacts to API-driven config changes because
save_config() bumps the generation. Manual edits of datastore.cfg do
not bump the counter. To keep the system robust against such edits
without reintroducing config reading and hashing on the hot path, this
patch adds a TTL to the cache entry.
If the cached config is older than
DATASTORE_CONFIG_CACHE_TTL_SECS (set to 60s), the next lookup takes
the slow path and refreshes the entry. As an optimization, a check to
catch manual edits was added (if the digest changed but generation
stayed the same), so that the generation is only bumped when needed.
Links
[1] cargo-flamegraph: https://github.com/flamegraph-rs/flamegraph
Fixes: #6049
Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
---
Changes:
From v1 → v2
- Store last_update timestamp in DatastoreConfigCache type.
From v2 → v3
No changes
From v3 → v4
- Fix digest generation bump logic in update_cache, thanks @Fabian.
From v4 → v5
- Rebased only, no changes
pbs-datastore/src/datastore.rs | 53 ++++++++++++++++++++++++----------
1 file changed, 38 insertions(+), 15 deletions(-)
diff --git a/pbs-datastore/src/datastore.rs b/pbs-datastore/src/datastore.rs
index 7638a899..0fc3fbf2 100644
--- a/pbs-datastore/src/datastore.rs
+++ b/pbs-datastore/src/datastore.rs
@@ -53,8 +53,12 @@ use crate::{DataBlob, LocalDatastoreLruCache};
struct DatastoreConfigCache {
// Parsed datastore.cfg file
config: Arc<SectionConfigData>,
+ // Digest of the datastore.cfg file
+ digest: [u8; 32],
// Generation number from ConfigVersionCache
last_generation: usize,
+ // Last update time (epoch seconds)
+ last_update: i64,
}
static DATASTORE_CONFIG_CACHE: LazyLock<Mutex<Option<DatastoreConfigCache>>> =
@@ -63,6 +67,8 @@ static DATASTORE_CONFIG_CACHE: LazyLock<Mutex<Option<DatastoreConfigCache>>> =
static DATASTORE_MAP: LazyLock<Mutex<HashMap<String, Arc<DataStoreImpl>>>> =
LazyLock::new(|| Mutex::new(HashMap::new()));
+/// Max age in seconds to reuse the cached datastore config.
+const DATASTORE_CONFIG_CACHE_TTL_SECS: i64 = 60;
/// Filename to store backup group notes
pub const GROUP_NOTES_FILE_NAME: &str = "notes";
/// Filename to store backup group owner
@@ -329,13 +335,14 @@ impl DatastoreThreadSettings {
/// generation.
///
/// Uses `ConfigVersionCache` to detect stale entries:
-/// - If the cached generation matches the current generation, the
-/// cached config is returned.
+/// - If the cached generation matches the current generation and TTL is
+/// OK, the cached config is returned.
/// - Otherwise the config is re-read from disk. If `update_cache` is
-/// `true`, the new config and bumped generation are stored in the
-/// cache. Callers that set `update_cache = true` must hold the
-/// datastore config lock to avoid racing with concurrent config
-/// changes.
+/// `true` and a previous cached entry exists with the same generation
+/// but a different digest, this indicates the config has changed
+/// (e.g. manual edit) and the generation must be bumped. Callers
+/// that set `update_cache = true` must hold the datastore config lock
+/// to avoid racing with concurrent config changes.
/// - If `update_cache` is `false`, the freshly read config is returned
/// but the cache and generation are left unchanged.
///
@@ -347,30 +354,46 @@ fn datastore_section_config_cached(
let mut config_cache = DATASTORE_CONFIG_CACHE.lock().unwrap();
if let Ok(version_cache) = ConfigVersionCache::new() {
+ let now = epoch_i64();
let current_gen = version_cache.datastore_generation();
if let Some(cached) = config_cache.as_ref() {
- // Fast path: re-use cached datastore.cfg
- if cached.last_generation == current_gen {
+ // Fast path: re-use cached datastore.cfg if generation matches and TTL not expired
+ if cached.last_generation == current_gen
+ && now - cached.last_update < DATASTORE_CONFIG_CACHE_TTL_SECS
+ {
return Ok((cached.config.clone(), Some(cached.last_generation)));
}
}
// Slow path: re-read datastore.cfg
- let (config_raw, _digest) = pbs_config::datastore::config()?;
+ let (config_raw, digest) = pbs_config::datastore::config()?;
let config = Arc::new(config_raw);
let mut effective_gen = current_gen;
if update_cache {
- // Bump the generation. This ensures that Drop
- // handlers will detect that a newer config exists
- // and will not rely on a stale cached entry for
- // maintenance mandate.
- let prev_gen = version_cache.increase_datastore_generation();
- effective_gen = prev_gen + 1;
+ // Bump the generation if the config has been changed manually.
+ // This ensures that Drop handlers will detect that a newer config exists
+ // and will not rely on a stale cached entry for maintenance mandate.
+ let (prev_gen, prev_digest) = config_cache
+ .as_ref()
+ .map(|c| (Some(c.last_generation), Some(c.digest)))
+ .unwrap_or((None, None));
+
+ let manual_edit = match (prev_gen, prev_digest) {
+ (Some(prev_g), Some(prev_d)) => prev_g == current_gen && prev_d != digest,
+ _ => false,
+ };
+
+ if manual_edit {
+ let prev_gen = version_cache.increase_datastore_generation();
+ effective_gen = prev_gen + 1;
+ }
// Persist
*config_cache = Some(DatastoreConfigCache {
config: config.clone(),
+ digest,
last_generation: effective_gen,
+ last_update: now,
});
}
--
2.47.3
* Re: [pbs-devel] [PATCH proxmox-backup v5 4/4] partial fix #6049: datastore: add TTL fallback to catch manual config edits
2025-11-24 17:04 ` [pbs-devel] [PATCH proxmox-backup v5 4/4] partial fix #6049: datastore: add TTL fallback to catch manual config edits Samuel Rufinatscha
@ 2025-11-26 15:15 ` Fabian Grünbichler
0 siblings, 0 replies; 15+ messages in thread
From: Fabian Grünbichler @ 2025-11-26 15:15 UTC (permalink / raw)
To: Proxmox Backup Server development discussion
nit for the subject: this doesn't fix the reported issue, it just
improves the fix further, so please drop that part and maybe instead add
"lookup" somewhere..
On November 24, 2025 6:04 pm, Samuel Rufinatscha wrote:
> The lookup fast path reacts to API-driven config changes because
> save_config() bumps the generation. Manual edits of datastore.cfg do
> not bump the counter. To keep the system robust against such edits
> without reintroducing config reading and hashing on the hot path, this
> patch adds a TTL to the cache entry.
>
> If the cached config is older than
> DATASTORE_CONFIG_CACHE_TTL_SECS (set to 60s), the next lookup takes
> the slow path and refreshes the entry. As an optimization, a check to
> catch manual edits was added (if the digest changed but generation
> stayed the same), so that the generation is only bumped when needed.
>
> Links
>
> [1] cargo-flamegraph: https://github.com/flamegraph-rs/flamegraph
>
> Fixes: #6049
> Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
one style nit below, otherwise:
Reviewed-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
> ---
> Changes:
>
> From v1 → v2
> - Store last_update timestamp in DatastoreConfigCache type.
>
> From v2 → v3
> No changes
>
> From v3 → v4
> - Fix digest generation bump logic in update_cache, thanks @Fabian.
>
> From v4 → v5
> - Rebased only, no changes
>
> pbs-datastore/src/datastore.rs | 53 ++++++++++++++++++++++++----------
> 1 file changed, 38 insertions(+), 15 deletions(-)
>
> diff --git a/pbs-datastore/src/datastore.rs b/pbs-datastore/src/datastore.rs
> index 7638a899..0fc3fbf2 100644
> --- a/pbs-datastore/src/datastore.rs
> +++ b/pbs-datastore/src/datastore.rs
> @@ -53,8 +53,12 @@ use crate::{DataBlob, LocalDatastoreLruCache};
> struct DatastoreConfigCache {
> // Parsed datastore.cfg file
> config: Arc<SectionConfigData>,
> + // Digest of the datastore.cfg file
> + digest: [u8; 32],
> // Generation number from ConfigVersionCache
> last_generation: usize,
> + // Last update time (epoch seconds)
> + last_update: i64,
> }
>
> static DATASTORE_CONFIG_CACHE: LazyLock<Mutex<Option<DatastoreConfigCache>>> =
> @@ -63,6 +67,8 @@ static DATASTORE_CONFIG_CACHE: LazyLock<Mutex<Option<DatastoreConfigCache>>> =
> static DATASTORE_MAP: LazyLock<Mutex<HashMap<String, Arc<DataStoreImpl>>>> =
> LazyLock::new(|| Mutex::new(HashMap::new()));
>
> +/// Max age in seconds to reuse the cached datastore config.
> +const DATASTORE_CONFIG_CACHE_TTL_SECS: i64 = 60;
> /// Filename to store backup group notes
> pub const GROUP_NOTES_FILE_NAME: &str = "notes";
> /// Filename to store backup group owner
> @@ -329,13 +335,14 @@ impl DatastoreThreadSettings {
> /// generation.
> ///
> /// Uses `ConfigVersionCache` to detect stale entries:
> -/// - If the cached generation matches the current generation, the
> -/// cached config is returned.
> +/// - If the cached generation matches the current generation and TTL is
> +/// OK, the cached config is returned.
> /// - Otherwise the config is re-read from disk. If `update_cache` is
> -/// `true`, the new config and bumped generation are stored in the
> -/// cache. Callers that set `update_cache = true` must hold the
> -/// datastore config lock to avoid racing with concurrent config
> -/// changes.
> +/// `true` and a previous cached entry exists with the same generation
> +/// but a different digest, this indicates the config has changed
> +/// (e.g. manual edit) and the generation must be bumped. Callers
> +/// that set `update_cache = true` must hold the datastore config lock
> +/// to avoid racing with concurrent config changes.
> /// - If `update_cache` is `false`, the freshly read config is returned
> /// but the cache and generation are left unchanged.
> ///
> @@ -347,30 +354,46 @@ fn datastore_section_config_cached(
> let mut config_cache = DATASTORE_CONFIG_CACHE.lock().unwrap();
>
> if let Ok(version_cache) = ConfigVersionCache::new() {
> + let now = epoch_i64();
> let current_gen = version_cache.datastore_generation();
> if let Some(cached) = config_cache.as_ref() {
> - // Fast path: re-use cached datastore.cfg
> - if cached.last_generation == current_gen {
> + // Fast path: re-use cached datastore.cfg if generation matches and TTL not expired
> + if cached.last_generation == current_gen
> + && now - cached.last_update < DATASTORE_CONFIG_CACHE_TTL_SECS
> + {
> return Ok((cached.config.clone(), Some(cached.last_generation)));
> }
> }
> // Slow path: re-read datastore.cfg
> - let (config_raw, _digest) = pbs_config::datastore::config()?;
> + let (config_raw, digest) = pbs_config::datastore::config()?;
> let config = Arc::new(config_raw);
>
> let mut effective_gen = current_gen;
> if update_cache {
> - // Bump the generation. This ensures that Drop
> - // handlers will detect that a newer config exists
> - // and will not rely on a stale cached entry for
> - // maintenance mandate.
> - let prev_gen = version_cache.increase_datastore_generation();
> - effective_gen = prev_gen + 1;
> + // Bump the generation if the config has been changed manually.
> + // This ensures that Drop handlers will detect that a newer config exists
> + // and will not rely on a stale cached entry for maintenance mandate.
> + let (prev_gen, prev_digest) = config_cache
> + .as_ref()
> + .map(|c| (Some(c.last_generation), Some(c.digest)))
> + .unwrap_or((None, None));
so here we map an option to a tuple of options and unwrap it
> +
> + let manual_edit = match (prev_gen, prev_digest) {
only to then match and convert it to a boolean again here
> + (Some(prev_g), Some(prev_d)) => prev_g == current_gen && prev_d != digest,
> + _ => false,
> + };
> +
> + if manual_edit {
> + let prev_gen = version_cache.increase_datastore_generation();
> + effective_gen = prev_gen + 1;
to then do some code here, if the boolean is true ;)
this can all just be a single block of code instead:
if let Some(cached) = config_cache.as_ref() {
    if cached.last_generation == current_gen && cached.digest != digest {
        effective_gen = version_cache.increase_datastore_generation() + 1;
    }
}
which also matches the first block higher up in the helper..
> + }
>
> // Persist
> *config_cache = Some(DatastoreConfigCache {
> config: config.clone(),
> + digest,
> last_generation: effective_gen,
> + last_update: now,
> });
> }
>
> --
> 2.47.3
>
>
>
* Re: [pbs-devel] [PATCH proxmox-backup v5 3/4] partial fix #6049: datastore: use config fast-path in Drop
2025-11-24 17:04 ` [pbs-devel] [PATCH proxmox-backup v5 3/4] partial fix #6049: datastore: use config fast-path in Drop Samuel Rufinatscha
@ 2025-11-26 15:15 ` Fabian Grünbichler
2025-11-28 9:03 ` Samuel Rufinatscha
0 siblings, 1 reply; 15+ messages in thread
From: Fabian Grünbichler @ 2025-11-26 15:15 UTC (permalink / raw)
To: Proxmox Backup Server development discussion
On November 24, 2025 6:04 pm, Samuel Rufinatscha wrote:
> The Drop impl of DataStore re-read datastore.cfg to decide whether
> the entry should be evicted from the in-process cache (based on
> maintenance mode’s clear_from_cache). During the investigation of
> issue #6049 [1], a flamegraph [2] showed that the config reload in Drop
> accounted for a measurable share of CPU time under load.
>
> This patch wires the datastore config fast path into the Drop impl,
> so that checking the maintenance mode no longer reloads the config
> from disk in the common case. Also, to ensure Drop handlers detect
> that a newer config exists and do not rely on a stale cached entry,
> the generation is bumped not only on config save, but also when the
> config file is re-read on the slow path (if `update_cache = true`).
>
> Links
>
> [1] Bugzilla: https://bugzilla.proxmox.com/show_bug.cgi?id=6049
> [2] cargo-flamegraph: https://github.com/flamegraph-rs/flamegraph
>
> Fixes: #6049
> Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
> ---
> Changes:
>
> From v1 → v2
> - Replace caching logic with the datastore_section_config_cached()
> helper.
>
> From v2 → v3
> No changes
>
> From v3 → v4, thanks @Fabian
> - Pass datastore_section_config_cached(false) in Drop to avoid
> concurrent cache updates.
>
> From v4 → v5
> - Rebased only, no changes
>
> pbs-datastore/src/datastore.rs | 60 ++++++++++++++++++++++++++--------
> 1 file changed, 47 insertions(+), 13 deletions(-)
>
> diff --git a/pbs-datastore/src/datastore.rs b/pbs-datastore/src/datastore.rs
> index c9cb5d65..7638a899 100644
> --- a/pbs-datastore/src/datastore.rs
> +++ b/pbs-datastore/src/datastore.rs
> @@ -225,15 +225,40 @@ impl Drop for DataStore {
> // remove datastore from cache iff
> // - last task finished, and
> // - datastore is in a maintenance mode that mandates it
> - let remove_from_cache = last_task
> - && pbs_config::datastore::config()
> - .and_then(|(s, _)| s.lookup::<DataStoreConfig>("datastore", self.name()))
> - .is_ok_and(|c| {
> - c.get_maintenance_mode()
> - .is_some_and(|m| m.clear_from_cache())
> - });
old code here ignored parsing/locking/.. issues and just assumed if no
config can be obtained nothing should be done..
> -
> - if remove_from_cache {
> +
> + // first check: check if last task finished
> + if !last_task {
> + return;
> + }
> +
> + let (section_config, _gen) = match datastore_section_config_cached(false) {
> + Ok(v) => v,
> + Err(err) => {
> + log::error!(
> + "failed to load datastore config in Drop for {} - {err}",
> + self.name()
> + );
> + return;
> + }
> + };
> +
> + let datastore_cfg: DataStoreConfig =
> + match section_config.lookup("datastore", self.name()) {
> + Ok(cfg) => cfg,
> + Err(err) => {
> + log::error!(
> + "failed to look up datastore '{}' in Drop - {err}",
> + self.name()
> + );
> + return;
here we now have fancy error logging ;) which can be fine, but if we go
from silently ignoring errors to logging them at error level that should
be mentioned to make it clear that it is intentional.
besides that, the second error here means that the datastore was removed
from the config in the meantime.. in which case we should probably
remove it from the map as well, if it is still there, even though we can't
check the maintenance mode..
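something along these lines (untested sketch):

Err(err) => {
    log::error!(
        "failed to look up datastore '{}' in Drop - {err}",
        self.name()
    );
    // gone from the config -> make sure it is gone from the map too
    DATASTORE_MAP.lock().unwrap().remove(self.name());
    return;
}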
> + }
> + };
> +
> + // second check: check maintenance mode mandate
what is a "maintenance mode mandate"? ;)
keeping it simple, why not just
// check if maintenance mode requires closing FDs
> + if datastore_cfg
> + .get_maintenance_mode()
> + .is_some_and(|m| m.clear_from_cache())
> + {
> DATASTORE_MAP.lock().unwrap().remove(self.name());
> }
> }
> @@ -307,12 +332,12 @@ impl DatastoreThreadSettings {
> /// - If the cached generation matches the current generation, the
> /// cached config is returned.
> /// - Otherwise the config is re-read from disk. If `update_cache` is
> -/// `true`, the new config and current generation are stored in the
> +/// `true`, the new config and bumped generation are stored in the
> /// cache. Callers that set `update_cache = true` must hold the
> /// datastore config lock to avoid racing with concurrent config
> /// changes.
> /// - If `update_cache` is `false`, the freshly read config is returned
> -/// but the cache is left unchanged.
> +/// but the cache and generation are left unchanged.
> ///
> /// If `ConfigVersionCache` is not available, the config is always read
> /// from disk and `None` is returned as the generation.
> @@ -333,14 +358,23 @@ fn datastore_section_config_cached(
does this part here make any sense in this patch?
we don't check the generation in the Drop handler anyway, so it will get
the latest cached version, no matter what?
we'd only end up in this part of the code via lookup_datastore, and only
if:
- the previous cached entry and the current one have a different
generation -> no need to bump again, the cache is already invalidated
- there is no previous cached entry -> nothing to invalidate
I think this part should move to the next patch..
> let (config_raw, _digest) = pbs_config::datastore::config()?;
> let config = Arc::new(config_raw);
>
> + let mut effective_gen = current_gen;
> if update_cache {
> + // Bump the generation. This ensures that Drop
> + // handlers will detect that a newer config exists
> + // and will not rely on a stale cached entry for
> + // maintenance mandate.
> + let prev_gen = version_cache.increase_datastore_generation();
> + effective_gen = prev_gen + 1;
> +
> + // Persist
> *config_cache = Some(DatastoreConfigCache {
> config: config.clone(),
> - last_generation: current_gen,
> + last_generation: effective_gen,
> });
> }
>
> - Ok((config, Some(current_gen)))
> + Ok((config, Some(effective_gen)))
> } else {
> // Fallback path, no config version cache: read datastore.cfg and return None as generation
> *config_cache = None;
> --
> 2.47.3
>
>
>
* Re: [pbs-devel] [PATCH proxmox-backup v5 2/4] partial fix #6049: datastore: impl ConfigVersionCache fast path for lookups
2025-11-24 17:04 ` [pbs-devel] [PATCH proxmox-backup v5 2/4] partial fix #6049: datastore: impl ConfigVersionCache fast path for lookups Samuel Rufinatscha
@ 2025-11-26 15:15 ` Fabian Grünbichler
2025-11-26 17:21 ` Samuel Rufinatscha
0 siblings, 1 reply; 15+ messages in thread
From: Fabian Grünbichler @ 2025-11-26 15:15 UTC (permalink / raw)
To: Proxmox Backup Server development discussion
On November 24, 2025 6:04 pm, Samuel Rufinatscha wrote:
> Repeated /status requests caused lookup_datastore() to re-read and
> parse datastore.cfg on every call. The issue was mentioned in report
> #6049 [1]. cargo-flamegraph [2] confirmed that the hot path is
> dominated by pbs_config::datastore::config() (config parsing).
>
> This patch implements caching of the global datastore.cfg using the
> generation numbers from the shared config version cache. It caches the
> datastore.cfg along with the generation number and, when a subsequent
> lookup sees the same generation, it reuses the cached config without
> re-reading it from disk. If the generation differs
> (or the cache is unavailable), the config is re-read from disk.
> If `update_cache = true`, the new config and current generation are
> persisted in the cache. In this case, callers must hold the datastore
> config lock to avoid racing with concurrent config changes.
> If `update_cache` is `false` and generation did not match, the freshly
> read config is returned but the cache is left unchanged. If
> `ConfigVersionCache` is not available, the config is always read from
> disk and `None` is returned as generation.
>
> Behavioral notes
>
> - The generation is bumped via the existing save_config() path, so
> API-driven config changes are detected immediately.
> - Manual edits to datastore.cfg are not detected; this is covered in a
> dedicated patch in this series.
> - DataStore::drop still performs a config read on the common path;
> also covered in a dedicated patch in this series.
>
> Links
>
> [1] Bugzilla: https://bugzilla.proxmox.com/show_bug.cgi?id=6049
> [2] cargo-flamegraph: https://github.com/flamegraph-rs/flamegraph
>
> Fixes: #6049
> Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
style nits below, but otherwise
Reviewed-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
> ---
> Changes:
>
> From v1 → v2, thanks @Fabian
> - Moved the ConfigVersionCache changes into its own patch.
> - Introduced the global static DATASTORE_CONFIG_CACHE to store the
> fully parsed datastore.cfg instead, along with its generation number.
> Introduced DatastoreConfigCache struct to hold both.
> - Removed and replaced the CachedDatastoreConfigTag field of
> DataStoreImpl with a generation number field only (Option<usize>)
> to validate DataStoreImpl reuse.
> - Added DataStore::datastore_section_config_cached() helper function
> to encapsulate the caching logic and simplify reuse.
> - Modified DataStore::lookup_datastore() to use the new helper.
>
> From v2 → v3
> No changes
>
> From v3 → v4, thanks @Fabian
> - Restructured the version cache checks in
> datastore_section_config_cached(), to simplify the logic.
> - Added update_cache parameter to datastore_section_config_cached() to
> control cache updates.
>
> From v4 → v5
> - Rebased only, no changes
>
> pbs-datastore/Cargo.toml | 1 +
> pbs-datastore/src/datastore.rs | 138 +++++++++++++++++++++++++--------
> 2 files changed, 105 insertions(+), 34 deletions(-)
this could be
2 files changed, 80 insertions(+), 17 deletions(-)
see below. this might sound nit-picky, but keeping diffs concise makes
reviewing a lot easier because of the higher signal to noise ratio..
>
> diff --git a/pbs-datastore/Cargo.toml b/pbs-datastore/Cargo.toml
> index 8ce930a9..42f49a7b 100644
> --- a/pbs-datastore/Cargo.toml
> +++ b/pbs-datastore/Cargo.toml
> @@ -40,6 +40,7 @@ proxmox-io.workspace = true
> proxmox-lang.workspace=true
> proxmox-s3-client = { workspace = true, features = [ "impl" ] }
> proxmox-schema = { workspace = true, features = [ "api-macro" ] }
> +proxmox-section-config.workspace = true
> proxmox-serde = { workspace = true, features = [ "serde_json" ] }
> proxmox-sys.workspace = true
> proxmox-systemd.workspace = true
> diff --git a/pbs-datastore/src/datastore.rs b/pbs-datastore/src/datastore.rs
> index 36550ff6..c9cb5d65 100644
> --- a/pbs-datastore/src/datastore.rs
> +++ b/pbs-datastore/src/datastore.rs
> @@ -34,7 +34,8 @@ use pbs_api_types::{
> MaintenanceType, Operation, UPID,
> };
> use pbs_config::s3::S3_CFG_TYPE_ID;
> -use pbs_config::BackupLockGuard;
> +use pbs_config::{BackupLockGuard, ConfigVersionCache};
> +use proxmox_section_config::SectionConfigData;
>
> use crate::backup_info::{
> BackupDir, BackupGroup, BackupInfo, OLD_LOCKING, PROTECTED_MARKER_FILENAME,
> @@ -48,6 +49,17 @@ use crate::s3::S3_CONTENT_PREFIX;
> use crate::task_tracking::{self, update_active_operations};
> use crate::{DataBlob, LocalDatastoreLruCache};
>
> +// Cache for fully parsed datastore.cfg
> +struct DatastoreConfigCache {
> + // Parsed datastore.cfg file
> + config: Arc<SectionConfigData>,
> + // Generation number from ConfigVersionCache
> + last_generation: usize,
> +}
> +
> +static DATASTORE_CONFIG_CACHE: LazyLock<Mutex<Option<DatastoreConfigCache>>> =
> + LazyLock::new(|| Mutex::new(None));
> +
> static DATASTORE_MAP: LazyLock<Mutex<HashMap<String, Arc<DataStoreImpl>>>> =
> LazyLock::new(|| Mutex::new(HashMap::new()));
>
> @@ -149,11 +161,13 @@ pub struct DataStoreImpl {
> last_gc_status: Mutex<GarbageCollectionStatus>,
> verify_new: bool,
> chunk_order: ChunkOrder,
> - last_digest: Option<[u8; 32]>,
> sync_level: DatastoreFSyncLevel,
> backend_config: DatastoreBackendConfig,
> lru_store_caching: Option<LocalDatastoreLruCache>,
> thread_settings: DatastoreThreadSettings,
> + /// Datastore generation number from `ConfigVersionCache` at creation time, used to
datastore.cfg cache generation number at lookup time, used to invalidate
this cached `DataStoreImpl`
creation time could also refer to the datastore creation time..
> + /// validate reuse of this cached `DataStoreImpl`.
> + config_generation: Option<usize>,
> }
>
> impl DataStoreImpl {
> @@ -166,11 +180,11 @@ impl DataStoreImpl {
> last_gc_status: Mutex::new(GarbageCollectionStatus::default()),
> verify_new: false,
> chunk_order: Default::default(),
> - last_digest: None,
> sync_level: Default::default(),
> backend_config: Default::default(),
> lru_store_caching: None,
> thread_settings: Default::default(),
> + config_generation: None,
> })
> }
> }
> @@ -286,6 +300,55 @@ impl DatastoreThreadSettings {
so this new helper is already +49 lines, which means it's the bulk of
the change if the noise below is removed..
> }
> }
>
> +/// Returns the parsed datastore config (`datastore.cfg`) and its
> +/// generation.
> +///
> +/// Uses `ConfigVersionCache` to detect stale entries:
> +/// - If the cached generation matches the current generation, the
> +/// cached config is returned.
> +/// - Otherwise the config is re-read from disk. If `update_cache` is
> +/// `true`, the new config and current generation are stored in the
> +/// cache. Callers that set `update_cache = true` must hold the
> +/// datastore config lock to avoid racing with concurrent config
> +/// changes.
> +/// - If `update_cache` is `false`, the freshly read config is returned
> +/// but the cache is left unchanged.
> +///
> +/// If `ConfigVersionCache` is not available, the config is always read
> +/// from disk and `None` is returned as the generation.
> +fn datastore_section_config_cached(
> + update_cache: bool,
> +) -> Result<(Arc<SectionConfigData>, Option<usize>), Error> {
> + let mut config_cache = DATASTORE_CONFIG_CACHE.lock().unwrap();
> +
> + if let Ok(version_cache) = ConfigVersionCache::new() {
> + let current_gen = version_cache.datastore_generation();
> + if let Some(cached) = config_cache.as_ref() {
> + // Fast path: re-use cached datastore.cfg
> + if cached.last_generation == current_gen {
> + return Ok((cached.config.clone(), Some(cached.last_generation)));
> + }
> + }
> + // Slow path: re-read datastore.cfg
> + let (config_raw, _digest) = pbs_config::datastore::config()?;
> + let config = Arc::new(config_raw);
> +
> + if update_cache {
> + *config_cache = Some(DatastoreConfigCache {
> + config: config.clone(),
> + last_generation: current_gen,
> + });
> + }
> +
> + Ok((config, Some(current_gen)))
> + } else {
> + // Fallback path, no config version cache: read datastore.cfg and return None as generation
> + *config_cache = None;
> + let (config_raw, _digest) = pbs_config::datastore::config()?;
> + Ok((Arc::new(config_raw), None))
> + }
> +}
> +
> impl DataStore {
> // This one just panics on everything
> #[doc(hidden)]
> @@ -363,56 +426,63 @@ impl DataStore {
> name: &str,
> operation: Option<Operation>,
> ) -> Result<Arc<DataStore>, Error> {
the changes here contain a lot of churn that is not needed for the
actual change that is done - e.g., there's lots of variables being
needlessly renamed..
> - // Avoid TOCTOU between checking maintenance mode and updating active operation counter, as
> - // we use it to decide whether it is okay to delete the datastore.
> + // Avoid TOCTOU between checking maintenance mode and updating active operations.
> let _config_lock = pbs_config::datastore::lock_config()?;
>
> - // we could use the ConfigVersionCache's generation for staleness detection, but we load
> - // the config anyway -> just use digest, additional benefit: manual changes get detected
> - let (config, digest) = pbs_config::datastore::config()?;
> - let config: DataStoreConfig = config.lookup("datastore", name)?;
> + // Get the current datastore.cfg generation number and cached config
> + let (section_config, gen_num) = datastore_section_config_cached(true)?;
> +
> + let datastore_cfg: DataStoreConfig = section_config.lookup("datastore", name)?;
renaming this variable here already causes a lot of noise - if you
really want to do that, do it up-front or at the end as a cleanup commit
(depending on how sure you are the change is agreeable - if it's almost
certain it is accepted, put it up front, then it can get applied
already. if not, put it at the end -> then it can be skipped without
affecting the other patches ;))
but going from config to datastore_cfg doesn't make the code any clearer
- the latter can still refer to the whole config (datastore.cfg) or the
individual datastore's section within.. since we have no further business
with the whole section config here, I think keeping this as `config` is
fine..
> + let maintenance_mode = datastore_cfg.get_maintenance_mode();
> + let mount_status = get_datastore_mount_status(&datastore_cfg);
and things extracted into variables here despite being used only once
(this also doesn't change during the rest of the series)
>
> - if let Some(maintenance_mode) = config.get_maintenance_mode() {
> - if let Err(error) = maintenance_mode.check(operation) {
> + if let Some(mm) = &maintenance_mode {
binding name changes
> + if let Err(error) = mm.check(operation.clone()) {
new clone() is introduced but not needed
> bail!("datastore '{name}' is unavailable: {error}");
> }
> }
>
> - if get_datastore_mount_status(&config) == Some(false) {
> - let mut datastore_cache = DATASTORE_MAP.lock().unwrap();
> - datastore_cache.remove(&config.name);
> - bail!("datastore '{}' is not mounted", config.name);
> + let mut datastore_cache = DATASTORE_MAP.lock().unwrap();
moving this is fine!
> +
> + if mount_status == Some(false) {
> + datastore_cache.remove(&datastore_cfg.name);
> + bail!("datastore '{}' is not mounted", datastore_cfg.name);
> }
>
> - let mut datastore_cache = DATASTORE_MAP.lock().unwrap();
> - let entry = datastore_cache.get(name);
this is changed to doing it twice and cloning..
> -
> - // reuse chunk store so that we keep using the same process locker instance!
> - let chunk_store = if let Some(datastore) = &entry {
structure changed here and binding renamed, even though the old one works as-is
> - let last_digest = datastore.last_digest.as_ref();
> - if let Some(true) = last_digest.map(|last_digest| last_digest == &digest) {
> - if let Some(operation) = operation {
> - update_active_operations(name, operation, 1)?;
> + // Re-use DataStoreImpl
> + if let Some(existing) = datastore_cache.get(name).cloned() {
> + if let (Some(last_generation), Some(gen_num)) = (existing.config_generation, gen_num) {
> + if last_generation == gen_num {
these two ifs can be collapsed into
if datastore.config_generation == gen_num && gen_num.is_some() {
since we only want to reuse the entry if the current gen num is the same
as the last one.
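
e.g., as a quick self-contained sketch of why the collapsed form is
equivalent (hypothetical helper, plain Option<usize> values):

    fn generation_matches(cached: Option<usize>, current: Option<usize>) -> bool {
        // Option<usize> implements PartialEq, so `==` is only true when both
        // sides are Some with equal inner values (or both are None); the
        // extra is_some() rules out the None == None case, i.e. no version
        // cache being available.
        cached == current && current.is_some()
    }

    fn main() {
        assert!(generation_matches(Some(3), Some(3)));
        assert!(!generation_matches(Some(2), Some(3)));
        assert!(!generation_matches(None, None)); // no cache -> never reuse
    }
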
> + if let Some(op) = operation {
binding needlessly renamed
> + update_active_operations(name, op, 1)?;
> + }
> +
> + return Ok(Arc::new(Self {
> + inner: existing,
> + operation,
> + }));
> }
> - return Ok(Arc::new(Self {
> - inner: Arc::clone(datastore),
> - operation,
> - }));
> }
> - Arc::clone(&datastore.chunk_store)
> + }
> +
> + // (Re)build DataStoreImpl
> +
> + // Reuse chunk store so that we keep using the same process locker instance!
> + let chunk_store = if let Some(existing) = datastore_cache.get(name) {
> + Arc::clone(&existing.chunk_store)
this one is not needed at all, can just be combined with the previous if
as before
> } else {
> let tuning: DatastoreTuning = serde_json::from_value(
> DatastoreTuning::API_SCHEMA
> - .parse_property_string(config.tuning.as_deref().unwrap_or(""))?,
> + .parse_property_string(datastore_cfg.tuning.as_deref().unwrap_or(""))?,
> )?;
> Arc::new(ChunkStore::open(
> name,
> - config.absolute_path(),
> + datastore_cfg.absolute_path(),
> tuning.sync_level.unwrap_or_default(),
> )?)
> };
>
> - let datastore = DataStore::with_store_and_config(chunk_store, config, Some(digest))?;
> + let datastore = DataStore::with_store_and_config(chunk_store, datastore_cfg, gen_num)?;
>
> let datastore = Arc::new(datastore);
> datastore_cache.insert(name.to_string(), datastore.clone());
> @@ -514,7 +584,7 @@ impl DataStore {
> fn with_store_and_config(
> chunk_store: Arc<ChunkStore>,
> config: DataStoreConfig,
> - last_digest: Option<[u8; 32]>,
> + generation: Option<usize>,
> ) -> Result<DataStoreImpl, Error> {
> let mut gc_status_path = chunk_store.base_path();
> gc_status_path.push(".gc-status");
> @@ -579,11 +649,11 @@ impl DataStore {
> last_gc_status: Mutex::new(gc_status),
> verify_new: config.verify_new.unwrap_or(false),
> chunk_order: tuning.chunk_order.unwrap_or_default(),
> - last_digest,
> sync_level: tuning.sync_level.unwrap_or_default(),
> backend_config,
> lru_store_caching,
> thread_settings,
> + config_generation: generation,
> })
> }
>
> --
> 2.47.3
>
>
>
> _______________________________________________
> pbs-devel mailing list
> pbs-devel@lists.proxmox.com
> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
>
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
* Re: [pbs-devel] [PATCH proxmox-backup v5 1/4] partial fix #6049: config: enable config version cache for datastore
2025-11-24 17:04 ` [pbs-devel] [PATCH proxmox-backup v5 1/4] partial fix #6049: config: enable config version cache for datastore Samuel Rufinatscha
@ 2025-11-26 15:15 ` Fabian Grünbichler
0 siblings, 0 replies; 15+ messages in thread
From: Fabian Grünbichler @ 2025-11-26 15:15 UTC (permalink / raw)
To: Proxmox Backup Server development discussion
nit for the subject: this doesn't yet partially fix anything..
On November 24, 2025 6:04 pm, Samuel Rufinatscha wrote:
> Repeated /status requests caused lookup_datastore() to re-read and
> parse datastore.cfg on every call. The issue was mentioned in report
> #6049 [1]. cargo-flamegraph [2] confirmed that the hot path is
> dominated by pbs_config::datastore::config() (config parsing).
>
> To solve the issue, this patch prepares the config version cache,
> so that datastore config caching can be built on top of it.
> This patch specifically:
> (1) implements increment function in order to invalidate generations
> (2) removes obsolete comments
>
> Links
>
> [1] Bugzilla: https://bugzilla.proxmox.com/show_bug.cgi?id=6049
> [2] cargo-flamegraph: https://github.com/flamegraph-rs/flamegraph
>
> Fixes: #6049
> Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
Reviewed-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
> ---
> Changes:
>
> From v1 → v2 (original introduction), thanks @Fabian
> - Split the ConfigVersionCache changes out of the large datastore patch
> into their own config-only patch
> * removed the obsolete // FIXME comment on datastore_generation
> * added ConfigVersionCache::datastore_generation() as getter
>
> From v2 → v3
> No changes
>
> From v3 → v4
> No changes
>
> From v4 → v5
> - Rebased only, no changes
>
> pbs-config/src/config_version_cache.rs | 10 ++++++++--
> 1 file changed, 8 insertions(+), 2 deletions(-)
>
> diff --git a/pbs-config/src/config_version_cache.rs b/pbs-config/src/config_version_cache.rs
> index e8fb994f..b875f7e0 100644
> --- a/pbs-config/src/config_version_cache.rs
> +++ b/pbs-config/src/config_version_cache.rs
> @@ -26,7 +26,6 @@ struct ConfigVersionCacheDataInner {
> // Traffic control (traffic-control.cfg) generation/version.
> traffic_control_generation: AtomicUsize,
> // datastore (datastore.cfg) generation/version
> - // FIXME: remove with PBS 3.0
> datastore_generation: AtomicUsize,
> // Add further atomics here
> }
> @@ -145,8 +144,15 @@ impl ConfigVersionCache {
> .fetch_add(1, Ordering::AcqRel);
> }
>
> + /// Returns the datastore generation number.
> + pub fn datastore_generation(&self) -> usize {
> + self.shmem
> + .data()
> + .datastore_generation
> + .load(Ordering::Acquire)
> + }
> +
> /// Increase the datastore generation number.
> - // FIXME: remove with PBS 3.0 or make actually useful again in datastore lookup
> pub fn increase_datastore_generation(&self) -> usize {
> self.shmem
> .data()
> --
> 2.47.3
>
>
>
> _______________________________________________
> pbs-devel mailing list
> pbs-devel@lists.proxmox.com
> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
>
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
* Re: [pbs-devel] [PATCH proxmox-backup v5 0/4] datastore: remove config reload on hot path
2025-11-24 17:04 [pbs-devel] [PATCH proxmox-backup v5 0/4] datastore: remove config reload on hot path Samuel Rufinatscha
` (3 preceding siblings ...)
2025-11-24 17:04 ` [pbs-devel] [PATCH proxmox-backup v5 4/4] partial fix #6049: datastore: add TTL fallback to catch manual config edits Samuel Rufinatscha
@ 2025-11-26 15:16 ` Fabian Grünbichler
2025-11-26 16:10 ` Samuel Rufinatscha
4 siblings, 1 reply; 15+ messages in thread
From: Fabian Grünbichler @ 2025-11-26 15:16 UTC (permalink / raw)
To: Proxmox Backup Server development discussion
On November 24, 2025 6:04 pm, Samuel Rufinatscha wrote:
> Hi,
>
> this series reduces CPU time in datastore lookups by avoiding repeated
> datastore.cfg reads/parses in both `lookup_datastore()` and
> `DataStore::Drop`. It also adds a TTL so manual config edits are
> noticed without reintroducing hashing on every request.
>
> While investigating #6049 [1], cargo-flamegraph [2] showed hotspots
> during repeated `/status` calls in `lookup_datastore()` and in `Drop`,
> dominated by `pbs_config::datastore::config()` (config parse).
>
> The parsing cost itself should eventually be investigated in a future
> effort. Furthermore, cargo-flamegraph showed that when using a
> token-based auth method to access the API, a significant amount of time
> is spent in validation on every request [3].
>
> ## Approach
>
> [PATCH 1/4] Support datastore generation in ConfigVersionCache
>
> [PATCH 2/4] Fast path for datastore lookups
> Cache the parsed datastore.cfg keyed by the shared datastore
> generation. lookup_datastore() reuses both the cached config and an
> existing DataStoreImpl when the generation matches, and falls back
> to the old slow path otherwise. The caching logic is implemented
> using the datastore_section_config_cached(update_cache: bool) helper.
>
> [PATCH 3/4] Fast path for Drop
> Make DataStore::Drop use the datastore_section_config_cached()
> helper to avoid re-reading/parsing datastore.cfg on every Drop.
> Bump generation not only on API config saves, but also on slow-path
> lookups (if update_cache is true), so that Drop handlers eventually
> see newer configs.
>
> [PATCH 4/4] TTL to catch manual edits
> Add a TTL to the cached config and bump the datastore generation iff
> the digest changed but generation stays the same. This catches manual
> edits to datastore.cfg without reintroducing hashing or config
> parsing on every request.
semantics wise this looks mostly good to me now, sent a few style
remarks for the individual patches. let's wait for feedback from the
reporter, and then wrap this up hopefully :)
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
* Re: [pbs-devel] [PATCH proxmox-backup v5 0/4] datastore: remove config reload on hot path
2025-11-26 15:16 ` [pbs-devel] [PATCH proxmox-backup v5 0/4] datastore: remove config reload on hot path Fabian Grünbichler
@ 2025-11-26 16:10 ` Samuel Rufinatscha
0 siblings, 0 replies; 15+ messages in thread
From: Samuel Rufinatscha @ 2025-11-26 16:10 UTC (permalink / raw)
To: Proxmox Backup Server development discussion, Fabian Grünbichler
On 11/26/25 4:16 PM, Fabian Grünbichler wrote:
> On November 24, 2025 6:04 pm, Samuel Rufinatscha wrote:
>> Hi,
>>
>> this series reduces CPU time in datastore lookups by avoiding repeated
>> datastore.cfg reads/parses in both `lookup_datastore()` and
>> `DataStore::Drop`. It also adds a TTL so manual config edits are
>> noticed without reintroducing hashing on every request.
>>
>> While investigating #6049 [1], cargo-flamegraph [2] showed hotspots
>> during repeated `/status` calls in `lookup_datastore()` and in `Drop`,
>> dominated by `pbs_config::datastore::config()` (config parse).
>>
>> The parsing cost itself should eventually be investigated in a future
>> effort. Furthermore, cargo-flamegraph showed that when using a
>> token-based auth method to access the API, a significant amount of time
>> is spent in validation on every request [3].
>>
>> ## Approach
>>
>> [PATCH 1/4] Support datastore generation in ConfigVersionCache
>>
>> [PATCH 2/4] Fast path for datastore lookups
>> Cache the parsed datastore.cfg keyed by the shared datastore
>> generation. lookup_datastore() reuses both the cached config and an
>> existing DataStoreImpl when the generation matches, and falls back
>> to the old slow path otherwise. The caching logic is implemented
>> using the datastore_section_config_cached(update_cache: bool) helper.
>>
>> [PATCH 3/4] Fast path for Drop
>> Make DataStore::Drop use the datastore_section_config_cached()
>> helper to avoid re-reading/parsing datastore.cfg on every Drop.
>> Bump generation not only on API config saves, but also on slow-path
>> lookups (if update_cache is true), so that Drop handlers eventually
>> see newer configs.
>>
>> [PATCH 4/4] TTL to catch manual edits
>> Add a TTL to the cached config and bump the datastore generation iff
>> the digest changed but generation stays the same. This catches manual
>> edits to datastore.cfg without reintroducing hashing or config
>> parsing on every request.
>
> semantics wise this looks mostly good to me now, sent a few style
> remarks for the individual patches. let's wait for feedback from the
> reporter, and then wrap this up hopefully :)
Thanks for the great review, Fabian! I agree, will make sure to integrate
the style remarks and avoid more of the diff noise :)
Thanks!
>
>
> _______________________________________________
> pbs-devel mailing list
> pbs-devel@lists.proxmox.com
> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
>
>
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
* Re: [pbs-devel] [PATCH proxmox-backup v5 2/4] partial fix #6049: datastore: impl ConfigVersionCache fast path for lookups
2025-11-26 15:15 ` Fabian Grünbichler
@ 2025-11-26 17:21 ` Samuel Rufinatscha
0 siblings, 0 replies; 15+ messages in thread
From: Samuel Rufinatscha @ 2025-11-26 17:21 UTC (permalink / raw)
To: Proxmox Backup Server development discussion, Fabian Grünbichler
On 11/26/25 4:15 PM, Fabian Grünbichler wrote:
> On November 24, 2025 6:04 pm, Samuel Rufinatscha wrote:
>> Repeated /status requests caused lookup_datastore() to re-read and
>> parse datastore.cfg on every call. The issue was mentioned in report
>> #6049 [1]. cargo-flamegraph [2] confirmed that the hot path is
>> dominated by pbs_config::datastore::config() (config parsing).
>>
>> This patch implements caching of the global datastore.cfg using the
>> generation numbers from the shared config version cache. It caches the
>> datastore.cfg along with the generation number and, when a subsequent
>> lookup sees the same generation, it reuses the cached config without
>> re-reading it from disk. If the generation differs
>> (or the cache is unavailable), the config is re-read from disk.
>> If `update_cache = true`, the new config and current generation are
>> persisted in the cache. In this case, callers must hold the datastore
>> config lock to avoid racing with concurrent config changes.
>> If `update_cache` is `false` and generation did not match, the freshly
>> read config is returned but the cache is left unchanged. If
>> `ConfigVersionCache` is not available, the config is always read from
>> disk and `None` is returned as generation.
>>
>> Behavioral notes
>>
>> - The generation is bumped via the existing save_config() path, so
>> API-driven config changes are detected immediately.
>> - Manual edits to datastore.cfg are not detected; this is covered in a
>> dedicated patch in this series.
>> - DataStore::drop still performs a config read on the common path;
>> also covered in a dedicated patch in this series.
>>
>> Links
>>
>> [1] Bugzilla: https://bugzilla.proxmox.com/show_bug.cgi?id=6049
>> [2] cargo-flamegraph: https://github.com/flamegraph-rs/flamegraph
>>
>> Fixes: #6049
>> Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
>
> style nits below, but otherwise
>
> Reviewed-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
>
>> ---
>> Changes:
>>
>> From v1 → v2, thanks @Fabian
>> - Moved the ConfigVersionCache changes into its own patch.
>> - Introduced the global static DATASTORE_CONFIG_CACHE to store the
>> fully parsed datastore.cfg instead, along with its generation number.
>> Introduced DatastoreConfigCache struct to hold both.
>> - Removed and replaced the CachedDatastoreConfigTag field of
>> DataStoreImpl with a generation number field only (Option<usize>)
>> to validate DataStoreImpl reuse.
>> - Added DataStore::datastore_section_config_cached() helper function
>> to encapsulate the caching logic and simplify reuse.
>> - Modified DataStore::lookup_datastore() to use the new helper.
>>
>> From v2 → v3
>> No changes
>>
>> From v3 → v4, thanks @Fabian
>> - Restructured the version cache checks in
>> datastore_section_config_cached(), to simplify the logic.
>> - Added update_cache parameter to datastore_section_config_cached() to
>> control cache updates.
>>
>> From v4 → v5
>> - Rebased only, no changes
>>
>> pbs-datastore/Cargo.toml | 1 +
>> pbs-datastore/src/datastore.rs | 138 +++++++++++++++++++++++++--------
>> 2 files changed, 105 insertions(+), 34 deletions(-)
>
> this could be
>
> 2 files changed, 80 insertions(+), 17 deletions(-)
>
> see below. this might sound nit-picky, but keeping diffs concise makes
> reviewing a lot easier because of the higher signal to noise ratio..
>
>>
>> diff --git a/pbs-datastore/Cargo.toml b/pbs-datastore/Cargo.toml
>> index 8ce930a9..42f49a7b 100644
>> --- a/pbs-datastore/Cargo.toml
>> +++ b/pbs-datastore/Cargo.toml
>> @@ -40,6 +40,7 @@ proxmox-io.workspace = true
>> proxmox-lang.workspace=true
>> proxmox-s3-client = { workspace = true, features = [ "impl" ] }
>> proxmox-schema = { workspace = true, features = [ "api-macro" ] }
>> +proxmox-section-config.workspace = true
>> proxmox-serde = { workspace = true, features = [ "serde_json" ] }
>> proxmox-sys.workspace = true
>> proxmox-systemd.workspace = true
>> diff --git a/pbs-datastore/src/datastore.rs b/pbs-datastore/src/datastore.rs
>> index 36550ff6..c9cb5d65 100644
>> --- a/pbs-datastore/src/datastore.rs
>> +++ b/pbs-datastore/src/datastore.rs
>> @@ -34,7 +34,8 @@ use pbs_api_types::{
>> MaintenanceType, Operation, UPID,
>> };
>> use pbs_config::s3::S3_CFG_TYPE_ID;
>> -use pbs_config::BackupLockGuard;
>> +use pbs_config::{BackupLockGuard, ConfigVersionCache};
>> +use proxmox_section_config::SectionConfigData;
>>
>> use crate::backup_info::{
>> BackupDir, BackupGroup, BackupInfo, OLD_LOCKING, PROTECTED_MARKER_FILENAME,
>> @@ -48,6 +49,17 @@ use crate::s3::S3_CONTENT_PREFIX;
>> use crate::task_tracking::{self, update_active_operations};
>> use crate::{DataBlob, LocalDatastoreLruCache};
>>
>> +// Cache for fully parsed datastore.cfg
>> +struct DatastoreConfigCache {
>> + // Parsed datastore.cfg file
>> + config: Arc<SectionConfigData>,
>> + // Generation number from ConfigVersionCache
>> + last_generation: usize,
>> +}
>> +
>> +static DATASTORE_CONFIG_CACHE: LazyLock<Mutex<Option<DatastoreConfigCache>>> =
>> + LazyLock::new(|| Mutex::new(None));
>> +
>> static DATASTORE_MAP: LazyLock<Mutex<HashMap<String, Arc<DataStoreImpl>>>> =
>> LazyLock::new(|| Mutex::new(HashMap::new()));
>>
>> @@ -149,11 +161,13 @@ pub struct DataStoreImpl {
>> last_gc_status: Mutex<GarbageCollectionStatus>,
>> verify_new: bool,
>> chunk_order: ChunkOrder,
>> - last_digest: Option<[u8; 32]>,
>> sync_level: DatastoreFSyncLevel,
>> backend_config: DatastoreBackendConfig,
>> lru_store_caching: Option<LocalDatastoreLruCache>,
>> thread_settings: DatastoreThreadSettings,
>> + /// Datastore generation number from `ConfigVersionCache` at creation time, used to
>
> datastore.cfg cache generation number at lookup time, used to invalidate
> this cached `DataStoreImpl`
>
> creation time could also refer to the datastore creation time..
>
Good point, agree! Will apply your suggestion, I like it more :)
>> + /// validate reuse of this cached `DataStoreImpl`.
>> + config_generation: Option<usize>,
>> }
>>
>> impl DataStoreImpl {
>> @@ -166,11 +180,11 @@ impl DataStoreImpl {
>> last_gc_status: Mutex::new(GarbageCollectionStatus::default()),
>> verify_new: false,
>> chunk_order: Default::default(),
>> - last_digest: None,
>> sync_level: Default::default(),
>> backend_config: Default::default(),
>> lru_store_caching: None,
>> thread_settings: Default::default(),
>> + config_generation: None,
>> })
>> }
>> }
>> @@ -286,6 +300,55 @@ impl DatastoreThreadSettings {
>
> so this new helper is already +49 lines, which means it's the bulk of
> the change if the noise below is removed..
>
>> }
>> }
>>
>> +/// Returns the parsed datastore config (`datastore.cfg`) and its
>> +/// generation.
>> +///
>> +/// Uses `ConfigVersionCache` to detect stale entries:
>> +/// - If the cached generation matches the current generation, the
>> +/// cached config is returned.
>> +/// - Otherwise the config is re-read from disk. If `update_cache` is
>> +/// `true`, the new config and current generation are stored in the
>> +/// cache. Callers that set `update_cache = true` must hold the
>> +/// datastore config lock to avoid racing with concurrent config
>> +/// changes.
>> +/// - If `update_cache` is `false`, the freshly read config is returned
>> +/// but the cache is left unchanged.
>> +///
>> +/// If `ConfigVersionCache` is not available, the config is always read
>> +/// from disk and `None` is returned as the generation.
>> +fn datastore_section_config_cached(
>> + update_cache: bool,
>> +) -> Result<(Arc<SectionConfigData>, Option<usize>), Error> {
>> + let mut config_cache = DATASTORE_CONFIG_CACHE.lock().unwrap();
>> +
>> + if let Ok(version_cache) = ConfigVersionCache::new() {
>> + let current_gen = version_cache.datastore_generation();
>> + if let Some(cached) = config_cache.as_ref() {
>> + // Fast path: re-use cached datastore.cfg
>> + if cached.last_generation == current_gen {
>> + return Ok((cached.config.clone(), Some(cached.last_generation)));
>> + }
>> + }
>> + // Slow path: re-read datastore.cfg
>> + let (config_raw, _digest) = pbs_config::datastore::config()?;
>> + let config = Arc::new(config_raw);
>> +
>> + if update_cache {
>> + *config_cache = Some(DatastoreConfigCache {
>> + config: config.clone(),
>> + last_generation: current_gen,
>> + });
>> + }
>> +
>> + Ok((config, Some(current_gen)))
>> + } else {
>> + // Fallback path, no config version cache: read datastore.cfg and return None as generation
>> + *config_cache = None;
>> + let (config_raw, _digest) = pbs_config::datastore::config()?;
>> + Ok((Arc::new(config_raw), None))
>> + }
>> +}
>> +
>> impl DataStore {
>> // This one just panics on everything
>> #[doc(hidden)]
>> @@ -363,56 +426,63 @@ impl DataStore {
>> name: &str,
>> operation: Option<Operation>,
>> ) -> Result<Arc<DataStore>, Error> {
>
> the changes here contain a lot of churn that is not needed for the
> actual change that is done - e.g., there's lots of variables being
> needlessly renamed..
>
>> - // Avoid TOCTOU between checking maintenance mode and updating active operation counter, as
>> - // we use it to decide whether it is okay to delete the datastore.
>> + // Avoid TOCTOU between checking maintenance mode and updating active operations.
>> let _config_lock = pbs_config::datastore::lock_config()?;
>>
>> - // we could use the ConfigVersionCache's generation for staleness detection, but we load
>> - // the config anyway -> just use digest, additional benefit: manual changes get detected
>> - let (config, digest) = pbs_config::datastore::config()?;
>> - let config: DataStoreConfig = config.lookup("datastore", name)?;
>> + // Get the current datastore.cfg generation number and cached config
>> + let (section_config, gen_num) = datastore_section_config_cached(true)?;
>> +
>> + let datastore_cfg: DataStoreConfig = section_config.lookup("datastore", name)?;
>
> renaming this variable here already causes a lot of noise - if you
> really want to do that, do it up-front or at the end as a cleanup commit
> (depending on how sure you are the change is agreeable - if it's almost
> certain it is accepted, put it up front, then it can get applied
> already. if not, put it at the end -> then it can be skipped without
> affecting the other patches ;))
>
> but going from config to datastore_cfg doesn't make the code any clearer
> - the latter can still refer to the whole config (datastore.cfg) or the
> individual datastore's section within.. since we have no further business
> with the whole section config here, I think keeping this as `config` is
> fine..
>
Agree, will change back!
As you mentioned, the idea of the rename was to better differentiate
between the global datastore.cfg and the individual datastore config.
As we have 2 configs here in the same block, I tried to avoid relying on
"config". As you said, datastore_cfg does not make it much better, and
could be misinterpreted. Ultimately, I aligned directly with the type
names (datastore_cfg for DataStoreConfig).
I see your point and agree:

> since we have no further business
> with the whole section config here, I think keeping this as `config` is
> fine..

This is the best solution here.
>> + let maintenance_mode = datastore_cfg.get_maintenance_mode();
>> + let mount_status = get_datastore_mount_status(&datastore_cfg);
>
> and things extracted into variables here despite being used only once
> (this also doesn't change during the rest of the series)
>
Good point, will remove!
>>
>> - if let Some(maintenance_mode) = config.get_maintenance_mode() {
>> - if let Err(error) = maintenance_mode.check(operation) {
>> + if let Some(mm) = &maintenance_mode {
>
> binding name changes
>
>> + if let Err(error) = mm.check(operation.clone()) {
>
> new clone() is introduced but not needed
Nice catch!
>
>> bail!("datastore '{name}' is unavailable: {error}");
>> }
>> }
>>
>> - if get_datastore_mount_status(&config) == Some(false) {
>> - let mut datastore_cache = DATASTORE_MAP.lock().unwrap();
>> - datastore_cache.remove(&config.name);
>> - bail!("datastore '{}' is not mounted", config.name);
>> + let mut datastore_cache = DATASTORE_MAP.lock().unwrap();
>
> moving this is fine!
>
Helps to avoid the duplicate checks :)
>> +
>> + if mount_status == Some(false) {
>> + datastore_cache.remove(&datastore_cfg.name);
>> + bail!("datastore '{}' is not mounted", datastore_cfg.name);
>> }
>>
>> - let mut datastore_cache = DATASTORE_MAP.lock().unwrap();
>> - let entry = datastore_cache.get(name);
>
> this is changed to doing it twice and cloning..
>
Refactoring this!
>> -
>> - // reuse chunk store so that we keep using the same process locker instance!
>> - let chunk_store = if let Some(datastore) = &entry {
>
> structure changed here and binding renamed, even though the old one works as-is
>
Agree, will revert!
>> - let last_digest = datastore.last_digest.as_ref();
>> - if let Some(true) = last_digest.map(|last_digest| last_digest == &digest) {
>> - if let Some(operation) = operation {
>> - update_active_operations(name, operation, 1)?;
>> + // Re-use DataStoreImpl
>> + if let Some(existing) = datastore_cache.get(name).cloned() {
>> + if let (Some(last_generation), Some(gen_num)) = (existing.config_generation, gen_num) {
>> + if last_generation == gen_num {
>
> these two ifs can be collapsed into
>
> if datastore.config_generation == gen_num && gen_num.is_some() {
>
> since we only want to reuse the entry if the current gen num is the same
> as the last one.
>
Will collapse the two ifs, good catch!
>> + if let Some(op) = operation {
>
> binding needlessly renamed
>
Agree, will revert!
>> + update_active_operations(name, op, 1)?;
>> + }
>> +
>> + return Ok(Arc::new(Self {
>> + inner: existing,
>> + operation,
>> + }));
>> }
>> - return Ok(Arc::new(Self {
>> - inner: Arc::clone(datastore),
>> - operation,
>> - }));
>> }
>> - Arc::clone(&datastore.chunk_store)
>> + }
>> +
>> + // (Re)build DataStoreImpl
>> +
>> + // Reuse chunk store so that we keep using the same process locker instance!
>> + let chunk_store = if let Some(existing) = datastore_cache.get(name) {
>> + Arc::clone(&existing.chunk_store)
>
> this one is not needed at all, can just be combined with the previous if
> as before
>
Will refactor!
>> } else {
>> let tuning: DatastoreTuning = serde_json::from_value(
>> DatastoreTuning::API_SCHEMA
>> - .parse_property_string(config.tuning.as_deref().unwrap_or(""))?,
>> + .parse_property_string(datastore_cfg.tuning.as_deref().unwrap_or(""))?,
>> )?;
>> Arc::new(ChunkStore::open(
>> name,
>> - config.absolute_path(),
>> + datastore_cfg.absolute_path(),
>> tuning.sync_level.unwrap_or_default(),
>> )?)
>> };
>>
>> - let datastore = DataStore::with_store_and_config(chunk_store, config, Some(digest))?;
>> + let datastore = DataStore::with_store_and_config(chunk_store, datastore_cfg, gen_num)?;
>>
>> let datastore = Arc::new(datastore);
>> datastore_cache.insert(name.to_string(), datastore.clone());
>> @@ -514,7 +584,7 @@ impl DataStore {
>> fn with_store_and_config(
>> chunk_store: Arc<ChunkStore>,
>> config: DataStoreConfig,
>> - last_digest: Option<[u8; 32]>,
>> + generation: Option<usize>,
>> ) -> Result<DataStoreImpl, Error> {
>> let mut gc_status_path = chunk_store.base_path();
>> gc_status_path.push(".gc-status");
>> @@ -579,11 +649,11 @@ impl DataStore {
>> last_gc_status: Mutex::new(gc_status),
>> verify_new: config.verify_new.unwrap_or(false),
>> chunk_order: tuning.chunk_order.unwrap_or_default(),
>> - last_digest,
>> sync_level: tuning.sync_level.unwrap_or_default(),
>> backend_config,
>> lru_store_caching,
>> thread_settings,
>> + config_generation: generation,
>> })
>> }
>>
>> --
>> 2.47.3
>>
>>
>>
>> _______________________________________________
>> pbs-devel mailing list
>> pbs-devel@lists.proxmox.com
>> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
>>
>
>
> _______________________________________________
> pbs-devel mailing list
> pbs-devel@lists.proxmox.com
> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
* Re: [pbs-devel] [PATCH proxmox-backup v5 3/4] partial fix #6049: datastore: use config fast-path in Drop
2025-11-26 15:15 ` Fabian Grünbichler
@ 2025-11-28 9:03 ` Samuel Rufinatscha
2025-11-28 10:46 ` Fabian Grünbichler
0 siblings, 1 reply; 15+ messages in thread
From: Samuel Rufinatscha @ 2025-11-28 9:03 UTC (permalink / raw)
To: Proxmox Backup Server development discussion, Fabian Grünbichler
On 11/26/25 4:15 PM, Fabian Grünbichler wrote:
> On November 24, 2025 6:04 pm, Samuel Rufinatscha wrote:
>> The Drop impl of DataStore re-read datastore.cfg to decide whether
>> the entry should be evicted from the in-process cache (based on
>> maintenance mode’s clear_from_cache). During the investigation of
>> issue #6049 [1], a flamegraph [2] showed that the config reload in Drop
>> accounted for a measurable share of CPU time under load.
>>
>> This patch wires the datastore config fast path to the Drop
>> impl to eventually avoid an expensive config reload from disk to capture
>> the maintenance mandate. Also, to ensure the Drop handlers will detect
>> that a newer config exists / to mitigate usage of an eventually stale
>> cached entry, generation will not only be bumped on config save, but also
>> on re-read of the config file (slow path), if `update_cache = true`.
>>
>> Links
>>
>> [1] Bugzilla: https://bugzilla.proxmox.com/show_bug.cgi?id=6049
>> [2] cargo-flamegraph: https://github.com/flamegraph-rs/flamegraph
>>
>> Fixes: #6049
>> Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
>> ---
>> Changes:
>>
>> From v1 → v2
>> - Replace caching logic with the datastore_section_config_cached()
>> helper.
>>
>> From v2 → v3
>> No changes
>>
>> From v3 → v4, thanks @Fabian
>> - Pass datastore_section_config_cached(false) in Drop to avoid
>> concurrent cache updates.
>>
>> From v4 → v5
>> - Rebased only, no changes
>>
>> pbs-datastore/src/datastore.rs | 60 ++++++++++++++++++++++++++--------
>> 1 file changed, 47 insertions(+), 13 deletions(-)
>>
>> diff --git a/pbs-datastore/src/datastore.rs b/pbs-datastore/src/datastore.rs
>> index c9cb5d65..7638a899 100644
>> --- a/pbs-datastore/src/datastore.rs
>> +++ b/pbs-datastore/src/datastore.rs
>> @@ -225,15 +225,40 @@ impl Drop for DataStore {
>> // remove datastore from cache iff
>> // - last task finished, and
>> // - datastore is in a maintenance mode that mandates it
>> - let remove_from_cache = last_task
>> - && pbs_config::datastore::config()
>> - .and_then(|(s, _)| s.lookup::<DataStoreConfig>("datastore", self.name()))
>> - .is_ok_and(|c| {
>> - c.get_maintenance_mode()
>> - .is_some_and(|m| m.clear_from_cache())
>> - });
>
> old code here ignored parsing/locking/.. issues and just assumed if no
> config can be obtained nothing should be done..
>
>> -
>> - if remove_from_cache {
>> +
>> + // first check: check if last task finished
>> + if !last_task {
>> + return;
>> + }
>> +
>> + let (section_config, _gen) = match datastore_section_config_cached(false) {
>> + Ok(v) => v,
>> + Err(err) => {
>> + log::error!(
>> + "failed to load datastore config in Drop for {} - {err}",
>> + self.name()
>> + );
>> + return;
>> + }
>> + };
>> +
>> + let datastore_cfg: DataStoreConfig =
>> + match section_config.lookup("datastore", self.name()) {
>> + Ok(cfg) => cfg,
>> + Err(err) => {
>> + log::error!(
>> + "failed to look up datastore '{}' in Drop - {err}",
>> + self.name()
>> + );
>> + return;
>
> here we now have fancy error logging ;) which can be fine, but if we go
> from silently ignoring errors to logging them at error level that should
> be mentioned to make it clear that it is intentional.
>
Makes sense, will mention that change in the commit message.
> besides that, the second error here means that the datastore was removed
> from the config in the meantime.. in which case we should probably
> remove it from the map as well, if it is still there, even though we can't
> check the maintenance mode..
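
Good point! Something along these lines for that error arm then (rough
sketch, same assumed shape as the patch above, untested):

    Err(err) => {
        // the datastore is gone from datastore.cfg, so drop any cached
        // entry as well, even though we cannot check its maintenance
        // mode anymore
        log::error!(
            "failed to look up datastore '{}' in Drop - {err}",
            self.name()
        );
        DATASTORE_MAP.lock().unwrap().remove(self.name());
        return;
    }
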
>
>> + }
>> + };
>> +
>> + // second check: check maintenance mode mandate
>
> what is a "maintenance mode mandate"? ;)
>
> keeping it simple, why not just
>
> // check if maintenance mode requires closing FDs
>
I see, will rephrase this, thanks!
>> + if datastore_cfg
>> + .get_maintenance_mode()
>> + .is_some_and(|m| m.clear_from_cache())
>> + {
>> DATASTORE_MAP.lock().unwrap().remove(self.name());
>> }
>> }
>> @@ -307,12 +332,12 @@ impl DatastoreThreadSettings {
>> /// - If the cached generation matches the current generation, the
>> /// cached config is returned.
>> /// - Otherwise the config is re-read from disk. If `update_cache` is
>> -/// `true`, the new config and current generation are stored in the
>> +/// `true`, the new config and bumped generation are stored in the
>> /// cache. Callers that set `update_cache = true` must hold the
>> /// datastore config lock to avoid racing with concurrent config
>> /// changes.
>> /// - If `update_cache` is `false`, the freshly read config is returned
>> -/// but the cache is left unchanged.
>> +/// but the cache and generation are left unchanged.
>> ///
>> /// If `ConfigVersionCache` is not available, the config is always read
>> /// from disk and `None` is returned as the generation.
>> @@ -333,14 +358,23 @@ fn datastore_section_config_cached(
>
> does this part here make any sense in this patch?
>
> we don't check the generation in the Drop handler anyway, so it will get
> the latest cached version, no matter what?
>
we don't check the generation in the Drop handler, but the Drop handler
depends on this to potentially get a fresher cached version?
> we'd only end up in this part of the code via lookup_datastore, and only
> if:
> - the previous cached entry and the current one have a different
> generation -> no need to bump again, the cache is already invalidated
> - there is no previous cached entry -> nothing to invalidate
>
> I think this part should move to the next patch..
Shouldn't it rather be in PATCH 2 then, instead of part of the TTL feature?
Also, I would adjust the comment below then, so that it doesn't
necessarily just benefit the Drop handler that calls
datastore_section_config_cached(false), but in general any future use
of datastore_section_config_cached(false)?
>
>> let (config_raw, _digest) = pbs_config::datastore::config()?;
>> let config = Arc::new(config_raw);
>>
>> + let mut effective_gen = current_gen;
>> if update_cache {
>> + // Bump the generation. This ensures that Drop
>> + // handlers will detect that a newer config exists
>> + // and will not rely on a stale cached entry for
>> + // maintenance mandate.
>> + let prev_gen = version_cache.increase_datastore_generation();
>> + effective_gen = prev_gen + 1;
>> +
>> + // Persist
>> *config_cache = Some(DatastoreConfigCache {
>> config: config.clone(),
>> - last_generation: current_gen,
>> + last_generation: effective_gen,
>> });
>> }
>>
>> - Ok((config, Some(current_gen)))
>> + Ok((config, Some(effective_gen)))
>> } else {
>> // Fallback path, no config version cache: read datastore.cfg and return None as generation
>> *config_cache = None;
>> --
>> 2.47.3
>>
>>
>>
>> _______________________________________________
>> pbs-devel mailing list
>> pbs-devel@lists.proxmox.com
>> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
>>
>
>
> _______________________________________________
> pbs-devel mailing list
> pbs-devel@lists.proxmox.com
> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
* Re: [pbs-devel] [PATCH proxmox-backup v5 3/4] partial fix #6049: datastore: use config fast-path in Drop
2025-11-28 9:03 ` Samuel Rufinatscha
@ 2025-11-28 10:46 ` Fabian Grünbichler
2025-11-28 11:10 ` Samuel Rufinatscha
0 siblings, 1 reply; 15+ messages in thread
From: Fabian Grünbichler @ 2025-11-28 10:46 UTC (permalink / raw)
To: Proxmox Backup Server development discussion, Samuel Rufinatscha
On November 28, 2025 10:03 am, Samuel Rufinatscha wrote:
> On 11/26/25 4:15 PM, Fabian Grünbichler wrote:
>> On November 24, 2025 6:04 pm, Samuel Rufinatscha wrote:
>>> @@ -307,12 +332,12 @@ impl DatastoreThreadSettings {
>>> /// - If the cached generation matches the current generation, the
>>> /// cached config is returned.
>>> /// - Otherwise the config is re-read from disk. If `update_cache` is
>>> -/// `true`, the new config and current generation are stored in the
>>> +/// `true`, the new config and bumped generation are stored in the
>>> /// cache. Callers that set `update_cache = true` must hold the
>>> /// datastore config lock to avoid racing with concurrent config
>>> /// changes.
>>> /// - If `update_cache` is `false`, the freshly read config is returned
>>> -/// but the cache is left unchanged.
>>> +/// but the cache and generation are left unchanged.
>>> ///
>>> /// If `ConfigVersionCache` is not available, the config is always read
>>> /// from disk and `None` is returned as the generation.
>>> @@ -333,14 +358,23 @@ fn datastore_section_config_cached(
>>
>> does this part here make any sense in this patch?
>>
>> we don't check the generation in the Drop handler anyway, so it will get
>> the latest cached version, no matter what?
>>
>
> we don't check the generation in the Drop handler, but the Drop handler
> depends on this to potentially get a fresher cached version?
datastore_section_config_cached will only reload the config if it was
changed over our API and the generation in the cached entry no longer
matches the current generation number. in that case there is no
need to bump the generation number, since that was already done by
whichever call saved the config and caused the generation number
mismatch in the first place - this already invalidated all previously
cached entries..
bumping the generation number only makes sense once we introduce the
force-reload mechanism in patch #4.
>
>> we'd only end up in this part of the code via lookup_datastore, and only
>> if:
>> - the previous cached entry and the current one have a different
>> generation -> no need to bump again, the cache is already invalidated
>> - there is no previous cached entry -> nothing to invalidate
>>
>> I think this part should move to the next patch..
>
> Shouldn't it rather be in PATCH 2 then, instead of part of the TTL feature?
> Also, I would adjust the comment below then, so that it doesn't
> necessarily just benefit the Drop handler that calls
> datastore_section_config_cached(false), but in general any future use
> of datastore_section_config_cached(false)?
it has no benefit at this point in the series (or after/at patch #2),
see above. bumping only makes sense if we detect that the generation
number is not valid, which we can only do via the digest check from
patch #4. and the digest check only makes sense with the TTL
force-reload, because otherwise we can never end up in the code path
where we read the config without the cache already being invalid anyway.
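
roughly, the patch #4 ordering then looks like this (sketch with assumed
cache fields `last_update`/`digest` and a made-up CONFIG_CACHE_TTL
constant, not the actual series code):

    // slow/TTL path inside datastore_section_config_cached:
    if cached.last_update.elapsed() > CONFIG_CACHE_TTL {
        let (config_raw, digest) = pbs_config::datastore::config()?;
        // generation unchanged, but the digest differs -> manual edit of
        // datastore.cfg behind our back, so bump the generation to also
        // invalidate all cached DataStoreImpl entries
        if digest != cached.digest && current_gen == cached.last_generation {
            version_cache.increase_datastore_generation();
        }
    }
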
>
>>
>>> let (config_raw, _digest) = pbs_config::datastore::config()?;
>>> let config = Arc::new(config_raw);
>>>
>>> + let mut effective_gen = current_gen;
>>> if update_cache {
>>> + // Bump the generation. This ensures that Drop
>>> + // handlers will detect that a newer config exists
>>> + // and will not rely on a stale cached entry for
>>> + // maintenance mandate.
>>> + let prev_gen = version_cache.increase_datastore_generation();
>>> + effective_gen = prev_gen + 1;
>>> +
>>> + // Persist
>>> *config_cache = Some(DatastoreConfigCache {
>>> config: config.clone(),
>>> - last_generation: current_gen,
>>> + last_generation: effective_gen,
>>> });
>>> }
>>>
>>> - Ok((config, Some(current_gen)))
>>> + Ok((config, Some(effective_gen)))
>>> } else {
>>> // Fallback path, no config version cache: read datastore.cfg and return None as generation
>>> *config_cache = None;
>>> --
>>> 2.47.3
>>>
>>>
>>>
>>> _______________________________________________
>>> pbs-devel mailing list
>>> pbs-devel@lists.proxmox.com
>>> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
>>>
>>
>>
>> _______________________________________________
>> pbs-devel mailing list
>> pbs-devel@lists.proxmox.com
>> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
>
>
>
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
* Re: [pbs-devel] [PATCH proxmox-backup v5 3/4] partial fix #6049: datastore: use config fast-path in Drop
2025-11-28 10:46 ` Fabian Grünbichler
@ 2025-11-28 11:10 ` Samuel Rufinatscha
0 siblings, 0 replies; 15+ messages in thread
From: Samuel Rufinatscha @ 2025-11-28 11:10 UTC (permalink / raw)
To: Fabian Grünbichler, Proxmox Backup Server development discussion
On 11/28/25 11:46 AM, Fabian Grünbichler wrote:
> On November 28, 2025 10:03 am, Samuel Rufinatscha wrote:
>> On 11/26/25 4:15 PM, Fabian Grünbichler wrote:
>>> On November 24, 2025 6:04 pm, Samuel Rufinatscha wrote:
>>>> @@ -307,12 +332,12 @@ impl DatastoreThreadSettings {
>>>> /// - If the cached generation matches the current generation, the
>>>> /// cached config is returned.
>>>> /// - Otherwise the config is re-read from disk. If `update_cache` is
>>>> -/// `true`, the new config and current generation are stored in the
>>>> +/// `true`, the new config and bumped generation are stored in the
>>>> /// cache. Callers that set `update_cache = true` must hold the
>>>> /// datastore config lock to avoid racing with concurrent config
>>>> /// changes.
>>>> /// - If `update_cache` is `false`, the freshly read config is returned
>>>> -/// but the cache is left unchanged.
>>>> +/// but the cache and generation are left unchanged.
>>>> ///
>>>> /// If `ConfigVersionCache` is not available, the config is always read
>>>> /// from disk and `None` is returned as the generation.
>>>> @@ -333,14 +358,23 @@ fn datastore_section_config_cached(
>>>
>>> does this part here make any sense in this patch?
>>>
>>> we don't check the generation in the Drop handler anyway, so it will get
>>> the latest cached version, no matter what?
>>>
>>
>> we don't check the generation in the Drop handler, but the drop handler
>> depends on this to potentially get a most fresh cached version?
>
> datastore_section_config_cached will only reload the config if it was
> changed over our API and the generation in the cached entry no longer
> matches the current generation number. in that case there is no
> need to bump the generation number, since that was already done by
> whichever call saved the config and caused the generation number
> mismatch in the first place - this already invalidated all previously
> cached entries..
>
> bumping the generation number only makes sense once we introduce the
> force-reload mechanism in patch #4.
>
>>
>>> we'd only end up in this part of the code via lookup_datastore, and only
>>> if:
>>> - the previous cached entry and the current one have a different
>>> generation -> no need to bump again, the cache is already invalidated
>>> - there is no previous cached entry -> nothing to invalidate
>>>
>>> I think this part should move to the next patch..
>>
>> Shouldn't it be rather in PATCH 2 then, instead part of the TTL feature
>> Also I would adjust the comment below then, so that it doesn't
>> necessarily just benefit the drop handler that calls
>> datastore_section_config_cached(false) but would in general future uses
>> of datastore_section_config_cached(false)?
>
> it has no benefit at this point in the series (or after/at patch #2),
> see above. bumping only makes sense if we detect that the generation
> number is not valid, which we can only do via the digest check from
> patch #4. and the digest check only makes sense with the TTL
> force-reload, because otherwise we can never end up in the code path
> where we read the config without the cache already being invalid anyway.
>
Makes sense, I see. Thanks for clarifying, Fabian!
Will add it to patch 4.
>>
>>>
>>>> let (config_raw, _digest) = pbs_config::datastore::config()?;
>>>> let config = Arc::new(config_raw);
>>>>
>>>> + let mut effective_gen = current_gen;
>>>> if update_cache {
>>>> + // Bump the generation. This ensures that Drop
>>>> + // handlers will detect that a newer config exists
>>>> + // and will not rely on a stale cached entry for
>>>> + // maintenance mandate.
>>>> + let prev_gen = version_cache.increase_datastore_generation();
>>>> + effective_gen = prev_gen + 1;
>>>> +
>>>> + // Persist
>>>> *config_cache = Some(DatastoreConfigCache {
>>>> config: config.clone(),
>>>> - last_generation: current_gen,
>>>> + last_generation: effective_gen,
>>>> });
>>>> }
>>>>
>>>> - Ok((config, Some(current_gen)))
>>>> + Ok((config, Some(effective_gen)))
>>>> } else {
>>>> // Fallback path, no config version cache: read datastore.cfg and return None as generation
>>>> *config_cache = None;
>>>> --
>>>> 2.47.3
>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> pbs-devel mailing list
>>>> pbs-devel@lists.proxmox.com
>>>> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
>>>>
>>>
>>>
>>> _______________________________________________
>>> pbs-devel mailing list
>>> pbs-devel@lists.proxmox.com
>>> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
>>
>>
>>
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
end of thread
Thread overview: 15+ messages
2025-11-24 17:04 [pbs-devel] [PATCH proxmox-backup v5 0/4] datastore: remove config reload on hot path Samuel Rufinatscha
2025-11-24 17:04 ` [pbs-devel] [PATCH proxmox-backup v5 1/4] partial fix #6049: config: enable config version cache for datastore Samuel Rufinatscha
2025-11-26 15:15 ` Fabian Grünbichler
2025-11-24 17:04 ` [pbs-devel] [PATCH proxmox-backup v5 2/4] partial fix #6049: datastore: impl ConfigVersionCache fast path for lookups Samuel Rufinatscha
2025-11-26 15:15 ` Fabian Grünbichler
2025-11-26 17:21 ` Samuel Rufinatscha
2025-11-24 17:04 ` [pbs-devel] [PATCH proxmox-backup v5 3/4] partial fix #6049: datastore: use config fast-path in Drop Samuel Rufinatscha
2025-11-26 15:15 ` Fabian Grünbichler
2025-11-28 9:03 ` Samuel Rufinatscha
2025-11-28 10:46 ` Fabian Grünbichler
2025-11-28 11:10 ` Samuel Rufinatscha
2025-11-24 17:04 ` [pbs-devel] [PATCH proxmox-backup v5 4/4] partial fix #6049: datastore: add TTL fallback to catch manual config edits Samuel Rufinatscha
2025-11-26 15:15 ` Fabian Grünbichler
2025-11-26 15:16 ` [pbs-devel] [PATCH proxmox-backup v5 0/4] datastore: remove config reload on hot path Fabian Grünbichler
2025-11-26 16:10 ` Samuel Rufinatscha