From: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
To: pbs-devel@lists.proxmox.com
Subject: [pbs-devel] superseded: [PATCH proxmox-backup v4 0/4] datastore: remove config reload on hot path
Date: Mon, 24 Nov 2025 18:06:20 +0100
Message-ID: <57eec0bf-b820-4380-8f29-2de5722a26b2@proxmox.com>
In-Reply-To: <20251124153328.239666-1-s.rufinatscha@proxmox.com>
This series has been superseded by a newer version:

https://lore.proxmox.com/pbs-devel/20251124170423.303300-1-s.rufinatscha@proxmox.com/T/#t
On 11/24/25 4:32 PM, Samuel Rufinatscha wrote:
> Hi,
>
> this series reduces CPU time in datastore lookups by avoiding repeated
> datastore.cfg reads/parses in both `lookup_datastore()` and
> `DataStore::Drop`. It also adds a TTL so manual config edits are
> noticed without reintroducing hashing on every request.
>
> While investigating #6049 [1], cargo-flamegraph [2] showed hotspots
> during repeated `/status` calls in `lookup_datastore()` and in `Drop`,
> dominated by `pbs_config::datastore::config()` (config parse).
>
> The parsing cost itself should be investigated in a future effort.
> Furthermore, cargo-flamegraph showed that, when a token-based auth
> method is used to access the API, a significant amount of time is
> spent in validation on every request [3].
>
> ## Approach
>
> [PATCH 1/4] Support datastore generation in ConfigVersionCache
>
> [PATCH 2/4] Fast path for datastore lookups
> Cache the parsed datastore.cfg keyed by the shared datastore
> generation. lookup_datastore() reuses both the cached config and an
> existing DataStoreImpl when the generation matches, and falls back
> to the old slow path otherwise. The caching logic is implemented
> in the datastore_section_config_cached(update_cache: bool) helper;
> a sketch of the overall idea follows after this list.
>
> [PATCH 3/4] Fast path for Drop
> Make DataStore::Drop use the datastore_section_config_cached()
> helper to avoid re-reading/parsing datastore.cfg on every Drop.
> Bump the generation not only on API config saves, but also on
> slow-path lookups (if update_cache is true), so that Drop handlers
> eventually see newer configs.
>
> [PATCH 4/4] TTL to catch manual edits
> Add a TTL to the cached config and bump the datastore generation
> if the digest changed while the generation stayed the same. This
> catches manual edits to datastore.cfg without reintroducing hashing
> or config parsing on every request.
>
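> To make the approach concrete, here is a minimal sketch of a
> generation-keyed config cache with a TTL fallback. The helper name
> matches the series; everything else (current_datastore_generation(),
> CACHE, CONFIG_TTL, the exact digest handling) is illustrative and
> not the actual patch code:
>
> use std::sync::{Arc, Mutex, OnceLock};
> use std::time::{Duration, Instant};
>
> use anyhow::Error;
> use proxmox_section_config::SectionConfigData;
>
> const CONFIG_TTL: Duration = Duration::from_secs(60); // assumed value
>
> struct CachedConfig {
>     generation: usize,
>     config: Arc<SectionConfigData>,
>     digest: [u8; 32],
>     last_checked: Instant,
> }
>
> static CACHE: OnceLock<Mutex<Option<CachedConfig>>> = OnceLock::new();
>
> // Stand-in for the shared counter that patch 1/4 adds to
> // ConfigVersionCache; the accessor name is an assumption.
> fn current_datastore_generation() -> Result<usize, Error> {
>     Ok(pbs_config::ConfigVersionCache::new()?.datastore_generation())
> }
>
> fn datastore_section_config_cached(
>     update_cache: bool,
> ) -> Result<Arc<SectionConfigData>, Error> {
>     let generation = current_datastore_generation()?;
>
>     let mut guard = CACHE.get_or_init(|| Mutex::new(None)).lock().unwrap();
>
>     // Fast path: same generation and TTL not yet expired, so reuse
>     // the parsed config without touching datastore.cfg.
>     if let Some(cached) = guard.as_ref() {
>         if cached.generation == generation
>             && cached.last_checked.elapsed() < CONFIG_TTL
>         {
>             return Ok(Arc::clone(&cached.config));
>         }
>     }
>
>     // Slow path: re-read and re-parse datastore.cfg. A changed digest
>     // with an unchanged generation indicates a manual edit; the real
>     // patches bump the generation in that case so other consumers
>     // (e.g. Drop handlers) notice the new config as well.
>     let (config, digest) = pbs_config::datastore::config()?;
>     let config = Arc::new(config);
>
>     if update_cache {
>         *guard = Some(CachedConfig {
>             generation,
>             config: Arc::clone(&config),
>             digest,
>             last_checked: Instant::now(),
>         });
>     }
>
>     Ok(config)
> }
>
> Lookups can then deserialize the entry for a single store from the
> cached SectionConfigData instead of re-parsing the whole file.
>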
> ## Benchmark results
>
> ### End-to-end
>
> Testing `/status?verbose=0` end-to-end with 1000 stores, 5 req/store
> and parallel=16 before/after the series:
>
> Metric                       Before    After
> ---------------------------------------------
> Total time                   12 s      9 s
> Throughput (all) [req/s]     416.67    555.56
> Cold RPS (round #1)          83.33     111.11
> Warm RPS (rounds #2..N)      333.33    444.44
>
> Running under flamegraph [2], TLS appears to consume a significant
> amount of CPU time and blurs the results. Still, this workload shows
> ~33% higher overall throughput and ~25% lower end-to-end time.
>
> ### Isolated benchmarks (hyperfine)
>
> In addition to the end-to-end tests, I measured two standalone
> benchmarks with hyperfine, each using a config with 1000 datastores.
> `M` is the number of distinct datastores looked up and
> `N` is the number of lookups per datastore.
>
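> The benchmarks can be timed with hyperfine along these lines; the
> binary name and the argument order (M datastores, then N lookups
> each) are assumptions matching the example programs below:
>
> hyperfine --warmup 1 'target/release/bench-lookup 10 100'
>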
> Drop-direct variant:
>
> Drops the `DataStore` after every lookup, so the `Drop` path runs on
> every iteration:
>
> use anyhow::Error;
>
> use pbs_api_types::Operation;
> use pbs_datastore::DataStore;
>
> fn main() -> Result<(), Error> {
>     let mut args = std::env::args();
>     args.next(); // skip the program name
>
>     // M: number of distinct datastores (default 1000)
>     let datastores = if let Some(n) = args.next() {
>         n.parse::<usize>()?
>     } else {
>         1000
>     };
>
>     // N: lookups per datastore (default 1000)
>     let iterations = if let Some(n) = args.next() {
>         n.parse::<usize>()?
>     } else {
>         1000
>     };
>
>     for d in 1..=datastores {
>         let name = format!("ds{:04}", d);
>
>         for _ in 1..=iterations {
>             // The returned handle is dropped right away, so the
>             // `Drop` path runs once per lookup.
>             DataStore::lookup_datastore(&name, Some(Operation::Write))?;
>         }
>     }
>
>     Ok(())
> }
>
> +------+------+---------------+--------------+---------+
> | M    | N    | Baseline mean | Patched mean | Speedup |
> +------+------+---------------+--------------+---------+
> | 1    | 1000 | 1.684 s       | 35.3 ms      | 47.7x   |
> | 10   | 100  | 1.689 s       | 35.0 ms      | 48.3x   |
> | 100  | 10   | 1.709 s       | 35.8 ms      | 47.7x   |
> | 1000 | 1    | 1.809 s       | 39.0 ms      | 46.4x   |
> +------+------+---------------+--------------+---------+
>
> Bulk-drop variant:
>
> Keeps the `DataStore` instances alive for all `N` lookups of a
> given datastore and then drops them in bulk, mimicking a task that
> performs many lookups while it is running and only triggers the
> expensive `Drop` logic when the last user exits:
>
> use anyhow::Error;
>
> use pbs_api_types::Operation;
> use pbs_datastore::DataStore;
>
> fn main() -> Result<(), Error> {
> let mut args = std::env::args();
> args.next();
>
> let datastores = if let Some(n) = args.next() {
> n.parse::<usize>()?
> } else {
> 1000
> };
>
> let iterations = if let Some(n) = args.next() {
> n.parse::<usize>()?
> } else {
> 1000
> };
>
> for d in 1..=datastores {
> let name = format!("ds{:04}", d);
>
> let mut stores = Vec::with_capacity(iterations);
> for i in 1..=iterations {
> stores.push(DataStore::lookup_datastore(&name, Some(Operation::Write))?);
> }
> }
>
> Ok(())
> }
>
> +------+------+---------------+--------------+---------+
> | M    | N    | Baseline mean | Patched mean | Speedup |
> +------+------+---------------+--------------+---------+
> | 1    | 1000 | 890.6 ms      | 35.5 ms      | 25.1x   |
> | 10   | 100  | 891.3 ms      | 35.1 ms      | 25.4x   |
> | 100  | 10   | 983.9 ms      | 35.6 ms      | 27.6x   |
> | 1000 | 1    | 1829.0 ms     | 45.2 ms      | 40.5x   |
> +------+------+---------------+--------------+---------+
>
>
> Both variants show that the combination of the cached config lookups
> and the cheaper `Drop` handling reduces the hot-path cost from
> roughly 0.9-1.8 s per run to a few tens of milliseconds in these
> benchmarks.
>
> ## Reproduction steps
>
> VM: 4 vCPU, ~8 GiB RAM, VirtIO-SCSI; disks:
> - scsi0 32G (OS)
> - scsi1 1000G (datastores)
>
> Install PBS from ISO on the VM.
>
> Set up ZFS on /dev/sdb (adjust if different):
>
> zpool create -f -o ashift=12 pbsbench /dev/sdb
> zfs set mountpoint=/pbsbench pbsbench
> zfs create pbsbench/pbs-bench
>
> Raise file-descriptor limit:
>
> sudo systemctl edit proxmox-backup-proxy.service
>
> Add the following lines:
>
> [Service]
> LimitNOFILE=1048576
>
> Reload systemd and restart the proxy:
>
> sudo systemctl daemon-reload
> sudo systemctl restart proxmox-backup-proxy.service
>
> Verify the limit:
>
> systemctl show proxmox-backup-proxy.service | grep LimitNOFILE
>
> Create 1000 ZFS-backed datastores (as used in #6049 [1]):
>
> seq -w 001 1000 | xargs -n1 -P1 bash -c '
> id=$0
> name="ds${id}"
> dataset="pbsbench/pbs-bench/${name}"
> path="/pbsbench/pbs-bench/${name}"
> zfs create -o mountpoint="$path" "$dataset"
> proxmox-backup-manager datastore create "$name" "$path" \
> --comment "ZFS dataset-based datastore"
> '
>
> Build PBS from this series, then manually run the server under
> flamegraph:
>
> systemctl stop proxmox-backup-proxy
> cargo flamegraph --release --bin proxmox-backup-proxy
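>
> The end-to-end numbers above come from hammering the status endpoint;
> a load generator along these lines (host, API token and secret are
> placeholders to adjust) reproduces the 5 rounds over 1000 stores with
> 16 requests in flight:
>
> for round in $(seq 1 5); do
>     seq -w 001 1000 | xargs -P16 -I{} \
>         curl -ks -o /dev/null \
>         -H "Authorization: PBSAPIToken=root@pam!bench:SECRET" \
>         "https://localhost:8007/api2/json/admin/datastore/ds{}/status?verbose=0"
> done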
>
> ## Patch summary
>
> [PATCH 1/4] partial fix #6049: config: enable config version cache for datastore
> [PATCH 2/4] partial fix #6049: datastore: impl ConfigVersionCache fast path for lookups
> [PATCH 3/4] partial fix #6049: datastore: use config fast-path in Drop
> [PATCH 4/4] partial fix #6049: datastore: add TTL fallback to catch manual config edits
>
> ## Maintainer notes
>
> No dependency bumps, no API changes and no breaking changes.
>
> Thanks,
> Samuel
>
> Links
>
> [1] Bugzilla #6049: https://bugzilla.proxmox.com/show_bug.cgi?id=6049
> [2] cargo-flamegraph: https://github.com/flamegraph-rs/flamegraph
> [3] Bugzilla #7017: https://bugzilla.proxmox.com/show_bug.cgi?id=7017
>
> Samuel Rufinatscha (4):
> partial fix #6049: config: enable config version cache for datastore
> partial fix #6049: datastore: impl ConfigVersionCache fast path for
> lookups
> partial fix #6049: datastore: use config fast-path in Drop
> partial fix #6049: datastore: add TTL fallback to catch manual config
> edits
>
> pbs-config/src/config_version_cache.rs | 10 +-
> pbs-datastore/Cargo.toml | 1 +
> pbs-datastore/src/datastore.rs | 215 ++++++++++++++++++++-----
> 3 files changed, 180 insertions(+), 46 deletions(-)
>
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel