From: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
To: pbs-devel@lists.proxmox.com
Subject: [pbs-devel] superseded: [PATCH proxmox-backup v4 0/4] datastore: remove config reload on hot path
Date: Mon, 24 Nov 2025 18:06:20 +0100
Message-ID: <57eec0bf-b820-4380-8f29-2de5722a26b2@proxmox.com>
In-Reply-To: <20251124153328.239666-1-s.rufinatscha@proxmox.com>
Superseded by:
https://lore.proxmox.com/pbs-devel/20251124170423.303300-1-s.rufinatscha@proxmox.com/T/#t
On 11/24/25 4:32 PM, Samuel Rufinatscha wrote:
> Hi,
>
> this series reduces CPU time in datastore lookups by avoiding repeated
> datastore.cfg reads/parses in both `lookup_datastore()` and
> `DataStore::Drop`. It also adds a TTL so manual config edits are
> noticed without reintroducing hashing on every request.
>
> While investigating #6049 [1], cargo-flamegraph [2] showed hotspots
> during repeated `/status` calls in `lookup_datastore()` and in `Drop`,
> dominated by `pbs_config::datastore::config()` (config parse).
>
> The parsing cost itself deserves a separate investigation in a
> future effort. Furthermore, cargo-flamegraph showed that when a
> token-based auth method is used to access the API, a significant
> amount of time is spent in validation on every request [3].
>
> ## Approach
>
> [PATCH 1/4] Support datastore generation in ConfigVersionCache
>
> [PATCH 2/4] Fast path for datastore lookups
> Cache the parsed datastore.cfg keyed by the shared datastore
> generation. lookup_datastore() reuses both the cached config and an
> existing DataStoreImpl when the generation matches, and falls back
> to the old slow path otherwise. The caching logic lives in the
> datastore_section_config_cached(update_cache: bool) helper; a rough
> sketch follows below.
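>
> To illustrate the idea, here is a minimal, hypothetical sketch of a
> generation-keyed config cache (not the actual PBS code); `Config`,
> `read_generation()` and `parse_config_from_disk()` are stand-ins for
> the real ConfigVersionCache and pbs_config::datastore::config()
> internals:
>
> use std::collections::HashMap;
> use std::sync::Mutex;
>
> // Stand-in for the parsed datastore.cfg contents.
> type Config = HashMap<String, String>;
>
> struct CachedConfig {
>     generation: u64, // shared datastore generation at parse time
>     config: Config,
> }
>
> static CACHE: Mutex<Option<CachedConfig>> = Mutex::new(None);
>
> fn read_generation() -> u64 {
>     // Stand-in for the shared-memory generation counter.
>     0
> }
>
> fn parse_config_from_disk() -> Config {
>     // Stand-in for the expensive read + parse of datastore.cfg.
>     Config::new()
> }
>
> fn datastore_section_config_cached(update_cache: bool) -> Config {
>     let generation = read_generation();
>     let mut guard = CACHE.lock().unwrap();
>
>     // Fast path: generation unchanged, reuse the cached parse result.
>     if let Some(cached) = guard.as_ref() {
>         if cached.generation == generation {
>             return cached.config.clone();
>         }
>     }
>
>     // Slow path: re-parse and optionally refresh the cache.
>     let config = parse_config_from_disk();
>     if update_cache {
>         *guard = Some(CachedConfig {
>             generation,
>             config: config.clone(),
>         });
>     }
>     config
> }
>
> fn main() {
>     let _config = datastore_section_config_cached(true);
> }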
>
> [PATCH 3/4] Fast path for Drop
> Make DataStore::Drop use the datastore_section_config_cached()
> helper to avoid re-reading/parsing datastore.cfg on every Drop.
> Bump the generation not only on API config saves, but also on
> slow-path lookups (when update_cache is true), so that Drop handlers
> eventually see newer configs.
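>
> Continuing the sketch above, the Drop side could look roughly like
> this; DataStore, its cleanup condition and the update_cache = false
> choice are simplified assumptions for illustration:
>
> struct DataStore {
>     name: String,
> }
>
> impl Drop for DataStore {
>     fn drop(&mut self) {
>         // Previously this re-read and re-parsed datastore.cfg on
>         // every Drop; with the helper it is usually served from the
>         // cache.
>         let config = datastore_section_config_cached(false);
>
>         if !config.contains_key(&self.name) {
>             // The store vanished from the config: this is where the
>             // expensive cleanup / instance eviction would run.
>         }
>     }
> }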
>
> [PATCH 4/4] TTL to catch manual edits
> Add a TTL to the cached config and bump the datastore generation
> only if the digest changed while the generation stayed the same.
> This catches manual edits to datastore.cfg without reintroducing
> hashing or config parsing on every request.
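>
> A rough sketch of the TTL check, continuing the same hypothetical
> cache; the TTL value and the digest helper are made up for
> illustration:
>
> use std::time::{Duration, Instant};
>
> const CONFIG_TTL: Duration = Duration::from_secs(60); // assumed value
>
> fn config_digest() -> [u8; 32] {
>     // Stand-in for hashing the raw datastore.cfg contents.
>     [0u8; 32]
> }
>
> // Suppose CachedConfig also stores `checked: Instant` and the digest
> // of the raw file. Revalidation only happens once the TTL expired:
> fn needs_generation_bump(checked: Instant, cached_digest: [u8; 32]) -> bool {
>     if checked.elapsed() < CONFIG_TTL {
>         return false; // within the TTL, trust the cached entry
>     }
>     // Past the TTL: a changed digest with an unchanged generation
>     // means a manual edit happened, so bump the shared generation.
>     config_digest() != cached_digest
> }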
>
> ## Benchmark results
>
> ### End-to-end
>
> Testing `/status?verbose=0` end-to-end with 1000 stores, 5 req/store
> and parallel=16 before/after the series:
>
> +--------------------------+---------+---------+
> | Metric                   | Before  | After   |
> +--------------------------+---------+---------+
> | Total time               | 12 s    | 9 s     |
> | Throughput (all, req/s)  | 416.67  | 555.56  |
> | Cold RPS (round #1)      | 83.33   | 111.11  |
> | Warm RPS (#2..N)         | 333.33  | 444.44  |
> +--------------------------+---------+---------+
>
> When running under flamegraph [2], TLS appears to consume a
> significant amount of CPU time and blurs the results. Still, the
> series yields ~33% higher overall throughput and ~25% lower
> end-to-end time for this workload.
>
> ### Isolated benchmarks (hyperfine)
>
> In addition to the end-to-end tests, I measured two standalone
> benchmarks with hyperfine, each using a config with 1000 datastores.
> `M` is the number of distinct datastores looked up and
> `N` is the number of lookups per datastore.
>
> Drop-direct variant:
>
> Drops the `DataStore` after every lookup, so the `Drop` path runs on
> every iteration:
>
> use anyhow::Error;
>
> use pbs_api_types::Operation;
> use pbs_datastore::DataStore;
>
> fn main() -> Result<(), Error> {
>     let mut args = std::env::args();
>     args.next(); // skip the program name
>
>     // M: number of distinct datastores to look up (default 1000)
>     let datastores = if let Some(n) = args.next() {
>         n.parse::<usize>()?
>     } else {
>         1000
>     };
>
>     // N: number of lookups per datastore (default 1000)
>     let iterations = if let Some(n) = args.next() {
>         n.parse::<usize>()?
>     } else {
>         1000
>     };
>
>     for d in 1..=datastores {
>         let name = format!("ds{:04}", d);
>
>         for _ in 1..=iterations {
>             // The returned handle is dropped immediately, so the
>             // `Drop` path runs on every iteration.
>             DataStore::lookup_datastore(&name, Some(Operation::Write))?;
>         }
>     }
>
>     Ok(())
> }
>
> +------+------+---------------+--------------+---------+
> |    M |    N | Baseline mean | Patched mean | Speedup |
> +------+------+---------------+--------------+---------+
> |    1 | 1000 |       1.684 s |      35.3 ms |   47.7x |
> |   10 |  100 |       1.689 s |      35.0 ms |   48.3x |
> |  100 |   10 |       1.709 s |      35.8 ms |   47.7x |
> | 1000 |    1 |       1.809 s |      39.0 ms |   46.4x |
> +------+------+---------------+--------------+---------+
>
> Bulk-drop variant:
>
> Keeps the `DataStore` instances alive for all `N` lookups of a given
> datastore and then drops them in bulk, mimicking a task that performs
> many lookups while it is running and only triggers the expensive
> `Drop` logic when the last user exits.
>
> use anyhow::Error;
>
> use pbs_api_types::Operation;
> use pbs_datastore::DataStore;
>
> fn main() -> Result<(), Error> {
>     let mut args = std::env::args();
>     args.next(); // skip the program name
>
>     // M: number of distinct datastores to look up (default 1000)
>     let datastores = if let Some(n) = args.next() {
>         n.parse::<usize>()?
>     } else {
>         1000
>     };
>
>     // N: number of lookups per datastore (default 1000)
>     let iterations = if let Some(n) = args.next() {
>         n.parse::<usize>()?
>     } else {
>         1000
>     };
>
>     for d in 1..=datastores {
>         let name = format!("ds{:04}", d);
>
>         // Keep all handles alive; they are only dropped in bulk when
>         // `stores` goes out of scope at the end of this iteration.
>         let mut stores = Vec::with_capacity(iterations);
>         for _ in 1..=iterations {
>             stores.push(DataStore::lookup_datastore(
>                 &name,
>                 Some(Operation::Write),
>             )?);
>         }
>     }
>
>     Ok(())
> }
>
> +------+------+---------------+--------------+---------+
> |    M |    N | Baseline mean | Patched mean | Speedup |
> +------+------+---------------+--------------+---------+
> |    1 | 1000 |      890.6 ms |      35.5 ms |   25.1x |
> |   10 |  100 |      891.3 ms |      35.1 ms |   25.4x |
> |  100 |   10 |      983.9 ms |      35.6 ms |   27.6x |
> | 1000 |    1 |     1829.0 ms |      45.2 ms |   40.5x |
> +------+------+---------------+--------------+---------+
>
>
> Both variants show that the combination of the cached config lookups
> and the cheaper `Drop` handling reduces the hot-path cost from
> roughly 0.9-1.8 s per run to a few tens of milliseconds in these
> benchmarks.
>
> ## Reproduction steps
>
> VM: 4 vCPU, ~8 GiB RAM, VirtIO-SCSI; disks:
> - scsi0 32G (OS)
> - scsi1 1000G (datastores)
>
> Install PBS from ISO on the VM.
>
> Set up ZFS on /dev/sdb (adjust if different):
>
> zpool create -f -o ashift=12 pbsbench /dev/sdb
> zfs set mountpoint=/pbsbench pbsbench
> zfs create pbsbench/pbs-bench
>
> Raise file-descriptor limit:
>
> sudo systemctl edit proxmox-backup-proxy.service
>
> Add the following lines:
>
> [Service]
> LimitNOFILE=1048576
>
> Reload systemd and restart the proxy:
>
> sudo systemctl daemon-reload
> sudo systemctl restart proxmox-backup-proxy.service
>
> Verify the limit:
>
> systemctl show proxmox-backup-proxy.service | grep LimitNOFILE
>
> Create 1000 ZFS-backed datastores (as used in #6049 [1]):
>
> seq -w 001 1000 | xargs -n1 -P1 bash -c '
> id=$0
> name="ds${id}"
> dataset="pbsbench/pbs-bench/${name}"
> path="/pbsbench/pbs-bench/${name}"
> zfs create -o mountpoint="$path" "$dataset"
> proxmox-backup-manager datastore create "$name" "$path" \
> --comment "ZFS dataset-based datastore"
> '
>
> Build PBS from this series, then run the server manually under
> flamegraph:
>
> systemctl stop proxmox-backup-proxy
> cargo flamegraph --release --bin proxmox-backup-proxy
>
> ## Patch summary
>
> [PATCH 1/4] partial fix #6049: config: enable config version cache for datastore
> [PATCH 2/4] partial fix #6049: datastore: impl ConfigVersionCache fast path for lookups
> [PATCH 3/4] partial fix #6049: datastore: use config fast-path in Drop
> [PATCH 4/4] partial fix #6049: datastore: add TTL fallback to catch manual config edits
>
> ## Maintainer notes
>
> No dependency bumps, no API changes and no breaking changes.
>
> Thanks,
> Samuel
>
> Links
>
> [1] Bugzilla #6049: https://bugzilla.proxmox.com/show_bug.cgi?id=6049
> [2] cargo-flamegraph: https://github.com/flamegraph-rs/flamegraph
> [3] Bugzilla #7017: https://bugzilla.proxmox.com/show_bug.cgi?id=7017
>
> Samuel Rufinatscha (4):
> partial fix #6049: config: enable config version cache for datastore
> partial fix #6049: datastore: impl ConfigVersionCache fast path for
> lookups
> partial fix #6049: datastore: use config fast-path in Drop
> partial fix #6049: datastore: add TTL fallback to catch manual config
> edits
>
> pbs-config/src/config_version_cache.rs | 10 +-
> pbs-datastore/Cargo.toml | 1 +
> pbs-datastore/src/datastore.rs | 215 ++++++++++++++++++++-----
> 3 files changed, 180 insertions(+), 46 deletions(-)
>
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel