From: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
To: pbs-devel@lists.proxmox.com
Subject: [pbs-devel] [PATCH proxmox-backup 0/3] datastore: remove config reload on hot path
Date: Tue, 11 Nov 2025 13:29:38 +0100 [thread overview]
Message-ID: <20251111122941.110412-1-s.rufinatscha@proxmox.com> (raw)
Hi,
this series reduces CPU time in datastore lookups by avoiding repeated
datastore.cfg reads/parses in both `lookup_datastore()` and
`DataStore::Drop`. It also adds a TTL so manual config edits are
noticed without reintroducing hashing on every request.
While investigating #6049 [1], cargo-flamegraph [2] showed hotspots during
repeated `/status` calls in `lookup_datastore()` and in `Drop`,
dominated by `pbs_config::datastore::config()` (config parse).
The parsing cost itself should eventually be investigated in a future
effort. Furthermore, cargo-flamegraph showed that when using a
token-based auth method to access the API, a significant amount of time
is spent in validation on every request, likely related to bcrypt.
Also this should be eventually revisited in a future effort.
## Approach
[PATCH 1/3] Fast path for datastore lookups
Use the shared-memory `ConfigVersionCache` generation for `datastore.cfg`.
Tag each cached `DataStoreImpl` with the last seen generation; when it
matches, reuse the cached instance. Fall back to the existing slow path
on mismatch or when the cache is unavailable.
[PATCH 2/3] Fast path for `Drop`
Reuse the maintenance mode eviction decision captured at lookup time,
removing the config reload from `Drop`.
[PATCH 3/3] TTL to catch manual edits
If a cached entry is older than `DATASTORE_CONFIG_CACHE_TTL_SECS`
(default 60s), the next lookup refreshes it via the slow path. This
detects manual file edits without hashing on every request.
## Results
End-to-end `/status?verbose=0` (1000 stores, 5 req/store, parallel=16):
Metric Baseline [1/3] [2/3]
------------------------------------------------
Total time 13s 11s 10s
Throughput (all) 384.62 454.55 500.00
Cold RPS (round #1) 76.92 90.91 100.00
Warm RPS (2..N) 307.69 363.64 400.00
Patch 1 improves overall throughput by ~18% (−15% total time). Patch 2
adds ~10% on top. Patch 3 is a robustness feature; a 0.1 s probe shows
periodic latency spikes at TTL expiry and flat latencies otherwise.
## Reproduction steps
VM: 4 vCPU, ~8 GiB RAM, VirtIO-SCSI; disks:
- scsi0 32G (OS)
- scsi1 1000G (datastores)
Install PBS from ISO on the VM.
Set up ZFS on /dev/sdb (adjust if different):
zpool create -f -o ashift=12 pbsbench /dev/sdb
zfs set mountpoint=/pbsbench pbsbench
zfs create pbsbench/pbs-bench
Raise file-descriptor limit:
sudo systemctl edit proxmox-backup-proxy.service
Add the following lines:
[Service]
LimitNOFILE=1048576
Reload systemd and restart the proxy:
sudo systemctl daemon-reload
sudo systemctl restart proxmox-backup-proxy.service
Verify the limit:
systemctl show proxmox-backup-proxy.service | grep LimitNOFILE
Create 1000 ZFS-backed datastores (as used in #6049 [1]):
seq -w 001 1000 | xargs -n1 -P1 bash -c '
id=$0
name="ds${id}"
dataset="pbsbench/pbs-bench/${name}"
path="/pbsbench/pbs-bench/${name}"
zfs create -o mountpoint="$path" "$dataset"
proxmox-backup-manager datastore create "$name" "$path" \
--comment "ZFS dataset-based datastore"
'
Build PBS from this series, then run the server under manually
under flamegraph:
systemctl stop proxmox-backup-proxy
cargo flamegraph --release --bin proxmox-backup-proxy
Benchmark script (`bench.sh`) used for the numbers above:
#!/usr/bin/env bash
set -euo pipefail
# --- Config ---------------------------------------------------------------
HOST='https://localhost:8007'
USER='root@pam'
PASS="$(cat passfile)"
DATASTORE_PATH="/pbsbench/pbs-bench"
MAX_STORES=1000 # how many stores to include
PARALLEL=16 # concurrent workers
REPEAT=5 # requests per store (1 cold + REPEAT-1 warm)
PRINT_FIRST=false # true => log first request's HTTP code per store
# --- Helpers --------------------------------------------------------------
fmt_rps () {
local n="$1" t="$2"
awk -v n="$n" -v t="$t" 'BEGIN { if (t > 0) printf("%.2f\n", n/t); else print "0.00" }'
}
# --- Login ---------------------------------------------------------------
auth=$(curl -ks -X POST "$HOST/api2/json/access/ticket" \
-d "username=$USER" -d "password=$PASS")
ticket=$(echo "$auth" | jq -r '.data.ticket')
if [[ -z "${ticket:-}" || "$ticket" == "null" ]]; then
echo "[ERROR] Login failed (no ticket)"
exit 1
fi
# --- Collect stores (deterministic order) --------------------------------
mapfile -t STORES < <(
find "$DATASTORE_PATH" -mindepth 1 -maxdepth 1 -type d -printf '%f\n' \
| sort | head -n "$MAX_STORES"
)
USED_STORES=${#STORES[@]}
if (( USED_STORES == 0 )); then
echo "[ERROR] No datastore dirs under $DATASTORE_PATH"
exit 1
fi
echo "[INFO] Running with stores=$USED_STORES, repeat=$REPEAT, parallel=$PARALLEL"
# --- Temp counters --------------------------------------------------------
SUCCESS_ALL="$(mktemp)"
FAIL_ALL="$(mktemp)"
COLD_OK="$(mktemp)"
WARM_OK="$(mktemp)"
trap 'rm -f "$SUCCESS_ALL" "$FAIL_ALL" "$COLD_OK" "$WARM_OK"' EXIT
export HOST ticket REPEAT SUCCESS_ALL FAIL_ALL COLD_OK WARM_OK PRINT_FIRST
SECONDS=0
# --- Fire requests --------------------------------------------------------
printf "%s\n" "${STORES[@]}" \
| xargs -P"$PARALLEL" -I{} bash -c '
store="$1"
url="$HOST/api2/json/admin/datastore/$store/status?verbose=0"
for ((i=1;i<=REPEAT;i++)); do
code=$(curl -ks -o /dev/null -w "%{http_code}" -b "PBSAuthCookie=$ticket" "$url" || echo 000)
if [[ "$code" == "200" ]]; then
echo 1 >> "$SUCCESS_ALL"
if (( i == 1 )); then
echo 1 >> "$COLD_OK"
else
echo 1 >> "$WARM_OK"
fi
if [[ "$PRINT_FIRST" == "true" && $i -eq 1 ]]; then
ts=$(date +%H:%M:%S)
echo "[$ts] $store #$i HTTP:200"
fi
else
echo 1 >> "$FAIL_ALL"
if [[ "$PRINT_FIRST" == "true" && $i -eq 1 ]]; then
ts=$(date +%H:%M:%S)
echo "[$ts] $store #$i HTTP:$code (FAIL)"
fi
fi
done
' _ {}
# --- Summary --------------------------------------------------------------
elapsed=$SECONDS
ok=$(wc -l < "$SUCCESS_ALL" 2>/dev/null || echo 0)
fail=$(wc -l < "$FAIL_ALL" 2>/dev/null || echo 0)
cold_ok=$(wc -l < "$COLD_OK" 2>/dev/null || echo 0)
warm_ok=$(wc -l < "$WARM_OK" 2>/dev/null || echo 0)
expected=$(( USED_STORES * REPEAT ))
total=$(( ok + fail ))
rps_all=$(fmt_rps "$ok" "$elapsed")
rps_cold=$(fmt_rps "$cold_ok" "$elapsed")
rps_warm=$(fmt_rps "$warm_ok" "$elapsed")
echo "===== Summary ====="
echo "Stores used: $USED_STORES"
echo "Expected requests: $expected"
echo "Executed requests: $total"
echo "OK (HTTP 200): $ok"
echo "Failed: $fail"
printf "Total time: %dm %ds\n" $((elapsed/60)) $((elapsed%60))
echo "Throughput all RPS: $rps_all"
echo "Cold RPS (round #1): $rps_cold"
echo "Warm RPS (#2..N): $rps_warm"
## Maintainer notes
- No dependency bumps, no API changes, no breaking changes in this
series.
## Patch summary
[PATCH 1/3] partial fix #6049: datastore: impl ConfigVersionCache fast path for lookups
[PATCH 2/3] partial fix #6049: datastore: use config fast-path in Drop
[PATCH 3/3] datastore: add TTL fallback to catch manual config edits
Thanks for reviewing!
[1] Bugzilla #6049: https://bugzilla.proxmox.com/show_bug.cgi?id=6049
[2] cargo-flamegraph: https://github.com/flamegraph-rs/flamegraph
Samuel Rufinatscha (3):
partial fix #6049: datastore: impl ConfigVersionCache fast path for
lookups
partial fix #6049: datastore: use config fast-path in Drop
datastore: add TTL fallback to catch manual config edits
pbs-config/src/config_version_cache.rs | 10 ++-
pbs-datastore/src/datastore.rs | 119 ++++++++++++++++++-------
2 files changed, 96 insertions(+), 33 deletions(-)
--
2.47.3
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
next reply other threads:[~2025-11-11 12:29 UTC|newest]
Thread overview: 11+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-11-11 12:29 Samuel Rufinatscha [this message]
2025-11-11 12:29 ` [pbs-devel] [PATCH proxmox-backup 1/3] partial fix #6049: datastore: impl ConfigVersionCache fast path for lookups Samuel Rufinatscha
2025-11-12 13:24 ` Fabian Grünbichler
2025-11-13 12:59 ` Samuel Rufinatscha
2025-11-11 12:29 ` [pbs-devel] [PATCH proxmox-backup 2/3] partial fix #6049: datastore: use config fast-path in Drop Samuel Rufinatscha
2025-11-12 11:24 ` Fabian Grünbichler
2025-11-12 15:20 ` Samuel Rufinatscha
2025-11-11 12:29 ` [pbs-devel] [PATCH proxmox-backup 3/3] datastore: add TTL fallback to catch manual config edits Samuel Rufinatscha
2025-11-12 11:27 ` [pbs-devel] [PATCH proxmox-backup 0/3] datastore: remove config reload on hot path Fabian Grünbichler
2025-11-12 17:27 ` Samuel Rufinatscha
2025-11-14 15:08 ` [pbs-devel] superseded: " Samuel Rufinatscha
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20251111122941.110412-1-s.rufinatscha@proxmox.com \
--to=s.rufinatscha@proxmox.com \
--cc=pbs-devel@lists.proxmox.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.