From: Samuel Rufinatscha
To: pbs-devel@lists.proxmox.com
Date: Mon, 24 Nov 2025 18:06:20 +0100
Subject: [pbs-devel] superseded: [PATCH proxmox-backup v4 0/4] datastore: remove config reload on hot path
Message-ID: <57eec0bf-b820-4380-8f29-2de5722a26b2@proxmox.com>
In-Reply-To: <20251124153328.239666-1-s.rufinatscha@proxmox.com>

https://lore.proxmox.com/pbs-devel/20251124170423.303300-1-s.rufinatscha@proxmox.com/T/#t

On 11/24/25 4:32 PM, Samuel Rufinatscha wrote:
> Hi,
>
> this series reduces CPU time in datastore lookups by avoiding repeated
> datastore.cfg reads/parses in both `lookup_datastore()` and
> `DataStore::Drop`. It also adds a TTL so manual config edits are
> noticed without reintroducing hashing on every request.
>
> While investigating #6049 [1], cargo-flamegraph [2] showed hotspots
> during repeated `/status` calls in `lookup_datastore()` and in `Drop`,
> dominated by `pbs_config::datastore::config()` (config parse).
>
> The parsing cost itself should eventually be investigated in a future
> effort. Furthermore, cargo-flamegraph showed that when using a
> token-based auth method to access the API, a significant amount of
> time is spent in validation on every request [3].
>
> ## Approach
>
> [PATCH 1/4] Support datastore generation in ConfigVersionCache
>
> [PATCH 2/4] Fast path for datastore lookups
> Cache the parsed datastore.cfg keyed by the shared datastore
> generation. lookup_datastore() reuses both the cached config and an
> existing DataStoreImpl when the generation matches, and falls back
> to the old slow path otherwise. The caching logic is implemented
> using the datastore_section_config_cached(update_cache: bool) helper
> (a rough sketch of the idea follows below).
>
> [PATCH 3/4] Fast path for Drop
> Make DataStore::Drop use the datastore_section_config_cached()
> helper to avoid re-reading/parsing datastore.cfg on every Drop.
> Bump the generation not only on API config saves, but also on
> slow-path lookups (if update_cache is true), so that Drop handlers
> eventually see newer configs.
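>
> To make the fast path of patches 1-3 more concrete, here is a minimal,
> self-contained sketch of the idea. All names in it
> (`shared_generation()`, `load_config_slow()`, `datastore_config_cached()`,
> `CachedConfig`) are illustrative placeholders, not the actual
> pbs-config/pbs-datastore API:
>
>     use std::sync::{Mutex, OnceLock};
>
>     // Stand-in for the parsed datastore.cfg section config.
>     #[derive(Clone)]
>     struct DatastoreConfig {
>         raw: String,
>     }
>
>     // Parsed config plus the shared generation it was read at.
>     #[derive(Clone)]
>     struct CachedConfig {
>         generation: u64,
>         config: DatastoreConfig,
>     }
>
>     static CACHE: OnceLock<Mutex<Option<CachedConfig>>> = OnceLock::new();
>
>     // Stand-in for the ConfigVersionCache generation counter that is
>     // bumped whenever datastore.cfg is saved through the API.
>     fn shared_generation() -> u64 {
>         1
>     }
>
>     // Slow path: read and parse datastore.cfg from disk.
>     fn load_config_slow() -> DatastoreConfig {
>         DatastoreConfig { raw: String::from("...") }
>     }
>
>     // Fast path: reuse the parsed config while the generation matches;
>     // otherwise fall back to the slow path and refresh the cache.
>     fn datastore_config_cached() -> DatastoreConfig {
>         let current = shared_generation();
>         let cache = CACHE.get_or_init(|| Mutex::new(None));
>         let mut guard = cache.lock().unwrap();
>
>         if let Some(cached) = guard.as_ref() {
>             if cached.generation == current {
>                 return cached.config.clone();
>             }
>         }
>
>         let config = load_config_slow();
>         *guard = Some(CachedConfig { generation: current, config: config.clone() });
>         config
>     }
>
>     fn main() {
>         // The first call parses the config; later calls hit the cache
>         // until the shared generation changes (e.g. after a config save).
>         let _first = datastore_config_cached();
>         let _second = datastore_config_cached();
>     }
>
> In the series, the Drop handler (patch 3) goes through the same cached
> helper, so dropping a DataStore no longer re-reads datastore.cfg.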
>
> [PATCH 4/4] TTL to catch manual edits
> Add a TTL to the cached config and bump the datastore generation if
> the digest changed while the generation stayed the same. This catches
> manual edits to datastore.cfg without reintroducing hashing or config
> parsing on every request.
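>
> As a rough illustration of the TTL fallback: the constants and helper
> names below are made up, and the actual TTL value in the patch may
> differ:
>
>     use std::time::{Duration, Instant};
>
>     // Assumed TTL, for illustration only.
>     const CONFIG_TTL: Duration = Duration::from_secs(60);
>
>     // Cached state relevant for the TTL check.
>     struct CachedEntry {
>         generation: u64,
>         digest: [u8; 32],
>         read_at: Instant,
>     }
>
>     // Placeholder for hashing the on-disk datastore.cfg.
>     fn on_disk_digest() -> [u8; 32] {
>         [0u8; 32]
>     }
>
>     // Decide whether the cached config may still be used without
>     // re-reading/re-parsing datastore.cfg.
>     fn needs_refresh(entry: &CachedEntry, shared_generation: u64) -> bool {
>         if entry.generation != shared_generation {
>             // The config was saved through the API since we cached it.
>             return true;
>         }
>         if entry.read_at.elapsed() < CONFIG_TTL {
>             // Within the TTL: trust the cache, no hashing at all.
>             return false;
>         }
>         // TTL expired: hash the file once; a changed digest with an
>         // unchanged generation indicates a manual edit, in which case
>         // the series also bumps the shared generation.
>         on_disk_digest() != entry.digest
>     }
>
>     fn main() {
>         let entry = CachedEntry {
>             generation: 1,
>             digest: [0u8; 32],
>             read_at: Instant::now(),
>         };
>         println!("refresh needed: {}", needs_refresh(&entry, 1));
>     }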
>
> ## Benchmark results
>
> ### End-to-end
>
> Testing `/status?verbose=0` end-to-end with 1000 stores, 5 req/store
> and parallel=16 before/after the series:
>
> Metric                    Before    After
> ------------------------------------------
> Total time                12s       9s
> Throughput (all, req/s)   416.67    555.56
> Cold RPS (round #1)       83.33     111.11
> Warm RPS (#2..N)          333.33    444.44
>
> When running under flamegraph [2], TLS appears to consume a
> significant amount of CPU time and blurs the results. Still, the
> series yields ~33% higher overall throughput and ~25% lower
> end-to-end time for this workload.
>
> ### Isolated benchmarks (hyperfine)
>
> In addition to the end-to-end tests, I measured two standalone
> benchmarks with hyperfine, each using a config with 1000 datastores.
> `M` is the number of distinct datastores looked up and `N` is the
> number of lookups per datastore.
>
> Drop-direct variant:
>
> Drops the `DataStore` after every lookup, so the `Drop` path runs on
> every iteration:
>
> use anyhow::Error;
>
> use pbs_api_types::Operation;
> use pbs_datastore::DataStore;
>
> fn main() -> Result<(), Error> {
>     let mut args = std::env::args();
>     args.next();
>
>     let datastores = if let Some(n) = args.next() {
>         n.parse::<usize>()?
>     } else {
>         1000
>     };
>
>     let iterations = if let Some(n) = args.next() {
>         n.parse::<usize>()?
>     } else {
>         1000
>     };
>
>     for d in 1..=datastores {
>         let name = format!("ds{:04}", d);
>
>         for _ in 1..=iterations {
>             DataStore::lookup_datastore(&name, Some(Operation::Write))?;
>         }
>     }
>
>     Ok(())
> }
>
> +------+------+---------------+--------------+---------+
> | M    | N    | Baseline mean | Patched mean | Speedup |
> +------+------+---------------+--------------+---------+
> |    1 | 1000 | 1.684 s       | 35.3 ms      | 47.7x   |
> |   10 |  100 | 1.689 s       | 35.0 ms      | 48.3x   |
> |  100 |   10 | 1.709 s       | 35.8 ms      | 47.7x   |
> | 1000 |    1 | 1.809 s       | 39.0 ms      | 46.4x   |
> +------+------+---------------+--------------+---------+
>
> Bulk-drop variant:
>
> Keeps the `DataStore` instances alive for all `N` lookups of a given
> datastore and then drops them in bulk, mimicking a task that performs
> many lookups while it is running and only triggers the expensive
> `Drop` logic when the last user exits.
>
> use anyhow::Error;
>
> use pbs_api_types::Operation;
> use pbs_datastore::DataStore;
>
> fn main() -> Result<(), Error> {
>     let mut args = std::env::args();
>     args.next();
>
>     let datastores = if let Some(n) = args.next() {
>         n.parse::<usize>()?
>     } else {
>         1000
>     };
>
>     let iterations = if let Some(n) = args.next() {
>         n.parse::<usize>()?
>     } else {
>         1000
>     };
>
>     for d in 1..=datastores {
>         let name = format!("ds{:04}", d);
>
>         // Keep all lookups of this datastore alive; they are dropped
>         // together when `stores` goes out of scope.
>         let mut stores = Vec::with_capacity(iterations);
>         for _ in 1..=iterations {
>             stores.push(DataStore::lookup_datastore(&name, Some(Operation::Write))?);
>         }
>     }
>
>     Ok(())
> }
>
> +------+------+---------------+--------------+---------+
> | M    | N    | Baseline mean | Patched mean | Speedup |
> +------+------+---------------+--------------+---------+
> |    1 | 1000 | 890.6 ms      | 35.5 ms      | 25.1x   |
> |   10 |  100 | 891.3 ms      | 35.1 ms      | 25.4x   |
> |  100 |   10 | 983.9 ms      | 35.6 ms      | 27.6x   |
> | 1000 |    1 | 1829.0 ms     | 45.2 ms      | 40.5x   |
> +------+------+---------------+--------------+---------+
>
> Both variants show that the combination of the cached config lookups
> and the cheaper `Drop` handling reduces the hot-path cost from ~1.8 s
> per run to a few tens of milliseconds in these benchmarks.
>
> ## Reproduction steps
>
> VM: 4 vCPU, ~8 GiB RAM, VirtIO-SCSI; disks:
> - scsi0 32G (OS)
> - scsi1 1000G (datastores)
>
> Install PBS from ISO on the VM.
>
> Set up ZFS on /dev/sdb (adjust if different):
>
> zpool create -f -o ashift=12 pbsbench /dev/sdb
> zfs set mountpoint=/pbsbench pbsbench
> zfs create pbsbench/pbs-bench
>
> Raise the file-descriptor limit:
>
> sudo systemctl edit proxmox-backup-proxy.service
>
> Add the following lines:
>
> [Service]
> LimitNOFILE=1048576
>
> Reload systemd and restart the proxy:
>
> sudo systemctl daemon-reload
> sudo systemctl restart proxmox-backup-proxy.service
>
> Verify the limit:
>
> systemctl show proxmox-backup-proxy.service | grep LimitNOFILE
>
> Create 1000 ZFS-backed datastores (as used in #6049 [1]):
>
> seq -w 001 1000 | xargs -n1 -P1 bash -c '
>     id=$0
>     name="ds${id}"
>     dataset="pbsbench/pbs-bench/${name}"
>     path="/pbsbench/pbs-bench/${name}"
>     zfs create -o mountpoint="$path" "$dataset"
>     proxmox-backup-manager datastore create "$name" "$path" \
>         --comment "ZFS dataset-based datastore"
> '
>
> Build PBS from this series, then run the server manually under
> flamegraph:
>
> systemctl stop proxmox-backup-proxy
> cargo flamegraph --release --bin proxmox-backup-proxy
>
> ## Patch summary
>
> [PATCH 1/4] partial fix #6049: config: enable config version cache for datastore
> [PATCH 2/4] partial fix #6049: datastore: impl ConfigVersionCache fast path for lookups
> [PATCH 3/4] partial fix #6049: datastore: use config fast-path in Drop
> [PATCH 4/4] partial fix #6049: datastore: add TTL fallback to catch manual config edits
>
> ## Maintainer notes
>
> No dependency bumps, no API changes and no breaking changes.
>
> Thanks,
> Samuel
>
> Links
>
> [1] Bugzilla #6049: https://bugzilla.proxmox.com/show_bug.cgi?id=6049
> [2] cargo-flamegraph: https://github.com/flamegraph-rs/flamegraph
> [3] Bugzilla #7017: https://bugzilla.proxmox.com/show_bug.cgi?id=7017
>
> Samuel Rufinatscha (4):
>   partial fix #6049: config: enable config version cache for datastore
>   partial fix #6049: datastore: impl ConfigVersionCache fast path for
>     lookups
>   partial fix #6049: datastore: use config fast-path in Drop
>   partial fix #6049: datastore: add TTL fallback to catch manual config
>     edits
>
>  pbs-config/src/config_version_cache.rs |  10 +-
>  pbs-datastore/Cargo.toml               |   1 +
>  pbs-datastore/src/datastore.rs         | 215 ++++++++++++++++++++-----
>  3 files changed, 180 insertions(+), 46 deletions(-)
>

_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel