public inbox for pbs-devel@lists.proxmox.com
 help / color / mirror / Atom feed
From: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
To: pbs-devel@lists.proxmox.com
Subject: Re: [pbs-devel] [PATCH proxmox{-backup, , -datacenter-manager} v2 0/7] token-shadow: reduce api token verification overhead
Date: Thu, 18 Dec 2025 12:03:25 +0100	[thread overview]
Message-ID: <7ee26369-ed7f-458f-87d3-275f6304bf65@proxmox.com> (raw)
In-Reply-To: <20251217162520.486520-1-s.rufinatscha@proxmox.com>

It appears we need to switch the cache to a SharedMemory
implementation. I ran additional token invalidation tests, which showed
that regenerating a secret for a given token in the dashboard can take
up to one minute (TTL) to propagate, even though it calls the set_secret
API, which should directly update the cache.

I discussed this with Fabian, and it appears the root cause seems to be
that token modifications happen in the privileged daemon, while most
regular API requests are handled by the non-privileged one.
Moving the cache to a SharedMemory implementation should resolve
this. No other changes should be required.

I will send a v3 incorporating this change.

On 12/17/25 5:25 PM, Samuel Rufinatscha wrote:
> Hi,
> 
> this series improves the performance of token-based API authentication
> in PBS (pbs-config) and in PDM (underlying proxmox-access-control
> crate), addressing the API token verification hotspot reported in our
> bugtracker #6049 [1].
> 
> When profiling PBS /status endpoint with cargo flamegraph [2],
> token-based authentication showed up as a dominant hotspot via
> proxmox_sys::crypt::verify_crypt_pw. Applying this series removes that
> path from the hot section of the flamegraph. The same performance issue
> was measured [2] for PDM. PDM uses the underlying shared
> proxmox-access-control library for token handling, which is a
> factored out version of the token.shadow handling code from PBS.
> 
> While this series fixes the immediate performance issue both in PBS
> (pbs-config) and in the shared proxmox-access-control crate used by
> PDM, PBS should eventually, ideally be refactored, in a separate
> effort, to use proxmox-access-control for token handling instead of its
> local implementation.
> 
> Problem
> 
> For token-based API requests, both PBS’s pbs-config token.shadow
> handling and PDM proxmox-access-control’s token.shadow handling
> currently:
> 
> 1. read the token.shadow file on each request
> 2. deserialize it into a HashMap<Authid, String>
> 3. run password hash verification via
>     proxmox_sys::crypt::verify_crypt_pw for the provided token secret
> 
> Under load, this results in significant CPU usage spent in repeated
> password hash computations for the same token+secret pairs. The
> attached flamegraphs for PBS [2] and PDM [3] show
> proxmox_sys::crypt::verify_crypt_pw dominating the hot path.
> 
> Approach
> 
> The goal is to reduce the cost of token-based authentication preserving
> the existing token handling semantics (including detecting manual edits
> to token.shadow) and be consistent between PBS (pbs-config) and
> PDM (proxmox-access-control). For both sites, the series proposes
> following approach:
> 
> 1. Introduce an in-memory cache for verified token secrets
> 2. Invalidate the cache when token.shadow changes (detect manual edits)
> 3. Control metadata checks with a TTL window
> 
> Testing
> 
> *PBS (pbs-config)*
> 
> To verify the effect in PBS, I:
> 1. Set up test environment based on latest PBS ISO, installed Rust
>     toolchain, cloned proxmox-backup repository to use with cargo
>     flamegraph. Reproduced bug #6049 [1] by profiling the /status
>     endpoint with token-based authentication using cargo flamegraph [2].
>     The flamegraph showed proxmox_sys::crypt::verify_crypt_pw is the
>     hotspot.
> 2. Built PBS with pbs-config patches and re-ran the same workload and
>     profiling setup.
> 3. Confirmed that the proxmox_sys::crypt::verify_crypt_pw path no
>     longer appears in the hot section of the flamegraph. CPU usage is
>     now dominated by TLS overhead.
> 4. Functionally verified that:
>     * token-based API authentication still works for valid tokens
>     * invalid secrets are rejected as before
>     * generating a new token secret via dashboard works and
>     authenticates correctly
> 
> *PDM (proxmox-access-control)*
> 
> To verify the effect in PDM, I followed a similar testing approach.
> Instead of /status, I profiled the /version endpoint with cargo
> flamegraph [2] and verified that the token hashing path disappears [4]
> from the hot section after applying the proxmox-access-control patches.
> 
> Functionally I verified that:
>     * token-based API authentication still works for valid tokens
>     * invalid secrets are rejected as before
>     * generating a new token secret via dashboard works and
>     authenticates correctly
> 
> Benchmarks:
> 
> Two different benchmarks have been run to measure caching effects
> and RwLock contention:
> 
> (1) Requests per second for PBS /status endpoint (E2E)
> (2) RwLock contention for token create/delete under
>      heavy parallel token-authenticated readers; compared
>      std::sync::RwLock and parking_lot::RwLock.
> 
> (1) benchmarked parallel token auth requests for
> /status?verbose=0 on top of the datastore lookup cache series [5]
> to check throughput impact. With datastores=1, repeat=5000, parallel=16
> this series gives ~179 req/s compared to ~65 req/s without it.
> This is a ~2.75x improvement.
> 
> (2) benchmarked token create/delete operations under heavy load of
> token-authenticated requests on top of the datastore lookup cache [5]
> series. This benchmark was done using against a 64-parallel
> token-auth flood (200k requests) against
> /admin/datastore/ds0001/status?verbose=0 while executing 50 token
> create + 50 token delete operations. After the series I saw the
> following e2e API latencies:
> 
> parking_lot::RwLock
>    - create avg ~27ms (p95 ~28ms) vs ~46ms (p95 ~50ms) baseline
>    - delete avg ~17ms (p95 ~19ms) vs ~33ms (p95 ~35ms) baseline
> 
> std::sync::RwLock
>    - create avg ~27ms (p95 ~28ms)
>    - create avg ~17ms (p95 ~19ms)
> 
> It appears that the both RwLock implementations perform similarly
> for this workload. The parking_lot version has been chosen for the
> added fairness guarantees.
> 
> Patch summary
> 
> pbs-config:
> 
> 0001 – pbs-config: cache verified API token secrets
> Adds an in-memory cache keyed by Authid that stores plain text token
> secrets after a successful verification or generation and uses
> openssl’s memcmp constant-time for comparison.
> 
> 0002 – pbs-config: invalidate token-secret cache on token.shadow
> changes
> Tracks token.shadow mtime and length and clears the in-memory
> cache when the file changes.
> 
> 0003 – pbs-config: add TTL window to token-secret cache
> Introduces a TTL (TOKEN_SECRET_CACHE_TTL_SECS, default 60) for metadata
> checks so that fs::metadata is only called periodically.
> 
> proxmox-access-control:
> 
> 0004 – access-control: cache verified API token secrets
> Mirrors PBS PATCH 0001.
> 
> 0005 – access-control: invalidate token-secret cache on token.shadow changes
> Mirrors PBS PATCH 0002.
> 
> 0006 – access-control: add TTL window to token-secret cache
> Mirrors PBS PATCH 0003.
> 
> proxmox-datacenter-manager:
> 
> 0007 – docs: document API token-cache TTL effects
> Documents the effects of the TTL window on token.shadow edits
> 
> Changes since v1
> 
> - (refactor) Switched cache initialization to LazyLock
> - (perf) Use parking_lot::RwLock and best-effort cache access on the
>    read/refresh path (try_read/try_write) to avoid lock contention
> - (doc) Document TTL-delayed effect of manual token.shadow edits
> - (fix) Add generation guards (API_MUTATION_GENERATION +
>    FILE_GENERATION) to prevent caching across concurrent set/delete and
>    external edits
> 
> Please see the patch specific changelogs for more details.
> 
> Thanks for considering this patch series, I look forward to your
> feedback.
> 
> Best,
> Samuel Rufinatscha
> 
> [1] https://bugzilla.proxmox.com/show_bug.cgi?id=7017
> [2] attachment 1767 [1]: Flamegraph showing the proxmox_sys::crypt::verify_crypt_pw stack
> [3] attachment 1794 [1]: Flamegraph PDM baseline
> [4] attachment 1795 [1]: Flamegraph PDM patched
> [5] https://bugzilla.proxmox.com/show_bug.cgi?id=6049
> 
> proxmox-backup:
> 
> Samuel Rufinatscha (3):
>    pbs-config: cache verified API token secrets
>    pbs-config: invalidate token-secret cache on token.shadow changes
>    pbs-config: add TTL window to token secret cache
> 
>   Cargo.toml                     |   1 +
>   docs/user-management.rst       |   4 +
>   pbs-config/Cargo.toml          |   1 +
>   pbs-config/src/token_shadow.rs | 238 ++++++++++++++++++++++++++++++++-
>   4 files changed, 243 insertions(+), 1 deletion(-)
> 
> 
> proxmox:
> 
> Samuel Rufinatscha (3):
>    proxmox-access-control: cache verified API token secrets
>    proxmox-access-control: invalidate token-secret cache on token.shadow
>      changes
>    proxmox-access-control: add TTL window to token secret cache
> 
>   Cargo.toml                                 |   1 +
>   proxmox-access-control/Cargo.toml          |   1 +
>   proxmox-access-control/src/token_shadow.rs | 238 ++++++++++++++++++++-
>   3 files changed, 239 insertions(+), 1 deletion(-)
> 
> 
> proxmox-datacenter-manager:
> 
> Samuel Rufinatscha (1):
>    docs: document API token-cache TTL effects
> 
>   docs/access-control.rst | 3 +++
>   1 file changed, 3 insertions(+)
> 
> 
> Summary over all repositories:
>    8 files changed, 485 insertions(+), 2 deletions(-)
> 



_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel

      parent reply	other threads:[~2025-12-18 11:02 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-12-17 16:25 Samuel Rufinatscha
2025-12-17 16:25 ` [pbs-devel] [PATCH proxmox-backup v2 1/3] pbs-config: cache verified API token secrets Samuel Rufinatscha
2025-12-17 16:25 ` [pbs-devel] [PATCH proxmox-backup v2 2/3] pbs-config: invalidate token-secret cache on token.shadow changes Samuel Rufinatscha
2025-12-17 16:25 ` [pbs-devel] [PATCH proxmox-backup v2 3/3] pbs-config: add TTL window to token secret cache Samuel Rufinatscha
2025-12-17 16:25 ` [pbs-devel] [PATCH proxmox v2 1/3] proxmox-access-control: cache verified API token secrets Samuel Rufinatscha
2025-12-17 16:25 ` [pbs-devel] [PATCH proxmox v2 2/3] proxmox-access-control: invalidate token-secret cache on token.shadow changes Samuel Rufinatscha
2025-12-17 16:25 ` [pbs-devel] [PATCH proxmox v2 3/3] proxmox-access-control: add TTL window to token secret cache Samuel Rufinatscha
2025-12-17 16:25 ` [pbs-devel] [PATCH proxmox-datacenter-manager v2 1/1] docs: document API token-cache TTL effects Samuel Rufinatscha
2025-12-18 11:03 ` Samuel Rufinatscha [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=7ee26369-ed7f-458f-87d3-275f6304bf65@proxmox.com \
    --to=s.rufinatscha@proxmox.com \
    --cc=pbs-devel@lists.proxmox.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox
Service provided by Proxmox Server Solutions GmbH | Privacy | Legal