public inbox for pbs-devel@lists.proxmox.com
 help / color / mirror / Atom feed
* [pbs-devel] [PATCH proxmox{-backup, , -datacenter-manager} v2 0/7] token-shadow: reduce api token verification overhead
@ 2025-12-17 16:25 Samuel Rufinatscha
  2025-12-17 16:25 ` [pbs-devel] [PATCH proxmox-backup v2 1/3] pbs-config: cache verified API token secrets Samuel Rufinatscha
                   ` (6 more replies)
  0 siblings, 7 replies; 8+ messages in thread
From: Samuel Rufinatscha @ 2025-12-17 16:25 UTC (permalink / raw)
  To: pbs-devel

Hi,

this series improves the performance of token-based API authentication
in PBS (pbs-config) and in PDM (underlying proxmox-access-control
crate), addressing the API token verification hotspot reported in our
bugtracker #6049 [1].

When profiling PBS /status endpoint with cargo flamegraph [2],
token-based authentication showed up as a dominant hotspot via
proxmox_sys::crypt::verify_crypt_pw. Applying this series removes that
path from the hot section of the flamegraph. The same performance issue
was measured [2] for PDM. PDM uses the underlying shared
proxmox-access-control library for token handling, which is a
factored out version of the token.shadow handling code from PBS.

While this series fixes the immediate performance issue both in PBS
(pbs-config) and in the shared proxmox-access-control crate used by
PDM, PBS should eventually, ideally be refactored, in a separate
effort, to use proxmox-access-control for token handling instead of its
local implementation.

Problem

For token-based API requests, both PBS’s pbs-config token.shadow
handling and PDM proxmox-access-control’s token.shadow handling
currently:

1. read the token.shadow file on each request
2. deserialize it into a HashMap<Authid, String>
3. run password hash verification via
   proxmox_sys::crypt::verify_crypt_pw for the provided token secret

Under load, this results in significant CPU usage spent in repeated
password hash computations for the same token+secret pairs. The
attached flamegraphs for PBS [2] and PDM [3] show
proxmox_sys::crypt::verify_crypt_pw dominating the hot path.

Approach

The goal is to reduce the cost of token-based authentication preserving
the existing token handling semantics (including detecting manual edits
to token.shadow) and be consistent between PBS (pbs-config) and
PDM (proxmox-access-control). For both sites, the series proposes
following approach:

1. Introduce an in-memory cache for verified token secrets
2. Invalidate the cache when token.shadow changes (detect manual edits)
3. Control metadata checks with a TTL window

Testing

*PBS (pbs-config)*

To verify the effect in PBS, I:
1. Set up test environment based on latest PBS ISO, installed Rust
   toolchain, cloned proxmox-backup repository to use with cargo
   flamegraph. Reproduced bug #6049 [1] by profiling the /status
   endpoint with token-based authentication using cargo flamegraph [2].
   The flamegraph showed proxmox_sys::crypt::verify_crypt_pw is the
   hotspot.
2. Built PBS with pbs-config patches and re-ran the same workload and
   profiling setup.
3. Confirmed that the proxmox_sys::crypt::verify_crypt_pw path no
   longer appears in the hot section of the flamegraph. CPU usage is
   now dominated by TLS overhead.
4. Functionally verified that:
   * token-based API authentication still works for valid tokens
   * invalid secrets are rejected as before
   * generating a new token secret via dashboard works and
   authenticates correctly

*PDM (proxmox-access-control)*

To verify the effect in PDM, I followed a similar testing approach.
Instead of /status, I profiled the /version endpoint with cargo
flamegraph [2] and verified that the token hashing path disappears [4]
from the hot section after applying the proxmox-access-control patches.

Functionally I verified that:
   * token-based API authentication still works for valid tokens
   * invalid secrets are rejected as before
   * generating a new token secret via dashboard works and
   authenticates correctly

Benchmarks:

Two different benchmarks have been run to measure caching effects
and RwLock contention:

(1) Requests per second for PBS /status endpoint (E2E)
(2) RwLock contention for token create/delete under
    heavy parallel token-authenticated readers; compared
    std::sync::RwLock and parking_lot::RwLock.

(1) benchmarked parallel token auth requests for
/status?verbose=0 on top of the datastore lookup cache series [5]
to check throughput impact. With datastores=1, repeat=5000, parallel=16
this series gives ~179 req/s compared to ~65 req/s without it.
This is a ~2.75x improvement.

(2) benchmarked token create/delete operations under heavy load of
token-authenticated requests on top of the datastore lookup cache [5]
series. This benchmark was done using against a 64-parallel
token-auth flood (200k requests) against
/admin/datastore/ds0001/status?verbose=0 while executing 50 token
create + 50 token delete operations. After the series I saw the
following e2e API latencies:

parking_lot::RwLock
  - create avg ~27ms (p95 ~28ms) vs ~46ms (p95 ~50ms) baseline
  - delete avg ~17ms (p95 ~19ms) vs ~33ms (p95 ~35ms) baseline

std::sync::RwLock
  - create avg ~27ms (p95 ~28ms)
  - create avg ~17ms (p95 ~19ms)

It appears that the both RwLock implementations perform similarly
for this workload. The parking_lot version has been chosen for the
added fairness guarantees.

Patch summary

pbs-config:

0001 – pbs-config: cache verified API token secrets
Adds an in-memory cache keyed by Authid that stores plain text token
secrets after a successful verification or generation and uses
openssl’s memcmp constant-time for comparison.

0002 – pbs-config: invalidate token-secret cache on token.shadow
changes
Tracks token.shadow mtime and length and clears the in-memory
cache when the file changes.

0003 – pbs-config: add TTL window to token-secret cache
Introduces a TTL (TOKEN_SECRET_CACHE_TTL_SECS, default 60) for metadata
checks so that fs::metadata is only called periodically.

proxmox-access-control:

0004 – access-control: cache verified API token secrets
Mirrors PBS PATCH 0001.

0005 – access-control: invalidate token-secret cache on token.shadow changes
Mirrors PBS PATCH 0002.

0006 – access-control: add TTL window to token-secret cache
Mirrors PBS PATCH 0003.

proxmox-datacenter-manager:

0007 – docs: document API token-cache TTL effects
Documents the effects of the TTL window on token.shadow edits

Changes since v1

- (refactor) Switched cache initialization to LazyLock
- (perf) Use parking_lot::RwLock and best-effort cache access on the
  read/refresh path (try_read/try_write) to avoid lock contention
- (doc) Document TTL-delayed effect of manual token.shadow edits
- (fix) Add generation guards (API_MUTATION_GENERATION +
  FILE_GENERATION) to prevent caching across concurrent set/delete and
  external edits

Please see the patch specific changelogs for more details.

Thanks for considering this patch series, I look forward to your
feedback.

Best,
Samuel Rufinatscha

[1] https://bugzilla.proxmox.com/show_bug.cgi?id=7017
[2] attachment 1767 [1]: Flamegraph showing the proxmox_sys::crypt::verify_crypt_pw stack
[3] attachment 1794 [1]: Flamegraph PDM baseline
[4] attachment 1795 [1]: Flamegraph PDM patched
[5] https://bugzilla.proxmox.com/show_bug.cgi?id=6049

proxmox-backup:

Samuel Rufinatscha (3):
  pbs-config: cache verified API token secrets
  pbs-config: invalidate token-secret cache on token.shadow changes
  pbs-config: add TTL window to token secret cache

 Cargo.toml                     |   1 +
 docs/user-management.rst       |   4 +
 pbs-config/Cargo.toml          |   1 +
 pbs-config/src/token_shadow.rs | 238 ++++++++++++++++++++++++++++++++-
 4 files changed, 243 insertions(+), 1 deletion(-)


proxmox:

Samuel Rufinatscha (3):
  proxmox-access-control: cache verified API token secrets
  proxmox-access-control: invalidate token-secret cache on token.shadow
    changes
  proxmox-access-control: add TTL window to token secret cache

 Cargo.toml                                 |   1 +
 proxmox-access-control/Cargo.toml          |   1 +
 proxmox-access-control/src/token_shadow.rs | 238 ++++++++++++++++++++-
 3 files changed, 239 insertions(+), 1 deletion(-)


proxmox-datacenter-manager:

Samuel Rufinatscha (1):
  docs: document API token-cache TTL effects

 docs/access-control.rst | 3 +++
 1 file changed, 3 insertions(+)


Summary over all repositories:
  8 files changed, 485 insertions(+), 2 deletions(-)

-- 
Generated by git-murpp 0.8.1


_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [pbs-devel] [PATCH proxmox-backup v2 1/3] pbs-config: cache verified API token secrets
  2025-12-17 16:25 [pbs-devel] [PATCH proxmox{-backup, , -datacenter-manager} v2 0/7] token-shadow: reduce api token verification overhead Samuel Rufinatscha
@ 2025-12-17 16:25 ` Samuel Rufinatscha
  2025-12-17 16:25 ` [pbs-devel] [PATCH proxmox-backup v2 2/3] pbs-config: invalidate token-secret cache on token.shadow changes Samuel Rufinatscha
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: Samuel Rufinatscha @ 2025-12-17 16:25 UTC (permalink / raw)
  To: pbs-devel

Currently, every token-based API request reads the token.shadow file and
runs the expensive password hash verification for the given token
secret. This shows up as a hotspot in /status profiling (see
bug #7017 [1]).

This patch introduces an in-memory cache of successfully verified token
secrets. Subsequent requests for the same token+secret combination only
perform a comparison using openssl::memcmp::eq and avoid re-running the
password hash. The cache is updated when a token secret is set and
cleared when a token is deleted. Note, this does NOT include manual
config changes, which will be covered in a subsequent patch.

This patch partly fixes bug #7017 [1].

[1] https://bugzilla.proxmox.com/show_bug.cgi?id=7017

Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
---
Changes from v1 to v2:
- Replace OnceCell with LazyLock, and std::sync::RwLock with
 parking_lot::RwLock.
- Add API_MUTATION_GENERATION and guard cache inserts
 to prevent “zombie inserts” across concurrent set/delete.
- Refactor cache operations into cache_try_secret_matches,
 cache_try_insert_secret, and centralize write-side behavior in
 apply_api_mutation.
- Switch fast-path cache access to try_read/try_write (best-effort).

 Cargo.toml                     |  1 +
 pbs-config/Cargo.toml          |  1 +
 pbs-config/src/token_shadow.rs | 94 +++++++++++++++++++++++++++++++++-
 3 files changed, 95 insertions(+), 1 deletion(-)

diff --git a/Cargo.toml b/Cargo.toml
index ff143932..231cdca8 100644
--- a/Cargo.toml
+++ b/Cargo.toml
@@ -143,6 +143,7 @@ nom = "7"
 num-traits = "0.2"
 once_cell = "1.3.1"
 openssl = "0.10.40"
+parking_lot = "0.12"
 percent-encoding = "2.1"
 pin-project-lite = "0.2"
 regex = "1.5.5"
diff --git a/pbs-config/Cargo.toml b/pbs-config/Cargo.toml
index 74afb3c6..eb81ce00 100644
--- a/pbs-config/Cargo.toml
+++ b/pbs-config/Cargo.toml
@@ -13,6 +13,7 @@ libc.workspace = true
 nix.workspace = true
 once_cell.workspace = true
 openssl.workspace = true
+parking_lot.workspace = true
 regex.workspace = true
 serde.workspace = true
 serde_json.workspace = true
diff --git a/pbs-config/src/token_shadow.rs b/pbs-config/src/token_shadow.rs
index 640fabbf..ce845e8d 100644
--- a/pbs-config/src/token_shadow.rs
+++ b/pbs-config/src/token_shadow.rs
@@ -1,6 +1,9 @@
 use std::collections::HashMap;
+use std::sync::atomic::{AtomicU64, Ordering};
+use std::sync::LazyLock;
 
 use anyhow::{bail, format_err, Error};
+use parking_lot::RwLock;
 use serde::{Deserialize, Serialize};
 use serde_json::{from_value, Value};
 
@@ -13,6 +16,19 @@ use crate::{open_backup_lockfile, BackupLockGuard};
 const LOCK_FILE: &str = pbs_buildcfg::configdir!("/token.shadow.lock");
 const CONF_FILE: &str = pbs_buildcfg::configdir!("/token.shadow");
 
+/// Global in-memory cache for successfully verified API token secrets.
+/// The cache stores plain text secrets for token Authids that have already been
+/// verified against the hashed values in `token.shadow`. This allows for cheap
+/// subsequent authentications for the same token+secret combination, avoiding
+/// recomputing the password hash on every request.
+static TOKEN_SECRET_CACHE: LazyLock<RwLock<ApiTokenSecretCache>> = LazyLock::new(|| {
+    RwLock::new(ApiTokenSecretCache {
+        secrets: HashMap::new(),
+    })
+});
+/// API mutation generation (set/delete)
+static API_MUTATION_GENERATION: AtomicU64 = AtomicU64::new(0);
+
 #[derive(Serialize, Deserialize)]
 #[serde(rename_all = "kebab-case")]
 /// ApiToken id / secret pair
@@ -54,9 +70,24 @@ pub fn verify_secret(tokenid: &Authid, secret: &str) -> Result<(), Error> {
         bail!("not an API token ID");
     }
 
+    // Fast path
+    if cache_try_secret_matches(tokenid, secret) {
+        return Ok(());
+    }
+
+    // Slow path snapshot (before expensive work)
+    let api_gen_before = API_MUTATION_GENERATION.load(Ordering::Acquire);
+
     let data = read_file()?;
     match data.get(tokenid) {
-        Some(hashed_secret) => proxmox_sys::crypt::verify_crypt_pw(secret, hashed_secret),
+        Some(hashed_secret) => {
+            proxmox_sys::crypt::verify_crypt_pw(secret, hashed_secret)?;
+
+            // Try to cache only if nothing changed while we verified
+            cache_try_insert_secret(tokenid.clone(), secret.to_owned(), api_gen_before);
+
+            Ok(())
+        }
         None => bail!("invalid API token"),
     }
 }
@@ -82,6 +113,8 @@ fn set_secret(tokenid: &Authid, secret: &str) -> Result<(), Error> {
     data.insert(tokenid.clone(), hashed_secret);
     write_file(data)?;
 
+    apply_api_mutation(tokenid, Some(secret));
+
     Ok(())
 }
 
@@ -97,5 +130,64 @@ pub fn delete_secret(tokenid: &Authid) -> Result<(), Error> {
     data.remove(tokenid);
     write_file(data)?;
 
+    apply_api_mutation(tokenid, None);
+
     Ok(())
 }
+
+struct ApiTokenSecretCache {
+    /// Keys are token Authids, values are the corresponding plain text secrets.
+    /// Entries are added after a successful on-disk verification in
+    /// `verify_secret` or when a new token secret is generated by
+    /// `generate_and_set_secret`. Used to avoid repeated
+    /// password-hash computation on subsequent authentications.
+    secrets: HashMap<Authid, CachedSecret>,
+}
+
+/// Cached secret.
+struct CachedSecret {
+    secret: String,
+}
+
+fn cache_try_insert_secret(tokenid: Authid, secret: String, api_gen_snapshot: u64) {
+    let Some(mut cache) = TOKEN_SECRET_CACHE.try_write() else {
+        return;
+    };
+
+    if API_MUTATION_GENERATION.load(Ordering::Acquire) == api_gen_snapshot {
+        cache.secrets.insert(tokenid, CachedSecret { secret });
+    }
+}
+
+fn cache_try_secret_matches(tokenid: &Authid, secret: &str) -> bool {
+    let Some(cache) = TOKEN_SECRET_CACHE.try_read() else {
+        return false;
+    };
+    let Some(entry) = cache.secrets.get(tokenid) else {
+        return false;
+    };
+
+    openssl::memcmp::eq(entry.secret.as_bytes(), secret.as_bytes())
+}
+
+fn apply_api_mutation(tokenid: &Authid, new_secret: Option<&str>) {
+    // Prevent in-flight verify_secret() from caching results across a mutation.
+    API_MUTATION_GENERATION.fetch_add(1, Ordering::AcqRel);
+
+    // Mutations must be reflected immediately once set/delete returns.
+    let mut cache = TOKEN_SECRET_CACHE.write();
+
+    match new_secret {
+        Some(secret) => {
+            cache.secrets.insert(
+                tokenid.clone(),
+                CachedSecret {
+                    secret: secret.to_owned(),
+                },
+            );
+        }
+        None => {
+            cache.secrets.remove(tokenid);
+        }
+    }
+}
-- 
2.47.3



_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [pbs-devel] [PATCH proxmox-backup v2 2/3] pbs-config: invalidate token-secret cache on token.shadow changes
  2025-12-17 16:25 [pbs-devel] [PATCH proxmox{-backup, , -datacenter-manager} v2 0/7] token-shadow: reduce api token verification overhead Samuel Rufinatscha
  2025-12-17 16:25 ` [pbs-devel] [PATCH proxmox-backup v2 1/3] pbs-config: cache verified API token secrets Samuel Rufinatscha
@ 2025-12-17 16:25 ` Samuel Rufinatscha
  2025-12-17 16:25 ` [pbs-devel] [PATCH proxmox-backup v2 3/3] pbs-config: add TTL window to token secret cache Samuel Rufinatscha
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: Samuel Rufinatscha @ 2025-12-17 16:25 UTC (permalink / raw)
  To: pbs-devel

Previously the in-memory token-secret cache was only updated via
set_secret() and delete_secret(), so manual edits to token.shadow were
not reflected.

This patch adds file change detection to the cache. It tracks the mtime
and length of token.shadow and clears the in-memory token secret cache
whenever these values change.

Note, this patch fetches file stats on every request. An TTL-based
optimization will be covered in a subsequent patch of the series.

This patch partly fixes bug #7017 [1].

[1] https://bugzilla.proxmox.com/show_bug.cgi?id=7017

Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
---
Changes from v1 to v2:
- Add file metadata tracking (file_mtime, file_len) and
  FILE_GENERATION.
- Store file_gen in CachedSecret and verify it against the current
  FILE_GENERATION to ensure cached entries belong to the current file
  state.
- Add shadow_mtime_len() helper and convert refresh to best-effort
  (try_write, returns bool).
- Pass a pre-write metadata snapshot into apply_api_mutation and
  clear/bump generation if the cache metadata indicates missed external
  edits.

 pbs-config/src/token_shadow.rs | 128 +++++++++++++++++++++++++++++----
 1 file changed, 116 insertions(+), 12 deletions(-)

diff --git a/pbs-config/src/token_shadow.rs b/pbs-config/src/token_shadow.rs
index ce845e8d..71553aae 100644
--- a/pbs-config/src/token_shadow.rs
+++ b/pbs-config/src/token_shadow.rs
@@ -1,6 +1,9 @@
 use std::collections::HashMap;
+use std::fs;
+use std::io::ErrorKind;
 use std::sync::atomic::{AtomicU64, Ordering};
 use std::sync::LazyLock;
+use std::time::SystemTime;
 
 use anyhow::{bail, format_err, Error};
 use parking_lot::RwLock;
@@ -24,10 +27,14 @@ const CONF_FILE: &str = pbs_buildcfg::configdir!("/token.shadow");
 static TOKEN_SECRET_CACHE: LazyLock<RwLock<ApiTokenSecretCache>> = LazyLock::new(|| {
     RwLock::new(ApiTokenSecretCache {
         secrets: HashMap::new(),
+        file_mtime: None,
+        file_len: None,
     })
 });
 /// API mutation generation (set/delete)
 static API_MUTATION_GENERATION: AtomicU64 = AtomicU64::new(0);
+/// External/manual edits generation for the token.shadow file
+static FILE_GENERATION: AtomicU64 = AtomicU64::new(0);
 
 #[derive(Serialize, Deserialize)]
 #[serde(rename_all = "kebab-case")]
@@ -64,6 +71,29 @@ fn write_file(data: HashMap<Authid, String>) -> Result<(), Error> {
     proxmox_sys::fs::replace_file(CONF_FILE, &json, options, true)
 }
 
+/// Refreshes the in-memory cache if the on-disk token.shadow file changed.
+/// Returns true if the cache is valid to use, false if not.
+fn refresh_cache_if_file_changed() -> bool {
+    let Some(mut cache) = TOKEN_SECRET_CACHE.try_write() else {
+        return false; // cannot validate external changes -> don't trust cache
+    };
+
+    let Ok((new_mtime, new_len)) = shadow_mtime_len() else {
+        return false; // cannot validate external changes -> don't trust cache
+    };
+
+    if cache.file_mtime == new_mtime && cache.file_len == new_len {
+        return true;
+    }
+
+    cache.secrets.clear();
+    cache.file_mtime = new_mtime;
+    cache.file_len = new_len;
+    FILE_GENERATION.fetch_add(1, Ordering::AcqRel);
+
+    true
+}
+
 /// Verifies that an entry for given tokenid / API token secret exists
 pub fn verify_secret(tokenid: &Authid, secret: &str) -> Result<(), Error> {
     if !tokenid.is_token() {
@@ -71,12 +101,13 @@ pub fn verify_secret(tokenid: &Authid, secret: &str) -> Result<(), Error> {
     }
 
     // Fast path
-    if cache_try_secret_matches(tokenid, secret) {
+    if refresh_cache_if_file_changed() && cache_try_secret_matches(tokenid, secret) {
         return Ok(());
     }
 
     // Slow path snapshot (before expensive work)
     let api_gen_before = API_MUTATION_GENERATION.load(Ordering::Acquire);
+    let file_gen_before = FILE_GENERATION.load(Ordering::Acquire);
 
     let data = read_file()?;
     match data.get(tokenid) {
@@ -84,7 +115,12 @@ pub fn verify_secret(tokenid: &Authid, secret: &str) -> Result<(), Error> {
             proxmox_sys::crypt::verify_crypt_pw(secret, hashed_secret)?;
 
             // Try to cache only if nothing changed while we verified
-            cache_try_insert_secret(tokenid.clone(), secret.to_owned(), api_gen_before);
+            cache_try_insert_secret(
+                tokenid.clone(),
+                secret.to_owned(),
+                api_gen_before,
+                file_gen_before,
+            );
 
             Ok(())
         }
@@ -108,12 +144,15 @@ fn set_secret(tokenid: &Authid, secret: &str) -> Result<(), Error> {
 
     let _guard = lock_config()?;
 
+    // Capture state BEFORE we write to detect external edits.
+    let pre_meta = shadow_mtime_len().unwrap_or((None, None));
+
     let mut data = read_file()?;
     let hashed_secret = proxmox_sys::crypt::encrypt_pw(secret)?;
     data.insert(tokenid.clone(), hashed_secret);
     write_file(data)?;
 
-    apply_api_mutation(tokenid, Some(secret));
+    apply_api_mutation(tokenid, Some(secret), pre_meta);
 
     Ok(())
 }
@@ -126,11 +165,14 @@ pub fn delete_secret(tokenid: &Authid) -> Result<(), Error> {
 
     let _guard = lock_config()?;
 
+    // Capture state BEFORE we write to detect external edits.
+    let pre_meta = shadow_mtime_len().unwrap_or((None, None));
+
     let mut data = read_file()?;
     data.remove(tokenid);
     write_file(data)?;
 
-    apply_api_mutation(tokenid, None);
+    apply_api_mutation(tokenid, None, pre_meta);
 
     Ok(())
 }
@@ -142,20 +184,40 @@ struct ApiTokenSecretCache {
     /// `generate_and_set_secret`. Used to avoid repeated
     /// password-hash computation on subsequent authentications.
     secrets: HashMap<Authid, CachedSecret>,
+    // shadow file mtime to detect changes
+    file_mtime: Option<SystemTime>,
+    // shadow file length to detect changes
+    file_len: Option<u64>,
 }
 
-/// Cached secret.
+/// Cached secret and the file generation it was cached at.
 struct CachedSecret {
     secret: String,
+    file_gen: u64,
 }
 
-fn cache_try_insert_secret(tokenid: Authid, secret: String, api_gen_snapshot: u64) {
+fn cache_try_insert_secret(
+    tokenid: Authid,
+    secret: String,
+    api_gen_snapshot: u64,
+    file_gen_snapshot: u64,
+) {
     let Some(mut cache) = TOKEN_SECRET_CACHE.try_write() else {
         return;
     };
 
-    if API_MUTATION_GENERATION.load(Ordering::Acquire) == api_gen_snapshot {
-        cache.secrets.insert(tokenid, CachedSecret { secret });
+    // Check generations to avoid zombie-inserts
+    let cur_file_gen = FILE_GENERATION.load(Ordering::Acquire);
+    let cur_api_gen = API_MUTATION_GENERATION.load(Ordering::Acquire);
+
+    if cur_file_gen == file_gen_snapshot && cur_api_gen == api_gen_snapshot {
+        cache.secrets.insert(
+            tokenid,
+            CachedSecret {
+                secret,
+                file_gen: cur_file_gen,
+            },
+        );
     }
 }
 
@@ -167,22 +229,44 @@ fn cache_try_secret_matches(tokenid: &Authid, secret: &str) -> bool {
         return false;
     };
 
-    openssl::memcmp::eq(entry.secret.as_bytes(), secret.as_bytes())
+    let gen1 = FILE_GENERATION.load(Ordering::Acquire);
+    if entry.file_gen != gen1 {
+        return false;
+    }
+
+    let eq = openssl::memcmp::eq(entry.secret.as_bytes(), secret.as_bytes());
+
+    let gen2 = FILE_GENERATION.load(Ordering::Acquire);
+    eq && gen1 == gen2
 }
 
-fn apply_api_mutation(tokenid: &Authid, new_secret: Option<&str>) {
-    // Prevent in-flight verify_secret() from caching results across a mutation.
+fn apply_api_mutation(
+    tokenid: &Authid,
+    new_secret: Option<&str>,
+    pre_write_meta: (Option<SystemTime>, Option<u64>),
+) {
     API_MUTATION_GENERATION.fetch_add(1, Ordering::AcqRel);
 
-    // Mutations must be reflected immediately once set/delete returns.
     let mut cache = TOKEN_SECRET_CACHE.write();
 
+    // If the cache meta doesn't match the file state before the on-disk write,
+    // external/manual edits happened -> drop everything and bump FILE_GENERATION.
+    let (pre_mtime, pre_len) = pre_write_meta;
+    if cache.file_mtime != pre_mtime || cache.file_len != pre_len {
+        cache.secrets.clear();
+        FILE_GENERATION.fetch_add(1, Ordering::AcqRel);
+    }
+
+    let file_gen = FILE_GENERATION.load(Ordering::Acquire);
+
+    // Apply the API mutation to the cache.
     match new_secret {
         Some(secret) => {
             cache.secrets.insert(
                 tokenid.clone(),
                 CachedSecret {
                     secret: secret.to_owned(),
+                    file_gen,
                 },
             );
         }
@@ -190,4 +274,24 @@ fn apply_api_mutation(tokenid: &Authid, new_secret: Option<&str>) {
             cache.secrets.remove(tokenid);
         }
     }
+
+    // Keep cache metadata aligned if possible.
+    match shadow_mtime_len() {
+        Ok((mtime, len)) => {
+            cache.file_mtime = mtime;
+            cache.file_len = len;
+        }
+        Err(_) => {
+            cache.file_mtime = None;
+            cache.file_len = None;
+        }
+    }
+}
+
+fn shadow_mtime_len() -> Result<(Option<SystemTime>, Option<u64>), Error> {
+    match fs::metadata(CONF_FILE) {
+        Ok(meta) => Ok((meta.modified().ok(), Some(meta.len()))),
+        Err(e) if e.kind() == ErrorKind::NotFound => Ok((None, None)),
+        Err(e) => Err(e.into()),
+    }
 }
-- 
2.47.3



_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel


^ permalink raw reply	[flat|nested] 8+ messages in thread

* [pbs-devel] [PATCH proxmox-backup v2 3/3] pbs-config: add TTL window to token secret cache
  2025-12-17 16:25 [pbs-devel] [PATCH proxmox{-backup, , -datacenter-manager} v2 0/7] token-shadow: reduce api token verification overhead Samuel Rufinatscha
  2025-12-17 16:25 ` [pbs-devel] [PATCH proxmox-backup v2 1/3] pbs-config: cache verified API token secrets Samuel Rufinatscha
  2025-12-17 16:25 ` [pbs-devel] [PATCH proxmox-backup v2 2/3] pbs-config: invalidate token-secret cache on token.shadow changes Samuel Rufinatscha
@ 2025-12-17 16:25 ` Samuel Rufinatscha
  2025-12-17 16:25 ` [pbs-devel] [PATCH proxmox v2 1/3] proxmox-access-control: cache verified API token secrets Samuel Rufinatscha
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: Samuel Rufinatscha @ 2025-12-17 16:25 UTC (permalink / raw)
  To: pbs-devel

Verify_secret() currently calls refresh_cache_if_file_changed() on every
request, which performs a metadata() call on token.shadow each time.
Under load this adds unnecessary overhead, considering also the file
usually should rarely change.

This patch introduces a TTL boundary, controlled by
TOKEN_SECRET_CACHE_TTL_SECS. File metadata is only re-loaded once the
TTL has expired. Documents TTL effects.

This patch partly fixes bug #7017 [1].

[1] https://bugzilla.proxmox.com/show_bug.cgi?id=7017

Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
---
Changes from v1 to v2:
- Add TOKEN_SECRET_CACHE_TTL_SECS and last_checked.
- Implement double-checked TTL: check with try_read first; only attempt
  refresh with try_write if expired/unknown.
- Fix TTL bookkeeping: update last_checked on the “file unchanged” path
  and after API mutations.
- Add documentation warning about TTL-delayed effect of manual
  token.shadow edits.

 docs/user-management.rst       |  4 ++++
 pbs-config/src/token_shadow.rs | 42 +++++++++++++++++++++++++++++++++-
 2 files changed, 45 insertions(+), 1 deletion(-)

diff --git a/docs/user-management.rst b/docs/user-management.rst
index 41b43d60..32a9ec29 100644
--- a/docs/user-management.rst
+++ b/docs/user-management.rst
@@ -156,6 +156,10 @@ metadata:
 Similarly, the ``user delete-token`` subcommand can be used to delete a token
 again.
 
+.. WARNING:: If you manually remove a generated API token from the token secrets
+   file (token.shadow), it can take up to one minute before the token is
+   rejected. This is due to caching.
+
 Newly generated API tokens don't have any permissions. Please read the next
 section to learn how to set access permissions.
 
diff --git a/pbs-config/src/token_shadow.rs b/pbs-config/src/token_shadow.rs
index 71553aae..79940fd5 100644
--- a/pbs-config/src/token_shadow.rs
+++ b/pbs-config/src/token_shadow.rs
@@ -11,6 +11,7 @@ use serde::{Deserialize, Serialize};
 use serde_json::{from_value, Value};
 
 use proxmox_sys::fs::CreateOptions;
+use proxmox_time::epoch_i64;
 
 use pbs_api_types::Authid;
 //use crate::auth;
@@ -29,12 +30,15 @@ static TOKEN_SECRET_CACHE: LazyLock<RwLock<ApiTokenSecretCache>> = LazyLock::new
         secrets: HashMap::new(),
         file_mtime: None,
         file_len: None,
+        last_checked: None,
     })
 });
 /// API mutation generation (set/delete)
 static API_MUTATION_GENERATION: AtomicU64 = AtomicU64::new(0);
 /// External/manual edits generation for the token.shadow file
 static FILE_GENERATION: AtomicU64 = AtomicU64::new(0);
+/// Max age in seconds of the token secret cache before checking for file changes.
+const TOKEN_SECRET_CACHE_TTL_SECS: i64 = 60;
 
 #[derive(Serialize, Deserialize)]
 #[serde(rename_all = "kebab-case")]
@@ -74,22 +78,54 @@ fn write_file(data: HashMap<Authid, String>) -> Result<(), Error> {
 /// Refreshes the in-memory cache if the on-disk token.shadow file changed.
 /// Returns true if the cache is valid to use, false if not.
 fn refresh_cache_if_file_changed() -> bool {
+    let now = epoch_i64();
+
+    // Check TTL (best-effort)
+    let Some(cache) = TOKEN_SECRET_CACHE.try_read() else {
+        return false; // cannot validate external changes -> don't trust cache
+    };
+
+    let ttl_ok = cache
+        .last_checked
+        .is_some_and(|last| now.saturating_sub(last) < TOKEN_SECRET_CACHE_TTL_SECS);
+
+    drop(cache);
+
+    if ttl_ok {
+        return true;
+    }
+
+    // TTL expired/unknown at this point -> do best-effort refresh.
     let Some(mut cache) = TOKEN_SECRET_CACHE.try_write() else {
         return false; // cannot validate external changes -> don't trust cache
     };
 
+    // Check TTL after acquiring write lock.
+    if let Some(last) = cache.last_checked {
+        if now.saturating_sub(last) < TOKEN_SECRET_CACHE_TTL_SECS {
+            return true;
+        }
+    }
+
+    let had_prior_state = cache.last_checked.is_some();
+
     let Ok((new_mtime, new_len)) = shadow_mtime_len() else {
         return false; // cannot validate external changes -> don't trust cache
     };
 
     if cache.file_mtime == new_mtime && cache.file_len == new_len {
+        cache.last_checked = Some(now);
         return true;
     }
 
     cache.secrets.clear();
     cache.file_mtime = new_mtime;
     cache.file_len = new_len;
-    FILE_GENERATION.fetch_add(1, Ordering::AcqRel);
+    cache.last_checked = Some(now);
+
+    if had_prior_state {
+        FILE_GENERATION.fetch_add(1, Ordering::AcqRel);
+    }
 
     true
 }
@@ -188,6 +224,8 @@ struct ApiTokenSecretCache {
     file_mtime: Option<SystemTime>,
     // shadow file length to detect changes
     file_len: Option<u64>,
+    // last time the file metadata was checked
+    last_checked: Option<i64>,
 }
 
 /// Cached secret and the file generation it was cached at.
@@ -280,10 +318,12 @@ fn apply_api_mutation(
         Ok((mtime, len)) => {
             cache.file_mtime = mtime;
             cache.file_len = len;
+            cache.last_checked = Some(epoch_i64());
         }
         Err(_) => {
             cache.file_mtime = None;
             cache.file_len = None;
+            cache.last_checked = None; // to force refresh next time
         }
     }
 }
-- 
2.47.3



_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [pbs-devel] [PATCH proxmox v2 1/3] proxmox-access-control: cache verified API token secrets
  2025-12-17 16:25 [pbs-devel] [PATCH proxmox{-backup, , -datacenter-manager} v2 0/7] token-shadow: reduce api token verification overhead Samuel Rufinatscha
                   ` (2 preceding siblings ...)
  2025-12-17 16:25 ` [pbs-devel] [PATCH proxmox-backup v2 3/3] pbs-config: add TTL window to token secret cache Samuel Rufinatscha
@ 2025-12-17 16:25 ` Samuel Rufinatscha
  2025-12-17 16:25 ` [pbs-devel] [PATCH proxmox v2 2/3] proxmox-access-control: invalidate token-secret cache on token.shadow changes Samuel Rufinatscha
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: Samuel Rufinatscha @ 2025-12-17 16:25 UTC (permalink / raw)
  To: pbs-devel

Currently, every token-based API request reads the token.shadow file and
runs the expensive password hash verification for the given token
secret. This issue was first observed as part of profiling the PBS
/status endpoint (see bug #7017 [1]) and is required for the factored
out proxmox_access_control token_shadow implementation too.

This patch introduces an in-memory cache of successfully verified token
secrets. Subsequent requests for the same token+secret combination only
perform a comparison using openssl::memcmp::eq and avoid re-running the
password hash. The cache is updated when a token secret is set and
cleared when a token is deleted. Note, this does NOT include manual
config changes, which will be covered in a subsequent patch.

This patch partly fixes bug #7017 [1].

[1] https://bugzilla.proxmox.com/show_bug.cgi?id=7017

Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
---
Changes from v1 to v2:
- Replace OnceCell with LazyLock, and std::sync::RwLock with
 parking_lot::RwLock.
- Add API_MUTATION_GENERATION and guard cache inserts
 to prevent “zombie inserts” across concurrent set/delete.
- Refactor cache operations into cache_try_secret_matches,
 cache_try_insert_secret, and centralize write-side behavior in
 apply_api_mutation.
- Switch fast-path cache access to try_read/try_write (best-effort).

 Cargo.toml                                 |  1 +
 proxmox-access-control/Cargo.toml          |  1 +
 proxmox-access-control/src/token_shadow.rs | 94 +++++++++++++++++++++-
 3 files changed, 95 insertions(+), 1 deletion(-)

diff --git a/Cargo.toml b/Cargo.toml
index 27a69afa..59a2ec93 100644
--- a/Cargo.toml
+++ b/Cargo.toml
@@ -112,6 +112,7 @@ native-tls = "0.2"
 nix = "0.29"
 openssl = "0.10"
 pam-sys = "0.5"
+parking_lot = "0.12"
 percent-encoding = "2.1"
 pin-utils = "0.1.0"
 proc-macro2 = "1.0"
diff --git a/proxmox-access-control/Cargo.toml b/proxmox-access-control/Cargo.toml
index ec189664..1de2842c 100644
--- a/proxmox-access-control/Cargo.toml
+++ b/proxmox-access-control/Cargo.toml
@@ -16,6 +16,7 @@ anyhow.workspace = true
 const_format.workspace = true
 nix = { workspace = true, optional = true }
 openssl = { workspace = true, optional = true }
+parking_lot.workspace = true
 regex.workspace = true
 hex = { workspace = true, optional = true }
 serde.workspace = true
diff --git a/proxmox-access-control/src/token_shadow.rs b/proxmox-access-control/src/token_shadow.rs
index c586d834..c0285b62 100644
--- a/proxmox-access-control/src/token_shadow.rs
+++ b/proxmox-access-control/src/token_shadow.rs
@@ -1,6 +1,9 @@
 use std::collections::HashMap;
+use std::sync::atomic::{AtomicU64, Ordering};
+use std::sync::LazyLock;
 
 use anyhow::{bail, format_err, Error};
+use parking_lot::RwLock;
 use serde_json::{from_value, Value};
 
 use proxmox_auth_api::types::Authid;
@@ -8,6 +11,19 @@ use proxmox_product_config::{open_api_lockfile, replace_config, ApiLockGuard};
 
 use crate::init::impl_feature::{token_shadow, token_shadow_lock};
 
+/// Global in-memory cache for successfully verified API token secrets.
+/// The cache stores plain text secrets for token Authids that have already been
+/// verified against the hashed values in `token.shadow`. This allows for cheap
+/// subsequent authentications for the same token+secret combination, avoiding
+/// recomputing the password hash on every request.
+static TOKEN_SECRET_CACHE: LazyLock<RwLock<ApiTokenSecretCache>> = LazyLock::new(|| {
+    RwLock::new(ApiTokenSecretCache {
+        secrets: HashMap::new(),
+    })
+});
+/// API mutation generation (set/delete)
+static API_MUTATION_GENERATION: AtomicU64 = AtomicU64::new(0);
+
 // Get exclusive lock
 fn lock_config() -> Result<ApiLockGuard, Error> {
     open_api_lockfile(token_shadow_lock(), None, true)
@@ -36,9 +52,24 @@ pub fn verify_secret(tokenid: &Authid, secret: &str) -> Result<(), Error> {
         bail!("not an API token ID");
     }
 
+    // Fast path
+    if cache_try_secret_matches(tokenid, secret) {
+        return Ok(());
+    }
+
+    // Slow path snapshot (before expensive work)
+    let api_gen_before = API_MUTATION_GENERATION.load(Ordering::Acquire);
+
     let data = read_file()?;
     match data.get(tokenid) {
-        Some(hashed_secret) => proxmox_sys::crypt::verify_crypt_pw(secret, hashed_secret),
+        Some(hashed_secret) => {
+            proxmox_sys::crypt::verify_crypt_pw(secret, hashed_secret)?;
+
+            // Try to cache only if nothing changed while we verified
+            cache_try_insert_secret(tokenid.clone(), secret.to_owned(), api_gen_before);
+
+            Ok(())
+        }
         None => bail!("invalid API token"),
     }
 }
@@ -56,6 +87,8 @@ pub fn set_secret(tokenid: &Authid, secret: &str) -> Result<(), Error> {
     data.insert(tokenid.clone(), hashed_secret);
     write_file(data)?;
 
+    apply_api_mutation(tokenid, Some(secret));
+
     Ok(())
 }
 
@@ -71,6 +104,8 @@ pub fn delete_secret(tokenid: &Authid) -> Result<(), Error> {
     data.remove(tokenid);
     write_file(data)?;
 
+    apply_api_mutation(tokenid, None);
+
     Ok(())
 }
 
@@ -81,3 +116,60 @@ pub fn generate_and_set_secret(tokenid: &Authid) -> Result<String, Error> {
     set_secret(tokenid, &secret)?;
     Ok(secret)
 }
+
+struct ApiTokenSecretCache {
+    /// Keys are token Authids, values are the corresponding plain text secrets.
+    /// Entries are added after a successful on-disk verification in
+    /// `verify_secret` or when a new token secret is generated by
+    /// `generate_and_set_secret`. Used to avoid repeated
+    /// password-hash computation on subsequent authentications.
+    secrets: HashMap<Authid, CachedSecret>,
+}
+
+/// Cached secret.
+struct CachedSecret {
+    secret: String,
+}
+
+fn cache_try_insert_secret(tokenid: Authid, secret: String, api_gen_snapshot: u64) {
+    let Some(mut cache) = TOKEN_SECRET_CACHE.try_write() else {
+        return;
+    };
+
+    if API_MUTATION_GENERATION.load(Ordering::Acquire) == api_gen_snapshot {
+        cache.secrets.insert(tokenid, CachedSecret { secret });
+    }
+}
+
+fn cache_try_secret_matches(tokenid: &Authid, secret: &str) -> bool {
+    let Some(cache) = TOKEN_SECRET_CACHE.try_read() else {
+        return false;
+    };
+    let Some(entry) = cache.secrets.get(tokenid) else {
+        return false;
+    };
+
+    openssl::memcmp::eq(entry.secret.as_bytes(), secret.as_bytes())
+}
+
+fn apply_api_mutation(tokenid: &Authid, new_secret: Option<&str>) {
+    // Prevent in-flight verify_secret() from caching results across a mutation.
+    API_MUTATION_GENERATION.fetch_add(1, Ordering::AcqRel);
+
+    // Mutations must be reflected immediately once set/delete returns.
+    let mut cache = TOKEN_SECRET_CACHE.write();
+
+    match new_secret {
+        Some(secret) => {
+            cache.secrets.insert(
+                tokenid.clone(),
+                CachedSecret {
+                    secret: secret.to_owned(),
+                },
+            );
+        }
+        None => {
+            cache.secrets.remove(tokenid);
+        }
+    }
+}
-- 
2.47.3



_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [pbs-devel] [PATCH proxmox v2 2/3] proxmox-access-control: invalidate token-secret cache on token.shadow changes
  2025-12-17 16:25 [pbs-devel] [PATCH proxmox{-backup, , -datacenter-manager} v2 0/7] token-shadow: reduce api token verification overhead Samuel Rufinatscha
                   ` (3 preceding siblings ...)
  2025-12-17 16:25 ` [pbs-devel] [PATCH proxmox v2 1/3] proxmox-access-control: cache verified API token secrets Samuel Rufinatscha
@ 2025-12-17 16:25 ` Samuel Rufinatscha
  2025-12-17 16:25 ` [pbs-devel] [PATCH proxmox v2 3/3] proxmox-access-control: add TTL window to token secret cache Samuel Rufinatscha
  2025-12-17 16:25 ` [pbs-devel] [PATCH proxmox-datacenter-manager v2 1/1] docs: document API token-cache TTL effects Samuel Rufinatscha
  6 siblings, 0 replies; 8+ messages in thread
From: Samuel Rufinatscha @ 2025-12-17 16:25 UTC (permalink / raw)
  To: pbs-devel

Previously the in-memory token-secret cache was only updated via
set_secret() and delete_secret(), so manual edits to token.shadow were
not reflected.

This patch adds file change detection to the cache. It tracks the mtime
and length of token.shadow and clears the in-memory token secret cache
whenever these values change.

Note, this patch fetches file stats on every request. An TTL-based
optimization will be covered in a subsequent patch of the series.

This patch partly fixes bug #7017 [1].

[1] https://bugzilla.proxmox.com/show_bug.cgi?id=7017

Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
---
Changes from v1 to v2:
- Add file metadata tracking (file_mtime, file_len) and
  FILE_GENERATION.
- Store file_gen in CachedSecret and verify it against the current
  FILE_GENERATION to ensure cached entries belong to the current file
  state.
- Add shadow_mtime_len() helper and convert refresh to best-effort
  (try_write, returns bool).
- Pass a pre-write metadata snapshot into apply_api_mutation and
  clear/bump generation if the cache metadata indicates missed external
  edits.

 proxmox-access-control/src/token_shadow.rs | 128 +++++++++++++++++++--
 1 file changed, 116 insertions(+), 12 deletions(-)

diff --git a/proxmox-access-control/src/token_shadow.rs b/proxmox-access-control/src/token_shadow.rs
index c0285b62..efadce94 100644
--- a/proxmox-access-control/src/token_shadow.rs
+++ b/proxmox-access-control/src/token_shadow.rs
@@ -1,6 +1,9 @@
 use std::collections::HashMap;
+use std::fs;
+use std::io::ErrorKind;
 use std::sync::atomic::{AtomicU64, Ordering};
 use std::sync::LazyLock;
+use std::time::SystemTime;
 
 use anyhow::{bail, format_err, Error};
 use parking_lot::RwLock;
@@ -19,10 +22,14 @@ use crate::init::impl_feature::{token_shadow, token_shadow_lock};
 static TOKEN_SECRET_CACHE: LazyLock<RwLock<ApiTokenSecretCache>> = LazyLock::new(|| {
     RwLock::new(ApiTokenSecretCache {
         secrets: HashMap::new(),
+        file_mtime: None,
+        file_len: None,
     })
 });
 /// API mutation generation (set/delete)
 static API_MUTATION_GENERATION: AtomicU64 = AtomicU64::new(0);
+/// External/manual edits generation for the token.shadow file
+static FILE_GENERATION: AtomicU64 = AtomicU64::new(0);
 
 // Get exclusive lock
 fn lock_config() -> Result<ApiLockGuard, Error> {
@@ -46,6 +53,29 @@ fn write_file(data: HashMap<Authid, String>) -> Result<(), Error> {
     replace_config(token_shadow(), &json)
 }
 
+/// Refreshes the in-memory cache if the on-disk token.shadow file changed.
+/// Returns true if the cache is valid to use, false if not.
+fn refresh_cache_if_file_changed() -> bool {
+    let Some(mut cache) = TOKEN_SECRET_CACHE.try_write() else {
+        return false; // cannot validate external changes -> don't trust cache
+    };
+
+    let Ok((new_mtime, new_len)) = shadow_mtime_len() else {
+        return false; // cannot validate external changes -> don't trust cache
+    };
+
+    if cache.file_mtime == new_mtime && cache.file_len == new_len {
+        return true;
+    }
+
+    cache.secrets.clear();
+    cache.file_mtime = new_mtime;
+    cache.file_len = new_len;
+    FILE_GENERATION.fetch_add(1, Ordering::AcqRel);
+
+    true
+}
+
 /// Verifies that an entry for given tokenid / API token secret exists
 pub fn verify_secret(tokenid: &Authid, secret: &str) -> Result<(), Error> {
     if !tokenid.is_token() {
@@ -53,12 +83,13 @@ pub fn verify_secret(tokenid: &Authid, secret: &str) -> Result<(), Error> {
     }
 
     // Fast path
-    if cache_try_secret_matches(tokenid, secret) {
+    if refresh_cache_if_file_changed() && cache_try_secret_matches(tokenid, secret) {
         return Ok(());
     }
 
     // Slow path snapshot (before expensive work)
     let api_gen_before = API_MUTATION_GENERATION.load(Ordering::Acquire);
+    let file_gen_before = FILE_GENERATION.load(Ordering::Acquire);
 
     let data = read_file()?;
     match data.get(tokenid) {
@@ -66,7 +97,12 @@ pub fn verify_secret(tokenid: &Authid, secret: &str) -> Result<(), Error> {
             proxmox_sys::crypt::verify_crypt_pw(secret, hashed_secret)?;
 
             // Try to cache only if nothing changed while we verified
-            cache_try_insert_secret(tokenid.clone(), secret.to_owned(), api_gen_before);
+            cache_try_insert_secret(
+                tokenid.clone(),
+                secret.to_owned(),
+                api_gen_before,
+                file_gen_before,
+            );
 
             Ok(())
         }
@@ -82,12 +118,15 @@ pub fn set_secret(tokenid: &Authid, secret: &str) -> Result<(), Error> {
 
     let _guard = lock_config()?;
 
+    // Capture state BEFORE we write to detect external edits.
+    let pre_meta = shadow_mtime_len().unwrap_or((None, None));
+
     let mut data = read_file()?;
     let hashed_secret = proxmox_sys::crypt::encrypt_pw(secret)?;
     data.insert(tokenid.clone(), hashed_secret);
     write_file(data)?;
 
-    apply_api_mutation(tokenid, Some(secret));
+    apply_api_mutation(tokenid, Some(secret), pre_meta);
 
     Ok(())
 }
@@ -100,11 +139,14 @@ pub fn delete_secret(tokenid: &Authid) -> Result<(), Error> {
 
     let _guard = lock_config()?;
 
+    // Capture state BEFORE we write to detect external edits.
+    let pre_meta = shadow_mtime_len().unwrap_or((None, None));
+
     let mut data = read_file()?;
     data.remove(tokenid);
     write_file(data)?;
 
-    apply_api_mutation(tokenid, None);
+    apply_api_mutation(tokenid, None, pre_meta);
 
     Ok(())
 }
@@ -124,20 +166,40 @@ struct ApiTokenSecretCache {
     /// `generate_and_set_secret`. Used to avoid repeated
     /// password-hash computation on subsequent authentications.
     secrets: HashMap<Authid, CachedSecret>,
+    // shadow file mtime to detect changes
+    file_mtime: Option<SystemTime>,
+    // shadow file length to detect changes
+    file_len: Option<u64>,
 }
 
-/// Cached secret.
+/// Cached secret and the file generation it was cached at.
 struct CachedSecret {
     secret: String,
+    file_gen: u64,
 }
 
-fn cache_try_insert_secret(tokenid: Authid, secret: String, api_gen_snapshot: u64) {
+fn cache_try_insert_secret(
+    tokenid: Authid,
+    secret: String,
+    api_gen_snapshot: u64,
+    file_gen_snapshot: u64,
+) {
     let Some(mut cache) = TOKEN_SECRET_CACHE.try_write() else {
         return;
     };
 
-    if API_MUTATION_GENERATION.load(Ordering::Acquire) == api_gen_snapshot {
-        cache.secrets.insert(tokenid, CachedSecret { secret });
+    // Check generations to avoid zombie-inserts
+    let cur_file_gen = FILE_GENERATION.load(Ordering::Acquire);
+    let cur_api_gen = API_MUTATION_GENERATION.load(Ordering::Acquire);
+
+    if cur_file_gen == file_gen_snapshot && cur_api_gen == api_gen_snapshot {
+        cache.secrets.insert(
+            tokenid,
+            CachedSecret {
+                secret,
+                file_gen: cur_file_gen,
+            },
+        );
     }
 }
 
@@ -149,22 +211,44 @@ fn cache_try_secret_matches(tokenid: &Authid, secret: &str) -> bool {
         return false;
     };
 
-    openssl::memcmp::eq(entry.secret.as_bytes(), secret.as_bytes())
+    let gen1 = FILE_GENERATION.load(Ordering::Acquire);
+    if entry.file_gen != gen1 {
+        return false;
+    }
+
+    let eq = openssl::memcmp::eq(entry.secret.as_bytes(), secret.as_bytes());
+
+    let gen2 = FILE_GENERATION.load(Ordering::Acquire);
+    eq && gen1 == gen2
 }
 
-fn apply_api_mutation(tokenid: &Authid, new_secret: Option<&str>) {
-    // Prevent in-flight verify_secret() from caching results across a mutation.
+fn apply_api_mutation(
+    tokenid: &Authid,
+    new_secret: Option<&str>,
+    pre_write_meta: (Option<SystemTime>, Option<u64>),
+) {
     API_MUTATION_GENERATION.fetch_add(1, Ordering::AcqRel);
 
-    // Mutations must be reflected immediately once set/delete returns.
     let mut cache = TOKEN_SECRET_CACHE.write();
 
+    // If the cache meta doesn't match the file state before the on-disk write,
+    // external/manual edits happened -> drop everything and bump FILE_GENERATION.
+    let (pre_mtime, pre_len) = pre_write_meta;
+    if cache.file_mtime != pre_mtime || cache.file_len != pre_len {
+        cache.secrets.clear();
+        FILE_GENERATION.fetch_add(1, Ordering::AcqRel);
+    }
+
+    let file_gen = FILE_GENERATION.load(Ordering::Acquire);
+
+    // Apply the API mutation to the cache.
     match new_secret {
         Some(secret) => {
             cache.secrets.insert(
                 tokenid.clone(),
                 CachedSecret {
                     secret: secret.to_owned(),
+                    file_gen,
                 },
             );
         }
@@ -172,4 +256,24 @@ fn apply_api_mutation(tokenid: &Authid, new_secret: Option<&str>) {
             cache.secrets.remove(tokenid);
         }
     }
+
+    // Keep cache metadata aligned if possible.
+    match shadow_mtime_len() {
+        Ok((mtime, len)) => {
+            cache.file_mtime = mtime;
+            cache.file_len = len;
+        }
+        Err(_) => {
+            cache.file_mtime = None;
+            cache.file_len = None;
+        }
+    }
+}
+
+fn shadow_mtime_len() -> Result<(Option<SystemTime>, Option<u64>), Error> {
+    match fs::metadata(token_shadow().as_path()) {
+        Ok(meta) => Ok((meta.modified().ok(), Some(meta.len()))),
+        Err(e) if e.kind() == ErrorKind::NotFound => Ok((None, None)),
+        Err(e) => Err(e.into()),
+    }
 }
-- 
2.47.3



_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel


^ permalink raw reply	[flat|nested] 8+ messages in thread

* [pbs-devel] [PATCH proxmox v2 3/3] proxmox-access-control: add TTL window to token secret cache
  2025-12-17 16:25 [pbs-devel] [PATCH proxmox{-backup, , -datacenter-manager} v2 0/7] token-shadow: reduce api token verification overhead Samuel Rufinatscha
                   ` (4 preceding siblings ...)
  2025-12-17 16:25 ` [pbs-devel] [PATCH proxmox v2 2/3] proxmox-access-control: invalidate token-secret cache on token.shadow changes Samuel Rufinatscha
@ 2025-12-17 16:25 ` Samuel Rufinatscha
  2025-12-17 16:25 ` [pbs-devel] [PATCH proxmox-datacenter-manager v2 1/1] docs: document API token-cache TTL effects Samuel Rufinatscha
  6 siblings, 0 replies; 8+ messages in thread
From: Samuel Rufinatscha @ 2025-12-17 16:25 UTC (permalink / raw)
  To: pbs-devel

Verify_secret() currently calls refresh_cache_if_file_changed() on every
request, which performs a metadata() call on token.shadow each time.
Under load this adds unnecessary overhead, considering also the file
should rarely change.

This patch introduces a TTL boundary, controlled by
TOKEN_SECRET_CACHE_TTL_SECS. File metadata is only re-loaded once the
TTL has expired.

This patch partly fixes bug #7017 [1].

[1] https://bugzilla.proxmox.com/show_bug.cgi?id=7017

Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
---
Changes from v1 to v2:
- Add TOKEN_SECRET_CACHE_TTL_SECS and last_checked.
- Implement double-checked TTL: check with try_read first; only attempt
  refresh with try_write if expired/unknown.
- Fix TTL bookkeeping: update last_checked on the “file unchanged” path
  and after API mutations.

 proxmox-access-control/src/token_shadow.rs | 42 +++++++++++++++++++++-
 1 file changed, 41 insertions(+), 1 deletion(-)

diff --git a/proxmox-access-control/src/token_shadow.rs b/proxmox-access-control/src/token_shadow.rs
index efadce94..4ca56de9 100644
--- a/proxmox-access-control/src/token_shadow.rs
+++ b/proxmox-access-control/src/token_shadow.rs
@@ -11,6 +11,7 @@ use serde_json::{from_value, Value};
 
 use proxmox_auth_api::types::Authid;
 use proxmox_product_config::{open_api_lockfile, replace_config, ApiLockGuard};
+use proxmox_time::epoch_i64;
 
 use crate::init::impl_feature::{token_shadow, token_shadow_lock};
 
@@ -24,12 +25,15 @@ static TOKEN_SECRET_CACHE: LazyLock<RwLock<ApiTokenSecretCache>> = LazyLock::new
         secrets: HashMap::new(),
         file_mtime: None,
         file_len: None,
+        last_checked: None,
     })
 });
 /// API mutation generation (set/delete)
 static API_MUTATION_GENERATION: AtomicU64 = AtomicU64::new(0);
 /// External/manual edits generation for the token.shadow file
 static FILE_GENERATION: AtomicU64 = AtomicU64::new(0);
+/// Max age in seconds of the token secret cache before checking for file changes.
+const TOKEN_SECRET_CACHE_TTL_SECS: i64 = 60;
 
 // Get exclusive lock
 fn lock_config() -> Result<ApiLockGuard, Error> {
@@ -56,22 +60,54 @@ fn write_file(data: HashMap<Authid, String>) -> Result<(), Error> {
 /// Refreshes the in-memory cache if the on-disk token.shadow file changed.
 /// Returns true if the cache is valid to use, false if not.
 fn refresh_cache_if_file_changed() -> bool {
+    let now = epoch_i64();
+
+    // Check TTL (best-effort)
+    let Some(cache) = TOKEN_SECRET_CACHE.try_read() else {
+        return false; // cannot validate external changes -> don't trust cache
+    };
+
+    let ttl_ok = cache
+        .last_checked
+        .is_some_and(|last| now.saturating_sub(last) < TOKEN_SECRET_CACHE_TTL_SECS);
+
+    drop(cache);
+
+    if ttl_ok {
+        return true;
+    }
+
+    // TTL expired/unknown at this point -> do best-effort refresh.
     let Some(mut cache) = TOKEN_SECRET_CACHE.try_write() else {
         return false; // cannot validate external changes -> don't trust cache
     };
 
+    // Check TTL after acquiring write lock.
+    if let Some(last) = cache.last_checked {
+        if now.saturating_sub(last) < TOKEN_SECRET_CACHE_TTL_SECS {
+            return true;
+        }
+    }
+
+    let had_prior_state = cache.last_checked.is_some();
+
     let Ok((new_mtime, new_len)) = shadow_mtime_len() else {
         return false; // cannot validate external changes -> don't trust cache
     };
 
     if cache.file_mtime == new_mtime && cache.file_len == new_len {
+        cache.last_checked = Some(now);
         return true;
     }
 
     cache.secrets.clear();
     cache.file_mtime = new_mtime;
     cache.file_len = new_len;
-    FILE_GENERATION.fetch_add(1, Ordering::AcqRel);
+    cache.last_checked = Some(now);
+
+    if had_prior_state {
+        FILE_GENERATION.fetch_add(1, Ordering::AcqRel);
+    }
 
     true
 }
@@ -170,6 +206,8 @@ struct ApiTokenSecretCache {
     file_mtime: Option<SystemTime>,
     // shadow file length to detect changes
     file_len: Option<u64>,
+    // last time the file metadata was checked
+    last_checked: Option<i64>,
 }
 
 /// Cached secret and the file generation it was cached at.
@@ -262,10 +300,12 @@ fn apply_api_mutation(
         Ok((mtime, len)) => {
             cache.file_mtime = mtime;
             cache.file_len = len;
+            cache.last_checked = Some(epoch_i64());
         }
         Err(_) => {
             cache.file_mtime = None;
             cache.file_len = None;
+            cache.last_checked = None; // to force refresh next time
         }
     }
 }
-- 
2.47.3



_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [pbs-devel] [PATCH proxmox-datacenter-manager v2 1/1] docs: document API token-cache TTL effects
  2025-12-17 16:25 [pbs-devel] [PATCH proxmox{-backup, , -datacenter-manager} v2 0/7] token-shadow: reduce api token verification overhead Samuel Rufinatscha
                   ` (5 preceding siblings ...)
  2025-12-17 16:25 ` [pbs-devel] [PATCH proxmox v2 3/3] proxmox-access-control: add TTL window to token secret cache Samuel Rufinatscha
@ 2025-12-17 16:25 ` Samuel Rufinatscha
  6 siblings, 0 replies; 8+ messages in thread
From: Samuel Rufinatscha @ 2025-12-17 16:25 UTC (permalink / raw)
  To: pbs-devel

Documents the effects of the added API token-cache in the
proxmox-access-control crate. This patch is part of the
series that fixes bug #7017 [1].

This patch partly fixes bug #7017 [1].

[1] https://bugzilla.proxmox.com/show_bug.cgi?id=7017

Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
---
 docs/access-control.rst | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/docs/access-control.rst b/docs/access-control.rst
index adf26cd..f4f26f2 100644
--- a/docs/access-control.rst
+++ b/docs/access-control.rst
@@ -47,6 +47,9 @@ place of the user ID (``user@realm``) and the user password, respectively.
 The API token is passed from the client to the server by setting the ``Authorization`` HTTP header
 with method ``PDMAPIToken`` to the value ``TOKENID:TOKENSECRET``.
 
+.. WARNING:: If you manually remove a generated API token from the token secrets file (token.shadow),
+   it can take up to one minute before the token is rejected. This is due to caching.
+
 .. _access_control:
 
 Access Control
-- 
2.47.3



_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel


^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2025-12-17 16:25 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-12-17 16:25 [pbs-devel] [PATCH proxmox{-backup, , -datacenter-manager} v2 0/7] token-shadow: reduce api token verification overhead Samuel Rufinatscha
2025-12-17 16:25 ` [pbs-devel] [PATCH proxmox-backup v2 1/3] pbs-config: cache verified API token secrets Samuel Rufinatscha
2025-12-17 16:25 ` [pbs-devel] [PATCH proxmox-backup v2 2/3] pbs-config: invalidate token-secret cache on token.shadow changes Samuel Rufinatscha
2025-12-17 16:25 ` [pbs-devel] [PATCH proxmox-backup v2 3/3] pbs-config: add TTL window to token secret cache Samuel Rufinatscha
2025-12-17 16:25 ` [pbs-devel] [PATCH proxmox v2 1/3] proxmox-access-control: cache verified API token secrets Samuel Rufinatscha
2025-12-17 16:25 ` [pbs-devel] [PATCH proxmox v2 2/3] proxmox-access-control: invalidate token-secret cache on token.shadow changes Samuel Rufinatscha
2025-12-17 16:25 ` [pbs-devel] [PATCH proxmox v2 3/3] proxmox-access-control: add TTL window to token secret cache Samuel Rufinatscha
2025-12-17 16:25 ` [pbs-devel] [PATCH proxmox-datacenter-manager v2 1/1] docs: document API token-cache TTL effects Samuel Rufinatscha

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox
Service provided by Proxmox Server Solutions GmbH | Privacy | Legal