Re: [pbs-devel] [PATCH proxmox-backup 1/3] pbs-config: cache verified API token secrets

From: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
To: "Fabian Grünbichler" <f.gruenbichler@proxmox.com>,
	pbs-devel@lists.proxmox.com
Subject: Re: [pbs-devel] [PATCH proxmox-backup 1/3] pbs-config: cache verified API token secrets
Date: Mon, 15 Dec 2025 20:00:21 +0100	[thread overview]
Message-ID: <4d6331ff-aac6-40b0-9749-61c4f86bdd24@proxmox.com> (raw)
In-Reply-To: <a60fd5ae-b8f5-4d17-9762-55ab4ea5ee02@proxmox.com>

On 12/15/25 4:06 PM, Samuel Rufinatscha wrote:
> On 12/10/25 4:35 PM, Samuel Rufinatscha wrote:
>> On 12/10/25 12:47 PM, Fabian Grünbichler wrote:
>>> Quoting Samuel Rufinatscha (2025-12-05 14:25:54)
>>>> Currently, every token-based API request reads the token.shadow file 
>>>> and
>>>> runs the expensive password hash verification for the given token
>>>> secret. This shows up as a hotspot in /status profiling (see
>>>> bug #6049 [1]).
>>>>
>>>> This patch introduces an in-memory cache of successfully verified token
>>>> secrets. Subsequent requests for the same token+secret combination only
>>>> perform a comparison using openssl::memcmp::eq and avoid re-running the
>>>> password hash. The cache is updated when a token secret is set and
>>>> cleared when a token is deleted. Note, this does NOT include manual
>>>> config changes, which will be covered in a subsequent patch.
>>>>
>>>> This patch partly fixes bug #6049 [1].
>>>>
>>>> [1] https://bugzilla.proxmox.com/show_bug.cgi?id=7017
>>>>
>>>> Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
>>>> ---
>>>>   pbs-config/src/token_shadow.rs | 58 ++++++++++++++++++++++++++++++ 
>>>> +++-
>>>>   1 file changed, 57 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/pbs-config/src/token_shadow.rs b/pbs-config/src/ 
>>>> token_shadow.rs
>>>> index 640fabbf..47aa2fc2 100644
>>>> --- a/pbs-config/src/token_shadow.rs
>>>> +++ b/pbs-config/src/token_shadow.rs
>>>> @@ -1,6 +1,8 @@
>>>>   use std::collections::HashMap;
>>>> +use std::sync::RwLock;
>>>>   use anyhow::{bail, format_err, Error};
>>>> +use once_cell::sync::OnceCell;
>>>>   use serde::{Deserialize, Serialize};
>>>>   use serde_json::{from_value, Value};
>>>> @@ -13,6 +15,13 @@ use crate::{open_backup_lockfile, BackupLockGuard};
>>>>   const LOCK_FILE: &str = pbs_buildcfg::configdir!("/ 
>>>> token.shadow.lock");
>>>>   const CONF_FILE: &str = pbs_buildcfg::configdir!("/token.shadow");
>>>> +/// Global in-memory cache for successfully verified API token 
>>>> secrets.
>>>> +/// The cache stores plain text secrets for token Authids that have 
>>>> already been
>>>> +/// verified against the hashed values in `token.shadow`. This 
>>>> allows for cheap
>>>> +/// subsequent authentications for the same token+secret 
>>>> combination, avoiding
>>>> +/// recomputing the password hash on every request.
>>>> +static TOKEN_SECRET_CACHE: OnceCell<RwLock<ApiTokenSecretCache>> = 
>>>> OnceCell::new();
>>>> +
>>>>   #[derive(Serialize, Deserialize)]
>>>>   #[serde(rename_all = "kebab-case")]
>>>>   /// ApiToken id / secret pair
>>>> @@ -54,9 +63,25 @@ pub fn verify_secret(tokenid: &Authid, secret: 
>>>> &str) -> Result<(), Error> {
>>>>           bail!("not an API token ID");
>>>>       }
>>>> +    // Fast path
>>>> +    if let Some(cached) = 
>>>> token_secret_cache().read().unwrap().secrets.get(tokenid) {
>>>
>>> did you benchmark this with a lot of parallel token requests? a plain 
>>> RwLock
>>> gives no guarantees at all w.r.t. ordering or fairness, so a lot of 
>>> token-based
>>> requests could effectively prevent token removal AFAICT (or vice-versa,
>>> spamming token creation could lock out all tokens?)
>>>
>>> since we don't actually require the cache here to proceed, we could 
>>> also make this a try_read
>>> or a read with timeout, and fallback to the slow path if there is too 
>>> much
>>> contention? alternatively, comparing with parking_lot would also be
>>> interesting, since that implementation does have fairness guarantees.
>>>
>>> note that token-based requests are basically doable by anyone being 
>>> able to
>>> reach PBS, whereas token creation/deletion is available to every 
>>> authenticaed
>>> user.
>>>
>>
>> Thanks for the review Fabian and the valuable comments!
>>
>> I did not benchmark the RwLock itself under load. Your point about
>> contention/fairness for RwLock makes perfect sense, and we should 
>> consider this. So for v2, I will integrate try_read() /
>> try_write() as mentioned to avoid possible contention / DoS issues.
>>
>> I’ll also consider parking_lot::RwLock, thanks for the hint!
>>
> 
> 
> I benchmarked the "writer under heavy parallel readers" scenario by
> running a 64-parallel token-auth flood against
> /admin/datastore/ds0001/status?verbose=0 (≈ 44-48k successful
> requests total) while executing 50 token create + 50 token delete
> operations.
> 
> With the suggested best-effort approach (cache lookups/inserts via
> try_read/try_write) I saw the following e2e API latencies:
> 
> delete: p95 ~39ms, max ~44ms
> create: p95 ~50ms, max ~56ms
> 
> I also compared against parking_lot::RwLock under the same setup,
> results were in the same range (delete p95 ~39–43ms, max ~43–64ms)
> so I didn’t see a clear benefit there for this workload.
> 
> For v2 I will keep std::sync::RwLock with read/insert best-effort, while
> delete/removal blocking.
> 
>

Fabian,

one clarification/follow-up: the comparison against parking_lot::RwLock
was focused on end-to-end latency, and under the benchmarked
workload we didn’t observe starvation effects. Still, std::sync::RwLock
does not provide ordering or fairness guarantees, so under sustained
token-auth read load cache invalidation could theoretically be delayed.

Given that, I think switching to parking_lot::RwLock for v2 to get clear
fairness semantics, while keeping the try_read/try_insert approach, is
the better solution here.

>>>> +        // Compare cached secret with provided one using constant 
>>>> time comparison
>>>> +        if openssl::memcmp::eq(cached.as_bytes(), secret.as_bytes()) {
>>>> +            // Already verified before
>>>> +            return Ok(());
>>>> +        }
>>>> +        // Fall through to slow path if secret doesn't match cached 
>>>> one
>>>> +    }
>>>
>>> this could also be a helper, like the rest. then it would consume (a 
>>> reference
>>> to) the user-provided secret value, instead of giving access to all 
>>> cached
>>> ones. doesn't make a real difference now other than consistence, but 
>>> the cache
>>> is (more) cleanly encapsulated then.
>>>
>>>> +
>>>> +    // Slow path: read file + verify hash
>>>>       let data = read_file()?;
>>>>       match data.get(tokenid) {
>>>> -        Some(hashed_secret) => 
>>>> proxmox_sys::crypt::verify_crypt_pw(secret, hashed_secret),
>>>> +        Some(hashed_secret) => {
>>>> +            proxmox_sys::crypt::verify_crypt_pw(secret, 
>>>> hashed_secret)?;
>>>> +            // Cache the plain secret for future requests
>>>> +            cache_insert_secret(tokenid.clone(), secret.to_owned());
>>>
>>> same applies here - storing the value in the cache is optional (and 
>>> good if it
>>> works), but we don't want to stall forever waiting for the cache 
>>> insertion to
>>> go through..
>>>
>>>> +            Ok(())
>>>> +        }
>>>>           None => bail!("invalid API token"),
>>>>       }
>>>>   }
>>>> @@ -82,6 +107,8 @@ fn set_secret(tokenid: &Authid, secret: &str) -> 
>>>> Result<(), Error> {
>>>>       data.insert(tokenid.clone(), hashed_secret);
>>>>       write_file(data)?;
>>>> +    cache_insert_secret(tokenid.clone(), secret.to_owned());
>>>
>>> this
>>>
>>>> +
>>>>       Ok(())
>>>>   }
>>>> @@ -97,5 +124,34 @@ pub fn delete_secret(tokenid: &Authid) -> 
>>>> Result<(), Error> {
>>>>       data.remove(tokenid);
>>>>       write_file(data)?;
>>>> +    cache_remove_secret(tokenid);
>>>
>>> and this need to block of course and can't be skipped, because 
>>> otherwise the
>>> read above might operate on wrong data..
>>>
>>>> +
>>>>       Ok(())
>>>>   }
>>>> +
>>>> +struct ApiTokenSecretCache {
>>>> +    /// Keys are token Authids, values are the corresponding plain 
>>>> text secrets.
>>>> +    /// Entries are added after a successful on-disk verification in
>>>> +    /// `verify_secret` or when a new token secret is generated by
>>>> +    /// `generate_and_set_secret`. Used to avoid repeated
>>>> +    /// password-hash computation on subsequent authentications.
>>>> +    secrets: HashMap<Authid, String>,
>>>> +}
>>>> +
>>>> +fn token_secret_cache() -> &'static RwLock<ApiTokenSecretCache> {
>>>> +    TOKEN_SECRET_CACHE.get_or_init(|| {
>>>> +        RwLock::new(ApiTokenSecretCache {
>>>> +            secrets: HashMap::new(),
>>>> +        })
>>>> +    })
>>>> +}
>>>> +
>>>> +fn cache_insert_secret(tokenid: Authid, secret: String) {
>>>> +    let mut cache = token_secret_cache().write().unwrap();
>>>> +    cache.secrets.insert(tokenid, secret);
>>>> +}
>>>> +
>>>> +fn cache_remove_secret(tokenid: &Authid) {
>>>> +    let mut cache = token_secret_cache().write().unwrap();
>>>> +    cache.secrets.remove(tokenid);
>>>> +}
>>>> -- 
>>>> 2.47.3
>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> pbs-devel mailing list
>>>> pbs-devel@lists.proxmox.com
>>>> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
>>>>
>>>>
>>
>>
>>
>> _______________________________________________
>> pbs-devel mailing list
>> pbs-devel@lists.proxmox.com
>> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
> 
> 
> 
> _______________________________________________
> pbs-devel mailing list
> pbs-devel@lists.proxmox.com
> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel

_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel