Re: [pbs-devel] [PATCH proxmox-backup v3 2/4] pbs-config: cache verified API token secrets

From: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
To: "Fabian Grünbichler" <f.gruenbichler@proxmox.com>,
	"Proxmox Backup Server development discussion"
	<pbs-devel@lists.proxmox.com>
Subject: Re: [pbs-devel] [PATCH proxmox-backup v3 2/4] pbs-config: cache verified API token secrets
Date: Fri, 16 Jan 2026 17:56:13 +0100	[thread overview]
Message-ID: <035958cb-e9f6-4511-af7b-2269a99f5522@proxmox.com> (raw)
In-Reply-To: <176857922498.137827.16978965567100463552@yuna.proxmox.com>

On 1/16/26 4:59 PM, Fabian Grünbichler wrote:
> Quoting Samuel Rufinatscha (2026-01-16 16:13:17)
>> On 1/14/26 11:44 AM, Fabian Grünbichler wrote:
>>> On January 2, 2026 5:07 pm, Samuel Rufinatscha wrote:
>>>> Currently, every token-based API request reads the token.shadow file and
>>>> runs the expensive password hash verification for the given token
>>>> secret. This shows up as a hotspot in /status profiling (see
>>>> bug #7017 [1]).
>>>>
>>>> This patch introduces an in-memory cache of successfully verified token
>>>> secrets. Subsequent requests for the same token+secret combination only
>>>> perform a comparison using openssl::memcmp::eq and avoid re-running the
>>>> password hash. The cache is updated when a token secret is set and
>>>> cleared when a token is deleted. Note, this does NOT include manual
>>>> config changes, which will be covered in a subsequent patch.
>>>>
>>>> This patch is part of the series which fixes bug #7017 [1].
>>>>
>>>> [1] https://bugzilla.proxmox.com/show_bug.cgi?id=7017
>>>>
>>>> Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
>>>> ---
> 
> [..]
> 
>>>> +
>>>> +// Tries to match the given token secret against the cached secret.
>>>> +// Checks the generation before and after the constant-time compare to avoid a
>>>> +// TOCTOU window. If another process rotates/deletes a token while we're validating
>>>> +// the cached secret, the generation will change, and we
>>>> +// must not trust the cache for this request.
>>>> +fn cache_try_secret_matches(tokenid: &Authid, secret: &str) -> bool {
>>>> +    let Some(cache) = TOKEN_SECRET_CACHE.try_read() else {
>>>> +        return false;
>>>> +    };
>>>> +    let Some(entry) = cache.secrets.get(tokenid) else {
>>>> +        return false;
>>>> +    };
>>>> +
>>>> +    let cache_gen = cache.shared_gen;
>>>> +
>>>> +    let Some(gen1) = token_shadow_shared_gen() else {
>>>> +        return false;
>>>> +    };
>>>> +    if gen1 != cache_gen {
>>>> +        return false;
>>>> +    }
>>>> +
>>>> +    let eq = openssl::memcmp::eq(entry.secret.as_bytes(), secret.as_bytes());
>>>
>>> should we invalidate the cache here for this particular authid in case
>>> of a mismatch, to avoid making brute forcing too easy/cheap?
>>>
>>
>> We are not doing a cheap reject, in mismatch we do still fall through to
>> verify_crypt_pw(). Evicting on mismatch could however enable cache
>> thrashing where wrong secrets for a known tokenid would evict cached
>> entries. So I think we should not invalidate here on mismatch.
> 
> forgot this part here, sorry. you are right, this *should* be okay. I do think
> the second generation check there serves no purpose though. the token config
> can change at any point after we've validated the secret using the old state,
> there is nothing we can do about that, and it's totally fine to accept a token
> that is modified at exactly the same moment, even if that same token wouldn't
> be valid 2 seconds later..
> 
> there has to be a point where we have to say "this token is valid", and at the
> point of memcmp here we have already:
> - verified we don't need to reload the file
> - verified we didn't have any API changes to the token config
> - verified that the secret matches what we have cached
> 
> redoing the first two changes after that point doesn't protect us against
> changes afterwards either, so we might as well not do that extra work that
> doesn't give us any extra safety guarantees anyway..

Agreed, the second generation check only narrows down a very small 
window around memcmp (tried to avoid the TOCTOU at this point), but as 
you said, it doesn’t provide a strong additional guarantee and is 
unnecessary. Will remove!

> 
>>
>>>> +    let Some(gen2) = token_shadow_shared_gen() else {
>>>> +        return false;
>>>> +    };
>>>> +
>>>> +    eq && gen2 == cache_gen
>>>> +}
>>>> +
>>>> +fn apply_api_mutation(tokenid: &Authid, new_secret: Option<&str>) {
>>>> +    // Signal cache invalidation to other processes (best-effort).
>>>> +    let new_shared_gen = bump_token_shadow_shared_gen();
>>>> +
>>>> +    let mut cache = TOKEN_SECRET_CACHE.write();
> 
> because I mentioned switching those two around - this actually requires more
> thought I think..
> 
> right now, calling apply_api_mutation happens under a lock, but there are other
> calls that bump the generation, so this is actually racy here. OTOH, bumping
> the generation before locking the cache means faster cache invalidation..

Yes, I favored to bump the gen before the write lock for faster cache
invalidation / for better security.
> 
> maybe we should re-verify the generation after obtaining the lock? and maybe
> make apply_api_mutation consume the shadow config file lock, to ensure it's
> only called while that lock is being held?

Agree, I think we should re-verify the generation after the write lock.
Also agree, I think we should pass the file lock down. Good idea! :)
This should make it more robust.

> 
>>>> +
>>>> +    // If we cannot read/bump the shared generation, we cannot safely trust the cache.
>>>> +    let Some(gen) = new_shared_gen else {
>>>> +        invalidate_cache_state(&mut cache);
>>>> +        cache.shared_gen = 0;
>>>> +        return;
>>>> +    };
>>>> +
>>>> +    // Update to the post-mutation generation.
>>>> +    cache.shared_gen = gen;
>>>> +
>>>> +    // Apply the new mutation.
>>>> +    match new_secret {
>>>> +        Some(secret) => {
>>>> +            cache.secrets.insert(
>>>> +                tokenid.clone(),
>>>> +                CachedSecret {
>>>> +                    secret: secret.to_owned(),
>>>> +                },
>>>> +            );
>>>> +        }
>>>> +        None => {
>>>> +            cache.secrets.remove(tokenid);
>>>> +        }
>>>> +    }
>>>> +}
>>>> +
>>>> +/// Get the current shared generation.
>>>> +fn token_shadow_shared_gen() -> Option<usize> {
>>>> +    crate::ConfigVersionCache::new()
>>>> +        .ok()
>>>> +        .map(|cvc| cvc.token_shadow_generation())
>>>> +}
>>>> +
>>>> +/// Bump and return the new shared generation.
>>>> +fn bump_token_shadow_shared_gen() -> Option<usize> {
>>>> +    crate::ConfigVersionCache::new()
>>>> +        .ok()
>>>> +        .map(|cvc| cvc.increase_token_shadow_generation() + 1)
>>>> +}
>>>> +
>>>> +/// Invalidates the cache state and only keeps the shared generation.
>>>
>>> both calls to this actually set the cached generation to some value
>>> right after, so maybe this should take a generation directly and set it?
>>>
>>
>> patch 3/4 doesn’t always update the gen on cache invalidation
>> (shadow_mtime_len() error branch in apply_api_mutation) but most other
>> call sites do. Agreed this can be refactored, maybe:
>>
>> fn invalidate_cache_state(cache: &mut ApiTokenSecretCache) {
>>       cache.secrets.clear();
>>       // clear other cache fields (mtime/len/last_checked) as needed
>> }
>>
>> fn invalidate_cache_state_and_set_gen(cache: &mut ApiTokenSecretCache,
>> gen: usize) {
>>       invalidate_cache_state(cache);
>>       cache.shared_gen = gen;
>> }
>>
>> We could also do a single helper with Option<usize> but two helpers make
>> the call sites more explicit.
>>
>>>> +fn invalidate_cache_state(cache: &mut ApiTokenSecretCache) {
>>>> +    cache.secrets.clear();
>>>> +}
>>>> -- 
>>>> 2.47.3
>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> pbs-devel mailing list
>>>> pbs-devel@lists.proxmox.com
>>>> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
>>>>
>>>
>>>
>>> _______________________________________________
>>> pbs-devel mailing list
>>> pbs-devel@lists.proxmox.com
>>> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
>>
>>

_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel