* [pbs-devel] [PATCH proxmox-backup v2 3/4] partial fix #6049: datastore: use config fast-path in Drop
@ 2025-11-14 15:05 16% ` Samuel Rufinatscha
0 siblings, 0 replies; 39+ results
From: Samuel Rufinatscha @ 2025-11-14 15:05 UTC (permalink / raw)
To: pbs-devel
The Drop impl of DataStore re-read datastore.cfg to decide whether
the entry should be evicted from the in-process cache (based on
maintenance mode’s clear_from_cache). During the investigation of
issue #6049 [1], a flamegraph [2] showed that the config reload in Drop
accounted for a measurable share of CPU time under load.
This patch adds the datastore config fast path to the Drop impl to
eventually avoid an expensive config reload from disk to capture
the maintenance mandate.
Links
[1] Bugzilla: https://bugzilla.proxmox.com/show_bug.cgi?id=6049
[2] cargo-flamegraph: https://github.com/flamegraph-rs/flamegraph
Fixes: #6049
Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
---
pbs-datastore/src/datastore.rs | 43 +++++++++++++++++++++++++++-------
1 file changed, 34 insertions(+), 9 deletions(-)
diff --git a/pbs-datastore/src/datastore.rs b/pbs-datastore/src/datastore.rs
index e7748872..0fabf592 100644
--- a/pbs-datastore/src/datastore.rs
+++ b/pbs-datastore/src/datastore.rs
@@ -214,15 +214,40 @@ impl Drop for DataStore {
// remove datastore from cache iff
// - last task finished, and
// - datastore is in a maintenance mode that mandates it
- let remove_from_cache = last_task
- && pbs_config::datastore::config()
- .and_then(|(s, _)| s.lookup::<DataStoreConfig>("datastore", self.name()))
- .is_ok_and(|c| {
- c.get_maintenance_mode()
- .is_some_and(|m| m.clear_from_cache())
- });
-
- if remove_from_cache {
+
+ // first check: check if last task finished
+ if !last_task {
+ return;
+ }
+
+ let (section_config, _gen) = match datastore_section_config_cached() {
+ Ok(v) => v,
+ Err(err) => {
+ log::error!(
+ "failed to load datastore config in Drop for {} - {err}",
+ self.name()
+ );
+ return;
+ }
+ };
+
+ let datastore_cfg: DataStoreConfig =
+ match section_config.lookup("datastore", self.name()) {
+ Ok(cfg) => cfg,
+ Err(err) => {
+ log::error!(
+ "failed to look up datastore '{}' in Drop - {err}",
+ self.name()
+ );
+ return;
+ }
+ };
+
+ // second check: check maintenance mode mandate
+ if datastore_cfg
+ .get_maintenance_mode()
+ .is_some_and(|m| m.clear_from_cache())
+ {
DATASTORE_MAP.lock().unwrap().remove(self.name());
}
}
--
2.47.3
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
^ permalink raw reply [relevance 16%]
* [pbs-devel] applied: [PATCH proxmox] fix #6913: auth-api: fix user ID parsing for 2-character realms
2025-11-03 16:26 15% [pbs-devel] [PATCH proxmox] fix #6913: auth-api: fix user ID parsing for 2-character realms Samuel Rufinatscha
2025-11-11 10:40 5% ` Fabian Grünbichler
@ 2025-11-14 10:34 5% ` Fabian Grünbichler
1 sibling, 0 replies; 39+ results
From: Fabian Grünbichler @ 2025-11-14 10:34 UTC (permalink / raw)
To: pbs-devel, Samuel Rufinatscha
On Mon, 03 Nov 2025 17:26:05 +0100, Samuel Rufinatscha wrote:
> PVE and PBS both allow creating realms with names of length ≥ 2.
> However, when creating a user, PBS rejected realms with 2 characters
> (e.g. `test@aa`), while PVE accepted them. This issue was reported
> in our bug tracker [1]. Since the issue appears in the underlying
> `proxmox/proxmox-auth-api` crate, also PDM userid handling is
> affected.
>
> [...]
Applied, thanks!
[1/1] fix #6913: auth-api: fix user ID parsing for 2-character realms
commit: 2ac004ec5516beb93000544bb709dccd8c48f116
Best regards,
--
Fabian Grünbichler <f.gruenbichler@proxmox.com>
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
^ permalink raw reply [relevance 5%]
* Re: [pbs-devel] [PATCH proxmox-backup 1/3] partial fix #6049: datastore: impl ConfigVersionCache fast path for lookups
2025-11-12 13:24 4% ` Fabian Grünbichler
@ 2025-11-13 12:59 6% ` Samuel Rufinatscha
0 siblings, 0 replies; 39+ results
From: Samuel Rufinatscha @ 2025-11-13 12:59 UTC (permalink / raw)
To: Proxmox Backup Server development discussion, Fabian Grünbichler
On 11/12/25 2:24 PM, Fabian Grünbichler wrote:
> On November 11, 2025 1:29 pm, Samuel Rufinatscha wrote:
>> Repeated /status requests caused lookup_datastore() to re-read and
>> parse datastore.cfg on every call. The issue was mentioned in report
>> #6049 [1]. cargo-flamegraph [2] confirmed that the hot path is
>> dominated by pbs_config::datastore::config() (config parsing).
>>
>> This patch adds a fast path to lookup_datastore() using the shared-
>> memory ConfigVersionCache generation counter for datastore.cfg. It
>> stores the last seen generation number alongside the cached
>> DataStoreImpl and, when a subsequent lookup sees the same generation,
>> we reuse the cached instance without re-reading the on-disk config. If
>> the generation differs (or the cache is unavailable), it falls back to
>> the existing slow path with no behavioral changes.
>>
>> Behavioral notes
>>
>> - The generation is bumped via the existing save_config() path, so
>> API-driven config changes are detected immediately.
>> - Manual edits to datastore.cfg are not detected by this commit; a TTL
>> guard is introduced in a later patch to cover that case.
>> - DataStore::drop still performs a config read on the common path; this
>> is addressed in the next patch.
>>
>> Testing
>>
>> cargo-flamegraph confirms the config-parse hotspot in
>> lookup_datastore() disappears from the hot path.
>>
>> Additionally, an end-to-end benchmark was performed using the
>> `/status?verbose=0` API with 1000 datastores, 5 requests per store,
>> and 16-way parallelism:
>>
>> | Metric | Before | After | Δ (abs) | Δ (%) |
>> |--------------------------|:------:|:------:|:-------:|:-------:|
>> | Total time | 13s | 11s | −2s | −15.38% |
>> | Throughput (all rounds) | 384.62 | 454.55 | +69.93 | +18.18% |
>> | Cold RPS (round #1) | 76.92 | 90.91 | +13.99 | +18.19% |
>> | Warm RPS (rounds 2..N) | 307.69 | 363.64 | +55.95 | +18.18% |
>>
>> Throughput improved by ~18% overall, with total time reduced by ~15%.
>> Warm lookups now reuse cached datastore instances and skip redundant
>> config parsing entirely. The first-access round also shows a similar
>> improvement, likely due to reduced contention and I/O when many stores
>> are accessed in parallel.
>>
>> Note: A second hotspot remains in Drop where a config read occurs; that
>> is addressed in the next commit.
>
> it would be interesting to also have numbers for just lookup+Drop,
> without all the HTTP and TLS overhead to really isolate it. that should
> also make it easier to reliably benchmark with something like hyperfine
> ;)
>
Good point, I will isolate the numbers - seems like the TLS overhead is
quite huge. Thanks for the hyperfine reference!
> for my simple config (5 datastores) single-threaded lookup+drop of a
> single datastore 100k times gives around 1.31 speedup for the whole
> series. the slightly modified version from below (which basically runs
> the most expensive part of Drop only once) for the same test setup still
> gives a speedup of 1.17
>
So the lookup optimization dominates the speedup if we hold longer to
the datastores, great to see.
> re-running the same benchmark with a config with 1k datastores, querying> M datastores N times gives the following speedups:
>
> M = 1, N = 1000: 15.6x faster
> M = 10, N = 100: 14.5x
> M = 100, N = 10: 8.8x
> M = 1000, N = 1: 1.8x (so this is basically showing the speedup of the
> improved Drop handling alone!)
>
> and then a slightly modified version, that keeps the DataStore instance
> around until all N lookups for that datastore are done, then dropping
> them in bulk (which mimics having lots of lookups while a task is
> running, making the Drop handler only do the expensive part every once
> in a while when the last task for a given datastore exits):
>
> use anyhow::Error;
>
> use pbs_api_types::Operation;
> use pbs_datastore::DataStore;
>
> fn main() -> Result<(), Error> {
> let mut args = std::env::args();
> args.next();
>
> let datastores = if let Some(n) = args.next() {
> n.parse::<usize>()?
> } else {
> 1000
> };
>
> let iterations = if let Some(n) = args.next() {
> n.parse::<usize>()?
> } else {
> 1000
> };
>
> for d in 1..=datastores {
> let name = format!("pbs_bench_{d}");
> let mut stores = Vec::with_capacity(iterations);
> for i in 1..=iterations {
> stores.push(DataStore::lookup_datastore(&name, Some(Operation::Write))?);
> }
> }
>
> Ok(())
> }
>
> M = 1, N = 1000: 8.5x faster
> M = 10, N = 100: 7.9x
> M = 100, N = 10: 5.2x
> M = 1000, N = 1: 1.9x
>
> looking at the flamegraph of this isolated benchmark it's now obvious
> that the remaining overhead is lockfiles and tracking the operations
> (both in lookup and when dropping)..
>
Thanks a lot also for setting up the test environment and providing your
numbers, which are helpful to compare against!
Regarding the overhead in lockfiles and tracking operations, it is a
good confirmation that everything else on the hot path is optimized - I
think the locks and tracking operations could maybe be revisited in a
future effort!
> side-note: I actually found a bug in our operations tracking while
> benchmarking this series that gave me wildly different numbers because
> the "drop last task" part never got executed as a result:
>
> https://lore.proxmox.com/pbs-devel/20251112131525.645971-1-f.gruenbichler@proxmox.com/T/#u
>
Nice catch!
>>
>> Links
>>
>> [1] Bugzilla: https://bugzilla.proxmox.com/show_bug.cgi?id=6049
>> [2] cargo-flamegraph: https://github.com/flamegraph-rs/flamegraph
>>
>> Fixes: #6049
>> Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
>> ---
>> pbs-config/src/config_version_cache.rs | 10 +++-
>> pbs-datastore/src/datastore.rs | 77 +++++++++++++++++---------
>> 2 files changed, 59 insertions(+), 28 deletions(-)
>>
>> diff --git a/pbs-config/src/config_version_cache.rs b/pbs-config/src/config_version_cache.rs
>> index e8fb994f..b875f7e0 100644
>> --- a/pbs-config/src/config_version_cache.rs
>> +++ b/pbs-config/src/config_version_cache.rs
>> @@ -26,7 +26,6 @@ struct ConfigVersionCacheDataInner {
>> // Traffic control (traffic-control.cfg) generation/version.
>> traffic_control_generation: AtomicUsize,
>> // datastore (datastore.cfg) generation/version
>> - // FIXME: remove with PBS 3.0
>> datastore_generation: AtomicUsize,
>> // Add further atomics here
>> }
>> @@ -145,8 +144,15 @@ impl ConfigVersionCache {
>> .fetch_add(1, Ordering::AcqRel);
>> }
>>
>> + /// Returns the datastore generation number.
>> + pub fn datastore_generation(&self) -> usize {
>> + self.shmem
>> + .data()
>> + .datastore_generation
>> + .load(Ordering::Acquire)
>> + }
>> +
>> /// Increase the datastore generation number.
>> - // FIXME: remove with PBS 3.0 or make actually useful again in datastore lookup
>> pub fn increase_datastore_generation(&self) -> usize {
>> self.shmem
>> .data()
>
> this part could be split out into its own patch
>
Will factor this out into a separate patch.
>> diff --git a/pbs-datastore/src/datastore.rs b/pbs-datastore/src/datastore.rs
>> index 70af94d8..18eebb58 100644
>> --- a/pbs-datastore/src/datastore.rs
>> +++ b/pbs-datastore/src/datastore.rs
>> @@ -32,7 +32,7 @@ use pbs_api_types::{
>> MaintenanceType, Operation, UPID,
>> };
>> use pbs_config::s3::S3_CFG_TYPE_ID;
>> -use pbs_config::BackupLockGuard;
>> +use pbs_config::{BackupLockGuard, ConfigVersionCache};
>>
>> use crate::backup_info::{
>> BackupDir, BackupGroup, BackupInfo, OLD_LOCKING, PROTECTED_MARKER_FILENAME,
>> @@ -140,10 +140,10 @@ pub struct DataStoreImpl {
>> last_gc_status: Mutex<GarbageCollectionStatus>,
>> verify_new: bool,
>> chunk_order: ChunkOrder,
>> - last_digest: Option<[u8; 32]>,
>> sync_level: DatastoreFSyncLevel,
>> backend_config: DatastoreBackendConfig,
>> lru_store_caching: Option<LocalDatastoreLruCache>,
>> + cached_config_tag: Option<CachedDatastoreConfigTag>,
>> }
>>
>> impl DataStoreImpl {
>> @@ -156,10 +156,10 @@ impl DataStoreImpl {
>> last_gc_status: Mutex::new(GarbageCollectionStatus::default()),
>> verify_new: false,
>> chunk_order: Default::default(),
>> - last_digest: None,
>> sync_level: Default::default(),
>> backend_config: Default::default(),
>> lru_store_caching: None,
>> + cached_config_tag: None,
>> })
>> }
>> }
>> @@ -224,6 +224,15 @@ pub enum DatastoreBackend {
>> S3(Arc<S3Client>),
>> }
>>
>> +/// Used to determine whether a cached datastore instance is still valid
>> +/// or needs to be reloaded after a config change.
>> +struct CachedDatastoreConfigTag {
>> + /// Maintenance mode at the time the lookup hint was captured, if any.
>> + last_maintenance_mode: Option<MaintenanceMode>,
>> + /// Datastore generation number from `ConfigVersionCache`; `None` when the cache wasn't available.
>> + last_generation: Option<usize>,
>
> if the whole tag is an option, do we really need to make the generation
> an option as well?
>
Good point, keeping the tag optional only is enough and will simplify
generation checks.
>> +}
>> +
>> impl DataStore {
>> // This one just panics on everything
>> #[doc(hidden)]
>> @@ -299,13 +308,40 @@ impl DataStore {
>> // we use it to decide whether it is okay to delete the datastore.
>> let _config_lock = pbs_config::datastore::lock_config()?;
>>
>> - // we could use the ConfigVersionCache's generation for staleness detection, but we load
>> - // the config anyway -> just use digest, additional benefit: manual changes get detected
>> - let (config, digest) = pbs_config::datastore::config()?;
>> + // Get the current datastore.cfg generation number
>> + let gen_num = ConfigVersionCache::new()
>> + .ok()
>> + .map(|c| c.datastore_generation());
>> +
>> + // Fast-path: if we have a cached entry created under the same datastore.cfg generation number, reuse it.
>> + if let (Some(gen_num), Some(ds)) =
>> + (gen_num, DATASTORE_MAP.lock().unwrap().get(name).cloned())
>> + {
>> + if let Some(cached_tag) = &ds.cached_config_tag {
>> + if cached_tag.last_generation == Some(gen_num) {
>> + if let Some(mm) = &cached_tag.last_maintenance_mode {
>> + if let Err(error) = mm.check(operation) {
>> + bail!("datastore '{name}' is unavailable: {error}");
>> + }
>> + }
>> + if let Some(operation) = operation {
>> + update_active_operations(name, operation, 1)?;
>> + }
>> + return Ok(Arc::new(Self {
>> + inner: ds,
>> + operation,
>> + }));
>> + }
>> + }
>> + }
>> +
>> + // Slow path: (re)load config
>> + let (config, _digest) = pbs_config::datastore::config()?;
>> let config: DataStoreConfig = config.lookup("datastore", name)?;
>>
>> - if let Some(maintenance_mode) = config.get_maintenance_mode() {
>> - if let Err(error) = maintenance_mode.check(operation) {
>> + let maintenance_mode = config.get_maintenance_mode();
>> + if let Some(mm) = &maintenance_mode {
>> + if let Err(error) = mm.check(operation) {
>> bail!("datastore '{name}' is unavailable: {error}");
>> }
>> }
>
> after this here we have a check for the mount status in the old hot
> path, that is missing in the new hot path. the mount status can change
> even if the config doesn't, so we should probably add this back to the
> hot path and re-run the numbers?
>
> that check again needs more parts of the config, so maybe we could
> explore caching the full config here? e.g., add a new static
> Mutex<Option<(DataStoreConfig, usize)>> (extended by the timestamp in
> the last patch) and adapt the patches here to use it? depending on
> whether we make the cached config available outside of lookup_datastore,
> we could then even not add the maintenance mode to the cache tag, and
> just store the generation number there, and retrieve the maintenance
> mode from the cached config in the Drop implementation..
>
For the mount check we will need device_uuid and datastore mount dir
path which we could add to the cached entry, however I think I would
also rather explore caching of the full global config here. This should
result in further performance gains - this would/should allow for
eventual fast-lookups to any other datastore. Will try to integrate it
in v2.
> there's a little more duplication here, e.g. we lock the map and check
> for an entry in both the fast and slow paths, we could do it once up
> front (nobody can change the map while we have it locked anyway), the
> checks are written twice and could probably be extracted into a helper
> so that future similar checks are added to both paths and not to only
> one by accident, ..
>
Good point, I will factor it out (e.g. small closure).
>> @@ -321,16 +357,6 @@ impl DataStore {
>>
>> // reuse chunk store so that we keep using the same process locker instance!
>> let chunk_store = if let Some(datastore) = &entry {
>> - let last_digest = datastore.last_digest.as_ref();
>> - if let Some(true) = last_digest.map(|last_digest| last_digest == &digest) {
>> - if let Some(operation) = operation {
>> - update_active_operations(name, operation, 1)?;
>> - }
>> - return Ok(Arc::new(Self {
>> - inner: Arc::clone(datastore),
>> - operation,
>> - }));
>> - }
>> Arc::clone(&datastore.chunk_store)
>> } else {
>> let tuning: DatastoreTuning = serde_json::from_value(
>> @@ -344,7 +370,11 @@ impl DataStore {
>> )?)
>> };
>>
>> - let datastore = DataStore::with_store_and_config(chunk_store, config, Some(digest))?;
>> + let mut datastore = DataStore::with_store_and_config(chunk_store, config)?;
>> + datastore.cached_config_tag = Some(CachedDatastoreConfigTag {
>> + last_maintenance_mode: maintenance_mode,
>> + last_generation: gen_num,
>> + });
>
> this part should be in with_store_and_config, which should get the
> last_generation (and later last_update) as parameter(s), just like it
> had the digest before this patch..
>
Agree, I will move this part.
>>
>> let datastore = Arc::new(datastore);
>> datastore_cache.insert(name.to_string(), datastore.clone());
>> @@ -430,11 +460,7 @@ impl DataStore {
>> config.absolute_path(),
>> tuning.sync_level.unwrap_or_default(),
>> )?;
>> - let inner = Arc::new(Self::with_store_and_config(
>> - Arc::new(chunk_store),
>> - config,
>> - None,
>> - )?);
>> + let inner = Arc::new(Self::with_store_and_config(Arc::new(chunk_store), config)?);
>>
>> if let Some(operation) = operation {
>> update_active_operations(&name, operation, 1)?;
>> @@ -446,7 +472,6 @@ impl DataStore {
>> fn with_store_and_config(
>> chunk_store: Arc<ChunkStore>,
>> config: DataStoreConfig,
>> - last_digest: Option<[u8; 32]>,
>> ) -> Result<DataStoreImpl, Error> {
>> let mut gc_status_path = chunk_store.base_path();
>> gc_status_path.push(".gc-status");
>> @@ -506,10 +531,10 @@ impl DataStore {
>> last_gc_status: Mutex::new(gc_status),
>> verify_new: config.verify_new.unwrap_or(false),
>> chunk_order: tuning.chunk_order.unwrap_or_default(),
>> - last_digest,
>> sync_level: tuning.sync_level.unwrap_or_default(),
>> backend_config,
>> lru_store_caching,
>> + cached_config_tag: None,
>> })
>> }
>>
>> --
>> 2.47.3
>>
>>
>>
>> _______________________________________________
>> pbs-devel mailing list
>> pbs-devel@lists.proxmox.com
>> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
>>
>
>
> _______________________________________________
> pbs-devel mailing list
> pbs-devel@lists.proxmox.com
> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
^ permalink raw reply [relevance 6%]
* Re: [pbs-devel] [PATCH proxmox-backup 0/3] datastore: remove config reload on hot path
2025-11-12 11:27 5% ` [pbs-devel] [PATCH proxmox-backup 0/3] datastore: remove config reload on hot path Fabian Grünbichler
@ 2025-11-12 17:27 6% ` Samuel Rufinatscha
0 siblings, 0 replies; 39+ results
From: Samuel Rufinatscha @ 2025-11-12 17:27 UTC (permalink / raw)
To: Proxmox Backup Server development discussion, Fabian Grünbichler
On 11/12/25 12:27 PM, Fabian Grünbichler wrote:
> On November 11, 2025 1:29 pm, Samuel Rufinatscha wrote:
>> Hi,
>>
>> this series reduces CPU time in datastore lookups by avoiding repeated
>> datastore.cfg reads/parses in both `lookup_datastore()` and
>> `DataStore::Drop`. It also adds a TTL so manual config edits are
>> noticed without reintroducing hashing on every request.
>>
>> While investigating #6049 [1], cargo-flamegraph [2] showed hotspots during
>> repeated `/status` calls in `lookup_datastore()` and in `Drop`,
>> dominated by `pbs_config::datastore::config()` (config parse).
>>
>> The parsing cost itself should eventually be investigated in a future
>> effort. Furthermore, cargo-flamegraph showed that when using a
>> token-based auth method to access the API, a significant amount of time
>> is spent in validation on every request, likely related to bcrypt.
>> Also this should be eventually revisited in a future effort.
>
> please file a bug for the token part, if there isn't already one!
>
Thanks for the in-depth review Fabian! I created a bug report for the
token part and added the relevant flamegraph - this should help narrow
down the issue: https://bugzilla.proxmox.com/show_bug.cgi?id=7017
> thanks for diving into this, it already looks promising, even though the
> effect on more "normal" systems with more reasonable numbers of
> datastores and clients will be less pronounced ;)
>
> the big TL;DR would be that we trade faster datastore lookups (which
> happen quite frequently, in particular if there are many datastores with
> clients checking their status) against slightly delayed reload of the
> configuration in case of manual, behind-our-backs edits, with one
> particular corner case that is slightly problematic, but also a bit
> contrived:
> - datastore is looked up
> - config is edited (manually) to set maintenance mode to one that
> requires removing from the datastore map once the last task exits
> - last task drops the datastore struct
> - no regular edits happened in the meantime
> - the Drop handler doesn't know it needs to remove the datastore from
> the map
> - open FD is held by proxy, datastore fails to be unmounted/..
>
> we could solve this issue by not only bumping the generation on save,
> but also when we reload the config (in particular if we cache the whole
> config!). that would make the Drop handler efficient enough for idle
> systems that have mostly lookups but no long running tasks. as soon as a
> datastore has long running tasks, the last such task will likely exit
> long after the TTL for its config lookup has expired, so will need to do
> a refresh - although that refresh could again be from the global cache,
> instead of from disk? still wouldn't close the window entirely, but make
> it pretty unlikely to be hit in practice..
>
>
> _______________________________________________
> pbs-devel mailing list
> pbs-devel@lists.proxmox.com
> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
>
>
Good idea! I will add the bump in the lookup_datastore() slow path
directly after (config, digest) is read and increment the generation if
the digest changed but generation hasn’t - this should also help avoid
unnecessary cache invalidations.
In Drop we then either check if the shared gen differs from the cached
tag gen or the tag is TTL expired, otherwise use the cached decision.
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
^ permalink raw reply [relevance 6%]
* Re: [pbs-devel] [PATCH proxmox-backup 2/3] partial fix #6049: datastore: use config fast-path in Drop
2025-11-12 11:24 5% ` Fabian Grünbichler
@ 2025-11-12 15:20 6% ` Samuel Rufinatscha
0 siblings, 0 replies; 39+ results
From: Samuel Rufinatscha @ 2025-11-12 15:20 UTC (permalink / raw)
To: Proxmox Backup Server development discussion, Fabian Grünbichler
On 11/12/25 12:25 PM, Fabian Grünbichler wrote:
> On November 11, 2025 1:29 pm, Samuel Rufinatscha wrote:
>> The Drop impl of DataStore re-read datastore.cfg to decide whether
>> the entry should be evicted from the in-process cache (based on
>> maintenance mode’s clear_from_cache). During the investigation of
>> issue #6049 [1], a flamegraph [2] showed that the config reload in Drop
>> accounted for a measurable share of CPU time under load.
>>
>> This patch makes Drop O(1) on the fast path by reusing the maintenance-
>
> I am not sure what the O(1) is refering to? This patch implements a
> faster cache lookup in front of the (slow) config parsing variant, but
> that doesn't really align well with what the "Big O" notation tries to
> express ;)
>
> The parsing below still scales with the number of datastores in the
> config, after all. It can just be skipped sometimes :)
>
Good point — the O(1) reference is a rather misleading. I’ll rephrase it
in v2 :)
>> mode decision captured at lookup time and stored with the cached
>> datastore entry. When the last reference goes away we:
>> - decrement active-operation counters, and
>> - evict only if the cached decision mandates eviction.
>>
>> If the cache tag is absent or not fresh, a subsequent slow-path lookup
>> will be performed.
>>
>> Testing
>>
>> Compared flamegraphs before and after: prior to this change
>> (on top of patch 1), stacks originating from Drop included
>> pbs_config::datastore::config(). After the change, those vanish from
>> the drop path.
>>
>> An end-to-end benchmark using `/status?verbose=0` with 1000 datastores,
>> 5 requests per store, and 16-way parallelism shows a further
>> improvement:
>>
>> | Metric | After commit 1 | After commit 2 | Δ (abs) | Δ (%) |
>> |-------------------------|:--------------:|:--------------:|:-------:|:-------:|
>> | Total time | 11s | 10s | −1s | −9.09% |
>> | Throughput (all rounds) | 454.55 | 500.00 | +45.45 | +10.00% |
>> | Cold RPS (round #1) | 90.91 | 100.00 | +9.09 | +10.00% |
>> | Warm RPS (rounds 2..N) | 363.64 | 400.00 | +36.36 | +10.00% |
>>
>> Optimizing Drop improves overall throughput by ~10%. The gain appears
>> in both cold and warm rounds, and the flamegraph confirms the config
>> reload no longer sits on the hot path.
>>
>> Links
>>
>> [1] Bugzilla: https://bugzilla.proxmox.com/show_bug.cgi?id=6049
>> [2] cargo-flamegraph: https://github.com/flamegraph-rs/flamegraph
>>
>> Fixes: #6049
>> Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
>> ---
>> pbs-datastore/src/datastore.rs | 31 +++++++++++++++++++++++++++----
>> 1 file changed, 27 insertions(+), 4 deletions(-)
>>
>> diff --git a/pbs-datastore/src/datastore.rs b/pbs-datastore/src/datastore.rs
>> index 18eebb58..da80416a 100644
>> --- a/pbs-datastore/src/datastore.rs
>> +++ b/pbs-datastore/src/datastore.rs
>> @@ -200,15 +200,38 @@ impl Drop for DataStore {
>> // remove datastore from cache iff
>> // - last task finished, and
>> // - datastore is in a maintenance mode that mandates it
>> - let remove_from_cache = last_task
>> - && pbs_config::datastore::config()
>> +
>> + // first check: check if last task finished
>> + if !last_task {
>> + return;
>> + }
>> +
>> + let cached_tag = self.inner.cached_config_tag.as_ref();
>> + let last_gen_num = cached_tag.and_then(|c| c.last_generation);
>> + let gen_num = ConfigVersionCache::new()
>> + .ok()
>> + .map(|c| c.datastore_generation());
>> +
>> + let cache_is_fresh = match (last_gen_num, gen_num) {
>> + (Some(a), Some(b)) => a == b,
>> + _ => false,
>> + };
>
> this is just last_gen_num == gen_num and checking that either is Some.
> if we make the tag always contain a generation instead of an option, we
> can simplify this code ;)
>
Good point, will adjust this. I think we could keep
`ConfigVersionCache::new().ok()` and create the optional cache tag only
if the generation number is `Some`. This way, the lookup would still be
able to perform a slow path read if the cache isn’t available for any
reason.
>> +
>> + let mm_mandate = if cache_is_fresh {
>> + cached_tag
>> + .and_then(|c| c.last_maintenance_mode.as_ref())
>> + .is_some_and(|m| m.clear_from_cache())
>> + } else {
>> + pbs_config::datastore::config()
>> .and_then(|(s, _)| s.lookup::<DataStoreConfig>("datastore", self.name()))
>> .is_ok_and(|c| {
>> c.get_maintenance_mode()
>> .is_some_and(|m| m.clear_from_cache())
>> - });
>> + })
>> + };
>>
>> - if remove_from_cache {
>> + // second check: check maintenance mode mandate
>> + if mm_mandate {
>> DATASTORE_MAP.lock().unwrap().remove(self.name());
>> }
>> }
>> --
>> 2.47.3
>>
>>
>>
>> _______________________________________________
>> pbs-devel mailing list
>> pbs-devel@lists.proxmox.com
>> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
>>
>
>
> _______________________________________________
> pbs-devel mailing list
> pbs-devel@lists.proxmox.com
> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
^ permalink raw reply [relevance 6%]
* Re: [pbs-devel] [PATCH proxmox-backup 1/3] partial fix #6049: datastore: impl ConfigVersionCache fast path for lookups
2025-11-11 12:29 12% ` [pbs-devel] [PATCH proxmox-backup 1/3] partial fix #6049: datastore: impl ConfigVersionCache fast path for lookups Samuel Rufinatscha
@ 2025-11-12 13:24 4% ` Fabian Grünbichler
2025-11-13 12:59 6% ` Samuel Rufinatscha
0 siblings, 1 reply; 39+ results
From: Fabian Grünbichler @ 2025-11-12 13:24 UTC (permalink / raw)
To: Proxmox Backup Server development discussion
On November 11, 2025 1:29 pm, Samuel Rufinatscha wrote:
> Repeated /status requests caused lookup_datastore() to re-read and
> parse datastore.cfg on every call. The issue was mentioned in report
> #6049 [1]. cargo-flamegraph [2] confirmed that the hot path is
> dominated by pbs_config::datastore::config() (config parsing).
>
> This patch adds a fast path to lookup_datastore() using the shared-
> memory ConfigVersionCache generation counter for datastore.cfg. It
> stores the last seen generation number alongside the cached
> DataStoreImpl and, when a subsequent lookup sees the same generation,
> we reuse the cached instance without re-reading the on-disk config. If
> the generation differs (or the cache is unavailable), it falls back to
> the existing slow path with no behavioral changes.
>
> Behavioral notes
>
> - The generation is bumped via the existing save_config() path, so
> API-driven config changes are detected immediately.
> - Manual edits to datastore.cfg are not detected by this commit; a TTL
> guard is introduced in a later patch to cover that case.
> - DataStore::drop still performs a config read on the common path; this
> is addressed in the next patch.
>
> Testing
>
> cargo-flamegraph confirms the config-parse hotspot in
> lookup_datastore() disappears from the hot path.
>
> Additionally, an end-to-end benchmark was performed using the
> `/status?verbose=0` API with 1000 datastores, 5 requests per store,
> and 16-way parallelism:
>
> | Metric | Before | After | Δ (abs) | Δ (%) |
> |--------------------------|:------:|:------:|:-------:|:-------:|
> | Total time | 13s | 11s | −2s | −15.38% |
> | Throughput (all rounds) | 384.62 | 454.55 | +69.93 | +18.18% |
> | Cold RPS (round #1) | 76.92 | 90.91 | +13.99 | +18.19% |
> | Warm RPS (rounds 2..N) | 307.69 | 363.64 | +55.95 | +18.18% |
>
> Throughput improved by ~18% overall, with total time reduced by ~15%.
> Warm lookups now reuse cached datastore instances and skip redundant
> config parsing entirely. The first-access round also shows a similar
> improvement, likely due to reduced contention and I/O when many stores
> are accessed in parallel.
>
> Note: A second hotspot remains in Drop where a config read occurs; that
> is addressed in the next commit.
it would be interesting to also have numbers for just lookup+Drop,
without all the HTTP and TLS overhead to really isolate it. that should
also make it easier to reliably benchmark with something like hyperfine
;)
for my simple config (5 datastores) single-threaded lookup+drop of a
single datastore 100k times gives around 1.31 speedup for the whole
series. the slightly modified version from below (which basically runs
the most expensive part of Drop only once) for the same test setup still
gives a speedup of 1.17
re-running the same benchmark with a config with 1k datastores, querying
M datastores N times gives the following speedups:
M = 1, N = 1000: 15.6x faster
M = 10, N = 100: 14.5x
M = 100, N = 10: 8.8x
M = 1000, N = 1: 1.8x (so this is basically showing the speedup of the
improved Drop handling alone!)
and then a slightly modified version, that keeps the DataStore instance
around until all N lookups for that datastore are done, then dropping
them in bulk (which mimics having lots of lookups while a task is
running, making the Drop handler only do the expensive part every once
in a while when the last task for a given datastore exits):
use anyhow::Error;
use pbs_api_types::Operation;
use pbs_datastore::DataStore;
fn main() -> Result<(), Error> {
let mut args = std::env::args();
args.next();
let datastores = if let Some(n) = args.next() {
n.parse::<usize>()?
} else {
1000
};
let iterations = if let Some(n) = args.next() {
n.parse::<usize>()?
} else {
1000
};
for d in 1..=datastores {
let name = format!("pbs_bench_{d}");
let mut stores = Vec::with_capacity(iterations);
for i in 1..=iterations {
stores.push(DataStore::lookup_datastore(&name, Some(Operation::Write))?);
}
}
Ok(())
}
M = 1, N = 1000: 8.5x faster
M = 10, N = 100: 7.9x
M = 100, N = 10: 5.2x
M = 1000, N = 1: 1.9x
looking at the flamegraph of this isolated benchmark it's now obvious
that the remaining overhead is lockfiles and tracking the operations
(both in lookup and when dropping)..
side-note: I actually found a bug in our operations tracking while
benchmarking this series that gave me wildly different numbers because
the "drop last task" part never got executed as a result:
https://lore.proxmox.com/pbs-devel/20251112131525.645971-1-f.gruenbichler@proxmox.com/T/#u
>
> Links
>
> [1] Bugzilla: https://bugzilla.proxmox.com/show_bug.cgi?id=6049
> [2] cargo-flamegraph: https://github.com/flamegraph-rs/flamegraph
>
> Fixes: #6049
> Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
> ---
> pbs-config/src/config_version_cache.rs | 10 +++-
> pbs-datastore/src/datastore.rs | 77 +++++++++++++++++---------
> 2 files changed, 59 insertions(+), 28 deletions(-)
>
> diff --git a/pbs-config/src/config_version_cache.rs b/pbs-config/src/config_version_cache.rs
> index e8fb994f..b875f7e0 100644
> --- a/pbs-config/src/config_version_cache.rs
> +++ b/pbs-config/src/config_version_cache.rs
> @@ -26,7 +26,6 @@ struct ConfigVersionCacheDataInner {
> // Traffic control (traffic-control.cfg) generation/version.
> traffic_control_generation: AtomicUsize,
> // datastore (datastore.cfg) generation/version
> - // FIXME: remove with PBS 3.0
> datastore_generation: AtomicUsize,
> // Add further atomics here
> }
> @@ -145,8 +144,15 @@ impl ConfigVersionCache {
> .fetch_add(1, Ordering::AcqRel);
> }
>
> + /// Returns the datastore generation number.
> + pub fn datastore_generation(&self) -> usize {
> + self.shmem
> + .data()
> + .datastore_generation
> + .load(Ordering::Acquire)
> + }
> +
> /// Increase the datastore generation number.
> - // FIXME: remove with PBS 3.0 or make actually useful again in datastore lookup
> pub fn increase_datastore_generation(&self) -> usize {
> self.shmem
> .data()
this part could be split out into its own patch
> diff --git a/pbs-datastore/src/datastore.rs b/pbs-datastore/src/datastore.rs
> index 70af94d8..18eebb58 100644
> --- a/pbs-datastore/src/datastore.rs
> +++ b/pbs-datastore/src/datastore.rs
> @@ -32,7 +32,7 @@ use pbs_api_types::{
> MaintenanceType, Operation, UPID,
> };
> use pbs_config::s3::S3_CFG_TYPE_ID;
> -use pbs_config::BackupLockGuard;
> +use pbs_config::{BackupLockGuard, ConfigVersionCache};
>
> use crate::backup_info::{
> BackupDir, BackupGroup, BackupInfo, OLD_LOCKING, PROTECTED_MARKER_FILENAME,
> @@ -140,10 +140,10 @@ pub struct DataStoreImpl {
> last_gc_status: Mutex<GarbageCollectionStatus>,
> verify_new: bool,
> chunk_order: ChunkOrder,
> - last_digest: Option<[u8; 32]>,
> sync_level: DatastoreFSyncLevel,
> backend_config: DatastoreBackendConfig,
> lru_store_caching: Option<LocalDatastoreLruCache>,
> + cached_config_tag: Option<CachedDatastoreConfigTag>,
> }
>
> impl DataStoreImpl {
> @@ -156,10 +156,10 @@ impl DataStoreImpl {
> last_gc_status: Mutex::new(GarbageCollectionStatus::default()),
> verify_new: false,
> chunk_order: Default::default(),
> - last_digest: None,
> sync_level: Default::default(),
> backend_config: Default::default(),
> lru_store_caching: None,
> + cached_config_tag: None,
> })
> }
> }
> @@ -224,6 +224,15 @@ pub enum DatastoreBackend {
> S3(Arc<S3Client>),
> }
>
> +/// Used to determine whether a cached datastore instance is still valid
> +/// or needs to be reloaded after a config change.
> +struct CachedDatastoreConfigTag {
> + /// Maintenance mode at the time the lookup hint was captured, if any.
> + last_maintenance_mode: Option<MaintenanceMode>,
> + /// Datastore generation number from `ConfigVersionCache`; `None` when the cache wasn't available.
> + last_generation: Option<usize>,
if the whole tag is an option, do we really need to make the generation
an option as well?
> +}
> +
> impl DataStore {
> // This one just panics on everything
> #[doc(hidden)]
> @@ -299,13 +308,40 @@ impl DataStore {
> // we use it to decide whether it is okay to delete the datastore.
> let _config_lock = pbs_config::datastore::lock_config()?;
>
> - // we could use the ConfigVersionCache's generation for staleness detection, but we load
> - // the config anyway -> just use digest, additional benefit: manual changes get detected
> - let (config, digest) = pbs_config::datastore::config()?;
> + // Get the current datastore.cfg generation number
> + let gen_num = ConfigVersionCache::new()
> + .ok()
> + .map(|c| c.datastore_generation());
> +
> + // Fast-path: if we have a cached entry created under the same datastore.cfg generation number, reuse it.
> + if let (Some(gen_num), Some(ds)) =
> + (gen_num, DATASTORE_MAP.lock().unwrap().get(name).cloned())
> + {
> + if let Some(cached_tag) = &ds.cached_config_tag {
> + if cached_tag.last_generation == Some(gen_num) {
> + if let Some(mm) = &cached_tag.last_maintenance_mode {
> + if let Err(error) = mm.check(operation) {
> + bail!("datastore '{name}' is unavailable: {error}");
> + }
> + }
> + if let Some(operation) = operation {
> + update_active_operations(name, operation, 1)?;
> + }
> + return Ok(Arc::new(Self {
> + inner: ds,
> + operation,
> + }));
> + }
> + }
> + }
> +
> + // Slow path: (re)load config
> + let (config, _digest) = pbs_config::datastore::config()?;
> let config: DataStoreConfig = config.lookup("datastore", name)?;
>
> - if let Some(maintenance_mode) = config.get_maintenance_mode() {
> - if let Err(error) = maintenance_mode.check(operation) {
> + let maintenance_mode = config.get_maintenance_mode();
> + if let Some(mm) = &maintenance_mode {
> + if let Err(error) = mm.check(operation) {
> bail!("datastore '{name}' is unavailable: {error}");
> }
> }
after this here we have a check for the mount status in the old hot
path, that is missing in the new hot path. the mount status can change
even if the config doesn't, so we should probably add this back to the
hot path and re-run the numbers?
that check again needs more parts of the config, so maybe we could
explore caching the full config here? e.g., add a new static
Mutex<Option<(DataStoreConfig, usize)>> (extended by the timestamp in
the last patch) and adapt the patches here to use it? depending on
whether we make the cached config available outside of lookup_datastore,
we could then even not add the maintenance mode to the cache tag, and
just store the generation number there, and retrieve the maintenance
mode from the cached config in the Drop implementation..
there's a little more duplication here, e.g. we lock the map and check
for an entry in both the fast and slow paths, we could do it once up
front (nobody can change the map while we have it locked anyway), the
checks are written twice and could probably be extracted into a helper
so that future similar checks are added to both paths and not to only
one by accident, ..
> @@ -321,16 +357,6 @@ impl DataStore {
>
> // reuse chunk store so that we keep using the same process locker instance!
> let chunk_store = if let Some(datastore) = &entry {
> - let last_digest = datastore.last_digest.as_ref();
> - if let Some(true) = last_digest.map(|last_digest| last_digest == &digest) {
> - if let Some(operation) = operation {
> - update_active_operations(name, operation, 1)?;
> - }
> - return Ok(Arc::new(Self {
> - inner: Arc::clone(datastore),
> - operation,
> - }));
> - }
> Arc::clone(&datastore.chunk_store)
> } else {
> let tuning: DatastoreTuning = serde_json::from_value(
> @@ -344,7 +370,11 @@ impl DataStore {
> )?)
> };
>
> - let datastore = DataStore::with_store_and_config(chunk_store, config, Some(digest))?;
> + let mut datastore = DataStore::with_store_and_config(chunk_store, config)?;
> + datastore.cached_config_tag = Some(CachedDatastoreConfigTag {
> + last_maintenance_mode: maintenance_mode,
> + last_generation: gen_num,
> + });
this part should be in with_store_and_config, which should get the
last_generation (and later last_update) as parameter(s), just like it
had the digest before this patch..
>
> let datastore = Arc::new(datastore);
> datastore_cache.insert(name.to_string(), datastore.clone());
> @@ -430,11 +460,7 @@ impl DataStore {
> config.absolute_path(),
> tuning.sync_level.unwrap_or_default(),
> )?;
> - let inner = Arc::new(Self::with_store_and_config(
> - Arc::new(chunk_store),
> - config,
> - None,
> - )?);
> + let inner = Arc::new(Self::with_store_and_config(Arc::new(chunk_store), config)?);
>
> if let Some(operation) = operation {
> update_active_operations(&name, operation, 1)?;
> @@ -446,7 +472,6 @@ impl DataStore {
> fn with_store_and_config(
> chunk_store: Arc<ChunkStore>,
> config: DataStoreConfig,
> - last_digest: Option<[u8; 32]>,
> ) -> Result<DataStoreImpl, Error> {
> let mut gc_status_path = chunk_store.base_path();
> gc_status_path.push(".gc-status");
> @@ -506,10 +531,10 @@ impl DataStore {
> last_gc_status: Mutex::new(gc_status),
> verify_new: config.verify_new.unwrap_or(false),
> chunk_order: tuning.chunk_order.unwrap_or_default(),
> - last_digest,
> sync_level: tuning.sync_level.unwrap_or_default(),
> backend_config,
> lru_store_caching,
> + cached_config_tag: None,
> })
> }
>
> --
> 2.47.3
>
>
>
> _______________________________________________
> pbs-devel mailing list
> pbs-devel@lists.proxmox.com
> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
>
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
^ permalink raw reply [relevance 4%]
* Re: [pbs-devel] [PATCH proxmox-backup 0/3] datastore: remove config reload on hot path
2025-11-11 12:29 11% [pbs-devel] [PATCH proxmox-backup 0/3] datastore: remove config reload on hot path Samuel Rufinatscha
` (2 preceding siblings ...)
2025-11-11 12:29 15% ` [pbs-devel] [PATCH proxmox-backup 3/3] datastore: add TTL fallback to catch manual config edits Samuel Rufinatscha
@ 2025-11-12 11:27 5% ` Fabian Grünbichler
2025-11-12 17:27 6% ` Samuel Rufinatscha
3 siblings, 1 reply; 39+ results
From: Fabian Grünbichler @ 2025-11-12 11:27 UTC (permalink / raw)
To: Proxmox Backup Server development discussion
On November 11, 2025 1:29 pm, Samuel Rufinatscha wrote:
> Hi,
>
> this series reduces CPU time in datastore lookups by avoiding repeated
> datastore.cfg reads/parses in both `lookup_datastore()` and
> `DataStore::Drop`. It also adds a TTL so manual config edits are
> noticed without reintroducing hashing on every request.
>
> While investigating #6049 [1], cargo-flamegraph [2] showed hotspots during
> repeated `/status` calls in `lookup_datastore()` and in `Drop`,
> dominated by `pbs_config::datastore::config()` (config parse).
>
> The parsing cost itself should eventually be investigated in a future
> effort. Furthermore, cargo-flamegraph showed that when using a
> token-based auth method to access the API, a significant amount of time
> is spent in validation on every request, likely related to bcrypt.
> Also this should be eventually revisited in a future effort.
please file a bug for the token part, if there isn't already one!
thanks for diving into this, it already looks promising, even though the
effect on more "normal" systems with more reasonable numbers of
datastores and clients will be less pronounced ;)
the big TL;DR would be that we trade faster datastore lookups (which
happen quite frequently, in particular if there are many datastores with
clients checking their status) against slightly delayed reload of the
configuration in case of manual, behind-our-backs edits, with one
particular corner case that is slightly problematic, but also a bit
contrived:
- datastore is looked up
- config is edited (manually) to set maintenance mode to one that
requires removing from the datastore map once the last task exits
- last task drops the datastore struct
- no regular edits happened in the meantime
- the Drop handler doesn't know it needs to remove the datastore from
the map
- open FD is held by proxy, datastore fails to be unmounted/..
we could solve this issue by not only bumping the generation on save,
but also when we reload the config (in particular if we cache the whole
config!). that would make the Drop handler efficient enough for idle
systems that have mostly lookups but no long running tasks. as soon as a
datastore has long running tasks, the last such task will likely exit
long after the TTL for its config lookup has expired, so will need to do
a refresh - although that refresh could again be from the global cache,
instead of from disk? still wouldn't close the window entirely, but make
it pretty unlikely to be hit in practice..
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
^ permalink raw reply [relevance 5%]
* Re: [pbs-devel] [PATCH proxmox-backup 2/3] partial fix #6049: datastore: use config fast-path in Drop
2025-11-11 12:29 15% ` [pbs-devel] [PATCH proxmox-backup 2/3] partial fix #6049: datastore: use config fast-path in Drop Samuel Rufinatscha
@ 2025-11-12 11:24 5% ` Fabian Grünbichler
2025-11-12 15:20 6% ` Samuel Rufinatscha
0 siblings, 1 reply; 39+ results
From: Fabian Grünbichler @ 2025-11-12 11:24 UTC (permalink / raw)
To: Proxmox Backup Server development discussion
On November 11, 2025 1:29 pm, Samuel Rufinatscha wrote:
> The Drop impl of DataStore re-read datastore.cfg to decide whether
> the entry should be evicted from the in-process cache (based on
> maintenance mode’s clear_from_cache). During the investigation of
> issue #6049 [1], a flamegraph [2] showed that the config reload in Drop
> accounted for a measurable share of CPU time under load.
>
> This patch makes Drop O(1) on the fast path by reusing the maintenance-
I am not sure what the O(1) is refering to? This patch implements a
faster cache lookup in front of the (slow) config parsing variant, but
that doesn't really align well with what the "Big O" notation tries to
express ;)
The parsing below still scales with the number of datastores in the
config, after all. It can just be skipped sometimes :)
> mode decision captured at lookup time and stored with the cached
> datastore entry. When the last reference goes away we:
> - decrement active-operation counters, and
> - evict only if the cached decision mandates eviction.
>
> If the cache tag is absent or not fresh, a subsequent slow-path lookup
> will be performed.
>
> Testing
>
> Compared flamegraphs before and after: prior to this change
> (on top of patch 1), stacks originating from Drop included
> pbs_config::datastore::config(). After the change, those vanish from
> the drop path.
>
> An end-to-end benchmark using `/status?verbose=0` with 1000 datastores,
> 5 requests per store, and 16-way parallelism shows a further
> improvement:
>
> | Metric | After commit 1 | After commit 2 | Δ (abs) | Δ (%) |
> |-------------------------|:--------------:|:--------------:|:-------:|:-------:|
> | Total time | 11s | 10s | −1s | −9.09% |
> | Throughput (all rounds) | 454.55 | 500.00 | +45.45 | +10.00% |
> | Cold RPS (round #1) | 90.91 | 100.00 | +9.09 | +10.00% |
> | Warm RPS (rounds 2..N) | 363.64 | 400.00 | +36.36 | +10.00% |
>
> Optimizing Drop improves overall throughput by ~10%. The gain appears
> in both cold and warm rounds, and the flamegraph confirms the config
> reload no longer sits on the hot path.
>
> Links
>
> [1] Bugzilla: https://bugzilla.proxmox.com/show_bug.cgi?id=6049
> [2] cargo-flamegraph: https://github.com/flamegraph-rs/flamegraph
>
> Fixes: #6049
> Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
> ---
> pbs-datastore/src/datastore.rs | 31 +++++++++++++++++++++++++++----
> 1 file changed, 27 insertions(+), 4 deletions(-)
>
> diff --git a/pbs-datastore/src/datastore.rs b/pbs-datastore/src/datastore.rs
> index 18eebb58..da80416a 100644
> --- a/pbs-datastore/src/datastore.rs
> +++ b/pbs-datastore/src/datastore.rs
> @@ -200,15 +200,38 @@ impl Drop for DataStore {
> // remove datastore from cache iff
> // - last task finished, and
> // - datastore is in a maintenance mode that mandates it
> - let remove_from_cache = last_task
> - && pbs_config::datastore::config()
> +
> + // first check: check if last task finished
> + if !last_task {
> + return;
> + }
> +
> + let cached_tag = self.inner.cached_config_tag.as_ref();
> + let last_gen_num = cached_tag.and_then(|c| c.last_generation);
> + let gen_num = ConfigVersionCache::new()
> + .ok()
> + .map(|c| c.datastore_generation());
> +
> + let cache_is_fresh = match (last_gen_num, gen_num) {
> + (Some(a), Some(b)) => a == b,
> + _ => false,
> + };
this is just last_gen_num == gen_num and checking that either is Some.
if we make the tag always contain a generation instead of an option, we
can simplify this code ;)
> +
> + let mm_mandate = if cache_is_fresh {
> + cached_tag
> + .and_then(|c| c.last_maintenance_mode.as_ref())
> + .is_some_and(|m| m.clear_from_cache())
> + } else {
> + pbs_config::datastore::config()
> .and_then(|(s, _)| s.lookup::<DataStoreConfig>("datastore", self.name()))
> .is_ok_and(|c| {
> c.get_maintenance_mode()
> .is_some_and(|m| m.clear_from_cache())
> - });
> + })
> + };
>
> - if remove_from_cache {
> + // second check: check maintenance mode mandate
> + if mm_mandate {
> DATASTORE_MAP.lock().unwrap().remove(self.name());
> }
> }
> --
> 2.47.3
>
>
>
> _______________________________________________
> pbs-devel mailing list
> pbs-devel@lists.proxmox.com
> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
>
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
^ permalink raw reply [relevance 5%]
* Re: [pbs-devel] [PATCH proxmox] fix #6913: auth-api: fix user ID parsing for 2-character realms
2025-11-11 10:40 5% ` Fabian Grünbichler
@ 2025-11-11 13:49 6% ` Samuel Rufinatscha
0 siblings, 0 replies; 39+ results
From: Samuel Rufinatscha @ 2025-11-11 13:49 UTC (permalink / raw)
To: Proxmox Backup Server development discussion, Fabian Grünbichler
On 11/11/25 11:40 AM, Fabian Grünbichler wrote:
> On November 3, 2025 5:26 pm, Samuel Rufinatscha wrote:
>> PVE and PBS both allow creating realms with names of length ≥ 2.
>> However, when creating a user, PBS rejected realms with 2 characters
>> (e.g. `test@aa`), while PVE accepted them. This issue was reported
>> in our bug tracker [1]. Since the issue appears in the underlying
>> `proxmox/proxmox-auth-api` crate, also PDM userid handling is
>> affected.
>>
>> The issue is caused by a mismatch between realm creation and parsing
>> rules in `proxmox/proxmox-auth-api`. `REALM_ID_SCHEMA` allows
>> min_length(2), but `PROXMOX_AUTH_REALM_STRING_SCHEMA` enforced
>> min_length(3).
>>
>> This patch lowers the minimum realm length in
>> `PROXMOX_AUTH_REALM_STRING_SCHEMA` from 3 to 2 to align PBS and PMG
>> with PVE.
>>
>> ## Testing
>>
>> Please see the attached unit tests.
>> The changes were further verified using a rebuilt PBS .deb
>> deployment. PDM was tested using a non-package binary through the
>> provided client CLI.
>>
>> ## Maintainer notes:
>>
>> Bump the `proxmox-auth-api` dependency, no breaking change.
>> PBS and PDM to use the new dependency.
>
> this part here we'd usually put into the patch notes (the part below the
> `---`), which doesn't show up in git history. you can manage those notes
> using `git notes ..`, including (if you set your config accordingly),
> preserving/merging them across rebases.
>
> Reviewed-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
>
Thanks for the review Fabian - makes absolutely sense! I will keep this
in mind for my future patches.
>>
>> [1] Bugzilla: https://bugzilla.proxmox.com/show_bug.cgi?id=6913
>>
>> Fixes: #6913
>> Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
>> ---
>> proxmox-auth-api/src/types.rs | 68 ++++++++++++++++++++++++++++++++++-
>> 1 file changed, 67 insertions(+), 1 deletion(-)
>>
>> diff --git a/proxmox-auth-api/src/types.rs b/proxmox-auth-api/src/types.rs
>> index 9bde661c..aa09fb93 100644
>> --- a/proxmox-auth-api/src/types.rs
>> +++ b/proxmox-auth-api/src/types.rs
>> @@ -95,7 +95,7 @@ pub const PROXMOX_GROUP_ID_SCHEMA: Schema = StringSchema::new("Group ID")
>> pub const PROXMOX_AUTH_REALM_STRING_SCHEMA: StringSchema =
>> StringSchema::new("Authentication domain ID")
>> .format(&proxmox_schema::api_types::SAFE_ID_FORMAT)
>> - .min_length(3)
>> + .min_length(2)
>> .max_length(32);
>> pub const PROXMOX_AUTH_REALM_SCHEMA: Schema = PROXMOX_AUTH_REALM_STRING_SCHEMA.schema();
>>
>> @@ -769,6 +769,72 @@ fn test_token_id() {
>> assert_eq!(auth_id.to_string(), "test@pam!bar".to_string());
>> }
>>
>> +#[test]
>> +fn test_realm_validation() {
>> + let empty_realm: Result<Realm, _> = "".to_string().try_into();
>> + let one_char_realm: Result<Realm, _> = "a".to_string().try_into();
>> + let two_char_realm: Result<Realm, _> = "aa".to_string().try_into();
>> + let long_realm: Result<Realm, _> = "a".repeat(33).try_into();
>> + let valid_realm: Result<Realm, _> = "pam".to_string().try_into();
>> +
>> + assert!(empty_realm.is_err(), "Empty realm should fail validation");
>> + assert!(
>> + one_char_realm.is_err(),
>> + "1-char realm should fail validation"
>> + );
>> + assert!(
>> + two_char_realm.is_ok(),
>> + "2-char realm should pass validation"
>> + );
>> + assert!(valid_realm.is_ok(), "Typical realm should pass validation");
>> + assert!(
>> + long_realm.is_err(),
>> + "Realm >32 chars should fail validation"
>> + );
>> +}
>> +
>> +#[test]
>> +fn test_userid_validation() {
>> + let empty_str: Result<Userid, _> = "".parse();
>> + let invalid_no_realm: Result<Userid, _> = "user".parse();
>> + let invalid_empty_realm: Result<Userid, _> = "user@".parse();
>> + let invalid_one_char_realm: Result<Userid, _> = "user@a".parse();
>> + let valid_two_char_realm: Result<Userid, _> = "user@aa".parse();
>> + let valid_long_realm: Result<Userid, _> = "user@pam".parse();
>> + let invalid_long_realm: Result<Userid, _> = format!("user@{}", "a".repeat(33)).parse();
>> + let invalid_empty_username: Result<Userid, _> = "@aa".parse();
>> +
>> + assert!(empty_str.is_err(), "Empty userid should fail");
>> + assert!(
>> + invalid_no_realm.is_err(),
>> + "Userid without realm should fail"
>> + );
>> + assert!(
>> + invalid_empty_realm.is_err(),
>> + "Userid with empty realm should fail"
>> + );
>> + assert!(
>> + invalid_one_char_realm.is_err(),
>> + "Userid with 1-char realm should fail"
>> + );
>> + assert!(
>> + valid_two_char_realm.is_ok(),
>> + "Userid with 2-char realm should pass"
>> + );
>> + assert!(
>> + valid_long_realm.is_ok(),
>> + "Userid with normal realm should pass"
>> + );
>> + assert!(
>> + invalid_long_realm.is_err(),
>> + "Userid with realm >32 chars should fail"
>> + );
>> + assert!(
>> + invalid_empty_username.is_err(),
>> + "Userid with empty username should fail"
>> + );
>> +}
>
> these two are more or less tests validating our schema deserializer, but
> as the types are rather core types they also don't hurt.
>
> AFAICT we don't have in-depth tests in proxmox-schema that verify that
> the schema constraints validation actually works as expected, there's
> just some basic tests for query parameter handling and schema types
> themselves - might be an area worth improving ;)
>
Good point! Having tests for the schema constraints would be a great
follow-up and probably good-to-have, also we could move these tests
then.
>> +
>> serde_plain::derive_deserialize_from_fromstr!(Userid, "valid user id");
>> serde_plain::derive_serialize_from_display!(Userid);
>>
>> --
>> 2.47.3
>>
>>
>>
>> _______________________________________________
>> pbs-devel mailing list
>> pbs-devel@lists.proxmox.com
>> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
>>
>
>
> _______________________________________________
> pbs-devel mailing list
> pbs-devel@lists.proxmox.com
> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
^ permalink raw reply [relevance 6%]
* [pbs-devel] [PATCH proxmox-backup 0/3] datastore: remove config reload on hot path
@ 2025-11-11 12:29 11% Samuel Rufinatscha
2025-11-11 12:29 12% ` [pbs-devel] [PATCH proxmox-backup 1/3] partial fix #6049: datastore: impl ConfigVersionCache fast path for lookups Samuel Rufinatscha
` (3 more replies)
0 siblings, 4 replies; 39+ results
From: Samuel Rufinatscha @ 2025-11-11 12:29 UTC (permalink / raw)
To: pbs-devel
Hi,
this series reduces CPU time in datastore lookups by avoiding repeated
datastore.cfg reads/parses in both `lookup_datastore()` and
`DataStore::Drop`. It also adds a TTL so manual config edits are
noticed without reintroducing hashing on every request.
While investigating #6049 [1], cargo-flamegraph [2] showed hotspots during
repeated `/status` calls in `lookup_datastore()` and in `Drop`,
dominated by `pbs_config::datastore::config()` (config parse).
The parsing cost itself should eventually be investigated in a future
effort. Furthermore, cargo-flamegraph showed that when using a
token-based auth method to access the API, a significant amount of time
is spent in validation on every request, likely related to bcrypt.
Also this should be eventually revisited in a future effort.
## Approach
[PATCH 1/3] Fast path for datastore lookups
Use the shared-memory `ConfigVersionCache` generation for `datastore.cfg`.
Tag each cached `DataStoreImpl` with the last seen generation; when it
matches, reuse the cached instance. Fall back to the existing slow path
on mismatch or when the cache is unavailable.
[PATCH 2/3] Fast path for `Drop`
Reuse the maintenance mode eviction decision captured at lookup time,
removing the config reload from `Drop`.
[PATCH 3/3] TTL to catch manual edits
If a cached entry is older than `DATASTORE_CONFIG_CACHE_TTL_SECS`
(default 60s), the next lookup refreshes it via the slow path. This
detects manual file edits without hashing on every request.
## Results
End-to-end `/status?verbose=0` (1000 stores, 5 req/store, parallel=16):
Metric Baseline [1/3] [2/3]
------------------------------------------------
Total time 13s 11s 10s
Throughput (all) 384.62 454.55 500.00
Cold RPS (round #1) 76.92 90.91 100.00
Warm RPS (2..N) 307.69 363.64 400.00
Patch 1 improves overall throughput by ~18% (−15% total time). Patch 2
adds ~10% on top. Patch 3 is a robustness feature; a 0.1 s probe shows
periodic latency spikes at TTL expiry and flat latencies otherwise.
## Reproduction steps
VM: 4 vCPU, ~8 GiB RAM, VirtIO-SCSI; disks:
- scsi0 32G (OS)
- scsi1 1000G (datastores)
Install PBS from ISO on the VM.
Set up ZFS on /dev/sdb (adjust if different):
zpool create -f -o ashift=12 pbsbench /dev/sdb
zfs set mountpoint=/pbsbench pbsbench
zfs create pbsbench/pbs-bench
Raise file-descriptor limit:
sudo systemctl edit proxmox-backup-proxy.service
Add the following lines:
[Service]
LimitNOFILE=1048576
Reload systemd and restart the proxy:
sudo systemctl daemon-reload
sudo systemctl restart proxmox-backup-proxy.service
Verify the limit:
systemctl show proxmox-backup-proxy.service | grep LimitNOFILE
Create 1000 ZFS-backed datastores (as used in #6049 [1]):
seq -w 001 1000 | xargs -n1 -P1 bash -c '
id=$0
name="ds${id}"
dataset="pbsbench/pbs-bench/${name}"
path="/pbsbench/pbs-bench/${name}"
zfs create -o mountpoint="$path" "$dataset"
proxmox-backup-manager datastore create "$name" "$path" \
--comment "ZFS dataset-based datastore"
'
Build PBS from this series, then run the server under manually
under flamegraph:
systemctl stop proxmox-backup-proxy
cargo flamegraph --release --bin proxmox-backup-proxy
Benchmark script (`bench.sh`) used for the numbers above:
#!/usr/bin/env bash
set -euo pipefail
# --- Config ---------------------------------------------------------------
HOST='https://localhost:8007'
USER='root@pam'
PASS="$(cat passfile)"
DATASTORE_PATH="/pbsbench/pbs-bench"
MAX_STORES=1000 # how many stores to include
PARALLEL=16 # concurrent workers
REPEAT=5 # requests per store (1 cold + REPEAT-1 warm)
PRINT_FIRST=false # true => log first request's HTTP code per store
# --- Helpers --------------------------------------------------------------
fmt_rps () {
local n="$1" t="$2"
awk -v n="$n" -v t="$t" 'BEGIN { if (t > 0) printf("%.2f\n", n/t); else print "0.00" }'
}
# --- Login ---------------------------------------------------------------
auth=$(curl -ks -X POST "$HOST/api2/json/access/ticket" \
-d "username=$USER" -d "password=$PASS")
ticket=$(echo "$auth" | jq -r '.data.ticket')
if [[ -z "${ticket:-}" || "$ticket" == "null" ]]; then
echo "[ERROR] Login failed (no ticket)"
exit 1
fi
# --- Collect stores (deterministic order) --------------------------------
mapfile -t STORES < <(
find "$DATASTORE_PATH" -mindepth 1 -maxdepth 1 -type d -printf '%f\n' \
| sort | head -n "$MAX_STORES"
)
USED_STORES=${#STORES[@]}
if (( USED_STORES == 0 )); then
echo "[ERROR] No datastore dirs under $DATASTORE_PATH"
exit 1
fi
echo "[INFO] Running with stores=$USED_STORES, repeat=$REPEAT, parallel=$PARALLEL"
# --- Temp counters --------------------------------------------------------
SUCCESS_ALL="$(mktemp)"
FAIL_ALL="$(mktemp)"
COLD_OK="$(mktemp)"
WARM_OK="$(mktemp)"
trap 'rm -f "$SUCCESS_ALL" "$FAIL_ALL" "$COLD_OK" "$WARM_OK"' EXIT
export HOST ticket REPEAT SUCCESS_ALL FAIL_ALL COLD_OK WARM_OK PRINT_FIRST
SECONDS=0
# --- Fire requests --------------------------------------------------------
printf "%s\n" "${STORES[@]}" \
| xargs -P"$PARALLEL" -I{} bash -c '
store="$1"
url="$HOST/api2/json/admin/datastore/$store/status?verbose=0"
for ((i=1;i<=REPEAT;i++)); do
code=$(curl -ks -o /dev/null -w "%{http_code}" -b "PBSAuthCookie=$ticket" "$url" || echo 000)
if [[ "$code" == "200" ]]; then
echo 1 >> "$SUCCESS_ALL"
if (( i == 1 )); then
echo 1 >> "$COLD_OK"
else
echo 1 >> "$WARM_OK"
fi
if [[ "$PRINT_FIRST" == "true" && $i -eq 1 ]]; then
ts=$(date +%H:%M:%S)
echo "[$ts] $store #$i HTTP:200"
fi
else
echo 1 >> "$FAIL_ALL"
if [[ "$PRINT_FIRST" == "true" && $i -eq 1 ]]; then
ts=$(date +%H:%M:%S)
echo "[$ts] $store #$i HTTP:$code (FAIL)"
fi
fi
done
' _ {}
# --- Summary --------------------------------------------------------------
elapsed=$SECONDS
ok=$(wc -l < "$SUCCESS_ALL" 2>/dev/null || echo 0)
fail=$(wc -l < "$FAIL_ALL" 2>/dev/null || echo 0)
cold_ok=$(wc -l < "$COLD_OK" 2>/dev/null || echo 0)
warm_ok=$(wc -l < "$WARM_OK" 2>/dev/null || echo 0)
expected=$(( USED_STORES * REPEAT ))
total=$(( ok + fail ))
rps_all=$(fmt_rps "$ok" "$elapsed")
rps_cold=$(fmt_rps "$cold_ok" "$elapsed")
rps_warm=$(fmt_rps "$warm_ok" "$elapsed")
echo "===== Summary ====="
echo "Stores used: $USED_STORES"
echo "Expected requests: $expected"
echo "Executed requests: $total"
echo "OK (HTTP 200): $ok"
echo "Failed: $fail"
printf "Total time: %dm %ds\n" $((elapsed/60)) $((elapsed%60))
echo "Throughput all RPS: $rps_all"
echo "Cold RPS (round #1): $rps_cold"
echo "Warm RPS (#2..N): $rps_warm"
## Maintainer notes
- No dependency bumps, no API changes, no breaking changes in this
series.
## Patch summary
[PATCH 1/3] partial fix #6049: datastore: impl ConfigVersionCache fast path for lookups
[PATCH 2/3] partial fix #6049: datastore: use config fast-path in Drop
[PATCH 3/3] datastore: add TTL fallback to catch manual config edits
Thanks for reviewing!
[1] Bugzilla #6049: https://bugzilla.proxmox.com/show_bug.cgi?id=6049
[2] cargo-flamegraph: https://github.com/flamegraph-rs/flamegraph
Samuel Rufinatscha (3):
partial fix #6049: datastore: impl ConfigVersionCache fast path for
lookups
partial fix #6049: datastore: use config fast-path in Drop
datastore: add TTL fallback to catch manual config edits
pbs-config/src/config_version_cache.rs | 10 ++-
pbs-datastore/src/datastore.rs | 119 ++++++++++++++++++-------
2 files changed, 96 insertions(+), 33 deletions(-)
--
2.47.3
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
^ permalink raw reply [relevance 11%]
* [pbs-devel] [PATCH proxmox-backup 1/3] partial fix #6049: datastore: impl ConfigVersionCache fast path for lookups
2025-11-11 12:29 11% [pbs-devel] [PATCH proxmox-backup 0/3] datastore: remove config reload on hot path Samuel Rufinatscha
@ 2025-11-11 12:29 12% ` Samuel Rufinatscha
2025-11-12 13:24 4% ` Fabian Grünbichler
2025-11-11 12:29 15% ` [pbs-devel] [PATCH proxmox-backup 2/3] partial fix #6049: datastore: use config fast-path in Drop Samuel Rufinatscha
` (2 subsequent siblings)
3 siblings, 1 reply; 39+ results
From: Samuel Rufinatscha @ 2025-11-11 12:29 UTC (permalink / raw)
To: pbs-devel
Repeated /status requests caused lookup_datastore() to re-read and
parse datastore.cfg on every call. The issue was mentioned in report
#6049 [1]. cargo-flamegraph [2] confirmed that the hot path is
dominated by pbs_config::datastore::config() (config parsing).
This patch adds a fast path to lookup_datastore() using the shared-
memory ConfigVersionCache generation counter for datastore.cfg. It
stores the last seen generation number alongside the cached
DataStoreImpl and, when a subsequent lookup sees the same generation,
we reuse the cached instance without re-reading the on-disk config. If
the generation differs (or the cache is unavailable), it falls back to
the existing slow path with no behavioral changes.
Behavioral notes
- The generation is bumped via the existing save_config() path, so
API-driven config changes are detected immediately.
- Manual edits to datastore.cfg are not detected by this commit; a TTL
guard is introduced in a later patch to cover that case.
- DataStore::drop still performs a config read on the common path; this
is addressed in the next patch.
Testing
cargo-flamegraph confirms the config-parse hotspot in
lookup_datastore() disappears from the hot path.
Additionally, an end-to-end benchmark was performed using the
`/status?verbose=0` API with 1000 datastores, 5 requests per store,
and 16-way parallelism:
| Metric | Before | After | Δ (abs) | Δ (%) |
|--------------------------|:------:|:------:|:-------:|:-------:|
| Total time | 13s | 11s | −2s | −15.38% |
| Throughput (all rounds) | 384.62 | 454.55 | +69.93 | +18.18% |
| Cold RPS (round #1) | 76.92 | 90.91 | +13.99 | +18.19% |
| Warm RPS (rounds 2..N) | 307.69 | 363.64 | +55.95 | +18.18% |
Throughput improved by ~18% overall, with total time reduced by ~15%.
Warm lookups now reuse cached datastore instances and skip redundant
config parsing entirely. The first-access round also shows a similar
improvement, likely due to reduced contention and I/O when many stores
are accessed in parallel.
Note: A second hotspot remains in Drop where a config read occurs; that
is addressed in the next commit.
Links
[1] Bugzilla: https://bugzilla.proxmox.com/show_bug.cgi?id=6049
[2] cargo-flamegraph: https://github.com/flamegraph-rs/flamegraph
Fixes: #6049
Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
---
pbs-config/src/config_version_cache.rs | 10 +++-
pbs-datastore/src/datastore.rs | 77 +++++++++++++++++---------
2 files changed, 59 insertions(+), 28 deletions(-)
diff --git a/pbs-config/src/config_version_cache.rs b/pbs-config/src/config_version_cache.rs
index e8fb994f..b875f7e0 100644
--- a/pbs-config/src/config_version_cache.rs
+++ b/pbs-config/src/config_version_cache.rs
@@ -26,7 +26,6 @@ struct ConfigVersionCacheDataInner {
// Traffic control (traffic-control.cfg) generation/version.
traffic_control_generation: AtomicUsize,
// datastore (datastore.cfg) generation/version
- // FIXME: remove with PBS 3.0
datastore_generation: AtomicUsize,
// Add further atomics here
}
@@ -145,8 +144,15 @@ impl ConfigVersionCache {
.fetch_add(1, Ordering::AcqRel);
}
+ /// Returns the datastore generation number.
+ pub fn datastore_generation(&self) -> usize {
+ self.shmem
+ .data()
+ .datastore_generation
+ .load(Ordering::Acquire)
+ }
+
/// Increase the datastore generation number.
- // FIXME: remove with PBS 3.0 or make actually useful again in datastore lookup
pub fn increase_datastore_generation(&self) -> usize {
self.shmem
.data()
diff --git a/pbs-datastore/src/datastore.rs b/pbs-datastore/src/datastore.rs
index 70af94d8..18eebb58 100644
--- a/pbs-datastore/src/datastore.rs
+++ b/pbs-datastore/src/datastore.rs
@@ -32,7 +32,7 @@ use pbs_api_types::{
MaintenanceType, Operation, UPID,
};
use pbs_config::s3::S3_CFG_TYPE_ID;
-use pbs_config::BackupLockGuard;
+use pbs_config::{BackupLockGuard, ConfigVersionCache};
use crate::backup_info::{
BackupDir, BackupGroup, BackupInfo, OLD_LOCKING, PROTECTED_MARKER_FILENAME,
@@ -140,10 +140,10 @@ pub struct DataStoreImpl {
last_gc_status: Mutex<GarbageCollectionStatus>,
verify_new: bool,
chunk_order: ChunkOrder,
- last_digest: Option<[u8; 32]>,
sync_level: DatastoreFSyncLevel,
backend_config: DatastoreBackendConfig,
lru_store_caching: Option<LocalDatastoreLruCache>,
+ cached_config_tag: Option<CachedDatastoreConfigTag>,
}
impl DataStoreImpl {
@@ -156,10 +156,10 @@ impl DataStoreImpl {
last_gc_status: Mutex::new(GarbageCollectionStatus::default()),
verify_new: false,
chunk_order: Default::default(),
- last_digest: None,
sync_level: Default::default(),
backend_config: Default::default(),
lru_store_caching: None,
+ cached_config_tag: None,
})
}
}
@@ -224,6 +224,15 @@ pub enum DatastoreBackend {
S3(Arc<S3Client>),
}
+/// Used to determine whether a cached datastore instance is still valid
+/// or needs to be reloaded after a config change.
+struct CachedDatastoreConfigTag {
+ /// Maintenance mode at the time the lookup hint was captured, if any.
+ last_maintenance_mode: Option<MaintenanceMode>,
+ /// Datastore generation number from `ConfigVersionCache`; `None` when the cache wasn't available.
+ last_generation: Option<usize>,
+}
+
impl DataStore {
// This one just panics on everything
#[doc(hidden)]
@@ -299,13 +308,40 @@ impl DataStore {
// we use it to decide whether it is okay to delete the datastore.
let _config_lock = pbs_config::datastore::lock_config()?;
- // we could use the ConfigVersionCache's generation for staleness detection, but we load
- // the config anyway -> just use digest, additional benefit: manual changes get detected
- let (config, digest) = pbs_config::datastore::config()?;
+ // Get the current datastore.cfg generation number
+ let gen_num = ConfigVersionCache::new()
+ .ok()
+ .map(|c| c.datastore_generation());
+
+ // Fast-path: if we have a cached entry created under the same datastore.cfg generation number, reuse it.
+ if let (Some(gen_num), Some(ds)) =
+ (gen_num, DATASTORE_MAP.lock().unwrap().get(name).cloned())
+ {
+ if let Some(cached_tag) = &ds.cached_config_tag {
+ if cached_tag.last_generation == Some(gen_num) {
+ if let Some(mm) = &cached_tag.last_maintenance_mode {
+ if let Err(error) = mm.check(operation) {
+ bail!("datastore '{name}' is unavailable: {error}");
+ }
+ }
+ if let Some(operation) = operation {
+ update_active_operations(name, operation, 1)?;
+ }
+ return Ok(Arc::new(Self {
+ inner: ds,
+ operation,
+ }));
+ }
+ }
+ }
+
+ // Slow path: (re)load config
+ let (config, _digest) = pbs_config::datastore::config()?;
let config: DataStoreConfig = config.lookup("datastore", name)?;
- if let Some(maintenance_mode) = config.get_maintenance_mode() {
- if let Err(error) = maintenance_mode.check(operation) {
+ let maintenance_mode = config.get_maintenance_mode();
+ if let Some(mm) = &maintenance_mode {
+ if let Err(error) = mm.check(operation) {
bail!("datastore '{name}' is unavailable: {error}");
}
}
@@ -321,16 +357,6 @@ impl DataStore {
// reuse chunk store so that we keep using the same process locker instance!
let chunk_store = if let Some(datastore) = &entry {
- let last_digest = datastore.last_digest.as_ref();
- if let Some(true) = last_digest.map(|last_digest| last_digest == &digest) {
- if let Some(operation) = operation {
- update_active_operations(name, operation, 1)?;
- }
- return Ok(Arc::new(Self {
- inner: Arc::clone(datastore),
- operation,
- }));
- }
Arc::clone(&datastore.chunk_store)
} else {
let tuning: DatastoreTuning = serde_json::from_value(
@@ -344,7 +370,11 @@ impl DataStore {
)?)
};
- let datastore = DataStore::with_store_and_config(chunk_store, config, Some(digest))?;
+ let mut datastore = DataStore::with_store_and_config(chunk_store, config)?;
+ datastore.cached_config_tag = Some(CachedDatastoreConfigTag {
+ last_maintenance_mode: maintenance_mode,
+ last_generation: gen_num,
+ });
let datastore = Arc::new(datastore);
datastore_cache.insert(name.to_string(), datastore.clone());
@@ -430,11 +460,7 @@ impl DataStore {
config.absolute_path(),
tuning.sync_level.unwrap_or_default(),
)?;
- let inner = Arc::new(Self::with_store_and_config(
- Arc::new(chunk_store),
- config,
- None,
- )?);
+ let inner = Arc::new(Self::with_store_and_config(Arc::new(chunk_store), config)?);
if let Some(operation) = operation {
update_active_operations(&name, operation, 1)?;
@@ -446,7 +472,6 @@ impl DataStore {
fn with_store_and_config(
chunk_store: Arc<ChunkStore>,
config: DataStoreConfig,
- last_digest: Option<[u8; 32]>,
) -> Result<DataStoreImpl, Error> {
let mut gc_status_path = chunk_store.base_path();
gc_status_path.push(".gc-status");
@@ -506,10 +531,10 @@ impl DataStore {
last_gc_status: Mutex::new(gc_status),
verify_new: config.verify_new.unwrap_or(false),
chunk_order: tuning.chunk_order.unwrap_or_default(),
- last_digest,
sync_level: tuning.sync_level.unwrap_or_default(),
backend_config,
lru_store_caching,
+ cached_config_tag: None,
})
}
--
2.47.3
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
^ permalink raw reply [relevance 12%]
* [pbs-devel] [PATCH proxmox-backup 3/3] datastore: add TTL fallback to catch manual config edits
2025-11-11 12:29 11% [pbs-devel] [PATCH proxmox-backup 0/3] datastore: remove config reload on hot path Samuel Rufinatscha
2025-11-11 12:29 12% ` [pbs-devel] [PATCH proxmox-backup 1/3] partial fix #6049: datastore: impl ConfigVersionCache fast path for lookups Samuel Rufinatscha
2025-11-11 12:29 15% ` [pbs-devel] [PATCH proxmox-backup 2/3] partial fix #6049: datastore: use config fast-path in Drop Samuel Rufinatscha
@ 2025-11-11 12:29 15% ` Samuel Rufinatscha
2025-11-12 11:27 5% ` [pbs-devel] [PATCH proxmox-backup 0/3] datastore: remove config reload on hot path Fabian Grünbichler
3 siblings, 0 replies; 39+ results
From: Samuel Rufinatscha @ 2025-11-11 12:29 UTC (permalink / raw)
To: pbs-devel
The lookup fast path reacts to API-driven config changes because
save_config() bumps the generation. Manual edits of datastore.cfg do
not bump the counter. To keep the system robust against such edits
without reintroducing config reading and hashing on the hot path, this
patch adds a TTL to the cache entry.
If the datastore’s cached tag is older than
DATASTORE_CONFIG_CACHE_TTL_SECS (set to 60s), the next lookup takes
the slow path (re-read/parse) and refreshes the cached entry. Within
the TTL window, unchanged generations still use the fast path.
Note: Manual edits may remain unseen until the TTL elapses or any API
config write occurs.
Testing
With the TTL enabled, flamegraphs for hot status requests remain flat. A
0.1 second interval test confirmed periodic latency spikes at TTL expiry.
Maintainer notes
No dependency bumps or breaking changes.
Links
[1] cargo-flamegraph: https://github.com/flamegraph-rs/flamegraph
Refs: #6049
Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
---
pbs-datastore/src/datastore.rs | 13 +++++++++++--
1 file changed, 11 insertions(+), 2 deletions(-)
diff --git a/pbs-datastore/src/datastore.rs b/pbs-datastore/src/datastore.rs
index da80416a..5eaae49b 100644
--- a/pbs-datastore/src/datastore.rs
+++ b/pbs-datastore/src/datastore.rs
@@ -22,7 +22,7 @@ use proxmox_sys::error::SysError;
use proxmox_sys::fs::{file_read_optional_string, replace_file, CreateOptions};
use proxmox_sys::linux::procfs::MountInfo;
use proxmox_sys::process_locker::{ProcessLockExclusiveGuard, ProcessLockSharedGuard};
-use proxmox_time::TimeSpan;
+use proxmox_time::{epoch_i64, TimeSpan};
use proxmox_worker_task::WorkerTaskContext;
use pbs_api_types::{
@@ -56,6 +56,9 @@ pub const GROUP_OWNER_FILE_NAME: &str = "owner";
/// Filename for in-use marker stored on S3 object store backend
pub const S3_DATASTORE_IN_USE_MARKER: &str = ".in-use";
const NAMESPACE_MARKER_FILENAME: &str = ".namespace";
+/// Max age in seconds to reuse the datastore lookup fast path
+/// before forcing a slow-path config read.
+const DATASTORE_CONFIG_CACHE_TTL_SECS: i64 = 60;
/// checks if auth_id is owner, or, if owner is a token, if
/// auth_id is the user of the token
@@ -254,6 +257,8 @@ struct CachedDatastoreConfigTag {
last_maintenance_mode: Option<MaintenanceMode>,
/// Datastore generation number from `ConfigVersionCache`; `None` when the cache wasn't available.
last_generation: Option<usize>,
+ /// Epoch seconds when this lookup hint was created.
+ last_update: i64,
}
impl DataStore {
@@ -335,13 +340,16 @@ impl DataStore {
let gen_num = ConfigVersionCache::new()
.ok()
.map(|c| c.datastore_generation());
+ let now = epoch_i64();
// Fast-path: if we have a cached entry created under the same datastore.cfg generation number, reuse it.
if let (Some(gen_num), Some(ds)) =
(gen_num, DATASTORE_MAP.lock().unwrap().get(name).cloned())
{
if let Some(cached_tag) = &ds.cached_config_tag {
- if cached_tag.last_generation == Some(gen_num) {
+ if cached_tag.last_generation == Some(gen_num)
+ && (now - cached_tag.last_update) < DATASTORE_CONFIG_CACHE_TTL_SECS
+ {
if let Some(mm) = &cached_tag.last_maintenance_mode {
if let Err(error) = mm.check(operation) {
bail!("datastore '{name}' is unavailable: {error}");
@@ -397,6 +405,7 @@ impl DataStore {
datastore.cached_config_tag = Some(CachedDatastoreConfigTag {
last_maintenance_mode: maintenance_mode,
last_generation: gen_num,
+ last_update: now,
});
let datastore = Arc::new(datastore);
--
2.47.3
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
^ permalink raw reply [relevance 15%]
* [pbs-devel] [PATCH proxmox-backup 2/3] partial fix #6049: datastore: use config fast-path in Drop
2025-11-11 12:29 11% [pbs-devel] [PATCH proxmox-backup 0/3] datastore: remove config reload on hot path Samuel Rufinatscha
2025-11-11 12:29 12% ` [pbs-devel] [PATCH proxmox-backup 1/3] partial fix #6049: datastore: impl ConfigVersionCache fast path for lookups Samuel Rufinatscha
@ 2025-11-11 12:29 15% ` Samuel Rufinatscha
2025-11-12 11:24 5% ` Fabian Grünbichler
2025-11-11 12:29 15% ` [pbs-devel] [PATCH proxmox-backup 3/3] datastore: add TTL fallback to catch manual config edits Samuel Rufinatscha
2025-11-12 11:27 5% ` [pbs-devel] [PATCH proxmox-backup 0/3] datastore: remove config reload on hot path Fabian Grünbichler
3 siblings, 1 reply; 39+ results
From: Samuel Rufinatscha @ 2025-11-11 12:29 UTC (permalink / raw)
To: pbs-devel
The Drop impl of DataStore re-read datastore.cfg to decide whether
the entry should be evicted from the in-process cache (based on
maintenance mode’s clear_from_cache). During the investigation of
issue #6049 [1], a flamegraph [2] showed that the config reload in Drop
accounted for a measurable share of CPU time under load.
This patch makes Drop O(1) on the fast path by reusing the maintenance-
mode decision captured at lookup time and stored with the cached
datastore entry. When the last reference goes away we:
- decrement active-operation counters, and
- evict only if the cached decision mandates eviction.
If the cache tag is absent or not fresh, a subsequent slow-path lookup
will be performed.
Testing
Compared flamegraphs before and after: prior to this change
(on top of patch 1), stacks originating from Drop included
pbs_config::datastore::config(). After the change, those vanish from
the drop path.
An end-to-end benchmark using `/status?verbose=0` with 1000 datastores,
5 requests per store, and 16-way parallelism shows a further
improvement:
| Metric | After commit 1 | After commit 2 | Δ (abs) | Δ (%) |
|-------------------------|:--------------:|:--------------:|:-------:|:-------:|
| Total time | 11s | 10s | −1s | −9.09% |
| Throughput (all rounds) | 454.55 | 500.00 | +45.45 | +10.00% |
| Cold RPS (round #1) | 90.91 | 100.00 | +9.09 | +10.00% |
| Warm RPS (rounds 2..N) | 363.64 | 400.00 | +36.36 | +10.00% |
Optimizing Drop improves overall throughput by ~10%. The gain appears
in both cold and warm rounds, and the flamegraph confirms the config
reload no longer sits on the hot path.
Links
[1] Bugzilla: https://bugzilla.proxmox.com/show_bug.cgi?id=6049
[2] cargo-flamegraph: https://github.com/flamegraph-rs/flamegraph
Fixes: #6049
Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
---
pbs-datastore/src/datastore.rs | 31 +++++++++++++++++++++++++++----
1 file changed, 27 insertions(+), 4 deletions(-)
diff --git a/pbs-datastore/src/datastore.rs b/pbs-datastore/src/datastore.rs
index 18eebb58..da80416a 100644
--- a/pbs-datastore/src/datastore.rs
+++ b/pbs-datastore/src/datastore.rs
@@ -200,15 +200,38 @@ impl Drop for DataStore {
// remove datastore from cache iff
// - last task finished, and
// - datastore is in a maintenance mode that mandates it
- let remove_from_cache = last_task
- && pbs_config::datastore::config()
+
+ // first check: check if last task finished
+ if !last_task {
+ return;
+ }
+
+ let cached_tag = self.inner.cached_config_tag.as_ref();
+ let last_gen_num = cached_tag.and_then(|c| c.last_generation);
+ let gen_num = ConfigVersionCache::new()
+ .ok()
+ .map(|c| c.datastore_generation());
+
+ let cache_is_fresh = match (last_gen_num, gen_num) {
+ (Some(a), Some(b)) => a == b,
+ _ => false,
+ };
+
+ let mm_mandate = if cache_is_fresh {
+ cached_tag
+ .and_then(|c| c.last_maintenance_mode.as_ref())
+ .is_some_and(|m| m.clear_from_cache())
+ } else {
+ pbs_config::datastore::config()
.and_then(|(s, _)| s.lookup::<DataStoreConfig>("datastore", self.name()))
.is_ok_and(|c| {
c.get_maintenance_mode()
.is_some_and(|m| m.clear_from_cache())
- });
+ })
+ };
- if remove_from_cache {
+ // second check: check maintenance mode mandate
+ if mm_mandate {
DATASTORE_MAP.lock().unwrap().remove(self.name());
}
}
--
2.47.3
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
^ permalink raw reply [relevance 15%]
* Re: [pbs-devel] [PATCH proxmox] fix #6913: auth-api: fix user ID parsing for 2-character realms
2025-11-03 16:26 15% [pbs-devel] [PATCH proxmox] fix #6913: auth-api: fix user ID parsing for 2-character realms Samuel Rufinatscha
@ 2025-11-11 10:40 5% ` Fabian Grünbichler
2025-11-11 13:49 6% ` Samuel Rufinatscha
2025-11-14 10:34 5% ` [pbs-devel] applied: " Fabian Grünbichler
1 sibling, 1 reply; 39+ results
From: Fabian Grünbichler @ 2025-11-11 10:40 UTC (permalink / raw)
To: Proxmox Backup Server development discussion
On November 3, 2025 5:26 pm, Samuel Rufinatscha wrote:
> PVE and PBS both allow creating realms with names of length ≥ 2.
> However, when creating a user, PBS rejected realms with 2 characters
> (e.g. `test@aa`), while PVE accepted them. This issue was reported
> in our bug tracker [1]. Since the issue appears in the underlying
> `proxmox/proxmox-auth-api` crate, also PDM userid handling is
> affected.
>
> The issue is caused by a mismatch between realm creation and parsing
> rules in `proxmox/proxmox-auth-api`. `REALM_ID_SCHEMA` allows
> min_length(2), but `PROXMOX_AUTH_REALM_STRING_SCHEMA` enforced
> min_length(3).
>
> This patch lowers the minimum realm length in
> `PROXMOX_AUTH_REALM_STRING_SCHEMA` from 3 to 2 to align PBS and PMG
> with PVE.
>
> ## Testing
>
> Please see the attached unit tests.
> The changes were further verified using a rebuilt PBS .deb
> deployment. PDM was tested using a non-package binary through the
> provided client CLI.
>
> ## Maintainer notes:
>
> Bump the `proxmox-auth-api` dependency, no breaking change.
> PBS and PDM to use the new dependency.
this part here we'd usually put into the patch notes (the part below the
`---`), which doesn't show up in git history. you can manage those notes
using `git notes ..`, including (if you set your config accordingly),
preserving/merging them across rebases.
Reviewed-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
>
> [1] Bugzilla: https://bugzilla.proxmox.com/show_bug.cgi?id=6913
>
> Fixes: #6913
> Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
> ---
> proxmox-auth-api/src/types.rs | 68 ++++++++++++++++++++++++++++++++++-
> 1 file changed, 67 insertions(+), 1 deletion(-)
>
> diff --git a/proxmox-auth-api/src/types.rs b/proxmox-auth-api/src/types.rs
> index 9bde661c..aa09fb93 100644
> --- a/proxmox-auth-api/src/types.rs
> +++ b/proxmox-auth-api/src/types.rs
> @@ -95,7 +95,7 @@ pub const PROXMOX_GROUP_ID_SCHEMA: Schema = StringSchema::new("Group ID")
> pub const PROXMOX_AUTH_REALM_STRING_SCHEMA: StringSchema =
> StringSchema::new("Authentication domain ID")
> .format(&proxmox_schema::api_types::SAFE_ID_FORMAT)
> - .min_length(3)
> + .min_length(2)
> .max_length(32);
> pub const PROXMOX_AUTH_REALM_SCHEMA: Schema = PROXMOX_AUTH_REALM_STRING_SCHEMA.schema();
>
> @@ -769,6 +769,72 @@ fn test_token_id() {
> assert_eq!(auth_id.to_string(), "test@pam!bar".to_string());
> }
>
> +#[test]
> +fn test_realm_validation() {
> + let empty_realm: Result<Realm, _> = "".to_string().try_into();
> + let one_char_realm: Result<Realm, _> = "a".to_string().try_into();
> + let two_char_realm: Result<Realm, _> = "aa".to_string().try_into();
> + let long_realm: Result<Realm, _> = "a".repeat(33).try_into();
> + let valid_realm: Result<Realm, _> = "pam".to_string().try_into();
> +
> + assert!(empty_realm.is_err(), "Empty realm should fail validation");
> + assert!(
> + one_char_realm.is_err(),
> + "1-char realm should fail validation"
> + );
> + assert!(
> + two_char_realm.is_ok(),
> + "2-char realm should pass validation"
> + );
> + assert!(valid_realm.is_ok(), "Typical realm should pass validation");
> + assert!(
> + long_realm.is_err(),
> + "Realm >32 chars should fail validation"
> + );
> +}
> +
> +#[test]
> +fn test_userid_validation() {
> + let empty_str: Result<Userid, _> = "".parse();
> + let invalid_no_realm: Result<Userid, _> = "user".parse();
> + let invalid_empty_realm: Result<Userid, _> = "user@".parse();
> + let invalid_one_char_realm: Result<Userid, _> = "user@a".parse();
> + let valid_two_char_realm: Result<Userid, _> = "user@aa".parse();
> + let valid_long_realm: Result<Userid, _> = "user@pam".parse();
> + let invalid_long_realm: Result<Userid, _> = format!("user@{}", "a".repeat(33)).parse();
> + let invalid_empty_username: Result<Userid, _> = "@aa".parse();
> +
> + assert!(empty_str.is_err(), "Empty userid should fail");
> + assert!(
> + invalid_no_realm.is_err(),
> + "Userid without realm should fail"
> + );
> + assert!(
> + invalid_empty_realm.is_err(),
> + "Userid with empty realm should fail"
> + );
> + assert!(
> + invalid_one_char_realm.is_err(),
> + "Userid with 1-char realm should fail"
> + );
> + assert!(
> + valid_two_char_realm.is_ok(),
> + "Userid with 2-char realm should pass"
> + );
> + assert!(
> + valid_long_realm.is_ok(),
> + "Userid with normal realm should pass"
> + );
> + assert!(
> + invalid_long_realm.is_err(),
> + "Userid with realm >32 chars should fail"
> + );
> + assert!(
> + invalid_empty_username.is_err(),
> + "Userid with empty username should fail"
> + );
> +}
these two are more or less tests validating our schema deserializer, but
as the types are rather core types they also don't hurt.
AFAICT we don't have in-depth tests in proxmox-schema that verify that
the schema constraints validation actually works as expected, there's
just some basic tests for query parameter handling and schema types
themselves - might be an area worth improving ;)
> +
> serde_plain::derive_deserialize_from_fromstr!(Userid, "valid user id");
> serde_plain::derive_serialize_from_display!(Userid);
>
> --
> 2.47.3
>
>
>
> _______________________________________________
> pbs-devel mailing list
> pbs-devel@lists.proxmox.com
> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
>
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
^ permalink raw reply [relevance 5%]
* Re: [pbs-devel] [PATCH proxmox v3 1/1] fix #6939: acme: support servers returning 204 for nonce requests
2025-11-04 14:11 4% ` Thomas Lamprecht
@ 2025-11-05 10:22 9% ` Samuel Rufinatscha
0 siblings, 0 replies; 39+ results
From: Samuel Rufinatscha @ 2025-11-05 10:22 UTC (permalink / raw)
To: Thomas Lamprecht, Proxmox Backup Server development discussion
On 11/4/25 3:11 PM, Thomas Lamprecht wrote:
> Am 03.11.25 um 11:13 schrieb Samuel Rufinatscha:
>> Some ACME servers (notably custom or legacy implementations) respond
>> to HEAD /newNonce with a 204 No Content instead of the
>> RFC 8555-recommended 200 OK [1]. While this behavior is technically
>> off-spec, it is not illegal. This issue was reported on our bug
>> tracker [2].
>>
>> The previous implementation treated any non-200 response as an error,
>> causing account registration to fail against such servers. Relax the
>> status-code check to accept both 200 and 204 responses (and potentially
>> support other 2xx codes) to improve interoperability.
>>
>> Note: In comparison, PVE’s Perl ACME client performs a GET request [3]
>> instead of a HEAD request and accepts any 2xx success code when
>> retrieving the nonce [4]. This difference in behavior does not affect
>> functionality but is worth noting for consistency across
>> implementations.
>
> The resulting code now looks mostly OK, but when doing a final review
> before pushing this out I noticed two things that I should have actually
> caught earlier:
>
> 1. Introducing + using the new http_status module should be a separate
> patch upfront, it has nothing to do with the actual bug fix but is
> rather an independent cleanup.
>
> 2. The Request struct and it's expected member are pub and is still
> used directly in PBS - where it was originally factored out from
> but not yet replaced. So the type change from `u16` to `&'static [u16]`
> causes a breaking ABI change. That change itself can be manageable,
> albeit if it can be easily avoided it's always nicer to do so; using
> a constructor (potentially with builder pattern) and changing the
> visibility of the members to pub(crate) or even making them private
> to the module, can often be a good option to reduce friction for
> any future change.
> But here it would be nicer to clear the tech debt and switch PBS fully
> over to the factored out impl, like e.g. PDM uses already. This should
> then also allow us to reduce the visibility of the struct members and
> the http_status module, which as of now I'd also rather see as
> proxmox-acme specific and thus not exposing it to the public would be
> better.
>
> Do you want to give switching PBS over to the factored-out impl a try?
> It adds a bit to the scope, but we have to clean this up (or accumulate
> tech debt interest) sooner or later anyway, and if we do already a breaking
> change I'd prefer sooner.
> For the acme side of your changes nothing should change – besides maybe
> already reducing visibility of structs and/or their members in a separate
> patch.
>
> In summary, a good order/split for the resulting patches could look
> something like:
>
> 1. change PBS over to proxmox-acme, at least the client part.
> 2. reduce the visibility of types that now are only used in proxmox-acme
> internally
> 3. introduce and use http_status mod in proxmox-acme
> 4. fix #6939
>
> What do you think? If moving PBS to proxmox-acme turns out to be tricky,
> or even more work than expected, we can also go this route here as stop
> gap; but IMO it would be favorable to spend a bit more time in actually
> cleaning this up and reduce code duplication than doing so.
> Again, I should have caught that early, but FWIW, almost all of the work
> you done so far will still be relevant, so no real harm done FWICT.
Hi Thomas,
thanks a lot for having a second look and the detailed feedback -
sounds good to me! I agree that it would be better to split off the
fix-independent changes, and it would be a good opportunity now to
switch PBS over to the factored-out proxmox-acme implementation (get
at least rid of the duplicate client impl), and follow PDM. I think
this is manageable and will start by switching PBS to the shared
client, adjust visibility and add the http_status module before
applying the fix.
I’ll go ahead with the refactoring and will send a v4 with the
suggested patch structure.
Samuel
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
^ permalink raw reply [relevance 9%]
* Re: [pbs-devel] [PATCH proxmox v3 1/1] fix #6939: acme: support servers returning 204 for nonce requests
2025-11-03 10:13 13% ` [pbs-devel] [PATCH proxmox v3 1/1] " Samuel Rufinatscha
@ 2025-11-04 14:11 4% ` Thomas Lamprecht
2025-11-05 10:22 9% ` Samuel Rufinatscha
0 siblings, 1 reply; 39+ results
From: Thomas Lamprecht @ 2025-11-04 14:11 UTC (permalink / raw)
To: Proxmox Backup Server development discussion, Samuel Rufinatscha
Am 03.11.25 um 11:13 schrieb Samuel Rufinatscha:
> Some ACME servers (notably custom or legacy implementations) respond
> to HEAD /newNonce with a 204 No Content instead of the
> RFC 8555-recommended 200 OK [1]. While this behavior is technically
> off-spec, it is not illegal. This issue was reported on our bug
> tracker [2].
>
> The previous implementation treated any non-200 response as an error,
> causing account registration to fail against such servers. Relax the
> status-code check to accept both 200 and 204 responses (and potentially
> support other 2xx codes) to improve interoperability.
>
> Note: In comparison, PVE’s Perl ACME client performs a GET request [3]
> instead of a HEAD request and accepts any 2xx success code when
> retrieving the nonce [4]. This difference in behavior does not affect
> functionality but is worth noting for consistency across
> implementations.
The resulting code now looks mostly OK, but when doing a final review
before pushing this out I noticed two things that I should have actually
caught earlier:
1. Introducing + using the new http_status module should be a separate
patch upfront, it has nothing to do with the actual bug fix but is
rather an independent cleanup.
2. The Request struct and it's expected member are pub and is still
used directly in PBS - where it was originally factored out from
but not yet replaced. So the type change from `u16` to `&'static [u16]`
causes a breaking ABI change. That change itself can be manageable,
albeit if it can be easily avoided it's always nicer to do so; using
a constructor (potentially with builder pattern) and changing the
visibility of the members to pub(crate) or even making them private
to the module, can often be a good option to reduce friction for
any future change.
But here it would be nicer to clear the tech debt and switch PBS fully
over to the factored out impl, like e.g. PDM uses already. This should
then also allow us to reduce the visibility of the struct members and
the http_status module, which as of now I'd also rather see as
proxmox-acme specific and thus not exposing it to the public would be
better.
Do you want to give switching PBS over to the factored-out impl a try?
It adds a bit to the scope, but we have to clean this up (or accumulate
tech debt interest) sooner or later anyway, and if we do already a breaking
change I'd prefer sooner.
For the acme side of your changes nothing should change – besides maybe
already reducing visibility of structs and/or their members in a separate
patch.
In summary, a good order/split for the resulting patches could look
something like:
1. change PBS over to proxmox-acme, at least the client part.
2. reduce the visibility of types that now are only used in proxmox-acme
internally
3. introduce and use http_status mod in proxmox-acme
4. fix #6939
What do you think? If moving PBS to proxmox-acme turns out to be tricky,
or even more work than expected, we can also go this route here as stop
gap; but IMO it would be favorable to spend a bit more time in actually
cleaning this up and reduce code duplication than doing so.
Again, I should have caught that early, but FWIW, almost all of the work
you done so far will still be relevant, so no real harm done FWICT.
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
^ permalink raw reply [relevance 4%]
* [pbs-devel] [PATCH proxmox] fix #6913: auth-api: fix user ID parsing for 2-character realms
@ 2025-11-03 16:26 15% Samuel Rufinatscha
2025-11-11 10:40 5% ` Fabian Grünbichler
2025-11-14 10:34 5% ` [pbs-devel] applied: " Fabian Grünbichler
0 siblings, 2 replies; 39+ results
From: Samuel Rufinatscha @ 2025-11-03 16:26 UTC (permalink / raw)
To: pbs-devel
PVE and PBS both allow creating realms with names of length ≥ 2.
However, when creating a user, PBS rejected realms with 2 characters
(e.g. `test@aa`), while PVE accepted them. This issue was reported
in our bug tracker [1]. Since the issue appears in the underlying
`proxmox/proxmox-auth-api` crate, also PDM userid handling is
affected.
The issue is caused by a mismatch between realm creation and parsing
rules in `proxmox/proxmox-auth-api`. `REALM_ID_SCHEMA` allows
min_length(2), but `PROXMOX_AUTH_REALM_STRING_SCHEMA` enforced
min_length(3).
This patch lowers the minimum realm length in
`PROXMOX_AUTH_REALM_STRING_SCHEMA` from 3 to 2 to align PBS and PMG
with PVE.
## Testing
Please see the attached unit tests.
The changes were further verified using a rebuilt PBS .deb
deployment. PDM was tested using a non-package binary through the
provided client CLI.
## Maintainer notes:
Bump the `proxmox-auth-api` dependency, no breaking change.
PBS and PDM to use the new dependency.
[1] Bugzilla: https://bugzilla.proxmox.com/show_bug.cgi?id=6913
Fixes: #6913
Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
---
proxmox-auth-api/src/types.rs | 68 ++++++++++++++++++++++++++++++++++-
1 file changed, 67 insertions(+), 1 deletion(-)
diff --git a/proxmox-auth-api/src/types.rs b/proxmox-auth-api/src/types.rs
index 9bde661c..aa09fb93 100644
--- a/proxmox-auth-api/src/types.rs
+++ b/proxmox-auth-api/src/types.rs
@@ -95,7 +95,7 @@ pub const PROXMOX_GROUP_ID_SCHEMA: Schema = StringSchema::new("Group ID")
pub const PROXMOX_AUTH_REALM_STRING_SCHEMA: StringSchema =
StringSchema::new("Authentication domain ID")
.format(&proxmox_schema::api_types::SAFE_ID_FORMAT)
- .min_length(3)
+ .min_length(2)
.max_length(32);
pub const PROXMOX_AUTH_REALM_SCHEMA: Schema = PROXMOX_AUTH_REALM_STRING_SCHEMA.schema();
@@ -769,6 +769,72 @@ fn test_token_id() {
assert_eq!(auth_id.to_string(), "test@pam!bar".to_string());
}
+#[test]
+fn test_realm_validation() {
+ let empty_realm: Result<Realm, _> = "".to_string().try_into();
+ let one_char_realm: Result<Realm, _> = "a".to_string().try_into();
+ let two_char_realm: Result<Realm, _> = "aa".to_string().try_into();
+ let long_realm: Result<Realm, _> = "a".repeat(33).try_into();
+ let valid_realm: Result<Realm, _> = "pam".to_string().try_into();
+
+ assert!(empty_realm.is_err(), "Empty realm should fail validation");
+ assert!(
+ one_char_realm.is_err(),
+ "1-char realm should fail validation"
+ );
+ assert!(
+ two_char_realm.is_ok(),
+ "2-char realm should pass validation"
+ );
+ assert!(valid_realm.is_ok(), "Typical realm should pass validation");
+ assert!(
+ long_realm.is_err(),
+ "Realm >32 chars should fail validation"
+ );
+}
+
+#[test]
+fn test_userid_validation() {
+ let empty_str: Result<Userid, _> = "".parse();
+ let invalid_no_realm: Result<Userid, _> = "user".parse();
+ let invalid_empty_realm: Result<Userid, _> = "user@".parse();
+ let invalid_one_char_realm: Result<Userid, _> = "user@a".parse();
+ let valid_two_char_realm: Result<Userid, _> = "user@aa".parse();
+ let valid_long_realm: Result<Userid, _> = "user@pam".parse();
+ let invalid_long_realm: Result<Userid, _> = format!("user@{}", "a".repeat(33)).parse();
+ let invalid_empty_username: Result<Userid, _> = "@aa".parse();
+
+ assert!(empty_str.is_err(), "Empty userid should fail");
+ assert!(
+ invalid_no_realm.is_err(),
+ "Userid without realm should fail"
+ );
+ assert!(
+ invalid_empty_realm.is_err(),
+ "Userid with empty realm should fail"
+ );
+ assert!(
+ invalid_one_char_realm.is_err(),
+ "Userid with 1-char realm should fail"
+ );
+ assert!(
+ valid_two_char_realm.is_ok(),
+ "Userid with 2-char realm should pass"
+ );
+ assert!(
+ valid_long_realm.is_ok(),
+ "Userid with normal realm should pass"
+ );
+ assert!(
+ invalid_long_realm.is_err(),
+ "Userid with realm >32 chars should fail"
+ );
+ assert!(
+ invalid_empty_username.is_err(),
+ "Userid with empty username should fail"
+ );
+}
+
serde_plain::derive_deserialize_from_fromstr!(Userid, "valid user id");
serde_plain::derive_serialize_from_display!(Userid);
--
2.47.3
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
^ permalink raw reply [relevance 15%]
* [pbs-devel] superseded: [PATCH proxmox{, -backup} v2 0/2] fix #6939: acme: support servers returning 204 for nonce requests
2025-10-29 16:45 13% [pbs-devel] [PATCH proxmox{, -backup} v2 " Samuel Rufinatscha
2025-10-29 16:45 13% ` [pbs-devel] [PATCH proxmox v2 1/1] " Samuel Rufinatscha
2025-10-29 16:45 15% ` [pbs-devel] [PATCH proxmox-backup v2 1/1] fix #6939: acme: accept HTTP 204 from newNonce endpoint Samuel Rufinatscha
@ 2025-11-03 10:21 13% ` Samuel Rufinatscha
2 siblings, 0 replies; 39+ results
From: Samuel Rufinatscha @ 2025-11-03 10:21 UTC (permalink / raw)
To: pbs-devel
https://lore.proxmox.com/pbs-devel/20251103101322.100392-1-s.rufinatscha@proxmox.com/T/#t
On 10/29/25 5:45 PM, Samuel Rufinatscha wrote:
> Hi,
>
> this series proposes a change to ACME account registration in Proxmox
> Backup Server (PBS), so that it also works with ACME servers that return
> HTTP 204 No Content to the HEAD request for newNonce.
>
> This behaviour was observed against a specific ACME deployment and
> reported as bug #6939 [1]. Currently, PBS cannot register an ACME
> account for this CA.
>
> ## Problem
>
> During ACME account registration, PBS first fetches an anti-replay nonce
> by sending a HEAD request to the CA’s newNonce URL. RFC 8555 7.2 [2]
> says:
>
> * the server MUST include a Replay-Nonce header with a fresh nonce,
> * the server SHOULD use status 200 OK for the HEAD request,
> * the server MUST also handle GET on the same resource with status 204 No
> Content and an empty body [2].
>
> Currently, our Rust ACME clients only accept 200 OK. PBS inherits that
> strictness and aborts with:
>
> *ACME server responded with unexpected status code: 204*
>
> The author mentions, the issue did not appear with PVE9 [1].
> After looking into PVE’s Perl ACME client [3] it appears it uses a GET
> request instead of a HEAD request and accepts any 2xx success code
> when retrieving the nonce [5]. This difference in behavior does not
> affect functionality but is worth noting for consistency across
> implementations.
>
> ## Ideas to solve the problem
>
> To support ACME providers which return 204 No Content, the underlying
> ACME clients need to tolerate both 200 OK and 204 No Content as valid
> responses for the nonce HEAD request, as long as the Replay-Nonce is
> provided.
>
> I considered following solutions:
>
> 1. Change the `expected` field of the `AcmeRequest` type from `u16` to
> `Vec<u16>`, to support multiple success codes
>
> 2. Keep `expected: u16` and add a second field e.g. `expected_other:
> Vec<u16>` for "also allowed" codes.
>
> 3. Support any 2xx success codes, and remove the `expected` check
>
> I thought (1) might be reasonable, because:
>
> * It stays explicit and makes it clear which statuses are considered
> success.
> * We don’t create two parallel concepts ("expected" vs
> "expected_other") which introduces additional complexity
> * Can be extend later if we meet yet another harmless but not 200
> variant.
> * We don’t allow arbitrary 2xx.
>
> What do you think? Do you maybe have any other solution in mind that
> would fit better?
>
> ## Testing
>
> To prove the proposed fix, I reproduced the scenario:
>
> Pebble (release 2.8.0) from Let's Encrypt [5] running on a Debian 9 VM
> as the ACME server. nginx in front of Pebble, to intercept the
> `newNonce` request in order to return 204 No Content instead of 200 OK,
> all other requests are unchanged and forwarded to Pebble. Trust the
> Pebble and ngix CAs via `/usr/local/share/ca-certificates` +
> `update-ca-certificates` on the PBS VM.
>
> Then I ran following command against nginx:
>
> ```
> proxmox-backup-manager acme account register proxytest root@backup.local
> --directory 'https://nginx-address/dir
>
> Attempting to fetch Terms of
> Service from "https://acme-vm/dir"
> Terms of Service:
> data:text/plain,Do%20what%20thou%20wilt
> Do you agree to the above terms?
> [y|N]: y
> Do you want to use external account binding? [y|N]: N
> Attempting
> to register account with "https://acme-vm/dir"...
> Registration
> successful, account URL: https://acme-vm/my-account/160e58b66bdd72da
> ```
>
> When adjusting the nginx configuration to return any other non-expected
> success status code, e.g. 205, PBS expectely rejects with `API
> misbehaved: ACME server responded with unexpected status code: 205`.
>
> ## Maintainer notes:
>
> The patch series involves the following components:
>
> proxmox-acme: Apply PATCH 1 to change `expected` from `u16` to
> `Vec<u16>`. This results in a breaking change, as it changes the public
> API of the `AcmeRequest` type that is used by other components.
>
> proxmox-acme-api: Needs to depend on the new proxmox-acme; patch bump
>
> proxmox-backup: Apply PATCH 2 to use the new API changes; no breaking
> change as of only internal changes; patch bump
>
> proxmox-perl-rs / proxmox-datacenter-manager: Will need to use the
> dependency version bumps to follow the new proxmox-acme.
>
> ## Patch summary
>
> [PATCH 1/2] fix #6939: support providers returning 204 for nonce
> requests
>
> * Make the expected-status logic accept multiple allowed codes.
> * Treat both 200 OK and 204 No Content as valid for HEAD /newNonce,
> provided Replay-Nonce is present.
> * Keep rejecting other codes.
>
> [PATCH 2/2] acme: accept HTTP 204 from newNonce endpoint
>
> * Use the updated proxmox-acme behavior in PBS.
> * PBS can now register an ACME account against servers that return 204
> for the nonce HEAD request.
> * Still rejects unexpected codes.
>
> Thanks for considering this patch series, I look forward to your
> feedback.
>
> Best,
> Samuel Rufinatscha
>
> ## Changes from v1:
>
> [PATCH 1/2] fix #6939: support providers returning 204 for nonce
> requests
> * Introduced `http_success` module to contain the http success codes
> * Replaced `Vec<u16>` with `&[u16]` for expected codes to avoid
> allocations.
> * Clarified the PVEs Perl ACME client behaviour in the commit message.
>
> [PATCH 2/2] acme: accept HTTP 204 from newNonce endpoint
> * Integrated the `http_success` module, replacing `Vec<u16>` with `&[u16]`
> * Clarified the PVEs Perl ACME client behaviour in the commit message.
>
> [1] Bugzilla report #6939:
> [https://bugzilla.proxmox.com/show_bug.cgi?id=6939](https://bugzilla.proxmox.com/show_bug.cgi?id=6939)
> [2] RFC 8555 (ACME):
> [https://datatracker.ietf.org/doc/html/rfc8555/#section-7.2](https://datatracker.ietf.org/doc/html/rfc8555/#section-7.2)
> [3] PVE’s Perl ACME client (allow 2xx codes for nonce requests):
> [https://git.proxmox.com/?p=proxmox-acme.git;a=blob;f=src/PVE/ACME.pm;h=f1e9bb7d316e3cea1e376c610b0479119217aecc;hb=HEAD#l597](https://git.proxmox.com/?p=proxmox-acme.git;a=blob;f=src/PVE/ACME.pm;h=f1e9bb7d316e3cea1e376c610b0479119217aecc;hb=HEAD#l597)
> [4] Pebble ACME server:
> [https://github.com/letsencrypt/pebble](https://github.com/letsencrypt/pebble)
> [5] Pebble ACME server (perform GET request:
> [https://git.proxmox.com/?p=proxmox-acme.git;a=blob;f=src/PVE/ACME.pm;h=f1e9bb7d316e3cea1e376c610b0479119217aecc;hb=HEAD#l219](https://git.proxmox.com/?p=proxmox-acme.git;a=blob;f=src/PVE/ACME.pm;h=f1e9bb7d316e3cea1e376c610b0479119217aecc;hb=HEAD#l219)
>
> proxmox:
>
> Samuel Rufinatscha (1):
> fix #6939: acme: support servers returning 204 for nonce requests
>
> proxmox-acme/src/account.rs | 10 +++++-----
> proxmox-acme/src/async_client.rs | 6 +++---
> proxmox-acme/src/client.rs | 2 +-
> proxmox-acme/src/lib.rs | 4 ++++
> proxmox-acme/src/request.rs | 15 ++++++++++++---
> 5 files changed, 25 insertions(+), 12 deletions(-)
>
>
> proxmox-backup:
>
> Samuel Rufinatscha (1):
> fix #6939: acme: accept HTTP 204 from newNonce endpoint
>
> src/acme/client.rs | 8 ++++----
> 1 file changed, 4 insertions(+), 4 deletions(-)
>
>
> Summary over all repositories:
> 6 files changed, 29 insertions(+), 16 deletions(-)
>
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
^ permalink raw reply [relevance 13%]
* [pbs-devel] [PATCH proxmox v3 1/1] fix #6939: acme: support servers returning 204 for nonce requests
2025-11-03 10:13 13% [pbs-devel] [PATCH proxmox{, -backup} v3 " Samuel Rufinatscha
@ 2025-11-03 10:13 13% ` Samuel Rufinatscha
2025-11-04 14:11 4% ` Thomas Lamprecht
2025-11-03 10:13 15% ` [pbs-devel] [PATCH proxmox-backup v3 1/1] fix #6939: acme: accept HTTP 204 from newNonce endpoint Samuel Rufinatscha
1 sibling, 1 reply; 39+ results
From: Samuel Rufinatscha @ 2025-11-03 10:13 UTC (permalink / raw)
To: pbs-devel
Some ACME servers (notably custom or legacy implementations) respond
to HEAD /newNonce with a 204 No Content instead of the
RFC 8555-recommended 200 OK [1]. While this behavior is technically
off-spec, it is not illegal. This issue was reported on our bug
tracker [2].
The previous implementation treated any non-200 response as an error,
causing account registration to fail against such servers. Relax the
status-code check to accept both 200 and 204 responses (and potentially
support other 2xx codes) to improve interoperability.
Note: In comparison, PVE’s Perl ACME client performs a GET request [3]
instead of a HEAD request and accepts any 2xx success code when
retrieving the nonce [4]. This difference in behavior does not affect
functionality but is worth noting for consistency across
implementations.
[1] https://datatracker.ietf.org/doc/html/rfc8555/#section-7.2
[2] https://bugzilla.proxmox.com/show_bug.cgi?id=6939
[3] https://git.proxmox.com/?p=proxmox-acme.git;a=blob;f=src/PVE/ACME.pm;h=f1e9bb7d316e3cea1e376c610b0479119217aecc;hb=HEAD#l219
[4] https://git.proxmox.com/?p=proxmox-acme.git;a=blob;f=src/PVE/ACME.pm;h=f1e9bb7d316e3cea1e376c610b0479119217aecc;hb=HEAD#l597
Fixes: #6939
Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
---
proxmox-acme/src/account.rs | 10 +++++-----
proxmox-acme/src/async_client.rs | 6 +++---
proxmox-acme/src/client.rs | 2 +-
proxmox-acme/src/lib.rs | 4 ++++
proxmox-acme/src/request.rs | 15 ++++++++++++---
5 files changed, 25 insertions(+), 12 deletions(-)
diff --git a/proxmox-acme/src/account.rs b/proxmox-acme/src/account.rs
index 0bbf0027..1ad485a2 100644
--- a/proxmox-acme/src/account.rs
+++ b/proxmox-acme/src/account.rs
@@ -85,7 +85,7 @@ impl Account {
method: "POST",
content_type: crate::request::JSON_CONTENT_TYPE,
body,
- expected: crate::request::CREATED,
+ expected: &[crate::http_status::CREATED],
};
Ok(NewOrder::new(request))
@@ -107,7 +107,7 @@ impl Account {
method: "POST",
content_type: crate::request::JSON_CONTENT_TYPE,
body,
- expected: 200,
+ expected: &[crate::http_status::OK],
})
}
@@ -132,7 +132,7 @@ impl Account {
method: "POST",
content_type: crate::request::JSON_CONTENT_TYPE,
body,
- expected: 200,
+ expected: &[crate::http_status::OK],
})
}
@@ -157,7 +157,7 @@ impl Account {
method: "POST",
content_type: crate::request::JSON_CONTENT_TYPE,
body,
- expected: 200,
+ expected: &[crate::http_status::OK],
})
}
@@ -405,7 +405,7 @@ impl AccountCreator {
method: "POST",
content_type: crate::request::JSON_CONTENT_TYPE,
body,
- expected: crate::request::CREATED,
+ expected: &[crate::http_status::CREATED],
})
}
diff --git a/proxmox-acme/src/async_client.rs b/proxmox-acme/src/async_client.rs
index dc755fb9..66ec6024 100644
--- a/proxmox-acme/src/async_client.rs
+++ b/proxmox-acme/src/async_client.rs
@@ -420,7 +420,7 @@ impl AcmeClient {
};
if parts.status.is_success() {
- if status != request.expected {
+ if !request.expected.contains(&status) {
return Err(Error::InvalidApi(format!(
"ACME server responded with unexpected status code: {:?}",
parts.status
@@ -498,7 +498,7 @@ impl AcmeClient {
method: "GET",
content_type: "",
body: String::new(),
- expected: 200,
+ expected: &[crate::http_status::OK],
},
nonce,
)
@@ -550,7 +550,7 @@ impl AcmeClient {
method: "HEAD",
content_type: "",
body: String::new(),
- expected: 200,
+ expected: &[crate::http_status::OK, crate::http_status::NO_CONTENT],
},
nonce,
)
diff --git a/proxmox-acme/src/client.rs b/proxmox-acme/src/client.rs
index 931f7245..881ee83d 100644
--- a/proxmox-acme/src/client.rs
+++ b/proxmox-acme/src/client.rs
@@ -203,7 +203,7 @@ impl Inner {
let got_nonce = self.update_nonce(&mut response)?;
if response.is_success() {
- if response.status != request.expected {
+ if !request.expected.contains(&response.status) {
return Err(Error::InvalidApi(format!(
"API server responded with unexpected status code: {:?}",
response.status
diff --git a/proxmox-acme/src/lib.rs b/proxmox-acme/src/lib.rs
index df722629..cf1bc68d 100644
--- a/proxmox-acme/src/lib.rs
+++ b/proxmox-acme/src/lib.rs
@@ -70,6 +70,10 @@ pub use order::Order;
#[doc(inline)]
pub use request::Request;
+#[cfg(feature = "impl")]
+#[doc(inline)]
+pub use request::http_status;
+
// we don't inline these:
#[cfg(feature = "impl")]
pub use order::NewOrder;
diff --git a/proxmox-acme/src/request.rs b/proxmox-acme/src/request.rs
index 78a90913..24e669c5 100644
--- a/proxmox-acme/src/request.rs
+++ b/proxmox-acme/src/request.rs
@@ -1,7 +1,6 @@
use serde::Deserialize;
pub(crate) const JSON_CONTENT_TYPE: &str = "application/jose+json";
-pub(crate) const CREATED: u16 = 201;
/// A request which should be performed on the ACME provider.
pub struct Request {
@@ -17,8 +16,18 @@ pub struct Request {
/// The body to pass along with request, or an empty string.
pub body: String,
- /// The expected status code a compliant ACME provider will return on success.
- pub expected: u16,
+ /// The set of HTTP status codes that indicate a successful response from an ACME provider.
+ pub expected: &'static [u16],
+}
+
+/// Common HTTP status codes used in ACME responses.
+pub mod http_status {
+ /// 200 OK
+ pub const OK: u16 = 200;
+ /// 201 Created
+ pub const CREATED: u16 = 201;
+ /// 204 No Content
+ pub const NO_CONTENT: u16 = 204;
}
/// An ACME error response contains a specially formatted type string, and can optionally
--
2.47.3
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
^ permalink raw reply [relevance 13%]
* [pbs-devel] [PATCH proxmox-backup v3 1/1] fix #6939: acme: accept HTTP 204 from newNonce endpoint
2025-11-03 10:13 13% [pbs-devel] [PATCH proxmox{, -backup} v3 " Samuel Rufinatscha
2025-11-03 10:13 13% ` [pbs-devel] [PATCH proxmox v3 1/1] " Samuel Rufinatscha
@ 2025-11-03 10:13 15% ` Samuel Rufinatscha
1 sibling, 0 replies; 39+ results
From: Samuel Rufinatscha @ 2025-11-03 10:13 UTC (permalink / raw)
To: pbs-devel
When registering an ACME account, PBS fetches a fresh nonce by issuing a
HEAD request to the server's newNonce URL. Until now we assumed this
request would return HTTP 200 OK.
In practice, some ACME servers [1] respond with HTTP 204 No Content for
this HEAD request while still providing a valid Replay-Nonce header.
This causes PBS to abort registration with "ACME server responded with
unexpected status code: 204", even though the server would otherwise
issue certificates correctly. This behavior is technically
off-spec [2], however not forbidden.
Adjust the ACME client code in PBS to accept both 200 OK and 204 No
Content as successful results for the newNonce step. We continue to
reject other status codes so we don't silently accept arbitrary 2xx
responses.
Note: In comparison, PVE’s Perl ACME client performs a GET request [3]
instead of a HEAD request and accepts any 2xx success code when
retrieving the nonce [4]. This difference in behavior does not affect
functionality but is worth noting for consistency across
implementations.
[1] https://bugzilla.proxmox.com/show_bug.cgi?id=6939
[2] https://datatracker.ietf.org/doc/html/rfc8555/#section-7.2
[3] https://git.proxmox.com/?p=proxmox-acme.git;a=blob;f=src/PVE/ACME.pm;h=f1e9bb7d316e3cea1e376c610b0479119217aecc;hb=HEAD#l219
[4] https://git.proxmox.com/?p=proxmox-acme.git;a=blob;f=src/PVE/ACME.pm;h=f1e9bb7d316e3cea1e376c610b0479119217aecc;hb=HEAD#l597
Fixes: #6939
Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
---
src/acme/client.rs | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/src/acme/client.rs b/src/acme/client.rs
index 9fb6ad55..1962529a 100644
--- a/src/acme/client.rs
+++ b/src/acme/client.rs
@@ -15,7 +15,7 @@ use serde::{Deserialize, Serialize};
use proxmox_acme::account::AccountCreator;
use proxmox_acme::order::{Order, OrderData};
use proxmox_acme::types::AccountData as AcmeAccountData;
-use proxmox_acme::Request as AcmeRequest;
+use proxmox_acme::{http_status, Request as AcmeRequest};
use proxmox_acme::{Account, Authorization, Challenge, Directory, Error, ErrorResponse};
use proxmox_http::client::Client;
use proxmox_sys::fs::{replace_file, CreateOptions};
@@ -529,7 +529,7 @@ impl AcmeClient {
};
if parts.status.is_success() {
- if status != request.expected {
+ if !request.expected.contains(&status) {
return Err(Error::InvalidApi(format!(
"ACME server responded with unexpected status code: {:?}",
parts.status
@@ -606,7 +606,7 @@ impl AcmeClient {
method: "GET",
content_type: "",
body: String::new(),
- expected: 200,
+ expected: &[http_status::OK],
},
nonce,
)
@@ -654,7 +654,7 @@ impl AcmeClient {
method: "HEAD",
content_type: "",
body: String::new(),
- expected: 200,
+ expected: &[http_status::OK, http_status::NO_CONTENT],
},
nonce,
)
--
2.47.3
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
^ permalink raw reply [relevance 15%]
* [pbs-devel] [PATCH proxmox{, -backup} v3 0/2] fix #6939: acme: support servers returning 204 for nonce requests
@ 2025-11-03 10:13 13% Samuel Rufinatscha
2025-11-03 10:13 13% ` [pbs-devel] [PATCH proxmox v3 1/1] " Samuel Rufinatscha
2025-11-03 10:13 15% ` [pbs-devel] [PATCH proxmox-backup v3 1/1] fix #6939: acme: accept HTTP 204 from newNonce endpoint Samuel Rufinatscha
0 siblings, 2 replies; 39+ results
From: Samuel Rufinatscha @ 2025-11-03 10:13 UTC (permalink / raw)
To: pbs-devel
Hi,
this series proposes a change to ACME account registration in Proxmox
Backup Server (PBS), so that it also works with ACME servers that return
HTTP 204 No Content to the HEAD request for newNonce.
This behaviour was observed against a specific ACME deployment and
reported as bug #6939 [1]. Currently, PBS cannot register an ACME
account for this CA.
## Problem
During ACME account registration, PBS first fetches an anti-replay nonce
by sending a HEAD request to the CA’s newNonce URL. RFC 8555 7.2 [2]
says:
* the server MUST include a Replay-Nonce header with a fresh nonce,
* the server SHOULD use status 200 OK for the HEAD request,
* the server MUST also handle GET on the same resource with status 204 No
Content and an empty body [2].
Currently, our Rust ACME clients only accept 200 OK. PBS inherits that
strictness and aborts with:
*ACME server responded with unexpected status code: 204*
The author mentions, the issue did not appear with PVE9 [1].
After looking into PVE’s Perl ACME client [3] it appears it uses a GET
request instead of a HEAD request and accepts any 2xx success code
when retrieving the nonce [5]. This difference in behavior does not
affect functionality but is worth noting for consistency across
implementations.
## Ideas to solve the problem
To support ACME providers which return 204 No Content, the underlying
ACME clients need to tolerate both 200 OK and 204 No Content as valid
responses for the nonce HEAD request, as long as the Replay-Nonce is
provided.
I considered following solutions:
1. Change the `expected` field of the `AcmeRequest` type from `u16` to
`Vec<u16>`, to support multiple success codes
2. Keep `expected: u16` and add a second field e.g. `expected_other:
Vec<u16>` for "also allowed" codes.
3. Support any 2xx success codes, and remove the `expected` check
I thought (1) might be reasonable, because:
* It stays explicit and makes it clear which statuses are considered
success.
* We don’t create two parallel concepts ("expected" vs
"expected_other") which introduces additional complexity
* Can be extend later if we meet yet another harmless but not 200
variant.
* We don’t allow arbitrary 2xx.
What do you think? Do you maybe have any other solution in mind that
would fit better?
## Testing
To prove the proposed fix, I reproduced the scenario:
Pebble (release 2.8.0) from Let's Encrypt [5] running on a Debian 9 VM
as the ACME server. nginx in front of Pebble, to intercept the
`newNonce` request in order to return 204 No Content instead of 200 OK,
all other requests are unchanged and forwarded to Pebble. Trust the
Pebble and ngix CAs via `/usr/local/share/ca-certificates` +
`update-ca-certificates` on the PBS VM.
Then I ran following command against nginx:
```
proxmox-backup-manager acme account register proxytest root@backup.local
--directory 'https://nginx-address/dir
Attempting to fetch Terms of
Service from "https://acme-vm/dir"
Terms of Service:
data:text/plain,Do%20what%20thou%20wilt
Do you agree to the above terms?
[y|N]: y
Do you want to use external account binding? [y|N]: N
Attempting
to register account with "https://acme-vm/dir"...
Registration
successful, account URL: https://acme-vm/my-account/160e58b66bdd72da
```
When adjusting the nginx configuration to return any other non-expected
success status code, e.g. 205, PBS expectely rejects with `API
misbehaved: ACME server responded with unexpected status code: 205`.
## Maintainer notes:
The patch series involves the following components:
proxmox-acme: Apply PATCH 1 to change `expected` from `u16` to
`Vec<u16>`. This results in a breaking change, as it changes the public
API of the `AcmeRequest` type that is used by other components.
proxmox-acme-api: Needs to depend on the new proxmox-acme; patch bump
proxmox-backup: Apply PATCH 2 to use the new API changes; no breaking
change as of only internal changes; patch bump
proxmox-perl-rs / proxmox-datacenter-manager: Will need to use the
dependency version bumps to follow the new proxmox-acme.
## Patch summary
[PATCH 1/2] fix #6939: support providers returning 204 for nonce
requests
* Make the expected-status logic accept multiple allowed codes.
* Treat both 200 OK and 204 No Content as valid for HEAD /newNonce,
provided Replay-Nonce is present.
* Keep rejecting other codes.
[PATCH 2/2] acme: accept HTTP 204 from newNonce endpoint
* Use the updated proxmox-acme behavior in PBS.
* PBS can now register an ACME account against servers that return 204
for the nonce HEAD request.
* Still rejects unexpected codes.
Thanks for considering this patch series, I look forward to your
feedback.
Best,
Samuel Rufinatscha
## Changes from v1:
[PATCH 1/2] fix #6939: support providers returning 204 for nonce
requests
* Introduced `http_success` module to contain the http success codes
* Replaced `Vec<u16>` with `&[u16]` for expected codes to avoid
allocations.
* Clarified the PVEs Perl ACME client behaviour in the commit message.
[PATCH 2/2] acme: accept HTTP 204 from newNonce endpoint
* Integrated the `http_success` module, replacing `Vec<u16>` with `&[u16]`
* Clarified the PVEs Perl ACME client behaviour in the commit message.
## Changes from v2:
[PATCH 1/2] fix #6939: support providers returning 204 for nonce
requests
* Rename `http_success` module to `http_status`
[PATCH 2/2] acme: accept HTTP 204 from newNonce endpoint
* Replace `http_success` usage
[1] Bugzilla report #6939:
[https://bugzilla.proxmox.com/show_bug.cgi?id=6939](https://bugzilla.proxmox.com/show_bug.cgi?id=6939)
[2] RFC 8555 (ACME):
[https://datatracker.ietf.org/doc/html/rfc8555/#section-7.2](https://datatracker.ietf.org/doc/html/rfc8555/#section-7.2)
[3] PVE’s Perl ACME client (allow 2xx codes for nonce requests):
[https://git.proxmox.com/?p=proxmox-acme.git;a=blob;f=src/PVE/ACME.pm;h=f1e9bb7d316e3cea1e376c610b0479119217aecc;hb=HEAD#l597](https://git.proxmox.com/?p=proxmox-acme.git;a=blob;f=src/PVE/ACME.pm;h=f1e9bb7d316e3cea1e376c610b0479119217aecc;hb=HEAD#l597)
[4] Pebble ACME server:
[https://github.com/letsencrypt/pebble](https://github.com/letsencrypt/pebble)
[5] Pebble ACME server (perform GET request:
[https://git.proxmox.com/?p=proxmox-acme.git;a=blob;f=src/PVE/ACME.pm;h=f1e9bb7d316e3cea1e376c610b0479119217aecc;hb=HEAD#l219](https://git.proxmox.com/?p=proxmox-acme.git;a=blob;f=src/PVE/ACME.pm;h=f1e9bb7d316e3cea1e376c610b0479119217aecc;hb=HEAD#l219)
proxmox:
Samuel Rufinatscha (1):
fix #6939: acme: support servers returning 204 for nonce requests
proxmox-acme/src/account.rs | 10 +++++-----
proxmox-acme/src/async_client.rs | 6 +++---
proxmox-acme/src/client.rs | 2 +-
proxmox-acme/src/lib.rs | 4 ++++
proxmox-acme/src/request.rs | 15 ++++++++++++---
5 files changed, 25 insertions(+), 12 deletions(-)
proxmox-backup:
Samuel Rufinatscha (1):
fix #6939: acme: accept HTTP 204 from newNonce endpoint
src/acme/client.rs | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
Summary over all repositories:
6 files changed, 29 insertions(+), 16 deletions(-)
--
Generated by git-murpp 0.8.1
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
^ permalink raw reply [relevance 13%]
* Re: [pbs-devel] [PATCH proxmox v2 1/1] fix #6939: acme: support servers returning 204 for nonce requests
2025-10-31 16:21 5% ` Thomas Lamprecht
@ 2025-11-03 8:51 9% ` Samuel Rufinatscha
0 siblings, 0 replies; 39+ results
From: Samuel Rufinatscha @ 2025-11-03 8:51 UTC (permalink / raw)
To: Thomas Lamprecht, Proxmox Backup Server development discussion
On 10/31/25 5:21 PM, Thomas Lamprecht wrote:
> Am 29.10.25 um 17:45 schrieb Samuel Rufinatscha:
>> Some ACME servers (notably custom or legacy implementations) respond
>> to HEAD /newNonce with a 204 No Content instead of the
>> RFC 8555-recommended 200 OK [1]. While this behavior is technically
>> off-spec, it is not illegal. This issue was reported on our bug
>> tracker [2].
>>
>> The previous implementation treated any non-200 response as an error,
>> causing account registration to fail against such servers. Relax the
>> status-code check to accept both 200 and 204 responses (and potentially
>> support other 2xx codes) to improve interoperability.
>>
>> Note: In comparison, PVE’s Perl ACME client performs a GET request [3]
>> instead of a HEAD request and accepts any 2xx success code when
>> retrieving the nonce [4]. This difference in behavior does not affect
>> functionality but is worth noting for consistency across
>> implementations.
>>
>> [1] https://datatracker.ietf.org/doc/html/rfc8555/#section-7.2
>> [2] https://bugzilla.proxmox.com/show_bug.cgi?id=6939
>> [3] https://git.proxmox.com/?p=proxmox-acme.git;a=blob;f=src/PVE/ACME.pm;h=f1e9bb7d316e3cea1e376c610b0479119217aecc;hb=HEAD#l219
>> [4] https://git.proxmox.com/?p=proxmox-acme.git;a=blob;f=src/PVE/ACME.pm;h=f1e9bb7d316e3cea1e376c610b0479119217aecc;hb=HEAD#l597
>>
>> Fixes: #6939
>> Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
>> ---
>
> Thanks for the v2, looks OK in general, one naming issue – that irked
> my a tiny bit to much to just ignore it – inline.
>
>> proxmox-acme/src/account.rs | 10 +++++-----
>> proxmox-acme/src/async_client.rs | 6 +++---
>> proxmox-acme/src/client.rs | 2 +-
>> proxmox-acme/src/lib.rs | 4 ++++
>> proxmox-acme/src/request.rs | 15 ++++++++++++---
>> 5 files changed, 25 insertions(+), 12 deletions(-)
>
>> diff --git a/proxmox-acme/src/request.rs b/proxmox-acme/src/request.rs
>> index 78a90913..0532528e 100644
>> --- a/proxmox-acme/src/request.rs
>> +++ b/proxmox-acme/src/request.rs
>> @@ -1,7 +1,6 @@
>> use serde::Deserialize;
>>
>> pub(crate) const JSON_CONTENT_TYPE: &str = "application/jose+json";
>> -pub(crate) const CREATED: u16 = 201;
>>
>> /// A request which should be performed on the ACME provider.
>> pub struct Request {
>> @@ -17,8 +16,18 @@ pub struct Request {
>> /// The body to pass along with request, or an empty string.
>> pub body: String,
>>
>> - /// The expected status code a compliant ACME provider will return on success.
>> - pub expected: u16,
>> + /// The set of HTTP status codes that indicate a successful response from an ACME provider.
>> + pub expected: &'static [u16],
>> +}
>> +
>> +/// Common HTTP success status codes used in ACME responses.
>> +pub mod http_success {
>
> It's not wrong, but reads a bit odd to me; is it really relevant to differ
> between success and (in the future, e.g. if this gets copied+adapted somewhere
> else) get a http_client_error or http_server_error module?
>
> I'd probably rather just use one of http_status or http_status_code, as the
> codes themselves are uniquely named already anyway and with "status" included
> on the name it would be IMO slightly better describe what this refers to without
> having to read the doc comment.
>
>> + /// 200 OK
>> + pub const OK: u16 = 200;
>> + /// 201 Created
>> + pub const CREATED: u16 = 201;
>> + /// 204 No Content
>> + pub const NO_CONTENT: u16 = 204;
>> }
>>
>> /// An ACME error response contains a specially formatted type string, and can optionally
>
Thanks, Thomas! Agreed, http_status sounds clearer. I’ll update it and
send a v3.
Best,
Samuel
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
^ permalink raw reply [relevance 9%]
* Re: [pbs-devel] [PATCH proxmox v2 1/1] fix #6939: acme: support servers returning 204 for nonce requests
2025-10-29 16:45 13% ` [pbs-devel] [PATCH proxmox v2 1/1] " Samuel Rufinatscha
@ 2025-10-31 16:21 5% ` Thomas Lamprecht
2025-11-03 8:51 9% ` Samuel Rufinatscha
0 siblings, 1 reply; 39+ results
From: Thomas Lamprecht @ 2025-10-31 16:21 UTC (permalink / raw)
To: Proxmox Backup Server development discussion, Samuel Rufinatscha
Am 29.10.25 um 17:45 schrieb Samuel Rufinatscha:
> Some ACME servers (notably custom or legacy implementations) respond
> to HEAD /newNonce with a 204 No Content instead of the
> RFC 8555-recommended 200 OK [1]. While this behavior is technically
> off-spec, it is not illegal. This issue was reported on our bug
> tracker [2].
>
> The previous implementation treated any non-200 response as an error,
> causing account registration to fail against such servers. Relax the
> status-code check to accept both 200 and 204 responses (and potentially
> support other 2xx codes) to improve interoperability.
>
> Note: In comparison, PVE’s Perl ACME client performs a GET request [3]
> instead of a HEAD request and accepts any 2xx success code when
> retrieving the nonce [4]. This difference in behavior does not affect
> functionality but is worth noting for consistency across
> implementations.
>
> [1] https://datatracker.ietf.org/doc/html/rfc8555/#section-7.2
> [2] https://bugzilla.proxmox.com/show_bug.cgi?id=6939
> [3] https://git.proxmox.com/?p=proxmox-acme.git;a=blob;f=src/PVE/ACME.pm;h=f1e9bb7d316e3cea1e376c610b0479119217aecc;hb=HEAD#l219
> [4] https://git.proxmox.com/?p=proxmox-acme.git;a=blob;f=src/PVE/ACME.pm;h=f1e9bb7d316e3cea1e376c610b0479119217aecc;hb=HEAD#l597
>
> Fixes: #6939
> Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
> ---
Thanks for the v2, looks OK in general, one naming issue – that irked
my a tiny bit to much to just ignore it – inline.
> proxmox-acme/src/account.rs | 10 +++++-----
> proxmox-acme/src/async_client.rs | 6 +++---
> proxmox-acme/src/client.rs | 2 +-
> proxmox-acme/src/lib.rs | 4 ++++
> proxmox-acme/src/request.rs | 15 ++++++++++++---
> 5 files changed, 25 insertions(+), 12 deletions(-)
> diff --git a/proxmox-acme/src/request.rs b/proxmox-acme/src/request.rs
> index 78a90913..0532528e 100644
> --- a/proxmox-acme/src/request.rs
> +++ b/proxmox-acme/src/request.rs
> @@ -1,7 +1,6 @@
> use serde::Deserialize;
>
> pub(crate) const JSON_CONTENT_TYPE: &str = "application/jose+json";
> -pub(crate) const CREATED: u16 = 201;
>
> /// A request which should be performed on the ACME provider.
> pub struct Request {
> @@ -17,8 +16,18 @@ pub struct Request {
> /// The body to pass along with request, or an empty string.
> pub body: String,
>
> - /// The expected status code a compliant ACME provider will return on success.
> - pub expected: u16,
> + /// The set of HTTP status codes that indicate a successful response from an ACME provider.
> + pub expected: &'static [u16],
> +}
> +
> +/// Common HTTP success status codes used in ACME responses.
> +pub mod http_success {
It's not wrong, but reads a bit odd to me; is it really relevant to differ
between success and (in the future, e.g. if this gets copied+adapted somewhere
else) get a http_client_error or http_server_error module?
I'd probably rather just use one of http_status or http_status_code, as the
codes themselves are uniquely named already anyway and with "status" included
on the name it would be IMO slightly better describe what this refers to without
having to read the doc comment.
> + /// 200 OK
> + pub const OK: u16 = 200;
> + /// 201 Created
> + pub const CREATED: u16 = 201;
> + /// 204 No Content
> + pub const NO_CONTENT: u16 = 204;
> }
>
> /// An ACME error response contains a specially formatted type string, and can optionally
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
^ permalink raw reply [relevance 5%]
* [pbs-devel] superseded: [PATCH proxmox{, -backup} 0/2] fix #6939: acme: support servers returning 204 for nonce requests
2025-10-28 15:21 14% [pbs-devel] [PATCH proxmox{, -backup} 0/2] fix #6939: acme: support servers returning 204 for nonce requests Samuel Rufinatscha
` (2 preceding siblings ...)
2025-10-29 7:51 5% ` [pbs-devel] [PATCH proxmox{, -backup} 0/2] fix #6939: acme: support servers returning 204 for nonce requests Thomas Lamprecht
@ 2025-10-29 16:49 13% ` Samuel Rufinatscha
3 siblings, 0 replies; 39+ results
From: Samuel Rufinatscha @ 2025-10-29 16:49 UTC (permalink / raw)
To: pbs-devel
https://lore.proxmox.com/pbs-devel/20251029164520.263926-1-s.rufinatscha@proxmox.com/T/#t
On 10/28/25 4:21 PM, Samuel Rufinatscha wrote:
> Hi,
>
> this series proposes a change to ACME account registration in Proxmox
> Backup Server (PBS), so that it also works with ACME servers that return
> HTTP 204 No Content to the HEAD request for newNonce.
>
> This behaviour was observed against a specific ACME deployment and
> reported as bug #6939 [1]. Currently, PBS cannot register an ACME
> account for this CA.
>
> ## Problem
>
> During ACME account registration, PBS first fetches an anti-replay nonce
> by sending a HEAD request to the CA’s newNonce URL. RFC 8555 7.2 [2]
> says:
>
> * the server MUST include a Replay-Nonce header with a fresh nonce,
> * the server SHOULD use status 200 OK for the HEAD request,
> * the server MUST also handle GET on the same resource with status 204 No
> Content and an empty body [2].
>
> Currently, our Rust ACME clients only accept 200 OK. PBS inherits that
> strictness and aborts with:
>
> *ACME server responded with unexpected status code: 204*
>
> This is also stricter than the PVE Perl ACME client, which tolerates any
> 2xx success codes [3]. The author mentions, the issue did not appear
> with PVE9 [1].
>
> ## Ideas to solve the problem
>
> To support ACME providers which return 204 No Content, the underlying
> ACME clients need to tolerate both 200 OK and 204 No Content as valid
> responses for the nonce HEAD request, as long as the Replay-Nonce is
> provided.
>
> I considered following solutions:
>
> 1. Change the `expected` field of the `AcmeRequest` type from `u16` to
> `Vec<u16>`, to support multiple success codes
>
> 2. Keep `expected: u16` and add a second field e.g. `expected_other:
> Vec<u16>` for "also allowed" codes.
>
> 3. Support any 2xx success codes, and remove the `expected` check
>
> I thought (1) might be reasonable, because:
>
> * It stays explicit and makes it clear which statuses are considered
> success.
> * We don’t create two parallel concepts ("expected" vs
> "expected_other") which introduces additional complexity
> * Can be extend later if we meet yet another harmless but not 200
> variant.
> * We don’t allow arbitrary 2xx.
>
> What do you think? Do you maybe have any other solution in mind that
> would fit better?
>
> ## Testing
>
> To prove the proposed fix, I reproduced the scenario:
>
> Pebble (release 2.8.0) from Let's Encrypt [5] running on a Debian 9 VM
> as the ACME server. nginx in front of Pebble, to intercept the
> `newNonce` request in order to return 204 No Content instead of 200 OK,
> all other requests are unchanged and forwarded to Pebble. Trust the
> Pebble and ngix CAs via `/usr/local/share/ca-certificates` +
> `update-ca-certificates` on the PBS VM.
>
> Then I ran following command against nginx:
>
> ```
> proxmox-backup-manager acme account register proxytest root@backup.local
> --directory 'https://nginx-address/dir
>
> Attempting to fetch Terms of
> Service from "https://acme-vm/dir"
> Terms of Service:
> data:text/plain,Do%20what%20thou%20wilt
> Do you agree to the above terms?
> [y|N]: y
> Do you want to use external account binding? [y|N]: N
> Attempting
> to register account with "https://acme-vm/dir"...
> Registration
> successful, account URL: https://acme-vm/my-account/160e58b66bdd72da
> ```
>
> When adjusting the nginx configuration to return any other non-expected
> success status code, e.g. 205, PBS expectely rejects with `API
> misbehaved: ACME server responded with unexpected status code: 205`.
>
> ## Maintainer notes:
>
> The patch series involves the following components:
>
> proxmox-acme: Apply PATCH 1 to change `expected` from `u16` to
> `Vec<u16>`. This results in a breaking change, as it changes the public
> API of the `AcmeRequest` type that is used by other components.
>
> proxmox-acme-api: Needs to depend on the new proxmox-acme; patch bump
>
> proxmox-backup: Apply PATCH 2 to use the new API changes; no breaking
> change as of only internal changes; patch bump
>
> proxmox-perl-rs / proxmox-datacenter-manager: Will need to use the
> dependency version bumps to follow the new proxmox-acme.
>
> ## Patch summary
>
> [PATCH 1/2] fix #6939: support providers returning 204 for nonce
> requests
>
> * Make the expected-status logic accept multiple allowed codes.
> * Treat both 200 OK and 204 No Content as valid for HEAD /newNonce,
> provided Replay-Nonce is present.
> * Keep rejecting other codes.
>
> [PATCH 2/2] acme: accept HTTP 204 from newNonce endpoint
>
> * Use the updated proxmox-acme behavior in PBS.
> * PBS can now register an ACME account against servers that return 204
> for the nonce HEAD request.
> * Still rejects unexpected codes.
>
> Thanks for considering this patch series, I look forward to your
> feedback.
>
> Best,
> Samuel Rufinatscha
>
> [1] Bugzilla report #6939:
> [https://bugzilla.proxmox.com/show_bug.cgi?id=6939](https://bugzilla.proxmox.com/show_bug.cgi?id=6939)
> [2] RFC 8555 (ACME):
> [https://datatracker.ietf.org/doc/html/rfc8555/#section-7.2](https://datatracker.ietf.org/doc/html/rfc8555/#section-7.2)
> [3] PVE’s Perl ACME client:
> [https://git.proxmox.com/?p=proxmox-acme.git;a=blob;f=src/PVE/ACME.pm;h=f1e9bb7d316e3cea1e376c610b0479119217aecc;hb=HEAD#l597](https://git.proxmox.com/?p=proxmox-acme.git;a=blob;f=src/PVE/ACME.pm;h=f1e9bb7d316e3cea1e376c610b0479119217aecc;hb=HEAD#l597)
> [4] Pebble ACME server:
> [https://github.com/letsencrypt/pebble](https://github.com/letsencrypt/pebble)
>
> proxmox:
>
> Samuel Rufinatscha (1):
> fix #6939: acme: support servers returning 204 for nonce requests
>
> proxmox-acme/src/account.rs | 10 +++++-----
> proxmox-acme/src/async_client.rs | 6 +++---
> proxmox-acme/src/client.rs | 2 +-
> proxmox-acme/src/request.rs | 4 ++--
> 4 files changed, 11 insertions(+), 11 deletions(-)
>
>
> proxmox-backup:
>
> Samuel Rufinatscha (1):
> fix #6939: acme: accept HTTP 204 from newNonce endpoint
>
> src/acme/client.rs | 6 +++---
> 1 file changed, 3 insertions(+), 3 deletions(-)
>
>
> Summary over all repositories:
> 5 files changed, 14 insertions(+), 14 deletions(-)
>
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
^ permalink raw reply [relevance 13%]
* [pbs-devel] [PATCH proxmox v2 1/1] fix #6939: acme: support servers returning 204 for nonce requests
2025-10-29 16:45 13% [pbs-devel] [PATCH proxmox{, -backup} v2 " Samuel Rufinatscha
@ 2025-10-29 16:45 13% ` Samuel Rufinatscha
2025-10-31 16:21 5% ` Thomas Lamprecht
2025-10-29 16:45 15% ` [pbs-devel] [PATCH proxmox-backup v2 1/1] fix #6939: acme: accept HTTP 204 from newNonce endpoint Samuel Rufinatscha
2025-11-03 10:21 13% ` [pbs-devel] superseded: [PATCH proxmox{, -backup} v2 0/2] fix #6939: acme: support servers returning 204 for nonce requests Samuel Rufinatscha
2 siblings, 1 reply; 39+ results
From: Samuel Rufinatscha @ 2025-10-29 16:45 UTC (permalink / raw)
To: pbs-devel
Some ACME servers (notably custom or legacy implementations) respond
to HEAD /newNonce with a 204 No Content instead of the
RFC 8555-recommended 200 OK [1]. While this behavior is technically
off-spec, it is not illegal. This issue was reported on our bug
tracker [2].
The previous implementation treated any non-200 response as an error,
causing account registration to fail against such servers. Relax the
status-code check to accept both 200 and 204 responses (and potentially
support other 2xx codes) to improve interoperability.
Note: In comparison, PVE’s Perl ACME client performs a GET request [3]
instead of a HEAD request and accepts any 2xx success code when
retrieving the nonce [4]. This difference in behavior does not affect
functionality but is worth noting for consistency across
implementations.
[1] https://datatracker.ietf.org/doc/html/rfc8555/#section-7.2
[2] https://bugzilla.proxmox.com/show_bug.cgi?id=6939
[3] https://git.proxmox.com/?p=proxmox-acme.git;a=blob;f=src/PVE/ACME.pm;h=f1e9bb7d316e3cea1e376c610b0479119217aecc;hb=HEAD#l219
[4] https://git.proxmox.com/?p=proxmox-acme.git;a=blob;f=src/PVE/ACME.pm;h=f1e9bb7d316e3cea1e376c610b0479119217aecc;hb=HEAD#l597
Fixes: #6939
Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
---
proxmox-acme/src/account.rs | 10 +++++-----
proxmox-acme/src/async_client.rs | 6 +++---
proxmox-acme/src/client.rs | 2 +-
proxmox-acme/src/lib.rs | 4 ++++
proxmox-acme/src/request.rs | 15 ++++++++++++---
5 files changed, 25 insertions(+), 12 deletions(-)
diff --git a/proxmox-acme/src/account.rs b/proxmox-acme/src/account.rs
index 73d786b8..44f9383f 100644
--- a/proxmox-acme/src/account.rs
+++ b/proxmox-acme/src/account.rs
@@ -85,7 +85,7 @@ impl Account {
method: "POST",
content_type: crate::request::JSON_CONTENT_TYPE,
body,
- expected: crate::request::CREATED,
+ expected: &[crate::http_success::CREATED],
};
Ok(NewOrder::new(request))
@@ -107,7 +107,7 @@ impl Account {
method: "POST",
content_type: crate::request::JSON_CONTENT_TYPE,
body,
- expected: 200,
+ expected: &[crate::http_success::OK],
})
}
@@ -132,7 +132,7 @@ impl Account {
method: "POST",
content_type: crate::request::JSON_CONTENT_TYPE,
body,
- expected: 200,
+ expected: &[crate::http_success::OK],
})
}
@@ -157,7 +157,7 @@ impl Account {
method: "POST",
content_type: crate::request::JSON_CONTENT_TYPE,
body,
- expected: 200,
+ expected: &[crate::http_success::OK],
})
}
@@ -405,7 +405,7 @@ impl AccountCreator {
method: "POST",
content_type: crate::request::JSON_CONTENT_TYPE,
body,
- expected: crate::request::CREATED,
+ expected: &[crate::http_success::CREATED],
})
}
diff --git a/proxmox-acme/src/async_client.rs b/proxmox-acme/src/async_client.rs
index 60e1f359..b9df0f55 100644
--- a/proxmox-acme/src/async_client.rs
+++ b/proxmox-acme/src/async_client.rs
@@ -421,7 +421,7 @@ impl AcmeClient {
};
if parts.status.is_success() {
- if status != request.expected {
+ if !request.expected.contains(&status) {
return Err(Error::InvalidApi(format!(
"ACME server responded with unexpected status code: {:?}",
parts.status
@@ -501,7 +501,7 @@ impl AcmeClient {
method: "GET",
content_type: "",
body: String::new(),
- expected: 200,
+ expected: &[crate::http_success::OK],
},
nonce,
)
@@ -553,7 +553,7 @@ impl AcmeClient {
method: "HEAD",
content_type: "",
body: String::new(),
- expected: 200,
+ expected: &[crate::http_success::OK, crate::http_success::NO_CONTENT],
},
nonce,
)
diff --git a/proxmox-acme/src/client.rs b/proxmox-acme/src/client.rs
index d8a62081..ea8a8655 100644
--- a/proxmox-acme/src/client.rs
+++ b/proxmox-acme/src/client.rs
@@ -203,7 +203,7 @@ impl Inner {
let got_nonce = self.update_nonce(&mut response)?;
if response.is_success() {
- if response.status != request.expected {
+ if !request.expected.contains(&response.status) {
return Err(Error::InvalidApi(format!(
"API server responded with unexpected status code: {:?}",
response.status
diff --git a/proxmox-acme/src/lib.rs b/proxmox-acme/src/lib.rs
index df722629..ec586ec4 100644
--- a/proxmox-acme/src/lib.rs
+++ b/proxmox-acme/src/lib.rs
@@ -70,6 +70,10 @@ pub use order::Order;
#[doc(inline)]
pub use request::Request;
+#[cfg(feature = "impl")]
+#[doc(inline)]
+pub use request::http_success;
+
// we don't inline these:
#[cfg(feature = "impl")]
pub use order::NewOrder;
diff --git a/proxmox-acme/src/request.rs b/proxmox-acme/src/request.rs
index 78a90913..0532528e 100644
--- a/proxmox-acme/src/request.rs
+++ b/proxmox-acme/src/request.rs
@@ -1,7 +1,6 @@
use serde::Deserialize;
pub(crate) const JSON_CONTENT_TYPE: &str = "application/jose+json";
-pub(crate) const CREATED: u16 = 201;
/// A request which should be performed on the ACME provider.
pub struct Request {
@@ -17,8 +16,18 @@ pub struct Request {
/// The body to pass along with request, or an empty string.
pub body: String,
- /// The expected status code a compliant ACME provider will return on success.
- pub expected: u16,
+ /// The set of HTTP status codes that indicate a successful response from an ACME provider.
+ pub expected: &'static [u16],
+}
+
+/// Common HTTP success status codes used in ACME responses.
+pub mod http_success {
+ /// 200 OK
+ pub const OK: u16 = 200;
+ /// 201 Created
+ pub const CREATED: u16 = 201;
+ /// 204 No Content
+ pub const NO_CONTENT: u16 = 204;
}
/// An ACME error response contains a specially formatted type string, and can optionally
--
2.47.3
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
^ permalink raw reply [relevance 13%]
* [pbs-devel] [PATCH proxmox-backup v2 1/1] fix #6939: acme: accept HTTP 204 from newNonce endpoint
2025-10-29 16:45 13% [pbs-devel] [PATCH proxmox{, -backup} v2 " Samuel Rufinatscha
2025-10-29 16:45 13% ` [pbs-devel] [PATCH proxmox v2 1/1] " Samuel Rufinatscha
@ 2025-10-29 16:45 15% ` Samuel Rufinatscha
2025-11-03 10:21 13% ` [pbs-devel] superseded: [PATCH proxmox{, -backup} v2 0/2] fix #6939: acme: support servers returning 204 for nonce requests Samuel Rufinatscha
2 siblings, 0 replies; 39+ results
From: Samuel Rufinatscha @ 2025-10-29 16:45 UTC (permalink / raw)
To: pbs-devel
When registering an ACME account, PBS fetches a fresh nonce by issuing a
HEAD request to the server's newNonce URL. Until now we assumed this
request would return HTTP 200 OK.
In practice, some ACME servers [1] respond with HTTP 204 No Content for
this HEAD request while still providing a valid Replay-Nonce header.
This causes PBS to abort registration with "ACME server responded with
unexpected status code: 204", even though the server would otherwise
issue certificates correctly. This behavior is technically
off-spec [2], however not forbidden.
Adjust the ACME client code in PBS to accept both 200 OK and 204 No
Content as successful results for the newNonce step. We continue to
reject other status codes so we don't silently accept arbitrary 2xx
responses.
Note: In comparison, PVE’s Perl ACME client performs a GET request [3]
instead of a HEAD request and accepts any 2xx success code when
retrieving the nonce [4]. This difference in behavior does not affect
functionality but is worth noting for consistency across
implementations.
[1] https://bugzilla.proxmox.com/show_bug.cgi?id=6939
[2] https://datatracker.ietf.org/doc/html/rfc8555/#section-7.2
[3] https://git.proxmox.com/?p=proxmox-acme.git;a=blob;f=src/PVE/ACME.pm;h=f1e9bb7d316e3cea1e376c610b0479119217aecc;hb=HEAD#l219
[4] https://git.proxmox.com/?p=proxmox-acme.git;a=blob;f=src/PVE/ACME.pm;h=f1e9bb7d316e3cea1e376c610b0479119217aecc;hb=HEAD#l597
Fixes: #6939
Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
---
src/acme/client.rs | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/src/acme/client.rs b/src/acme/client.rs
index 1c12a4b9..7c32940b 100644
--- a/src/acme/client.rs
+++ b/src/acme/client.rs
@@ -15,7 +15,7 @@ use serde::{Deserialize, Serialize};
use proxmox_acme::account::AccountCreator;
use proxmox_acme::order::{Order, OrderData};
use proxmox_acme::types::AccountData as AcmeAccountData;
-use proxmox_acme::Request as AcmeRequest;
+use proxmox_acme::{http_success, Request as AcmeRequest};
use proxmox_acme::{Account, Authorization, Challenge, Directory, Error, ErrorResponse};
use proxmox_http::client::Client;
use proxmox_sys::fs::{replace_file, CreateOptions};
@@ -530,7 +530,7 @@ impl AcmeClient {
};
if parts.status.is_success() {
- if status != request.expected {
+ if !request.expected.contains(&status) {
return Err(Error::InvalidApi(format!(
"ACME server responded with unexpected status code: {:?}",
parts.status
@@ -609,7 +609,7 @@ impl AcmeClient {
method: "GET",
content_type: "",
body: String::new(),
- expected: 200,
+ expected: &[http_success::OK],
},
nonce,
)
@@ -657,7 +657,7 @@ impl AcmeClient {
method: "HEAD",
content_type: "",
body: String::new(),
- expected: 200,
+ expected: &[http_success::OK, http_success::NO_CONTENT],
},
nonce,
)
--
2.47.3
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
^ permalink raw reply [relevance 15%]
* [pbs-devel] [PATCH proxmox{, -backup} v2 0/2] fix #6939: acme: support servers returning 204 for nonce requests
@ 2025-10-29 16:45 13% Samuel Rufinatscha
2025-10-29 16:45 13% ` [pbs-devel] [PATCH proxmox v2 1/1] " Samuel Rufinatscha
` (2 more replies)
0 siblings, 3 replies; 39+ results
From: Samuel Rufinatscha @ 2025-10-29 16:45 UTC (permalink / raw)
To: pbs-devel
Hi,
this series proposes a change to ACME account registration in Proxmox
Backup Server (PBS), so that it also works with ACME servers that return
HTTP 204 No Content to the HEAD request for newNonce.
This behaviour was observed against a specific ACME deployment and
reported as bug #6939 [1]. Currently, PBS cannot register an ACME
account for this CA.
## Problem
During ACME account registration, PBS first fetches an anti-replay nonce
by sending a HEAD request to the CA’s newNonce URL. RFC 8555 7.2 [2]
says:
* the server MUST include a Replay-Nonce header with a fresh nonce,
* the server SHOULD use status 200 OK for the HEAD request,
* the server MUST also handle GET on the same resource with status 204 No
Content and an empty body [2].
Currently, our Rust ACME clients only accept 200 OK. PBS inherits that
strictness and aborts with:
*ACME server responded with unexpected status code: 204*
The author mentions, the issue did not appear with PVE9 [1].
After looking into PVE’s Perl ACME client [3] it appears it uses a GET
request instead of a HEAD request and accepts any 2xx success code
when retrieving the nonce [5]. This difference in behavior does not
affect functionality but is worth noting for consistency across
implementations.
## Ideas to solve the problem
To support ACME providers which return 204 No Content, the underlying
ACME clients need to tolerate both 200 OK and 204 No Content as valid
responses for the nonce HEAD request, as long as the Replay-Nonce is
provided.
I considered following solutions:
1. Change the `expected` field of the `AcmeRequest` type from `u16` to
`Vec<u16>`, to support multiple success codes
2. Keep `expected: u16` and add a second field e.g. `expected_other:
Vec<u16>` for "also allowed" codes.
3. Support any 2xx success codes, and remove the `expected` check
I thought (1) might be reasonable, because:
* It stays explicit and makes it clear which statuses are considered
success.
* We don’t create two parallel concepts ("expected" vs
"expected_other") which introduces additional complexity
* Can be extend later if we meet yet another harmless but not 200
variant.
* We don’t allow arbitrary 2xx.
What do you think? Do you maybe have any other solution in mind that
would fit better?
## Testing
To prove the proposed fix, I reproduced the scenario:
Pebble (release 2.8.0) from Let's Encrypt [5] running on a Debian 9 VM
as the ACME server. nginx in front of Pebble, to intercept the
`newNonce` request in order to return 204 No Content instead of 200 OK,
all other requests are unchanged and forwarded to Pebble. Trust the
Pebble and ngix CAs via `/usr/local/share/ca-certificates` +
`update-ca-certificates` on the PBS VM.
Then I ran following command against nginx:
```
proxmox-backup-manager acme account register proxytest root@backup.local
--directory 'https://nginx-address/dir
Attempting to fetch Terms of
Service from "https://acme-vm/dir"
Terms of Service:
data:text/plain,Do%20what%20thou%20wilt
Do you agree to the above terms?
[y|N]: y
Do you want to use external account binding? [y|N]: N
Attempting
to register account with "https://acme-vm/dir"...
Registration
successful, account URL: https://acme-vm/my-account/160e58b66bdd72da
```
When adjusting the nginx configuration to return any other non-expected
success status code, e.g. 205, PBS expectely rejects with `API
misbehaved: ACME server responded with unexpected status code: 205`.
## Maintainer notes:
The patch series involves the following components:
proxmox-acme: Apply PATCH 1 to change `expected` from `u16` to
`Vec<u16>`. This results in a breaking change, as it changes the public
API of the `AcmeRequest` type that is used by other components.
proxmox-acme-api: Needs to depend on the new proxmox-acme; patch bump
proxmox-backup: Apply PATCH 2 to use the new API changes; no breaking
change as of only internal changes; patch bump
proxmox-perl-rs / proxmox-datacenter-manager: Will need to use the
dependency version bumps to follow the new proxmox-acme.
## Patch summary
[PATCH 1/2] fix #6939: support providers returning 204 for nonce
requests
* Make the expected-status logic accept multiple allowed codes.
* Treat both 200 OK and 204 No Content as valid for HEAD /newNonce,
provided Replay-Nonce is present.
* Keep rejecting other codes.
[PATCH 2/2] acme: accept HTTP 204 from newNonce endpoint
* Use the updated proxmox-acme behavior in PBS.
* PBS can now register an ACME account against servers that return 204
for the nonce HEAD request.
* Still rejects unexpected codes.
Thanks for considering this patch series, I look forward to your
feedback.
Best,
Samuel Rufinatscha
## Changes from v1:
[PATCH 1/2] fix #6939: support providers returning 204 for nonce
requests
* Introduced `http_success` module to contain the http success codes
* Replaced `Vec<u16>` with `&[u16]` for expected codes to avoid
allocations.
* Clarified the PVEs Perl ACME client behaviour in the commit message.
[PATCH 2/2] acme: accept HTTP 204 from newNonce endpoint
* Integrated the `http_success` module, replacing `Vec<u16>` with `&[u16]`
* Clarified the PVEs Perl ACME client behaviour in the commit message.
[1] Bugzilla report #6939:
[https://bugzilla.proxmox.com/show_bug.cgi?id=6939](https://bugzilla.proxmox.com/show_bug.cgi?id=6939)
[2] RFC 8555 (ACME):
[https://datatracker.ietf.org/doc/html/rfc8555/#section-7.2](https://datatracker.ietf.org/doc/html/rfc8555/#section-7.2)
[3] PVE’s Perl ACME client (allow 2xx codes for nonce requests):
[https://git.proxmox.com/?p=proxmox-acme.git;a=blob;f=src/PVE/ACME.pm;h=f1e9bb7d316e3cea1e376c610b0479119217aecc;hb=HEAD#l597](https://git.proxmox.com/?p=proxmox-acme.git;a=blob;f=src/PVE/ACME.pm;h=f1e9bb7d316e3cea1e376c610b0479119217aecc;hb=HEAD#l597)
[4] Pebble ACME server:
[https://github.com/letsencrypt/pebble](https://github.com/letsencrypt/pebble)
[5] Pebble ACME server (perform GET request:
[https://git.proxmox.com/?p=proxmox-acme.git;a=blob;f=src/PVE/ACME.pm;h=f1e9bb7d316e3cea1e376c610b0479119217aecc;hb=HEAD#l219](https://git.proxmox.com/?p=proxmox-acme.git;a=blob;f=src/PVE/ACME.pm;h=f1e9bb7d316e3cea1e376c610b0479119217aecc;hb=HEAD#l219)
proxmox:
Samuel Rufinatscha (1):
fix #6939: acme: support servers returning 204 for nonce requests
proxmox-acme/src/account.rs | 10 +++++-----
proxmox-acme/src/async_client.rs | 6 +++---
proxmox-acme/src/client.rs | 2 +-
proxmox-acme/src/lib.rs | 4 ++++
proxmox-acme/src/request.rs | 15 ++++++++++++---
5 files changed, 25 insertions(+), 12 deletions(-)
proxmox-backup:
Samuel Rufinatscha (1):
fix #6939: acme: accept HTTP 204 from newNonce endpoint
src/acme/client.rs | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
Summary over all repositories:
6 files changed, 29 insertions(+), 16 deletions(-)
--
Generated by git-murpp 0.8.1
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
^ permalink raw reply [relevance 13%]
* Re: [pbs-devel] [PATCH proxmox{, -backup} 0/2] fix #6939: acme: support servers returning 204 for nonce requests
2025-10-29 7:51 5% ` [pbs-devel] [PATCH proxmox{, -backup} 0/2] fix #6939: acme: support servers returning 204 for nonce requests Thomas Lamprecht
@ 2025-10-29 16:02 6% ` Samuel Rufinatscha
0 siblings, 0 replies; 39+ results
From: Samuel Rufinatscha @ 2025-10-29 16:02 UTC (permalink / raw)
To: Thomas Lamprecht, Proxmox Backup Server development discussion
On 10/29/25 8:50 AM, Thomas Lamprecht wrote:
> Thanks for the patch and actually also checking if PVE is affected, and why
> it isn't.
>
> Am 28.10.25 um 20:34 schrieb Samuel Rufinatscha:
>> This is also stricter than the PVE Perl ACME client, which tolerates any
>> 2xx success codes [3]. The author mentions, the issue did not appear
>> with PVE9 [1].
>>
>> ## Ideas to solve the problem
>>
>> To support ACME providers which return 204 No Content, the underlying
>> ACME clients need to tolerate both 200 OK and 204 No Content as valid
>> responses for the nonce HEAD request, as long as the Replay-Nonce is
>> provided.
>>
>> I considered following solutions:
>>
>> 1. Change the `expected` field of the `AcmeRequest` type from `u16` to
>> `Vec<u16>`, to support multiple success codes
>>
>> 2. Keep `expected: u16` and add a second field e.g. `expected_other:
>> Vec<u16>` for "also allowed" codes.
>>
>> 3. Support any 2xx success codes, and remove the `expected` check
>>
>> I thought (1) might be reasonable, because:
>>
>> * It stays explicit and makes it clear which statuses are considered
>> success.
>> * We don’t create two parallel concepts ("expected" vs
>> "expected_other") which introduces additional complexity
>> * Can be extend later if we meet yet another harmless but not 200
>> variant.
>> * We don’t allow arbitrary 2xx.
>>
>> What do you think? Do you maybe have any other solution in mind that
>> would fit better?
>
> There probably isn't answer that strictly right in all cases, but in
> general it's good to have similar behavior across implementations,
> especially given that we do not know of any report where the PVE behavior
> of accepting all 2xx response codes caused any problems, from that 3.
> would be best, or does the RFC forbid the server to accept other status
> codes?
>
> That said, in practice only 201 (Created) might make sense for ACME in
> addition to the referenced 200 (OK) and 204 (No Content), and following
> the RFC is fine, so 1. is IMO also a good solution here.
> Please note the difference to PVE in the commit message, there you write
> that behavior is now aligned with PVE, but it's rather "closer aligned"
> to, not fully.
According to RFC 8555, most ACME endpoints define exact expected status
codes. The only case where multiple 2xx codes are explicitly allowed is the
newNonce endpoint, where the RFC states that the server SHOULD return
200 for
HEAD and MUST return 204 for GET.
I further looked into the Perl client and found that it enforces specific
codes for most endpoints as well, but performs a GET request [1] for nonce
retrieval (instead of HEAD) and accepts any 2xx success [2] code in that
case. So the behavior isn’t directly comparable to the Rust client, but it’s
worth noting. Functionally, this shouldn’t be an issue, as ACME
providers are
required to support both methods, with HEAD being the preferred one.
I will update the commit messages and cover letter accordingly.
[1]
https://git.proxmox.com/?p=proxmox-acme.git;a=blob;f=src/PVE/ACME.pm;h=f1e9bb7d316e3cea1e376c610b0479119217aecc;hb=HEAD#l219
[2]
https://git.proxmox.com/?p=proxmox-acme.git;a=blob;f=src/PVE/ACME.pm;h=f1e9bb7d316e3cea1e376c610b0479119217aecc;hb=HEAD#l597
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
^ permalink raw reply [relevance 6%]
* Re: [pbs-devel] [PATCH proxmox 1/1] fix #6939: acme: support servers returning 204 for nonce requests
2025-10-29 10:38 5% ` Wolfgang Bumiller
@ 2025-10-29 15:56 6% ` Samuel Rufinatscha
0 siblings, 0 replies; 39+ results
From: Samuel Rufinatscha @ 2025-10-29 15:56 UTC (permalink / raw)
To: Wolfgang Bumiller; +Cc: pbs-devel
On 10/29/25 11:38 AM, Wolfgang Bumiller wrote:
> On Tue, Oct 28, 2025 at 04:22:00PM +0100, Samuel Rufinatscha wrote:
>> Some ACME servers (notably custom or legacy implementations) respond
>> to HEAD /newNonce with a 204 No Content instead of the
>> RFC 8555-recommended 200 OK [1]. While this behavior is technically
>> off-spec, it is functionally harmless. This issue was reported on our
>> bug tracker [2].
>>
>> The previous implementation treated any non-200 response as an error,
>> causing account registration to fail against such servers. Relax the
>> status-code check to accept both 200 and 204 responses (and potentially
>> support other 2xx codes) to improve interoperability.
>>
>> This aligns behavior with PVE’s more tolerant Perl ACME client and
>> avoids regressions.
>>
>> [1] https://datatracker.ietf.org/doc/html/rfc8555/#section-7.2
>> [2] https://bugzilla.proxmox.com/show_bug.cgi?id=6939
>>
>> Fixes: #6939
>> Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
>> ---
>> proxmox-acme/src/account.rs | 10 +++++-----
>> proxmox-acme/src/async_client.rs | 6 +++---
>> proxmox-acme/src/client.rs | 2 +-
>> proxmox-acme/src/request.rs | 4 ++--
>> 4 files changed, 11 insertions(+), 11 deletions(-)
>>
>> diff --git a/proxmox-acme/src/account.rs b/proxmox-acme/src/account.rs
>> index 73d786b8..60719865 100644
>> --- a/proxmox-acme/src/account.rs
>> +++ b/proxmox-acme/src/account.rs
>> @@ -85,7 +85,7 @@ impl Account {
>> method: "POST",
>> content_type: crate::request::JSON_CONTENT_TYPE,
>> body,
>> - expected: crate::request::CREATED,
>> + expected: vec![crate::request::CREATED],
>> };
>>
>> Ok(NewOrder::new(request))
>> @@ -107,7 +107,7 @@ impl Account {
>> method: "POST",
>> content_type: crate::request::JSON_CONTENT_TYPE,
>> body,
>> - expected: 200,
>> + expected: vec![200],
>> })
>> }
>>
>> @@ -132,7 +132,7 @@ impl Account {
>> method: "POST",
>> content_type: crate::request::JSON_CONTENT_TYPE,
>> body,
>> - expected: 200,
>> + expected: vec![200],
>> })
>> }
>>
>> @@ -157,7 +157,7 @@ impl Account {
>> method: "POST",
>> content_type: crate::request::JSON_CONTENT_TYPE,
>> body,
>> - expected: 200,
>> + expected: vec![200],
>> })
>> }
>>
>> @@ -405,7 +405,7 @@ impl AccountCreator {
>> method: "POST",
>> content_type: crate::request::JSON_CONTENT_TYPE,
>> body,
>> - expected: crate::request::CREATED,
>> + expected: vec![crate::request::CREATED],
>> })
>> }
>>
>> diff --git a/proxmox-acme/src/async_client.rs b/proxmox-acme/src/async_client.rs
>> index 60e1f359..0901aa8d 100644
>> --- a/proxmox-acme/src/async_client.rs
>> +++ b/proxmox-acme/src/async_client.rs
>> @@ -421,7 +421,7 @@ impl AcmeClient {
>> };
>>
>> if parts.status.is_success() {
>> - if status != request.expected {
>> + if !request.expected.contains(&status) {
>> return Err(Error::InvalidApi(format!(
>> "ACME server responded with unexpected status code: {:?}",
>> parts.status
>> @@ -501,7 +501,7 @@ impl AcmeClient {
>> method: "GET",
>> content_type: "",
>> body: String::new(),
>> - expected: 200,
>> + expected: vec![200],
>> },
>> nonce,
>> )
>> @@ -553,7 +553,7 @@ impl AcmeClient {
>> method: "HEAD",
>> content_type: "",
>> body: String::new(),
>> - expected: 200,
>> + expected: vec![200, 204],
>> },
>> nonce,
>> )
>> diff --git a/proxmox-acme/src/client.rs b/proxmox-acme/src/client.rs
>> index d8a62081..ea8a8655 100644
>> --- a/proxmox-acme/src/client.rs
>> +++ b/proxmox-acme/src/client.rs
>> @@ -203,7 +203,7 @@ impl Inner {
>> let got_nonce = self.update_nonce(&mut response)?;
>>
>> if response.is_success() {
>> - if response.status != request.expected {
>> + if !request.expected.contains(&response.status) {
>> return Err(Error::InvalidApi(format!(
>> "API server responded with unexpected status code: {:?}",
>> response.status
>> diff --git a/proxmox-acme/src/request.rs b/proxmox-acme/src/request.rs
>> index 78a90913..38e825d6 100644
>> --- a/proxmox-acme/src/request.rs
>> +++ b/proxmox-acme/src/request.rs
>> @@ -17,8 +17,8 @@ pub struct Request {
>> /// The body to pass along with request, or an empty string.
>> pub body: String,
>>
>> - /// The expected status code a compliant ACME provider will return on success.
>> - pub expected: u16,
>> + /// The set of HTTP status codes that indicate a successful response from an ACME provider.
>> + pub expected: Vec<u16>,
>
> We always have a static set, so I'd rather use `&'static [u16]` here.
> There's no need to allocate usually-single-element vectors everywhere.
Agree, will replace the `Vec` with `&'static [u16]`.
>
>> }
>>
>> /// An ACME error response contains a specially formatted type string, and can optionally
>> --
>> 2.47.3
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
^ permalink raw reply [relevance 6%]
* Re: [pbs-devel] [PATCH proxmox 1/1] fix #6939: acme: support servers returning 204 for nonce requests
2025-10-29 10:36 0% ` Wolfgang Bumiller
@ 2025-10-29 15:50 6% ` Samuel Rufinatscha
0 siblings, 0 replies; 39+ results
From: Samuel Rufinatscha @ 2025-10-29 15:50 UTC (permalink / raw)
To: Wolfgang Bumiller, Thomas Lamprecht
Cc: Proxmox Backup Server development discussion
On 10/29/25 11:35 AM, Wolfgang Bumiller wrote:
> On Wed, Oct 29, 2025 at 08:53:34AM +0100, Thomas Lamprecht wrote:
>> Am 29.10.25 um 08:23 schrieb Christian Ebner:
>>> Hi, thanks for the patches!
>>>
>>> comments inline
>>>
>>> On 10/28/25 8:34 PM, Samuel Rufinatscha wrote:
>>>> Some ACME servers (notably custom or legacy implementations) respond
>>>> to HEAD /newNonce with a 204 No Content instead of the
>>>> RFC 8555-recommended 200 OK [1]. While this behavior is technically
>>>> off-spec, it is functionally harmless. This issue was reported on our
>>>> bug tracker [2].
>>>>
>>>> The previous implementation treated any non-200 response as an error,
>>>> causing account registration to fail against such servers. Relax the
>>>> status-code check to accept both 200 and 204 responses (and potentially
>>>> support other 2xx codes) to improve interoperability.
>>>>
>>>> This aligns behavior with PVE’s more tolerant Perl ACME client and
>>>> avoids regressions.
>>>>
>>>> [1] https://datatracker.ietf.org/doc/html/rfc8555/#section-7.2
>>>> [2] https://bugzilla.proxmox.com/show_bug.cgi?id=6939
>>>>
>>>> Fixes: #6939
>>>> Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
>>>> ---
>>>> proxmox-acme/src/account.rs | 10 +++++-----
>>>> proxmox-acme/src/async_client.rs | 6 +++---
>>>> proxmox-acme/src/client.rs | 2 +-
>>>> proxmox-acme/src/request.rs | 4 ++--
>>>> 4 files changed, 11 insertions(+), 11 deletions(-)
>>>>
>>>> diff --git a/proxmox-acme/src/account.rs b/proxmox-acme/src/account.rs
>>>> index 73d786b8..60719865 100644
>>>> --- a/proxmox-acme/src/account.rs
>>>> +++ b/proxmox-acme/src/account.rs
>>>> @@ -85,7 +85,7 @@ impl Account {
>>>> method: "POST",
>>>> content_type: crate::request::JSON_CONTENT_TYPE,
>>>> body,
>>>> - expected: crate::request::CREATED,
>>>> + expected: vec![crate::request::CREATED],
>>>
>>> while this is defined as dedicated constant...
>>>
>>>> };
>>>> Ok(NewOrder::new(request))
>>>> @@ -107,7 +107,7 @@ impl Account {
>>>> method: "POST",
>>>> content_type: crate::request::JSON_CONTENT_TYPE,
>>>> body,
>>>> - expected: 200,
>>>> + expected: vec![200],
>>>
>>> ... these and the others below are not. Same for the 204 status code you are about to add.
>>>
>>> So in preparation for adding the new status code, these should probably be defined as, either:
>>> - as dedicated status code constants as well, or
>>> - all moved over to directly use https://docs.rs/http/1.3.1/http/status/struct.StatusCode.html
>>>
>>> I feel like the latter is not done here intentionally to avoid the dependency on hyper or http (re-exported by hyper) for the api types only.
>>
>> While you are right that constants are generally nicer, IMO HTTP codes are
>> very stable and universal to be fine to be used directly as numbers in the few
>> limited instances here.
>
> Mostly this, but we can also just add internal constants as well. 200
> just seemed common enough...
>>
>> If we already (even just transitively) would get them from a dependency we still
>> should switch to that, but I'd not introduce a new dependency just for that; IMO
>> to high of a cost.
First thanks for the review Christian, Thomas, Wolfgang - agree!
I checked the option of using `StatusCode`, but as you mentioned, that
would require adding the `http` or `hyper` dependency, which we
currently don’t include in the core types. I would therefore drop the
constant as suggested and introduced a small, dedicated internal module
to hold the ACME HTTP success constants. This would keep them together
and would make imports handy.
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
^ permalink raw reply [relevance 6%]
* Re: [pbs-devel] [PATCH proxmox 1/1] fix #6939: acme: support servers returning 204 for nonce requests
2025-10-28 15:22 15% ` [pbs-devel] [PATCH proxmox 1/1] " Samuel Rufinatscha
2025-10-29 7:23 5% ` Christian Ebner
@ 2025-10-29 10:38 5% ` Wolfgang Bumiller
2025-10-29 15:56 6% ` Samuel Rufinatscha
1 sibling, 1 reply; 39+ results
From: Wolfgang Bumiller @ 2025-10-29 10:38 UTC (permalink / raw)
To: Samuel Rufinatscha; +Cc: pbs-devel
On Tue, Oct 28, 2025 at 04:22:00PM +0100, Samuel Rufinatscha wrote:
> Some ACME servers (notably custom or legacy implementations) respond
> to HEAD /newNonce with a 204 No Content instead of the
> RFC 8555-recommended 200 OK [1]. While this behavior is technically
> off-spec, it is functionally harmless. This issue was reported on our
> bug tracker [2].
>
> The previous implementation treated any non-200 response as an error,
> causing account registration to fail against such servers. Relax the
> status-code check to accept both 200 and 204 responses (and potentially
> support other 2xx codes) to improve interoperability.
>
> This aligns behavior with PVE’s more tolerant Perl ACME client and
> avoids regressions.
>
> [1] https://datatracker.ietf.org/doc/html/rfc8555/#section-7.2
> [2] https://bugzilla.proxmox.com/show_bug.cgi?id=6939
>
> Fixes: #6939
> Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
> ---
> proxmox-acme/src/account.rs | 10 +++++-----
> proxmox-acme/src/async_client.rs | 6 +++---
> proxmox-acme/src/client.rs | 2 +-
> proxmox-acme/src/request.rs | 4 ++--
> 4 files changed, 11 insertions(+), 11 deletions(-)
>
> diff --git a/proxmox-acme/src/account.rs b/proxmox-acme/src/account.rs
> index 73d786b8..60719865 100644
> --- a/proxmox-acme/src/account.rs
> +++ b/proxmox-acme/src/account.rs
> @@ -85,7 +85,7 @@ impl Account {
> method: "POST",
> content_type: crate::request::JSON_CONTENT_TYPE,
> body,
> - expected: crate::request::CREATED,
> + expected: vec![crate::request::CREATED],
> };
>
> Ok(NewOrder::new(request))
> @@ -107,7 +107,7 @@ impl Account {
> method: "POST",
> content_type: crate::request::JSON_CONTENT_TYPE,
> body,
> - expected: 200,
> + expected: vec![200],
> })
> }
>
> @@ -132,7 +132,7 @@ impl Account {
> method: "POST",
> content_type: crate::request::JSON_CONTENT_TYPE,
> body,
> - expected: 200,
> + expected: vec![200],
> })
> }
>
> @@ -157,7 +157,7 @@ impl Account {
> method: "POST",
> content_type: crate::request::JSON_CONTENT_TYPE,
> body,
> - expected: 200,
> + expected: vec![200],
> })
> }
>
> @@ -405,7 +405,7 @@ impl AccountCreator {
> method: "POST",
> content_type: crate::request::JSON_CONTENT_TYPE,
> body,
> - expected: crate::request::CREATED,
> + expected: vec![crate::request::CREATED],
> })
> }
>
> diff --git a/proxmox-acme/src/async_client.rs b/proxmox-acme/src/async_client.rs
> index 60e1f359..0901aa8d 100644
> --- a/proxmox-acme/src/async_client.rs
> +++ b/proxmox-acme/src/async_client.rs
> @@ -421,7 +421,7 @@ impl AcmeClient {
> };
>
> if parts.status.is_success() {
> - if status != request.expected {
> + if !request.expected.contains(&status) {
> return Err(Error::InvalidApi(format!(
> "ACME server responded with unexpected status code: {:?}",
> parts.status
> @@ -501,7 +501,7 @@ impl AcmeClient {
> method: "GET",
> content_type: "",
> body: String::new(),
> - expected: 200,
> + expected: vec![200],
> },
> nonce,
> )
> @@ -553,7 +553,7 @@ impl AcmeClient {
> method: "HEAD",
> content_type: "",
> body: String::new(),
> - expected: 200,
> + expected: vec![200, 204],
> },
> nonce,
> )
> diff --git a/proxmox-acme/src/client.rs b/proxmox-acme/src/client.rs
> index d8a62081..ea8a8655 100644
> --- a/proxmox-acme/src/client.rs
> +++ b/proxmox-acme/src/client.rs
> @@ -203,7 +203,7 @@ impl Inner {
> let got_nonce = self.update_nonce(&mut response)?;
>
> if response.is_success() {
> - if response.status != request.expected {
> + if !request.expected.contains(&response.status) {
> return Err(Error::InvalidApi(format!(
> "API server responded with unexpected status code: {:?}",
> response.status
> diff --git a/proxmox-acme/src/request.rs b/proxmox-acme/src/request.rs
> index 78a90913..38e825d6 100644
> --- a/proxmox-acme/src/request.rs
> +++ b/proxmox-acme/src/request.rs
> @@ -17,8 +17,8 @@ pub struct Request {
> /// The body to pass along with request, or an empty string.
> pub body: String,
>
> - /// The expected status code a compliant ACME provider will return on success.
> - pub expected: u16,
> + /// The set of HTTP status codes that indicate a successful response from an ACME provider.
> + pub expected: Vec<u16>,
We always have a static set, so I'd rather use `&'static [u16]` here.
There's no need to allocate usually-single-element vectors everywhere.
> }
>
> /// An ACME error response contains a specially formatted type string, and can optionally
> --
> 2.47.3
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
^ permalink raw reply [relevance 5%]
* Re: [pbs-devel] [PATCH proxmox 1/1] fix #6939: acme: support servers returning 204 for nonce requests
2025-10-29 7:53 0% ` Thomas Lamprecht
2025-10-29 8:07 0% ` Christian Ebner
@ 2025-10-29 10:36 0% ` Wolfgang Bumiller
2025-10-29 15:50 6% ` Samuel Rufinatscha
1 sibling, 1 reply; 39+ results
From: Wolfgang Bumiller @ 2025-10-29 10:36 UTC (permalink / raw)
To: Thomas Lamprecht; +Cc: Proxmox Backup Server development discussion
On Wed, Oct 29, 2025 at 08:53:34AM +0100, Thomas Lamprecht wrote:
> Am 29.10.25 um 08:23 schrieb Christian Ebner:
> > Hi, thanks for the patches!
> >
> > comments inline
> >
> > On 10/28/25 8:34 PM, Samuel Rufinatscha wrote:
> >> Some ACME servers (notably custom or legacy implementations) respond
> >> to HEAD /newNonce with a 204 No Content instead of the
> >> RFC 8555-recommended 200 OK [1]. While this behavior is technically
> >> off-spec, it is functionally harmless. This issue was reported on our
> >> bug tracker [2].
> >>
> >> The previous implementation treated any non-200 response as an error,
> >> causing account registration to fail against such servers. Relax the
> >> status-code check to accept both 200 and 204 responses (and potentially
> >> support other 2xx codes) to improve interoperability.
> >>
> >> This aligns behavior with PVE’s more tolerant Perl ACME client and
> >> avoids regressions.
> >>
> >> [1] https://datatracker.ietf.org/doc/html/rfc8555/#section-7.2
> >> [2] https://bugzilla.proxmox.com/show_bug.cgi?id=6939
> >>
> >> Fixes: #6939
> >> Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
> >> ---
> >> proxmox-acme/src/account.rs | 10 +++++-----
> >> proxmox-acme/src/async_client.rs | 6 +++---
> >> proxmox-acme/src/client.rs | 2 +-
> >> proxmox-acme/src/request.rs | 4 ++--
> >> 4 files changed, 11 insertions(+), 11 deletions(-)
> >>
> >> diff --git a/proxmox-acme/src/account.rs b/proxmox-acme/src/account.rs
> >> index 73d786b8..60719865 100644
> >> --- a/proxmox-acme/src/account.rs
> >> +++ b/proxmox-acme/src/account.rs
> >> @@ -85,7 +85,7 @@ impl Account {
> >> method: "POST",
> >> content_type: crate::request::JSON_CONTENT_TYPE,
> >> body,
> >> - expected: crate::request::CREATED,
> >> + expected: vec![crate::request::CREATED],
> >
> > while this is defined as dedicated constant...
> >
> >> };
> >> Ok(NewOrder::new(request))
> >> @@ -107,7 +107,7 @@ impl Account {
> >> method: "POST",
> >> content_type: crate::request::JSON_CONTENT_TYPE,
> >> body,
> >> - expected: 200,
> >> + expected: vec![200],
> >
> > ... these and the others below are not. Same for the 204 status code you are about to add.
> >
> > So in preparation for adding the new status code, these should probably be defined as, either:
> > - as dedicated status code constants as well, or
> > - all moved over to directly use https://docs.rs/http/1.3.1/http/status/struct.StatusCode.html
> >
> > I feel like the latter is not done here intentionally to avoid the dependency on hyper or http (re-exported by hyper) for the api types only.
>
> While you are right that constants are generally nicer, IMO HTTP codes are
> very stable and universal to be fine to be used directly as numbers in the few
> limited instances here.
Mostly this, but we can also just add internal constants as well. 200
just seemed common enough...
>
> If we already (even just transitively) would get them from a dependency we still
> should switch to that, but I'd not introduce a new dependency just for that; IMO
> to high of a cost.
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
^ permalink raw reply [relevance 0%]
* Re: [pbs-devel] [PATCH proxmox 1/1] fix #6939: acme: support servers returning 204 for nonce requests
2025-10-29 7:53 0% ` Thomas Lamprecht
@ 2025-10-29 8:07 0% ` Christian Ebner
2025-10-29 10:36 0% ` Wolfgang Bumiller
1 sibling, 0 replies; 39+ results
From: Christian Ebner @ 2025-10-29 8:07 UTC (permalink / raw)
To: Thomas Lamprecht, Proxmox Backup Server development discussion,
Samuel Rufinatscha
Cc: Wolfgang Bumiller
On 10/29/25 8:53 AM, Thomas Lamprecht wrote:
> Am 29.10.25 um 08:23 schrieb Christian Ebner:
>> Hi, thanks for the patches!
>>
>> comments inline
>>
>> On 10/28/25 8:34 PM, Samuel Rufinatscha wrote:
>>> Some ACME servers (notably custom or legacy implementations) respond
>>> to HEAD /newNonce with a 204 No Content instead of the
>>> RFC 8555-recommended 200 OK [1]. While this behavior is technically
>>> off-spec, it is functionally harmless. This issue was reported on our
>>> bug tracker [2].
>>>
>>> The previous implementation treated any non-200 response as an error,
>>> causing account registration to fail against such servers. Relax the
>>> status-code check to accept both 200 and 204 responses (and potentially
>>> support other 2xx codes) to improve interoperability.
>>>
>>> This aligns behavior with PVE’s more tolerant Perl ACME client and
>>> avoids regressions.
>>>
>>> [1] https://datatracker.ietf.org/doc/html/rfc8555/#section-7.2
>>> [2] https://bugzilla.proxmox.com/show_bug.cgi?id=6939
>>>
>>> Fixes: #6939
>>> Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
>>> ---
>>> proxmox-acme/src/account.rs | 10 +++++-----
>>> proxmox-acme/src/async_client.rs | 6 +++---
>>> proxmox-acme/src/client.rs | 2 +-
>>> proxmox-acme/src/request.rs | 4 ++--
>>> 4 files changed, 11 insertions(+), 11 deletions(-)
>>>
>>> diff --git a/proxmox-acme/src/account.rs b/proxmox-acme/src/account.rs
>>> index 73d786b8..60719865 100644
>>> --- a/proxmox-acme/src/account.rs
>>> +++ b/proxmox-acme/src/account.rs
>>> @@ -85,7 +85,7 @@ impl Account {
>>> method: "POST",
>>> content_type: crate::request::JSON_CONTENT_TYPE,
>>> body,
>>> - expected: crate::request::CREATED,
>>> + expected: vec![crate::request::CREATED],
>>
>> while this is defined as dedicated constant...
>>
>>> };
>>> Ok(NewOrder::new(request))
>>> @@ -107,7 +107,7 @@ impl Account {
>>> method: "POST",
>>> content_type: crate::request::JSON_CONTENT_TYPE,
>>> body,
>>> - expected: 200,
>>> + expected: vec![200],
>>
>> ... these and the others below are not. Same for the 204 status code you are about to add.
>>
>> So in preparation for adding the new status code, these should probably be defined as, either:
>> - as dedicated status code constants as well, or
>> - all moved over to directly use https://docs.rs/http/1.3.1/http/status/struct.StatusCode.html
>>
>> I feel like the latter is not done here intentionally to avoid the dependency on hyper or http (re-exported by hyper) for the api types only.
>
> While you are right that constants are generally nicer, IMO HTTP codes are
> very stable and universal to be fine to be used directly as numbers in the few
> limited instances here.
>
> If we already (even just transitively) would get them from a dependency we still
> should switch to that, but I'd not introduce a new dependency just for that; IMO
> to high of a cost.
Agreed, given that the `create::request::CREATED` constant should be
inlined and dropped then for consistency.
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
^ permalink raw reply [relevance 0%]
* Re: [pbs-devel] [PATCH proxmox 1/1] fix #6939: acme: support servers returning 204 for nonce requests
2025-10-29 7:23 5% ` Christian Ebner
@ 2025-10-29 7:53 0% ` Thomas Lamprecht
2025-10-29 8:07 0% ` Christian Ebner
2025-10-29 10:36 0% ` Wolfgang Bumiller
0 siblings, 2 replies; 39+ results
From: Thomas Lamprecht @ 2025-10-29 7:53 UTC (permalink / raw)
To: Proxmox Backup Server development discussion, Christian Ebner,
Samuel Rufinatscha
Cc: Wolfgang Bumiller
Am 29.10.25 um 08:23 schrieb Christian Ebner:
> Hi, thanks for the patches!
>
> comments inline
>
> On 10/28/25 8:34 PM, Samuel Rufinatscha wrote:
>> Some ACME servers (notably custom or legacy implementations) respond
>> to HEAD /newNonce with a 204 No Content instead of the
>> RFC 8555-recommended 200 OK [1]. While this behavior is technically
>> off-spec, it is functionally harmless. This issue was reported on our
>> bug tracker [2].
>>
>> The previous implementation treated any non-200 response as an error,
>> causing account registration to fail against such servers. Relax the
>> status-code check to accept both 200 and 204 responses (and potentially
>> support other 2xx codes) to improve interoperability.
>>
>> This aligns behavior with PVE’s more tolerant Perl ACME client and
>> avoids regressions.
>>
>> [1] https://datatracker.ietf.org/doc/html/rfc8555/#section-7.2
>> [2] https://bugzilla.proxmox.com/show_bug.cgi?id=6939
>>
>> Fixes: #6939
>> Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
>> ---
>> proxmox-acme/src/account.rs | 10 +++++-----
>> proxmox-acme/src/async_client.rs | 6 +++---
>> proxmox-acme/src/client.rs | 2 +-
>> proxmox-acme/src/request.rs | 4 ++--
>> 4 files changed, 11 insertions(+), 11 deletions(-)
>>
>> diff --git a/proxmox-acme/src/account.rs b/proxmox-acme/src/account.rs
>> index 73d786b8..60719865 100644
>> --- a/proxmox-acme/src/account.rs
>> +++ b/proxmox-acme/src/account.rs
>> @@ -85,7 +85,7 @@ impl Account {
>> method: "POST",
>> content_type: crate::request::JSON_CONTENT_TYPE,
>> body,
>> - expected: crate::request::CREATED,
>> + expected: vec![crate::request::CREATED],
>
> while this is defined as dedicated constant...
>
>> };
>> Ok(NewOrder::new(request))
>> @@ -107,7 +107,7 @@ impl Account {
>> method: "POST",
>> content_type: crate::request::JSON_CONTENT_TYPE,
>> body,
>> - expected: 200,
>> + expected: vec![200],
>
> ... these and the others below are not. Same for the 204 status code you are about to add.
>
> So in preparation for adding the new status code, these should probably be defined as, either:
> - as dedicated status code constants as well, or
> - all moved over to directly use https://docs.rs/http/1.3.1/http/status/struct.StatusCode.html
>
> I feel like the latter is not done here intentionally to avoid the dependency on hyper or http (re-exported by hyper) for the api types only.
While you are right that constants are generally nicer, IMO HTTP codes are
very stable and universal to be fine to be used directly as numbers in the few
limited instances here.
If we already (even just transitively) would get them from a dependency we still
should switch to that, but I'd not introduce a new dependency just for that; IMO
to high of a cost.
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
^ permalink raw reply [relevance 0%]
* Re: [pbs-devel] [PATCH proxmox{, -backup} 0/2] fix #6939: acme: support servers returning 204 for nonce requests
2025-10-28 15:21 14% [pbs-devel] [PATCH proxmox{, -backup} 0/2] fix #6939: acme: support servers returning 204 for nonce requests Samuel Rufinatscha
2025-10-28 15:22 15% ` [pbs-devel] [PATCH proxmox 1/1] " Samuel Rufinatscha
2025-10-28 15:22 16% ` [pbs-devel] [PATCH proxmox-backup 1/1] fix #6939: acme: accept HTTP 204 from newNonce endpoint Samuel Rufinatscha
@ 2025-10-29 7:51 5% ` Thomas Lamprecht
2025-10-29 16:02 6% ` Samuel Rufinatscha
2025-10-29 16:49 13% ` [pbs-devel] superseded: " Samuel Rufinatscha
3 siblings, 1 reply; 39+ results
From: Thomas Lamprecht @ 2025-10-29 7:51 UTC (permalink / raw)
To: Proxmox Backup Server development discussion, Samuel Rufinatscha
Thanks for the patch and actually also checking if PVE is affected, and why
it isn't.
Am 28.10.25 um 20:34 schrieb Samuel Rufinatscha:
> This is also stricter than the PVE Perl ACME client, which tolerates any
> 2xx success codes [3]. The author mentions, the issue did not appear
> with PVE9 [1].
>
> ## Ideas to solve the problem
>
> To support ACME providers which return 204 No Content, the underlying
> ACME clients need to tolerate both 200 OK and 204 No Content as valid
> responses for the nonce HEAD request, as long as the Replay-Nonce is
> provided.
>
> I considered following solutions:
>
> 1. Change the `expected` field of the `AcmeRequest` type from `u16` to
> `Vec<u16>`, to support multiple success codes
>
> 2. Keep `expected: u16` and add a second field e.g. `expected_other:
> Vec<u16>` for "also allowed" codes.
>
> 3. Support any 2xx success codes, and remove the `expected` check
>
> I thought (1) might be reasonable, because:
>
> * It stays explicit and makes it clear which statuses are considered
> success.
> * We don’t create two parallel concepts ("expected" vs
> "expected_other") which introduces additional complexity
> * Can be extend later if we meet yet another harmless but not 200
> variant.
> * We don’t allow arbitrary 2xx.
>
> What do you think? Do you maybe have any other solution in mind that
> would fit better?
There probably isn't answer that strictly right in all cases, but in
general it's good to have similar behavior across implementations,
especially given that we do not know of any report where the PVE behavior
of accepting all 2xx response codes caused any problems, from that 3.
would be best, or does the RFC forbid the server to accept other status
codes?
That said, in practice only 201 (Created) might make sense for ACME in
addition to the referenced 200 (OK) and 204 (No Content), and following
the RFC is fine, so 1. is IMO also a good solution here.
Please note the difference to PVE in the commit message, there you write
that behavior is now aligned with PVE, but it's rather "closer aligned"
to, not fully.
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
^ permalink raw reply [relevance 5%]
* Re: [pbs-devel] [PATCH proxmox 1/1] fix #6939: acme: support servers returning 204 for nonce requests
2025-10-28 15:22 15% ` [pbs-devel] [PATCH proxmox 1/1] " Samuel Rufinatscha
@ 2025-10-29 7:23 5% ` Christian Ebner
2025-10-29 7:53 0% ` Thomas Lamprecht
2025-10-29 10:38 5% ` Wolfgang Bumiller
1 sibling, 1 reply; 39+ results
From: Christian Ebner @ 2025-10-29 7:23 UTC (permalink / raw)
To: Proxmox Backup Server development discussion, Samuel Rufinatscha
Cc: Wolfgang Bumiller
Hi, thanks for the patches!
comments inline
On 10/28/25 8:34 PM, Samuel Rufinatscha wrote:
> Some ACME servers (notably custom or legacy implementations) respond
> to HEAD /newNonce with a 204 No Content instead of the
> RFC 8555-recommended 200 OK [1]. While this behavior is technically
> off-spec, it is functionally harmless. This issue was reported on our
> bug tracker [2].
>
> The previous implementation treated any non-200 response as an error,
> causing account registration to fail against such servers. Relax the
> status-code check to accept both 200 and 204 responses (and potentially
> support other 2xx codes) to improve interoperability.
>
> This aligns behavior with PVE’s more tolerant Perl ACME client and
> avoids regressions.
>
> [1] https://datatracker.ietf.org/doc/html/rfc8555/#section-7.2
> [2] https://bugzilla.proxmox.com/show_bug.cgi?id=6939
>
> Fixes: #6939
> Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
> ---
> proxmox-acme/src/account.rs | 10 +++++-----
> proxmox-acme/src/async_client.rs | 6 +++---
> proxmox-acme/src/client.rs | 2 +-
> proxmox-acme/src/request.rs | 4 ++--
> 4 files changed, 11 insertions(+), 11 deletions(-)
>
> diff --git a/proxmox-acme/src/account.rs b/proxmox-acme/src/account.rs
> index 73d786b8..60719865 100644
> --- a/proxmox-acme/src/account.rs
> +++ b/proxmox-acme/src/account.rs
> @@ -85,7 +85,7 @@ impl Account {
> method: "POST",
> content_type: crate::request::JSON_CONTENT_TYPE,
> body,
> - expected: crate::request::CREATED,
> + expected: vec![crate::request::CREATED],
while this is defined as dedicated constant...
> };
>
> Ok(NewOrder::new(request))
> @@ -107,7 +107,7 @@ impl Account {
> method: "POST",
> content_type: crate::request::JSON_CONTENT_TYPE,
> body,
> - expected: 200,
> + expected: vec![200],
... these and the others below are not. Same for the 204 status code you
are about to add.
So in preparation for adding the new status code, these should probably
be defined as, either:
- as dedicated status code constants as well, or
- all moved over to directly use
https://docs.rs/http/1.3.1/http/status/struct.StatusCode.html
I feel like the latter is not done here intentionally to avoid the
dependency on hyper or http (re-exported by hyper) for the api types only.
@wolfgang, comments on that?
> })
> }
>
> @@ -132,7 +132,7 @@ impl Account {
> method: "POST",
> content_type: crate::request::JSON_CONTENT_TYPE,
> body,
> - expected: 200,
> + expected: vec![200],
> })
> }
>
> @@ -157,7 +157,7 @@ impl Account {
> method: "POST",
> content_type: crate::request::JSON_CONTENT_TYPE,
> body,
> - expected: 200,
> + expected: vec![200],
> })
> }
>
> @@ -405,7 +405,7 @@ impl AccountCreator {
> method: "POST",
> content_type: crate::request::JSON_CONTENT_TYPE,
> body,
> - expected: crate::request::CREATED,
> + expected: vec![crate::request::CREATED],
> })
> }
>
> diff --git a/proxmox-acme/src/async_client.rs b/proxmox-acme/src/async_client.rs
> index 60e1f359..0901aa8d 100644
> --- a/proxmox-acme/src/async_client.rs
> +++ b/proxmox-acme/src/async_client.rs
> @@ -421,7 +421,7 @@ impl AcmeClient {
> };
>
> if parts.status.is_success() {
> - if status != request.expected {
> + if !request.expected.contains(&status) {
> return Err(Error::InvalidApi(format!(
> "ACME server responded with unexpected status code: {:?}",
> parts.status
> @@ -501,7 +501,7 @@ impl AcmeClient {
> method: "GET",
> content_type: "",
> body: String::new(),
> - expected: 200,
> + expected: vec![200],
> },
> nonce,
> )
> @@ -553,7 +553,7 @@ impl AcmeClient {
> method: "HEAD",
> content_type: "",
> body: String::new(),
> - expected: 200,
> + expected: vec![200, 204],
> },
> nonce,
> )
> diff --git a/proxmox-acme/src/client.rs b/proxmox-acme/src/client.rs
> index d8a62081..ea8a8655 100644
> --- a/proxmox-acme/src/client.rs
> +++ b/proxmox-acme/src/client.rs
> @@ -203,7 +203,7 @@ impl Inner {
> let got_nonce = self.update_nonce(&mut response)?;
>
> if response.is_success() {
> - if response.status != request.expected {
> + if !request.expected.contains(&response.status) {
> return Err(Error::InvalidApi(format!(
> "API server responded with unexpected status code: {:?}",
> response.status
> diff --git a/proxmox-acme/src/request.rs b/proxmox-acme/src/request.rs
> index 78a90913..38e825d6 100644
> --- a/proxmox-acme/src/request.rs
> +++ b/proxmox-acme/src/request.rs
> @@ -17,8 +17,8 @@ pub struct Request {
> /// The body to pass along with request, or an empty string.
> pub body: String,
>
> - /// The expected status code a compliant ACME provider will return on success.
> - pub expected: u16,
> + /// The set of HTTP status codes that indicate a successful response from an ACME provider.
> + pub expected: Vec<u16>,
> }
>
> /// An ACME error response contains a specially formatted type string, and can optionally
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
^ permalink raw reply [relevance 5%]
* [pbs-devel] [PATCH proxmox-backup 1/1] fix #6939: acme: accept HTTP 204 from newNonce endpoint
2025-10-28 15:21 14% [pbs-devel] [PATCH proxmox{, -backup} 0/2] fix #6939: acme: support servers returning 204 for nonce requests Samuel Rufinatscha
2025-10-28 15:22 15% ` [pbs-devel] [PATCH proxmox 1/1] " Samuel Rufinatscha
@ 2025-10-28 15:22 16% ` Samuel Rufinatscha
2025-10-29 7:51 5% ` [pbs-devel] [PATCH proxmox{, -backup} 0/2] fix #6939: acme: support servers returning 204 for nonce requests Thomas Lamprecht
2025-10-29 16:49 13% ` [pbs-devel] superseded: " Samuel Rufinatscha
3 siblings, 0 replies; 39+ results
From: Samuel Rufinatscha @ 2025-10-28 15:22 UTC (permalink / raw)
To: pbs-devel
When registering an ACME account, PBS fetches a fresh nonce by issuing a
HEAD request to the server's newNonce URL. Until now we assumed this
request would return HTTP 200 OK.
In practice, some ACME servers respond with HTTP 204 No Content for this
HEAD request while still providing a valid Replay-Nonce header. This
causes PBS to abort registration with "ACME server responded with
unexpected status code: 204", even though the server would otherwise
issue certificates correctly.
Adjust the ACME client code in PBS to accept both 200 OK and 204 No
Content as successful results for the newNonce step. We continue to
reject other status codes so we don't silently accept arbitrary 2xx
responses.
This restores interoperability with ACME servers that send 204 for
newNonce, and aligns PBS' behavior with the updated proxmox-acme library
as well as PVE's more tolerant ACME client.
Fixes: #6939
Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
---
src/acme/client.rs | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/src/acme/client.rs b/src/acme/client.rs
index 1c12a4b9..0dabf676 100644
--- a/src/acme/client.rs
+++ b/src/acme/client.rs
@@ -530,7 +530,7 @@ impl AcmeClient {
};
if parts.status.is_success() {
- if status != request.expected {
+ if !request.expected.contains(&status) {
return Err(Error::InvalidApi(format!(
"ACME server responded with unexpected status code: {:?}",
parts.status
@@ -609,7 +609,7 @@ impl AcmeClient {
method: "GET",
content_type: "",
body: String::new(),
- expected: 200,
+ expected: vec![200],
},
nonce,
)
@@ -657,7 +657,7 @@ impl AcmeClient {
method: "HEAD",
content_type: "",
body: String::new(),
- expected: 200,
+ expected: vec![200, 204],
},
nonce,
)
--
2.47.3
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
^ permalink raw reply [relevance 16%]
* [pbs-devel] [PATCH proxmox 1/1] fix #6939: acme: support servers returning 204 for nonce requests
2025-10-28 15:21 14% [pbs-devel] [PATCH proxmox{, -backup} 0/2] fix #6939: acme: support servers returning 204 for nonce requests Samuel Rufinatscha
@ 2025-10-28 15:22 15% ` Samuel Rufinatscha
2025-10-29 7:23 5% ` Christian Ebner
2025-10-29 10:38 5% ` Wolfgang Bumiller
2025-10-28 15:22 16% ` [pbs-devel] [PATCH proxmox-backup 1/1] fix #6939: acme: accept HTTP 204 from newNonce endpoint Samuel Rufinatscha
` (2 subsequent siblings)
3 siblings, 2 replies; 39+ results
From: Samuel Rufinatscha @ 2025-10-28 15:22 UTC (permalink / raw)
To: pbs-devel
Some ACME servers (notably custom or legacy implementations) respond
to HEAD /newNonce with a 204 No Content instead of the
RFC 8555-recommended 200 OK [1]. While this behavior is technically
off-spec, it is functionally harmless. This issue was reported on our
bug tracker [2].
The previous implementation treated any non-200 response as an error,
causing account registration to fail against such servers. Relax the
status-code check to accept both 200 and 204 responses (and potentially
support other 2xx codes) to improve interoperability.
This aligns behavior with PVE’s more tolerant Perl ACME client and
avoids regressions.
[1] https://datatracker.ietf.org/doc/html/rfc8555/#section-7.2
[2] https://bugzilla.proxmox.com/show_bug.cgi?id=6939
Fixes: #6939
Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
---
proxmox-acme/src/account.rs | 10 +++++-----
proxmox-acme/src/async_client.rs | 6 +++---
proxmox-acme/src/client.rs | 2 +-
proxmox-acme/src/request.rs | 4 ++--
4 files changed, 11 insertions(+), 11 deletions(-)
diff --git a/proxmox-acme/src/account.rs b/proxmox-acme/src/account.rs
index 73d786b8..60719865 100644
--- a/proxmox-acme/src/account.rs
+++ b/proxmox-acme/src/account.rs
@@ -85,7 +85,7 @@ impl Account {
method: "POST",
content_type: crate::request::JSON_CONTENT_TYPE,
body,
- expected: crate::request::CREATED,
+ expected: vec![crate::request::CREATED],
};
Ok(NewOrder::new(request))
@@ -107,7 +107,7 @@ impl Account {
method: "POST",
content_type: crate::request::JSON_CONTENT_TYPE,
body,
- expected: 200,
+ expected: vec![200],
})
}
@@ -132,7 +132,7 @@ impl Account {
method: "POST",
content_type: crate::request::JSON_CONTENT_TYPE,
body,
- expected: 200,
+ expected: vec![200],
})
}
@@ -157,7 +157,7 @@ impl Account {
method: "POST",
content_type: crate::request::JSON_CONTENT_TYPE,
body,
- expected: 200,
+ expected: vec![200],
})
}
@@ -405,7 +405,7 @@ impl AccountCreator {
method: "POST",
content_type: crate::request::JSON_CONTENT_TYPE,
body,
- expected: crate::request::CREATED,
+ expected: vec![crate::request::CREATED],
})
}
diff --git a/proxmox-acme/src/async_client.rs b/proxmox-acme/src/async_client.rs
index 60e1f359..0901aa8d 100644
--- a/proxmox-acme/src/async_client.rs
+++ b/proxmox-acme/src/async_client.rs
@@ -421,7 +421,7 @@ impl AcmeClient {
};
if parts.status.is_success() {
- if status != request.expected {
+ if !request.expected.contains(&status) {
return Err(Error::InvalidApi(format!(
"ACME server responded with unexpected status code: {:?}",
parts.status
@@ -501,7 +501,7 @@ impl AcmeClient {
method: "GET",
content_type: "",
body: String::new(),
- expected: 200,
+ expected: vec![200],
},
nonce,
)
@@ -553,7 +553,7 @@ impl AcmeClient {
method: "HEAD",
content_type: "",
body: String::new(),
- expected: 200,
+ expected: vec![200, 204],
},
nonce,
)
diff --git a/proxmox-acme/src/client.rs b/proxmox-acme/src/client.rs
index d8a62081..ea8a8655 100644
--- a/proxmox-acme/src/client.rs
+++ b/proxmox-acme/src/client.rs
@@ -203,7 +203,7 @@ impl Inner {
let got_nonce = self.update_nonce(&mut response)?;
if response.is_success() {
- if response.status != request.expected {
+ if !request.expected.contains(&response.status) {
return Err(Error::InvalidApi(format!(
"API server responded with unexpected status code: {:?}",
response.status
diff --git a/proxmox-acme/src/request.rs b/proxmox-acme/src/request.rs
index 78a90913..38e825d6 100644
--- a/proxmox-acme/src/request.rs
+++ b/proxmox-acme/src/request.rs
@@ -17,8 +17,8 @@ pub struct Request {
/// The body to pass along with request, or an empty string.
pub body: String,
- /// The expected status code a compliant ACME provider will return on success.
- pub expected: u16,
+ /// The set of HTTP status codes that indicate a successful response from an ACME provider.
+ pub expected: Vec<u16>,
}
/// An ACME error response contains a specially formatted type string, and can optionally
--
2.47.3
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
^ permalink raw reply [relevance 15%]
* [pbs-devel] [PATCH proxmox{, -backup} 0/2] fix #6939: acme: support servers returning 204 for nonce requests
@ 2025-10-28 15:21 14% Samuel Rufinatscha
2025-10-28 15:22 15% ` [pbs-devel] [PATCH proxmox 1/1] " Samuel Rufinatscha
` (3 more replies)
0 siblings, 4 replies; 39+ results
From: Samuel Rufinatscha @ 2025-10-28 15:21 UTC (permalink / raw)
To: pbs-devel
Hi,
this series proposes a change to ACME account registration in Proxmox
Backup Server (PBS), so that it also works with ACME servers that return
HTTP 204 No Content to the HEAD request for newNonce.
This behaviour was observed against a specific ACME deployment and
reported as bug #6939 [1]. Currently, PBS cannot register an ACME
account for this CA.
## Problem
During ACME account registration, PBS first fetches an anti-replay nonce
by sending a HEAD request to the CA’s newNonce URL. RFC 8555 7.2 [2]
says:
* the server MUST include a Replay-Nonce header with a fresh nonce,
* the server SHOULD use status 200 OK for the HEAD request,
* the server MUST also handle GET on the same resource with status 204 No
Content and an empty body [2].
Currently, our Rust ACME clients only accept 200 OK. PBS inherits that
strictness and aborts with:
*ACME server responded with unexpected status code: 204*
This is also stricter than the PVE Perl ACME client, which tolerates any
2xx success codes [3]. The author mentions, the issue did not appear
with PVE9 [1].
## Ideas to solve the problem
To support ACME providers which return 204 No Content, the underlying
ACME clients need to tolerate both 200 OK and 204 No Content as valid
responses for the nonce HEAD request, as long as the Replay-Nonce is
provided.
I considered following solutions:
1. Change the `expected` field of the `AcmeRequest` type from `u16` to
`Vec<u16>`, to support multiple success codes
2. Keep `expected: u16` and add a second field e.g. `expected_other:
Vec<u16>` for "also allowed" codes.
3. Support any 2xx success codes, and remove the `expected` check
I thought (1) might be reasonable, because:
* It stays explicit and makes it clear which statuses are considered
success.
* We don’t create two parallel concepts ("expected" vs
"expected_other") which introduces additional complexity
* Can be extend later if we meet yet another harmless but not 200
variant.
* We don’t allow arbitrary 2xx.
What do you think? Do you maybe have any other solution in mind that
would fit better?
## Testing
To prove the proposed fix, I reproduced the scenario:
Pebble (release 2.8.0) from Let's Encrypt [5] running on a Debian 9 VM
as the ACME server. nginx in front of Pebble, to intercept the
`newNonce` request in order to return 204 No Content instead of 200 OK,
all other requests are unchanged and forwarded to Pebble. Trust the
Pebble and ngix CAs via `/usr/local/share/ca-certificates` +
`update-ca-certificates` on the PBS VM.
Then I ran following command against nginx:
```
proxmox-backup-manager acme account register proxytest root@backup.local
--directory 'https://nginx-address/dir
Attempting to fetch Terms of
Service from "https://acme-vm/dir"
Terms of Service:
data:text/plain,Do%20what%20thou%20wilt
Do you agree to the above terms?
[y|N]: y
Do you want to use external account binding? [y|N]: N
Attempting
to register account with "https://acme-vm/dir"...
Registration
successful, account URL: https://acme-vm/my-account/160e58b66bdd72da
```
When adjusting the nginx configuration to return any other non-expected
success status code, e.g. 205, PBS expectely rejects with `API
misbehaved: ACME server responded with unexpected status code: 205`.
## Maintainer notes:
The patch series involves the following components:
proxmox-acme: Apply PATCH 1 to change `expected` from `u16` to
`Vec<u16>`. This results in a breaking change, as it changes the public
API of the `AcmeRequest` type that is used by other components.
proxmox-acme-api: Needs to depend on the new proxmox-acme; patch bump
proxmox-backup: Apply PATCH 2 to use the new API changes; no breaking
change as of only internal changes; patch bump
proxmox-perl-rs / proxmox-datacenter-manager: Will need to use the
dependency version bumps to follow the new proxmox-acme.
## Patch summary
[PATCH 1/2] fix #6939: support providers returning 204 for nonce
requests
* Make the expected-status logic accept multiple allowed codes.
* Treat both 200 OK and 204 No Content as valid for HEAD /newNonce,
provided Replay-Nonce is present.
* Keep rejecting other codes.
[PATCH 2/2] acme: accept HTTP 204 from newNonce endpoint
* Use the updated proxmox-acme behavior in PBS.
* PBS can now register an ACME account against servers that return 204
for the nonce HEAD request.
* Still rejects unexpected codes.
Thanks for considering this patch series, I look forward to your
feedback.
Best,
Samuel Rufinatscha
[1] Bugzilla report #6939:
[https://bugzilla.proxmox.com/show_bug.cgi?id=6939](https://bugzilla.proxmox.com/show_bug.cgi?id=6939)
[2] RFC 8555 (ACME):
[https://datatracker.ietf.org/doc/html/rfc8555/#section-7.2](https://datatracker.ietf.org/doc/html/rfc8555/#section-7.2)
[3] PVE’s Perl ACME client:
[https://git.proxmox.com/?p=proxmox-acme.git;a=blob;f=src/PVE/ACME.pm;h=f1e9bb7d316e3cea1e376c610b0479119217aecc;hb=HEAD#l597](https://git.proxmox.com/?p=proxmox-acme.git;a=blob;f=src/PVE/ACME.pm;h=f1e9bb7d316e3cea1e376c610b0479119217aecc;hb=HEAD#l597)
[4] Pebble ACME server:
[https://github.com/letsencrypt/pebble](https://github.com/letsencrypt/pebble)
proxmox:
Samuel Rufinatscha (1):
fix #6939: acme: support servers returning 204 for nonce requests
proxmox-acme/src/account.rs | 10 +++++-----
proxmox-acme/src/async_client.rs | 6 +++---
proxmox-acme/src/client.rs | 2 +-
proxmox-acme/src/request.rs | 4 ++--
4 files changed, 11 insertions(+), 11 deletions(-)
proxmox-backup:
Samuel Rufinatscha (1):
fix #6939: acme: accept HTTP 204 from newNonce endpoint
src/acme/client.rs | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
Summary over all repositories:
5 files changed, 14 insertions(+), 14 deletions(-)
--
Generated by git-murpp 0.8.1
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
^ permalink raw reply [relevance 14%]
Results 201-239 of 239 prev (newer) | reverse | sort options + mbox downloads above
-- links below jump to the message on this page --
2025-10-28 15:21 14% [pbs-devel] [PATCH proxmox{, -backup} 0/2] fix #6939: acme: support servers returning 204 for nonce requests Samuel Rufinatscha
2025-10-28 15:22 15% ` [pbs-devel] [PATCH proxmox 1/1] " Samuel Rufinatscha
2025-10-29 7:23 5% ` Christian Ebner
2025-10-29 7:53 0% ` Thomas Lamprecht
2025-10-29 8:07 0% ` Christian Ebner
2025-10-29 10:36 0% ` Wolfgang Bumiller
2025-10-29 15:50 6% ` Samuel Rufinatscha
2025-10-29 10:38 5% ` Wolfgang Bumiller
2025-10-29 15:56 6% ` Samuel Rufinatscha
2025-10-28 15:22 16% ` [pbs-devel] [PATCH proxmox-backup 1/1] fix #6939: acme: accept HTTP 204 from newNonce endpoint Samuel Rufinatscha
2025-10-29 7:51 5% ` [pbs-devel] [PATCH proxmox{, -backup} 0/2] fix #6939: acme: support servers returning 204 for nonce requests Thomas Lamprecht
2025-10-29 16:02 6% ` Samuel Rufinatscha
2025-10-29 16:49 13% ` [pbs-devel] superseded: " Samuel Rufinatscha
2025-10-29 16:45 13% [pbs-devel] [PATCH proxmox{, -backup} v2 " Samuel Rufinatscha
2025-10-29 16:45 13% ` [pbs-devel] [PATCH proxmox v2 1/1] " Samuel Rufinatscha
2025-10-31 16:21 5% ` Thomas Lamprecht
2025-11-03 8:51 9% ` Samuel Rufinatscha
2025-10-29 16:45 15% ` [pbs-devel] [PATCH proxmox-backup v2 1/1] fix #6939: acme: accept HTTP 204 from newNonce endpoint Samuel Rufinatscha
2025-11-03 10:21 13% ` [pbs-devel] superseded: [PATCH proxmox{, -backup} v2 0/2] fix #6939: acme: support servers returning 204 for nonce requests Samuel Rufinatscha
2025-11-03 10:13 13% [pbs-devel] [PATCH proxmox{, -backup} v3 " Samuel Rufinatscha
2025-11-03 10:13 13% ` [pbs-devel] [PATCH proxmox v3 1/1] " Samuel Rufinatscha
2025-11-04 14:11 4% ` Thomas Lamprecht
2025-11-05 10:22 9% ` Samuel Rufinatscha
2025-11-03 10:13 15% ` [pbs-devel] [PATCH proxmox-backup v3 1/1] fix #6939: acme: accept HTTP 204 from newNonce endpoint Samuel Rufinatscha
2025-11-03 16:26 15% [pbs-devel] [PATCH proxmox] fix #6913: auth-api: fix user ID parsing for 2-character realms Samuel Rufinatscha
2025-11-11 10:40 5% ` Fabian Grünbichler
2025-11-11 13:49 6% ` Samuel Rufinatscha
2025-11-14 10:34 5% ` [pbs-devel] applied: " Fabian Grünbichler
2025-11-11 12:29 11% [pbs-devel] [PATCH proxmox-backup 0/3] datastore: remove config reload on hot path Samuel Rufinatscha
2025-11-11 12:29 12% ` [pbs-devel] [PATCH proxmox-backup 1/3] partial fix #6049: datastore: impl ConfigVersionCache fast path for lookups Samuel Rufinatscha
2025-11-12 13:24 4% ` Fabian Grünbichler
2025-11-13 12:59 6% ` Samuel Rufinatscha
2025-11-11 12:29 15% ` [pbs-devel] [PATCH proxmox-backup 2/3] partial fix #6049: datastore: use config fast-path in Drop Samuel Rufinatscha
2025-11-12 11:24 5% ` Fabian Grünbichler
2025-11-12 15:20 6% ` Samuel Rufinatscha
2025-11-11 12:29 15% ` [pbs-devel] [PATCH proxmox-backup 3/3] datastore: add TTL fallback to catch manual config edits Samuel Rufinatscha
2025-11-12 11:27 5% ` [pbs-devel] [PATCH proxmox-backup 0/3] datastore: remove config reload on hot path Fabian Grünbichler
2025-11-12 17:27 6% ` Samuel Rufinatscha
2025-11-14 15:05 [pbs-devel] [PATCH proxmox-backup v2 0/4] " Samuel Rufinatscha
2025-11-14 15:05 16% ` [pbs-devel] [PATCH proxmox-backup v2 3/4] partial fix #6049: datastore: use config fast-path in Drop Samuel Rufinatscha
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.