Re: [pve-devel] [PATCH pve-cluster 07/15] pmxcfs-rs: add pmxcfs-test-utils infrastructure crate
From: Samuel Rufinatscha @ 2026-02-03 17:03 UTC (permalink / raw)
To: Proxmox VE development discussion, Kefu Chai; +Cc: Kefu Chai
Thanks for the patch, having shared test utilities in a dedicated crate
makes a lot of sense.
Comments inline.
On 1/6/26 3:25 PM, Kefu Chai wrote:
> From: Kefu Chai <tchaikov@gmail.com>
>
> This commit introduces a dedicated testing infrastructure crate to support
> comprehensive unit and integration testing across the pmxcfs-rs workspace.
>
> Why a dedicated crate?
> - Provides shared test utilities without creating circular dependencies
> - Enables consistent test patterns across all pmxcfs crates
> - Centralizes mock implementations for dependency injection
>
> What this crate provides:
> 1. MockMemDb: Fast, in-memory implementation of MemDbOps trait
> - Eliminates SQLite I/O overhead in unit tests (~100x faster)
> - Enables isolated testing without filesystem dependencies
> - Uses HashMap for storage instead of SQLite persistence
>
> 2. MockStatus: Re-exported mock implementation for StatusOps trait
> - Allows testing without global singleton state
> - Enables parallel test execution
>
> 3. TestEnv builder: Fluent interface for test environment setup
> - Standardizes test configuration across different test types
> - Provides common directory structures and test data
>
> 4. Async helpers: Condition polling utilities (wait_for_condition)
> - Replaces sleep-based synchronization with active polling
>
> This crate is marked as dev-only in the workspace and is used by other
> crates through [dev-dependencies] to avoid circular dependencies.
>
> Signed-off-by: Kefu Chai <k.chai@proxmox.com>
> ---
> src/pmxcfs-rs/Cargo.toml | 2 +
> src/pmxcfs-rs/pmxcfs-test-utils/Cargo.toml | 34 +
> src/pmxcfs-rs/pmxcfs-test-utils/src/lib.rs | 526 +++++++++++++++
> .../pmxcfs-test-utils/src/mock_memdb.rs | 636 ++++++++++++++++++
> 4 files changed, 1198 insertions(+)
> create mode 100644 src/pmxcfs-rs/pmxcfs-test-utils/Cargo.toml
> create mode 100644 src/pmxcfs-rs/pmxcfs-test-utils/src/lib.rs
> create mode 100644 src/pmxcfs-rs/pmxcfs-test-utils/src/mock_memdb.rs
>
> diff --git a/src/pmxcfs-rs/Cargo.toml b/src/pmxcfs-rs/Cargo.toml
> index b5191c31..8fe06b88 100644
> --- a/src/pmxcfs-rs/Cargo.toml
> +++ b/src/pmxcfs-rs/Cargo.toml
> @@ -7,6 +7,7 @@ members = [
> "pmxcfs-rrd", # RRD (Round-Robin Database) persistence
> "pmxcfs-memdb", # In-memory database with SQLite persistence
> "pmxcfs-status", # Status monitoring and RRD data management
> + "pmxcfs-test-utils", # Test utilities and helpers (dev-only)
> ]
> resolver = "2"
>
> @@ -29,6 +30,7 @@ pmxcfs-status = { path = "pmxcfs-status" }
> pmxcfs-ipc = { path = "pmxcfs-ipc" }
> pmxcfs-services = { path = "pmxcfs-services" }
> pmxcfs-logger = { path = "pmxcfs-logger" }
> +pmxcfs-test-utils = { path = "pmxcfs-test-utils" }
>
> # Core async runtime
> tokio = { version = "1.35", features = ["full"] }
> diff --git a/src/pmxcfs-rs/pmxcfs-test-utils/Cargo.toml b/src/pmxcfs-rs/pmxcfs-test-utils/Cargo.toml
> new file mode 100644
> index 00000000..41cdce64
> --- /dev/null
> +++ b/src/pmxcfs-rs/pmxcfs-test-utils/Cargo.toml
> @@ -0,0 +1,34 @@
> +[package]
> +name = "pmxcfs-test-utils"
> +version.workspace = true
> +edition.workspace = true
> +authors.workspace = true
> +license.workspace = true
> +repository.workspace = true
> +rust-version.workspace = true
> +
> +[lib]
> +name = "pmxcfs_test_utils"
> +path = "src/lib.rs"
> +
> +[dependencies]
> +# Internal workspace dependencies
> +pmxcfs-api-types.workspace = true
> +pmxcfs-config.workspace = true
> +pmxcfs-memdb.workspace = true
> +pmxcfs-status.workspace = true
> +
> +# Error handling
> +anyhow.workspace = true
> +
> +# Concurrency
> +parking_lot.workspace = true
> +
> +# System integration
> +libc.workspace = true
> +
> +# Development utilities
> +tempfile.workspace = true
> +
> +# Async runtime
> +tokio.workspace = true
> diff --git a/src/pmxcfs-rs/pmxcfs-test-utils/src/lib.rs b/src/pmxcfs-rs/pmxcfs-test-utils/src/lib.rs
> new file mode 100644
> index 00000000..a2b732a5
> --- /dev/null
> +++ b/src/pmxcfs-rs/pmxcfs-test-utils/src/lib.rs
> @@ -0,0 +1,526 @@
> +//! Test utilities for pmxcfs integration and unit tests
> +//!
> +//! This crate provides:
> +//! - Common test setup and helper functions
> +//! - TestEnv builder for standard test configurations
> +//! - Mock implementations (MockStatus, MockMemDb for isolated testing)
> +//! - Test constants and utilities
> +
> +use anyhow::Result;
> +use pmxcfs_config::Config;
> +use pmxcfs_memdb::MemDb;
> +use std::sync::Arc;
> +use std::time::{Duration, Instant};
> +use tempfile::TempDir;
> +
> +// Re-export MockStatus for easy test access
> +pub use pmxcfs_status::{MockStatus, StatusOps};
> +
> +// Mock implementations
> +mod mock_memdb;
> +pub use mock_memdb::MockMemDb;
> +
> +// Re-export MemDbOps for convenience in tests
> +pub use pmxcfs_memdb::MemDbOps;
> +
> +// Test constants
> +pub const TEST_MTIME: u32 = 1234567890;
> +pub const TEST_NODE_NAME: &str = "testnode";
> +pub const TEST_CLUSTER_NAME: &str = "test-cluster";
> +pub const TEST_WWW_DATA_GID: u32 = 33;
> +
> +/// Test environment builder for standard test setups
> +///
> +/// This builder provides a fluent interface for creating test environments
> +/// with optional components (database, status, config).
> +///
> +/// # Example
> +/// ```
> +/// use pmxcfs_test_utils::TestEnv;
> +///
> +/// # fn example() -> anyhow::Result<()> {
> +/// let env = TestEnv::new()
> +/// .with_database()?
> +/// .with_mock_status()
> +/// .build();
> +///
> +/// // Use env.db, env.status, etc.
> +/// # Ok(())
> +/// # }
> +/// ```
> +pub struct TestEnv {
> + pub config: Arc<Config>,
> + pub db: Option<MemDb>,
> + pub status: Option<Arc<dyn StatusOps>>,
These fields are pub, but we also have accessor functions below
(which can panic). Maybe make the fields private so the accessors
are the only way in, or drop the accessors?
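Something like this, as a standalone sketch (MemDb stubbed out, names
assumed from the patch, plus a non-panicking variant as a suggestion):

```rust
// Sketch: field kept private so the "configured or panic" contract of the
// accessor can't be bypassed. MemDb is a placeholder for pmxcfs_memdb::MemDb.
struct MemDb;

#[derive(Default)]
struct TestEnv {
    db: Option<MemDb>, // private: only reachable through db()/try_db()
}

impl TestEnv {
    fn with_database(mut self) -> Self {
        self.db = Some(MemDb);
        self
    }

    /// Panics if the database was not configured.
    fn db(&self) -> &MemDb {
        self.db
            .as_ref()
            .expect("Database not configured. Call with_database() first")
    }

    /// Non-panicking alternative for tests that want to branch.
    fn try_db(&self) -> Option<&MemDb> {
        self.db.as_ref()
    }
}

fn main() {
    let env = TestEnv::default().with_database();
    assert!(env.try_db().is_some());
    let _ = env.db(); // does not panic once configured
    assert!(TestEnv::default().try_db().is_none());
}
```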
> + pub temp_dir: Option<TempDir>,
> +}
> +
> +impl TestEnv {
> + /// Create a new test environment builder with default config
> + pub fn new() -> Self {
> + Self::new_with_config(false)
> + }
> +
> + /// Create a new test environment builder with local mode config
> + pub fn new_local() -> Self {
> + Self::new_with_config(true)
> + }
> +
> + /// Create a new test environment builder with custom local_mode setting
> + pub fn new_with_config(local_mode: bool) -> Self {
> + let config = create_test_config(local_mode);
> + Self {
> + config,
> + db: None,
> + status: None,
> + temp_dir: None,
> + }
> + }
> +
> + /// Add a database with standard directory structure
> + pub fn with_database(mut self) -> Result<Self> {
> + let (temp_dir, db) = create_test_db()?;
> + self.temp_dir = Some(temp_dir);
> + self.db = Some(db);
> + Ok(self)
> + }
> +
> + /// Add a minimal database (no standard directories)
> + pub fn with_minimal_database(mut self) -> Result<Self> {
> + let (temp_dir, db) = create_minimal_test_db()?;
> + self.temp_dir = Some(temp_dir);
> + self.db = Some(db);
> + Ok(self)
> + }
> +
> + /// Add a MockStatus instance for isolated testing
> + pub fn with_mock_status(mut self) -> Self {
> + self.status = Some(Arc::new(MockStatus::new()));
> + self
> + }
> +
> + /// Add the real Status instance (uses global singleton)
> + pub fn with_status(mut self) -> Self {
> + self.status = Some(pmxcfs_status::init());
> + self
> + }
> +
> + /// Build and return the test environment
> + pub fn build(self) -> Self {
> + self
> + }
This function seems redundant: every builder method already returns Self,
so build() is just an identity. It could be dropped, or TestEnv could be
split into a builder type and a built environment.
> +
> + /// Get a reference to the database (panics if not configured)
> + pub fn db(&self) -> &MemDb {
> + self.db
> + .as_ref()
> + .expect("Database not configured. Call with_database() first")
> + }
> +
> + /// Get a reference to the status (panics if not configured)
> + pub fn status(&self) -> &Arc<dyn StatusOps> {
> + self.status
> + .as_ref()
> + .expect("Status not configured. Call with_status() or with_mock_status() first")
> + }
> +}
> +
> +impl Default for TestEnv {
> + fn default() -> Self {
> + Self::new()
> + }
> +}
> +
> +/// Creates a standard test configuration
> +///
> +/// # Arguments
> +/// * `local_mode` - Whether to run in local mode (no cluster)
> +///
> +/// # Returns
> +/// Arc-wrapped Config suitable for testing
> +pub fn create_test_config(local_mode: bool) -> Arc<Config> {
> + Config::new(
> + TEST_NODE_NAME.to_string(),
> + "127.0.0.1".to_string(),
> + TEST_WWW_DATA_GID,
> + false, // debug mode
> + local_mode,
> + TEST_CLUSTER_NAME.to_string(),
> + )
> +}
> +
> +/// Creates a test database with standard directory structure
> +///
> +/// Creates the following directories:
> +/// - /nodes/{nodename}/qemu-server
> +/// - /nodes/{nodename}/lxc
> +/// - /nodes/{nodename}/priv
> +/// - /priv/lock/qemu-server
> +/// - /priv/lock/lxc
> +/// - /qemu-server
> +/// - /lxc
> +///
> +/// # Returns
> +/// (TempDir, MemDb) - The temp directory must be kept alive for database to persist
> +pub fn create_test_db() -> Result<(TempDir, MemDb)> {
> + let temp_dir = TempDir::new()?;
> + let db_path = temp_dir.path().join("test.db");
> + let db = MemDb::open(&db_path, true)?;
> +
> + // Create standard directory structure
> + let now = TEST_MTIME;
> +
> + // Node-specific directories
> + db.create("/nodes", libc::S_IFDIR, now)?;
> + db.create(&format!("/nodes/{}", TEST_NODE_NAME), libc::S_IFDIR, now)?;
> + db.create(
> + &format!("/nodes/{}/qemu-server", TEST_NODE_NAME),
> + libc::S_IFDIR,
> + now,
> + )?;
> + db.create(
> + &format!("/nodes/{}/lxc", TEST_NODE_NAME),
> + libc::S_IFDIR,
> + now,
> + )?;
> + db.create(
> + &format!("/nodes/{}/priv", TEST_NODE_NAME),
> + libc::S_IFDIR,
> + now,
> + )?;
> +
> + // Global directories
> + db.create("/priv", libc::S_IFDIR, now)?;
> + db.create("/priv/lock", libc::S_IFDIR, now)?;
> + db.create("/priv/lock/qemu-server", libc::S_IFDIR, now)?;
> + db.create("/priv/lock/lxc", libc::S_IFDIR, now)?;
> + db.create("/qemu-server", libc::S_IFDIR, now)?;
> + db.create("/lxc", libc::S_IFDIR, now)?;
> +
> + Ok((temp_dir, db))
> +}
> +
> +/// Creates a minimal test database (no standard directories)
> +///
> +/// Use this when you want full control over database structure
> +///
> +/// # Returns
> +/// (TempDir, MemDb) - The temp directory must be kept alive for database to persist
> +pub fn create_minimal_test_db() -> Result<(TempDir, MemDb)> {
> + let temp_dir = TempDir::new()?;
> + let db_path = temp_dir.path().join("test.db");
> + let db = MemDb::open(&db_path, true)?;
> + Ok((temp_dir, db))
> +}
> +
> +/// Creates test VM configuration content
> +///
> +/// # Arguments
> +/// * `vmid` - VM ID
> +/// * `cores` - Number of CPU cores
> +/// * `memory` - Memory in MB
> +///
> +/// # Returns
> +/// Configuration file content as bytes
> +pub fn create_vm_config(vmid: u32, cores: u32, memory: u32) -> Vec<u8> {
> + format!(
> + "name: test-vm-{}\ncores: {}\nmemory: {}\nbootdisk: scsi0\n",
> + vmid, cores, memory
> + )
> + .into_bytes()
> +}
> +
> +/// Creates test CT (container) configuration content
> +///
> +/// # Arguments
> +/// * `vmid` - Container ID
> +/// * `cores` - Number of CPU cores
> +/// * `memory` - Memory in MB
> +///
> +/// # Returns
> +/// Configuration file content as bytes
> +pub fn create_ct_config(vmid: u32, cores: u32, memory: u32) -> Vec<u8> {
> + format!(
> + "cores: {}\nmemory: {}\nrootfs: local:100/vm-{}-disk-0.raw\n",
> + cores, memory, vmid
> + )
> + .into_bytes()
> +}
> +
> +/// Creates a test lock path for a VM config
> +///
> +/// # Arguments
> +/// * `vmid` - VM ID
> +/// * `vm_type` - "qemu" or "lxc"
> +///
> +/// # Returns
> +/// Lock path in format `/priv/lock/{vm_type}/{vmid}.conf`
> +pub fn create_lock_path(vmid: u32, vm_type: &str) -> String {
> + format!("/priv/lock/{}/{}.conf", vm_type, vmid)
> +}
> +
> +/// Creates a test config path for a VM
> +///
> +/// # Arguments
> +/// * `vmid` - VM ID
> +/// * `vm_type` - "qemu-server" or "lxc"
> +///
> +/// # Returns
> +/// Config path in format `/{vm_type}/{vmid}.conf`
> +pub fn create_config_path(vmid: u32, vm_type: &str) -> String {
> + format!("/{}/{}.conf", vm_type, vmid)
> +}
> +
> +/// Clears all VMs from a status instance
> +///
> +/// Useful for ensuring clean state before tests that register VMs.
> +///
> +/// # Arguments
> +/// * `status` - The status instance to clear
> +pub fn clear_test_vms(status: &dyn StatusOps) {
> + let existing_vms: Vec<u32> = status.get_vmlist().keys().copied().collect();
> + for vmid in existing_vms {
> + status.delete_vm(vmid);
> + }
> +}
> +
> +/// Wait for a condition to become true, polling at regular intervals
> +///
> +/// This is a replacement for sleep-based synchronization in integration tests.
> +/// Instead of sleeping for an arbitrary duration and hoping the condition is met,
> +/// this function polls the condition and returns as soon as it becomes true.
> +///
> +/// # Arguments
> +/// * `predicate` - Function that returns true when the condition is met
> +/// * `timeout` - Maximum time to wait for the condition
> +/// * `check_interval` - How often to check the condition
> +///
> +/// # Returns
> +/// * `true` if condition was met within timeout
> +/// * `false` if timeout was reached without condition being met
> +///
> +/// # Example
> +/// ```no_run
> +/// use pmxcfs_test_utils::wait_for_condition;
> +/// use std::time::Duration;
> +/// use std::sync::atomic::{AtomicBool, Ordering};
> +/// use std::sync::Arc;
> +///
> +/// # async fn example() {
> +/// let ready = Arc::new(AtomicBool::new(false));
> +///
> +/// // Wait for service to be ready (with timeout)
> +/// let result = wait_for_condition(
> +/// || ready.load(Ordering::SeqCst),
> +/// Duration::from_secs(5),
> +/// Duration::from_millis(10),
> +/// ).await;
> +///
> +/// assert!(result, "Service should be ready within 5 seconds");
> +/// # }
> +/// ```
> +pub async fn wait_for_condition<F>(
> + predicate: F,
> + timeout: Duration,
> + check_interval: Duration,
> +) -> bool
> +where
> + F: Fn() -> bool,
> +{
> + let start = Instant::now();
> + loop {
> + if predicate() {
> + return true;
> + }
> + if start.elapsed() >= timeout {
> + return false;
> + }
> + tokio::time::sleep(check_interval).await;
> + }
> +}
> +
> +/// Wait for a condition with a custom error message
> +///
> +/// Similar to `wait_for_condition`, but returns a Result with a custom error message
> +/// if the timeout is reached.
> +///
> +/// # Arguments
> +/// * `predicate` - Function that returns true when the condition is met
> +/// * `timeout` - Maximum time to wait for the condition
> +/// * `check_interval` - How often to check the condition
> +/// * `error_msg` - Error message to return if timeout is reached
> +///
> +/// # Returns
> +/// * `Ok(())` if condition was met within timeout
> +/// * `Err(anyhow::Error)` with custom message if timeout was reached
> +///
> +/// # Example
> +/// ```no_run
> +/// use pmxcfs_test_utils::wait_for_condition_or_fail;
> +/// use std::time::Duration;
> +/// use std::sync::atomic::{AtomicU64, Ordering};
> +/// use std::sync::Arc;
> +///
> +/// # async fn example() -> anyhow::Result<()> {
> +/// let counter = Arc::new(AtomicU64::new(0));
> +///
> +/// wait_for_condition_or_fail(
> +/// || counter.load(Ordering::SeqCst) >= 1,
> +/// Duration::from_secs(5),
> +/// Duration::from_millis(10),
> +/// "Service should initialize within 5 seconds",
> +/// ).await?;
> +///
> +/// # Ok(())
> +/// # }
> +/// ```
> +pub async fn wait_for_condition_or_fail<F>(
> + predicate: F,
> + timeout: Duration,
> + check_interval: Duration,
> + error_msg: &str,
> +) -> Result<()>
> +where
> + F: Fn() -> bool,
> +{
> + if wait_for_condition(predicate, timeout, check_interval).await {
> + Ok(())
> + } else {
> + anyhow::bail!("{}", error_msg)
> + }
> +}
> +
> +/// Blocking version of wait_for_condition for synchronous tests
> +///
> +/// Similar to `wait_for_condition`, but works in synchronous contexts.
> +/// Polls the condition and returns as soon as it becomes true or timeout is reached.
> +///
> +/// # Arguments
> +/// * `predicate` - Function that returns true when the condition is met
> +/// * `timeout` - Maximum time to wait for the condition
> +/// * `check_interval` - How often to check the condition
> +///
> +/// # Returns
> +/// * `true` if condition was met within timeout
> +/// * `false` if timeout was reached without condition being met
> +///
> +/// # Example
> +/// ```no_run
> +/// use pmxcfs_test_utils::wait_for_condition_blocking;
> +/// use std::time::Duration;
> +/// use std::sync::atomic::{AtomicBool, Ordering};
> +/// use std::sync::Arc;
> +///
> +/// let ready = Arc::new(AtomicBool::new(false));
> +///
> +/// // Wait for service to be ready (with timeout)
> +/// let result = wait_for_condition_blocking(
> +/// || ready.load(Ordering::SeqCst),
> +/// Duration::from_secs(5),
> +/// Duration::from_millis(10),
> +/// );
> +///
> +/// assert!(result, "Service should be ready within 5 seconds");
> +/// ```
> +pub fn wait_for_condition_blocking<F>(
> + predicate: F,
> + timeout: Duration,
> + check_interval: Duration,
> +) -> bool
> +where
> + F: Fn() -> bool,
> +{
> + let start = Instant::now();
> + loop {
> + if predicate() {
> + return true;
> + }
> + if start.elapsed() >= timeout {
> + return false;
> + }
> + std::thread::sleep(check_interval);
> + }
> +}
> +
> +#[cfg(test)]
> +mod tests {
> + use super::*;
> +
> + #[test]
> + fn test_create_test_config() {
> + let config = create_test_config(true);
> + assert_eq!(config.nodename, TEST_NODE_NAME);
> + assert_eq!(config.cluster_name, TEST_CLUSTER_NAME);
> + assert!(config.local_mode);
> + }
> +
> + #[test]
> + fn test_create_test_db() -> Result<()> {
> + let (_temp_dir, db) = create_test_db()?;
> +
> + // Verify standard directories exist
> + assert!(db.exists("/nodes")?, "Should have /nodes");
> + assert!(db.exists("/qemu-server")?, "Should have /qemu-server");
> + assert!(db.exists("/priv/lock")?, "Should have /priv/lock");
> +
> + Ok(())
> + }
> +
> + #[test]
> + fn test_path_helpers() {
> + assert_eq!(
> + create_lock_path(100, "qemu-server"),
The docs of create_lock_path say vm_type is "qemu" or "lxc", but here we
pass "qemu-server" - the doc comment or the test should be adjusted to match.
> + "/priv/lock/qemu-server/100.conf"
> + );
> + assert_eq!(
> + create_config_path(100, "qemu-server"),
> + "/qemu-server/100.conf"
> + );
> + }
> +
> + #[test]
> + fn test_env_builder_basic() {
> + let env = TestEnv::new().build();
> + assert_eq!(env.config.nodename, TEST_NODE_NAME);
> + assert!(env.db.is_none());
> + assert!(env.status.is_none());
> + }
> +
> + #[test]
> + fn test_env_builder_with_database() -> Result<()> {
> + let env = TestEnv::new().with_database()?.build();
> + assert!(env.db.is_some());
> + assert!(env.db().exists("/nodes")?);
> + Ok(())
> + }
> +
> + #[test]
> + fn test_env_builder_with_mock_status() {
> + let env = TestEnv::new().with_mock_status().build();
> + assert!(env.status.is_some());
> +
> + // Test that MockStatus works
> + let status = env.status();
> + status.set_quorate(true);
> + assert!(status.is_quorate());
> + }
> +
> + #[test]
> + fn test_env_builder_full() -> Result<()> {
> + let env = TestEnv::new().with_database()?.with_mock_status().build();
> +
> + assert!(env.db.is_some());
> + assert!(env.status.is_some());
> + assert!(env.config.nodename == TEST_NODE_NAME);
> +
> + Ok(())
> + }
> +
> + // NOTE: Tokio tests for wait_for_condition functions are REMOVED because they
> + // cause the test runner to hang when running `cargo test --lib --workspace`.
> + // Root cause: tokio multi-threaded runtime doesn't shut down properly when
> + // these async tests complete, blocking the entire test suite.
> + //
> + // These utility functions work correctly and are verified in integration tests
> + // that actually use them (e.g., integration-tests/).
> +}
> diff --git a/src/pmxcfs-rs/pmxcfs-test-utils/src/mock_memdb.rs b/src/pmxcfs-rs/pmxcfs-test-utils/src/mock_memdb.rs
> new file mode 100644
> index 00000000..c341f9eb
> --- /dev/null
> +++ b/src/pmxcfs-rs/pmxcfs-test-utils/src/mock_memdb.rs
> @@ -0,0 +1,636 @@
> +//! Mock in-memory database implementation for testing
> +//!
> +//! This module provides `MockMemDb`, a lightweight in-memory implementation
> +//! of the `MemDbOps` trait for use in unit tests.
> +
> +use anyhow::{Result, bail};
> +use parking_lot::RwLock;
> +use pmxcfs_memdb::{MemDbOps, ROOT_INODE, TreeEntry};
> +use std::collections::HashMap;
> +use std::sync::atomic::{AtomicU64, Ordering};
> +use std::time::{SystemTime, UNIX_EPOCH};
> +
> +// Directory and file type constants from dirent.h
> +const DT_DIR: u8 = 4;
> +const DT_REG: u8 = 8;
> +
> +/// Mock in-memory database for testing
> +///
> +/// Unlike the real `MemDb` which uses SQLite persistence, `MockMemDb` stores
> +/// everything in memory using HashMap. This makes it:
> +/// - Faster for unit tests (no disk I/O)
> +/// - Easier to inject failures for error testing
> +/// - Completely isolated (no shared state between tests)
> +///
> +/// # Example
> +/// ```
> +/// use pmxcfs_test_utils::MockMemDb;
> +/// use pmxcfs_memdb::MemDbOps;
> +/// use std::sync::Arc;
> +///
> +/// let db: Arc<dyn MemDbOps> = Arc::new(MockMemDb::new());
> +/// db.create("/test.txt", 0, 1234).unwrap();
> +/// assert!(db.exists("/test.txt").unwrap());
> +/// ```
> +pub struct MockMemDb {
> + /// Files and directories stored as path -> data
> + files: RwLock<HashMap<String, Vec<u8>>>,
> + /// Directory entries stored as path -> Vec<child_names>
> + directories: RwLock<HashMap<String, Vec<String>>>,
> + /// Metadata stored as path -> TreeEntry
> + entries: RwLock<HashMap<String, TreeEntry>>,
> + /// Lock state stored as path -> (timestamp, checksum)
> + locks: RwLock<HashMap<String, (u64, [u8; 32])>>,
> + /// Version counter
> + version: AtomicU64,
> + /// Inode counter
> + next_inode: AtomicU64,
> +}
> +
> +impl MockMemDb {
> + /// Create a new empty mock database
> + pub fn new() -> Self {
> + let mut directories = HashMap::new();
> + directories.insert("/".to_string(), Vec::new());
> +
> + let mut entries = HashMap::new();
> + let now = SystemTime::now()
> + .duration_since(UNIX_EPOCH)
> + .unwrap()
> + .as_secs() as u32;
> +
> + // Create root entry
> + entries.insert(
> + "/".to_string(),
> + TreeEntry {
> + inode: ROOT_INODE,
> + parent: 0,
> + version: 0,
> + writer: 1,
> + mtime: now,
> + size: 0,
> + entry_type: DT_DIR,
> + data: Vec::new(),
> + name: String::new(),
> + },
> + );
> +
> + Self {
> + files: RwLock::new(HashMap::new()),
> + directories: RwLock::new(directories),
> + entries: RwLock::new(entries),
> + locks: RwLock::new(HashMap::new()),
> + version: AtomicU64::new(1),
> + next_inode: AtomicU64::new(ROOT_INODE + 1),
> + }
> + }
> +
> + /// Helper to check if path is a directory
> + fn is_directory(&self, path: &str) -> bool {
> + self.directories.read().contains_key(path)
> + }
> +
> + /// Helper to get parent path
> + fn parent_path(path: &str) -> Option<String> {
> + if path == "/" {
> + return None;
> + }
> + let parent = path.rsplit_once('/')?.0;
> + if parent.is_empty() {
> + Some("/".to_string())
> + } else {
> + Some(parent.to_string())
> + }
> + }
> +
> + /// Helper to get file name from path
> + fn file_name(path: &str) -> String {
> + if path == "/" {
> + return String::new();
> + }
> + path.rsplit('/').next().unwrap_or("").to_string()
> + }
> +}
> +
> +impl Default for MockMemDb {
> + fn default() -> Self {
> + Self::new()
> + }
> +}
> +
> +impl MemDbOps for MockMemDb {
> + fn create(&self, path: &str, mode: u32, mtime: u32) -> Result<()> {
> + if path.is_empty() {
> + bail!("Empty path");
> + }
> +
> + if self.entries.read().contains_key(path) {
> + bail!("File exists: {}", path);
> + }
> +
> + let is_dir = (mode & libc::S_IFMT) == libc::S_IFDIR;
> + let entry_type = if is_dir { DT_DIR } else { DT_REG };
> + let inode = self.next_inode.fetch_add(1, Ordering::SeqCst);
> +
> + // Add to parent directory
> + if let Some(parent) = Self::parent_path(path) {
> + if !self.is_directory(&parent) {
> + bail!("Parent is not a directory: {}", parent);
> + }
> + let mut dirs = self.directories.write();
> + if let Some(children) = dirs.get_mut(&parent) {
> + children.push(Self::file_name(path));
> + }
> + }
> +
> + // Create entry
> + let entry = TreeEntry {
> + inode,
> + parent: 0, // Simplified
> + version: self.version.load(Ordering::SeqCst),
> + writer: 1,
> + mtime,
> + size: 0,
> + entry_type,
> + data: Vec::new(),
> + name: Self::file_name(path),
> + };
> +
> + self.entries.write().insert(path.to_string(), entry);
> +
> + if is_dir {
> + self.directories
> + .write()
> + .insert(path.to_string(), Vec::new());
> + } else {
> + self.files.write().insert(path.to_string(), Vec::new());
> + }
> +
> + self.version.fetch_add(1, Ordering::SeqCst);
> + Ok(())
> + }
> +
> + fn read(&self, path: &str, offset: u64, size: usize) -> Result<Vec<u8>> {
> + let files = self.files.read();
> + let data = files
> + .get(path)
> + .ok_or_else(|| anyhow::anyhow!("File not found: {}", path))?;
> +
> + let offset = offset as usize;
> + if offset >= data.len() {
> + return Ok(Vec::new());
> + }
> +
> + let end = std::cmp::min(offset + size, data.len());
> + Ok(data[offset..end].to_vec())
> + }
> +
> + fn write(
> + &self,
> + path: &str,
> + offset: u64,
> + mtime: u32,
> + data: &[u8],
> + truncate: bool,
> + ) -> Result<usize> {
> + let mut files = self.files.write();
> + let file_data = files
> + .get_mut(path)
> + .ok_or_else(|| anyhow::anyhow!("File not found: {}", path))?;
> +
> + let offset = offset as usize;
> +
> + if truncate {
> + file_data.clear();
> + }
> +
> + // Expand if needed
> + if offset + data.len() > file_data.len() {
> + file_data.resize(offset + data.len(), 0);
> + }
> +
> + file_data[offset..offset + data.len()].copy_from_slice(data);
> +
> + // Update entry
> + if let Some(entry) = self.entries.write().get_mut(path) {
> + entry.mtime = mtime;
> + entry.size = file_data.len();
> + }
> +
> + self.version.fetch_add(1, Ordering::SeqCst);
> + Ok(data.len())
> + }
> +
> + fn delete(&self, path: &str) -> Result<()> {
> + if !self.entries.read().contains_key(path) {
> + bail!("File not found: {}", path);
> + }
> +
> + // Check if directory is empty
> + if let Some(children) = self.directories.read().get(path) {
> + if !children.is_empty() {
> + bail!("Directory not empty: {}", path);
> + }
> + }
> +
> + self.entries.write().remove(path);
> + self.files.write().remove(path);
> + self.directories.write().remove(path);
> +
> + // Remove from parent
> + if let Some(parent) = Self::parent_path(path) {
> + if let Some(children) = self.directories.write().get_mut(&parent) {
> + children.retain(|name| name != &Self::file_name(path));
> + }
> + }
> +
> + self.version.fetch_add(1, Ordering::SeqCst);
> + Ok(())
> + }
> +
> + fn rename(&self, old_path: &str, new_path: &str) -> Result<()> {
> + // Check existence first with read locks (released immediately)
> + {
> + let entries = self.entries.read();
> + if !entries.contains_key(old_path) {
> + bail!("Source not found: {}", old_path);
> + }
> + if entries.contains_key(new_path) {
> + bail!("Destination already exists: {}", new_path);
> + }
> + }
We currently don't update the parents' children lists here, so readdir()
on the old and new parent goes stale after a rename.
Also, if rename() can be used for directories: we likely need to
rewrite/move all descendant keys (/old/... -> /new/...) across
entries/files/directories to keep the tree consistent.
> +
> + // Move entry - hold write lock for entire operation
> + {
> + let mut entries = self.entries.write();
> + if let Some(mut entry) = entries.remove(old_path) {
> + entry.name = Self::file_name(new_path);
> + entries.insert(new_path.to_string(), entry);
> + }
> + }
Between the read locks above and the write locks below we have a TOCTOU
window. Couldn't we just hold the write locks (in a fixed order) for the
whole operation?
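Both points could be addressed together; here is a standalone toy sketch
(std RwLock instead of parking_lot, TreeEntry reduced to bytes, descendant
moves still left out for brevity):

```rust
use std::collections::HashMap;
use std::sync::RwLock;

// Sketch of a TOCTOU-free rename: both write guards are taken up front
// (always in the same order), the existence checks run under them, and
// the parents' children lists are kept consistent for readdir().
struct Db {
    entries: RwLock<HashMap<String, Vec<u8>>>,
    directories: RwLock<HashMap<String, Vec<String>>>,
}

impl Db {
    fn new() -> Self {
        let mut dirs = HashMap::new();
        dirs.insert("/".to_string(), Vec::new());
        Self {
            entries: RwLock::new(HashMap::new()),
            directories: RwLock::new(dirs),
        }
    }

    fn parent_and_name(path: &str) -> (String, String) {
        let (parent, name) = path.rsplit_once('/').unwrap_or(("", path));
        let parent = if parent.is_empty() { "/" } else { parent };
        (parent.to_string(), name.to_string())
    }

    fn rename(&self, old_path: &str, new_path: &str) -> Result<(), String> {
        // Fixed lock order: entries before directories, everywhere.
        let mut entries = self.entries.write().unwrap();
        let mut directories = self.directories.write().unwrap();

        if !entries.contains_key(old_path) {
            return Err(format!("Source not found: {old_path}"));
        }
        if entries.contains_key(new_path) {
            return Err(format!("Destination already exists: {new_path}"));
        }

        let data = entries.remove(old_path).unwrap();
        entries.insert(new_path.to_string(), data);

        // Keep readdir() consistent on both parents.
        let (old_parent, old_name) = Self::parent_and_name(old_path);
        let (new_parent, new_name) = Self::parent_and_name(new_path);
        if let Some(children) = directories.get_mut(&old_parent) {
            children.retain(|n| n != &old_name);
        }
        if let Some(children) = directories.get_mut(&new_parent) {
            children.push(new_name);
        }
        Ok(())
    }
}

fn main() {
    let db = Db::new();
    db.entries.write().unwrap().insert("/a.txt".to_string(), b"x".to_vec());
    db.directories.write().unwrap().get_mut("/").unwrap().push("a.txt".to_string());

    assert!(db.rename("/a.txt", "/b.txt").is_ok());
    assert!(db.entries.read().unwrap().contains_key("/b.txt"));
    assert_eq!(db.directories.read().unwrap()["/"], vec!["b.txt".to_string()]);
    assert!(db.rename("/a.txt", "/c.txt").is_err()); // source is gone
}
```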
> +
> + // Move file data - hold write lock for entire operation
> + {
> + let mut files = self.files.write();
> + if let Some(data) = files.remove(old_path) {
> + files.insert(new_path.to_string(), data);
> + }
> + }
> +
> + // Move directory - hold write lock for entire operation
> + {
> + let mut directories = self.directories.write();
> + if let Some(children) = directories.remove(old_path) {
> + directories.insert(new_path.to_string(), children);
> + }
> + }
> +
> + self.version.fetch_add(1, Ordering::SeqCst);
> + Ok(())
> + }
> +
> + fn exists(&self, path: &str) -> Result<bool> {
> + Ok(self.entries.read().contains_key(path))
> + }
> +
> + fn readdir(&self, path: &str) -> Result<Vec<TreeEntry>> {
> + let directories = self.directories.read();
> + let children = directories
> + .get(path)
> + .ok_or_else(|| anyhow::anyhow!("Not a directory: {}", path))?;
> +
> + let entries = self.entries.read();
> + let mut result = Vec::new();
> +
> + for child_name in children {
> + let child_path = if path == "/" {
> + format!("/{}", child_name)
> + } else {
> + format!("{}/{}", path, child_name)
> + };
> +
> + if let Some(entry) = entries.get(&child_path) {
> + result.push(entry.clone());
> + }
> + }
> +
> + Ok(result)
> + }
> +
> + fn set_mtime(&self, path: &str, _writer: u32, mtime: u32) -> Result<()> {
> + let mut entries = self.entries.write();
> + let entry = entries
> + .get_mut(path)
> + .ok_or_else(|| anyhow::anyhow!("File not found: {}", path))?;
> + entry.mtime = mtime;
> + Ok(())
> + }
> +
> + fn lookup_path(&self, path: &str) -> Option<TreeEntry> {
> + self.entries.read().get(path).cloned()
> + }
> +
> + fn get_entry_by_inode(&self, inode: u64) -> Option<TreeEntry> {
> + self.entries
> + .read()
> + .values()
> + .find(|e| e.inode == inode)
> + .cloned()
> + }
> +
> + fn acquire_lock(&self, path: &str, csum: &[u8; 32]) -> Result<()> {
> + let mut locks = self.locks.write();
> + let now = SystemTime::now()
> + .duration_since(UNIX_EPOCH)
> + .unwrap()
> + .as_secs();
> +
> + if let Some((timestamp, existing_csum)) = locks.get(path) {
> + // Check if expired
> + if now - timestamp > 120 {
nit: magic number here (and again in is_locked() and lock_expired()
below), could we use a
const LOCK_TIMEOUT_SECS: u64 = 120; for example?
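For example (sketch; the saturating_sub against clock skew is an extra
suggestion, not in the patch):

```rust
// One named constant shared by acquire_lock()/is_locked()/lock_expired(),
// so the three call sites can't drift apart. saturating_sub keeps a
// skewed clock from underflowing the u64 subtraction.
const LOCK_TIMEOUT_SECS: u64 = 120;

fn lock_is_expired(lock_timestamp: u64, now: u64) -> bool {
    now.saturating_sub(lock_timestamp) > LOCK_TIMEOUT_SECS
}

fn main() {
    assert!(lock_is_expired(0, 121));
    assert!(!lock_is_expired(100, 150));
    assert!(!lock_is_expired(200, 100)); // skewed clock: not treated as expired
}
```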
> + // Expired, can acquire
> + locks.insert(path.to_string(), (now, *csum));
> + return Ok(());
> + }
> +
> + // Not expired, check if same checksum (refresh)
> + if existing_csum == csum {
> + locks.insert(path.to_string(), (now, *csum));
> + return Ok(());
> + }
> +
> + bail!("Lock already held with different checksum");
> + }
> +
> + locks.insert(path.to_string(), (now, *csum));
> + Ok(())
> + }
> +
> + fn release_lock(&self, path: &str, csum: &[u8; 32]) -> Result<()> {
> + let mut locks = self.locks.write();
> + if let Some((_, existing_csum)) = locks.get(path) {
> + if existing_csum == csum {
> + locks.remove(path);
> + return Ok(());
> + }
> + bail!("Lock checksum mismatch");
> + }
> + bail!("No lock found");
> + }
> +
> + fn is_locked(&self, path: &str) -> bool {
> + if let Some((timestamp, _)) = self.locks.read().get(path) {
> + let now = SystemTime::now()
> + .duration_since(UNIX_EPOCH)
> + .unwrap()
> + .as_secs();
> + now - timestamp <= 120
> + } else {
> + false
> + }
> + }
> +
> + fn lock_expired(&self, path: &str, csum: &[u8; 32]) -> bool {
> + if let Some((timestamp, existing_csum)) = self.locks.read().get(path).cloned() {
> + let now = SystemTime::now()
> + .duration_since(UNIX_EPOCH)
> + .unwrap()
> + .as_secs();
> +
> + // Checksum mismatch - reset timeout
> + if &existing_csum != csum {
> + self.locks.write().insert(path.to_string(), (now, *csum));
Can we please document why we are modifying state here when the
checksums mismatch? A predicate called lock_expired() that writes to
the lock table is surprising.
> + return false;
> + }
> +
> + // Check expiration
> + now - timestamp > 120
> + } else {
> + false
> + }
> + }
> +
> + fn get_version(&self) -> u64 {
> + self.version.load(Ordering::SeqCst)
> + }
> +
> + fn get_all_entries(&self) -> Result<Vec<TreeEntry>> {
> + Ok(self.entries.read().values().cloned().collect())
> + }
> +
> + fn replace_all_entries(&self, entries: Vec<TreeEntry>) -> Result<()> {
Also, replace_all_entries()/apply_tree_entry() don't rebuild the parent
children lists in directories.
> + self.entries.write().clear();
This clears entries, so the root TreeEntry ("/") should be reinserted to
preserve the invariant, no? (Similar to directories below.)
> + self.files.write().clear();
> + self.directories.write().clear();
Clearing directories removes "/" but doesn't reinsert it. If possible,
we could also acquire all write locks once (in a fixed order) before
the loop instead of re-locking on every iteration.
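Something along these lines (untested sketch: std's RwLock instead of whatever the crate actually uses, and simplified stand-ins for the field types):

```rust
use std::collections::HashMap;
use std::sync::RwLock;

// Simplified stand-ins for the mock's fields; the real types differ.
struct Db {
    entries: RwLock<HashMap<String, u64>>,
    files: RwLock<HashMap<String, Vec<u8>>>,
    directories: RwLock<HashMap<String, Vec<String>>>,
}

fn replace_all_entries(db: &Db, entries: Vec<(String, u64)>) {
    // Acquire every write lock once, in a fixed order, for the whole operation.
    let mut e = db.entries.write().unwrap();
    let mut f = db.files.write().unwrap();
    let mut d = db.directories.write().unwrap();
    e.clear();
    f.clear();
    d.clear();
    // Reinsert the root directory so the "/" invariant survives the reset.
    d.insert("/".to_string(), Vec::new());
    for (name, version) in entries {
        e.insert(format!("/{name}"), version);
    }
}
```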
> +
> + for entry in entries {
> + let path = format!("/{}", entry.name); // Simplified
> + self.entries.write().insert(path.clone(), entry.clone());
> +
> + if entry.size > 0 {
Please use entry.entry_type == DT_DIR to distinguish directories from
files. The current entry.size > 0 check misclassifies empty files
(size 0) as directories.
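I.e. something like this (sketch with a simplified Entry; DT_DIR/DT_REG values as in `<dirent.h>`):

```rust
// libc's DT_DIR / DT_REG values from <dirent.h>.
const DT_DIR: u8 = 4;
const DT_REG: u8 = 8;

struct Entry {
    entry_type: u8,
    size: usize,
}

// Classify by type, not by size: an empty regular file (size 0) must
// not be treated as a directory.
fn is_directory(e: &Entry) -> bool {
    e.entry_type == DT_DIR
}
```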
> + self.files.write().insert(path, entry.data.clone());
> + } else {
> + self.directories.write().insert(path, Vec::new());
> + }
> + }
> +
> + self.version.fetch_add(1, Ordering::SeqCst);
> + Ok(())
> + }
> +
> + fn apply_tree_entry(&self, entry: TreeEntry) -> Result<()> {
> + let path = format!("/{}", entry.name); // Simplified
> + self.entries.write().insert(path.clone(), entry.clone());
> +
> + if entry.size > 0 {
Also here, please use entry.entry_type == DT_DIR.
> + self.files.write().insert(path, entry.data.clone());
> + }
> +
> + self.version.fetch_add(1, Ordering::SeqCst);
> + Ok(())
> + }
> +
> + fn encode_database(&self) -> Result<Vec<u8>> {
> + // Simplified - just return empty vec
> + Ok(Vec::new())
> + }
> +
> + fn compute_database_checksum(&self) -> Result<[u8; 32]> {
> + // Simplified - return deterministic checksum based on version
> + let version = self.version.load(Ordering::SeqCst);
> + let mut checksum = [0u8; 32];
> + checksum[0..8].copy_from_slice(&version.to_le_bytes());
> + Ok(checksum)
> + }
> +}
> +
> +#[cfg(test)]
> +mod tests {
> + use super::*;
> + use std::sync::Arc;
> +
> + #[test]
> + fn test_mock_memdb_basic_operations() {
> + let db = MockMemDb::new();
> +
> + // Create file
> + db.create("/test.txt", libc::S_IFREG, 1234).unwrap();
> + assert!(db.exists("/test.txt").unwrap());
> +
> + // Write data
> + let data = b"Hello, MockMemDb!";
> + db.write("/test.txt", 0, 1235, data, false).unwrap();
> +
> + // Read data
> + let read_data = db.read("/test.txt", 0, 100).unwrap();
> + assert_eq!(&read_data[..], data);
> +
> + // Check entry
> + let entry = db.lookup_path("/test.txt").unwrap();
> + assert_eq!(entry.size, data.len());
> + assert_eq!(entry.mtime, 1235);
> + }
> +
> + #[test]
> + fn test_mock_memdb_directory_operations() {
> + let db = MockMemDb::new();
> +
> + // Create directory
> + db.create("/mydir", libc::S_IFDIR, 1000).unwrap();
> + assert!(db.exists("/mydir").unwrap());
> +
> + // Create file in directory
> + db.create("/mydir/file.txt", libc::S_IFREG, 1001).unwrap();
> +
> + // Read directory
> + let entries = db.readdir("/mydir").unwrap();
> + assert_eq!(entries.len(), 1);
> + assert_eq!(entries[0].name, "file.txt");
> + }
> +
> + #[test]
> + fn test_mock_memdb_lock_operations() {
> + let db = MockMemDb::new();
> + let csum1 = [1u8; 32];
> + let csum2 = [2u8; 32];
> +
> + // Acquire lock
> + db.acquire_lock("/priv/lock/resource", &csum1).unwrap();
> + assert!(db.is_locked("/priv/lock/resource"));
> +
> + // Lock with same checksum should succeed (refresh)
> + assert!(db.acquire_lock("/priv/lock/resource", &csum1).is_ok());
> +
> + // Lock with different checksum should fail
> + assert!(db.acquire_lock("/priv/lock/resource", &csum2).is_err());
> +
> + // Release lock
> + db.release_lock("/priv/lock/resource", &csum1).unwrap();
> + assert!(!db.is_locked("/priv/lock/resource"));
> +
> + // Can acquire with different checksum now
> + db.acquire_lock("/priv/lock/resource", &csum2).unwrap();
> + assert!(db.is_locked("/priv/lock/resource"));
> + }
> +
> + #[test]
> + fn test_mock_memdb_rename() {
> + let db = MockMemDb::new();
> +
> + // Create file
> + db.create("/old.txt", libc::S_IFREG, 1000).unwrap();
> + db.write("/old.txt", 0, 1001, b"content", false).unwrap();
> +
> + // Rename
> + db.rename("/old.txt", "/new.txt").unwrap();
> +
> + // Old path should not exist
> + assert!(!db.exists("/old.txt").unwrap());
> +
> + // New path should exist with same content
> + assert!(db.exists("/new.txt").unwrap());
> + let data = db.read("/new.txt", 0, 100).unwrap();
> + assert_eq!(&data[..], b"content");
> + }
> +
> + #[test]
> + fn test_mock_memdb_delete() {
> + let db = MockMemDb::new();
> +
> + // Create and delete file
> + db.create("/delete-me.txt", libc::S_IFREG, 1000).unwrap();
> + assert!(db.exists("/delete-me.txt").unwrap());
> +
> + db.delete("/delete-me.txt").unwrap();
> + assert!(!db.exists("/delete-me.txt").unwrap());
> +
> + // Delete non-existent file should fail
> + assert!(db.delete("/nonexistent.txt").is_err());
> + }
> +
> + #[test]
> + fn test_mock_memdb_version_tracking() {
> + let db = MockMemDb::new();
> + let initial_version = db.get_version();
> +
> + // Version should increment on modifications
> + db.create("/file1.txt", libc::S_IFREG, 1000).unwrap();
> + assert!(db.get_version() > initial_version);
> +
> + let v1 = db.get_version();
> + db.write("/file1.txt", 0, 1001, b"data", false).unwrap();
> + assert!(db.get_version() > v1);
> +
> + let v2 = db.get_version();
> + db.delete("/file1.txt").unwrap();
> + assert!(db.get_version() > v2);
> + }
> +
> + #[test]
> + fn test_mock_memdb_isolation() {
> + // Each MockMemDb instance is completely isolated
> + let db1 = MockMemDb::new();
> + let db2 = MockMemDb::new();
> +
> + db1.create("/test.txt", libc::S_IFREG, 1000).unwrap();
> +
> + // db2 should not see db1's files
> + assert!(db1.exists("/test.txt").unwrap());
> + assert!(!db2.exists("/test.txt").unwrap());
> + }
> +
> + #[test]
> + fn test_mock_memdb_as_trait_object() {
> + // Demonstrate using MockMemDb through trait object
> + let db: Arc<dyn MemDbOps> = Arc::new(MockMemDb::new());
> +
> + db.create("/trait-test.txt", libc::S_IFREG, 2000).unwrap();
> + assert!(db.exists("/trait-test.txt").unwrap());
> +
> + db.write("/trait-test.txt", 0, 2001, b"via trait", false)
> + .unwrap();
> + let data = db.read("/trait-test.txt", 0, 100).unwrap();
> + assert_eq!(&data[..], b"via trait");
> + }
> +
> + #[test]
> + fn test_mock_memdb_error_cases() {
> + let db = MockMemDb::new();
> +
> + // Create duplicate should fail
> + db.create("/dup.txt", libc::S_IFREG, 1000).unwrap();
> + assert!(db.create("/dup.txt", libc::S_IFREG, 1000).is_err());
> +
> + // Read non-existent file should fail
> + assert!(db.read("/nonexistent.txt", 0, 100).is_err());
> +
> + // Write to non-existent file should fail
> + assert!(
> + db.write("/nonexistent.txt", 0, 1000, b"data", false)
> + .is_err()
> + );
> +
> + // Empty path should fail
> + assert!(db.create("", libc::S_IFREG, 1000).is_err());
> + }
> +}
* Re: [pve-devel] [PATCH pve-cluster 06/15] pmxcfs-rs: add pmxcfs-status crate
@ 2026-02-02 16:07 5% ` Samuel Rufinatscha
0 siblings, 0 replies; 200+ results
From: Samuel Rufinatscha @ 2026-02-02 16:07 UTC (permalink / raw)
To: Proxmox VE development discussion, Kefu Chai
comments inline
On 1/6/26 3:25 PM, Kefu Chai wrote:
> Add cluster status tracking and monitoring:
> - Status: Central status container (thread-safe)
> - Cluster membership tracking
> - VM/CT registry with version tracking
> - RRD data management
> - Cluster log integration
> - Quorum state tracking
> - Configuration file version tracking
>
> This integrates pmxcfs-memdb, pmxcfs-rrd, pmxcfs-logger, and
> pmxcfs-api-types to provide centralized cluster state management.
> It also uses procfs for system metrics collection.
>
> Includes comprehensive unit tests for:
> - VM registration and deletion
> - Cluster membership updates
> - Version tracking
> - Configuration file monitoring
>
> Signed-off-by: Kefu Chai <k.chai@proxmox.com>
> ---
> src/pmxcfs-rs/Cargo.toml | 1 +
> src/pmxcfs-rs/pmxcfs-status/Cargo.toml | 40 +
> src/pmxcfs-rs/pmxcfs-status/README.md | 142 ++
> src/pmxcfs-rs/pmxcfs-status/src/lib.rs | 54 +
> src/pmxcfs-rs/pmxcfs-status/src/status.rs | 1561 +++++++++++++++++++++
> src/pmxcfs-rs/pmxcfs-status/src/traits.rs | 486 +++++++
> src/pmxcfs-rs/pmxcfs-status/src/types.rs | 62 +
> 7 files changed, 2346 insertions(+)
> create mode 100644 src/pmxcfs-rs/pmxcfs-status/Cargo.toml
> create mode 100644 src/pmxcfs-rs/pmxcfs-status/README.md
> create mode 100644 src/pmxcfs-rs/pmxcfs-status/src/lib.rs
> create mode 100644 src/pmxcfs-rs/pmxcfs-status/src/status.rs
> create mode 100644 src/pmxcfs-rs/pmxcfs-status/src/traits.rs
> create mode 100644 src/pmxcfs-rs/pmxcfs-status/src/types.rs
>
> diff --git a/src/pmxcfs-rs/Cargo.toml b/src/pmxcfs-rs/Cargo.toml
> index 2e41ac93..b5191c31 100644
> --- a/src/pmxcfs-rs/Cargo.toml
> +++ b/src/pmxcfs-rs/Cargo.toml
> @@ -6,6 +6,7 @@ members = [
> "pmxcfs-logger", # Cluster log with ring buffer and deduplication
> "pmxcfs-rrd", # RRD (Round-Robin Database) persistence
> "pmxcfs-memdb", # In-memory database with SQLite persistence
> + "pmxcfs-status", # Status monitoring and RRD data management
> ]
> resolver = "2"
>
> diff --git a/src/pmxcfs-rs/pmxcfs-status/Cargo.toml b/src/pmxcfs-rs/pmxcfs-status/Cargo.toml
> new file mode 100644
> index 00000000..e4a817d7
> --- /dev/null
> +++ b/src/pmxcfs-rs/pmxcfs-status/Cargo.toml
> @@ -0,0 +1,40 @@
> +[package]
> +name = "pmxcfs-status"
> +description = "Status monitoring and RRD data management for pmxcfs"
> +
> +version.workspace = true
> +edition.workspace = true
> +authors.workspace = true
> +license.workspace = true
> +repository.workspace = true
> +
> +[lints]
> +workspace = true
> +
> +[dependencies]
> +# Workspace dependencies
> +pmxcfs-api-types.workspace = true
> +pmxcfs-rrd.workspace = true
> +pmxcfs-memdb.workspace = true
> +pmxcfs-logger.workspace = true
> +
> +# Error handling
> +anyhow.workspace = true
> +
> +# Async runtime
> +tokio.workspace = true
> +
> +# Concurrency primitives
> +parking_lot.workspace = true
> +
> +# Logging
> +tracing.workspace = true
> +
> +# Utilities
> +chrono.workspace = true
this dependency is not used
> +
> +# System information (Linux /proc filesystem)
> +procfs = "0.17"
> +
> +[dev-dependencies]
> +tempfile.workspace = true
> diff --git a/src/pmxcfs-rs/pmxcfs-status/README.md b/src/pmxcfs-rs/pmxcfs-status/README.md
> new file mode 100644
> index 00000000..b6958af3
> --- /dev/null
> +++ b/src/pmxcfs-rs/pmxcfs-status/README.md
> @@ -0,0 +1,142 @@
> +# pmxcfs-status
> +
> +**Cluster Status** tracking and monitoring for pmxcfs.
> +
> +This crate manages all runtime cluster state information including membership, VM lists, node status, RRD metrics, and cluster logs. It serves as the central repository for dynamic cluster information that changes during runtime.
> +
> +## Overview
> +
> +The Status subsystem tracks:
> +- **Cluster membership**: Which nodes are in the cluster and their states
> +- **VM/CT tracking**: Registry of all virtual machines and containers
> +- **Node status**: Per-node health and resource information
> +- **RRD data**: Performance metrics (CPU, memory, disk, network)
> +- **Cluster log**: Centralized log aggregation
> +- **Quorum state**: Whether cluster has quorum
> +- **Version tracking**: Monitors configuration file changes
> +
> +## Usage
> +
> +### Initialization
> +
> +```rust
> +use pmxcfs_status;
> +
> +// For tests or when RRD persistence is not needed
> +let status = pmxcfs_status::init();
> +
> +// For production with RRD file persistence
> +let status = pmxcfs_status::init_with_rrd("/var/lib/rrdcached/db").await;
> +```
> +
> +The default `init()` is synchronous and doesn't require a directory parameter, making tests simpler. Use `init_with_rrd()` for production deployments that need RRD persistence.
> +
> +### Integration with Other Components
> +
> +**FUSE Plugins**:
> +- `.version` plugin reads from Status
> +- `.vmlist` plugin generates VM list from Status
> +- `.members` plugin generates member list from Status
> +- `.rrd` plugin accesses RRD data from Status
> +- `.clusterlog` plugin reads cluster log from Status
> +
> +**DFSM Status Sync**:
> +- `StatusSyncService` (pmxcfs-dfsm) broadcasts status updates
> +- Uses `pve_kvstore_v1` CPG group
> +- KV store data synchronized across nodes
> +
> +**IPC Server**:
> +- `set_status` IPC call updates Status
> +- Used by `pvecm`/`pvenode` tools
> +- RRD data received via IPC
> +
> +**MemDb Integration**:
> +- Scans VM configs to populate vmlist
> +- Tracks version changes on file modifications
> +- Used for `.version` plugin timestamps
> +
> +## Architecture
> +
> +### Module Structure
> +
> +| Module | Purpose |
> +|--------|---------|
> +| `lib.rs` | Public API and initialization |
> +| `status.rs` | Core Status struct and operations |
> +| `types.rs` | Type definitions (ClusterNode, ClusterInfo, etc.) |
> +
> +### Key Features
> +
> +**Thread-Safe**: All operations use `RwLock` or `AtomicU64` for concurrent access
> +**Version Tracking**: Monotonically increasing counters for change detection
> +**Structured Logging**: Field-based tracing for better observability
> +**Optional RRD**: RRD persistence is opt-in, simplifying testing
> +
> +## C to Rust Mapping
> +
> +### Data Structures
> +
> +| C Type | Rust Type | Notes |
> +|--------|-----------|-------|
> +| `cfs_status_t` | `Status` | Main status container |
> +| `cfs_clinfo_t` | `ClusterInfo` | Cluster membership info |
> +| `cfs_clnode_t` | `ClusterNode` | Individual node info |
> +| `vminfo_t` | `VmEntry` | VM/CT registry entry (in pmxcfs-api-types) |
> +| `clog_entry_t` | `ClusterLogEntry` | Cluster log entry |
> +
> +### Core Functions
> +
> +| C Function | Rust Equivalent | Notes |
> +|-----------|-----------------|-------|
> +| `cfs_status_init()` | `init()` or `init_with_rrd()` | Two variants for flexibility |
> +| `cfs_set_quorate()` | `Status::set_quorate()` | Quorum tracking |
> +| `cfs_is_quorate()` | `Status::is_quorate()` | Quorum checking |
> +| `vmlist_register_vm()` | `Status::register_vm()` | VM registration |
> +| `vmlist_delete_vm()` | `Status::delete_vm()` | VM deletion |
> +| `cfs_status_set()` | `Status::set_node_status()` | Status updates (including RRD) |
> +
> +## Key Differences from C Implementation
> +
> +### RRD Decoupling
> +
> +**C Version (status.c)**:
> +- RRD code embedded in status.c
> +- Async initialization always required
> +
> +**Rust Version**:
> +- Separate `pmxcfs-rrd` crate
> +- `init()` is synchronous (no RRD)
> +- `init_with_rrd()` is async (with RRD)
> +- Tests don't need temp directories
> +
> +### Concurrency
> +
> +**C Version**:
> +- Single `GMutex` for entire status structure
> +
> +**Rust Version**:
> +- Fine-grained `RwLock` for different data structures
> +- `AtomicU64` for version counters
> +- Better read parallelism
> +
> +## Configuration File Tracking
> +
> +Status tracks version numbers for these common Proxmox config files:
> +
> +- `corosync.conf`, `corosync.conf.new`
> +- `storage.cfg`, `user.cfg`, `domains.cfg`
> +- `datacenter.cfg`, `vzdump.cron`, `vzdump.conf`
> +- `ha/` directory files (crm_commands, manager_status, resources.cfg, etc.)
> +- `sdn/` directory files (vnets.cfg, zones.cfg, controllers.cfg, etc.)
> +- And many more (see `Status::new()` in status.rs for complete list)
> +
> +## References
> +
> +### C Implementation
> +- `src/pmxcfs/status.c` / `status.h` - Status tracking
> +
> +### Related Crates
> +- **pmxcfs-rrd**: RRD file persistence
> +- **pmxcfs-dfsm**: Status synchronization via StatusSyncService
> +- **pmxcfs-logger**: Cluster log implementation
> +- **pmxcfs**: FUSE plugins that read from Status
> diff --git a/src/pmxcfs-rs/pmxcfs-status/src/lib.rs b/src/pmxcfs-rs/pmxcfs-status/src/lib.rs
> new file mode 100644
> index 00000000..282e007d
> --- /dev/null
> +++ b/src/pmxcfs-rs/pmxcfs-status/src/lib.rs
> @@ -0,0 +1,54 @@
> +/// Status information and monitoring
> +///
> +/// This module manages:
> +/// - Cluster membership (nodes, IPs, online status)
> +/// - RRD (Round Robin Database) data for metrics
> +/// - Cluster log
> +/// - Node status information
> +/// - VM/CT list tracking
> +mod status;
> +mod traits;
> +mod types;
> +
> +// Re-export public types
> +pub use pmxcfs_api_types::{VmEntry, VmType};
> +pub use types::{ClusterInfo, ClusterLogEntry, ClusterNode, NodeStatus};
> +
> +// Re-export Status struct and trait
> +pub use status::Status;
> +pub use traits::{BoxFuture, MockStatus, StatusOps};
> +
> +use std::sync::Arc;
> +
> +/// Initialize status subsystem without RRD persistence
> +///
> +/// This is the default initialization that creates a Status instance
> +/// without file-based RRD persistence. RRD data will be kept in memory only.
> +pub fn init() -> Arc<Status> {
> + tracing::info!("Status subsystem initialized (RRD persistence disabled)");
> + Arc::new(Status::new(None))
> +}
> +
> +/// Initialize status subsystem with RRD file persistence
> +///
> +/// Creates a Status instance with RRD data written to disk in the specified directory.
> +/// This requires the RRD directory to exist and be writable.
> +pub async fn init_with_rrd<P: AsRef<std::path::Path>>(rrd_dir: P) -> Arc<Status> {
> + let rrd_dir_path = rrd_dir.as_ref();
> + let rrd_writer = match pmxcfs_rrd::RrdWriter::new(rrd_dir_path).await {
> + Ok(writer) => {
> + tracing::info!(
> + directory = %rrd_dir_path.display(),
> + "RRD file persistence enabled"
> + );
> + Some(writer)
> + }
> + Err(e) => {
> + tracing::warn!(error = %e, "RRD file persistence disabled");
> + None
> + }
> + };
> +
> + tracing::info!("Status subsystem initialized");
> + Arc::new(Status::new(rrd_writer))
> +}
> diff --git a/src/pmxcfs-rs/pmxcfs-status/src/status.rs b/src/pmxcfs-rs/pmxcfs-status/src/status.rs
> new file mode 100644
> index 00000000..94b6483d
> --- /dev/null
> +++ b/src/pmxcfs-rs/pmxcfs-status/src/status.rs
> @@ -0,0 +1,1561 @@
> +/// Status subsystem implementation
> +use crate::types::{ClusterInfo, ClusterLogEntry, ClusterNode, NodeStatus, RrdEntry};
> +use anyhow::Result;
> +use parking_lot::RwLock;
> +use pmxcfs_api_types::{VmEntry, VmType};
> +use std::collections::HashMap;
> +use std::sync::Arc;
> +use std::sync::atomic::{AtomicU64, Ordering};
> +use std::time::{SystemTime, UNIX_EPOCH};
> +
> +/// Status subsystem (matches C implementation's cfs_status_t)
> +pub struct Status {
> + /// Cluster information (nodes, membership) - matches C's clinfo
> + cluster_info: RwLock<Option<ClusterInfo>>,
> +
> + /// Cluster info version counter - increments on membership changes (matches C's clinfo_version)
> + cluster_version: AtomicU64,
This field is used as a change counter in multiple places but gets
overwritten in update_cluster_info(). In C we have clinfo_version
vs cman_version. These need to be separate fields as in C; otherwise
update_cluster_info() overwrites the monotonic change counter that
other call sites depend on.
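Roughly like this (sketch; field and method names assumed):

```rust
use std::sync::atomic::{AtomicU64, Ordering};

// Keep the two counters separate, as in C.
struct StatusVersions {
    /// Monotonic change counter, bumped on every membership change.
    clinfo_version: AtomicU64,
    /// Mirrors corosync.conf's config_version; may be overwritten freely.
    cman_version: AtomicU64,
}

impl StatusVersions {
    fn membership_changed(&self) {
        self.clinfo_version.fetch_add(1, Ordering::SeqCst);
    }

    fn config_updated(&self, config_version: u64) {
        // Storing the config version no longer clobbers the change counter.
        self.cman_version.store(config_version, Ordering::SeqCst);
        self.clinfo_version.fetch_add(1, Ordering::SeqCst);
    }
}
```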
> +
> + /// VM list version counter - increments when VM list changes (matches C's vmlist_version)
> + vmlist_version: AtomicU64,
> +
> + /// MemDB path version counters (matches C's memdb_change_array)
> + /// Tracks versions for specific config files like "corosync.conf", "user.cfg", etc.
> + memdb_path_versions: RwLock<HashMap<String, AtomicU64>>,
> +
> + /// Node status data by name
> + node_status: RwLock<HashMap<String, NodeStatus>>,
> +
> + /// Cluster log with ring buffer and deduplication (matches C's clusterlog_t)
> + cluster_log: pmxcfs_logger::ClusterLog,
> +
> + /// RRD entries by key (e.g., "pve2-node/nodename" or "pve2.3-vm/vmid")
> + pub(crate) rrd_data: RwLock<HashMap<String, RrdEntry>>,
> +
> + /// RRD file writer for persistent storage (using tokio RwLock for async compatibility)
> + rrd_writer: Option<Arc<tokio::sync::RwLock<pmxcfs_rrd::RrdWriter>>>,
> +
> + /// VM/CT list (vmid -> VmEntry)
> + vmlist: RwLock<HashMap<u32, VmEntry>>,
> +
> + /// Quorum status (matches C's cfs_status.quorate)
> + quorate: RwLock<bool>,
> +
> + /// Current cluster members (CPG membership)
> + members: RwLock<Vec<pmxcfs_api_types::MemberInfo>>,
> +
> + /// Daemon start timestamp (UNIX epoch) - for .version plugin
> + start_time: u64,
> +
> + /// KV store data from nodes (nodeid -> key -> value)
> + /// Matches C implementation's kvhash
> + kvstore: RwLock<HashMap<u32, HashMap<String, Vec<u8>>>>,
C removes a kvstore entry when len == 0 and maintains a per-key
entry->version counter (incremented on overwrite). Our kvstore
currently stores only Vec<u8> and doesn't reflect these semantics.
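Sketch of the C semantics for a single node's map (names assumed; the outer nodeid level omitted):

```rust
use std::collections::HashMap;

// A zero-length payload deletes the entry; overwrites bump a per-key
// version counter, mirroring C's kventry handling.
struct KvEntry {
    version: u32,
    data: Vec<u8>,
}

fn kv_set(store: &mut HashMap<String, KvEntry>, key: &str, data: &[u8]) {
    if data.is_empty() {
        store.remove(key);
        return;
    }
    match store.get_mut(key) {
        Some(entry) => {
            entry.version += 1; // incremented on overwrite
            entry.data = data.to_vec();
        }
        None => {
            store.insert(
                key.to_string(),
                KvEntry { version: 1, data: data.to_vec() },
            );
        }
    }
}
```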
> +}
> +
> +impl Status {
> + /// Create a new Status instance
> + ///
> + /// For production use with RRD persistence, use `pmxcfs_status::init_with_rrd()`.
> + /// For tests or when RRD persistence is not needed, use `pmxcfs_status::init()`.
> + /// This constructor is public to allow custom initialization patterns.
> + pub fn new(rrd_writer: Option<pmxcfs_rrd::RrdWriter>) -> Self {
> + // Wrap RrdWriter in Arc<tokio::sync::RwLock> if provided (for async compatibility)
> + let rrd_writer = rrd_writer.map(|w| Arc::new(tokio::sync::RwLock::new(w)));
> +
> + // Initialize memdb path versions for common Proxmox config files
> + // Matches C implementation's memdb_change_array (status.c:79-120)
> + // These are the exact paths tracked by the C implementation
> + let mut path_versions = HashMap::new();
> + let common_paths = vec![
> + "corosync.conf",
> + "corosync.conf.new",
> + "storage.cfg",
> + "user.cfg",
> + "domains.cfg",
> + "notifications.cfg",
> + "priv/notifications.cfg",
> + "priv/shadow.cfg",
> + "priv/acme/plugins.cfg",
> + "priv/tfa.cfg",
> + "priv/token.cfg",
> + "datacenter.cfg",
> + "vzdump.cron",
> + "vzdump.conf",
> + "jobs.cfg",
> + "ha/crm_commands",
> + "ha/manager_status",
> + "ha/resources.cfg",
> + "ha/rules.cfg",
> + "ha/groups.cfg",
> + "ha/fence.cfg",
> + "status.cfg",
> + "replication.cfg",
> + "ceph.conf",
> + "sdn/vnets.cfg",
> + "sdn/zones.cfg",
> + "sdn/controllers.cfg",
> + "sdn/subnets.cfg",
> + "sdn/ipams.cfg",
> + "sdn/mac-cache.json", // SDN MAC address cache
> + "sdn/pve-ipam-state.json", // SDN IPAM state
> + "sdn/dns.cfg", // SDN DNS configuration
> + "sdn/fabrics.cfg", // SDN fabrics configuration
> + "sdn/.running-config", // SDN running configuration
> + "virtual-guest/cpu-models.conf", // Virtual guest CPU models
> + "virtual-guest/profiles.cfg", // Virtual guest profiles
> + "firewall/cluster.fw", // Cluster firewall rules
> + "mapping/directory.cfg", // Directory mappings
> + "mapping/pci.cfg", // PCI device mappings
> + "mapping/usb.cfg", // USB device mappings
> + ];
> +
> + for path in common_paths {
> + path_versions.insert(path.to_string(), AtomicU64::new(0));
> + }
> +
> + // Get start time (matches C implementation's cfs_status.start_time)
> + let start_time = SystemTime::now()
> + .duration_since(UNIX_EPOCH)
> + .unwrap_or_default()
> + .as_secs();
> +
> + Self {
> + cluster_info: RwLock::new(None),
> + cluster_version: AtomicU64::new(1),
> + vmlist_version: AtomicU64::new(1),
> + memdb_path_versions: RwLock::new(path_versions),
> + node_status: RwLock::new(HashMap::new()),
> + cluster_log: pmxcfs_logger::ClusterLog::new(),
> + rrd_data: RwLock::new(HashMap::new()),
> + rrd_writer,
> + vmlist: RwLock::new(HashMap::new()),
> + quorate: RwLock::new(false),
> + members: RwLock::new(Vec::new()),
> + start_time,
> + kvstore: RwLock::new(HashMap::new()),
> + }
> + }
> +
> + /// Get node status
> + pub fn get_node_status(&self, name: &str) -> Option<NodeStatus> {
> + self.node_status.read().get(name).cloned()
> + }
> +
> + /// Set node status (matches C implementation's cfs_status_set)
> + ///
> + /// This handles status updates received via IPC from external clients.
> + /// If the key starts with "rrd/", it's RRD data that should be written to disk.
> + /// Otherwise, it's generic node status data.
> + pub async fn set_node_status(&self, name: String, data: Vec<u8>) -> Result<()> {
We need to check the payload against CFS_MAX_STATUS_SIZE here, to avoid
accepting unbounded payloads (and possible state divergence from C).
> + // Check if this is RRD data (matching C's cfs_status_set behavior)
> + if let Some(rrd_key) = name.strip_prefix("rrd/") {
> + // Strip "rrd/" prefix to get the actual RRD key
> + // Convert data to string (RRD data is text format)
> + let data_str = String::from_utf8(data)
> + .map_err(|e| anyhow::anyhow!("Invalid UTF-8 in RRD data: {e}"))?;
We need to strip the trailing \0: C payloads are NUL-terminated and
from_utf8 preserves the terminator, so it would end up in the RRD dump
output.
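E.g. (sketch, helper name assumed):

```rust
// Drop trailing NUL bytes before UTF-8 conversion so the terminator
// from C senders never reaches the RRD dump output.
fn rrd_payload_to_string(mut data: Vec<u8>) -> Result<String, std::string::FromUtf8Error> {
    while data.last() == Some(&0) {
        data.pop();
    }
    String::from_utf8(data)
}
```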
> +
> + // Write to RRD (stores in memory and writes to disk)
> + self.set_rrd_data(rrd_key.to_string(), data_str).await?;
> + } else {
nodeip handling is missing here; C has a dedicated branch for it. The
backing iphash data structure is also missing.
> + // Regular node status (not RRD)
> + let now = SystemTime::now().duration_since(UNIX_EPOCH)?.as_secs();
> + let status = NodeStatus {
> + name: name.clone(),
> + data,
> + timestamp: now,
> + };
> + self.node_status.write().insert(name, status);
> + }
> +
> + Ok(())
> + }
> +
> + /// Add cluster log entry
> + pub fn add_log_entry(&self, entry: ClusterLogEntry) {
> + // Convert ClusterLogEntry to ClusterLog format and add
> + // The ClusterLog handles size limits and deduplication internally
> + let _ = self.cluster_log.add(
> + &entry.node,
> + &entry.ident,
> + &entry.tag,
> + 0, // pid not tracked in our entries
> + entry.priority,
> + entry.timestamp as u32,
> + &entry.message,
> + );
> + }
> +
> + /// Get cluster log entries
> + pub fn get_log_entries(&self, max: usize) -> Vec<ClusterLogEntry> {
> + // Get entries from ClusterLog and convert to ClusterLogEntry
> + self.cluster_log
> + .get_entries(max)
> + .into_iter()
> + .map(|entry| ClusterLogEntry {
> + timestamp: entry.time as u64,
> + node: entry.node,
> + priority: entry.priority,
> + ident: entry.ident,
> + tag: entry.tag,
> + message: entry.message,
> + })
> + .collect()
> + }
> +
> + /// Clear all cluster log entries (for testing)
> + pub fn clear_cluster_log(&self) {
> + self.cluster_log.clear();
> + }
> +
> + /// Set RRD data (C-compatible format)
> + /// Key format: "pve2-node/{nodename}" or "pve2.3-vm/{vmid}"
> + /// Data format: "{timestamp}:{val1}:{val2}:..."
> + pub async fn set_rrd_data(&self, key: String, data: String) -> Result<()> {
> + let now = SystemTime::now()
> + .duration_since(UNIX_EPOCH)
> + .unwrap_or_default()
> + .as_secs();
> +
> + let entry = RrdEntry {
> + key: key.clone(),
> + data: data.clone(),
> + timestamp: now,
> + };
> +
> + // Store in memory for .rrd plugin file
> + self.rrd_data.write().insert(key.clone(), entry);
> +
> + // Also write to RRD file on disk (if persistence is enabled)
> + if let Some(writer_lock) = &self.rrd_writer {
> + let mut writer = writer_lock.write().await;
> + writer.update(&key, &data).await?;
> + tracing::trace!("Updated RRD file: {} -> {}", key, data);
> + }
> +
> + Ok(())
> + }
> +
> + /// Remove old RRD entries (older than 5 minutes)
> + pub fn remove_old_rrd_data(&self) {
> + let now = SystemTime::now()
> + .duration_since(UNIX_EPOCH)
> + .unwrap_or_default()
> + .as_secs();
> +
> + const EXPIRE_SECONDS: u64 = 60 * 5; // 5 minutes
> +
> + self.rrd_data
> + .write()
> + .retain(|_, entry| now - entry.timestamp <= EXPIRE_SECONDS);
If the system clock jumps backwards, now can be less than
entry.timestamp, and the unsigned subtraction underflows (a panic in
debug builds). now.saturating_sub(entry.timestamp) would be safe.
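A minimal sketch of the overflow-safe comparison:

```rust
// saturating_sub avoids the u64 underflow (panic in debug builds,
// wraparound in release) when `now` is behind the stored timestamp.
fn entry_is_fresh(now: u64, timestamp: u64, expire_seconds: u64) -> bool {
    now.saturating_sub(timestamp) <= expire_seconds
}
```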
> + }
> +
> + /// Get RRD data dump (text format matching C implementation)
> + pub fn get_rrd_dump(&self) -> String {
This rebuilds the dump every time it is called, and
remove_old_rrd_data() takes the write lock on each call. The rendered
result could be cached for a short time to improve performance,
similar to what the C implementation does.
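Something like this could work (sketch; type and TTL are assumptions, not what C does verbatim):

```rust
use std::sync::Mutex;
use std::time::{Duration, Instant};

// Time-bounded cache for the rendered dump string.
struct DumpCache {
    cached: Mutex<Option<(Instant, String)>>,
}

impl DumpCache {
    fn get_or_render(&self, ttl: Duration, render: impl FnOnce() -> String) -> String {
        let mut slot = self.cached.lock().unwrap();
        if let Some((rendered_at, dump)) = slot.as_ref() {
            if rendered_at.elapsed() < ttl {
                return dump.clone(); // serve the cached dump
            }
        }
        let dump = render();
        *slot = Some((Instant::now(), dump.clone()));
        dump
    }
}
```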
> + // Remove old entries first
> + self.remove_old_rrd_data();
> +
> + let rrd = self.rrd_data.read();
> + let mut result = String::new();
> +
> + for entry in rrd.values() {
> + result.push_str(&entry.key);
> + result.push(':');
> + result.push_str(&entry.data);
> + result.push('\n');
> + }
> +
> + result
> + }
> +
> + /// Collect disk I/O statistics (bytes read, bytes written)
> + ///
> + /// Note: This is for future VM RRD implementation. Per C implementation:
> + /// - Node RRD (rrd_def_node) has 12 fields and does NOT include diskread/diskwrite
> + /// - VM RRD (rrd_def_vm) has 10 fields and DOES include diskread/diskwrite at indices 8-9
> + ///
> + /// This method will be used when implementing VM RRD collection.
> + ///
> + /// # Sector Size
> + /// The Linux kernel reports disk statistics in /proc/diskstats using 512-byte sectors
> + /// as the standard unit, regardless of the device's actual physical sector size.
> + /// This is a kernel reporting convention (see Documentation/admin-guide/iostats.rst).
> + #[allow(dead_code)]
> + fn collect_disk_io() -> Result<(u64, u64)> {
> + // /proc/diskstats always uses 512-byte sectors (kernel convention)
> + const DISKSTATS_SECTOR_SIZE: u64 = 512;
> +
> + let diskstats = procfs::diskstats()?;
> +
> + let mut total_read = 0u64;
> + let mut total_write = 0u64;
> +
> + for stat in diskstats {
> + // Skip partitions (only look at whole disks: sda, vda, etc.)
> + if stat
> + .name
> + .chars()
> + .last()
> + .map(|c| c.is_numeric())
> + .unwrap_or(false)
> + {
> + continue;
> + }
> +
> + // Convert sectors to bytes using kernel's reporting unit
> + total_read += stat.sectors_read * DISKSTATS_SECTOR_SIZE;
> + total_write += stat.sectors_written * DISKSTATS_SECTOR_SIZE;
> + }
> +
> + Ok((total_read, total_write))
> + }
> +
> + /// Register a VM/CT
> + pub fn register_vm(&self, vmid: u32, vmtype: VmType, node: String) {
> + tracing::debug!(vmid, vmtype = ?vmtype, node = %node, "Registered VM");
> +
> + // Get existing VM version or start at 1
> + let version = self
> + .vmlist
> + .read()
> + .get(&vmid)
> + .map(|vm| vm.version + 1)
In C we have the global static uint32_t vminfo_version_counter; here we
have per-VM counters. Why the difference? Wouldn't it be more helpful
to have a global order here as well, so the update order of VMs can be
determined from it?
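I.e. something like (sketch mirroring C's counter, widened to u64):

```rust
use std::sync::atomic::{AtomicU64, Ordering};

// Process-wide counter (like C's vminfo_version_counter) gives a total
// order across VM updates, instead of independent per-VM counters.
static VMINFO_VERSION_COUNTER: AtomicU64 = AtomicU64::new(0);

fn next_vminfo_version() -> u64 {
    VMINFO_VERSION_COUNTER.fetch_add(1, Ordering::SeqCst) + 1
}
```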
> + .unwrap_or(1);
> +
> + let entry = VmEntry {
> + vmid,
> + vmtype,
> + node,
> + version,
> + };
> + self.vmlist.write().insert(vmid, entry);
Between the read() above and this write() there is a TOCTOU window,
similar to the one in set_quorate().
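E.g. holding a single write guard across the read-modify-write would close it (sketch with std's RwLock and a simplified entry type):

```rust
use std::collections::HashMap;
use std::sync::RwLock;

struct Vm {
    version: u64,
}

// One write() guard held across the read-modify-write closes the
// window between a separate read() and a later write().
fn register_vm(vmlist: &RwLock<HashMap<u32, Vm>>, vmid: u32) {
    let mut map = vmlist.write().unwrap();
    let version = map.get(&vmid).map(|vm| vm.version + 1).unwrap_or(1);
    map.insert(vmid, Vm { version });
}
```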
> +
> + // Increment vmlist version counter
> + self.increment_vmlist_version();
> + }
> +
> + /// Delete a VM/CT
> + pub fn delete_vm(&self, vmid: u32) {
> + if self.vmlist.write().remove(&vmid).is_some() {
This should bump the vmlist version unconditionally (even for a
non-existent vmid) to match C.
> + tracing::debug!(vmid, "Deleted VM");
> +
> + // Increment vmlist version counter
> + self.increment_vmlist_version();
> + }
> + }
> +
> + /// Check if VM/CT exists
> + pub fn vm_exists(&self, vmid: u32) -> bool {
> + self.vmlist.read().contains_key(&vmid)
> + }
> +
> + /// Check if a different VM/CT exists (different node or type)
> + pub fn different_vm_exists(&self, vmid: u32, vmtype: VmType, node: &str) -> bool {
> + if let Some(entry) = self.vmlist.read().get(&vmid) {
> + entry.vmtype != vmtype || entry.node != node
> + } else {
> + false
> + }
> + }
> +
> + /// Get VM list
> + pub fn get_vmlist(&self) -> HashMap<u32, VmEntry> {
> + self.vmlist.read().clone()
> + }
> +
> + /// Scan directories for VMs/CTs and update vmlist
> + ///
> + /// Uses memdb's `recreate_vmlist()` to properly scan nodes/*/qemu-server/
> + /// and nodes/*/lxc/ directories to track which node each VM belongs to.
> + pub fn scan_vmlist(&self, memdb: &pmxcfs_memdb::MemDb) {
> + // Use the proper recreate_vmlist from memdb which scans nodes/*/qemu-server/ and nodes/*/lxc/
> + match pmxcfs_memdb::recreate_vmlist(memdb) {
> + Ok(new_vmlist) => {
> + let vmlist_len = new_vmlist.len();
> + let mut vmlist = self.vmlist.write();
> + *vmlist = new_vmlist;
This replaces the entire HashMap, which resets all per-VM version
counters.
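Merging the rescan into the old map would preserve them, e.g. (sketch; the bump-on-node-change policy is my assumption, not necessarily what C does):

```rust
use std::collections::HashMap;

struct Vm {
    node: String,
    version: u64,
}

// Carry old per-VM versions over into the rescanned list so a rescan
// doesn't reset them; bump only when the entry actually changed.
fn merge_vmlist(old: &HashMap<u32, Vm>, mut new: HashMap<u32, Vm>) -> HashMap<u32, Vm> {
    for (vmid, vm) in new.iter_mut() {
        if let Some(prev) = old.get(vmid) {
            vm.version = if prev.node == vm.node {
                prev.version
            } else {
                prev.version + 1
            };
        }
    }
    new
}
```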
> + drop(vmlist);
> +
> + tracing::info!(vms = vmlist_len, "VM list scan complete");
> +
> + // Increment vmlist version counter
> + self.increment_vmlist_version();
> + }
> + Err(err) => {
> + tracing::error!(error = %err, "Failed to recreate vmlist");
> + }
> + }
> + }
> +
> + /// Initialize cluster information with cluster name
> + pub fn init_cluster(&self, cluster_name: String) {
> + let info = ClusterInfo::new(cluster_name);
> + *self.cluster_info.write() = Some(info);
> + self.cluster_version.fetch_add(1, Ordering::SeqCst);
> + }
> +
> + /// Register a node in the cluster (name, ID, IP)
> + pub fn register_node(&self, node_id: u32, name: String, ip: String) {
> + tracing::debug!(node_id, node = %name, ip = %ip, "Registering cluster node");
> +
> + let mut cluster_info = self.cluster_info.write();
> + if let Some(ref mut info) = *cluster_info {
> + let node = ClusterNode {
> + name,
> + node_id,
> + ip,
> + online: false, // Will be updated by cluster module
> + };
> + info.add_node(node);
> + self.cluster_version.fetch_add(1, Ordering::SeqCst);
> + }
> + }
> +
> + /// Get cluster information (for .members plugin)
> + pub fn get_cluster_info(&self) -> Option<ClusterInfo> {
> + self.cluster_info.read().clone()
> + }
> +
> + /// Get cluster version
> + pub fn get_cluster_version(&self) -> u64 {
> + self.cluster_version.load(Ordering::SeqCst)
> + }
> +
> + /// Increment cluster version (called when membership changes)
> + pub fn increment_cluster_version(&self) {
> + self.cluster_version.fetch_add(1, Ordering::SeqCst);
> + }
> +
> + /// Update cluster info from CMAP (called by ClusterConfigService)
> + pub fn update_cluster_info(
> + &self,
> + cluster_name: String,
> + config_version: u64,
> + nodes: Vec<(u32, String, String)>,
> + ) -> Result<()> {
> + let mut cluster_info = self.cluster_info.write();
> +
> + // Create or update cluster info
> + let mut info = cluster_info
> + .take()
> + .unwrap_or_else(|| ClusterInfo::new(cluster_name.clone()));
> +
> + // Update cluster name if changed
> + if info.cluster_name != cluster_name {
> + info.cluster_name = cluster_name;
> + }
> +
> + // Clear existing nodes
> + info.nodes_by_id.clear();
> + info.nodes_by_name.clear();
> +
> + // Add updated nodes
> + for (nodeid, name, ip) in nodes {
> + let node = ClusterNode {
> + name: name.clone(),
> + node_id: nodeid,
> + ip,
> + online: false, // Will be updated by quorum module
This drops the online status. C's cfs_status_set_clinfo preserves it by
copying it over from the old node entry (oldnode); this needs the same
treatment here.
> + };
> + info.add_node(node);
> + }
Do we need to clean up the kvstore entries for removed nodes here?
> +
> + *cluster_info = Some(info);
> +
> + // Update version to reflect configuration change
> + self.cluster_version.store(config_version, Ordering::SeqCst);
> +
> + tracing::info!(version = config_version, "Updated cluster configuration");
> + Ok(())
> + }
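To illustrate the preserving rebuild: the old map is consulted for each
incoming node before the new entry is built, which is roughly what the
C side does when it copies `online` from the old entry. Types are
simplified for the sketch (the real ClusterNode also carries an ip):

```rust
use std::collections::HashMap;

// Simplified node record for illustration.
#[derive(Clone, Debug)]
struct ClusterNode {
    name: String,
    node_id: u32,
    online: bool,
}

/// Rebuild the node table from config data while preserving each
/// node's previous `online` flag.
fn rebuild_nodes(
    old: &HashMap<u32, ClusterNode>,
    nodes: Vec<(u32, String)>,
) -> HashMap<u32, ClusterNode> {
    nodes
        .into_iter()
        .map(|(node_id, name)| {
            // New nodes default to offline; known nodes keep their flag.
            let online = old.get(&node_id).map(|n| n.online).unwrap_or(false);
            (node_id, ClusterNode { name, node_id, online })
        })
        .collect()
}
```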
> +
> + /// Update node online status (called by cluster module)
> + pub fn set_node_online(&self, node_id: u32, online: bool) {
> + let mut cluster_info = self.cluster_info.write();
> + if let Some(ref mut info) = *cluster_info
> + && let Some(node) = info.nodes_by_id.get_mut(&node_id)
> + && node.online != online
> + {
> + node.online = online;
> + // Also update in nodes_by_name
> + if let Some(name_node) = info.nodes_by_name.get_mut(&node.name) {
> + name_node.online = online;
> + }
> + self.cluster_version.fetch_add(1, Ordering::SeqCst);
> + tracing::debug!(
> + node = %node.name,
> + node_id,
> + online = if online { "true" } else { "false" },
> + "Node online status changed"
> + );
> + }
> + }
> +
> + /// Check if cluster is quorate (matches C's cfs_is_quorate)
> + pub fn is_quorate(&self) -> bool {
> + *self.quorate.read()
> + }
> +
> + /// Set quorum status (matches C's cfs_set_quorate)
> + pub fn set_quorate(&self, quorate: bool) {
> + let old_quorate = *self.quorate.read();
between this
> + *self.quorate.write() = quorate;
and this line we have a TOCTOU window: the `*` dereference copies the
bool out, and the read guard is dropped at the semicolon, so no lock is
held between the two statements. Performing both operations under a
single write guard would close the window.
> +
> + if old_quorate != quorate {
> + if quorate {
> + tracing::info!("Node has quorum");
> + } else {
> + tracing::info!("Node lost quorum");
> + }
> + }
> + }
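For illustration, a single-guard version (std RwLock here instead of
parking_lot; the returned bool tells the caller whether the quorum
state actually changed, so the logging can stay outside the lock):

```rust
use std::sync::RwLock;

struct Status {
    quorate: RwLock<bool>,
}

impl Status {
    /// Read the old value and store the new one under one write guard,
    /// so no other writer can slip in between the read and the write.
    fn set_quorate(&self, quorate: bool) -> bool {
        let mut guard = self.quorate.write().unwrap();
        let changed = *guard != quorate;
        *guard = quorate;
        changed // caller logs the quorum transition if true
    }
}
```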
> +
> + /// Get current cluster members (CPG membership)
> + pub fn get_members(&self) -> Vec<pmxcfs_api_types::MemberInfo> {
> + self.members.read().clone()
> + }
> +
> + /// Update cluster members and sync online status (matches C's dfsm_confchg callback)
> + ///
> + /// This updates the CPG member list and synchronizes the online status
> + /// in cluster_info to match current membership.
> + pub fn update_members(&self, members: Vec<pmxcfs_api_types::MemberInfo>) {
> + *self.members.write() = members.clone();
> +
> + // Update online status in cluster_info based on members
> + // (matches C implementation's dfsm_confchg in status.c:1989-2025)
> + let mut cluster_info = self.cluster_info.write();
> + if let Some(ref mut info) = *cluster_info {
> + // First mark all nodes as offline
> + for node in info.nodes_by_id.values_mut() {
> + node.online = false;
> + }
> + for node in info.nodes_by_name.values_mut() {
> + node.online = false;
> + }
> +
> + // Then mark active members as online
> + for member in &members {
> + if let Some(node) = info.nodes_by_id.get_mut(&member.node_id) {
> + node.online = true;
> + // Also update in nodes_by_name
> + if let Some(name_node) = info.nodes_by_name.get_mut(&node.name) {
> + name_node.online = true;
> + }
> + }
> + }
> +
> + self.cluster_version.fetch_add(1, Ordering::SeqCst);
> + }
> + }
> +
> + /// Get daemon start timestamp (for .version plugin)
> + pub fn get_start_time(&self) -> u64 {
> + self.start_time
> + }
> +
> + /// Increment VM list version (matches C's cfs_status.vmlist_version++)
> + pub fn increment_vmlist_version(&self) {
> + self.vmlist_version.fetch_add(1, Ordering::SeqCst);
> + }
> +
> + /// Get VM list version
> + pub fn get_vmlist_version(&self) -> u64 {
> + self.vmlist_version.load(Ordering::SeqCst)
> + }
> +
> + /// Increment version for a specific memdb path (matches C's record_memdb_change)
> + pub fn increment_path_version(&self, path: &str) {
> + let versions = self.memdb_path_versions.read();
> + if let Some(counter) = versions.get(path) {
> + counter.fetch_add(1, Ordering::SeqCst);
> + }
> + }
> +
> + /// Get version for a specific memdb path
> + pub fn get_path_version(&self, path: &str) -> u64 {
> + let versions = self.memdb_path_versions.read();
> + versions
> + .get(path)
> + .map(|counter| counter.load(Ordering::SeqCst))
> + .unwrap_or(0)
> + }
> +
> + /// Get all memdb path versions (for .version plugin)
> + pub fn get_all_path_versions(&self) -> HashMap<String, u64> {
> + let versions = self.memdb_path_versions.read();
> + versions
> + .iter()
> + .map(|(path, counter)| (path.clone(), counter.load(Ordering::SeqCst)))
> + .collect()
> + }
> +
> + /// Increment ALL configuration file versions (matches C's record_memdb_reload)
> + ///
> + /// Called when the entire database is reloaded from cluster peers.
> + /// This ensures clients know that all configuration files should be re-read.
> + pub fn increment_all_path_versions(&self) {
> + let versions = self.memdb_path_versions.read();
> + for (_, counter) in versions.iter() {
> + counter.fetch_add(1, Ordering::SeqCst);
> + }
> + }
> +
> + /// Set key-value data from a node (kvstore DFSM)
> + ///
> + /// Matches C implementation's cfs_kvstore_node_set in status.c.
> + /// Stores ephemeral status data like RRD metrics, IP addresses, etc.
> + pub fn set_node_kv(&self, nodeid: u32, key: String, value: Vec<u8>) {
We accept unknown nodeids here; maybe something like this would work:

    let cluster_info = self.cluster_info.read();
    match &*cluster_info {
        Some(info) if info.nodes_by_id.contains_key(&nodeid) => {}
        _ => return,
    }
    drop(cluster_info);

Also, shouldn't we have the same three key checks here that
set_node_status should have? Basically:

    if let Some(rrd_key) = key.strip_prefix("rrd/") {
        ..
    } else if key == "nodeip" {
        ..
    } else {
        ..
    }
> + let mut kvstore = self.kvstore.write();
> + kvstore.entry(nodeid).or_default().insert(key, value);
> + }
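Putting both suggestions together, roughly. This is a sketch with a
simplified Status (a known_nodes map stands in for cluster_info, and
the dispatch arms are placeholders):

```rust
use std::collections::HashMap;
use std::sync::RwLock;

// Hypothetical slimmed-down Status for the sketch.
struct Status {
    known_nodes: RwLock<HashMap<u32, String>>,
    kvstore: RwLock<HashMap<u32, HashMap<String, Vec<u8>>>>,
}

impl Status {
    /// Drop updates from node ids that are not part of the cluster,
    /// then dispatch on the key prefix before storing.
    fn set_node_kv(&self, nodeid: u32, key: String, value: Vec<u8>) {
        if !self.known_nodes.read().unwrap().contains_key(&nodeid) {
            return; // unknown node: ignore the update
        }
        if let Some(_rrd_key) = key.strip_prefix("rrd/") {
            // RRD data would be routed to the RRD cache here
        } else if key == "nodeip" {
            // a node IP update would refresh cluster_info here
        }
        self.kvstore
            .write()
            .unwrap()
            .entry(nodeid)
            .or_default()
            .insert(key, value);
    }
}
```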
> +
> + /// Get key-value data from a node
> + pub fn get_node_kv(&self, nodeid: u32, key: &str) -> Option<Vec<u8>> {
> + let kvstore = self.kvstore.read();
> + kvstore.get(&nodeid)?.get(key).cloned()
> + }
> +
> + /// Add cluster log entry (called by kvstore DFSM)
> + ///
> + /// This is the wrapper for kvstore LOG messages.
> + /// Matches C implementation's clusterlog_insert call.
> + pub fn add_cluster_log(
> + &self,
> + timestamp: u32,
> + priority: u8,
> + tag: String,
> + node: String,
> + message: String,
> + ) {
> + let entry = ClusterLogEntry {
> + timestamp: timestamp as u64,
> + node,
> + priority,
> + ident: String::new(), // Not used in kvstore messages
> + tag,
> + message,
> + };
> + self.add_log_entry(entry);
> + }
> +
> + /// Update node online status based on CPG membership (kvstore DFSM confchg callback)
> + ///
> + /// This is called when kvstore CPG membership changes.
> + /// Matches C implementation's dfsm_confchg in status.c.
> + pub fn update_member_status(&self, member_list: &[u32]) {
> + let mut cluster_info = self.cluster_info.write();
> + if let Some(ref mut info) = *cluster_info {
> + // Mark all nodes as offline
> + for node in info.nodes_by_id.values_mut() {
> + node.online = false;
> + }
> + for node in info.nodes_by_name.values_mut() {
> + node.online = false;
> + }
> +
> + // Mark nodes in member_list as online
> + for &nodeid in member_list {
> + if let Some(node) = info.nodes_by_id.get_mut(&nodeid) {
> + node.online = true;
> + // Also update in nodes_by_name
> + if let Some(name_node) = info.nodes_by_name.get_mut(&node.name) {
> + name_node.online = true;
> + }
> + }
> + }
> +
> + self.cluster_version.fetch_add(1, Ordering::SeqCst);
> + }
> + }
> +
> + /// Get cluster log state (for DFSM synchronization)
> + ///
> + /// Returns the cluster log in C-compatible binary format (clog_base_t).
> + /// Matches C implementation's clusterlog_get_state() in logger.c:553-571.
> + pub fn get_cluster_log_state(&self) -> Result<Vec<u8>> {
> + self.cluster_log.get_state()
> + }
> +
> + /// Merge cluster log states from remote nodes
> + ///
> + /// Deserializes binary states from remote nodes and merges them with the local log.
> + /// Matches C implementation's dfsm_process_state_update() in status.c:2049-2074.
> + pub fn merge_cluster_log_states(
> + &self,
> + states: &[pmxcfs_api_types::NodeSyncInfo],
> + ) -> Result<()> {
> + use pmxcfs_logger::ClusterLog;
> +
> + let mut remote_logs = Vec::new();
> +
> + for state_info in states {
> + // Check if this node has state data
> + let state_data = match &state_info.state {
> + Some(data) if !data.is_empty() => data,
> + _ => continue,
> + };
> +
> + match ClusterLog::deserialize_state(state_data) {
> + Ok(ring_buffer) => {
> + tracing::debug!(
> + "Deserialized cluster log from node {}: {} entries",
> + state_info.nodeid,
> + ring_buffer.len()
> + );
> + remote_logs.push(ring_buffer);
> + }
> + Err(e) => {
> + tracing::warn!(
> + nodeid = state_info.nodeid,
> + error = %e,
> + "Failed to deserialize cluster log from node"
> + );
> + }
> + }
> + }
> +
> + if !remote_logs.is_empty() {
> + // Merge remote logs with local log (include_local = true)
> + match self.cluster_log.merge(remote_logs, true) {
> + Ok(merged) => {
> + // Update our buffer with the merged result
> + self.cluster_log.update_buffer(merged);
> + tracing::debug!("Successfully merged cluster logs");
> + }
> + Err(e) => {
> + tracing::error!(error = %e, "Failed to merge cluster logs");
> + }
> + }
> + }
> +
> + Ok(())
> + }
> +
> + /// Add cluster log entry from remote node (kvstore LOG message)
> + ///
> + /// Matches C implementation's clusterlog_insert() via kvstore message handling.
> + pub fn add_remote_cluster_log(
> + &self,
> + time: u32,
> + priority: u8,
> + node: String,
> + ident: String,
> + tag: String,
> + message: String,
> + ) -> Result<()> {
> + self.cluster_log
> + .add(&node, &ident, &tag, 0, priority, time, &message)?;
> + Ok(())
> + }
> +}
> +
> +// Implement StatusOps trait for Status
> +impl crate::traits::StatusOps for Status {
> + fn get_node_status(&self, name: &str) -> Option<NodeStatus> {
> + self.get_node_status(name)
> + }
> +
> + fn set_node_status<'a>(
> + &'a self,
> + name: String,
> + data: Vec<u8>,
> + ) -> crate::traits::BoxFuture<'a, Result<()>> {
> + Box::pin(self.set_node_status(name, data))
> + }
> +
> + fn add_log_entry(&self, entry: ClusterLogEntry) {
> + self.add_log_entry(entry)
> + }
> +
> + fn get_log_entries(&self, max: usize) -> Vec<ClusterLogEntry> {
> + self.get_log_entries(max)
> + }
> +
> + fn clear_cluster_log(&self) {
> + self.clear_cluster_log()
> + }
> +
> + fn add_cluster_log(
> + &self,
> + timestamp: u32,
> + priority: u8,
> + tag: String,
> + node: String,
> + msg: String,
> + ) {
> + self.add_cluster_log(timestamp, priority, tag, node, msg)
> + }
> +
> + fn get_cluster_log_state(&self) -> Result<Vec<u8>> {
> + self.get_cluster_log_state()
> + }
> +
> + fn merge_cluster_log_states(&self, states: &[pmxcfs_api_types::NodeSyncInfo]) -> Result<()> {
> + self.merge_cluster_log_states(states)
> + }
> +
> + fn add_remote_cluster_log(
> + &self,
> + time: u32,
> + priority: u8,
> + node: String,
> + ident: String,
> + tag: String,
> + message: String,
> + ) -> Result<()> {
> + self.add_remote_cluster_log(time, priority, node, ident, tag, message)
> + }
> +
> + fn set_rrd_data<'a>(
> + &'a self,
> + key: String,
> + data: String,
> + ) -> crate::traits::BoxFuture<'a, Result<()>> {
> + Box::pin(self.set_rrd_data(key, data))
> + }
> +
> + fn remove_old_rrd_data(&self) {
> + self.remove_old_rrd_data()
> + }
> +
> + fn get_rrd_dump(&self) -> String {
> + self.get_rrd_dump()
> + }
> +
> + fn register_vm(&self, vmid: u32, vmtype: VmType, node: String) {
> + self.register_vm(vmid, vmtype, node)
> + }
> +
> + fn delete_vm(&self, vmid: u32) {
> + self.delete_vm(vmid)
> + }
> +
> + fn vm_exists(&self, vmid: u32) -> bool {
> + self.vm_exists(vmid)
> + }
> +
> + fn different_vm_exists(&self, vmid: u32, vmtype: VmType, node: &str) -> bool {
> + self.different_vm_exists(vmid, vmtype, node)
> + }
> +
> + fn get_vmlist(&self) -> HashMap<u32, VmEntry> {
> + self.get_vmlist()
> + }
> +
> + fn scan_vmlist(&self, memdb: &pmxcfs_memdb::MemDb) {
> + self.scan_vmlist(memdb)
> + }
> +
> + fn init_cluster(&self, cluster_name: String) {
> + self.init_cluster(cluster_name)
> + }
> +
> + fn register_node(&self, node_id: u32, name: String, ip: String) {
> + self.register_node(node_id, name, ip)
> + }
> +
> + fn get_cluster_info(&self) -> Option<ClusterInfo> {
> + self.get_cluster_info()
> + }
> +
> + fn get_cluster_version(&self) -> u64 {
> + self.get_cluster_version()
> + }
> +
> + fn increment_cluster_version(&self) {
> + self.increment_cluster_version()
> + }
> +
> + fn update_cluster_info(
> + &self,
> + cluster_name: String,
> + config_version: u64,
> + nodes: Vec<(u32, String, String)>,
> + ) -> Result<()> {
> + self.update_cluster_info(cluster_name, config_version, nodes)
> + }
> +
> + fn set_node_online(&self, node_id: u32, online: bool) {
> + self.set_node_online(node_id, online)
> + }
> +
> + fn is_quorate(&self) -> bool {
> + self.is_quorate()
> + }
> +
> + fn set_quorate(&self, quorate: bool) {
> + self.set_quorate(quorate)
> + }
> +
> + fn get_members(&self) -> Vec<pmxcfs_api_types::MemberInfo> {
> + self.get_members()
> + }
> +
> + fn update_members(&self, members: Vec<pmxcfs_api_types::MemberInfo>) {
> + self.update_members(members)
> + }
> +
> + fn update_member_status(&self, member_list: &[u32]) {
> + self.update_member_status(member_list)
> + }
> +
> + fn get_start_time(&self) -> u64 {
> + self.get_start_time()
> + }
> +
> + fn increment_vmlist_version(&self) {
> + self.increment_vmlist_version()
> + }
> +
> + fn get_vmlist_version(&self) -> u64 {
> + self.get_vmlist_version()
> + }
> +
> + fn increment_path_version(&self, path: &str) {
> + self.increment_path_version(path)
> + }
> +
> + fn get_path_version(&self, path: &str) -> u64 {
> + self.get_path_version(path)
> + }
> +
> + fn get_all_path_versions(&self) -> HashMap<String, u64> {
> + self.get_all_path_versions()
> + }
> +
> + fn increment_all_path_versions(&self) {
> + self.increment_all_path_versions()
> + }
> +
> + fn set_node_kv(&self, nodeid: u32, key: String, value: Vec<u8>) {
> + self.set_node_kv(nodeid, key, value)
> + }
> +
> + fn get_node_kv(&self, nodeid: u32, key: &str) -> Option<Vec<u8>> {
> + self.get_node_kv(nodeid, key)
> + }
> +}
> +
> +#[cfg(test)]
> +mod tests {
[..]
> +}
> diff --git a/src/pmxcfs-rs/pmxcfs-status/src/traits.rs b/src/pmxcfs-rs/pmxcfs-status/src/traits.rs
[..]
> diff --git a/src/pmxcfs-rs/pmxcfs-status/src/types.rs b/src/pmxcfs-rs/pmxcfs-status/src/types.rs
> new file mode 100644
> index 00000000..393ce63a
> --- /dev/null
> +++ b/src/pmxcfs-rs/pmxcfs-status/src/types.rs
> @@ -0,0 +1,62 @@
> +/// Data types for the status module
> +use std::collections::HashMap;
> +
> +/// Cluster node information (matches C implementation's cfs_clnode_t)
> +#[derive(Debug, Clone)]
> +pub struct ClusterNode {
> + pub name: String,
> + pub node_id: u32,
> + pub ip: String,
> + pub online: bool,
> +}
> +
> +/// Cluster information (matches C implementation's cfs_clinfo_t)
> +#[derive(Debug, Clone)]
> +pub struct ClusterInfo {
> + pub cluster_name: String,
> + pub nodes_by_id: HashMap<u32, ClusterNode>,
> + pub nodes_by_name: HashMap<String, ClusterNode>,
Mutation sites have to remember to update both maps, and the two can
silently drift apart. A safer pattern would be to make nodes_by_name
just an index:

    pub nodes_by_id: HashMap<u32, ClusterNode>,
    pub nodes_by_name: HashMap<String, u32>,
> +}
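Sketch of that pattern with a by-name mutation helper (ClusterNode
simplified for the example); every mutation goes through nodes_by_id,
so the name index can never disagree with it:

```rust
use std::collections::HashMap;

#[derive(Clone, Debug)]
struct ClusterNode {
    name: String,
    node_id: u32,
    online: bool,
}

/// nodes_by_id is the single source of truth; nodes_by_name is only
/// an index into it.
#[derive(Default)]
struct ClusterInfo {
    nodes_by_id: HashMap<u32, ClusterNode>,
    nodes_by_name: HashMap<String, u32>,
}

impl ClusterInfo {
    fn add_node(&mut self, node: ClusterNode) {
        self.nodes_by_name.insert(node.name.clone(), node.node_id);
        self.nodes_by_id.insert(node.node_id, node);
    }

    /// Resolve a name through the index, then mutate the real entry.
    fn by_name_mut(&mut self, name: &str) -> Option<&mut ClusterNode> {
        let id = *self.nodes_by_name.get(name)?;
        self.nodes_by_id.get_mut(&id)
    }
}
```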
> +
> +impl ClusterInfo {
> + pub(crate) fn new(cluster_name: String) -> Self {
> + Self {
> + cluster_name,
> + nodes_by_id: HashMap::new(),
> + nodes_by_name: HashMap::new(),
> + }
> + }
> +
> + /// Add or update a node in the cluster
> + pub(crate) fn add_node(&mut self, node: ClusterNode) {
> + self.nodes_by_name.insert(node.name.clone(), node.clone());
> + self.nodes_by_id.insert(node.node_id, node);
> + }
> +}
> +
> +/// Node status data
> +#[derive(Clone, Debug)]
> +pub struct NodeStatus {
> + pub name: String,
> + pub data: Vec<u8>,
> + pub timestamp: u64,
> +}
> +
> +/// Cluster log entry
> +#[derive(Clone, Debug)]
> +pub struct ClusterLogEntry {
> + pub timestamp: u64,
> + pub node: String,
> + pub priority: u8,
> + pub ident: String,
> + pub tag: String,
> + pub message: String,
> +}
> +
> +/// RRD (Round Robin Database) entry
> +#[derive(Clone, Debug)]
> +pub(crate) struct RrdEntry {
> + pub key: String,
> + pub data: String,
> + pub timestamp: u64,
> +}
* Re: [pve-devel] [PATCH pve-cluster 05/15] pmxcfs-rs: add pmxcfs-memdb crate
@ 2026-01-30 15:35 5% ` Samuel Rufinatscha
0 siblings, 0 replies; 200+ results
From: Samuel Rufinatscha @ 2026-01-30 15:35 UTC (permalink / raw)
To: Proxmox VE development discussion, Kefu Chai
Thanks for this substantial patch, Kefu! The overall structure looks
good and is a solid step forward.
The main issues I noted are around C compatibility.
Besides that, version handling differs between operations. I'd suggest
centralizing it into a single mutation helper that performs the version
bump, the version update, and the entry change in one transaction. A
single write-guard mutex would also help avoid the lock-ordering issues
and race conditions noted inline.
Details inline.
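To sketch what I mean by a single mutation helper (hypothetical
MemDbState; the real field set is elided): every write path funnels
through one method that holds one guard for the whole bump-and-apply
sequence.

```rust
use std::sync::Mutex;

// Hypothetical state bundle: everything a mutation touches lives
// behind one guard, so version bump and entry change commit together.
struct MemDbState {
    version: u64,
    // index, tree, locks ... would live here too
}

struct MemDb {
    state: Mutex<MemDbState>,
}

impl MemDb {
    /// The closure sees the already-bumped version and applies its
    /// change atomically with respect to other mutations.
    fn mutate<R>(&self, f: impl FnOnce(&mut MemDbState) -> R) -> R {
        let mut state = self.state.lock().unwrap();
        state.version += 1;
        f(&mut state)
    }
}
```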
On 1/7/26 10:15 AM, Kefu Chai wrote:
> Add in-memory database with SQLite persistence:
> - MemDb: Main database handle (thread-safe via Arc)
> - TreeEntry: File/directory entries with metadata
> - SQLite schema version 5 (C-compatible)
> - Plugin system (6 functional + 4 link plugins)
> - Resource locking with timeout-based expiration
> - Version tracking and checksumming
> - Index encoding/decoding for cluster synchronization
>
> This crate depends only on pmxcfs-api-types and external
> libraries (rusqlite, sha2, bincode). It provides the core
> storage layer used by the distributed file system.
>
> Includes comprehensive unit tests for:
> - CRUD operations on files and directories
> - Lock acquisition and expiration
> - SQLite persistence and recovery
> - Index encoding/decoding for sync
> - Tree entry application
>
> Signed-off-by: Kefu Chai <k.chai@proxmox.com>
> ---
> src/pmxcfs-rs/Cargo.toml | 1 +
> src/pmxcfs-rs/pmxcfs-memdb/Cargo.toml | 42 +
> src/pmxcfs-rs/pmxcfs-memdb/README.md | 220 ++
> src/pmxcfs-rs/pmxcfs-memdb/src/database.rs | 2227 +++++++++++++++++
> src/pmxcfs-rs/pmxcfs-memdb/src/index.rs | 814 ++++++
> src/pmxcfs-rs/pmxcfs-memdb/src/lib.rs | 26 +
> src/pmxcfs-rs/pmxcfs-memdb/src/locks.rs | 286 +++
> src/pmxcfs-rs/pmxcfs-memdb/src/sync.rs | 249 ++
> src/pmxcfs-rs/pmxcfs-memdb/src/traits.rs | 101 +
> src/pmxcfs-rs/pmxcfs-memdb/src/types.rs | 325 +++
> src/pmxcfs-rs/pmxcfs-memdb/src/vmlist.rs | 189 ++
> .../pmxcfs-memdb/tests/checksum_test.rs | 158 ++
> .../tests/sync_integration_tests.rs | 394 +++
> 13 files changed, 5032 insertions(+)
> create mode 100644 src/pmxcfs-rs/pmxcfs-memdb/Cargo.toml
> create mode 100644 src/pmxcfs-rs/pmxcfs-memdb/README.md
> create mode 100644 src/pmxcfs-rs/pmxcfs-memdb/src/database.rs
> create mode 100644 src/pmxcfs-rs/pmxcfs-memdb/src/index.rs
> create mode 100644 src/pmxcfs-rs/pmxcfs-memdb/src/lib.rs
> create mode 100644 src/pmxcfs-rs/pmxcfs-memdb/src/locks.rs
> create mode 100644 src/pmxcfs-rs/pmxcfs-memdb/src/sync.rs
> create mode 100644 src/pmxcfs-rs/pmxcfs-memdb/src/traits.rs
> create mode 100644 src/pmxcfs-rs/pmxcfs-memdb/src/types.rs
> create mode 100644 src/pmxcfs-rs/pmxcfs-memdb/src/vmlist.rs
> create mode 100644 src/pmxcfs-rs/pmxcfs-memdb/tests/checksum_test.rs
> create mode 100644 src/pmxcfs-rs/pmxcfs-memdb/tests/sync_integration_tests.rs
>
> diff --git a/src/pmxcfs-rs/Cargo.toml b/src/pmxcfs-rs/Cargo.toml
> index dd36c81f..2e41ac93 100644
> --- a/src/pmxcfs-rs/Cargo.toml
> +++ b/src/pmxcfs-rs/Cargo.toml
> @@ -5,6 +5,7 @@ members = [
> "pmxcfs-config", # Configuration management
> "pmxcfs-logger", # Cluster log with ring buffer and deduplication
> "pmxcfs-rrd", # RRD (Round-Robin Database) persistence
> + "pmxcfs-memdb", # In-memory database with SQLite persistence
> ]
> resolver = "2"
>
> diff --git a/src/pmxcfs-rs/pmxcfs-memdb/Cargo.toml b/src/pmxcfs-rs/pmxcfs-memdb/Cargo.toml
> new file mode 100644
> index 00000000..409b87ce
> --- /dev/null
> +++ b/src/pmxcfs-rs/pmxcfs-memdb/Cargo.toml
> @@ -0,0 +1,42 @@
> +[package]
> +name = "pmxcfs-memdb"
> +description = "In-memory database with SQLite persistence for pmxcfs"
> +
> +version.workspace = true
> +edition.workspace = true
> +authors.workspace = true
> +license.workspace = true
> +repository.workspace = true
> +
> +[lints]
> +workspace = true
> +
> +[dependencies]
> +# Error handling
> +anyhow.workspace = true
> +
> +# Database
> +rusqlite = { version = "0.30", features = ["bundled"] }
> +
> +# Concurrency primitives
> +parking_lot.workspace = true
> +
> +# System integration
> +libc.workspace = true
> +
> +# Cryptography (for checksums)
> +sha2.workspace = true
> +bytes.workspace = true
> +
> +# Serialization
> +serde.workspace = true
> +bincode.workspace = true
> +
> +# Logging
> +tracing.workspace = true
> +
> +# pmxcfs types
> +pmxcfs-api-types = { path = "../pmxcfs-api-types" }
> +
> +[dev-dependencies]
> +tempfile.workspace = true
> diff --git a/src/pmxcfs-rs/pmxcfs-memdb/README.md b/src/pmxcfs-rs/pmxcfs-memdb/README.md
> new file mode 100644
> index 00000000..172e7351
> --- /dev/null
> +++ b/src/pmxcfs-rs/pmxcfs-memdb/README.md
> @@ -0,0 +1,220 @@
> +# pmxcfs-memdb
> +
> +**In-Memory Database** with SQLite persistence for pmxcfs cluster filesystem.
> +
> +This crate provides a thread-safe, cluster-synchronized in-memory database that serves as the backend storage for the Proxmox cluster filesystem. All filesystem operations (read, write, create, delete) are performed on in-memory structures with SQLite providing durable persistence.
> +
> +## Overview
> +
> +The MemDb is the core data structure that stores all cluster configuration files in memory for fast access while maintaining durability through SQLite. Changes are synchronized across the cluster using the DFSM protocol.
> +
> +### Key Features
> +
> +- **In-memory tree structure**: All filesystem entries cached in memory
> +- **SQLite persistence**: Durable storage with ACID guarantees
> +- **Cluster synchronization**: State replication via DFSM (pmxcfs-dfsm crate)
> +- **Version tracking**: Monotonically increasing version numbers for conflict detection
> +- **Resource locking**: File-level locks with timeout-based expiration
> +- **Thread-safe**: All operations protected by mutex
> +- **Size limits**: Enforces max file size (1 MiB) and total filesystem size (128 MiB)
> +
> +## Architecture
> +
> +### Module Structure
> +
> +| Module | Purpose | C Equivalent |
> +|--------|---------|--------------|
> +| `database.rs` | Core MemDb struct and CRUD operations | `memdb.c` (main functions) |
> +| `types.rs` | TreeEntry, LockInfo, constants | `memdb.h:38-51, 71-74` |
> +| `locks.rs` | Resource locking functionality | `memdb.c:memdb_lock_*` |
> +| `sync.rs` | State serialization for cluster sync | `memdb.c:memdb_encode_index` |
> +| `index.rs` | Index comparison for DFSM updates | `memdb.c:memdb_index_*` |
> +
> +## C to Rust Mapping
> +
> +### Data Structures
> +
> +| C Type | Rust Type | Notes |
> +|--------|-----------|-------|
> +| `memdb_t` | `MemDb` | Main database handle (Clone-able via Arc) |
> +| `memdb_tree_entry_t` | `TreeEntry` | File/directory entry |
> +| `memdb_index_t` | `MemDbIndex` | Serialized state for sync |
> +| `memdb_index_extry_t` | `IndexEntry` | Single index entry |
> +| `memdb_lock_info_t` | `LockInfo` | Lock metadata |
> +| `db_backend_t` | `Connection` | SQLite backend (rusqlite) |
> +| `GHashTable *index` | `HashMap<u64, TreeEntry>` | Inode index |
> +| `GHashTable *locks` | `HashMap<String, LockInfo>` | Lock table |
> +| `GMutex mutex` | `Mutex` | Thread synchronization |
> +
> +### Core Functions
> +
> +#### Database Lifecycle
> +
> +| C Function | Rust Equivalent | Location |
> +|-----------|-----------------|----------|
> +| `memdb_open()` | `MemDb::open()` | database.rs |
> +| `memdb_close()` | (Drop trait) | Automatic |
> +| `memdb_checkpoint()` | (implicit in writes) | Auto-commit |
> +
> +#### File Operations
> +
> +| C Function | Rust Equivalent | Location |
> +|-----------|-----------------|----------|
> +| `memdb_read()` | `MemDb::read()` | database.rs |
> +| `memdb_write()` | `MemDb::write()` | database.rs |
> +| `memdb_create()` | `MemDb::create()` | database.rs |
> +| `memdb_delete()` | `MemDb::delete()` | database.rs |
> +| `memdb_mkdir()` | `MemDb::create()` (with DT_DIR) | database.rs |
> +| `memdb_rename()` | `MemDb::rename()` | database.rs |
> +| `memdb_mtime()` | (included in write) | database.rs |
> +
> +#### Directory Operations
> +
> +| C Function | Rust Equivalent | Location |
> +|-----------|-----------------|----------|
> +| `memdb_readdir()` | `MemDb::readdir()` | database.rs |
> +| `memdb_dirlist_free()` | (automatic) | Rust's Vec drops automatically |
> +
> +#### Metadata Operations
> +
> +| C Function | Rust Equivalent | Location |
> +|-----------|-----------------|----------|
> +| `memdb_getattr()` | `MemDb::lookup_path()` | database.rs |
> +| `memdb_statfs()` | `MemDb::statfs()` | database.rs |
The statfs implementation is missing from the diff; please revisit.
> +
> +#### Tree Entry Functions
> +
> +| C Function | Rust Equivalent | Location |
> +|-----------|-----------------|----------|
> +| `memdb_tree_entry_new()` | `TreeEntry { ... }` | Struct initialization |
> +| `memdb_tree_entry_copy()` | `.clone()` | Automatic (derive Clone) |
> +| `memdb_tree_entry_free()` | (Drop trait) | Automatic |
> +| `tree_entry_debug()` | `{:?}` format | Automatic (derive Debug) |
> +| `memdb_tree_entry_csum()` | `TreeEntry::compute_checksum()` | types.rs |
> +
> +#### Lock Operations
> +
> +| C Function | Rust Equivalent | Location |
> +|-----------|-----------------|----------|
> +| `memdb_lock_expired()` | `MemDb::is_lock_expired()` | locks.rs |
> +| `memdb_update_locks()` | `MemDb::update_locks()` | locks.rs |
> +
> +#### Index/Sync Operations
> +
> +| C Function | Rust Equivalent | Location |
> +|-----------|-----------------|----------|
> +| `memdb_encode_index()` | `MemDb::get_index()` | sync.rs |
> +| `memdb_index_copy()` | `.clone()` | Automatic (derive Clone) |
> +| `memdb_compute_checksum()` | `MemDb::compute_checksum()` | sync.rs |
> +| `bdb_backend_commit_update()` | `MemDb::apply_tree_entry()` | database.rs |
> +
> +#### State Synchronization
> +
> +| C Function | Rust Equivalent | Location |
> +|-----------|-----------------|----------|
> +| `memdb_recreate_vmlist()` | (handled by status crate) | External |
> +| (implicit) | `MemDb::replace_all_entries()` | database.rs |
> +
> +### SQLite Backend
> +
> +**C Version (database.c):**
> +- Direct SQLite3 C API
> +- Manual statement preparation
> +- Explicit transaction management
> +- Manual memory management
> +
> +**Rust Version (database.rs):**
> +- `rusqlite` crate for type-safe SQLite access
> +
> +## Database Schema
> +
> +The SQLite schema stores all filesystem entries with metadata:
> +- `inode = 1` is always the root directory
> +- `parent = 0` for root, otherwise parent directory's inode
> +- `version` increments on each modification (monotonic)
> +- `writer` is the node ID that made the change
> +- `mtime` is seconds since UNIX epoch
> +- `data` is NULL for directories, BLOB for files
> +
> +## TreeEntry Wire Format
> +
> +For cluster synchronization (DFSM Update messages), TreeEntry uses C-compatible serialization that is byte-compatible with C's implementation.
> +
> +## Key Differences from C Implementation
> +
> +### Thread Safety
> +
> +**C Version:**
> +- Single `GMutex` protects entire memdb_t
> +- Callback-based access from qb_loop (single-threaded)
> +
> +**Rust Version:**
> +- Mutex for each data structure (index, tree, locks, conn)
> +- More granular locking
> +- Can be shared across tokio tasks
> +
> +### Data Structures
> +
> +**C Version:**
> +- `GHashTable` (GLib) for index and tree
> +- Recursive tree structure with pointers
> +
> +**Rust Version:**
> +- `HashMap` from std
> +- Flat structure: `HashMap<u64, HashMap<String, u64>>` for tree
> +- Separate `HashMap<u64, TreeEntry>` for index
> +- No recursive pointers (eliminates cycles)
> +
> +### SQLite Integration
> +
> +**C Version (database.c):**
> +- Direct SQLite3 C API
> +
> +**Rust Version (database.rs):**
> +- `rusqlite` crate for type-safe SQLite access
> +
> +## Constants
> +
> +| Constant | Value | Purpose |
> +|----------|-------|---------|
> +| `MEMDB_MAX_FILE_SIZE` | 1 MiB | Maximum file size (matches C) |
> +| `MEMDB_MAX_FSSIZE` | 128 MiB | Maximum total filesystem size |
> +| `MEMDB_MAX_INODES` | 256k | Maximum number of files/dirs |
> +| `MEMDB_BLOCKSIZE` | 4096 | Block size for statfs |
> +| `LOCK_TIMEOUT` | 120 sec | Lock expiration timeout |
> +| `DT_DIR` | 4 | Directory type (matches POSIX) |
> +| `DT_REG` | 8 | Regular file type (matches POSIX) |
> +
> +## Known Issues / TODOs
> +
> +### Missing Features
> +
> +- [ ] **vmlist regeneration**: `memdb_recreate_vmlist()` not implemented (handled by status crate's `scan_vmlist()`)
> +
> +### Behavioral Differences (Benign)
> +
> +- **Lock storage**: C reads from filesystem at startup, Rust does the same but implementation differs
> +- **Index encoding**: Rust uses `Vec<IndexEntry>` instead of flexible array member
> +- **Checksum algorithm**: Same (SHA-256) but implementation differs (ring vs OpenSSL)
> +
> +### Compatibility
> +
> +- **Database format**: 100% compatible with C version (same SQLite schema)
> +- **Wire format**: TreeEntry serialization matches C byte-for-byte
> +- **Constants**: All limits match C version exactly
> +
> +## References
> +
> +### C Implementation
> +- `src/pmxcfs/memdb.c` / `memdb.h` - In-memory database
> +- `src/pmxcfs/database.c` - SQLite backend
> +
> +### Related Crates
> +- **pmxcfs-dfsm**: Uses MemDb for cluster synchronization
> +- **pmxcfs-api-types**: Message types for FUSE operations
> +- **pmxcfs**: Main daemon and FUSE integration
> +
> +### External Dependencies
> +- **rusqlite**: SQLite bindings
> +- **parking_lot**: Fast mutex implementation
> +- **sha2**: SHA-256 checksums
> diff --git a/src/pmxcfs-rs/pmxcfs-memdb/src/database.rs b/src/pmxcfs-rs/pmxcfs-memdb/src/database.rs
> new file mode 100644
> index 00000000..ee280683
> --- /dev/null
> +++ b/src/pmxcfs-rs/pmxcfs-memdb/src/database.rs
> @@ -0,0 +1,2227 @@
> +/// Core MemDb implementation - in-memory database with SQLite persistence
> +use anyhow::{Context, Result};
> +use parking_lot::Mutex;
> +use rusqlite::{Connection, params};
> +use std::collections::HashMap;
> +use std::path::Path;
> +use std::sync::Arc;
> +use std::sync::atomic::{AtomicU64, Ordering};
> +use std::time::{SystemTime, UNIX_EPOCH};
> +
> +use super::types::LockInfo;
> +use super::types::{
> + DT_DIR, DT_REG, LOCK_DIR_PATH, LoadDbResult, MEMDB_MAX_FILE_SIZE, ROOT_INODE, TreeEntry,
> + VERSION_FILENAME,
> +};
> +
> +/// In-memory database with SQLite persistence
> +#[derive(Clone)]
> +pub struct MemDb {
> + pub(super) inner: Arc<MemDbInner>,
> +}
> +
> +pub(super) struct MemDbInner {
> + /// SQLite connection for persistence (wrapped in Mutex for thread-safety)
> + pub(super) conn: Mutex<Connection>,
> +
> + /// In-memory index of all entries (inode -> TreeEntry)
> + /// This is a cache of the database for fast lookups
> + pub(super) index: Mutex<HashMap<u64, TreeEntry>>,
> +
> + /// In-memory tree structure (parent inode -> children)
> + pub(super) tree: Mutex<HashMap<u64, HashMap<String, u64>>>,
> +
> + /// Root entry
> + pub(super) root_inode: u64,
> +
> + /// Current version (incremented on each write)
> + pub(super) version: AtomicU64,
> +
> + /// Resource locks (path -> LockInfo)
> + pub(super) locks: Mutex<HashMap<String, LockInfo>>,
In C we set memdb->errors = 1 after a DB error and refuse subsequent
operations. We should likely also have an error flag here that gets set
on failure and checked at the start of every operation.
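Something along these lines, perhaps (just a sketch; ErrorFlag and the
method names are made up):

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Hypothetical error flag mirroring C's memdb->errors.
pub struct ErrorFlag {
    failed: AtomicBool,
}

impl ErrorFlag {
    pub fn new() -> Self {
        ErrorFlag { failed: AtomicBool::new(false) }
    }

    /// Called at the start of every mutating operation.
    pub fn check(&self) -> Result<(), &'static str> {
        if self.failed.load(Ordering::SeqCst) {
            Err("memdb is in an error state, refusing operation")
        } else {
            Ok(())
        }
    }

    /// Called whenever a DB write fails (C: memdb->errors = 1).
    pub fn mark(&self) {
        self.failed.store(true, Ordering::SeqCst);
    }
}
```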
> +}
> +
> +// Manually implement Send and Sync for MemDb
> +// This is safe because we protect the Connection with a Mutex
> +unsafe impl Send for MemDbInner {}
> +unsafe impl Sync for MemDbInner {}
Mutex<Connection> should already make MemDbInner Send + Sync, so these
unsafe impls are unnecessary. Please remove them and let the compiler
enforce the guarantees.
> +
> +impl MemDb {
> + pub fn open(path: &Path, create: bool) -> Result<Self> {
> + let conn = Connection::open(path)?;
> +
> + if create {
> + Self::init_schema(&conn)?;
> + }
> +
> + let (index, tree, root_inode, version) = Self::load_from_db(&conn)?;
> +
> + let memdb = Self {
> + inner: Arc::new(MemDbInner {
> + conn: Mutex::new(conn),
> + index: Mutex::new(index),
> + tree: Mutex::new(tree),
> + root_inode,
> + version: AtomicU64::new(version),
> + locks: Mutex::new(HashMap::new()),
> + }),
> + };
> +
> + memdb.update_locks();
> +
> + Ok(memdb)
> + }
> +
> + fn init_schema(conn: &Connection) -> Result<()> {
> + conn.execute_batch(
> + r#"
> + CREATE TABLE tree (
> + inode INTEGER PRIMARY KEY,
> + parent INTEGER NOT NULL,
> + version INTEGER NOT NULL,
> + writer INTEGER NOT NULL,
> + mtime INTEGER NOT NULL,
> + type INTEGER NOT NULL,
> + name TEXT NOT NULL,
> + data BLOB,
> + size INTEGER NOT NULL
> + );
> +
> + CREATE INDEX tree_parent_idx ON tree(parent, name);
> +
> + CREATE TABLE config (
> + name TEXT PRIMARY KEY,
> + value TEXT
> + );
> + "#,
> + )?;
> +
> + // Create root metadata entry as inode ROOT_INODE with name "__version__"
> + // Matching C implementation: root inode is NEVER in database as a regular entry
> + // Root metadata is stored as inode ROOT_INODE with special name "__version__"
> + let now = SystemTime::now()
> + .duration_since(SystemTime::UNIX_EPOCH)?
> + .as_secs() as u32;
> +
> + conn.execute(
> + "INSERT INTO tree (inode, parent, version, writer, mtime, type, name, data, size) VALUES (?1, ?2, ?3, ?4, ?5, ?6, ?7, ?8, ?9)",
> + params![ROOT_INODE, ROOT_INODE, 1, 0, now, DT_REG, VERSION_FILENAME, None::<Vec<u8>>, 0],
> + )?;
> +
> + Ok(())
> + }
> +
> + fn load_from_db(conn: &Connection) -> Result<LoadDbResult> {
> + let mut index = HashMap::new();
> + let mut tree: HashMap<u64, HashMap<String, u64>> = HashMap::new();
> + let mut max_version = 0u64;
> +
> + let mut stmt = conn.prepare(
> + "SELECT inode, parent, version, writer, mtime, type, name, data, size FROM tree",
> + )?;
> + let rows = stmt.query_map([], |row| {
> + let inode: u64 = row.get(0)?;
> + let parent: u64 = row.get(1)?;
> + let version: u64 = row.get(2)?;
> + let writer: u32 = row.get(3)?;
> + let mtime: u32 = row.get(4)?;
> + let entry_type: u8 = row.get(5)?;
> + let name: String = row.get(6)?;
> + let data: Option<Vec<u8>> = row.get(7)?;
> + let size: i64 = row.get(8)?;
> +
> + Ok(TreeEntry {
> + inode,
> + parent,
> + version,
> + writer,
> + mtime,
> + size: size as usize,
> + entry_type,
> + name,
> + data: data.unwrap_or_default(),
> + })
> + })?;
> +
> + // Create root entry in memory first (matching C implementation in database.c:559-567)
> + // Root is NEVER stored in database, only its metadata via inode ROOT_INODE
> + let now = SystemTime::now()
> + .duration_since(SystemTime::UNIX_EPOCH)?
> + .as_secs() as u32;
> + let mut root = TreeEntry {
> + inode: ROOT_INODE,
> + parent: ROOT_INODE, // Root's parent is itself
> + version: 0, // Will be populated from __version__ entry
> + writer: 0,
> + mtime: now,
> + size: 0,
> + entry_type: DT_DIR,
> + name: String::new(),
> + data: Vec::new(),
> + };
> +
> + for row in rows {
> + let entry = row?;
> +
> + // Handle __version__ entry (inode ROOT_INODE) - populate root metadata (C: database.c:372-382)
> + if entry.inode == ROOT_INODE {
> + if entry.name == VERSION_FILENAME {
> + tracing::debug!(
> + "Loading root metadata from __version__: version={}, writer={}, mtime={}",
> + entry.version,
> + entry.writer,
> + entry.mtime
> + );
> + root.version = entry.version;
> + root.writer = entry.writer;
> + root.mtime = entry.mtime;
> + if entry.version > max_version {
> + max_version = entry.version;
> + }
> + } else {
> + tracing::warn!("Ignoring inode 0 with unexpected name: {}", entry.name);
> + }
> + continue; // Don't add __version__ to index
> + }
> +
> + // Track max version from all entries
> + if entry.version > max_version {
> + max_version = entry.version;
> + }
> +
> + // Add to tree structure
> + tree.entry(entry.parent)
> + .or_default()
> + .insert(entry.name.clone(), entry.inode);
> +
> + // If this is a directory, ensure it has an entry in the tree map
> + if entry.is_dir() {
> + tree.entry(entry.inode).or_default();
> + }
> +
> + // Add to index
> + index.insert(entry.inode, entry);
> + }
> +
> + // If root version is still 0, set it to 1 (new database)
> + if root.version == 0 {
> + root.version = 1;
> + max_version = 1;
> + tracing::debug!("No __version__ entry found, initializing root with version 1");
> + }
> +
> + // Add root to index and ensure it has a tree entry (use entry() to not overwrite children!)
> + index.insert(ROOT_INODE, root);
> + tree.entry(ROOT_INODE).or_default();
> +
> + Ok((index, tree, ROOT_INODE, max_version))
> + }
> +
> + pub fn get_entry_by_inode(&self, inode: u64) -> Option<TreeEntry> {
> + let index = self.inner.index.lock();
> + index.get(&inode).cloned()
> + }
> +
> + /// Increment global version and synchronize root entry version
> + ///
> + /// CRITICAL: The C implementation uses root->version as the index version.
> + /// We must keep the root entry's version synchronized with the global version counter
> + /// to ensure C nodes can verify the index after applying updates.
> + ///
> + /// This function acquires the index lock and database connection lock internally,
> + /// so it must NOT be called while holding either lock.
We could use a single "write guard" mutex taken by every mutating
operation to avoid consistency races between the version bump and the
entry change. As far as I can see, C does exactly that, and it would
sidestep these ordering issues entirely.
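Rough shape of what I mean (names hypothetical; the u64 here just
stands in for the shared version/index/tree state):

```rust
use std::sync::Mutex;

/// Sketch: one coarse write lock serializing all mutating operations,
/// so the version bump and the entry change can never interleave with
/// another writer.
struct Db {
    write_lock: Mutex<u64>,
}

impl Db {
    fn mutate(&self) -> u64 {
        let mut version = self.write_lock.lock().unwrap();
        *version += 1; // bump and change happen under the same guard
        *version
    }
}
```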
> + fn increment_version(&self) -> Result<u64> {
> + let new_version = self.inner.version.fetch_add(1, Ordering::SeqCst) + 1;
> +
> + // Update root entry version in memory and database
> + {
> + let mut index = self.inner.index.lock();
> + if let Some(root_entry) = index.get_mut(&self.inner.root_inode) {
> + root_entry.version = new_version;
> + }
> + drop(index); // Release lock before DB access
> + }
> +
> + // Persist to database (outside index lock to avoid deadlock)
> + {
> + let conn = self.inner.conn.lock();
> + conn.execute(
> + "UPDATE tree SET version = ? WHERE inode = ?",
> + rusqlite::params![new_version as i64, self.inner.root_inode as i64],
> + )
> + .context("Failed to update root version in database")?;
> + }
> +
> + Ok(new_version)
> + }
Can we please centralize version bumps and __version__ updates?
Right now increment_version() updates the root version in memory
and in the DB separately from the actual entry mutation, while
other paths update __version__ differently (and sometimes not at
all).
It would be much safer if every mutation bumped the version,
updated __version__, and applied the entry change in the same
transaction, then updated the in-memory state.
For example, we could have a helper like this:
fn with_mutation<R>(&self, writer: u32, mtime: u32, f: impl
FnOnce(&Transaction<'_>, u64) -> Result<R>) -> Result<R>;
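A purely in-memory sketch of that shape (the real helper would pass a
rusqlite Transaction and commit at the end; all names here are
hypothetical):

```rust
use std::sync::Mutex;

struct Db {
    // (version, entries) mutated together under one lock, standing in
    // for a single SQLite transaction plus the in-memory update.
    state: Mutex<(u64, Vec<String>)>,
}

impl Db {
    fn with_mutation<R>(&self, f: impl FnOnce(&mut Vec<String>, u64) -> R) -> R {
        let mut guard = self.state.lock().unwrap();
        let (version, entries) = &mut *guard;
        *version += 1; // version bump + __version__ update would go here
        let v = *version;
        f(entries, v) // apply the entry change under the same guard
    }
}
```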
> +
> + /// Get the __version__ entry for sending updates to C nodes
> + ///
> + /// The __version__ entry (inode ROOT_INODE) stores root metadata in the database
> + /// but is not kept in the in-memory index. This method queries it directly
> + /// from the database to send as an UPDATE message to C nodes.
> + pub fn get_version_entry(&self) -> anyhow::Result<TreeEntry> {
> + let index = self.inner.index.lock();
> + let root_entry = index
> + .get(&self.inner.root_inode)
> + .ok_or_else(|| anyhow::anyhow!("Root entry not found"))?;
> +
> + // Create a __version__ entry matching C's format
> + // This is what C expects to receive as inode ROOT_INODE
> + Ok(TreeEntry {
> + inode: ROOT_INODE, // __version__ is always inode ROOT_INODE in database/wire format
> + parent: ROOT_INODE, // Root's parent is itself
> + version: root_entry.version,
> + writer: root_entry.writer,
> + mtime: root_entry.mtime,
> + size: 0,
> + entry_type: DT_REG,
> + name: VERSION_FILENAME.to_string(),
> + data: Vec::new(),
> + })
> + }
> +
> + pub fn lookup_path(&self, path: &str) -> Option<TreeEntry> {
> + let index = self.inner.index.lock();
> + let tree = self.inner.tree.lock();
Here we lock in the order index, then tree, but fn readdir() locks
tree, then index. We should at least enforce a strict lock ordering
across all methods, or collapse to a single mutex as mentioned above.
> +
> + if path.is_empty() || path == "/" || path == "." {
> + return index.get(&self.inner.root_inode).cloned();
> + }
> +
> + let parts: Vec<&str> = path.split('/').filter(|s| !s.is_empty()).collect();
> + let mut current_inode = self.inner.root_inode;
> +
> + for part in parts {
> + let children = tree.get(&current_inode)?;
> + current_inode = *children.get(part)?;
> + }
> +
> + index.get(&current_inode).cloned()
> + }
> +
> + /// Split a path into parent directory and basename
> + ///
> + /// Paths should be absolute (starting with `/`). While the implementation
> + /// handles relative paths for C compatibility, all new code should use absolute paths.
> + fn split_path(path: &str) -> (String, String) {
> + debug_assert!(
> + path.starts_with('/') || path.is_empty(),
> + "Path should be absolute (start with /), got: {path}"
> + );
debug_assert! only validates this in debug builds; release builds
silently accept relative paths. Please turn this into a real check,
e.g. return an error for non-absolute paths.
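E.g. something like this (plain Result instead of anyhow, to keep the
sketch self-contained):

```rust
/// Sketch: reject relative paths in release builds too instead of
/// only debug_assert-ing; the split logic itself is unchanged.
fn split_path(path: &str) -> Result<(String, String), String> {
    if !path.is_empty() && !path.starts_with('/') {
        return Err(format!("path must be absolute, got: {path}"));
    }
    let path = path.trim_end_matches('/');
    match path.rfind('/') {
        Some(0) => Ok(("/".to_string(), path[1..].to_string())),
        Some(pos) => Ok((path[..pos].to_string(), path[pos + 1..].to_string())),
        None => Ok(("/".to_string(), path.to_string())),
    }
}
```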
> +
> + let path = path.trim_end_matches('/');
> +
> + if let Some(pos) = path.rfind('/') {
> + let dirname = if pos == 0 { "/" } else { &path[..pos] };
> + let basename = &path[pos + 1..];
> + (dirname.to_string(), basename.to_string())
> + } else {
> + ("/".to_string(), path.to_string())
> + }
> + }
> +
> + pub fn exists(&self, path: &str) -> Result<bool> {
> + Ok(self.lookup_path(path).is_some())
> + }
> +
> + pub fn read(&self, path: &str, offset: u64, size: usize) -> Result<Vec<u8>> {
> + let entry = self
> + .lookup_path(path)
> + .ok_or_else(|| anyhow::anyhow!("File not found: {path}"))?;
> +
> + if entry.is_dir() {
> + return Err(anyhow::anyhow!("Cannot read directory: {path}"));
> + }
> +
> + let offset = offset as usize;
> + if offset >= entry.data.len() {
> + return Ok(Vec::new());
> + }
> +
> + let end = std::cmp::min(offset + size, entry.data.len());
> + Ok(entry.data[offset..end].to_vec())
> + }
> +
> + /// Helper to update __version__ entry in database
> + ///
> + /// This is called for EVERY write operation to keep root metadata synchronized
> + /// (matching C behavior in database.c:275-278)
> + fn update_version_entry(
> + conn: &rusqlite::Connection,
> + version: u64,
> + writer: u32,
> + mtime: u32,
> + ) -> Result<()> {
> + conn.execute(
> + "UPDATE tree SET version = ?1, writer = ?2, mtime = ?3 WHERE inode = ?4",
> + params![version, writer, mtime, ROOT_INODE],
> + )?;
> + Ok(())
> + }
> +
> + /// Helper to update root entry in index
> + ///
> + /// Keeps the in-memory root entry synchronized with database __version__
> + fn update_root_metadata(
> + index: &mut HashMap<u64, TreeEntry>,
> + root_inode: u64,
> + version: u64,
> + writer: u32,
> + mtime: u32,
> + ) {
> + if let Some(root_entry) = index.get_mut(&root_inode) {
> + root_entry.version = version;
> + root_entry.writer = writer;
> + root_entry.mtime = mtime;
> + }
> + }
> +
> + pub fn create(&self, path: &str, mode: u32, mtime: u32) -> Result<()> {
> + if self.exists(path)? {
> + return Err(anyhow::anyhow!("File already exists: {path}"));
> + }
> +
> + let (parent_path, basename) = Self::split_path(path);
> +
> + let parent_entry = self
> + .lookup_path(&parent_path)
> + .ok_or_else(|| anyhow::anyhow!("Parent directory not found: {parent_path}"))?;
> +
> + if !parent_entry.is_dir() {
> + return Err(anyhow::anyhow!("Parent is not a directory: {parent_path}"));
> + }
> +
> + let entry_type = if mode & libc::S_IFDIR != 0 {
> + DT_DIR
> + } else {
> + DT_REG
> + };
> +
> + // CRITICAL: Increment version FIRST, then assign inode = version
> + // This matches C's behavior: te->inode = memdb->root->version
> + // (see src/pmxcfs/memdb.c:760)
> + let version = self.increment_version()?;
> + let new_inode = version; // Inode equals version number (C compatibility)
> +
> + let entry = TreeEntry {
> + inode: new_inode,
> + parent: parent_entry.inode,
> + version,
> + writer: 0, // Local operations always use writer 0 (matching C)
> + mtime,
> + size: 0,
> + entry_type,
> + name: basename.clone(),
> + data: Vec::new(),
> + };
> +
> + {
> + let conn = self.inner.conn.lock();
> + let tx = conn.unchecked_transaction()?;
> +
> + tx.execute(
> + "INSERT INTO tree (inode, parent, version, writer, mtime, type, name, data, size) VALUES (?1, ?2, ?3, ?4, ?5, ?6, ?7, ?8, ?9)",
> + params![
> + entry.inode,
> + entry.parent,
> + entry.version,
> + entry.writer,
> + entry.mtime,
> + entry.entry_type,
> + entry.name,
> + if entry.is_dir() { None::<Vec<u8>> } else { Some(entry.data.clone()) },
> + entry.size
> + ],
> + )?;
> +
> + // CRITICAL: Update __version__ entry (matching C in database.c:275-278)
> + Self::update_version_entry(&tx, entry.version, entry.writer, entry.mtime)?;
> +
> + tx.commit()?;
> + }
> +
> + {
> + let mut index = self.inner.index.lock();
> + let mut tree = self.inner.tree.lock();
> +
> + index.insert(new_inode, entry.clone());
> + Self::update_root_metadata(
> + &mut index,
> + self.inner.root_inode,
> + entry.version,
> + entry.writer,
> + entry.mtime,
> + );
> +
> + tree.entry(parent_entry.inode)
> + .or_default()
> + .insert(basename, new_inode);
> +
> + if entry.is_dir() {
> + tree.insert(new_inode, HashMap::new());
> + }
> + }
> +
> + // If this is a directory in priv/lock/, register it in the lock table
> + if entry.is_dir() && parent_path == LOCK_DIR_PATH {
> + let csum = entry.compute_checksum();
> + let _ = self.lock_expired(path, &csum);
> + tracing::debug!("Registered lock directory: {}", path);
> + }
> +
> + Ok(())
> + }
> +
> + pub fn write(
> + &self,
> + path: &str,
> + offset: u64,
> + mtime: u32,
> + data: &[u8],
> + truncate: bool,
> + ) -> Result<usize> {
> + let mut entry = self
> + .lookup_path(path)
> + .ok_or_else(|| anyhow::anyhow!("File not found: {path}"))?;
> +
> + if entry.is_dir() {
> + return Err(anyhow::anyhow!("Cannot write to directory: {path}"));
> + }
> +
> + // Truncate before writing if requested (matches C implementation behavior)
Note: on a truncating write, C preserves the prefix bytes (up to the
write offset), so clearing the whole buffer here diverges from C.
> + if truncate {
> + entry.data.clear();
> + }
> +
> + // Check size limit
> + let new_size = std::cmp::max(entry.data.len(), (offset as usize) + data.len());
I think we should use checked arithmetic here to avoid possible
overflows on 32-bit systems (offset as usize + data.len() can wrap).
We should also verify that
offset + data.len() <= MEMDB_MAX_FILE_SIZE
before resizing the buffer.
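Sketch of what I mean (write_end is a made-up helper name; the limit
value is the 1 MiB from the constants table):

```rust
const MEMDB_MAX_FILE_SIZE: usize = 1024 * 1024; // 1 MiB

/// Sketch: compute the write end with checked arithmetic so that
/// offset + len can neither wrap usize nor exceed the size limit.
fn write_end(offset: u64, len: usize) -> Result<usize, String> {
    let offset = usize::try_from(offset).map_err(|_| "offset too large".to_string())?;
    let end = offset
        .checked_add(len)
        .ok_or_else(|| "offset + len overflows".to_string())?;
    if end > MEMDB_MAX_FILE_SIZE {
        return Err(format!("write exceeds maximum file size {MEMDB_MAX_FILE_SIZE}"));
    }
    Ok(end)
}
```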
> +
> + if new_size > MEMDB_MAX_FILE_SIZE {
> + return Err(anyhow::anyhow!(
> + "File size exceeds maximum: {MEMDB_MAX_FILE_SIZE}"
> + ));
> + }
> +
> + // Extend if necessary
> + let offset = offset as usize;
> + if offset + data.len() > entry.data.len() {
> + entry.data.resize(offset + data.len(), 0);
> + }
> +
> + // Write data
> + entry.data[offset..offset + data.len()].copy_from_slice(data);
> + entry.size = entry.data.len();
> + entry.mtime = mtime;
> + entry.writer = 0; // Local operations always use writer 0 (matching C)
> +
> + // Increment version
> + let version = self.increment_version()?;
> + entry.version = version;
> +
> + // Update database
> + {
> + let conn = self.inner.conn.lock();
> + let tx = conn.unchecked_transaction()?;
> +
> + tx.execute(
> + "UPDATE tree SET version = ?1, writer = ?2, mtime = ?3, size = ?4, data = ?5 WHERE inode = ?6",
> + params![
> + entry.version,
> + entry.writer,
> + entry.mtime,
> + entry.size,
> + &entry.data,
> + entry.inode
> + ],
> + )?;
> +
> + // CRITICAL: Update __version__ entry (matching C in database.c:275-278)
> + Self::update_version_entry(&tx, entry.version, entry.writer, entry.mtime)?;
> +
> + tx.commit()?;
> + }
> +
> + // Update in-memory index
> + {
> + let mut index = self.inner.index.lock();
> + index.insert(entry.inode, entry.clone());
> + Self::update_root_metadata(
> + &mut index,
> + self.inner.root_inode,
> + entry.version,
> + entry.writer,
> + entry.mtime,
> + );
> + }
> +
> + Ok(data.len())
> + }
> +
> + /// Update modification time of a file or directory
> + ///
> + /// This implements the C version's `memdb_mtime` function (memdb.c:860-932)
> + /// with full lock protection semantics for directories in `priv/lock/`.
> + ///
> + /// # Lock Protection
> + ///
> + /// For lock directories (`priv/lock/*`), this function enforces:
> + /// 1. Only the same writer (node ID) can update the lock
> + /// 2. Only newer mtime values are accepted (to prevent replay attacks)
> + /// 3. Lock cache is refreshed after successful update
> + ///
> + /// # Arguments
> + ///
> + /// * `path` - Path to the file/directory
> + /// * `writer` - Writer ID (node ID in cluster)
> + /// * `mtime` - New modification time (seconds since UNIX epoch)
> + pub fn set_mtime(&self, path: &str, writer: u32, mtime: u32) -> Result<()> {
> + let mut entry = self
> + .lookup_path(path)
> + .ok_or_else(|| anyhow::anyhow!("File not found: {path}"))?;
> +
> + // Don't allow updating root
> + if entry.inode == self.inner.root_inode {
> + return Err(anyhow::anyhow!("Cannot update root directory"));
> + }
> +
> + // Check if this is a lock directory (matching C logic in memdb.c:882)
> + let (parent_path, _) = Self::split_path(path);
> + let is_lock = parent_path.trim_start_matches('/') == LOCK_DIR_PATH && entry.is_dir();
> +
> + if is_lock {
> + // Lock protection: Only allow newer mtime (C: memdb.c:886-889)
> + // This prevents replay attacks and ensures lock renewal works correctly
> + if mtime < entry.mtime {
> + tracing::warn!(
> + "Rejecting mtime update for lock '{}': {} < {} (locked)",
> + path,
> + mtime,
> + entry.mtime
> + );
> + return Err(anyhow::anyhow!(
> + "Cannot set older mtime on locked directory (dir is locked)"
> + ));
> + }
> +
> + // Lock protection: Only same writer can update (C: memdb.c:890-894)
> + // This prevents lock hijacking from other nodes
> + if entry.writer != writer {
> + tracing::warn!(
> + "Rejecting mtime update for lock '{}': writer {} != {} (wrong owner)",
> + path,
> + writer,
> + entry.writer
> + );
> + return Err(anyhow::anyhow!(
> + "Lock owned by different writer (cannot hijack lock)"
> + ));
> + }
> +
> + tracing::debug!(
> + "Updating lock directory: {} (mtime: {} -> {})",
> + path,
> + entry.mtime,
> + mtime
> + );
> + }
> +
> + // Increment version
> + let version = self.increment_version()?;
> +
> + // Update entry
> + entry.version = version;
> + entry.writer = writer;
> + entry.mtime = mtime;
> +
> + // Update database
> + {
> + let conn = self.inner.conn.lock();
> + conn.execute(
> + "UPDATE tree SET version = ?1, writer = ?2, mtime = ?3 WHERE inode = ?4",
> + params![entry.version, entry.writer, entry.mtime, entry.inode],
> + )?;
> + }
> +
> + // Update in-memory index
> + {
> + let mut index = self.inner.index.lock();
> + index.insert(entry.inode, entry.clone());
> + }
> +
> + // Refresh lock cache if this is a lock directory (C: memdb.c:924-929)
> + // Remove old entry and insert new one with updated checksum
> + if is_lock {
> + let mut locks = self.inner.locks.lock();
> + locks.remove(path);
> +
> + let csum = entry.compute_checksum();
> + let now = SystemTime::now()
> + .duration_since(UNIX_EPOCH)
> + .unwrap_or_default()
> + .as_secs();
> +
> + locks.insert(path.to_string(), LockInfo { ltime: now, csum });
> +
> + tracing::debug!("Refreshed lock cache for: {}", path);
> + }
> +
> + Ok(())
> + }
> +
> + pub fn readdir(&self, path: &str) -> Result<Vec<TreeEntry>> {
> + let entry = self
> + .lookup_path(path)
> + .ok_or_else(|| anyhow::anyhow!("Directory not found: {path}"))?;
> +
> + if !entry.is_dir() {
> + return Err(anyhow::anyhow!("Not a directory: {path}"));
> + }
> +
> + let tree = self.inner.tree.lock();
> + let index = self.inner.index.lock();
> +
> + let children = tree
> + .get(&entry.inode)
> + .ok_or_else(|| anyhow::anyhow!("Directory structure corrupted"))?;
> +
> + let mut entries = Vec::new();
> + for child_inode in children.values() {
> + if let Some(child) = index.get(child_inode) {
> + entries.push(child.clone());
> + }
> + }
> +
> + Ok(entries)
> + }
> +
> + pub fn delete(&self, path: &str) -> Result<()> {
> + let entry = self
> + .lookup_path(path)
> + .ok_or_else(|| anyhow::anyhow!("File not found: {path}"))?;
> +
> + // Don't allow deleting root
> + if entry.inode == self.inner.root_inode {
> + return Err(anyhow::anyhow!("Cannot delete root directory"));
> + }
> +
> + // If directory, check if empty
> + if entry.is_dir() {
> + let tree = self.inner.tree.lock();
> + if let Some(children) = tree.get(&entry.inode)
> + && !children.is_empty()
> + {
> + return Err(anyhow::anyhow!("Directory not empty: {path}"));
> + }
> + }
C's memdb_delete() increments the root version, but here we don't.
The __version__ entry needs to be updated as well.
> +
> + // Delete from database
> + {
> + let conn = self.inner.conn.lock();
> + conn.execute("DELETE FROM tree WHERE inode = ?1", params![entry.inode])?;
> + }
> +
> + // Update in-memory structures
> + {
> + let mut index = self.inner.index.lock();
> + let mut tree = self.inner.tree.lock();
> +
> + // Remove from index
> + index.remove(&entry.inode);
> +
> + // Remove from parent's children
> + if let Some(parent_children) = tree.get_mut(&entry.parent) {
> + parent_children.remove(&entry.name);
> + }
> +
> + // Remove from tree if directory
> + if entry.is_dir() {
> + tree.remove(&entry.inode);
> + }
> + }
> +
> + // Clean up lock cache for directories (matching C behavior in memdb.c:1235)
> + // This prevents stale lock cache entries and memory leaks
> + if entry.is_dir() {
> + let mut locks = self.inner.locks.lock();
> + locks.remove(path);
> + tracing::debug!("Removed lock cache entry for deleted directory: {}", path);
> + }
> +
> + Ok(())
> + }
> +
> + pub fn rename(&self, old_path: &str, new_path: &str) -> Result<()> {
> + let mut entry = self
> + .lookup_path(old_path)
> + .ok_or_else(|| anyhow::anyhow!("Source not found: {old_path}"))?;
> +
> + if entry.inode == self.inner.root_inode {
> + return Err(anyhow::anyhow!("Cannot rename root directory"));
> + }
> +
> + if self.exists(new_path)? {
> + return Err(anyhow::anyhow!("Destination already exists: {new_path}"));
> + }
> +
> + let (new_parent_path, new_basename) = Self::split_path(new_path);
> +
> + let new_parent_entry = self
> + .lookup_path(&new_parent_path)
> + .ok_or_else(|| anyhow::anyhow!("New parent directory not found: {new_parent_path}"))?;
> +
> + if !new_parent_entry.is_dir() {
> + return Err(anyhow::anyhow!(
> + "New parent is not a directory: {new_parent_path}"
> + ));
> + }
> +
> + let old_parent = entry.parent;
> + let old_name = entry.name.clone();
> +
> + entry.parent = new_parent_entry.inode;
> + entry.name = new_basename.clone();
> +
> + let version = self.increment_version()?;
> + entry.version = version;
> +
> + // Update database
> + {
> + let conn = self.inner.conn.lock();
> + let tx = conn.unchecked_transaction()?;
> +
> + tx.execute(
> + "UPDATE tree SET parent = ?1, name = ?2, version = ?3 WHERE inode = ?4",
> + params![entry.parent, entry.name, entry.version, entry.inode],
> + )?;
> +
> + // CRITICAL: Update __version__ entry (matching C in database.c:275-278)
> + Self::update_version_entry(&tx, entry.version, entry.writer, entry.mtime)?;
> +
> + tx.commit()?;
> + }
> +
> + {
> + let mut index = self.inner.index.lock();
> + let mut tree = self.inner.tree.lock();
> +
> + index.insert(entry.inode, entry.clone());
> + Self::update_root_metadata(
> + &mut index,
> + self.inner.root_inode,
> + entry.version,
> + entry.writer,
> + entry.mtime,
> + );
> +
> + if let Some(old_parent_children) = tree.get_mut(&old_parent) {
> + old_parent_children.remove(&old_name);
> + }
> +
> + tree.entry(new_parent_entry.inode)
> + .or_default()
> + .insert(new_basename, entry.inode);
> + }
> +
> + Ok(())
> + }
> +
> + pub fn get_all_entries(&self) -> Result<Vec<TreeEntry>> {
> + let index = self.inner.index.lock();
> + let entries: Vec<TreeEntry> = index.values().cloned().collect();
> + Ok(entries)
> + }
> +
> + pub fn get_version(&self) -> u64 {
> + self.inner.version.load(Ordering::SeqCst)
> + }
> +
> + /// Replace all entries (for full state synchronization)
> + pub fn replace_all_entries(&self, entries: Vec<TreeEntry>) -> Result<()> {
> + tracing::info!(
> + "Replacing all database entries with {} new entries",
> + entries.len()
> + );
> +
> + let conn = self.inner.conn.lock();
> + let tx = conn.unchecked_transaction()?;
> +
> + tx.execute("DELETE FROM tree", [])?;
Here we delete all entries, including the root one ..
> +
> + let max_version = entries.iter().map(|e| e.version).max().unwrap_or(0);
> +
> + for entry in &entries {
> + tx.execute(
> + "INSERT INTO tree (inode, parent, version, writer, mtime, type, name, data, size) VALUES (?1, ?2, ?3, ?4, ?5, ?6, ?7, ?8, ?9)",
> + params![
> + entry.inode,
> + entry.parent,
> + entry.version,
> + entry.writer,
> + entry.mtime,
> + entry.entry_type,
> + entry.name,
> + if entry.is_dir() { None::<Vec<u8>> } else { Some(entry.data.clone()) },
> + entry.size
> + ],
> + )?;
> + }
.. but if the Vec<TreeEntry> contains the in-memory root format instead
of the DB format, the database can end up corrupted: on restart,
load_from_db() will ignore the malformed root entry and reset the
version to 1. We need to handle the root case explicitly.
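Roughly (in-memory sketch; Row stands in for the DB row with only the
relevant fields, and to_db_rows is a hypothetical name):

```rust
const ROOT_INODE: u64 = 0;
const VERSION_FILENAME: &str = "__version__";

#[derive(Clone, Debug, PartialEq)]
struct Row { inode: u64, name: String }

/// Sketch: translate the in-memory root entry (empty name, DT_DIR)
/// back into the on-disk __version__ metadata row before inserting.
fn to_db_rows(entries: &[Row]) -> Vec<Row> {
    entries
        .iter()
        .map(|e| {
            if e.inode == ROOT_INODE {
                Row { inode: ROOT_INODE, name: VERSION_FILENAME.to_string() }
            } else {
                e.clone()
            }
        })
        .collect()
}
```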
> +
> + tx.commit()?;
> + drop(conn);
> +
> + let mut index = self.inner.index.lock();
> + let mut tree = self.inner.tree.lock();
> +
> + index.clear();
> + tree.clear();
> +
> + for entry in entries {
> + tree.entry(entry.parent)
> + .or_default()
> + .insert(entry.name.clone(), entry.inode);
> +
> + if entry.is_dir() {
> + tree.entry(entry.inode).or_default();
> + }
> +
> + index.insert(entry.inode, entry);
> + }
> +
> + self.inner.version.store(max_version, Ordering::SeqCst);
> +
> + tracing::info!(
> + "Database state replaced successfully, version now: {}",
> + max_version
> + );
> + Ok(())
> + }
> +
> + /// Apply a single TreeEntry during incremental synchronization
> + ///
> + /// This is used when receiving Update messages from the leader.
> + /// It directly inserts or updates the entry in the database without
> + /// going through the path-based API.
> + pub fn apply_tree_entry(&self, entry: TreeEntry) -> Result<()> {
> + tracing::debug!(
> + "Applying TreeEntry: inode={}, parent={}, name='{}', version={}",
> + entry.inode,
> + entry.parent,
> + entry.name,
> + entry.version
> + );
> +
> + // Begin transaction for atomicity
> + let conn = self.inner.conn.lock();
> + let tx = conn.unchecked_transaction()?;
> +
> + // Handle root inode specially (inode 0 is __version__)
> + let db_name = if entry.inode == self.inner.root_inode {
> + VERSION_FILENAME
> + } else {
> + entry.name.as_str()
> + };
> +
> + // Insert or replace the entry in database
> + tx.execute(
> + "INSERT OR REPLACE INTO tree (inode, parent, version, writer, mtime, type, name, data, size) VALUES (?1, ?2, ?3, ?4, ?5, ?6, ?7, ?8, ?9)",
> + params![
> + entry.inode,
> + entry.parent,
> + entry.version,
> + entry.writer,
> + entry.mtime,
> + entry.entry_type,
> + db_name,
> + if entry.is_dir() { None::<Vec<u8>> } else { Some(entry.data.clone()) },
> + entry.size
> + ],
> + )?;
> +
> + // CRITICAL: Update __version__ entry with the same metadata (matching C in database.c:275-278)
> + // Only do this if we're not already writing __version__ itself
> + if entry.inode != ROOT_INODE {
> + Self::update_version_entry(&tx, entry.version, entry.writer, entry.mtime)?;
> + }
> +
> + tx.commit()?;
> + drop(conn);
> +
> + // Update in-memory structures
> + let mut index = self.inner.index.lock();
> + let mut tree = self.inner.tree.lock();
> +
> + // Check if this entry already exists
> + let old_entry = index.get(&entry.inode).cloned();
> +
> + // If entry exists with different parent or name, update tree structure
> + if let Some(old) = old_entry {
> + if old.parent != entry.parent || old.name != entry.name {
> + // Remove from old parent's children
> + if let Some(old_parent_children) = tree.get_mut(&old.parent) {
> + old_parent_children.remove(&old.name);
> + }
> +
> + // Add to new parent's children
> + tree.entry(entry.parent)
> + .or_default()
> + .insert(entry.name.clone(), entry.inode);
> + }
> + } else {
> + // New entry - add to parent's children
> + tree.entry(entry.parent)
> + .or_default()
> + .insert(entry.name.clone(), entry.inode);
> + }
> +
> + // If this is a directory, ensure it has an entry in the tree map
> + if entry.is_dir() {
> + tree.entry(entry.inode).or_default();
> + }
> +
> + // Update index
> + index.insert(entry.inode, entry.clone());
Incoming updates may include inode 0; this would overwrite the
in-memory root directory entry (DT_DIR) with a regular file entry.
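A possible guard, sketched with a minimal stand-in Entry type
(apply_to_index is a hypothetical name):

```rust
use std::collections::HashMap;

const ROOT_INODE: u64 = 0;

#[derive(Clone, Debug, PartialEq)]
struct Entry { inode: u64, is_dir: bool, version: u64 }

/// Sketch: only merge metadata into the existing root entry instead of
/// replacing it wholesale with the incoming (file-typed) wire entry.
fn apply_to_index(index: &mut HashMap<u64, Entry>, entry: Entry) {
    if entry.inode == ROOT_INODE {
        if let Some(root) = index.get_mut(&ROOT_INODE) {
            root.version = entry.version; // keep is_dir = true
        }
        return;
    }
    index.insert(entry.inode, entry);
}
```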
> +
> + // Update root entry's metadata to match __version__ (if we wrote a non-root entry)
> + if entry.inode != self.inner.root_inode {
> + Self::update_root_metadata(
> + &mut index,
> + self.inner.root_inode,
> + entry.version,
> + entry.writer,
> + entry.mtime,
> + );
> + tracing::debug!(
> + version = entry.version,
> + writer = entry.writer,
> + mtime = entry.mtime,
> + "Updated root entry metadata"
> + );
> + }
> +
> + // Update version counter if this entry has a higher version
> + self.inner
> + .version
> + .fetch_max(entry.version, Ordering::SeqCst);
> +
> + tracing::debug!("TreeEntry applied successfully");
> + Ok(())
> + }
> +
> + /// **TEST ONLY**: Manually set lock timestamp for testing expiration behavior
> + ///
> + /// This method is exposed for testing purposes only to simulate lock expiration
> + /// without waiting the full 120 seconds. Do not use in production code.
> + #[cfg(test)]
> + pub fn test_set_lock_timestamp(&self, path: &str, timestamp_secs: u64) {
> + let mut locks = self.inner.locks.lock();
> + if let Some(lock_info) = locks.get_mut(path) {
> + lock_info.ltime = timestamp_secs;
> + }
> + }
> +}
> +
> +// ============================================================================
> +// Trait Implementation for Dependency Injection
> +// ============================================================================
> +
> +impl crate::traits::MemDbOps for MemDb {
> + fn create(&self, path: &str, mode: u32, mtime: u32) -> Result<()> {
> + self.create(path, mode, mtime)
> + }
> +
> + fn read(&self, path: &str, offset: u64, size: usize) -> Result<Vec<u8>> {
> + self.read(path, offset, size)
> + }
> +
> + fn write(
> + &self,
> + path: &str,
> + offset: u64,
> + mtime: u32,
> + data: &[u8],
> + truncate: bool,
> + ) -> Result<usize> {
> + self.write(path, offset, mtime, data, truncate)
> + }
> +
> + fn delete(&self, path: &str) -> Result<()> {
> + self.delete(path)
> + }
> +
> + fn rename(&self, old_path: &str, new_path: &str) -> Result<()> {
> + self.rename(old_path, new_path)
> + }
> +
> + fn exists(&self, path: &str) -> Result<bool> {
> + self.exists(path)
> + }
> +
> + fn readdir(&self, path: &str) -> Result<Vec<crate::types::TreeEntry>> {
> + self.readdir(path)
> + }
> +
> + fn set_mtime(&self, path: &str, writer: u32, mtime: u32) -> Result<()> {
> + self.set_mtime(path, writer, mtime)
> + }
> +
> + fn lookup_path(&self, path: &str) -> Option<crate::types::TreeEntry> {
> + self.lookup_path(path)
> + }
> +
> + fn get_entry_by_inode(&self, inode: u64) -> Option<crate::types::TreeEntry> {
> + self.get_entry_by_inode(inode)
> + }
> +
> + fn acquire_lock(&self, path: &str, csum: &[u8; 32]) -> Result<()> {
> + self.acquire_lock(path, csum)
> + }
> +
> + fn release_lock(&self, path: &str, csum: &[u8; 32]) -> Result<()> {
> + self.release_lock(path, csum)
> + }
> +
> + fn is_locked(&self, path: &str) -> bool {
> + self.is_locked(path)
> + }
> +
> + fn lock_expired(&self, path: &str, csum: &[u8; 32]) -> bool {
> + self.lock_expired(path, csum)
> + }
> +
> + fn get_version(&self) -> u64 {
> + self.get_version()
> + }
> +
> + fn get_all_entries(&self) -> Result<Vec<crate::types::TreeEntry>> {
> + self.get_all_entries()
> + }
> +
> + fn replace_all_entries(&self, entries: Vec<crate::types::TreeEntry>) -> Result<()> {
> + self.replace_all_entries(entries)
> + }
> +
> + fn apply_tree_entry(&self, entry: crate::types::TreeEntry) -> Result<()> {
> + self.apply_tree_entry(entry)
> + }
> +
> + fn encode_database(&self) -> Result<Vec<u8>> {
> + self.encode_database()
> + }
> +
> + fn compute_database_checksum(&self) -> Result<[u8; 32]> {
> + self.compute_database_checksum()
> + }
> +}
> +
[..]
> diff --git a/src/pmxcfs-rs/pmxcfs-memdb/src/index.rs b/src/pmxcfs-rs/pmxcfs-memdb/src/index.rs
> new file mode 100644
> index 00000000..5bf9c102
> --- /dev/null
> +++ b/src/pmxcfs-rs/pmxcfs-memdb/src/index.rs
> @@ -0,0 +1,814 @@
> +/// MemDB Index structures for C-compatible state synchronization
> +///
> +/// This module implements the memdb_index_t format used by the C implementation
> +/// for efficient state comparison during cluster synchronization.
> +use anyhow::Result;
> +use sha2::{Digest, Sha256};
> +
> +/// Index entry matching C's memdb_index_extry_t
> +///
> +/// Wire format (40 bytes):
> +/// ```c
> +/// typedef struct {
> +/// guint64 inode; // 8 bytes
> +/// char digest[32]; // 32 bytes (SHA256)
> +/// } memdb_index_extry_t;
> +/// ```
> +#[derive(Debug, Clone, PartialEq, Eq)]
> +pub struct IndexEntry {
> + pub inode: u64,
> + pub digest: [u8; 32],
> +}
> +
> +impl IndexEntry {
> + pub fn serialize(&self) -> Vec<u8> {
> + let mut data = Vec::with_capacity(40);
> + data.extend_from_slice(&self.inode.to_le_bytes());
> + data.extend_from_slice(&self.digest);
> + data
> + }
> +
> + pub fn deserialize(data: &[u8]) -> Result<Self> {
> + if data.len() < 40 {
> + anyhow::bail!("IndexEntry too short: {} bytes (need 40)", data.len());
> + }
> +
> + let inode = u64::from_le_bytes(data[0..8].try_into().unwrap());
> + let mut digest = [0u8; 32];
> + digest.copy_from_slice(&data[8..40]);
> +
> + Ok(Self { inode, digest })
> + }
> +}
> +
> +/// MemDB index matching C's memdb_index_t
> +///
> +/// Wire format header (24 bytes) + entries:
This should be 32 bytes (8 + 8 + 4 + 4 + 4 + 4), not 24. Please also
fix the `bytes` field comment below ("24 + size * 40") and the
corresponding reference in the README.
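For reference, summing the header field widths gives 32, consistent
with the serialize()/deserialize() code further down:

```rust
// Header field widths from the C struct (memdb_index_t):
// version (u64) + last_inode (u64) + writer (u32) + mtime (u32)
// + size (u32) + bytes (u32)
const HEADER_BYTES: u32 = 8 + 8 + 4 + 4 + 4 + 4;
const ENTRY_BYTES: u32 = 8 + 32; // inode (u64) + digest ([u8; 32])

fn index_total_bytes(num_entries: u32) -> u32 {
    HEADER_BYTES + num_entries * ENTRY_BYTES
}

fn main() {
    assert_eq!(HEADER_BYTES, 32);
    assert_eq!(ENTRY_BYTES, 40);
    // An index with 3 entries occupies 32 + 3 * 40 = 152 bytes.
    assert_eq!(index_total_bytes(3), 152);
}
```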
> +/// ```c
> +/// typedef struct {
> +/// guint64 version; // 8 bytes
> +/// guint64 last_inode; // 8 bytes
> +/// guint32 writer; // 4 bytes
> +/// guint32 mtime; // 4 bytes
> +/// guint32 size; // 4 bytes (number of entries)
> +/// guint32 bytes; // 4 bytes (total bytes allocated)
> +/// memdb_index_extry_t entries[]; // variable length
> +/// } memdb_index_t;
> +/// ```
> +#[derive(Debug, Clone, PartialEq, Eq)]
> +pub struct MemDbIndex {
> + pub version: u64,
> + pub last_inode: u64,
> + pub writer: u32,
> + pub mtime: u32,
> + pub size: u32, // number of entries
> + pub bytes: u32, // total bytes (24 + size * 40)
> + pub entries: Vec<IndexEntry>,
> +}
> +
> +impl MemDbIndex {
> + /// Create a new index from entries
> + ///
> + /// Entries are automatically sorted by inode for efficient comparison
> + /// and to match C implementation behavior.
> + pub fn new(
> + version: u64,
> + last_inode: u64,
> + writer: u32,
> + mtime: u32,
> + mut entries: Vec<IndexEntry>,
> + ) -> Self {
> + // Sort entries by inode (matching C implementation)
> + entries.sort_by_key(|e| e.inode);
> +
> + let size = entries.len() as u32;
> + let bytes = 32 + size * 40; // header (32) + entries
> +
> + Self {
> + version,
> + last_inode,
> + writer,
> + mtime,
> + size,
> + bytes,
> + entries,
> + }
> + }
> +
> + /// Serialize to C-compatible wire format
> + pub fn serialize(&self) -> Vec<u8> {
> + let mut data = Vec::with_capacity(self.bytes as usize);
> +
> + // Header (32 bytes)
> + data.extend_from_slice(&self.version.to_le_bytes());
> + data.extend_from_slice(&self.last_inode.to_le_bytes());
> + data.extend_from_slice(&self.writer.to_le_bytes());
> + data.extend_from_slice(&self.mtime.to_le_bytes());
> + data.extend_from_slice(&self.size.to_le_bytes());
> + data.extend_from_slice(&self.bytes.to_le_bytes());
> +
> + // Entries (40 bytes each)
> + for entry in &self.entries {
> + data.extend_from_slice(&entry.serialize());
> + }
> +
> + data
> + }
> +
> + /// Deserialize from C-compatible wire format
> + pub fn deserialize(data: &[u8]) -> Result<Self> {
> + if data.len() < 32 {
> + anyhow::bail!(
> + "MemDbIndex too short: {} bytes (need at least 32)",
> + data.len()
> + );
> + }
> +
> + // Parse header
> + let version = u64::from_le_bytes(data[0..8].try_into().unwrap());
> + let last_inode = u64::from_le_bytes(data[8..16].try_into().unwrap());
> + let writer = u32::from_le_bytes(data[16..20].try_into().unwrap());
> + let mtime = u32::from_le_bytes(data[20..24].try_into().unwrap());
> + let size = u32::from_le_bytes(data[24..28].try_into().unwrap());
> + let bytes = u32::from_le_bytes(data[28..32].try_into().unwrap());
> +
> + // Validate size
> + let expected_bytes = 32 + size * 40;
> + if bytes != expected_bytes {
> + anyhow::bail!("MemDbIndex bytes mismatch: got {bytes}, expected {expected_bytes}");
> + }
> +
> + if data.len() < bytes as usize {
> + anyhow::bail!(
> + "MemDbIndex data too short: {} bytes (need {})",
> + data.len(),
> + bytes
> + );
> + }
> +
> + // Parse entries
> + let mut entries = Vec::with_capacity(size as usize);
> + let mut offset = 32;
> + for _ in 0..size {
> + let entry = IndexEntry::deserialize(&data[offset..offset + 40])?;
> + entries.push(entry);
> + offset += 40;
> + }
> +
> + Ok(Self {
> + version,
> + last_inode,
> + writer,
> + mtime,
> + size,
> + bytes,
> + entries,
> + })
> + }
> +
> + /// Compute SHA256 digest of a tree entry for the index
> + ///
> + /// Matches C's memdb_encode_index() digest computation (memdb.c:1497-1507)
> + /// CRITICAL: Order and fields must match exactly:
> + /// 1. version, 2. writer, 3. mtime, 4. size, 5. type, 6. parent, 7. name, 8. data
> + ///
> + /// NOTE: inode is NOT included in the digest (only used as the index key)
> + #[allow(clippy::too_many_arguments)]
> + pub fn compute_entry_digest(
> + _inode: u64, // Not included in digest, only for signature compatibility
> + parent: u64,
> + version: u64,
> + writer: u32,
> + mtime: u32,
> + size: usize,
> + entry_type: u8,
> + name: &str,
> + data: &[u8],
> + ) -> [u8; 32] {
> + let mut hasher = Sha256::new();
> +
> + // Hash entry metadata in C's exact order (memdb.c:1497-1503)
> + hasher.update(version.to_le_bytes());
> + hasher.update(writer.to_le_bytes());
> + hasher.update(mtime.to_le_bytes());
> + hasher.update((size as u32).to_le_bytes()); // C uses u32 for te->size
> + hasher.update([entry_type]);
> + hasher.update(parent.to_le_bytes());
> + hasher.update(name.as_bytes());
> +
> + // Hash data only for regular files with non-zero size (memdb.c:1505-1507)
> + if entry_type == 8 /* DT_REG */ && size > 0 {
> + hasher.update(data);
> + }
> +
> + hasher.finalize().into()
> + }
> +}
> +
> +/// Implement comparison for MemDbIndex
> +///
> +/// Matches C's dcdb_choose_leader_with_highest_index() logic:
> +/// - If same version, higher mtime wins
> +/// - If different version, higher version wins
> +impl PartialOrd for MemDbIndex {
> + fn partial_cmp(&self, other: &Self) -> Option<std::cmp::Ordering> {
> + Some(self.cmp(other))
> + }
> +}
> +
> +impl Ord for MemDbIndex {
> + fn cmp(&self, other: &Self) -> std::cmp::Ordering {
> + // First compare by version (higher version wins)
> + // Then by mtime (higher mtime wins) if versions are equal
> + self.version
> + .cmp(&other.version)
> + .then_with(|| self.mtime.cmp(&other.mtime))
> + }
> +}
> +
> +impl MemDbIndex {
> + /// Find entries that differ from another index
> + ///
> + /// Returns the set of inodes that need to be sent as updates.
> + /// Matches C's dcdb_create_and_send_updates() comparison logic.
> + pub fn find_differences(&self, other: &MemDbIndex) -> Vec<u64> {
> + let mut differences = Vec::new();
> +
> + // Walk through master index, comparing with slave
> + let mut j = 0; // slave position
> +
> + for i in 0..self.entries.len() {
> + let master_entry = &self.entries[i];
> + let inode = master_entry.inode;
> +
> + // Advance slave pointer to matching or higher inode
> + while j < other.entries.len() && other.entries[j].inode < inode {
> + j += 1;
> + }
> +
> + // Check if entries match
> + if j < other.entries.len() {
> + let slave_entry = &other.entries[j];
> + if slave_entry.inode == inode && slave_entry.digest == master_entry.digest {
> + // Entries match - skip
> + continue;
> + }
> + }
> +
> + // Entry differs or missing - needs update
> + differences.push(inode);
> + }
> +
> + differences
> + }
> +}
> +
> +#[cfg(test)]
[..]
> diff --git a/src/pmxcfs-rs/pmxcfs-memdb/src/lib.rs b/src/pmxcfs-rs/pmxcfs-memdb/src/lib.rs
> new file mode 100644
> index 00000000..f5c6d97a
> --- /dev/null
> +++ b/src/pmxcfs-rs/pmxcfs-memdb/src/lib.rs
> @@ -0,0 +1,26 @@
> +/// In-memory database with SQLite persistence
> +///
> +/// This module provides a cluster-synchronized in-memory database with SQLite persistence.
> +/// The implementation is organized into focused submodules:
> +///
> +/// - `types`: Type definitions and constants
> +/// - `database`: Core MemDb struct and CRUD operations
> +/// - `locks`: Resource locking functionality
> +/// - `sync`: State synchronization and serialization
> +/// - `index`: C-compatible memdb index structures for efficient state comparison
> +/// - `traits`: Trait abstractions for dependency injection and testing
> +mod database;
> +mod index;
> +mod locks;
> +mod sync;
> +mod traits;
> +mod types;
> +mod vmlist;
> +
> +// Re-export public types
> +pub use database::MemDb;
> +pub use index::{IndexEntry, MemDbIndex};
> +pub use locks::is_lock_path;
> +pub use traits::MemDbOps;
> +pub use types::{ROOT_INODE, TreeEntry};
> +pub use vmlist::recreate_vmlist;
> diff --git a/src/pmxcfs-rs/pmxcfs-memdb/src/locks.rs b/src/pmxcfs-rs/pmxcfs-memdb/src/locks.rs
> new file mode 100644
> index 00000000..6d797fd0
> --- /dev/null
> +++ b/src/pmxcfs-rs/pmxcfs-memdb/src/locks.rs
> @@ -0,0 +1,286 @@
> +/// Lock management for memdb
> +///
> +/// Locks in pmxcfs are implemented as directory entries stored in the database at
> +/// `priv/lock/<lockname>`. This ensures locks are:
> +/// 1. Persistent across restarts
> +/// 2. Synchronized across the cluster via DFSM
> +/// 3. Visible to both C and Rust nodes
> +///
> +/// The in-memory lock table is a cache rebuilt from the database on startup
> +/// and updated dynamically during runtime.
> +use anyhow::Result;
> +use std::time::{SystemTime, UNIX_EPOCH};
> +
> +use super::database::MemDb;
> +use super::types::{LOCK_DIR_PATH, LOCK_TIMEOUT, LockInfo};
> +
> +/// Check if a path is in the lock directory
> +///
> +/// Matches C's path_is_lockdir() function (cfs-utils.c:306)
> +/// Returns true if path is "{LOCK_DIR_PATH}/<something>" (with or without leading /)
> +pub fn is_lock_path(path: &str) -> bool {
> + let path = path.trim_start_matches('/');
> + let lock_prefix = format!("{LOCK_DIR_PATH}/");
> + path.starts_with(&lock_prefix) && path.len() > lock_prefix.len()
> +}
> +
> +impl MemDb {
> + /// Check if a lock has expired (with side effects matching C semantics)
> + ///
> + /// This function implements the same behavior as the C version (memdb.c:330-358):
> + /// - If no lock exists in cache: Reads from database, creates cache entry, returns `false`
> + /// - If lock exists but csum mismatches: Updates csum, resets timeout, logs critical error, returns `false`
> + /// - If lock exists, csum matches, and time > LOCK_TIMEOUT: Returns `true` (expired)
> + /// - Otherwise: Returns `false` (not expired)
> + ///
> + /// This function is used for both checking AND managing locks, matching C semantics.
> + ///
> + /// # Current Usage
> + /// - Called from `database::create()` when creating lock directories (matching C memdb.c:928)
> + /// - Called from FUSE utimens operation (pmxcfs/src/fuse/filesystem.rs:717) for mtime=0 unlock requests
> + /// - Called from DFSM unlock message handlers (pmxcfs/src/memdb_callbacks.rs:142,161)
> + ///
> + /// Note: DFSM broadcasting of unlock messages to cluster nodes is not yet fully implemented.
> + /// See TODOs in filesystem.rs:723 and memdb_callbacks.rs:154 for remaining work.
> + pub fn lock_expired(&self, path: &str, csum: &[u8; 32]) -> bool {
> + let mut locks = self.inner.locks.lock();
> + let now = SystemTime::now()
> + .duration_since(UNIX_EPOCH)
> + .unwrap_or_default()
> + .as_secs();
> +
> + match locks.get_mut(path) {
> + Some(lock_info) => {
> + // Lock exists in cache - check csum
> + if lock_info.csum != *csum {
> + // Wrong csum - update and reset timeout
> + lock_info.ltime = now;
> + lock_info.csum = *csum;
> + tracing::error!("Lock checksum mismatch for '{}' - resetting timeout", path);
> + return false;
> + }
> +
> + // Csum matches - check if expired
> + let elapsed = now - lock_info.ltime;
> + if elapsed > LOCK_TIMEOUT {
> + tracing::debug!(path, elapsed, "Lock expired");
> + return true; // Expired
> + }
> +
> + false // Not expired
> + }
> + None => {
> + // No lock in cache - create new cache entry
> + locks.insert(
> + path.to_string(),
> + LockInfo {
> + ltime: now,
> + csum: *csum,
> + },
> + );
> + tracing::debug!(path, "Created new lock cache entry");
> + false // Not expired (just created)
> + }
> + }
> + }
> +
> + /// Acquire a lock on a path
> + ///
> + /// This creates a directory entry in the database at `priv/lock/<lockname>`
> + /// and broadcasts the operation to the cluster via DFSM.
> + pub fn acquire_lock(&self, path: &str, csum: &[u8; 32]) -> Result<()> {
> + let now = SystemTime::now()
> + .duration_since(UNIX_EPOCH)
> + .unwrap_or_default()
> + .as_secs();
> +
> + let locks = self.inner.locks.lock();
> +
> + // Check if there's an existing valid lock in cache
> + if let Some(existing_lock) = locks.get(path) {
> + let lock_age = now - existing_lock.ltime;
> + if lock_age <= LOCK_TIMEOUT && existing_lock.csum != *csum {
> + return Err(anyhow::anyhow!("Lock already held by another process"));
> + }
> + }
> +
> + // Convert path like "/priv/lock/foo.lock" to just the lock name
> + let lock_dir_with_slash = format!("/{LOCK_DIR_PATH}/");
> + let lock_name = if let Some(name) = path.strip_prefix(&lock_dir_with_slash) {
> + name
> + } else {
> + path.strip_prefix('/').unwrap_or(path)
> + };
> +
> + let lock_path = format!("/{LOCK_DIR_PATH}/{lock_name}");
This lock path uses a leading slash, but update_locks() builds its
cache keys without one (format!("{}/{}", LOCK_DIR_PATH, entry.name)),
so lookups against the lock cache would not match. Please standardize
on paths without a leading slash and adjust the prefix-stripping logic
accordingly. We should also validate lock names to prevent path
traversal.
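A small normalization/validation helper could cover both points
(sketch only; the function name is hypothetical and the rejected-name
set may need extending):

```rust
// Hypothetical sketch: canonicalize lock paths to "priv/lock/<name>"
// (no leading slash) and reject names that could escape the directory.
const LOCK_DIR_PATH: &str = "priv/lock";

fn normalize_lock_path(path: &str) -> Result<String, String> {
    // Accept "/priv/lock/foo", "priv/lock/foo", or a bare "foo".
    let trimmed = path.trim_start_matches('/');
    let name = trimmed
        .strip_prefix(&format!("{LOCK_DIR_PATH}/"))
        .unwrap_or(trimmed);

    // Reject empty names and anything containing path separators or
    // dot components, which would allow traversal out of priv/lock.
    if name.is_empty() || name.contains('/') || name == "." || name == ".." {
        return Err(format!("invalid lock name: {name:?}"));
    }
    Ok(format!("{LOCK_DIR_PATH}/{name}"))
}

fn main() {
    assert_eq!(normalize_lock_path("/priv/lock/foo").unwrap(), "priv/lock/foo");
    assert_eq!(normalize_lock_path("priv/lock/foo").unwrap(), "priv/lock/foo");
    assert_eq!(normalize_lock_path("foo").unwrap(), "priv/lock/foo");
    assert!(normalize_lock_path("../etc/passwd").is_err());
    assert!(normalize_lock_path("/priv/lock/a/b").is_err());
}
```

With a single canonical form, both acquire_lock() and update_locks()
would produce identical cache keys.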
> +
> + // Release locks mutex before database operations to avoid deadlock
> + drop(locks);
> +
> + // Create or update lock directory in database
> + // First check if it exists
> + if self.exists(&lock_path)? {
> + // Lock directory exists - update its mtime to refresh
> + // In C this is implicit through the checksum, we'll update the entry
> + tracing::debug!("Refreshing existing lock directory: {}", lock_path);
> + // We don't need to do anything - the lock cache entry will be updated below
> + } else {
> + // Create lock directory in database
> + let mode = libc::S_IFDIR | 0o755;
> + let mtime = now as u32;
> +
> + // Ensure lock directory exists
> + let lock_dir_full = format!("/{LOCK_DIR_PATH}");
> + if !self.exists(&lock_dir_full)? {
> + self.create(&lock_dir_full, libc::S_IFDIR | 0o755, mtime)?;
> + }
> +
> + self.create(&lock_path, mode, mtime)?;
> + tracing::debug!("Created lock directory in database: {}", lock_path);
> + }
> +
> + // Update in-memory cache
> + let mut locks = self.inner.locks.lock();
> + locks.insert(
> + lock_path.clone(),
> + LockInfo {
> + ltime: now,
> + csum: *csum,
> + },
> + );
> +
> + tracing::debug!("Lock acquired on path: {}", lock_path);
> + Ok(())
> + }
> +
> + /// Release a lock on a path
> + ///
> + /// This deletes the directory entry from the database and broadcasts
> + /// the delete operation to the cluster via DFSM.
> + pub fn release_lock(&self, path: &str, csum: &[u8; 32]) -> Result<()> {
> + let locks = self.inner.locks.lock();
> +
> + if let Some(lock_info) = locks.get(path) {
> + // Only release if checksum matches
> + if lock_info.csum != *csum {
> + return Err(anyhow::anyhow!("Cannot release lock: checksum mismatch"));
> + }
> + } else {
> + return Err(anyhow::anyhow!("No lock found on path: {path}"));
> + }
> +
> + // Release locks mutex before database operations
> + drop(locks);
> +
> + // Delete lock directory from database
> + if self.exists(path)? {
> + self.delete(path)?;
> + tracing::debug!("Deleted lock directory from database: {}", path);
> + }
> +
> + // Remove from in-memory cache
> + let mut locks = self.inner.locks.lock();
> + locks.remove(path);
> +
> + tracing::debug!("Lock released on path: {}", path);
> + Ok(())
> + }
> +
> + /// Update lock cache by scanning the priv/lock directory in database
> + ///
> + /// This implements the C version's behavior (memdb.c:360-89):
> + /// - Scans the `priv/lock` directory in the database
> + /// - Rebuilds the entire lock hash table from database state
> + /// - Preserves `ltime` from old entries if csum matches
> + /// - Is called on database open and after synchronization
> + ///
> + /// This ensures locks are visible across C/Rust nodes and survive restarts.
> + pub(crate) fn update_locks(&self) {
> + // Check if lock directory exists
> + let _lock_dir = match self.lookup_path(LOCK_DIR_PATH) {
> + Some(entry) if entry.is_dir() => entry,
> + _ => {
> + tracing::debug!(
> + "{} directory does not exist, initializing empty lock table",
> + LOCK_DIR_PATH
> + );
> + self.inner.locks.lock().clear();
> + return;
> + }
> + };
> +
> + let now = SystemTime::now()
> + .duration_since(UNIX_EPOCH)
> + .unwrap_or_default()
> + .as_secs();
> +
> + // Get old locks table for preserving ltimes
> + let old_locks = {
> + let locks = self.inner.locks.lock();
> + locks.clone()
> + };
> +
> + // Build new locks table from database
> + let mut new_locks = std::collections::HashMap::new();
> +
> + // Read all lock directories
> + match self.readdir(LOCK_DIR_PATH) {
> + Ok(entries) => {
> + for entry in entries {
> + // Only process directories (locks are stored as directories)
> + if !entry.is_dir() {
> + continue;
> + }
> +
> + let lock_path = format!("{}/{}", LOCK_DIR_PATH, entry.name);
> + let csum = entry.compute_checksum();
> +
> + // Check if we have an old entry with matching checksum
> + let ltime = if let Some(old_lock) = old_locks.get(&lock_path) {
> + if old_lock.csum == csum {
> + // Checksum matches - preserve old ltime
> + old_lock.ltime
> + } else {
> + // Checksum changed - reset ltime
> + now
> + }
> + } else {
> + // New lock - set ltime to now
> + now
> + };
> +
> + new_locks.insert(lock_path.clone(), LockInfo { ltime, csum });
> + tracing::debug!("Loaded lock from database: {}", lock_path);
> + }
> + }
> + Err(e) => {
> + tracing::warn!("Failed to read {} directory: {}", LOCK_DIR_PATH, e);
> + return;
> + }
> + }
> +
> + // Replace lock table
> + *self.inner.locks.lock() = new_locks;
> +
> + tracing::debug!(
> + "Updated lock table from database: {} locks",
> + self.inner.locks.lock().len()
> + );
> + }
> +
> + /// Check if a path is locked
> + pub fn is_locked(&self, path: &str) -> bool {
> + let locks = self.inner.locks.lock();
> + if let Some(lock_info) = locks.get(path) {
> + let now = SystemTime::now()
> + .duration_since(UNIX_EPOCH)
> + .unwrap_or_default()
> + .as_secs();
> +
> + // Check if lock is still valid (not expired)
> + (now - lock_info.ltime) <= LOCK_TIMEOUT
> + } else {
> + false
> + }
> + }
> +}
> diff --git a/src/pmxcfs-rs/pmxcfs-memdb/src/sync.rs b/src/pmxcfs-rs/pmxcfs-memdb/src/sync.rs
> new file mode 100644
> index 00000000..719a2cf0
> --- /dev/null
> +++ b/src/pmxcfs-rs/pmxcfs-memdb/src/sync.rs
> @@ -0,0 +1,249 @@
> +/// State synchronization and serialization for memdb
> +use anyhow::{Context, Result};
> +use sha2::{Digest, Sha256};
> +use std::sync::atomic::Ordering;
> +
> +use super::database::MemDb;
> +use super::index::{IndexEntry, MemDbIndex};
> +use super::types::TreeEntry;
> +
> +impl MemDb {
> + /// Encode database index for C-compatible state synchronization
> + ///
> + /// This creates a memdb_index_t structure matching the C implementation,
> + /// containing metadata and a sorted list of (inode, digest) pairs.
> + /// This is sent as the "state" during DFSM synchronization.
> + pub fn encode_index(&self) -> Result<MemDbIndex> {
> + let mut index = self.inner.index.lock();
> +
> + // CRITICAL: Synchronize root entry version with global version counter
> + // The C implementation uses root->version as the index version,
> + // so we must ensure they match before encoding.
> + let global_version = self.inner.version.load(Ordering::SeqCst);
> +
> + let root_inode = self.inner.root_inode;
> + let mut root_version_updated = false;
> + if let Some(root_entry) = index.get_mut(&root_inode) {
> + if root_entry.version != global_version {
> + root_entry.version = global_version;
> + root_version_updated = true;
> + }
> + } else {
> + anyhow::bail!("Root entry not found in index");
> + }
> +
> + // If root version was updated, persist to database
> + if root_version_updated {
> + let conn = self.inner.conn.lock();
> + let root_entry = index.get(&root_inode).unwrap(); // Safe: we just checked it exists
> +
> + conn.execute(
> + "UPDATE entries SET version = ? WHERE inode = ?",
Please revisit the table name here: shouldn't this UPDATE refer to the
`tree` table rather than `entries`?
> + rusqlite::params![root_entry.version as i64, root_inode as i64],
> + )
> + .context("Failed to update root version in database")?;
> +
> + drop(conn);
> + }
> +
> + // Collect ALL entries including root, sorted by inode
> + let mut entries: Vec<&TreeEntry> = index.values().collect();
> + entries.sort_by_key(|e| e.inode);
> +
> + tracing::info!("=== encode_index: Encoding {} entries ===", entries.len());
> + for te in entries.iter() {
> + tracing::info!(
> + " Entry: inode={:#018x}, parent={:#018x}, name='{}', type={}, version={}, writer={}, mtime={}, size={}",
> + te.inode, te.parent, te.name, te.entry_type, te.version, te.writer, te.mtime, te.size
> + );
> + }
> +
> + // Create index entries with digests
> + let index_entries: Vec<IndexEntry> = entries
> + .iter()
> + .map(|te| {
> + let digest = MemDbIndex::compute_entry_digest(
> + te.inode,
> + te.parent,
> + te.version,
> + te.writer,
> + te.mtime,
> + te.size,
> + te.entry_type,
> + &te.name,
> + &te.data,
> + );
> + tracing::debug!(
> + " Digest for inode {:#018x}: {:02x}{:02x}{:02x}{:02x}...{:02x}{:02x}{:02x}{:02x}",
> + te.inode,
> + digest[0], digest[1], digest[2], digest[3],
> + digest[28], digest[29], digest[30], digest[31]
> + );
> + IndexEntry { inode: te.inode, digest }
> + })
> + .collect();
> +
> + // Get root entry for mtime and writer_id (now updated with global version)
> + let root_entry = index
> + .get(&self.inner.root_inode)
> + .ok_or_else(|| anyhow::anyhow!("Root entry not found in index"))?;
> +
> + let version = global_version; // Already synchronized above
> + let last_inode = index.keys().max().copied().unwrap_or(1);
> + let writer = root_entry.writer;
> + let mtime = root_entry.mtime;
> +
> + drop(index);
> +
> + Ok(MemDbIndex::new(
> + version,
> + last_inode,
> + writer,
> + mtime,
> + index_entries,
> + ))
> + }
> +
> + /// Encode the entire database state into a byte array
> + /// Matches C version's memdb_encode() function
> + pub fn encode_database(&self) -> Result<Vec<u8>> {
> + let index = self.inner.index.lock();
> +
> + // Collect all entries sorted by inode for consistent ordering
> + // This matches the C implementation's memdb_tree_compare function
> + let mut entries: Vec<&TreeEntry> = index.values().collect();
> + entries.sort_by_key(|e| e.inode);
> +
> + // Log all entries for debugging
> + tracing::info!(
> + "Encoding database: {} entries",
> + entries.len()
> + );
> + for entry in entries.iter() {
> + tracing::info!(
> + " Entry: inode={}, name='{}', parent={}, type={}, size={}, version={}",
> + entry.inode,
> + entry.name,
> + entry.parent,
> + entry.entry_type,
> + entry.size,
> + entry.version
> + );
> + }
> +
> + // Serialize using bincode (compatible with C struct layout)
> + let encoded = bincode::serialize(&entries)
> + .map_err(|e| anyhow::anyhow!("Failed to encode database: {e}"))?;
> +
> + tracing::debug!(
> + "Encoded database: {} entries, {} bytes",
> + entries.len(),
> + encoded.len()
> + );
> +
> + Ok(encoded)
> + }
> +
> + /// Compute checksum of the entire database state
> + /// Used for DFSM state verification
> + pub fn compute_database_checksum(&self) -> Result<[u8; 32]> {
> + let encoded = self.encode_database()?;
This currently serializes the entries via bincode and then hashes the
result, while C's memdb_compute_checksum hashes the entry fields
directly. The resulting checksums will therefore not match the C
implementation's; this does not look C-compatible.
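To illustrate, the checksum input would need to be the entries' fields
concatenated in a fixed order rather than a bincode blob. A stdlib-only
sketch of such a per-entry byte layout (assuming the same field order
as the index digest above; the real code would feed this buffer into
SHA-256, and the exact field set must be checked against C's
memdb_compute_checksum):

```rust
// Sketch: assemble the bytes an entry contributes to the checksum, in
// a fixed field order (version, writer, mtime, size, type, parent,
// name, then data for non-empty regular files). Hashing is omitted.
const DT_REG: u8 = 8;

#[allow(clippy::too_many_arguments)]
fn checksum_input(
    version: u64,
    writer: u32,
    mtime: u32,
    size: u32,
    entry_type: u8,
    parent: u64,
    name: &str,
    data: &[u8],
) -> Vec<u8> {
    let mut buf = Vec::new();
    buf.extend_from_slice(&version.to_le_bytes());
    buf.extend_from_slice(&writer.to_le_bytes());
    buf.extend_from_slice(&mtime.to_le_bytes());
    buf.extend_from_slice(&size.to_le_bytes());
    buf.push(entry_type);
    buf.extend_from_slice(&parent.to_le_bytes());
    buf.extend_from_slice(name.as_bytes());
    if entry_type == DT_REG && size > 0 {
        buf.extend_from_slice(data);
    }
    buf
}

fn main() {
    // 8 + 4 + 4 + 4 + 1 + 8 + 1 (name) + 3 (data) = 33 bytes
    let file = checksum_input(7, 1, 1000, 3, DT_REG, 0, "f", b"abc");
    assert_eq!(file.len(), 33);
    // Directories never contribute data bytes.
    let dir = checksum_input(7, 1, 1000, 0, 4, 0, "d", &[]);
    assert_eq!(dir.len(), 30);
}
```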
> +
> + let mut hasher = Sha256::new();
> + hasher.update(&encoded);
> +
> + Ok(hasher.finalize().into())
> + }
> +
> + /// Decode database state from a byte array
> + /// Used during DFSM state synchronization
> + pub fn decode_database(data: &[u8]) -> Result<Vec<TreeEntry>> {
> + let entries: Vec<TreeEntry> = bincode::deserialize(data)
> + .map_err(|e| anyhow::anyhow!("Failed to decode database: {e}"))?;
> +
> + tracing::debug!("Decoded database: {} entries", entries.len());
> +
> + Ok(entries)
> + }
> +
> + /// Synchronize corosync configuration from MemDb to filesystem
> + ///
> + /// Reads corosync.conf from memdb and writes to system file if changed.
> + /// This syncs the cluster configuration from the distributed database
> + /// to the local filesystem.
> + ///
> + /// # Arguments
> + /// * `system_path` - Path to write the corosync.conf file (default: /etc/corosync/corosync.conf)
> + /// * `force` - Force write even if unchanged
> + pub fn sync_corosync_conf(&self, system_path: Option<&str>, force: bool) -> Result<()> {
> + let system_path = system_path.unwrap_or("/etc/corosync/corosync.conf");
> + tracing::info!(
> + "Syncing corosync configuration to {} (force={})",
> + system_path,
> + force
> + );
> +
> + // Path in memdb for corosync.conf
> + let memdb_path = "/corosync.conf";
> +
> + // Try to read from memdb
> + let memdb_data = match self.lookup_path(memdb_path) {
> + Some(entry) if entry.is_file() => entry.data,
> + Some(_) => {
> + return Err(anyhow::anyhow!("{memdb_path} exists but is not a file"));
> + }
> + None => {
> + tracing::debug!("{} not found in memdb, nothing to sync", memdb_path);
> + return Ok(());
> + }
> + };
> +
> + // Read current system file if it exists
> + let system_data = std::fs::read(system_path).ok();
> +
> + // Determine if we need to write
> + let should_write = force || system_data.as_ref() != Some(&memdb_data);
> +
> + if !should_write {
> + tracing::debug!("Corosync configuration unchanged, skipping write");
> + return Ok(());
> + }
> +
> + // SAFETY CHECK: Writing to /etc requires root permissions
> + // We'll attempt the write but log clearly if it fails
> + tracing::info!(
> + "Corosync configuration changed (size: {} bytes), updating {}",
> + memdb_data.len(),
> + system_path
> + );
> +
> + // Basic validation: check if it looks like a valid corosync config
> + let config_str =
> + std::str::from_utf8(&memdb_data).context("Corosync config is not valid UTF-8")?;
> +
> + if !config_str.contains("totem") {
> + tracing::warn!("Corosync config validation: missing 'totem' section");
> + }
> + if !config_str.contains("nodelist") {
> + tracing::warn!("Corosync config validation: missing 'nodelist' section");
> + }
> +
> + // Attempt to write (will fail if not root or no permissions)
> + match std::fs::write(system_path, &memdb_data) {
> + Ok(()) => {
> + tracing::info!("Successfully updated {}", system_path);
> + Ok(())
> + }
> + Err(e) if e.kind() == std::io::ErrorKind::PermissionDenied => {
> + tracing::warn!(
> + "Permission denied writing {}: {}. Run as root to enable corosync sync.",
> + system_path,
> + e
> + );
> + // Don't return error - this is expected in non-root mode
> + Ok(())
> + }
> + Err(e) => Err(anyhow::anyhow!("Failed to write {system_path}: {e}")),
> + }
> + }
> +}
> diff --git a/src/pmxcfs-rs/pmxcfs-memdb/src/traits.rs b/src/pmxcfs-rs/pmxcfs-memdb/src/traits.rs
> new file mode 100644
> index 00000000..efe3ff36
> --- /dev/null
> +++ b/src/pmxcfs-rs/pmxcfs-memdb/src/traits.rs
> @@ -0,0 +1,101 @@
> +//! Traits for MemDb operations
> +//!
> +//! This module provides the `MemDbOps` trait which abstracts MemDb operations
> +//! for dependency injection and testing. Similar to `StatusOps` in pmxcfs-status.
> +
> +use crate::types::TreeEntry;
> +use anyhow::Result;
> +
> +/// Trait abstracting MemDb operations for dependency injection and mocking
> +///
> +/// This trait enables:
> +/// - Dependency injection of MemDb into components
> +/// - Testing with MockMemDb instead of real database
> +/// - Trait objects for runtime polymorphism
> +///
> +/// # Example
> +/// ```no_run
> +/// use pmxcfs_memdb::{MemDb, MemDbOps};
> +/// use std::sync::Arc;
> +///
> +/// fn use_database(db: Arc<dyn MemDbOps>) {
> +/// // Can work with real MemDb or MockMemDb
> +/// let exists = db.exists("/test").unwrap();
> +/// }
> +/// ```
> +pub trait MemDbOps: Send + Sync {
> + // ===== Basic File Operations =====
> +
> + /// Create a new file or directory
> + fn create(&self, path: &str, mode: u32, mtime: u32) -> Result<()>;
> +
> + /// Read data from a file
> + fn read(&self, path: &str, offset: u64, size: usize) -> Result<Vec<u8>>;
> +
> + /// Write data to a file
> + fn write(
> + &self,
> + path: &str,
> + offset: u64,
> + mtime: u32,
> + data: &[u8],
> + truncate: bool,
> + ) -> Result<usize>;
> +
> + /// Delete a file or directory
> + fn delete(&self, path: &str) -> Result<()>;
> +
> + /// Rename a file or directory
> + fn rename(&self, old_path: &str, new_path: &str) -> Result<()>;
> +
> + /// Check if a path exists
> + fn exists(&self, path: &str) -> Result<bool>;
> +
> + /// List directory contents
> + fn readdir(&self, path: &str) -> Result<Vec<TreeEntry>>;
> +
> + /// Set modification time
> + fn set_mtime(&self, path: &str, writer: u32, mtime: u32) -> Result<()>;
> +
> + // ===== Path Lookup =====
> +
> + /// Look up a path and return its entry
> + fn lookup_path(&self, path: &str) -> Option<TreeEntry>;
> +
> + /// Get entry by inode number
> + fn get_entry_by_inode(&self, inode: u64) -> Option<TreeEntry>;
> +
> + // ===== Lock Operations =====
> +
> + /// Acquire a lock on a path
> + fn acquire_lock(&self, path: &str, csum: &[u8; 32]) -> Result<()>;
> +
> + /// Release a lock on a path
> + fn release_lock(&self, path: &str, csum: &[u8; 32]) -> Result<()>;
> +
> + /// Check if a path is locked
> + fn is_locked(&self, path: &str) -> bool;
> +
> + /// Check if a lock has expired
> + fn lock_expired(&self, path: &str, csum: &[u8; 32]) -> bool;
> +
> + // ===== Database Operations =====
> +
> + /// Get the current database version
> + fn get_version(&self) -> u64;
> +
> + /// Get all entries in the database
> + fn get_all_entries(&self) -> Result<Vec<TreeEntry>>;
> +
> + /// Replace all entries (for synchronization)
> + fn replace_all_entries(&self, entries: Vec<TreeEntry>) -> Result<()>;
> +
> + /// Apply a single tree entry update
> + fn apply_tree_entry(&self, entry: TreeEntry) -> Result<()>;
> +
> + /// Encode the entire database for network transmission
> + fn encode_database(&self) -> Result<Vec<u8>>;
> +
> + /// Compute database checksum
> + fn compute_database_checksum(&self) -> Result<[u8; 32]>;
> +}
> diff --git a/src/pmxcfs-rs/pmxcfs-memdb/src/types.rs b/src/pmxcfs-rs/pmxcfs-memdb/src/types.rs
> new file mode 100644
> index 00000000..988596c8
> --- /dev/null
> +++ b/src/pmxcfs-rs/pmxcfs-memdb/src/types.rs
> @@ -0,0 +1,325 @@
> +/// Type definitions for memdb module
> +use sha2::{Digest, Sha256};
> +use std::collections::HashMap;
> +
> +pub(super) const MEMDB_MAX_FILE_SIZE: usize = 1024 * 1024; // 1 MiB (matches C version)
> +pub(super) const LOCK_TIMEOUT: u64 = 120; // Lock timeout in seconds
> +pub(super) const DT_DIR: u8 = 4; // Directory type
> +pub(super) const DT_REG: u8 = 8; // Regular file type
> +
> +/// Root inode number (matches C implementation's memdb root inode)
> +/// IMPORTANT: This is the MEMDB root inode, which is 0 in both C and Rust.
> +/// The FUSE layer exposes this as inode 1 to the filesystem (FUSE_ROOT_ID).
> +/// See pmxcfs/src/fuse.rs for the inode mapping logic between memdb and FUSE.
> +pub const ROOT_INODE: u64 = 0;
> +
> +/// Version file name (matches C VERSIONFILENAME)
> +/// Used to store root metadata as inode ROOT_INODE in the database
> +pub const VERSION_FILENAME: &str = "__version__";
> +
> +/// Lock directory path (where cluster resource locks are stored)
> +/// Locks are implemented as directory entries stored at `priv/lock/<lockname>`
> +pub const LOCK_DIR_PATH: &str = "priv/lock";
> +
> +/// Lock information for resource locking
> +///
> +/// In the C version (memdb.h:71-74), the lock info struct includes a `path` field
> +/// that serves as the hash table key. In Rust, we use `HashMap<String, LockInfo>`
> +/// where the path is stored as the HashMap key, so we don't duplicate it here.
> +#[derive(Clone, Debug)]
> +pub(crate) struct LockInfo {
> + /// Lock timestamp (seconds since UNIX epoch)
> + pub(crate) ltime: u64,
> +
> + /// Checksum of the locked resource (used to detect changes)
> + pub(crate) csum: [u8; 32],
> +}
> +
> +/// Tree entry representing a file or directory
> +#[derive(Clone, Debug, serde::Serialize, serde::Deserialize)]
> +pub struct TreeEntry {
> + pub inode: u64,
> + pub parent: u64,
> + pub version: u64,
> + pub writer: u32,
> + pub mtime: u32,
> + pub size: usize,
> + pub entry_type: u8, // DT_DIR or DT_REG
> + pub name: String,
> + pub data: Vec<u8>, // File data (empty for directories)
> +}
> +
> +impl TreeEntry {
> + pub fn is_dir(&self) -> bool {
> + self.entry_type == DT_DIR
> + }
> +
> + pub fn is_file(&self) -> bool {
> + self.entry_type == DT_REG
> + }
> +
> + /// Serialize TreeEntry to C-compatible wire format for Update messages
> + ///
> + /// Wire format (matches dcdb_send_update_inode):
> + /// ```c
> + /// [parent: u64][inode: u64][version: u64][writer: u32][mtime: u32]
> + /// [size: u32][namelen: u32][type: u8][name: namelen bytes][data: size bytes]
> + /// ```
> + pub fn serialize_for_update(&self) -> Vec<u8> {
> + let namelen = (self.name.len() + 1) as u32; // Include null terminator
> + let header_size = 8 + 8 + 8 + 4 + 4 + 4 + 4 + 1; // 41 bytes
> + let total_size = header_size + namelen as usize + self.data.len();
> +
> + let mut buf = Vec::with_capacity(total_size);
> +
> + // Header fields
> + buf.extend_from_slice(&self.parent.to_le_bytes());
> + buf.extend_from_slice(&self.inode.to_le_bytes());
> + buf.extend_from_slice(&self.version.to_le_bytes());
> + buf.extend_from_slice(&self.writer.to_le_bytes());
> + buf.extend_from_slice(&self.mtime.to_le_bytes());
> + buf.extend_from_slice(&(self.size as u32).to_le_bytes());
> + buf.extend_from_slice(&namelen.to_le_bytes());
> + buf.push(self.entry_type);
> +
> + // Name (null-terminated)
> + buf.extend_from_slice(self.name.as_bytes());
> + buf.push(0); // null terminator
> +
> + // Data (only for files)
> + if self.entry_type == DT_REG && !self.data.is_empty() {
> + buf.extend_from_slice(&self.data);
> + }
> +
> + buf
> + }
> +
> + /// Deserialize TreeEntry from C-compatible wire format
> + ///
> + /// Matches dcdb_parse_update_inode
> + pub fn deserialize_from_update(data: &[u8]) -> anyhow::Result<Self> {
> + if data.len() < 41 {
> + anyhow::bail!(
> + "Update message too short: {} bytes (need at least 41)",
> + data.len()
> + );
> + }
> +
> + let mut offset = 0;
> +
> + // Parse header
> + let parent = u64::from_le_bytes(data[offset..offset + 8].try_into().unwrap());
> + offset += 8;
> + let inode = u64::from_le_bytes(data[offset..offset + 8].try_into().unwrap());
> + offset += 8;
> + let version = u64::from_le_bytes(data[offset..offset + 8].try_into().unwrap());
> + offset += 8;
> + let writer = u32::from_le_bytes(data[offset..offset + 4].try_into().unwrap());
> + offset += 4;
> + let mtime = u32::from_le_bytes(data[offset..offset + 4].try_into().unwrap());
> + offset += 4;
> + let size = u32::from_le_bytes(data[offset..offset + 4].try_into().unwrap()) as usize;
> + offset += 4;
> + let namelen = u32::from_le_bytes(data[offset..offset + 4].try_into().unwrap()) as usize;
> + offset += 4;
> + let entry_type = data[offset];
> + offset += 1;
> +
> + // Validate type
> + if entry_type != DT_REG && entry_type != DT_DIR {
> + anyhow::bail!("Invalid entry type: {entry_type}");
> + }
> +
> + // Validate lengths
> + if data.len() < offset + namelen + size {
> + anyhow::bail!(
> + "Update message too short: {} bytes (need {})",
> + data.len(),
> + offset + namelen + size
> + );
> + }
> +
> + // Parse name (null-terminated)
> + let name_bytes = &data[offset..offset + namelen];
> + if name_bytes.is_empty() || name_bytes[namelen - 1] != 0 {
> + anyhow::bail!("Name not null-terminated");
> + }
> + let name = std::str::from_utf8(&name_bytes[..namelen - 1])
> + .map_err(|e| anyhow::anyhow!("Invalid UTF-8 in name: {e}"))?
> + .to_string();
> + offset += namelen;
> +
> + // Parse data
> + let data_vec = if entry_type == DT_REG && size > 0 {
> + data[offset..offset + size].to_vec()
> + } else {
> + Vec::new()
> + };
> +
> + Ok(TreeEntry {
> + inode,
> + parent,
> + version,
> + writer,
> + mtime,
> + size,
> + entry_type,
> + name,
> + data: data_vec,
> + })
> + }
> +
> + /// Compute SHA-256 checksum of this tree entry
> + ///
> + /// This checksum is used by the lock system to detect changes to lock directory entries.
> + /// Matches C version's memdb_tree_entry_csum() function (memdb.c:1389).
> + ///
> + /// The checksum includes all entry metadata (inode, parent, version, writer, mtime, size,
> + /// entry_type, name) and data (for files). This ensures any modification to a lock directory
> + /// entry is detected, triggering lock timeout reset.
Since C hashes the raw (native-endian) integer bytes, should we use to_ne_bytes() here to match?
> + pub fn compute_checksum(&self) -> [u8; 32] {
> + let mut hasher = Sha256::new();
> +
> + // Hash entry metadata in the same order as C version
> + hasher.update(self.inode.to_le_bytes());
> + hasher.update(self.parent.to_le_bytes());
This seems to be in the wrong position: in C, parent is hashed at the 7th position.
> + hasher.update(self.version.to_le_bytes());
> + hasher.update(self.writer.to_le_bytes());
> + hasher.update(self.mtime.to_le_bytes());
> + hasher.update(self.size.to_le_bytes());
C hashes only 4 bytes here (guint32), so I think this should be:
hasher.update((self.size as u32).to_le_bytes());
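For concreteness, a minimal std-only sketch of the two width/endianness points raised above (sketch only, not the actual checksum code):

```rust
// Sketch: illustrates the usize-vs-guint32 width mismatch and why the
// to_ne_bytes()/to_le_bytes() question only bites on big-endian targets.
fn main() {
    let size: usize = 300;

    // What the patch currently hashes: 8 bytes on 64-bit targets.
    assert_eq!(size.to_le_bytes().len(), 8);

    // What C hashes for a guint32: 4 bytes.
    assert_eq!((size as u32).to_le_bytes().len(), 4);

    // On little-endian targets to_ne_bytes() and to_le_bytes() agree,
    // so the choice only changes the checksum on big-endian builds.
    #[cfg(target_endian = "little")]
    assert_eq!(300u64.to_ne_bytes(), 300u64.to_le_bytes());
}
```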
> + hasher.update([self.entry_type]);
> + hasher.update(self.name.as_bytes());
> +
> + // Hash data if present
> + if !self.data.is_empty() {
> + hasher.update(&self.data);
> + }
> +
> + hasher.finalize().into()
> + }
> +}
> +
> +/// Return type for load_from_db: (index, tree, root_inode, max_version)
> +pub(super) type LoadDbResult = (
> + HashMap<u64, TreeEntry>,
> + HashMap<u64, HashMap<String, u64>>,
> + u64,
> + u64,
> +);
> +
[..]
> +}
* Re: [pve-devel] [PATCH pve-cluster 04/15] pmxcfs-rs: add pmxcfs-rrd crate
@ 2026-01-29 14:44 5% ` Samuel Rufinatscha
From: Samuel Rufinatscha @ 2026-01-29 14:44 UTC (permalink / raw)
To: Proxmox VE development discussion, Kefu Chai
Thanks for the patch, Kefu.
Overall looks good (nice backend abstraction and schema separation).
I left a few inline notes around transform_data() skip logic,
sanitizing key path components, .rrd on-disk naming consistency across
backends/tests, and adding a few actual payload fixtures for the
transform tests.
Please see comments inline.
On 1/7/26 10:15 AM, Kefu Chai wrote:
> Add RRD (Round-Robin Database) file persistence system:
> - RrdWriter: Main API for RRD operations
> - Schema definitions for CPU, memory, network metrics
> - Format migration support (v1/v2/v3)
> - rrdcached integration for batched writes
> - Data transformation for legacy formats
>
> This is an independent crate with no internal dependencies,
> only requiring external RRD libraries (rrd, rrdcached-client)
> and tokio for async operations. It handles time-series data
> storage compatible with the C implementation.
>
> Includes comprehensive unit tests for data transformation,
> schema generation, and multi-source data processing.
>
> Signed-off-by: Kefu Chai <k.chai@proxmox.com>
> ---
> src/pmxcfs-rs/Cargo.toml | 1 +
> src/pmxcfs-rs/pmxcfs-rrd/Cargo.toml | 18 +
> src/pmxcfs-rs/pmxcfs-rrd/README.md | 51 ++
> src/pmxcfs-rs/pmxcfs-rrd/src/backend.rs | 67 ++
> .../pmxcfs-rrd/src/backend/backend_daemon.rs | 214 +++++++
> .../pmxcfs-rrd/src/backend/backend_direct.rs | 606 ++++++++++++++++++
> .../src/backend/backend_fallback.rs | 229 +++++++
> src/pmxcfs-rs/pmxcfs-rrd/src/daemon.rs | 140 ++++
> src/pmxcfs-rs/pmxcfs-rrd/src/key_type.rs | 313 +++++++++
> src/pmxcfs-rs/pmxcfs-rrd/src/lib.rs | 21 +
> src/pmxcfs-rs/pmxcfs-rrd/src/schema.rs | 577 +++++++++++++++++
> src/pmxcfs-rs/pmxcfs-rrd/src/writer.rs | 397 ++++++++++++
> 12 files changed, 2634 insertions(+)
> create mode 100644 src/pmxcfs-rs/pmxcfs-rrd/Cargo.toml
> create mode 100644 src/pmxcfs-rs/pmxcfs-rrd/README.md
> create mode 100644 src/pmxcfs-rs/pmxcfs-rrd/src/backend.rs
> create mode 100644 src/pmxcfs-rs/pmxcfs-rrd/src/backend/backend_daemon.rs
> create mode 100644 src/pmxcfs-rs/pmxcfs-rrd/src/backend/backend_direct.rs
> create mode 100644 src/pmxcfs-rs/pmxcfs-rrd/src/backend/backend_fallback.rs
> create mode 100644 src/pmxcfs-rs/pmxcfs-rrd/src/daemon.rs
> create mode 100644 src/pmxcfs-rs/pmxcfs-rrd/src/key_type.rs
> create mode 100644 src/pmxcfs-rs/pmxcfs-rrd/src/lib.rs
> create mode 100644 src/pmxcfs-rs/pmxcfs-rrd/src/schema.rs
> create mode 100644 src/pmxcfs-rs/pmxcfs-rrd/src/writer.rs
>
> diff --git a/src/pmxcfs-rs/Cargo.toml b/src/pmxcfs-rs/Cargo.toml
> index 4d17e87e..dd36c81f 100644
> --- a/src/pmxcfs-rs/Cargo.toml
> +++ b/src/pmxcfs-rs/Cargo.toml
> @@ -4,6 +4,7 @@ members = [
> "pmxcfs-api-types", # Shared types and error definitions
> "pmxcfs-config", # Configuration management
> "pmxcfs-logger", # Cluster log with ring buffer and deduplication
> + "pmxcfs-rrd", # RRD (Round-Robin Database) persistence
> ]
> resolver = "2"
>
> diff --git a/src/pmxcfs-rs/pmxcfs-rrd/Cargo.toml b/src/pmxcfs-rs/pmxcfs-rrd/Cargo.toml
> new file mode 100644
> index 00000000..bab71423
> --- /dev/null
> +++ b/src/pmxcfs-rs/pmxcfs-rrd/Cargo.toml
> @@ -0,0 +1,18 @@
> +[package]
> +name = "pmxcfs-rrd"
> +version.workspace = true
> +edition.workspace = true
> +authors.workspace = true
> +license.workspace = true
> +
> +[dependencies]
> +anyhow.workspace = true
> +async-trait = "0.1"
> +chrono = { version = "0.4", default-features = false, features = ["clock"] }
> +rrd = "0.2"
> +rrdcached-client = "0.1.5"
This crate looks fairly young and small. Are we comfortable depending
on it? We could vendor or fork it to keep control over stability.
> +tokio.workspace = true
> +tracing.workspace = true
> +
> +[dev-dependencies]
> +tempfile.workspace = true
> diff --git a/src/pmxcfs-rs/pmxcfs-rrd/README.md b/src/pmxcfs-rs/pmxcfs-rrd/README.md
> new file mode 100644
> index 00000000..800d78cf
> --- /dev/null
> +++ b/src/pmxcfs-rs/pmxcfs-rrd/README.md
> @@ -0,0 +1,51 @@
> +# pmxcfs-rrd
> +
> +RRD (Round-Robin Database) persistence for pmxcfs performance metrics.
> +
> +## Overview
> +
> +This crate provides RRD file management for storing time-series performance data from Proxmox nodes and VMs. It handles file creation, updates, and integration with rrdcached daemon for efficient writes.
Can we elaborate on the usage and flow of this crate: how it will be
called, what data is passed, how the transformation works, and how the
backend implementations differ? This will definitely help reviewers.
Maybe also add a small code example of how this library is used, which
I think would be valuable.
> +
> +### Key Features
> +
> +- RRD file creation with schema-based initialization
> +- RRD updates (write metrics to disk)
> +- rrdcached integration for batched writes
> +- Support for both legacy and current schema versions
> +- Type-safe key parsing and validation
> +- Compatible with existing C-created RRD files
> +
> +## Module Structure
> +
> +| Module | Purpose |
> +|--------|---------|
> +| `writer.rs` | Main RrdWriter API |
> +| `schema.rs` | RRD schema definitions (DS, RRA) |
> +| `key_type.rs` | RRD key parsing and validation |
> +| `daemon.rs` | rrdcached daemon client |
The backend module is not listed here. I think we could drop this
table entirely; if we keep it, it would be helpful to describe the
components in a bit more detail.
> +
> +## External Dependencies
> +
> +- **librrd**: RRDtool library (via FFI bindings)
Let's explicitly name the rrd crate here, since it provides those bindings.
> +- **rrdcached**: Optional daemon for batched writes and improved performance
Since rrdcached is optional, could we also gate it behind a feature
flag to reduce the dependency and build surface?
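For example, such a gate could look roughly like this (sketch only; the feature name and wiring are suggestions, and the code would need matching cfg(feature = "...") guards around the daemon backend):

```toml
# Hypothetical feature gate in pmxcfs-rrd/Cargo.toml:
[features]
default = ["rrdcached"]
rrdcached = ["dep:rrdcached-client"]

[dependencies]
rrdcached-client = { version = "0.1.5", optional = true }
```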
> +
> +## Testing
> +
> +Unit tests verify:
> +- Schema generation and validation
> +- Key parsing for different RRD types (node, VM, storage)
> +- RRD file creation and update operations
> +- rrdcached client connection and fallback behavior
> +
> +Run tests with:
> +```bash
> +cargo test -p pmxcfs-rrd
> +```
> +
> +## References
> +
> +- **C Implementation**: `src/pmxcfs/status.c` (RRD code embedded)
> +- **Related Crates**:
> + - `pmxcfs-status` - Uses RrdWriter for metrics persistence
> + - `pmxcfs` - FUSE `.rrd` plugin reads RRD files
> +- **RRDtool Documentation**: https://oss.oetiker.ch/rrdtool/
Thanks for adding the references and for noting how this maps to the
C implementation; that is very helpful.
> diff --git a/src/pmxcfs-rs/pmxcfs-rrd/src/backend.rs b/src/pmxcfs-rs/pmxcfs-rrd/src/backend.rs
> new file mode 100644
> index 00000000..58652831
> --- /dev/null
> +++ b/src/pmxcfs-rs/pmxcfs-rrd/src/backend.rs
> @@ -0,0 +1,67 @@
> +/// RRD Backend Trait and Implementations
> +///
> +/// This module provides an abstraction over different RRD writing mechanisms:
> +/// - Daemon-based (via rrdcached) for performance and batching
> +/// - Direct file writing for reliability and fallback scenarios
> +/// - Fallback composite that tries daemon first, then falls back to direct
> +///
> +/// This design matches the C implementation's behavior in status.c where
> +/// it attempts daemon update first, then falls back to direct file writes.
> +use super::schema::RrdSchema;
> +use anyhow::Result;
> +use async_trait::async_trait;
> +use std::path::Path;
> +
> +/// Trait for RRD backend implementations
> +///
> +/// Provides abstraction over different RRD writing mechanisms.
> +/// All methods are async to support both async (daemon) and sync (direct file) operations.
> +#[async_trait]
> +pub trait RrdBackend: Send + Sync {
Great idea to abstract this!
> + /// Update RRD file with new data
> + ///
> + /// # Arguments
> + /// * `file_path` - Full path to the RRD file
> + /// * `data` - Update data in format "timestamp:value1:value2:..."
> + async fn update(&mut self, file_path: &Path, data: &str) -> Result<()>;
> +
> + /// Create new RRD file with schema
> + ///
> + /// # Arguments
> + /// * `file_path` - Full path where RRD file should be created
> + /// * `schema` - RRD schema defining data sources and archives
> + /// * `start_timestamp` - Start time for the RRD file (Unix timestamp)
> + async fn create(
> + &mut self,
> + file_path: &Path,
> + schema: &RrdSchema,
> + start_timestamp: i64,
> + ) -> Result<()>;
> +
> + /// Flush pending updates to disk
> + ///
> + /// For daemon backends, this sends a FLUSH command.
> + /// For direct backends, this is a no-op (writes are immediate).
> + #[allow(dead_code)] // Used in backend implementations via trait dispatch
> + async fn flush(&mut self) -> Result<()>;
> +
> + /// Check if backend is available and healthy
> + ///
> + /// Returns true if the backend can be used for operations.
> + /// For daemon backends, this checks if the connection is alive.
> + /// For direct backends, this always returns true.
> + #[allow(dead_code)] // Used in fallback backend via trait dispatch
> + async fn is_available(&self) -> bool;
> +
> + /// Get a human-readable name for this backend
> + fn name(&self) -> &str;
> +}
> +
> +// Backend implementations
> +mod backend_daemon;
> +mod backend_direct;
> +mod backend_fallback;
> +
> +pub use backend_daemon::RrdCachedBackend;
> +pub use backend_direct::RrdDirectBackend;
> +pub use backend_fallback::RrdFallbackBackend;
> diff --git a/src/pmxcfs-rs/pmxcfs-rrd/src/backend/backend_daemon.rs b/src/pmxcfs-rs/pmxcfs-rrd/src/backend/backend_daemon.rs
> new file mode 100644
> index 00000000..28c1a99a
> --- /dev/null
> +++ b/src/pmxcfs-rs/pmxcfs-rrd/src/backend/backend_daemon.rs
> @@ -0,0 +1,214 @@
> +/// RRD Backend: rrdcached daemon
> +///
> +/// Uses rrdcached for batched, high-performance RRD updates.
> +/// This is the preferred backend when the daemon is available.
> +use super::super::schema::RrdSchema;
> +use anyhow::{Context, Result};
> +use async_trait::async_trait;
> +use rrdcached_client::RRDCachedClient;
> +use rrdcached_client::consolidation_function::ConsolidationFunction;
> +use rrdcached_client::create::{
> + CreateArguments, CreateDataSource, CreateDataSourceType, CreateRoundRobinArchive,
> +};
> +use std::path::Path;
> +
> +/// RRD backend using rrdcached daemon
> +pub struct RrdCachedBackend {
> + client: RRDCachedClient<tokio::net::UnixStream>,
> +}
> +
> +impl RrdCachedBackend {
> + /// Connect to rrdcached daemon
> + ///
> + /// # Arguments
> + /// * `socket_path` - Path to rrdcached Unix socket (default: /var/run/rrdcached.sock)
> + pub async fn connect(socket_path: &str) -> Result<Self> {
> + let client = RRDCachedClient::connect_unix(socket_path)
> + .await
> + .with_context(|| format!("Failed to connect to rrdcached at {socket_path}"))?;
> +
> + tracing::info!("Connected to rrdcached at {}", socket_path);
> +
> + Ok(Self { client })
> + }
> +}
> +
> +#[async_trait]
> +impl super::super::backend::RrdBackend for RrdCachedBackend {
> + async fn update(&mut self, file_path: &Path, data: &str) -> Result<()> {
> + // Parse the update data
> + let parts: Vec<&str> = data.split(':').collect();
> + if parts.len() < 2 {
> + anyhow::bail!("Invalid update data format: {data}");
> + }
> +
> + let timestamp = if parts[0] == "N" {
> + None
> + } else {
> + Some(
> + parts[0]
> + .parse::<usize>()
> + .with_context(|| format!("Invalid timestamp: {}", parts[0]))?,
> + )
> + };
> +
> + let values: Vec<f64> = parts[1..]
> + .iter()
> + .map(|v| {
> + if *v == "U" {
> + Ok(f64::NAN)
> + } else {
> + v.parse::<f64>()
> + .with_context(|| format!("Invalid value: {v}"))
> + }
> + })
> + .collect::<Result<Vec<_>>>()?;
> +
> + // Get file path without .rrd extension (rrdcached-client adds it)
> + let path_str = file_path.to_string_lossy();
> + let path_without_ext = path_str.strip_suffix(".rrd").unwrap_or(&path_str);
> +
> + // Send update via rrdcached
> + self.client
> + .update(path_without_ext, timestamp, values)
> + .await
> + .with_context(|| format!("rrdcached update failed for {:?}", file_path))?;
> +
> + tracing::trace!("Updated RRD via daemon: {:?} -> {}", file_path, data);
> +
> + Ok(())
> + }
> +
> + async fn create(
> + &mut self,
> + file_path: &Path,
> + schema: &RrdSchema,
> + start_timestamp: i64,
> + ) -> Result<()> {
> + tracing::debug!(
> + "Creating RRD file via daemon: {:?} with {} data sources",
> + file_path,
> + schema.column_count()
> + );
> +
> + // Convert our data sources to rrdcached-client CreateDataSource objects
> + let mut data_sources = Vec::new();
> + for ds in &schema.data_sources {
> + let serie_type = match ds.ds_type {
> + "GAUGE" => CreateDataSourceType::Gauge,
> + "DERIVE" => CreateDataSourceType::Derive,
> + "COUNTER" => CreateDataSourceType::Counter,
> + "ABSOLUTE" => CreateDataSourceType::Absolute,
> + _ => anyhow::bail!("Unsupported data source type: {}", ds.ds_type),
> + };
> +
> + // Parse min/max values
> + let minimum = if ds.min == "U" {
> + None
> + } else {
> + ds.min.parse().ok()
> + };
> + let maximum = if ds.max == "U" {
> + None
> + } else {
> + ds.max.parse().ok()
> + };
> +
> + let data_source = CreateDataSource {
> + name: ds.name.to_string(),
> + minimum,
> + maximum,
> + heartbeat: ds.heartbeat as i64,
> + serie_type,
> + };
> +
> + data_sources.push(data_source);
> + }
> +
> + // Convert our RRA definitions to rrdcached-client CreateRoundRobinArchive objects
> + let mut archives = Vec::new();
> + for rra in &schema.archives {
> + // Parse RRA string: "RRA:AVERAGE:0.5:1:70"
> + let parts: Vec<&str> = rra.split(':').collect();
> + if parts.len() != 5 || parts[0] != "RRA" {
> + anyhow::bail!("Invalid RRA format: {rra}");
> + }
> +
> + let consolidation_function = match parts[1] {
> + "AVERAGE" => ConsolidationFunction::Average,
> + "MIN" => ConsolidationFunction::Min,
> + "MAX" => ConsolidationFunction::Max,
> + "LAST" => ConsolidationFunction::Last,
> + _ => anyhow::bail!("Unsupported consolidation function: {}", parts[1]),
> + };
> +
> + let xfiles_factor: f64 = parts[2]
> + .parse()
> + .with_context(|| format!("Invalid xff in RRA: {rra}"))?;
> + let steps: i64 = parts[3]
> + .parse()
> + .with_context(|| format!("Invalid steps in RRA: {rra}"))?;
> + let rows: i64 = parts[4]
> + .parse()
> + .with_context(|| format!("Invalid rows in RRA: {rra}"))?;
> +
> + let archive = CreateRoundRobinArchive {
> + consolidation_function,
> + xfiles_factor,
> + steps,
> + rows,
> + };
> + archives.push(archive);
> + }
> +
> + // Get path without .rrd extension (rrdcached-client adds it)
> + let path_str = file_path.to_string_lossy();
> + let path_without_ext = path_str
> + .strip_suffix(".rrd")
> + .unwrap_or(&path_str)
> + .to_string();
> +
> + // Create CreateArguments
> + let create_args = CreateArguments {
> + path: path_without_ext,
> + data_sources,
> + round_robin_archives: archives,
> + start_timestamp: start_timestamp as u64,
> + step_seconds: 60, // 60-second step (1 minute resolution)
> + };
> +
> + // Validate before sending
> + create_args.validate().context("Invalid CREATE arguments")?;
> +
> + // Send CREATE command via rrdcached
> + self.client
> + .create(create_args)
> + .await
> + .with_context(|| format!("Failed to create RRD file via daemon: {file_path:?}"))?;
> +
> + tracing::info!("Created RRD file via daemon: {:?} ({})", file_path, schema);
> +
> + Ok(())
> + }
> +
> + async fn flush(&mut self) -> Result<()> {
> + self.client
> + .flush_all()
> + .await
> + .context("Failed to flush rrdcached")?;
> +
> + tracing::debug!("Flushed all pending RRD updates");
> +
> + Ok(())
> + }
> +
> + async fn is_available(&self) -> bool {
> + // For now, assume we're available if we have a client
> + // Could add a PING command in the future
> + true
> + }
> +
> + fn name(&self) -> &str {
> + "rrdcached"
> + }
> +}
> diff --git a/src/pmxcfs-rs/pmxcfs-rrd/src/backend/backend_direct.rs b/src/pmxcfs-rs/pmxcfs-rrd/src/backend/backend_direct.rs
> new file mode 100644
> index 00000000..6be3eb5d
> --- /dev/null
> +++ b/src/pmxcfs-rs/pmxcfs-rrd/src/backend/backend_direct.rs
> @@ -0,0 +1,606 @@
> +/// RRD Backend: Direct file writing
> +///
> +/// Uses the `rrd` crate (librrd bindings) for direct RRD file operations.
> +/// This backend is used as a fallback when rrdcached is unavailable.
> +///
> +/// This matches the C implementation's behavior in status.c:1416-1420 where
> +/// it falls back to rrd_update_r() and rrd_create_r() for direct file access.
> +use super::super::schema::RrdSchema;
> +use anyhow::{Context, Result};
> +use async_trait::async_trait;
> +use std::path::Path;
> +use std::time::Duration;
> +
> +/// RRD backend using direct file operations via librrd
> +pub struct RrdDirectBackend {
> + // Currently stateless, but kept as struct for future enhancements
> +}
> +
> +impl RrdDirectBackend {
> + /// Create a new direct file backend
> + pub fn new() -> Self {
> + tracing::info!("Using direct RRD file backend (via librrd)");
> + Self {}
> + }
> +}
> +
> +impl Default for RrdDirectBackend {
> + fn default() -> Self {
> + Self::new()
> + }
> +}
> +
> +#[async_trait]
> +impl super::super::backend::RrdBackend for RrdDirectBackend {
> + async fn update(&mut self, file_path: &Path, data: &str) -> Result<()> {
> + let path = file_path.to_path_buf();
> + let data_str = data.to_string();
> +
> + // Use tokio::task::spawn_blocking for sync rrd operations
> + // This prevents blocking the async runtime
> + tokio::task::spawn_blocking(move || {
> + // Parse the update data to extract timestamp and values
> + // Format: "timestamp:value1:value2:..."
> + let parts: Vec<&str> = data_str.split(':').collect();
> + if parts.is_empty() {
> + anyhow::bail!("Empty update data");
> + }
> +
> + // Use rrd::ops::update::update_all_with_timestamp
> + // This is the most direct way to update RRD files
> + let timestamp_str = parts[0];
> + let timestamp: i64 = if timestamp_str == "N" {
> + // "N" means "now" in RRD terminology
> + chrono::Utc::now().timestamp()
> + } else {
> + timestamp_str
> + .parse()
> + .with_context(|| format!("Invalid timestamp: {}", timestamp_str))?
> + };
> +
> + let timestamp = chrono::DateTime::from_timestamp(timestamp, 0)
> + .ok_or_else(|| anyhow::anyhow!("Invalid timestamp value: {}", timestamp))?;
> +
> + // Convert values to Datum
> + let values: Vec<rrd::ops::update::Datum> = parts[1..]
> + .iter()
> + .map(|v| {
> + if *v == "U" {
> + // Unknown/unspecified value
> + rrd::ops::update::Datum::Unspecified
> + } else if let Ok(int_val) = v.parse::<u64>() {
> + rrd::ops::update::Datum::Int(int_val)
> + } else if let Ok(float_val) = v.parse::<f64>() {
> + rrd::ops::update::Datum::Float(float_val)
> + } else {
> + rrd::ops::update::Datum::Unspecified
> + }
> + })
> + .collect();
> +
> + // Perform the update
> + rrd::ops::update::update_all(
> + &path,
> + rrd::ops::update::ExtraFlags::empty(),
> + &[(
> + rrd::ops::update::BatchTime::Timestamp(timestamp),
> + values.as_slice(),
> + )],
> + )
> + .with_context(|| format!("Direct RRD update failed for {:?}", path))?;
> +
> + tracing::trace!("Updated RRD via direct file: {:?} -> {}", path, data_str);
> +
> + Ok::<(), anyhow::Error>(())
> + })
> + .await
> + .context("Failed to spawn blocking task for RRD update")??;
> +
> + Ok(())
> + }
> +
> + async fn create(
> + &mut self,
> + file_path: &Path,
> + schema: &RrdSchema,
> + start_timestamp: i64,
> + ) -> Result<()> {
> + tracing::debug!(
> + "Creating RRD file via direct: {:?} with {} data sources",
> + file_path,
> + schema.column_count()
> + );
> +
> + let path = file_path.to_path_buf();
> + let schema = schema.clone();
> +
> + // Ensure parent directory exists
> + if let Some(parent) = path.parent() {
> + std::fs::create_dir_all(parent)
> + .with_context(|| format!("Failed to create directory: {parent:?}"))?;
> + }
> +
> + // Use tokio::task::spawn_blocking for sync rrd operations
> + tokio::task::spawn_blocking(move || {
> + // Convert timestamp
> + let start = chrono::DateTime::from_timestamp(start_timestamp, 0)
> + .ok_or_else(|| anyhow::anyhow!("Invalid start timestamp: {}", start_timestamp))?;
> +
> + // Convert data sources
> + let data_sources: Vec<rrd::ops::create::DataSource> = schema
> + .data_sources
> + .iter()
> + .map(|ds| {
> + let name = rrd::ops::create::DataSourceName::new(ds.name);
> +
> + match ds.ds_type {
> + "GAUGE" => {
> + let min = if ds.min == "U" {
> + None
> + } else {
> + Some(ds.min.parse().context("Invalid min value")?)
> + };
> + let max = if ds.max == "U" {
> + None
> + } else {
> + Some(ds.max.parse().context("Invalid max value")?)
> + };
> + Ok(rrd::ops::create::DataSource::gauge(
> + name,
> + ds.heartbeat,
> + min,
> + max,
> + ))
> + }
> + "DERIVE" => {
> + let min = if ds.min == "U" {
> + None
> + } else {
> + Some(ds.min.parse().context("Invalid min value")?)
> + };
> + let max = if ds.max == "U" {
> + None
> + } else {
> + Some(ds.max.parse().context("Invalid max value")?)
> + };
> + Ok(rrd::ops::create::DataSource::derive(
> + name,
> + ds.heartbeat,
> + min,
> + max,
> + ))
> + }
> + "COUNTER" => {
> + let min = if ds.min == "U" {
> + None
> + } else {
> + Some(ds.min.parse().context("Invalid min value")?)
> + };
> + let max = if ds.max == "U" {
> + None
> + } else {
> + Some(ds.max.parse().context("Invalid max value")?)
> + };
> + Ok(rrd::ops::create::DataSource::counter(
> + name,
> + ds.heartbeat,
> + min,
> + max,
> + ))
> + }
> + "ABSOLUTE" => {
> + let min = if ds.min == "U" {
> + None
> + } else {
> + Some(ds.min.parse().context("Invalid min value")?)
> + };
> + let max = if ds.max == "U" {
> + None
> + } else {
> + Some(ds.max.parse().context("Invalid max value")?)
> + };
> + Ok(rrd::ops::create::DataSource::absolute(
> + name,
> + ds.heartbeat,
> + min,
> + max,
> + ))
> + }
> + _ => anyhow::bail!("Unsupported data source type: {}", ds.ds_type),
> + }
> + })
> + .collect::<Result<Vec<_>>>()?;
> +
> + // Convert RRAs
> + let archives: Result<Vec<rrd::ops::create::Archive>> = schema
> + .archives
> + .iter()
> + .map(|rra| {
> + // Parse RRA string: "RRA:AVERAGE:0.5:1:1440"
> + let parts: Vec<&str> = rra.split(':').collect();
> + if parts.len() != 5 || parts[0] != "RRA" {
> + anyhow::bail!("Invalid RRA format: {}", rra);
> + }
> +
> + let cf = match parts[1] {
> + "AVERAGE" => rrd::ConsolidationFn::Avg,
> + "MIN" => rrd::ConsolidationFn::Min,
> + "MAX" => rrd::ConsolidationFn::Max,
> + "LAST" => rrd::ConsolidationFn::Last,
> + _ => anyhow::bail!("Unsupported consolidation function: {}", parts[1]),
> + };
> +
> + let xff: f64 = parts[2]
> + .parse()
> + .with_context(|| format!("Invalid xff in RRA: {}", rra))?;
> + let steps: u32 = parts[3]
> + .parse()
> + .with_context(|| format!("Invalid steps in RRA: {}", rra))?;
> + let rows: u32 = parts[4]
> + .parse()
> + .with_context(|| format!("Invalid rows in RRA: {}", rra))?;
> +
> + rrd::ops::create::Archive::new(cf, xff, steps, rows)
> + .map_err(|e| anyhow::anyhow!("Failed to create archive: {}", e))
> + })
> + .collect();
> +
> + let archives = archives?;
> +
> + // Call rrd::ops::create::create
> + rrd::ops::create::create(
> + &path,
> + start,
> + Duration::from_secs(60), // 60-second step
> + false, // no_overwrite = false
With overwrite allowed (no_overwrite = false), two concurrent creates could
race, with the second one silently clobbering the first one's file.
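Something like the following (untested sketch, names are mine) would make the
existence check atomic instead of relying on overwrite semantics; a losing
racer gets AlreadyExists rather than truncating the winner's file:

```rust
use std::io::ErrorKind;
use std::path::Path;

/// Atomically create a file only if it does not already exist.
/// Returns Ok(true) if we created it, Ok(false) if another caller won
/// the race, and Err for any other I/O failure.
fn create_exclusive(path: &Path) -> std::io::Result<bool> {
    match std::fs::OpenOptions::new()
        .write(true)
        .create_new(true) // O_CREAT | O_EXCL: atomic check-and-create on POSIX
        .open(path)
    {
        Ok(_file) => Ok(true),
        Err(e) if e.kind() == ErrorKind::AlreadyExists => Ok(false),
        Err(e) => Err(e),
    }
}
```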
> + None, // template
> + &[], // sources
> + data_sources.iter(),
> + archives.iter(),
> + )
> + .with_context(|| format!("Direct RRD create failed for {:?}", path))?;
> +
> + tracing::info!("Created RRD file via direct: {:?} ({})", path, schema);
> +
> + Ok::<(), anyhow::Error>(())
> + })
> + .await
> + .context("Failed to spawn blocking task for RRD create")??;
> +
> + Ok(())
> + }
> +
> + async fn flush(&mut self) -> Result<()> {
> + // No-op for direct backend - writes are immediate
> + tracing::trace!("Flush called on direct backend (no-op)");
> + Ok(())
> + }
> +
> + async fn is_available(&self) -> bool {
> + // Direct backend is always available (no external dependencies)
> + true
> + }
> +
> + fn name(&self) -> &str {
> + "direct"
> + }
> +}
> +
> +#[cfg(test)]
> +mod tests {
> + use super::*;
> + use crate::backend::RrdBackend;
> + use crate::schema::{RrdFormat, RrdSchema};
> + use std::path::PathBuf;
> + use tempfile::TempDir;
> +
> + // ===== Test Helpers =====
> +
> + /// Create a temporary directory for RRD files
> + fn setup_temp_dir() -> TempDir {
> + TempDir::new().expect("Failed to create temp directory")
> + }
> +
> + /// Create a test RRD file path
> + fn test_rrd_path(dir: &TempDir, name: &str) -> PathBuf {
> + dir.path().join(format!("{}.rrd", name))
What’s the canonical on-disk naming here (with or without .rrd)?
file_path() and the daemon path handling suggest no extension, but the
direct backend's tests currently create *.rrd files. Can we make this
consistent across writer/backends/tests?
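One option would be a single normalization helper that every backend and test
goes through. Untested sketch (helper name is mine), assuming the
C-compatible convention of no `.rrd` extension on disk:

```rust
use std::path::{Path, PathBuf};

/// Normalize an RRD path to the canonical on-disk form (no `.rrd`
/// extension, matching the C pmxcfs layout); callers holding a `.rrd`
/// path get it stripped, everything else passes through unchanged.
fn canonical_rrd_path(path: &Path) -> PathBuf {
    match path.to_str().and_then(|s| s.strip_suffix(".rrd")) {
        Some(stripped) => PathBuf::from(stripped),
        None => path.to_path_buf(),
    }
}
```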
> + }
> +
> + // ===== RrdDirectBackend Tests =====
> +
> + #[tokio::test]
> + async fn test_direct_backend_create_node_rrd() {
> + let temp_dir = setup_temp_dir();
> + let rrd_path = test_rrd_path(&temp_dir, "node_test");
> +
> + let mut backend = RrdDirectBackend::new();
> + let schema = RrdSchema::node(RrdFormat::Pve9_0);
> + let start_time = 1704067200; // 2024-01-01 00:00:00
> +
> + // Create RRD file
> + let result = backend.create(&rrd_path, &schema, start_time).await;
> + assert!(
> + result.is_ok(),
> + "Failed to create node RRD: {:?}",
> + result.err()
> + );
> +
> + // Verify file was created
> + assert!(rrd_path.exists(), "RRD file should exist after create");
> +
> + // Verify backend name
> + assert_eq!(backend.name(), "direct");
> + }
> +
> + #[tokio::test]
> + async fn test_direct_backend_create_vm_rrd() {
> + let temp_dir = setup_temp_dir();
> + let rrd_path = test_rrd_path(&temp_dir, "vm_test");
> +
> + let mut backend = RrdDirectBackend::new();
> + let schema = RrdSchema::vm(RrdFormat::Pve9_0);
> + let start_time = 1704067200;
> +
> + let result = backend.create(&rrd_path, &schema, start_time).await;
> + assert!(
> + result.is_ok(),
> + "Failed to create VM RRD: {:?}",
> + result.err()
> + );
> + assert!(rrd_path.exists());
> + }
> +
> + #[tokio::test]
> + async fn test_direct_backend_create_storage_rrd() {
> + let temp_dir = setup_temp_dir();
> + let rrd_path = test_rrd_path(&temp_dir, "storage_test");
> +
> + let mut backend = RrdDirectBackend::new();
> + let schema = RrdSchema::storage(RrdFormat::Pve2);
> + let start_time = 1704067200;
> +
> + let result = backend.create(&rrd_path, &schema, start_time).await;
> + assert!(
> + result.is_ok(),
> + "Failed to create storage RRD: {:?}",
> + result.err()
> + );
> + assert!(rrd_path.exists());
> + }
> +
> + #[tokio::test]
> + async fn test_direct_backend_update_with_timestamp() {
> + let temp_dir = setup_temp_dir();
> + let rrd_path = test_rrd_path(&temp_dir, "update_test");
> +
> + let mut backend = RrdDirectBackend::new();
> + let schema = RrdSchema::storage(RrdFormat::Pve2);
> + let start_time = 1704067200;
> +
> + // Create RRD file
> + backend
> + .create(&rrd_path, &schema, start_time)
> + .await
> + .expect("Failed to create RRD");
> +
> + // Update with explicit timestamp and values
> + // Format: "timestamp:value1:value2"
> + let update_data = "1704067260:1000000:500000"; // total=1MB, used=500KB
> + let result = backend.update(&rrd_path, update_data).await;
> +
> + assert!(result.is_ok(), "Failed to update RRD: {:?}", result.err());
> + }
> +
> + #[tokio::test]
> + async fn test_direct_backend_update_with_n_timestamp() {
> + let temp_dir = setup_temp_dir();
> + let rrd_path = test_rrd_path(&temp_dir, "update_n_test");
> +
> + let mut backend = RrdDirectBackend::new();
> + let schema = RrdSchema::storage(RrdFormat::Pve2);
> + let start_time = 1704067200;
> +
> + backend
> + .create(&rrd_path, &schema, start_time)
> + .await
> + .expect("Failed to create RRD");
> +
> + // Update with "N" (current time) timestamp
> + let update_data = "N:2000000:750000";
> + let result = backend.update(&rrd_path, update_data).await;
> +
> + assert!(
> + result.is_ok(),
> + "Failed to update RRD with N timestamp: {:?}",
> + result.err()
> + );
> + }
> +
> + #[tokio::test]
> + async fn test_direct_backend_update_with_unknown_values() {
> + let temp_dir = setup_temp_dir();
> + let rrd_path = test_rrd_path(&temp_dir, "update_u_test");
> +
> + let mut backend = RrdDirectBackend::new();
> + let schema = RrdSchema::storage(RrdFormat::Pve2);
> + let start_time = 1704067200;
> +
> + backend
> + .create(&rrd_path, &schema, start_time)
> + .await
> + .expect("Failed to create RRD");
> +
> + // Update with "U" (unknown) values
> + let update_data = "N:U:1000000"; // total unknown, used known
> + let result = backend.update(&rrd_path, update_data).await;
> +
> + assert!(
> + result.is_ok(),
> + "Failed to update RRD with U values: {:?}",
> + result.err()
> + );
> + }
> +
> + #[tokio::test]
> + async fn test_direct_backend_update_invalid_data() {
> + let temp_dir = setup_temp_dir();
> + let rrd_path = test_rrd_path(&temp_dir, "invalid_test");
> +
> + let mut backend = RrdDirectBackend::new();
> + let schema = RrdSchema::storage(RrdFormat::Pve2);
> + let start_time = 1704067200;
> +
> + backend
> + .create(&rrd_path, &schema, start_time)
> + .await
> + .expect("Failed to create RRD");
> +
> + // Test truly invalid data formats that MUST fail
> + // Note: Invalid values like "abc" are converted to Unspecified (U), which is valid RRD behavior
> + let invalid_cases = vec![
> + "", // Empty string
> + ":", // Only separator
> + "timestamp", // Missing values
> + "N", // No colon separator
> + "abc:123:456", // Invalid timestamp (not N or integer)
> + ];
> +
> + for invalid_data in invalid_cases {
> + let result = backend.update(&rrd_path, invalid_data).await;
> + assert!(
> + result.is_err(),
> + "Update should fail for invalid data: '{}', but got Ok",
> + invalid_data
> + );
> + }
> +
> + // Test lenient data formats that succeed (invalid values become Unspecified)
> + // Use explicit timestamps to avoid "same timestamp" errors
> + let mut timestamp = start_time + 60;
> + let lenient_cases = vec![
> + "abc:456", // Invalid first value -> becomes U
> + "123:def", // Invalid second value -> becomes U
> + "U:U", // All unknown
> + ];
> +
> + for valid_data in lenient_cases {
> + let update_data = format!("{}:{}", timestamp, valid_data);
> + let result = backend.update(&rrd_path, &update_data).await;
> + assert!(
> + result.is_ok(),
> + "Update should succeed for lenient data: '{}', but got Err: {:?}",
> + update_data,
> + result.err()
> + );
> + timestamp += 60; // Increment timestamp for next update
> + }
> + }
> +
> + #[tokio::test]
> + async fn test_direct_backend_update_nonexistent_file() {
> + let temp_dir = setup_temp_dir();
> + let rrd_path = test_rrd_path(&temp_dir, "nonexistent");
> +
> + let mut backend = RrdDirectBackend::new();
> +
> + // Try to update a file that doesn't exist
> + let result = backend.update(&rrd_path, "N:100:200").await;
> +
> + assert!(result.is_err(), "Update should fail for nonexistent file");
> + }
> +
> + #[tokio::test]
> + async fn test_direct_backend_flush() {
> + let mut backend = RrdDirectBackend::new();
> +
> + // Flush should always succeed for direct backend (no-op)
> + let result = backend.flush().await;
> + assert!(
> + result.is_ok(),
> + "Flush should always succeed for direct backend"
> + );
> + }
> +
> + #[tokio::test]
> + async fn test_direct_backend_is_available() {
> + let backend = RrdDirectBackend::new();
> +
> + // Direct backend should always be available
> + assert!(
> + backend.is_available().await,
> + "Direct backend should always be available"
> + );
> + }
> +
> + #[tokio::test]
> + async fn test_direct_backend_multiple_updates() {
> + let temp_dir = setup_temp_dir();
> + let rrd_path = test_rrd_path(&temp_dir, "multi_update_test");
> +
> + let mut backend = RrdDirectBackend::new();
> + let schema = RrdSchema::storage(RrdFormat::Pve2);
> + let start_time = 1704067200;
> +
> + backend
> + .create(&rrd_path, &schema, start_time)
> + .await
> + .expect("Failed to create RRD");
> +
> + // Perform multiple updates
> + for i in 0..10 {
> + let timestamp = start_time + 60 * (i + 1); // 1 minute intervals
> + let total = 1000000 + (i * 100000);
> + let used = 500000 + (i * 50000);
> + let update_data = format!("{}:{}:{}", timestamp, total, used);
> +
> + let result = backend.update(&rrd_path, &update_data).await;
> + assert!(result.is_ok(), "Update {} failed: {:?}", i, result.err());
> + }
> + }
> +
> + #[tokio::test]
> + async fn test_direct_backend_overwrite_file() {
> + let temp_dir = setup_temp_dir();
> + let rrd_path = test_rrd_path(&temp_dir, "overwrite_test");
> +
> + let mut backend = RrdDirectBackend::new();
> + let schema = RrdSchema::storage(RrdFormat::Pve2);
> + let start_time = 1704067200;
> +
> + // Create file first time
> + backend
> + .create(&rrd_path, &schema, start_time)
> + .await
> + .expect("First create failed");
> +
> + // Create same file again - should succeed (overwrites)
> + // Note: librrd create() with no_overwrite=false allows overwriting
> + let result = backend.create(&rrd_path, &schema, start_time).await;
> + assert!(
> + result.is_ok(),
> + "Creating file again should succeed (overwrite mode): {:?}",
> + result.err()
> + );
> + }
> +
> + #[tokio::test]
> + async fn test_direct_backend_large_schema() {
> + let temp_dir = setup_temp_dir();
> + let rrd_path = test_rrd_path(&temp_dir, "large_schema_test");
> +
> + let mut backend = RrdDirectBackend::new();
> + let schema = RrdSchema::node(RrdFormat::Pve9_0); // 19 data sources
> + let start_time = 1704067200;
> +
> + // Create RRD with large schema
> + let result = backend.create(&rrd_path, &schema, start_time).await;
> + assert!(result.is_ok(), "Failed to create RRD with large schema");
> +
> + // Update with all values
> + let values = "100:200:50.5:10.2:8000000:4000000:2000000:500000:50000000:25000000:1000000:2000000:6000000:1000000:0.5:1.2:0.8:0.3:0.1";
> + let update_data = format!("N:{}", values);
> +
> + let result = backend.update(&rrd_path, &update_data).await;
> + assert!(result.is_ok(), "Failed to update RRD with large schema");
> + }
> +}
> diff --git a/src/pmxcfs-rs/pmxcfs-rrd/src/backend/backend_fallback.rs b/src/pmxcfs-rs/pmxcfs-rrd/src/backend/backend_fallback.rs
> new file mode 100644
> index 00000000..7d574e5b
> --- /dev/null
> +++ b/src/pmxcfs-rs/pmxcfs-rrd/src/backend/backend_fallback.rs
> @@ -0,0 +1,229 @@
> +/// RRD Backend: Fallback (Daemon + Direct)
> +///
> +/// Composite backend that tries daemon first, falls back to direct file writing.
> +/// This matches the C implementation's behavior in status.c:1405-1420 where
> +/// it attempts rrdc_update() first, then falls back to rrd_update_r().
> +use super::super::schema::RrdSchema;
> +use super::{RrdCachedBackend, RrdDirectBackend};
> +use anyhow::{Context, Result};
> +use async_trait::async_trait;
> +use std::path::Path;
> +
> +/// Composite backend that tries daemon first, falls back to direct
> +///
> +/// This provides the same behavior as the C implementation:
> +/// 1. Try to use rrdcached daemon for performance
> +/// 2. If daemon fails or is unavailable, fall back to direct file writes
> +pub struct RrdFallbackBackend {
> + /// Optional daemon backend (None if daemon is unavailable/failed)
> + daemon: Option<RrdCachedBackend>,
> + /// Direct backend (always available)
> + direct: RrdDirectBackend,
> +}
> +
> +impl RrdFallbackBackend {
> + /// Create a new fallback backend
> + ///
> + /// Attempts to connect to rrdcached daemon. If successful, will prefer daemon.
> + /// If daemon is unavailable, will use direct mode only.
> + ///
> + /// # Arguments
> + /// * `daemon_socket` - Path to rrdcached Unix socket
> + pub async fn new(daemon_socket: &str) -> Self {
> + let daemon = match RrdCachedBackend::connect(daemon_socket).await {
> + Ok(backend) => {
> + tracing::info!("RRD fallback backend: daemon available, will prefer daemon mode");
> + Some(backend)
> + }
> + Err(e) => {
> + tracing::warn!(
> + "RRD fallback backend: daemon unavailable ({}), using direct mode only",
> + e
> + );
> + None
> + }
> + };
> +
> + let direct = RrdDirectBackend::new();
> +
> + Self { daemon, direct }
> + }
> +
> + /// Create a fallback backend with explicit daemon and direct backends
> + ///
> + /// Useful for testing or custom configurations
> + #[allow(dead_code)] // Used in tests for custom backend configurations
> + pub fn with_backends(daemon: Option<RrdCachedBackend>, direct: RrdDirectBackend) -> Self {
> + Self { daemon, direct }
> + }
> +
> + /// Check if daemon is currently being used
> + #[allow(dead_code)] // Used for debugging/monitoring daemon status
> + pub fn is_using_daemon(&self) -> bool {
> + self.daemon.is_some()
> + }
> +
> + /// Disable daemon mode and switch to direct mode only
> + ///
> + /// Called automatically when daemon operations fail
> + fn disable_daemon(&mut self) {
> + if self.daemon.is_some() {
> + tracing::warn!("Disabling daemon mode, switching to direct file writes");
> + self.daemon = None;
> + }
> + }
> +}
> +
> +#[async_trait]
> +impl super::super::backend::RrdBackend for RrdFallbackBackend {
> + async fn update(&mut self, file_path: &Path, data: &str) -> Result<()> {
> + // Try daemon first if available
> + if let Some(daemon) = &mut self.daemon {
> + match daemon.update(file_path, data).await {
> + Ok(()) => {
> + tracing::trace!("Updated RRD via daemon (fallback backend)");
> + return Ok(());
> + }
> + Err(e) => {
> + tracing::warn!("Daemon update failed, falling back to direct: {}", e);
> + self.disable_daemon();
Currently, we disable the daemon permanently here after a single failure.
The C implementation, as far as I can tell, retries the daemon on every
update call. I think it's fine to go with this for now, but it should then
be noted as a difference in the README, and is maybe something to change
in the future.
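If we do want retries later, a small cool-down policy would avoid both
permanent disabling and hammering a dead socket on every update. Untested
sketch (type and method names are mine):

```rust
use std::time::{Duration, Instant};

/// Retry policy: remember when the daemon last failed and allow a new
/// attempt once a cool-down has elapsed, instead of disabling it forever.
struct DaemonRetry {
    last_failure: Option<Instant>,
    cooldown: Duration,
}

impl DaemonRetry {
    fn new(cooldown: Duration) -> Self {
        Self { last_failure: None, cooldown }
    }

    /// Record a failed daemon operation.
    fn record_failure(&mut self) {
        self.last_failure = Some(Instant::now());
    }

    /// Should we try the daemon (again) right now?
    fn should_try(&self) -> bool {
        match self.last_failure {
            None => true,
            Some(t) => t.elapsed() >= self.cooldown,
        }
    }
}
```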
> + }
> + }
> + }
> +
> + // Fallback to direct
> + self.direct
> + .update(file_path, data)
> + .await
> + .context("Both daemon and direct update failed")
> + }
> +
> + async fn create(
> + &mut self,
> + file_path: &Path,
> + schema: &RrdSchema,
> + start_timestamp: i64,
> + ) -> Result<()> {
> + // Try daemon first if available
> + if let Some(daemon) = &mut self.daemon {
> + match daemon.create(file_path, schema, start_timestamp).await {
> + Ok(()) => {
> + tracing::trace!("Created RRD via daemon (fallback backend)");
> + return Ok(());
> + }
> + Err(e) => {
> + tracing::warn!("Daemon create failed, falling back to direct: {}", e);
> + self.disable_daemon();
> + }
> + }
> + }
> +
> + // Fallback to direct
> + self.direct
> + .create(file_path, schema, start_timestamp)
> + .await
> + .context("Both daemon and direct create failed")
> + }
> +
> + async fn flush(&mut self) -> Result<()> {
> + // Only flush if using daemon
> + if let Some(daemon) = &mut self.daemon {
> + match daemon.flush().await {
> + Ok(()) => return Ok(()),
> + Err(e) => {
> + tracing::warn!("Daemon flush failed: {}", e);
> + self.disable_daemon();
> + }
> + }
> + }
> +
> + // Direct backend flush is a no-op
> + self.direct.flush().await
> + }
> +
> + async fn is_available(&self) -> bool {
> + // Always available - either daemon or direct will work
> + true
> + }
> +
> + fn name(&self) -> &str {
> + if self.daemon.is_some() {
> + "fallback(daemon+direct)"
> + } else {
> + "fallback(direct-only)"
> + }
> + }
> +}
> +
> +#[cfg(test)]
> +mod tests {
> + use super::*;
> + use crate::backend::RrdBackend;
> + use crate::schema::{RrdFormat, RrdSchema};
> + use std::path::PathBuf;
> + use tempfile::TempDir;
> +
> + /// Create a temporary directory for RRD files
> + fn setup_temp_dir() -> TempDir {
> + TempDir::new().expect("Failed to create temp directory")
> + }
> +
> + /// Create a test RRD file path
> + fn test_rrd_path(dir: &TempDir, name: &str) -> PathBuf {
> + dir.path().join(format!("{}.rrd", name))
> + }
> +
> + #[test]
> + fn test_fallback_backend_without_daemon() {
> + let direct = RrdDirectBackend::new();
> + let backend = RrdFallbackBackend::with_backends(None, direct);
> +
> + assert!(!backend.is_using_daemon());
> + assert_eq!(backend.name(), "fallback(direct-only)");
> + }
> +
> + #[tokio::test]
> + async fn test_fallback_backend_direct_mode_operations() {
> + let temp_dir = setup_temp_dir();
> + let rrd_path = test_rrd_path(&temp_dir, "fallback_test");
> +
> + // Create fallback backend without daemon (direct mode only)
> + let direct = RrdDirectBackend::new();
> + let mut backend = RrdFallbackBackend::with_backends(None, direct);
> +
> + assert!(!backend.is_using_daemon(), "Should not be using daemon");
> + assert_eq!(backend.name(), "fallback(direct-only)");
> +
> + // Test create and update operations work in direct mode
> + let schema = RrdSchema::storage(RrdFormat::Pve2);
> + let start_time = 1704067200;
> +
> + let result = backend.create(&rrd_path, &schema, start_time).await;
> + assert!(result.is_ok(), "Create should work in direct mode");
> +
> + let result = backend.update(&rrd_path, "N:1000:500").await;
> + assert!(result.is_ok(), "Update should work in direct mode");
> + }
> +
> + #[tokio::test]
> + async fn test_fallback_backend_is_always_available() {
> + let direct = RrdDirectBackend::new();
> + let backend = RrdFallbackBackend::with_backends(None, direct);
> +
> + // Fallback backend should always be available (even without daemon)
> + assert!(
> + backend.is_available().await,
> + "Fallback backend should always be available"
> + );
> + }
> +
> + #[tokio::test]
> + async fn test_fallback_backend_flush_without_daemon() {
> + let direct = RrdDirectBackend::new();
> + let mut backend = RrdFallbackBackend::with_backends(None, direct);
> +
> + // Flush should succeed even without daemon (no-op for direct)
> + let result = backend.flush().await;
> + assert!(result.is_ok(), "Flush should succeed without daemon");
> + }
> +}
> diff --git a/src/pmxcfs-rs/pmxcfs-rrd/src/daemon.rs b/src/pmxcfs-rs/pmxcfs-rrd/src/daemon.rs
> new file mode 100644
> index 00000000..e53b6dad
> --- /dev/null
> +++ b/src/pmxcfs-rs/pmxcfs-rrd/src/daemon.rs
> @@ -0,0 +1,140 @@
> +/// RRDCached Daemon Client (wrapper around rrdcached-client crate)
> +///
> +/// This module provides a thin wrapper around the rrdcached-client crate.
> +use anyhow::{Context, Result};
> +use std::path::Path;
> +
> +/// Wrapper around rrdcached-client
> +#[allow(dead_code)] // Used in backend_daemon.rs via module-level access
> +pub struct RrdCachedClient {
> + pub(crate) client:
> + tokio::sync::Mutex<rrdcached_client::RRDCachedClient<tokio::net::UnixStream>>,
> +}
> +
> +impl RrdCachedClient {
> + /// Connect to rrdcached daemon via Unix socket
> + ///
> + /// # Arguments
> + /// * `socket_path` - Path to rrdcached Unix socket (default: /var/run/rrdcached.sock)
> + #[allow(dead_code)] // Used via backend modules
> + pub async fn connect<P: AsRef<Path>>(socket_path: P) -> Result<Self> {
> + let socket_path = socket_path.as_ref().to_string_lossy().to_string();
> +
> + tracing::debug!("Connecting to rrdcached at {}", socket_path);
> +
> + // Connect to daemon (async operation)
> + let client = rrdcached_client::RRDCachedClient::connect_unix(&socket_path)
> + .await
> + .with_context(|| format!("Failed to connect to rrdcached: {socket_path}"))?;
> +
> + tracing::info!("Connected to rrdcached at {}", socket_path);
> +
> + Ok(Self {
> + client: tokio::sync::Mutex::new(client),
> + })
> + }
> +
> + /// Update RRD file via rrdcached
> + ///
> + /// # Arguments
> + /// * `file_path` - Full path to RRD file
> + /// * `data` - Update data in format "timestamp:value1:value2:..."
> + #[allow(dead_code)] // Used via backend modules
> + pub async fn update<P: AsRef<Path>>(&self, file_path: P, data: &str) -> Result<()> {
There is a lot of duplication between this function and
RrdCachedBackend::update(); I think this can be refactored a bit.
> + let file_path = file_path.as_ref();
> +
> + // Parse the update data
> + let parts: Vec<&str> = data.split(':').collect();
> + if parts.len() < 2 {
> + anyhow::bail!("Invalid update data format: {data}");
> + }
> +
> + let timestamp = if parts[0] == "N" {
> + None
> + } else {
> + Some(
> + parts[0]
> + .parse::<usize>()
> + .with_context(|| format!("Invalid timestamp: {}", parts[0]))?,
> + )
> + };
> +
> + let values: Vec<f64> = parts[1..]
> + .iter()
> + .map(|v| {
> + if *v == "U" {
> + Ok(f64::NAN)
> + } else {
> + v.parse::<f64>()
> + .with_context(|| format!("Invalid value: {v}"))
While we fail here when parsing non-"U" values,
RrdCachedBackend::update() treats many invalid tokens as
Datum::Unspecified and succeeds.
That makes the behavior depend on which backend is active.
We should stick to one rule.
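Concretely, both backends could share one parser so the rule is decided in
exactly one place. Untested sketch of the strict variant (function name and
error type are mine; the real code would return anyhow::Result and the
client's own value type):

```rust
/// Parse "timestamp:value1:value2:..." with one shared rule:
/// the timestamp is "N" (meaning now) or an integer, and values are
/// either "U" (unknown, mapped to NaN) or must parse as f64.
fn parse_update(data: &str) -> Result<(Option<u64>, Vec<f64>), String> {
    let mut parts = data.split(':');
    let ts = match parts.next() {
        Some("N") => None,
        Some(t) => Some(t.parse::<u64>().map_err(|_| format!("invalid timestamp: {t}"))?),
        None => return Err("empty update".into()),
    };
    let values: Vec<f64> = parts
        .map(|v| {
            if v == "U" {
                Ok(f64::NAN)
            } else {
                v.parse::<f64>().map_err(|_| format!("invalid value: {v}"))
            }
        })
        .collect::<Result<_, _>>()?;
    if values.is_empty() {
        return Err("no values".into());
    }
    Ok((ts, values))
}
```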
> + }
> + })
> + .collect::<Result<Vec<_>>>()?;
> +
> + // Get file path without .rrd extension (rrdcached-client adds it)
> + let path_str = file_path.to_string_lossy();
> + let path_without_ext = path_str.strip_suffix(".rrd").unwrap_or(&path_str);
> +
> + // Send update via rrdcached
> + let mut client = self.client.lock().await;
> + client
> + .update(path_without_ext, timestamp, values)
> + .await
> + .context("Failed to send update to rrdcached")?;
> +
> + tracing::trace!("Updated RRD via daemon: {:?} -> {}", file_path, data);
> +
> + Ok(())
> + }
> +
> + /// Create RRD file via rrdcached
> + #[allow(dead_code)] // Used via backend modules
> + pub async fn create(&self, args: rrdcached_client::create::CreateArguments) -> Result<()> {
> + let mut client = self.client.lock().await;
> + client
> + .create(args)
> + .await
> + .context("Failed to create RRD via rrdcached")?;
> + Ok(())
> + }
> +
> + /// Flush all pending updates
> + #[allow(dead_code)] // Used via backend modules
> + pub async fn flush(&self) -> Result<()> {
> + let mut client = self.client.lock().await;
> + client
> + .flush_all()
> + .await
> + .context("Failed to flush rrdcached")?;
> +
> + tracing::debug!("Flushed all RRD files");
> +
> + Ok(())
> + }
> +}
> +
> +#[cfg(test)]
> +mod tests {
> + use super::*;
> +
> + #[tokio::test]
> + #[ignore] // Only runs if rrdcached daemon is actually running
> + async fn test_connect_to_daemon() {
> + // This test requires a running rrdcached daemon
> + let result = RrdCachedClient::connect("/var/run/rrdcached.sock").await;
> +
> + match result {
> + Ok(client) => {
> + // Try to flush (basic connectivity test)
> + let result = client.flush().await;
> + println!("RRDCached flush result: {:?}", result);
> +
> + // Connection successful (flush may fail if no files, that's OK)
> + assert!(result.is_ok() || result.is_err());
> + }
> + Err(e) => {
> + println!("Note: rrdcached not running (expected in test env): {}", e);
> + }
> + }
> + }
> +}
> diff --git a/src/pmxcfs-rs/pmxcfs-rrd/src/key_type.rs b/src/pmxcfs-rs/pmxcfs-rrd/src/key_type.rs
> new file mode 100644
> index 00000000..54021c14
> --- /dev/null
> +++ b/src/pmxcfs-rs/pmxcfs-rrd/src/key_type.rs
> @@ -0,0 +1,313 @@
> +/// RRD Key Type Parsing and Path Resolution
> +///
> +/// This module handles parsing RRD status update keys and mapping them
> +/// to the appropriate file paths and schemas.
> +use anyhow::{Context, Result};
> +use std::path::{Path, PathBuf};
> +
> +use super::schema::{RrdFormat, RrdSchema};
> +
> +/// RRD key types for routing to correct schema and path
> +///
> +/// This enum represents the different types of RRD metrics that pmxcfs tracks:
> +/// - Node metrics (CPU, memory, network for a node)
> +/// - VM metrics (CPU, memory, disk, network for a VM/CT)
> +/// - Storage metrics (total/used space for a storage)
> +#[derive(Debug, Clone, PartialEq, Eq)]
> +pub(crate) enum RrdKeyType {
> + /// Node metrics: pve2-node/{nodename} or pve-node-9.0/{nodename}
> + Node { nodename: String, format: RrdFormat },
> + /// VM metrics: pve2.3-vm/{vmid} or pve-vm-9.0/{vmid}
> + Vm { vmid: String, format: RrdFormat },
> + /// Storage metrics: pve2-storage/{node}/{storage} or pve-storage-9.0/{node}/{storage}
> + Storage {
> + nodename: String,
> + storage: String,
> + format: RrdFormat,
> + },
> +}
> +
> +impl RrdKeyType {
> + /// Parse RRD key from status update key
> + ///
> + /// Supported formats:
> + /// - "pve2-node/node1" → Node { nodename: "node1", format: Pve2 }
> + /// - "pve-node-9.0/node1" → Node { nodename: "node1", format: Pve9_0 }
> + /// - "pve2.3-vm/100" → Vm { vmid: "100", format: Pve2 }
> + /// - "pve-storage-9.0/node1/local" → Storage { nodename: "node1", storage: "local", format: Pve9_0 }
> + pub(crate) fn parse(key: &str) -> Result<Self> {
> + let parts: Vec<&str> = key.split('/').collect();
> +
> + if parts.is_empty() {
> + anyhow::bail!("Empty RRD key");
> + }
> +
> + match parts[0] {
> + "pve2-node" => {
> + let nodename = parts.get(1).context("Missing nodename")?.to_string();
> + Ok(RrdKeyType::Node {
> + nodename,
> + format: RrdFormat::Pve2,
> + })
> + }
> + prefix if prefix.starts_with("pve-node-") => {
A key like pve-node-9.1/... would be treated as 9.0, so we lose the
ability to distinguish future formats.
Shouldn't we parse the version suffix? Otherwise, please explicitly
document the assumption.
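What I have in mind is accepting only the versions we actually know and
rejecting anything newer, roughly like this (untested sketch; the real code
would match on RrdFormat, I use a string stand-in here):

```rust
/// Parse the version suffix of a node key prefix, accepting only known
/// format versions instead of treating every "pve-node-*" as 9.0.
fn parse_node_format(prefix: &str) -> Result<&'static str, String> {
    match prefix.strip_prefix("pve-node-") {
        Some("9.0") => Ok("Pve9_0"),
        Some(other) => Err(format!("unknown node format version: {other}")),
        None => Err(format!("not a pve-node key: {prefix}")),
    }
}
```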
> + let nodename = parts.get(1).context("Missing nodename")?.to_string();
> + Ok(RrdKeyType::Node {
> + nodename,
> + format: RrdFormat::Pve9_0,
> + })
> + }
> + "pve2.3-vm" => {
> + let vmid = parts.get(1).context("Missing vmid")?.to_string();
> + Ok(RrdKeyType::Vm {
> + vmid,
> + format: RrdFormat::Pve2,
> + })
> + }
> + prefix if prefix.starts_with("pve-vm-") => {
> + let vmid = parts.get(1).context("Missing vmid")?.to_string();
> + Ok(RrdKeyType::Vm {
> + vmid,
> + format: RrdFormat::Pve9_0,
> + })
> + }
> + "pve2-storage" => {
> + let nodename = parts.get(1).context("Missing nodename")?.to_string();
> + let storage = parts.get(2).context("Missing storage")?.to_string();
> + Ok(RrdKeyType::Storage {
> + nodename,
> + storage,
> + format: RrdFormat::Pve2,
> + })
> + }
> + prefix if prefix.starts_with("pve-storage-") => {
> + let nodename = parts.get(1).context("Missing nodename")?.to_string();
> + let storage = parts.get(2).context("Missing storage")?.to_string();
> + Ok(RrdKeyType::Storage {
> + nodename,
> + storage,
> + format: RrdFormat::Pve9_0,
> + })
> + }
> + _ => anyhow::bail!("Unknown RRD key format: {key}"),
> + }
> + }
> +
> + /// Get the RRD file path for this key type
> + ///
> + /// Always returns paths using the current format (9.0), regardless of the input format.
> + /// This enables transparent format migration: old PVE8 nodes can send `pve2-node/` keys,
> + /// and they'll be written to `pve-node-9.0/` files automatically.
> + ///
> + /// # Format Migration Strategy
> + ///
> + /// The C implementation always creates files in the current format directory
> + /// (see status.c:1287). This Rust implementation follows the same approach:
> + /// - Input: `pve2-node/node1` → Output: `/var/lib/rrdcached/db/pve-node-9.0/node1`
> + /// - Input: `pve-node-9.0/node1` → Output: `/var/lib/rrdcached/db/pve-node-9.0/node1`
> + ///
> + /// This allows rolling upgrades where old and new nodes coexist in the same cluster.
> + pub(crate) fn file_path(&self, base_dir: &Path) -> PathBuf {
> + match self {
> + RrdKeyType::Node { nodename, .. } => {
> + // Always use current format path
> + base_dir.join("pve-node-9.0").join(nodename)
If nodename or storage contains ".." or a "/", base_dir could be escaped
and the write could happen anywhere.
I think we need to validate/sanitize these inputs if that isn't already
done elsewhere. Ideally already as part of RrdKeyType?
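For example, RrdKeyType::parse() could run every component through a check
along these lines before the key is ever joined onto base_dir (untested
sketch, function name is mine):

```rust
/// Reject path components that could escape base_dir when joined:
/// empty names, "." / "..", separators, and NUL bytes.
fn is_safe_component(name: &str) -> bool {
    !name.is_empty()
        && name != "."
        && name != ".."
        && !name.contains('/')
        && !name.contains('\\')
        && !name.contains('\0')
}
```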
> + }
> + RrdKeyType::Vm { vmid, .. } => {
> + // Always use current format path
> + base_dir.join("pve-vm-9.0").join(vmid)
> + }
> + RrdKeyType::Storage {
> + nodename, storage, ..
> + } => {
> + // Always use current format path
> + base_dir
> + .join("pve-storage-9.0")
> + .join(nodename)
> + .join(storage)
> + }
> + }
> + }
> +
> + /// Get the source format from the input key
> + ///
> + /// This is used for data transformation (padding/truncation).
> + pub(crate) fn source_format(&self) -> RrdFormat {
> + match self {
> + RrdKeyType::Node { format, .. }
> + | RrdKeyType::Vm { format, .. }
> + | RrdKeyType::Storage { format, .. } => *format,
> + }
> + }
> +
> + /// Get the target RRD schema (always current format)
> + ///
> + /// Files are always created using the current format (Pve9_0),
> + /// regardless of the source format in the key.
> + pub(crate) fn schema(&self) -> RrdSchema {
> + match self {
> + RrdKeyType::Node { .. } => RrdSchema::node(RrdFormat::Pve9_0),
> + RrdKeyType::Vm { .. } => RrdSchema::vm(RrdFormat::Pve9_0),
> + RrdKeyType::Storage { .. } => RrdSchema::storage(RrdFormat::Pve9_0),
> + }
> + }
> +}
> +
> +#[cfg(test)]
> +mod tests {
> + use super::*;
> +
> + #[test]
> + fn test_parse_node_keys() {
> + let key = RrdKeyType::parse("pve2-node/testnode").unwrap();
> + assert_eq!(
> + key,
> + RrdKeyType::Node {
> + nodename: "testnode".to_string(),
> + format: RrdFormat::Pve2
> + }
> + );
> +
> + let key = RrdKeyType::parse("pve-node-9.0/testnode").unwrap();
> + assert_eq!(
> + key,
> + RrdKeyType::Node {
> + nodename: "testnode".to_string(),
> + format: RrdFormat::Pve9_0
> + }
> + );
> + }
> +
> + #[test]
> + fn test_parse_vm_keys() {
> + let key = RrdKeyType::parse("pve2.3-vm/100").unwrap();
> + assert_eq!(
> + key,
> + RrdKeyType::Vm {
> + vmid: "100".to_string(),
> + format: RrdFormat::Pve2
> + }
> + );
> +
> + let key = RrdKeyType::parse("pve-vm-9.0/100").unwrap();
> + assert_eq!(
> + key,
> + RrdKeyType::Vm {
> + vmid: "100".to_string(),
> + format: RrdFormat::Pve9_0
> + }
> + );
> + }
> +
> + #[test]
> + fn test_parse_storage_keys() {
> + let key = RrdKeyType::parse("pve2-storage/node1/local").unwrap();
> + assert_eq!(
> + key,
> + RrdKeyType::Storage {
> + nodename: "node1".to_string(),
> + storage: "local".to_string(),
> + format: RrdFormat::Pve2
> + }
> + );
> +
> + let key = RrdKeyType::parse("pve-storage-9.0/node1/local").unwrap();
> + assert_eq!(
> + key,
> + RrdKeyType::Storage {
> + nodename: "node1".to_string(),
> + storage: "local".to_string(),
> + format: RrdFormat::Pve9_0
> + }
> + );
> + }
> +
> + #[test]
> + fn test_file_paths() {
> + let base = Path::new("/var/lib/rrdcached/db");
> +
> + // New format key → new format path
> + let key = RrdKeyType::Node {
> + nodename: "node1".to_string(),
> + format: RrdFormat::Pve9_0,
> + };
> + assert_eq!(
> + key.file_path(base),
> + PathBuf::from("/var/lib/rrdcached/db/pve-node-9.0/node1")
> + );
> +
> + // Old format key → new format path (auto-upgrade!)
> + let key = RrdKeyType::Node {
> + nodename: "node1".to_string(),
> + format: RrdFormat::Pve2,
> + };
> + assert_eq!(
> + key.file_path(base),
> + PathBuf::from("/var/lib/rrdcached/db/pve-node-9.0/node1"),
> + "Old format keys should create new format files"
> + );
> +
> + // VM: Old format → new format
> + let key = RrdKeyType::Vm {
> + vmid: "100".to_string(),
> + format: RrdFormat::Pve2,
> + };
> + assert_eq!(
> + key.file_path(base),
> + PathBuf::from("/var/lib/rrdcached/db/pve-vm-9.0/100"),
> + "Old VM format should upgrade to new format"
> + );
> +
> + // Storage: Always uses current format
> + let key = RrdKeyType::Storage {
> + nodename: "node1".to_string(),
> + storage: "local".to_string(),
> + format: RrdFormat::Pve2,
> + };
> + assert_eq!(
> + key.file_path(base),
> + PathBuf::from("/var/lib/rrdcached/db/pve-storage-9.0/node1/local"),
> + "Old storage format should upgrade to new format"
> + );
> + }
> +
> + #[test]
> + fn test_source_format() {
> + let key = RrdKeyType::Node {
> + nodename: "node1".to_string(),
> + format: RrdFormat::Pve2,
> + };
> + assert_eq!(key.source_format(), RrdFormat::Pve2);
> +
> + let key = RrdKeyType::Vm {
> + vmid: "100".to_string(),
> + format: RrdFormat::Pve9_0,
> + };
> + assert_eq!(key.source_format(), RrdFormat::Pve9_0);
> + }
> +
> + #[test]
> + fn test_schema_always_current_format() {
> + // Even with Pve2 source format, schema should return Pve9_0
> + let key = RrdKeyType::Node {
> + nodename: "node1".to_string(),
> + format: RrdFormat::Pve2,
> + };
> + let schema = key.schema();
> + assert_eq!(
> + schema.format,
> + RrdFormat::Pve9_0,
> + "Schema should always use current format"
> + );
> + assert_eq!(schema.column_count(), 19, "Should have Pve9_0 column count");
> +
> + // Pve9_0 source also gets Pve9_0 schema
> + let key = RrdKeyType::Node {
> + nodename: "node1".to_string(),
> + format: RrdFormat::Pve9_0,
> + };
> + let schema = key.schema();
> + assert_eq!(schema.format, RrdFormat::Pve9_0);
> + assert_eq!(schema.column_count(), 19);
> + }
> +}
> diff --git a/src/pmxcfs-rs/pmxcfs-rrd/src/lib.rs b/src/pmxcfs-rs/pmxcfs-rrd/src/lib.rs
> new file mode 100644
> index 00000000..7a439676
> --- /dev/null
> +++ b/src/pmxcfs-rs/pmxcfs-rrd/src/lib.rs
> @@ -0,0 +1,21 @@
> +/// RRD (Round-Robin Database) Persistence Module
> +///
> +/// This module provides RRD file persistence compatible with the C pmxcfs implementation.
> +/// It handles:
> +/// - RRD file creation with proper schemas (node, VM, storage)
> +/// - RRD file updates (writing metrics to disk)
> +/// - Multiple backend strategies:
> +/// - Daemon mode: High-performance batched updates via rrdcached
> +/// - Direct mode: Reliable fallback using direct file writes
> +/// - Fallback mode: Tries daemon first, falls back to direct (matches C behavior)
> +/// - Version management (pve2 vs pve-9.0 formats)
> +///
> +/// The implementation matches the C behavior in status.c where it attempts
> +/// daemon updates first, then falls back to direct file operations.
> +mod backend;
> +mod daemon;
> +mod key_type;
> +pub(crate) mod schema;
> +mod writer;
> +
> +pub use writer::RrdWriter;
> diff --git a/src/pmxcfs-rs/pmxcfs-rrd/src/schema.rs b/src/pmxcfs-rs/pmxcfs-rrd/src/schema.rs
> new file mode 100644
> index 00000000..d449bd6e
> --- /dev/null
> +++ b/src/pmxcfs-rs/pmxcfs-rrd/src/schema.rs
> @@ -0,0 +1,577 @@
> +/// RRD Schema Definitions
> +///
> +/// Defines RRD database schemas matching the C pmxcfs implementation.
> +/// Each schema specifies data sources (DS) and round-robin archives (RRA).
> +use std::fmt;
> +
> +/// RRD format version
> +#[derive(Debug, Clone, Copy, PartialEq, Eq)]
> +pub enum RrdFormat {
> + /// Legacy pve2 format (12 columns for node, 10 for VM, 2 for storage)
> + Pve2,
> + /// New pve-9.0 format (19 columns for node, 17 for VM, 2 for storage)
> + Pve9_0,
> +}
> +
> +/// RRD data source definition
> +#[derive(Debug, Clone)]
> +pub struct RrdDataSource {
> + /// Data source name
> + pub name: &'static str,
> + /// Data source type (GAUGE, COUNTER, DERIVE, ABSOLUTE)
> + pub ds_type: &'static str,
> + /// Heartbeat (seconds before marking as unknown)
> + pub heartbeat: u32,
> + /// Minimum value (U for unknown)
> + pub min: &'static str,
> + /// Maximum value (U for unknown)
> + pub max: &'static str,
> +}
> +
> +impl RrdDataSource {
> + /// Create GAUGE data source with no min/max limits
> + pub(super) const fn gauge(name: &'static str) -> Self {
> + Self {
> + name,
> + ds_type: "GAUGE",
> + heartbeat: 120,
> + min: "0",
> + max: "U",
> + }
> + }
> +
> + /// Create DERIVE data source (for counters that can wrap)
> + pub(super) const fn derive(name: &'static str) -> Self {
> + Self {
> + name,
> + ds_type: "DERIVE",
> + heartbeat: 120,
> + min: "0",
> + max: "U",
> + }
> + }
> +
> + /// Format as RRD command line argument
> + ///
> + /// Matches C implementation format: "DS:name:TYPE:heartbeat:min:max"
> + /// (see rrd_def_node in src/pmxcfs/status.c:1100)
> + ///
> + /// Currently unused but kept for debugging/testing and C format compatibility.
> + #[allow(dead_code)]
> + pub(super) fn to_arg(&self) -> String {
> + format!(
> + "DS:{}:{}:{}:{}:{}",
> + self.name, self.ds_type, self.heartbeat, self.min, self.max
> + )
> + }
> +}
> +
> +/// RRD schema with data sources and archives
> +#[derive(Debug, Clone)]
> +pub struct RrdSchema {
> + /// RRD format version
> + pub format: RrdFormat,
> + /// Data sources
> + pub data_sources: Vec<RrdDataSource>,
> + /// Round-robin archives (RRA definitions)
> + pub archives: Vec<String>,
> +}
> +
> +impl RrdSchema {
> + /// Create node RRD schema
> + pub fn node(format: RrdFormat) -> Self {
> + let data_sources = match format {
> + RrdFormat::Pve2 => vec![
> + RrdDataSource::gauge("loadavg"),
> + RrdDataSource::gauge("maxcpu"),
> + RrdDataSource::gauge("cpu"),
> + RrdDataSource::gauge("iowait"),
> + RrdDataSource::gauge("memtotal"),
> + RrdDataSource::gauge("memused"),
> + RrdDataSource::gauge("swaptotal"),
> + RrdDataSource::gauge("swapused"),
> + RrdDataSource::gauge("roottotal"),
> + RrdDataSource::gauge("rootused"),
> + RrdDataSource::derive("netin"),
> + RrdDataSource::derive("netout"),
> + ],
> + RrdFormat::Pve9_0 => vec![
> + RrdDataSource::gauge("loadavg"),
> + RrdDataSource::gauge("maxcpu"),
> + RrdDataSource::gauge("cpu"),
> + RrdDataSource::gauge("iowait"),
> + RrdDataSource::gauge("memtotal"),
> + RrdDataSource::gauge("memused"),
> + RrdDataSource::gauge("swaptotal"),
> + RrdDataSource::gauge("swapused"),
> + RrdDataSource::gauge("roottotal"),
> + RrdDataSource::gauge("rootused"),
> + RrdDataSource::derive("netin"),
> + RrdDataSource::derive("netout"),
> + RrdDataSource::gauge("memavailable"),
> + RrdDataSource::gauge("arcsize"),
> + RrdDataSource::gauge("pressurecpusome"),
> + RrdDataSource::gauge("pressureiosome"),
> + RrdDataSource::gauge("pressureiofull"),
> + RrdDataSource::gauge("pressurememorysome"),
> + RrdDataSource::gauge("pressurememoryfull"),
> + ],
> + };
> +
> + Self {
> + format,
> + data_sources,
> + archives: Self::default_archives(),
> + }
> + }
> +
> + /// Create VM RRD schema
> + pub fn vm(format: RrdFormat) -> Self {
> + let data_sources = match format {
> + RrdFormat::Pve2 => vec![
> + RrdDataSource::gauge("maxcpu"),
> + RrdDataSource::gauge("cpu"),
> + RrdDataSource::gauge("maxmem"),
> + RrdDataSource::gauge("mem"),
> + RrdDataSource::gauge("maxdisk"),
> + RrdDataSource::gauge("disk"),
> + RrdDataSource::derive("netin"),
> + RrdDataSource::derive("netout"),
> + RrdDataSource::derive("diskread"),
> + RrdDataSource::derive("diskwrite"),
> + ],
> + RrdFormat::Pve9_0 => vec![
> + RrdDataSource::gauge("maxcpu"),
> + RrdDataSource::gauge("cpu"),
> + RrdDataSource::gauge("maxmem"),
> + RrdDataSource::gauge("mem"),
> + RrdDataSource::gauge("maxdisk"),
> + RrdDataSource::gauge("disk"),
> + RrdDataSource::derive("netin"),
> + RrdDataSource::derive("netout"),
> + RrdDataSource::derive("diskread"),
> + RrdDataSource::derive("diskwrite"),
> + RrdDataSource::gauge("memhost"),
> + RrdDataSource::gauge("pressurecpusome"),
> + RrdDataSource::gauge("pressurecpufull"),
> + RrdDataSource::gauge("pressureiosome"),
> + RrdDataSource::gauge("pressureiofull"),
> + RrdDataSource::gauge("pressurememorysome"),
> + RrdDataSource::gauge("pressurememoryfull"),
> + ],
> + };
> +
> + Self {
> + format,
> + data_sources,
> + archives: Self::default_archives(),
> + }
> + }
> +
> + /// Create storage RRD schema
> + pub fn storage(format: RrdFormat) -> Self {
> + let data_sources = vec![RrdDataSource::gauge("total"), RrdDataSource::gauge("used")];
> +
> + Self {
> + format,
> + data_sources,
> + archives: Self::default_archives(),
> + }
> + }
> +
> + /// Default RRA (Round-Robin Archive) definitions
> + ///
> + /// These match the C implementation's archives for 60-second step size:
> + /// - RRA:AVERAGE:0.5:1:1440 -> 1 min * 1440 => 1 day
> + /// - RRA:AVERAGE:0.5:30:1440 -> 30 min * 1440 => 30 days
> + /// - RRA:AVERAGE:0.5:360:1440 -> 6 hours * 1440 => 360 days (~1 year)
> + /// - RRA:AVERAGE:0.5:10080:570 -> 1 week * 570 => ~10 years
> + /// - RRA:MAX:0.5:1:1440 -> 1 min * 1440 => 1 day
> + /// - RRA:MAX:0.5:30:1440 -> 30 min * 1440 => 30 days
> + /// - RRA:MAX:0.5:360:1440 -> 6 hours * 1440 => 360 days (~1 year)
> + /// - RRA:MAX:0.5:10080:570 -> 1 week * 570 => ~10 years
> + pub(super) fn default_archives() -> Vec<String> {
> + vec![
> + "RRA:AVERAGE:0.5:1:1440".to_string(),
> + "RRA:AVERAGE:0.5:30:1440".to_string(),
> + "RRA:AVERAGE:0.5:360:1440".to_string(),
> + "RRA:AVERAGE:0.5:10080:570".to_string(),
> + "RRA:MAX:0.5:1:1440".to_string(),
> + "RRA:MAX:0.5:30:1440".to_string(),
> + "RRA:MAX:0.5:360:1440".to_string(),
> + "RRA:MAX:0.5:10080:570".to_string(),
> + ]
> + }
> +
> + /// Get number of data sources
> + pub fn column_count(&self) -> usize {
> + self.data_sources.len()
> + }
> +}
> +
> +impl fmt::Display for RrdSchema {
> + fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
> + write!(
> + f,
> + "{:?} schema with {} data sources",
> + self.format,
> + self.column_count()
> + )
> + }
> +}
> +
> +#[cfg(test)]
> +mod tests {
> + use super::*;
> +
> + fn assert_ds_properties(
> + ds: &RrdDataSource,
> + expected_name: &str,
> + expected_type: &str,
> + index: usize,
> + ) {
> + assert_eq!(ds.name, expected_name, "DS[{}] name mismatch", index);
> + assert_eq!(ds.ds_type, expected_type, "DS[{}] type mismatch", index);
> + assert_eq!(ds.heartbeat, 120, "DS[{}] heartbeat should be 120", index);
> + assert_eq!(ds.min, "0", "DS[{}] min should be 0", index);
> + assert_eq!(ds.max, "U", "DS[{}] max should be U", index);
> + }
> +
> + #[test]
> + fn test_datasource_construction() {
> + let gauge_ds = RrdDataSource::gauge("cpu");
> + assert_eq!(gauge_ds.name, "cpu");
> + assert_eq!(gauge_ds.ds_type, "GAUGE");
> + assert_eq!(gauge_ds.heartbeat, 120);
> + assert_eq!(gauge_ds.min, "0");
> + assert_eq!(gauge_ds.max, "U");
> + assert_eq!(gauge_ds.to_arg(), "DS:cpu:GAUGE:120:0:U");
> +
> + let derive_ds = RrdDataSource::derive("netin");
> + assert_eq!(derive_ds.name, "netin");
> + assert_eq!(derive_ds.ds_type, "DERIVE");
> + assert_eq!(derive_ds.heartbeat, 120);
> + assert_eq!(derive_ds.min, "0");
> + assert_eq!(derive_ds.max, "U");
> + assert_eq!(derive_ds.to_arg(), "DS:netin:DERIVE:120:0:U");
> + }
> +
> + #[test]
> + fn test_node_schema_pve2() {
> + let schema = RrdSchema::node(RrdFormat::Pve2);
> +
> + assert_eq!(schema.column_count(), 12);
> + assert_eq!(schema.format, RrdFormat::Pve2);
> +
> + let expected_ds = vec![
> + ("loadavg", "GAUGE"),
> + ("maxcpu", "GAUGE"),
> + ("cpu", "GAUGE"),
> + ("iowait", "GAUGE"),
> + ("memtotal", "GAUGE"),
> + ("memused", "GAUGE"),
> + ("swaptotal", "GAUGE"),
> + ("swapused", "GAUGE"),
> + ("roottotal", "GAUGE"),
> + ("rootused", "GAUGE"),
> + ("netin", "DERIVE"),
> + ("netout", "DERIVE"),
> + ];
> +
> + for (i, (name, ds_type)) in expected_ds.iter().enumerate() {
> + assert_ds_properties(&schema.data_sources[i], name, ds_type, i);
> + }
> + }
> +
> + #[test]
> + fn test_node_schema_pve9() {
> + let schema = RrdSchema::node(RrdFormat::Pve9_0);
> +
> + assert_eq!(schema.column_count(), 19);
> + assert_eq!(schema.format, RrdFormat::Pve9_0);
> +
> + let pve2_schema = RrdSchema::node(RrdFormat::Pve2);
> + for i in 0..12 {
> + assert_eq!(
> + schema.data_sources[i].name, pve2_schema.data_sources[i].name,
> + "First 12 DS should match pve2"
> + );
> + assert_eq!(
> + schema.data_sources[i].ds_type, pve2_schema.data_sources[i].ds_type,
> + "First 12 DS types should match pve2"
> + );
> + }
> +
> + let pve9_additions = vec![
> + ("memavailable", "GAUGE"),
> + ("arcsize", "GAUGE"),
> + ("pressurecpusome", "GAUGE"),
> + ("pressureiosome", "GAUGE"),
> + ("pressureiofull", "GAUGE"),
> + ("pressurememorysome", "GAUGE"),
> + ("pressurememoryfull", "GAUGE"),
> + ];
> +
> + for (i, (name, ds_type)) in pve9_additions.iter().enumerate() {
> + assert_ds_properties(&schema.data_sources[12 + i], name, ds_type, 12 + i);
> + }
> + }
> +
> + #[test]
> + fn test_vm_schema_pve2() {
> + let schema = RrdSchema::vm(RrdFormat::Pve2);
> +
> + assert_eq!(schema.column_count(), 10);
> + assert_eq!(schema.format, RrdFormat::Pve2);
> +
> + let expected_ds = vec![
> + ("maxcpu", "GAUGE"),
> + ("cpu", "GAUGE"),
> + ("maxmem", "GAUGE"),
> + ("mem", "GAUGE"),
> + ("maxdisk", "GAUGE"),
> + ("disk", "GAUGE"),
> + ("netin", "DERIVE"),
> + ("netout", "DERIVE"),
> + ("diskread", "DERIVE"),
> + ("diskwrite", "DERIVE"),
> + ];
> +
> + for (i, (name, ds_type)) in expected_ds.iter().enumerate() {
> + assert_ds_properties(&schema.data_sources[i], name, ds_type, i);
> + }
> + }
> +
> + #[test]
> + fn test_vm_schema_pve9() {
> + let schema = RrdSchema::vm(RrdFormat::Pve9_0);
> +
> + assert_eq!(schema.column_count(), 17);
> + assert_eq!(schema.format, RrdFormat::Pve9_0);
> +
> + let pve2_schema = RrdSchema::vm(RrdFormat::Pve2);
> + for i in 0..10 {
> + assert_eq!(
> + schema.data_sources[i].name, pve2_schema.data_sources[i].name,
> + "First 10 DS should match pve2"
> + );
> + assert_eq!(
> + schema.data_sources[i].ds_type, pve2_schema.data_sources[i].ds_type,
> + "First 10 DS types should match pve2"
> + );
> + }
> +
> + let pve9_additions = vec![
> + ("memhost", "GAUGE"),
> + ("pressurecpusome", "GAUGE"),
> + ("pressurecpufull", "GAUGE"),
> + ("pressureiosome", "GAUGE"),
> + ("pressureiofull", "GAUGE"),
> + ("pressurememorysome", "GAUGE"),
> + ("pressurememoryfull", "GAUGE"),
> + ];
> +
> + for (i, (name, ds_type)) in pve9_additions.iter().enumerate() {
> + assert_ds_properties(&schema.data_sources[10 + i], name, ds_type, 10 + i);
> + }
> + }
> +
> + #[test]
> + fn test_storage_schema() {
> + for format in [RrdFormat::Pve2, RrdFormat::Pve9_0] {
> + let schema = RrdSchema::storage(format);
> +
> + assert_eq!(schema.column_count(), 2);
> + assert_eq!(schema.format, format);
> +
> + assert_ds_properties(&schema.data_sources[0], "total", "GAUGE", 0);
> + assert_ds_properties(&schema.data_sources[1], "used", "GAUGE", 1);
> + }
> + }
> +
> + #[test]
> + fn test_rra_archives() {
> + let expected_rras = [
> + "RRA:AVERAGE:0.5:1:1440",
> + "RRA:AVERAGE:0.5:30:1440",
> + "RRA:AVERAGE:0.5:360:1440",
> + "RRA:AVERAGE:0.5:10080:570",
> + "RRA:MAX:0.5:1:1440",
> + "RRA:MAX:0.5:30:1440",
> + "RRA:MAX:0.5:360:1440",
> + "RRA:MAX:0.5:10080:570",
> + ];
> +
> + let schemas = vec![
> + RrdSchema::node(RrdFormat::Pve2),
> + RrdSchema::node(RrdFormat::Pve9_0),
> + RrdSchema::vm(RrdFormat::Pve2),
> + RrdSchema::vm(RrdFormat::Pve9_0),
> + RrdSchema::storage(RrdFormat::Pve2),
> + RrdSchema::storage(RrdFormat::Pve9_0),
> + ];
> +
> + for schema in schemas {
> + assert_eq!(schema.archives.len(), 8);
> +
> + for (i, expected) in expected_rras.iter().enumerate() {
> + assert_eq!(
> + &schema.archives[i], expected,
> + "RRA[{}] mismatch in {:?}",
> + i, schema.format
> + );
> + }
> + }
> + }
> +
> + #[test]
> + fn test_heartbeat_consistency() {
> + let schemas = vec![
> + RrdSchema::node(RrdFormat::Pve2),
> + RrdSchema::node(RrdFormat::Pve9_0),
> + RrdSchema::vm(RrdFormat::Pve2),
> + RrdSchema::vm(RrdFormat::Pve9_0),
> + RrdSchema::storage(RrdFormat::Pve2),
> + RrdSchema::storage(RrdFormat::Pve9_0),
> + ];
> +
> + for schema in schemas {
> + for ds in &schema.data_sources {
> + assert_eq!(ds.heartbeat, 120);
> + assert_eq!(ds.min, "0");
> + assert_eq!(ds.max, "U");
> + }
> + }
> + }
> +
> + #[test]
> + fn test_gauge_vs_derive_correctness() {
> + // GAUGE: instantaneous values (CPU%, memory bytes)
> + // DERIVE: cumulative counters that can wrap (network/disk bytes)
> +
> + let node = RrdSchema::node(RrdFormat::Pve2);
> + let node_derive_indices = [10, 11]; // netin, netout
> + for (i, ds) in node.data_sources.iter().enumerate() {
> + if node_derive_indices.contains(&i) {
> + assert_eq!(
> + ds.ds_type, "DERIVE",
> + "Node DS[{}] ({}) should be DERIVE",
> + i, ds.name
> + );
> + } else {
> + assert_eq!(
> + ds.ds_type, "GAUGE",
> + "Node DS[{}] ({}) should be GAUGE",
> + i, ds.name
> + );
> + }
> + }
> +
> + let vm = RrdSchema::vm(RrdFormat::Pve2);
> + let vm_derive_indices = [6, 7, 8, 9]; // netin, netout, diskread, diskwrite
> + for (i, ds) in vm.data_sources.iter().enumerate() {
> + if vm_derive_indices.contains(&i) {
> + assert_eq!(
> + ds.ds_type, "DERIVE",
> + "VM DS[{}] ({}) should be DERIVE",
> + i, ds.name
> + );
> + } else {
> + assert_eq!(
> + ds.ds_type, "GAUGE",
> + "VM DS[{}] ({}) should be GAUGE",
> + i, ds.name
> + );
> + }
> + }
> +
> + let storage = RrdSchema::storage(RrdFormat::Pve2);
> + for ds in &storage.data_sources {
> + assert_eq!(
> + ds.ds_type, "GAUGE",
> + "Storage DS ({}) should be GAUGE",
> + ds.name
> + );
> + }
> + }
> +
> + #[test]
> + fn test_pve9_backward_compatibility() {
> + let node_pve2 = RrdSchema::node(RrdFormat::Pve2);
> + let node_pve9 = RrdSchema::node(RrdFormat::Pve9_0);
> +
> + assert!(node_pve9.column_count() > node_pve2.column_count());
> +
> + for i in 0..node_pve2.column_count() {
> + assert_eq!(
> + node_pve2.data_sources[i].name, node_pve9.data_sources[i].name,
> + "Node DS[{}] name must match between pve2 and pve9.0",
> + i
> + );
> + assert_eq!(
> + node_pve2.data_sources[i].ds_type, node_pve9.data_sources[i].ds_type,
> + "Node DS[{}] type must match between pve2 and pve9.0",
> + i
> + );
> + }
> +
> + let vm_pve2 = RrdSchema::vm(RrdFormat::Pve2);
> + let vm_pve9 = RrdSchema::vm(RrdFormat::Pve9_0);
> +
> + assert!(vm_pve9.column_count() > vm_pve2.column_count());
> +
> + for i in 0..vm_pve2.column_count() {
> + assert_eq!(
> + vm_pve2.data_sources[i].name, vm_pve9.data_sources[i].name,
> + "VM DS[{}] name must match between pve2 and pve9.0",
> + i
> + );
> + assert_eq!(
> + vm_pve2.data_sources[i].ds_type, vm_pve9.data_sources[i].ds_type,
> + "VM DS[{}] type must match between pve2 and pve9.0",
> + i
> + );
> + }
> +
> + let storage_pve2 = RrdSchema::storage(RrdFormat::Pve2);
> + let storage_pve9 = RrdSchema::storage(RrdFormat::Pve9_0);
> + assert_eq!(storage_pve2.column_count(), storage_pve9.column_count());
> + }
> +
> + #[test]
> + fn test_schema_display() {
> + let test_cases = vec![
> + (RrdSchema::node(RrdFormat::Pve2), "Pve2", "12 data sources"),
> + (
> + RrdSchema::node(RrdFormat::Pve9_0),
> + "Pve9_0",
> + "19 data sources",
> + ),
> + (RrdSchema::vm(RrdFormat::Pve2), "Pve2", "10 data sources"),
> + (
> + RrdSchema::vm(RrdFormat::Pve9_0),
> + "Pve9_0",
> + "17 data sources",
> + ),
> + (
> + RrdSchema::storage(RrdFormat::Pve2),
> + "Pve2",
> + "2 data sources",
> + ),
> + ];
> +
> + for (schema, expected_format, expected_count) in test_cases {
> + let display = format!("{}", schema);
> + assert!(
> + display.contains(expected_format),
> + "Display should contain format: {}",
> + display
> + );
> + assert!(
> + display.contains(expected_count),
> + "Display should contain count: {}",
> + display
> + );
> + }
> + }
> +}
> diff --git a/src/pmxcfs-rs/pmxcfs-rrd/src/writer.rs b/src/pmxcfs-rs/pmxcfs-rrd/src/writer.rs
> new file mode 100644
> index 00000000..79ed202a
> --- /dev/null
> +++ b/src/pmxcfs-rs/pmxcfs-rrd/src/writer.rs
> @@ -0,0 +1,397 @@
> +/// RRD File Writer
> +///
> +/// Handles creating and updating RRD files via pluggable backends.
> +/// Supports daemon-based (rrdcached) and direct file writing modes.
> +use super::key_type::RrdKeyType;
> +use super::schema::{RrdFormat, RrdSchema};
> +use anyhow::{Context, Result};
> +use chrono::Utc;
> +use std::collections::HashMap;
> +use std::fs;
> +use std::path::{Path, PathBuf};
> +
> +/// Metric type for determining column skipping rules
> +#[derive(Debug, Clone, Copy, PartialEq, Eq)]
> +enum MetricType {
> + Node,
> + Vm,
> + Storage,
> +}
> +
> +impl MetricType {
> + /// Number of non-archivable columns to skip
> + ///
> + /// C implementation (status.c:1300, 1335):
> + /// - Node: skip 2 (uptime, status)
> + /// - VM: skip 4 (uptime, status, template, pid)
> + /// - Storage: skip 0
> + fn skip_columns(self) -> usize {
> + match self {
> + MetricType::Node => 2,
> + MetricType::Vm => 4,
> + MetricType::Storage => 0,
> + }
> + }
> +}
> +
> +impl RrdFormat {
> + /// Get column count for a specific metric type
> + #[allow(dead_code)]
> + fn column_count(self, metric_type: &MetricType) -> usize {
> + match (self, metric_type) {
> + (RrdFormat::Pve2, MetricType::Node) => 12,
> + (RrdFormat::Pve9_0, MetricType::Node) => 19,
> + (RrdFormat::Pve2, MetricType::Vm) => 10,
> + (RrdFormat::Pve9_0, MetricType::Vm) => 17,
> + (_, MetricType::Storage) => 2, // Same for both formats
> + }
> + }
> +}
> +
> +impl RrdKeyType {
> + /// Get the metric type for this key
> + fn metric_type(&self) -> MetricType {
> + match self {
> + RrdKeyType::Node { .. } => MetricType::Node,
> + RrdKeyType::Vm { .. } => MetricType::Vm,
> + RrdKeyType::Storage { .. } => MetricType::Storage,
> + }
> + }
> +}
> +
> +/// RRD writer for persistent metric storage
> +///
> +/// Uses pluggable backends (daemon, direct, or fallback) for RRD operations.
> +pub struct RrdWriter {
> + /// Base directory for RRD files (default: /var/lib/rrdcached/db)
> + base_dir: PathBuf,
> + /// Backend for RRD operations (daemon, direct, or fallback)
> + backend: Box<dyn super::backend::RrdBackend>,
> + /// Track which RRD files we've already created
> + created_files: HashMap<String, ()>,
We currently don't clear this cache, right? Since it is keyed by
arbitrary RRD key strings, it can grow without bound over the process
lifetime -- a potential memory-exhaustion (DoS) vector.
> +}
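Something like a capped set would keep the fast path while bounding memory; a rough sketch (the type and names are mine, not from the patch):

```rust
use std::collections::HashSet;

/// Sketch: a created-files cache with a hard cap, so arbitrary keys
/// can't grow it without bound. Illustrative only -- correctness must
/// not depend on the cache, since it may be cleared at any time.
struct CreatedFiles {
    keys: HashSet<String>,
    cap: usize,
}

impl CreatedFiles {
    fn new(cap: usize) -> Self {
        Self { keys: HashSet::new(), cap }
    }

    /// Records `key`, clearing the whole cache first once the cap is
    /// reached (cheap; callers fall back to a filesystem check).
    fn insert(&mut self, key: &str) -> bool {
        if self.keys.len() >= self.cap {
            self.keys.clear();
        }
        self.keys.insert(key.to_string())
    }

    fn contains(&self, key: &str) -> bool {
        self.keys.contains(key)
    }
}
```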
> +
> +impl RrdWriter {
> + /// Create new RRD writer with default fallback backend
> + ///
> + /// Uses the fallback backend that tries daemon first, then falls back to direct file writes.
> + /// This matches the C implementation's behavior.
> + ///
> + /// # Arguments
> + /// * `base_dir` - Base directory for RRD files
> + pub async fn new<P: AsRef<Path>>(base_dir: P) -> Result<Self> {
> + let backend = Self::default_backend().await?;
> + Self::with_backend(base_dir, backend).await
> + }
> +
> + /// Create new RRD writer with specific backend
> + ///
> + /// # Arguments
> + /// * `base_dir` - Base directory for RRD files
> + /// * `backend` - RRD backend to use (daemon, direct, or fallback)
> + pub(crate) async fn with_backend<P: AsRef<Path>>(
> + base_dir: P,
> + backend: Box<dyn super::backend::RrdBackend>,
> + ) -> Result<Self> {
> + let base_dir = base_dir.as_ref().to_path_buf();
> +
> + // Create base directory if it doesn't exist
> + fs::create_dir_all(&base_dir)
> + .with_context(|| format!("Failed to create RRD base directory: {base_dir:?}"))?;
> +
> + tracing::info!("RRD writer using backend: {}", backend.name());
> +
> + Ok(Self {
> + base_dir,
> + backend,
> + created_files: HashMap::new(),
> + })
> + }
> +
> + /// Create default backend (fallback: daemon + direct)
> + ///
> + /// This matches the C implementation's behavior:
> + /// - Tries rrdcached daemon first for performance
> + /// - Falls back to direct file writes if daemon fails
> + async fn default_backend() -> Result<Box<dyn super::backend::RrdBackend>> {
> + let backend = super::backend::RrdFallbackBackend::new("/var/run/rrdcached.sock").await;
> + Ok(Box::new(backend))
> + }
> +
> + /// Update RRD file with metric data
> + ///
> + /// This will:
> + /// 1. Transform data from source format to target format (padding/truncation/column skipping)
> + /// 2. Create the RRD file if it doesn't exist
> + /// 3. Update via rrdcached daemon
> + ///
> + /// # Arguments
> + /// * `key` - RRD key (e.g., "pve2-node/node1", "pve-vm-9.0/100")
> + /// * `data` - Metric data string (format: "timestamp:value1:value2:...")
> + pub async fn update(&mut self, key: &str, data: &str) -> Result<()> {
> + // Parse the key to determine file path and schema
> + let key_type = RrdKeyType::parse(key).with_context(|| format!("Invalid RRD key: {key}"))?;
> +
> + // Get source format and target schema
> + let source_format = key_type.source_format();
> + let target_schema = key_type.schema();
> + let metric_type = key_type.metric_type();
> +
> + // Transform data from source to target format
> + let transformed_data =
> + Self::transform_data(data, source_format, &target_schema, metric_type)
> + .with_context(|| format!("Failed to transform RRD data for key: {key}"))?;
> +
> + // Get the file path (always uses current format)
> + let file_path = key_type.file_path(&self.base_dir);
> +
> + // Ensure the RRD file exists
> + if !self.created_files.contains_key(key) && !file_path.exists() {
If an RRD file is deleted/rotated while the process is running,
created_files still contains the key, so we won't recreate it and
subsequent updates will fail. Maybe check file_path.exists()
unconditionally?
> + self.create_rrd_file(&key_type, &file_path).await?;
> + self.created_files.insert(key.to_string(), ());
> + }
> +
> + // Update the RRD file via backend
> + self.backend.update(&file_path, &transformed_data).await?;
> +
> + Ok(())
> + }
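A rough sketch of what I mean -- stat the path on every update and let the filesystem be the source of truth, which makes the cache (and its staleness) unnecessary. The `ensure_exists` helper is illustrative, not the patch's API; `File::create` stands in for `create_rrd_file()`:

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Sketch: create the file iff it is missing, tolerating files that
/// were deleted/rotated underneath us. Returns true when a file was
/// (re)created.
fn ensure_exists(path: &Path) -> io::Result<bool> {
    if path.exists() {
        return Ok(false); // nothing to do
    }
    fs::File::create(path)?; // stand-in for create_rrd_file()
    Ok(true)
}
```

One stat() per update is likely cheap next to the RRD write itself, and it removes the stale-cache failure mode entirely.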
> +
> + /// Create RRD file with appropriate schema via backend
> + async fn create_rrd_file(&mut self, key_type: &RrdKeyType, file_path: &Path) -> Result<()> {
> + // Ensure parent directory exists
> + if let Some(parent) = file_path.parent() {
> + fs::create_dir_all(parent)
> + .with_context(|| format!("Failed to create directory: {parent:?}"))?;
> + }
> +
> + // Get schema for this RRD type
> + let schema = key_type.schema();
> +
> + // Calculate start time (at day boundary, matching C implementation)
> + let now = Utc::now();
> + let start = now
> + .date_naive()
> + .and_hms_opt(0, 0, 0)
> + .expect("00:00:00 is always a valid time")
> + .and_utc();
The start time uses UTC midnight here; I think the C code uses the
localtime day boundary. Worth double-checking.
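For illustration, the two day boundaries differ by exactly the timezone offset; a std-only sketch (the `utc_offset_secs` parameter is a stand-in for whatever the C code derives from localtime):

```rust
/// UTC midnight of the day containing `ts` (epoch seconds).
fn utc_midnight(ts: i64) -> i64 {
    ts - ts.rem_euclid(86_400)
}

/// Local midnight of the day containing `ts`, given the local
/// timezone's offset from UTC in seconds (illustrative input).
fn local_midnight(ts: i64, utc_offset_secs: i64) -> i64 {
    ts - (ts + utc_offset_secs).rem_euclid(86_400)
}
```

So with a nonzero offset the RRD start timestamp shifts by up to a day relative to the C behaviour, which could matter for files created near midnight.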
> + let start_timestamp = start.timestamp();
> +
> + tracing::debug!(
> + "Creating RRD file: {:?} with {} data sources via {}",
> + file_path,
> + schema.column_count(),
> + self.backend.name()
> + );
> +
> + // Delegate to backend for creation
> + self.backend
> + .create(file_path, &schema, start_timestamp)
> + .await?;
> +
> + tracing::info!("Created RRD file: {:?} ({})", file_path, schema);
> +
> + Ok(())
> + }
> +
> + /// Transform data from source format to target format
> + ///
> + /// This implements the C behavior from status.c:
> + /// 1. Skip non-archivable columns only for old formats (uptime, status for nodes)
> + /// 2. Pad old format data with `:U` for missing columns
> + /// 3. Truncate future format data to known columns
> + ///
> + /// # Arguments
> + /// * `data` - Raw data string from status update (format: "timestamp:v1:v2:...")
> + /// * `source_format` - Format indicated by the input key
> + /// * `target_schema` - Target RRD schema (always Pve9_0 currently)
> + /// * `metric_type` - Type of metric (Node, VM, Storage) for column skipping
> + ///
> + /// # Returns
> + /// Transformed data string ready for RRD update
> + fn transform_data(
> + data: &str,
> + source_format: RrdFormat,
> + target_schema: &RrdSchema,
> + metric_type: MetricType,
> + ) -> Result<String> {
> + let mut parts = data.split(':');
> +
> + let timestamp = parts
> + .next()
> + .ok_or_else(|| anyhow::anyhow!("Empty data string"))?;
Not required for correctness, since the backend will reject bad input,
but validating the timestamp here early would improve the error message
and avoid doing the transform work before failing.
> +
> + // Skip non-archivable columns for old format only (C: status.c:1300, 1335, 1385)
> + let skip_count = if source_format == RrdFormat::Pve2 {
> + metric_type.skip_columns()
> + } else {
> + 0
> + };
Likely a bug: here we only skip the non-archivable prefix fields for
Pve2, not for Pve9_0. If pve9 payloads still include
uptime/status/template/pid, the mapping will be shifted and metrics
will be written into the wrong columns.

status.c:update_rrd_data() skips unconditionally by key type:

    if (strncmp(key, "pve2-node/", 10) == 0 || strncmp(key, "pve-node-", 9) == 0) {
        ...
        skip = 2; // first two columns are live data that isn't archived
        ...
    } else if (strncmp(key, "pve2.3-vm/", 10) == 0 || strncmp(key, "pve-vm-", 7) == 0) {
        ...
        skip = 4; // first 4 columns are live data that isn't archived
        ...
    }

So skip = 2 / skip = 4 is not "Pve2 only". Let's either apply
skip_columns() based on metric type for all formats, or show with
captured fixtures that pve9 payloads are already stripped.
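A minimal sketch of the unconditional variant, assuming pve9 payloads still carry the live-data prefix as the C code implies (the free function and its parameters are illustrative, not the patch's API):

```rust
/// Sketch: transform a "ts:v1:v2:..." payload by skipping the
/// non-archivable prefix for *every* source format (as status.c does),
/// then padding with "U" / truncating to `target_cols`.
fn transform(data: &str, skip: usize, target_cols: usize) -> Option<String> {
    let mut parts = data.split(':');
    let ts = parts.next()?;
    let values: Vec<&str> = parts
        .skip(skip)                    // uptime/status (+template/pid for VMs)
        .chain(std::iter::repeat("U")) // pad short (old-format) payloads
        .take(target_cols)             // truncate long (future-format) payloads
        .collect();
    Some(format!("{ts}:{}", values.join(":")))
}
```

The `skip` argument would come from `metric_type.skip_columns()` regardless of `source_format`.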
> +
> + // Build transformed data: timestamp + values (skipped, padded/truncated to target_cols)
> + let target_cols = target_schema.column_count();
> +
> + // Join values with ':' separator, efficiently building the string without Vec allocation
> + let mut iter = parts
> + .skip(skip_count)
> + .chain(std::iter::repeat("U"))
> + .take(target_cols);
> + let values = match iter.next() {
> + Some(first) => {
> + // Start with first value, fold remaining values with separator
> + iter.fold(first.to_string(), |mut acc, value| {
> + acc.push(':');
> + acc.push_str(value);
> + acc
> + })
> + }
> + None => String::new(),
> + };
> +
> + Ok(format!("{timestamp}:{values}"))
> + }
> +
> + /// Flush all pending updates
> + #[allow(dead_code)] // Used via RRD update cycle
> + pub(crate) async fn flush(&mut self) -> Result<()> {
> + self.backend.flush().await
> + }
> +
> + /// Get base directory
> + #[allow(dead_code)] // Used for path resolution in updates
> + pub(crate) fn base_dir(&self) -> &Path {
> + &self.base_dir
> + }
> +}
> +
> +impl Drop for RrdWriter {
> + fn drop(&mut self) {
> + // Note: We can't flush in Drop since it's async
> + // Users should call flush() explicitly before dropping if needed
> + tracing::debug!("RrdWriter dropped");
> + }
> +}
> +
> +#[cfg(test)]
> +mod tests {
> + use super::super::schema::{RrdFormat, RrdSchema};
> + use super::*;
> +
> + #[test]
> + fn test_rrd_file_path_generation() {
> + let temp_dir = std::path::PathBuf::from("/tmp/test");
> +
> + let key_node = RrdKeyType::Node {
> + nodename: "testnode".to_string(),
> + format: RrdFormat::Pve9_0,
> + };
> + let path = key_node.file_path(&temp_dir);
> + assert_eq!(path, temp_dir.join("pve-node-9.0").join("testnode"));
> + }
> +
> + // ===== Format Adaptation Tests =====
The transform tests are helpful, but can we add some real sample
payloads? If we can capture a few actual update strings produced by the
current C impl / a running system for each metric type, we could add
them as fixtures and assert transform_data() produces exactly the
expected column layout for the target schema.
> +
> + #[test]
> + fn test_transform_data_node_pve2_to_pve9() {
> + // Test padding old format (12 cols) to new format (19 cols)
> + // Input: timestamp:uptime:status:load:maxcpu:cpu:iowait:memtotal:memused:swap_t:swap_u:netin:netout
> + let data = "1234567890:1000:0:1.5:4:2.0:0.5:8000000000:6000000000:0:0:1000000:500000";
> +
> + let schema = RrdSchema::node(RrdFormat::Pve9_0);
> + let result =
> + RrdWriter::transform_data(data, RrdFormat::Pve2, &schema, MetricType::Node).unwrap();
> +
> + // After skipping 2 cols (uptime, status) and padding with 7 U's:
> + // timestamp:load:maxcpu:cpu:iowait:memtotal:memused:swap_t:swap_u:netin:netout:U:U:U:U:U:U:U
> + let parts: Vec<&str> = result.split(':').collect();
> + assert_eq!(parts[0], "1234567890", "Timestamp should be preserved");
> + assert_eq!(parts.len(), 20, "Should have timestamp + 19 values"); // 1 + 19
> + assert_eq!(parts[1], "1.5", "First value after skip should be load");
> + assert_eq!(parts[2], "4", "Second value should be maxcpu");
> +
> + // Check padding
> + for (i, item) in parts.iter().enumerate().take(20).skip(12) {
> + assert_eq!(item, &"U", "Column {} should be padded with U", i);
> + }
> + }
> +
> + #[test]
> + fn test_transform_data_vm_pve2_to_pve9() {
> + // Test VM transformation with 4 columns skipped
> + // Input: timestamp:uptime:status:template:pid:maxcpu:cpu:maxmem:mem:maxdisk:disk:netin:netout:diskread:diskwrite
> + let data = "1234567890:1000:1:0:12345:4:2:4096:2048:100000:50000:1000:500:100:50";
> +
> + let schema = RrdSchema::vm(RrdFormat::Pve9_0);
> + let result =
> + RrdWriter::transform_data(data, RrdFormat::Pve2, &schema, MetricType::Vm).unwrap();
> +
> + let parts: Vec<&str> = result.split(':').collect();
> + assert_eq!(parts[0], "1234567890");
> + assert_eq!(parts.len(), 18, "Should have timestamp + 17 values");
> + assert_eq!(parts[1], "4", "First value after skip should be maxcpu");
> +
> + // Check padding (last 7 columns)
> + for (i, item) in parts.iter().enumerate().take(18).skip(11) {
> + assert_eq!(item, &"U", "Column {} should be padded", i);
> + }
> + }
> +
> + #[test]
> + fn test_transform_data_no_padding_needed() {
> + // Test when source and target have same column count
> + let data = "1234567890:1.5:4:2.0:0.5:8000000000:6000000000:0:0:0:0:1000000:500000:7000000000:0:0:0:0:0:0";
> +
> + let schema = RrdSchema::node(RrdFormat::Pve9_0);
> + let result =
> + RrdWriter::transform_data(data, RrdFormat::Pve9_0, &schema, MetricType::Node).unwrap();
> +
> + // No transformation should occur (same format)
> + let parts: Vec<&str> = result.split(':').collect();
> + assert_eq!(parts.len(), 20); // timestamp + 19 values
> + assert_eq!(parts[1], "1.5");
> + }
> +
> + #[test]
> + fn test_transform_data_future_format_truncation() {
> + // Test truncation of future format with extra columns
> + let data = "1234567890:1:2:3:4:5:6:7:8:9:10:11:12:13:14:15:16:17:18:19:20:21:22:23:24:25";
> +
> + let schema = RrdSchema::node(RrdFormat::Pve9_0);
> + // Simulating future format that has 25 columns
> + let result =
> + RrdWriter::transform_data(data, RrdFormat::Pve9_0, &schema, MetricType::Node).unwrap();
> +
> + let parts: Vec<&str> = result.split(':').collect();
> + assert_eq!(parts.len(), 20, "Should truncate to timestamp + 19 values");
> + assert_eq!(parts[19], "19", "Last value should be column 19");
> + }
> +
> + #[test]
> + fn test_transform_data_storage_no_change() {
> + // Storage format is same for Pve2 and Pve9_0 (2 columns, no skipping)
> + let data = "1234567890:1000000000000:500000000000";
> +
> + let schema = RrdSchema::storage(RrdFormat::Pve9_0);
> + let result =
> + RrdWriter::transform_data(data, RrdFormat::Pve2, &schema, MetricType::Storage).unwrap();
> +
> + assert_eq!(result, data, "Storage data should not be transformed");
> + }
> +
> + #[test]
> + fn test_metric_type_methods() {
> + assert_eq!(MetricType::Node.skip_columns(), 2);
> + assert_eq!(MetricType::Vm.skip_columns(), 4);
> + assert_eq!(MetricType::Storage.skip_columns(), 0);
> + }
> +
> + #[test]
> + fn test_format_column_counts() {
> + assert_eq!(RrdFormat::Pve2.column_count(&MetricType::Node), 12);
> + assert_eq!(RrdFormat::Pve9_0.column_count(&MetricType::Node), 19);
> + assert_eq!(RrdFormat::Pve2.column_count(&MetricType::Vm), 10);
> + assert_eq!(RrdFormat::Pve9_0.column_count(&MetricType::Vm), 17);
> + assert_eq!(RrdFormat::Pve2.column_count(&MetricType::Storage), 2);
> + assert_eq!(RrdFormat::Pve9_0.column_count(&MetricType::Storage), 2);
> + }
> +}
* Re: [pve-devel] [PATCH pve-cluster 03/15] pmxcfs-rs: add pmxcfs-logger crate
@ 2026-01-27 13:16 ` Samuel Rufinatscha
From: Samuel Rufinatscha @ 2026-01-27 13:16 UTC (permalink / raw)
To: Proxmox VE development discussion, Kefu Chai
Thanks for the patch, Kefu.
The overall structure looks solid.
My main points are around C compatibility details. It might
also be worth adding a couple of binary compatibility tests
(known C blobs as fixtures) and a perf test for merging large logs.
Please see inline comments below.
On 1/6/26 3:24 PM, Kefu Chai wrote:
> Add cluster logging system with:
> - ClusterLog: Main API with automatic deduplication
> - RingBuffer: Circular buffer (50,000 entries)
> - FNV-1a hashing for duplicate detection
> - JSON export matching C format
> - Binary serialization for efficient storage
> - Time-based and node-digest sorting
>
> This is a self-contained crate with no internal dependencies,
> only requiring serde and parking_lot. It provides ~24% of the
> C version's LOC (740 vs 3000+) while maintaining full
> compatibility with the existing log format.
>
> Includes comprehensive unit tests for ring buffer operations,
> serialization, and filtering.
>
> Signed-off-by: Kefu Chai <k.chai@proxmox.com>
> ---
> src/pmxcfs-rs/Cargo.toml | 1 +
> src/pmxcfs-rs/pmxcfs-logger/Cargo.toml | 15 +
> src/pmxcfs-rs/pmxcfs-logger/README.md | 58 ++
> .../pmxcfs-logger/src/cluster_log.rs | 550 +++++++++++++++++
> src/pmxcfs-rs/pmxcfs-logger/src/entry.rs | 579 +++++++++++++++++
> src/pmxcfs-rs/pmxcfs-logger/src/hash.rs | 173 ++++++
> src/pmxcfs-rs/pmxcfs-logger/src/lib.rs | 27 +
> .../pmxcfs-logger/src/ring_buffer.rs | 581 ++++++++++++++++++
> 8 files changed, 1984 insertions(+)
> create mode 100644 src/pmxcfs-rs/pmxcfs-logger/Cargo.toml
> create mode 100644 src/pmxcfs-rs/pmxcfs-logger/README.md
> create mode 100644 src/pmxcfs-rs/pmxcfs-logger/src/cluster_log.rs
> create mode 100644 src/pmxcfs-rs/pmxcfs-logger/src/entry.rs
> create mode 100644 src/pmxcfs-rs/pmxcfs-logger/src/hash.rs
> create mode 100644 src/pmxcfs-rs/pmxcfs-logger/src/lib.rs
> create mode 100644 src/pmxcfs-rs/pmxcfs-logger/src/ring_buffer.rs
>
> diff --git a/src/pmxcfs-rs/Cargo.toml b/src/pmxcfs-rs/Cargo.toml
> index 28e20bb7..4d17e87e 100644
> --- a/src/pmxcfs-rs/Cargo.toml
> +++ b/src/pmxcfs-rs/Cargo.toml
> @@ -3,6 +3,7 @@
> members = [
> "pmxcfs-api-types", # Shared types and error definitions
> "pmxcfs-config", # Configuration management
> + "pmxcfs-logger", # Cluster log with ring buffer and deduplication
> ]
> resolver = "2"
>
> diff --git a/src/pmxcfs-rs/pmxcfs-logger/Cargo.toml b/src/pmxcfs-rs/pmxcfs-logger/Cargo.toml
> new file mode 100644
> index 00000000..1af3f015
> --- /dev/null
> +++ b/src/pmxcfs-rs/pmxcfs-logger/Cargo.toml
> @@ -0,0 +1,15 @@
> +[package]
> +name = "pmxcfs-logger"
> +version = "0.1.0"
> +edition = "2021"
> +
> +[dependencies]
> +anyhow = "1.0"
> +parking_lot = "0.12"
> +serde = { version = "1.0", features = ["derive"] }
> +serde_json = "1.0"
> +tracing = "0.1"
> +
> +[dev-dependencies]
> +tempfile = "3.0"
> +
> diff --git a/src/pmxcfs-rs/pmxcfs-logger/README.md b/src/pmxcfs-rs/pmxcfs-logger/README.md
> new file mode 100644
> index 00000000..38f102c2
> --- /dev/null
> +++ b/src/pmxcfs-rs/pmxcfs-logger/README.md
> @@ -0,0 +1,58 @@
> +# pmxcfs-logger
> +
> +Cluster-wide log management for pmxcfs, fully compatible with the C implementation (logger.c).
> +
> +## Overview
> +
> +This crate implements a cluster log system matching Proxmox's C-based logger.c behavior. It provides:
> +
> +- **Ring Buffer Storage**: Circular buffer for log entries with automatic capacity management
> +- **FNV-1a Hashing**: Hashing for node and identity-based deduplication
> +- **Deduplication**: Per-node tracking of latest log entries to avoid duplicates
> +- **Time-based Sorting**: Chronological ordering of log entries across nodes
> +- **Multi-node Merging**: Combining logs from multiple cluster nodes
> +- **JSON Export**: Web UI-compatible JSON output matching C format
> +
> +## Architecture
> +
> +### Key Components
> +
> +1. **LogEntry** (`entry.rs`): Individual log entry with automatic UID generation
> +2. **RingBuffer** (`ring_buffer.rs`): Circular buffer with capacity management
> +3. **ClusterLog** (`lib.rs`): Main API with deduplication and merging
> +4. **Hash Functions** (`hash.rs`): FNV-1a implementation matching C
> +
> +## C to Rust Mapping
> +
> +| C Function | Rust Equivalent | Location |
> +|------------|-----------------|----------|
> +| `fnv_64a_buf` | `hash::fnv_64a` | hash.rs |
> +| `clog_pack` | `LogEntry::pack` | entry.rs |
> +| `clog_copy` | `RingBuffer::add_entry` | ring_buffer.rs |
> +| `clog_sort` | `RingBuffer::sort` | ring_buffer.rs |
> +| `clog_dump_json` | `RingBuffer::dump_json` | ring_buffer.rs |
> +| `clusterlog_insert` | `ClusterLog::insert` | lib.rs |
> +| `clusterlog_add` | `ClusterLog::add` | lib.rs |
> +| `clusterlog_merge` | `ClusterLog::merge` | lib.rs |
> +| `dedup_lookup` | `ClusterLog::dedup_lookup` | lib.rs |
> +
> +## Key Differences from C
> +
> +1. **No `node_digest` in DedupEntry**: C stores `node_digest` both as HashMap key and in the struct. Rust only uses it as the key, saving 8 bytes per entry.
> +
> +2. **Mutex granularity**: C uses a single global mutex. Rust uses separate Arc<Mutex<>> for buffer and dedup table, allowing better concurrency.
> +
> +3. **Code size**: Rust implementation is ~24% the size of C (740 lines vs 3,000+) while maintaining equivalent functionality.
> +
> +## Integration
> +
> +This crate is integrated into `pmxcfs-status` to provide cluster log functionality. The `.clusterlog` FUSE plugin uses this to provide JSON log output compatible with the Proxmox web UI.
> +
> +## References
> +
> +### C Implementation
> +- `src/pmxcfs/logger.c` / `logger.h` - Cluster log implementation
> +
> +### Related Crates
> +- **pmxcfs-status**: Integrates ClusterLog for status tracking
> +- **pmxcfs**: FUSE plugin exposes cluster log via `.clusterlog`
> diff --git a/src/pmxcfs-rs/pmxcfs-logger/src/cluster_log.rs b/src/pmxcfs-rs/pmxcfs-logger/src/cluster_log.rs
> new file mode 100644
> index 00000000..3eb6c68c
> --- /dev/null
> +++ b/src/pmxcfs-rs/pmxcfs-logger/src/cluster_log.rs
> @@ -0,0 +1,550 @@
> +/// Cluster Log Implementation
> +///
> +/// This module implements the cluster-wide log system with deduplication
> +/// and merging support, matching C's clusterlog_t.
> +use crate::entry::LogEntry;
> +use crate::ring_buffer::{RingBuffer, CLOG_DEFAULT_SIZE};
> +use anyhow::Result;
> +use parking_lot::Mutex;
> +use std::collections::{BTreeMap, HashMap};
> +use std::sync::Arc;
> +
> +/// Deduplication entry - tracks the latest UID and time for each node
> +///
> +/// Note: C's `dedup_entry_t` (logger.c:70-74) includes node_digest field because
> +/// GHashTable stores the struct pointer both as key and value. In Rust, we use
> +/// HashMap<u64, DedupEntry> where node_digest is the key, so we don't need to
> +/// duplicate it in the value. This is functionally equivalent but more efficient.
> +#[derive(Debug, Clone)]
> +pub(crate) struct DedupEntry {
> + /// Latest UID seen from this node
> + pub uid: u32,
> + /// Latest timestamp seen from this node
> + pub time: u32,
> +}
> +
> +/// Cluster-wide log with deduplication and merging support
> +/// Matches C's `clusterlog_t`
> +pub struct ClusterLog {
> + /// Ring buffer for log storage
> + pub(crate) buffer: Arc<Mutex<RingBuffer>>,
> +
> + /// Deduplication tracker (node_digest -> latest entry info)
> + /// Matches C's dedup hash table
> + pub(crate) dedup: Arc<Mutex<HashMap<u64, DedupEntry>>>,
> +}
> +
> +impl ClusterLog {
> + /// Create a new cluster log with default size
> + pub fn new() -> Self {
> + Self::with_capacity(CLOG_DEFAULT_SIZE)
> + }
> +
> + /// Create a new cluster log with specified capacity
> + pub fn with_capacity(capacity: usize) -> Self {
> + Self {
> + buffer: Arc::new(Mutex::new(RingBuffer::new(capacity))),
> + dedup: Arc::new(Mutex::new(HashMap::new())),
> + }
> + }
> +
> + /// Matches C's `clusterlog_add` function (logger.c:588-615)
> + #[allow(clippy::too_many_arguments)]
> + pub fn add(
> + &self,
> + node: &str,
> + ident: &str,
> + tag: &str,
> + pid: u32,
> + priority: u8,
> + time: u32,
> + message: &str,
> + ) -> Result<()> {
> + let entry = LogEntry::pack(node, ident, tag, pid, time, priority, message)?;
> + self.insert(&entry)
> + }
> +
> + /// Insert a log entry (with deduplication)
> + ///
> + /// Matches C's `clusterlog_insert` function (logger.c:573-586)
> + pub fn insert(&self, entry: &LogEntry) -> Result<()> {
> + let mut dedup = self.dedup.lock();
> +
> + // Check deduplication
> + if self.is_not_duplicate(&mut dedup, entry) {
> + // Entry is not a duplicate, add it
> + let mut buffer = self.buffer.lock();
> + buffer.add_entry(entry)?;
> + } else {
> + tracing::debug!("Ignoring duplicate cluster log entry");
> + }
> +
> + Ok(())
> + }
> +
> + /// Check if entry is a duplicate (returns true if NOT a duplicate)
> + ///
> + /// Matches C's `dedup_lookup` function (logger.c:362-388)
> + fn is_not_duplicate(&self, dedup: &mut HashMap<u64, DedupEntry>, entry: &LogEntry) -> bool {
> + match dedup.get_mut(&entry.node_digest) {
> + None => {
> + dedup.insert(
> + entry.node_digest,
> + DedupEntry {
> + time: entry.time,
> + uid: entry.uid,
> + },
> + );
> + true
> + }
> + Some(dd) => {
> + if entry.time > dd.time || (entry.time == dd.time && entry.uid > dd.uid) {
> + dd.time = entry.time;
> + dd.uid = entry.uid;
> + true
> + } else {
> + false
> + }
> + }
> + }
> + }
> +
> + pub fn get_entries(&self, max: usize) -> Vec<LogEntry> {
> + let buffer = self.buffer.lock();
> + buffer.iter().take(max).cloned().collect()
> + }
> +
> + /// Clear all log entries (for testing)
> + pub fn clear(&self) {
> + let mut buffer = self.buffer.lock();
> + let capacity = buffer.capacity();
> + *buffer = RingBuffer::new(capacity);
> + drop(buffer);
> +
> + self.dedup.lock().clear();
> + }
> +
> + /// Sort the log entries by time
> + ///
> + /// Matches C's `clog_sort` function (logger.c:321-355)
> + pub fn sort(&self) -> Result<RingBuffer> {
> + let buffer = self.buffer.lock();
> + buffer.sort()
> + }
> +
> + /// Merge logs from multiple nodes
> + ///
> + /// Matches C's `clusterlog_merge` function (logger.c:405-512)
> + pub fn merge(&self, remote_logs: Vec<RingBuffer>, include_local: bool) -> Result<RingBuffer> {
> + let mut sorted_entries: BTreeMap<(u32, u64, u32), LogEntry> = BTreeMap::new();
> + let mut merge_dedup: HashMap<u64, DedupEntry> = HashMap::new();
> +
> + // Calculate maximum capacity
> + let max_size = if include_local {
> + let local = self.buffer.lock();
> + let local_cap = local.capacity();
> + drop(local);
> +
> + std::iter::once(local_cap)
> + .chain(remote_logs.iter().map(|b| b.capacity()))
> + .max()
> + .unwrap_or(CLOG_DEFAULT_SIZE)
> + } else {
> + remote_logs
> + .iter()
> + .map(|b| b.capacity())
> + .max()
> + .unwrap_or(CLOG_DEFAULT_SIZE)
> + };
> +
> + // Add local entries if requested
> + if include_local {
> + let buffer = self.buffer.lock();
> + for entry in buffer.iter() {
> + let key = (entry.time, entry.node_digest, entry.uid);
> + sorted_entries.insert(key, entry.clone());
BTreeMap::insert overwrites the existing value on a duplicate key.
Please re-check whether that is what we want here; if we want
keep-first semantics, use entry(key).or_insert(...) and only update
merge_dedup when the entry was newly inserted.
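A minimal illustration of the difference (standalone, plain stdlib; not tied to the patch's types):

```rust
use std::collections::{btree_map::Entry, BTreeMap};

fn main() {
    // insert() overwrites: the second value wins on a duplicate key.
    let mut map: BTreeMap<u32, &str> = BTreeMap::new();
    map.insert(1, "first");
    map.insert(1, "second");
    assert_eq!(map[&1], "second");

    // The Entry API keeps the first value and tells us whether anything
    // was actually inserted -- the hook for updating merge_dedup only
    // on a genuinely new entry.
    let mut map: BTreeMap<u32, &str> = BTreeMap::new();
    map.insert(1, "first");
    let newly_inserted = match map.entry(1) {
        Entry::Vacant(v) => {
            v.insert("second");
            true
        }
        Entry::Occupied(_) => false,
    };
    assert!(!newly_inserted);
    assert_eq!(map[&1], "first");
    println!("ok");
}
```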
> + self.is_not_duplicate(&mut merge_dedup, entry);
> + }
> + }
> +
> + // Add remote entries
> + for remote_buffer in &remote_logs {
> + for entry in remote_buffer.iter() {
> + let key = (entry.time, entry.node_digest, entry.uid);
> + sorted_entries.insert(key, entry.clone());
> + self.is_not_duplicate(&mut merge_dedup, entry);
> + }
> + }
> +
> + let mut result = RingBuffer::new(max_size);
> +
> + // BTreeMap iterates in key order, entries are already sorted by (time, node_digest, uid)
> + for (_key, entry) in sorted_entries.iter().rev() {
The C code iterates oldest -> newest, and clog_copy() makes each entry
the new head, so the result ends up newest-first. Combining .rev()
with a push-front add_entry() likely inverts that order. Maybe drop
the .rev()? Please re-check.
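The inversion is easy to see with a toy VecDeque, assuming add_entry() pushes to the front of the buffer:

```rust
use std::collections::VecDeque;

fn main() {
    // BTreeMap iteration order: oldest -> newest.
    let sorted_times = [1000u32, 1001, 1002];

    // Iterating oldest -> newest while pushing to the front
    // (what C's clog_copy does) leaves the newest entry at the head.
    let mut c_like: VecDeque<u32> = VecDeque::new();
    for t in sorted_times {
        c_like.push_front(t);
    }
    assert_eq!(c_like.front(), Some(&1002)); // newest first

    // Iterating newest -> oldest (.rev()) with the same push-front
    // leaves the OLDEST entry at the head: the order is inverted.
    let mut reversed: VecDeque<u32> = VecDeque::new();
    for t in sorted_times.iter().rev() {
        reversed.push_front(*t);
    }
    assert_eq!(reversed.front(), Some(&1000)); // oldest first
    println!("ok");
}
```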
> + if result.is_near_full() {
> + break;
> + }
> + result.add_entry(entry)?;
> + }
> +
> + *self.dedup.lock() = merge_dedup;
clusterlog_merge() in C updates both cl->dedup and cl->base under the
same mutex. Here we update only dedup but return a RingBuffer, which
then requires a separate update_buffer() call. Shouldn't this be an
atomic operation? Also, we currently have two mutexes (dedup and
buffer), which increases deadlock risk. Couldn't we put buffer and
dedup behind one mutex and make merge() update both atomically
inside the same lock?
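Roughly what I have in mind (a sketch using std::sync::Mutex instead of parking_lot, with placeholder types standing in for the real RingBuffer and DedupEntry; merge_commit is a hypothetical name):

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

// Placeholder types standing in for the real RingBuffer / DedupEntry.
type RingBuffer = Vec<u32>;
type DedupEntry = (u32, u32);

// One mutex over both pieces of state, so a merge can swap the
// buffer and the dedup table in a single critical section, as C does.
struct Inner {
    buffer: RingBuffer,
    dedup: HashMap<u64, DedupEntry>,
}

struct ClusterLog {
    inner: Arc<Mutex<Inner>>,
}

impl ClusterLog {
    fn merge_commit(&self, new_buffer: RingBuffer, new_dedup: HashMap<u64, DedupEntry>) {
        let mut inner = self.inner.lock().unwrap();
        inner.buffer = new_buffer; // both fields are replaced
        inner.dedup = new_dedup;   // under the same lock
    }
}

fn main() {
    let log = ClusterLog {
        inner: Arc::new(Mutex::new(Inner { buffer: vec![], dedup: HashMap::new() })),
    };
    log.merge_commit(vec![1, 2, 3], HashMap::new());
    assert_eq!(log.inner.lock().unwrap().buffer.len(), 3);
    println!("ok");
}
```

This also removes the separate update_buffer() step and the window where dedup is already updated but the buffer is not.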
> +
> + Ok(result)
> + }
> +
> + /// Export log to JSON format
> + ///
> + /// Matches C's `clog_dump_json` function (logger.c:139-199)
> + pub fn dump_json(&self, ident_filter: Option<&str>, max_entries: usize) -> String {
> + let buffer = self.buffer.lock();
> + buffer.dump_json(ident_filter, max_entries)
> + }
> +
> + /// Export log to JSON format with sorted entries
> + pub fn dump_json_sorted(
> + &self,
> + ident_filter: Option<&str>,
> + max_entries: usize,
> + ) -> Result<String> {
> + let sorted = self.sort()?;
> + Ok(sorted.dump_json(ident_filter, max_entries))
> + }
> +
> + /// Matches C's `clusterlog_get_state` function (logger.c:553-571)
> + ///
> + /// Returns binary-serialized clog_base_t structure for network transmission.
> + /// This format is compatible with C nodes for mixed-cluster operation.
> + pub fn get_state(&self) -> Result<Vec<u8>> {
> + let sorted = self.sort()?;
> + Ok(sorted.serialize_binary())
> + }
> +
> + pub fn deserialize_state(data: &[u8]) -> Result<RingBuffer> {
> + RingBuffer::deserialize_binary(data)
> + }
> +
> + /// Replace the entire buffer after merging logs from multiple nodes
> + pub fn update_buffer(&self, new_buffer: RingBuffer) {
> + *self.buffer.lock() = new_buffer;
> + }
> +}
> +
> +impl Default for ClusterLog {
> + fn default() -> Self {
> + Self::new()
> + }
> +}
> +
> +#[cfg(test)]
> +mod tests {
> + use super::*;
> +
> + #[test]
> + fn test_cluster_log_creation() {
> + let log = ClusterLog::new();
> + assert!(log.buffer.lock().is_empty());
> + }
> +
> + #[test]
> + fn test_add_entry() {
> + let log = ClusterLog::new();
> +
> + let result = log.add(
> + "node1",
> + "root",
> + "cluster",
> + 12345,
> + 6, // Info priority
> + 1234567890,
> + "Test message",
> + );
> +
> + assert!(result.is_ok());
> + assert!(!log.buffer.lock().is_empty());
> + }
> +
> + #[test]
> + fn test_deduplication() {
> + let log = ClusterLog::new();
> +
> + // Add same entry twice (but with different UIDs since each add creates a new entry)
> + let _ = log.add("node1", "root", "cluster", 123, 6, 1000, "Message 1");
> + let _ = log.add("node1", "root", "cluster", 123, 6, 1000, "Message 1");
> +
> + // Both entries are added because they have different UIDs
> + // Deduplication tracks the latest (time, UID) per node, not content
> + let buffer = log.buffer.lock();
> + assert_eq!(buffer.len(), 2);
> + }
> +
> + #[test]
> + fn test_newer_entry_replaces() {
> + let log = ClusterLog::new();
> +
> + // Add older entry
> + let _ = log.add("node1", "root", "cluster", 123, 6, 1000, "Old message");
> +
> + // Add newer entry from same node
> + let _ = log.add("node1", "root", "cluster", 123, 6, 1001, "New message");
> +
> + // Should have both entries (newer doesn't remove older, just updates dedup tracker)
> + let buffer = log.buffer.lock();
> + assert_eq!(buffer.len(), 2);
> + }
> +
> + #[test]
> + fn test_json_export() {
> + let log = ClusterLog::new();
> +
> + let _ = log.add(
> + "node1",
> + "root",
> + "cluster",
> + 123,
> + 6,
> + 1234567890,
> + "Test message",
> + );
> +
> + let json = log.dump_json(None, 50);
> +
> + // Should be valid JSON
> + assert!(serde_json::from_str::<serde_json::Value>(&json).is_ok());
> +
> + // Should contain "data" field
> + let value: serde_json::Value = serde_json::from_str(&json).unwrap();
> + assert!(value.get("data").is_some());
> + }
> +
> + #[test]
> + fn test_merge_logs() {
> + let log1 = ClusterLog::new();
> + let log2 = ClusterLog::new();
> +
> + // Add entries to first log
> + let _ = log1.add(
> + "node1",
> + "root",
> + "cluster",
> + 123,
> + 6,
> + 1000,
> + "Message from node1",
> + );
> +
> + // Add entries to second log
> + let _ = log2.add(
> + "node2",
> + "root",
> + "cluster",
> + 456,
> + 6,
> + 1001,
> + "Message from node2",
> + );
> +
> + // Get log2's buffer for merging
> + let log2_buffer = log2.buffer.lock().clone();
> +
> + // Merge into log1
> + let merged = log1.merge(vec![log2_buffer], true).unwrap();
> +
> + // Should contain entries from both logs
> + assert!(merged.len() >= 2);
> + }
> +
> + // ========================================================================
> + // HIGH PRIORITY TESTS - Merge Edge Cases
> + // ========================================================================
> +
> + #[test]
> + fn test_merge_empty_logs() {
> + let log = ClusterLog::new();
> +
> + // Add some entries to local log
> + let _ = log.add("node1", "root", "cluster", 123, 6, 1000, "Local entry");
> +
> + // Merge with empty remote logs
> + let merged = log.merge(vec![], true).unwrap();
> +
> + // Should have 1 entry (from local log)
> + assert_eq!(merged.len(), 1);
> + let entry = merged.iter().next().unwrap();
> + assert_eq!(entry.node, "node1");
> + }
> +
> + #[test]
> + fn test_merge_single_node_only() {
> + let log = ClusterLog::new();
> +
> + // Add entries only from single node
> + let _ = log.add("node1", "root", "cluster", 123, 6, 1000, "Entry 1");
> + let _ = log.add("node1", "root", "cluster", 124, 6, 1001, "Entry 2");
> + let _ = log.add("node1", "root", "cluster", 125, 6, 1002, "Entry 3");
> +
> + // Merge with no remote logs (just sort local)
> + let merged = log.merge(vec![], true).unwrap();
> +
> + // Should have all 3 entries
> + assert_eq!(merged.len(), 3);
> +
> + // Entries should be sorted by time (buffer stores newest first after reversing during add)
> + // Merge reverses the BTreeMap iteration, so newest entries are added first
> + let times: Vec<u32> = merged.iter().map(|e| e.time).collect();
> + let mut expected = vec![1002, 1001, 1000];
> + expected.sort();
> + expected.reverse(); // Newest first
> +
> + let mut actual = times.clone();
> + actual.sort();
> + actual.reverse();
> +
> + assert_eq!(actual, expected);
> + }
> +
> + #[test]
> + fn test_merge_all_duplicates() {
> + let log1 = ClusterLog::new();
> + let log2 = ClusterLog::new();
> +
> + // Add same entries to both logs (same node, time, but different UIDs)
> + let _ = log1.add("node1", "root", "cluster", 123, 6, 1000, "Entry 1");
> + let _ = log1.add("node1", "root", "cluster", 124, 6, 1001, "Entry 2");
> +
> + let _ = log2.add("node1", "root", "cluster", 125, 6, 1000, "Entry 1");
> + let _ = log2.add("node1", "root", "cluster", 126, 6, 1001, "Entry 2");
> +
> + let log2_buffer = log2.buffer.lock().clone();
> +
> + // Merge - should handle entries from same node at same times
> + let merged = log1.merge(vec![log2_buffer], true).unwrap();
> +
> + // Should have 4 entries (all are unique by UID despite same time/node)
> + assert_eq!(merged.len(), 4);
> + }
> +
> + #[test]
> + fn test_merge_exceeding_capacity() {
> + // Create small buffer to test capacity enforcement
> + let log = ClusterLog::with_capacity(50_000); // Small buffer
> +
> + // Add many entries to fill beyond capacity
> + for i in 0..100 {
> + let _ = log.add(
> + "node1",
> + "root",
> + "cluster",
> + 100 + i,
> + 6,
> + 1000 + i,
> + &format!("Entry {}", i),
> + );
> + }
> +
> + // Create remote log with many entries
> + let remote = ClusterLog::with_capacity(50_000);
> + for i in 0..100 {
> + let _ = remote.add(
> + "node2",
> + "root",
> + "cluster",
> + 200 + i,
> + 6,
> + 1000 + i,
> + &format!("Remote {}", i),
> + );
> + }
> +
> + let remote_buffer = remote.buffer.lock().clone();
> +
> + // Merge - should stop when buffer is near full
> + let merged = log.merge(vec![remote_buffer], true).unwrap();
> +
> + // Buffer should be limited by capacity, not necessarily < 200
> + // The actual limit depends on entry sizes and capacity
> + // Just verify we got some reasonable number of entries
> + assert!(!merged.is_empty(), "Should have some entries");
> + assert!(
> + merged.len() <= 200,
> + "Should not exceed total available entries"
> + );
> + }
> +
> + #[test]
> + fn test_merge_preserves_dedup_state() {
> + let log = ClusterLog::new();
> +
> + // Add entries from node1
> + let _ = log.add("node1", "root", "cluster", 123, 6, 1000, "Entry 1");
> + let _ = log.add("node1", "root", "cluster", 124, 6, 1001, "Entry 2");
> +
> + // Create remote log with later entries from node1
> + let remote = ClusterLog::new();
> + let _ = remote.add("node1", "root", "cluster", 125, 6, 1002, "Entry 3");
> +
> + let remote_buffer = remote.buffer.lock().clone();
> +
> + // Merge
> + let _ = log.merge(vec![remote_buffer], true).unwrap();
> +
> + // Check that dedup state was updated
> + let dedup = log.dedup.lock();
> + let node1_digest = crate::hash::fnv_64a_str("node1");
> + let dedup_entry = dedup.get(&node1_digest).unwrap();
> +
> + // Should track the latest time from node1
> + assert_eq!(dedup_entry.time, 1002);
> + // UID is auto-generated, so just verify it exists and is reasonable
> + assert!(dedup_entry.uid > 0);
> + }
> +
> + #[test]
> + fn test_get_state_binary_format() {
> + let log = ClusterLog::new();
> +
> + // Add some entries
> + let _ = log.add("node1", "root", "cluster", 123, 6, 1000, "Entry 1");
> + let _ = log.add("node2", "admin", "system", 456, 6, 1001, "Entry 2");
> +
> + // Get state
> + let state = log.get_state().unwrap();
> +
> + // Should be binary format, not JSON
> + assert!(state.len() >= 8); // At least header
> +
> + // Check header format (clog_base_t)
> + let size = u32::from_le_bytes(state[0..4].try_into().unwrap()) as usize;
> + let cpos = u32::from_le_bytes(state[4..8].try_into().unwrap());
> +
> + assert_eq!(size, state.len());
> + assert_eq!(cpos, 8); // First entry at offset 8
> +
> + // Should be able to deserialize back
> + let deserialized = ClusterLog::deserialize_state(&state).unwrap();
> + assert_eq!(deserialized.len(), 2);
> + }
> +
> + #[test]
> + fn test_state_roundtrip() {
> + let log = ClusterLog::new();
> +
> + // Add entries
> + let _ = log.add("node1", "root", "cluster", 123, 6, 1000, "Test 1");
> + let _ = log.add("node2", "admin", "system", 456, 6, 1001, "Test 2");
> +
> + // Serialize
> + let state = log.get_state().unwrap();
> +
> + // Deserialize
> + let deserialized = ClusterLog::deserialize_state(&state).unwrap();
> +
> + // Check entries preserved
> + assert_eq!(deserialized.len(), 2);
> +
> + // Buffer is stored newest-first after sorting and serialization
> + let entries: Vec<_> = deserialized.iter().collect();
> + assert_eq!(entries[0].node, "node2"); // Newest (time 1001)
> + assert_eq!(entries[0].message, "Test 2");
> + assert_eq!(entries[1].node, "node1"); // Oldest (time 1000)
> + assert_eq!(entries[1].message, "Test 1");
> + }
> +}
> diff --git a/src/pmxcfs-rs/pmxcfs-logger/src/entry.rs b/src/pmxcfs-rs/pmxcfs-logger/src/entry.rs
> new file mode 100644
> index 00000000..187667ad
> --- /dev/null
> +++ b/src/pmxcfs-rs/pmxcfs-logger/src/entry.rs
> @@ -0,0 +1,579 @@
> +/// Log Entry Implementation
> +///
> +/// This module implements the cluster log entry structure, matching the C
> +/// implementation's clog_entry_t (logger.c).
> +use super::hash::fnv_64a_str;
> +use anyhow::{bail, Result};
> +use serde::Serialize;
> +use std::sync::atomic::{AtomicU32, Ordering};
> +
> +// Constants from C implementation
> +pub(crate) const CLOG_MAX_ENTRY_SIZE: usize = 8192 + 4096; // SYSLOG_MAX_LINE_LENGTH + overhead
This constant is also defined in ring_buffer.rs; consider defining it
in one place and importing it from there.
> +
> +/// Global UID counter (matches C's `uid_counter` in logger.c:62)
> +static UID_COUNTER: AtomicU32 = AtomicU32::new(0);
> +
> +/// Log entry structure
> +///
> +/// Matches C's `clog_entry_t` from logger.c:
> +/// ```c
> +/// typedef struct {
> +/// uint32_t prev; // Previous entry offset
> +/// uint32_t next; // Next entry offset
> +/// uint32_t uid; // Unique ID
> +/// uint32_t time; // Timestamp
> +/// uint64_t node_digest; // FNV-1a hash of node name
> +/// uint64_t ident_digest; // FNV-1a hash of ident
> +/// uint32_t pid; // Process ID
> +/// uint8_t priority; // Syslog priority (0-7)
> +/// uint8_t node_len; // Length of node name (including null)
> +/// uint8_t ident_len; // Length of ident (including null)
> +/// uint8_t tag_len; // Length of tag (including null)
> +/// uint32_t msg_len; // Length of message (including null)
> +/// char data[]; // Variable length data: node + ident + tag + msg
> +/// } clog_entry_t;
> +/// ```
> +#[derive(Debug, Clone, Serialize)]
> +pub struct LogEntry {
> + /// Unique ID for this entry (auto-incrementing)
> + pub uid: u32,
> +
> + /// Unix timestamp
> + pub time: u32,
> +
> + /// FNV-1a hash of node name
> + pub node_digest: u64,
> +
> + /// FNV-1a hash of ident (user)
> + pub ident_digest: u64,
> +
> + /// Process ID
> + pub pid: u32,
> +
> + /// Syslog priority (0-7)
> + pub priority: u8,
> +
> + /// Node name
> + pub node: String,
> +
> + /// Identity/user
> + pub ident: String,
> +
> + /// Tag (e.g., "cluster", "pmxcfs")
> + pub tag: String,
> +
> + /// Log message
> + pub message: String,
> +}
> +
> +impl LogEntry {
> + /// Matches C's `clog_pack` function (logger.c:220-278)
> + pub fn pack(
> + node: &str,
> + ident: &str,
> + tag: &str,
> + pid: u32,
> + time: u32,
> + priority: u8,
> + message: &str,
> + ) -> Result<Self> {
> + if priority >= 8 {
> + bail!("Invalid priority: {priority} (must be 0-7)");
> + }
> +
> + let node = Self::truncate_string(node, 255);
> + let ident = Self::truncate_string(ident, 255);
> + let tag = Self::truncate_string(tag, 255);
> + let message = Self::utf8_to_ascii(message);
> +
> + let node_len = node.len() + 1;
> + let ident_len = ident.len() + 1;
> + let tag_len = tag.len() + 1;
> + let mut msg_len = message.len() + 1;
> +
> + let total_size = std::mem::size_of::<u32>() * 4 // prev, next, uid, time
> + + std::mem::size_of::<u64>() * 2 // node_digest, ident_digest
> + + std::mem::size_of::<u32>() * 2 // pid, msg_len
> + + std::mem::size_of::<u8>() * 4 // priority, node_len, ident_len, tag_len
> + + node_len
> + + ident_len
> + + tag_len
> + + msg_len;
> +
> + if total_size > CLOG_MAX_ENTRY_SIZE {
> + let diff = total_size - CLOG_MAX_ENTRY_SIZE;
> + msg_len = msg_len.saturating_sub(diff);
> + }
> +
> + let node_digest = fnv_64a_str(&node);
> + let ident_digest = fnv_64a_str(&ident);
> + let uid = UID_COUNTER.fetch_add(1, Ordering::SeqCst).wrapping_add(1);
> +
> + Ok(Self {
> + uid,
> + time,
> + node_digest,
> + ident_digest,
> + pid,
> + priority,
> + node,
> + ident,
> + tag,
> + message: message[..msg_len.saturating_sub(1)].to_string(),
> + })
> + }
> +
> + /// Truncate string to max length
> + fn truncate_string(s: &str, max_len: usize) -> String {
> + if s.len() > max_len {
> + s[..max_len].to_string()
> + } else {
> + s.to_string()
> + }
> + }
> +
> + /// Convert UTF-8 to ASCII with proper escaping
> + ///
> + /// Matches C's `utf8_to_ascii` behavior (cfs-utils.c:40-107):
> + /// - Control characters (0x00-0x1F, 0x7F): Escaped as #0XXX (e.g., #007 for BEL)
> + /// - Unicode (U+0080 to U+FFFF): Escaped as \uXXXX (e.g., \u4e16 for 世)
> + /// - Quotes (when quotequote=true): Escaped as \"
> + /// - Characters > U+FFFF: Silently dropped
> + /// - ASCII printable (0x20-0x7E except quotes): Passed through unchanged
> + fn utf8_to_ascii(s: &str) -> String {
> + let mut result = String::with_capacity(s.len());
> +
> + for c in s.chars() {
> + match c {
> + // Control characters: #0XXX format (3 decimal digits with leading 0)
> + '\x00'..='\x1F' | '\x7F' => {
> + let code = c as u32;
> + result.push('#');
> + result.push('0');
> + // Format as 3 decimal digits with leading zeros (e.g., #0007 for BEL)
> + result.push_str(&format!("{:03}", code));
> + }
> + // ASCII printable characters: pass through
> + c if c.is_ascii() => {
> + result.push(c);
> + }
> + // Unicode U+0080 to U+FFFF: \uXXXX format
> + c if (c as u32) < 0x10000 => {
> + result.push('\\');
> + result.push('u');
> + result.push_str(&format!("{:04x}", c as u32));
> + }
> + // Characters > U+FFFF: silently drop (matches C behavior)
> + _ => {}
> + }
> + }
> +
> + result
> + }
> +
> + /// Matches C's `clog_entry_size` function (logger.c:201-206)
> + pub fn size(&self) -> usize {
> + std::mem::size_of::<u32>() * 4 // prev, next, uid, time
> + + std::mem::size_of::<u64>() * 2 // node_digest, ident_digest
> + + std::mem::size_of::<u32>() * 2 // pid, msg_len
> + + std::mem::size_of::<u8>() * 4 // priority, node_len, ident_len, tag_len
> + + self.node.len() + 1
> + + self.ident.len() + 1
> + + self.tag.len() + 1
> + + self.message.len() + 1
> + }
> +
> + /// C implementation: `uint32_t realsize = ((size + 7) & 0xfffffff8);`
> + pub fn aligned_size(&self) -> usize {
> + let size = self.size();
> + (size + 7) & !7
> + }
> +
> + pub fn to_json_object(&self) -> serde_json::Value {
> + serde_json::json!({
> + "uid": self.uid,
> + "time": self.time,
> + "pri": self.priority,
> + "tag": self.tag,
> + "pid": self.pid,
> + "node": self.node,
> + "user": self.ident,
> + "msg": self.message,
> + })
> + }
> +
> + /// Serialize to C binary format (clog_entry_t)
> + ///
> + /// Binary layout matches C structure:
> + /// ```c
> + /// struct {
> + /// uint32_t prev; // Will be filled by ring buffer
> + /// uint32_t next; // Will be filled by ring buffer
> + /// uint32_t uid;
> + /// uint32_t time;
> + /// uint64_t node_digest;
> + /// uint64_t ident_digest;
> + /// uint32_t pid;
> + /// uint8_t priority;
> + /// uint8_t node_len;
> + /// uint8_t ident_len;
> + /// uint8_t tag_len;
> + /// uint32_t msg_len;
> + /// char data[]; // node + ident + tag + msg (null-terminated)
> + /// }
> + /// ```
> + pub(crate) fn serialize_binary(&self, prev: u32, next: u32) -> Vec<u8> {
> + let mut buf = Vec::new();
> +
> + buf.extend_from_slice(&prev.to_le_bytes());
> + buf.extend_from_slice(&next.to_le_bytes());
> + buf.extend_from_slice(&self.uid.to_le_bytes());
> + buf.extend_from_slice(&self.time.to_le_bytes());
> + buf.extend_from_slice(&self.node_digest.to_le_bytes());
> + buf.extend_from_slice(&self.ident_digest.to_le_bytes());
> + buf.extend_from_slice(&self.pid.to_le_bytes());
> + buf.push(self.priority);
> +
> + let node_len = (self.node.len() + 1) as u8;
> + let ident_len = (self.ident.len() + 1) as u8;
> + let tag_len = (self.tag.len() + 1) as u8;
These three length fields are u8 and include the NUL, so each payload
must be capped at 254 bytes, otherwise len + 1 wraps to 0. C does
MIN(strlen + 1, 255).
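A minimal sketch of the cap (hypothetical helper name; the real fix
probably belongs in pack()/truncate_string(), truncating to 254 bytes
before the cast):

```rust
/// Hypothetical helper mirroring C's MIN(strlen(s) + 1, 255): the stored
/// length (including the NUL) never exceeds 255, so the u8 cannot wrap.
fn capped_len_u8(s: &str) -> u8 {
    std::cmp::min(s.len() + 1, 255) as u8
}

fn main() {
    let s = "a".repeat(255);
    assert_eq!((s.len() + 1) as u8, 0); // unchecked cast wraps to 0
    assert_eq!(capped_len_u8(&s), 255); // capped value stays sane
}
```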
> + let msg_len = (self.message.len() + 1) as u32;
> +
> + buf.push(node_len);
> + buf.push(ident_len);
> + buf.push(tag_len);
> + buf.extend_from_slice(&msg_len.to_le_bytes());
> +
> + buf.extend_from_slice(self.node.as_bytes());
> + buf.push(0);
> +
> + buf.extend_from_slice(self.ident.as_bytes());
> + buf.push(0);
> +
> + buf.extend_from_slice(self.tag.as_bytes());
> + buf.push(0);
> +
> + buf.extend_from_slice(self.message.as_bytes());
> + buf.push(0);
> +
> + buf
> + }
> +
> + pub(crate) fn deserialize_binary(data: &[u8]) -> Result<(Self, u32, u32)> {
> + if data.len() < 48 {
> + bail!(
> + "Entry too small: {} bytes (need at least 48 for header)",
> + data.len()
> + );
> + }
> +
> + let mut offset = 0;
> +
> + let prev = u32::from_le_bytes(data[offset..offset + 4].try_into()?);
> + offset += 4;
> +
> + let next = u32::from_le_bytes(data[offset..offset + 4].try_into()?);
> + offset += 4;
> +
> + let uid = u32::from_le_bytes(data[offset..offset + 4].try_into()?);
> + offset += 4;
> +
> + let time = u32::from_le_bytes(data[offset..offset + 4].try_into()?);
> + offset += 4;
> +
> + let node_digest = u64::from_le_bytes(data[offset..offset + 8].try_into()?);
> + offset += 8;
> +
> + let ident_digest = u64::from_le_bytes(data[offset..offset + 8].try_into()?);
> + offset += 8;
> +
> + let pid = u32::from_le_bytes(data[offset..offset + 4].try_into()?);
> + offset += 4;
> +
> + let priority = data[offset];
> + offset += 1;
> +
> + let node_len = data[offset] as usize;
> + offset += 1;
> +
> + let ident_len = data[offset] as usize;
> + offset += 1;
> +
> + let tag_len = data[offset] as usize;
> + offset += 1;
> +
> + let msg_len = u32::from_le_bytes(data[offset..offset + 4].try_into()?) as usize;
> + offset += 4;
> +
> + if offset + node_len + ident_len + tag_len + msg_len > data.len() {
> + bail!("Entry data exceeds buffer size");
> + }
> +
> + let node = read_null_terminated(&data[offset..offset + node_len])?;
> + offset += node_len;
> +
> + let ident = read_null_terminated(&data[offset..offset + ident_len])?;
> + offset += ident_len;
> +
> + let tag = read_null_terminated(&data[offset..offset + tag_len])?;
> + offset += tag_len;
> +
> + let message = read_null_terminated(&data[offset..offset + msg_len])?;
> +
> + Ok((
> + Self {
> + uid,
> + time,
> + node_digest,
> + ident_digest,
> + pid,
> + priority,
> + node,
> + ident,
> + tag,
> + message,
> + },
> + prev,
> + next,
> + ))
> + }
> +}
> +
> +fn read_null_terminated(data: &[u8]) -> Result<String> {
> + let len = data.iter().position(|&b| b == 0).unwrap_or(data.len());
> + Ok(String::from_utf8_lossy(&data[..len]).into_owned())
> +}
> +
> +#[cfg(test)]
> +pub fn reset_uid_counter() {
> + UID_COUNTER.store(0, Ordering::SeqCst);
> +}
> +
> +#[cfg(test)]
> +mod tests {
> + use super::*;
> +
> + #[test]
> + fn test_pack_entry() {
> + reset_uid_counter();
> +
> + let entry = LogEntry::pack(
> + "node1",
> + "root",
> + "cluster",
> + 12345,
> + 1234567890,
> + 6, // Info priority
> + "Test message",
> + )
> + .unwrap();
> +
> + assert_eq!(entry.uid, 1);
> + assert_eq!(entry.time, 1234567890);
> + assert_eq!(entry.node, "node1");
> + assert_eq!(entry.ident, "root");
> + assert_eq!(entry.tag, "cluster");
> + assert_eq!(entry.pid, 12345);
> + assert_eq!(entry.priority, 6);
> + assert_eq!(entry.message, "Test message");
> + }
> +
> + #[test]
> + fn test_uid_increment() {
> + reset_uid_counter();
> +
> + let entry1 = LogEntry::pack("node1", "root", "tag", 0, 1000, 6, "msg1").unwrap();
> + let entry2 = LogEntry::pack("node1", "root", "tag", 0, 1001, 6, "msg2").unwrap();
> +
> + assert_eq!(entry1.uid, 1);
> + assert_eq!(entry2.uid, 2);
> + }
> +
> + #[test]
> + fn test_invalid_priority() {
> + let result = LogEntry::pack("node1", "root", "tag", 0, 1000, 8, "message");
> + assert!(result.is_err());
> + }
> +
> + #[test]
> + fn test_node_digest() {
> + let entry1 = LogEntry::pack("node1", "root", "tag", 0, 1000, 6, "msg").unwrap();
> + let entry2 = LogEntry::pack("node1", "root", "tag", 0, 1001, 6, "msg").unwrap();
> + let entry3 = LogEntry::pack("node2", "root", "tag", 0, 1000, 6, "msg").unwrap();
> +
> + // Same node should have same digest
> + assert_eq!(entry1.node_digest, entry2.node_digest);
> +
> + // Different node should have different digest
> + assert_ne!(entry1.node_digest, entry3.node_digest);
> + }
> +
> + #[test]
> + fn test_ident_digest() {
> + let entry1 = LogEntry::pack("node1", "root", "tag", 0, 1000, 6, "msg").unwrap();
> + let entry2 = LogEntry::pack("node1", "root", "tag", 0, 1001, 6, "msg").unwrap();
> + let entry3 = LogEntry::pack("node1", "admin", "tag", 0, 1000, 6, "msg").unwrap();
> +
> + // Same ident should have same digest
> + assert_eq!(entry1.ident_digest, entry2.ident_digest);
> +
> + // Different ident should have different digest
> + assert_ne!(entry1.ident_digest, entry3.ident_digest);
> + }
> +
> + #[test]
> + fn test_utf8_to_ascii() {
> + let entry = LogEntry::pack("node1", "root", "tag", 0, 1000, 6, "Hello 世界").unwrap();
> + assert!(entry.message.is_ascii());
> + // Unicode chars escaped as \uXXXX format (matches C implementation)
> + assert!(entry.message.contains("\\u4e16")); // 世 = U+4E16
> + assert!(entry.message.contains("\\u754c")); // 界 = U+754C
> + }
> +
> + #[test]
> + fn test_utf8_control_chars() {
> + // Test control character escaping
> + let entry = LogEntry::pack("node1", "root", "tag", 0, 1000, 6, "Hello\x07World").unwrap();
> + assert!(entry.message.is_ascii());
> + // BEL (0x07) should be escaped as #0007
> + assert!(entry.message.contains("#0007"));
> + }
> +
> + #[test]
> + fn test_utf8_mixed_content() {
> + // Test mix of ASCII, Unicode, and control chars
> + let entry = LogEntry::pack(
> + "node1",
> + "root",
> + "tag",
> + 0,
> + 1000,
> + 6,
> + "Test\x01\nUnicode世\ttab",
> + )
> + .unwrap();
> + assert!(entry.message.is_ascii());
> + // SOH (0x01) -> #0001
> + assert!(entry.message.contains("#0001"));
> + // Newline (0x0A) -> #0010
> + assert!(entry.message.contains("#0010"));
> + // Unicode 世 (U+4E16) -> \u4e16
> + assert!(entry.message.contains("\\u4e16"));
> + // Tab (0x09) -> #0009
> + assert!(entry.message.contains("#0009"));
> + }
> +
> + #[test]
> + fn test_string_truncation() {
> + let long_node = "a".repeat(300);
> + let entry = LogEntry::pack(&long_node, "root", "tag", 0, 1000, 6, "msg").unwrap();
> + assert!(entry.node.len() <= 255);
> + }
> +
> + #[test]
> + fn test_message_truncation() {
> + let long_message = "a".repeat(CLOG_MAX_ENTRY_SIZE);
> + let entry = LogEntry::pack("node1", "root", "tag", 0, 1000, 6, &long_message).unwrap();
> + // Entry should fit within max size
> + assert!(entry.size() <= CLOG_MAX_ENTRY_SIZE);
> + }
> +
> + #[test]
> + fn test_aligned_size() {
> + let entry = LogEntry::pack("node1", "root", "tag", 0, 1000, 6, "msg").unwrap();
> + let aligned = entry.aligned_size();
> +
> + // Aligned size should be multiple of 8
> + assert_eq!(aligned % 8, 0);
> +
> + // Aligned size should be >= actual size
> + assert!(aligned >= entry.size());
> +
> + // Aligned size should be within 7 bytes of actual size
> + assert!(aligned - entry.size() < 8);
> + }
> +
> + #[test]
> + fn test_json_export() {
> + let entry = LogEntry::pack("node1", "root", "cluster", 123, 1234567890, 6, "Test").unwrap();
> + let json = entry.to_json_object();
> +
> + assert_eq!(json["node"], "node1");
> + assert_eq!(json["user"], "root");
> + assert_eq!(json["tag"], "cluster");
> + assert_eq!(json["pid"], 123);
> + assert_eq!(json["time"], 1234567890);
> + assert_eq!(json["pri"], 6);
> + assert_eq!(json["msg"], "Test");
> + }
> +
> + #[test]
> + fn test_binary_serialization_roundtrip() {
> + let entry = LogEntry::pack(
> + "node1",
> + "root",
> + "cluster",
> + 12345,
> + 1234567890,
> + 6,
> + "Test message",
> + )
> + .unwrap();
> +
> + // Serialize with prev/next pointers
> + let binary = entry.serialize_binary(100, 200);
> +
> + // Deserialize
> + let (deserialized, prev, next) = LogEntry::deserialize_binary(&binary).unwrap();
> +
> + // Check prev/next pointers
> + assert_eq!(prev, 100);
> + assert_eq!(next, 200);
> +
> + // Check entry fields
> + assert_eq!(deserialized.uid, entry.uid);
> + assert_eq!(deserialized.time, entry.time);
> + assert_eq!(deserialized.node_digest, entry.node_digest);
> + assert_eq!(deserialized.ident_digest, entry.ident_digest);
> + assert_eq!(deserialized.pid, entry.pid);
> + assert_eq!(deserialized.priority, entry.priority);
> + assert_eq!(deserialized.node, entry.node);
> + assert_eq!(deserialized.ident, entry.ident);
> + assert_eq!(deserialized.tag, entry.tag);
> + assert_eq!(deserialized.message, entry.message);
> + }
> +
> + #[test]
> + fn test_binary_format_header_size() {
> + let entry = LogEntry::pack("n", "u", "t", 1, 1000, 6, "m").unwrap();
> + let binary = entry.serialize_binary(0, 0);
> +
> + // Header should be exactly 48 bytes
> + // prev(4) + next(4) + uid(4) + time(4) + node_digest(8) + ident_digest(8) +
> + // pid(4) + priority(1) + node_len(1) + ident_len(1) + tag_len(1) + msg_len(4)
> + assert!(binary.len() >= 48);
> +
> + // First 48 bytes are header
> + assert_eq!(&binary[0..4], &0u32.to_le_bytes()); // prev
> + assert_eq!(&binary[4..8], &0u32.to_le_bytes()); // next
> + }
> +
> + #[test]
> + fn test_binary_deserialize_invalid_size() {
> + let too_small = vec![0u8; 40]; // Less than 48 byte header
> + let result = LogEntry::deserialize_binary(&too_small);
> + assert!(result.is_err());
> + }
> +
> + #[test]
> + fn test_binary_null_terminators() {
> + let entry = LogEntry::pack("node1", "root", "tag", 123, 1000, 6, "message").unwrap();
> + let binary = entry.serialize_binary(0, 0);
> +
> + // Check that strings are null-terminated
> + // Find null bytes in data section (after 48-byte header)
> + let data_section = &binary[48..];
> + let null_count = data_section.iter().filter(|&&b| b == 0).count();
> + assert_eq!(null_count, 4); // 4 null terminators (node, ident, tag, msg)
> + }
> +}
> diff --git a/src/pmxcfs-rs/pmxcfs-logger/src/hash.rs b/src/pmxcfs-rs/pmxcfs-logger/src/hash.rs
> new file mode 100644
> index 00000000..710c9ab3
> --- /dev/null
> +++ b/src/pmxcfs-rs/pmxcfs-logger/src/hash.rs
> @@ -0,0 +1,173 @@
> +/// FNV-1a (Fowler-Noll-Vo) 64-bit hash function
> +///
> +/// This matches the C implementation's fnv_64a_buf function (logger.c:52-60)
> +/// Used for generating node and ident digests for deduplication.
> +/// FNV-1a 64-bit non-zero initial basis
> +pub(crate) const FNV1A_64_INIT: u64 = 0xcbf29ce484222325;
> +
> +/// Compute 64-bit FNV-1a hash
> +///
> +/// This is a faithful port of the C implementation from logger.c lines 52-60:
> +/// ```c
> +/// static inline uint64_t fnv_64a_buf(const void *buf, size_t len, uint64_t hval) {
> +/// unsigned char *bp = (unsigned char *)buf;
> +/// unsigned char *be = bp + len;
> +/// while (bp < be) {
> +/// hval ^= (uint64_t)*bp++;
> +/// hval += (hval << 1) + (hval << 4) + (hval << 5) + (hval << 7) + (hval << 8) + (hval << 40);
> +/// }
> +/// return hval;
> +/// }
> +/// ```
> +///
> +/// # Arguments
> +/// * `data` - The data to hash
> +/// * `init` - Initial hash value (use FNV1A_64_INIT for first hash)
> +///
> +/// # Returns
> +/// 64-bit hash value
> +///
> +/// Note: This function appears unused but is actually called via `fnv_64a_str` below,
> +/// which provides the primary API for string hashing. Both functions share the core
> +/// FNV-1a implementation logic.
> +#[inline]
> +#[allow(dead_code)] // Used via fnv_64a_str wrapper
> +pub(crate) fn fnv_64a(data: &[u8], init: u64) -> u64 {
> + let mut hval = init;
> +
> + for &byte in data {
> + hval ^= byte as u64;
> + // FNV magic prime multiplication done via shifts and adds
> + // This is equivalent to: hval *= 0x100000001b3 (FNV 64-bit prime)
> + hval = hval.wrapping_add(
> + (hval << 1)
> + .wrapping_add(hval << 4)
> + .wrapping_add(hval << 5)
> + .wrapping_add(hval << 7)
> + .wrapping_add(hval << 8)
> + .wrapping_add(hval << 40),
> + );
> + }
> +
> + hval
> +}
> +
> +/// Hash a null-terminated string (includes the null byte)
> +///
> +/// The C implementation includes the null terminator in the hash:
> +/// `fnv_64a_buf(node, node_len, FNV1A_64_INIT)` where node_len includes the '\0'
> +///
> +/// This function adds a null byte to match that behavior.
> +#[inline]
> +pub(crate) fn fnv_64a_str(s: &str) -> u64 {
> + let bytes = s.as_bytes();
> + let mut hval = FNV1A_64_INIT;
> +
> + for &byte in bytes {
> + hval ^= byte as u64;
> + hval = hval.wrapping_add(
> + (hval << 1)
> + .wrapping_add(hval << 4)
> + .wrapping_add(hval << 5)
> + .wrapping_add(hval << 7)
> + .wrapping_add(hval << 8)
> + .wrapping_add(hval << 40),
> + );
> + }
> +
> + // Hash the null terminator (C compatibility: original XORs with 0 which is a no-op)
> + // We skip the no-op XOR and proceed directly to the final avalanche
> + hval.wrapping_add(
> + (hval << 1)
> + .wrapping_add(hval << 4)
> + .wrapping_add(hval << 5)
> + .wrapping_add(hval << 7)
> + .wrapping_add(hval << 8)
> + .wrapping_add(hval << 40),
> + )
> +}
> +
> +#[cfg(test)]
> +mod tests {
> + use super::*;
> +
> + #[test]
> + fn test_fnv1a_init() {
> + // Test that init constant matches C implementation
> + assert_eq!(FNV1A_64_INIT, 0xcbf29ce484222325);
> + }
> +
> + #[test]
> + fn test_fnv1a_empty() {
> + // Empty string with null terminator
> + let hash = fnv_64a(&[0], FNV1A_64_INIT);
> + assert_ne!(hash, FNV1A_64_INIT); // Should be different from init
> + }
> +
> + #[test]
> + fn test_fnv1a_consistency() {
> + // Same input should produce same output
> + let data = b"test";
> + let hash1 = fnv_64a(data, FNV1A_64_INIT);
> + let hash2 = fnv_64a(data, FNV1A_64_INIT);
> + assert_eq!(hash1, hash2);
> + }
> +
> + #[test]
> + fn test_fnv1a_different_data() {
> + // Different input should (usually) produce different output
> + let hash1 = fnv_64a(b"test1", FNV1A_64_INIT);
> + let hash2 = fnv_64a(b"test2", FNV1A_64_INIT);
> + assert_ne!(hash1, hash2);
> + }
> +
> + #[test]
> + fn test_fnv1a_str() {
> + // Test string hashing with null terminator
> + let hash1 = fnv_64a_str("node1");
> + let hash2 = fnv_64a_str("node1");
> + let hash3 = fnv_64a_str("node2");
> +
> + assert_eq!(hash1, hash2); // Same string should hash the same
> + assert_ne!(hash1, hash3); // Different strings should hash differently
> + }
> +
> + #[test]
> + fn test_fnv1a_node_names() {
> + // Test with typical Proxmox node names
> + let nodes = vec!["pve1", "pve2", "pve3"];
> + let mut hashes = Vec::new();
> +
> + for node in &nodes {
> + let hash = fnv_64a_str(node);
> + hashes.push(hash);
> + }
> +
> + // All hashes should be unique
> + for i in 0..hashes.len() {
> + for j in (i + 1)..hashes.len() {
> + assert_ne!(
> + hashes[i], hashes[j],
> + "Hashes for {} and {} should differ",
> + nodes[i], nodes[j]
> + );
> + }
> + }
> + }
> +
> + #[test]
> + fn test_fnv1a_chaining() {
> + // Test that we can chain hashes
> + let data1 = b"first";
> + let data2 = b"second";
> +
> + let hash1 = fnv_64a(data1, FNV1A_64_INIT);
> + let hash2 = fnv_64a(data2, hash1); // Use previous hash as init
> +
> + // Should produce a deterministic result
> + let hash1_again = fnv_64a(data1, FNV1A_64_INIT);
> + let hash2_again = fnv_64a(data2, hash1_again);
> +
> + assert_eq!(hash2, hash2_again);
> + }
> +}
> diff --git a/src/pmxcfs-rs/pmxcfs-logger/src/lib.rs b/src/pmxcfs-rs/pmxcfs-logger/src/lib.rs
> new file mode 100644
> index 00000000..964f0b3a
> --- /dev/null
> +++ b/src/pmxcfs-rs/pmxcfs-logger/src/lib.rs
> @@ -0,0 +1,27 @@
> +/// Cluster Log Implementation
> +///
> +/// This module provides a cluster-wide log system compatible with the C implementation.
> +/// It maintains a ring buffer of log entries that can be merged from multiple nodes,
> +/// deduplicated, and exported to JSON.
> +///
> +/// Key features:
> +/// - Ring buffer storage for efficient memory usage
> +/// - FNV-1a hashing for node and ident tracking
> +/// - Deduplication across nodes
> +/// - Time-based sorting
> +/// - Multi-node log merging
> +/// - JSON export for web UI
> +// Internal modules (not exposed)
> +mod cluster_log;
> +mod entry;
> +mod hash;
> +mod ring_buffer;
> +
> +// Public API - only expose what's needed externally
> +pub use cluster_log::ClusterLog;
> +
> +// Re-export types only for testing or internal crate use
> +#[doc(hidden)]
> +pub use entry::LogEntry;
> +#[doc(hidden)]
> +pub use ring_buffer::RingBuffer;
> diff --git a/src/pmxcfs-rs/pmxcfs-logger/src/ring_buffer.rs b/src/pmxcfs-rs/pmxcfs-logger/src/ring_buffer.rs
> new file mode 100644
> index 00000000..4f6db63e
> --- /dev/null
> +++ b/src/pmxcfs-rs/pmxcfs-logger/src/ring_buffer.rs
> @@ -0,0 +1,581 @@
> +/// Ring Buffer Implementation for Cluster Log
> +///
> +/// This module implements a circular buffer for storing log entries,
> +/// matching the C implementation's clog_base_t structure.
> +use super::entry::LogEntry;
> +use super::hash::fnv_64a_str;
> +use anyhow::{bail, Result};
> +use std::collections::VecDeque;
> +
> +pub(crate) const CLOG_DEFAULT_SIZE: usize = 5 * 1024 * 1024; // 5MB
> +pub(crate) const CLOG_MAX_ENTRY_SIZE: usize = 8192 + 4096;
These constants don't match the C constants
#define CLOG_DEFAULT_SIZE (8192 * 16)
#define CLOG_MAX_ENTRY_SIZE 4096
That likely affects capacity semantics, merge limits, and the binary
format?
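For reference, constants matching the quoted C defines would look like
this (whether the port should keep the C values or deliberately diverge
is worth documenting either way):

```rust
// Values as defined in the C implementation (logger.c). If the Rust port
// uses different ones, the serialized state exchanged with C nodes and the
// wraparound/merge behavior will differ.
const CLOG_DEFAULT_SIZE: usize = 8192 * 16; // 128 KiB, not 5 MiB
const CLOG_MAX_ENTRY_SIZE: usize = 4096;    // not 8192 + 4096

fn main() {
    assert_eq!(CLOG_DEFAULT_SIZE, 131072);
    assert_eq!(CLOG_MAX_ENTRY_SIZE, 4096);
}
```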
> +
> +/// Ring buffer for log entries
> +///
> +/// This is a simplified Rust version of the C implementation's ring buffer.
> +/// The C version uses a raw byte buffer with manual pointer arithmetic,
> +/// but we use a VecDeque for safety and simplicity while maintaining
> +/// the same conceptual behavior.
> +///
> +/// C structure (logger.c:64-68):
> +/// ```c
> +/// struct clog_base {
> +/// uint32_t size; // Total buffer size
> +/// uint32_t cpos; // Current position
> +/// char data[]; // Variable length data
> +/// };
> +/// ```
> +#[derive(Debug, Clone)]
> +pub struct RingBuffer {
> + /// Maximum capacity in bytes
> + capacity: usize,
> +
> + /// Current size in bytes (approximate)
> + current_size: usize,
> +
> + /// Entries stored in the buffer (newest first)
> + /// We use VecDeque for efficient push/pop at both ends
> + entries: VecDeque<LogEntry>,
> +}
> +
> +impl RingBuffer {
> + /// Create a new ring buffer with specified capacity
> + pub fn new(capacity: usize) -> Self {
> + // Ensure minimum capacity
> + let capacity = if capacity < CLOG_MAX_ENTRY_SIZE * 10 {
> + CLOG_DEFAULT_SIZE
> + } else {
> + capacity
> + };
> +
> + Self {
> + capacity,
> + current_size: 0,
> + entries: VecDeque::new(),
> + }
> + }
> +
> + /// Add an entry to the buffer
> + ///
> + /// Matches C's `clog_copy` function (logger.c:208-218) which calls
> + /// `clog_alloc_entry` (logger.c:76-102) to allocate space in the ring buffer.
> + pub fn add_entry(&mut self, entry: &LogEntry) -> Result<()> {
> + let entry_size = entry.aligned_size();
> +
> + // Make room if needed (remove oldest entries)
> + while self.current_size + entry_size > self.capacity && !self.entries.is_empty() {
> + if let Some(old_entry) = self.entries.pop_back() {
> + self.current_size = self.current_size.saturating_sub(old_entry.aligned_size());
> + }
> + }
> +
> + // Add new entry at the front (newest first)
> + self.entries.push_front(entry.clone());
> + self.current_size += entry_size;
> +
> + Ok(())
> + }
> +
> + /// Check if buffer is near full (>90% capacity)
> + pub fn is_near_full(&self) -> bool {
> + self.current_size > (self.capacity * 9 / 10)
> + }
> +
> + /// Check if buffer is empty
> + pub fn is_empty(&self) -> bool {
> + self.entries.is_empty()
> + }
> +
> + /// Get number of entries
> + pub fn len(&self) -> usize {
> + self.entries.len()
> + }
> +
> + /// Get buffer capacity
> + pub fn capacity(&self) -> usize {
> + self.capacity
> + }
> +
> + /// Iterate over entries (newest first)
> + pub fn iter(&self) -> impl Iterator<Item = &LogEntry> {
> + self.entries.iter()
> + }
> +
> + /// Sort entries by time, node_digest, and uid
> + ///
> + /// Matches C's `clog_sort` function (logger.c:321-355)
> + ///
> + /// C uses GTree with custom comparison function `clog_entry_sort_fn`
> + /// (logger.c:297-310):
> + /// ```c
> + /// if (entry1->time != entry2->time) {
> + /// return entry1->time - entry2->time;
> + /// }
> + /// if (entry1->node_digest != entry2->node_digest) {
> + /// return entry1->node_digest - entry2->node_digest;
> + /// }
> + /// return entry1->uid - entry2->uid;
> + /// ```
> + pub fn sort(&self) -> Result<Self> {
> + let mut new_buffer = Self::new(self.capacity);
> +
> + // Collect and sort entries
> + let mut sorted: Vec<LogEntry> = self.entries.iter().cloned().collect();
> +
> + // Sort by time (ascending), then node_digest, then uid
> + sorted.sort_by_key(|e| (e.time, e.node_digest, e.uid));
> +
> + // Add sorted entries to new buffer
> + // Since add_entry pushes to front, we add in forward order to get newest-first
> + // sorted = [oldest...newest], add_entry pushes to front, so:
> + // - Add oldest: [oldest]
> + // - Add next: [next, oldest]
> + // - Add newest: [newest, next, oldest]
> + for entry in sorted.iter() {
> + new_buffer.add_entry(entry)?;
> + }
> +
> + Ok(new_buffer)
> + }
> +
> + /// Dump buffer to JSON format
> + ///
> + /// Matches C's `clog_dump_json` function (logger.c:139-199)
> + ///
> + /// # Arguments
> + /// * `ident_filter` - Optional ident filter (user filter)
> + /// * `max_entries` - Maximum number of entries to include
> + pub fn dump_json(&self, ident_filter: Option<&str>, max_entries: usize) -> String {
> + // Compute ident digest if filter is provided
> + let ident_digest = ident_filter.map(fnv_64a_str);
> +
> + let mut data = Vec::new();
> + let mut count = 0;
> +
> + // Iterate over entries (newest first)
> + for entry in self.iter() {
> + if count >= max_entries {
> + break;
> + }
> +
> + // Apply ident filter if specified
> + if let Some(digest) = ident_digest {
> + if digest != entry.ident_digest {
> + continue;
> + }
> + }
> +
> + data.push(entry.to_json_object());
> + count += 1;
> + }
> +
> + // Reverse to show oldest first (matching C behavior)
> + data.reverse();
C prints entries newest to oldest (walking prev from cpos).
Shouldn't this line be removed?
> +
> + let result = serde_json::json!({
> + "data": data
> + });
> +
> + serde_json::to_string_pretty(&result).unwrap_or_else(|_| "{}".to_string())
> + }
> +
> + /// Dump buffer contents (for debugging)
> + ///
> + /// Matches C's `clog_dump` function (logger.c:122-137)
> + #[allow(dead_code)]
> + pub fn dump(&self) {
> + for (idx, entry) in self.entries.iter().enumerate() {
> + println!(
> + "[{}] uid={:08x} time={} node={}{{{:016X}}} tag={}[{}{{{:016X}}}]: {}",
> + idx,
> + entry.uid,
> + entry.time,
> + entry.node,
> + entry.node_digest,
> + entry.tag,
> + entry.ident,
> + entry.ident_digest,
> + entry.message
> + );
> + }
> + }
> +
> + /// Serialize to C binary format (clog_base_t)
> + ///
> + /// Binary layout matches C structure:
> + /// ```c
> + /// struct clog_base {
> + /// uint32_t size; // Total buffer size
> + /// uint32_t cpos; // Current position (offset to newest entry)
> + /// char data[]; // Entry data
> + /// };
> + /// ```
> + pub(crate) fn serialize_binary(&self) -> Vec<u8> {
Please re-check, but in C, clusterlog_get_state() returns a full
memdump (the allocated ring buffer capacity), with cpos pointing at the
newest entry's offset (not always 8). Also, in C, entry.next is not a
pointer to the next/newer entry; it is the end offset of this entry
(entry_off + aligned_size), used to find where the next entry should
be written.
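To illustrate how I read those semantics (entry sizes below are made
up, names illustrative):

```rust
// next = this entry's end offset (entry_off + aligned_size), i.e. where the
// following entry will be written; cpos = offset of the newest entry.
const HEADER_SIZE: u32 = 8;

fn align8(size: u32) -> u32 {
    (size + 7) & !7
}

fn main() {
    let first_off = HEADER_SIZE;             // first entry right after header
    let first_next = first_off + align8(61); // its end offset, 8-byte aligned
    let second_off = first_next;             // second entry starts there
    let cpos = second_off;                   // after writing it, cpos = newest offset
    assert_eq!(first_next, 72);
    assert_eq!(cpos % 8, 0);
}
```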
> + // Empty buffer case
> + if self.entries.is_empty() {
> + let mut buf = Vec::with_capacity(8);
> + buf.extend_from_slice(&8u32.to_le_bytes()); // size = header only
> + buf.extend_from_slice(&0u32.to_le_bytes()); // cpos = 0 (empty)
> + return buf;
> + }
> +
> + // Calculate total size needed
> + let mut data_size = 0usize;
> + for entry in self.iter() {
> + data_size += entry.aligned_size();
> + }
> +
> + let total_size = 8 + data_size; // 8 bytes header + data
> + let mut buf = Vec::with_capacity(total_size);
> +
> + // Write header
> + buf.extend_from_slice(&(total_size as u32).to_le_bytes()); // size
> + buf.extend_from_slice(&8u32.to_le_bytes()); // cpos (points to first entry at offset 8)
> +
> + // Write entries with linked list structure
> + // Entries are in newest-first order in our VecDeque
> + let entry_count = self.entries.len();
> + let mut offsets = Vec::with_capacity(entry_count);
> + let mut current_offset = 8u32; // Start after header
> +
> + // Calculate offsets first
> + for entry in self.iter() {
> + offsets.push(current_offset);
> + current_offset += entry.aligned_size() as u32;
> + }
> +
> + // Write entries with prev/next pointers
> + // Build circular linked list: newest -> ... -> oldest
> + // Entry 0 (newest) has prev pointing to entry 1
> + // Last entry has prev = 0 (end of list)
> + for (i, entry) in self.iter().enumerate() {
> + let prev = if i + 1 < entry_count {
> + offsets[i + 1]
> + } else {
> + 0
> + };
> + let next = if i > 0 { offsets[i - 1] } else { 0 };
> +
> + let entry_bytes = entry.serialize_binary(prev, next);
> + buf.extend_from_slice(&entry_bytes);
> +
> + // Add padding to maintain 8-byte alignment
> + let aligned_size = entry.aligned_size();
> + let padding = aligned_size - entry_bytes.len();
> + buf.resize(buf.len() + padding, 0);
> + }
> +
> + buf
> + }
> +
> + /// Deserialize from C binary format
> + ///
> + /// Parses clog_base_t structure and extracts all entries
> + pub(crate) fn deserialize_binary(data: &[u8]) -> Result<Self> {
> + if data.len() < 8 {
> + bail!(
> + "Buffer too small: {} bytes (need at least 8 for header)",
> + data.len()
> + );
> + }
> +
> + // Read header
> + let size = u32::from_le_bytes(data[0..4].try_into()?) as usize;
> + let cpos = u32::from_le_bytes(data[4..8].try_into()?) as usize;
> +
> + if size != data.len() {
> + bail!(
> + "Size mismatch: header says {}, got {} bytes",
> + size,
> + data.len()
> + );
> + }
> +
> + if cpos < 8 || cpos >= size {
> + // Empty buffer (cpos == 0) or invalid
> + if cpos == 0 {
> + return Ok(Self::new(size));
> + }
> + bail!("Invalid cpos: {cpos} (size: {size})");
> + }
> +
> + // Parse entries starting from cpos, walking backwards via prev pointers
> + let mut entries = VecDeque::new();
> + let mut current_pos = cpos;
> +
C has wrap/overwrite guards when walking prev; we should probably
mirror those checks here too.
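Something along these lines (hypothetical types, standing in for the
parsed headers) would at least bound the walk so a corrupted or cyclic
prev chain cannot loop forever:

```rust
// Hypothetical guard for the prev-walk: entries is a stand-in list of
// (offset, prev, aligned_size) parsed headers. Bail out once the bytes
// accounted for exceed the buffer size: that can only happen if the chain
// wrapped onto overwritten entries or is cyclic.
fn count_entries(entries: &[(usize, usize, usize)], size: usize, cpos: usize) -> usize {
    let mut pos = cpos;
    let mut visited = 0usize; // bytes accounted for so far
    let mut count = 0;
    while pos >= 8 && pos < size {
        let Some(&(_, prev, len)) = entries.iter().find(|e| e.0 == pos) else {
            break; // no entry at this offset: stop instead of misparsing
        };
        visited += len;
        if visited > size {
            break; // more bytes than the buffer holds: corrupted chain
        }
        count += 1;
        pos = prev;
    }
    count
}

fn main() {
    // A deliberately cyclic chain (8 -> 16 -> 8 -> ...) terminates:
    assert_eq!(count_entries(&[(8, 16, 8), (16, 8, 8)], 20, 16), 2);
    // A well-formed single-entry chain still parses normally:
    assert_eq!(count_entries(&[(8, 0, 8)], 16, 8), 1);
}
```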
> + loop {
> + if current_pos == 0 || current_pos < 8 || current_pos >= size {
> + break;
> + }
> +
> + // Parse entry at current_pos
> + let entry_data = &data[current_pos..];
> + let (entry, prev, _next) = LogEntry::deserialize_binary(entry_data)?;
> +
> + // Add to back (we're walking backwards in time, newest to oldest)
> + // VecDeque should end up as [newest, ..., oldest]
> + entries.push_back(entry);
> +
> + current_pos = prev as usize;
> + }
> +
> + // Create ring buffer with entries
> + let mut ring = Self::new(size);
> + ring.entries = entries;
> + ring.current_size = size - 8; // Approximate
> +
> + Ok(ring)
> + }
> +}
> +
> +impl Default for RingBuffer {
> + fn default() -> Self {
> + Self::new(CLOG_DEFAULT_SIZE)
> + }
> +}
> +
> +#[cfg(test)]
> +mod tests {
> + use super::*;
> +
> + #[test]
> + fn test_ring_buffer_creation() {
> + let buffer = RingBuffer::new(CLOG_DEFAULT_SIZE);
> + assert_eq!(buffer.capacity, CLOG_DEFAULT_SIZE);
> + assert_eq!(buffer.len(), 0);
> + assert!(buffer.is_empty());
> + }
> +
> + #[test]
> + fn test_add_entry() {
> + let mut buffer = RingBuffer::new(CLOG_DEFAULT_SIZE);
> + let entry = LogEntry::pack("node1", "root", "tag", 0, 1000, 6, "message").unwrap();
> +
> + let result = buffer.add_entry(&entry);
> + assert!(result.is_ok());
> + assert_eq!(buffer.len(), 1);
> + assert!(!buffer.is_empty());
> + }
> +
> + #[test]
> + fn test_ring_buffer_wraparound() {
> + // Create a buffer with minimum required size (CLOG_MAX_ENTRY_SIZE * 10)
> + // but fill it beyond 90% to trigger wraparound
> + let mut buffer = RingBuffer::new(CLOG_MAX_ENTRY_SIZE * 10);
> +
> + // Add many small entries to fill the buffer
> + // Each entry is small, so we need many to fill the buffer
> + let initial_count = 50_usize;
> + for i in 0..initial_count {
> + let entry =
> + LogEntry::pack("node1", "root", "tag", 0, 1000 + i as u32, 6, "msg").unwrap();
> + let _ = buffer.add_entry(&entry);
> + }
> +
> + // All entries should fit initially
> + let count_before = buffer.len();
> + assert_eq!(count_before, initial_count);
> +
> + // Now add entries with large messages to trigger wraparound
> + // Make messages large enough to fill the buffer beyond capacity
> + let large_msg = "x".repeat(7000); // Very large message (close to max)
> + let large_entries_count = 20_usize;
> + for i in 0..large_entries_count {
> + let entry =
> + LogEntry::pack("node1", "root", "tag", 0, 2000 + i as u32, 6, &large_msg).unwrap();
> + let _ = buffer.add_entry(&entry);
> + }
> +
> + // Should have removed some old entries due to capacity limits
> + assert!(
> + buffer.len() < count_before + large_entries_count,
> + "Expected wraparound to remove old entries (have {} entries, expected < {})",
> + buffer.len(),
> + count_before + large_entries_count
> + );
> +
> + // Newest entry should be present
> + let newest = buffer.iter().next().unwrap();
> + assert_eq!(newest.time, 2000 + large_entries_count as u32 - 1); // Last added entry
> + }
> +
> + #[test]
> + fn test_sort_by_time() {
> + let mut buffer = RingBuffer::new(CLOG_DEFAULT_SIZE);
> +
> + // Add entries in random time order
> + let _ = buffer.add_entry(&LogEntry::pack("node1", "root", "tag", 0, 1002, 6, "c").unwrap());
> + let _ = buffer.add_entry(&LogEntry::pack("node1", "root", "tag", 0, 1000, 6, "a").unwrap());
> + let _ = buffer.add_entry(&LogEntry::pack("node1", "root", "tag", 0, 1001, 6, "b").unwrap());
> +
> + let sorted = buffer.sort().unwrap();
> +
> + // Check that entries are sorted by time (oldest first after reversing)
> + let times: Vec<u32> = sorted.iter().map(|e| e.time).collect();
> + let mut times_sorted = times.clone();
> + times_sorted.sort();
> + times_sorted.reverse(); // Newest first in buffer
> + assert_eq!(times, times_sorted);
> + }
> +
> + #[test]
> + fn test_sort_by_node_digest() {
> + let mut buffer = RingBuffer::new(CLOG_DEFAULT_SIZE);
> +
> + // Add entries with same time but different nodes
> + let _ = buffer.add_entry(&LogEntry::pack("node3", "root", "tag", 0, 1000, 6, "c").unwrap());
> + let _ = buffer.add_entry(&LogEntry::pack("node1", "root", "tag", 0, 1000, 6, "a").unwrap());
> + let _ = buffer.add_entry(&LogEntry::pack("node2", "root", "tag", 0, 1000, 6, "b").unwrap());
> +
> + let sorted = buffer.sort().unwrap();
> +
> + // Entries with the same time should be ordered by node_digest (descending)
> + for entries in sorted.iter().collect::<Vec<_>>().windows(2) {
> + if entries[0].time == entries[1].time {
> + assert!(entries[0].node_digest >= entries[1].node_digest);
> + }
> + }
> + }
> +
> + #[test]
> + fn test_json_dump() {
> + let mut buffer = RingBuffer::new(CLOG_DEFAULT_SIZE);
> + let _ = buffer
> + .add_entry(&LogEntry::pack("node1", "root", "cluster", 123, 1000, 6, "msg").unwrap());
> +
> + let json = buffer.dump_json(None, 50);
> +
> + // Should be valid JSON
> + let parsed: serde_json::Value = serde_json::from_str(&json).unwrap();
> + assert!(parsed.get("data").is_some());
> +
> + let data = parsed["data"].as_array().unwrap();
> + assert_eq!(data.len(), 1);
> +
> + let entry = &data[0];
> + assert_eq!(entry["node"], "node1");
> + assert_eq!(entry["user"], "root");
> + assert_eq!(entry["tag"], "cluster");
> + }
> +
> + #[test]
> + fn test_json_dump_with_filter() {
> + let mut buffer = RingBuffer::new(CLOG_DEFAULT_SIZE);
> +
> + // Add entries with different users
> + let _ =
> + buffer.add_entry(&LogEntry::pack("node1", "root", "tag", 0, 1000, 6, "msg1").unwrap());
> + let _ =
> + buffer.add_entry(&LogEntry::pack("node1", "admin", "tag", 0, 1001, 6, "msg2").unwrap());
> + let _ =
> + buffer.add_entry(&LogEntry::pack("node1", "root", "tag", 0, 1002, 6, "msg3").unwrap());
> +
> + // Filter for "root" only
> + let json = buffer.dump_json(Some("root"), 50);
> +
> + let parsed: serde_json::Value = serde_json::from_str(&json).unwrap();
> + let data = parsed["data"].as_array().unwrap();
> +
> + // Should only have 2 entries (the ones from "root")
> + assert_eq!(data.len(), 2);
> +
> + for entry in data {
> + assert_eq!(entry["user"], "root");
> + }
> + }
> +
> + #[test]
> + fn test_json_dump_max_entries() {
> + let mut buffer = RingBuffer::new(CLOG_DEFAULT_SIZE);
> +
> + // Add 10 entries
> + for i in 0..10 {
> + let _ = buffer
> + .add_entry(&LogEntry::pack("node1", "root", "tag", 0, 1000 + i, 6, "msg").unwrap());
> + }
> +
> + // Request only 5 entries
> + let json = buffer.dump_json(None, 5);
> +
> + let parsed: serde_json::Value = serde_json::from_str(&json).unwrap();
> + let data = parsed["data"].as_array().unwrap();
> +
> + assert_eq!(data.len(), 5);
> + }
> +
> + #[test]
> + fn test_iterator() {
> + let mut buffer = RingBuffer::new(CLOG_DEFAULT_SIZE);
> +
> + let _ = buffer.add_entry(&LogEntry::pack("node1", "root", "tag", 0, 1000, 6, "a").unwrap());
> + let _ = buffer.add_entry(&LogEntry::pack("node1", "root", "tag", 0, 1001, 6, "b").unwrap());
> + let _ = buffer.add_entry(&LogEntry::pack("node1", "root", "tag", 0, 1002, 6, "c").unwrap());
> +
> + let messages: Vec<String> = buffer.iter().map(|e| e.message.clone()).collect();
> +
> + // Should be in reverse order (newest first)
> + assert_eq!(messages, vec!["c", "b", "a"]);
> + }
> +
> + #[test]
> + fn test_binary_serialization_roundtrip() {
> + let mut buffer = RingBuffer::new(CLOG_DEFAULT_SIZE);
> +
> + let _ = buffer.add_entry(
> + &LogEntry::pack("node1", "root", "cluster", 123, 1000, 6, "Entry 1").unwrap(),
> + );
> + let _ = buffer.add_entry(
> + &LogEntry::pack("node2", "admin", "system", 456, 1001, 5, "Entry 2").unwrap(),
> + );
> +
> + // Serialize
> + let binary = buffer.serialize_binary();
> +
> + // Deserialize
> + let deserialized = RingBuffer::deserialize_binary(&binary).unwrap();
> +
> + // Check entry count
> + assert_eq!(deserialized.len(), buffer.len());
> +
> + // Check entries match
> + let orig_entries: Vec<_> = buffer.iter().collect();
> + let deser_entries: Vec<_> = deserialized.iter().collect();
> +
> + for (orig, deser) in orig_entries.iter().zip(deser_entries.iter()) {
> + assert_eq!(deser.uid, orig.uid);
> + assert_eq!(deser.time, orig.time);
> + assert_eq!(deser.node, orig.node);
> + assert_eq!(deser.message, orig.message);
> + }
> + }
> +
> + #[test]
> + fn test_binary_format_header() {
> + let mut buffer = RingBuffer::new(CLOG_DEFAULT_SIZE);
> + let _ = buffer.add_entry(&LogEntry::pack("n", "u", "t", 1, 1000, 6, "m").unwrap());
> +
> + let binary = buffer.serialize_binary();
> +
> + // Check header format
> + assert!(binary.len() >= 8);
> +
> + let size = u32::from_le_bytes(binary[0..4].try_into().unwrap()) as usize;
> + let cpos = u32::from_le_bytes(binary[4..8].try_into().unwrap());
> +
> + assert_eq!(size, binary.len());
> + assert_eq!(cpos, 8); // First entry at offset 8
> + }
> +
> + #[test]
> + fn test_binary_empty_buffer() {
> + let buffer = RingBuffer::new(1024);
> + let binary = buffer.serialize_binary();
> +
> + // Empty buffer should just be header
> + assert_eq!(binary.len(), 8);
> +
> + let deserialized = RingBuffer::deserialize_binary(&binary).unwrap();
> + assert_eq!(deserialized.len(), 0);
> + }
> +}
_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
* Re: [pve-devel] [PATCH pve-cluster 02/15] pmxcfs-rs: add pmxcfs-config crate
2026-01-23 15:01 6% ` Samuel Rufinatscha
@ 2026-01-26 9:43 5% ` Kefu Chai
0 siblings, 0 replies; 200+ results
From: Kefu Chai @ 2026-01-26 9:43 UTC (permalink / raw)
To: Samuel Rufinatscha, Proxmox VE development discussion
On Fri Jan 23, 2026 at 11:01 PM CST, Samuel Rufinatscha wrote:
> comments inline
>
> On 1/6/26 3:25 PM, Kefu Chai wrote:
>> Add configuration management crate that provides:
>> - Config struct for runtime configuration
>> - Node hostname, IP, and group ID tracking
>> - Debug and local mode flags
>> - Thread-safe configuration access via parking_lot Mutex
>>
>> This is a foundational crate with no internal dependencies, only
>> requiring parking_lot for synchronization. Other crates will use
>> this for accessing runtime configuration.
>>
>> Signed-off-by: Kefu Chai <k.chai@proxmox.com>
>> ---
>> src/pmxcfs-rs/Cargo.toml | 3 +-
>> src/pmxcfs-rs/pmxcfs-config/Cargo.toml | 16 +
>> src/pmxcfs-rs/pmxcfs-config/README.md | 127 +++++++
>> src/pmxcfs-rs/pmxcfs-config/src/lib.rs | 471 +++++++++++++++++++++++++
>> 4 files changed, 616 insertions(+), 1 deletion(-)
>> create mode 100644 src/pmxcfs-rs/pmxcfs-config/Cargo.toml
>> create mode 100644 src/pmxcfs-rs/pmxcfs-config/README.md
>> create mode 100644 src/pmxcfs-rs/pmxcfs-config/src/lib.rs
>>
>> diff --git a/src/pmxcfs-rs/Cargo.toml b/src/pmxcfs-rs/Cargo.toml
>> index 15d88f52..28e20bb7 100644
>> --- a/src/pmxcfs-rs/Cargo.toml
>> +++ b/src/pmxcfs-rs/Cargo.toml
>> @@ -1,7 +1,8 @@
>> # Workspace root for pmxcfs Rust implementation
>> [workspace]
>> members = [
>> - "pmxcfs-api-types", # Shared types and error definitions
>> + "pmxcfs-api-types", # Shared types and error definitions
>> + "pmxcfs-config", # Configuration management
>> ]
>> resolver = "2"
>>
>> diff --git a/src/pmxcfs-rs/pmxcfs-config/Cargo.toml b/src/pmxcfs-rs/pmxcfs-config/Cargo.toml
>> new file mode 100644
>> index 00000000..f5a60995
>> --- /dev/null
>> +++ b/src/pmxcfs-rs/pmxcfs-config/Cargo.toml
>> @@ -0,0 +1,16 @@
>> +[package]
>> +name = "pmxcfs-config"
>> +description = "Configuration management for pmxcfs"
>> +
>> +version.workspace = true
>> +edition.workspace = true
>> +authors.workspace = true
>> +license.workspace = true
>> +repository.workspace = true
>> +
>> +[lints]
>> +workspace = true
>> +
>> +[dependencies]
>> +# Concurrency primitives
>> +parking_lot.workspace = true
>> diff --git a/src/pmxcfs-rs/pmxcfs-config/README.md b/src/pmxcfs-rs/pmxcfs-config/README.md
>> new file mode 100644
>> index 00000000..c06b2170
>> --- /dev/null
>> +++ b/src/pmxcfs-rs/pmxcfs-config/README.md
>> @@ -0,0 +1,127 @@
>> +# pmxcfs-config
>> +
>> +**Configuration Management** and **Cluster Services** for pmxcfs.
>> +
>> +This crate provides configuration structures and cluster integration services including quorum tracking and cluster configuration monitoring via Corosync APIs.
>> +
>> +## Overview
>> +
>> +This crate contains:
>> +1. **Config struct**: Runtime configuration (node name, IPs, flags)
>> +2. Integration with Corosync services (tracked in main pmxcfs crate):
>> + - **QuorumService** (`pmxcfs/src/quorum_service.rs`) - Quorum monitoring
>> + - **ClusterConfigService** (`pmxcfs/src/cluster_config_service.rs`) - Config tracking
>
> This patch only contains the Config struct, but not Cluster Services
> or QuorumService, please revist commit message and README.
Sorry, the README.md was out of sync after the latest refactoring. Fixed.
>
>> +
>> +## Config Struct
>> +
>> +The `Config` struct holds daemon-wide configuration including node hostname, IP address, www-data group ID, debug flag, local mode flag, and cluster name.
>> +
>> +## Cluster Services
>> +
>> +The following services are implemented in the main pmxcfs crate but documented here for completeness.
>> +
>> +### QuorumService
>> +
>> +**C Equivalent:** `src/pmxcfs/quorum.c` - `service_quorum_new()`
>> +**Rust Location:** `src/pmxcfs-rs/pmxcfs/src/quorum_service.rs`
>> +
>> +Monitors cluster quorum status via Corosync quorum API.
>> +
>> +#### Features
>> +- Tracks quorum state (quorate/inquorate)
>> +- Monitors member list changes
>> +- Automatic reconnection on Corosync restart
>> +- Updates `Status` quorum flag
>> +
>> +#### C to Rust Mapping
>> +
>> +| C Function | Rust Equivalent | Location |
>> +|-----------|-----------------|----------|
>> +| `service_quorum_new()` | `QuorumService::new()` | quorum_service.rs |
>> +| `service_quorum_destroy()` | (Drop trait / finalize) | Automatic |
>> +| `quorum_notification_fn` | quorum_notification closure | quorum_service.rs |
>> +| `nodelist_notification_fn` | nodelist_notification closure | quorum_service.rs |
>> +
>> +#### Quorum Notifications
>> +
>> +The service monitors quorum state changes and member list changes, updating the Status accordingly.
>> +
>> +### ClusterConfigService
>> +
>> +**C Equivalent:** `src/pmxcfs/confdb.c` - `service_confdb_new()`
>> +**Rust Location:** `src/pmxcfs-rs/pmxcfs/src/cluster_config_service.rs`
>> +
>> +Monitors Corosync cluster configuration (cmap) and tracks node membership.
>> +
>> +#### Features
>> +- Monitors cluster membership via Corosync cmap API
>> +- Tracks node additions/removals
>> +- Registers nodes in Status
>> +- Automatic reconnection on Corosync restart
>> +
>> +#### C to Rust Mapping
>> +
>> +| C Function | Rust Equivalent | Location |
>> +|-----------|-----------------|----------|
>> +| `service_confdb_new()` | `ClusterConfigService::new()` | cluster_config_service.rs |
>> +| `service_confdb_destroy()` | (Drop trait / finalize) | Automatic |
>> +| `confdb_track_fn` | (direct cmap queries) | Different approach |
>> +
>> +#### Configuration Tracking
>> +
>> +The service monitors:
>> +- `nodelist.node.*.nodeid` - Node IDs
>> +- `nodelist.node.*.name` - Node names
>> +- `nodelist.node.*.ring*_addr` - Node IP addresses
>> +
>> +Updates `Status` with current cluster membership.
>> +
>> +## Key Differences from C Implementation
>> +
>> +### Cluster Config Service API
>> +
>> +**C Version (confdb.c):**
>> +- Uses deprecated confdb API
>> +- Track changes via confdb notifications
>> +
>> +**Rust Version:**
>> +- Uses modern cmap API
>> +- Direct cmap queries
>> +
>> +Both read the same data, but Rust uses the modern Corosync API.
>> +
>> +### Service Integration
>> +
>> +**C Version:**
>> +- qb_loop manages lifecycle
>> +
>> +**Rust Version:**
>> +- Service trait abstracts lifecycle
>> +- ServiceManager handles retry
>> +- Tokio async dispatch
>> +
>> +## Known Issues / TODOs
>> +
>> +### Compatibility
>> +- **Quorum tracking**: Compatible with C implementation
>> +- **Node registration**: Equivalent behavior
>> +- **cmap vs confdb**: Rust uses modern cmap API (C uses deprecated confdb)
>> +
>> +### Missing Features
>> +- None identified
>> +
>> +### Behavioral Differences (Benign)
>> +- **API choice**: Rust uses cmap, C uses confdb (both read same data)
>> +- **Lifecycle**: Rust uses Service trait, C uses manual lifecycle
>> +
>> +## References
>> +
>> +### C Implementation
>> +- `src/pmxcfs/quorum.c` / `quorum.h` - Quorum service
>> +- `src/pmxcfs/confdb.c` / `confdb.h` - Cluster config service
>> +
>> +### Related Crates
>> +- **pmxcfs**: Main daemon with QuorumService and ClusterConfigService
>> +- **pmxcfs-status**: Status tracking updated by these services
>> +- **pmxcfs-services**: Service framework used by both services
>> +- **rust-corosync**: Corosync FFI bindings
>> diff --git a/src/pmxcfs-rs/pmxcfs-config/src/lib.rs b/src/pmxcfs-rs/pmxcfs-config/src/lib.rs
>> new file mode 100644
>> index 00000000..5e1ee1b2
>> --- /dev/null
>> +++ b/src/pmxcfs-rs/pmxcfs-config/src/lib.rs
>> @@ -0,0 +1,471 @@
>> +use parking_lot::RwLock;
>> +use std::sync::Arc;
>> +
>> +/// Global configuration for pmxcfs
>> +pub struct Config {
>> + /// Node name (hostname without domain)
>> + pub nodename: String,
>> +
>> + /// Node IP address
>> + pub node_ip: String,
>
> Consider using std::net::IpAddr (or SocketAddr if a port is part of the
> value). Tests currently mix IP vs IP:PORT, so it’s unclear what node_ip
> is supposed to represent.
It's a value extracted from resolve_node_ip(), so it's just an IP
address. Switched to IpAddr, and updated the tests accordingly.
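For illustration, a minimal sketch of what the suggested switch could look like. The `Config` here is trimmed to the one relevant field and is not the actual patch code; the point is that `IpAddr`'s `FromStr` impl rejects the `IP:PORT` strings that the current tests mix in:

```rust
use std::net::IpAddr;

/// Trimmed-down sketch; the real Config has more fields.
struct Config {
    node_ip: IpAddr,
}

impl Config {
    fn new(node_ip: IpAddr) -> Self {
        Self { node_ip }
    }

    fn node_ip(&self) -> IpAddr {
        self.node_ip
    }
}

fn main() {
    // A plain address parses fine...
    let ip: IpAddr = "192.168.1.10".parse().expect("valid IP");
    let config = Config::new(ip);
    assert_eq!(config.node_ip().to_string(), "192.168.1.10");

    // ...while an IP:PORT value is rejected at parse time instead of
    // silently being stored as an opaque String.
    assert!("192.168.1.10:8006".parse::<IpAddr>().is_err());
}
```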
>
>> +
>> + /// www-data group ID for file permissions
>> + pub www_data_gid: u32,
>> +
>> + /// Debug mode enabled
>> + pub debug: bool,
>> +
>> + /// Force local mode (no clustering)
>> + pub local_mode: bool,
>> +
>> + /// Cluster name (CPG group name)
>> + pub cluster_name: String,
>> +
>> + /// Debug level (0 = normal, 1+ = debug) - mutable at runtime
>> + debug_level: RwLock<u8>,
>
> in the crate docs it says: “The Config struct uses Arc<AtomicU8> for
> debug_level” but the implementation uses parking_lot::RwLock<u8>.
> Unless we need lock coupling with other fields, AtomicU8 would likely
> be sufficient (and cheaper) for debug_level. Also please re-check the
> commit message, which mentions parking_lot::Mutex.
Indeed. AtomicU8 is lighter-weight and simpler than RwLock; changed
accordingly.
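A sketch of the agreed-on shape, also folding in the point from further down the thread that `is_debug()` should derive from the effective level. Again a trimmed hypothetical, not the patch itself; `Ordering::Relaxed` suffices since the level is an independent flag with no ordering requirements against other data:

```rust
use std::sync::atomic::{AtomicU8, Ordering};

/// Trimmed-down sketch; the real Config has more fields.
struct Config {
    debug_level: AtomicU8,
}

impl Config {
    fn new(debug: bool) -> Self {
        Self {
            debug_level: AtomicU8::new(if debug { 1 } else { 0 }),
        }
    }

    fn debug_level(&self) -> u8 {
        self.debug_level.load(Ordering::Relaxed)
    }

    fn set_debug_level(&self, level: u8) {
        self.debug_level.store(level, Ordering::Relaxed);
    }

    /// Derived helper instead of a separate stored bool, so it always
    /// reflects the effective level after runtime changes.
    fn is_debug(&self) -> bool {
        self.debug_level() > 0
    }
}

fn main() {
    let cfg = Config::new(false);
    assert!(!cfg.is_debug());
    cfg.set_debug_level(5);
    assert!(cfg.is_debug()); // tracks the runtime level, unlike a frozen bool
}
```

Since `set_debug_level` takes `&self`, the atomic also keeps `Config` shareable behind a plain `Arc` without interior `RwLock` state.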
>
>> +}
>> +
>> +impl Clone for Config {
>> + fn clone(&self) -> Self {
>> + Self {
>> + nodename: self.nodename.clone(),
>> + node_ip: self.node_ip.clone(),
>> + www_data_gid: self.www_data_gid,
>> + debug: self.debug,
>> + local_mode: self.local_mode,
>> + cluster_name: self.cluster_name.clone(),
>> + debug_level: RwLock::new(*self.debug_level.read()),
>> + }
>> + }
>> +}
>> +
>> +impl std::fmt::Debug for Config {
>> + fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
>> + f.debug_struct("Config")
>> + .field("nodename", &self.nodename)
>> + .field("node_ip", &self.node_ip)
>> + .field("www_data_gid", &self.www_data_gid)
>> + .field("debug", &self.debug)
>> + .field("local_mode", &self.local_mode)
>> + .field("cluster_name", &self.cluster_name)
>> + .field("debug_level", &*self.debug_level.read())
>> + .finish()
>> + }
>> +}
>> +
>> +impl Config {
>> + pub fn new(
>> + nodename: String,
>> + node_ip: String,
>> + www_data_gid: u32,
>> + debug: bool,
>> + local_mode: bool,
>> + cluster_name: String,
>> + ) -> Arc<Self> {
>
> The constructor returns Arc<Config>
> I think we could keep new() -> Self, and provide convenience
> constructor shared() -> Arc<Self>.
> This would allow local usage (e.g. for tests) without heap allocation
> of the struct
Config::new() is added, and the tests now use it.
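The two-constructor split being discussed could look roughly like this (hypothetical sketch with a trimmed field set): `new()` returns a plain value for tests and local use, while a `shared()` convenience wraps it for the daemon's shared-ownership case:

```rust
use std::sync::Arc;

/// Trimmed-down sketch; the real Config has more fields.
struct Config {
    nodename: String,
}

impl Config {
    /// Plain constructor: no heap allocation, convenient for tests.
    fn new(nodename: String) -> Self {
        Self { nodename }
    }

    /// Convenience constructor for the daemon, where the config is
    /// shared across threads.
    fn shared(nodename: String) -> Arc<Self> {
        Arc::new(Self::new(nodename))
    }
}

fn main() {
    let local = Config::new("node1".to_string()); // stack value for tests
    let shared = Config::shared("node1".to_string()); // Arc for the daemon
    assert_eq!(local.nodename, shared.nodename);
}
```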
>
>> + let debug_level = if debug { 1 } else { 0 };
>
> debug_level is derived from debug at creation time, but thereafter:
> set_debug_level() does not update debug and is_debug() would continue
> to reflect the initial flag, not the effective debug level
> is_debug() should just be a helper that returns self.debug_level() > 0.
> The debug field should probably be removed entirely.
Ahh, thanks for pointing this out. Fixed.
>
>> + Arc::new(Self {
>> + nodename,
>> + node_ip,
>> + www_data_gid,
>> + debug,
>> + local_mode,
>> + cluster_name,
>> + debug_level: RwLock::new(debug_level),
>> + })
>> + }
>> +
>> + pub fn cluster_name(&self) -> &str {
>> + &self.cluster_name
>> + }
>> +
>> + pub fn nodename(&self) -> &str {
>> + &self.nodename
>> + }
>> +
>> + pub fn node_ip(&self) -> &str {
>> + &self.node_ip
>> + }
>> +
>> + pub fn www_data_gid(&self) -> u32 {
>> + self.www_data_gid
>> + }
>> +
>> + pub fn is_debug(&self) -> bool {
>> + self.debug
>> + }
>> +
>> + pub fn is_local_mode(&self) -> bool {
>> + self.local_mode
>> + }
>> +
>> + /// Get current debug level (0 = normal, 1+ = debug)
>> + pub fn debug_level(&self) -> u8 {
>> + *self.debug_level.read()
>> + }
>> +
>> + /// Set debug level (0 = normal, 1+ = debug)
>> + pub fn set_debug_level(&self, level: u8) {
>> + *self.debug_level.write() = level;
>> + }
>
> Right now most fields are pub but also getters are exposed. This will
> make it harder to enforce invariants.
> I would suggest to make fields private and keep getters, or keep fields
> public and drop the getters.
Indeed. I made all fields private and kept the getters.
>
>> +}
>> +
>> +#[cfg(test)]
>> +mod tests {
>> + //! Unit tests for Config struct
>> + //!
>> + //! This test module provides comprehensive coverage for:
>> + //! - Configuration creation and initialization
>> + //! - Getter methods for all configuration fields
>> + //! - Debug level mutation and thread safety
>> + //! - Concurrent access patterns (reads and writes)
>> + //! - Clone independence
>> + //! - Debug formatting
>> + //! - Edge cases (empty strings, long strings, special characters, unicode)
>> + //!
>> + //! ## Thread Safety
>> + //!
>> + //! The Config struct uses `Arc<AtomicU8>` for debug_level to allow
>> + //! safe concurrent reads and writes. Tests verify:
>> + //! - 10 threads × 100 operations (concurrent modifications)
>> + //! - 20 threads × 1000 operations (concurrent reads)
>> + //!
>> + //! ## Edge Cases
>> + //!
>> + //! Tests cover various edge cases including:
>> + //! - Empty strings for node/cluster names
>> + //! - Long strings (1000+ characters)
>> + //! - Special characters in strings
>> + //! - Unicode support (emoji, non-ASCII characters)
>> +
>> + use super::*;
>> + use std::thread;
>> +
>> + // ===== Basic Construction Tests =====
>> +
>> + #[test]
>> + fn test_config_creation() {
>> + let config = Config::new(
>> + "node1".to_string(),
>> + "192.168.1.10".to_string(),
>> + 33,
>> + false,
>> + false,
>> + "pmxcfs".to_string(),
>> + );
>> +
>> + assert_eq!(config.nodename(), "node1");
>> + assert_eq!(config.node_ip(), "192.168.1.10");
>> + assert_eq!(config.www_data_gid(), 33);
>> + assert!(!config.is_debug());
>> + assert!(!config.is_local_mode());
>> + assert_eq!(config.cluster_name(), "pmxcfs");
>> + assert_eq!(
>> + config.debug_level(),
>> + 0,
>> + "Debug level should be 0 when debug is false"
>> + );
>> + }
>> +
>> + #[test]
>> + fn test_config_creation_with_debug() {
>> + let config = Config::new(
>> + "node2".to_string(),
>> + "10.0.0.5".to_string(),
>> + 1000,
>> + true,
>> + false,
>> + "test-cluster".to_string(),
>> + );
>> +
>> + assert!(config.is_debug());
>> + assert_eq!(
>> + config.debug_level(),
>> + 1,
>> + "Debug level should be 1 when debug is true"
>> + );
>> + }
>> +
>> + #[test]
>> + fn test_config_creation_local_mode() {
>> + let config = Config::new(
>> + "localhost".to_string(),
>> + "127.0.0.1".to_string(),
>> + 33,
>> + false,
>> + true,
>> + "local".to_string(),
>> + );
>> +
>> + assert!(config.is_local_mode());
>> + assert!(!config.is_debug());
>> + }
>> +
>> + // ===== Getter Tests =====
>> +
>> + #[test]
>> + fn test_all_getters() {
>> + let config = Config::new(
>> + "testnode".to_string(),
>> + "172.16.0.1".to_string(),
>> + 999,
>> + true,
>> + true,
>> + "my-cluster".to_string(),
>> + );
>> +
>> + // Test all getter methods
>> + assert_eq!(config.nodename(), "testnode");
>> + assert_eq!(config.node_ip(), "172.16.0.1");
>> + assert_eq!(config.www_data_gid(), 999);
>> + assert!(config.is_debug());
>> + assert!(config.is_local_mode());
>> + assert_eq!(config.cluster_name(), "my-cluster");
>> + assert_eq!(config.debug_level(), 1);
>> + }
>> +
>> + // ===== Debug Level Mutation Tests =====
>> +
>> + #[test]
>> + fn test_debug_level_mutation() {
>> + let config = Config::new(
>> + "node1".to_string(),
>> + "192.168.1.1".to_string(),
>> + 33,
>> + false,
>> + false,
>> + "pmxcfs".to_string(),
>> + );
>> +
>> + assert_eq!(config.debug_level(), 0);
>> +
>> + config.set_debug_level(1);
>> + assert_eq!(config.debug_level(), 1);
>> +
>> + config.set_debug_level(5);
>> + assert_eq!(config.debug_level(), 5);
>> +
>> + config.set_debug_level(0);
>> + assert_eq!(config.debug_level(), 0);
>> + }
>> +
>> + #[test]
>> + fn test_debug_level_max_value() {
>> + let config = Config::new(
>> + "node1".to_string(),
>> + "192.168.1.1".to_string(),
>> + 33,
>> + false,
>> + false,
>> + "pmxcfs".to_string(),
>> + );
>> +
>> + config.set_debug_level(255);
>> + assert_eq!(config.debug_level(), 255);
>> +
>> + config.set_debug_level(0);
>> + assert_eq!(config.debug_level(), 0);
>> + }
>> +
>> + // ===== Thread Safety Tests =====
>> +
>> + #[test]
>> + fn test_debug_level_thread_safety() {
>> + let config = Config::new(
>> + "node1".to_string(),
>> + "192.168.1.1".to_string(),
>> + 33,
>> + false,
>> + false,
>> + "pmxcfs".to_string(),
>> + );
>> +
>> + let config_clone = Arc::clone(&config);
>> +
>> + // Spawn multiple threads that concurrently modify debug level
>> + let handles: Vec<_> = (0..10)
>> + .map(|i| {
>> + let cfg = Arc::clone(&config);
>> + thread::spawn(move || {
>> + for _ in 0..100 {
>> + cfg.set_debug_level(i);
>> + let _ = cfg.debug_level();
>> + }
>> + })
>> + })
>> + .collect();
>> +
>> + // All threads should complete without panicking
>> + for handle in handles {
>> + handle.join().unwrap();
>> + }
>> +
>> + // Final value should be one of the values set by threads
>> + let final_level = config_clone.debug_level();
>> + assert!(
>> + final_level < 10,
>> + "Debug level should be < 10, got {final_level}"
>> + );
>> + }
>> +
>> + #[test]
>> + fn test_concurrent_reads() {
>> + let config = Config::new(
>> + "node1".to_string(),
>> + "192.168.1.1".to_string(),
>> + 33,
>> + true,
>> + false,
>> + "pmxcfs".to_string(),
>> + );
>> +
>> + // Spawn multiple threads that concurrently read config
>> + let handles: Vec<_> = (0..20)
>> + .map(|_| {
>> + let cfg = Arc::clone(&config);
>> + thread::spawn(move || {
>> + for _ in 0..1000 {
>> + assert_eq!(cfg.nodename(), "node1");
>> + assert_eq!(cfg.node_ip(), "192.168.1.1");
>> + assert_eq!(cfg.www_data_gid(), 33);
>> + assert!(cfg.is_debug());
>> + assert!(!cfg.is_local_mode());
>> + assert_eq!(cfg.cluster_name(), "pmxcfs");
>> + }
>> + })
>> + })
>> + .collect();
>> +
>> + for handle in handles {
>> + handle.join().unwrap();
>> + }
>> + }
>> +
>> + // ===== Clone Tests =====
>> +
>> + #[test]
>> + fn test_config_clone() {
>> + let config1 = Config::new(
>> + "node1".to_string(),
>> + "192.168.1.1".to_string(),
>> + 33,
>> + true,
>> + false,
>> + "pmxcfs".to_string(),
>> + );
>> +
>> + config1.set_debug_level(5);
>> +
>> + let config2 = (*config1).clone();
>> +
>> + // Cloned config should have same values
>> + assert_eq!(config2.nodename(), config1.nodename());
>> + assert_eq!(config2.node_ip(), config1.node_ip());
>> + assert_eq!(config2.www_data_gid(), config1.www_data_gid());
>> + assert_eq!(config2.is_debug(), config1.is_debug());
>> + assert_eq!(config2.is_local_mode(), config1.is_local_mode());
>> + assert_eq!(config2.cluster_name(), config1.cluster_name());
>> + assert_eq!(config2.debug_level(), 5);
>> +
>> + // Modifying one should not affect the other
>> + config2.set_debug_level(10);
>> + assert_eq!(config1.debug_level(), 5);
>> + assert_eq!(config2.debug_level(), 10);
>> + }
>> +
>> + // ===== Debug Formatting Tests =====
>> +
>> + #[test]
>> + fn test_debug_format() {
>> + let config = Config::new(
>> + "node1".to_string(),
>> + "192.168.1.1".to_string(),
>> + 33,
>> + true,
>> + false,
>> + "pmxcfs".to_string(),
>> + );
>> +
>> + let debug_str = format!("{config:?}");
>> +
>> + // Check that debug output contains all fields
>> + assert!(debug_str.contains("Config"));
>> + assert!(debug_str.contains("nodename"));
>> + assert!(debug_str.contains("node1"));
>> + assert!(debug_str.contains("node_ip"));
>> + assert!(debug_str.contains("192.168.1.1"));
>> + assert!(debug_str.contains("www_data_gid"));
>> + assert!(debug_str.contains("33"));
>> + assert!(debug_str.contains("debug"));
>> + assert!(debug_str.contains("true"));
>> + assert!(debug_str.contains("local_mode"));
>> + assert!(debug_str.contains("false"));
>> + assert!(debug_str.contains("cluster_name"));
>> + assert!(debug_str.contains("pmxcfs"));
>> + assert!(debug_str.contains("debug_level"));
>> + }
>> +
>> + // ===== Edge Cases and Boundary Tests =====
>> +
>> + #[test]
>> + fn test_empty_strings() {
>> + let config = Config::new(String::new(), String::new(), 0, false, false, String::new());
>> +
>> + assert_eq!(config.nodename(), "");
>> + assert_eq!(config.node_ip(), "");
>> + assert_eq!(config.cluster_name(), "");
>> + assert_eq!(config.www_data_gid(), 0);
>> + }
>> +
>> + #[test]
>> + fn test_long_strings() {
>> + let long_name = "a".repeat(1000);
>> + let long_ip = "192.168.1.".to_string() + &"1".repeat(100);
>> + let long_cluster = "cluster-".to_string() + &"x".repeat(500);
>> +
>> + let config = Config::new(
>> + long_name.clone(),
>> + long_ip.clone(),
>> + u32::MAX,
>> + true,
>> + true,
>> + long_cluster.clone(),
>> + );
>> +
>> + assert_eq!(config.nodename(), long_name);
>> + assert_eq!(config.node_ip(), long_ip);
>> + assert_eq!(config.cluster_name(), long_cluster);
>> + assert_eq!(config.www_data_gid(), u32::MAX);
>> + }
>> +
>> + #[test]
>> + fn test_special_characters_in_strings() {
>> + let config = Config::new(
>> + "node-1_test.local".to_string(),
>> + "192.168.1.10:8006".to_string(),
>> + 33,
>> + false,
>> + false,
>> + "my-cluster_v2.0".to_string(),
>> + );
>> +
>> + assert_eq!(config.nodename(), "node-1_test.local");
>> + assert_eq!(config.node_ip(), "192.168.1.10:8006");
>> + assert_eq!(config.cluster_name(), "my-cluster_v2.0");
>> + }
>> +
>> + #[test]
>> + fn test_unicode_in_strings() {
>> + let config = Config::new(
>> + "ノード1".to_string(),
>> + "::1".to_string(),
>> + 33,
>> + false,
>> + false,
>> + "集群".to_string(),
>> + );
>> +
>> + assert_eq!(config.nodename(), "ノード1");
>> + assert_eq!(config.node_ip(), "::1");
>> + assert_eq!(config.cluster_name(), "集群");
>> + }
>> +}
* Re: [pve-devel] [PATCH pve-cluster 01/15] pmxcfs-rs: add workspace and pmxcfs-api-types crate
2026-01-23 14:17 6% ` Samuel Rufinatscha
@ 2026-01-26 9:00 6% ` Kefu Chai
0 siblings, 0 replies; 200+ results
From: Kefu Chai @ 2026-01-26 9:00 UTC (permalink / raw)
To: Samuel Rufinatscha, Proxmox VE development discussion
On Fri Jan 23, 2026 at 10:17 PM CST, Samuel Rufinatscha wrote:
> Thanks for the series. I’ve started reviewing patches 1–6; sending
> notes for patch 1 first, and I’ll follow up with comments on the
> others once I’ve gone through them in more depth.
Hi Samuel, thanks for your review. Replies inlined.
>
> comments inline
>
> On 1/6/26 3:25 PM, Kefu Chai wrote:
>> Initialize the Rust workspace for the pmxcfs rewrite project.
>>
>> Add pmxcfs-api-types crate which provides foundational types:
>> - PmxcfsError: Error type with errno mapping for FUSE operations
>> - FuseMessage: Filesystem operation messages
>> - KvStoreMessage: Status synchronization messages
>> - ApplicationMessage: Wrapper enum for both message types
>> - VmType: VM type enum (Qemu, Lxc)
>>
>> This is the foundation crate with no internal dependencies, only
>> requiring thiserror and libc. All other crates will depend on these
>> shared type definitions.
>>
>> Signed-off-by: Kefu Chai <k.chai@proxmox.com>
>> ---
>> src/pmxcfs-rs/Cargo.lock | 2067 +++++++++++++++++++++
>
> Following the .gitignore pattern in our other repos, Cargo.lock is
> ignored, so I’d suggest dropping it from the series.
dropped.
>
>> src/pmxcfs-rs/Cargo.toml | 83 +
>> src/pmxcfs-rs/pmxcfs-api-types/Cargo.toml | 19 +
>> src/pmxcfs-rs/pmxcfs-api-types/README.md | 105 ++
>> src/pmxcfs-rs/pmxcfs-api-types/src/lib.rs | 152 ++
>> 5 files changed, 2426 insertions(+)
>> create mode 100644 src/pmxcfs-rs/Cargo.lock
>> create mode 100644 src/pmxcfs-rs/Cargo.toml
>> create mode 100644 src/pmxcfs-rs/pmxcfs-api-types/Cargo.toml
>> create mode 100644 src/pmxcfs-rs/pmxcfs-api-types/README.md
>> create mode 100644 src/pmxcfs-rs/pmxcfs-api-types/src/lib.rs
>>
>> diff --git a/src/pmxcfs-rs/Cargo.lock b/src/pmxcfs-rs/Cargo.lock
>
> [..]
>
>> +++ b/src/pmxcfs-rs/Cargo.toml
>> @@ -0,0 +1,83 @@
>> +# Workspace root for pmxcfs Rust implementation
>> +[workspace]
>> +members = [
>> + "pmxcfs-api-types", # Shared types and error definitions
>> +]
>> +resolver = "2"
>> +
>> +[workspace.package]
>> +version = "9.0.6"
>> +edition = "2024"
>> +authors = ["Proxmox Support Team <support@proxmox.com>"]
>> +license = "AGPL-3.0"
>> +repository = "https://git.proxmox.com/?p=pve-cluster.git"
>> +rust-version = "1.85"
>> +
>> +[workspace.dependencies]
>
> Here we already declare workspace path deps for crates that aren’t
> present yet (pmxcfs-config, pmxcfs-memdb, ...). For bisectability,
> could we keep this patch minimal and add those workspace
> members/path deps in the patches where the crates are introduced?
restructured the commits to add the deps only when they are used.
>
>> +# Internal workspace dependencies
>> +pmxcfs-api-types = { path = "pmxcfs-api-types" }
>> +pmxcfs-config = { path = "pmxcfs-config" }
>> +pmxcfs-memdb = { path = "pmxcfs-memdb" }
>> +pmxcfs-dfsm = { path = "pmxcfs-dfsm" }
>> +pmxcfs-rrd = { path = "pmxcfs-rrd" }
>> +pmxcfs-status = { path = "pmxcfs-status" }
>> +pmxcfs-ipc = { path = "pmxcfs-ipc" }
>> +pmxcfs-services = { path = "pmxcfs-services" }
>> +pmxcfs-logger = { path = "pmxcfs-logger" }
>> +
>> +# Core async runtime
>> +tokio = { version = "1.35", features = ["full"] }
>> +tokio-util = "0.7"
>> +async-trait = "0.1"
>> +
>
> If the goal is to centrally pin external crate versions early, maybe
> limit [workspace.dependencies] here generally to the crates actually
> used by pmxcfs-api-types (thiserror, libc) and extend as new crates
> are added.
likewise.
>
>> +# Error handling
>> +anyhow = "1.0"
>> +thiserror = "1.0"
>> +
>> +# Logging and tracing
>> +tracing = "0.1"
>> +tracing-subscriber = { version = "0.3", features = ["env-filter"] }
>> +
>> +# Serialization
>> +serde = { version = "1.0", features = ["derive"] }
>> +serde_json = "1.0"
>> +bincode = "1.3"
>> +
>> +# Network and cluster
>> +bytes = "1.5"
>> +sha2 = "0.10"
>> +bytemuck = { version = "1.14", features = ["derive"] }
>> +
>> +# System integration
>> +libc = "0.2"
>> +nix = { version = "0.27", features = ["fs", "process", "signal", "user", "socket"] }
>> +users = "0.11"
>> +
>> +# Corosync/CPG bindings
>> +rust-corosync = "0.1"
>> +
>> +# Enum conversions
>> +num_enum = "0.7"
>> +
>> +# Concurrency primitives
>> +parking_lot = "0.12"
>> +
>> +# Utilities
>> +chrono = "0.4"
>> +futures = "0.3"
>> +
>> +# Development dependencies
>> +tempfile = "3.8"
>> +
>> +[workspace.lints.clippy]
>> +uninlined_format_args = "warn"
>> +
>> +[profile.release]
>> +lto = true
>> +codegen-units = 1
>> +opt-level = 3
>> +strip = true
>> +
>> +[profile.dev]
>> +opt-level = 1
>> +debug = true
>> diff --git a/src/pmxcfs-rs/pmxcfs-api-types/Cargo.toml b/src/pmxcfs-rs/pmxcfs-api-types/Cargo.toml
>> new file mode 100644
>> index 00000000..cdce7951
>> --- /dev/null
>> +++ b/src/pmxcfs-rs/pmxcfs-api-types/Cargo.toml
>> @@ -0,0 +1,19 @@
>> +[package]
>> +name = "pmxcfs-api-types"
>> +description = "Shared types and error definitions for pmxcfs"
>> +
>> +version.workspace = true
>> +edition.workspace = true
>> +authors.workspace = true
>> +license.workspace = true
>> +repository.workspace = true
>> +
>> +[lints]
>> +workspace = true
>> +
>> +[dependencies]
>> +# Error handling
>> +thiserror.workspace = true
>> +
>> +# System integration
>> +libc.workspace = true
>> diff --git a/src/pmxcfs-rs/pmxcfs-api-types/README.md b/src/pmxcfs-rs/pmxcfs-api-types/README.md
>> new file mode 100644
>> index 00000000..da8304ae
>> --- /dev/null
>> +++ b/src/pmxcfs-rs/pmxcfs-api-types/README.md
>> @@ -0,0 +1,105 @@
>> +# pmxcfs-api-types
>> +
>> +**Shared Types and Error Definitions** for pmxcfs.
>> +
>> +This crate provides common types, error definitions, and message formats used across all pmxcfs crates. It serves as the "API contract" between different components.
>> +
>> +## Overview
>> +
>> +The crate contains:
>> +- **Error types**: `PmxcfsError` with errno mapping for FUSE
>> +- **Message types**: `FuseMessage`, `KvStoreMessage`, `ApplicationMessage`
>
> These types and the mentioned serialization helpers aren’t part of this
> diff, could you re-check both README.md (and the commit message) so they
> match?
sorry, this README predates the last refactoring. fixed.
>
>> +- **Shared types**: `MemberInfo`, `NodeSyncInfo`
>> +- **Serialization**: C-compatible wire format helpers
>> +
>> +## Error Types
>> +
>> +### PmxcfsError
>> +
>> +Type-safe error enum with automatic errno conversion.
>> +
>> +### errno Mapping
>> +
>> +Errors automatically convert to POSIX errno values for FUSE.
>> +
>> +| Error | errno | Value |
>> +|-------|-------|-------|
>> +| `NotFound` | `ENOENT` | 2 |
>> +| `PermissionDenied` | `EPERM` | 1 |
>> +| `AlreadyExists` | `EEXIST` | 17 |
>> +| `NotADirectory` | `ENOTDIR` | 20 |
>> +| `IsADirectory` | `EISDIR` | 21 |
>> +| `DirectoryNotEmpty` | `ENOTEMPTY` | 39 |
>> +| `FileTooLarge` | `EFBIG` | 27 |
>> +| `ReadOnlyFilesystem` | `EROFS` | 30 |
>> +| `NoQuorum` | `EACCES` | 13 |
>> +| `Timeout` | `ETIMEDOUT` | 110 |
>> +
>> +## Message Types
>> +
>> +### FuseMessage
>> +
>> +Filesystem operations broadcast through the cluster (via DFSM). Uses C-compatible wire format compatible with `dcdb.c`.
>> +
>> +### KvStoreMessage
>> +
>> +Status and metrics synchronization (via kvstore DFSM). Uses C-compatible wire format.
>> +
>> +### ApplicationMessage
>> +
>> +Wrapper for either FuseMessage or KvStoreMessage, used by DFSM to handle both filesystem and status messages with type safety.
>> +
>> +## Shared Types
>> +
>> +### MemberInfo
>> +
>> +Cluster member information.
>> +
>> +### NodeSyncInfo
>> +
>> +DFSM synchronization state.
>> +
>> +## C to Rust Mapping
>> +
>> +### Error Handling
>> +
>> +**C Version (cfs-utils.h):**
>> +- Return codes: `0` = success, negative = error
>> +- errno-based error reporting
>> +- Manual error checking everywhere
>> +
>> +**Rust Version:**
>> +- `Result<T, PmxcfsError>` type
>> +
>> +### Message Types
>> +
>> +**C Version (dcdb.h):**
>> +
>> +**Rust Version:**
>> +- Type-safe enums
>> +
>> +## Key Differences from C Implementation
>> +
>> +All message types have `serialize()` and `deserialize()` methods that produce byte-for-byte compatible formats with the C implementation.
>> +
>> +## Known Issues / TODOs
>> +
>> +### Missing Features
>> +- None identified
>> +
>> +### Compatibility
>> +- **Wire format**: 100% compatible with C implementation
>> +- **errno values**: Match POSIX standards
>> +- **Message types**: All C message types covered
>> +
>> +## References
>> +
>> +### C Implementation
>> +- `src/pmxcfs/cfs-utils.h` - Utility types and error codes
>> +- `src/pmxcfs/dcdb.h` - FUSE message types
>> +- `src/pmxcfs/status.h` - KvStore message types
>> +
>> +### Related Crates
>> +- **pmxcfs-dfsm**: Uses ApplicationMessage for cluster sync
>> +- **pmxcfs-memdb**: Uses PmxcfsError for database operations
>> +- **pmxcfs**: Uses FuseMessage for FUSE operations
>> diff --git a/src/pmxcfs-rs/pmxcfs-api-types/src/lib.rs b/src/pmxcfs-rs/pmxcfs-api-types/src/lib.rs
>> new file mode 100644
>> index 00000000..ae0e5eb0
>> --- /dev/null
>> +++ b/src/pmxcfs-rs/pmxcfs-api-types/src/lib.rs
>> @@ -0,0 +1,152 @@
>> +use thiserror::Error;
>> +
>> +/// Error types for pmxcfs operations
>> +#[derive(Error, Debug)]
>> +pub enum PmxcfsError {
>
> nit: the error related parts could be added into a dedicated error.rs
> module
thanks! extracted.
>
>> + #[error("I/O error: {0}")]
>> + Io(#[from] std::io::Error),
>> +
>> + #[error("Database error: {0}")]
>> + Database(String),
>> +
>> + #[error("FUSE error: {0}")]
>> + Fuse(String),
>> +
>> + #[error("Cluster error: {0}")]
>> + Cluster(String),
>> +
>> + #[error("Corosync error: {0}")]
>> + Corosync(String),
>> +
>> + #[error("Configuration error: {0}")]
>> + Configuration(String),
>> +
>> + #[error("System error: {0}")]
>> + System(String),
>> +
>> + #[error("IPC error: {0}")]
>> + Ipc(String),
>> +
>> + #[error("Permission denied")]
>> + PermissionDenied,
>> +
>> + #[error("Not found: {0}")]
>> + NotFound(String),
>> +
>> + #[error("Already exists: {0}")]
>> + AlreadyExists(String),
>> +
>> + #[error("Invalid argument: {0}")]
>> + InvalidArgument(String),
>> +
>> + #[error("Not a directory: {0}")]
>> + NotADirectory(String),
>> +
>> + #[error("Is a directory: {0}")]
>> + IsADirectory(String),
>> +
>> + #[error("Directory not empty: {0}")]
>> + DirectoryNotEmpty(String),
>> +
>> + #[error("No quorum")]
>> + NoQuorum,
>> +
>> + #[error("Read-only filesystem")]
>> + ReadOnlyFilesystem,
>> +
>> + #[error("File too large")]
>> + FileTooLarge,
>> +
>> + #[error("Lock error: {0}")]
>> + Lock(String),
>> +
>> + #[error("Timeout")]
>> + Timeout,
>> +
>> + #[error("Invalid path: {0}")]
>> + InvalidPath(String),
>> +}
>> +
>> +impl PmxcfsError {
>> + /// Convert error to errno value for FUSE operations
>> + pub fn to_errno(&self) -> i32 {
>> + match self {
>> + PmxcfsError::NotFound(_) => libc::ENOENT,
>> + PmxcfsError::PermissionDenied => libc::EPERM,
>> + PmxcfsError::AlreadyExists(_) => libc::EEXIST,
>> + PmxcfsError::NotADirectory(_) => libc::ENOTDIR,
>> + PmxcfsError::IsADirectory(_) => libc::EISDIR,
>> + PmxcfsError::DirectoryNotEmpty(_) => libc::ENOTEMPTY,
>> + PmxcfsError::InvalidArgument(_) => libc::EINVAL,
>> + PmxcfsError::FileTooLarge => libc::EFBIG,
>> + PmxcfsError::ReadOnlyFilesystem => libc::EROFS,
>> + PmxcfsError::NoQuorum => libc::EACCES,
>> + PmxcfsError::Timeout => libc::ETIMEDOUT,
>> + PmxcfsError::Io(e) => match e.raw_os_error() {
>> + Some(errno) => errno,
>> + None => libc::EIO,
>> + },
>> + _ => libc::EIO,
>
> Please check with C implementation, but:
>
> "PermissionDenied" should likely map to EACCES rather than EPERM. In
> FUSE/POSIX, EACCES is the standard return for file permission blocks,
> whereas EPERM is usually for administrative restrictions
> (like ownership)
>
> "InvalidPath" maps better to EINVAL. EIO suggests a hardware/disk
> failure, whereas InvalidPath implies an argument issue
>
> Also, "Lock" should explicitly be mapped.
> EBUSY (resource busy / lock contention)
> or EDEADLK (deadlock) / EAGAIN depending on semantics
>
> In general, can we minimize the number of errors falling into the
> generic EIO branch?
>
indeed. the way the errors were categorized was way too
coarse-grained. fixed accordingly.
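For illustration, a minimal sketch of what the finer-grained mapping
could look like (assuming the EACCES/EINVAL/EBUSY choices above hold up
against the C implementation; raw Linux errno values stand in for the
`libc` crate so the snippet is self-contained):

```rust
// Linux errno values, matching <errno.h>; real code would use
// libc::EACCES, libc::EINVAL, libc::EBUSY instead of these consts.
const EACCES: i32 = 13; // file permission denied (not EPERM)
const EBUSY: i32 = 16;  // resource busy / lock contention
const EINVAL: i32 = 22; // invalid argument (not EIO)

#[derive(Debug)]
enum PmxcfsError {
    PermissionDenied,
    InvalidPath(String),
    Lock(String),
}

impl PmxcfsError {
    fn to_errno(&self) -> i32 {
        match self {
            // file permission blocks: EACCES rather than EPERM
            PmxcfsError::PermissionDenied => EACCES,
            // argument issue, not an I/O failure
            PmxcfsError::InvalidPath(_) => EINVAL,
            // explicit mapping instead of falling through to EIO;
            // EDEADLK/EAGAIN may fit better depending on semantics
            PmxcfsError::Lock(_) => EBUSY,
        }
    }
}
```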
>> + }
>> + }
>> +}
>> +
>> +/// Result type for pmxcfs operations
>> +pub type Result<T> = std::result::Result<T, PmxcfsError>;
>> +
>> +/// VM/CT types
>> +#[derive(Debug, Clone, Copy, PartialEq, Eq)]
>
> If this is used in wire contexts please add #[repr(u8)] to ensure a
> stable ABI.
it's not used in the wire format. i removed the assignment statements,
as in this case we don't need predictable values -- they are only used
for in-memory comparisons as distinct identifiers.
>
>> +pub enum VmType {
>> + Qemu = 1,
>> + Lxc = 3,
>
> There’s a gap between values 1 -> 3: is 2 reserved?
> If so, maybe add a short comment.
it's not reserved. actually, it was OpenVZ, which is no longer supported.
see https://www.proxmox.com/en/about/company-details/press-releases/proxmox-ve-4-0-released
now that specific values are no longer assigned to these enum variants,
we don't need to keep the gap anymore, but a short comment was added
anyway to explain that OpenVZ support was removed.
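The resulting enum might then look like this (a sketch; the variant set
and config_dir() behavior match the patch, the doc comment is the one
mentioned above):

```rust
/// VM/CT types
///
/// Historically there was a third variant with value 2 (OpenVZ), whose
/// support was removed in Proxmox VE 4.0. No explicit discriminants are
/// needed anymore, since the variants are only compared in memory.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum VmType {
    Qemu,
    Lxc,
}

impl VmType {
    /// Directory name where config files are stored
    fn config_dir(&self) -> &'static str {
        match self {
            VmType::Qemu => "qemu-server",
            VmType::Lxc => "lxc",
        }
    }
}
```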
>
>> +}
>> +
>> +impl VmType {
>> + /// Returns the directory name where config files are stored
>> + pub fn config_dir(&self) -> &'static str {
>> + match self {
>> + VmType::Qemu => "qemu-server",
>> + VmType::Lxc => "lxc",
>> + }
>> + }
>> +}
>> +
>> +impl std::fmt::Display for VmType {
>> + fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
>> + match self {
>> + VmType::Qemu => write!(f, "qemu"),
>> + VmType::Lxc => write!(f, "lxc"),
>> + }
>> + }
>> +}
>> +
>> +/// VM/CT entry for vmlist
>> +#[derive(Debug, Clone)]
>> +pub struct VmEntry {
>> + pub vmid: u32,
>> + pub vmtype: VmType,
>> + pub node: String,
>> + /// Per-VM version counter (increments when this VM's config changes)
>> + pub version: u32,
>> +}
>> +
>> +/// Information about a cluster member
>> +///
>> +/// This is a shared type used by both cluster and DFSM modules
>> +#[derive(Debug, Clone)]
>> +pub struct MemberInfo {
>> + pub node_id: u32,
>> + pub pid: u32,
>> + pub joined_at: u64,
>> +}
>> +
>> +/// Node synchronization info for DFSM state sync
>> +///
>> +/// Used during DFSM synchronization to track which nodes have provided state
>> +#[derive(Debug, Clone)]
>> +pub struct NodeSyncInfo {
>> + pub nodeid: u32,
>
> We have "nodeid" here but "node_id" in MemberInfo, this should be
> aligned.
thanks for pointing this out! changed to "node_id".
>
>> + pub pid: u32,
>> + pub state: Option<Vec<u8>>,
>> + pub synced: bool,
>> +}
_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
* Re: [pve-devel] [PATCH pve-cluster 02/15] pmxcfs-rs: add pmxcfs-config crate
@ 2026-01-23 15:01 6% ` Samuel Rufinatscha
2026-01-26 9:43 5% ` Kefu Chai
0 siblings, 1 reply; 200+ results
From: Samuel Rufinatscha @ 2026-01-23 15:01 UTC (permalink / raw)
To: Proxmox VE development discussion, Kefu Chai
comments inline
On 1/6/26 3:25 PM, Kefu Chai wrote:
> Add configuration management crate that provides:
> - Config struct for runtime configuration
> - Node hostname, IP, and group ID tracking
> - Debug and local mode flags
> - Thread-safe configuration access via parking_lot Mutex
>
> This is a foundational crate with no internal dependencies, only
> requiring parking_lot for synchronization. Other crates will use
> this for accessing runtime configuration.
>
> Signed-off-by: Kefu Chai <k.chai@proxmox.com>
> ---
> src/pmxcfs-rs/Cargo.toml | 3 +-
> src/pmxcfs-rs/pmxcfs-config/Cargo.toml | 16 +
> src/pmxcfs-rs/pmxcfs-config/README.md | 127 +++++++
> src/pmxcfs-rs/pmxcfs-config/src/lib.rs | 471 +++++++++++++++++++++++++
> 4 files changed, 616 insertions(+), 1 deletion(-)
> create mode 100644 src/pmxcfs-rs/pmxcfs-config/Cargo.toml
> create mode 100644 src/pmxcfs-rs/pmxcfs-config/README.md
> create mode 100644 src/pmxcfs-rs/pmxcfs-config/src/lib.rs
>
> diff --git a/src/pmxcfs-rs/Cargo.toml b/src/pmxcfs-rs/Cargo.toml
> index 15d88f52..28e20bb7 100644
> --- a/src/pmxcfs-rs/Cargo.toml
> +++ b/src/pmxcfs-rs/Cargo.toml
> @@ -1,7 +1,8 @@
> # Workspace root for pmxcfs Rust implementation
> [workspace]
> members = [
> - "pmxcfs-api-types", # Shared types and error definitions
> + "pmxcfs-api-types", # Shared types and error definitions
> + "pmxcfs-config", # Configuration management
> ]
> resolver = "2"
>
> diff --git a/src/pmxcfs-rs/pmxcfs-config/Cargo.toml b/src/pmxcfs-rs/pmxcfs-config/Cargo.toml
> new file mode 100644
> index 00000000..f5a60995
> --- /dev/null
> +++ b/src/pmxcfs-rs/pmxcfs-config/Cargo.toml
> @@ -0,0 +1,16 @@
> +[package]
> +name = "pmxcfs-config"
> +description = "Configuration management for pmxcfs"
> +
> +version.workspace = true
> +edition.workspace = true
> +authors.workspace = true
> +license.workspace = true
> +repository.workspace = true
> +
> +[lints]
> +workspace = true
> +
> +[dependencies]
> +# Concurrency primitives
> +parking_lot.workspace = true
> diff --git a/src/pmxcfs-rs/pmxcfs-config/README.md b/src/pmxcfs-rs/pmxcfs-config/README.md
> new file mode 100644
> index 00000000..c06b2170
> --- /dev/null
> +++ b/src/pmxcfs-rs/pmxcfs-config/README.md
> @@ -0,0 +1,127 @@
> +# pmxcfs-config
> +
> +**Configuration Management** and **Cluster Services** for pmxcfs.
> +
> +This crate provides configuration structures and cluster integration services including quorum tracking and cluster configuration monitoring via Corosync APIs.
> +
> +## Overview
> +
> +This crate contains:
> +1. **Config struct**: Runtime configuration (node name, IPs, flags)
> +2. Integration with Corosync services (tracked in main pmxcfs crate):
> + - **QuorumService** (`pmxcfs/src/quorum_service.rs`) - Quorum monitoring
> + - **ClusterConfigService** (`pmxcfs/src/cluster_config_service.rs`) - Config tracking
This patch only contains the Config struct, but not the Cluster
Services or QuorumService; please revisit the commit message and README.
> +
> +## Config Struct
> +
> +The `Config` struct holds daemon-wide configuration including node hostname, IP address, www-data group ID, debug flag, local mode flag, and cluster name.
> +
> +## Cluster Services
> +
> +The following services are implemented in the main pmxcfs crate but documented here for completeness.
> +
> +### QuorumService
> +
> +**C Equivalent:** `src/pmxcfs/quorum.c` - `service_quorum_new()`
> +**Rust Location:** `src/pmxcfs-rs/pmxcfs/src/quorum_service.rs`
> +
> +Monitors cluster quorum status via Corosync quorum API.
> +
> +#### Features
> +- Tracks quorum state (quorate/inquorate)
> +- Monitors member list changes
> +- Automatic reconnection on Corosync restart
> +- Updates `Status` quorum flag
> +
> +#### C to Rust Mapping
> +
> +| C Function | Rust Equivalent | Location |
> +|-----------|-----------------|----------|
> +| `service_quorum_new()` | `QuorumService::new()` | quorum_service.rs |
> +| `service_quorum_destroy()` | (Drop trait / finalize) | Automatic |
> +| `quorum_notification_fn` | quorum_notification closure | quorum_service.rs |
> +| `nodelist_notification_fn` | nodelist_notification closure | quorum_service.rs |
> +
> +#### Quorum Notifications
> +
> +The service monitors quorum state changes and member list changes, updating the Status accordingly.
> +
> +### ClusterConfigService
> +
> +**C Equivalent:** `src/pmxcfs/confdb.c` - `service_confdb_new()`
> +**Rust Location:** `src/pmxcfs-rs/pmxcfs/src/cluster_config_service.rs`
> +
> +Monitors Corosync cluster configuration (cmap) and tracks node membership.
> +
> +#### Features
> +- Monitors cluster membership via Corosync cmap API
> +- Tracks node additions/removals
> +- Registers nodes in Status
> +- Automatic reconnection on Corosync restart
> +
> +#### C to Rust Mapping
> +
> +| C Function | Rust Equivalent | Location |
> +|-----------|-----------------|----------|
> +| `service_confdb_new()` | `ClusterConfigService::new()` | cluster_config_service.rs |
> +| `service_confdb_destroy()` | (Drop trait / finalize) | Automatic |
> +| `confdb_track_fn` | (direct cmap queries) | Different approach |
> +
> +#### Configuration Tracking
> +
> +The service monitors:
> +- `nodelist.node.*.nodeid` - Node IDs
> +- `nodelist.node.*.name` - Node names
> +- `nodelist.node.*.ring*_addr` - Node IP addresses
> +
> +Updates `Status` with current cluster membership.
> +
> +## Key Differences from C Implementation
> +
> +### Cluster Config Service API
> +
> +**C Version (confdb.c):**
> +- Uses deprecated confdb API
> +- Track changes via confdb notifications
> +
> +**Rust Version:**
> +- Uses modern cmap API
> +- Direct cmap queries
> +
> +Both read the same data, but Rust uses the modern Corosync API.
> +
> +### Service Integration
> +
> +**C Version:**
> +- qb_loop manages lifecycle
> +
> +**Rust Version:**
> +- Service trait abstracts lifecycle
> +- ServiceManager handles retry
> +- Tokio async dispatch
> +
> +## Known Issues / TODOs
> +
> +### Compatibility
> +- **Quorum tracking**: Compatible with C implementation
> +- **Node registration**: Equivalent behavior
> +- **cmap vs confdb**: Rust uses modern cmap API (C uses deprecated confdb)
> +
> +### Missing Features
> +- None identified
> +
> +### Behavioral Differences (Benign)
> +- **API choice**: Rust uses cmap, C uses confdb (both read same data)
> +- **Lifecycle**: Rust uses Service trait, C uses manual lifecycle
> +
> +## References
> +
> +### C Implementation
> +- `src/pmxcfs/quorum.c` / `quorum.h` - Quorum service
> +- `src/pmxcfs/confdb.c` / `confdb.h` - Cluster config service
> +
> +### Related Crates
> +- **pmxcfs**: Main daemon with QuorumService and ClusterConfigService
> +- **pmxcfs-status**: Status tracking updated by these services
> +- **pmxcfs-services**: Service framework used by both services
> +- **rust-corosync**: Corosync FFI bindings
> diff --git a/src/pmxcfs-rs/pmxcfs-config/src/lib.rs b/src/pmxcfs-rs/pmxcfs-config/src/lib.rs
> new file mode 100644
> index 00000000..5e1ee1b2
> --- /dev/null
> +++ b/src/pmxcfs-rs/pmxcfs-config/src/lib.rs
> @@ -0,0 +1,471 @@
> +use parking_lot::RwLock;
> +use std::sync::Arc;
> +
> +/// Global configuration for pmxcfs
> +pub struct Config {
> + /// Node name (hostname without domain)
> + pub nodename: String,
> +
> + /// Node IP address
> + pub node_ip: String,
Consider using std::net::IpAddr (or SocketAddr if a port is part of the
value). Tests currently mix IP vs IP:PORT, so it’s unclear what node_ip
is supposed to represent.
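To illustrate: std::net::IpAddr rejects the IP:PORT form that some
tests use, so switching types would surface the ambiguity at parse time
(parse_node_ip is a hypothetical helper name):

```rust
use std::net::IpAddr;

/// Hypothetical helper: parse a node address as a bare IP.
/// "192.168.1.10" and "::1" are valid IpAddr values, while
/// "192.168.1.10:8006" is not -- that form needs SocketAddr instead.
fn parse_node_ip(s: &str) -> Option<IpAddr> {
    s.parse().ok()
}
```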
> +
> + /// www-data group ID for file permissions
> + pub www_data_gid: u32,
> +
> + /// Debug mode enabled
> + pub debug: bool,
> +
> + /// Force local mode (no clustering)
> + pub local_mode: bool,
> +
> + /// Cluster name (CPG group name)
> + pub cluster_name: String,
> +
> + /// Debug level (0 = normal, 1+ = debug) - mutable at runtime
> + debug_level: RwLock<u8>,
the crate docs say “The Config struct uses Arc<AtomicU8> for
debug_level”, but the implementation uses parking_lot::RwLock<u8>.
Unless we need lock coupling with other fields, AtomicU8 would likely
be sufficient (and cheaper) for debug_level. Also please re-check the
commit message, which mentions parking_lot::Mutex.
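A sketch of the AtomicU8 variant, assuming the level really has no
ordering relationship to other fields:

```rust
use std::sync::atomic::{AtomicU8, Ordering};

struct Config {
    // immutable fields elided ...
    debug_level: AtomicU8,
}

impl Config {
    /// Lock-free read; Relaxed suffices because the level is an
    /// independent flag and readers need no synchronization with
    /// other data.
    fn debug_level(&self) -> u8 {
        self.debug_level.load(Ordering::Relaxed)
    }

    fn set_debug_level(&self, level: u8) {
        self.debug_level.store(level, Ordering::Relaxed);
    }
}
```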
> +}
> +
> +impl Clone for Config {
> + fn clone(&self) -> Self {
> + Self {
> + nodename: self.nodename.clone(),
> + node_ip: self.node_ip.clone(),
> + www_data_gid: self.www_data_gid,
> + debug: self.debug,
> + local_mode: self.local_mode,
> + cluster_name: self.cluster_name.clone(),
> + debug_level: RwLock::new(*self.debug_level.read()),
> + }
> + }
> +}
> +
> +impl std::fmt::Debug for Config {
> + fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
> + f.debug_struct("Config")
> + .field("nodename", &self.nodename)
> + .field("node_ip", &self.node_ip)
> + .field("www_data_gid", &self.www_data_gid)
> + .field("debug", &self.debug)
> + .field("local_mode", &self.local_mode)
> + .field("cluster_name", &self.cluster_name)
> + .field("debug_level", &*self.debug_level.read())
> + .finish()
> + }
> +}
> +
> +impl Config {
> + pub fn new(
> + nodename: String,
> + node_ip: String,
> + www_data_gid: u32,
> + debug: bool,
> + local_mode: bool,
> + cluster_name: String,
> + ) -> Arc<Self> {
The constructor returns Arc<Config>. I think we could keep
new() -> Self and provide a convenience constructor
shared() -> Arc<Self>. This would allow local usage (e.g. in tests)
without heap-allocating the struct.
> + let debug_level = if debug { 1 } else { 0 };
debug_level is derived from debug at creation time, but thereafter
set_debug_level() does not update debug, so is_debug() would continue
to reflect the initial flag rather than the effective debug level.
is_debug() should just be a helper that returns self.debug_level() > 0,
and the debug field should probably be removed entirely.
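Putting the two suggestions above together, a sketch (fields reduced to
the ones relevant here; AtomicU8 as per the earlier remark):

```rust
use std::sync::Arc;
use std::sync::atomic::{AtomicU8, Ordering};

struct Config {
    nodename: String,
    debug_level: AtomicU8, // no separate `debug` field
}

impl Config {
    /// Plain constructor: usable on the stack, e.g. in tests.
    fn new(nodename: String, debug: bool) -> Self {
        Self {
            nodename,
            debug_level: AtomicU8::new(if debug { 1 } else { 0 }),
        }
    }

    /// Convenience constructor for shared ownership.
    fn shared(nodename: String, debug: bool) -> Arc<Self> {
        Arc::new(Self::new(nodename, debug))
    }

    fn debug_level(&self) -> u8 {
        self.debug_level.load(Ordering::Relaxed)
    }

    fn set_debug_level(&self, level: u8) {
        self.debug_level.store(level, Ordering::Relaxed);
    }

    /// Derived from the effective level, so it stays consistent
    /// after set_debug_level().
    fn is_debug(&self) -> bool {
        self.debug_level() > 0
    }
}
```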
> + Arc::new(Self {
> + nodename,
> + node_ip,
> + www_data_gid,
> + debug,
> + local_mode,
> + cluster_name,
> + debug_level: RwLock::new(debug_level),
> + })
> + }
> +
> + pub fn cluster_name(&self) -> &str {
> + &self.cluster_name
> + }
> +
> + pub fn nodename(&self) -> &str {
> + &self.nodename
> + }
> +
> + pub fn node_ip(&self) -> &str {
> + &self.node_ip
> + }
> +
> + pub fn www_data_gid(&self) -> u32 {
> + self.www_data_gid
> + }
> +
> + pub fn is_debug(&self) -> bool {
> + self.debug
> + }
> +
> + pub fn is_local_mode(&self) -> bool {
> + self.local_mode
> + }
> +
> + /// Get current debug level (0 = normal, 1+ = debug)
> + pub fn debug_level(&self) -> u8 {
> + *self.debug_level.read()
> + }
> +
> + /// Set debug level (0 = normal, 1+ = debug)
> + pub fn set_debug_level(&self, level: u8) {
> + *self.debug_level.write() = level;
> + }
Right now most fields are pub, but getters are also exposed. This will
make it harder to enforce invariants. I would suggest either making the
fields private and keeping the getters, or keeping the fields public
and dropping the getters.
> +}
> +
> +#[cfg(test)]
> +mod tests {
> + //! Unit tests for Config struct
> + //!
> + //! This test module provides comprehensive coverage for:
> + //! - Configuration creation and initialization
> + //! - Getter methods for all configuration fields
> + //! - Debug level mutation and thread safety
> + //! - Concurrent access patterns (reads and writes)
> + //! - Clone independence
> + //! - Debug formatting
> + //! - Edge cases (empty strings, long strings, special characters, unicode)
> + //!
> + //! ## Thread Safety
> + //!
> + //! The Config struct uses `Arc<AtomicU8>` for debug_level to allow
> + //! safe concurrent reads and writes. Tests verify:
> + //! - 10 threads × 100 operations (concurrent modifications)
> + //! - 20 threads × 1000 operations (concurrent reads)
> + //!
> + //! ## Edge Cases
> + //!
> + //! Tests cover various edge cases including:
> + //! - Empty strings for node/cluster names
> + //! - Long strings (1000+ characters)
> + //! - Special characters in strings
> + //! - Unicode support (emoji, non-ASCII characters)
> +
> + use super::*;
> + use std::thread;
> +
> + // ===== Basic Construction Tests =====
> +
> + #[test]
> + fn test_config_creation() {
> + let config = Config::new(
> + "node1".to_string(),
> + "192.168.1.10".to_string(),
> + 33,
> + false,
> + false,
> + "pmxcfs".to_string(),
> + );
> +
> + assert_eq!(config.nodename(), "node1");
> + assert_eq!(config.node_ip(), "192.168.1.10");
> + assert_eq!(config.www_data_gid(), 33);
> + assert!(!config.is_debug());
> + assert!(!config.is_local_mode());
> + assert_eq!(config.cluster_name(), "pmxcfs");
> + assert_eq!(
> + config.debug_level(),
> + 0,
> + "Debug level should be 0 when debug is false"
> + );
> + }
> +
> + #[test]
> + fn test_config_creation_with_debug() {
> + let config = Config::new(
> + "node2".to_string(),
> + "10.0.0.5".to_string(),
> + 1000,
> + true,
> + false,
> + "test-cluster".to_string(),
> + );
> +
> + assert!(config.is_debug());
> + assert_eq!(
> + config.debug_level(),
> + 1,
> + "Debug level should be 1 when debug is true"
> + );
> + }
> +
> + #[test]
> + fn test_config_creation_local_mode() {
> + let config = Config::new(
> + "localhost".to_string(),
> + "127.0.0.1".to_string(),
> + 33,
> + false,
> + true,
> + "local".to_string(),
> + );
> +
> + assert!(config.is_local_mode());
> + assert!(!config.is_debug());
> + }
> +
> + // ===== Getter Tests =====
> +
> + #[test]
> + fn test_all_getters() {
> + let config = Config::new(
> + "testnode".to_string(),
> + "172.16.0.1".to_string(),
> + 999,
> + true,
> + true,
> + "my-cluster".to_string(),
> + );
> +
> + // Test all getter methods
> + assert_eq!(config.nodename(), "testnode");
> + assert_eq!(config.node_ip(), "172.16.0.1");
> + assert_eq!(config.www_data_gid(), 999);
> + assert!(config.is_debug());
> + assert!(config.is_local_mode());
> + assert_eq!(config.cluster_name(), "my-cluster");
> + assert_eq!(config.debug_level(), 1);
> + }
> +
> + // ===== Debug Level Mutation Tests =====
> +
> + #[test]
> + fn test_debug_level_mutation() {
> + let config = Config::new(
> + "node1".to_string(),
> + "192.168.1.1".to_string(),
> + 33,
> + false,
> + false,
> + "pmxcfs".to_string(),
> + );
> +
> + assert_eq!(config.debug_level(), 0);
> +
> + config.set_debug_level(1);
> + assert_eq!(config.debug_level(), 1);
> +
> + config.set_debug_level(5);
> + assert_eq!(config.debug_level(), 5);
> +
> + config.set_debug_level(0);
> + assert_eq!(config.debug_level(), 0);
> + }
> +
> + #[test]
> + fn test_debug_level_max_value() {
> + let config = Config::new(
> + "node1".to_string(),
> + "192.168.1.1".to_string(),
> + 33,
> + false,
> + false,
> + "pmxcfs".to_string(),
> + );
> +
> + config.set_debug_level(255);
> + assert_eq!(config.debug_level(), 255);
> +
> + config.set_debug_level(0);
> + assert_eq!(config.debug_level(), 0);
> + }
> +
> + // ===== Thread Safety Tests =====
> +
> + #[test]
> + fn test_debug_level_thread_safety() {
> + let config = Config::new(
> + "node1".to_string(),
> + "192.168.1.1".to_string(),
> + 33,
> + false,
> + false,
> + "pmxcfs".to_string(),
> + );
> +
> + let config_clone = Arc::clone(&config);
> +
> + // Spawn multiple threads that concurrently modify debug level
> + let handles: Vec<_> = (0..10)
> + .map(|i| {
> + let cfg = Arc::clone(&config);
> + thread::spawn(move || {
> + for _ in 0..100 {
> + cfg.set_debug_level(i);
> + let _ = cfg.debug_level();
> + }
> + })
> + })
> + .collect();
> +
> + // All threads should complete without panicking
> + for handle in handles {
> + handle.join().unwrap();
> + }
> +
> + // Final value should be one of the values set by threads
> + let final_level = config_clone.debug_level();
> + assert!(
> + final_level < 10,
> + "Debug level should be < 10, got {final_level}"
> + );
> + }
> +
> + #[test]
> + fn test_concurrent_reads() {
> + let config = Config::new(
> + "node1".to_string(),
> + "192.168.1.1".to_string(),
> + 33,
> + true,
> + false,
> + "pmxcfs".to_string(),
> + );
> +
> + // Spawn multiple threads that concurrently read config
> + let handles: Vec<_> = (0..20)
> + .map(|_| {
> + let cfg = Arc::clone(&config);
> + thread::spawn(move || {
> + for _ in 0..1000 {
> + assert_eq!(cfg.nodename(), "node1");
> + assert_eq!(cfg.node_ip(), "192.168.1.1");
> + assert_eq!(cfg.www_data_gid(), 33);
> + assert!(cfg.is_debug());
> + assert!(!cfg.is_local_mode());
> + assert_eq!(cfg.cluster_name(), "pmxcfs");
> + }
> + })
> + })
> + .collect();
> +
> + for handle in handles {
> + handle.join().unwrap();
> + }
> + }
> +
> + // ===== Clone Tests =====
> +
> + #[test]
> + fn test_config_clone() {
> + let config1 = Config::new(
> + "node1".to_string(),
> + "192.168.1.1".to_string(),
> + 33,
> + true,
> + false,
> + "pmxcfs".to_string(),
> + );
> +
> + config1.set_debug_level(5);
> +
> + let config2 = (*config1).clone();
> +
> + // Cloned config should have same values
> + assert_eq!(config2.nodename(), config1.nodename());
> + assert_eq!(config2.node_ip(), config1.node_ip());
> + assert_eq!(config2.www_data_gid(), config1.www_data_gid());
> + assert_eq!(config2.is_debug(), config1.is_debug());
> + assert_eq!(config2.is_local_mode(), config1.is_local_mode());
> + assert_eq!(config2.cluster_name(), config1.cluster_name());
> + assert_eq!(config2.debug_level(), 5);
> +
> + // Modifying one should not affect the other
> + config2.set_debug_level(10);
> + assert_eq!(config1.debug_level(), 5);
> + assert_eq!(config2.debug_level(), 10);
> + }
> +
> + // ===== Debug Formatting Tests =====
> +
> + #[test]
> + fn test_debug_format() {
> + let config = Config::new(
> + "node1".to_string(),
> + "192.168.1.1".to_string(),
> + 33,
> + true,
> + false,
> + "pmxcfs".to_string(),
> + );
> +
> + let debug_str = format!("{config:?}");
> +
> + // Check that debug output contains all fields
> + assert!(debug_str.contains("Config"));
> + assert!(debug_str.contains("nodename"));
> + assert!(debug_str.contains("node1"));
> + assert!(debug_str.contains("node_ip"));
> + assert!(debug_str.contains("192.168.1.1"));
> + assert!(debug_str.contains("www_data_gid"));
> + assert!(debug_str.contains("33"));
> + assert!(debug_str.contains("debug"));
> + assert!(debug_str.contains("true"));
> + assert!(debug_str.contains("local_mode"));
> + assert!(debug_str.contains("false"));
> + assert!(debug_str.contains("cluster_name"));
> + assert!(debug_str.contains("pmxcfs"));
> + assert!(debug_str.contains("debug_level"));
> + }
> +
> + // ===== Edge Cases and Boundary Tests =====
> +
> + #[test]
> + fn test_empty_strings() {
> + let config = Config::new(String::new(), String::new(), 0, false, false, String::new());
> +
> + assert_eq!(config.nodename(), "");
> + assert_eq!(config.node_ip(), "");
> + assert_eq!(config.cluster_name(), "");
> + assert_eq!(config.www_data_gid(), 0);
> + }
> +
> + #[test]
> + fn test_long_strings() {
> + let long_name = "a".repeat(1000);
> + let long_ip = "192.168.1.".to_string() + &"1".repeat(100);
> + let long_cluster = "cluster-".to_string() + &"x".repeat(500);
> +
> + let config = Config::new(
> + long_name.clone(),
> + long_ip.clone(),
> + u32::MAX,
> + true,
> + true,
> + long_cluster.clone(),
> + );
> +
> + assert_eq!(config.nodename(), long_name);
> + assert_eq!(config.node_ip(), long_ip);
> + assert_eq!(config.cluster_name(), long_cluster);
> + assert_eq!(config.www_data_gid(), u32::MAX);
> + }
> +
> + #[test]
> + fn test_special_characters_in_strings() {
> + let config = Config::new(
> + "node-1_test.local".to_string(),
> + "192.168.1.10:8006".to_string(),
> + 33,
> + false,
> + false,
> + "my-cluster_v2.0".to_string(),
> + );
> +
> + assert_eq!(config.nodename(), "node-1_test.local");
> + assert_eq!(config.node_ip(), "192.168.1.10:8006");
> + assert_eq!(config.cluster_name(), "my-cluster_v2.0");
> + }
> +
> + #[test]
> + fn test_unicode_in_strings() {
> + let config = Config::new(
> + "ノード1".to_string(),
> + "::1".to_string(),
> + 33,
> + false,
> + false,
> + "集群".to_string(),
> + );
> +
> + assert_eq!(config.nodename(), "ノード1");
> + assert_eq!(config.node_ip(), "::1");
> + assert_eq!(config.cluster_name(), "集群");
> + }
> +}
* Re: [pve-devel] [PATCH pve-cluster 01/15] pmxcfs-rs: add workspace and pmxcfs-api-types crate
@ 2026-01-23 14:17 6% ` Samuel Rufinatscha
2026-01-26 9:00 6% ` Kefu Chai
0 siblings, 1 reply; 200+ results
From: Samuel Rufinatscha @ 2026-01-23 14:17 UTC (permalink / raw)
To: Proxmox VE development discussion, Kefu Chai
Thanks for the series. I’ve started reviewing patches 1–6; sending
notes for patch 1 first, and I’ll follow up with comments on the
others once I’ve gone through them in more depth.
comments inline
On 1/6/26 3:25 PM, Kefu Chai wrote:
> Initialize the Rust workspace for the pmxcfs rewrite project.
>
> Add pmxcfs-api-types crate which provides foundational types:
> - PmxcfsError: Error type with errno mapping for FUSE operations
> - FuseMessage: Filesystem operation messages
> - KvStoreMessage: Status synchronization messages
> - ApplicationMessage: Wrapper enum for both message types
> - VmType: VM type enum (Qemu, Lxc)
>
> This is the foundation crate with no internal dependencies, only
> requiring thiserror and libc. All other crates will depend on these
> shared type definitions.
>
> Signed-off-by: Kefu Chai <k.chai@proxmox.com>
> ---
> src/pmxcfs-rs/Cargo.lock | 2067 +++++++++++++++++++++
Following the .gitignore pattern in our other repos, Cargo.lock is
ignored, so I’d suggest dropping it from the series.
> src/pmxcfs-rs/Cargo.toml | 83 +
> src/pmxcfs-rs/pmxcfs-api-types/Cargo.toml | 19 +
> src/pmxcfs-rs/pmxcfs-api-types/README.md | 105 ++
> src/pmxcfs-rs/pmxcfs-api-types/src/lib.rs | 152 ++
> 5 files changed, 2426 insertions(+)
> create mode 100644 src/pmxcfs-rs/Cargo.lock
> create mode 100644 src/pmxcfs-rs/Cargo.toml
> create mode 100644 src/pmxcfs-rs/pmxcfs-api-types/Cargo.toml
> create mode 100644 src/pmxcfs-rs/pmxcfs-api-types/README.md
> create mode 100644 src/pmxcfs-rs/pmxcfs-api-types/src/lib.rs
>
> diff --git a/src/pmxcfs-rs/Cargo.lock b/src/pmxcfs-rs/Cargo.lock
[..]
> +++ b/src/pmxcfs-rs/Cargo.toml
> @@ -0,0 +1,83 @@
> +# Workspace root for pmxcfs Rust implementation
> +[workspace]
> +members = [
> + "pmxcfs-api-types", # Shared types and error definitions
> +]
> +resolver = "2"
> +
> +[workspace.package]
> +version = "9.0.6"
> +edition = "2024"
> +authors = ["Proxmox Support Team <support@proxmox.com>"]
> +license = "AGPL-3.0"
> +repository = "https://git.proxmox.com/?p=pve-cluster.git"
> +rust-version = "1.85"
> +
> +[workspace.dependencies]
Here we already declare workspace path deps for crates that aren’t
present yet (pmxcfs-config, pmxcfs-memdb, ...). For bisectability,
could we keep this patch minimal and add those workspace
members/path deps in the patches where the crates are introduced?
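As a sketch of what such a trimmed workspace root could look like at this point in the series (illustrative only; the member comment is taken from the patch, and later patches would extend both sections as new crates land):

```toml
# Workspace root, kept minimal for bisectability: only the crate
# introduced by this patch is listed.
[workspace]
members = [
    "pmxcfs-api-types", # Shared types and error definitions
]
resolver = "2"

[workspace.dependencies]
# Internal workspace dependencies (only what exists at this point)
pmxcfs-api-types = { path = "pmxcfs-api-types" }

# External crates actually used by pmxcfs-api-types
thiserror = "1.0"
libc = "0.2"
```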
> +# Internal workspace dependencies
> +pmxcfs-api-types = { path = "pmxcfs-api-types" }
> +pmxcfs-config = { path = "pmxcfs-config" }
> +pmxcfs-memdb = { path = "pmxcfs-memdb" }
> +pmxcfs-dfsm = { path = "pmxcfs-dfsm" }
> +pmxcfs-rrd = { path = "pmxcfs-rrd" }
> +pmxcfs-status = { path = "pmxcfs-status" }
> +pmxcfs-ipc = { path = "pmxcfs-ipc" }
> +pmxcfs-services = { path = "pmxcfs-services" }
> +pmxcfs-logger = { path = "pmxcfs-logger" }
> +
> +# Core async runtime
> +tokio = { version = "1.35", features = ["full"] }
> +tokio-util = "0.7"
> +async-trait = "0.1"
> +
If the goal is to centrally pin external crate versions early, maybe
limit [workspace.dependencies] here to the crates actually used by
pmxcfs-api-types (thiserror, libc) and extend the list as new crates
are added.
> +# Error handling
> +anyhow = "1.0"
> +thiserror = "1.0"
> +
> +# Logging and tracing
> +tracing = "0.1"
> +tracing-subscriber = { version = "0.3", features = ["env-filter"] }
> +
> +# Serialization
> +serde = { version = "1.0", features = ["derive"] }
> +serde_json = "1.0"
> +bincode = "1.3"
> +
> +# Network and cluster
> +bytes = "1.5"
> +sha2 = "0.10"
> +bytemuck = { version = "1.14", features = ["derive"] }
> +
> +# System integration
> +libc = "0.2"
> +nix = { version = "0.27", features = ["fs", "process", "signal", "user", "socket"] }
> +users = "0.11"
> +
> +# Corosync/CPG bindings
> +rust-corosync = "0.1"
> +
> +# Enum conversions
> +num_enum = "0.7"
> +
> +# Concurrency primitives
> +parking_lot = "0.12"
> +
> +# Utilities
> +chrono = "0.4"
> +futures = "0.3"
> +
> +# Development dependencies
> +tempfile = "3.8"
> +
> +[workspace.lints.clippy]
> +uninlined_format_args = "warn"
> +
> +[profile.release]
> +lto = true
> +codegen-units = 1
> +opt-level = 3
> +strip = true
> +
> +[profile.dev]
> +opt-level = 1
> +debug = true
> diff --git a/src/pmxcfs-rs/pmxcfs-api-types/Cargo.toml b/src/pmxcfs-rs/pmxcfs-api-types/Cargo.toml
> new file mode 100644
> index 00000000..cdce7951
> --- /dev/null
> +++ b/src/pmxcfs-rs/pmxcfs-api-types/Cargo.toml
> @@ -0,0 +1,19 @@
> +[package]
> +name = "pmxcfs-api-types"
> +description = "Shared types and error definitions for pmxcfs"
> +
> +version.workspace = true
> +edition.workspace = true
> +authors.workspace = true
> +license.workspace = true
> +repository.workspace = true
> +
> +[lints]
> +workspace = true
> +
> +[dependencies]
> +# Error handling
> +thiserror.workspace = true
> +
> +# System integration
> +libc.workspace = true
> diff --git a/src/pmxcfs-rs/pmxcfs-api-types/README.md b/src/pmxcfs-rs/pmxcfs-api-types/README.md
> new file mode 100644
> index 00000000..da8304ae
> --- /dev/null
> +++ b/src/pmxcfs-rs/pmxcfs-api-types/README.md
> @@ -0,0 +1,105 @@
> +# pmxcfs-api-types
> +
> +**Shared Types and Error Definitions** for pmxcfs.
> +
> +This crate provides common types, error definitions, and message formats used across all pmxcfs crates. It serves as the "API contract" between different components.
> +
> +## Overview
> +
> +The crate contains:
> +- **Error types**: `PmxcfsError` with errno mapping for FUSE
> +- **Message types**: `FuseMessage`, `KvStoreMessage`, `ApplicationMessage`
These types and the mentioned serialization helpers aren’t part of this
diff; could you re-check both README.md and the commit message so they
match the actual contents?
> +- **Shared types**: `MemberInfo`, `NodeSyncInfo`
> +- **Serialization**: C-compatible wire format helpers
> +
> +## Error Types
> +
> +### PmxcfsError
> +
> +Type-safe error enum with automatic errno conversion.
> +
> +### errno Mapping
> +
> +Errors automatically convert to POSIX errno values for FUSE.
> +
> +| Error | errno | Value |
> +|-------|-------|-------|
> +| `NotFound` | `ENOENT` | 2 |
> +| `PermissionDenied` | `EPERM` | 1 |
> +| `AlreadyExists` | `EEXIST` | 17 |
> +| `NotADirectory` | `ENOTDIR` | 20 |
> +| `IsADirectory` | `EISDIR` | 21 |
> +| `DirectoryNotEmpty` | `ENOTEMPTY` | 39 |
> +| `FileTooLarge` | `EFBIG` | 27 |
> +| `ReadOnlyFilesystem` | `EROFS` | 30 |
> +| `NoQuorum` | `EACCES` | 13 |
> +| `Timeout` | `ETIMEDOUT` | 110 |
> +
> +## Message Types
> +
> +### FuseMessage
> +
> +Filesystem operations broadcast through the cluster (via DFSM). Uses C-compatible wire format compatible with `dcdb.c`.
> +
> +### KvStoreMessage
> +
> +Status and metrics synchronization (via kvstore DFSM). Uses C-compatible wire format.
> +
> +### ApplicationMessage
> +
> +Wrapper for either FuseMessage or KvStoreMessage, used by DFSM to handle both filesystem and status messages with type safety.
> +
> +## Shared Types
> +
> +### MemberInfo
> +
> +Cluster member information.
> +
> +### NodeSyncInfo
> +
> +DFSM synchronization state.
> +
> +## C to Rust Mapping
> +
> +### Error Handling
> +
> +**C Version (cfs-utils.h):**
> +- Return codes: `0` = success, negative = error
> +- errno-based error reporting
> +- Manual error checking everywhere
> +
> +**Rust Version:**
> +- `Result<T, PmxcfsError>` type
> +
> +### Message Types
> +
> +**C Version (dcdb.h):**
> +
> +**Rust Version:**
> +- Type-safe enums
> +
> +## Key Differences from C Implementation
> +
> +All message types have `serialize()` and `deserialize()` methods that produce byte-for-byte compatible formats with the C implementation.
> +
> +## Known Issues / TODOs
> +
> +### Missing Features
> +- None identified
> +
> +### Compatibility
> +- **Wire format**: 100% compatible with C implementation
> +- **errno values**: Match POSIX standards
> +- **Message types**: All C message types covered
> +
> +## References
> +
> +### C Implementation
> +- `src/pmxcfs/cfs-utils.h` - Utility types and error codes
> +- `src/pmxcfs/dcdb.h` - FUSE message types
> +- `src/pmxcfs/status.h` - KvStore message types
> +
> +### Related Crates
> +- **pmxcfs-dfsm**: Uses ApplicationMessage for cluster sync
> +- **pmxcfs-memdb**: Uses PmxcfsError for database operations
> +- **pmxcfs**: Uses FuseMessage for FUSE operations
> diff --git a/src/pmxcfs-rs/pmxcfs-api-types/src/lib.rs b/src/pmxcfs-rs/pmxcfs-api-types/src/lib.rs
> new file mode 100644
> index 00000000..ae0e5eb0
> --- /dev/null
> +++ b/src/pmxcfs-rs/pmxcfs-api-types/src/lib.rs
> @@ -0,0 +1,152 @@
> +use thiserror::Error;
> +
> +/// Error types for pmxcfs operations
> +#[derive(Error, Debug)]
> +pub enum PmxcfsError {
nit: the error-related parts could be moved into a dedicated error.rs
module
> + #[error("I/O error: {0}")]
> + Io(#[from] std::io::Error),
> +
> + #[error("Database error: {0}")]
> + Database(String),
> +
> + #[error("FUSE error: {0}")]
> + Fuse(String),
> +
> + #[error("Cluster error: {0}")]
> + Cluster(String),
> +
> + #[error("Corosync error: {0}")]
> + Corosync(String),
> +
> + #[error("Configuration error: {0}")]
> + Configuration(String),
> +
> + #[error("System error: {0}")]
> + System(String),
> +
> + #[error("IPC error: {0}")]
> + Ipc(String),
> +
> + #[error("Permission denied")]
> + PermissionDenied,
> +
> + #[error("Not found: {0}")]
> + NotFound(String),
> +
> + #[error("Already exists: {0}")]
> + AlreadyExists(String),
> +
> + #[error("Invalid argument: {0}")]
> + InvalidArgument(String),
> +
> + #[error("Not a directory: {0}")]
> + NotADirectory(String),
> +
> + #[error("Is a directory: {0}")]
> + IsADirectory(String),
> +
> + #[error("Directory not empty: {0}")]
> + DirectoryNotEmpty(String),
> +
> + #[error("No quorum")]
> + NoQuorum,
> +
> + #[error("Read-only filesystem")]
> + ReadOnlyFilesystem,
> +
> + #[error("File too large")]
> + FileTooLarge,
> +
> + #[error("Lock error: {0}")]
> + Lock(String),
> +
> + #[error("Timeout")]
> + Timeout,
> +
> + #[error("Invalid path: {0}")]
> + InvalidPath(String),
> +}
> +
> +impl PmxcfsError {
> + /// Convert error to errno value for FUSE operations
> + pub fn to_errno(&self) -> i32 {
> + match self {
> + PmxcfsError::NotFound(_) => libc::ENOENT,
> + PmxcfsError::PermissionDenied => libc::EPERM,
> + PmxcfsError::AlreadyExists(_) => libc::EEXIST,
> + PmxcfsError::NotADirectory(_) => libc::ENOTDIR,
> + PmxcfsError::IsADirectory(_) => libc::EISDIR,
> + PmxcfsError::DirectoryNotEmpty(_) => libc::ENOTEMPTY,
> + PmxcfsError::InvalidArgument(_) => libc::EINVAL,
> + PmxcfsError::FileTooLarge => libc::EFBIG,
> + PmxcfsError::ReadOnlyFilesystem => libc::EROFS,
> + PmxcfsError::NoQuorum => libc::EACCES,
> + PmxcfsError::Timeout => libc::ETIMEDOUT,
> + PmxcfsError::Io(e) => match e.raw_os_error() {
> + Some(errno) => errno,
> + None => libc::EIO,
> + },
> + _ => libc::EIO,
Please check against the C implementation, but:
- "PermissionDenied" should likely map to EACCES rather than EPERM: in
  FUSE/POSIX, EACCES is the standard return for file-permission
  denials, whereas EPERM is usually reserved for administrative
  restrictions (like ownership).
- "InvalidPath" maps better to EINVAL: EIO suggests a hardware/disk
  failure, whereas InvalidPath implies an argument issue.
- "Lock" should be mapped explicitly, e.g. to EBUSY (resource busy /
  lock contention), or to EDEADLK (deadlock) / EAGAIN, depending on
  the semantics.
In general, can we minimize the number of errors falling into the
generic EIO branch?
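As a sketch of the suggested re-mapping (illustrative only: the enum is reduced to the variants discussed, the errno values are inlined Linux constants instead of the `libc` ones, and Lock to EBUSY is just one of the options mentioned):

```rust
// Illustrative subset of PmxcfsError; errno values are inlined Linux
// constants here, real code would use the `libc` crate.
#[allow(dead_code)]
#[derive(Debug)]
enum PmxcfsError {
    PermissionDenied,
    NotFound(String),
    InvalidPath(String),
    Lock(String),
}

impl PmxcfsError {
    fn to_errno(&self) -> i32 {
        const ENOENT: i32 = 2;
        const EACCES: i32 = 13;
        const EBUSY: i32 = 16;
        const EINVAL: i32 = 22;

        match self {
            // EACCES rather than EPERM: the standard FUSE/POSIX return
            // for file-permission denials; EPERM is for administrative
            // restrictions such as ownership.
            PmxcfsError::PermissionDenied => EACCES,
            // EINVAL rather than EIO: a bad path is an argument issue,
            // not a hardware/disk failure.
            PmxcfsError::InvalidPath(_) => EINVAL,
            // Explicit mapping instead of the generic EIO fallback;
            // EBUSY is one option, EDEADLK/EAGAIN may fit other cases.
            PmxcfsError::Lock(_) => EBUSY,
            PmxcfsError::NotFound(_) => ENOENT,
        }
    }
}

fn main() {
    assert_eq!(PmxcfsError::PermissionDenied.to_errno(), 13);
    assert_eq!(PmxcfsError::InvalidPath("bad".into()).to_errno(), 22);
    assert_eq!(PmxcfsError::Lock("contended".into()).to_errno(), 16);
    println!("ok");
}
```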
> + }
> + }
> +}
> +
> +/// Result type for pmxcfs operations
> +pub type Result<T> = std::result::Result<T, PmxcfsError>;
> +
> +/// VM/CT types
> +#[derive(Debug, Clone, Copy, PartialEq, Eq)]
If this is used in wire contexts, please add #[repr(u8)] to ensure a
stable representation (discriminant size and values).
> +pub enum VmType {
> + Qemu = 1,
> + Lxc = 3,
There’s a gap between values 1 -> 3: is 2 reserved?
If so, maybe add a short comment.
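Combining both points, a possible shape (hypothetical: whether 2 really belonged to a removed container type has to be checked against the C sources):

```rust
use std::convert::TryFrom;

// Fixed representation for wire use, plus an explicit note on the
// skipped discriminant.
#[repr(u8)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum VmType {
    Qemu = 1,
    // Discriminant 2 is skipped; presumably it belonged to a removed
    // container type (assumption; verify against the C implementation)
    // and stays reserved for wire compatibility.
    Lxc = 3,
}

impl TryFrom<u8> for VmType {
    type Error = u8;

    fn try_from(v: u8) -> Result<Self, u8> {
        match v {
            1 => Ok(VmType::Qemu),
            3 => Ok(VmType::Lxc),
            other => Err(other), // rejects the reserved value 2 as well
        }
    }
}

fn main() {
    assert_eq!(VmType::Qemu as u8, 1);
    assert_eq!(VmType::try_from(3), Ok(VmType::Lxc));
    assert_eq!(VmType::try_from(2), Err(2));
    println!("ok");
}
```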
> +}
> +
> +impl VmType {
> + /// Returns the directory name where config files are stored
> + pub fn config_dir(&self) -> &'static str {
> + match self {
> + VmType::Qemu => "qemu-server",
> + VmType::Lxc => "lxc",
> + }
> + }
> +}
> +
> +impl std::fmt::Display for VmType {
> + fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
> + match self {
> + VmType::Qemu => write!(f, "qemu"),
> + VmType::Lxc => write!(f, "lxc"),
> + }
> + }
> +}
> +
> +/// VM/CT entry for vmlist
> +#[derive(Debug, Clone)]
> +pub struct VmEntry {
> + pub vmid: u32,
> + pub vmtype: VmType,
> + pub node: String,
> + /// Per-VM version counter (increments when this VM's config changes)
> + pub version: u32,
> +}
> +
> +/// Information about a cluster member
> +///
> +/// This is a shared type used by both cluster and DFSM modules
> +#[derive(Debug, Clone)]
> +pub struct MemberInfo {
> + pub node_id: u32,
> + pub pid: u32,
> + pub joined_at: u64,
> +}
> +
> +/// Node synchronization info for DFSM state sync
> +///
> +/// Used during DFSM synchronization to track which nodes have provided state
> +#[derive(Debug, Clone)]
> +pub struct NodeSyncInfo {
> + pub nodeid: u32,
We have "nodeid" here but "node_id" in MemberInfo; these should be
aligned.
> + pub pid: u32,
> + pub state: Option<Vec<u8>>,
> + pub synced: bool,
> +}
* [pbs-devel] superseded: [PATCH proxmox{-backup, , -datacenter-manager} v3 00/10] token-shadow: reduce api token verification overhead
2026-01-02 16:07 13% [pbs-devel] [PATCH proxmox{-backup, , -datacenter-manager} v3 00/10] " Samuel Rufinatscha
` (9 preceding siblings ...)
2026-01-02 16:07 17% ` [pbs-devel] [PATCH proxmox-datacenter-manager v3 2/2] docs: document API token-cache TTL effects Samuel Rufinatscha
@ 2026-01-21 15:15 13% ` Samuel Rufinatscha
10 siblings, 0 replies; 200+ results
From: Samuel Rufinatscha @ 2026-01-21 15:15 UTC (permalink / raw)
To: pbs-devel
https://lore.proxmox.com/pbs-devel/20260121151408.731516-1-s.rufinatscha@proxmox.com/T/#t
On 1/2/26 5:07 PM, Samuel Rufinatscha wrote:
> Hi,
>
> this series improves the performance of token-based API authentication
> in PBS (pbs-config) and in PDM (underlying proxmox-access-control
> crate), addressing the API token verification hotspot reported in our
> bugtracker #7017 [1].
>
> When profiling PBS /status endpoint with cargo flamegraph [2],
> token-based authentication showed up as a dominant hotspot via
> proxmox_sys::crypt::verify_crypt_pw. Applying this series removes that
> path from the hot section of the flamegraph. The same performance issue
> was measured [2] for PDM. PDM uses the underlying shared
> proxmox-access-control library for token handling, which is a
> factored out version of the token.shadow handling code from PBS.
>
> While this series fixes the immediate performance issue both in PBS
> (pbs-config) and in the shared proxmox-access-control crate used by
> PDM, PBS should eventually, ideally be refactored, in a separate
> effort, to use proxmox-access-control for token handling instead of its
> local implementation.
>
> Problem
>
> For token-based API requests, both PBS’s pbs-config token.shadow
> handling and PDM proxmox-access-control’s token.shadow handling
> currently:
>
> 1. read the token.shadow file on each request
> 2. deserialize it into a HashMap<Authid, String>
> 3. run password hash verification via
> proxmox_sys::crypt::verify_crypt_pw for the provided token secret
>
> Under load, this results in significant CPU usage spent in repeated
> password hashing for the same token+secret pairs. The attached
> flamegraphs for PBS [2] and PDM [3] show
> proxmox_sys::crypt::verify_crypt_pw dominating the hot path.
>
> Approach
>
> The goal is to reduce the cost of token-based authentication while
> preserving the existing token-handling semantics (including detecting
> manual edits to token.shadow) and staying consistent between PBS
> (pbs-config) and PDM (proxmox-access-control). For both codebases,
> this series proposes to:
>
> 1. Introduce an in-memory cache for verified token secrets and
> invalidate it through a shared ConfigVersionCache generation. Note
> that a shared generation is required to keep the privileged and
> unprivileged daemons in sync and avoid caching inconsistencies
> across processes.
> 2. Invalidate on token.shadow file API changes (set_secret,
> delete_secret)
> 3. Invalidate on direct/manual token.shadow file changes (mtime +
> length)
> 4. Avoid per-request file stat calls using a TTL window
>
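[Editor's note: step 4 above could be sketched roughly like this; names are hypothetical and the actual patches track more state, such as the mtime/length snapshot and the shared generation.]

```rust
use std::time::{Duration, Instant};

// Rough sketch of the TTL window: file metadata is only re-stat'ed
// once the TTL since the last check has elapsed; inside the window
// the cached state is trusted as-is, avoiding per-request stat calls.
const TOKEN_SECRET_CACHE_TTL: Duration = Duration::from_secs(60);

struct CacheState {
    /// When the token.shadow metadata was last compared against disk.
    last_checked: Instant,
}

impl CacheState {
    /// True while the TTL window is open, i.e. no fs::metadata call
    /// is needed before trusting cached secrets.
    fn fresh(&self, now: Instant) -> bool {
        now.duration_since(self.last_checked) < TOKEN_SECRET_CACHE_TTL
    }
}

fn main() {
    let t0 = Instant::now();
    let state = CacheState { last_checked: t0 };
    assert!(state.fresh(t0 + Duration::from_secs(30)));
    assert!(!state.fresh(t0 + Duration::from_secs(61)));
    println!("ok");
}
```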
> Testing
>
> *PBS (pbs-config)*
>
> To verify the effect in PBS, I:
> 1. Set up test environment based on latest PBS ISO, installed Rust
> toolchain, cloned proxmox-backup repository to use with cargo
> flamegraph. Reproduced bug #7017 [1] by profiling the /status
> endpoint with token-based authentication using cargo flamegraph [2].
> 2. Built PBS with pbs-config patches and re-ran the same workload and
> profiling setup. Confirmed that
> proxmox_sys::crypt::verify_crypt_pw path no longer appears in the
> hot section of the flamegraph. CPU usage is now dominated by TLS
> overhead.
> 3. Functionally-wise, I verified that:
> * valid tokens authenticate correctly when used in API requests
> * invalid secrets are rejected as before
> * generating a new token secret via dashboard (create token for user,
> regenerate existing secret) works and authenticates correctly
>
> *PDM (proxmox-access-control)*
>
> To verify the effect in PDM, I followed a similar testing approach.
> Instead of PBS’ /status, I profiled the /version endpoint with cargo
> flamegraph [2] and verified that the expensive hashing path disappears
> from the hot section after introducing caching.
>
> Functionally-wise, I verified that:
> * valid tokens authenticate correctly when used in API requests
> * invalid secrets are rejected as before
> * generating a new token secret via dashboard (create token for user,
> regenerate existing secret) works and authenticates correctly
>
> Benchmarks:
>
> Two different benchmarks have been run to measure caching effects
> and RwLock contention:
>
> (1) Requests per second for PBS /status endpoint (E2E)
>
> Benchmarked parallel token auth requests for
> /status?verbose=0 on top of the datastore lookup cache series [4]
> to check throughput impact. With datastores=1, repeat=5000, parallel=16
> this series gives ~172 req/s compared to ~65 req/s without it.
> This is a ~2.6x improvement (and aligns with the ~179 req/s from the
> previous series, which used per-process cache invalidation).
>
> (2) RwLock contention for token create/delete under heavy load of
> token-authenticated requests
>
> The previous version of the series compared std::sync::RwLock and
> parking_lot::RwLock contention for token create/delete under heavy
> parallel token-authenticated readers. parking_lot::RwLock has been
> chosen for the added fairness guarantees.
>
> Patch summary
>
> pbs-config:
>
> 0001 – pbs-config: add token.shadow generation to ConfigVersionCache
> Extends ConfigVersionCache to provide a process-shared generation
> number for token.shadow changes.
>
> 0002 – pbs-config: cache verified API token secrets
> Adds an in-memory cache to cache verified, plain-text API token secrets.
> Cache is invalidated through the process-shared ConfigVersionCache
> generation number. Uses openssl's constant-time memcmp for matching
> secrets.
>
> 0003 – pbs-config: invalidate token-secret cache on token.shadow
> changes
> Stats token.shadow's mtime and length on each token verification
> request and clears the cache when the file changes.
>
> 0004 – pbs-config: add TTL window to token-secret cache
> Introduces a TTL (TOKEN_SECRET_CACHE_TTL_SECS, default 60) for metadata
> checks so that fs::metadata calls are not performed on each request.
>
> proxmox-access-control:
>
> 0005 – access-control: extend AccessControlConfig for token.shadow invalidation
>
> Extends the AccessControlConfig trait with
> token_shadow_cache_generation() and
> increment_token_shadow_cache_generation() for
> proxmox-access-control to get the shared token.shadow generation number
> and bump it on token shadow changes.
>
> 0006 – access-control: cache verified API token secrets
> Mirrors PBS PATCH 0002.
>
> 0007 – access-control: invalidate token-secret cache on token.shadow changes
> Mirrors PBS PATCH 0003.
>
> 0008 – access-control: add TTL window to token-secret cache
> Mirrors PBS PATCH 0004.
>
> proxmox-datacenter-manager:
>
> 0009 – pdm-config: add token.shadow generation to ConfigVersionCache
> Extends PDM ConfigVersionCache and implements
> token_shadow_cache_generation() and
> increment_token_shadow_cache_generation() from AccessControlConfig for
> PDM.
>
> 0010 – docs: document API token-cache TTL effects
> Documents the effects of the TTL window on token.shadow edits
>
> Changes from v1 to v2:
>
> * (refactor) Switched cache initialization to LazyLock
> * (perf) Use parking_lot::RwLock and best-effort cache access on the
> read/refresh path (try_read/try_write) to avoid lock contention
> * (doc) Document TTL-delayed effect of manual token.shadow edits
> * (fix) Add generation guards (API_MUTATION_GENERATION +
> FILE_GENERATION) to prevent caching across concurrent set/delete and
> external edits
>
> Changes from v2 to v3:
>
> * (refactor) Replace PBS per-process cache invalidation with a
> cross-process token.shadow generation based on PBS
> ConfigVersionCache, ensuring cache consistency between privileged
> and unprivileged daemons.
> * (refactor) Decoupling generation source from the
> proxmox/proxmox-access-control cache implementation: extend
> AccessControlConfig hooks so that products can provide the shared
> token.shadow generation source.
> * (refactor) Extend PDM's ConfigVersionCache with
> token_shadow_generation
> and introduce a pdm_config::AccessControlConfig wrapper implementing
> the new proxmox-access-control trait hooks. Switch server and CLI
> initialization to use pdm_config::AccessControlConfig instead of
> pdm_api_types::AccessControlConfig.
> * (refactor) Adapt generation checks around cached-secret comparison to
> use the new shared generation source.
> * (fix/logic) cache_try_insert_secret: Update the local cache
> generation if stale, allowing the new secret to be inserted
> immediately
> * (refactor) Extract cache invalidation logic into a
> invalidate_cache_state helper to reduce duplication and ensure
> consistent state resets
> * (refactor) Simplify refresh_cache_if_file_changed: handle the
> un-initialized/reset state and adjust the generation mismatch
> path to ensure file metadata is always re-read.
> * (doc) Clarify TTL-delayed effects of manual token.shadow edits.
>
> Please see the patch specific changelogs for more details.
>
> Thanks for considering this patch series, I look forward to your
> feedback.
>
> Best,
> Samuel Rufinatscha
>
> [1] https://bugzilla.proxmox.com/show_bug.cgi?id=7017
> [2] attachment 1767 [1]: Flamegraph showing the proxmox_sys::crypt::verify_crypt_pw stack
> [3] attachment 1794 [1]: Flamegraph PDM baseline
> [4] https://bugzilla.proxmox.com/show_bug.cgi?id=6049
>
> proxmox-backup:
>
> Samuel Rufinatscha (4):
> pbs-config: add token.shadow generation to ConfigVersionCache
> pbs-config: cache verified API token secrets
> pbs-config: invalidate token-secret cache on token.shadow changes
> pbs-config: add TTL window to token secret cache
>
> Cargo.toml | 1 +
> docs/user-management.rst | 4 +
> pbs-config/Cargo.toml | 1 +
> pbs-config/src/config_version_cache.rs | 18 ++
> pbs-config/src/token_shadow.rs | 298 ++++++++++++++++++++++++-
> 5 files changed, 321 insertions(+), 1 deletion(-)
>
>
> proxmox:
>
> Samuel Rufinatscha (4):
> proxmox-access-control: extend AccessControlConfig for token.shadow
> invalidation
> proxmox-access-control: cache verified API token secrets
> proxmox-access-control: invalidate token-secret cache on token.shadow
> changes
> proxmox-access-control: add TTL window to token secret cache
>
> Cargo.toml | 1 +
> proxmox-access-control/Cargo.toml | 1 +
> proxmox-access-control/src/init.rs | 17 ++
> proxmox-access-control/src/token_shadow.rs | 299 ++++++++++++++++++++-
> 4 files changed, 317 insertions(+), 1 deletion(-)
>
>
> proxmox-datacenter-manager:
>
> Samuel Rufinatscha (2):
> pdm-config: implement token.shadow generation
> docs: document API token-cache TTL effects
>
> cli/admin/src/main.rs | 2 +-
> docs/access-control.rst | 4 ++
> lib/pdm-config/Cargo.toml | 1 +
> lib/pdm-config/src/access_control_config.rs | 73 +++++++++++++++++++++
> lib/pdm-config/src/config_version_cache.rs | 18 +++++
> lib/pdm-config/src/lib.rs | 2 +
> server/src/acl.rs | 3 +-
> 7 files changed, 100 insertions(+), 3 deletions(-)
> create mode 100644 lib/pdm-config/src/access_control_config.rs
>
>
> Summary over all repositories:
> 16 files changed, 738 insertions(+), 5 deletions(-)
>
* [pbs-devel] [PATCH proxmox v4 3/4] proxmox-access-control: invalidate token-secret cache on token.shadow changes
2026-01-21 15:13 14% [pbs-devel] [PATCH proxmox{-backup, , -datacenter-manager} v4 00/11] token-shadow: reduce api token verification overhead Samuel Rufinatscha
` (5 preceding siblings ...)
2026-01-21 15:14 12% ` [pbs-devel] [PATCH proxmox v4 2/4] proxmox-access-control: cache verified API token secrets Samuel Rufinatscha
@ 2026-01-21 15:14 12% ` Samuel Rufinatscha
2026-01-21 15:14 15% ` [pbs-devel] [PATCH proxmox v4 4/4] proxmox-access-control: add TTL window to token secret cache Samuel Rufinatscha
` (3 subsequent siblings)
10 siblings, 0 replies; 200+ results
From: Samuel Rufinatscha @ 2026-01-21 15:14 UTC (permalink / raw)
To: pbs-devel
This patch adds detection of manual/direct file edits by tracking the
mtime and length of token.shadow, clearing the in-memory token-secret
cache whenever these values change.
Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
---
Changes from v3 to v4:
* make use of .replace() in refresh_cache_if_file_changed to get
previous state
* Group file stats with ShadowFileInfo
* Return false in refresh_cache_if_file_changed to avoid unnecessary cache
queries
* Adjusted commit message
Changes from v2 to v3:
* Cache now tracks last_checked (epoch seconds).
* Simplified refresh_cache_if_file_changed, removed
FILE_GENERATION logic
* On first load, initializes file metadata and keeps empty cache.
Changes from v1 to v2:
* Add file metadata tracking (file_mtime, file_len) and
FILE_GENERATION.
* Store file_gen in CachedSecret and verify it against the current
FILE_GENERATION to ensure cached entries belong to the current file
state.
* Add shadow_mtime_len() helper and convert refresh to best-effort
(try_write, returns bool).
* Pass a pre-write metadata snapshot into apply_api_mutation and
clear/bump generation if the cache metadata indicates missed external
edits.
proxmox-access-control/src/token_shadow.rs | 123 ++++++++++++++++++++-
1 file changed, 119 insertions(+), 4 deletions(-)
diff --git a/proxmox-access-control/src/token_shadow.rs b/proxmox-access-control/src/token_shadow.rs
index e4dfab50..05813b52 100644
--- a/proxmox-access-control/src/token_shadow.rs
+++ b/proxmox-access-control/src/token_shadow.rs
@@ -1,5 +1,8 @@
use std::collections::HashMap;
+use std::fs;
+use std::io::ErrorKind;
use std::sync::LazyLock;
+use std::time::SystemTime;
use anyhow::{bail, format_err, Error};
use parking_lot::RwLock;
@@ -7,6 +10,7 @@ use serde_json::{from_value, Value};
use proxmox_auth_api::types::Authid;
use proxmox_product_config::{open_api_lockfile, replace_config, ApiLockGuard};
+use proxmox_time::epoch_i64;
use crate::init::access_conf;
use crate::init::impl_feature::{token_shadow, token_shadow_lock};
@@ -20,6 +24,7 @@ static TOKEN_SECRET_CACHE: LazyLock<RwLock<ApiTokenSecretCache>> = LazyLock::new
RwLock::new(ApiTokenSecretCache {
secrets: HashMap::new(),
shared_gen: 0,
+ shadow: None,
})
});
@@ -45,6 +50,56 @@ fn write_file(data: HashMap<Authid, String>) -> Result<(), Error> {
replace_config(token_shadow(), &json)
}
+/// Refreshes the in-memory cache if the on-disk token.shadow file changed.
+/// Returns true if the cache is valid to use, false if not.
+fn refresh_cache_if_file_changed() -> bool {
+ let now = epoch_i64();
+
+ // Best-effort refresh under write lock.
+ let Some(mut cache) = TOKEN_SECRET_CACHE.try_write() else {
+ return false;
+ };
+
+ let Some(shared_gen_now) = token_shadow_shared_gen() else {
+ return false;
+ };
+
+ // If another process bumped the generation, we don't know what changed -> clear cache
+ if cache.shared_gen != shared_gen_now {
+ invalidate_cache_state_and_set_gen(&mut cache, shared_gen_now);
+ }
+
+ // Stat the file to detect manual edits.
+ let Ok((new_mtime, new_len)) = shadow_mtime_len() else {
+ return false;
+ };
+
+ // If the file didn't change, only update last_checked
+ if let Some(shadow) = cache.shadow.as_mut() {
+ if shadow.mtime == new_mtime && shadow.len == new_len {
+ shadow.last_checked = now;
+ return true;
+ }
+ }
+
+ cache.secrets.clear();
+
+ let prev = cache.shadow.replace(ShadowFileInfo {
+ mtime: new_mtime,
+ len: new_len,
+ last_checked: now,
+ });
+
+ if prev.is_some() {
+ // Best-effort propagation to other processes if a change was detected
+ if let Some(shared_gen_new) = bump_token_shadow_shared_gen() {
+ cache.shared_gen = shared_gen_new;
+ }
+ }
+
+ false
+}
+
/// Verifies that an entry for given tokenid / API token secret exists
pub fn verify_secret(tokenid: &Authid, secret: &str) -> Result<(), Error> {
if !tokenid.is_token() {
@@ -52,7 +107,7 @@ pub fn verify_secret(tokenid: &Authid, secret: &str) -> Result<(), Error> {
}
// Fast path
- if cache_try_secret_matches(tokenid, secret) {
+ if refresh_cache_if_file_changed() && cache_try_secret_matches(tokenid, secret) {
return Ok(());
}
@@ -84,12 +139,15 @@ pub fn set_secret(tokenid: &Authid, secret: &str) -> Result<(), Error> {
let guard = lock_config()?;
+ // Capture state before we write to detect external edits.
+ let pre_meta = shadow_mtime_len().unwrap_or((None, None));
+
let mut data = read_file()?;
let hashed_secret = proxmox_sys::crypt::encrypt_pw(secret)?;
data.insert(tokenid.clone(), hashed_secret);
write_file(data)?;
- apply_api_mutation(guard, tokenid, Some(secret));
+ apply_api_mutation(guard, tokenid, Some(secret), pre_meta);
Ok(())
}
@@ -102,11 +160,14 @@ pub fn delete_secret(tokenid: &Authid) -> Result<(), Error> {
let guard = lock_config()?;
+ // Capture state before we write to detect external edits.
+ let pre_meta = shadow_mtime_len().unwrap_or((None, None));
+
let mut data = read_file()?;
data.remove(tokenid);
write_file(data)?;
- apply_api_mutation(guard, tokenid, None);
+ apply_api_mutation(guard, tokenid, None, pre_meta);
Ok(())
}
@@ -128,6 +189,8 @@ struct ApiTokenSecretCache {
secrets: HashMap<Authid, CachedSecret>,
/// Shared generation to detect mutations of the underlying token.shadow file.
shared_gen: usize,
+ /// Shadow file info to detect changes
+ shadow: Option<ShadowFileInfo>,
}
/// Cached secret.
@@ -135,6 +198,16 @@ struct CachedSecret {
secret: String,
}
+/// Shadow file info
+struct ShadowFileInfo {
+ // shadow file mtime to detect changes
+ mtime: Option<SystemTime>,
+ // shadow file length to detect changes
+ len: Option<u64>,
+ // last time the file metadata was checked
+ last_checked: i64,
+}
+
fn cache_try_insert_secret(tokenid: Authid, secret: String, shared_gen_before: usize) {
let Some(mut cache) = TOKEN_SECRET_CACHE.try_write() else {
return;
@@ -179,7 +252,14 @@ fn cache_try_secret_matches(tokenid: &Authid, secret: &str) -> bool {
false
}
-fn apply_api_mutation(_guard: BackupLockGuard, tokenid: &Authid, new_secret: Option<&str>) {
+fn apply_api_mutation(
+ _guard: ApiLockGuard,
+ tokenid: &Authid,
+ new_secret: Option<&str>,
+ pre_write_meta: (Option<SystemTime>, Option<u64>),
+) {
+ let now = epoch_i64();
+
// Signal cache invalidation to other processes (best-effort).
let bumped_gen = bump_token_shadow_shared_gen();
@@ -198,6 +278,16 @@ fn apply_api_mutation(_guard: BackupLockGuard, tokenid: &Authid, new_secret: Opt
return;
}
+ // If our cached file metadata does not match the on-disk state before our write,
+ // we likely missed an external/manual edit. We can no longer trust any cached secrets.
+ if cache
+ .shadow
+ .as_ref()
+ .is_some_and(|s| (s.mtime, s.len) != pre_write_meta)
+ {
+ cache.secrets.clear();
+ }
+
// Update to the post-mutation generation.
cache.shared_gen = current_gen;
@@ -215,6 +305,22 @@ fn apply_api_mutation(_guard: BackupLockGuard, tokenid: &Authid, new_secret: Opt
cache.secrets.remove(tokenid);
}
}
+
+ // Update our view of the file metadata to the post-write state (best-effort).
+ // (If this fails, drop local cache so callers fall back to slow path until refreshed.)
+ match shadow_mtime_len() {
+ Ok((mtime, len)) => {
+ cache.shadow = Some(ShadowFileInfo {
+ mtime,
+ len,
+ last_checked: now,
+ });
+ }
+ Err(_) => {
+ // If we cannot validate state, do not trust cache.
+ invalidate_cache_state_and_set_gen(&mut cache, current_gen);
+ }
+ }
}
/// Get the current shared generation.
@@ -234,4 +340,13 @@ fn bump_token_shadow_shared_gen() -> Option<usize> {
fn invalidate_cache_state_and_set_gen(cache: &mut ApiTokenSecretCache, gen: usize) {
cache.secrets.clear();
cache.shared_gen = gen;
+ cache.shadow = None;
+}
+
+fn shadow_mtime_len() -> Result<(Option<SystemTime>, Option<u64>), Error> {
+ match fs::metadata(token_shadow()) {
+ Ok(meta) => Ok((meta.modified().ok(), Some(meta.len()))),
+ Err(e) if e.kind() == ErrorKind::NotFound => Ok((None, None)),
+ Err(e) => Err(e.into()),
+ }
}
--
2.47.3
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
^ permalink raw reply [relevance 12%]
* [pbs-devel] [PATCH proxmox-datacenter-manager v4 3/3] pdm-config: wire user+acl cache generation
2026-01-21 15:13 14% [pbs-devel] [PATCH proxmox{-backup, , -datacenter-manager} v4 00/11] token-shadow: reduce api token verification overhead Samuel Rufinatscha
` (9 preceding siblings ...)
2026-01-21 15:14 17% ` [pbs-devel] [PATCH proxmox-datacenter-manager v4 2/3] docs: document API token-cache TTL effects Samuel Rufinatscha
@ 2026-01-21 15:14 16% ` Samuel Rufinatscha
10 siblings, 0 replies; 200+ results
From: Samuel Rufinatscha @ 2026-01-21 15:14 UTC (permalink / raw)
To: pbs-devel
Rename ConfigVersionCache’s user_cache_generation to
user_and_acl_generation to match the semantics of
AccessControlConfig::cache_generation and increment_cache_generation:
both expect a single shared generation covering the user and ACL
configs.
Safety: no layout change, the shared-memory size and field order remain
unchanged.
Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
---
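The renamed accessors can be illustrated with a minimal sketch. This is hypothetical, std-only code: the real counter lives in shared memory via proxmox-shared-memory, and a process-local AtomicUsize stands in for it here purely to show the load/bump semantics that both the user.cfg and acl.cfg caches share.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Stand-in for the shmem-backed field; illustrative only.
static USER_AND_ACL_GENERATION: AtomicUsize = AtomicUsize::new(0);

/// Returns the current user+ACL cache generation.
fn user_and_acl_generation() -> usize {
    USER_AND_ACL_GENERATION.load(Ordering::Acquire)
}

/// Bumps the shared generation; consumers of both the user and the ACL
/// cache observe the change and drop stale entries.
fn increase_user_and_acl_generation() {
    USER_AND_ACL_GENERATION.fetch_add(1, Ordering::AcqRel);
}

fn main() {
    let before = user_and_acl_generation();
    // e.g. after editing either user.cfg or acl.cfg:
    increase_user_and_acl_generation();
    assert_eq!(user_and_acl_generation(), before + 1);
}
```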
lib/pdm-config/src/access_control.rs | 11 +++++++++++
lib/pdm-config/src/config_version_cache.rs | 16 ++++++++--------
2 files changed, 19 insertions(+), 8 deletions(-)
diff --git a/lib/pdm-config/src/access_control.rs b/lib/pdm-config/src/access_control.rs
index 389b3f4..1d498d3 100644
--- a/lib/pdm-config/src/access_control.rs
+++ b/lib/pdm-config/src/access_control.rs
@@ -7,6 +7,17 @@ impl proxmox_access_control::init::AccessControlConfig for AccessControlConfig {
&pdm_api_types::AccessControlPermissions
}
+ fn cache_generation(&self) -> Option<usize> {
+ crate::ConfigVersionCache::new()
+ .ok()
+ .map(|c| c.user_and_acl_generation())
+ }
+
+ fn increment_cache_generation(&self) -> Result<(), Error> {
+ let c = crate::ConfigVersionCache::new()?;
+ Ok(c.increase_user_and_acl_generation())
+ }
+
fn token_shadow_cache_generation(&self) -> Option<usize> {
crate::ConfigVersionCache::new()
.ok()
diff --git a/lib/pdm-config/src/config_version_cache.rs b/lib/pdm-config/src/config_version_cache.rs
index 933140c..f3d52a0 100644
--- a/lib/pdm-config/src/config_version_cache.rs
+++ b/lib/pdm-config/src/config_version_cache.rs
@@ -21,8 +21,8 @@ use proxmox_shared_memory::*;
#[repr(C)]
struct ConfigVersionCacheDataInner {
magic: [u8; 8],
- // User (user.cfg) cache generation/version.
- user_cache_generation: AtomicUsize,
+ // User (user.cfg) and ACL (acl.cfg) generation/version.
+ user_and_acl_generation: AtomicUsize,
// Traffic control (traffic-control.cfg) generation/version.
traffic_control_generation: AtomicUsize,
// Tracks updates to the remote/hostname/nodename mapping cache.
@@ -126,19 +126,19 @@ impl ConfigVersionCache {
Ok(Arc::new(Self { shmem }))
}
- /// Returns the user cache generation number.
- pub fn user_cache_generation(&self) -> usize {
+ /// Returns the user and ACL cache generation number.
+ pub fn user_and_acl_generation(&self) -> usize {
self.shmem
.data()
- .user_cache_generation
+ .user_and_acl_generation
.load(Ordering::Acquire)
}
- /// Increase the user cache generation number.
- pub fn increase_user_cache_generation(&self) {
+ /// Increase the user and ACL cache generation number.
+ pub fn increase_user_and_acl_generation(&self) {
self.shmem
.data()
- .user_cache_generation
+ .user_and_acl_generation
.fetch_add(1, Ordering::AcqRel);
}
--
2.47.3
^ permalink raw reply [relevance 16%]
* [pbs-devel] [PATCH proxmox-backup v4 3/4] pbs-config: invalidate token-secret cache on token.shadow changes
2026-01-21 15:13 14% [pbs-devel] [PATCH proxmox{-backup, , -datacenter-manager} v4 00/11] token-shadow: reduce api token verification overhead Samuel Rufinatscha
2026-01-21 15:13 17% ` [pbs-devel] [PATCH proxmox-backup v4 1/4] pbs-config: add token.shadow generation to ConfigVersionCache Samuel Rufinatscha
2026-01-21 15:13 12% ` [pbs-devel] [PATCH proxmox-backup v4 2/4] pbs-config: cache verified API token secrets Samuel Rufinatscha
@ 2026-01-21 15:13 12% ` Samuel Rufinatscha
2026-01-21 15:14 15% ` [pbs-devel] [PATCH proxmox-backup v4 4/4] pbs-config: add TTL window to token secret cache Samuel Rufinatscha
` (7 subsequent siblings)
10 siblings, 0 replies; 200+ results
From: Samuel Rufinatscha @ 2026-01-21 15:13 UTC (permalink / raw)
To: pbs-devel
This patch detects manual/direct edits of token.shadow by tracking its
mtime and length, and clears the in-memory token-secret cache whenever
either value changes.
Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
---
Changes from v3 to v4:
* make use of .replace() in refresh_cache_if_file_changed to get
previous state
* Group file stats with ShadowFileInfo
* Return false in refresh_cache_if_file_changed to avoid unnecessary cache
queries
* Adjusted commit message
Changes from v2 to v3:
* Cache now tracks last_checked (epoch seconds).
* Simplified refresh_cache_if_file_changed, removed
FILE_GENERATION logic
* On first load, initializes file metadata and keeps empty cache.
Changes from v1 to v2:
* Add file metadata tracking (file_mtime, file_len) and
FILE_GENERATION.
* Store file_gen in CachedSecret and verify it against the current
FILE_GENERATION to ensure cached entries belong to the current file
state.
* Add shadow_mtime_len() helper and convert refresh to best-effort
(try_write, returns bool).
* Pass a pre-write metadata snapshot into apply_api_mutation and
clear/bump generation if the cache metadata indicates missed external
edits.
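The metadata-snapshot approach can be sketched in isolation. Names and the temp-file path below are illustrative, not the real pbs-config code: the point is that a missing file maps to (None, None) rather than an error, and that any write changes the (mtime, len) pair a cached snapshot is compared against.

```rust
use std::fs;
use std::io::{ErrorKind, Write};
use std::time::SystemTime;

// Mirrors the shadow_mtime_len() idea: stat the file, treat "not found"
// as a valid (None, None) state instead of an error.
fn mtime_len(path: &str) -> std::io::Result<(Option<SystemTime>, Option<u64>)> {
    match fs::metadata(path) {
        Ok(meta) => Ok((meta.modified().ok(), Some(meta.len()))),
        Err(e) if e.kind() == ErrorKind::NotFound => Ok((None, None)),
        Err(e) => Err(e),
    }
}

fn main() -> std::io::Result<()> {
    let path = std::env::temp_dir().join("token.shadow.example");
    let path = path.to_str().unwrap().to_owned();

    let _ = fs::remove_file(&path);
    let before = mtime_len(&path)?;
    assert_eq!(before, (None, None)); // file absent

    fs::File::create(&path)?.write_all(b"tokenid secret-hash\n")?;
    let after = mtime_len(&path)?;
    // The write changed the snapshot, so cached secrets derived from
    // `before` would have to be dropped.
    assert_ne!(before, after);
    assert_eq!(after.1, Some(20));

    fs::remove_file(&path)?;
    Ok(())
}
```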
pbs-config/src/token_shadow.rs | 123 +++++++++++++++++++++++++++++++--
1 file changed, 119 insertions(+), 4 deletions(-)
diff --git a/pbs-config/src/token_shadow.rs b/pbs-config/src/token_shadow.rs
index d5aa5de2..a5bd1525 100644
--- a/pbs-config/src/token_shadow.rs
+++ b/pbs-config/src/token_shadow.rs
@@ -1,5 +1,8 @@
use std::collections::HashMap;
+use std::fs;
+use std::io::ErrorKind;
use std::sync::LazyLock;
+use std::time::SystemTime;
use anyhow::{bail, format_err, Error};
use parking_lot::RwLock;
@@ -7,6 +10,7 @@ use serde::{Deserialize, Serialize};
use serde_json::{from_value, Value};
use proxmox_sys::fs::CreateOptions;
+use proxmox_time::epoch_i64;
use pbs_api_types::Authid;
//use crate::auth;
@@ -24,6 +28,7 @@ static TOKEN_SECRET_CACHE: LazyLock<RwLock<ApiTokenSecretCache>> = LazyLock::new
RwLock::new(ApiTokenSecretCache {
secrets: HashMap::new(),
shared_gen: 0,
+ shadow: None,
})
});
@@ -62,6 +67,56 @@ fn write_file(data: HashMap<Authid, String>) -> Result<(), Error> {
proxmox_sys::fs::replace_file(CONF_FILE, &json, options, true)
}
+/// Refreshes the in-memory cache if the on-disk token.shadow file changed.
+/// Returns true if the cache is valid to use, false if not.
+fn refresh_cache_if_file_changed() -> bool {
+ let now = epoch_i64();
+
+ // Best-effort refresh under write lock.
+ let Some(mut cache) = TOKEN_SECRET_CACHE.try_write() else {
+ return false;
+ };
+
+ let Some(shared_gen_now) = token_shadow_shared_gen() else {
+ return false;
+ };
+
+ // If another process bumped the generation, we don't know what changed -> clear cache
+ if cache.shared_gen != shared_gen_now {
+ invalidate_cache_state_and_set_gen(&mut cache, shared_gen_now);
+ }
+
+ // Stat the file to detect manual edits.
+ let Ok((new_mtime, new_len)) = shadow_mtime_len() else {
+ return false;
+ };
+
+ // If the file didn't change, only update last_checked
+ if let Some(shadow) = cache.shadow.as_mut() {
+ if shadow.mtime == new_mtime && shadow.len == new_len {
+ shadow.last_checked = now;
+ return true;
+ }
+ }
+
+ cache.secrets.clear();
+
+ let prev = cache.shadow.replace(ShadowFileInfo {
+ mtime: new_mtime,
+ len: new_len,
+ last_checked: now,
+ });
+
+ if prev.is_some() {
+ // Best-effort propagation to other processes if a change was detected
+ if let Some(shared_gen_new) = bump_token_shadow_shared_gen() {
+ cache.shared_gen = shared_gen_new;
+ }
+ }
+
+ false
+}
+
/// Verifies that an entry for given tokenid / API token secret exists
pub fn verify_secret(tokenid: &Authid, secret: &str) -> Result<(), Error> {
if !tokenid.is_token() {
@@ -69,7 +124,7 @@ pub fn verify_secret(tokenid: &Authid, secret: &str) -> Result<(), Error> {
}
// Fast path
- if cache_try_secret_matches(tokenid, secret) {
+ if refresh_cache_if_file_changed() && cache_try_secret_matches(tokenid, secret) {
return Ok(());
}
@@ -109,12 +164,15 @@ fn set_secret(tokenid: &Authid, secret: &str) -> Result<(), Error> {
let guard = lock_config()?;
+ // Capture state before we write to detect external edits.
+ let pre_meta = shadow_mtime_len().unwrap_or((None, None));
+
let mut data = read_file()?;
let hashed_secret = proxmox_sys::crypt::encrypt_pw(secret)?;
data.insert(tokenid.clone(), hashed_secret);
write_file(data)?;
- apply_api_mutation(guard, tokenid, Some(secret));
+ apply_api_mutation(guard, tokenid, Some(secret), pre_meta);
Ok(())
}
@@ -127,11 +185,14 @@ pub fn delete_secret(tokenid: &Authid) -> Result<(), Error> {
let guard = lock_config()?;
+ // Capture state before we write to detect external edits.
+ let pre_meta = shadow_mtime_len().unwrap_or((None, None));
+
let mut data = read_file()?;
data.remove(tokenid);
write_file(data)?;
- apply_api_mutation(guard, tokenid, None);
+ apply_api_mutation(guard, tokenid, None, pre_meta);
Ok(())
}
@@ -145,6 +206,8 @@ struct ApiTokenSecretCache {
secrets: HashMap<Authid, CachedSecret>,
/// Shared generation to detect mutations of the underlying token.shadow file.
shared_gen: usize,
+ /// Shadow file info to detect changes
+ shadow: Option<ShadowFileInfo>,
}
/// Cached secret.
@@ -152,6 +215,16 @@ struct CachedSecret {
secret: String,
}
+/// Shadow file info
+struct ShadowFileInfo {
+ // shadow file mtime to detect changes
+ mtime: Option<SystemTime>,
+ // shadow file length to detect changes
+ len: Option<u64>,
+ // last time the file metadata was checked
+ last_checked: i64,
+}
+
fn cache_try_insert_secret(tokenid: Authid, secret: String, shared_gen_before: usize) {
let Some(mut cache) = TOKEN_SECRET_CACHE.try_write() else {
return;
@@ -196,7 +269,14 @@ fn cache_try_secret_matches(tokenid: &Authid, secret: &str) -> bool {
false
}
-fn apply_api_mutation(_guard: BackupLockGuard, tokenid: &Authid, new_secret: Option<&str>) {
+fn apply_api_mutation(
+ _guard: BackupLockGuard,
+ tokenid: &Authid,
+ new_secret: Option<&str>,
+ pre_write_meta: (Option<SystemTime>, Option<u64>),
+) {
+ let now = epoch_i64();
+
// Signal cache invalidation to other processes (best-effort).
let bumped_gen = bump_token_shadow_shared_gen();
@@ -215,6 +295,16 @@ fn apply_api_mutation(_guard: BackupLockGuard, tokenid: &Authid, new_secret: Opt
return;
}
+ // If our cached file metadata does not match the on-disk state before our write,
+ // we likely missed an external/manual edit. We can no longer trust any cached secrets.
+ if cache
+ .shadow
+ .as_ref()
+ .is_some_and(|s| (s.mtime, s.len) != pre_write_meta)
+ {
+ cache.secrets.clear();
+ }
+
// Update to the post-mutation generation.
cache.shared_gen = current_gen;
@@ -232,6 +322,22 @@ fn apply_api_mutation(_guard: BackupLockGuard, tokenid: &Authid, new_secret: Opt
cache.secrets.remove(tokenid);
}
}
+
+ // Update our view of the file metadata to the post-write state (best-effort).
+ // (If this fails, drop local cache so callers fall back to slow path until refreshed.)
+ match shadow_mtime_len() {
+ Ok((mtime, len)) => {
+ cache.shadow = Some(ShadowFileInfo {
+ mtime,
+ len,
+ last_checked: now,
+ });
+ }
+ Err(_) => {
+ // If we cannot validate state, do not trust cache.
+ invalidate_cache_state_and_set_gen(&mut cache, current_gen);
+ }
+ }
}
/// Get the current shared generation.
@@ -252,4 +358,13 @@ fn bump_token_shadow_shared_gen() -> Option<usize> {
fn invalidate_cache_state_and_set_gen(cache: &mut ApiTokenSecretCache, gen: usize) {
cache.secrets.clear();
cache.shared_gen = gen;
+ cache.shadow = None;
+}
+
+fn shadow_mtime_len() -> Result<(Option<SystemTime>, Option<u64>), Error> {
+ match fs::metadata(CONF_FILE) {
+ Ok(meta) => Ok((meta.modified().ok(), Some(meta.len()))),
+ Err(e) if e.kind() == ErrorKind::NotFound => Ok((None, None)),
+ Err(e) => Err(e.into()),
+ }
}
--
2.47.3
^ permalink raw reply [relevance 12%]
* [pbs-devel] [PATCH proxmox-backup v4 1/4] pbs-config: add token.shadow generation to ConfigVersionCache
2026-01-21 15:13 14% [pbs-devel] [PATCH proxmox{-backup, , -datacenter-manager} v4 00/11] token-shadow: reduce api token verification overhead Samuel Rufinatscha
@ 2026-01-21 15:13 17% ` Samuel Rufinatscha
2026-01-21 15:13 12% ` [pbs-devel] [PATCH proxmox-backup v4 2/4] pbs-config: cache verified API token secrets Samuel Rufinatscha
` (9 subsequent siblings)
10 siblings, 0 replies; 200+ results
From: Samuel Rufinatscha @ 2026-01-21 15:13 UTC (permalink / raw)
To: pbs-devel
Prepares the config version cache to support token_shadow caching.
Safety: the shmem mapping is fixed to 4096 bytes via the #[repr(C)]
union padding, and the new atomic is appended to the end of the
#[repr(C)] inner struct, so all existing field offsets stay unchanged.
Old processes keep accessing the same bytes and new processes consume
previously reserved padding.
Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
---
Changes from v3 to v4:
* Rebased
* Adjusted commit message
Changes from v2 to v3:
* Rebased
Changes from v1 to v2:
* Rebased
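The layout-stability argument from the commit message can be checked with a small sketch. The type names are illustrative, not the real pbs-config structs: a #[repr(C)] union pads the mapping to a fixed 4096 bytes, so appending an atomic to the end of the #[repr(C)] inner struct only consumes reserved padding and leaves all existing field offsets untouched.

```rust
use std::mem::{offset_of, size_of, ManuallyDrop};
use std::sync::atomic::AtomicUsize;

#[repr(C)]
struct InnerOld {
    magic: [u8; 8],
    datastore_generation: AtomicUsize,
}

#[repr(C)]
struct InnerNew {
    magic: [u8; 8],
    datastore_generation: AtomicUsize,
    token_shadow_generation: AtomicUsize, // appended at the end
}

#[repr(C)]
union FixedPage {
    data: ManuallyDrop<InnerNew>,
    _padding: [u8; 4096], // pins the mapping size regardless of Inner growth
}

fn main() {
    // Mapping size is fixed by the padding arm of the union.
    assert_eq!(size_of::<FixedPage>(), 4096);
    // Existing fields keep their offsets, so old and new processes
    // sharing the mapping still agree on where each counter lives.
    assert_eq!(
        offset_of!(InnerOld, datastore_generation),
        offset_of!(InnerNew, datastore_generation)
    );
}
```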
pbs-config/src/config_version_cache.rs | 18 ++++++++++++++++++
1 file changed, 18 insertions(+)
diff --git a/pbs-config/src/config_version_cache.rs b/pbs-config/src/config_version_cache.rs
index b875f7e0..399a6f79 100644
--- a/pbs-config/src/config_version_cache.rs
+++ b/pbs-config/src/config_version_cache.rs
@@ -27,6 +27,8 @@ struct ConfigVersionCacheDataInner {
traffic_control_generation: AtomicUsize,
// datastore (datastore.cfg) generation/version
datastore_generation: AtomicUsize,
+ // Token shadow (token.shadow) generation/version.
+ token_shadow_generation: AtomicUsize,
// Add further atomics here
}
@@ -159,4 +161,20 @@ impl ConfigVersionCache {
.datastore_generation
.fetch_add(1, Ordering::AcqRel)
}
+
+ /// Returns the token shadow generation number.
+ pub fn token_shadow_generation(&self) -> usize {
+ self.shmem
+ .data()
+ .token_shadow_generation
+ .load(Ordering::Acquire)
+ }
+
+ /// Increase the token shadow generation number.
+ pub fn increase_token_shadow_generation(&self) -> usize {
+ self.shmem
+ .data()
+ .token_shadow_generation
+ .fetch_add(1, Ordering::AcqRel)
+ }
}
--
2.47.3
^ permalink raw reply [relevance 17%]
* [pbs-devel] [PATCH proxmox v4 2/4] proxmox-access-control: cache verified API token secrets
2026-01-21 15:13 14% [pbs-devel] [PATCH proxmox{-backup, , -datacenter-manager} v4 00/11] token-shadow: reduce api token verification overhead Samuel Rufinatscha
` (4 preceding siblings ...)
2026-01-21 15:14 13% ` [pbs-devel] [PATCH proxmox v4 1/4] proxmox-access-control: split AccessControlConfig and add token.shadow gen Samuel Rufinatscha
@ 2026-01-21 15:14 12% ` Samuel Rufinatscha
2026-01-21 15:14 12% ` [pbs-devel] [PATCH proxmox v4 3/4] proxmox-access-control: invalidate token-secret cache on token.shadow changes Samuel Rufinatscha
` (4 subsequent siblings)
10 siblings, 0 replies; 200+ results
From: Samuel Rufinatscha @ 2026-01-21 15:14 UTC (permalink / raw)
To: pbs-devel
Adds an in-memory cache of successfully verified token secrets.
Subsequent requests for the same token+secret combination only perform a
constant-time comparison via openssl::memcmp::eq and avoid re-running
the password hash. The cache entry is updated when a token secret is set
and removed when a token is deleted.
Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
---
Changes from v3 to v4:
* Add gen param to invalidate_cache_state()
* Validates the generation bump after obtaining write lock in
apply_api_mutation
* Pass lock to apply_api_mutation
* Remove unnecessary gen check cache_try_secret_matches
* Adjusted commit message
Changes from v2 to v3:
* Replaced process-local cache invalidation (AtomicU64
API_MUTATION_GENERATION) with a cross-process shared generation via
ConfigVersionCache.
* Validate shared generation before/after the constant-time secret
compare; only insert into cache if the generation is unchanged.
* invalidate_cache_state() on insert if shared generation changed.
Changes from v1 to v2:
* Replace OnceCell with LazyLock, and std::sync::RwLock with
parking_lot::RwLock.
* Add API_MUTATION_GENERATION and guard cache inserts
to prevent “zombie inserts” across concurrent set/delete.
* Refactor cache operations into cache_try_secret_matches,
cache_try_insert_secret, and centralize write-side behavior in
apply_api_mutation.
* Switch fast-path cache access to try_read/try_write (best-effort).
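The fast/slow path split can be sketched with a simplified, process-local model. This is not the patch's code: the real implementation uses openssl::memcmp::eq, a crypt-style password hash, and the shared-memory generation; ct_eq() and slow_hash_verify() below are illustrative stand-ins that only show the generation-guarded insert.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::RwLock;

static GENERATION: AtomicUsize = AtomicUsize::new(0);

// Constant-time-ish equality: examine every byte, no early return.
// Stand-in for openssl::memcmp::eq.
fn ct_eq(a: &[u8], b: &[u8]) -> bool {
    a.len() == b.len() && a.iter().zip(b).fold(0u8, |acc, (x, y)| acc | (x ^ y)) == 0
}

// Stand-in for the expensive proxmox_sys::crypt::verify_crypt_pw.
fn slow_hash_verify(secret: &str, stored: &str) -> bool {
    secret == stored
}

fn verify(
    cache: &RwLock<HashMap<String, (usize, String)>>,
    id: &str,
    secret: &str,
    stored: &str,
) -> bool {
    let gen_now = GENERATION.load(Ordering::Acquire);
    // Fast path: only trust an entry cached under the current generation.
    if let Some((gen, cached)) = cache.read().unwrap().get(id) {
        if *gen == gen_now && ct_eq(cached.as_bytes(), secret.as_bytes()) {
            return true;
        }
    }
    // Slow path: hash verification; cache only if no mutation raced us.
    if slow_hash_verify(secret, stored) {
        if GENERATION.load(Ordering::Acquire) == gen_now {
            cache.write().unwrap().insert(id.into(), (gen_now, secret.into()));
        }
        return true;
    }
    false
}

fn main() {
    let cache = RwLock::new(HashMap::new());
    assert!(verify(&cache, "user@pam!tok", "s3cret", "s3cret")); // slow path, fills cache
    assert!(verify(&cache, "user@pam!tok", "s3cret", "s3cret")); // fast path
    GENERATION.fetch_add(1, Ordering::AcqRel); // token.shadow mutated elsewhere
    assert!(verify(&cache, "user@pam!tok", "s3cret", "s3cret")); // stale entry ignored
    assert!(!verify(&cache, "user@pam!tok", "wrong", "s3cret"));
}
```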
Cargo.toml | 1 +
proxmox-access-control/Cargo.toml | 1 +
proxmox-access-control/src/token_shadow.rs | 160 ++++++++++++++++++++-
3 files changed, 159 insertions(+), 3 deletions(-)
diff --git a/Cargo.toml b/Cargo.toml
index 27a69afa..59a2ec93 100644
--- a/Cargo.toml
+++ b/Cargo.toml
@@ -112,6 +112,7 @@ native-tls = "0.2"
nix = "0.29"
openssl = "0.10"
pam-sys = "0.5"
+parking_lot = "0.12"
percent-encoding = "2.1"
pin-utils = "0.1.0"
proc-macro2 = "1.0"
diff --git a/proxmox-access-control/Cargo.toml b/proxmox-access-control/Cargo.toml
index ec189664..1de2842c 100644
--- a/proxmox-access-control/Cargo.toml
+++ b/proxmox-access-control/Cargo.toml
@@ -16,6 +16,7 @@ anyhow.workspace = true
const_format.workspace = true
nix = { workspace = true, optional = true }
openssl = { workspace = true, optional = true }
+parking_lot.workspace = true
regex.workspace = true
hex = { workspace = true, optional = true }
serde.workspace = true
diff --git a/proxmox-access-control/src/token_shadow.rs b/proxmox-access-control/src/token_shadow.rs
index c586d834..e4dfab50 100644
--- a/proxmox-access-control/src/token_shadow.rs
+++ b/proxmox-access-control/src/token_shadow.rs
@@ -1,13 +1,28 @@
use std::collections::HashMap;
+use std::sync::LazyLock;
use anyhow::{bail, format_err, Error};
+use parking_lot::RwLock;
use serde_json::{from_value, Value};
use proxmox_auth_api::types::Authid;
use proxmox_product_config::{open_api_lockfile, replace_config, ApiLockGuard};
+use crate::init::access_conf;
use crate::init::impl_feature::{token_shadow, token_shadow_lock};
+/// Global in-memory cache for successfully verified API token secrets.
+/// The cache stores plain text secrets for token Authids that have already been
+/// verified against the hashed values in `token.shadow`. This allows for cheap
+/// subsequent authentications for the same token+secret combination, avoiding
+/// recomputing the password hash on every request.
+static TOKEN_SECRET_CACHE: LazyLock<RwLock<ApiTokenSecretCache>> = LazyLock::new(|| {
+ RwLock::new(ApiTokenSecretCache {
+ secrets: HashMap::new(),
+ shared_gen: 0,
+ })
+});
+
// Get exclusive lock
fn lock_config() -> Result<ApiLockGuard, Error> {
open_api_lockfile(token_shadow_lock(), None, true)
@@ -36,9 +51,27 @@ pub fn verify_secret(tokenid: &Authid, secret: &str) -> Result<(), Error> {
bail!("not an API token ID");
}
+ // Fast path
+ if cache_try_secret_matches(tokenid, secret) {
+ return Ok(());
+ }
+
+ // Slow path
+ // First, capture the shared generation before doing the hash verification.
+ let gen_before = token_shadow_shared_gen();
+
let data = read_file()?;
match data.get(tokenid) {
- Some(hashed_secret) => proxmox_sys::crypt::verify_crypt_pw(secret, hashed_secret),
+ Some(hashed_secret) => {
+ proxmox_sys::crypt::verify_crypt_pw(secret, hashed_secret)?;
+
+ // Try to cache only if nothing changed while verifying the secret.
+ if let Some(gen) = gen_before {
+ cache_try_insert_secret(tokenid.clone(), secret.to_owned(), gen);
+ }
+
+ Ok(())
+ }
None => bail!("invalid API token"),
}
}
@@ -49,13 +82,15 @@ pub fn set_secret(tokenid: &Authid, secret: &str) -> Result<(), Error> {
bail!("not an API token ID");
}
- let _guard = lock_config()?;
+ let guard = lock_config()?;
let mut data = read_file()?;
let hashed_secret = proxmox_sys::crypt::encrypt_pw(secret)?;
data.insert(tokenid.clone(), hashed_secret);
write_file(data)?;
+ apply_api_mutation(guard, tokenid, Some(secret));
+
Ok(())
}
@@ -65,12 +100,14 @@ pub fn delete_secret(tokenid: &Authid) -> Result<(), Error> {
bail!("not an API token ID");
}
- let _guard = lock_config()?;
+ let guard = lock_config()?;
let mut data = read_file()?;
data.remove(tokenid);
write_file(data)?;
+ apply_api_mutation(guard, tokenid, None);
+
Ok(())
}
@@ -81,3 +118,120 @@ pub fn generate_and_set_secret(tokenid: &Authid) -> Result<String, Error> {
set_secret(tokenid, &secret)?;
Ok(secret)
}
+
+struct ApiTokenSecretCache {
+ /// Keys are token Authids, values are the corresponding plain text secrets.
+ /// Entries are added after a successful on-disk verification in
+ /// `verify_secret` or when a new token secret is generated by
+ /// `generate_and_set_secret`. Used to avoid repeated
+ /// password-hash computation on subsequent authentications.
+ secrets: HashMap<Authid, CachedSecret>,
+ /// Shared generation to detect mutations of the underlying token.shadow file.
+ shared_gen: usize,
+}
+
+/// Cached secret.
+struct CachedSecret {
+ secret: String,
+}
+
+fn cache_try_insert_secret(tokenid: Authid, secret: String, shared_gen_before: usize) {
+ let Some(mut cache) = TOKEN_SECRET_CACHE.try_write() else {
+ return;
+ };
+
+ let Some(shared_gen_now) = token_shadow_shared_gen() else {
+ return;
+ };
+
+ // If this process missed a generation bump, its cache is stale.
+ if cache.shared_gen != shared_gen_now {
+ invalidate_cache_state_and_set_gen(&mut cache, shared_gen_now);
+ }
+
+ // If a mutation happened while we were verifying the secret, do not insert.
+ if shared_gen_now == shared_gen_before {
+ cache.secrets.insert(tokenid, CachedSecret { secret });
+ }
+}
+
+/// Tries to match the given token secret against the cached secret.
+///
+/// Verifies the generation/version before doing the constant-time
+/// comparison to reduce TOCTOU risk. During token rotation or deletion
+/// tokens for in-flight requests may still validate against the previous
+/// generation.
+fn cache_try_secret_matches(tokenid: &Authid, secret: &str) -> bool {
+ let Some(cache) = TOKEN_SECRET_CACHE.try_read() else {
+ return false;
+ };
+ let Some(entry) = cache.secrets.get(tokenid) else {
+ return false;
+ };
+ let Some(current_gen) = token_shadow_shared_gen() else {
+ return false;
+ };
+
+ if current_gen == cache.shared_gen {
+ return openssl::memcmp::eq(entry.secret.as_bytes(), secret.as_bytes());
+ }
+
+ false
+}
+
+fn apply_api_mutation(_guard: BackupLockGuard, tokenid: &Authid, new_secret: Option<&str>) {
+ // Signal cache invalidation to other processes (best-effort).
+ let bumped_gen = bump_token_shadow_shared_gen();
+
+ let mut cache = TOKEN_SECRET_CACHE.write();
+
+ // If we cannot get the current generation, we cannot trust the cache
+ let Some(current_gen) = token_shadow_shared_gen() else {
+ invalidate_cache_state_and_set_gen(&mut cache, 0);
+ return;
+ };
+
+ // If we cannot bump the shared generation, or if it changed after
+ // obtaining the cache write lock, we cannot trust the cache
+ if bumped_gen != Some(current_gen) {
+ invalidate_cache_state_and_set_gen(&mut cache, current_gen);
+ return;
+ }
+
+ // Update to the post-mutation generation.
+ cache.shared_gen = current_gen;
+
+ // Apply the new mutation.
+ match new_secret {
+ Some(secret) => {
+ cache.secrets.insert(
+ tokenid.clone(),
+ CachedSecret {
+ secret: secret.to_owned(),
+ },
+ );
+ }
+ None => {
+ cache.secrets.remove(tokenid);
+ }
+ }
+}
+
+/// Get the current shared generation.
+fn token_shadow_shared_gen() -> Option<usize> {
+ access_conf().token_shadow_cache_generation()
+}
+
+/// Bump and return the new shared generation.
+fn bump_token_shadow_shared_gen() -> Option<usize> {
+ access_conf()
+ .increment_token_shadow_cache_generation()
+ .ok()
+ .map(|prev| prev + 1)
+}
+
+/// Invalidates local cache contents and sets/updates the cached generation.
+fn invalidate_cache_state_and_set_gen(cache: &mut ApiTokenSecretCache, gen: usize) {
+ cache.secrets.clear();
+ cache.shared_gen = gen;
+}
--
2.47.3
^ permalink raw reply [relevance 12%]
* [pbs-devel] [PATCH proxmox-datacenter-manager v4 1/3] pdm-config: implement token.shadow generation
2026-01-21 15:13 14% [pbs-devel] [PATCH proxmox{-backup, , -datacenter-manager} v4 00/11] token-shadow: reduce api token verification overhead Samuel Rufinatscha
` (7 preceding siblings ...)
2026-01-21 15:14 15% ` [pbs-devel] [PATCH proxmox v4 4/4] proxmox-access-control: add TTL window to token secret cache Samuel Rufinatscha
@ 2026-01-21 15:14 13% ` Samuel Rufinatscha
2026-01-21 15:14 17% ` [pbs-devel] [PATCH proxmox-datacenter-manager v4 2/3] docs: document API token-cache TTL effects Samuel Rufinatscha
2026-01-21 15:14 16% ` [pbs-devel] [PATCH proxmox-datacenter-manager v4 3/3] pdm-config: wire user+acl cache generation Samuel Rufinatscha
10 siblings, 0 replies; 200+ results
From: Samuel Rufinatscha @ 2026-01-21 15:14 UTC (permalink / raw)
To: pbs-devel
PDM depends on the shared proxmox/proxmox-access-control crate for
token.shadow handling, which expects the product to provide a
cross-process invalidation signal so it can cache and invalidate
token.shadow secrets.
This patch wires AccessControlConfig to ConfigVersionCache for
token.shadow invalidation, switches the server and CLI to pdm-config’s
AccessControlConfig, and switches the UI to UiAccessControlConfig.
Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
---
Changes from v3 to v4:
* pdm-api-types: replace AccessControlConfig with
AccessControlPermissions and implement init::AccessControlPermissions
there
* pdm-config: add new AccessControlConfig implementing
init::AccessControlConfig
* UI: init uses a local UiAccessControlConfig for init_access_config()
* Adjusted commit message
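The wiring pattern can be sketched as follows. Method names mirror the patch but the bodies are illustrative: the crate only sees the trait and its default no-op hooks, while a server-side config overrides them with its version cache (a std AtomicUsize stands in for ConfigVersionCache here), and a UI-side config simply keeps the defaults.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

trait AccessControlConfig {
    /// Default: no cross-process signal available, secret caching stays off.
    fn token_shadow_cache_generation(&self) -> Option<usize> {
        None
    }
    fn increment_token_shadow_cache_generation(&self) -> Result<usize, String> {
        Err("no version cache wired up".into())
    }
}

/// UI-style config: relies entirely on the defaults.
struct UiAccessControlConfig;
impl AccessControlConfig for UiAccessControlConfig {}

/// Server-style config: wired to a (stand-in) version cache.
static TOKEN_SHADOW_GEN: AtomicUsize = AtomicUsize::new(0);

struct ServerAccessControlConfig;
impl AccessControlConfig for ServerAccessControlConfig {
    fn token_shadow_cache_generation(&self) -> Option<usize> {
        Some(TOKEN_SHADOW_GEN.load(Ordering::Acquire))
    }
    fn increment_token_shadow_cache_generation(&self) -> Result<usize, String> {
        Ok(TOKEN_SHADOW_GEN.fetch_add(1, Ordering::AcqRel))
    }
}

fn main() {
    // UI build: caching disabled by default.
    assert_eq!(UiAccessControlConfig.token_shadow_cache_generation(), None);
    // Server build: generations flow through the version cache.
    let server = ServerAccessControlConfig;
    let before = server.token_shadow_cache_generation().unwrap();
    server.increment_token_shadow_cache_generation().unwrap();
    assert_eq!(server.token_shadow_cache_generation(), Some(before + 1));
}
```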
cli/admin/src/main.rs | 2 +-
lib/pdm-api-types/src/acl.rs | 4 ++--
lib/pdm-config/Cargo.toml | 1 +
lib/pdm-config/src/access_control.rs | 20 ++++++++++++++++++++
lib/pdm-config/src/config_version_cache.rs | 18 ++++++++++++++++++
lib/pdm-config/src/lib.rs | 2 ++
server/src/acl.rs | 3 +--
ui/src/main.rs | 10 +++++++++-
8 files changed, 54 insertions(+), 6 deletions(-)
create mode 100644 lib/pdm-config/src/access_control.rs
diff --git a/cli/admin/src/main.rs b/cli/admin/src/main.rs
index f698fa2..916c633 100644
--- a/cli/admin/src/main.rs
+++ b/cli/admin/src/main.rs
@@ -19,7 +19,7 @@ fn main() {
proxmox_product_config::init(api_user, priv_user);
proxmox_access_control::init::init(
- &pdm_api_types::AccessControlConfig,
+ &pdm_config::AccessControlConfig,
pdm_buildcfg::configdir!("/access"),
)
.expect("failed to setup access control config");
diff --git a/lib/pdm-api-types/src/acl.rs b/lib/pdm-api-types/src/acl.rs
index 405982a..7c405a7 100644
--- a/lib/pdm-api-types/src/acl.rs
+++ b/lib/pdm-api-types/src/acl.rs
@@ -187,9 +187,9 @@ pub struct AclListItem {
pub roleid: String,
}
-pub struct AccessControlConfig;
+pub struct AccessControlPermissions;
-impl proxmox_access_control::init::AccessControlConfig for AccessControlConfig {
+impl proxmox_access_control::init::AccessControlPermissions for AccessControlPermissions {
fn privileges(&self) -> &HashMap<&str, u64> {
static PRIVS: LazyLock<HashMap<&str, u64>> =
LazyLock::new(|| PRIVILEGES.iter().copied().collect());
diff --git a/lib/pdm-config/Cargo.toml b/lib/pdm-config/Cargo.toml
index d39c2ad..19781d2 100644
--- a/lib/pdm-config/Cargo.toml
+++ b/lib/pdm-config/Cargo.toml
@@ -13,6 +13,7 @@ once_cell.workspace = true
openssl.workspace = true
serde.workspace = true
+proxmox-access-control.workspace = true
proxmox-config-digest = { workspace = true, features = [ "openssl" ] }
proxmox-http = { workspace = true, features = [ "http-helpers" ] }
proxmox-ldap = { workspace = true, features = [ "types" ]}
diff --git a/lib/pdm-config/src/access_control.rs b/lib/pdm-config/src/access_control.rs
new file mode 100644
index 0000000..389b3f4
--- /dev/null
+++ b/lib/pdm-config/src/access_control.rs
@@ -0,0 +1,20 @@
+use anyhow::Error;
+
+pub struct AccessControlConfig;
+
+impl proxmox_access_control::init::AccessControlConfig for AccessControlConfig {
+ fn permissions(&self) -> &dyn proxmox_access_control::init::AccessControlPermissions {
+ &pdm_api_types::AccessControlPermissions
+ }
+
+ fn token_shadow_cache_generation(&self) -> Option<usize> {
+ crate::ConfigVersionCache::new()
+ .ok()
+ .map(|c| c.token_shadow_generation())
+ }
+
+ fn increment_token_shadow_cache_generation(&self) -> Result<usize, Error> {
+ let c = crate::ConfigVersionCache::new()?;
+ Ok(c.increase_token_shadow_generation())
+ }
+}
diff --git a/lib/pdm-config/src/config_version_cache.rs b/lib/pdm-config/src/config_version_cache.rs
index 36a6a77..933140c 100644
--- a/lib/pdm-config/src/config_version_cache.rs
+++ b/lib/pdm-config/src/config_version_cache.rs
@@ -27,6 +27,8 @@ struct ConfigVersionCacheDataInner {
traffic_control_generation: AtomicUsize,
// Tracks updates to the remote/hostname/nodename mapping cache.
remote_mapping_cache: AtomicUsize,
+ // Token shadow (token.shadow) generation/version.
+ token_shadow_generation: AtomicUsize,
// Add further atomics here
}
@@ -172,4 +174,20 @@ impl ConfigVersionCache {
.fetch_add(1, Ordering::Relaxed)
+ 1
}
+
+ /// Returns the token shadow generation number.
+ pub fn token_shadow_generation(&self) -> usize {
+ self.shmem
+ .data()
+ .token_shadow_generation
+ .load(Ordering::Acquire)
+ }
+
+ /// Increase the token shadow generation number.
+ pub fn increase_token_shadow_generation(&self) -> usize {
+ self.shmem
+ .data()
+ .token_shadow_generation
+ .fetch_add(1, Ordering::AcqRel)
+ }
}
diff --git a/lib/pdm-config/src/lib.rs b/lib/pdm-config/src/lib.rs
index 4c49054..614f7ae 100644
--- a/lib/pdm-config/src/lib.rs
+++ b/lib/pdm-config/src/lib.rs
@@ -9,6 +9,8 @@ pub mod remotes;
pub mod setup;
pub mod views;
+mod access_control;
+pub use access_control::AccessControlConfig;
mod config_version_cache;
pub use config_version_cache::ConfigVersionCache;
diff --git a/server/src/acl.rs b/server/src/acl.rs
index f421814..e6e007b 100644
--- a/server/src/acl.rs
+++ b/server/src/acl.rs
@@ -1,6 +1,5 @@
pub(crate) fn init() {
- static ACCESS_CONTROL_CONFIG: pdm_api_types::AccessControlConfig =
- pdm_api_types::AccessControlConfig;
+ static ACCESS_CONTROL_CONFIG: pdm_config::AccessControlConfig = pdm_config::AccessControlConfig;
proxmox_access_control::init::init(&ACCESS_CONTROL_CONFIG, pdm_buildcfg::configdir!("/access"))
.expect("failed to setup access control config");
diff --git a/ui/src/main.rs b/ui/src/main.rs
index 2bd900e..9f87505 100644
--- a/ui/src/main.rs
+++ b/ui/src/main.rs
@@ -390,10 +390,18 @@ fn main() {
pwt::state::set_available_languages(proxmox_yew_comp::available_language_list());
if let Err(e) =
- proxmox_access_control::init::init_access_config(&pdm_api_types::AccessControlConfig)
+ proxmox_access_control::init::init_access_config(&UiAccessControlConfig)
{
log::error!("could not initialize access control config - {e:#}");
}
yew::Renderer::<DatacenterManagerApp>::new().render();
}
+
+struct UiAccessControlConfig;
+
+impl proxmox_access_control::init::AccessControlConfig for UiAccessControlConfig {
+ fn permissions(&self) -> &dyn proxmox_access_control::init::AccessControlPermissions {
+ &pdm_api_types::AccessControlPermissions
+ }
+}
--
2.47.3
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
^ permalink raw reply [relevance 13%]
* [pbs-devel] [PATCH proxmox v4 1/4] proxmox-access-control: split AccessControlConfig and add token.shadow gen
2026-01-21 15:13 14% [pbs-devel] [PATCH proxmox{-backup, , -datacenter-manager} v4 00/11] token-shadow: reduce api token verification overhead Samuel Rufinatscha
` (3 preceding siblings ...)
2026-01-21 15:14 15% ` [pbs-devel] [PATCH proxmox-backup v4 4/4] pbs-config: add TTL window to token secret cache Samuel Rufinatscha
@ 2026-01-21 15:14 13% ` Samuel Rufinatscha
2026-01-21 15:14 12% ` [pbs-devel] [PATCH proxmox v4 2/4] proxmox-access-control: cache verified API token secrets Samuel Rufinatscha
` (5 subsequent siblings)
10 siblings, 0 replies; 200+ results
From: Samuel Rufinatscha @ 2026-01-21 15:14 UTC (permalink / raw)
To: pbs-devel
Splits the AccessControlConfig trait into AccessControlPermissions and
AccessControlConfig and adds token.shadow generation support to
AccessControlConfig (with a default impl).
Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
---
Changes from v3 to v4:
* Split AccessControlConfig: introduced AccessControlPermissions to
provide permissions for AccessControlConfig
* Adjusted commit message
proxmox-access-control/src/acl.rs | 10 ++-
proxmox-access-control/src/init.rs | 113 +++++++++++++++++++++++------
2 files changed, 99 insertions(+), 24 deletions(-)
diff --git a/proxmox-access-control/src/acl.rs b/proxmox-access-control/src/acl.rs
index 38cb7edf..4b4eac09 100644
--- a/proxmox-access-control/src/acl.rs
+++ b/proxmox-access-control/src/acl.rs
@@ -763,7 +763,7 @@ fn privs_to_priv_names(privs: u64) -> Vec<&'static str> {
mod test {
use std::{collections::HashMap, sync::OnceLock};
- use crate::init::{init_access_config, AccessControlConfig};
+ use crate::init::{init_access_config, AccessControlConfig, AccessControlPermissions};
use super::AclTree;
use anyhow::Error;
@@ -775,7 +775,7 @@ mod test {
roles: HashMap<&'a str, (u64, &'a str)>,
}
- impl AccessControlConfig for TestAcmConfig<'_> {
+ impl AccessControlPermissions for TestAcmConfig<'_> {
fn roles(&self) -> &HashMap<&str, (u64, &str)> {
&self.roles
}
@@ -793,6 +793,12 @@ mod test {
}
}
+ impl AccessControlConfig for TestAcmConfig<'_> {
+ fn permissions(&self) -> &dyn AccessControlPermissions {
+ self
+ }
+ }
+
fn setup_acl_tree_config() {
static ACL_CONFIG: OnceLock<TestAcmConfig> = OnceLock::new();
let config = ACL_CONFIG.get_or_init(|| {
diff --git a/proxmox-access-control/src/init.rs b/proxmox-access-control/src/init.rs
index e64398e8..dfd7784b 100644
--- a/proxmox-access-control/src/init.rs
+++ b/proxmox-access-control/src/init.rs
@@ -8,9 +8,8 @@ use proxmox_section_config::SectionConfigData;
static ACCESS_CONF: OnceLock<&'static dyn AccessControlConfig> = OnceLock::new();
-/// This trait specifies the functions a product needs to implement to get ACL tree based access
-/// control management from this plugin.
-pub trait AccessControlConfig: Send + Sync {
+/// Provides permission metadata used by access control.
+pub trait AccessControlPermissions: Send + Sync {
/// Returns a mapping of all recognized privileges and their corresponding `u64` value.
fn privileges(&self) -> &HashMap<&str, u64>;
@@ -32,25 +31,6 @@ pub trait AccessControlConfig: Send + Sync {
false
}
- /// Returns the current cache generation of the user and acl configs. If the generation was
- /// incremented since the last time the cache was queried, the configs are loaded again from
- /// disk.
- ///
- /// Returning `None` will always reload the cache.
- ///
- /// Default: Always returns `None`.
- fn cache_generation(&self) -> Option<usize> {
- None
- }
-
- /// Increment the cache generation of user and acl configs. This indicates that they were
- /// changed on disk.
- ///
- /// Default: Does nothing.
- fn increment_cache_generation(&self) -> Result<(), Error> {
- Ok(())
- }
-
/// Optionally returns a role that has no access to any resource.
///
/// Default: Returns `None`.
@@ -103,6 +83,95 @@ pub trait AccessControlConfig: Send + Sync {
}
}
+/// This trait specifies the functions a product needs to implement to get ACL tree based access
+/// control management from this plugin.
+pub trait AccessControlConfig: Send + Sync {
+ /// Return the permissions provider.
+ fn permissions(&self) -> &dyn AccessControlPermissions;
+
+ fn privileges(&self) -> &HashMap<&str, u64> {
+ self.permissions().privileges()
+ }
+
+ fn roles(&self) -> &HashMap<&str, (u64, &str)> {
+ self.permissions().roles()
+ }
+
+ fn is_superuser(&self, auth_id: &Authid) -> bool {
+ self.permissions().is_superuser(auth_id)
+ }
+
+ fn is_group_member(&self, user_id: &Userid, group: &str) -> bool {
+ self.permissions().is_group_member(user_id, group)
+ }
+
+ fn role_no_access(&self) -> Option<&str> {
+ self.permissions().role_no_access()
+ }
+
+ fn role_admin(&self) -> Option<&str> {
+ self.permissions().role_admin()
+ }
+
+ fn init_user_config(&self, config: &mut SectionConfigData) -> Result<(), Error> {
+ self.permissions().init_user_config(config)
+ }
+
+ fn acl_audit_privileges(&self) -> u64 {
+ self.permissions().acl_audit_privileges()
+ }
+
+ fn acl_modify_privileges(&self) -> u64 {
+ self.permissions().acl_modify_privileges()
+ }
+
+ fn check_acl_path(&self, path: &str) -> Result<(), Error> {
+ self.permissions().check_acl_path(path)
+ }
+
+ fn allow_partial_permission_match(&self) -> bool {
+ self.permissions().allow_partial_permission_match()
+ }
+
+ // Cache hooks
+
+ /// Returns the current cache generation of the user and acl configs. If the generation was
+ /// incremented since the last time the cache was queried, the configs are loaded again from
+ /// disk.
+ ///
+ /// Returning `None` will always reload the cache.
+ ///
+ /// Default: Always returns `None`.
+ fn cache_generation(&self) -> Option<usize> {
+ None
+ }
+
+ /// Increment the cache generation of user and acl configs. This indicates that they were
+ /// changed on disk.
+ ///
+ /// Default: Does nothing.
+ fn increment_cache_generation(&self) -> Result<(), Error> {
+ Ok(())
+ }
+
+ /// Returns the current cache generation of the token shadow cache. If the generation was
+ /// incremented since the last time the cache was queried, the token shadow cache is reloaded
+ /// from disk.
+ ///
+ /// Default: Always returns `None`.
+ fn token_shadow_cache_generation(&self) -> Option<usize> {
+ None
+ }
+
+ /// Increment the cache generation of the token shadow cache. This indicates that it was
+ /// changed on disk.
+ ///
+ /// Default: Returns an error as token shadow generation is not supported.
+ fn increment_token_shadow_cache_generation(&self) -> Result<usize, Error> {
+ anyhow::bail!("token shadow generation not supported");
+ }
+}
+
pub fn init_access_config(config: &'static dyn AccessControlConfig) -> Result<(), Error> {
ACCESS_CONF
.set(config)
--
2.47.3
* [pbs-devel] [PATCH proxmox v4 4/4] proxmox-access-control: add TTL window to token secret cache
2026-01-21 15:13 14% [pbs-devel] [PATCH proxmox{-backup, , -datacenter-manager} v4 00/11] token-shadow: reduce api token verification overhead Samuel Rufinatscha
` (6 preceding siblings ...)
2026-01-21 15:14 12% ` [pbs-devel] [PATCH proxmox v4 3/4] proxmox-access-control: invalidate token-secret cache on token.shadow changes Samuel Rufinatscha
@ 2026-01-21 15:14 15% ` Samuel Rufinatscha
2026-01-21 15:14 13% ` [pbs-devel] [PATCH proxmox-datacenter-manager v4 1/3] pdm-config: implement token.shadow generation Samuel Rufinatscha
` (2 subsequent siblings)
10 siblings, 0 replies; 200+ results
From: Samuel Rufinatscha @ 2026-01-21 15:14 UTC (permalink / raw)
To: pbs-devel
verify_secret() currently calls refresh_cache_if_file_changed() on every
request, which performs a metadata() call on token.shadow each time.
Under load this adds unnecessary overhead, especially since the file
should rarely change.
This patch introduces a TTL boundary, controlled by
TOKEN_SECRET_CACHE_TTL_SECS: file metadata is only re-checked once the
TTL has expired. It also documents the TTL effects.
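The TTL gate reduces to a small pure predicate. A hedged sketch (the constant is the value from the patch; the helper name `ttl_fresh` is illustrative, the real code inlines the check):

```rust
/// Max age in seconds of the token secret cache before checking for file
/// changes (value taken from the patch).
const TOKEN_SECRET_CACHE_TTL_SECS: i64 = 60;

/// Illustrative helper: the cache counts as fresh only if `last_checked`
/// is not in the future (guards against backwards clock jumps) and the
/// TTL window has not yet elapsed; otherwise token.shadow is stat'ed again.
fn ttl_fresh(now: i64, last_checked: i64) -> bool {
    now >= last_checked && (now - last_checked) < TOKEN_SECRET_CACHE_TTL_SECS
}
```

Note the `now >= last_checked` guard: a timestamp from the future forces a re-check rather than pinning the cache fresh forever.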
Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
---
Changes from v3 to v4:
* Adjusted commit message
Changes from v2 to v3:
* Refactored refresh_cache_if_file_changed TTL logic.
* Remove had_prior_state check (replaced by last_checked logic).
* Improve TTL bound checks.
* Reword documentation warning for clarity.
Changes from v1 to v2:
* Add TOKEN_SECRET_CACHE_TTL_SECS and last_checked.
* Implement double-checked TTL: check with try_read first; only attempt
refresh with try_write if expired/unknown.
* Fix TTL bookkeeping: update last_checked on the “file unchanged” path
and after API mutations.
* Add documentation warning about TTL-delayed effect of manual
token.shadow edits.
proxmox-access-control/src/token_shadow.rs | 30 +++++++++++++++++++++-
1 file changed, 29 insertions(+), 1 deletion(-)
diff --git a/proxmox-access-control/src/token_shadow.rs b/proxmox-access-control/src/token_shadow.rs
index 05813b52..a361fd72 100644
--- a/proxmox-access-control/src/token_shadow.rs
+++ b/proxmox-access-control/src/token_shadow.rs
@@ -28,6 +28,9 @@ static TOKEN_SECRET_CACHE: LazyLock<RwLock<ApiTokenSecretCache>> = LazyLock::new
})
});
+/// Max age in seconds of the token secret cache before checking for file changes.
+const TOKEN_SECRET_CACHE_TTL_SECS: i64 = 60;
+
// Get exclusive lock
fn lock_config() -> Result<ApiLockGuard, Error> {
open_api_lockfile(token_shadow_lock(), None, true)
@@ -55,11 +58,29 @@ fn write_file(data: HashMap<Authid, String>) -> Result<(), Error> {
fn refresh_cache_if_file_changed() -> bool {
let now = epoch_i64();
- // Best-effort refresh under write lock.
+ // Fast path: cache is fresh if shared-gen matches and TTL not expired.
+ if let (Some(cache), Some(shared_gen_read)) =
+ (TOKEN_SECRET_CACHE.try_read(), token_shadow_shared_gen())
+ {
+ if cache.shared_gen == shared_gen_read
+ && cache.shadow.as_ref().is_some_and(|cached| {
+ now >= cached.last_checked
+ && (now - cached.last_checked) < TOKEN_SECRET_CACHE_TTL_SECS
+ })
+ {
+ return true;
+ }
+ // read lock drops here
+ } else {
+ return false;
+ }
+
+ // Slow path: best-effort refresh under write lock.
let Some(mut cache) = TOKEN_SECRET_CACHE.try_write() else {
return false;
};
+ // Re-read generation after acquiring the lock (may have changed meanwhile).
let Some(shared_gen_now) = token_shadow_shared_gen() else {
return false;
};
@@ -69,6 +90,13 @@ fn refresh_cache_if_file_changed() -> bool {
invalidate_cache_state_and_set_gen(&mut cache, shared_gen_now);
}
+ // TTL check again after acquiring the lock
+ if cache.shadow.as_ref().is_some_and(|cached| {
+ now >= cached.last_checked && (now - cached.last_checked) < TOKEN_SECRET_CACHE_TTL_SECS
+ }) {
+ return true;
+ }
+
// Stat the file to detect manual edits.
let Ok((new_mtime, new_len)) = shadow_mtime_len() else {
return false;
--
2.47.3
* [pbs-devel] [PATCH proxmox-backup v4 2/4] pbs-config: cache verified API token secrets
2026-01-21 15:13 14% [pbs-devel] [PATCH proxmox{-backup, , -datacenter-manager} v4 00/11] token-shadow: reduce api token verification overhead Samuel Rufinatscha
2026-01-21 15:13 17% ` [pbs-devel] [PATCH proxmox-backup v4 1/4] pbs-config: add token.shadow generation to ConfigVersionCache Samuel Rufinatscha
@ 2026-01-21 15:13 12% ` Samuel Rufinatscha
2026-01-21 15:13 12% ` [pbs-devel] [PATCH proxmox-backup v4 3/4] pbs-config: invalidate token-secret cache on token.shadow changes Samuel Rufinatscha
` (8 subsequent siblings)
10 siblings, 0 replies; 200+ results
From: Samuel Rufinatscha @ 2026-01-21 15:13 UTC (permalink / raw)
To: pbs-devel
Adds an in-memory cache of successfully verified token secrets.
Subsequent requests for the same token+secret combination only perform a
constant-time comparison using openssl::memcmp::eq and avoid re-running
the password hash. The cache is updated when a token secret is set and
cleared when a token is deleted.
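The fast path relies on constant-time equality; openssl::memcmp::eq provides this but panics if the slice lengths differ, so equal lengths must be ensured first. A hand-rolled, dependency-free sketch of the same idea (the helper name is hypothetical):

```rust
// Hypothetical constant-time secret comparison; the patch itself uses
// openssl::memcmp::eq on the byte slices instead.
fn ct_secret_matches(cached: &str, provided: &str) -> bool {
    let (a, b) = (cached.as_bytes(), provided.as_bytes());
    // Length mismatch is an immediate reject (the length itself may leak,
    // which is acceptable here).
    if a.len() != b.len() {
        return false;
    }
    // Accumulate XOR differences so the runtime does not depend on where
    // the first mismatching byte occurs.
    let mut diff = 0u8;
    for (x, y) in a.iter().zip(b.iter()) {
        diff |= x ^ y;
    }
    diff == 0
}
```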
Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
---
Changes from v3 to v4:
* Add gen param to invalidate_cache_state()
* Validates the generation bump after obtaining write lock in
apply_api_mutation
* Pass lock to apply_api_mutation
* Remove unnecessary gen check cache_try_secret_matches
* Adjusted commit message
Changes from v2 to v3:
* Replaced process-local cache invalidation (AtomicU64
API_MUTATION_GENERATION) with a cross-process shared generation via
ConfigVersionCache.
* Validate shared generation before/after the constant-time secret
compare; only insert into cache if the generation is unchanged.
* invalidate_cache_state() on insert if shared generation changed.
Changes from v1 to v2:
* Replace OnceCell with LazyLock, and std::sync::RwLock with
parking_lot::RwLock.
* Add API_MUTATION_GENERATION and guard cache inserts
to prevent “zombie inserts” across concurrent set/delete.
* Refactor cache operations into cache_try_secret_matches,
cache_try_insert_secret, and centralize write-side behavior in
apply_api_mutation.
* Switch fast-path cache access to try_read/try_write (best-effort).
Cargo.toml | 1 +
pbs-config/Cargo.toml | 1 +
pbs-config/src/token_shadow.rs | 160 ++++++++++++++++++++++++++++++++-
3 files changed, 159 insertions(+), 3 deletions(-)
diff --git a/Cargo.toml b/Cargo.toml
index 0da18383..aed66fe3 100644
--- a/Cargo.toml
+++ b/Cargo.toml
@@ -143,6 +143,7 @@ nom = "7"
num-traits = "0.2"
once_cell = "1.3.1"
openssl = "0.10.40"
+parking_lot = "0.12"
percent-encoding = "2.1"
pin-project-lite = "0.2"
regex = "1.5.5"
diff --git a/pbs-config/Cargo.toml b/pbs-config/Cargo.toml
index 74afb3c6..eb81ce00 100644
--- a/pbs-config/Cargo.toml
+++ b/pbs-config/Cargo.toml
@@ -13,6 +13,7 @@ libc.workspace = true
nix.workspace = true
once_cell.workspace = true
openssl.workspace = true
+parking_lot.workspace = true
regex.workspace = true
serde.workspace = true
serde_json.workspace = true
diff --git a/pbs-config/src/token_shadow.rs b/pbs-config/src/token_shadow.rs
index 640fabbf..d5aa5de2 100644
--- a/pbs-config/src/token_shadow.rs
+++ b/pbs-config/src/token_shadow.rs
@@ -1,6 +1,8 @@
use std::collections::HashMap;
+use std::sync::LazyLock;
use anyhow::{bail, format_err, Error};
+use parking_lot::RwLock;
use serde::{Deserialize, Serialize};
use serde_json::{from_value, Value};
@@ -13,6 +15,18 @@ use crate::{open_backup_lockfile, BackupLockGuard};
const LOCK_FILE: &str = pbs_buildcfg::configdir!("/token.shadow.lock");
const CONF_FILE: &str = pbs_buildcfg::configdir!("/token.shadow");
+/// Global in-memory cache for successfully verified API token secrets.
+/// The cache stores plain text secrets for token Authids that have already been
+/// verified against the hashed values in `token.shadow`. This allows for cheap
+/// subsequent authentications for the same token+secret combination, avoiding
+/// recomputing the password hash on every request.
+static TOKEN_SECRET_CACHE: LazyLock<RwLock<ApiTokenSecretCache>> = LazyLock::new(|| {
+ RwLock::new(ApiTokenSecretCache {
+ secrets: HashMap::new(),
+ shared_gen: 0,
+ })
+});
+
#[derive(Serialize, Deserialize)]
#[serde(rename_all = "kebab-case")]
/// ApiToken id / secret pair
@@ -54,9 +68,27 @@ pub fn verify_secret(tokenid: &Authid, secret: &str) -> Result<(), Error> {
bail!("not an API token ID");
}
+ // Fast path
+ if cache_try_secret_matches(tokenid, secret) {
+ return Ok(());
+ }
+
+ // Slow path
+ // First, capture the shared generation before doing the hash verification.
+ let gen_before = token_shadow_shared_gen();
+
let data = read_file()?;
match data.get(tokenid) {
- Some(hashed_secret) => proxmox_sys::crypt::verify_crypt_pw(secret, hashed_secret),
+ Some(hashed_secret) => {
+ proxmox_sys::crypt::verify_crypt_pw(secret, hashed_secret)?;
+
+ // Try to cache only if nothing changed while verifying the secret.
+ if let Some(gen) = gen_before {
+ cache_try_insert_secret(tokenid.clone(), secret.to_owned(), gen);
+ }
+
+ Ok(())
+ }
None => bail!("invalid API token"),
}
}
@@ -75,13 +107,15 @@ fn set_secret(tokenid: &Authid, secret: &str) -> Result<(), Error> {
bail!("not an API token ID");
}
- let _guard = lock_config()?;
+ let guard = lock_config()?;
let mut data = read_file()?;
let hashed_secret = proxmox_sys::crypt::encrypt_pw(secret)?;
data.insert(tokenid.clone(), hashed_secret);
write_file(data)?;
+ apply_api_mutation(guard, tokenid, Some(secret));
+
Ok(())
}
@@ -91,11 +125,131 @@ pub fn delete_secret(tokenid: &Authid) -> Result<(), Error> {
bail!("not an API token ID");
}
- let _guard = lock_config()?;
+ let guard = lock_config()?;
let mut data = read_file()?;
data.remove(tokenid);
write_file(data)?;
+ apply_api_mutation(guard, tokenid, None);
+
Ok(())
}
+
+struct ApiTokenSecretCache {
+ /// Keys are token Authids, values are the corresponding plain text secrets.
+ /// Entries are added after a successful on-disk verification in
+ /// `verify_secret` or when a new token secret is generated by
+ /// `generate_and_set_secret`. Used to avoid repeated
+ /// password-hash computation on subsequent authentications.
+ secrets: HashMap<Authid, CachedSecret>,
+ /// Shared generation to detect mutations of the underlying token.shadow file.
+ shared_gen: usize,
+}
+
+/// Cached secret.
+struct CachedSecret {
+ secret: String,
+}
+
+fn cache_try_insert_secret(tokenid: Authid, secret: String, shared_gen_before: usize) {
+ let Some(mut cache) = TOKEN_SECRET_CACHE.try_write() else {
+ return;
+ };
+
+ let Some(shared_gen_now) = token_shadow_shared_gen() else {
+ return;
+ };
+
+ // If this process missed a generation bump, its cache is stale.
+ if cache.shared_gen != shared_gen_now {
+ invalidate_cache_state_and_set_gen(&mut cache, shared_gen_now);
+ }
+
+ // If a mutation happened while we were verifying the secret, do not insert.
+ if shared_gen_now == shared_gen_before {
+ cache.secrets.insert(tokenid, CachedSecret { secret });
+ }
+}
+
+/// Tries to match the given token secret against the cached secret.
+///
+/// Verifies the generation/version before doing the constant-time
+/// comparison to reduce TOCTOU risk. During token rotation or deletion
+/// tokens for in-flight requests may still validate against the previous
+/// generation.
+fn cache_try_secret_matches(tokenid: &Authid, secret: &str) -> bool {
+ let Some(cache) = TOKEN_SECRET_CACHE.try_read() else {
+ return false;
+ };
+ let Some(entry) = cache.secrets.get(tokenid) else {
+ return false;
+ };
+ let Some(current_gen) = token_shadow_shared_gen() else {
+ return false;
+ };
+
+ if current_gen == cache.shared_gen {
+ return openssl::memcmp::eq(entry.secret.as_bytes(), secret.as_bytes());
+ }
+
+ false
+}
+
+fn apply_api_mutation(_guard: BackupLockGuard, tokenid: &Authid, new_secret: Option<&str>) {
+ // Signal cache invalidation to other processes (best-effort).
+ let bumped_gen = bump_token_shadow_shared_gen();
+
+ let mut cache = TOKEN_SECRET_CACHE.write();
+
+ // If we cannot get the current generation, we cannot trust the cache
+ let Some(current_gen) = token_shadow_shared_gen() else {
+ invalidate_cache_state_and_set_gen(&mut cache, 0);
+ return;
+ };
+
+ // If we cannot bump the shared generation, or if it changed after
+ // obtaining the cache write lock, we cannot trust the cache
+ if bumped_gen != Some(current_gen) {
+ invalidate_cache_state_and_set_gen(&mut cache, current_gen);
+ return;
+ }
+
+ // Update to the post-mutation generation.
+ cache.shared_gen = current_gen;
+
+ // Apply the new mutation.
+ match new_secret {
+ Some(secret) => {
+ cache.secrets.insert(
+ tokenid.clone(),
+ CachedSecret {
+ secret: secret.to_owned(),
+ },
+ );
+ }
+ None => {
+ cache.secrets.remove(tokenid);
+ }
+ }
+}
+
+/// Get the current shared generation.
+fn token_shadow_shared_gen() -> Option<usize> {
+ crate::ConfigVersionCache::new()
+ .ok()
+ .map(|cvc| cvc.token_shadow_generation())
+}
+
+/// Bump and return the new shared generation.
+fn bump_token_shadow_shared_gen() -> Option<usize> {
+ crate::ConfigVersionCache::new()
+ .ok()
+ .map(|cvc| cvc.increase_token_shadow_generation() + 1)
+}
+
+/// Invalidates local cache contents and sets/updates the cached generation.
+fn invalidate_cache_state_and_set_gen(cache: &mut ApiTokenSecretCache, gen: usize) {
+ cache.secrets.clear();
+ cache.shared_gen = gen;
+}
--
2.47.3
* [pbs-devel] [PATCH proxmox-datacenter-manager v4 2/3] docs: document API token-cache TTL effects
2026-01-21 15:13 14% [pbs-devel] [PATCH proxmox{-backup, , -datacenter-manager} v4 00/11] token-shadow: reduce api token verification overhead Samuel Rufinatscha
` (8 preceding siblings ...)
2026-01-21 15:14 13% ` [pbs-devel] [PATCH proxmox-datacenter-manager v4 1/3] pdm-config: implement token.shadow generation Samuel Rufinatscha
@ 2026-01-21 15:14 17% ` Samuel Rufinatscha
2026-01-21 15:14 16% ` [pbs-devel] [PATCH proxmox-datacenter-manager v4 3/3] pdm-config: wire user+acl cache generation Samuel Rufinatscha
10 siblings, 0 replies; 200+ results
From: Samuel Rufinatscha @ 2026-01-21 15:14 UTC (permalink / raw)
To: pbs-devel
Documents the effects of the added API token-cache in the
proxmox-access-control crate.
Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
---
Changes from v3 to v4:
* Adjusted commit message
docs/access-control.rst | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/docs/access-control.rst b/docs/access-control.rst
index adf26cd..18e57a2 100644
--- a/docs/access-control.rst
+++ b/docs/access-control.rst
@@ -47,6 +47,10 @@ place of the user ID (``user@realm``) and the user password, respectively.
The API token is passed from the client to the server by setting the ``Authorization`` HTTP header
with method ``PDMAPIToken`` to the value ``TOKENID:TOKENSECRET``.
+.. WARNING:: Direct/manual edits to ``token.shadow`` may take up to 60 seconds (or
+ longer in edge cases) to take effect due to caching. Restart services for
+ immediate effect of manual edits.
+
.. _access_control:
Access Control
--
2.47.3
* [pbs-devel] [PATCH proxmox-backup v4 4/4] pbs-config: add TTL window to token secret cache
2026-01-21 15:13 14% [pbs-devel] [PATCH proxmox{-backup, , -datacenter-manager} v4 00/11] token-shadow: reduce api token verification overhead Samuel Rufinatscha
` (2 preceding siblings ...)
2026-01-21 15:13 12% ` [pbs-devel] [PATCH proxmox-backup v4 3/4] pbs-config: invalidate token-secret cache on token.shadow changes Samuel Rufinatscha
@ 2026-01-21 15:14 15% ` Samuel Rufinatscha
2026-01-21 15:14 13% ` [pbs-devel] [PATCH proxmox v4 1/4] proxmox-access-control: split AccessControlConfig and add token.shadow gen Samuel Rufinatscha
` (6 subsequent siblings)
10 siblings, 0 replies; 200+ results
From: Samuel Rufinatscha @ 2026-01-21 15:14 UTC (permalink / raw)
To: pbs-devel
verify_secret() currently calls refresh_cache_if_file_changed() on every
request, which performs a metadata() call on token.shadow each time.
Under load this adds unnecessary overhead, especially since the file
should rarely change.
This patch introduces a TTL boundary, controlled by
TOKEN_SECRET_CACHE_TTL_SECS: file metadata is only re-checked once the
TTL has expired. It also documents the TTL effects.
Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
---
Changes from v3 to v4:
* Adjusted commit message
Changes from v2 to v3:
* Refactored refresh_cache_if_file_changed TTL logic.
* Remove had_prior_state check (replaced by last_checked logic).
* Improve TTL bound checks.
* Reword documentation warning for clarity.
Changes from v1 to v2:
* Add TOKEN_SECRET_CACHE_TTL_SECS and last_checked.
* Implement double-checked TTL: check with try_read first; only attempt
refresh with try_write if expired/unknown.
* Fix TTL bookkeeping: update last_checked on the “file unchanged” path
and after API mutations.
* Add documentation warning about TTL-delayed effect of manual
token.shadow edits.
docs/user-management.rst | 4 ++++
pbs-config/src/token_shadow.rs | 29 ++++++++++++++++++++++++++++-
2 files changed, 32 insertions(+), 1 deletion(-)
diff --git a/docs/user-management.rst b/docs/user-management.rst
index 41b43d60..8dfae528 100644
--- a/docs/user-management.rst
+++ b/docs/user-management.rst
@@ -156,6 +156,10 @@ metadata:
Similarly, the ``user delete-token`` subcommand can be used to delete a token
again.
+.. WARNING:: Direct/manual edits to ``token.shadow`` may take up to 60 seconds (or
+ longer in edge cases) to take effect due to caching. Restart services for
+ immediate effect of manual edits.
+
Newly generated API tokens don't have any permissions. Please read the next
section to learn how to set access permissions.
diff --git a/pbs-config/src/token_shadow.rs b/pbs-config/src/token_shadow.rs
index a5bd1525..24633f6e 100644
--- a/pbs-config/src/token_shadow.rs
+++ b/pbs-config/src/token_shadow.rs
@@ -31,6 +31,8 @@ static TOKEN_SECRET_CACHE: LazyLock<RwLock<ApiTokenSecretCache>> = LazyLock::new
shadow: None,
})
});
+/// Max age in seconds of the token secret cache before checking for file changes.
+const TOKEN_SECRET_CACHE_TTL_SECS: i64 = 60;
#[derive(Serialize, Deserialize)]
#[serde(rename_all = "kebab-case")]
@@ -72,11 +74,29 @@ fn write_file(data: HashMap<Authid, String>) -> Result<(), Error> {
fn refresh_cache_if_file_changed() -> bool {
let now = epoch_i64();
- // Best-effort refresh under write lock.
+ // Fast path: cache is fresh if shared-gen matches and TTL not expired.
+ if let (Some(cache), Some(shared_gen_read)) =
+ (TOKEN_SECRET_CACHE.try_read(), token_shadow_shared_gen())
+ {
+ if cache.shared_gen == shared_gen_read
+ && cache.shadow.as_ref().is_some_and(|cached| {
+ now >= cached.last_checked
+ && (now - cached.last_checked) < TOKEN_SECRET_CACHE_TTL_SECS
+ })
+ {
+ return true;
+ }
+ // read lock drops here
+ } else {
+ return false;
+ }
+
+ // Slow path: best-effort refresh under write lock.
let Some(mut cache) = TOKEN_SECRET_CACHE.try_write() else {
return false;
};
+ // Re-read generation after acquiring the lock (may have changed meanwhile).
let Some(shared_gen_now) = token_shadow_shared_gen() else {
return false;
};
@@ -86,6 +106,13 @@ fn refresh_cache_if_file_changed() -> bool {
invalidate_cache_state_and_set_gen(&mut cache, shared_gen_now);
}
+ // TTL check again after acquiring the lock
+ if cache.shadow.as_ref().is_some_and(|cached| {
+ now >= cached.last_checked && (now - cached.last_checked) < TOKEN_SECRET_CACHE_TTL_SECS
+ }) {
+ return true;
+ }
+
// Stat the file to detect manual edits.
let Ok((new_mtime, new_len)) = shadow_mtime_len() else {
return false;
--
2.47.3
* [pbs-devel] [PATCH proxmox{-backup, , -datacenter-manager} v4 00/11] token-shadow: reduce api token verification overhead
@ 2026-01-21 15:13 14% Samuel Rufinatscha
2026-01-21 15:13 17% ` [pbs-devel] [PATCH proxmox-backup v4 1/4] pbs-config: add token.shadow generation to ConfigVersionCache Samuel Rufinatscha
` (10 more replies)
0 siblings, 11 replies; 200+ results
From: Samuel Rufinatscha @ 2026-01-21 15:13 UTC (permalink / raw)
To: pbs-devel
Hi,
this series improves the performance of token-based API authentication
in PBS (pbs-config) and in PDM (underlying proxmox-access-control
crate), addressing the API token verification hotspot reported in our
bugtracker #7017 [1].
When profiling the PBS /status endpoint with cargo flamegraph [2],
token-based authentication showed up as a dominant hotspot via
proxmox_sys::crypt::verify_crypt_pw. Applying this series removes that
path from the hot section of the flamegraph. The same performance issue
was measured [2] for PDM, which uses the shared proxmox-access-control
library for token handling, a factored-out version of the token.shadow
handling code from PBS.
While this series fixes the immediate performance issue both in PBS
(pbs-config) and in the shared proxmox-access-control crate used by
PDM, PBS should ideally be refactored in a separate effort to use
proxmox-access-control for token handling instead of its local
implementation.
Approach
The goal is to reduce the cost of token-based authentication while
preserving the existing token handling semantics (including detecting
manual edits to token.shadow) and staying consistent between PBS
(pbs-config) and PDM (proxmox-access-control). For both codebases, this
series proposes to:
1. Introduce an in-memory cache for verified token secrets and
invalidate it through a shared ConfigVersionCache generation. Note that
a shared generation is required to keep the privileged and unprivileged
daemons in sync and avoid caching inconsistencies across processes.
2. Invalidate on token.shadow API changes (set_secret,
delete_secret)
3. Invalidate on direct/manual token.shadow file changes (mtime +
length)
4. Avoid per-request file stat calls using a TTL window
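A minimal sketch of steps 1-3, with a process-local AtomicUsize standing in for the cross-process ConfigVersionCache generation and hypothetical names throughout (the lookup here also uses plain `==`, whereas the series uses a constant-time compare):

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::RwLock;

// Stand-in for the shared generation; in the series this counter lives in
// a shared memory mapping so privileged and unprivileged daemons see it.
static SHARED_GEN: AtomicUsize = AtomicUsize::new(0);

struct SecretCache {
    /// token id -> previously verified plain-text secret
    secrets: HashMap<String, String>,
    /// generation the cached entries were verified under
    seen_gen: usize,
}

/// Fast path: an entry is only trusted while no writer bumped the generation.
fn cache_lookup(cache: &RwLock<SecretCache>, tokenid: &str, secret: &str) -> bool {
    let c = cache.read().unwrap();
    c.seen_gen == SHARED_GEN.load(Ordering::Acquire)
        && c.secrets.get(tokenid).is_some_and(|s| s.as_str() == secret)
}

/// Mutation path (set/delete or a detected file change): bump the shared
/// generation first, then drop all locally cached entries.
fn invalidate(cache: &RwLock<SecretCache>) {
    let new_gen = SHARED_GEN.fetch_add(1, Ordering::AcqRel) + 1;
    let mut c = cache.write().unwrap();
    c.secrets.clear();
    c.seen_gen = new_gen;
}
```

Any process that bumps the generation implicitly invalidates every other process's cached entries on their next lookup, which is what keeps the daemons consistent without shared cache contents.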
Testing
To verify the effect in PBS (pbs-config changes), I:
1. Set up a test environment based on the latest PBS ISO, installed the
Rust toolchain and cloned the proxmox-backup repository for use with
cargo flamegraph. Reproduced bug #7017 [1] by profiling the /status
endpoint with token-based authentication using cargo flamegraph [2].
2. Built PBS with the pbs-config patches and re-ran the same workload
and profiling setup. Confirmed that the
proxmox_sys::crypt::verify_crypt_pw path no longer appears in the
hot section of the flamegraph; CPU usage is now dominated by TLS
overhead.
3. Functionality-wise, I verified that:
* valid tokens authenticate correctly when used in API requests
* invalid secrets are rejected as before
* generating a new token secret via dashboard (create token for
user, regenerate existing secret) works and authenticates correctly
To verify the effect in PDM (proxmox-access-control changes), I
profiled the /version endpoint (instead of PBS' /status) with cargo
flamegraph [2] and verified that the expensive hashing path disappears
from the hot section after introducing caching. Functionality-wise, I
verified that:
* valid tokens authenticate correctly when used in API requests
* invalid secrets are rejected as before
* generating a new token secret via dashboard (create token for user,
regenerate existing secret) works and authenticates correctly
Benchmarks
Two different benchmarks have been run to measure caching effects
and RwLock contention:
(1) Requests per second for PBS /status endpoint (E2E)
Benchmarked parallel token auth requests for
/status?verbose=0 on top of the datastore lookup cache series [3]
to check throughput impact. With datastores=1, repeat=5000, parallel=16
this series gives ~172 req/s compared to ~65 req/s without it.
This is a ~2.6x improvement (and aligns with the ~179 req/s from the
previous series, which used per-process cache invalidation).
(2) RwLock contention for token create/delete under heavy load of
token-authenticated requests
The previous version of the series compared std::sync::RwLock and
parking_lot::RwLock contention for token create/delete under heavy
parallel token-authenticated readers. parking_lot::RwLock was chosen
for its added fairness guarantees.
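The best-effort, non-blocking refresh pattern used throughout the
series can be sketched with the standard library's RwLock (the series
itself uses parking_lot::RwLock for its fairness guarantees; the names
here are illustrative, not the actual implementation):

```rust
use std::collections::HashMap;
use std::sync::RwLock;

/// Best-effort cache refresh: if the lock is contended, skip the fast
/// path instead of blocking the request. std::sync::RwLock is used
/// here only to keep the sketch dependency-free.
fn try_refresh(cache: &RwLock<HashMap<String, String>>) -> bool {
    let Ok(mut guard) = cache.try_write() else {
        return false; // lock held elsewhere: caller falls back to slow path
    };
    guard.insert("refreshed".to_string(), "yes".to_string());
    true
}
```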
Patch summary
pbs-config:
0001 – pbs-config: add token.shadow generation to ConfigVersionCache
0002 – pbs-config: cache verified API token secrets
0003 – pbs-config: invalidate token-secret cache on token.shadow
changes
0004 – pbs-config: add TTL window to token-secret cache
proxmox-access-control:
0005 – access-control: extend AccessControlConfig for token.shadow invalidation
0006 – access-control: cache verified API token secrets
0007 – access-control: invalidate token-secret cache on token.shadow changes
0008 – access-control: add TTL window to token-secret cache
proxmox-datacenter-manager:
0009 – pdm-config: add token.shadow generation to ConfigVersionCache
0010 – docs: document API token-cache TTL effects
0011 – pdm-config: wire user+acl cache generation
Maintainer notes
* proxmox-access-control trait split: permissions now live in
AccessControlPermissions, and AccessControlConfig now requires
fn permissions(&self) -> &dyn AccessControlPermissions ->
version bump
* Renames ConfigVersionCache's pub user_cache_generation and
increase_user_cache_generation -> version bump
* Adds parking_lot::RwLock dependency in PBS and proxmox-access-control
Kind regards,
Samuel Rufinatscha
[1] https://bugzilla.proxmox.com/show_bug.cgi?id=7017
[2] attachment 1767 [1]: Flamegraph showing the proxmox_sys::crypt::verify_crypt_pw stack
[3] https://bugzilla.proxmox.com/show_bug.cgi?id=6049
proxmox-backup:
Samuel Rufinatscha (4):
pbs-config: add token.shadow generation to ConfigVersionCache
pbs-config: cache verified API token secrets
pbs-config: invalidate token-secret cache on token.shadow changes
pbs-config: add TTL window to token secret cache
Cargo.toml | 1 +
docs/user-management.rst | 4 +
pbs-config/Cargo.toml | 1 +
pbs-config/src/config_version_cache.rs | 18 ++
pbs-config/src/token_shadow.rs | 302 ++++++++++++++++++++++++-
5 files changed, 323 insertions(+), 3 deletions(-)
proxmox:
Samuel Rufinatscha (4):
proxmox-access-control: split AccessControlConfig and add token.shadow
gen
proxmox-access-control: cache verified API token secrets
proxmox-access-control: invalidate token-secret cache on token.shadow
changes
proxmox-access-control: add TTL window to token secret cache
Cargo.toml | 1 +
proxmox-access-control/Cargo.toml | 1 +
proxmox-access-control/src/acl.rs | 10 +-
proxmox-access-control/src/init.rs | 113 ++++++--
proxmox-access-control/src/token_shadow.rs | 303 ++++++++++++++++++++-
5 files changed, 401 insertions(+), 27 deletions(-)
proxmox-datacenter-manager:
Samuel Rufinatscha (3):
pdm-config: implement token.shadow generation
docs: document API token-cache TTL effects
pdm-config: wire user+acl cache generation
cli/admin/src/main.rs | 2 +-
docs/access-control.rst | 4 +++
lib/pdm-api-types/src/acl.rs | 4 +--
lib/pdm-config/Cargo.toml | 1 +
lib/pdm-config/src/access_control.rs | 31 ++++++++++++++++++++
lib/pdm-config/src/config_version_cache.rs | 34 +++++++++++++++++-----
lib/pdm-config/src/lib.rs | 2 ++
server/src/acl.rs | 3 +-
ui/src/main.rs | 10 ++++++-
9 files changed, 77 insertions(+), 14 deletions(-)
create mode 100644 lib/pdm-config/src/access_control.rs
Summary over all repositories:
19 files changed, 801 insertions(+), 44 deletions(-)
--
Generated by git-murpp 0.8.1
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
^ permalink raw reply [relevance 14%]
* Re: [pbs-devel] [PATCH proxmox-backup v3 3/4] pbs-config: invalidate token-secret cache on token.shadow changes
2026-01-14 10:44 5% ` Fabian Grünbichler
@ 2026-01-20 9:21 6% ` Samuel Rufinatscha
0 siblings, 0 replies; 200+ results
From: Samuel Rufinatscha @ 2026-01-20 9:21 UTC (permalink / raw)
To: Proxmox Backup Server development discussion, Fabian Grünbichler
On 1/14/26 11:44 AM, Fabian Grünbichler wrote:
> On January 2, 2026 5:07 pm, Samuel Rufinatscha wrote:
>> Previously the in-memory token-secret cache was only updated via
>> set_secret() and delete_secret(), so manual edits to token.shadow were
>> not reflected.
>>
>> This patch adds file change detection to the cache. It tracks the mtime
>> and length of token.shadow and clears the in-memory token secret cache
>> whenever these values change.
>>
>> Note, this patch fetches file stats on every request. A TTL-based
>> optimization will be covered in a subsequent patch of the series.
>>
>> This patch is part of the series which fixes bug #7017 [1].
>>
>> [1] https://bugzilla.proxmox.com/show_bug.cgi?id=7017
>>
>> Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
>> ---
>> Changes from v1 to v2:
>>
>> * Add file metadata tracking (file_mtime, file_len) and
>> FILE_GENERATION.
>> * Store file_gen in CachedSecret and verify it against the current
>> FILE_GENERATION to ensure cached entries belong to the current file
>> state.
>> * Add shadow_mtime_len() helper and convert refresh to best-effort
>> (try_write, returns bool).
>> * Pass a pre-write metadata snapshot into apply_api_mutation and
>> clear/bump generation if the cache metadata indicates missed external
>> edits.
>>
>> Changes from v2 to v3:
>>
>> * Cache now tracks last_checked (epoch seconds).
>> * Simplified refresh_cache_if_file_changed, removed
>> FILE_GENERATION logic
>> * On first load, initializes file metadata and keeps empty cache.
>>
>> pbs-config/src/token_shadow.rs | 122 +++++++++++++++++++++++++++++++--
>> 1 file changed, 118 insertions(+), 4 deletions(-)
>>
>> diff --git a/pbs-config/src/token_shadow.rs b/pbs-config/src/token_shadow.rs
>> index fa84aee5..02fb191b 100644
>> --- a/pbs-config/src/token_shadow.rs
>> +++ b/pbs-config/src/token_shadow.rs
>> @@ -1,5 +1,8 @@
>> use std::collections::HashMap;
>> +use std::fs;
>> +use std::io::ErrorKind;
>> use std::sync::LazyLock;
>> +use std::time::SystemTime;
>>
>> use anyhow::{bail, format_err, Error};
>> use parking_lot::RwLock;
>> @@ -7,6 +10,7 @@ use serde::{Deserialize, Serialize};
>> use serde_json::{from_value, Value};
>>
>> use proxmox_sys::fs::CreateOptions;
>> +use proxmox_time::epoch_i64;
>>
>> use pbs_api_types::Authid;
>> //use crate::auth;
>> @@ -24,6 +28,9 @@ static TOKEN_SECRET_CACHE: LazyLock<RwLock<ApiTokenSecretCache>> = LazyLock::new
>> RwLock::new(ApiTokenSecretCache {
>> secrets: HashMap::new(),
>> shared_gen: 0,
>> + file_mtime: None,
>> + file_len: None,
>> + last_checked: None,
>> })
>> });
>>
>> @@ -62,6 +69,63 @@ fn write_file(data: HashMap<Authid, String>) -> Result<(), Error> {
>> proxmox_sys::fs::replace_file(CONF_FILE, &json, options, true)
>> }
>>
>> +/// Refreshes the in-memory cache if the on-disk token.shadow file changed.
>> +/// Returns true if the cache is valid to use, false if not.
>> +fn refresh_cache_if_file_changed() -> bool {
>> + let now = epoch_i64();
>> +
>> + // Best-effort refresh under write lock.
>> + let Some(mut cache) = TOKEN_SECRET_CACHE.try_write() else {
>> + return false;
>> + };
>> +
>> + let Some(shared_gen_now) = token_shadow_shared_gen() else {
>> + return false;
>> + };
>> +
>> + // If another process bumped the generation, we don't know what changed -> clear cache
>> + if cache.shared_gen != shared_gen_now {
>> + invalidate_cache_state(&mut cache);
>> + cache.shared_gen = shared_gen_now;
>> + }
>> +
>> + // Stat the file to detect manual edits.
>> + let Ok((new_mtime, new_len)) = shadow_mtime_len() else {
>> + return false;
>> + };
>> +
>> + // Initialize file stats if we have no prior state.
>> + if cache.last_checked.is_none() {
>> + cache.secrets.clear(); // ensure cache is empty on first load
>> + cache.file_mtime = new_mtime;
>> + cache.file_len = new_len;
>> + cache.last_checked = Some(now);
>> + return true;
>
> this code here
>
>> + }
>> +
>> + // No change detected.
>> + if cache.file_mtime == new_mtime && cache.file_len == new_len {
>> + cache.last_checked = Some(now);
>> + return true;
>> + }
>> +
>> + // Manual edit detected -> invalidate cache and update stat.
>> + cache.secrets.clear();
>> + cache.file_mtime = new_mtime;
>> + cache.file_len = new_len;
>> + cache.last_checked = Some(now);
>
> and this code here are identical. if this is the first invocation, then
> the change detection check above cannot be true (the cached mtime and
> len will be None).
>
> so we can drop the first if above, and replace the last line in this
> hunk with
>
> let prev_last_checked = cache.last_checked.replace(now);
>
> and then skip bumping the generation if this is_none()
Great idea about the .replace()! Integrating it with the new
ShadowFileInfo :)
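For illustration, the `.replace()` idea can be sketched as follows (a
hypothetical helper, not the actual series code; note that
Option::replace takes the inner value directly):

```rust
/// Record the new check time and use the *previous* value to decide
/// whether this was the first load (in which case no generation bump
/// is needed). `last_checked` holds epoch seconds; names illustrative.
fn record_check(last_checked: &mut Option<i64>, now: i64) -> bool {
    let prev = last_checked.replace(now);
    prev.is_none() // true on the first invocation
}
```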
>
> OTOH, if we just cleared the cache here, does it make sense to return
> true? the cache is empty, so likely querying it *now* makes no sense?
Agree, we should just return false here
>
>> +
>> + // Best-effort propagation to other processes + update local view.
>> + if let Some(shared_gen_new) = bump_token_shadow_shared_gen() {
>> + cache.shared_gen = shared_gen_new;
>> + } else {
>> + // Do not fail: local cache is already safe as we cleared it above.
>> + // Keep local shared_gen as-is to avoid repeated failed attempts.
>> + }
>> +
>> + true
>> +}
>> +
>> /// Verifies that an entry for given tokenid / API token secret exists
>> pub fn verify_secret(tokenid: &Authid, secret: &str) -> Result<(), Error> {
>> if !tokenid.is_token() {
>> @@ -69,7 +133,7 @@ pub fn verify_secret(tokenid: &Authid, secret: &str) -> Result<(), Error> {
>> }
>>
>> // Fast path
>> - if cache_try_secret_matches(tokenid, secret) {
>> + if refresh_cache_if_file_changed() && cache_try_secret_matches(tokenid, secret) {
>> return Ok(());
>> }
>>
>> @@ -109,12 +173,15 @@ fn set_secret(tokenid: &Authid, secret: &str) -> Result<(), Error> {
>>
>> let _guard = lock_config()?;
>>
>> + // Capture state before we write to detect external edits.
>> + let pre_meta = shadow_mtime_len().unwrap_or((None, None));
>> +
>> let mut data = read_file()?;
>> let hashed_secret = proxmox_sys::crypt::encrypt_pw(secret)?;
>> data.insert(tokenid.clone(), hashed_secret);
>> write_file(data)?;
>>
>> - apply_api_mutation(tokenid, Some(secret));
>> + apply_api_mutation(tokenid, Some(secret), pre_meta);
>>
>> Ok(())
>> }
>> @@ -127,11 +194,14 @@ pub fn delete_secret(tokenid: &Authid) -> Result<(), Error> {
>>
>> let _guard = lock_config()?;
>>
>> + // Capture state before we write to detect external edits.
>> + let pre_meta = shadow_mtime_len().unwrap_or((None, None));
>> +
>> let mut data = read_file()?;
>> data.remove(tokenid);
>> write_file(data)?;
>>
>> - apply_api_mutation(tokenid, None);
>> + apply_api_mutation(tokenid, None, pre_meta);
>>
>> Ok(())
>> }
>> @@ -145,6 +215,12 @@ struct ApiTokenSecretCache {
>> secrets: HashMap<Authid, CachedSecret>,
>> /// Shared generation to detect mutations of the underlying token.shadow file.
>> shared_gen: usize,
>> + // shadow file mtime to detect changes
>> + file_mtime: Option<SystemTime>,
>> + // shadow file length to detect changes
>> + file_len: Option<u64>,
>> + // last time the file metadata was checked
>> + last_checked: Option<i64>,
>
> these three are always set together, so wouldn't it make more sense to
> make them an Option<ShadowFileInfo> ?
>
>> }
>>
>> /// Cached secret.
>> @@ -204,7 +280,13 @@ fn cache_try_secret_matches(tokenid: &Authid, secret: &str) -> bool {
>> eq && gen2 == cache_gen
>> }
>>
>> -fn apply_api_mutation(tokenid: &Authid, new_secret: Option<&str>) {
>> +fn apply_api_mutation(
>> + tokenid: &Authid,
>> + new_secret: Option<&str>,
>> + pre_write_meta: (Option<SystemTime>, Option<u64>),
>> +) {
>> + let now = epoch_i64();
>> +
>> // Signal cache invalidation to other processes (best-effort).
>> let new_shared_gen = bump_token_shadow_shared_gen();
>>
>> @@ -220,6 +302,13 @@ fn apply_api_mutation(tokenid: &Authid, new_secret: Option<&str>) {
>> // Update to the post-mutation generation.
>> cache.shared_gen = gen;
>>
>> + // If our cached file metadata does not match the on-disk state before our write,
>> + // we likely missed an external/manual edit. We can no longer trust any cached secrets.
>> + let (pre_mtime, pre_len) = pre_write_meta;
>> + if cache.file_mtime != pre_mtime || cache.file_len != pre_len {
>> + cache.secrets.clear();
>> + }
>> +
>> // Apply the new mutation.
>> match new_secret {
>> Some(secret) => {
>> @@ -234,6 +323,20 @@ fn apply_api_mutation(tokenid: &Authid, new_secret: Option<&str>) {
>> cache.secrets.remove(tokenid);
>> }
>> }
>> +
>> + // Update our view of the file metadata to the post-write state (best-effort).
>> + // (If this fails, drop local cache so callers fall back to slow path until refreshed.)
>> + match shadow_mtime_len() {
>> + Ok((mtime, len)) => {
>> + cache.file_mtime = mtime;
>> + cache.file_len = len;
>> + cache.last_checked = Some(now);
>> + }
>> + Err(_) => {
>> + // If we cannot validate state, do not trust cache.
>> + invalidate_cache_state(&mut cache);
>> + }
>> + }
>> }
>>
>> /// Get the current shared generation.
>> @@ -253,4 +356,15 @@ fn bump_token_shadow_shared_gen() -> Option<usize> {
>> /// Invalidates the cache state and only keeps the shared generation.
>> fn invalidate_cache_state(cache: &mut ApiTokenSecretCache) {
>> cache.secrets.clear();
>> + cache.file_mtime = None;
>> + cache.file_len = None;
>> + cache.last_checked = None;
>> +}
>> +
>> +fn shadow_mtime_len() -> Result<(Option<SystemTime>, Option<u64>), Error> {
>> + match fs::metadata(CONF_FILE) {
>> + Ok(meta) => Ok((meta.modified().ok(), Some(meta.len()))),
>> + Err(e) if e.kind() == ErrorKind::NotFound => Ok((None, None)),
>> + Err(e) => Err(e.into()),
>> + }
>> }
>> --
>> 2.47.3
>>
>>
>>
^ permalink raw reply [relevance 6%]
* Re: [pbs-devel] [PATCH proxmox-datacenter-manager v3 1/2] pdm-config: implement token.shadow generation
2026-01-16 16:48 5% ` Shannon Sterz
@ 2026-01-19 7:56 6% ` Samuel Rufinatscha
0 siblings, 0 replies; 200+ results
From: Samuel Rufinatscha @ 2026-01-19 7:56 UTC (permalink / raw)
To: Shannon Sterz; +Cc: Proxmox Backup Server development discussion
Comments inline.
On 1/16/26 5:47 PM, Shannon Sterz wrote:
> On Fri Jan 16, 2026 at 5:28 PM CET, Samuel Rufinatscha wrote:
>> On 1/14/26 11:44 AM, Fabian Grünbichler wrote:
>>> On January 2, 2026 5:07 pm, Samuel Rufinatscha wrote:
>>>> PDM depends on the shared proxmox/proxmox-access-control crate for
>>>> token.shadow handling, which expects the product to provide a
>>>> cross-process invalidation signal so it can safely cache verified API
>>>> token secrets and invalidate them when token.shadow is changed.
>>>>
>>>> This patch
>>>>
>>>> * adds a token_shadow_generation to PDM’s shared-memory
>>>> ConfigVersionCache
>>>> * implements proxmox_access_control::init::AccessControlConfig
>>>> for pdm_config::AccessControlConfig, which
>>>> - delegates roles/privs/path checks to the existing
>>>> pdm_api_types::AccessControlConfig implementation
>>>> - implements the shadow cache generation trait functions
>>>> * switches the AccessControlConfig init paths (server + CLI) to use
>>>> pdm_config::AccessControlConfig instead of
>>>> pdm_api_types::AccessControlConfig
>>>>
>>>> This patch is part of the series which fixes bug #7017 [1].
>>>>
>>>> [1] https://bugzilla.proxmox.com/show_bug.cgi?id=7017
>>>>
>>>> Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
>>>> ---
>>>> cli/admin/src/main.rs | 2 +-
>>>> lib/pdm-config/Cargo.toml | 1 +
>>>> lib/pdm-config/src/access_control_config.rs | 73 +++++++++++++++++++++
>>>> lib/pdm-config/src/config_version_cache.rs | 18 +++++
>>>> lib/pdm-config/src/lib.rs | 2 +
>>>> server/src/acl.rs | 3 +-
>>>> 6 files changed, 96 insertions(+), 3 deletions(-)
>>>> create mode 100644 lib/pdm-config/src/access_control_config.rs
>>>>
>>>> diff --git a/cli/admin/src/main.rs b/cli/admin/src/main.rs
>>>> index f698fa2..916c633 100644
>>>> --- a/cli/admin/src/main.rs
>>>> +++ b/cli/admin/src/main.rs
>>>> @@ -19,7 +19,7 @@ fn main() {
>>>> proxmox_product_config::init(api_user, priv_user);
>>>>
>>>> proxmox_access_control::init::init(
>>>> - &pdm_api_types::AccessControlConfig,
>>>> + &pdm_config::AccessControlConfig,
>>>> pdm_buildcfg::configdir!("/access"),
>>>> )
>>>> .expect("failed to setup access control config");
>>>> diff --git a/lib/pdm-config/Cargo.toml b/lib/pdm-config/Cargo.toml
>>>> index d39c2ad..19781d2 100644
>>>> --- a/lib/pdm-config/Cargo.toml
>>>> +++ b/lib/pdm-config/Cargo.toml
>>>> @@ -13,6 +13,7 @@ once_cell.workspace = true
>>>> openssl.workspace = true
>>>> serde.workspace = true
>>>>
>>>> +proxmox-access-control.workspace = true
>>>> proxmox-config-digest = { workspace = true, features = [ "openssl" ] }
>>>> proxmox-http = { workspace = true, features = [ "http-helpers" ] }
>>>> proxmox-ldap = { workspace = true, features = [ "types" ]}
>>>> diff --git a/lib/pdm-config/src/access_control_config.rs b/lib/pdm-config/src/access_control_config.rs
>>>> new file mode 100644
>>>> index 0000000..6f2e6b3
>>>> --- /dev/null
>>>> +++ b/lib/pdm-config/src/access_control_config.rs
>>>> @@ -0,0 +1,73 @@
>>>> +// e.g. in src/main.rs or server::context mod, wherever convenient
>>>> +
>>>> +use anyhow::Error;
>>>> +use pdm_api_types::{Authid, Userid};
>>>> +use proxmox_section_config::SectionConfigData;
>>>> +use std::collections::HashMap;
>>>> +
>>>> +pub struct AccessControlConfig;
>>>> +
>>>> +impl proxmox_access_control::init::AccessControlConfig for AccessControlConfig {
>>>
>>> should we then remove the impl from the api type?
>>>
>>
>> Thanks for pointing this out Fabian! Currently, /ui/src/main.rs still
>> makes use of pdm_api_types::AccessControlConfig. This looks like a WASM
>> module, and is based on ticket based auth
>> (proxmox_login::Authentication) as far as I can see. Do you maybe know
>> if it actually requires the token cache / can work with CVC? If it does
>> not, then I think we should keep the API impl. I left this unchanged
>> and only touched server and CLI call sites.
>
> i mostly exposed that there to get access to the privileges, roles, and
> is_superuser functions. they are needed in the ui to selectively render
> ui elements depending on a users privileges.
>
> this should probably be factored out though and shared differently if we
> want to extend this trait with more caching functions.
>
Good point.
>>>> + fn privileges(&self) -> &HashMap<&str, u64> {
>>>> + pdm_api_types::AccessControlConfig.privileges()
>>>> + }
>>>> +
>>>> + fn roles(&self) -> &HashMap<&str, (u64, &str)> {
>>>> + pdm_api_types::AccessControlConfig.roles()
>>>> + }
>>>> +
>>>> + fn is_superuser(&self, auth_id: &Authid) -> bool {
>>>> + pdm_api_types::AccessControlConfig.is_superuser(auth_id)
>>>> + }
>>>> +
>>>> + fn is_group_member(&self, user_id: &Userid, group: &str) -> bool {
>>>> + pdm_api_types::AccessControlConfig.is_group_member(user_id, group)
>>>> + }
>>>> +
>>>> + fn role_admin(&self) -> Option<&str> {
>>>> + pdm_api_types::AccessControlConfig.role_admin()
>>>> + }
>>>> +
>>>> + fn role_no_access(&self) -> Option<&str> {
>>>> + pdm_api_types::AccessControlConfig.role_no_access()
>>>> + }
>>>> +
>>>> + fn init_user_config(&self, config: &mut SectionConfigData) -> Result<(), Error> {
>>>> + pdm_api_types::AccessControlConfig.init_user_config(config)
>>>> + }
>>>> +
>>>> + fn acl_audit_privileges(&self) -> u64 {
>>>> + pdm_api_types::AccessControlConfig.acl_audit_privileges()
>>>> + }
>>>> +
>>>> + fn acl_modify_privileges(&self) -> u64 {
>>>> + pdm_api_types::AccessControlConfig.acl_modify_privileges()
>>>> + }
>>>> +
>>>> + fn check_acl_path(&self, path: &str) -> Result<(), Error> {
>>>> + pdm_api_types::AccessControlConfig.check_acl_path(path)
>>>> + }
>>>> +
>>>> + fn allow_partial_permission_match(&self) -> bool {
>>>> + pdm_api_types::AccessControlConfig.allow_partial_permission_match()
>>>> + }
>>>> +
>>>> + fn cache_generation(&self) -> Option<usize> {
>>>> + pdm_api_types::AccessControlConfig.cache_generation()
>>>> + }
>>>
>>> shouldn't this be wired up to the ConfigVersionCache?
>>>
>>
>> If I understand correctly, cache_generation() and the
>> increment_cache_generation() below do not appear to have been wired
>> so far, meaning that caches were not enabled. To enable them,
>> a PDM AccessControlConfig implementation would probably be required
>> (as suggested in this patch) in order to be able to integrate with
>> ConfigVersionCache.
>>
>> I think it should be checked whether we want to enable these two
>> functions or not, probably best as part of a dedicated scope? I can create a
>> bug report for this.
>>
>
> sure, i think it's not too much effort, though. if you split out the
> caching parts, the ui should be fine without them. it really has no need
> for them afair.
If the UI doesn't make use of it, maybe it would simply be best to keep
two different impls? One kept minimal (also since not all parts might
be WASM-compatible), and one as proposed here to wire up CVC (and
maybe other things in the future..).
I will also wire up CVC for the other two existing caching functions as
part of this series.
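The generation counter being wired up here follows the pattern from
the quoted diff; a self-contained sketch (using a process-local static
instead of the real shared-memory region) could be:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Sketch of the CVC generation counter: an AtomicUsize bumped on each
/// token.shadow mutation. In the series this counter lives in a
/// shared-memory region visible to all processes; a static is used
/// here only to keep the example self-contained.
static TOKEN_SHADOW_GENERATION: AtomicUsize = AtomicUsize::new(0);

fn token_shadow_generation() -> usize {
    TOKEN_SHADOW_GENERATION.load(Ordering::Acquire)
}

fn increase_token_shadow_generation() -> usize {
    // fetch_add returns the previous value, matching the quoted
    // implementation where callers add one for the new generation.
    TOKEN_SHADOW_GENERATION.fetch_add(1, Ordering::AcqRel)
}
```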
>
>>>> +
>>>> + fn increment_cache_generation(&self) -> Result<(), Error> {
>>>> + pdm_api_types::AccessControlConfig.increment_cache_generation()
>>>
>>> shouldn't this be wired up to the ConfigVersionCache?
>>>
>>>> + }
>>>> +
>>>> + fn token_shadow_cache_generation(&self) -> Option<usize> {
>>>> + crate::ConfigVersionCache::new()
>>>> + .ok()
>>>> + .map(|c| c.token_shadow_generation())
>>>> + }
>>>> +
>>>> + fn increment_token_shadow_cache_generation(&self) -> Result<usize, Error> {
>>>> + let c = crate::ConfigVersionCache::new()?;
>>>> + Ok(c.increase_token_shadow_generation())
>>>> + }
>>>> +}
>>>> diff --git a/lib/pdm-config/src/config_version_cache.rs b/lib/pdm-config/src/config_version_cache.rs
>>>> index 36a6a77..933140c 100644
>>>> --- a/lib/pdm-config/src/config_version_cache.rs
>>>> +++ b/lib/pdm-config/src/config_version_cache.rs
>>>> @@ -27,6 +27,8 @@ struct ConfigVersionCacheDataInner {
>>>> traffic_control_generation: AtomicUsize,
>>>> // Tracks updates to the remote/hostname/nodename mapping cache.
>>>> remote_mapping_cache: AtomicUsize,
>>>> + // Token shadow (token.shadow) generation/version.
>>>> + token_shadow_generation: AtomicUsize,
>>>
>>> explanation why this is safe for the commit message would be nice ;)
>>>
>>
>> Will add :)
>>
>>>> // Add further atomics here
>>>> }
>>>>
>>>> @@ -172,4 +174,20 @@ impl ConfigVersionCache {
>>>> .fetch_add(1, Ordering::Relaxed)
>>>> + 1
>>>> }
>>>> +
>>>> + /// Returns the token shadow generation number.
>>>> + pub fn token_shadow_generation(&self) -> usize {
>>>> + self.shmem
>>>> + .data()
>>>> + .token_shadow_generation
>>>> + .load(Ordering::Acquire)
>>>> + }
>>>> +
>>>> + /// Increase the token shadow generation number.
>>>> + pub fn increase_token_shadow_generation(&self) -> usize {
>>>> + self.shmem
>>>> + .data()
>>>> + .token_shadow_generation
>>>> + .fetch_add(1, Ordering::AcqRel)
>>>> + }
>>>> }
>>>> diff --git a/lib/pdm-config/src/lib.rs b/lib/pdm-config/src/lib.rs
>>>> index 4c49054..a15a006 100644
>>>> --- a/lib/pdm-config/src/lib.rs
>>>> +++ b/lib/pdm-config/src/lib.rs
>>>> @@ -9,6 +9,8 @@ pub mod remotes;
>>>> pub mod setup;
>>>> pub mod views;
>>>>
>>>> +mod access_control_config;
>>>> +pub use access_control_config::AccessControlConfig;
>>>> mod config_version_cache;
>>>> pub use config_version_cache::ConfigVersionCache;
>>>>
>>>> diff --git a/server/src/acl.rs b/server/src/acl.rs
>>>> index f421814..e6e007b 100644
>>>> --- a/server/src/acl.rs
>>>> +++ b/server/src/acl.rs
>>>> @@ -1,6 +1,5 @@
>>>> pub(crate) fn init() {
>>>> - static ACCESS_CONTROL_CONFIG: pdm_api_types::AccessControlConfig =
>>>> - pdm_api_types::AccessControlConfig;
>>>> + static ACCESS_CONTROL_CONFIG: pdm_config::AccessControlConfig = pdm_config::AccessControlConfig;
>>>>
>>>> proxmox_access_control::init::init(&ACCESS_CONTROL_CONFIG, pdm_buildcfg::configdir!("/access"))
>>>> .expect("failed to setup access control config");
>>>> --
>>>> 2.47.3
>>>>
>>>>
>>>>
^ permalink raw reply [relevance 6%]
* Re: [pbs-devel] [PATCH proxmox-backup v3 2/4] pbs-config: cache verified API token secrets
2026-01-16 16:00 5% ` Fabian Grünbichler
@ 2026-01-16 16:56 6% ` Samuel Rufinatscha
0 siblings, 0 replies; 200+ results
From: Samuel Rufinatscha @ 2026-01-16 16:56 UTC (permalink / raw)
To: Fabian Grünbichler, Proxmox Backup Server development discussion
On 1/16/26 4:59 PM, Fabian Grünbichler wrote:
> Quoting Samuel Rufinatscha (2026-01-16 16:13:17)
>> On 1/14/26 11:44 AM, Fabian Grünbichler wrote:
>>> On January 2, 2026 5:07 pm, Samuel Rufinatscha wrote:
>>>> Currently, every token-based API request reads the token.shadow file and
>>>> runs the expensive password hash verification for the given token
>>>> secret. This shows up as a hotspot in /status profiling (see
>>>> bug #7017 [1]).
>>>>
>>>> This patch introduces an in-memory cache of successfully verified token
>>>> secrets. Subsequent requests for the same token+secret combination only
>>>> perform a comparison using openssl::memcmp::eq and avoid re-running the
>>>> password hash. The cache is updated when a token secret is set and
>>>> cleared when a token is deleted. Note, this does NOT include manual
>>>> config changes, which will be covered in a subsequent patch.
>>>>
>>>> This patch is part of the series which fixes bug #7017 [1].
>>>>
>>>> [1] https://bugzilla.proxmox.com/show_bug.cgi?id=7017
>>>>
>>>> Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
>>>> ---
>
> [..]
>
>>>> +
>>>> +// Tries to match the given token secret against the cached secret.
>>>> +// Checks the generation before and after the constant-time compare to avoid a
>>>> +// TOCTOU window. If another process rotates/deletes a token while we're validating
>>>> +// the cached secret, the generation will change, and we
>>>> +// must not trust the cache for this request.
>>>> +fn cache_try_secret_matches(tokenid: &Authid, secret: &str) -> bool {
>>>> + let Some(cache) = TOKEN_SECRET_CACHE.try_read() else {
>>>> + return false;
>>>> + };
>>>> + let Some(entry) = cache.secrets.get(tokenid) else {
>>>> + return false;
>>>> + };
>>>> +
>>>> + let cache_gen = cache.shared_gen;
>>>> +
>>>> + let Some(gen1) = token_shadow_shared_gen() else {
>>>> + return false;
>>>> + };
>>>> + if gen1 != cache_gen {
>>>> + return false;
>>>> + }
>>>> +
>>>> + let eq = openssl::memcmp::eq(entry.secret.as_bytes(), secret.as_bytes());
>>>
>>> should we invalidate the cache here for this particular authid in case
>>> of a mismatch, to avoid making brute forcing too easy/cheap?
>>>
>>
>> We are not doing a cheap reject; on a mismatch we still fall through to
>> verify_crypt_pw(). Evicting on mismatch could however enable cache
>> thrashing, where wrong secrets for a known tokenid would evict cached
>> entries. So I think we should not invalidate here on mismatch.
>
> forgot this part here, sorry. you are right, this *should* be okay. I do think
> the second generation check there serves no purpose though. the token config
> can change at any point after we've validated the secret using the old state,
> there is nothing we can do about that, and it's totally fine to accept a token
> that is modified at exactly the same moment, even if that same token wouldn't
> be valid 2 seconds later..
>
> there has to be a point where we have to say "this token is valid", and at the
> point of memcmp here we have already:
> - verified we don't need to reload the file
> - verified we didn't have any API changes to the token config
> - verified that the secret matches what we have cached
>
> redoing the first two changes after that point doesn't protect us against
> changes afterwards either, so we might as well not do that extra work that
> doesn't give us any extra safety guarantees anyway..
Agreed, the second generation check only narrows a very small window
around memcmp (I tried to avoid a TOCTOU at this point), but as you
said, it doesn’t provide a strong additional guarantee and is
unnecessary. Will remove!
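The agreed simplification (one generation check before the
constant-time compare, none afterwards) might look roughly like this.
This is a dependency-free sketch: ct_eq stands in for
openssl::memcmp::eq, and all names are illustrative:

```rust
/// Constant-time byte comparison stand-in for openssl::memcmp::eq,
/// used only to keep this sketch dependency-free.
fn ct_eq(a: &[u8], b: &[u8]) -> bool {
    a.len() == b.len() && a.iter().zip(b).fold(0u8, |acc, (x, y)| acc | (x ^ y)) == 0
}

/// Simplified shape: a single generation check before the compare.
/// `cached` and `shared_gen` stand in for the cache entry and the
/// shared-memory generation counter.
fn secret_matches(cached: &str, cache_gen: usize, shared_gen: usize, secret: &str) -> bool {
    if cache_gen != shared_gen {
        return false; // cache predates the last token.shadow mutation
    }
    ct_eq(cached.as_bytes(), secret.as_bytes())
}
```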
>
>>
>>>> + let Some(gen2) = token_shadow_shared_gen() else {
>>>> + return false;
>>>> + };
>>>> +
>>>> + eq && gen2 == cache_gen
>>>> +}
>>>> +
>>>> +fn apply_api_mutation(tokenid: &Authid, new_secret: Option<&str>) {
>>>> + // Signal cache invalidation to other processes (best-effort).
>>>> + let new_shared_gen = bump_token_shadow_shared_gen();
>>>> +
>>>> + let mut cache = TOKEN_SECRET_CACHE.write();
>
> because I mentioned switching those two around - this actually requires more
> thought I think..
>
> right now, calling apply_api_mutation happens under a lock, but there are other
> calls that bump the generation, so this is actually racy here. OTOH, bumping
> the generation before locking the cache means faster cache invalidation..
Yes, I favored bumping the generation before taking the write lock for
faster cache invalidation / better security.
>
> maybe we should re-verify the generation after obtaining the lock? and maybe
> make apply_api_mutation consume the shadow config file lock, to ensure it's
> only called while that lock is being held?
Agreed, I think we should re-verify the generation after taking the
write lock. I also agree we should pass the file lock down, good
idea! :) This should make it more robust.
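The idea of making apply_api_mutation consume the lock guard can be
sketched as follows (types and names are hypothetical stand-ins for
the real pbs-config ones):

```rust
use std::collections::HashMap;

/// Stand-in for the token.shadow config-file lock guard.
struct ShadowLockGuard;

/// Taking the guard by value encodes at the type level that the caller
/// must hold the shadow-file lock for the whole mutation.
fn apply_api_mutation(
    _lock: ShadowLockGuard,
    cache: &mut HashMap<String, String>,
    tokenid: &str,
    new_secret: Option<&str>,
) {
    match new_secret {
        Some(secret) => {
            cache.insert(tokenid.to_string(), secret.to_string());
        }
        None => {
            cache.remove(tokenid);
        }
    }
}
```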
>
>>>> +
>>>> + // If we cannot read/bump the shared generation, we cannot safely trust the cache.
>>>> + let Some(gen) = new_shared_gen else {
>>>> + invalidate_cache_state(&mut cache);
>>>> + cache.shared_gen = 0;
>>>> + return;
>>>> + };
>>>> +
>>>> + // Update to the post-mutation generation.
>>>> + cache.shared_gen = gen;
>>>> +
>>>> + // Apply the new mutation.
>>>> + match new_secret {
>>>> + Some(secret) => {
>>>> + cache.secrets.insert(
>>>> + tokenid.clone(),
>>>> + CachedSecret {
>>>> + secret: secret.to_owned(),
>>>> + },
>>>> + );
>>>> + }
>>>> + None => {
>>>> + cache.secrets.remove(tokenid);
>>>> + }
>>>> + }
>>>> +}
>>>> +
>>>> +/// Get the current shared generation.
>>>> +fn token_shadow_shared_gen() -> Option<usize> {
>>>> + crate::ConfigVersionCache::new()
>>>> + .ok()
>>>> + .map(|cvc| cvc.token_shadow_generation())
>>>> +}
>>>> +
>>>> +/// Bump and return the new shared generation.
>>>> +fn bump_token_shadow_shared_gen() -> Option<usize> {
>>>> + crate::ConfigVersionCache::new()
>>>> + .ok()
>>>> + .map(|cvc| cvc.increase_token_shadow_generation() + 1)
>>>> +}
>>>> +
>>>> +/// Invalidates the cache state and only keeps the shared generation.
>>>
>>> both calls to this actually set the cached generation to some value
>>> right after, so maybe this should take a generation directly and set it?
>>>
>>
>> patch 3/4 doesn’t always update the gen on cache invalidation
>> (shadow_mtime_len() error branch in apply_api_mutation) but most other
>> call sites do. Agreed this can be refactored, maybe:
>>
>> fn invalidate_cache_state(cache: &mut ApiTokenSecretCache) {
>> cache.secrets.clear();
>> // clear other cache fields (mtime/len/last_checked) as needed
>> }
>>
>> fn invalidate_cache_state_and_set_gen(cache: &mut ApiTokenSecretCache,
>> gen: usize) {
>> invalidate_cache_state(cache);
>> cache.shared_gen = gen;
>> }
>>
>> We could also do a single helper with Option<usize> but two helpers make
>> the call sites more explicit.
>>
>>>> +fn invalidate_cache_state(cache: &mut ApiTokenSecretCache) {
>>>> + cache.secrets.clear();
>>>> +}
>>>> --
>>>> 2.47.3
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
^ permalink raw reply [relevance 6%]
* Re: [pbs-devel] [PATCH proxmox-datacenter-manager v3 1/2] pdm-config: implement token.shadow generation
2026-01-16 16:28 6% ` Samuel Rufinatscha
@ 2026-01-16 16:48 5% ` Shannon Sterz
2026-01-19 7:56 6% ` Samuel Rufinatscha
0 siblings, 1 reply; 200+ results
From: Shannon Sterz @ 2026-01-16 16:48 UTC (permalink / raw)
To: Samuel Rufinatscha; +Cc: Proxmox Backup Server development discussion
On Fri Jan 16, 2026 at 5:28 PM CET, Samuel Rufinatscha wrote:
> On 1/14/26 11:44 AM, Fabian Grünbichler wrote:
>> On January 2, 2026 5:07 pm, Samuel Rufinatscha wrote:
>>> PDM depends on the shared proxmox/proxmox-access-control crate for
>>> token.shadow handling, which expects the product to provide a
>>> cross-process invalidation signal so it can safely cache verified API
>>> token secrets and invalidate them when token.shadow is changed.
>>>
>>> This patch
>>>
>>> * adds a token_shadow_generation to PDM’s shared-memory
>>> ConfigVersionCache
>>> * implements proxmox_access_control::init::AccessControlConfig
>>> for pdm_config::AccessControlConfig, which
>>> - delegates roles/privs/path checks to the existing
>>> pdm_api_types::AccessControlConfig implementation
>>> - implements the shadow cache generation trait functions
>>> * switches the AccessControlConfig init paths (server + CLI) to use
>>> pdm_config::AccessControlConfig instead of
>>> pdm_api_types::AccessControlConfig
>>>
>>> This patch is part of the series which fixes bug #7017 [1].
>>>
>>> [1] https://bugzilla.proxmox.com/show_bug.cgi?id=7017
>>>
>>> Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
>>> ---
>>> cli/admin/src/main.rs | 2 +-
>>> lib/pdm-config/Cargo.toml | 1 +
>>> lib/pdm-config/src/access_control_config.rs | 73 +++++++++++++++++++++
>>> lib/pdm-config/src/config_version_cache.rs | 18 +++++
>>> lib/pdm-config/src/lib.rs | 2 +
>>> server/src/acl.rs | 3 +-
>>> 6 files changed, 96 insertions(+), 3 deletions(-)
>>> create mode 100644 lib/pdm-config/src/access_control_config.rs
>>>
>>> diff --git a/cli/admin/src/main.rs b/cli/admin/src/main.rs
>>> index f698fa2..916c633 100644
>>> --- a/cli/admin/src/main.rs
>>> +++ b/cli/admin/src/main.rs
>>> @@ -19,7 +19,7 @@ fn main() {
>>> proxmox_product_config::init(api_user, priv_user);
>>>
>>> proxmox_access_control::init::init(
>>> - &pdm_api_types::AccessControlConfig,
>>> + &pdm_config::AccessControlConfig,
>>> pdm_buildcfg::configdir!("/access"),
>>> )
>>> .expect("failed to setup access control config");
>>> diff --git a/lib/pdm-config/Cargo.toml b/lib/pdm-config/Cargo.toml
>>> index d39c2ad..19781d2 100644
>>> --- a/lib/pdm-config/Cargo.toml
>>> +++ b/lib/pdm-config/Cargo.toml
>>> @@ -13,6 +13,7 @@ once_cell.workspace = true
>>> openssl.workspace = true
>>> serde.workspace = true
>>>
>>> +proxmox-access-control.workspace = true
>>> proxmox-config-digest = { workspace = true, features = [ "openssl" ] }
>>> proxmox-http = { workspace = true, features = [ "http-helpers" ] }
>>> proxmox-ldap = { workspace = true, features = [ "types" ]}
>>> diff --git a/lib/pdm-config/src/access_control_config.rs b/lib/pdm-config/src/access_control_config.rs
>>> new file mode 100644
>>> index 0000000..6f2e6b3
>>> --- /dev/null
>>> +++ b/lib/pdm-config/src/access_control_config.rs
>>> @@ -0,0 +1,73 @@
>>> +// e.g. in src/main.rs or server::context mod, wherever convenient
>>> +
>>> +use anyhow::Error;
>>> +use pdm_api_types::{Authid, Userid};
>>> +use proxmox_section_config::SectionConfigData;
>>> +use std::collections::HashMap;
>>> +
>>> +pub struct AccessControlConfig;
>>> +
>>> +impl proxmox_access_control::init::AccessControlConfig for AccessControlConfig {
>>
>> should we then remove the impl from the api type?
>>
>
> Thanks for pointing this out Fabian! Currently, /ui/src/main.rs still
> makes use of pdm_api_types::AccessControlConfig. This looks like a WASM
> module, and is based on ticket-based auth
> (proxmox_login::Authentication) as far as I can see. Do you maybe know
> if it actually requires the token cache / can work with CVC? If it does
> not, then I think we should keep the API impl. I left this unchanged
> and only touched server and CLI call sites.
i mostly exposed that there to get access to the privileges, roles, and
is_superuser functions. they are needed in the ui to selectively render
ui elements depending on a user's privileges.
this should probably be factored out though and shared differently if we
want to extend this trait with more caching functions.
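something like the following trait split could work (an illustrative, self-contained sketch only; the trait and type names are made up and the real AccessControlConfig trait has many more methods):

```rust
use std::collections::HashMap;

// Lookup half: what the WASM UI needs to selectively render elements.
trait AccessControlBase {
    fn privileges(&self) -> &HashMap<&'static str, u64>;
    fn is_superuser(&self, auth_id: &str) -> bool;
}

// Server-only half: cache-generation hooks backed by shared memory, which
// a browser/WASM build cannot (and need not) provide.
trait AccessControlCaching: AccessControlBase {
    fn token_shadow_cache_generation(&self) -> Option<usize>;
}

struct UiAcl {
    privs: HashMap<&'static str, u64>,
}

impl AccessControlBase for UiAcl {
    fn privileges(&self) -> &HashMap<&'static str, u64> {
        &self.privs
    }
    fn is_superuser(&self, auth_id: &str) -> bool {
        auth_id == "root@pam"
    }
}

// A server-side type would implement both traits; the UI build only ever
// depends on AccessControlBase and never sees the caching functions.
fn main() {
    let ui = UiAcl {
        privs: HashMap::from([("Sys.Audit", 1u64 << 0), ("Sys.Modify", 1u64 << 1)]),
    };
    assert!(ui.is_superuser("root@pam"));
    assert_eq!(ui.privileges().len(), 2);
}
```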
>>> + fn privileges(&self) -> &HashMap<&str, u64> {
>>> + pdm_api_types::AccessControlConfig.privileges()
>>> + }
>>> +
>>> + fn roles(&self) -> &HashMap<&str, (u64, &str)> {
>>> + pdm_api_types::AccessControlConfig.roles()
>>> + }
>>> +
>>> + fn is_superuser(&self, auth_id: &Authid) -> bool {
>>> + pdm_api_types::AccessControlConfig.is_superuser(auth_id)
>>> + }
>>> +
>>> + fn is_group_member(&self, user_id: &Userid, group: &str) -> bool {
>>> + pdm_api_types::AccessControlConfig.is_group_member(user_id, group)
>>> + }
>>> +
>>> + fn role_admin(&self) -> Option<&str> {
>>> + pdm_api_types::AccessControlConfig.role_admin()
>>> + }
>>> +
>>> + fn role_no_access(&self) -> Option<&str> {
>>> + pdm_api_types::AccessControlConfig.role_no_access()
>>> + }
>>> +
>>> + fn init_user_config(&self, config: &mut SectionConfigData) -> Result<(), Error> {
>>> + pdm_api_types::AccessControlConfig.init_user_config(config)
>>> + }
>>> +
>>> + fn acl_audit_privileges(&self) -> u64 {
>>> + pdm_api_types::AccessControlConfig.acl_audit_privileges()
>>> + }
>>> +
>>> + fn acl_modify_privileges(&self) -> u64 {
>>> + pdm_api_types::AccessControlConfig.acl_modify_privileges()
>>> + }
>>> +
>>> + fn check_acl_path(&self, path: &str) -> Result<(), Error> {
>>> + pdm_api_types::AccessControlConfig.check_acl_path(path)
>>> + }
>>> +
>>> + fn allow_partial_permission_match(&self) -> bool {
>>> + pdm_api_types::AccessControlConfig.allow_partial_permission_match()
>>> + }
>>> +
>>> + fn cache_generation(&self) -> Option<usize> {
>>> + pdm_api_types::AccessControlConfig.cache_generation()
>>> + }
>>
>> shouldn't this be wired up to the ConfigVersionCache?
>>
>
> If I understand correctly, cache_generation() and the
> increment_cache_generation() below do not appear to have been wired
> so far, meaning that caches were not enabled. To enable them,
> a PDM AccessControlConfig implementation would probably be required
> (as suggested in this patch) in order to be able to integrate with
> ConfigVersionCache.
>
> I think it should be checked whether we want to enable these two
> functions, probably best as part of a dedicated scope? I can create a
> bug report for this.
>
sure, i think it's not too much effort, though. if you split out the
caching parts, the ui should be fine without them. it really has no need
for them afair.
>>> +
>>> + fn increment_cache_generation(&self) -> Result<(), Error> {
>>> + pdm_api_types::AccessControlConfig.increment_cache_generation()
>>
>> shouldn't this be wired up to the ConfigVersionCache?
>>
>>> + }
>>> +
>>> + fn token_shadow_cache_generation(&self) -> Option<usize> {
>>> + crate::ConfigVersionCache::new()
>>> + .ok()
>>> + .map(|c| c.token_shadow_generation())
>>> + }
>>> +
>>> + fn increment_token_shadow_cache_generation(&self) -> Result<usize, Error> {
>>> + let c = crate::ConfigVersionCache::new()?;
>>> + Ok(c.increase_token_shadow_generation())
>>> + }
>>> +}
>>> diff --git a/lib/pdm-config/src/config_version_cache.rs b/lib/pdm-config/src/config_version_cache.rs
>>> index 36a6a77..933140c 100644
>>> --- a/lib/pdm-config/src/config_version_cache.rs
>>> +++ b/lib/pdm-config/src/config_version_cache.rs
>>> @@ -27,6 +27,8 @@ struct ConfigVersionCacheDataInner {
>>> traffic_control_generation: AtomicUsize,
>>> // Tracks updates to the remote/hostname/nodename mapping cache.
>>> remote_mapping_cache: AtomicUsize,
>>> + // Token shadow (token.shadow) generation/version.
>>> + token_shadow_generation: AtomicUsize,
>>
>> explanation why this is safe for the commit message would be nice ;)
>>
>
> Will add :)
>
>>> // Add further atomics here
>>> }
>>>
>>> @@ -172,4 +174,20 @@ impl ConfigVersionCache {
>>> .fetch_add(1, Ordering::Relaxed)
>>> + 1
>>> }
>>> +
>>> + /// Returns the token shadow generation number.
>>> + pub fn token_shadow_generation(&self) -> usize {
>>> + self.shmem
>>> + .data()
>>> + .token_shadow_generation
>>> + .load(Ordering::Acquire)
>>> + }
>>> +
>>> + /// Increase the token shadow generation number.
>>> + pub fn increase_token_shadow_generation(&self) -> usize {
>>> + self.shmem
>>> + .data()
>>> + .token_shadow_generation
>>> + .fetch_add(1, Ordering::AcqRel)
>>> + }
>>> }
>>> diff --git a/lib/pdm-config/src/lib.rs b/lib/pdm-config/src/lib.rs
>>> index 4c49054..a15a006 100644
>>> --- a/lib/pdm-config/src/lib.rs
>>> +++ b/lib/pdm-config/src/lib.rs
>>> @@ -9,6 +9,8 @@ pub mod remotes;
>>> pub mod setup;
>>> pub mod views;
>>>
>>> +mod access_control_config;
>>> +pub use access_control_config::AccessControlConfig;
>>> mod config_version_cache;
>>> pub use config_version_cache::ConfigVersionCache;
>>>
>>> diff --git a/server/src/acl.rs b/server/src/acl.rs
>>> index f421814..e6e007b 100644
>>> --- a/server/src/acl.rs
>>> +++ b/server/src/acl.rs
>>> @@ -1,6 +1,5 @@
>>> pub(crate) fn init() {
>>> - static ACCESS_CONTROL_CONFIG: pdm_api_types::AccessControlConfig =
>>> - pdm_api_types::AccessControlConfig;
>>> + static ACCESS_CONTROL_CONFIG: pdm_config::AccessControlConfig = pdm_config::AccessControlConfig;
>>>
>>> proxmox_access_control::init::init(&ACCESS_CONTROL_CONFIG, pdm_buildcfg::configdir!("/access"))
>>> .expect("failed to setup access control config");
>>> --
>>> 2.47.3
>>>
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
^ permalink raw reply [relevance 5%]
* Re: [pbs-devel] [PATCH proxmox-datacenter-manager v3 1/2] pdm-config: implement token.shadow generation
2026-01-14 10:45 5% ` Fabian Grünbichler
@ 2026-01-16 16:28 6% ` Samuel Rufinatscha
2026-01-16 16:48 5% ` Shannon Sterz
0 siblings, 1 reply; 200+ results
From: Samuel Rufinatscha @ 2026-01-16 16:28 UTC (permalink / raw)
To: Proxmox Backup Server development discussion, Fabian Grünbichler
On 1/14/26 11:44 AM, Fabian Grünbichler wrote:
> On January 2, 2026 5:07 pm, Samuel Rufinatscha wrote:
>> PDM depends on the shared proxmox/proxmox-access-control crate for
>> token.shadow handling, which expects the product to provide a
>> cross-process invalidation signal so it can safely cache verified API
>> token secrets and invalidate them when token.shadow is changed.
>>
>> This patch
>>
>> * adds a token_shadow_generation to PDM’s shared-memory
>> ConfigVersionCache
>> * implements proxmox_access_control::init::AccessControlConfig
>> for pdm_config::AccessControlConfig, which
>> - delegates roles/privs/path checks to the existing
>> pdm_api_types::AccessControlConfig implementation
>> - implements the shadow cache generation trait functions
>> * switches the AccessControlConfig init paths (server + CLI) to use
>> pdm_config::AccessControlConfig instead of
>> pdm_api_types::AccessControlConfig
>>
>> This patch is part of the series which fixes bug #7017 [1].
>>
>> [1] https://bugzilla.proxmox.com/show_bug.cgi?id=7017
>>
>> Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
>> ---
>> cli/admin/src/main.rs | 2 +-
>> lib/pdm-config/Cargo.toml | 1 +
>> lib/pdm-config/src/access_control_config.rs | 73 +++++++++++++++++++++
>> lib/pdm-config/src/config_version_cache.rs | 18 +++++
>> lib/pdm-config/src/lib.rs | 2 +
>> server/src/acl.rs | 3 +-
>> 6 files changed, 96 insertions(+), 3 deletions(-)
>> create mode 100644 lib/pdm-config/src/access_control_config.rs
>>
>> diff --git a/cli/admin/src/main.rs b/cli/admin/src/main.rs
>> index f698fa2..916c633 100644
>> --- a/cli/admin/src/main.rs
>> +++ b/cli/admin/src/main.rs
>> @@ -19,7 +19,7 @@ fn main() {
>> proxmox_product_config::init(api_user, priv_user);
>>
>> proxmox_access_control::init::init(
>> - &pdm_api_types::AccessControlConfig,
>> + &pdm_config::AccessControlConfig,
>> pdm_buildcfg::configdir!("/access"),
>> )
>> .expect("failed to setup access control config");
>> diff --git a/lib/pdm-config/Cargo.toml b/lib/pdm-config/Cargo.toml
>> index d39c2ad..19781d2 100644
>> --- a/lib/pdm-config/Cargo.toml
>> +++ b/lib/pdm-config/Cargo.toml
>> @@ -13,6 +13,7 @@ once_cell.workspace = true
>> openssl.workspace = true
>> serde.workspace = true
>>
>> +proxmox-access-control.workspace = true
>> proxmox-config-digest = { workspace = true, features = [ "openssl" ] }
>> proxmox-http = { workspace = true, features = [ "http-helpers" ] }
>> proxmox-ldap = { workspace = true, features = [ "types" ]}
>> diff --git a/lib/pdm-config/src/access_control_config.rs b/lib/pdm-config/src/access_control_config.rs
>> new file mode 100644
>> index 0000000..6f2e6b3
>> --- /dev/null
>> +++ b/lib/pdm-config/src/access_control_config.rs
>> @@ -0,0 +1,73 @@
>> +// e.g. in src/main.rs or server::context mod, wherever convenient
>> +
>> +use anyhow::Error;
>> +use pdm_api_types::{Authid, Userid};
>> +use proxmox_section_config::SectionConfigData;
>> +use std::collections::HashMap;
>> +
>> +pub struct AccessControlConfig;
>> +
>> +impl proxmox_access_control::init::AccessControlConfig for AccessControlConfig {
>
> should we then remove the impl from the api type?
>
Thanks for pointing this out Fabian! Currently, /ui/src/main.rs still
makes use of pdm_api_types::AccessControlConfig. This looks like a WASM
module, and is based on ticket-based auth
(proxmox_login::Authentication) as far as I can see. Do you maybe know
if it actually requires the token cache / can work with CVC? If it does
not, then I think we should keep the API impl. I left this unchanged
and only touched server and CLI call sites.
>> + fn privileges(&self) -> &HashMap<&str, u64> {
>> + pdm_api_types::AccessControlConfig.privileges()
>> + }
>> +
>> + fn roles(&self) -> &HashMap<&str, (u64, &str)> {
>> + pdm_api_types::AccessControlConfig.roles()
>> + }
>> +
>> + fn is_superuser(&self, auth_id: &Authid) -> bool {
>> + pdm_api_types::AccessControlConfig.is_superuser(auth_id)
>> + }
>> +
>> + fn is_group_member(&self, user_id: &Userid, group: &str) -> bool {
>> + pdm_api_types::AccessControlConfig.is_group_member(user_id, group)
>> + }
>> +
>> + fn role_admin(&self) -> Option<&str> {
>> + pdm_api_types::AccessControlConfig.role_admin()
>> + }
>> +
>> + fn role_no_access(&self) -> Option<&str> {
>> + pdm_api_types::AccessControlConfig.role_no_access()
>> + }
>> +
>> + fn init_user_config(&self, config: &mut SectionConfigData) -> Result<(), Error> {
>> + pdm_api_types::AccessControlConfig.init_user_config(config)
>> + }
>> +
>> + fn acl_audit_privileges(&self) -> u64 {
>> + pdm_api_types::AccessControlConfig.acl_audit_privileges()
>> + }
>> +
>> + fn acl_modify_privileges(&self) -> u64 {
>> + pdm_api_types::AccessControlConfig.acl_modify_privileges()
>> + }
>> +
>> + fn check_acl_path(&self, path: &str) -> Result<(), Error> {
>> + pdm_api_types::AccessControlConfig.check_acl_path(path)
>> + }
>> +
>> + fn allow_partial_permission_match(&self) -> bool {
>> + pdm_api_types::AccessControlConfig.allow_partial_permission_match()
>> + }
>> +
>> + fn cache_generation(&self) -> Option<usize> {
>> + pdm_api_types::AccessControlConfig.cache_generation()
>> + }
>
> shouldn't this be wired up to the ConfigVersionCache?
>
If I understand correctly, cache_generation() and the
increment_cache_generation() below do not appear to have been wired
so far, meaning that caches were not enabled. To enable them,
a PDM AccessControlConfig implementation would probably be required
(as suggested in this patch) in order to be able to integrate with
ConfigVersionCache.
I think it should be checked whether we want to enable these two
functions, probably best as part of a dedicated scope? I can create a
bug report for this.
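For reference, wiring them up would probably look similar to the token.shadow ones, roughly like this self-contained sketch (the ConfigVersionCache here is a process-local stand-in for the shared-memory one, user_cache_generation is a hypothetical field name, and the real trait method returns Result<(), Error> rather than the new value):

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Stand-in for the shared-memory atomic inside ConfigVersionCacheDataInner.
static USER_CACHE_GENERATION: AtomicUsize = AtomicUsize::new(0);

struct ConfigVersionCache;

impl ConfigVersionCache {
    fn new() -> Result<Self, String> {
        Ok(ConfigVersionCache)
    }
    fn user_cache_generation(&self) -> usize {
        USER_CACHE_GENERATION.load(Ordering::Acquire)
    }
    fn increase_user_cache_generation(&self) -> usize {
        USER_CACHE_GENERATION.fetch_add(1, Ordering::AcqRel) + 1
    }
}

struct AccessControlConfig;

impl AccessControlConfig {
    // Wired to the shared-memory generation instead of delegating to the
    // (currently unwired) pdm_api_types implementation.
    fn cache_generation(&self) -> Option<usize> {
        ConfigVersionCache::new()
            .ok()
            .map(|c| c.user_cache_generation())
    }
    fn increment_cache_generation(&self) -> Result<usize, String> {
        Ok(ConfigVersionCache::new()?.increase_user_cache_generation())
    }
}

fn main() {
    let acl = AccessControlConfig;
    let before = acl.cache_generation().unwrap();
    acl.increment_cache_generation().unwrap();
    assert_eq!(acl.cache_generation().unwrap(), before + 1);
}
```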
>> +
>> + fn increment_cache_generation(&self) -> Result<(), Error> {
>> + pdm_api_types::AccessControlConfig.increment_cache_generation()
>
> shouldn't this be wired up to the ConfigVersionCache?
>
>> + }
>> +
>> + fn token_shadow_cache_generation(&self) -> Option<usize> {
>> + crate::ConfigVersionCache::new()
>> + .ok()
>> + .map(|c| c.token_shadow_generation())
>> + }
>> +
>> + fn increment_token_shadow_cache_generation(&self) -> Result<usize, Error> {
>> + let c = crate::ConfigVersionCache::new()?;
>> + Ok(c.increase_token_shadow_generation())
>> + }
>> +}
>> diff --git a/lib/pdm-config/src/config_version_cache.rs b/lib/pdm-config/src/config_version_cache.rs
>> index 36a6a77..933140c 100644
>> --- a/lib/pdm-config/src/config_version_cache.rs
>> +++ b/lib/pdm-config/src/config_version_cache.rs
>> @@ -27,6 +27,8 @@ struct ConfigVersionCacheDataInner {
>> traffic_control_generation: AtomicUsize,
>> // Tracks updates to the remote/hostname/nodename mapping cache.
>> remote_mapping_cache: AtomicUsize,
>> + // Token shadow (token.shadow) generation/version.
>> + token_shadow_generation: AtomicUsize,
>
> explanation why this is safe for the commit message would be nice ;)
>
Will add :)
>> // Add further atomics here
>> }
>>
>> @@ -172,4 +174,20 @@ impl ConfigVersionCache {
>> .fetch_add(1, Ordering::Relaxed)
>> + 1
>> }
>> +
>> + /// Returns the token shadow generation number.
>> + pub fn token_shadow_generation(&self) -> usize {
>> + self.shmem
>> + .data()
>> + .token_shadow_generation
>> + .load(Ordering::Acquire)
>> + }
>> +
>> + /// Increase the token shadow generation number.
>> + pub fn increase_token_shadow_generation(&self) -> usize {
>> + self.shmem
>> + .data()
>> + .token_shadow_generation
>> + .fetch_add(1, Ordering::AcqRel)
>> + }
>> }
>> diff --git a/lib/pdm-config/src/lib.rs b/lib/pdm-config/src/lib.rs
>> index 4c49054..a15a006 100644
>> --- a/lib/pdm-config/src/lib.rs
>> +++ b/lib/pdm-config/src/lib.rs
>> @@ -9,6 +9,8 @@ pub mod remotes;
>> pub mod setup;
>> pub mod views;
>>
>> +mod access_control_config;
>> +pub use access_control_config::AccessControlConfig;
>> mod config_version_cache;
>> pub use config_version_cache::ConfigVersionCache;
>>
>> diff --git a/server/src/acl.rs b/server/src/acl.rs
>> index f421814..e6e007b 100644
>> --- a/server/src/acl.rs
>> +++ b/server/src/acl.rs
>> @@ -1,6 +1,5 @@
>> pub(crate) fn init() {
>> - static ACCESS_CONTROL_CONFIG: pdm_api_types::AccessControlConfig =
>> - pdm_api_types::AccessControlConfig;
>> + static ACCESS_CONTROL_CONFIG: pdm_config::AccessControlConfig = pdm_config::AccessControlConfig;
>>
>> proxmox_access_control::init::init(&ACCESS_CONTROL_CONFIG, pdm_buildcfg::configdir!("/access"))
>> .expect("failed to setup access control config");
>> --
>> 2.47.3
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
^ permalink raw reply [relevance 6%]
* Re: [pbs-devel] [PATCH proxmox-backup v3 2/4] pbs-config: cache verified API token secrets
2026-01-16 15:13 6% ` Samuel Rufinatscha
2026-01-16 15:29 5% ` Fabian Grünbichler
@ 2026-01-16 16:00 5% ` Fabian Grünbichler
2026-01-16 16:56 6% ` Samuel Rufinatscha
1 sibling, 1 reply; 200+ results
From: Fabian Grünbichler @ 2026-01-16 16:00 UTC (permalink / raw)
To: Proxmox Backup Server development discussion, Samuel Rufinatscha
Quoting Samuel Rufinatscha (2026-01-16 16:13:17)
> On 1/14/26 11:44 AM, Fabian Grünbichler wrote:
> > On January 2, 2026 5:07 pm, Samuel Rufinatscha wrote:
> >> Currently, every token-based API request reads the token.shadow file and
> >> runs the expensive password hash verification for the given token
> >> secret. This shows up as a hotspot in /status profiling (see
> >> bug #7017 [1]).
> >>
> >> This patch introduces an in-memory cache of successfully verified token
> >> secrets. Subsequent requests for the same token+secret combination only
> >> perform a comparison using openssl::memcmp::eq and avoid re-running the
> >> password hash. The cache is updated when a token secret is set and
> >> cleared when a token is deleted. Note, this does NOT include manual
> >> config changes, which will be covered in a subsequent patch.
> >>
> >> This patch is part of the series which fixes bug #7017 [1].
> >>
> >> [1] https://bugzilla.proxmox.com/show_bug.cgi?id=7017
> >>
> >> Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
> >> ---
[..]
> >> +
> >> +// Tries to match the given token secret against the cached secret.
> >> +// Checks the generation before and after the constant-time compare to avoid a
> >> +// TOCTOU window. If another process rotates/deletes a token while we're validating
> >> +// the cached secret, the generation will change, and we
> >> +// must not trust the cache for this request.
> >> +fn cache_try_secret_matches(tokenid: &Authid, secret: &str) -> bool {
> >> + let Some(cache) = TOKEN_SECRET_CACHE.try_read() else {
> >> + return false;
> >> + };
> >> + let Some(entry) = cache.secrets.get(tokenid) else {
> >> + return false;
> >> + };
> >> +
> >> + let cache_gen = cache.shared_gen;
> >> +
> >> + let Some(gen1) = token_shadow_shared_gen() else {
> >> + return false;
> >> + };
> >> + if gen1 != cache_gen {
> >> + return false;
> >> + }
> >> +
> >> + let eq = openssl::memcmp::eq(entry.secret.as_bytes(), secret.as_bytes());
> >
> > should we invalidate the cache here for this particular authid in case
> > of a mismatch, to avoid making brute forcing too easy/cheap?
> >
>
> We are not doing a cheap reject; on mismatch we still fall through to
> verify_crypt_pw(). Evicting on mismatch could, however, enable cache
> thrashing, where wrong secrets for a known tokenid would evict cached
> entries. So I think we should not invalidate on mismatch here.
forgot this part here, sorry. you are right, this *should* be okay. I do think
the second generation check there serves no purpose though. the token config
can change at any point after we've validated the secret using the old state,
there is nothing we can do about that, and it's totally fine to accept a token
that is modified at exactly the same moment, even if that same token wouldn't
be valid 2 seconds later..
there has to be a point where we have to say "this token is valid", and at the
point of memcmp here we have already:
- verified we don't need to reload the file
- verified we didn't have any API changes to the token config
- verified that the secret matches what we have cached
redoing the first two checks after that point doesn't protect us against
changes afterwards either, so we might as well not do that extra work that
doesn't give us any extra safety guarantees anyway..
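so the fast path could be reduced to a single generation check before the compare, roughly like this (a self-contained sketch with simplified types; a plain XOR fold stands in for openssl::memcmp::eq, which the real build would keep):

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicUsize, Ordering};

// Stand-in for the shared-memory token.shadow generation counter.
static SHARED_GEN: AtomicUsize = AtomicUsize::new(0);

struct Cache {
    secrets: HashMap<String, String>,
    shared_gen: usize,
}

// Stand-in for openssl::memcmp::eq (constant-time comparison).
fn ct_eq(a: &[u8], b: &[u8]) -> bool {
    a.len() == b.len() && a.iter().zip(b).fold(0u8, |acc, (x, y)| acc | (x ^ y)) == 0
}

fn cache_try_secret_matches(cache: &Cache, tokenid: &str, secret: &str) -> bool {
    let Some(entry) = cache.secrets.get(tokenid) else {
        return false;
    };
    // Single generation check: if the cached state is current *now*, the
    // compare result is as trustworthy as a full hash verification done at
    // the same instant; a concurrent rotation could invalidate either one.
    if SHARED_GEN.load(Ordering::Acquire) != cache.shared_gen {
        return false;
    }
    ct_eq(entry.as_bytes(), secret.as_bytes())
}

fn main() {
    let cache = Cache {
        secrets: HashMap::from([("user@pbs!t".to_string(), "s3cret".to_string())]),
        shared_gen: SHARED_GEN.load(Ordering::Acquire),
    };
    assert!(cache_try_secret_matches(&cache, "user@pbs!t", "s3cret"));
    assert!(!cache_try_secret_matches(&cache, "user@pbs!t", "wrong"));
    // a concurrent set/delete bumps the shared generation -> stale cache skipped
    SHARED_GEN.fetch_add(1, Ordering::AcqRel);
    assert!(!cache_try_secret_matches(&cache, "user@pbs!t", "s3cret"));
}
```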
>
> >> + let Some(gen2) = token_shadow_shared_gen() else {
> >> + return false;
> >> + };
> >> +
> >> + eq && gen2 == cache_gen
> >> +}
> >> +
> >> +fn apply_api_mutation(tokenid: &Authid, new_secret: Option<&str>) {
> >> + // Signal cache invalidation to other processes (best-effort).
> >> + let new_shared_gen = bump_token_shadow_shared_gen();
> >> +
> >> + let mut cache = TOKEN_SECRET_CACHE.write();
because I mentioned switching those two around - this actually requires more
thought I think..
right now, calling apply_api_mutation happens under a lock, but there are other
calls that bump the generation, so this is actually racy here. OTOH, bumping
the generation before locking the cache means faster cache invalidation..
maybe we should re-verify the generation after obtaining the lock? and maybe
make apply_api_mutation consume the shadow config file lock, to ensure it's
only called while that lock is being held?
> >> +
> >> + // If we cannot read/bump the shared generation, we cannot safely trust the cache.
> >> + let Some(gen) = new_shared_gen else {
> >> + invalidate_cache_state(&mut cache);
> >> + cache.shared_gen = 0;
> >> + return;
> >> + };
> >> +
> >> + // Update to the post-mutation generation.
> >> + cache.shared_gen = gen;
> >> +
> >> + // Apply the new mutation.
> >> + match new_secret {
> >> + Some(secret) => {
> >> + cache.secrets.insert(
> >> + tokenid.clone(),
> >> + CachedSecret {
> >> + secret: secret.to_owned(),
> >> + },
> >> + );
> >> + }
> >> + None => {
> >> + cache.secrets.remove(tokenid);
> >> + }
> >> + }
> >> +}
> >> +
> >> +/// Get the current shared generation.
> >> +fn token_shadow_shared_gen() -> Option<usize> {
> >> + crate::ConfigVersionCache::new()
> >> + .ok()
> >> + .map(|cvc| cvc.token_shadow_generation())
> >> +}
> >> +
> >> +/// Bump and return the new shared generation.
> >> +fn bump_token_shadow_shared_gen() -> Option<usize> {
> >> + crate::ConfigVersionCache::new()
> >> + .ok()
> >> + .map(|cvc| cvc.increase_token_shadow_generation() + 1)
> >> +}
> >> +
> >> +/// Invalidates the cache state and only keeps the shared generation.
> >
> > both calls to this actually set the cached generation to some value
> > right after, so maybe this should take a generation directly and set it?
> >
>
> patch 3/4 doesn’t always update the gen on cache invalidation
> (shadow_mtime_len() error branch in apply_api_mutation) but most other
> call sites do. Agreed this can be refactored, maybe:
>
> fn invalidate_cache_state(cache: &mut ApiTokenSecretCache) {
> cache.secrets.clear();
> // clear other cache fields (mtime/len/last_checked) as needed
> }
>
> fn invalidate_cache_state_and_set_gen(cache: &mut ApiTokenSecretCache,
> gen: usize) {
> invalidate_cache_state(cache);
> cache.shared_gen = gen;
> }
>
> We could also do a single helper with Option<usize> but two helpers make
> the call sites more explicit.
>
> >> +fn invalidate_cache_state(cache: &mut ApiTokenSecretCache) {
> >> + cache.secrets.clear();
> >> +}
> >> --
> >> 2.47.3
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
^ permalink raw reply [relevance 5%]
* Re: [pbs-devel] [PATCH proxmox-backup v3 2/4] pbs-config: cache verified API token secrets
2026-01-16 15:29 5% ` Fabian Grünbichler
@ 2026-01-16 15:33 6% ` Samuel Rufinatscha
0 siblings, 0 replies; 200+ results
From: Samuel Rufinatscha @ 2026-01-16 15:33 UTC (permalink / raw)
To: Fabian Grünbichler, Proxmox Backup Server development discussion
On 1/16/26 4:28 PM, Fabian Grünbichler wrote:
> Quoting Samuel Rufinatscha (2026-01-16 16:13:17)
>> On 1/14/26 11:44 AM, Fabian Grünbichler wrote:
>>> On January 2, 2026 5:07 pm, Samuel Rufinatscha wrote:
>>>> Currently, every token-based API request reads the token.shadow file and
>>>> runs the expensive password hash verification for the given token
>>>> secret. This shows up as a hotspot in /status profiling (see
>>>> bug #7017 [1]).
>>>>
>>>> This patch introduces an in-memory cache of successfully verified token
>>>> secrets. Subsequent requests for the same token+secret combination only
>>>> perform a comparison using openssl::memcmp::eq and avoid re-running the
>>>> password hash. The cache is updated when a token secret is set and
>>>> cleared when a token is deleted. Note, this does NOT include manual
>>>> config changes, which will be covered in a subsequent patch.
>>>>
>>>> This patch is part of the series which fixes bug #7017 [1].
>>>>
>>>> [1] https://bugzilla.proxmox.com/show_bug.cgi?id=7017
>>>>
>>>> Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
>>>> ---
>>>> Changes from v1 to v2:
>>>>
>>>> * Replace OnceCell with LazyLock, and std::sync::RwLock with
>>>> parking_lot::RwLock.
>>>> * Add API_MUTATION_GENERATION and guard cache inserts
>>>> to prevent “zombie inserts” across concurrent set/delete.
>>>> * Refactor cache operations into cache_try_secret_matches,
>>>> cache_try_insert_secret, and centralize write-side behavior in
>>>> apply_api_mutation.
>>>> * Switch fast-path cache access to try_read/try_write (best-effort).
>>>>
>>>> Changes from v2 to v3:
>>>>
>>>> * Replaced process-local cache invalidation (AtomicU64
>>>> API_MUTATION_GENERATION) with a cross-process shared generation via
>>>> ConfigVersionCache.
>>>> * Validate shared generation before/after the constant-time secret
>>>> compare; only insert into cache if the generation is unchanged.
>>>> * invalidate_cache_state() on insert if shared generation changed.
>>>>
>>>> Cargo.toml | 1 +
>>>> pbs-config/Cargo.toml | 1 +
>>>> pbs-config/src/token_shadow.rs | 157 ++++++++++++++++++++++++++++++++-
>>>> 3 files changed, 158 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/Cargo.toml b/Cargo.toml
>>>> index 1aa57ae5..821b63b7 100644
>>>> --- a/Cargo.toml
>>>> +++ b/Cargo.toml
>>>> @@ -143,6 +143,7 @@ nom = "7"
>>>> num-traits = "0.2"
>>>> once_cell = "1.3.1"
>>>> openssl = "0.10.40"
>>>> +parking_lot = "0.12"
>>>> percent-encoding = "2.1"
>>>> pin-project-lite = "0.2"
>>>> regex = "1.5.5"
>>>> diff --git a/pbs-config/Cargo.toml b/pbs-config/Cargo.toml
>>>> index 74afb3c6..eb81ce00 100644
>>>> --- a/pbs-config/Cargo.toml
>>>> +++ b/pbs-config/Cargo.toml
>>>> @@ -13,6 +13,7 @@ libc.workspace = true
>>>> nix.workspace = true
>>>> once_cell.workspace = true
>>>> openssl.workspace = true
>>>> +parking_lot.workspace = true
>>>> regex.workspace = true
>>>> serde.workspace = true
>>>> serde_json.workspace = true
>>>> diff --git a/pbs-config/src/token_shadow.rs b/pbs-config/src/token_shadow.rs
>>>> index 640fabbf..fa84aee5 100644
>>>> --- a/pbs-config/src/token_shadow.rs
>>>> +++ b/pbs-config/src/token_shadow.rs
>>>> @@ -1,6 +1,8 @@
>>>> use std::collections::HashMap;
>>>> +use std::sync::LazyLock;
>>>>
>>>> use anyhow::{bail, format_err, Error};
>>>> +use parking_lot::RwLock;
>>>> use serde::{Deserialize, Serialize};
>>>> use serde_json::{from_value, Value};
>>>>
>>>> @@ -13,6 +15,18 @@ use crate::{open_backup_lockfile, BackupLockGuard};
>>>> const LOCK_FILE: &str = pbs_buildcfg::configdir!("/token.shadow.lock");
>>>> const CONF_FILE: &str = pbs_buildcfg::configdir!("/token.shadow");
>>>>
>>>> +/// Global in-memory cache for successfully verified API token secrets.
>>>> +/// The cache stores plain text secrets for token Authids that have already been
>>>> +/// verified against the hashed values in `token.shadow`. This allows for cheap
>>>> +/// subsequent authentications for the same token+secret combination, avoiding
>>>> +/// recomputing the password hash on every request.
>>>> +static TOKEN_SECRET_CACHE: LazyLock<RwLock<ApiTokenSecretCache>> = LazyLock::new(|| {
>>>> + RwLock::new(ApiTokenSecretCache {
>>>> + secrets: HashMap::new(),
>>>> + shared_gen: 0,
>>>> + })
>>>> +});
>>>> +
>>>> #[derive(Serialize, Deserialize)]
>>>> #[serde(rename_all = "kebab-case")]
>>>> /// ApiToken id / secret pair
>>>> @@ -54,9 +68,27 @@ pub fn verify_secret(tokenid: &Authid, secret: &str) -> Result<(), Error> {
>>>> bail!("not an API token ID");
>>>> }
>>>>
>>>> + // Fast path
>>>> + if cache_try_secret_matches(tokenid, secret) {
>>>> + return Ok(());
>>>> + }
>>>> +
>>>> + // Slow path
>>>> + // First, capture the shared generation before doing the hash verification.
>>>> + let gen_before = token_shadow_shared_gen();
>>>> +
>>>> let data = read_file()?;
>>>> match data.get(tokenid) {
>>>> - Some(hashed_secret) => proxmox_sys::crypt::verify_crypt_pw(secret, hashed_secret),
>>>> + Some(hashed_secret) => {
>>>> + proxmox_sys::crypt::verify_crypt_pw(secret, hashed_secret)?;
>>>> +
>>>> + // Try to cache only if nothing changed while verifying the secret.
>>>> + if let Some(gen) = gen_before {
>>>> + cache_try_insert_secret(tokenid.clone(), secret.to_owned(), gen);
>>>> + }
>>>> +
>>>> + Ok(())
>>>> + }
>>>> None => bail!("invalid API token"),
>>>> }
>>>> }
>>>> @@ -82,6 +114,8 @@ fn set_secret(tokenid: &Authid, secret: &str) -> Result<(), Error> {
>>>> data.insert(tokenid.clone(), hashed_secret);
>>>> write_file(data)?;
>>>>
>>>> + apply_api_mutation(tokenid, Some(secret));
>>>> +
>>>> Ok(())
>>>> }
>>>>
>>>> @@ -97,5 +131,126 @@ pub fn delete_secret(tokenid: &Authid) -> Result<(), Error> {
>>>> data.remove(tokenid);
>>>> write_file(data)?;
>>>>
>>>> + apply_api_mutation(tokenid, None);
>>>> +
>>>> Ok(())
>>>> }
>>>> +
>>>> +struct ApiTokenSecretCache {
>>>> + /// Keys are token Authids, values are the corresponding plain text secrets.
>>>> + /// Entries are added after a successful on-disk verification in
>>>> + /// `verify_secret` or when a new token secret is generated by
>>>> + /// `generate_and_set_secret`. Used to avoid repeated
>>>> + /// password-hash computation on subsequent authentications.
>>>> + secrets: HashMap<Authid, CachedSecret>,
>>>> + /// Shared generation to detect mutations of the underlying token.shadow file.
>>>> + shared_gen: usize,
>>>> +}
>>>> +
>>>> +/// Cached secret.
>>>> +struct CachedSecret {
>>>> + secret: String,
>>>> +}
>>>> +
>>>> +fn cache_try_insert_secret(tokenid: Authid, secret: String, shared_gen_before: usize) {
>>>> + let Some(mut cache) = TOKEN_SECRET_CACHE.try_write() else {
>>>> + return;
>>>> + };
>>>> +
>>>> + let Some(shared_gen_now) = token_shadow_shared_gen() else {
>>>> + return;
>>>> + };
>>>> +
>>>> + // If this process missed a generation bump, its cache is stale.
>>>> + if cache.shared_gen != shared_gen_now {
>>>> + invalidate_cache_state(&mut cache);
>>>> + cache.shared_gen = shared_gen_now;
>>>> + }
>>>> +
>>>> + // If a mutation happened while we were verifying the secret, do not insert.
>>>> + if shared_gen_now == shared_gen_before {
>>>> + cache.secrets.insert(tokenid, CachedSecret { secret });
>>>> + }
>>>> +}
>>>> +
>>>> +// Tries to match the given token secret against the cached secret.
>>>> +// Checks the generation before and after the constant-time compare to avoid a
>>>> +// TOCTOU window. If another process rotates/deletes a token while we're validating
>>>> +// the cached secret, the generation will change, and we
>>>> +// must not trust the cache for this request.
>>>> +fn cache_try_secret_matches(tokenid: &Authid, secret: &str) -> bool {
>>>> + let Some(cache) = TOKEN_SECRET_CACHE.try_read() else {
>>>> + return false;
>>>> + };
>>>> + let Some(entry) = cache.secrets.get(tokenid) else {
>>>> + return false;
>>>> + };
>>>> +
>>>> + let cache_gen = cache.shared_gen;
>>>> +
>>>> + let Some(gen1) = token_shadow_shared_gen() else {
>>>> + return false;
>>>> + };
>>>> + if gen1 != cache_gen {
>>>> + return false;
>>>> + }
>>>> +
>>>> + let eq = openssl::memcmp::eq(entry.secret.as_bytes(), secret.as_bytes());
>>>
>>> should we invalidate the cache here for this particular authid in case
>>> of a mismatch, to avoid making brute forcing too easy/cheap?
>>>
>>
>> We are not doing a cheap reject, in mismatch we do still fall through to
>> verify_crypt_pw(). Evicting on mismatch could however enable cache
>> thrashing where wrong secrets for a known tokenid would evict cached
>> entries. So I think we should not invalidate here on mismatch.
>>
>>>> + let Some(gen2) = token_shadow_shared_gen() else {
>>>> + return false;
>>>> + };
>>>> +
>>>> + eq && gen2 == cache_gen
>>>> +}
>>>> +
>>>> +fn apply_api_mutation(tokenid: &Authid, new_secret: Option<&str>) {
>>>> + // Signal cache invalidation to other processes (best-effort).
>>>> + let new_shared_gen = bump_token_shadow_shared_gen();
>>>> +
>>>> + let mut cache = TOKEN_SECRET_CACHE.write();
>>>> +
>>>> + // If we cannot read/bump the shared generation, we cannot safely trust the cache.
>>>> + let Some(gen) = new_shared_gen else {
>>>> + invalidate_cache_state(&mut cache);
>>>> + cache.shared_gen = 0;
>>>> + return;
>>>> + };
>>>> +
>>>> + // Update to the post-mutation generation.
>>>> + cache.shared_gen = gen;
>>>> +
>>>> + // Apply the new mutation.
>>>> + match new_secret {
>>>> + Some(secret) => {
>>>> + cache.secrets.insert(
>>>> + tokenid.clone(),
>>>> + CachedSecret {
>>>> + secret: secret.to_owned(),
>>>> + },
>>>> + );
>>>> + }
>>>> + None => {
>>>> + cache.secrets.remove(tokenid);
>>>> + }
>>>> + }
>>>> +}
>>>> +
>>>> +/// Get the current shared generation.
>>>> +fn token_shadow_shared_gen() -> Option<usize> {
>>>> + crate::ConfigVersionCache::new()
>>>> + .ok()
>>>> + .map(|cvc| cvc.token_shadow_generation())
>>>> +}
>>>> +
>>>> +/// Bump and return the new shared generation.
>>>> +fn bump_token_shadow_shared_gen() -> Option<usize> {
>>>> + crate::ConfigVersionCache::new()
>>>> + .ok()
>>>> + .map(|cvc| cvc.increase_token_shadow_generation() + 1)
>>>> +}
>>>> +
>>>> +/// Invalidates the cache state and only keeps the shared generation.
>>>
>>> both calls to this actually set the cached generation to some value
>>> right after, so maybe this should take a generation directly and set it?
>>>
>>
>> patch 3/4 doesn’t always update the gen on cache invalidation
>> (shadow_mtime_len() error branch in apply_api_mutation) but most other
>> call sites do. Agreed this can be refactored, maybe:
>
> that one sets the generation before (potentially) invalidating the cache
> though, so we could unconditionally reset the generation to that value when
> invalidating.. we should maybe also re-order the lock and bump there?
>
Good point, I will check this! Thanks, Fabian! :)
>>
>> fn invalidate_cache_state(cache: &mut ApiTokenSecretCache) {
>> cache.secrets.clear();
>> // clear other cache fields (mtime/len/last_checked) as needed
>> }
>>
>> fn invalidate_cache_state_and_set_gen(cache: &mut ApiTokenSecretCache,
>> gen: usize) {
>> invalidate_cache_state(cache);
>> cache.shared_gen = gen;
>> }
>>
>> We could also do a single helper with Option<usize> but two helpers make
>> the call sites more explicit.
>>
>>>> +fn invalidate_cache_state(cache: &mut ApiTokenSecretCache) {
>>>> + cache.secrets.clear();
>>>> +}
>>>> --
>>>> 2.47.3
>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> pbs-devel mailing list
>>>> pbs-devel@lists.proxmox.com
>>>> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
>>>>
>>>
>>>
>>> _______________________________________________
>>> pbs-devel mailing list
>>> pbs-devel@lists.proxmox.com
>>> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
>>
>>
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
^ permalink raw reply [relevance 6%]
* Re: [pbs-devel] [PATCH proxmox-backup v3 2/4] pbs-config: cache verified API token secrets
2026-01-16 15:13 6% ` Samuel Rufinatscha
@ 2026-01-16 15:29 5% ` Fabian Grünbichler
2026-01-16 15:33 6% ` Samuel Rufinatscha
2026-01-16 16:00 5% ` Fabian Grünbichler
1 sibling, 1 reply; 200+ results
From: Fabian Grünbichler @ 2026-01-16 15:29 UTC (permalink / raw)
To: Proxmox Backup Server development discussion, Samuel Rufinatscha
Quoting Samuel Rufinatscha (2026-01-16 16:13:17)
> On 1/14/26 11:44 AM, Fabian Grünbichler wrote:
> > On January 2, 2026 5:07 pm, Samuel Rufinatscha wrote:
> >> Currently, every token-based API request reads the token.shadow file and
> >> runs the expensive password hash verification for the given token
> >> secret. This shows up as a hotspot in /status profiling (see
> >> bug #7017 [1]).
> >>
> >> This patch introduces an in-memory cache of successfully verified token
> >> secrets. Subsequent requests for the same token+secret combination only
> >> perform a comparison using openssl::memcmp::eq and avoid re-running the
> >> password hash. The cache is updated when a token secret is set and
> >> cleared when a token is deleted. Note, this does NOT include manual
> >> config changes, which will be covered in a subsequent patch.
> >>
> >> This patch is part of the series which fixes bug #7017 [1].
> >>
> >> [1] https://bugzilla.proxmox.com/show_bug.cgi?id=7017
> >>
> >> Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
> >> ---
> >> Changes from v1 to v2:
> >>
> >> * Replace OnceCell with LazyLock, and std::sync::RwLock with
> >> parking_lot::RwLock.
> >> * Add API_MUTATION_GENERATION and guard cache inserts
> >> to prevent “zombie inserts” across concurrent set/delete.
> >> * Refactor cache operations into cache_try_secret_matches,
> >> cache_try_insert_secret, and centralize write-side behavior in
> >> apply_api_mutation.
> >> * Switch fast-path cache access to try_read/try_write (best-effort).
> >>
> >> Changes from v2 to v3:
> >>
> >> * Replaced process-local cache invalidation (AtomicU64
> >> API_MUTATION_GENERATION) with a cross-process shared generation via
> >> ConfigVersionCache.
> >> * Validate shared generation before/after the constant-time secret
> >> compare; only insert into cache if the generation is unchanged.
> >> * invalidate_cache_state() on insert if shared generation changed.
> >>
> >> Cargo.toml | 1 +
> >> pbs-config/Cargo.toml | 1 +
> >> pbs-config/src/token_shadow.rs | 157 ++++++++++++++++++++++++++++++++-
> >> 3 files changed, 158 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/Cargo.toml b/Cargo.toml
> >> index 1aa57ae5..821b63b7 100644
> >> --- a/Cargo.toml
> >> +++ b/Cargo.toml
> >> @@ -143,6 +143,7 @@ nom = "7"
> >> num-traits = "0.2"
> >> once_cell = "1.3.1"
> >> openssl = "0.10.40"
> >> +parking_lot = "0.12"
> >> percent-encoding = "2.1"
> >> pin-project-lite = "0.2"
> >> regex = "1.5.5"
> >> diff --git a/pbs-config/Cargo.toml b/pbs-config/Cargo.toml
> >> index 74afb3c6..eb81ce00 100644
> >> --- a/pbs-config/Cargo.toml
> >> +++ b/pbs-config/Cargo.toml
> >> @@ -13,6 +13,7 @@ libc.workspace = true
> >> nix.workspace = true
> >> once_cell.workspace = true
> >> openssl.workspace = true
> >> +parking_lot.workspace = true
> >> regex.workspace = true
> >> serde.workspace = true
> >> serde_json.workspace = true
> >> diff --git a/pbs-config/src/token_shadow.rs b/pbs-config/src/token_shadow.rs
> >> index 640fabbf..fa84aee5 100644
> >> --- a/pbs-config/src/token_shadow.rs
> >> +++ b/pbs-config/src/token_shadow.rs
> >> @@ -1,6 +1,8 @@
> >> use std::collections::HashMap;
> >> +use std::sync::LazyLock;
> >>
> >> use anyhow::{bail, format_err, Error};
> >> +use parking_lot::RwLock;
> >> use serde::{Deserialize, Serialize};
> >> use serde_json::{from_value, Value};
> >>
> >> @@ -13,6 +15,18 @@ use crate::{open_backup_lockfile, BackupLockGuard};
> >> const LOCK_FILE: &str = pbs_buildcfg::configdir!("/token.shadow.lock");
> >> const CONF_FILE: &str = pbs_buildcfg::configdir!("/token.shadow");
> >>
> >> +/// Global in-memory cache for successfully verified API token secrets.
> >> +/// The cache stores plain text secrets for token Authids that have already been
> >> +/// verified against the hashed values in `token.shadow`. This allows for cheap
> >> +/// subsequent authentications for the same token+secret combination, avoiding
> >> +/// recomputing the password hash on every request.
> >> +static TOKEN_SECRET_CACHE: LazyLock<RwLock<ApiTokenSecretCache>> = LazyLock::new(|| {
> >> + RwLock::new(ApiTokenSecretCache {
> >> + secrets: HashMap::new(),
> >> + shared_gen: 0,
> >> + })
> >> +});
> >> +
> >> #[derive(Serialize, Deserialize)]
> >> #[serde(rename_all = "kebab-case")]
> >> /// ApiToken id / secret pair
> >> @@ -54,9 +68,27 @@ pub fn verify_secret(tokenid: &Authid, secret: &str) -> Result<(), Error> {
> >> bail!("not an API token ID");
> >> }
> >>
> >> + // Fast path
> >> + if cache_try_secret_matches(tokenid, secret) {
> >> + return Ok(());
> >> + }
> >> +
> >> + // Slow path
> >> + // First, capture the shared generation before doing the hash verification.
> >> + let gen_before = token_shadow_shared_gen();
> >> +
> >> let data = read_file()?;
> >> match data.get(tokenid) {
> >> - Some(hashed_secret) => proxmox_sys::crypt::verify_crypt_pw(secret, hashed_secret),
> >> + Some(hashed_secret) => {
> >> + proxmox_sys::crypt::verify_crypt_pw(secret, hashed_secret)?;
> >> +
> >> + // Try to cache only if nothing changed while verifying the secret.
> >> + if let Some(gen) = gen_before {
> >> + cache_try_insert_secret(tokenid.clone(), secret.to_owned(), gen);
> >> + }
> >> +
> >> + Ok(())
> >> + }
> >> None => bail!("invalid API token"),
> >> }
> >> }
> >> @@ -82,6 +114,8 @@ fn set_secret(tokenid: &Authid, secret: &str) -> Result<(), Error> {
> >> data.insert(tokenid.clone(), hashed_secret);
> >> write_file(data)?;
> >>
> >> + apply_api_mutation(tokenid, Some(secret));
> >> +
> >> Ok(())
> >> }
> >>
> >> @@ -97,5 +131,126 @@ pub fn delete_secret(tokenid: &Authid) -> Result<(), Error> {
> >> data.remove(tokenid);
> >> write_file(data)?;
> >>
> >> + apply_api_mutation(tokenid, None);
> >> +
> >> Ok(())
> >> }
> >> +
> >> +struct ApiTokenSecretCache {
> >> + /// Keys are token Authids, values are the corresponding plain text secrets.
> >> + /// Entries are added after a successful on-disk verification in
> >> + /// `verify_secret` or when a new token secret is generated by
> >> + /// `generate_and_set_secret`. Used to avoid repeated
> >> + /// password-hash computation on subsequent authentications.
> >> + secrets: HashMap<Authid, CachedSecret>,
> >> + /// Shared generation to detect mutations of the underlying token.shadow file.
> >> + shared_gen: usize,
> >> +}
> >> +
> >> +/// Cached secret.
> >> +struct CachedSecret {
> >> + secret: String,
> >> +}
> >> +
> >> +fn cache_try_insert_secret(tokenid: Authid, secret: String, shared_gen_before: usize) {
> >> + let Some(mut cache) = TOKEN_SECRET_CACHE.try_write() else {
> >> + return;
> >> + };
> >> +
> >> + let Some(shared_gen_now) = token_shadow_shared_gen() else {
> >> + return;
> >> + };
> >> +
> >> + // If this process missed a generation bump, its cache is stale.
> >> + if cache.shared_gen != shared_gen_now {
> >> + invalidate_cache_state(&mut cache);
> >> + cache.shared_gen = shared_gen_now;
> >> + }
> >> +
> >> + // If a mutation happened while we were verifying the secret, do not insert.
> >> + if shared_gen_now == shared_gen_before {
> >> + cache.secrets.insert(tokenid, CachedSecret { secret });
> >> + }
> >> +}
> >> +
> >> +// Tries to match the given token secret against the cached secret.
> >> +// Checks the generation before and after the constant-time compare to avoid a
> >> +// TOCTOU window. If another process rotates/deletes a token while we're validating
> >> +// the cached secret, the generation will change, and we
> >> +// must not trust the cache for this request.
> >> +fn cache_try_secret_matches(tokenid: &Authid, secret: &str) -> bool {
> >> + let Some(cache) = TOKEN_SECRET_CACHE.try_read() else {
> >> + return false;
> >> + };
> >> + let Some(entry) = cache.secrets.get(tokenid) else {
> >> + return false;
> >> + };
> >> +
> >> + let cache_gen = cache.shared_gen;
> >> +
> >> + let Some(gen1) = token_shadow_shared_gen() else {
> >> + return false;
> >> + };
> >> + if gen1 != cache_gen {
> >> + return false;
> >> + }
> >> +
> >> + let eq = openssl::memcmp::eq(entry.secret.as_bytes(), secret.as_bytes());
> >
> > should we invalidate the cache here for this particular authid in case
> > of a mismatch, to avoid making brute forcing too easy/cheap?
> >
>
> We are not doing a cheap reject, in mismatch we do still fall through to
> verify_crypt_pw(). Evicting on mismatch could however enable cache
> thrashing where wrong secrets for a known tokenid would evict cached
> entries. So I think we should not invalidate here on mismatch.
>
> >> + let Some(gen2) = token_shadow_shared_gen() else {
> >> + return false;
> >> + };
> >> +
> >> + eq && gen2 == cache_gen
> >> +}
> >> +
> >> +fn apply_api_mutation(tokenid: &Authid, new_secret: Option<&str>) {
> >> + // Signal cache invalidation to other processes (best-effort).
> >> + let new_shared_gen = bump_token_shadow_shared_gen();
> >> +
> >> + let mut cache = TOKEN_SECRET_CACHE.write();
> >> +
> >> + // If we cannot read/bump the shared generation, we cannot safely trust the cache.
> >> + let Some(gen) = new_shared_gen else {
> >> + invalidate_cache_state(&mut cache);
> >> + cache.shared_gen = 0;
> >> + return;
> >> + };
> >> +
> >> + // Update to the post-mutation generation.
> >> + cache.shared_gen = gen;
> >> +
> >> + // Apply the new mutation.
> >> + match new_secret {
> >> + Some(secret) => {
> >> + cache.secrets.insert(
> >> + tokenid.clone(),
> >> + CachedSecret {
> >> + secret: secret.to_owned(),
> >> + },
> >> + );
> >> + }
> >> + None => {
> >> + cache.secrets.remove(tokenid);
> >> + }
> >> + }
> >> +}
> >> +
> >> +/// Get the current shared generation.
> >> +fn token_shadow_shared_gen() -> Option<usize> {
> >> + crate::ConfigVersionCache::new()
> >> + .ok()
> >> + .map(|cvc| cvc.token_shadow_generation())
> >> +}
> >> +
> >> +/// Bump and return the new shared generation.
> >> +fn bump_token_shadow_shared_gen() -> Option<usize> {
> >> + crate::ConfigVersionCache::new()
> >> + .ok()
> >> + .map(|cvc| cvc.increase_token_shadow_generation() + 1)
> >> +}
> >> +
> >> +/// Invalidates the cache state and only keeps the shared generation.
> >
> > both calls to this actually set the cached generation to some value
> > right after, so maybe this should take a generation directly and set it?
> >
>
> patch 3/4 doesn’t always update the gen on cache invalidation
> (shadow_mtime_len() error branch in apply_api_mutation) but most other
> call sites do. Agreed this can be refactored, maybe:
that one sets the generation before (potentially) invalidating the cache
though, so we could unconditionally reset the generation to that value when
invalidating.. we should maybe also re-order the lock and bump there?
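
Under simplified, hypothetical stand-ins (std's RwLock and an AtomicUsize instead of parking_lot and the shmem-backed ConfigVersionCache, String keys instead of Authid), the re-ordered write side suggested here could look roughly like:

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{LazyLock, RwLock};

// Stand-in for the shmem-backed ConfigVersionCache generation counter.
static SHARED_GEN: AtomicUsize = AtomicUsize::new(0);

struct Cache {
    secrets: HashMap<String, String>, // simplified: real keys are Authid
    shared_gen: usize,
}

static CACHE: LazyLock<RwLock<Cache>> = LazyLock::new(|| {
    RwLock::new(Cache {
        secrets: HashMap::new(),
        shared_gen: 0,
    })
});

fn apply_api_mutation(tokenid: &str, new_secret: Option<&str>) {
    // Re-ordered: take the write lock *before* bumping the shared
    // generation, so the cached generation is always updated in the same
    // critical section as the bump itself.
    let mut cache = CACHE.write().unwrap();
    cache.shared_gen = SHARED_GEN.fetch_add(1, Ordering::SeqCst) + 1;
    match new_secret {
        Some(secret) => {
            cache.secrets.insert(tokenid.to_owned(), secret.to_owned());
        }
        None => {
            cache.secrets.remove(tokenid);
        }
    }
}

fn main() {
    apply_api_mutation("user@pam!t1", Some("s3cret"));
    let cache = CACHE.read().unwrap();
    assert_eq!(
        cache.secrets.get("user@pam!t1").map(String::as_str),
        Some("s3cret")
    );
    println!("ok");
}
```

This is only a sketch of the ordering, not the real error handling around a failed generation bump.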
>
> fn invalidate_cache_state(cache: &mut ApiTokenSecretCache) {
> cache.secrets.clear();
> // clear other cache fields (mtime/len/last_checked) as needed
> }
>
> fn invalidate_cache_state_and_set_gen(cache: &mut ApiTokenSecretCache,
> gen: usize) {
> invalidate_cache_state(cache);
> cache.shared_gen = gen;
> }
>
> We could also do a single helper with Option<usize> but two helpers make
> the call sites more explicit.
>
> >> +fn invalidate_cache_state(cache: &mut ApiTokenSecretCache) {
> >> + cache.secrets.clear();
> >> +}
> >> --
> >> 2.47.3
> >>
> >>
> >>
> >> _______________________________________________
> >> pbs-devel mailing list
> >> pbs-devel@lists.proxmox.com
> >> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
> >>
> >
> >
> > _______________________________________________
> > pbs-devel mailing list
> > pbs-devel@lists.proxmox.com
> > https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
>
>
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
^ permalink raw reply [relevance 5%]
* Re: [pbs-devel] [PATCH proxmox-backup v3 2/4] pbs-config: cache verified API token secrets
2026-01-14 10:44 5% ` Fabian Grünbichler
@ 2026-01-16 15:13 6% ` Samuel Rufinatscha
2026-01-16 15:29 5% ` Fabian Grünbichler
2026-01-16 16:00 5% ` Fabian Grünbichler
0 siblings, 2 replies; 200+ results
From: Samuel Rufinatscha @ 2026-01-16 15:13 UTC (permalink / raw)
To: Proxmox Backup Server development discussion, Fabian Grünbichler
On 1/14/26 11:44 AM, Fabian Grünbichler wrote:
> On January 2, 2026 5:07 pm, Samuel Rufinatscha wrote:
>> Currently, every token-based API request reads the token.shadow file and
>> runs the expensive password hash verification for the given token
>> secret. This shows up as a hotspot in /status profiling (see
>> bug #7017 [1]).
>>
>> This patch introduces an in-memory cache of successfully verified token
>> secrets. Subsequent requests for the same token+secret combination only
>> perform a comparison using openssl::memcmp::eq and avoid re-running the
>> password hash. The cache is updated when a token secret is set and
>> cleared when a token is deleted. Note, this does NOT include manual
>> config changes, which will be covered in a subsequent patch.
>>
>> This patch is part of the series which fixes bug #7017 [1].
>>
>> [1] https://bugzilla.proxmox.com/show_bug.cgi?id=7017
>>
>> Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
>> ---
>> Changes from v1 to v2:
>>
>> * Replace OnceCell with LazyLock, and std::sync::RwLock with
>> parking_lot::RwLock.
>> * Add API_MUTATION_GENERATION and guard cache inserts
>> to prevent “zombie inserts” across concurrent set/delete.
>> * Refactor cache operations into cache_try_secret_matches,
>> cache_try_insert_secret, and centralize write-side behavior in
>> apply_api_mutation.
>> * Switch fast-path cache access to try_read/try_write (best-effort).
>>
>> Changes from v2 to v3:
>>
>> * Replaced process-local cache invalidation (AtomicU64
>> API_MUTATION_GENERATION) with a cross-process shared generation via
>> ConfigVersionCache.
>> * Validate shared generation before/after the constant-time secret
>> compare; only insert into cache if the generation is unchanged.
>> * invalidate_cache_state() on insert if shared generation changed.
>>
>> Cargo.toml | 1 +
>> pbs-config/Cargo.toml | 1 +
>> pbs-config/src/token_shadow.rs | 157 ++++++++++++++++++++++++++++++++-
>> 3 files changed, 158 insertions(+), 1 deletion(-)
>>
>> diff --git a/Cargo.toml b/Cargo.toml
>> index 1aa57ae5..821b63b7 100644
>> --- a/Cargo.toml
>> +++ b/Cargo.toml
>> @@ -143,6 +143,7 @@ nom = "7"
>> num-traits = "0.2"
>> once_cell = "1.3.1"
>> openssl = "0.10.40"
>> +parking_lot = "0.12"
>> percent-encoding = "2.1"
>> pin-project-lite = "0.2"
>> regex = "1.5.5"
>> diff --git a/pbs-config/Cargo.toml b/pbs-config/Cargo.toml
>> index 74afb3c6..eb81ce00 100644
>> --- a/pbs-config/Cargo.toml
>> +++ b/pbs-config/Cargo.toml
>> @@ -13,6 +13,7 @@ libc.workspace = true
>> nix.workspace = true
>> once_cell.workspace = true
>> openssl.workspace = true
>> +parking_lot.workspace = true
>> regex.workspace = true
>> serde.workspace = true
>> serde_json.workspace = true
>> diff --git a/pbs-config/src/token_shadow.rs b/pbs-config/src/token_shadow.rs
>> index 640fabbf..fa84aee5 100644
>> --- a/pbs-config/src/token_shadow.rs
>> +++ b/pbs-config/src/token_shadow.rs
>> @@ -1,6 +1,8 @@
>> use std::collections::HashMap;
>> +use std::sync::LazyLock;
>>
>> use anyhow::{bail, format_err, Error};
>> +use parking_lot::RwLock;
>> use serde::{Deserialize, Serialize};
>> use serde_json::{from_value, Value};
>>
>> @@ -13,6 +15,18 @@ use crate::{open_backup_lockfile, BackupLockGuard};
>> const LOCK_FILE: &str = pbs_buildcfg::configdir!("/token.shadow.lock");
>> const CONF_FILE: &str = pbs_buildcfg::configdir!("/token.shadow");
>>
>> +/// Global in-memory cache for successfully verified API token secrets.
>> +/// The cache stores plain text secrets for token Authids that have already been
>> +/// verified against the hashed values in `token.shadow`. This allows for cheap
>> +/// subsequent authentications for the same token+secret combination, avoiding
>> +/// recomputing the password hash on every request.
>> +static TOKEN_SECRET_CACHE: LazyLock<RwLock<ApiTokenSecretCache>> = LazyLock::new(|| {
>> + RwLock::new(ApiTokenSecretCache {
>> + secrets: HashMap::new(),
>> + shared_gen: 0,
>> + })
>> +});
>> +
>> #[derive(Serialize, Deserialize)]
>> #[serde(rename_all = "kebab-case")]
>> /// ApiToken id / secret pair
>> @@ -54,9 +68,27 @@ pub fn verify_secret(tokenid: &Authid, secret: &str) -> Result<(), Error> {
>> bail!("not an API token ID");
>> }
>>
>> + // Fast path
>> + if cache_try_secret_matches(tokenid, secret) {
>> + return Ok(());
>> + }
>> +
>> + // Slow path
>> + // First, capture the shared generation before doing the hash verification.
>> + let gen_before = token_shadow_shared_gen();
>> +
>> let data = read_file()?;
>> match data.get(tokenid) {
>> - Some(hashed_secret) => proxmox_sys::crypt::verify_crypt_pw(secret, hashed_secret),
>> + Some(hashed_secret) => {
>> + proxmox_sys::crypt::verify_crypt_pw(secret, hashed_secret)?;
>> +
>> + // Try to cache only if nothing changed while verifying the secret.
>> + if let Some(gen) = gen_before {
>> + cache_try_insert_secret(tokenid.clone(), secret.to_owned(), gen);
>> + }
>> +
>> + Ok(())
>> + }
>> None => bail!("invalid API token"),
>> }
>> }
>> @@ -82,6 +114,8 @@ fn set_secret(tokenid: &Authid, secret: &str) -> Result<(), Error> {
>> data.insert(tokenid.clone(), hashed_secret);
>> write_file(data)?;
>>
>> + apply_api_mutation(tokenid, Some(secret));
>> +
>> Ok(())
>> }
>>
>> @@ -97,5 +131,126 @@ pub fn delete_secret(tokenid: &Authid) -> Result<(), Error> {
>> data.remove(tokenid);
>> write_file(data)?;
>>
>> + apply_api_mutation(tokenid, None);
>> +
>> Ok(())
>> }
>> +
>> +struct ApiTokenSecretCache {
>> + /// Keys are token Authids, values are the corresponding plain text secrets.
>> + /// Entries are added after a successful on-disk verification in
>> + /// `verify_secret` or when a new token secret is generated by
>> + /// `generate_and_set_secret`. Used to avoid repeated
>> + /// password-hash computation on subsequent authentications.
>> + secrets: HashMap<Authid, CachedSecret>,
>> + /// Shared generation to detect mutations of the underlying token.shadow file.
>> + shared_gen: usize,
>> +}
>> +
>> +/// Cached secret.
>> +struct CachedSecret {
>> + secret: String,
>> +}
>> +
>> +fn cache_try_insert_secret(tokenid: Authid, secret: String, shared_gen_before: usize) {
>> + let Some(mut cache) = TOKEN_SECRET_CACHE.try_write() else {
>> + return;
>> + };
>> +
>> + let Some(shared_gen_now) = token_shadow_shared_gen() else {
>> + return;
>> + };
>> +
>> + // If this process missed a generation bump, its cache is stale.
>> + if cache.shared_gen != shared_gen_now {
>> + invalidate_cache_state(&mut cache);
>> + cache.shared_gen = shared_gen_now;
>> + }
>> +
>> + // If a mutation happened while we were verifying the secret, do not insert.
>> + if shared_gen_now == shared_gen_before {
>> + cache.secrets.insert(tokenid, CachedSecret { secret });
>> + }
>> +}
>> +
>> +// Tries to match the given token secret against the cached secret.
>> +// Checks the generation before and after the constant-time compare to avoid a
>> +// TOCTOU window. If another process rotates/deletes a token while we're validating
>> +// the cached secret, the generation will change, and we
>> +// must not trust the cache for this request.
>> +fn cache_try_secret_matches(tokenid: &Authid, secret: &str) -> bool {
>> + let Some(cache) = TOKEN_SECRET_CACHE.try_read() else {
>> + return false;
>> + };
>> + let Some(entry) = cache.secrets.get(tokenid) else {
>> + return false;
>> + };
>> +
>> + let cache_gen = cache.shared_gen;
>> +
>> + let Some(gen1) = token_shadow_shared_gen() else {
>> + return false;
>> + };
>> + if gen1 != cache_gen {
>> + return false;
>> + }
>> +
>> + let eq = openssl::memcmp::eq(entry.secret.as_bytes(), secret.as_bytes());
>
> should we invalidate the cache here for this particular authid in case
> of a mismatch, to avoid making brute forcing too easy/cheap?
>
We are not doing a cheap reject: on a mismatch we still fall through to
verify_crypt_pw(). Evicting on mismatch could, however, enable cache
thrashing, where wrong secrets for a known tokenid would evict cached
entries. So I think we should not invalidate here on mismatch.
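
To illustrate the non-eviction argument with a simplified, hypothetical model (plain String keys and a plain byte compare standing in for Authid and openssl::memcmp::eq): a mismatch only misses the fast path, it never removes the cached entry, so wrong guesses for a known tokenid cannot thrash a legitimate entry out of the cache.

```rust
use std::collections::HashMap;

// Simplified fast path: return false on mismatch (caller falls through to
// the expensive slow-path hash verification), but keep the entry.
fn cache_try_secret_matches(
    cache: &HashMap<String, String>,
    tokenid: &str,
    secret: &str,
) -> bool {
    match cache.get(tokenid) {
        // real code uses the constant-time openssl::memcmp::eq here
        Some(cached) => cached.as_bytes() == secret.as_bytes(),
        None => false,
    }
}

fn main() {
    let mut cache = HashMap::new();
    cache.insert("user@pam!t1".to_owned(), "correct".to_owned());
    // A wrong guess misses the fast path but does not evict the entry ...
    assert!(!cache_try_secret_matches(&cache, "user@pam!t1", "wrong"));
    // ... so the legitimate secret still hits the fast path afterwards.
    assert!(cache_try_secret_matches(&cache, "user@pam!t1", "correct"));
    println!("ok");
}
```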
>> + let Some(gen2) = token_shadow_shared_gen() else {
>> + return false;
>> + };
>> +
>> + eq && gen2 == cache_gen
>> +}
>> +
>> +fn apply_api_mutation(tokenid: &Authid, new_secret: Option<&str>) {
>> + // Signal cache invalidation to other processes (best-effort).
>> + let new_shared_gen = bump_token_shadow_shared_gen();
>> +
>> + let mut cache = TOKEN_SECRET_CACHE.write();
>> +
>> + // If we cannot read/bump the shared generation, we cannot safely trust the cache.
>> + let Some(gen) = new_shared_gen else {
>> + invalidate_cache_state(&mut cache);
>> + cache.shared_gen = 0;
>> + return;
>> + };
>> +
>> + // Update to the post-mutation generation.
>> + cache.shared_gen = gen;
>> +
>> + // Apply the new mutation.
>> + match new_secret {
>> + Some(secret) => {
>> + cache.secrets.insert(
>> + tokenid.clone(),
>> + CachedSecret {
>> + secret: secret.to_owned(),
>> + },
>> + );
>> + }
>> + None => {
>> + cache.secrets.remove(tokenid);
>> + }
>> + }
>> +}
>> +
>> +/// Get the current shared generation.
>> +fn token_shadow_shared_gen() -> Option<usize> {
>> + crate::ConfigVersionCache::new()
>> + .ok()
>> + .map(|cvc| cvc.token_shadow_generation())
>> +}
>> +
>> +/// Bump and return the new shared generation.
>> +fn bump_token_shadow_shared_gen() -> Option<usize> {
>> + crate::ConfigVersionCache::new()
>> + .ok()
>> + .map(|cvc| cvc.increase_token_shadow_generation() + 1)
>> +}
>> +
>> +/// Invalidates the cache state and only keeps the shared generation.
>
> both calls to this actually set the cached generation to some value
> right after, so maybe this should take a generation directly and set it?
>
patch 3/4 doesn’t always update the gen on cache invalidation
(shadow_mtime_len() error branch in apply_api_mutation) but most other
call sites do. Agreed this can be refactored, maybe:
fn invalidate_cache_state(cache: &mut ApiTokenSecretCache) {
    cache.secrets.clear();
    // clear other cache fields (mtime/len/last_checked) as needed
}

fn invalidate_cache_state_and_set_gen(cache: &mut ApiTokenSecretCache, gen: usize) {
    invalidate_cache_state(cache);
    cache.shared_gen = gen;
}
We could also do a single helper with Option<usize> but two helpers make
the call sites more explicit.
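
For comparison, the single-helper variant with Option<usize> mentioned here could look like this (String keys and a trimmed-down struct as stand-ins for the real Authid/CachedSecret types): None clears the secrets but leaves the generation untouched, Some(gen) additionally resets it.

```rust
use std::collections::HashMap;

struct ApiTokenSecretCache {
    secrets: HashMap<String, String>, // simplified: real keys are Authid
    shared_gen: usize,
}

// Hypothetical single-helper variant; two separate helpers arguably keep
// the call sites more explicit about whether the generation changes.
fn invalidate_cache_state(cache: &mut ApiTokenSecretCache, new_gen: Option<usize>) {
    cache.secrets.clear();
    if let Some(gen) = new_gen {
        cache.shared_gen = gen;
    }
}

fn main() {
    let mut cache = ApiTokenSecretCache {
        secrets: HashMap::new(),
        shared_gen: 3,
    };
    cache.secrets.insert("user@pam!t1".into(), "s".into());

    invalidate_cache_state(&mut cache, None);
    assert!(cache.secrets.is_empty());
    assert_eq!(cache.shared_gen, 3); // generation kept

    invalidate_cache_state(&mut cache, Some(7));
    assert_eq!(cache.shared_gen, 7); // generation reset
    println!("ok");
}
```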
>> +fn invalidate_cache_state(cache: &mut ApiTokenSecretCache) {
>> + cache.secrets.clear();
>> +}
>> --
>> 2.47.3
>>
>>
>>
>> _______________________________________________
>> pbs-devel mailing list
>> pbs-devel@lists.proxmox.com
>> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
>>
>
>
> _______________________________________________
> pbs-devel mailing list
> pbs-devel@lists.proxmox.com
> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
^ permalink raw reply [relevance 6%]
* Re: [pbs-devel] [PATCH proxmox-backup v3 1/4] pbs-config: add token.shadow generation to ConfigVersionCache
2026-01-14 10:44 5% ` Fabian Grünbichler
@ 2026-01-16 13:53 6% ` Samuel Rufinatscha
0 siblings, 0 replies; 200+ results
From: Samuel Rufinatscha @ 2026-01-16 13:53 UTC (permalink / raw)
To: Proxmox Backup Server development discussion, Fabian Grünbichler
On 1/14/26 11:44 AM, Fabian Grünbichler wrote:
> On January 2, 2026 5:07 pm, Samuel Rufinatscha wrote:
>> Currently, every token-based API request reads the token.shadow file and
>> runs the expensive password hash verification for the given token
>> secret. This shows up as a hotspot in /status profiling (see
>> bug #7017 [1]).
>>
>> To solve the issue, this patch prepares the config version cache,
>> so that token_shadow_generation config caching can be built on
>> top of it.
>>
>> This patch specifically:
>> (1) implements increment function in order to invalidate generations
>
> this is needlessly verbose..
>
>>
>> This patch is part of the series which fixes bug #7017 [1].
>
> this is already mentioned higher up and doesn't need to be repeated
> here.
>
Makes sense, will adjust this. Thanks!
> this patch needs a rebase. it would be good to call out why it is safe
> to add to this struct, since it is accessed/mapped by both old and new
> processes.
>
Will add a note on why this is safe: the shmem mapping is fixed to 4096
bytes via the #[repr(C)] union padding and enforced
by assert_cache_size(). The new AtomicUsize is appended at the end of
the struct, so existing field offsets are unchanged. Old
processes keep accessing the same bytes; the new field consumes
previously reserved padding.
>>
>> [1] https://bugzilla.proxmox.com/show_bug.cgi?id=7017
>>
>> Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
>> ---
>> pbs-config/src/config_version_cache.rs | 18 ++++++++++++++++++
>> 1 file changed, 18 insertions(+)
>>
>> diff --git a/pbs-config/src/config_version_cache.rs b/pbs-config/src/config_version_cache.rs
>> index e8fb994f..1376b11d 100644
>> --- a/pbs-config/src/config_version_cache.rs
>> +++ b/pbs-config/src/config_version_cache.rs
>> @@ -28,6 +28,8 @@ struct ConfigVersionCacheDataInner {
>> // datastore (datastore.cfg) generation/version
>> // FIXME: remove with PBS 3.0
>> datastore_generation: AtomicUsize,
>> + // Token shadow (token.shadow) generation/version.
>> + token_shadow_generation: AtomicUsize,
>> // Add further atomics here
>> }
>>
>> @@ -153,4 +155,20 @@ impl ConfigVersionCache {
>> .datastore_generation
>> .fetch_add(1, Ordering::AcqRel)
>> }
>> +
>> + /// Returns the token shadow generation number.
>> + pub fn token_shadow_generation(&self) -> usize {
>> + self.shmem
>> + .data()
>> + .token_shadow_generation
>> + .load(Ordering::Acquire)
>> + }
>> +
>> + /// Increase the token shadow generation number.
>> + pub fn increase_token_shadow_generation(&self) -> usize {
>> + self.shmem
>> + .data()
>> + .token_shadow_generation
>> + .fetch_add(1, Ordering::AcqRel)
>> + }
>> }
>> --
>> 2.47.3
>>
>>
>>
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
* [pbs-devel] superseded: [PATCH proxmox{, -backup} v5 0/9] fix #6939: acme: support servers returning 204 for nonce requests
2026-01-08 11:26 11% [pbs-devel] [PATCH proxmox{, -backup} v5 0/9] fix #6939: acme: support servers returning 204 for nonce requests Samuel Rufinatscha
` (9 preceding siblings ...)
2026-01-13 13:48 5% ` [pbs-devel] [PATCH proxmox{, -backup} v5 0/9] fix #6939: acme: support servers returning 204 for nonce requests Fabian Grünbichler
@ 2026-01-16 11:30 13% ` Samuel Rufinatscha
10 siblings, 0 replies; 200+ results
From: Samuel Rufinatscha @ 2026-01-16 11:30 UTC (permalink / raw)
To: pbs-devel
https://lore.proxmox.com/pbs-devel/20260116112859.194016-1-s.rufinatscha@proxmox.com/T/#t
On 1/8/26 12:25 PM, Samuel Rufinatscha wrote:
> Hi,
>
> this series fixes account registration for ACME providers that return
> HTTP 204 No Content to the newNonce request. Currently, both the PBS
> ACME client and the shared ACME client in proxmox-acme only accept
> HTTP 200 OK for this request. The issue was observed in PBS against a
> custom ACME deployment and reported as bug #6939 [1].
>
> ## Problem
>
> During ACME account registration, PBS first fetches an anti-replay
> nonce by sending a HEAD request to the CA’s newNonce URL.
> RFC 8555 §7.2 [2] states that:
>
> * the server MUST include a Replay-Nonce header with a fresh nonce,
> * the server SHOULD use status 200 OK for the HEAD request,
> * the server MUST also handle GET on the same resource and may return
> 204 No Content with an empty body.
>
> The reporter observed the following error message:
>
> *ACME server responded with unexpected status code: 204*
>
> and mentioned that the issue did not appear with PVE 9 [1]. Looking at
> PVE’s Perl ACME client [3], it uses a GET request instead of HEAD and
> accepts any 2xx success code when retrieving the nonce. This difference
> in behavior does not affect functionality but is worth noting for
> consistency across implementations.
>
> ## Approach
>
> To support ACME providers which return 204 No Content, the Rust ACME
> clients in proxmox-backup and proxmox need to treat both 200 OK and 204
> No Content as valid responses for the nonce request, as long as a
> Replay-Nonce header is present.
>
> This series changes the expected field of the internal Request type
> from a single u16 to a list of allowed status codes
> (e.g. &'static [u16]), so one request can explicitly accept multiple
> success codes.
>
> To avoid fixing the issue twice (once in PBS’ own ACME client and once
> in the shared Rust client), this series first refactors PBS to use the
> shared AcmeClient from proxmox-acme / proxmox-acme-api, similar to PDM,
> and then applies the bug fix in that shared implementation so that all
> consumers benefit from the more tolerant behavior.
>
> ## Testing
>
> *Testing the refactor*
>
> To test the refactor, I
> (1) installed latest stable PBS on a VM
> (2) created .deb package from latest PBS (master), containing the
> refactor
> (3) installed created .deb package
> (4) installed Pebble from Let's Encrypt [4] on the same VM
> (5) created an ACME account and ordered the new certificate for the
> host domain.
>
> Steps to reproduce:
>
> (1) install latest stable PBS on a VM, create .deb package from latest
> PBS (master) containing the refactor, install created .deb package
> (2) install Pebble from Let's Encrypt [4] on the same VM:
>
> cd
> apt update
> apt install -y golang git
> git clone https://github.com/letsencrypt/pebble
> cd pebble
> go build ./cmd/pebble
>
> then, download and trust the Pebble cert:
>
> wget https://raw.githubusercontent.com/letsencrypt/pebble/main/test/certs/pebble.minica.pem
> cp pebble.minica.pem /usr/local/share/ca-certificates/pebble.minica.crt
> update-ca-certificates
>
> We want Pebble to perform HTTP-01 validation against port 80, because
> PBS’s standalone plugin will bind port 80. Set httpPort to 80.
>
> nano ./test/config/pebble-config.json
>
> Start the Pebble server in the background:
>
> ./pebble -config ./test/config/pebble-config.json &
>
> Create a Pebble ACME account:
>
> proxmox-backup-manager acme account register default admin@example.com --directory 'https://127.0.0.1:14000/dir'
>
> To verify persistence of the account I checked
>
> ls /etc/proxmox-backup/acme/accounts
>
> Verified if update-account works
>
> proxmox-backup-manager acme account update default --contact "a@example.com,b@example.com"
> proxmox-backup-manager acme account info default
>
> In the PBS GUI, you can create a new domain. You can use your host
> domain name (see /etc/hosts). Select the created account and order the
> certificate.
>
> After a page reload, you might need to accept the new certificate in the browser.
> In the PBS dashboard, you should see the new Pebble certificate.
>
> *Note: on reboot, the created Pebble ACME account will be gone and you
> will need to create a new one. Pebble does not persist account info.
> In that case remove the previously created account in
> /etc/proxmox-backup/acme/accounts.
>
> *Testing the newNonce fix*
>
> To prove the ACME newNonce fix, I put nginx in front of Pebble, to
> intercept the newNonce request in order to return 204 No Content
> instead of 200 OK, all other requests are unchanged and forwarded to
> Pebble. Requires trusting the nginx CAs via
> /usr/local/share/ca-certificates + update-ca-certificates on the VM.
>
> Then I ran following command against nginx:
>
> proxmox-backup-manager acme account register proxytest root@backup.local --directory 'https://nginx-address/dir'
>
> The account could be created successfully. When adjusting the nginx
> configuration to return any other non-expected success status code,
> PBS rejects as expected.
>
> ## Patch summary
>
> 0001 – [PATCH proxmox v5 1/4] acme: reduce visibility of Request type
> Restricts the visibility of the low-level Request type. Consumers
> should rely on proxmox-acme-api or AcmeClient handlers.
>
> 0002 – [PATCH proxmox v5 2/4] acme: introduce http_status module
>
> 0003 – [PATCH proxmox v5 3/4] fix #6939: acme: support servers
> returning 204 for nonce requests
> Adjusts nonce handling to support ACME servers that return HTTP 204
> (No Content) for new-nonce requests.
>
> 0004 – [PATCH proxmox v5 4/4] acme-api: add helper to load client for
> an account
> Introduces a helper function to load an ACME client instance for a
> given account. Required for the following PBS ACME refactor.
>
> 0005 – [PATCH proxmox-backup v5 1/5] acme: clean up ACME-related imports
>
> 0006 – [PATCH proxmox-backup v5 2/5] acme: include proxmox-acme-api
> dependency
> Prepares the codebase to use the factored out ACME API impl.
>
> 0007 – [PATCH proxmox-backup v5 3/5] acme: drop local AcmeClient
> Removes the local AcmeClient implementation. Represents the minimal
> set of changes to replace it with the factored out AcmeClient.
>
> 0008 – [PATCH proxmox-backup v5 4/5] acme: change API impls to use
> proxmox-acme-api handlers
>
> 0009 – [PATCH proxmox-backup v5 5/5] acme: certificate ordering through
> proxmox-acme-api
>
> Thanks for considering this patch series, I look forward to your
> feedback.
>
> Best,
> Samuel Rufinatscha
>
> ## Changelog
>
> Changes from v4 to v5:
>
> * rebased series
> * re-ordered series (proxmox-acme fix first)
> * proxmox-backup: cleaned up imports based on an initial clean-up patch
> * proxmox-acme: removed now unused post_request_raw_payload(),
> update_account_request(), deactivate_account_request()
> * proxmox-acme: removed now obsolete/unused get_authorization() and
> GetAuthorization impl
>
> Verified removal by compiling PBS, PDM, and proxmox-perl-rs
> with all features.
>
> Changes from v3 to v4:
>
> * add proxmox-acme-api as a dependency and initialize it in
> PBS so PBS can use the shared ACME API instead.
> * remove the PBS-local AcmeClient implementation and switch PBS
> over to the shared proxmox-acme async client.
> * rework PBS’ ACME API endpoints to delegate to
> proxmox-acme-api handlers instead of duplicating logic locally.
> * move PBS’ ACME certificate ordering logic over to
> proxmox-acme-api, keeping only certificate installation/reload in PBS.
> * add a load_client_with_account helper in proxmox-acme-api so PBS
> (and others) can construct an AcmeClient for a configured account
> without duplicating boilerplate.
> * hide the low-level Request type and its fields behind constructors
> / reduced visibility so changes to “expected” no longer affect the
> public API as they did in v3.
> * split out the HTTP status constants into an internal http_status
> module as a separate preparatory cleanup before the bug fix, instead
> of doing this inline like in v3.
> * Rebased on top of the refactor: keep the same behavioural fix as in
> v3 (accept 204 for newNonce with Replay-Nonce present), but implement
> it on top of the http_status module that is part of the refactor.
>
> Changes from v2 to v3:
>
> * rename `http_success` module to `http_status`
> * replace `http_success` usage
> * introduced `http_success` module to contain the http success codes
> * replaced `Vec<u16>` with `&[u16]` for expected codes to avoid allocations.
> * clarified PVE's Perl ACME client behaviour in the commit message.
> * integrated the `http_success` module, replacing `Vec<u16>` with `&[u16]`
>
> [1] Bugzilla report #6939:
> [https://bugzilla.proxmox.com/show_bug.cgi?id=6939](https://bugzilla.proxmox.com/show_bug.cgi?id=6939)
> [2] RFC 8555 (ACME):
> [https://datatracker.ietf.org/doc/html/rfc8555/#section-7.2](https://datatracker.ietf.org/doc/html/rfc8555/#section-7.2)
> [3] PVE’s Perl ACME client (allow 2xx codes for nonce requests):
> [https://git.proxmox.com/?p=proxmox-acme.git;a=blob;f=src/PVE/ACME.pm;h=f1e9bb7d316e3cea1e376c610b0479119217aecc;hb=HEAD#l597](https://git.proxmox.com/?p=proxmox-acme.git;a=blob;f=src/PVE/ACME.pm;h=f1e9bb7d316e3cea1e376c610b0479119217aecc;hb=HEAD#l597)
> [4] Pebble ACME server:
> [https://github.com/letsencrypt/pebble](https://github.com/letsencrypt/pebble)
> [5] PVE's Perl ACME client (performs a GET request):
> [https://git.proxmox.com/?p=proxmox-acme.git;a=blob;f=src/PVE/ACME.pm;h=f1e9bb7d316e3cea1e376c610b0479119217aecc;hb=HEAD#l219](https://git.proxmox.com/?p=proxmox-acme.git;a=blob;f=src/PVE/ACME.pm;h=f1e9bb7d316e3cea1e376c610b0479119217aecc;hb=HEAD#l219)
>
> proxmox:
>
> Samuel Rufinatscha (4):
> acme: reduce visibility of Request type
> acme: introduce http_status module
> fix #6939: acme: support servers returning 204 for nonce requests
> acme-api: add helper to load client for an account
>
> proxmox-acme-api/src/account_api_impl.rs | 5 ++
> proxmox-acme-api/src/lib.rs | 3 +-
> proxmox-acme/src/account.rs | 102 ++---------------------
> proxmox-acme/src/async_client.rs | 8 +-
> proxmox-acme/src/authorization.rs | 30 -------
> proxmox-acme/src/client.rs | 8 +-
> proxmox-acme/src/lib.rs | 6 +-
> proxmox-acme/src/order.rs | 2 +-
> proxmox-acme/src/request.rs | 25 ++++--
> 9 files changed, 44 insertions(+), 145 deletions(-)
>
>
> proxmox-backup:
>
> Samuel Rufinatscha (5):
> acme: clean up ACME-related imports
> acme: include proxmox-acme-api dependency
> acme: drop local AcmeClient
> acme: change API impls to use proxmox-acme-api handlers
> acme: certificate ordering through proxmox-acme-api
>
> Cargo.toml | 3 +
> src/acme/client.rs | 691 -------------------------
> src/acme/mod.rs | 5 -
> src/acme/plugin.rs | 336 ------------
> src/api2/config/acme.rs | 406 ++-------------
> src/api2/node/certificates.rs | 232 ++-------
> src/api2/types/acme.rs | 98 ----
> src/api2/types/mod.rs | 3 -
> src/bin/proxmox-backup-api.rs | 2 +
> src/bin/proxmox-backup-manager.rs | 14 +-
> src/bin/proxmox-backup-proxy.rs | 15 +-
> src/bin/proxmox_backup_manager/acme.rs | 21 +-
> src/config/acme/mod.rs | 55 +-
> src/config/acme/plugin.rs | 92 +---
> src/config/node.rs | 31 +-
> src/lib.rs | 2 -
> 16 files changed, 109 insertions(+), 1897 deletions(-)
> delete mode 100644 src/acme/client.rs
> delete mode 100644 src/acme/mod.rs
> delete mode 100644 src/acme/plugin.rs
> delete mode 100644 src/api2/types/acme.rs
>
>
> Summary over all repositories:
> 25 files changed, 153 insertions(+), 2042 deletions(-)
>
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
* [pbs-devel] [PATCH proxmox-backup v6 1/2] acme: remove local AcmeClient and use proxmox-acme-api handlers
2026-01-16 11:28 10% [pbs-devel] [PATCH proxmox{, -backup} v6 0/5] fix #6939: acme: support servers returning 204 for nonce requests Samuel Rufinatscha
` (2 preceding siblings ...)
2026-01-16 11:28 14% ` [pbs-devel] [PATCH proxmox v6 3/3] fix #6939: acme: support servers returning 204 for nonce requests Samuel Rufinatscha
@ 2026-01-16 11:28 4% ` Samuel Rufinatscha
2026-01-16 11:28 9% ` [pbs-devel] [PATCH proxmox-backup v6 2/2] acme: remove unused src/acme and plugin code Samuel Rufinatscha
4 siblings, 0 replies; 200+ results
From: Samuel Rufinatscha @ 2026-01-16 11:28 UTC (permalink / raw)
To: pbs-devel
PBS currently uses its own ACME client and API logic, while PDM uses the
factored out proxmox-acme and proxmox-acme-api crates. This requires
maintenance in two places. This patch moves PBS over to the shared
ACME stack.
Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
---
Cargo.toml | 3 +
src/acme/client.rs | 691 -------------------------
src/acme/mod.rs | 4 -
src/acme/plugin.rs | 2 +-
src/api2/config/acme.rs | 399 ++------------
src/api2/node/certificates.rs | 221 +-------
src/api2/types/acme.rs | 61 +--
src/bin/proxmox-backup-api.rs | 2 +
src/bin/proxmox-backup-manager.rs | 3 +-
src/bin/proxmox-backup-proxy.rs | 1 +
src/bin/proxmox_backup_manager/acme.rs | 37 +-
src/config/acme/mod.rs | 167 ------
src/config/acme/plugin.rs | 88 +---
src/config/node.rs | 43 +-
14 files changed, 98 insertions(+), 1624 deletions(-)
delete mode 100644 src/acme/client.rs
diff --git a/Cargo.toml b/Cargo.toml
index 49548ecc..5c94bfaa 100644
--- a/Cargo.toml
+++ b/Cargo.toml
@@ -101,6 +101,7 @@ pbs-api-types = "1.0.8"
# other proxmox crates
pathpatterns = "1"
proxmox-acme = "1"
+proxmox-acme-api = { version = "1", features = [ "impl" ] }
pxar = "1"
# PBS workspace
@@ -251,6 +252,7 @@ pbs-api-types.workspace = true
# in their respective repo
proxmox-acme.workspace = true
+proxmox-acme-api.workspace = true
pxar.workspace = true
# proxmox-backup workspace/internal crates
@@ -269,6 +271,7 @@ proxmox-rrd-api-types.workspace = true
[patch.crates-io]
#pbs-api-types = { path = "../proxmox/pbs-api-types" }
#proxmox-acme = { path = "../proxmox/proxmox-acme" }
+#proxmox-acme-api = { path = "../proxmox/proxmox-acme-api" }
#proxmox-api-macro = { path = "../proxmox/proxmox-api-macro" }
#proxmox-apt = { path = "../proxmox/proxmox-apt" }
#proxmox-apt-api-types = { path = "../proxmox/proxmox-apt-api-types" }
diff --git a/src/acme/client.rs b/src/acme/client.rs
deleted file mode 100644
index 9fb6ad55..00000000
--- a/src/acme/client.rs
+++ /dev/null
@@ -1,691 +0,0 @@
-//! HTTP Client for the ACME protocol.
-
-use std::fs::OpenOptions;
-use std::io;
-use std::os::unix::fs::OpenOptionsExt;
-
-use anyhow::{bail, format_err};
-use bytes::Bytes;
-use http_body_util::BodyExt;
-use hyper::Request;
-use nix::sys::stat::Mode;
-use proxmox_http::Body;
-use serde::{Deserialize, Serialize};
-
-use proxmox_acme::account::AccountCreator;
-use proxmox_acme::order::{Order, OrderData};
-use proxmox_acme::types::AccountData as AcmeAccountData;
-use proxmox_acme::Request as AcmeRequest;
-use proxmox_acme::{Account, Authorization, Challenge, Directory, Error, ErrorResponse};
-use proxmox_http::client::Client;
-use proxmox_sys::fs::{replace_file, CreateOptions};
-
-use crate::api2::types::AcmeAccountName;
-use crate::config::acme::account_path;
-use crate::tools::pbs_simple_http;
-
-/// Our on-disk format inherited from PVE's proxmox-acme code.
-#[derive(Deserialize, Serialize)]
-#[serde(rename_all = "camelCase")]
-pub struct AccountData {
- /// The account's location URL.
- location: String,
-
- /// The account data.
- account: AcmeAccountData,
-
- /// The private key as PEM formatted string.
- key: String,
-
- /// ToS URL the user agreed to.
- #[serde(skip_serializing_if = "Option::is_none")]
- tos: Option<String>,
-
- #[serde(skip_serializing_if = "is_false", default)]
- debug: bool,
-
- /// The directory's URL.
- directory_url: String,
-}
-
-#[inline]
-fn is_false(b: &bool) -> bool {
- !*b
-}
-
-pub struct AcmeClient {
- directory_url: String,
- debug: bool,
- account_path: Option<String>,
- tos: Option<String>,
- account: Option<Account>,
- directory: Option<Directory>,
- nonce: Option<String>,
- http_client: Client,
-}
-
-impl AcmeClient {
- /// Create a new ACME client for a given ACME directory URL.
- pub fn new(directory_url: String) -> Self {
- Self {
- directory_url,
- debug: false,
- account_path: None,
- tos: None,
- account: None,
- directory: None,
- nonce: None,
- http_client: pbs_simple_http(None),
- }
- }
-
- /// Load an existing ACME account by name.
- pub async fn load(account_name: &AcmeAccountName) -> Result<Self, anyhow::Error> {
- let account_path = account_path(account_name.as_ref());
- let data = match tokio::fs::read(&account_path).await {
- Ok(data) => data,
- Err(err) if err.kind() == io::ErrorKind::NotFound => {
- bail!("acme account '{}' does not exist", account_name)
- }
- Err(err) => bail!(
- "failed to load acme account from '{}' - {}",
- account_path,
- err
- ),
- };
- let data: AccountData = serde_json::from_slice(&data).map_err(|err| {
- format_err!(
- "failed to parse acme account from '{}' - {}",
- account_path,
- err
- )
- })?;
-
- let account = Account::from_parts(data.location, data.key, data.account);
-
- let mut me = Self::new(data.directory_url);
- me.debug = data.debug;
- me.account_path = Some(account_path);
- me.tos = data.tos;
- me.account = Some(account);
-
- Ok(me)
- }
-
- pub async fn new_account<'a>(
- &'a mut self,
- account_name: &AcmeAccountName,
- tos_agreed: bool,
- contact: Vec<String>,
- rsa_bits: Option<u32>,
- eab_creds: Option<(String, String)>,
- ) -> Result<&'a Account, anyhow::Error> {
- self.tos = if tos_agreed {
- self.terms_of_service_url().await?.map(str::to_owned)
- } else {
- None
- };
-
- let mut account = Account::creator()
- .set_contacts(contact)
- .agree_to_tos(tos_agreed);
-
- if let Some((eab_kid, eab_hmac_key)) = eab_creds {
- account = account.set_eab_credentials(eab_kid, eab_hmac_key)?;
- }
-
- let account = if let Some(bits) = rsa_bits {
- account.generate_rsa_key(bits)?
- } else {
- account.generate_ec_key()?
- };
-
- let _ = self.register_account(account).await?;
-
- crate::config::acme::make_acme_account_dir()?;
- let account_path = account_path(account_name.as_ref());
- let file = OpenOptions::new()
- .write(true)
- .create_new(true)
- .mode(0o600)
- .open(&account_path)
- .map_err(|err| format_err!("failed to open {:?} for writing: {}", account_path, err))?;
- self.write_to(file).map_err(|err| {
- format_err!(
- "failed to write acme account to {:?}: {}",
- account_path,
- err
- )
- })?;
- self.account_path = Some(account_path);
-
- // unwrap: Setting `self.account` is literally this function's job, we just can't keep
- // the borrow from from `self.register_account()` active due to clashes.
- Ok(self.account.as_ref().unwrap())
- }
-
- fn save(&self) -> Result<(), anyhow::Error> {
- let mut data = Vec::<u8>::new();
- self.write_to(&mut data)?;
- let account_path = self.account_path.as_ref().ok_or_else(|| {
- format_err!("no account path set, cannot save updated account information")
- })?;
- crate::config::acme::make_acme_account_dir()?;
- replace_file(
- account_path,
- &data,
- CreateOptions::new()
- .perm(Mode::from_bits_truncate(0o600))
- .owner(nix::unistd::ROOT)
- .group(nix::unistd::Gid::from_raw(0)),
- true,
- )
- }
-
- /// Shortcut to `account().ok_or_else(...).key_authorization()`.
- pub fn key_authorization(&self, token: &str) -> Result<String, anyhow::Error> {
- Ok(Self::need_account(&self.account)?.key_authorization(token)?)
- }
-
- /// Shortcut to `account().ok_or_else(...).dns_01_txt_value()`.
- /// the key authorization value.
- pub fn dns_01_txt_value(&self, token: &str) -> Result<String, anyhow::Error> {
- Ok(Self::need_account(&self.account)?.dns_01_txt_value(token)?)
- }
-
- async fn register_account(
- &mut self,
- account: AccountCreator,
- ) -> Result<&Account, anyhow::Error> {
- let mut retry = retry();
- let mut response = loop {
- retry.tick()?;
-
- let (directory, nonce) = Self::get_dir_nonce(
- &mut self.http_client,
- &self.directory_url,
- &mut self.directory,
- &mut self.nonce,
- )
- .await?;
- let request = account.request(directory, nonce)?;
- match self.run_request(request).await {
- Ok(response) => break response,
- Err(err) if err.is_bad_nonce() => continue,
- Err(err) => return Err(err.into()),
- }
- };
-
- let account = account.response(response.location_required()?, &response.body)?;
-
- self.account = Some(account);
- Ok(self.account.as_ref().unwrap())
- }
-
- pub async fn update_account<T: Serialize>(
- &mut self,
- data: &T,
- ) -> Result<&Account, anyhow::Error> {
- let account = Self::need_account(&self.account)?;
-
- let mut retry = retry();
- let response = loop {
- retry.tick()?;
-
- let (_directory, nonce) = Self::get_dir_nonce(
- &mut self.http_client,
- &self.directory_url,
- &mut self.directory,
- &mut self.nonce,
- )
- .await?;
-
- let request = account.post_request(&account.location, nonce, data)?;
- match Self::execute(&mut self.http_client, request, &mut self.nonce).await {
- Ok(response) => break response,
- Err(err) if err.is_bad_nonce() => continue,
- Err(err) => return Err(err.into()),
- }
- };
-
- // unwrap: we've been keeping an immutable reference to it from the top of the method
- let _ = account;
- self.account.as_mut().unwrap().data = response.json()?;
- self.save()?;
- Ok(self.account.as_ref().unwrap())
- }
-
- pub async fn new_order<I>(&mut self, domains: I) -> Result<Order, anyhow::Error>
- where
- I: IntoIterator<Item = String>,
- {
- let account = Self::need_account(&self.account)?;
-
- let order = domains
- .into_iter()
- .fold(OrderData::new(), |order, domain| order.domain(domain));
-
- let mut retry = retry();
- loop {
- retry.tick()?;
-
- let (directory, nonce) = Self::get_dir_nonce(
- &mut self.http_client,
- &self.directory_url,
- &mut self.directory,
- &mut self.nonce,
- )
- .await?;
-
- let mut new_order = account.new_order(&order, directory, nonce)?;
- let mut response = match Self::execute(
- &mut self.http_client,
- new_order.request.take().unwrap(),
- &mut self.nonce,
- )
- .await
- {
- Ok(response) => response,
- Err(err) if err.is_bad_nonce() => continue,
- Err(err) => return Err(err.into()),
- };
-
- return Ok(
- new_order.response(response.location_required()?, response.bytes().as_ref())?
- );
- }
- }
-
- /// Low level "POST-as-GET" request.
- async fn post_as_get(&mut self, url: &str) -> Result<AcmeResponse, anyhow::Error> {
- let account = Self::need_account(&self.account)?;
-
- let mut retry = retry();
- loop {
- retry.tick()?;
-
- let (_directory, nonce) = Self::get_dir_nonce(
- &mut self.http_client,
- &self.directory_url,
- &mut self.directory,
- &mut self.nonce,
- )
- .await?;
-
- let request = account.get_request(url, nonce)?;
- match Self::execute(&mut self.http_client, request, &mut self.nonce).await {
- Ok(response) => return Ok(response),
- Err(err) if err.is_bad_nonce() => continue,
- Err(err) => return Err(err.into()),
- }
- }
- }
-
- /// Low level POST request.
- async fn post<T: Serialize>(
- &mut self,
- url: &str,
- data: &T,
- ) -> Result<AcmeResponse, anyhow::Error> {
- let account = Self::need_account(&self.account)?;
-
- let mut retry = retry();
- loop {
- retry.tick()?;
-
- let (_directory, nonce) = Self::get_dir_nonce(
- &mut self.http_client,
- &self.directory_url,
- &mut self.directory,
- &mut self.nonce,
- )
- .await?;
-
- let request = account.post_request(url, nonce, data)?;
- match Self::execute(&mut self.http_client, request, &mut self.nonce).await {
- Ok(response) => return Ok(response),
- Err(err) if err.is_bad_nonce() => continue,
- Err(err) => return Err(err.into()),
- }
- }
- }
-
- /// Request challenge validation. Afterwards, the challenge should be polled.
- pub async fn request_challenge_validation(
- &mut self,
- url: &str,
- ) -> Result<Challenge, anyhow::Error> {
- Ok(self
- .post(url, &serde_json::Value::Object(Default::default()))
- .await?
- .json()?)
- }
-
- /// Assuming the provided URL is an 'Authorization' URL, get and deserialize it.
- pub async fn get_authorization(&mut self, url: &str) -> Result<Authorization, anyhow::Error> {
- Ok(self.post_as_get(url).await?.json()?)
- }
-
- /// Assuming the provided URL is an 'Order' URL, get and deserialize it.
- pub async fn get_order(&mut self, url: &str) -> Result<OrderData, anyhow::Error> {
- Ok(self.post_as_get(url).await?.json()?)
- }
-
- /// Finalize an Order via its `finalize` URL property and the DER encoded CSR.
- pub async fn finalize(&mut self, url: &str, csr: &[u8]) -> Result<(), anyhow::Error> {
- let csr = proxmox_base64::url::encode_no_pad(csr);
- let data = serde_json::json!({ "csr": csr });
- self.post(url, &data).await?;
- Ok(())
- }
-
- /// Download a certificate via its 'certificate' URL property.
- ///
- /// The certificate will be a PEM certificate chain.
- pub async fn get_certificate(&mut self, url: &str) -> Result<Bytes, anyhow::Error> {
- Ok(self.post_as_get(url).await?.body)
- }
-
- /// Revoke an existing certificate (PEM or DER formatted).
- pub async fn revoke_certificate(
- &mut self,
- certificate: &[u8],
- reason: Option<u32>,
- ) -> Result<(), anyhow::Error> {
- // TODO: This can also work without an account.
- let account = Self::need_account(&self.account)?;
-
- let revocation = account.revoke_certificate(certificate, reason)?;
-
- let mut retry = retry();
- loop {
- retry.tick()?;
-
- let (directory, nonce) = Self::get_dir_nonce(
- &mut self.http_client,
- &self.directory_url,
- &mut self.directory,
- &mut self.nonce,
- )
- .await?;
-
- let request = revocation.request(directory, nonce)?;
- match Self::execute(&mut self.http_client, request, &mut self.nonce).await {
- Ok(_response) => return Ok(()),
- Err(err) if err.is_bad_nonce() => continue,
- Err(err) => return Err(err.into()),
- }
- }
- }
-
- fn need_account(account: &Option<Account>) -> Result<&Account, anyhow::Error> {
- account
- .as_ref()
- .ok_or_else(|| format_err!("cannot use client without an account"))
- }
-
- pub(crate) fn account(&self) -> Result<&Account, anyhow::Error> {
- Self::need_account(&self.account)
- }
-
- pub fn tos(&self) -> Option<&str> {
- self.tos.as_deref()
- }
-
- pub fn directory_url(&self) -> &str {
- &self.directory_url
- }
-
- fn to_account_data(&self) -> Result<AccountData, anyhow::Error> {
- let account = self.account()?;
-
- Ok(AccountData {
- location: account.location.clone(),
- key: account.private_key.clone(),
- account: AcmeAccountData {
- only_return_existing: false, // don't actually write this out in case it's set
- ..account.data.clone()
- },
- tos: self.tos.clone(),
- debug: self.debug,
- directory_url: self.directory_url.clone(),
- })
- }
-
- fn write_to<T: io::Write>(&self, out: T) -> Result<(), anyhow::Error> {
- let data = self.to_account_data()?;
-
- Ok(serde_json::to_writer_pretty(out, &data)?)
- }
-}
-
-struct AcmeResponse {
- body: Bytes,
- location: Option<String>,
- got_nonce: bool,
-}
-
-impl AcmeResponse {
- /// Convenience helper to assert that a location header was part of the response.
- fn location_required(&mut self) -> Result<String, anyhow::Error> {
- self.location
- .take()
- .ok_or_else(|| format_err!("missing Location header"))
- }
-
- /// Convenience shortcut to perform json deserialization of the returned body.
- fn json<T: for<'a> Deserialize<'a>>(&self) -> Result<T, Error> {
- Ok(serde_json::from_slice(&self.body)?)
- }
-
- /// Convenience shortcut to get the body as bytes.
- fn bytes(&self) -> &[u8] {
- &self.body
- }
-}
-
-impl AcmeClient {
- /// Non-self-borrowing run_request version for borrow workarounds.
- async fn execute(
- http_client: &mut Client,
- request: AcmeRequest,
- nonce: &mut Option<String>,
- ) -> Result<AcmeResponse, Error> {
- let req_builder = Request::builder().method(request.method).uri(&request.url);
-
- let http_request = if !request.content_type.is_empty() {
- req_builder
- .header("Content-Type", request.content_type)
- .header("Content-Length", request.body.len())
- .body(request.body.into())
- } else {
- req_builder.body(Body::empty())
- }
- .map_err(|err| Error::Custom(format!("failed to create http request: {err}")))?;
-
- let response = http_client
- .request(http_request)
- .await
- .map_err(|err| Error::Custom(err.to_string()))?;
- let (parts, body) = response.into_parts();
-
- let status = parts.status.as_u16();
- let body = body
- .collect()
- .await
- .map_err(|err| Error::Custom(format!("failed to retrieve response body: {err}")))?
- .to_bytes();
-
- let got_nonce = if let Some(new_nonce) = parts.headers.get(proxmox_acme::REPLAY_NONCE) {
- let new_nonce = new_nonce.to_str().map_err(|err| {
- Error::Client(format!(
- "received invalid replay-nonce header from ACME server: {err}"
- ))
- })?;
- *nonce = Some(new_nonce.to_owned());
- true
- } else {
- false
- };
-
- if parts.status.is_success() {
- if status != request.expected {
- return Err(Error::InvalidApi(format!(
- "ACME server responded with unexpected status code: {:?}",
- parts.status
- )));
- }
-
- let location = parts
- .headers
- .get("Location")
- .map(|header| {
- header.to_str().map(str::to_owned).map_err(|err| {
- Error::Client(format!(
- "received invalid location header from ACME server: {err}"
- ))
- })
- })
- .transpose()?;
-
- return Ok(AcmeResponse {
- body,
- location,
- got_nonce,
- });
- }
-
- let error: ErrorResponse = serde_json::from_slice(&body).map_err(|err| {
- Error::Client(format!(
- "error status with improper error ACME response: {err}"
- ))
- })?;
-
- if error.ty == proxmox_acme::error::BAD_NONCE {
- if !got_nonce {
- return Err(Error::InvalidApi(
- "badNonce without a new Replay-Nonce header".to_string(),
- ));
- }
- return Err(Error::BadNonce);
- }
-
- Err(Error::Api(error))
- }
-
- /// Low-level API to run an API request. This automatically updates the current nonce!
- async fn run_request(&mut self, request: AcmeRequest) -> Result<AcmeResponse, Error> {
- Self::execute(&mut self.http_client, request, &mut self.nonce).await
- }
-
- pub async fn directory(&mut self) -> Result<&Directory, Error> {
- Ok(Self::get_directory(
- &mut self.http_client,
- &self.directory_url,
- &mut self.directory,
- &mut self.nonce,
- )
- .await?
- .0)
- }
-
- async fn get_directory<'a, 'b>(
- http_client: &mut Client,
- directory_url: &str,
- directory: &'a mut Option<Directory>,
- nonce: &'b mut Option<String>,
- ) -> Result<(&'a Directory, Option<&'b str>), Error> {
- if let Some(d) = directory {
- return Ok((d, nonce.as_deref()));
- }
-
- let response = Self::execute(
- http_client,
- AcmeRequest {
- url: directory_url.to_string(),
- method: "GET",
- content_type: "",
- body: String::new(),
- expected: 200,
- },
- nonce,
- )
- .await?;
-
- *directory = Some(Directory::from_parts(
- directory_url.to_string(),
- response.json()?,
- ));
-
- Ok((directory.as_mut().unwrap(), nonce.as_deref()))
- }
-
- /// Like `get_directory`, but if the directory provides no nonce, also performs a `HEAD`
- /// request on the new nonce URL.
- async fn get_dir_nonce<'a, 'b>(
- http_client: &mut Client,
- directory_url: &str,
- directory: &'a mut Option<Directory>,
- nonce: &'b mut Option<String>,
- ) -> Result<(&'a Directory, &'b str), Error> {
- // this let construct is a lifetime workaround:
- let _ = Self::get_directory(http_client, directory_url, directory, nonce).await?;
- let dir = directory.as_ref().unwrap(); // the above fails if it couldn't fill this option
- if nonce.is_none() {
- // this is also a lifetime issue...
- let _ = Self::get_nonce(http_client, nonce, dir.new_nonce_url()).await?;
- };
- Ok((dir, nonce.as_deref().unwrap()))
- }
-
- pub async fn terms_of_service_url(&mut self) -> Result<Option<&str>, Error> {
- Ok(self.directory().await?.terms_of_service_url())
- }
-
- async fn get_nonce<'a>(
- http_client: &mut Client,
- nonce: &'a mut Option<String>,
- new_nonce_url: &str,
- ) -> Result<&'a str, Error> {
- let response = Self::execute(
- http_client,
- AcmeRequest {
- url: new_nonce_url.to_owned(),
- method: "HEAD",
- content_type: "",
- body: String::new(),
- expected: 200,
- },
- nonce,
- )
- .await?;
-
- if !response.got_nonce {
- return Err(Error::InvalidApi(
- "no new nonce received from new nonce URL".to_string(),
- ));
- }
-
- nonce
- .as_deref()
- .ok_or_else(|| Error::Client("failed to update nonce".to_string()))
- }
-}
-
-/// bad nonce retry count helper
-struct Retry(usize);
-
-const fn retry() -> Retry {
- Retry(0)
-}
-
-impl Retry {
- fn tick(&mut self) -> Result<(), Error> {
- if self.0 >= 3 {
- Err(Error::Client("kept getting a badNonce error!".to_string()))
- } else {
- self.0 += 1;
- Ok(())
- }
- }
-}
diff --git a/src/acme/mod.rs b/src/acme/mod.rs
index bf61811c..700d90d7 100644
--- a/src/acme/mod.rs
+++ b/src/acme/mod.rs
@@ -1,5 +1 @@
-mod client;
-pub use client::AcmeClient;
-
pub(crate) mod plugin;
-pub(crate) use plugin::get_acme_plugin;
diff --git a/src/acme/plugin.rs b/src/acme/plugin.rs
index 993d729b..6804243c 100644
--- a/src/acme/plugin.rs
+++ b/src/acme/plugin.rs
@@ -18,10 +18,10 @@ use tokio::io::{AsyncBufReadExt, AsyncRead, AsyncWriteExt, BufReader};
use tokio::net::TcpListener;
use tokio::process::Command;
+use proxmox_acme::async_client::AcmeClient;
use proxmox_acme::{Authorization, Challenge};
use proxmox_rest_server::WorkerTask;
-use crate::acme::AcmeClient;
use crate::api2::types::AcmeDomain;
use crate::config::acme::plugin::{DnsPlugin, PluginData};
diff --git a/src/api2/config/acme.rs b/src/api2/config/acme.rs
index 18671639..fb1a8a6f 100644
--- a/src/api2/config/acme.rs
+++ b/src/api2/config/acme.rs
@@ -1,29 +1,19 @@
-use std::fs;
-use std::ops::ControlFlow;
+use anyhow::Error;
use std::path::Path;
-use std::sync::{Arc, LazyLock, Mutex};
-use std::time::SystemTime;
-
-use anyhow::{bail, format_err, Error};
-use hex::FromHex;
-use serde::{Deserialize, Serialize};
-use serde_json::{json, Value};
-use tracing::{info, warn};
+use tracing::info;
use pbs_api_types::{Authid, PRIV_SYS_MODIFY};
-use proxmox_acme::types::AccountData as AcmeAccountData;
-use proxmox_acme::Account;
+use proxmox_acme_api::{
+ AccountEntry, AccountInfo, AcmeAccountName, AcmeChallengeSchema, ChallengeSchemaWrapper,
+ DeletablePluginProperty, DnsPluginCore, DnsPluginCoreUpdater, KnownAcmeDirectory, PluginConfig,
+ DEFAULT_ACME_DIRECTORY_ENTRY, PLUGIN_ID_SCHEMA,
+};
+use proxmox_config_digest::ConfigDigest;
use proxmox_rest_server::WorkerTask;
use proxmox_router::{
http_bail, list_subdirs_api_method, Permission, Router, RpcEnvironment, SubdirMap,
};
-use proxmox_schema::{api, param_bail};
-
-use crate::acme::AcmeClient;
-use crate::api2::types::{AcmeAccountName, AcmeChallengeSchema, KnownAcmeDirectory};
-use crate::config::acme::plugin::{
- self, DnsPlugin, DnsPluginCore, DnsPluginCoreUpdater, PLUGIN_ID_SCHEMA,
-};
+use proxmox_schema::api;
pub(crate) const ROUTER: Router = Router::new()
.get(&list_subdirs_api_method!(SUBDIRS))
@@ -65,19 +55,6 @@ const PLUGIN_ITEM_ROUTER: Router = Router::new()
.put(&API_METHOD_UPDATE_PLUGIN)
.delete(&API_METHOD_DELETE_PLUGIN);
-#[api(
- properties: {
- name: { type: AcmeAccountName },
- },
-)]
-/// An ACME Account entry.
-///
-/// Currently only contains a 'name' property.
-#[derive(Serialize)]
-pub struct AccountEntry {
- name: AcmeAccountName,
-}
-
#[api(
access: {
permission: &Permission::Privilege(&["system", "certificates"], PRIV_SYS_MODIFY, false),
@@ -91,40 +68,7 @@ pub struct AccountEntry {
)]
/// List ACME accounts.
pub fn list_accounts() -> Result<Vec<AccountEntry>, Error> {
- let mut entries = Vec::new();
- crate::config::acme::foreach_acme_account(|name| {
- entries.push(AccountEntry { name });
- ControlFlow::Continue(())
- })?;
- Ok(entries)
-}
-
-#[api(
- properties: {
- account: { type: Object, properties: {}, additional_properties: true },
- tos: {
- type: String,
- optional: true,
- },
- },
-)]
-/// ACME Account information.
-///
-/// This is what we return via the API.
-#[derive(Serialize)]
-pub struct AccountInfo {
- /// Raw account data.
- account: AcmeAccountData,
-
- /// The ACME directory URL the account was created at.
- directory: String,
-
- /// The account's own URL within the ACME directory.
- location: String,
-
- /// The ToS URL, if the user agreed to one.
- #[serde(skip_serializing_if = "Option::is_none")]
- tos: Option<String>,
+ proxmox_acme_api::list_accounts()
}
#[api(
@@ -141,23 +85,7 @@ pub struct AccountInfo {
)]
/// Return existing ACME account information.
pub async fn get_account(name: AcmeAccountName) -> Result<AccountInfo, Error> {
- let client = AcmeClient::load(&name).await?;
- let account = client.account()?;
- Ok(AccountInfo {
- location: account.location.clone(),
- tos: client.tos().map(str::to_owned),
- directory: client.directory_url().to_owned(),
- account: AcmeAccountData {
- only_return_existing: false, // don't actually write this out in case it's set
- ..account.data.clone()
- },
- })
-}
-
-fn account_contact_from_string(s: &str) -> Vec<String> {
- s.split(&[' ', ';', ',', '\0'][..])
- .map(|s| format!("mailto:{s}"))
- .collect()
+ proxmox_acme_api::get_account(name).await
}
#[api(
@@ -222,15 +150,11 @@ fn register_account(
);
}
- if Path::new(&crate::config::acme::account_path(&name)).exists() {
+ if Path::new(&proxmox_acme_api::account_config_filename(&name)).exists() {
http_bail!(BAD_REQUEST, "account {} already exists", name);
}
- let directory = directory.unwrap_or_else(|| {
- crate::config::acme::DEFAULT_ACME_DIRECTORY_ENTRY
- .url
- .to_owned()
- });
+ let directory = directory.unwrap_or_else(|| DEFAULT_ACME_DIRECTORY_ENTRY.url.to_string());
WorkerTask::spawn(
"acme-register",
@@ -238,41 +162,24 @@ fn register_account(
auth_id.to_string(),
true,
move |_worker| async move {
- let mut client = AcmeClient::new(directory);
-
info!("Registering ACME account '{}'...", &name);
- let account = do_register_account(
- &mut client,
+ let location = proxmox_acme_api::register_account(
&name,
- tos_url.is_some(),
contact,
- None,
+ tos_url,
+ Some(directory),
eab_kid.zip(eab_hmac_key),
)
.await?;
- info!("Registration successful, account URL: {}", account.location);
+ info!("Registration successful, account URL: {}", location);
Ok(())
},
)
}
-pub async fn do_register_account<'a>(
- client: &'a mut AcmeClient,
- name: &AcmeAccountName,
- agree_to_tos: bool,
- contact: String,
- rsa_bits: Option<u32>,
- eab_creds: Option<(String, String)>,
-) -> Result<&'a Account, Error> {
- let contact = account_contact_from_string(&contact);
- client
- .new_account(name, agree_to_tos, contact, rsa_bits, eab_creds)
- .await
-}
-
#[api(
input: {
properties: {
@@ -303,14 +210,7 @@ pub fn update_account(
auth_id.to_string(),
true,
move |_worker| async move {
- let data = match contact {
- Some(data) => json!({
- "contact": account_contact_from_string(&data),
- }),
- None => json!({}),
- };
-
- AcmeClient::load(&name).await?.update_account(&data).await?;
+ proxmox_acme_api::update_account(&name, contact).await?;
Ok(())
},
@@ -348,18 +248,8 @@ pub fn deactivate_account(
auth_id.to_string(),
true,
move |_worker| async move {
- match AcmeClient::load(&name)
- .await?
- .update_account(&json!({"status": "deactivated"}))
- .await
- {
- Ok(_account) => (),
- Err(err) if !force => return Err(err),
- Err(err) => {
- warn!("error deactivating account {name}, proceeding anyway - {err}");
- }
- }
- crate::config::acme::mark_account_deactivated(&name)?;
+ proxmox_acme_api::deactivate_account(&name, force).await?;
+
Ok(())
},
)
@@ -386,15 +276,7 @@ pub fn deactivate_account(
)]
/// Get the Terms of Service URL for an ACME directory.
async fn get_tos(directory: Option<String>) -> Result<Option<String>, Error> {
- let directory = directory.unwrap_or_else(|| {
- crate::config::acme::DEFAULT_ACME_DIRECTORY_ENTRY
- .url
- .to_owned()
- });
- Ok(AcmeClient::new(directory)
- .terms_of_service_url()
- .await?
- .map(str::to_owned))
+ proxmox_acme_api::get_tos(directory).await
}
#[api(
@@ -409,52 +291,7 @@ async fn get_tos(directory: Option<String>) -> Result<Option<String>, Error> {
)]
/// Get named known ACME directory endpoints.
fn get_directories() -> Result<&'static [KnownAcmeDirectory], Error> {
- Ok(crate::config::acme::KNOWN_ACME_DIRECTORIES)
-}
-
-/// Wrapper for efficient Arc use when returning the ACME challenge-plugin schema for serializing
-struct ChallengeSchemaWrapper {
- inner: Arc<Vec<AcmeChallengeSchema>>,
-}
-
-impl Serialize for ChallengeSchemaWrapper {
- fn serialize<S>(&self, serializer: S) -> Result<S::Ok, S::Error>
- where
- S: serde::Serializer,
- {
- self.inner.serialize(serializer)
- }
-}
-
-struct CachedSchema {
- schema: Arc<Vec<AcmeChallengeSchema>>,
- cached_mtime: SystemTime,
-}
-
-fn get_cached_challenge_schemas() -> Result<ChallengeSchemaWrapper, Error> {
- static CACHE: LazyLock<Mutex<Option<CachedSchema>>> = LazyLock::new(|| Mutex::new(None));
-
- // the actual loading code
- let mut last = CACHE.lock().unwrap();
-
- let actual_mtime = fs::metadata(crate::config::acme::ACME_DNS_SCHEMA_FN)?.modified()?;
-
- let schema = match &*last {
- Some(CachedSchema {
- schema,
- cached_mtime,
- }) if *cached_mtime >= actual_mtime => schema.clone(),
- _ => {
- let new_schema = Arc::new(crate::config::acme::load_dns_challenge_schema()?);
- *last = Some(CachedSchema {
- schema: Arc::clone(&new_schema),
- cached_mtime: actual_mtime,
- });
- new_schema
- }
- };
-
- Ok(ChallengeSchemaWrapper { inner: schema })
+ Ok(proxmox_acme_api::KNOWN_ACME_DIRECTORIES)
}
#[api(
@@ -469,69 +306,7 @@ fn get_cached_challenge_schemas() -> Result<ChallengeSchemaWrapper, Error> {
)]
/// Get the schema for the supported ACME challenge plugin types.
fn get_challenge_schema() -> Result<ChallengeSchemaWrapper, Error> {
- get_cached_challenge_schemas()
-}
-
-#[api]
-#[derive(Default, Deserialize, Serialize)]
-#[serde(rename_all = "kebab-case")]
-/// The API's format is inherited from PVE/PMG:
-pub struct PluginConfig {
- /// Plugin ID.
- plugin: String,
-
- /// Plugin type.
- #[serde(rename = "type")]
- ty: String,
-
- /// DNS Api name.
- #[serde(skip_serializing_if = "Option::is_none", default)]
- api: Option<String>,
-
- /// Plugin configuration data.
- #[serde(skip_serializing_if = "Option::is_none", default)]
- data: Option<String>,
-
- /// Extra delay in seconds to wait before requesting validation.
- ///
- /// Allows to cope with long TTL of DNS records.
- #[serde(skip_serializing_if = "Option::is_none", default)]
- validation_delay: Option<u32>,
-
- /// Flag to disable the config.
- #[serde(skip_serializing_if = "Option::is_none", default)]
- disable: Option<bool>,
-}
-
-// See PMG/PVE's $modify_cfg_for_api sub
-fn modify_cfg_for_api(id: &str, ty: &str, data: &Value) -> PluginConfig {
- let mut entry = data.clone();
-
- let obj = entry.as_object_mut().unwrap();
- obj.remove("id");
- obj.insert("plugin".to_string(), Value::String(id.to_owned()));
- obj.insert("type".to_string(), Value::String(ty.to_owned()));
-
- // FIXME: This needs to go once the `Updater` is fixed.
- // None of these should be able to fail unless the user changed the files by hand, in which
- // case we leave the unmodified string in the Value for now. This will be handled with an error
- // later.
- if let Some(Value::String(ref mut data)) = obj.get_mut("data") {
- if let Ok(new) = proxmox_base64::url::decode_no_pad(&data) {
- if let Ok(utf8) = String::from_utf8(new) {
- *data = utf8;
- }
- }
- }
-
- // PVE/PMG do this explicitly for ACME plugins...
- // obj.insert("digest".to_string(), Value::String(digest.clone()));
-
- serde_json::from_value(entry).unwrap_or_else(|_| PluginConfig {
- plugin: "*Error*".to_string(),
- ty: "*Error*".to_string(),
- ..Default::default()
- })
+ proxmox_acme_api::get_cached_challenge_schemas()
}
#[api(
@@ -547,12 +322,7 @@ fn modify_cfg_for_api(id: &str, ty: &str, data: &Value) -> PluginConfig {
)]
/// List ACME challenge plugins.
pub fn list_plugins(rpcenv: &mut dyn RpcEnvironment) -> Result<Vec<PluginConfig>, Error> {
- let (plugins, digest) = plugin::config()?;
- rpcenv["digest"] = hex::encode(digest).into();
- Ok(plugins
- .iter()
- .map(|(id, (ty, data))| modify_cfg_for_api(id, ty, data))
- .collect())
+ proxmox_acme_api::list_plugins(rpcenv)
}
#[api(
@@ -569,13 +339,7 @@ pub fn list_plugins(rpcenv: &mut dyn RpcEnvironment) -> Result<Vec<PluginConfig>
)]
/// List ACME challenge plugins.
pub fn get_plugin(id: String, rpcenv: &mut dyn RpcEnvironment) -> Result<PluginConfig, Error> {
- let (plugins, digest) = plugin::config()?;
- rpcenv["digest"] = hex::encode(digest).into();
-
- match plugins.get(&id) {
- Some((ty, data)) => Ok(modify_cfg_for_api(&id, ty, data)),
- None => http_bail!(NOT_FOUND, "no such plugin"),
- }
+ proxmox_acme_api::get_plugin(id, rpcenv)
}
// Currently we only have "the" standalone plugin and DNS plugins so we can just flatten a
@@ -607,30 +371,7 @@ pub fn get_plugin(id: String, rpcenv: &mut dyn RpcEnvironment) -> Result<PluginC
)]
/// Add ACME plugin configuration.
pub fn add_plugin(r#type: String, core: DnsPluginCore, data: String) -> Result<(), Error> {
- // Currently we only support DNS plugins and the standalone plugin is "fixed":
- if r#type != "dns" {
- param_bail!("type", "invalid ACME plugin type: {:?}", r#type);
- }
-
- let data = String::from_utf8(proxmox_base64::decode(data)?)
- .map_err(|_| format_err!("data must be valid UTF-8"))?;
-
- let id = core.id.clone();
-
- let _lock = plugin::lock()?;
-
- let (mut plugins, _digest) = plugin::config()?;
- if plugins.contains_key(&id) {
- param_bail!("id", "ACME plugin ID {:?} already exists", id);
- }
-
- let plugin = serde_json::to_value(DnsPlugin { core, data })?;
-
- plugins.insert(id, r#type, plugin);
-
- plugin::save_config(&plugins)?;
-
- Ok(())
+ proxmox_acme_api::add_plugin(r#type, core, data)
}
#[api(
@@ -646,26 +387,7 @@ pub fn add_plugin(r#type: String, core: DnsPluginCore, data: String) -> Result<(
)]
/// Delete an ACME plugin configuration.
pub fn delete_plugin(id: String) -> Result<(), Error> {
- let _lock = plugin::lock()?;
-
- let (mut plugins, _digest) = plugin::config()?;
- if plugins.remove(&id).is_none() {
- http_bail!(NOT_FOUND, "no such plugin");
- }
- plugin::save_config(&plugins)?;
-
- Ok(())
-}
-
-#[api()]
-#[derive(Serialize, Deserialize)]
-#[serde(rename_all = "kebab-case")]
-/// Deletable property name
-pub enum DeletableProperty {
- /// Delete the disable property
- Disable,
- /// Delete the validation-delay property
- ValidationDelay,
+ proxmox_acme_api::delete_plugin(id)
}
#[api(
@@ -687,12 +409,12 @@ pub enum DeletableProperty {
type: Array,
optional: true,
items: {
- type: DeletableProperty,
+ type: DeletablePluginProperty,
}
},
digest: {
- description: "Digest to protect against concurrent updates",
optional: true,
+ type: ConfigDigest,
},
},
},
@@ -706,65 +428,8 @@ pub fn update_plugin(
id: String,
update: DnsPluginCoreUpdater,
data: Option<String>,
- delete: Option<Vec<DeletableProperty>>,
- digest: Option<String>,
+ delete: Option<Vec<DeletablePluginProperty>>,
+ digest: Option<ConfigDigest>,
) -> Result<(), Error> {
- let data = data
- .as_deref()
- .map(proxmox_base64::decode)
- .transpose()?
- .map(String::from_utf8)
- .transpose()
- .map_err(|_| format_err!("data must be valid UTF-8"))?;
-
- let _lock = plugin::lock()?;
-
- let (mut plugins, expected_digest) = plugin::config()?;
-
- if let Some(digest) = digest {
- let digest = <[u8; 32]>::from_hex(digest)?;
- crate::tools::detect_modified_configuration_file(&digest, &expected_digest)?;
- }
-
- match plugins.get_mut(&id) {
- Some((ty, ref mut entry)) => {
- if ty != "dns" {
- bail!("cannot update plugin of type {:?}", ty);
- }
-
- let mut plugin = DnsPlugin::deserialize(&*entry)?;
-
- if let Some(delete) = delete {
- for delete_prop in delete {
- match delete_prop {
- DeletableProperty::ValidationDelay => {
- plugin.core.validation_delay = None;
- }
- DeletableProperty::Disable => {
- plugin.core.disable = None;
- }
- }
- }
- }
- if let Some(data) = data {
- plugin.data = data;
- }
- if let Some(api) = update.api {
- plugin.core.api = api;
- }
- if update.validation_delay.is_some() {
- plugin.core.validation_delay = update.validation_delay;
- }
- if update.disable.is_some() {
- plugin.core.disable = update.disable;
- }
-
- *entry = serde_json::to_value(plugin)?;
- }
- None => http_bail!(NOT_FOUND, "no such plugin"),
- }
-
- plugin::save_config(&plugins)?;
-
- Ok(())
+ proxmox_acme_api::update_plugin(id, update, data, delete, digest)
}
diff --git a/src/api2/node/certificates.rs b/src/api2/node/certificates.rs
index 6b1d87d2..7fb3a478 100644
--- a/src/api2/node/certificates.rs
+++ b/src/api2/node/certificates.rs
@@ -1,13 +1,11 @@
-use std::sync::Arc;
-use std::time::Duration;
-
use anyhow::{bail, format_err, Error};
use openssl::pkey::PKey;
use openssl::x509::X509;
use serde::{Deserialize, Serialize};
-use tracing::{info, warn};
+use tracing::info;
use pbs_api_types::{NODE_SCHEMA, PRIV_SYS_MODIFY};
+use proxmox_acme_api::AcmeDomain;
use proxmox_rest_server::WorkerTask;
use proxmox_router::list_subdirs_api_method;
use proxmox_router::SubdirMap;
@@ -17,9 +15,6 @@ use proxmox_schema::api;
use pbs_buildcfg::configdir;
use pbs_tools::cert;
-use crate::acme::AcmeClient;
-use crate::api2::types::AcmeDomain;
-use crate::config::node::NodeConfig;
use crate::server::send_certificate_renewal_mail;
pub const ROUTER: Router = Router::new()
@@ -268,193 +263,6 @@ pub async fn delete_custom_certificate() -> Result<(), Error> {
Ok(())
}
-struct OrderedCertificate {
- certificate: hyper::body::Bytes,
- private_key_pem: Vec<u8>,
-}
-
-async fn order_certificate(
- worker: Arc<WorkerTask>,
- node_config: &NodeConfig,
-) -> Result<Option<OrderedCertificate>, Error> {
- use proxmox_acme::authorization::Status;
- use proxmox_acme::order::Identifier;
-
- let domains = node_config.acme_domains().try_fold(
- Vec::<AcmeDomain>::new(),
- |mut acc, domain| -> Result<_, Error> {
- let mut domain = domain?;
- domain.domain.make_ascii_lowercase();
- if let Some(alias) = &mut domain.alias {
- alias.make_ascii_lowercase();
- }
- acc.push(domain);
- Ok(acc)
- },
- )?;
-
- let get_domain_config = |domain: &str| {
- domains
- .iter()
- .find(|d| d.domain == domain)
- .ok_or_else(|| format_err!("no config for domain '{}'", domain))
- };
-
- if domains.is_empty() {
- info!("No domains configured to be ordered from an ACME server.");
- return Ok(None);
- }
-
- let (plugins, _) = crate::config::acme::plugin::config()?;
-
- let mut acme = node_config.acme_client().await?;
-
- info!("Placing ACME order");
- let order = acme
- .new_order(domains.iter().map(|d| d.domain.to_ascii_lowercase()))
- .await?;
- info!("Order URL: {}", order.location);
-
- let identifiers: Vec<String> = order
- .data
- .identifiers
- .iter()
- .map(|identifier| match identifier {
- Identifier::Dns(domain) => domain.clone(),
- })
- .collect();
-
- for auth_url in &order.data.authorizations {
- info!("Getting authorization details from '{auth_url}'");
- let mut auth = acme.get_authorization(auth_url).await?;
-
- let domain = match &mut auth.identifier {
- Identifier::Dns(domain) => domain.to_ascii_lowercase(),
- };
-
- if auth.status == Status::Valid {
- info!("{domain} is already validated!");
- continue;
- }
-
- info!("The validation for {domain} is pending");
- let domain_config: &AcmeDomain = get_domain_config(&domain)?;
- let plugin_id = domain_config.plugin.as_deref().unwrap_or("standalone");
- let mut plugin_cfg = crate::acme::get_acme_plugin(&plugins, plugin_id)?
- .ok_or_else(|| format_err!("plugin '{plugin_id}' for domain '{domain}' not found!"))?;
-
- info!("Setting up validation plugin");
- let validation_url = plugin_cfg
- .setup(&mut acme, &auth, domain_config, Arc::clone(&worker))
- .await?;
-
- let result = request_validation(&mut acme, auth_url, validation_url).await;
-
- if let Err(err) = plugin_cfg
- .teardown(&mut acme, &auth, domain_config, Arc::clone(&worker))
- .await
- {
- warn!("Failed to teardown plugin '{plugin_id}' for domain '{domain}' - {err}");
- }
-
- result?;
- }
-
- info!("All domains validated");
- info!("Creating CSR");
-
- let csr = proxmox_acme::util::Csr::generate(&identifiers, &Default::default())?;
- let mut finalize_error_cnt = 0u8;
- let order_url = &order.location;
- let mut order;
- loop {
- use proxmox_acme::order::Status;
-
- order = acme.get_order(order_url).await?;
-
- match order.status {
- Status::Pending => {
- info!("still pending, trying to finalize anyway");
- let finalize = order
- .finalize
- .as_deref()
- .ok_or_else(|| format_err!("missing 'finalize' URL in order"))?;
- if let Err(err) = acme.finalize(finalize, &csr.data).await {
- if finalize_error_cnt >= 5 {
- return Err(err);
- }
-
- finalize_error_cnt += 1;
- }
- tokio::time::sleep(Duration::from_secs(5)).await;
- }
- Status::Ready => {
- info!("order is ready, finalizing");
- let finalize = order
- .finalize
- .as_deref()
- .ok_or_else(|| format_err!("missing 'finalize' URL in order"))?;
- acme.finalize(finalize, &csr.data).await?;
- tokio::time::sleep(Duration::from_secs(5)).await;
- }
- Status::Processing => {
- info!("still processing, trying again in 30 seconds");
- tokio::time::sleep(Duration::from_secs(30)).await;
- }
- Status::Valid => {
- info!("valid");
- break;
- }
- other => bail!("order status: {:?}", other),
- }
- }
-
- info!("Downloading certificate");
- let certificate = acme
- .get_certificate(
- order
- .certificate
- .as_deref()
- .ok_or_else(|| format_err!("missing certificate url in finalized order"))?,
- )
- .await?;
-
- Ok(Some(OrderedCertificate {
- certificate,
- private_key_pem: csr.private_key_pem,
- }))
-}
-
-async fn request_validation(
- acme: &mut AcmeClient,
- auth_url: &str,
- validation_url: &str,
-) -> Result<(), Error> {
- info!("Triggering validation");
- acme.request_challenge_validation(validation_url).await?;
-
- info!("Sleeping for 5 seconds");
- tokio::time::sleep(Duration::from_secs(5)).await;
-
- loop {
- use proxmox_acme::authorization::Status;
-
- let auth = acme.get_authorization(auth_url).await?;
- match auth.status {
- Status::Pending => {
- info!("Status is still 'pending', trying again in 10 seconds");
- tokio::time::sleep(Duration::from_secs(10)).await;
- }
- Status::Valid => return Ok(()),
- other => bail!(
- "validating challenge '{}' failed - status: {:?}",
- validation_url,
- other
- ),
- }
- }
-}
-
#[api(
input: {
properties: {
@@ -524,9 +332,26 @@ fn spawn_certificate_worker(
let auth_id = rpcenv.get_auth_id().unwrap();
+ let acme_config = node_config.acme_config()?;
+
+ let domains = node_config.acme_domains().try_fold(
+ Vec::<AcmeDomain>::new(),
+ |mut acc, domain| -> Result<_, Error> {
+ let mut domain = domain?;
+ domain.domain.make_ascii_lowercase();
+ if let Some(alias) = &mut domain.alias {
+ alias.make_ascii_lowercase();
+ }
+ acc.push(domain);
+ Ok(acc)
+ },
+ )?;
+
WorkerTask::spawn(name, None, auth_id, true, move |worker| async move {
let work = || async {
- if let Some(cert) = order_certificate(worker, &node_config).await? {
+ if let Some(cert) =
+ proxmox_acme_api::order_certificate(worker, &acme_config, &domains).await?
+ {
crate::config::set_proxy_certificate(&cert.certificate, &cert.private_key_pem)?;
crate::server::reload_proxy_certificate().await?;
}
@@ -562,16 +387,16 @@ pub fn revoke_acme_cert(rpcenv: &mut dyn RpcEnvironment) -> Result<String, Error
let auth_id = rpcenv.get_auth_id().unwrap();
+ let acme_config = node_config.acme_config()?;
+
WorkerTask::spawn(
"acme-revoke-cert",
None,
auth_id,
true,
move |_worker| async move {
- info!("Loading ACME account");
- let mut acme = node_config.acme_client().await?;
info!("Revoking old certificate");
- acme.revoke_certificate(cert_pem.as_bytes(), None).await?;
+ proxmox_acme_api::revoke_certificate(&acme_config, cert_pem.as_bytes()).await?;
info!("Deleting certificate and regenerating a self-signed one");
delete_custom_certificate().await?;
Ok(())
diff --git a/src/api2/types/acme.rs b/src/api2/types/acme.rs
index 8661f9e8..b83b9882 100644
--- a/src/api2/types/acme.rs
+++ b/src/api2/types/acme.rs
@@ -1,8 +1,7 @@
use serde::{Deserialize, Serialize};
-use serde_json::Value;
use pbs_api_types::{DNS_ALIAS_FORMAT, DNS_NAME_FORMAT, PROXMOX_SAFE_ID_FORMAT};
-use proxmox_schema::{api, ApiStringFormat, ApiType, Schema, StringSchema};
+use proxmox_schema::api;
#[api(
properties: {
@@ -37,61 +36,3 @@ pub struct AcmeDomain {
#[serde(skip_serializing_if = "Option::is_none")]
pub plugin: Option<String>,
}
-
-pub const ACME_DOMAIN_PROPERTY_SCHEMA: Schema =
- StringSchema::new("ACME domain configuration string")
- .format(&ApiStringFormat::PropertyString(&AcmeDomain::API_SCHEMA))
- .schema();
-
-#[api(
- properties: {
- name: { type: String },
- url: { type: String },
- },
-)]
-/// An ACME directory endpoint with a name and URL.
-#[derive(Serialize)]
-pub struct KnownAcmeDirectory {
- /// The ACME directory's name.
- pub name: &'static str,
-
- /// The ACME directory's endpoint URL.
- pub url: &'static str,
-}
-
-proxmox_schema::api_string_type! {
- #[api(format: &PROXMOX_SAFE_ID_FORMAT)]
- /// ACME account name.
- #[derive(Clone, Eq, PartialEq, Hash, Deserialize, Serialize)]
- #[serde(transparent)]
- pub struct AcmeAccountName(String);
-}
-
-#[api(
- properties: {
- schema: {
- type: Object,
- additional_properties: true,
- properties: {},
- },
- type: {
- type: String,
- },
- },
-)]
-#[derive(Serialize)]
-/// Schema for an ACME challenge plugin.
-pub struct AcmeChallengeSchema {
- /// Plugin ID.
- pub id: String,
-
- /// Human readable name, falls back to id.
- pub name: String,
-
- /// Plugin Type.
- #[serde(rename = "type")]
- pub ty: &'static str,
-
- /// The plugin's parameter schema.
- pub schema: Value,
-}
diff --git a/src/bin/proxmox-backup-api.rs b/src/bin/proxmox-backup-api.rs
index 417e9e97..d0091dca 100644
--- a/src/bin/proxmox-backup-api.rs
+++ b/src/bin/proxmox-backup-api.rs
@@ -14,6 +14,7 @@ use proxmox_rest_server::{ApiConfig, RestServer};
use proxmox_router::RpcEnvironmentType;
use proxmox_sys::fs::CreateOptions;
+use pbs_buildcfg::configdir;
use proxmox_backup::auth_helpers::*;
use proxmox_backup::config;
use proxmox_backup::server::auth::check_pbs_auth;
@@ -78,6 +79,7 @@ async fn run() -> Result<(), Error> {
let mut command_sock = proxmox_daemon::command_socket::CommandSocket::new(backup_user.gid);
proxmox_product_config::init(backup_user.clone(), pbs_config::priv_user()?);
+ proxmox_acme_api::init(configdir!("/acme"), true)?;
let dir_opts = CreateOptions::new()
.owner(backup_user.uid)
diff --git a/src/bin/proxmox-backup-manager.rs b/src/bin/proxmox-backup-manager.rs
index f8365070..f041ba0b 100644
--- a/src/bin/proxmox-backup-manager.rs
+++ b/src/bin/proxmox-backup-manager.rs
@@ -19,12 +19,12 @@ use proxmox_router::{cli::*, RpcEnvironment};
use proxmox_schema::api;
use proxmox_sys::fs::CreateOptions;
+use pbs_buildcfg::configdir;
use pbs_client::{display_task_log, view_task_result};
use pbs_config::sync;
use pbs_tools::json::required_string_param;
use proxmox_backup::api2;
use proxmox_backup::client_helpers::connect_to_localhost;
-use proxmox_backup::config;
mod proxmox_backup_manager;
use proxmox_backup_manager::*;
@@ -667,6 +667,7 @@ async fn run() -> Result<(), Error> {
.init()?;
proxmox_backup::server::notifications::init()?;
proxmox_product_config::init(pbs_config::backup_user()?, pbs_config::priv_user()?);
+ proxmox_acme_api::init(configdir!("/acme"), false)?;
let cmd_def = CliCommandMap::new()
.insert("acl", acl_commands())
diff --git a/src/bin/proxmox-backup-proxy.rs b/src/bin/proxmox-backup-proxy.rs
index 870208fe..eea44a7d 100644
--- a/src/bin/proxmox-backup-proxy.rs
+++ b/src/bin/proxmox-backup-proxy.rs
@@ -188,6 +188,7 @@ async fn run() -> Result<(), Error> {
proxmox_backup::server::notifications::init()?;
metric_collection::init()?;
proxmox_product_config::init(pbs_config::backup_user()?, pbs_config::priv_user()?);
+ proxmox_acme_api::init(configdir!("/acme"), false)?;
let mut indexpath = PathBuf::from(pbs_buildcfg::JS_DIR);
indexpath.push("index.hbs");
diff --git a/src/bin/proxmox_backup_manager/acme.rs b/src/bin/proxmox_backup_manager/acme.rs
index 0f0eafea..57431225 100644
--- a/src/bin/proxmox_backup_manager/acme.rs
+++ b/src/bin/proxmox_backup_manager/acme.rs
@@ -3,15 +3,13 @@ use std::io::Write;
use anyhow::{bail, Error};
use serde_json::Value;
+use proxmox_acme::async_client::AcmeClient;
+use proxmox_acme_api::{AcmeAccountName, DnsPluginCore, KNOWN_ACME_DIRECTORIES};
use proxmox_router::{cli::*, ApiHandler, RpcEnvironment};
use proxmox_schema::api;
use proxmox_sys::fs::file_get_contents;
-use proxmox_backup::acme::AcmeClient;
use proxmox_backup::api2;
-use proxmox_backup::api2::types::AcmeAccountName;
-use proxmox_backup::config::acme::plugin::DnsPluginCore;
-use proxmox_backup::config::acme::KNOWN_ACME_DIRECTORIES;
pub fn acme_mgmt_cli() -> CommandLineInterface {
let cmd_def = CliCommandMap::new()
@@ -122,7 +120,7 @@ async fn register_account(
match input.trim().parse::<usize>() {
Ok(n) if n < KNOWN_ACME_DIRECTORIES.len() => {
- break (KNOWN_ACME_DIRECTORIES[n].url.to_owned(), false);
+ break (KNOWN_ACME_DIRECTORIES[n].url.to_string(), false);
}
Ok(n) if n == KNOWN_ACME_DIRECTORIES.len() => {
input.clear();
@@ -188,17 +186,20 @@ async fn register_account(
println!("Attempting to register account with {directory_url:?}...");
- let account = api2::config::acme::do_register_account(
- &mut client,
+ let tos_agreed = tos_agreed
+ .then(|| directory.terms_of_service_url().map(str::to_owned))
+ .flatten();
+
+ let location = proxmox_acme_api::register_account(
&name,
- tos_agreed,
contact,
- None,
+ tos_agreed,
+ Some(directory_url),
eab_creds,
)
.await?;
- println!("Registration successful, account URL: {}", account.location);
+ println!("Registration successful, account URL: {}", location);
Ok(())
}
@@ -266,19 +267,19 @@ pub fn account_cli() -> CommandLineInterface {
"deactivate",
CliCommand::new(&API_METHOD_DEACTIVATE_ACCOUNT)
.arg_param(&["name"])
- .completion_cb("name", crate::config::acme::complete_acme_account),
+ .completion_cb("name", proxmox_acme_api::complete_acme_account),
)
.insert(
"info",
CliCommand::new(&API_METHOD_GET_ACCOUNT)
.arg_param(&["name"])
- .completion_cb("name", crate::config::acme::complete_acme_account),
+ .completion_cb("name", proxmox_acme_api::complete_acme_account),
)
.insert(
"update",
CliCommand::new(&API_METHOD_UPDATE_ACCOUNT)
.arg_param(&["name"])
- .completion_cb("name", crate::config::acme::complete_acme_account),
+ .completion_cb("name", proxmox_acme_api::complete_acme_account),
);
cmd_def.into()
@@ -373,26 +374,26 @@ pub fn plugin_cli() -> CommandLineInterface {
"config", // name comes from pve/pmg
CliCommand::new(&API_METHOD_GET_PLUGIN)
.arg_param(&["id"])
- .completion_cb("id", crate::config::acme::complete_acme_plugin),
+ .completion_cb("id", proxmox_acme_api::complete_acme_plugin),
)
.insert(
"add",
CliCommand::new(&API_METHOD_ADD_PLUGIN)
.arg_param(&["type", "id"])
- .completion_cb("api", crate::config::acme::complete_acme_api_challenge_type)
- .completion_cb("type", crate::config::acme::complete_acme_plugin_type),
+ .completion_cb("api", proxmox_acme_api::complete_acme_api_challenge_type)
+ .completion_cb("type", proxmox_acme_api::complete_acme_plugin_type),
)
.insert(
"remove",
CliCommand::new(&acme::API_METHOD_DELETE_PLUGIN)
.arg_param(&["id"])
- .completion_cb("id", crate::config::acme::complete_acme_plugin),
+ .completion_cb("id", proxmox_acme_api::complete_acme_plugin),
)
.insert(
"set",
CliCommand::new(&acme::API_METHOD_UPDATE_PLUGIN)
.arg_param(&["id"])
- .completion_cb("id", crate::config::acme::complete_acme_plugin),
+ .completion_cb("id", proxmox_acme_api::complete_acme_plugin),
);
cmd_def.into()
diff --git a/src/config/acme/mod.rs b/src/config/acme/mod.rs
index ac89ae5e..962cb1bb 100644
--- a/src/config/acme/mod.rs
+++ b/src/config/acme/mod.rs
@@ -1,168 +1 @@
-use std::collections::HashMap;
-use std::ops::ControlFlow;
-use std::path::Path;
-
-use anyhow::{bail, format_err, Error};
-use serde_json::Value;
-
-use pbs_api_types::PROXMOX_SAFE_ID_REGEX;
-use proxmox_sys::error::SysError;
-use proxmox_sys::fs::{file_read_string, CreateOptions};
-
-use crate::api2::types::{AcmeAccountName, AcmeChallengeSchema, KnownAcmeDirectory};
-
-pub(crate) const ACME_DIR: &str = pbs_buildcfg::configdir!("/acme");
-pub(crate) const ACME_ACCOUNT_DIR: &str = pbs_buildcfg::configdir!("/acme/accounts");
-
-pub(crate) const ACME_DNS_SCHEMA_FN: &str = "/usr/share/proxmox-acme/dns-challenge-schema.json";
-
pub mod plugin;
-
-// `const fn`ify this once it is supported in `proxmox`
-fn root_only() -> CreateOptions {
- CreateOptions::new()
- .owner(nix::unistd::ROOT)
- .group(nix::unistd::Gid::from_raw(0))
- .perm(nix::sys::stat::Mode::from_bits_truncate(0o700))
-}
-
-fn create_acme_subdir(dir: &str) -> Result<(), Error> {
- proxmox_sys::fs::ensure_dir_exists(dir, &root_only(), false)
-}
-
-pub(crate) fn make_acme_dir() -> Result<(), Error> {
- create_acme_subdir(ACME_DIR)
-}
-
-pub(crate) fn make_acme_account_dir() -> Result<(), Error> {
- make_acme_dir()?;
- create_acme_subdir(ACME_ACCOUNT_DIR)
-}
-
-pub const KNOWN_ACME_DIRECTORIES: &[KnownAcmeDirectory] = &[
- KnownAcmeDirectory {
- name: "Let's Encrypt V2",
- url: "https://acme-v02.api.letsencrypt.org/directory",
- },
- KnownAcmeDirectory {
- name: "Let's Encrypt V2 Staging",
- url: "https://acme-staging-v02.api.letsencrypt.org/directory",
- },
-];
-
-pub const DEFAULT_ACME_DIRECTORY_ENTRY: &KnownAcmeDirectory = &KNOWN_ACME_DIRECTORIES[0];
-
-pub fn account_path(name: &str) -> String {
- format!("{ACME_ACCOUNT_DIR}/{name}")
-}
-
-pub fn foreach_acme_account<F>(mut func: F) -> Result<(), Error>
-where
- F: FnMut(AcmeAccountName) -> ControlFlow<Result<(), Error>>,
-{
- match proxmox_sys::fs::scan_subdir(-1, ACME_ACCOUNT_DIR, &PROXMOX_SAFE_ID_REGEX) {
- Ok(files) => {
- for file in files {
- let file = file?;
- let file_name = unsafe { file.file_name_utf8_unchecked() };
-
- if file_name.starts_with('_') {
- continue;
- }
-
- let account_name = match AcmeAccountName::from_string(file_name.to_owned()) {
- Ok(account_name) => account_name,
- Err(_) => continue,
- };
-
- if let ControlFlow::Break(result) = func(account_name) {
- return result;
- }
- }
- Ok(())
- }
- Err(err) if err.not_found() => Ok(()),
- Err(err) => Err(err.into()),
- }
-}
-
-pub fn mark_account_deactivated(name: &str) -> Result<(), Error> {
- let from = account_path(name);
- for i in 0..100 {
- let to = account_path(&format!("_deactivated_{name}_{i}"));
- if !Path::new(&to).exists() {
- return std::fs::rename(&from, &to).map_err(|err| {
- format_err!(
- "failed to move account path {:?} to {:?} - {}",
- from,
- to,
- err
- )
- });
- }
- }
- bail!(
- "No free slot to rename deactivated account {:?}, please cleanup {:?}",
- from,
- ACME_ACCOUNT_DIR
- );
-}
-
-pub fn load_dns_challenge_schema() -> Result<Vec<AcmeChallengeSchema>, Error> {
- let raw = file_read_string(ACME_DNS_SCHEMA_FN)?;
- let schemas: serde_json::Map<String, Value> = serde_json::from_str(&raw)?;
-
- Ok(schemas
- .iter()
- .map(|(id, schema)| AcmeChallengeSchema {
- id: id.to_owned(),
- name: schema
- .get("name")
- .and_then(Value::as_str)
- .unwrap_or(id)
- .to_owned(),
- ty: "dns",
- schema: schema.to_owned(),
- })
- .collect())
-}
-
-pub fn complete_acme_account(_arg: &str, _param: &HashMap<String, String>) -> Vec<String> {
- let mut out = Vec::new();
- let _ = foreach_acme_account(|name| {
- out.push(name.into_string());
- ControlFlow::Continue(())
- });
- out
-}
-
-pub fn complete_acme_plugin(_arg: &str, _param: &HashMap<String, String>) -> Vec<String> {
- match plugin::config() {
- Ok((config, _digest)) => config
- .iter()
- .map(|(id, (_type, _cfg))| id.clone())
- .collect(),
- Err(_) => Vec::new(),
- }
-}
-
-pub fn complete_acme_plugin_type(_arg: &str, _param: &HashMap<String, String>) -> Vec<String> {
- vec![
- "dns".to_string(),
- //"http".to_string(), // makes currently not really sense to create or the like
- ]
-}
-
-pub fn complete_acme_api_challenge_type(
- _arg: &str,
- param: &HashMap<String, String>,
-) -> Vec<String> {
- if param.get("type") == Some(&"dns".to_string()) {
- match load_dns_challenge_schema() {
- Ok(schema) => schema.into_iter().map(|s| s.id).collect(),
- Err(_) => Vec::new(),
- }
- } else {
- Vec::new()
- }
-}
diff --git a/src/config/acme/plugin.rs b/src/config/acme/plugin.rs
index 8ce852ec..e5a41f99 100644
--- a/src/config/acme/plugin.rs
+++ b/src/config/acme/plugin.rs
@@ -1,14 +1,10 @@
-use std::sync::LazyLock;
-
use anyhow::Error;
use serde::{Deserialize, Serialize};
use serde_json::Value;
use pbs_api_types::PROXMOX_SAFE_ID_FORMAT;
-use proxmox_schema::{api, ApiType, Schema, StringSchema, Updater};
-use proxmox_section_config::{SectionConfig, SectionConfigData, SectionConfigPlugin};
-
-use pbs_config::{open_backup_lockfile, BackupLockGuard};
+use proxmox_schema::{api, Schema, StringSchema, Updater};
+use proxmox_section_config::SectionConfigData;
pub const PLUGIN_ID_SCHEMA: Schema = StringSchema::new("ACME Challenge Plugin ID.")
.format(&PROXMOX_SAFE_ID_FORMAT)
@@ -16,28 +12,6 @@ pub const PLUGIN_ID_SCHEMA: Schema = StringSchema::new("ACME Challenge Plugin ID
.max_length(32)
.schema();
-pub static CONFIG: LazyLock<SectionConfig> = LazyLock::new(init);
-
-#[api(
- properties: {
- id: { schema: PLUGIN_ID_SCHEMA },
- },
-)]
-#[derive(Deserialize, Serialize)]
-/// Standalone ACME Plugin for the http-1 challenge.
-pub struct StandalonePlugin {
- /// Plugin ID.
- id: String,
-}
-
-impl Default for StandalonePlugin {
- fn default() -> Self {
- Self {
- id: "standalone".to_string(),
- }
- }
-}
-
#[api(
properties: {
id: { schema: PLUGIN_ID_SCHEMA },
@@ -99,64 +73,6 @@ impl DnsPlugin {
}
}
-fn init() -> SectionConfig {
- let mut config = SectionConfig::new(&PLUGIN_ID_SCHEMA);
-
- let standalone_schema = match &StandalonePlugin::API_SCHEMA {
- Schema::Object(schema) => schema,
- _ => unreachable!(),
- };
- let standalone_plugin = SectionConfigPlugin::new(
- "standalone".to_string(),
- Some("id".to_string()),
- standalone_schema,
- );
- config.register_plugin(standalone_plugin);
-
- let dns_challenge_schema = match DnsPlugin::API_SCHEMA {
- Schema::AllOf(ref schema) => schema,
- _ => unreachable!(),
- };
- let dns_challenge_plugin = SectionConfigPlugin::new(
- "dns".to_string(),
- Some("id".to_string()),
- dns_challenge_schema,
- );
- config.register_plugin(dns_challenge_plugin);
-
- config
-}
-
-const ACME_PLUGIN_CFG_FILENAME: &str = pbs_buildcfg::configdir!("/acme/plugins.cfg");
-const ACME_PLUGIN_CFG_LOCKFILE: &str = pbs_buildcfg::configdir!("/acme/.plugins.lck");
-
-pub fn lock() -> Result<BackupLockGuard, Error> {
- super::make_acme_dir()?;
- open_backup_lockfile(ACME_PLUGIN_CFG_LOCKFILE, None, true)
-}
-
-pub fn config() -> Result<(PluginData, [u8; 32]), Error> {
- let content =
- proxmox_sys::fs::file_read_optional_string(ACME_PLUGIN_CFG_FILENAME)?.unwrap_or_default();
-
- let digest = openssl::sha::sha256(content.as_bytes());
- let mut data = CONFIG.parse(ACME_PLUGIN_CFG_FILENAME, &content)?;
-
- if !data.sections.contains_key("standalone") {
- let standalone = StandalonePlugin::default();
- data.set_data("standalone", "standalone", &standalone)
- .unwrap();
- }
-
- Ok((PluginData { data }, digest))
-}
-
-pub fn save_config(config: &PluginData) -> Result<(), Error> {
- super::make_acme_dir()?;
- let raw = CONFIG.write(ACME_PLUGIN_CFG_FILENAME, &config.data)?;
- pbs_config::replace_backup_config(ACME_PLUGIN_CFG_FILENAME, raw.as_bytes())
-}
-
pub struct PluginData {
data: SectionConfigData,
}
diff --git a/src/config/node.rs b/src/config/node.rs
index 253b2e36..81eecb24 100644
--- a/src/config/node.rs
+++ b/src/config/node.rs
@@ -8,16 +8,14 @@ use pbs_api_types::{
EMAIL_SCHEMA, MULTI_LINE_COMMENT_SCHEMA, OPENSSL_CIPHERS_TLS_1_2_SCHEMA,
OPENSSL_CIPHERS_TLS_1_3_SCHEMA,
};
+use proxmox_acme_api::{AcmeConfig, AcmeDomain, ACME_DOMAIN_PROPERTY_SCHEMA};
use proxmox_http::ProxyConfig;
use proxmox_schema::{api, ApiStringFormat, ApiType, Updater};
use pbs_buildcfg::configdir;
use pbs_config::{open_backup_lockfile, BackupLockGuard};
-use crate::acme::AcmeClient;
-use crate::api2::types::{
- AcmeAccountName, AcmeDomain, ACME_DOMAIN_PROPERTY_SCHEMA, HTTP_PROXY_SCHEMA,
-};
+use crate::api2::types::HTTP_PROXY_SCHEMA;
const CONF_FILE: &str = configdir!("/node.cfg");
const LOCK_FILE: &str = configdir!("/.node.lck");
@@ -44,20 +42,6 @@ pub fn save_config(config: &NodeConfig) -> Result<(), Error> {
pbs_config::replace_backup_config(CONF_FILE, &raw)
}
-#[api(
- properties: {
- account: { type: AcmeAccountName },
- }
-)]
-#[derive(Deserialize, Serialize)]
-/// The ACME configuration.
-///
-/// Currently only contains the name of the account use.
-pub struct AcmeConfig {
- /// Account to use to acquire ACME certificates.
- account: AcmeAccountName,
-}
-
/// All available languages in Proxmox. Taken from proxmox-i18n repository.
/// pt_BR, zh_CN, and zh_TW use the same case in the translation files.
// TODO: auto-generate from available translations
@@ -235,19 +219,16 @@ pub struct NodeConfig {
}
impl NodeConfig {
- pub fn acme_config(&self) -> Option<Result<AcmeConfig, Error>> {
- self.acme.as_deref().map(|config| -> Result<_, Error> {
- crate::tools::config::from_property_string(config, &AcmeConfig::API_SCHEMA)
- })
- }
-
- pub async fn acme_client(&self) -> Result<AcmeClient, Error> {
- let account = if let Some(cfg) = self.acme_config().transpose()? {
- cfg.account
- } else {
- AcmeAccountName::from_string("default".to_string())? // should really not happen
- };
- AcmeClient::load(&account).await
+ pub fn acme_config(&self) -> Result<AcmeConfig, Error> {
+ self.acme
+ .as_deref()
+ .map(|config| {
+ crate::tools::config::from_property_string::<AcmeConfig>(
+ config,
+ &AcmeConfig::API_SCHEMA,
+ )
+ })
+ .unwrap_or_else(|| proxmox_acme_api::parse_acme_config_string("account=default"))
}
pub fn acme_domains(&'_ self) -> AcmeDomainIter<'_> {
--
2.47.3
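A side note on the `register_account` hunk above: the reworked `tos_agreed` value is built with `bool::then` followed by `Option::flatten`, collapsing `Option<Option<String>>` into `Option<String>`. A minimal stdlib-only sketch of that combinator pattern (the function name and URL here are hypothetical, not part of the patch):

```rust
// `agreed.then(|| ...)` yields Some(closure result) only when `agreed`
// is true; the inner Option from `map` is then collapsed by `flatten`,
// so a URL is produced only when the user agreed AND the directory
// actually advertises terms of service.
fn tos_url(agreed: bool, terms_url: Option<&str>) -> Option<String> {
    agreed.then(|| terms_url.map(str::to_owned)).flatten()
}

fn main() {
    assert_eq!(
        tos_url(true, Some("https://example.test/tos")),
        Some("https://example.test/tos".to_string())
    );
    assert_eq!(tos_url(false, Some("https://example.test/tos")), None);
    assert_eq!(tos_url(true, None), None);
    println!("ok");
}
```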
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
^ permalink raw reply [relevance 4%]
* [pbs-devel] [PATCH proxmox-backup v6 2/2] acme: remove unused src/acme and plugin code
2026-01-16 11:28 10% [pbs-devel] [PATCH proxmox{, -backup} v6 0/5] fix #6939: acme: support servers returning 204 for nonce requests Samuel Rufinatscha
` (3 preceding siblings ...)
2026-01-16 11:28 4% ` [pbs-devel] [PATCH proxmox-backup v6 1/2] acme: remove local AcmeClient and use proxmox-acme-api handlers Samuel Rufinatscha
@ 2026-01-16 11:28 9% ` Samuel Rufinatscha
4 siblings, 0 replies; 200+ results
From: Samuel Rufinatscha @ 2026-01-16 11:28 UTC (permalink / raw)
To: pbs-devel
Removes the now-unused src/acme module and plugin code, as PBS now uses
the factored-out client/API handlers.
Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
---
src/acme/mod.rs | 1 -
src/acme/plugin.rs | 335 --------------------------------------
src/api2/types/acme.rs | 38 -----
src/api2/types/mod.rs | 3 -
src/config/acme/mod.rs | 1 -
src/config/acme/plugin.rs | 105 ------------
src/config/mod.rs | 1 -
src/lib.rs | 2 -
8 files changed, 486 deletions(-)
delete mode 100644 src/acme/mod.rs
delete mode 100644 src/acme/plugin.rs
delete mode 100644 src/api2/types/acme.rs
delete mode 100644 src/config/acme/mod.rs
delete mode 100644 src/config/acme/plugin.rs
diff --git a/src/acme/mod.rs b/src/acme/mod.rs
deleted file mode 100644
index 700d90d7..00000000
--- a/src/acme/mod.rs
+++ /dev/null
@@ -1 +0,0 @@
-pub(crate) mod plugin;
diff --git a/src/acme/plugin.rs b/src/acme/plugin.rs
deleted file mode 100644
index 6804243c..00000000
--- a/src/acme/plugin.rs
+++ /dev/null
@@ -1,335 +0,0 @@
-use std::future::Future;
-use std::net::{IpAddr, SocketAddr};
-use std::pin::Pin;
-use std::process::Stdio;
-use std::sync::Arc;
-use std::time::Duration;
-
-use anyhow::{bail, format_err, Error};
-use bytes::Bytes;
-use futures::TryFutureExt;
-use http_body_util::Full;
-use hyper::body::Incoming;
-use hyper::server::conn::http1;
-use hyper::service::service_fn;
-use hyper::{Request, Response};
-use hyper_util::rt::TokioIo;
-use tokio::io::{AsyncBufReadExt, AsyncRead, AsyncWriteExt, BufReader};
-use tokio::net::TcpListener;
-use tokio::process::Command;
-
-use proxmox_acme::async_client::AcmeClient;
-use proxmox_acme::{Authorization, Challenge};
-use proxmox_rest_server::WorkerTask;
-
-use crate::api2::types::AcmeDomain;
-use crate::config::acme::plugin::{DnsPlugin, PluginData};
-
-const PROXMOX_ACME_SH_PATH: &str = "/usr/share/proxmox-acme/proxmox-acme";
-
-pub(crate) fn get_acme_plugin(
- plugin_data: &PluginData,
- name: &str,
-) -> Result<Option<Box<dyn AcmePlugin + Send + Sync + 'static>>, Error> {
- let (ty, data) = match plugin_data.get(name) {
- Some(plugin) => plugin,
- None => return Ok(None),
- };
-
- Ok(Some(match ty.as_str() {
- "dns" => {
- let plugin: DnsPlugin = serde::Deserialize::deserialize(data)?;
- Box::new(plugin)
- }
- "standalone" => {
- // this one has no config
- Box::<StandaloneServer>::default()
- }
- other => bail!("missing implementation for plugin type '{}'", other),
- }))
-}
-
-pub(crate) trait AcmePlugin {
- /// Setup everything required to trigger the validation and return the corresponding validation
- /// URL.
- fn setup<'fut, 'a: 'fut, 'b: 'fut, 'c: 'fut, 'd: 'fut>(
- &'a mut self,
- client: &'b mut AcmeClient,
- authorization: &'c Authorization,
- domain: &'d AcmeDomain,
- task: Arc<WorkerTask>,
- ) -> Pin<Box<dyn Future<Output = Result<&'c str, Error>> + Send + 'fut>>;
-
- fn teardown<'fut, 'a: 'fut, 'b: 'fut, 'c: 'fut, 'd: 'fut>(
- &'a mut self,
- client: &'b mut AcmeClient,
- authorization: &'c Authorization,
- domain: &'d AcmeDomain,
- task: Arc<WorkerTask>,
- ) -> Pin<Box<dyn Future<Output = Result<(), Error>> + Send + 'fut>>;
-}
-
-fn extract_challenge<'a>(
- authorization: &'a Authorization,
- ty: &str,
-) -> Result<&'a Challenge, Error> {
- authorization
- .challenges
- .iter()
- .find(|ch| ch.ty == ty)
- .ok_or_else(|| format_err!("no supported challenge type ({}) found", ty))
-}
-
-async fn pipe_to_tasklog<T: AsyncRead + Unpin>(
- pipe: T,
- task: Arc<WorkerTask>,
-) -> Result<(), std::io::Error> {
- let mut pipe = BufReader::new(pipe);
- let mut line = String::new();
- loop {
- line.clear();
- match pipe.read_line(&mut line).await {
- Ok(0) => return Ok(()),
- Ok(_) => task.log_message(line.as_str()),
- Err(err) => return Err(err),
- }
- }
-}
-
-impl DnsPlugin {
- async fn action<'a>(
- &self,
- client: &mut AcmeClient,
- authorization: &'a Authorization,
- domain: &AcmeDomain,
- task: Arc<WorkerTask>,
- action: &str,
- ) -> Result<&'a str, Error> {
- let challenge = extract_challenge(authorization, "dns-01")?;
- let mut stdin_data = client
- .dns_01_txt_value(
- challenge
- .token()
- .ok_or_else(|| format_err!("missing token in challenge"))?,
- )?
- .into_bytes();
- stdin_data.push(b'\n');
- stdin_data.extend(self.data.as_bytes());
- if stdin_data.last() != Some(&b'\n') {
- stdin_data.push(b'\n');
- }
-
- let mut command = Command::new("/usr/bin/setpriv");
-
- #[rustfmt::skip]
- command.args([
- "--reuid", "nobody",
- "--regid", "nogroup",
- "--clear-groups",
- "--reset-env",
- "--",
- "/bin/bash",
- PROXMOX_ACME_SH_PATH,
- action,
- &self.core.api,
- domain.alias.as_deref().unwrap_or(&domain.domain),
- ]);
-
- // We could use 1 socketpair, but tokio wraps them all in `File` internally causing `close`
- // to be called separately on all of them without exception, so we need 3 pipes :-(
-
- let mut child = command
- .stdin(Stdio::piped())
- .stdout(Stdio::piped())
- .stderr(Stdio::piped())
- .spawn()?;
-
- let mut stdin = child.stdin.take().expect("Stdio::piped()");
- let stdout = child.stdout.take().expect("Stdio::piped() failed?");
- let stdout = pipe_to_tasklog(stdout, Arc::clone(&task));
- let stderr = child.stderr.take().expect("Stdio::piped() failed?");
- let stderr = pipe_to_tasklog(stderr, Arc::clone(&task));
- let stdin = async move {
- stdin.write_all(&stdin_data).await?;
- stdin.flush().await?;
- Ok::<_, std::io::Error>(())
- };
- match futures::try_join!(stdin, stdout, stderr) {
- Ok(((), (), ())) => (),
- Err(err) => {
- if let Err(err) = child.kill().await {
- task.log_message(format!(
- "failed to kill '{PROXMOX_ACME_SH_PATH} {action}' command: {err}"
- ));
- }
- bail!("'{}' failed: {}", PROXMOX_ACME_SH_PATH, err);
- }
- }
-
- let status = child.wait().await?;
- if !status.success() {
- bail!(
- "'{} {}' exited with error ({})",
- PROXMOX_ACME_SH_PATH,
- action,
- status.code().unwrap_or(-1)
- );
- }
-
- Ok(&challenge.url)
- }
-}
-
-impl AcmePlugin for DnsPlugin {
- fn setup<'fut, 'a: 'fut, 'b: 'fut, 'c: 'fut, 'd: 'fut>(
- &'a mut self,
- client: &'b mut AcmeClient,
- authorization: &'c Authorization,
- domain: &'d AcmeDomain,
- task: Arc<WorkerTask>,
- ) -> Pin<Box<dyn Future<Output = Result<&'c str, Error>> + Send + 'fut>> {
- Box::pin(async move {
- let result = self
- .action(client, authorization, domain, task.clone(), "setup")
- .await;
-
- let validation_delay = self.core.validation_delay.unwrap_or(30) as u64;
- if validation_delay > 0 {
- task.log_message(format!(
- "Sleeping {validation_delay} seconds to wait for TXT record propagation"
- ));
- tokio::time::sleep(Duration::from_secs(validation_delay)).await;
- }
- result
- })
- }
-
- fn teardown<'fut, 'a: 'fut, 'b: 'fut, 'c: 'fut, 'd: 'fut>(
- &'a mut self,
- client: &'b mut AcmeClient,
- authorization: &'c Authorization,
- domain: &'d AcmeDomain,
- task: Arc<WorkerTask>,
- ) -> Pin<Box<dyn Future<Output = Result<(), Error>> + Send + 'fut>> {
- Box::pin(async move {
- self.action(client, authorization, domain, task, "teardown")
- .await
- .map(drop)
- })
- }
-}
-
-#[derive(Default)]
-struct StandaloneServer {
- abort_handle: Option<futures::future::AbortHandle>,
-}
-
-// In case the "order_certificates" future gets dropped between setup & teardown, let's also cancel
-// the HTTP listener on Drop:
-impl Drop for StandaloneServer {
- fn drop(&mut self) {
- self.stop();
- }
-}
-
-impl StandaloneServer {
- fn stop(&mut self) {
- if let Some(abort) = self.abort_handle.take() {
- abort.abort();
- }
- }
-}
-
-async fn standalone_respond(
- req: Request<Incoming>,
- path: Arc<String>,
- key_auth: Arc<String>,
-) -> Result<Response<Full<Bytes>>, hyper::Error> {
- if req.method() == hyper::Method::GET && req.uri().path() == path.as_str() {
- Ok(Response::builder()
- .status(hyper::http::StatusCode::OK)
- .body(key_auth.as_bytes().to_vec().into())
- .unwrap())
- } else {
- Ok(Response::builder()
- .status(hyper::http::StatusCode::NOT_FOUND)
- .body("Not found.".into())
- .unwrap())
- }
-}
-
-impl AcmePlugin for StandaloneServer {
- fn setup<'fut, 'a: 'fut, 'b: 'fut, 'c: 'fut, 'd: 'fut>(
- &'a mut self,
- client: &'b mut AcmeClient,
- authorization: &'c Authorization,
- _domain: &'d AcmeDomain,
- _task: Arc<WorkerTask>,
- ) -> Pin<Box<dyn Future<Output = Result<&'c str, Error>> + Send + 'fut>> {
- Box::pin(async move {
- self.stop();
-
- let challenge = extract_challenge(authorization, "http-01")?;
- let token = challenge
- .token()
- .ok_or_else(|| format_err!("missing token in challenge"))?;
- let key_auth = Arc::new(client.key_authorization(token)?);
- let path = Arc::new(format!("/.well-known/acme-challenge/{token}"));
-
- // `[::]:80` first, then `*:80`
- let dual = SocketAddr::new(IpAddr::from([0u16; 8]), 80);
- let ipv4 = SocketAddr::new(IpAddr::from([0u8; 4]), 80);
- let incoming = TcpListener::bind(dual)
- .or_else(|_| TcpListener::bind(ipv4))
- .await?;
-
- let server = async move {
- loop {
- let key_auth = Arc::clone(&key_auth);
- let path = Arc::clone(&path);
- match incoming.accept().await {
- Ok((tcp, _)) => {
- let io = TokioIo::new(tcp);
- let service = service_fn(move |request| {
- standalone_respond(
- request,
- Arc::clone(&path),
- Arc::clone(&key_auth),
- )
- });
-
- tokio::task::spawn(async move {
- if let Err(err) =
- http1::Builder::new().serve_connection(io, service).await
- {
- println!("Error serving connection: {err:?}");
- }
- });
- }
- Err(err) => println!("Error accepting connection: {err:?}"),
- }
- }
- };
- let (future, abort) = futures::future::abortable(server);
- self.abort_handle = Some(abort);
- tokio::spawn(future);
-
- Ok(challenge.url.as_str())
- })
- }
-
- fn teardown<'fut, 'a: 'fut, 'b: 'fut, 'c: 'fut, 'd: 'fut>(
- &'a mut self,
- _client: &'b mut AcmeClient,
- _authorization: &'c Authorization,
- _domain: &'d AcmeDomain,
- _task: Arc<WorkerTask>,
- ) -> Pin<Box<dyn Future<Output = Result<(), Error>> + Send + 'fut>> {
- Box::pin(async move {
- if let Some(abort) = self.abort_handle.take() {
- abort.abort();
- }
- Ok(())
- })
- }
-}
diff --git a/src/api2/types/acme.rs b/src/api2/types/acme.rs
deleted file mode 100644
index b83b9882..00000000
--- a/src/api2/types/acme.rs
+++ /dev/null
@@ -1,38 +0,0 @@
-use serde::{Deserialize, Serialize};
-
-use pbs_api_types::{DNS_ALIAS_FORMAT, DNS_NAME_FORMAT, PROXMOX_SAFE_ID_FORMAT};
-use proxmox_schema::api;
-
-#[api(
- properties: {
- "domain": { format: &DNS_NAME_FORMAT },
- "alias": {
- optional: true,
- format: &DNS_ALIAS_FORMAT,
- },
- "plugin": {
- optional: true,
- format: &PROXMOX_SAFE_ID_FORMAT,
- },
- },
- default_key: "domain",
-)]
-#[derive(Deserialize, Serialize)]
-/// A domain entry for an ACME certificate.
-pub struct AcmeDomain {
- /// The domain to certify for.
- pub domain: String,
-
- /// The domain to use for challenges instead of the default acme challenge domain.
- ///
- /// This is useful if you use CNAME entries to redirect `_acme-challenge.*` domains to a
- /// different DNS server.
- #[serde(skip_serializing_if = "Option::is_none")]
- pub alias: Option<String>,
-
- /// The plugin to use to validate this domain.
- ///
- /// Empty means standalone HTTP validation is used.
- #[serde(skip_serializing_if = "Option::is_none")]
- pub plugin: Option<String>,
-}
diff --git a/src/api2/types/mod.rs b/src/api2/types/mod.rs
index afc34b30..34193685 100644
--- a/src/api2/types/mod.rs
+++ b/src/api2/types/mod.rs
@@ -4,9 +4,6 @@ use anyhow::bail;
use proxmox_schema::*;
-mod acme;
-pub use acme::*;
-
// File names: may not contain slashes, may not start with "."
pub const FILENAME_FORMAT: ApiStringFormat = ApiStringFormat::VerifyFn(|name| {
if name.starts_with('.') {
diff --git a/src/config/acme/mod.rs b/src/config/acme/mod.rs
deleted file mode 100644
index 962cb1bb..00000000
--- a/src/config/acme/mod.rs
+++ /dev/null
@@ -1 +0,0 @@
-pub mod plugin;
diff --git a/src/config/acme/plugin.rs b/src/config/acme/plugin.rs
deleted file mode 100644
index e5a41f99..00000000
--- a/src/config/acme/plugin.rs
+++ /dev/null
@@ -1,105 +0,0 @@
-use anyhow::Error;
-use serde::{Deserialize, Serialize};
-use serde_json::Value;
-
-use pbs_api_types::PROXMOX_SAFE_ID_FORMAT;
-use proxmox_schema::{api, Schema, StringSchema, Updater};
-use proxmox_section_config::SectionConfigData;
-
-pub const PLUGIN_ID_SCHEMA: Schema = StringSchema::new("ACME Challenge Plugin ID.")
- .format(&PROXMOX_SAFE_ID_FORMAT)
- .min_length(1)
- .max_length(32)
- .schema();
-
-#[api(
- properties: {
- id: { schema: PLUGIN_ID_SCHEMA },
- disable: {
- optional: true,
- default: false,
- },
- "validation-delay": {
- default: 30,
- optional: true,
- minimum: 0,
- maximum: 2 * 24 * 60 * 60,
- },
- },
-)]
-/// DNS ACME Challenge Plugin core data.
-#[derive(Deserialize, Serialize, Updater)]
-#[serde(rename_all = "kebab-case")]
-pub struct DnsPluginCore {
- /// Plugin ID.
- #[updater(skip)]
- pub id: String,
-
- /// DNS API Plugin Id.
- pub api: String,
-
- /// Extra delay in seconds to wait before requesting validation.
- ///
- /// Allows to cope with long TTL of DNS records.
- #[serde(skip_serializing_if = "Option::is_none", default)]
- pub validation_delay: Option<u32>,
-
- /// Flag to disable the config.
- #[serde(skip_serializing_if = "Option::is_none", default)]
- pub disable: Option<bool>,
-}
-
-#[api(
- properties: {
- core: { type: DnsPluginCore },
- },
-)]
-/// DNS ACME Challenge Plugin.
-#[derive(Deserialize, Serialize)]
-#[serde(rename_all = "kebab-case")]
-pub struct DnsPlugin {
- #[serde(flatten)]
- pub core: DnsPluginCore,
-
- // We handle this property separately in the API calls.
- /// DNS plugin data (base64url encoded without padding).
- #[serde(with = "proxmox_serde::string_as_base64url_nopad")]
- pub data: String,
-}
-
-impl DnsPlugin {
- pub fn decode_data(&self, output: &mut Vec<u8>) -> Result<(), Error> {
- Ok(proxmox_base64::url::decode_to_vec(&self.data, output)?)
- }
-}
-
-pub struct PluginData {
- data: SectionConfigData,
-}
-
-// And some convenience helpers.
-impl PluginData {
- pub fn remove(&mut self, name: &str) -> Option<(String, Value)> {
- self.data.sections.remove(name)
- }
-
- pub fn contains_key(&mut self, name: &str) -> bool {
- self.data.sections.contains_key(name)
- }
-
- pub fn get(&self, name: &str) -> Option<&(String, Value)> {
- self.data.sections.get(name)
- }
-
- pub fn get_mut(&mut self, name: &str) -> Option<&mut (String, Value)> {
- self.data.sections.get_mut(name)
- }
-
- pub fn insert(&mut self, id: String, ty: String, plugin: Value) {
- self.data.sections.insert(id, (ty, plugin));
- }
-
- pub fn iter(&self) -> impl Iterator<Item = (&String, &(String, Value))> + Send {
- self.data.sections.iter()
- }
-}
diff --git a/src/config/mod.rs b/src/config/mod.rs
index 19246742..f05af90d 100644
--- a/src/config/mod.rs
+++ b/src/config/mod.rs
@@ -15,7 +15,6 @@ use proxmox_lang::try_block;
use pbs_api_types::{PamRealmConfig, PbsRealmConfig};
use pbs_buildcfg::{self, configdir};
-pub mod acme;
pub mod node;
pub mod tfa;
diff --git a/src/lib.rs b/src/lib.rs
index 8633378c..828f5842 100644
--- a/src/lib.rs
+++ b/src/lib.rs
@@ -27,8 +27,6 @@ pub(crate) mod auth;
pub mod tape;
-pub mod acme;
-
pub mod client_helpers;
pub mod traffic_control_cache;
--
2.47.3
^ permalink raw reply [relevance 9%]
* [pbs-devel] [PATCH proxmox v6 1/3] acme-api: add ACME completion helpers
2026-01-16 11:28 10% [pbs-devel] [PATCH proxmox{, -backup} v6 0/5] fix #6939: acme: support servers returning 204 for nonce requests Samuel Rufinatscha
@ 2026-01-16 11:28 16% ` Samuel Rufinatscha
2026-01-16 11:28 15% ` [pbs-devel] [PATCH proxmox v6 2/3] acme: introduce http_status module Samuel Rufinatscha
` (3 subsequent siblings)
4 siblings, 0 replies; 200+ results
From: Samuel Rufinatscha @ 2026-01-16 11:28 UTC (permalink / raw)
To: pbs-devel
Factors out the PBS ACME completion helpers and adds them to
proxmox-acme-api.
Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
---
proxmox-acme-api/src/challenge_schemas.rs | 2 +-
proxmox-acme-api/src/lib.rs | 57 +++++++++++++++++++++++
2 files changed, 58 insertions(+), 1 deletion(-)
diff --git a/proxmox-acme-api/src/challenge_schemas.rs b/proxmox-acme-api/src/challenge_schemas.rs
index e66e327e..4e94d3ff 100644
--- a/proxmox-acme-api/src/challenge_schemas.rs
+++ b/proxmox-acme-api/src/challenge_schemas.rs
@@ -29,7 +29,7 @@ impl Serialize for ChallengeSchemaWrapper {
}
}
-fn load_dns_challenge_schema() -> Result<Vec<AcmeChallengeSchema>, Error> {
+pub(crate) fn load_dns_challenge_schema() -> Result<Vec<AcmeChallengeSchema>, Error> {
let raw = file_read_string(ACME_DNS_SCHEMA_FN)?;
let schemas: serde_json::Map<String, Value> = serde_json::from_str(&raw)?;
diff --git a/proxmox-acme-api/src/lib.rs b/proxmox-acme-api/src/lib.rs
index 623e9e23..ba64569d 100644
--- a/proxmox-acme-api/src/lib.rs
+++ b/proxmox-acme-api/src/lib.rs
@@ -46,3 +46,60 @@ pub(crate) mod acme_plugin;
mod certificate_helpers;
#[cfg(feature = "impl")]
pub use certificate_helpers::{create_self_signed_cert, order_certificate, revoke_certificate};
+
+#[cfg(feature = "impl")]
+pub mod completion {
+
+ use std::collections::HashMap;
+ use std::ops::ControlFlow;
+
+ use crate::account_config::foreach_acme_account;
+ use crate::challenge_schemas::load_dns_challenge_schema;
+ use crate::plugin_config::plugin_config;
+
+ pub fn complete_acme_account(_arg: &str, _param: &HashMap<String, String>) -> Vec<String> {
+ let mut out = Vec::new();
+ let _ = foreach_acme_account(|name| {
+ out.push(name.into_string());
+ ControlFlow::Continue(())
+ });
+ out
+ }
+
+ pub fn complete_acme_plugin(_arg: &str, _param: &HashMap<String, String>) -> Vec<String> {
+ match plugin_config() {
+ Ok((config, _digest)) => config
+ .iter()
+ .map(|(id, (_type, _cfg))| id.clone())
+ .collect(),
+ Err(_) => Vec::new(),
+ }
+ }
+
+ pub fn complete_acme_plugin_type(_arg: &str, _param: &HashMap<String, String>) -> Vec<String> {
+ vec![
+ "dns".to_string(),
+ //"http".to_string(), // currently it does not really make sense to create these
+ ]
+ }
+
+ pub fn complete_acme_api_challenge_type(
+ _arg: &str,
+ param: &HashMap<String, String>,
+ ) -> Vec<String> {
+ if param.get("type") == Some(&"dns".to_string()) {
+ match load_dns_challenge_schema() {
+ Ok(schema) => schema.into_iter().map(|s| s.id).collect(),
+ Err(_) => Vec::new(),
+ }
+ } else {
+ Vec::new()
+ }
+ }
+}
+
+#[cfg(feature = "impl")]
+pub use completion::{
+ complete_acme_account, complete_acme_api_challenge_type, complete_acme_plugin,
+ complete_acme_plugin_type,
+};
--
2.47.3
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
^ permalink raw reply [relevance 16%]
* [pbs-devel] [PATCH proxmox v6 2/3] acme: introduce http_status module
2026-01-16 11:28 10% [pbs-devel] [PATCH proxmox{, -backup} v6 0/5] fix #6939: acme: support servers returning 204 for nonce requests Samuel Rufinatscha
2026-01-16 11:28 16% ` [pbs-devel] [PATCH proxmox v6 1/3] acme-api: add ACME completion helpers Samuel Rufinatscha
@ 2026-01-16 11:28 15% ` Samuel Rufinatscha
2026-01-16 11:28 14% ` [pbs-devel] [PATCH proxmox v6 3/3] fix #6939: acme: support servers returning 204 for nonce requests Samuel Rufinatscha
` (2 subsequent siblings)
4 siblings, 0 replies; 200+ results
From: Samuel Rufinatscha @ 2026-01-16 11:28 UTC (permalink / raw)
To: pbs-devel
Introduce an internal http_status module with the common ACME HTTP
response codes, and replace use of crate::request::CREATED as well as
direct numeric status code usages.
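Condensed, the pattern introduced here is a small internal module of named constants replacing magic numbers at the request-construction sites (simplified sketch; the real module lives in proxmox-acme/src/request.rs, see the diff below):

```rust
// Named status-code constants instead of bare numeric literals.
pub(crate) mod http_status {
    /// 200 OK
    pub(crate) const OK: u16 = 200;
    /// 201 Created
    pub(crate) const CREATED: u16 = 201;
    /// 204 No Content
    pub(crate) const NO_CONTENT: u16 = 204;
}

fn main() {
    // Call sites now state intent (http_status::CREATED)
    // rather than an opaque number (201).
    assert_eq!(
        [http_status::OK, http_status::CREATED, http_status::NO_CONTENT],
        [200, 201, 204]
    );
}
```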
Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
---
proxmox-acme/src/account.rs | 8 ++++----
proxmox-acme/src/async_client.rs | 4 ++--
proxmox-acme/src/lib.rs | 2 ++
proxmox-acme/src/request.rs | 11 ++++++++++-
4 files changed, 18 insertions(+), 7 deletions(-)
diff --git a/proxmox-acme/src/account.rs b/proxmox-acme/src/account.rs
index f763c1e9..c62e60e0 100644
--- a/proxmox-acme/src/account.rs
+++ b/proxmox-acme/src/account.rs
@@ -85,7 +85,7 @@ impl Account {
method: "POST",
content_type: crate::request::JSON_CONTENT_TYPE,
body,
- expected: crate::request::CREATED,
+ expected: crate::http_status::CREATED,
};
Ok(NewOrder::new(request))
@@ -107,7 +107,7 @@ impl Account {
method: "POST",
content_type: crate::request::JSON_CONTENT_TYPE,
body,
- expected: 200,
+ expected: crate::http_status::OK,
})
}
@@ -132,7 +132,7 @@ impl Account {
method: "POST",
content_type: crate::request::JSON_CONTENT_TYPE,
body,
- expected: 200,
+ expected: crate::http_status::OK,
})
}
@@ -405,7 +405,7 @@ impl AccountCreator {
method: "POST",
content_type: crate::request::JSON_CONTENT_TYPE,
body,
- expected: crate::request::CREATED,
+ expected: crate::http_status::CREATED,
})
}
diff --git a/proxmox-acme/src/async_client.rs b/proxmox-acme/src/async_client.rs
index dc755fb9..c803823d 100644
--- a/proxmox-acme/src/async_client.rs
+++ b/proxmox-acme/src/async_client.rs
@@ -498,7 +498,7 @@ impl AcmeClient {
method: "GET",
content_type: "",
body: String::new(),
- expected: 200,
+ expected: crate::http_status::OK,
},
nonce,
)
@@ -550,7 +550,7 @@ impl AcmeClient {
method: "HEAD",
content_type: "",
body: String::new(),
- expected: 200,
+ expected: crate::http_status::OK,
},
nonce,
)
diff --git a/proxmox-acme/src/lib.rs b/proxmox-acme/src/lib.rs
index df722629..b1be9d15 100644
--- a/proxmox-acme/src/lib.rs
+++ b/proxmox-acme/src/lib.rs
@@ -74,6 +74,8 @@ pub use request::Request;
#[cfg(feature = "impl")]
pub use order::NewOrder;
#[cfg(feature = "impl")]
+pub(crate) use request::http_status;
+#[cfg(feature = "impl")]
pub use request::ErrorResponse;
/// Header name for nonces.
diff --git a/proxmox-acme/src/request.rs b/proxmox-acme/src/request.rs
index 78a90913..2c83255a 100644
--- a/proxmox-acme/src/request.rs
+++ b/proxmox-acme/src/request.rs
@@ -1,7 +1,6 @@
use serde::Deserialize;
pub(crate) const JSON_CONTENT_TYPE: &str = "application/jose+json";
-pub(crate) const CREATED: u16 = 201;
/// A request which should be performed on the ACME provider.
pub struct Request {
@@ -21,6 +20,16 @@ pub struct Request {
pub expected: u16,
}
+/// Common HTTP status codes used in ACME responses.
+pub(crate) mod http_status {
+ /// 200 OK
+ pub(crate) const OK: u16 = 200;
+ /// 201 Created
+ pub(crate) const CREATED: u16 = 201;
+ /// 204 No Content
+ pub(crate) const NO_CONTENT: u16 = 204;
+}
+
/// An ACME error response contains a specially formatted type string, and can optionally
/// contain textual details and a set of sub problems.
#[derive(Clone, Debug, Deserialize)]
--
2.47.3
* [pbs-devel] [PATCH proxmox v6 3/3] fix #6939: acme: support servers returning 204 for nonce requests
2026-01-16 11:28 10% [pbs-devel] [PATCH proxmox{, -backup} v6 0/5] fix #6939: acme: support servers returning 204 for nonce requests Samuel Rufinatscha
2026-01-16 11:28 16% ` [pbs-devel] [PATCH proxmox v6 1/3] acme-api: add ACME completion helpers Samuel Rufinatscha
2026-01-16 11:28 15% ` [pbs-devel] [PATCH proxmox v6 2/3] acme: introduce http_status module Samuel Rufinatscha
@ 2026-01-16 11:28 14% ` Samuel Rufinatscha
2026-01-16 11:28 4% ` [pbs-devel] [PATCH proxmox-backup v6 1/2] acme: remove local AcmeClient and use proxmox-acme-api handlers Samuel Rufinatscha
2026-01-16 11:28 9% ` [pbs-devel] [PATCH proxmox-backup v6 2/2] acme: remove unused src/acme and plugin code Samuel Rufinatscha
4 siblings, 0 replies; 200+ results
From: Samuel Rufinatscha @ 2026-01-16 11:28 UTC (permalink / raw)
To: pbs-devel
Some ACME servers (notably custom or legacy implementations) respond
to HEAD /newNonce with a 204 No Content instead of the
RFC 8555-recommended 200 OK [1]. While this behavior is technically
off-spec, it is not illegal. This issue was reported on our bug
tracker [2].
The previous implementation treated any non-200 response as an error,
causing account registration to fail against such servers. Relax the
status-code check to accept both 200 and 204 responses (and potentially
support other 2xx codes) to improve interoperability.
Note: In comparison, PVE’s Perl ACME client performs a GET request [3]
instead of a HEAD request and accepts any 2xx success code when
retrieving the nonce [4]. This difference in behavior does not affect
functionality but is worth noting for consistency across
implementations.
[1] https://datatracker.ietf.org/doc/html/rfc8555/#section-7.2
[2] https://bugzilla.proxmox.com/show_bug.cgi?id=6939
[3] https://git.proxmox.com/?p=proxmox-acme.git;a=blob;f=src/PVE/ACME.pm;h=f1e9bb7d316e3cea1e376c610b0479119217aecc;hb=HEAD#l219
[4] https://git.proxmox.com/?p=proxmox-acme.git;a=blob;f=src/PVE/ACME.pm;h=f1e9bb7d316e3cea1e376c610b0479119217aecc;hb=HEAD#l597
Fixes: #6939
Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
---
proxmox-acme/src/account.rs | 10 +++++-----
proxmox-acme/src/async_client.rs | 6 +++---
proxmox-acme/src/client.rs | 2 +-
proxmox-acme/src/request.rs | 4 ++--
4 files changed, 11 insertions(+), 11 deletions(-)
diff --git a/proxmox-acme/src/account.rs b/proxmox-acme/src/account.rs
index c62e60e0..8df19a29 100644
--- a/proxmox-acme/src/account.rs
+++ b/proxmox-acme/src/account.rs
@@ -85,7 +85,7 @@ impl Account {
method: "POST",
content_type: crate::request::JSON_CONTENT_TYPE,
body,
- expected: crate::http_status::CREATED,
+ expected: &[crate::http_status::CREATED],
};
Ok(NewOrder::new(request))
@@ -107,7 +107,7 @@ impl Account {
method: "POST",
content_type: crate::request::JSON_CONTENT_TYPE,
body,
- expected: crate::http_status::OK,
+ expected: &[crate::http_status::OK],
})
}
@@ -132,7 +132,7 @@ impl Account {
method: "POST",
content_type: crate::request::JSON_CONTENT_TYPE,
body,
- expected: crate::http_status::OK,
+ expected: &[crate::http_status::OK],
})
}
@@ -157,7 +157,7 @@ impl Account {
method: "POST",
content_type: crate::request::JSON_CONTENT_TYPE,
body,
- expected: 200,
+ expected: &[crate::http_status::OK],
})
}
@@ -405,7 +405,7 @@ impl AccountCreator {
method: "POST",
content_type: crate::request::JSON_CONTENT_TYPE,
body,
- expected: crate::http_status::CREATED,
+ expected: &[crate::http_status::CREATED],
})
}
diff --git a/proxmox-acme/src/async_client.rs b/proxmox-acme/src/async_client.rs
index c803823d..66ec6024 100644
--- a/proxmox-acme/src/async_client.rs
+++ b/proxmox-acme/src/async_client.rs
@@ -420,7 +420,7 @@ impl AcmeClient {
};
if parts.status.is_success() {
- if status != request.expected {
+ if !request.expected.contains(&status) {
return Err(Error::InvalidApi(format!(
"ACME server responded with unexpected status code: {:?}",
parts.status
@@ -498,7 +498,7 @@ impl AcmeClient {
method: "GET",
content_type: "",
body: String::new(),
- expected: crate::http_status::OK,
+ expected: &[crate::http_status::OK],
},
nonce,
)
@@ -550,7 +550,7 @@ impl AcmeClient {
method: "HEAD",
content_type: "",
body: String::new(),
- expected: crate::http_status::OK,
+ expected: &[crate::http_status::OK, crate::http_status::NO_CONTENT],
},
nonce,
)
diff --git a/proxmox-acme/src/client.rs b/proxmox-acme/src/client.rs
index 931f7245..881ee83d 100644
--- a/proxmox-acme/src/client.rs
+++ b/proxmox-acme/src/client.rs
@@ -203,7 +203,7 @@ impl Inner {
let got_nonce = self.update_nonce(&mut response)?;
if response.is_success() {
- if response.status != request.expected {
+ if !request.expected.contains(&response.status) {
return Err(Error::InvalidApi(format!(
"API server responded with unexpected status code: {:?}",
response.status
diff --git a/proxmox-acme/src/request.rs b/proxmox-acme/src/request.rs
index 2c83255a..8a4017dc 100644
--- a/proxmox-acme/src/request.rs
+++ b/proxmox-acme/src/request.rs
@@ -16,8 +16,8 @@ pub struct Request {
/// The body to pass along with request, or an empty string.
pub body: String,
- /// The expected status code a compliant ACME provider will return on success.
- pub expected: u16,
+ /// The set of HTTP status codes that indicate a successful response from an ACME provider.
+ pub expected: &'static [u16],
}
/// Common HTTP status codes used in ACME responses.
--
2.47.3
* [pbs-devel] [PATCH proxmox{, -backup} v6 0/5] fix #6939: acme: support servers returning 204 for nonce requests
@ 2026-01-16 11:28 10% Samuel Rufinatscha
2026-01-16 11:28 16% ` [pbs-devel] [PATCH proxmox v6 1/3] acme-api: add ACME completion helpers Samuel Rufinatscha
` (4 more replies)
0 siblings, 5 replies; 200+ results
From: Samuel Rufinatscha @ 2026-01-16 11:28 UTC (permalink / raw)
To: pbs-devel
Hi,
this series fixes account registration for ACME providers that return
HTTP 204 No Content to the newNonce request. Currently, both the PBS
ACME client and the shared ACME client in proxmox-acme only accept
HTTP 200 OK for this request. The issue was observed in PBS against a
custom ACME deployment and reported as bug #6939 [1].
## Problem
During ACME account registration, PBS first fetches an anti-replay
nonce by sending a HEAD request to the CA’s newNonce URL.
RFC 8555 §7.2 [2] states that:
* the server MUST include a Replay-Nonce header with a fresh nonce,
* the server SHOULD use status 200 OK for the HEAD request,
* the server MUST also handle GET on the same resource and may return
204 No Content with an empty body.
The reporter observed the following error message:
"ACME server responded with unexpected status code: 204"
and mentioned that the issue did not appear with PVE 9 [1]. Looking at
PVE’s Perl ACME client [3], it uses a GET request instead of HEAD and
accepts any 2xx success code when retrieving the nonce. This difference
in behavior is worth noting.
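To make the failure mode concrete, here is a minimal sketch of tolerant nonce handling (names are hypothetical, the real client operates on HTTP response objects): the old check rejected anything but 200, while the relaxed one accepts 200 and 204 as long as the Replay-Nonce header is present:

```rust
// Hypothetical model of newNonce handling: only status and the
// Replay-Nonce header matter, the response body is irrelevant.
fn extract_nonce(status: u16, replay_nonce: Option<&str>) -> Result<String, String> {
    match (status, replay_nonce) {
        // RFC 8555 recommends 200 OK for HEAD /newNonce, but 204
        // No Content is seen in the wild; both carry the nonce in
        // the Replay-Nonce header.
        (200 | 204, Some(nonce)) => Ok(nonce.to_string()),
        (200 | 204, None) => Err("missing Replay-Nonce header".to_string()),
        (status, _) => Err(format!("unexpected status code: {status}")),
    }
}

fn main() {
    assert_eq!(extract_nonce(204, Some("abc")), Ok("abc".to_string()));
    assert!(extract_nonce(204, None).is_err());
    assert!(extract_nonce(500, Some("abc")).is_err());
}
```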
## Approach
This series changes the expected field of the internal Request type
from a single u16 to &'static [u16], so one request can explicitly
accept multiple success codes.
To avoid fixing the issue separately in PBS and in PDM (which already
uses the shared ACME stack), this series fixes the bug in proxmox-acme
and then refactors PBS to use the shared ACME stack as well.
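The core type change can be sketched as follows (simplified from the actual Request in proxmox-acme/src/request.rs):

```rust
// Simplified: `expected` becomes a static slice of status codes
// instead of a single u16, so the newNonce request can accept both
// 200 and 204 without allocating.
struct Request {
    expected: &'static [u16],
}

fn status_ok(request: &Request, status: u16) -> bool {
    // Check sites change from `status != request.expected`
    // to a membership test:
    request.expected.contains(&status)
}

fn main() {
    let new_nonce = Request { expected: &[200, 204] };
    assert!(status_ok(&new_nonce, 200));
    assert!(status_ok(&new_nonce, 204));
    assert!(!status_ok(&new_nonce, 201));
}
```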
## Testing
I tested the refactor using the Pebble HTTP challenge type.
The DNS challenge type will be tested as mentioned by Max (see v5).
*HTTP Challenge Type Test*
To test the refactor, I
(1) installed latest stable PBS on a VM
(2) created .deb package from latest PBS (master), containing the
refactor
(3) installed created .deb package
(4) installed Pebble from Let's Encrypt [5] on the same VM
(5) created an ACME account and ordered the new certificate for the
host domain.
Steps to reproduce:
(1) install latest stable PBS on a VM, create .deb package from latest
PBS (master) containing the refactor, install created .deb package
(2) install Pebble from Let's Encrypt [5] on the same VM:
cd
apt update
apt install -y golang git
git clone https://github.com/letsencrypt/pebble
cd pebble
go build ./cmd/pebble
then, download and trust the Pebble cert:
wget https://raw.githubusercontent.com/letsencrypt/pebble/main/test/certs/pebble.minica.pem
cp pebble.minica.pem /usr/local/share/ca-certificates/pebble.minica.crt
update-ca-certificates
We want Pebble to perform HTTP-01 validation against port 80, because
PBS’s standalone plugin will bind port 80. Set httpPort to 80.
nano ./test/config/pebble-config.json
Start the Pebble server in the background:
./pebble -config ./test/config/pebble-config.json &
Create a Pebble ACME account:
proxmox-backup-manager acme account register default admin@example.com --directory 'https://127.0.0.1:14000/dir'
To verify persistence of the account I checked
ls /etc/proxmox-backup/acme/accounts
Verified if update-account works
proxmox-backup-manager acme account update default --contact "a@example.com,b@example.com"
proxmox-backup-manager acme account info default
In the PBS GUI, you can create a new domain. You can use your host
domain name (see /etc/hosts). Select the created account and order the
certificate.
After a page reload, you might need to accept the new certificate in the browser.
In the PBS dashboard, you should see the new Pebble certificate.
*Note: on reboot, the created Pebble ACME account will be gone and you
will need to create a new one, since Pebble does not persist account
info. In that case, remove the previously created account from
/etc/proxmox-backup/acme/accounts.
*Testing the newNonce fix*
To test the ACME newNonce fix, I put nginx in front of Pebble, to
intercept the newNonce request in order to return 204 No Content
instead of 200 OK, all other requests are unchanged and forwarded to
Pebble. Requires trusting the nginx CAs via
/usr/local/share/ca-certificates + update-ca-certificates on the VM.
Then I ran following command against nginx:
proxmox-backup-manager acme account register proxytest root@backup.local --directory 'https://nginx-address/dir'
The account could be created successfully. When adjusting the nginx
configuration to return any other non-expected success status code,
PBS rejects as expected.
## Patch summary
0001 - proxmox: acme-api: add ACME completion helpers
0002 - proxmox: acme: introduce http_status module
0003 - proxmox: fix #6939: acme: support servers returning 204 for
nonce requests
0004 - proxmox-backup: acme: remove local AcmeClient and use
proxmox-acme-api handlers
0005 - proxmox-backup: acme: remove unused src/acme and plugin code
## Maintainer notes
proxmox-acme: requires version bump (breaking Request::expected change)
proxmox-backup: requires version bump
- NodeConfig::acme_config() signature changed from
Option<Result<AcmeConfig, Error>> to Result<AcmeConfig, Error>
- NodeConfig::acme_client() function removed
Patch 0001 (proxmox: acme-api: add ACME completion helpers) could be
applied independently, so that
https://bugzilla.proxmox.com/show_bug.cgi?id=7179
is not blocked and duplicate work is avoided.
## Changelog
Changes from v5 to v6:
* rebased
* proxmox-acme: revert visibility changes and dead-code removal
* proxmox-acme-api: remove load_client_with_account
* proxmox-backup: remove pub Node::acme_client()
* proxmox-backup: Node::acme_config() inline transpose/default logic
* proxmox-backup: merge PBS Client removal and API handler changes in
one patch
* improve commit messages
Changes from v4 to v5:
* rebased
* re-ordered series (proxmox-acme fix first)
* proxmox-backup: cleaned up imports based on an initial clean-up patch
* proxmox-acme: removed now unused post_request_raw_payload(),
update_account_request(), deactivate_account_request()
* proxmox-acme: removed now obsolete/unused get_authorization() and
GetAuthorization impl
Verified removal by compiling PBS, PDM, and proxmox-perl-rs
with all features.
Changes from v3 to v4:
* add proxmox-acme-api as a dependency and initialize it in
PBS so PBS can use the shared ACME API instead.
* remove the PBS-local AcmeClient implementation and switch PBS
over to the shared proxmox-acme async client.
* rework PBS’ ACME API endpoints to delegate to
proxmox-acme-api handlers instead of duplicating logic locally.
* move PBS’ ACME certificate ordering logic over to
proxmox-acme-api, keeping only certificate installation/reload in PBS.
* add a load_client_with_account helper in proxmox-acme-api so PBS
(and others) can construct an AcmeClient for a configured account
without duplicating boilerplate.
* hide the low-level Request type and its fields behind constructors
/ reduced visibility so changes to “expected” no longer affect the
public API as they did in v3.
* split out the HTTP status constants into an internal http_status
module as a separate preparatory cleanup before the bug fix, instead
of doing this inline like in v3.
* Rebased on top of the refactor: keep the same behavioural fix as in
v3 accept 204 for newNonce with Replay-Nonce present), but implement
it on top of the http_status module that is part of the refactor.
Changes from v2 to v3:
* renamed the `http_success` module to `http_status` and replaced its usage
* introduced the `http_success` module to contain the HTTP success codes
* replaced `Vec<u16>` with `&[u16]` for expected codes to avoid allocations
* clarified PVE's Perl ACME client behaviour in the commit messages
[1] Bugzilla report #6939:
https://bugzilla.proxmox.com/show_bug.cgi?id=6939
[2] RFC 8555 (ACME):
https://datatracker.ietf.org/doc/html/rfc8555/#section-7.2
[3] PVE's Perl ACME client (accepts any 2xx code for nonce requests):
https://git.proxmox.com/?p=proxmox-acme.git;a=blob;f=src/PVE/ACME.pm;h=f1e9bb7d316e3cea1e376c610b0479119217aecc;hb=HEAD#l597
[4] PVE's Perl ACME client (performs a GET request for the nonce):
https://git.proxmox.com/?p=proxmox-acme.git;a=blob;f=src/PVE/ACME.pm;h=f1e9bb7d316e3cea1e376c610b0479119217aecc;hb=HEAD#l219
[5] Pebble ACME server:
https://github.com/letsencrypt/pebble
proxmox:
Samuel Rufinatscha (3):
acme-api: add ACME completion helpers
acme: introduce http_status module
fix #6939: acme: support servers returning 204 for nonce requests
proxmox-acme-api/src/challenge_schemas.rs | 2 +-
proxmox-acme-api/src/lib.rs | 57 +++++++++++++++++++++++
proxmox-acme/src/account.rs | 10 ++--
proxmox-acme/src/async_client.rs | 6 +--
proxmox-acme/src/client.rs | 2 +-
proxmox-acme/src/lib.rs | 2 +
proxmox-acme/src/request.rs | 15 ++++--
7 files changed, 81 insertions(+), 13 deletions(-)
proxmox-backup:
Samuel Rufinatscha (2):
acme: remove local AcmeClient and use proxmox-acme-api handlers
acme: remove unused src/acme and plugin code
Cargo.toml | 3 +
src/acme/client.rs | 691 -------------------------
src/acme/mod.rs | 5 -
src/acme/plugin.rs | 335 ------------
src/api2/config/acme.rs | 399 ++------------
src/api2/node/certificates.rs | 221 +-------
src/api2/types/acme.rs | 97 ----
src/api2/types/mod.rs | 3 -
src/bin/proxmox-backup-api.rs | 2 +
src/bin/proxmox-backup-manager.rs | 3 +-
src/bin/proxmox-backup-proxy.rs | 1 +
src/bin/proxmox_backup_manager/acme.rs | 37 +-
src/config/acme/mod.rs | 168 ------
src/config/acme/plugin.rs | 189 -------
src/config/mod.rs | 1 -
src/config/node.rs | 43 +-
src/lib.rs | 2 -
17 files changed, 94 insertions(+), 2106 deletions(-)
delete mode 100644 src/acme/client.rs
delete mode 100644 src/acme/mod.rs
delete mode 100644 src/acme/plugin.rs
delete mode 100644 src/api2/types/acme.rs
delete mode 100644 src/config/acme/mod.rs
delete mode 100644 src/config/acme/plugin.rs
Summary over all repositories:
24 files changed, 175 insertions(+), 2119 deletions(-)
--
Generated by git-murpp 0.8.1
* Re: [pbs-devel] [PATCH proxmox{, -backup} v5 0/9] fix #6939: acme: support servers returning 204 for nonce requests
2026-01-13 13:48 5% ` [pbs-devel] [PATCH proxmox{, -backup} v5 0/9] fix #6939: acme: support servers returning 204 for nonce requests Fabian Grünbichler
@ 2026-01-15 10:24 0% ` Max R. Carrara
0 siblings, 0 replies; 200+ results
From: Max R. Carrara @ 2026-01-15 10:24 UTC (permalink / raw)
To: Proxmox Backup Server development discussion
On Tue Jan 13, 2026 at 2:48 PM CET, Fabian Grünbichler wrote:
> On January 8, 2026 12:26 pm, Samuel Rufinatscha wrote:
> > Hi,
> >
> > this series fixes account registration for ACME providers that return
> > HTTP 204 No Content to the newNonce request. Currently, both the PBS
> > ACME client and the shared ACME client in proxmox-acme only accept
> > HTTP 200 OK for this request. The issue was observed in PBS against a
> > custom ACME deployment and reported as bug #6939 [1].
>
> sent some feedback for individual patches, one thing to explicitly test
> is that existing accounts and DNS plugin configuration continue to work
> after the switch over - AFAICT from the testing description below that
> was not done (or not noted?).
Ah, to chime in here: I've tested this with a custom DNS plugin config,
see: https://lore.proxmox.com/pbs-devel/DETUA6ZP3M6X.2S90QXW3EYJXU@proxmox.com/
Since a v6 is on its way, I'll re-test everything then; also the case of
configuring the DNS plugin *first* and then switching over to the new
impl.
>
> >
> > ## Problem
> >
> > During ACME account registration, PBS first fetches an anti-replay
> > nonce by sending a HEAD request to the CA’s newNonce URL.
> > RFC 8555 §7.2 [2] states that:
> >
> > * the server MUST include a Replay-Nonce header with a fresh nonce,
> > * the server SHOULD use status 200 OK for the HEAD request,
> > * the server MUST also handle GET on the same resource and may return
> > 204 No Content with an empty body.
> >
> > The reporter observed the following error message:
> >
> > *ACME server responded with unexpected status code: 204*
> >
> > and mentioned that the issue did not appear with PVE 9 [1]. Looking at
> > PVE’s Perl ACME client [3], it uses a GET request instead of HEAD and
> > accepts any 2xx success code when retrieving the nonce. This difference
> > in behavior does not affect functionality but is worth noting for
> > consistency across implementations.
> >
> > ## Approach
> >
> > To support ACME providers which return 204 No Content, the Rust ACME
> > clients in proxmox-backup and proxmox need to treat both 200 OK and 204
> > No Content as valid responses for the nonce request, as long as a
> > Replay-Nonce header is present.
> >
> > This series changes the expected field of the internal Request type
> > from a single u16 to a list of allowed status codes
> > (e.g. &'static [u16]), so one request can explicitly accept multiple
> > success codes.
> >
> > To avoid fixing the issue twice (once in PBS’ own ACME client and once
> > in the shared Rust client), this series first refactors PBS to use the
> > shared AcmeClient from proxmox-acme / proxmox-acme-api, similar to PDM,
> > and then applies the bug fix in that shared implementation so that all
> > consumers benefit from the more tolerant behavior.
> >
> > ## Testing
> >
> > *Testing the refactor*
> >
> > To test the refactor, I
> > (1) installed latest stable PBS on a VM
> > (2) created .deb package from latest PBS (master), containing the
> > refactor
> > (3) installed created .deb package
> > (4) installed Pebble from Let's Encrypt [5] on the same VM
> > (5) created an ACME account and ordered the new certificate for the
> > host domain.
> >
> > Steps to reproduce:
> >
> > (1) install latest stable PBS on a VM, create .deb package from latest
> > PBS (master) containing the refactor, install created .deb package
> > (2) install Pebble from Let's Encrypt [5] on the same VM:
> >
> > cd
> > apt update
> > apt install -y golang git
> > git clone https://github.com/letsencrypt/pebble
> > cd pebble
> > go build ./cmd/pebble
> >
> > then, download and trust the Pebble cert:
> >
> > wget https://raw.githubusercontent.com/letsencrypt/pebble/main/test/certs/pebble.minica.pem
> > cp pebble.minica.pem /usr/local/share/ca-certificates/pebble.minica.crt
> > update-ca-certificates
> >
> > We want Pebble to perform HTTP-01 validation against port 80, because
> > PBS’s standalone plugin will bind port 80. Set httpPort to 80.
> >
> > nano ./test/config/pebble-config.json
> >
> > Start the Pebble server in the background:
> >
> > ./pebble -config ./test/config/pebble-config.json &
> >
> > Create a Pebble ACME account:
> >
> > proxmox-backup-manager acme account register default admin@example.com --directory 'https://127.0.0.1:14000/dir'
> >
> > To verify persistence of the account I checked
> >
> > ls /etc/proxmox-backup/acme/accounts
> >
> > Verified if update-account works
> >
> > proxmox-backup-manager acme account update default --contact "a@example.com,b@example.com"
> > proxmox-backup-manager acme account info default
> >
> > In the PBS GUI, you can create a new domain. You can use your host
> > domain name (see /etc/hosts). Select the created account and order the
> > certificate.
> >
> > After a page reload, you might need to accept the new certificate in the browser.
> > In the PBS dashboard, you should see the new Pebble certificate.
> >
> > *Note: on reboot, the created Pebble ACME account will be gone and you
> > will need to create a new one. Pebble does not persist account info.
> > In that case remove the previously created account in
> > /etc/proxmox-backup/acme/accounts.
> >
> > *Testing the newNonce fix*
> >
> > To prove the ACME newNonce fix, I put nginx in front of Pebble, to
> > intercept the newNonce request in order to return 204 No Content
> > instead of 200 OK, all other requests are unchanged and forwarded to
> > Pebble. Requires trusting the nginx CAs via
> > /usr/local/share/ca-certificates + update-ca-certificates on the VM.
> >
> > Then I ran following command against nginx:
> >
> > proxmox-backup-manager acme account register proxytest root@backup.local --directory 'https://nginx-address/dir
> >
> > The account could be created successfully. When adjusting the nginx
> > configuration to return any other non-expected success status code,
> > PBS rejects as expected.
> >
> > ## Patch summary
> >
> > 0001 – [PATCH proxmox v5 1/4] acme: reduce visibility of Request type
> > Restricts the visibility of the low-level Request type. Consumers
> > should rely on proxmox-acme-api or AcmeClient handlers.
> >
> > 0002– [PATCH proxmox v5 2/4] acme: introduce http_status module
> >
> > 0003 – [PATCH proxmox v5 3/4] fix #6939: acme: support servers
> > returning 204 for nonce requests
> > Adjusts nonce handling to support ACME servers that return HTTP 204
> > (No Content) for new-nonce requests.
> >
> > 0004 – [PATCH proxmox v5 4/4] acme-api: add helper to load client for
> > an account
> > Introduces a helper function to load an ACME client instance for a
> > given account. Required for the following PBS ACME refactor.
> >
> > 0005 – [PATCH proxmox-backup v5 1/5] acme: clean up ACME-related imports
> >
> > 0006 – [PATCH proxmox-backup v5 2/5] acme: include proxmox-acme-api
> > dependency
> > Prepares the codebase to use the factored out ACME API impl.
> >
> > 0007 – [PATCH proxmox-backup v5 3/5] acme: drop local AcmeClient
> > Removes the local AcmeClient implementation. Represents the minimal
> > set of changes to replace it with the factored out AcmeClient.
> >
> > 0008 – [PATCH proxmox-backup v5 4/5] acme: change API impls to use
> > proxmox-acme-api handlers
> >
> > 0009 – [PATCH proxmox-backup v5 5/5] acme: certificate ordering through
> > proxmox-acme-api
> >
> > Thanks for considering this patch series, I look forward to your
> > feedback.
> >
> > Best,
> > Samuel Rufinatscha
> >
> > ## Changelog
> >
> > Changes from v4 to v5:
> >
> > * rebased series
> > * re-ordered series (proxmox-acme fix first)
> > * proxmox-backup: cleaned up imports based on an initial clean-up patch
> > * proxmox-acme: removed now unused post_request_raw_payload(),
> > update_account_request(), deactivate_account_request()
> > * proxmox-acme: removed now obsolete/unused get_authorization() and
> > GetAuthorization impl
> >
> > Verified removal by compiling PBS, PDM, and proxmox-perl-rs
> > with all features.
> >
> > Changes from v3 to v4:
> >
> > * add proxmox-acme-api as a dependency and initialize it in
> > PBS so PBS can use the shared ACME API instead.
> > * remove the PBS-local AcmeClient implementation and switch PBS
> > over to the shared proxmox-acme async client.
> > * rework PBS’ ACME API endpoints to delegate to
> > proxmox-acme-api handlers instead of duplicating logic locally.
> > * move PBS’ ACME certificate ordering logic over to
> > proxmox-acme-api, keeping only certificate installation/reload in PBS.
> > * add a load_client_with_account helper in proxmox-acme-api so PBS
> > (and others) can construct an AcmeClient for a configured account
> > without duplicating boilerplate.
> > * hide the low-level Request type and its fields behind constructors
> > / reduced visibility so changes to “expected” no longer affect the
> > public API as they did in v3.
> > * split out the HTTP status constants into an internal http_status
> > module as a separate preparatory cleanup before the bug fix, instead
> > of doing this inline like in v3.
> > * Rebased on top of the refactor: keep the same behavioural fix as in
> > v3 accept 204 for newNonce with Replay-Nonce present), but implement
> > it on top of the http_status module that is part of the refactor.
> >
> > Changes from v2 to v3:
> >
> > * rename `http_success` module to `http_status`
> > * replace `http_success` usage
> > * introduced `http_success` module to contain the http success codes
> > * replaced `Vec<u16>` with `&[u16]` for expected codes to avoid allocations.
> > * clarified the PVEs Perl ACME client behaviour in the commit message.
> > * integrated the `http_success` module, replacing `Vec<u16>` with `&[u16]`
> > * clarified the PVEs Perl ACME client behaviour in the commit message.
> >
> > [1] Bugzilla report #6939:
> > [https://bugzilla.proxmox.com/show_bug.cgi?id=6939](https://bugzilla.proxmox.com/show_bug.cgi?id=6939)
> > [2] RFC 8555 (ACME):
> > [https://datatracker.ietf.org/doc/html/rfc8555/#section-7.2](https://datatracker.ietf.org/doc/html/rfc8555/#section-7.2)
> > [3] PVE’s Perl ACME client (allow 2xx codes for nonce requests):
> > [https://git.proxmox.com/?p=proxmox-acme.git;a=blob;f=src/PVE/ACME.pm;h=f1e9bb7d316e3cea1e376c610b0479119217aecc;hb=HEAD#l597](https://git.proxmox.com/?p=proxmox-acme.git;a=blob;f=src/PVE/ACME.pm;h=f1e9bb7d316e3cea1e376c610b0479119217aecc;hb=HEAD#l597)
> > [4] Pebble ACME server:
> > [https://github.com/letsencrypt/pebble](https://github.com/letsencrypt/pebble)
> > [5] PVE's Perl ACME client (perform GET request):
> > [https://git.proxmox.com/?p=proxmox-acme.git;a=blob;f=src/PVE/ACME.pm;h=f1e9bb7d316e3cea1e376c610b0479119217aecc;hb=HEAD#l219](https://git.proxmox.com/?p=proxmox-acme.git;a=blob;f=src/PVE/ACME.pm;h=f1e9bb7d316e3cea1e376c610b0479119217aecc;hb=HEAD#l219)
> >
> > proxmox:
> >
> > Samuel Rufinatscha (4):
> > acme: reduce visibility of Request type
> > acme: introduce http_status module
> > fix #6939: acme: support servers returning 204 for nonce requests
> > acme-api: add helper to load client for an account
> >
> > proxmox-acme-api/src/account_api_impl.rs | 5 ++
> > proxmox-acme-api/src/lib.rs | 3 +-
> > proxmox-acme/src/account.rs | 102 ++---------------------
> > proxmox-acme/src/async_client.rs | 8 +-
> > proxmox-acme/src/authorization.rs | 30 -------
> > proxmox-acme/src/client.rs | 8 +-
> > proxmox-acme/src/lib.rs | 6 +-
> > proxmox-acme/src/order.rs | 2 +-
> > proxmox-acme/src/request.rs | 25 ++++--
> > 9 files changed, 44 insertions(+), 145 deletions(-)
> >
> >
> > proxmox-backup:
> >
> > Samuel Rufinatscha (5):
> > acme: clean up ACME-related imports
> > acme: include proxmox-acme-api dependency
> > acme: drop local AcmeClient
> > acme: change API impls to use proxmox-acme-api handlers
> > acme: certificate ordering through proxmox-acme-api
> >
> > Cargo.toml | 3 +
> > src/acme/client.rs | 691 -------------------------
> > src/acme/mod.rs | 5 -
> > src/acme/plugin.rs | 336 ------------
> > src/api2/config/acme.rs | 406 ++-------------
> > src/api2/node/certificates.rs | 232 ++-------
> > src/api2/types/acme.rs | 98 ----
> > src/api2/types/mod.rs | 3 -
> > src/bin/proxmox-backup-api.rs | 2 +
> > src/bin/proxmox-backup-manager.rs | 14 +-
> > src/bin/proxmox-backup-proxy.rs | 15 +-
> > src/bin/proxmox_backup_manager/acme.rs | 21 +-
> > src/config/acme/mod.rs | 55 +-
> > src/config/acme/plugin.rs | 92 +---
> > src/config/node.rs | 31 +-
> > src/lib.rs | 2 -
> > 16 files changed, 109 insertions(+), 1897 deletions(-)
> > delete mode 100644 src/acme/client.rs
> > delete mode 100644 src/acme/mod.rs
> > delete mode 100644 src/acme/plugin.rs
> > delete mode 100644 src/api2/types/acme.rs
> >
> >
> > Summary over all repositories:
> > 25 files changed, 153 insertions(+), 2042 deletions(-)
> >
> > --
> > Generated by git-murpp 0.8.1
> >
> >
> > _______________________________________________
> > pbs-devel mailing list
> > pbs-devel@lists.proxmox.com
> > https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
> >
* Re: [pbs-devel] [PATCH proxmox v5 2/4] acme: introduce http_status module
2026-01-14 10:29 6% ` Samuel Rufinatscha
@ 2026-01-15 9:25 5% ` Fabian Grünbichler
0 siblings, 0 replies; 200+ results
From: Fabian Grünbichler @ 2026-01-15 9:25 UTC (permalink / raw)
To: Proxmox Backup Server development discussion, Samuel Rufinatscha
Cc: Thomas Lamprecht
On January 14, 2026 11:29 am, Samuel Rufinatscha wrote:
> On 1/13/26 2:45 PM, Fabian Grünbichler wrote:
>> On January 8, 2026 12:26 pm, Samuel Rufinatscha wrote:
>>> Introduce an internal http_status module with the common ACME HTTP
>>> response codes, and replace use of crate::request::CREATED as well as
>>> direct numeric status code usages.
>>
>> why not use http::status ? we already have this as dependency pretty
>> much everywhere we do anything HTTP related.. would also allow for nicer
>> messages in case the status is not as expected..
>>
>
> http is only pulled in via the optional client / async-client features,
> not the base impl feature. This code here is gated by impl, where http
> might not be available. Adding http as a hard
> dependency just for the few status code constants feels a bit overkill.
> This matches what we discussed in a previous review round:
>
> https://lore.proxmox.com/pbs-devel/2b7574fb-a3c5-4119-8fb6-9649881dba15@proxmox.com/
your patch makes this crate unusable without enabling either client ;)
http is a small low-level crate for exactly this purpose (a common
implementation of common HTTP related types). we already pull it in
everywhere proxmox-acme is used. in fact, I think it would even make
sense to switch over more things to use http here than just the status
code, but that's a different matter/series..
anyway, I guess we can ignore this for now and discuss switching over to
http at a later point as a series on its own.
> Also, since this is pub(crate) API, I think we can easily switch to
> StatusCode later if http ever becomes a necessary dependency for impl.
> OK with you?
>
>>>
>>> Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
>>> ---
>>> proxmox-acme/src/account.rs | 8 ++++----
>>> proxmox-acme/src/async_client.rs | 4 ++--
>>> proxmox-acme/src/lib.rs | 2 ++
>>> proxmox-acme/src/request.rs | 11 ++++++++++-
>>> 4 files changed, 18 insertions(+), 7 deletions(-)
>>>
>>> diff --git a/proxmox-acme/src/account.rs b/proxmox-acme/src/account.rs
>>> index d8eb3e73..ea1a3c60 100644
>>> --- a/proxmox-acme/src/account.rs
>>> +++ b/proxmox-acme/src/account.rs
>>> @@ -84,7 +84,7 @@ impl Account {
>>> method: "POST",
>>> content_type: crate::request::JSON_CONTENT_TYPE,
>>> body,
>>> - expected: crate::request::CREATED,
>>> + expected: crate::http_status::CREATED,
>>> };
>>>
>>> Ok(NewOrder::new(request))
>>> @@ -106,7 +106,7 @@ impl Account {
>>> method: "POST",
>>> content_type: crate::request::JSON_CONTENT_TYPE,
>>> body,
>>> - expected: 200,
>>> + expected: crate::http_status::OK,
>>> })
>>> }
>>>
>>> @@ -131,7 +131,7 @@ impl Account {
>>> method: "POST",
>>> content_type: crate::request::JSON_CONTENT_TYPE,
>>> body,
>>> - expected: 200,
>>> + expected: crate::http_status::OK,
>>> })
>>> }
>>>
>>> @@ -321,7 +321,7 @@ impl AccountCreator {
>>> method: "POST",
>>> content_type: crate::request::JSON_CONTENT_TYPE,
>>> body,
>>> - expected: crate::request::CREATED,
>>> + expected: crate::http_status::CREATED,
>>> })
>>> }
>>>
>>> diff --git a/proxmox-acme/src/async_client.rs b/proxmox-acme/src/async_client.rs
>>> index 2ff3ba22..043648bb 100644
>>> --- a/proxmox-acme/src/async_client.rs
>>> +++ b/proxmox-acme/src/async_client.rs
>>> @@ -498,7 +498,7 @@ impl AcmeClient {
>>> method: "GET",
>>> content_type: "",
>>> body: String::new(),
>>> - expected: 200,
>>> + expected: crate::http_status::OK,
>>> },
>>> nonce,
>>> )
>>> @@ -550,7 +550,7 @@ impl AcmeClient {
>>> method: "HEAD",
>>> content_type: "",
>>> body: String::new(),
>>> - expected: 200,
>>> + expected: crate::http_status::OK,
>>> },
>>> nonce,
>>> )
>>> diff --git a/proxmox-acme/src/lib.rs b/proxmox-acme/src/lib.rs
>>> index 6722030c..6051a025 100644
>>> --- a/proxmox-acme/src/lib.rs
>>> +++ b/proxmox-acme/src/lib.rs
>>> @@ -70,6 +70,8 @@ pub use order::Order;
>>> #[cfg(feature = "impl")]
>>> pub use order::NewOrder;
>>> #[cfg(feature = "impl")]
>>> +pub(crate) use request::http_status;
>>> +#[cfg(feature = "impl")]
>>> pub use request::ErrorResponse;
>>>
>>> /// Header name for nonces.
>>> diff --git a/proxmox-acme/src/request.rs b/proxmox-acme/src/request.rs
>>> index dadfc5af..341ce53e 100644
>>> --- a/proxmox-acme/src/request.rs
>>> +++ b/proxmox-acme/src/request.rs
>>> @@ -1,7 +1,6 @@
>>> use serde::Deserialize;
>>>
>>> pub(crate) const JSON_CONTENT_TYPE: &str = "application/jose+json";
>>> -pub(crate) const CREATED: u16 = 201;
>>>
>>> /// A request which should be performed on the ACME provider.
>>> pub(crate) struct Request {
>>> @@ -21,6 +20,16 @@ pub(crate) struct Request {
>>> pub(crate) expected: u16,
>>> }
>>>
>>> +/// Common HTTP status codes used in ACME responses.
>>> +pub(crate) mod http_status {
>>> + /// 200 OK
>>> + pub(crate) const OK: u16 = 200;
>>> + /// 201 Created
>>> + pub(crate) const CREATED: u16 = 201;
>>> + /// 204 No Content
>>> + pub(crate) const NO_CONTENT: u16 = 204;
>>> +}
>>> +
>>> /// An ACME error response contains a specially formatted type string, and can optionally
>>> /// contain textual details and a set of sub problems.
>>> #[derive(Clone, Debug, Deserialize)]
>>> --
>>> 2.47.3
>>>
>>>
>>>
* Re: [pbs-devel] [PATCH proxmox-backup v5 3/5] acme: drop local AcmeClient
2026-01-14 10:52 6% ` Samuel Rufinatscha
@ 2026-01-14 16:41 12% ` Samuel Rufinatscha
0 siblings, 0 replies; 200+ results
From: Samuel Rufinatscha @ 2026-01-14 16:41 UTC (permalink / raw)
To: Fabian Grünbichler, Proxmox Backup Server development discussion
On 1/14/26 11:52 AM, Samuel Rufinatscha wrote:
> On 1/14/26 10:57 AM, Fabian Grünbichler wrote:
>> On January 14, 2026 9:56 am, Samuel Rufinatscha wrote:
>>> On 1/13/26 2:44 PM, Fabian Grünbichler wrote:
>>>> On January 8, 2026 12:26 pm, Samuel Rufinatscha wrote:
>>>>> PBS currently uses its own ACME client and API logic, while PDM
>>>>> uses the
>>>>> factored out proxmox-acme and proxmox-acme-api crates. This
>>>>> duplication
>>>>> risks differences in behaviour and requires ACME maintenance in two
>>>>> places. This patch is part of a series to move PBS over to the shared
>>>>> ACME stack.
>>>>>
>>>>> Changes:
>>>>> - Remove the local src/acme/client.rs and switch to
>>>>> proxmox_acme::async_client::AcmeClient where needed.
>>>>> - Use proxmox_acme_api::load_client_with_account to the custom
>>>>> AcmeClient::load() function
>>>>> - Replace the local do_register() logic with
>>>>> proxmox_acme_api::register_account, to further ensure accounts are
>>>>> persisted
>>>>> - Replace the local AcmeAccountName type, required for
>>>>> proxmox_acme_api::register_account
>>>>>
>>>>> Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
>>>>> ---
>>>>> src/acme/client.rs | 691
>>>>> -------------------------
>>>>> src/acme/mod.rs | 3 -
>>>>> src/acme/plugin.rs | 2 +-
>>>>> src/api2/config/acme.rs | 50 +-
>>>>> src/api2/node/certificates.rs | 2 +-
>>>>> src/api2/types/acme.rs | 8 -
>>>>> src/bin/proxmox_backup_manager/acme.rs | 17 +-
>>>>> src/config/acme/mod.rs | 8 +-
>>>>> src/config/node.rs | 9 +-
>>>>> 9 files changed, 36 insertions(+), 754 deletions(-)
>>>>> delete mode 100644 src/acme/client.rs
>>>>>
>>>>
>>>> [..]
>>>>
>>>>> diff --git a/src/config/acme/mod.rs b/src/config/acme/mod.rs
>>>>> index ac89ae5e..e4639c53 100644
>>>>> --- a/src/config/acme/mod.rs
>>>>> +++ b/src/config/acme/mod.rs
>>>>
>>>> I think this whole file should probably be replaced entirely by
>>>> proxmox-acme-api , which - AFAICT - would just require adding the
>>>> completion helpers there?
>>>>
>>>
>>> Good point, yes I think moving the completion helpers would
>>> allow us to get rid of this file. PDM does not make use of
>>> them / there is atm no 1:1 code in proxmox/ for these helpers.
>>
>> only because https://bugzilla.proxmox.com/show_bug.cgi?id=7179 is not
>> yet implemented ;) so please coordinate with Shan to avoid doing the
>> work twice.
>
> Ah, good catch! thanks for the reference @Fabian.
>
> @Shan: since #7179 will likely touch the same area, it probably makes
> sense to factor out the required helpers as part of this series
> to avoid duplicate work. If that works for you, maybe hold off on
> parallel changes here until this lands. What do you think?
>
>
We discussed it: Shan has already progressed on the CLI side, but
hasn’t integrated the completion helpers yet, so there’s no duplicate
work.

The current plan is:

* I’ll move/factor the completion helpers into proxmox/proxmox-acme-api
  (preferably as a standalone patch so it can be applied independently).
* Shan will then consume those helpers in PDM as part of #7179, so the
  CLI can make use of the completions.
* Re: [pbs-devel] [PATCH proxmox v5 1/4] acme: reduce visibility of Request type
2026-01-13 13:46 5% ` Fabian Grünbichler
@ 2026-01-14 15:07 6% ` Samuel Rufinatscha
0 siblings, 0 replies; 200+ results
From: Samuel Rufinatscha @ 2026-01-14 15:07 UTC (permalink / raw)
To: Proxmox Backup Server development discussion, Fabian Grünbichler
On 1/13/26 2:46 PM, Fabian Grünbichler wrote:
> On January 8, 2026 12:26 pm, Samuel Rufinatscha wrote:
>> Currently, the low-level ACME Request type is publicly exposed, even
>> though users are expected to go through AcmeClient and
>> proxmox-acme-api handlers. This patch reduces visibility so that
>> the Request type and related fields/methods are crate-internal only.
>
> it also removes a lot of public and private code entirely, not just
> changing visibility.. I think those were intentionally there to allow
> usage without the need to using either of the provided client
> implementations (which are guarded behind feature flags).
>
> if we say the crate should only be used via either the `client` or the
> `async-client` then that's fine, but it should be made explicit and
> discussed.. right now this is sort of half-way there - e.g., the
> Account::new_order method was not made private, even though it makes no
> sense anymore with those other methods/helpers removed..
>
> this patch also breaks a few reference in doc comments that would need
> to be dropped.
>
> a note that this breaks the current usage of proxmox-acme in PBS would
> also be good to have here, if this is kept..
>
Makes sense.

I think the best approach here is to drop the visibility reductions and
removals, and keep the low-level API intact (as it’s currently
documented and feature-gated).

This will keep the series focused on fixing the 204 nonce handling and
switching PBS to the factored-out client / API handlers (to be on the
same base as PDM).
>>
>> Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
>> ---
>> proxmox-acme/src/account.rs | 94 ++-----------------------------
>> proxmox-acme/src/async_client.rs | 2 +-
>> proxmox-acme/src/authorization.rs | 30 ----------
>> proxmox-acme/src/client.rs | 6 +-
>> proxmox-acme/src/lib.rs | 4 --
>> proxmox-acme/src/order.rs | 2 +-
>> proxmox-acme/src/request.rs | 12 ++--
>> 7 files changed, 16 insertions(+), 134 deletions(-)
>>
>> diff --git a/proxmox-acme/src/account.rs b/proxmox-acme/src/account.rs
>> index f763c1e9..d8eb3e73 100644
>> --- a/proxmox-acme/src/account.rs
>> +++ b/proxmox-acme/src/account.rs
>> @@ -8,12 +8,11 @@ use openssl::pkey::{PKey, Private};
>> use serde::{Deserialize, Serialize};
>> use serde_json::Value;
>>
>> -use crate::authorization::{Authorization, GetAuthorization};
>> use crate::b64u;
>> use crate::directory::Directory;
>> use crate::jws::Jws;
>> use crate::key::{Jwk, PublicKey};
>> -use crate::order::{NewOrder, Order, OrderData};
>> +use crate::order::{NewOrder, OrderData};
>> use crate::request::Request;
>> use crate::types::{AccountData, AccountStatus, ExternalAccountBinding};
>> use crate::Error;
>> @@ -92,7 +91,7 @@ impl Account {
>> }
>>
>> /// Prepare a "POST-as-GET" request to fetch data. Low level helper.
>> - pub fn get_request(&self, url: &str, nonce: &str) -> Result<Request, Error> {
>> + pub(crate) fn get_request(&self, url: &str, nonce: &str) -> Result<Request, Error> {
>> let key = PKey::private_key_from_pem(self.private_key.as_bytes())?;
>> let body = serde_json::to_string(&Jws::new_full(
>> &key,
>> @@ -112,7 +111,7 @@ impl Account {
>> }
>>
>> /// Prepare a JSON POST request. Low level helper.
>> - pub fn post_request<T: Serialize>(
>> + pub(crate) fn post_request<T: Serialize>(
>> &self,
>> url: &str,
>> nonce: &str,
>> @@ -136,31 +135,6 @@ impl Account {
>> })
>> }
>>
>> - /// Prepare a JSON POST request.
>> - fn post_request_raw_payload(
>> - &self,
>> - url: &str,
>> - nonce: &str,
>> - payload: String,
>> - ) -> Result<Request, Error> {
>> - let key = PKey::private_key_from_pem(self.private_key.as_bytes())?;
>> - let body = serde_json::to_string(&Jws::new_full(
>> - &key,
>> - Some(self.location.clone()),
>> - url.to_owned(),
>> - nonce.to_owned(),
>> - payload,
>> - )?)?;
>> -
>> - Ok(Request {
>> - url: url.to_owned(),
>> - method: "POST",
>> - content_type: crate::request::JSON_CONTENT_TYPE,
>> - body,
>> - expected: 200,
>> - })
>> - }
>> -
>> /// Get the "key authorization" for a token.
>> pub fn key_authorization(&self, token: &str) -> Result<String, Error> {
>> let key = PKey::private_key_from_pem(self.private_key.as_bytes())?;
>> @@ -176,64 +150,6 @@ impl Account {
>> Ok(b64u::encode(digest))
>> }
>>
>> - /// Prepare a request to update account data.
>> - ///
>> - /// This is a rather low level interface. You should know what you're doing.
>> - pub fn update_account_request<T: Serialize>(
>> - &self,
>> - nonce: &str,
>> - data: &T,
>> - ) -> Result<Request, Error> {
>> - self.post_request(&self.location, nonce, data)
>> - }
>> -
>> - /// Prepare a request to deactivate this account.
>> - pub fn deactivate_account_request<T: Serialize>(&self, nonce: &str) -> Result<Request, Error> {
>> - self.post_request_raw_payload(
>> - &self.location,
>> - nonce,
>> - r#"{"status":"deactivated"}"#.to_string(),
>> - )
>> - }
>> -
>> - /// Prepare a request to query an Authorization for an Order.
>> - ///
>> - /// Returns `Ok(None)` if `auth_index` is out of out of range. You can query the number of
>> - /// authorizations from via [`Order::authorization_len`] or by manually inspecting its
>> - /// `.data.authorization` vector.
>> - pub fn get_authorization(
>> - &self,
>> - order: &Order,
>> - auth_index: usize,
>> - nonce: &str,
>> - ) -> Result<Option<GetAuthorization>, Error> {
>> - match order.authorization(auth_index) {
>> - None => Ok(None),
>> - Some(url) => Ok(Some(GetAuthorization::new(self.get_request(url, nonce)?))),
>> - }
>> - }
>> -
>> - /// Prepare a request to validate a Challenge from an Authorization.
>> - ///
>> - /// Returns `Ok(None)` if `challenge_index` is out of out of range. The challenge count is
>> - /// available by inspecting the [`Authorization::challenges`] vector.
>> - ///
>> - /// This returns a raw `Request` since validation takes some time and the `Authorization`
>> - /// object has to be re-queried and its `status` inspected.
>> - pub fn validate_challenge(
>> - &self,
>> - authorization: &Authorization,
>> - challenge_index: usize,
>> - nonce: &str,
>> - ) -> Result<Option<Request>, Error> {
>> - match authorization.challenges.get(challenge_index) {
>> - None => Ok(None),
>> - Some(challenge) => self
>> - .post_request_raw_payload(&challenge.url, nonce, "{}".to_string())
>> - .map(Some),
>> - }
>> - }
>> -
>> /// Prepare a request to revoke a certificate.
>> ///
>> /// The certificate can be either PEM or DER formatted.
>> @@ -274,7 +190,7 @@ pub struct CertificateRevocation<'a> {
>>
>> impl CertificateRevocation<'_> {
>> /// Create the revocation request using the specified nonce for the given directory.
>> - pub fn request(&self, directory: &Directory, nonce: &str) -> Result<Request, Error> {
>> + pub(crate) fn request(&self, directory: &Directory, nonce: &str) -> Result<Request, Error> {
>> let revoke_cert = directory.data.revoke_cert.as_ref().ok_or_else(|| {
>> Error::Custom("no 'revokeCert' URL specified by provider".to_string())
>> })?;
>> @@ -364,7 +280,7 @@ impl AccountCreator {
>> /// the resulting request.
>> /// Changing the private key between using the request and passing the response to
>> /// [`response`](AccountCreator::response()) will render the account unusable!
>> - pub fn request(&self, directory: &Directory, nonce: &str) -> Result<Request, Error> {
>> + pub(crate) fn request(&self, directory: &Directory, nonce: &str) -> Result<Request, Error> {
>> let key = self.key.as_deref().ok_or(Error::MissingKey)?;
>> let url = directory.new_account_url().ok_or_else(|| {
>> Error::Custom("no 'newAccount' URL specified by provider".to_string())
>> diff --git a/proxmox-acme/src/async_client.rs b/proxmox-acme/src/async_client.rs
>> index dc755fb9..2ff3ba22 100644
>> --- a/proxmox-acme/src/async_client.rs
>> +++ b/proxmox-acme/src/async_client.rs
>> @@ -10,7 +10,7 @@ use proxmox_http::{client::Client, Body};
>>
>> use crate::account::AccountCreator;
>> use crate::order::{Order, OrderData};
>> -use crate::Request as AcmeRequest;
>> +use crate::request::Request as AcmeRequest;
>> use crate::{Account, Authorization, Challenge, Directory, Error, ErrorResponse};
>>
>> /// A non-blocking Acme client using tokio/hyper.
>> diff --git a/proxmox-acme/src/authorization.rs b/proxmox-acme/src/authorization.rs
>> index 28bc1b4b..7027381a 100644
>> --- a/proxmox-acme/src/authorization.rs
>> +++ b/proxmox-acme/src/authorization.rs
>> @@ -6,8 +6,6 @@ use serde::{Deserialize, Serialize};
>> use serde_json::Value;
>>
>> use crate::order::Identifier;
>> -use crate::request::Request;
>> -use crate::Error;
>>
>> /// Status of an [`Authorization`].
>> #[derive(Clone, Copy, Debug, Eq, PartialEq, Deserialize, Serialize)]
>> @@ -132,31 +130,3 @@ impl Challenge {
>> fn is_false(b: &bool) -> bool {
>> !*b
>> }
>> -
>> -/// Represents an in-flight query for an authorization.
>> -///
>> -/// This is created via [`Account::get_authorization`](crate::Account::get_authorization()).
>> -pub struct GetAuthorization {
>> - //order: OrderData,
>> - /// The request to send to the ACME provider. This is wrapped in an option in order to allow
>> - /// moving it out instead of copying the contents.
>> - ///
>> - /// When generated via [`Account::get_authorization`](crate::Account::get_authorization()),
>> - /// this is guaranteed to be `Some`.
>> - ///
>> - /// The response should be passed to the the [`response`](GetAuthorization::response()) method.
>> - pub request: Option<Request>,
>> -}
>> -
>> -impl GetAuthorization {
>> - pub(crate) fn new(request: Request) -> Self {
>> - Self {
>> - request: Some(request),
>> - }
>> - }
>> -
>> - /// Deal with the response we got from the server.
>> - pub fn response(self, response_body: &[u8]) -> Result<Authorization, Error> {
>> - Ok(serde_json::from_slice(response_body)?)
>> - }
>> -}
>> diff --git a/proxmox-acme/src/client.rs b/proxmox-acme/src/client.rs
>> index 931f7245..5c812567 100644
>> --- a/proxmox-acme/src/client.rs
>> +++ b/proxmox-acme/src/client.rs
>> @@ -7,8 +7,8 @@ use serde::{Deserialize, Serialize};
>> use crate::b64u;
>> use crate::error;
>> use crate::order::OrderData;
>> -use crate::request::ErrorResponse;
>> -use crate::{Account, Authorization, Challenge, Directory, Error, Order, Request};
>> +use crate::request::{ErrorResponse, Request};
>> +use crate::{Account, Authorization, Challenge, Directory, Error, Order};
>>
>> macro_rules! format_err {
>> ($($fmt:tt)*) => { Error::Client(format!($($fmt)*)) };
>> @@ -564,7 +564,7 @@ impl Client {
>> }
>>
>> /// Low-level API to run an n API request. This automatically updates the current nonce!
>> - pub fn run_request(&mut self, request: Request) -> Result<HttpResponse, Error> {
>> + pub(crate) fn run_request(&mut self, request: Request) -> Result<HttpResponse, Error> {
>> self.inner.run_request(request)
>> }
>>
>> diff --git a/proxmox-acme/src/lib.rs b/proxmox-acme/src/lib.rs
>> index df722629..6722030c 100644
>> --- a/proxmox-acme/src/lib.rs
>> +++ b/proxmox-acme/src/lib.rs
>> @@ -66,10 +66,6 @@ pub use error::Error;
>> #[doc(inline)]
>> pub use order::Order;
>>
>> -#[cfg(feature = "impl")]
>> -#[doc(inline)]
>> -pub use request::Request;
>> -
>> // we don't inline these:
>> #[cfg(feature = "impl")]
>> pub use order::NewOrder;
>> diff --git a/proxmox-acme/src/order.rs b/proxmox-acme/src/order.rs
>> index b6551004..432a81a4 100644
>> --- a/proxmox-acme/src/order.rs
>> +++ b/proxmox-acme/src/order.rs
>> @@ -153,7 +153,7 @@ pub struct NewOrder {
>> //order: OrderData,
>> /// The request to execute to place the order. When creating a [`NewOrder`] via
>> /// [`Account::new_order`](crate::Account::new_order) this is guaranteed to be `Some`.
>> - pub request: Option<Request>,
>> + pub(crate) request: Option<Request>,
>> }
>>
>> impl NewOrder {
>> diff --git a/proxmox-acme/src/request.rs b/proxmox-acme/src/request.rs
>> index 78a90913..dadfc5af 100644
>> --- a/proxmox-acme/src/request.rs
>> +++ b/proxmox-acme/src/request.rs
>> @@ -4,21 +4,21 @@ pub(crate) const JSON_CONTENT_TYPE: &str = "application/jose+json";
>> pub(crate) const CREATED: u16 = 201;
>>
>> /// A request which should be performed on the ACME provider.
>> -pub struct Request {
>> +pub(crate) struct Request {
>> /// The complete URL to send the request to.
>> - pub url: String,
>> + pub(crate) url: String,
>>
>> /// The HTTP method name to use.
>> - pub method: &'static str,
>> + pub(crate) method: &'static str,
>>
>> /// The `Content-Type` header to pass along.
>> - pub content_type: &'static str,
>> + pub(crate) content_type: &'static str,
>>
>> /// The body to pass along with request, or an empty string.
>> - pub body: String,
>> + pub(crate) body: String,
>>
>> /// The expected status code a compliant ACME provider will return on success.
>> - pub expected: u16,
>> + pub(crate) expected: u16,
>> }
>>
>> /// An ACME error response contains a specially formatted type string, and can optionally
>> --
>> 2.47.3
>>
>>
>>
* Re: [pbs-devel] [PATCH proxmox-datacenter-manager v3 2/2] docs: document API token-cache TTL effects
2026-01-14 10:45 5% ` Fabian Grünbichler
@ 2026-01-14 11:24 6% ` Samuel Rufinatscha
0 siblings, 0 replies; 200+ results
From: Samuel Rufinatscha @ 2026-01-14 11:24 UTC (permalink / raw)
To: Proxmox Backup Server development discussion, Fabian Grünbichler
On 1/14/26 11:44 AM, Fabian Grünbichler wrote:
> On January 2, 2026 5:07 pm, Samuel Rufinatscha wrote:
>> Documents the effects of the added API token-cache in the
>> proxmox-access-control crate. This patch is part of the
>> series that fixes bug #7017 [1].
>>
>> This patch is part of the series which fixes bug #7017 [1].
>>
>> [1] https://bugzilla.proxmox.com/show_bug.cgi?id=7017
>
> please try to read your commit messages at least once before sending.
> the bug is referenced three times here, and it is not necessary to
> mention it at all..
>
Fair point — the duplication here is a copy/paste slip. I’ll resend v4
with a cleaned-up commit message for this docs-only patch and remove
the "part of the series" boilerplate from the other commits. Thanks.
>>
>> Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
>> ---
>> Changes from v2 to v3:
>>
>> * Reword documentation warning for clarity.
>>
>> docs/access-control.rst | 4 ++++
>> 1 file changed, 4 insertions(+)
>>
>> diff --git a/docs/access-control.rst b/docs/access-control.rst
>> index adf26cd..18e57a2 100644
>> --- a/docs/access-control.rst
>> +++ b/docs/access-control.rst
>> @@ -47,6 +47,10 @@ place of the user ID (``user@realm``) and the user password, respectively.
>> The API token is passed from the client to the server by setting the ``Authorization`` HTTP header
>> with method ``PDMAPIToken`` to the value ``TOKENID:TOKENSECRET``.
>>
>> +.. WARNING:: Direct/manual edits to ``token.shadow`` may take up to 60 seconds (or
>> + longer in edge cases) to take effect due to caching. Restart services for
>> + immediate effect of manual edits.
>> +
>> .. _access_control:
>>
>> Access Control
>> --
>> 2.47.3
>>
>>
>>
* Re: [pbs-devel] [PATCH proxmox-backup v5 3/5] acme: drop local AcmeClient
2026-01-14 9:58 5% ` Fabian Grünbichler
@ 2026-01-14 10:52 6% ` Samuel Rufinatscha
2026-01-14 16:41 12% ` Samuel Rufinatscha
0 siblings, 1 reply; 200+ results
From: Samuel Rufinatscha @ 2026-01-14 10:52 UTC (permalink / raw)
To: Fabian Grünbichler, Proxmox Backup Server development discussion
On 1/14/26 10:57 AM, Fabian Grünbichler wrote:
> On January 14, 2026 9:56 am, Samuel Rufinatscha wrote:
>> On 1/13/26 2:44 PM, Fabian Grünbichler wrote:
>>> On January 8, 2026 12:26 pm, Samuel Rufinatscha wrote:
>>>> PBS currently uses its own ACME client and API logic, while PDM uses the
>>>> factored out proxmox-acme and proxmox-acme-api crates. This duplication
>>>> risks differences in behaviour and requires ACME maintenance in two
>>>> places. This patch is part of a series to move PBS over to the shared
>>>> ACME stack.
>>>>
>>>> Changes:
>>>> - Remove the local src/acme/client.rs and switch to
>>>> proxmox_acme::async_client::AcmeClient where needed.
>>>> - Use proxmox_acme_api::load_client_with_account to the custom
>>>> AcmeClient::load() function
>>>> - Replace the local do_register() logic with
>>>> proxmox_acme_api::register_account, to further ensure accounts are persisted
>>>> - Replace the local AcmeAccountName type, required for
>>>> proxmox_acme_api::register_account
>>>>
>>>> Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
>>>> ---
>>>> src/acme/client.rs | 691 -------------------------
>>>> src/acme/mod.rs | 3 -
>>>> src/acme/plugin.rs | 2 +-
>>>> src/api2/config/acme.rs | 50 +-
>>>> src/api2/node/certificates.rs | 2 +-
>>>> src/api2/types/acme.rs | 8 -
>>>> src/bin/proxmox_backup_manager/acme.rs | 17 +-
>>>> src/config/acme/mod.rs | 8 +-
>>>> src/config/node.rs | 9 +-
>>>> 9 files changed, 36 insertions(+), 754 deletions(-)
>>>> delete mode 100644 src/acme/client.rs
>>>>
>>>
>>> [..]
>>>
>>>> diff --git a/src/config/acme/mod.rs b/src/config/acme/mod.rs
>>>> index ac89ae5e..e4639c53 100644
>>>> --- a/src/config/acme/mod.rs
>>>> +++ b/src/config/acme/mod.rs
>>>
>>> I think this whole file should probably be replaced entirely by
>>> proxmox-acme-api, which - AFAICT - would just require adding the
>>> completion helpers there?
>>>
>>
>> Good point, yes I think moving the completion helpers would
>> allow us to get rid of this file. PDM does not make use of
>> them / there is atm no 1:1 code in proxmox/ for these helpers.
>
> only because https://bugzilla.proxmox.com/show_bug.cgi?id=7179 is not
> yet implemented ;) so please coordinate with Shan to avoid doing the
> work twice.
Ah, good catch! Thanks for the reference @Fabian.
@Shan: since #7179 will likely touch the same area, it probably makes
sense to factor out the required helpers as part of this series
to avoid duplicate work. If that works for you, maybe hold off on
parallel changes here until this lands. What do you think?
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
^ permalink raw reply [relevance 6%]
* Re: [pbs-devel] [PATCH proxmox-datacenter-manager v3 2/2] docs: document API token-cache TTL effects
2026-01-02 16:07 17% ` [pbs-devel] [PATCH proxmox-datacenter-manager v3 2/2] docs: document API token-cache TTL effects Samuel Rufinatscha
@ 2026-01-14 10:45 5% ` Fabian Grünbichler
2026-01-14 11:24 6% ` Samuel Rufinatscha
0 siblings, 1 reply; 200+ results
From: Fabian Grünbichler @ 2026-01-14 10:45 UTC (permalink / raw)
To: Proxmox Backup Server development discussion
On January 2, 2026 5:07 pm, Samuel Rufinatscha wrote:
> Documents the effects of the added API token-cache in the
> proxmox-access-control crate. This patch is part of the
> series that fixes bug #7017 [1].
>
> This patch is part of the series which fixes bug #7017 [1].
>
> [1] https://bugzilla.proxmox.com/show_bug.cgi?id=7017
please try to read your commit messages at least once before sending.
the bug is referenced three times here, and it is not necessary to
mention it at all..
>
> Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
> ---
> Changes from v2 to v3:
>
> * Reword documentation warning for clarity.
>
> docs/access-control.rst | 4 ++++
> 1 file changed, 4 insertions(+)
>
> diff --git a/docs/access-control.rst b/docs/access-control.rst
> index adf26cd..18e57a2 100644
> --- a/docs/access-control.rst
> +++ b/docs/access-control.rst
> @@ -47,6 +47,10 @@ place of the user ID (``user@realm``) and the user password, respectively.
> The API token is passed from the client to the server by setting the ``Authorization`` HTTP header
> with method ``PDMAPIToken`` to the value ``TOKENID:TOKENSECRET``.
>
> +.. WARNING:: Direct/manual edits to ``token.shadow`` may take up to 60 seconds (or
> + longer in edge cases) to take effect due to caching. Restart services for
> + immediate effect of manual edits.
> +
> .. _access_control:
>
> Access Control
> --
> 2.47.3
>
>
>
> _______________________________________________
> pbs-devel mailing list
> pbs-devel@lists.proxmox.com
> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
>
>
>
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
^ permalink raw reply [relevance 5%]
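The 60-second figure in the warning presumably comes from a TTL that gates how often the cache re-stats token.shadow (the TTL optimization itself lands in a later patch of the series, not shown here). A minimal sketch of such a TTL gate, with purely illustrative names (`needs_stat` and `CHECK_TTL_SECS` are not taken from the series):

```rust
// Hypothetical TTL gate: only re-stat the shadow file once the last
// check is older than the TTL. `last_checked` mirrors the Option<i64>
// epoch-seconds field used in the series; the constant is illustrative.
const CHECK_TTL_SECS: i64 = 60;

fn needs_stat(last_checked: Option<i64>, now: i64) -> bool {
    match last_checked {
        // Never checked yet -> must stat.
        None => true,
        // Otherwise stat again only once the TTL has elapsed.
        Some(t) => now - t >= CHECK_TTL_SECS,
    }
}
```

Under this scheme a manual edit stays invisible for at most one TTL window, which is exactly the behaviour the documentation warning describes.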
* Re: [pbs-devel] [PATCH proxmox-datacenter-manager v3 1/2] pdm-config: implement token.shadow generation
2026-01-02 16:07 13% ` [pbs-devel] [PATCH proxmox-datacenter-manager v3 1/2] pdm-config: implement token.shadow generation Samuel Rufinatscha
@ 2026-01-14 10:45 5% ` Fabian Grünbichler
2026-01-16 16:28 6% ` Samuel Rufinatscha
0 siblings, 1 reply; 200+ results
From: Fabian Grünbichler @ 2026-01-14 10:45 UTC (permalink / raw)
To: Proxmox Backup Server development discussion
On January 2, 2026 5:07 pm, Samuel Rufinatscha wrote:
> PDM depends on the shared proxmox/proxmox-access-control crate for
> token.shadow handling, which expects the product to provide a
> cross-process invalidation signal so it can safely cache verified API
> token secrets and invalidate them when token.shadow is changed.
>
> This patch
>
> * adds a token_shadow_generation to PDM’s shared-memory
> ConfigVersionCache
> * implements proxmox_access_control::init::AccessControlConfig
> for pdm_config::AccessControlConfig, which
> - delegates roles/privs/path checks to the existing
> pdm_api_types::AccessControlConfig implementation
> - implements the shadow cache generation trait functions
> * switches the AccessControlConfig init paths (server + CLI) to use
> pdm_config::AccessControlConfig instead of
> pdm_api_types::AccessControlConfig
>
> This patch is part of the series which fixes bug #7017 [1].
>
> [1] https://bugzilla.proxmox.com/show_bug.cgi?id=7017
>
> Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
> ---
> cli/admin/src/main.rs | 2 +-
> lib/pdm-config/Cargo.toml | 1 +
> lib/pdm-config/src/access_control_config.rs | 73 +++++++++++++++++++++
> lib/pdm-config/src/config_version_cache.rs | 18 +++++
> lib/pdm-config/src/lib.rs | 2 +
> server/src/acl.rs | 3 +-
> 6 files changed, 96 insertions(+), 3 deletions(-)
> create mode 100644 lib/pdm-config/src/access_control_config.rs
>
> diff --git a/cli/admin/src/main.rs b/cli/admin/src/main.rs
> index f698fa2..916c633 100644
> --- a/cli/admin/src/main.rs
> +++ b/cli/admin/src/main.rs
> @@ -19,7 +19,7 @@ fn main() {
> proxmox_product_config::init(api_user, priv_user);
>
> proxmox_access_control::init::init(
> - &pdm_api_types::AccessControlConfig,
> + &pdm_config::AccessControlConfig,
> pdm_buildcfg::configdir!("/access"),
> )
> .expect("failed to setup access control config");
> diff --git a/lib/pdm-config/Cargo.toml b/lib/pdm-config/Cargo.toml
> index d39c2ad..19781d2 100644
> --- a/lib/pdm-config/Cargo.toml
> +++ b/lib/pdm-config/Cargo.toml
> @@ -13,6 +13,7 @@ once_cell.workspace = true
> openssl.workspace = true
> serde.workspace = true
>
> +proxmox-access-control.workspace = true
> proxmox-config-digest = { workspace = true, features = [ "openssl" ] }
> proxmox-http = { workspace = true, features = [ "http-helpers" ] }
> proxmox-ldap = { workspace = true, features = [ "types" ]}
> diff --git a/lib/pdm-config/src/access_control_config.rs b/lib/pdm-config/src/access_control_config.rs
> new file mode 100644
> index 0000000..6f2e6b3
> --- /dev/null
> +++ b/lib/pdm-config/src/access_control_config.rs
> @@ -0,0 +1,73 @@
> +// e.g. in src/main.rs or server::context mod, wherever convenient
> +
> +use anyhow::Error;
> +use pdm_api_types::{Authid, Userid};
> +use proxmox_section_config::SectionConfigData;
> +use std::collections::HashMap;
> +
> +pub struct AccessControlConfig;
> +
> +impl proxmox_access_control::init::AccessControlConfig for AccessControlConfig {
should we then remove the impl from the api type?
> + fn privileges(&self) -> &HashMap<&str, u64> {
> + pdm_api_types::AccessControlConfig.privileges()
> + }
> +
> + fn roles(&self) -> &HashMap<&str, (u64, &str)> {
> + pdm_api_types::AccessControlConfig.roles()
> + }
> +
> + fn is_superuser(&self, auth_id: &Authid) -> bool {
> + pdm_api_types::AccessControlConfig.is_superuser(auth_id)
> + }
> +
> + fn is_group_member(&self, user_id: &Userid, group: &str) -> bool {
> + pdm_api_types::AccessControlConfig.is_group_member(user_id, group)
> + }
> +
> + fn role_admin(&self) -> Option<&str> {
> + pdm_api_types::AccessControlConfig.role_admin()
> + }
> +
> + fn role_no_access(&self) -> Option<&str> {
> + pdm_api_types::AccessControlConfig.role_no_access()
> + }
> +
> + fn init_user_config(&self, config: &mut SectionConfigData) -> Result<(), Error> {
> + pdm_api_types::AccessControlConfig.init_user_config(config)
> + }
> +
> + fn acl_audit_privileges(&self) -> u64 {
> + pdm_api_types::AccessControlConfig.acl_audit_privileges()
> + }
> +
> + fn acl_modify_privileges(&self) -> u64 {
> + pdm_api_types::AccessControlConfig.acl_modify_privileges()
> + }
> +
> + fn check_acl_path(&self, path: &str) -> Result<(), Error> {
> + pdm_api_types::AccessControlConfig.check_acl_path(path)
> + }
> +
> + fn allow_partial_permission_match(&self) -> bool {
> + pdm_api_types::AccessControlConfig.allow_partial_permission_match()
> + }
> +
> + fn cache_generation(&self) -> Option<usize> {
> + pdm_api_types::AccessControlConfig.cache_generation()
> + }
shouldn't this be wired up to the ConfigVersionCache?
> +
> + fn increment_cache_generation(&self) -> Result<(), Error> {
> + pdm_api_types::AccessControlConfig.increment_cache_generation()
shouldn't this be wired up to the ConfigVersionCache?
> + }
> +
> + fn token_shadow_cache_generation(&self) -> Option<usize> {
> + crate::ConfigVersionCache::new()
> + .ok()
> + .map(|c| c.token_shadow_generation())
> + }
> +
> + fn increment_token_shadow_cache_generation(&self) -> Result<usize, Error> {
> + let c = crate::ConfigVersionCache::new()?;
> + Ok(c.increase_token_shadow_generation())
> + }
> +}
> diff --git a/lib/pdm-config/src/config_version_cache.rs b/lib/pdm-config/src/config_version_cache.rs
> index 36a6a77..933140c 100644
> --- a/lib/pdm-config/src/config_version_cache.rs
> +++ b/lib/pdm-config/src/config_version_cache.rs
> @@ -27,6 +27,8 @@ struct ConfigVersionCacheDataInner {
> traffic_control_generation: AtomicUsize,
> // Tracks updates to the remote/hostname/nodename mapping cache.
> remote_mapping_cache: AtomicUsize,
> + // Token shadow (token.shadow) generation/version.
> + token_shadow_generation: AtomicUsize,
explanation why this is safe for the commit message would be nice ;)
> // Add further atomics here
> }
>
> @@ -172,4 +174,20 @@ impl ConfigVersionCache {
> .fetch_add(1, Ordering::Relaxed)
> + 1
> }
> +
> + /// Returns the token shadow generation number.
> + pub fn token_shadow_generation(&self) -> usize {
> + self.shmem
> + .data()
> + .token_shadow_generation
> + .load(Ordering::Acquire)
> + }
> +
> + /// Increase the token shadow generation number.
> + pub fn increase_token_shadow_generation(&self) -> usize {
> + self.shmem
> + .data()
> + .token_shadow_generation
> + .fetch_add(1, Ordering::AcqRel)
> + }
> }
> diff --git a/lib/pdm-config/src/lib.rs b/lib/pdm-config/src/lib.rs
> index 4c49054..a15a006 100644
> --- a/lib/pdm-config/src/lib.rs
> +++ b/lib/pdm-config/src/lib.rs
> @@ -9,6 +9,8 @@ pub mod remotes;
> pub mod setup;
> pub mod views;
>
> +mod access_control_config;
> +pub use access_control_config::AccessControlConfig;
> mod config_version_cache;
> pub use config_version_cache::ConfigVersionCache;
>
> diff --git a/server/src/acl.rs b/server/src/acl.rs
> index f421814..e6e007b 100644
> --- a/server/src/acl.rs
> +++ b/server/src/acl.rs
> @@ -1,6 +1,5 @@
> pub(crate) fn init() {
> - static ACCESS_CONTROL_CONFIG: pdm_api_types::AccessControlConfig =
> - pdm_api_types::AccessControlConfig;
> + static ACCESS_CONTROL_CONFIG: pdm_config::AccessControlConfig = pdm_config::AccessControlConfig;
>
> proxmox_access_control::init::init(&ACCESS_CONTROL_CONFIG, pdm_buildcfg::configdir!("/access"))
> .expect("failed to setup access control config");
> --
> 2.47.3
>
>
>
> _______________________________________________
> pbs-devel mailing list
> pbs-devel@lists.proxmox.com
> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
>
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
^ permalink raw reply [relevance 5%]
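The cross-process invalidation signal discussed in this thread boils down to a monotonically increasing generation counter. The sketch below uses a process-local `AtomicUsize` as a stand-in; in the actual patch the field lives inside the shared-memory `ConfigVersionCache` mapping, which is what makes bumps visible to other processes. The orderings mirror the ones in the patch:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Process-local stand-in for the shmem-backed token_shadow_generation
// counter: readers load with Acquire, writers bump with AcqRel.
static TOKEN_SHADOW_GENERATION: AtomicUsize = AtomicUsize::new(0);

fn token_shadow_generation() -> usize {
    TOKEN_SHADOW_GENERATION.load(Ordering::Acquire)
}

fn increase_token_shadow_generation() -> usize {
    // fetch_add returns the *previous* value; callers that want the new
    // generation add 1, as the PBS variant of this helper does.
    TOKEN_SHADOW_GENERATION.fetch_add(1, Ordering::AcqRel)
}
```

Readers compare the value they cached against the current one; any mismatch means some process mutated token.shadow in between.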
* Re: [pbs-devel] [PATCH proxmox-backup v3 1/4] pbs-config: add token.shadow generation to ConfigVersionCache
2026-01-02 16:07 17% ` [pbs-devel] [PATCH proxmox-backup v3 1/4] pbs-config: add token.shadow generation to ConfigVersionCache Samuel Rufinatscha
@ 2026-01-14 10:44 5% ` Fabian Grünbichler
2026-01-16 13:53 6% ` Samuel Rufinatscha
0 siblings, 1 reply; 200+ results
From: Fabian Grünbichler @ 2026-01-14 10:44 UTC (permalink / raw)
To: Proxmox Backup Server development discussion
On January 2, 2026 5:07 pm, Samuel Rufinatscha wrote:
> Currently, every token-based API request reads the token.shadow file and
> runs the expensive password hash verification for the given token
> secret. This shows up as a hotspot in /status profiling (see
> bug #7017 [1]).
>
> To solve the issue, this patch prepares the config version cache,
> so that token_shadow_generation config caching can be built on
> top of it.
>
> This patch specifically:
> (1) implements increment function in order to invalidate generations
this is needlessly verbose..
>
> This patch is part of the series which fixes bug #7017 [1].
this is already mentioned higher up and doesn't need to be repeated
here.
this patch needs a rebase. it would be good to call out why it is safe
to add to this struct, since it is accessed/mapped by both old and new
processes.
>
> [1] https://bugzilla.proxmox.com/show_bug.cgi?id=7017
>
> Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
> ---
> pbs-config/src/config_version_cache.rs | 18 ++++++++++++++++++
> 1 file changed, 18 insertions(+)
>
> diff --git a/pbs-config/src/config_version_cache.rs b/pbs-config/src/config_version_cache.rs
> index e8fb994f..1376b11d 100644
> --- a/pbs-config/src/config_version_cache.rs
> +++ b/pbs-config/src/config_version_cache.rs
> @@ -28,6 +28,8 @@ struct ConfigVersionCacheDataInner {
> // datastore (datastore.cfg) generation/version
> // FIXME: remove with PBS 3.0
> datastore_generation: AtomicUsize,
> + // Token shadow (token.shadow) generation/version.
> + token_shadow_generation: AtomicUsize,
> // Add further atomics here
> }
>
> @@ -153,4 +155,20 @@ impl ConfigVersionCache {
> .datastore_generation
> .fetch_add(1, Ordering::AcqRel)
> }
> +
> + /// Returns the token shadow generation number.
> + pub fn token_shadow_generation(&self) -> usize {
> + self.shmem
> + .data()
> + .token_shadow_generation
> + .load(Ordering::Acquire)
> + }
> +
> + /// Increase the token shadow generation number.
> + pub fn increase_token_shadow_generation(&self) -> usize {
> + self.shmem
> + .data()
> + .token_shadow_generation
> + .fetch_add(1, Ordering::AcqRel)
> + }
> }
> --
> 2.47.3
>
>
>
> _______________________________________________
> pbs-devel mailing list
> pbs-devel@lists.proxmox.com
> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
>
>
>
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
^ permalink raw reply [relevance 5%]
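On top of such a generation counter, the invalidation rule the series applies is simple: remember the generation the cache was filled under, and drop everything when the shared value has moved on. A self-contained sketch with illustrative names (the real cache keys are `Authid`s, not strings):

```rust
use std::collections::HashMap;

// Minimal model of the secret cache: entries are only trustworthy as
// long as the shared generation has not changed since they were added.
struct SecretCache {
    shared_gen: usize,
    secrets: HashMap<String, String>,
}

impl SecretCache {
    fn refresh(&mut self, shared_gen_now: usize) {
        if self.shared_gen != shared_gen_now {
            // Another process bumped the generation; we do not know what
            // changed, so every cached secret is suspect and gets dropped.
            self.secrets.clear();
            self.shared_gen = shared_gen_now;
        }
    }
}
```

This is deliberately coarse: a single bump evicts all tokens, trading cache hits for the guarantee that no stale secret survives a rotation or deletion done by another process.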
* Re: [pbs-devel] [PATCH proxmox-backup v3 3/4] pbs-config: invalidate token-secret cache on token.shadow changes
2026-01-02 16:07 12% ` [pbs-devel] [PATCH proxmox-backup v3 3/4] pbs-config: invalidate token-secret cache on token.shadow changes Samuel Rufinatscha
@ 2026-01-14 10:44 5% ` Fabian Grünbichler
2026-01-20 9:21 6% ` Samuel Rufinatscha
0 siblings, 1 reply; 200+ results
From: Fabian Grünbichler @ 2026-01-14 10:44 UTC (permalink / raw)
To: Proxmox Backup Server development discussion
On January 2, 2026 5:07 pm, Samuel Rufinatscha wrote:
> Previously the in-memory token-secret cache was only updated via
> set_secret() and delete_secret(), so manual edits to token.shadow were
> not reflected.
>
> This patch adds file change detection to the cache. It tracks the mtime
> and length of token.shadow and clears the in-memory token secret cache
> whenever these values change.
>
> Note, this patch fetches file stats on every request. A TTL-based
> optimization will be covered in a subsequent patch of the series.
>
> This patch is part of the series which fixes bug #7017 [1].
>
> [1] https://bugzilla.proxmox.com/show_bug.cgi?id=7017
>
> Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
> ---
> Changes from v1 to v2:
>
> * Add file metadata tracking (file_mtime, file_len) and
> FILE_GENERATION.
> * Store file_gen in CachedSecret and verify it against the current
> FILE_GENERATION to ensure cached entries belong to the current file
> state.
> * Add shadow_mtime_len() helper and convert refresh to best-effort
> (try_write, returns bool).
> * Pass a pre-write metadata snapshot into apply_api_mutation and
> clear/bump generation if the cache metadata indicates missed external
> edits.
>
> Changes from v2 to v3:
>
> * Cache now tracks last_checked (epoch seconds).
> * Simplified refresh_cache_if_file_changed, removed
> FILE_GENERATION logic
> * On first load, initializes file metadata and keeps empty cache.
>
> pbs-config/src/token_shadow.rs | 122 +++++++++++++++++++++++++++++++--
> 1 file changed, 118 insertions(+), 4 deletions(-)
>
> diff --git a/pbs-config/src/token_shadow.rs b/pbs-config/src/token_shadow.rs
> index fa84aee5..02fb191b 100644
> --- a/pbs-config/src/token_shadow.rs
> +++ b/pbs-config/src/token_shadow.rs
> @@ -1,5 +1,8 @@
> use std::collections::HashMap;
> +use std::fs;
> +use std::io::ErrorKind;
> use std::sync::LazyLock;
> +use std::time::SystemTime;
>
> use anyhow::{bail, format_err, Error};
> use parking_lot::RwLock;
> @@ -7,6 +10,7 @@ use serde::{Deserialize, Serialize};
> use serde_json::{from_value, Value};
>
> use proxmox_sys::fs::CreateOptions;
> +use proxmox_time::epoch_i64;
>
> use pbs_api_types::Authid;
> //use crate::auth;
> @@ -24,6 +28,9 @@ static TOKEN_SECRET_CACHE: LazyLock<RwLock<ApiTokenSecretCache>> = LazyLock::new
> RwLock::new(ApiTokenSecretCache {
> secrets: HashMap::new(),
> shared_gen: 0,
> + file_mtime: None,
> + file_len: None,
> + last_checked: None,
> })
> });
>
> @@ -62,6 +69,63 @@ fn write_file(data: HashMap<Authid, String>) -> Result<(), Error> {
> proxmox_sys::fs::replace_file(CONF_FILE, &json, options, true)
> }
>
> +/// Refreshes the in-memory cache if the on-disk token.shadow file changed.
> +/// Returns true if the cache is valid to use, false if not.
> +fn refresh_cache_if_file_changed() -> bool {
> + let now = epoch_i64();
> +
> + // Best-effort refresh under write lock.
> + let Some(mut cache) = TOKEN_SECRET_CACHE.try_write() else {
> + return false;
> + };
> +
> + let Some(shared_gen_now) = token_shadow_shared_gen() else {
> + return false;
> + };
> +
> + // If another process bumped the generation, we don't know what changed -> clear cache
> + if cache.shared_gen != shared_gen_now {
> + invalidate_cache_state(&mut cache);
> + cache.shared_gen = shared_gen_now;
> + }
> +
> + // Stat the file to detect manual edits.
> + let Ok((new_mtime, new_len)) = shadow_mtime_len() else {
> + return false;
> + };
> +
> + // Initialize file stats if we have no prior state.
> + if cache.last_checked.is_none() {
> + cache.secrets.clear(); // ensure cache is empty on first load
> + cache.file_mtime = new_mtime;
> + cache.file_len = new_len;
> + cache.last_checked = Some(now);
> + return true;
this code here
> + }
> +
> + // No change detected.
> + if cache.file_mtime == new_mtime && cache.file_len == new_len {
> + cache.last_checked = Some(now);
> + return true;
> + }
> +
> + // Manual edit detected -> invalidate cache and update stat.
> + cache.secrets.clear();
> + cache.file_mtime = new_mtime;
> + cache.file_len = new_len;
> + cache.last_checked = Some(now);
and this code here are identical. if this is the first invocation, then
the change detection check above cannot be true (the cached mtime and
len will be None).
so we can drop the first if above, and replace the last line in this
hunk with
let prev_last_checked = cache.last_checked.replace(now);
and then skip bumping the generation if this is_none()
OTOH, if we just cleared the cache here, does it make sense to return
true? the cache is empty, so likely querying it *now* makes no sense?
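As a side note on the suggested one-liner: `Option::replace` takes the inner value rather than another `Option`, and hands back the previous contents, so for an `Option<i64>` field it would read `cache.last_checked.replace(now)`. A quick illustration of that return-value behaviour:

```rust
// Option::replace stores the new inner value and returns the old one,
// which is exactly what the "was this the first invocation?" check needs.
fn demo_replace(now: i64) -> (Option<i64>, Option<i64>) {
    let mut last_checked: Option<i64> = None;
    let first = last_checked.replace(now); // None on the first invocation
    let second = last_checked.replace(now + 1); // Some(now) afterwards
    (first, second)
}
```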
> +
> + // Best-effort propagation to other processes + update local view.
> + if let Some(shared_gen_new) = bump_token_shadow_shared_gen() {
> + cache.shared_gen = shared_gen_new;
> + } else {
> + // Do not fail: local cache is already safe as we cleared it above.
> + // Keep local shared_gen as-is to avoid repeated failed attempts.
> + }
> +
> + true
> +}
> +
> /// Verifies that an entry for given tokenid / API token secret exists
> pub fn verify_secret(tokenid: &Authid, secret: &str) -> Result<(), Error> {
> if !tokenid.is_token() {
> @@ -69,7 +133,7 @@ pub fn verify_secret(tokenid: &Authid, secret: &str) -> Result<(), Error> {
> }
>
> // Fast path
> - if cache_try_secret_matches(tokenid, secret) {
> + if refresh_cache_if_file_changed() && cache_try_secret_matches(tokenid, secret) {
> return Ok(());
> }
>
> @@ -109,12 +173,15 @@ fn set_secret(tokenid: &Authid, secret: &str) -> Result<(), Error> {
>
> let _guard = lock_config()?;
>
> + // Capture state before we write to detect external edits.
> + let pre_meta = shadow_mtime_len().unwrap_or((None, None));
> +
> let mut data = read_file()?;
> let hashed_secret = proxmox_sys::crypt::encrypt_pw(secret)?;
> data.insert(tokenid.clone(), hashed_secret);
> write_file(data)?;
>
> - apply_api_mutation(tokenid, Some(secret));
> + apply_api_mutation(tokenid, Some(secret), pre_meta);
>
> Ok(())
> }
> @@ -127,11 +194,14 @@ pub fn delete_secret(tokenid: &Authid) -> Result<(), Error> {
>
> let _guard = lock_config()?;
>
> + // Capture state before we write to detect external edits.
> + let pre_meta = shadow_mtime_len().unwrap_or((None, None));
> +
> let mut data = read_file()?;
> data.remove(tokenid);
> write_file(data)?;
>
> - apply_api_mutation(tokenid, None);
> + apply_api_mutation(tokenid, None, pre_meta);
>
> Ok(())
> }
> @@ -145,6 +215,12 @@ struct ApiTokenSecretCache {
> secrets: HashMap<Authid, CachedSecret>,
> /// Shared generation to detect mutations of the underlying token.shadow file.
> shared_gen: usize,
> + // shadow file mtime to detect changes
> + file_mtime: Option<SystemTime>,
> + // shadow file length to detect changes
> + file_len: Option<u64>,
> + // last time the file metadata was checked
> + last_checked: Option<i64>,
these three are always set together, so wouldn't it make more sense to
make them an Option<ShadowFileInfo> ?
> }
>
> /// Cached secret.
> @@ -204,7 +280,13 @@ fn cache_try_secret_matches(tokenid: &Authid, secret: &str) -> bool {
> eq && gen2 == cache_gen
> }
>
> -fn apply_api_mutation(tokenid: &Authid, new_secret: Option<&str>) {
> +fn apply_api_mutation(
> + tokenid: &Authid,
> + new_secret: Option<&str>,
> + pre_write_meta: (Option<SystemTime>, Option<u64>),
> +) {
> + let now = epoch_i64();
> +
> // Signal cache invalidation to other processes (best-effort).
> let new_shared_gen = bump_token_shadow_shared_gen();
>
> @@ -220,6 +302,13 @@ fn apply_api_mutation(tokenid: &Authid, new_secret: Option<&str>) {
> // Update to the post-mutation generation.
> cache.shared_gen = gen;
>
> + // If our cached file metadata does not match the on-disk state before our write,
> + // we likely missed an external/manual edit. We can no longer trust any cached secrets.
> + let (pre_mtime, pre_len) = pre_write_meta;
> + if cache.file_mtime != pre_mtime || cache.file_len != pre_len {
> + cache.secrets.clear();
> + }
> +
> // Apply the new mutation.
> match new_secret {
> Some(secret) => {
> @@ -234,6 +323,20 @@ fn apply_api_mutation(tokenid: &Authid, new_secret: Option<&str>) {
> cache.secrets.remove(tokenid);
> }
> }
> +
> + // Update our view of the file metadata to the post-write state (best-effort).
> + // (If this fails, drop local cache so callers fall back to slow path until refreshed.)
> + match shadow_mtime_len() {
> + Ok((mtime, len)) => {
> + cache.file_mtime = mtime;
> + cache.file_len = len;
> + cache.last_checked = Some(now);
> + }
> + Err(_) => {
> + // If we cannot validate state, do not trust cache.
> + invalidate_cache_state(&mut cache);
> + }
> + }
> }
>
> /// Get the current shared generation.
> @@ -253,4 +356,15 @@ fn bump_token_shadow_shared_gen() -> Option<usize> {
> /// Invalidates the cache state and only keeps the shared generation.
> fn invalidate_cache_state(cache: &mut ApiTokenSecretCache) {
> cache.secrets.clear();
> + cache.file_mtime = None;
> + cache.file_len = None;
> + cache.last_checked = None;
> +}
> +
> +fn shadow_mtime_len() -> Result<(Option<SystemTime>, Option<u64>), Error> {
> + match fs::metadata(CONF_FILE) {
> + Ok(meta) => Ok((meta.modified().ok(), Some(meta.len()))),
> + Err(e) if e.kind() == ErrorKind::NotFound => Ok((None, None)),
> + Err(e) => Err(e.into()),
> + }
> }
> --
> 2.47.3
>
>
>
> _______________________________________________
> pbs-devel mailing list
> pbs-devel@lists.proxmox.com
> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
>
>
>
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
^ permalink raw reply [relevance 5%]
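The manual-edit detection discussed above rests on treating `(mtime, len)` as a cheap change fingerprint, with a missing file being a valid state (no tokens yet) rather than an error. A standalone sketch of that stat helper (the name `file_fingerprint` is illustrative; the patch calls it `shadow_mtime_len` and returns the pair of `Option`s directly):

```rust
use std::fs;
use std::io::{self, ErrorKind};
use std::time::SystemTime;

// Returns Ok(None) if the file does not exist, Ok(Some((mtime, len)))
// otherwise, and propagates any other stat error to the caller.
fn file_fingerprint(path: &str) -> io::Result<Option<(Option<SystemTime>, u64)>> {
    match fs::metadata(path) {
        Ok(meta) => Ok(Some((meta.modified().ok(), meta.len()))),
        Err(e) if e.kind() == ErrorKind::NotFound => Ok(None),
        Err(e) => Err(e),
    }
}
```

Including the length alongside the mtime matters because mtime granularity is filesystem-dependent (it can be as coarse as one second), so two quick successive writes may share an mtime while still differing in length.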
* Re: [pbs-devel] [PATCH proxmox-backup v3 2/4] pbs-config: cache verified API token secrets
2026-01-02 16:07 12% ` [pbs-devel] [PATCH proxmox-backup v3 2/4] pbs-config: cache verified API token secrets Samuel Rufinatscha
@ 2026-01-14 10:44 5% ` Fabian Grünbichler
2026-01-16 15:13 6% ` Samuel Rufinatscha
0 siblings, 1 reply; 200+ results
From: Fabian Grünbichler @ 2026-01-14 10:44 UTC (permalink / raw)
To: Proxmox Backup Server development discussion
On January 2, 2026 5:07 pm, Samuel Rufinatscha wrote:
> Currently, every token-based API request reads the token.shadow file and
> runs the expensive password hash verification for the given token
> secret. This shows up as a hotspot in /status profiling (see
> bug #7017 [1]).
>
> This patch introduces an in-memory cache of successfully verified token
> secrets. Subsequent requests for the same token+secret combination only
> perform a comparison using openssl::memcmp::eq and avoid re-running the
> password hash. The cache is updated when a token secret is set and
> cleared when a token is deleted. Note, this does NOT include manual
> config changes, which will be covered in a subsequent patch.
>
> This patch is part of the series which fixes bug #7017 [1].
>
> [1] https://bugzilla.proxmox.com/show_bug.cgi?id=7017
>
> Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
> ---
> Changes from v1 to v2:
>
> * Replace OnceCell with LazyLock, and std::sync::RwLock with
> parking_lot::RwLock.
> * Add API_MUTATION_GENERATION and guard cache inserts
> to prevent “zombie inserts” across concurrent set/delete.
> * Refactor cache operations into cache_try_secret_matches,
> cache_try_insert_secret, and centralize write-side behavior in
> apply_api_mutation.
> * Switch fast-path cache access to try_read/try_write (best-effort).
>
> Changes from v2 to v3:
>
> * Replaced process-local cache invalidation (AtomicU64
> API_MUTATION_GENERATION) with a cross-process shared generation via
> ConfigVersionCache.
> * Validate shared generation before/after the constant-time secret
> compare; only insert into cache if the generation is unchanged.
> * invalidate_cache_state() on insert if shared generation changed.
>
> Cargo.toml | 1 +
> pbs-config/Cargo.toml | 1 +
> pbs-config/src/token_shadow.rs | 157 ++++++++++++++++++++++++++++++++-
> 3 files changed, 158 insertions(+), 1 deletion(-)
>
> diff --git a/Cargo.toml b/Cargo.toml
> index 1aa57ae5..821b63b7 100644
> --- a/Cargo.toml
> +++ b/Cargo.toml
> @@ -143,6 +143,7 @@ nom = "7"
> num-traits = "0.2"
> once_cell = "1.3.1"
> openssl = "0.10.40"
> +parking_lot = "0.12"
> percent-encoding = "2.1"
> pin-project-lite = "0.2"
> regex = "1.5.5"
> diff --git a/pbs-config/Cargo.toml b/pbs-config/Cargo.toml
> index 74afb3c6..eb81ce00 100644
> --- a/pbs-config/Cargo.toml
> +++ b/pbs-config/Cargo.toml
> @@ -13,6 +13,7 @@ libc.workspace = true
> nix.workspace = true
> once_cell.workspace = true
> openssl.workspace = true
> +parking_lot.workspace = true
> regex.workspace = true
> serde.workspace = true
> serde_json.workspace = true
> diff --git a/pbs-config/src/token_shadow.rs b/pbs-config/src/token_shadow.rs
> index 640fabbf..fa84aee5 100644
> --- a/pbs-config/src/token_shadow.rs
> +++ b/pbs-config/src/token_shadow.rs
> @@ -1,6 +1,8 @@
> use std::collections::HashMap;
> +use std::sync::LazyLock;
>
> use anyhow::{bail, format_err, Error};
> +use parking_lot::RwLock;
> use serde::{Deserialize, Serialize};
> use serde_json::{from_value, Value};
>
> @@ -13,6 +15,18 @@ use crate::{open_backup_lockfile, BackupLockGuard};
> const LOCK_FILE: &str = pbs_buildcfg::configdir!("/token.shadow.lock");
> const CONF_FILE: &str = pbs_buildcfg::configdir!("/token.shadow");
>
> +/// Global in-memory cache for successfully verified API token secrets.
> +/// The cache stores plain text secrets for token Authids that have already been
> +/// verified against the hashed values in `token.shadow`. This allows for cheap
> +/// subsequent authentications for the same token+secret combination, avoiding
> +/// recomputing the password hash on every request.
> +static TOKEN_SECRET_CACHE: LazyLock<RwLock<ApiTokenSecretCache>> = LazyLock::new(|| {
> + RwLock::new(ApiTokenSecretCache {
> + secrets: HashMap::new(),
> + shared_gen: 0,
> + })
> +});
> +
> #[derive(Serialize, Deserialize)]
> #[serde(rename_all = "kebab-case")]
> /// ApiToken id / secret pair
> @@ -54,9 +68,27 @@ pub fn verify_secret(tokenid: &Authid, secret: &str) -> Result<(), Error> {
> bail!("not an API token ID");
> }
>
> + // Fast path
> + if cache_try_secret_matches(tokenid, secret) {
> + return Ok(());
> + }
> +
> + // Slow path
> + // First, capture the shared generation before doing the hash verification.
> + let gen_before = token_shadow_shared_gen();
> +
> let data = read_file()?;
> match data.get(tokenid) {
> - Some(hashed_secret) => proxmox_sys::crypt::verify_crypt_pw(secret, hashed_secret),
> + Some(hashed_secret) => {
> + proxmox_sys::crypt::verify_crypt_pw(secret, hashed_secret)?;
> +
> + // Try to cache only if nothing changed while verifying the secret.
> + if let Some(gen) = gen_before {
> + cache_try_insert_secret(tokenid.clone(), secret.to_owned(), gen);
> + }
> +
> + Ok(())
> + }
> None => bail!("invalid API token"),
> }
> }
> @@ -82,6 +114,8 @@ fn set_secret(tokenid: &Authid, secret: &str) -> Result<(), Error> {
> data.insert(tokenid.clone(), hashed_secret);
> write_file(data)?;
>
> + apply_api_mutation(tokenid, Some(secret));
> +
> Ok(())
> }
>
> @@ -97,5 +131,126 @@ pub fn delete_secret(tokenid: &Authid) -> Result<(), Error> {
> data.remove(tokenid);
> write_file(data)?;
>
> + apply_api_mutation(tokenid, None);
> +
> Ok(())
> }
> +
> +struct ApiTokenSecretCache {
> + /// Keys are token Authids, values are the corresponding plain text secrets.
> + /// Entries are added after a successful on-disk verification in
> + /// `verify_secret` or when a new token secret is generated by
> + /// `generate_and_set_secret`. Used to avoid repeated
> + /// password-hash computation on subsequent authentications.
> + secrets: HashMap<Authid, CachedSecret>,
> + /// Shared generation to detect mutations of the underlying token.shadow file.
> + shared_gen: usize,
> +}
> +
> +/// Cached secret.
> +struct CachedSecret {
> + secret: String,
> +}
> +
> +fn cache_try_insert_secret(tokenid: Authid, secret: String, shared_gen_before: usize) {
> + let Some(mut cache) = TOKEN_SECRET_CACHE.try_write() else {
> + return;
> + };
> +
> + let Some(shared_gen_now) = token_shadow_shared_gen() else {
> + return;
> + };
> +
> + // If this process missed a generation bump, its cache is stale.
> + if cache.shared_gen != shared_gen_now {
> + invalidate_cache_state(&mut cache);
> + cache.shared_gen = shared_gen_now;
> + }
> +
> + // If a mutation happened while we were verifying the secret, do not insert.
> + if shared_gen_now == shared_gen_before {
> + cache.secrets.insert(tokenid, CachedSecret { secret });
> + }
> +}
> +
> +// Tries to match the given token secret against the cached secret.
> +// Checks the generation before and after the constant-time compare to avoid a
> +// TOCTOU window. If another process rotates/deletes a token while we're validating
> +// the cached secret, the generation will change, and we
> +// must not trust the cache for this request.
> +fn cache_try_secret_matches(tokenid: &Authid, secret: &str) -> bool {
> + let Some(cache) = TOKEN_SECRET_CACHE.try_read() else {
> + return false;
> + };
> + let Some(entry) = cache.secrets.get(tokenid) else {
> + return false;
> + };
> +
> + let cache_gen = cache.shared_gen;
> +
> + let Some(gen1) = token_shadow_shared_gen() else {
> + return false;
> + };
> + if gen1 != cache_gen {
> + return false;
> + }
> +
> + let eq = openssl::memcmp::eq(entry.secret.as_bytes(), secret.as_bytes());
should we invalidate the cache entry for this particular authid on a
mismatch, to avoid making brute-forcing too easy/cheap?
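Rough standalone sketch of what I mean (hypothetical shape, names only loosely follow the patch; `constant_time_eq` is a stand-in for `openssl::memcmp::eq`, and the cache is reduced to a plain map):

```rust
use std::collections::HashMap;

// Sketch only: the real cache maps Authid -> CachedSecret and carries a
// shared generation; reduced here to HashMap<String, String>.
fn secret_matches(cache: &mut HashMap<String, String>, tokenid: &str, secret: &str) -> bool {
    let Some(cached) = cache.get(tokenid) else {
        return false;
    };
    let eq = constant_time_eq(cached.as_bytes(), secret.as_bytes());
    if !eq {
        // on a mismatch, evict the entry so the next attempt for this
        // authid has to go through the expensive on-disk verification
        cache.remove(tokenid);
    }
    eq
}

// naive stand-in for openssl::memcmp::eq, for the sketch only
fn constant_time_eq(a: &[u8], b: &[u8]) -> bool {
    if a.len() != b.len() {
        return false;
    }
    a.iter().zip(b).fold(0u8, |acc, (x, y)| acc | (x ^ y)) == 0
}

fn main() {
    let mut cache = HashMap::new();
    cache.insert("user@pam!mytoken".to_string(), "s3cret".to_string());
    assert!(secret_matches(&mut cache, "user@pam!mytoken", "s3cret"));
    assert!(!secret_matches(&mut cache, "user@pam!mytoken", "wrong"));
    // the mismatch above evicted the entry
    assert!(!cache.contains_key("user@pam!mytoken"));
    println!("ok");
}
```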
> + let Some(gen2) = token_shadow_shared_gen() else {
> + return false;
> + };
> +
> + eq && gen2 == cache_gen
> +}
> +
> +fn apply_api_mutation(tokenid: &Authid, new_secret: Option<&str>) {
> + // Signal cache invalidation to other processes (best-effort).
> + let new_shared_gen = bump_token_shadow_shared_gen();
> +
> + let mut cache = TOKEN_SECRET_CACHE.write();
> +
> + // If we cannot read/bump the shared generation, we cannot safely trust the cache.
> + let Some(gen) = new_shared_gen else {
> + invalidate_cache_state(&mut cache);
> + cache.shared_gen = 0;
> + return;
> + };
> +
> + // Update to the post-mutation generation.
> + cache.shared_gen = gen;
> +
> + // Apply the new mutation.
> + match new_secret {
> + Some(secret) => {
> + cache.secrets.insert(
> + tokenid.clone(),
> + CachedSecret {
> + secret: secret.to_owned(),
> + },
> + );
> + }
> + None => {
> + cache.secrets.remove(tokenid);
> + }
> + }
> +}
> +
> +/// Get the current shared generation.
> +fn token_shadow_shared_gen() -> Option<usize> {
> + crate::ConfigVersionCache::new()
> + .ok()
> + .map(|cvc| cvc.token_shadow_generation())
> +}
> +
> +/// Bump and return the new shared generation.
> +fn bump_token_shadow_shared_gen() -> Option<usize> {
> + crate::ConfigVersionCache::new()
> + .ok()
> + .map(|cvc| cvc.increase_token_shadow_generation() + 1)
> +}
> +
> +/// Invalidates the cache state and only keeps the shared generation.
both call sites actually set the cached generation to some value right
after calling this, so maybe the helper should take a generation
directly and set it itself?
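i.e. something like this (standalone sketch, with the types reduced to what the sketch needs):

```rust
use std::collections::HashMap;

// Reduced stand-in for the cache struct from the patch.
struct ApiTokenSecretCache {
    secrets: HashMap<String, String>,
    shared_gen: usize,
}

// Suggested shape: fold the generation update into the helper, so that
// "clear secrets + set generation" stays one step at every call site.
fn invalidate_cache_state(cache: &mut ApiTokenSecretCache, new_gen: usize) {
    cache.secrets.clear();
    cache.shared_gen = new_gen;
}

fn main() {
    let mut cache = ApiTokenSecretCache {
        secrets: HashMap::from([("t".to_string(), "s".to_string())]),
        shared_gen: 1,
    };
    // a caller that previously cleared and then assigned `shared_gen = 0`
    invalidate_cache_state(&mut cache, 0);
    assert!(cache.secrets.is_empty());
    assert_eq!(cache.shared_gen, 0);
    println!("ok");
}
```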
> +fn invalidate_cache_state(cache: &mut ApiTokenSecretCache) {
> + cache.secrets.clear();
> +}
> --
> 2.47.3
>
>
>
> _______________________________________________
> pbs-devel mailing list
> pbs-devel@lists.proxmox.com
> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
>
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
* Re: [pbs-devel] [PATCH proxmox v5 2/4] acme: introduce http_status module
2026-01-13 13:45 5% ` Fabian Grünbichler
@ 2026-01-14 10:29 6% ` Samuel Rufinatscha
2026-01-15 9:25 5% ` Fabian Grünbichler
0 siblings, 1 reply; 200+ results
From: Samuel Rufinatscha @ 2026-01-14 10:29 UTC (permalink / raw)
To: Proxmox Backup Server development discussion, Fabian Grünbichler
On 1/13/26 2:45 PM, Fabian Grünbichler wrote:
> On January 8, 2026 12:26 pm, Samuel Rufinatscha wrote:
>> Introduce an internal http_status module with the common ACME HTTP
>> response codes, and replace use of crate::request::CREATED as well as
>> direct numeric status code usages.
>
> why not use http::status? we already have this as a dependency pretty
> much everywhere we do anything HTTP related.. would also allow for nicer
> error messages in case the status is not as expected..
>
http is only pulled in via the optional client / async-client features,
not the base impl feature. This code here is gated by impl, where http
might not be available. Adding http as a hard dependency just for a few
status code constants feels a bit like overkill.
This matches what we discussed in a previous review round:
https://lore.proxmox.com/pbs-devel/2b7574fb-a3c5-4119-8fb6-9649881dba15@proxmox.com/
Also, since this is pub(crate) API, I think we can easily switch to
StatusCode later if http ever becomes a necessary dependency for impl.
OK with you?
>>
>> Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
>> ---
>> proxmox-acme/src/account.rs | 8 ++++----
>> proxmox-acme/src/async_client.rs | 4 ++--
>> proxmox-acme/src/lib.rs | 2 ++
>> proxmox-acme/src/request.rs | 11 ++++++++++-
>> 4 files changed, 18 insertions(+), 7 deletions(-)
>>
>> diff --git a/proxmox-acme/src/account.rs b/proxmox-acme/src/account.rs
>> index d8eb3e73..ea1a3c60 100644
>> --- a/proxmox-acme/src/account.rs
>> +++ b/proxmox-acme/src/account.rs
>> @@ -84,7 +84,7 @@ impl Account {
>> method: "POST",
>> content_type: crate::request::JSON_CONTENT_TYPE,
>> body,
>> - expected: crate::request::CREATED,
>> + expected: crate::http_status::CREATED,
>> };
>>
>> Ok(NewOrder::new(request))
>> @@ -106,7 +106,7 @@ impl Account {
>> method: "POST",
>> content_type: crate::request::JSON_CONTENT_TYPE,
>> body,
>> - expected: 200,
>> + expected: crate::http_status::OK,
>> })
>> }
>>
>> @@ -131,7 +131,7 @@ impl Account {
>> method: "POST",
>> content_type: crate::request::JSON_CONTENT_TYPE,
>> body,
>> - expected: 200,
>> + expected: crate::http_status::OK,
>> })
>> }
>>
>> @@ -321,7 +321,7 @@ impl AccountCreator {
>> method: "POST",
>> content_type: crate::request::JSON_CONTENT_TYPE,
>> body,
>> - expected: crate::request::CREATED,
>> + expected: crate::http_status::CREATED,
>> })
>> }
>>
>> diff --git a/proxmox-acme/src/async_client.rs b/proxmox-acme/src/async_client.rs
>> index 2ff3ba22..043648bb 100644
>> --- a/proxmox-acme/src/async_client.rs
>> +++ b/proxmox-acme/src/async_client.rs
>> @@ -498,7 +498,7 @@ impl AcmeClient {
>> method: "GET",
>> content_type: "",
>> body: String::new(),
>> - expected: 200,
>> + expected: crate::http_status::OK,
>> },
>> nonce,
>> )
>> @@ -550,7 +550,7 @@ impl AcmeClient {
>> method: "HEAD",
>> content_type: "",
>> body: String::new(),
>> - expected: 200,
>> + expected: crate::http_status::OK,
>> },
>> nonce,
>> )
>> diff --git a/proxmox-acme/src/lib.rs b/proxmox-acme/src/lib.rs
>> index 6722030c..6051a025 100644
>> --- a/proxmox-acme/src/lib.rs
>> +++ b/proxmox-acme/src/lib.rs
>> @@ -70,6 +70,8 @@ pub use order::Order;
>> #[cfg(feature = "impl")]
>> pub use order::NewOrder;
>> #[cfg(feature = "impl")]
>> +pub(crate) use request::http_status;
>> +#[cfg(feature = "impl")]
>> pub use request::ErrorResponse;
>>
>> /// Header name for nonces.
>> diff --git a/proxmox-acme/src/request.rs b/proxmox-acme/src/request.rs
>> index dadfc5af..341ce53e 100644
>> --- a/proxmox-acme/src/request.rs
>> +++ b/proxmox-acme/src/request.rs
>> @@ -1,7 +1,6 @@
>> use serde::Deserialize;
>>
>> pub(crate) const JSON_CONTENT_TYPE: &str = "application/jose+json";
>> -pub(crate) const CREATED: u16 = 201;
>>
>> /// A request which should be performed on the ACME provider.
>> pub(crate) struct Request {
>> @@ -21,6 +20,16 @@ pub(crate) struct Request {
>> pub(crate) expected: u16,
>> }
>>
>> +/// Common HTTP status codes used in ACME responses.
>> +pub(crate) mod http_status {
>> + /// 200 OK
>> + pub(crate) const OK: u16 = 200;
>> + /// 201 Created
>> + pub(crate) const CREATED: u16 = 201;
>> + /// 204 No Content
>> + pub(crate) const NO_CONTENT: u16 = 204;
>> +}
>> +
>> /// An ACME error response contains a specially formatted type string, and can optionally
>> /// contain textual details and a set of sub problems.
>> #[derive(Clone, Debug, Deserialize)]
>> --
>> 2.47.3
>>
>>
>>
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
* Re: [pbs-devel] [PATCH proxmox-backup v5 3/5] acme: drop local AcmeClient
2026-01-14 8:56 6% ` Samuel Rufinatscha
@ 2026-01-14 9:58 5% ` Fabian Grünbichler
2026-01-14 10:52 6% ` Samuel Rufinatscha
0 siblings, 1 reply; 200+ results
From: Fabian Grünbichler @ 2026-01-14 9:58 UTC (permalink / raw)
To: Proxmox Backup Server development discussion, Samuel Rufinatscha
On January 14, 2026 9:56 am, Samuel Rufinatscha wrote:
> On 1/13/26 2:44 PM, Fabian Grünbichler wrote:
>> On January 8, 2026 12:26 pm, Samuel Rufinatscha wrote:
>>> PBS currently uses its own ACME client and API logic, while PDM uses the
>>> factored out proxmox-acme and proxmox-acme-api crates. This duplication
>>> risks differences in behaviour and requires ACME maintenance in two
>>> places. This patch is part of a series to move PBS over to the shared
>>> ACME stack.
>>>
>>> Changes:
>>> - Remove the local src/acme/client.rs and switch to
>>> proxmox_acme::async_client::AcmeClient where needed.
>>> - Use proxmox_acme_api::load_client_with_account to the custom
>>> AcmeClient::load() function
>>> - Replace the local do_register() logic with
>>> proxmox_acme_api::register_account, to further ensure accounts are persisted
>>> - Replace the local AcmeAccountName type, required for
>>> proxmox_acme_api::register_account
>>>
>>> Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
>>> ---
>>> src/acme/client.rs | 691 -------------------------
>>> src/acme/mod.rs | 3 -
>>> src/acme/plugin.rs | 2 +-
>>> src/api2/config/acme.rs | 50 +-
>>> src/api2/node/certificates.rs | 2 +-
>>> src/api2/types/acme.rs | 8 -
>>> src/bin/proxmox_backup_manager/acme.rs | 17 +-
>>> src/config/acme/mod.rs | 8 +-
>>> src/config/node.rs | 9 +-
>>> 9 files changed, 36 insertions(+), 754 deletions(-)
>>> delete mode 100644 src/acme/client.rs
>>>
>>
>> [..]
>>
>>> diff --git a/src/config/acme/mod.rs b/src/config/acme/mod.rs
>>> index ac89ae5e..e4639c53 100644
>>> --- a/src/config/acme/mod.rs
>>> +++ b/src/config/acme/mod.rs
>>
>> I think this whole file should probably be replaced entirely by
>> proxmox-acme-api , which - AFAICT - would just require adding the
>> completion helpers there?
>>
>
> Good point, yes I think moving the completion helpers would
> allow us to get rid of this file. PDM does not make use of
> them / there is atm no 1:1 code in proxmox/ for these helpers.
only because https://bugzilla.proxmox.com/show_bug.cgi?id=7179 is not
yet implemented ;) so please coordinate with Shan to avoid doing the
work twice.
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
* [pbs-devel] applied-series: [PATCH proxmox-backup v6 0/4] datastore: remove config reload on hot path
2026-01-05 14:16 12% [pbs-devel] [PATCH proxmox-backup v6 0/4] datastore: remove config reload on hot path Samuel Rufinatscha
` (3 preceding siblings ...)
2026-01-05 14:16 13% ` [pbs-devel] [PATCH proxmox-backup v6 4/4] partial fix #6049: datastore: add TTL fallback to catch manual config edits Samuel Rufinatscha
@ 2026-01-14 9:54 5% ` Fabian Grünbichler
4 siblings, 0 replies; 200+ results
From: Fabian Grünbichler @ 2026-01-14 9:54 UTC (permalink / raw)
To: pbs-devel, Samuel Rufinatscha
On Mon, 05 Jan 2026 15:16:10 +0100, Samuel Rufinatscha wrote:
> this series reduces CPU time in datastore lookups by avoiding repeated
> datastore.cfg reads/parses in both `lookup_datastore()` and
> `DataStore::Drop`. It also adds a TTL so manual config edits are
> noticed without reintroducing hashing on every request.
>
> While investigating #6049 [1], cargo-flamegraph [2] showed hotspots
> during repeated `/status` calls in `lookup_datastore()` and in `Drop`,
> dominated by `pbs_config::datastore::config()` (config parse).
>
> [...]
Applied with some rewording of commit messages to make them less
verbose/boiler-platey, thanks!
[1/4] config: enable config version cache for datastore
commit: d14b7469a72f4265bcc1727a1274b207bc201be0
[2/4] partial fix #6049: datastore: impl ConfigVersionCache fast path for lookups
commit: be6d251e4483474754cdc5f6d12f2674e22fa132
[3/4] partial fix #6049: datastore: use config fast-path in Drop
commit: 584fa961909c32565046c39f95485273c0a8cba5
[4/4] partial fix #6049: datastore: add TTL fallback to catch manual config edits
commit: 07ab13e5aaf1d6b790234d5238c1c3668c56c22e
Best regards,
--
Fabian Grünbichler <f.gruenbichler@proxmox.com>
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
* Re: [pbs-devel] [PATCH proxmox-backup v5 3/5] acme: drop local AcmeClient
2026-01-13 13:45 5% ` Fabian Grünbichler
@ 2026-01-14 8:56 6% ` Samuel Rufinatscha
2026-01-14 9:58 5% ` Fabian Grünbichler
0 siblings, 1 reply; 200+ results
From: Samuel Rufinatscha @ 2026-01-14 8:56 UTC (permalink / raw)
To: Proxmox Backup Server development discussion, Fabian Grünbichler
On 1/13/26 2:44 PM, Fabian Grünbichler wrote:
> On January 8, 2026 12:26 pm, Samuel Rufinatscha wrote:
>> PBS currently uses its own ACME client and API logic, while PDM uses the
>> factored out proxmox-acme and proxmox-acme-api crates. This duplication
>> risks differences in behaviour and requires ACME maintenance in two
>> places. This patch is part of a series to move PBS over to the shared
>> ACME stack.
>>
>> Changes:
>> - Remove the local src/acme/client.rs and switch to
>> proxmox_acme::async_client::AcmeClient where needed.
>> - Use proxmox_acme_api::load_client_with_account to the custom
>> AcmeClient::load() function
>> - Replace the local do_register() logic with
>> proxmox_acme_api::register_account, to further ensure accounts are persisted
>> - Replace the local AcmeAccountName type, required for
>> proxmox_acme_api::register_account
>>
>> Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
>> ---
>> src/acme/client.rs | 691 -------------------------
>> src/acme/mod.rs | 3 -
>> src/acme/plugin.rs | 2 +-
>> src/api2/config/acme.rs | 50 +-
>> src/api2/node/certificates.rs | 2 +-
>> src/api2/types/acme.rs | 8 -
>> src/bin/proxmox_backup_manager/acme.rs | 17 +-
>> src/config/acme/mod.rs | 8 +-
>> src/config/node.rs | 9 +-
>> 9 files changed, 36 insertions(+), 754 deletions(-)
>> delete mode 100644 src/acme/client.rs
>>
>
> [..]
>
>> diff --git a/src/config/acme/mod.rs b/src/config/acme/mod.rs
>> index ac89ae5e..e4639c53 100644
>> --- a/src/config/acme/mod.rs
>> +++ b/src/config/acme/mod.rs
>
> I think this whole file should probably be replaced entirely by
> proxmox-acme-api , which - AFAICT - would just require adding the
> completion helpers there?
>
Good point, yes, I think moving the completion helpers would
allow us to get rid of this file. PDM does not make use of
them, and there is currently no 1:1 code in proxmox/ for these helpers.
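For reference, the helpers in question boil down to roughly this (hypothetical standalone sketch, not the actual PBS code): enumerate the account files and return their names for CLI completion.

```rust
use std::fs;
use std::path::Path;

// Hypothetical sketch of an ACME account completion helper: list the
// entries of the accounts directory. Errors are swallowed on purpose,
// since shell completion should never fail loudly.
fn complete_acme_account(acme_account_dir: &Path) -> Vec<String> {
    let mut names = Vec::new();
    if let Ok(entries) = fs::read_dir(acme_account_dir) {
        for entry in entries.flatten() {
            if let Ok(name) = entry.file_name().into_string() {
                names.push(name);
            }
        }
    }
    names
}

fn main() {
    // exercise the sketch against a throwaway directory
    let dir = std::env::temp_dir().join("acme-accounts-sketch");
    fs::create_dir_all(&dir).unwrap();
    fs::write(dir.join("default"), b"{}").unwrap();
    assert!(complete_acme_account(&dir).contains(&"default".to_string()));
    println!("ok");
}
```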
>> @@ -6,10 +6,11 @@ use anyhow::{bail, format_err, Error};
>> use serde_json::Value;
>>
>> use pbs_api_types::PROXMOX_SAFE_ID_REGEX;
>> +use proxmox_acme_api::AcmeAccountName;
>> use proxmox_sys::error::SysError;
>> use proxmox_sys::fs::{file_read_string, CreateOptions};
>>
>> -use crate::api2::types::{AcmeAccountName, AcmeChallengeSchema, KnownAcmeDirectory};
>> +use crate::api2::types::{AcmeChallengeSchema, KnownAcmeDirectory};
>>
>> pub(crate) const ACME_DIR: &str = pbs_buildcfg::configdir!("/acme");
>> pub(crate) const ACME_ACCOUNT_DIR: &str = pbs_buildcfg::configdir!("/acme/accounts");
>> @@ -34,11 +35,6 @@ pub(crate) fn make_acme_dir() -> Result<(), Error> {
>> create_acme_subdir(ACME_DIR)
>> }
>>
>> -pub(crate) fn make_acme_account_dir() -> Result<(), Error> {
>> - make_acme_dir()?;
>> - create_acme_subdir(ACME_ACCOUNT_DIR)
>> -}
>> -
>> pub const KNOWN_ACME_DIRECTORIES: &[KnownAcmeDirectory] = &[
>> KnownAcmeDirectory {
>> name: "Let's Encrypt V2",
>> diff --git a/src/config/node.rs b/src/config/node.rs
>> index 253b2e36..e4b66a20 100644
>> --- a/src/config/node.rs
>> +++ b/src/config/node.rs
>> @@ -8,16 +8,15 @@ use pbs_api_types::{
>> EMAIL_SCHEMA, MULTI_LINE_COMMENT_SCHEMA, OPENSSL_CIPHERS_TLS_1_2_SCHEMA,
>> OPENSSL_CIPHERS_TLS_1_3_SCHEMA,
>> };
>> +use proxmox_acme::async_client::AcmeClient;
>> +use proxmox_acme_api::AcmeAccountName;
>> use proxmox_http::ProxyConfig;
>> use proxmox_schema::{api, ApiStringFormat, ApiType, Updater};
>>
>> use pbs_buildcfg::configdir;
>> use pbs_config::{open_backup_lockfile, BackupLockGuard};
>>
>> -use crate::acme::AcmeClient;
>> -use crate::api2::types::{
>> - AcmeAccountName, AcmeDomain, ACME_DOMAIN_PROPERTY_SCHEMA, HTTP_PROXY_SCHEMA,
>> -};
>> +use crate::api2::types::{AcmeDomain, ACME_DOMAIN_PROPERTY_SCHEMA, HTTP_PROXY_SCHEMA};
>>
>> const CONF_FILE: &str = configdir!("/node.cfg");
>> const LOCK_FILE: &str = configdir!("/.node.lck");
>> @@ -247,7 +246,7 @@ impl NodeConfig {
>> } else {
>> AcmeAccountName::from_string("default".to_string())? // should really not happen
>> };
>> - AcmeClient::load(&account).await
>> + proxmox_acme_api::load_client_with_account(&account).await
>> }
>>
>> pub fn acme_domains(&'_ self) -> AcmeDomainIter<'_> {
>> --
>> 2.47.3
>>
>>
>>
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
* Re: [pbs-devel] [PATCH proxmox v5 4/4] acme-api: add helper to load client for an account
2026-01-13 13:45 5% ` Fabian Grünbichler
@ 2026-01-13 16:57 6% ` Samuel Rufinatscha
0 siblings, 0 replies; 200+ results
From: Samuel Rufinatscha @ 2026-01-13 16:57 UTC (permalink / raw)
To: Proxmox Backup Server development discussion, Fabian Grünbichler
On 1/13/26 2:44 PM, Fabian Grünbichler wrote:
> On January 8, 2026 12:26 pm, Samuel Rufinatscha wrote:
>> The PBS ACME refactoring needs a simple way to obtain an AcmeClient for
>> a given configured account without duplicating config wiring. This patch
>> adds a load_client_with_account helper in proxmox-acme-api that loads
>> the account and constructs a matching client, similarly as PBS previous
>> own AcmeClient::load() function.
>>
>> Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
>> ---
>> proxmox-acme-api/src/account_api_impl.rs | 5 +++++
>> proxmox-acme-api/src/lib.rs | 3 ++-
>> 2 files changed, 7 insertions(+), 1 deletion(-)
>>
>> diff --git a/proxmox-acme-api/src/account_api_impl.rs b/proxmox-acme-api/src/account_api_impl.rs
>> index ef195908..ca8c8655 100644
>> --- a/proxmox-acme-api/src/account_api_impl.rs
>> +++ b/proxmox-acme-api/src/account_api_impl.rs
>> @@ -116,3 +116,8 @@ pub async fn update_account(name: &AcmeAccountName, contact: Option<String>) ->
>>
>> Ok(())
>> }
>> +
>> +pub async fn load_client_with_account(account_name: &AcmeAccountName) -> Result<AcmeClient, Error> {
>> + let account_data = super::account_config::load_account_config(&account_name).await?;
>> + Ok(account_data.client())
>> +}
>
> I don't think this is needed - there is only a single callsite in PBS
> and that is itself dead code that can be removed..
>
Will check, thanks!
>> diff --git a/proxmox-acme-api/src/lib.rs b/proxmox-acme-api/src/lib.rs
>> index 623e9e23..96f88ae2 100644
>> --- a/proxmox-acme-api/src/lib.rs
>> +++ b/proxmox-acme-api/src/lib.rs
>> @@ -31,7 +31,8 @@ mod plugin_config;
>> mod account_api_impl;
>> #[cfg(feature = "impl")]
>> pub use account_api_impl::{
>> - deactivate_account, get_account, get_tos, list_accounts, register_account, update_account,
>> + deactivate_account, get_account, get_tos, list_accounts, load_client_with_account,
>> + register_account, update_account,
>> };
>>
>> #[cfg(feature = "impl")]
>> --
>> 2.47.3
>>
>>
>>
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
* Re: [pbs-devel] [PATCH proxmox-backup v5 4/5] acme: change API impls to use proxmox-acme-api handlers
2026-01-13 13:45 5% ` Fabian Grünbichler
@ 2026-01-13 16:53 6% ` Samuel Rufinatscha
0 siblings, 0 replies; 200+ results
From: Samuel Rufinatscha @ 2026-01-13 16:53 UTC (permalink / raw)
To: Proxmox Backup Server development discussion, Fabian Grünbichler
On 1/13/26 2:45 PM, Fabian Grünbichler wrote:
> On January 8, 2026 12:26 pm, Samuel Rufinatscha wrote:
>> PBS currently uses its own ACME client and API logic, while PDM uses the
>> factored out proxmox-acme and proxmox-acme-api crates. This duplication
>> risks differences in behaviour and requires ACME maintenance in two
>> places. This patch is part of a series to move PBS over to the shared
>> ACME stack.
>>
>> Changes:
>> - Replace api2/config/acme.rs API logic with proxmox-acme-api handlers.
>> - Drop local caching and helper types that duplicate proxmox-acme-api.
>>
>> Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
>> ---
>> src/api2/config/acme.rs | 378 ++-----------------------
>> src/api2/types/acme.rs | 16 --
>> src/bin/proxmox_backup_manager/acme.rs | 6 +-
>> src/config/acme/mod.rs | 44 +--
>> 4 files changed, 33 insertions(+), 411 deletions(-)
>>
>> diff --git a/src/api2/config/acme.rs b/src/api2/config/acme.rs
>> index 898f06dd..3314430c 100644
>> --- a/src/api2/config/acme.rs
>> +++ b/src/api2/config/acme.rs
>> @@ -1,29 +1,18 @@
>> -use std::fs;
>> -use std::ops::ControlFlow;
>> -use std::path::Path;
>
> nit: this one is actually still used below
Ah, I see :) Good find!
>
>> -use std::sync::{Arc, LazyLock, Mutex};
>> -use std::time::SystemTime;
>> -
>> -use anyhow::{bail, format_err, Error};
>> -use hex::FromHex;
>> -use serde::{Deserialize, Serialize};
>> -use serde_json::{json, Value};
>> -use tracing::{info, warn};
>> +use anyhow::Error;
>> +use tracing::info;
>>
>> use pbs_api_types::{Authid, PRIV_SYS_MODIFY};
>> -use proxmox_acme::async_client::AcmeClient;
>> -use proxmox_acme::types::AccountData as AcmeAccountData;
>> -use proxmox_acme_api::AcmeAccountName;
>> +use proxmox_acme_api::{
>> + AccountEntry, AccountInfo, AcmeAccountName, AcmeChallengeSchema, ChallengeSchemaWrapper,
>> + DeletablePluginProperty, DnsPluginCore, DnsPluginCoreUpdater, KnownAcmeDirectory, PluginConfig,
>> + DEFAULT_ACME_DIRECTORY_ENTRY, PLUGIN_ID_SCHEMA,
>> +};
>> +use proxmox_config_digest::ConfigDigest;
>> use proxmox_rest_server::WorkerTask;
>> use proxmox_router::{
>> http_bail, list_subdirs_api_method, Permission, Router, RpcEnvironment, SubdirMap,
>> };
>> -use proxmox_schema::{api, param_bail};
>> -
>> -use crate::api2::types::{AcmeChallengeSchema, KnownAcmeDirectory};
>> -use crate::config::acme::plugin::{
>> - self, DnsPlugin, DnsPluginCore, DnsPluginCoreUpdater, PLUGIN_ID_SCHEMA,
>> -};
>> +use proxmox_schema::api;
>>
>> pub(crate) const ROUTER: Router = Router::new()
>> .get(&list_subdirs_api_method!(SUBDIRS))
>> @@ -65,19 +54,6 @@ const PLUGIN_ITEM_ROUTER: Router = Router::new()
>> .put(&API_METHOD_UPDATE_PLUGIN)
>> .delete(&API_METHOD_DELETE_PLUGIN);
>>
>> -#[api(
>> - properties: {
>> - name: { type: AcmeAccountName },
>> - },
>> -)]
>> -/// An ACME Account entry.
>> -///
>> -/// Currently only contains a 'name' property.
>> -#[derive(Serialize)]
>> -pub struct AccountEntry {
>> - name: AcmeAccountName,
>> -}
>> -
>> #[api(
>> access: {
>> permission: &Permission::Privilege(&["system", "certificates"], PRIV_SYS_MODIFY, false),
>> @@ -91,40 +67,7 @@ pub struct AccountEntry {
>> )]
>> /// List ACME accounts.
>> pub fn list_accounts() -> Result<Vec<AccountEntry>, Error> {
>> - let mut entries = Vec::new();
>> - crate::config::acme::foreach_acme_account(|name| {
>> - entries.push(AccountEntry { name });
>> - ControlFlow::Continue(())
>> - })?;
>> - Ok(entries)
>> -}
>> -
>> -#[api(
>> - properties: {
>> - account: { type: Object, properties: {}, additional_properties: true },
>> - tos: {
>> - type: String,
>> - optional: true,
>> - },
>> - },
>> -)]
>> -/// ACME Account information.
>> -///
>> -/// This is what we return via the API.
>> -#[derive(Serialize)]
>> -pub struct AccountInfo {
>> - /// Raw account data.
>> - account: AcmeAccountData,
>> -
>> - /// The ACME directory URL the account was created at.
>> - directory: String,
>> -
>> - /// The account's own URL within the ACME directory.
>> - location: String,
>> -
>> - /// The ToS URL, if the user agreed to one.
>> - #[serde(skip_serializing_if = "Option::is_none")]
>> - tos: Option<String>,
>> + proxmox_acme_api::list_accounts()
>> }
>>
>> #[api(
>> @@ -141,23 +84,7 @@ pub struct AccountInfo {
>> )]
>> /// Return existing ACME account information.
>> pub async fn get_account(name: AcmeAccountName) -> Result<AccountInfo, Error> {
>> - let account_info = proxmox_acme_api::get_account(name).await?;
>> -
>> - Ok(AccountInfo {
>> - location: account_info.location,
>> - tos: account_info.tos,
>> - directory: account_info.directory,
>> - account: AcmeAccountData {
>> - only_return_existing: false, // don't actually write this out in case it's set
>> - ..account_info.account
>> - },
>> - })
>> -}
>> -
>> -fn account_contact_from_string(s: &str) -> Vec<String> {
>> - s.split(&[' ', ';', ',', '\0'][..])
>> - .map(|s| format!("mailto:{s}"))
>> - .collect()
>> + proxmox_acme_api::get_account(name).await
>> }
>>
>> #[api(
>> @@ -222,15 +149,11 @@ fn register_account(
>> );
>> }
>>
>> - if Path::new(&crate::config::acme::account_path(&name)).exists() {
>> + if std::path::Path::new(&proxmox_acme_api::account_config_filename(&name)).exists() {
>
> here ^
>
>> http_bail!(BAD_REQUEST, "account {} already exists", name);
>> }
>>
>> - let directory = directory.unwrap_or_else(|| {
>> - crate::config::acme::DEFAULT_ACME_DIRECTORY_ENTRY
>> - .url
>> - .to_owned()
>> - });
>> + let directory = directory.unwrap_or_else(|| DEFAULT_ACME_DIRECTORY_ENTRY.url.to_string());
>>
>> WorkerTask::spawn(
>> "acme-register",
>> @@ -286,17 +209,7 @@ pub fn update_account(
>> auth_id.to_string(),
>> true,
>> move |_worker| async move {
>> - let data = match contact {
>> - Some(data) => json!({
>> - "contact": account_contact_from_string(&data),
>> - }),
>> - None => json!({}),
>> - };
>> -
>> - proxmox_acme_api::load_client_with_account(&name)
>> - .await?
>> - .update_account(&data)
>> - .await?;
>> + proxmox_acme_api::update_account(&name, contact).await?;
>>
>> Ok(())
>> },
>> @@ -334,18 +247,8 @@ pub fn deactivate_account(
>> auth_id.to_string(),
>> true,
>> move |_worker| async move {
>> - match proxmox_acme_api::load_client_with_account(&name)
>> - .await?
>> - .update_account(&json!({"status": "deactivated"}))
>> - .await
>> - {
>> - Ok(_account) => (),
>> - Err(err) if !force => return Err(err),
>> - Err(err) => {
>> - warn!("error deactivating account {name}, proceeding anyway - {err}");
>> - }
>> - }
>> - crate::config::acme::mark_account_deactivated(&name)?;
>> + proxmox_acme_api::deactivate_account(&name, force).await?;
>> +
>> Ok(())
>> },
>> )
>> @@ -372,15 +275,7 @@ pub fn deactivate_account(
>> )]
>> /// Get the Terms of Service URL for an ACME directory.
>> async fn get_tos(directory: Option<String>) -> Result<Option<String>, Error> {
>> - let directory = directory.unwrap_or_else(|| {
>> - crate::config::acme::DEFAULT_ACME_DIRECTORY_ENTRY
>> - .url
>> - .to_owned()
>> - });
>> - Ok(AcmeClient::new(directory)
>> - .terms_of_service_url()
>> - .await?
>> - .map(str::to_owned))
>> + proxmox_acme_api::get_tos(directory).await
>> }
>>
>> #[api(
>> @@ -395,52 +290,7 @@ async fn get_tos(directory: Option<String>) -> Result<Option<String>, Error> {
>> )]
>> /// Get named known ACME directory endpoints.
>> fn get_directories() -> Result<&'static [KnownAcmeDirectory], Error> {
>> - Ok(crate::config::acme::KNOWN_ACME_DIRECTORIES)
>> -}
>> -
>> -/// Wrapper for efficient Arc use when returning the ACME challenge-plugin schema for serializing
>> -struct ChallengeSchemaWrapper {
>> - inner: Arc<Vec<AcmeChallengeSchema>>,
>> -}
>> -
>> -impl Serialize for ChallengeSchemaWrapper {
>> - fn serialize<S>(&self, serializer: S) -> Result<S::Ok, S::Error>
>> - where
>> - S: serde::Serializer,
>> - {
>> - self.inner.serialize(serializer)
>> - }
>> -}
>> -
>> -struct CachedSchema {
>> - schema: Arc<Vec<AcmeChallengeSchema>>,
>> - cached_mtime: SystemTime,
>> -}
>> -
>> -fn get_cached_challenge_schemas() -> Result<ChallengeSchemaWrapper, Error> {
>> - static CACHE: LazyLock<Mutex<Option<CachedSchema>>> = LazyLock::new(|| Mutex::new(None));
>> -
>> - // the actual loading code
>> - let mut last = CACHE.lock().unwrap();
>> -
>> - let actual_mtime = fs::metadata(crate::config::acme::ACME_DNS_SCHEMA_FN)?.modified()?;
>> -
>> - let schema = match &*last {
>> - Some(CachedSchema {
>> - schema,
>> - cached_mtime,
>> - }) if *cached_mtime >= actual_mtime => schema.clone(),
>> - _ => {
>> - let new_schema = Arc::new(crate::config::acme::load_dns_challenge_schema()?);
>> - *last = Some(CachedSchema {
>> - schema: Arc::clone(&new_schema),
>> - cached_mtime: actual_mtime,
>> - });
>> - new_schema
>> - }
>> - };
>> -
>> - Ok(ChallengeSchemaWrapper { inner: schema })
>> + Ok(proxmox_acme_api::KNOWN_ACME_DIRECTORIES)
>> }
>>
>> #[api(
>> @@ -455,69 +305,7 @@ fn get_cached_challenge_schemas() -> Result<ChallengeSchemaWrapper, Error> {
>> )]
>> /// Get named known ACME directory endpoints.
>> fn get_challenge_schema() -> Result<ChallengeSchemaWrapper, Error> {
>> - get_cached_challenge_schemas()
>> -}
>> -
>> -#[api]
>> -#[derive(Default, Deserialize, Serialize)]
>> -#[serde(rename_all = "kebab-case")]
>> -/// The API's format is inherited from PVE/PMG:
>> -pub struct PluginConfig {
>> - /// Plugin ID.
>> - plugin: String,
>> -
>> - /// Plugin type.
>> - #[serde(rename = "type")]
>> - ty: String,
>> -
>> - /// DNS Api name.
>> - #[serde(skip_serializing_if = "Option::is_none", default)]
>> - api: Option<String>,
>> -
>> - /// Plugin configuration data.
>> - #[serde(skip_serializing_if = "Option::is_none", default)]
>> - data: Option<String>,
>> -
>> - /// Extra delay in seconds to wait before requesting validation.
>> - ///
>> - /// Allows to cope with long TTL of DNS records.
>> - #[serde(skip_serializing_if = "Option::is_none", default)]
>> - validation_delay: Option<u32>,
>> -
>> - /// Flag to disable the config.
>> - #[serde(skip_serializing_if = "Option::is_none", default)]
>> - disable: Option<bool>,
>> -}
>> -
>> -// See PMG/PVE's $modify_cfg_for_api sub
>> -fn modify_cfg_for_api(id: &str, ty: &str, data: &Value) -> PluginConfig {
>> - let mut entry = data.clone();
>> -
>> - let obj = entry.as_object_mut().unwrap();
>> - obj.remove("id");
>> - obj.insert("plugin".to_string(), Value::String(id.to_owned()));
>> - obj.insert("type".to_string(), Value::String(ty.to_owned()));
>> -
>> - // FIXME: This needs to go once the `Updater` is fixed.
>> - // None of these should be able to fail unless the user changed the files by hand, in which
>> - // case we leave the unmodified string in the Value for now. This will be handled with an error
>> - // later.
>> - if let Some(Value::String(ref mut data)) = obj.get_mut("data") {
>> - if let Ok(new) = proxmox_base64::url::decode_no_pad(&data) {
>> - if let Ok(utf8) = String::from_utf8(new) {
>> - *data = utf8;
>> - }
>> - }
>> - }
>> -
>> - // PVE/PMG do this explicitly for ACME plugins...
>> - // obj.insert("digest".to_string(), Value::String(digest.clone()));
>> -
>> - serde_json::from_value(entry).unwrap_or_else(|_| PluginConfig {
>> - plugin: "*Error*".to_string(),
>> - ty: "*Error*".to_string(),
>> - ..Default::default()
>> - })
>> + proxmox_acme_api::get_cached_challenge_schemas()
>> }
>>
>> #[api(
>> @@ -533,12 +321,7 @@ fn modify_cfg_for_api(id: &str, ty: &str, data: &Value) -> PluginConfig {
>> )]
>> /// List ACME challenge plugins.
>> pub fn list_plugins(rpcenv: &mut dyn RpcEnvironment) -> Result<Vec<PluginConfig>, Error> {
>> - let (plugins, digest) = plugin::config()?;
>> - rpcenv["digest"] = hex::encode(digest).into();
>> - Ok(plugins
>> - .iter()
>> - .map(|(id, (ty, data))| modify_cfg_for_api(id, ty, data))
>> - .collect())
>> + proxmox_acme_api::list_plugins(rpcenv)
>> }
>>
>> #[api(
>> @@ -555,13 +338,7 @@ pub fn list_plugins(rpcenv: &mut dyn RpcEnvironment) -> Result<Vec<PluginConfig>
>> )]
>> /// List ACME challenge plugins.
>> pub fn get_plugin(id: String, rpcenv: &mut dyn RpcEnvironment) -> Result<PluginConfig, Error> {
>> - let (plugins, digest) = plugin::config()?;
>> - rpcenv["digest"] = hex::encode(digest).into();
>> -
>> - match plugins.get(&id) {
>> - Some((ty, data)) => Ok(modify_cfg_for_api(&id, ty, data)),
>> - None => http_bail!(NOT_FOUND, "no such plugin"),
>> - }
>> + proxmox_acme_api::get_plugin(id, rpcenv)
>> }
>>
>> // Currently we only have "the" standalone plugin and DNS plugins so we can just flatten a
>> @@ -593,30 +370,7 @@ pub fn get_plugin(id: String, rpcenv: &mut dyn RpcEnvironment) -> Result<PluginC
>> )]
>> /// Add ACME plugin configuration.
>> pub fn add_plugin(r#type: String, core: DnsPluginCore, data: String) -> Result<(), Error> {
>> - // Currently we only support DNS plugins and the standalone plugin is "fixed":
>> - if r#type != "dns" {
>> - param_bail!("type", "invalid ACME plugin type: {:?}", r#type);
>> - }
>> -
>> - let data = String::from_utf8(proxmox_base64::decode(data)?)
>> - .map_err(|_| format_err!("data must be valid UTF-8"))?;
>> -
>> - let id = core.id.clone();
>> -
>> - let _lock = plugin::lock()?;
>> -
>> - let (mut plugins, _digest) = plugin::config()?;
>> - if plugins.contains_key(&id) {
>> - param_bail!("id", "ACME plugin ID {:?} already exists", id);
>> - }
>> -
>> - let plugin = serde_json::to_value(DnsPlugin { core, data })?;
>> -
>> - plugins.insert(id, r#type, plugin);
>> -
>> - plugin::save_config(&plugins)?;
>> -
>> - Ok(())
>> + proxmox_acme_api::add_plugin(r#type, core, data)
>> }
>>
>> #[api(
>> @@ -632,26 +386,7 @@ pub fn add_plugin(r#type: String, core: DnsPluginCore, data: String) -> Result<(
>> )]
>> /// Delete an ACME plugin configuration.
>> pub fn delete_plugin(id: String) -> Result<(), Error> {
>> - let _lock = plugin::lock()?;
>> -
>> - let (mut plugins, _digest) = plugin::config()?;
>> - if plugins.remove(&id).is_none() {
>> - http_bail!(NOT_FOUND, "no such plugin");
>> - }
>> - plugin::save_config(&plugins)?;
>> -
>> - Ok(())
>> -}
>> -
>> -#[api()]
>> -#[derive(Serialize, Deserialize)]
>> -#[serde(rename_all = "kebab-case")]
>> -/// Deletable property name
>> -pub enum DeletableProperty {
>> - /// Delete the disable property
>> - Disable,
>> - /// Delete the validation-delay property
>> - ValidationDelay,
>> + proxmox_acme_api::delete_plugin(id)
>> }
>>
>> #[api(
>> @@ -673,12 +408,12 @@ pub enum DeletableProperty {
>> type: Array,
>> optional: true,
>> items: {
>> - type: DeletableProperty,
>> + type: DeletablePluginProperty,
>> }
>> },
>> digest: {
>> - description: "Digest to protect against concurrent updates",
>> optional: true,
>> + type: ConfigDigest,
>> },
>> },
>> },
>> @@ -692,65 +427,8 @@ pub fn update_plugin(
>> id: String,
>> update: DnsPluginCoreUpdater,
>> data: Option<String>,
>> - delete: Option<Vec<DeletableProperty>>,
>> - digest: Option<String>,
>> + delete: Option<Vec<DeletablePluginProperty>>,
>> + digest: Option<ConfigDigest>,
>> ) -> Result<(), Error> {
>> - let data = data
>> - .as_deref()
>> - .map(proxmox_base64::decode)
>> - .transpose()?
>> - .map(String::from_utf8)
>> - .transpose()
>> - .map_err(|_| format_err!("data must be valid UTF-8"))?;
>> -
>> - let _lock = plugin::lock()?;
>> -
>> - let (mut plugins, expected_digest) = plugin::config()?;
>> -
>> - if let Some(digest) = digest {
>> - let digest = <[u8; 32]>::from_hex(digest)?;
>> - crate::tools::detect_modified_configuration_file(&digest, &expected_digest)?;
>> - }
>> -
>> - match plugins.get_mut(&id) {
>> - Some((ty, ref mut entry)) => {
>> - if ty != "dns" {
>> - bail!("cannot update plugin of type {:?}", ty);
>> - }
>> -
>> - let mut plugin = DnsPlugin::deserialize(&*entry)?;
>> -
>> - if let Some(delete) = delete {
>> - for delete_prop in delete {
>> - match delete_prop {
>> - DeletableProperty::ValidationDelay => {
>> - plugin.core.validation_delay = None;
>> - }
>> - DeletableProperty::Disable => {
>> - plugin.core.disable = None;
>> - }
>> - }
>> - }
>> - }
>> - if let Some(data) = data {
>> - plugin.data = data;
>> - }
>> - if let Some(api) = update.api {
>> - plugin.core.api = api;
>> - }
>> - if update.validation_delay.is_some() {
>> - plugin.core.validation_delay = update.validation_delay;
>> - }
>> - if update.disable.is_some() {
>> - plugin.core.disable = update.disable;
>> - }
>> -
>> - *entry = serde_json::to_value(plugin)?;
>> - }
>> - None => http_bail!(NOT_FOUND, "no such plugin"),
>> - }
>> -
>> - plugin::save_config(&plugins)?;
>> -
>> - Ok(())
>> + proxmox_acme_api::update_plugin(id, update, data, delete, digest)
>> }
>> diff --git a/src/api2/types/acme.rs b/src/api2/types/acme.rs
>> index 64175aff..0ff496b6 100644
>> --- a/src/api2/types/acme.rs
>> +++ b/src/api2/types/acme.rs
>> @@ -43,22 +43,6 @@ pub const ACME_DOMAIN_PROPERTY_SCHEMA: Schema =
>> .format(&ApiStringFormat::PropertyString(&AcmeDomain::API_SCHEMA))
>> .schema();
>>
>> -#[api(
>> - properties: {
>> - name: { type: String },
>> - url: { type: String },
>> - },
>> -)]
>> -/// An ACME directory endpoint with a name and URL.
>> -#[derive(Serialize)]
>> -pub struct KnownAcmeDirectory {
>> - /// The ACME directory's name.
>> - pub name: &'static str,
>> -
>> - /// The ACME directory's endpoint URL.
>> - pub url: &'static str,
>> -}
>> -
>> #[api(
>> properties: {
>> schema: {
>> diff --git a/src/bin/proxmox_backup_manager/acme.rs b/src/bin/proxmox_backup_manager/acme.rs
>> index 6ed61560..d11d7498 100644
>> --- a/src/bin/proxmox_backup_manager/acme.rs
>> +++ b/src/bin/proxmox_backup_manager/acme.rs
>> @@ -4,14 +4,12 @@ use anyhow::{bail, Error};
>> use serde_json::Value;
>>
>> use proxmox_acme::async_client::AcmeClient;
>> -use proxmox_acme_api::AcmeAccountName;
>> +use proxmox_acme_api::{AcmeAccountName, DnsPluginCore, KNOWN_ACME_DIRECTORIES};
>> use proxmox_router::{cli::*, ApiHandler, RpcEnvironment};
>> use proxmox_schema::api;
>> use proxmox_sys::fs::file_get_contents;
>>
>> use proxmox_backup::api2;
>> -use proxmox_backup::config::acme::plugin::DnsPluginCore;
>> -use proxmox_backup::config::acme::KNOWN_ACME_DIRECTORIES;
>>
>> pub fn acme_mgmt_cli() -> CommandLineInterface {
>> let cmd_def = CliCommandMap::new()
>> @@ -122,7 +120,7 @@ async fn register_account(
>>
>> match input.trim().parse::<usize>() {
>> Ok(n) if n < KNOWN_ACME_DIRECTORIES.len() => {
>> - break (KNOWN_ACME_DIRECTORIES[n].url.to_owned(), false);
>> + break (KNOWN_ACME_DIRECTORIES[n].url.to_string(), false);
>> }
>> Ok(n) if n == KNOWN_ACME_DIRECTORIES.len() => {
>> input.clear();
>> diff --git a/src/config/acme/mod.rs b/src/config/acme/mod.rs
>> index e4639c53..01ab6223 100644
>> --- a/src/config/acme/mod.rs
>> +++ b/src/config/acme/mod.rs
>> @@ -1,16 +1,15 @@
>> use std::collections::HashMap;
>> use std::ops::ControlFlow;
>> -use std::path::Path;
>>
>> -use anyhow::{bail, format_err, Error};
>> +use anyhow::Error;
>> use serde_json::Value;
>>
>> use pbs_api_types::PROXMOX_SAFE_ID_REGEX;
>> -use proxmox_acme_api::AcmeAccountName;
>> +use proxmox_acme_api::{AcmeAccountName, KnownAcmeDirectory, KNOWN_ACME_DIRECTORIES};
>> use proxmox_sys::error::SysError;
>> use proxmox_sys::fs::{file_read_string, CreateOptions};
>>
>> -use crate::api2::types::{AcmeChallengeSchema, KnownAcmeDirectory};
>> +use crate::api2::types::AcmeChallengeSchema;
>>
>> pub(crate) const ACME_DIR: &str = pbs_buildcfg::configdir!("/acme");
>> pub(crate) const ACME_ACCOUNT_DIR: &str = pbs_buildcfg::configdir!("/acme/accounts");
>> @@ -35,23 +34,8 @@ pub(crate) fn make_acme_dir() -> Result<(), Error> {
>> create_acme_subdir(ACME_DIR)
>> }
>>
>> -pub const KNOWN_ACME_DIRECTORIES: &[KnownAcmeDirectory] = &[
>> - KnownAcmeDirectory {
>> - name: "Let's Encrypt V2",
>> - url: "https://acme-v02.api.letsencrypt.org/directory",
>> - },
>> - KnownAcmeDirectory {
>> - name: "Let's Encrypt V2 Staging",
>> - url: "https://acme-staging-v02.api.letsencrypt.org/directory",
>> - },
>> -];
>> -
>> pub const DEFAULT_ACME_DIRECTORY_ENTRY: &KnownAcmeDirectory = &KNOWN_ACME_DIRECTORIES[0];
>>
>> -pub fn account_path(name: &str) -> String {
>> - format!("{ACME_ACCOUNT_DIR}/{name}")
>> -}
>> -
>> pub fn foreach_acme_account<F>(mut func: F) -> Result<(), Error>
>> where
>> F: FnMut(AcmeAccountName) -> ControlFlow<Result<(), Error>>,
>> @@ -82,28 +66,6 @@ where
>> }
>> }
>>
>> -pub fn mark_account_deactivated(name: &str) -> Result<(), Error> {
>> - let from = account_path(name);
>> - for i in 0..100 {
>> - let to = account_path(&format!("_deactivated_{name}_{i}"));
>> - if !Path::new(&to).exists() {
>> - return std::fs::rename(&from, &to).map_err(|err| {
>> - format_err!(
>> - "failed to move account path {:?} to {:?} - {}",
>> - from,
>> - to,
>> - err
>> - )
>> - });
>> - }
>> - }
>> - bail!(
>> - "No free slot to rename deactivated account {:?}, please cleanup {:?}",
>> - from,
>> - ACME_ACCOUNT_DIR
>> - );
>> -}
>> -
>> pub fn load_dns_challenge_schema() -> Result<Vec<AcmeChallengeSchema>, Error> {
>> let raw = file_read_string(ACME_DNS_SCHEMA_FN)?;
>> let schemas: serde_json::Map<String, Value> = serde_json::from_str(&raw)?;
>> --
>> 2.47.3
>>
>>
>>
>> _______________________________________________
>> pbs-devel mailing list
>> pbs-devel@lists.proxmox.com
>> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
>>
>>
>>
>
>
>
>
* Re: [pbs-devel] [PATCH proxmox-backup v5 5/5] acme: certificate ordering through proxmox-acme-api
2026-01-13 13:45 5% ` Fabian Grünbichler
@ 2026-01-13 16:51 6% ` Samuel Rufinatscha
0 siblings, 0 replies; 200+ results
From: Samuel Rufinatscha @ 2026-01-13 16:51 UTC (permalink / raw)
To: Proxmox Backup Server development discussion, Fabian Grünbichler
Comments inline.
On 1/13/26 2:45 PM, Fabian Grünbichler wrote:
> On January 8, 2026 12:26 pm, Samuel Rufinatscha wrote:
>> PBS currently uses its own ACME client and API logic, while PDM uses the
>> factored out proxmox-acme and proxmox-acme-api crates. This duplication
>> risks differences in behaviour and requires ACME maintenance in two
>> places. This patch is part of a series to move PBS over to the shared
>> ACME stack.
>>
>> Changes:
>> - Replace the custom ACME order/authorization loop in node certificates
>> with a call to proxmox_acme_api::order_certificate.
>> - Build domain + config data as proxmox-acme-api types
>> - Remove obsolete local ACME ordering and plugin glue code.
>>
>> Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
>> ---
>> src/acme/mod.rs | 2 -
>> src/acme/plugin.rs | 335 ----------------------------------
>> src/api2/node/certificates.rs | 229 ++++-------------------
>> src/api2/types/acme.rs | 73 --------
>> src/api2/types/mod.rs | 3 -
>> src/config/acme/mod.rs | 8 +-
>> src/config/acme/plugin.rs | 92 +---------
>> src/config/node.rs | 20 +-
>> src/lib.rs | 2 -
>> 9 files changed, 38 insertions(+), 726 deletions(-)
>> delete mode 100644 src/acme/mod.rs
>> delete mode 100644 src/acme/plugin.rs
>> delete mode 100644 src/api2/types/acme.rs
>>
>
> [..]
>
>> diff --git a/src/api2/node/certificates.rs b/src/api2/node/certificates.rs
>> index 47ff8de5..73401c41 100644
>> --- a/src/api2/node/certificates.rs
>> +++ b/src/api2/node/certificates.rs
>> @@ -1,14 +1,11 @@
>> -use std::sync::Arc;
>> -use std::time::Duration;
>> -
>> use anyhow::{bail, format_err, Error};
>> use openssl::pkey::PKey;
>> use openssl::x509::X509;
>> use serde::{Deserialize, Serialize};
>> -use tracing::{info, warn};
>> +use tracing::info;
>>
>> use pbs_api_types::{NODE_SCHEMA, PRIV_SYS_MODIFY};
>> -use proxmox_acme::async_client::AcmeClient;
>> +use proxmox_acme_api::AcmeDomain;
>> use proxmox_rest_server::WorkerTask;
>> use proxmox_router::list_subdirs_api_method;
>> use proxmox_router::SubdirMap;
>> @@ -18,8 +15,6 @@ use proxmox_schema::api;
>> use pbs_buildcfg::configdir;
>> use pbs_tools::cert;
>>
>> -use crate::api2::types::AcmeDomain;
>> -use crate::config::node::NodeConfig;
>> use crate::server::send_certificate_renewal_mail;
>>
>> pub const ROUTER: Router = Router::new()
>> @@ -268,193 +263,6 @@ pub async fn delete_custom_certificate() -> Result<(), Error> {
>> Ok(())
>> }
>>
>> -struct OrderedCertificate {
>> - certificate: hyper::body::Bytes,
>> - private_key_pem: Vec<u8>,
>> -}
>> -
>> -async fn order_certificate(
>> - worker: Arc<WorkerTask>,
>> - node_config: &NodeConfig,
>> -) -> Result<Option<OrderedCertificate>, Error> {
>> - use proxmox_acme::authorization::Status;
>> - use proxmox_acme::order::Identifier;
>> -
>> - let domains = node_config.acme_domains().try_fold(
>> - Vec::<AcmeDomain>::new(),
>> - |mut acc, domain| -> Result<_, Error> {
>> - let mut domain = domain?;
>> - domain.domain.make_ascii_lowercase();
>> - if let Some(alias) = &mut domain.alias {
>> - alias.make_ascii_lowercase();
>> - }
>> - acc.push(domain);
>> - Ok(acc)
>> - },
>> - )?;
>> -
>> - let get_domain_config = |domain: &str| {
>> - domains
>> - .iter()
>> - .find(|d| d.domain == domain)
>> - .ok_or_else(|| format_err!("no config for domain '{}'", domain))
>> - };
>> -
>> - if domains.is_empty() {
>> - info!("No domains configured to be ordered from an ACME server.");
>> - return Ok(None);
>> - }
>> -
>> - let (plugins, _) = crate::config::acme::plugin::config()?;
>> -
>> - let mut acme = node_config.acme_client().await?;
>> -
>> - info!("Placing ACME order");
>> - let order = acme
>> - .new_order(domains.iter().map(|d| d.domain.to_ascii_lowercase()))
>> - .await?;
>> - info!("Order URL: {}", order.location);
>> -
>> - let identifiers: Vec<String> = order
>> - .data
>> - .identifiers
>> - .iter()
>> - .map(|identifier| match identifier {
>> - Identifier::Dns(domain) => domain.clone(),
>> - })
>> - .collect();
>> -
>> - for auth_url in &order.data.authorizations {
>> - info!("Getting authorization details from '{auth_url}'");
>> - let mut auth = acme.get_authorization(auth_url).await?;
>> -
>> - let domain = match &mut auth.identifier {
>> - Identifier::Dns(domain) => domain.to_ascii_lowercase(),
>> - };
>> -
>> - if auth.status == Status::Valid {
>> - info!("{domain} is already validated!");
>> - continue;
>> - }
>> -
>> - info!("The validation for {domain} is pending");
>> - let domain_config: &AcmeDomain = get_domain_config(&domain)?;
>> - let plugin_id = domain_config.plugin.as_deref().unwrap_or("standalone");
>> - let mut plugin_cfg = crate::acme::get_acme_plugin(&plugins, plugin_id)?
>> - .ok_or_else(|| format_err!("plugin '{plugin_id}' for domain '{domain}' not found!"))?;
>> -
>> - info!("Setting up validation plugin");
>> - let validation_url = plugin_cfg
>> - .setup(&mut acme, &auth, domain_config, Arc::clone(&worker))
>> - .await?;
>> -
>> - let result = request_validation(&mut acme, auth_url, validation_url).await;
>> -
>> - if let Err(err) = plugin_cfg
>> - .teardown(&mut acme, &auth, domain_config, Arc::clone(&worker))
>> - .await
>> - {
>> - warn!("Failed to teardown plugin '{plugin_id}' for domain '{domain}' - {err}");
>> - }
>> -
>> - result?;
>> - }
>> -
>> - info!("All domains validated");
>> - info!("Creating CSR");
>> -
>> - let csr = proxmox_acme::util::Csr::generate(&identifiers, &Default::default())?;
>> - let mut finalize_error_cnt = 0u8;
>> - let order_url = &order.location;
>> - let mut order;
>> - loop {
>> - use proxmox_acme::order::Status;
>> -
>> - order = acme.get_order(order_url).await?;
>> -
>> - match order.status {
>> - Status::Pending => {
>> - info!("still pending, trying to finalize anyway");
>> - let finalize = order
>> - .finalize
>> - .as_deref()
>> - .ok_or_else(|| format_err!("missing 'finalize' URL in order"))?;
>> - if let Err(err) = acme.finalize(finalize, &csr.data).await {
>> - if finalize_error_cnt >= 5 {
>> - return Err(err);
>> - }
>> -
>> - finalize_error_cnt += 1;
>> - }
>> - tokio::time::sleep(Duration::from_secs(5)).await;
>> - }
>> - Status::Ready => {
>> - info!("order is ready, finalizing");
>> - let finalize = order
>> - .finalize
>> - .as_deref()
>> - .ok_or_else(|| format_err!("missing 'finalize' URL in order"))?;
>> - acme.finalize(finalize, &csr.data).await?;
>> - tokio::time::sleep(Duration::from_secs(5)).await;
>> - }
>> - Status::Processing => {
>> - info!("still processing, trying again in 30 seconds");
>> - tokio::time::sleep(Duration::from_secs(30)).await;
>> - }
>> - Status::Valid => {
>> - info!("valid");
>> - break;
>> - }
>> - other => bail!("order status: {:?}", other),
>> - }
>> - }
>> -
>> - info!("Downloading certificate");
>> - let certificate = acme
>> - .get_certificate(
>> - order
>> - .certificate
>> - .as_deref()
>> - .ok_or_else(|| format_err!("missing certificate url in finalized order"))?,
>> - )
>> - .await?;
>> -
>> - Ok(Some(OrderedCertificate {
>> - certificate,
>> - private_key_pem: csr.private_key_pem,
>> - }))
>> -}
>> -
>> -async fn request_validation(
>> - acme: &mut AcmeClient,
>> - auth_url: &str,
>> - validation_url: &str,
>> -) -> Result<(), Error> {
>> - info!("Triggering validation");
>> - acme.request_challenge_validation(validation_url).await?;
>> -
>> - info!("Sleeping for 5 seconds");
>> - tokio::time::sleep(Duration::from_secs(5)).await;
>> -
>> - loop {
>> - use proxmox_acme::authorization::Status;
>> -
>> - let auth = acme.get_authorization(auth_url).await?;
>> - match auth.status {
>> - Status::Pending => {
>> - info!("Status is still 'pending', trying again in 10 seconds");
>> - tokio::time::sleep(Duration::from_secs(10)).await;
>> - }
>> - Status::Valid => return Ok(()),
>> - other => bail!(
>> - "validating challenge '{}' failed - status: {:?}",
>> - validation_url,
>> - other
>> - ),
>> - }
>> - }
>> -}
>> -
>> #[api(
>> input: {
>> properties: {
>> @@ -524,9 +332,30 @@ fn spawn_certificate_worker(
>>
>> let auth_id = rpcenv.get_auth_id().unwrap();
>>
>> + let acme_config = if let Some(cfg) = node_config.acme_config().transpose()? {
>> + cfg
>> + } else {
>> + proxmox_acme_api::parse_acme_config_string("account=default")?
>> + };
>
> wouldn't it make sense to inline this into acme_config() ? the same
> fallback is already there for acme_client()
>
Good catch, will refactor!
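For reference, the inlined fallback could look roughly like the sketch below. This is a minimal, self-contained illustration of the pattern only: `NodeConfig`, `AcmeConfig`, and `parse_acme_config_string` here are simplified stand-ins for the real proxmox-acme-api and node-config types, not the actual implementation.

```rust
#[derive(Debug, PartialEq)]
struct AcmeConfig {
    account: String,
}

// Stand-in for proxmox_acme_api::parse_acme_config_string.
fn parse_acme_config_string(s: &str) -> Result<AcmeConfig, String> {
    match s.strip_prefix("account=") {
        Some(name) if !name.is_empty() => Ok(AcmeConfig {
            account: name.to_string(),
        }),
        _ => Err(format!("invalid ACME config string: {s:?}")),
    }
}

// Stand-in for the node configuration with an optional acme property.
struct NodeConfig {
    acme: Option<String>,
}

impl NodeConfig {
    /// Returns the parsed ACME config, falling back to the default
    /// account when none is configured, so callers no longer repeat
    /// the `if let Some(cfg) ... else` dance at every call site.
    fn acme_config(&self) -> Result<AcmeConfig, String> {
        match self.acme.as_deref() {
            Some(raw) => parse_acme_config_string(raw),
            None => parse_acme_config_string("account=default"),
        }
    }
}

fn main() {
    let unset = NodeConfig { acme: None };
    assert_eq!(unset.acme_config().unwrap().account, "default");

    let set = NodeConfig {
        acme: Some("account=mine".to_string()),
    };
    assert_eq!(set.acme_config().unwrap().account, "mine");
}
```

Callers like spawn_certificate_worker() and revoke_acme_cert() would then just use the single accessor instead of duplicating the "account=default" fallback.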
>> +
>> + let domains = node_config.acme_domains().try_fold(
>> + Vec::<AcmeDomain>::new(),
>> + |mut acc, domain| -> Result<_, Error> {
>> + let mut domain = domain?;
>> + domain.domain.make_ascii_lowercase();
>> + if let Some(alias) = &mut domain.alias {
>> + alias.make_ascii_lowercase();
>> + }
>> + acc.push(domain);
>> + Ok(acc)
>> + },
>> + )?;
>> +
>> WorkerTask::spawn(name, None, auth_id, true, move |worker| async move {
>> let work = || async {
>> - if let Some(cert) = order_certificate(worker, &node_config).await? {
>> + if let Some(cert) =
>> + proxmox_acme_api::order_certificate(worker, &acme_config, &domains).await?
>> + {
>> crate::config::set_proxy_certificate(&cert.certificate, &cert.private_key_pem)?;
>> crate::server::reload_proxy_certificate().await?;
>> }
>> @@ -562,16 +391,20 @@ pub fn revoke_acme_cert(rpcenv: &mut dyn RpcEnvironment) -> Result<String, Error
>>
>> let auth_id = rpcenv.get_auth_id().unwrap();
>>
>> + let acme_config = if let Some(cfg) = node_config.acme_config().transpose()? {
>> + cfg
>> + } else {
>> + proxmox_acme_api::parse_acme_config_string("account=default")?
>> + };
>
> here as well
>
Will adjust!
>> +
>> WorkerTask::spawn(
>> "acme-revoke-cert",
>> None,
>> auth_id,
>> true,
>> move |_worker| async move {
>> - info!("Loading ACME account");
>> - let mut acme = node_config.acme_client().await?;
>> info!("Revoking old certificate");
>> - acme.revoke_certificate(cert_pem.as_bytes(), None).await?;
>> + proxmox_acme_api::revoke_certificate(&acme_config, &cert_pem.as_bytes()).await?;
>> info!("Deleting certificate and regenerating a self-signed one");
>> delete_custom_certificate().await?;
>> Ok(())
>
> [..]
>
>> diff --git a/src/config/acme/plugin.rs b/src/config/acme/plugin.rs
>> index 8ce852ec..4b4a216e 100644
>> --- a/src/config/acme/plugin.rs
>> +++ b/src/config/acme/plugin.rs
>> @@ -1,104 +1,16 @@
>> use std::sync::LazyLock;
>>
>> use anyhow::Error;
>> -use serde::{Deserialize, Serialize};
>> use serde_json::Value;
>>
>> -use pbs_api_types::PROXMOX_SAFE_ID_FORMAT;
>> -use proxmox_schema::{api, ApiType, Schema, StringSchema, Updater};
>> +use proxmox_acme_api::{DnsPlugin, StandalonePlugin, PLUGIN_ID_SCHEMA};
>> +use proxmox_schema::{ApiType, Schema};
>> use proxmox_section_config::{SectionConfig, SectionConfigData, SectionConfigPlugin};
>>
>> use pbs_config::{open_backup_lockfile, BackupLockGuard};
>>
>> -pub const PLUGIN_ID_SCHEMA: Schema = StringSchema::new("ACME Challenge Plugin ID.")
>> - .format(&PROXMOX_SAFE_ID_FORMAT)
>> - .min_length(1)
>> - .max_length(32)
>> - .schema();
>> -
>> pub static CONFIG: LazyLock<SectionConfig> = LazyLock::new(init);
>>
>> -#[api(
>> - properties: {
>> - id: { schema: PLUGIN_ID_SCHEMA },
>> - },
>> -)]
>> -#[derive(Deserialize, Serialize)]
>> -/// Standalone ACME Plugin for the http-1 challenge.
>> -pub struct StandalonePlugin {
>> - /// Plugin ID.
>> - id: String,
>> -}
>> -
>> -impl Default for StandalonePlugin {
>> - fn default() -> Self {
>> - Self {
>> - id: "standalone".to_string(),
>> - }
>> - }
>> -}
>> -
>> -#[api(
>> - properties: {
>> - id: { schema: PLUGIN_ID_SCHEMA },
>> - disable: {
>> - optional: true,
>> - default: false,
>> - },
>> - "validation-delay": {
>> - default: 30,
>> - optional: true,
>> - minimum: 0,
>> - maximum: 2 * 24 * 60 * 60,
>> - },
>> - },
>> -)]
>> -/// DNS ACME Challenge Plugin core data.
>> -#[derive(Deserialize, Serialize, Updater)]
>> -#[serde(rename_all = "kebab-case")]
>> -pub struct DnsPluginCore {
>> - /// Plugin ID.
>> - #[updater(skip)]
>> - pub id: String,
>> -
>> - /// DNS API Plugin Id.
>> - pub api: String,
>> -
>> - /// Extra delay in seconds to wait before requesting validation.
>> - ///
>> - /// Allows to cope with long TTL of DNS records.
>> - #[serde(skip_serializing_if = "Option::is_none", default)]
>> - pub validation_delay: Option<u32>,
>> -
>> - /// Flag to disable the config.
>> - #[serde(skip_serializing_if = "Option::is_none", default)]
>> - pub disable: Option<bool>,
>> -}
>> -
>> -#[api(
>> - properties: {
>> - core: { type: DnsPluginCore },
>> - },
>> -)]
>> -/// DNS ACME Challenge Plugin.
>> -#[derive(Deserialize, Serialize)]
>> -#[serde(rename_all = "kebab-case")]
>> -pub struct DnsPlugin {
>> - #[serde(flatten)]
>> - pub core: DnsPluginCore,
>> -
>> - // We handle this property separately in the API calls.
>> - /// DNS plugin data (base64url encoded without padding).
>> - #[serde(with = "proxmox_serde::string_as_base64url_nopad")]
>> - pub data: String,
>> -}
>> -
>> -impl DnsPlugin {
>> - pub fn decode_data(&self, output: &mut Vec<u8>) -> Result<(), Error> {
>> - Ok(proxmox_base64::url::decode_to_vec(&self.data, output)?)
>> - }
>> -}
>> -
>> fn init() -> SectionConfig {
>> let mut config = SectionConfig::new(&PLUGIN_ID_SCHEMA);
>>
>> diff --git a/src/config/node.rs b/src/config/node.rs
>> index e4b66a20..6865b815 100644
>> --- a/src/config/node.rs
>> +++ b/src/config/node.rs
>> @@ -9,14 +9,14 @@ use pbs_api_types::{
>> OPENSSL_CIPHERS_TLS_1_3_SCHEMA,
>> };
>> use proxmox_acme::async_client::AcmeClient;
>> -use proxmox_acme_api::AcmeAccountName;
>> +use proxmox_acme_api::{AcmeAccountName, AcmeConfig, AcmeDomain, ACME_DOMAIN_PROPERTY_SCHEMA};
>> use proxmox_http::ProxyConfig;
>> use proxmox_schema::{api, ApiStringFormat, ApiType, Updater};
>>
>> use pbs_buildcfg::configdir;
>> use pbs_config::{open_backup_lockfile, BackupLockGuard};
>>
>> -use crate::api2::types::{AcmeDomain, ACME_DOMAIN_PROPERTY_SCHEMA, HTTP_PROXY_SCHEMA};
>> +use crate::api2::types::HTTP_PROXY_SCHEMA;
>>
>> const CONF_FILE: &str = configdir!("/node.cfg");
>> const LOCK_FILE: &str = configdir!("/.node.lck");
>> @@ -43,20 +43,6 @@ pub fn save_config(config: &NodeConfig) -> Result<(), Error> {
>> pbs_config::replace_backup_config(CONF_FILE, &raw)
>> }
>>
>> -#[api(
>> - properties: {
>> - account: { type: AcmeAccountName },
>> - }
>> -)]
>> -#[derive(Deserialize, Serialize)]
>> -/// The ACME configuration.
>> -///
>> -/// Currently only contains the name of the account use.
>> -pub struct AcmeConfig {
>> - /// Account to use to acquire ACME certificates.
>> - account: AcmeAccountName,
>> -}
>> -
>> /// All available languages in Proxmox. Taken from proxmox-i18n repository.
>> /// pt_BR, zh_CN, and zh_TW use the same case in the translation files.
>> // TODO: auto-generate from available translations
>> @@ -242,7 +228,7 @@ impl NodeConfig {
>>
>> pub async fn acme_client(&self) -> Result<AcmeClient, Error> {
>> let account = if let Some(cfg) = self.acme_config().transpose()? {
>> - cfg.account
>> + AcmeAccountName::from_string(cfg.account)?
>> } else {
>> AcmeAccountName::from_string("default".to_string())? // should really not happen
>> };
>> diff --git a/src/lib.rs b/src/lib.rs
>> index 8633378c..828f5842 100644
>> --- a/src/lib.rs
>> +++ b/src/lib.rs
>> @@ -27,8 +27,6 @@ pub(crate) mod auth;
>>
>> pub mod tape;
>>
>> -pub mod acme;
>> -
>> pub mod client_helpers;
>>
>> pub mod traffic_control_cache;
>> --
>> 2.47.3
>>
>>
>>
>>
>>
>>
>
>
>
>
* Re: [pbs-devel] [PATCH proxmox-backup v5 2/5] acme: include proxmox-acme-api dependency
2026-01-13 13:45 5% ` Fabian Grünbichler
@ 2026-01-13 16:41 6% ` Samuel Rufinatscha
0 siblings, 0 replies; 200+ results
From: Samuel Rufinatscha @ 2026-01-13 16:41 UTC (permalink / raw)
To: Proxmox Backup Server development discussion, Fabian Grünbichler
On 1/13/26 2:45 PM, Fabian Grünbichler wrote:
> On January 8, 2026 12:26 pm, Samuel Rufinatscha wrote:
>> PBS currently uses its own ACME client and API logic, while PDM uses the
>> factored out proxmox-acme and proxmox-acme-api crates. This duplication
>> risks differences in behaviour and requires ACME maintenance in two
>> places. This patch is part of a series to move PBS over to the shared
>> ACME stack.
>
> this doesn't need to be in nearly every commit here.
Makes sense, will remove!
>
> adding the dependency and initializing things without using them also
> has no stand-alone value, so this doesn't need to be its own commit.
>
I thought about this - I decided to add it as a dedicated commit to
improve review visibility, and to make the call/init sites and their
parameters clear so the change would not be overlooked in the refactor.
But I see what you mean; it fits better with the changes from the next
patch. Will adjust accordingly!
> we could have two commits:
> - add proxmox-acme-api and use it for client and API
> - remove src/acme since it is now unused
>
> or three or more if you want to split out some of the API replacement where
> there isn't a 1:1 relation between old and new code..
>
I think the two-commit approach fits and would keep the API
changes well together - so let's go with this!
I will try to extract the src/api2 changes from 5/5, which should work.
That probably leaves src/config/acme/plugin.rs, which will then
be removed together with src/acme as part of patch 2.
>>
>> Changes:
>> - Add proxmox-acme-api with the "impl" feature as a dependency.
>> - Initialize proxmox_acme_api in proxmox-backup- api, manager and proxy.
>> * Inits PBS config dir /acme as proxmox ACME directory
>>
>> Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
>> ---
>> Cargo.toml | 3 +++
>> src/bin/proxmox-backup-api.rs | 2 ++
>> src/bin/proxmox-backup-manager.rs | 2 ++
>> src/bin/proxmox-backup-proxy.rs | 1 +
>> 4 files changed, 8 insertions(+)
>>
>> diff --git a/Cargo.toml b/Cargo.toml
>> index 1aa57ae5..feae351d 100644
>> --- a/Cargo.toml
>> +++ b/Cargo.toml
>> @@ -101,6 +101,7 @@ pbs-api-types = "1.0.8"
>> # other proxmox crates
>> pathpatterns = "1"
>> proxmox-acme = "1"
>> +proxmox-acme-api = { version = "1", features = [ "impl" ] }
>> pxar = "1"
>>
>> # PBS workspace
>> @@ -251,6 +252,7 @@ pbs-api-types.workspace = true
>>
>> # in their respective repo
>> proxmox-acme.workspace = true
>> +proxmox-acme-api.workspace = true
>> pxar.workspace = true
>>
>> # proxmox-backup workspace/internal crates
>> @@ -269,6 +271,7 @@ proxmox-rrd-api-types.workspace = true
>> [patch.crates-io]
>> #pbs-api-types = { path = "../proxmox/pbs-api-types" }
>> #proxmox-acme = { path = "../proxmox/proxmox-acme" }
>> +#proxmox-acme-api = { path = "../proxmox/proxmox-acme-api" }
>> #proxmox-api-macro = { path = "../proxmox/proxmox-api-macro" }
>> #proxmox-apt = { path = "../proxmox/proxmox-apt" }
>> #proxmox-apt-api-types = { path = "../proxmox/proxmox-apt-api-types" }
>> diff --git a/src/bin/proxmox-backup-api.rs b/src/bin/proxmox-backup-api.rs
>> index 417e9e97..d0091dca 100644
>> --- a/src/bin/proxmox-backup-api.rs
>> +++ b/src/bin/proxmox-backup-api.rs
>> @@ -14,6 +14,7 @@ use proxmox_rest_server::{ApiConfig, RestServer};
>> use proxmox_router::RpcEnvironmentType;
>> use proxmox_sys::fs::CreateOptions;
>>
>> +use pbs_buildcfg::configdir;
>> use proxmox_backup::auth_helpers::*;
>> use proxmox_backup::config;
>> use proxmox_backup::server::auth::check_pbs_auth;
>> @@ -78,6 +79,7 @@ async fn run() -> Result<(), Error> {
>> let mut command_sock = proxmox_daemon::command_socket::CommandSocket::new(backup_user.gid);
>>
>> proxmox_product_config::init(backup_user.clone(), pbs_config::priv_user()?);
>> + proxmox_acme_api::init(configdir!("/acme"), true)?;
>>
>> let dir_opts = CreateOptions::new()
>> .owner(backup_user.uid)
>> diff --git a/src/bin/proxmox-backup-manager.rs b/src/bin/proxmox-backup-manager.rs
>> index f8365070..30bc8da9 100644
>> --- a/src/bin/proxmox-backup-manager.rs
>> +++ b/src/bin/proxmox-backup-manager.rs
>> @@ -19,6 +19,7 @@ use proxmox_router::{cli::*, RpcEnvironment};
>> use proxmox_schema::api;
>> use proxmox_sys::fs::CreateOptions;
>>
>> +use pbs_buildcfg::configdir;
>> use pbs_client::{display_task_log, view_task_result};
>> use pbs_config::sync;
>> use pbs_tools::json::required_string_param;
>> @@ -667,6 +668,7 @@ async fn run() -> Result<(), Error> {
>> .init()?;
>> proxmox_backup::server::notifications::init()?;
>> proxmox_product_config::init(pbs_config::backup_user()?, pbs_config::priv_user()?);
>> + proxmox_acme_api::init(configdir!("/acme"), false)?;
>>
>> let cmd_def = CliCommandMap::new()
>> .insert("acl", acl_commands())
>> diff --git a/src/bin/proxmox-backup-proxy.rs b/src/bin/proxmox-backup-proxy.rs
>> index 870208fe..eea44a7d 100644
>> --- a/src/bin/proxmox-backup-proxy.rs
>> +++ b/src/bin/proxmox-backup-proxy.rs
>> @@ -188,6 +188,7 @@ async fn run() -> Result<(), Error> {
>> proxmox_backup::server::notifications::init()?;
>> metric_collection::init()?;
>> proxmox_product_config::init(pbs_config::backup_user()?, pbs_config::priv_user()?);
>> + proxmox_acme_api::init(configdir!("/acme"), false)?;
>>
>> let mut indexpath = PathBuf::from(pbs_buildcfg::JS_DIR);
>> indexpath.push("index.hbs");
>> --
>> 2.47.3
>>
>>
>>
>>
>>
>>
>
>
>
>
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
* Re: [pbs-devel] [PATCH proxmox{, -backup} v5 0/9] fix #6939: acme: support servers returning 204 for nonce requests
2026-01-08 11:26 11% [pbs-devel] [PATCH proxmox{, -backup} v5 0/9] fix #6939: acme: support servers returning 204 for nonce requests Samuel Rufinatscha
` (8 preceding siblings ...)
2026-01-08 11:26 7% ` [pbs-devel] [PATCH proxmox-backup v5 5/5] acme: certificate ordering through proxmox-acme-api Samuel Rufinatscha
@ 2026-01-13 13:48 5% ` Fabian Grünbichler
2026-01-15 10:24 0% ` Max R. Carrara
2026-01-16 11:30 13% ` [pbs-devel] superseded: " Samuel Rufinatscha
10 siblings, 1 reply; 200+ results
From: Fabian Grünbichler @ 2026-01-13 13:48 UTC (permalink / raw)
To: Proxmox Backup Server development discussion
On January 8, 2026 12:26 pm, Samuel Rufinatscha wrote:
> Hi,
>
> this series fixes account registration for ACME providers that return
> HTTP 204 No Content to the newNonce request. Currently, both the PBS
> ACME client and the shared ACME client in proxmox-acme only accept
> HTTP 200 OK for this request. The issue was observed in PBS against a
> custom ACME deployment and reported as bug #6939 [1].
sent some feedback for individual patches. one thing to explicitly test
is that existing accounts and DNS plugin configuration continue to work
after the switch-over - AFAICT from the testing description below, that
was not done (or not noted?).
>
> ## Problem
>
> During ACME account registration, PBS first fetches an anti-replay
> nonce by sending a HEAD request to the CA’s newNonce URL.
> RFC 8555 §7.2 [2] states that:
>
> * the server MUST include a Replay-Nonce header with a fresh nonce,
> * the server SHOULD use status 200 OK for the HEAD request,
> * the server MUST also handle GET on the same resource and may return
> 204 No Content with an empty body.
>
> The reporter observed the following error message:
>
> *ACME server responded with unexpected status code: 204*
>
> and mentioned that the issue did not appear with PVE 9 [1]. Looking at
> PVE’s Perl ACME client [3], it uses a GET request instead of HEAD and
> accepts any 2xx success code when retrieving the nonce. This difference
> in behavior does not affect functionality but is worth noting for
> consistency across implementations.
>
> ## Approach
>
> To support ACME providers which return 204 No Content, the Rust ACME
> clients in proxmox-backup and proxmox need to treat both 200 OK and 204
> No Content as valid responses for the nonce request, as long as a
> Replay-Nonce header is present.
>
> This series changes the expected field of the internal Request type
> from a single u16 to a list of allowed status codes
> (e.g. &'static [u16]), so one request can explicitly accept multiple
> success codes.
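
as a sketch of what that change amounts to (illustrative Rust only - the
names follow the series' description, not necessarily the final
proxmox-acme code):

```rust
/// Illustrative stand-in for the crate-internal ACME request type after
/// the change: `expected` holds all acceptable success codes instead of
/// a single one.
struct Request {
    url: String,
    method: &'static str,
    body: String,
    /// e.g. `&[200, 204]` for the newNonce request
    expected: &'static [u16],
}

/// A nonce response is acceptable if its status is one of the listed
/// success codes *and* the Replay-Nonce header is actually present.
fn status_ok(request: &Request, status: u16, has_replay_nonce: bool) -> bool {
    request.expected.contains(&status) && has_replay_nonce
}

fn main() {
    let new_nonce = Request {
        url: "https://example.test/acme/new-nonce".to_string(),
        method: "HEAD",
        body: String::new(),
        expected: &[200, 204],
    };
    // 204 No Content with a Replay-Nonce header is now accepted...
    assert!(status_ok(&new_nonce, 204, true));
    // ...but a missing Replay-Nonce or an unlisted code is still rejected.
    assert!(!status_ok(&new_nonce, 204, false));
    assert!(!status_ok(&new_nonce, 500, true));
    println!("nonce status handling ok");
}
```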
>
> To avoid fixing the issue twice (once in PBS’ own ACME client and once
> in the shared Rust client), this series first refactors PBS to use the
> shared AcmeClient from proxmox-acme / proxmox-acme-api, similar to PDM,
> and then applies the bug fix in that shared implementation so that all
> consumers benefit from the more tolerant behavior.
>
> ## Testing
>
> *Testing the refactor*
>
> To test the refactor, I
> (1) installed latest stable PBS on a VM
> (2) created .deb package from latest PBS (master), containing the
> refactor
> (3) installed created .deb package
> (4) installed Pebble from Let's Encrypt [4] on the same VM
> (5) created an ACME account and ordered the new certificate for the
> host domain.
>
> Steps to reproduce:
>
> (1) install latest stable PBS on a VM, create .deb package from latest
> PBS (master) containing the refactor, install created .deb package
> (2) install Pebble from Let's Encrypt [4] on the same VM:
>
> cd
> apt update
> apt install -y golang git
> git clone https://github.com/letsencrypt/pebble
> cd pebble
> go build ./cmd/pebble
>
> then, download and trust the Pebble cert:
>
> wget https://raw.githubusercontent.com/letsencrypt/pebble/main/test/certs/pebble.minica.pem
> cp pebble.minica.pem /usr/local/share/ca-certificates/pebble.minica.crt
> update-ca-certificates
>
> We want Pebble to perform HTTP-01 validation against port 80, because
> PBS’s standalone plugin will bind port 80. Set httpPort to 80.
>
> nano ./test/config/pebble-config.json
>
> Start the Pebble server in the background:
>
> ./pebble -config ./test/config/pebble-config.json &
>
> Create a Pebble ACME account:
>
> proxmox-backup-manager acme account register default admin@example.com --directory 'https://127.0.0.1:14000/dir'
>
> To verify persistence of the account I checked
>
> ls /etc/proxmox-backup/acme/accounts
>
> Verified if update-account works
>
> proxmox-backup-manager acme account update default --contact "a@example.com,b@example.com"
> proxmox-backup-manager acme account info default
>
> In the PBS GUI, you can create a new domain. You can use your host
> domain name (see /etc/hosts). Select the created account and order the
> certificate.
>
> After a page reload, you might need to accept the new certificate in the browser.
> In the PBS dashboard, you should see the new Pebble certificate.
>
> Note: on reboot, the created Pebble ACME account will be gone and you
> will need to create a new one. Pebble does not persist account info.
> In that case remove the previously created account in
> /etc/proxmox-backup/acme/accounts.
>
> *Testing the newNonce fix*
>
> To prove the ACME newNonce fix, I put nginx in front of Pebble to
> intercept the newNonce request and return 204 No Content instead of
> 200 OK; all other requests are forwarded to Pebble unchanged. This
> requires trusting the nginx CA via
> /usr/local/share/ca-certificates + update-ca-certificates on the VM.
>
> Then I ran following command against nginx:
>
> proxmox-backup-manager acme account register proxytest root@backup.local --directory 'https://nginx-address/dir'
>
> The account could be created successfully. When the nginx
> configuration was adjusted to return any other, non-expected success
> status code, PBS rejected it as expected.
>
> ## Patch summary
>
> 0001 – [PATCH proxmox v5 1/4] acme: reduce visibility of Request type
> Restricts the visibility of the low-level Request type. Consumers
> should rely on proxmox-acme-api or AcmeClient handlers.
>
> 0002 – [PATCH proxmox v5 2/4] acme: introduce http_status module
>
> 0003 – [PATCH proxmox v5 3/4] fix #6939: acme: support servers
> returning 204 for nonce requests
> Adjusts nonce handling to support ACME servers that return HTTP 204
> (No Content) for new-nonce requests.
>
> 0004 – [PATCH proxmox v5 4/4] acme-api: add helper to load client for
> an account
> Introduces a helper function to load an ACME client instance for a
> given account. Required for the following PBS ACME refactor.
>
> 0005 – [PATCH proxmox-backup v5 1/5] acme: clean up ACME-related imports
>
> 0006 – [PATCH proxmox-backup v5 2/5] acme: include proxmox-acme-api
> dependency
> Prepares the codebase to use the factored out ACME API impl.
>
> 0007 – [PATCH proxmox-backup v5 3/5] acme: drop local AcmeClient
> Removes the local AcmeClient implementation. Represents the minimal
> set of changes to replace it with the factored out AcmeClient.
>
> 0008 – [PATCH proxmox-backup v5 4/5] acme: change API impls to use
> proxmox-acme-api handlers
>
> 0009 – [PATCH proxmox-backup v5 5/5] acme: certificate ordering through
> proxmox-acme-api
>
> Thanks for considering this patch series, I look forward to your
> feedback.
>
> Best,
> Samuel Rufinatscha
>
> ## Changelog
>
> Changes from v4 to v5:
>
> * rebased series
> * re-ordered series (proxmox-acme fix first)
> * proxmox-backup: cleaned up imports based on an initial clean-up patch
> * proxmox-acme: removed now unused post_request_raw_payload(),
> update_account_request(), deactivate_account_request()
> * proxmox-acme: removed now obsolete/unused get_authorization() and
> GetAuthorization impl
>
> Verified removal by compiling PBS, PDM, and proxmox-perl-rs
> with all features.
>
> Changes from v3 to v4:
>
> * add proxmox-acme-api as a dependency and initialize it in
> PBS so PBS can use the shared ACME API instead.
> * remove the PBS-local AcmeClient implementation and switch PBS
> over to the shared proxmox-acme async client.
> * rework PBS’ ACME API endpoints to delegate to
> proxmox-acme-api handlers instead of duplicating logic locally.
> * move PBS’ ACME certificate ordering logic over to
> proxmox-acme-api, keeping only certificate installation/reload in PBS.
> * add a load_client_with_account helper in proxmox-acme-api so PBS
> (and others) can construct an AcmeClient for a configured account
> without duplicating boilerplate.
> * hide the low-level Request type and its fields behind constructors
> / reduced visibility so changes to “expected” no longer affect the
> public API as they did in v3.
> * split out the HTTP status constants into an internal http_status
> module as a separate preparatory cleanup before the bug fix, instead
> of doing this inline like in v3.
> * Rebased on top of the refactor: keep the same behavioural fix as in
> v3 (accept 204 for newNonce with Replay-Nonce present), but implement
> it on top of the http_status module that is part of the refactor.
>
> Changes from v2 to v3:
>
> * rename `http_success` module to `http_status`
> * replace `http_success` usage
> * introduced `http_success` module to contain the http success codes
> * replaced `Vec<u16>` with `&[u16]` for expected codes to avoid allocations.
> * clarified PVE's Perl ACME client behaviour in the commit message.
> * integrated the `http_success` module, replacing `Vec<u16>` with `&[u16]`
>
> [1] Bugzilla report #6939:
> https://bugzilla.proxmox.com/show_bug.cgi?id=6939
> [2] RFC 8555 (ACME):
> https://datatracker.ietf.org/doc/html/rfc8555/#section-7.2
> [3] PVE's Perl ACME client (allows 2xx codes for nonce requests):
> https://git.proxmox.com/?p=proxmox-acme.git;a=blob;f=src/PVE/ACME.pm;h=f1e9bb7d316e3cea1e376c610b0479119217aecc;hb=HEAD#l597
> [4] Pebble ACME server:
> https://github.com/letsencrypt/pebble
> [5] PVE's Perl ACME client (performs GET request):
> https://git.proxmox.com/?p=proxmox-acme.git;a=blob;f=src/PVE/ACME.pm;h=f1e9bb7d316e3cea1e376c610b0479119217aecc;hb=HEAD#l219
>
> proxmox:
>
> Samuel Rufinatscha (4):
> acme: reduce visibility of Request type
> acme: introduce http_status module
> fix #6939: acme: support servers returning 204 for nonce requests
> acme-api: add helper to load client for an account
>
> proxmox-acme-api/src/account_api_impl.rs | 5 ++
> proxmox-acme-api/src/lib.rs | 3 +-
> proxmox-acme/src/account.rs | 102 ++---------------------
> proxmox-acme/src/async_client.rs | 8 +-
> proxmox-acme/src/authorization.rs | 30 -------
> proxmox-acme/src/client.rs | 8 +-
> proxmox-acme/src/lib.rs | 6 +-
> proxmox-acme/src/order.rs | 2 +-
> proxmox-acme/src/request.rs | 25 ++++--
> 9 files changed, 44 insertions(+), 145 deletions(-)
>
>
> proxmox-backup:
>
> Samuel Rufinatscha (5):
> acme: clean up ACME-related imports
> acme: include proxmox-acme-api dependency
> acme: drop local AcmeClient
> acme: change API impls to use proxmox-acme-api handlers
> acme: certificate ordering through proxmox-acme-api
>
> Cargo.toml | 3 +
> src/acme/client.rs | 691 -------------------------
> src/acme/mod.rs | 5 -
> src/acme/plugin.rs | 336 ------------
> src/api2/config/acme.rs | 406 ++-------------
> src/api2/node/certificates.rs | 232 ++-------
> src/api2/types/acme.rs | 98 ----
> src/api2/types/mod.rs | 3 -
> src/bin/proxmox-backup-api.rs | 2 +
> src/bin/proxmox-backup-manager.rs | 14 +-
> src/bin/proxmox-backup-proxy.rs | 15 +-
> src/bin/proxmox_backup_manager/acme.rs | 21 +-
> src/config/acme/mod.rs | 55 +-
> src/config/acme/plugin.rs | 92 +---
> src/config/node.rs | 31 +-
> src/lib.rs | 2 -
> 16 files changed, 109 insertions(+), 1897 deletions(-)
> delete mode 100644 src/acme/client.rs
> delete mode 100644 src/acme/mod.rs
> delete mode 100644 src/acme/plugin.rs
> delete mode 100644 src/api2/types/acme.rs
>
>
> Summary over all repositories:
> 25 files changed, 153 insertions(+), 2042 deletions(-)
>
> --
> Generated by git-murpp 0.8.1
>
>
>
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
* Re: [pbs-devel] [PATCH proxmox v5 1/4] acme: reduce visibility of Request type
2026-01-08 11:26 10% ` [pbs-devel] [PATCH proxmox v5 1/4] acme: reduce visibility of Request type Samuel Rufinatscha
@ 2026-01-13 13:46 5% ` Fabian Grünbichler
2026-01-14 15:07 6% ` Samuel Rufinatscha
0 siblings, 1 reply; 200+ results
From: Fabian Grünbichler @ 2026-01-13 13:46 UTC (permalink / raw)
To: Proxmox Backup Server development discussion
On January 8, 2026 12:26 pm, Samuel Rufinatscha wrote:
> Currently, the low-level ACME Request type is publicly exposed, even
> though users are expected to go through AcmeClient and
> proxmox-acme-api handlers. This patch reduces visibility so that
> the Request type and related fields/methods are crate-internal only.
it also removes a lot of public and private code entirely, not just
changing visibility.. I think those were intentionally there to allow
usage without the need to use either of the provided client
implementations (which are guarded behind feature flags).
if we say the crate should only be used via either the `client` or the
`async-client` then that's fine, but it should be made explicit and
discussed.. right now this is sort of half-way there - e.g., the
Account::new_order method was not made private, even though it makes no
sense anymore with those other methods/helpers removed..
this patch also breaks a few references in doc comments that would need
to be dropped.
a note that this breaks the current usage of proxmox-acme in PBS would
also be good to have here, if this is kept..
>
> Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
> ---
> proxmox-acme/src/account.rs | 94 ++-----------------------------
> proxmox-acme/src/async_client.rs | 2 +-
> proxmox-acme/src/authorization.rs | 30 ----------
> proxmox-acme/src/client.rs | 6 +-
> proxmox-acme/src/lib.rs | 4 --
> proxmox-acme/src/order.rs | 2 +-
> proxmox-acme/src/request.rs | 12 ++--
> 7 files changed, 16 insertions(+), 134 deletions(-)
>
> diff --git a/proxmox-acme/src/account.rs b/proxmox-acme/src/account.rs
> index f763c1e9..d8eb3e73 100644
> --- a/proxmox-acme/src/account.rs
> +++ b/proxmox-acme/src/account.rs
> @@ -8,12 +8,11 @@ use openssl::pkey::{PKey, Private};
> use serde::{Deserialize, Serialize};
> use serde_json::Value;
>
> -use crate::authorization::{Authorization, GetAuthorization};
> use crate::b64u;
> use crate::directory::Directory;
> use crate::jws::Jws;
> use crate::key::{Jwk, PublicKey};
> -use crate::order::{NewOrder, Order, OrderData};
> +use crate::order::{NewOrder, OrderData};
> use crate::request::Request;
> use crate::types::{AccountData, AccountStatus, ExternalAccountBinding};
> use crate::Error;
> @@ -92,7 +91,7 @@ impl Account {
> }
>
> /// Prepare a "POST-as-GET" request to fetch data. Low level helper.
> - pub fn get_request(&self, url: &str, nonce: &str) -> Result<Request, Error> {
> + pub(crate) fn get_request(&self, url: &str, nonce: &str) -> Result<Request, Error> {
> let key = PKey::private_key_from_pem(self.private_key.as_bytes())?;
> let body = serde_json::to_string(&Jws::new_full(
> &key,
> @@ -112,7 +111,7 @@ impl Account {
> }
>
> /// Prepare a JSON POST request. Low level helper.
> - pub fn post_request<T: Serialize>(
> + pub(crate) fn post_request<T: Serialize>(
> &self,
> url: &str,
> nonce: &str,
> @@ -136,31 +135,6 @@ impl Account {
> })
> }
>
> - /// Prepare a JSON POST request.
> - fn post_request_raw_payload(
> - &self,
> - url: &str,
> - nonce: &str,
> - payload: String,
> - ) -> Result<Request, Error> {
> - let key = PKey::private_key_from_pem(self.private_key.as_bytes())?;
> - let body = serde_json::to_string(&Jws::new_full(
> - &key,
> - Some(self.location.clone()),
> - url.to_owned(),
> - nonce.to_owned(),
> - payload,
> - )?)?;
> -
> - Ok(Request {
> - url: url.to_owned(),
> - method: "POST",
> - content_type: crate::request::JSON_CONTENT_TYPE,
> - body,
> - expected: 200,
> - })
> - }
> -
> /// Get the "key authorization" for a token.
> pub fn key_authorization(&self, token: &str) -> Result<String, Error> {
> let key = PKey::private_key_from_pem(self.private_key.as_bytes())?;
> @@ -176,64 +150,6 @@ impl Account {
> Ok(b64u::encode(digest))
> }
>
> - /// Prepare a request to update account data.
> - ///
> - /// This is a rather low level interface. You should know what you're doing.
> - pub fn update_account_request<T: Serialize>(
> - &self,
> - nonce: &str,
> - data: &T,
> - ) -> Result<Request, Error> {
> - self.post_request(&self.location, nonce, data)
> - }
> -
> - /// Prepare a request to deactivate this account.
> - pub fn deactivate_account_request<T: Serialize>(&self, nonce: &str) -> Result<Request, Error> {
> - self.post_request_raw_payload(
> - &self.location,
> - nonce,
> - r#"{"status":"deactivated"}"#.to_string(),
> - )
> - }
> -
> - /// Prepare a request to query an Authorization for an Order.
> - ///
> - /// Returns `Ok(None)` if `auth_index` is out of out of range. You can query the number of
> - /// authorizations from via [`Order::authorization_len`] or by manually inspecting its
> - /// `.data.authorization` vector.
> - pub fn get_authorization(
> - &self,
> - order: &Order,
> - auth_index: usize,
> - nonce: &str,
> - ) -> Result<Option<GetAuthorization>, Error> {
> - match order.authorization(auth_index) {
> - None => Ok(None),
> - Some(url) => Ok(Some(GetAuthorization::new(self.get_request(url, nonce)?))),
> - }
> - }
> -
> - /// Prepare a request to validate a Challenge from an Authorization.
> - ///
> - /// Returns `Ok(None)` if `challenge_index` is out of out of range. The challenge count is
> - /// available by inspecting the [`Authorization::challenges`] vector.
> - ///
> - /// This returns a raw `Request` since validation takes some time and the `Authorization`
> - /// object has to be re-queried and its `status` inspected.
> - pub fn validate_challenge(
> - &self,
> - authorization: &Authorization,
> - challenge_index: usize,
> - nonce: &str,
> - ) -> Result<Option<Request>, Error> {
> - match authorization.challenges.get(challenge_index) {
> - None => Ok(None),
> - Some(challenge) => self
> - .post_request_raw_payload(&challenge.url, nonce, "{}".to_string())
> - .map(Some),
> - }
> - }
> -
> /// Prepare a request to revoke a certificate.
> ///
> /// The certificate can be either PEM or DER formatted.
> @@ -274,7 +190,7 @@ pub struct CertificateRevocation<'a> {
>
> impl CertificateRevocation<'_> {
> /// Create the revocation request using the specified nonce for the given directory.
> - pub fn request(&self, directory: &Directory, nonce: &str) -> Result<Request, Error> {
> + pub(crate) fn request(&self, directory: &Directory, nonce: &str) -> Result<Request, Error> {
> let revoke_cert = directory.data.revoke_cert.as_ref().ok_or_else(|| {
> Error::Custom("no 'revokeCert' URL specified by provider".to_string())
> })?;
> @@ -364,7 +280,7 @@ impl AccountCreator {
> /// the resulting request.
> /// Changing the private key between using the request and passing the response to
> /// [`response`](AccountCreator::response()) will render the account unusable!
> - pub fn request(&self, directory: &Directory, nonce: &str) -> Result<Request, Error> {
> + pub(crate) fn request(&self, directory: &Directory, nonce: &str) -> Result<Request, Error> {
> let key = self.key.as_deref().ok_or(Error::MissingKey)?;
> let url = directory.new_account_url().ok_or_else(|| {
> Error::Custom("no 'newAccount' URL specified by provider".to_string())
> diff --git a/proxmox-acme/src/async_client.rs b/proxmox-acme/src/async_client.rs
> index dc755fb9..2ff3ba22 100644
> --- a/proxmox-acme/src/async_client.rs
> +++ b/proxmox-acme/src/async_client.rs
> @@ -10,7 +10,7 @@ use proxmox_http::{client::Client, Body};
>
> use crate::account::AccountCreator;
> use crate::order::{Order, OrderData};
> -use crate::Request as AcmeRequest;
> +use crate::request::Request as AcmeRequest;
> use crate::{Account, Authorization, Challenge, Directory, Error, ErrorResponse};
>
> /// A non-blocking Acme client using tokio/hyper.
> diff --git a/proxmox-acme/src/authorization.rs b/proxmox-acme/src/authorization.rs
> index 28bc1b4b..7027381a 100644
> --- a/proxmox-acme/src/authorization.rs
> +++ b/proxmox-acme/src/authorization.rs
> @@ -6,8 +6,6 @@ use serde::{Deserialize, Serialize};
> use serde_json::Value;
>
> use crate::order::Identifier;
> -use crate::request::Request;
> -use crate::Error;
>
> /// Status of an [`Authorization`].
> #[derive(Clone, Copy, Debug, Eq, PartialEq, Deserialize, Serialize)]
> @@ -132,31 +130,3 @@ impl Challenge {
> fn is_false(b: &bool) -> bool {
> !*b
> }
> -
> -/// Represents an in-flight query for an authorization.
> -///
> -/// This is created via [`Account::get_authorization`](crate::Account::get_authorization()).
> -pub struct GetAuthorization {
> - //order: OrderData,
> - /// The request to send to the ACME provider. This is wrapped in an option in order to allow
> - /// moving it out instead of copying the contents.
> - ///
> - /// When generated via [`Account::get_authorization`](crate::Account::get_authorization()),
> - /// this is guaranteed to be `Some`.
> - ///
> - /// The response should be passed to the the [`response`](GetAuthorization::response()) method.
> - pub request: Option<Request>,
> -}
> -
> -impl GetAuthorization {
> - pub(crate) fn new(request: Request) -> Self {
> - Self {
> - request: Some(request),
> - }
> - }
> -
> - /// Deal with the response we got from the server.
> - pub fn response(self, response_body: &[u8]) -> Result<Authorization, Error> {
> - Ok(serde_json::from_slice(response_body)?)
> - }
> -}
> diff --git a/proxmox-acme/src/client.rs b/proxmox-acme/src/client.rs
> index 931f7245..5c812567 100644
> --- a/proxmox-acme/src/client.rs
> +++ b/proxmox-acme/src/client.rs
> @@ -7,8 +7,8 @@ use serde::{Deserialize, Serialize};
> use crate::b64u;
> use crate::error;
> use crate::order::OrderData;
> -use crate::request::ErrorResponse;
> -use crate::{Account, Authorization, Challenge, Directory, Error, Order, Request};
> +use crate::request::{ErrorResponse, Request};
> +use crate::{Account, Authorization, Challenge, Directory, Error, Order};
>
> macro_rules! format_err {
> ($($fmt:tt)*) => { Error::Client(format!($($fmt)*)) };
> @@ -564,7 +564,7 @@ impl Client {
> }
>
> /// Low-level API to run an n API request. This automatically updates the current nonce!
> - pub fn run_request(&mut self, request: Request) -> Result<HttpResponse, Error> {
> + pub(crate) fn run_request(&mut self, request: Request) -> Result<HttpResponse, Error> {
> self.inner.run_request(request)
> }
>
> diff --git a/proxmox-acme/src/lib.rs b/proxmox-acme/src/lib.rs
> index df722629..6722030c 100644
> --- a/proxmox-acme/src/lib.rs
> +++ b/proxmox-acme/src/lib.rs
> @@ -66,10 +66,6 @@ pub use error::Error;
> #[doc(inline)]
> pub use order::Order;
>
> -#[cfg(feature = "impl")]
> -#[doc(inline)]
> -pub use request::Request;
> -
> // we don't inline these:
> #[cfg(feature = "impl")]
> pub use order::NewOrder;
> diff --git a/proxmox-acme/src/order.rs b/proxmox-acme/src/order.rs
> index b6551004..432a81a4 100644
> --- a/proxmox-acme/src/order.rs
> +++ b/proxmox-acme/src/order.rs
> @@ -153,7 +153,7 @@ pub struct NewOrder {
> //order: OrderData,
> /// The request to execute to place the order. When creating a [`NewOrder`] via
> /// [`Account::new_order`](crate::Account::new_order) this is guaranteed to be `Some`.
> - pub request: Option<Request>,
> + pub(crate) request: Option<Request>,
> }
>
> impl NewOrder {
> diff --git a/proxmox-acme/src/request.rs b/proxmox-acme/src/request.rs
> index 78a90913..dadfc5af 100644
> --- a/proxmox-acme/src/request.rs
> +++ b/proxmox-acme/src/request.rs
> @@ -4,21 +4,21 @@ pub(crate) const JSON_CONTENT_TYPE: &str = "application/jose+json";
> pub(crate) const CREATED: u16 = 201;
>
> /// A request which should be performed on the ACME provider.
> -pub struct Request {
> +pub(crate) struct Request {
> /// The complete URL to send the request to.
> - pub url: String,
> + pub(crate) url: String,
>
> /// The HTTP method name to use.
> - pub method: &'static str,
> + pub(crate) method: &'static str,
>
> /// The `Content-Type` header to pass along.
> - pub content_type: &'static str,
> + pub(crate) content_type: &'static str,
>
> /// The body to pass along with request, or an empty string.
> - pub body: String,
> + pub(crate) body: String,
>
> /// The expected status code a compliant ACME provider will return on success.
> - pub expected: u16,
> + pub(crate) expected: u16,
> }
>
> /// An ACME error response contains a specially formatted type string, and can optionally
> --
> 2.47.3
>
>
>
>
>
>
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
* Re: [pbs-devel] [PATCH proxmox-backup v5 2/5] acme: include proxmox-acme-api dependency
2026-01-08 11:26 15% ` [pbs-devel] [PATCH proxmox-backup v5 2/5] acme: include proxmox-acme-api dependency Samuel Rufinatscha
@ 2026-01-13 13:45 5% ` Fabian Grünbichler
2026-01-13 16:41 6% ` Samuel Rufinatscha
0 siblings, 1 reply; 200+ results
From: Fabian Grünbichler @ 2026-01-13 13:45 UTC (permalink / raw)
To: Proxmox Backup Server development discussion
On January 8, 2026 12:26 pm, Samuel Rufinatscha wrote:
> PBS currently uses its own ACME client and API logic, while PDM uses the
> factored out proxmox-acme and proxmox-acme-api crates. This duplication
> risks differences in behaviour and requires ACME maintenance in two
> places. This patch is part of a series to move PBS over to the shared
> ACME stack.
this doesn't need to be in nearly every commit here.
adding the dependency and initializing things without using them also
has no stand-alone value, so this doesn't need to be its own commit.
we could have two commits:
- add proxmox-acme-api and use it for client and API
- remove src/acme since it is now unused
or three or more if you want to split out some of the API replacement where
there isn't a 1:1 relation between old and new code..
>
> Changes:
> - Add proxmox-acme-api with the "impl" feature as a dependency.
> - Initialize proxmox_acme_api in proxmox-backup- api, manager and proxy.
> * Inits PBS config dir /acme as proxmox ACME directory
>
> Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
> ---
> Cargo.toml | 3 +++
> src/bin/proxmox-backup-api.rs | 2 ++
> src/bin/proxmox-backup-manager.rs | 2 ++
> src/bin/proxmox-backup-proxy.rs | 1 +
> 4 files changed, 8 insertions(+)
>
> diff --git a/Cargo.toml b/Cargo.toml
> index 1aa57ae5..feae351d 100644
> --- a/Cargo.toml
> +++ b/Cargo.toml
> @@ -101,6 +101,7 @@ pbs-api-types = "1.0.8"
> # other proxmox crates
> pathpatterns = "1"
> proxmox-acme = "1"
> +proxmox-acme-api = { version = "1", features = [ "impl" ] }
> pxar = "1"
>
> # PBS workspace
> @@ -251,6 +252,7 @@ pbs-api-types.workspace = true
>
> # in their respective repo
> proxmox-acme.workspace = true
> +proxmox-acme-api.workspace = true
> pxar.workspace = true
>
> # proxmox-backup workspace/internal crates
> @@ -269,6 +271,7 @@ proxmox-rrd-api-types.workspace = true
> [patch.crates-io]
> #pbs-api-types = { path = "../proxmox/pbs-api-types" }
> #proxmox-acme = { path = "../proxmox/proxmox-acme" }
> +#proxmox-acme-api = { path = "../proxmox/proxmox-acme-api" }
> #proxmox-api-macro = { path = "../proxmox/proxmox-api-macro" }
> #proxmox-apt = { path = "../proxmox/proxmox-apt" }
> #proxmox-apt-api-types = { path = "../proxmox/proxmox-apt-api-types" }
> diff --git a/src/bin/proxmox-backup-api.rs b/src/bin/proxmox-backup-api.rs
> index 417e9e97..d0091dca 100644
> --- a/src/bin/proxmox-backup-api.rs
> +++ b/src/bin/proxmox-backup-api.rs
> @@ -14,6 +14,7 @@ use proxmox_rest_server::{ApiConfig, RestServer};
> use proxmox_router::RpcEnvironmentType;
> use proxmox_sys::fs::CreateOptions;
>
> +use pbs_buildcfg::configdir;
> use proxmox_backup::auth_helpers::*;
> use proxmox_backup::config;
> use proxmox_backup::server::auth::check_pbs_auth;
> @@ -78,6 +79,7 @@ async fn run() -> Result<(), Error> {
> let mut command_sock = proxmox_daemon::command_socket::CommandSocket::new(backup_user.gid);
>
> proxmox_product_config::init(backup_user.clone(), pbs_config::priv_user()?);
> + proxmox_acme_api::init(configdir!("/acme"), true)?;
>
> let dir_opts = CreateOptions::new()
> .owner(backup_user.uid)
> diff --git a/src/bin/proxmox-backup-manager.rs b/src/bin/proxmox-backup-manager.rs
> index f8365070..30bc8da9 100644
> --- a/src/bin/proxmox-backup-manager.rs
> +++ b/src/bin/proxmox-backup-manager.rs
> @@ -19,6 +19,7 @@ use proxmox_router::{cli::*, RpcEnvironment};
> use proxmox_schema::api;
> use proxmox_sys::fs::CreateOptions;
>
> +use pbs_buildcfg::configdir;
> use pbs_client::{display_task_log, view_task_result};
> use pbs_config::sync;
> use pbs_tools::json::required_string_param;
> @@ -667,6 +668,7 @@ async fn run() -> Result<(), Error> {
> .init()?;
> proxmox_backup::server::notifications::init()?;
> proxmox_product_config::init(pbs_config::backup_user()?, pbs_config::priv_user()?);
> + proxmox_acme_api::init(configdir!("/acme"), false)?;
>
> let cmd_def = CliCommandMap::new()
> .insert("acl", acl_commands())
> diff --git a/src/bin/proxmox-backup-proxy.rs b/src/bin/proxmox-backup-proxy.rs
> index 870208fe..eea44a7d 100644
> --- a/src/bin/proxmox-backup-proxy.rs
> +++ b/src/bin/proxmox-backup-proxy.rs
> @@ -188,6 +188,7 @@ async fn run() -> Result<(), Error> {
> proxmox_backup::server::notifications::init()?;
> metric_collection::init()?;
> proxmox_product_config::init(pbs_config::backup_user()?, pbs_config::priv_user()?);
> + proxmox_acme_api::init(configdir!("/acme"), false)?;
>
> let mut indexpath = PathBuf::from(pbs_buildcfg::JS_DIR);
> indexpath.push("index.hbs");
> --
> 2.47.3
>
>
>
>
>
>
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
* Re: [pbs-devel] [PATCH proxmox v5 2/4] acme: introduce http_status module
2026-01-08 11:26 15% ` [pbs-devel] [PATCH proxmox v5 2/4] acme: introduce http_status module Samuel Rufinatscha
@ 2026-01-13 13:45 5% ` Fabian Grünbichler
2026-01-14 10:29 6% ` Samuel Rufinatscha
0 siblings, 1 reply; 200+ results
From: Fabian Grünbichler @ 2026-01-13 13:45 UTC (permalink / raw)
To: Proxmox Backup Server development discussion
On January 8, 2026 12:26 pm, Samuel Rufinatscha wrote:
> Introduce an internal http_status module with the common ACME HTTP
> response codes, and replace use of crate::request::CREATED as well as
> direct numeric status code usages.
why not use http::status? we already have this as a dependency pretty
much everywhere we do anything HTTP related.. would also allow for nicer
error messages in case the status is not as expected..
>
> Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
> ---
> proxmox-acme/src/account.rs | 8 ++++----
> proxmox-acme/src/async_client.rs | 4 ++--
> proxmox-acme/src/lib.rs | 2 ++
> proxmox-acme/src/request.rs | 11 ++++++++++-
> 4 files changed, 18 insertions(+), 7 deletions(-)
>
> diff --git a/proxmox-acme/src/account.rs b/proxmox-acme/src/account.rs
> index d8eb3e73..ea1a3c60 100644
> --- a/proxmox-acme/src/account.rs
> +++ b/proxmox-acme/src/account.rs
> @@ -84,7 +84,7 @@ impl Account {
> method: "POST",
> content_type: crate::request::JSON_CONTENT_TYPE,
> body,
> - expected: crate::request::CREATED,
> + expected: crate::http_status::CREATED,
> };
>
> Ok(NewOrder::new(request))
> @@ -106,7 +106,7 @@ impl Account {
> method: "POST",
> content_type: crate::request::JSON_CONTENT_TYPE,
> body,
> - expected: 200,
> + expected: crate::http_status::OK,
> })
> }
>
> @@ -131,7 +131,7 @@ impl Account {
> method: "POST",
> content_type: crate::request::JSON_CONTENT_TYPE,
> body,
> - expected: 200,
> + expected: crate::http_status::OK,
> })
> }
>
> @@ -321,7 +321,7 @@ impl AccountCreator {
> method: "POST",
> content_type: crate::request::JSON_CONTENT_TYPE,
> body,
> - expected: crate::request::CREATED,
> + expected: crate::http_status::CREATED,
> })
> }
>
> diff --git a/proxmox-acme/src/async_client.rs b/proxmox-acme/src/async_client.rs
> index 2ff3ba22..043648bb 100644
> --- a/proxmox-acme/src/async_client.rs
> +++ b/proxmox-acme/src/async_client.rs
> @@ -498,7 +498,7 @@ impl AcmeClient {
> method: "GET",
> content_type: "",
> body: String::new(),
> - expected: 200,
> + expected: crate::http_status::OK,
> },
> nonce,
> )
> @@ -550,7 +550,7 @@ impl AcmeClient {
> method: "HEAD",
> content_type: "",
> body: String::new(),
> - expected: 200,
> + expected: crate::http_status::OK,
> },
> nonce,
> )
> diff --git a/proxmox-acme/src/lib.rs b/proxmox-acme/src/lib.rs
> index 6722030c..6051a025 100644
> --- a/proxmox-acme/src/lib.rs
> +++ b/proxmox-acme/src/lib.rs
> @@ -70,6 +70,8 @@ pub use order::Order;
> #[cfg(feature = "impl")]
> pub use order::NewOrder;
> #[cfg(feature = "impl")]
> +pub(crate) use request::http_status;
> +#[cfg(feature = "impl")]
> pub use request::ErrorResponse;
>
> /// Header name for nonces.
> diff --git a/proxmox-acme/src/request.rs b/proxmox-acme/src/request.rs
> index dadfc5af..341ce53e 100644
> --- a/proxmox-acme/src/request.rs
> +++ b/proxmox-acme/src/request.rs
> @@ -1,7 +1,6 @@
> use serde::Deserialize;
>
> pub(crate) const JSON_CONTENT_TYPE: &str = "application/jose+json";
> -pub(crate) const CREATED: u16 = 201;
>
> /// A request which should be performed on the ACME provider.
> pub(crate) struct Request {
> @@ -21,6 +20,16 @@ pub(crate) struct Request {
> pub(crate) expected: u16,
> }
>
> +/// Common HTTP status codes used in ACME responses.
> +pub(crate) mod http_status {
> + /// 200 OK
> + pub(crate) const OK: u16 = 200;
> + /// 201 Created
> + pub(crate) const CREATED: u16 = 201;
> + /// 204 No Content
> + pub(crate) const NO_CONTENT: u16 = 204;
> +}
> +
> /// An ACME error response contains a specially formatted type string, and can optionally
> /// contain textual details and a set of sub problems.
> #[derive(Clone, Debug, Deserialize)]
> --
> 2.47.3
>
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
* [pbs-devel] applied: [PATCH proxmox-backup v5 1/5] acme: clean up ACME-related imports
2026-01-08 11:26 13% ` [pbs-devel] [PATCH proxmox-backup v5 1/5] acme: clean up ACME-related imports Samuel Rufinatscha
@ 2026-01-13 13:45 5% ` Fabian Grünbichler
0 siblings, 0 replies; 200+ results
From: Fabian Grünbichler @ 2026-01-13 13:45 UTC (permalink / raw)
To: Proxmox Backup Server development discussion
applied this one, since it is independent.
On January 8, 2026 12:26 pm, Samuel Rufinatscha wrote:
> Clean up ACME-related imports to make it easier to switch to
> the factored out proxmox/ ACME implementation later.
>
> Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
> ---
> src/acme/plugin.rs | 3 +--
> src/api2/config/acme.rs | 10 ++++------
> src/api2/node/certificates.rs | 7 +++----
> src/api2/types/acme.rs | 3 +--
> src/bin/proxmox-backup-manager.rs | 12 +++++-------
> src/bin/proxmox-backup-proxy.rs | 14 ++++++--------
> src/config/acme/mod.rs | 3 +--
> src/config/acme/plugin.rs | 2 +-
> src/config/node.rs | 6 ++----
> 9 files changed, 24 insertions(+), 36 deletions(-)
>
> diff --git a/src/acme/plugin.rs b/src/acme/plugin.rs
> index f756e9b5..993d729b 100644
> --- a/src/acme/plugin.rs
> +++ b/src/acme/plugin.rs
> @@ -19,11 +19,10 @@ use tokio::net::TcpListener;
> use tokio::process::Command;
>
> use proxmox_acme::{Authorization, Challenge};
> +use proxmox_rest_server::WorkerTask;
>
> use crate::acme::AcmeClient;
> use crate::api2::types::AcmeDomain;
> -use proxmox_rest_server::WorkerTask;
> -
> use crate::config::acme::plugin::{DnsPlugin, PluginData};
>
> const PROXMOX_ACME_SH_PATH: &str = "/usr/share/proxmox-acme/proxmox-acme";
> diff --git a/src/api2/config/acme.rs b/src/api2/config/acme.rs
> index 35c3fb77..18671639 100644
> --- a/src/api2/config/acme.rs
> +++ b/src/api2/config/acme.rs
> @@ -10,22 +10,20 @@ use serde::{Deserialize, Serialize};
> use serde_json::{json, Value};
> use tracing::{info, warn};
>
> +use pbs_api_types::{Authid, PRIV_SYS_MODIFY};
> +use proxmox_acme::types::AccountData as AcmeAccountData;
> +use proxmox_acme::Account;
> +use proxmox_rest_server::WorkerTask;
> use proxmox_router::{
> http_bail, list_subdirs_api_method, Permission, Router, RpcEnvironment, SubdirMap,
> };
> use proxmox_schema::{api, param_bail};
>
> -use proxmox_acme::types::AccountData as AcmeAccountData;
> -use proxmox_acme::Account;
> -
> -use pbs_api_types::{Authid, PRIV_SYS_MODIFY};
> -
> use crate::acme::AcmeClient;
> use crate::api2::types::{AcmeAccountName, AcmeChallengeSchema, KnownAcmeDirectory};
> use crate::config::acme::plugin::{
> self, DnsPlugin, DnsPluginCore, DnsPluginCoreUpdater, PLUGIN_ID_SCHEMA,
> };
> -use proxmox_rest_server::WorkerTask;
>
> pub(crate) const ROUTER: Router = Router::new()
> .get(&list_subdirs_api_method!(SUBDIRS))
> diff --git a/src/api2/node/certificates.rs b/src/api2/node/certificates.rs
> index 61ef910e..6b1d87d2 100644
> --- a/src/api2/node/certificates.rs
> +++ b/src/api2/node/certificates.rs
> @@ -5,23 +5,22 @@ use anyhow::{bail, format_err, Error};
> use openssl::pkey::PKey;
> use openssl::x509::X509;
> use serde::{Deserialize, Serialize};
> -use tracing::info;
> +use tracing::{info, warn};
>
> +use pbs_api_types::{NODE_SCHEMA, PRIV_SYS_MODIFY};
> +use proxmox_rest_server::WorkerTask;
> use proxmox_router::list_subdirs_api_method;
> use proxmox_router::SubdirMap;
> use proxmox_router::{Permission, Router, RpcEnvironment};
> use proxmox_schema::api;
>
> -use pbs_api_types::{NODE_SCHEMA, PRIV_SYS_MODIFY};
> use pbs_buildcfg::configdir;
> use pbs_tools::cert;
> -use tracing::warn;
>
> use crate::acme::AcmeClient;
> use crate::api2::types::AcmeDomain;
> use crate::config::node::NodeConfig;
> use crate::server::send_certificate_renewal_mail;
> -use proxmox_rest_server::WorkerTask;
>
> pub const ROUTER: Router = Router::new()
> .get(&list_subdirs_api_method!(SUBDIRS))
> diff --git a/src/api2/types/acme.rs b/src/api2/types/acme.rs
> index 210ebdbc..8661f9e8 100644
> --- a/src/api2/types/acme.rs
> +++ b/src/api2/types/acme.rs
> @@ -1,9 +1,8 @@
> use serde::{Deserialize, Serialize};
> use serde_json::Value;
>
> -use proxmox_schema::{api, ApiStringFormat, ApiType, Schema, StringSchema};
> -
> use pbs_api_types::{DNS_ALIAS_FORMAT, DNS_NAME_FORMAT, PROXMOX_SAFE_ID_FORMAT};
> +use proxmox_schema::{api, ApiStringFormat, ApiType, Schema, StringSchema};
>
> #[api(
> properties: {
> diff --git a/src/bin/proxmox-backup-manager.rs b/src/bin/proxmox-backup-manager.rs
> index d9f41353..f8365070 100644
> --- a/src/bin/proxmox-backup-manager.rs
> +++ b/src/bin/proxmox-backup-manager.rs
> @@ -5,10 +5,6 @@ use std::str::FromStr;
> use anyhow::{format_err, Error};
> use serde_json::{json, Value};
>
> -use proxmox_router::{cli::*, RpcEnvironment};
> -use proxmox_schema::api;
> -use proxmox_sys::fs::CreateOptions;
> -
> use pbs_api_types::percent_encoding::percent_encode_component;
> use pbs_api_types::{
> BackupNamespace, GroupFilter, RateLimitConfig, SyncDirection, SyncJobConfig, DATASTORE_SCHEMA,
> @@ -18,12 +14,14 @@ use pbs_api_types::{
> VERIFICATION_OUTDATED_AFTER_SCHEMA, VERIFY_JOB_READ_THREADS_SCHEMA,
> VERIFY_JOB_VERIFY_THREADS_SCHEMA,
> };
> +use proxmox_rest_server::wait_for_local_worker;
> +use proxmox_router::{cli::*, RpcEnvironment};
> +use proxmox_schema::api;
> +use proxmox_sys::fs::CreateOptions;
> +
> use pbs_client::{display_task_log, view_task_result};
> use pbs_config::sync;
> use pbs_tools::json::required_string_param;
> -
> -use proxmox_rest_server::wait_for_local_worker;
> -
> use proxmox_backup::api2;
> use proxmox_backup::client_helpers::connect_to_localhost;
> use proxmox_backup::config;
> diff --git a/src/bin/proxmox-backup-proxy.rs b/src/bin/proxmox-backup-proxy.rs
> index 92a8cb3c..870208fe 100644
> --- a/src/bin/proxmox-backup-proxy.rs
> +++ b/src/bin/proxmox-backup-proxy.rs
> @@ -9,27 +9,25 @@ use hyper::http::request::Parts;
> use hyper::http::Response;
> use hyper::StatusCode;
> use hyper_util::server::graceful::GracefulShutdown;
> +use openssl::ssl::SslAcceptor;
> +use serde_json::{json, Value};
> use tracing::level_filters::LevelFilter;
> use tracing::{info, warn};
> use url::form_urlencoded;
>
> -use openssl::ssl::SslAcceptor;
> -use serde_json::{json, Value};
> -
> use proxmox_http::Body;
> use proxmox_http::RateLimiterTag;
> use proxmox_lang::try_block;
> +use proxmox_rest_server::{
> + cleanup_old_tasks, cookie_from_header, rotate_task_log_archive, ApiConfig, Redirector,
> + RestEnvironment, RestServer, WorkerTask,
> +};
> use proxmox_router::{RpcEnvironment, RpcEnvironmentType};
> use proxmox_sys::fs::CreateOptions;
> use proxmox_sys::logrotate::LogRotate;
>
> use pbs_datastore::DataStore;
>
> -use proxmox_rest_server::{
> - cleanup_old_tasks, cookie_from_header, rotate_task_log_archive, ApiConfig, Redirector,
> - RestEnvironment, RestServer, WorkerTask,
> -};
> -
> use proxmox_backup::{
> server::{
> auth::check_pbs_auth,
> diff --git a/src/config/acme/mod.rs b/src/config/acme/mod.rs
> index 274a23fd..ac89ae5e 100644
> --- a/src/config/acme/mod.rs
> +++ b/src/config/acme/mod.rs
> @@ -5,11 +5,10 @@ use std::path::Path;
> use anyhow::{bail, format_err, Error};
> use serde_json::Value;
>
> +use pbs_api_types::PROXMOX_SAFE_ID_REGEX;
> use proxmox_sys::error::SysError;
> use proxmox_sys::fs::{file_read_string, CreateOptions};
>
> -use pbs_api_types::PROXMOX_SAFE_ID_REGEX;
> -
> use crate::api2::types::{AcmeAccountName, AcmeChallengeSchema, KnownAcmeDirectory};
>
> pub(crate) const ACME_DIR: &str = pbs_buildcfg::configdir!("/acme");
> diff --git a/src/config/acme/plugin.rs b/src/config/acme/plugin.rs
> index 18e71199..8ce852ec 100644
> --- a/src/config/acme/plugin.rs
> +++ b/src/config/acme/plugin.rs
> @@ -4,10 +4,10 @@ use anyhow::Error;
> use serde::{Deserialize, Serialize};
> use serde_json::Value;
>
> +use pbs_api_types::PROXMOX_SAFE_ID_FORMAT;
> use proxmox_schema::{api, ApiType, Schema, StringSchema, Updater};
> use proxmox_section_config::{SectionConfig, SectionConfigData, SectionConfigPlugin};
>
> -use pbs_api_types::PROXMOX_SAFE_ID_FORMAT;
> use pbs_config::{open_backup_lockfile, BackupLockGuard};
>
> pub const PLUGIN_ID_SCHEMA: Schema = StringSchema::new("ACME Challenge Plugin ID.")
> diff --git a/src/config/node.rs b/src/config/node.rs
> index d2d6e383..253b2e36 100644
> --- a/src/config/node.rs
> +++ b/src/config/node.rs
> @@ -4,14 +4,12 @@ use anyhow::{bail, Error};
> use openssl::ssl::{SslAcceptor, SslMethod};
> use serde::{Deserialize, Serialize};
>
> -use proxmox_schema::{api, ApiStringFormat, ApiType, Updater};
> -
> -use proxmox_http::ProxyConfig;
> -
> use pbs_api_types::{
> EMAIL_SCHEMA, MULTI_LINE_COMMENT_SCHEMA, OPENSSL_CIPHERS_TLS_1_2_SCHEMA,
> OPENSSL_CIPHERS_TLS_1_3_SCHEMA,
> };
> +use proxmox_http::ProxyConfig;
> +use proxmox_schema::{api, ApiStringFormat, ApiType, Updater};
>
> use pbs_buildcfg::configdir;
> use pbs_config::{open_backup_lockfile, BackupLockGuard};
> --
> 2.47.3
>
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
* Re: [pbs-devel] [PATCH proxmox-backup v5 4/5] acme: change API impls to use proxmox-acme-api handlers
2026-01-08 11:26 8% ` [pbs-devel] [PATCH proxmox-backup v5 4/5] acme: change API impls to use proxmox-acme-api handlers Samuel Rufinatscha
@ 2026-01-13 13:45 5% ` Fabian Grünbichler
2026-01-13 16:53 6% ` Samuel Rufinatscha
0 siblings, 1 reply; 200+ results
From: Fabian Grünbichler @ 2026-01-13 13:45 UTC (permalink / raw)
To: Proxmox Backup Server development discussion
On January 8, 2026 12:26 pm, Samuel Rufinatscha wrote:
> PBS currently uses its own ACME client and API logic, while PDM uses the
> factored out proxmox-acme and proxmox-acme-api crates. This duplication
> risks differences in behaviour and requires ACME maintenance in two
> places. This patch is part of a series to move PBS over to the shared
> ACME stack.
>
> Changes:
> - Replace api2/config/acme.rs API logic with proxmox-acme-api handlers.
> - Drop local caching and helper types that duplicate proxmox-acme-api.
>
> Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
> ---
> src/api2/config/acme.rs | 378 ++-----------------------
> src/api2/types/acme.rs | 16 --
> src/bin/proxmox_backup_manager/acme.rs | 6 +-
> src/config/acme/mod.rs | 44 +--
> 4 files changed, 33 insertions(+), 411 deletions(-)
>
> diff --git a/src/api2/config/acme.rs b/src/api2/config/acme.rs
> index 898f06dd..3314430c 100644
> --- a/src/api2/config/acme.rs
> +++ b/src/api2/config/acme.rs
> @@ -1,29 +1,18 @@
> -use std::fs;
> -use std::ops::ControlFlow;
> -use std::path::Path;
nit: this one is actually still used below
> -use std::sync::{Arc, LazyLock, Mutex};
> -use std::time::SystemTime;
> -
> -use anyhow::{bail, format_err, Error};
> -use hex::FromHex;
> -use serde::{Deserialize, Serialize};
> -use serde_json::{json, Value};
> -use tracing::{info, warn};
> +use anyhow::Error;
> +use tracing::info;
>
> use pbs_api_types::{Authid, PRIV_SYS_MODIFY};
> -use proxmox_acme::async_client::AcmeClient;
> -use proxmox_acme::types::AccountData as AcmeAccountData;
> -use proxmox_acme_api::AcmeAccountName;
> +use proxmox_acme_api::{
> + AccountEntry, AccountInfo, AcmeAccountName, AcmeChallengeSchema, ChallengeSchemaWrapper,
> + DeletablePluginProperty, DnsPluginCore, DnsPluginCoreUpdater, KnownAcmeDirectory, PluginConfig,
> + DEFAULT_ACME_DIRECTORY_ENTRY, PLUGIN_ID_SCHEMA,
> +};
> +use proxmox_config_digest::ConfigDigest;
> use proxmox_rest_server::WorkerTask;
> use proxmox_router::{
> http_bail, list_subdirs_api_method, Permission, Router, RpcEnvironment, SubdirMap,
> };
> -use proxmox_schema::{api, param_bail};
> -
> -use crate::api2::types::{AcmeChallengeSchema, KnownAcmeDirectory};
> -use crate::config::acme::plugin::{
> - self, DnsPlugin, DnsPluginCore, DnsPluginCoreUpdater, PLUGIN_ID_SCHEMA,
> -};
> +use proxmox_schema::api;
>
> pub(crate) const ROUTER: Router = Router::new()
> .get(&list_subdirs_api_method!(SUBDIRS))
> @@ -65,19 +54,6 @@ const PLUGIN_ITEM_ROUTER: Router = Router::new()
> .put(&API_METHOD_UPDATE_PLUGIN)
> .delete(&API_METHOD_DELETE_PLUGIN);
>
> -#[api(
> - properties: {
> - name: { type: AcmeAccountName },
> - },
> -)]
> -/// An ACME Account entry.
> -///
> -/// Currently only contains a 'name' property.
> -#[derive(Serialize)]
> -pub struct AccountEntry {
> - name: AcmeAccountName,
> -}
> -
> #[api(
> access: {
> permission: &Permission::Privilege(&["system", "certificates"], PRIV_SYS_MODIFY, false),
> @@ -91,40 +67,7 @@ pub struct AccountEntry {
> )]
> /// List ACME accounts.
> pub fn list_accounts() -> Result<Vec<AccountEntry>, Error> {
> - let mut entries = Vec::new();
> - crate::config::acme::foreach_acme_account(|name| {
> - entries.push(AccountEntry { name });
> - ControlFlow::Continue(())
> - })?;
> - Ok(entries)
> -}
> -
> -#[api(
> - properties: {
> - account: { type: Object, properties: {}, additional_properties: true },
> - tos: {
> - type: String,
> - optional: true,
> - },
> - },
> -)]
> -/// ACME Account information.
> -///
> -/// This is what we return via the API.
> -#[derive(Serialize)]
> -pub struct AccountInfo {
> - /// Raw account data.
> - account: AcmeAccountData,
> -
> - /// The ACME directory URL the account was created at.
> - directory: String,
> -
> - /// The account's own URL within the ACME directory.
> - location: String,
> -
> - /// The ToS URL, if the user agreed to one.
> - #[serde(skip_serializing_if = "Option::is_none")]
> - tos: Option<String>,
> + proxmox_acme_api::list_accounts()
> }
>
> #[api(
> @@ -141,23 +84,7 @@ pub struct AccountInfo {
> )]
> /// Return existing ACME account information.
> pub async fn get_account(name: AcmeAccountName) -> Result<AccountInfo, Error> {
> - let account_info = proxmox_acme_api::get_account(name).await?;
> -
> - Ok(AccountInfo {
> - location: account_info.location,
> - tos: account_info.tos,
> - directory: account_info.directory,
> - account: AcmeAccountData {
> - only_return_existing: false, // don't actually write this out in case it's set
> - ..account_info.account
> - },
> - })
> -}
> -
> -fn account_contact_from_string(s: &str) -> Vec<String> {
> - s.split(&[' ', ';', ',', '\0'][..])
> - .map(|s| format!("mailto:{s}"))
> - .collect()
> + proxmox_acme_api::get_account(name).await
> }
>
> #[api(
> @@ -222,15 +149,11 @@ fn register_account(
> );
> }
>
> - if Path::new(&crate::config::acme::account_path(&name)).exists() {
> + if std::path::Path::new(&proxmox_acme_api::account_config_filename(&name)).exists() {
here ^
> http_bail!(BAD_REQUEST, "account {} already exists", name);
> }
>
> - let directory = directory.unwrap_or_else(|| {
> - crate::config::acme::DEFAULT_ACME_DIRECTORY_ENTRY
> - .url
> - .to_owned()
> - });
> + let directory = directory.unwrap_or_else(|| DEFAULT_ACME_DIRECTORY_ENTRY.url.to_string());
>
> WorkerTask::spawn(
> "acme-register",
> @@ -286,17 +209,7 @@ pub fn update_account(
> auth_id.to_string(),
> true,
> move |_worker| async move {
> - let data = match contact {
> - Some(data) => json!({
> - "contact": account_contact_from_string(&data),
> - }),
> - None => json!({}),
> - };
> -
> - proxmox_acme_api::load_client_with_account(&name)
> - .await?
> - .update_account(&data)
> - .await?;
> + proxmox_acme_api::update_account(&name, contact).await?;
>
> Ok(())
> },
> @@ -334,18 +247,8 @@ pub fn deactivate_account(
> auth_id.to_string(),
> true,
> move |_worker| async move {
> - match proxmox_acme_api::load_client_with_account(&name)
> - .await?
> - .update_account(&json!({"status": "deactivated"}))
> - .await
> - {
> - Ok(_account) => (),
> - Err(err) if !force => return Err(err),
> - Err(err) => {
> - warn!("error deactivating account {name}, proceeding anyway - {err}");
> - }
> - }
> - crate::config::acme::mark_account_deactivated(&name)?;
> + proxmox_acme_api::deactivate_account(&name, force).await?;
> +
> Ok(())
> },
> )
> @@ -372,15 +275,7 @@ pub fn deactivate_account(
> )]
> /// Get the Terms of Service URL for an ACME directory.
> async fn get_tos(directory: Option<String>) -> Result<Option<String>, Error> {
> - let directory = directory.unwrap_or_else(|| {
> - crate::config::acme::DEFAULT_ACME_DIRECTORY_ENTRY
> - .url
> - .to_owned()
> - });
> - Ok(AcmeClient::new(directory)
> - .terms_of_service_url()
> - .await?
> - .map(str::to_owned))
> + proxmox_acme_api::get_tos(directory).await
> }
>
> #[api(
> @@ -395,52 +290,7 @@ async fn get_tos(directory: Option<String>) -> Result<Option<String>, Error> {
> )]
> /// Get named known ACME directory endpoints.
> fn get_directories() -> Result<&'static [KnownAcmeDirectory], Error> {
> - Ok(crate::config::acme::KNOWN_ACME_DIRECTORIES)
> -}
> -
> -/// Wrapper for efficient Arc use when returning the ACME challenge-plugin schema for serializing
> -struct ChallengeSchemaWrapper {
> - inner: Arc<Vec<AcmeChallengeSchema>>,
> -}
> -
> -impl Serialize for ChallengeSchemaWrapper {
> - fn serialize<S>(&self, serializer: S) -> Result<S::Ok, S::Error>
> - where
> - S: serde::Serializer,
> - {
> - self.inner.serialize(serializer)
> - }
> -}
> -
> -struct CachedSchema {
> - schema: Arc<Vec<AcmeChallengeSchema>>,
> - cached_mtime: SystemTime,
> -}
> -
> -fn get_cached_challenge_schemas() -> Result<ChallengeSchemaWrapper, Error> {
> - static CACHE: LazyLock<Mutex<Option<CachedSchema>>> = LazyLock::new(|| Mutex::new(None));
> -
> - // the actual loading code
> - let mut last = CACHE.lock().unwrap();
> -
> - let actual_mtime = fs::metadata(crate::config::acme::ACME_DNS_SCHEMA_FN)?.modified()?;
> -
> - let schema = match &*last {
> - Some(CachedSchema {
> - schema,
> - cached_mtime,
> - }) if *cached_mtime >= actual_mtime => schema.clone(),
> - _ => {
> - let new_schema = Arc::new(crate::config::acme::load_dns_challenge_schema()?);
> - *last = Some(CachedSchema {
> - schema: Arc::clone(&new_schema),
> - cached_mtime: actual_mtime,
> - });
> - new_schema
> - }
> - };
> -
> - Ok(ChallengeSchemaWrapper { inner: schema })
> + Ok(proxmox_acme_api::KNOWN_ACME_DIRECTORIES)
> }
>
> #[api(
> @@ -455,69 +305,7 @@ fn get_cached_challenge_schemas() -> Result<ChallengeSchemaWrapper, Error> {
> )]
> /// Get named known ACME directory endpoints.
> fn get_challenge_schema() -> Result<ChallengeSchemaWrapper, Error> {
> - get_cached_challenge_schemas()
> -}
> -
> -#[api]
> -#[derive(Default, Deserialize, Serialize)]
> -#[serde(rename_all = "kebab-case")]
> -/// The API's format is inherited from PVE/PMG:
> -pub struct PluginConfig {
> - /// Plugin ID.
> - plugin: String,
> -
> - /// Plugin type.
> - #[serde(rename = "type")]
> - ty: String,
> -
> - /// DNS Api name.
> - #[serde(skip_serializing_if = "Option::is_none", default)]
> - api: Option<String>,
> -
> - /// Plugin configuration data.
> - #[serde(skip_serializing_if = "Option::is_none", default)]
> - data: Option<String>,
> -
> - /// Extra delay in seconds to wait before requesting validation.
> - ///
> - /// Allows to cope with long TTL of DNS records.
> - #[serde(skip_serializing_if = "Option::is_none", default)]
> - validation_delay: Option<u32>,
> -
> - /// Flag to disable the config.
> - #[serde(skip_serializing_if = "Option::is_none", default)]
> - disable: Option<bool>,
> -}
> -
> -// See PMG/PVE's $modify_cfg_for_api sub
> -fn modify_cfg_for_api(id: &str, ty: &str, data: &Value) -> PluginConfig {
> - let mut entry = data.clone();
> -
> - let obj = entry.as_object_mut().unwrap();
> - obj.remove("id");
> - obj.insert("plugin".to_string(), Value::String(id.to_owned()));
> - obj.insert("type".to_string(), Value::String(ty.to_owned()));
> -
> - // FIXME: This needs to go once the `Updater` is fixed.
> - // None of these should be able to fail unless the user changed the files by hand, in which
> - // case we leave the unmodified string in the Value for now. This will be handled with an error
> - // later.
> - if let Some(Value::String(ref mut data)) = obj.get_mut("data") {
> - if let Ok(new) = proxmox_base64::url::decode_no_pad(&data) {
> - if let Ok(utf8) = String::from_utf8(new) {
> - *data = utf8;
> - }
> - }
> - }
> -
> - // PVE/PMG do this explicitly for ACME plugins...
> - // obj.insert("digest".to_string(), Value::String(digest.clone()));
> -
> - serde_json::from_value(entry).unwrap_or_else(|_| PluginConfig {
> - plugin: "*Error*".to_string(),
> - ty: "*Error*".to_string(),
> - ..Default::default()
> - })
> + proxmox_acme_api::get_cached_challenge_schemas()
> }
>
> #[api(
> @@ -533,12 +321,7 @@ fn modify_cfg_for_api(id: &str, ty: &str, data: &Value) -> PluginConfig {
> )]
> /// List ACME challenge plugins.
> pub fn list_plugins(rpcenv: &mut dyn RpcEnvironment) -> Result<Vec<PluginConfig>, Error> {
> - let (plugins, digest) = plugin::config()?;
> - rpcenv["digest"] = hex::encode(digest).into();
> - Ok(plugins
> - .iter()
> - .map(|(id, (ty, data))| modify_cfg_for_api(id, ty, data))
> - .collect())
> + proxmox_acme_api::list_plugins(rpcenv)
> }
>
> #[api(
> @@ -555,13 +338,7 @@ pub fn list_plugins(rpcenv: &mut dyn RpcEnvironment) -> Result<Vec<PluginConfig>
> )]
> /// List ACME challenge plugins.
> pub fn get_plugin(id: String, rpcenv: &mut dyn RpcEnvironment) -> Result<PluginConfig, Error> {
> - let (plugins, digest) = plugin::config()?;
> - rpcenv["digest"] = hex::encode(digest).into();
> -
> - match plugins.get(&id) {
> - Some((ty, data)) => Ok(modify_cfg_for_api(&id, ty, data)),
> - None => http_bail!(NOT_FOUND, "no such plugin"),
> - }
> + proxmox_acme_api::get_plugin(id, rpcenv)
> }
>
> // Currently we only have "the" standalone plugin and DNS plugins so we can just flatten a
> @@ -593,30 +370,7 @@ pub fn get_plugin(id: String, rpcenv: &mut dyn RpcEnvironment) -> Result<PluginC
> )]
> /// Add ACME plugin configuration.
> pub fn add_plugin(r#type: String, core: DnsPluginCore, data: String) -> Result<(), Error> {
> - // Currently we only support DNS plugins and the standalone plugin is "fixed":
> - if r#type != "dns" {
> - param_bail!("type", "invalid ACME plugin type: {:?}", r#type);
> - }
> -
> - let data = String::from_utf8(proxmox_base64::decode(data)?)
> - .map_err(|_| format_err!("data must be valid UTF-8"))?;
> -
> - let id = core.id.clone();
> -
> - let _lock = plugin::lock()?;
> -
> - let (mut plugins, _digest) = plugin::config()?;
> - if plugins.contains_key(&id) {
> - param_bail!("id", "ACME plugin ID {:?} already exists", id);
> - }
> -
> - let plugin = serde_json::to_value(DnsPlugin { core, data })?;
> -
> - plugins.insert(id, r#type, plugin);
> -
> - plugin::save_config(&plugins)?;
> -
> - Ok(())
> + proxmox_acme_api::add_plugin(r#type, core, data)
> }
>
> #[api(
> @@ -632,26 +386,7 @@ pub fn add_plugin(r#type: String, core: DnsPluginCore, data: String) -> Result<(
> )]
> /// Delete an ACME plugin configuration.
> pub fn delete_plugin(id: String) -> Result<(), Error> {
> - let _lock = plugin::lock()?;
> -
> - let (mut plugins, _digest) = plugin::config()?;
> - if plugins.remove(&id).is_none() {
> - http_bail!(NOT_FOUND, "no such plugin");
> - }
> - plugin::save_config(&plugins)?;
> -
> - Ok(())
> -}
> -
> -#[api()]
> -#[derive(Serialize, Deserialize)]
> -#[serde(rename_all = "kebab-case")]
> -/// Deletable property name
> -pub enum DeletableProperty {
> - /// Delete the disable property
> - Disable,
> - /// Delete the validation-delay property
> - ValidationDelay,
> + proxmox_acme_api::delete_plugin(id)
> }
>
> #[api(
> @@ -673,12 +408,12 @@ pub enum DeletableProperty {
> type: Array,
> optional: true,
> items: {
> - type: DeletableProperty,
> + type: DeletablePluginProperty,
> }
> },
> digest: {
> - description: "Digest to protect against concurrent updates",
> optional: true,
> + type: ConfigDigest,
> },
> },
> },
> @@ -692,65 +427,8 @@ pub fn update_plugin(
> id: String,
> update: DnsPluginCoreUpdater,
> data: Option<String>,
> - delete: Option<Vec<DeletableProperty>>,
> - digest: Option<String>,
> + delete: Option<Vec<DeletablePluginProperty>>,
> + digest: Option<ConfigDigest>,
> ) -> Result<(), Error> {
> - let data = data
> - .as_deref()
> - .map(proxmox_base64::decode)
> - .transpose()?
> - .map(String::from_utf8)
> - .transpose()
> - .map_err(|_| format_err!("data must be valid UTF-8"))?;
> -
> - let _lock = plugin::lock()?;
> -
> - let (mut plugins, expected_digest) = plugin::config()?;
> -
> - if let Some(digest) = digest {
> - let digest = <[u8; 32]>::from_hex(digest)?;
> - crate::tools::detect_modified_configuration_file(&digest, &expected_digest)?;
> - }
> -
> - match plugins.get_mut(&id) {
> - Some((ty, ref mut entry)) => {
> - if ty != "dns" {
> - bail!("cannot update plugin of type {:?}", ty);
> - }
> -
> - let mut plugin = DnsPlugin::deserialize(&*entry)?;
> -
> - if let Some(delete) = delete {
> - for delete_prop in delete {
> - match delete_prop {
> - DeletableProperty::ValidationDelay => {
> - plugin.core.validation_delay = None;
> - }
> - DeletableProperty::Disable => {
> - plugin.core.disable = None;
> - }
> - }
> - }
> - }
> - if let Some(data) = data {
> - plugin.data = data;
> - }
> - if let Some(api) = update.api {
> - plugin.core.api = api;
> - }
> - if update.validation_delay.is_some() {
> - plugin.core.validation_delay = update.validation_delay;
> - }
> - if update.disable.is_some() {
> - plugin.core.disable = update.disable;
> - }
> -
> - *entry = serde_json::to_value(plugin)?;
> - }
> - None => http_bail!(NOT_FOUND, "no such plugin"),
> - }
> -
> - plugin::save_config(&plugins)?;
> -
> - Ok(())
> + proxmox_acme_api::update_plugin(id, update, data, delete, digest)
> }
> diff --git a/src/api2/types/acme.rs b/src/api2/types/acme.rs
> index 64175aff..0ff496b6 100644
> --- a/src/api2/types/acme.rs
> +++ b/src/api2/types/acme.rs
> @@ -43,22 +43,6 @@ pub const ACME_DOMAIN_PROPERTY_SCHEMA: Schema =
> .format(&ApiStringFormat::PropertyString(&AcmeDomain::API_SCHEMA))
> .schema();
>
> -#[api(
> - properties: {
> - name: { type: String },
> - url: { type: String },
> - },
> -)]
> -/// An ACME directory endpoint with a name and URL.
> -#[derive(Serialize)]
> -pub struct KnownAcmeDirectory {
> - /// The ACME directory's name.
> - pub name: &'static str,
> -
> - /// The ACME directory's endpoint URL.
> - pub url: &'static str,
> -}
> -
> #[api(
> properties: {
> schema: {
> diff --git a/src/bin/proxmox_backup_manager/acme.rs b/src/bin/proxmox_backup_manager/acme.rs
> index 6ed61560..d11d7498 100644
> --- a/src/bin/proxmox_backup_manager/acme.rs
> +++ b/src/bin/proxmox_backup_manager/acme.rs
> @@ -4,14 +4,12 @@ use anyhow::{bail, Error};
> use serde_json::Value;
>
> use proxmox_acme::async_client::AcmeClient;
> -use proxmox_acme_api::AcmeAccountName;
> +use proxmox_acme_api::{AcmeAccountName, DnsPluginCore, KNOWN_ACME_DIRECTORIES};
> use proxmox_router::{cli::*, ApiHandler, RpcEnvironment};
> use proxmox_schema::api;
> use proxmox_sys::fs::file_get_contents;
>
> use proxmox_backup::api2;
> -use proxmox_backup::config::acme::plugin::DnsPluginCore;
> -use proxmox_backup::config::acme::KNOWN_ACME_DIRECTORIES;
>
> pub fn acme_mgmt_cli() -> CommandLineInterface {
> let cmd_def = CliCommandMap::new()
> @@ -122,7 +120,7 @@ async fn register_account(
>
> match input.trim().parse::<usize>() {
> Ok(n) if n < KNOWN_ACME_DIRECTORIES.len() => {
> - break (KNOWN_ACME_DIRECTORIES[n].url.to_owned(), false);
> + break (KNOWN_ACME_DIRECTORIES[n].url.to_string(), false);
> }
> Ok(n) if n == KNOWN_ACME_DIRECTORIES.len() => {
> input.clear();
> diff --git a/src/config/acme/mod.rs b/src/config/acme/mod.rs
> index e4639c53..01ab6223 100644
> --- a/src/config/acme/mod.rs
> +++ b/src/config/acme/mod.rs
> @@ -1,16 +1,15 @@
> use std::collections::HashMap;
> use std::ops::ControlFlow;
> -use std::path::Path;
>
> -use anyhow::{bail, format_err, Error};
> +use anyhow::Error;
> use serde_json::Value;
>
> use pbs_api_types::PROXMOX_SAFE_ID_REGEX;
> -use proxmox_acme_api::AcmeAccountName;
> +use proxmox_acme_api::{AcmeAccountName, KnownAcmeDirectory, KNOWN_ACME_DIRECTORIES};
> use proxmox_sys::error::SysError;
> use proxmox_sys::fs::{file_read_string, CreateOptions};
>
> -use crate::api2::types::{AcmeChallengeSchema, KnownAcmeDirectory};
> +use crate::api2::types::AcmeChallengeSchema;
>
> pub(crate) const ACME_DIR: &str = pbs_buildcfg::configdir!("/acme");
> pub(crate) const ACME_ACCOUNT_DIR: &str = pbs_buildcfg::configdir!("/acme/accounts");
> @@ -35,23 +34,8 @@ pub(crate) fn make_acme_dir() -> Result<(), Error> {
> create_acme_subdir(ACME_DIR)
> }
>
> -pub const KNOWN_ACME_DIRECTORIES: &[KnownAcmeDirectory] = &[
> - KnownAcmeDirectory {
> - name: "Let's Encrypt V2",
> - url: "https://acme-v02.api.letsencrypt.org/directory",
> - },
> - KnownAcmeDirectory {
> - name: "Let's Encrypt V2 Staging",
> - url: "https://acme-staging-v02.api.letsencrypt.org/directory",
> - },
> -];
> -
> pub const DEFAULT_ACME_DIRECTORY_ENTRY: &KnownAcmeDirectory = &KNOWN_ACME_DIRECTORIES[0];
>
> -pub fn account_path(name: &str) -> String {
> - format!("{ACME_ACCOUNT_DIR}/{name}")
> -}
> -
> pub fn foreach_acme_account<F>(mut func: F) -> Result<(), Error>
> where
> F: FnMut(AcmeAccountName) -> ControlFlow<Result<(), Error>>,
> @@ -82,28 +66,6 @@ where
> }
> }
>
> -pub fn mark_account_deactivated(name: &str) -> Result<(), Error> {
> - let from = account_path(name);
> - for i in 0..100 {
> - let to = account_path(&format!("_deactivated_{name}_{i}"));
> - if !Path::new(&to).exists() {
> - return std::fs::rename(&from, &to).map_err(|err| {
> - format_err!(
> - "failed to move account path {:?} to {:?} - {}",
> - from,
> - to,
> - err
> - )
> - });
> - }
> - }
> - bail!(
> - "No free slot to rename deactivated account {:?}, please cleanup {:?}",
> - from,
> - ACME_ACCOUNT_DIR
> - );
> -}
> -
> pub fn load_dns_challenge_schema() -> Result<Vec<AcmeChallengeSchema>, Error> {
> let raw = file_read_string(ACME_DNS_SCHEMA_FN)?;
> let schemas: serde_json::Map<String, Value> = serde_json::from_str(&raw)?;
> --
> 2.47.3
>
>
>
> _______________________________________________
> pbs-devel mailing list
> pbs-devel@lists.proxmox.com
> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
>
>
>
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
* Re: [pbs-devel] [PATCH proxmox-backup v5 5/5] acme: certificate ordering through proxmox-acme-api
2026-01-08 11:26 7% ` [pbs-devel] [PATCH proxmox-backup v5 5/5] acme: certificate ordering through proxmox-acme-api Samuel Rufinatscha
@ 2026-01-13 13:45 5% ` Fabian Grünbichler
2026-01-13 16:51 6% ` Samuel Rufinatscha
0 siblings, 1 reply; 200+ results
From: Fabian Grünbichler @ 2026-01-13 13:45 UTC (permalink / raw)
To: Proxmox Backup Server development discussion
On January 8, 2026 12:26 pm, Samuel Rufinatscha wrote:
> PBS currently uses its own ACME client and API logic, while PDM uses the
> factored out proxmox-acme and proxmox-acme-api crates. This duplication
> risks differences in behaviour and requires ACME maintenance in two
> places. This patch is part of a series to move PBS over to the shared
> ACME stack.
>
> Changes:
> - Replace the custom ACME order/authorization loop in node certificates
> with a call to proxmox_acme_api::order_certificate.
> - Build domain + config data as proxmox-acme-api types
> - Remove obsolete local ACME ordering and plugin glue code.
>
> Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
> ---
> src/acme/mod.rs | 2 -
> src/acme/plugin.rs | 335 ----------------------------------
> src/api2/node/certificates.rs | 229 ++++-------------------
> src/api2/types/acme.rs | 73 --------
> src/api2/types/mod.rs | 3 -
> src/config/acme/mod.rs | 8 +-
> src/config/acme/plugin.rs | 92 +---------
> src/config/node.rs | 20 +-
> src/lib.rs | 2 -
> 9 files changed, 38 insertions(+), 726 deletions(-)
> delete mode 100644 src/acme/mod.rs
> delete mode 100644 src/acme/plugin.rs
> delete mode 100644 src/api2/types/acme.rs
>
[..]
> diff --git a/src/api2/node/certificates.rs b/src/api2/node/certificates.rs
> index 47ff8de5..73401c41 100644
> --- a/src/api2/node/certificates.rs
> +++ b/src/api2/node/certificates.rs
> @@ -1,14 +1,11 @@
> -use std::sync::Arc;
> -use std::time::Duration;
> -
> use anyhow::{bail, format_err, Error};
> use openssl::pkey::PKey;
> use openssl::x509::X509;
> use serde::{Deserialize, Serialize};
> -use tracing::{info, warn};
> +use tracing::info;
>
> use pbs_api_types::{NODE_SCHEMA, PRIV_SYS_MODIFY};
> -use proxmox_acme::async_client::AcmeClient;
> +use proxmox_acme_api::AcmeDomain;
> use proxmox_rest_server::WorkerTask;
> use proxmox_router::list_subdirs_api_method;
> use proxmox_router::SubdirMap;
> @@ -18,8 +15,6 @@ use proxmox_schema::api;
> use pbs_buildcfg::configdir;
> use pbs_tools::cert;
>
> -use crate::api2::types::AcmeDomain;
> -use crate::config::node::NodeConfig;
> use crate::server::send_certificate_renewal_mail;
>
> pub const ROUTER: Router = Router::new()
> @@ -268,193 +263,6 @@ pub async fn delete_custom_certificate() -> Result<(), Error> {
> Ok(())
> }
>
> -struct OrderedCertificate {
> - certificate: hyper::body::Bytes,
> - private_key_pem: Vec<u8>,
> -}
> -
> -async fn order_certificate(
> - worker: Arc<WorkerTask>,
> - node_config: &NodeConfig,
> -) -> Result<Option<OrderedCertificate>, Error> {
> - use proxmox_acme::authorization::Status;
> - use proxmox_acme::order::Identifier;
> -
> - let domains = node_config.acme_domains().try_fold(
> - Vec::<AcmeDomain>::new(),
> - |mut acc, domain| -> Result<_, Error> {
> - let mut domain = domain?;
> - domain.domain.make_ascii_lowercase();
> - if let Some(alias) = &mut domain.alias {
> - alias.make_ascii_lowercase();
> - }
> - acc.push(domain);
> - Ok(acc)
> - },
> - )?;
> -
> - let get_domain_config = |domain: &str| {
> - domains
> - .iter()
> - .find(|d| d.domain == domain)
> - .ok_or_else(|| format_err!("no config for domain '{}'", domain))
> - };
> -
> - if domains.is_empty() {
> - info!("No domains configured to be ordered from an ACME server.");
> - return Ok(None);
> - }
> -
> - let (plugins, _) = crate::config::acme::plugin::config()?;
> -
> - let mut acme = node_config.acme_client().await?;
> -
> - info!("Placing ACME order");
> - let order = acme
> - .new_order(domains.iter().map(|d| d.domain.to_ascii_lowercase()))
> - .await?;
> - info!("Order URL: {}", order.location);
> -
> - let identifiers: Vec<String> = order
> - .data
> - .identifiers
> - .iter()
> - .map(|identifier| match identifier {
> - Identifier::Dns(domain) => domain.clone(),
> - })
> - .collect();
> -
> - for auth_url in &order.data.authorizations {
> - info!("Getting authorization details from '{auth_url}'");
> - let mut auth = acme.get_authorization(auth_url).await?;
> -
> - let domain = match &mut auth.identifier {
> - Identifier::Dns(domain) => domain.to_ascii_lowercase(),
> - };
> -
> - if auth.status == Status::Valid {
> - info!("{domain} is already validated!");
> - continue;
> - }
> -
> - info!("The validation for {domain} is pending");
> - let domain_config: &AcmeDomain = get_domain_config(&domain)?;
> - let plugin_id = domain_config.plugin.as_deref().unwrap_or("standalone");
> - let mut plugin_cfg = crate::acme::get_acme_plugin(&plugins, plugin_id)?
> - .ok_or_else(|| format_err!("plugin '{plugin_id}' for domain '{domain}' not found!"))?;
> -
> - info!("Setting up validation plugin");
> - let validation_url = plugin_cfg
> - .setup(&mut acme, &auth, domain_config, Arc::clone(&worker))
> - .await?;
> -
> - let result = request_validation(&mut acme, auth_url, validation_url).await;
> -
> - if let Err(err) = plugin_cfg
> - .teardown(&mut acme, &auth, domain_config, Arc::clone(&worker))
> - .await
> - {
> - warn!("Failed to teardown plugin '{plugin_id}' for domain '{domain}' - {err}");
> - }
> -
> - result?;
> - }
> -
> - info!("All domains validated");
> - info!("Creating CSR");
> -
> - let csr = proxmox_acme::util::Csr::generate(&identifiers, &Default::default())?;
> - let mut finalize_error_cnt = 0u8;
> - let order_url = &order.location;
> - let mut order;
> - loop {
> - use proxmox_acme::order::Status;
> -
> - order = acme.get_order(order_url).await?;
> -
> - match order.status {
> - Status::Pending => {
> - info!("still pending, trying to finalize anyway");
> - let finalize = order
> - .finalize
> - .as_deref()
> - .ok_or_else(|| format_err!("missing 'finalize' URL in order"))?;
> - if let Err(err) = acme.finalize(finalize, &csr.data).await {
> - if finalize_error_cnt >= 5 {
> - return Err(err);
> - }
> -
> - finalize_error_cnt += 1;
> - }
> - tokio::time::sleep(Duration::from_secs(5)).await;
> - }
> - Status::Ready => {
> - info!("order is ready, finalizing");
> - let finalize = order
> - .finalize
> - .as_deref()
> - .ok_or_else(|| format_err!("missing 'finalize' URL in order"))?;
> - acme.finalize(finalize, &csr.data).await?;
> - tokio::time::sleep(Duration::from_secs(5)).await;
> - }
> - Status::Processing => {
> - info!("still processing, trying again in 30 seconds");
> - tokio::time::sleep(Duration::from_secs(30)).await;
> - }
> - Status::Valid => {
> - info!("valid");
> - break;
> - }
> - other => bail!("order status: {:?}", other),
> - }
> - }
> -
> - info!("Downloading certificate");
> - let certificate = acme
> - .get_certificate(
> - order
> - .certificate
> - .as_deref()
> - .ok_or_else(|| format_err!("missing certificate url in finalized order"))?,
> - )
> - .await?;
> -
> - Ok(Some(OrderedCertificate {
> - certificate,
> - private_key_pem: csr.private_key_pem,
> - }))
> -}
> -
> -async fn request_validation(
> - acme: &mut AcmeClient,
> - auth_url: &str,
> - validation_url: &str,
> -) -> Result<(), Error> {
> - info!("Triggering validation");
> - acme.request_challenge_validation(validation_url).await?;
> -
> - info!("Sleeping for 5 seconds");
> - tokio::time::sleep(Duration::from_secs(5)).await;
> -
> - loop {
> - use proxmox_acme::authorization::Status;
> -
> - let auth = acme.get_authorization(auth_url).await?;
> - match auth.status {
> - Status::Pending => {
> - info!("Status is still 'pending', trying again in 10 seconds");
> - tokio::time::sleep(Duration::from_secs(10)).await;
> - }
> - Status::Valid => return Ok(()),
> - other => bail!(
> - "validating challenge '{}' failed - status: {:?}",
> - validation_url,
> - other
> - ),
> - }
> - }
> -}
> -
> #[api(
> input: {
> properties: {
> @@ -524,9 +332,30 @@ fn spawn_certificate_worker(
>
> let auth_id = rpcenv.get_auth_id().unwrap();
>
> + let acme_config = if let Some(cfg) = node_config.acme_config().transpose()? {
> + cfg
> + } else {
> + proxmox_acme_api::parse_acme_config_string("account=default")?
> + };
Wouldn't it make sense to inline this into acme_config()? The same
fallback is already there for acme_client().
> +
> + let domains = node_config.acme_domains().try_fold(
> + Vec::<AcmeDomain>::new(),
> + |mut acc, domain| -> Result<_, Error> {
> + let mut domain = domain?;
> + domain.domain.make_ascii_lowercase();
> + if let Some(alias) = &mut domain.alias {
> + alias.make_ascii_lowercase();
> + }
> + acc.push(domain);
> + Ok(acc)
> + },
> + )?;
> +
> WorkerTask::spawn(name, None, auth_id, true, move |worker| async move {
> let work = || async {
> - if let Some(cert) = order_certificate(worker, &node_config).await? {
> + if let Some(cert) =
> + proxmox_acme_api::order_certificate(worker, &acme_config, &domains).await?
> + {
> crate::config::set_proxy_certificate(&cert.certificate, &cert.private_key_pem)?;
> crate::server::reload_proxy_certificate().await?;
> }
> @@ -562,16 +391,20 @@ pub fn revoke_acme_cert(rpcenv: &mut dyn RpcEnvironment) -> Result<String, Error
>
> let auth_id = rpcenv.get_auth_id().unwrap();
>
> + let acme_config = if let Some(cfg) = node_config.acme_config().transpose()? {
> + cfg
> + } else {
> + proxmox_acme_api::parse_acme_config_string("account=default")?
> + };
here as well
> +
> WorkerTask::spawn(
> "acme-revoke-cert",
> None,
> auth_id,
> true,
> move |_worker| async move {
> - info!("Loading ACME account");
> - let mut acme = node_config.acme_client().await?;
> info!("Revoking old certificate");
> - acme.revoke_certificate(cert_pem.as_bytes(), None).await?;
> + proxmox_acme_api::revoke_certificate(&acme_config, &cert_pem.as_bytes()).await?;
> info!("Deleting certificate and regenerating a self-signed one");
> delete_custom_certificate().await?;
> Ok(())
[..]
> diff --git a/src/config/acme/plugin.rs b/src/config/acme/plugin.rs
> index 8ce852ec..4b4a216e 100644
> --- a/src/config/acme/plugin.rs
> +++ b/src/config/acme/plugin.rs
> @@ -1,104 +1,16 @@
> use std::sync::LazyLock;
>
> use anyhow::Error;
> -use serde::{Deserialize, Serialize};
> use serde_json::Value;
>
> -use pbs_api_types::PROXMOX_SAFE_ID_FORMAT;
> -use proxmox_schema::{api, ApiType, Schema, StringSchema, Updater};
> +use proxmox_acme_api::{DnsPlugin, StandalonePlugin, PLUGIN_ID_SCHEMA};
> +use proxmox_schema::{ApiType, Schema};
> use proxmox_section_config::{SectionConfig, SectionConfigData, SectionConfigPlugin};
>
> use pbs_config::{open_backup_lockfile, BackupLockGuard};
>
> -pub const PLUGIN_ID_SCHEMA: Schema = StringSchema::new("ACME Challenge Plugin ID.")
> - .format(&PROXMOX_SAFE_ID_FORMAT)
> - .min_length(1)
> - .max_length(32)
> - .schema();
> -
> pub static CONFIG: LazyLock<SectionConfig> = LazyLock::new(init);
>
> -#[api(
> - properties: {
> - id: { schema: PLUGIN_ID_SCHEMA },
> - },
> -)]
> -#[derive(Deserialize, Serialize)]
> -/// Standalone ACME Plugin for the http-1 challenge.
> -pub struct StandalonePlugin {
> - /// Plugin ID.
> - id: String,
> -}
> -
> -impl Default for StandalonePlugin {
> - fn default() -> Self {
> - Self {
> - id: "standalone".to_string(),
> - }
> - }
> -}
> -
> -#[api(
> - properties: {
> - id: { schema: PLUGIN_ID_SCHEMA },
> - disable: {
> - optional: true,
> - default: false,
> - },
> - "validation-delay": {
> - default: 30,
> - optional: true,
> - minimum: 0,
> - maximum: 2 * 24 * 60 * 60,
> - },
> - },
> -)]
> -/// DNS ACME Challenge Plugin core data.
> -#[derive(Deserialize, Serialize, Updater)]
> -#[serde(rename_all = "kebab-case")]
> -pub struct DnsPluginCore {
> - /// Plugin ID.
> - #[updater(skip)]
> - pub id: String,
> -
> - /// DNS API Plugin Id.
> - pub api: String,
> -
> - /// Extra delay in seconds to wait before requesting validation.
> - ///
> - /// Allows to cope with long TTL of DNS records.
> - #[serde(skip_serializing_if = "Option::is_none", default)]
> - pub validation_delay: Option<u32>,
> -
> - /// Flag to disable the config.
> - #[serde(skip_serializing_if = "Option::is_none", default)]
> - pub disable: Option<bool>,
> -}
> -
> -#[api(
> - properties: {
> - core: { type: DnsPluginCore },
> - },
> -)]
> -/// DNS ACME Challenge Plugin.
> -#[derive(Deserialize, Serialize)]
> -#[serde(rename_all = "kebab-case")]
> -pub struct DnsPlugin {
> - #[serde(flatten)]
> - pub core: DnsPluginCore,
> -
> - // We handle this property separately in the API calls.
> - /// DNS plugin data (base64url encoded without padding).
> - #[serde(with = "proxmox_serde::string_as_base64url_nopad")]
> - pub data: String,
> -}
> -
> -impl DnsPlugin {
> - pub fn decode_data(&self, output: &mut Vec<u8>) -> Result<(), Error> {
> - Ok(proxmox_base64::url::decode_to_vec(&self.data, output)?)
> - }
> -}
> -
> fn init() -> SectionConfig {
> let mut config = SectionConfig::new(&PLUGIN_ID_SCHEMA);
>
> diff --git a/src/config/node.rs b/src/config/node.rs
> index e4b66a20..6865b815 100644
> --- a/src/config/node.rs
> +++ b/src/config/node.rs
> @@ -9,14 +9,14 @@ use pbs_api_types::{
> OPENSSL_CIPHERS_TLS_1_3_SCHEMA,
> };
> use proxmox_acme::async_client::AcmeClient;
> -use proxmox_acme_api::AcmeAccountName;
> +use proxmox_acme_api::{AcmeAccountName, AcmeConfig, AcmeDomain, ACME_DOMAIN_PROPERTY_SCHEMA};
> use proxmox_http::ProxyConfig;
> use proxmox_schema::{api, ApiStringFormat, ApiType, Updater};
>
> use pbs_buildcfg::configdir;
> use pbs_config::{open_backup_lockfile, BackupLockGuard};
>
> -use crate::api2::types::{AcmeDomain, ACME_DOMAIN_PROPERTY_SCHEMA, HTTP_PROXY_SCHEMA};
> +use crate::api2::types::HTTP_PROXY_SCHEMA;
>
> const CONF_FILE: &str = configdir!("/node.cfg");
> const LOCK_FILE: &str = configdir!("/.node.lck");
> @@ -43,20 +43,6 @@ pub fn save_config(config: &NodeConfig) -> Result<(), Error> {
> pbs_config::replace_backup_config(CONF_FILE, &raw)
> }
>
> -#[api(
> - properties: {
> - account: { type: AcmeAccountName },
> - }
> -)]
> -#[derive(Deserialize, Serialize)]
> -/// The ACME configuration.
> -///
> -/// Currently only contains the name of the account use.
> -pub struct AcmeConfig {
> - /// Account to use to acquire ACME certificates.
> - account: AcmeAccountName,
> -}
> -
> /// All available languages in Proxmox. Taken from proxmox-i18n repository.
> /// pt_BR, zh_CN, and zh_TW use the same case in the translation files.
> // TODO: auto-generate from available translations
> @@ -242,7 +228,7 @@ impl NodeConfig {
>
> pub async fn acme_client(&self) -> Result<AcmeClient, Error> {
> let account = if let Some(cfg) = self.acme_config().transpose()? {
> - cfg.account
> + AcmeAccountName::from_string(cfg.account)?
> } else {
> AcmeAccountName::from_string("default".to_string())? // should really not happen
> };
> diff --git a/src/lib.rs b/src/lib.rs
> index 8633378c..828f5842 100644
> --- a/src/lib.rs
> +++ b/src/lib.rs
> @@ -27,8 +27,6 @@ pub(crate) mod auth;
>
> pub mod tape;
>
> -pub mod acme;
> -
> pub mod client_helpers;
>
> pub mod traffic_control_cache;
> --
> 2.47.3
>
>
>
* Re: [pbs-devel] [PATCH proxmox-backup v5 3/5] acme: drop local AcmeClient
2026-01-08 11:26 6% ` [pbs-devel] [PATCH proxmox-backup v5 3/5] acme: drop local AcmeClient Samuel Rufinatscha
@ 2026-01-13 13:45 5% ` Fabian Grünbichler
2026-01-14 8:56 6% ` Samuel Rufinatscha
0 siblings, 1 reply; 200+ results
From: Fabian Grünbichler @ 2026-01-13 13:45 UTC (permalink / raw)
To: Proxmox Backup Server development discussion
On January 8, 2026 12:26 pm, Samuel Rufinatscha wrote:
> PBS currently uses its own ACME client and API logic, while PDM uses the
> factored out proxmox-acme and proxmox-acme-api crates. This duplication
> risks differences in behaviour and requires ACME maintenance in two
> places. This patch is part of a series to move PBS over to the shared
> ACME stack.
>
> Changes:
> - Remove the local src/acme/client.rs and switch to
> proxmox_acme::async_client::AcmeClient where needed.
> - Use proxmox_acme_api::load_client_with_account instead of the custom
> AcmeClient::load() function
> - Replace the local do_register() logic with
> proxmox_acme_api::register_account, to further ensure accounts are persisted
> - Replace the local AcmeAccountName type, required for
> proxmox_acme_api::register_account
>
> Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
> ---
> src/acme/client.rs | 691 -------------------------
> src/acme/mod.rs | 3 -
> src/acme/plugin.rs | 2 +-
> src/api2/config/acme.rs | 50 +-
> src/api2/node/certificates.rs | 2 +-
> src/api2/types/acme.rs | 8 -
> src/bin/proxmox_backup_manager/acme.rs | 17 +-
> src/config/acme/mod.rs | 8 +-
> src/config/node.rs | 9 +-
> 9 files changed, 36 insertions(+), 754 deletions(-)
> delete mode 100644 src/acme/client.rs
>
[..]
> diff --git a/src/config/acme/mod.rs b/src/config/acme/mod.rs
> index ac89ae5e..e4639c53 100644
> --- a/src/config/acme/mod.rs
> +++ b/src/config/acme/mod.rs
I think this whole file should probably be replaced entirely by
proxmox-acme-api, which (AFAICT) would just require adding the
completion helpers there?
> @@ -6,10 +6,11 @@ use anyhow::{bail, format_err, Error};
> use serde_json::Value;
>
> use pbs_api_types::PROXMOX_SAFE_ID_REGEX;
> +use proxmox_acme_api::AcmeAccountName;
> use proxmox_sys::error::SysError;
> use proxmox_sys::fs::{file_read_string, CreateOptions};
>
> -use crate::api2::types::{AcmeAccountName, AcmeChallengeSchema, KnownAcmeDirectory};
> +use crate::api2::types::{AcmeChallengeSchema, KnownAcmeDirectory};
>
> pub(crate) const ACME_DIR: &str = pbs_buildcfg::configdir!("/acme");
> pub(crate) const ACME_ACCOUNT_DIR: &str = pbs_buildcfg::configdir!("/acme/accounts");
> @@ -34,11 +35,6 @@ pub(crate) fn make_acme_dir() -> Result<(), Error> {
> create_acme_subdir(ACME_DIR)
> }
>
> -pub(crate) fn make_acme_account_dir() -> Result<(), Error> {
> - make_acme_dir()?;
> - create_acme_subdir(ACME_ACCOUNT_DIR)
> -}
> -
> pub const KNOWN_ACME_DIRECTORIES: &[KnownAcmeDirectory] = &[
> KnownAcmeDirectory {
> name: "Let's Encrypt V2",
> diff --git a/src/config/node.rs b/src/config/node.rs
> index 253b2e36..e4b66a20 100644
> --- a/src/config/node.rs
> +++ b/src/config/node.rs
> @@ -8,16 +8,15 @@ use pbs_api_types::{
> EMAIL_SCHEMA, MULTI_LINE_COMMENT_SCHEMA, OPENSSL_CIPHERS_TLS_1_2_SCHEMA,
> OPENSSL_CIPHERS_TLS_1_3_SCHEMA,
> };
> +use proxmox_acme::async_client::AcmeClient;
> +use proxmox_acme_api::AcmeAccountName;
> use proxmox_http::ProxyConfig;
> use proxmox_schema::{api, ApiStringFormat, ApiType, Updater};
>
> use pbs_buildcfg::configdir;
> use pbs_config::{open_backup_lockfile, BackupLockGuard};
>
> -use crate::acme::AcmeClient;
> -use crate::api2::types::{
> - AcmeAccountName, AcmeDomain, ACME_DOMAIN_PROPERTY_SCHEMA, HTTP_PROXY_SCHEMA,
> -};
> +use crate::api2::types::{AcmeDomain, ACME_DOMAIN_PROPERTY_SCHEMA, HTTP_PROXY_SCHEMA};
>
> const CONF_FILE: &str = configdir!("/node.cfg");
> const LOCK_FILE: &str = configdir!("/.node.lck");
> @@ -247,7 +246,7 @@ impl NodeConfig {
> } else {
> AcmeAccountName::from_string("default".to_string())? // should really not happen
> };
> - AcmeClient::load(&account).await
> + proxmox_acme_api::load_client_with_account(&account).await
> }
>
> pub fn acme_domains(&'_ self) -> AcmeDomainIter<'_> {
> --
> 2.47.3
>
>
>
* Re: [pbs-devel] [PATCH proxmox v5 4/4] acme-api: add helper to load client for an account
2026-01-08 11:26 17% ` [pbs-devel] [PATCH proxmox v5 4/4] acme-api: add helper to load client for an account Samuel Rufinatscha
@ 2026-01-13 13:45 5% ` Fabian Grünbichler
2026-01-13 16:57 6% ` Samuel Rufinatscha
0 siblings, 1 reply; 200+ results
From: Fabian Grünbichler @ 2026-01-13 13:45 UTC (permalink / raw)
To: Proxmox Backup Server development discussion
On January 8, 2026 12:26 pm, Samuel Rufinatscha wrote:
> The PBS ACME refactoring needs a simple way to obtain an AcmeClient for
> a given configured account without duplicating config wiring. This patch
> adds a load_client_with_account helper in proxmox-acme-api that loads
> the account and constructs a matching client, similarly as PBS previous
> own AcmeClient::load() function.
>
> Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
> ---
> proxmox-acme-api/src/account_api_impl.rs | 5 +++++
> proxmox-acme-api/src/lib.rs | 3 ++-
> 2 files changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/proxmox-acme-api/src/account_api_impl.rs b/proxmox-acme-api/src/account_api_impl.rs
> index ef195908..ca8c8655 100644
> --- a/proxmox-acme-api/src/account_api_impl.rs
> +++ b/proxmox-acme-api/src/account_api_impl.rs
> @@ -116,3 +116,8 @@ pub async fn update_account(name: &AcmeAccountName, contact: Option<String>) ->
>
> Ok(())
> }
> +
> +pub async fn load_client_with_account(account_name: &AcmeAccountName) -> Result<AcmeClient, Error> {
> + let account_data = super::account_config::load_account_config(&account_name).await?;
> + Ok(account_data.client())
> +}
I don't think this is needed: there is only a single call site in PBS,
and that is itself dead code that can be removed.
> diff --git a/proxmox-acme-api/src/lib.rs b/proxmox-acme-api/src/lib.rs
> index 623e9e23..96f88ae2 100644
> --- a/proxmox-acme-api/src/lib.rs
> +++ b/proxmox-acme-api/src/lib.rs
> @@ -31,7 +31,8 @@ mod plugin_config;
> mod account_api_impl;
> #[cfg(feature = "impl")]
> pub use account_api_impl::{
> - deactivate_account, get_account, get_tos, list_accounts, register_account, update_account,
> + deactivate_account, get_account, get_tos, list_accounts, load_client_with_account,
> + register_account, update_account,
> };
>
> #[cfg(feature = "impl")]
> --
> 2.47.3
>
>
>
* Re: [pdm-devel] [PATCH datacenter-manager] fix #7120: remote updates: drop vanished nodes/remotes from cache file
@ 2026-01-08 14:38 15% ` Samuel Rufinatscha
0 siblings, 0 replies; 200+ results
From: Samuel Rufinatscha @ 2026-01-08 14:38 UTC (permalink / raw)
To: Proxmox Datacenter Manager development discussion, Lukas Wagner
On 1/8/26 2:06 PM, Lukas Wagner wrote:
> This commit makes sure that vanished remotes and remote cluster nodes
> are dropped from the remote updates cache file. This happens whenever
> the cache file is fully refreshed, either by the periodic update task,
> or by pressing "Refresh All" in the UI.
>
> Signed-off-by: Lukas Wagner <l.wagner@proxmox.com>
> ---
> server/src/remote_updates.rs | 10 ++++++++++
> 1 file changed, 10 insertions(+)
>
> diff --git a/server/src/remote_updates.rs b/server/src/remote_updates.rs
> index e772eef5..0490d28e 100644
> --- a/server/src/remote_updates.rs
> +++ b/server/src/remote_updates.rs
> @@ -214,6 +214,11 @@ pub async fn refresh_update_summary_cache(remotes: Vec<Remote>) -> Result<(), Er
>
> let mut content = get_cached_summary_or_default()?;
>
> + // Clean out any remotes that might have been removed from the remote config in the meanwhile.
> + content
> + .remotes
> + .retain(|remote, _| fetch_results.remote_results.contains_key(remote));
> +
> for (remote_name, result) in fetch_results.remote_results {
> let entry = content
> .remotes
> @@ -234,6 +239,11 @@ pub async fn refresh_update_summary_cache(remotes: Vec<Remote>) -> Result<(), Er
> Ok(remote_result) => {
> entry.status = RemoteUpdateStatus::Success;
>
> + // Clean out any nodes that might have been removed from the cluster in the meanwhile.
> + entry
> + .nodes
> + .retain(|name, _| remote_result.node_results.contains_key(name));
> +
> for (node_name, node_result) in remote_result.node_results {
> match node_result {
> Ok(NodeResults { data, .. }) => {
Patch looks good to me! I could reproduce the issue and can confirm
the patch works. After the patch,
/var/cache/proxmox-datacenter-manager/remote-updates.json no
longer shows the removed remote when running "Refresh All".
Reviewed-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
Tested-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
_______________________________________________
pdm-devel mailing list
pdm-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pdm-devel
* [pbs-devel] superseded: [PATCH proxmox{-backup, } v4 0/8] fix #6939: acme: support servers returning 204 for nonce requests
2025-12-03 10:22 11% [pbs-devel] [PATCH proxmox{-backup, } v4 " Samuel Rufinatscha
` (8 preceding siblings ...)
2025-12-09 16:50 5% ` [pbs-devel] [PATCH proxmox{-backup, } v4 0/8] " Max R. Carrara
@ 2026-01-08 11:48 13% ` Samuel Rufinatscha
9 siblings, 0 replies; 200+ results
From: Samuel Rufinatscha @ 2026-01-08 11:48 UTC (permalink / raw)
To: pbs-devel
https://lore.proxmox.com/pbs-devel/20260108112629.189670-1-s.rufinatscha@proxmox.com/T/#t
On 12/3/25 11:21 AM, Samuel Rufinatscha wrote:
> Hi,
>
> this series fixes account registration for ACME providers that return
> HTTP 204 No Content to the newNonce request. Currently, both the PBS
> ACME client and the shared ACME client in proxmox-acme only accept
> HTTP 200 OK for this request. The issue was observed in PBS against a
> custom ACME deployment and reported as bug #6939 [1].
>
> ## Problem
>
> During ACME account registration, PBS first fetches an anti-replay
> nonce by sending a HEAD request to the CA’s newNonce URL.
> RFC 8555 §7.2 [2] states that:
>
> * the server MUST include a Replay-Nonce header with a fresh nonce,
> * the server SHOULD use status 200 OK for the HEAD request,
> * the server MUST also handle GET on the same resource and may return
> 204 No Content with an empty body.
>
> The reporter observed the following error message:
>
> *ACME server responded with unexpected status code: 204*
>
> and mentioned that the issue did not appear with PVE 9 [1]. Looking at
> PVE’s Perl ACME client [3], it uses a GET request instead of HEAD and
> accepts any 2xx success code when retrieving the nonce. This difference
> in behavior does not affect functionality but is worth noting for
> consistency across implementations.
>
> ## Approach
>
> To support ACME providers which return 204 No Content, the Rust ACME
> clients in proxmox-backup and proxmox need to treat both 200 OK and 204
> No Content as valid responses for the nonce request, as long as a
> Replay-Nonce header is present.
>
> This series changes the expected field of the internal Request type
> from a single u16 to a list of allowed status codes
> (e.g. &'static [u16]), so one request can explicitly accept multiple
> success codes.
>
> To avoid fixing the issue twice (once in PBS’ own ACME client and once
> in the shared Rust client), this series first refactors PBS to use the
> shared AcmeClient from proxmox-acme / proxmox-acme-api, similar to PDM,
> and then applies the bug fix in that shared implementation so that all
> consumers benefit from the more tolerant behavior.
>
> ## Testing
>
> *Testing the refactor*
>
> To test the refactor, I
> (1) installed latest stable PBS on a VM
> (2) created .deb package from latest PBS (master), containing the
> refactor
> (3) installed created .deb package
> (4) installed Pebble from Let's Encrypt [5] on the same VM
> (5) created an ACME account and ordered the new certificate for the
> host domain.
>
> Steps to reproduce:
>
> (1) install latest stable PBS on a VM, create .deb package from latest
> PBS (master) containing the refactor, install created .deb package
> (2) install Pebble from Let's Encrypt [5] on the same VM:
>
> cd
> apt update
> apt install -y golang git
> git clone https://github.com/letsencrypt/pebble
> cd pebble
> go build ./cmd/pebble
>
> then, download and trust the Pebble cert:
>
> wget https://raw.githubusercontent.com/letsencrypt/pebble/main/test/certs/pebble.minica.pem
> cp pebble.minica.pem /usr/local/share/ca-certificates/pebble.minica.crt
> update-ca-certificates
>
> We want Pebble to perform HTTP-01 validation against port 80, because
> PBS’s standalone plugin will bind port 80. Set httpPort to 80.
>
> nano ./test/config/pebble-config.json
>
> Start the Pebble server in the background:
>
> ./pebble -config ./test/config/pebble-config.json &
>
> Create a Pebble ACME account:
>
> proxmox-backup-manager acme account register default admin@example.com --directory 'https://127.0.0.1:14000/dir'
>
> To verify persistence of the account I checked
>
> ls /etc/proxmox-backup/acme/accounts
>
> Verified that update-account works:
>
> proxmox-backup-manager acme account update default --contact "a@example.com,b@example.com"
> proxmox-backup-manager acme account info default
>
> In the PBS GUI, create a new domain; you can use your host's domain
> name (see /etc/hosts). Select the created account and order the
> certificate.
>
> After a page reload, you might need to accept the new certificate in the browser.
> In the PBS dashboard, you should see the new Pebble certificate.
>
> Note: on reboot, the created Pebble ACME account will be gone and you
> will need to create a new one, as Pebble does not persist account info.
> In that case, remove the previously created account in
> /etc/proxmox-backup/acme/accounts.
>
> *Testing the newNonce fix*
>
> To prove the ACME newNonce fix, I put nginx in front of Pebble to
> intercept the newNonce request and return 204 No Content instead of
> 200 OK; all other requests are forwarded unchanged to Pebble. This
> requires trusting the nginx CA via
> /usr/local/share/ca-certificates + update-ca-certificates on the VM.
>
> Then I ran the following command against nginx:
>
> proxmox-backup-manager acme account register proxytest root@backup.local --directory 'https://nginx-address/dir'
>
> The account could be created successfully. When adjusting the nginx
> configuration to return any other, non-expected success status code,
> PBS rejects the response as expected.
>
> ## Patch summary
>
> 0001 – acme: include proxmox-acme-api dependency
> Adds proxmox-acme-api as a new dependency for the ACME code. This
> prepares the codebase to use the shared ACME API instead of local
> implementations.
>
> 0002 – acme: drop local AcmeClient
> Removes the local AcmeClient implementation. Minimal changes
> required to support the removal.
>
> 0003 – acme: change API impls to use proxmox-acme-api handler
> Updates existing ACME API implementations to use the handlers provided
> by proxmox-acme-api.
>
> 0004 – acme: certificate ordering through proxmox-acme-api
> Perform certificate ordering through proxmox-acme-api instead of local
> logic.
>
> 0005 – acme api: add helper to load client for an account
> Introduces a helper function to load an ACME client instance for a
> given account. Required for the PBS refactor.
>
> 0006 – acme: reduce visibility of Request type
> Restricts the visibility of the internal Request type.
>
> 0007 – acme: introduce http_status module
> Adds a dedicated http_status module for handling common HTTP status
> codes.
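A hypothetical shape for such a module (constant names here are illustrative; the actual patch may differ):

```rust
// Hypothetical sketch of an internal http_status module collecting the
// success codes the ACME client cares about; names are assumptions,
// not the actual patch contents.
pub mod http_status {
    pub const OK: u16 = 200;
    pub const CREATED: u16 = 201; // e.g. newAccount per RFC 8555
    pub const NO_CONTENT: u16 = 204; // tolerated for newNonce

    /// Status codes accepted for new-nonce requests.
    pub const NEW_NONCE: &[u16] = &[OK, NO_CONTENT];
}

fn main() {
    assert!(http_status::NEW_NONCE.contains(&204));
}
```

Centralizing the constants means the later bug fix only has to touch one list instead of scattered literals.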
>
> 0008 – fix #6939: acme: support servers returning 204 for nonce
> Adjusts nonce handling to support ACME servers that return HTTP 204
> (No Content) for new-nonce requests.
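In other words (a hedged sketch, not the actual patch code): RFC 8555 describes newNonce answering a HEAD request with 200, but what the client actually needs is the Replay-Nonce header, so both 200 and 204 can be accepted as long as that header is present:

```rust
// Sketch of the tolerant newNonce handling; function and constant
// names are illustrative, not the actual proxmox-acme API.
const NEW_NONCE_SUCCESS: &[u16] = &[200, 204];

fn extract_nonce(status: u16, replay_nonce: Option<&str>) -> Result<String, String> {
    if !NEW_NONCE_SUCCESS.contains(&status) {
        return Err(format!("unexpected status code {status} for newNonce"));
    }
    // A success status without a Replay-Nonce header is still an error.
    replay_nonce
        .map(str::to_owned)
        .ok_or_else(|| "newNonce response carried no Replay-Nonce header".to_string())
}

fn main() {
    assert_eq!(extract_nonce(204, Some("abc")).unwrap(), "abc");
    assert!(extract_nonce(200, None).is_err());
    assert!(extract_nonce(302, Some("abc")).is_err());
}
```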
>
> Thanks for considering this patch series, I look forward to your
> feedback.
>
> Best,
> Samuel Rufinatscha
>
> ## Changelog
>
> Changes from v3 to v4:
>
> Removed: [PATCH proxmox-backup v3 1/1].
>
> Added:
>
> [PATCH proxmox-backup v4 1/4] acme: include proxmox-acme-api dependency
> * New: add proxmox-acme-api as a dependency and initialize it in
> PBS so PBS can use the shared ACME API instead.
>
> [PATCH proxmox-backup v4 2/4] acme: drop local AcmeClient
> * New: remove the PBS-local AcmeClient implementation and switch PBS
> over to the shared proxmox-acme async client.
>
> [PATCH proxmox-backup v4 3/4] acme: change API impls to use proxmox-acme-api
> handlers
> * New: rework PBS’ ACME API endpoints to delegate to
> proxmox-acme-api handlers instead of duplicating logic locally.
>
> [PATCH proxmox-backup v4 4/4] acme: certificate ordering through
> proxmox-acme-api
> * New: move PBS’ ACME certificate ordering logic over to
> proxmox-acme-api, keeping only certificate installation/reload in
> PBS.
>
> [PATCH proxmox v4 1/4] acme-api: add helper to load client for an account
> * New: add a load_client_with_account helper in proxmox-acme-api so
> PBS (and others) can construct an AcmeClient for a configured account
> without duplicating boilerplate.
>
> [PATCH proxmox v4 2/4] acme: reduce visibility of Request type
> * New: hide the low-level Request type and its fields behind
> constructors / reduced visibility so changes to “expected” no longer
> affect the public API as they did in v3.
>
> [PATCH proxmox v4 3/4] acme: introduce http_status module
> * New: split out the HTTP status constants into an internal
> http_status module as a separate preparatory cleanup before the bug
> fix, instead of doing this inline like in v3.
>
> Changed:
>
> [PATCH proxmox v3 1/1] -> [PATCH proxmox v4 4/4]
> fix #6939: acme: support server returning 204 for nonce requests
> * Rebased on top of the refactor: keep the same behavioural fix as in v3
> (accept 204 for newNonce with Replay-Nonce present), but implement it
> on top of the http_status module that is part of the refactor.
>
> Changes from v2 to v3:
>
> [PATCH proxmox v3 1/1] fix #6939: support providers returning 204 for nonce
> requests
> * Rename `http_success` module to `http_status`
>
> [PATCH proxmox-backup v3 1/1] acme: accept HTTP 204 from newNonce endpoint
> * Replace `http_success` usage
>
> Changes from v1 to v2:
>
> [PATCH proxmox v2 1/1] fix #6939: support providers returning 204 for nonce
> requests
> * Introduced `http_success` module to contain the http success codes
> * Replaced `Vec<u16>` with `&[u16]` for expected codes to avoid
> allocations.
> * Clarified PVE's Perl ACME client behaviour in the commit message.
>
> [PATCH proxmox-backup v2 1/1] acme: accept HTTP 204 from newNonce endpoint
> * Integrated the `http_success` module, replacing `Vec<u16>` with `&[u16]`
> * Clarified PVE's Perl ACME client behaviour in the commit message.
>
> [1] Bugzilla report #6939:
> [https://bugzilla.proxmox.com/show_bug.cgi?id=6939](https://bugzilla.proxmox.com/show_bug.cgi?id=6939)
> [2] RFC 8555 (ACME):
> [https://datatracker.ietf.org/doc/html/rfc8555/#section-7.2](https://datatracker.ietf.org/doc/html/rfc8555/#section-7.2)
> [3] PVE’s Perl ACME client (allow 2xx codes for nonce requests):
> [https://git.proxmox.com/?p=proxmox-acme.git;a=blob;f=src/PVE/ACME.pm;h=f1e9bb7d316e3cea1e376c610b0479119217aecc;hb=HEAD#l597](https://git.proxmox.com/?p=proxmox-acme.git;a=blob;f=src/PVE/ACME.pm;h=f1e9bb7d316e3cea1e376c610b0479119217aecc;hb=HEAD#l597)
> [4] Pebble ACME server:
> [https://github.com/letsencrypt/pebble](https://github.com/letsencrypt/pebble)
> [5] Pebble ACME server (perform GET request):
> [https://git.proxmox.com/?p=proxmox-acme.git;a=blob;f=src/PVE/ACME.pm;h=f1e9bb7d316e3cea1e376c610b0479119217aecc;hb=HEAD#l219](https://git.proxmox.com/?p=proxmox-acme.git;a=blob;f=src/PVE/ACME.pm;h=f1e9bb7d316e3cea1e376c610b0479119217aecc;hb=HEAD#l219)
>
> proxmox-backup:
>
> Samuel Rufinatscha (4):
> acme: include proxmox-acme-api dependency
> acme: drop local AcmeClient
> acme: change API impls to use proxmox-acme-api handlers
> acme: certificate ordering through proxmox-acme-api
>
> Cargo.toml | 3 +
> src/acme/client.rs | 691 -------------------------
> src/acme/mod.rs | 5 -
> src/acme/plugin.rs | 336 ------------
> src/api2/config/acme.rs | 407 ++-------------
> src/api2/node/certificates.rs | 240 ++-------
> src/api2/types/acme.rs | 98 ----
> src/api2/types/mod.rs | 3 -
> src/bin/proxmox-backup-api.rs | 2 +
> src/bin/proxmox-backup-manager.rs | 2 +
> src/bin/proxmox-backup-proxy.rs | 1 +
> src/bin/proxmox_backup_manager/acme.rs | 21 +-
> src/config/acme/mod.rs | 51 +-
> src/config/acme/plugin.rs | 99 +---
> src/config/node.rs | 29 +-
> src/lib.rs | 2 -
> 16 files changed, 103 insertions(+), 1887 deletions(-)
> delete mode 100644 src/acme/client.rs
> delete mode 100644 src/acme/mod.rs
> delete mode 100644 src/acme/plugin.rs
> delete mode 100644 src/api2/types/acme.rs
>
>
> proxmox:
>
> Samuel Rufinatscha (4):
> acme-api: add helper to load client for an account
> acme: reduce visibility of Request type
> acme: introduce http_status module
> fix #6939: acme: support servers returning 204 for nonce requests
>
> proxmox-acme-api/src/account_api_impl.rs | 5 +++++
> proxmox-acme-api/src/lib.rs | 3 ++-
> proxmox-acme/src/account.rs | 27 +++++++++++++-----------
> proxmox-acme/src/async_client.rs | 8 +++----
> proxmox-acme/src/authorization.rs | 2 +-
> proxmox-acme/src/client.rs | 8 +++----
> proxmox-acme/src/lib.rs | 6 ++----
> proxmox-acme/src/order.rs | 2 +-
> proxmox-acme/src/request.rs | 25 +++++++++++++++-------
> 9 files changed, 51 insertions(+), 35 deletions(-)
>
>
> Summary over all repositories:
> 25 files changed, 154 insertions(+), 1922 deletions(-)
>
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
^ permalink raw reply [relevance 13%]
* [pbs-devel] [PATCH proxmox-backup v5 3/5] acme: drop local AcmeClient
2026-01-08 11:26 11% [pbs-devel] [PATCH proxmox{, -backup} v5 0/9] fix #6939: acme: support servers returning 204 for nonce requests Samuel Rufinatscha
` (5 preceding siblings ...)
2026-01-08 11:26 15% ` [pbs-devel] [PATCH proxmox-backup v5 2/5] acme: include proxmox-acme-api dependency Samuel Rufinatscha
@ 2026-01-08 11:26 6% ` Samuel Rufinatscha
2026-01-13 13:45 5% ` Fabian Grünbichler
2026-01-08 11:26 8% ` [pbs-devel] [PATCH proxmox-backup v5 4/5] acme: change API impls to use proxmox-acme-api handlers Samuel Rufinatscha
` (3 subsequent siblings)
10 siblings, 1 reply; 200+ results
From: Samuel Rufinatscha @ 2026-01-08 11:26 UTC (permalink / raw)
To: pbs-devel
PBS currently uses its own ACME client and API logic, while PDM uses the
factored-out proxmox-acme and proxmox-acme-api crates. This duplication
risks behavioural differences and means ACME code must be maintained in
two places. This patch is part of a series moving PBS over to the shared
ACME stack.
Changes:
- Remove the local src/acme/client.rs and switch to
proxmox_acme::async_client::AcmeClient where needed.
- Use proxmox_acme_api::load_client_with_account instead of the custom
AcmeClient::load() function
- Replace the local do_register() logic with
proxmox_acme_api::register_account, which also ensures accounts are
persisted
- Replace the local AcmeAccountName type with the one from
proxmox-acme-api, as required by proxmox_acme_api::register_account
Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
---
src/acme/client.rs | 691 -------------------------
src/acme/mod.rs | 3 -
src/acme/plugin.rs | 2 +-
src/api2/config/acme.rs | 50 +-
src/api2/node/certificates.rs | 2 +-
src/api2/types/acme.rs | 8 -
src/bin/proxmox_backup_manager/acme.rs | 17 +-
src/config/acme/mod.rs | 8 +-
src/config/node.rs | 9 +-
9 files changed, 36 insertions(+), 754 deletions(-)
delete mode 100644 src/acme/client.rs
diff --git a/src/acme/client.rs b/src/acme/client.rs
deleted file mode 100644
index 9fb6ad55..00000000
--- a/src/acme/client.rs
+++ /dev/null
@@ -1,691 +0,0 @@
-//! HTTP Client for the ACME protocol.
-
-use std::fs::OpenOptions;
-use std::io;
-use std::os::unix::fs::OpenOptionsExt;
-
-use anyhow::{bail, format_err};
-use bytes::Bytes;
-use http_body_util::BodyExt;
-use hyper::Request;
-use nix::sys::stat::Mode;
-use proxmox_http::Body;
-use serde::{Deserialize, Serialize};
-
-use proxmox_acme::account::AccountCreator;
-use proxmox_acme::order::{Order, OrderData};
-use proxmox_acme::types::AccountData as AcmeAccountData;
-use proxmox_acme::Request as AcmeRequest;
-use proxmox_acme::{Account, Authorization, Challenge, Directory, Error, ErrorResponse};
-use proxmox_http::client::Client;
-use proxmox_sys::fs::{replace_file, CreateOptions};
-
-use crate::api2::types::AcmeAccountName;
-use crate::config::acme::account_path;
-use crate::tools::pbs_simple_http;
-
-/// Our on-disk format inherited from PVE's proxmox-acme code.
-#[derive(Deserialize, Serialize)]
-#[serde(rename_all = "camelCase")]
-pub struct AccountData {
- /// The account's location URL.
- location: String,
-
- /// The account data.
- account: AcmeAccountData,
-
- /// The private key as PEM formatted string.
- key: String,
-
- /// ToS URL the user agreed to.
- #[serde(skip_serializing_if = "Option::is_none")]
- tos: Option<String>,
-
- #[serde(skip_serializing_if = "is_false", default)]
- debug: bool,
-
- /// The directory's URL.
- directory_url: String,
-}
-
-#[inline]
-fn is_false(b: &bool) -> bool {
- !*b
-}
-
-pub struct AcmeClient {
- directory_url: String,
- debug: bool,
- account_path: Option<String>,
- tos: Option<String>,
- account: Option<Account>,
- directory: Option<Directory>,
- nonce: Option<String>,
- http_client: Client,
-}
-
-impl AcmeClient {
- /// Create a new ACME client for a given ACME directory URL.
- pub fn new(directory_url: String) -> Self {
- Self {
- directory_url,
- debug: false,
- account_path: None,
- tos: None,
- account: None,
- directory: None,
- nonce: None,
- http_client: pbs_simple_http(None),
- }
- }
-
- /// Load an existing ACME account by name.
- pub async fn load(account_name: &AcmeAccountName) -> Result<Self, anyhow::Error> {
- let account_path = account_path(account_name.as_ref());
- let data = match tokio::fs::read(&account_path).await {
- Ok(data) => data,
- Err(err) if err.kind() == io::ErrorKind::NotFound => {
- bail!("acme account '{}' does not exist", account_name)
- }
- Err(err) => bail!(
- "failed to load acme account from '{}' - {}",
- account_path,
- err
- ),
- };
- let data: AccountData = serde_json::from_slice(&data).map_err(|err| {
- format_err!(
- "failed to parse acme account from '{}' - {}",
- account_path,
- err
- )
- })?;
-
- let account = Account::from_parts(data.location, data.key, data.account);
-
- let mut me = Self::new(data.directory_url);
- me.debug = data.debug;
- me.account_path = Some(account_path);
- me.tos = data.tos;
- me.account = Some(account);
-
- Ok(me)
- }
-
- pub async fn new_account<'a>(
- &'a mut self,
- account_name: &AcmeAccountName,
- tos_agreed: bool,
- contact: Vec<String>,
- rsa_bits: Option<u32>,
- eab_creds: Option<(String, String)>,
- ) -> Result<&'a Account, anyhow::Error> {
- self.tos = if tos_agreed {
- self.terms_of_service_url().await?.map(str::to_owned)
- } else {
- None
- };
-
- let mut account = Account::creator()
- .set_contacts(contact)
- .agree_to_tos(tos_agreed);
-
- if let Some((eab_kid, eab_hmac_key)) = eab_creds {
- account = account.set_eab_credentials(eab_kid, eab_hmac_key)?;
- }
-
- let account = if let Some(bits) = rsa_bits {
- account.generate_rsa_key(bits)?
- } else {
- account.generate_ec_key()?
- };
-
- let _ = self.register_account(account).await?;
-
- crate::config::acme::make_acme_account_dir()?;
- let account_path = account_path(account_name.as_ref());
- let file = OpenOptions::new()
- .write(true)
- .create_new(true)
- .mode(0o600)
- .open(&account_path)
- .map_err(|err| format_err!("failed to open {:?} for writing: {}", account_path, err))?;
- self.write_to(file).map_err(|err| {
- format_err!(
- "failed to write acme account to {:?}: {}",
- account_path,
- err
- )
- })?;
- self.account_path = Some(account_path);
-
- // unwrap: Setting `self.account` is literally this function's job, we just can't keep
- // the borrow from from `self.register_account()` active due to clashes.
- Ok(self.account.as_ref().unwrap())
- }
-
- fn save(&self) -> Result<(), anyhow::Error> {
- let mut data = Vec::<u8>::new();
- self.write_to(&mut data)?;
- let account_path = self.account_path.as_ref().ok_or_else(|| {
- format_err!("no account path set, cannot save updated account information")
- })?;
- crate::config::acme::make_acme_account_dir()?;
- replace_file(
- account_path,
- &data,
- CreateOptions::new()
- .perm(Mode::from_bits_truncate(0o600))
- .owner(nix::unistd::ROOT)
- .group(nix::unistd::Gid::from_raw(0)),
- true,
- )
- }
-
- /// Shortcut to `account().ok_or_else(...).key_authorization()`.
- pub fn key_authorization(&self, token: &str) -> Result<String, anyhow::Error> {
- Ok(Self::need_account(&self.account)?.key_authorization(token)?)
- }
-
- /// Shortcut to `account().ok_or_else(...).dns_01_txt_value()`.
- /// the key authorization value.
- pub fn dns_01_txt_value(&self, token: &str) -> Result<String, anyhow::Error> {
- Ok(Self::need_account(&self.account)?.dns_01_txt_value(token)?)
- }
-
- async fn register_account(
- &mut self,
- account: AccountCreator,
- ) -> Result<&Account, anyhow::Error> {
- let mut retry = retry();
- let mut response = loop {
- retry.tick()?;
-
- let (directory, nonce) = Self::get_dir_nonce(
- &mut self.http_client,
- &self.directory_url,
- &mut self.directory,
- &mut self.nonce,
- )
- .await?;
- let request = account.request(directory, nonce)?;
- match self.run_request(request).await {
- Ok(response) => break response,
- Err(err) if err.is_bad_nonce() => continue,
- Err(err) => return Err(err.into()),
- }
- };
-
- let account = account.response(response.location_required()?, &response.body)?;
-
- self.account = Some(account);
- Ok(self.account.as_ref().unwrap())
- }
-
- pub async fn update_account<T: Serialize>(
- &mut self,
- data: &T,
- ) -> Result<&Account, anyhow::Error> {
- let account = Self::need_account(&self.account)?;
-
- let mut retry = retry();
- let response = loop {
- retry.tick()?;
-
- let (_directory, nonce) = Self::get_dir_nonce(
- &mut self.http_client,
- &self.directory_url,
- &mut self.directory,
- &mut self.nonce,
- )
- .await?;
-
- let request = account.post_request(&account.location, nonce, data)?;
- match Self::execute(&mut self.http_client, request, &mut self.nonce).await {
- Ok(response) => break response,
- Err(err) if err.is_bad_nonce() => continue,
- Err(err) => return Err(err.into()),
- }
- };
-
- // unwrap: we've been keeping an immutable reference to it from the top of the method
- let _ = account;
- self.account.as_mut().unwrap().data = response.json()?;
- self.save()?;
- Ok(self.account.as_ref().unwrap())
- }
-
- pub async fn new_order<I>(&mut self, domains: I) -> Result<Order, anyhow::Error>
- where
- I: IntoIterator<Item = String>,
- {
- let account = Self::need_account(&self.account)?;
-
- let order = domains
- .into_iter()
- .fold(OrderData::new(), |order, domain| order.domain(domain));
-
- let mut retry = retry();
- loop {
- retry.tick()?;
-
- let (directory, nonce) = Self::get_dir_nonce(
- &mut self.http_client,
- &self.directory_url,
- &mut self.directory,
- &mut self.nonce,
- )
- .await?;
-
- let mut new_order = account.new_order(&order, directory, nonce)?;
- let mut response = match Self::execute(
- &mut self.http_client,
- new_order.request.take().unwrap(),
- &mut self.nonce,
- )
- .await
- {
- Ok(response) => response,
- Err(err) if err.is_bad_nonce() => continue,
- Err(err) => return Err(err.into()),
- };
-
- return Ok(
- new_order.response(response.location_required()?, response.bytes().as_ref())?
- );
- }
- }
-
- /// Low level "POST-as-GET" request.
- async fn post_as_get(&mut self, url: &str) -> Result<AcmeResponse, anyhow::Error> {
- let account = Self::need_account(&self.account)?;
-
- let mut retry = retry();
- loop {
- retry.tick()?;
-
- let (_directory, nonce) = Self::get_dir_nonce(
- &mut self.http_client,
- &self.directory_url,
- &mut self.directory,
- &mut self.nonce,
- )
- .await?;
-
- let request = account.get_request(url, nonce)?;
- match Self::execute(&mut self.http_client, request, &mut self.nonce).await {
- Ok(response) => return Ok(response),
- Err(err) if err.is_bad_nonce() => continue,
- Err(err) => return Err(err.into()),
- }
- }
- }
-
- /// Low level POST request.
- async fn post<T: Serialize>(
- &mut self,
- url: &str,
- data: &T,
- ) -> Result<AcmeResponse, anyhow::Error> {
- let account = Self::need_account(&self.account)?;
-
- let mut retry = retry();
- loop {
- retry.tick()?;
-
- let (_directory, nonce) = Self::get_dir_nonce(
- &mut self.http_client,
- &self.directory_url,
- &mut self.directory,
- &mut self.nonce,
- )
- .await?;
-
- let request = account.post_request(url, nonce, data)?;
- match Self::execute(&mut self.http_client, request, &mut self.nonce).await {
- Ok(response) => return Ok(response),
- Err(err) if err.is_bad_nonce() => continue,
- Err(err) => return Err(err.into()),
- }
- }
- }
-
- /// Request challenge validation. Afterwards, the challenge should be polled.
- pub async fn request_challenge_validation(
- &mut self,
- url: &str,
- ) -> Result<Challenge, anyhow::Error> {
- Ok(self
- .post(url, &serde_json::Value::Object(Default::default()))
- .await?
- .json()?)
- }
-
- /// Assuming the provided URL is an 'Authorization' URL, get and deserialize it.
- pub async fn get_authorization(&mut self, url: &str) -> Result<Authorization, anyhow::Error> {
- Ok(self.post_as_get(url).await?.json()?)
- }
-
- /// Assuming the provided URL is an 'Order' URL, get and deserialize it.
- pub async fn get_order(&mut self, url: &str) -> Result<OrderData, anyhow::Error> {
- Ok(self.post_as_get(url).await?.json()?)
- }
-
- /// Finalize an Order via its `finalize` URL property and the DER encoded CSR.
- pub async fn finalize(&mut self, url: &str, csr: &[u8]) -> Result<(), anyhow::Error> {
- let csr = proxmox_base64::url::encode_no_pad(csr);
- let data = serde_json::json!({ "csr": csr });
- self.post(url, &data).await?;
- Ok(())
- }
-
- /// Download a certificate via its 'certificate' URL property.
- ///
- /// The certificate will be a PEM certificate chain.
- pub async fn get_certificate(&mut self, url: &str) -> Result<Bytes, anyhow::Error> {
- Ok(self.post_as_get(url).await?.body)
- }
-
- /// Revoke an existing certificate (PEM or DER formatted).
- pub async fn revoke_certificate(
- &mut self,
- certificate: &[u8],
- reason: Option<u32>,
- ) -> Result<(), anyhow::Error> {
- // TODO: This can also work without an account.
- let account = Self::need_account(&self.account)?;
-
- let revocation = account.revoke_certificate(certificate, reason)?;
-
- let mut retry = retry();
- loop {
- retry.tick()?;
-
- let (directory, nonce) = Self::get_dir_nonce(
- &mut self.http_client,
- &self.directory_url,
- &mut self.directory,
- &mut self.nonce,
- )
- .await?;
-
- let request = revocation.request(directory, nonce)?;
- match Self::execute(&mut self.http_client, request, &mut self.nonce).await {
- Ok(_response) => return Ok(()),
- Err(err) if err.is_bad_nonce() => continue,
- Err(err) => return Err(err.into()),
- }
- }
- }
-
- fn need_account(account: &Option<Account>) -> Result<&Account, anyhow::Error> {
- account
- .as_ref()
- .ok_or_else(|| format_err!("cannot use client without an account"))
- }
-
- pub(crate) fn account(&self) -> Result<&Account, anyhow::Error> {
- Self::need_account(&self.account)
- }
-
- pub fn tos(&self) -> Option<&str> {
- self.tos.as_deref()
- }
-
- pub fn directory_url(&self) -> &str {
- &self.directory_url
- }
-
- fn to_account_data(&self) -> Result<AccountData, anyhow::Error> {
- let account = self.account()?;
-
- Ok(AccountData {
- location: account.location.clone(),
- key: account.private_key.clone(),
- account: AcmeAccountData {
- only_return_existing: false, // don't actually write this out in case it's set
- ..account.data.clone()
- },
- tos: self.tos.clone(),
- debug: self.debug,
- directory_url: self.directory_url.clone(),
- })
- }
-
- fn write_to<T: io::Write>(&self, out: T) -> Result<(), anyhow::Error> {
- let data = self.to_account_data()?;
-
- Ok(serde_json::to_writer_pretty(out, &data)?)
- }
-}
-
-struct AcmeResponse {
- body: Bytes,
- location: Option<String>,
- got_nonce: bool,
-}
-
-impl AcmeResponse {
- /// Convenience helper to assert that a location header was part of the response.
- fn location_required(&mut self) -> Result<String, anyhow::Error> {
- self.location
- .take()
- .ok_or_else(|| format_err!("missing Location header"))
- }
-
- /// Convenience shortcut to perform json deserialization of the returned body.
- fn json<T: for<'a> Deserialize<'a>>(&self) -> Result<T, Error> {
- Ok(serde_json::from_slice(&self.body)?)
- }
-
- /// Convenience shortcut to get the body as bytes.
- fn bytes(&self) -> &[u8] {
- &self.body
- }
-}
-
-impl AcmeClient {
- /// Non-self-borrowing run_request version for borrow workarounds.
- async fn execute(
- http_client: &mut Client,
- request: AcmeRequest,
- nonce: &mut Option<String>,
- ) -> Result<AcmeResponse, Error> {
- let req_builder = Request::builder().method(request.method).uri(&request.url);
-
- let http_request = if !request.content_type.is_empty() {
- req_builder
- .header("Content-Type", request.content_type)
- .header("Content-Length", request.body.len())
- .body(request.body.into())
- } else {
- req_builder.body(Body::empty())
- }
- .map_err(|err| Error::Custom(format!("failed to create http request: {err}")))?;
-
- let response = http_client
- .request(http_request)
- .await
- .map_err(|err| Error::Custom(err.to_string()))?;
- let (parts, body) = response.into_parts();
-
- let status = parts.status.as_u16();
- let body = body
- .collect()
- .await
- .map_err(|err| Error::Custom(format!("failed to retrieve response body: {err}")))?
- .to_bytes();
-
- let got_nonce = if let Some(new_nonce) = parts.headers.get(proxmox_acme::REPLAY_NONCE) {
- let new_nonce = new_nonce.to_str().map_err(|err| {
- Error::Client(format!(
- "received invalid replay-nonce header from ACME server: {err}"
- ))
- })?;
- *nonce = Some(new_nonce.to_owned());
- true
- } else {
- false
- };
-
- if parts.status.is_success() {
- if status != request.expected {
- return Err(Error::InvalidApi(format!(
- "ACME server responded with unexpected status code: {:?}",
- parts.status
- )));
- }
-
- let location = parts
- .headers
- .get("Location")
- .map(|header| {
- header.to_str().map(str::to_owned).map_err(|err| {
- Error::Client(format!(
- "received invalid location header from ACME server: {err}"
- ))
- })
- })
- .transpose()?;
-
- return Ok(AcmeResponse {
- body,
- location,
- got_nonce,
- });
- }
-
- let error: ErrorResponse = serde_json::from_slice(&body).map_err(|err| {
- Error::Client(format!(
- "error status with improper error ACME response: {err}"
- ))
- })?;
-
- if error.ty == proxmox_acme::error::BAD_NONCE {
- if !got_nonce {
- return Err(Error::InvalidApi(
- "badNonce without a new Replay-Nonce header".to_string(),
- ));
- }
- return Err(Error::BadNonce);
- }
-
- Err(Error::Api(error))
- }
-
- /// Low-level API to run an n API request. This automatically updates the current nonce!
- async fn run_request(&mut self, request: AcmeRequest) -> Result<AcmeResponse, Error> {
- Self::execute(&mut self.http_client, request, &mut self.nonce).await
- }
-
- pub async fn directory(&mut self) -> Result<&Directory, Error> {
- Ok(Self::get_directory(
- &mut self.http_client,
- &self.directory_url,
- &mut self.directory,
- &mut self.nonce,
- )
- .await?
- .0)
- }
-
- async fn get_directory<'a, 'b>(
- http_client: &mut Client,
- directory_url: &str,
- directory: &'a mut Option<Directory>,
- nonce: &'b mut Option<String>,
- ) -> Result<(&'a Directory, Option<&'b str>), Error> {
- if let Some(d) = directory {
- return Ok((d, nonce.as_deref()));
- }
-
- let response = Self::execute(
- http_client,
- AcmeRequest {
- url: directory_url.to_string(),
- method: "GET",
- content_type: "",
- body: String::new(),
- expected: 200,
- },
- nonce,
- )
- .await?;
-
- *directory = Some(Directory::from_parts(
- directory_url.to_string(),
- response.json()?,
- ));
-
- Ok((directory.as_mut().unwrap(), nonce.as_deref()))
- }
-
- /// Like `get_directory`, but if the directory provides no nonce, also performs a `HEAD`
- /// request on the new nonce URL.
- async fn get_dir_nonce<'a, 'b>(
- http_client: &mut Client,
- directory_url: &str,
- directory: &'a mut Option<Directory>,
- nonce: &'b mut Option<String>,
- ) -> Result<(&'a Directory, &'b str), Error> {
- // this let construct is a lifetime workaround:
- let _ = Self::get_directory(http_client, directory_url, directory, nonce).await?;
- let dir = directory.as_ref().unwrap(); // the above fails if it couldn't fill this option
- if nonce.is_none() {
- // this is also a lifetime issue...
- let _ = Self::get_nonce(http_client, nonce, dir.new_nonce_url()).await?;
- };
- Ok((dir, nonce.as_deref().unwrap()))
- }
-
- pub async fn terms_of_service_url(&mut self) -> Result<Option<&str>, Error> {
- Ok(self.directory().await?.terms_of_service_url())
- }
-
- async fn get_nonce<'a>(
- http_client: &mut Client,
- nonce: &'a mut Option<String>,
- new_nonce_url: &str,
- ) -> Result<&'a str, Error> {
- let response = Self::execute(
- http_client,
- AcmeRequest {
- url: new_nonce_url.to_owned(),
- method: "HEAD",
- content_type: "",
- body: String::new(),
- expected: 200,
- },
- nonce,
- )
- .await?;
-
- if !response.got_nonce {
- return Err(Error::InvalidApi(
- "no new nonce received from new nonce URL".to_string(),
- ));
- }
-
- nonce
- .as_deref()
- .ok_or_else(|| Error::Client("failed to update nonce".to_string()))
- }
-}
-
-/// bad nonce retry count helper
-struct Retry(usize);
-
-const fn retry() -> Retry {
- Retry(0)
-}
-
-impl Retry {
- fn tick(&mut self) -> Result<(), Error> {
- if self.0 >= 3 {
- Err(Error::Client("kept getting a badNonce error!".to_string()))
- } else {
- self.0 += 1;
- Ok(())
- }
- }
-}
diff --git a/src/acme/mod.rs b/src/acme/mod.rs
index bf61811c..cc561f9a 100644
--- a/src/acme/mod.rs
+++ b/src/acme/mod.rs
@@ -1,5 +1,2 @@
-mod client;
-pub use client::AcmeClient;
-
pub(crate) mod plugin;
pub(crate) use plugin::get_acme_plugin;
diff --git a/src/acme/plugin.rs b/src/acme/plugin.rs
index 993d729b..6804243c 100644
--- a/src/acme/plugin.rs
+++ b/src/acme/plugin.rs
@@ -18,10 +18,10 @@ use tokio::io::{AsyncBufReadExt, AsyncRead, AsyncWriteExt, BufReader};
use tokio::net::TcpListener;
use tokio::process::Command;
+use proxmox_acme::async_client::AcmeClient;
use proxmox_acme::{Authorization, Challenge};
use proxmox_rest_server::WorkerTask;
-use crate::acme::AcmeClient;
use crate::api2::types::AcmeDomain;
use crate::config::acme::plugin::{DnsPlugin, PluginData};
diff --git a/src/api2/config/acme.rs b/src/api2/config/acme.rs
index 18671639..898f06dd 100644
--- a/src/api2/config/acme.rs
+++ b/src/api2/config/acme.rs
@@ -11,16 +11,16 @@ use serde_json::{json, Value};
use tracing::{info, warn};
use pbs_api_types::{Authid, PRIV_SYS_MODIFY};
+use proxmox_acme::async_client::AcmeClient;
use proxmox_acme::types::AccountData as AcmeAccountData;
-use proxmox_acme::Account;
+use proxmox_acme_api::AcmeAccountName;
use proxmox_rest_server::WorkerTask;
use proxmox_router::{
http_bail, list_subdirs_api_method, Permission, Router, RpcEnvironment, SubdirMap,
};
use proxmox_schema::{api, param_bail};
-use crate::acme::AcmeClient;
-use crate::api2::types::{AcmeAccountName, AcmeChallengeSchema, KnownAcmeDirectory};
+use crate::api2::types::{AcmeChallengeSchema, KnownAcmeDirectory};
use crate::config::acme::plugin::{
self, DnsPlugin, DnsPluginCore, DnsPluginCoreUpdater, PLUGIN_ID_SCHEMA,
};
@@ -141,15 +141,15 @@ pub struct AccountInfo {
)]
/// Return existing ACME account information.
pub async fn get_account(name: AcmeAccountName) -> Result<AccountInfo, Error> {
- let client = AcmeClient::load(&name).await?;
- let account = client.account()?;
+ let account_info = proxmox_acme_api::get_account(name).await?;
+
Ok(AccountInfo {
- location: account.location.clone(),
- tos: client.tos().map(str::to_owned),
- directory: client.directory_url().to_owned(),
+ location: account_info.location,
+ tos: account_info.tos,
+ directory: account_info.directory,
account: AcmeAccountData {
only_return_existing: false, // don't actually write this out in case it's set
- ..account.data.clone()
+ ..account_info.account
},
})
}
@@ -238,41 +238,24 @@ fn register_account(
auth_id.to_string(),
true,
move |_worker| async move {
- let mut client = AcmeClient::new(directory);
-
info!("Registering ACME account '{}'...", &name);
- let account = do_register_account(
- &mut client,
+ let location = proxmox_acme_api::register_account(
&name,
- tos_url.is_some(),
contact,
- None,
+ tos_url,
+ Some(directory),
eab_kid.zip(eab_hmac_key),
)
.await?;
- info!("Registration successful, account URL: {}", account.location);
+ info!("Registration successful, account URL: {}", location);
Ok(())
},
)
}
-pub async fn do_register_account<'a>(
- client: &'a mut AcmeClient,
- name: &AcmeAccountName,
- agree_to_tos: bool,
- contact: String,
- rsa_bits: Option<u32>,
- eab_creds: Option<(String, String)>,
-) -> Result<&'a Account, Error> {
- let contact = account_contact_from_string(&contact);
- client
- .new_account(name, agree_to_tos, contact, rsa_bits, eab_creds)
- .await
-}
-
#[api(
input: {
properties: {
@@ -310,7 +293,10 @@ pub fn update_account(
None => json!({}),
};
- AcmeClient::load(&name).await?.update_account(&data).await?;
+ proxmox_acme_api::load_client_with_account(&name)
+ .await?
+ .update_account(&data)
+ .await?;
Ok(())
},
@@ -348,7 +334,7 @@ pub fn deactivate_account(
auth_id.to_string(),
true,
move |_worker| async move {
- match AcmeClient::load(&name)
+ match proxmox_acme_api::load_client_with_account(&name)
.await?
.update_account(&json!({"status": "deactivated"}))
.await
diff --git a/src/api2/node/certificates.rs b/src/api2/node/certificates.rs
index 6b1d87d2..47ff8de5 100644
--- a/src/api2/node/certificates.rs
+++ b/src/api2/node/certificates.rs
@@ -8,6 +8,7 @@ use serde::{Deserialize, Serialize};
use tracing::{info, warn};
use pbs_api_types::{NODE_SCHEMA, PRIV_SYS_MODIFY};
+use proxmox_acme::async_client::AcmeClient;
use proxmox_rest_server::WorkerTask;
use proxmox_router::list_subdirs_api_method;
use proxmox_router::SubdirMap;
@@ -17,7 +18,6 @@ use proxmox_schema::api;
use pbs_buildcfg::configdir;
use pbs_tools::cert;
-use crate::acme::AcmeClient;
use crate::api2::types::AcmeDomain;
use crate::config::node::NodeConfig;
use crate::server::send_certificate_renewal_mail;
diff --git a/src/api2/types/acme.rs b/src/api2/types/acme.rs
index 8661f9e8..64175aff 100644
--- a/src/api2/types/acme.rs
+++ b/src/api2/types/acme.rs
@@ -59,14 +59,6 @@ pub struct KnownAcmeDirectory {
pub url: &'static str,
}
-proxmox_schema::api_string_type! {
- #[api(format: &PROXMOX_SAFE_ID_FORMAT)]
- /// ACME account name.
- #[derive(Clone, Eq, PartialEq, Hash, Deserialize, Serialize)]
- #[serde(transparent)]
- pub struct AcmeAccountName(String);
-}
-
#[api(
properties: {
schema: {
diff --git a/src/bin/proxmox_backup_manager/acme.rs b/src/bin/proxmox_backup_manager/acme.rs
index 0f0eafea..6ed61560 100644
--- a/src/bin/proxmox_backup_manager/acme.rs
+++ b/src/bin/proxmox_backup_manager/acme.rs
@@ -3,13 +3,13 @@ use std::io::Write;
use anyhow::{bail, Error};
use serde_json::Value;
+use proxmox_acme::async_client::AcmeClient;
+use proxmox_acme_api::AcmeAccountName;
use proxmox_router::{cli::*, ApiHandler, RpcEnvironment};
use proxmox_schema::api;
use proxmox_sys::fs::file_get_contents;
-use proxmox_backup::acme::AcmeClient;
use proxmox_backup::api2;
-use proxmox_backup::api2::types::AcmeAccountName;
use proxmox_backup::config::acme::plugin::DnsPluginCore;
use proxmox_backup::config::acme::KNOWN_ACME_DIRECTORIES;
@@ -188,17 +188,20 @@ async fn register_account(
println!("Attempting to register account with {directory_url:?}...");
- let account = api2::config::acme::do_register_account(
- &mut client,
+ let tos_agreed = tos_agreed
+ .then(|| directory.terms_of_service_url().map(str::to_owned))
+ .flatten();
+
+ let location = proxmox_acme_api::register_account(
&name,
- tos_agreed,
contact,
- None,
+ tos_agreed,
+ Some(directory_url),
eab_creds,
)
.await?;
- println!("Registration successful, account URL: {}", account.location);
+ println!("Registration successful, account URL: {}", location);
Ok(())
}
diff --git a/src/config/acme/mod.rs b/src/config/acme/mod.rs
index ac89ae5e..e4639c53 100644
--- a/src/config/acme/mod.rs
+++ b/src/config/acme/mod.rs
@@ -6,10 +6,11 @@ use anyhow::{bail, format_err, Error};
use serde_json::Value;
use pbs_api_types::PROXMOX_SAFE_ID_REGEX;
+use proxmox_acme_api::AcmeAccountName;
use proxmox_sys::error::SysError;
use proxmox_sys::fs::{file_read_string, CreateOptions};
-use crate::api2::types::{AcmeAccountName, AcmeChallengeSchema, KnownAcmeDirectory};
+use crate::api2::types::{AcmeChallengeSchema, KnownAcmeDirectory};
pub(crate) const ACME_DIR: &str = pbs_buildcfg::configdir!("/acme");
pub(crate) const ACME_ACCOUNT_DIR: &str = pbs_buildcfg::configdir!("/acme/accounts");
@@ -34,11 +35,6 @@ pub(crate) fn make_acme_dir() -> Result<(), Error> {
create_acme_subdir(ACME_DIR)
}
-pub(crate) fn make_acme_account_dir() -> Result<(), Error> {
- make_acme_dir()?;
- create_acme_subdir(ACME_ACCOUNT_DIR)
-}
-
pub const KNOWN_ACME_DIRECTORIES: &[KnownAcmeDirectory] = &[
KnownAcmeDirectory {
name: "Let's Encrypt V2",
diff --git a/src/config/node.rs b/src/config/node.rs
index 253b2e36..e4b66a20 100644
--- a/src/config/node.rs
+++ b/src/config/node.rs
@@ -8,16 +8,15 @@ use pbs_api_types::{
EMAIL_SCHEMA, MULTI_LINE_COMMENT_SCHEMA, OPENSSL_CIPHERS_TLS_1_2_SCHEMA,
OPENSSL_CIPHERS_TLS_1_3_SCHEMA,
};
+use proxmox_acme::async_client::AcmeClient;
+use proxmox_acme_api::AcmeAccountName;
use proxmox_http::ProxyConfig;
use proxmox_schema::{api, ApiStringFormat, ApiType, Updater};
use pbs_buildcfg::configdir;
use pbs_config::{open_backup_lockfile, BackupLockGuard};
-use crate::acme::AcmeClient;
-use crate::api2::types::{
- AcmeAccountName, AcmeDomain, ACME_DOMAIN_PROPERTY_SCHEMA, HTTP_PROXY_SCHEMA,
-};
+use crate::api2::types::{AcmeDomain, ACME_DOMAIN_PROPERTY_SCHEMA, HTTP_PROXY_SCHEMA};
const CONF_FILE: &str = configdir!("/node.cfg");
const LOCK_FILE: &str = configdir!("/.node.lck");
@@ -247,7 +246,7 @@ impl NodeConfig {
} else {
AcmeAccountName::from_string("default".to_string())? // should really not happen
};
- AcmeClient::load(&account).await
+ proxmox_acme_api::load_client_with_account(&account).await
}
pub fn acme_domains(&'_ self) -> AcmeDomainIter<'_> {
--
2.47.3
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
^ permalink raw reply [relevance 6%]
* [pbs-devel] [PATCH proxmox-backup v5 1/5] acme: clean up ACME-related imports
2026-01-08 11:26 11% [pbs-devel] [PATCH proxmox{, -backup} v5 0/9] fix #6939: acme: support servers returning 204 for nonce requests Samuel Rufinatscha
` (3 preceding siblings ...)
2026-01-08 11:26 17% ` [pbs-devel] [PATCH proxmox v5 4/4] acme-api: add helper to load client for an account Samuel Rufinatscha
@ 2026-01-08 11:26 13% ` Samuel Rufinatscha
2026-01-13 13:45 5% ` [pbs-devel] applied: " Fabian Grünbichler
2026-01-08 11:26 15% ` [pbs-devel] [PATCH proxmox-backup v5 2/5] acme: include proxmox-acme-api dependency Samuel Rufinatscha
` (5 subsequent siblings)
10 siblings, 1 reply; 200+ results
From: Samuel Rufinatscha @ 2026-01-08 11:26 UTC (permalink / raw)
To: pbs-devel
Clean up ACME-related imports to make it easier to switch to
the factored out proxmox/ ACME implementation later.
Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
---
src/acme/plugin.rs | 3 +--
src/api2/config/acme.rs | 10 ++++------
src/api2/node/certificates.rs | 7 +++----
src/api2/types/acme.rs | 3 +--
src/bin/proxmox-backup-manager.rs | 12 +++++-------
src/bin/proxmox-backup-proxy.rs | 14 ++++++--------
src/config/acme/mod.rs | 3 +--
src/config/acme/plugin.rs | 2 +-
src/config/node.rs | 6 ++----
9 files changed, 24 insertions(+), 36 deletions(-)
diff --git a/src/acme/plugin.rs b/src/acme/plugin.rs
index f756e9b5..993d729b 100644
--- a/src/acme/plugin.rs
+++ b/src/acme/plugin.rs
@@ -19,11 +19,10 @@ use tokio::net::TcpListener;
use tokio::process::Command;
use proxmox_acme::{Authorization, Challenge};
+use proxmox_rest_server::WorkerTask;
use crate::acme::AcmeClient;
use crate::api2::types::AcmeDomain;
-use proxmox_rest_server::WorkerTask;
-
use crate::config::acme::plugin::{DnsPlugin, PluginData};
const PROXMOX_ACME_SH_PATH: &str = "/usr/share/proxmox-acme/proxmox-acme";
diff --git a/src/api2/config/acme.rs b/src/api2/config/acme.rs
index 35c3fb77..18671639 100644
--- a/src/api2/config/acme.rs
+++ b/src/api2/config/acme.rs
@@ -10,22 +10,20 @@ use serde::{Deserialize, Serialize};
use serde_json::{json, Value};
use tracing::{info, warn};
+use pbs_api_types::{Authid, PRIV_SYS_MODIFY};
+use proxmox_acme::types::AccountData as AcmeAccountData;
+use proxmox_acme::Account;
+use proxmox_rest_server::WorkerTask;
use proxmox_router::{
http_bail, list_subdirs_api_method, Permission, Router, RpcEnvironment, SubdirMap,
};
use proxmox_schema::{api, param_bail};
-use proxmox_acme::types::AccountData as AcmeAccountData;
-use proxmox_acme::Account;
-
-use pbs_api_types::{Authid, PRIV_SYS_MODIFY};
-
use crate::acme::AcmeClient;
use crate::api2::types::{AcmeAccountName, AcmeChallengeSchema, KnownAcmeDirectory};
use crate::config::acme::plugin::{
self, DnsPlugin, DnsPluginCore, DnsPluginCoreUpdater, PLUGIN_ID_SCHEMA,
};
-use proxmox_rest_server::WorkerTask;
pub(crate) const ROUTER: Router = Router::new()
.get(&list_subdirs_api_method!(SUBDIRS))
diff --git a/src/api2/node/certificates.rs b/src/api2/node/certificates.rs
index 61ef910e..6b1d87d2 100644
--- a/src/api2/node/certificates.rs
+++ b/src/api2/node/certificates.rs
@@ -5,23 +5,22 @@ use anyhow::{bail, format_err, Error};
use openssl::pkey::PKey;
use openssl::x509::X509;
use serde::{Deserialize, Serialize};
-use tracing::info;
+use tracing::{info, warn};
+use pbs_api_types::{NODE_SCHEMA, PRIV_SYS_MODIFY};
+use proxmox_rest_server::WorkerTask;
use proxmox_router::list_subdirs_api_method;
use proxmox_router::SubdirMap;
use proxmox_router::{Permission, Router, RpcEnvironment};
use proxmox_schema::api;
-use pbs_api_types::{NODE_SCHEMA, PRIV_SYS_MODIFY};
use pbs_buildcfg::configdir;
use pbs_tools::cert;
-use tracing::warn;
use crate::acme::AcmeClient;
use crate::api2::types::AcmeDomain;
use crate::config::node::NodeConfig;
use crate::server::send_certificate_renewal_mail;
-use proxmox_rest_server::WorkerTask;
pub const ROUTER: Router = Router::new()
.get(&list_subdirs_api_method!(SUBDIRS))
diff --git a/src/api2/types/acme.rs b/src/api2/types/acme.rs
index 210ebdbc..8661f9e8 100644
--- a/src/api2/types/acme.rs
+++ b/src/api2/types/acme.rs
@@ -1,9 +1,8 @@
use serde::{Deserialize, Serialize};
use serde_json::Value;
-use proxmox_schema::{api, ApiStringFormat, ApiType, Schema, StringSchema};
-
use pbs_api_types::{DNS_ALIAS_FORMAT, DNS_NAME_FORMAT, PROXMOX_SAFE_ID_FORMAT};
+use proxmox_schema::{api, ApiStringFormat, ApiType, Schema, StringSchema};
#[api(
properties: {
diff --git a/src/bin/proxmox-backup-manager.rs b/src/bin/proxmox-backup-manager.rs
index d9f41353..f8365070 100644
--- a/src/bin/proxmox-backup-manager.rs
+++ b/src/bin/proxmox-backup-manager.rs
@@ -5,10 +5,6 @@ use std::str::FromStr;
use anyhow::{format_err, Error};
use serde_json::{json, Value};
-use proxmox_router::{cli::*, RpcEnvironment};
-use proxmox_schema::api;
-use proxmox_sys::fs::CreateOptions;
-
use pbs_api_types::percent_encoding::percent_encode_component;
use pbs_api_types::{
BackupNamespace, GroupFilter, RateLimitConfig, SyncDirection, SyncJobConfig, DATASTORE_SCHEMA,
@@ -18,12 +14,14 @@ use pbs_api_types::{
VERIFICATION_OUTDATED_AFTER_SCHEMA, VERIFY_JOB_READ_THREADS_SCHEMA,
VERIFY_JOB_VERIFY_THREADS_SCHEMA,
};
+use proxmox_rest_server::wait_for_local_worker;
+use proxmox_router::{cli::*, RpcEnvironment};
+use proxmox_schema::api;
+use proxmox_sys::fs::CreateOptions;
+
use pbs_client::{display_task_log, view_task_result};
use pbs_config::sync;
use pbs_tools::json::required_string_param;
-
-use proxmox_rest_server::wait_for_local_worker;
-
use proxmox_backup::api2;
use proxmox_backup::client_helpers::connect_to_localhost;
use proxmox_backup::config;
diff --git a/src/bin/proxmox-backup-proxy.rs b/src/bin/proxmox-backup-proxy.rs
index 92a8cb3c..870208fe 100644
--- a/src/bin/proxmox-backup-proxy.rs
+++ b/src/bin/proxmox-backup-proxy.rs
@@ -9,27 +9,25 @@ use hyper::http::request::Parts;
use hyper::http::Response;
use hyper::StatusCode;
use hyper_util::server::graceful::GracefulShutdown;
+use openssl::ssl::SslAcceptor;
+use serde_json::{json, Value};
use tracing::level_filters::LevelFilter;
use tracing::{info, warn};
use url::form_urlencoded;
-use openssl::ssl::SslAcceptor;
-use serde_json::{json, Value};
-
use proxmox_http::Body;
use proxmox_http::RateLimiterTag;
use proxmox_lang::try_block;
+use proxmox_rest_server::{
+ cleanup_old_tasks, cookie_from_header, rotate_task_log_archive, ApiConfig, Redirector,
+ RestEnvironment, RestServer, WorkerTask,
+};
use proxmox_router::{RpcEnvironment, RpcEnvironmentType};
use proxmox_sys::fs::CreateOptions;
use proxmox_sys::logrotate::LogRotate;
use pbs_datastore::DataStore;
-use proxmox_rest_server::{
- cleanup_old_tasks, cookie_from_header, rotate_task_log_archive, ApiConfig, Redirector,
- RestEnvironment, RestServer, WorkerTask,
-};
-
use proxmox_backup::{
server::{
auth::check_pbs_auth,
diff --git a/src/config/acme/mod.rs b/src/config/acme/mod.rs
index 274a23fd..ac89ae5e 100644
--- a/src/config/acme/mod.rs
+++ b/src/config/acme/mod.rs
@@ -5,11 +5,10 @@ use std::path::Path;
use anyhow::{bail, format_err, Error};
use serde_json::Value;
+use pbs_api_types::PROXMOX_SAFE_ID_REGEX;
use proxmox_sys::error::SysError;
use proxmox_sys::fs::{file_read_string, CreateOptions};
-use pbs_api_types::PROXMOX_SAFE_ID_REGEX;
-
use crate::api2::types::{AcmeAccountName, AcmeChallengeSchema, KnownAcmeDirectory};
pub(crate) const ACME_DIR: &str = pbs_buildcfg::configdir!("/acme");
diff --git a/src/config/acme/plugin.rs b/src/config/acme/plugin.rs
index 18e71199..8ce852ec 100644
--- a/src/config/acme/plugin.rs
+++ b/src/config/acme/plugin.rs
@@ -4,10 +4,10 @@ use anyhow::Error;
use serde::{Deserialize, Serialize};
use serde_json::Value;
+use pbs_api_types::PROXMOX_SAFE_ID_FORMAT;
use proxmox_schema::{api, ApiType, Schema, StringSchema, Updater};
use proxmox_section_config::{SectionConfig, SectionConfigData, SectionConfigPlugin};
-use pbs_api_types::PROXMOX_SAFE_ID_FORMAT;
use pbs_config::{open_backup_lockfile, BackupLockGuard};
pub const PLUGIN_ID_SCHEMA: Schema = StringSchema::new("ACME Challenge Plugin ID.")
diff --git a/src/config/node.rs b/src/config/node.rs
index d2d6e383..253b2e36 100644
--- a/src/config/node.rs
+++ b/src/config/node.rs
@@ -4,14 +4,12 @@ use anyhow::{bail, Error};
use openssl::ssl::{SslAcceptor, SslMethod};
use serde::{Deserialize, Serialize};
-use proxmox_schema::{api, ApiStringFormat, ApiType, Updater};
-
-use proxmox_http::ProxyConfig;
-
use pbs_api_types::{
EMAIL_SCHEMA, MULTI_LINE_COMMENT_SCHEMA, OPENSSL_CIPHERS_TLS_1_2_SCHEMA,
OPENSSL_CIPHERS_TLS_1_3_SCHEMA,
};
+use proxmox_http::ProxyConfig;
+use proxmox_schema::{api, ApiStringFormat, ApiType, Updater};
use pbs_buildcfg::configdir;
use pbs_config::{open_backup_lockfile, BackupLockGuard};
--
2.47.3
* [pbs-devel] [PATCH proxmox{, -backup} v5 0/9] fix #6939: acme: support servers returning 204 for nonce requests
@ 2026-01-08 11:26 11% Samuel Rufinatscha
2026-01-08 11:26 10% ` [pbs-devel] [PATCH proxmox v5 1/4] acme: reduce visibility of Request type Samuel Rufinatscha
` (10 more replies)
0 siblings, 11 replies; 200+ results
From: Samuel Rufinatscha @ 2026-01-08 11:26 UTC (permalink / raw)
To: pbs-devel
Hi,
this series fixes account registration for ACME providers that return
HTTP 204 No Content to the newNonce request. Currently, both the PBS
ACME client and the shared ACME client in proxmox-acme only accept
HTTP 200 OK for this request. The issue was observed in PBS against a
custom ACME deployment and reported as bug #6939 [1].
## Problem
During ACME account registration, PBS first fetches an anti-replay
nonce by sending a HEAD request to the CA’s newNonce URL.
RFC 8555 §7.2 [2] states that:
* the server MUST include a Replay-Nonce header with a fresh nonce,
* the server SHOULD use status 200 OK for the HEAD request,
* the server MUST also handle GET on the same resource and may return
204 No Content with an empty body.
The reporter observed the following error message:
*ACME server responded with unexpected status code: 204*
and mentioned that the issue did not appear with PVE 9 [1]. Looking at
PVE’s Perl ACME client [3], it uses a GET request instead of HEAD and
accepts any 2xx success code when retrieving the nonce. This difference
in behavior does not affect functionality but is worth noting for
consistency across implementations.
## Approach
To support ACME providers which return 204 No Content, the Rust ACME
clients in proxmox-backup and proxmox need to treat both 200 OK and 204
No Content as valid responses for the nonce request, as long as a
Replay-Nonce header is present.
This series changes the expected field of the internal Request type
from a single u16 to a list of allowed status codes
(e.g. &'static [u16]), so one request can explicitly accept multiple
success codes.
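In isolation, the more tolerant check can be sketched roughly as follows (the names and signature are illustrative only, not the actual proxmox-acme API): accept any listed success status for the newNonce request, but still require the Replay-Nonce header.

```rust
/// Success codes accepted for the newNonce request (illustrative).
const NONCE_EXPECTED: &[u16] = &[200, 204];

/// Return the nonce if the response status is acceptable and a
/// Replay-Nonce header is present.
fn extract_nonce(status: u16, replay_nonce: Option<&str>) -> Result<String, String> {
    if !NONCE_EXPECTED.contains(&status) {
        // Mirrors the error the reporter observed for unexpected codes.
        return Err(format!(
            "ACME server responded with unexpected status code: {status}"
        ));
    }
    replay_nonce
        .map(str::to_owned)
        .ok_or_else(|| "newNonce response is missing the Replay-Nonce header".to_string())
}
```

With this shape, a server answering 204 No Content passes as long as it sends a Replay-Nonce, while any other status (or a missing header) is still rejected.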
To avoid fixing the issue twice (once in PBS’ own ACME client and once
in the shared Rust client), this series first refactors PBS to use the
shared AcmeClient from proxmox-acme / proxmox-acme-api, similar to PDM,
and then applies the bug fix in that shared implementation so that all
consumers benefit from the more tolerant behavior.
## Testing
*Testing the refactor*
To test the refactor, I
(1) installed latest stable PBS on a VM
(2) created .deb package from latest PBS (master), containing the
refactor
(3) installed created .deb package
(4) installed Pebble from Let's Encrypt [4] on the same VM
(5) created an ACME account and ordered the new certificate for the
host domain.
Steps to reproduce:
(1) install latest stable PBS on a VM, create .deb package from latest
PBS (master) containing the refactor, install created .deb package
(2) install Pebble from Let's Encrypt [4] on the same VM:
cd
apt update
apt install -y golang git
git clone https://github.com/letsencrypt/pebble
cd pebble
go build ./cmd/pebble
then, download and trust the Pebble cert:
wget https://raw.githubusercontent.com/letsencrypt/pebble/main/test/certs/pebble.minica.pem
cp pebble.minica.pem /usr/local/share/ca-certificates/pebble.minica.crt
update-ca-certificates
We want Pebble to perform HTTP-01 validation against port 80, because
PBS’s standalone plugin will bind port 80. Set httpPort to 80.
nano ./test/config/pebble-config.json
Start the Pebble server in the background:
./pebble -config ./test/config/pebble-config.json &
Create a Pebble ACME account:
proxmox-backup-manager acme account register default admin@example.com --directory 'https://127.0.0.1:14000/dir'
To verify persistence of the account, I checked:
ls /etc/proxmox-backup/acme/accounts
Then I verified that update-account works:
proxmox-backup-manager acme account update default --contact "a@example.com,b@example.com"
proxmox-backup-manager acme account info default
In the PBS GUI, you can create a new domain. You can use your host
domain name (see /etc/hosts). Select the created account and order the
certificate.
After a page reload, you might need to accept the new certificate in the browser.
In the PBS dashboard, you should see the new Pebble certificate.
Note: on reboot, the created Pebble ACME account will be gone and you
will need to create a new one, since Pebble does not persist account
info. In that case, remove the previously created account in
/etc/proxmox-backup/acme/accounts.
*Testing the newNonce fix*
To verify the ACME newNonce fix, I put nginx in front of Pebble to
intercept the newNonce request and return 204 No Content instead of
200 OK; all other requests are forwarded to Pebble unchanged. This
requires trusting the nginx CA via /usr/local/share/ca-certificates +
update-ca-certificates on the VM.
Then I ran following command against nginx:
proxmox-backup-manager acme account register proxytest root@backup.local --directory 'https://nginx-address/dir'
The account could be created successfully. When adjusting the nginx
configuration to return any other non-expected success status code,
PBS rejects it as expected.
## Patch summary
0001 – [PATCH proxmox v5 1/4] acme: reduce visibility of Request type
Restricts the visibility of the low-level Request type. Consumers
should rely on proxmox-acme-api or AcmeClient handlers.
0002 – [PATCH proxmox v5 2/4] acme: introduce http_status module
0003 – [PATCH proxmox v5 3/4] fix #6939: acme: support servers
returning 204 for nonce requests
Adjusts nonce handling to support ACME servers that return HTTP 204
(No Content) for new-nonce requests.
0004 – [PATCH proxmox v5 4/4] acme-api: add helper to load client for
an account
Introduces a helper function to load an ACME client instance for a
given account. Required for the following PBS ACME refactor.
0005 – [PATCH proxmox-backup v5 1/5] acme: clean up ACME-related imports
0006 – [PATCH proxmox-backup v5 2/5] acme: include proxmox-acme-api
dependency
Prepares the codebase to use the factored out ACME API impl.
0007 – [PATCH proxmox-backup v5 3/5] acme: drop local AcmeClient
Removes the local AcmeClient implementation. Represents the minimal
set of changes to replace it with the factored out AcmeClient.
0008 – [PATCH proxmox-backup v5 4/5] acme: change API impls to use
proxmox-acme-api handlers
0009 – [PATCH proxmox-backup v5 5/5] acme: certificate ordering through
proxmox-acme-api
Thanks for considering this patch series, I look forward to your
feedback.
Best,
Samuel Rufinatscha
## Changelog
Changes from v4 to v5:
* rebased series
* re-ordered series (proxmox-acme fix first)
* proxmox-backup: cleaned up imports based on an initial clean-up patch
* proxmox-acme: removed now unused post_request_raw_payload(),
update_account_request(), deactivate_account_request()
* proxmox-acme: removed now obsolete/unused get_authorization() and
GetAuthorization impl
Verified removal by compiling PBS, PDM, and proxmox-perl-rs
with all features.
Changes from v3 to v4:
* add proxmox-acme-api as a dependency and initialize it in
PBS so PBS can use the shared ACME API instead.
* remove the PBS-local AcmeClient implementation and switch PBS
over to the shared proxmox-acme async client.
* rework PBS’ ACME API endpoints to delegate to
proxmox-acme-api handlers instead of duplicating logic locally.
* move PBS’ ACME certificate ordering logic over to
proxmox-acme-api, keeping only certificate installation/reload in PBS.
* add a load_client_with_account helper in proxmox-acme-api so PBS
(and others) can construct an AcmeClient for a configured account
without duplicating boilerplate.
* hide the low-level Request type and its fields behind constructors
/ reduced visibility so changes to “expected” no longer affect the
public API as they did in v3.
* split out the HTTP status constants into an internal http_status
module as a separate preparatory cleanup before the bug fix, instead
of doing this inline like in v3.
* Rebased on top of the refactor: keep the same behavioural fix as in
v3 (accept 204 for newNonce with Replay-Nonce present), but implement
it on top of the http_status module that is part of the refactor.
Changes from v2 to v3:
* rename `http_success` module to `http_status`
* replace `http_success` usage
* introduced `http_success` module to contain the http success codes
* replaced `Vec<u16>` with `&[u16]` for expected codes to avoid allocations.
* clarified PVE's Perl ACME client behaviour in the commit message.
[1] Bugzilla report #6939:
[https://bugzilla.proxmox.com/show_bug.cgi?id=6939](https://bugzilla.proxmox.com/show_bug.cgi?id=6939)
[2] RFC 8555 (ACME):
[https://datatracker.ietf.org/doc/html/rfc8555/#section-7.2](https://datatracker.ietf.org/doc/html/rfc8555/#section-7.2)
[3] PVE’s Perl ACME client (allow 2xx codes for nonce requests):
[https://git.proxmox.com/?p=proxmox-acme.git;a=blob;f=src/PVE/ACME.pm;h=f1e9bb7d316e3cea1e376c610b0479119217aecc;hb=HEAD#l597](https://git.proxmox.com/?p=proxmox-acme.git;a=blob;f=src/PVE/ACME.pm;h=f1e9bb7d316e3cea1e376c610b0479119217aecc;hb=HEAD#l597)
[4] Pebble ACME server:
[https://github.com/letsencrypt/pebble](https://github.com/letsencrypt/pebble)
[5] PVE's Perl ACME client (performs a GET request):
[https://git.proxmox.com/?p=proxmox-acme.git;a=blob;f=src/PVE/ACME.pm;h=f1e9bb7d316e3cea1e376c610b0479119217aecc;hb=HEAD#l219](https://git.proxmox.com/?p=proxmox-acme.git;a=blob;f=src/PVE/ACME.pm;h=f1e9bb7d316e3cea1e376c610b0479119217aecc;hb=HEAD#l219)
proxmox:
Samuel Rufinatscha (4):
acme: reduce visibility of Request type
acme: introduce http_status module
fix #6939: acme: support servers returning 204 for nonce requests
acme-api: add helper to load client for an account
proxmox-acme-api/src/account_api_impl.rs | 5 ++
proxmox-acme-api/src/lib.rs | 3 +-
proxmox-acme/src/account.rs | 102 ++---------------------
proxmox-acme/src/async_client.rs | 8 +-
proxmox-acme/src/authorization.rs | 30 -------
proxmox-acme/src/client.rs | 8 +-
proxmox-acme/src/lib.rs | 6 +-
proxmox-acme/src/order.rs | 2 +-
proxmox-acme/src/request.rs | 25 ++++--
9 files changed, 44 insertions(+), 145 deletions(-)
proxmox-backup:
Samuel Rufinatscha (5):
acme: clean up ACME-related imports
acme: include proxmox-acme-api dependency
acme: drop local AcmeClient
acme: change API impls to use proxmox-acme-api handlers
acme: certificate ordering through proxmox-acme-api
Cargo.toml | 3 +
src/acme/client.rs | 691 -------------------------
src/acme/mod.rs | 5 -
src/acme/plugin.rs | 336 ------------
src/api2/config/acme.rs | 406 ++-------------
src/api2/node/certificates.rs | 232 ++-------
src/api2/types/acme.rs | 98 ----
src/api2/types/mod.rs | 3 -
src/bin/proxmox-backup-api.rs | 2 +
src/bin/proxmox-backup-manager.rs | 14 +-
src/bin/proxmox-backup-proxy.rs | 15 +-
src/bin/proxmox_backup_manager/acme.rs | 21 +-
src/config/acme/mod.rs | 55 +-
src/config/acme/plugin.rs | 92 +---
src/config/node.rs | 31 +-
src/lib.rs | 2 -
16 files changed, 109 insertions(+), 1897 deletions(-)
delete mode 100644 src/acme/client.rs
delete mode 100644 src/acme/mod.rs
delete mode 100644 src/acme/plugin.rs
delete mode 100644 src/api2/types/acme.rs
Summary over all repositories:
25 files changed, 153 insertions(+), 2042 deletions(-)
--
Generated by git-murpp 0.8.1
* [pbs-devel] [PATCH proxmox v5 1/4] acme: reduce visibility of Request type
2026-01-08 11:26 11% [pbs-devel] [PATCH proxmox{, -backup} v5 0/9] fix #6939: acme: support servers returning 204 for nonce requests Samuel Rufinatscha
@ 2026-01-08 11:26 10% ` Samuel Rufinatscha
2026-01-13 13:46 5% ` Fabian Grünbichler
2026-01-08 11:26 15% ` [pbs-devel] [PATCH proxmox v5 2/4] acme: introduce http_status module Samuel Rufinatscha
` (9 subsequent siblings)
10 siblings, 1 reply; 200+ results
From: Samuel Rufinatscha @ 2026-01-08 11:26 UTC (permalink / raw)
To: pbs-devel
Currently, the low-level ACME Request type is publicly exposed, even
though users are expected to go through AcmeClient and
proxmox-acme-api handlers. This patch reduces visibility so that
the Request type and related fields/methods are crate-internal only.
Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
---
proxmox-acme/src/account.rs | 94 ++-----------------------------
proxmox-acme/src/async_client.rs | 2 +-
proxmox-acme/src/authorization.rs | 30 ----------
proxmox-acme/src/client.rs | 6 +-
proxmox-acme/src/lib.rs | 4 --
proxmox-acme/src/order.rs | 2 +-
proxmox-acme/src/request.rs | 12 ++--
7 files changed, 16 insertions(+), 134 deletions(-)
diff --git a/proxmox-acme/src/account.rs b/proxmox-acme/src/account.rs
index f763c1e9..d8eb3e73 100644
--- a/proxmox-acme/src/account.rs
+++ b/proxmox-acme/src/account.rs
@@ -8,12 +8,11 @@ use openssl::pkey::{PKey, Private};
use serde::{Deserialize, Serialize};
use serde_json::Value;
-use crate::authorization::{Authorization, GetAuthorization};
use crate::b64u;
use crate::directory::Directory;
use crate::jws::Jws;
use crate::key::{Jwk, PublicKey};
-use crate::order::{NewOrder, Order, OrderData};
+use crate::order::{NewOrder, OrderData};
use crate::request::Request;
use crate::types::{AccountData, AccountStatus, ExternalAccountBinding};
use crate::Error;
@@ -92,7 +91,7 @@ impl Account {
}
/// Prepare a "POST-as-GET" request to fetch data. Low level helper.
- pub fn get_request(&self, url: &str, nonce: &str) -> Result<Request, Error> {
+ pub(crate) fn get_request(&self, url: &str, nonce: &str) -> Result<Request, Error> {
let key = PKey::private_key_from_pem(self.private_key.as_bytes())?;
let body = serde_json::to_string(&Jws::new_full(
&key,
@@ -112,7 +111,7 @@ impl Account {
}
/// Prepare a JSON POST request. Low level helper.
- pub fn post_request<T: Serialize>(
+ pub(crate) fn post_request<T: Serialize>(
&self,
url: &str,
nonce: &str,
@@ -136,31 +135,6 @@ impl Account {
})
}
- /// Prepare a JSON POST request.
- fn post_request_raw_payload(
- &self,
- url: &str,
- nonce: &str,
- payload: String,
- ) -> Result<Request, Error> {
- let key = PKey::private_key_from_pem(self.private_key.as_bytes())?;
- let body = serde_json::to_string(&Jws::new_full(
- &key,
- Some(self.location.clone()),
- url.to_owned(),
- nonce.to_owned(),
- payload,
- )?)?;
-
- Ok(Request {
- url: url.to_owned(),
- method: "POST",
- content_type: crate::request::JSON_CONTENT_TYPE,
- body,
- expected: 200,
- })
- }
-
/// Get the "key authorization" for a token.
pub fn key_authorization(&self, token: &str) -> Result<String, Error> {
let key = PKey::private_key_from_pem(self.private_key.as_bytes())?;
@@ -176,64 +150,6 @@ impl Account {
Ok(b64u::encode(digest))
}
- /// Prepare a request to update account data.
- ///
- /// This is a rather low level interface. You should know what you're doing.
- pub fn update_account_request<T: Serialize>(
- &self,
- nonce: &str,
- data: &T,
- ) -> Result<Request, Error> {
- self.post_request(&self.location, nonce, data)
- }
-
- /// Prepare a request to deactivate this account.
- pub fn deactivate_account_request<T: Serialize>(&self, nonce: &str) -> Result<Request, Error> {
- self.post_request_raw_payload(
- &self.location,
- nonce,
- r#"{"status":"deactivated"}"#.to_string(),
- )
- }
-
- /// Prepare a request to query an Authorization for an Order.
- ///
- /// Returns `Ok(None)` if `auth_index` is out of out of range. You can query the number of
- /// authorizations from via [`Order::authorization_len`] or by manually inspecting its
- /// `.data.authorization` vector.
- pub fn get_authorization(
- &self,
- order: &Order,
- auth_index: usize,
- nonce: &str,
- ) -> Result<Option<GetAuthorization>, Error> {
- match order.authorization(auth_index) {
- None => Ok(None),
- Some(url) => Ok(Some(GetAuthorization::new(self.get_request(url, nonce)?))),
- }
- }
-
- /// Prepare a request to validate a Challenge from an Authorization.
- ///
- /// Returns `Ok(None)` if `challenge_index` is out of out of range. The challenge count is
- /// available by inspecting the [`Authorization::challenges`] vector.
- ///
- /// This returns a raw `Request` since validation takes some time and the `Authorization`
- /// object has to be re-queried and its `status` inspected.
- pub fn validate_challenge(
- &self,
- authorization: &Authorization,
- challenge_index: usize,
- nonce: &str,
- ) -> Result<Option<Request>, Error> {
- match authorization.challenges.get(challenge_index) {
- None => Ok(None),
- Some(challenge) => self
- .post_request_raw_payload(&challenge.url, nonce, "{}".to_string())
- .map(Some),
- }
- }
-
/// Prepare a request to revoke a certificate.
///
/// The certificate can be either PEM or DER formatted.
@@ -274,7 +190,7 @@ pub struct CertificateRevocation<'a> {
impl CertificateRevocation<'_> {
/// Create the revocation request using the specified nonce for the given directory.
- pub fn request(&self, directory: &Directory, nonce: &str) -> Result<Request, Error> {
+ pub(crate) fn request(&self, directory: &Directory, nonce: &str) -> Result<Request, Error> {
let revoke_cert = directory.data.revoke_cert.as_ref().ok_or_else(|| {
Error::Custom("no 'revokeCert' URL specified by provider".to_string())
})?;
@@ -364,7 +280,7 @@ impl AccountCreator {
/// the resulting request.
/// Changing the private key between using the request and passing the response to
/// [`response`](AccountCreator::response()) will render the account unusable!
- pub fn request(&self, directory: &Directory, nonce: &str) -> Result<Request, Error> {
+ pub(crate) fn request(&self, directory: &Directory, nonce: &str) -> Result<Request, Error> {
let key = self.key.as_deref().ok_or(Error::MissingKey)?;
let url = directory.new_account_url().ok_or_else(|| {
Error::Custom("no 'newAccount' URL specified by provider".to_string())
diff --git a/proxmox-acme/src/async_client.rs b/proxmox-acme/src/async_client.rs
index dc755fb9..2ff3ba22 100644
--- a/proxmox-acme/src/async_client.rs
+++ b/proxmox-acme/src/async_client.rs
@@ -10,7 +10,7 @@ use proxmox_http::{client::Client, Body};
use crate::account::AccountCreator;
use crate::order::{Order, OrderData};
-use crate::Request as AcmeRequest;
+use crate::request::Request as AcmeRequest;
use crate::{Account, Authorization, Challenge, Directory, Error, ErrorResponse};
/// A non-blocking Acme client using tokio/hyper.
diff --git a/proxmox-acme/src/authorization.rs b/proxmox-acme/src/authorization.rs
index 28bc1b4b..7027381a 100644
--- a/proxmox-acme/src/authorization.rs
+++ b/proxmox-acme/src/authorization.rs
@@ -6,8 +6,6 @@ use serde::{Deserialize, Serialize};
use serde_json::Value;
use crate::order::Identifier;
-use crate::request::Request;
-use crate::Error;
/// Status of an [`Authorization`].
#[derive(Clone, Copy, Debug, Eq, PartialEq, Deserialize, Serialize)]
@@ -132,31 +130,3 @@ impl Challenge {
fn is_false(b: &bool) -> bool {
!*b
}
-
-/// Represents an in-flight query for an authorization.
-///
-/// This is created via [`Account::get_authorization`](crate::Account::get_authorization()).
-pub struct GetAuthorization {
- //order: OrderData,
- /// The request to send to the ACME provider. This is wrapped in an option in order to allow
- /// moving it out instead of copying the contents.
- ///
- /// When generated via [`Account::get_authorization`](crate::Account::get_authorization()),
- /// this is guaranteed to be `Some`.
- ///
- /// The response should be passed to the [`response`](GetAuthorization::response()) method.
- pub request: Option<Request>,
-}
-
-impl GetAuthorization {
- pub(crate) fn new(request: Request) -> Self {
- Self {
- request: Some(request),
- }
- }
-
- /// Deal with the response we got from the server.
- pub fn response(self, response_body: &[u8]) -> Result<Authorization, Error> {
- Ok(serde_json::from_slice(response_body)?)
- }
-}
diff --git a/proxmox-acme/src/client.rs b/proxmox-acme/src/client.rs
index 931f7245..5c812567 100644
--- a/proxmox-acme/src/client.rs
+++ b/proxmox-acme/src/client.rs
@@ -7,8 +7,8 @@ use serde::{Deserialize, Serialize};
use crate::b64u;
use crate::error;
use crate::order::OrderData;
-use crate::request::ErrorResponse;
-use crate::{Account, Authorization, Challenge, Directory, Error, Order, Request};
+use crate::request::{ErrorResponse, Request};
+use crate::{Account, Authorization, Challenge, Directory, Error, Order};
macro_rules! format_err {
($($fmt:tt)*) => { Error::Client(format!($($fmt)*)) };
@@ -564,7 +564,7 @@ impl Client {
}
/// Low-level API to run an API request. This automatically updates the current nonce!
- pub fn run_request(&mut self, request: Request) -> Result<HttpResponse, Error> {
+ pub(crate) fn run_request(&mut self, request: Request) -> Result<HttpResponse, Error> {
self.inner.run_request(request)
}
diff --git a/proxmox-acme/src/lib.rs b/proxmox-acme/src/lib.rs
index df722629..6722030c 100644
--- a/proxmox-acme/src/lib.rs
+++ b/proxmox-acme/src/lib.rs
@@ -66,10 +66,6 @@ pub use error::Error;
#[doc(inline)]
pub use order::Order;
-#[cfg(feature = "impl")]
-#[doc(inline)]
-pub use request::Request;
-
// we don't inline these:
#[cfg(feature = "impl")]
pub use order::NewOrder;
diff --git a/proxmox-acme/src/order.rs b/proxmox-acme/src/order.rs
index b6551004..432a81a4 100644
--- a/proxmox-acme/src/order.rs
+++ b/proxmox-acme/src/order.rs
@@ -153,7 +153,7 @@ pub struct NewOrder {
//order: OrderData,
/// The request to execute to place the order. When creating a [`NewOrder`] via
/// [`Account::new_order`](crate::Account::new_order) this is guaranteed to be `Some`.
- pub request: Option<Request>,
+ pub(crate) request: Option<Request>,
}
impl NewOrder {
diff --git a/proxmox-acme/src/request.rs b/proxmox-acme/src/request.rs
index 78a90913..dadfc5af 100644
--- a/proxmox-acme/src/request.rs
+++ b/proxmox-acme/src/request.rs
@@ -4,21 +4,21 @@ pub(crate) const JSON_CONTENT_TYPE: &str = "application/jose+json";
pub(crate) const CREATED: u16 = 201;
/// A request which should be performed on the ACME provider.
-pub struct Request {
+pub(crate) struct Request {
/// The complete URL to send the request to.
- pub url: String,
+ pub(crate) url: String,
/// The HTTP method name to use.
- pub method: &'static str,
+ pub(crate) method: &'static str,
/// The `Content-Type` header to pass along.
- pub content_type: &'static str,
+ pub(crate) content_type: &'static str,
/// The body to pass along with request, or an empty string.
- pub body: String,
+ pub(crate) body: String,
/// The expected status code a compliant ACME provider will return on success.
- pub expected: u16,
+ pub(crate) expected: u16,
}
/// An ACME error response contains a specially formatted type string, and can optionally
--
2.47.3
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
^ permalink raw reply [relevance 10%]
* [pbs-devel] [PATCH proxmox-backup v5 2/5] acme: include proxmox-acme-api dependency
2026-01-08 11:26 11% [pbs-devel] [PATCH proxmox{, -backup} v5 0/9] fix #6939: acme: support servers returning 204 for nonce requests Samuel Rufinatscha
` (4 preceding siblings ...)
2026-01-08 11:26 13% ` [pbs-devel] [PATCH proxmox-backup v5 1/5] acme: clean up ACME-related imports Samuel Rufinatscha
@ 2026-01-08 11:26 15% ` Samuel Rufinatscha
2026-01-13 13:45 5% ` Fabian Grünbichler
2026-01-08 11:26 6% ` [pbs-devel] [PATCH proxmox-backup v5 3/5] acme: drop local AcmeClient Samuel Rufinatscha
` (4 subsequent siblings)
10 siblings, 1 reply; 200+ results
From: Samuel Rufinatscha @ 2026-01-08 11:26 UTC (permalink / raw)
To: pbs-devel
PBS currently uses its own ACME client and API logic, while PDM uses the
factored out proxmox-acme and proxmox-acme-api crates. This duplication
risks differences in behaviour and requires ACME maintenance in two
places. This patch is part of a series to move PBS over to the shared
ACME stack.
Changes:
- Add proxmox-acme-api with the "impl" feature as a dependency.
- Initialize proxmox_acme_api in proxmox-backup-api, manager and proxy.
* Inits PBS config dir /acme as proxmox ACME directory
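As a rough illustration of where the ACME state ends up, the sketch below emulates the `configdir!` concatenation. The base path /etc/proxmox-backup is an assumption based on the usual PBS layout, and `configdir` is written here as a plain function standing in for the real pbs_buildcfg macro:

```rust
// Hypothetical stand-in for pbs_buildcfg's configdir! macro: join the
// compile-time config base directory (assumed to be /etc/proxmox-backup)
// with a subpath.
const CONFIG_DIR: &str = "/etc/proxmox-backup";

fn configdir(subdir: &str) -> String {
    format!("{CONFIG_DIR}{subdir}")
}

fn main() {
    // The patch passes "/acme", yielding the directory handed to
    // proxmox_acme_api::init().
    println!("{}", configdir("/acme"));
}
```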
Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
---
Cargo.toml | 3 +++
src/bin/proxmox-backup-api.rs | 2 ++
src/bin/proxmox-backup-manager.rs | 2 ++
src/bin/proxmox-backup-proxy.rs | 1 +
4 files changed, 8 insertions(+)
diff --git a/Cargo.toml b/Cargo.toml
index 1aa57ae5..feae351d 100644
--- a/Cargo.toml
+++ b/Cargo.toml
@@ -101,6 +101,7 @@ pbs-api-types = "1.0.8"
# other proxmox crates
pathpatterns = "1"
proxmox-acme = "1"
+proxmox-acme-api = { version = "1", features = [ "impl" ] }
pxar = "1"
# PBS workspace
@@ -251,6 +252,7 @@ pbs-api-types.workspace = true
# in their respective repo
proxmox-acme.workspace = true
+proxmox-acme-api.workspace = true
pxar.workspace = true
# proxmox-backup workspace/internal crates
@@ -269,6 +271,7 @@ proxmox-rrd-api-types.workspace = true
[patch.crates-io]
#pbs-api-types = { path = "../proxmox/pbs-api-types" }
#proxmox-acme = { path = "../proxmox/proxmox-acme" }
+#proxmox-acme-api = { path = "../proxmox/proxmox-acme-api" }
#proxmox-api-macro = { path = "../proxmox/proxmox-api-macro" }
#proxmox-apt = { path = "../proxmox/proxmox-apt" }
#proxmox-apt-api-types = { path = "../proxmox/proxmox-apt-api-types" }
diff --git a/src/bin/proxmox-backup-api.rs b/src/bin/proxmox-backup-api.rs
index 417e9e97..d0091dca 100644
--- a/src/bin/proxmox-backup-api.rs
+++ b/src/bin/proxmox-backup-api.rs
@@ -14,6 +14,7 @@ use proxmox_rest_server::{ApiConfig, RestServer};
use proxmox_router::RpcEnvironmentType;
use proxmox_sys::fs::CreateOptions;
+use pbs_buildcfg::configdir;
use proxmox_backup::auth_helpers::*;
use proxmox_backup::config;
use proxmox_backup::server::auth::check_pbs_auth;
@@ -78,6 +79,7 @@ async fn run() -> Result<(), Error> {
let mut command_sock = proxmox_daemon::command_socket::CommandSocket::new(backup_user.gid);
proxmox_product_config::init(backup_user.clone(), pbs_config::priv_user()?);
+ proxmox_acme_api::init(configdir!("/acme"), true)?;
let dir_opts = CreateOptions::new()
.owner(backup_user.uid)
diff --git a/src/bin/proxmox-backup-manager.rs b/src/bin/proxmox-backup-manager.rs
index f8365070..30bc8da9 100644
--- a/src/bin/proxmox-backup-manager.rs
+++ b/src/bin/proxmox-backup-manager.rs
@@ -19,6 +19,7 @@ use proxmox_router::{cli::*, RpcEnvironment};
use proxmox_schema::api;
use proxmox_sys::fs::CreateOptions;
+use pbs_buildcfg::configdir;
use pbs_client::{display_task_log, view_task_result};
use pbs_config::sync;
use pbs_tools::json::required_string_param;
@@ -667,6 +668,7 @@ async fn run() -> Result<(), Error> {
.init()?;
proxmox_backup::server::notifications::init()?;
proxmox_product_config::init(pbs_config::backup_user()?, pbs_config::priv_user()?);
+ proxmox_acme_api::init(configdir!("/acme"), false)?;
let cmd_def = CliCommandMap::new()
.insert("acl", acl_commands())
diff --git a/src/bin/proxmox-backup-proxy.rs b/src/bin/proxmox-backup-proxy.rs
index 870208fe..eea44a7d 100644
--- a/src/bin/proxmox-backup-proxy.rs
+++ b/src/bin/proxmox-backup-proxy.rs
@@ -188,6 +188,7 @@ async fn run() -> Result<(), Error> {
proxmox_backup::server::notifications::init()?;
metric_collection::init()?;
proxmox_product_config::init(pbs_config::backup_user()?, pbs_config::priv_user()?);
+ proxmox_acme_api::init(configdir!("/acme"), false)?;
let mut indexpath = PathBuf::from(pbs_buildcfg::JS_DIR);
indexpath.push("index.hbs");
--
2.47.3
^ permalink raw reply [relevance 15%]
* [pbs-devel] [PATCH proxmox v5 2/4] acme: introduce http_status module
2026-01-08 11:26 11% [pbs-devel] [PATCH proxmox{, -backup} v5 0/9] fix #6939: acme: support servers returning 204 for nonce requests Samuel Rufinatscha
2026-01-08 11:26 10% ` [pbs-devel] [PATCH proxmox v5 1/4] acme: reduce visibility of Request type Samuel Rufinatscha
@ 2026-01-08 11:26 15% ` Samuel Rufinatscha
2026-01-13 13:45 5% ` Fabian Grünbichler
2026-01-08 11:26 14% ` [pbs-devel] [PATCH proxmox v5 3/4] fix #6939: acme: support servers returning 204 for nonce requests Samuel Rufinatscha
` (8 subsequent siblings)
10 siblings, 1 reply; 200+ results
From: Samuel Rufinatscha @ 2026-01-08 11:26 UTC (permalink / raw)
To: pbs-devel
Introduce an internal http_status module with the common ACME HTTP
response codes, and replace use of crate::request::CREATED as well as
direct numeric status code usages.
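For reference, the following is a self-contained sketch of the module this patch introduces; the constants match the request.rs hunk below (values per RFC 9110), while the surrounding `main` is only for illustration:

```rust
/// Common HTTP status codes used in ACME responses.
mod http_status {
    pub const OK: u16 = 200;
    pub const CREATED: u16 = 201;
    pub const NO_CONTENT: u16 = 204;
}

fn main() {
    // Named constants make checks such as "a nonce endpoint may answer
    // 200 or 204" self-documenting, compared to bare numeric literals.
    let status = 204u16;
    let accepted = matches!(status, http_status::OK | http_status::NO_CONTENT);
    println!("{accepted}");
}
```

The `NO_CONTENT` constant is not used by this patch yet; it is consumed by patch 3/4 of the series, which accepts 204 responses for nonce requests.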
Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
---
proxmox-acme/src/account.rs | 8 ++++----
proxmox-acme/src/async_client.rs | 4 ++--
proxmox-acme/src/lib.rs | 2 ++
proxmox-acme/src/request.rs | 11 ++++++++++-
4 files changed, 18 insertions(+), 7 deletions(-)
diff --git a/proxmox-acme/src/account.rs b/proxmox-acme/src/account.rs
index d8eb3e73..ea1a3c60 100644
--- a/proxmox-acme/src/account.rs
+++ b/proxmox-acme/src/account.rs
@@ -84,7 +84,7 @@ impl Account {
method: "POST",
content_type: crate::request::JSON_CONTENT_TYPE,
body,
- expected: crate::request::CREATED,
+ expected: crate::http_status::CREATED,
};
Ok(NewOrder::new(request))
@@ -106,7 +106,7 @@ impl Account {
method: "POST",
content_type: crate::request::JSON_CONTENT_TYPE,
body,
- expected: 200,
+ expected: crate::http_status::OK,
})
}
@@ -131,7 +131,7 @@ impl Account {
method: "POST",
content_type: crate::request::JSON_CONTENT_TYPE,
body,
- expected: 200,
+ expected: crate::http_status::OK,
})
}
@@ -321,7 +321,7 @@ impl AccountCreator {
method: "POST",
content_type: crate::request::JSON_CONTENT_TYPE,
body,
- expected: crate::request::CREATED,
+ expected: crate::http_status::CREATED,
})
}
diff --git a/proxmox-acme/src/async_client.rs b/proxmox-acme/src/async_client.rs
index 2ff3ba22..043648bb 100644
--- a/proxmox-acme/src/async_client.rs
+++ b/proxmox-acme/src/async_client.rs
@@ -498,7 +498,7 @@ impl AcmeClient {
method: "GET",
content_type: "",
body: String::new(),
- expected: 200,
+ expected: crate::http_status::OK,
},
nonce,
)
@@ -550,7 +550,7 @@ impl AcmeClient {
method: "HEAD",
content_type: "",
body: String::new(),
- expected: 200,
+ expected: crate::http_status::OK,
},
nonce,
)
diff --git a/proxmox-acme/src/lib.rs b/proxmox-acme/src/lib.rs
index 6722030c..6051a025 100644
--- a/proxmox-acme/src/lib.rs
+++ b/proxmox-acme/src/lib.rs
@@ -70,6 +70,8 @@ pub use order::Order;
#[cfg(feature = "impl")]
pub use order::NewOrder;
#[cfg(feature = "impl")]
+pub(crate) use request::http_status;
+#[cfg(feature = "impl")]
pub use request::ErrorResponse;
/// Header name for nonces.
diff --git a/proxmox-acme/src/request.rs b/proxmox-acme/src/request.rs
index dadfc5af..341ce53e 100644
--- a/proxmox-acme/src/request.rs
+++ b/proxmox-acme/src/request.rs
@@ -1,7 +1,6 @@
use serde::Deserialize;
pub(crate) const JSON_CONTENT_TYPE: &str = "application/jose+json";
-pub(crate) const CREATED: u16 = 201;
/// A request which should be performed on the ACME provider.
pub(crate) struct Request {
@@ -21,6 +20,16 @@ pub(crate) struct Request {
pub(crate) expected: u16,
}
+/// Common HTTP status codes used in ACME responses.
+pub(crate) mod http_status {
+ /// 200 OK
+ pub(crate) const OK: u16 = 200;
+ /// 201 Created
+ pub(crate) const CREATED: u16 = 201;
+ /// 204 No Content
+ pub(crate) const NO_CONTENT: u16 = 204;
+}
+
/// An ACME error response contains a specially formatted type string, and can optionally
/// contain textual details and a set of sub problems.
#[derive(Clone, Debug, Deserialize)]
--
2.47.3
^ permalink raw reply [relevance 15%]
* [pbs-devel] [PATCH proxmox-backup v5 5/5] acme: certificate ordering through proxmox-acme-api
2026-01-08 11:26 11% [pbs-devel] [PATCH proxmox{, -backup} v5 0/9] fix #6939: acme: support servers returning 204 for nonce requests Samuel Rufinatscha
` (7 preceding siblings ...)
2026-01-08 11:26 8% ` [pbs-devel] [PATCH proxmox-backup v5 4/5] acme: change API impls to use proxmox-acme-api handlers Samuel Rufinatscha
@ 2026-01-08 11:26 7% ` Samuel Rufinatscha
2026-01-13 13:45 5% ` Fabian Grünbichler
2026-01-13 13:48 5% ` [pbs-devel] [PATCH proxmox{, -backup} v5 0/9] fix #6939: acme: support servers returning 204 for nonce requests Fabian Grünbichler
2026-01-16 11:30 13% ` [pbs-devel] superseded: " Samuel Rufinatscha
10 siblings, 1 reply; 200+ results
From: Samuel Rufinatscha @ 2026-01-08 11:26 UTC (permalink / raw)
To: pbs-devel
PBS currently uses its own ACME client and API logic, while PDM uses the
factored out proxmox-acme and proxmox-acme-api crates. This duplication
risks differences in behaviour and requires ACME maintenance in two
places. This patch is part of a series to move PBS over to the shared
ACME stack.
Changes:
- Replace the custom ACME order/authorization loop in node certificates
with a call to proxmox_acme_api::order_certificate.
- Build domain + config data as proxmox-acme-api types
- Remove obsolete local ACME ordering and plugin glue code.
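The per-domain lowercasing survives the refactor (see the certificates.rs hunk below). As a self-contained sketch of that normalization step, with a reduced stand-in for the proxmox-acme-api `AcmeDomain` type:

```rust
// Reduced stand-in for proxmox_acme_api::AcmeDomain; the real type carries
// additional fields (e.g. the validation plugin).
struct AcmeDomain {
    domain: String,
    alias: Option<String>,
}

// Lowercase each configured domain (and its optional alias) before handing
// the list to the shared ACME stack, mirroring the fold in the patch.
fn normalize(mut domains: Vec<AcmeDomain>) -> Vec<AcmeDomain> {
    for d in &mut domains {
        d.domain.make_ascii_lowercase();
        if let Some(alias) = &mut d.alias {
            alias.make_ascii_lowercase();
        }
    }
    domains
}

fn main() {
    let out = normalize(vec![AcmeDomain {
        domain: "Backup.Example.COM".to_string(),
        alias: Some("Acme.Example.COM".to_string()),
    }]);
    println!("{} {}", out[0].domain, out[0].alias.as_deref().unwrap());
}
```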
Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
---
src/acme/mod.rs | 2 -
src/acme/plugin.rs | 335 ----------------------------------
src/api2/node/certificates.rs | 229 ++++-------------------
src/api2/types/acme.rs | 73 --------
src/api2/types/mod.rs | 3 -
src/config/acme/mod.rs | 8 +-
src/config/acme/plugin.rs | 92 +---------
src/config/node.rs | 20 +-
src/lib.rs | 2 -
9 files changed, 38 insertions(+), 726 deletions(-)
delete mode 100644 src/acme/mod.rs
delete mode 100644 src/acme/plugin.rs
delete mode 100644 src/api2/types/acme.rs
diff --git a/src/acme/mod.rs b/src/acme/mod.rs
deleted file mode 100644
index cc561f9a..00000000
--- a/src/acme/mod.rs
+++ /dev/null
@@ -1,2 +0,0 @@
-pub(crate) mod plugin;
-pub(crate) use plugin::get_acme_plugin;
diff --git a/src/acme/plugin.rs b/src/acme/plugin.rs
deleted file mode 100644
index 6804243c..00000000
--- a/src/acme/plugin.rs
+++ /dev/null
@@ -1,335 +0,0 @@
-use std::future::Future;
-use std::net::{IpAddr, SocketAddr};
-use std::pin::Pin;
-use std::process::Stdio;
-use std::sync::Arc;
-use std::time::Duration;
-
-use anyhow::{bail, format_err, Error};
-use bytes::Bytes;
-use futures::TryFutureExt;
-use http_body_util::Full;
-use hyper::body::Incoming;
-use hyper::server::conn::http1;
-use hyper::service::service_fn;
-use hyper::{Request, Response};
-use hyper_util::rt::TokioIo;
-use tokio::io::{AsyncBufReadExt, AsyncRead, AsyncWriteExt, BufReader};
-use tokio::net::TcpListener;
-use tokio::process::Command;
-
-use proxmox_acme::async_client::AcmeClient;
-use proxmox_acme::{Authorization, Challenge};
-use proxmox_rest_server::WorkerTask;
-
-use crate::api2::types::AcmeDomain;
-use crate::config::acme::plugin::{DnsPlugin, PluginData};
-
-const PROXMOX_ACME_SH_PATH: &str = "/usr/share/proxmox-acme/proxmox-acme";
-
-pub(crate) fn get_acme_plugin(
- plugin_data: &PluginData,
- name: &str,
-) -> Result<Option<Box<dyn AcmePlugin + Send + Sync + 'static>>, Error> {
- let (ty, data) = match plugin_data.get(name) {
- Some(plugin) => plugin,
- None => return Ok(None),
- };
-
- Ok(Some(match ty.as_str() {
- "dns" => {
- let plugin: DnsPlugin = serde::Deserialize::deserialize(data)?;
- Box::new(plugin)
- }
- "standalone" => {
- // this one has no config
- Box::<StandaloneServer>::default()
- }
- other => bail!("missing implementation for plugin type '{}'", other),
- }))
-}
-
-pub(crate) trait AcmePlugin {
- /// Setup everything required to trigger the validation and return the corresponding validation
- /// URL.
- fn setup<'fut, 'a: 'fut, 'b: 'fut, 'c: 'fut, 'd: 'fut>(
- &'a mut self,
- client: &'b mut AcmeClient,
- authorization: &'c Authorization,
- domain: &'d AcmeDomain,
- task: Arc<WorkerTask>,
- ) -> Pin<Box<dyn Future<Output = Result<&'c str, Error>> + Send + 'fut>>;
-
- fn teardown<'fut, 'a: 'fut, 'b: 'fut, 'c: 'fut, 'd: 'fut>(
- &'a mut self,
- client: &'b mut AcmeClient,
- authorization: &'c Authorization,
- domain: &'d AcmeDomain,
- task: Arc<WorkerTask>,
- ) -> Pin<Box<dyn Future<Output = Result<(), Error>> + Send + 'fut>>;
-}
-
-fn extract_challenge<'a>(
- authorization: &'a Authorization,
- ty: &str,
-) -> Result<&'a Challenge, Error> {
- authorization
- .challenges
- .iter()
- .find(|ch| ch.ty == ty)
- .ok_or_else(|| format_err!("no supported challenge type ({}) found", ty))
-}
-
-async fn pipe_to_tasklog<T: AsyncRead + Unpin>(
- pipe: T,
- task: Arc<WorkerTask>,
-) -> Result<(), std::io::Error> {
- let mut pipe = BufReader::new(pipe);
- let mut line = String::new();
- loop {
- line.clear();
- match pipe.read_line(&mut line).await {
- Ok(0) => return Ok(()),
- Ok(_) => task.log_message(line.as_str()),
- Err(err) => return Err(err),
- }
- }
-}
-
-impl DnsPlugin {
- async fn action<'a>(
- &self,
- client: &mut AcmeClient,
- authorization: &'a Authorization,
- domain: &AcmeDomain,
- task: Arc<WorkerTask>,
- action: &str,
- ) -> Result<&'a str, Error> {
- let challenge = extract_challenge(authorization, "dns-01")?;
- let mut stdin_data = client
- .dns_01_txt_value(
- challenge
- .token()
- .ok_or_else(|| format_err!("missing token in challenge"))?,
- )?
- .into_bytes();
- stdin_data.push(b'\n');
- stdin_data.extend(self.data.as_bytes());
- if stdin_data.last() != Some(&b'\n') {
- stdin_data.push(b'\n');
- }
-
- let mut command = Command::new("/usr/bin/setpriv");
-
- #[rustfmt::skip]
- command.args([
- "--reuid", "nobody",
- "--regid", "nogroup",
- "--clear-groups",
- "--reset-env",
- "--",
- "/bin/bash",
- PROXMOX_ACME_SH_PATH,
- action,
- &self.core.api,
- domain.alias.as_deref().unwrap_or(&domain.domain),
- ]);
-
- // We could use 1 socketpair, but tokio wraps them all in `File` internally causing `close`
- // to be called separately on all of them without exception, so we need 3 pipes :-(
-
- let mut child = command
- .stdin(Stdio::piped())
- .stdout(Stdio::piped())
- .stderr(Stdio::piped())
- .spawn()?;
-
- let mut stdin = child.stdin.take().expect("Stdio::piped()");
- let stdout = child.stdout.take().expect("Stdio::piped() failed?");
- let stdout = pipe_to_tasklog(stdout, Arc::clone(&task));
- let stderr = child.stderr.take().expect("Stdio::piped() failed?");
- let stderr = pipe_to_tasklog(stderr, Arc::clone(&task));
- let stdin = async move {
- stdin.write_all(&stdin_data).await?;
- stdin.flush().await?;
- Ok::<_, std::io::Error>(())
- };
- match futures::try_join!(stdin, stdout, stderr) {
- Ok(((), (), ())) => (),
- Err(err) => {
- if let Err(err) = child.kill().await {
- task.log_message(format!(
- "failed to kill '{PROXMOX_ACME_SH_PATH} {action}' command: {err}"
- ));
- }
- bail!("'{}' failed: {}", PROXMOX_ACME_SH_PATH, err);
- }
- }
-
- let status = child.wait().await?;
- if !status.success() {
- bail!(
- "'{} {}' exited with error ({})",
- PROXMOX_ACME_SH_PATH,
- action,
- status.code().unwrap_or(-1)
- );
- }
-
- Ok(&challenge.url)
- }
-}
-
-impl AcmePlugin for DnsPlugin {
- fn setup<'fut, 'a: 'fut, 'b: 'fut, 'c: 'fut, 'd: 'fut>(
- &'a mut self,
- client: &'b mut AcmeClient,
- authorization: &'c Authorization,
- domain: &'d AcmeDomain,
- task: Arc<WorkerTask>,
- ) -> Pin<Box<dyn Future<Output = Result<&'c str, Error>> + Send + 'fut>> {
- Box::pin(async move {
- let result = self
- .action(client, authorization, domain, task.clone(), "setup")
- .await;
-
- let validation_delay = self.core.validation_delay.unwrap_or(30) as u64;
- if validation_delay > 0 {
- task.log_message(format!(
- "Sleeping {validation_delay} seconds to wait for TXT record propagation"
- ));
- tokio::time::sleep(Duration::from_secs(validation_delay)).await;
- }
- result
- })
- }
-
- fn teardown<'fut, 'a: 'fut, 'b: 'fut, 'c: 'fut, 'd: 'fut>(
- &'a mut self,
- client: &'b mut AcmeClient,
- authorization: &'c Authorization,
- domain: &'d AcmeDomain,
- task: Arc<WorkerTask>,
- ) -> Pin<Box<dyn Future<Output = Result<(), Error>> + Send + 'fut>> {
- Box::pin(async move {
- self.action(client, authorization, domain, task, "teardown")
- .await
- .map(drop)
- })
- }
-}
-
-#[derive(Default)]
-struct StandaloneServer {
- abort_handle: Option<futures::future::AbortHandle>,
-}
-
-// In case the "order_certificates" future gets dropped between setup & teardown, let's also cancel
-// the HTTP listener on Drop:
-impl Drop for StandaloneServer {
- fn drop(&mut self) {
- self.stop();
- }
-}
-
-impl StandaloneServer {
- fn stop(&mut self) {
- if let Some(abort) = self.abort_handle.take() {
- abort.abort();
- }
- }
-}
-
-async fn standalone_respond(
- req: Request<Incoming>,
- path: Arc<String>,
- key_auth: Arc<String>,
-) -> Result<Response<Full<Bytes>>, hyper::Error> {
- if req.method() == hyper::Method::GET && req.uri().path() == path.as_str() {
- Ok(Response::builder()
- .status(hyper::http::StatusCode::OK)
- .body(key_auth.as_bytes().to_vec().into())
- .unwrap())
- } else {
- Ok(Response::builder()
- .status(hyper::http::StatusCode::NOT_FOUND)
- .body("Not found.".into())
- .unwrap())
- }
-}
-
-impl AcmePlugin for StandaloneServer {
- fn setup<'fut, 'a: 'fut, 'b: 'fut, 'c: 'fut, 'd: 'fut>(
- &'a mut self,
- client: &'b mut AcmeClient,
- authorization: &'c Authorization,
- _domain: &'d AcmeDomain,
- _task: Arc<WorkerTask>,
- ) -> Pin<Box<dyn Future<Output = Result<&'c str, Error>> + Send + 'fut>> {
- Box::pin(async move {
- self.stop();
-
- let challenge = extract_challenge(authorization, "http-01")?;
- let token = challenge
- .token()
- .ok_or_else(|| format_err!("missing token in challenge"))?;
- let key_auth = Arc::new(client.key_authorization(token)?);
- let path = Arc::new(format!("/.well-known/acme-challenge/{token}"));
-
- // `[::]:80` first, then `*:80`
- let dual = SocketAddr::new(IpAddr::from([0u16; 8]), 80);
- let ipv4 = SocketAddr::new(IpAddr::from([0u8; 4]), 80);
- let incoming = TcpListener::bind(dual)
- .or_else(|_| TcpListener::bind(ipv4))
- .await?;
-
- let server = async move {
- loop {
- let key_auth = Arc::clone(&key_auth);
- let path = Arc::clone(&path);
- match incoming.accept().await {
- Ok((tcp, _)) => {
- let io = TokioIo::new(tcp);
- let service = service_fn(move |request| {
- standalone_respond(
- request,
- Arc::clone(&path),
- Arc::clone(&key_auth),
- )
- });
-
- tokio::task::spawn(async move {
- if let Err(err) =
- http1::Builder::new().serve_connection(io, service).await
- {
- println!("Error serving connection: {err:?}");
- }
- });
- }
- Err(err) => println!("Error accepting connection: {err:?}"),
- }
- }
- };
- let (future, abort) = futures::future::abortable(server);
- self.abort_handle = Some(abort);
- tokio::spawn(future);
-
- Ok(challenge.url.as_str())
- })
- }
-
- fn teardown<'fut, 'a: 'fut, 'b: 'fut, 'c: 'fut, 'd: 'fut>(
- &'a mut self,
- _client: &'b mut AcmeClient,
- _authorization: &'c Authorization,
- _domain: &'d AcmeDomain,
- _task: Arc<WorkerTask>,
- ) -> Pin<Box<dyn Future<Output = Result<(), Error>> + Send + 'fut>> {
- Box::pin(async move {
- if let Some(abort) = self.abort_handle.take() {
- abort.abort();
- }
- Ok(())
- })
- }
-}
diff --git a/src/api2/node/certificates.rs b/src/api2/node/certificates.rs
index 47ff8de5..73401c41 100644
--- a/src/api2/node/certificates.rs
+++ b/src/api2/node/certificates.rs
@@ -1,14 +1,11 @@
-use std::sync::Arc;
-use std::time::Duration;
-
use anyhow::{bail, format_err, Error};
use openssl::pkey::PKey;
use openssl::x509::X509;
use serde::{Deserialize, Serialize};
-use tracing::{info, warn};
+use tracing::info;
use pbs_api_types::{NODE_SCHEMA, PRIV_SYS_MODIFY};
-use proxmox_acme::async_client::AcmeClient;
+use proxmox_acme_api::AcmeDomain;
use proxmox_rest_server::WorkerTask;
use proxmox_router::list_subdirs_api_method;
use proxmox_router::SubdirMap;
@@ -18,8 +15,6 @@ use proxmox_schema::api;
use pbs_buildcfg::configdir;
use pbs_tools::cert;
-use crate::api2::types::AcmeDomain;
-use crate::config::node::NodeConfig;
use crate::server::send_certificate_renewal_mail;
pub const ROUTER: Router = Router::new()
@@ -268,193 +263,6 @@ pub async fn delete_custom_certificate() -> Result<(), Error> {
Ok(())
}
-struct OrderedCertificate {
- certificate: hyper::body::Bytes,
- private_key_pem: Vec<u8>,
-}
-
-async fn order_certificate(
- worker: Arc<WorkerTask>,
- node_config: &NodeConfig,
-) -> Result<Option<OrderedCertificate>, Error> {
- use proxmox_acme::authorization::Status;
- use proxmox_acme::order::Identifier;
-
- let domains = node_config.acme_domains().try_fold(
- Vec::<AcmeDomain>::new(),
- |mut acc, domain| -> Result<_, Error> {
- let mut domain = domain?;
- domain.domain.make_ascii_lowercase();
- if let Some(alias) = &mut domain.alias {
- alias.make_ascii_lowercase();
- }
- acc.push(domain);
- Ok(acc)
- },
- )?;
-
- let get_domain_config = |domain: &str| {
- domains
- .iter()
- .find(|d| d.domain == domain)
- .ok_or_else(|| format_err!("no config for domain '{}'", domain))
- };
-
- if domains.is_empty() {
- info!("No domains configured to be ordered from an ACME server.");
- return Ok(None);
- }
-
- let (plugins, _) = crate::config::acme::plugin::config()?;
-
- let mut acme = node_config.acme_client().await?;
-
- info!("Placing ACME order");
- let order = acme
- .new_order(domains.iter().map(|d| d.domain.to_ascii_lowercase()))
- .await?;
- info!("Order URL: {}", order.location);
-
- let identifiers: Vec<String> = order
- .data
- .identifiers
- .iter()
- .map(|identifier| match identifier {
- Identifier::Dns(domain) => domain.clone(),
- })
- .collect();
-
- for auth_url in &order.data.authorizations {
- info!("Getting authorization details from '{auth_url}'");
- let mut auth = acme.get_authorization(auth_url).await?;
-
- let domain = match &mut auth.identifier {
- Identifier::Dns(domain) => domain.to_ascii_lowercase(),
- };
-
- if auth.status == Status::Valid {
- info!("{domain} is already validated!");
- continue;
- }
-
- info!("The validation for {domain} is pending");
- let domain_config: &AcmeDomain = get_domain_config(&domain)?;
- let plugin_id = domain_config.plugin.as_deref().unwrap_or("standalone");
- let mut plugin_cfg = crate::acme::get_acme_plugin(&plugins, plugin_id)?
- .ok_or_else(|| format_err!("plugin '{plugin_id}' for domain '{domain}' not found!"))?;
-
- info!("Setting up validation plugin");
- let validation_url = plugin_cfg
- .setup(&mut acme, &auth, domain_config, Arc::clone(&worker))
- .await?;
-
- let result = request_validation(&mut acme, auth_url, validation_url).await;
-
- if let Err(err) = plugin_cfg
- .teardown(&mut acme, &auth, domain_config, Arc::clone(&worker))
- .await
- {
- warn!("Failed to teardown plugin '{plugin_id}' for domain '{domain}' - {err}");
- }
-
- result?;
- }
-
- info!("All domains validated");
- info!("Creating CSR");
-
- let csr = proxmox_acme::util::Csr::generate(&identifiers, &Default::default())?;
- let mut finalize_error_cnt = 0u8;
- let order_url = &order.location;
- let mut order;
- loop {
- use proxmox_acme::order::Status;
-
- order = acme.get_order(order_url).await?;
-
- match order.status {
- Status::Pending => {
- info!("still pending, trying to finalize anyway");
- let finalize = order
- .finalize
- .as_deref()
- .ok_or_else(|| format_err!("missing 'finalize' URL in order"))?;
- if let Err(err) = acme.finalize(finalize, &csr.data).await {
- if finalize_error_cnt >= 5 {
- return Err(err);
- }
-
- finalize_error_cnt += 1;
- }
- tokio::time::sleep(Duration::from_secs(5)).await;
- }
- Status::Ready => {
- info!("order is ready, finalizing");
- let finalize = order
- .finalize
- .as_deref()
- .ok_or_else(|| format_err!("missing 'finalize' URL in order"))?;
- acme.finalize(finalize, &csr.data).await?;
- tokio::time::sleep(Duration::from_secs(5)).await;
- }
- Status::Processing => {
- info!("still processing, trying again in 30 seconds");
- tokio::time::sleep(Duration::from_secs(30)).await;
- }
- Status::Valid => {
- info!("valid");
- break;
- }
- other => bail!("order status: {:?}", other),
- }
- }
-
- info!("Downloading certificate");
- let certificate = acme
- .get_certificate(
- order
- .certificate
- .as_deref()
- .ok_or_else(|| format_err!("missing certificate url in finalized order"))?,
- )
- .await?;
-
- Ok(Some(OrderedCertificate {
- certificate,
- private_key_pem: csr.private_key_pem,
- }))
-}
-
-async fn request_validation(
- acme: &mut AcmeClient,
- auth_url: &str,
- validation_url: &str,
-) -> Result<(), Error> {
- info!("Triggering validation");
- acme.request_challenge_validation(validation_url).await?;
-
- info!("Sleeping for 5 seconds");
- tokio::time::sleep(Duration::from_secs(5)).await;
-
- loop {
- use proxmox_acme::authorization::Status;
-
- let auth = acme.get_authorization(auth_url).await?;
- match auth.status {
- Status::Pending => {
- info!("Status is still 'pending', trying again in 10 seconds");
- tokio::time::sleep(Duration::from_secs(10)).await;
- }
- Status::Valid => return Ok(()),
- other => bail!(
- "validating challenge '{}' failed - status: {:?}",
- validation_url,
- other
- ),
- }
- }
-}
-
#[api(
input: {
properties: {
@@ -524,9 +332,30 @@ fn spawn_certificate_worker(
let auth_id = rpcenv.get_auth_id().unwrap();
+ let acme_config = if let Some(cfg) = node_config.acme_config().transpose()? {
+ cfg
+ } else {
+ proxmox_acme_api::parse_acme_config_string("account=default")?
+ };
+
+ let domains = node_config.acme_domains().try_fold(
+ Vec::<AcmeDomain>::new(),
+ |mut acc, domain| -> Result<_, Error> {
+ let mut domain = domain?;
+ domain.domain.make_ascii_lowercase();
+ if let Some(alias) = &mut domain.alias {
+ alias.make_ascii_lowercase();
+ }
+ acc.push(domain);
+ Ok(acc)
+ },
+ )?;
+
WorkerTask::spawn(name, None, auth_id, true, move |worker| async move {
let work = || async {
- if let Some(cert) = order_certificate(worker, &node_config).await? {
+ if let Some(cert) =
+ proxmox_acme_api::order_certificate(worker, &acme_config, &domains).await?
+ {
crate::config::set_proxy_certificate(&cert.certificate, &cert.private_key_pem)?;
crate::server::reload_proxy_certificate().await?;
}
@@ -562,16 +391,20 @@ pub fn revoke_acme_cert(rpcenv: &mut dyn RpcEnvironment) -> Result<String, Error
let auth_id = rpcenv.get_auth_id().unwrap();
+ let acme_config = if let Some(cfg) = node_config.acme_config().transpose()? {
+ cfg
+ } else {
+ proxmox_acme_api::parse_acme_config_string("account=default")?
+ };
+
WorkerTask::spawn(
"acme-revoke-cert",
None,
auth_id,
true,
move |_worker| async move {
- info!("Loading ACME account");
- let mut acme = node_config.acme_client().await?;
info!("Revoking old certificate");
- acme.revoke_certificate(cert_pem.as_bytes(), None).await?;
+ proxmox_acme_api::revoke_certificate(&acme_config, &cert_pem.as_bytes()).await?;
info!("Deleting certificate and regenerating a self-signed one");
delete_custom_certificate().await?;
Ok(())
diff --git a/src/api2/types/acme.rs b/src/api2/types/acme.rs
deleted file mode 100644
index 0ff496b6..00000000
--- a/src/api2/types/acme.rs
+++ /dev/null
@@ -1,73 +0,0 @@
-use serde::{Deserialize, Serialize};
-use serde_json::Value;
-
-use pbs_api_types::{DNS_ALIAS_FORMAT, DNS_NAME_FORMAT, PROXMOX_SAFE_ID_FORMAT};
-use proxmox_schema::{api, ApiStringFormat, ApiType, Schema, StringSchema};
-
-#[api(
- properties: {
- "domain": { format: &DNS_NAME_FORMAT },
- "alias": {
- optional: true,
- format: &DNS_ALIAS_FORMAT,
- },
- "plugin": {
- optional: true,
- format: &PROXMOX_SAFE_ID_FORMAT,
- },
- },
- default_key: "domain",
-)]
-#[derive(Deserialize, Serialize)]
-/// A domain entry for an ACME certificate.
-pub struct AcmeDomain {
- /// The domain to certify for.
- pub domain: String,
-
- /// The domain to use for challenges instead of the default acme challenge domain.
- ///
- /// This is useful if you use CNAME entries to redirect `_acme-challenge.*` domains to a
- /// different DNS server.
- #[serde(skip_serializing_if = "Option::is_none")]
- pub alias: Option<String>,
-
- /// The plugin to use to validate this domain.
- ///
- /// Empty means standalone HTTP validation is used.
- #[serde(skip_serializing_if = "Option::is_none")]
- pub plugin: Option<String>,
-}
-
-pub const ACME_DOMAIN_PROPERTY_SCHEMA: Schema =
- StringSchema::new("ACME domain configuration string")
- .format(&ApiStringFormat::PropertyString(&AcmeDomain::API_SCHEMA))
- .schema();
-
-#[api(
- properties: {
- schema: {
- type: Object,
- additional_properties: true,
- properties: {},
- },
- type: {
- type: String,
- },
- },
-)]
-#[derive(Serialize)]
-/// Schema for an ACME challenge plugin.
-pub struct AcmeChallengeSchema {
- /// Plugin ID.
- pub id: String,
-
- /// Human readable name, falls back to id.
- pub name: String,
-
- /// Plugin Type.
- #[serde(rename = "type")]
- pub ty: &'static str,
-
- /// The plugin's parameter schema.
- pub schema: Value,
-}
diff --git a/src/api2/types/mod.rs b/src/api2/types/mod.rs
index afc34b30..34193685 100644
--- a/src/api2/types/mod.rs
+++ b/src/api2/types/mod.rs
@@ -4,9 +4,6 @@ use anyhow::bail;
use proxmox_schema::*;
-mod acme;
-pub use acme::*;
-
// File names: may not contain slashes, may not start with "."
pub const FILENAME_FORMAT: ApiStringFormat = ApiStringFormat::VerifyFn(|name| {
if name.starts_with('.') {
diff --git a/src/config/acme/mod.rs b/src/config/acme/mod.rs
index 01ab6223..73486df9 100644
--- a/src/config/acme/mod.rs
+++ b/src/config/acme/mod.rs
@@ -5,12 +5,10 @@ use anyhow::Error;
use serde_json::Value;
use pbs_api_types::PROXMOX_SAFE_ID_REGEX;
-use proxmox_acme_api::{AcmeAccountName, KnownAcmeDirectory, KNOWN_ACME_DIRECTORIES};
+use proxmox_acme_api::{AcmeAccountName, AcmeChallengeSchema};
use proxmox_sys::error::SysError;
use proxmox_sys::fs::{file_read_string, CreateOptions};
-use crate::api2::types::AcmeChallengeSchema;
-
pub(crate) const ACME_DIR: &str = pbs_buildcfg::configdir!("/acme");
pub(crate) const ACME_ACCOUNT_DIR: &str = pbs_buildcfg::configdir!("/acme/accounts");
@@ -34,8 +32,6 @@ pub(crate) fn make_acme_dir() -> Result<(), Error> {
create_acme_subdir(ACME_DIR)
}
-pub const DEFAULT_ACME_DIRECTORY_ENTRY: &KnownAcmeDirectory = &KNOWN_ACME_DIRECTORIES[0];
-
pub fn foreach_acme_account<F>(mut func: F) -> Result<(), Error>
where
F: FnMut(AcmeAccountName) -> ControlFlow<Result<(), Error>>,
@@ -79,7 +75,7 @@ pub fn load_dns_challenge_schema() -> Result<Vec<AcmeChallengeSchema>, Error> {
.and_then(Value::as_str)
.unwrap_or(id)
.to_owned(),
- ty: "dns",
+ ty: "dns".into(),
schema: schema.to_owned(),
})
.collect())
diff --git a/src/config/acme/plugin.rs b/src/config/acme/plugin.rs
index 8ce852ec..4b4a216e 100644
--- a/src/config/acme/plugin.rs
+++ b/src/config/acme/plugin.rs
@@ -1,104 +1,16 @@
use std::sync::LazyLock;
use anyhow::Error;
-use serde::{Deserialize, Serialize};
use serde_json::Value;
-use pbs_api_types::PROXMOX_SAFE_ID_FORMAT;
-use proxmox_schema::{api, ApiType, Schema, StringSchema, Updater};
+use proxmox_acme_api::{DnsPlugin, StandalonePlugin, PLUGIN_ID_SCHEMA};
+use proxmox_schema::{ApiType, Schema};
use proxmox_section_config::{SectionConfig, SectionConfigData, SectionConfigPlugin};
use pbs_config::{open_backup_lockfile, BackupLockGuard};
-pub const PLUGIN_ID_SCHEMA: Schema = StringSchema::new("ACME Challenge Plugin ID.")
- .format(&PROXMOX_SAFE_ID_FORMAT)
- .min_length(1)
- .max_length(32)
- .schema();
-
pub static CONFIG: LazyLock<SectionConfig> = LazyLock::new(init);
-#[api(
- properties: {
- id: { schema: PLUGIN_ID_SCHEMA },
- },
-)]
-#[derive(Deserialize, Serialize)]
-/// Standalone ACME Plugin for the http-1 challenge.
-pub struct StandalonePlugin {
- /// Plugin ID.
- id: String,
-}
-
-impl Default for StandalonePlugin {
- fn default() -> Self {
- Self {
- id: "standalone".to_string(),
- }
- }
-}
-
-#[api(
- properties: {
- id: { schema: PLUGIN_ID_SCHEMA },
- disable: {
- optional: true,
- default: false,
- },
- "validation-delay": {
- default: 30,
- optional: true,
- minimum: 0,
- maximum: 2 * 24 * 60 * 60,
- },
- },
-)]
-/// DNS ACME Challenge Plugin core data.
-#[derive(Deserialize, Serialize, Updater)]
-#[serde(rename_all = "kebab-case")]
-pub struct DnsPluginCore {
- /// Plugin ID.
- #[updater(skip)]
- pub id: String,
-
- /// DNS API Plugin Id.
- pub api: String,
-
- /// Extra delay in seconds to wait before requesting validation.
- ///
- /// Allows to cope with long TTL of DNS records.
- #[serde(skip_serializing_if = "Option::is_none", default)]
- pub validation_delay: Option<u32>,
-
- /// Flag to disable the config.
- #[serde(skip_serializing_if = "Option::is_none", default)]
- pub disable: Option<bool>,
-}
-
-#[api(
- properties: {
- core: { type: DnsPluginCore },
- },
-)]
-/// DNS ACME Challenge Plugin.
-#[derive(Deserialize, Serialize)]
-#[serde(rename_all = "kebab-case")]
-pub struct DnsPlugin {
- #[serde(flatten)]
- pub core: DnsPluginCore,
-
- // We handle this property separately in the API calls.
- /// DNS plugin data (base64url encoded without padding).
- #[serde(with = "proxmox_serde::string_as_base64url_nopad")]
- pub data: String,
-}
-
-impl DnsPlugin {
- pub fn decode_data(&self, output: &mut Vec<u8>) -> Result<(), Error> {
- Ok(proxmox_base64::url::decode_to_vec(&self.data, output)?)
- }
-}
-
fn init() -> SectionConfig {
let mut config = SectionConfig::new(&PLUGIN_ID_SCHEMA);
diff --git a/src/config/node.rs b/src/config/node.rs
index e4b66a20..6865b815 100644
--- a/src/config/node.rs
+++ b/src/config/node.rs
@@ -9,14 +9,14 @@ use pbs_api_types::{
OPENSSL_CIPHERS_TLS_1_3_SCHEMA,
};
use proxmox_acme::async_client::AcmeClient;
-use proxmox_acme_api::AcmeAccountName;
+use proxmox_acme_api::{AcmeAccountName, AcmeConfig, AcmeDomain, ACME_DOMAIN_PROPERTY_SCHEMA};
use proxmox_http::ProxyConfig;
use proxmox_schema::{api, ApiStringFormat, ApiType, Updater};
use pbs_buildcfg::configdir;
use pbs_config::{open_backup_lockfile, BackupLockGuard};
-use crate::api2::types::{AcmeDomain, ACME_DOMAIN_PROPERTY_SCHEMA, HTTP_PROXY_SCHEMA};
+use crate::api2::types::HTTP_PROXY_SCHEMA;
const CONF_FILE: &str = configdir!("/node.cfg");
const LOCK_FILE: &str = configdir!("/.node.lck");
@@ -43,20 +43,6 @@ pub fn save_config(config: &NodeConfig) -> Result<(), Error> {
pbs_config::replace_backup_config(CONF_FILE, &raw)
}
-#[api(
- properties: {
- account: { type: AcmeAccountName },
- }
-)]
-#[derive(Deserialize, Serialize)]
-/// The ACME configuration.
-///
-/// Currently only contains the name of the account use.
-pub struct AcmeConfig {
- /// Account to use to acquire ACME certificates.
- account: AcmeAccountName,
-}
-
/// All available languages in Proxmox. Taken from proxmox-i18n repository.
/// pt_BR, zh_CN, and zh_TW use the same case in the translation files.
// TODO: auto-generate from available translations
@@ -242,7 +228,7 @@ impl NodeConfig {
pub async fn acme_client(&self) -> Result<AcmeClient, Error> {
let account = if let Some(cfg) = self.acme_config().transpose()? {
- cfg.account
+ AcmeAccountName::from_string(cfg.account)?
} else {
AcmeAccountName::from_string("default".to_string())? // should really not happen
};
diff --git a/src/lib.rs b/src/lib.rs
index 8633378c..828f5842 100644
--- a/src/lib.rs
+++ b/src/lib.rs
@@ -27,8 +27,6 @@ pub(crate) mod auth;
pub mod tape;
-pub mod acme;
-
pub mod client_helpers;
pub mod traffic_control_cache;
--
2.47.3
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
* [pbs-devel] [PATCH proxmox-backup v5 4/5] acme: change API impls to use proxmox-acme-api handlers
2026-01-08 11:26 11% [pbs-devel] [PATCH proxmox{, -backup} v5 0/9] fix #6939: acme: support servers returning 204 for nonce requests Samuel Rufinatscha
` (6 preceding siblings ...)
2026-01-08 11:26 6% ` [pbs-devel] [PATCH proxmox-backup v5 3/5] acme: drop local AcmeClient Samuel Rufinatscha
@ 2026-01-08 11:26 8% ` Samuel Rufinatscha
2026-01-13 13:45 5% ` Fabian Grünbichler
2026-01-08 11:26 7% ` [pbs-devel] [PATCH proxmox-backup v5 5/5] acme: certificate ordering through proxmox-acme-api Samuel Rufinatscha
` (2 subsequent siblings)
10 siblings, 1 reply; 200+ results
From: Samuel Rufinatscha @ 2026-01-08 11:26 UTC (permalink / raw)
To: pbs-devel
PBS currently uses its own ACME client and API logic, while PDM uses the
factored-out proxmox-acme and proxmox-acme-api crates. This duplication
risks differences in behaviour and requires ACME maintenance in two
places. This patch is part of a series to move PBS over to the shared
ACME stack.
Changes:
- Replace api2/config/acme.rs API logic with proxmox-acme-api handlers.
- Drop local caching and helper types that duplicate proxmox-acme-api.
Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
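The pattern throughout this patch is that each PBS handler keeps its `#[api]` schema and permission checks but its body collapses into a single call into the shared crate. A minimal, hypothetical sketch of that shape (the `shared` module stands in for `proxmox_acme_api`; its stubbed body is not the real implementation):

```rust
// Hypothetical sketch: a PBS-side API handler reduced to a thin delegation
// into a shared crate. `shared` stands in for proxmox_acme_api here.
mod shared {
    pub struct AccountEntry {
        pub name: String,
    }

    // Stub: the real crate reads the ACME account config from disk.
    pub fn list_accounts() -> Result<Vec<AccountEntry>, String> {
        Ok(vec![AccountEntry {
            name: "default".to_string(),
        }])
    }
}

// PBS-side handler after the refactor: schema/permissions stay local,
// the logic lives in the shared crate.
pub fn list_accounts() -> Result<Vec<shared::AccountEntry>, String> {
    shared::list_accounts()
}

fn main() {
    let accounts = list_accounts().expect("listing accounts");
    assert_eq!(accounts.len(), 1);
    assert_eq!(accounts[0].name, "default");
}
```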
---
src/api2/config/acme.rs | 378 ++-----------------------
src/api2/types/acme.rs | 16 --
src/bin/proxmox_backup_manager/acme.rs | 6 +-
src/config/acme/mod.rs | 44 +--
4 files changed, 33 insertions(+), 411 deletions(-)
diff --git a/src/api2/config/acme.rs b/src/api2/config/acme.rs
index 898f06dd..3314430c 100644
--- a/src/api2/config/acme.rs
+++ b/src/api2/config/acme.rs
@@ -1,29 +1,18 @@
-use std::fs;
-use std::ops::ControlFlow;
-use std::path::Path;
-use std::sync::{Arc, LazyLock, Mutex};
-use std::time::SystemTime;
-
-use anyhow::{bail, format_err, Error};
-use hex::FromHex;
-use serde::{Deserialize, Serialize};
-use serde_json::{json, Value};
-use tracing::{info, warn};
+use anyhow::Error;
+use tracing::info;
use pbs_api_types::{Authid, PRIV_SYS_MODIFY};
-use proxmox_acme::async_client::AcmeClient;
-use proxmox_acme::types::AccountData as AcmeAccountData;
-use proxmox_acme_api::AcmeAccountName;
+use proxmox_acme_api::{
+ AccountEntry, AccountInfo, AcmeAccountName, AcmeChallengeSchema, ChallengeSchemaWrapper,
+ DeletablePluginProperty, DnsPluginCore, DnsPluginCoreUpdater, KnownAcmeDirectory, PluginConfig,
+ DEFAULT_ACME_DIRECTORY_ENTRY, PLUGIN_ID_SCHEMA,
+};
+use proxmox_config_digest::ConfigDigest;
use proxmox_rest_server::WorkerTask;
use proxmox_router::{
http_bail, list_subdirs_api_method, Permission, Router, RpcEnvironment, SubdirMap,
};
-use proxmox_schema::{api, param_bail};
-
-use crate::api2::types::{AcmeChallengeSchema, KnownAcmeDirectory};
-use crate::config::acme::plugin::{
- self, DnsPlugin, DnsPluginCore, DnsPluginCoreUpdater, PLUGIN_ID_SCHEMA,
-};
+use proxmox_schema::api;
pub(crate) const ROUTER: Router = Router::new()
.get(&list_subdirs_api_method!(SUBDIRS))
@@ -65,19 +54,6 @@ const PLUGIN_ITEM_ROUTER: Router = Router::new()
.put(&API_METHOD_UPDATE_PLUGIN)
.delete(&API_METHOD_DELETE_PLUGIN);
-#[api(
- properties: {
- name: { type: AcmeAccountName },
- },
-)]
-/// An ACME Account entry.
-///
-/// Currently only contains a 'name' property.
-#[derive(Serialize)]
-pub struct AccountEntry {
- name: AcmeAccountName,
-}
-
#[api(
access: {
permission: &Permission::Privilege(&["system", "certificates"], PRIV_SYS_MODIFY, false),
@@ -91,40 +67,7 @@ pub struct AccountEntry {
)]
/// List ACME accounts.
pub fn list_accounts() -> Result<Vec<AccountEntry>, Error> {
- let mut entries = Vec::new();
- crate::config::acme::foreach_acme_account(|name| {
- entries.push(AccountEntry { name });
- ControlFlow::Continue(())
- })?;
- Ok(entries)
-}
-
-#[api(
- properties: {
- account: { type: Object, properties: {}, additional_properties: true },
- tos: {
- type: String,
- optional: true,
- },
- },
-)]
-/// ACME Account information.
-///
-/// This is what we return via the API.
-#[derive(Serialize)]
-pub struct AccountInfo {
- /// Raw account data.
- account: AcmeAccountData,
-
- /// The ACME directory URL the account was created at.
- directory: String,
-
- /// The account's own URL within the ACME directory.
- location: String,
-
- /// The ToS URL, if the user agreed to one.
- #[serde(skip_serializing_if = "Option::is_none")]
- tos: Option<String>,
+ proxmox_acme_api::list_accounts()
}
#[api(
@@ -141,23 +84,7 @@ pub struct AccountInfo {
)]
/// Return existing ACME account information.
pub async fn get_account(name: AcmeAccountName) -> Result<AccountInfo, Error> {
- let account_info = proxmox_acme_api::get_account(name).await?;
-
- Ok(AccountInfo {
- location: account_info.location,
- tos: account_info.tos,
- directory: account_info.directory,
- account: AcmeAccountData {
- only_return_existing: false, // don't actually write this out in case it's set
- ..account_info.account
- },
- })
-}
-
-fn account_contact_from_string(s: &str) -> Vec<String> {
- s.split(&[' ', ';', ',', '\0'][..])
- .map(|s| format!("mailto:{s}"))
- .collect()
+ proxmox_acme_api::get_account(name).await
}
#[api(
@@ -222,15 +149,11 @@ fn register_account(
);
}
- if Path::new(&crate::config::acme::account_path(&name)).exists() {
+ if std::path::Path::new(&proxmox_acme_api::account_config_filename(&name)).exists() {
http_bail!(BAD_REQUEST, "account {} already exists", name);
}
- let directory = directory.unwrap_or_else(|| {
- crate::config::acme::DEFAULT_ACME_DIRECTORY_ENTRY
- .url
- .to_owned()
- });
+ let directory = directory.unwrap_or_else(|| DEFAULT_ACME_DIRECTORY_ENTRY.url.to_string());
WorkerTask::spawn(
"acme-register",
@@ -286,17 +209,7 @@ pub fn update_account(
auth_id.to_string(),
true,
move |_worker| async move {
- let data = match contact {
- Some(data) => json!({
- "contact": account_contact_from_string(&data),
- }),
- None => json!({}),
- };
-
- proxmox_acme_api::load_client_with_account(&name)
- .await?
- .update_account(&data)
- .await?;
+ proxmox_acme_api::update_account(&name, contact).await?;
Ok(())
},
@@ -334,18 +247,8 @@ pub fn deactivate_account(
auth_id.to_string(),
true,
move |_worker| async move {
- match proxmox_acme_api::load_client_with_account(&name)
- .await?
- .update_account(&json!({"status": "deactivated"}))
- .await
- {
- Ok(_account) => (),
- Err(err) if !force => return Err(err),
- Err(err) => {
- warn!("error deactivating account {name}, proceeding anyway - {err}");
- }
- }
- crate::config::acme::mark_account_deactivated(&name)?;
+ proxmox_acme_api::deactivate_account(&name, force).await?;
+
Ok(())
},
)
@@ -372,15 +275,7 @@ pub fn deactivate_account(
)]
/// Get the Terms of Service URL for an ACME directory.
async fn get_tos(directory: Option<String>) -> Result<Option<String>, Error> {
- let directory = directory.unwrap_or_else(|| {
- crate::config::acme::DEFAULT_ACME_DIRECTORY_ENTRY
- .url
- .to_owned()
- });
- Ok(AcmeClient::new(directory)
- .terms_of_service_url()
- .await?
- .map(str::to_owned))
+ proxmox_acme_api::get_tos(directory).await
}
#[api(
@@ -395,52 +290,7 @@ async fn get_tos(directory: Option<String>) -> Result<Option<String>, Error> {
)]
/// Get named known ACME directory endpoints.
fn get_directories() -> Result<&'static [KnownAcmeDirectory], Error> {
- Ok(crate::config::acme::KNOWN_ACME_DIRECTORIES)
-}
-
-/// Wrapper for efficient Arc use when returning the ACME challenge-plugin schema for serializing
-struct ChallengeSchemaWrapper {
- inner: Arc<Vec<AcmeChallengeSchema>>,
-}
-
-impl Serialize for ChallengeSchemaWrapper {
- fn serialize<S>(&self, serializer: S) -> Result<S::Ok, S::Error>
- where
- S: serde::Serializer,
- {
- self.inner.serialize(serializer)
- }
-}
-
-struct CachedSchema {
- schema: Arc<Vec<AcmeChallengeSchema>>,
- cached_mtime: SystemTime,
-}
-
-fn get_cached_challenge_schemas() -> Result<ChallengeSchemaWrapper, Error> {
- static CACHE: LazyLock<Mutex<Option<CachedSchema>>> = LazyLock::new(|| Mutex::new(None));
-
- // the actual loading code
- let mut last = CACHE.lock().unwrap();
-
- let actual_mtime = fs::metadata(crate::config::acme::ACME_DNS_SCHEMA_FN)?.modified()?;
-
- let schema = match &*last {
- Some(CachedSchema {
- schema,
- cached_mtime,
- }) if *cached_mtime >= actual_mtime => schema.clone(),
- _ => {
- let new_schema = Arc::new(crate::config::acme::load_dns_challenge_schema()?);
- *last = Some(CachedSchema {
- schema: Arc::clone(&new_schema),
- cached_mtime: actual_mtime,
- });
- new_schema
- }
- };
-
- Ok(ChallengeSchemaWrapper { inner: schema })
+ Ok(proxmox_acme_api::KNOWN_ACME_DIRECTORIES)
}
#[api(
@@ -455,69 +305,7 @@ fn get_cached_challenge_schemas() -> Result<ChallengeSchemaWrapper, Error> {
)]
/// Get named known ACME directory endpoints.
fn get_challenge_schema() -> Result<ChallengeSchemaWrapper, Error> {
- get_cached_challenge_schemas()
-}
-
-#[api]
-#[derive(Default, Deserialize, Serialize)]
-#[serde(rename_all = "kebab-case")]
-/// The API's format is inherited from PVE/PMG:
-pub struct PluginConfig {
- /// Plugin ID.
- plugin: String,
-
- /// Plugin type.
- #[serde(rename = "type")]
- ty: String,
-
- /// DNS Api name.
- #[serde(skip_serializing_if = "Option::is_none", default)]
- api: Option<String>,
-
- /// Plugin configuration data.
- #[serde(skip_serializing_if = "Option::is_none", default)]
- data: Option<String>,
-
- /// Extra delay in seconds to wait before requesting validation.
- ///
- /// Allows to cope with long TTL of DNS records.
- #[serde(skip_serializing_if = "Option::is_none", default)]
- validation_delay: Option<u32>,
-
- /// Flag to disable the config.
- #[serde(skip_serializing_if = "Option::is_none", default)]
- disable: Option<bool>,
-}
-
-// See PMG/PVE's $modify_cfg_for_api sub
-fn modify_cfg_for_api(id: &str, ty: &str, data: &Value) -> PluginConfig {
- let mut entry = data.clone();
-
- let obj = entry.as_object_mut().unwrap();
- obj.remove("id");
- obj.insert("plugin".to_string(), Value::String(id.to_owned()));
- obj.insert("type".to_string(), Value::String(ty.to_owned()));
-
- // FIXME: This needs to go once the `Updater` is fixed.
- // None of these should be able to fail unless the user changed the files by hand, in which
- // case we leave the unmodified string in the Value for now. This will be handled with an error
- // later.
- if let Some(Value::String(ref mut data)) = obj.get_mut("data") {
- if let Ok(new) = proxmox_base64::url::decode_no_pad(&data) {
- if let Ok(utf8) = String::from_utf8(new) {
- *data = utf8;
- }
- }
- }
-
- // PVE/PMG do this explicitly for ACME plugins...
- // obj.insert("digest".to_string(), Value::String(digest.clone()));
-
- serde_json::from_value(entry).unwrap_or_else(|_| PluginConfig {
- plugin: "*Error*".to_string(),
- ty: "*Error*".to_string(),
- ..Default::default()
- })
+ proxmox_acme_api::get_cached_challenge_schemas()
}
#[api(
@@ -533,12 +321,7 @@ fn modify_cfg_for_api(id: &str, ty: &str, data: &Value) -> PluginConfig {
)]
/// List ACME challenge plugins.
pub fn list_plugins(rpcenv: &mut dyn RpcEnvironment) -> Result<Vec<PluginConfig>, Error> {
- let (plugins, digest) = plugin::config()?;
- rpcenv["digest"] = hex::encode(digest).into();
- Ok(plugins
- .iter()
- .map(|(id, (ty, data))| modify_cfg_for_api(id, ty, data))
- .collect())
+ proxmox_acme_api::list_plugins(rpcenv)
}
#[api(
@@ -555,13 +338,7 @@ pub fn list_plugins(rpcenv: &mut dyn RpcEnvironment) -> Result<Vec<PluginConfig>
)]
/// List ACME challenge plugins.
pub fn get_plugin(id: String, rpcenv: &mut dyn RpcEnvironment) -> Result<PluginConfig, Error> {
- let (plugins, digest) = plugin::config()?;
- rpcenv["digest"] = hex::encode(digest).into();
-
- match plugins.get(&id) {
- Some((ty, data)) => Ok(modify_cfg_for_api(&id, ty, data)),
- None => http_bail!(NOT_FOUND, "no such plugin"),
- }
+ proxmox_acme_api::get_plugin(id, rpcenv)
}
// Currently we only have "the" standalone plugin and DNS plugins so we can just flatten a
@@ -593,30 +370,7 @@ pub fn get_plugin(id: String, rpcenv: &mut dyn RpcEnvironment) -> Result<PluginC
)]
/// Add ACME plugin configuration.
pub fn add_plugin(r#type: String, core: DnsPluginCore, data: String) -> Result<(), Error> {
- // Currently we only support DNS plugins and the standalone plugin is "fixed":
- if r#type != "dns" {
- param_bail!("type", "invalid ACME plugin type: {:?}", r#type);
- }
-
- let data = String::from_utf8(proxmox_base64::decode(data)?)
- .map_err(|_| format_err!("data must be valid UTF-8"))?;
-
- let id = core.id.clone();
-
- let _lock = plugin::lock()?;
-
- let (mut plugins, _digest) = plugin::config()?;
- if plugins.contains_key(&id) {
- param_bail!("id", "ACME plugin ID {:?} already exists", id);
- }
-
- let plugin = serde_json::to_value(DnsPlugin { core, data })?;
-
- plugins.insert(id, r#type, plugin);
-
- plugin::save_config(&plugins)?;
-
- Ok(())
+ proxmox_acme_api::add_plugin(r#type, core, data)
}
#[api(
@@ -632,26 +386,7 @@ pub fn add_plugin(r#type: String, core: DnsPluginCore, data: String) -> Result<(
)]
/// Delete an ACME plugin configuration.
pub fn delete_plugin(id: String) -> Result<(), Error> {
- let _lock = plugin::lock()?;
-
- let (mut plugins, _digest) = plugin::config()?;
- if plugins.remove(&id).is_none() {
- http_bail!(NOT_FOUND, "no such plugin");
- }
- plugin::save_config(&plugins)?;
-
- Ok(())
-}
-
-#[api()]
-#[derive(Serialize, Deserialize)]
-#[serde(rename_all = "kebab-case")]
-/// Deletable property name
-pub enum DeletableProperty {
- /// Delete the disable property
- Disable,
- /// Delete the validation-delay property
- ValidationDelay,
+ proxmox_acme_api::delete_plugin(id)
}
#[api(
@@ -673,12 +408,12 @@ pub enum DeletableProperty {
type: Array,
optional: true,
items: {
- type: DeletableProperty,
+ type: DeletablePluginProperty,
}
},
digest: {
- description: "Digest to protect against concurrent updates",
optional: true,
+ type: ConfigDigest,
},
},
},
@@ -692,65 +427,8 @@ pub fn update_plugin(
id: String,
update: DnsPluginCoreUpdater,
data: Option<String>,
- delete: Option<Vec<DeletableProperty>>,
- digest: Option<String>,
+ delete: Option<Vec<DeletablePluginProperty>>,
+ digest: Option<ConfigDigest>,
) -> Result<(), Error> {
- let data = data
- .as_deref()
- .map(proxmox_base64::decode)
- .transpose()?
- .map(String::from_utf8)
- .transpose()
- .map_err(|_| format_err!("data must be valid UTF-8"))?;
-
- let _lock = plugin::lock()?;
-
- let (mut plugins, expected_digest) = plugin::config()?;
-
- if let Some(digest) = digest {
- let digest = <[u8; 32]>::from_hex(digest)?;
- crate::tools::detect_modified_configuration_file(&digest, &expected_digest)?;
- }
-
- match plugins.get_mut(&id) {
- Some((ty, ref mut entry)) => {
- if ty != "dns" {
- bail!("cannot update plugin of type {:?}", ty);
- }
-
- let mut plugin = DnsPlugin::deserialize(&*entry)?;
-
- if let Some(delete) = delete {
- for delete_prop in delete {
- match delete_prop {
- DeletableProperty::ValidationDelay => {
- plugin.core.validation_delay = None;
- }
- DeletableProperty::Disable => {
- plugin.core.disable = None;
- }
- }
- }
- }
- if let Some(data) = data {
- plugin.data = data;
- }
- if let Some(api) = update.api {
- plugin.core.api = api;
- }
- if update.validation_delay.is_some() {
- plugin.core.validation_delay = update.validation_delay;
- }
- if update.disable.is_some() {
- plugin.core.disable = update.disable;
- }
-
- *entry = serde_json::to_value(plugin)?;
- }
- None => http_bail!(NOT_FOUND, "no such plugin"),
- }
-
- plugin::save_config(&plugins)?;
-
- Ok(())
+ proxmox_acme_api::update_plugin(id, update, data, delete, digest)
}
diff --git a/src/api2/types/acme.rs b/src/api2/types/acme.rs
index 64175aff..0ff496b6 100644
--- a/src/api2/types/acme.rs
+++ b/src/api2/types/acme.rs
@@ -43,22 +43,6 @@ pub const ACME_DOMAIN_PROPERTY_SCHEMA: Schema =
.format(&ApiStringFormat::PropertyString(&AcmeDomain::API_SCHEMA))
.schema();
-#[api(
- properties: {
- name: { type: String },
- url: { type: String },
- },
-)]
-/// An ACME directory endpoint with a name and URL.
-#[derive(Serialize)]
-pub struct KnownAcmeDirectory {
- /// The ACME directory's name.
- pub name: &'static str,
-
- /// The ACME directory's endpoint URL.
- pub url: &'static str,
-}
-
#[api(
properties: {
schema: {
diff --git a/src/bin/proxmox_backup_manager/acme.rs b/src/bin/proxmox_backup_manager/acme.rs
index 6ed61560..d11d7498 100644
--- a/src/bin/proxmox_backup_manager/acme.rs
+++ b/src/bin/proxmox_backup_manager/acme.rs
@@ -4,14 +4,12 @@ use anyhow::{bail, Error};
use serde_json::Value;
use proxmox_acme::async_client::AcmeClient;
-use proxmox_acme_api::AcmeAccountName;
+use proxmox_acme_api::{AcmeAccountName, DnsPluginCore, KNOWN_ACME_DIRECTORIES};
use proxmox_router::{cli::*, ApiHandler, RpcEnvironment};
use proxmox_schema::api;
use proxmox_sys::fs::file_get_contents;
use proxmox_backup::api2;
-use proxmox_backup::config::acme::plugin::DnsPluginCore;
-use proxmox_backup::config::acme::KNOWN_ACME_DIRECTORIES;
pub fn acme_mgmt_cli() -> CommandLineInterface {
let cmd_def = CliCommandMap::new()
@@ -122,7 +120,7 @@ async fn register_account(
match input.trim().parse::<usize>() {
Ok(n) if n < KNOWN_ACME_DIRECTORIES.len() => {
- break (KNOWN_ACME_DIRECTORIES[n].url.to_owned(), false);
+ break (KNOWN_ACME_DIRECTORIES[n].url.to_string(), false);
}
Ok(n) if n == KNOWN_ACME_DIRECTORIES.len() => {
input.clear();
diff --git a/src/config/acme/mod.rs b/src/config/acme/mod.rs
index e4639c53..01ab6223 100644
--- a/src/config/acme/mod.rs
+++ b/src/config/acme/mod.rs
@@ -1,16 +1,15 @@
use std::collections::HashMap;
use std::ops::ControlFlow;
-use std::path::Path;
-use anyhow::{bail, format_err, Error};
+use anyhow::Error;
use serde_json::Value;
use pbs_api_types::PROXMOX_SAFE_ID_REGEX;
-use proxmox_acme_api::AcmeAccountName;
+use proxmox_acme_api::{AcmeAccountName, KnownAcmeDirectory, KNOWN_ACME_DIRECTORIES};
use proxmox_sys::error::SysError;
use proxmox_sys::fs::{file_read_string, CreateOptions};
-use crate::api2::types::{AcmeChallengeSchema, KnownAcmeDirectory};
+use crate::api2::types::AcmeChallengeSchema;
pub(crate) const ACME_DIR: &str = pbs_buildcfg::configdir!("/acme");
pub(crate) const ACME_ACCOUNT_DIR: &str = pbs_buildcfg::configdir!("/acme/accounts");
@@ -35,23 +34,8 @@ pub(crate) fn make_acme_dir() -> Result<(), Error> {
create_acme_subdir(ACME_DIR)
}
-pub const KNOWN_ACME_DIRECTORIES: &[KnownAcmeDirectory] = &[
- KnownAcmeDirectory {
- name: "Let's Encrypt V2",
- url: "https://acme-v02.api.letsencrypt.org/directory",
- },
- KnownAcmeDirectory {
- name: "Let's Encrypt V2 Staging",
- url: "https://acme-staging-v02.api.letsencrypt.org/directory",
- },
-];
-
pub const DEFAULT_ACME_DIRECTORY_ENTRY: &KnownAcmeDirectory = &KNOWN_ACME_DIRECTORIES[0];
-pub fn account_path(name: &str) -> String {
- format!("{ACME_ACCOUNT_DIR}/{name}")
-}
-
pub fn foreach_acme_account<F>(mut func: F) -> Result<(), Error>
where
F: FnMut(AcmeAccountName) -> ControlFlow<Result<(), Error>>,
@@ -82,28 +66,6 @@ where
}
}
-pub fn mark_account_deactivated(name: &str) -> Result<(), Error> {
- let from = account_path(name);
- for i in 0..100 {
- let to = account_path(&format!("_deactivated_{name}_{i}"));
- if !Path::new(&to).exists() {
- return std::fs::rename(&from, &to).map_err(|err| {
- format_err!(
- "failed to move account path {:?} to {:?} - {}",
- from,
- to,
- err
- )
- });
- }
- }
- bail!(
- "No free slot to rename deactivated account {:?}, please cleanup {:?}",
- from,
- ACME_ACCOUNT_DIR
- );
-}
-
pub fn load_dns_challenge_schema() -> Result<Vec<AcmeChallengeSchema>, Error> {
let raw = file_read_string(ACME_DNS_SCHEMA_FN)?;
let schemas: serde_json::Map<String, Value> = serde_json::from_str(&raw)?;
--
2.47.3
* [pbs-devel] [PATCH proxmox v5 3/4] fix #6939: acme: support servers returning 204 for nonce requests
2026-01-08 11:26 11% [pbs-devel] [PATCH proxmox{, -backup} v5 0/9] fix #6939: acme: support servers returning 204 for nonce requests Samuel Rufinatscha
2026-01-08 11:26 10% ` [pbs-devel] [PATCH proxmox v5 1/4] acme: reduce visibility of Request type Samuel Rufinatscha
2026-01-08 11:26 15% ` [pbs-devel] [PATCH proxmox v5 2/4] acme: introduce http_status module Samuel Rufinatscha
@ 2026-01-08 11:26 14% ` Samuel Rufinatscha
2026-01-08 11:26 17% ` [pbs-devel] [PATCH proxmox v5 4/4] acme-api: add helper to load client for an account Samuel Rufinatscha
` (7 subsequent siblings)
10 siblings, 0 replies; 200+ results
From: Samuel Rufinatscha @ 2026-01-08 11:26 UTC (permalink / raw)
To: pbs-devel
Some ACME servers (notably custom or legacy implementations) respond
to HEAD /newNonce with a 204 No Content instead of the
RFC 8555-recommended 200 OK [1]. While this behavior is technically
off-spec, it is not illegal. This issue was reported on our bug
tracker [2].
The previous implementation treated any non-200 response as an error,
causing account registration to fail against such servers. Relax the
status-code check to accept both 200 and 204 responses (leaving room to
accept other 2xx success codes later) to improve interoperability.
Note: In comparison, PVE’s Perl ACME client performs a GET request [3]
instead of a HEAD request and accepts any 2xx success code when
retrieving the nonce [4]. This difference in behavior does not affect
functionality but is worth noting for consistency across
implementations.
[1] https://datatracker.ietf.org/doc/html/rfc8555/#section-7.2
[2] https://bugzilla.proxmox.com/show_bug.cgi?id=6939
[3] https://git.proxmox.com/?p=proxmox-acme.git;a=blob;f=src/PVE/ACME.pm;h=f1e9bb7d316e3cea1e376c610b0479119217aecc;hb=HEAD#l219
[4] https://git.proxmox.com/?p=proxmox-acme.git;a=blob;f=src/PVE/ACME.pm;h=f1e9bb7d316e3cea1e376c610b0479119217aecc;hb=HEAD#l597
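For illustration, the relaxed check amounts to matching the response
status against a small allow-list instead of a single value, mirroring the
patch's change from `expected: u16` to `expected: &[u16]`. A minimal
standalone sketch with hypothetical names (not the actual proxmox-acme
types):

```rust
// Minimal sketch: a request now carries a slice of acceptable status
// codes, and the client checks membership rather than equality.
const OK: u16 = 200;
const NO_CONTENT: u16 = 204;

struct Request<'a> {
    expected: &'a [u16],
}

fn status_is_expected(request: &Request, status: u16) -> bool {
    request.expected.contains(&status)
}

fn main() {
    // A nonce request tolerates both 200 and the off-spec 204.
    let nonce_req = Request {
        expected: &[OK, NO_CONTENT],
    };
    assert!(status_is_expected(&nonce_req, 200));
    assert!(status_is_expected(&nonce_req, 204));
    // Other success codes are still rejected as unexpected.
    assert!(!status_is_expected(&nonce_req, 201));
}
```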
Fixes: #6939
Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
---
proxmox-acme/src/account.rs | 8 ++++----
proxmox-acme/src/async_client.rs | 6 +++---
proxmox-acme/src/client.rs | 2 +-
proxmox-acme/src/request.rs | 4 ++--
4 files changed, 10 insertions(+), 10 deletions(-)
diff --git a/proxmox-acme/src/account.rs b/proxmox-acme/src/account.rs
index ea1a3c60..84610bf3 100644
--- a/proxmox-acme/src/account.rs
+++ b/proxmox-acme/src/account.rs
@@ -84,7 +84,7 @@ impl Account {
method: "POST",
content_type: crate::request::JSON_CONTENT_TYPE,
body,
- expected: crate::http_status::CREATED,
+ expected: &[crate::http_status::CREATED],
};
Ok(NewOrder::new(request))
@@ -106,7 +106,7 @@ impl Account {
method: "POST",
content_type: crate::request::JSON_CONTENT_TYPE,
body,
- expected: crate::http_status::OK,
+ expected: &[crate::http_status::OK],
})
}
@@ -131,7 +131,7 @@ impl Account {
method: "POST",
content_type: crate::request::JSON_CONTENT_TYPE,
body,
- expected: crate::http_status::OK,
+ expected: &[crate::http_status::OK],
})
}
@@ -321,7 +321,7 @@ impl AccountCreator {
method: "POST",
content_type: crate::request::JSON_CONTENT_TYPE,
body,
- expected: crate::http_status::CREATED,
+ expected: &[crate::http_status::CREATED],
})
}
diff --git a/proxmox-acme/src/async_client.rs b/proxmox-acme/src/async_client.rs
index 043648bb..07da842c 100644
--- a/proxmox-acme/src/async_client.rs
+++ b/proxmox-acme/src/async_client.rs
@@ -420,7 +420,7 @@ impl AcmeClient {
};
if parts.status.is_success() {
- if status != request.expected {
+ if !request.expected.contains(&status) {
return Err(Error::InvalidApi(format!(
"ACME server responded with unexpected status code: {:?}",
parts.status
@@ -498,7 +498,7 @@ impl AcmeClient {
method: "GET",
content_type: "",
body: String::new(),
- expected: crate::http_status::OK,
+ expected: &[crate::http_status::OK],
},
nonce,
)
@@ -550,7 +550,7 @@ impl AcmeClient {
method: "HEAD",
content_type: "",
body: String::new(),
- expected: crate::http_status::OK,
+ expected: &[crate::http_status::OK, crate::http_status::NO_CONTENT],
},
nonce,
)
diff --git a/proxmox-acme/src/client.rs b/proxmox-acme/src/client.rs
index 5c812567..af250fb8 100644
--- a/proxmox-acme/src/client.rs
+++ b/proxmox-acme/src/client.rs
@@ -203,7 +203,7 @@ impl Inner {
let got_nonce = self.update_nonce(&mut response)?;
if response.is_success() {
- if response.status != request.expected {
+ if !request.expected.contains(&response.status) {
return Err(Error::InvalidApi(format!(
"API server responded with unexpected status code: {:?}",
response.status
diff --git a/proxmox-acme/src/request.rs b/proxmox-acme/src/request.rs
index 341ce53e..d782a7de 100644
--- a/proxmox-acme/src/request.rs
+++ b/proxmox-acme/src/request.rs
@@ -16,8 +16,8 @@ pub(crate) struct Request {
/// The body to pass along with request, or an empty string.
pub(crate) body: String,
- /// The expected status code a compliant ACME provider will return on success.
- pub(crate) expected: u16,
+ /// The set of HTTP status codes that indicate a successful response from an ACME provider.
+ pub(crate) expected: &'static [u16],
}
/// Common HTTP status codes used in ACME responses.
--
2.47.3
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
^ permalink raw reply [relevance 14%]
* [pbs-devel] [PATCH proxmox v5 4/4] acme-api: add helper to load client for an account
2026-01-08 11:26 11% [pbs-devel] [PATCH proxmox{, -backup} v5 0/9] fix #6939: acme: support servers returning 204 for nonce requests Samuel Rufinatscha
` (2 preceding siblings ...)
2026-01-08 11:26 14% ` [pbs-devel] [PATCH proxmox v5 3/4] fix #6939: acme: support servers returning 204 for nonce requests Samuel Rufinatscha
@ 2026-01-08 11:26 17% ` Samuel Rufinatscha
2026-01-13 13:45 5% ` Fabian Grünbichler
2026-01-08 11:26 13% ` [pbs-devel] [PATCH proxmox-backup v5 1/5] acme: clean up ACME-related imports Samuel Rufinatscha
` (6 subsequent siblings)
10 siblings, 1 reply; 200+ results
From: Samuel Rufinatscha @ 2026-01-08 11:26 UTC (permalink / raw)
To: pbs-devel
The PBS ACME refactoring needs a simple way to obtain an AcmeClient for
a given configured account without duplicating config wiring. This patch
adds a load_client_with_account helper in proxmox-acme-api that loads
the account and constructs a matching client, similar to PBS's
previous AcmeClient::load() function.
Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
---
proxmox-acme-api/src/account_api_impl.rs | 5 +++++
proxmox-acme-api/src/lib.rs | 3 ++-
2 files changed, 7 insertions(+), 1 deletion(-)
diff --git a/proxmox-acme-api/src/account_api_impl.rs b/proxmox-acme-api/src/account_api_impl.rs
index ef195908..ca8c8655 100644
--- a/proxmox-acme-api/src/account_api_impl.rs
+++ b/proxmox-acme-api/src/account_api_impl.rs
@@ -116,3 +116,8 @@ pub async fn update_account(name: &AcmeAccountName, contact: Option<String>) ->
Ok(())
}
+
+pub async fn load_client_with_account(account_name: &AcmeAccountName) -> Result<AcmeClient, Error> {
+ let account_data = super::account_config::load_account_config(&account_name).await?;
+ Ok(account_data.client())
+}
diff --git a/proxmox-acme-api/src/lib.rs b/proxmox-acme-api/src/lib.rs
index 623e9e23..96f88ae2 100644
--- a/proxmox-acme-api/src/lib.rs
+++ b/proxmox-acme-api/src/lib.rs
@@ -31,7 +31,8 @@ mod plugin_config;
mod account_api_impl;
#[cfg(feature = "impl")]
pub use account_api_impl::{
- deactivate_account, get_account, get_tos, list_accounts, register_account, update_account,
+ deactivate_account, get_account, get_tos, list_accounts, load_client_with_account,
+ register_account, update_account,
};
#[cfg(feature = "impl")]
--
2.47.3
^ permalink raw reply [relevance 17%]
* [pbs-devel] [PATCH proxmox-backup v2 1/1] fix: s3: make s3_refresh apihandler sync
@ 2026-01-07 12:46 6% Nicolas Frey
0 siblings, 0 replies; 200+ results
From: Nicolas Frey @ 2026-01-07 12:46 UTC (permalink / raw)
To: pbs-devel
Fixes a regression from 524cf1e7, which made `datastore::s3_refresh`
sync but did not update the corresponding ApiHandler match here.
This would result in a panic every time an s3-refresh was initiated.
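A minimal model of the failure mode (illustrative types only, not the
real proxmox-router `ApiHandler`): the method was re-registered as a
sync handler, but the caller still matched on the async variant, so
dispatch fell through to `unreachable!()`.

```rust
// Simplified stand-in for the ApiHandler enum; the real Async variant
// holds a boxed async fn, elided here.
enum ApiHandler {
    Sync(fn() -> i32),
    #[allow(dead_code)]
    Async(fn() -> i32),
}

fn dispatch(handler: &ApiHandler) -> i32 {
    match handler {
        // The fix: match the variant the method is actually registered as.
        ApiHandler::Sync(f) => f(),
        // Matching only Async while the handler is Sync lands here -> panic.
        _ => unreachable!(),
    }
}

fn main() {
    let handler = ApiHandler::Sync(|| 42);
    assert_eq!(dispatch(&handler), 42);
    println!("ok");
}
```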
Reviewed-by: Christian Ebner <c.ebner@proxmox.com>
Tested-by: Christian Ebner <c.ebner@proxmox.com>
Reviewed-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
Fixes: 524cf1e7 ("api: admin: make s3 refresh handler sync")
Fixes: https://forum.proxmox.com/threads/178655
Signed-off-by: Nicolas Frey <n.frey@proxmox.com>
---
added Fixes trailer to reference blamed commit
src/bin/proxmox_backup_manager/datastore.rs | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/src/bin/proxmox_backup_manager/datastore.rs b/src/bin/proxmox_backup_manager/datastore.rs
index 57b4ca29..5c65c5ec 100644
--- a/src/bin/proxmox_backup_manager/datastore.rs
+++ b/src/bin/proxmox_backup_manager/datastore.rs
@@ -339,7 +339,7 @@ async fn s3_refresh(mut param: Value, rpcenv: &mut dyn RpcEnvironment) -> Result
let info = &api2::admin::datastore::API_METHOD_S3_REFRESH;
let result = match info.handler {
- ApiHandler::Async(handler) => (handler)(param, info, rpcenv).await?,
+ ApiHandler::Sync(handler) => (handler)(param, info, rpcenv)?,
_ => unreachable!(),
};
--
2.47.3
^ permalink raw reply [relevance 6%]
* Re: [pbs-devel] [PATCH proxmox-backup 1/1] fix: s3: make s3_refresh apihandler sync
@ 2026-01-05 15:22 13% ` Samuel Rufinatscha
0 siblings, 0 replies; 200+ results
From: Samuel Rufinatscha @ 2026-01-05 15:22 UTC (permalink / raw)
To: Proxmox Backup Server development discussion, Nicolas Frey
Thanks, this makes sense - the ApiHandler mismatch explains the panic.
Reviewed-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
On 1/5/26 11:34 AM, Nicolas Frey wrote:
> fixes regression from 524cf1e that made `datastore::s3_refresh` sync
> but did not change the ApiHandler matching part here
>
> This would result in a panic every time an s3-refresh was initiated
>
> Fixes: https://forum.proxmox.com/threads/178655
> Signed-off-by: Nicolas Frey <n.frey@proxmox.com>
> ---
> src/bin/proxmox_backup_manager/datastore.rs | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/src/bin/proxmox_backup_manager/datastore.rs b/src/bin/proxmox_backup_manager/datastore.rs
> index 57b4ca29..5c65c5ec 100644
> --- a/src/bin/proxmox_backup_manager/datastore.rs
> +++ b/src/bin/proxmox_backup_manager/datastore.rs
> @@ -339,7 +339,7 @@ async fn s3_refresh(mut param: Value, rpcenv: &mut dyn RpcEnvironment) -> Result
>
> let info = &api2::admin::datastore::API_METHOD_S3_REFRESH;
> let result = match info.handler {
> - ApiHandler::Async(handler) => (handler)(param, info, rpcenv).await?,
> + ApiHandler::Sync(handler) => (handler)(param, info, rpcenv)?,
> _ => unreachable!(),
> };
>
^ permalink raw reply [relevance 13%]
* [pbs-devel] superseded: [PATCH proxmox-backup v5 0/4] datastore: remove config reload on hot path
2025-11-24 17:04 12% [pbs-devel] [PATCH proxmox-backup v5 " Samuel Rufinatscha
` (4 preceding siblings ...)
2025-11-26 15:16 5% ` [pbs-devel] [PATCH proxmox-backup v5 0/4] datastore: remove config reload on hot path Fabian Grünbichler
@ 2026-01-05 14:21 13% ` Samuel Rufinatscha
5 siblings, 0 replies; 200+ results
From: Samuel Rufinatscha @ 2026-01-05 14:21 UTC (permalink / raw)
To: pbs-devel
https://lore.proxmox.com/pbs-devel/20260105141615.242463-1-s.rufinatscha@proxmox.com/T/#t
On 11/24/25 6:03 PM, Samuel Rufinatscha wrote:
> Hi,
>
> this series reduces CPU time in datastore lookups by avoiding repeated
> datastore.cfg reads/parses in both `lookup_datastore()` and
> `DataStore::Drop`. It also adds a TTL so manual config edits are
> noticed without reintroducing hashing on every request.
>
> While investigating #6049 [1], cargo-flamegraph [2] showed hotspots
> during repeated `/status` calls in `lookup_datastore()` and in `Drop`,
> dominated by `pbs_config::datastore::config()` (config parse).
>
> The parsing cost itself should eventually be investigated in a future
> effort. Furthermore, cargo-flamegraph showed that when using a
> token-based auth method to access the API, a significant amount of time
> is spent in validation on every request [3].
>
> ## Approach
>
> [PATCH 1/4] Support datastore generation in ConfigVersionCache
>
> [PATCH 2/4] Fast path for datastore lookups
> Cache the parsed datastore.cfg keyed by the shared datastore
> generation. lookup_datastore() reuses both the cached config and an
> existing DataStoreImpl when the generation matches, and falls back
> to the old slow path otherwise. The caching logic is implemented
> using the datastore_section_config_cached(update_cache: bool) helper.
>
> [PATCH 3/4] Fast path for Drop
> Make DataStore::Drop use the datastore_section_config_cached()
> helper to avoid re-reading/parsing datastore.cfg on every Drop.
> Bump generation not only on API config saves, but also on slow-path
> lookups (if update_cache is true), so that Drop handlers eventually
> see newer configs.
>
> [PATCH 4/4] TTL to catch manual edits
> Add a TTL to the cached config and bump the datastore generation iff
> the digest changed but generation stays the same. This catches manual
> edits to datastore.cfg without reintroducing hashing or config
> parsing on every request.
>
> ## Benchmark results
>
> ### End-to-end
>
> Testing `/status?verbose=0` end-to-end with 1000 stores, 5 req/store
> and parallel=16 before/after the series:
>
> Metric Before After
> ----------------------------------------
> Total time 12s 9s
> Throughput (all) 416.67 555.56
> Cold RPS (round #1) 83.33 111.11
> Warm RPS (#2..N) 333.33 444.44
>
> Running under flamegraph [2], TLS appears to consume a significant
> amount of CPU time and blur the results. Still, a ~33% higher overall
> throughput and ~25% less end-to-end time for this workload.
>
> ### Isolated benchmarks (hyperfine)
>
> In addition to the end-to-end tests, I measured two standalone
> benchmarks with hyperfine, each using a config with 1000 datastores.
> `M` is the number of distinct datastores looked up and
> `N` is the number of lookups per datastore.
>
> Drop-direct variant:
>
> Drops the `DataStore` after every lookup, so the `Drop` path runs on
> every iteration:
>
> use anyhow::Error;
>
> use pbs_api_types::Operation;
> use pbs_datastore::DataStore;
>
> fn main() -> Result<(), Error> {
> let mut args = std::env::args();
> args.next();
>
> let datastores = if let Some(n) = args.next() {
> n.parse::<usize>()?
> } else {
> 1000
> };
>
> let iterations = if let Some(n) = args.next() {
> n.parse::<usize>()?
> } else {
> 1000
> };
>
> for d in 1..=datastores {
> let name = format!("ds{:04}", d);
>
> for i in 1..=iterations {
> DataStore::lookup_datastore(&name, Some(Operation::Write))?;
> }
> }
>
> Ok(())
> }
>
> +----+------+-----------+-----------+---------+
> | M | N | Baseline | Patched | Speedup |
> +----+------+-----------+-----------+---------+
> | 1 | 1000 | 1.684 s | 35.3 ms | 47.7x |
> | 10 | 100 | 1.689 s | 35.0 ms | 48.3x |
> | 100| 10 | 1.709 s | 35.8 ms | 47.7x |
> |1000| 1 | 1.809 s | 39.0 ms | 46.4x |
> +----+------+-----------+-----------+---------+
>
> Bulk-drop variant:
>
> Keeps the `DataStore` instances alive for
> all `N` lookups of a given datastore and then drops them in bulk,
> mimicking a task that performs many lookups while it is running and
> only triggers the expensive `Drop` logic when the last user exits.
>
> use anyhow::Error;
>
> use pbs_api_types::Operation;
> use pbs_datastore::DataStore;
>
> fn main() -> Result<(), Error> {
> let mut args = std::env::args();
> args.next();
>
> let datastores = if let Some(n) = args.next() {
> n.parse::<usize>()?
> } else {
> 1000
> };
>
> let iterations = if let Some(n) = args.next() {
> n.parse::<usize>()?
> } else {
> 1000
> };
>
> for d in 1..=datastores {
> let name = format!("ds{:04}", d);
>
> let mut stores = Vec::with_capacity(iterations);
> for i in 1..=iterations {
> stores.push(DataStore::lookup_datastore(&name, Some(Operation::Write))?);
> }
> }
>
> Ok(())
> }
>
> +------+------+---------------+--------------+---------+
> | M | N | Baseline mean | Patched mean | Speedup |
> +------+------+---------------+--------------+---------+
> | 1 | 1000 | 890.6 ms | 35.5 ms | 25.1x |
> | 10 | 100 | 891.3 ms | 35.1 ms | 25.4x |
> | 100 | 10 | 983.9 ms | 35.6 ms | 27.6x |
> | 1000 | 1 | 1829.0 ms | 45.2 ms | 40.5x |
> +------+------+---------------+--------------+---------+
>
>
> Both variants show that the combination of the cached config lookups
> and the cheaper `Drop` handling reduces the hot-path cost from ~1.8 s
> per run to a few tens of milliseconds in these benchmarks.
>
> ## Reproduction steps
>
> VM: 4 vCPU, ~8 GiB RAM, VirtIO-SCSI; disks:
> - scsi0 32G (OS)
> - scsi1 1000G (datastores)
>
> Install PBS from ISO on the VM.
>
> Set up ZFS on /dev/sdb (adjust if different):
>
> zpool create -f -o ashift=12 pbsbench /dev/sdb
> zfs set mountpoint=/pbsbench pbsbench
> zfs create pbsbench/pbs-bench
>
> Raise file-descriptor limit:
>
> sudo systemctl edit proxmox-backup-proxy.service
>
> Add the following lines:
>
> [Service]
> LimitNOFILE=1048576
>
> Reload systemd and restart the proxy:
>
> sudo systemctl daemon-reload
> sudo systemctl restart proxmox-backup-proxy.service
>
> Verify the limit:
>
> systemctl show proxmox-backup-proxy.service | grep LimitNOFILE
>
> Create 1000 ZFS-backed datastores (as used in #6049 [1]):
>
> seq -w 001 1000 | xargs -n1 -P1 bash -c '
> id=$0
> name="ds${id}"
> dataset="pbsbench/pbs-bench/${name}"
> path="/pbsbench/pbs-bench/${name}"
> zfs create -o mountpoint="$path" "$dataset"
> proxmox-backup-manager datastore create "$name" "$path" \
> --comment "ZFS dataset-based datastore"
> '
>
> Build PBS from this series, then run the server manually under
> flamegraph:
>
> systemctl stop proxmox-backup-proxy
> cargo flamegraph --release --bin proxmox-backup-proxy
>
> ## Patch summary
>
> [PATCH 1/4] partial fix #6049: config: enable config version cache for datastore
> [PATCH 2/4] partial fix #6049: datastore: impl ConfigVersionCache fast path for lookups
> [PATCH 3/4] partial fix #6049: datastore: use config fast-path in Drop
> [PATCH 4/4] partial fix #6049: datastore: add TTL fallback to catch manual config edits
>
> ## Maintainer notes
>
> No dependency bumps, no API changes and no breaking changes.
>
> Thanks,
> Samuel
>
> Links
>
> [1] Bugzilla #6049: https://bugzilla.proxmox.com/show_bug.cgi?id=6049
> [2] cargo-flamegraph: https://github.com/flamegraph-rs/flamegraph
> [3] Bugzilla #7017: https://bugzilla.proxmox.com/show_bug.cgi?id=7017
>
> Samuel Rufinatscha (4):
> partial fix #6049: config: enable config version cache for datastore
> partial fix #6049: datastore: impl ConfigVersionCache fast path for
> lookups
> partial fix #6049: datastore: use config fast-path in Drop
> partial fix #6049: datastore: add TTL fallback to catch manual config
> edits
>
> pbs-config/src/config_version_cache.rs | 10 +-
> pbs-datastore/Cargo.toml | 1 +
> pbs-datastore/src/datastore.rs | 213 ++++++++++++++++++++-----
> 3 files changed, 179 insertions(+), 45 deletions(-)
>
^ permalink raw reply [relevance 13%]
* [pbs-devel] [PATCH proxmox-backup v6 4/4] partial fix #6049: datastore: add TTL fallback to catch manual config edits
2026-01-05 14:16 12% [pbs-devel] [PATCH proxmox-backup v6 0/4] datastore: remove config reload on hot path Samuel Rufinatscha
` (2 preceding siblings ...)
2026-01-05 14:16 15% ` [pbs-devel] [PATCH proxmox-backup v6 3/4] partial fix #6049: datastore: use config fast-path in Drop Samuel Rufinatscha
@ 2026-01-05 14:16 13% ` Samuel Rufinatscha
2026-01-14 9:54 5% ` [pbs-devel] applied-series: [PATCH proxmox-backup v6 0/4] datastore: remove config reload on hot path Fabian Grünbichler
4 siblings, 0 replies; 200+ results
From: Samuel Rufinatscha @ 2026-01-05 14:16 UTC (permalink / raw)
To: pbs-devel
The lookup fast path reacts to API-driven config changes because
save_config() bumps the generation. Manual edits of datastore.cfg do
not bump the counter. To keep the system robust against such edits
without reintroducing config reading and hashing on the hot path, this
patch adds a TTL to the cache entry.
If the cached config is older than
DATASTORE_CONFIG_CACHE_TTL_SECS (set to 60s), the next lookup takes
the slow path and refreshes the entry. As an optimization, a check was
added to catch manual edits: if the digest changed while the generation
stayed the same, the config must have been edited outside the API, and
the generation is bumped.
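The digest check reduces to one decision: bump the generation only when
the cached entry has the current generation but different on-disk bytes.
A sketch with illustrative names (`bump` models
increase_datastore_generation(), assumed to return the pre-increment
value, hence the `+ 1`):

```rust
// Decide the effective generation for a freshly re-read config.
// `cached` is the (generation, digest) pair of the previous cache entry,
// if any; `new_digest` is the digest of the config just read from disk.
fn effective_generation(
    cached: Option<(usize, [u8; 32])>,
    current_gen: usize,
    new_digest: [u8; 32],
    bump: &mut dyn FnMut() -> usize,
) -> usize {
    match cached {
        // Same generation but different bytes on disk: manual edit,
        // so invalidate other cached readers by bumping the counter.
        Some((gen, digest)) if gen == current_gen && digest != new_digest => bump() + 1,
        _ => current_gen,
    }
}

fn main() {
    let mut counter = 5usize;
    let mut bump = || {
        let old = counter;
        counter += 1;
        old
    };
    // Digest changed while the generation stayed at 5: bump to 6.
    assert_eq!(effective_generation(Some((5, [0u8; 32])), 5, [1u8; 32], &mut bump), 6);
    // Digest unchanged: keep the current generation.
    assert_eq!(effective_generation(Some((5, [0u8; 32])), 5, [0u8; 32], &mut bump), 5);
    // No cached entry yet: keep the current generation.
    assert_eq!(effective_generation(None, 7, [0u8; 32], &mut bump), 7);
    println!("ok");
}
```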
Links
[1] cargo-flamegraph: https://github.com/flamegraph-rs/flamegraph
Fixes: #6049
Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
---
Changes:
From v1 → v2
- Store last_update timestamp in DatastoreConfigCache type.
From v2 → v3
No changes
From v3 → v4
- Fix digest generation bump logic in update_cache, thanks @Fabian.
From v4 → v5
- Rebased only, no changes
From v5 → v6
- Rebased
- Styling: simplified digest-matching, thanks @Fabian
pbs-datastore/src/datastore.rs | 47 +++++++++++++++++++++++++---------
1 file changed, 35 insertions(+), 12 deletions(-)
diff --git a/pbs-datastore/src/datastore.rs b/pbs-datastore/src/datastore.rs
index 8adb0e3b..c4be55ad 100644
--- a/pbs-datastore/src/datastore.rs
+++ b/pbs-datastore/src/datastore.rs
@@ -53,8 +53,12 @@ use crate::{DataBlob, LocalDatastoreLruCache};
struct DatastoreConfigCache {
// Parsed datastore.cfg file
config: Arc<SectionConfigData>,
+ // Digest of the datastore.cfg file
+ digest: [u8; 32],
// Generation number from ConfigVersionCache
last_generation: usize,
+ // Last update time (epoch seconds)
+ last_update: i64,
}
static DATASTORE_CONFIG_CACHE: LazyLock<Mutex<Option<DatastoreConfigCache>>> =
@@ -63,6 +67,8 @@ static DATASTORE_CONFIG_CACHE: LazyLock<Mutex<Option<DatastoreConfigCache>>> =
static DATASTORE_MAP: LazyLock<Mutex<HashMap<String, Arc<DataStoreImpl>>>> =
LazyLock::new(|| Mutex::new(HashMap::new()));
+/// Max age in seconds to reuse the cached datastore config.
+const DATASTORE_CONFIG_CACHE_TTL_SECS: i64 = 60;
/// Filename to store backup group notes
pub const GROUP_NOTES_FILE_NAME: &str = "notes";
/// Filename to store backup group owner
@@ -323,15 +329,16 @@ impl DatastoreThreadSettings {
/// generation.
///
/// Uses `ConfigVersionCache` to detect stale entries:
-/// - If the cached generation matches the current generation, the
-/// cached config is returned.
+/// - If the cached generation matches the current generation and TTL is
+/// OK, the cached config is returned.
/// - Otherwise the config is re-read from disk. If `update_cache` is
-/// `true`, the new config and current generation are stored in the
-/// cache. Callers that set `update_cache = true` must hold the
-/// datastore config lock to avoid racing with concurrent config
-/// changes.
+/// `true` and a previous cached entry exists with the same generation
+/// but a different digest, this indicates the config has changed
+/// (e.g. manual edit) and the generation must be bumped. Callers
+/// that set `update_cache = true` must hold the datastore config lock
+/// to avoid racing with concurrent config changes.
/// - If `update_cache` is `false`, the freshly read config is returned
-/// but the cache is left unchanged.
+/// but the cache and generation are left unchanged.
///
/// If `ConfigVersionCache` is not available, the config is always read
/// from disk and `None` is returned as the generation.
@@ -341,25 +348,41 @@ fn datastore_section_config_cached(
let mut config_cache = DATASTORE_CONFIG_CACHE.lock().unwrap();
if let Ok(version_cache) = ConfigVersionCache::new() {
+ let now = epoch_i64();
let current_gen = version_cache.datastore_generation();
if let Some(cached) = config_cache.as_ref() {
- // Fast path: re-use cached datastore.cfg
- if cached.last_generation == current_gen {
+ // Fast path: re-use cached datastore.cfg if generation matches and TTL not expired
+ if cached.last_generation == current_gen
+ && now - cached.last_update < DATASTORE_CONFIG_CACHE_TTL_SECS
+ {
return Ok((cached.config.clone(), Some(cached.last_generation)));
}
}
// Slow path: re-read datastore.cfg
- let (config_raw, _digest) = pbs_config::datastore::config()?;
+ let (config_raw, digest) = pbs_config::datastore::config()?;
let config = Arc::new(config_raw);
+ let mut effective_gen = current_gen;
if update_cache {
+ // Bump the generation if the config has been changed manually.
+ // This ensures that Drop handlers will detect that a newer config exists
+ // and will not rely on a stale cached entry for maintenance mandate.
+ if let Some(cached) = config_cache.as_ref() {
+ if cached.last_generation == current_gen && cached.digest != digest {
+ effective_gen = version_cache.increase_datastore_generation() + 1;
+ }
+ }
+
+ // Persist
*config_cache = Some(DatastoreConfigCache {
config: config.clone(),
- last_generation: current_gen,
+ digest,
+ last_generation: effective_gen,
+ last_update: now,
});
}
- Ok((config, Some(current_gen)))
+ Ok((config, Some(effective_gen)))
} else {
// Fallback path, no config version cache: read datastore.cfg and return None as generation
*config_cache = None;
--
2.47.3
^ permalink raw reply [relevance 13%]
* [pbs-devel] [PATCH proxmox-backup v6 0/4] datastore: remove config reload on hot path
@ 2026-01-05 14:16 12% Samuel Rufinatscha
2026-01-05 14:16 16% ` [pbs-devel] [PATCH proxmox-backup v6 1/4] config: enable config version cache for datastore Samuel Rufinatscha
` (4 more replies)
0 siblings, 5 replies; 200+ results
From: Samuel Rufinatscha @ 2026-01-05 14:16 UTC (permalink / raw)
To: pbs-devel
Hi,
this series reduces CPU time in datastore lookups by avoiding repeated
datastore.cfg reads/parses in both `lookup_datastore()` and
`DataStore::Drop`. It also adds a TTL so manual config edits are
noticed without reintroducing hashing on every request.
While investigating #6049 [1], cargo-flamegraph [2] showed hotspots
during repeated `/status` calls in `lookup_datastore()` and in `Drop`,
dominated by `pbs_config::datastore::config()` (config parse).
The parsing cost itself should eventually be investigated in a future
effort. Furthermore, cargo-flamegraph showed that when using a
token-based auth method to access the API, a significant amount of time
is spent in validation on every request [3].
## Approach
[PATCH 1/4] Support datastore generation in ConfigVersionCache
[PATCH 2/4] Fast path for datastore lookups
Cache the parsed datastore.cfg keyed by the shared datastore
generation. lookup_datastore() reuses both the cached config and an
existing DataStoreImpl when the generation matches, and falls back
to the old slow path otherwise. The caching logic is implemented
using the datastore_section_config_cached(update_cache: bool) helper.
[PATCH 3/4] Fast path for Drop
Make DataStore::Drop use the datastore_section_config_cached()
helper to avoid re-reading/parsing datastore.cfg on every Drop.
[PATCH 4/4] TTL to catch manual edits
Add a TTL to the cached config and bump the datastore generation iff
the digest changed but generation stays the same. This catches manual
edits to datastore.cfg without reintroducing hashing or config
parsing on every request.
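The three patches combine into one freshness rule: reuse the cached
parse only while the shared generation matches and the TTL has not
expired. A self-contained model of that rule (all names are stand-ins
for the PBS types, e.g. the atomic models the shared-memory
ConfigVersionCache counter):

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Models the shared datastore generation counter; bumped on config saves
// and on detected manual edits.
static GENERATION: AtomicUsize = AtomicUsize::new(0);

struct CachedConfig {
    generation: usize, // generation observed when the entry was cached
    last_update: i64,  // epoch seconds when the entry was cached
}

const TTL_SECS: i64 = 60;

// Fast path condition: generation unchanged and entry younger than TTL.
fn is_fresh(cache: &CachedConfig, now: i64) -> bool {
    cache.generation == GENERATION.load(Ordering::Acquire)
        && now - cache.last_update < TTL_SECS
}

fn main() {
    let cached = CachedConfig { generation: 0, last_update: 100 };
    assert!(is_fresh(&cached, 130)); // same generation, within TTL
    assert!(!is_fresh(&cached, 200)); // TTL expired -> slow path re-read
    GENERATION.fetch_add(1, Ordering::AcqRel);
    assert!(!is_fresh(&cached, 130)); // generation bumped by a config save
    println!("ok");
}
```

Stale entries are never served beyond the TTL, so even edits that bypass
the generation counter are picked up within at most 60 seconds.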
## Benchmark results
### End-to-end
Testing `/status?verbose=0` end-to-end with 1000 stores, 5 req/store
and parallel=16 before/after the series:
Metric Before After
----------------------------------------
Total time 12s 9s
Throughput (all) 416.67 555.56
Cold RPS (round #1) 83.33 111.11
Warm RPS (#2..N) 333.33 444.44
Running under flamegraph [2], TLS appears to consume a significant
amount of CPU time and blur the results. Still, a ~33% higher overall
throughput and ~25% less end-to-end time for this workload.
### Isolated benchmarks (hyperfine)
In addition to the end-to-end tests, I measured two standalone
benchmarks with hyperfine, each using a config with 1000 datastores.
`M` is the number of distinct datastores looked up and
`N` is the number of lookups per datastore.
Drop-direct variant:
Drops the `DataStore` after every lookup, so the `Drop` path runs on
every iteration:
use anyhow::Error;
use pbs_api_types::Operation;
use pbs_datastore::DataStore;
fn main() -> Result<(), Error> {
let mut args = std::env::args();
args.next();
let datastores = if let Some(n) = args.next() {
n.parse::<usize>()?
} else {
1000
};
let iterations = if let Some(n) = args.next() {
n.parse::<usize>()?
} else {
1000
};
for d in 1..=datastores {
let name = format!("ds{:04}", d);
for i in 1..=iterations {
DataStore::lookup_datastore(&name, Some(Operation::Write))?;
}
}
Ok(())
}
+----+------+-----------+-----------+---------+
| M | N | Baseline | Patched | Speedup |
+----+------+-----------+-----------+---------+
| 1 | 1000 | 1.684 s | 35.3 ms | 47.7x |
| 10 | 100 | 1.689 s | 35.0 ms | 48.3x |
| 100| 10 | 1.709 s | 35.8 ms | 47.7x |
|1000| 1 | 1.809 s | 39.0 ms | 46.4x |
+----+------+-----------+-----------+---------+
Bulk-drop variant:
Keeps the `DataStore` instances alive for
all `N` lookups of a given datastore and then drops them in bulk,
mimicking a task that performs many lookups while it is running and
only triggers the expensive `Drop` logic when the last user exits.
use anyhow::Error;
use pbs_api_types::Operation;
use pbs_datastore::DataStore;
fn main() -> Result<(), Error> {
let mut args = std::env::args();
args.next();
let datastores = if let Some(n) = args.next() {
n.parse::<usize>()?
} else {
1000
};
let iterations = if let Some(n) = args.next() {
n.parse::<usize>()?
} else {
1000
};
for d in 1..=datastores {
let name = format!("ds{:04}", d);
let mut stores = Vec::with_capacity(iterations);
for i in 1..=iterations {
stores.push(DataStore::lookup_datastore(&name, Some(Operation::Write))?);
}
}
Ok(())
}
+------+------+---------------+--------------+---------+
| M | N | Baseline mean | Patched mean | Speedup |
+------+------+---------------+--------------+---------+
| 1 | 1000 | 890.6 ms | 35.5 ms | 25.1x |
| 10 | 100 | 891.3 ms | 35.1 ms | 25.4x |
| 100 | 10 | 983.9 ms | 35.6 ms | 27.6x |
| 1000 | 1 | 1829.0 ms | 45.2 ms | 40.5x |
+------+------+---------------+--------------+---------+
Both variants show that the combination of the cached config lookups
and the cheaper `Drop` handling reduces the hot-path cost from ~1.8 s
per run to a few tens of milliseconds in these benchmarks.
## Reproduction steps
VM: 4 vCPU, ~8 GiB RAM, VirtIO-SCSI; disks:
- scsi0 32G (OS)
- scsi1 1000G (datastores)
Install PBS from ISO on the VM.
Set up ZFS on /dev/sdb (adjust if different):
zpool create -f -o ashift=12 pbsbench /dev/sdb
zfs set mountpoint=/pbsbench pbsbench
zfs create pbsbench/pbs-bench
Raise file-descriptor limit:
sudo systemctl edit proxmox-backup-proxy.service
Add the following lines:
[Service]
LimitNOFILE=1048576
Reload systemd and restart the proxy:
sudo systemctl daemon-reload
sudo systemctl restart proxmox-backup-proxy.service
Verify the limit:
systemctl show proxmox-backup-proxy.service | grep LimitNOFILE
Create 1000 ZFS-backed datastores (as used in #6049 [1]):
seq -w 001 1000 | xargs -n1 -P1 bash -c '
id=$0
name="ds${id}"
dataset="pbsbench/pbs-bench/${name}"
path="/pbsbench/pbs-bench/${name}"
zfs create -o mountpoint="$path" "$dataset"
proxmox-backup-manager datastore create "$name" "$path" \
--comment "ZFS dataset-based datastore"
'
Build PBS from this series, then run the server manually under
flamegraph:
systemctl stop proxmox-backup-proxy
cargo flamegraph --release --bin proxmox-backup-proxy
## Patch summary
[PATCH 1/4] config: enable config version cache for datastore
[PATCH 2/4] partial fix #6049: datastore: impl ConfigVersionCache fast path for lookups
[PATCH 3/4] partial fix #6049: datastore: use config fast-path in Drop
[PATCH 4/4] partial fix #6049: datastore: add TTL fallback to catch manual config edits
## Changes
Please refer to the per-patch changelogs.
## Maintainer notes
No dependency bumps, no API changes and no breaking changes.
Kind regards,
Samuel
Links
[1] Bugzilla #6049: https://bugzilla.proxmox.com/show_bug.cgi?id=6049
[2] cargo-flamegraph: https://github.com/flamegraph-rs/flamegraph
[3] Bugzilla #7017: https://bugzilla.proxmox.com/show_bug.cgi?id=7017
Samuel Rufinatscha (4):
config: enable config version cache for datastore
partial fix #6049: datastore: impl ConfigVersionCache fast path for
lookups
partial fix #6049: datastore: use config fast-path in Drop
partial fix #6049: datastore: add TTL fallback to catch manual config
edits
pbs-config/src/config_version_cache.rs | 10 +-
pbs-datastore/Cargo.toml | 1 +
pbs-datastore/src/datastore.rs | 148 +++++++++++++++++++++----
3 files changed, 135 insertions(+), 24 deletions(-)
--
2.47.3
^ permalink raw reply [relevance 12%]
* [pbs-devel] [PATCH proxmox-backup v6 1/4] config: enable config version cache for datastore
2026-01-05 14:16 12% [pbs-devel] [PATCH proxmox-backup v6 0/4] datastore: remove config reload on hot path Samuel Rufinatscha
@ 2026-01-05 14:16 16% ` Samuel Rufinatscha
2026-01-05 14:16 11% ` [pbs-devel] [PATCH proxmox-backup v6 2/4] partial fix #6049: datastore: impl ConfigVersionCache fast path for lookups Samuel Rufinatscha
` (3 subsequent siblings)
4 siblings, 0 replies; 200+ results
From: Samuel Rufinatscha @ 2026-01-05 14:16 UTC (permalink / raw)
To: pbs-devel
Repeated /status requests caused lookup_datastore() to re-read and
parse datastore.cfg on every call. The issue was mentioned in report
#6049 [1]. cargo-flamegraph [2] confirmed that the hot path is
dominated by pbs_config::datastore::config() (config parsing).
To solve the issue, this patch prepares the config version cache,
so that datastore config caching can be built on top of it.
This patch specifically:
(1) adds a getter for the datastore generation number, so callers can
detect invalidated generations
(2) removes obsolete comments
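The getter/increment pair on the generation counter can be pictured with a small self-contained model. Note this is only a sketch with a plain process-local atomic standing in for the shared-memory counter used by `ConfigVersionCache`; the struct name here is illustrative:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Process-local stand-in for the shared-memory generation counter.
struct VersionCache {
    datastore_generation: AtomicUsize,
}

impl VersionCache {
    /// Returns the current datastore generation number.
    fn datastore_generation(&self) -> usize {
        self.datastore_generation.load(Ordering::Acquire)
    }

    /// Bumps the datastore generation number, invalidating cached configs.
    fn increase_datastore_generation(&self) -> usize {
        self.datastore_generation.fetch_add(1, Ordering::AcqRel)
    }
}

fn main() {
    let cache = VersionCache {
        datastore_generation: AtomicUsize::new(0),
    };
    let before = cache.datastore_generation();
    cache.increase_datastore_generation();
    assert_eq!(cache.datastore_generation(), before + 1);
    println!("generation: {}", cache.datastore_generation());
}
```

A consumer compares a remembered generation against the getter's current value; any mismatch means the config changed since it was cached.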
Links
[1] Bugzilla: https://bugzilla.proxmox.com/show_bug.cgi?id=6049
[2] cargo-flamegraph: https://github.com/flamegraph-rs/flamegraph
Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
---
Changes:
From v1 → v2 (original introduction), thanks @Fabian
- Split the ConfigVersionCache changes out of the large datastore patch
into their own config-only patch
From v2 → v3
No changes
From v3 → v4
No changes
From v4 → v5
- Rebased only, no changes
From v5 → v6
- Rebased
- Removed "partial-fix" prefix from subject, thanks @Fabian
pbs-config/src/config_version_cache.rs | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)
diff --git a/pbs-config/src/config_version_cache.rs b/pbs-config/src/config_version_cache.rs
index e8fb994f..b875f7e0 100644
--- a/pbs-config/src/config_version_cache.rs
+++ b/pbs-config/src/config_version_cache.rs
@@ -26,7 +26,6 @@ struct ConfigVersionCacheDataInner {
// Traffic control (traffic-control.cfg) generation/version.
traffic_control_generation: AtomicUsize,
// datastore (datastore.cfg) generation/version
- // FIXME: remove with PBS 3.0
datastore_generation: AtomicUsize,
// Add further atomics here
}
@@ -145,8 +144,15 @@ impl ConfigVersionCache {
.fetch_add(1, Ordering::AcqRel);
}
+ /// Returns the datastore generation number.
+ pub fn datastore_generation(&self) -> usize {
+ self.shmem
+ .data()
+ .datastore_generation
+ .load(Ordering::Acquire)
+ }
+
/// Increase the datastore generation number.
- // FIXME: remove with PBS 3.0 or make actually useful again in datastore lookup
pub fn increase_datastore_generation(&self) -> usize {
self.shmem
.data()
--
2.47.3
* [pbs-devel] [PATCH proxmox-backup v6 2/4] partial fix #6049: datastore: impl ConfigVersionCache fast path for lookups
2026-01-05 14:16 12% [pbs-devel] [PATCH proxmox-backup v6 0/4] datastore: remove config reload on hot path Samuel Rufinatscha
2026-01-05 14:16 16% ` [pbs-devel] [PATCH proxmox-backup v6 1/4] config: enable config version cache for datastore Samuel Rufinatscha
@ 2026-01-05 14:16 11% ` Samuel Rufinatscha
2026-01-05 14:16 15% ` [pbs-devel] [PATCH proxmox-backup v6 3/4] partial fix #6049: datastore: use config fast-path in Drop Samuel Rufinatscha
` (2 subsequent siblings)
4 siblings, 0 replies; 200+ results
From: Samuel Rufinatscha @ 2026-01-05 14:16 UTC (permalink / raw)
To: pbs-devel
Repeated /status requests caused lookup_datastore() to re-read and
parse datastore.cfg on every call. The issue was mentioned in report
#6049 [1]. cargo-flamegraph [2] confirmed that the hot path is
dominated by pbs_config::datastore::config() (config parsing).
This patch implements caching of the global datastore.cfg using the
generation numbers from the shared config version cache. It caches the
datastore.cfg along with the generation number and, when a subsequent
lookup sees the same generation, it reuses the cached config without
re-reading it from disk. If the generation differs
(or the cache is unavailable), the config is re-read from disk.
If `update_cache = true`, the new config and current generation are
persisted in the cache. In this case, callers must hold the datastore
config lock to avoid racing with concurrent config changes.
If `update_cache` is `false` and generation did not match, the freshly
read config is returned but the cache is left unchanged. If
`ConfigVersionCache` is not available, the config is always read from
disk and `None` is returned as generation.
Behavioral notes
- The generation is bumped via the existing save_config() path, so
API-driven config changes are detected immediately.
- Manual edits to datastore.cfg are not detected; this is covered in a
dedicated patch in this series.
- DataStore::drop still performs a config read on the common path;
also covered in a dedicated patch in this series.
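The generation-checked lookup can be modeled with a small self-contained sketch. The in-memory stand-ins below (`CACHE`, `GENERATION`, `read_from_disk`) are illustrative only; the real code uses the `DATASTORE_CONFIG_CACHE` static and the shared-memory `ConfigVersionCache`:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Mutex};

// Hypothetical stand-in for the parsed datastore.cfg section config.
type Config = String;

struct Cached {
    config: Arc<Config>,
    last_generation: usize,
}

// Simplified stand-ins for the config cache and the generation counter.
static CACHE: Mutex<Option<Cached>> = Mutex::new(None);
static GENERATION: AtomicUsize = AtomicUsize::new(0);
static DISK_READS: AtomicUsize = AtomicUsize::new(0);

// Simulated slow path: "read" and parse the config from disk.
fn read_from_disk() -> Config {
    DISK_READS.fetch_add(1, Ordering::SeqCst);
    "parsed datastore.cfg".to_string()
}

// Mirrors the shape of datastore_section_config_cached(): reuse the
// cached config while the generation matches, otherwise re-read and
// optionally refresh the cache.
fn config_cached(update_cache: bool) -> (Arc<Config>, usize) {
    let mut cache = CACHE.lock().unwrap();
    let current_gen = GENERATION.load(Ordering::Acquire);
    if let Some(cached) = cache.as_ref() {
        if cached.last_generation == current_gen {
            return (cached.config.clone(), current_gen); // fast path
        }
    }
    let config = Arc::new(read_from_disk()); // slow path
    if update_cache {
        *cache = Some(Cached {
            config: config.clone(),
            last_generation: current_gen,
        });
    }
    (config, current_gen)
}

fn main() {
    config_cached(true);
    config_cached(true); // second lookup is served from the cache
    assert_eq!(DISK_READS.load(Ordering::SeqCst), 1);
    GENERATION.fetch_add(1, Ordering::AcqRel); // config saved via the API
    config_cached(true); // stale generation -> re-read from disk
    assert_eq!(DISK_READS.load(Ordering::SeqCst), 2);
    println!("disk reads: {}", DISK_READS.load(Ordering::SeqCst));
}
```

Only writers holding the config lock pass `update_cache = true`, which is why readers such as Drop can safely skip the cache update.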
Links
[1] Bugzilla: https://bugzilla.proxmox.com/show_bug.cgi?id=6049
[2] cargo-flamegraph: https://github.com/flamegraph-rs/flamegraph
Fixes: #6049
Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
---
Changes:
From v1 → v2
- Moved the ConfigVersionCache changes into its own patch,
thanks @Fabian
- Introduced the global static DATASTORE_CONFIG_CACHE to store the
fully parsed datastore.cfg instead, along with its generation number.
thanks @Fabian
- Introduced DatastoreConfigCache struct to hold cache values
- Removed and replaced the CachedDatastoreConfigTag field of
DataStoreImpl with a generation number field only (Option<usize>)
to validate DataStoreImpl reuse.
- Added DataStore::datastore_section_config_cached() helper function
to encapsulate the caching logic and simplify reuse.
- Modified DataStore::lookup_datastore() to use the new helper.
From v2 → v3
No changes
From v3 → v4, thanks @Fabian
- Restructured the version cache checks in
datastore_section_config_cached(), to simplify the logic.
- Added update_cache parameter to datastore_section_config_cached() to
control cache updates.
From v4 → v5
- Rebased only, no changes
From v5 → v6
- Rebased
- Styling: minimize/avoid diff noise, thanks @Fabian
pbs-datastore/Cargo.toml | 1 +
pbs-datastore/src/datastore.rs | 90 ++++++++++++++++++++++++++++------
2 files changed, 77 insertions(+), 14 deletions(-)
diff --git a/pbs-datastore/Cargo.toml b/pbs-datastore/Cargo.toml
index 8ce930a9..42f49a7b 100644
--- a/pbs-datastore/Cargo.toml
+++ b/pbs-datastore/Cargo.toml
@@ -40,6 +40,7 @@ proxmox-io.workspace = true
proxmox-lang.workspace=true
proxmox-s3-client = { workspace = true, features = [ "impl" ] }
proxmox-schema = { workspace = true, features = [ "api-macro" ] }
+proxmox-section-config.workspace = true
proxmox-serde = { workspace = true, features = [ "serde_json" ] }
proxmox-sys.workspace = true
proxmox-systemd.workspace = true
diff --git a/pbs-datastore/src/datastore.rs b/pbs-datastore/src/datastore.rs
index 9c57aaac..aa366826 100644
--- a/pbs-datastore/src/datastore.rs
+++ b/pbs-datastore/src/datastore.rs
@@ -34,7 +34,8 @@ use pbs_api_types::{
MaintenanceType, Operation, UPID,
};
use pbs_config::s3::S3_CFG_TYPE_ID;
-use pbs_config::BackupLockGuard;
+use pbs_config::{BackupLockGuard, ConfigVersionCache};
+use proxmox_section_config::SectionConfigData;
use crate::backup_info::{
BackupDir, BackupGroup, BackupInfo, OLD_LOCKING, PROTECTED_MARKER_FILENAME,
@@ -48,6 +49,17 @@ use crate::s3::S3_CONTENT_PREFIX;
use crate::task_tracking::{self, update_active_operations};
use crate::{DataBlob, LocalDatastoreLruCache};
+// Cache for fully parsed datastore.cfg
+struct DatastoreConfigCache {
+ // Parsed datastore.cfg file
+ config: Arc<SectionConfigData>,
+ // Generation number from ConfigVersionCache
+ last_generation: usize,
+}
+
+static DATASTORE_CONFIG_CACHE: LazyLock<Mutex<Option<DatastoreConfigCache>>> =
+ LazyLock::new(|| Mutex::new(None));
+
static DATASTORE_MAP: LazyLock<Mutex<HashMap<String, Arc<DataStoreImpl>>>> =
LazyLock::new(|| Mutex::new(HashMap::new()));
@@ -149,11 +161,13 @@ pub struct DataStoreImpl {
last_gc_status: Mutex<GarbageCollectionStatus>,
verify_new: bool,
chunk_order: ChunkOrder,
- last_digest: Option<[u8; 32]>,
sync_level: DatastoreFSyncLevel,
backend_config: DatastoreBackendConfig,
lru_store_caching: Option<LocalDatastoreLruCache>,
thread_settings: DatastoreThreadSettings,
+ /// datastore.cfg cache generation number at lookup time, used to
+ /// invalidate this cached `DataStoreImpl`
+ config_generation: Option<usize>,
}
impl DataStoreImpl {
@@ -166,11 +180,11 @@ impl DataStoreImpl {
last_gc_status: Mutex::new(GarbageCollectionStatus::default()),
verify_new: false,
chunk_order: Default::default(),
- last_digest: None,
sync_level: Default::default(),
backend_config: Default::default(),
lru_store_caching: None,
thread_settings: Default::default(),
+ config_generation: None,
})
}
}
@@ -286,6 +300,55 @@ impl DatastoreThreadSettings {
}
}
+/// Returns the parsed datastore config (`datastore.cfg`) and its
+/// generation.
+///
+/// Uses `ConfigVersionCache` to detect stale entries:
+/// - If the cached generation matches the current generation, the
+/// cached config is returned.
+/// - Otherwise the config is re-read from disk. If `update_cache` is
+/// `true`, the new config and current generation are stored in the
+/// cache. Callers that set `update_cache = true` must hold the
+/// datastore config lock to avoid racing with concurrent config
+/// changes.
+/// - If `update_cache` is `false`, the freshly read config is returned
+/// but the cache is left unchanged.
+///
+/// If `ConfigVersionCache` is not available, the config is always read
+/// from disk and `None` is returned as the generation.
+fn datastore_section_config_cached(
+ update_cache: bool,
+) -> Result<(Arc<SectionConfigData>, Option<usize>), Error> {
+ let mut config_cache = DATASTORE_CONFIG_CACHE.lock().unwrap();
+
+ if let Ok(version_cache) = ConfigVersionCache::new() {
+ let current_gen = version_cache.datastore_generation();
+ if let Some(cached) = config_cache.as_ref() {
+ // Fast path: re-use cached datastore.cfg
+ if cached.last_generation == current_gen {
+ return Ok((cached.config.clone(), Some(cached.last_generation)));
+ }
+ }
+ // Slow path: re-read datastore.cfg
+ let (config_raw, _digest) = pbs_config::datastore::config()?;
+ let config = Arc::new(config_raw);
+
+ if update_cache {
+ *config_cache = Some(DatastoreConfigCache {
+ config: config.clone(),
+ last_generation: current_gen,
+ });
+ }
+
+ Ok((config, Some(current_gen)))
+ } else {
+ // Fallback path, no config version cache: read datastore.cfg and return None as generation
+ *config_cache = None;
+ let (config_raw, _digest) = pbs_config::datastore::config()?;
+ Ok((Arc::new(config_raw), None))
+ }
+}
+
impl DataStore {
// This one just panics on everything
#[doc(hidden)]
@@ -367,10 +430,9 @@ impl DataStore {
// we use it to decide whether it is okay to delete the datastore.
let _config_lock = pbs_config::datastore::lock_config()?;
- // we could use the ConfigVersionCache's generation for staleness detection, but we load
- // the config anyway -> just use digest, additional benefit: manual changes get detected
- let (config, digest) = pbs_config::datastore::config()?;
- let config: DataStoreConfig = config.lookup("datastore", name)?;
+ // Get the current datastore.cfg generation number and cached config
+ let (section_config, gen_num) = datastore_section_config_cached(true)?;
+ let config: DataStoreConfig = section_config.lookup("datastore", name)?;
if let Some(maintenance_mode) = config.get_maintenance_mode() {
if let Err(error) = maintenance_mode.check(operation) {
@@ -378,19 +440,19 @@ impl DataStore {
}
}
+ let mut datastore_cache = DATASTORE_MAP.lock().unwrap();
+
if get_datastore_mount_status(&config) == Some(false) {
- let mut datastore_cache = DATASTORE_MAP.lock().unwrap();
datastore_cache.remove(&config.name);
bail!("datastore '{}' is not mounted", config.name);
}
- let mut datastore_cache = DATASTORE_MAP.lock().unwrap();
let entry = datastore_cache.get(name);
// reuse chunk store so that we keep using the same process locker instance!
let chunk_store = if let Some(datastore) = &entry {
- let last_digest = datastore.last_digest.as_ref();
- if let Some(true) = last_digest.map(|last_digest| last_digest == &digest) {
+ // Re-use DataStoreImpl
+ if datastore.config_generation == gen_num && gen_num.is_some() {
if let Some(operation) = operation {
update_active_operations(name, operation, 1)?;
}
@@ -412,7 +474,7 @@ impl DataStore {
)?)
};
- let datastore = DataStore::with_store_and_config(chunk_store, config, Some(digest))?;
+ let datastore = DataStore::with_store_and_config(chunk_store, config, gen_num)?;
let datastore = Arc::new(datastore);
datastore_cache.insert(name.to_string(), datastore.clone());
@@ -514,7 +576,7 @@ impl DataStore {
fn with_store_and_config(
chunk_store: Arc<ChunkStore>,
config: DataStoreConfig,
- last_digest: Option<[u8; 32]>,
+ generation: Option<usize>,
) -> Result<DataStoreImpl, Error> {
let mut gc_status_path = chunk_store.base_path();
gc_status_path.push(".gc-status");
@@ -579,11 +641,11 @@ impl DataStore {
last_gc_status: Mutex::new(gc_status),
verify_new: config.verify_new.unwrap_or(false),
chunk_order: tuning.chunk_order.unwrap_or_default(),
- last_digest,
sync_level: tuning.sync_level.unwrap_or_default(),
backend_config,
lru_store_caching,
thread_settings,
+ config_generation: generation,
})
}
--
2.47.3
* [pbs-devel] [PATCH proxmox-backup v6 3/4] partial fix #6049: datastore: use config fast-path in Drop
2026-01-05 14:16 12% [pbs-devel] [PATCH proxmox-backup v6 0/4] datastore: remove config reload on hot path Samuel Rufinatscha
2026-01-05 14:16 16% ` [pbs-devel] [PATCH proxmox-backup v6 1/4] config: enable config version cache for datastore Samuel Rufinatscha
2026-01-05 14:16 11% ` [pbs-devel] [PATCH proxmox-backup v6 2/4] partial fix #6049: datastore: impl ConfigVersionCache fast path for lookups Samuel Rufinatscha
@ 2026-01-05 14:16 15% ` Samuel Rufinatscha
2026-01-05 14:16 13% ` [pbs-devel] [PATCH proxmox-backup v6 4/4] partial fix #6049: datastore: add TTL fallback to catch manual config edits Samuel Rufinatscha
2026-01-14 9:54 5% ` [pbs-devel] applied-series: [PATCH proxmox-backup v6 0/4] datastore: remove config reload on hot path Fabian Grünbichler
4 siblings, 0 replies; 200+ results
From: Samuel Rufinatscha @ 2026-01-05 14:16 UTC (permalink / raw)
To: pbs-devel
The Drop impl of DataStore re-read datastore.cfg to decide whether
the entry should be evicted from the in-process cache (based on
maintenance mode’s clear_from_cache). During the investigation of
issue #6049 [1], a flamegraph [2] showed that the config reload in Drop
accounted for a measurable share of CPU time under load.
This patch wires the datastore config fast path to the Drop
impl to eventually avoid an expensive config reload from disk to capture
the maintenance mandate.
Behavioral notes
- Drop no longer silently ignores config/lookup failures: failures to
load/parse datastore.cfg are logged at WARN level
- If the datastore no longer exists in datastore.cfg when the last
handle is dropped, the cached instance is evicted from DATASTORE_MAP
if available (without checking maintenance mode).
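The resulting eviction decision can be modeled as a small pure function. This is a sketch of the behavior described above; the enum and names are illustrative, not the actual code:

```rust
// Possible outcomes of looking up the datastore in datastore.cfg.
#[derive(Clone, Copy)]
enum Lookup {
    Found { clear_from_cache: bool }, // present; maintenance mode known
    Missing,                          // removed from datastore.cfg
    ConfigError,                      // datastore.cfg could not be loaded
}

// Decide whether the cached DataStoreImpl should be evicted on Drop.
fn remove_from_cache(last_task: bool, lookup: Lookup) -> bool {
    if !last_task {
        return false; // other tasks still reference the datastore
    }
    match lookup {
        Lookup::Found { clear_from_cache } => clear_from_cache,
        Lookup::Missing => true,      // evict stale entry (logged at WARN)
        Lookup::ConfigError => false, // play safe, keep cached instance
    }
}

fn main() {
    assert!(!remove_from_cache(false, Lookup::Missing));
    assert!(remove_from_cache(true, Lookup::Found { clear_from_cache: true }));
    assert!(!remove_from_cache(true, Lookup::ConfigError));
    assert!(remove_from_cache(true, Lookup::Missing));
    println!("ok");
}
```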
Links
[1] Bugzilla: https://bugzilla.proxmox.com/show_bug.cgi?id=6049
[2] cargo-flamegraph: https://github.com/flamegraph-rs/flamegraph
Fixes: #6049
Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
---
Changes:
From v1 → v2
- Replace caching logic with the datastore_section_config_cached()
helper.
From v2 → v3
No changes
From v3 → v4, thanks @Fabian
- Pass datastore_section_config_cached(false) in Drop to avoid
concurrent cache updates.
From v4 → v5
- Rebased only, no changes
From v5 → v6
- Rebased
- Styling: restructured cache eviction condition
- Drop impl: log cache-related failures to load/parse datastore.cfg at
WARN level instead of ERROR
- Note logging change in the patch message, thanks @Fabian
- Remove cached entry from DATASTORE_MAP (if available) if datastore no
longer exists in datastore.cfg when the last handle is dropped,
thanks @Fabian
- Removed slow-path generation bumping in
datastore_section_config_cached, since API changes already
bump the generation on config save. Moved to subsequent patch,
relevant for TTL-based mechanism to bump on non-API edits, thanks @Fabian
pbs-datastore/src/datastore.rs | 35 ++++++++++++++++++++++++++--------
1 file changed, 27 insertions(+), 8 deletions(-)
diff --git a/pbs-datastore/src/datastore.rs b/pbs-datastore/src/datastore.rs
index aa366826..8adb0e3b 100644
--- a/pbs-datastore/src/datastore.rs
+++ b/pbs-datastore/src/datastore.rs
@@ -224,14 +224,33 @@ impl Drop for DataStore {
// remove datastore from cache iff
// - last task finished, and
- // - datastore is in a maintenance mode that mandates it
- let remove_from_cache = last_task
- && pbs_config::datastore::config()
- .and_then(|(s, _)| s.lookup::<DataStoreConfig>("datastore", self.name()))
- .is_ok_and(|c| {
- c.get_maintenance_mode()
- .is_some_and(|m| m.clear_from_cache())
- });
+ // - datastore is in a maintenance mode that mandates it, or the datastore was removed from datastore.cfg
+
+ // first check: check if last task finished
+ if !last_task {
+ return;
+ }
+
+ // determine whether we should evict from DATASTORE_MAP.
+ let remove_from_cache = match datastore_section_config_cached(false) {
+ Ok((section_config, _gen)) => {
+ match section_config.lookup::<DataStoreConfig>("datastore", self.name()) {
+ // second check: check if maintenance mode requires closing FDs
+ Ok(config) => config
+ .get_maintenance_mode()
+ .is_some_and(|m| m.clear_from_cache()),
+ Err(err) => {
+ // datastore removed from config; evict cached entry if available (without checking maintenance mode)
+ log::warn!("DataStore::drop: datastore '{}' missing from datastore.cfg; evicting cached instance: {err}", self.name());
+ true
+ }
+ }
+ }
+ Err(err) => {
+ log::warn!("DataStore::drop: failed to load datastore.cfg for '{}'; skipping cache-eviction: {err}", self.name());
+ false
+ }
+ };
if remove_from_cache {
DATASTORE_MAP.lock().unwrap().remove(self.name());
--
2.47.3
* [pbs-devel] superseded: [PATCH proxmox{-backup, , -datacenter-manager} v2 0/7] token-shadow: reduce api token verification overhead
2025-12-17 16:25 14% [pbs-devel] [PATCH proxmox{-backup, , -datacenter-manager} v2 0/7] token-shadow: reduce api token " Samuel Rufinatscha
` (7 preceding siblings ...)
2025-12-18 11:03 12% ` [pbs-devel] [PATCH proxmox{-backup, , -datacenter-manager} v2 0/7] token-shadow: reduce api token verification overhead Samuel Rufinatscha
@ 2026-01-02 16:09 13% ` Samuel Rufinatscha
8 siblings, 0 replies; 200+ results
From: Samuel Rufinatscha @ 2026-01-02 16:09 UTC (permalink / raw)
To: pbs-devel
https://lore.proxmox.com/pbs-devel/20260102160750.285157-1-s.rufinatscha@proxmox.com/T/#t
On 12/17/25 5:25 PM, Samuel Rufinatscha wrote:
> Hi,
>
> this series improves the performance of token-based API authentication
> in PBS (pbs-config) and in PDM (underlying proxmox-access-control
> crate), addressing the API token verification hotspot reported in our
> bugtracker #7017 [1].
>
> When profiling PBS /status endpoint with cargo flamegraph [2],
> token-based authentication showed up as a dominant hotspot via
> proxmox_sys::crypt::verify_crypt_pw. Applying this series removes that
> path from the hot section of the flamegraph. The same performance issue
> was measured [2] for PDM. PDM uses the underlying shared
> proxmox-access-control library for token handling, which is a
> factored out version of the token.shadow handling code from PBS.
>
> While this series fixes the immediate performance issue both in PBS
> (pbs-config) and in the shared proxmox-access-control crate used by
> PDM, PBS should eventually, ideally be refactored, in a separate
> effort, to use proxmox-access-control for token handling instead of its
> local implementation.
>
> Problem
>
> For token-based API requests, both PBS’s pbs-config token.shadow
> handling and PDM proxmox-access-control’s token.shadow handling
> currently:
>
> 1. read the token.shadow file on each request
> 2. deserialize it into a HashMap<Authid, String>
> 3. run password hash verification via
> proxmox_sys::crypt::verify_crypt_pw for the provided token secret
>
> Under load, this results in significant CPU usage spent in repeated
> password hash computations for the same token+secret pairs. The
> attached flamegraphs for PBS [2] and PDM [3] show
> proxmox_sys::crypt::verify_crypt_pw dominating the hot path.
>
> Approach
>
> The goal is to reduce the cost of token-based authentication preserving
> the existing token handling semantics (including detecting manual edits
> to token.shadow) and be consistent between PBS (pbs-config) and
> PDM (proxmox-access-control). For both sites, the series proposes
> following approach:
>
> 1. Introduce an in-memory cache for verified token secrets
> 2. Invalidate the cache when token.shadow changes (detect manual edits)
> 3. Control metadata checks with a TTL window
>
> Testing
>
> *PBS (pbs-config)*
>
> To verify the effect in PBS, I:
> 1. Set up test environment based on latest PBS ISO, installed Rust
> toolchain, cloned proxmox-backup repository to use with cargo
> flamegraph. Reproduced bug #6049 [1] by profiling the /status
> endpoint with token-based authentication using cargo flamegraph [2].
> The flamegraph showed proxmox_sys::crypt::verify_crypt_pw is the
> hotspot.
> 2. Built PBS with pbs-config patches and re-ran the same workload and
> profiling setup.
> 3. Confirmed that the proxmox_sys::crypt::verify_crypt_pw path no
> longer appears in the hot section of the flamegraph. CPU usage is
> now dominated by TLS overhead.
> 4. Functionally verified that:
> * token-based API authentication still works for valid tokens
> * invalid secrets are rejected as before
> * generating a new token secret via dashboard works and
> authenticates correctly
>
> *PDM (proxmox-access-control)*
>
> To verify the effect in PDM, I followed a similar testing approach.
> Instead of /status, I profiled the /version endpoint with cargo
> flamegraph [2] and verified that the token hashing path disappears [4]
> from the hot section after applying the proxmox-access-control patches.
>
> Functionally I verified that:
> * token-based API authentication still works for valid tokens
> * invalid secrets are rejected as before
> * generating a new token secret via dashboard works and
> authenticates correctly
>
> Benchmarks:
>
> Two different benchmarks have been run to measure caching effects
> and RwLock contention:
>
> (1) Requests per second for PBS /status endpoint (E2E)
> (2) RwLock contention for token create/delete under
> heavy parallel token-authenticated readers; compared
> std::sync::RwLock and parking_lot::RwLock.
>
> (1) benchmarked parallel token auth requests for
> /status?verbose=0 on top of the datastore lookup cache series [5]
> to check throughput impact. With datastores=1, repeat=5000, parallel=16
> this series gives ~179 req/s compared to ~65 req/s without it.
> This is a ~2.75x improvement.
>
> (2) benchmarked token create/delete operations under heavy load of
> token-authenticated requests on top of the datastore lookup cache [5]
> series. This benchmark was done using against a 64-parallel
> token-auth flood (200k requests) against
> /admin/datastore/ds0001/status?verbose=0 while executing 50 token
> create + 50 token delete operations. After the series I saw the
> following e2e API latencies:
>
> parking_lot::RwLock
> - create avg ~27ms (p95 ~28ms) vs ~46ms (p95 ~50ms) baseline
> - delete avg ~17ms (p95 ~19ms) vs ~33ms (p95 ~35ms) baseline
>
> std::sync::RwLock
> - create avg ~27ms (p95 ~28ms)
> - delete avg ~17ms (p95 ~19ms)
>
> It appears that both RwLock implementations perform similarly
> for this workload. The parking_lot version has been chosen for the
> added fairness guarantees.
>
> Patch summary
>
> pbs-config:
>
> 0001 – pbs-config: cache verified API token secrets
> Adds an in-memory cache keyed by Authid that stores plain text token
> secrets after a successful verification or generation and uses
> openssl’s memcmp constant-time for comparison.
>
> 0002 – pbs-config: invalidate token-secret cache on token.shadow
> changes
> Tracks token.shadow mtime and length and clears the in-memory
> cache when the file changes.
>
> 0003 – pbs-config: add TTL window to token-secret cache
> Introduces a TTL (TOKEN_SECRET_CACHE_TTL_SECS, default 60) for metadata
> checks so that fs::metadata is only called periodically.
>
> proxmox-access-control:
>
> 0004 – access-control: cache verified API token secrets
> Mirrors PBS PATCH 0001.
>
> 0005 – access-control: invalidate token-secret cache on token.shadow changes
> Mirrors PBS PATCH 0002.
>
> 0006 – access-control: add TTL window to token-secret cache
> Mirrors PBS PATCH 0003.
>
> proxmox-datacenter-manager:
>
> 0007 – docs: document API token-cache TTL effects
> Documents the effects of the TTL window on token.shadow edits
>
> Changes since v1
>
> - (refactor) Switched cache initialization to LazyLock
> - (perf) Use parking_lot::RwLock and best-effort cache access on the
> read/refresh path (try_read/try_write) to avoid lock contention
> - (doc) Document TTL-delayed effect of manual token.shadow edits
> - (fix) Add generation guards (API_MUTATION_GENERATION +
> FILE_GENERATION) to prevent caching across concurrent set/delete and
> external edits
>
> Please see the patch specific changelogs for more details.
>
> Thanks for considering this patch series, I look forward to your
> feedback.
>
> Best,
> Samuel Rufinatscha
>
> [1] https://bugzilla.proxmox.com/show_bug.cgi?id=7017
> [2] attachment 1767 [1]: Flamegraph showing the proxmox_sys::crypt::verify_crypt_pw stack
> [3] attachment 1794 [1]: Flamegraph PDM baseline
> [4] attachment 1795 [1]: Flamegraph PDM patched
> [5] https://bugzilla.proxmox.com/show_bug.cgi?id=6049
>
> proxmox-backup:
>
> Samuel Rufinatscha (3):
> pbs-config: cache verified API token secrets
> pbs-config: invalidate token-secret cache on token.shadow changes
> pbs-config: add TTL window to token secret cache
>
> Cargo.toml | 1 +
> docs/user-management.rst | 4 +
> pbs-config/Cargo.toml | 1 +
> pbs-config/src/token_shadow.rs | 238 ++++++++++++++++++++++++++++++++-
> 4 files changed, 243 insertions(+), 1 deletion(-)
>
>
> proxmox:
>
> Samuel Rufinatscha (3):
> proxmox-access-control: cache verified API token secrets
> proxmox-access-control: invalidate token-secret cache on token.shadow
> changes
> proxmox-access-control: add TTL window to token secret cache
>
> Cargo.toml | 1 +
> proxmox-access-control/Cargo.toml | 1 +
> proxmox-access-control/src/token_shadow.rs | 238 ++++++++++++++++++++-
> 3 files changed, 239 insertions(+), 1 deletion(-)
>
>
> proxmox-datacenter-manager:
>
> Samuel Rufinatscha (1):
> docs: document API token-cache TTL effects
>
> docs/access-control.rst | 3 +++
> 1 file changed, 3 insertions(+)
>
>
> Summary over all repositories:
> 8 files changed, 485 insertions(+), 2 deletions(-)
>
* [pbs-devel] [PATCH proxmox v3 3/4] proxmox-access-control: invalidate token-secret cache on token.shadow changes
2026-01-02 16:07 13% [pbs-devel] [PATCH proxmox{-backup, , -datacenter-manager} v3 00/10] " Samuel Rufinatscha
` (5 preceding siblings ...)
2026-01-02 16:07 12% ` [pbs-devel] [PATCH proxmox v3 2/4] proxmox-access-control: cache verified API token secrets Samuel Rufinatscha
@ 2026-01-02 16:07 12% ` Samuel Rufinatscha
2026-01-02 16:07 15% ` [pbs-devel] [PATCH proxmox v3 4/4] proxmox-access-control: add TTL window to token secret cache Samuel Rufinatscha
` (3 subsequent siblings)
10 siblings, 0 replies; 200+ results
From: Samuel Rufinatscha @ 2026-01-02 16:07 UTC (permalink / raw)
To: pbs-devel
Previously the in-memory token-secret cache was only updated via
set_secret() and delete_secret(), so manual edits to token.shadow were
not reflected.
This patch adds file change detection to the cache. It tracks the mtime
and length of token.shadow and clears the in-memory token secret cache
whenever these values change.
Note: this patch fetches file stats on every request. A TTL-based
optimization will be covered in a subsequent patch of the series.
This patch is part of the series which fixes bug #7017 [1].
[1] https://bugzilla.proxmox.com/show_bug.cgi?id=7017
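The (mtime, length) change detection can be sketched in isolation. The file name and helper below are hypothetical; the real implementation additionally tracks the shared generation and, in a later patch, a TTL window:

```rust
use std::fs;
use std::time::SystemTime;

// Hypothetical helper: compares the file's (mtime, len) against the last
// observed pair and updates it. A first observation counts as a change.
fn file_changed(
    path: &str,
    last: &mut Option<(SystemTime, u64)>,
) -> std::io::Result<bool> {
    let meta = fs::metadata(path)?;
    let current = (meta.modified()?, meta.len());
    let changed = match last {
        Some(previous) => *previous != current,
        None => true,
    };
    *last = Some(current);
    Ok(changed)
}

fn main() -> std::io::Result<()> {
    let path = std::env::temp_dir().join("token.shadow.example");
    let path = path.to_str().unwrap().to_string();
    let mut last = None;

    fs::write(&path, b"{}")?;
    assert!(file_changed(&path, &mut last)?); // first check: "changed"
    assert!(!file_changed(&path, &mut last)?); // no edit since last check
    fs::write(&path, b"{\"user@pbs!tok\":\"hash\"}")?;
    assert!(file_changed(&path, &mut last)?); // length changed -> detected
    println!("ok");
    Ok(())
}
```

One caveat this sketch shares with any stat-based scheme: an edit that preserves both length and mtime granularity would go unnoticed, which is why the series pairs it with generation bumps on API writes.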
Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
---
Changes from v1 to v2:
* Add file metadata tracking (file_mtime, file_len) and
FILE_GENERATION.
* Store file_gen in CachedSecret and verify it against the current
FILE_GENERATION to ensure cached entries belong to the current file
state.
* Add shadow_mtime_len() helper and convert refresh to best-effort
(try_write, returns bool).
* Pass a pre-write metadata snapshot into apply_api_mutation and
clear/bump generation if the cache metadata indicates missed external
edits.
Changes from v2 to v3:
* Cache now tracks last_checked (epoch seconds).
* Simplified refresh_cache_if_file_changed, removed
FILE_GENERATION logic
* On first load, initializes file metadata and keeps empty cache.
proxmox-access-control/src/token_shadow.rs | 129 ++++++++++++++++++++-
1 file changed, 123 insertions(+), 6 deletions(-)
diff --git a/proxmox-access-control/src/token_shadow.rs b/proxmox-access-control/src/token_shadow.rs
index 895309d2..f30c8ed5 100644
--- a/proxmox-access-control/src/token_shadow.rs
+++ b/proxmox-access-control/src/token_shadow.rs
@@ -1,5 +1,8 @@
use std::collections::HashMap;
+use std::fs;
+use std::io::ErrorKind;
use std::sync::LazyLock;
+use std::time::SystemTime;
use anyhow::{bail, format_err, Error};
use parking_lot::RwLock;
@@ -7,6 +10,7 @@ use serde_json::{from_value, Value};
use proxmox_auth_api::types::Authid;
use proxmox_product_config::{open_api_lockfile, replace_config, ApiLockGuard};
+use proxmox_time::epoch_i64;
use crate::init::access_conf;
use crate::init::impl_feature::{token_shadow, token_shadow_lock};
@@ -20,6 +24,9 @@ static TOKEN_SECRET_CACHE: LazyLock<RwLock<ApiTokenSecretCache>> = LazyLock::new
RwLock::new(ApiTokenSecretCache {
secrets: HashMap::new(),
shared_gen: 0,
+ file_mtime: None,
+ file_len: None,
+ last_checked: None,
})
});
@@ -45,6 +52,63 @@ fn write_file(data: HashMap<Authid, String>) -> Result<(), Error> {
replace_config(token_shadow(), &json)
}
+/// Refreshes the in-memory cache if the on-disk token.shadow file changed.
+/// Returns true if the cache is valid to use, false if not.
+fn refresh_cache_if_file_changed() -> bool {
+ let now = epoch_i64();
+
+ // Best-effort refresh under write lock.
+ let Some(mut cache) = TOKEN_SECRET_CACHE.try_write() else {
+ return false;
+ };
+
+ let Some(shared_gen_now) = token_shadow_shared_gen() else {
+ return false;
+ };
+
+ // If another process bumped the generation, we don't know what changed -> clear cache
+ if cache.shared_gen != shared_gen_now {
+ invalidate_cache_state(&mut cache);
+ cache.shared_gen = shared_gen_now;
+ }
+
+ // Stat the file to detect manual edits.
+ let Ok((new_mtime, new_len)) = shadow_mtime_len() else {
+ return false;
+ };
+
+ // Initialize file stats if we have no prior state.
+ if cache.last_checked.is_none() {
+ cache.secrets.clear(); // ensure cache is empty on first load
+ cache.file_mtime = new_mtime;
+ cache.file_len = new_len;
+ cache.last_checked = Some(now);
+ return true;
+ }
+
+ // No change detected.
+ if cache.file_mtime == new_mtime && cache.file_len == new_len {
+ cache.last_checked = Some(now);
+ return true;
+ }
+
+ // Manual edit detected -> invalidate cache and update stat.
+ cache.secrets.clear();
+ cache.file_mtime = new_mtime;
+ cache.file_len = new_len;
+ cache.last_checked = Some(now);
+
+ // Best-effort propagation to other processes + update local view.
+ if let Some(shared_gen_new) = bump_token_shadow_shared_gen() {
+ cache.shared_gen = shared_gen_new;
+ } else {
+ // Do not fail: local cache is already safe as we cleared it above.
+ // Keep local shared_gen as-is to avoid repeated failed attempts.
+ }
+
+ true
+}
+
/// Verifies that an entry for given tokenid / API token secret exists
pub fn verify_secret(tokenid: &Authid, secret: &str) -> Result<(), Error> {
if !tokenid.is_token() {
@@ -52,7 +116,7 @@ pub fn verify_secret(tokenid: &Authid, secret: &str) -> Result<(), Error> {
}
// Fast path
- if cache_try_secret_matches(tokenid, secret) {
+ if refresh_cache_if_file_changed() && cache_try_secret_matches(tokenid, secret) {
return Ok(());
}
@@ -84,12 +148,15 @@ pub fn set_secret(tokenid: &Authid, secret: &str) -> Result<(), Error> {
let _guard = lock_config()?;
+ // Capture state before we write to detect external edits.
+ let pre_meta = shadow_mtime_len().unwrap_or((None, None));
+
let mut data = read_file()?;
let hashed_secret = proxmox_sys::crypt::encrypt_pw(secret)?;
data.insert(tokenid.clone(), hashed_secret);
write_file(data)?;
- apply_api_mutation(tokenid, Some(secret));
+ apply_api_mutation(tokenid, Some(secret), pre_meta);
Ok(())
}
@@ -102,11 +169,14 @@ pub fn delete_secret(tokenid: &Authid) -> Result<(), Error> {
let _guard = lock_config()?;
+ // Capture state before we write to detect external edits.
+ let pre_meta = shadow_mtime_len().unwrap_or((None, None));
+
let mut data = read_file()?;
data.remove(tokenid);
write_file(data)?;
- apply_api_mutation(tokenid, None);
+ apply_api_mutation(tokenid, None, pre_meta);
Ok(())
}
@@ -128,6 +198,12 @@ struct ApiTokenSecretCache {
secrets: HashMap<Authid, CachedSecret>,
/// Shared generation to detect mutations of the underlying token.shadow file.
shared_gen: usize,
+ // shadow file mtime to detect changes
+ file_mtime: Option<SystemTime>,
+ // shadow file length to detect changes
+ file_len: Option<u64>,
+ // last time the file metadata was checked
+ last_checked: Option<i64>,
}
/// Cached secret.
@@ -187,7 +263,13 @@ fn cache_try_secret_matches(tokenid: &Authid, secret: &str) -> bool {
eq && gen2 == cache_gen
}
-fn apply_api_mutation(tokenid: &Authid, new_secret: Option<&str>) {
+fn apply_api_mutation(
+ tokenid: &Authid,
+ new_secret: Option<&str>,
+ pre_write_meta: (Option<SystemTime>, Option<u64>),
+) {
+ let now = epoch_i64();
+
// Signal cache invalidation to other processes (best-effort).
let new_shared_gen = bump_token_shadow_shared_gen();
@@ -203,6 +285,13 @@ fn apply_api_mutation(tokenid: &Authid, new_secret: Option<&str>) {
// Update to the post-mutation generation.
cache.shared_gen = gen;
+ // If our cached file metadata does not match the on-disk state before our write,
+ // we likely missed an external/manual edit. We can no longer trust any cached secrets.
+ let (pre_mtime, pre_len) = pre_write_meta;
+ if cache.file_mtime != pre_mtime || cache.file_len != pre_len {
+ cache.secrets.clear();
+ }
+
// Apply the new mutation.
match new_secret {
Some(secret) => {
@@ -217,6 +306,20 @@ fn apply_api_mutation(tokenid: &Authid, new_secret: Option<&str>) {
cache.secrets.remove(tokenid);
}
}
+
+ // Update our view of the file metadata to the post-write state (best-effort).
+ // (If this fails, drop local cache so callers fall back to slow path until refreshed.)
+ match shadow_mtime_len() {
+ Ok((mtime, len)) => {
+ cache.file_mtime = mtime;
+ cache.file_len = len;
+ cache.last_checked = Some(now);
+ }
+ Err(_) => {
+ // If we cannot validate state, do not trust cache.
+ invalidate_cache_state(&mut cache);
+ }
+ }
}
/// Get the current shared generation.
@@ -226,10 +329,24 @@ fn token_shadow_shared_gen() -> Option<usize> {
/// Bump and return the new shared generation.
fn bump_token_shadow_shared_gen() -> Option<usize> {
- access_conf().increment_token_shadow_cache_generation().ok().map(|prev| prev + 1)
+ access_conf()
+ .increment_token_shadow_cache_generation()
+ .ok()
+ .map(|prev| prev + 1)
}
/// Invalidates the cache state and only keeps the shared generation.
fn invalidate_cache_state(cache: &mut ApiTokenSecretCache) {
cache.secrets.clear();
-}
\ No newline at end of file
+ cache.file_mtime = None;
+ cache.file_len = None;
+ cache.last_checked = None;
+}
+
+fn shadow_mtime_len() -> Result<(Option<SystemTime>, Option<u64>), Error> {
+ match fs::metadata(token_shadow()) {
+ Ok(meta) => Ok((meta.modified().ok(), Some(meta.len()))),
+ Err(e) if e.kind() == ErrorKind::NotFound => Ok((None, None)),
+ Err(e) => Err(e.into()),
+ }
+}
--
2.47.3
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
* [pbs-devel] [PATCH proxmox v3 2/4] proxmox-access-control: cache verified API token secrets
2026-01-02 16:07 13% [pbs-devel] [PATCH proxmox{-backup, , -datacenter-manager} v3 00/10] " Samuel Rufinatscha
` (4 preceding siblings ...)
2026-01-02 16:07 17% ` [pbs-devel] [PATCH proxmox v3 1/4] proxmox-access-control: extend AccessControlConfig for token.shadow invalidation Samuel Rufinatscha
@ 2026-01-02 16:07 12% ` Samuel Rufinatscha
2026-01-02 16:07 12% ` [pbs-devel] [PATCH proxmox v3 3/4] proxmox-access-control: invalidate token-secret cache on token.shadow changes Samuel Rufinatscha
` (4 subsequent siblings)
10 siblings, 0 replies; 200+ results
From: Samuel Rufinatscha @ 2026-01-02 16:07 UTC (permalink / raw)
To: pbs-devel
Currently, every token-based API request reads the token.shadow file and
runs the expensive password hash verification for the given token
secret. This issue was first observed while profiling the PBS /status
endpoint (see bug #7017 [1]); the fix is also needed for the factored-out
proxmox_access_control token_shadow implementation.
This patch introduces an in-memory cache of successfully verified token
secrets. Subsequent requests for the same token+secret combination only
perform a comparison using openssl::memcmp::eq and avoid re-running the
password hash. The cache is updated when a token secret is set and
cleared when a token is deleted. Note, this does NOT include manual
config changes, which will be covered in a subsequent patch.
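In outline, the fast path described above looks roughly like the following. This is a hypothetical, simplified sketch: tokenids are plain strings instead of `Authid`, a local helper stands in for `openssl::memcmp::eq`, and the shared generation is a process-local atomic rather than a cross-process counter.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Mutex, OnceLock};

// Process-local stand-in for the cross-process shared generation.
static GENERATION: AtomicUsize = AtomicUsize::new(0);

fn shared_gen() -> usize {
    GENERATION.load(Ordering::Acquire)
}

fn bump_generation() {
    GENERATION.fetch_add(1, Ordering::AcqRel);
}

// Cache maps tokenid -> (plain secret, generation at insert time).
fn cache() -> &'static Mutex<HashMap<String, (String, usize)>> {
    static CACHE: OnceLock<Mutex<HashMap<String, (String, usize)>>> = OnceLock::new();
    CACHE.get_or_init(|| Mutex::new(HashMap::new()))
}

// Stand-in for openssl::memcmp::eq: a fold over all bytes so the
// comparison does not short-circuit on the first mismatch.
fn ct_eq(a: &[u8], b: &[u8]) -> bool {
    a.len() == b.len() && a.iter().zip(b).fold(0u8, |acc, (x, y)| acc | (x ^ y)) == 0
}

fn cache_insert(tokenid: &str, secret: &str) {
    let gen = shared_gen();
    cache()
        .lock()
        .unwrap()
        .insert(tokenid.to_owned(), (secret.to_owned(), gen));
}

// Check the generation before and after the compare: if a rotation or
// deletion happened concurrently, the cached entry must not be trusted.
fn cache_try_secret_matches(tokenid: &str, secret: &str) -> bool {
    let cache = cache().lock().unwrap();
    let Some((cached, cache_gen)) = cache.get(tokenid) else {
        return false;
    };
    if shared_gen() != *cache_gen {
        return false;
    }
    let eq = ct_eq(cached.as_bytes(), secret.as_bytes());
    eq && shared_gen() == *cache_gen
}
```

A miss then falls back to the slow path (read the file, run the password-hash verification) and inserts into the cache only if the generation is still unchanged.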
This patch is part of the series which fixes bug #7017 [1].
[1] https://bugzilla.proxmox.com/show_bug.cgi?id=7017
Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
---
Changes from v1 to v2:
* Replace OnceCell with LazyLock, and std::sync::RwLock with
parking_lot::RwLock.
* Add API_MUTATION_GENERATION and guard cache inserts
to prevent “zombie inserts” across concurrent set/delete.
* Refactor cache operations into cache_try_secret_matches,
cache_try_insert_secret, and centralize write-side behavior in
apply_api_mutation.
* Switch fast-path cache access to try_read/try_write (best-effort).
Changes from v2 to v3:
* Replaced process-local cache invalidation (AtomicU64
API_MUTATION_GENERATION) with a cross-process shared generation via
ConfigVersionCache.
* Validate shared generation before/after the constant-time secret
compare; only insert into cache if the generation is unchanged.
* invalidate_cache_state() on insert if shared generation changed.
Cargo.toml | 1 +
proxmox-access-control/Cargo.toml | 1 +
proxmox-access-control/src/token_shadow.rs | 154 ++++++++++++++++++++-
3 files changed, 155 insertions(+), 1 deletion(-)
diff --git a/Cargo.toml b/Cargo.toml
index 27a69afa..59a2ec93 100644
--- a/Cargo.toml
+++ b/Cargo.toml
@@ -112,6 +112,7 @@ native-tls = "0.2"
nix = "0.29"
openssl = "0.10"
pam-sys = "0.5"
+parking_lot = "0.12"
percent-encoding = "2.1"
pin-utils = "0.1.0"
proc-macro2 = "1.0"
diff --git a/proxmox-access-control/Cargo.toml b/proxmox-access-control/Cargo.toml
index ec189664..1de2842c 100644
--- a/proxmox-access-control/Cargo.toml
+++ b/proxmox-access-control/Cargo.toml
@@ -16,6 +16,7 @@ anyhow.workspace = true
const_format.workspace = true
nix = { workspace = true, optional = true }
openssl = { workspace = true, optional = true }
+parking_lot.workspace = true
regex.workspace = true
hex = { workspace = true, optional = true }
serde.workspace = true
diff --git a/proxmox-access-control/src/token_shadow.rs b/proxmox-access-control/src/token_shadow.rs
index c586d834..895309d2 100644
--- a/proxmox-access-control/src/token_shadow.rs
+++ b/proxmox-access-control/src/token_shadow.rs
@@ -1,13 +1,28 @@
use std::collections::HashMap;
+use std::sync::LazyLock;
use anyhow::{bail, format_err, Error};
+use parking_lot::RwLock;
use serde_json::{from_value, Value};
use proxmox_auth_api::types::Authid;
use proxmox_product_config::{open_api_lockfile, replace_config, ApiLockGuard};
+use crate::init::access_conf;
use crate::init::impl_feature::{token_shadow, token_shadow_lock};
+/// Global in-memory cache for successfully verified API token secrets.
+/// The cache stores plain text secrets for token Authids that have already been
+/// verified against the hashed values in `token.shadow`. This allows for cheap
+/// subsequent authentications for the same token+secret combination, avoiding
+/// recomputing the password hash on every request.
+static TOKEN_SECRET_CACHE: LazyLock<RwLock<ApiTokenSecretCache>> = LazyLock::new(|| {
+ RwLock::new(ApiTokenSecretCache {
+ secrets: HashMap::new(),
+ shared_gen: 0,
+ })
+});
+
// Get exclusive lock
fn lock_config() -> Result<ApiLockGuard, Error> {
open_api_lockfile(token_shadow_lock(), None, true)
@@ -36,9 +51,27 @@ pub fn verify_secret(tokenid: &Authid, secret: &str) -> Result<(), Error> {
bail!("not an API token ID");
}
+ // Fast path
+ if cache_try_secret_matches(tokenid, secret) {
+ return Ok(());
+ }
+
+ // Slow path
+ // First, capture the shared generation before doing the hash verification.
+ let gen_before = token_shadow_shared_gen();
+
let data = read_file()?;
match data.get(tokenid) {
- Some(hashed_secret) => proxmox_sys::crypt::verify_crypt_pw(secret, hashed_secret),
+ Some(hashed_secret) => {
+ proxmox_sys::crypt::verify_crypt_pw(secret, hashed_secret)?;
+
+ // Try to cache only if nothing changed while verifying the secret.
+ if let Some(gen) = gen_before {
+ cache_try_insert_secret(tokenid.clone(), secret.to_owned(), gen);
+ }
+
+ Ok(())
+ }
None => bail!("invalid API token"),
}
}
@@ -56,6 +89,8 @@ pub fn set_secret(tokenid: &Authid, secret: &str) -> Result<(), Error> {
data.insert(tokenid.clone(), hashed_secret);
write_file(data)?;
+ apply_api_mutation(tokenid, Some(secret));
+
Ok(())
}
@@ -71,6 +106,8 @@ pub fn delete_secret(tokenid: &Authid) -> Result<(), Error> {
data.remove(tokenid);
write_file(data)?;
+ apply_api_mutation(tokenid, None);
+
Ok(())
}
@@ -81,3 +118,118 @@ pub fn generate_and_set_secret(tokenid: &Authid) -> Result<String, Error> {
set_secret(tokenid, &secret)?;
Ok(secret)
}
+
+struct ApiTokenSecretCache {
+ /// Keys are token Authids, values are the corresponding plain text secrets.
+ /// Entries are added after a successful on-disk verification in
+ /// `verify_secret` or when a new token secret is generated by
+ /// `generate_and_set_secret`. Used to avoid repeated
+ /// password-hash computation on subsequent authentications.
+ secrets: HashMap<Authid, CachedSecret>,
+ /// Shared generation to detect mutations of the underlying token.shadow file.
+ shared_gen: usize,
+}
+
+/// Cached secret.
+struct CachedSecret {
+ secret: String,
+}
+
+fn cache_try_insert_secret(tokenid: Authid, secret: String, shared_gen_before: usize) {
+ let Some(mut cache) = TOKEN_SECRET_CACHE.try_write() else {
+ return;
+ };
+
+ let Some(shared_gen_now) = token_shadow_shared_gen() else {
+ return;
+ };
+
+ // If this process missed a generation bump, its cache is stale.
+ if cache.shared_gen != shared_gen_now {
+ invalidate_cache_state(&mut cache);
+ cache.shared_gen = shared_gen_now;
+ }
+
+ // If a mutation happened while we were verifying the secret, do not insert.
+ if shared_gen_now == shared_gen_before {
+ cache.secrets.insert(tokenid, CachedSecret { secret });
+ }
+}
+
+// Tries to match the given token secret against the cached secret.
+// Checks the generation before and after the constant-time compare to avoid a
+// TOCTOU window. If another process rotates/deletes a token while we're validating
+// the cached secret, the generation will change, and we
+// must not trust the cache for this request.
+fn cache_try_secret_matches(tokenid: &Authid, secret: &str) -> bool {
+ let Some(cache) = TOKEN_SECRET_CACHE.try_read() else {
+ return false;
+ };
+ let Some(entry) = cache.secrets.get(tokenid) else {
+ return false;
+ };
+
+ let cache_gen = cache.shared_gen;
+
+ let Some(gen1) = token_shadow_shared_gen() else {
+ return false;
+ };
+ if gen1 != cache_gen {
+ return false;
+ }
+
+ let eq = openssl::memcmp::eq(entry.secret.as_bytes(), secret.as_bytes());
+
+ let Some(gen2) = token_shadow_shared_gen() else {
+ return false;
+ };
+
+ eq && gen2 == cache_gen
+}
+
+fn apply_api_mutation(tokenid: &Authid, new_secret: Option<&str>) {
+ // Signal cache invalidation to other processes (best-effort).
+ let new_shared_gen = bump_token_shadow_shared_gen();
+
+ let mut cache = TOKEN_SECRET_CACHE.write();
+
+ // If we cannot read/bump the shared generation, we cannot safely trust the cache.
+ let Some(gen) = new_shared_gen else {
+ invalidate_cache_state(&mut cache);
+ cache.shared_gen = 0;
+ return;
+ };
+
+ // Update to the post-mutation generation.
+ cache.shared_gen = gen;
+
+ // Apply the new mutation.
+ match new_secret {
+ Some(secret) => {
+ cache.secrets.insert(
+ tokenid.clone(),
+ CachedSecret {
+ secret: secret.to_owned(),
+ },
+ );
+ }
+ None => {
+ cache.secrets.remove(tokenid);
+ }
+ }
+}
+
+/// Get the current shared generation.
+fn token_shadow_shared_gen() -> Option<usize> {
+ access_conf().token_shadow_cache_generation()
+}
+
+/// Bump and return the new shared generation.
+fn bump_token_shadow_shared_gen() -> Option<usize> {
+ access_conf().increment_token_shadow_cache_generation().ok().map(|prev| prev + 1)
+}
+
+/// Invalidates the cache state and only keeps the shared generation.
+fn invalidate_cache_state(cache: &mut ApiTokenSecretCache) {
+ cache.secrets.clear();
+}
\ No newline at end of file
--
2.47.3
* [pbs-devel] [PATCH proxmox v3 4/4] proxmox-access-control: add TTL window to token secret cache
2026-01-02 16:07 13% [pbs-devel] [PATCH proxmox{-backup, , -datacenter-manager} v3 00/10] " Samuel Rufinatscha
` (6 preceding siblings ...)
2026-01-02 16:07 12% ` [pbs-devel] [PATCH proxmox v3 3/4] proxmox-access-control: invalidate token-secret cache on token.shadow changes Samuel Rufinatscha
@ 2026-01-02 16:07 15% ` Samuel Rufinatscha
2026-01-02 16:07 13% ` [pbs-devel] [PATCH proxmox-datacenter-manager v3 1/2] pdm-config: implement token.shadow generation Samuel Rufinatscha
` (2 subsequent siblings)
10 siblings, 0 replies; 200+ results
From: Samuel Rufinatscha @ 2026-01-02 16:07 UTC (permalink / raw)
To: pbs-devel
verify_secret() currently calls refresh_cache_if_file_changed() on every
request, which performs a metadata() call on token.shadow each time.
Under load this adds unnecessary overhead, especially since the file
should rarely change.
This patch introduces a TTL boundary, controlled by
TOKEN_SECRET_CACHE_TTL_SECS. File metadata is only re-loaded once the
TTL has expired.
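The double-checked TTL gate can be sketched like this. This is a simplified, hypothetical version: a single `Mutex` stands in for the `parking_lot` RwLock with `try_read`/`try_write`, and the stat of token.shadow is abstracted as a closure.

```rust
use std::sync::Mutex;
use std::time::{Duration, Instant};

const TTL: Duration = Duration::from_secs(60);

// Simplified state: the real cache also tracks file mtime/len and a
// cross-process generation.
struct CacheState {
    last_checked: Option<Instant>,
}

fn is_fresh(state: &CacheState, now: Instant) -> bool {
    state
        .last_checked
        .is_some_and(|last| now.duration_since(last) < TTL)
}

/// Double-checked pattern: a cheap read-side TTL check first, then a
/// second check after taking the write lock, since another thread may
/// have refreshed the cache in between.
fn refresh_if_needed(cache: &Mutex<CacheState>, now: Instant, stat_file: impl Fn() -> bool) -> bool {
    // Fast path (in the real code: try_read).
    if is_fresh(&cache.lock().unwrap(), now) {
        return true;
    }
    // Slow path (in the real code: try_write).
    let mut state = cache.lock().unwrap();
    if is_fresh(&state, now) {
        return true; // someone else refreshed meanwhile
    }
    if !stat_file() {
        return false;
    }
    state.last_checked = Some(now);
    true
}
```

Only the slow path touches the filesystem, so within a TTL window repeated calls stay entirely in memory.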
This patch is part of the series which fixes bug #7017 [1].
[1] https://bugzilla.proxmox.com/show_bug.cgi?id=7017
Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
---
Changes from v1 to v2:
* Add TOKEN_SECRET_CACHE_TTL_SECS and last_checked.
* Implement double-checked TTL: check with try_read first; only attempt
refresh with try_write if expired/unknown.
* Fix TTL bookkeeping: update last_checked on the “file unchanged” path
and after API mutations.
* Add documentation warning about TTL-delayed effect of manual
token.shadow edits.
Changes from v2 to v3:
* Refactored refresh_cache_if_file_changed TTL logic.
* Remove had_prior_state check (replaced by last_checked logic).
* Improve TTL bound checks.
* Reword documentation warning for clarity.
proxmox-access-control/src/token_shadow.rs | 30 +++++++++++++++++++++-
1 file changed, 29 insertions(+), 1 deletion(-)
diff --git a/proxmox-access-control/src/token_shadow.rs b/proxmox-access-control/src/token_shadow.rs
index f30c8ed5..14eea560 100644
--- a/proxmox-access-control/src/token_shadow.rs
+++ b/proxmox-access-control/src/token_shadow.rs
@@ -30,6 +30,9 @@ static TOKEN_SECRET_CACHE: LazyLock<RwLock<ApiTokenSecretCache>> = LazyLock::new
})
});
+/// Max age in seconds of the token secret cache before checking for file changes.
+const TOKEN_SECRET_CACHE_TTL_SECS: i64 = 60;
+
// Get exclusive lock
fn lock_config() -> Result<ApiLockGuard, Error> {
open_api_lockfile(token_shadow_lock(), None, true)
@@ -57,11 +60,28 @@ fn write_file(data: HashMap<Authid, String>) -> Result<(), Error> {
fn refresh_cache_if_file_changed() -> bool {
let now = epoch_i64();
- // Best-effort refresh under write lock.
+ // Fast path: cache is fresh if shared-gen matches and TTL not expired.
+ if let (Some(cache), Some(shared_gen_read)) =
+ (TOKEN_SECRET_CACHE.try_read(), token_shadow_shared_gen())
+ {
+ if cache.shared_gen == shared_gen_read
+ && cache
+ .last_checked
+ .is_some_and(|last| now >= last && (now - last) < TOKEN_SECRET_CACHE_TTL_SECS)
+ {
+ return true;
+ }
+ // read lock drops here
+ } else {
+ return false;
+ }
+
+ // Slow path: best-effort refresh under write lock.
let Some(mut cache) = TOKEN_SECRET_CACHE.try_write() else {
return false;
};
+ // Re-read generation after acquiring the lock (may have changed meanwhile).
let Some(shared_gen_now) = token_shadow_shared_gen() else {
return false;
};
@@ -72,6 +92,14 @@ fn refresh_cache_if_file_changed() -> bool {
cache.shared_gen = shared_gen_now;
}
+ // TTL check again after acquiring the lock
+ if cache
+ .last_checked
+ .is_some_and(|last| now >= last && (now - last) < TOKEN_SECRET_CACHE_TTL_SECS)
+ {
+ return true;
+ }
+
// Stat the file to detect manual edits.
let Ok((new_mtime, new_len)) = shadow_mtime_len() else {
return false;
--
2.47.3
* [pbs-devel] [PATCH proxmox v3 1/4] proxmox-access-control: extend AccessControlConfig for token.shadow invalidation
2026-01-02 16:07 13% [pbs-devel] [PATCH proxmox{-backup, , -datacenter-manager} v3 00/10] " Samuel Rufinatscha
` (3 preceding siblings ...)
2026-01-02 16:07 15% ` [pbs-devel] [PATCH proxmox-backup v3 4/4] pbs-config: add TTL window to token secret cache Samuel Rufinatscha
@ 2026-01-02 16:07 17% ` Samuel Rufinatscha
2026-01-02 16:07 12% ` [pbs-devel] [PATCH proxmox v3 2/4] proxmox-access-control: cache verified API token secrets Samuel Rufinatscha
` (5 subsequent siblings)
10 siblings, 0 replies; 200+ results
From: Samuel Rufinatscha @ 2026-01-02 16:07 UTC (permalink / raw)
To: pbs-devel
Add token_shadow_cache_generation() and
increment_token_shadow_cache_generation()
hooks to AccessControlConfig. This lets products provide a cross-process
invalidation signal for token.shadow so proxmox-access-control can cache
verified API token secrets and invalidate that cache on token
rotation/deletion.
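A minimal sketch of this hook pattern, with hypothetical trait and type names: products that have a cross-process generation counter override the defaults, while others inherit the "unsupported" behaviour (here a plain `String` error stands in for `anyhow::Error`).

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

trait TokenShadowHooks {
    // Default: no shared generation available, caching stays disabled.
    fn token_shadow_cache_generation(&self) -> Option<usize> {
        None
    }
    // Default: incrementing is not supported.
    fn increment_token_shadow_cache_generation(&self) -> Result<usize, String> {
        Err("token shadow generation not supported".to_string())
    }
}

// A product without a shared counter just inherits the defaults.
struct NoSupport;
impl TokenShadowHooks for NoSupport {}

// A product with a counter overrides both hooks.
struct WithCounter(AtomicUsize);
impl TokenShadowHooks for WithCounter {
    fn token_shadow_cache_generation(&self) -> Option<usize> {
        Some(self.0.load(Ordering::Acquire))
    }
    fn increment_token_shadow_cache_generation(&self) -> Result<usize, String> {
        // fetch_add returns the previous value; report the new one.
        Ok(self.0.fetch_add(1, Ordering::AcqRel) + 1)
    }
}
```

The caching layer can then treat a `None`/`Err` result as "no safe invalidation signal" and skip the cache entirely.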
This patch is part of the series which fixes bug #7017 [1].
[1] https://bugzilla.proxmox.com/show_bug.cgi?id=7017
Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
---
proxmox-access-control/src/init.rs | 17 +++++++++++++++++
1 file changed, 17 insertions(+)
diff --git a/proxmox-access-control/src/init.rs b/proxmox-access-control/src/init.rs
index e64398e8..0ba1a526 100644
--- a/proxmox-access-control/src/init.rs
+++ b/proxmox-access-control/src/init.rs
@@ -51,6 +51,23 @@ pub trait AccessControlConfig: Send + Sync {
Ok(())
}
+ /// Returns the current cache generation of the token shadow cache. If the generation was
+ /// incremented since the last time the cache was queried, the token shadow cache is reloaded
+ /// from disk.
+ ///
+ /// Default: Always returns `None`.
+ fn token_shadow_cache_generation(&self) -> Option<usize> {
+ None
+ }
+
+ /// Increment the cache generation of the token shadow cache. This indicates that it was
+ /// changed on disk.
+ ///
+ /// Default: Returns an error as token shadow generation is not supported.
+ fn increment_token_shadow_cache_generation(&self) -> Result<usize, Error> {
+ anyhow::bail!("token shadow generation not supported");
+ }
+
/// Optionally returns a role that has no access to any resource.
///
/// Default: Returns `None`.
--
2.47.3
* [pbs-devel] [PATCH proxmox-backup v3 1/4] pbs-config: add token.shadow generation to ConfigVersionCache
2026-01-02 16:07 13% [pbs-devel] [PATCH proxmox{-backup, , -datacenter-manager} v3 00/10] " Samuel Rufinatscha
@ 2026-01-02 16:07 17% ` Samuel Rufinatscha
2026-01-14 10:44 5% ` Fabian Grünbichler
2026-01-02 16:07 12% ` [pbs-devel] [PATCH proxmox-backup v3 2/4] pbs-config: cache verified API token secrets Samuel Rufinatscha
` (9 subsequent siblings)
10 siblings, 1 reply; 200+ results
From: Samuel Rufinatscha @ 2026-01-02 16:07 UTC (permalink / raw)
To: pbs-devel
Currently, every token-based API request reads the token.shadow file and
runs the expensive password hash verification for the given token
secret. This shows up as a hotspot in /status profiling (see
bug #7017 [1]).
To solve the issue, this patch prepares the config version cache so
that token_shadow_generation config caching can be built on top of it.
Specifically, it:
(1) implements an increment function in order to invalidate generations
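The generation counter itself is just a shared atomic; a simplified, process-local sketch follows (in the real code the counter lives in the shmem-backed ConfigVersionCache so all processes see the same value).

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Stand-in for the shmem-backed cell.
static TOKEN_SHADOW_GENERATION: AtomicUsize = AtomicUsize::new(0);

fn token_shadow_generation() -> usize {
    TOKEN_SHADOW_GENERATION.load(Ordering::Acquire)
}

/// `fetch_add` returns the value *before* the increment, which is why the
/// caller in token_shadow.rs maps the result with `|prev| prev + 1` to
/// obtain the new generation.
fn increase_token_shadow_generation() -> usize {
    TOKEN_SHADOW_GENERATION.fetch_add(1, Ordering::AcqRel)
}
```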
This patch is part of the series which fixes bug #7017 [1].
[1] https://bugzilla.proxmox.com/show_bug.cgi?id=7017
Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
---
pbs-config/src/config_version_cache.rs | 18 ++++++++++++++++++
1 file changed, 18 insertions(+)
diff --git a/pbs-config/src/config_version_cache.rs b/pbs-config/src/config_version_cache.rs
index e8fb994f..1376b11d 100644
--- a/pbs-config/src/config_version_cache.rs
+++ b/pbs-config/src/config_version_cache.rs
@@ -28,6 +28,8 @@ struct ConfigVersionCacheDataInner {
// datastore (datastore.cfg) generation/version
// FIXME: remove with PBS 3.0
datastore_generation: AtomicUsize,
+ // Token shadow (token.shadow) generation/version.
+ token_shadow_generation: AtomicUsize,
// Add further atomics here
}
@@ -153,4 +155,20 @@ impl ConfigVersionCache {
.datastore_generation
.fetch_add(1, Ordering::AcqRel)
}
+
+ /// Returns the token shadow generation number.
+ pub fn token_shadow_generation(&self) -> usize {
+ self.shmem
+ .data()
+ .token_shadow_generation
+ .load(Ordering::Acquire)
+ }
+
+ /// Increase the token shadow generation number.
+ pub fn increase_token_shadow_generation(&self) -> usize {
+ self.shmem
+ .data()
+ .token_shadow_generation
+ .fetch_add(1, Ordering::AcqRel)
+ }
}
--
2.47.3
* [pbs-devel] [PATCH proxmox-datacenter-manager v3 1/2] pdm-config: implement token.shadow generation
2026-01-02 16:07 13% [pbs-devel] [PATCH proxmox{-backup, , -datacenter-manager} v3 00/10] " Samuel Rufinatscha
` (7 preceding siblings ...)
2026-01-02 16:07 15% ` [pbs-devel] [PATCH proxmox v3 4/4] proxmox-access-control: add TTL window to token secret cache Samuel Rufinatscha
@ 2026-01-02 16:07 13% ` Samuel Rufinatscha
2026-01-14 10:45 5% ` Fabian Grünbichler
2026-01-02 16:07 17% ` [pbs-devel] [PATCH proxmox-datacenter-manager v3 2/2] docs: document API token-cache TTL effects Samuel Rufinatscha
2026-01-21 15:15 13% ` [pbs-devel] superseded: [PATCH proxmox{-backup, , -datacenter-manager} v3 00/10] token-shadow: reduce api token verification overhead Samuel Rufinatscha
10 siblings, 1 reply; 200+ results
From: Samuel Rufinatscha @ 2026-01-02 16:07 UTC (permalink / raw)
To: pbs-devel
PDM depends on the shared proxmox/proxmox-access-control crate for
token.shadow handling, which expects the product to provide a
cross-process invalidation signal so it can safely cache verified API
token secrets and invalidate them when token.shadow is changed.
This patch
* adds a token_shadow_generation to PDM’s shared-memory
ConfigVersionCache
* implements proxmox_access_control::init::AccessControlConfig
for pdm_config::AccessControlConfig, which
- delegates roles/privs/path checks to the existing
pdm_api_types::AccessControlConfig implementation
- implements the shadow cache generation trait functions
* switches the AccessControlConfig init paths (server + CLI) to use
pdm_config::AccessControlConfig instead of
pdm_api_types::AccessControlConfig
This patch is part of the series which fixes bug #7017 [1].
[1] https://bugzilla.proxmox.com/show_bug.cgi?id=7017
Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
---
cli/admin/src/main.rs | 2 +-
lib/pdm-config/Cargo.toml | 1 +
lib/pdm-config/src/access_control_config.rs | 73 +++++++++++++++++++++
lib/pdm-config/src/config_version_cache.rs | 18 +++++
lib/pdm-config/src/lib.rs | 2 +
server/src/acl.rs | 3 +-
6 files changed, 96 insertions(+), 3 deletions(-)
create mode 100644 lib/pdm-config/src/access_control_config.rs
diff --git a/cli/admin/src/main.rs b/cli/admin/src/main.rs
index f698fa2..916c633 100644
--- a/cli/admin/src/main.rs
+++ b/cli/admin/src/main.rs
@@ -19,7 +19,7 @@ fn main() {
proxmox_product_config::init(api_user, priv_user);
proxmox_access_control::init::init(
- &pdm_api_types::AccessControlConfig,
+ &pdm_config::AccessControlConfig,
pdm_buildcfg::configdir!("/access"),
)
.expect("failed to setup access control config");
diff --git a/lib/pdm-config/Cargo.toml b/lib/pdm-config/Cargo.toml
index d39c2ad..19781d2 100644
--- a/lib/pdm-config/Cargo.toml
+++ b/lib/pdm-config/Cargo.toml
@@ -13,6 +13,7 @@ once_cell.workspace = true
openssl.workspace = true
serde.workspace = true
+proxmox-access-control.workspace = true
proxmox-config-digest = { workspace = true, features = [ "openssl" ] }
proxmox-http = { workspace = true, features = [ "http-helpers" ] }
proxmox-ldap = { workspace = true, features = [ "types" ]}
diff --git a/lib/pdm-config/src/access_control_config.rs b/lib/pdm-config/src/access_control_config.rs
new file mode 100644
index 0000000..6f2e6b3
--- /dev/null
+++ b/lib/pdm-config/src/access_control_config.rs
@@ -0,0 +1,73 @@
+// e.g. in src/main.rs or server::context mod, wherever convenient
+
+use anyhow::Error;
+use pdm_api_types::{Authid, Userid};
+use proxmox_section_config::SectionConfigData;
+use std::collections::HashMap;
+
+pub struct AccessControlConfig;
+
+impl proxmox_access_control::init::AccessControlConfig for AccessControlConfig {
+ fn privileges(&self) -> &HashMap<&str, u64> {
+ pdm_api_types::AccessControlConfig.privileges()
+ }
+
+ fn roles(&self) -> &HashMap<&str, (u64, &str)> {
+ pdm_api_types::AccessControlConfig.roles()
+ }
+
+ fn is_superuser(&self, auth_id: &Authid) -> bool {
+ pdm_api_types::AccessControlConfig.is_superuser(auth_id)
+ }
+
+ fn is_group_member(&self, user_id: &Userid, group: &str) -> bool {
+ pdm_api_types::AccessControlConfig.is_group_member(user_id, group)
+ }
+
+ fn role_admin(&self) -> Option<&str> {
+ pdm_api_types::AccessControlConfig.role_admin()
+ }
+
+ fn role_no_access(&self) -> Option<&str> {
+ pdm_api_types::AccessControlConfig.role_no_access()
+ }
+
+ fn init_user_config(&self, config: &mut SectionConfigData) -> Result<(), Error> {
+ pdm_api_types::AccessControlConfig.init_user_config(config)
+ }
+
+ fn acl_audit_privileges(&self) -> u64 {
+ pdm_api_types::AccessControlConfig.acl_audit_privileges()
+ }
+
+ fn acl_modify_privileges(&self) -> u64 {
+ pdm_api_types::AccessControlConfig.acl_modify_privileges()
+ }
+
+ fn check_acl_path(&self, path: &str) -> Result<(), Error> {
+ pdm_api_types::AccessControlConfig.check_acl_path(path)
+ }
+
+ fn allow_partial_permission_match(&self) -> bool {
+ pdm_api_types::AccessControlConfig.allow_partial_permission_match()
+ }
+
+ fn cache_generation(&self) -> Option<usize> {
+ pdm_api_types::AccessControlConfig.cache_generation()
+ }
+
+ fn increment_cache_generation(&self) -> Result<(), Error> {
+ pdm_api_types::AccessControlConfig.increment_cache_generation()
+ }
+
+ fn token_shadow_cache_generation(&self) -> Option<usize> {
+ crate::ConfigVersionCache::new()
+ .ok()
+ .map(|c| c.token_shadow_generation())
+ }
+
+ fn increment_token_shadow_cache_generation(&self) -> Result<usize, Error> {
+ let c = crate::ConfigVersionCache::new()?;
+ Ok(c.increase_token_shadow_generation())
+ }
+}
diff --git a/lib/pdm-config/src/config_version_cache.rs b/lib/pdm-config/src/config_version_cache.rs
index 36a6a77..933140c 100644
--- a/lib/pdm-config/src/config_version_cache.rs
+++ b/lib/pdm-config/src/config_version_cache.rs
@@ -27,6 +27,8 @@ struct ConfigVersionCacheDataInner {
traffic_control_generation: AtomicUsize,
// Tracks updates to the remote/hostname/nodename mapping cache.
remote_mapping_cache: AtomicUsize,
+ // Token shadow (token.shadow) generation/version.
+ token_shadow_generation: AtomicUsize,
// Add further atomics here
}
@@ -172,4 +174,20 @@ impl ConfigVersionCache {
.fetch_add(1, Ordering::Relaxed)
+ 1
}
+
+ /// Returns the token shadow generation number.
+ pub fn token_shadow_generation(&self) -> usize {
+ self.shmem
+ .data()
+ .token_shadow_generation
+ .load(Ordering::Acquire)
+ }
+
+ /// Increase the token shadow generation number.
+ pub fn increase_token_shadow_generation(&self) -> usize {
+ self.shmem
+ .data()
+ .token_shadow_generation
+ .fetch_add(1, Ordering::AcqRel)
+ }
}
diff --git a/lib/pdm-config/src/lib.rs b/lib/pdm-config/src/lib.rs
index 4c49054..a15a006 100644
--- a/lib/pdm-config/src/lib.rs
+++ b/lib/pdm-config/src/lib.rs
@@ -9,6 +9,8 @@ pub mod remotes;
pub mod setup;
pub mod views;
+mod access_control_config;
+pub use access_control_config::AccessControlConfig;
mod config_version_cache;
pub use config_version_cache::ConfigVersionCache;
diff --git a/server/src/acl.rs b/server/src/acl.rs
index f421814..e6e007b 100644
--- a/server/src/acl.rs
+++ b/server/src/acl.rs
@@ -1,6 +1,5 @@
pub(crate) fn init() {
- static ACCESS_CONTROL_CONFIG: pdm_api_types::AccessControlConfig =
- pdm_api_types::AccessControlConfig;
+ static ACCESS_CONTROL_CONFIG: pdm_config::AccessControlConfig = pdm_config::AccessControlConfig;
proxmox_access_control::init::init(&ACCESS_CONTROL_CONFIG, pdm_buildcfg::configdir!("/access"))
.expect("failed to setup access control config");
--
2.47.3
* [pbs-devel] [PATCH proxmox-datacenter-manager v3 2/2] docs: document API token-cache TTL effects
2026-01-02 16:07 13% [pbs-devel] [PATCH proxmox{-backup, , -datacenter-manager} v3 00/10] " Samuel Rufinatscha
` (8 preceding siblings ...)
2026-01-02 16:07 13% ` [pbs-devel] [PATCH proxmox-datacenter-manager v3 1/2] pdm-config: implement token.shadow generation Samuel Rufinatscha
@ 2026-01-02 16:07 17% ` Samuel Rufinatscha
2026-01-14 10:45 5% ` Fabian Grünbichler
2026-01-21 15:15 13% ` [pbs-devel] superseded: [PATCH proxmox{-backup, , -datacenter-manager} v3 00/10] token-shadow: reduce api token verification overhead Samuel Rufinatscha
10 siblings, 1 reply; 200+ results
From: Samuel Rufinatscha @ 2026-01-02 16:07 UTC (permalink / raw)
To: pbs-devel
Documents the effects of the added API token cache in the
proxmox-access-control crate.
This patch is part of the series which fixes bug #7017 [1].
[1] https://bugzilla.proxmox.com/show_bug.cgi?id=7017
Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
---
Changes from v2 to v3:
* Reword documentation warning for clarity.
docs/access-control.rst | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/docs/access-control.rst b/docs/access-control.rst
index adf26cd..18e57a2 100644
--- a/docs/access-control.rst
+++ b/docs/access-control.rst
@@ -47,6 +47,10 @@ place of the user ID (``user@realm``) and the user password, respectively.
The API token is passed from the client to the server by setting the ``Authorization`` HTTP header
with method ``PDMAPIToken`` to the value ``TOKENID:TOKENSECRET``.
+.. WARNING:: Direct/manual edits to ``token.shadow`` may take up to 60 seconds (or
+ longer in edge cases) to take effect due to caching. Restart services for
+ immediate effect of manual edits.
+
.. _access_control:
Access Control
--
2.47.3
* [pbs-devel] [PATCH proxmox{-backup, , -datacenter-manager} v3 00/10] token-shadow: reduce api token verification overhead
@ 2026-01-02 16:07 13% Samuel Rufinatscha
2026-01-02 16:07 17% ` [pbs-devel] [PATCH proxmox-backup v3 1/4] pbs-config: add token.shadow generation to ConfigVersionCache Samuel Rufinatscha
` (10 more replies)
0 siblings, 11 replies; 200+ results
From: Samuel Rufinatscha @ 2026-01-02 16:07 UTC (permalink / raw)
To: pbs-devel
Hi,
this series improves the performance of token-based API authentication
in PBS (pbs-config) and in PDM (underlying proxmox-access-control
crate), addressing the API token verification hotspot reported in our
bugtracker #7017 [1].
When profiling the PBS /status endpoint with cargo flamegraph [2],
token-based authentication showed up as a dominant hotspot via
proxmox_sys::crypt::verify_crypt_pw. Applying this series removes that
path from the hot section of the flamegraph. The same performance issue
was measured [3] for PDM. PDM uses the shared proxmox-access-control
library for token handling, which is a factored-out version of the
token.shadow handling code from PBS.
While this series fixes the immediate performance issue both in PBS
(pbs-config) and in the shared proxmox-access-control crate used by
PDM, PBS should ideally be refactored in a separate effort to use
proxmox-access-control for token handling instead of its local
implementation.
Problem
For token-based API requests, both PBS’s pbs-config token.shadow
handling and PDM proxmox-access-control’s token.shadow handling
currently:
1. read the token.shadow file on each request
2. deserialize it into a HashMap<Authid, String>
3. run password hash verification via
proxmox_sys::crypt::verify_crypt_pw for the provided token secret
Under load, this results in significant CPU usage spent in repeated
password hashing for the same token+secret pairs. The attached
flamegraphs for PBS [2] and PDM [3] show
proxmox_sys::crypt::verify_crypt_pw dominating the hot path.
Approach
The goal is to reduce the cost of token-based authentication while
preserving the existing token handling semantics (including detecting
manual edits to token.shadow) and staying consistent between PBS
(pbs-config) and PDM (proxmox-access-control). For both code bases,
this series proposes to:
1. Introduce an in-memory cache for verified token secrets and
invalidate it through a shared ConfigVersionCache generation. Note that
a shared generation is required to keep the privileged and unprivileged
daemons in sync and avoid caching inconsistencies across processes.
2. Invalidate on token.shadow file API changes (set_secret,
delete_secret)
3. Invalidate on direct/manual token.shadow file changes (mtime +
length)
4. Avoid per-request file stat calls using a TTL window
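To make the combined invalidation rules concrete, here is a minimal,
self-contained sketch of the cache-freshness check described above. It is
illustrative only: the struct and field names are made up for this sketch,
and the real series reads the generation from a shared-memory
ConfigVersionCache and obtains the file stat via fs::metadata().

```rust
use std::time::SystemTime;

/// Illustrative TTL, mirroring TOKEN_SECRET_CACHE_TTL_SECS in the series.
const TTL_SECS: i64 = 60;

/// Illustrative cache bookkeeping (field names made up for this sketch).
struct CacheState {
    shared_gen: usize,              // cross-process generation (step 1)
    file_mtime: Option<SystemTime>, // manual-edit detection (step 3)
    file_len: Option<u64>,
    last_checked: Option<i64>,      // TTL window (step 4)
}

/// Returns true if cached secrets may still be trusted for this request.
/// `shared_gen_now` and `stat_now` stand in for the ConfigVersionCache
/// read and the fs::metadata() call of the real implementation.
fn cache_still_valid(
    cache: &mut CacheState,
    shared_gen_now: usize,
    stat_now: (Option<SystemTime>, Option<u64>),
    now: i64,
) -> bool {
    // 1) Another process bumped the generation: drop all local state.
    if cache.shared_gen != shared_gen_now {
        cache.shared_gen = shared_gen_now;
        cache.file_mtime = None;
        cache.file_len = None;
        cache.last_checked = None;
        return false;
    }
    // 4) TTL window: skip the stat entirely if we checked recently.
    if cache
        .last_checked
        .is_some_and(|last| now >= last && now - last < TTL_SECS)
    {
        return true;
    }
    // 3) Manual-edit detection via mtime + length.
    let changed = cache.file_mtime != stat_now.0 || cache.file_len != stat_now.1;
    cache.file_mtime = stat_now.0;
    cache.file_len = stat_now.1;
    cache.last_checked = Some(now);
    !changed
}
```

A generation mismatch invalidates unconditionally, while the TTL only
gates how often the (cheaper, but still per-request) stat is performed.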
Testing
*PBS (pbs-config)*
To verify the effect in PBS, I:
1. Set up test environment based on latest PBS ISO, installed Rust
toolchain, cloned proxmox-backup repository to use with cargo
flamegraph. Reproduced bug #7017 [1] by profiling the /status
endpoint with token-based authentication using cargo flamegraph [2].
2. Built PBS with pbs-config patches and re-ran the same workload and
profiling setup. Confirmed that
proxmox_sys::crypt::verify_crypt_pw path no longer appears in the
hot section of the flamegraph. CPU usage is now dominated by TLS
overhead.
3. Functionally, I verified that:
* valid tokens authenticate correctly when used in API requests
* invalid secrets are rejected as before
* generating a new token secret via dashboard (create token for user,
regenerate existing secret) works and authenticates correctly
*PDM (proxmox-access-control)*
To verify the effect in PDM, I followed a similar testing approach.
Instead of PBS’ /status, I profiled the /version endpoint with cargo
flamegraph [2] and verified that the expensive hashing path disappears
from the hot section after introducing caching.
Functionally, I verified that:
* valid tokens authenticate correctly when used in API requests
* invalid secrets are rejected as before
* generating a new token secret via dashboard (create token for user,
regenerate existing secret) works and authenticates correctly
Benchmarks:
Two different benchmarks have been run to measure caching effects
and RwLock contention:
(1) Requests per second for PBS /status endpoint (E2E)
Benchmarked parallel token auth requests for
/status?verbose=0 on top of the datastore lookup cache series [4]
to check throughput impact. With datastores=1, repeat=5000, parallel=16,
this series gives ~172 req/s compared to ~65 req/s without it.
This is a ~2.6x improvement (and aligns with the ~179 req/s from the
previous series, which used per-process cache invalidation).
(2) RwLock contention for token create/delete under heavy load of
token-authenticated requests
The previous version of the series compared std::sync::RwLock and
parking_lot::RwLock contention for token create/delete under heavy
parallel token-authenticated readers. parking_lot::RwLock has been
chosen for the added fairness guarantees.
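For context, the best-effort locking pattern measured here can be sketched
with std::sync::RwLock (the series itself uses parking_lot; the map type
and names below are illustrative only):

```rust
use std::collections::HashMap;
use std::sync::RwLock;

/// Best-effort fast path: never queue up behind a writer. A contended
/// (or poisoned) lock is simply reported as a cache miss, so the caller
/// falls back to the slow hash-verification path instead of blocking.
fn cached_secret_matches(
    cache: &RwLock<HashMap<String, String>>,
    id: &str,
    secret: &str,
) -> bool {
    match cache.try_read() {
        Ok(map) => map.get(id).map(|s| s == secret).unwrap_or(false),
        Err(_) => false, // would block: take the slow path instead
    }
}
```

Treating lock contention as a miss keeps token create/delete (write-side)
latency bounded even under a heavy stream of authenticated readers.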
Patch summary
pbs-config:
0001 – pbs-config: add token.shadow generation to ConfigVersionCache
Extends ConfigVersionCache to provide a process-shared generation
number for token.shadow changes.
0002 – pbs-config: cache verified API token secrets
Adds an in-memory cache for verified, plain-text API token secrets.
The cache is invalidated through the process-shared ConfigVersionCache
generation number and uses openssl's constant-time memcmp for matching
secrets.
0003 – pbs-config: invalidate token-secret cache on token.shadow
changes
On each token verification request, stats token.shadow's mtime and
length and clears the cache when the file changes.
0004 – pbs-config: add TTL window to token-secret cache
Introduces a TTL (TOKEN_SECRET_CACHE_TTL_SECS, default 60) for metadata
checks so that fs::metadata calls are not performed on each request.
proxmox-access-control:
0005 – access-control: extend AccessControlConfig for token.shadow invalidation
Extends the AccessControlConfig trait with
token_shadow_cache_generation() and
increment_token_shadow_cache_generation() for
proxmox-access-control to get the shared token.shadow generation number
and bump it on token shadow changes.
0006 – access-control: cache verified API token secrets
Mirrors PBS PATCH 0002.
0007 – access-control: invalidate token-secret cache on token.shadow changes
Mirrors PBS PATCH 0003.
0008 – access-control: add TTL window to token-secret cache
Mirrors PBS PATCH 0004.
proxmox-datacenter-manager:
0009 – pdm-config: add token.shadow generation to ConfigVersionCache
Extends PDM ConfigVersionCache and implements
token_shadow_cache_generation() and
increment_token_shadow_cache_generation() from AccessControlConfig for
PDM.
0010 – docs: document API token-cache TTL effects
Documents the effects of the TTL window on token.shadow edits
Changes from v1 to v2:
* (refactor) Switched cache initialization to LazyLock
* (perf) Use parking_lot::RwLock and best-effort cache access on the
read/refresh path (try_read/try_write) to avoid lock contention
* (doc) Document TTL-delayed effect of manual token.shadow edits
* (fix) Add generation guards (API_MUTATION_GENERATION +
FILE_GENERATION) to prevent caching across concurrent set/delete and
external edits
Changes from v2 to v3:
* (refactor) Replace PBS per-process cache invalidation with a
cross-process token.shadow generation based on PBS
ConfigVersionCache, ensuring cache consistency between privileged
and unprivileged daemons.
* (refactor) Decouple the generation source from the
proxmox/proxmox-access-control cache implementation: extend
AccessControlConfig hooks so that products can provide the shared
token.shadow generation source.
* (refactor) Extend PDM's ConfigVersionCache with
token_shadow_generation
and introduce a pdm_config::AccessControlConfig wrapper implementing
the new proxmox-access-control trait hooks. Switch server and CLI
initialization to use pdm_config::AccessControlConfig instead of
pdm_api_types::AccessControlConfig.
* (refactor) Adapt generation checks around cached-secret comparison to
use the new shared generation source.
* (fix/logic) cache_try_insert_secret: Update the local cache
generation if stale, allowing the new secret to be inserted
immediately
* (refactor) Extract cache invalidation logic into a
invalidate_cache_state helper to reduce duplication and ensure
consistent state resets
* (refactor) Simplify refresh_cache_if_file_changed: handle the
un-initialized/reset state and adjust the generation mismatch
path to ensure file metadata is always re-read.
* (doc) Clarify TTL-delayed effects of manual token.shadow edits.
Please see the patch-specific changelogs for more details.
Thanks for considering this patch series; I look forward to your
feedback.
Best,
Samuel Rufinatscha
[1] https://bugzilla.proxmox.com/show_bug.cgi?id=7017
[2] attachment 1767: Flamegraph showing the proxmox_sys::crypt::verify_crypt_pw stack
[3] attachment 1794: Flamegraph PDM baseline
[4] https://bugzilla.proxmox.com/show_bug.cgi?id=6049
proxmox-backup:
Samuel Rufinatscha (4):
pbs-config: add token.shadow generation to ConfigVersionCache
pbs-config: cache verified API token secrets
pbs-config: invalidate token-secret cache on token.shadow changes
pbs-config: add TTL window to token secret cache
Cargo.toml | 1 +
docs/user-management.rst | 4 +
pbs-config/Cargo.toml | 1 +
pbs-config/src/config_version_cache.rs | 18 ++
pbs-config/src/token_shadow.rs | 298 ++++++++++++++++++++++++-
5 files changed, 321 insertions(+), 1 deletion(-)
proxmox:
Samuel Rufinatscha (4):
proxmox-access-control: extend AccessControlConfig for token.shadow
invalidation
proxmox-access-control: cache verified API token secrets
proxmox-access-control: invalidate token-secret cache on token.shadow
changes
proxmox-access-control: add TTL window to token secret cache
Cargo.toml | 1 +
proxmox-access-control/Cargo.toml | 1 +
proxmox-access-control/src/init.rs | 17 ++
proxmox-access-control/src/token_shadow.rs | 299 ++++++++++++++++++++-
4 files changed, 317 insertions(+), 1 deletion(-)
proxmox-datacenter-manager:
Samuel Rufinatscha (2):
pdm-config: implement token.shadow generation
docs: document API token-cache TTL effects
cli/admin/src/main.rs | 2 +-
docs/access-control.rst | 4 ++
lib/pdm-config/Cargo.toml | 1 +
lib/pdm-config/src/access_control_config.rs | 73 +++++++++++++++++++++
lib/pdm-config/src/config_version_cache.rs | 18 +++++
lib/pdm-config/src/lib.rs | 2 +
server/src/acl.rs | 3 +-
7 files changed, 100 insertions(+), 3 deletions(-)
create mode 100644 lib/pdm-config/src/access_control_config.rs
Summary over all repositories:
16 files changed, 738 insertions(+), 5 deletions(-)
--
Generated by git-murpp 0.8.1
* [pbs-devel] [PATCH proxmox-backup v3 3/4] pbs-config: invalidate token-secret cache on token.shadow changes
2026-01-02 16:07 13% [pbs-devel] [PATCH proxmox{-backup, , -datacenter-manager} v3 00/10] " Samuel Rufinatscha
2026-01-02 16:07 17% ` [pbs-devel] [PATCH proxmox-backup v3 1/4] pbs-config: add token.shadow generation to ConfigVersionCache Samuel Rufinatscha
2026-01-02 16:07 12% ` [pbs-devel] [PATCH proxmox-backup v3 2/4] pbs-config: cache verified API token secrets Samuel Rufinatscha
@ 2026-01-02 16:07 12% ` Samuel Rufinatscha
2026-01-14 10:44 5% ` Fabian Grünbichler
2026-01-02 16:07 15% ` [pbs-devel] [PATCH proxmox-backup v3 4/4] pbs-config: add TTL window to token secret cache Samuel Rufinatscha
` (7 subsequent siblings)
10 siblings, 1 reply; 200+ results
From: Samuel Rufinatscha @ 2026-01-02 16:07 UTC (permalink / raw)
To: pbs-devel
Previously the in-memory token-secret cache was only updated via
set_secret() and delete_secret(), so manual edits to token.shadow were
not reflected.
This patch adds file change detection to the cache. It tracks the mtime
and length of token.shadow and clears the in-memory token secret cache
whenever these values change.
Note that this patch fetches file stats on every request; a TTL-based
optimization will be covered in a subsequent patch of the series.
This patch is part of the series which fixes bug #7017 [1].
[1] https://bugzilla.proxmox.com/show_bug.cgi?id=7017
Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
---
Changes from v1 to v2:
* Add file metadata tracking (file_mtime, file_len) and
FILE_GENERATION.
* Store file_gen in CachedSecret and verify it against the current
FILE_GENERATION to ensure cached entries belong to the current file
state.
* Add shadow_mtime_len() helper and convert refresh to best-effort
(try_write, returns bool).
* Pass a pre-write metadata snapshot into apply_api_mutation and
clear/bump generation if the cache metadata indicates missed external
edits.
Changes from v2 to v3:
* Cache now tracks last_checked (epoch seconds).
* Simplified refresh_cache_if_file_changed, removed
FILE_GENERATION logic
* On first load, initializes file metadata and keeps empty cache.
pbs-config/src/token_shadow.rs | 122 +++++++++++++++++++++++++++++++--
1 file changed, 118 insertions(+), 4 deletions(-)
diff --git a/pbs-config/src/token_shadow.rs b/pbs-config/src/token_shadow.rs
index fa84aee5..02fb191b 100644
--- a/pbs-config/src/token_shadow.rs
+++ b/pbs-config/src/token_shadow.rs
@@ -1,5 +1,8 @@
use std::collections::HashMap;
+use std::fs;
+use std::io::ErrorKind;
use std::sync::LazyLock;
+use std::time::SystemTime;
use anyhow::{bail, format_err, Error};
use parking_lot::RwLock;
@@ -7,6 +10,7 @@ use serde::{Deserialize, Serialize};
use serde_json::{from_value, Value};
use proxmox_sys::fs::CreateOptions;
+use proxmox_time::epoch_i64;
use pbs_api_types::Authid;
//use crate::auth;
@@ -24,6 +28,9 @@ static TOKEN_SECRET_CACHE: LazyLock<RwLock<ApiTokenSecretCache>> = LazyLock::new
RwLock::new(ApiTokenSecretCache {
secrets: HashMap::new(),
shared_gen: 0,
+ file_mtime: None,
+ file_len: None,
+ last_checked: None,
})
});
@@ -62,6 +69,63 @@ fn write_file(data: HashMap<Authid, String>) -> Result<(), Error> {
proxmox_sys::fs::replace_file(CONF_FILE, &json, options, true)
}
+/// Refreshes the in-memory cache if the on-disk token.shadow file changed.
+/// Returns true if the cache is valid to use, false if not.
+fn refresh_cache_if_file_changed() -> bool {
+ let now = epoch_i64();
+
+ // Best-effort refresh under write lock.
+ let Some(mut cache) = TOKEN_SECRET_CACHE.try_write() else {
+ return false;
+ };
+
+ let Some(shared_gen_now) = token_shadow_shared_gen() else {
+ return false;
+ };
+
+ // If another process bumped the generation, we don't know what changed -> clear cache
+ if cache.shared_gen != shared_gen_now {
+ invalidate_cache_state(&mut cache);
+ cache.shared_gen = shared_gen_now;
+ }
+
+ // Stat the file to detect manual edits.
+ let Ok((new_mtime, new_len)) = shadow_mtime_len() else {
+ return false;
+ };
+
+ // Initialize file stats if we have no prior state.
+ if cache.last_checked.is_none() {
+ cache.secrets.clear(); // ensure cache is empty on first load
+ cache.file_mtime = new_mtime;
+ cache.file_len = new_len;
+ cache.last_checked = Some(now);
+ return true;
+ }
+
+ // No change detected.
+ if cache.file_mtime == new_mtime && cache.file_len == new_len {
+ cache.last_checked = Some(now);
+ return true;
+ }
+
+ // Manual edit detected -> invalidate cache and update stat.
+ cache.secrets.clear();
+ cache.file_mtime = new_mtime;
+ cache.file_len = new_len;
+ cache.last_checked = Some(now);
+
+ // Best-effort propagation to other processes + update local view.
+ if let Some(shared_gen_new) = bump_token_shadow_shared_gen() {
+ cache.shared_gen = shared_gen_new;
+ } else {
+ // Do not fail: local cache is already safe as we cleared it above.
+ // Keep local shared_gen as-is to avoid repeated failed attempts.
+ }
+
+ true
+}
+
/// Verifies that an entry for given tokenid / API token secret exists
pub fn verify_secret(tokenid: &Authid, secret: &str) -> Result<(), Error> {
if !tokenid.is_token() {
@@ -69,7 +133,7 @@ pub fn verify_secret(tokenid: &Authid, secret: &str) -> Result<(), Error> {
}
// Fast path
- if cache_try_secret_matches(tokenid, secret) {
+ if refresh_cache_if_file_changed() && cache_try_secret_matches(tokenid, secret) {
return Ok(());
}
@@ -109,12 +173,15 @@ fn set_secret(tokenid: &Authid, secret: &str) -> Result<(), Error> {
let _guard = lock_config()?;
+ // Capture state before we write to detect external edits.
+ let pre_meta = shadow_mtime_len().unwrap_or((None, None));
+
let mut data = read_file()?;
let hashed_secret = proxmox_sys::crypt::encrypt_pw(secret)?;
data.insert(tokenid.clone(), hashed_secret);
write_file(data)?;
- apply_api_mutation(tokenid, Some(secret));
+ apply_api_mutation(tokenid, Some(secret), pre_meta);
Ok(())
}
@@ -127,11 +194,14 @@ pub fn delete_secret(tokenid: &Authid) -> Result<(), Error> {
let _guard = lock_config()?;
+ // Capture state before we write to detect external edits.
+ let pre_meta = shadow_mtime_len().unwrap_or((None, None));
+
let mut data = read_file()?;
data.remove(tokenid);
write_file(data)?;
- apply_api_mutation(tokenid, None);
+ apply_api_mutation(tokenid, None, pre_meta);
Ok(())
}
@@ -145,6 +215,12 @@ struct ApiTokenSecretCache {
secrets: HashMap<Authid, CachedSecret>,
/// Shared generation to detect mutations of the underlying token.shadow file.
shared_gen: usize,
+ // shadow file mtime to detect changes
+ file_mtime: Option<SystemTime>,
+ // shadow file length to detect changes
+ file_len: Option<u64>,
+ // last time the file metadata was checked
+ last_checked: Option<i64>,
}
/// Cached secret.
@@ -204,7 +280,13 @@ fn cache_try_secret_matches(tokenid: &Authid, secret: &str) -> bool {
eq && gen2 == cache_gen
}
-fn apply_api_mutation(tokenid: &Authid, new_secret: Option<&str>) {
+fn apply_api_mutation(
+ tokenid: &Authid,
+ new_secret: Option<&str>,
+ pre_write_meta: (Option<SystemTime>, Option<u64>),
+) {
+ let now = epoch_i64();
+
// Signal cache invalidation to other processes (best-effort).
let new_shared_gen = bump_token_shadow_shared_gen();
@@ -220,6 +302,13 @@ fn apply_api_mutation(tokenid: &Authid, new_secret: Option<&str>) {
// Update to the post-mutation generation.
cache.shared_gen = gen;
+ // If our cached file metadata does not match the on-disk state before our write,
+ // we likely missed an external/manual edit. We can no longer trust any cached secrets.
+ let (pre_mtime, pre_len) = pre_write_meta;
+ if cache.file_mtime != pre_mtime || cache.file_len != pre_len {
+ cache.secrets.clear();
+ }
+
// Apply the new mutation.
match new_secret {
Some(secret) => {
@@ -234,6 +323,20 @@ fn apply_api_mutation(tokenid: &Authid, new_secret: Option<&str>) {
cache.secrets.remove(tokenid);
}
}
+
+ // Update our view of the file metadata to the post-write state (best-effort).
+ // (If this fails, drop local cache so callers fall back to slow path until refreshed.)
+ match shadow_mtime_len() {
+ Ok((mtime, len)) => {
+ cache.file_mtime = mtime;
+ cache.file_len = len;
+ cache.last_checked = Some(now);
+ }
+ Err(_) => {
+ // If we cannot validate state, do not trust cache.
+ invalidate_cache_state(&mut cache);
+ }
+ }
}
/// Get the current shared generation.
@@ -253,4 +356,15 @@ fn bump_token_shadow_shared_gen() -> Option<usize> {
/// Invalidates the cache state and only keeps the shared generation.
fn invalidate_cache_state(cache: &mut ApiTokenSecretCache) {
cache.secrets.clear();
+ cache.file_mtime = None;
+ cache.file_len = None;
+ cache.last_checked = None;
+}
+
+fn shadow_mtime_len() -> Result<(Option<SystemTime>, Option<u64>), Error> {
+ match fs::metadata(CONF_FILE) {
+ Ok(meta) => Ok((meta.modified().ok(), Some(meta.len()))),
+ Err(e) if e.kind() == ErrorKind::NotFound => Ok((None, None)),
+ Err(e) => Err(e.into()),
+ }
}
--
2.47.3
* [pbs-devel] [PATCH proxmox-backup v3 4/4] pbs-config: add TTL window to token secret cache
2026-01-02 16:07 13% [pbs-devel] [PATCH proxmox{-backup, , -datacenter-manager} v3 00/10] " Samuel Rufinatscha
` (2 preceding siblings ...)
2026-01-02 16:07 12% ` [pbs-devel] [PATCH proxmox-backup v3 3/4] pbs-config: invalidate token-secret cache on token.shadow changes Samuel Rufinatscha
@ 2026-01-02 16:07 15% ` Samuel Rufinatscha
2026-01-02 16:07 17% ` [pbs-devel] [PATCH proxmox v3 1/4] proxmox-access-control: extend AccessControlConfig for token.shadow invalidation Samuel Rufinatscha
` (6 subsequent siblings)
10 siblings, 0 replies; 200+ results
From: Samuel Rufinatscha @ 2026-01-02 16:07 UTC (permalink / raw)
To: pbs-devel
verify_secret() currently calls refresh_cache_if_file_changed() on every
request, which performs a metadata() call on token.shadow each time.
Under load this adds unnecessary overhead, especially considering that
the file rarely changes.
This patch introduces a TTL boundary, controlled by
TOKEN_SECRET_CACHE_TTL_SECS. File metadata is only re-loaded once the
TTL has expired. Documents TTL effects.
This patch is part of the series which fixes bug #7017 [1].
[1] https://bugzilla.proxmox.com/show_bug.cgi?id=7017
Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
---
Changes from v1 to v2:
* Add TOKEN_SECRET_CACHE_TTL_SECS and last_checked.
* Implement double-checked TTL: check with try_read first; only attempt
refresh with try_write if expired/unknown.
* Fix TTL bookkeeping: update last_checked on the “file unchanged” path
and after API mutations.
* Add documentation warning about TTL-delayed effect of manual
token.shadow edits.
Changes from v2 to v3:
* Refactored refresh_cache_if_file_changed TTL logic.
* Remove had_prior_state check (replaced by last_checked logic).
* Improve TTL bound checks.
* Reword documentation warning for clarity.
docs/user-management.rst | 4 ++++
pbs-config/src/token_shadow.rs | 29 ++++++++++++++++++++++++++++-
2 files changed, 32 insertions(+), 1 deletion(-)
diff --git a/docs/user-management.rst b/docs/user-management.rst
index 41b43d60..8dfae528 100644
--- a/docs/user-management.rst
+++ b/docs/user-management.rst
@@ -156,6 +156,10 @@ metadata:
Similarly, the ``user delete-token`` subcommand can be used to delete a token
again.
+.. WARNING:: Direct/manual edits to ``token.shadow`` may take up to 60 seconds (or
+ longer in edge cases) to take effect due to caching. Restart services for
+ immediate effect of manual edits.
+
Newly generated API tokens don't have any permissions. Please read the next
section to learn how to set access permissions.
diff --git a/pbs-config/src/token_shadow.rs b/pbs-config/src/token_shadow.rs
index 02fb191b..e3529b40 100644
--- a/pbs-config/src/token_shadow.rs
+++ b/pbs-config/src/token_shadow.rs
@@ -33,6 +33,8 @@ static TOKEN_SECRET_CACHE: LazyLock<RwLock<ApiTokenSecretCache>> = LazyLock::new
last_checked: None,
})
});
+/// Max age in seconds of the token secret cache before checking for file changes.
+const TOKEN_SECRET_CACHE_TTL_SECS: i64 = 60;
#[derive(Serialize, Deserialize)]
#[serde(rename_all = "kebab-case")]
@@ -74,11 +76,28 @@ fn write_file(data: HashMap<Authid, String>) -> Result<(), Error> {
fn refresh_cache_if_file_changed() -> bool {
let now = epoch_i64();
- // Best-effort refresh under write lock.
+ // Fast path: cache is fresh if shared-gen matches and TTL not expired.
+ if let (Some(cache), Some(shared_gen_read)) =
+ (TOKEN_SECRET_CACHE.try_read(), token_shadow_shared_gen())
+ {
+ if cache.shared_gen == shared_gen_read
+ && cache
+ .last_checked
+ .is_some_and(|last| now >= last && (now - last) < TOKEN_SECRET_CACHE_TTL_SECS)
+ {
+ return true;
+ }
+ // read lock drops here
+ } else {
+ return false;
+ }
+
+ // Slow path: best-effort refresh under write lock.
let Some(mut cache) = TOKEN_SECRET_CACHE.try_write() else {
return false;
};
+ // Re-read generation after acquiring the lock (may have changed meanwhile).
let Some(shared_gen_now) = token_shadow_shared_gen() else {
return false;
};
@@ -89,6 +108,14 @@ fn refresh_cache_if_file_changed() -> bool {
cache.shared_gen = shared_gen_now;
}
+ // TTL check again after acquiring the lock
+ if cache
+ .last_checked
+ .is_some_and(|last| now >= last && (now - last) < TOKEN_SECRET_CACHE_TTL_SECS)
+ {
+ return true;
+ }
+
// Stat the file to detect manual edits.
let Ok((new_mtime, new_len)) = shadow_mtime_len() else {
return false;
--
2.47.3
* [pbs-devel] [PATCH proxmox-backup v3 2/4] pbs-config: cache verified API token secrets
2026-01-02 16:07 13% [pbs-devel] [PATCH proxmox{-backup, , -datacenter-manager} v3 00/10] " Samuel Rufinatscha
2026-01-02 16:07 17% ` [pbs-devel] [PATCH proxmox-backup v3 1/4] pbs-config: add token.shadow generation to ConfigVersionCache Samuel Rufinatscha
@ 2026-01-02 16:07 12% ` Samuel Rufinatscha
2026-01-14 10:44 5% ` Fabian Grünbichler
2026-01-02 16:07 12% ` [pbs-devel] [PATCH proxmox-backup v3 3/4] pbs-config: invalidate token-secret cache on token.shadow changes Samuel Rufinatscha
` (8 subsequent siblings)
10 siblings, 1 reply; 200+ results
From: Samuel Rufinatscha @ 2026-01-02 16:07 UTC (permalink / raw)
To: pbs-devel
Currently, every token-based API request reads the token.shadow file and
runs the expensive password hash verification for the given token
secret. This shows up as a hotspot in /status profiling (see
bug #7017 [1]).
This patch introduces an in-memory cache of successfully verified token
secrets. Subsequent requests for the same token+secret combination only
perform a comparison using openssl::memcmp::eq and avoid re-running the
password hash. The cache is updated when a token secret is set and
cleared when a token is deleted. Note, this does NOT include manual
config changes, which will be covered in a subsequent patch.
This patch is part of the series which fixes bug #7017 [1].
[1] https://bugzilla.proxmox.com/show_bug.cgi?id=7017
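As a rough illustration of why the cached comparison stays cheap and
timing-safe, a hand-rolled stand-in for openssl::memcmp::eq could look
like this (sketch only, not the code used by the patch):

```rust
/// Hand-rolled stand-in for openssl::memcmp::eq (illustration only):
/// XOR every byte pair and OR the differences together, so the runtime
/// does not depend on where the first mismatching byte occurs.
fn const_time_eq(a: &[u8], b: &[u8]) -> bool {
    if a.len() != b.len() {
        return false; // the length itself is not secret here
    }
    a.iter().zip(b.iter()).fold(0u8, |acc, (x, y)| acc | (x ^ y)) == 0
}
```

Either way, a single byte-wise comparison replaces the expensive
password-hash computation on every cache hit.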
Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
---
Changes from v1 to v2:
* Replace OnceCell with LazyLock, and std::sync::RwLock with
parking_lot::RwLock.
* Add API_MUTATION_GENERATION and guard cache inserts
to prevent “zombie inserts” across concurrent set/delete.
* Refactor cache operations into cache_try_secret_matches,
cache_try_insert_secret, and centralize write-side behavior in
apply_api_mutation.
* Switch fast-path cache access to try_read/try_write (best-effort).
Changes from v2 to v3:
* Replaced process-local cache invalidation (AtomicU64
API_MUTATION_GENERATION) with a cross-process shared generation via
ConfigVersionCache.
* Validate shared generation before/after the constant-time secret
compare; only insert into cache if the generation is unchanged.
* invalidate_cache_state() on insert if shared generation changed.
Cargo.toml | 1 +
pbs-config/Cargo.toml | 1 +
pbs-config/src/token_shadow.rs | 157 ++++++++++++++++++++++++++++++++-
3 files changed, 158 insertions(+), 1 deletion(-)
diff --git a/Cargo.toml b/Cargo.toml
index 1aa57ae5..821b63b7 100644
--- a/Cargo.toml
+++ b/Cargo.toml
@@ -143,6 +143,7 @@ nom = "7"
num-traits = "0.2"
once_cell = "1.3.1"
openssl = "0.10.40"
+parking_lot = "0.12"
percent-encoding = "2.1"
pin-project-lite = "0.2"
regex = "1.5.5"
diff --git a/pbs-config/Cargo.toml b/pbs-config/Cargo.toml
index 74afb3c6..eb81ce00 100644
--- a/pbs-config/Cargo.toml
+++ b/pbs-config/Cargo.toml
@@ -13,6 +13,7 @@ libc.workspace = true
nix.workspace = true
once_cell.workspace = true
openssl.workspace = true
+parking_lot.workspace = true
regex.workspace = true
serde.workspace = true
serde_json.workspace = true
diff --git a/pbs-config/src/token_shadow.rs b/pbs-config/src/token_shadow.rs
index 640fabbf..fa84aee5 100644
--- a/pbs-config/src/token_shadow.rs
+++ b/pbs-config/src/token_shadow.rs
@@ -1,6 +1,8 @@
use std::collections::HashMap;
+use std::sync::LazyLock;
use anyhow::{bail, format_err, Error};
+use parking_lot::RwLock;
use serde::{Deserialize, Serialize};
use serde_json::{from_value, Value};
@@ -13,6 +15,18 @@ use crate::{open_backup_lockfile, BackupLockGuard};
const LOCK_FILE: &str = pbs_buildcfg::configdir!("/token.shadow.lock");
const CONF_FILE: &str = pbs_buildcfg::configdir!("/token.shadow");
+/// Global in-memory cache for successfully verified API token secrets.
+/// The cache stores plain text secrets for token Authids that have already been
+/// verified against the hashed values in `token.shadow`. This allows for cheap
+/// subsequent authentications for the same token+secret combination, avoiding
+/// recomputing the password hash on every request.
+static TOKEN_SECRET_CACHE: LazyLock<RwLock<ApiTokenSecretCache>> = LazyLock::new(|| {
+ RwLock::new(ApiTokenSecretCache {
+ secrets: HashMap::new(),
+ shared_gen: 0,
+ })
+});
+
#[derive(Serialize, Deserialize)]
#[serde(rename_all = "kebab-case")]
/// ApiToken id / secret pair
@@ -54,9 +68,27 @@ pub fn verify_secret(tokenid: &Authid, secret: &str) -> Result<(), Error> {
bail!("not an API token ID");
}
+ // Fast path
+ if cache_try_secret_matches(tokenid, secret) {
+ return Ok(());
+ }
+
+ // Slow path
+ // First, capture the shared generation before doing the hash verification.
+ let gen_before = token_shadow_shared_gen();
+
let data = read_file()?;
match data.get(tokenid) {
- Some(hashed_secret) => proxmox_sys::crypt::verify_crypt_pw(secret, hashed_secret),
+ Some(hashed_secret) => {
+ proxmox_sys::crypt::verify_crypt_pw(secret, hashed_secret)?;
+
+ // Try to cache only if nothing changed while verifying the secret.
+ if let Some(gen) = gen_before {
+ cache_try_insert_secret(tokenid.clone(), secret.to_owned(), gen);
+ }
+
+ Ok(())
+ }
None => bail!("invalid API token"),
}
}
@@ -82,6 +114,8 @@ fn set_secret(tokenid: &Authid, secret: &str) -> Result<(), Error> {
data.insert(tokenid.clone(), hashed_secret);
write_file(data)?;
+ apply_api_mutation(tokenid, Some(secret));
+
Ok(())
}
@@ -97,5 +131,126 @@ pub fn delete_secret(tokenid: &Authid) -> Result<(), Error> {
data.remove(tokenid);
write_file(data)?;
+ apply_api_mutation(tokenid, None);
+
Ok(())
}
+
+struct ApiTokenSecretCache {
+ /// Keys are token Authids, values are the corresponding plain text secrets.
+ /// Entries are added after a successful on-disk verification in
+ /// `verify_secret` or when a new token secret is generated by
+ /// `generate_and_set_secret`. Used to avoid repeated
+ /// password-hash computation on subsequent authentications.
+ secrets: HashMap<Authid, CachedSecret>,
+ /// Shared generation to detect mutations of the underlying token.shadow file.
+ shared_gen: usize,
+}
+
+/// Cached secret.
+struct CachedSecret {
+ secret: String,
+}
+
+fn cache_try_insert_secret(tokenid: Authid, secret: String, shared_gen_before: usize) {
+ let Some(mut cache) = TOKEN_SECRET_CACHE.try_write() else {
+ return;
+ };
+
+ let Some(shared_gen_now) = token_shadow_shared_gen() else {
+ return;
+ };
+
+ // If this process missed a generation bump, its cache is stale.
+ if cache.shared_gen != shared_gen_now {
+ invalidate_cache_state(&mut cache);
+ cache.shared_gen = shared_gen_now;
+ }
+
+ // If a mutation happened while we were verifying the secret, do not insert.
+ if shared_gen_now == shared_gen_before {
+ cache.secrets.insert(tokenid, CachedSecret { secret });
+ }
+}
+
+// Tries to match the given token secret against the cached secret.
+// Checks the generation before and after the constant-time compare to avoid a
+// TOCTOU window. If another process rotates/deletes a token while we're validating
+// the cached secret, the generation will change, and we
+// must not trust the cache for this request.
+fn cache_try_secret_matches(tokenid: &Authid, secret: &str) -> bool {
+ let Some(cache) = TOKEN_SECRET_CACHE.try_read() else {
+ return false;
+ };
+ let Some(entry) = cache.secrets.get(tokenid) else {
+ return false;
+ };
+
+ let cache_gen = cache.shared_gen;
+
+ let Some(gen1) = token_shadow_shared_gen() else {
+ return false;
+ };
+ if gen1 != cache_gen {
+ return false;
+ }
+
+ let eq = openssl::memcmp::eq(entry.secret.as_bytes(), secret.as_bytes());
+
+ let Some(gen2) = token_shadow_shared_gen() else {
+ return false;
+ };
+
+ eq && gen2 == cache_gen
+}
+
+fn apply_api_mutation(tokenid: &Authid, new_secret: Option<&str>) {
+ // Signal cache invalidation to other processes (best-effort).
+ let new_shared_gen = bump_token_shadow_shared_gen();
+
+ let mut cache = TOKEN_SECRET_CACHE.write();
+
+ // If we cannot read/bump the shared generation, we cannot safely trust the cache.
+ let Some(gen) = new_shared_gen else {
+ invalidate_cache_state(&mut cache);
+ cache.shared_gen = 0;
+ return;
+ };
+
+ // Update to the post-mutation generation.
+ cache.shared_gen = gen;
+
+ // Apply the new mutation.
+ match new_secret {
+ Some(secret) => {
+ cache.secrets.insert(
+ tokenid.clone(),
+ CachedSecret {
+ secret: secret.to_owned(),
+ },
+ );
+ }
+ None => {
+ cache.secrets.remove(tokenid);
+ }
+ }
+}
+
+/// Get the current shared generation.
+fn token_shadow_shared_gen() -> Option<usize> {
+ crate::ConfigVersionCache::new()
+ .ok()
+ .map(|cvc| cvc.token_shadow_generation())
+}
+
+/// Bump and return the new shared generation.
+fn bump_token_shadow_shared_gen() -> Option<usize> {
+ crate::ConfigVersionCache::new()
+ .ok()
+ .map(|cvc| cvc.increase_token_shadow_generation() + 1)
+}
+
+/// Invalidates the cache state and only keeps the shared generation.
+fn invalidate_cache_state(cache: &mut ApiTokenSecretCache) {
+ cache.secrets.clear();
+}
--
2.47.3
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
* Re: [pbs-devel] [PATCH proxmox{-backup, , -datacenter-manager} v2 0/7] token-shadow: reduce api token verification overhead
2025-12-17 16:25 14% [pbs-devel] [PATCH proxmox{-backup, , -datacenter-manager} v2 0/7] token-shadow: reduce api token " Samuel Rufinatscha
` (6 preceding siblings ...)
2025-12-17 16:25 17% ` [pbs-devel] [PATCH proxmox-datacenter-manager v2 1/1] docs: document API token-cache TTL effects Samuel Rufinatscha
@ 2025-12-18 11:03 12% ` Samuel Rufinatscha
2026-01-02 16:09 13% ` [pbs-devel] superseded: " Samuel Rufinatscha
8 siblings, 0 replies; 200+ results
From: Samuel Rufinatscha @ 2025-12-18 11:03 UTC (permalink / raw)
To: pbs-devel
It appears we need to switch the cache to a SharedMemory
implementation. I ran additional token invalidation tests, which showed
that regenerating a secret for a given token in the dashboard can take
up to one minute (TTL) to propagate, even though it calls the set_secret
API, which should directly update the cache.
I discussed this with Fabian; the root cause seems to be that token
modifications happen in the privileged daemon, while most regular API
requests are handled by the non-privileged one.
Moving the cache to a SharedMemory implementation should resolve
this. No other changes should be required.
I will send a v3 incorporating this change.
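For context on what switching to a SharedMemory implementation buys: the invalidation signal must be a generation counter visible to both daemons, not a per-process atomic. As a hedged illustration only (the actual v3 uses the existing shared-memory ConfigVersionCache, not a file, and `read_generation`/`bump_generation` are hypothetical names), the protocol can be sketched with a file-backed counter using only the standard library:

```rust
use std::fs;
use std::path::Path;

/// Read the shared generation counter; `None` signals
/// "cannot trust the cache" (missing or unreadable state).
fn read_generation(path: &Path) -> Option<u64> {
    let bytes = fs::read(path).ok()?;
    let arr: [u8; 8] = bytes.get(..8)?.try_into().ok()?;
    Some(u64::from_le_bytes(arr))
}

/// Bump the counter after a token mutation and return the new value.
/// A real implementation would do this atomically in shared memory;
/// this file-based sketch only shows the protocol.
fn bump_generation(path: &Path) -> Option<u64> {
    let next = read_generation(path).unwrap_or(0) + 1;
    fs::write(path, next.to_le_bytes()).ok()?;
    Some(next)
}
```

Each daemon snapshots the counter before trusting its in-memory cache and bumps it after any token mutation; a mismatch tells other processes to drop their cached secrets.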
On 12/17/25 5:25 PM, Samuel Rufinatscha wrote:
> Hi,
>
> this series improves the performance of token-based API authentication
> in PBS (pbs-config) and in PDM (underlying proxmox-access-control
> crate), addressing the API token verification hotspot reported in our
> bugtracker #7017 [1].
>
> When profiling PBS /status endpoint with cargo flamegraph [2],
> token-based authentication showed up as a dominant hotspot via
> proxmox_sys::crypt::verify_crypt_pw. Applying this series removes that
> path from the hot section of the flamegraph. The same performance issue
> was measured [3] for PDM. PDM uses the underlying shared
> proxmox-access-control library for token handling, which is a
> factored-out version of the token.shadow handling code from PBS.
>
> While this series fixes the immediate performance issue both in PBS
> (pbs-config) and in the shared proxmox-access-control crate used by
> PDM, PBS should ideally be refactored, in a separate effort, to use
> proxmox-access-control for token handling instead of its local
> implementation.
>
> Problem
>
> For token-based API requests, both PBS’s pbs-config token.shadow
> handling and PDM proxmox-access-control’s token.shadow handling
> currently:
>
> 1. read the token.shadow file on each request
> 2. deserialize it into a HashMap<Authid, String>
> 3. run password hash verification via
> proxmox_sys::crypt::verify_crypt_pw for the provided token secret
>
> Under load, this results in significant CPU usage spent in repeated
> password hash computations for the same token+secret pairs. The
> attached flamegraphs for PBS [2] and PDM [3] show
> proxmox_sys::crypt::verify_crypt_pw dominating the hot path.
>
> Approach
>
> The goal is to reduce the cost of token-based authentication while
> preserving the existing token handling semantics (including detecting
> manual edits to token.shadow) and staying consistent between PBS
> (pbs-config) and PDM (proxmox-access-control). For both codebases, the
> series proposes the following approach:
>
> 1. Introduce an in-memory cache for verified token secrets
> 2. Invalidate the cache when token.shadow changes (detect manual edits)
> 3. Control metadata checks with a TTL window
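The steps above can be sketched as a single verify path. This is an illustrative outline only: `SecretCache` and `slow_verify` are hypothetical stand-ins, the real code keys the cache by `Authid` and calls `proxmox_sys::crypt::verify_crypt_pw` (and uses a constant-time compare on the fast path):

```rust
use std::collections::HashMap;

/// Illustrative cache of verified (token id -> plain secret) pairs.
struct SecretCache {
    secrets: HashMap<String, String>,
}

/// Stand-in for the expensive password-hash verification
/// (the real code calls proxmox_sys::crypt::verify_crypt_pw).
fn slow_verify(secret: &str, stored: &str) -> bool {
    secret == stored // placeholder only
}

/// Fast path: compare against the cached plain secret;
/// slow path: run the expensive check, then populate the cache.
fn verify(
    cache: &mut SecretCache,
    shadow: &HashMap<String, String>,
    id: &str,
    secret: &str,
) -> bool {
    if cache.secrets.get(id).map(String::as_str) == Some(secret) {
        return true; // cache hit, no hash computation
    }
    match shadow.get(id) {
        Some(stored) if slow_verify(secret, stored) => {
            cache.secrets.insert(id.to_owned(), secret.to_owned());
            true
        }
        _ => false,
    }
}
```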
>
> Testing
>
> *PBS (pbs-config)*
>
> To verify the effect in PBS, I:
> 1. Set up test environment based on latest PBS ISO, installed Rust
> toolchain, cloned proxmox-backup repository to use with cargo
> flamegraph. Reproduced bug #6049 [1] by profiling the /status
> endpoint with token-based authentication using cargo flamegraph [2].
> The flamegraph showed proxmox_sys::crypt::verify_crypt_pw is the
> hotspot.
> 2. Built PBS with pbs-config patches and re-ran the same workload and
> profiling setup.
> 3. Confirmed that the proxmox_sys::crypt::verify_crypt_pw path no
> longer appears in the hot section of the flamegraph. CPU usage is
> now dominated by TLS overhead.
> 4. Functionally verified that:
> * token-based API authentication still works for valid tokens
> * invalid secrets are rejected as before
> * generating a new token secret via dashboard works and
> authenticates correctly
>
> *PDM (proxmox-access-control)*
>
> To verify the effect in PDM, I followed a similar testing approach.
> Instead of /status, I profiled the /version endpoint with cargo
> flamegraph [2] and verified that the token hashing path disappears [4]
> from the hot section after applying the proxmox-access-control patches.
>
> Functionally I verified that:
> * token-based API authentication still works for valid tokens
> * invalid secrets are rejected as before
> * generating a new token secret via dashboard works and
> authenticates correctly
>
> Benchmarks:
>
> Two different benchmarks have been run to measure caching effects
> and RwLock contention:
>
> (1) Requests per second for PBS /status endpoint (E2E)
> (2) RwLock contention for token create/delete under
> heavy parallel token-authenticated readers; compared
> std::sync::RwLock and parking_lot::RwLock.
>
> (1) benchmarked parallel token auth requests for
> /status?verbose=0 on top of the datastore lookup cache series [5]
> to check throughput impact. With datastores=1, repeat=5000, parallel=16
> this series gives ~179 req/s compared to ~65 req/s without it.
> This is a ~2.75x improvement.
>
> (2) benchmarked token create/delete operations under heavy load of
> token-authenticated requests on top of the datastore lookup cache [5]
> series. This benchmark was run using a 64-parallel
> token-auth flood (200k requests) against
> /admin/datastore/ds0001/status?verbose=0 while executing 50 token
> create + 50 token delete operations. After the series I saw the
> following e2e API latencies:
>
> parking_lot::RwLock
> - create avg ~27ms (p95 ~28ms) vs ~46ms (p95 ~50ms) baseline
> - delete avg ~17ms (p95 ~19ms) vs ~33ms (p95 ~35ms) baseline
>
> std::sync::RwLock
> - create avg ~27ms (p95 ~28ms)
> - delete avg ~17ms (p95 ~19ms)
>
> It appears that both RwLock implementations perform similarly
> for this workload. The parking_lot version was chosen for its
> added fairness guarantees.
>
> Patch summary
>
> pbs-config:
>
> 0001 – pbs-config: cache verified API token secrets
> Adds an in-memory cache keyed by Authid that stores plain text token
> secrets after a successful verification or generation and uses
> openssl’s memcmp constant-time for comparison.
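The constant-time comparison matters because an early-exit byte compare would leak, via timing, how many leading bytes of a guessed secret are correct. The series uses openssl::memcmp::eq; purely to illustrate the idea, a standard-library-only sketch (`ct_eq` is a hypothetical helper, not part of the patch) looks like:

```rust
/// Constant-time byte comparison sketch: XOR-fold every byte so the
/// running time does not depend on where the first mismatch occurs.
fn ct_eq(a: &[u8], b: &[u8]) -> bool {
    if a.len() != b.len() {
        return false;
    }
    let mut diff = 0u8;
    for (x, y) in a.iter().zip(b.iter()) {
        diff |= x ^ y;
    }
    diff == 0
}
```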
>
> 0002 – pbs-config: invalidate token-secret cache on token.shadow
> changes
> Tracks token.shadow mtime and length and clears the in-memory
> cache when the file changes.
>
> 0003 – pbs-config: add TTL window to token-secret cache
> Introduces a TTL (TOKEN_SECRET_CACHE_TTL_SECS, default 60) for metadata
> checks so that fs::metadata is only called periodically.
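Patches 0002 and 0003 together amount to a TTL-gated freshness check: track a (mtime, length) fingerprint of token.shadow, but only stat the file once per TTL window. As a hedged sketch under those assumptions (`FreshnessState` and `needs_invalidation` are hypothetical names; the series uses TOKEN_SECRET_CACHE_TTL_SECS = 60):

```rust
use std::fs;
use std::path::Path;
use std::time::{Duration, Instant, SystemTime};

/// Fingerprint of token.shadow used to detect external edits.
#[derive(Clone, Copy, PartialEq)]
struct FileId {
    mtime: SystemTime,
    len: u64,
}

struct FreshnessState {
    last_check: Option<Instant>,
    last_id: Option<FileId>,
}

/// Returns true when the cache must be cleared because the file's
/// fingerprint changed; within the TTL window the fs::metadata call
/// is skipped entirely.
fn needs_invalidation(state: &mut FreshnessState, path: &Path, ttl: Duration) -> bool {
    if let Some(t) = state.last_check {
        if t.elapsed() < ttl {
            return false; // within TTL: trust the last fingerprint
        }
    }
    state.last_check = Some(Instant::now());
    let id = fs::metadata(path).ok().map(|m| FileId {
        mtime: m.modified().unwrap_or(SystemTime::UNIX_EPOCH),
        len: m.len(),
    });
    let changed = id != state.last_id;
    state.last_id = id;
    changed
}
```

The trade-off documented in patch 0007 follows directly: a manual edit to token.shadow may take up to one TTL to be noticed, while API-driven set/delete invalidates immediately.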
>
> proxmox-access-control:
>
> 0004 – access-control: cache verified API token secrets
> Mirrors PBS PATCH 0001.
>
> 0005 – access-control: invalidate token-secret cache on token.shadow changes
> Mirrors PBS PATCH 0002.
>
> 0006 – access-control: add TTL window to token-secret cache
> Mirrors PBS PATCH 0003.
>
> proxmox-datacenter-manager:
>
> 0007 – docs: document API token-cache TTL effects
> Documents the effects of the TTL window on token.shadow edits
>
> Changes since v1
>
> - (refactor) Switched cache initialization to LazyLock
> - (perf) Use parking_lot::RwLock and best-effort cache access on the
> read/refresh path (try_read/try_write) to avoid lock contention
> - (doc) Document TTL-delayed effect of manual token.shadow edits
> - (fix) Add generation guards (API_MUTATION_GENERATION +
> FILE_GENERATION) to prevent caching across concurrent set/delete and
> external edits
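The generation guard against such "zombie inserts" works by snapshotting a mutation counter before the slow hash verification and only caching the result if no set/delete happened in between. A hedged, standard-library-only sketch (`note_mutation` and `guarded_insert` are hypothetical names; the patch pairs API_MUTATION_GENERATION with the cache's RwLock):

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicU64, Ordering};

/// Bumped by set/delete to invalidate in-flight cache inserts.
static MUTATION_GEN: AtomicU64 = AtomicU64::new(0);

/// Called by set/delete before it returns.
fn note_mutation() {
    MUTATION_GEN.fetch_add(1, Ordering::AcqRel);
}

/// Insert the verified secret only if no mutation raced with the
/// (slow) hash verification; otherwise the entry could outlive a
/// concurrent delete. Returns whether the insert happened.
fn guarded_insert(
    cache: &mut HashMap<String, String>,
    id: String,
    secret: String,
    gen_before: u64,
) -> bool {
    if MUTATION_GEN.load(Ordering::Acquire) == gen_before {
        cache.insert(id, secret);
        true
    } else {
        false
    }
}
```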
>
> Please see the patch specific changelogs for more details.
>
> Thanks for considering this patch series, I look forward to your
> feedback.
>
> Best,
> Samuel Rufinatscha
>
> [1] https://bugzilla.proxmox.com/show_bug.cgi?id=7017
> [2] attachment 1767 [1]: Flamegraph showing the proxmox_sys::crypt::verify_crypt_pw stack
> [3] attachment 1794 [1]: Flamegraph PDM baseline
> [4] attachment 1795 [1]: Flamegraph PDM patched
> [5] https://bugzilla.proxmox.com/show_bug.cgi?id=6049
>
> proxmox-backup:
>
> Samuel Rufinatscha (3):
> pbs-config: cache verified API token secrets
> pbs-config: invalidate token-secret cache on token.shadow changes
> pbs-config: add TTL window to token secret cache
>
> Cargo.toml | 1 +
> docs/user-management.rst | 4 +
> pbs-config/Cargo.toml | 1 +
> pbs-config/src/token_shadow.rs | 238 ++++++++++++++++++++++++++++++++-
> 4 files changed, 243 insertions(+), 1 deletion(-)
>
>
> proxmox:
>
> Samuel Rufinatscha (3):
> proxmox-access-control: cache verified API token secrets
> proxmox-access-control: invalidate token-secret cache on token.shadow
> changes
> proxmox-access-control: add TTL window to token secret cache
>
> Cargo.toml | 1 +
> proxmox-access-control/Cargo.toml | 1 +
> proxmox-access-control/src/token_shadow.rs | 238 ++++++++++++++++++++-
> 3 files changed, 239 insertions(+), 1 deletion(-)
>
>
> proxmox-datacenter-manager:
>
> Samuel Rufinatscha (1):
> docs: document API token-cache TTL effects
>
> docs/access-control.rst | 3 +++
> 1 file changed, 3 insertions(+)
>
>
> Summary over all repositories:
> 8 files changed, 485 insertions(+), 2 deletions(-)
>
* [pbs-devel] superseded: [PATCH proxmox{-backup, } 0/6] Reduce token.shadow verification overhead
2025-12-05 13:25 15% [pbs-devel] [PATCH proxmox{-backup, } 0/6] Reduce token.shadow verification overhead Samuel Rufinatscha
` (6 preceding siblings ...)
2025-12-05 14:06 5% ` [pbs-devel] [PATCH proxmox{-backup, } 0/6] Reduce token.shadow verification overhead Shannon Sterz
@ 2025-12-17 16:27 13% ` Samuel Rufinatscha
7 siblings, 0 replies; 200+ results
From: Samuel Rufinatscha @ 2025-12-17 16:27 UTC (permalink / raw)
To: pbs-devel
https://lore.proxmox.com/pbs-devel/20251217162520.486520-1-s.rufinatscha@proxmox.com/T/#t
On 12/5/25 2:25 PM, Samuel Rufinatscha wrote:
> Hi,
>
> this series improves the performance of token-based API authentication
> in PBS (pbs-config) and in PDM (underlying proxmox-access-control
> crate), addressing the API token verification hotspot reported in our
> bugtracker #6049 [1].
>
> When profiling PBS /status endpoint with cargo flamegraph [2],
> token-based authentication showed up as a dominant hotspot via
> proxmox_sys::crypt::verify_crypt_pw. Applying this series removes that
> path from the hot section of the flamegraph. The same performance issue
> was measured [3] for PDM. PDM uses the underlying shared
> proxmox-access-control library for token handling, which is a
> factored-out version of the token.shadow handling code from PBS.
>
> While this series fixes the immediate performance issue both in PBS
> (pbs-config) and in the shared proxmox-access-control crate used by
> PDM, PBS should ideally be refactored, in a separate effort, to use
> proxmox-access-control for token handling instead of its local
> implementation.
>
> Problem
>
> For token-based API requests, both PBS’s pbs-config token.shadow
> handling and PDM proxmox-access-control’s token.shadow handling
> currently:
>
> 1. read the token.shadow file on each request
> 2. deserialize it into a HashMap<Authid, String>
> 3. run password hash verification via
> proxmox_sys::crypt::verify_crypt_pw for the provided token secret
>
> Under load, this results in significant CPU usage spent in repeated
> password hash computations for the same token+secret pairs. The
> attached flamegraphs for PBS [2] and PDM [3] show
> proxmox_sys::crypt::verify_crypt_pw dominating the hot path.
>
> Approach
>
> The goal is to reduce the cost of token-based authentication while
> preserving the existing token handling semantics (including detecting
> manual edits to token.shadow) and staying consistent between PBS
> (pbs-config) and PDM (proxmox-access-control). For both codebases, the
> series proposes the following approach:
>
> 1. Introduce an in-memory cache for verified token secrets
> 2. Invalidate the cache when token.shadow changes (detect manual edits)
> 3. Control metadata checks with a TTL window
>
> Testing
>
> *PBS (pbs-config)*
>
> To verify the effect in PBS, I:
> 1. Set up test environment based on latest PBS ISO, installed Rust
> toolchain, cloned proxmox-backup repository to use with cargo
> flamegraph. Reproduced bug #6049 [1] by profiling the /status
> endpoint with token-based authentication using cargo flamegraph [2].
> The flamegraph showed proxmox_sys::crypt::verify_crypt_pw is the
> hotspot.
> 2. Built PBS with pbs-config patches and re-ran the same workload and
> profiling setup.
> 3. Confirmed that the proxmox_sys::crypt::verify_crypt_pw path no
> longer appears in the hot section of the flamegraph. CPU usage is
> now dominated by TLS overhead.
> 4. Functionally verified that:
> * token-based API authentication still works for valid tokens
> * invalid secrets are rejected as before
> * generating a new token secret via dashboard works and
> authenticates correctly
>
> *PDM (proxmox-access-control)*
>
> To verify the effect in PDM, I followed a similar testing approach.
> Instead of /status, I profiled the /version endpoint with cargo
> flamegraph [3] and verified that the token hashing path disappears
> from the hot section after applying the proxmox-access-control patches.
>
> Functionally I verified that:
> * token-based API authentication still works for valid tokens
> * invalid secrets are rejected as before
> * generating a new token secret via dashboard works and
> authenticates correctly
>
> Patch summary
>
> pbs-config:
>
> 0001 – pbs-config: cache verified API token secrets
> Adds an in-memory cache keyed by Authid that stores plain text token
> secrets after a successful verification or generation and uses
> openssl’s memcmp constant-time for comparison.
>
> 0002 – pbs-config: invalidate token-secret cache on token.shadow changes
> Tracks token.shadow mtime and length and clears the in-memory cache
> when the file changes.
>
> 0003 – pbs-config: add TTL window to token-secret cache
> Introduces a TTL (TOKEN_SECRET_CACHE_TTL_SECS, default 60) for metadata checks so
> that fs::metadata is only called periodically.
>
> proxmox-access-control:
>
> 0004 – access-control: cache verified API token secrets
> Mirrors PBS patch 0001.
>
> 0005 – access-control: invalidate token-secret cache on token.shadow changes
> Mirrors PBS patch 0002.
>
> 0006 – access-control: add TTL window to token-secret cache
> Mirrors PBS patch 0003.
>
> Thanks for considering this patch series, I look forward to your
> feedback.
>
> Best,
> Samuel Rufinatscha
>
> [1] https://bugzilla.proxmox.com/show_bug.cgi?id=6049
> [2] Flamegraph illustrating the `proxmox_sys::crypt::verify_crypt_pw`
> hotspot before this series (attached to [1])
>
> proxmox-backup:
>
> Samuel Rufinatscha (3):
> pbs-config: cache verified API token secrets
> pbs-config: invalidate token-secret cache on token.shadow changes
> pbs-config: add TTL window to token secret cache
>
> pbs-config/src/token_shadow.rs | 109 ++++++++++++++++++++++++++++++++-
> 1 file changed, 108 insertions(+), 1 deletion(-)
>
>
> proxmox:
>
> Samuel Rufinatscha (3):
> proxmox-access-control: cache verified API token secrets
> proxmox-access-control: invalidate token-secret cache on token.shadow
> changes
> proxmox-access-control: add TTL window to token secret cache
>
> proxmox-access-control/src/token_shadow.rs | 108 ++++++++++++++++++++-
> 1 file changed, 107 insertions(+), 1 deletion(-)
>
>
> Summary over all repositories:
> 2 files changed, 215 insertions(+), 2 deletions(-)
>
* [pbs-devel] [PATCH proxmox-backup v2 1/3] pbs-config: cache verified API token secrets
2025-12-17 16:25 14% [pbs-devel] [PATCH proxmox{-backup, , -datacenter-manager} v2 0/7] token-shadow: reduce api token " Samuel Rufinatscha
@ 2025-12-17 16:25 13% ` Samuel Rufinatscha
2025-12-17 16:25 12% ` [pbs-devel] [PATCH proxmox-backup v2 2/3] pbs-config: invalidate token-secret cache on token.shadow changes Samuel Rufinatscha
` (7 subsequent siblings)
8 siblings, 0 replies; 200+ results
From: Samuel Rufinatscha @ 2025-12-17 16:25 UTC (permalink / raw)
To: pbs-devel
Currently, every token-based API request reads the token.shadow file and
runs the expensive password hash verification for the given token
secret. This shows up as a hotspot in /status profiling (see
bug #7017 [1]).
This patch introduces an in-memory cache of successfully verified token
secrets. Subsequent requests for the same token+secret combination only
perform a comparison using openssl::memcmp::eq and avoid re-running the
password hash. The cache is updated when a token secret is set and
cleared when a token is deleted. Note, this does NOT include manual
config changes, which will be covered in a subsequent patch.
This patch partly fixes bug #7017 [1].
[1] https://bugzilla.proxmox.com/show_bug.cgi?id=7017
Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
---
Changes from v1 to v2:
- Replace OnceCell with LazyLock, and std::sync::RwLock with
parking_lot::RwLock.
- Add API_MUTATION_GENERATION and guard cache inserts
to prevent “zombie inserts” across concurrent set/delete.
- Refactor cache operations into cache_try_secret_matches,
cache_try_insert_secret, and centralize write-side behavior in
apply_api_mutation.
- Switch fast-path cache access to try_read/try_write (best-effort).
Cargo.toml | 1 +
pbs-config/Cargo.toml | 1 +
pbs-config/src/token_shadow.rs | 94 +++++++++++++++++++++++++++++++++-
3 files changed, 95 insertions(+), 1 deletion(-)
diff --git a/Cargo.toml b/Cargo.toml
index ff143932..231cdca8 100644
--- a/Cargo.toml
+++ b/Cargo.toml
@@ -143,6 +143,7 @@ nom = "7"
num-traits = "0.2"
once_cell = "1.3.1"
openssl = "0.10.40"
+parking_lot = "0.12"
percent-encoding = "2.1"
pin-project-lite = "0.2"
regex = "1.5.5"
diff --git a/pbs-config/Cargo.toml b/pbs-config/Cargo.toml
index 74afb3c6..eb81ce00 100644
--- a/pbs-config/Cargo.toml
+++ b/pbs-config/Cargo.toml
@@ -13,6 +13,7 @@ libc.workspace = true
nix.workspace = true
once_cell.workspace = true
openssl.workspace = true
+parking_lot.workspace = true
regex.workspace = true
serde.workspace = true
serde_json.workspace = true
diff --git a/pbs-config/src/token_shadow.rs b/pbs-config/src/token_shadow.rs
index 640fabbf..ce845e8d 100644
--- a/pbs-config/src/token_shadow.rs
+++ b/pbs-config/src/token_shadow.rs
@@ -1,6 +1,9 @@
use std::collections::HashMap;
+use std::sync::atomic::{AtomicU64, Ordering};
+use std::sync::LazyLock;
use anyhow::{bail, format_err, Error};
+use parking_lot::RwLock;
use serde::{Deserialize, Serialize};
use serde_json::{from_value, Value};
@@ -13,6 +16,19 @@ use crate::{open_backup_lockfile, BackupLockGuard};
const LOCK_FILE: &str = pbs_buildcfg::configdir!("/token.shadow.lock");
const CONF_FILE: &str = pbs_buildcfg::configdir!("/token.shadow");
+/// Global in-memory cache for successfully verified API token secrets.
+/// The cache stores plain text secrets for token Authids that have already been
+/// verified against the hashed values in `token.shadow`. This allows for cheap
+/// subsequent authentications for the same token+secret combination, avoiding
+/// recomputing the password hash on every request.
+static TOKEN_SECRET_CACHE: LazyLock<RwLock<ApiTokenSecretCache>> = LazyLock::new(|| {
+ RwLock::new(ApiTokenSecretCache {
+ secrets: HashMap::new(),
+ })
+});
+/// API mutation generation (set/delete)
+static API_MUTATION_GENERATION: AtomicU64 = AtomicU64::new(0);
+
#[derive(Serialize, Deserialize)]
#[serde(rename_all = "kebab-case")]
/// ApiToken id / secret pair
@@ -54,9 +70,24 @@ pub fn verify_secret(tokenid: &Authid, secret: &str) -> Result<(), Error> {
bail!("not an API token ID");
}
+ // Fast path
+ if cache_try_secret_matches(tokenid, secret) {
+ return Ok(());
+ }
+
+ // Slow path snapshot (before expensive work)
+ let api_gen_before = API_MUTATION_GENERATION.load(Ordering::Acquire);
+
let data = read_file()?;
match data.get(tokenid) {
- Some(hashed_secret) => proxmox_sys::crypt::verify_crypt_pw(secret, hashed_secret),
+ Some(hashed_secret) => {
+ proxmox_sys::crypt::verify_crypt_pw(secret, hashed_secret)?;
+
+ // Try to cache only if nothing changed while we verified
+ cache_try_insert_secret(tokenid.clone(), secret.to_owned(), api_gen_before);
+
+ Ok(())
+ }
None => bail!("invalid API token"),
}
}
@@ -82,6 +113,8 @@ fn set_secret(tokenid: &Authid, secret: &str) -> Result<(), Error> {
data.insert(tokenid.clone(), hashed_secret);
write_file(data)?;
+ apply_api_mutation(tokenid, Some(secret));
+
Ok(())
}
@@ -97,5 +130,64 @@ pub fn delete_secret(tokenid: &Authid) -> Result<(), Error> {
data.remove(tokenid);
write_file(data)?;
+ apply_api_mutation(tokenid, None);
+
Ok(())
}
+
+struct ApiTokenSecretCache {
+ /// Keys are token Authids, values are the corresponding plain text secrets.
+ /// Entries are added after a successful on-disk verification in
+ /// `verify_secret` or when a new token secret is generated by
+ /// `generate_and_set_secret`. Used to avoid repeated
+ /// password-hash computation on subsequent authentications.
+ secrets: HashMap<Authid, CachedSecret>,
+}
+
+/// Cached secret.
+struct CachedSecret {
+ secret: String,
+}
+
+fn cache_try_insert_secret(tokenid: Authid, secret: String, api_gen_snapshot: u64) {
+ let Some(mut cache) = TOKEN_SECRET_CACHE.try_write() else {
+ return;
+ };
+
+ if API_MUTATION_GENERATION.load(Ordering::Acquire) == api_gen_snapshot {
+ cache.secrets.insert(tokenid, CachedSecret { secret });
+ }
+}
+
+fn cache_try_secret_matches(tokenid: &Authid, secret: &str) -> bool {
+ let Some(cache) = TOKEN_SECRET_CACHE.try_read() else {
+ return false;
+ };
+ let Some(entry) = cache.secrets.get(tokenid) else {
+ return false;
+ };
+
+ openssl::memcmp::eq(entry.secret.as_bytes(), secret.as_bytes())
+}
+
+fn apply_api_mutation(tokenid: &Authid, new_secret: Option<&str>) {
+ // Prevent in-flight verify_secret() from caching results across a mutation.
+ API_MUTATION_GENERATION.fetch_add(1, Ordering::AcqRel);
+
+ // Mutations must be reflected immediately once set/delete returns.
+ let mut cache = TOKEN_SECRET_CACHE.write();
+
+ match new_secret {
+ Some(secret) => {
+ cache.secrets.insert(
+ tokenid.clone(),
+ CachedSecret {
+ secret: secret.to_owned(),
+ },
+ );
+ }
+ None => {
+ cache.secrets.remove(tokenid);
+ }
+ }
+}
--
2.47.3
* [pbs-devel] [PATCH proxmox v2 1/3] proxmox-access-control: cache verified API token secrets
2025-12-17 16:25 14% [pbs-devel] [PATCH proxmox{-backup, , -datacenter-manager} v2 0/7] token-shadow: reduce api token " Samuel Rufinatscha
` (2 preceding siblings ...)
2025-12-17 16:25 14% ` [pbs-devel] [PATCH proxmox-backup v2 3/3] pbs-config: add TTL window to token secret cache Samuel Rufinatscha
@ 2025-12-17 16:25 13% ` Samuel Rufinatscha
2025-12-17 16:25 12% ` [pbs-devel] [PATCH proxmox v2 2/3] proxmox-access-control: invalidate token-secret cache on token.shadow changes Samuel Rufinatscha
` (4 subsequent siblings)
8 siblings, 0 replies; 200+ results
From: Samuel Rufinatscha @ 2025-12-17 16:25 UTC (permalink / raw)
To: pbs-devel
Currently, every token-based API request reads the token.shadow file and
runs the expensive password hash verification for the given token
secret. This issue was first observed while profiling the PBS
/status endpoint (see bug #7017 [1]) and applies to the factored-out
proxmox_access_control token_shadow implementation as well.
This patch introduces an in-memory cache of successfully verified token
secrets. Subsequent requests for the same token+secret combination only
perform a comparison using openssl::memcmp::eq and avoid re-running the
password hash. The cache is updated when a token secret is set and
cleared when a token is deleted. Note that this does NOT include manual
config changes, which will be covered in a subsequent patch.
This patch partly fixes bug #7017 [1].
[1] https://bugzilla.proxmox.com/show_bug.cgi?id=7017
Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
---
Changes from v1 to v2:
- Replace OnceCell with LazyLock, and std::sync::RwLock with
parking_lot::RwLock.
- Add API_MUTATION_GENERATION and guard cache inserts
to prevent “zombie inserts” across concurrent set/delete.
- Refactor cache operations into cache_try_secret_matches,
cache_try_insert_secret, and centralize write-side behavior in
apply_api_mutation.
- Switch fast-path cache access to try_read/try_write (best-effort).
Cargo.toml | 1 +
proxmox-access-control/Cargo.toml | 1 +
proxmox-access-control/src/token_shadow.rs | 94 +++++++++++++++++++++-
3 files changed, 95 insertions(+), 1 deletion(-)
diff --git a/Cargo.toml b/Cargo.toml
index 27a69afa..59a2ec93 100644
--- a/Cargo.toml
+++ b/Cargo.toml
@@ -112,6 +112,7 @@ native-tls = "0.2"
nix = "0.29"
openssl = "0.10"
pam-sys = "0.5"
+parking_lot = "0.12"
percent-encoding = "2.1"
pin-utils = "0.1.0"
proc-macro2 = "1.0"
diff --git a/proxmox-access-control/Cargo.toml b/proxmox-access-control/Cargo.toml
index ec189664..1de2842c 100644
--- a/proxmox-access-control/Cargo.toml
+++ b/proxmox-access-control/Cargo.toml
@@ -16,6 +16,7 @@ anyhow.workspace = true
const_format.workspace = true
nix = { workspace = true, optional = true }
openssl = { workspace = true, optional = true }
+parking_lot.workspace = true
regex.workspace = true
hex = { workspace = true, optional = true }
serde.workspace = true
diff --git a/proxmox-access-control/src/token_shadow.rs b/proxmox-access-control/src/token_shadow.rs
index c586d834..c0285b62 100644
--- a/proxmox-access-control/src/token_shadow.rs
+++ b/proxmox-access-control/src/token_shadow.rs
@@ -1,6 +1,9 @@
use std::collections::HashMap;
+use std::sync::atomic::{AtomicU64, Ordering};
+use std::sync::LazyLock;
use anyhow::{bail, format_err, Error};
+use parking_lot::RwLock;
use serde_json::{from_value, Value};
use proxmox_auth_api::types::Authid;
@@ -8,6 +11,19 @@ use proxmox_product_config::{open_api_lockfile, replace_config, ApiLockGuard};
use crate::init::impl_feature::{token_shadow, token_shadow_lock};
+/// Global in-memory cache for successfully verified API token secrets.
+/// The cache stores plain text secrets for token Authids that have already been
+/// verified against the hashed values in `token.shadow`. This allows for cheap
+/// subsequent authentications for the same token+secret combination, avoiding
+/// recomputing the password hash on every request.
+static TOKEN_SECRET_CACHE: LazyLock<RwLock<ApiTokenSecretCache>> = LazyLock::new(|| {
+ RwLock::new(ApiTokenSecretCache {
+ secrets: HashMap::new(),
+ })
+});
+/// API mutation generation (set/delete)
+static API_MUTATION_GENERATION: AtomicU64 = AtomicU64::new(0);
+
// Get exclusive lock
fn lock_config() -> Result<ApiLockGuard, Error> {
open_api_lockfile(token_shadow_lock(), None, true)
@@ -36,9 +52,24 @@ pub fn verify_secret(tokenid: &Authid, secret: &str) -> Result<(), Error> {
bail!("not an API token ID");
}
+ // Fast path
+ if cache_try_secret_matches(tokenid, secret) {
+ return Ok(());
+ }
+
+ // Slow path snapshot (before expensive work)
+ let api_gen_before = API_MUTATION_GENERATION.load(Ordering::Acquire);
+
let data = read_file()?;
match data.get(tokenid) {
- Some(hashed_secret) => proxmox_sys::crypt::verify_crypt_pw(secret, hashed_secret),
+ Some(hashed_secret) => {
+ proxmox_sys::crypt::verify_crypt_pw(secret, hashed_secret)?;
+
+ // Try to cache only if nothing changed while we verified
+ cache_try_insert_secret(tokenid.clone(), secret.to_owned(), api_gen_before);
+
+ Ok(())
+ }
None => bail!("invalid API token"),
}
}
@@ -56,6 +87,8 @@ pub fn set_secret(tokenid: &Authid, secret: &str) -> Result<(), Error> {
data.insert(tokenid.clone(), hashed_secret);
write_file(data)?;
+ apply_api_mutation(tokenid, Some(secret));
+
Ok(())
}
@@ -71,6 +104,8 @@ pub fn delete_secret(tokenid: &Authid) -> Result<(), Error> {
data.remove(tokenid);
write_file(data)?;
+ apply_api_mutation(tokenid, None);
+
Ok(())
}
@@ -81,3 +116,60 @@ pub fn generate_and_set_secret(tokenid: &Authid) -> Result<String, Error> {
set_secret(tokenid, &secret)?;
Ok(secret)
}
+
+struct ApiTokenSecretCache {
+ /// Keys are token Authids, values are the corresponding plain text secrets.
+ /// Entries are added after a successful on-disk verification in
+ /// `verify_secret` or when a new token secret is generated by
+ /// `generate_and_set_secret`. Used to avoid repeated
+ /// password-hash computation on subsequent authentications.
+ secrets: HashMap<Authid, CachedSecret>,
+}
+
+/// Cached secret.
+struct CachedSecret {
+ secret: String,
+}
+
+fn cache_try_insert_secret(tokenid: Authid, secret: String, api_gen_snapshot: u64) {
+ let Some(mut cache) = TOKEN_SECRET_CACHE.try_write() else {
+ return;
+ };
+
+ if API_MUTATION_GENERATION.load(Ordering::Acquire) == api_gen_snapshot {
+ cache.secrets.insert(tokenid, CachedSecret { secret });
+ }
+}
+
+fn cache_try_secret_matches(tokenid: &Authid, secret: &str) -> bool {
+ let Some(cache) = TOKEN_SECRET_CACHE.try_read() else {
+ return false;
+ };
+ let Some(entry) = cache.secrets.get(tokenid) else {
+ return false;
+ };
+
+ openssl::memcmp::eq(entry.secret.as_bytes(), secret.as_bytes())
+}
+
+fn apply_api_mutation(tokenid: &Authid, new_secret: Option<&str>) {
+ // Prevent in-flight verify_secret() from caching results across a mutation.
+ API_MUTATION_GENERATION.fetch_add(1, Ordering::AcqRel);
+
+ // Mutations must be reflected immediately once set/delete returns.
+ let mut cache = TOKEN_SECRET_CACHE.write();
+
+ match new_secret {
+ Some(secret) => {
+ cache.secrets.insert(
+ tokenid.clone(),
+ CachedSecret {
+ secret: secret.to_owned(),
+ },
+ );
+ }
+ None => {
+ cache.secrets.remove(tokenid);
+ }
+ }
+}
--
2.47.3
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
* [pbs-devel] [PATCH proxmox{-backup, , -datacenter-manager} v2 0/7] token-shadow: reduce api token verification overhead
@ 2025-12-17 16:25 14% Samuel Rufinatscha
2025-12-17 16:25 13% ` [pbs-devel] [PATCH proxmox-backup v2 1/3] pbs-config: cache verified API token secrets Samuel Rufinatscha
` (8 more replies)
0 siblings, 9 replies; 200+ results
From: Samuel Rufinatscha @ 2025-12-17 16:25 UTC (permalink / raw)
To: pbs-devel
Hi,
this series improves the performance of token-based API authentication
in PBS (pbs-config) and in PDM (underlying proxmox-access-control
crate), addressing the API token verification hotspot reported in our
bugtracker #7017 [1].
When profiling PBS /status endpoint with cargo flamegraph [2],
token-based authentication showed up as a dominant hotspot via
proxmox_sys::crypt::verify_crypt_pw. Applying this series removes that
path from the hot section of the flamegraph. The same performance issue
was measured [2] for PDM. PDM uses the underlying shared
proxmox-access-control library for token handling, which is a
factored out version of the token.shadow handling code from PBS.
While this series fixes the immediate performance issue both in PBS
(pbs-config) and in the shared proxmox-access-control crate used by
PDM, PBS should ideally be refactored in a separate effort to use
proxmox-access-control for token handling instead of its local
implementation.
Problem
For token-based API requests, both PBS’s pbs-config token.shadow
handling and PDM proxmox-access-control’s token.shadow handling
currently:
1. read the token.shadow file on each request
2. deserialize it into a HashMap<Authid, String>
3. run password hash verification via
proxmox_sys::crypt::verify_crypt_pw for the provided token secret
Under load, this results in significant CPU usage spent in repeated
password hash computations for the same token+secret pairs. The
attached flamegraphs for PBS [2] and PDM [3] show
proxmox_sys::crypt::verify_crypt_pw dominating the hot path.
Approach
The goal is to reduce the cost of token-based authentication while
preserving the existing token-handling semantics (including detecting
manual edits to token.shadow) and staying consistent between PBS
(pbs-config) and PDM (proxmox-access-control). For both, the series
proposes the following approach:
1. Introduce an in-memory cache for verified token secrets
2. Invalidate the cache when token.shadow changes (detect manual edits)
3. Control metadata checks with a TTL window
Testing
*PBS (pbs-config)*
To verify the effect in PBS, I:
1. Set up test environment based on latest PBS ISO, installed Rust
toolchain, cloned proxmox-backup repository to use with cargo
flamegraph. Reproduced bug #7017 [1] by profiling the /status
endpoint with token-based authentication using cargo flamegraph [2].
The flamegraph showed proxmox_sys::crypt::verify_crypt_pw is the
hotspot.
2. Built PBS with pbs-config patches and re-ran the same workload and
profiling setup.
3. Confirmed that the proxmox_sys::crypt::verify_crypt_pw path no
longer appears in the hot section of the flamegraph. CPU usage is
now dominated by TLS overhead.
4. Functionally verified that:
* token-based API authentication still works for valid tokens
* invalid secrets are rejected as before
* generating a new token secret via dashboard works and
authenticates correctly
*PDM (proxmox-access-control)*
To verify the effect in PDM, I followed a similar testing approach.
Instead of /status, I profiled the /version endpoint with cargo
flamegraph [2] and verified that the token hashing path disappears [4]
from the hot section after applying the proxmox-access-control patches.
Functionally I verified that:
* token-based API authentication still works for valid tokens
* invalid secrets are rejected as before
* generating a new token secret via dashboard works and
authenticates correctly
Benchmarks:
Two different benchmarks have been run to measure caching effects
and RwLock contention:
(1) Requests per second for PBS /status endpoint (E2E)
(2) RwLock contention for token create/delete under
heavy parallel token-authenticated readers; compared
std::sync::RwLock and parking_lot::RwLock.
(1) benchmarked parallel token auth requests for
/status?verbose=0 on top of the datastore lookup cache series [5]
to check throughput impact. With datastores=1, repeat=5000, parallel=16
this series gives ~179 req/s compared to ~65 req/s without it.
This is a ~2.75x improvement.
(2) benchmarked token create/delete operations under heavy load of
token-authenticated requests on top of the datastore lookup cache [5]
series. This benchmark was done by running a 64-parallel
token-auth flood (200k requests) against
/admin/datastore/ds0001/status?verbose=0 while executing 50 token
create + 50 token delete operations. After the series I saw the
following e2e API latencies:
parking_lot::RwLock
- create avg ~27ms (p95 ~28ms) vs ~46ms (p95 ~50ms) baseline
- delete avg ~17ms (p95 ~19ms) vs ~33ms (p95 ~35ms) baseline
std::sync::RwLock
- create avg ~27ms (p95 ~28ms)
- delete avg ~17ms (p95 ~19ms)
Both RwLock implementations appear to perform similarly for this
workload. The parking_lot version was chosen for its added fairness
guarantees.
Patch summary
pbs-config:
0001 – pbs-config: cache verified API token secrets
Adds an in-memory cache keyed by Authid that stores plain text token
secrets after a successful verification or generation, and uses
openssl's constant-time memcmp for comparison.
0002 – pbs-config: invalidate token-secret cache on token.shadow
changes
Tracks token.shadow mtime and length and clears the in-memory
cache when the file changes.
0003 – pbs-config: add TTL window to token-secret cache
Introduces a TTL (TOKEN_SECRET_CACHE_TTL_SECS, default 60) for metadata
checks so that fs::metadata is only called periodically.
proxmox-access-control:
0004 – access-control: cache verified API token secrets
Mirrors PBS PATCH 0001.
0005 – access-control: invalidate token-secret cache on token.shadow changes
Mirrors PBS PATCH 0002.
0006 – access-control: add TTL window to token-secret cache
Mirrors PBS PATCH 0003.
proxmox-datacenter-manager:
0007 – docs: document API token-cache TTL effects
Documents the effects of the TTL window on token.shadow edits
Changes since v1
- (refactor) Switched cache initialization to LazyLock
- (perf) Use parking_lot::RwLock and best-effort cache access on the
read/refresh path (try_read/try_write) to avoid lock contention
- (doc) Document TTL-delayed effect of manual token.shadow edits
- (fix) Add generation guards (API_MUTATION_GENERATION +
FILE_GENERATION) to prevent caching across concurrent set/delete and
external edits
Please see the patch specific changelogs for more details.
Thanks for considering this patch series, I look forward to your
feedback.
Best,
Samuel Rufinatscha
[1] https://bugzilla.proxmox.com/show_bug.cgi?id=7017
[2] attachment 1767 [1]: Flamegraph showing the proxmox_sys::crypt::verify_crypt_pw stack
[3] attachment 1794 [1]: Flamegraph PDM baseline
[4] attachment 1795 [1]: Flamegraph PDM patched
[5] https://bugzilla.proxmox.com/show_bug.cgi?id=6049
proxmox-backup:
Samuel Rufinatscha (3):
pbs-config: cache verified API token secrets
pbs-config: invalidate token-secret cache on token.shadow changes
pbs-config: add TTL window to token secret cache
Cargo.toml | 1 +
docs/user-management.rst | 4 +
pbs-config/Cargo.toml | 1 +
pbs-config/src/token_shadow.rs | 238 ++++++++++++++++++++++++++++++++-
4 files changed, 243 insertions(+), 1 deletion(-)
proxmox:
Samuel Rufinatscha (3):
proxmox-access-control: cache verified API token secrets
proxmox-access-control: invalidate token-secret cache on token.shadow
changes
proxmox-access-control: add TTL window to token secret cache
Cargo.toml | 1 +
proxmox-access-control/Cargo.toml | 1 +
proxmox-access-control/src/token_shadow.rs | 238 ++++++++++++++++++++-
3 files changed, 239 insertions(+), 1 deletion(-)
proxmox-datacenter-manager:
Samuel Rufinatscha (1):
docs: document API token-cache TTL effects
docs/access-control.rst | 3 +++
1 file changed, 3 insertions(+)
Summary over all repositories:
8 files changed, 485 insertions(+), 2 deletions(-)
--
Generated by git-murpp 0.8.1
* [pbs-devel] [PATCH proxmox-backup v2 3/3] pbs-config: add TTL window to token secret cache
2025-12-17 16:25 14% [pbs-devel] [PATCH proxmox{-backup, , -datacenter-manager} v2 0/7] token-shadow: reduce api token " Samuel Rufinatscha
2025-12-17 16:25 13% ` [pbs-devel] [PATCH proxmox-backup v2 1/3] pbs-config: cache verified API token secrets Samuel Rufinatscha
2025-12-17 16:25 12% ` [pbs-devel] [PATCH proxmox-backup v2 2/3] pbs-config: invalidate token-secret cache on token.shadow changes Samuel Rufinatscha
@ 2025-12-17 16:25 14% ` Samuel Rufinatscha
2025-12-17 16:25 13% ` [pbs-devel] [PATCH proxmox v2 1/3] proxmox-access-control: cache verified API token secrets Samuel Rufinatscha
` (5 subsequent siblings)
8 siblings, 0 replies; 200+ results
From: Samuel Rufinatscha @ 2025-12-17 16:25 UTC (permalink / raw)
To: pbs-devel
verify_secret() currently calls refresh_cache_if_file_changed() on every
request, which performs a metadata() call on token.shadow each time.
Under load this adds unnecessary overhead, especially since the file
should rarely change.
This patch introduces a TTL boundary, controlled by
TOKEN_SECRET_CACHE_TTL_SECS. File metadata is only re-loaded once the
TTL has expired. Documents TTL effects.
This patch partly fixes bug #7017 [1].
[1] https://bugzilla.proxmox.com/show_bug.cgi?id=7017
Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
---
Changes from v1 to v2:
- Add TOKEN_SECRET_CACHE_TTL_SECS and last_checked.
- Implement double-checked TTL: check with try_read first; only attempt
refresh with try_write if expired/unknown.
- Fix TTL bookkeeping: update last_checked on the “file unchanged” path
and after API mutations.
- Add documentation warning about TTL-delayed effect of manual
token.shadow edits.
docs/user-management.rst | 4 ++++
pbs-config/src/token_shadow.rs | 42 +++++++++++++++++++++++++++++++++-
2 files changed, 45 insertions(+), 1 deletion(-)
diff --git a/docs/user-management.rst b/docs/user-management.rst
index 41b43d60..32a9ec29 100644
--- a/docs/user-management.rst
+++ b/docs/user-management.rst
@@ -156,6 +156,10 @@ metadata:
Similarly, the ``user delete-token`` subcommand can be used to delete a token
again.
+.. WARNING:: If you manually remove a generated API token from the token secrets
+ file (token.shadow), it can take up to one minute before the token is
+ rejected. This is due to caching.
+
Newly generated API tokens don't have any permissions. Please read the next
section to learn how to set access permissions.
diff --git a/pbs-config/src/token_shadow.rs b/pbs-config/src/token_shadow.rs
index 71553aae..79940fd5 100644
--- a/pbs-config/src/token_shadow.rs
+++ b/pbs-config/src/token_shadow.rs
@@ -11,6 +11,7 @@ use serde::{Deserialize, Serialize};
use serde_json::{from_value, Value};
use proxmox_sys::fs::CreateOptions;
+use proxmox_time::epoch_i64;
use pbs_api_types::Authid;
//use crate::auth;
@@ -29,12 +30,15 @@ static TOKEN_SECRET_CACHE: LazyLock<RwLock<ApiTokenSecretCache>> = LazyLock::new
secrets: HashMap::new(),
file_mtime: None,
file_len: None,
+ last_checked: None,
})
});
/// API mutation generation (set/delete)
static API_MUTATION_GENERATION: AtomicU64 = AtomicU64::new(0);
/// External/manual edits generation for the token.shadow file
static FILE_GENERATION: AtomicU64 = AtomicU64::new(0);
+/// Max age in seconds of the token secret cache before checking for file changes.
+const TOKEN_SECRET_CACHE_TTL_SECS: i64 = 60;
#[derive(Serialize, Deserialize)]
#[serde(rename_all = "kebab-case")]
@@ -74,22 +78,54 @@ fn write_file(data: HashMap<Authid, String>) -> Result<(), Error> {
/// Refreshes the in-memory cache if the on-disk token.shadow file changed.
/// Returns true if the cache is valid to use, false if not.
fn refresh_cache_if_file_changed() -> bool {
+ let now = epoch_i64();
+
+ // Check TTL (best-effort)
+ let Some(cache) = TOKEN_SECRET_CACHE.try_read() else {
+ return false; // cannot validate external changes -> don't trust cache
+ };
+
+ let ttl_ok = cache
+ .last_checked
+ .is_some_and(|last| now.saturating_sub(last) < TOKEN_SECRET_CACHE_TTL_SECS);
+
+ drop(cache);
+
+ if ttl_ok {
+ return true;
+ }
+
+ // TTL expired/unknown at this point -> do best-effort refresh.
let Some(mut cache) = TOKEN_SECRET_CACHE.try_write() else {
return false; // cannot validate external changes -> don't trust cache
};
+ // Check TTL after acquiring write lock.
+ if let Some(last) = cache.last_checked {
+ if now.saturating_sub(last) < TOKEN_SECRET_CACHE_TTL_SECS {
+ return true;
+ }
+ }
+
+ let had_prior_state = cache.last_checked.is_some();
+
let Ok((new_mtime, new_len)) = shadow_mtime_len() else {
return false; // cannot validate external changes -> don't trust cache
};
if cache.file_mtime == new_mtime && cache.file_len == new_len {
+ cache.last_checked = Some(now);
return true;
}
cache.secrets.clear();
cache.file_mtime = new_mtime;
cache.file_len = new_len;
- FILE_GENERATION.fetch_add(1, Ordering::AcqRel);
+ cache.last_checked = Some(now);
+
+ if had_prior_state {
+ FILE_GENERATION.fetch_add(1, Ordering::AcqRel);
+ }
true
}
@@ -188,6 +224,8 @@ struct ApiTokenSecretCache {
file_mtime: Option<SystemTime>,
// shadow file length to detect changes
file_len: Option<u64>,
+ // last time the file metadata was checked
+ last_checked: Option<i64>,
}
/// Cached secret and the file generation it was cached at.
@@ -280,10 +318,12 @@ fn apply_api_mutation(
Ok((mtime, len)) => {
cache.file_mtime = mtime;
cache.file_len = len;
+ cache.last_checked = Some(epoch_i64());
}
Err(_) => {
cache.file_mtime = None;
cache.file_len = None;
+ cache.last_checked = None; // to force refresh next time
}
}
}
--
2.47.3
* [pbs-devel] [PATCH proxmox v2 2/3] proxmox-access-control: invalidate token-secret cache on token.shadow changes
2025-12-17 16:25 14% [pbs-devel] [PATCH proxmox{-backup, , -datacenter-manager} v2 0/7] token-shadow: reduce api token " Samuel Rufinatscha
` (3 preceding siblings ...)
2025-12-17 16:25 13% ` [pbs-devel] [PATCH proxmox v2 1/3] proxmox-access-control: cache verified API token secrets Samuel Rufinatscha
@ 2025-12-17 16:25 12% ` Samuel Rufinatscha
2025-12-17 16:25 15% ` [pbs-devel] [PATCH proxmox v2 3/3] proxmox-access-control: add TTL window to token secret cache Samuel Rufinatscha
` (3 subsequent siblings)
8 siblings, 0 replies; 200+ results
From: Samuel Rufinatscha @ 2025-12-17 16:25 UTC (permalink / raw)
To: pbs-devel
Previously the in-memory token-secret cache was only updated via
set_secret() and delete_secret(), so manual edits to token.shadow were
not reflected.
This patch adds file change detection to the cache. It tracks the mtime
and length of token.shadow and clears the in-memory token secret cache
whenever these values change.
Note that this patch fetches file stats on every request; a TTL-based
optimization is covered in a subsequent patch of the series.
This patch partly fixes bug #7017 [1].
[1] https://bugzilla.proxmox.com/show_bug.cgi?id=7017
Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
---
Changes from v1 to v2:
- Add file metadata tracking (file_mtime, file_len) and
FILE_GENERATION.
- Store file_gen in CachedSecret and verify it against the current
FILE_GENERATION to ensure cached entries belong to the current file
state.
- Add shadow_mtime_len() helper and convert refresh to best-effort
(try_write, returns bool).
- Pass a pre-write metadata snapshot into apply_api_mutation and
clear/bump generation if the cache metadata indicates missed external
edits.
proxmox-access-control/src/token_shadow.rs | 128 +++++++++++++++++++--
1 file changed, 116 insertions(+), 12 deletions(-)
diff --git a/proxmox-access-control/src/token_shadow.rs b/proxmox-access-control/src/token_shadow.rs
index c0285b62..efadce94 100644
--- a/proxmox-access-control/src/token_shadow.rs
+++ b/proxmox-access-control/src/token_shadow.rs
@@ -1,6 +1,9 @@
use std::collections::HashMap;
+use std::fs;
+use std::io::ErrorKind;
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::LazyLock;
+use std::time::SystemTime;
use anyhow::{bail, format_err, Error};
use parking_lot::RwLock;
@@ -19,10 +22,14 @@ use crate::init::impl_feature::{token_shadow, token_shadow_lock};
static TOKEN_SECRET_CACHE: LazyLock<RwLock<ApiTokenSecretCache>> = LazyLock::new(|| {
RwLock::new(ApiTokenSecretCache {
secrets: HashMap::new(),
+ file_mtime: None,
+ file_len: None,
})
});
/// API mutation generation (set/delete)
static API_MUTATION_GENERATION: AtomicU64 = AtomicU64::new(0);
+/// External/manual edits generation for the token.shadow file
+static FILE_GENERATION: AtomicU64 = AtomicU64::new(0);
// Get exclusive lock
fn lock_config() -> Result<ApiLockGuard, Error> {
@@ -46,6 +53,29 @@ fn write_file(data: HashMap<Authid, String>) -> Result<(), Error> {
replace_config(token_shadow(), &json)
}
+/// Refreshes the in-memory cache if the on-disk token.shadow file changed.
+/// Returns true if the cache is valid to use, false if not.
+fn refresh_cache_if_file_changed() -> bool {
+ let Some(mut cache) = TOKEN_SECRET_CACHE.try_write() else {
+ return false; // cannot validate external changes -> don't trust cache
+ };
+
+ let Ok((new_mtime, new_len)) = shadow_mtime_len() else {
+ return false; // cannot validate external changes -> don't trust cache
+ };
+
+ if cache.file_mtime == new_mtime && cache.file_len == new_len {
+ return true;
+ }
+
+ cache.secrets.clear();
+ cache.file_mtime = new_mtime;
+ cache.file_len = new_len;
+ FILE_GENERATION.fetch_add(1, Ordering::AcqRel);
+
+ true
+}
+
/// Verifies that an entry for given tokenid / API token secret exists
pub fn verify_secret(tokenid: &Authid, secret: &str) -> Result<(), Error> {
if !tokenid.is_token() {
@@ -53,12 +83,13 @@ pub fn verify_secret(tokenid: &Authid, secret: &str) -> Result<(), Error> {
}
// Fast path
- if cache_try_secret_matches(tokenid, secret) {
+ if refresh_cache_if_file_changed() && cache_try_secret_matches(tokenid, secret) {
return Ok(());
}
// Slow path snapshot (before expensive work)
let api_gen_before = API_MUTATION_GENERATION.load(Ordering::Acquire);
+ let file_gen_before = FILE_GENERATION.load(Ordering::Acquire);
let data = read_file()?;
match data.get(tokenid) {
@@ -66,7 +97,12 @@ pub fn verify_secret(tokenid: &Authid, secret: &str) -> Result<(), Error> {
proxmox_sys::crypt::verify_crypt_pw(secret, hashed_secret)?;
// Try to cache only if nothing changed while we verified
- cache_try_insert_secret(tokenid.clone(), secret.to_owned(), api_gen_before);
+ cache_try_insert_secret(
+ tokenid.clone(),
+ secret.to_owned(),
+ api_gen_before,
+ file_gen_before,
+ );
Ok(())
}
@@ -82,12 +118,15 @@ pub fn set_secret(tokenid: &Authid, secret: &str) -> Result<(), Error> {
let _guard = lock_config()?;
+ // Capture state BEFORE we write to detect external edits.
+ let pre_meta = shadow_mtime_len().unwrap_or((None, None));
+
let mut data = read_file()?;
let hashed_secret = proxmox_sys::crypt::encrypt_pw(secret)?;
data.insert(tokenid.clone(), hashed_secret);
write_file(data)?;
- apply_api_mutation(tokenid, Some(secret));
+ apply_api_mutation(tokenid, Some(secret), pre_meta);
Ok(())
}
@@ -100,11 +139,14 @@ pub fn delete_secret(tokenid: &Authid) -> Result<(), Error> {
let _guard = lock_config()?;
+ // Capture state BEFORE we write to detect external edits.
+ let pre_meta = shadow_mtime_len().unwrap_or((None, None));
+
let mut data = read_file()?;
data.remove(tokenid);
write_file(data)?;
- apply_api_mutation(tokenid, None);
+ apply_api_mutation(tokenid, None, pre_meta);
Ok(())
}
@@ -124,20 +166,40 @@ struct ApiTokenSecretCache {
/// `generate_and_set_secret`. Used to avoid repeated
/// password-hash computation on subsequent authentications.
secrets: HashMap<Authid, CachedSecret>,
+ // shadow file mtime to detect changes
+ file_mtime: Option<SystemTime>,
+ // shadow file length to detect changes
+ file_len: Option<u64>,
}
-/// Cached secret.
+/// Cached secret and the file generation it was cached at.
struct CachedSecret {
secret: String,
+ file_gen: u64,
}
-fn cache_try_insert_secret(tokenid: Authid, secret: String, api_gen_snapshot: u64) {
+fn cache_try_insert_secret(
+ tokenid: Authid,
+ secret: String,
+ api_gen_snapshot: u64,
+ file_gen_snapshot: u64,
+) {
let Some(mut cache) = TOKEN_SECRET_CACHE.try_write() else {
return;
};
- if API_MUTATION_GENERATION.load(Ordering::Acquire) == api_gen_snapshot {
- cache.secrets.insert(tokenid, CachedSecret { secret });
+ // Check generations to avoid zombie-inserts
+ let cur_file_gen = FILE_GENERATION.load(Ordering::Acquire);
+ let cur_api_gen = API_MUTATION_GENERATION.load(Ordering::Acquire);
+
+ if cur_file_gen == file_gen_snapshot && cur_api_gen == api_gen_snapshot {
+ cache.secrets.insert(
+ tokenid,
+ CachedSecret {
+ secret,
+ file_gen: cur_file_gen,
+ },
+ );
}
}
@@ -149,22 +211,44 @@ fn cache_try_secret_matches(tokenid: &Authid, secret: &str) -> bool {
return false;
};
- openssl::memcmp::eq(entry.secret.as_bytes(), secret.as_bytes())
+ let gen1 = FILE_GENERATION.load(Ordering::Acquire);
+ if entry.file_gen != gen1 {
+ return false;
+ }
+
+ let eq = openssl::memcmp::eq(entry.secret.as_bytes(), secret.as_bytes());
+
+ let gen2 = FILE_GENERATION.load(Ordering::Acquire);
+ eq && gen1 == gen2
}
-fn apply_api_mutation(tokenid: &Authid, new_secret: Option<&str>) {
- // Prevent in-flight verify_secret() from caching results across a mutation.
+fn apply_api_mutation(
+ tokenid: &Authid,
+ new_secret: Option<&str>,
+ pre_write_meta: (Option<SystemTime>, Option<u64>),
+) {
API_MUTATION_GENERATION.fetch_add(1, Ordering::AcqRel);
- // Mutations must be reflected immediately once set/delete returns.
let mut cache = TOKEN_SECRET_CACHE.write();
+ // If the cache meta doesn't match the file state before the on-disk write,
+ // external/manual edits happened -> drop everything and bump FILE_GENERATION.
+ let (pre_mtime, pre_len) = pre_write_meta;
+ if cache.file_mtime != pre_mtime || cache.file_len != pre_len {
+ cache.secrets.clear();
+ FILE_GENERATION.fetch_add(1, Ordering::AcqRel);
+ }
+
+ let file_gen = FILE_GENERATION.load(Ordering::Acquire);
+
+ // Apply the API mutation to the cache.
match new_secret {
Some(secret) => {
cache.secrets.insert(
tokenid.clone(),
CachedSecret {
secret: secret.to_owned(),
+ file_gen,
},
);
}
@@ -172,4 +256,24 @@ fn apply_api_mutation(tokenid: &Authid, new_secret: Option<&str>) {
cache.secrets.remove(tokenid);
}
}
+
+ // Keep cache metadata aligned if possible.
+ match shadow_mtime_len() {
+ Ok((mtime, len)) => {
+ cache.file_mtime = mtime;
+ cache.file_len = len;
+ }
+ Err(_) => {
+ cache.file_mtime = None;
+ cache.file_len = None;
+ }
+ }
+}
+
+fn shadow_mtime_len() -> Result<(Option<SystemTime>, Option<u64>), Error> {
+ match fs::metadata(token_shadow().as_path()) {
+ Ok(meta) => Ok((meta.modified().ok(), Some(meta.len()))),
+ Err(e) if e.kind() == ErrorKind::NotFound => Ok((None, None)),
+ Err(e) => Err(e.into()),
+ }
}
--
2.47.3
* [pbs-devel] [PATCH proxmox v2 3/3] proxmox-access-control: add TTL window to token secret cache
2025-12-17 16:25 14% [pbs-devel] [PATCH proxmox{-backup, , -datacenter-manager} v2 0/7] token-shadow: reduce api token " Samuel Rufinatscha
` (4 preceding siblings ...)
2025-12-17 16:25 12% ` [pbs-devel] [PATCH proxmox v2 2/3] proxmox-access-control: invalidate token-secret cache on token.shadow changes Samuel Rufinatscha
@ 2025-12-17 16:25 15% ` Samuel Rufinatscha
2025-12-17 16:25 17% ` [pbs-devel] [PATCH proxmox-datacenter-manager v2 1/1] docs: document API token-cache TTL effects Samuel Rufinatscha
` (2 subsequent siblings)
8 siblings, 0 replies; 200+ results
From: Samuel Rufinatscha @ 2025-12-17 16:25 UTC (permalink / raw)
To: pbs-devel
verify_secret() currently calls refresh_cache_if_file_changed() on every
request, which performs a metadata() call on token.shadow each time.
Under load this adds unnecessary overhead, especially since the file
should rarely change.
This patch introduces a TTL boundary, controlled by
TOKEN_SECRET_CACHE_TTL_SECS. File metadata is only re-loaded once the
TTL has expired.
This patch partly fixes bug #7017 [1].
[1] https://bugzilla.proxmox.com/show_bug.cgi?id=7017
Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
---
Changes from v1 to v2:
- Add TOKEN_SECRET_CACHE_TTL_SECS and last_checked.
- Implement double-checked TTL: check with try_read first; only attempt
refresh with try_write if expired/unknown.
- Fix TTL bookkeeping: update last_checked on the “file unchanged” path
and after API mutations.
proxmox-access-control/src/token_shadow.rs | 42 +++++++++++++++++++++-
1 file changed, 41 insertions(+), 1 deletion(-)
diff --git a/proxmox-access-control/src/token_shadow.rs b/proxmox-access-control/src/token_shadow.rs
index efadce94..4ca56de9 100644
--- a/proxmox-access-control/src/token_shadow.rs
+++ b/proxmox-access-control/src/token_shadow.rs
@@ -11,6 +11,7 @@ use serde_json::{from_value, Value};
use proxmox_auth_api::types::Authid;
use proxmox_product_config::{open_api_lockfile, replace_config, ApiLockGuard};
+use proxmox_time::epoch_i64;
use crate::init::impl_feature::{token_shadow, token_shadow_lock};
@@ -24,12 +25,15 @@ static TOKEN_SECRET_CACHE: LazyLock<RwLock<ApiTokenSecretCache>> = LazyLock::new
secrets: HashMap::new(),
file_mtime: None,
file_len: None,
+ last_checked: None,
})
});
/// API mutation generation (set/delete)
static API_MUTATION_GENERATION: AtomicU64 = AtomicU64::new(0);
/// External/manual edits generation for the token.shadow file
static FILE_GENERATION: AtomicU64 = AtomicU64::new(0);
+/// Max age in seconds of the token secret cache before checking for file changes.
+const TOKEN_SECRET_CACHE_TTL_SECS: i64 = 60;
// Get exclusive lock
fn lock_config() -> Result<ApiLockGuard, Error> {
@@ -56,22 +60,54 @@ fn write_file(data: HashMap<Authid, String>) -> Result<(), Error> {
/// Refreshes the in-memory cache if the on-disk token.shadow file changed.
/// Returns true if the cache is valid to use, false if not.
fn refresh_cache_if_file_changed() -> bool {
+ let now = epoch_i64();
+
+ // Check TTL (best-effort)
+ let Some(cache) = TOKEN_SECRET_CACHE.try_read() else {
+ return false; // cannot validate external changes -> don't trust cache
+ };
+
+ let ttl_ok = cache
+ .last_checked
+ .is_some_and(|last| now.saturating_sub(last) < TOKEN_SECRET_CACHE_TTL_SECS);
+
+ drop(cache);
+
+ if ttl_ok {
+ return true;
+ }
+
+ // TTL expired/unknown at this point -> do best-effort refresh.
let Some(mut cache) = TOKEN_SECRET_CACHE.try_write() else {
return false; // cannot validate external changes -> don't trust cache
};
+ // Check TTL after acquiring write lock.
+ if let Some(last) = cache.last_checked {
+ if now.saturating_sub(last) < TOKEN_SECRET_CACHE_TTL_SECS {
+ return true;
+ }
+ }
+
+ let had_prior_state = cache.last_checked.is_some();
+
let Ok((new_mtime, new_len)) = shadow_mtime_len() else {
return false; // cannot validate external changes -> don't trust cache
};
if cache.file_mtime == new_mtime && cache.file_len == new_len {
+ cache.last_checked = Some(now);
return true;
}
cache.secrets.clear();
cache.file_mtime = new_mtime;
cache.file_len = new_len;
- FILE_GENERATION.fetch_add(1, Ordering::AcqRel);
+ cache.last_checked = Some(now);
+
+ if had_prior_state {
+ FILE_GENERATION.fetch_add(1, Ordering::AcqRel);
+ }
true
}
@@ -170,6 +206,8 @@ struct ApiTokenSecretCache {
file_mtime: Option<SystemTime>,
// shadow file length to detect changes
file_len: Option<u64>,
+ // last time the file metadata was checked
+ last_checked: Option<i64>,
}
/// Cached secret and the file generation it was cached at.
@@ -262,10 +300,12 @@ fn apply_api_mutation(
Ok((mtime, len)) => {
cache.file_mtime = mtime;
cache.file_len = len;
+ cache.last_checked = Some(epoch_i64());
}
Err(_) => {
cache.file_mtime = None;
cache.file_len = None;
+ cache.last_checked = None; // to force refresh next time
}
}
}
--
2.47.3
* [pbs-devel] [PATCH proxmox-datacenter-manager v2 1/1] docs: document API token-cache TTL effects
2025-12-17 16:25 14% [pbs-devel] [PATCH proxmox{-backup, , -datacenter-manager} v2 0/7] token-shadow: reduce api token " Samuel Rufinatscha
` (5 preceding siblings ...)
2025-12-17 16:25 15% ` [pbs-devel] [PATCH proxmox v2 3/3] proxmox-access-control: add TTL window to token secret cache Samuel Rufinatscha
@ 2025-12-17 16:25 17% ` Samuel Rufinatscha
2025-12-18 11:03 12% ` [pbs-devel] [PATCH proxmox{-backup, , -datacenter-manager} v2 0/7] token-shadow: reduce api token verification overhead Samuel Rufinatscha
2026-01-02 16:09 13% ` [pbs-devel] superseded: " Samuel Rufinatscha
8 siblings, 0 replies; 200+ results
From: Samuel Rufinatscha @ 2025-12-17 16:25 UTC (permalink / raw)
To: pbs-devel
Documents the effects of the API token cache added in the
proxmox-access-control crate.
This patch partly fixes bug #7017 [1].
[1] https://bugzilla.proxmox.com/show_bug.cgi?id=7017
Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
---
docs/access-control.rst | 3 +++
1 file changed, 3 insertions(+)
diff --git a/docs/access-control.rst b/docs/access-control.rst
index adf26cd..f4f26f2 100644
--- a/docs/access-control.rst
+++ b/docs/access-control.rst
@@ -47,6 +47,9 @@ place of the user ID (``user@realm``) and the user password, respectively.
The API token is passed from the client to the server by setting the ``Authorization`` HTTP header
with method ``PDMAPIToken`` to the value ``TOKENID:TOKENSECRET``.
+.. WARNING:: If you manually remove a generated API token from the token secrets file (token.shadow),
+ it can take up to one minute before the token is rejected. This is due to caching.
+
.. _access_control:
Access Control
--
2.47.3
* [pbs-devel] [PATCH proxmox-backup v2 2/3] pbs-config: invalidate token-secret cache on token.shadow changes
2025-12-17 16:25 14% [pbs-devel] [PATCH proxmox{-backup, , -datacenter-manager} v2 0/7] token-shadow: reduce api token " Samuel Rufinatscha
2025-12-17 16:25 13% ` [pbs-devel] [PATCH proxmox-backup v2 1/3] pbs-config: cache verified API token secrets Samuel Rufinatscha
@ 2025-12-17 16:25 12% ` Samuel Rufinatscha
2025-12-17 16:25 14% ` [pbs-devel] [PATCH proxmox-backup v2 3/3] pbs-config: add TTL window to token secret cache Samuel Rufinatscha
` (6 subsequent siblings)
8 siblings, 0 replies; 200+ results
From: Samuel Rufinatscha @ 2025-12-17 16:25 UTC (permalink / raw)
To: pbs-devel
Previously the in-memory token-secret cache was only updated via
set_secret() and delete_secret(), so manual edits to token.shadow were
not reflected.
This patch adds file change detection to the cache. It tracks the mtime
and length of token.shadow and clears the in-memory token secret cache
whenever these values change.
Note: this patch fetches file stats on every request. A TTL-based
optimization is covered in a subsequent patch of this series.
This patch partly fixes bug #7017 [1].
[1] https://bugzilla.proxmox.com/show_bug.cgi?id=7017
Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
---
Changes from v1 to v2:
- Add file metadata tracking (file_mtime, file_len) and
FILE_GENERATION.
- Store file_gen in CachedSecret and verify it against the current
FILE_GENERATION to ensure cached entries belong to the current file
state.
- Add shadow_mtime_len() helper and convert refresh to best-effort
(try_write, returns bool).
- Pass a pre-write metadata snapshot into apply_api_mutation and
clear/bump generation if the cache metadata indicates missed external
edits.
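As a standalone illustration of the mtime/length change-detection described
above, the following sketch uses only std APIs; the path handling and the
shape of the tracked state are illustrative stand-ins, not the actual
pbs-config code:

```rust
use std::fs;
use std::io::ErrorKind;
use std::path::Path;
use std::time::SystemTime;

/// Snapshot of the watched file's (mtime, length); (None, None) means "absent".
fn file_fingerprint(path: &Path) -> std::io::Result<(Option<SystemTime>, Option<u64>)> {
    match fs::metadata(path) {
        Ok(meta) => Ok((meta.modified().ok(), Some(meta.len()))),
        Err(err) if err.kind() == ErrorKind::NotFound => Ok((None, None)),
        Err(err) => Err(err),
    }
}

/// Compare against the last seen fingerprint; on change, update it and
/// report that any dependent cache must be cleared.
fn changed_since(
    path: &Path,
    last: &mut (Option<SystemTime>, Option<u64>),
) -> std::io::Result<bool> {
    let current = file_fingerprint(path)?;
    if current == *last {
        return Ok(false); // nothing changed: cached secrets stay valid
    }
    *last = current;
    Ok(true) // changed (or appeared/disappeared): clear the secret cache
}

fn main() -> std::io::Result<()> {
    let path = std::env::temp_dir().join("token.shadow.example");
    let mut last = (None, None);

    fs::write(&path, b"tokenid:hashed-secret\n")?;
    assert!(changed_since(&path, &mut last)?); // first observation counts as a change
    assert!(!changed_since(&path, &mut last)?); // untouched file: cache reusable

    fs::write(&path, b"tokenid:hashed-secret\nother:hash\n")?; // manual edit
    assert!(changed_since(&path, &mut last)?); // detected via length/mtime

    fs::remove_file(&path)?;
    assert!(changed_since(&path, &mut last)?); // deletion detected as well
    Ok(())
}
```

Note the known limitation: an edit that preserves the length and lands within
the filesystem's mtime granularity would go undetected, which is why this is
a best-effort invalidation layered on top of the generation counters.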
pbs-config/src/token_shadow.rs | 128 +++++++++++++++++++++++++++++----
1 file changed, 116 insertions(+), 12 deletions(-)
diff --git a/pbs-config/src/token_shadow.rs b/pbs-config/src/token_shadow.rs
index ce845e8d..71553aae 100644
--- a/pbs-config/src/token_shadow.rs
+++ b/pbs-config/src/token_shadow.rs
@@ -1,6 +1,9 @@
use std::collections::HashMap;
+use std::fs;
+use std::io::ErrorKind;
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::LazyLock;
+use std::time::SystemTime;
use anyhow::{bail, format_err, Error};
use parking_lot::RwLock;
@@ -24,10 +27,14 @@ const CONF_FILE: &str = pbs_buildcfg::configdir!("/token.shadow");
static TOKEN_SECRET_CACHE: LazyLock<RwLock<ApiTokenSecretCache>> = LazyLock::new(|| {
RwLock::new(ApiTokenSecretCache {
secrets: HashMap::new(),
+ file_mtime: None,
+ file_len: None,
})
});
/// API mutation generation (set/delete)
static API_MUTATION_GENERATION: AtomicU64 = AtomicU64::new(0);
+/// External/manual edits generation for the token.shadow file
+static FILE_GENERATION: AtomicU64 = AtomicU64::new(0);
#[derive(Serialize, Deserialize)]
#[serde(rename_all = "kebab-case")]
@@ -64,6 +71,29 @@ fn write_file(data: HashMap<Authid, String>) -> Result<(), Error> {
proxmox_sys::fs::replace_file(CONF_FILE, &json, options, true)
}
+/// Refreshes the in-memory cache if the on-disk token.shadow file changed.
+/// Returns true if the cache is valid to use, false if not.
+fn refresh_cache_if_file_changed() -> bool {
+ let Some(mut cache) = TOKEN_SECRET_CACHE.try_write() else {
+ return false; // cannot validate external changes -> don't trust cache
+ };
+
+ let Ok((new_mtime, new_len)) = shadow_mtime_len() else {
+ return false; // cannot validate external changes -> don't trust cache
+ };
+
+ if cache.file_mtime == new_mtime && cache.file_len == new_len {
+ return true;
+ }
+
+ cache.secrets.clear();
+ cache.file_mtime = new_mtime;
+ cache.file_len = new_len;
+ FILE_GENERATION.fetch_add(1, Ordering::AcqRel);
+
+ true
+}
+
/// Verifies that an entry for given tokenid / API token secret exists
pub fn verify_secret(tokenid: &Authid, secret: &str) -> Result<(), Error> {
if !tokenid.is_token() {
@@ -71,12 +101,13 @@ pub fn verify_secret(tokenid: &Authid, secret: &str) -> Result<(), Error> {
}
// Fast path
- if cache_try_secret_matches(tokenid, secret) {
+ if refresh_cache_if_file_changed() && cache_try_secret_matches(tokenid, secret) {
return Ok(());
}
// Slow path snapshot (before expensive work)
let api_gen_before = API_MUTATION_GENERATION.load(Ordering::Acquire);
+ let file_gen_before = FILE_GENERATION.load(Ordering::Acquire);
let data = read_file()?;
match data.get(tokenid) {
@@ -84,7 +115,12 @@ pub fn verify_secret(tokenid: &Authid, secret: &str) -> Result<(), Error> {
proxmox_sys::crypt::verify_crypt_pw(secret, hashed_secret)?;
// Try to cache only if nothing changed while we verified
- cache_try_insert_secret(tokenid.clone(), secret.to_owned(), api_gen_before);
+ cache_try_insert_secret(
+ tokenid.clone(),
+ secret.to_owned(),
+ api_gen_before,
+ file_gen_before,
+ );
Ok(())
}
@@ -108,12 +144,15 @@ fn set_secret(tokenid: &Authid, secret: &str) -> Result<(), Error> {
let _guard = lock_config()?;
+ // Capture state BEFORE we write to detect external edits.
+ let pre_meta = shadow_mtime_len().unwrap_or((None, None));
+
let mut data = read_file()?;
let hashed_secret = proxmox_sys::crypt::encrypt_pw(secret)?;
data.insert(tokenid.clone(), hashed_secret);
write_file(data)?;
- apply_api_mutation(tokenid, Some(secret));
+ apply_api_mutation(tokenid, Some(secret), pre_meta);
Ok(())
}
@@ -126,11 +165,14 @@ pub fn delete_secret(tokenid: &Authid) -> Result<(), Error> {
let _guard = lock_config()?;
+ // Capture state BEFORE we write to detect external edits.
+ let pre_meta = shadow_mtime_len().unwrap_or((None, None));
+
let mut data = read_file()?;
data.remove(tokenid);
write_file(data)?;
- apply_api_mutation(tokenid, None);
+ apply_api_mutation(tokenid, None, pre_meta);
Ok(())
}
@@ -142,20 +184,40 @@ struct ApiTokenSecretCache {
/// `generate_and_set_secret`. Used to avoid repeated
/// password-hash computation on subsequent authentications.
secrets: HashMap<Authid, CachedSecret>,
+ // shadow file mtime to detect changes
+ file_mtime: Option<SystemTime>,
+ // shadow file length to detect changes
+ file_len: Option<u64>,
}
-/// Cached secret.
+/// Cached secret and the file generation it was cached at.
struct CachedSecret {
secret: String,
+ file_gen: u64,
}
-fn cache_try_insert_secret(tokenid: Authid, secret: String, api_gen_snapshot: u64) {
+fn cache_try_insert_secret(
+ tokenid: Authid,
+ secret: String,
+ api_gen_snapshot: u64,
+ file_gen_snapshot: u64,
+) {
let Some(mut cache) = TOKEN_SECRET_CACHE.try_write() else {
return;
};
- if API_MUTATION_GENERATION.load(Ordering::Acquire) == api_gen_snapshot {
- cache.secrets.insert(tokenid, CachedSecret { secret });
+ // Check generations to avoid zombie-inserts
+ let cur_file_gen = FILE_GENERATION.load(Ordering::Acquire);
+ let cur_api_gen = API_MUTATION_GENERATION.load(Ordering::Acquire);
+
+ if cur_file_gen == file_gen_snapshot && cur_api_gen == api_gen_snapshot {
+ cache.secrets.insert(
+ tokenid,
+ CachedSecret {
+ secret,
+ file_gen: cur_file_gen,
+ },
+ );
}
}
@@ -167,22 +229,44 @@ fn cache_try_secret_matches(tokenid: &Authid, secret: &str) -> bool {
return false;
};
- openssl::memcmp::eq(entry.secret.as_bytes(), secret.as_bytes())
+ let gen1 = FILE_GENERATION.load(Ordering::Acquire);
+ if entry.file_gen != gen1 {
+ return false;
+ }
+
+ let eq = openssl::memcmp::eq(entry.secret.as_bytes(), secret.as_bytes());
+
+ let gen2 = FILE_GENERATION.load(Ordering::Acquire);
+ eq && gen1 == gen2
}
-fn apply_api_mutation(tokenid: &Authid, new_secret: Option<&str>) {
- // Prevent in-flight verify_secret() from caching results across a mutation.
+fn apply_api_mutation(
+ tokenid: &Authid,
+ new_secret: Option<&str>,
+ pre_write_meta: (Option<SystemTime>, Option<u64>),
+) {
API_MUTATION_GENERATION.fetch_add(1, Ordering::AcqRel);
- // Mutations must be reflected immediately once set/delete returns.
let mut cache = TOKEN_SECRET_CACHE.write();
+ // If the cache meta doesn't match the file state before the on-disk write,
+ // external/manual edits happened -> drop everything and bump FILE_GENERATION.
+ let (pre_mtime, pre_len) = pre_write_meta;
+ if cache.file_mtime != pre_mtime || cache.file_len != pre_len {
+ cache.secrets.clear();
+ FILE_GENERATION.fetch_add(1, Ordering::AcqRel);
+ }
+
+ let file_gen = FILE_GENERATION.load(Ordering::Acquire);
+
+ // Apply the API mutation to the cache.
match new_secret {
Some(secret) => {
cache.secrets.insert(
tokenid.clone(),
CachedSecret {
secret: secret.to_owned(),
+ file_gen,
},
);
}
@@ -190,4 +274,24 @@ fn apply_api_mutation(tokenid: &Authid, new_secret: Option<&str>) {
cache.secrets.remove(tokenid);
}
}
+
+ // Keep cache metadata aligned if possible.
+ match shadow_mtime_len() {
+ Ok((mtime, len)) => {
+ cache.file_mtime = mtime;
+ cache.file_len = len;
+ }
+ Err(_) => {
+ cache.file_mtime = None;
+ cache.file_len = None;
+ }
+ }
+}
+
+fn shadow_mtime_len() -> Result<(Option<SystemTime>, Option<u64>), Error> {
+ match fs::metadata(CONF_FILE) {
+ Ok(meta) => Ok((meta.modified().ok(), Some(meta.len()))),
+ Err(e) if e.kind() == ErrorKind::NotFound => Ok((None, None)),
+ Err(e) => Err(e.into()),
+ }
}
--
2.47.3
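The two generation counters in the diff above exist to prevent "zombie
inserts": a slow-path verification must not cache a secret that a concurrent
set/delete or a detected external edit has already invalidated. Stripped of
the pbs-config specifics, the snapshot-then-compare pattern is roughly the
following (names and the single counter are illustrative):

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::RwLock;

static GENERATION: AtomicU64 = AtomicU64::new(0);

/// Snapshot taken *before* the expensive work (the password-hash check).
fn snapshot() -> u64 {
    GENERATION.load(Ordering::Acquire)
}

/// Best-effort insert: refuse if the lock is contended or if any mutation
/// bumped the generation since the snapshot (the result may be stale).
fn try_insert(cache: &RwLock<HashMap<String, String>>, id: &str, secret: &str, snap: u64) -> bool {
    let Ok(mut guard) = cache.try_write() else {
        return false;
    };
    if GENERATION.load(Ordering::Acquire) != snap {
        return false; // set/delete or external edit raced us: drop the result
    }
    guard.insert(id.to_owned(), secret.to_owned());
    true
}

/// Every mutation bumps the generation and clears cached entries.
fn invalidate(cache: &RwLock<HashMap<String, String>>) {
    GENERATION.fetch_add(1, Ordering::AcqRel);
    cache.write().unwrap().clear();
}

fn main() {
    let cache = RwLock::new(HashMap::new());

    let snap = snapshot();
    // ... expensive verification would happen here ...
    assert!(try_insert(&cache, "tokenid", "secret", snap)); // no race: cached

    let stale = snapshot();
    invalidate(&cache); // a concurrent set/delete (or external edit) happened
    assert!(!try_insert(&cache, "tokenid", "secret", stale)); // stale snapshot rejected
}
```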
^ permalink raw reply [relevance 12%]
* Re: [pbs-devel] [PATCH proxmox-backup 1/3] pbs-config: cache verified API token secrets
2025-12-17 11:16 5% ` Christian Ebner
@ 2025-12-17 11:25 0% ` Shannon Sterz
0 siblings, 0 replies; 200+ results
From: Shannon Sterz @ 2025-12-17 11:25 UTC (permalink / raw)
To: Christian Ebner; +Cc: Proxmox Backup Server development discussion
On Wed Dec 17, 2025 at 12:16 PM CET, Christian Ebner wrote:
> On 12/9/25 2:29 PM, Samuel Rufinatscha wrote:
>> On 12/5/25 3:03 PM, Shannon Sterz wrote:
>>> On Fri Dec 5, 2025 at 2:25 PM CET, Samuel Rufinatscha wrote:
>>>> Currently, every token-based API request reads the token.shadow file and
>>>> runs the expensive password hash verification for the given token
>>>> secret. This shows up as a hotspot in /status profiling (see
>>>> bug #6049 [1]).
>>>>
>>>> This patch introduces an in-memory cache of successfully verified token
>>>> secrets. Subsequent requests for the same token+secret combination only
>>>> perform a comparison using openssl::memcmp::eq and avoid re-running the
>>>> password hash. The cache is updated when a token secret is set and
>>>> cleared when a token is deleted. Note, this does NOT include manual
>>>> config changes, which will be covered in a subsequent patch.
>>>>
>>>> This patch partly fixes bug #6049 [1].
>>>>
>>>> [1] https://bugzilla.proxmox.com/show_bug.cgi?id=7017
>>>>
>>>> Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
>>>> ---
>>>> pbs-config/src/token_shadow.rs | 58 +++++++++++++++++++++++++++++++++-
>>>> 1 file changed, 57 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/pbs-config/src/token_shadow.rs b/pbs-config/src/
>>>> token_shadow.rs
>>>> index 640fabbf..47aa2fc2 100644
>>>> --- a/pbs-config/src/token_shadow.rs
>>>> +++ b/pbs-config/src/token_shadow.rs
>>>> @@ -1,6 +1,8 @@
>>>> use std::collections::HashMap;
>>>> +use std::sync::RwLock;
>>>>
>>>> use anyhow::{bail, format_err, Error};
>>>> +use once_cell::sync::OnceCell;
>>>> use serde::{Deserialize, Serialize};
>>>> use serde_json::{from_value, Value};
>>>>
>>>> @@ -13,6 +15,13 @@ use crate::{open_backup_lockfile, BackupLockGuard};
>>>> const LOCK_FILE: &str = pbs_buildcfg::configdir!("/
>>>> token.shadow.lock");
>>>> const CONF_FILE: &str = pbs_buildcfg::configdir!("/token.shadow");
>>>>
>>>> +/// Global in-memory cache for successfully verified API token secrets.
>>>> +/// The cache stores plain text secrets for token Authids that have
>>>> already been
>>>> +/// verified against the hashed values in `token.shadow`. This
>>>> allows for cheap
>>>> +/// subsequent authentications for the same token+secret
>>>> combination, avoiding
>>>> +/// recomputing the password hash on every request.
>>>> +static TOKEN_SECRET_CACHE: OnceCell<RwLock<ApiTokenSecretCache>> =
>>>> OnceCell::new();
>>>
>>> any reason you are using a once cell with a custom get_or_init function
>>> instead of a simple `LazyCell` [1] here? seems to me that this would be
>>> the more appropriate type here? similar question for the
>>> proxmox-access-control portion of this series.
>>>
>>> [1]: https://doc.rust-lang.org/std/cell/struct.LazyCell.html
>>>
>>
>> Good point, we should / can directly initialize it! Will change
>> to LazyCell. Thanks!
>
> LazyCell, however, is not thread-safe, so it could cause issues with
> concurrent inits from different threads. IMO std::sync::LazyLock [0] is
> a better fit here and follows along the line of what we do for other
> caches in PBS, e.g. in pbs-config::user.
>
> [0] https://doc.rust-lang.org/std/sync/struct.LazyLock.html
ah right, yes that makes a lot of sense, thanks for catching that.
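For reference, the thread-safe variant suggested here boils down to something
like the following minimal sketch of the pattern (not the actual pbs-config
cache; `String` stands in for `Authid`):

```rust
use std::collections::HashMap;
use std::sync::{LazyLock, RwLock};
use std::thread;

struct ApiTokenSecretCache {
    // Authid -> verified plain secret in the real code; String here for brevity.
    secrets: HashMap<String, String>,
}

// Unlike LazyCell, LazyLock is Sync, so it can back a `static`: the init
// closure runs exactly once even when the first accesses happen concurrently.
static TOKEN_SECRET_CACHE: LazyLock<RwLock<ApiTokenSecretCache>> = LazyLock::new(|| {
    RwLock::new(ApiTokenSecretCache {
        secrets: HashMap::new(),
    })
});

fn main() {
    // Concurrent first use from several threads: initialization is not raced.
    let handles: Vec<_> = (0..4)
        .map(|i| {
            thread::spawn(move || {
                TOKEN_SECRET_CACHE
                    .write()
                    .unwrap()
                    .secrets
                    .insert(format!("user@pam!token{i}"), "secret".to_owned());
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    assert_eq!(TOKEN_SECRET_CACHE.read().unwrap().secrets.len(), 4);
}
```

(`std::sync::LazyLock` is stable since Rust 1.80; on older toolchains the same
shape is achievable with `OnceLock::get_or_init`.)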
^ permalink raw reply [relevance 0%]
* Re: [pbs-devel] [PATCH proxmox-backup 1/3] pbs-config: cache verified API token secrets
2025-12-09 13:29 6% ` Samuel Rufinatscha
@ 2025-12-17 11:16 5% ` Christian Ebner
2025-12-17 11:25 0% ` Shannon Sterz
0 siblings, 1 reply; 200+ results
From: Christian Ebner @ 2025-12-17 11:16 UTC (permalink / raw)
To: Proxmox Backup Server development discussion, Samuel Rufinatscha,
Shannon Sterz
On 12/9/25 2:29 PM, Samuel Rufinatscha wrote:
> On 12/5/25 3:03 PM, Shannon Sterz wrote:
>> On Fri Dec 5, 2025 at 2:25 PM CET, Samuel Rufinatscha wrote:
>>> Currently, every token-based API request reads the token.shadow file and
>>> runs the expensive password hash verification for the given token
>>> secret. This shows up as a hotspot in /status profiling (see
>>> bug #6049 [1]).
>>>
>>> This patch introduces an in-memory cache of successfully verified token
>>> secrets. Subsequent requests for the same token+secret combination only
>>> perform a comparison using openssl::memcmp::eq and avoid re-running the
>>> password hash. The cache is updated when a token secret is set and
>>> cleared when a token is deleted. Note, this does NOT include manual
>>> config changes, which will be covered in a subsequent patch.
>>>
>>> This patch partly fixes bug #6049 [1].
>>>
>>> [1] https://bugzilla.proxmox.com/show_bug.cgi?id=7017
>>>
>>> Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
>>> ---
>>> pbs-config/src/token_shadow.rs | 58 +++++++++++++++++++++++++++++++++-
>>> 1 file changed, 57 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/pbs-config/src/token_shadow.rs b/pbs-config/src/
>>> token_shadow.rs
>>> index 640fabbf..47aa2fc2 100644
>>> --- a/pbs-config/src/token_shadow.rs
>>> +++ b/pbs-config/src/token_shadow.rs
>>> @@ -1,6 +1,8 @@
>>> use std::collections::HashMap;
>>> +use std::sync::RwLock;
>>>
>>> use anyhow::{bail, format_err, Error};
>>> +use once_cell::sync::OnceCell;
>>> use serde::{Deserialize, Serialize};
>>> use serde_json::{from_value, Value};
>>>
>>> @@ -13,6 +15,13 @@ use crate::{open_backup_lockfile, BackupLockGuard};
>>> const LOCK_FILE: &str = pbs_buildcfg::configdir!("/
>>> token.shadow.lock");
>>> const CONF_FILE: &str = pbs_buildcfg::configdir!("/token.shadow");
>>>
>>> +/// Global in-memory cache for successfully verified API token secrets.
>>> +/// The cache stores plain text secrets for token Authids that have
>>> already been
>>> +/// verified against the hashed values in `token.shadow`. This
>>> allows for cheap
>>> +/// subsequent authentications for the same token+secret
>>> combination, avoiding
>>> +/// recomputing the password hash on every request.
>>> +static TOKEN_SECRET_CACHE: OnceCell<RwLock<ApiTokenSecretCache>> =
>>> OnceCell::new();
>>
>> any reason you are using a once cell with a custom get_or_init function
>> instead of a simple `LazyCell` [1] here? seems to me that this would be
>> the more appropriate type here? similar question for the
>> proxmox-access-control portion of this series.
>>
>> [1]: https://doc.rust-lang.org/std/cell/struct.LazyCell.html
>>
>
> Good point, we should / can directly initialize it! Will change
> to LazyCell. Thanks!
LazyCell, however, is not thread-safe, so it could cause issues with
concurrent inits from different threads. IMO std::sync::LazyLock [0] is
a better fit here and follows along the line of what we do for other
caches in PBS, e.g. in pbs-config::user.
[0] https://doc.rust-lang.org/std/sync/struct.LazyLock.html
^ permalink raw reply [relevance 5%]
* Re: [pbs-devel] [PATCH proxmox-backup 1/3] pbs-config: cache verified API token secrets
2025-12-15 19:00 12% ` Samuel Rufinatscha
@ 2025-12-16 8:16 5% ` Fabian Grünbichler
0 siblings, 0 replies; 200+ results
From: Fabian Grünbichler @ 2025-12-16 8:16 UTC (permalink / raw)
To: Proxmox Backup Server development discussion
On December 15, 2025 8:00 pm, Samuel Rufinatscha wrote:
> On 12/15/25 4:06 PM, Samuel Rufinatscha wrote:
>> On 12/10/25 4:35 PM, Samuel Rufinatscha wrote:
>>> On 12/10/25 12:47 PM, Fabian Grünbichler wrote:
>>>> Quoting Samuel Rufinatscha (2025-12-05 14:25:54)
>>>>> Currently, every token-based API request reads the token.shadow file
>>>>> and
>>>>> runs the expensive password hash verification for the given token
>>>>> secret. This shows up as a hotspot in /status profiling (see
>>>>> bug #6049 [1]).
>>>>>
>>>>> This patch introduces an in-memory cache of successfully verified token
>>>>> secrets. Subsequent requests for the same token+secret combination only
>>>>> perform a comparison using openssl::memcmp::eq and avoid re-running the
>>>>> password hash. The cache is updated when a token secret is set and
>>>>> cleared when a token is deleted. Note, this does NOT include manual
>>>>> config changes, which will be covered in a subsequent patch.
>>>>>
>>>>> This patch partly fixes bug #6049 [1].
>>>>>
>>>>> [1] https://bugzilla.proxmox.com/show_bug.cgi?id=7017
>>>>>
>>>>> Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
>>>>> ---
>>>>> pbs-config/src/token_shadow.rs | 58 ++++++++++++++++++++++++++++++
>>>>> +++-
>>>>> 1 file changed, 57 insertions(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/pbs-config/src/token_shadow.rs b/pbs-config/src/
>>>>> token_shadow.rs
>>>>> index 640fabbf..47aa2fc2 100644
>>>>> --- a/pbs-config/src/token_shadow.rs
>>>>> +++ b/pbs-config/src/token_shadow.rs
>>>>> @@ -1,6 +1,8 @@
>>>>> use std::collections::HashMap;
>>>>> +use std::sync::RwLock;
>>>>> use anyhow::{bail, format_err, Error};
>>>>> +use once_cell::sync::OnceCell;
>>>>> use serde::{Deserialize, Serialize};
>>>>> use serde_json::{from_value, Value};
>>>>> @@ -13,6 +15,13 @@ use crate::{open_backup_lockfile, BackupLockGuard};
>>>>> const LOCK_FILE: &str = pbs_buildcfg::configdir!("/
>>>>> token.shadow.lock");
>>>>> const CONF_FILE: &str = pbs_buildcfg::configdir!("/token.shadow");
>>>>> +/// Global in-memory cache for successfully verified API token
>>>>> secrets.
>>>>> +/// The cache stores plain text secrets for token Authids that have
>>>>> already been
>>>>> +/// verified against the hashed values in `token.shadow`. This
>>>>> allows for cheap
>>>>> +/// subsequent authentications for the same token+secret
>>>>> combination, avoiding
>>>>> +/// recomputing the password hash on every request.
>>>>> +static TOKEN_SECRET_CACHE: OnceCell<RwLock<ApiTokenSecretCache>> =
>>>>> OnceCell::new();
>>>>> +
>>>>> #[derive(Serialize, Deserialize)]
>>>>> #[serde(rename_all = "kebab-case")]
>>>>> /// ApiToken id / secret pair
>>>>> @@ -54,9 +63,25 @@ pub fn verify_secret(tokenid: &Authid, secret:
>>>>> &str) -> Result<(), Error> {
>>>>> bail!("not an API token ID");
>>>>> }
>>>>> + // Fast path
>>>>> + if let Some(cached) =
>>>>> token_secret_cache().read().unwrap().secrets.get(tokenid) {
>>>>
>>>> did you benchmark this with a lot of parallel token requests? a plain
>>>> RwLock
>>>> gives no guarantees at all w.r.t. ordering or fairness, so a lot of
>>>> token-based
>>>> requests could effectively prevent token removal AFAICT (or vice-versa,
>>>> spamming token creation could lock out all tokens?)
>>>>
>>>> since we don't actually require the cache here to proceed, we could
>>>> also make this a try_read
>>>> or a read with timeout, and fallback to the slow path if there is too
>>>> much
>>>> contention? alternatively, comparing with parking_lot would also be
>>>> interesting, since that implementation does have fairness guarantees.
>>>>
>>>> note that token-based requests are basically doable by anyone being
>>>> able to
>>>> reach PBS, whereas token creation/deletion is available to every
>>>> authenticated
>>>> user.
>>>>
>>>
>>> Thanks for the review Fabian and the valuable comments!
>>>
>>> I did not benchmark the RwLock itself under load. Your point about
>>> contention/fairness for RwLock makes perfect sense, and we should
>>> consider this. So for v2, I will integrate try_read() /
>>> try_write() as mentioned to avoid possible contention / DoS issues.
>>>
>>> I’ll also consider parking_lot::RwLock, thanks for the hint!
>>>
>>
>>
>> I benchmarked the "writer under heavy parallel readers" scenario by
>> running a 64-parallel token-auth flood against
>> /admin/datastore/ds0001/status?verbose=0 (≈ 44-48k successful
>> requests total) while executing 50 token create + 50 token delete
>> operations.
>>
>> With the suggested best-effort approach (cache lookups/inserts via
>> try_read/try_write) I saw the following e2e API latencies:
>>
>> delete: p95 ~39ms, max ~44ms
>> create: p95 ~50ms, max ~56ms
>>
>> I also compared against parking_lot::RwLock under the same setup,
>> results were in the same range (delete p95 ~39–43ms, max ~43–64ms)
>> so I didn’t see a clear benefit there for this workload.
>>
>> For v2 I will keep std::sync::RwLock with best-effort reads and inserts,
>> while keeping delete/removal blocking.
>>
>>
>
> Fabian,
>
> one clarification/follow-up: the comparison against parking_lot::RwLock
> was focused on end-to-end latency, and under the benchmarked
> workload we didn’t observe starvation effects. Still, std::sync::RwLock
> does not provide ordering or fairness guarantees, so under sustained
> token-auth read load cache invalidation could theoretically be delayed.
>
> Given that, I think switching to parking_lot::RwLock for v2 to get clear
> fairness semantics, while keeping the try_read/try_insert approach, is
> the better solution here.
I think going with parking_lot is okay here (it's already a dependency
of tokio anyway..). If we go with the std one, we should keep it in mind
in case we ever see signs of this being a problem.
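The best-effort scheme converged on in this thread never blocks the hot path:
a contended fast-path lookup (or insert) is simply treated as a miss, and the
request falls through to the slow path, which is always correct. A sketch with
std's RwLock; `expensive_verify` is a trivial stand-in for
`proxmox_sys::crypt::verify_crypt_pw`, and the plain `==` comparison stands in
for the constant-time `openssl::memcmp::eq` used in the real code:

```rust
use std::collections::HashMap;
use std::sync::RwLock;

struct Cache {
    secrets: HashMap<String, String>,
}

/// Fast path: never block. Under lock contention, report a miss instead.
fn cache_try_secret_matches(cache: &RwLock<Cache>, id: &str, secret: &str) -> bool {
    let Ok(guard) = cache.try_read() else {
        return false; // contended: fall through to the slow path
    };
    // Real code compares with openssl::memcmp::eq (constant time), not `==`.
    guard.secrets.get(id).map(String::as_str) == Some(secret)
}

/// Trivial stand-in for the expensive password-hash verification.
fn expensive_verify(id: &str, secret: &str) -> bool {
    id == "user@pam!mytoken" && secret == "s3cret"
}

fn verify(cache: &RwLock<Cache>, id: &str, secret: &str) -> bool {
    if cache_try_secret_matches(cache, id, secret) {
        return true; // cheap hit: no rehashing
    }
    if !expensive_verify(id, secret) {
        return false;
    }
    // Best-effort insert: skipping it under contention only costs a rehash later.
    if let Ok(mut guard) = cache.try_write() {
        guard.secrets.insert(id.to_owned(), secret.to_owned());
    }
    true
}

fn main() {
    let cache = RwLock::new(Cache {
        secrets: HashMap::new(),
    });
    assert!(verify(&cache, "user@pam!mytoken", "s3cret")); // slow path, then cached
    assert!(cache_try_secret_matches(&cache, "user@pam!mytoken", "s3cret")); // fast hit
    assert!(!verify(&cache, "user@pam!mytoken", "wrong"));
}
```

Deletion, by contrast, takes the write lock unconditionally, since a removal
that silently fails to reach the cache would leave a revoked token usable.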
^ permalink raw reply [relevance 5%]
* Re: [pbs-devel] [PATCH proxmox-backup 1/3] pbs-config: cache verified API token secrets
2025-12-15 15:05 12% ` Samuel Rufinatscha
@ 2025-12-15 19:00 12% ` Samuel Rufinatscha
2025-12-16 8:16 5% ` Fabian Grünbichler
0 siblings, 1 reply; 200+ results
From: Samuel Rufinatscha @ 2025-12-15 19:00 UTC (permalink / raw)
To: Fabian Grünbichler, pbs-devel
On 12/15/25 4:06 PM, Samuel Rufinatscha wrote:
> On 12/10/25 4:35 PM, Samuel Rufinatscha wrote:
>> On 12/10/25 12:47 PM, Fabian Grünbichler wrote:
>>> Quoting Samuel Rufinatscha (2025-12-05 14:25:54)
>>>> Currently, every token-based API request reads the token.shadow file
>>>> and
>>>> runs the expensive password hash verification for the given token
>>>> secret. This shows up as a hotspot in /status profiling (see
>>>> bug #6049 [1]).
>>>>
>>>> This patch introduces an in-memory cache of successfully verified token
>>>> secrets. Subsequent requests for the same token+secret combination only
>>>> perform a comparison using openssl::memcmp::eq and avoid re-running the
>>>> password hash. The cache is updated when a token secret is set and
>>>> cleared when a token is deleted. Note, this does NOT include manual
>>>> config changes, which will be covered in a subsequent patch.
>>>>
>>>> This patch partly fixes bug #6049 [1].
>>>>
>>>> [1] https://bugzilla.proxmox.com/show_bug.cgi?id=7017
>>>>
>>>> Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
>>>> ---
>>>> pbs-config/src/token_shadow.rs | 58 ++++++++++++++++++++++++++++++
>>>> +++-
>>>> 1 file changed, 57 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/pbs-config/src/token_shadow.rs b/pbs-config/src/
>>>> token_shadow.rs
>>>> index 640fabbf..47aa2fc2 100644
>>>> --- a/pbs-config/src/token_shadow.rs
>>>> +++ b/pbs-config/src/token_shadow.rs
>>>> @@ -1,6 +1,8 @@
>>>> use std::collections::HashMap;
>>>> +use std::sync::RwLock;
>>>> use anyhow::{bail, format_err, Error};
>>>> +use once_cell::sync::OnceCell;
>>>> use serde::{Deserialize, Serialize};
>>>> use serde_json::{from_value, Value};
>>>> @@ -13,6 +15,13 @@ use crate::{open_backup_lockfile, BackupLockGuard};
>>>> const LOCK_FILE: &str = pbs_buildcfg::configdir!("/
>>>> token.shadow.lock");
>>>> const CONF_FILE: &str = pbs_buildcfg::configdir!("/token.shadow");
>>>> +/// Global in-memory cache for successfully verified API token
>>>> secrets.
>>>> +/// The cache stores plain text secrets for token Authids that have
>>>> already been
>>>> +/// verified against the hashed values in `token.shadow`. This
>>>> allows for cheap
>>>> +/// subsequent authentications for the same token+secret
>>>> combination, avoiding
>>>> +/// recomputing the password hash on every request.
>>>> +static TOKEN_SECRET_CACHE: OnceCell<RwLock<ApiTokenSecretCache>> =
>>>> OnceCell::new();
>>>> +
>>>> #[derive(Serialize, Deserialize)]
>>>> #[serde(rename_all = "kebab-case")]
>>>> /// ApiToken id / secret pair
>>>> @@ -54,9 +63,25 @@ pub fn verify_secret(tokenid: &Authid, secret:
>>>> &str) -> Result<(), Error> {
>>>> bail!("not an API token ID");
>>>> }
>>>> + // Fast path
>>>> + if let Some(cached) =
>>>> token_secret_cache().read().unwrap().secrets.get(tokenid) {
>>>
>>> did you benchmark this with a lot of parallel token requests? a plain
>>> RwLock
>>> gives no guarantees at all w.r.t. ordering or fairness, so a lot of
>>> token-based
>>> requests could effectively prevent token removal AFAICT (or vice-versa,
>>> spamming token creation could lock out all tokens?)
>>>
>>> since we don't actually require the cache here to proceed, we could
>>> also make this a try_read
>>> or a read with timeout, and fallback to the slow path if there is too
>>> much
>>> contention? alternatively, comparing with parking_lot would also be
>>> interesting, since that implementation does have fairness guarantees.
>>>
>>> note that token-based requests are basically doable by anyone being
>>> able to
>>> reach PBS, whereas token creation/deletion is available to every
>>> authenticated
>>> user.
>>>
>>
>> Thanks for the review Fabian and the valuable comments!
>>
>> I did not benchmark the RwLock itself under load. Your point about
>> contention/fairness for RwLock makes perfect sense, and we should
>> consider this. So for v2, I will integrate try_read() /
>> try_write() as mentioned to avoid possible contention / DoS issues.
>>
>> I’ll also consider parking_lot::RwLock, thanks for the hint!
>>
>
>
> I benchmarked the "writer under heavy parallel readers" scenario by
> running a 64-parallel token-auth flood against
> /admin/datastore/ds0001/status?verbose=0 (≈ 44-48k successful
> requests total) while executing 50 token create + 50 token delete
> operations.
>
> With the suggested best-effort approach (cache lookups/inserts via
> try_read/try_write) I saw the following e2e API latencies:
>
> delete: p95 ~39ms, max ~44ms
> create: p95 ~50ms, max ~56ms
>
> I also compared against parking_lot::RwLock under the same setup,
> results were in the same range (delete p95 ~39–43ms, max ~43–64ms)
> so I didn’t see a clear benefit there for this workload.
>
> For v2 I will keep std::sync::RwLock with best-effort reads and inserts,
> while keeping delete/removal blocking.
>
>
Fabian,
one clarification/follow-up: the comparison against parking_lot::RwLock
was focused on end-to-end latency, and under the benchmarked
workload we didn’t observe starvation effects. Still, std::sync::RwLock
does not provide ordering or fairness guarantees, so under sustained
token-auth read load cache invalidation could theoretically be delayed.
Given that, I think switching to parking_lot::RwLock for v2 to get clear
fairness semantics, while keeping the try_read/try_insert approach, is
the better solution here.
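As background for the constant-time comparison the quoted hunks below rely on:
the point of `openssl::memcmp::eq` is that the comparison time does not depend
on where the first mismatching byte sits, so an attacker cannot binary-search
the cached secret byte by byte. A dependency-free illustration of the idea
(the real code should keep using the openssl primitive):

```rust
/// Constant-time byte comparison: examines every byte regardless of where
/// the first mismatch occurs, so timing does not leak the mismatch position.
/// Only the (non-secret) length is allowed to short-circuit.
fn ct_eq(a: &[u8], b: &[u8]) -> bool {
    if a.len() != b.len() {
        return false;
    }
    let mut diff: u8 = 0;
    for (x, y) in a.iter().zip(b.iter()) {
        diff |= x ^ y; // accumulate differences without branching on them
    }
    diff == 0
}

fn main() {
    assert!(ct_eq(b"s3cret", b"s3cret"));
    assert!(!ct_eq(b"s3cret", b"s3creT"));
    assert!(!ct_eq(b"s3cret", b"s3cre"));
}
```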
>>>> + // Compare cached secret with provided one using constant
>>>> time comparison
>>>> + if openssl::memcmp::eq(cached.as_bytes(), secret.as_bytes()) {
>>>> + // Already verified before
>>>> + return Ok(());
>>>> + }
>>>> + // Fall through to slow path if secret doesn't match cached
>>>> one
>>>> + }
>>>
>>> this could also be a helper, like the rest. then it would consume (a
>>> reference
>>> to) the user-provided secret value, instead of giving access to all
>>> cached
>>> ones. doesn't make a real difference now other than consistence, but
>>> the cache
>>> is (more) cleanly encapsulated then.
>>>
>>>> +
>>>> + // Slow path: read file + verify hash
>>>> let data = read_file()?;
>>>> match data.get(tokenid) {
>>>> - Some(hashed_secret) =>
>>>> proxmox_sys::crypt::verify_crypt_pw(secret, hashed_secret),
>>>> + Some(hashed_secret) => {
>>>> + proxmox_sys::crypt::verify_crypt_pw(secret,
>>>> hashed_secret)?;
>>>> + // Cache the plain secret for future requests
>>>> + cache_insert_secret(tokenid.clone(), secret.to_owned());
>>>
>>> same applies here - storing the value in the cache is optional (and
>>> good if it
>>> works), but we don't want to stall forever waiting for the cache
>>> insertion to
>>> go through..
>>>
>>>> + Ok(())
>>>> + }
>>>> None => bail!("invalid API token"),
>>>> }
>>>> }
>>>> @@ -82,6 +107,8 @@ fn set_secret(tokenid: &Authid, secret: &str) ->
>>>> Result<(), Error> {
>>>> data.insert(tokenid.clone(), hashed_secret);
>>>> write_file(data)?;
>>>> + cache_insert_secret(tokenid.clone(), secret.to_owned());
>>>
>>> this
>>>
>>>> +
>>>> Ok(())
>>>> }
>>>> @@ -97,5 +124,34 @@ pub fn delete_secret(tokenid: &Authid) ->
>>>> Result<(), Error> {
>>>> data.remove(tokenid);
>>>> write_file(data)?;
>>>> + cache_remove_secret(tokenid);
>>>
>>> and this need to block of course and can't be skipped, because
>>> otherwise the
>>> read above might operate on wrong data..
>>>
>>>> +
>>>> Ok(())
>>>> }
>>>> +
>>>> +struct ApiTokenSecretCache {
>>>> + /// Keys are token Authids, values are the corresponding plain
>>>> text secrets.
>>>> + /// Entries are added after a successful on-disk verification in
>>>> + /// `verify_secret` or when a new token secret is generated by
>>>> + /// `generate_and_set_secret`. Used to avoid repeated
>>>> + /// password-hash computation on subsequent authentications.
>>>> + secrets: HashMap<Authid, String>,
>>>> +}
>>>> +
>>>> +fn token_secret_cache() -> &'static RwLock<ApiTokenSecretCache> {
>>>> + TOKEN_SECRET_CACHE.get_or_init(|| {
>>>> + RwLock::new(ApiTokenSecretCache {
>>>> + secrets: HashMap::new(),
>>>> + })
>>>> + })
>>>> +}
>>>> +
>>>> +fn cache_insert_secret(tokenid: Authid, secret: String) {
>>>> + let mut cache = token_secret_cache().write().unwrap();
>>>> + cache.secrets.insert(tokenid, secret);
>>>> +}
>>>> +
>>>> +fn cache_remove_secret(tokenid: &Authid) {
>>>> + let mut cache = token_secret_cache().write().unwrap();
>>>> + cache.secrets.remove(tokenid);
>>>> +}
>>>> --
>>>> 2.47.3
>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> pbs-devel mailing list
>>>> pbs-devel@lists.proxmox.com
>>>> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
>>>>
>>>>
^ permalink raw reply [relevance 12%]
* Re: [pbs-devel] [PATCH proxmox-backup 1/3] pbs-config: cache verified API token secrets
2025-12-10 15:35 6% ` Samuel Rufinatscha
@ 2025-12-15 15:05 12% ` Samuel Rufinatscha
2025-12-15 19:00 12% ` Samuel Rufinatscha
0 siblings, 1 reply; 200+ results
From: Samuel Rufinatscha @ 2025-12-15 15:05 UTC (permalink / raw)
To: Fabian Grünbichler, pbs-devel
On 12/10/25 4:35 PM, Samuel Rufinatscha wrote:
> On 12/10/25 12:47 PM, Fabian Grünbichler wrote:
>> Quoting Samuel Rufinatscha (2025-12-05 14:25:54)
>>> Currently, every token-based API request reads the token.shadow file and
>>> runs the expensive password hash verification for the given token
>>> secret. This shows up as a hotspot in /status profiling (see
>>> bug #6049 [1]).
>>>
>>> This patch introduces an in-memory cache of successfully verified token
>>> secrets. Subsequent requests for the same token+secret combination only
>>> perform a comparison using openssl::memcmp::eq and avoid re-running the
>>> password hash. The cache is updated when a token secret is set and
>>> cleared when a token is deleted. Note, this does NOT include manual
>>> config changes, which will be covered in a subsequent patch.
>>>
>>> This patch partly fixes bug #6049 [1].
>>>
>>> [1] https://bugzilla.proxmox.com/show_bug.cgi?id=7017
>>>
>>> Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
>>> ---
>>> pbs-config/src/token_shadow.rs | 58 +++++++++++++++++++++++++++++++++-
>>> 1 file changed, 57 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/pbs-config/src/token_shadow.rs b/pbs-config/src/token_shadow.rs
>>> index 640fabbf..47aa2fc2 100644
>>> --- a/pbs-config/src/token_shadow.rs
>>> +++ b/pbs-config/src/token_shadow.rs
>>> @@ -1,6 +1,8 @@
>>> use std::collections::HashMap;
>>> +use std::sync::RwLock;
>>> use anyhow::{bail, format_err, Error};
>>> +use once_cell::sync::OnceCell;
>>> use serde::{Deserialize, Serialize};
>>> use serde_json::{from_value, Value};
>>> @@ -13,6 +15,13 @@ use crate::{open_backup_lockfile, BackupLockGuard};
>>> const LOCK_FILE: &str = pbs_buildcfg::configdir!("/token.shadow.lock");
>>> const CONF_FILE: &str = pbs_buildcfg::configdir!("/token.shadow");
>>> +/// Global in-memory cache for successfully verified API token secrets.
>>> +/// The cache stores plain text secrets for token Authids that have already been
>>> +/// verified against the hashed values in `token.shadow`. This allows for cheap
>>> +/// subsequent authentications for the same token+secret combination, avoiding
>>> +/// recomputing the password hash on every request.
>>> +static TOKEN_SECRET_CACHE: OnceCell<RwLock<ApiTokenSecretCache>> = OnceCell::new();
>>> +
>>> #[derive(Serialize, Deserialize)]
>>> #[serde(rename_all = "kebab-case")]
>>> /// ApiToken id / secret pair
>>> @@ -54,9 +63,25 @@ pub fn verify_secret(tokenid: &Authid, secret: &str) -> Result<(), Error> {
>>> bail!("not an API token ID");
>>> }
>>> + // Fast path
>>> + // Fast path
>>> + if let Some(cached) = token_secret_cache().read().unwrap().secrets.get(tokenid) {
>>
>> did you benchmark this with a lot of parallel token requests? a plain
>> RwLock gives no guarantees at all w.r.t. ordering or fairness, so a lot
>> of token-based requests could effectively prevent token removal AFAICT
>> (or vice-versa, spamming token creation could lock out all tokens?)
>>
>> since we don't actually require the cache here to proceed, we could also
>> make this a try_read or a read with timeout, and fall back to the slow
>> path if there is too much contention? alternatively, comparing with
>> parking_lot would also be interesting, since that implementation does
>> have fairness guarantees.
>>
>> note that token-based requests are basically doable by anyone being able
>> to reach PBS, whereas token creation/deletion is available to every
>> authenticated user.
>>
>
> Thanks for the review Fabian and the valuable comments!
>
> I did not benchmark the RwLock itself under load. Your point about
> contention/fairness for RwLock makes perfect sense, and we should
> consider this. So for v2, I will integrate try_read() /
> try_write() as mentioned to avoid possible contention / DoS issues.
>
> I’ll also consider parking_lot::RwLock, thanks for the hint!
>
I benchmarked the "writer under heavy parallel readers" scenario by
running a 64-parallel token-auth flood against
/admin/datastore/ds0001/status?verbose=0 (≈ 44-48k successful
requests total) while executing 50 token create + 50 token delete
operations.
With the suggested best-effort approach (cache lookups/inserts via
try_read/try_write) I saw the following e2e API latencies:
delete: p95 ~39ms, max ~44ms
create: p95 ~50ms, max ~56ms
I also compared against parking_lot::RwLock under the same setup,
results were in the same range (delete p95 ~39–43ms, max ~43–64ms)
so I didn’t see a clear benefit there for this workload.
For v2 I will keep std::sync::RwLock with best-effort read/insert, while
delete/removal stays blocking.
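As a rough illustration of that plan (best-effort try_read/try_write for
lookups and inserts, a blocking write lock for removal), the following
std-only sketch shows the intended lock usage. The function names and the
plain String keys are placeholders for the Authid-keyed cache in the
actual patch, not the patch's API:

```rust
use std::collections::HashMap;
use std::sync::{OnceLock, RwLock};

static CACHE: OnceLock<RwLock<HashMap<String, String>>> = OnceLock::new();

fn cache() -> &'static RwLock<HashMap<String, String>> {
    CACHE.get_or_init(|| RwLock::new(HashMap::new()))
}

/// Best-effort lookup: on lock contention, report a miss so the
/// caller falls back to the slow (on-disk) verification path.
fn cache_lookup(tokenid: &str) -> Option<String> {
    match cache().try_read() {
        Ok(guard) => guard.get(tokenid).cloned(),
        Err(_) => None,
    }
}

/// Best-effort insert: skipping it on contention is harmless, since
/// the next successful slow-path verification repopulates the entry.
fn cache_insert(tokenid: &str, secret: &str) {
    if let Ok(mut guard) = cache().try_write() {
        guard.insert(tokenid.to_string(), secret.to_string());
    }
}

/// Removal must block: once delete_secret returns, no stale
/// plain-text secret may keep authenticating successfully.
fn cache_remove(tokenid: &str) {
    cache().write().unwrap().remove(tokenid);
}

fn main() {
    cache_insert("user@pbs!mytoken", "s3cret");
    assert_eq!(cache_lookup("user@pbs!mytoken").as_deref(), Some("s3cret"));
    cache_remove("user@pbs!mytoken");
    assert_eq!(cache_lookup("user@pbs!mytoken"), None);
}
```

With std's RwLock, try_read/try_write fail immediately instead of queueing,
which is exactly the non-blocking behavior wanted for the read/insert side.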
>>> + // Compare cached secret with provided one using constant time comparison
>>> + if openssl::memcmp::eq(cached.as_bytes(), secret.as_bytes()) {
>>> + // Already verified before
>>> + return Ok(());
>>> + }
>>> + // Fall through to slow path if secret doesn't match cached one
>>> + }
>>
>> this could also be a helper, like the rest. then it would consume (a
>> reference to) the user-provided secret value, instead of giving access
>> to all cached ones. doesn't make a real difference now other than
>> consistency, but the cache is (more) cleanly encapsulated then.
>>
>>> +
>>> + // Slow path: read file + verify hash
>>> let data = read_file()?;
>>> match data.get(tokenid) {
>>> - Some(hashed_secret) => proxmox_sys::crypt::verify_crypt_pw(secret, hashed_secret),
>>> + Some(hashed_secret) => {
>>> + proxmox_sys::crypt::verify_crypt_pw(secret, hashed_secret)?;
>>> + // Cache the plain secret for future requests
>>> + cache_insert_secret(tokenid.clone(), secret.to_owned());
>>
>> same applies here - storing the value in the cache is optional (and
>> good if it works), but we don't want to stall forever waiting for the
>> cache insertion to go through..
>>
>>> + Ok(())
>>> + }
>>> None => bail!("invalid API token"),
>>> }
>>> }
>>> @@ -82,6 +107,8 @@ fn set_secret(tokenid: &Authid, secret: &str) -> Result<(), Error> {
>>> data.insert(tokenid.clone(), hashed_secret);
>>> write_file(data)?;
>>> + cache_insert_secret(tokenid.clone(), secret.to_owned());
>>
>> this
>>
>>> +
>>> Ok(())
>>> }
>>> @@ -97,5 +124,34 @@ pub fn delete_secret(tokenid: &Authid) -> Result<(), Error> {
>>> data.remove(tokenid);
>>> write_file(data)?;
>>> + cache_remove_secret(tokenid);
>>
>> and this needs to block of course and can't be skipped, because
>> otherwise the read above might operate on wrong data..
>>
>>> +
>>> Ok(())
>>> }
>>> +
>>> +struct ApiTokenSecretCache {
>>> + /// Keys are token Authids, values are the corresponding plain text secrets.
>>> + /// Entries are added after a successful on-disk verification in
>>> + /// `verify_secret` or when a new token secret is generated by
>>> + /// `generate_and_set_secret`. Used to avoid repeated
>>> + /// password-hash computation on subsequent authentications.
>>> + secrets: HashMap<Authid, String>,
>>> +}
>>> +
>>> +fn token_secret_cache() -> &'static RwLock<ApiTokenSecretCache> {
>>> + TOKEN_SECRET_CACHE.get_or_init(|| {
>>> + RwLock::new(ApiTokenSecretCache {
>>> + secrets: HashMap::new(),
>>> + })
>>> + })
>>> +}
>>> +
>>> +fn cache_insert_secret(tokenid: Authid, secret: String) {
>>> + let mut cache = token_secret_cache().write().unwrap();
>>> + cache.secrets.insert(tokenid, secret);
>>> +}
>>> +
>>> +fn cache_remove_secret(tokenid: &Authid) {
>>> + let mut cache = token_secret_cache().write().unwrap();
>>> + cache.secrets.remove(tokenid);
>>> +}
>>> --
>>> 2.47.3
>>>
>>>
>>>
^ permalink raw reply [relevance 12%]
* Re: [pbs-devel] [PATCH proxmox-backup 1/3] pbs-config: cache verified API token secrets
2025-12-10 11:47 5% ` Fabian Grünbichler
@ 2025-12-10 15:35 6% ` Samuel Rufinatscha
2025-12-15 15:05 12% ` Samuel Rufinatscha
0 siblings, 1 reply; 200+ results
From: Samuel Rufinatscha @ 2025-12-10 15:35 UTC (permalink / raw)
To: Fabian Grünbichler, pbs-devel
On 12/10/25 12:47 PM, Fabian Grünbichler wrote:
> Quoting Samuel Rufinatscha (2025-12-05 14:25:54)
>> Currently, every token-based API request reads the token.shadow file and
>> runs the expensive password hash verification for the given token
>> secret. This shows up as a hotspot in /status profiling (see
>> bug #6049 [1]).
>>
>> This patch introduces an in-memory cache of successfully verified token
>> secrets. Subsequent requests for the same token+secret combination only
>> perform a comparison using openssl::memcmp::eq and avoid re-running the
>> password hash. The cache is updated when a token secret is set and
>> cleared when a token is deleted. Note, this does NOT include manual
>> config changes, which will be covered in a subsequent patch.
>>
>> This patch partly fixes bug #6049 [1].
>>
>> [1] https://bugzilla.proxmox.com/show_bug.cgi?id=7017
>>
>> Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
>> ---
>> pbs-config/src/token_shadow.rs | 58 +++++++++++++++++++++++++++++++++-
>> 1 file changed, 57 insertions(+), 1 deletion(-)
>>
>> diff --git a/pbs-config/src/token_shadow.rs b/pbs-config/src/token_shadow.rs
>> index 640fabbf..47aa2fc2 100644
>> --- a/pbs-config/src/token_shadow.rs
>> +++ b/pbs-config/src/token_shadow.rs
>> @@ -1,6 +1,8 @@
>> use std::collections::HashMap;
>> +use std::sync::RwLock;
>>
>> use anyhow::{bail, format_err, Error};
>> +use once_cell::sync::OnceCell;
>> use serde::{Deserialize, Serialize};
>> use serde_json::{from_value, Value};
>>
>> @@ -13,6 +15,13 @@ use crate::{open_backup_lockfile, BackupLockGuard};
>> const LOCK_FILE: &str = pbs_buildcfg::configdir!("/token.shadow.lock");
>> const CONF_FILE: &str = pbs_buildcfg::configdir!("/token.shadow");
>>
>> +/// Global in-memory cache for successfully verified API token secrets.
>> +/// The cache stores plain text secrets for token Authids that have already been
>> +/// verified against the hashed values in `token.shadow`. This allows for cheap
>> +/// subsequent authentications for the same token+secret combination, avoiding
>> +/// recomputing the password hash on every request.
>> +static TOKEN_SECRET_CACHE: OnceCell<RwLock<ApiTokenSecretCache>> = OnceCell::new();
>> +
>> #[derive(Serialize, Deserialize)]
>> #[serde(rename_all = "kebab-case")]
>> /// ApiToken id / secret pair
>> @@ -54,9 +63,25 @@ pub fn verify_secret(tokenid: &Authid, secret: &str) -> Result<(), Error> {
>> bail!("not an API token ID");
>> }
>>
>> + // Fast path
>> + if let Some(cached) = token_secret_cache().read().unwrap().secrets.get(tokenid) {
>
> did you benchmark this with a lot of parallel token requests? a plain RwLock
> gives no guarantees at all w.r.t. ordering or fairness, so a lot of token-based
> requests could effectively prevent token removal AFAICT (or vice-versa,
> spamming token creation could lock out all tokens?)
>
> since we don't actually require the cache here to proceed, we could also make this a try_read
> or a read with timeout, and fallback to the slow path if there is too much
> contention? alternatively, comparing with parking_lot would also be
> interesting, since that implementation does have fairness guarantees.
>
> note that token-based requests are basically doable by anyone being able to
> reach PBS, whereas token creation/deletion is available to every authenticated
> user.
>
Thanks for the review Fabian and the valuable comments!
I did not benchmark the RwLock itself under load. Your point about
contention/fairness for RwLock makes perfect sense, and we should
consider this. So for v2, I will integrate try_read() /
try_write() as mentioned to avoid possible contention / DoS issues.
I’ll also consider parking_lot::RwLock, thanks for the hint!
>> + // Compare cached secret with provided one using constant time comparison
>> + if openssl::memcmp::eq(cached.as_bytes(), secret.as_bytes()) {
>> + // Already verified before
>> + return Ok(());
>> + }
>> + // Fall through to slow path if secret doesn't match cached one
>> + }
>
> this could also be a helper, like the rest. then it would consume (a reference
> to) the user-provided secret value, instead of giving access to all cached
> ones. doesn't make a real difference now other than consistency, but the cache
> is (more) cleanly encapsulated then.
>
>> +
>> + // Slow path: read file + verify hash
>> let data = read_file()?;
>> match data.get(tokenid) {
>> - Some(hashed_secret) => proxmox_sys::crypt::verify_crypt_pw(secret, hashed_secret),
>> + Some(hashed_secret) => {
>> + proxmox_sys::crypt::verify_crypt_pw(secret, hashed_secret)?;
>> + // Cache the plain secret for future requests
>> + cache_insert_secret(tokenid.clone(), secret.to_owned());
>
> same applies here - storing the value in the cache is optional (and good if it
> works), but we don't want to stall forever waiting for the cache insertion to
> go through..
>
>> + Ok(())
>> + }
>> None => bail!("invalid API token"),
>> }
>> }
>> @@ -82,6 +107,8 @@ fn set_secret(tokenid: &Authid, secret: &str) -> Result<(), Error> {
>> data.insert(tokenid.clone(), hashed_secret);
>> write_file(data)?;
>>
>> + cache_insert_secret(tokenid.clone(), secret.to_owned());
>
> this
>
>> +
>> Ok(())
>> }
>>
>> @@ -97,5 +124,34 @@ pub fn delete_secret(tokenid: &Authid) -> Result<(), Error> {
>> data.remove(tokenid);
>> write_file(data)?;
>>
>> + cache_remove_secret(tokenid);
>
> and this needs to block of course and can't be skipped, because otherwise the
> read above might operate on wrong data..
>
>> +
>> Ok(())
>> }
>> +
>> +struct ApiTokenSecretCache {
>> + /// Keys are token Authids, values are the corresponding plain text secrets.
>> + /// Entries are added after a successful on-disk verification in
>> + /// `verify_secret` or when a new token secret is generated by
>> + /// `generate_and_set_secret`. Used to avoid repeated
>> + /// password-hash computation on subsequent authentications.
>> + secrets: HashMap<Authid, String>,
>> +}
>> +
>> +fn token_secret_cache() -> &'static RwLock<ApiTokenSecretCache> {
>> + TOKEN_SECRET_CACHE.get_or_init(|| {
>> + RwLock::new(ApiTokenSecretCache {
>> + secrets: HashMap::new(),
>> + })
>> + })
>> +}
>> +
>> +fn cache_insert_secret(tokenid: Authid, secret: String) {
>> + let mut cache = token_secret_cache().write().unwrap();
>> + cache.secrets.insert(tokenid, secret);
>> +}
>> +
>> +fn cache_remove_secret(tokenid: &Authid) {
>> + let mut cache = token_secret_cache().write().unwrap();
>> + cache.secrets.remove(tokenid);
>> +}
>> --
>> 2.47.3
>>
>>
>>
>> _______________________________________________
>> pbs-devel mailing list
>> pbs-devel@lists.proxmox.com
>> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
>>
>>
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
^ permalink raw reply [relevance 6%]
* Re: [pbs-devel] [PATCH proxmox-backup 1/3] pbs-config: cache verified API token secrets
2025-12-05 13:25 14% ` [pbs-devel] [PATCH proxmox-backup 1/3] pbs-config: cache verified API token secrets Samuel Rufinatscha
2025-12-05 14:04 5% ` Shannon Sterz
@ 2025-12-10 11:47 5% ` Fabian Grünbichler
2025-12-10 15:35 6% ` Samuel Rufinatscha
1 sibling, 1 reply; 200+ results
From: Fabian Grünbichler @ 2025-12-10 11:47 UTC (permalink / raw)
To: Samuel Rufinatscha, pbs-devel
Quoting Samuel Rufinatscha (2025-12-05 14:25:54)
> Currently, every token-based API request reads the token.shadow file and
> runs the expensive password hash verification for the given token
> secret. This shows up as a hotspot in /status profiling (see
> bug #6049 [1]).
>
> This patch introduces an in-memory cache of successfully verified token
> secrets. Subsequent requests for the same token+secret combination only
> perform a comparison using openssl::memcmp::eq and avoid re-running the
> password hash. The cache is updated when a token secret is set and
> cleared when a token is deleted. Note, this does NOT include manual
> config changes, which will be covered in a subsequent patch.
>
> This patch partly fixes bug #6049 [1].
>
> [1] https://bugzilla.proxmox.com/show_bug.cgi?id=7017
>
> Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
> ---
> pbs-config/src/token_shadow.rs | 58 +++++++++++++++++++++++++++++++++-
> 1 file changed, 57 insertions(+), 1 deletion(-)
>
> diff --git a/pbs-config/src/token_shadow.rs b/pbs-config/src/token_shadow.rs
> index 640fabbf..47aa2fc2 100644
> --- a/pbs-config/src/token_shadow.rs
> +++ b/pbs-config/src/token_shadow.rs
> @@ -1,6 +1,8 @@
> use std::collections::HashMap;
> +use std::sync::RwLock;
>
> use anyhow::{bail, format_err, Error};
> +use once_cell::sync::OnceCell;
> use serde::{Deserialize, Serialize};
> use serde_json::{from_value, Value};
>
> @@ -13,6 +15,13 @@ use crate::{open_backup_lockfile, BackupLockGuard};
> const LOCK_FILE: &str = pbs_buildcfg::configdir!("/token.shadow.lock");
> const CONF_FILE: &str = pbs_buildcfg::configdir!("/token.shadow");
>
> +/// Global in-memory cache for successfully verified API token secrets.
> +/// The cache stores plain text secrets for token Authids that have already been
> +/// verified against the hashed values in `token.shadow`. This allows for cheap
> +/// subsequent authentications for the same token+secret combination, avoiding
> +/// recomputing the password hash on every request.
> +static TOKEN_SECRET_CACHE: OnceCell<RwLock<ApiTokenSecretCache>> = OnceCell::new();
> +
> #[derive(Serialize, Deserialize)]
> #[serde(rename_all = "kebab-case")]
> /// ApiToken id / secret pair
> @@ -54,9 +63,25 @@ pub fn verify_secret(tokenid: &Authid, secret: &str) -> Result<(), Error> {
> bail!("not an API token ID");
> }
>
> + // Fast path
> + if let Some(cached) = token_secret_cache().read().unwrap().secrets.get(tokenid) {
did you benchmark this with a lot of parallel token requests? a plain RwLock
gives no guarantees at all w.r.t. ordering or fairness, so a lot of token-based
requests could effectively prevent token removal AFAICT (or vice-versa,
spamming token creation could lock out all tokens?)
since we don't actually require the cache here to proceed, we could also make this a try_read
or a read with timeout, and fallback to the slow path if there is too much
contention? alternatively, comparing with parking_lot would also be
interesting, since that implementation does have fairness guarantees.
note that token-based requests are basically doable by anyone being able to
reach PBS, whereas token creation/deletion is available to every authenticated
user.
> + // Compare cached secret with provided one using constant time comparison
> + if openssl::memcmp::eq(cached.as_bytes(), secret.as_bytes()) {
> + // Already verified before
> + return Ok(());
> + }
> + // Fall through to slow path if secret doesn't match cached one
> + }
this could also be a helper, like the rest. then it would consume (a reference
to) the user-provided secret value, instead of giving access to all cached
ones. doesn't make a real difference now other than consistency, but the cache
is (more) cleanly encapsulated then.
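A sketch of what such an encapsulating helper could look like; this is
illustrative only, with a plain String key and a hand-rolled ct_eq standing
in for Authid and openssl::memcmp::eq from the real code:

```rust
use std::collections::HashMap;

struct ApiTokenSecretCache {
    secrets: HashMap<String, String>,
}

impl ApiTokenSecretCache {
    /// Compare the caller-supplied secret against the cached one without
    /// ever handing out the cached plain-text value itself.
    fn verify_cached(&self, tokenid: &str, secret: &str) -> bool {
        match self.secrets.get(tokenid) {
            Some(cached) => ct_eq(cached.as_bytes(), secret.as_bytes()),
            None => false,
        }
    }
}

/// Illustrative constant-time byte comparison; real code should keep
/// using openssl::memcmp::eq rather than a hand-rolled loop.
fn ct_eq(a: &[u8], b: &[u8]) -> bool {
    if a.len() != b.len() {
        return false;
    }
    let mut diff = 0u8;
    for (x, y) in a.iter().zip(b.iter()) {
        diff |= x ^ y;
    }
    diff == 0
}

fn main() {
    let mut cache = ApiTokenSecretCache { secrets: HashMap::new() };
    cache.secrets.insert("user@pbs!t1".to_string(), "hunter2".to_string());
    assert!(cache.verify_cached("user@pbs!t1", "hunter2"));
    assert!(!cache.verify_cached("user@pbs!t1", "wrong"));
    assert!(!cache.verify_cached("unknown", "hunter2"));
}
```

The point of the helper is that callers only learn "matches / doesn't
match", never the cached plain-text secrets themselves.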
> +
> + // Slow path: read file + verify hash
> let data = read_file()?;
> match data.get(tokenid) {
> - Some(hashed_secret) => proxmox_sys::crypt::verify_crypt_pw(secret, hashed_secret),
> + Some(hashed_secret) => {
> + proxmox_sys::crypt::verify_crypt_pw(secret, hashed_secret)?;
> + // Cache the plain secret for future requests
> + cache_insert_secret(tokenid.clone(), secret.to_owned());
same applies here - storing the value in the cache is optional (and good if it
works), but we don't want to stall forever waiting for the cache insertion to
go through..
> + Ok(())
> + }
> None => bail!("invalid API token"),
> }
> }
> @@ -82,6 +107,8 @@ fn set_secret(tokenid: &Authid, secret: &str) -> Result<(), Error> {
> data.insert(tokenid.clone(), hashed_secret);
> write_file(data)?;
>
> + cache_insert_secret(tokenid.clone(), secret.to_owned());
this
> +
> Ok(())
> }
>
> @@ -97,5 +124,34 @@ pub fn delete_secret(tokenid: &Authid) -> Result<(), Error> {
> data.remove(tokenid);
> write_file(data)?;
>
> + cache_remove_secret(tokenid);
and this needs to block of course and can't be skipped, because otherwise the
read above might operate on wrong data..
> +
> Ok(())
> }
> +
> +struct ApiTokenSecretCache {
> + /// Keys are token Authids, values are the corresponding plain text secrets.
> + /// Entries are added after a successful on-disk verification in
> + /// `verify_secret` or when a new token secret is generated by
> + /// `generate_and_set_secret`. Used to avoid repeated
> + /// password-hash computation on subsequent authentications.
> + secrets: HashMap<Authid, String>,
> +}
> +
> +fn token_secret_cache() -> &'static RwLock<ApiTokenSecretCache> {
> + TOKEN_SECRET_CACHE.get_or_init(|| {
> + RwLock::new(ApiTokenSecretCache {
> + secrets: HashMap::new(),
> + })
> + })
> +}
> +
> +fn cache_insert_secret(tokenid: Authid, secret: String) {
> + let mut cache = token_secret_cache().write().unwrap();
> + cache.secrets.insert(tokenid, secret);
> +}
> +
> +fn cache_remove_secret(tokenid: &Authid) {
> + let mut cache = token_secret_cache().write().unwrap();
> + cache.secrets.remove(tokenid);
> +}
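As a side note, the once_cell-based lazy initialization quoted above also
has a standard-library equivalent since Rust 1.70, std::sync::OnceLock; a
minimal sketch of the same accessor pattern, with a plain HashMap standing
in for ApiTokenSecretCache:

```rust
use std::collections::HashMap;
use std::sync::{OnceLock, RwLock};

// std::sync::OnceLock replaces once_cell::sync::OnceCell here; the
// get_or_init call site stays identical.
static TOKEN_SECRET_CACHE: OnceLock<RwLock<HashMap<String, String>>> = OnceLock::new();

fn token_secret_cache() -> &'static RwLock<HashMap<String, String>> {
    TOKEN_SECRET_CACHE.get_or_init(|| RwLock::new(HashMap::new()))
}

fn main() {
    token_secret_cache()
        .write()
        .unwrap()
        .insert("tokenid".to_string(), "secret".to_string());
    let cache = token_secret_cache().read().unwrap();
    assert_eq!(cache.get("tokenid").map(|s| s.as_str()), Some("secret"));
}
```

Whether that swap is worthwhile depends on the crate's MSRV, so it is only
an option, not a requirement for this patch.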
> --
> 2.47.3
>
>
>
^ permalink raw reply [relevance 5%]
* Re: [pbs-devel] [PATCH proxmox v4 1/4] acme-api: add helper to load client for an account
2025-12-09 16:51 5% ` Max R. Carrara
@ 2025-12-10 10:08 6% ` Samuel Rufinatscha
0 siblings, 0 replies; 200+ results
From: Samuel Rufinatscha @ 2025-12-10 10:08 UTC (permalink / raw)
To: Proxmox Backup Server development discussion, Max R. Carrara
On 12/9/25 5:51 PM, Max R. Carrara wrote:
> On Wed Dec 3, 2025 at 11:22 AM CET, Samuel Rufinatscha wrote:
>> The PBS ACME refactoring needs a simple way to obtain an AcmeClient for
>> a given configured account without duplicating config wiring. This patch
>> adds a load_client_with_account helper in proxmox-acme-api that loads
>> the account and constructs a matching client, similarly as PBS previous
>> own AcmeClient::load() function.
>
> Hmm, you say *needs* here, but the function added here isn't actually
> used in this series ...
>
> I personally don't mind keeping this one around for future cases, but
> are there any cases among this series (in PBS or otherwise) where we
> could've used this function instead already?
>
> If not, then it's probably not worth keeping it around. I assume you
> added it for good reason though, so I suggest to double-check where it's
> useful ;)
>
Good point about this function! :)
It was originally introduced to support the minimal client-swap
refactor:
[PATCH proxmox-backup v4 2/4] acme: drop local AcmeClient.
Some of its usages could be removed as part of the API implementation
changes in:
[PATCH proxmox-backup v4 3/4] acme: change API impls to use
proxmox-acme-api handlers.
However, it is still required for NodeConfig::acme_client() currently.
>>
>> Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
>> ---
>> proxmox-acme-api/src/account_api_impl.rs | 5 +++++
>> proxmox-acme-api/src/lib.rs | 3 ++-
>> 2 files changed, 7 insertions(+), 1 deletion(-)
>>
>> diff --git a/proxmox-acme-api/src/account_api_impl.rs b/proxmox-acme-api/src/account_api_impl.rs
>> index ef195908..ca8c8655 100644
>> --- a/proxmox-acme-api/src/account_api_impl.rs
>> +++ b/proxmox-acme-api/src/account_api_impl.rs
>> @@ -116,3 +116,8 @@ pub async fn update_account(name: &AcmeAccountName, contact: Option<String>) ->
>>
>> Ok(())
>> }
>> +
>> +pub async fn load_client_with_account(account_name: &AcmeAccountName) -> Result<AcmeClient, Error> {
>> + let account_data = super::account_config::load_account_config(&account_name).await?;
>> + Ok(account_data.client())
>> +}
>> diff --git a/proxmox-acme-api/src/lib.rs b/proxmox-acme-api/src/lib.rs
>> index 623e9e23..96f88ae2 100644
>> --- a/proxmox-acme-api/src/lib.rs
>> +++ b/proxmox-acme-api/src/lib.rs
>> @@ -31,7 +31,8 @@ mod plugin_config;
>> mod account_api_impl;
>> #[cfg(feature = "impl")]
>> pub use account_api_impl::{
>> - deactivate_account, get_account, get_tos, list_accounts, register_account, update_account,
>> + deactivate_account, get_account, get_tos, list_accounts, load_client_with_account,
>> + register_account, update_account,
>> };
>>
>> #[cfg(feature = "impl")]
>
>
>
^ permalink raw reply [relevance 6%]
* Re: [pbs-devel] [PATCH proxmox{-backup, } v4 0/8] fix #6939: acme: support servers returning 204 for nonce requests
2025-12-09 16:50 5% ` [pbs-devel] [PATCH proxmox{-backup, } v4 0/8] " Max R. Carrara
@ 2025-12-10 9:44 6% ` Samuel Rufinatscha
0 siblings, 0 replies; 200+ results
From: Samuel Rufinatscha @ 2025-12-10 9:44 UTC (permalink / raw)
To: Proxmox Backup Server development discussion, Max R. Carrara
On 12/9/25 5:51 PM, Max R. Carrara wrote:
> On Wed Dec 3, 2025 at 11:22 AM CET, Samuel Rufinatscha wrote:
>> Hi,
>>
>> this series fixes account registration for ACME providers that return
>> HTTP 204 No Content to the newNonce request. Currently, both the PBS
>> ACME client and the shared ACME client in proxmox-acme only accept
>> HTTP 200 OK for this request. The issue was observed in PBS against a
>> custom ACME deployment and reported as bug #6939 [1].
>>
>> [...]
>
> Testing
> -------
>
> Tested this on my local PBS development instance with the DNS-01
> challenge using one of my domains on OVH and Let's Encrypt Staging.
>
> The cert was ordered without any problems. Everything worked just as
> before.
>
> Comments Regarding the Changes Made
> -----------------------------------
>
> Overall, looks pretty good! I only found a few minor things, see my
> comments inline.
>
> What I would recommend overall is to make the changes in `proxmox`
> first, and then use the new `async fn` you introduced in patch #4
> (proxmox) in `proxmox-backup` instead of doing things the other way
> around. That way you could perhaps use the function you introduced,
> since I'm assuming you added it for good reason.
>
> Conclusion
> ----------
>
> LGTM—needs a teeny tiny bit more polish (see comments inline), but
> otherwise works great already! :D Good to see a lot of redundant code
> being removed.
>
> The few things I mentioned inline aren't *strict* blockers IMO and can
> maybe be addressed in a couple follow-up patches, if this gets merged as
> is. Otherwise, should you release a v5 of this series, I'll do another
> review.
>
> Anyhow, should the maintainer decide to merge this series, please
> consider:
>
> Reviewed-by: Max R. Carrara <m.carrara@proxmox.com>
> Tested-by: Max R. Carrara <m.carrara@proxmox.com>
>
Thank you Max for the detailed review and for testing! It's great to
hear that this refactor behaves as expected for you too.
I will make sure to polish the imports as suggested (thanks for
providing them). Also, I will re-order the patches so that PBS can
depend on the newly introduced proxmox-acme-api function.
Regarding the unused Account functions you mentioned in
[PATCH proxmox v4 2/4] acme: reduce visibility of Request type, I agree,
we could probably remove them, especially now that their visibility
changed (pub -> pub(crate)) - will do that!
>>
>> proxmox-backup:
>>
>> Samuel Rufinatscha (4):
>> acme: include proxmox-acme-api dependency
>> acme: drop local AcmeClient
>> acme: change API impls to use proxmox-acme-api handlers
>> acme: certificate ordering through proxmox-acme-api
>>
>> Cargo.toml | 3 +
>> src/acme/client.rs | 691 -------------------------
>> src/acme/mod.rs | 5 -
>> src/acme/plugin.rs | 336 ------------
>> src/api2/config/acme.rs | 407 ++-------------
>> src/api2/node/certificates.rs | 240 ++-------
>> src/api2/types/acme.rs | 98 ----
>> src/api2/types/mod.rs | 3 -
>> src/bin/proxmox-backup-api.rs | 2 +
>> src/bin/proxmox-backup-manager.rs | 2 +
>> src/bin/proxmox-backup-proxy.rs | 1 +
>> src/bin/proxmox_backup_manager/acme.rs | 21 +-
>> src/config/acme/mod.rs | 51 +-
>> src/config/acme/plugin.rs | 99 +---
>> src/config/node.rs | 29 +-
>> src/lib.rs | 2 -
>> 16 files changed, 103 insertions(+), 1887 deletions(-)
>> delete mode 100644 src/acme/client.rs
>> delete mode 100644 src/acme/mod.rs
>> delete mode 100644 src/acme/plugin.rs
>> delete mode 100644 src/api2/types/acme.rs
>>
>>
>> proxmox:
>>
>> Samuel Rufinatscha (4):
>> acme-api: add helper to load client for an account
>> acme: reduce visibility of Request type
>> acme: introduce http_status module
>> fix #6939: acme: support servers returning 204 for nonce requests
>>
>> proxmox-acme-api/src/account_api_impl.rs | 5 +++++
>> proxmox-acme-api/src/lib.rs | 3 ++-
>> proxmox-acme/src/account.rs | 27 +++++++++++++-----------
>> proxmox-acme/src/async_client.rs | 8 +++----
>> proxmox-acme/src/authorization.rs | 2 +-
>> proxmox-acme/src/client.rs | 8 +++----
>> proxmox-acme/src/lib.rs | 6 ++----
>> proxmox-acme/src/order.rs | 2 +-
>> proxmox-acme/src/request.rs | 25 +++++++++++++++-------
>> 9 files changed, 51 insertions(+), 35 deletions(-)
>>
>>
>> Summary over all repositories:
>> 25 files changed, 154 insertions(+), 1922 deletions(-)
>
>
>
^ permalink raw reply [relevance 6%]
* Re: [pbs-devel] [PATCH proxmox v4 1/4] acme-api: add helper to load client for an account
2025-12-03 10:22 17% ` [pbs-devel] [PATCH proxmox v4 1/4] acme-api: add helper to load client for an account Samuel Rufinatscha
@ 2025-12-09 16:51 5% ` Max R. Carrara
2025-12-10 10:08 6% ` Samuel Rufinatscha
0 siblings, 1 reply; 200+ results
From: Max R. Carrara @ 2025-12-09 16:51 UTC (permalink / raw)
To: Proxmox Backup Server development discussion
On Wed Dec 3, 2025 at 11:22 AM CET, Samuel Rufinatscha wrote:
> The PBS ACME refactoring needs a simple way to obtain an AcmeClient for
> a given configured account without duplicating config wiring. This patch
> adds a load_client_with_account helper in proxmox-acme-api that loads
> the account and constructs a matching client, similar to PBS's previous
> AcmeClient::load() function.
Hmm, you say *needs* here, but the function added here isn't actually
used in this series ...
I personally don't mind keeping this one around for future cases, but
are there any cases among this series (in PBS or otherwise) where we
could've used this function instead already?
If not, then it's probably not worth keeping it around. I assume you
added it for good reason though, so I suggest double-checking where it's
useful ;)
>
> Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
> ---
> proxmox-acme-api/src/account_api_impl.rs | 5 +++++
> proxmox-acme-api/src/lib.rs | 3 ++-
> 2 files changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/proxmox-acme-api/src/account_api_impl.rs b/proxmox-acme-api/src/account_api_impl.rs
> index ef195908..ca8c8655 100644
> --- a/proxmox-acme-api/src/account_api_impl.rs
> +++ b/proxmox-acme-api/src/account_api_impl.rs
> @@ -116,3 +116,8 @@ pub async fn update_account(name: &AcmeAccountName, contact: Option<String>) ->
>
> Ok(())
> }
> +
> +pub async fn load_client_with_account(account_name: &AcmeAccountName) -> Result<AcmeClient, Error> {
> + let account_data = super::account_config::load_account_config(&account_name).await?;
> + Ok(account_data.client())
> +}
> diff --git a/proxmox-acme-api/src/lib.rs b/proxmox-acme-api/src/lib.rs
> index 623e9e23..96f88ae2 100644
> --- a/proxmox-acme-api/src/lib.rs
> +++ b/proxmox-acme-api/src/lib.rs
> @@ -31,7 +31,8 @@ mod plugin_config;
> mod account_api_impl;
> #[cfg(feature = "impl")]
> pub use account_api_impl::{
> - deactivate_account, get_account, get_tos, list_accounts, register_account, update_account,
> + deactivate_account, get_account, get_tos, list_accounts, load_client_with_account,
> + register_account, update_account,
> };
>
> #[cfg(feature = "impl")]
* Re: [pbs-devel] [PATCH proxmox-backup v4 4/4] acme: certificate ordering through proxmox-acme-api
2025-12-03 10:22 7% ` [pbs-devel] [PATCH proxmox-backup v4 4/4] acme: certificate ordering through proxmox-acme-api Samuel Rufinatscha
@ 2025-12-09 16:50 5% ` Max R. Carrara
0 siblings, 0 replies; 200+ results
From: Max R. Carrara @ 2025-12-09 16:50 UTC (permalink / raw)
To: Proxmox Backup Server development discussion
On Wed Dec 3, 2025 at 11:22 AM CET, Samuel Rufinatscha wrote:
> PBS currently uses its own ACME client and API logic, while PDM uses the
> factored-out proxmox-acme and proxmox-acme-api crates. This duplication
> risks differences in behaviour and requires ACME maintenance in two
> places. This patch is part of a series to move PBS over to the shared
> ACME stack.
>
> Changes:
> - Replace the custom ACME order/authorization loop in node certificates
> with a call to proxmox_acme_api::order_certificate.
> - Build domain + config data as proxmox-acme-api types
> - Remove obsolete local ACME ordering and plugin glue code.
>
> Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
> ---
> src/acme/mod.rs | 2 -
> src/acme/plugin.rs | 336 ----------------------------------
> src/api2/node/certificates.rs | 240 ++++--------------------
> src/api2/types/acme.rs | 74 --------
> src/api2/types/mod.rs | 3 -
> src/config/acme/mod.rs | 7 +-
> src/config/acme/plugin.rs | 99 +---------
> src/config/node.rs | 22 +--
> src/lib.rs | 2 -
> 9 files changed, 46 insertions(+), 739 deletions(-)
> delete mode 100644 src/acme/mod.rs
> delete mode 100644 src/acme/plugin.rs
> delete mode 100644 src/api2/types/acme.rs
>
> diff --git a/src/acme/mod.rs b/src/acme/mod.rs
> deleted file mode 100644
> index cc561f9a..00000000
> --- a/src/acme/mod.rs
> +++ /dev/null
> @@ -1,2 +0,0 @@
> -pub(crate) mod plugin;
> -pub(crate) use plugin::get_acme_plugin;
> diff --git a/src/acme/plugin.rs b/src/acme/plugin.rs
> deleted file mode 100644
> index 5bc09e1f..00000000
> --- a/src/acme/plugin.rs
> +++ /dev/null
> @@ -1,336 +0,0 @@
snip 8<-------------
> diff --git a/src/api2/node/certificates.rs b/src/api2/node/certificates.rs
> index 31196715..2a645b4a 100644
> --- a/src/api2/node/certificates.rs
> +++ b/src/api2/node/certificates.rs
> @@ -1,27 +1,19 @@
> -use std::sync::Arc;
> -use std::time::Duration;
> -
> use anyhow::{bail, format_err, Error};
> use openssl::pkey::PKey;
> use openssl::x509::X509;
> use serde::{Deserialize, Serialize};
> use tracing::info;
>
> -use proxmox_router::list_subdirs_api_method;
> -use proxmox_router::SubdirMap;
> -use proxmox_router::{Permission, Router, RpcEnvironment};
> -use proxmox_schema::api;
> -
> +use crate::server::send_certificate_renewal_mail;
> use pbs_api_types::{NODE_SCHEMA, PRIV_SYS_MODIFY};
> use pbs_buildcfg::configdir;
> use pbs_tools::cert;
> -use tracing::warn;
> -
> -use crate::api2::types::AcmeDomain;
> -use crate::config::node::NodeConfig;
> -use crate::server::send_certificate_renewal_mail;
> -use proxmox_acme::async_client::AcmeClient;
> +use proxmox_acme_api::AcmeDomain;
> use proxmox_rest_server::WorkerTask;
> +use proxmox_router::list_subdirs_api_method;
> +use proxmox_router::SubdirMap;
> +use proxmox_router::{Permission, Router, RpcEnvironment};
> +use proxmox_schema::api;
(See my comment on patch 2/4 for more information / context)
use anyhow::{bail, format_err, Error};
use openssl::pkey::PKey;
use openssl::x509::X509;
use serde::{Deserialize, Serialize};
use tracing::info;
use pbs_api_types::{NODE_SCHEMA, PRIV_SYS_MODIFY};
use proxmox_acme_api::AcmeDomain;
use proxmox_rest_server::WorkerTask;
use proxmox_router::SubdirMap;
use proxmox_router::list_subdirs_api_method;
use proxmox_router::{Permission, Router, RpcEnvironment};
use proxmox_schema::api;
use pbs_buildcfg::configdir;
use pbs_tools::cert;
use crate::server::send_certificate_renewal_mail;
>
> pub const ROUTER: Router = Router::new()
> .get(&list_subdirs_api_method!(SUBDIRS))
> @@ -269,193 +261,6 @@ pub async fn delete_custom_certificate() -> Result<(), Error> {
> Ok(())
> }
>
> -struct OrderedCertificate {
> - certificate: hyper::body::Bytes,
> - private_key_pem: Vec<u8>,
> -}
> -
> -async fn order_certificate(
> - worker: Arc<WorkerTask>,
> - node_config: &NodeConfig,
> -) -> Result<Option<OrderedCertificate>, Error> {
> - use proxmox_acme::authorization::Status;
> - use proxmox_acme::order::Identifier;
> -
> - let domains = node_config.acme_domains().try_fold(
> - Vec::<AcmeDomain>::new(),
> - |mut acc, domain| -> Result<_, Error> {
> - let mut domain = domain?;
> - domain.domain.make_ascii_lowercase();
> - if let Some(alias) = &mut domain.alias {
> - alias.make_ascii_lowercase();
> - }
> - acc.push(domain);
> - Ok(acc)
> - },
> - )?;
> -
> - let get_domain_config = |domain: &str| {
> - domains
> - .iter()
> - .find(|d| d.domain == domain)
> - .ok_or_else(|| format_err!("no config for domain '{}'", domain))
> - };
> -
> - if domains.is_empty() {
> - info!("No domains configured to be ordered from an ACME server.");
> - return Ok(None);
> - }
> -
> - let (plugins, _) = crate::config::acme::plugin::config()?;
> -
> - let mut acme = node_config.acme_client().await?;
> -
> - info!("Placing ACME order");
> - let order = acme
> - .new_order(domains.iter().map(|d| d.domain.to_ascii_lowercase()))
> - .await?;
> - info!("Order URL: {}", order.location);
> -
> - let identifiers: Vec<String> = order
> - .data
> - .identifiers
> - .iter()
> - .map(|identifier| match identifier {
> - Identifier::Dns(domain) => domain.clone(),
> - })
> - .collect();
> -
> - for auth_url in &order.data.authorizations {
> - info!("Getting authorization details from '{auth_url}'");
> - let mut auth = acme.get_authorization(auth_url).await?;
> -
> - let domain = match &mut auth.identifier {
> - Identifier::Dns(domain) => domain.to_ascii_lowercase(),
> - };
> -
> - if auth.status == Status::Valid {
> - info!("{domain} is already validated!");
> - continue;
> - }
> -
> - info!("The validation for {domain} is pending");
> - let domain_config: &AcmeDomain = get_domain_config(&domain)?;
> - let plugin_id = domain_config.plugin.as_deref().unwrap_or("standalone");
> - let mut plugin_cfg = crate::acme::get_acme_plugin(&plugins, plugin_id)?
> - .ok_or_else(|| format_err!("plugin '{plugin_id}' for domain '{domain}' not found!"))?;
> -
> - info!("Setting up validation plugin");
> - let validation_url = plugin_cfg
> - .setup(&mut acme, &auth, domain_config, Arc::clone(&worker))
> - .await?;
> -
> - let result = request_validation(&mut acme, auth_url, validation_url).await;
> -
> - if let Err(err) = plugin_cfg
> - .teardown(&mut acme, &auth, domain_config, Arc::clone(&worker))
> - .await
> - {
> - warn!("Failed to teardown plugin '{plugin_id}' for domain '{domain}' - {err}");
> - }
> -
> - result?;
> - }
> -
> - info!("All domains validated");
> - info!("Creating CSR");
> -
> - let csr = proxmox_acme::util::Csr::generate(&identifiers, &Default::default())?;
> - let mut finalize_error_cnt = 0u8;
> - let order_url = &order.location;
> - let mut order;
> - loop {
> - use proxmox_acme::order::Status;
> -
> - order = acme.get_order(order_url).await?;
> -
> - match order.status {
> - Status::Pending => {
> - info!("still pending, trying to finalize anyway");
> - let finalize = order
> - .finalize
> - .as_deref()
> - .ok_or_else(|| format_err!("missing 'finalize' URL in order"))?;
> - if let Err(err) = acme.finalize(finalize, &csr.data).await {
> - if finalize_error_cnt >= 5 {
> - return Err(err);
> - }
> -
> - finalize_error_cnt += 1;
> - }
> - tokio::time::sleep(Duration::from_secs(5)).await;
> - }
> - Status::Ready => {
> - info!("order is ready, finalizing");
> - let finalize = order
> - .finalize
> - .as_deref()
> - .ok_or_else(|| format_err!("missing 'finalize' URL in order"))?;
> - acme.finalize(finalize, &csr.data).await?;
> - tokio::time::sleep(Duration::from_secs(5)).await;
> - }
> - Status::Processing => {
> - info!("still processing, trying again in 30 seconds");
> - tokio::time::sleep(Duration::from_secs(30)).await;
> - }
> - Status::Valid => {
> - info!("valid");
> - break;
> - }
> - other => bail!("order status: {:?}", other),
> - }
> - }
> -
> - info!("Downloading certificate");
> - let certificate = acme
> - .get_certificate(
> - order
> - .certificate
> - .as_deref()
> - .ok_or_else(|| format_err!("missing certificate url in finalized order"))?,
> - )
> - .await?;
> -
> - Ok(Some(OrderedCertificate {
> - certificate,
> - private_key_pem: csr.private_key_pem,
> - }))
> -}
> -
> -async fn request_validation(
> - acme: &mut AcmeClient,
> - auth_url: &str,
> - validation_url: &str,
> -) -> Result<(), Error> {
> - info!("Triggering validation");
> - acme.request_challenge_validation(validation_url).await?;
> -
> - info!("Sleeping for 5 seconds");
> - tokio::time::sleep(Duration::from_secs(5)).await;
> -
> - loop {
> - use proxmox_acme::authorization::Status;
> -
> - let auth = acme.get_authorization(auth_url).await?;
> - match auth.status {
> - Status::Pending => {
> - info!("Status is still 'pending', trying again in 10 seconds");
> - tokio::time::sleep(Duration::from_secs(10)).await;
> - }
> - Status::Valid => return Ok(()),
> - other => bail!(
> - "validating challenge '{}' failed - status: {:?}",
> - validation_url,
> - other
> - ),
> - }
> - }
> -}
> -
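The removed order/validation loops above share one pattern: poll a remote status with a bounded number of attempts and a delay between tries. A simplified, synchronous sketch of that pattern (names are illustrative, not the PBS or proxmox-acme API):

```rust
use std::time::Duration;

/// Poll `check` until it yields a value, sleeping `delay` between
/// attempts and giving up after `max_attempts` tries. This mirrors the
/// shape of the removed finalize/validation loops; the real code polls
/// the ACME order and authorization status instead.
fn poll_until<T>(
    mut check: impl FnMut() -> Option<T>,
    max_attempts: u32,
    delay: Duration,
) -> Result<T, String> {
    for attempt in 1..=max_attempts {
        if let Some(value) = check() {
            return Ok(value);
        }
        if attempt < max_attempts {
            std::thread::sleep(delay);
        }
    }
    Err(format!("gave up after {max_attempts} attempts"))
}
```

The real loops additionally distinguish retryable states (Pending, Processing) from fatal ones; a production version would thread that through the closure's return type.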
> #[api(
> input: {
> properties: {
> @@ -525,9 +330,30 @@ fn spawn_certificate_worker(
>
> let auth_id = rpcenv.get_auth_id().unwrap();
>
> + let acme_config = if let Some(cfg) = node_config.acme_config().transpose()? {
> + cfg
> + } else {
> + proxmox_acme_api::parse_acme_config_string("account=default")?
> + };
> +
> + let domains = node_config.acme_domains().try_fold(
> + Vec::<AcmeDomain>::new(),
> + |mut acc, domain| -> Result<_, Error> {
> + let mut domain = domain?;
> + domain.domain.make_ascii_lowercase();
> + if let Some(alias) = &mut domain.alias {
> + alias.make_ascii_lowercase();
> + }
> + acc.push(domain);
> + Ok(acc)
> + },
> + )?;
> +
> WorkerTask::spawn(name, None, auth_id, true, move |worker| async move {
> let work = || async {
> - if let Some(cert) = order_certificate(worker, &node_config).await? {
> + if let Some(cert) =
> + proxmox_acme_api::order_certificate(worker, &acme_config, &domains).await?
> + {
> crate::config::set_proxy_certificate(&cert.certificate, &cert.private_key_pem)?;
> crate::server::reload_proxy_certificate().await?;
> }
> @@ -563,16 +389,20 @@ pub fn revoke_acme_cert(rpcenv: &mut dyn RpcEnvironment) -> Result<String, Error
>
> let auth_id = rpcenv.get_auth_id().unwrap();
>
> + let acme_config = if let Some(cfg) = node_config.acme_config().transpose()? {
> + cfg
> + } else {
> + proxmox_acme_api::parse_acme_config_string("account=default")?
> + };
> +
> WorkerTask::spawn(
> "acme-revoke-cert",
> None,
> auth_id,
> true,
> move |_worker| async move {
> - info!("Loading ACME account");
> - let mut acme = node_config.acme_client().await?;
> info!("Revoking old certificate");
> - acme.revoke_certificate(cert_pem.as_bytes(), None).await?;
> + proxmox_acme_api::revoke_certificate(&acme_config, &cert_pem.as_bytes()).await?;
> info!("Deleting certificate and regenerating a self-signed one");
> delete_custom_certificate().await?;
> Ok(())
> diff --git a/src/api2/types/acme.rs b/src/api2/types/acme.rs
> deleted file mode 100644
> index 2905b41b..00000000
> --- a/src/api2/types/acme.rs
> +++ /dev/null
> @@ -1,74 +0,0 @@
> -use serde::{Deserialize, Serialize};
> -use serde_json::Value;
> -
> -use proxmox_schema::{api, ApiStringFormat, ApiType, Schema, StringSchema};
> -
> -use pbs_api_types::{DNS_ALIAS_FORMAT, DNS_NAME_FORMAT, PROXMOX_SAFE_ID_FORMAT};
> -
> -#[api(
> - properties: {
> - "domain": { format: &DNS_NAME_FORMAT },
> - "alias": {
> - optional: true,
> - format: &DNS_ALIAS_FORMAT,
> - },
> - "plugin": {
> - optional: true,
> - format: &PROXMOX_SAFE_ID_FORMAT,
> - },
> - },
> - default_key: "domain",
> -)]
> -#[derive(Deserialize, Serialize)]
> -/// A domain entry for an ACME certificate.
> -pub struct AcmeDomain {
> - /// The domain to certify for.
> - pub domain: String,
> -
> - /// The domain to use for challenges instead of the default acme challenge domain.
> - ///
> - /// This is useful if you use CNAME entries to redirect `_acme-challenge.*` domains to a
> - /// different DNS server.
> - #[serde(skip_serializing_if = "Option::is_none")]
> - pub alias: Option<String>,
> -
> - /// The plugin to use to validate this domain.
> - ///
> - /// Empty means standalone HTTP validation is used.
> - #[serde(skip_serializing_if = "Option::is_none")]
> - pub plugin: Option<String>,
> -}
> -
> -pub const ACME_DOMAIN_PROPERTY_SCHEMA: Schema =
> - StringSchema::new("ACME domain configuration string")
> - .format(&ApiStringFormat::PropertyString(&AcmeDomain::API_SCHEMA))
> - .schema();
> -
> -#[api(
> - properties: {
> - schema: {
> - type: Object,
> - additional_properties: true,
> - properties: {},
> - },
> - type: {
> - type: String,
> - },
> - },
> -)]
> -#[derive(Serialize)]
> -/// Schema for an ACME challenge plugin.
> -pub struct AcmeChallengeSchema {
> - /// Plugin ID.
> - pub id: String,
> -
> - /// Human readable name, falls back to id.
> - pub name: String,
> -
> - /// Plugin Type.
> - #[serde(rename = "type")]
> - pub ty: &'static str,
> -
> - /// The plugin's parameter schema.
> - pub schema: Value,
> -}
> diff --git a/src/api2/types/mod.rs b/src/api2/types/mod.rs
> index afc34b30..34193685 100644
> --- a/src/api2/types/mod.rs
> +++ b/src/api2/types/mod.rs
> @@ -4,9 +4,6 @@ use anyhow::bail;
>
> use proxmox_schema::*;
>
> -mod acme;
> -pub use acme::*;
> -
> // File names: may not contain slashes, may not start with "."
> pub const FILENAME_FORMAT: ApiStringFormat = ApiStringFormat::VerifyFn(|name| {
> if name.starts_with('.') {
> diff --git a/src/config/acme/mod.rs b/src/config/acme/mod.rs
> index 35cda50b..afd7abf8 100644
> --- a/src/config/acme/mod.rs
> +++ b/src/config/acme/mod.rs
> @@ -9,8 +9,7 @@ use proxmox_sys::fs::{file_read_string, CreateOptions};
>
> use pbs_api_types::PROXMOX_SAFE_ID_REGEX;
>
> -use crate::api2::types::AcmeChallengeSchema;
> -use proxmox_acme_api::{AcmeAccountName, KnownAcmeDirectory, KNOWN_ACME_DIRECTORIES};
> +use proxmox_acme_api::{AcmeAccountName, AcmeChallengeSchema};
use pbs_api_types::PROXMOX_SAFE_ID_REGEX;
use proxmox_acme_api::{AcmeAccountName, AcmeChallengeSchema};
>
> pub(crate) const ACME_DIR: &str = pbs_buildcfg::configdir!("/acme");
> pub(crate) const ACME_ACCOUNT_DIR: &str = pbs_buildcfg::configdir!("/acme/accounts");
> @@ -35,8 +34,6 @@ pub(crate) fn make_acme_dir() -> Result<(), Error> {
> create_acme_subdir(ACME_DIR)
> }
>
> -pub const DEFAULT_ACME_DIRECTORY_ENTRY: &KnownAcmeDirectory = &KNOWN_ACME_DIRECTORIES[0];
> -
> pub fn foreach_acme_account<F>(mut func: F) -> Result<(), Error>
> where
> F: FnMut(AcmeAccountName) -> ControlFlow<Result<(), Error>>,
> @@ -80,7 +77,7 @@ pub fn load_dns_challenge_schema() -> Result<Vec<AcmeChallengeSchema>, Error> {
> .and_then(Value::as_str)
> .unwrap_or(id)
> .to_owned(),
> - ty: "dns",
> + ty: "dns".into(),
> schema: schema.to_owned(),
> })
> .collect())
> diff --git a/src/config/acme/plugin.rs b/src/config/acme/plugin.rs
> index 18e71199..2e979ffe 100644
> --- a/src/config/acme/plugin.rs
> +++ b/src/config/acme/plugin.rs
> @@ -1,104 +1,15 @@
> use std::sync::LazyLock;
>
> use anyhow::Error;
> -use serde::{Deserialize, Serialize};
> -use serde_json::Value;
> -
> -use proxmox_schema::{api, ApiType, Schema, StringSchema, Updater};
> -use proxmox_section_config::{SectionConfig, SectionConfigData, SectionConfigPlugin};
> -
> -use pbs_api_types::PROXMOX_SAFE_ID_FORMAT;
> use pbs_config::{open_backup_lockfile, BackupLockGuard};
> -
> -pub const PLUGIN_ID_SCHEMA: Schema = StringSchema::new("ACME Challenge Plugin ID.")
> - .format(&PROXMOX_SAFE_ID_FORMAT)
> - .min_length(1)
> - .max_length(32)
> - .schema();
> +use proxmox_acme_api::PLUGIN_ID_SCHEMA;
> +use proxmox_acme_api::{DnsPlugin, StandalonePlugin};
> +use proxmox_schema::{ApiType, Schema};
> +use proxmox_section_config::{SectionConfig, SectionConfigData, SectionConfigPlugin};
> +use serde_json::Value;
use std::sync::LazyLock;
use anyhow::Error;
use serde::{Deserialize, Serialize};
use serde_json::Value;
use proxmox_acme_api::PLUGIN_ID_SCHEMA;
use proxmox_acme_api::{DnsPlugin, StandalonePlugin};
use proxmox_schema::{ApiType, Schema};
use proxmox_section_config::{SectionConfig, SectionConfigData, SectionConfigPlugin};
use pbs_config::{open_backup_lockfile, BackupLockGuard};
>
> pub static CONFIG: LazyLock<SectionConfig> = LazyLock::new(init);
>
> -#[api(
> - properties: {
> - id: { schema: PLUGIN_ID_SCHEMA },
> - },
> -)]
> -#[derive(Deserialize, Serialize)]
> -/// Standalone ACME Plugin for the http-1 challenge.
> -pub struct StandalonePlugin {
> - /// Plugin ID.
> - id: String,
> -}
> -
> -impl Default for StandalonePlugin {
> - fn default() -> Self {
> - Self {
> - id: "standalone".to_string(),
> - }
> - }
> -}
> -
> -#[api(
> - properties: {
> - id: { schema: PLUGIN_ID_SCHEMA },
> - disable: {
> - optional: true,
> - default: false,
> - },
> - "validation-delay": {
> - default: 30,
> - optional: true,
> - minimum: 0,
> - maximum: 2 * 24 * 60 * 60,
> - },
> - },
> -)]
> -/// DNS ACME Challenge Plugin core data.
> -#[derive(Deserialize, Serialize, Updater)]
> -#[serde(rename_all = "kebab-case")]
> -pub struct DnsPluginCore {
> - /// Plugin ID.
> - #[updater(skip)]
> - pub id: String,
> -
> - /// DNS API Plugin Id.
> - pub api: String,
> -
> - /// Extra delay in seconds to wait before requesting validation.
> - ///
> - /// Allows to cope with long TTL of DNS records.
> - #[serde(skip_serializing_if = "Option::is_none", default)]
> - pub validation_delay: Option<u32>,
> -
> - /// Flag to disable the config.
> - #[serde(skip_serializing_if = "Option::is_none", default)]
> - pub disable: Option<bool>,
> -}
> -
> -#[api(
> - properties: {
> - core: { type: DnsPluginCore },
> - },
> -)]
> -/// DNS ACME Challenge Plugin.
> -#[derive(Deserialize, Serialize)]
> -#[serde(rename_all = "kebab-case")]
> -pub struct DnsPlugin {
> - #[serde(flatten)]
> - pub core: DnsPluginCore,
> -
> - // We handle this property separately in the API calls.
> - /// DNS plugin data (base64url encoded without padding).
> - #[serde(with = "proxmox_serde::string_as_base64url_nopad")]
> - pub data: String,
> -}
> -
> -impl DnsPlugin {
> - pub fn decode_data(&self, output: &mut Vec<u8>) -> Result<(), Error> {
> - Ok(proxmox_base64::url::decode_to_vec(&self.data, output)?)
> - }
> -}
> -
> fn init() -> SectionConfig {
> let mut config = SectionConfig::new(&PLUGIN_ID_SCHEMA);
>
> diff --git a/src/config/node.rs b/src/config/node.rs
> index d2a17a49..b9257adf 100644
> --- a/src/config/node.rs
> +++ b/src/config/node.rs
> @@ -6,17 +6,17 @@ use serde::{Deserialize, Serialize};
>
> use proxmox_schema::{api, ApiStringFormat, ApiType, Updater};
>
> -use proxmox_http::ProxyConfig;
> -
> use pbs_api_types::{
> EMAIL_SCHEMA, MULTI_LINE_COMMENT_SCHEMA, OPENSSL_CIPHERS_TLS_1_2_SCHEMA,
> OPENSSL_CIPHERS_TLS_1_3_SCHEMA,
> };
> +use proxmox_acme_api::{AcmeConfig, AcmeDomain, ACME_DOMAIN_PROPERTY_SCHEMA};
> +use proxmox_http::ProxyConfig;
>
> use pbs_buildcfg::configdir;
> use pbs_config::{open_backup_lockfile, BackupLockGuard};
>
> -use crate::api2::types::{AcmeDomain, ACME_DOMAIN_PROPERTY_SCHEMA, HTTP_PROXY_SCHEMA};
> +use crate::api2::types::HTTP_PROXY_SCHEMA;
> use proxmox_acme::async_client::AcmeClient;
> use proxmox_acme_api::AcmeAccountName;
use serde::{Deserialize, Serialize};
use pbs_api_types::{
EMAIL_SCHEMA, MULTI_LINE_COMMENT_SCHEMA, OPENSSL_CIPHERS_TLS_1_2_SCHEMA,
OPENSSL_CIPHERS_TLS_1_3_SCHEMA,
};
use proxmox_acme::async_client::AcmeClient;
use proxmox_acme_api::AcmeAccountName;
use proxmox_acme_api::{AcmeConfig, AcmeDomain, ACME_DOMAIN_PROPERTY_SCHEMA};
use proxmox_http::ProxyConfig;
use proxmox_schema::{api, ApiStringFormat, ApiType, Updater};
use pbs_buildcfg::configdir;
use pbs_config::{open_backup_lockfile, BackupLockGuard};
use crate::api2::types::HTTP_PROXY_SCHEMA;
>
> @@ -45,20 +45,6 @@ pub fn save_config(config: &NodeConfig) -> Result<(), Error> {
> pbs_config::replace_backup_config(CONF_FILE, &raw)
> }
>
> -#[api(
> - properties: {
> - account: { type: AcmeAccountName },
> - }
> -)]
> -#[derive(Deserialize, Serialize)]
> -/// The ACME configuration.
> -///
> -/// Currently only contains the name of the account use.
> -pub struct AcmeConfig {
> - /// Account to use to acquire ACME certificates.
> - account: AcmeAccountName,
> -}
> -
> /// All available languages in Proxmox. Taken from proxmox-i18n repository.
> /// pt_BR, zh_CN, and zh_TW use the same case in the translation files.
> // TODO: auto-generate from available translations
> @@ -244,7 +230,7 @@ impl NodeConfig {
>
> pub async fn acme_client(&self) -> Result<AcmeClient, Error> {
> let account = if let Some(cfg) = self.acme_config().transpose()? {
> - cfg.account
> + AcmeAccountName::from_string(cfg.account)?
> } else {
> AcmeAccountName::from_string("default".to_string())? // should really not happen
> };
> diff --git a/src/lib.rs b/src/lib.rs
> index 8633378c..828f5842 100644
> --- a/src/lib.rs
> +++ b/src/lib.rs
> @@ -27,8 +27,6 @@ pub(crate) mod auth;
>
> pub mod tape;
>
> -pub mod acme;
> -
> pub mod client_helpers;
>
> pub mod traffic_control_cache;
* Re: [pbs-devel] [PATCH proxmox-backup v4 3/4] acme: change API impls to use proxmox-acme-api handlers
2025-12-03 10:22 8% ` [pbs-devel] [PATCH proxmox-backup v4 3/4] acme: change API impls to use proxmox-acme-api handlers Samuel Rufinatscha
@ 2025-12-09 16:50 5% ` Max R. Carrara
0 siblings, 0 replies; 200+ results
From: Max R. Carrara @ 2025-12-09 16:50 UTC (permalink / raw)
To: Proxmox Backup Server development discussion
On Wed Dec 3, 2025 at 11:22 AM CET, Samuel Rufinatscha wrote:
> PBS currently uses its own ACME client and API logic, while PDM uses the
> factored-out proxmox-acme and proxmox-acme-api crates. This duplication
> risks differences in behaviour and requires ACME maintenance in two
> places. This patch is part of a series to move PBS over to the shared
> ACME stack.
>
> Changes:
> - Replace api2/config/acme.rs API logic with proxmox-acme-api handlers.
> - Drop local caching and helper types that duplicate proxmox-acme-api.
>
> Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
> ---
> src/api2/config/acme.rs | 385 ++-----------------------
> src/api2/types/acme.rs | 16 -
> src/bin/proxmox_backup_manager/acme.rs | 6 +-
> src/config/acme/mod.rs | 44 +--
> 4 files changed, 35 insertions(+), 416 deletions(-)
>
> diff --git a/src/api2/config/acme.rs b/src/api2/config/acme.rs
> index 02f88e2e..a112c8ee 100644
> --- a/src/api2/config/acme.rs
> +++ b/src/api2/config/acme.rs
> @@ -1,31 +1,17 @@
> -use std::fs;
> -use std::ops::ControlFlow;
> -use std::path::Path;
> -use std::sync::{Arc, LazyLock, Mutex};
> -use std::time::SystemTime;
> -
> -use anyhow::{bail, format_err, Error};
> -use hex::FromHex;
> -use serde::{Deserialize, Serialize};
> -use serde_json::{json, Value};
> -use tracing::{info, warn};
> -
> -use proxmox_router::{
> - http_bail, list_subdirs_api_method, Permission, Router, RpcEnvironment, SubdirMap,
> -};
> -use proxmox_schema::{api, param_bail};
> -
> -use proxmox_acme::types::AccountData as AcmeAccountData;
> -
> +use anyhow::Error;
> use pbs_api_types::{Authid, PRIV_SYS_MODIFY};
> -
> -use crate::api2::types::{AcmeChallengeSchema, KnownAcmeDirectory};
> -use crate::config::acme::plugin::{
> - self, DnsPlugin, DnsPluginCore, DnsPluginCoreUpdater, PLUGIN_ID_SCHEMA,
> +use proxmox_acme_api::{
> + AccountEntry, AccountInfo, AcmeAccountName, AcmeChallengeSchema, ChallengeSchemaWrapper,
> + DeletablePluginProperty, DnsPluginCore, DnsPluginCoreUpdater, KnownAcmeDirectory, PluginConfig,
> + DEFAULT_ACME_DIRECTORY_ENTRY, PLUGIN_ID_SCHEMA,
> };
> -use proxmox_acme::async_client::AcmeClient;
> -use proxmox_acme_api::AcmeAccountName;
> +use proxmox_config_digest::ConfigDigest;
> use proxmox_rest_server::WorkerTask;
> +use proxmox_router::{
> + http_bail, list_subdirs_api_method, Permission, Router, RpcEnvironment, SubdirMap,
> +};
> +use proxmox_schema::api;
> +use tracing::info;
(See my comment on patch 2/4 for an explanation)
use anyhow::Error;
use tracing::info;
use pbs_api_types::{Authid, PRIV_SYS_MODIFY};
use proxmox_acme_api::{
AccountEntry, AccountInfo, AcmeAccountName, AcmeChallengeSchema, ChallengeSchemaWrapper,
DeletablePluginProperty, DnsPluginCore, DnsPluginCoreUpdater, KnownAcmeDirectory, PluginConfig,
DEFAULT_ACME_DIRECTORY_ENTRY, PLUGIN_ID_SCHEMA,
};
use proxmox_config_digest::ConfigDigest;
use proxmox_rest_server::WorkerTask;
use proxmox_router::{
http_bail, list_subdirs_api_method, Permission, Router, RpcEnvironment, SubdirMap,
};
use proxmox_schema::api;
>
> pub(crate) const ROUTER: Router = Router::new()
> .get(&list_subdirs_api_method!(SUBDIRS))
> @@ -67,19 +53,6 @@ const PLUGIN_ITEM_ROUTER: Router = Router::new()
> .put(&API_METHOD_UPDATE_PLUGIN)
> .delete(&API_METHOD_DELETE_PLUGIN);
>
> -#[api(
> - properties: {
> - name: { type: AcmeAccountName },
> - },
> -)]
> -/// An ACME Account entry.
> -///
> -/// Currently only contains a 'name' property.
> -#[derive(Serialize)]
> -pub struct AccountEntry {
> - name: AcmeAccountName,
> -}
> -
> #[api(
> access: {
> permission: &Permission::Privilege(&["system", "certificates"], PRIV_SYS_MODIFY, false),
> @@ -93,40 +66,7 @@ pub struct AccountEntry {
> )]
> /// List ACME accounts.
> pub fn list_accounts() -> Result<Vec<AccountEntry>, Error> {
> - let mut entries = Vec::new();
> - crate::config::acme::foreach_acme_account(|name| {
> - entries.push(AccountEntry { name });
> - ControlFlow::Continue(())
> - })?;
> - Ok(entries)
> -}
> -
> -#[api(
> - properties: {
> - account: { type: Object, properties: {}, additional_properties: true },
> - tos: {
> - type: String,
> - optional: true,
> - },
> - },
> -)]
> -/// ACME Account information.
> -///
> -/// This is what we return via the API.
> -#[derive(Serialize)]
> -pub struct AccountInfo {
> - /// Raw account data.
> - account: AcmeAccountData,
> -
> - /// The ACME directory URL the account was created at.
> - directory: String,
> -
> - /// The account's own URL within the ACME directory.
> - location: String,
> -
> - /// The ToS URL, if the user agreed to one.
> - #[serde(skip_serializing_if = "Option::is_none")]
> - tos: Option<String>,
> + proxmox_acme_api::list_accounts()
> }
>
> #[api(
> @@ -143,23 +83,7 @@ pub struct AccountInfo {
> )]
> /// Return existing ACME account information.
> pub async fn get_account(name: AcmeAccountName) -> Result<AccountInfo, Error> {
> - let account_info = proxmox_acme_api::get_account(name).await?;
> -
> - Ok(AccountInfo {
> - location: account_info.location,
> - tos: account_info.tos,
> - directory: account_info.directory,
> - account: AcmeAccountData {
> - only_return_existing: false, // don't actually write this out in case it's set
> - ..account_info.account
> - },
> - })
> -}
> -
> -fn account_contact_from_string(s: &str) -> Vec<String> {
> - s.split(&[' ', ';', ',', '\0'][..])
> - .map(|s| format!("mailto:{s}"))
> - .collect()
> + proxmox_acme_api::get_account(name).await
> }
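For reference, the removed account_contact_from_string helper (its job now lives inside proxmox_acme_api::update_account) is small enough to restate as a self-contained sketch:

```rust
/// Split a contact string on space, ';', ',' or NUL and prefix each
/// entry with "mailto:" -- mirrors the helper removed above.
fn account_contact_from_string(s: &str) -> Vec<String> {
    s.split(&[' ', ';', ',', '\0'][..])
        .map(|s| format!("mailto:{s}"))
        .collect()
}
```

Note that consecutive separators produce empty segments, so the code as shown turns "a@x.com, b@y.com" into a stray "mailto:" entry; worth checking whether the shared implementation has the same quirk.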
>
> #[api(
> @@ -224,15 +148,11 @@ fn register_account(
> );
> }
>
> - if Path::new(&crate::config::acme::account_path(&name)).exists() {
> + if std::path::Path::new(&proxmox_acme_api::account_config_filename(&name)).exists() {
> http_bail!(BAD_REQUEST, "account {} already exists", name);
> }
>
> - let directory = directory.unwrap_or_else(|| {
> - crate::config::acme::DEFAULT_ACME_DIRECTORY_ENTRY
> - .url
> - .to_owned()
> - });
> + let directory = directory.unwrap_or_else(|| DEFAULT_ACME_DIRECTORY_ENTRY.url.to_string());
>
> WorkerTask::spawn(
> "acme-register",
> @@ -288,17 +208,7 @@ pub fn update_account(
> auth_id.to_string(),
> true,
> move |_worker| async move {
> - let data = match contact {
> - Some(data) => json!({
> - "contact": account_contact_from_string(&data),
> - }),
> - None => json!({}),
> - };
> -
> - proxmox_acme_api::load_client_with_account(&name)
> - .await?
> - .update_account(&data)
> - .await?;
> + proxmox_acme_api::update_account(&name, contact).await?;
>
> Ok(())
> },
> @@ -336,18 +246,8 @@ pub fn deactivate_account(
> auth_id.to_string(),
> true,
> move |_worker| async move {
> - match proxmox_acme_api::load_client_with_account(&name)
> - .await?
> - .update_account(&json!({"status": "deactivated"}))
> - .await
> - {
> - Ok(_account) => (),
> - Err(err) if !force => return Err(err),
> - Err(err) => {
> - warn!("error deactivating account {name}, proceeding anyway - {err}");
> - }
> - }
> - crate::config::acme::mark_account_deactivated(&name)?;
> + proxmox_acme_api::deactivate_account(&name, force).await?;
> +
> Ok(())
> },
> )
> @@ -374,15 +274,7 @@ pub fn deactivate_account(
> )]
> /// Get the Terms of Service URL for an ACME directory.
> async fn get_tos(directory: Option<String>) -> Result<Option<String>, Error> {
> - let directory = directory.unwrap_or_else(|| {
> - crate::config::acme::DEFAULT_ACME_DIRECTORY_ENTRY
> - .url
> - .to_owned()
> - });
> - Ok(AcmeClient::new(directory)
> - .terms_of_service_url()
> - .await?
> - .map(str::to_owned))
> + proxmox_acme_api::get_tos(directory).await
> }
>
> #[api(
> @@ -397,52 +289,7 @@ async fn get_tos(directory: Option<String>) -> Result<Option<String>, Error> {
> )]
> /// Get named known ACME directory endpoints.
> fn get_directories() -> Result<&'static [KnownAcmeDirectory], Error> {
> - Ok(crate::config::acme::KNOWN_ACME_DIRECTORIES)
> -}
> -
> -/// Wrapper for efficient Arc use when returning the ACME challenge-plugin schema for serializing
> -struct ChallengeSchemaWrapper {
> - inner: Arc<Vec<AcmeChallengeSchema>>,
> -}
> -
> -impl Serialize for ChallengeSchemaWrapper {
> - fn serialize<S>(&self, serializer: S) -> Result<S::Ok, S::Error>
> - where
> - S: serde::Serializer,
> - {
> - self.inner.serialize(serializer)
> - }
> -}
> -
> -struct CachedSchema {
> - schema: Arc<Vec<AcmeChallengeSchema>>,
> - cached_mtime: SystemTime,
> -}
> -
> -fn get_cached_challenge_schemas() -> Result<ChallengeSchemaWrapper, Error> {
> - static CACHE: LazyLock<Mutex<Option<CachedSchema>>> = LazyLock::new(|| Mutex::new(None));
> -
> - // the actual loading code
> - let mut last = CACHE.lock().unwrap();
> -
> - let actual_mtime = fs::metadata(crate::config::acme::ACME_DNS_SCHEMA_FN)?.modified()?;
> -
> - let schema = match &*last {
> - Some(CachedSchema {
> - schema,
> - cached_mtime,
> - }) if *cached_mtime >= actual_mtime => schema.clone(),
> - _ => {
> - let new_schema = Arc::new(crate::config::acme::load_dns_challenge_schema()?);
> - *last = Some(CachedSchema {
> - schema: Arc::clone(&new_schema),
> - cached_mtime: actual_mtime,
> - });
> - new_schema
> - }
> - };
> -
> - Ok(ChallengeSchemaWrapper { inner: schema })
> + Ok(proxmox_acme_api::KNOWN_ACME_DIRECTORIES)
> }
>
> #[api(
> @@ -457,69 +304,7 @@ fn get_cached_challenge_schemas() -> Result<ChallengeSchemaWrapper, Error> {
> )]
> /// Get named known ACME directory endpoints.
> fn get_challenge_schema() -> Result<ChallengeSchemaWrapper, Error> {
> - get_cached_challenge_schemas()
> -}
> -
> -#[api]
> -#[derive(Default, Deserialize, Serialize)]
> -#[serde(rename_all = "kebab-case")]
> -/// The API's format is inherited from PVE/PMG:
> -pub struct PluginConfig {
> - /// Plugin ID.
> - plugin: String,
> -
> - /// Plugin type.
> - #[serde(rename = "type")]
> - ty: String,
> -
> - /// DNS Api name.
> - #[serde(skip_serializing_if = "Option::is_none", default)]
> - api: Option<String>,
> -
> - /// Plugin configuration data.
> - #[serde(skip_serializing_if = "Option::is_none", default)]
> - data: Option<String>,
> -
> - /// Extra delay in seconds to wait before requesting validation.
> - ///
> - /// Allows to cope with long TTL of DNS records.
> - #[serde(skip_serializing_if = "Option::is_none", default)]
> - validation_delay: Option<u32>,
> -
> - /// Flag to disable the config.
> - #[serde(skip_serializing_if = "Option::is_none", default)]
> - disable: Option<bool>,
> -}
> -
> -// See PMG/PVE's $modify_cfg_for_api sub
> -fn modify_cfg_for_api(id: &str, ty: &str, data: &Value) -> PluginConfig {
> - let mut entry = data.clone();
> -
> - let obj = entry.as_object_mut().unwrap();
> - obj.remove("id");
> - obj.insert("plugin".to_string(), Value::String(id.to_owned()));
> - obj.insert("type".to_string(), Value::String(ty.to_owned()));
> -
> - // FIXME: This needs to go once the `Updater` is fixed.
> - // None of these should be able to fail unless the user changed the files by hand, in which
> - // case we leave the unmodified string in the Value for now. This will be handled with an error
> - // later.
> - if let Some(Value::String(ref mut data)) = obj.get_mut("data") {
> - if let Ok(new) = proxmox_base64::url::decode_no_pad(&data) {
> - if let Ok(utf8) = String::from_utf8(new) {
> - *data = utf8;
> - }
> - }
> - }
> -
> - // PVE/PMG do this explicitly for ACME plugins...
> - // obj.insert("digest".to_string(), Value::String(digest.clone()));
> -
> - serde_json::from_value(entry).unwrap_or_else(|_| PluginConfig {
> - plugin: "*Error*".to_string(),
> - ty: "*Error*".to_string(),
> - ..Default::default()
> - })
> + proxmox_acme_api::get_cached_challenge_schemas()
> }
>
> #[api(
> @@ -535,12 +320,7 @@ fn modify_cfg_for_api(id: &str, ty: &str, data: &Value) -> PluginConfig {
> )]
> /// List ACME challenge plugins.
> pub fn list_plugins(rpcenv: &mut dyn RpcEnvironment) -> Result<Vec<PluginConfig>, Error> {
> - let (plugins, digest) = plugin::config()?;
> - rpcenv["digest"] = hex::encode(digest).into();
> - Ok(plugins
> - .iter()
> - .map(|(id, (ty, data))| modify_cfg_for_api(id, ty, data))
> - .collect())
> + proxmox_acme_api::list_plugins(rpcenv)
> }
>
> #[api(
> @@ -557,13 +337,7 @@ pub fn list_plugins(rpcenv: &mut dyn RpcEnvironment) -> Result<Vec<PluginConfig>
> )]
> /// List ACME challenge plugins.
> pub fn get_plugin(id: String, rpcenv: &mut dyn RpcEnvironment) -> Result<PluginConfig, Error> {
> - let (plugins, digest) = plugin::config()?;
> - rpcenv["digest"] = hex::encode(digest).into();
> -
> - match plugins.get(&id) {
> - Some((ty, data)) => Ok(modify_cfg_for_api(&id, ty, data)),
> - None => http_bail!(NOT_FOUND, "no such plugin"),
> - }
> + proxmox_acme_api::get_plugin(id, rpcenv)
> }
>
> // Currently we only have "the" standalone plugin and DNS plugins so we can just flatten a
> @@ -595,30 +369,7 @@ pub fn get_plugin(id: String, rpcenv: &mut dyn RpcEnvironment) -> Result<PluginC
> )]
> /// Add ACME plugin configuration.
> pub fn add_plugin(r#type: String, core: DnsPluginCore, data: String) -> Result<(), Error> {
> - // Currently we only support DNS plugins and the standalone plugin is "fixed":
> - if r#type != "dns" {
> - param_bail!("type", "invalid ACME plugin type: {:?}", r#type);
> - }
> -
> - let data = String::from_utf8(proxmox_base64::decode(data)?)
> - .map_err(|_| format_err!("data must be valid UTF-8"))?;
> -
> - let id = core.id.clone();
> -
> - let _lock = plugin::lock()?;
> -
> - let (mut plugins, _digest) = plugin::config()?;
> - if plugins.contains_key(&id) {
> - param_bail!("id", "ACME plugin ID {:?} already exists", id);
> - }
> -
> - let plugin = serde_json::to_value(DnsPlugin { core, data })?;
> -
> - plugins.insert(id, r#type, plugin);
> -
> - plugin::save_config(&plugins)?;
> -
> - Ok(())
> + proxmox_acme_api::add_plugin(r#type, core, data)
> }
>
> #[api(
> @@ -634,26 +385,7 @@ pub fn add_plugin(r#type: String, core: DnsPluginCore, data: String) -> Result<(
> )]
> /// Delete an ACME plugin configuration.
> pub fn delete_plugin(id: String) -> Result<(), Error> {
> - let _lock = plugin::lock()?;
> -
> - let (mut plugins, _digest) = plugin::config()?;
> - if plugins.remove(&id).is_none() {
> - http_bail!(NOT_FOUND, "no such plugin");
> - }
> - plugin::save_config(&plugins)?;
> -
> - Ok(())
> -}
> -
> -#[api()]
> -#[derive(Serialize, Deserialize)]
> -#[serde(rename_all = "kebab-case")]
> -/// Deletable property name
> -pub enum DeletableProperty {
> - /// Delete the disable property
> - Disable,
> - /// Delete the validation-delay property
> - ValidationDelay,
> + proxmox_acme_api::delete_plugin(id)
> }
>
> #[api(
> @@ -675,12 +407,12 @@ pub enum DeletableProperty {
> type: Array,
> optional: true,
> items: {
> - type: DeletableProperty,
> + type: DeletablePluginProperty,
> }
> },
> digest: {
> - description: "Digest to protect against concurrent updates",
> optional: true,
> + type: ConfigDigest,
> },
> },
> },
> @@ -694,65 +426,8 @@ pub fn update_plugin(
> id: String,
> update: DnsPluginCoreUpdater,
> data: Option<String>,
> - delete: Option<Vec<DeletableProperty>>,
> - digest: Option<String>,
> + delete: Option<Vec<DeletablePluginProperty>>,
> + digest: Option<ConfigDigest>,
> ) -> Result<(), Error> {
> - let data = data
> - .as_deref()
> - .map(proxmox_base64::decode)
> - .transpose()?
> - .map(String::from_utf8)
> - .transpose()
> - .map_err(|_| format_err!("data must be valid UTF-8"))?;
> -
> - let _lock = plugin::lock()?;
> -
> - let (mut plugins, expected_digest) = plugin::config()?;
> -
> - if let Some(digest) = digest {
> - let digest = <[u8; 32]>::from_hex(digest)?;
> - crate::tools::detect_modified_configuration_file(&digest, &expected_digest)?;
> - }
> -
> - match plugins.get_mut(&id) {
> - Some((ty, ref mut entry)) => {
> - if ty != "dns" {
> - bail!("cannot update plugin of type {:?}", ty);
> - }
> -
> - let mut plugin = DnsPlugin::deserialize(&*entry)?;
> -
> - if let Some(delete) = delete {
> - for delete_prop in delete {
> - match delete_prop {
> - DeletableProperty::ValidationDelay => {
> - plugin.core.validation_delay = None;
> - }
> - DeletableProperty::Disable => {
> - plugin.core.disable = None;
> - }
> - }
> - }
> - }
> - if let Some(data) = data {
> - plugin.data = data;
> - }
> - if let Some(api) = update.api {
> - plugin.core.api = api;
> - }
> - if update.validation_delay.is_some() {
> - plugin.core.validation_delay = update.validation_delay;
> - }
> - if update.disable.is_some() {
> - plugin.core.disable = update.disable;
> - }
> -
> - *entry = serde_json::to_value(plugin)?;
> - }
> - None => http_bail!(NOT_FOUND, "no such plugin"),
> - }
> -
> - plugin::save_config(&plugins)?;
> -
> - Ok(())
> + proxmox_acme_api::update_plugin(id, update, data, delete, digest)
> }
> diff --git a/src/api2/types/acme.rs b/src/api2/types/acme.rs
> index 7c9063c0..2905b41b 100644
> --- a/src/api2/types/acme.rs
> +++ b/src/api2/types/acme.rs
> @@ -44,22 +44,6 @@ pub const ACME_DOMAIN_PROPERTY_SCHEMA: Schema =
> .format(&ApiStringFormat::PropertyString(&AcmeDomain::API_SCHEMA))
> .schema();
>
> -#[api(
> - properties: {
> - name: { type: String },
> - url: { type: String },
> - },
> -)]
> -/// An ACME directory endpoint with a name and URL.
> -#[derive(Serialize)]
> -pub struct KnownAcmeDirectory {
> - /// The ACME directory's name.
> - pub name: &'static str,
> -
> - /// The ACME directory's endpoint URL.
> - pub url: &'static str,
> -}
> -
> #[api(
> properties: {
> schema: {
> diff --git a/src/bin/proxmox_backup_manager/acme.rs b/src/bin/proxmox_backup_manager/acme.rs
> index bb987b26..e7bd67af 100644
> --- a/src/bin/proxmox_backup_manager/acme.rs
> +++ b/src/bin/proxmox_backup_manager/acme.rs
> @@ -8,10 +8,8 @@ use proxmox_schema::api;
> use proxmox_sys::fs::file_get_contents;
>
> use proxmox_acme::async_client::AcmeClient;
> -use proxmox_acme_api::AcmeAccountName;
> +use proxmox_acme_api::{AcmeAccountName, DnsPluginCore, KNOWN_ACME_DIRECTORIES};
> use proxmox_backup::api2;
> -use proxmox_backup::config::acme::plugin::DnsPluginCore;
> -use proxmox_backup::config::acme::KNOWN_ACME_DIRECTORIES;
use proxmox_acme::async_client::AcmeClient;
use proxmox_acme_api::{AcmeAccountName, DnsPluginCore, KNOWN_ACME_DIRECTORIES};
[...]
use proxmox_sys::fs::file_get_contents;
use proxmox_backup::api2;
>
> pub fn acme_mgmt_cli() -> CommandLineInterface {
> let cmd_def = CliCommandMap::new()
> @@ -122,7 +120,7 @@ async fn register_account(
>
> match input.trim().parse::<usize>() {
> Ok(n) if n < KNOWN_ACME_DIRECTORIES.len() => {
> - break (KNOWN_ACME_DIRECTORIES[n].url.to_owned(), false);
> + break (KNOWN_ACME_DIRECTORIES[n].url.to_string(), false);
> }
> Ok(n) if n == KNOWN_ACME_DIRECTORIES.len() => {
> input.clear();
> diff --git a/src/config/acme/mod.rs b/src/config/acme/mod.rs
> index d31b2bc9..35cda50b 100644
> --- a/src/config/acme/mod.rs
> +++ b/src/config/acme/mod.rs
> @@ -1,8 +1,7 @@
> use std::collections::HashMap;
> use std::ops::ControlFlow;
> -use std::path::Path;
>
> -use anyhow::{bail, format_err, Error};
> +use anyhow::Error;
> use serde_json::Value;
>
> use proxmox_sys::error::SysError;
This here is alright 🎉
> @@ -10,8 +9,8 @@ use proxmox_sys::fs::{file_read_string, CreateOptions};
>
> use pbs_api_types::PROXMOX_SAFE_ID_REGEX;
>
> -use crate::api2::types::{AcmeChallengeSchema, KnownAcmeDirectory};
> -use proxmox_acme_api::AcmeAccountName;
> +use crate::api2::types::AcmeChallengeSchema;
> +use proxmox_acme_api::{AcmeAccountName, KnownAcmeDirectory, KNOWN_ACME_DIRECTORIES};
use pbs_api_types::PROXMOX_SAFE_ID_REGEX;
use proxmox_acme_api::{AcmeAccountName, KnownAcmeDirectory, KNOWN_ACME_DIRECTORIES};
use crate::api2::types::AcmeChallengeSchema;
>
> pub(crate) const ACME_DIR: &str = pbs_buildcfg::configdir!("/acme");
> pub(crate) const ACME_ACCOUNT_DIR: &str = pbs_buildcfg::configdir!("/acme/accounts");
> @@ -36,23 +35,8 @@ pub(crate) fn make_acme_dir() -> Result<(), Error> {
> create_acme_subdir(ACME_DIR)
> }
>
> -pub const KNOWN_ACME_DIRECTORIES: &[KnownAcmeDirectory] = &[
> - KnownAcmeDirectory {
> - name: "Let's Encrypt V2",
> - url: "https://acme-v02.api.letsencrypt.org/directory",
> - },
> - KnownAcmeDirectory {
> - name: "Let's Encrypt V2 Staging",
> - url: "https://acme-staging-v02.api.letsencrypt.org/directory",
> - },
> -];
> -
> pub const DEFAULT_ACME_DIRECTORY_ENTRY: &KnownAcmeDirectory = &KNOWN_ACME_DIRECTORIES[0];
>
> -pub fn account_path(name: &str) -> String {
> - format!("{ACME_ACCOUNT_DIR}/{name}")
> -}
> -
> pub fn foreach_acme_account<F>(mut func: F) -> Result<(), Error>
> where
> F: FnMut(AcmeAccountName) -> ControlFlow<Result<(), Error>>,
> @@ -83,28 +67,6 @@ where
> }
> }
>
> -pub fn mark_account_deactivated(name: &str) -> Result<(), Error> {
> - let from = account_path(name);
> - for i in 0..100 {
> - let to = account_path(&format!("_deactivated_{name}_{i}"));
> - if !Path::new(&to).exists() {
> - return std::fs::rename(&from, &to).map_err(|err| {
> - format_err!(
> - "failed to move account path {:?} to {:?} - {}",
> - from,
> - to,
> - err
> - )
> - });
> - }
> - }
> - bail!(
> - "No free slot to rename deactivated account {:?}, please cleanup {:?}",
> - from,
> - ACME_ACCOUNT_DIR
> - );
> -}
> -
> pub fn load_dns_challenge_schema() -> Result<Vec<AcmeChallengeSchema>, Error> {
> let raw = file_read_string(ACME_DNS_SCHEMA_FN)?;
> let schemas: serde_json::Map<String, Value> = serde_json::from_str(&raw)?;
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel

* Re: [pbs-devel] [PATCH proxmox-backup v4 2/4] acme: drop local AcmeClient
2025-12-03 10:22 6% ` [pbs-devel] [PATCH proxmox-backup v4 2/4] acme: drop local AcmeClient Samuel Rufinatscha
@ 2025-12-09 16:50 4% ` Max R. Carrara
0 siblings, 0 replies; 200+ results
From: Max R. Carrara @ 2025-12-09 16:50 UTC (permalink / raw)
To: Proxmox Backup Server development discussion
On Wed Dec 3, 2025 at 11:22 AM CET, Samuel Rufinatscha wrote:
> PBS currently uses its own ACME client and API logic, while PDM uses the
> factored out proxmox-acme and proxmox-acme-api crates. This duplication
> risks differences in behaviour and requires ACME maintenance in two
> places. This patch is part of a series to move PBS over to the shared
> ACME stack.
>
> Changes:
> - Remove the local src/acme/client.rs and switch to
> proxmox_acme::async_client::AcmeClient where needed.
> - Use proxmox_acme_api::load_client_with_account instead of the custom
>   AcmeClient::load() function
> - Replace the local do_register() logic with
> proxmox_acme_api::register_account, to further ensure accounts are persisted
> - Replace the local AcmeAccountName type, required for
> proxmox_acme_api::register_account
>
> Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
> ---
Since you changed a lot of the imported types and traits in this patch
(and later ones), note that we have a particular ordering regarding
imports:
(At least as far as I'm aware; otherwise, someone please
correct me if I'm wrong)
1. imports from the stdlib
2. imports from external dependencies
3. imports from internal dependencies (so, mostly stuff from proxmox/)
4. imports from crates local to the repository
5. imports from the current crate
All of these groups are then separated by a blank line. The `use`
statements within those groups are (usually) ordered alphabetically. For
some examples, just browse around PBS a little bit.
Note that we're not suuuper strict about it, since we seem to not follow
that all too precisely in some isolated cases, but nevertheless, it's
good to stick to that format in order to keep things neat.
Unfortunately this isn't something we've automated yet, due to it not
being (completely?) supported in `rustfmt` / `cargo fmt` AFAIK. `cargo fmt`
should at least sort the individual groups, though.
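For what it's worth, nightly `rustfmt` has an unstable `group_imports` option that gets close to this (it only distinguishes std / external / crate, not the finer proxmox-vs-workspace split), in case we ever want to experiment with automating it:

```toml
# rustfmt.toml -- both options are unstable, so this needs `cargo +nightly fmt`
group_imports = "StdExternalCrate"   # std first, then external crates, then crate-local
imports_granularity = "Module"       # merge imports from the same module into one `use`
```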
Also, one apparently common exception to that format is the placement of
`pbs_api_types`—sometimes it's part of 3., sometimes it's thrown in with
the crates of 4. In my suggestions in this patch (and the following
ones), I've added it to 3. for consistency's sake.
I would say that overall when you add new `use` statements, just make
sure they're added to the corresponding group if it exists already;
otherwise, add the group using the ordering above. It's not worth to
change the ordering of existing groups, at least not as part of the same
patch.
> src/acme/client.rs | 691 -------------------------
> src/acme/mod.rs | 3 -
> src/acme/plugin.rs | 2 +-
> src/api2/config/acme.rs | 50 +-
> src/api2/node/certificates.rs | 2 +-
> src/api2/types/acme.rs | 8 -
> src/bin/proxmox_backup_manager/acme.rs | 17 +-
> src/config/acme/mod.rs | 8 +-
> src/config/node.rs | 9 +-
> 9 files changed, 36 insertions(+), 754 deletions(-)
> delete mode 100644 src/acme/client.rs
>
> diff --git a/src/acme/client.rs b/src/acme/client.rs
> deleted file mode 100644
> index 9fb6ad55..00000000
> --- a/src/acme/client.rs
> +++ /dev/null
> @@ -1,691 +0,0 @@
snip 8<---------
> diff --git a/src/acme/mod.rs b/src/acme/mod.rs
> index bf61811c..cc561f9a 100644
> --- a/src/acme/mod.rs
> +++ b/src/acme/mod.rs
> @@ -1,5 +1,2 @@
> -mod client;
> -pub use client::AcmeClient;
> -
> pub(crate) mod plugin;
> pub(crate) use plugin::get_acme_plugin;
> diff --git a/src/acme/plugin.rs b/src/acme/plugin.rs
> index f756e9b5..5bc09e1f 100644
> --- a/src/acme/plugin.rs
> +++ b/src/acme/plugin.rs
> @@ -20,8 +20,8 @@ use tokio::process::Command;
>
> use proxmox_acme::{Authorization, Challenge};
>
> -use crate::acme::AcmeClient;
> use crate::api2::types::AcmeDomain;
> +use proxmox_acme::async_client::AcmeClient;
> use proxmox_rest_server::WorkerTask;
use proxmox_acme::{Authorization, Challenge};
use proxmox_acme::async_client::AcmeClient;
use proxmox_rest_server::WorkerTask;
use crate::api2::types::AcmeDomain;
>
> use crate::config::acme::plugin::{DnsPlugin, PluginData};
> diff --git a/src/api2/config/acme.rs b/src/api2/config/acme.rs
> index 35c3fb77..02f88e2e 100644
> --- a/src/api2/config/acme.rs
> +++ b/src/api2/config/acme.rs
> @@ -16,15 +16,15 @@ use proxmox_router::{
> use proxmox_schema::{api, param_bail};
>
> use proxmox_acme::types::AccountData as AcmeAccountData;
> -use proxmox_acme::Account;
>
> use pbs_api_types::{Authid, PRIV_SYS_MODIFY};
>
> -use crate::acme::AcmeClient;
> -use crate::api2::types::{AcmeAccountName, AcmeChallengeSchema, KnownAcmeDirectory};
> +use crate::api2::types::{AcmeChallengeSchema, KnownAcmeDirectory};
> use crate::config::acme::plugin::{
> self, DnsPlugin, DnsPluginCore, DnsPluginCoreUpdater, PLUGIN_ID_SCHEMA,
> };
> +use proxmox_acme::async_client::AcmeClient;
> +use proxmox_acme_api::AcmeAccountName;
> use proxmox_rest_server::WorkerTask;
This file is a good example where we weren't strictly following that
format yet ...
use pbs_api_types::{Authid, PRIV_SYS_MODIFY};
use proxmox_acme::async_client::AcmeClient;
use proxmox_acme::types::AccountData as AcmeAccountData;
use proxmox_acme_api::AcmeAccountName;
use proxmox_rest_server::WorkerTask;
use proxmox_schema::{api, param_bail};
use crate::api2::types::{AcmeChallengeSchema, KnownAcmeDirectory};
use crate::config::acme::plugin::{
self, DnsPlugin, DnsPluginCore, DnsPluginCoreUpdater, PLUGIN_ID_SCHEMA,
};
>
> pub(crate) const ROUTER: Router = Router::new()
> @@ -143,15 +143,15 @@ pub struct AccountInfo {
> )]
> /// Return existing ACME account information.
> pub async fn get_account(name: AcmeAccountName) -> Result<AccountInfo, Error> {
> - let client = AcmeClient::load(&name).await?;
> - let account = client.account()?;
> + let account_info = proxmox_acme_api::get_account(name).await?;
> +
> Ok(AccountInfo {
> - location: account.location.clone(),
> - tos: client.tos().map(str::to_owned),
> - directory: client.directory_url().to_owned(),
> + location: account_info.location,
> + tos: account_info.tos,
> + directory: account_info.directory,
> account: AcmeAccountData {
> only_return_existing: false, // don't actually write this out in case it's set
> - ..account.data.clone()
> + ..account_info.account
> },
> })
> }
> @@ -240,41 +240,24 @@ fn register_account(
> auth_id.to_string(),
> true,
> move |_worker| async move {
> - let mut client = AcmeClient::new(directory);
> -
> info!("Registering ACME account '{}'...", &name);
>
> - let account = do_register_account(
> - &mut client,
> + let location = proxmox_acme_api::register_account(
> &name,
> - tos_url.is_some(),
> contact,
> - None,
> + tos_url,
> + Some(directory),
> eab_kid.zip(eab_hmac_key),
> )
> .await?;
>
> - info!("Registration successful, account URL: {}", account.location);
> + info!("Registration successful, account URL: {}", location);
>
> Ok(())
> },
> )
> }
>
> -pub async fn do_register_account<'a>(
> - client: &'a mut AcmeClient,
> - name: &AcmeAccountName,
> - agree_to_tos: bool,
> - contact: String,
> - rsa_bits: Option<u32>,
> - eab_creds: Option<(String, String)>,
> -) -> Result<&'a Account, Error> {
> - let contact = account_contact_from_string(&contact);
> - client
> - .new_account(name, agree_to_tos, contact, rsa_bits, eab_creds)
> - .await
> -}
> -
> #[api(
> input: {
> properties: {
> @@ -312,7 +295,10 @@ pub fn update_account(
> None => json!({}),
> };
>
> - AcmeClient::load(&name).await?.update_account(&data).await?;
> + proxmox_acme_api::load_client_with_account(&name)
> + .await?
> + .update_account(&data)
> + .await?;
>
> Ok(())
> },
> @@ -350,7 +336,7 @@ pub fn deactivate_account(
> auth_id.to_string(),
> true,
> move |_worker| async move {
> - match AcmeClient::load(&name)
> + match proxmox_acme_api::load_client_with_account(&name)
> .await?
> .update_account(&json!({"status": "deactivated"}))
> .await
> diff --git a/src/api2/node/certificates.rs b/src/api2/node/certificates.rs
> index 61ef910e..31196715 100644
> --- a/src/api2/node/certificates.rs
> +++ b/src/api2/node/certificates.rs
> @@ -17,10 +17,10 @@ use pbs_buildcfg::configdir;
> use pbs_tools::cert;
> use tracing::warn;
>
> -use crate::acme::AcmeClient;
> use crate::api2::types::AcmeDomain;
> use crate::config::node::NodeConfig;
> use crate::server::send_certificate_renewal_mail;
> +use proxmox_acme::async_client::AcmeClient;
> use proxmox_rest_server::WorkerTask;
use tracing::warn;
use proxmox_acme::async_client::AcmeClient;
use proxmox_rest_server::WorkerTask;
use pbs_tools::cert;
use crate::api2::types::AcmeDomain;
use crate::config::node::NodeConfig;
use crate::server::send_certificate_renewal_mail;
>
> pub const ROUTER: Router = Router::new()
> diff --git a/src/api2/types/acme.rs b/src/api2/types/acme.rs
> index 210ebdbc..7c9063c0 100644
> --- a/src/api2/types/acme.rs
> +++ b/src/api2/types/acme.rs
> @@ -60,14 +60,6 @@ pub struct KnownAcmeDirectory {
> pub url: &'static str,
> }
>
> -proxmox_schema::api_string_type! {
> - #[api(format: &PROXMOX_SAFE_ID_FORMAT)]
> - /// ACME account name.
> - #[derive(Clone, Eq, PartialEq, Hash, Deserialize, Serialize)]
> - #[serde(transparent)]
> - pub struct AcmeAccountName(String);
> -}
> -
> #[api(
> properties: {
> schema: {
> diff --git a/src/bin/proxmox_backup_manager/acme.rs b/src/bin/proxmox_backup_manager/acme.rs
> index 0f0eafea..bb987b26 100644
> --- a/src/bin/proxmox_backup_manager/acme.rs
> +++ b/src/bin/proxmox_backup_manager/acme.rs
> @@ -7,9 +7,9 @@ use proxmox_router::{cli::*, ApiHandler, RpcEnvironment};
> use proxmox_schema::api;
> use proxmox_sys::fs::file_get_contents;
>
> -use proxmox_backup::acme::AcmeClient;
> +use proxmox_acme::async_client::AcmeClient;
> +use proxmox_acme_api::AcmeAccountName;
> use proxmox_backup::api2;
> -use proxmox_backup::api2::types::AcmeAccountName;
> use proxmox_backup::config::acme::plugin::DnsPluginCore;
> use proxmox_backup::config::acme::KNOWN_ACME_DIRECTORIES;
use proxmox_acme::async_client::AcmeClient;
use proxmox_acme_api::AcmeAccountName;
use proxmox_schema::api;
use proxmox_sys::fs::file_get_contents;
use proxmox_backup::api2;
use proxmox_backup::config::acme::plugin::DnsPluginCore;
use proxmox_backup::config::acme::KNOWN_ACME_DIRECTORIES;
>
> @@ -188,17 +188,20 @@ async fn register_account(
>
> println!("Attempting to register account with {directory_url:?}...");
>
> - let account = api2::config::acme::do_register_account(
> - &mut client,
> + let tos_agreed = tos_agreed
> + .then(|| directory.terms_of_service_url().map(str::to_owned))
> + .flatten();
> +
> + let location = proxmox_acme_api::register_account(
> &name,
> - tos_agreed,
> contact,
> - None,
> + tos_agreed,
> + Some(directory_url),
> eab_creds,
> )
> .await?;
>
> - println!("Registration successful, account URL: {}", account.location);
> + println!("Registration successful, account URL: {}", location);
>
> Ok(())
> }
> diff --git a/src/config/acme/mod.rs b/src/config/acme/mod.rs
> index 274a23fd..d31b2bc9 100644
> --- a/src/config/acme/mod.rs
> +++ b/src/config/acme/mod.rs
> @@ -10,7 +10,8 @@ use proxmox_sys::fs::{file_read_string, CreateOptions};
>
> use pbs_api_types::PROXMOX_SAFE_ID_REGEX;
>
> -use crate::api2::types::{AcmeAccountName, AcmeChallengeSchema, KnownAcmeDirectory};
> +use crate::api2::types::{AcmeChallengeSchema, KnownAcmeDirectory};
> +use proxmox_acme_api::AcmeAccountName;
use proxmox_acme_api::AcmeAccountName;
[...]
use proxmox_sys::fs::{file_read_string, CreateOptions};
use pbs_api_types::PROXMOX_SAFE_ID_REGEX;
use crate::api2::types::{AcmeChallengeSchema, KnownAcmeDirectory};
>
> pub(crate) const ACME_DIR: &str = pbs_buildcfg::configdir!("/acme");
> pub(crate) const ACME_ACCOUNT_DIR: &str = pbs_buildcfg::configdir!("/acme/accounts");
> @@ -35,11 +36,6 @@ pub(crate) fn make_acme_dir() -> Result<(), Error> {
> create_acme_subdir(ACME_DIR)
> }
>
> -pub(crate) fn make_acme_account_dir() -> Result<(), Error> {
> - make_acme_dir()?;
> - create_acme_subdir(ACME_ACCOUNT_DIR)
> -}
> -
> pub const KNOWN_ACME_DIRECTORIES: &[KnownAcmeDirectory] = &[
> KnownAcmeDirectory {
> name: "Let's Encrypt V2",
> diff --git a/src/config/node.rs b/src/config/node.rs
> index d2d6e383..d2a17a49 100644
> --- a/src/config/node.rs
> +++ b/src/config/node.rs
> @@ -16,10 +16,9 @@ use pbs_api_types::{
> use pbs_buildcfg::configdir;
> use pbs_config::{open_backup_lockfile, BackupLockGuard};
>
> -use crate::acme::AcmeClient;
> -use crate::api2::types::{
> - AcmeAccountName, AcmeDomain, ACME_DOMAIN_PROPERTY_SCHEMA, HTTP_PROXY_SCHEMA,
> -};
> +use crate::api2::types::{AcmeDomain, ACME_DOMAIN_PROPERTY_SCHEMA, HTTP_PROXY_SCHEMA};
> +use proxmox_acme::async_client::AcmeClient;
> +use proxmox_acme_api::AcmeAccountName;
use proxmox_acme::async_client::AcmeClient;
use proxmox_acme_api::AcmeAccountName;
use pbs_buildcfg::configdir;
use pbs_config::{open_backup_lockfile, BackupLockGuard};
use crate::api2::types::{AcmeDomain, ACME_DOMAIN_PROPERTY_SCHEMA, HTTP_PROXY_SCHEMA};
>
> const CONF_FILE: &str = configdir!("/node.cfg");
> const LOCK_FILE: &str = configdir!("/.node.lck");
> @@ -249,7 +248,7 @@ impl NodeConfig {
> } else {
> AcmeAccountName::from_string("default".to_string())? // should really not happen
> };
> - AcmeClient::load(&account).await
> + proxmox_acme_api::load_client_with_account(&account).await
> }
>
> pub fn acme_domains(&'_ self) -> AcmeDomainIter<'_> {
* Re: [pbs-devel] [PATCH proxmox v4 2/4] acme: reduce visibility of Request type
2025-12-03 10:22 12% ` [pbs-devel] [PATCH proxmox v4 2/4] acme: reduce visibility of Request type Samuel Rufinatscha
@ 2025-12-09 16:51 5% ` Max R. Carrara
0 siblings, 0 replies; 200+ results
From: Max R. Carrara @ 2025-12-09 16:51 UTC (permalink / raw)
To: Proxmox Backup Server development discussion
On Wed Dec 3, 2025 at 11:22 AM CET, Samuel Rufinatscha wrote:
> Currently, the low-level ACME Request type is publicly exposed, even
> though users are expected to go through AcmeClient and
> proxmox-acme-api handlers. This patch reduces visibility so that
> the Request type and related fields/methods are crate-internal only.
>
> Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
> ---
> proxmox-acme/src/account.rs | 17 ++++++++++-------
> proxmox-acme/src/async_client.rs | 2 +-
> proxmox-acme/src/authorization.rs | 2 +-
> proxmox-acme/src/client.rs | 6 +++---
> proxmox-acme/src/lib.rs | 4 ----
> proxmox-acme/src/order.rs | 2 +-
> proxmox-acme/src/request.rs | 12 ++++++------
> 7 files changed, 22 insertions(+), 23 deletions(-)
>
> diff --git a/proxmox-acme/src/account.rs b/proxmox-acme/src/account.rs
> index 0bbf0027..081ca986 100644
> --- a/proxmox-acme/src/account.rs
> +++ b/proxmox-acme/src/account.rs
> @@ -92,7 +92,7 @@ impl Account {
> }
>
> /// Prepare a "POST-as-GET" request to fetch data. Low level helper.
> - pub fn get_request(&self, url: &str, nonce: &str) -> Result<Request, Error> {
> + pub(crate) fn get_request(&self, url: &str, nonce: &str) -> Result<Request, Error> {
> let key = PKey::private_key_from_pem(self.private_key.as_bytes())?;
> let body = serde_json::to_string(&Jws::new_full(
> &key,
> @@ -112,7 +112,7 @@ impl Account {
> }
>
> /// Prepare a JSON POST request. Low level helper.
> - pub fn post_request<T: Serialize>(
> + pub(crate) fn post_request<T: Serialize>(
> &self,
> url: &str,
> nonce: &str,
> @@ -179,7 +179,7 @@ impl Account {
> /// Prepare a request to update account data.
> ///
> /// This is a rather low level interface. You should know what you're doing.
> - pub fn update_account_request<T: Serialize>(
> + pub(crate) fn update_account_request<T: Serialize>(
^ Regarding this function ...
> &self,
> nonce: &str,
> data: &T,
> @@ -188,7 +188,10 @@ impl Account {
> }
>
> /// Prepare a request to deactivate this account.
> - pub fn deactivate_account_request<T: Serialize>(&self, nonce: &str) -> Result<Request, Error> {
> + pub(crate) fn deactivate_account_request<T: Serialize>(
> + &self,
> + nonce: &str,
> + ) -> Result<Request, Error> {
^ and this one ...
> self.post_request_raw_payload(
> &self.location,
> nonce,
> @@ -220,7 +223,7 @@ impl Account {
> ///
> /// This returns a raw `Request` since validation takes some time and the `Authorization`
> /// object has to be re-queried and its `status` inspected.
> - pub fn validate_challenge(
> + pub(crate) fn validate_challenge(
^ as well as this one here, I noticed that they aren't used anywhere in
our code; at least, I couldn't find any references to them by grepping
through our sources. Since they're not used at all, we could just remove
them entirely here, IMO. If it's not used, there's not really any point
in keeping those methods around—and as you mentioned, users should be
using `AcmeClient` and `proxmox-acme-api` handlers anyway.
Note that `post_request_raw_payload()` then also becomes redundant,
since it's only used in `validate_challenge()` and
`deactivate_account_request()`.
> &self,
> authorization: &Authorization,
> challenge_index: usize,
> @@ -274,7 +277,7 @@ pub struct CertificateRevocation<'a> {
>
> impl CertificateRevocation<'_> {
> /// Create the revocation request using the specified nonce for the given directory.
> - pub fn request(&self, directory: &Directory, nonce: &str) -> Result<Request, Error> {
> + pub(crate) fn request(&self, directory: &Directory, nonce: &str) -> Result<Request, Error> {
> let revoke_cert = directory.data.revoke_cert.as_ref().ok_or_else(|| {
> Error::Custom("no 'revokeCert' URL specified by provider".to_string())
> })?;
> @@ -364,7 +367,7 @@ impl AccountCreator {
> /// the resulting request.
> /// Changing the private key between using the request and passing the response to
> /// [`response`](AccountCreator::response()) will render the account unusable!
> - pub fn request(&self, directory: &Directory, nonce: &str) -> Result<Request, Error> {
> + pub(crate) fn request(&self, directory: &Directory, nonce: &str) -> Result<Request, Error> {
> let key = self.key.as_deref().ok_or(Error::MissingKey)?;
> let url = directory.new_account_url().ok_or_else(|| {
> Error::Custom("no 'newAccount' URL specified by provider".to_string())
> diff --git a/proxmox-acme/src/async_client.rs b/proxmox-acme/src/async_client.rs
> index dc755fb9..2ff3ba22 100644
> --- a/proxmox-acme/src/async_client.rs
> +++ b/proxmox-acme/src/async_client.rs
> @@ -10,7 +10,7 @@ use proxmox_http::{client::Client, Body};
>
> use crate::account::AccountCreator;
> use crate::order::{Order, OrderData};
> -use crate::Request as AcmeRequest;
> +use crate::request::Request as AcmeRequest;
> use crate::{Account, Authorization, Challenge, Directory, Error, ErrorResponse};
>
> /// A non-blocking Acme client using tokio/hyper.
> diff --git a/proxmox-acme/src/authorization.rs b/proxmox-acme/src/authorization.rs
> index 28bc1b4b..765714fc 100644
> --- a/proxmox-acme/src/authorization.rs
> +++ b/proxmox-acme/src/authorization.rs
> @@ -145,7 +145,7 @@ pub struct GetAuthorization {
> /// this is guaranteed to be `Some`.
> ///
> /// The response should be passed to the the [`response`](GetAuthorization::response()) method.
> - pub request: Option<Request>,
> + pub(crate) request: Option<Request>,
> }
>
> impl GetAuthorization {
> diff --git a/proxmox-acme/src/client.rs b/proxmox-acme/src/client.rs
> index 931f7245..5c812567 100644
> --- a/proxmox-acme/src/client.rs
> +++ b/proxmox-acme/src/client.rs
> @@ -7,8 +7,8 @@ use serde::{Deserialize, Serialize};
> use crate::b64u;
> use crate::error;
> use crate::order::OrderData;
> -use crate::request::ErrorResponse;
> -use crate::{Account, Authorization, Challenge, Directory, Error, Order, Request};
> +use crate::request::{ErrorResponse, Request};
> +use crate::{Account, Authorization, Challenge, Directory, Error, Order};
>
> macro_rules! format_err {
> ($($fmt:tt)*) => { Error::Client(format!($($fmt)*)) };
> @@ -564,7 +564,7 @@ impl Client {
> }
>
> /// Low-level API to run an n API request. This automatically updates the current nonce!
> - pub fn run_request(&mut self, request: Request) -> Result<HttpResponse, Error> {
> + pub(crate) fn run_request(&mut self, request: Request) -> Result<HttpResponse, Error> {
> self.inner.run_request(request)
> }
>
> diff --git a/proxmox-acme/src/lib.rs b/proxmox-acme/src/lib.rs
> index df722629..6722030c 100644
> --- a/proxmox-acme/src/lib.rs
> +++ b/proxmox-acme/src/lib.rs
> @@ -66,10 +66,6 @@ pub use error::Error;
> #[doc(inline)]
> pub use order::Order;
>
> -#[cfg(feature = "impl")]
> -#[doc(inline)]
> -pub use request::Request;
> -
> // we don't inline these:
> #[cfg(feature = "impl")]
> pub use order::NewOrder;
> diff --git a/proxmox-acme/src/order.rs b/proxmox-acme/src/order.rs
> index b6551004..432a81a4 100644
> --- a/proxmox-acme/src/order.rs
> +++ b/proxmox-acme/src/order.rs
> @@ -153,7 +153,7 @@ pub struct NewOrder {
> //order: OrderData,
> /// The request to execute to place the order. When creating a [`NewOrder`] via
> /// [`Account::new_order`](crate::Account::new_order) this is guaranteed to be `Some`.
> - pub request: Option<Request>,
> + pub(crate) request: Option<Request>,
> }
>
> impl NewOrder {
> diff --git a/proxmox-acme/src/request.rs b/proxmox-acme/src/request.rs
> index 78a90913..dadfc5af 100644
> --- a/proxmox-acme/src/request.rs
> +++ b/proxmox-acme/src/request.rs
> @@ -4,21 +4,21 @@ pub(crate) const JSON_CONTENT_TYPE: &str = "application/jose+json";
> pub(crate) const CREATED: u16 = 201;
>
> /// A request which should be performed on the ACME provider.
> -pub struct Request {
> +pub(crate) struct Request {
> /// The complete URL to send the request to.
> - pub url: String,
> + pub(crate) url: String,
>
> /// The HTTP method name to use.
> - pub method: &'static str,
> + pub(crate) method: &'static str,
>
> /// The `Content-Type` header to pass along.
> - pub content_type: &'static str,
> + pub(crate) content_type: &'static str,
>
> /// The body to pass along with request, or an empty string.
> - pub body: String,
> + pub(crate) body: String,
>
> /// The expected status code a compliant ACME provider will return on success.
> - pub expected: u16,
> + pub(crate) expected: u16,
> }
>
> /// An ACME error response contains a specially formatted type string, and can optionally
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
* Re: [pbs-devel] [PATCH proxmox{-backup, } v4 0/8] fix #6939: acme: support servers returning 204 for nonce requests
2025-12-03 10:22 11% [pbs-devel] [PATCH proxmox{-backup, } v4 " Samuel Rufinatscha
` (7 preceding siblings ...)
2025-12-03 10:22 14% ` [pbs-devel] [PATCH proxmox v4 4/4] fix #6939: acme: support servers returning 204 for nonce requests Samuel Rufinatscha
@ 2025-12-09 16:50 5% ` Max R. Carrara
2025-12-10 9:44 6% ` Samuel Rufinatscha
2026-01-08 11:48 13% ` [pbs-devel] superseded: " Samuel Rufinatscha
9 siblings, 1 reply; 200+ results
From: Max R. Carrara @ 2025-12-09 16:50 UTC (permalink / raw)
To: Proxmox Backup Server development discussion
On Wed Dec 3, 2025 at 11:22 AM CET, Samuel Rufinatscha wrote:
> Hi,
>
> this series fixes account registration for ACME providers that return
> HTTP 204 No Content to the newNonce request. Currently, both the PBS
> ACME client and the shared ACME client in proxmox-acme only accept
> HTTP 200 OK for this request. The issue was observed in PBS against a
> custom ACME deployment and reported as bug #6939 [1].
>
> [...]
Testing
-------
Tested this on my local PBS development instance with the DNS-01
challenge using one of my domains on OVH and Let's Encrypt Staging.
The cert was ordered without any problems. Everything worked just as
before.
Comments Regarding the Changes Made
-----------------------------------
Overall, looks pretty good! I only found a few minor things, see my
comments inline.
What I would recommend overall is to make the changes in `proxmox`
first, and then use the new `async fn` introduced in patch #4 (proxmox)
in `proxmox-backup`, instead of doing things the other way around. That
way the function you introduced would actually be used, and I'm assuming
you added it for good reason.
Conclusion
----------
LGTM—needs a teeny tiny bit more polish (see comments inline), but
otherwise works great already! :D Good to see a lot of redundant code
being removed.
The few things I mentioned inline aren't *strict* blockers IMO and can
be addressed in a couple of follow-up patches if this gets merged as
is. Otherwise, should you release a v5 of this series, I'll do another
review.
Anyhow, should the maintainer decide to merge this series, please
consider:
Reviewed-by: Max R. Carrara <m.carrara@proxmox.com>
Tested-by: Max R. Carrara <m.carrara@proxmox.com>
>
> proxmox-backup:
>
> Samuel Rufinatscha (4):
> acme: include proxmox-acme-api dependency
> acme: drop local AcmeClient
> acme: change API impls to use proxmox-acme-api handlers
> acme: certificate ordering through proxmox-acme-api
>
> Cargo.toml | 3 +
> src/acme/client.rs | 691 -------------------------
> src/acme/mod.rs | 5 -
> src/acme/plugin.rs | 336 ------------
> src/api2/config/acme.rs | 407 ++-------------
> src/api2/node/certificates.rs | 240 ++-------
> src/api2/types/acme.rs | 98 ----
> src/api2/types/mod.rs | 3 -
> src/bin/proxmox-backup-api.rs | 2 +
> src/bin/proxmox-backup-manager.rs | 2 +
> src/bin/proxmox-backup-proxy.rs | 1 +
> src/bin/proxmox_backup_manager/acme.rs | 21 +-
> src/config/acme/mod.rs | 51 +-
> src/config/acme/plugin.rs | 99 +---
> src/config/node.rs | 29 +-
> src/lib.rs | 2 -
> 16 files changed, 103 insertions(+), 1887 deletions(-)
> delete mode 100644 src/acme/client.rs
> delete mode 100644 src/acme/mod.rs
> delete mode 100644 src/acme/plugin.rs
> delete mode 100644 src/api2/types/acme.rs
>
>
> proxmox:
>
> Samuel Rufinatscha (4):
> acme-api: add helper to load client for an account
> acme: reduce visibility of Request type
> acme: introduce http_status module
> fix #6939: acme: support servers returning 204 for nonce requests
>
> proxmox-acme-api/src/account_api_impl.rs | 5 +++++
> proxmox-acme-api/src/lib.rs | 3 ++-
> proxmox-acme/src/account.rs | 27 +++++++++++++-----------
> proxmox-acme/src/async_client.rs | 8 +++----
> proxmox-acme/src/authorization.rs | 2 +-
> proxmox-acme/src/client.rs | 8 +++----
> proxmox-acme/src/lib.rs | 6 ++----
> proxmox-acme/src/order.rs | 2 +-
> proxmox-acme/src/request.rs | 25 +++++++++++++++-------
> 9 files changed, 51 insertions(+), 35 deletions(-)
>
>
> Summary over all repositories:
> 25 files changed, 154 insertions(+), 1922 deletions(-)
* Re: [pbs-devel] [PATCH proxmox{-backup, } 0/6] Reduce token.shadow verification overhead
2025-12-05 14:06 5% ` [pbs-devel] [PATCH proxmox{-backup, } 0/6] Reduce token.shadow verification overhead Shannon Sterz
@ 2025-12-09 13:58 6% ` Samuel Rufinatscha
0 siblings, 0 replies; 200+ results
From: Samuel Rufinatscha @ 2025-12-09 13:58 UTC (permalink / raw)
To: Shannon Sterz; +Cc: Proxmox Backup Server development discussion
Thank you for your great review, Shannon, and for your feedback.
I agree, it would be good to publicly document the TTL window.
Where do you think would be the best place to add this?
Thanks!
On 12/5/25 3:05 PM, Shannon Sterz wrote:
> thank you for this series and the extensive documentation in it. it was
> very easy to follow. the changes look good to me for the most part, see
> the comments on the first patch. one top level question, though:
>
> should we publicly document that manually editing the token.shadow will
> now not instantly make requests by tokens invalid, but changes will take
> up to one minute to take effect? i don't think that this is necessarily
> an issue, but imo we shouldn't make such a change without informing
> users.
>
> other than this and the comments in-line, consider this:
>
> Reviewed-by: Shannon Sterz <s.sterz@proxmox.com>
>
> On Fri Dec 5, 2025 at 2:25 PM CET, Samuel Rufinatscha wrote:
>> Hi,
>>
>> this series improves the performance of token-based API authentication
>> in PBS (pbs-config) and in PDM (underlying proxmox-access-control
>> crate), addressing the API token verification hotspot reported in our
>> bugtracker #6049 [1].
>>
>> When profiling PBS /status endpoint with cargo flamegraph [2],
>> token-based authentication showed up as a dominant hotspot via
>> proxmox_sys::crypt::verify_crypt_pw. Applying this series removes that
>> path from the hot section of the flamegraph. The same performance issue
>> was measured [3] for PDM. PDM uses the underlying shared
>> proxmox-access-control library for token handling, which is a
>> factored out version of the token.shadow handling code from PBS.
>>
>> While this series fixes the immediate performance issue both in PBS
>> (pbs-config) and in the shared proxmox-access-control crate used by
>> PDM, PBS should eventually, ideally be refactored, in a separate
>> effort, to use proxmox-access-control for token handling instead of its
>> local implementation.
>>
>> Problem
>>
>> For token-based API requests, both PBS’s pbs-config token.shadow
>> handling and PDM proxmox-access-control’s token.shadow handling
>> currently:
>>
>> 1. read the token.shadow file on each request
>> 2. deserialize it into a HashMap<Authid, String>
>> 3. run password hash verification via
>> proxmox_sys::crypt::verify_crypt_pw for the provided token secret
>>
>> Under load, this results in significant CPU usage spent in repeated
>> password hash computations for the same token+secret pairs. The
>> attached flamegraphs for PBS [2] and PDM [3] show
>> proxmox_sys::crypt::verify_crypt_pw dominating the hot path.
>>
>> Approach
>>
>> The goal is to reduce the cost of token-based authentication preserving
>> the existing token handling semantics (including detecting manual edits
>> to token.shadow) and be consistent between PBS (pbs-config) and
>> PDM (proxmox-access-control). For both sites, the series proposes
>> following approach:
>>
>> 1. Introduce an in-memory cache for verified token secrets
>> 2. Invalidate the cache when token.shadow changes (detect manual edits)
>> 3. Control metadata checks with a TTL window
>>
>> Testing
>>
>> *PBS (pbs-config)*
>>
>> To verify the effect in PBS, I:
>> 1. Set up test environment based on latest PBS ISO, installed Rust
>> toolchain, cloned proxmox-backup repository to use with cargo
>> flamegraph. Reproduced bug #6049 [1] by profiling the /status
>> endpoint with token-based authentication using cargo flamegraph [2].
>> The flamegraph showed proxmox_sys::crypt::verify_crypt_pw is the
>> hotspot.
>> 2. Built PBS with pbs-config patches and re-ran the same workload and
>> profiling setup.
>> 3. Confirmed that the proxmox_sys::crypt::verify_crypt_pw path no
>> longer appears in the hot section of the flamegraph. CPU usage is
>> now dominated by TLS overhead.
>> 4. Functionally verified that:
>> * token-based API authentication still works for valid tokens
>> * invalid secrets are rejected as before
>> * generating a new token secret via dashboard works and
>> authenticates correctly
>>
>> *PDM (proxmox-access-control)*
>>
>> To verify the effect in PDM, I followed a similar testing approach.
>> Instead of /status, I profiled the /version endpoint with cargo
>> flamegraph [3] and verified that the token hashing path disappears
>> from the hot section after applying the proxmox-access-control patches.
>>
>> Functionally I verified that:
>> * token-based API authentication still works for valid tokens
>> * invalid secrets are rejected as before
>> * generating a new token secret via dashboard works and
>> authenticates correctly
>>
>> Patch summary
>>
>> pbs-config:
>>
>> 0001 – pbs-config: cache verified API token secrets
>> Adds an in-memory cache keyed by Authid that stores plain text token
>> secrets after a successful verification or generation and uses
>> openssl’s memcmp constant-time for comparison.
>>
>> 0002 – pbs-config: invalidate token-secret cache on token.shadow changes
>> Tracks token.shadow mtime and length and clears the in-memory cache
>> when the file changes.
>>
>> 0003 – pbs-config: add TTL window to token-secret cache
>> Introduces a TTL (TOKEN_SECRET_CACHE_TTL_SECS, default 60) for metadata checks so
>> that fs::metadata is only called periodically.
>>
>> proxmox-access-control:
>>
>> 0004 – access-control: cache verified API token secrets
>> Mirrors PBS patch 0001.
>>
>> 0005 – access-control: invalidate token-secret cache on token.shadow changes
>> Mirrors PBS patch 0002.
>>
>> 0006 – access-control: add TTL window to token-secret cache
>> Mirrors PBS patch 0003.
>>
>> Thanks for considering this patch series, I look forward to your
>> feedback.
>>
>> Best,
>> Samuel Rufinatscha
>>
>> [1] https://bugzilla.proxmox.com/show_bug.cgi?id=6049
>> [2] Flamegraph illustrating the `proxmox_sys::crypt::verify_crypt_pw`
>> hotspot before this series (attached to [1])
>>
>> proxmox-backup:
>>
>> Samuel Rufinatscha (3):
>> pbs-config: cache verified API token secrets
>> pbs-config: invalidate token-secret cache on token.shadow changes
>> pbs-config: add TTL window to token secret cache
>>
>> pbs-config/src/token_shadow.rs | 109 ++++++++++++++++++++++++++++++++-
>> 1 file changed, 108 insertions(+), 1 deletion(-)
>>
>>
>> proxmox:
>>
>> Samuel Rufinatscha (3):
>> proxmox-access-control: cache verified API token secrets
>> proxmox-access-control: invalidate token-secret cache on token.shadow
>> changes
>> proxmox-access-control: add TTL window to token secret cache
>>
>> proxmox-access-control/src/token_shadow.rs | 108 ++++++++++++++++++++-
>> 1 file changed, 107 insertions(+), 1 deletion(-)
>>
>>
>> Summary over all repositories:
>> 2 files changed, 215 insertions(+), 2 deletions(-)
>
* Re: [pbs-devel] [PATCH proxmox-backup 1/3] pbs-config: cache verified API token secrets
2025-12-05 14:04 5% ` Shannon Sterz
@ 2025-12-09 13:29 6% ` Samuel Rufinatscha
2025-12-17 11:16 5% ` Christian Ebner
0 siblings, 1 reply; 200+ results
From: Samuel Rufinatscha @ 2025-12-09 13:29 UTC (permalink / raw)
To: Shannon Sterz; +Cc: Proxmox Backup Server development discussion
On 12/5/25 3:03 PM, Shannon Sterz wrote:
> On Fri Dec 5, 2025 at 2:25 PM CET, Samuel Rufinatscha wrote:
>> Currently, every token-based API request reads the token.shadow file and
>> runs the expensive password hash verification for the given token
>> secret. This shows up as a hotspot in /status profiling (see
>> bug #6049 [1]).
>>
>> This patch introduces an in-memory cache of successfully verified token
>> secrets. Subsequent requests for the same token+secret combination only
>> perform a comparison using openssl::memcmp::eq and avoid re-running the
>> password hash. The cache is updated when a token secret is set and
>> cleared when a token is deleted. Note, this does NOT include manual
>> config changes, which will be covered in a subsequent patch.
>>
>> This patch partly fixes bug #6049 [1].
>>
>> [1] https://bugzilla.proxmox.com/show_bug.cgi?id=6049
>>
>> Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
>> ---
>> pbs-config/src/token_shadow.rs | 58 +++++++++++++++++++++++++++++++++-
>> 1 file changed, 57 insertions(+), 1 deletion(-)
>>
>> diff --git a/pbs-config/src/token_shadow.rs b/pbs-config/src/token_shadow.rs
>> index 640fabbf..47aa2fc2 100644
>> --- a/pbs-config/src/token_shadow.rs
>> +++ b/pbs-config/src/token_shadow.rs
>> @@ -1,6 +1,8 @@
>> use std::collections::HashMap;
>> +use std::sync::RwLock;
>>
>> use anyhow::{bail, format_err, Error};
>> +use once_cell::sync::OnceCell;
>> use serde::{Deserialize, Serialize};
>> use serde_json::{from_value, Value};
>>
>> @@ -13,6 +15,13 @@ use crate::{open_backup_lockfile, BackupLockGuard};
>> const LOCK_FILE: &str = pbs_buildcfg::configdir!("/token.shadow.lock");
>> const CONF_FILE: &str = pbs_buildcfg::configdir!("/token.shadow");
>>
>> +/// Global in-memory cache for successfully verified API token secrets.
>> +/// The cache stores plain text secrets for token Authids that have already been
>> +/// verified against the hashed values in `token.shadow`. This allows for cheap
>> +/// subsequent authentications for the same token+secret combination, avoiding
>> +/// recomputing the password hash on every request.
>> +static TOKEN_SECRET_CACHE: OnceCell<RwLock<ApiTokenSecretCache>> = OnceCell::new();
>
> any reason you are using a once cell with a custom get_or_init function
> instead of a simple `LazyCell` [1] here? seems to me that this would be
> the more appropriate type here? similar question for the
> proxmox-access-control portion of this series.
>
> [1]: https://doc.rust-lang.org/std/cell/struct.LazyCell.html
>
Good point, we can directly initialize it! Will change
to LazyCell. Thanks!
>> +
>> #[derive(Serialize, Deserialize)]
>> #[serde(rename_all = "kebab-case")]
>> /// ApiToken id / secret pair
>> @@ -54,9 +63,25 @@ pub fn verify_secret(tokenid: &Authid, secret: &str) -> Result<(), Error> {
>> bail!("not an API token ID");
>> }
>>
>> + // Fast path
>> + if let Some(cached) = token_secret_cache().read().unwrap().secrets.get(tokenid) {
>> + // Compare cached secret with provided one using constant time comparison
>> + if openssl::memcmp::eq(cached.as_bytes(), secret.as_bytes()) {
>> + // Already verified before
>> + return Ok(());
>> + }
>> + // Fall through to slow path if secret doesn't match cached one
>> + }
>> +
>> + // Slow path: read file + verify hash
>> let data = read_file()?;
>> match data.get(tokenid) {
>> - Some(hashed_secret) => proxmox_sys::crypt::verify_crypt_pw(secret, hashed_secret),
>> + Some(hashed_secret) => {
>> + proxmox_sys::crypt::verify_crypt_pw(secret, hashed_secret)?;
>> + // Cache the plain secret for future requests
>> + cache_insert_secret(tokenid.clone(), secret.to_owned());
>> + Ok(())
>> + }
>> None => bail!("invalid API token"),
>> }
>> }
>> @@ -82,6 +107,8 @@ fn set_secret(tokenid: &Authid, secret: &str) -> Result<(), Error> {
>> data.insert(tokenid.clone(), hashed_secret);
>> write_file(data)?;
>>
>> + cache_insert_secret(tokenid.clone(), secret.to_owned());
>> +
>> Ok(())
>> }
>>
>> @@ -97,5 +124,34 @@ pub fn delete_secret(tokenid: &Authid) -> Result<(), Error> {
>> data.remove(tokenid);
>> write_file(data)?;
>>
>> + cache_remove_secret(tokenid);
>> +
>> Ok(())
>> }
>> +
>> +struct ApiTokenSecretCache {
>> + /// Keys are token Authids, values are the corresponding plain text secrets.
>> + /// Entries are added after a successful on-disk verification in
>> + /// `verify_secret` or when a new token secret is generated by
>> + /// `generate_and_set_secret`. Used to avoid repeated
>> + /// password-hash computation on subsequent authentications.
>> + secrets: HashMap<Authid, String>,
>> +}
>> +
>> +fn token_secret_cache() -> &'static RwLock<ApiTokenSecretCache> {
>> + TOKEN_SECRET_CACHE.get_or_init(|| {
>> + RwLock::new(ApiTokenSecretCache {
>> + secrets: HashMap::new(),
>> + })
>> + })
>> +}
>> +
>> +fn cache_insert_secret(tokenid: Authid, secret: String) {
>> + let mut cache = token_secret_cache().write().unwrap();
>
> unwrap here could panic if another thread is holding a guard, any reason
> to not return a result here and bubble up the error instead?
>
Unwrap only panics here if another thread panicked while holding the
write lock. If that happens, the cache might be in an inconsistent
state, and future read() / write() calls will also return a
PoisonError. If we bubble up an error here, every subsequent request
would receive that poison error.
I see two options:
- treat this as a hard bug and let the process panic on PoisonError,
i.e. keep write().unwrap()
- catch the error, clear the cache and access the data via
.into_inner(); this still forces every future read/write call site to
handle the poison logic correctly, though
I think it makes sense to fail hard here. If the lock is poisoned, the
state is likely broken and it seems better to let the process restart.
>> + cache.secrets.insert(tokenid, secret);
>> +}
>> +
>> +fn cache_remove_secret(tokenid: &Authid) {
>> + let mut cache = token_secret_cache().write().unwrap();
>
> same here and in the following patches (i won't comment on each
> occurrence there separately.)
>
>> + cache.secrets.remove(tokenid);
>> +}
>
* Re: [pbs-devel] [PATCH proxmox{-backup, } 0/6] Reduce token.shadow verification overhead
2025-12-05 13:25 15% [pbs-devel] [PATCH proxmox{-backup, } 0/6] Reduce token.shadow verification overhead Samuel Rufinatscha
` (5 preceding siblings ...)
2025-12-05 13:25 16% ` [pbs-devel] [PATCH proxmox 3/3] proxmox-access-control: add TTL window to token secret cache Samuel Rufinatscha
@ 2025-12-05 14:06 5% ` Shannon Sterz
2025-12-09 13:58 6% ` Samuel Rufinatscha
2025-12-17 16:27 13% ` [pbs-devel] superseded: " Samuel Rufinatscha
7 siblings, 1 reply; 200+ results
From: Shannon Sterz @ 2025-12-05 14:06 UTC (permalink / raw)
To: Samuel Rufinatscha; +Cc: Proxmox Backup Server development discussion
thank you for this series and the extensive documentation in it. it was
very easy to follow. the changes look good to me for the most part, see
the comments on the first patch. one top level question, though:
should we publicly document that manually editing token.shadow will no
longer instantly invalidate requests made with tokens, and that changes
will take up to one minute to take effect? i don't think that this is
necessarily an issue, but imo we shouldn't make such a change without
informing users.
other than this and the comments in-line, consider this:
Reviewed-by: Shannon Sterz <s.sterz@proxmox.com>
On Fri Dec 5, 2025 at 2:25 PM CET, Samuel Rufinatscha wrote:
> Hi,
>
> this series improves the performance of token-based API authentication
> in PBS (pbs-config) and in PDM (underlying proxmox-access-control
> crate), addressing the API token verification hotspot reported in our
> bugtracker #6049 [1].
>
> When profiling PBS /status endpoint with cargo flamegraph [2],
> token-based authentication showed up as a dominant hotspot via
> proxmox_sys::crypt::verify_crypt_pw. Applying this series removes that
> path from the hot section of the flamegraph. The same performance issue
> was measured [3] for PDM. PDM uses the underlying shared
> proxmox-access-control library for token handling, which is a
> factored out version of the token.shadow handling code from PBS.
>
> While this series fixes the immediate performance issue both in PBS
> (pbs-config) and in the shared proxmox-access-control crate used by
> PDM, PBS should eventually, ideally be refactored, in a separate
> effort, to use proxmox-access-control for token handling instead of its
> local implementation.
>
> Problem
>
> For token-based API requests, both PBS’s pbs-config token.shadow
> handling and PDM proxmox-access-control’s token.shadow handling
> currently:
>
> 1. read the token.shadow file on each request
> 2. deserialize it into a HashMap<Authid, String>
> 3. run password hash verification via
> proxmox_sys::crypt::verify_crypt_pw for the provided token secret
>
> Under load, this results in significant CPU usage spent in repeated
> password hash computations for the same token+secret pairs. The
> attached flamegraphs for PBS [2] and PDM [3] show
> proxmox_sys::crypt::verify_crypt_pw dominating the hot path.
>
> Approach
>
> The goal is to reduce the cost of token-based authentication preserving
> the existing token handling semantics (including detecting manual edits
> to token.shadow) and be consistent between PBS (pbs-config) and
> PDM (proxmox-access-control). For both sites, the series proposes
> following approach:
>
> 1. Introduce an in-memory cache for verified token secrets
> 2. Invalidate the cache when token.shadow changes (detect manual edits)
> 3. Control metadata checks with a TTL window
>
> Testing
>
> *PBS (pbs-config)*
>
> To verify the effect in PBS, I:
> 1. Set up test environment based on latest PBS ISO, installed Rust
> toolchain, cloned proxmox-backup repository to use with cargo
> flamegraph. Reproduced bug #6049 [1] by profiling the /status
> endpoint with token-based authentication using cargo flamegraph [2].
> The flamegraph showed proxmox_sys::crypt::verify_crypt_pw is the
> hotspot.
> 2. Built PBS with pbs-config patches and re-ran the same workload and
> profiling setup.
> 3. Confirmed that the proxmox_sys::crypt::verify_crypt_pw path no
> longer appears in the hot section of the flamegraph. CPU usage is
> now dominated by TLS overhead.
> 4. Functionally verified that:
> * token-based API authentication still works for valid tokens
> * invalid secrets are rejected as before
> * generating a new token secret via dashboard works and
> authenticates correctly
>
> *PDM (proxmox-access-control)*
>
> To verify the effect in PDM, I followed a similar testing approach.
> Instead of /status, I profiled the /version endpoint with cargo
> flamegraph [3] and verified that the token hashing path disappears
> from the hot section after applying the proxmox-access-control patches.
>
> Functionally I verified that:
> * token-based API authentication still works for valid tokens
> * invalid secrets are rejected as before
> * generating a new token secret via dashboard works and
> authenticates correctly
>
> Patch summary
>
> pbs-config:
>
> 0001 – pbs-config: cache verified API token secrets
> Adds an in-memory cache keyed by Authid that stores plain text token
> secrets after a successful verification or generation and uses
> openssl’s memcmp constant-time for comparison.
>
> 0002 – pbs-config: invalidate token-secret cache on token.shadow changes
> Tracks token.shadow mtime and length and clears the in-memory cache
> when the file changes.
>
> 0003 – pbs-config: add TTL window to token-secret cache
> Introduces a TTL (TOKEN_SECRET_CACHE_TTL_SECS, default 60) for metadata checks so
> that fs::metadata is only called periodically.
>
> proxmox-access-control:
>
> 0004 – access-control: cache verified API token secrets
> Mirrors PBS patch 0001.
>
> 0005 – access-control: invalidate token-secret cache on token.shadow changes
> Mirrors PBS patch 0002.
>
> 0006 – access-control: add TTL window to token-secret cache
> Mirrors PBS patch 0003.
>
> Thanks for considering this patch series, I look forward to your
> feedback.
>
> Best,
> Samuel Rufinatscha
>
> [1] https://bugzilla.proxmox.com/show_bug.cgi?id=6049
> [2] Flamegraph illustrating the `proxmox_sys::crypt::verify_crypt_pw`
> hotspot before this series (attached to [1])
>
> proxmox-backup:
>
> Samuel Rufinatscha (3):
> pbs-config: cache verified API token secrets
> pbs-config: invalidate token-secret cache on token.shadow changes
> pbs-config: add TTL window to token secret cache
>
> pbs-config/src/token_shadow.rs | 109 ++++++++++++++++++++++++++++++++-
> 1 file changed, 108 insertions(+), 1 deletion(-)
>
>
> proxmox:
>
> Samuel Rufinatscha (3):
> proxmox-access-control: cache verified API token secrets
> proxmox-access-control: invalidate token-secret cache on token.shadow
> changes
> proxmox-access-control: add TTL window to token secret cache
>
> proxmox-access-control/src/token_shadow.rs | 108 ++++++++++++++++++++-
> 1 file changed, 107 insertions(+), 1 deletion(-)
>
>
> Summary over all repositories:
> 2 files changed, 215 insertions(+), 2 deletions(-)
* Re: [pbs-devel] [PATCH proxmox-backup 1/3] pbs-config: cache verified API token secrets
2025-12-05 13:25 14% ` [pbs-devel] [PATCH proxmox-backup 1/3] pbs-config: cache verified API token secrets Samuel Rufinatscha
@ 2025-12-05 14:04 5% ` Shannon Sterz
2025-12-09 13:29 6% ` Samuel Rufinatscha
2025-12-10 11:47 5% ` Fabian Grünbichler
1 sibling, 1 reply; 200+ results
From: Shannon Sterz @ 2025-12-05 14:04 UTC (permalink / raw)
To: Samuel Rufinatscha; +Cc: Proxmox Backup Server development discussion
On Fri Dec 5, 2025 at 2:25 PM CET, Samuel Rufinatscha wrote:
> Currently, every token-based API request reads the token.shadow file and
> runs the expensive password hash verification for the given token
> secret. This shows up as a hotspot in /status profiling (see
> bug #6049 [1]).
>
> This patch introduces an in-memory cache of successfully verified token
> secrets. Subsequent requests for the same token+secret combination only
> perform a comparison using openssl::memcmp::eq and avoid re-running the
> password hash. The cache is updated when a token secret is set and
> cleared when a token is deleted. Note, this does NOT include manual
> config changes, which will be covered in a subsequent patch.
>
> This patch partly fixes bug #6049 [1].
>
> [1] https://bugzilla.proxmox.com/show_bug.cgi?id=6049
>
> Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
> ---
> pbs-config/src/token_shadow.rs | 58 +++++++++++++++++++++++++++++++++-
> 1 file changed, 57 insertions(+), 1 deletion(-)
>
> diff --git a/pbs-config/src/token_shadow.rs b/pbs-config/src/token_shadow.rs
> index 640fabbf..47aa2fc2 100644
> --- a/pbs-config/src/token_shadow.rs
> +++ b/pbs-config/src/token_shadow.rs
> @@ -1,6 +1,8 @@
> use std::collections::HashMap;
> +use std::sync::RwLock;
>
> use anyhow::{bail, format_err, Error};
> +use once_cell::sync::OnceCell;
> use serde::{Deserialize, Serialize};
> use serde_json::{from_value, Value};
>
> @@ -13,6 +15,13 @@ use crate::{open_backup_lockfile, BackupLockGuard};
> const LOCK_FILE: &str = pbs_buildcfg::configdir!("/token.shadow.lock");
> const CONF_FILE: &str = pbs_buildcfg::configdir!("/token.shadow");
>
> +/// Global in-memory cache for successfully verified API token secrets.
> +/// The cache stores plain text secrets for token Authids that have already been
> +/// verified against the hashed values in `token.shadow`. This allows for cheap
> +/// subsequent authentications for the same token+secret combination, avoiding
> +/// recomputing the password hash on every request.
> +static TOKEN_SECRET_CACHE: OnceCell<RwLock<ApiTokenSecretCache>> = OnceCell::new();
any reason you are using a once cell with a custom get_or_init function
instead of a simple `LazyCell` [1] here? seems to me that this would be
the more appropriate type here? similar question for the
proxmox-access-control portion of this series.
[1]: https://doc.rust-lang.org/std/cell/struct.LazyCell.html
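for illustration, a minimal sketch of what the static could look like with
`std::sync::LazyLock` (the `Sync` counterpart of `LazyCell`, needed for a
`static`; stable since Rust 1.80) — `Authid` is stubbed out as a `String`
here, the real type comes from pbs-api-types:

```rust
use std::collections::HashMap;
use std::sync::{LazyLock, RwLock};

// Hypothetical stand-in for the real Authid type from pbs-api-types.
type Authid = String;

struct ApiTokenSecretCache {
    secrets: HashMap<Authid, String>,
}

// The OnceCell static plus the token_secret_cache() accessor collapse
// into a single item; the closure runs on first dereference.
static TOKEN_SECRET_CACHE: LazyLock<RwLock<ApiTokenSecretCache>> = LazyLock::new(|| {
    RwLock::new(ApiTokenSecretCache {
        secrets: HashMap::new(),
    })
});

fn cache_insert_secret(tokenid: Authid, secret: String) {
    // Auto-deref through the LazyLock straight to the RwLock.
    TOKEN_SECRET_CACHE.write().unwrap().secrets.insert(tokenid, secret);
}
```

this removes the accessor function entirely while keeping call sites the same shape.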
> +
> #[derive(Serialize, Deserialize)]
> #[serde(rename_all = "kebab-case")]
> /// ApiToken id / secret pair
> @@ -54,9 +63,25 @@ pub fn verify_secret(tokenid: &Authid, secret: &str) -> Result<(), Error> {
> bail!("not an API token ID");
> }
>
> + // Fast path
> + if let Some(cached) = token_secret_cache().read().unwrap().secrets.get(tokenid) {
> + // Compare cached secret with provided one using constant time comparison
> + if openssl::memcmp::eq(cached.as_bytes(), secret.as_bytes()) {
> + // Already verified before
> + return Ok(());
> + }
> + // Fall through to slow path if secret doesn't match cached one
> + }
> +
> + // Slow path: read file + verify hash
> let data = read_file()?;
> match data.get(tokenid) {
> - Some(hashed_secret) => proxmox_sys::crypt::verify_crypt_pw(secret, hashed_secret),
> + Some(hashed_secret) => {
> + proxmox_sys::crypt::verify_crypt_pw(secret, hashed_secret)?;
> + // Cache the plain secret for future requests
> + cache_insert_secret(tokenid.clone(), secret.to_owned());
> + Ok(())
> + }
> None => bail!("invalid API token"),
> }
> }
> @@ -82,6 +107,8 @@ fn set_secret(tokenid: &Authid, secret: &str) -> Result<(), Error> {
> data.insert(tokenid.clone(), hashed_secret);
> write_file(data)?;
>
> + cache_insert_secret(tokenid.clone(), secret.to_owned());
> +
> Ok(())
> }
>
> @@ -97,5 +124,34 @@ pub fn delete_secret(tokenid: &Authid) -> Result<(), Error> {
> data.remove(tokenid);
> write_file(data)?;
>
> + cache_remove_secret(tokenid);
> +
> Ok(())
> }
> +
> +struct ApiTokenSecretCache {
> + /// Keys are token Authids, values are the corresponding plain text secrets.
> + /// Entries are added after a successful on-disk verification in
> + /// `verify_secret` or when a new token secret is generated by
> + /// `generate_and_set_secret`. Used to avoid repeated
> + /// password-hash computation on subsequent authentications.
> + secrets: HashMap<Authid, String>,
> +}
> +
> +fn token_secret_cache() -> &'static RwLock<ApiTokenSecretCache> {
> + TOKEN_SECRET_CACHE.get_or_init(|| {
> + RwLock::new(ApiTokenSecretCache {
> + secrets: HashMap::new(),
> + })
> + })
> +}
> +
> +fn cache_insert_secret(tokenid: Authid, secret: String) {
> + let mut cache = token_secret_cache().write().unwrap();
the unwrap here could panic if the lock was poisoned by another thread
panicking while holding the guard, any reason to not return a result
here and bubble up the error instead?
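a rough sketch of what bubbling the error up could look like — using a
plain `String` error and a stubbed `Authid` to stay self-contained, the
real code would presumably map to an `anyhow::Error`:

```rust
use std::collections::HashMap;
use std::sync::{OnceLock, RwLock};

// Hypothetical stand-in for the real Authid type.
type Authid = String;

struct ApiTokenSecretCache {
    secrets: HashMap<Authid, String>,
}

static TOKEN_SECRET_CACHE: OnceLock<RwLock<ApiTokenSecretCache>> = OnceLock::new();

fn token_secret_cache() -> &'static RwLock<ApiTokenSecretCache> {
    TOKEN_SECRET_CACHE.get_or_init(|| {
        RwLock::new(ApiTokenSecretCache {
            secrets: HashMap::new(),
        })
    })
}

// Instead of unwrap(), surface a poisoned lock as an error the caller
// can propagate with `?`.
fn cache_insert_secret(tokenid: Authid, secret: String) -> Result<(), String> {
    let mut cache = token_secret_cache()
        .write()
        .map_err(|_| "token secret cache lock poisoned".to_string())?;
    cache.secrets.insert(tokenid, secret);
    Ok(())
}
```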
> + cache.secrets.insert(tokenid, secret);
> +}
> +
> +fn cache_remove_secret(tokenid: &Authid) {
> + let mut cache = token_secret_cache().write().unwrap();
same here and in the following patches (i won't comment on each
occurrence there separately.)
> + cache.secrets.remove(tokenid);
> +}
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
^ permalink raw reply [relevance 5%]
* [pbs-devel] [PATCH proxmox 3/3] proxmox-access-control: add TTL window to token secret cache
2025-12-05 13:25 15% [pbs-devel] [PATCH proxmox{-backup, } 0/6] Reduce token.shadow verification overhead Samuel Rufinatscha
` (4 preceding siblings ...)
2025-12-05 13:25 15% ` [pbs-devel] [PATCH proxmox 2/3] proxmox-access-control: invalidate token-secret cache on token.shadow changes Samuel Rufinatscha
@ 2025-12-05 13:25 16% ` Samuel Rufinatscha
2025-12-05 14:06 5% ` [pbs-devel] [PATCH proxmox{-backup, } 0/6] Reduce token.shadow verification overhead Shannon Sterz
2025-12-17 16:27 13% ` [pbs-devel] superseded: " Samuel Rufinatscha
7 siblings, 0 replies; 200+ results
From: Samuel Rufinatscha @ 2025-12-05 13:25 UTC (permalink / raw)
To: pbs-devel
verify_secret() currently calls refresh_cache_if_file_changed() on every
request, which performs a metadata() call on token.shadow each time.
Under load this adds unnecessary overhead, especially since the file
rarely changes.
This patch introduces a TTL boundary, controlled by
TOKEN_SECRET_CACHE_TTL_SECS. The file metadata is only re-checked once
the TTL has expired.
Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
---
proxmox-access-control/src/token_shadow.rs | 16 ++++++++++++++++
1 file changed, 16 insertions(+)
diff --git a/proxmox-access-control/src/token_shadow.rs b/proxmox-access-control/src/token_shadow.rs
index d08fb06a..885e629d 100644
--- a/proxmox-access-control/src/token_shadow.rs
+++ b/proxmox-access-control/src/token_shadow.rs
@@ -9,6 +9,7 @@ use serde_json::{from_value, Value};
use proxmox_auth_api::types::Authid;
use proxmox_product_config::{open_api_lockfile, replace_config, ApiLockGuard};
+use proxmox_time::epoch_i64;
use crate::init::impl_feature::{token_shadow, token_shadow_lock};
@@ -18,6 +19,8 @@ use crate::init::impl_feature::{token_shadow, token_shadow_lock};
/// subsequent authentications for the same token+secret combination, avoiding
/// recomputing the password hash on every request.
static TOKEN_SECRET_CACHE: OnceLock<RwLock<ApiTokenSecretCache>> = OnceLock::new();
+/// Max age in seconds of the token secret cache before checking for file changes.
+const TOKEN_SECRET_CACHE_TTL_SECS: i64 = 60;
// Get exclusive lock
fn lock_config() -> Result<ApiLockGuard, Error> {
@@ -44,6 +47,15 @@ fn write_file(data: HashMap<Authid, String>) -> Result<(), Error> {
fn refresh_cache_if_file_changed() -> Result<(), Error> {
let mut cache = token_secret_cache().write().unwrap();
+ let now = epoch_i64();
+
+ // Fast path: Within TTL boundary
+ if let Some(last) = cache.last_checked {
+ if now - last < TOKEN_SECRET_CACHE_TTL_SECS {
+ return Ok(());
+ }
+ }
+
// Fetch the current token.shadow metadata
let (new_mtime, new_len) = match fs::metadata(token_shadow().as_path()) {
Ok(meta) => (meta.modified().ok(), Some(meta.len())),
@@ -60,6 +72,7 @@ fn refresh_cache_if_file_changed() -> Result<(), Error> {
cache.secrets.clear();
cache.file_mtime = new_mtime;
cache.file_len = new_len;
+ cache.last_checked = Some(now);
Ok(())
}
@@ -150,6 +163,8 @@ struct ApiTokenSecretCache {
file_mtime: Option<SystemTime>,
// shadow file length to detect changes
file_len: Option<u64>,
+ // last time the file metadata was checked
+ last_checked: Option<i64>,
}
fn token_secret_cache() -> &'static RwLock<ApiTokenSecretCache> {
@@ -158,6 +173,7 @@ fn token_secret_cache() -> &'static RwLock<ApiTokenSecretCache> {
secrets: HashMap::new(),
file_mtime: None,
file_len: None,
+ last_checked: None,
})
})
}
--
2.47.3
^ permalink raw reply [relevance 16%]
* [pbs-devel] [PATCH proxmox-backup 2/3] pbs-config: invalidate token-secret cache on token.shadow changes
2025-12-05 13:25 15% [pbs-devel] [PATCH proxmox{-backup, } 0/6] Reduce token.shadow verification overhead Samuel Rufinatscha
2025-12-05 13:25 14% ` [pbs-devel] [PATCH proxmox-backup 1/3] pbs-config: cache verified API token secrets Samuel Rufinatscha
@ 2025-12-05 13:25 15% ` Samuel Rufinatscha
2025-12-05 13:25 16% ` [pbs-devel] [PATCH proxmox-backup 3/3] pbs-config: add TTL window to token secret cache Samuel Rufinatscha
` (5 subsequent siblings)
7 siblings, 0 replies; 200+ results
From: Samuel Rufinatscha @ 2025-12-05 13:25 UTC (permalink / raw)
To: pbs-devel
Previously the in-memory token-secret cache was only updated via
set_secret() and delete_secret(), so manual edits to token.shadow were
not reflected.
This patch adds file change detection to the cache. It tracks the mtime
and length of token.shadow and clears the in-memory token secret cache
whenever these values change.
Note that this patch fetches file stats on every request. A TTL-based
optimization will be covered in a subsequent patch in the series.
This patch is a partial fix.
[1] https://bugzilla.proxmox.com/show_bug.cgi?id=7017
Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
---
pbs-config/src/token_shadow.rs | 35 ++++++++++++++++++++++++++++++++++
1 file changed, 35 insertions(+)
diff --git a/pbs-config/src/token_shadow.rs b/pbs-config/src/token_shadow.rs
index 47aa2fc2..ed54cdfa 100644
--- a/pbs-config/src/token_shadow.rs
+++ b/pbs-config/src/token_shadow.rs
@@ -1,5 +1,8 @@
use std::collections::HashMap;
+use std::fs;
+use std::io::ErrorKind;
use std::sync::RwLock;
+use std::time::SystemTime;
use anyhow::{bail, format_err, Error};
use once_cell::sync::OnceCell;
@@ -57,12 +60,38 @@ fn write_file(data: HashMap<Authid, String>) -> Result<(), Error> {
proxmox_sys::fs::replace_file(CONF_FILE, &json, options, true)
}
+fn refresh_cache_if_file_changed() -> Result<(), Error> {
+ let mut cache = token_secret_cache().write().unwrap();
+
+ // Fetch the current token.shadow metadata
+ let (new_mtime, new_len) = match fs::metadata(CONF_FILE) {
+ Ok(meta) => (meta.modified().ok(), Some(meta.len())),
+ Err(e) if e.kind() == ErrorKind::NotFound => (None, None),
+ Err(e) => return Err(e.into()),
+ };
+
+ // Fast path: file did not change, keep the cache
+ if cache.file_mtime == new_mtime && cache.file_len == new_len {
+ return Ok(());
+ }
+
+ // File changed, drop all cached secrets
+ cache.secrets.clear();
+ cache.file_mtime = new_mtime;
+ cache.file_len = new_len;
+
+ Ok(())
+}
+
/// Verifies that an entry for given tokenid / API token secret exists
pub fn verify_secret(tokenid: &Authid, secret: &str) -> Result<(), Error> {
if !tokenid.is_token() {
bail!("not an API token ID");
}
+ // Ensure cache is in sync with on-disk token.shadow file
+ refresh_cache_if_file_changed()?;
+
// Fast path
if let Some(cached) = token_secret_cache().read().unwrap().secrets.get(tokenid) {
// Compare cached secret with provided one using constant time comparison
@@ -136,12 +165,18 @@ struct ApiTokenSecretCache {
/// `generate_and_set_secret`. Used to avoid repeated
/// password-hash computation on subsequent authentications.
secrets: HashMap<Authid, String>,
+ // shadow file mtime to detect changes
+ file_mtime: Option<SystemTime>,
+ // shadow file length to detect changes
+ file_len: Option<u64>,
}
fn token_secret_cache() -> &'static RwLock<ApiTokenSecretCache> {
TOKEN_SECRET_CACHE.get_or_init(|| {
RwLock::new(ApiTokenSecretCache {
secrets: HashMap::new(),
+ file_mtime: None,
+ file_len: None,
})
})
}
--
2.47.3
^ permalink raw reply [relevance 15%]
* [pbs-devel] [PATCH proxmox 2/3] proxmox-access-control: invalidate token-secret cache on token.shadow changes
2025-12-05 13:25 15% [pbs-devel] [PATCH proxmox{-backup, } 0/6] Reduce token.shadow verification overhead Samuel Rufinatscha
` (3 preceding siblings ...)
2025-12-05 13:25 14% ` [pbs-devel] [PATCH proxmox 1/3] proxmox-access-control: cache verified API token secrets Samuel Rufinatscha
@ 2025-12-05 13:25 15% ` Samuel Rufinatscha
2025-12-05 13:25 16% ` [pbs-devel] [PATCH proxmox 3/3] proxmox-access-control: add TTL window to token secret cache Samuel Rufinatscha
` (2 subsequent siblings)
7 siblings, 0 replies; 200+ results
From: Samuel Rufinatscha @ 2025-12-05 13:25 UTC (permalink / raw)
To: pbs-devel
Previously the in-memory token-secret cache was only updated via
set_secret() and delete_secret(), so manual edits to token.shadow were
not reflected.
This patch adds file change detection to the cache. It tracks the mtime
and length of token.shadow and clears the in-memory token secret cache
whenever these values change.
Note that this patch fetches file stats on every request. A TTL-based
optimization will be covered in a subsequent patch in the series.
This patch is a partial fix.
Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
---
proxmox-access-control/src/token_shadow.rs | 35 ++++++++++++++++++++++
1 file changed, 35 insertions(+)
diff --git a/proxmox-access-control/src/token_shadow.rs b/proxmox-access-control/src/token_shadow.rs
index 2dcd117d..d08fb06a 100644
--- a/proxmox-access-control/src/token_shadow.rs
+++ b/proxmox-access-control/src/token_shadow.rs
@@ -1,5 +1,8 @@
use std::collections::HashMap;
+use std::fs;
+use std::io::ErrorKind;
use std::sync::{OnceLock, RwLock};
+use std::time::SystemTime;
use anyhow::{bail, format_err, Error};
use serde_json::{from_value, Value};
@@ -38,12 +41,38 @@ fn write_file(data: HashMap<Authid, String>) -> Result<(), Error> {
replace_config(token_shadow(), &json)
}
+fn refresh_cache_if_file_changed() -> Result<(), Error> {
+ let mut cache = token_secret_cache().write().unwrap();
+
+ // Fetch the current token.shadow metadata
+ let (new_mtime, new_len) = match fs::metadata(token_shadow().as_path()) {
+ Ok(meta) => (meta.modified().ok(), Some(meta.len())),
+ Err(e) if e.kind() == ErrorKind::NotFound => (None, None),
+ Err(e) => return Err(e.into()),
+ };
+
+ // Fast path: file did not change, keep the cache
+ if cache.file_mtime == new_mtime && cache.file_len == new_len {
+ return Ok(());
+ }
+
+ // File changed, drop all cached secrets
+ cache.secrets.clear();
+ cache.file_mtime = new_mtime;
+ cache.file_len = new_len;
+
+ Ok(())
+}
+
/// Verifies that an entry for given tokenid / API token secret exists
pub fn verify_secret(tokenid: &Authid, secret: &str) -> Result<(), Error> {
if !tokenid.is_token() {
bail!("not an API token ID");
}
+ // Ensure cache is in sync with on-disk token.shadow file
+ refresh_cache_if_file_changed()?;
+
// Fast path
if let Some(cached) = token_secret_cache().read().unwrap().secrets.get(tokenid) {
// Compare cached secret with provided one using constant time comparison
@@ -117,12 +146,18 @@ struct ApiTokenSecretCache {
/// `generate_and_set_secret`. Used to avoid repeated
/// password-hash computation on subsequent authentications.
secrets: HashMap<Authid, String>,
+ // shadow file mtime to detect changes
+ file_mtime: Option<SystemTime>,
+ // shadow file length to detect changes
+ file_len: Option<u64>,
}
fn token_secret_cache() -> &'static RwLock<ApiTokenSecretCache> {
TOKEN_SECRET_CACHE.get_or_init(|| {
RwLock::new(ApiTokenSecretCache {
secrets: HashMap::new(),
+ file_mtime: None,
+ file_len: None,
})
})
}
--
2.47.3
^ permalink raw reply [relevance 15%]
* [pbs-devel] [PATCH proxmox{-backup, } 0/6] Reduce token.shadow verification overhead
@ 2025-12-05 13:25 15% Samuel Rufinatscha
2025-12-05 13:25 14% ` [pbs-devel] [PATCH proxmox-backup 1/3] pbs-config: cache verified API token secrets Samuel Rufinatscha
` (7 more replies)
0 siblings, 8 replies; 200+ results
From: Samuel Rufinatscha @ 2025-12-05 13:25 UTC (permalink / raw)
To: pbs-devel
Hi,
this series improves the performance of token-based API authentication
in PBS (pbs-config) and in PDM (underlying proxmox-access-control
crate), addressing the API token verification hotspot reported in our
bugtracker #6049 [1].
When profiling PBS /status endpoint with cargo flamegraph [2],
token-based authentication showed up as a dominant hotspot via
proxmox_sys::crypt::verify_crypt_pw. Applying this series removes that
path from the hot section of the flamegraph. The same performance issue
was measured [3] for PDM. PDM uses the underlying shared
proxmox-access-control library for token handling, which is a
factored out version of the token.shadow handling code from PBS.
While this series fixes the immediate performance issue both in PBS
(pbs-config) and in the shared proxmox-access-control crate used by
PDM, PBS should eventually, ideally be refactored, in a separate
effort, to use proxmox-access-control for token handling instead of its
local implementation.
Problem
For token-based API requests, both PBS’s pbs-config token.shadow
handling and PDM proxmox-access-control’s token.shadow handling
currently:
1. read the token.shadow file on each request
2. deserialize it into a HashMap<Authid, String>
3. run password hash verification via
proxmox_sys::crypt::verify_crypt_pw for the provided token secret
Under load, this results in significant CPU usage spent in repeated
password hash computations for the same token+secret pairs. The
attached flamegraphs for PBS [2] and PDM [3] show
proxmox_sys::crypt::verify_crypt_pw dominating the hot path.
Approach
The goal is to reduce the cost of token-based authentication while
preserving the existing token handling semantics (including detecting
manual edits to token.shadow) and staying consistent between PBS
(pbs-config) and PDM (proxmox-access-control). For both codebases, the
series proposes the following approach:
1. Introduce an in-memory cache for verified token secrets
2. Invalidate the cache when token.shadow changes (detect manual edits)
3. Control metadata checks with a TTL window
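The TTL gate from step 3 can be sketched roughly as follows (simplified:
the series keys the check on epoch seconds via proxmox_time::epoch_i64,
approximated here with std::time::Instant, and the struct fields are
illustrative):

```rust
use std::time::{Duration, Instant, SystemTime};

const TOKEN_SECRET_CACHE_TTL: Duration = Duration::from_secs(60);

#[allow(dead_code)]
struct CacheMeta {
    // mtime of token.shadow as of the last check; compared against the
    // on-disk value to invalidate the cache on manual edits (step 2).
    file_mtime: Option<SystemTime>,
    // when the on-disk metadata was last compared (step 3).
    last_checked: Option<Instant>,
}

// Only touch fs::metadata() again once the TTL window has expired;
// within the window the cached view is trusted as-is.
fn needs_metadata_check(meta: &CacheMeta, now: Instant) -> bool {
    match meta.last_checked {
        Some(last) => now.duration_since(last) >= TOKEN_SECRET_CACHE_TTL,
        None => true,
    }
}
```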
Testing
*PBS (pbs-config)*
To verify the effect in PBS, I:
1. Set up test environment based on latest PBS ISO, installed Rust
toolchain, cloned proxmox-backup repository to use with cargo
flamegraph. Reproduced bug #6049 [1] by profiling the /status
endpoint with token-based authentication using cargo flamegraph [2].
The flamegraph showed proxmox_sys::crypt::verify_crypt_pw is the
hotspot.
2. Built PBS with pbs-config patches and re-ran the same workload and
profiling setup.
3. Confirmed that the proxmox_sys::crypt::verify_crypt_pw path no
longer appears in the hot section of the flamegraph. CPU usage is
now dominated by TLS overhead.
4. Functionally verified that:
* token-based API authentication still works for valid tokens
* invalid secrets are rejected as before
* generating a new token secret via dashboard works and
authenticates correctly
*PDM (proxmox-access-control)*
To verify the effect in PDM, I followed a similar testing approach.
Instead of /status, I profiled the /version endpoint with cargo
flamegraph [3] and verified that the token hashing path disappears
from the hot section after applying the proxmox-access-control patches.
Functionally I verified that:
* token-based API authentication still works for valid tokens
* invalid secrets are rejected as before
* generating a new token secret via dashboard works and
authenticates correctly
Patch summary
pbs-config:
0001 – pbs-config: cache verified API token secrets
Adds an in-memory cache keyed by Authid that stores plain text token
secrets after a successful verification or generation and uses
openssl's constant-time memcmp for comparison.
0002 – pbs-config: invalidate token-secret cache on token.shadow changes
Tracks token.shadow mtime and length and clears the in-memory cache
when the file changes.
0003 – pbs-config: add TTL window to token-secret cache
Introduces a TTL (TOKEN_SECRET_CACHE_TTL_SECS, default 60) for metadata checks so
that fs::metadata is only called periodically.
proxmox-access-control:
0004 – access-control: cache verified API token secrets
Mirrors PBS patch 0001.
0005 – access-control: invalidate token-secret cache on token.shadow changes
Mirrors PBS patch 0002.
0006 – access-control: add TTL window to token-secret cache
Mirrors PBS patch 0003.
Thanks for considering this patch series, I look forward to your
feedback.
Best,
Samuel Rufinatscha
[1] https://bugzilla.proxmox.com/show_bug.cgi?id=6049
[2] Flamegraph illustrating the `proxmox_sys::crypt::verify_crypt_pw`
hotspot before this series (attached to [1])
proxmox-backup:
Samuel Rufinatscha (3):
pbs-config: cache verified API token secrets
pbs-config: invalidate token-secret cache on token.shadow changes
pbs-config: add TTL window to token secret cache
pbs-config/src/token_shadow.rs | 109 ++++++++++++++++++++++++++++++++-
1 file changed, 108 insertions(+), 1 deletion(-)
proxmox:
Samuel Rufinatscha (3):
proxmox-access-control: cache verified API token secrets
proxmox-access-control: invalidate token-secret cache on token.shadow
changes
proxmox-access-control: add TTL window to token secret cache
proxmox-access-control/src/token_shadow.rs | 108 ++++++++++++++++++++-
1 file changed, 107 insertions(+), 1 deletion(-)
Summary over all repositories:
2 files changed, 215 insertions(+), 2 deletions(-)
--
Generated by git-murpp 0.8.1
^ permalink raw reply [relevance 15%]
* [pbs-devel] [PATCH proxmox-backup 3/3] pbs-config: add TTL window to token secret cache
2025-12-05 13:25 15% [pbs-devel] [PATCH proxmox{-backup, } 0/6] Reduce token.shadow verification overhead Samuel Rufinatscha
2025-12-05 13:25 14% ` [pbs-devel] [PATCH proxmox-backup 1/3] pbs-config: cache verified API token secrets Samuel Rufinatscha
2025-12-05 13:25 15% ` [pbs-devel] [PATCH proxmox-backup 2/3] pbs-config: invalidate token-secret cache on token.shadow changes Samuel Rufinatscha
@ 2025-12-05 13:25 16% ` Samuel Rufinatscha
2025-12-05 13:25 14% ` [pbs-devel] [PATCH proxmox 1/3] proxmox-access-control: cache verified API token secrets Samuel Rufinatscha
` (4 subsequent siblings)
7 siblings, 0 replies; 200+ results
From: Samuel Rufinatscha @ 2025-12-05 13:25 UTC (permalink / raw)
To: pbs-devel
verify_secret() currently calls refresh_cache_if_file_changed() on every
request, which performs a metadata() call on token.shadow each time.
Under load this adds unnecessary overhead, especially since the file
rarely changes.
This patch introduces a TTL boundary, controlled by
TOKEN_SECRET_CACHE_TTL_SECS. The file metadata is only re-checked once
the TTL has expired.
This patch partly fixes bug #6049 [1].
[1] https://bugzilla.proxmox.com/show_bug.cgi?id=7017
Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
---
pbs-config/src/token_shadow.rs | 16 ++++++++++++++++
1 file changed, 16 insertions(+)
diff --git a/pbs-config/src/token_shadow.rs b/pbs-config/src/token_shadow.rs
index ed54cdfa..23837c60 100644
--- a/pbs-config/src/token_shadow.rs
+++ b/pbs-config/src/token_shadow.rs
@@ -10,6 +10,7 @@ use serde::{Deserialize, Serialize};
use serde_json::{from_value, Value};
use proxmox_sys::fs::CreateOptions;
+use proxmox_time::epoch_i64;
use pbs_api_types::Authid;
//use crate::auth;
@@ -24,6 +25,8 @@ const CONF_FILE: &str = pbs_buildcfg::configdir!("/token.shadow");
/// subsequent authentications for the same token+secret combination, avoiding
/// recomputing the password hash on every request.
static TOKEN_SECRET_CACHE: OnceCell<RwLock<ApiTokenSecretCache>> = OnceCell::new();
+/// Max age in seconds of the token secret cache before checking for file changes.
+const TOKEN_SECRET_CACHE_TTL_SECS: i64 = 60;
#[derive(Serialize, Deserialize)]
#[serde(rename_all = "kebab-case")]
@@ -63,6 +66,15 @@ fn write_file(data: HashMap<Authid, String>) -> Result<(), Error> {
fn refresh_cache_if_file_changed() -> Result<(), Error> {
let mut cache = token_secret_cache().write().unwrap();
+ let now = epoch_i64();
+
+ // Fast path: Within TTL boundary
+ if let Some(last) = cache.last_checked {
+ if now - last < TOKEN_SECRET_CACHE_TTL_SECS {
+ return Ok(());
+ }
+ }
+
// Fetch the current token.shadow metadata
let (new_mtime, new_len) = match fs::metadata(CONF_FILE) {
Ok(meta) => (meta.modified().ok(), Some(meta.len())),
@@ -79,6 +91,7 @@ fn refresh_cache_if_file_changed() -> Result<(), Error> {
cache.secrets.clear();
cache.file_mtime = new_mtime;
cache.file_len = new_len;
+ cache.last_checked = Some(now);
Ok(())
}
@@ -169,6 +182,8 @@ struct ApiTokenSecretCache {
file_mtime: Option<SystemTime>,
// shadow file length to detect changes
file_len: Option<u64>,
+ // last time the file metadata was checked
+ last_checked: Option<i64>,
}
fn token_secret_cache() -> &'static RwLock<ApiTokenSecretCache> {
@@ -177,6 +192,7 @@ fn token_secret_cache() -> &'static RwLock<ApiTokenSecretCache> {
secrets: HashMap::new(),
file_mtime: None,
file_len: None,
+ last_checked: None,
})
})
}
--
2.47.3
^ permalink raw reply [relevance 16%]
* [pbs-devel] [PATCH proxmox-backup 1/3] pbs-config: cache verified API token secrets
2025-12-05 13:25 15% [pbs-devel] [PATCH proxmox{-backup, } 0/6] Reduce token.shadow verification overhead Samuel Rufinatscha
@ 2025-12-05 13:25 14% ` Samuel Rufinatscha
2025-12-05 14:04 5% ` Shannon Sterz
2025-12-10 11:47 5% ` Fabian Grünbichler
2025-12-05 13:25 15% ` [pbs-devel] [PATCH proxmox-backup 2/3] pbs-config: invalidate token-secret cache on token.shadow changes Samuel Rufinatscha
` (6 subsequent siblings)
7 siblings, 2 replies; 200+ results
From: Samuel Rufinatscha @ 2025-12-05 13:25 UTC (permalink / raw)
To: pbs-devel
Currently, every token-based API request reads the token.shadow file and
runs the expensive password hash verification for the given token
secret. This shows up as a hotspot in /status profiling (see
bug #6049 [1]).
This patch introduces an in-memory cache of successfully verified token
secrets. Subsequent requests for the same token+secret combination only
perform a comparison using openssl::memcmp::eq and avoid re-running the
password hash. The cache is updated when a token secret is set and
cleared when a token is deleted. Note, this does NOT include manual
config changes, which will be covered in a subsequent patch.
This patch partly fixes bug #6049 [1].
[1] https://bugzilla.proxmox.com/show_bug.cgi?id=7017
Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
---
pbs-config/src/token_shadow.rs | 58 +++++++++++++++++++++++++++++++++-
1 file changed, 57 insertions(+), 1 deletion(-)
diff --git a/pbs-config/src/token_shadow.rs b/pbs-config/src/token_shadow.rs
index 640fabbf..47aa2fc2 100644
--- a/pbs-config/src/token_shadow.rs
+++ b/pbs-config/src/token_shadow.rs
@@ -1,6 +1,8 @@
use std::collections::HashMap;
+use std::sync::RwLock;
use anyhow::{bail, format_err, Error};
+use once_cell::sync::OnceCell;
use serde::{Deserialize, Serialize};
use serde_json::{from_value, Value};
@@ -13,6 +15,13 @@ use crate::{open_backup_lockfile, BackupLockGuard};
const LOCK_FILE: &str = pbs_buildcfg::configdir!("/token.shadow.lock");
const CONF_FILE: &str = pbs_buildcfg::configdir!("/token.shadow");
+/// Global in-memory cache for successfully verified API token secrets.
+/// The cache stores plain text secrets for token Authids that have already been
+/// verified against the hashed values in `token.shadow`. This allows for cheap
+/// subsequent authentications for the same token+secret combination, avoiding
+/// recomputing the password hash on every request.
+static TOKEN_SECRET_CACHE: OnceCell<RwLock<ApiTokenSecretCache>> = OnceCell::new();
+
#[derive(Serialize, Deserialize)]
#[serde(rename_all = "kebab-case")]
/// ApiToken id / secret pair
@@ -54,9 +63,25 @@ pub fn verify_secret(tokenid: &Authid, secret: &str) -> Result<(), Error> {
bail!("not an API token ID");
}
+ // Fast path
+ if let Some(cached) = token_secret_cache().read().unwrap().secrets.get(tokenid) {
+ // Compare cached secret with provided one using constant time comparison
+ if openssl::memcmp::eq(cached.as_bytes(), secret.as_bytes()) {
+ // Already verified before
+ return Ok(());
+ }
+ // Fall through to slow path if secret doesn't match cached one
+ }
+
+ // Slow path: read file + verify hash
let data = read_file()?;
match data.get(tokenid) {
- Some(hashed_secret) => proxmox_sys::crypt::verify_crypt_pw(secret, hashed_secret),
+ Some(hashed_secret) => {
+ proxmox_sys::crypt::verify_crypt_pw(secret, hashed_secret)?;
+ // Cache the plain secret for future requests
+ cache_insert_secret(tokenid.clone(), secret.to_owned());
+ Ok(())
+ }
None => bail!("invalid API token"),
}
}
@@ -82,6 +107,8 @@ fn set_secret(tokenid: &Authid, secret: &str) -> Result<(), Error> {
data.insert(tokenid.clone(), hashed_secret);
write_file(data)?;
+ cache_insert_secret(tokenid.clone(), secret.to_owned());
+
Ok(())
}
@@ -97,5 +124,34 @@ pub fn delete_secret(tokenid: &Authid) -> Result<(), Error> {
data.remove(tokenid);
write_file(data)?;
+ cache_remove_secret(tokenid);
+
Ok(())
}
+
+struct ApiTokenSecretCache {
+ /// Keys are token Authids, values are the corresponding plain text secrets.
+ /// Entries are added after a successful on-disk verification in
+ /// `verify_secret` or when a new token secret is generated by
+ /// `generate_and_set_secret`. Used to avoid repeated
+ /// password-hash computation on subsequent authentications.
+ secrets: HashMap<Authid, String>,
+}
+
+fn token_secret_cache() -> &'static RwLock<ApiTokenSecretCache> {
+ TOKEN_SECRET_CACHE.get_or_init(|| {
+ RwLock::new(ApiTokenSecretCache {
+ secrets: HashMap::new(),
+ })
+ })
+}
+
+fn cache_insert_secret(tokenid: Authid, secret: String) {
+ let mut cache = token_secret_cache().write().unwrap();
+ cache.secrets.insert(tokenid, secret);
+}
+
+fn cache_remove_secret(tokenid: &Authid) {
+ let mut cache = token_secret_cache().write().unwrap();
+ cache.secrets.remove(tokenid);
+}
--
2.47.3
^ permalink raw reply [relevance 14%]
* [pbs-devel] [PATCH proxmox 1/3] proxmox-access-control: cache verified API token secrets
2025-12-05 13:25 15% [pbs-devel] [PATCH proxmox{-backup, } 0/6] Reduce token.shadow verification overhead Samuel Rufinatscha
` (2 preceding siblings ...)
2025-12-05 13:25 16% ` [pbs-devel] [PATCH proxmox-backup 3/3] pbs-config: add TTL window to token secret cache Samuel Rufinatscha
@ 2025-12-05 13:25 14% ` Samuel Rufinatscha
2025-12-05 13:25 15% ` [pbs-devel] [PATCH proxmox 2/3] proxmox-access-control: invalidate token-secret cache on token.shadow changes Samuel Rufinatscha
` (3 subsequent siblings)
7 siblings, 0 replies; 200+ results
From: Samuel Rufinatscha @ 2025-12-05 13:25 UTC (permalink / raw)
To: pbs-devel
Currently, every token-based API request reads the token.shadow file and
runs the expensive password hash verification for the given token
secret. This issue was first observed while profiling the PBS /status
endpoint (see bug #6049 [1]), and the same fix is needed in the
factored-out proxmox_access_control token_shadow implementation too.
This patch introduces an in-memory cache of successfully verified token
secrets. Subsequent requests for the same token+secret combination only
perform a comparison using openssl::memcmp::eq and avoid re-running the
password hash. The cache is updated when a token secret is set and
cleared when a token is deleted. Note, this does NOT include manual
config changes, which will be covered in a subsequent patch.
This patch is a partial fix.
[1] https://bugzilla.proxmox.com/show_bug.cgi?id=7017
Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
---
proxmox-access-control/src/token_shadow.rs | 57 +++++++++++++++++++++-
1 file changed, 56 insertions(+), 1 deletion(-)
diff --git a/proxmox-access-control/src/token_shadow.rs b/proxmox-access-control/src/token_shadow.rs
index c586d834..2dcd117d 100644
--- a/proxmox-access-control/src/token_shadow.rs
+++ b/proxmox-access-control/src/token_shadow.rs
@@ -1,4 +1,5 @@
use std::collections::HashMap;
+use std::sync::{OnceLock, RwLock};
use anyhow::{bail, format_err, Error};
use serde_json::{from_value, Value};
@@ -8,6 +9,13 @@ use proxmox_product_config::{open_api_lockfile, replace_config, ApiLockGuard};
use crate::init::impl_feature::{token_shadow, token_shadow_lock};
+/// Global in-memory cache for successfully verified API token secrets.
+/// The cache stores plain text secrets for token Authids that have already been
+/// verified against the hashed values in `token.shadow`. This allows for cheap
+/// subsequent authentications for the same token+secret combination, avoiding
+/// recomputing the password hash on every request.
+static TOKEN_SECRET_CACHE: OnceLock<RwLock<ApiTokenSecretCache>> = OnceLock::new();
+
// Get exclusive lock
fn lock_config() -> Result<ApiLockGuard, Error> {
open_api_lockfile(token_shadow_lock(), None, true)
@@ -36,9 +44,25 @@ pub fn verify_secret(tokenid: &Authid, secret: &str) -> Result<(), Error> {
bail!("not an API token ID");
}
+ // Fast path
+ if let Some(cached) = token_secret_cache().read().unwrap().secrets.get(tokenid) {
+ // Compare cached secret with provided one using constant time comparison
+ if openssl::memcmp::eq(cached.as_bytes(), secret.as_bytes()) {
+ // Already verified before
+ return Ok(());
+ }
+ // Fall through to slow path if secret doesn't match cached one
+ }
+
+ // Slow path: read file + verify hash
let data = read_file()?;
match data.get(tokenid) {
- Some(hashed_secret) => proxmox_sys::crypt::verify_crypt_pw(secret, hashed_secret),
+ Some(hashed_secret) => {
+ proxmox_sys::crypt::verify_crypt_pw(secret, hashed_secret)?;
+ // Cache the plain secret for future requests
+ cache_insert_secret(tokenid.clone(), secret.to_owned());
+ Ok(())
+ }
None => bail!("invalid API token"),
}
}
@@ -56,6 +80,8 @@ pub fn set_secret(tokenid: &Authid, secret: &str) -> Result<(), Error> {
data.insert(tokenid.clone(), hashed_secret);
write_file(data)?;
+ cache_insert_secret(tokenid.clone(), secret.to_owned());
+
Ok(())
}
@@ -71,6 +97,8 @@ pub fn delete_secret(tokenid: &Authid) -> Result<(), Error> {
data.remove(tokenid);
write_file(data)?;
+ cache_remove_secret(tokenid);
+
Ok(())
}
@@ -81,3 +109,30 @@ pub fn generate_and_set_secret(tokenid: &Authid) -> Result<String, Error> {
set_secret(tokenid, &secret)?;
Ok(secret)
}
+
+struct ApiTokenSecretCache {
+ /// Keys are token Authids, values are the corresponding plain text secrets.
+ /// Entries are added after a successful on-disk verification in
+ /// `verify_secret` or when a new token secret is generated by
+ /// `generate_and_set_secret`. Used to avoid repeated
+ /// password-hash computation on subsequent authentications.
+ secrets: HashMap<Authid, String>,
+}
+
+fn token_secret_cache() -> &'static RwLock<ApiTokenSecretCache> {
+ TOKEN_SECRET_CACHE.get_or_init(|| {
+ RwLock::new(ApiTokenSecretCache {
+ secrets: HashMap::new(),
+ })
+ })
+}
+
+fn cache_insert_secret(tokenid: Authid, secret: String) {
+ let mut cache = token_secret_cache().write().unwrap();
+ cache.secrets.insert(tokenid, secret);
+}
+
+fn cache_remove_secret(tokenid: &Authid) {
+ let mut cache = token_secret_cache().write().unwrap();
+ cache.secrets.remove(tokenid);
+}
--
2.47.3
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
^ permalink raw reply [relevance 14%]
* [pbs-devel] superseded: [PATCH proxmox{, -backup} v3 0/2] fix #6939: acme: support servers returning 204 for nonce requests
@ 2025-12-03 10:23 13% ` Samuel Rufinatscha
0 siblings, 0 replies; 200+ results
From: Samuel Rufinatscha @ 2025-12-03 10:23 UTC (permalink / raw)
To: pbs-devel
https://lore.proxmox.com/pbs-devel/20251203102217.59923-1-s.rufinatscha@proxmox.com/T/#t
On 11/3/25 11:13 AM, Samuel Rufinatscha wrote:
> Hi,
>
> this series proposes a change to ACME account registration in Proxmox
> Backup Server (PBS), so that it also works with ACME servers that return
> HTTP 204 No Content to the HEAD request for newNonce.
>
> This behaviour was observed against a specific ACME deployment and
> reported as bug #6939 [1]. Currently, PBS cannot register an ACME
> account for this CA.
>
> ## Problem
>
> During ACME account registration, PBS first fetches an anti-replay nonce
> by sending a HEAD request to the CA’s newNonce URL. RFC 8555 7.2 [2]
> says:
>
> * the server MUST include a Replay-Nonce header with a fresh nonce,
> * the server SHOULD use status 200 OK for the HEAD request,
> * the server MUST also handle GET on the same resource with status 204 No
> Content and an empty body [2].
>
> Currently, our Rust ACME clients only accept 200 OK. PBS inherits that
> strictness and aborts with:
>
> *ACME server responded with unexpected status code: 204*
>
> The reporter mentions that the issue did not appear with PVE 9 [1].
> Looking into PVE’s Perl ACME client [3], it appears to use a GET
> request instead of a HEAD request and to accept any 2xx success code
> when retrieving the nonce [5]. This difference in behavior does not
> affect functionality but is worth noting for consistency across
> implementations.
>
> ## Ideas to solve the problem
>
> To support ACME providers which return 204 No Content, the underlying
> ACME clients need to tolerate both 200 OK and 204 No Content as valid
> responses for the nonce HEAD request, as long as the Replay-Nonce is
> provided.
>
> I considered following solutions:
>
> 1. Change the `expected` field of the `AcmeRequest` type from `u16` to
> `Vec<u16>`, to support multiple success codes
>
> 2. Keep `expected: u16` and add a second field e.g. `expected_other:
> Vec<u16>` for "also allowed" codes.
>
> 3. Support any 2xx success codes, and remove the `expected` check
>
> I thought (1) might be reasonable, because:
>
> * It stays explicit and makes it clear which statuses are considered
> success.
> * We don’t create two parallel concepts ("expected" vs
> "expected_other"), which would introduce additional complexity.
> * It can be extended later if we encounter yet another harmless
> non-200 variant.
> * We don’t allow arbitrary 2xx codes.
>
> What do you think? Do you maybe have any other solution in mind that
> would fit better?
>
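Option (1) from the list above can be sketched as follows (the type and
field names are illustrative simplifications, not the actual
proxmox-acme `AcmeRequest` definition):

```rust
// Simplified request type mirroring the proposed change: `expected`
// becomes a slice of allowed success codes instead of a single u16.
struct AcmeRequest<'a> {
    expected: &'a [u16],
}

fn check_status(req: &AcmeRequest, status: u16) -> Result<(), String> {
    if req.expected.contains(&status) {
        Ok(())
    } else {
        Err(format!(
            "ACME server responded with unexpected status code: {status}"
        ))
    }
}

fn main() {
    // HEAD /newNonce: both 200 OK and 204 No Content are acceptable.
    let req = AcmeRequest {
        expected: &[200, 204],
    };
    assert!(check_status(&req, 200).is_ok());
    assert!(check_status(&req, 204).is_ok());
    // Other success codes (e.g. 205) are still rejected.
    assert!(check_status(&req, 205).is_err());
}
```

This keeps the allowed statuses explicit per request while avoiding a
blanket any-2xx rule.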
> ## Testing
>
> To prove the proposed fix, I reproduced the scenario:
>
> Pebble (release 2.8.0) from Let's Encrypt [4] ran on a Debian 9 VM as
> the ACME server, with nginx in front of Pebble to intercept the
> `newNonce` request and return 204 No Content instead of 200 OK; all
> other requests are forwarded to Pebble unchanged. The Pebble and nginx
> CAs are trusted via `/usr/local/share/ca-certificates` +
> `update-ca-certificates` on the PBS VM.
>
> Then I ran the following command against nginx:
>
> ```
> proxmox-backup-manager acme account register proxytest root@backup.local \
>   --directory 'https://nginx-address/dir'
>
> Attempting to fetch Terms of Service from "https://acme-vm/dir"
> Terms of Service: data:text/plain,Do%20what%20thou%20wilt
> Do you agree to the above terms? [y|N]: y
> Do you want to use external account binding? [y|N]: N
> Attempting to register account with "https://acme-vm/dir"...
> Registration successful, account URL: https://acme-vm/my-account/160e58b66bdd72da
>
> When adjusting the nginx configuration to return any other unexpected
> success status code, e.g. 205, PBS rejects it as expected with `API
> misbehaved: ACME server responded with unexpected status code: 205`.
>
> ## Maintainer notes:
>
> The patch series involves the following components:
>
> proxmox-acme: Apply PATCH 1 to change `expected` from `u16` to
> `Vec<u16>`. This results in a breaking change, as it changes the public
> API of the `AcmeRequest` type that is used by other components.
>
> proxmox-acme-api: Needs to depend on the new proxmox-acme; patch bump
>
> proxmox-backup: Apply PATCH 2 to use the new API changes; no breaking
> change as of only internal changes; patch bump
>
> proxmox-perl-rs / proxmox-datacenter-manager: Will need to use the
> dependency version bumps to follow the new proxmox-acme.
>
> ## Patch summary
>
> [PATCH 1/2] fix #6939: support providers returning 204 for nonce
> requests
>
> * Make the expected-status logic accept multiple allowed codes.
> * Treat both 200 OK and 204 No Content as valid for HEAD /newNonce,
> provided Replay-Nonce is present.
> * Keep rejecting other codes.
>
> [PATCH 2/2] acme: accept HTTP 204 from newNonce endpoint
>
> * Use the updated proxmox-acme behavior in PBS.
> * PBS can now register an ACME account against servers that return 204
> for the nonce HEAD request.
> * Still rejects unexpected codes.
>
> Thanks for considering this patch series, I look forward to your
> feedback.
>
> Best,
> Samuel Rufinatscha
>
> ## Changes from v1:
>
> [PATCH 1/2] fix #6939: support providers returning 204 for nonce
> requests
> * Introduced `http_success` module to contain the http success codes
> * Replaced `Vec<u16>` with `&[u16]` for expected codes to avoid
> allocations.
> * Clarified PVE's Perl ACME client behaviour in the commit message.
>
> [PATCH 2/2] acme: accept HTTP 204 from newNonce endpoint
> * Integrated the `http_success` module, replacing `Vec<u16>` with `&[u16]`
> * Clarified PVE's Perl ACME client behaviour in the commit message.
>
> ## Changes from v2:
>
> [PATCH 1/2] fix #6939: support providers returning 204 for nonce
> requests
> * Rename `http_success` module to `http_status`
>
> [PATCH 2/2] acme: accept HTTP 204 from newNonce endpoint
> * Replace `http_success` usage
>
> [1] Bugzilla report #6939:
> [https://bugzilla.proxmox.com/show_bug.cgi?id=6939](https://bugzilla.proxmox.com/show_bug.cgi?id=6939)
> [2] RFC 8555 (ACME):
> [https://datatracker.ietf.org/doc/html/rfc8555/#section-7.2](https://datatracker.ietf.org/doc/html/rfc8555/#section-7.2)
> [3] PVE’s Perl ACME client (allows 2xx codes for nonce requests):
> [https://git.proxmox.com/?p=proxmox-acme.git;a=blob;f=src/PVE/ACME.pm;h=f1e9bb7d316e3cea1e376c610b0479119217aecc;hb=HEAD#l597](https://git.proxmox.com/?p=proxmox-acme.git;a=blob;f=src/PVE/ACME.pm;h=f1e9bb7d316e3cea1e376c610b0479119217aecc;hb=HEAD#l597)
> [4] Pebble ACME server:
> [https://github.com/letsencrypt/pebble](https://github.com/letsencrypt/pebble)
> [5] PVE’s Perl ACME client (performs GET request):
> [https://git.proxmox.com/?p=proxmox-acme.git;a=blob;f=src/PVE/ACME.pm;h=f1e9bb7d316e3cea1e376c610b0479119217aecc;hb=HEAD#l219](https://git.proxmox.com/?p=proxmox-acme.git;a=blob;f=src/PVE/ACME.pm;h=f1e9bb7d316e3cea1e376c610b0479119217aecc;hb=HEAD#l219)
>
> proxmox:
>
> Samuel Rufinatscha (1):
> fix #6939: acme: support servers returning 204 for nonce requests
>
> proxmox-acme/src/account.rs | 10 +++++-----
> proxmox-acme/src/async_client.rs | 6 +++---
> proxmox-acme/src/client.rs | 2 +-
> proxmox-acme/src/lib.rs | 4 ++++
> proxmox-acme/src/request.rs | 15 ++++++++++++---
> 5 files changed, 25 insertions(+), 12 deletions(-)
>
>
> proxmox-backup:
>
> Samuel Rufinatscha (1):
> fix #6939: acme: accept HTTP 204 from newNonce endpoint
>
> src/acme/client.rs | 8 ++++----
> 1 file changed, 4 insertions(+), 4 deletions(-)
>
>
> Summary over all repositories:
> 6 files changed, 29 insertions(+), 16 deletions(-)
>
^ permalink raw reply [relevance 13%]
* [pbs-devel] [PATCH proxmox-backup v4 2/4] acme: drop local AcmeClient
2025-12-03 10:22 11% [pbs-devel] [PATCH proxmox{-backup, } v4 " Samuel Rufinatscha
2025-12-03 10:22 15% ` [pbs-devel] [PATCH proxmox-backup v4 1/4] acme: include proxmox-acme-api dependency Samuel Rufinatscha
@ 2025-12-03 10:22 6% ` Samuel Rufinatscha
2025-12-09 16:50 4% ` Max R. Carrara
2025-12-03 10:22 8% ` [pbs-devel] [PATCH proxmox-backup v4 3/4] acme: change API impls to use proxmox-acme-api handlers Samuel Rufinatscha
` (7 subsequent siblings)
9 siblings, 1 reply; 200+ results
From: Samuel Rufinatscha @ 2025-12-03 10:22 UTC (permalink / raw)
To: pbs-devel
PBS currently uses its own ACME client and API logic, while PDM uses the
factored out proxmox-acme and proxmox-acme-api crates. This duplication
risks differences in behaviour and requires ACME maintenance in two
places. This patch is part of a series to move PBS over to the shared
ACME stack.
Changes:
- Remove the local src/acme/client.rs and switch to
proxmox_acme::async_client::AcmeClient where needed.
- Use proxmox_acme_api::load_client_with_account instead of the custom
AcmeClient::load() function
- Replace the local do_register() logic with
proxmox_acme_api::register_account, which also ensures accounts are
persisted
- Replace the local AcmeAccountName type with
proxmox_acme_api::AcmeAccountName, as required by register_account
Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
---
src/acme/client.rs | 691 -------------------------
src/acme/mod.rs | 3 -
src/acme/plugin.rs | 2 +-
src/api2/config/acme.rs | 50 +-
src/api2/node/certificates.rs | 2 +-
src/api2/types/acme.rs | 8 -
src/bin/proxmox_backup_manager/acme.rs | 17 +-
src/config/acme/mod.rs | 8 +-
src/config/node.rs | 9 +-
9 files changed, 36 insertions(+), 754 deletions(-)
delete mode 100644 src/acme/client.rs
diff --git a/src/acme/client.rs b/src/acme/client.rs
deleted file mode 100644
index 9fb6ad55..00000000
--- a/src/acme/client.rs
+++ /dev/null
@@ -1,691 +0,0 @@
-//! HTTP Client for the ACME protocol.
-
-use std::fs::OpenOptions;
-use std::io;
-use std::os::unix::fs::OpenOptionsExt;
-
-use anyhow::{bail, format_err};
-use bytes::Bytes;
-use http_body_util::BodyExt;
-use hyper::Request;
-use nix::sys::stat::Mode;
-use proxmox_http::Body;
-use serde::{Deserialize, Serialize};
-
-use proxmox_acme::account::AccountCreator;
-use proxmox_acme::order::{Order, OrderData};
-use proxmox_acme::types::AccountData as AcmeAccountData;
-use proxmox_acme::Request as AcmeRequest;
-use proxmox_acme::{Account, Authorization, Challenge, Directory, Error, ErrorResponse};
-use proxmox_http::client::Client;
-use proxmox_sys::fs::{replace_file, CreateOptions};
-
-use crate::api2::types::AcmeAccountName;
-use crate::config::acme::account_path;
-use crate::tools::pbs_simple_http;
-
-/// Our on-disk format inherited from PVE's proxmox-acme code.
-#[derive(Deserialize, Serialize)]
-#[serde(rename_all = "camelCase")]
-pub struct AccountData {
- /// The account's location URL.
- location: String,
-
- /// The account data.
- account: AcmeAccountData,
-
- /// The private key as PEM formatted string.
- key: String,
-
- /// ToS URL the user agreed to.
- #[serde(skip_serializing_if = "Option::is_none")]
- tos: Option<String>,
-
- #[serde(skip_serializing_if = "is_false", default)]
- debug: bool,
-
- /// The directory's URL.
- directory_url: String,
-}
-
-#[inline]
-fn is_false(b: &bool) -> bool {
- !*b
-}
-
-pub struct AcmeClient {
- directory_url: String,
- debug: bool,
- account_path: Option<String>,
- tos: Option<String>,
- account: Option<Account>,
- directory: Option<Directory>,
- nonce: Option<String>,
- http_client: Client,
-}
-
-impl AcmeClient {
- /// Create a new ACME client for a given ACME directory URL.
- pub fn new(directory_url: String) -> Self {
- Self {
- directory_url,
- debug: false,
- account_path: None,
- tos: None,
- account: None,
- directory: None,
- nonce: None,
- http_client: pbs_simple_http(None),
- }
- }
-
- /// Load an existing ACME account by name.
- pub async fn load(account_name: &AcmeAccountName) -> Result<Self, anyhow::Error> {
- let account_path = account_path(account_name.as_ref());
- let data = match tokio::fs::read(&account_path).await {
- Ok(data) => data,
- Err(err) if err.kind() == io::ErrorKind::NotFound => {
- bail!("acme account '{}' does not exist", account_name)
- }
- Err(err) => bail!(
- "failed to load acme account from '{}' - {}",
- account_path,
- err
- ),
- };
- let data: AccountData = serde_json::from_slice(&data).map_err(|err| {
- format_err!(
- "failed to parse acme account from '{}' - {}",
- account_path,
- err
- )
- })?;
-
- let account = Account::from_parts(data.location, data.key, data.account);
-
- let mut me = Self::new(data.directory_url);
- me.debug = data.debug;
- me.account_path = Some(account_path);
- me.tos = data.tos;
- me.account = Some(account);
-
- Ok(me)
- }
-
- pub async fn new_account<'a>(
- &'a mut self,
- account_name: &AcmeAccountName,
- tos_agreed: bool,
- contact: Vec<String>,
- rsa_bits: Option<u32>,
- eab_creds: Option<(String, String)>,
- ) -> Result<&'a Account, anyhow::Error> {
- self.tos = if tos_agreed {
- self.terms_of_service_url().await?.map(str::to_owned)
- } else {
- None
- };
-
- let mut account = Account::creator()
- .set_contacts(contact)
- .agree_to_tos(tos_agreed);
-
- if let Some((eab_kid, eab_hmac_key)) = eab_creds {
- account = account.set_eab_credentials(eab_kid, eab_hmac_key)?;
- }
-
- let account = if let Some(bits) = rsa_bits {
- account.generate_rsa_key(bits)?
- } else {
- account.generate_ec_key()?
- };
-
- let _ = self.register_account(account).await?;
-
- crate::config::acme::make_acme_account_dir()?;
- let account_path = account_path(account_name.as_ref());
- let file = OpenOptions::new()
- .write(true)
- .create_new(true)
- .mode(0o600)
- .open(&account_path)
- .map_err(|err| format_err!("failed to open {:?} for writing: {}", account_path, err))?;
- self.write_to(file).map_err(|err| {
- format_err!(
- "failed to write acme account to {:?}: {}",
- account_path,
- err
- )
- })?;
- self.account_path = Some(account_path);
-
- // unwrap: Setting `self.account` is literally this function's job, we just can't keep
- // the borrow from from `self.register_account()` active due to clashes.
- Ok(self.account.as_ref().unwrap())
- }
-
- fn save(&self) -> Result<(), anyhow::Error> {
- let mut data = Vec::<u8>::new();
- self.write_to(&mut data)?;
- let account_path = self.account_path.as_ref().ok_or_else(|| {
- format_err!("no account path set, cannot save updated account information")
- })?;
- crate::config::acme::make_acme_account_dir()?;
- replace_file(
- account_path,
- &data,
- CreateOptions::new()
- .perm(Mode::from_bits_truncate(0o600))
- .owner(nix::unistd::ROOT)
- .group(nix::unistd::Gid::from_raw(0)),
- true,
- )
- }
-
- /// Shortcut to `account().ok_or_else(...).key_authorization()`.
- pub fn key_authorization(&self, token: &str) -> Result<String, anyhow::Error> {
- Ok(Self::need_account(&self.account)?.key_authorization(token)?)
- }
-
- /// Shortcut to `account().ok_or_else(...).dns_01_txt_value()`.
- /// the key authorization value.
- pub fn dns_01_txt_value(&self, token: &str) -> Result<String, anyhow::Error> {
- Ok(Self::need_account(&self.account)?.dns_01_txt_value(token)?)
- }
-
- async fn register_account(
- &mut self,
- account: AccountCreator,
- ) -> Result<&Account, anyhow::Error> {
- let mut retry = retry();
- let mut response = loop {
- retry.tick()?;
-
- let (directory, nonce) = Self::get_dir_nonce(
- &mut self.http_client,
- &self.directory_url,
- &mut self.directory,
- &mut self.nonce,
- )
- .await?;
- let request = account.request(directory, nonce)?;
- match self.run_request(request).await {
- Ok(response) => break response,
- Err(err) if err.is_bad_nonce() => continue,
- Err(err) => return Err(err.into()),
- }
- };
-
- let account = account.response(response.location_required()?, &response.body)?;
-
- self.account = Some(account);
- Ok(self.account.as_ref().unwrap())
- }
-
- pub async fn update_account<T: Serialize>(
- &mut self,
- data: &T,
- ) -> Result<&Account, anyhow::Error> {
- let account = Self::need_account(&self.account)?;
-
- let mut retry = retry();
- let response = loop {
- retry.tick()?;
-
- let (_directory, nonce) = Self::get_dir_nonce(
- &mut self.http_client,
- &self.directory_url,
- &mut self.directory,
- &mut self.nonce,
- )
- .await?;
-
- let request = account.post_request(&account.location, nonce, data)?;
- match Self::execute(&mut self.http_client, request, &mut self.nonce).await {
- Ok(response) => break response,
- Err(err) if err.is_bad_nonce() => continue,
- Err(err) => return Err(err.into()),
- }
- };
-
- // unwrap: we've been keeping an immutable reference to it from the top of the method
- let _ = account;
- self.account.as_mut().unwrap().data = response.json()?;
- self.save()?;
- Ok(self.account.as_ref().unwrap())
- }
-
- pub async fn new_order<I>(&mut self, domains: I) -> Result<Order, anyhow::Error>
- where
- I: IntoIterator<Item = String>,
- {
- let account = Self::need_account(&self.account)?;
-
- let order = domains
- .into_iter()
- .fold(OrderData::new(), |order, domain| order.domain(domain));
-
- let mut retry = retry();
- loop {
- retry.tick()?;
-
- let (directory, nonce) = Self::get_dir_nonce(
- &mut self.http_client,
- &self.directory_url,
- &mut self.directory,
- &mut self.nonce,
- )
- .await?;
-
- let mut new_order = account.new_order(&order, directory, nonce)?;
- let mut response = match Self::execute(
- &mut self.http_client,
- new_order.request.take().unwrap(),
- &mut self.nonce,
- )
- .await
- {
- Ok(response) => response,
- Err(err) if err.is_bad_nonce() => continue,
- Err(err) => return Err(err.into()),
- };
-
- return Ok(
- new_order.response(response.location_required()?, response.bytes().as_ref())?
- );
- }
- }
-
- /// Low level "POST-as-GET" request.
- async fn post_as_get(&mut self, url: &str) -> Result<AcmeResponse, anyhow::Error> {
- let account = Self::need_account(&self.account)?;
-
- let mut retry = retry();
- loop {
- retry.tick()?;
-
- let (_directory, nonce) = Self::get_dir_nonce(
- &mut self.http_client,
- &self.directory_url,
- &mut self.directory,
- &mut self.nonce,
- )
- .await?;
-
- let request = account.get_request(url, nonce)?;
- match Self::execute(&mut self.http_client, request, &mut self.nonce).await {
- Ok(response) => return Ok(response),
- Err(err) if err.is_bad_nonce() => continue,
- Err(err) => return Err(err.into()),
- }
- }
- }
-
- /// Low level POST request.
- async fn post<T: Serialize>(
- &mut self,
- url: &str,
- data: &T,
- ) -> Result<AcmeResponse, anyhow::Error> {
- let account = Self::need_account(&self.account)?;
-
- let mut retry = retry();
- loop {
- retry.tick()?;
-
- let (_directory, nonce) = Self::get_dir_nonce(
- &mut self.http_client,
- &self.directory_url,
- &mut self.directory,
- &mut self.nonce,
- )
- .await?;
-
- let request = account.post_request(url, nonce, data)?;
- match Self::execute(&mut self.http_client, request, &mut self.nonce).await {
- Ok(response) => return Ok(response),
- Err(err) if err.is_bad_nonce() => continue,
- Err(err) => return Err(err.into()),
- }
- }
- }
-
- /// Request challenge validation. Afterwards, the challenge should be polled.
- pub async fn request_challenge_validation(
- &mut self,
- url: &str,
- ) -> Result<Challenge, anyhow::Error> {
- Ok(self
- .post(url, &serde_json::Value::Object(Default::default()))
- .await?
- .json()?)
- }
-
- /// Assuming the provided URL is an 'Authorization' URL, get and deserialize it.
- pub async fn get_authorization(&mut self, url: &str) -> Result<Authorization, anyhow::Error> {
- Ok(self.post_as_get(url).await?.json()?)
- }
-
- /// Assuming the provided URL is an 'Order' URL, get and deserialize it.
- pub async fn get_order(&mut self, url: &str) -> Result<OrderData, anyhow::Error> {
- Ok(self.post_as_get(url).await?.json()?)
- }
-
- /// Finalize an Order via its `finalize` URL property and the DER encoded CSR.
- pub async fn finalize(&mut self, url: &str, csr: &[u8]) -> Result<(), anyhow::Error> {
- let csr = proxmox_base64::url::encode_no_pad(csr);
- let data = serde_json::json!({ "csr": csr });
- self.post(url, &data).await?;
- Ok(())
- }
-
- /// Download a certificate via its 'certificate' URL property.
- ///
- /// The certificate will be a PEM certificate chain.
- pub async fn get_certificate(&mut self, url: &str) -> Result<Bytes, anyhow::Error> {
- Ok(self.post_as_get(url).await?.body)
- }
-
- /// Revoke an existing certificate (PEM or DER formatted).
- pub async fn revoke_certificate(
- &mut self,
- certificate: &[u8],
- reason: Option<u32>,
- ) -> Result<(), anyhow::Error> {
- // TODO: This can also work without an account.
- let account = Self::need_account(&self.account)?;
-
- let revocation = account.revoke_certificate(certificate, reason)?;
-
- let mut retry = retry();
- loop {
- retry.tick()?;
-
- let (directory, nonce) = Self::get_dir_nonce(
- &mut self.http_client,
- &self.directory_url,
- &mut self.directory,
- &mut self.nonce,
- )
- .await?;
-
- let request = revocation.request(directory, nonce)?;
- match Self::execute(&mut self.http_client, request, &mut self.nonce).await {
- Ok(_response) => return Ok(()),
- Err(err) if err.is_bad_nonce() => continue,
- Err(err) => return Err(err.into()),
- }
- }
- }
-
- fn need_account(account: &Option<Account>) -> Result<&Account, anyhow::Error> {
- account
- .as_ref()
- .ok_or_else(|| format_err!("cannot use client without an account"))
- }
-
- pub(crate) fn account(&self) -> Result<&Account, anyhow::Error> {
- Self::need_account(&self.account)
- }
-
- pub fn tos(&self) -> Option<&str> {
- self.tos.as_deref()
- }
-
- pub fn directory_url(&self) -> &str {
- &self.directory_url
- }
-
- fn to_account_data(&self) -> Result<AccountData, anyhow::Error> {
- let account = self.account()?;
-
- Ok(AccountData {
- location: account.location.clone(),
- key: account.private_key.clone(),
- account: AcmeAccountData {
- only_return_existing: false, // don't actually write this out in case it's set
- ..account.data.clone()
- },
- tos: self.tos.clone(),
- debug: self.debug,
- directory_url: self.directory_url.clone(),
- })
- }
-
- fn write_to<T: io::Write>(&self, out: T) -> Result<(), anyhow::Error> {
- let data = self.to_account_data()?;
-
- Ok(serde_json::to_writer_pretty(out, &data)?)
- }
-}
-
-struct AcmeResponse {
- body: Bytes,
- location: Option<String>,
- got_nonce: bool,
-}
-
-impl AcmeResponse {
- /// Convenience helper to assert that a location header was part of the response.
- fn location_required(&mut self) -> Result<String, anyhow::Error> {
- self.location
- .take()
- .ok_or_else(|| format_err!("missing Location header"))
- }
-
- /// Convenience shortcut to perform json deserialization of the returned body.
- fn json<T: for<'a> Deserialize<'a>>(&self) -> Result<T, Error> {
- Ok(serde_json::from_slice(&self.body)?)
- }
-
- /// Convenience shortcut to get the body as bytes.
- fn bytes(&self) -> &[u8] {
- &self.body
- }
-}
-
-impl AcmeClient {
- /// Non-self-borrowing run_request version for borrow workarounds.
- async fn execute(
- http_client: &mut Client,
- request: AcmeRequest,
- nonce: &mut Option<String>,
- ) -> Result<AcmeResponse, Error> {
- let req_builder = Request::builder().method(request.method).uri(&request.url);
-
- let http_request = if !request.content_type.is_empty() {
- req_builder
- .header("Content-Type", request.content_type)
- .header("Content-Length", request.body.len())
- .body(request.body.into())
- } else {
- req_builder.body(Body::empty())
- }
- .map_err(|err| Error::Custom(format!("failed to create http request: {err}")))?;
-
- let response = http_client
- .request(http_request)
- .await
- .map_err(|err| Error::Custom(err.to_string()))?;
- let (parts, body) = response.into_parts();
-
- let status = parts.status.as_u16();
- let body = body
- .collect()
- .await
- .map_err(|err| Error::Custom(format!("failed to retrieve response body: {err}")))?
- .to_bytes();
-
- let got_nonce = if let Some(new_nonce) = parts.headers.get(proxmox_acme::REPLAY_NONCE) {
- let new_nonce = new_nonce.to_str().map_err(|err| {
- Error::Client(format!(
- "received invalid replay-nonce header from ACME server: {err}"
- ))
- })?;
- *nonce = Some(new_nonce.to_owned());
- true
- } else {
- false
- };
-
- if parts.status.is_success() {
- if status != request.expected {
- return Err(Error::InvalidApi(format!(
- "ACME server responded with unexpected status code: {:?}",
- parts.status
- )));
- }
-
- let location = parts
- .headers
- .get("Location")
- .map(|header| {
- header.to_str().map(str::to_owned).map_err(|err| {
- Error::Client(format!(
- "received invalid location header from ACME server: {err}"
- ))
- })
- })
- .transpose()?;
-
- return Ok(AcmeResponse {
- body,
- location,
- got_nonce,
- });
- }
-
- let error: ErrorResponse = serde_json::from_slice(&body).map_err(|err| {
- Error::Client(format!(
- "error status with improper error ACME response: {err}"
- ))
- })?;
-
- if error.ty == proxmox_acme::error::BAD_NONCE {
- if !got_nonce {
- return Err(Error::InvalidApi(
- "badNonce without a new Replay-Nonce header".to_string(),
- ));
- }
- return Err(Error::BadNonce);
- }
-
- Err(Error::Api(error))
- }
-
- /// Low-level API to run an n API request. This automatically updates the current nonce!
- async fn run_request(&mut self, request: AcmeRequest) -> Result<AcmeResponse, Error> {
- Self::execute(&mut self.http_client, request, &mut self.nonce).await
- }
-
- pub async fn directory(&mut self) -> Result<&Directory, Error> {
- Ok(Self::get_directory(
- &mut self.http_client,
- &self.directory_url,
- &mut self.directory,
- &mut self.nonce,
- )
- .await?
- .0)
- }
-
- async fn get_directory<'a, 'b>(
- http_client: &mut Client,
- directory_url: &str,
- directory: &'a mut Option<Directory>,
- nonce: &'b mut Option<String>,
- ) -> Result<(&'a Directory, Option<&'b str>), Error> {
- if let Some(d) = directory {
- return Ok((d, nonce.as_deref()));
- }
-
- let response = Self::execute(
- http_client,
- AcmeRequest {
- url: directory_url.to_string(),
- method: "GET",
- content_type: "",
- body: String::new(),
- expected: 200,
- },
- nonce,
- )
- .await?;
-
- *directory = Some(Directory::from_parts(
- directory_url.to_string(),
- response.json()?,
- ));
-
- Ok((directory.as_mut().unwrap(), nonce.as_deref()))
- }
-
- /// Like `get_directory`, but if the directory provides no nonce, also performs a `HEAD`
- /// request on the new nonce URL.
- async fn get_dir_nonce<'a, 'b>(
- http_client: &mut Client,
- directory_url: &str,
- directory: &'a mut Option<Directory>,
- nonce: &'b mut Option<String>,
- ) -> Result<(&'a Directory, &'b str), Error> {
- // this let construct is a lifetime workaround:
- let _ = Self::get_directory(http_client, directory_url, directory, nonce).await?;
- let dir = directory.as_ref().unwrap(); // the above fails if it couldn't fill this option
- if nonce.is_none() {
- // this is also a lifetime issue...
- let _ = Self::get_nonce(http_client, nonce, dir.new_nonce_url()).await?;
- };
- Ok((dir, nonce.as_deref().unwrap()))
- }
-
- pub async fn terms_of_service_url(&mut self) -> Result<Option<&str>, Error> {
- Ok(self.directory().await?.terms_of_service_url())
- }
-
- async fn get_nonce<'a>(
- http_client: &mut Client,
- nonce: &'a mut Option<String>,
- new_nonce_url: &str,
- ) -> Result<&'a str, Error> {
- let response = Self::execute(
- http_client,
- AcmeRequest {
- url: new_nonce_url.to_owned(),
- method: "HEAD",
- content_type: "",
- body: String::new(),
- expected: 200,
- },
- nonce,
- )
- .await?;
-
- if !response.got_nonce {
- return Err(Error::InvalidApi(
- "no new nonce received from new nonce URL".to_string(),
- ));
- }
-
- nonce
- .as_deref()
- .ok_or_else(|| Error::Client("failed to update nonce".to_string()))
- }
-}
-
-/// bad nonce retry count helper
-struct Retry(usize);
-
-const fn retry() -> Retry {
- Retry(0)
-}
-
-impl Retry {
- fn tick(&mut self) -> Result<(), Error> {
- if self.0 >= 3 {
- Err(Error::Client("kept getting a badNonce error!".to_string()))
- } else {
- self.0 += 1;
- Ok(())
- }
- }
-}
diff --git a/src/acme/mod.rs b/src/acme/mod.rs
index bf61811c..cc561f9a 100644
--- a/src/acme/mod.rs
+++ b/src/acme/mod.rs
@@ -1,5 +1,2 @@
-mod client;
-pub use client::AcmeClient;
-
pub(crate) mod plugin;
pub(crate) use plugin::get_acme_plugin;
diff --git a/src/acme/plugin.rs b/src/acme/plugin.rs
index f756e9b5..5bc09e1f 100644
--- a/src/acme/plugin.rs
+++ b/src/acme/plugin.rs
@@ -20,8 +20,8 @@ use tokio::process::Command;
use proxmox_acme::{Authorization, Challenge};
-use crate::acme::AcmeClient;
use crate::api2::types::AcmeDomain;
+use proxmox_acme::async_client::AcmeClient;
use proxmox_rest_server::WorkerTask;
use crate::config::acme::plugin::{DnsPlugin, PluginData};
diff --git a/src/api2/config/acme.rs b/src/api2/config/acme.rs
index 35c3fb77..02f88e2e 100644
--- a/src/api2/config/acme.rs
+++ b/src/api2/config/acme.rs
@@ -16,15 +16,15 @@ use proxmox_router::{
use proxmox_schema::{api, param_bail};
use proxmox_acme::types::AccountData as AcmeAccountData;
-use proxmox_acme::Account;
use pbs_api_types::{Authid, PRIV_SYS_MODIFY};
-use crate::acme::AcmeClient;
-use crate::api2::types::{AcmeAccountName, AcmeChallengeSchema, KnownAcmeDirectory};
+use crate::api2::types::{AcmeChallengeSchema, KnownAcmeDirectory};
use crate::config::acme::plugin::{
self, DnsPlugin, DnsPluginCore, DnsPluginCoreUpdater, PLUGIN_ID_SCHEMA,
};
+use proxmox_acme::async_client::AcmeClient;
+use proxmox_acme_api::AcmeAccountName;
use proxmox_rest_server::WorkerTask;
pub(crate) const ROUTER: Router = Router::new()
@@ -143,15 +143,15 @@ pub struct AccountInfo {
)]
/// Return existing ACME account information.
pub async fn get_account(name: AcmeAccountName) -> Result<AccountInfo, Error> {
- let client = AcmeClient::load(&name).await?;
- let account = client.account()?;
+ let account_info = proxmox_acme_api::get_account(name).await?;
+
Ok(AccountInfo {
- location: account.location.clone(),
- tos: client.tos().map(str::to_owned),
- directory: client.directory_url().to_owned(),
+ location: account_info.location,
+ tos: account_info.tos,
+ directory: account_info.directory,
account: AcmeAccountData {
only_return_existing: false, // don't actually write this out in case it's set
- ..account.data.clone()
+ ..account_info.account
},
})
}
@@ -240,41 +240,24 @@ fn register_account(
auth_id.to_string(),
true,
move |_worker| async move {
- let mut client = AcmeClient::new(directory);
-
info!("Registering ACME account '{}'...", &name);
- let account = do_register_account(
- &mut client,
+ let location = proxmox_acme_api::register_account(
&name,
- tos_url.is_some(),
contact,
- None,
+ tos_url,
+ Some(directory),
eab_kid.zip(eab_hmac_key),
)
.await?;
- info!("Registration successful, account URL: {}", account.location);
+ info!("Registration successful, account URL: {}", location);
Ok(())
},
)
}
-pub async fn do_register_account<'a>(
- client: &'a mut AcmeClient,
- name: &AcmeAccountName,
- agree_to_tos: bool,
- contact: String,
- rsa_bits: Option<u32>,
- eab_creds: Option<(String, String)>,
-) -> Result<&'a Account, Error> {
- let contact = account_contact_from_string(&contact);
- client
- .new_account(name, agree_to_tos, contact, rsa_bits, eab_creds)
- .await
-}
-
#[api(
input: {
properties: {
@@ -312,7 +295,10 @@ pub fn update_account(
None => json!({}),
};
- AcmeClient::load(&name).await?.update_account(&data).await?;
+ proxmox_acme_api::load_client_with_account(&name)
+ .await?
+ .update_account(&data)
+ .await?;
Ok(())
},
@@ -350,7 +336,7 @@ pub fn deactivate_account(
auth_id.to_string(),
true,
move |_worker| async move {
- match AcmeClient::load(&name)
+ match proxmox_acme_api::load_client_with_account(&name)
.await?
.update_account(&json!({"status": "deactivated"}))
.await
diff --git a/src/api2/node/certificates.rs b/src/api2/node/certificates.rs
index 61ef910e..31196715 100644
--- a/src/api2/node/certificates.rs
+++ b/src/api2/node/certificates.rs
@@ -17,10 +17,10 @@ use pbs_buildcfg::configdir;
use pbs_tools::cert;
use tracing::warn;
-use crate::acme::AcmeClient;
use crate::api2::types::AcmeDomain;
use crate::config::node::NodeConfig;
use crate::server::send_certificate_renewal_mail;
+use proxmox_acme::async_client::AcmeClient;
use proxmox_rest_server::WorkerTask;
pub const ROUTER: Router = Router::new()
diff --git a/src/api2/types/acme.rs b/src/api2/types/acme.rs
index 210ebdbc..7c9063c0 100644
--- a/src/api2/types/acme.rs
+++ b/src/api2/types/acme.rs
@@ -60,14 +60,6 @@ pub struct KnownAcmeDirectory {
pub url: &'static str,
}
-proxmox_schema::api_string_type! {
- #[api(format: &PROXMOX_SAFE_ID_FORMAT)]
- /// ACME account name.
- #[derive(Clone, Eq, PartialEq, Hash, Deserialize, Serialize)]
- #[serde(transparent)]
- pub struct AcmeAccountName(String);
-}
-
#[api(
properties: {
schema: {
diff --git a/src/bin/proxmox_backup_manager/acme.rs b/src/bin/proxmox_backup_manager/acme.rs
index 0f0eafea..bb987b26 100644
--- a/src/bin/proxmox_backup_manager/acme.rs
+++ b/src/bin/proxmox_backup_manager/acme.rs
@@ -7,9 +7,9 @@ use proxmox_router::{cli::*, ApiHandler, RpcEnvironment};
use proxmox_schema::api;
use proxmox_sys::fs::file_get_contents;
-use proxmox_backup::acme::AcmeClient;
+use proxmox_acme::async_client::AcmeClient;
+use proxmox_acme_api::AcmeAccountName;
use proxmox_backup::api2;
-use proxmox_backup::api2::types::AcmeAccountName;
use proxmox_backup::config::acme::plugin::DnsPluginCore;
use proxmox_backup::config::acme::KNOWN_ACME_DIRECTORIES;
@@ -188,17 +188,20 @@ async fn register_account(
println!("Attempting to register account with {directory_url:?}...");
- let account = api2::config::acme::do_register_account(
- &mut client,
+ let tos_agreed = tos_agreed
+ .then(|| directory.terms_of_service_url().map(str::to_owned))
+ .flatten();
+
+ let location = proxmox_acme_api::register_account(
&name,
- tos_agreed,
contact,
- None,
+ tos_agreed,
+ Some(directory_url),
eab_creds,
)
.await?;
- println!("Registration successful, account URL: {}", account.location);
+ println!("Registration successful, account URL: {}", location);
Ok(())
}
diff --git a/src/config/acme/mod.rs b/src/config/acme/mod.rs
index 274a23fd..d31b2bc9 100644
--- a/src/config/acme/mod.rs
+++ b/src/config/acme/mod.rs
@@ -10,7 +10,8 @@ use proxmox_sys::fs::{file_read_string, CreateOptions};
use pbs_api_types::PROXMOX_SAFE_ID_REGEX;
-use crate::api2::types::{AcmeAccountName, AcmeChallengeSchema, KnownAcmeDirectory};
+use crate::api2::types::{AcmeChallengeSchema, KnownAcmeDirectory};
+use proxmox_acme_api::AcmeAccountName;
pub(crate) const ACME_DIR: &str = pbs_buildcfg::configdir!("/acme");
pub(crate) const ACME_ACCOUNT_DIR: &str = pbs_buildcfg::configdir!("/acme/accounts");
@@ -35,11 +36,6 @@ pub(crate) fn make_acme_dir() -> Result<(), Error> {
create_acme_subdir(ACME_DIR)
}
-pub(crate) fn make_acme_account_dir() -> Result<(), Error> {
- make_acme_dir()?;
- create_acme_subdir(ACME_ACCOUNT_DIR)
-}
-
pub const KNOWN_ACME_DIRECTORIES: &[KnownAcmeDirectory] = &[
KnownAcmeDirectory {
name: "Let's Encrypt V2",
diff --git a/src/config/node.rs b/src/config/node.rs
index d2d6e383..d2a17a49 100644
--- a/src/config/node.rs
+++ b/src/config/node.rs
@@ -16,10 +16,9 @@ use pbs_api_types::{
use pbs_buildcfg::configdir;
use pbs_config::{open_backup_lockfile, BackupLockGuard};
-use crate::acme::AcmeClient;
-use crate::api2::types::{
- AcmeAccountName, AcmeDomain, ACME_DOMAIN_PROPERTY_SCHEMA, HTTP_PROXY_SCHEMA,
-};
+use crate::api2::types::{AcmeDomain, ACME_DOMAIN_PROPERTY_SCHEMA, HTTP_PROXY_SCHEMA};
+use proxmox_acme::async_client::AcmeClient;
+use proxmox_acme_api::AcmeAccountName;
const CONF_FILE: &str = configdir!("/node.cfg");
const LOCK_FILE: &str = configdir!("/.node.lck");
@@ -249,7 +248,7 @@ impl NodeConfig {
} else {
AcmeAccountName::from_string("default".to_string())? // should really not happen
};
- AcmeClient::load(&account).await
+ proxmox_acme_api::load_client_with_account(&account).await
}
pub fn acme_domains(&'_ self) -> AcmeDomainIter<'_> {
--
2.47.3
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
^ permalink raw reply [relevance 6%]
* [pbs-devel] [PATCH proxmox-backup v4 3/4] acme: change API impls to use proxmox-acme-api handlers
2025-12-03 10:22 11% [pbs-devel] [PATCH proxmox{-backup, } v4 " Samuel Rufinatscha
2025-12-03 10:22 15% ` [pbs-devel] [PATCH proxmox-backup v4 1/4] acme: include proxmox-acme-api dependency Samuel Rufinatscha
2025-12-03 10:22 6% ` [pbs-devel] [PATCH proxmox-backup v4 2/4] acme: drop local AcmeClient Samuel Rufinatscha
@ 2025-12-03 10:22 8% ` Samuel Rufinatscha
2025-12-09 16:50 5% ` Max R. Carrara
2025-12-03 10:22 7% ` [pbs-devel] [PATCH proxmox-backup v4 4/4] acme: certificate ordering through proxmox-acme-api Samuel Rufinatscha
` (6 subsequent siblings)
9 siblings, 1 reply; 200+ results
From: Samuel Rufinatscha @ 2025-12-03 10:22 UTC (permalink / raw)
To: pbs-devel
PBS currently uses its own ACME client and API logic, while PDM uses the
factored-out proxmox-acme and proxmox-acme-api crates. This duplication
risks differences in behaviour and requires ACME maintenance in two
places. This patch is part of a series to move PBS over to the shared
ACME stack.
Changes:
- Replace api2/config/acme.rs API logic with proxmox-acme-api handlers.
- Drop local caching and helper types that duplicate proxmox-acme-api.
Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
---
src/api2/config/acme.rs | 385 ++-----------------------
src/api2/types/acme.rs | 16 -
src/bin/proxmox_backup_manager/acme.rs | 6 +-
src/config/acme/mod.rs | 44 +--
4 files changed, 35 insertions(+), 416 deletions(-)
diff --git a/src/api2/config/acme.rs b/src/api2/config/acme.rs
index 02f88e2e..a112c8ee 100644
--- a/src/api2/config/acme.rs
+++ b/src/api2/config/acme.rs
@@ -1,31 +1,17 @@
-use std::fs;
-use std::ops::ControlFlow;
-use std::path::Path;
-use std::sync::{Arc, LazyLock, Mutex};
-use std::time::SystemTime;
-
-use anyhow::{bail, format_err, Error};
-use hex::FromHex;
-use serde::{Deserialize, Serialize};
-use serde_json::{json, Value};
-use tracing::{info, warn};
-
-use proxmox_router::{
- http_bail, list_subdirs_api_method, Permission, Router, RpcEnvironment, SubdirMap,
-};
-use proxmox_schema::{api, param_bail};
-
-use proxmox_acme::types::AccountData as AcmeAccountData;
-
+use anyhow::Error;
use pbs_api_types::{Authid, PRIV_SYS_MODIFY};
-
-use crate::api2::types::{AcmeChallengeSchema, KnownAcmeDirectory};
-use crate::config::acme::plugin::{
- self, DnsPlugin, DnsPluginCore, DnsPluginCoreUpdater, PLUGIN_ID_SCHEMA,
+use proxmox_acme_api::{
+ AccountEntry, AccountInfo, AcmeAccountName, AcmeChallengeSchema, ChallengeSchemaWrapper,
+ DeletablePluginProperty, DnsPluginCore, DnsPluginCoreUpdater, KnownAcmeDirectory, PluginConfig,
+ DEFAULT_ACME_DIRECTORY_ENTRY, PLUGIN_ID_SCHEMA,
};
-use proxmox_acme::async_client::AcmeClient;
-use proxmox_acme_api::AcmeAccountName;
+use proxmox_config_digest::ConfigDigest;
use proxmox_rest_server::WorkerTask;
+use proxmox_router::{
+ http_bail, list_subdirs_api_method, Permission, Router, RpcEnvironment, SubdirMap,
+};
+use proxmox_schema::api;
+use tracing::info;
pub(crate) const ROUTER: Router = Router::new()
.get(&list_subdirs_api_method!(SUBDIRS))
@@ -67,19 +53,6 @@ const PLUGIN_ITEM_ROUTER: Router = Router::new()
.put(&API_METHOD_UPDATE_PLUGIN)
.delete(&API_METHOD_DELETE_PLUGIN);
-#[api(
- properties: {
- name: { type: AcmeAccountName },
- },
-)]
-/// An ACME Account entry.
-///
-/// Currently only contains a 'name' property.
-#[derive(Serialize)]
-pub struct AccountEntry {
- name: AcmeAccountName,
-}
-
#[api(
access: {
permission: &Permission::Privilege(&["system", "certificates"], PRIV_SYS_MODIFY, false),
@@ -93,40 +66,7 @@ pub struct AccountEntry {
)]
/// List ACME accounts.
pub fn list_accounts() -> Result<Vec<AccountEntry>, Error> {
- let mut entries = Vec::new();
- crate::config::acme::foreach_acme_account(|name| {
- entries.push(AccountEntry { name });
- ControlFlow::Continue(())
- })?;
- Ok(entries)
-}
-
-#[api(
- properties: {
- account: { type: Object, properties: {}, additional_properties: true },
- tos: {
- type: String,
- optional: true,
- },
- },
-)]
-/// ACME Account information.
-///
-/// This is what we return via the API.
-#[derive(Serialize)]
-pub struct AccountInfo {
- /// Raw account data.
- account: AcmeAccountData,
-
- /// The ACME directory URL the account was created at.
- directory: String,
-
- /// The account's own URL within the ACME directory.
- location: String,
-
- /// The ToS URL, if the user agreed to one.
- #[serde(skip_serializing_if = "Option::is_none")]
- tos: Option<String>,
+ proxmox_acme_api::list_accounts()
}
#[api(
@@ -143,23 +83,7 @@ pub struct AccountInfo {
)]
/// Return existing ACME account information.
pub async fn get_account(name: AcmeAccountName) -> Result<AccountInfo, Error> {
- let account_info = proxmox_acme_api::get_account(name).await?;
-
- Ok(AccountInfo {
- location: account_info.location,
- tos: account_info.tos,
- directory: account_info.directory,
- account: AcmeAccountData {
- only_return_existing: false, // don't actually write this out in case it's set
- ..account_info.account
- },
- })
-}
-
-fn account_contact_from_string(s: &str) -> Vec<String> {
- s.split(&[' ', ';', ',', '\0'][..])
- .map(|s| format!("mailto:{s}"))
- .collect()
+ proxmox_acme_api::get_account(name).await
}
#[api(
@@ -224,15 +148,11 @@ fn register_account(
);
}
- if Path::new(&crate::config::acme::account_path(&name)).exists() {
+ if std::path::Path::new(&proxmox_acme_api::account_config_filename(&name)).exists() {
http_bail!(BAD_REQUEST, "account {} already exists", name);
}
- let directory = directory.unwrap_or_else(|| {
- crate::config::acme::DEFAULT_ACME_DIRECTORY_ENTRY
- .url
- .to_owned()
- });
+ let directory = directory.unwrap_or_else(|| DEFAULT_ACME_DIRECTORY_ENTRY.url.to_string());
WorkerTask::spawn(
"acme-register",
@@ -288,17 +208,7 @@ pub fn update_account(
auth_id.to_string(),
true,
move |_worker| async move {
- let data = match contact {
- Some(data) => json!({
- "contact": account_contact_from_string(&data),
- }),
- None => json!({}),
- };
-
- proxmox_acme_api::load_client_with_account(&name)
- .await?
- .update_account(&data)
- .await?;
+ proxmox_acme_api::update_account(&name, contact).await?;
Ok(())
},
@@ -336,18 +246,8 @@ pub fn deactivate_account(
auth_id.to_string(),
true,
move |_worker| async move {
- match proxmox_acme_api::load_client_with_account(&name)
- .await?
- .update_account(&json!({"status": "deactivated"}))
- .await
- {
- Ok(_account) => (),
- Err(err) if !force => return Err(err),
- Err(err) => {
- warn!("error deactivating account {name}, proceeding anyway - {err}");
- }
- }
- crate::config::acme::mark_account_deactivated(&name)?;
+ proxmox_acme_api::deactivate_account(&name, force).await?;
+
Ok(())
},
)
@@ -374,15 +274,7 @@ pub fn deactivate_account(
)]
/// Get the Terms of Service URL for an ACME directory.
async fn get_tos(directory: Option<String>) -> Result<Option<String>, Error> {
- let directory = directory.unwrap_or_else(|| {
- crate::config::acme::DEFAULT_ACME_DIRECTORY_ENTRY
- .url
- .to_owned()
- });
- Ok(AcmeClient::new(directory)
- .terms_of_service_url()
- .await?
- .map(str::to_owned))
+ proxmox_acme_api::get_tos(directory).await
}
#[api(
@@ -397,52 +289,7 @@ async fn get_tos(directory: Option<String>) -> Result<Option<String>, Error> {
)]
/// Get named known ACME directory endpoints.
fn get_directories() -> Result<&'static [KnownAcmeDirectory], Error> {
- Ok(crate::config::acme::KNOWN_ACME_DIRECTORIES)
-}
-
-/// Wrapper for efficient Arc use when returning the ACME challenge-plugin schema for serializing
-struct ChallengeSchemaWrapper {
- inner: Arc<Vec<AcmeChallengeSchema>>,
-}
-
-impl Serialize for ChallengeSchemaWrapper {
- fn serialize<S>(&self, serializer: S) -> Result<S::Ok, S::Error>
- where
- S: serde::Serializer,
- {
- self.inner.serialize(serializer)
- }
-}
-
-struct CachedSchema {
- schema: Arc<Vec<AcmeChallengeSchema>>,
- cached_mtime: SystemTime,
-}
-
-fn get_cached_challenge_schemas() -> Result<ChallengeSchemaWrapper, Error> {
- static CACHE: LazyLock<Mutex<Option<CachedSchema>>> = LazyLock::new(|| Mutex::new(None));
-
- // the actual loading code
- let mut last = CACHE.lock().unwrap();
-
- let actual_mtime = fs::metadata(crate::config::acme::ACME_DNS_SCHEMA_FN)?.modified()?;
-
- let schema = match &*last {
- Some(CachedSchema {
- schema,
- cached_mtime,
- }) if *cached_mtime >= actual_mtime => schema.clone(),
- _ => {
- let new_schema = Arc::new(crate::config::acme::load_dns_challenge_schema()?);
- *last = Some(CachedSchema {
- schema: Arc::clone(&new_schema),
- cached_mtime: actual_mtime,
- });
- new_schema
- }
- };
-
- Ok(ChallengeSchemaWrapper { inner: schema })
+ Ok(proxmox_acme_api::KNOWN_ACME_DIRECTORIES)
}
#[api(
@@ -457,69 +304,7 @@ fn get_cached_challenge_schemas() -> Result<ChallengeSchemaWrapper, Error> {
)]
/// Get named known ACME directory endpoints.
fn get_challenge_schema() -> Result<ChallengeSchemaWrapper, Error> {
- get_cached_challenge_schemas()
-}
-
-#[api]
-#[derive(Default, Deserialize, Serialize)]
-#[serde(rename_all = "kebab-case")]
-/// The API's format is inherited from PVE/PMG:
-pub struct PluginConfig {
- /// Plugin ID.
- plugin: String,
-
- /// Plugin type.
- #[serde(rename = "type")]
- ty: String,
-
- /// DNS Api name.
- #[serde(skip_serializing_if = "Option::is_none", default)]
- api: Option<String>,
-
- /// Plugin configuration data.
- #[serde(skip_serializing_if = "Option::is_none", default)]
- data: Option<String>,
-
- /// Extra delay in seconds to wait before requesting validation.
- ///
- /// Allows to cope with long TTL of DNS records.
- #[serde(skip_serializing_if = "Option::is_none", default)]
- validation_delay: Option<u32>,
-
- /// Flag to disable the config.
- #[serde(skip_serializing_if = "Option::is_none", default)]
- disable: Option<bool>,
-}
-
-// See PMG/PVE's $modify_cfg_for_api sub
-fn modify_cfg_for_api(id: &str, ty: &str, data: &Value) -> PluginConfig {
- let mut entry = data.clone();
-
- let obj = entry.as_object_mut().unwrap();
- obj.remove("id");
- obj.insert("plugin".to_string(), Value::String(id.to_owned()));
- obj.insert("type".to_string(), Value::String(ty.to_owned()));
-
- // FIXME: This needs to go once the `Updater` is fixed.
- // None of these should be able to fail unless the user changed the files by hand, in which
- // case we leave the unmodified string in the Value for now. This will be handled with an error
- // later.
- if let Some(Value::String(ref mut data)) = obj.get_mut("data") {
- if let Ok(new) = proxmox_base64::url::decode_no_pad(&data) {
- if let Ok(utf8) = String::from_utf8(new) {
- *data = utf8;
- }
- }
- }
-
- // PVE/PMG do this explicitly for ACME plugins...
- // obj.insert("digest".to_string(), Value::String(digest.clone()));
-
- serde_json::from_value(entry).unwrap_or_else(|_| PluginConfig {
- plugin: "*Error*".to_string(),
- ty: "*Error*".to_string(),
- ..Default::default()
- })
+ proxmox_acme_api::get_cached_challenge_schemas()
}
#[api(
@@ -535,12 +320,7 @@ fn modify_cfg_for_api(id: &str, ty: &str, data: &Value) -> PluginConfig {
)]
/// List ACME challenge plugins.
pub fn list_plugins(rpcenv: &mut dyn RpcEnvironment) -> Result<Vec<PluginConfig>, Error> {
- let (plugins, digest) = plugin::config()?;
- rpcenv["digest"] = hex::encode(digest).into();
- Ok(plugins
- .iter()
- .map(|(id, (ty, data))| modify_cfg_for_api(id, ty, data))
- .collect())
+ proxmox_acme_api::list_plugins(rpcenv)
}
#[api(
@@ -557,13 +337,7 @@ pub fn list_plugins(rpcenv: &mut dyn RpcEnvironment) -> Result<Vec<PluginConfig>
)]
/// List ACME challenge plugins.
pub fn get_plugin(id: String, rpcenv: &mut dyn RpcEnvironment) -> Result<PluginConfig, Error> {
- let (plugins, digest) = plugin::config()?;
- rpcenv["digest"] = hex::encode(digest).into();
-
- match plugins.get(&id) {
- Some((ty, data)) => Ok(modify_cfg_for_api(&id, ty, data)),
- None => http_bail!(NOT_FOUND, "no such plugin"),
- }
+ proxmox_acme_api::get_plugin(id, rpcenv)
}
// Currently we only have "the" standalone plugin and DNS plugins so we can just flatten a
@@ -595,30 +369,7 @@ pub fn get_plugin(id: String, rpcenv: &mut dyn RpcEnvironment) -> Result<PluginC
)]
/// Add ACME plugin configuration.
pub fn add_plugin(r#type: String, core: DnsPluginCore, data: String) -> Result<(), Error> {
- // Currently we only support DNS plugins and the standalone plugin is "fixed":
- if r#type != "dns" {
- param_bail!("type", "invalid ACME plugin type: {:?}", r#type);
- }
-
- let data = String::from_utf8(proxmox_base64::decode(data)?)
- .map_err(|_| format_err!("data must be valid UTF-8"))?;
-
- let id = core.id.clone();
-
- let _lock = plugin::lock()?;
-
- let (mut plugins, _digest) = plugin::config()?;
- if plugins.contains_key(&id) {
- param_bail!("id", "ACME plugin ID {:?} already exists", id);
- }
-
- let plugin = serde_json::to_value(DnsPlugin { core, data })?;
-
- plugins.insert(id, r#type, plugin);
-
- plugin::save_config(&plugins)?;
-
- Ok(())
+ proxmox_acme_api::add_plugin(r#type, core, data)
}
#[api(
@@ -634,26 +385,7 @@ pub fn add_plugin(r#type: String, core: DnsPluginCore, data: String) -> Result<(
)]
/// Delete an ACME plugin configuration.
pub fn delete_plugin(id: String) -> Result<(), Error> {
- let _lock = plugin::lock()?;
-
- let (mut plugins, _digest) = plugin::config()?;
- if plugins.remove(&id).is_none() {
- http_bail!(NOT_FOUND, "no such plugin");
- }
- plugin::save_config(&plugins)?;
-
- Ok(())
-}
-
-#[api()]
-#[derive(Serialize, Deserialize)]
-#[serde(rename_all = "kebab-case")]
-/// Deletable property name
-pub enum DeletableProperty {
- /// Delete the disable property
- Disable,
- /// Delete the validation-delay property
- ValidationDelay,
+ proxmox_acme_api::delete_plugin(id)
}
#[api(
@@ -675,12 +407,12 @@ pub enum DeletableProperty {
type: Array,
optional: true,
items: {
- type: DeletableProperty,
+ type: DeletablePluginProperty,
}
},
digest: {
- description: "Digest to protect against concurrent updates",
optional: true,
+ type: ConfigDigest,
},
},
},
@@ -694,65 +426,8 @@ pub fn update_plugin(
id: String,
update: DnsPluginCoreUpdater,
data: Option<String>,
- delete: Option<Vec<DeletableProperty>>,
- digest: Option<String>,
+ delete: Option<Vec<DeletablePluginProperty>>,
+ digest: Option<ConfigDigest>,
) -> Result<(), Error> {
- let data = data
- .as_deref()
- .map(proxmox_base64::decode)
- .transpose()?
- .map(String::from_utf8)
- .transpose()
- .map_err(|_| format_err!("data must be valid UTF-8"))?;
-
- let _lock = plugin::lock()?;
-
- let (mut plugins, expected_digest) = plugin::config()?;
-
- if let Some(digest) = digest {
- let digest = <[u8; 32]>::from_hex(digest)?;
- crate::tools::detect_modified_configuration_file(&digest, &expected_digest)?;
- }
-
- match plugins.get_mut(&id) {
- Some((ty, ref mut entry)) => {
- if ty != "dns" {
- bail!("cannot update plugin of type {:?}", ty);
- }
-
- let mut plugin = DnsPlugin::deserialize(&*entry)?;
-
- if let Some(delete) = delete {
- for delete_prop in delete {
- match delete_prop {
- DeletableProperty::ValidationDelay => {
- plugin.core.validation_delay = None;
- }
- DeletableProperty::Disable => {
- plugin.core.disable = None;
- }
- }
- }
- }
- if let Some(data) = data {
- plugin.data = data;
- }
- if let Some(api) = update.api {
- plugin.core.api = api;
- }
- if update.validation_delay.is_some() {
- plugin.core.validation_delay = update.validation_delay;
- }
- if update.disable.is_some() {
- plugin.core.disable = update.disable;
- }
-
- *entry = serde_json::to_value(plugin)?;
- }
- None => http_bail!(NOT_FOUND, "no such plugin"),
- }
-
- plugin::save_config(&plugins)?;
-
- Ok(())
+ proxmox_acme_api::update_plugin(id, update, data, delete, digest)
}
diff --git a/src/api2/types/acme.rs b/src/api2/types/acme.rs
index 7c9063c0..2905b41b 100644
--- a/src/api2/types/acme.rs
+++ b/src/api2/types/acme.rs
@@ -44,22 +44,6 @@ pub const ACME_DOMAIN_PROPERTY_SCHEMA: Schema =
.format(&ApiStringFormat::PropertyString(&AcmeDomain::API_SCHEMA))
.schema();
-#[api(
- properties: {
- name: { type: String },
- url: { type: String },
- },
-)]
-/// An ACME directory endpoint with a name and URL.
-#[derive(Serialize)]
-pub struct KnownAcmeDirectory {
- /// The ACME directory's name.
- pub name: &'static str,
-
- /// The ACME directory's endpoint URL.
- pub url: &'static str,
-}
-
#[api(
properties: {
schema: {
diff --git a/src/bin/proxmox_backup_manager/acme.rs b/src/bin/proxmox_backup_manager/acme.rs
index bb987b26..e7bd67af 100644
--- a/src/bin/proxmox_backup_manager/acme.rs
+++ b/src/bin/proxmox_backup_manager/acme.rs
@@ -8,10 +8,8 @@ use proxmox_schema::api;
use proxmox_sys::fs::file_get_contents;
use proxmox_acme::async_client::AcmeClient;
-use proxmox_acme_api::AcmeAccountName;
+use proxmox_acme_api::{AcmeAccountName, DnsPluginCore, KNOWN_ACME_DIRECTORIES};
use proxmox_backup::api2;
-use proxmox_backup::config::acme::plugin::DnsPluginCore;
-use proxmox_backup::config::acme::KNOWN_ACME_DIRECTORIES;
pub fn acme_mgmt_cli() -> CommandLineInterface {
let cmd_def = CliCommandMap::new()
@@ -122,7 +120,7 @@ async fn register_account(
match input.trim().parse::<usize>() {
Ok(n) if n < KNOWN_ACME_DIRECTORIES.len() => {
- break (KNOWN_ACME_DIRECTORIES[n].url.to_owned(), false);
+ break (KNOWN_ACME_DIRECTORIES[n].url.to_string(), false);
}
Ok(n) if n == KNOWN_ACME_DIRECTORIES.len() => {
input.clear();
diff --git a/src/config/acme/mod.rs b/src/config/acme/mod.rs
index d31b2bc9..35cda50b 100644
--- a/src/config/acme/mod.rs
+++ b/src/config/acme/mod.rs
@@ -1,8 +1,7 @@
use std::collections::HashMap;
use std::ops::ControlFlow;
-use std::path::Path;
-use anyhow::{bail, format_err, Error};
+use anyhow::Error;
use serde_json::Value;
use proxmox_sys::error::SysError;
@@ -10,8 +9,8 @@ use proxmox_sys::fs::{file_read_string, CreateOptions};
use pbs_api_types::PROXMOX_SAFE_ID_REGEX;
-use crate::api2::types::{AcmeChallengeSchema, KnownAcmeDirectory};
-use proxmox_acme_api::AcmeAccountName;
+use crate::api2::types::AcmeChallengeSchema;
+use proxmox_acme_api::{AcmeAccountName, KnownAcmeDirectory, KNOWN_ACME_DIRECTORIES};
pub(crate) const ACME_DIR: &str = pbs_buildcfg::configdir!("/acme");
pub(crate) const ACME_ACCOUNT_DIR: &str = pbs_buildcfg::configdir!("/acme/accounts");
@@ -36,23 +35,8 @@ pub(crate) fn make_acme_dir() -> Result<(), Error> {
create_acme_subdir(ACME_DIR)
}
-pub const KNOWN_ACME_DIRECTORIES: &[KnownAcmeDirectory] = &[
- KnownAcmeDirectory {
- name: "Let's Encrypt V2",
- url: "https://acme-v02.api.letsencrypt.org/directory",
- },
- KnownAcmeDirectory {
- name: "Let's Encrypt V2 Staging",
- url: "https://acme-staging-v02.api.letsencrypt.org/directory",
- },
-];
-
pub const DEFAULT_ACME_DIRECTORY_ENTRY: &KnownAcmeDirectory = &KNOWN_ACME_DIRECTORIES[0];
-pub fn account_path(name: &str) -> String {
- format!("{ACME_ACCOUNT_DIR}/{name}")
-}
-
pub fn foreach_acme_account<F>(mut func: F) -> Result<(), Error>
where
F: FnMut(AcmeAccountName) -> ControlFlow<Result<(), Error>>,
@@ -83,28 +67,6 @@ where
}
}
-pub fn mark_account_deactivated(name: &str) -> Result<(), Error> {
- let from = account_path(name);
- for i in 0..100 {
- let to = account_path(&format!("_deactivated_{name}_{i}"));
- if !Path::new(&to).exists() {
- return std::fs::rename(&from, &to).map_err(|err| {
- format_err!(
- "failed to move account path {:?} to {:?} - {}",
- from,
- to,
- err
- )
- });
- }
- }
- bail!(
- "No free slot to rename deactivated account {:?}, please cleanup {:?}",
- from,
- ACME_ACCOUNT_DIR
- );
-}
-
pub fn load_dns_challenge_schema() -> Result<Vec<AcmeChallengeSchema>, Error> {
let raw = file_read_string(ACME_DNS_SCHEMA_FN)?;
let schemas: serde_json::Map<String, Value> = serde_json::from_str(&raw)?;
--
2.47.3
* [pbs-devel] [PATCH proxmox v4 4/4] fix #6939: acme: support servers returning 204 for nonce requests
2025-12-03 10:22 11% [pbs-devel] [PATCH proxmox{-backup, } v4 " Samuel Rufinatscha
` (6 preceding siblings ...)
2025-12-03 10:22 15% ` [pbs-devel] [PATCH proxmox v4 3/4] acme: introduce http_status module Samuel Rufinatscha
@ 2025-12-03 10:22 14% ` Samuel Rufinatscha
2025-12-09 16:50 5% ` [pbs-devel] [PATCH proxmox{-backup, } v4 0/8] " Max R. Carrara
2026-01-08 11:48 13% ` [pbs-devel] superseded: " Samuel Rufinatscha
9 siblings, 0 replies; 200+ results
From: Samuel Rufinatscha @ 2025-12-03 10:22 UTC (permalink / raw)
To: pbs-devel
Some ACME servers (notably custom or legacy implementations) respond
to HEAD /newNonce with a 204 No Content instead of the
RFC 8555-recommended 200 OK [1]. While this behavior is technically
off-spec, it is not illegal. This issue was reported on our bug
tracker [2].
The previous implementation treated any non-200 response as an error,
causing account registration to fail against such servers. Relax the
status-code check to accept both 200 and 204 responses (and potentially
support other 2xx codes) to improve interoperability.
Note: In comparison, PVE’s Perl ACME client performs a GET request [3]
instead of a HEAD request and accepts any 2xx success code when
retrieving the nonce [4]. This difference in behavior does not affect
functionality but is worth noting for consistency across
implementations.
[1] https://datatracker.ietf.org/doc/html/rfc8555/#section-7.2
[2] https://bugzilla.proxmox.com/show_bug.cgi?id=6939
[3] https://git.proxmox.com/?p=proxmox-acme.git;a=blob;f=src/PVE/ACME.pm;h=f1e9bb7d316e3cea1e376c610b0479119217aecc;hb=HEAD#l219
[4] https://git.proxmox.com/?p=proxmox-acme.git;a=blob;f=src/PVE/ACME.pm;h=f1e9bb7d316e3cea1e376c610b0479119217aecc;hb=HEAD#l597
Fixes: #6939
Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
---
proxmox-acme/src/account.rs | 10 +++++-----
proxmox-acme/src/async_client.rs | 6 +++---
proxmox-acme/src/client.rs | 2 +-
proxmox-acme/src/request.rs | 4 ++--
4 files changed, 11 insertions(+), 11 deletions(-)
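For illustration, the relaxed check introduced by this patch can be exercised in isolation. This is a minimal sketch with stand-in constants and a local `Request` struct; the real `Request` type is private to proxmox-acme and carries more fields:

```rust
// Stand-ins for proxmox-acme's http_status constants.
const OK: u16 = 200;
const NO_CONTENT: u16 = 204;

// Sketch of the changed field: a set of acceptable success codes
// instead of a single expected code.
struct Request {
    expected: &'static [u16],
}

// The relaxed check: membership test instead of equality.
fn check_status(request: &Request, status: u16) -> Result<(), String> {
    if request.expected.contains(&status) {
        Ok(())
    } else {
        Err(format!("unexpected status code: {status}"))
    }
}

fn main() {
    // A nonce request now accepts both 200 and 204 as success.
    let nonce_req = Request {
        expected: &[OK, NO_CONTENT],
    };
    assert!(check_status(&nonce_req, 200).is_ok());
    assert!(check_status(&nonce_req, 204).is_ok());
    assert!(check_status(&nonce_req, 500).is_err());
}
```

Other call sites keep a single-element slice (e.g. `&[OK]`), so their behavior is unchanged.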
diff --git a/proxmox-acme/src/account.rs b/proxmox-acme/src/account.rs
index 350c78d4..820b209d 100644
--- a/proxmox-acme/src/account.rs
+++ b/proxmox-acme/src/account.rs
@@ -85,7 +85,7 @@ impl Account {
method: "POST",
content_type: crate::request::JSON_CONTENT_TYPE,
body,
- expected: crate::http_status::CREATED,
+ expected: &[crate::http_status::CREATED],
};
Ok(NewOrder::new(request))
@@ -107,7 +107,7 @@ impl Account {
method: "POST",
content_type: crate::request::JSON_CONTENT_TYPE,
body,
- expected: crate::http_status::OK,
+ expected: &[crate::http_status::OK],
})
}
@@ -132,7 +132,7 @@ impl Account {
method: "POST",
content_type: crate::request::JSON_CONTENT_TYPE,
body,
- expected: crate::http_status::OK,
+ expected: &[crate::http_status::OK],
})
}
@@ -157,7 +157,7 @@ impl Account {
method: "POST",
content_type: crate::request::JSON_CONTENT_TYPE,
body,
- expected: crate::http_status::OK,
+ expected: &[crate::http_status::OK],
})
}
@@ -408,7 +408,7 @@ impl AccountCreator {
method: "POST",
content_type: crate::request::JSON_CONTENT_TYPE,
body,
- expected: crate::http_status::CREATED,
+ expected: &[crate::http_status::CREATED],
})
}
diff --git a/proxmox-acme/src/async_client.rs b/proxmox-acme/src/async_client.rs
index 043648bb..07da842c 100644
--- a/proxmox-acme/src/async_client.rs
+++ b/proxmox-acme/src/async_client.rs
@@ -420,7 +420,7 @@ impl AcmeClient {
};
if parts.status.is_success() {
- if status != request.expected {
+ if !request.expected.contains(&status) {
return Err(Error::InvalidApi(format!(
"ACME server responded with unexpected status code: {:?}",
parts.status
@@ -498,7 +498,7 @@ impl AcmeClient {
method: "GET",
content_type: "",
body: String::new(),
- expected: crate::http_status::OK,
+ expected: &[crate::http_status::OK],
},
nonce,
)
@@ -550,7 +550,7 @@ impl AcmeClient {
method: "HEAD",
content_type: "",
body: String::new(),
- expected: crate::http_status::OK,
+ expected: &[crate::http_status::OK, crate::http_status::NO_CONTENT],
},
nonce,
)
diff --git a/proxmox-acme/src/client.rs b/proxmox-acme/src/client.rs
index 5c812567..af250fb8 100644
--- a/proxmox-acme/src/client.rs
+++ b/proxmox-acme/src/client.rs
@@ -203,7 +203,7 @@ impl Inner {
let got_nonce = self.update_nonce(&mut response)?;
if response.is_success() {
- if response.status != request.expected {
+ if !request.expected.contains(&response.status) {
return Err(Error::InvalidApi(format!(
"API server responded with unexpected status code: {:?}",
response.status
diff --git a/proxmox-acme/src/request.rs b/proxmox-acme/src/request.rs
index 341ce53e..d782a7de 100644
--- a/proxmox-acme/src/request.rs
+++ b/proxmox-acme/src/request.rs
@@ -16,8 +16,8 @@ pub(crate) struct Request {
/// The body to pass along with request, or an empty string.
pub(crate) body: String,
- /// The expected status code a compliant ACME provider will return on success.
- pub(crate) expected: u16,
+ /// The set of HTTP status codes that indicate a successful response from an ACME provider.
+ pub(crate) expected: &'static [u16],
}
/// Common HTTP status codes used in ACME responses.
--
2.47.3
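The hunks above turn the single expected status code into a set of acceptable codes checked via `contains`. A minimal self-contained sketch of the resulting pattern, using simplified stand-ins for the proxmox-acme `Request` type and status constants:

```rust
// Simplified stand-ins for crate::http_status constants.
const OK: u16 = 200;
const NO_CONTENT: u16 = 204;

// Simplified stand-in for the proxmox-acme Request type after the patch.
struct Request {
    expected: &'static [u16],
}

// Mirrors the new membership check in async_client.rs / client.rs.
fn check(request: &Request, status: u16) -> Result<(), String> {
    if !request.expected.contains(&status) {
        return Err(format!("unexpected status code: {status}"));
    }
    Ok(())
}

fn main() {
    // HEAD newNonce may legitimately answer 200 or 204.
    let head_nonce = Request { expected: &[OK, NO_CONTENT] };
    assert!(check(&head_nonce, NO_CONTENT).is_ok());
    assert!(check(&head_nonce, 201).is_err());
}
```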
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
^ permalink raw reply [relevance 14%]
* [pbs-devel] [PATCH proxmox v4 1/4] acme-api: add helper to load client for an account
2025-12-03 10:22 11% [pbs-devel] [PATCH proxmox{-backup, } v4 " Samuel Rufinatscha
` (3 preceding siblings ...)
2025-12-03 10:22 7% ` [pbs-devel] [PATCH proxmox-backup v4 4/4] acme: certificate ordering through proxmox-acme-api Samuel Rufinatscha
@ 2025-12-03 10:22 17% ` Samuel Rufinatscha
2025-12-09 16:51 5% ` Max R. Carrara
2025-12-03 10:22 12% ` [pbs-devel] [PATCH proxmox v4 2/4] acme: reduce visibility of Request type Samuel Rufinatscha
` (4 subsequent siblings)
9 siblings, 1 reply; 200+ results
From: Samuel Rufinatscha @ 2025-12-03 10:22 UTC (permalink / raw)
To: pbs-devel
The PBS ACME refactoring needs a simple way to obtain an AcmeClient for
a given configured account without duplicating config wiring. This patch
adds a load_client_with_account helper in proxmox-acme-api that loads
the account and constructs a matching client, similar to PBS's previous
AcmeClient::load() function.
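As a rough sketch of the helper's shape (synchronous, simplified stand-ins for the actual async proxmox-acme-api types; only the names `load_client_with_account`, `load_account_config` and `client()` come from the patch):

```rust
// Simplified stand-in for proxmox_acme::async_client::AcmeClient.
struct AcmeClient {
    account: String,
}

// Simplified stand-in for the loaded account config data.
struct AccountData {
    name: String,
}

impl AccountData {
    // Constructs a client matching the loaded account.
    fn client(&self) -> AcmeClient {
        AcmeClient { account: self.name.clone() }
    }
}

// Stand-in for account_config::load_account_config (async in the real crate).
fn load_account_config(name: &str) -> Result<AccountData, String> {
    Ok(AccountData { name: name.to_string() })
}

// Shape of the new helper: load the account, then build a matching client.
fn load_client_with_account(name: &str) -> Result<AcmeClient, String> {
    let account_data = load_account_config(name)?;
    Ok(account_data.client())
}

fn main() {
    let client = load_client_with_account("default").unwrap();
    assert_eq!(client.account, "default");
}
```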
Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
---
proxmox-acme-api/src/account_api_impl.rs | 5 +++++
proxmox-acme-api/src/lib.rs | 3 ++-
2 files changed, 7 insertions(+), 1 deletion(-)
diff --git a/proxmox-acme-api/src/account_api_impl.rs b/proxmox-acme-api/src/account_api_impl.rs
index ef195908..ca8c8655 100644
--- a/proxmox-acme-api/src/account_api_impl.rs
+++ b/proxmox-acme-api/src/account_api_impl.rs
@@ -116,3 +116,8 @@ pub async fn update_account(name: &AcmeAccountName, contact: Option<String>) ->
Ok(())
}
+
+pub async fn load_client_with_account(account_name: &AcmeAccountName) -> Result<AcmeClient, Error> {
+ let account_data = super::account_config::load_account_config(&account_name).await?;
+ Ok(account_data.client())
+}
diff --git a/proxmox-acme-api/src/lib.rs b/proxmox-acme-api/src/lib.rs
index 623e9e23..96f88ae2 100644
--- a/proxmox-acme-api/src/lib.rs
+++ b/proxmox-acme-api/src/lib.rs
@@ -31,7 +31,8 @@ mod plugin_config;
mod account_api_impl;
#[cfg(feature = "impl")]
pub use account_api_impl::{
- deactivate_account, get_account, get_tos, list_accounts, register_account, update_account,
+ deactivate_account, get_account, get_tos, list_accounts, load_client_with_account,
+ register_account, update_account,
};
#[cfg(feature = "impl")]
--
2.47.3
^ permalink raw reply [relevance 17%]
* [pbs-devel] [PATCH proxmox-backup v4 1/4] acme: include proxmox-acme-api dependency
2025-12-03 10:22 11% [pbs-devel] [PATCH proxmox{-backup, } v4 " Samuel Rufinatscha
@ 2025-12-03 10:22 15% ` Samuel Rufinatscha
2025-12-03 10:22 6% ` [pbs-devel] [PATCH proxmox-backup v4 2/4] acme: drop local AcmeClient Samuel Rufinatscha
` (8 subsequent siblings)
9 siblings, 0 replies; 200+ results
From: Samuel Rufinatscha @ 2025-12-03 10:22 UTC (permalink / raw)
To: pbs-devel
PBS currently uses its own ACME client and API logic, while PDM uses the
factored-out proxmox-acme and proxmox-acme-api crates. This duplication
risks behavioural differences and requires ACME maintenance in two
places. This patch is part of a series to move PBS over to the shared
ACME stack.
Changes:
- Add proxmox-acme-api with the "impl" feature as a dependency.
- Initialize proxmox_acme_api in proxmox-backup-api, -manager and -proxy.
  * Registers the PBS config dir /acme as the proxmox ACME directory
Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
---
Cargo.toml | 3 +++
src/bin/proxmox-backup-api.rs | 2 ++
src/bin/proxmox-backup-manager.rs | 2 ++
src/bin/proxmox-backup-proxy.rs | 1 +
4 files changed, 8 insertions(+)
diff --git a/Cargo.toml b/Cargo.toml
index ff143932..bdaf7d85 100644
--- a/Cargo.toml
+++ b/Cargo.toml
@@ -101,6 +101,7 @@ pbs-api-types = "1.0.8"
# other proxmox crates
pathpatterns = "1"
proxmox-acme = "1"
+proxmox-acme-api = { version = "1", features = [ "impl" ] }
pxar = "1"
# PBS workspace
@@ -251,6 +252,7 @@ pbs-api-types.workspace = true
# in their respective repo
proxmox-acme.workspace = true
+proxmox-acme-api.workspace = true
pxar.workspace = true
# proxmox-backup workspace/internal crates
@@ -269,6 +271,7 @@ proxmox-rrd-api-types.workspace = true
[patch.crates-io]
#pbs-api-types = { path = "../proxmox/pbs-api-types" }
#proxmox-acme = { path = "../proxmox/proxmox-acme" }
+#proxmox-acme-api = { path = "../proxmox/proxmox-acme-api" }
#proxmox-api-macro = { path = "../proxmox/proxmox-api-macro" }
#proxmox-apt = { path = "../proxmox/proxmox-apt" }
#proxmox-apt-api-types = { path = "../proxmox/proxmox-apt-api-types" }
diff --git a/src/bin/proxmox-backup-api.rs b/src/bin/proxmox-backup-api.rs
index 417e9e97..48f10092 100644
--- a/src/bin/proxmox-backup-api.rs
+++ b/src/bin/proxmox-backup-api.rs
@@ -8,6 +8,7 @@ use hyper_util::server::graceful::GracefulShutdown;
use tokio::net::TcpListener;
use tracing::level_filters::LevelFilter;
+use pbs_buildcfg::configdir;
use proxmox_http::Body;
use proxmox_lang::try_block;
use proxmox_rest_server::{ApiConfig, RestServer};
@@ -78,6 +79,7 @@ async fn run() -> Result<(), Error> {
let mut command_sock = proxmox_daemon::command_socket::CommandSocket::new(backup_user.gid);
proxmox_product_config::init(backup_user.clone(), pbs_config::priv_user()?);
+ proxmox_acme_api::init(configdir!("/acme"), true)?;
let dir_opts = CreateOptions::new()
.owner(backup_user.uid)
diff --git a/src/bin/proxmox-backup-manager.rs b/src/bin/proxmox-backup-manager.rs
index d9f41353..0facb76c 100644
--- a/src/bin/proxmox-backup-manager.rs
+++ b/src/bin/proxmox-backup-manager.rs
@@ -18,6 +18,7 @@ use pbs_api_types::{
VERIFICATION_OUTDATED_AFTER_SCHEMA, VERIFY_JOB_READ_THREADS_SCHEMA,
VERIFY_JOB_VERIFY_THREADS_SCHEMA,
};
+use pbs_buildcfg::configdir;
use pbs_client::{display_task_log, view_task_result};
use pbs_config::sync;
use pbs_tools::json::required_string_param;
@@ -669,6 +670,7 @@ async fn run() -> Result<(), Error> {
.init()?;
proxmox_backup::server::notifications::init()?;
proxmox_product_config::init(pbs_config::backup_user()?, pbs_config::priv_user()?);
+ proxmox_acme_api::init(configdir!("/acme"), false)?;
let cmd_def = CliCommandMap::new()
.insert("acl", acl_commands())
diff --git a/src/bin/proxmox-backup-proxy.rs b/src/bin/proxmox-backup-proxy.rs
index 92a8cb3c..0bab18ec 100644
--- a/src/bin/proxmox-backup-proxy.rs
+++ b/src/bin/proxmox-backup-proxy.rs
@@ -190,6 +190,7 @@ async fn run() -> Result<(), Error> {
proxmox_backup::server::notifications::init()?;
metric_collection::init()?;
proxmox_product_config::init(pbs_config::backup_user()?, pbs_config::priv_user()?);
+ proxmox_acme_api::init(configdir!("/acme"), false)?;
let mut indexpath = PathBuf::from(pbs_buildcfg::JS_DIR);
indexpath.push("index.hbs");
--
2.47.3
^ permalink raw reply [relevance 15%]
* [pbs-devel] [PATCH proxmox-backup v4 4/4] acme: certificate ordering through proxmox-acme-api
2025-12-03 10:22 11% [pbs-devel] [PATCH proxmox{-backup, } v4 " Samuel Rufinatscha
` (2 preceding siblings ...)
2025-12-03 10:22 8% ` [pbs-devel] [PATCH proxmox-backup v4 3/4] acme: change API impls to use proxmox-acme-api handlers Samuel Rufinatscha
@ 2025-12-03 10:22 7% ` Samuel Rufinatscha
2025-12-09 16:50 5% ` Max R. Carrara
2025-12-03 10:22 17% ` [pbs-devel] [PATCH proxmox v4 1/4] acme-api: add helper to load client for an account Samuel Rufinatscha
` (5 subsequent siblings)
9 siblings, 1 reply; 200+ results
From: Samuel Rufinatscha @ 2025-12-03 10:22 UTC (permalink / raw)
To: pbs-devel
PBS currently uses its own ACME client and API logic, while PDM uses the
factored-out proxmox-acme and proxmox-acme-api crates. This duplication
risks behavioural differences and requires ACME maintenance in two
places. This patch is part of a series to move PBS over to the shared
ACME stack.
Changes:
- Replace the custom ACME order/authorization loop in node certificates
with a call to proxmox_acme_api::order_certificate.
- Build domain + config data as proxmox-acme-api types
- Remove obsolete local ACME ordering and plugin glue code.
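The domain data is normalized (ASCII-lowercased, including the optional challenge alias) before being handed to the shared ordering code, as seen in the spawn_certificate_worker hunk below. A self-contained sketch of just that normalization, with a simplified stand-in for the proxmox-acme-api AcmeDomain type:

```rust
// Simplified stand-in for proxmox_acme_api::AcmeDomain.
struct AcmeDomain {
    domain: String,
    alias: Option<String>,
}

// Lowercase each domain and its optional alias, mirroring the fold in the patch.
fn normalize(domains: Vec<AcmeDomain>) -> Vec<AcmeDomain> {
    domains
        .into_iter()
        .map(|mut d| {
            d.domain.make_ascii_lowercase();
            if let Some(alias) = &mut d.alias {
                alias.make_ascii_lowercase();
            }
            d
        })
        .collect()
}

fn main() {
    let out = normalize(vec![AcmeDomain {
        domain: "Example.COM".into(),
        alias: Some("_Acme-Challenge.Example.ORG".into()),
    }]);
    assert_eq!(out[0].domain, "example.com");
    assert_eq!(out[0].alias.as_deref(), Some("_acme-challenge.example.org"));
}
```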
Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
---
src/acme/mod.rs | 2 -
src/acme/plugin.rs | 336 ----------------------------------
src/api2/node/certificates.rs | 240 ++++--------------------
src/api2/types/acme.rs | 74 --------
src/api2/types/mod.rs | 3 -
src/config/acme/mod.rs | 7 +-
src/config/acme/plugin.rs | 99 +---------
src/config/node.rs | 22 +--
src/lib.rs | 2 -
9 files changed, 46 insertions(+), 739 deletions(-)
delete mode 100644 src/acme/mod.rs
delete mode 100644 src/acme/plugin.rs
delete mode 100644 src/api2/types/acme.rs
diff --git a/src/acme/mod.rs b/src/acme/mod.rs
deleted file mode 100644
index cc561f9a..00000000
--- a/src/acme/mod.rs
+++ /dev/null
@@ -1,2 +0,0 @@
-pub(crate) mod plugin;
-pub(crate) use plugin::get_acme_plugin;
diff --git a/src/acme/plugin.rs b/src/acme/plugin.rs
deleted file mode 100644
index 5bc09e1f..00000000
--- a/src/acme/plugin.rs
+++ /dev/null
@@ -1,336 +0,0 @@
-use std::future::Future;
-use std::net::{IpAddr, SocketAddr};
-use std::pin::Pin;
-use std::process::Stdio;
-use std::sync::Arc;
-use std::time::Duration;
-
-use anyhow::{bail, format_err, Error};
-use bytes::Bytes;
-use futures::TryFutureExt;
-use http_body_util::Full;
-use hyper::body::Incoming;
-use hyper::server::conn::http1;
-use hyper::service::service_fn;
-use hyper::{Request, Response};
-use hyper_util::rt::TokioIo;
-use tokio::io::{AsyncBufReadExt, AsyncRead, AsyncWriteExt, BufReader};
-use tokio::net::TcpListener;
-use tokio::process::Command;
-
-use proxmox_acme::{Authorization, Challenge};
-
-use crate::api2::types::AcmeDomain;
-use proxmox_acme::async_client::AcmeClient;
-use proxmox_rest_server::WorkerTask;
-
-use crate::config::acme::plugin::{DnsPlugin, PluginData};
-
-const PROXMOX_ACME_SH_PATH: &str = "/usr/share/proxmox-acme/proxmox-acme";
-
-pub(crate) fn get_acme_plugin(
- plugin_data: &PluginData,
- name: &str,
-) -> Result<Option<Box<dyn AcmePlugin + Send + Sync + 'static>>, Error> {
- let (ty, data) = match plugin_data.get(name) {
- Some(plugin) => plugin,
- None => return Ok(None),
- };
-
- Ok(Some(match ty.as_str() {
- "dns" => {
- let plugin: DnsPlugin = serde::Deserialize::deserialize(data)?;
- Box::new(plugin)
- }
- "standalone" => {
- // this one has no config
- Box::<StandaloneServer>::default()
- }
- other => bail!("missing implementation for plugin type '{}'", other),
- }))
-}
-
-pub(crate) trait AcmePlugin {
- /// Setup everything required to trigger the validation and return the corresponding validation
- /// URL.
- fn setup<'fut, 'a: 'fut, 'b: 'fut, 'c: 'fut, 'd: 'fut>(
- &'a mut self,
- client: &'b mut AcmeClient,
- authorization: &'c Authorization,
- domain: &'d AcmeDomain,
- task: Arc<WorkerTask>,
- ) -> Pin<Box<dyn Future<Output = Result<&'c str, Error>> + Send + 'fut>>;
-
- fn teardown<'fut, 'a: 'fut, 'b: 'fut, 'c: 'fut, 'd: 'fut>(
- &'a mut self,
- client: &'b mut AcmeClient,
- authorization: &'c Authorization,
- domain: &'d AcmeDomain,
- task: Arc<WorkerTask>,
- ) -> Pin<Box<dyn Future<Output = Result<(), Error>> + Send + 'fut>>;
-}
-
-fn extract_challenge<'a>(
- authorization: &'a Authorization,
- ty: &str,
-) -> Result<&'a Challenge, Error> {
- authorization
- .challenges
- .iter()
- .find(|ch| ch.ty == ty)
- .ok_or_else(|| format_err!("no supported challenge type ({}) found", ty))
-}
-
-async fn pipe_to_tasklog<T: AsyncRead + Unpin>(
- pipe: T,
- task: Arc<WorkerTask>,
-) -> Result<(), std::io::Error> {
- let mut pipe = BufReader::new(pipe);
- let mut line = String::new();
- loop {
- line.clear();
- match pipe.read_line(&mut line).await {
- Ok(0) => return Ok(()),
- Ok(_) => task.log_message(line.as_str()),
- Err(err) => return Err(err),
- }
- }
-}
-
-impl DnsPlugin {
- async fn action<'a>(
- &self,
- client: &mut AcmeClient,
- authorization: &'a Authorization,
- domain: &AcmeDomain,
- task: Arc<WorkerTask>,
- action: &str,
- ) -> Result<&'a str, Error> {
- let challenge = extract_challenge(authorization, "dns-01")?;
- let mut stdin_data = client
- .dns_01_txt_value(
- challenge
- .token()
- .ok_or_else(|| format_err!("missing token in challenge"))?,
- )?
- .into_bytes();
- stdin_data.push(b'\n');
- stdin_data.extend(self.data.as_bytes());
- if stdin_data.last() != Some(&b'\n') {
- stdin_data.push(b'\n');
- }
-
- let mut command = Command::new("/usr/bin/setpriv");
-
- #[rustfmt::skip]
- command.args([
- "--reuid", "nobody",
- "--regid", "nogroup",
- "--clear-groups",
- "--reset-env",
- "--",
- "/bin/bash",
- PROXMOX_ACME_SH_PATH,
- action,
- &self.core.api,
- domain.alias.as_deref().unwrap_or(&domain.domain),
- ]);
-
- // We could use 1 socketpair, but tokio wraps them all in `File` internally causing `close`
- // to be called separately on all of them without exception, so we need 3 pipes :-(
-
- let mut child = command
- .stdin(Stdio::piped())
- .stdout(Stdio::piped())
- .stderr(Stdio::piped())
- .spawn()?;
-
- let mut stdin = child.stdin.take().expect("Stdio::piped()");
- let stdout = child.stdout.take().expect("Stdio::piped() failed?");
- let stdout = pipe_to_tasklog(stdout, Arc::clone(&task));
- let stderr = child.stderr.take().expect("Stdio::piped() failed?");
- let stderr = pipe_to_tasklog(stderr, Arc::clone(&task));
- let stdin = async move {
- stdin.write_all(&stdin_data).await?;
- stdin.flush().await?;
- Ok::<_, std::io::Error>(())
- };
- match futures::try_join!(stdin, stdout, stderr) {
- Ok(((), (), ())) => (),
- Err(err) => {
- if let Err(err) = child.kill().await {
- task.log_message(format!(
- "failed to kill '{PROXMOX_ACME_SH_PATH} {action}' command: {err}"
- ));
- }
- bail!("'{}' failed: {}", PROXMOX_ACME_SH_PATH, err);
- }
- }
-
- let status = child.wait().await?;
- if !status.success() {
- bail!(
- "'{} {}' exited with error ({})",
- PROXMOX_ACME_SH_PATH,
- action,
- status.code().unwrap_or(-1)
- );
- }
-
- Ok(&challenge.url)
- }
-}
-
-impl AcmePlugin for DnsPlugin {
- fn setup<'fut, 'a: 'fut, 'b: 'fut, 'c: 'fut, 'd: 'fut>(
- &'a mut self,
- client: &'b mut AcmeClient,
- authorization: &'c Authorization,
- domain: &'d AcmeDomain,
- task: Arc<WorkerTask>,
- ) -> Pin<Box<dyn Future<Output = Result<&'c str, Error>> + Send + 'fut>> {
- Box::pin(async move {
- let result = self
- .action(client, authorization, domain, task.clone(), "setup")
- .await;
-
- let validation_delay = self.core.validation_delay.unwrap_or(30) as u64;
- if validation_delay > 0 {
- task.log_message(format!(
- "Sleeping {validation_delay} seconds to wait for TXT record propagation"
- ));
- tokio::time::sleep(Duration::from_secs(validation_delay)).await;
- }
- result
- })
- }
-
- fn teardown<'fut, 'a: 'fut, 'b: 'fut, 'c: 'fut, 'd: 'fut>(
- &'a mut self,
- client: &'b mut AcmeClient,
- authorization: &'c Authorization,
- domain: &'d AcmeDomain,
- task: Arc<WorkerTask>,
- ) -> Pin<Box<dyn Future<Output = Result<(), Error>> + Send + 'fut>> {
- Box::pin(async move {
- self.action(client, authorization, domain, task, "teardown")
- .await
- .map(drop)
- })
- }
-}
-
-#[derive(Default)]
-struct StandaloneServer {
- abort_handle: Option<futures::future::AbortHandle>,
-}
-
-// In case the "order_certificates" future gets dropped between setup & teardown, let's also cancel
-// the HTTP listener on Drop:
-impl Drop for StandaloneServer {
- fn drop(&mut self) {
- self.stop();
- }
-}
-
-impl StandaloneServer {
- fn stop(&mut self) {
- if let Some(abort) = self.abort_handle.take() {
- abort.abort();
- }
- }
-}
-
-async fn standalone_respond(
- req: Request<Incoming>,
- path: Arc<String>,
- key_auth: Arc<String>,
-) -> Result<Response<Full<Bytes>>, hyper::Error> {
- if req.method() == hyper::Method::GET && req.uri().path() == path.as_str() {
- Ok(Response::builder()
- .status(hyper::http::StatusCode::OK)
- .body(key_auth.as_bytes().to_vec().into())
- .unwrap())
- } else {
- Ok(Response::builder()
- .status(hyper::http::StatusCode::NOT_FOUND)
- .body("Not found.".into())
- .unwrap())
- }
-}
-
-impl AcmePlugin for StandaloneServer {
- fn setup<'fut, 'a: 'fut, 'b: 'fut, 'c: 'fut, 'd: 'fut>(
- &'a mut self,
- client: &'b mut AcmeClient,
- authorization: &'c Authorization,
- _domain: &'d AcmeDomain,
- _task: Arc<WorkerTask>,
- ) -> Pin<Box<dyn Future<Output = Result<&'c str, Error>> + Send + 'fut>> {
- Box::pin(async move {
- self.stop();
-
- let challenge = extract_challenge(authorization, "http-01")?;
- let token = challenge
- .token()
- .ok_or_else(|| format_err!("missing token in challenge"))?;
- let key_auth = Arc::new(client.key_authorization(token)?);
- let path = Arc::new(format!("/.well-known/acme-challenge/{token}"));
-
- // `[::]:80` first, then `*:80`
- let dual = SocketAddr::new(IpAddr::from([0u16; 8]), 80);
- let ipv4 = SocketAddr::new(IpAddr::from([0u8; 4]), 80);
- let incoming = TcpListener::bind(dual)
- .or_else(|_| TcpListener::bind(ipv4))
- .await?;
-
- let server = async move {
- loop {
- let key_auth = Arc::clone(&key_auth);
- let path = Arc::clone(&path);
- match incoming.accept().await {
- Ok((tcp, _)) => {
- let io = TokioIo::new(tcp);
- let service = service_fn(move |request| {
- standalone_respond(
- request,
- Arc::clone(&path),
- Arc::clone(&key_auth),
- )
- });
-
- tokio::task::spawn(async move {
- if let Err(err) =
- http1::Builder::new().serve_connection(io, service).await
- {
- println!("Error serving connection: {err:?}");
- }
- });
- }
- Err(err) => println!("Error accepting connection: {err:?}"),
- }
- }
- };
- let (future, abort) = futures::future::abortable(server);
- self.abort_handle = Some(abort);
- tokio::spawn(future);
-
- Ok(challenge.url.as_str())
- })
- }
-
- fn teardown<'fut, 'a: 'fut, 'b: 'fut, 'c: 'fut, 'd: 'fut>(
- &'a mut self,
- _client: &'b mut AcmeClient,
- _authorization: &'c Authorization,
- _domain: &'d AcmeDomain,
- _task: Arc<WorkerTask>,
- ) -> Pin<Box<dyn Future<Output = Result<(), Error>> + Send + 'fut>> {
- Box::pin(async move {
- if let Some(abort) = self.abort_handle.take() {
- abort.abort();
- }
- Ok(())
- })
- }
-}
diff --git a/src/api2/node/certificates.rs b/src/api2/node/certificates.rs
index 31196715..2a645b4a 100644
--- a/src/api2/node/certificates.rs
+++ b/src/api2/node/certificates.rs
@@ -1,27 +1,19 @@
-use std::sync::Arc;
-use std::time::Duration;
-
use anyhow::{bail, format_err, Error};
use openssl::pkey::PKey;
use openssl::x509::X509;
use serde::{Deserialize, Serialize};
use tracing::info;
-use proxmox_router::list_subdirs_api_method;
-use proxmox_router::SubdirMap;
-use proxmox_router::{Permission, Router, RpcEnvironment};
-use proxmox_schema::api;
-
+use crate::server::send_certificate_renewal_mail;
use pbs_api_types::{NODE_SCHEMA, PRIV_SYS_MODIFY};
use pbs_buildcfg::configdir;
use pbs_tools::cert;
-use tracing::warn;
-
-use crate::api2::types::AcmeDomain;
-use crate::config::node::NodeConfig;
-use crate::server::send_certificate_renewal_mail;
-use proxmox_acme::async_client::AcmeClient;
+use proxmox_acme_api::AcmeDomain;
use proxmox_rest_server::WorkerTask;
+use proxmox_router::list_subdirs_api_method;
+use proxmox_router::SubdirMap;
+use proxmox_router::{Permission, Router, RpcEnvironment};
+use proxmox_schema::api;
pub const ROUTER: Router = Router::new()
.get(&list_subdirs_api_method!(SUBDIRS))
@@ -269,193 +261,6 @@ pub async fn delete_custom_certificate() -> Result<(), Error> {
Ok(())
}
-struct OrderedCertificate {
- certificate: hyper::body::Bytes,
- private_key_pem: Vec<u8>,
-}
-
-async fn order_certificate(
- worker: Arc<WorkerTask>,
- node_config: &NodeConfig,
-) -> Result<Option<OrderedCertificate>, Error> {
- use proxmox_acme::authorization::Status;
- use proxmox_acme::order::Identifier;
-
- let domains = node_config.acme_domains().try_fold(
- Vec::<AcmeDomain>::new(),
- |mut acc, domain| -> Result<_, Error> {
- let mut domain = domain?;
- domain.domain.make_ascii_lowercase();
- if let Some(alias) = &mut domain.alias {
- alias.make_ascii_lowercase();
- }
- acc.push(domain);
- Ok(acc)
- },
- )?;
-
- let get_domain_config = |domain: &str| {
- domains
- .iter()
- .find(|d| d.domain == domain)
- .ok_or_else(|| format_err!("no config for domain '{}'", domain))
- };
-
- if domains.is_empty() {
- info!("No domains configured to be ordered from an ACME server.");
- return Ok(None);
- }
-
- let (plugins, _) = crate::config::acme::plugin::config()?;
-
- let mut acme = node_config.acme_client().await?;
-
- info!("Placing ACME order");
- let order = acme
- .new_order(domains.iter().map(|d| d.domain.to_ascii_lowercase()))
- .await?;
- info!("Order URL: {}", order.location);
-
- let identifiers: Vec<String> = order
- .data
- .identifiers
- .iter()
- .map(|identifier| match identifier {
- Identifier::Dns(domain) => domain.clone(),
- })
- .collect();
-
- for auth_url in &order.data.authorizations {
- info!("Getting authorization details from '{auth_url}'");
- let mut auth = acme.get_authorization(auth_url).await?;
-
- let domain = match &mut auth.identifier {
- Identifier::Dns(domain) => domain.to_ascii_lowercase(),
- };
-
- if auth.status == Status::Valid {
- info!("{domain} is already validated!");
- continue;
- }
-
- info!("The validation for {domain} is pending");
- let domain_config: &AcmeDomain = get_domain_config(&domain)?;
- let plugin_id = domain_config.plugin.as_deref().unwrap_or("standalone");
- let mut plugin_cfg = crate::acme::get_acme_plugin(&plugins, plugin_id)?
- .ok_or_else(|| format_err!("plugin '{plugin_id}' for domain '{domain}' not found!"))?;
-
- info!("Setting up validation plugin");
- let validation_url = plugin_cfg
- .setup(&mut acme, &auth, domain_config, Arc::clone(&worker))
- .await?;
-
- let result = request_validation(&mut acme, auth_url, validation_url).await;
-
- if let Err(err) = plugin_cfg
- .teardown(&mut acme, &auth, domain_config, Arc::clone(&worker))
- .await
- {
- warn!("Failed to teardown plugin '{plugin_id}' for domain '{domain}' - {err}");
- }
-
- result?;
- }
-
- info!("All domains validated");
- info!("Creating CSR");
-
- let csr = proxmox_acme::util::Csr::generate(&identifiers, &Default::default())?;
- let mut finalize_error_cnt = 0u8;
- let order_url = &order.location;
- let mut order;
- loop {
- use proxmox_acme::order::Status;
-
- order = acme.get_order(order_url).await?;
-
- match order.status {
- Status::Pending => {
- info!("still pending, trying to finalize anyway");
- let finalize = order
- .finalize
- .as_deref()
- .ok_or_else(|| format_err!("missing 'finalize' URL in order"))?;
- if let Err(err) = acme.finalize(finalize, &csr.data).await {
- if finalize_error_cnt >= 5 {
- return Err(err);
- }
-
- finalize_error_cnt += 1;
- }
- tokio::time::sleep(Duration::from_secs(5)).await;
- }
- Status::Ready => {
- info!("order is ready, finalizing");
- let finalize = order
- .finalize
- .as_deref()
- .ok_or_else(|| format_err!("missing 'finalize' URL in order"))?;
- acme.finalize(finalize, &csr.data).await?;
- tokio::time::sleep(Duration::from_secs(5)).await;
- }
- Status::Processing => {
- info!("still processing, trying again in 30 seconds");
- tokio::time::sleep(Duration::from_secs(30)).await;
- }
- Status::Valid => {
- info!("valid");
- break;
- }
- other => bail!("order status: {:?}", other),
- }
- }
-
- info!("Downloading certificate");
- let certificate = acme
- .get_certificate(
- order
- .certificate
- .as_deref()
- .ok_or_else(|| format_err!("missing certificate url in finalized order"))?,
- )
- .await?;
-
- Ok(Some(OrderedCertificate {
- certificate,
- private_key_pem: csr.private_key_pem,
- }))
-}
-
-async fn request_validation(
- acme: &mut AcmeClient,
- auth_url: &str,
- validation_url: &str,
-) -> Result<(), Error> {
- info!("Triggering validation");
- acme.request_challenge_validation(validation_url).await?;
-
- info!("Sleeping for 5 seconds");
- tokio::time::sleep(Duration::from_secs(5)).await;
-
- loop {
- use proxmox_acme::authorization::Status;
-
- let auth = acme.get_authorization(auth_url).await?;
- match auth.status {
- Status::Pending => {
- info!("Status is still 'pending', trying again in 10 seconds");
- tokio::time::sleep(Duration::from_secs(10)).await;
- }
- Status::Valid => return Ok(()),
- other => bail!(
- "validating challenge '{}' failed - status: {:?}",
- validation_url,
- other
- ),
- }
- }
-}
-
#[api(
input: {
properties: {
@@ -525,9 +330,30 @@ fn spawn_certificate_worker(
let auth_id = rpcenv.get_auth_id().unwrap();
+ let acme_config = if let Some(cfg) = node_config.acme_config().transpose()? {
+ cfg
+ } else {
+ proxmox_acme_api::parse_acme_config_string("account=default")?
+ };
+
+ let domains = node_config.acme_domains().try_fold(
+ Vec::<AcmeDomain>::new(),
+ |mut acc, domain| -> Result<_, Error> {
+ let mut domain = domain?;
+ domain.domain.make_ascii_lowercase();
+ if let Some(alias) = &mut domain.alias {
+ alias.make_ascii_lowercase();
+ }
+ acc.push(domain);
+ Ok(acc)
+ },
+ )?;
+
WorkerTask::spawn(name, None, auth_id, true, move |worker| async move {
let work = || async {
- if let Some(cert) = order_certificate(worker, &node_config).await? {
+ if let Some(cert) =
+ proxmox_acme_api::order_certificate(worker, &acme_config, &domains).await?
+ {
crate::config::set_proxy_certificate(&cert.certificate, &cert.private_key_pem)?;
crate::server::reload_proxy_certificate().await?;
}
@@ -563,16 +389,20 @@ pub fn revoke_acme_cert(rpcenv: &mut dyn RpcEnvironment) -> Result<String, Error
let auth_id = rpcenv.get_auth_id().unwrap();
+ let acme_config = if let Some(cfg) = node_config.acme_config().transpose()? {
+ cfg
+ } else {
+ proxmox_acme_api::parse_acme_config_string("account=default")?
+ };
+
WorkerTask::spawn(
"acme-revoke-cert",
None,
auth_id,
true,
move |_worker| async move {
- info!("Loading ACME account");
- let mut acme = node_config.acme_client().await?;
info!("Revoking old certificate");
- acme.revoke_certificate(cert_pem.as_bytes(), None).await?;
+ proxmox_acme_api::revoke_certificate(&acme_config, &cert_pem.as_bytes()).await?;
info!("Deleting certificate and regenerating a self-signed one");
delete_custom_certificate().await?;
Ok(())
diff --git a/src/api2/types/acme.rs b/src/api2/types/acme.rs
deleted file mode 100644
index 2905b41b..00000000
--- a/src/api2/types/acme.rs
+++ /dev/null
@@ -1,74 +0,0 @@
-use serde::{Deserialize, Serialize};
-use serde_json::Value;
-
-use proxmox_schema::{api, ApiStringFormat, ApiType, Schema, StringSchema};
-
-use pbs_api_types::{DNS_ALIAS_FORMAT, DNS_NAME_FORMAT, PROXMOX_SAFE_ID_FORMAT};
-
-#[api(
- properties: {
- "domain": { format: &DNS_NAME_FORMAT },
- "alias": {
- optional: true,
- format: &DNS_ALIAS_FORMAT,
- },
- "plugin": {
- optional: true,
- format: &PROXMOX_SAFE_ID_FORMAT,
- },
- },
- default_key: "domain",
-)]
-#[derive(Deserialize, Serialize)]
-/// A domain entry for an ACME certificate.
-pub struct AcmeDomain {
- /// The domain to certify for.
- pub domain: String,
-
- /// The domain to use for challenges instead of the default acme challenge domain.
- ///
- /// This is useful if you use CNAME entries to redirect `_acme-challenge.*` domains to a
- /// different DNS server.
- #[serde(skip_serializing_if = "Option::is_none")]
- pub alias: Option<String>,
-
- /// The plugin to use to validate this domain.
- ///
- /// Empty means standalone HTTP validation is used.
- #[serde(skip_serializing_if = "Option::is_none")]
- pub plugin: Option<String>,
-}
-
-pub const ACME_DOMAIN_PROPERTY_SCHEMA: Schema =
- StringSchema::new("ACME domain configuration string")
- .format(&ApiStringFormat::PropertyString(&AcmeDomain::API_SCHEMA))
- .schema();
-
-#[api(
- properties: {
- schema: {
- type: Object,
- additional_properties: true,
- properties: {},
- },
- type: {
- type: String,
- },
- },
-)]
-#[derive(Serialize)]
-/// Schema for an ACME challenge plugin.
-pub struct AcmeChallengeSchema {
- /// Plugin ID.
- pub id: String,
-
- /// Human readable name, falls back to id.
- pub name: String,
-
- /// Plugin Type.
- #[serde(rename = "type")]
- pub ty: &'static str,
-
- /// The plugin's parameter schema.
- pub schema: Value,
-}
diff --git a/src/api2/types/mod.rs b/src/api2/types/mod.rs
index afc34b30..34193685 100644
--- a/src/api2/types/mod.rs
+++ b/src/api2/types/mod.rs
@@ -4,9 +4,6 @@ use anyhow::bail;
use proxmox_schema::*;
-mod acme;
-pub use acme::*;
-
// File names: may not contain slashes, may not start with "."
pub const FILENAME_FORMAT: ApiStringFormat = ApiStringFormat::VerifyFn(|name| {
if name.starts_with('.') {
diff --git a/src/config/acme/mod.rs b/src/config/acme/mod.rs
index 35cda50b..afd7abf8 100644
--- a/src/config/acme/mod.rs
+++ b/src/config/acme/mod.rs
@@ -9,8 +9,7 @@ use proxmox_sys::fs::{file_read_string, CreateOptions};
use pbs_api_types::PROXMOX_SAFE_ID_REGEX;
-use crate::api2::types::AcmeChallengeSchema;
-use proxmox_acme_api::{AcmeAccountName, KnownAcmeDirectory, KNOWN_ACME_DIRECTORIES};
+use proxmox_acme_api::{AcmeAccountName, AcmeChallengeSchema};
pub(crate) const ACME_DIR: &str = pbs_buildcfg::configdir!("/acme");
pub(crate) const ACME_ACCOUNT_DIR: &str = pbs_buildcfg::configdir!("/acme/accounts");
@@ -35,8 +34,6 @@ pub(crate) fn make_acme_dir() -> Result<(), Error> {
create_acme_subdir(ACME_DIR)
}
-pub const DEFAULT_ACME_DIRECTORY_ENTRY: &KnownAcmeDirectory = &KNOWN_ACME_DIRECTORIES[0];
-
pub fn foreach_acme_account<F>(mut func: F) -> Result<(), Error>
where
F: FnMut(AcmeAccountName) -> ControlFlow<Result<(), Error>>,
@@ -80,7 +77,7 @@ pub fn load_dns_challenge_schema() -> Result<Vec<AcmeChallengeSchema>, Error> {
.and_then(Value::as_str)
.unwrap_or(id)
.to_owned(),
- ty: "dns",
+ ty: "dns".into(),
schema: schema.to_owned(),
})
.collect())
diff --git a/src/config/acme/plugin.rs b/src/config/acme/plugin.rs
index 18e71199..2e979ffe 100644
--- a/src/config/acme/plugin.rs
+++ b/src/config/acme/plugin.rs
@@ -1,104 +1,15 @@
use std::sync::LazyLock;
use anyhow::Error;
-use serde::{Deserialize, Serialize};
-use serde_json::Value;
-
-use proxmox_schema::{api, ApiType, Schema, StringSchema, Updater};
-use proxmox_section_config::{SectionConfig, SectionConfigData, SectionConfigPlugin};
-
-use pbs_api_types::PROXMOX_SAFE_ID_FORMAT;
use pbs_config::{open_backup_lockfile, BackupLockGuard};
-
-pub const PLUGIN_ID_SCHEMA: Schema = StringSchema::new("ACME Challenge Plugin ID.")
- .format(&PROXMOX_SAFE_ID_FORMAT)
- .min_length(1)
- .max_length(32)
- .schema();
+use proxmox_acme_api::PLUGIN_ID_SCHEMA;
+use proxmox_acme_api::{DnsPlugin, StandalonePlugin};
+use proxmox_schema::{ApiType, Schema};
+use proxmox_section_config::{SectionConfig, SectionConfigData, SectionConfigPlugin};
+use serde_json::Value;
pub static CONFIG: LazyLock<SectionConfig> = LazyLock::new(init);
-#[api(
- properties: {
- id: { schema: PLUGIN_ID_SCHEMA },
- },
-)]
-#[derive(Deserialize, Serialize)]
-/// Standalone ACME Plugin for the http-1 challenge.
-pub struct StandalonePlugin {
- /// Plugin ID.
- id: String,
-}
-
-impl Default for StandalonePlugin {
- fn default() -> Self {
- Self {
- id: "standalone".to_string(),
- }
- }
-}
-
-#[api(
- properties: {
- id: { schema: PLUGIN_ID_SCHEMA },
- disable: {
- optional: true,
- default: false,
- },
- "validation-delay": {
- default: 30,
- optional: true,
- minimum: 0,
- maximum: 2 * 24 * 60 * 60,
- },
- },
-)]
-/// DNS ACME Challenge Plugin core data.
-#[derive(Deserialize, Serialize, Updater)]
-#[serde(rename_all = "kebab-case")]
-pub struct DnsPluginCore {
- /// Plugin ID.
- #[updater(skip)]
- pub id: String,
-
- /// DNS API Plugin Id.
- pub api: String,
-
- /// Extra delay in seconds to wait before requesting validation.
- ///
- /// Allows to cope with long TTL of DNS records.
- #[serde(skip_serializing_if = "Option::is_none", default)]
- pub validation_delay: Option<u32>,
-
- /// Flag to disable the config.
- #[serde(skip_serializing_if = "Option::is_none", default)]
- pub disable: Option<bool>,
-}
-
-#[api(
- properties: {
- core: { type: DnsPluginCore },
- },
-)]
-/// DNS ACME Challenge Plugin.
-#[derive(Deserialize, Serialize)]
-#[serde(rename_all = "kebab-case")]
-pub struct DnsPlugin {
- #[serde(flatten)]
- pub core: DnsPluginCore,
-
- // We handle this property separately in the API calls.
- /// DNS plugin data (base64url encoded without padding).
- #[serde(with = "proxmox_serde::string_as_base64url_nopad")]
- pub data: String,
-}
-
-impl DnsPlugin {
- pub fn decode_data(&self, output: &mut Vec<u8>) -> Result<(), Error> {
- Ok(proxmox_base64::url::decode_to_vec(&self.data, output)?)
- }
-}
-
fn init() -> SectionConfig {
let mut config = SectionConfig::new(&PLUGIN_ID_SCHEMA);
diff --git a/src/config/node.rs b/src/config/node.rs
index d2a17a49..b9257adf 100644
--- a/src/config/node.rs
+++ b/src/config/node.rs
@@ -6,17 +6,17 @@ use serde::{Deserialize, Serialize};
use proxmox_schema::{api, ApiStringFormat, ApiType, Updater};
-use proxmox_http::ProxyConfig;
-
use pbs_api_types::{
EMAIL_SCHEMA, MULTI_LINE_COMMENT_SCHEMA, OPENSSL_CIPHERS_TLS_1_2_SCHEMA,
OPENSSL_CIPHERS_TLS_1_3_SCHEMA,
};
+use proxmox_acme_api::{AcmeConfig, AcmeDomain, ACME_DOMAIN_PROPERTY_SCHEMA};
+use proxmox_http::ProxyConfig;
use pbs_buildcfg::configdir;
use pbs_config::{open_backup_lockfile, BackupLockGuard};
-use crate::api2::types::{AcmeDomain, ACME_DOMAIN_PROPERTY_SCHEMA, HTTP_PROXY_SCHEMA};
+use crate::api2::types::HTTP_PROXY_SCHEMA;
use proxmox_acme::async_client::AcmeClient;
use proxmox_acme_api::AcmeAccountName;
@@ -45,20 +45,6 @@ pub fn save_config(config: &NodeConfig) -> Result<(), Error> {
pbs_config::replace_backup_config(CONF_FILE, &raw)
}
-#[api(
- properties: {
- account: { type: AcmeAccountName },
- }
-)]
-#[derive(Deserialize, Serialize)]
-/// The ACME configuration.
-///
-/// Currently only contains the name of the account use.
-pub struct AcmeConfig {
- /// Account to use to acquire ACME certificates.
- account: AcmeAccountName,
-}
-
/// All available languages in Proxmox. Taken from proxmox-i18n repository.
/// pt_BR, zh_CN, and zh_TW use the same case in the translation files.
// TODO: auto-generate from available translations
@@ -244,7 +230,7 @@ impl NodeConfig {
pub async fn acme_client(&self) -> Result<AcmeClient, Error> {
let account = if let Some(cfg) = self.acme_config().transpose()? {
- cfg.account
+ AcmeAccountName::from_string(cfg.account)?
} else {
AcmeAccountName::from_string("default".to_string())? // should really not happen
};
diff --git a/src/lib.rs b/src/lib.rs
index 8633378c..828f5842 100644
--- a/src/lib.rs
+++ b/src/lib.rs
@@ -27,8 +27,6 @@ pub(crate) mod auth;
pub mod tape;
-pub mod acme;
-
pub mod client_helpers;
pub mod traffic_control_cache;
--
2.47.3
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
* [pbs-devel] [PATCH proxmox{-backup, } v4 0/8] fix #6939: acme: support servers returning 204 for nonce requests
@ 2025-12-03 10:22 11% Samuel Rufinatscha
2025-12-03 10:22 15% ` [pbs-devel] [PATCH proxmox-backup v4 1/4] acme: include proxmox-acme-api dependency Samuel Rufinatscha
` (9 more replies)
0 siblings, 10 replies; 200+ results
From: Samuel Rufinatscha @ 2025-12-03 10:22 UTC (permalink / raw)
To: pbs-devel
Hi,
this series fixes account registration for ACME providers that return
HTTP 204 No Content to the newNonce request. Currently, both the PBS
ACME client and the shared ACME client in proxmox-acme only accept
HTTP 200 OK for this request. The issue was observed in PBS against a
custom ACME deployment and reported as bug #6939 [1].
## Problem
During ACME account registration, PBS first fetches an anti-replay
nonce by sending a HEAD request to the CA’s newNonce URL.
RFC 8555 §7.2 [2] states that:
* the server MUST include a Replay-Nonce header with a fresh nonce,
* the server SHOULD use status 200 OK for the HEAD request,
* the server MUST also handle GET on the same resource and may return
204 No Content with an empty body.
The reporter observed the following error message:
*ACME server responded with unexpected status code: 204*
and mentioned that the issue did not appear with PVE 9 [1]. Looking at
PVE’s Perl ACME client [3], it uses a GET request instead of HEAD and
accepts any 2xx success code when retrieving the nonce. This difference
in behavior does not affect functionality but is worth noting for
consistency across implementations.
## Approach
To support ACME providers which return 204 No Content, the Rust ACME
clients in proxmox-backup and proxmox need to treat both 200 OK and 204
No Content as valid responses for the nonce request, as long as a
Replay-Nonce header is present.
This series changes the expected field of the internal Request type
from a single u16 to a list of allowed status codes
(e.g. &'static [u16]), so one request can explicitly accept multiple
success codes.
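The approach can be sketched as below; this is a simplified stand-in, not the actual proxmox-acme types (the real Request has more fields, and headers are hyper types rather than a map):

```rust
use std::collections::HashMap;

// Hypothetical, simplified stand-in for the internal `Request` type
// after the change: `expected` now lists all status codes that count
// as success for this request.
struct Request {
    expected: &'static [u16],
}

// Accept a nonce response only if the status is in the allowed set
// AND a Replay-Nonce header is present (headers modeled as a map).
fn check_nonce_response(
    req: &Request,
    status: u16,
    headers: &HashMap<String, String>,
) -> Result<String, String> {
    if !req.expected.contains(&status) {
        return Err(format!("unexpected status code: {status}"));
    }
    headers
        .get("replay-nonce")
        .cloned()
        .ok_or_else(|| "missing Replay-Nonce header".to_string())
}

fn main() {
    let req = Request { expected: &[200, 204] };
    let mut headers = HashMap::new();
    headers.insert("replay-nonce".to_string(), "abc123".to_string());
    // 204 with a Replay-Nonce header is now accepted.
    assert!(check_nonce_response(&req, 204, &headers).is_ok());
    // Other success codes (e.g. 201) are still rejected.
    assert!(check_nonce_response(&req, 201, &headers).is_err());
    println!("ok");
}
```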
To avoid fixing the issue twice (once in PBS’ own ACME client and once
in the shared Rust client), this series first refactors PBS to use the
shared AcmeClient from proxmox-acme / proxmox-acme-api, similar to PDM,
and then applies the bug fix in that shared implementation so that all
consumers benefit from the more tolerant behavior.
## Testing
*Testing the refactor*
To test the refactor, I
(1) installed latest stable PBS on a VM
(2) created .deb package from latest PBS (master), containing the
refactor
(3) installed created .deb package
(4) installed Pebble from Let's Encrypt [4] on the same VM
(5) created an ACME account and ordered the new certificate for the
host domain.
Steps to reproduce:
(1) install latest stable PBS on a VM, create .deb package from latest
PBS (master) containing the refactor, install created .deb package
(2) install Pebble from Let's Encrypt [4] on the same VM:
cd
apt update
apt install -y golang git
git clone https://github.com/letsencrypt/pebble
cd pebble
go build ./cmd/pebble
then, download and trust the Pebble cert:
wget https://raw.githubusercontent.com/letsencrypt/pebble/main/test/certs/pebble.minica.pem
cp pebble.minica.pem /usr/local/share/ca-certificates/pebble.minica.crt
update-ca-certificates
We want Pebble to perform HTTP-01 validation against port 80, because
PBS’s standalone plugin will bind port 80. Set httpPort to 80.
nano ./test/config/pebble-config.json
Start the Pebble server in the background:
./pebble -config ./test/config/pebble-config.json &
Create a Pebble ACME account:
proxmox-backup-manager acme account register default admin@example.com --directory 'https://127.0.0.1:14000/dir'
To verify persistence of the account, I checked
ls /etc/proxmox-backup/acme/accounts
Verified that update-account works
proxmox-backup-manager acme account update default --contact "a@example.com,b@example.com"
proxmox-backup-manager acme account info default
In the PBS GUI, you can create a new domain. You can use your host
domain name (see /etc/hosts). Select the created account and order the
certificate.
After a page reload, you might need to accept the new certificate in the browser.
In the PBS dashboard, you should see the new Pebble certificate.
*Note:* on reboot, the created Pebble ACME account will be gone and you
will need to create a new one; Pebble does not persist account info.
In that case, remove the previously created account in
/etc/proxmox-backup/acme/accounts.
*Testing the newNonce fix*
To verify the ACME newNonce fix, I put nginx in front of Pebble to
intercept the newNonce request and return 204 No Content instead of
200 OK; all other requests are forwarded to Pebble unchanged. This
requires trusting the nginx CA via /usr/local/share/ca-certificates +
update-ca-certificates on the VM.
Then I ran the following command against nginx:
proxmox-backup-manager acme account register proxytest root@backup.local --directory 'https://nginx-address/dir'
The account could be created successfully. When adjusting the nginx
configuration to return any other, non-expected success status code,
PBS rejects the response as expected.
## Patch summary
0001 – acme: include proxmox-acme-api dependency
Adds proxmox-acme-api as a new dependency for the ACME code. This
prepares the codebase to use the shared ACME API instead of local
implementations.
0002 – acme: drop local AcmeClient
Removes the local AcmeClient implementation. Minimal changes
required to support the removal.
0003 – acme: change API impls to use proxmox-acme-api handler
Updates existing ACME API implementations to use the handlers provided
by proxmox-acme-api.
0004 – acme: certificate ordering through proxmox-acme-api
Perform certificate ordering through proxmox-acme-api instead of local
logic.
0005 – acme api: add helper to load client for an account
Introduces a helper function to load an ACME client instance for a
given account. Required for the PBS refactor.
0006 – acme: reduce visibility of Request type
Restricts the visibility of the internal Request type.
0007 – acme: introduce http_status module
Adds a dedicated http_status module for handling common HTTP status
codes.
0008 – fix #6939: acme: support servers returning 204 for nonce
Adjusts nonce handling to support ACME servers that return HTTP 204
(No Content) for new-nonce requests.
Thanks for considering this patch series, I look forward to your
feedback.
Best,
Samuel Rufinatscha
## Changelog
Changes from v3 to v4:
Removed: [PATCH proxmox-backup v3 1/1].
Added:
[PATCH proxmox-backup v4 1/4] acme: include proxmox-acme-api dependency
* New: add proxmox-acme-api as a dependency and initialize it in
PBS so PBS can use the shared ACME API instead.
[PATCH proxmox-backup v4 2/4] acme: drop local AcmeClient
* New: remove the PBS-local AcmeClient implementation and switch PBS
over to the shared proxmox-acme async client.
[PATCH proxmox-backup v4 3/4] acme: change API impls to use proxmox-acme-api
handlers
* New: rework PBS’ ACME API endpoints to delegate to
proxmox-acme-api handlers instead of duplicating logic locally.
[PATCH proxmox-backup v4 4/4] acme: certificate ordering through
proxmox-acme-api
* New: move PBS’ ACME certificate ordering logic over to
proxmox-acme-api, keeping only certificate installation/reload in
PBS.
[PATCH proxmox v4 1/4] acme-api: add helper to load client for an account
* New: add a load_client_with_account helper in proxmox-acme-api so
PBS (and others) can construct an AcmeClient for a configured account
without duplicating boilerplate.
[PATCH proxmox v4 2/4] acme: reduce visibility of Request type
* New: hide the low-level Request type and its fields behind
constructors / reduced visibility so changes to “expected” no longer
affect the public API as they did in v3.
[PATCH proxmox v4 3/4] acme: introduce http_status module
* New: split out the HTTP status constants into an internal
http_status module as a separate preparatory cleanup before the bug
fix, instead of doing this inline like in v3.
Changed:
[PATCH proxmox v3 1/1] -> [PATCH proxmox v4 4/4]
fix #6939: acme: support server returning 204 for nonce requests
* Rebased on top of the refactor: keep the same behavioural fix as in v3
(accept 204 for newNonce with Replay-Nonce present), but implement it
on top of the http_status module that is part of the refactor.
Changes from v2 to v3:
[PATCH proxmox v3 1/1] fix #6939: support providers returning 204 for nonce
requests
* Rename `http_success` module to `http_status`
[PATCH proxmox-backup v3 1/1] acme: accept HTTP 204 from newNonce endpoint
* Replace `http_success` usage
Changes from v1 to v2:
[PATCH proxmox v2 1/1] fix #6939: support providers returning 204 for nonce
requests
* Introduced `http_success` module to contain the http success codes
* Replaced `Vec<u16>` with `&[u16]` for expected codes to avoid
allocations.
* Clarified PVE's Perl ACME client behaviour in the commit message.
[PATCH proxmox-backup v2 1/1] acme: accept HTTP 204 from newNonce endpoint
* Integrated the `http_success` module, replacing `Vec<u16>` with `&[u16]`
* Clarified PVE's Perl ACME client behaviour in the commit message.
[1] Bugzilla report #6939:
[https://bugzilla.proxmox.com/show_bug.cgi?id=6939](https://bugzilla.proxmox.com/show_bug.cgi?id=6939)
[2] RFC 8555 (ACME):
[https://datatracker.ietf.org/doc/html/rfc8555/#section-7.2](https://datatracker.ietf.org/doc/html/rfc8555/#section-7.2)
[3] PVE's Perl ACME client (allows 2xx codes for nonce requests):
[https://git.proxmox.com/?p=proxmox-acme.git;a=blob;f=src/PVE/ACME.pm;h=f1e9bb7d316e3cea1e376c610b0479119217aecc;hb=HEAD#l597](https://git.proxmox.com/?p=proxmox-acme.git;a=blob;f=src/PVE/ACME.pm;h=f1e9bb7d316e3cea1e376c610b0479119217aecc;hb=HEAD#l597)
[4] Pebble ACME server:
[https://github.com/letsencrypt/pebble](https://github.com/letsencrypt/pebble)
[5] PVE's Perl ACME client (performs a GET request):
[https://git.proxmox.com/?p=proxmox-acme.git;a=blob;f=src/PVE/ACME.pm;h=f1e9bb7d316e3cea1e376c610b0479119217aecc;hb=HEAD#l219](https://git.proxmox.com/?p=proxmox-acme.git;a=blob;f=src/PVE/ACME.pm;h=f1e9bb7d316e3cea1e376c610b0479119217aecc;hb=HEAD#l219)
proxmox-backup:
Samuel Rufinatscha (4):
acme: include proxmox-acme-api dependency
acme: drop local AcmeClient
acme: change API impls to use proxmox-acme-api handlers
acme: certificate ordering through proxmox-acme-api
Cargo.toml | 3 +
src/acme/client.rs | 691 -------------------------
src/acme/mod.rs | 5 -
src/acme/plugin.rs | 336 ------------
src/api2/config/acme.rs | 407 ++-------------
src/api2/node/certificates.rs | 240 ++-------
src/api2/types/acme.rs | 98 ----
src/api2/types/mod.rs | 3 -
src/bin/proxmox-backup-api.rs | 2 +
src/bin/proxmox-backup-manager.rs | 2 +
src/bin/proxmox-backup-proxy.rs | 1 +
src/bin/proxmox_backup_manager/acme.rs | 21 +-
src/config/acme/mod.rs | 51 +-
src/config/acme/plugin.rs | 99 +---
src/config/node.rs | 29 +-
src/lib.rs | 2 -
16 files changed, 103 insertions(+), 1887 deletions(-)
delete mode 100644 src/acme/client.rs
delete mode 100644 src/acme/mod.rs
delete mode 100644 src/acme/plugin.rs
delete mode 100644 src/api2/types/acme.rs
proxmox:
Samuel Rufinatscha (4):
acme-api: add helper to load client for an account
acme: reduce visibility of Request type
acme: introduce http_status module
fix #6939: acme: support servers returning 204 for nonce requests
proxmox-acme-api/src/account_api_impl.rs | 5 +++++
proxmox-acme-api/src/lib.rs | 3 ++-
proxmox-acme/src/account.rs | 27 +++++++++++++-----------
proxmox-acme/src/async_client.rs | 8 +++----
proxmox-acme/src/authorization.rs | 2 +-
proxmox-acme/src/client.rs | 8 +++----
proxmox-acme/src/lib.rs | 6 ++----
proxmox-acme/src/order.rs | 2 +-
proxmox-acme/src/request.rs | 25 +++++++++++++++-------
9 files changed, 51 insertions(+), 35 deletions(-)
Summary over all repositories:
25 files changed, 154 insertions(+), 1922 deletions(-)
--
Generated by git-murpp 0.8.1
* [pbs-devel] [PATCH proxmox v4 3/4] acme: introduce http_status module
2025-12-03 10:22 11% [pbs-devel] [PATCH proxmox{-backup, } v4 " Samuel Rufinatscha
` (5 preceding siblings ...)
2025-12-03 10:22 12% ` [pbs-devel] [PATCH proxmox v4 2/4] acme: reduce visibility of Request type Samuel Rufinatscha
@ 2025-12-03 10:22 15% ` Samuel Rufinatscha
2025-12-03 10:22 14% ` [pbs-devel] [PATCH proxmox v4 4/4] fix #6939: acme: support servers returning 204 for nonce requests Samuel Rufinatscha
` (2 subsequent siblings)
9 siblings, 0 replies; 200+ results
From: Samuel Rufinatscha @ 2025-12-03 10:22 UTC (permalink / raw)
To: pbs-devel
Introduce an internal http_status module with the common ACME HTTP
response codes, and replace use of crate::request::CREATED as well as
direct numeric status code usages.
Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
---
proxmox-acme/src/account.rs | 10 +++++-----
proxmox-acme/src/async_client.rs | 4 ++--
proxmox-acme/src/lib.rs | 2 ++
proxmox-acme/src/request.rs | 11 ++++++++++-
4 files changed, 19 insertions(+), 8 deletions(-)
diff --git a/proxmox-acme/src/account.rs b/proxmox-acme/src/account.rs
index 081ca986..350c78d4 100644
--- a/proxmox-acme/src/account.rs
+++ b/proxmox-acme/src/account.rs
@@ -85,7 +85,7 @@ impl Account {
method: "POST",
content_type: crate::request::JSON_CONTENT_TYPE,
body,
- expected: crate::request::CREATED,
+ expected: crate::http_status::CREATED,
};
Ok(NewOrder::new(request))
@@ -107,7 +107,7 @@ impl Account {
method: "POST",
content_type: crate::request::JSON_CONTENT_TYPE,
body,
- expected: 200,
+ expected: crate::http_status::OK,
})
}
@@ -132,7 +132,7 @@ impl Account {
method: "POST",
content_type: crate::request::JSON_CONTENT_TYPE,
body,
- expected: 200,
+ expected: crate::http_status::OK,
})
}
@@ -157,7 +157,7 @@ impl Account {
method: "POST",
content_type: crate::request::JSON_CONTENT_TYPE,
body,
- expected: 200,
+ expected: crate::http_status::OK,
})
}
@@ -408,7 +408,7 @@ impl AccountCreator {
method: "POST",
content_type: crate::request::JSON_CONTENT_TYPE,
body,
- expected: crate::request::CREATED,
+ expected: crate::http_status::CREATED,
})
}
diff --git a/proxmox-acme/src/async_client.rs b/proxmox-acme/src/async_client.rs
index 2ff3ba22..043648bb 100644
--- a/proxmox-acme/src/async_client.rs
+++ b/proxmox-acme/src/async_client.rs
@@ -498,7 +498,7 @@ impl AcmeClient {
method: "GET",
content_type: "",
body: String::new(),
- expected: 200,
+ expected: crate::http_status::OK,
},
nonce,
)
@@ -550,7 +550,7 @@ impl AcmeClient {
method: "HEAD",
content_type: "",
body: String::new(),
- expected: 200,
+ expected: crate::http_status::OK,
},
nonce,
)
diff --git a/proxmox-acme/src/lib.rs b/proxmox-acme/src/lib.rs
index 6722030c..6051a025 100644
--- a/proxmox-acme/src/lib.rs
+++ b/proxmox-acme/src/lib.rs
@@ -70,6 +70,8 @@ pub use order::Order;
#[cfg(feature = "impl")]
pub use order::NewOrder;
#[cfg(feature = "impl")]
+pub(crate) use request::http_status;
+#[cfg(feature = "impl")]
pub use request::ErrorResponse;
/// Header name for nonces.
diff --git a/proxmox-acme/src/request.rs b/proxmox-acme/src/request.rs
index dadfc5af..341ce53e 100644
--- a/proxmox-acme/src/request.rs
+++ b/proxmox-acme/src/request.rs
@@ -1,7 +1,6 @@
use serde::Deserialize;
pub(crate) const JSON_CONTENT_TYPE: &str = "application/jose+json";
-pub(crate) const CREATED: u16 = 201;
/// A request which should be performed on the ACME provider.
pub(crate) struct Request {
@@ -21,6 +20,16 @@ pub(crate) struct Request {
pub(crate) expected: u16,
}
+/// Common HTTP status codes used in ACME responses.
+pub(crate) mod http_status {
+ /// 200 OK
+ pub(crate) const OK: u16 = 200;
+ /// 201 Created
+ pub(crate) const CREATED: u16 = 201;
+ /// 204 No Content
+ pub(crate) const NO_CONTENT: u16 = 204;
+}
+
/// An ACME error response contains a specially formatted type string, and can optionally
/// contain textual details and a set of sub problems.
#[derive(Clone, Debug, Deserialize)]
--
2.47.3
* [pbs-devel] [PATCH proxmox v4 2/4] acme: reduce visibility of Request type
2025-12-03 10:22 11% [pbs-devel] [PATCH proxmox{-backup, } v4 " Samuel Rufinatscha
` (4 preceding siblings ...)
2025-12-03 10:22 17% ` [pbs-devel] [PATCH proxmox v4 1/4] acme-api: add helper to load client for an account Samuel Rufinatscha
@ 2025-12-03 10:22 12% ` Samuel Rufinatscha
2025-12-09 16:51 5% ` Max R. Carrara
2025-12-03 10:22 15% ` [pbs-devel] [PATCH proxmox v4 3/4] acme: introduce http_status module Samuel Rufinatscha
` (3 subsequent siblings)
9 siblings, 1 reply; 200+ results
From: Samuel Rufinatscha @ 2025-12-03 10:22 UTC (permalink / raw)
To: pbs-devel
Currently, the low-level ACME Request type is publicly exposed, even
though users are expected to go through AcmeClient and
proxmox-acme-api handlers. This patch reduces visibility so that
the Request type and related fields/methods are crate-internal only.
Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
---
proxmox-acme/src/account.rs | 17 ++++++++++-------
proxmox-acme/src/async_client.rs | 2 +-
proxmox-acme/src/authorization.rs | 2 +-
proxmox-acme/src/client.rs | 6 +++---
proxmox-acme/src/lib.rs | 4 ----
proxmox-acme/src/order.rs | 2 +-
proxmox-acme/src/request.rs | 12 ++++++------
7 files changed, 22 insertions(+), 23 deletions(-)
diff --git a/proxmox-acme/src/account.rs b/proxmox-acme/src/account.rs
index 0bbf0027..081ca986 100644
--- a/proxmox-acme/src/account.rs
+++ b/proxmox-acme/src/account.rs
@@ -92,7 +92,7 @@ impl Account {
}
/// Prepare a "POST-as-GET" request to fetch data. Low level helper.
- pub fn get_request(&self, url: &str, nonce: &str) -> Result<Request, Error> {
+ pub(crate) fn get_request(&self, url: &str, nonce: &str) -> Result<Request, Error> {
let key = PKey::private_key_from_pem(self.private_key.as_bytes())?;
let body = serde_json::to_string(&Jws::new_full(
&key,
@@ -112,7 +112,7 @@ impl Account {
}
/// Prepare a JSON POST request. Low level helper.
- pub fn post_request<T: Serialize>(
+ pub(crate) fn post_request<T: Serialize>(
&self,
url: &str,
nonce: &str,
@@ -179,7 +179,7 @@ impl Account {
/// Prepare a request to update account data.
///
/// This is a rather low level interface. You should know what you're doing.
- pub fn update_account_request<T: Serialize>(
+ pub(crate) fn update_account_request<T: Serialize>(
&self,
nonce: &str,
data: &T,
@@ -188,7 +188,10 @@ impl Account {
}
/// Prepare a request to deactivate this account.
- pub fn deactivate_account_request<T: Serialize>(&self, nonce: &str) -> Result<Request, Error> {
+ pub(crate) fn deactivate_account_request<T: Serialize>(
+ &self,
+ nonce: &str,
+ ) -> Result<Request, Error> {
self.post_request_raw_payload(
&self.location,
nonce,
@@ -220,7 +223,7 @@ impl Account {
///
/// This returns a raw `Request` since validation takes some time and the `Authorization`
/// object has to be re-queried and its `status` inspected.
- pub fn validate_challenge(
+ pub(crate) fn validate_challenge(
&self,
authorization: &Authorization,
challenge_index: usize,
@@ -274,7 +277,7 @@ pub struct CertificateRevocation<'a> {
impl CertificateRevocation<'_> {
/// Create the revocation request using the specified nonce for the given directory.
- pub fn request(&self, directory: &Directory, nonce: &str) -> Result<Request, Error> {
+ pub(crate) fn request(&self, directory: &Directory, nonce: &str) -> Result<Request, Error> {
let revoke_cert = directory.data.revoke_cert.as_ref().ok_or_else(|| {
Error::Custom("no 'revokeCert' URL specified by provider".to_string())
})?;
@@ -364,7 +367,7 @@ impl AccountCreator {
/// the resulting request.
/// Changing the private key between using the request and passing the response to
/// [`response`](AccountCreator::response()) will render the account unusable!
- pub fn request(&self, directory: &Directory, nonce: &str) -> Result<Request, Error> {
+ pub(crate) fn request(&self, directory: &Directory, nonce: &str) -> Result<Request, Error> {
let key = self.key.as_deref().ok_or(Error::MissingKey)?;
let url = directory.new_account_url().ok_or_else(|| {
Error::Custom("no 'newAccount' URL specified by provider".to_string())
diff --git a/proxmox-acme/src/async_client.rs b/proxmox-acme/src/async_client.rs
index dc755fb9..2ff3ba22 100644
--- a/proxmox-acme/src/async_client.rs
+++ b/proxmox-acme/src/async_client.rs
@@ -10,7 +10,7 @@ use proxmox_http::{client::Client, Body};
use crate::account::AccountCreator;
use crate::order::{Order, OrderData};
-use crate::Request as AcmeRequest;
+use crate::request::Request as AcmeRequest;
use crate::{Account, Authorization, Challenge, Directory, Error, ErrorResponse};
/// A non-blocking Acme client using tokio/hyper.
diff --git a/proxmox-acme/src/authorization.rs b/proxmox-acme/src/authorization.rs
index 28bc1b4b..765714fc 100644
--- a/proxmox-acme/src/authorization.rs
+++ b/proxmox-acme/src/authorization.rs
@@ -145,7 +145,7 @@ pub struct GetAuthorization {
/// this is guaranteed to be `Some`.
///
/// The response should be passed to the the [`response`](GetAuthorization::response()) method.
- pub request: Option<Request>,
+ pub(crate) request: Option<Request>,
}
impl GetAuthorization {
diff --git a/proxmox-acme/src/client.rs b/proxmox-acme/src/client.rs
index 931f7245..5c812567 100644
--- a/proxmox-acme/src/client.rs
+++ b/proxmox-acme/src/client.rs
@@ -7,8 +7,8 @@ use serde::{Deserialize, Serialize};
use crate::b64u;
use crate::error;
use crate::order::OrderData;
-use crate::request::ErrorResponse;
-use crate::{Account, Authorization, Challenge, Directory, Error, Order, Request};
+use crate::request::{ErrorResponse, Request};
+use crate::{Account, Authorization, Challenge, Directory, Error, Order};
macro_rules! format_err {
($($fmt:tt)*) => { Error::Client(format!($($fmt)*)) };
@@ -564,7 +564,7 @@ impl Client {
}
/// Low-level API to run an n API request. This automatically updates the current nonce!
- pub fn run_request(&mut self, request: Request) -> Result<HttpResponse, Error> {
+ pub(crate) fn run_request(&mut self, request: Request) -> Result<HttpResponse, Error> {
self.inner.run_request(request)
}
diff --git a/proxmox-acme/src/lib.rs b/proxmox-acme/src/lib.rs
index df722629..6722030c 100644
--- a/proxmox-acme/src/lib.rs
+++ b/proxmox-acme/src/lib.rs
@@ -66,10 +66,6 @@ pub use error::Error;
#[doc(inline)]
pub use order::Order;
-#[cfg(feature = "impl")]
-#[doc(inline)]
-pub use request::Request;
-
// we don't inline these:
#[cfg(feature = "impl")]
pub use order::NewOrder;
diff --git a/proxmox-acme/src/order.rs b/proxmox-acme/src/order.rs
index b6551004..432a81a4 100644
--- a/proxmox-acme/src/order.rs
+++ b/proxmox-acme/src/order.rs
@@ -153,7 +153,7 @@ pub struct NewOrder {
//order: OrderData,
/// The request to execute to place the order. When creating a [`NewOrder`] via
/// [`Account::new_order`](crate::Account::new_order) this is guaranteed to be `Some`.
- pub request: Option<Request>,
+ pub(crate) request: Option<Request>,
}
impl NewOrder {
diff --git a/proxmox-acme/src/request.rs b/proxmox-acme/src/request.rs
index 78a90913..dadfc5af 100644
--- a/proxmox-acme/src/request.rs
+++ b/proxmox-acme/src/request.rs
@@ -4,21 +4,21 @@ pub(crate) const JSON_CONTENT_TYPE: &str = "application/jose+json";
pub(crate) const CREATED: u16 = 201;
/// A request which should be performed on the ACME provider.
-pub struct Request {
+pub(crate) struct Request {
/// The complete URL to send the request to.
- pub url: String,
+ pub(crate) url: String,
/// The HTTP method name to use.
- pub method: &'static str,
+ pub(crate) method: &'static str,
/// The `Content-Type` header to pass along.
- pub content_type: &'static str,
+ pub(crate) content_type: &'static str,
/// The body to pass along with request, or an empty string.
- pub body: String,
+ pub(crate) body: String,
/// The expected status code a compliant ACME provider will return on success.
- pub expected: u16,
+ pub(crate) expected: u16,
}
/// An ACME error response contains a specially formatted type string, and can optionally
--
2.47.3
* Re: [pbs-devel] [PATCH proxmox{-backup, } 0/8] fix #6939: acme: support servers returning 204 for nonce requests
2025-12-02 15:56 12% [pbs-devel] [PATCH proxmox{-backup, } 0/8] fix #6939: acme: support servers returning 204 for nonce requests Samuel Rufinatscha
` (7 preceding siblings ...)
2025-12-02 15:56 14% ` [pbs-devel] [PATCH proxmox 4/4] fix #6939: acme: support servers returning 204 for nonce requests Samuel Rufinatscha
@ 2025-12-02 16:02 6% ` Samuel Rufinatscha
8 siblings, 0 replies; 200+ results
From: Samuel Rufinatscha @ 2025-12-02 16:02 UTC (permalink / raw)
To: pbs-devel
Ignore this please, forgot to add the version in the subject.
Will send a new one.
* [pbs-devel] [PATCH proxmox 1/4] acme: reduce visibility of Request type
2025-12-02 15:56 12% [pbs-devel] [PATCH proxmox{-backup, } 0/8] fix #6939: acme: support servers returning 204 for nonce requests Samuel Rufinatscha
` (3 preceding siblings ...)
2025-12-02 15:56 7% ` [pbs-devel] [PATCH proxmox-backup 4/4] acme: certificate ordering through proxmox-acme-api Samuel Rufinatscha
@ 2025-12-02 15:56 12% ` Samuel Rufinatscha
2025-12-02 15:56 15% ` [pbs-devel] [PATCH proxmox 2/4] acme: introduce http_status module Samuel Rufinatscha
` (3 subsequent siblings)
8 siblings, 0 replies; 200+ results
From: Samuel Rufinatscha @ 2025-12-02 15:56 UTC (permalink / raw)
To: pbs-devel
Currently, the low-level ACME Request type is publicly exposed, even
though users are expected to go through AcmeClient and
proxmox-acme-api handlers. This patch reduces visibility so that
the Request type and related fields/methods are crate-internal only.
Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
---
proxmox-acme/src/account.rs | 17 ++++++++++-------
proxmox-acme/src/async_client.rs | 2 +-
proxmox-acme/src/authorization.rs | 2 +-
proxmox-acme/src/client.rs | 6 +++---
proxmox-acme/src/lib.rs | 4 ----
proxmox-acme/src/order.rs | 2 +-
proxmox-acme/src/request.rs | 12 ++++++------
7 files changed, 22 insertions(+), 23 deletions(-)
diff --git a/proxmox-acme/src/account.rs b/proxmox-acme/src/account.rs
index 0bbf0027..081ca986 100644
--- a/proxmox-acme/src/account.rs
+++ b/proxmox-acme/src/account.rs
@@ -92,7 +92,7 @@ impl Account {
}
/// Prepare a "POST-as-GET" request to fetch data. Low level helper.
- pub fn get_request(&self, url: &str, nonce: &str) -> Result<Request, Error> {
+ pub(crate) fn get_request(&self, url: &str, nonce: &str) -> Result<Request, Error> {
let key = PKey::private_key_from_pem(self.private_key.as_bytes())?;
let body = serde_json::to_string(&Jws::new_full(
&key,
@@ -112,7 +112,7 @@ impl Account {
}
/// Prepare a JSON POST request. Low level helper.
- pub fn post_request<T: Serialize>(
+ pub(crate) fn post_request<T: Serialize>(
&self,
url: &str,
nonce: &str,
@@ -179,7 +179,7 @@ impl Account {
/// Prepare a request to update account data.
///
/// This is a rather low level interface. You should know what you're doing.
- pub fn update_account_request<T: Serialize>(
+ pub(crate) fn update_account_request<T: Serialize>(
&self,
nonce: &str,
data: &T,
@@ -188,7 +188,10 @@ impl Account {
}
/// Prepare a request to deactivate this account.
- pub fn deactivate_account_request<T: Serialize>(&self, nonce: &str) -> Result<Request, Error> {
+ pub(crate) fn deactivate_account_request<T: Serialize>(
+ &self,
+ nonce: &str,
+ ) -> Result<Request, Error> {
self.post_request_raw_payload(
&self.location,
nonce,
@@ -220,7 +223,7 @@ impl Account {
///
/// This returns a raw `Request` since validation takes some time and the `Authorization`
/// object has to be re-queried and its `status` inspected.
- pub fn validate_challenge(
+ pub(crate) fn validate_challenge(
&self,
authorization: &Authorization,
challenge_index: usize,
@@ -274,7 +277,7 @@ pub struct CertificateRevocation<'a> {
impl CertificateRevocation<'_> {
/// Create the revocation request using the specified nonce for the given directory.
- pub fn request(&self, directory: &Directory, nonce: &str) -> Result<Request, Error> {
+ pub(crate) fn request(&self, directory: &Directory, nonce: &str) -> Result<Request, Error> {
let revoke_cert = directory.data.revoke_cert.as_ref().ok_or_else(|| {
Error::Custom("no 'revokeCert' URL specified by provider".to_string())
})?;
@@ -364,7 +367,7 @@ impl AccountCreator {
/// the resulting request.
/// Changing the private key between using the request and passing the response to
/// [`response`](AccountCreator::response()) will render the account unusable!
- pub fn request(&self, directory: &Directory, nonce: &str) -> Result<Request, Error> {
+ pub(crate) fn request(&self, directory: &Directory, nonce: &str) -> Result<Request, Error> {
let key = self.key.as_deref().ok_or(Error::MissingKey)?;
let url = directory.new_account_url().ok_or_else(|| {
Error::Custom("no 'newAccount' URL specified by provider".to_string())
diff --git a/proxmox-acme/src/async_client.rs b/proxmox-acme/src/async_client.rs
index dc755fb9..2ff3ba22 100644
--- a/proxmox-acme/src/async_client.rs
+++ b/proxmox-acme/src/async_client.rs
@@ -10,7 +10,7 @@ use proxmox_http::{client::Client, Body};
use crate::account::AccountCreator;
use crate::order::{Order, OrderData};
-use crate::Request as AcmeRequest;
+use crate::request::Request as AcmeRequest;
use crate::{Account, Authorization, Challenge, Directory, Error, ErrorResponse};
/// A non-blocking Acme client using tokio/hyper.
diff --git a/proxmox-acme/src/authorization.rs b/proxmox-acme/src/authorization.rs
index 28bc1b4b..765714fc 100644
--- a/proxmox-acme/src/authorization.rs
+++ b/proxmox-acme/src/authorization.rs
@@ -145,7 +145,7 @@ pub struct GetAuthorization {
/// this is guaranteed to be `Some`.
///
/// The response should be passed to the the [`response`](GetAuthorization::response()) method.
- pub request: Option<Request>,
+ pub(crate) request: Option<Request>,
}
impl GetAuthorization {
diff --git a/proxmox-acme/src/client.rs b/proxmox-acme/src/client.rs
index 931f7245..5c812567 100644
--- a/proxmox-acme/src/client.rs
+++ b/proxmox-acme/src/client.rs
@@ -7,8 +7,8 @@ use serde::{Deserialize, Serialize};
use crate::b64u;
use crate::error;
use crate::order::OrderData;
-use crate::request::ErrorResponse;
-use crate::{Account, Authorization, Challenge, Directory, Error, Order, Request};
+use crate::request::{ErrorResponse, Request};
+use crate::{Account, Authorization, Challenge, Directory, Error, Order};
macro_rules! format_err {
($($fmt:tt)*) => { Error::Client(format!($($fmt)*)) };
@@ -564,7 +564,7 @@ impl Client {
}
/// Low-level API to run an n API request. This automatically updates the current nonce!
- pub fn run_request(&mut self, request: Request) -> Result<HttpResponse, Error> {
+ pub(crate) fn run_request(&mut self, request: Request) -> Result<HttpResponse, Error> {
self.inner.run_request(request)
}
diff --git a/proxmox-acme/src/lib.rs b/proxmox-acme/src/lib.rs
index df722629..6722030c 100644
--- a/proxmox-acme/src/lib.rs
+++ b/proxmox-acme/src/lib.rs
@@ -66,10 +66,6 @@ pub use error::Error;
#[doc(inline)]
pub use order::Order;
-#[cfg(feature = "impl")]
-#[doc(inline)]
-pub use request::Request;
-
// we don't inline these:
#[cfg(feature = "impl")]
pub use order::NewOrder;
diff --git a/proxmox-acme/src/order.rs b/proxmox-acme/src/order.rs
index b6551004..432a81a4 100644
--- a/proxmox-acme/src/order.rs
+++ b/proxmox-acme/src/order.rs
@@ -153,7 +153,7 @@ pub struct NewOrder {
//order: OrderData,
/// The request to execute to place the order. When creating a [`NewOrder`] via
/// [`Account::new_order`](crate::Account::new_order) this is guaranteed to be `Some`.
- pub request: Option<Request>,
+ pub(crate) request: Option<Request>,
}
impl NewOrder {
diff --git a/proxmox-acme/src/request.rs b/proxmox-acme/src/request.rs
index 78a90913..dadfc5af 100644
--- a/proxmox-acme/src/request.rs
+++ b/proxmox-acme/src/request.rs
@@ -4,21 +4,21 @@ pub(crate) const JSON_CONTENT_TYPE: &str = "application/jose+json";
pub(crate) const CREATED: u16 = 201;
/// A request which should be performed on the ACME provider.
-pub struct Request {
+pub(crate) struct Request {
/// The complete URL to send the request to.
- pub url: String,
+ pub(crate) url: String,
/// The HTTP method name to use.
- pub method: &'static str,
+ pub(crate) method: &'static str,
/// The `Content-Type` header to pass along.
- pub content_type: &'static str,
+ pub(crate) content_type: &'static str,
/// The body to pass along with request, or an empty string.
- pub body: String,
+ pub(crate) body: String,
/// The expected status code a compliant ACME provider will return on success.
- pub expected: u16,
+ pub(crate) expected: u16,
}
/// An ACME error response contains a specially formatted type string, and can optionally
--
2.47.3
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
* [pbs-devel] [PATCH proxmox 4/4] fix #6939: acme: support servers returning 204 for nonce requests
2025-12-02 15:56 12% [pbs-devel] [PATCH proxmox{-backup, } 0/8] fix #6939: acme: support servers returning 204 for nonce requests Samuel Rufinatscha
` (6 preceding siblings ...)
2025-12-02 15:56 17% ` [pbs-devel] [PATCH proxmox 3/4] acme-api: add helper to load client for an account Samuel Rufinatscha
@ 2025-12-02 15:56 14% ` Samuel Rufinatscha
2025-12-02 16:02 6% ` [pbs-devel] [PATCH proxmox{-backup, } 0/8] " Samuel Rufinatscha
8 siblings, 0 replies; 200+ results
From: Samuel Rufinatscha @ 2025-12-02 15:56 UTC (permalink / raw)
To: pbs-devel
Some ACME servers (notably custom or legacy implementations) respond
to HEAD /newNonce with 204 No Content instead of the 200 OK
recommended by RFC 8555 [1]. While this behavior deviates from the
recommendation, it is not strictly forbidden. The issue was reported
on our bug tracker [2].
The previous implementation treated any non-200 response as an error,
causing account registration to fail against such servers. Relax the
status-code check so that each request carries a set of acceptable
success codes, and accept both 200 and 204 for nonce requests; this
also leaves room to accept further 2xx codes later, improving
interoperability.
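The relaxed check boils down to replacing a single expected status code with a slice lookup. A minimal self-contained sketch, using a stand-in Request type rather than the actual proxmox-acme definition:

```rust
// Stand-in for the request type; the real one lives in proxmox-acme/src/request.rs.
struct Request {
    // Previously a single `u16`; now the set of acceptable success codes.
    expected: &'static [u16],
}

// Sketch of the relaxed status check performed on a successful HTTP response.
fn check_status(request: &Request, status: u16) -> Result<(), String> {
    if request.expected.contains(&status) {
        Ok(())
    } else {
        Err(format!("unexpected status code: {status}"))
    }
}
```

With `expected: &[200, 204]` for nonce requests, both a spec-recommended 200 and an off-spec 204 pass, while anything else still errors out.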
Note: In comparison, PVE’s Perl ACME client performs a GET request [3]
instead of a HEAD request and accepts any 2xx success code when
retrieving the nonce [4]. This difference in behavior does not affect
functionality but is worth noting for consistency across
implementations.
[1] https://datatracker.ietf.org/doc/html/rfc8555/#section-7.2
[2] https://bugzilla.proxmox.com/show_bug.cgi?id=6939
[3] https://git.proxmox.com/?p=proxmox-acme.git;a=blob;f=src/PVE/ACME.pm;h=f1e9bb7d316e3cea1e376c610b0479119217aecc;hb=HEAD#l219
[4] https://git.proxmox.com/?p=proxmox-acme.git;a=blob;f=src/PVE/ACME.pm;h=f1e9bb7d316e3cea1e376c610b0479119217aecc;hb=HEAD#l597
Fixes: #6939
Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
---
proxmox-acme/src/account.rs | 10 +++++-----
proxmox-acme/src/async_client.rs | 6 +++---
proxmox-acme/src/client.rs | 2 +-
proxmox-acme/src/request.rs | 4 ++--
4 files changed, 11 insertions(+), 11 deletions(-)
diff --git a/proxmox-acme/src/account.rs b/proxmox-acme/src/account.rs
index 350c78d4..820b209d 100644
--- a/proxmox-acme/src/account.rs
+++ b/proxmox-acme/src/account.rs
@@ -85,7 +85,7 @@ impl Account {
method: "POST",
content_type: crate::request::JSON_CONTENT_TYPE,
body,
- expected: crate::http_status::CREATED,
+ expected: &[crate::http_status::CREATED],
};
Ok(NewOrder::new(request))
@@ -107,7 +107,7 @@ impl Account {
method: "POST",
content_type: crate::request::JSON_CONTENT_TYPE,
body,
- expected: crate::http_status::OK,
+ expected: &[crate::http_status::OK],
})
}
@@ -132,7 +132,7 @@ impl Account {
method: "POST",
content_type: crate::request::JSON_CONTENT_TYPE,
body,
- expected: crate::http_status::OK,
+ expected: &[crate::http_status::OK],
})
}
@@ -157,7 +157,7 @@ impl Account {
method: "POST",
content_type: crate::request::JSON_CONTENT_TYPE,
body,
- expected: crate::http_status::OK,
+ expected: &[crate::http_status::OK],
})
}
@@ -408,7 +408,7 @@ impl AccountCreator {
method: "POST",
content_type: crate::request::JSON_CONTENT_TYPE,
body,
- expected: crate::http_status::CREATED,
+ expected: &[crate::http_status::CREATED],
})
}
diff --git a/proxmox-acme/src/async_client.rs b/proxmox-acme/src/async_client.rs
index 043648bb..07da842c 100644
--- a/proxmox-acme/src/async_client.rs
+++ b/proxmox-acme/src/async_client.rs
@@ -420,7 +420,7 @@ impl AcmeClient {
};
if parts.status.is_success() {
- if status != request.expected {
+ if !request.expected.contains(&status) {
return Err(Error::InvalidApi(format!(
"ACME server responded with unexpected status code: {:?}",
parts.status
@@ -498,7 +498,7 @@ impl AcmeClient {
method: "GET",
content_type: "",
body: String::new(),
- expected: crate::http_status::OK,
+ expected: &[crate::http_status::OK],
},
nonce,
)
@@ -550,7 +550,7 @@ impl AcmeClient {
method: "HEAD",
content_type: "",
body: String::new(),
- expected: crate::http_status::OK,
+ expected: &[crate::http_status::OK, crate::http_status::NO_CONTENT],
},
nonce,
)
diff --git a/proxmox-acme/src/client.rs b/proxmox-acme/src/client.rs
index 5c812567..af250fb8 100644
--- a/proxmox-acme/src/client.rs
+++ b/proxmox-acme/src/client.rs
@@ -203,7 +203,7 @@ impl Inner {
let got_nonce = self.update_nonce(&mut response)?;
if response.is_success() {
- if response.status != request.expected {
+ if !request.expected.contains(&response.status) {
return Err(Error::InvalidApi(format!(
"API server responded with unexpected status code: {:?}",
response.status
diff --git a/proxmox-acme/src/request.rs b/proxmox-acme/src/request.rs
index 341ce53e..d782a7de 100644
--- a/proxmox-acme/src/request.rs
+++ b/proxmox-acme/src/request.rs
@@ -16,8 +16,8 @@ pub(crate) struct Request {
/// The body to pass along with request, or an empty string.
pub(crate) body: String,
- /// The expected status code a compliant ACME provider will return on success.
- pub(crate) expected: u16,
+ /// The set of HTTP status codes that indicate a successful response from an ACME provider.
+ pub(crate) expected: &'static [u16],
}
/// Common HTTP status codes used in ACME responses.
--
2.47.3
* [pbs-devel] [PATCH proxmox 3/4] acme-api: add helper to load client for an account
2025-12-02 15:56 12% [pbs-devel] [PATCH proxmox{-backup, } 0/8] fix #6939: acme: support servers returning 204 for nonce requests Samuel Rufinatscha
` (5 preceding siblings ...)
2025-12-02 15:56 15% ` [pbs-devel] [PATCH proxmox 2/4] acme: introduce http_status module Samuel Rufinatscha
@ 2025-12-02 15:56 17% ` Samuel Rufinatscha
2025-12-02 15:56 14% ` [pbs-devel] [PATCH proxmox 4/4] fix #6939: acme: support servers returning 204 for nonce requests Samuel Rufinatscha
2025-12-02 16:02 6% ` [pbs-devel] [PATCH proxmox{-backup, } 0/8] " Samuel Rufinatscha
8 siblings, 0 replies; 200+ results
From: Samuel Rufinatscha @ 2025-12-02 15:56 UTC (permalink / raw)
To: pbs-devel
The PBS ACME refactoring needs a simple way to obtain an AcmeClient for
a given configured account without duplicating config wiring. This patch
adds a load_client_with_account helper to proxmox-acme-api that loads
the account config and constructs a matching client, similar to PBS's
previous AcmeClient::load() function.
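The shape of the helper is load-config-then-construct. A sketch with stand-in types (the real helper is async and uses proxmox-acme-api's account config together with proxmox-acme's AcmeClient; the field names below are illustrative only):

```rust
// Stand-in client type; the real one is proxmox_acme::async_client::AcmeClient.
struct AcmeClient {
    account_location: String,
}

// Stand-in for the on-disk account data.
struct AccountData {
    location: String,
}

impl AccountData {
    // Construct a client bound to this account, as the real AccountData::client() does.
    fn client(&self) -> AcmeClient {
        AcmeClient {
            account_location: self.location.clone(),
        }
    }
}

fn load_account_config(name: &str) -> Result<AccountData, String> {
    // The real helper reads the per-account config from disk here.
    Ok(AccountData {
        location: format!("/etc/acme/accounts/{name}"),
    })
}

// The helper itself: load the account, hand back a matching client.
fn load_client_with_account(name: &str) -> Result<AcmeClient, String> {
    let account_data = load_account_config(name)?;
    Ok(account_data.client())
}
```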
Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
---
proxmox-acme-api/src/account_api_impl.rs | 5 +++++
proxmox-acme-api/src/lib.rs | 3 ++-
2 files changed, 7 insertions(+), 1 deletion(-)
diff --git a/proxmox-acme-api/src/account_api_impl.rs b/proxmox-acme-api/src/account_api_impl.rs
index ef195908..ca8c8655 100644
--- a/proxmox-acme-api/src/account_api_impl.rs
+++ b/proxmox-acme-api/src/account_api_impl.rs
@@ -116,3 +116,8 @@ pub async fn update_account(name: &AcmeAccountName, contact: Option<String>) ->
Ok(())
}
+
+pub async fn load_client_with_account(account_name: &AcmeAccountName) -> Result<AcmeClient, Error> {
+ let account_data = super::account_config::load_account_config(&account_name).await?;
+ Ok(account_data.client())
+}
diff --git a/proxmox-acme-api/src/lib.rs b/proxmox-acme-api/src/lib.rs
index 623e9e23..96f88ae2 100644
--- a/proxmox-acme-api/src/lib.rs
+++ b/proxmox-acme-api/src/lib.rs
@@ -31,7 +31,8 @@ mod plugin_config;
mod account_api_impl;
#[cfg(feature = "impl")]
pub use account_api_impl::{
- deactivate_account, get_account, get_tos, list_accounts, register_account, update_account,
+ deactivate_account, get_account, get_tos, list_accounts, load_client_with_account,
+ register_account, update_account,
};
#[cfg(feature = "impl")]
--
2.47.3
* [pbs-devel] [PATCH proxmox 2/4] acme: introduce http_status module
2025-12-02 15:56 12% [pbs-devel] [PATCH proxmox{-backup, } 0/8] fix #6939: acme: support servers returning 204 for nonce requests Samuel Rufinatscha
` (4 preceding siblings ...)
2025-12-02 15:56 12% ` [pbs-devel] [PATCH proxmox 1/4] acme: reduce visibility of Request type Samuel Rufinatscha
@ 2025-12-02 15:56 15% ` Samuel Rufinatscha
2025-12-02 15:56 17% ` [pbs-devel] [PATCH proxmox 3/4] acme-api: add helper to load client for an account Samuel Rufinatscha
` (2 subsequent siblings)
8 siblings, 0 replies; 200+ results
From: Samuel Rufinatscha @ 2025-12-02 15:56 UTC (permalink / raw)
To: pbs-devel
Introduce an internal http_status module containing the HTTP response
codes commonly used by ACME, and replace both crate::request::CREATED
and direct numeric status codes with the new named constants.
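The module itself is just a handful of named constants, matching standard HTTP semantics. A minimal sketch of its shape (the call-site function below is illustrative, not from the patch):

```rust
// Named constants replace magic numbers like `expected: 201` at call sites.
mod http_status {
    /// 200 OK
    pub const OK: u16 = 200;
    /// 201 Created
    pub const CREATED: u16 = 201;
    /// 204 No Content
    pub const NO_CONTENT: u16 = 204;
}

// Illustrative call site: a new-order request expects 201 Created on success.
fn expected_for_new_order() -> u16 {
    http_status::CREATED
}
```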
Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
---
proxmox-acme/src/account.rs | 10 +++++-----
proxmox-acme/src/async_client.rs | 4 ++--
proxmox-acme/src/lib.rs | 2 ++
proxmox-acme/src/request.rs | 11 ++++++++++-
4 files changed, 19 insertions(+), 8 deletions(-)
diff --git a/proxmox-acme/src/account.rs b/proxmox-acme/src/account.rs
index 081ca986..350c78d4 100644
--- a/proxmox-acme/src/account.rs
+++ b/proxmox-acme/src/account.rs
@@ -85,7 +85,7 @@ impl Account {
method: "POST",
content_type: crate::request::JSON_CONTENT_TYPE,
body,
- expected: crate::request::CREATED,
+ expected: crate::http_status::CREATED,
};
Ok(NewOrder::new(request))
@@ -107,7 +107,7 @@ impl Account {
method: "POST",
content_type: crate::request::JSON_CONTENT_TYPE,
body,
- expected: 200,
+ expected: crate::http_status::OK,
})
}
@@ -132,7 +132,7 @@ impl Account {
method: "POST",
content_type: crate::request::JSON_CONTENT_TYPE,
body,
- expected: 200,
+ expected: crate::http_status::OK,
})
}
@@ -157,7 +157,7 @@ impl Account {
method: "POST",
content_type: crate::request::JSON_CONTENT_TYPE,
body,
- expected: 200,
+ expected: crate::http_status::OK,
})
}
@@ -408,7 +408,7 @@ impl AccountCreator {
method: "POST",
content_type: crate::request::JSON_CONTENT_TYPE,
body,
- expected: crate::request::CREATED,
+ expected: crate::http_status::CREATED,
})
}
diff --git a/proxmox-acme/src/async_client.rs b/proxmox-acme/src/async_client.rs
index 2ff3ba22..043648bb 100644
--- a/proxmox-acme/src/async_client.rs
+++ b/proxmox-acme/src/async_client.rs
@@ -498,7 +498,7 @@ impl AcmeClient {
method: "GET",
content_type: "",
body: String::new(),
- expected: 200,
+ expected: crate::http_status::OK,
},
nonce,
)
@@ -550,7 +550,7 @@ impl AcmeClient {
method: "HEAD",
content_type: "",
body: String::new(),
- expected: 200,
+ expected: crate::http_status::OK,
},
nonce,
)
diff --git a/proxmox-acme/src/lib.rs b/proxmox-acme/src/lib.rs
index 6722030c..6051a025 100644
--- a/proxmox-acme/src/lib.rs
+++ b/proxmox-acme/src/lib.rs
@@ -70,6 +70,8 @@ pub use order::Order;
#[cfg(feature = "impl")]
pub use order::NewOrder;
#[cfg(feature = "impl")]
+pub(crate) use request::http_status;
+#[cfg(feature = "impl")]
pub use request::ErrorResponse;
/// Header name for nonces.
diff --git a/proxmox-acme/src/request.rs b/proxmox-acme/src/request.rs
index dadfc5af..341ce53e 100644
--- a/proxmox-acme/src/request.rs
+++ b/proxmox-acme/src/request.rs
@@ -1,7 +1,6 @@
use serde::Deserialize;
pub(crate) const JSON_CONTENT_TYPE: &str = "application/jose+json";
-pub(crate) const CREATED: u16 = 201;
/// A request which should be performed on the ACME provider.
pub(crate) struct Request {
@@ -21,6 +20,16 @@ pub(crate) struct Request {
pub(crate) expected: u16,
}
+/// Common HTTP status codes used in ACME responses.
+pub(crate) mod http_status {
+ /// 200 OK
+ pub(crate) const OK: u16 = 200;
+ /// 201 Created
+ pub(crate) const CREATED: u16 = 201;
+ /// 204 No Content
+ pub(crate) const NO_CONTENT: u16 = 204;
+}
+
/// An ACME error response contains a specially formatted type string, and can optionally
/// contain textual details and a set of sub problems.
#[derive(Clone, Debug, Deserialize)]
--
2.47.3
* [pbs-devel] [PATCH proxmox-backup 4/4] acme: certificate ordering through proxmox-acme-api
2025-12-02 15:56 12% [pbs-devel] [PATCH proxmox{-backup, } 0/8] fix #6939: acme: support servers returning 204 for nonce requests Samuel Rufinatscha
` (2 preceding siblings ...)
2025-12-02 15:56 8% ` [pbs-devel] [PATCH proxmox-backup 3/4] acme: change API impls to use proxmox-acme-api handlers Samuel Rufinatscha
@ 2025-12-02 15:56 7% ` Samuel Rufinatscha
2025-12-02 15:56 12% ` [pbs-devel] [PATCH proxmox 1/4] acme: reduce visibility of Request type Samuel Rufinatscha
` (4 subsequent siblings)
8 siblings, 0 replies; 200+ results
From: Samuel Rufinatscha @ 2025-12-02 15:56 UTC (permalink / raw)
To: pbs-devel
PBS currently uses its own ACME client and API logic, while PDM uses the
factored out proxmox-acme and proxmox-acme-api crates. This duplication
risks differences in behaviour and requires ACME maintenance in two
places. This patch is part of a series to move PBS over to the shared
ACME stack.
Changes:
- Replace the custom ACME order/authorization loop in the node
certificates API with a call to proxmox_acme_api::order_certificate.
- Build domain and config data as proxmox-acme-api types.
- Remove the now-obsolete local ACME ordering and plugin glue code.
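One detail the removed ordering code handled before placing an order is lowercasing each configured domain and alias, so lookups by identifier match case-insensitively. A self-contained sketch with a stand-in AcmeDomain type (the real one comes from proxmox-acme-api):

```rust
// Stand-in for proxmox_acme_api::AcmeDomain.
#[derive(Debug, PartialEq)]
struct AcmeDomain {
    domain: String,
    alias: Option<String>,
}

// Mirror of the normalization done in the removed order_certificate code:
// lowercase the domain and, if present, its alias.
fn normalize(mut d: AcmeDomain) -> AcmeDomain {
    d.domain.make_ascii_lowercase();
    if let Some(alias) = &mut d.alias {
        alias.make_ascii_lowercase();
    }
    d
}
```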
Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
---
src/acme/mod.rs | 2 -
src/acme/plugin.rs | 336 ----------------------------------
src/api2/node/certificates.rs | 240 ++++--------------------
src/api2/types/acme.rs | 74 --------
src/api2/types/mod.rs | 3 -
src/config/acme/mod.rs | 7 +-
src/config/acme/plugin.rs | 99 +---------
src/config/node.rs | 22 +--
src/lib.rs | 2 -
9 files changed, 46 insertions(+), 739 deletions(-)
delete mode 100644 src/acme/mod.rs
delete mode 100644 src/acme/plugin.rs
delete mode 100644 src/api2/types/acme.rs
diff --git a/src/acme/mod.rs b/src/acme/mod.rs
deleted file mode 100644
index cc561f9a..00000000
--- a/src/acme/mod.rs
+++ /dev/null
@@ -1,2 +0,0 @@
-pub(crate) mod plugin;
-pub(crate) use plugin::get_acme_plugin;
diff --git a/src/acme/plugin.rs b/src/acme/plugin.rs
deleted file mode 100644
index 5bc09e1f..00000000
--- a/src/acme/plugin.rs
+++ /dev/null
@@ -1,336 +0,0 @@
-use std::future::Future;
-use std::net::{IpAddr, SocketAddr};
-use std::pin::Pin;
-use std::process::Stdio;
-use std::sync::Arc;
-use std::time::Duration;
-
-use anyhow::{bail, format_err, Error};
-use bytes::Bytes;
-use futures::TryFutureExt;
-use http_body_util::Full;
-use hyper::body::Incoming;
-use hyper::server::conn::http1;
-use hyper::service::service_fn;
-use hyper::{Request, Response};
-use hyper_util::rt::TokioIo;
-use tokio::io::{AsyncBufReadExt, AsyncRead, AsyncWriteExt, BufReader};
-use tokio::net::TcpListener;
-use tokio::process::Command;
-
-use proxmox_acme::{Authorization, Challenge};
-
-use crate::api2::types::AcmeDomain;
-use proxmox_acme::async_client::AcmeClient;
-use proxmox_rest_server::WorkerTask;
-
-use crate::config::acme::plugin::{DnsPlugin, PluginData};
-
-const PROXMOX_ACME_SH_PATH: &str = "/usr/share/proxmox-acme/proxmox-acme";
-
-pub(crate) fn get_acme_plugin(
- plugin_data: &PluginData,
- name: &str,
-) -> Result<Option<Box<dyn AcmePlugin + Send + Sync + 'static>>, Error> {
- let (ty, data) = match plugin_data.get(name) {
- Some(plugin) => plugin,
- None => return Ok(None),
- };
-
- Ok(Some(match ty.as_str() {
- "dns" => {
- let plugin: DnsPlugin = serde::Deserialize::deserialize(data)?;
- Box::new(plugin)
- }
- "standalone" => {
- // this one has no config
- Box::<StandaloneServer>::default()
- }
- other => bail!("missing implementation for plugin type '{}'", other),
- }))
-}
-
-pub(crate) trait AcmePlugin {
- /// Setup everything required to trigger the validation and return the corresponding validation
- /// URL.
- fn setup<'fut, 'a: 'fut, 'b: 'fut, 'c: 'fut, 'd: 'fut>(
- &'a mut self,
- client: &'b mut AcmeClient,
- authorization: &'c Authorization,
- domain: &'d AcmeDomain,
- task: Arc<WorkerTask>,
- ) -> Pin<Box<dyn Future<Output = Result<&'c str, Error>> + Send + 'fut>>;
-
- fn teardown<'fut, 'a: 'fut, 'b: 'fut, 'c: 'fut, 'd: 'fut>(
- &'a mut self,
- client: &'b mut AcmeClient,
- authorization: &'c Authorization,
- domain: &'d AcmeDomain,
- task: Arc<WorkerTask>,
- ) -> Pin<Box<dyn Future<Output = Result<(), Error>> + Send + 'fut>>;
-}
-
-fn extract_challenge<'a>(
- authorization: &'a Authorization,
- ty: &str,
-) -> Result<&'a Challenge, Error> {
- authorization
- .challenges
- .iter()
- .find(|ch| ch.ty == ty)
- .ok_or_else(|| format_err!("no supported challenge type ({}) found", ty))
-}
-
-async fn pipe_to_tasklog<T: AsyncRead + Unpin>(
- pipe: T,
- task: Arc<WorkerTask>,
-) -> Result<(), std::io::Error> {
- let mut pipe = BufReader::new(pipe);
- let mut line = String::new();
- loop {
- line.clear();
- match pipe.read_line(&mut line).await {
- Ok(0) => return Ok(()),
- Ok(_) => task.log_message(line.as_str()),
- Err(err) => return Err(err),
- }
- }
-}
-
-impl DnsPlugin {
- async fn action<'a>(
- &self,
- client: &mut AcmeClient,
- authorization: &'a Authorization,
- domain: &AcmeDomain,
- task: Arc<WorkerTask>,
- action: &str,
- ) -> Result<&'a str, Error> {
- let challenge = extract_challenge(authorization, "dns-01")?;
- let mut stdin_data = client
- .dns_01_txt_value(
- challenge
- .token()
- .ok_or_else(|| format_err!("missing token in challenge"))?,
- )?
- .into_bytes();
- stdin_data.push(b'\n');
- stdin_data.extend(self.data.as_bytes());
- if stdin_data.last() != Some(&b'\n') {
- stdin_data.push(b'\n');
- }
-
- let mut command = Command::new("/usr/bin/setpriv");
-
- #[rustfmt::skip]
- command.args([
- "--reuid", "nobody",
- "--regid", "nogroup",
- "--clear-groups",
- "--reset-env",
- "--",
- "/bin/bash",
- PROXMOX_ACME_SH_PATH,
- action,
- &self.core.api,
- domain.alias.as_deref().unwrap_or(&domain.domain),
- ]);
-
- // We could use 1 socketpair, but tokio wraps them all in `File` internally causing `close`
- // to be called separately on all of them without exception, so we need 3 pipes :-(
-
- let mut child = command
- .stdin(Stdio::piped())
- .stdout(Stdio::piped())
- .stderr(Stdio::piped())
- .spawn()?;
-
- let mut stdin = child.stdin.take().expect("Stdio::piped()");
- let stdout = child.stdout.take().expect("Stdio::piped() failed?");
- let stdout = pipe_to_tasklog(stdout, Arc::clone(&task));
- let stderr = child.stderr.take().expect("Stdio::piped() failed?");
- let stderr = pipe_to_tasklog(stderr, Arc::clone(&task));
- let stdin = async move {
- stdin.write_all(&stdin_data).await?;
- stdin.flush().await?;
- Ok::<_, std::io::Error>(())
- };
- match futures::try_join!(stdin, stdout, stderr) {
- Ok(((), (), ())) => (),
- Err(err) => {
- if let Err(err) = child.kill().await {
- task.log_message(format!(
- "failed to kill '{PROXMOX_ACME_SH_PATH} {action}' command: {err}"
- ));
- }
- bail!("'{}' failed: {}", PROXMOX_ACME_SH_PATH, err);
- }
- }
-
- let status = child.wait().await?;
- if !status.success() {
- bail!(
- "'{} {}' exited with error ({})",
- PROXMOX_ACME_SH_PATH,
- action,
- status.code().unwrap_or(-1)
- );
- }
-
- Ok(&challenge.url)
- }
-}
-
-impl AcmePlugin for DnsPlugin {
- fn setup<'fut, 'a: 'fut, 'b: 'fut, 'c: 'fut, 'd: 'fut>(
- &'a mut self,
- client: &'b mut AcmeClient,
- authorization: &'c Authorization,
- domain: &'d AcmeDomain,
- task: Arc<WorkerTask>,
- ) -> Pin<Box<dyn Future<Output = Result<&'c str, Error>> + Send + 'fut>> {
- Box::pin(async move {
- let result = self
- .action(client, authorization, domain, task.clone(), "setup")
- .await;
-
- let validation_delay = self.core.validation_delay.unwrap_or(30) as u64;
- if validation_delay > 0 {
- task.log_message(format!(
- "Sleeping {validation_delay} seconds to wait for TXT record propagation"
- ));
- tokio::time::sleep(Duration::from_secs(validation_delay)).await;
- }
- result
- })
- }
-
- fn teardown<'fut, 'a: 'fut, 'b: 'fut, 'c: 'fut, 'd: 'fut>(
- &'a mut self,
- client: &'b mut AcmeClient,
- authorization: &'c Authorization,
- domain: &'d AcmeDomain,
- task: Arc<WorkerTask>,
- ) -> Pin<Box<dyn Future<Output = Result<(), Error>> + Send + 'fut>> {
- Box::pin(async move {
- self.action(client, authorization, domain, task, "teardown")
- .await
- .map(drop)
- })
- }
-}
-
-#[derive(Default)]
-struct StandaloneServer {
- abort_handle: Option<futures::future::AbortHandle>,
-}
-
-// In case the "order_certificates" future gets dropped between setup & teardown, let's also cancel
-// the HTTP listener on Drop:
-impl Drop for StandaloneServer {
- fn drop(&mut self) {
- self.stop();
- }
-}
-
-impl StandaloneServer {
- fn stop(&mut self) {
- if let Some(abort) = self.abort_handle.take() {
- abort.abort();
- }
- }
-}
-
-async fn standalone_respond(
- req: Request<Incoming>,
- path: Arc<String>,
- key_auth: Arc<String>,
-) -> Result<Response<Full<Bytes>>, hyper::Error> {
- if req.method() == hyper::Method::GET && req.uri().path() == path.as_str() {
- Ok(Response::builder()
- .status(hyper::http::StatusCode::OK)
- .body(key_auth.as_bytes().to_vec().into())
- .unwrap())
- } else {
- Ok(Response::builder()
- .status(hyper::http::StatusCode::NOT_FOUND)
- .body("Not found.".into())
- .unwrap())
- }
-}
-
-impl AcmePlugin for StandaloneServer {
- fn setup<'fut, 'a: 'fut, 'b: 'fut, 'c: 'fut, 'd: 'fut>(
- &'a mut self,
- client: &'b mut AcmeClient,
- authorization: &'c Authorization,
- _domain: &'d AcmeDomain,
- _task: Arc<WorkerTask>,
- ) -> Pin<Box<dyn Future<Output = Result<&'c str, Error>> + Send + 'fut>> {
- Box::pin(async move {
- self.stop();
-
- let challenge = extract_challenge(authorization, "http-01")?;
- let token = challenge
- .token()
- .ok_or_else(|| format_err!("missing token in challenge"))?;
- let key_auth = Arc::new(client.key_authorization(token)?);
- let path = Arc::new(format!("/.well-known/acme-challenge/{token}"));
-
- // `[::]:80` first, then `*:80`
- let dual = SocketAddr::new(IpAddr::from([0u16; 8]), 80);
- let ipv4 = SocketAddr::new(IpAddr::from([0u8; 4]), 80);
- let incoming = TcpListener::bind(dual)
- .or_else(|_| TcpListener::bind(ipv4))
- .await?;
-
- let server = async move {
- loop {
- let key_auth = Arc::clone(&key_auth);
- let path = Arc::clone(&path);
- match incoming.accept().await {
- Ok((tcp, _)) => {
- let io = TokioIo::new(tcp);
- let service = service_fn(move |request| {
- standalone_respond(
- request,
- Arc::clone(&path),
- Arc::clone(&key_auth),
- )
- });
-
- tokio::task::spawn(async move {
- if let Err(err) =
- http1::Builder::new().serve_connection(io, service).await
- {
- println!("Error serving connection: {err:?}");
- }
- });
- }
- Err(err) => println!("Error accepting connection: {err:?}"),
- }
- }
- };
- let (future, abort) = futures::future::abortable(server);
- self.abort_handle = Some(abort);
- tokio::spawn(future);
-
- Ok(challenge.url.as_str())
- })
- }
-
- fn teardown<'fut, 'a: 'fut, 'b: 'fut, 'c: 'fut, 'd: 'fut>(
- &'a mut self,
- _client: &'b mut AcmeClient,
- _authorization: &'c Authorization,
- _domain: &'d AcmeDomain,
- _task: Arc<WorkerTask>,
- ) -> Pin<Box<dyn Future<Output = Result<(), Error>> + Send + 'fut>> {
- Box::pin(async move {
- if let Some(abort) = self.abort_handle.take() {
- abort.abort();
- }
- Ok(())
- })
- }
-}
diff --git a/src/api2/node/certificates.rs b/src/api2/node/certificates.rs
index 31196715..2a645b4a 100644
--- a/src/api2/node/certificates.rs
+++ b/src/api2/node/certificates.rs
@@ -1,27 +1,19 @@
-use std::sync::Arc;
-use std::time::Duration;
-
use anyhow::{bail, format_err, Error};
use openssl::pkey::PKey;
use openssl::x509::X509;
use serde::{Deserialize, Serialize};
use tracing::info;
-use proxmox_router::list_subdirs_api_method;
-use proxmox_router::SubdirMap;
-use proxmox_router::{Permission, Router, RpcEnvironment};
-use proxmox_schema::api;
-
+use crate::server::send_certificate_renewal_mail;
use pbs_api_types::{NODE_SCHEMA, PRIV_SYS_MODIFY};
use pbs_buildcfg::configdir;
use pbs_tools::cert;
-use tracing::warn;
-
-use crate::api2::types::AcmeDomain;
-use crate::config::node::NodeConfig;
-use crate::server::send_certificate_renewal_mail;
-use proxmox_acme::async_client::AcmeClient;
+use proxmox_acme_api::AcmeDomain;
use proxmox_rest_server::WorkerTask;
+use proxmox_router::list_subdirs_api_method;
+use proxmox_router::SubdirMap;
+use proxmox_router::{Permission, Router, RpcEnvironment};
+use proxmox_schema::api;
pub const ROUTER: Router = Router::new()
.get(&list_subdirs_api_method!(SUBDIRS))
@@ -269,193 +261,6 @@ pub async fn delete_custom_certificate() -> Result<(), Error> {
Ok(())
}
-struct OrderedCertificate {
- certificate: hyper::body::Bytes,
- private_key_pem: Vec<u8>,
-}
-
-async fn order_certificate(
- worker: Arc<WorkerTask>,
- node_config: &NodeConfig,
-) -> Result<Option<OrderedCertificate>, Error> {
- use proxmox_acme::authorization::Status;
- use proxmox_acme::order::Identifier;
-
- let domains = node_config.acme_domains().try_fold(
- Vec::<AcmeDomain>::new(),
- |mut acc, domain| -> Result<_, Error> {
- let mut domain = domain?;
- domain.domain.make_ascii_lowercase();
- if let Some(alias) = &mut domain.alias {
- alias.make_ascii_lowercase();
- }
- acc.push(domain);
- Ok(acc)
- },
- )?;
-
- let get_domain_config = |domain: &str| {
- domains
- .iter()
- .find(|d| d.domain == domain)
- .ok_or_else(|| format_err!("no config for domain '{}'", domain))
- };
-
- if domains.is_empty() {
- info!("No domains configured to be ordered from an ACME server.");
- return Ok(None);
- }
-
- let (plugins, _) = crate::config::acme::plugin::config()?;
-
- let mut acme = node_config.acme_client().await?;
-
- info!("Placing ACME order");
- let order = acme
- .new_order(domains.iter().map(|d| d.domain.to_ascii_lowercase()))
- .await?;
- info!("Order URL: {}", order.location);
-
- let identifiers: Vec<String> = order
- .data
- .identifiers
- .iter()
- .map(|identifier| match identifier {
- Identifier::Dns(domain) => domain.clone(),
- })
- .collect();
-
- for auth_url in &order.data.authorizations {
- info!("Getting authorization details from '{auth_url}'");
- let mut auth = acme.get_authorization(auth_url).await?;
-
- let domain = match &mut auth.identifier {
- Identifier::Dns(domain) => domain.to_ascii_lowercase(),
- };
-
- if auth.status == Status::Valid {
- info!("{domain} is already validated!");
- continue;
- }
-
- info!("The validation for {domain} is pending");
- let domain_config: &AcmeDomain = get_domain_config(&domain)?;
- let plugin_id = domain_config.plugin.as_deref().unwrap_or("standalone");
- let mut plugin_cfg = crate::acme::get_acme_plugin(&plugins, plugin_id)?
- .ok_or_else(|| format_err!("plugin '{plugin_id}' for domain '{domain}' not found!"))?;
-
- info!("Setting up validation plugin");
- let validation_url = plugin_cfg
- .setup(&mut acme, &auth, domain_config, Arc::clone(&worker))
- .await?;
-
- let result = request_validation(&mut acme, auth_url, validation_url).await;
-
- if let Err(err) = plugin_cfg
- .teardown(&mut acme, &auth, domain_config, Arc::clone(&worker))
- .await
- {
- warn!("Failed to teardown plugin '{plugin_id}' for domain '{domain}' - {err}");
- }
-
- result?;
- }
-
- info!("All domains validated");
- info!("Creating CSR");
-
- let csr = proxmox_acme::util::Csr::generate(&identifiers, &Default::default())?;
- let mut finalize_error_cnt = 0u8;
- let order_url = &order.location;
- let mut order;
- loop {
- use proxmox_acme::order::Status;
-
- order = acme.get_order(order_url).await?;
-
- match order.status {
- Status::Pending => {
- info!("still pending, trying to finalize anyway");
- let finalize = order
- .finalize
- .as_deref()
- .ok_or_else(|| format_err!("missing 'finalize' URL in order"))?;
- if let Err(err) = acme.finalize(finalize, &csr.data).await {
- if finalize_error_cnt >= 5 {
- return Err(err);
- }
-
- finalize_error_cnt += 1;
- }
- tokio::time::sleep(Duration::from_secs(5)).await;
- }
- Status::Ready => {
- info!("order is ready, finalizing");
- let finalize = order
- .finalize
- .as_deref()
- .ok_or_else(|| format_err!("missing 'finalize' URL in order"))?;
- acme.finalize(finalize, &csr.data).await?;
- tokio::time::sleep(Duration::from_secs(5)).await;
- }
- Status::Processing => {
- info!("still processing, trying again in 30 seconds");
- tokio::time::sleep(Duration::from_secs(30)).await;
- }
- Status::Valid => {
- info!("valid");
- break;
- }
- other => bail!("order status: {:?}", other),
- }
- }
-
- info!("Downloading certificate");
- let certificate = acme
- .get_certificate(
- order
- .certificate
- .as_deref()
- .ok_or_else(|| format_err!("missing certificate url in finalized order"))?,
- )
- .await?;
-
- Ok(Some(OrderedCertificate {
- certificate,
- private_key_pem: csr.private_key_pem,
- }))
-}
-
-async fn request_validation(
- acme: &mut AcmeClient,
- auth_url: &str,
- validation_url: &str,
-) -> Result<(), Error> {
- info!("Triggering validation");
- acme.request_challenge_validation(validation_url).await?;
-
- info!("Sleeping for 5 seconds");
- tokio::time::sleep(Duration::from_secs(5)).await;
-
- loop {
- use proxmox_acme::authorization::Status;
-
- let auth = acme.get_authorization(auth_url).await?;
- match auth.status {
- Status::Pending => {
- info!("Status is still 'pending', trying again in 10 seconds");
- tokio::time::sleep(Duration::from_secs(10)).await;
- }
- Status::Valid => return Ok(()),
- other => bail!(
- "validating challenge '{}' failed - status: {:?}",
- validation_url,
- other
- ),
- }
- }
-}
-
#[api(
input: {
properties: {
@@ -525,9 +330,30 @@ fn spawn_certificate_worker(
let auth_id = rpcenv.get_auth_id().unwrap();
+ let acme_config = if let Some(cfg) = node_config.acme_config().transpose()? {
+ cfg
+ } else {
+ proxmox_acme_api::parse_acme_config_string("account=default")?
+ };
+
+ let domains = node_config.acme_domains().try_fold(
+ Vec::<AcmeDomain>::new(),
+ |mut acc, domain| -> Result<_, Error> {
+ let mut domain = domain?;
+ domain.domain.make_ascii_lowercase();
+ if let Some(alias) = &mut domain.alias {
+ alias.make_ascii_lowercase();
+ }
+ acc.push(domain);
+ Ok(acc)
+ },
+ )?;
+
WorkerTask::spawn(name, None, auth_id, true, move |worker| async move {
let work = || async {
- if let Some(cert) = order_certificate(worker, &node_config).await? {
+ if let Some(cert) =
+ proxmox_acme_api::order_certificate(worker, &acme_config, &domains).await?
+ {
crate::config::set_proxy_certificate(&cert.certificate, &cert.private_key_pem)?;
crate::server::reload_proxy_certificate().await?;
}
@@ -563,16 +389,20 @@ pub fn revoke_acme_cert(rpcenv: &mut dyn RpcEnvironment) -> Result<String, Error
let auth_id = rpcenv.get_auth_id().unwrap();
+ let acme_config = if let Some(cfg) = node_config.acme_config().transpose()? {
+ cfg
+ } else {
+ proxmox_acme_api::parse_acme_config_string("account=default")?
+ };
+
WorkerTask::spawn(
"acme-revoke-cert",
None,
auth_id,
true,
move |_worker| async move {
- info!("Loading ACME account");
- let mut acme = node_config.acme_client().await?;
info!("Revoking old certificate");
- acme.revoke_certificate(cert_pem.as_bytes(), None).await?;
+ proxmox_acme_api::revoke_certificate(&acme_config, &cert_pem.as_bytes()).await?;
info!("Deleting certificate and regenerating a self-signed one");
delete_custom_certificate().await?;
Ok(())
diff --git a/src/api2/types/acme.rs b/src/api2/types/acme.rs
deleted file mode 100644
index 2905b41b..00000000
--- a/src/api2/types/acme.rs
+++ /dev/null
@@ -1,74 +0,0 @@
-use serde::{Deserialize, Serialize};
-use serde_json::Value;
-
-use proxmox_schema::{api, ApiStringFormat, ApiType, Schema, StringSchema};
-
-use pbs_api_types::{DNS_ALIAS_FORMAT, DNS_NAME_FORMAT, PROXMOX_SAFE_ID_FORMAT};
-
-#[api(
- properties: {
- "domain": { format: &DNS_NAME_FORMAT },
- "alias": {
- optional: true,
- format: &DNS_ALIAS_FORMAT,
- },
- "plugin": {
- optional: true,
- format: &PROXMOX_SAFE_ID_FORMAT,
- },
- },
- default_key: "domain",
-)]
-#[derive(Deserialize, Serialize)]
-/// A domain entry for an ACME certificate.
-pub struct AcmeDomain {
- /// The domain to certify for.
- pub domain: String,
-
- /// The domain to use for challenges instead of the default acme challenge domain.
- ///
- /// This is useful if you use CNAME entries to redirect `_acme-challenge.*` domains to a
- /// different DNS server.
- #[serde(skip_serializing_if = "Option::is_none")]
- pub alias: Option<String>,
-
- /// The plugin to use to validate this domain.
- ///
- /// Empty means standalone HTTP validation is used.
- #[serde(skip_serializing_if = "Option::is_none")]
- pub plugin: Option<String>,
-}
-
-pub const ACME_DOMAIN_PROPERTY_SCHEMA: Schema =
- StringSchema::new("ACME domain configuration string")
- .format(&ApiStringFormat::PropertyString(&AcmeDomain::API_SCHEMA))
- .schema();
-
-#[api(
- properties: {
- schema: {
- type: Object,
- additional_properties: true,
- properties: {},
- },
- type: {
- type: String,
- },
- },
-)]
-#[derive(Serialize)]
-/// Schema for an ACME challenge plugin.
-pub struct AcmeChallengeSchema {
- /// Plugin ID.
- pub id: String,
-
- /// Human readable name, falls back to id.
- pub name: String,
-
- /// Plugin Type.
- #[serde(rename = "type")]
- pub ty: &'static str,
-
- /// The plugin's parameter schema.
- pub schema: Value,
-}
diff --git a/src/api2/types/mod.rs b/src/api2/types/mod.rs
index afc34b30..34193685 100644
--- a/src/api2/types/mod.rs
+++ b/src/api2/types/mod.rs
@@ -4,9 +4,6 @@ use anyhow::bail;
use proxmox_schema::*;
-mod acme;
-pub use acme::*;
-
// File names: may not contain slashes, may not start with "."
pub const FILENAME_FORMAT: ApiStringFormat = ApiStringFormat::VerifyFn(|name| {
if name.starts_with('.') {
diff --git a/src/config/acme/mod.rs b/src/config/acme/mod.rs
index 35cda50b..afd7abf8 100644
--- a/src/config/acme/mod.rs
+++ b/src/config/acme/mod.rs
@@ -9,8 +9,7 @@ use proxmox_sys::fs::{file_read_string, CreateOptions};
use pbs_api_types::PROXMOX_SAFE_ID_REGEX;
-use crate::api2::types::AcmeChallengeSchema;
-use proxmox_acme_api::{AcmeAccountName, KnownAcmeDirectory, KNOWN_ACME_DIRECTORIES};
+use proxmox_acme_api::{AcmeAccountName, AcmeChallengeSchema};
pub(crate) const ACME_DIR: &str = pbs_buildcfg::configdir!("/acme");
pub(crate) const ACME_ACCOUNT_DIR: &str = pbs_buildcfg::configdir!("/acme/accounts");
@@ -35,8 +34,6 @@ pub(crate) fn make_acme_dir() -> Result<(), Error> {
create_acme_subdir(ACME_DIR)
}
-pub const DEFAULT_ACME_DIRECTORY_ENTRY: &KnownAcmeDirectory = &KNOWN_ACME_DIRECTORIES[0];
-
pub fn foreach_acme_account<F>(mut func: F) -> Result<(), Error>
where
F: FnMut(AcmeAccountName) -> ControlFlow<Result<(), Error>>,
@@ -80,7 +77,7 @@ pub fn load_dns_challenge_schema() -> Result<Vec<AcmeChallengeSchema>, Error> {
.and_then(Value::as_str)
.unwrap_or(id)
.to_owned(),
- ty: "dns",
+ ty: "dns".into(),
schema: schema.to_owned(),
})
.collect())
diff --git a/src/config/acme/plugin.rs b/src/config/acme/plugin.rs
index 18e71199..2e979ffe 100644
--- a/src/config/acme/plugin.rs
+++ b/src/config/acme/plugin.rs
@@ -1,104 +1,15 @@
use std::sync::LazyLock;
use anyhow::Error;
-use serde::{Deserialize, Serialize};
-use serde_json::Value;
-
-use proxmox_schema::{api, ApiType, Schema, StringSchema, Updater};
-use proxmox_section_config::{SectionConfig, SectionConfigData, SectionConfigPlugin};
-
-use pbs_api_types::PROXMOX_SAFE_ID_FORMAT;
use pbs_config::{open_backup_lockfile, BackupLockGuard};
-
-pub const PLUGIN_ID_SCHEMA: Schema = StringSchema::new("ACME Challenge Plugin ID.")
- .format(&PROXMOX_SAFE_ID_FORMAT)
- .min_length(1)
- .max_length(32)
- .schema();
+use proxmox_acme_api::PLUGIN_ID_SCHEMA;
+use proxmox_acme_api::{DnsPlugin, StandalonePlugin};
+use proxmox_schema::{ApiType, Schema};
+use proxmox_section_config::{SectionConfig, SectionConfigData, SectionConfigPlugin};
+use serde_json::Value;
pub static CONFIG: LazyLock<SectionConfig> = LazyLock::new(init);
-#[api(
- properties: {
- id: { schema: PLUGIN_ID_SCHEMA },
- },
-)]
-#[derive(Deserialize, Serialize)]
-/// Standalone ACME Plugin for the http-1 challenge.
-pub struct StandalonePlugin {
- /// Plugin ID.
- id: String,
-}
-
-impl Default for StandalonePlugin {
- fn default() -> Self {
- Self {
- id: "standalone".to_string(),
- }
- }
-}
-
-#[api(
- properties: {
- id: { schema: PLUGIN_ID_SCHEMA },
- disable: {
- optional: true,
- default: false,
- },
- "validation-delay": {
- default: 30,
- optional: true,
- minimum: 0,
- maximum: 2 * 24 * 60 * 60,
- },
- },
-)]
-/// DNS ACME Challenge Plugin core data.
-#[derive(Deserialize, Serialize, Updater)]
-#[serde(rename_all = "kebab-case")]
-pub struct DnsPluginCore {
- /// Plugin ID.
- #[updater(skip)]
- pub id: String,
-
- /// DNS API Plugin Id.
- pub api: String,
-
- /// Extra delay in seconds to wait before requesting validation.
- ///
- /// Allows to cope with long TTL of DNS records.
- #[serde(skip_serializing_if = "Option::is_none", default)]
- pub validation_delay: Option<u32>,
-
- /// Flag to disable the config.
- #[serde(skip_serializing_if = "Option::is_none", default)]
- pub disable: Option<bool>,
-}
-
-#[api(
- properties: {
- core: { type: DnsPluginCore },
- },
-)]
-/// DNS ACME Challenge Plugin.
-#[derive(Deserialize, Serialize)]
-#[serde(rename_all = "kebab-case")]
-pub struct DnsPlugin {
- #[serde(flatten)]
- pub core: DnsPluginCore,
-
- // We handle this property separately in the API calls.
- /// DNS plugin data (base64url encoded without padding).
- #[serde(with = "proxmox_serde::string_as_base64url_nopad")]
- pub data: String,
-}
-
-impl DnsPlugin {
- pub fn decode_data(&self, output: &mut Vec<u8>) -> Result<(), Error> {
- Ok(proxmox_base64::url::decode_to_vec(&self.data, output)?)
- }
-}
-
fn init() -> SectionConfig {
let mut config = SectionConfig::new(&PLUGIN_ID_SCHEMA);
diff --git a/src/config/node.rs b/src/config/node.rs
index d2a17a49..b9257adf 100644
--- a/src/config/node.rs
+++ b/src/config/node.rs
@@ -6,17 +6,17 @@ use serde::{Deserialize, Serialize};
use proxmox_schema::{api, ApiStringFormat, ApiType, Updater};
-use proxmox_http::ProxyConfig;
-
use pbs_api_types::{
EMAIL_SCHEMA, MULTI_LINE_COMMENT_SCHEMA, OPENSSL_CIPHERS_TLS_1_2_SCHEMA,
OPENSSL_CIPHERS_TLS_1_3_SCHEMA,
};
+use proxmox_acme_api::{AcmeConfig, AcmeDomain, ACME_DOMAIN_PROPERTY_SCHEMA};
+use proxmox_http::ProxyConfig;
use pbs_buildcfg::configdir;
use pbs_config::{open_backup_lockfile, BackupLockGuard};
-use crate::api2::types::{AcmeDomain, ACME_DOMAIN_PROPERTY_SCHEMA, HTTP_PROXY_SCHEMA};
+use crate::api2::types::HTTP_PROXY_SCHEMA;
use proxmox_acme::async_client::AcmeClient;
use proxmox_acme_api::AcmeAccountName;
@@ -45,20 +45,6 @@ pub fn save_config(config: &NodeConfig) -> Result<(), Error> {
pbs_config::replace_backup_config(CONF_FILE, &raw)
}
-#[api(
- properties: {
- account: { type: AcmeAccountName },
- }
-)]
-#[derive(Deserialize, Serialize)]
-/// The ACME configuration.
-///
-/// Currently only contains the name of the account to use.
-pub struct AcmeConfig {
- /// Account to use to acquire ACME certificates.
- account: AcmeAccountName,
-}
-
/// All available languages in Proxmox. Taken from proxmox-i18n repository.
/// pt_BR, zh_CN, and zh_TW use the same case in the translation files.
// TODO: auto-generate from available translations
@@ -244,7 +230,7 @@ impl NodeConfig {
pub async fn acme_client(&self) -> Result<AcmeClient, Error> {
let account = if let Some(cfg) = self.acme_config().transpose()? {
- cfg.account
+ AcmeAccountName::from_string(cfg.account)?
} else {
AcmeAccountName::from_string("default".to_string())? // should really not happen
};
diff --git a/src/lib.rs b/src/lib.rs
index 8633378c..828f5842 100644
--- a/src/lib.rs
+++ b/src/lib.rs
@@ -27,8 +27,6 @@ pub(crate) mod auth;
pub mod tape;
-pub mod acme;
-
pub mod client_helpers;
pub mod traffic_control_cache;
--
2.47.3
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
* [pbs-devel] [PATCH proxmox-backup 2/4] acme: drop local AcmeClient
2025-12-02 15:56 12% [pbs-devel] [PATCH proxmox{-backup, } 0/8] fix #6939: acme: support servers returning 204 for nonce requests Samuel Rufinatscha
2025-12-02 15:56 15% ` [pbs-devel] [PATCH proxmox-backup 1/4] acme: include proxmox-acme-api dependency Samuel Rufinatscha
@ 2025-12-02 15:56 6% ` Samuel Rufinatscha
2025-12-02 15:56 8% ` [pbs-devel] [PATCH proxmox-backup 3/4] acme: change API impls to use proxmox-acme-api handlers Samuel Rufinatscha
` (6 subsequent siblings)
8 siblings, 0 replies; 200+ results
From: Samuel Rufinatscha @ 2025-12-02 15:56 UTC (permalink / raw)
To: pbs-devel
PBS currently uses its own ACME client and API logic, while PDM uses the
factored out proxmox-acme and proxmox-acme-api crates. This duplication
risks differences in behaviour and requires ACME maintenance in two
places. This patch is part of a series to move PBS over to the shared
ACME stack.
Changes:
- Remove the local src/acme/client.rs and switch to
proxmox_acme::async_client::AcmeClient where needed.
- Use proxmox_acme_api::load_client_with_account instead of the custom
  AcmeClient::load() function
- Replace the local do_register() logic with
  proxmox_acme_api::register_account, which also ensures accounts are
  persisted
- Replace the local AcmeAccountName type with the one from
  proxmox-acme-api, as required by proxmox_acme_api::register_account
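
For context, the badNonce retry behaviour dropped along with src/acme/client.rs
(the `Retry` helper at the bottom of the deleted file) is provided equivalently
by the shared proxmox_acme::async_client::AcmeClient. A minimal, self-contained
sketch of that retry-budget pattern, mirroring the deleted code (the error type
is simplified to `String` here for illustration):

```rust
/// Bad-nonce retry budget, mirroring the helper removed with
/// src/acme/client.rs: each retry loop calls `tick()` before issuing a
/// request and bails out after more than 3 attempts.
struct Retry(usize);

impl Retry {
    fn tick(&mut self) -> Result<(), String> {
        if self.0 >= 3 {
            // The deleted code returned Error::Client with this message.
            Err("kept getting a badNonce error!".to_string())
        } else {
            self.0 += 1;
            Ok(())
        }
    }
}

fn main() {
    let mut retry = Retry(0);
    // Three attempts are allowed; the fourth tick fails.
    for _ in 0..3 {
        assert!(retry.tick().is_ok());
    }
    assert!(retry.tick().is_err());
}
```

Call sites in the deleted client looped on `Err(err) if err.is_bad_nonce()`
after a `tick()`, so a persistent badNonce response aborted after three tries
instead of retrying forever.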
Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
---
src/acme/client.rs | 691 -------------------------
src/acme/mod.rs | 3 -
src/acme/plugin.rs | 2 +-
src/api2/config/acme.rs | 50 +-
src/api2/node/certificates.rs | 2 +-
src/api2/types/acme.rs | 8 -
src/bin/proxmox_backup_manager/acme.rs | 17 +-
src/config/acme/mod.rs | 8 +-
src/config/node.rs | 9 +-
9 files changed, 36 insertions(+), 754 deletions(-)
delete mode 100644 src/acme/client.rs
diff --git a/src/acme/client.rs b/src/acme/client.rs
deleted file mode 100644
index 9fb6ad55..00000000
--- a/src/acme/client.rs
+++ /dev/null
@@ -1,691 +0,0 @@
-//! HTTP Client for the ACME protocol.
-
-use std::fs::OpenOptions;
-use std::io;
-use std::os::unix::fs::OpenOptionsExt;
-
-use anyhow::{bail, format_err};
-use bytes::Bytes;
-use http_body_util::BodyExt;
-use hyper::Request;
-use nix::sys::stat::Mode;
-use proxmox_http::Body;
-use serde::{Deserialize, Serialize};
-
-use proxmox_acme::account::AccountCreator;
-use proxmox_acme::order::{Order, OrderData};
-use proxmox_acme::types::AccountData as AcmeAccountData;
-use proxmox_acme::Request as AcmeRequest;
-use proxmox_acme::{Account, Authorization, Challenge, Directory, Error, ErrorResponse};
-use proxmox_http::client::Client;
-use proxmox_sys::fs::{replace_file, CreateOptions};
-
-use crate::api2::types::AcmeAccountName;
-use crate::config::acme::account_path;
-use crate::tools::pbs_simple_http;
-
-/// Our on-disk format inherited from PVE's proxmox-acme code.
-#[derive(Deserialize, Serialize)]
-#[serde(rename_all = "camelCase")]
-pub struct AccountData {
- /// The account's location URL.
- location: String,
-
- /// The account data.
- account: AcmeAccountData,
-
- /// The private key as PEM formatted string.
- key: String,
-
- /// ToS URL the user agreed to.
- #[serde(skip_serializing_if = "Option::is_none")]
- tos: Option<String>,
-
- #[serde(skip_serializing_if = "is_false", default)]
- debug: bool,
-
- /// The directory's URL.
- directory_url: String,
-}
-
-#[inline]
-fn is_false(b: &bool) -> bool {
- !*b
-}
-
-pub struct AcmeClient {
- directory_url: String,
- debug: bool,
- account_path: Option<String>,
- tos: Option<String>,
- account: Option<Account>,
- directory: Option<Directory>,
- nonce: Option<String>,
- http_client: Client,
-}
-
-impl AcmeClient {
- /// Create a new ACME client for a given ACME directory URL.
- pub fn new(directory_url: String) -> Self {
- Self {
- directory_url,
- debug: false,
- account_path: None,
- tos: None,
- account: None,
- directory: None,
- nonce: None,
- http_client: pbs_simple_http(None),
- }
- }
-
- /// Load an existing ACME account by name.
- pub async fn load(account_name: &AcmeAccountName) -> Result<Self, anyhow::Error> {
- let account_path = account_path(account_name.as_ref());
- let data = match tokio::fs::read(&account_path).await {
- Ok(data) => data,
- Err(err) if err.kind() == io::ErrorKind::NotFound => {
- bail!("acme account '{}' does not exist", account_name)
- }
- Err(err) => bail!(
- "failed to load acme account from '{}' - {}",
- account_path,
- err
- ),
- };
- let data: AccountData = serde_json::from_slice(&data).map_err(|err| {
- format_err!(
- "failed to parse acme account from '{}' - {}",
- account_path,
- err
- )
- })?;
-
- let account = Account::from_parts(data.location, data.key, data.account);
-
- let mut me = Self::new(data.directory_url);
- me.debug = data.debug;
- me.account_path = Some(account_path);
- me.tos = data.tos;
- me.account = Some(account);
-
- Ok(me)
- }
-
- pub async fn new_account<'a>(
- &'a mut self,
- account_name: &AcmeAccountName,
- tos_agreed: bool,
- contact: Vec<String>,
- rsa_bits: Option<u32>,
- eab_creds: Option<(String, String)>,
- ) -> Result<&'a Account, anyhow::Error> {
- self.tos = if tos_agreed {
- self.terms_of_service_url().await?.map(str::to_owned)
- } else {
- None
- };
-
- let mut account = Account::creator()
- .set_contacts(contact)
- .agree_to_tos(tos_agreed);
-
- if let Some((eab_kid, eab_hmac_key)) = eab_creds {
- account = account.set_eab_credentials(eab_kid, eab_hmac_key)?;
- }
-
- let account = if let Some(bits) = rsa_bits {
- account.generate_rsa_key(bits)?
- } else {
- account.generate_ec_key()?
- };
-
- let _ = self.register_account(account).await?;
-
- crate::config::acme::make_acme_account_dir()?;
- let account_path = account_path(account_name.as_ref());
- let file = OpenOptions::new()
- .write(true)
- .create_new(true)
- .mode(0o600)
- .open(&account_path)
- .map_err(|err| format_err!("failed to open {:?} for writing: {}", account_path, err))?;
- self.write_to(file).map_err(|err| {
- format_err!(
- "failed to write acme account to {:?}: {}",
- account_path,
- err
- )
- })?;
- self.account_path = Some(account_path);
-
-        // unwrap: Setting `self.account` is literally this function's job, we just can't keep
-        // the borrow from `self.register_account()` active due to clashes.
- Ok(self.account.as_ref().unwrap())
- }
-
- fn save(&self) -> Result<(), anyhow::Error> {
- let mut data = Vec::<u8>::new();
- self.write_to(&mut data)?;
- let account_path = self.account_path.as_ref().ok_or_else(|| {
- format_err!("no account path set, cannot save updated account information")
- })?;
- crate::config::acme::make_acme_account_dir()?;
- replace_file(
- account_path,
- &data,
- CreateOptions::new()
- .perm(Mode::from_bits_truncate(0o600))
- .owner(nix::unistd::ROOT)
- .group(nix::unistd::Gid::from_raw(0)),
- true,
- )
- }
-
- /// Shortcut to `account().ok_or_else(...).key_authorization()`.
- pub fn key_authorization(&self, token: &str) -> Result<String, anyhow::Error> {
- Ok(Self::need_account(&self.account)?.key_authorization(token)?)
- }
-
- /// Shortcut to `account().ok_or_else(...).dns_01_txt_value()`.
- /// the key authorization value.
- pub fn dns_01_txt_value(&self, token: &str) -> Result<String, anyhow::Error> {
- Ok(Self::need_account(&self.account)?.dns_01_txt_value(token)?)
- }
-
- async fn register_account(
- &mut self,
- account: AccountCreator,
- ) -> Result<&Account, anyhow::Error> {
- let mut retry = retry();
- let mut response = loop {
- retry.tick()?;
-
- let (directory, nonce) = Self::get_dir_nonce(
- &mut self.http_client,
- &self.directory_url,
- &mut self.directory,
- &mut self.nonce,
- )
- .await?;
- let request = account.request(directory, nonce)?;
- match self.run_request(request).await {
- Ok(response) => break response,
- Err(err) if err.is_bad_nonce() => continue,
- Err(err) => return Err(err.into()),
- }
- };
-
- let account = account.response(response.location_required()?, &response.body)?;
-
- self.account = Some(account);
- Ok(self.account.as_ref().unwrap())
- }
-
- pub async fn update_account<T: Serialize>(
- &mut self,
- data: &T,
- ) -> Result<&Account, anyhow::Error> {
- let account = Self::need_account(&self.account)?;
-
- let mut retry = retry();
- let response = loop {
- retry.tick()?;
-
- let (_directory, nonce) = Self::get_dir_nonce(
- &mut self.http_client,
- &self.directory_url,
- &mut self.directory,
- &mut self.nonce,
- )
- .await?;
-
- let request = account.post_request(&account.location, nonce, data)?;
- match Self::execute(&mut self.http_client, request, &mut self.nonce).await {
- Ok(response) => break response,
- Err(err) if err.is_bad_nonce() => continue,
- Err(err) => return Err(err.into()),
- }
- };
-
- // unwrap: we've been keeping an immutable reference to it from the top of the method
- let _ = account;
- self.account.as_mut().unwrap().data = response.json()?;
- self.save()?;
- Ok(self.account.as_ref().unwrap())
- }
-
- pub async fn new_order<I>(&mut self, domains: I) -> Result<Order, anyhow::Error>
- where
- I: IntoIterator<Item = String>,
- {
- let account = Self::need_account(&self.account)?;
-
- let order = domains
- .into_iter()
- .fold(OrderData::new(), |order, domain| order.domain(domain));
-
- let mut retry = retry();
- loop {
- retry.tick()?;
-
- let (directory, nonce) = Self::get_dir_nonce(
- &mut self.http_client,
- &self.directory_url,
- &mut self.directory,
- &mut self.nonce,
- )
- .await?;
-
- let mut new_order = account.new_order(&order, directory, nonce)?;
- let mut response = match Self::execute(
- &mut self.http_client,
- new_order.request.take().unwrap(),
- &mut self.nonce,
- )
- .await
- {
- Ok(response) => response,
- Err(err) if err.is_bad_nonce() => continue,
- Err(err) => return Err(err.into()),
- };
-
- return Ok(
- new_order.response(response.location_required()?, response.bytes().as_ref())?
- );
- }
- }
-
- /// Low level "POST-as-GET" request.
- async fn post_as_get(&mut self, url: &str) -> Result<AcmeResponse, anyhow::Error> {
- let account = Self::need_account(&self.account)?;
-
- let mut retry = retry();
- loop {
- retry.tick()?;
-
- let (_directory, nonce) = Self::get_dir_nonce(
- &mut self.http_client,
- &self.directory_url,
- &mut self.directory,
- &mut self.nonce,
- )
- .await?;
-
- let request = account.get_request(url, nonce)?;
- match Self::execute(&mut self.http_client, request, &mut self.nonce).await {
- Ok(response) => return Ok(response),
- Err(err) if err.is_bad_nonce() => continue,
- Err(err) => return Err(err.into()),
- }
- }
- }
-
- /// Low level POST request.
- async fn post<T: Serialize>(
- &mut self,
- url: &str,
- data: &T,
- ) -> Result<AcmeResponse, anyhow::Error> {
- let account = Self::need_account(&self.account)?;
-
- let mut retry = retry();
- loop {
- retry.tick()?;
-
- let (_directory, nonce) = Self::get_dir_nonce(
- &mut self.http_client,
- &self.directory_url,
- &mut self.directory,
- &mut self.nonce,
- )
- .await?;
-
- let request = account.post_request(url, nonce, data)?;
- match Self::execute(&mut self.http_client, request, &mut self.nonce).await {
- Ok(response) => return Ok(response),
- Err(err) if err.is_bad_nonce() => continue,
- Err(err) => return Err(err.into()),
- }
- }
- }
-
- /// Request challenge validation. Afterwards, the challenge should be polled.
- pub async fn request_challenge_validation(
- &mut self,
- url: &str,
- ) -> Result<Challenge, anyhow::Error> {
- Ok(self
- .post(url, &serde_json::Value::Object(Default::default()))
- .await?
- .json()?)
- }
-
- /// Assuming the provided URL is an 'Authorization' URL, get and deserialize it.
- pub async fn get_authorization(&mut self, url: &str) -> Result<Authorization, anyhow::Error> {
- Ok(self.post_as_get(url).await?.json()?)
- }
-
- /// Assuming the provided URL is an 'Order' URL, get and deserialize it.
- pub async fn get_order(&mut self, url: &str) -> Result<OrderData, anyhow::Error> {
- Ok(self.post_as_get(url).await?.json()?)
- }
-
- /// Finalize an Order via its `finalize` URL property and the DER encoded CSR.
- pub async fn finalize(&mut self, url: &str, csr: &[u8]) -> Result<(), anyhow::Error> {
- let csr = proxmox_base64::url::encode_no_pad(csr);
- let data = serde_json::json!({ "csr": csr });
- self.post(url, &data).await?;
- Ok(())
- }
-
- /// Download a certificate via its 'certificate' URL property.
- ///
- /// The certificate will be a PEM certificate chain.
- pub async fn get_certificate(&mut self, url: &str) -> Result<Bytes, anyhow::Error> {
- Ok(self.post_as_get(url).await?.body)
- }
-
- /// Revoke an existing certificate (PEM or DER formatted).
- pub async fn revoke_certificate(
- &mut self,
- certificate: &[u8],
- reason: Option<u32>,
- ) -> Result<(), anyhow::Error> {
- // TODO: This can also work without an account.
- let account = Self::need_account(&self.account)?;
-
- let revocation = account.revoke_certificate(certificate, reason)?;
-
- let mut retry = retry();
- loop {
- retry.tick()?;
-
- let (directory, nonce) = Self::get_dir_nonce(
- &mut self.http_client,
- &self.directory_url,
- &mut self.directory,
- &mut self.nonce,
- )
- .await?;
-
- let request = revocation.request(directory, nonce)?;
- match Self::execute(&mut self.http_client, request, &mut self.nonce).await {
- Ok(_response) => return Ok(()),
- Err(err) if err.is_bad_nonce() => continue,
- Err(err) => return Err(err.into()),
- }
- }
- }
-
- fn need_account(account: &Option<Account>) -> Result<&Account, anyhow::Error> {
- account
- .as_ref()
- .ok_or_else(|| format_err!("cannot use client without an account"))
- }
-
- pub(crate) fn account(&self) -> Result<&Account, anyhow::Error> {
- Self::need_account(&self.account)
- }
-
- pub fn tos(&self) -> Option<&str> {
- self.tos.as_deref()
- }
-
- pub fn directory_url(&self) -> &str {
- &self.directory_url
- }
-
- fn to_account_data(&self) -> Result<AccountData, anyhow::Error> {
- let account = self.account()?;
-
- Ok(AccountData {
- location: account.location.clone(),
- key: account.private_key.clone(),
- account: AcmeAccountData {
- only_return_existing: false, // don't actually write this out in case it's set
- ..account.data.clone()
- },
- tos: self.tos.clone(),
- debug: self.debug,
- directory_url: self.directory_url.clone(),
- })
- }
-
- fn write_to<T: io::Write>(&self, out: T) -> Result<(), anyhow::Error> {
- let data = self.to_account_data()?;
-
- Ok(serde_json::to_writer_pretty(out, &data)?)
- }
-}
-
-struct AcmeResponse {
- body: Bytes,
- location: Option<String>,
- got_nonce: bool,
-}
-
-impl AcmeResponse {
- /// Convenience helper to assert that a location header was part of the response.
- fn location_required(&mut self) -> Result<String, anyhow::Error> {
- self.location
- .take()
- .ok_or_else(|| format_err!("missing Location header"))
- }
-
- /// Convenience shortcut to perform json deserialization of the returned body.
- fn json<T: for<'a> Deserialize<'a>>(&self) -> Result<T, Error> {
- Ok(serde_json::from_slice(&self.body)?)
- }
-
- /// Convenience shortcut to get the body as bytes.
- fn bytes(&self) -> &[u8] {
- &self.body
- }
-}
-
-impl AcmeClient {
- /// Non-self-borrowing run_request version for borrow workarounds.
- async fn execute(
- http_client: &mut Client,
- request: AcmeRequest,
- nonce: &mut Option<String>,
- ) -> Result<AcmeResponse, Error> {
- let req_builder = Request::builder().method(request.method).uri(&request.url);
-
- let http_request = if !request.content_type.is_empty() {
- req_builder
- .header("Content-Type", request.content_type)
- .header("Content-Length", request.body.len())
- .body(request.body.into())
- } else {
- req_builder.body(Body::empty())
- }
- .map_err(|err| Error::Custom(format!("failed to create http request: {err}")))?;
-
- let response = http_client
- .request(http_request)
- .await
- .map_err(|err| Error::Custom(err.to_string()))?;
- let (parts, body) = response.into_parts();
-
- let status = parts.status.as_u16();
- let body = body
- .collect()
- .await
- .map_err(|err| Error::Custom(format!("failed to retrieve response body: {err}")))?
- .to_bytes();
-
- let got_nonce = if let Some(new_nonce) = parts.headers.get(proxmox_acme::REPLAY_NONCE) {
- let new_nonce = new_nonce.to_str().map_err(|err| {
- Error::Client(format!(
- "received invalid replay-nonce header from ACME server: {err}"
- ))
- })?;
- *nonce = Some(new_nonce.to_owned());
- true
- } else {
- false
- };
-
- if parts.status.is_success() {
- if status != request.expected {
- return Err(Error::InvalidApi(format!(
- "ACME server responded with unexpected status code: {:?}",
- parts.status
- )));
- }
-
- let location = parts
- .headers
- .get("Location")
- .map(|header| {
- header.to_str().map(str::to_owned).map_err(|err| {
- Error::Client(format!(
- "received invalid location header from ACME server: {err}"
- ))
- })
- })
- .transpose()?;
-
- return Ok(AcmeResponse {
- body,
- location,
- got_nonce,
- });
- }
-
- let error: ErrorResponse = serde_json::from_slice(&body).map_err(|err| {
- Error::Client(format!(
- "error status with improper error ACME response: {err}"
- ))
- })?;
-
- if error.ty == proxmox_acme::error::BAD_NONCE {
- if !got_nonce {
- return Err(Error::InvalidApi(
- "badNonce without a new Replay-Nonce header".to_string(),
- ));
- }
- return Err(Error::BadNonce);
- }
-
- Err(Error::Api(error))
- }
-
-    /// Low-level API to run an API request. This automatically updates the current nonce!
- async fn run_request(&mut self, request: AcmeRequest) -> Result<AcmeResponse, Error> {
- Self::execute(&mut self.http_client, request, &mut self.nonce).await
- }
-
- pub async fn directory(&mut self) -> Result<&Directory, Error> {
- Ok(Self::get_directory(
- &mut self.http_client,
- &self.directory_url,
- &mut self.directory,
- &mut self.nonce,
- )
- .await?
- .0)
- }
-
- async fn get_directory<'a, 'b>(
- http_client: &mut Client,
- directory_url: &str,
- directory: &'a mut Option<Directory>,
- nonce: &'b mut Option<String>,
- ) -> Result<(&'a Directory, Option<&'b str>), Error> {
- if let Some(d) = directory {
- return Ok((d, nonce.as_deref()));
- }
-
- let response = Self::execute(
- http_client,
- AcmeRequest {
- url: directory_url.to_string(),
- method: "GET",
- content_type: "",
- body: String::new(),
- expected: 200,
- },
- nonce,
- )
- .await?;
-
- *directory = Some(Directory::from_parts(
- directory_url.to_string(),
- response.json()?,
- ));
-
- Ok((directory.as_mut().unwrap(), nonce.as_deref()))
- }
-
- /// Like `get_directory`, but if the directory provides no nonce, also performs a `HEAD`
- /// request on the new nonce URL.
- async fn get_dir_nonce<'a, 'b>(
- http_client: &mut Client,
- directory_url: &str,
- directory: &'a mut Option<Directory>,
- nonce: &'b mut Option<String>,
- ) -> Result<(&'a Directory, &'b str), Error> {
- // this let construct is a lifetime workaround:
- let _ = Self::get_directory(http_client, directory_url, directory, nonce).await?;
- let dir = directory.as_ref().unwrap(); // the above fails if it couldn't fill this option
- if nonce.is_none() {
- // this is also a lifetime issue...
- let _ = Self::get_nonce(http_client, nonce, dir.new_nonce_url()).await?;
- };
- Ok((dir, nonce.as_deref().unwrap()))
- }
-
- pub async fn terms_of_service_url(&mut self) -> Result<Option<&str>, Error> {
- Ok(self.directory().await?.terms_of_service_url())
- }
-
- async fn get_nonce<'a>(
- http_client: &mut Client,
- nonce: &'a mut Option<String>,
- new_nonce_url: &str,
- ) -> Result<&'a str, Error> {
- let response = Self::execute(
- http_client,
- AcmeRequest {
- url: new_nonce_url.to_owned(),
- method: "HEAD",
- content_type: "",
- body: String::new(),
- expected: 200,
- },
- nonce,
- )
- .await?;
-
- if !response.got_nonce {
- return Err(Error::InvalidApi(
- "no new nonce received from new nonce URL".to_string(),
- ));
- }
-
- nonce
- .as_deref()
- .ok_or_else(|| Error::Client("failed to update nonce".to_string()))
- }
-}
-
-/// bad nonce retry count helper
-struct Retry(usize);
-
-const fn retry() -> Retry {
- Retry(0)
-}
-
-impl Retry {
- fn tick(&mut self) -> Result<(), Error> {
- if self.0 >= 3 {
- Err(Error::Client("kept getting a badNonce error!".to_string()))
- } else {
- self.0 += 1;
- Ok(())
- }
- }
-}
diff --git a/src/acme/mod.rs b/src/acme/mod.rs
index bf61811c..cc561f9a 100644
--- a/src/acme/mod.rs
+++ b/src/acme/mod.rs
@@ -1,5 +1,2 @@
-mod client;
-pub use client::AcmeClient;
-
pub(crate) mod plugin;
pub(crate) use plugin::get_acme_plugin;
diff --git a/src/acme/plugin.rs b/src/acme/plugin.rs
index f756e9b5..5bc09e1f 100644
--- a/src/acme/plugin.rs
+++ b/src/acme/plugin.rs
@@ -20,8 +20,8 @@ use tokio::process::Command;
use proxmox_acme::{Authorization, Challenge};
-use crate::acme::AcmeClient;
use crate::api2::types::AcmeDomain;
+use proxmox_acme::async_client::AcmeClient;
use proxmox_rest_server::WorkerTask;
use crate::config::acme::plugin::{DnsPlugin, PluginData};
diff --git a/src/api2/config/acme.rs b/src/api2/config/acme.rs
index 35c3fb77..02f88e2e 100644
--- a/src/api2/config/acme.rs
+++ b/src/api2/config/acme.rs
@@ -16,15 +16,15 @@ use proxmox_router::{
use proxmox_schema::{api, param_bail};
use proxmox_acme::types::AccountData as AcmeAccountData;
-use proxmox_acme::Account;
use pbs_api_types::{Authid, PRIV_SYS_MODIFY};
-use crate::acme::AcmeClient;
-use crate::api2::types::{AcmeAccountName, AcmeChallengeSchema, KnownAcmeDirectory};
+use crate::api2::types::{AcmeChallengeSchema, KnownAcmeDirectory};
use crate::config::acme::plugin::{
self, DnsPlugin, DnsPluginCore, DnsPluginCoreUpdater, PLUGIN_ID_SCHEMA,
};
+use proxmox_acme::async_client::AcmeClient;
+use proxmox_acme_api::AcmeAccountName;
use proxmox_rest_server::WorkerTask;
pub(crate) const ROUTER: Router = Router::new()
@@ -143,15 +143,15 @@ pub struct AccountInfo {
)]
/// Return existing ACME account information.
pub async fn get_account(name: AcmeAccountName) -> Result<AccountInfo, Error> {
- let client = AcmeClient::load(&name).await?;
- let account = client.account()?;
+ let account_info = proxmox_acme_api::get_account(name).await?;
+
Ok(AccountInfo {
- location: account.location.clone(),
- tos: client.tos().map(str::to_owned),
- directory: client.directory_url().to_owned(),
+ location: account_info.location,
+ tos: account_info.tos,
+ directory: account_info.directory,
account: AcmeAccountData {
only_return_existing: false, // don't actually write this out in case it's set
- ..account.data.clone()
+ ..account_info.account
},
})
}
@@ -240,41 +240,24 @@ fn register_account(
auth_id.to_string(),
true,
move |_worker| async move {
- let mut client = AcmeClient::new(directory);
-
info!("Registering ACME account '{}'...", &name);
- let account = do_register_account(
- &mut client,
+ let location = proxmox_acme_api::register_account(
&name,
- tos_url.is_some(),
contact,
- None,
+ tos_url,
+ Some(directory),
eab_kid.zip(eab_hmac_key),
)
.await?;
- info!("Registration successful, account URL: {}", account.location);
+ info!("Registration successful, account URL: {}", location);
Ok(())
},
)
}
-pub async fn do_register_account<'a>(
- client: &'a mut AcmeClient,
- name: &AcmeAccountName,
- agree_to_tos: bool,
- contact: String,
- rsa_bits: Option<u32>,
- eab_creds: Option<(String, String)>,
-) -> Result<&'a Account, Error> {
- let contact = account_contact_from_string(&contact);
- client
- .new_account(name, agree_to_tos, contact, rsa_bits, eab_creds)
- .await
-}
-
#[api(
input: {
properties: {
@@ -312,7 +295,10 @@ pub fn update_account(
None => json!({}),
};
- AcmeClient::load(&name).await?.update_account(&data).await?;
+ proxmox_acme_api::load_client_with_account(&name)
+ .await?
+ .update_account(&data)
+ .await?;
Ok(())
},
@@ -350,7 +336,7 @@ pub fn deactivate_account(
auth_id.to_string(),
true,
move |_worker| async move {
- match AcmeClient::load(&name)
+ match proxmox_acme_api::load_client_with_account(&name)
.await?
.update_account(&json!({"status": "deactivated"}))
.await
diff --git a/src/api2/node/certificates.rs b/src/api2/node/certificates.rs
index 61ef910e..31196715 100644
--- a/src/api2/node/certificates.rs
+++ b/src/api2/node/certificates.rs
@@ -17,10 +17,10 @@ use pbs_buildcfg::configdir;
use pbs_tools::cert;
use tracing::warn;
-use crate::acme::AcmeClient;
use crate::api2::types::AcmeDomain;
use crate::config::node::NodeConfig;
use crate::server::send_certificate_renewal_mail;
+use proxmox_acme::async_client::AcmeClient;
use proxmox_rest_server::WorkerTask;
pub const ROUTER: Router = Router::new()
diff --git a/src/api2/types/acme.rs b/src/api2/types/acme.rs
index 210ebdbc..7c9063c0 100644
--- a/src/api2/types/acme.rs
+++ b/src/api2/types/acme.rs
@@ -60,14 +60,6 @@ pub struct KnownAcmeDirectory {
pub url: &'static str,
}
-proxmox_schema::api_string_type! {
- #[api(format: &PROXMOX_SAFE_ID_FORMAT)]
- /// ACME account name.
- #[derive(Clone, Eq, PartialEq, Hash, Deserialize, Serialize)]
- #[serde(transparent)]
- pub struct AcmeAccountName(String);
-}
-
#[api(
properties: {
schema: {
diff --git a/src/bin/proxmox_backup_manager/acme.rs b/src/bin/proxmox_backup_manager/acme.rs
index 0f0eafea..bb987b26 100644
--- a/src/bin/proxmox_backup_manager/acme.rs
+++ b/src/bin/proxmox_backup_manager/acme.rs
@@ -7,9 +7,9 @@ use proxmox_router::{cli::*, ApiHandler, RpcEnvironment};
use proxmox_schema::api;
use proxmox_sys::fs::file_get_contents;
-use proxmox_backup::acme::AcmeClient;
+use proxmox_acme::async_client::AcmeClient;
+use proxmox_acme_api::AcmeAccountName;
use proxmox_backup::api2;
-use proxmox_backup::api2::types::AcmeAccountName;
use proxmox_backup::config::acme::plugin::DnsPluginCore;
use proxmox_backup::config::acme::KNOWN_ACME_DIRECTORIES;
@@ -188,17 +188,20 @@ async fn register_account(
println!("Attempting to register account with {directory_url:?}...");
- let account = api2::config::acme::do_register_account(
- &mut client,
+ let tos_agreed = tos_agreed
+ .then(|| directory.terms_of_service_url().map(str::to_owned))
+ .flatten();
+
+ let location = proxmox_acme_api::register_account(
&name,
- tos_agreed,
contact,
- None,
+ tos_agreed,
+ Some(directory_url),
eab_creds,
)
.await?;
- println!("Registration successful, account URL: {}", account.location);
+ println!("Registration successful, account URL: {}", location);
Ok(())
}
diff --git a/src/config/acme/mod.rs b/src/config/acme/mod.rs
index 274a23fd..d31b2bc9 100644
--- a/src/config/acme/mod.rs
+++ b/src/config/acme/mod.rs
@@ -10,7 +10,8 @@ use proxmox_sys::fs::{file_read_string, CreateOptions};
use pbs_api_types::PROXMOX_SAFE_ID_REGEX;
-use crate::api2::types::{AcmeAccountName, AcmeChallengeSchema, KnownAcmeDirectory};
+use crate::api2::types::{AcmeChallengeSchema, KnownAcmeDirectory};
+use proxmox_acme_api::AcmeAccountName;
pub(crate) const ACME_DIR: &str = pbs_buildcfg::configdir!("/acme");
pub(crate) const ACME_ACCOUNT_DIR: &str = pbs_buildcfg::configdir!("/acme/accounts");
@@ -35,11 +36,6 @@ pub(crate) fn make_acme_dir() -> Result<(), Error> {
create_acme_subdir(ACME_DIR)
}
-pub(crate) fn make_acme_account_dir() -> Result<(), Error> {
- make_acme_dir()?;
- create_acme_subdir(ACME_ACCOUNT_DIR)
-}
-
pub const KNOWN_ACME_DIRECTORIES: &[KnownAcmeDirectory] = &[
KnownAcmeDirectory {
name: "Let's Encrypt V2",
diff --git a/src/config/node.rs b/src/config/node.rs
index d2d6e383..d2a17a49 100644
--- a/src/config/node.rs
+++ b/src/config/node.rs
@@ -16,10 +16,9 @@ use pbs_api_types::{
use pbs_buildcfg::configdir;
use pbs_config::{open_backup_lockfile, BackupLockGuard};
-use crate::acme::AcmeClient;
-use crate::api2::types::{
- AcmeAccountName, AcmeDomain, ACME_DOMAIN_PROPERTY_SCHEMA, HTTP_PROXY_SCHEMA,
-};
+use crate::api2::types::{AcmeDomain, ACME_DOMAIN_PROPERTY_SCHEMA, HTTP_PROXY_SCHEMA};
+use proxmox_acme::async_client::AcmeClient;
+use proxmox_acme_api::AcmeAccountName;
const CONF_FILE: &str = configdir!("/node.cfg");
const LOCK_FILE: &str = configdir!("/.node.lck");
@@ -249,7 +248,7 @@ impl NodeConfig {
} else {
AcmeAccountName::from_string("default".to_string())? // should really not happen
};
- AcmeClient::load(&account).await
+ proxmox_acme_api::load_client_with_account(&account).await
}
pub fn acme_domains(&'_ self) -> AcmeDomainIter<'_> {
--
2.47.3
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
* [pbs-devel] [PATCH proxmox{-backup, } 0/8] fix #6939: acme: support servers returning 204 for nonce requests
@ 2025-12-02 15:56 12% Samuel Rufinatscha
2025-12-02 15:56 15% ` [pbs-devel] [PATCH proxmox-backup 1/4] acme: include proxmox-acme-api dependency Samuel Rufinatscha
` (8 more replies)
0 siblings, 9 replies; 200+ results
From: Samuel Rufinatscha @ 2025-12-02 15:56 UTC (permalink / raw)
To: pbs-devel
Hi,
this series fixes account registration for ACME providers that return
HTTP 204 No Content to the newNonce request. Currently, both the PBS
ACME client and the shared ACME client in proxmox-acme only accept
HTTP 200 OK for this request. The issue was observed in PBS against a
custom ACME deployment and reported as bug #6939 [1].
## Problem
During ACME account registration, PBS first fetches an anti-replay
nonce by sending a HEAD request to the CA’s newNonce URL.
RFC 8555 §7.2 [2] states that:
* the server MUST include a Replay-Nonce header with a fresh nonce,
* the server SHOULD use status 200 OK for the HEAD request,
* the server MUST also handle GET on the same resource and may return
204 No Content with an empty body.
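The acceptance rule implied by these requirements can be sketched as a small predicate. This is a hypothetical helper for illustration, not code from this series: a nonce response is usable iff it carries a Replay-Nonce header and uses one of the two legitimate success codes.

```rust
/// Hypothetical sketch of the RFC 8555 §7.2 acceptance rule for a
/// newNonce response: both 200 (HEAD) and 204 (GET) are legitimate,
/// as long as a non-empty Replay-Nonce header is present.
fn nonce_response_ok(status: u16, replay_nonce: Option<&str>) -> bool {
    matches!(status, 200 | 204) && replay_nonce.is_some_and(|n| !n.is_empty())
}

fn main() {
    assert!(nonce_response_ok(200, Some("abc123")));
    // the case fixed by this series:
    assert!(nonce_response_ok(204, Some("abc123")));
    // a missing Replay-Nonce header is still an error:
    assert!(!nonce_response_ok(204, None));
    assert!(!nonce_response_ok(500, Some("abc123")));
}
```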
The reporter observed the following error message:
*ACME server responded with unexpected status code: 204*
and mentioned that the issue did not appear with PVE 9 [1]. Looking at
PVE’s Perl ACME client [3], it uses a GET request instead of HEAD and
accepts any 2xx success code when retrieving the nonce. This difference
in behavior does not affect functionality but is worth noting for
consistency across implementations.
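For comparison, the Perl client's looser rule amounts to a plain 2xx check; this is a hypothetical sketch of that behaviour, not code from either client:

```rust
// Any 2xx status counts as success when retrieving the nonce,
// mirroring the behaviour of PVE's Perl ACME client.
fn is_2xx(status: u16) -> bool {
    (200..300).contains(&status)
}

fn main() {
    assert!(is_2xx(200));
    assert!(is_2xx(204)); // accepted by the Perl client, rejected by the Rust clients
    assert!(!is_2xx(304));
}
```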
## Approach
To support ACME providers which return 204 No Content, the Rust ACME
clients in proxmox-backup and proxmox need to treat both 200 OK and 204
No Content as valid responses for the nonce request, as long as a
Replay-Nonce header is present.
This series changes the expected field of the internal Request type
from a single u16 to a list of allowed status codes
(e.g. &'static [u16]), so one request can explicitly accept multiple
success codes.
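The shape of that change can be sketched as follows; struct and field names here are illustrative only, not taken from the actual proxmox-acme code:

```rust
/// Hypothetical sketch of the described change: the expected field
/// becomes a slice of allowed status codes instead of a single u16.
struct Request {
    url: String,
    method: &'static str,
    /// previously: `expected: u16`
    expected: &'static [u16],
}

impl Request {
    fn is_expected(&self, status: u16) -> bool {
        self.expected.contains(&status)
    }
}

fn main() {
    // A newNonce request can now explicitly accept both 200 and 204:
    let req = Request {
        url: "https://example.test/acme/new-nonce".to_string(),
        method: "HEAD",
        expected: &[200, 204],
    };
    assert!(req.is_expected(200));
    assert!(req.is_expected(204));
    assert!(!req.is_expected(201));
    let _ = (&req.url, req.method); // fields shown for shape only
}
```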
To avoid fixing the issue twice (once in PBS’ own ACME client and once
in the shared Rust client), this series first refactors PBS to use the
shared AcmeClient from proxmox-acme / proxmox-acme-api, similar to PDM,
and then applies the bug fix in that shared implementation so that all
consumers benefit from the more tolerant behavior.
## Testing
*Testing the refactor*
To test the refactor, I
(1) installed the latest stable PBS on a VM
(2) created a .deb package from latest PBS (master) containing the
refactor
(3) installed the created .deb package
(4) installed Pebble from Let's Encrypt [5] on the same VM
(5) created an ACME account and ordered a new certificate for the
host domain.
Steps to reproduce:
(1) install the latest stable PBS on a VM, create a .deb package from
latest PBS (master) containing the refactor, and install the created
.deb package
(2) install Pebble from Let's Encrypt [5] on the same VM:
cd
apt update
apt install -y golang git
git clone https://github.com/letsencrypt/pebble
cd pebble
go build ./cmd/pebble
Then download and trust the Pebble cert:
wget https://raw.githubusercontent.com/letsencrypt/pebble/main/test/certs/pebble.minica.pem
cp pebble.minica.pem /usr/local/share/ca-certificates/pebble.minica.crt
update-ca-certificates
We want Pebble to perform HTTP-01 validation against port 80, because
PBS's standalone plugin will bind port 80. Set httpPort to 80 in the
Pebble config:
nano ./test/config/pebble-config.json
Start the Pebble server in the background:
./pebble -config ./test/config/pebble-config.json &
Create a Pebble ACME account:
proxmox-backup-manager acme account register default admin@example.com \
--directory 'https://127.0.0.1:14000/dir'
To verify persistence of the account, check
ls /etc/proxmox-backup/acme/accounts
Verify that update-account works:
proxmox-backup-manager acme account update default --contact "a@example.com,b@example.com"
proxmox-backup-manager acme account info default
In the PBS GUI, you can create a new domain. You can use your host
domain name (see /etc/hosts). Select the created account and order the
certificate.
After a page reload, you might need to accept the new certificate in the browser.
In the PBS dashboard, you should then see the new Pebble certificate.
*Note:* on reboot, the created Pebble ACME account will be gone and
you will need to create a new one, since Pebble does not persist
account info. In that case, remove your previously created account in
/etc/proxmox-backup/acme/accounts.
*Testing the newNonce fix*
To verify the ACME newNonce fix, I put nginx in front of Pebble to
intercept the newNonce request and return 204 No Content instead of
200 OK; all other requests are forwarded to Pebble unchanged. This
requires trusting the nginx CA via /usr/local/share/ca-certificates
plus update-ca-certificates on the VM.
Then I ran the following command against nginx:
proxmox-backup-manager acme account register proxytest root@backup.local --directory 'https://nginx-address/dir'
The account could be created successfully. When the nginx
configuration is adjusted to return any other, non-expected success
status code, PBS rejects the response as expected.
## Patch summary
[PATCH proxmox-backup v4 1/4] acme: include proxmox-acme-api dependency
[PATCH proxmox-backup v4 2/4] acme: drop local AcmeClient
[PATCH proxmox-backup v4 3/4] acme: change API impls to use proxmox-acme-api handlers
[PATCH proxmox-backup v4 4/4] acme: certificate ordering through proxmox-acme-api
[PATCH proxmox v4 1/4] acme: reduce visibility of Request type
[PATCH proxmox v4 2/4] acme: introduce http_status module
[PATCH proxmox v4 3/4] acme-api: add helper to load client for an account
[PATCH proxmox v4 4/4] fix #6939: support servers returning 204 for newNonce
Thanks for considering this patch series, I look forward to your
feedback.
Best,
Samuel Rufinatscha
## Changes from v1:
[PATCH proxmox v2 1/1] fix #6939: support providers returning 204 for nonce
requests
* Introduced `http_success` module to contain the http success codes
* Replaced `Vec<u16>` with `&[u16]` for expected codes to avoid
allocations.
* Clarified PVE's Perl ACME client behaviour in the commit message.
[PATCH proxmox-backup v2 1/1] acme: accept HTTP 204 from newNonce endpoint
* Integrated the `http_success` module, replacing `Vec<u16>` with `&[u16]`
* Clarified PVE's Perl ACME client behaviour in the commit message.
## Changes from v2:
[PATCH proxmox v3 1/1] fix #6939: support providers returning 204 for nonce
requests
* Rename `http_success` module to `http_status`
[PATCH proxmox-backup v3 1/1] acme: accept HTTP 204 from newNonce endpoint
* Replace `http_success` usage
## Changes from v3:
Removed: [PATCH proxmox-backup v3 1/1].
Added:
[PATCH proxmox-backup v4 1/4] acme: include proxmox-acme-api dependency
* New: add proxmox-acme-api as a dependency and initialize it in
PBS so PBS can use the shared ACME API instead.
[PATCH proxmox-backup v4 2/4] acme: drop local AcmeClient
* New: remove the PBS-local AcmeClient implementation and switch PBS
over to the shared proxmox-acme async client.
[PATCH proxmox-backup v4 3/4] acme: change API impls to use proxmox-acme-api
handlers
* New: rework PBS’ ACME API endpoints to delegate to
proxmox-acme-api handlers instead of duplicating logic locally.
[PATCH proxmox-backup v4 4/4] acme: certificate ordering through
proxmox-acme-api
* New: move PBS’ ACME certificate ordering logic over to
proxmox-acme-api, keeping only certificate installation/reload in
PBS.
[PATCH proxmox v4 1/4] acme: reduce visibility of Request type
* New: hide the low-level Request type and its fields behind
constructors / reduced visibility so changes to “expected” no longer
affect the public API as they did in v3.
[PATCH proxmox v4 2/4] acme: introduce http_status module
* New: split out the HTTP status constants into an internal
http_status module as a separate preparatory cleanup before the bug
fix, instead of doing this inline like in v3.
[PATCH proxmox v4 3/4] acme-api: add helper to load client for an account
* New: add a load_client_with_account helper in proxmox-acme-api so
PBS (and others) can construct an AcmeClient for a configured account
without duplicating boilerplate.
Changed:
[PATCH proxmox v3 1/1] -> [PATCH proxmox v4 4/4]
fix #6939: acme: support server returning 204 for nonce requests
* Rebased on top of the refactor: keep the same behavioural fix as in v3
(accept 204 for newNonce with Replay-Nonce present), but implement it
on top of the http_status module that is part of the refactor.
proxmox-backup:
Samuel Rufinatscha (4):
acme: include proxmox-acme-api dependency
acme: drop local AcmeClient
acme: change API impls to use proxmox-acme-api handlers
acme: certificate ordering through proxmox-acme-api
Cargo.toml | 3 +
src/acme/client.rs | 691 -------------------------
src/acme/mod.rs | 5 -
src/acme/plugin.rs | 336 ------------
src/api2/config/acme.rs | 407 ++-------------
src/api2/node/certificates.rs | 240 ++-------
src/api2/types/acme.rs | 98 ----
src/api2/types/mod.rs | 3 -
src/bin/proxmox-backup-api.rs | 2 +
src/bin/proxmox-backup-manager.rs | 2 +
src/bin/proxmox-backup-proxy.rs | 1 +
src/bin/proxmox_backup_manager/acme.rs | 21 +-
src/config/acme/mod.rs | 51 +-
src/config/acme/plugin.rs | 99 +---
src/config/node.rs | 29 +-
src/lib.rs | 2 -
16 files changed, 103 insertions(+), 1887 deletions(-)
delete mode 100644 src/acme/client.rs
delete mode 100644 src/acme/mod.rs
delete mode 100644 src/acme/plugin.rs
delete mode 100644 src/api2/types/acme.rs
proxmox:
Samuel Rufinatscha (4):
acme: reduce visibility of Request type
acme: introduce http_status module
acme-api: add helper to load client for an account
fix #6939: acme: support servers returning 204 for nonce requests
proxmox-acme-api/src/account_api_impl.rs | 5 +++++
proxmox-acme-api/src/lib.rs | 3 ++-
proxmox-acme/src/account.rs | 27 +++++++++++++-----------
proxmox-acme/src/async_client.rs | 8 +++----
proxmox-acme/src/authorization.rs | 2 +-
proxmox-acme/src/client.rs | 8 +++----
proxmox-acme/src/lib.rs | 6 ++----
proxmox-acme/src/order.rs | 2 +-
proxmox-acme/src/request.rs | 25 +++++++++++++++-------
9 files changed, 51 insertions(+), 35 deletions(-)
Summary over all repositories:
25 files changed, 154 insertions(+), 1922 deletions(-)
--
Generated by git-murpp 0.8.1
* [pbs-devel] [PATCH proxmox-backup 3/4] acme: change API impls to use proxmox-acme-api handlers
2025-12-02 15:56 12% [pbs-devel] [PATCH proxmox{-backup, } 0/8] fix #6939: acme: support servers returning 204 for nonce requests Samuel Rufinatscha
2025-12-02 15:56 15% ` [pbs-devel] [PATCH proxmox-backup 1/4] acme: include proxmox-acme-api dependency Samuel Rufinatscha
2025-12-02 15:56 6% ` [pbs-devel] [PATCH proxmox-backup 2/4] acme: drop local AcmeClient Samuel Rufinatscha
@ 2025-12-02 15:56 8% ` Samuel Rufinatscha
2025-12-02 15:56 7% ` [pbs-devel] [PATCH proxmox-backup 4/4] acme: certificate ordering through proxmox-acme-api Samuel Rufinatscha
` (5 subsequent siblings)
8 siblings, 0 replies; 200+ results
From: Samuel Rufinatscha @ 2025-12-02 15:56 UTC (permalink / raw)
To: pbs-devel
PBS currently uses its own ACME client and API logic, while PDM uses the
factored-out proxmox-acme and proxmox-acme-api crates. This duplication
risks differences in behaviour and requires ACME maintenance in two
places. This patch is part of a series to move PBS over to the shared
ACME stack.
Changes:
- Replace api2/config/acme.rs API logic with proxmox-acme-api handlers.
- Drop local caching and helper types that duplicate proxmox-acme-api.
Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
---
src/api2/config/acme.rs | 385 ++-----------------------
src/api2/types/acme.rs | 16 -
src/bin/proxmox_backup_manager/acme.rs | 6 +-
src/config/acme/mod.rs | 44 +--
4 files changed, 35 insertions(+), 416 deletions(-)
diff --git a/src/api2/config/acme.rs b/src/api2/config/acme.rs
index 02f88e2e..a112c8ee 100644
--- a/src/api2/config/acme.rs
+++ b/src/api2/config/acme.rs
@@ -1,31 +1,17 @@
-use std::fs;
-use std::ops::ControlFlow;
-use std::path::Path;
-use std::sync::{Arc, LazyLock, Mutex};
-use std::time::SystemTime;
-
-use anyhow::{bail, format_err, Error};
-use hex::FromHex;
-use serde::{Deserialize, Serialize};
-use serde_json::{json, Value};
-use tracing::{info, warn};
-
-use proxmox_router::{
- http_bail, list_subdirs_api_method, Permission, Router, RpcEnvironment, SubdirMap,
-};
-use proxmox_schema::{api, param_bail};
-
-use proxmox_acme::types::AccountData as AcmeAccountData;
-
+use anyhow::Error;
use pbs_api_types::{Authid, PRIV_SYS_MODIFY};
-
-use crate::api2::types::{AcmeChallengeSchema, KnownAcmeDirectory};
-use crate::config::acme::plugin::{
- self, DnsPlugin, DnsPluginCore, DnsPluginCoreUpdater, PLUGIN_ID_SCHEMA,
+use proxmox_acme_api::{
+ AccountEntry, AccountInfo, AcmeAccountName, AcmeChallengeSchema, ChallengeSchemaWrapper,
+ DeletablePluginProperty, DnsPluginCore, DnsPluginCoreUpdater, KnownAcmeDirectory, PluginConfig,
+ DEFAULT_ACME_DIRECTORY_ENTRY, PLUGIN_ID_SCHEMA,
};
-use proxmox_acme::async_client::AcmeClient;
-use proxmox_acme_api::AcmeAccountName;
+use proxmox_config_digest::ConfigDigest;
use proxmox_rest_server::WorkerTask;
+use proxmox_router::{
+ http_bail, list_subdirs_api_method, Permission, Router, RpcEnvironment, SubdirMap,
+};
+use proxmox_schema::api;
+use tracing::info;
pub(crate) const ROUTER: Router = Router::new()
.get(&list_subdirs_api_method!(SUBDIRS))
@@ -67,19 +53,6 @@ const PLUGIN_ITEM_ROUTER: Router = Router::new()
.put(&API_METHOD_UPDATE_PLUGIN)
.delete(&API_METHOD_DELETE_PLUGIN);
-#[api(
- properties: {
- name: { type: AcmeAccountName },
- },
-)]
-/// An ACME Account entry.
-///
-/// Currently only contains a 'name' property.
-#[derive(Serialize)]
-pub struct AccountEntry {
- name: AcmeAccountName,
-}
-
#[api(
access: {
permission: &Permission::Privilege(&["system", "certificates"], PRIV_SYS_MODIFY, false),
@@ -93,40 +66,7 @@ pub struct AccountEntry {
)]
/// List ACME accounts.
pub fn list_accounts() -> Result<Vec<AccountEntry>, Error> {
- let mut entries = Vec::new();
- crate::config::acme::foreach_acme_account(|name| {
- entries.push(AccountEntry { name });
- ControlFlow::Continue(())
- })?;
- Ok(entries)
-}
-
-#[api(
- properties: {
- account: { type: Object, properties: {}, additional_properties: true },
- tos: {
- type: String,
- optional: true,
- },
- },
-)]
-/// ACME Account information.
-///
-/// This is what we return via the API.
-#[derive(Serialize)]
-pub struct AccountInfo {
- /// Raw account data.
- account: AcmeAccountData,
-
- /// The ACME directory URL the account was created at.
- directory: String,
-
- /// The account's own URL within the ACME directory.
- location: String,
-
- /// The ToS URL, if the user agreed to one.
- #[serde(skip_serializing_if = "Option::is_none")]
- tos: Option<String>,
+ proxmox_acme_api::list_accounts()
}
#[api(
@@ -143,23 +83,7 @@ pub struct AccountInfo {
)]
/// Return existing ACME account information.
pub async fn get_account(name: AcmeAccountName) -> Result<AccountInfo, Error> {
- let account_info = proxmox_acme_api::get_account(name).await?;
-
- Ok(AccountInfo {
- location: account_info.location,
- tos: account_info.tos,
- directory: account_info.directory,
- account: AcmeAccountData {
- only_return_existing: false, // don't actually write this out in case it's set
- ..account_info.account
- },
- })
-}
-
-fn account_contact_from_string(s: &str) -> Vec<String> {
- s.split(&[' ', ';', ',', '\0'][..])
- .map(|s| format!("mailto:{s}"))
- .collect()
+ proxmox_acme_api::get_account(name).await
}
#[api(
@@ -224,15 +148,11 @@ fn register_account(
);
}
- if Path::new(&crate::config::acme::account_path(&name)).exists() {
+ if std::path::Path::new(&proxmox_acme_api::account_config_filename(&name)).exists() {
http_bail!(BAD_REQUEST, "account {} already exists", name);
}
- let directory = directory.unwrap_or_else(|| {
- crate::config::acme::DEFAULT_ACME_DIRECTORY_ENTRY
- .url
- .to_owned()
- });
+ let directory = directory.unwrap_or_else(|| DEFAULT_ACME_DIRECTORY_ENTRY.url.to_string());
WorkerTask::spawn(
"acme-register",
@@ -288,17 +208,7 @@ pub fn update_account(
auth_id.to_string(),
true,
move |_worker| async move {
- let data = match contact {
- Some(data) => json!({
- "contact": account_contact_from_string(&data),
- }),
- None => json!({}),
- };
-
- proxmox_acme_api::load_client_with_account(&name)
- .await?
- .update_account(&data)
- .await?;
+ proxmox_acme_api::update_account(&name, contact).await?;
Ok(())
},
@@ -336,18 +246,8 @@ pub fn deactivate_account(
auth_id.to_string(),
true,
move |_worker| async move {
- match proxmox_acme_api::load_client_with_account(&name)
- .await?
- .update_account(&json!({"status": "deactivated"}))
- .await
- {
- Ok(_account) => (),
- Err(err) if !force => return Err(err),
- Err(err) => {
- warn!("error deactivating account {name}, proceeding anyway - {err}");
- }
- }
- crate::config::acme::mark_account_deactivated(&name)?;
+ proxmox_acme_api::deactivate_account(&name, force).await?;
+
Ok(())
},
)
@@ -374,15 +274,7 @@ pub fn deactivate_account(
)]
/// Get the Terms of Service URL for an ACME directory.
async fn get_tos(directory: Option<String>) -> Result<Option<String>, Error> {
- let directory = directory.unwrap_or_else(|| {
- crate::config::acme::DEFAULT_ACME_DIRECTORY_ENTRY
- .url
- .to_owned()
- });
- Ok(AcmeClient::new(directory)
- .terms_of_service_url()
- .await?
- .map(str::to_owned))
+ proxmox_acme_api::get_tos(directory).await
}
#[api(
@@ -397,52 +289,7 @@ async fn get_tos(directory: Option<String>) -> Result<Option<String>, Error> {
)]
/// Get named known ACME directory endpoints.
fn get_directories() -> Result<&'static [KnownAcmeDirectory], Error> {
- Ok(crate::config::acme::KNOWN_ACME_DIRECTORIES)
-}
-
-/// Wrapper for efficient Arc use when returning the ACME challenge-plugin schema for serializing
-struct ChallengeSchemaWrapper {
- inner: Arc<Vec<AcmeChallengeSchema>>,
-}
-
-impl Serialize for ChallengeSchemaWrapper {
- fn serialize<S>(&self, serializer: S) -> Result<S::Ok, S::Error>
- where
- S: serde::Serializer,
- {
- self.inner.serialize(serializer)
- }
-}
-
-struct CachedSchema {
- schema: Arc<Vec<AcmeChallengeSchema>>,
- cached_mtime: SystemTime,
-}
-
-fn get_cached_challenge_schemas() -> Result<ChallengeSchemaWrapper, Error> {
- static CACHE: LazyLock<Mutex<Option<CachedSchema>>> = LazyLock::new(|| Mutex::new(None));
-
- // the actual loading code
- let mut last = CACHE.lock().unwrap();
-
- let actual_mtime = fs::metadata(crate::config::acme::ACME_DNS_SCHEMA_FN)?.modified()?;
-
- let schema = match &*last {
- Some(CachedSchema {
- schema,
- cached_mtime,
- }) if *cached_mtime >= actual_mtime => schema.clone(),
- _ => {
- let new_schema = Arc::new(crate::config::acme::load_dns_challenge_schema()?);
- *last = Some(CachedSchema {
- schema: Arc::clone(&new_schema),
- cached_mtime: actual_mtime,
- });
- new_schema
- }
- };
-
- Ok(ChallengeSchemaWrapper { inner: schema })
+ Ok(proxmox_acme_api::KNOWN_ACME_DIRECTORIES)
}
#[api(
@@ -457,69 +304,7 @@ fn get_cached_challenge_schemas() -> Result<ChallengeSchemaWrapper, Error> {
)]
/// Get named known ACME directory endpoints.
fn get_challenge_schema() -> Result<ChallengeSchemaWrapper, Error> {
- get_cached_challenge_schemas()
-}
-
-#[api]
-#[derive(Default, Deserialize, Serialize)]
-#[serde(rename_all = "kebab-case")]
-/// The API's format is inherited from PVE/PMG:
-pub struct PluginConfig {
- /// Plugin ID.
- plugin: String,
-
- /// Plugin type.
- #[serde(rename = "type")]
- ty: String,
-
- /// DNS Api name.
- #[serde(skip_serializing_if = "Option::is_none", default)]
- api: Option<String>,
-
- /// Plugin configuration data.
- #[serde(skip_serializing_if = "Option::is_none", default)]
- data: Option<String>,
-
- /// Extra delay in seconds to wait before requesting validation.
- ///
- /// Allows to cope with long TTL of DNS records.
- #[serde(skip_serializing_if = "Option::is_none", default)]
- validation_delay: Option<u32>,
-
- /// Flag to disable the config.
- #[serde(skip_serializing_if = "Option::is_none", default)]
- disable: Option<bool>,
-}
-
-// See PMG/PVE's $modify_cfg_for_api sub
-fn modify_cfg_for_api(id: &str, ty: &str, data: &Value) -> PluginConfig {
- let mut entry = data.clone();
-
- let obj = entry.as_object_mut().unwrap();
- obj.remove("id");
- obj.insert("plugin".to_string(), Value::String(id.to_owned()));
- obj.insert("type".to_string(), Value::String(ty.to_owned()));
-
- // FIXME: This needs to go once the `Updater` is fixed.
- // None of these should be able to fail unless the user changed the files by hand, in which
- // case we leave the unmodified string in the Value for now. This will be handled with an error
- // later.
- if let Some(Value::String(ref mut data)) = obj.get_mut("data") {
- if let Ok(new) = proxmox_base64::url::decode_no_pad(&data) {
- if let Ok(utf8) = String::from_utf8(new) {
- *data = utf8;
- }
- }
- }
-
- // PVE/PMG do this explicitly for ACME plugins...
- // obj.insert("digest".to_string(), Value::String(digest.clone()));
-
- serde_json::from_value(entry).unwrap_or_else(|_| PluginConfig {
- plugin: "*Error*".to_string(),
- ty: "*Error*".to_string(),
- ..Default::default()
- })
+ proxmox_acme_api::get_cached_challenge_schemas()
}
#[api(
@@ -535,12 +320,7 @@ fn modify_cfg_for_api(id: &str, ty: &str, data: &Value) -> PluginConfig {
)]
/// List ACME challenge plugins.
pub fn list_plugins(rpcenv: &mut dyn RpcEnvironment) -> Result<Vec<PluginConfig>, Error> {
- let (plugins, digest) = plugin::config()?;
- rpcenv["digest"] = hex::encode(digest).into();
- Ok(plugins
- .iter()
- .map(|(id, (ty, data))| modify_cfg_for_api(id, ty, data))
- .collect())
+ proxmox_acme_api::list_plugins(rpcenv)
}
#[api(
@@ -557,13 +337,7 @@ pub fn list_plugins(rpcenv: &mut dyn RpcEnvironment) -> Result<Vec<PluginConfig>
)]
/// List ACME challenge plugins.
pub fn get_plugin(id: String, rpcenv: &mut dyn RpcEnvironment) -> Result<PluginConfig, Error> {
- let (plugins, digest) = plugin::config()?;
- rpcenv["digest"] = hex::encode(digest).into();
-
- match plugins.get(&id) {
- Some((ty, data)) => Ok(modify_cfg_for_api(&id, ty, data)),
- None => http_bail!(NOT_FOUND, "no such plugin"),
- }
+ proxmox_acme_api::get_plugin(id, rpcenv)
}
// Currently we only have "the" standalone plugin and DNS plugins so we can just flatten a
@@ -595,30 +369,7 @@ pub fn get_plugin(id: String, rpcenv: &mut dyn RpcEnvironment) -> Result<PluginC
)]
/// Add ACME plugin configuration.
pub fn add_plugin(r#type: String, core: DnsPluginCore, data: String) -> Result<(), Error> {
- // Currently we only support DNS plugins and the standalone plugin is "fixed":
- if r#type != "dns" {
- param_bail!("type", "invalid ACME plugin type: {:?}", r#type);
- }
-
- let data = String::from_utf8(proxmox_base64::decode(data)?)
- .map_err(|_| format_err!("data must be valid UTF-8"))?;
-
- let id = core.id.clone();
-
- let _lock = plugin::lock()?;
-
- let (mut plugins, _digest) = plugin::config()?;
- if plugins.contains_key(&id) {
- param_bail!("id", "ACME plugin ID {:?} already exists", id);
- }
-
- let plugin = serde_json::to_value(DnsPlugin { core, data })?;
-
- plugins.insert(id, r#type, plugin);
-
- plugin::save_config(&plugins)?;
-
- Ok(())
+ proxmox_acme_api::add_plugin(r#type, core, data)
}
#[api(
@@ -634,26 +385,7 @@ pub fn add_plugin(r#type: String, core: DnsPluginCore, data: String) -> Result<(
)]
/// Delete an ACME plugin configuration.
pub fn delete_plugin(id: String) -> Result<(), Error> {
- let _lock = plugin::lock()?;
-
- let (mut plugins, _digest) = plugin::config()?;
- if plugins.remove(&id).is_none() {
- http_bail!(NOT_FOUND, "no such plugin");
- }
- plugin::save_config(&plugins)?;
-
- Ok(())
-}
-
-#[api()]
-#[derive(Serialize, Deserialize)]
-#[serde(rename_all = "kebab-case")]
-/// Deletable property name
-pub enum DeletableProperty {
- /// Delete the disable property
- Disable,
- /// Delete the validation-delay property
- ValidationDelay,
+ proxmox_acme_api::delete_plugin(id)
}
#[api(
@@ -675,12 +407,12 @@ pub enum DeletableProperty {
type: Array,
optional: true,
items: {
- type: DeletableProperty,
+ type: DeletablePluginProperty,
}
},
digest: {
- description: "Digest to protect against concurrent updates",
optional: true,
+ type: ConfigDigest,
},
},
},
@@ -694,65 +426,8 @@ pub fn update_plugin(
id: String,
update: DnsPluginCoreUpdater,
data: Option<String>,
- delete: Option<Vec<DeletableProperty>>,
- digest: Option<String>,
+ delete: Option<Vec<DeletablePluginProperty>>,
+ digest: Option<ConfigDigest>,
) -> Result<(), Error> {
- let data = data
- .as_deref()
- .map(proxmox_base64::decode)
- .transpose()?
- .map(String::from_utf8)
- .transpose()
- .map_err(|_| format_err!("data must be valid UTF-8"))?;
-
- let _lock = plugin::lock()?;
-
- let (mut plugins, expected_digest) = plugin::config()?;
-
- if let Some(digest) = digest {
- let digest = <[u8; 32]>::from_hex(digest)?;
- crate::tools::detect_modified_configuration_file(&digest, &expected_digest)?;
- }
-
- match plugins.get_mut(&id) {
- Some((ty, ref mut entry)) => {
- if ty != "dns" {
- bail!("cannot update plugin of type {:?}", ty);
- }
-
- let mut plugin = DnsPlugin::deserialize(&*entry)?;
-
- if let Some(delete) = delete {
- for delete_prop in delete {
- match delete_prop {
- DeletableProperty::ValidationDelay => {
- plugin.core.validation_delay = None;
- }
- DeletableProperty::Disable => {
- plugin.core.disable = None;
- }
- }
- }
- }
- if let Some(data) = data {
- plugin.data = data;
- }
- if let Some(api) = update.api {
- plugin.core.api = api;
- }
- if update.validation_delay.is_some() {
- plugin.core.validation_delay = update.validation_delay;
- }
- if update.disable.is_some() {
- plugin.core.disable = update.disable;
- }
-
- *entry = serde_json::to_value(plugin)?;
- }
- None => http_bail!(NOT_FOUND, "no such plugin"),
- }
-
- plugin::save_config(&plugins)?;
-
- Ok(())
+ proxmox_acme_api::update_plugin(id, update, data, delete, digest)
}
diff --git a/src/api2/types/acme.rs b/src/api2/types/acme.rs
index 7c9063c0..2905b41b 100644
--- a/src/api2/types/acme.rs
+++ b/src/api2/types/acme.rs
@@ -44,22 +44,6 @@ pub const ACME_DOMAIN_PROPERTY_SCHEMA: Schema =
.format(&ApiStringFormat::PropertyString(&AcmeDomain::API_SCHEMA))
.schema();
-#[api(
- properties: {
- name: { type: String },
- url: { type: String },
- },
-)]
-/// An ACME directory endpoint with a name and URL.
-#[derive(Serialize)]
-pub struct KnownAcmeDirectory {
- /// The ACME directory's name.
- pub name: &'static str,
-
- /// The ACME directory's endpoint URL.
- pub url: &'static str,
-}
-
#[api(
properties: {
schema: {
diff --git a/src/bin/proxmox_backup_manager/acme.rs b/src/bin/proxmox_backup_manager/acme.rs
index bb987b26..e7bd67af 100644
--- a/src/bin/proxmox_backup_manager/acme.rs
+++ b/src/bin/proxmox_backup_manager/acme.rs
@@ -8,10 +8,8 @@ use proxmox_schema::api;
use proxmox_sys::fs::file_get_contents;
use proxmox_acme::async_client::AcmeClient;
-use proxmox_acme_api::AcmeAccountName;
+use proxmox_acme_api::{AcmeAccountName, DnsPluginCore, KNOWN_ACME_DIRECTORIES};
use proxmox_backup::api2;
-use proxmox_backup::config::acme::plugin::DnsPluginCore;
-use proxmox_backup::config::acme::KNOWN_ACME_DIRECTORIES;
pub fn acme_mgmt_cli() -> CommandLineInterface {
let cmd_def = CliCommandMap::new()
@@ -122,7 +120,7 @@ async fn register_account(
match input.trim().parse::<usize>() {
Ok(n) if n < KNOWN_ACME_DIRECTORIES.len() => {
- break (KNOWN_ACME_DIRECTORIES[n].url.to_owned(), false);
+ break (KNOWN_ACME_DIRECTORIES[n].url.to_string(), false);
}
Ok(n) if n == KNOWN_ACME_DIRECTORIES.len() => {
input.clear();
diff --git a/src/config/acme/mod.rs b/src/config/acme/mod.rs
index d31b2bc9..35cda50b 100644
--- a/src/config/acme/mod.rs
+++ b/src/config/acme/mod.rs
@@ -1,8 +1,7 @@
use std::collections::HashMap;
use std::ops::ControlFlow;
-use std::path::Path;
-use anyhow::{bail, format_err, Error};
+use anyhow::Error;
use serde_json::Value;
use proxmox_sys::error::SysError;
@@ -10,8 +9,8 @@ use proxmox_sys::fs::{file_read_string, CreateOptions};
use pbs_api_types::PROXMOX_SAFE_ID_REGEX;
-use crate::api2::types::{AcmeChallengeSchema, KnownAcmeDirectory};
-use proxmox_acme_api::AcmeAccountName;
+use crate::api2::types::AcmeChallengeSchema;
+use proxmox_acme_api::{AcmeAccountName, KnownAcmeDirectory, KNOWN_ACME_DIRECTORIES};
pub(crate) const ACME_DIR: &str = pbs_buildcfg::configdir!("/acme");
pub(crate) const ACME_ACCOUNT_DIR: &str = pbs_buildcfg::configdir!("/acme/accounts");
@@ -36,23 +35,8 @@ pub(crate) fn make_acme_dir() -> Result<(), Error> {
create_acme_subdir(ACME_DIR)
}
-pub const KNOWN_ACME_DIRECTORIES: &[KnownAcmeDirectory] = &[
- KnownAcmeDirectory {
- name: "Let's Encrypt V2",
- url: "https://acme-v02.api.letsencrypt.org/directory",
- },
- KnownAcmeDirectory {
- name: "Let's Encrypt V2 Staging",
- url: "https://acme-staging-v02.api.letsencrypt.org/directory",
- },
-];
-
pub const DEFAULT_ACME_DIRECTORY_ENTRY: &KnownAcmeDirectory = &KNOWN_ACME_DIRECTORIES[0];
-pub fn account_path(name: &str) -> String {
- format!("{ACME_ACCOUNT_DIR}/{name}")
-}
-
pub fn foreach_acme_account<F>(mut func: F) -> Result<(), Error>
where
F: FnMut(AcmeAccountName) -> ControlFlow<Result<(), Error>>,
@@ -83,28 +67,6 @@ where
}
}
-pub fn mark_account_deactivated(name: &str) -> Result<(), Error> {
- let from = account_path(name);
- for i in 0..100 {
- let to = account_path(&format!("_deactivated_{name}_{i}"));
- if !Path::new(&to).exists() {
- return std::fs::rename(&from, &to).map_err(|err| {
- format_err!(
- "failed to move account path {:?} to {:?} - {}",
- from,
- to,
- err
- )
- });
- }
- }
- bail!(
- "No free slot to rename deactivated account {:?}, please cleanup {:?}",
- from,
- ACME_ACCOUNT_DIR
- );
-}
-
pub fn load_dns_challenge_schema() -> Result<Vec<AcmeChallengeSchema>, Error> {
let raw = file_read_string(ACME_DNS_SCHEMA_FN)?;
let schemas: serde_json::Map<String, Value> = serde_json::from_str(&raw)?;
--
2.47.3
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
^ permalink raw reply [relevance 8%]
* [pbs-devel] [PATCH proxmox-backup 1/4] acme: include proxmox-acme-api dependency
2025-12-02 15:56 12% [pbs-devel] [PATCH proxmox{-backup, } 0/8] fix #6939: acme: support servers returning 204 for nonce requests Samuel Rufinatscha
@ 2025-12-02 15:56 15% ` Samuel Rufinatscha
2025-12-02 15:56 6% ` [pbs-devel] [PATCH proxmox-backup 2/4] acme: drop local AcmeClient Samuel Rufinatscha
` (7 subsequent siblings)
8 siblings, 0 replies; 200+ results
From: Samuel Rufinatscha @ 2025-12-02 15:56 UTC (permalink / raw)
To: pbs-devel
PBS currently uses its own ACME client and API logic, while PDM uses the
factored out proxmox-acme and proxmox-acme-api crates. This duplication
risks differences in behaviour and requires ACME maintenance in two
places. This patch is part of a series to move PBS over to the shared
ACME stack.
Changes:
- Add proxmox-acme-api with the "impl" feature as a dependency.
- Initialize proxmox_acme_api in proxmox-backup-api, -manager and -proxy.
* Inits PBS config dir /acme as proxmox ACME directory
Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
---
Cargo.toml | 3 +++
src/bin/proxmox-backup-api.rs | 2 ++
src/bin/proxmox-backup-manager.rs | 2 ++
src/bin/proxmox-backup-proxy.rs | 1 +
4 files changed, 8 insertions(+)
diff --git a/Cargo.toml b/Cargo.toml
index ff143932..bdaf7d85 100644
--- a/Cargo.toml
+++ b/Cargo.toml
@@ -101,6 +101,7 @@ pbs-api-types = "1.0.8"
# other proxmox crates
pathpatterns = "1"
proxmox-acme = "1"
+proxmox-acme-api = { version = "1", features = [ "impl" ] }
pxar = "1"
# PBS workspace
@@ -251,6 +252,7 @@ pbs-api-types.workspace = true
# in their respective repo
proxmox-acme.workspace = true
+proxmox-acme-api.workspace = true
pxar.workspace = true
# proxmox-backup workspace/internal crates
@@ -269,6 +271,7 @@ proxmox-rrd-api-types.workspace = true
[patch.crates-io]
#pbs-api-types = { path = "../proxmox/pbs-api-types" }
#proxmox-acme = { path = "../proxmox/proxmox-acme" }
+#proxmox-acme-api = { path = "../proxmox/proxmox-acme-api" }
#proxmox-api-macro = { path = "../proxmox/proxmox-api-macro" }
#proxmox-apt = { path = "../proxmox/proxmox-apt" }
#proxmox-apt-api-types = { path = "../proxmox/proxmox-apt-api-types" }
diff --git a/src/bin/proxmox-backup-api.rs b/src/bin/proxmox-backup-api.rs
index 417e9e97..48f10092 100644
--- a/src/bin/proxmox-backup-api.rs
+++ b/src/bin/proxmox-backup-api.rs
@@ -8,6 +8,7 @@ use hyper_util::server::graceful::GracefulShutdown;
use tokio::net::TcpListener;
use tracing::level_filters::LevelFilter;
+use pbs_buildcfg::configdir;
use proxmox_http::Body;
use proxmox_lang::try_block;
use proxmox_rest_server::{ApiConfig, RestServer};
@@ -78,6 +79,7 @@ async fn run() -> Result<(), Error> {
let mut command_sock = proxmox_daemon::command_socket::CommandSocket::new(backup_user.gid);
proxmox_product_config::init(backup_user.clone(), pbs_config::priv_user()?);
+ proxmox_acme_api::init(configdir!("/acme"), true)?;
let dir_opts = CreateOptions::new()
.owner(backup_user.uid)
diff --git a/src/bin/proxmox-backup-manager.rs b/src/bin/proxmox-backup-manager.rs
index d9f41353..0facb76c 100644
--- a/src/bin/proxmox-backup-manager.rs
+++ b/src/bin/proxmox-backup-manager.rs
@@ -18,6 +18,7 @@ use pbs_api_types::{
VERIFICATION_OUTDATED_AFTER_SCHEMA, VERIFY_JOB_READ_THREADS_SCHEMA,
VERIFY_JOB_VERIFY_THREADS_SCHEMA,
};
+use pbs_buildcfg::configdir;
use pbs_client::{display_task_log, view_task_result};
use pbs_config::sync;
use pbs_tools::json::required_string_param;
@@ -669,6 +670,7 @@ async fn run() -> Result<(), Error> {
.init()?;
proxmox_backup::server::notifications::init()?;
proxmox_product_config::init(pbs_config::backup_user()?, pbs_config::priv_user()?);
+ proxmox_acme_api::init(configdir!("/acme"), false)?;
let cmd_def = CliCommandMap::new()
.insert("acl", acl_commands())
diff --git a/src/bin/proxmox-backup-proxy.rs b/src/bin/proxmox-backup-proxy.rs
index 92a8cb3c..0bab18ec 100644
--- a/src/bin/proxmox-backup-proxy.rs
+++ b/src/bin/proxmox-backup-proxy.rs
@@ -190,6 +190,7 @@ async fn run() -> Result<(), Error> {
proxmox_backup::server::notifications::init()?;
metric_collection::init()?;
proxmox_product_config::init(pbs_config::backup_user()?, pbs_config::priv_user()?);
+ proxmox_acme_api::init(configdir!("/acme"), false)?;
let mut indexpath = PathBuf::from(pbs_buildcfg::JS_DIR);
indexpath.push("index.hbs");
--
2.47.3
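In short, the pattern this patch introduces (initialize the shared crate once per binary, then let the product's endpoints delegate) can be sketched as follows. All names below are simplified stand-ins, not the real proxmox-acme-api interface:

```rust
use std::sync::OnceLock;

// Stand-in for the shared crate's global state: the ACME config directory.
static ACME_DIR: OnceLock<String> = OnceLock::new();

// Stand-in for proxmox_acme_api::init(): remember the directory once; the
// real function can additionally create the directory (second parameter).
fn acme_api_init(acme_dir: &str, _create_dir: bool) -> Result<(), String> {
    // Tolerate repeated init with the same directory (keeps this demo simple).
    if let Some(existing) = ACME_DIR.get() {
        return if existing.as_str() == acme_dir {
            Ok(())
        } else {
            Err("already initialized with a different directory".to_string())
        };
    }
    ACME_DIR
        .set(acme_dir.to_string())
        .map_err(|_| "concurrent initialization".to_string())
}

// Stand-in for the factored-out implementation in the shared crate.
mod shared {
    pub fn list_plugins() -> Vec<String> {
        vec!["standalone".to_string()]
    }
}

// The product endpoint keeps its public signature but becomes a thin
// wrapper around the shared implementation.
pub fn list_plugins() -> Result<Vec<String>, String> {
    Ok(shared::list_plugins())
}

fn main() {
    // Mirrors the per-binary init added in this patch.
    acme_api_init("/etc/proxmox-backup/acme", true).unwrap();
    let plugins = list_plugins().unwrap();
    assert_eq!(plugins, vec!["standalone".to_string()]);
    println!("{} plugin(s) configured", plugins.len());
}
```

The later patches in the series turn the remaining ACME endpoints into exactly this kind of thin wrapper.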
^ permalink raw reply [relevance 15%]
* Re: [pbs-devel] [PATCH proxmox-backup v5 3/4] partial fix #6049: datastore: use config fast-path in Drop
2025-11-28 10:46 5% ` Fabian Grünbichler
@ 2025-11-28 11:10 6% ` Samuel Rufinatscha
0 siblings, 0 replies; 200+ results
From: Samuel Rufinatscha @ 2025-11-28 11:10 UTC (permalink / raw)
To: Fabian Grünbichler, Proxmox Backup Server development discussion
On 11/28/25 11:46 AM, Fabian Grünbichler wrote:
> On November 28, 2025 10:03 am, Samuel Rufinatscha wrote:
>> On 11/26/25 4:15 PM, Fabian Grünbichler wrote:
>>> On November 24, 2025 6:04 pm, Samuel Rufinatscha wrote:
>>>> @@ -307,12 +332,12 @@ impl DatastoreThreadSettings {
>>>> /// - If the cached generation matches the current generation, the
>>>> /// cached config is returned.
>>>> /// - Otherwise the config is re-read from disk. If `update_cache` is
>>>> -/// `true`, the new config and current generation are stored in the
>>>> +/// `true`, the new config and bumped generation are stored in the
>>>> /// cache. Callers that set `update_cache = true` must hold the
>>>> /// datastore config lock to avoid racing with concurrent config
>>>> /// changes.
>>>> /// - If `update_cache` is `false`, the freshly read config is returned
>>>> -/// but the cache is left unchanged.
>>>> +/// but the cache and generation are left unchanged.
>>>> ///
>>>> /// If `ConfigVersionCache` is not available, the config is always read
>>>> /// from disk and `None` is returned as the generation.
>>>> @@ -333,14 +358,23 @@ fn datastore_section_config_cached(
>>>
>>> does this part here make any sense in this patch?
>>>
>>> we don't check the generation in the Drop handler anyway, so it will get
>>> the latest cached version, no matter what?
>>>
>>
> we don't check the generation in the Drop handler, but the Drop handler
> depends on this to get the freshest possible cached version, doesn't it?
>
> datastore_section_config_cached will only reload the config if it was
> changed over our API and the generation in the cached entry no
> longer matches the current generation number. in that case there is no
> need to bump the generation number, since that was already done by
> whichever call saved the config and caused the generation number
> mismatch in the first place - this already invalidated all previously
> cached entries..
>
> bumping the generation number only makes sense once we introduce the
> force-reload mechanism in patch #4.
>
>>
>>> we'd only end up in this part of the code via lookup_datastore, and only
>>> if:
>>> - the previous cached entry and the current one have a different
>>> generation -> no need to bump again, the cache is already invalidated
>>> - there is no previous cached entry -> nothing to invalidate
>>>
>>> I think this part should move to the next patch..
>>
>> Shouldn't it rather be in patch 2 then, instead of being part of the TTL
>> feature? Also, I would then adjust the comment below so that it doesn't
>> just benefit the Drop handler that calls
>> datastore_section_config_cached(false), but future uses of
>> datastore_section_config_cached(false) in general.
>
> it has no benefit at this point in the series (or after/at patch #2),
> see above. bumping only makes sense if we detect the generation number
> is not valid, which we can only do via the digest check from patch#4.
> and the digest check only makes sense with the TTL force-reload, because
> else we can never end up in the code path where we read the config
> without the cache already being invalid anyway.
>
Makes sense, I see. Thanks for clarifying, Fabian!
Will add it to patch 4.
>>
>>>
>>>> let (config_raw, _digest) = pbs_config::datastore::config()?;
>>>> let config = Arc::new(config_raw);
>>>>
>>>> + let mut effective_gen = current_gen;
>>>> if update_cache {
>>>> + // Bump the generation. This ensures that Drop
>>>> + // handlers will detect that a newer config exists
>>>> + // and will not rely on a stale cached entry for
>>>> + // maintenance mandate.
>>>> + let prev_gen = version_cache.increase_datastore_generation();
>>>> + effective_gen = prev_gen + 1;
>>>> +
>>>> + // Persist
>>>> *config_cache = Some(DatastoreConfigCache {
>>>> config: config.clone(),
>>>> - last_generation: current_gen,
>>>> + last_generation: effective_gen,
>>>> });
>>>> }
>>>>
>>>> - Ok((config, Some(current_gen)))
>>>> + Ok((config, Some(effective_gen)))
>>>> } else {
>>>> // Fallback path, no config version cache: read datastore.cfg and return None as generation
>>>> *config_cache = None;
>>>> --
>>>> 2.47.3
>>>>
>>>>
>>>>
>>
>>
>>
^ permalink raw reply [relevance 6%]
* Re: [pbs-devel] [PATCH proxmox-backup v5 3/4] partial fix #6049: datastore: use config fast-path in Drop
2025-11-28 9:03 6% ` Samuel Rufinatscha
@ 2025-11-28 10:46 5% ` Fabian Grünbichler
2025-11-28 11:10 6% ` Samuel Rufinatscha
0 siblings, 1 reply; 200+ results
From: Fabian Grünbichler @ 2025-11-28 10:46 UTC (permalink / raw)
To: Proxmox Backup Server development discussion, Samuel Rufinatscha
On November 28, 2025 10:03 am, Samuel Rufinatscha wrote:
> On 11/26/25 4:15 PM, Fabian Grünbichler wrote:
>> On November 24, 2025 6:04 pm, Samuel Rufinatscha wrote:
>>> @@ -307,12 +332,12 @@ impl DatastoreThreadSettings {
>>> /// - If the cached generation matches the current generation, the
>>> /// cached config is returned.
>>> /// - Otherwise the config is re-read from disk. If `update_cache` is
>>> -/// `true`, the new config and current generation are stored in the
>>> +/// `true`, the new config and bumped generation are stored in the
>>> /// cache. Callers that set `update_cache = true` must hold the
>>> /// datastore config lock to avoid racing with concurrent config
>>> /// changes.
>>> /// - If `update_cache` is `false`, the freshly read config is returned
>>> -/// but the cache is left unchanged.
>>> +/// but the cache and generation are left unchanged.
>>> ///
>>> /// If `ConfigVersionCache` is not available, the config is always read
>>> /// from disk and `None` is returned as the generation.
>>> @@ -333,14 +358,23 @@ fn datastore_section_config_cached(
>>
>> does this part here make any sense in this patch?
>>
>> we don't check the generation in the Drop handler anyway, so it will get
>> the latest cached version, no matter what?
>>
>
> we don't check the generation in the Drop handler, but the drop handler
> depends on this to potentially get a most fresh cached version?
datastore_section_config_cached will only reload the config if it was
changed over our API and the generation in the cached entry no
longer matches the current generation number. in that case there is no
need to bump the generation number, since that was already done by
whichever call saved the config and caused the generation number
mismatch in the first place - this already invalidated all previously
cached entries..
bumping the generation number only makes sense once we introduce the
force-reload mechanism in patch #4.
>
>> we'd only end up in this part of the code via lookup_datastore, and only
>> if:
>> - the previous cached entry and the current one have a different
>> generation -> no need to bump again, the cache is already invalidated
>> - there is no previous cached entry -> nothing to invalidate
>>
>> I think this part should move to the next patch..
>
> Shouldn't it rather be in patch 2 then, instead of being part of the TTL
> feature? Also, I would then adjust the comment below so that it doesn't
> just benefit the Drop handler that calls
> datastore_section_config_cached(false), but future uses of
> datastore_section_config_cached(false) in general.
it has no benefit at this point in the series (or after/at patch #2),
see above. bumping only makes sense if we detect the generation number
is not valid, which we can only do via the digest check from patch#4.
and the digest check only makes sense with the TTL force-reload, because
else we can never end up in the code path where we read the config
without the cache already being invalid anyway.
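in other words, reduced to a minimal sketch (with simplified stand-in types, not the real pbs-datastore/ConfigVersionCache API), the lookup path we are discussing looks roughly like this:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Mutex, OnceLock};

// Simplified stand-in for the shared ConfigVersionCache generation counter.
static GENERATION: AtomicUsize = AtomicUsize::new(0);

fn current_generation() -> usize {
    GENERATION.load(Ordering::Acquire)
}

// Returns the previous value, like increase_datastore_generation().
fn increase_generation() -> usize {
    GENERATION.fetch_add(1, Ordering::AcqRel)
}

// Stand-in for the parsed section config (Arc'd so clones are cheap).
type Config = Arc<String>;

struct CacheEntry {
    config: Config,
    last_generation: usize,
}

fn cache() -> &'static Mutex<Option<CacheEntry>> {
    static CACHE: OnceLock<Mutex<Option<CacheEntry>>> = OnceLock::new();
    CACHE.get_or_init(|| Mutex::new(None))
}

fn read_config_from_disk() -> Config {
    Arc::new("parsed datastore.cfg".to_string())
}

/// Return the cached config when its generation still matches; otherwise
/// re-read from disk, updating the cache only when `update_cache` is set
/// (such callers must hold the config lock in the real code).
fn config_cached(update_cache: bool) -> (Config, usize) {
    let generation = current_generation();
    let mut guard = cache().lock().unwrap();
    if let Some(entry) = guard.as_ref() {
        if entry.last_generation == generation {
            return (entry.config.clone(), generation);
        }
    }
    let config = read_config_from_disk();
    if update_cache {
        *guard = Some(CacheEntry {
            config: config.clone(),
            last_generation: generation,
        });
    }
    (config, generation)
}

fn main() {
    let (c1, g1) = config_cached(true); // cold: read from disk, fill cache
    let (c2, g2) = config_cached(false); // warm: served from cache
    assert_eq!(g1, g2);
    assert!(Arc::ptr_eq(&c1, &c2));
    increase_generation(); // e.g. a save_config() call bumped it
    let (_c3, g3) = config_cached(false); // stale: re-read, cache untouched
    assert_eq!(g3, g1 + 1);
    println!("generation {g3}");
}
```

a bump inside config_cached() itself only pays off once some caller can reach the mismatch path while the cache is not already invalid, which is what the TTL force-reload in patch #4 introduces.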
>
>>
>>> let (config_raw, _digest) = pbs_config::datastore::config()?;
>>> let config = Arc::new(config_raw);
>>>
>>> + let mut effective_gen = current_gen;
>>> if update_cache {
>>> + // Bump the generation. This ensures that Drop
>>> + // handlers will detect that a newer config exists
>>> + // and will not rely on a stale cached entry for
>>> + // maintenance mandate.
>>> + let prev_gen = version_cache.increase_datastore_generation();
>>> + effective_gen = prev_gen + 1;
>>> +
>>> + // Persist
>>> *config_cache = Some(DatastoreConfigCache {
>>> config: config.clone(),
>>> - last_generation: current_gen,
>>> + last_generation: effective_gen,
>>> });
>>> }
>>>
>>> - Ok((config, Some(current_gen)))
>>> + Ok((config, Some(effective_gen)))
>>> } else {
>>> // Fallback path, no config version cache: read datastore.cfg and return None as generation
>>> *config_cache = None;
>>> --
>>> 2.47.3
>>>
>>>
>>>
>
>
>
^ permalink raw reply [relevance 5%]
* Re: [pbs-devel] [PATCH proxmox-backup v5 3/4] partial fix #6049: datastore: use config fast-path in Drop
2025-11-26 15:15 5% ` Fabian Grünbichler
@ 2025-11-28 9:03 6% ` Samuel Rufinatscha
2025-11-28 10:46 5% ` Fabian Grünbichler
0 siblings, 1 reply; 200+ results
From: Samuel Rufinatscha @ 2025-11-28 9:03 UTC (permalink / raw)
To: Proxmox Backup Server development discussion, Fabian Grünbichler
On 11/26/25 4:15 PM, Fabian Grünbichler wrote:
> On November 24, 2025 6:04 pm, Samuel Rufinatscha wrote:
>> The Drop impl of DataStore re-read datastore.cfg to decide whether
>> the entry should be evicted from the in-process cache (based on
>> maintenance mode’s clear_from_cache). During the investigation of
>> issue #6049 [1], a flamegraph [2] showed that the config reload in Drop
>> accounted for a measurable share of CPU time under load.
>>
>> This patch wires the datastore config fast path to the Drop
>> impl to eventually avoid an expensive config reload from disk to capture
>> the maintenance mandate. Also, to ensure the Drop handlers will detect
>> that a newer config exists / to mitigate usage of an eventually stale
>> cached entry, generation will not only be bumped on config save, but also
>> on re-read of the config file (slow path), if `update_cache = true`.
>>
>> Links
>>
>> [1] Bugzilla: https://bugzilla.proxmox.com/show_bug.cgi?id=6049
>> [2] cargo-flamegraph: https://github.com/flamegraph-rs/flamegraph
>>
>> Fixes: #6049
>> Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
>> ---
>> Changes:
>>
>> From v1 → v2
>> - Replace caching logic with the datastore_section_config_cached()
>> helper.
>>
>> From v2 → v3
>> No changes
>>
>> From v3 → v4, thanks @Fabian
>> - Pass datastore_section_config_cached(false) in Drop to avoid
>> concurrent cache updates.
>>
>> From v4 → v5
>> - Rebased only, no changes
>>
>> pbs-datastore/src/datastore.rs | 60 ++++++++++++++++++++++++++--------
>> 1 file changed, 47 insertions(+), 13 deletions(-)
>>
>> diff --git a/pbs-datastore/src/datastore.rs b/pbs-datastore/src/datastore.rs
>> index c9cb5d65..7638a899 100644
>> --- a/pbs-datastore/src/datastore.rs
>> +++ b/pbs-datastore/src/datastore.rs
>> @@ -225,15 +225,40 @@ impl Drop for DataStore {
>> // remove datastore from cache iff
>> // - last task finished, and
>> // - datastore is in a maintenance mode that mandates it
>> - let remove_from_cache = last_task
>> - && pbs_config::datastore::config()
>> - .and_then(|(s, _)| s.lookup::<DataStoreConfig>("datastore", self.name()))
>> - .is_ok_and(|c| {
>> - c.get_maintenance_mode()
>> - .is_some_and(|m| m.clear_from_cache())
>> - });
>
> old code here ignored parsing/locking/.. issues and just assumed if no
> config can be obtained nothing should be done..
>
>> -
>> - if remove_from_cache {
>> +
>> + // first check: check if last task finished
>> + if !last_task {
>> + return;
>> + }
>> +
>> + let (section_config, _gen) = match datastore_section_config_cached(false) {
>> + Ok(v) => v,
>> + Err(err) => {
>> + log::error!(
>> + "failed to load datastore config in Drop for {} - {err}",
>> + self.name()
>> + );
>> + return;
>> + }
>> + };
>> +
>> + let datastore_cfg: DataStoreConfig =
>> + match section_config.lookup("datastore", self.name()) {
>> + Ok(cfg) => cfg,
>> + Err(err) => {
>> + log::error!(
>> + "failed to look up datastore '{}' in Drop - {err}",
>> + self.name()
>> + );
>> + return;
>
> here we now have fancy error logging ;) which can be fine, but if we go
> from silently ignoring errors to logging them at error level that should
> be mentioned to make it clear that it is intentional.
>
Makes sense, will mention that change in the commit message.
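For the record, the Drop-time eviction under discussion, reduced to a minimal sketch with hypothetical simplified types (the real code additionally loads the config and logs errors, as discussed above):

```rust
use std::collections::HashMap;
use std::sync::{Mutex, OnceLock};

// Stand-in for the in-process datastore cache (DATASTORE_MAP).
fn datastore_map() -> &'static Mutex<HashMap<String, ()>> {
    static MAP: OnceLock<Mutex<HashMap<String, ()>>> = OnceLock::new();
    MAP.get_or_init(|| Mutex::new(HashMap::new()))
}

struct DataStore {
    name: String,
    last_task: bool,
}

// Stand-in for looking up the datastore's maintenance mode from the
// (cached) config and asking whether it requires eviction.
fn maintenance_mode_clears_cache(_name: &str) -> bool {
    true
}

impl Drop for DataStore {
    fn drop(&mut self) {
        // first check: only act once the last task finished
        if !self.last_task {
            return;
        }
        // second check: does the maintenance mode require closing FDs?
        if maintenance_mode_clears_cache(&self.name) {
            datastore_map().lock().unwrap().remove(&self.name);
        }
    }
}

fn main() {
    datastore_map().lock().unwrap().insert("store1".into(), ());
    {
        let _ds = DataStore {
            name: "store1".into(),
            last_task: true,
        };
    } // dropped here: evicted from the map
    assert!(!datastore_map().lock().unwrap().contains_key("store1"));
    println!("evicted");
}
```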
> besides that, the second error here means that the datastore was removed
> from the config in the meantime.. in which case we should probably
> remove it from the map as well, if is still there, even though we can't
> check the maintenance mode..
>
>> + }
>> + };
>> +
>> + // second check: check maintenance mode mandate
>
> what is a "maintenance mode mandate"? ;)
>
> keeping it simple, why not just
>
> // check if maintenance mode requires closing FDs
>
I see, will rephrase this, thanks!
>> + if datastore_cfg
>> + .get_maintenance_mode()
>> + .is_some_and(|m| m.clear_from_cache())
>> + {
>> DATASTORE_MAP.lock().unwrap().remove(self.name());
>> }
>> }
>> @@ -307,12 +332,12 @@ impl DatastoreThreadSettings {
>> /// - If the cached generation matches the current generation, the
>> /// cached config is returned.
>> /// - Otherwise the config is re-read from disk. If `update_cache` is
>> -/// `true`, the new config and current generation are stored in the
>> +/// `true`, the new config and bumped generation are stored in the
>> /// cache. Callers that set `update_cache = true` must hold the
>> /// datastore config lock to avoid racing with concurrent config
>> /// changes.
>> /// - If `update_cache` is `false`, the freshly read config is returned
>> -/// but the cache is left unchanged.
>> +/// but the cache and generation are left unchanged.
>> ///
>> /// If `ConfigVersionCache` is not available, the config is always read
>> /// from disk and `None` is returned as the generation.
>> @@ -333,14 +358,23 @@ fn datastore_section_config_cached(
>
> does this part here make any sense in this patch?
>
> we don't check the generation in the Drop handler anyway, so it will get
> the latest cached version, no matter what?
>
we don't check the generation in the Drop handler, but the Drop handler
depends on this to get the freshest possible cached version, doesn't it?
> we'd only end up in this part of the code via lookup_datastore, and only
> if:
> - the previous cached entry and the current one have a different
> generation -> no need to bump again, the cache is already invalidated
> - there is no previous cached entry -> nothing to invalidate
>
> I think this part should move to the next patch..
Shouldn't it rather be in patch 2 then, instead of being part of the TTL
feature? Also, I would then adjust the comment below so that it doesn't
just benefit the Drop handler that calls
datastore_section_config_cached(false), but future uses of
datastore_section_config_cached(false) in general.
>
>> let (config_raw, _digest) = pbs_config::datastore::config()?;
>> let config = Arc::new(config_raw);
>>
>> + let mut effective_gen = current_gen;
>> if update_cache {
>> + // Bump the generation. This ensures that Drop
>> + // handlers will detect that a newer config exists
>> + // and will not rely on a stale cached entry for
>> + // maintenance mandate.
>> + let prev_gen = version_cache.increase_datastore_generation();
>> + effective_gen = prev_gen + 1;
>> +
>> + // Persist
>> *config_cache = Some(DatastoreConfigCache {
>> config: config.clone(),
>> - last_generation: current_gen,
>> + last_generation: effective_gen,
>> });
>> }
>>
>> - Ok((config, Some(current_gen)))
>> + Ok((config, Some(effective_gen)))
>> } else {
>> // Fallback path, no config version cache: read datastore.cfg and return None as generation
>> *config_cache = None;
>> --
>> 2.47.3
>>
>>
>>
^ permalink raw reply [relevance 6%]
* Re: [pbs-devel] [PATCH proxmox-backup v5 2/4] partial fix #6049: datastore: impl ConfigVersionCache fast path for lookups
2025-11-26 15:15 5% ` Fabian Grünbichler
@ 2025-11-26 17:21 6% ` Samuel Rufinatscha
0 siblings, 0 replies; 200+ results
From: Samuel Rufinatscha @ 2025-11-26 17:21 UTC (permalink / raw)
To: Proxmox Backup Server development discussion, Fabian Grünbichler
On 11/26/25 4:15 PM, Fabian Grünbichler wrote:
> On November 24, 2025 6:04 pm, Samuel Rufinatscha wrote:
>> Repeated /status requests caused lookup_datastore() to re-read and
>> parse datastore.cfg on every call. The issue was mentioned in report
>> #6049 [1]. cargo-flamegraph [2] confirmed that the hot path is
>> dominated by pbs_config::datastore::config() (config parsing).
>>
>> This patch implements caching of the global datastore.cfg using the
>> generation numbers from the shared config version cache. It caches the
>> datastore.cfg along with the generation number and, when a subsequent
>> lookup sees the same generation, it reuses the cached config without
>> re-reading it from disk. If the generation differs
>> (or the cache is unavailable), the config is re-read from disk.
>> If `update_cache = true`, the new config and current generation are
>> persisted in the cache. In this case, callers must hold the datastore
>> config lock to avoid racing with concurrent config changes.
>> If `update_cache` is `false` and generation did not match, the freshly
>> read config is returned but the cache is left unchanged. If
>> `ConfigVersionCache` is not available, the config is always read from
>> disk and `None` is returned as generation.
>>
>> Behavioral notes
>>
>> - The generation is bumped via the existing save_config() path, so
>> API-driven config changes are detected immediately.
>> - Manual edits to datastore.cfg are not detected; this is covered in a
>> dedicated patch in this series.
>> - DataStore::drop still performs a config read on the common path;
>> also covered in a dedicated patch in this series.
>>
>> Links
>>
>> [1] Bugzilla: https://bugzilla.proxmox.com/show_bug.cgi?id=6049
>> [2] cargo-flamegraph: https://github.com/flamegraph-rs/flamegraph
>>
>> Fixes: #6049
>> Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
>
> style nits below, but otherwise
>
> Reviewed-by: Fabian Grünbicher <f.gruenbichler@proxmox.com>
>
>> ---
>> Changes:
>>
>> From v1 → v2, thanks @Fabian
>> - Moved the ConfigVersionCache changes into its own patch.
>> - Introduced the global static DATASTORE_CONFIG_CACHE to store the
>> fully parsed datastore.cfg instead, along with its generation number.
>> Introduced DatastoreConfigCache struct to hold both.
>> - Removed and replaced the CachedDatastoreConfigTag field of
>> DataStoreImpl with a generation number field only (Option<usize>)
>> to validate DataStoreImpl reuse.
>> - Added DataStore::datastore_section_config_cached() helper function
>> to encapsulate the caching logic and simplify reuse.
>> - Modified DataStore::lookup_datastore() to use the new helper.
>>
>> From v2 → v3
>> No changes
>>
>> From v3 → v4, thanks @Fabian
>> - Restructured the version cache checks in
>> datastore_section_config_cached(), to simplify the logic.
>> - Added update_cache parameter to datastore_section_config_cached() to
>> control cache updates.
>>
>> From v4 → v5
>> - Rebased only, no changes
>>
>> pbs-datastore/Cargo.toml | 1 +
>> pbs-datastore/src/datastore.rs | 138 +++++++++++++++++++++++++--------
>> 2 files changed, 105 insertions(+), 34 deletions(-)
>
> this could be
>
> 2 files changed, 80 insertions(+), 17 deletions(-)
>
> see below. this might sound nit-picky, but keeping diffs concise makes
> reviewing a lot easier because of the higher signal to noise ratio..
>
>>
>> diff --git a/pbs-datastore/Cargo.toml b/pbs-datastore/Cargo.toml
>> index 8ce930a9..42f49a7b 100644
>> --- a/pbs-datastore/Cargo.toml
>> +++ b/pbs-datastore/Cargo.toml
>> @@ -40,6 +40,7 @@ proxmox-io.workspace = true
>> proxmox-lang.workspace=true
>> proxmox-s3-client = { workspace = true, features = [ "impl" ] }
>> proxmox-schema = { workspace = true, features = [ "api-macro" ] }
>> +proxmox-section-config.workspace = true
>> proxmox-serde = { workspace = true, features = [ "serde_json" ] }
>> proxmox-sys.workspace = true
>> proxmox-systemd.workspace = true
>> diff --git a/pbs-datastore/src/datastore.rs b/pbs-datastore/src/datastore.rs
>> index 36550ff6..c9cb5d65 100644
>> --- a/pbs-datastore/src/datastore.rs
>> +++ b/pbs-datastore/src/datastore.rs
>> @@ -34,7 +34,8 @@ use pbs_api_types::{
>> MaintenanceType, Operation, UPID,
>> };
>> use pbs_config::s3::S3_CFG_TYPE_ID;
>> -use pbs_config::BackupLockGuard;
>> +use pbs_config::{BackupLockGuard, ConfigVersionCache};
>> +use proxmox_section_config::SectionConfigData;
>>
>> use crate::backup_info::{
>> BackupDir, BackupGroup, BackupInfo, OLD_LOCKING, PROTECTED_MARKER_FILENAME,
>> @@ -48,6 +49,17 @@ use crate::s3::S3_CONTENT_PREFIX;
>> use crate::task_tracking::{self, update_active_operations};
>> use crate::{DataBlob, LocalDatastoreLruCache};
>>
>> +// Cache for fully parsed datastore.cfg
>> +struct DatastoreConfigCache {
>> + // Parsed datastore.cfg file
>> + config: Arc<SectionConfigData>,
>> + // Generation number from ConfigVersionCache
>> + last_generation: usize,
>> +}
>> +
>> +static DATASTORE_CONFIG_CACHE: LazyLock<Mutex<Option<DatastoreConfigCache>>> =
>> + LazyLock::new(|| Mutex::new(None));
>> +
>> static DATASTORE_MAP: LazyLock<Mutex<HashMap<String, Arc<DataStoreImpl>>>> =
>> LazyLock::new(|| Mutex::new(HashMap::new()));
>>
>> @@ -149,11 +161,13 @@ pub struct DataStoreImpl {
>> last_gc_status: Mutex<GarbageCollectionStatus>,
>> verify_new: bool,
>> chunk_order: ChunkOrder,
>> - last_digest: Option<[u8; 32]>,
>> sync_level: DatastoreFSyncLevel,
>> backend_config: DatastoreBackendConfig,
>> lru_store_caching: Option<LocalDatastoreLruCache>,
>> thread_settings: DatastoreThreadSettings,
>> + /// Datastore generation number from `ConfigVersionCache` at creation time, used to
>
> datastore.cfg cache generation number at lookup time, used to invalidate
> this cached `DataStoreImpl`
>
> creation time could also refer to the datastore creation time..
>
Good point, agree! Will apply your suggestion, I like it more :)
>> + /// validate reuse of this cached `DataStoreImpl`.
>> + config_generation: Option<usize>,
>> }
>>
>> impl DataStoreImpl {
>> @@ -166,11 +180,11 @@ impl DataStoreImpl {
>> last_gc_status: Mutex::new(GarbageCollectionStatus::default()),
>> verify_new: false,
>> chunk_order: Default::default(),
>> - last_digest: None,
>> sync_level: Default::default(),
>> backend_config: Default::default(),
>> lru_store_caching: None,
>> thread_settings: Default::default(),
>> + config_generation: None,
>> })
>> }
>> }
>> @@ -286,6 +300,55 @@ impl DatastoreThreadSettings {
>
> so this new helper is already +49 lines, which means it's the bulk of
> the change if the noise below is removed..
>
>> }
>> }
>>
>> +/// Returns the parsed datastore config (`datastore.cfg`) and its
>> +/// generation.
>> +///
>> +/// Uses `ConfigVersionCache` to detect stale entries:
>> +/// - If the cached generation matches the current generation, the
>> +/// cached config is returned.
>> +/// - Otherwise the config is re-read from disk. If `update_cache` is
>> +/// `true`, the new config and current generation are stored in the
>> +/// cache. Callers that set `update_cache = true` must hold the
>> +/// datastore config lock to avoid racing with concurrent config
>> +/// changes.
>> +/// - If `update_cache` is `false`, the freshly read config is returned
>> +/// but the cache is left unchanged.
>> +///
>> +/// If `ConfigVersionCache` is not available, the config is always read
>> +/// from disk and `None` is returned as the generation.
>> +fn datastore_section_config_cached(
>> + update_cache: bool,
>> +) -> Result<(Arc<SectionConfigData>, Option<usize>), Error> {
>> + let mut config_cache = DATASTORE_CONFIG_CACHE.lock().unwrap();
>> +
>> + if let Ok(version_cache) = ConfigVersionCache::new() {
>> + let current_gen = version_cache.datastore_generation();
>> + if let Some(cached) = config_cache.as_ref() {
>> + // Fast path: re-use cached datastore.cfg
>> + if cached.last_generation == current_gen {
>> + return Ok((cached.config.clone(), Some(cached.last_generation)));
>> + }
>> + }
>> + // Slow path: re-read datastore.cfg
>> + let (config_raw, _digest) = pbs_config::datastore::config()?;
>> + let config = Arc::new(config_raw);
>> +
>> + if update_cache {
>> + *config_cache = Some(DatastoreConfigCache {
>> + config: config.clone(),
>> + last_generation: current_gen,
>> + });
>> + }
>> +
>> + Ok((config, Some(current_gen)))
>> + } else {
>> + // Fallback path, no config version cache: read datastore.cfg and return None as generation
>> + *config_cache = None;
>> + let (config_raw, _digest) = pbs_config::datastore::config()?;
>> + Ok((Arc::new(config_raw), None))
>> + }
>> +}
>> +
>> impl DataStore {
>> // This one just panics on everything
>> #[doc(hidden)]
>> @@ -363,56 +426,63 @@ impl DataStore {
>> name: &str,
>> operation: Option<Operation>,
>> ) -> Result<Arc<DataStore>, Error> {
>
> the changes here contain a lot of churn that is not needed for the
> actual change that is done - e.g., there's lots of variables being
> needlessly renamed..
>
>> - // Avoid TOCTOU between checking maintenance mode and updating active operation counter, as
>> - // we use it to decide whether it is okay to delete the datastore.
>> + // Avoid TOCTOU between checking maintenance mode and updating active operations.
>> let _config_lock = pbs_config::datastore::lock_config()?;
>>
>> - // we could use the ConfigVersionCache's generation for staleness detection, but we load
>> - // the config anyway -> just use digest, additional benefit: manual changes get detected
>> - let (config, digest) = pbs_config::datastore::config()?;
>> - let config: DataStoreConfig = config.lookup("datastore", name)?;
>> + // Get the current datastore.cfg generation number and cached config
>> + let (section_config, gen_num) = datastore_section_config_cached(true)?;
>> +
>> + let datastore_cfg: DataStoreConfig = section_config.lookup("datastore", name)?;
>
> renaming this variable here already causes a lot of noise - if you
> really want to do that, do it up-front or at the end as cleanup commit
> (depending on how sure you are the change is agree-able - if it's almost
> certain it is accepted, put it up front, then it can get applied
> already. if not, put it at the end -> then it can be skipped without
> affecting the other patches ;))
>
> but going from config to datastore_cfg doesn't make the code any clearer
> - the latter can still refer to the whole config (datastore.cfg) or the
> individual datastore's section within.. since we have no further business
> with the whole section config here, I think keeping this as `config` is
> fine..
>
Agree, will change back!
As you mentioned, the idea of the rename was to better differentiate
between the global datastore.cfg and the individual datastore config.
Since we have two configs in the same block, I tried to avoid relying
on "config". As you said, datastore_cfg does not make it much better
and could also be misinterpreted. Ultimately, I aligned directly with
the type names (datastore_cfg for DataStoreConfig).
I see your point and agree:
> since we have no further business with the whole section config here,
> I think keeping this as `config` is fine..
This is the best solution here.
>> + let maintenance_mode = datastore_cfg.get_maintenance_mode();
>> + let mount_status = get_datastore_mount_status(&datastore_cfg);
>
> and things extracted into variables here despite being used only once
> (this also doesn't change during the rest of the series)
>
Good point, will remove!
>>
>> - if let Some(maintenance_mode) = config.get_maintenance_mode() {
>> - if let Err(error) = maintenance_mode.check(operation) {
>> + if let Some(mm) = &maintenance_mode {
>
> binding name changes
>
>> + if let Err(error) = mm.check(operation.clone()) {
>
> new clone() is introduced but not needed
Nice catch!
>
>> bail!("datastore '{name}' is unavailable: {error}");
>> }
>> }
>>
>> - if get_datastore_mount_status(&config) == Some(false) {
>> - let mut datastore_cache = DATASTORE_MAP.lock().unwrap();
>> - datastore_cache.remove(&config.name);
>> - bail!("datastore '{}' is not mounted", config.name);
>> + let mut datastore_cache = DATASTORE_MAP.lock().unwrap();
>
> moving this is fine!
>
Helps to avoid the duplicate checks :)
>> +
>> + if mount_status == Some(false) {
>> + datastore_cache.remove(&datastore_cfg.name);
>> + bail!("datastore '{}' is not mounted", datastore_cfg.name);
>> }
>>
>> - let mut datastore_cache = DATASTORE_MAP.lock().unwrap();
>> - let entry = datastore_cache.get(name);
>
> this is changed to doing it twice and cloning..
>
Refactoring this!
>> -
>> - // reuse chunk store so that we keep using the same process locker instance!
>> - let chunk_store = if let Some(datastore) = &entry {
>
> structure changed here and binding renamed, even though the old one works as-is
>
Agree, will revert!
>> - let last_digest = datastore.last_digest.as_ref();
>> - if let Some(true) = last_digest.map(|last_digest| last_digest == &digest) {
>> - if let Some(operation) = operation {
>> - update_active_operations(name, operation, 1)?;
>> + // Re-use DataStoreImpl
>> + if let Some(existing) = datastore_cache.get(name).cloned() {
>> + if let (Some(last_generation), Some(gen_num)) = (existing.config_generation, gen_num) {
>> + if last_generation == gen_num {
>
> these two ifs can be collapsed into
>
> if datastore.config_generation == gen_num && gen_num.is_some() {
>
> since we only want to reuse the entry if the current gen num is the same
> as the last one.
>
Will collapse the two ifs, good catch!
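For illustration, a self-contained sketch (hypothetical helper, not the actual PBS types) of why the collapsed check still needs the `is_some()` guard: `Option` equality treats `None == None` as true, so without the guard a cached entry with no recorded generation would be wrongly reused whenever the version cache is also unavailable.

```rust
// `cached` is the generation stored in the cached DataStoreImpl,
// `current` is the generation the version cache reports right now.
fn may_reuse(cached: Option<usize>, current: Option<usize>) -> bool {
    // Without `is_some()`, `None == None` would wrongly allow reuse.
    cached == current && current.is_some()
}

fn main() {
    assert!(may_reuse(Some(7), Some(7))); // generations match: reuse cached impl
    assert!(!may_reuse(Some(7), Some(8))); // config changed: rebuild
    assert!(!may_reuse(None, None)); // no generation info: never reuse
    println!("ok");
}
```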
>> + if let Some(op) = operation {
>
> binding needlessly renamed
>
Agree, will revert!
>> + update_active_operations(name, op, 1)?;
>> + }
>> +
>> + return Ok(Arc::new(Self {
>> + inner: existing,
>> + operation,
>> + }));
>> }
>> - return Ok(Arc::new(Self {
>> - inner: Arc::clone(datastore),
>> - operation,
>> - }));
>> }
>> - Arc::clone(&datastore.chunk_store)
>> + }
>> +
>> + // (Re)build DataStoreImpl
>> +
>> + // Reuse chunk store so that we keep using the same process locker instance!
>> + let chunk_store = if let Some(existing) = datastore_cache.get(name) {
>> + Arc::clone(&existing.chunk_store)
>
> this one is not needed at all, can just be combined with the previous if
> as before
>
Will refactor!
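As a rough, self-contained sketch of the combined shape (hypothetical stand-in types, not the real `DataStoreImpl`/`ChunkStore`): a single map lookup either returns the reusable entry on the fast path or hands back the existing chunk store for the rebuild.

```rust
use std::collections::HashMap;
use std::sync::Arc;

// Hypothetical stand-in for a cached DataStoreImpl.
struct Entry {
    config_generation: Option<usize>,
    chunk_store: Arc<String>, // stand-in for Arc<ChunkStore>
}

// One lookup serves both paths: Ok(entry) means reuse the cached impl,
// Err(Some(cs)) means rebuild but keep the chunk store (process locker),
// Err(None) means rebuild from scratch.
fn lookup(
    cache: &HashMap<String, Arc<Entry>>,
    name: &str,
    gen_num: Option<usize>,
) -> Result<Arc<Entry>, Option<Arc<String>>> {
    if let Some(existing) = cache.get(name) {
        if gen_num.is_some() && existing.config_generation == gen_num {
            return Ok(Arc::clone(existing)); // fast path: reuse cached impl
        }
        return Err(Some(Arc::clone(&existing.chunk_store))); // stale: reuse chunk store only
    }
    Err(None) // unknown datastore: open a new chunk store
}

fn main() {
    let mut cache = HashMap::new();
    cache.insert(
        "store1".to_string(),
        Arc::new(Entry {
            config_generation: Some(3),
            chunk_store: Arc::new("cs".to_string()),
        }),
    );
    assert!(lookup(&cache, "store1", Some(3)).is_ok()); // generation matches
    assert!(matches!(lookup(&cache, "store1", Some(4)), Err(Some(_)))); // stale entry
    assert!(matches!(lookup(&cache, "other", Some(3)), Err(None))); // not cached
    println!("ok");
}
```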
>> } else {
>> let tuning: DatastoreTuning = serde_json::from_value(
>> DatastoreTuning::API_SCHEMA
>> - .parse_property_string(config.tuning.as_deref().unwrap_or(""))?,
>> + .parse_property_string(datastore_cfg.tuning.as_deref().unwrap_or(""))?,
>> )?;
>> Arc::new(ChunkStore::open(
>> name,
>> - config.absolute_path(),
>> + datastore_cfg.absolute_path(),
>> tuning.sync_level.unwrap_or_default(),
>> )?)
>> };
>>
>> - let datastore = DataStore::with_store_and_config(chunk_store, config, Some(digest))?;
>> + let datastore = DataStore::with_store_and_config(chunk_store, datastore_cfg, gen_num)?;
>>
>> let datastore = Arc::new(datastore);
>> datastore_cache.insert(name.to_string(), datastore.clone());
>> @@ -514,7 +584,7 @@ impl DataStore {
>> fn with_store_and_config(
>> chunk_store: Arc<ChunkStore>,
>> config: DataStoreConfig,
>> - last_digest: Option<[u8; 32]>,
>> + generation: Option<usize>,
>> ) -> Result<DataStoreImpl, Error> {
>> let mut gc_status_path = chunk_store.base_path();
>> gc_status_path.push(".gc-status");
>> @@ -579,11 +649,11 @@ impl DataStore {
>> last_gc_status: Mutex::new(gc_status),
>> verify_new: config.verify_new.unwrap_or(false),
>> chunk_order: tuning.chunk_order.unwrap_or_default(),
>> - last_digest,
>> sync_level: tuning.sync_level.unwrap_or_default(),
>> backend_config,
>> lru_store_caching,
>> thread_settings,
>> + config_generation: generation,
>> })
>> }
>>
>> --
>> 2.47.3
>>
>>
>>
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
^ permalink raw reply [relevance 6%]
* Re: [pbs-devel] [PATCH proxmox-backup v5 0/4] datastore: remove config reload on hot path
2025-11-26 15:16 5% ` [pbs-devel] [PATCH proxmox-backup v5 0/4] datastore: remove config reload on hot path Fabian Grünbichler
@ 2025-11-26 16:10 6% ` Samuel Rufinatscha
0 siblings, 0 replies; 200+ results
From: Samuel Rufinatscha @ 2025-11-26 16:10 UTC (permalink / raw)
To: Proxmox Backup Server development discussion, Fabian Grünbichler
On 11/26/25 4:16 PM, Fabian Grünbichler wrote:
> On November 24, 2025 6:04 pm, Samuel Rufinatscha wrote:
>> Hi,
>>
>> this series reduces CPU time in datastore lookups by avoiding repeated
>> datastore.cfg reads/parses in both `lookup_datastore()` and
>> `DataStore::Drop`. It also adds a TTL so manual config edits are
>> noticed without reintroducing hashing on every request.
>>
>> While investigating #6049 [1], cargo-flamegraph [2] showed hotspots
>> during repeated `/status` calls in `lookup_datastore()` and in `Drop`,
>> dominated by `pbs_config::datastore::config()` (config parse).
>>
>> The parsing cost itself should eventually be investigated in a future
>> effort. Furthermore, cargo-flamegraph showed that when using a
>> token-based auth method to access the API, a significant amount of time
>> is spent in validation on every request [3].
>>
>> ## Approach
>>
>> [PATCH 1/4] Support datastore generation in ConfigVersionCache
>>
>> [PATCH 2/4] Fast path for datastore lookups
>> Cache the parsed datastore.cfg keyed by the shared datastore
>> generation. lookup_datastore() reuses both the cached config and an
>> existing DataStoreImpl when the generation matches, and falls back
>> to the old slow path otherwise. The caching logic is implemented
>> using the datastore_section_config_cached(update_cache: bool) helper.
>>
>> [PATCH 3/4] Fast path for Drop
>> Make DataStore::Drop use the datastore_section_config_cached()
>> helper to avoid re-reading/parsing datastore.cfg on every Drop.
>> Bump generation not only on API config saves, but also on slow-path
>> lookups (if update_cache is true), so that Drop handlers eventually
>> see newer configs.
>>
>> [PATCH 4/4] TTL to catch manual edits
>> Add a TTL to the cached config and bump the datastore generation iff
>> the digest changed but generation stays the same. This catches manual
>> edits to datastore.cfg without reintroducing hashing or config
>> parsing on every request.
>
> semantics wise this looks mostly good to me now, sent a few style
> remarks for the individual patches. let's wait for feedback from the
> reporter, and then wrap this up hopefully :)
Thanks for the great review Fabian! I agree, will make sure to integrate
the style remarks and avoid more of the diff noise :)
Thanks!
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
^ permalink raw reply [relevance 6%]
* Re: [pbs-devel] [PATCH proxmox-backup v5 0/4] datastore: remove config reload on hot path
2025-11-24 17:04 12% [pbs-devel] [PATCH proxmox-backup v5 " Samuel Rufinatscha
` (3 preceding siblings ...)
2025-11-24 17:04 14% ` [pbs-devel] [PATCH proxmox-backup v5 4/4] partial fix #6049: datastore: add TTL fallback to catch manual config edits Samuel Rufinatscha
@ 2025-11-26 15:16 5% ` Fabian Grünbichler
2025-11-26 16:10 6% ` Samuel Rufinatscha
2026-01-05 14:21 13% ` [pbs-devel] superseded: " Samuel Rufinatscha
5 siblings, 1 reply; 200+ results
From: Fabian Grünbichler @ 2025-11-26 15:16 UTC (permalink / raw)
To: Proxmox Backup Server development discussion
On November 24, 2025 6:04 pm, Samuel Rufinatscha wrote:
> Hi,
>
> this series reduces CPU time in datastore lookups by avoiding repeated
> datastore.cfg reads/parses in both `lookup_datastore()` and
> `DataStore::Drop`. It also adds a TTL so manual config edits are
> noticed without reintroducing hashing on every request.
>
> While investigating #6049 [1], cargo-flamegraph [2] showed hotspots
> during repeated `/status` calls in `lookup_datastore()` and in `Drop`,
> dominated by `pbs_config::datastore::config()` (config parse).
>
> The parsing cost itself should eventually be investigated in a future
> effort. Furthermore, cargo-flamegraph showed that when using a
> token-based auth method to access the API, a significant amount of time
> is spent in validation on every request [3].
>
> ## Approach
>
> [PATCH 1/4] Support datastore generation in ConfigVersionCache
>
> [PATCH 2/4] Fast path for datastore lookups
> Cache the parsed datastore.cfg keyed by the shared datastore
> generation. lookup_datastore() reuses both the cached config and an
> existing DataStoreImpl when the generation matches, and falls back
> to the old slow path otherwise. The caching logic is implemented
> using the datastore_section_config_cached(update_cache: bool) helper.
>
> [PATCH 3/4] Fast path for Drop
> Make DataStore::Drop use the datastore_section_config_cached()
> helper to avoid re-reading/parsing datastore.cfg on every Drop.
> Bump generation not only on API config saves, but also on slow-path
> lookups (if update_cache is true), so that Drop handlers eventually
> see newer configs.
>
> [PATCH 4/4] TTL to catch manual edits
> Add a TTL to the cached config and bump the datastore generation iff
> the digest changed but generation stays the same. This catches manual
> edits to datastore.cfg without reintroducing hashing or config
> parsing on every request.
semantics wise this looks mostly good to me now, sent a few style
remarks for the individual patches. let's wait for feedback from the
reporter, and then wrap this up hopefully :)
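The TTL fallback described in the cover letter (patch 4/4) could be modeled roughly like this; the names and the fixed 30-second TTL are illustrative, not the actual implementation. Within the TTL, a generation match alone allows reuse; after the TTL expires, the digest is re-checked so manual edits are eventually caught without hashing on every request.

```rust
use std::time::{Duration, Instant};

const TTL: Duration = Duration::from_secs(30); // illustrative value

// Minimal model of a cached, parsed datastore.cfg entry.
struct CachedConfig {
    digest: [u8; 32],  // digest of the raw file at cache time
    generation: usize, // generation from the shared version cache
    checked: Instant,  // when the digest was last verified
}

impl CachedConfig {
    /// Returns true when the cached entry may still be used.
    fn is_fresh(&self, current_gen: usize, current_digest: &[u8; 32], now: Instant) -> bool {
        if self.generation != current_gen {
            return false; // API-driven change: generation was bumped on save
        }
        if now.duration_since(self.checked) < TTL {
            return true; // within TTL: skip the digest check entirely
        }
        // TTL expired: compare digests to catch manual edits to the file
        self.digest == *current_digest
    }
}

fn main() {
    let now = Instant::now();
    let cached = CachedConfig { digest: [0; 32], generation: 1, checked: now };
    assert!(cached.is_fresh(1, &[0; 32], now)); // fast path
    assert!(!cached.is_fresh(2, &[0; 32], now)); // generation bumped
    assert!(cached.is_fresh(1, &[1; 32], now)); // manual edit unnoticed within TTL (the tradeoff)
    assert!(!cached.is_fresh(1, &[1; 32], now + TTL)); // manual edit caught after TTL
    println!("ok");
}
```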
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
^ permalink raw reply [relevance 5%]
* Re: [pbs-devel] [PATCH proxmox-backup v5 2/4] partial fix #6049: datastore: impl ConfigVersionCache fast path for lookups
2025-11-24 17:04 11% ` [pbs-devel] [PATCH proxmox-backup v5 2/4] partial fix #6049: datastore: impl ConfigVersionCache fast path for lookups Samuel Rufinatscha
@ 2025-11-26 15:15 5% ` Fabian Grünbichler
2025-11-26 17:21 6% ` Samuel Rufinatscha
0 siblings, 1 reply; 200+ results
From: Fabian Grünbichler @ 2025-11-26 15:15 UTC (permalink / raw)
To: Proxmox Backup Server development discussion
On November 24, 2025 6:04 pm, Samuel Rufinatscha wrote:
> Repeated /status requests caused lookup_datastore() to re-read and
> parse datastore.cfg on every call. The issue was mentioned in report
> #6049 [1]. cargo-flamegraph [2] confirmed that the hot path is
> dominated by pbs_config::datastore::config() (config parsing).
>
> This patch implements caching of the global datastore.cfg using the
> generation numbers from the shared config version cache. It caches the
> datastore.cfg along with the generation number and, when a subsequent
> lookup sees the same generation, it reuses the cached config without
> re-reading it from disk. If the generation differs
> (or the cache is unavailable), the config is re-read from disk.
> If `update_cache = true`, the new config and current generation are
> persisted in the cache. In this case, callers must hold the datastore
> config lock to avoid racing with concurrent config changes.
> If `update_cache` is `false` and the generation did not match, the
> freshly read config is returned but the cache is left unchanged. If
> `ConfigVersionCache` is not available, the config is always read from
> disk and `None` is returned as the generation.
>
> Behavioral notes
>
> - The generation is bumped via the existing save_config() path, so
> API-driven config changes are detected immediately.
> - Manual edits to datastore.cfg are not detected; this is covered in a
> dedicated patch in this series.
> - DataStore::drop still performs a config read on the common path;
> also covered in a dedicated patch in this series.
>
> Links
>
> [1] Bugzilla: https://bugzilla.proxmox.com/show_bug.cgi?id=6049
> [2] cargo-flamegraph: https://github.com/flamegraph-rs/flamegraph
>
> Fixes: #6049
> Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
style nits below, but otherwise
Reviewed-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
> ---
> Changes:
>
> From v1 → v2, thanks @Fabian
> - Moved the ConfigVersionCache changes into its own patch.
> - Introduced the global static DATASTORE_CONFIG_CACHE to store the
> fully parsed datastore.cfg instead, along with its generation number.
> Introduced DatastoreConfigCache struct to hold both.
> - Removed and replaced the CachedDatastoreConfigTag field of
> DataStoreImpl with a generation number field only (Option<usize>)
> to validate DataStoreImpl reuse.
> - Added DataStore::datastore_section_config_cached() helper function
> to encapsulate the caching logic and simplify reuse.
> - Modified DataStore::lookup_datastore() to use the new helper.
>
> From v2 → v3
> No changes
>
> From v3 → v4, thanks @Fabian
> - Restructured the version cache checks in
> datastore_section_config_cached(), to simplify the logic.
> - Added update_cache parameter to datastore_section_config_cached() to
> control cache updates.
>
> From v4 → v5
> - Rebased only, no changes
>
> pbs-datastore/Cargo.toml | 1 +
> pbs-datastore/src/datastore.rs | 138 +++++++++++++++++++++++++--------
> 2 files changed, 105 insertions(+), 34 deletions(-)
this could be
2 files changed, 80 insertions(+), 17 deletions(-)
see below. this might sound nit-picky, but keeping diffs concise makes
reviewing a lot easier because of the higher signal to noise ratio..
>
> diff --git a/pbs-datastore/Cargo.toml b/pbs-datastore/Cargo.toml
> index 8ce930a9..42f49a7b 100644
> --- a/pbs-datastore/Cargo.toml
> +++ b/pbs-datastore/Cargo.toml
> @@ -40,6 +40,7 @@ proxmox-io.workspace = true
> proxmox-lang.workspace=true
> proxmox-s3-client = { workspace = true, features = [ "impl" ] }
> proxmox-schema = { workspace = true, features = [ "api-macro" ] }
> +proxmox-section-config.workspace = true
> proxmox-serde = { workspace = true, features = [ "serde_json" ] }
> proxmox-sys.workspace = true
> proxmox-systemd.workspace = true
> diff --git a/pbs-datastore/src/datastore.rs b/pbs-datastore/src/datastore.rs
> index 36550ff6..c9cb5d65 100644
> --- a/pbs-datastore/src/datastore.rs
> +++ b/pbs-datastore/src/datastore.rs
> @@ -34,7 +34,8 @@ use pbs_api_types::{
> MaintenanceType, Operation, UPID,
> };
> use pbs_config::s3::S3_CFG_TYPE_ID;
> -use pbs_config::BackupLockGuard;
> +use pbs_config::{BackupLockGuard, ConfigVersionCache};
> +use proxmox_section_config::SectionConfigData;
>
> use crate::backup_info::{
> BackupDir, BackupGroup, BackupInfo, OLD_LOCKING, PROTECTED_MARKER_FILENAME,
> @@ -48,6 +49,17 @@ use crate::s3::S3_CONTENT_PREFIX;
> use crate::task_tracking::{self, update_active_operations};
> use crate::{DataBlob, LocalDatastoreLruCache};
>
> +// Cache for fully parsed datastore.cfg
> +struct DatastoreConfigCache {
> + // Parsed datastore.cfg file
> + config: Arc<SectionConfigData>,
> + // Generation number from ConfigVersionCache
> + last_generation: usize,
> +}
> +
> +static DATASTORE_CONFIG_CACHE: LazyLock<Mutex<Option<DatastoreConfigCache>>> =
> + LazyLock::new(|| Mutex::new(None));
> +
> static DATASTORE_MAP: LazyLock<Mutex<HashMap<String, Arc<DataStoreImpl>>>> =
> LazyLock::new(|| Mutex::new(HashMap::new()));
>
> @@ -149,11 +161,13 @@ pub struct DataStoreImpl {
> last_gc_status: Mutex<GarbageCollectionStatus>,
> verify_new: bool,
> chunk_order: ChunkOrder,
> - last_digest: Option<[u8; 32]>,
> sync_level: DatastoreFSyncLevel,
> backend_config: DatastoreBackendConfig,
> lru_store_caching: Option<LocalDatastoreLruCache>,
> thread_settings: DatastoreThreadSettings,
> + /// Datastore generation number from `ConfigVersionCache` at creation time, used to
datastore.cfg cache generation number at lookup time, used to invalidate
this cached `DataStoreImpl`
creation time could also refer to the datastore creation time..
> + /// validate reuse of this cached `DataStoreImpl`.
> + config_generation: Option<usize>,
> }
>
> impl DataStoreImpl {
> @@ -166,11 +180,11 @@ impl DataStoreImpl {
> last_gc_status: Mutex::new(GarbageCollectionStatus::default()),
> verify_new: false,
> chunk_order: Default::default(),
> - last_digest: None,
> sync_level: Default::default(),
> backend_config: Default::default(),
> lru_store_caching: None,
> thread_settings: Default::default(),
> + config_generation: None,
> })
> }
> }
> @@ -286,6 +300,55 @@ impl DatastoreThreadSettings {
so this new helper is already +49 lines, which means it's the bulk of
the change if the noise below is removed..
> }
> }
>
> +/// Returns the parsed datastore config (`datastore.cfg`) and its
> +/// generation.
> +///
> +/// Uses `ConfigVersionCache` to detect stale entries:
> +/// - If the cached generation matches the current generation, the
> +/// cached config is returned.
> +/// - Otherwise the config is re-read from disk. If `update_cache` is
> +/// `true`, the new config and current generation are stored in the
> +/// cache. Callers that set `update_cache = true` must hold the
> +/// datastore config lock to avoid racing with concurrent config
> +/// changes.
> +/// - If `update_cache` is `false`, the freshly read config is returned
> +/// but the cache is left unchanged.
> +///
> +/// If `ConfigVersionCache` is not available, the config is always read
> +/// from disk and `None` is returned as the generation.
> +fn datastore_section_config_cached(
> + update_cache: bool,
> +) -> Result<(Arc<SectionConfigData>, Option<usize>), Error> {
> + let mut config_cache = DATASTORE_CONFIG_CACHE.lock().unwrap();
> +
> + if let Ok(version_cache) = ConfigVersionCache::new() {
> + let current_gen = version_cache.datastore_generation();
> + if let Some(cached) = config_cache.as_ref() {
> + // Fast path: re-use cached datastore.cfg
> + if cached.last_generation == current_gen {
> + return Ok((cached.config.clone(), Some(cached.last_generation)));
> + }
> + }
> + // Slow path: re-read datastore.cfg
> + let (config_raw, _digest) = pbs_config::datastore::config()?;
> + let config = Arc::new(config_raw);
> +
> + if update_cache {
> + *config_cache = Some(DatastoreConfigCache {
> + config: config.clone(),
> + last_generation: current_gen,
> + });
> + }
> +
> + Ok((config, Some(current_gen)))
> + } else {
> + // Fallback path, no config version cache: read datastore.cfg and return None as generation
> + *config_cache = None;
> + let (config_raw, _digest) = pbs_config::datastore::config()?;
> + Ok((Arc::new(config_raw), None))
> + }
> +}
> +
> impl DataStore {
> // This one just panics on everything
> #[doc(hidden)]
> @@ -363,56 +426,63 @@ impl DataStore {
> name: &str,
> operation: Option<Operation>,
> ) -> Result<Arc<DataStore>, Error> {
the changes here contain a lot of churn that is not needed for the
actual change that is done - e.g., there's lots of variables being
needlessly renamed..
> - // Avoid TOCTOU between checking maintenance mode and updating active operation counter, as
> - // we use it to decide whether it is okay to delete the datastore.
> + // Avoid TOCTOU between checking maintenance mode and updating active operations.
> let _config_lock = pbs_config::datastore::lock_config()?;
>
> - // we could use the ConfigVersionCache's generation for staleness detection, but we load
> - // the config anyway -> just use digest, additional benefit: manual changes get detected
> - let (config, digest) = pbs_config::datastore::config()?;
> - let config: DataStoreConfig = config.lookup("datastore", name)?;
> + // Get the current datastore.cfg generation number and cached config
> + let (section_config, gen_num) = datastore_section_config_cached(true)?;
> +
> + let datastore_cfg: DataStoreConfig = section_config.lookup("datastore", name)?;
renaming this variable here already causes a lot of noise - if you
really want to do that, do it up-front or at the end as cleanup commit
(depending on how sure you are the change is agree-able - if it's almost
certain it is accepted, put it up front, then it can get applied
already. if not, put it at the end -> then it can be skipped without
affecting the other patches ;))
but going from config to datastore_cfg doesn't make the code any clearer
- the latter can still refer to the whole config (datastore.cfg) or the
individual datastore's section within.. since we have no further business
with the whole section config here, I think keeping this as `config` is
fine..
> + let maintenance_mode = datastore_cfg.get_maintenance_mode();
> + let mount_status = get_datastore_mount_status(&datastore_cfg);
and things extracted into variables here despite being used only once
(this also doesn't change during the rest of the series)
>
> - if let Some(maintenance_mode) = config.get_maintenance_mode() {
> - if let Err(error) = maintenance_mode.check(operation) {
> + if let Some(mm) = &maintenance_mode {
binding name changes
> + if let Err(error) = mm.check(operation.clone()) {
new clone() is introduced but not needed
> bail!("datastore '{name}' is unavailable: {error}");
> }
> }
>
> - if get_datastore_mount_status(&config) == Some(false) {
> - let mut datastore_cache = DATASTORE_MAP.lock().unwrap();
> - datastore_cache.remove(&config.name);
> - bail!("datastore '{}' is not mounted", config.name);
> + let mut datastore_cache = DATASTORE_MAP.lock().unwrap();
moving this is fine!
> +
> + if mount_status == Some(false) {
> + datastore_cache.remove(&datastore_cfg.name);
> + bail!("datastore '{}' is not mounted", datastore_cfg.name);
> }
>
> - let mut datastore_cache = DATASTORE_MAP.lock().unwrap();
> - let entry = datastore_cache.get(name);
this is changed to doing it twice and cloning..
> -
> - // reuse chunk store so that we keep using the same process locker instance!
> - let chunk_store = if let Some(datastore) = &entry {
structure changed here and binding renamed, even though the old one works as-is
> - let last_digest = datastore.last_digest.as_ref();
> - if let Some(true) = last_digest.map(|last_digest| last_digest == &digest) {
> - if let Some(operation) = operation {
> - update_active_operations(name, operation, 1)?;
> + // Re-use DataStoreImpl
> + if let Some(existing) = datastore_cache.get(name).cloned() {
> + if let (Some(last_generation), Some(gen_num)) = (existing.config_generation, gen_num) {
> + if last_generation == gen_num {
these two ifs can be collapsed into
if existing.config_generation == gen_num && gen_num.is_some() {
since we only want to reuse the entry if the current gen num is the same
as the last one.
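for what it's worth, the reason the `is_some()` guard is still needed in the
collapsed form is that `Option` compares by value, so `None == None` holds.
A tiny standalone sketch (hypothetical `should_reuse` helper standing in for
the real check, both generations as `Option<usize>`):

```rust
/// Hypothetical stand-in for the collapsed reuse check: Option equality
/// treats None == None as true, so the is_some() guard prevents reusing the
/// cached entry when no generation information is available at all.
fn should_reuse(config_generation: Option<usize>, gen_num: Option<usize>) -> bool {
    config_generation == gen_num && gen_num.is_some()
}

fn main() {
    assert!(should_reuse(Some(3), Some(3)));
    assert!(!should_reuse(Some(3), Some(4)));
    // without the is_some() guard this would wrongly reuse the entry
    assert!(!should_reuse(None, None));
}
```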
> + if let Some(op) = operation {
binding needlessly renamed
> + update_active_operations(name, op, 1)?;
> + }
> +
> + return Ok(Arc::new(Self {
> + inner: existing,
> + operation,
> + }));
> }
> - return Ok(Arc::new(Self {
> - inner: Arc::clone(datastore),
> - operation,
> - }));
> }
> - Arc::clone(&datastore.chunk_store)
> + }
> +
> + // (Re)build DataStoreImpl
> +
> + // Reuse chunk store so that we keep using the same process locker instance!
> + let chunk_store = if let Some(existing) = datastore_cache.get(name) {
> + Arc::clone(&existing.chunk_store)
this one is not needed at all, can just be combined with the previous if
as before
> } else {
> let tuning: DatastoreTuning = serde_json::from_value(
> DatastoreTuning::API_SCHEMA
> - .parse_property_string(config.tuning.as_deref().unwrap_or(""))?,
> + .parse_property_string(datastore_cfg.tuning.as_deref().unwrap_or(""))?,
> )?;
> Arc::new(ChunkStore::open(
> name,
> - config.absolute_path(),
> + datastore_cfg.absolute_path(),
> tuning.sync_level.unwrap_or_default(),
> )?)
> };
>
> - let datastore = DataStore::with_store_and_config(chunk_store, config, Some(digest))?;
> + let datastore = DataStore::with_store_and_config(chunk_store, datastore_cfg, gen_num)?;
>
> let datastore = Arc::new(datastore);
> datastore_cache.insert(name.to_string(), datastore.clone());
> @@ -514,7 +584,7 @@ impl DataStore {
> fn with_store_and_config(
> chunk_store: Arc<ChunkStore>,
> config: DataStoreConfig,
> - last_digest: Option<[u8; 32]>,
> + generation: Option<usize>,
> ) -> Result<DataStoreImpl, Error> {
> let mut gc_status_path = chunk_store.base_path();
> gc_status_path.push(".gc-status");
> @@ -579,11 +649,11 @@ impl DataStore {
> last_gc_status: Mutex::new(gc_status),
> verify_new: config.verify_new.unwrap_or(false),
> chunk_order: tuning.chunk_order.unwrap_or_default(),
> - last_digest,
> sync_level: tuning.sync_level.unwrap_or_default(),
> backend_config,
> lru_store_caching,
> thread_settings,
> + config_generation: generation,
> })
> }
>
> --
> 2.47.3
>
>
>
> _______________________________________________
> pbs-devel mailing list
> pbs-devel@lists.proxmox.com
> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
>
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
^ permalink raw reply [relevance 5%]
* Re: [pbs-devel] [PATCH proxmox-backup v5 3/4] partial fix #6049: datastore: use config fast-path in Drop
2025-11-24 17:04 14% ` [pbs-devel] [PATCH proxmox-backup v5 3/4] partial fix #6049: datastore: use config fast-path in Drop Samuel Rufinatscha
@ 2025-11-26 15:15 5% ` Fabian Grünbichler
2025-11-28 9:03 6% ` Samuel Rufinatscha
0 siblings, 1 reply; 200+ results
From: Fabian Grünbichler @ 2025-11-26 15:15 UTC (permalink / raw)
To: Proxmox Backup Server development discussion
On November 24, 2025 6:04 pm, Samuel Rufinatscha wrote:
> The Drop impl of DataStore re-read datastore.cfg to decide whether
> the entry should be evicted from the in-process cache (based on
> maintenance mode’s clear_from_cache). During the investigation of
> issue #6049 [1], a flamegraph [2] showed that the config reload in Drop
> accounted for a measurable share of CPU time under load.
>
> This patch wires the datastore config fast path to the Drop
> impl to eventually avoid an expensive config reload from disk to capture
> the maintenance mandate. Also, to ensure the Drop handlers will detect
> that a newer config exists / to mitigate usage of an eventually stale
> cached entry, generation will not only be bumped on config save, but also
> on re-read of the config file (slow path), if `update_cache = true`.
>
> Links
>
> [1] Bugzilla: https://bugzilla.proxmox.com/show_bug.cgi?id=6049
> [2] cargo-flamegraph: https://github.com/flamegraph-rs/flamegraph
>
> Fixes: #6049
> Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
> ---
> Changes:
>
> From v1 → v2
> - Replace caching logic with the datastore_section_config_cached()
> helper.
>
> From v2 → v3
> No changes
>
> From v3 → v4, thanks @Fabian
> - Pass datastore_section_config_cached(false) in Drop to avoid
> concurrent cache updates.
>
> From v4 → v5
> - Rebased only, no changes
>
> pbs-datastore/src/datastore.rs | 60 ++++++++++++++++++++++++++--------
> 1 file changed, 47 insertions(+), 13 deletions(-)
>
> diff --git a/pbs-datastore/src/datastore.rs b/pbs-datastore/src/datastore.rs
> index c9cb5d65..7638a899 100644
> --- a/pbs-datastore/src/datastore.rs
> +++ b/pbs-datastore/src/datastore.rs
> @@ -225,15 +225,40 @@ impl Drop for DataStore {
> // remove datastore from cache iff
> // - last task finished, and
> // - datastore is in a maintenance mode that mandates it
> - let remove_from_cache = last_task
> - && pbs_config::datastore::config()
> - .and_then(|(s, _)| s.lookup::<DataStoreConfig>("datastore", self.name()))
> - .is_ok_and(|c| {
> - c.get_maintenance_mode()
> - .is_some_and(|m| m.clear_from_cache())
> - });
old code here ignored parsing/locking/.. issues and just assumed if no
config can be obtained nothing should be done..
> -
> - if remove_from_cache {
> +
> + // first check: check if last task finished
> + if !last_task {
> + return;
> + }
> +
> + let (section_config, _gen) = match datastore_section_config_cached(false) {
> + Ok(v) => v,
> + Err(err) => {
> + log::error!(
> + "failed to load datastore config in Drop for {} - {err}",
> + self.name()
> + );
> + return;
> + }
> + };
> +
> + let datastore_cfg: DataStoreConfig =
> + match section_config.lookup("datastore", self.name()) {
> + Ok(cfg) => cfg,
> + Err(err) => {
> + log::error!(
> + "failed to look up datastore '{}' in Drop - {err}",
> + self.name()
> + );
> + return;
here we now have fancy error logging ;) which can be fine, but if we go
from silently ignoring errors to logging them at error level that should
be mentioned to make it clear that it is intentional.
besides that, the second error here means that the datastore was removed
from the config in the meantime.. in which case we should probably
remove it from the map as well, if is still there, even though we can't
check the maintenance mode..
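to illustrate that suggestion (hypothetical stand-ins for DATASTORE_MAP and
the failed lookup, not the real types), the error branch could evict the
stale entry before returning:

```rust
use std::collections::HashMap;
use std::sync::Mutex;

// Hypothetical sketch of the suggested Drop behaviour: if the datastore
// vanished from datastore.cfg, evict any stale cache entry even though the
// maintenance mode can no longer be checked.
fn on_lookup_failed(name: &str, map: &Mutex<HashMap<String, ()>>) {
    map.lock().unwrap().remove(name);
}

fn main() {
    let map = Mutex::new(HashMap::from([("ds1".to_string(), ())]));
    on_lookup_failed("ds1", &map);
    assert!(!map.lock().unwrap().contains_key("ds1"));
}
```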
> + }
> + };
> +
> + // second check: check maintenance mode mandate
what is a "maintenance mode mandate"? ;)
keeping it simple, why not just
// check if maintenance mode requires closing FDs
> + if datastore_cfg
> + .get_maintenance_mode()
> + .is_some_and(|m| m.clear_from_cache())
> + {
> DATASTORE_MAP.lock().unwrap().remove(self.name());
> }
> }
> @@ -307,12 +332,12 @@ impl DatastoreThreadSettings {
> /// - If the cached generation matches the current generation, the
> /// cached config is returned.
> /// - Otherwise the config is re-read from disk. If `update_cache` is
> -/// `true`, the new config and current generation are stored in the
> +/// `true`, the new config and bumped generation are stored in the
> /// cache. Callers that set `update_cache = true` must hold the
> /// datastore config lock to avoid racing with concurrent config
> /// changes.
> /// - If `update_cache` is `false`, the freshly read config is returned
> -/// but the cache is left unchanged.
> +/// but the cache and generation are left unchanged.
> ///
> /// If `ConfigVersionCache` is not available, the config is always read
> /// from disk and `None` is returned as the generation.
> @@ -333,14 +358,23 @@ fn datastore_section_config_cached(
does this part here make any sense in this patch?
we don't check the generation in the Drop handler anyway, so it will get
the latest cached version, no matter what?
we'd only end up in this part of the code via lookup_datastore, and only
if:
- the previous cached entry and the current one have a different
generation -> no need to bump again, the cache is already invalidated
- there is no previous cached entry -> nothing to invalidate
I think this part should move to the next patch..
> let (config_raw, _digest) = pbs_config::datastore::config()?;
> let config = Arc::new(config_raw);
>
> + let mut effective_gen = current_gen;
> if update_cache {
> + // Bump the generation. This ensures that Drop
> + // handlers will detect that a newer config exists
> + // and will not rely on a stale cached entry for
> + // maintenance mandate.
> + let prev_gen = version_cache.increase_datastore_generation();
> + effective_gen = prev_gen + 1;
> +
> + // Persist
> *config_cache = Some(DatastoreConfigCache {
> config: config.clone(),
> - last_generation: current_gen,
> + last_generation: effective_gen,
> });
> }
>
> - Ok((config, Some(current_gen)))
> + Ok((config, Some(effective_gen)))
> } else {
> // Fallback path, no config version cache: read datastore.cfg and return None as generation
> *config_cache = None;
> --
> 2.47.3
>
>
>
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
^ permalink raw reply [relevance 5%]
* Re: [pbs-devel] [PATCH proxmox-backup v5 4/4] partial fix #6049: datastore: add TTL fallback to catch manual config edits
2025-11-24 17:04 14% ` [pbs-devel] [PATCH proxmox-backup v5 4/4] partial fix #6049: datastore: add TTL fallback to catch manual config edits Samuel Rufinatscha
@ 2025-11-26 15:15 5% ` Fabian Grünbichler
0 siblings, 0 replies; 200+ results
From: Fabian Grünbichler @ 2025-11-26 15:15 UTC (permalink / raw)
To: Proxmox Backup Server development discussion
nit for the subject: this doesn't fix the reported issue, it just
improves the fix further, so please drop that part and maybe instead add
"lookup" somewhere..
On November 24, 2025 6:04 pm, Samuel Rufinatscha wrote:
> The lookup fast path reacts to API-driven config changes because
> save_config() bumps the generation. Manual edits of datastore.cfg do
> not bump the counter. To keep the system robust against such edits
> without reintroducing config reading and hashing on the hot path, this
> patch adds a TTL to the cache entry.
>
> If the cached config is older than
> DATASTORE_CONFIG_CACHE_TTL_SECS (set to 60s), the next lookup takes
> the slow path and refreshes the entry. As an optimization, a check to
> catch manual edits was added (if the digest changed but generation
> stayed the same), so that the generation is only bumped when needed.
>
> Links
>
> [1] cargo-flamegraph: https://github.com/flamegraph-rs/flamegraph
>
> Fixes: #6049
> Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
one style nit below, otherwise:
Reviewed-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
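for reference, the TTL gate described in the commit message boils down to the
following (hypothetical `cache_is_fresh` helper, assuming the 60s TTL from the
patch):

```rust
/// Hypothetical sketch of the TTL gate: the cached config is only reused
/// while the generation matches AND the entry is younger than the TTL, so
/// manual edits of datastore.cfg are picked up within TTL_SECS at the latest.
const TTL_SECS: i64 = 60;

fn cache_is_fresh(last_gen: usize, current_gen: usize, last_update: i64, now: i64) -> bool {
    last_gen == current_gen && now - last_update < TTL_SECS
}

fn main() {
    assert!(cache_is_fresh(5, 5, 100, 130));
    assert!(!cache_is_fresh(5, 5, 100, 200)); // TTL expired, take slow path
    assert!(!cache_is_fresh(4, 5, 100, 130)); // generation bumped, take slow path
}
```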
> ---
> Changes:
>
> From v1 → v2
> - Store last_update timestamp in DatastoreConfigCache type.
>
> From v2 → v3
> No changes
>
> From v3 → v4
> - Fix digest generation bump logic in update_cache, thanks @Fabian.
>
> From v4 → v5
> - Rebased only, no changes
>
> pbs-datastore/src/datastore.rs | 53 ++++++++++++++++++++++++----------
> 1 file changed, 38 insertions(+), 15 deletions(-)
>
> diff --git a/pbs-datastore/src/datastore.rs b/pbs-datastore/src/datastore.rs
> index 7638a899..0fc3fbf2 100644
> --- a/pbs-datastore/src/datastore.rs
> +++ b/pbs-datastore/src/datastore.rs
> @@ -53,8 +53,12 @@ use crate::{DataBlob, LocalDatastoreLruCache};
> struct DatastoreConfigCache {
> // Parsed datastore.cfg file
> config: Arc<SectionConfigData>,
> + // Digest of the datastore.cfg file
> + digest: [u8; 32],
> // Generation number from ConfigVersionCache
> last_generation: usize,
> + // Last update time (epoch seconds)
> + last_update: i64,
> }
>
> static DATASTORE_CONFIG_CACHE: LazyLock<Mutex<Option<DatastoreConfigCache>>> =
> @@ -63,6 +67,8 @@ static DATASTORE_CONFIG_CACHE: LazyLock<Mutex<Option<DatastoreConfigCache>>> =
> static DATASTORE_MAP: LazyLock<Mutex<HashMap<String, Arc<DataStoreImpl>>>> =
> LazyLock::new(|| Mutex::new(HashMap::new()));
>
> +/// Max age in seconds to reuse the cached datastore config.
> +const DATASTORE_CONFIG_CACHE_TTL_SECS: i64 = 60;
> /// Filename to store backup group notes
> pub const GROUP_NOTES_FILE_NAME: &str = "notes";
> /// Filename to store backup group owner
> @@ -329,13 +335,14 @@ impl DatastoreThreadSettings {
> /// generation.
> ///
> /// Uses `ConfigVersionCache` to detect stale entries:
> -/// - If the cached generation matches the current generation, the
> -/// cached config is returned.
> +/// - If the cached generation matches the current generation and TTL is
> +/// OK, the cached config is returned.
> /// - Otherwise the config is re-read from disk. If `update_cache` is
> -/// `true`, the new config and bumped generation are stored in the
> -/// cache. Callers that set `update_cache = true` must hold the
> -/// datastore config lock to avoid racing with concurrent config
> -/// changes.
> +/// `true` and a previous cached entry exists with the same generation
> +/// but a different digest, this indicates the config has changed
> +/// (e.g. manual edit) and the generation must be bumped. Callers
> +/// that set `update_cache = true` must hold the datastore config lock
> +/// to avoid racing with concurrent config changes.
> /// - If `update_cache` is `false`, the freshly read config is returned
> /// but the cache and generation are left unchanged.
> ///
> @@ -347,30 +354,46 @@ fn datastore_section_config_cached(
> let mut config_cache = DATASTORE_CONFIG_CACHE.lock().unwrap();
>
> if let Ok(version_cache) = ConfigVersionCache::new() {
> + let now = epoch_i64();
> let current_gen = version_cache.datastore_generation();
> if let Some(cached) = config_cache.as_ref() {
> - // Fast path: re-use cached datastore.cfg
> - if cached.last_generation == current_gen {
> + // Fast path: re-use cached datastore.cfg if generation matches and TTL not expired
> + if cached.last_generation == current_gen
> + && now - cached.last_update < DATASTORE_CONFIG_CACHE_TTL_SECS
> + {
> return Ok((cached.config.clone(), Some(cached.last_generation)));
> }
> }
> // Slow path: re-read datastore.cfg
> - let (config_raw, _digest) = pbs_config::datastore::config()?;
> + let (config_raw, digest) = pbs_config::datastore::config()?;
> let config = Arc::new(config_raw);
>
> let mut effective_gen = current_gen;
> if update_cache {
> - // Bump the generation. This ensures that Drop
> - // handlers will detect that a newer config exists
> - // and will not rely on a stale cached entry for
> - // maintenance mandate.
> - let prev_gen = version_cache.increase_datastore_generation();
> - effective_gen = prev_gen + 1;
> + // Bump the generation if the config has been changed manually.
> + // This ensures that Drop handlers will detect that a newer config exists
> + // and will not rely on a stale cached entry for maintenance mandate.
> + let (prev_gen, prev_digest) = config_cache
> + .as_ref()
> + .map(|c| (Some(c.last_generation), Some(c.digest)))
> + .unwrap_or((None, None));
so here we map an option to a tuple of options and unwrap it
> +
> + let manual_edit = match (prev_gen, prev_digest) {
only to then match and convert it to a boolean again here
> + (Some(prev_g), Some(prev_d)) => prev_g == current_gen && prev_d != digest,
> + _ => false,
> + };
> +
> + if manual_edit {
> + let prev_gen = version_cache.increase_datastore_generation();
> + effective_gen = prev_gen + 1;
to then do some code here, if the boolean is true ;)
this can all just be a single block of code instead:
if let Some(cached) = config_cache.as_ref() {
    if cached.last_generation == current_gen && cached.digest != digest {
        effective_gen = version_cache.increase_datastore_generation() + 1;
    }
}
which also matches the first block higher up in the helper..
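the digest-vs-generation condition discussed above can also be exercised in
isolation (hypothetical `needs_bump` helper, shortened 4-byte digests instead
of the real 32-byte ones):

```rust
/// Hypothetical helper mirroring the condition: bump the generation only when
/// a cached entry exists whose generation still matches but whose on-disk
/// digest differs, i.e. the config was edited manually behind the cache's back.
fn needs_bump(cached: Option<(usize, [u8; 4])>, current_gen: usize, digest: [u8; 4]) -> bool {
    matches!(cached, Some((last_gen, last_digest))
        if last_gen == current_gen && last_digest != digest)
}

fn main() {
    // manual edit: same generation, different digest -> bump
    assert!(needs_bump(Some((7, [1, 2, 3, 4])), 7, [9, 9, 9, 9]));
    // API save already bumped the generation -> no extra bump needed
    assert!(!needs_bump(Some((6, [1, 2, 3, 4])), 7, [9, 9, 9, 9]));
    // no cached entry -> nothing to invalidate
    assert!(!needs_bump(None, 7, [1, 2, 3, 4]));
}
```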
> + }
>
> // Persist
> *config_cache = Some(DatastoreConfigCache {
> config: config.clone(),
> + digest,
> last_generation: effective_gen,
> + last_update: now,
> });
> }
>
> --
> 2.47.3
>
>
>
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
^ permalink raw reply [relevance 5%]
* Re: [pbs-devel] [PATCH proxmox-backup v5 1/4] partial fix #6049: config: enable config version cache for datastore
2025-11-24 17:04 16% ` [pbs-devel] [PATCH proxmox-backup v5 1/4] partial fix #6049: config: enable config version cache for datastore Samuel Rufinatscha
@ 2025-11-26 15:15 5% ` Fabian Grünbichler
0 siblings, 0 replies; 200+ results
From: Fabian Grünbichler @ 2025-11-26 15:15 UTC (permalink / raw)
To: Proxmox Backup Server development discussion
nit for the subject: this doesn't yet partially fix anything..
On November 24, 2025 6:04 pm, Samuel Rufinatscha wrote:
> Repeated /status requests caused lookup_datastore() to re-read and
> parse datastore.cfg on every call. The issue was mentioned in report
> #6049 [1]. cargo-flamegraph [2] confirmed that the hot path is
> dominated by pbs_config::datastore::config() (config parsing).
>
> To solve the issue, this patch prepares the config version cache,
> so that datastore config caching can be built on top of it.
> This patch specifically:
> (1) implements increment function in order to invalidate generations
> (2) removes obsolete comments
>
> Links
>
> [1] Bugzilla: https://bugzilla.proxmox.com/show_bug.cgi?id=6049
> [2] cargo-flamegraph: https://github.com/flamegraph-rs/flamegraph
>
> Fixes: #6049
> Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
Reviewed-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
> ---
> Changes:
>
> From v1 → v2 (original introduction), thanks @Fabian
> - Split the ConfigVersionCache changes out of the large datastore patch
> into their own config-only patch
> * removed the obsolete // FIXME comment on datastore_generation
> * added ConfigVersionCache::datastore_generation() as getter
>
> From v2 → v3
> No changes
>
> From v3 → v4
> No changes
>
> From v4 → v5
> - Rebased only, no changes
>
> pbs-config/src/config_version_cache.rs | 10 ++++++++--
> 1 file changed, 8 insertions(+), 2 deletions(-)
>
> diff --git a/pbs-config/src/config_version_cache.rs b/pbs-config/src/config_version_cache.rs
> index e8fb994f..b875f7e0 100644
> --- a/pbs-config/src/config_version_cache.rs
> +++ b/pbs-config/src/config_version_cache.rs
> @@ -26,7 +26,6 @@ struct ConfigVersionCacheDataInner {
> // Traffic control (traffic-control.cfg) generation/version.
> traffic_control_generation: AtomicUsize,
> // datastore (datastore.cfg) generation/version
> - // FIXME: remove with PBS 3.0
> datastore_generation: AtomicUsize,
> // Add further atomics here
> }
> @@ -145,8 +144,15 @@ impl ConfigVersionCache {
> .fetch_add(1, Ordering::AcqRel);
> }
>
> + /// Returns the datastore generation number.
> + pub fn datastore_generation(&self) -> usize {
> + self.shmem
> + .data()
> + .datastore_generation
> + .load(Ordering::Acquire)
> + }
> +
> /// Increase the datastore generation number.
> - // FIXME: remove with PBS 3.0 or make actually useful again in datastore lookup
> pub fn increase_datastore_generation(&self) -> usize {
> self.shmem
> .data()
> --
> 2.47.3
>
>
>
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
^ permalink raw reply [relevance 5%]
* [pbs-devel] superseded: [PATCH proxmox-backup v4 0/4] datastore: remove config reload on hot path
2025-11-24 15:33 12% [pbs-devel] [PATCH proxmox-backup v4 0/4] " Samuel Rufinatscha
` (3 preceding siblings ...)
2025-11-24 15:33 13% ` [pbs-devel] [PATCH proxmox-backup v4 4/4] partial fix #6049: datastore: add TTL fallback to catch manual config edits Samuel Rufinatscha
@ 2025-11-24 17:06 13% ` Samuel Rufinatscha
4 siblings, 0 replies; 200+ results
From: Samuel Rufinatscha @ 2025-11-24 17:06 UTC (permalink / raw)
To: pbs-devel
https://lore.proxmox.com/pbs-devel/20251124170423.303300-1-s.rufinatscha@proxmox.com/T/#t
On 11/24/25 4:32 PM, Samuel Rufinatscha wrote:
> Hi,
>
> this series reduces CPU time in datastore lookups by avoiding repeated
> datastore.cfg reads/parses in both `lookup_datastore()` and
> `DataStore::Drop`. It also adds a TTL so manual config edits are
> noticed without reintroducing hashing on every request.
>
> While investigating #6049 [1], cargo-flamegraph [2] showed hotspots
> during repeated `/status` calls in `lookup_datastore()` and in `Drop`,
> dominated by `pbs_config::datastore::config()` (config parse).
>
> The parsing cost itself should eventually be investigated in a future
> effort. Furthermore, cargo-flamegraph showed that when using a
> token-based auth method to access the API, a significant amount of time
> is spent in validation on every request [3].
>
> ## Approach
>
> [PATCH 1/4] Support datastore generation in ConfigVersionCache
>
> [PATCH 2/4] Fast path for datastore lookups
> Cache the parsed datastore.cfg keyed by the shared datastore
> generation. lookup_datastore() reuses both the cached config and an
> existing DataStoreImpl when the generation matches, and falls back
> to the old slow path otherwise. The caching logic is implemented
> using the datastore_section_config_cached(update_cache: bool) helper.
>
> [PATCH 3/4] Fast path for Drop
> Make DataStore::Drop use the datastore_section_config_cached()
> helper to avoid re-reading/parsing datastore.cfg on every Drop.
> Bump generation not only on API config saves, but also on slow-path
> lookups (if update_cache is true), to enable Drop handlers see
> eventual newer configs.
>
> [PATCH 4/4] TTL to catch manual edits
> Add a TTL to the cached config and bump the datastore generation iff
> the digest changed but generation stays the same. This catches manual
> edits to datastore.cfg without reintroducing hashing or config
> parsing on every request.
>
> ## Benchmark results
>
> ### End-to-end
>
> Testing `/status?verbose=0` end-to-end with 1000 stores, 5 req/store
> and parallel=16 before/after the series:
>
> Metric Before After
> ----------------------------------------
> Total time 12s 9s
> Throughput (all) 416.67 555.56
> Cold RPS (round #1) 83.33 111.11
> Warm RPS (#2..N) 333.33 444.44
>
> Running under flamegraph [2], TLS appears to consume a significant
> amount of CPU time and blur the results. Still, a ~33% higher overall
> throughput and ~25% less end-to-end time for this workload.
>
> ### Isolated benchmarks (hyperfine)
>
> In addition to the end-to-end tests, I measured two standalone
> benchmarks with hyperfine, each using a config with 1000 datastores.
> `M` is the number of distinct datastores looked up and
> `N` is the number of lookups per datastore.
>
> Drop-direct variant:
>
> Drops the `DataStore` after every lookup, so the `Drop` path runs on
> every iteration:
>
> use anyhow::Error;
>
> use pbs_api_types::Operation;
> use pbs_datastore::DataStore;
>
> fn main() -> Result<(), Error> {
> let mut args = std::env::args();
> args.next();
>
> let datastores = if let Some(n) = args.next() {
> n.parse::<usize>()?
> } else {
> 1000
> };
>
> let iterations = if let Some(n) = args.next() {
> n.parse::<usize>()?
> } else {
> 1000
> };
>
> for d in 1..=datastores {
> let name = format!("ds{:04}", d);
>
> for i in 1..=iterations {
> DataStore::lookup_datastore(&name, Some(Operation::Write))?;
> }
> }
>
> Ok(())
> }
>
> +----+------+-----------+-----------+---------+
> | M | N | Baseline | Patched | Speedup |
> +----+------+-----------+-----------+---------+
> | 1 | 1000 | 1.684 s | 35.3 ms | 47.7x |
> | 10 | 100 | 1.689 s | 35.0 ms | 48.3x |
> | 100| 10 | 1.709 s | 35.8 ms | 47.7x |
> |1000| 1 | 1.809 s | 39.0 ms | 46.4x |
> +----+------+-----------+-----------+---------+
>
> Bulk-drop variant:
>
> Keeps the `DataStore` instances alive for
> all `N` lookups of a given datastore and then drops them in bulk,
> mimicking a task that performs many lookups while it is running and
> only triggers the expensive `Drop` logic when the last user exits.
>
> use anyhow::Error;
>
> use pbs_api_types::Operation;
> use pbs_datastore::DataStore;
>
> fn main() -> Result<(), Error> {
> let mut args = std::env::args();
> args.next();
>
> let datastores = if let Some(n) = args.next() {
> n.parse::<usize>()?
> } else {
> 1000
> };
>
> let iterations = if let Some(n) = args.next() {
> n.parse::<usize>()?
> } else {
> 1000
> };
>
> for d in 1..=datastores {
> let name = format!("ds{:04}", d);
>
> let mut stores = Vec::with_capacity(iterations);
> for i in 1..=iterations {
> stores.push(DataStore::lookup_datastore(&name, Some(Operation::Write))?);
> }
> }
>
> Ok(())
> }
>
> +------+------+---------------+--------------+---------+
> | M | N | Baseline mean | Patched mean | Speedup |
> +------+------+---------------+--------------+---------+
> | 1 | 1000 | 890.6 ms | 35.5 ms | 25.1x |
> | 10 | 100 | 891.3 ms | 35.1 ms | 25.4x |
> | 100 | 10 | 983.9 ms | 35.6 ms | 27.6x |
> | 1000 | 1 | 1829.0 ms | 45.2 ms | 40.5x |
> +------+------+---------------+--------------+---------+
>
>
> Both variants show that the combination of the cached config lookups
> and the cheaper `Drop` handling reduces the hot-path cost from ~1.8 s
> per run to a few tens of milliseconds in these benchmarks.
>
> ## Reproduction steps
>
> VM: 4 vCPU, ~8 GiB RAM, VirtIO-SCSI; disks:
> - scsi0 32G (OS)
> - scsi1 1000G (datastores)
>
> Install PBS from ISO on the VM.
>
> Set up ZFS on /dev/sdb (adjust if different):
>
> zpool create -f -o ashift=12 pbsbench /dev/sdb
> zfs set mountpoint=/pbsbench pbsbench
> zfs create pbsbench/pbs-bench
>
> Raise file-descriptor limit:
>
> sudo systemctl edit proxmox-backup-proxy.service
>
> Add the following lines:
>
> [Service]
> LimitNOFILE=1048576
>
> Reload systemd and restart the proxy:
>
> sudo systemctl daemon-reload
> sudo systemctl restart proxmox-backup-proxy.service
>
> Verify the limit:
>
> systemctl show proxmox-backup-proxy.service | grep LimitNOFILE
>
> Create 1000 ZFS-backed datastores (as used in #6049 [1]):
>
> seq -w 001 1000 | xargs -n1 -P1 bash -c '
> id=$0
> name="ds${id}"
> dataset="pbsbench/pbs-bench/${name}"
> path="/pbsbench/pbs-bench/${name}"
> zfs create -o mountpoint="$path" "$dataset"
> proxmox-backup-manager datastore create "$name" "$path" \
> --comment "ZFS dataset-based datastore"
> '
>
> Build PBS from this series, then run the server manually
> under flamegraph:
>
> systemctl stop proxmox-backup-proxy
> cargo flamegraph --release --bin proxmox-backup-proxy
>
> ## Patch summary
>
> [PATCH 1/4] partial fix #6049: config: enable config version cache for datastore
> [PATCH 2/4] partial fix #6049: datastore: impl ConfigVersionCache fast path for lookups
> [PATCH 3/4] partial fix #6049: datastore: use config fast-path in Drop
> [PATCH 4/4] partial fix #6049: datastore: add TTL fallback to catch manual config edits
>
> ## Maintainer notes
>
> No dependency bumps, no API changes and no breaking changes.
>
> Thanks,
> Samuel
>
> Links
>
> [1] Bugzilla #6049: https://bugzilla.proxmox.com/show_bug.cgi?id=6049
> [2] cargo-flamegraph: https://github.com/flamegraph-rs/flamegraph
> [3] Bugzilla #7017: https://bugzilla.proxmox.com/show_bug.cgi?id=7017
>
> Samuel Rufinatscha (4):
> partial fix #6049: config: enable config version cache for datastore
> partial fix #6049: datastore: impl ConfigVersionCache fast path for
> lookups
> partial fix #6049: datastore: use config fast-path in Drop
> partial fix #6049: datastore: add TTL fallback to catch manual config
> edits
>
> pbs-config/src/config_version_cache.rs | 10 +-
> pbs-datastore/Cargo.toml | 1 +
> pbs-datastore/src/datastore.rs | 215 ++++++++++++++++++++-----
> 3 files changed, 180 insertions(+), 46 deletions(-)
>
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
^ permalink raw reply [relevance 13%]
* [pbs-devel] [PATCH proxmox-backup v5 3/4] partial fix #6049: datastore: use config fast-path in Drop
2025-11-24 17:04 12% [pbs-devel] [PATCH proxmox-backup v5 " Samuel Rufinatscha
2025-11-24 17:04 16% ` [pbs-devel] [PATCH proxmox-backup v5 1/4] partial fix #6049: config: enable config version cache for datastore Samuel Rufinatscha
2025-11-24 17:04 11% ` [pbs-devel] [PATCH proxmox-backup v5 2/4] partial fix #6049: datastore: impl ConfigVersionCache fast path for lookups Samuel Rufinatscha
@ 2025-11-24 17:04 14% ` Samuel Rufinatscha
2025-11-26 15:15 5% ` Fabian Grünbichler
2025-11-24 17:04 14% ` [pbs-devel] [PATCH proxmox-backup v5 4/4] partial fix #6049: datastore: add TTL fallback to catch manual config edits Samuel Rufinatscha
` (2 subsequent siblings)
5 siblings, 1 reply; 200+ results
From: Samuel Rufinatscha @ 2025-11-24 17:04 UTC (permalink / raw)
To: pbs-devel
The Drop impl of DataStore re-read datastore.cfg to decide whether
the entry should be evicted from the in-process cache (based on
maintenance mode’s clear_from_cache). During the investigation of
issue #6049 [1], a flamegraph [2] showed that the config reload in Drop
accounted for a measurable share of CPU time under load.
This patch wires the datastore config fast path to the Drop
impl to eventually avoid an expensive config reload from disk to capture
the maintenance mandate. Also, to ensure the Drop handlers will detect
that a newer config exists / to mitigate usage of an eventually stale
cached entry, generation will not only be bumped on config save, but also
on re-read of the config file (slow path), if `update_cache = true`.
Links
[1] Bugzilla: https://bugzilla.proxmox.com/show_bug.cgi?id=6049
[2] cargo-flamegraph: https://github.com/flamegraph-rs/flamegraph
Fixes: #6049
Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
---
Changes:
From v1 → v2
- Replace caching logic with the datastore_section_config_cached()
helper.
From v2 → v3
No changes
From v3 → v4, thanks @Fabian
- Pass datastore_section_config_cached(false) in Drop to avoid
concurrent cache updates.
From v4 → v5
- Rebased only, no changes
pbs-datastore/src/datastore.rs | 60 ++++++++++++++++++++++++++--------
1 file changed, 47 insertions(+), 13 deletions(-)
diff --git a/pbs-datastore/src/datastore.rs b/pbs-datastore/src/datastore.rs
index c9cb5d65..7638a899 100644
--- a/pbs-datastore/src/datastore.rs
+++ b/pbs-datastore/src/datastore.rs
@@ -225,15 +225,40 @@ impl Drop for DataStore {
// remove datastore from cache iff
// - last task finished, and
// - datastore is in a maintenance mode that mandates it
- let remove_from_cache = last_task
- && pbs_config::datastore::config()
- .and_then(|(s, _)| s.lookup::<DataStoreConfig>("datastore", self.name()))
- .is_ok_and(|c| {
- c.get_maintenance_mode()
- .is_some_and(|m| m.clear_from_cache())
- });
-
- if remove_from_cache {
+
+ // first check: check if last task finished
+ if !last_task {
+ return;
+ }
+
+ let (section_config, _gen) = match datastore_section_config_cached(false) {
+ Ok(v) => v,
+ Err(err) => {
+ log::error!(
+ "failed to load datastore config in Drop for {} - {err}",
+ self.name()
+ );
+ return;
+ }
+ };
+
+ let datastore_cfg: DataStoreConfig =
+ match section_config.lookup("datastore", self.name()) {
+ Ok(cfg) => cfg,
+ Err(err) => {
+ log::error!(
+ "failed to look up datastore '{}' in Drop - {err}",
+ self.name()
+ );
+ return;
+ }
+ };
+
+ // second check: check maintenance mode mandate
+ if datastore_cfg
+ .get_maintenance_mode()
+ .is_some_and(|m| m.clear_from_cache())
+ {
DATASTORE_MAP.lock().unwrap().remove(self.name());
}
}
@@ -307,12 +332,12 @@ impl DatastoreThreadSettings {
/// - If the cached generation matches the current generation, the
/// cached config is returned.
/// - Otherwise the config is re-read from disk. If `update_cache` is
-/// `true`, the new config and current generation are stored in the
+/// `true`, the new config and bumped generation are stored in the
/// cache. Callers that set `update_cache = true` must hold the
/// datastore config lock to avoid racing with concurrent config
/// changes.
/// - If `update_cache` is `false`, the freshly read config is returned
-/// but the cache is left unchanged.
+/// but the cache and generation are left unchanged.
///
/// If `ConfigVersionCache` is not available, the config is always read
/// from disk and `None` is returned as the generation.
@@ -333,14 +358,23 @@ fn datastore_section_config_cached(
let (config_raw, _digest) = pbs_config::datastore::config()?;
let config = Arc::new(config_raw);
+ let mut effective_gen = current_gen;
if update_cache {
+ // Bump the generation. This ensures that Drop
+ // handlers will detect that a newer config exists
+ // and will not rely on a stale cached entry for
+ // maintenance mandate.
+ let prev_gen = version_cache.increase_datastore_generation();
+ effective_gen = prev_gen + 1;
+
+ // Persist
*config_cache = Some(DatastoreConfigCache {
config: config.clone(),
- last_generation: current_gen,
+ last_generation: effective_gen,
});
}
- Ok((config, Some(current_gen)))
+ Ok((config, Some(effective_gen)))
} else {
// Fallback path, no config version cache: read datastore.cfg and return None as generation
*config_cache = None;
--
2.47.3
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
* [pbs-devel] [PATCH proxmox-backup v5 0/4] datastore: remove config reload on hot path
@ 2025-11-24 17:04 12% Samuel Rufinatscha
2025-11-24 17:04 16% ` [pbs-devel] [PATCH proxmox-backup v5 1/4] partial fix #6049: config: enable config version cache for datastore Samuel Rufinatscha
` (5 more replies)
0 siblings, 6 replies; 200+ results
From: Samuel Rufinatscha @ 2025-11-24 17:04 UTC (permalink / raw)
To: pbs-devel
Hi,
this series reduces CPU time in datastore lookups by avoiding repeated
datastore.cfg reads/parses in both `lookup_datastore()` and
`DataStore::Drop`. It also adds a TTL so manual config edits are
noticed without reintroducing hashing on every request.
While investigating #6049 [1], cargo-flamegraph [2] showed hotspots
during repeated `/status` calls in `lookup_datastore()` and in `Drop`,
dominated by `pbs_config::datastore::config()` (config parse).
The parsing cost itself should eventually be investigated in a future
effort. Furthermore, cargo-flamegraph showed that when using a
token-based auth method to access the API, a significant amount of time
is spent in validation on every request [3].
## Approach
[PATCH 1/4] Support datastore generation in ConfigVersionCache
[PATCH 2/4] Fast path for datastore lookups
Cache the parsed datastore.cfg keyed by the shared datastore
generation. lookup_datastore() reuses both the cached config and an
existing DataStoreImpl when the generation matches, and falls back
to the old slow path otherwise. The caching logic is implemented
using the datastore_section_config_cached(update_cache: bool) helper.
[PATCH 3/4] Fast path for Drop
Make DataStore::Drop use the datastore_section_config_cached()
helper to avoid re-reading/parsing datastore.cfg on every Drop.
Bump the generation not only on API config saves, but also on
slow-path lookups (if update_cache is true), so that Drop handlers
see newer configs.
[PATCH 4/4] TTL to catch manual edits
Add a TTL to the cached config and bump the datastore generation iff
the digest changed but generation stays the same. This catches manual
edits to datastore.cfg without reintroducing hashing or config
parsing on every request.
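The freshness check that patches 2 and 4 build on (the cached generation
must match the shared counter, and the entry must be younger than the
TTL) can be sketched as a small stand-alone model; `DATASTORE_GENERATION`,
`CacheEntry` and `is_fresh` are illustrative names, not the real types:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Illustrative stand-in for the shared datastore generation counter.
static DATASTORE_GENERATION: AtomicUsize = AtomicUsize::new(0);

const TTL_SECS: i64 = 60;

// Illustrative cache entry; the parsed config itself is omitted here.
struct CacheEntry {
    last_generation: usize,
    last_update: i64, // epoch seconds
}

// Fast path iff the generation matches and the entry is within the TTL.
fn is_fresh(entry: &CacheEntry, now: i64) -> bool {
    entry.last_generation == DATASTORE_GENERATION.load(Ordering::Acquire)
        && now - entry.last_update < TTL_SECS
}

fn main() {
    let entry = CacheEntry { last_generation: 0, last_update: 1_000 };
    assert!(is_fresh(&entry, 1_030)); // generation matches, within TTL
    assert!(!is_fresh(&entry, 1_061)); // TTL expired -> slow path
    DATASTORE_GENERATION.fetch_add(1, Ordering::AcqRel);
    assert!(!is_fresh(&entry, 1_030)); // generation bumped -> slow path
    println!("ok");
}
```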
## Benchmark results
### End-to-end
Testing `/status?verbose=0` end-to-end with 1000 stores, 5 req/store
and parallel=16 before/after the series:
Metric                 Before    After
----------------------------------------
Total time             12s       9s
Throughput (all)       416.67    555.56
Cold RPS (round #1)    83.33     111.11
Warm RPS (#2..N)       333.33    444.44
Running under flamegraph [2], TLS appears to consume a significant
amount of CPU time and blur the results. Still, a ~33% higher overall
throughput and ~25% less end-to-end time for this workload.
### Isolated benchmarks (hyperfine)
In addition to the end-to-end tests, I measured two standalone
benchmarks with hyperfine, each using a config with 1000 datastores.
`M` is the number of distinct datastores looked up and
`N` is the number of lookups per datastore.
Drop-direct variant:
Drops the `DataStore` after every lookup, so the `Drop` path runs on
every iteration:
use anyhow::Error;
use pbs_api_types::Operation;
use pbs_datastore::DataStore;
fn main() -> Result<(), Error> {
let mut args = std::env::args();
args.next();
let datastores = if let Some(n) = args.next() {
n.parse::<usize>()?
} else {
1000
};
let iterations = if let Some(n) = args.next() {
n.parse::<usize>()?
} else {
1000
};
for d in 1..=datastores {
let name = format!("ds{:04}", d);
for _ in 1..=iterations {
DataStore::lookup_datastore(&name, Some(Operation::Write))?;
}
}
Ok(())
}
+------+------+----------+---------+---------+
|    M |    N | Baseline | Patched | Speedup |
+------+------+----------+---------+---------+
|    1 | 1000 | 1.684 s  | 35.3 ms | 47.7x   |
|   10 |  100 | 1.689 s  | 35.0 ms | 48.3x   |
|  100 |   10 | 1.709 s  | 35.8 ms | 47.7x   |
| 1000 |    1 | 1.809 s  | 39.0 ms | 46.4x   |
+------+------+----------+---------+---------+
Bulk-drop variant:
Keeps the `DataStore` instances alive for
all `N` lookups of a given datastore and then drops them in bulk,
mimicking a task that performs many lookups while it is running and
only triggers the expensive `Drop` logic when the last user exits.
use anyhow::Error;
use pbs_api_types::Operation;
use pbs_datastore::DataStore;
fn main() -> Result<(), Error> {
let mut args = std::env::args();
args.next();
let datastores = if let Some(n) = args.next() {
n.parse::<usize>()?
} else {
1000
};
let iterations = if let Some(n) = args.next() {
n.parse::<usize>()?
} else {
1000
};
for d in 1..=datastores {
let name = format!("ds{:04}", d);
let mut stores = Vec::with_capacity(iterations);
for _ in 1..=iterations {
stores.push(DataStore::lookup_datastore(&name, Some(Operation::Write))?);
}
}
Ok(())
}
+------+------+---------------+--------------+---------+
| M | N | Baseline mean | Patched mean | Speedup |
+------+------+---------------+--------------+---------+
| 1 | 1000 | 890.6 ms | 35.5 ms | 25.1x |
| 10 | 100 | 891.3 ms | 35.1 ms | 25.4x |
| 100 | 10 | 983.9 ms | 35.6 ms | 27.6x |
| 1000 | 1 | 1829.0 ms | 45.2 ms | 40.5x |
+------+------+---------------+--------------+---------+
Both variants show that the combination of the cached config lookups
and the cheaper `Drop` handling reduces the hot-path cost from ~1.8 s
per run to a few tens of milliseconds in these benchmarks.
## Reproduction steps
VM: 4 vCPU, ~8 GiB RAM, VirtIO-SCSI; disks:
- scsi0 32G (OS)
- scsi1 1000G (datastores)
Install PBS from ISO on the VM.
Set up ZFS on /dev/sdb (adjust if different):
zpool create -f -o ashift=12 pbsbench /dev/sdb
zfs set mountpoint=/pbsbench pbsbench
zfs create pbsbench/pbs-bench
Raise file-descriptor limit:
sudo systemctl edit proxmox-backup-proxy.service
Add the following lines:
[Service]
LimitNOFILE=1048576
Reload systemd and restart the proxy:
sudo systemctl daemon-reload
sudo systemctl restart proxmox-backup-proxy.service
Verify the limit:
systemctl show proxmox-backup-proxy.service | grep LimitNOFILE
Create 1000 ZFS-backed datastores (as used in #6049 [1]):
seq -w 001 1000 | xargs -n1 -P1 bash -c '
id=$0
name="ds${id}"
dataset="pbsbench/pbs-bench/${name}"
path="/pbsbench/pbs-bench/${name}"
zfs create -o mountpoint="$path" "$dataset"
proxmox-backup-manager datastore create "$name" "$path" \
--comment "ZFS dataset-based datastore"
'
Build PBS from this series, then run the server manually
under flamegraph:
systemctl stop proxmox-backup-proxy
cargo flamegraph --release --bin proxmox-backup-proxy
## Patch summary
[PATCH 1/4] partial fix #6049: config: enable config version cache for datastore
[PATCH 2/4] partial fix #6049: datastore: impl ConfigVersionCache fast path for lookups
[PATCH 3/4] partial fix #6049: datastore: use config fast-path in Drop
[PATCH 4/4] partial fix #6049: datastore: add TTL fallback to catch manual config edits
## Maintainer notes
No dependency bumps, no API changes and no breaking changes.
Thanks,
Samuel
Links
[1] Bugzilla #6049: https://bugzilla.proxmox.com/show_bug.cgi?id=6049
[2] cargo-flamegraph: https://github.com/flamegraph-rs/flamegraph
[3] Bugzilla #7017: https://bugzilla.proxmox.com/show_bug.cgi?id=7017
Samuel Rufinatscha (4):
partial fix #6049: config: enable config version cache for datastore
partial fix #6049: datastore: impl ConfigVersionCache fast path for
lookups
partial fix #6049: datastore: use config fast-path in Drop
partial fix #6049: datastore: add TTL fallback to catch manual config
edits
pbs-config/src/config_version_cache.rs | 10 +-
pbs-datastore/Cargo.toml | 1 +
pbs-datastore/src/datastore.rs | 213 ++++++++++++++++++++-----
3 files changed, 179 insertions(+), 45 deletions(-)
--
2.47.3
* [pbs-devel] [PATCH proxmox-backup v5 1/4] partial fix #6049: config: enable config version cache for datastore
2025-11-24 17:04 12% [pbs-devel] [PATCH proxmox-backup v5 " Samuel Rufinatscha
@ 2025-11-24 17:04 16% ` Samuel Rufinatscha
2025-11-26 15:15 5% ` Fabian Grünbichler
2025-11-24 17:04 11% ` [pbs-devel] [PATCH proxmox-backup v5 2/4] partial fix #6049: datastore: impl ConfigVersionCache fast path for lookups Samuel Rufinatscha
` (4 subsequent siblings)
5 siblings, 1 reply; 200+ results
From: Samuel Rufinatscha @ 2025-11-24 17:04 UTC (permalink / raw)
To: pbs-devel
Repeated /status requests caused lookup_datastore() to re-read and
parse datastore.cfg on every call. The issue was mentioned in report
#6049 [1]. cargo-flamegraph [2] confirmed that the hot path is
dominated by pbs_config::datastore::config() (config parsing).
To solve the issue, this patch prepares the config version cache,
so that datastore config caching can be built on top of it.
This patch specifically:
(1) implements increment function in order to invalidate generations
(2) removes obsolete comments
Links
[1] Bugzilla: https://bugzilla.proxmox.com/show_bug.cgi?id=6049
[2] cargo-flamegraph: https://github.com/flamegraph-rs/flamegraph
Fixes: #6049
Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
---
Changes:
From v1 → v2 (original introduction), thanks @Fabian
- Split the ConfigVersionCache changes out of the large datastore patch
into their own config-only patch
* removed the obsolete // FIXME comment on datastore_generation
* added ConfigVersionCache::datastore_generation() as getter
From v2 → v3
No changes
From v3 → v4
No changes
From v4 → v5
- Rebased only, no changes
pbs-config/src/config_version_cache.rs | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)
diff --git a/pbs-config/src/config_version_cache.rs b/pbs-config/src/config_version_cache.rs
index e8fb994f..b875f7e0 100644
--- a/pbs-config/src/config_version_cache.rs
+++ b/pbs-config/src/config_version_cache.rs
@@ -26,7 +26,6 @@ struct ConfigVersionCacheDataInner {
// Traffic control (traffic-control.cfg) generation/version.
traffic_control_generation: AtomicUsize,
// datastore (datastore.cfg) generation/version
- // FIXME: remove with PBS 3.0
datastore_generation: AtomicUsize,
// Add further atomics here
}
@@ -145,8 +144,15 @@ impl ConfigVersionCache {
.fetch_add(1, Ordering::AcqRel);
}
+ /// Returns the datastore generation number.
+ pub fn datastore_generation(&self) -> usize {
+ self.shmem
+ .data()
+ .datastore_generation
+ .load(Ordering::Acquire)
+ }
+
/// Increase the datastore generation number.
- // FIXME: remove with PBS 3.0 or make actually useful again in datastore lookup
pub fn increase_datastore_generation(&self) -> usize {
self.shmem
.data()
--
2.47.3
* [pbs-devel] [PATCH proxmox-backup v5 2/4] partial fix #6049: datastore: impl ConfigVersionCache fast path for lookups
2025-11-24 17:04 12% [pbs-devel] [PATCH proxmox-backup v5 " Samuel Rufinatscha
2025-11-24 17:04 16% ` [pbs-devel] [PATCH proxmox-backup v5 1/4] partial fix #6049: config: enable config version cache for datastore Samuel Rufinatscha
@ 2025-11-24 17:04 11% ` Samuel Rufinatscha
2025-11-26 15:15 5% ` Fabian Grünbichler
2025-11-24 17:04 14% ` [pbs-devel] [PATCH proxmox-backup v5 3/4] partial fix #6049: datastore: use config fast-path in Drop Samuel Rufinatscha
` (3 subsequent siblings)
5 siblings, 1 reply; 200+ results
From: Samuel Rufinatscha @ 2025-11-24 17:04 UTC (permalink / raw)
To: pbs-devel
Repeated /status requests caused lookup_datastore() to re-read and
parse datastore.cfg on every call. The issue was mentioned in report
#6049 [1]. cargo-flamegraph [2] confirmed that the hot path is
dominated by pbs_config::datastore::config() (config parsing).
This patch implements caching of the global datastore.cfg using the
generation numbers from the shared config version cache. It caches the
datastore.cfg along with the generation number and, when a subsequent
lookup sees the same generation, it reuses the cached config without
re-reading it from disk. If the generation differs
(or the cache is unavailable), the config is re-read from disk.
If `update_cache = true`, the new config and current generation are
persisted in the cache. In this case, callers must hold the datastore
config lock to avoid racing with concurrent config changes.
If `update_cache` is `false` and generation did not match, the freshly
read config is returned but the cache is left unchanged. If
`ConfigVersionCache` is not available, the config is always read from
disk and `None` is returned as generation.
Behavioral notes
- The generation is bumped via the existing save_config() path, so
API-driven config changes are detected immediately.
- Manual edits to datastore.cfg are not detected; this is covered in a
dedicated patch in this series.
- DataStore::drop still performs a config read on the common path;
also covered in a dedicated patch in this series.
Links
[1] Bugzilla: https://bugzilla.proxmox.com/show_bug.cgi?id=6049
[2] cargo-flamegraph: https://github.com/flamegraph-rs/flamegraph
Fixes: #6049
Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
---
Changes:
From v1 → v2, thanks @Fabian
- Moved the ConfigVersionCache changes into its own patch.
- Introduced the global static DATASTORE_CONFIG_CACHE to store the
fully parsed datastore.cfg instead, along with its generation number.
Introduced DatastoreConfigCache struct to hold both.
- Removed and replaced the CachedDatastoreConfigTag field of
DataStoreImpl with a generation number field only (Option<usize>)
to validate DataStoreImpl reuse.
- Added DataStore::datastore_section_config_cached() helper function
to encapsulate the caching logic and simplify reuse.
- Modified DataStore::lookup_datastore() to use the new helper.
From v2 → v3
No changes
From v3 → v4, thanks @Fabian
- Restructured the version cache checks in
datastore_section_config_cached(), to simplify the logic.
- Added update_cache parameter to datastore_section_config_cached() to
control cache updates.
From v4 → v5
- Rebased only, no changes
pbs-datastore/Cargo.toml | 1 +
pbs-datastore/src/datastore.rs | 138 +++++++++++++++++++++++++--------
2 files changed, 105 insertions(+), 34 deletions(-)
diff --git a/pbs-datastore/Cargo.toml b/pbs-datastore/Cargo.toml
index 8ce930a9..42f49a7b 100644
--- a/pbs-datastore/Cargo.toml
+++ b/pbs-datastore/Cargo.toml
@@ -40,6 +40,7 @@ proxmox-io.workspace = true
proxmox-lang.workspace=true
proxmox-s3-client = { workspace = true, features = [ "impl" ] }
proxmox-schema = { workspace = true, features = [ "api-macro" ] }
+proxmox-section-config.workspace = true
proxmox-serde = { workspace = true, features = [ "serde_json" ] }
proxmox-sys.workspace = true
proxmox-systemd.workspace = true
diff --git a/pbs-datastore/src/datastore.rs b/pbs-datastore/src/datastore.rs
index 36550ff6..c9cb5d65 100644
--- a/pbs-datastore/src/datastore.rs
+++ b/pbs-datastore/src/datastore.rs
@@ -34,7 +34,8 @@ use pbs_api_types::{
MaintenanceType, Operation, UPID,
};
use pbs_config::s3::S3_CFG_TYPE_ID;
-use pbs_config::BackupLockGuard;
+use pbs_config::{BackupLockGuard, ConfigVersionCache};
+use proxmox_section_config::SectionConfigData;
use crate::backup_info::{
BackupDir, BackupGroup, BackupInfo, OLD_LOCKING, PROTECTED_MARKER_FILENAME,
@@ -48,6 +49,17 @@ use crate::s3::S3_CONTENT_PREFIX;
use crate::task_tracking::{self, update_active_operations};
use crate::{DataBlob, LocalDatastoreLruCache};
+// Cache for fully parsed datastore.cfg
+struct DatastoreConfigCache {
+ // Parsed datastore.cfg file
+ config: Arc<SectionConfigData>,
+ // Generation number from ConfigVersionCache
+ last_generation: usize,
+}
+
+static DATASTORE_CONFIG_CACHE: LazyLock<Mutex<Option<DatastoreConfigCache>>> =
+ LazyLock::new(|| Mutex::new(None));
+
static DATASTORE_MAP: LazyLock<Mutex<HashMap<String, Arc<DataStoreImpl>>>> =
LazyLock::new(|| Mutex::new(HashMap::new()));
@@ -149,11 +161,13 @@ pub struct DataStoreImpl {
last_gc_status: Mutex<GarbageCollectionStatus>,
verify_new: bool,
chunk_order: ChunkOrder,
- last_digest: Option<[u8; 32]>,
sync_level: DatastoreFSyncLevel,
backend_config: DatastoreBackendConfig,
lru_store_caching: Option<LocalDatastoreLruCache>,
thread_settings: DatastoreThreadSettings,
+ /// Datastore generation number from `ConfigVersionCache` at creation time, used to
+ /// validate reuse of this cached `DataStoreImpl`.
+ config_generation: Option<usize>,
}
impl DataStoreImpl {
@@ -166,11 +180,11 @@ impl DataStoreImpl {
last_gc_status: Mutex::new(GarbageCollectionStatus::default()),
verify_new: false,
chunk_order: Default::default(),
- last_digest: None,
sync_level: Default::default(),
backend_config: Default::default(),
lru_store_caching: None,
thread_settings: Default::default(),
+ config_generation: None,
})
}
}
@@ -286,6 +300,55 @@ impl DatastoreThreadSettings {
}
}
+/// Returns the parsed datastore config (`datastore.cfg`) and its
+/// generation.
+///
+/// Uses `ConfigVersionCache` to detect stale entries:
+/// - If the cached generation matches the current generation, the
+/// cached config is returned.
+/// - Otherwise the config is re-read from disk. If `update_cache` is
+/// `true`, the new config and current generation are stored in the
+/// cache. Callers that set `update_cache = true` must hold the
+/// datastore config lock to avoid racing with concurrent config
+/// changes.
+/// - If `update_cache` is `false`, the freshly read config is returned
+/// but the cache is left unchanged.
+///
+/// If `ConfigVersionCache` is not available, the config is always read
+/// from disk and `None` is returned as the generation.
+fn datastore_section_config_cached(
+ update_cache: bool,
+) -> Result<(Arc<SectionConfigData>, Option<usize>), Error> {
+ let mut config_cache = DATASTORE_CONFIG_CACHE.lock().unwrap();
+
+ if let Ok(version_cache) = ConfigVersionCache::new() {
+ let current_gen = version_cache.datastore_generation();
+ if let Some(cached) = config_cache.as_ref() {
+ // Fast path: re-use cached datastore.cfg
+ if cached.last_generation == current_gen {
+ return Ok((cached.config.clone(), Some(cached.last_generation)));
+ }
+ }
+ // Slow path: re-read datastore.cfg
+ let (config_raw, _digest) = pbs_config::datastore::config()?;
+ let config = Arc::new(config_raw);
+
+ if update_cache {
+ *config_cache = Some(DatastoreConfigCache {
+ config: config.clone(),
+ last_generation: current_gen,
+ });
+ }
+
+ Ok((config, Some(current_gen)))
+ } else {
+ // Fallback path, no config version cache: read datastore.cfg and return None as generation
+ *config_cache = None;
+ let (config_raw, _digest) = pbs_config::datastore::config()?;
+ Ok((Arc::new(config_raw), None))
+ }
+}
+
impl DataStore {
// This one just panics on everything
#[doc(hidden)]
@@ -363,56 +426,63 @@ impl DataStore {
name: &str,
operation: Option<Operation>,
) -> Result<Arc<DataStore>, Error> {
- // Avoid TOCTOU between checking maintenance mode and updating active operation counter, as
- // we use it to decide whether it is okay to delete the datastore.
+ // Avoid TOCTOU between checking maintenance mode and updating active operations.
let _config_lock = pbs_config::datastore::lock_config()?;
- // we could use the ConfigVersionCache's generation for staleness detection, but we load
- // the config anyway -> just use digest, additional benefit: manual changes get detected
- let (config, digest) = pbs_config::datastore::config()?;
- let config: DataStoreConfig = config.lookup("datastore", name)?;
+ // Get the current datastore.cfg generation number and cached config
+ let (section_config, gen_num) = datastore_section_config_cached(true)?;
+
+ let datastore_cfg: DataStoreConfig = section_config.lookup("datastore", name)?;
+ let maintenance_mode = datastore_cfg.get_maintenance_mode();
+ let mount_status = get_datastore_mount_status(&datastore_cfg);
- if let Some(maintenance_mode) = config.get_maintenance_mode() {
- if let Err(error) = maintenance_mode.check(operation) {
+ if let Some(mm) = &maintenance_mode {
+ if let Err(error) = mm.check(operation.clone()) {
bail!("datastore '{name}' is unavailable: {error}");
}
}
- if get_datastore_mount_status(&config) == Some(false) {
- let mut datastore_cache = DATASTORE_MAP.lock().unwrap();
- datastore_cache.remove(&config.name);
- bail!("datastore '{}' is not mounted", config.name);
+ let mut datastore_cache = DATASTORE_MAP.lock().unwrap();
+
+ if mount_status == Some(false) {
+ datastore_cache.remove(&datastore_cfg.name);
+ bail!("datastore '{}' is not mounted", datastore_cfg.name);
}
- let mut datastore_cache = DATASTORE_MAP.lock().unwrap();
- let entry = datastore_cache.get(name);
-
- // reuse chunk store so that we keep using the same process locker instance!
- let chunk_store = if let Some(datastore) = &entry {
- let last_digest = datastore.last_digest.as_ref();
- if let Some(true) = last_digest.map(|last_digest| last_digest == &digest) {
- if let Some(operation) = operation {
- update_active_operations(name, operation, 1)?;
+ // Re-use DataStoreImpl
+ if let Some(existing) = datastore_cache.get(name).cloned() {
+ if let (Some(last_generation), Some(gen_num)) = (existing.config_generation, gen_num) {
+ if last_generation == gen_num {
+ if let Some(op) = operation {
+ update_active_operations(name, op, 1)?;
+ }
+
+ return Ok(Arc::new(Self {
+ inner: existing,
+ operation,
+ }));
}
- return Ok(Arc::new(Self {
- inner: Arc::clone(datastore),
- operation,
- }));
}
- Arc::clone(&datastore.chunk_store)
+ }
+
+ // (Re)build DataStoreImpl
+
+ // Reuse chunk store so that we keep using the same process locker instance!
+ let chunk_store = if let Some(existing) = datastore_cache.get(name) {
+ Arc::clone(&existing.chunk_store)
} else {
let tuning: DatastoreTuning = serde_json::from_value(
DatastoreTuning::API_SCHEMA
- .parse_property_string(config.tuning.as_deref().unwrap_or(""))?,
+ .parse_property_string(datastore_cfg.tuning.as_deref().unwrap_or(""))?,
)?;
Arc::new(ChunkStore::open(
name,
- config.absolute_path(),
+ datastore_cfg.absolute_path(),
tuning.sync_level.unwrap_or_default(),
)?)
};
- let datastore = DataStore::with_store_and_config(chunk_store, config, Some(digest))?;
+ let datastore = DataStore::with_store_and_config(chunk_store, datastore_cfg, gen_num)?;
let datastore = Arc::new(datastore);
datastore_cache.insert(name.to_string(), datastore.clone());
@@ -514,7 +584,7 @@ impl DataStore {
fn with_store_and_config(
chunk_store: Arc<ChunkStore>,
config: DataStoreConfig,
- last_digest: Option<[u8; 32]>,
+ generation: Option<usize>,
) -> Result<DataStoreImpl, Error> {
let mut gc_status_path = chunk_store.base_path();
gc_status_path.push(".gc-status");
@@ -579,11 +649,11 @@ impl DataStore {
last_gc_status: Mutex::new(gc_status),
verify_new: config.verify_new.unwrap_or(false),
chunk_order: tuning.chunk_order.unwrap_or_default(),
- last_digest,
sync_level: tuning.sync_level.unwrap_or_default(),
backend_config,
lru_store_caching,
thread_settings,
+ config_generation: generation,
})
}
--
2.47.3
* [pbs-devel] [PATCH proxmox-backup v5 4/4] partial fix #6049: datastore: add TTL fallback to catch manual config edits
2025-11-24 17:04 12% [pbs-devel] [PATCH proxmox-backup v5 " Samuel Rufinatscha
` (2 preceding siblings ...)
2025-11-24 17:04 14% ` [pbs-devel] [PATCH proxmox-backup v5 3/4] partial fix #6049: datastore: use config fast-path in Drop Samuel Rufinatscha
@ 2025-11-24 17:04 14% ` Samuel Rufinatscha
2025-11-26 15:15 5% ` Fabian Grünbichler
2025-11-26 15:16 5% ` [pbs-devel] [PATCH proxmox-backup v5 0/4] datastore: remove config reload on hot path Fabian Grünbichler
2026-01-05 14:21 13% ` [pbs-devel] superseded: " Samuel Rufinatscha
5 siblings, 1 reply; 200+ results
From: Samuel Rufinatscha @ 2025-11-24 17:04 UTC (permalink / raw)
To: pbs-devel
The lookup fast path reacts to API-driven config changes because
save_config() bumps the generation. Manual edits of datastore.cfg do
not bump the counter. To keep the system robust against such edits
without reintroducing config reading and hashing on the hot path, this
patch adds a TTL to the cache entry.
If the cached config is older than
DATASTORE_CONFIG_CACHE_TTL_SECS (set to 60s), the next lookup takes
the slow path and refreshes the entry. As an optimization, a check to
catch manual edits was added (if the digest changed but generation
stayed the same), so that the generation is only bumped when needed.
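The manual-edit check described above (bump only when the digest changed
while the generation stayed the same) can be sketched as a pure function;
`needs_generation_bump` is an illustrative name, not the actual helper:

```rust
// prev: (last_generation, digest) of the cached entry, if any.
fn needs_generation_bump(
    prev: Option<(usize, [u8; 32])>,
    current_gen: usize,
    fresh_digest: [u8; 32],
) -> bool {
    match prev {
        // Same generation but different digest: the file changed behind
        // the counter's back (manual edit) -> bump the generation.
        Some((g, d)) => g == current_gen && d != fresh_digest,
        // No cached entry: nothing to compare, no bump needed.
        None => false,
    }
}

fn main() {
    let old = [0u8; 32];
    let new = [1u8; 32];
    assert!(needs_generation_bump(Some((7, old)), 7, new)); // manual edit
    assert!(!needs_generation_bump(Some((7, old)), 7, old)); // unchanged
    assert!(!needs_generation_bump(Some((6, old)), 7, new)); // gen already differs
    assert!(!needs_generation_bump(None, 7, new));
    println!("ok");
}
```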
Links
[1] cargo-flamegraph: https://github.com/flamegraph-rs/flamegraph
Fixes: #6049
Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
---
Changes:
From v1 → v2
- Store last_update timestamp in DatastoreConfigCache type.
From v2 → v3
No changes
From v3 → v4
- Fix digest generation bump logic in update_cache, thanks @Fabian.
From v4 → v5
- Rebased only, no changes
pbs-datastore/src/datastore.rs | 53 ++++++++++++++++++++++++----------
1 file changed, 38 insertions(+), 15 deletions(-)
diff --git a/pbs-datastore/src/datastore.rs b/pbs-datastore/src/datastore.rs
index 7638a899..0fc3fbf2 100644
--- a/pbs-datastore/src/datastore.rs
+++ b/pbs-datastore/src/datastore.rs
@@ -53,8 +53,12 @@ use crate::{DataBlob, LocalDatastoreLruCache};
struct DatastoreConfigCache {
// Parsed datastore.cfg file
config: Arc<SectionConfigData>,
+ // Digest of the datastore.cfg file
+ digest: [u8; 32],
// Generation number from ConfigVersionCache
last_generation: usize,
+ // Last update time (epoch seconds)
+ last_update: i64,
}
static DATASTORE_CONFIG_CACHE: LazyLock<Mutex<Option<DatastoreConfigCache>>> =
@@ -63,6 +67,8 @@ static DATASTORE_CONFIG_CACHE: LazyLock<Mutex<Option<DatastoreConfigCache>>> =
static DATASTORE_MAP: LazyLock<Mutex<HashMap<String, Arc<DataStoreImpl>>>> =
LazyLock::new(|| Mutex::new(HashMap::new()));
+/// Max age in seconds to reuse the cached datastore config.
+const DATASTORE_CONFIG_CACHE_TTL_SECS: i64 = 60;
/// Filename to store backup group notes
pub const GROUP_NOTES_FILE_NAME: &str = "notes";
/// Filename to store backup group owner
@@ -329,13 +335,14 @@ impl DatastoreThreadSettings {
/// generation.
///
/// Uses `ConfigVersionCache` to detect stale entries:
-/// - If the cached generation matches the current generation, the
-/// cached config is returned.
+/// - If the cached generation matches the current generation and TTL is
+/// OK, the cached config is returned.
/// - Otherwise the config is re-read from disk. If `update_cache` is
-/// `true`, the new config and bumped generation are stored in the
-/// cache. Callers that set `update_cache = true` must hold the
-/// datastore config lock to avoid racing with concurrent config
-/// changes.
+/// `true` and a previous cached entry exists with the same generation
+/// but a different digest, this indicates the config has changed
+/// (e.g. manual edit) and the generation must be bumped. Callers
+/// that set `update_cache = true` must hold the datastore config lock
+/// to avoid racing with concurrent config changes.
/// - If `update_cache` is `false`, the freshly read config is returned
/// but the cache and generation are left unchanged.
///
@@ -347,30 +354,46 @@ fn datastore_section_config_cached(
let mut config_cache = DATASTORE_CONFIG_CACHE.lock().unwrap();
if let Ok(version_cache) = ConfigVersionCache::new() {
+ let now = epoch_i64();
let current_gen = version_cache.datastore_generation();
if let Some(cached) = config_cache.as_ref() {
- // Fast path: re-use cached datastore.cfg
- if cached.last_generation == current_gen {
+ // Fast path: re-use cached datastore.cfg if generation matches and TTL not expired
+ if cached.last_generation == current_gen
+ && now - cached.last_update < DATASTORE_CONFIG_CACHE_TTL_SECS
+ {
return Ok((cached.config.clone(), Some(cached.last_generation)));
}
}
// Slow path: re-read datastore.cfg
- let (config_raw, _digest) = pbs_config::datastore::config()?;
+ let (config_raw, digest) = pbs_config::datastore::config()?;
let config = Arc::new(config_raw);
let mut effective_gen = current_gen;
if update_cache {
- // Bump the generation. This ensures that Drop
- // handlers will detect that a newer config exists
- // and will not rely on a stale cached entry for
- // maintenance mandate.
- let prev_gen = version_cache.increase_datastore_generation();
- effective_gen = prev_gen + 1;
+ // Bump the generation if the config has been changed manually.
+ // This ensures that Drop handlers will detect that a newer config exists
+ // and will not rely on a stale cached entry for maintenance mandate.
+ let (prev_gen, prev_digest) = config_cache
+ .as_ref()
+ .map(|c| (Some(c.last_generation), Some(c.digest)))
+ .unwrap_or((None, None));
+
+ let manual_edit = match (prev_gen, prev_digest) {
+ (Some(prev_g), Some(prev_d)) => prev_g == current_gen && prev_d != digest,
+ _ => false,
+ };
+
+ if manual_edit {
+ let prev_gen = version_cache.increase_datastore_generation();
+ effective_gen = prev_gen + 1;
+ }
// Persist
*config_cache = Some(DatastoreConfigCache {
config: config.clone(),
+ digest,
last_generation: effective_gen,
+ last_update: now,
});
}
--
2.47.3
* [pbs-devel] superseded: [PATCH proxmox-backup v3 0/6] datastore: remove config reload on hot path
2025-11-20 13:03 10% [pbs-devel] [PATCH proxmox-backup v3 0/6] " Samuel Rufinatscha
` (6 preceding siblings ...)
2025-11-20 14:50 5% ` [pbs-devel] [PATCH proxmox-backup v3 0/6] datastore: remove config reload on hot path Fabian Grünbichler
@ 2025-11-24 15:35 13% ` Samuel Rufinatscha
7 siblings, 0 replies; 200+ results
From: Samuel Rufinatscha @ 2025-11-24 15:35 UTC (permalink / raw)
To: pbs-devel
https://lore.proxmox.com/pbs-devel/20251124153328.239666-1-s.rufinatscha@proxmox.com/T/#t
On 11/20/25 2:03 PM, Samuel Rufinatscha wrote:
> Hi,
>
> this series reduces CPU time in datastore lookups by avoiding repeated
> datastore.cfg reads/parses in both `lookup_datastore()` and
> `DataStore::Drop`. It also adds a TTL so manual config edits are
> noticed without reintroducing hashing on every request.
>
> While investigating #6049 [1], cargo-flamegraph [2] showed hotspots during
> repeated `/status` calls in `lookup_datastore()` and in `Drop`,
> dominated by `pbs_config::datastore::config()` (config parse).
>
> The parsing cost itself should eventually be investigated in a future
> effort. Furthermore, cargo-flamegraph showed that when using a
> token-based auth method to access the API, a significant amount of time
> is spent in validation on every request [3].
>
> ## Approach
>
> [PATCH 1/6] Extend ConfigVersionCache for datastore generation
> Expose a dedicated datastore generation counter and an increment
> helper so callers can cheaply track datastore.cfg versions.
>
> [PATCH 2/6] Fast path for datastore lookups
> Cache the parsed datastore.cfg keyed by the shared datastore
> generation. lookup_datastore() reuses both the cached config and an
> existing DataStoreImpl when the generation matches, and falls back
> to the old slow path otherwise.
>
> [PATCH 3/6] Fast path for Drop
> Make DataStore::Drop use the cached config if possible instead of
> rereading datastore.cfg from disk.
>
> [PATCH 4/6] TTL to catch manual edits
> Add a small TTL around the cached config and bump the datastore
> generation whenever the config is reloaded. This catches manual
> edits to datastore.cfg without reintroducing hashing or
> config parsing on every request.
>
> [PATCH 5/6] Add reload flag to config cache helper
> Add a flag to the config cache helper to indicate whether a
> config reload is acceptable.
>
> [PATCH 6/6] Only bump generation on config digest change
> Avoid unnecessary generation bumps when the config is reloaded
> but the digest did not change.
>
> ## Benchmark results
>
> All the following benchmarks are based on top of
> https://lore.proxmox.com/pbs-devel/20251112131525.645971-1-f.gruenbichler@proxmox.com/T/#u
>
> ### End-to-end
>
> Testing `/status?verbose=0` end-to-end with 1000 stores, 5 req/store
> and parallel=16 before/after the series:
>
> Metric Before After
> ----------------------------------------
> Total time 12s 9s
> Throughput (all) 416.67 555.56
> Cold RPS (round #1) 83.33 111.11
> Warm RPS (#2..N) 333.33 444.44
>
> Running under flamegraph [2], TLS appears to consume a significant
> amount of CPU time and blur the results. Still, a ~33% higher overall
> throughput and ~25% less end-to-end time for this workload.
>
> ### Isolated benchmarks (hyperfine)
>
> In addition to the end-to-end tests, I measured two standalone benchmarks
> with hyperfine, each using a config with 1000
> datastores. `M` is the number of distinct datastores looked up and
> `N` is the number of lookups per datastore.
>
> Drop-direct variant:
>
> Drops the `DataStore` after every lookup, so the `Drop` path runs on
> every iteration:
>
> use anyhow::Error;
>
> use pbs_api_types::Operation;
> use pbs_datastore::DataStore;
>
> fn main() -> Result<(), Error> {
> let mut args = std::env::args();
> args.next();
>
> let datastores = if let Some(n) = args.next() {
> n.parse::<usize>()?
> } else {
> 1000
> };
>
> let iterations = if let Some(n) = args.next() {
> n.parse::<usize>()?
> } else {
> 1000
> };
>
> for d in 1..=datastores {
> let name = format!("ds{:04}", d);
>
> for _ in 1..=iterations {
> DataStore::lookup_datastore(&name, Some(Operation::Write))?;
> }
> }
>
> Ok(())
> }
>
> +------+-------+------------+------------+----------+
> | M | N | Baseline | Patched | Speedup |
> +------+-------+------------+------------+----------+
> | 1 | 1000 | 1.699 s | 37.3 ms | 45.5x |
> | 10 | 100 | 1.710 s | 35.8 ms | 47.7x |
> | 100 | 10 | 1.787 s | 36.6 ms | 48.9x |
> | 1000 | 1 | 1.899 s | 46.0 ms | 41.3x |
> +------+-------+------------+------------+----------+
>
>
> Bulk-drop variant:
>
> Keeps the `DataStore` instances alive for
> all `N` lookups of a given datastore and then drops them in bulk,
> mimicking a task that performs many lookups while it is running and
> only triggers the expensive `Drop` logic when the last user exits.
>
> use anyhow::Error;
>
> use pbs_api_types::Operation;
> use pbs_datastore::DataStore;
>
> fn main() -> Result<(), Error> {
> let mut args = std::env::args();
> args.next();
>
> let datastores = if let Some(n) = args.next() {
> n.parse::<usize>()?
> } else {
> 1000
> };
>
> let iterations = if let Some(n) = args.next() {
> n.parse::<usize>()?
> } else {
> 1000
> };
>
> for d in 1..=datastores {
> let name = format!("ds{:04}", d);
>
> let mut stores = Vec::with_capacity(iterations);
> for _ in 1..=iterations {
> stores.push(DataStore::lookup_datastore(&name, Some(Operation::Write))?);
> }
> }
>
> Ok(())
> }
>
> +------+-------+--------------+-------------+----------+
> | M | N | Baseline | Patched | Speedup |
> +------+-------+--------------+-------------+----------+
> | 1 | 1000 | 888.8 ms | 39.3 ms | 22.6x |
> | 10 | 100 | 890.8 ms | 35.3 ms | 25.3x |
> | 100 | 10 | 974.5 ms | 36.3 ms | 26.8x |
> | 1000 | 1 | 1.848 s | 39.9 ms | 46.3x |
> +------+-------+--------------+-------------+----------+
>
>
> Both variants show that the combination of the cached config lookups
> and the cheaper `Drop` handling reduces the hot-path cost from ~1.7 s
> per run to a few tens of milliseconds in these benchmarks.
>
> ## Reproduction steps
>
> VM: 4 vCPU, ~8 GiB RAM, VirtIO-SCSI; disks:
> - scsi0 32G (OS)
> - scsi1 1000G (datastores)
>
> Install PBS from ISO on the VM.
>
> Set up ZFS on /dev/sdb (adjust if different):
>
> zpool create -f -o ashift=12 pbsbench /dev/sdb
> zfs set mountpoint=/pbsbench pbsbench
> zfs create pbsbench/pbs-bench
>
> Raise file-descriptor limit:
>
> sudo systemctl edit proxmox-backup-proxy.service
>
> Add the following lines:
>
> [Service]
> LimitNOFILE=1048576
>
> Reload systemd and restart the proxy:
>
> sudo systemctl daemon-reload
> sudo systemctl restart proxmox-backup-proxy.service
>
> Verify the limit:
>
> systemctl show proxmox-backup-proxy.service | grep LimitNOFILE
>
> Create 1000 ZFS-backed datastores (as used in #6049 [1]):
>
> seq -w 001 1000 | xargs -n1 -P1 bash -c '
> id=$0
> name="ds${id}"
> dataset="pbsbench/pbs-bench/${name}"
> path="/pbsbench/pbs-bench/${name}"
> zfs create -o mountpoint="$path" "$dataset"
> proxmox-backup-manager datastore create "$name" "$path" \
> --comment "ZFS dataset-based datastore"
> '
>
> Build PBS from this series, then run the server manually under
> flamegraph:
>
> systemctl stop proxmox-backup-proxy
> cargo flamegraph --release --bin proxmox-backup-proxy
>
> ## Other resources:
>
> ### E2E benchmark script:
>
> #!/usr/bin/env bash
> set -euo pipefail
>
> # --- Config ---------------------------------------------------------------
> HOST='https://localhost:8007'
> USER='root@pam'
> PASS="$(cat passfile)"
>
> DATASTORE_PATH="/pbsbench/pbs-bench"
> MAX_STORES=1000 # how many stores to include
> PARALLEL=16 # concurrent workers
> REPEAT=5 # requests per store (1 cold + REPEAT-1 warm)
>
> PRINT_FIRST=false # true => log first request's HTTP code per store
>
> # --- Helpers --------------------------------------------------------------
> fmt_rps () {
> local n="$1" t="$2"
> awk -v n="$n" -v t="$t" 'BEGIN { if (t > 0) printf("%.2f\n", n/t); else print "0.00" }'
> }
>
> # --- Login ---------------------------------------------------------------
> auth=$(curl -ks -X POST "$HOST/api2/json/access/ticket" \
> -d "username=$USER" -d "password=$PASS")
> ticket=$(echo "$auth" | jq -r '.data.ticket')
>
> if [[ -z "${ticket:-}" || "$ticket" == "null" ]]; then
> echo "[ERROR] Login failed (no ticket)"
> exit 1
> fi
>
> # --- Collect stores (deterministic order) --------------------------------
> mapfile -t STORES < <(
> find "$DATASTORE_PATH" -mindepth 1 -maxdepth 1 -type d -printf '%f\n' \
> | sort | head -n "$MAX_STORES"
> )
>
> USED_STORES=${#STORES[@]}
> if (( USED_STORES == 0 )); then
> echo "[ERROR] No datastore dirs under $DATASTORE_PATH"
> exit 1
> fi
>
> echo "[INFO] Running with stores=$USED_STORES, repeat=$REPEAT, parallel=$PARALLEL"
>
> # --- Temp counters --------------------------------------------------------
> SUCCESS_ALL="$(mktemp)"
> FAIL_ALL="$(mktemp)"
> COLD_OK="$(mktemp)"
> WARM_OK="$(mktemp)"
> trap 'rm -f "$SUCCESS_ALL" "$FAIL_ALL" "$COLD_OK" "$WARM_OK"' EXIT
>
> export HOST ticket REPEAT SUCCESS_ALL FAIL_ALL COLD_OK WARM_OK PRINT_FIRST
>
> SECONDS=0
>
> # --- Fire requests --------------------------------------------------------
> printf "%s\n" "${STORES[@]}" \
> | xargs -P"$PARALLEL" -I{} bash -c '
> store="$1"
> url="$HOST/api2/json/admin/datastore/$store/status?verbose=0"
>
> for ((i=1;i<=REPEAT;i++)); do
> code=$(curl -ks -o /dev/null -w "%{http_code}" -b "PBSAuthCookie=$ticket" "$url" || echo 000)
>
> if [[ "$code" == "200" ]]; then
> echo 1 >> "$SUCCESS_ALL"
> if (( i == 1 )); then
> echo 1 >> "$COLD_OK"
> else
> echo 1 >> "$WARM_OK"
> fi
> if [[ "$PRINT_FIRST" == "true" && $i -eq 1 ]]; then
> ts=$(date +%H:%M:%S)
> echo "[$ts] $store #$i HTTP:200"
> fi
> else
> echo 1 >> "$FAIL_ALL"
> if [[ "$PRINT_FIRST" == "true" && $i -eq 1 ]]; then
> ts=$(date +%H:%M:%S)
> echo "[$ts] $store #$i HTTP:$code (FAIL)"
> fi
> fi
> done
> ' _ {}
>
> # --- Summary --------------------------------------------------------------
> elapsed=$SECONDS
> ok=$(wc -l < "$SUCCESS_ALL" 2>/dev/null || echo 0)
> fail=$(wc -l < "$FAIL_ALL" 2>/dev/null || echo 0)
> cold_ok=$(wc -l < "$COLD_OK" 2>/dev/null || echo 0)
> warm_ok=$(wc -l < "$WARM_OK" 2>/dev/null || echo 0)
>
> expected=$(( USED_STORES * REPEAT ))
> total=$(( ok + fail ))
>
> rps_all=$(fmt_rps "$ok" "$elapsed")
> rps_cold=$(fmt_rps "$cold_ok" "$elapsed")
> rps_warm=$(fmt_rps "$warm_ok" "$elapsed")
>
> echo "===== Summary ====="
> echo "Stores used: $USED_STORES"
> echo "Expected requests: $expected"
> echo "Executed requests: $total"
> echo "OK (HTTP 200): $ok"
> echo "Failed: $fail"
> printf "Total time: %dm %ds\n" $((elapsed/60)) $((elapsed%60))
> echo "Throughput all RPS: $rps_all"
> echo "Cold RPS (round #1): $rps_cold"
> echo "Warm RPS (#2..N): $rps_warm"
>
> ## Patch summary
>
> [PATCH 1/6] partial fix #6049: config: enable config version cache for datastore
> [PATCH 2/6] partial fix #6049: datastore: impl ConfigVersionCache fast path for lookups
> [PATCH 3/6] partial fix #6049: datastore: use config fast-path in Drop
> [PATCH 4/6] partial fix #6049: datastore: add TTL fallback to catch manual config edits
> [PATCH 5/6] Add a reload flag to the config cache helper.
> [PATCH 6/6] Only bump generation when the config digest changes.
>
> ## Changes from v2:
>
> Added:
> - [PATCH 5/6]: Add a reload flag to the config cache helper.
> - [PATCH 6/6]: Only bump generation when the config digest changes.
>
> ## Maintainer notes
>
> No dependency bumps, no API changes and no breaking changes.
>
> Thanks,
> Samuel
>
> [1] Bugzilla #6049: https://bugzilla.proxmox.com/show_bug.cgi?id=6049
> [2] cargo-flamegraph: https://github.com/flamegraph-rs/flamegraph
> [3] Bugzilla #7017: https://bugzilla.proxmox.com/show_bug.cgi?id=7017
>
> Samuel Rufinatscha (6):
> partial fix #6049: config: enable config version cache for datastore
> partial fix #6049: datastore: impl ConfigVersionCache fast path for
> lookups
> partial fix #6049: datastore: use config fast-path in Drop
> partial fix #6049: datastore: add TTL fallback to catch manual config
> edits
> partial fix #6049: datastore: add reload flag to config cache helper
> datastore: only bump generation when config digest changes
>
> pbs-config/src/config_version_cache.rs | 10 +-
> pbs-datastore/Cargo.toml | 1 +
> pbs-datastore/src/datastore.rs | 232 ++++++++++++++++++++-----
> 3 files changed, 197 insertions(+), 46 deletions(-)
>
^ permalink raw reply [relevance 13%]
* [pbs-devel] [PATCH proxmox-backup v4 1/4] partial fix #6049: config: enable config version cache for datastore
2025-11-24 15:33 12% [pbs-devel] [PATCH proxmox-backup v4 0/4] " Samuel Rufinatscha
@ 2025-11-24 15:33 16% ` Samuel Rufinatscha
2025-11-24 15:33 11% ` [pbs-devel] [PATCH proxmox-backup v4 2/4] partial fix #6049: datastore: impl ConfigVersionCache fast path for lookups Samuel Rufinatscha
` (3 subsequent siblings)
4 siblings, 0 replies; 200+ results
From: Samuel Rufinatscha @ 2025-11-24 15:33 UTC (permalink / raw)
To: pbs-devel
Repeated /status requests caused lookup_datastore() to re-read and
parse datastore.cfg on every call. The issue was mentioned in report
#6049 [1]. cargo-flamegraph [2] confirmed that the hot path is
dominated by pbs_config::datastore::config() (config parsing).
To solve the issue, this patch prepares the config version cache,
so that datastore config caching can be built on top of it.
This patch specifically:
(1) implements increment function in order to invalidate generations
(2) removes obsolete comments
Links
[1] Bugzilla: https://bugzilla.proxmox.com/show_bug.cgi?id=6049
[2] cargo-flamegraph: https://github.com/flamegraph-rs/flamegraph
Fixes: #6049
Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
---
Changes:
From v1 → v2 (original introduction), thanks @Fabian
- Split the ConfigVersionCache changes out of the large datastore patch
into their own config-only patch.
* removed the obsolete // FIXME comment on datastore_generation.
* added ConfigVersionCache::datastore_generation() as getter.
From v2 → v3
No changes
From v3 → v4
No changes
pbs-config/src/config_version_cache.rs | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)
diff --git a/pbs-config/src/config_version_cache.rs b/pbs-config/src/config_version_cache.rs
index e8fb994f..b875f7e0 100644
--- a/pbs-config/src/config_version_cache.rs
+++ b/pbs-config/src/config_version_cache.rs
@@ -26,7 +26,6 @@ struct ConfigVersionCacheDataInner {
// Traffic control (traffic-control.cfg) generation/version.
traffic_control_generation: AtomicUsize,
// datastore (datastore.cfg) generation/version
- // FIXME: remove with PBS 3.0
datastore_generation: AtomicUsize,
// Add further atomics here
}
@@ -145,8 +144,15 @@ impl ConfigVersionCache {
.fetch_add(1, Ordering::AcqRel);
}
+ /// Returns the datastore generation number.
+ pub fn datastore_generation(&self) -> usize {
+ self.shmem
+ .data()
+ .datastore_generation
+ .load(Ordering::Acquire)
+ }
+
/// Increase the datastore generation number.
- // FIXME: remove with PBS 3.0 or make actually useful again in datastore lookup
pub fn increase_datastore_generation(&self) -> usize {
self.shmem
.data()
--
2.47.3
^ permalink raw reply [relevance 16%]
* [pbs-devel] [PATCH proxmox-backup v4 3/4] partial fix #6049: datastore: use config fast-path in Drop
2025-11-24 15:33 12% [pbs-devel] [PATCH proxmox-backup v4 0/4] " Samuel Rufinatscha
2025-11-24 15:33 16% ` [pbs-devel] [PATCH proxmox-backup v4 1/4] partial fix #6049: config: enable config version cache for datastore Samuel Rufinatscha
2025-11-24 15:33 11% ` [pbs-devel] [PATCH proxmox-backup v4 2/4] partial fix #6049: datastore: impl ConfigVersionCache fast path for lookups Samuel Rufinatscha
@ 2025-11-24 15:33 14% ` Samuel Rufinatscha
2025-11-24 15:33 13% ` [pbs-devel] [PATCH proxmox-backup v4 4/4] partial fix #6049: datastore: add TTL fallback to catch manual config edits Samuel Rufinatscha
2025-11-24 17:06 13% ` [pbs-devel] superseded: [PATCH proxmox-backup v4 0/4] datastore: remove config reload on hot path Samuel Rufinatscha
4 siblings, 0 replies; 200+ results
From: Samuel Rufinatscha @ 2025-11-24 15:33 UTC (permalink / raw)
To: pbs-devel
The Drop impl of DataStore re-read datastore.cfg to decide whether
the entry should be evicted from the in-process cache (based on
maintenance mode’s clear_from_cache). During the investigation of
issue #6049 [1], a flamegraph [2] showed that the config reload in Drop
accounted for a measurable share of CPU time under load.
This patch wires the datastore config fast path to the Drop
impl to eventually avoid an expensive config reload from disk to capture
the maintenance mandate. Also, to ensure the Drop handlers will detect
that a newer config exists / to mitigate usage of an eventually stale
cached entry, generation will not only be bumped on config save, but also
on re-read of the config file (slow path), if `update_cache = true`.
Links
[1] Bugzilla: https://bugzilla.proxmox.com/show_bug.cgi?id=6049
[2] cargo-flamegraph: https://github.com/flamegraph-rs/flamegraph
Fixes: #6049
Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
---
Changes:
From v1 → v2
- Replace caching logic with the datastore_section_config_cached()
helper.
From v2 → v3
No changes
From v3 → v4, thanks @Fabian
- Pass datastore_section_config_cached(false) in Drop to avoid
concurrent cache updates.
pbs-datastore/src/datastore.rs | 60 ++++++++++++++++++++++++++--------
1 file changed, 47 insertions(+), 13 deletions(-)
diff --git a/pbs-datastore/src/datastore.rs b/pbs-datastore/src/datastore.rs
index 11e16eaf..942656e6 100644
--- a/pbs-datastore/src/datastore.rs
+++ b/pbs-datastore/src/datastore.rs
@@ -216,15 +216,40 @@ impl Drop for DataStore {
// remove datastore from cache iff
// - last task finished, and
// - datastore is in a maintenance mode that mandates it
- let remove_from_cache = last_task
- && pbs_config::datastore::config()
- .and_then(|(s, _)| s.lookup::<DataStoreConfig>("datastore", self.name()))
- .is_ok_and(|c| {
- c.get_maintenance_mode()
- .is_some_and(|m| m.clear_from_cache())
- });
-
- if remove_from_cache {
+
+ // first check: check if last task finished
+ if !last_task {
+ return;
+ }
+
+ let (section_config, _gen) = match datastore_section_config_cached(false) {
+ Ok(v) => v,
+ Err(err) => {
+ log::error!(
+ "failed to load datastore config in Drop for {} - {err}",
+ self.name()
+ );
+ return;
+ }
+ };
+
+ let datastore_cfg: DataStoreConfig =
+ match section_config.lookup("datastore", self.name()) {
+ Ok(cfg) => cfg,
+ Err(err) => {
+ log::error!(
+ "failed to look up datastore '{}' in Drop - {err}",
+ self.name()
+ );
+ return;
+ }
+ };
+
+ // second check: check maintenance mode mandate
+ if datastore_cfg
+ .get_maintenance_mode()
+ .is_some_and(|m| m.clear_from_cache())
+ {
DATASTORE_MAP.lock().unwrap().remove(self.name());
}
}
@@ -277,12 +302,12 @@ impl DatastoreBackend {
/// - If the cached generation matches the current generation, the
/// cached config is returned.
/// - Otherwise the config is re-read from disk. If `update_cache` is
-/// `true`, the new config and current generation are stored in the
+/// `true`, the new config and bumped generation are stored in the
/// cache. Callers that set `update_cache = true` must hold the
/// datastore config lock to avoid racing with concurrent config
/// changes.
/// - If `update_cache` is `false`, the freshly read config is returned
-/// but the cache is left unchanged.
+/// but the cache and generation are left unchanged.
///
/// If `ConfigVersionCache` is not available, the config is always read
/// from disk and `None` is returned as the generation.
@@ -303,14 +328,23 @@ fn datastore_section_config_cached(
let (config_raw, _digest) = pbs_config::datastore::config()?;
let config = Arc::new(config_raw);
+ let mut effective_gen = current_gen;
if update_cache {
+ // Bump the generation. This ensures that Drop
+ // handlers will detect that a newer config exists
+ // and will not rely on a stale cached entry for
+ // maintenance mandate.
+ let prev_gen = version_cache.increase_datastore_generation();
+ effective_gen = prev_gen + 1;
+
+ // Persist
*config_cache = Some(DatastoreConfigCache {
config: config.clone(),
- last_generation: current_gen,
+ last_generation: effective_gen,
});
}
- Ok((config, Some(current_gen)))
+ Ok((config, Some(effective_gen)))
} else {
// Fallback path, no config version cache: read datastore.cfg and return None as generation
*config_cache = None;
--
2.47.3
^ permalink raw reply [relevance 14%]
* [pbs-devel] [PATCH proxmox-backup v4 2/4] partial fix #6049: datastore: impl ConfigVersionCache fast path for lookups
2025-11-24 15:33 12% [pbs-devel] [PATCH proxmox-backup v4 0/4] " Samuel Rufinatscha
2025-11-24 15:33 16% ` [pbs-devel] [PATCH proxmox-backup v4 1/4] partial fix #6049: config: enable config version cache for datastore Samuel Rufinatscha
@ 2025-11-24 15:33 11% ` Samuel Rufinatscha
2025-11-24 15:33 14% ` [pbs-devel] [PATCH proxmox-backup v4 3/4] partial fix #6049: datastore: use config fast-path in Drop Samuel Rufinatscha
` (2 subsequent siblings)
4 siblings, 0 replies; 200+ results
From: Samuel Rufinatscha @ 2025-11-24 15:33 UTC (permalink / raw)
To: pbs-devel
Repeated /status requests caused lookup_datastore() to re-read and
parse datastore.cfg on every call. The issue was mentioned in report
#6049 [1]. cargo-flamegraph [2] confirmed that the hot path is
dominated by pbs_config::datastore::config() (config parsing).
This patch implements caching of the global datastore.cfg using the
generation numbers from the shared config version cache. It caches the
datastore.cfg along with the generation number and, when a subsequent
lookup sees the same generation, it reuses the cached config without
re-reading it from disk. If the generation differs
(or the cache is unavailable), the config is re-read from disk.
If `update_cache = true`, the new config and current generation are
persisted in the cache. In this case, callers must hold the datastore
config lock to avoid racing with concurrent config changes.
If `update_cache` is `false` and generation did not match, the freshly
read config is returned but the cache is left unchanged. If
`ConfigVersionCache` is not available, the config is always read from
disk and `None` is returned as generation.
Behavioral notes
- The generation is bumped via the existing save_config() path, so
API-driven config changes are detected immediately.
- Manual edits to datastore.cfg are not detected; this is covered in a
dedicated patch in this series.
- DataStore::drop still performs a config read on the common path;
also covered in a dedicated patch in this series.
Links
[1] Bugzilla: https://bugzilla.proxmox.com/show_bug.cgi?id=6049
[2] cargo-flamegraph: https://github.com/flamegraph-rs/flamegraph
Fixes: #6049
Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
---
Changes:
From v1 → v2, thanks @Fabian
- Moved the ConfigVersionCache changes into its own patch.
- Introduced the global static DATASTORE_CONFIG_CACHE to store the
fully parsed datastore.cfg instead, along with its generation number.
Introduced DatastoreConfigCache struct to hold both.
- Removed and replaced the CachedDatastoreConfigTag field of
DataStoreImpl with a generation number field only (Option<usize>)
to validate DataStoreImpl reuse.
- Added DataStore::datastore_section_config_cached() helper function
to encapsulate the caching logic and simplify reuse.
- Modified DataStore::lookup_datastore() to use the new helper.
From v2 → v3
No changes
From v3 → v4, thanks @Fabian
- Restructured the version cache checks in
datastore_section_config_cached(), to simplify the logic.
- Added update_cache parameter to datastore_section_config_cached() to
control cache updates.
pbs-datastore/Cargo.toml | 1 +
pbs-datastore/src/datastore.rs | 138 +++++++++++++++++++++++++--------
2 files changed, 105 insertions(+), 34 deletions(-)
diff --git a/pbs-datastore/Cargo.toml b/pbs-datastore/Cargo.toml
index 8ce930a9..42f49a7b 100644
--- a/pbs-datastore/Cargo.toml
+++ b/pbs-datastore/Cargo.toml
@@ -40,6 +40,7 @@ proxmox-io.workspace = true
proxmox-lang.workspace=true
proxmox-s3-client = { workspace = true, features = [ "impl" ] }
proxmox-schema = { workspace = true, features = [ "api-macro" ] }
+proxmox-section-config.workspace = true
proxmox-serde = { workspace = true, features = [ "serde_json" ] }
proxmox-sys.workspace = true
proxmox-systemd.workspace = true
diff --git a/pbs-datastore/src/datastore.rs b/pbs-datastore/src/datastore.rs
index 0a517923..11e16eaf 100644
--- a/pbs-datastore/src/datastore.rs
+++ b/pbs-datastore/src/datastore.rs
@@ -32,7 +32,8 @@ use pbs_api_types::{
MaintenanceType, Operation, UPID,
};
use pbs_config::s3::S3_CFG_TYPE_ID;
-use pbs_config::BackupLockGuard;
+use pbs_config::{BackupLockGuard, ConfigVersionCache};
+use proxmox_section_config::SectionConfigData;
use crate::backup_info::{
BackupDir, BackupGroup, BackupInfo, OLD_LOCKING, PROTECTED_MARKER_FILENAME,
@@ -46,6 +47,17 @@ use crate::s3::S3_CONTENT_PREFIX;
use crate::task_tracking::{self, update_active_operations};
use crate::{DataBlob, LocalDatastoreLruCache};
+// Cache for fully parsed datastore.cfg
+struct DatastoreConfigCache {
+ // Parsed datastore.cfg file
+ config: Arc<SectionConfigData>,
+ // Generation number from ConfigVersionCache
+ last_generation: usize,
+}
+
+static DATASTORE_CONFIG_CACHE: LazyLock<Mutex<Option<DatastoreConfigCache>>> =
+ LazyLock::new(|| Mutex::new(None));
+
static DATASTORE_MAP: LazyLock<Mutex<HashMap<String, Arc<DataStoreImpl>>>> =
LazyLock::new(|| Mutex::new(HashMap::new()));
@@ -142,10 +154,12 @@ pub struct DataStoreImpl {
last_gc_status: Mutex<GarbageCollectionStatus>,
verify_new: bool,
chunk_order: ChunkOrder,
- last_digest: Option<[u8; 32]>,
sync_level: DatastoreFSyncLevel,
backend_config: DatastoreBackendConfig,
lru_store_caching: Option<LocalDatastoreLruCache>,
+ /// Datastore generation number from `ConfigVersionCache` at creation time, used to
+ /// validate reuse of this cached `DataStoreImpl`.
+ config_generation: Option<usize>,
}
impl DataStoreImpl {
@@ -158,10 +172,10 @@ impl DataStoreImpl {
last_gc_status: Mutex::new(GarbageCollectionStatus::default()),
verify_new: false,
chunk_order: Default::default(),
- last_digest: None,
sync_level: Default::default(),
backend_config: Default::default(),
lru_store_caching: None,
+ config_generation: None,
})
}
}
@@ -256,6 +270,55 @@ impl DatastoreBackend {
}
}
+/// Returns the parsed datastore config (`datastore.cfg`) and its
+/// generation.
+///
+/// Uses `ConfigVersionCache` to detect stale entries:
+/// - If the cached generation matches the current generation, the
+/// cached config is returned.
+/// - Otherwise the config is re-read from disk. If `update_cache` is
+/// `true`, the new config and current generation are stored in the
+/// cache. Callers that set `update_cache = true` must hold the
+/// datastore config lock to avoid racing with concurrent config
+/// changes.
+/// - If `update_cache` is `false`, the freshly read config is returned
+/// but the cache is left unchanged.
+///
+/// If `ConfigVersionCache` is not available, the config is always read
+/// from disk and `None` is returned as the generation.
+fn datastore_section_config_cached(
+ update_cache: bool,
+) -> Result<(Arc<SectionConfigData>, Option<usize>), Error> {
+ let mut config_cache = DATASTORE_CONFIG_CACHE.lock().unwrap();
+
+ if let Ok(version_cache) = ConfigVersionCache::new() {
+ let current_gen = version_cache.datastore_generation();
+ if let Some(cached) = config_cache.as_ref() {
+ // Fast path: re-use cached datastore.cfg
+ if cached.last_generation == current_gen {
+ return Ok((cached.config.clone(), Some(cached.last_generation)));
+ }
+ }
+ // Slow path: re-read datastore.cfg
+ let (config_raw, _digest) = pbs_config::datastore::config()?;
+ let config = Arc::new(config_raw);
+
+ if update_cache {
+ *config_cache = Some(DatastoreConfigCache {
+ config: config.clone(),
+ last_generation: current_gen,
+ });
+ }
+
+ Ok((config, Some(current_gen)))
+ } else {
+ // Fallback path, no config version cache: read datastore.cfg and return None as generation
+ *config_cache = None;
+ let (config_raw, _digest) = pbs_config::datastore::config()?;
+ Ok((Arc::new(config_raw), None))
+ }
+}
+
impl DataStore {
// This one just panics on everything
#[doc(hidden)]
@@ -327,56 +390,63 @@ impl DataStore {
name: &str,
operation: Option<Operation>,
) -> Result<Arc<DataStore>, Error> {
- // Avoid TOCTOU between checking maintenance mode and updating active operation counter, as
- // we use it to decide whether it is okay to delete the datastore.
+ // Avoid TOCTOU between checking maintenance mode and updating active operations.
let _config_lock = pbs_config::datastore::lock_config()?;
- // we could use the ConfigVersionCache's generation for staleness detection, but we load
- // the config anyway -> just use digest, additional benefit: manual changes get detected
- let (config, digest) = pbs_config::datastore::config()?;
- let config: DataStoreConfig = config.lookup("datastore", name)?;
+ // Get the current datastore.cfg generation number and cached config
+ let (section_config, gen_num) = datastore_section_config_cached(true)?;
+
+ let datastore_cfg: DataStoreConfig = section_config.lookup("datastore", name)?;
+ let maintenance_mode = datastore_cfg.get_maintenance_mode();
+ let mount_status = get_datastore_mount_status(&datastore_cfg);
- if let Some(maintenance_mode) = config.get_maintenance_mode() {
- if let Err(error) = maintenance_mode.check(operation) {
+ if let Some(mm) = &maintenance_mode {
+ if let Err(error) = mm.check(operation.clone()) {
bail!("datastore '{name}' is unavailable: {error}");
}
}
- if get_datastore_mount_status(&config) == Some(false) {
- let mut datastore_cache = DATASTORE_MAP.lock().unwrap();
- datastore_cache.remove(&config.name);
- bail!("datastore '{}' is not mounted", config.name);
+ let mut datastore_cache = DATASTORE_MAP.lock().unwrap();
+
+ if mount_status == Some(false) {
+ datastore_cache.remove(&datastore_cfg.name);
+ bail!("datastore '{}' is not mounted", datastore_cfg.name);
}
- let mut datastore_cache = DATASTORE_MAP.lock().unwrap();
- let entry = datastore_cache.get(name);
-
- // reuse chunk store so that we keep using the same process locker instance!
- let chunk_store = if let Some(datastore) = &entry {
- let last_digest = datastore.last_digest.as_ref();
- if let Some(true) = last_digest.map(|last_digest| last_digest == &digest) {
- if let Some(operation) = operation {
- update_active_operations(name, operation, 1)?;
+ // Re-use DataStoreImpl
+ if let Some(existing) = datastore_cache.get(name).cloned() {
+ if let (Some(last_generation), Some(gen_num)) = (existing.config_generation, gen_num) {
+ if last_generation == gen_num {
+ if let Some(op) = operation {
+ update_active_operations(name, op, 1)?;
+ }
+
+ return Ok(Arc::new(Self {
+ inner: existing,
+ operation,
+ }));
}
- return Ok(Arc::new(Self {
- inner: Arc::clone(datastore),
- operation,
- }));
}
- Arc::clone(&datastore.chunk_store)
+ }
+
+ // (Re)build DataStoreImpl
+
+ // Reuse chunk store so that we keep using the same process locker instance!
+ let chunk_store = if let Some(existing) = datastore_cache.get(name) {
+ Arc::clone(&existing.chunk_store)
} else {
let tuning: DatastoreTuning = serde_json::from_value(
DatastoreTuning::API_SCHEMA
- .parse_property_string(config.tuning.as_deref().unwrap_or(""))?,
+ .parse_property_string(datastore_cfg.tuning.as_deref().unwrap_or(""))?,
)?;
Arc::new(ChunkStore::open(
name,
- config.absolute_path(),
+ datastore_cfg.absolute_path(),
tuning.sync_level.unwrap_or_default(),
)?)
};
- let datastore = DataStore::with_store_and_config(chunk_store, config, Some(digest))?;
+ let datastore = DataStore::with_store_and_config(chunk_store, datastore_cfg, gen_num)?;
let datastore = Arc::new(datastore);
datastore_cache.insert(name.to_string(), datastore.clone());
@@ -478,7 +548,7 @@ impl DataStore {
fn with_store_and_config(
chunk_store: Arc<ChunkStore>,
config: DataStoreConfig,
- last_digest: Option<[u8; 32]>,
+ generation: Option<usize>,
) -> Result<DataStoreImpl, Error> {
let mut gc_status_path = chunk_store.base_path();
gc_status_path.push(".gc-status");
@@ -538,10 +608,10 @@ impl DataStore {
last_gc_status: Mutex::new(gc_status),
verify_new: config.verify_new.unwrap_or(false),
chunk_order: tuning.chunk_order.unwrap_or_default(),
- last_digest,
sync_level: tuning.sync_level.unwrap_or_default(),
backend_config,
lru_store_caching,
+ config_generation: generation,
})
}
--
2.47.3
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
^ permalink raw reply [relevance 11%]
* [pbs-devel] [PATCH proxmox-backup v4 4/4] partial fix #6049: datastore: add TTL fallback to catch manual config edits
2025-11-24 15:33 12% [pbs-devel] [PATCH proxmox-backup v4 0/4] " Samuel Rufinatscha
` (2 preceding siblings ...)
2025-11-24 15:33 14% ` [pbs-devel] [PATCH proxmox-backup v4 3/4] partial fix #6049: datastore: use config fast-path in Drop Samuel Rufinatscha
@ 2025-11-24 15:33 13% ` Samuel Rufinatscha
2025-11-24 17:06 13% ` [pbs-devel] superseded: [PATCH proxmox-backup v4 0/4] datastore: remove config reload on hot path Samuel Rufinatscha
4 siblings, 0 replies; 200+ results
From: Samuel Rufinatscha @ 2025-11-24 15:33 UTC (permalink / raw)
To: pbs-devel
The lookup fast path reacts to API-driven config changes because
save_config() bumps the generation. Manual edits of datastore.cfg do
not bump the counter. To keep the system robust against such edits
without reintroducing config reading and hashing on the hot path, this
patch adds a TTL to the cache entry.
If the cached config is older than
DATASTORE_CONFIG_CACHE_TTL_SECS (set to 60s), the next lookup takes
the slow path and refreshes the entry. As an optimization, a check to
catch manual edits was added (if the digest changed but generation
stayed the same), so that the generation is only bumped when needed.
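The decision logic described above can be sketched as a small, self-contained
model; `CacheEntry`, `should_reuse`, and `needs_generation_bump` are
illustrative names for this sketch, not the actual pbs-datastore types:

```rust
// Simplified model of the cache decision described in the commit message.
// All names here are illustrative; the real code lives in
// pbs-datastore/src/datastore.rs.

const TTL_SECS: i64 = 60; // mirrors DATASTORE_CONFIG_CACHE_TTL_SECS

struct CacheEntry {
    generation: usize,
    digest: [u8; 32],
    last_update: i64, // epoch seconds
}

/// Fast path: reuse the cached config only if the generation matches
/// and the entry has not outlived the TTL.
fn should_reuse(entry: &CacheEntry, current_gen: usize, now: i64) -> bool {
    entry.generation == current_gen && now - entry.last_update < TTL_SECS
}

/// Slow path: bump the shared generation only if the digest changed while
/// the generation stayed the same, i.e. a manual edit of datastore.cfg
/// that bypassed save_config().
fn needs_generation_bump(entry: &CacheEntry, current_gen: usize, new_digest: &[u8; 32]) -> bool {
    entry.generation == current_gen && entry.digest != *new_digest
}

fn main() {
    let entry = CacheEntry { generation: 7, digest: [0u8; 32], last_update: 1000 };
    assert!(should_reuse(&entry, 7, 1030)); // young entry, same generation
    assert!(!should_reuse(&entry, 7, 1100)); // TTL expired -> slow path
    assert!(!needs_generation_bump(&entry, 7, &[0u8; 32])); // unchanged digest
    assert!(needs_generation_bump(&entry, 7, &[1u8; 32])); // manual edit
}
```

An API write through save_config() bumps the generation itself, so it already
fails the reuse check and never needs the digest comparison.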
Links
[1] cargo-flamegraph: https://github.com/flamegraph-rs/flamegraph
Fixes: #6049
Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
---
Changes:
From v1 → v2
- Store last_update timestamp in DatastoreConfigCache type.
From v2 → v3
No changes
From v3 → v4
- Fix digest generation bump logic in update_cache, thanks @Fabian.
pbs-datastore/src/datastore.rs | 55 ++++++++++++++++++++++++----------
1 file changed, 39 insertions(+), 16 deletions(-)
diff --git a/pbs-datastore/src/datastore.rs b/pbs-datastore/src/datastore.rs
index 942656e6..a5c450d0 100644
--- a/pbs-datastore/src/datastore.rs
+++ b/pbs-datastore/src/datastore.rs
@@ -22,7 +22,7 @@ use proxmox_sys::error::SysError;
use proxmox_sys::fs::{file_read_optional_string, replace_file, CreateOptions};
use proxmox_sys::linux::procfs::MountInfo;
use proxmox_sys::process_locker::{ProcessLockExclusiveGuard, ProcessLockSharedGuard};
-use proxmox_time::TimeSpan;
+use proxmox_time::{epoch_i64, TimeSpan};
use proxmox_worker_task::WorkerTaskContext;
use pbs_api_types::{
@@ -51,8 +51,12 @@ use crate::{DataBlob, LocalDatastoreLruCache};
struct DatastoreConfigCache {
// Parsed datastore.cfg file
config: Arc<SectionConfigData>,
+ // Digest of the datastore.cfg file
+ digest: [u8; 32],
// Generation number from ConfigVersionCache
last_generation: usize,
+ // Last update time (epoch seconds)
+ last_update: i64,
}
static DATASTORE_CONFIG_CACHE: LazyLock<Mutex<Option<DatastoreConfigCache>>> =
@@ -61,6 +65,8 @@ static DATASTORE_CONFIG_CACHE: LazyLock<Mutex<Option<DatastoreConfigCache>>> =
static DATASTORE_MAP: LazyLock<Mutex<HashMap<String, Arc<DataStoreImpl>>>> =
LazyLock::new(|| Mutex::new(HashMap::new()));
+/// Max age in seconds to reuse the cached datastore config.
+const DATASTORE_CONFIG_CACHE_TTL_SECS: i64 = 60;
/// Filename to store backup group notes
pub const GROUP_NOTES_FILE_NAME: &str = "notes";
/// Filename to store backup group owner
@@ -299,13 +305,14 @@ impl DatastoreBackend {
/// generation.
///
/// Uses `ConfigVersionCache` to detect stale entries:
-/// - If the cached generation matches the current generation, the
-/// cached config is returned.
+/// - If the cached generation matches the current generation and TTL is
+/// OK, the cached config is returned.
/// - Otherwise the config is re-read from disk. If `update_cache` is
-/// `true`, the new config and bumped generation are stored in the
-/// cache. Callers that set `update_cache = true` must hold the
-/// datastore config lock to avoid racing with concurrent config
-/// changes.
+/// `true` and a previous cached entry exists with the same generation
+/// but a different digest, this indicates the config has changed
+/// (e.g. manual edit) and the generation must be bumped. Callers
+/// that set `update_cache = true` must hold the datastore config lock
+/// to avoid racing with concurrent config changes.
/// - If `update_cache` is `false`, the freshly read config is returned
/// but the cache and generation are left unchanged.
///
@@ -317,30 +324,46 @@ fn datastore_section_config_cached(
let mut config_cache = DATASTORE_CONFIG_CACHE.lock().unwrap();
if let Ok(version_cache) = ConfigVersionCache::new() {
+ let now = epoch_i64();
let current_gen = version_cache.datastore_generation();
if let Some(cached) = config_cache.as_ref() {
- // Fast path: re-use cached datastore.cfg
- if cached.last_generation == current_gen {
+ // Fast path: re-use cached datastore.cfg if generation matches and TTL not expired
+ if cached.last_generation == current_gen
+ && now - cached.last_update < DATASTORE_CONFIG_CACHE_TTL_SECS
+ {
return Ok((cached.config.clone(), Some(cached.last_generation)));
}
}
// Slow path: re-read datastore.cfg
- let (config_raw, _digest) = pbs_config::datastore::config()?;
+ let (config_raw, digest) = pbs_config::datastore::config()?;
let config = Arc::new(config_raw);
let mut effective_gen = current_gen;
if update_cache {
- // Bump the generation. This ensures that Drop
- // handlers will detect that a newer config exists
- // and will not rely on a stale cached entry for
- // maintenance mandate.
- let prev_gen = version_cache.increase_datastore_generation();
- effective_gen = prev_gen + 1;
+ // Bump the generation if the config has been changed manually.
+ // This ensures that Drop handlers will detect that a newer config exists
+ // and will not rely on a stale cached entry for maintenance mandate.
+ let (prev_gen, prev_digest) = config_cache
+ .as_ref()
+ .map(|c| (Some(c.last_generation), Some(c.digest)))
+ .unwrap_or((None, None));
+
+ let manual_edit = match (prev_gen, prev_digest) {
+ (Some(prev_g), Some(prev_d)) => prev_g == current_gen && prev_d != digest,
+ _ => false,
+ };
+
+ if manual_edit {
+ let prev_gen = version_cache.increase_datastore_generation();
+ effective_gen = prev_gen + 1;
+ }
// Persist
*config_cache = Some(DatastoreConfigCache {
config: config.clone(),
+ digest,
last_generation: effective_gen,
+ last_update: now,
});
}
--
2.47.3
^ permalink raw reply [relevance 13%]
* [pbs-devel] [PATCH proxmox-backup v4 0/4] datastore: remove config reload on hot path
@ 2025-11-24 15:33 12% Samuel Rufinatscha
2025-11-24 15:33 16% ` [pbs-devel] [PATCH proxmox-backup v4 1/4] partial fix #6049: config: enable config version cache for datastore Samuel Rufinatscha
` (4 more replies)
0 siblings, 5 replies; 200+ results
From: Samuel Rufinatscha @ 2025-11-24 15:33 UTC (permalink / raw)
To: pbs-devel
Hi,
this series reduces CPU time in datastore lookups by avoiding repeated
datastore.cfg reads/parses in both `lookup_datastore()` and
`DataStore::Drop`. It also adds a TTL so manual config edits are
noticed without reintroducing hashing on every request.
While investigating #6049 [1], cargo-flamegraph [2] showed hotspots
during repeated `/status` calls in `lookup_datastore()` and in `Drop`,
dominated by `pbs_config::datastore::config()` (config parse).
The parsing cost itself should eventually be investigated in a future
effort. Furthermore, cargo-flamegraph showed that when using a
token-based auth method to access the API, a significant amount of time
is spent in validation on every request [3].
## Approach
[PATCH 1/4] Support datastore generation in ConfigVersionCache
[PATCH 2/4] Fast path for datastore lookups
Cache the parsed datastore.cfg keyed by the shared datastore
generation. lookup_datastore() reuses both the cached config and an
existing DataStoreImpl when the generation matches, and falls back
to the old slow path otherwise. The caching logic is implemented
using the datastore_section_config_cached(update_cache: bool) helper.
[PATCH 3/4] Fast path for Drop
Make DataStore::Drop use the datastore_section_config_cached()
helper to avoid re-reading/parsing datastore.cfg on every Drop.
Bump the generation not only on API config saves, but also on
slow-path lookups (when update_cache is true), so that Drop handlers
see newer configs.
[PATCH 4/4] TTL to catch manual edits
Add a TTL to the cached config and bump the datastore generation iff
the digest changed but generation stays the same. This catches manual
edits to datastore.cfg without reintroducing hashing or config
parsing on every request.
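The generation-counter idea behind patches 1-3 can be modeled in isolation;
everything below is an illustrative stand-in (the real implementation uses
`ConfigVersionCache` backed by shared memory and `SectionConfigData` for the
parsed config):

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Mutex};

// Stand-in for the shared-memory ConfigVersionCache generation counter.
static GENERATION: AtomicUsize = AtomicUsize::new(0);

// Stand-in for the parsed datastore.cfg; a String instead of SectionConfigData.
struct CachedConfig {
    generation: usize,
    config: Arc<String>,
}

static CACHE: Mutex<Option<CachedConfig>> = Mutex::new(None);

// Pretend "read and parse from disk"; counts how often the expensive path runs.
fn expensive_parse(counter: &AtomicUsize) -> Arc<String> {
    counter.fetch_add(1, Ordering::SeqCst);
    Arc::new("parsed config".to_string())
}

fn lookup(parse_counter: &AtomicUsize) -> Arc<String> {
    let current_gen = GENERATION.load(Ordering::SeqCst);
    let mut guard = CACHE.lock().unwrap();
    if let Some(cached) = guard.as_ref() {
        if cached.generation == current_gen {
            return cached.config.clone(); // fast path: no I/O, no hashing
        }
    }
    let config = expensive_parse(parse_counter);
    *guard = Some(CachedConfig { generation: current_gen, config: config.clone() });
    config
}

// An API-driven config write bumps the generation, invalidating the cache.
fn save_config() {
    GENERATION.fetch_add(1, Ordering::SeqCst);
}

fn main() {
    let parses = AtomicUsize::new(0);
    lookup(&parses);
    lookup(&parses);
    lookup(&parses);
    assert_eq!(parses.load(Ordering::SeqCst), 1); // only the first lookup parsed
    save_config();
    lookup(&parses);
    assert_eq!(parses.load(Ordering::SeqCst), 2); // bump forced a re-parse
}
```

The point is that the hot path costs one atomic load plus a mutex lock; file
I/O and parsing only happen after the counter has moved.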
## Benchmark results
### End-to-end
Testing `/status?verbose=0` end-to-end with 1000 stores, 5 req/store
and parallel=16 before/after the series:
Metric                    Before     After
-------------------------------------------
Total time                12 s       9 s
Throughput (all, req/s)   416.67     555.56
Cold RPS (round #1)       83.33      111.11
Warm RPS (#2..N)          333.33     444.44
Running under flamegraph [2], TLS appears to consume a significant
amount of CPU time and blurs the results. Still, the series yields
~33% higher overall throughput and ~25% less end-to-end time for this
workload.
### Isolated benchmarks (hyperfine)
In addition to the end-to-end tests, I measured two standalone
benchmarks with hyperfine, each using a config with 1000 datastores.
`M` is the number of distinct datastores looked up and
`N` is the number of lookups per datastore.
Drop-direct variant:
Drops the `DataStore` after every lookup, so the `Drop` path runs on
every iteration:
use anyhow::Error;
use pbs_api_types::Operation;
use pbs_datastore::DataStore;
fn main() -> Result<(), Error> {
    let mut args = std::env::args();
    args.next();
    let datastores = if let Some(n) = args.next() {
        n.parse::<usize>()?
    } else {
        1000
    };
    let iterations = if let Some(n) = args.next() {
        n.parse::<usize>()?
    } else {
        1000
    };
    for d in 1..=datastores {
        let name = format!("ds{:04}", d);
        for _ in 1..=iterations {
            DataStore::lookup_datastore(&name, Some(Operation::Write))?;
        }
    }
    Ok(())
}
+----+------+-----------+-----------+---------+
| M | N | Baseline | Patched | Speedup |
+----+------+-----------+-----------+---------+
| 1 | 1000 | 1.684 s | 35.3 ms | 47.7x |
| 10 | 100 | 1.689 s | 35.0 ms | 48.3x |
| 100| 10 | 1.709 s | 35.8 ms | 47.7x |
|1000| 1 | 1.809 s | 39.0 ms | 46.4x |
+----+------+-----------+-----------+---------+
Bulk-drop variant:
Keeps the `DataStore` instances alive for
all `N` lookups of a given datastore and then drops them in bulk,
mimicking a task that performs many lookups while it is running and
only triggers the expensive `Drop` logic when the last user exits.
use anyhow::Error;
use pbs_api_types::Operation;
use pbs_datastore::DataStore;
fn main() -> Result<(), Error> {
    let mut args = std::env::args();
    args.next();
    let datastores = if let Some(n) = args.next() {
        n.parse::<usize>()?
    } else {
        1000
    };
    let iterations = if let Some(n) = args.next() {
        n.parse::<usize>()?
    } else {
        1000
    };
    for d in 1..=datastores {
        let name = format!("ds{:04}", d);
        let mut stores = Vec::with_capacity(iterations);
        for _ in 1..=iterations {
            stores.push(DataStore::lookup_datastore(&name, Some(Operation::Write))?);
        }
        // `stores` is dropped here in bulk, once per datastore
    }
    Ok(())
}
+------+------+---------------+--------------+---------+
| M | N | Baseline mean | Patched mean | Speedup |
+------+------+---------------+--------------+---------+
| 1 | 1000 | 890.6 ms | 35.5 ms | 25.1x |
| 10 | 100 | 891.3 ms | 35.1 ms | 25.4x |
| 100 | 10 | 983.9 ms | 35.6 ms | 27.6x |
| 1000 | 1 | 1829.0 ms | 45.2 ms | 40.5x |
+------+------+---------------+--------------+---------+
Both variants show that the combination of the cached config lookups
and the cheaper `Drop` handling reduces the hot-path cost from ~1.8 s
per run to a few tens of milliseconds in these benchmarks.
## Reproduction steps
VM: 4 vCPU, ~8 GiB RAM, VirtIO-SCSI; disks:
- scsi0 32G (OS)
- scsi1 1000G (datastores)
Install PBS from ISO on the VM.
Set up ZFS on /dev/sdb (adjust if different):
zpool create -f -o ashift=12 pbsbench /dev/sdb
zfs set mountpoint=/pbsbench pbsbench
zfs create pbsbench/pbs-bench
Raise file-descriptor limit:
sudo systemctl edit proxmox-backup-proxy.service
Add the following lines:
[Service]
LimitNOFILE=1048576
Reload systemd and restart the proxy:
sudo systemctl daemon-reload
sudo systemctl restart proxmox-backup-proxy.service
Verify the limit:
systemctl show proxmox-backup-proxy.service | grep LimitNOFILE
Create 1000 ZFS-backed datastores (as used in #6049 [1]):
seq -w 001 1000 | xargs -n1 -P1 bash -c '
id=$0
name="ds${id}"
dataset="pbsbench/pbs-bench/${name}"
path="/pbsbench/pbs-bench/${name}"
zfs create -o mountpoint="$path" "$dataset"
proxmox-backup-manager datastore create "$name" "$path" \
--comment "ZFS dataset-based datastore"
'
Build PBS from this series, then run the server manually under
flamegraph:
systemctl stop proxmox-backup-proxy
cargo flamegraph --release --bin proxmox-backup-proxy
## Patch summary
[PATCH 1/4] partial fix #6049: config: enable config version cache for datastore
[PATCH 2/4] partial fix #6049: datastore: impl ConfigVersionCache fast path for lookups
[PATCH 3/4] partial fix #6049: datastore: use config fast-path in Drop
[PATCH 4/4] partial fix #6049: datastore: add TTL fallback to catch manual config edits
## Maintainer notes
No dependency bumps, no API changes and no breaking changes.
Thanks,
Samuel
Links
[1] Bugzilla #6049: https://bugzilla.proxmox.com/show_bug.cgi?id=6049
[2] cargo-flamegraph: https://github.com/flamegraph-rs/flamegraph
[3] Bugzilla #7017: https://bugzilla.proxmox.com/show_bug.cgi?id=7017
Samuel Rufinatscha (4):
partial fix #6049: config: enable config version cache for datastore
partial fix #6049: datastore: impl ConfigVersionCache fast path for
lookups
partial fix #6049: datastore: use config fast-path in Drop
partial fix #6049: datastore: add TTL fallback to catch manual config
edits
pbs-config/src/config_version_cache.rs | 10 +-
pbs-datastore/Cargo.toml | 1 +
pbs-datastore/src/datastore.rs | 215 ++++++++++++++++++++-----
3 files changed, 180 insertions(+), 46 deletions(-)
--
2.47.3
^ permalink raw reply [relevance 12%]
* Re: [pbs-devel] [PATCH proxmox-backup v3 6/6] datastore: only bump generation when config digest changes
2025-11-20 14:50 5% ` Fabian Grünbichler
@ 2025-11-21 8:37 6% ` Samuel Rufinatscha
0 siblings, 0 replies; 200+ results
From: Samuel Rufinatscha @ 2025-11-21 8:37 UTC (permalink / raw)
To: Proxmox Backup Server development discussion, Fabian Grünbichler
On 11/20/25 3:50 PM, Fabian Grünbichler wrote:
> On November 20, 2025 2:03 pm, Samuel Rufinatscha wrote:
>> When reloading datastore.cfg in datastore_section_config_cached(),
>> we currently bump the datastore generation unconditionally. This is
>> only necessary when the on disk content actually changed and when
>> we already had a previous cached entry.
>>
>> This patch extends the DatastoreConfigCache to store the last digest of
>> datastore.cfg and track the previously cached generation and digest.
>> Only when the digest differs from the cached one. On first load, it
>> reuses the existing datastore_generation without bumping.
>>
>> This avoids unnecessary cache invalidations if the config did not
>> change.
>>
>> Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
>> ---
>> pbs-datastore/src/datastore.rs | 43 ++++++++++++++++++++++++----------
>> 1 file changed, 30 insertions(+), 13 deletions(-)
>>
>> diff --git a/pbs-datastore/src/datastore.rs b/pbs-datastore/src/datastore.rs
>> index 12076f31..bf04332e 100644
>> --- a/pbs-datastore/src/datastore.rs
>> +++ b/pbs-datastore/src/datastore.rs
>> @@ -51,6 +51,8 @@ use crate::{DataBlob, LocalDatastoreLruCache};
>> struct DatastoreConfigCache {
>> // Parsed datastore.cfg file
>> config: Arc<SectionConfigData>,
>> + // Digest of the datastore.cfg file
>> + last_digest: [u8; 32],
>> // Generation number from ConfigVersionCache
>> last_generation: usize,
>> // Last update time (epoch seconds)
>> @@ -349,29 +351,44 @@ fn datastore_section_config_cached(
>> }
>>
>> // Slow path: re-read datastore.cfg
>> - let (config_raw, _digest) = pbs_config::datastore::config()?;
>> + let (config_raw, digest) = pbs_config::datastore::config()?;
>> let config = Arc::new(config_raw);
>>
>> - // Update cache
>> + // Decide whether to bump the shared generation.
>> + // Only bump if we already had a cached generation and the digest changed (manual edit or API write)
>> + let (prev_gen, prev_digest) = guard
>> + .as_ref()
>> + .map(|c| (Some(c.last_generation), Some(c.last_digest)))
>> + .unwrap_or((None, None));
>> +
>> let new_gen = if let Some(handle) = version_cache {
>> - // Bump datastore generation whenever we reload the config.
>> - // This ensures that Drop handlers will detect that a newer config exists
>> - // and will not rely on a stale cached entry for maintenance mandate.
>> - let prev_gen = handle.increase_datastore_generation();
>> - let new_gen = prev_gen + 1;
>> + match (prev_gen, prev_digest) {
>> + // We had a previous generation and the digest changed => bump generation.
>> + (Some(_prev_gen), Some(prev_digest)) if prev_digest != digest => {
>
> this is not quite the correct logic - I think.
>
> we only need to bump *iff* the digest doesn't match, but the generation
> does - that implies somebody changed the config behind our back.
>
> if the generation is different, we should *expect* the digest to also
> not be identical, but we don't have to care in that case, since the
> generation was already bumped (compared to the last cached state with
> the different digest), and that invalidates all the old cache references
> anyway..
>
Makes sense and good point! I will restrict bumping here for the case
you mentioned (*iff* the digest doesn't match, but the generation does).
So in the case the generation is different, we can rely on the current gen.
> again, I think this would be easier to follow along if the structure of
> the ifs is changed ;)
>
I agree, changing :-)
>> + let old = handle.increase_datastore_generation();
>> + Some(old + 1)
>> + }
>> + // We had a previous generation but the digest stayed the same:
>> + // keep the existing generation, just refresh the timestamp.
>> + (Some(prev_gen), _) => Some(prev_gen),
>> + // We didn't have a previous generation, just use the current one.
>> + (None, _) => Some(handle.datastore_generation()),
>> + }
>> + } else {
>> + None
>> + };
>>
>> + if let Some(gen_val) = new_gen {
>> *guard = Some(DatastoreConfigCache {
>> config: config.clone(),
>> - last_generation: new_gen,
>> + last_digest: digest,
>> + last_generation: gen_val,
>> last_update: now,
>> });
>> -
>> - Some(new_gen)
>> } else {
>> - // if the cache was not available, use again the slow path next time
>> + // If the shared version cache is not available, don't cache.
>> *guard = None;
>> - None
>> - };
>> + }
>>
>> Ok((config, new_gen))
>> }
>> --
>> 2.47.3
>>
>>
>>
^ permalink raw reply [relevance 6%]
* Re: [pbs-devel] [PATCH proxmox-backup v3 5/6] partial fix #6049: datastore: add reload flag to config cache helper
2025-11-20 14:50 5% ` Fabian Grünbichler
@ 2025-11-20 18:15 6% ` Samuel Rufinatscha
0 siblings, 0 replies; 200+ results
From: Samuel Rufinatscha @ 2025-11-20 18:15 UTC (permalink / raw)
To: Proxmox Backup Server development discussion, Fabian Grünbichler
On 11/20/25 3:50 PM, Fabian Grünbichler wrote:
> On November 20, 2025 2:03 pm, Samuel Rufinatscha wrote:
>> Extend datastore_section_config_cached() with an `allow_reload` flag to
>> separate two use cases:
>>
>> 1) lookup_datastore() passes `true` and is allowed to reload
>> datastore.cfg from disk when the cache is missing, the generation
>> changed or the TTL expired. The helper may bump the datastore
>> generation if the digest changed.
>>
>> 2) DataStore::drop() passes `false` and only consumes the most recent
>> cached entry without touching the disk, TTL or generation. If the
>> cache was never initialised, it returns an error.
>>
>> This avoids races between Drop and concurrent config changes.
>>
>> Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
>> ---
>> pbs-datastore/src/datastore.rs | 36 ++++++++++++++++++++++++++++++----
>> 1 file changed, 32 insertions(+), 4 deletions(-)
>>
>> diff --git a/pbs-datastore/src/datastore.rs b/pbs-datastore/src/datastore.rs
>> index 1711c753..12076f31 100644
>> --- a/pbs-datastore/src/datastore.rs
>> +++ b/pbs-datastore/src/datastore.rs
>> @@ -226,7 +226,7 @@ impl Drop for DataStore {
>> return;
>> }
>>
>> - let (section_config, _gen) = match datastore_section_config_cached() {
>> + let (section_config, _gen) = match datastore_section_config_cached(false) {
>> Ok(v) => v,
>> Err(err) => {
>> log::error!(
>> @@ -299,14 +299,42 @@ impl DatastoreBackend {
>> }
>> }
>>
>> -/// Return the cached datastore SectionConfig and its generation.
>> -fn datastore_section_config_cached() -> Result<(Arc<SectionConfigData>, Option<usize>), Error> {
>> +/// Returns the cached `datastore.cfg` and its generation.
>> +///
>> +/// When `allow_reload` is `true`, callers are expected to hold the datastore config. It may:
>> +/// - Reload `datastore.cfg` from disk if either
>> +/// - no cache exists yet, or cache is unavailable
>> +/// - the cached generation does not match the shared generation
>> +/// - the cache entry is older than `DATASTORE_CONFIG_CACHE_TTL_SECS`
>> +/// - Updates the cache with the new config, timestamp and digest.
>> +/// - Bumps the datastore generation in `ConfigVersionCache` only if
>> +/// there was a previous cached entry and the digest changed (manual edit or
>> +/// API write). If the digest is unchanged, the timestamp is refreshed but the
>> +/// generation is kept to avoid unnecessary invalidations.
>> +///
>> +/// When `allow_reload` is `false`:
>> +/// - Never touches the disk or the shared generation.
>> +/// - Ignores TTL and simply returns the most recent cached entry if available.
>> +/// - Returns an error if the cache has not been initialised yet.
>> +///
>> +/// Intended for use with `Datastore::drop` where no config lock is held
>> +/// and eventual stale data is acceptable.
>> +fn datastore_section_config_cached(
>> + allow_reload: bool,
>> +) -> Result<(Arc<SectionConfigData>, Option<usize>), Error> {
>> let now = epoch_i64();
>> let version_cache = ConfigVersionCache::new().ok();
>> let current_gen = version_cache.as_ref().map(|c| c.datastore_generation());
>>
>> let mut guard = DATASTORE_CONFIG_CACHE.lock().unwrap();
>>
>> + if !allow_reload {
>> + if let Some(cache) = guard.as_ref() {
>> + return Ok((cache.config.clone(), Some(cache.last_generation)));
>> + }
>> + bail!("datastore config cache not initialized");
>> + }
>
> this is not quite what I intended, we are actually allowed to reload,
> just not bump the generation number and store the result ;) the
> difference is basically whether we
> - hold the lock and can be sure that nothing modifies the
> config/generation number while we do the lookup and bump
> - don't hold the lock and can just compare and reload, but not bump and
> persist
>
> if the code is restructured then this is should boil down to an if
> wrapping the generation bump and cache update, leaving the rest as it
> was..
>
Makes sense, thanks Fabian! I will restructure it and fix the flag
check so that it wraps only the generation bump and the cache update,
as you suggested. I think it could look like this:
fn datastore_section_config_cached(
    update_cache_and_generation: bool,
) -> Result<(Arc<SectionConfigData>, Option<usize>), Error> {
    let mut guard = DATASTORE_CONFIG_CACHE.lock().unwrap();

    if let Ok(version_cache) = ConfigVersionCache::new() {
        let now = epoch_i64();
        let current_gen = version_cache.datastore_generation();

        if let Some(cached) = guard.as_ref() {
            // Fast path: re-use cached datastore.cfg if the cache is
            // available, the generation matches and the TTL has not expired
            if cached.last_generation == current_gen
                && now - cached.last_update < DATASTORE_CONFIG_CACHE_TTL_SECS
            {
                return Ok((cached.config.clone(), Some(cached.last_generation)));
            }
        }

        // Slow path: re-read datastore.cfg
        let (config_raw, digest) = pbs_config::datastore::config()?;
        let config = Arc::new(config_raw);

        let mut effective_gen = current_gen;
        if update_cache_and_generation {
            let (prev_gen, prev_digest) = guard
                .as_ref()
                .map(|c| (Some(c.last_generation), Some(c.digest)))
                .unwrap_or((None, None));

            let manual_edit = match (prev_gen, prev_digest) {
                (Some(prev_g), Some(prev_d)) => prev_g == current_gen && prev_d != digest,
                _ => false,
            };

            if manual_edit {
                let old = version_cache.increase_datastore_generation();
                effective_gen = old + 1;
            }

            // Update cache
            *guard = Some(DatastoreConfigCache {
                config: config.clone(),
                digest,
                last_generation: effective_gen,
                last_update: now,
            });
        }

        Ok((config, Some(effective_gen)))
    } else {
        // Fallback path, no config version cache: read datastore.cfg
        *guard = None;
        let (config_raw, _digest) = pbs_config::datastore::config()?;
        Ok((Arc::new(config_raw), None))
    }
}
>> +
>> // Fast path: re-use cached datastore.cfg if cache is available, generation matches and TTL not expired
>> if let (Some(current_gen), Some(config_cache)) = (current_gen, guard.as_ref()) {
>> let gen_matches = config_cache.last_generation == current_gen;
>> @@ -423,7 +451,7 @@ impl DataStore {
>> let _config_lock = pbs_config::datastore::lock_config()?;
>>
>> // Get the current datastore.cfg generation number and cached config
>> - let (section_config, gen_num) = datastore_section_config_cached()?;
>> + let (section_config, gen_num) = datastore_section_config_cached(true)?;
>>
>> let datastore_cfg: DataStoreConfig = section_config.lookup("datastore", name)?;
>> let maintenance_mode = datastore_cfg.get_maintenance_mode();
>> --
>> 2.47.3
>>
>>
>>
^ permalink raw reply [relevance 6%]
* Re: [pbs-devel] [PATCH proxmox-backup v3 0/6] datastore: remove config reload on hot path
2025-11-20 14:50 5% ` [pbs-devel] [PATCH proxmox-backup v3 0/6] datastore: remove config reload on hot path Fabian Grünbichler
@ 2025-11-20 15:17 6% ` Samuel Rufinatscha
0 siblings, 0 replies; 200+ results
From: Samuel Rufinatscha @ 2025-11-20 15:17 UTC (permalink / raw)
To: Proxmox Backup Server development discussion, Fabian Grünbichler
On 11/20/25 3:50 PM, Fabian Grünbichler wrote:
> On November 20, 2025 2:03 pm, Samuel Rufinatscha wrote:
>> Hi,
>>
>> [..]
>
> nit: this is getting a bit long ;)
>
>>
>> ## Patch summary
>>
>> [PATCH 1/6] partial fix #6049: config: enable config version cache for datastore
>> [PATCH 2/6] partial fix #6049: datastore: impl ConfigVersionCache fast path for lookups
>> [PATCH 3/6] partial fix #6049: datastore: use config fast-path in Drop
>> [PATCH 4/6] partial fix #6049: datastore: add TTL fallback to catch manual config edits
>> [PATCH 5/6] to add a reload flag to the config cache helper.
>> [PATCH 6/6] to only bump generation when the config digest changes.
>>
>> ## Changes from v2:
>>
>> Added:
>> - [PATCH 5/6]: Add a reload flag to the config cache helper.
>> - [PATCH 6/6]: Only bump generation when the config digest changes.
>
> please fold those into the existing version where they make sense, and
> include a per-patch changelog to know *what* changed ;)
>
> e.g., the digest part can already go into the first patch (if the
> generation bumping is also moved thre from patch #4), or into patch #4.
>
> the structural changes I suggested are missing, and I think the
> readability got worse as a result since v2, we now have six instances of
> checking whether there is some cache we are operating on or not..
>
> I'll give more detailed feedback on the two new patches..
>
Thanks for the review, Fabian! I somehow missed your comment on
PATCH 2/4 v2, sorry for that! I will make sure it's included in the
new version.
>>
>> ## Maintainer notes
>>
>> No dependency bumps, no API changes and no breaking changes.
>>
>> Thanks,
>> Samuel
>>
>> [1] Bugzilla #6049: https://bugzilla.proxmox.com/show_bug.cgi?id=6049
>> [2] cargo-flamegraph: https://github.com/flamegraph-rs/flamegraph
>> [3] Bugzilla #7017: https://bugzilla.proxmox.com/show_bug.cgi?id=7017
>>
>> Samuel Rufinatscha (6):
>> partial fix #6049: config: enable config version cache for datastore
>> partial fix #6049: datastore: impl ConfigVersionCache fast path for
>> lookups
>> partial fix #6049: datastore: use config fast-path in Drop
>> partial fix #6049: datastore: add TTL fallback to catch manual config
>> edits
>> partial fix #6049: datastore: add reload flag to config cache helper
>> datastore: only bump generation when config digest changes
>>
>> pbs-config/src/config_version_cache.rs | 10 +-
>> pbs-datastore/Cargo.toml | 1 +
>> pbs-datastore/src/datastore.rs | 232 ++++++++++++++++++++-----
>> 3 files changed, 197 insertions(+), 46 deletions(-)
>>
>> --
>> 2.47.3
>>
>>
>>
^ permalink raw reply [relevance 6%]
* Re: [pbs-devel] [PATCH proxmox-backup v3 0/6] datastore: remove config reload on hot path
2025-11-20 13:03 10% [pbs-devel] [PATCH proxmox-backup v3 0/6] " Samuel Rufinatscha
` (5 preceding siblings ...)
2025-11-20 13:03 15% ` [pbs-devel] [PATCH proxmox-backup v3 6/6] datastore: only bump generation when config digest changes Samuel Rufinatscha
@ 2025-11-20 14:50 5% ` Fabian Grünbichler
2025-11-20 15:17 6% ` Samuel Rufinatscha
2025-11-24 15:35 13% ` [pbs-devel] superseded: " Samuel Rufinatscha
7 siblings, 1 reply; 200+ results
From: Fabian Grünbichler @ 2025-11-20 14:50 UTC (permalink / raw)
To: Proxmox Backup Server development discussion
On November 20, 2025 2:03 pm, Samuel Rufinatscha wrote:
> Hi,
>
> [..]
nit: this is getting a bit long ;)
>
> ## Patch summary
>
> [PATCH 1/6] partial fix #6049: config: enable config version cache for datastore
> [PATCH 2/6] partial fix #6049: datastore: impl ConfigVersionCache fast path for lookups
> [PATCH 3/6] partial fix #6049: datastore: use config fast-path in Drop
> [PATCH 4/6] partial fix #6049: datastore: add TTL fallback to catch manual config edits
> [PATCH 5/6] partial fix #6049: datastore: add reload flag to config cache helper
> [PATCH 6/6] datastore: only bump generation when config digest changes
>
> ## Changes from v2:
>
> Added:
> - [PATCH 5/6]: Add a reload flag to the config cache helper.
> - [PATCH 6/6]: Only bump generation when the config digest changes.
please fold those into the existing version where they make sense, and
include a per-patch changelog to know *what* changed ;)
e.g., the digest part can already go into the first patch (if the
generation bumping is also moved there from patch #4), or into patch #4.
the structural changes I suggested are missing, and I think the
readability got worse since v2; we now have six instances of
checking whether there is some cache we are operating on or not..
I'll give more detailed feedback on the two new patches..
>
> ## Maintainer notes
>
> No dependency bumps, no API changes and no breaking changes.
>
> Thanks,
> Samuel
>
> [1] Bugzilla #6049: https://bugzilla.proxmox.com/show_bug.cgi?id=6049
> [2] cargo-flamegraph: https://github.com/flamegraph-rs/flamegraph
> [3] Bugzilla #7017: https://bugzilla.proxmox.com/show_bug.cgi?id=7017
>
> Samuel Rufinatscha (6):
> partial fix #6049: config: enable config version cache for datastore
> partial fix #6049: datastore: impl ConfigVersionCache fast path for
> lookups
> partial fix #6049: datastore: use config fast-path in Drop
> partial fix #6049: datastore: add TTL fallback to catch manual config
> edits
> partial fix #6049: datastore: add reload flag to config cache helper
> datastore: only bump generation when config digest changes
>
> pbs-config/src/config_version_cache.rs | 10 +-
> pbs-datastore/Cargo.toml | 1 +
> pbs-datastore/src/datastore.rs | 232 ++++++++++++++++++++-----
> 3 files changed, 197 insertions(+), 46 deletions(-)
>
> --
> 2.47.3
>
>
>
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
* Re: [pbs-devel] [PATCH proxmox-backup v3 5/6] partial fix #6049: datastore: add reload flag to config cache helper
2025-11-20 13:03 15% ` [pbs-devel] [PATCH proxmox-backup v3 5/6] partial fix #6049: datastore: add reload flag to config cache helper Samuel Rufinatscha
@ 2025-11-20 14:50 5% ` Fabian Grünbichler
2025-11-20 18:15 6% ` Samuel Rufinatscha
0 siblings, 1 reply; 200+ results
From: Fabian Grünbichler @ 2025-11-20 14:50 UTC (permalink / raw)
To: Proxmox Backup Server development discussion
On November 20, 2025 2:03 pm, Samuel Rufinatscha wrote:
> Extend datastore_section_config_cached() with an `allow_reload` flag to
> separate two use cases:
>
> 1) lookup_datastore() passes `true` and is allowed to reload
> datastore.cfg from disk when the cache is missing, the generation
> changed or the TTL expired. The helper may bump the datastore
> generation if the digest changed.
>
> 2) DataStore::drop() passes `false` and only consumes the most recent
> cached entry without touching the disk, TTL or generation. If the
> cache was never initialised, it returns an error.
>
> This avoids races between Drop and concurrent config changes.
>
> Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
> ---
> pbs-datastore/src/datastore.rs | 36 ++++++++++++++++++++++++++++++----
> 1 file changed, 32 insertions(+), 4 deletions(-)
>
> diff --git a/pbs-datastore/src/datastore.rs b/pbs-datastore/src/datastore.rs
> index 1711c753..12076f31 100644
> --- a/pbs-datastore/src/datastore.rs
> +++ b/pbs-datastore/src/datastore.rs
> @@ -226,7 +226,7 @@ impl Drop for DataStore {
> return;
> }
>
> - let (section_config, _gen) = match datastore_section_config_cached() {
> + let (section_config, _gen) = match datastore_section_config_cached(false) {
> Ok(v) => v,
> Err(err) => {
> log::error!(
> @@ -299,14 +299,42 @@ impl DatastoreBackend {
> }
> }
>
> -/// Return the cached datastore SectionConfig and its generation.
> -fn datastore_section_config_cached() -> Result<(Arc<SectionConfigData>, Option<usize>), Error> {
> +/// Returns the cached `datastore.cfg` and its generation.
> +///
> +/// When `allow_reload` is `true`, callers are expected to hold the datastore config lock. It may:
> +/// - Reload `datastore.cfg` from disk if either
> +/// - no cache exists yet, or cache is unavailable
> +/// - the cached generation does not match the shared generation
> +/// - the cache entry is older than `DATASTORE_CONFIG_CACHE_TTL_SECS`
> +/// - Updates the cache with the new config, timestamp and digest.
> +/// - Bumps the datastore generation in `ConfigVersionCache` only if
> +/// there was a previous cached entry and the digest changed (manual edit or
> +/// API write). If the digest is unchanged, the timestamp is refreshed but the
> +/// generation is kept to avoid unnecessary invalidations.
> +///
> +/// When `allow_reload` is `false`:
> +/// - Never touches the disk or the shared generation.
> +/// - Ignores TTL and simply returns the most recent cached entry if available.
> +/// - Returns an error if the cache has not been initialised yet.
> +///
> +/// Intended for use with `DataStore::drop` where no config lock is held
> +/// and eventual stale data is acceptable.
> +fn datastore_section_config_cached(
> + allow_reload: bool,
> +) -> Result<(Arc<SectionConfigData>, Option<usize>), Error> {
> let now = epoch_i64();
> let version_cache = ConfigVersionCache::new().ok();
> let current_gen = version_cache.as_ref().map(|c| c.datastore_generation());
>
> let mut guard = DATASTORE_CONFIG_CACHE.lock().unwrap();
>
> + if !allow_reload {
> + if let Some(cache) = guard.as_ref() {
> + return Ok((cache.config.clone(), Some(cache.last_generation)));
> + }
> + bail!("datastore config cache not initialized");
> + }
this is not quite what I intended, we are actually allowed to reload,
just not bump the generation number and store the result ;) the
difference is basically whether we
- hold the lock and can be sure that nothing modifies the
config/generation number while we do the lookup and bump
- don't hold the lock and can just compare and reload, but not bump and
persist
if the code is restructured then this should boil down to an if
wrapping the generation bump and cache update, leaving the rest as it
was..
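To illustrate the restructuring suggested above, here is a minimal, self-contained sketch with toy stand-in types (the names `section_config_cached`, `CacheEntry`, and the `load`/`holds_lock` parameters are assumptions, not the real pbs-datastore API): the reload itself stays unconditional, while only the generation bump and the cache write are gated on holding the config lock.

```rust
use std::sync::{Arc, Mutex, OnceLock};

// Toy stand-ins for the real pbs types, just to illustrate the control flow.
#[derive(Clone, PartialEq, Debug)]
struct Config(String);

struct CacheEntry {
    config: Arc<Config>,
    generation: usize,
    digest: u64,
}

static CACHE: OnceLock<Mutex<Option<CacheEntry>>> = OnceLock::new();

fn cache() -> &'static Mutex<Option<CacheEntry>> {
    CACHE.get_or_init(|| Mutex::new(None))
}

// `load` stands in for pbs_config::datastore::config() (read file + digest);
// `current_gen` for the shared datastore generation counter.
fn section_config_cached(
    current_gen: usize,
    holds_lock: bool,
    load: impl Fn() -> (Config, u64),
) -> (Arc<Config>, usize) {
    let mut guard = cache().lock().unwrap();

    // Fast path: generation matches -> reuse the cached config.
    if let Some(entry) = guard.as_ref() {
        if entry.generation == current_gen {
            return (entry.config.clone(), current_gen);
        }
    }

    // Reloading is always allowed, with or without the config lock held.
    let (raw, digest) = load();
    let config = Arc::new(raw);

    // Only the generation bump and the cache update are gated on the lock:
    // without it, a concurrent writer could race us, so just return the
    // freshly read config and leave the shared state alone.
    let mut generation = current_gen;
    if holds_lock {
        if guard
            .as_ref()
            .map_or(false, |e| e.generation == current_gen && e.digest != digest)
        {
            generation += 1; // stands in for increase_datastore_generation()
        }
        *guard = Some(CacheEntry {
            config: config.clone(),
            generation,
            digest,
        });
    }
    (config, generation)
}
```

With this shape, the `Drop` caller simply passes `holds_lock = false` and may still read a fresh config, instead of erroring out when the cache was never initialised.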
> +
> // Fast path: re-use cached datastore.cfg if cache is available, generation matches and TTL not expired
> if let (Some(current_gen), Some(config_cache)) = (current_gen, guard.as_ref()) {
> let gen_matches = config_cache.last_generation == current_gen;
> @@ -423,7 +451,7 @@ impl DataStore {
> let _config_lock = pbs_config::datastore::lock_config()?;
>
> // Get the current datastore.cfg generation number and cached config
> - let (section_config, gen_num) = datastore_section_config_cached()?;
> + let (section_config, gen_num) = datastore_section_config_cached(true)?;
>
> let datastore_cfg: DataStoreConfig = section_config.lookup("datastore", name)?;
> let maintenance_mode = datastore_cfg.get_maintenance_mode();
> --
> 2.47.3
>
>
>
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
* Re: [pbs-devel] [PATCH proxmox-backup v3 6/6] datastore: only bump generation when config digest changes
2025-11-20 13:03 15% ` [pbs-devel] [PATCH proxmox-backup v3 6/6] datastore: only bump generation when config digest changes Samuel Rufinatscha
@ 2025-11-20 14:50 5% ` Fabian Grünbichler
2025-11-21 8:37 6% ` Samuel Rufinatscha
0 siblings, 1 reply; 200+ results
From: Fabian Grünbichler @ 2025-11-20 14:50 UTC (permalink / raw)
To: Proxmox Backup Server development discussion
On November 20, 2025 2:03 pm, Samuel Rufinatscha wrote:
> When reloading datastore.cfg in datastore_section_config_cached(),
> we currently bump the datastore generation unconditionally. This is
> only necessary when the on disk content actually changed and when
> we already had a previous cached entry.
>
> This patch extends the DatastoreConfigCache to store the last digest of
> datastore.cfg and track the previously cached generation and digest.
> The generation is only bumped when the digest differs from the cached
> one. On first load, it reuses the existing datastore_generation without
> bumping.
>
> This avoids unnecessary cache invalidations if the config did not
> change.
>
> Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
> ---
> pbs-datastore/src/datastore.rs | 43 ++++++++++++++++++++++++----------
> 1 file changed, 30 insertions(+), 13 deletions(-)
>
> diff --git a/pbs-datastore/src/datastore.rs b/pbs-datastore/src/datastore.rs
> index 12076f31..bf04332e 100644
> --- a/pbs-datastore/src/datastore.rs
> +++ b/pbs-datastore/src/datastore.rs
> @@ -51,6 +51,8 @@ use crate::{DataBlob, LocalDatastoreLruCache};
> struct DatastoreConfigCache {
> // Parsed datastore.cfg file
> config: Arc<SectionConfigData>,
> + // Digest of the datastore.cfg file
> + last_digest: [u8; 32],
> // Generation number from ConfigVersionCache
> last_generation: usize,
> // Last update time (epoch seconds)
> @@ -349,29 +351,44 @@ fn datastore_section_config_cached(
> }
>
> // Slow path: re-read datastore.cfg
> - let (config_raw, _digest) = pbs_config::datastore::config()?;
> + let (config_raw, digest) = pbs_config::datastore::config()?;
> let config = Arc::new(config_raw);
>
> - // Update cache
> + // Decide whether to bump the shared generation.
> + // Only bump if we already had a cached generation and the digest changed (manual edit or API write)
> + let (prev_gen, prev_digest) = guard
> + .as_ref()
> + .map(|c| (Some(c.last_generation), Some(c.last_digest)))
> + .unwrap_or((None, None));
> +
> let new_gen = if let Some(handle) = version_cache {
> - // Bump datastore generation whenever we reload the config.
> - // This ensures that Drop handlers will detect that a newer config exists
> - // and will not rely on a stale cached entry for maintenance mandate.
> - let prev_gen = handle.increase_datastore_generation();
> - let new_gen = prev_gen + 1;
> + match (prev_gen, prev_digest) {
> + // We had a previous generation and the digest changed => bump generation.
> + (Some(_prev_gen), Some(prev_digest)) if prev_digest != digest => {
this is not quite the correct logic - I think.
we only need to bump *iff* the digest doesn't match, but the generation
does - that implies somebody changed the config behind our back.
if the generation is different, we should *expect* the digest to also
not be identical, but we don't have to care in that case, since the
generation was already bumped (compared to the last cached state with
the different digest), and that invalidates all the old cache references
anyway..
again, I think this would be easier to follow along if the structure of
the ifs is changed ;)
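The bump condition described above can be written as a small predicate. This is a sketch; the field names follow the patch, but the function itself is hypothetical and does not exist in the tree:

```rust
/// Decide whether the shared datastore generation must be bumped after a
/// reload. A bump is only needed when the on-disk digest changed while the
/// generation did *not*: that combination means datastore.cfg was edited
/// behind our back (a manual edit), so existing cache references must be
/// invalidated. If the generation already differs, whoever changed the
/// config (e.g. the save_config() path) bumped it, and the old cache
/// entries are stale anyway.
fn needs_generation_bump(
    cached: Option<(usize, [u8; 32])>, // (last_generation, last_digest)
    current_gen: usize,
    new_digest: [u8; 32],
) -> bool {
    match cached {
        Some((last_gen, last_digest)) => {
            last_gen == current_gen && last_digest != new_digest
        }
        // First load: nothing to invalidate, reuse the current generation.
        None => false,
    }
}
```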
> + let old = handle.increase_datastore_generation();
> + Some(old + 1)
> + }
> + // We had a previous generation but the digest stayed the same:
> + // keep the existing generation, just refresh the timestamp.
> + (Some(prev_gen), _) => Some(prev_gen),
> + // We didn't have a previous generation, just use the current one.
> + (None, _) => Some(handle.datastore_generation()),
> + }
> + } else {
> + None
> + };
>
> + if let Some(gen_val) = new_gen {
> *guard = Some(DatastoreConfigCache {
> config: config.clone(),
> - last_generation: new_gen,
> + last_digest: digest,
> + last_generation: gen_val,
> last_update: now,
> });
> -
> - Some(new_gen)
> } else {
> - // if the cache was not available, use again the slow path next time
> + // If the shared version cache is not available, don't cache.
> *guard = None;
> - None
> - };
> + }
>
> Ok((config, new_gen))
> }
> --
> 2.47.3
>
>
>
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
* [pbs-devel] superseded: [PATCH proxmox-backup v2 0/4] datastore: remove config reload on hot path
2025-11-14 15:05 10% [pbs-devel] [PATCH proxmox-backup v2 0/4] datastore: remove config reload on hot path Samuel Rufinatscha
` (2 preceding siblings ...)
2025-11-14 15:05 15% ` [pbs-devel] [PATCH proxmox-backup v2 4/4] partial fix #6049: datastore: add TTL fallback to catch manual config edits Samuel Rufinatscha
@ 2025-11-20 13:07 13% ` Samuel Rufinatscha
3 siblings, 0 replies; 200+ results
From: Samuel Rufinatscha @ 2025-11-20 13:07 UTC (permalink / raw)
To: pbs-devel
https://lore.proxmox.com/pbs-devel/20251120130342.248815-1-s.rufinatscha@proxmox.com/T/#t
On 11/14/25 4:05 PM, Samuel Rufinatscha wrote:
> Hi,
>
> this series reduces CPU time in datastore lookups by avoiding repeated
> datastore.cfg reads/parses in both `lookup_datastore()` and
> `DataStore::Drop`. It also adds a TTL so manual config edits are
> noticed without reintroducing hashing on every request.
>
> While investigating #6049 [1], cargo-flamegraph [2] showed hotspots during
> repeated `/status` calls in `lookup_datastore()` and in `Drop`,
> dominated by `pbs_config::datastore::config()` (config parse).
>
> The parsing cost itself should eventually be investigated in a future
> effort. Furthermore, cargo-flamegraph showed that when using a
> token-based auth method to access the API, a significant amount of time
> is spent in validation on every request [3].
>
> ## Approach
>
> [PATCH 1/4] Extend ConfigVersionCache for datastore generation
> Expose a dedicated datastore generation counter and an increment
> helper so callers can cheaply track datastore.cfg versions.
>
> [PATCH 2/4] Fast path for datastore lookups
> Cache the parsed datastore.cfg keyed by the shared datastore
> generation. lookup_datastore() reuses both the cached config and an
> existing DataStoreImpl when the generation matches, and falls back
> to the old slow path otherwise.
>
> [PATCH 3/4] Fast path for Drop
> Make DataStore::Drop use the cached config if possible instead of
> rereading datastore.cfg from disk.
>
> [PATCH 4/4] TTL to catch manual edits
> Add a small TTL around the cached config and bump the datastore
> generation whenever the config is reloaded. This catches manual
> edits to datastore.cfg without reintroducing hashing or
> config parsing on every request.
>
> ## Benchmark results
>
> All the following benchmarks are based on top of
> https://lore.proxmox.com/pbs-devel/20251112131525.645971-1-f.gruenbichler@proxmox.com/T/#u
>
> ### End-to-end
>
> Testing `/status?verbose=0` end-to-end with 1000 stores, 5 req/store
> and parallel=16 before/after the series:
>
> Metric Before After
> ----------------------------------------
> Total time 12s 9s
> Throughput (all) 416.67 555.56
> Cold RPS (round #1) 83.33 111.11
> Warm RPS (#2..N) 333.33 444.44
>
> Running under flamegraph [2], TLS appears to consume a significant
> amount of CPU time and blur the results. Still, a ~33% higher overall
> throughput and ~25% less end-to-end time for this workload.
>
> ### Isolated benchmarks (hyperfine)
>
> In addition to the end-to-end tests, I measured two standalone benchmarks
> with hyperfine, each using a config with 1000
> datastores. `M` is the number of distinct datastores looked up and
> `N` is the number of lookups per datastore.
>
> Drop-direct variant:
>
> Drops the `DataStore` after every lookup, so the `Drop` path runs on
> every iteration:
>
> use anyhow::Error;
>
> use pbs_api_types::Operation;
> use pbs_datastore::DataStore;
>
> fn main() -> Result<(), Error> {
> let mut args = std::env::args();
> args.next();
>
> let datastores = if let Some(n) = args.next() {
> n.parse::<usize>()?
> } else {
> 1000
> };
>
> let iterations = if let Some(n) = args.next() {
> n.parse::<usize>()?
> } else {
> 1000
> };
>
> for d in 1..=datastores {
> let name = format!("ds{:04}", d);
>
> for _ in 1..=iterations {
> DataStore::lookup_datastore(&name, Some(Operation::Write))?;
> }
> }
>
> Ok(())
> }
>
> +----+------+-----------+-----------+---------+
> | M | N | Baseline | Patched | Speedup |
> +----+------+-----------+-----------+---------+
> | 1 | 1000 | 1.670 s | 34.3 ms | 48.7x |
> | 10 | 100 | 1.672 s | 34.5 ms | 48.4x |
> | 100| 10 | 1.679 s | 35.1 ms | 47.8x |
> |1000| 1 | 1.787 s | 38.2 ms | 46.8x |
> +----+------+-----------+-----------+---------+
>
> Bulk-drop variant:
>
> Keeps the `DataStore` instances alive for
> all `N` lookups of a given datastore and then drops them in bulk,
> mimicking a task that performs many lookups while it is running and
> only triggers the expensive `Drop` logic when the last user exits.
>
> use anyhow::Error;
>
> use pbs_api_types::Operation;
> use pbs_datastore::DataStore;
>
> fn main() -> Result<(), Error> {
> let mut args = std::env::args();
> args.next();
>
> let datastores = if let Some(n) = args.next() {
> n.parse::<usize>()?
> } else {
> 1000
> };
>
> let iterations = if let Some(n) = args.next() {
> n.parse::<usize>()?
> } else {
> 1000
> };
>
> for d in 1..=datastores {
> let name = format!("ds{:04}", d);
>
> let mut stores = Vec::with_capacity(iterations);
> for _ in 1..=iterations {
> stores.push(DataStore::lookup_datastore(&name, Some(Operation::Write))?);
> }
> }
>
> Ok(())
> }
>
> +------+------+---------------+--------------+---------+
> | M | N | Baseline mean | Patched mean | Speedup |
> +------+------+---------------+--------------+---------+
> | 1 | 1000 | 884.0 ms | 33.9 ms | 26.1x |
> | 10 | 100 | 881.8 ms | 35.3 ms | 25.0x |
> | 100 | 10 | 969.3 ms | 35.9 ms | 27.0x |
> | 1000 | 1 | 1827.0 ms | 40.7 ms | 44.9x |
> +------+------+---------------+--------------+---------+
>
> Both variants show that the combination of the cached config lookups
> and the cheaper `Drop` handling reduces the hot-path cost from ~1.7 s
> per run to a few tens of milliseconds in these benchmarks.
>
> ## Reproduction steps
>
> VM: 4 vCPU, ~8 GiB RAM, VirtIO-SCSI; disks:
> - scsi0 32G (OS)
> - scsi1 1000G (datastores)
>
> Install PBS from ISO on the VM.
>
> Set up ZFS on /dev/sdb (adjust if different):
>
> zpool create -f -o ashift=12 pbsbench /dev/sdb
> zfs set mountpoint=/pbsbench pbsbench
> zfs create pbsbench/pbs-bench
>
> Raise file-descriptor limit:
>
> sudo systemctl edit proxmox-backup-proxy.service
>
> Add the following lines:
>
> [Service]
> LimitNOFILE=1048576
>
> Reload systemd and restart the proxy:
>
> sudo systemctl daemon-reload
> sudo systemctl restart proxmox-backup-proxy.service
>
> Verify the limit:
>
> systemctl show proxmox-backup-proxy.service | grep LimitNOFILE
>
> Create 1000 ZFS-backed datastores (as used in #6049 [1]):
>
> seq -w 001 1000 | xargs -n1 -P1 bash -c '
> id=$0
> name="ds${id}"
> dataset="pbsbench/pbs-bench/${name}"
> path="/pbsbench/pbs-bench/${name}"
> zfs create -o mountpoint="$path" "$dataset"
> proxmox-backup-manager datastore create "$name" "$path" \
> --comment "ZFS dataset-based datastore"
> '
>
> Build PBS from this series, then run the server manually
> under flamegraph:
>
> systemctl stop proxmox-backup-proxy
> cargo flamegraph --release --bin proxmox-backup-proxy
>
> ## Other resources:
>
> ### E2E benchmark script:
>
> #!/usr/bin/env bash
> set -euo pipefail
>
> # --- Config ---------------------------------------------------------------
> HOST='https://localhost:8007'
> USER='root@pam'
> PASS="$(cat passfile)"
>
> DATASTORE_PATH="/pbsbench/pbs-bench"
> MAX_STORES=1000 # how many stores to include
> PARALLEL=16 # concurrent workers
> REPEAT=5 # requests per store (1 cold + REPEAT-1 warm)
>
> PRINT_FIRST=false # true => log first request's HTTP code per store
>
> # --- Helpers --------------------------------------------------------------
> fmt_rps () {
> local n="$1" t="$2"
> awk -v n="$n" -v t="$t" 'BEGIN { if (t > 0) printf("%.2f\n", n/t); else print "0.00" }'
> }
>
> # --- Login ---------------------------------------------------------------
> auth=$(curl -ks -X POST "$HOST/api2/json/access/ticket" \
> -d "username=$USER" -d "password=$PASS")
> ticket=$(echo "$auth" | jq -r '.data.ticket')
>
> if [[ -z "${ticket:-}" || "$ticket" == "null" ]]; then
> echo "[ERROR] Login failed (no ticket)"
> exit 1
> fi
>
> # --- Collect stores (deterministic order) --------------------------------
> mapfile -t STORES < <(
> find "$DATASTORE_PATH" -mindepth 1 -maxdepth 1 -type d -printf '%f\n' \
> | sort | head -n "$MAX_STORES"
> )
>
> USED_STORES=${#STORES[@]}
> if (( USED_STORES == 0 )); then
> echo "[ERROR] No datastore dirs under $DATASTORE_PATH"
> exit 1
> fi
>
> echo "[INFO] Running with stores=$USED_STORES, repeat=$REPEAT, parallel=$PARALLEL"
>
> # --- Temp counters --------------------------------------------------------
> SUCCESS_ALL="$(mktemp)"
> FAIL_ALL="$(mktemp)"
> COLD_OK="$(mktemp)"
> WARM_OK="$(mktemp)"
> trap 'rm -f "$SUCCESS_ALL" "$FAIL_ALL" "$COLD_OK" "$WARM_OK"' EXIT
>
> export HOST ticket REPEAT SUCCESS_ALL FAIL_ALL COLD_OK WARM_OK PRINT_FIRST
>
> SECONDS=0
>
> # --- Fire requests --------------------------------------------------------
> printf "%s\n" "${STORES[@]}" \
> | xargs -P"$PARALLEL" -I{} bash -c '
> store="$1"
> url="$HOST/api2/json/admin/datastore/$store/status?verbose=0"
>
> for ((i=1;i<=REPEAT;i++)); do
> code=$(curl -ks -o /dev/null -w "%{http_code}" -b "PBSAuthCookie=$ticket" "$url" || echo 000)
>
> if [[ "$code" == "200" ]]; then
> echo 1 >> "$SUCCESS_ALL"
> if (( i == 1 )); then
> echo 1 >> "$COLD_OK"
> else
> echo 1 >> "$WARM_OK"
> fi
> if [[ "$PRINT_FIRST" == "true" && $i -eq 1 ]]; then
> ts=$(date +%H:%M:%S)
> echo "[$ts] $store #$i HTTP:200"
> fi
> else
> echo 1 >> "$FAIL_ALL"
> if [[ "$PRINT_FIRST" == "true" && $i -eq 1 ]]; then
> ts=$(date +%H:%M:%S)
> echo "[$ts] $store #$i HTTP:$code (FAIL)"
> fi
> fi
> done
> ' _ {}
>
> # --- Summary --------------------------------------------------------------
> elapsed=$SECONDS
> ok=$(wc -l < "$SUCCESS_ALL" 2>/dev/null || echo 0)
> fail=$(wc -l < "$FAIL_ALL" 2>/dev/null || echo 0)
> cold_ok=$(wc -l < "$COLD_OK" 2>/dev/null || echo 0)
> warm_ok=$(wc -l < "$WARM_OK" 2>/dev/null || echo 0)
>
> expected=$(( USED_STORES * REPEAT ))
> total=$(( ok + fail ))
>
> rps_all=$(fmt_rps "$ok" "$elapsed")
> rps_cold=$(fmt_rps "$cold_ok" "$elapsed")
> rps_warm=$(fmt_rps "$warm_ok" "$elapsed")
>
> echo "===== Summary ====="
> echo "Stores used: $USED_STORES"
> echo "Expected requests: $expected"
> echo "Executed requests: $total"
> echo "OK (HTTP 200): $ok"
> echo "Failed: $fail"
> printf "Total time: %dm %ds\n" $((elapsed/60)) $((elapsed%60))
> echo "Throughput all RPS: $rps_all"
> echo "Cold RPS (round #1): $rps_cold"
> echo "Warm RPS (#2..N): $rps_warm"
>
> ## Maintainer notes
>
> No dependency bumps, no API changes and no breaking changes.
>
> ## Patch summary
>
> [PATCH 1/4] partial fix #6049: config: enable config version cache for datastore
> [PATCH 2/4] partial fix #6049: datastore: impl ConfigVersionCache fast path for lookups
> [PATCH 3/4] partial fix #6049: datastore: use config fast-path in Drop
> [PATCH 4/4] partial fix #6049: datastore: add TTL fallback to catch manual config edits
>
> Thanks,
> Samuel
>
> [1] Bugzilla #6049: https://bugzilla.proxmox.com/show_bug.cgi?id=6049
> [2] cargo-flamegraph: https://github.com/flamegraph-rs/flamegraph
> [3] Bugzilla #7017: https://bugzilla.proxmox.com/show_bug.cgi?id=7017
>
> Samuel Rufinatscha (4):
> partial fix #6049: config: enable config version cache for datastore
> partial fix #6049: datastore: impl ConfigVersionCache fast path for
> lookups
> partial fix #6049: datastore: use config fast-path in Drop
> partial fix #6049: datastore: add TTL fallback to catch manual config
> edits
>
> pbs-config/src/config_version_cache.rs | 10 +-
> pbs-datastore/Cargo.toml | 1 +
> pbs-datastore/src/datastore.rs | 187 +++++++++++++++++++------
> 3 files changed, 152 insertions(+), 46 deletions(-)
>
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
* [pbs-devel] [PATCH proxmox-backup v3 2/6] partial fix #6049: datastore: impl ConfigVersionCache fast path for lookups
2025-11-20 13:03 10% [pbs-devel] [PATCH proxmox-backup v3 0/6] " Samuel Rufinatscha
2025-11-20 13:03 17% ` [pbs-devel] [PATCH proxmox-backup v3 1/6] partial fix #6049: config: enable config version cache for datastore Samuel Rufinatscha
@ 2025-11-20 13:03 12% ` Samuel Rufinatscha
2025-11-20 13:03 16% ` [pbs-devel] [PATCH proxmox-backup v3 3/6] partial fix #6049: datastore: use config fast-path in Drop Samuel Rufinatscha
` (5 subsequent siblings)
7 siblings, 0 replies; 200+ results
From: Samuel Rufinatscha @ 2025-11-20 13:03 UTC (permalink / raw)
To: pbs-devel
Repeated /status requests caused lookup_datastore() to re-read and
parse datastore.cfg on every call. The issue was mentioned in report
#6049 [1]. cargo-flamegraph [2] confirmed that the hot path is
dominated by pbs_config::datastore::config() (config parsing).
This patch implements caching of the global datastore.cfg using the
generation numbers from the shared config version cache. It caches the
datastore.cfg along with the generation number and, when a subsequent
lookup sees the same generation, it reuses the cached config without
re-reading it from disk. If the generation differs
(or the cache is unavailable), it falls back to the existing slow path
with no behavioral changes.
Behavioral notes
- The generation is bumped via the existing save_config() path, so
API-driven config changes are detected immediately.
- Manual edits to datastore.cfg are not detected; a TTL
guard is introduced in a dedicated patch in this series.
- DataStore::drop still performs a config read on the common path,
this is covered in a dedicated patch in this series.
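The fast-path check this series converges on (generation comparison from this patch, plus the TTL guard from the later patch) boils down to two cheap comparisons. A sketch with assumed names; `TTL_SECS` stands in for the real `DATASTORE_CONFIG_CACHE_TTL_SECS` constant:

```rust
const TTL_SECS: i64 = 60; // assumed value; the real constant is added by the TTL patch

struct CachedCfg {
    last_generation: usize,
    last_update: i64, // epoch seconds of the last reload
}

/// True if the cached, parsed datastore.cfg may be reused without touching
/// disk: the shared generation still matches (no API write happened) and
/// the entry is younger than the TTL, which bounds how long a manual edit
/// to datastore.cfg can go unnoticed.
fn fast_path_ok(cache: &CachedCfg, current_gen: usize, now: i64) -> bool {
    cache.last_generation == current_gen && now - cache.last_update < TTL_SECS
}
```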
Links
[1] Bugzilla: https://bugzilla.proxmox.com/show_bug.cgi?id=6049
[2] cargo-flamegraph: https://github.com/flamegraph-rs/flamegraph
Fixes: #6049
Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
---
pbs-datastore/Cargo.toml | 1 +
pbs-datastore/src/datastore.rs | 120 +++++++++++++++++++++++----------
2 files changed, 87 insertions(+), 34 deletions(-)
diff --git a/pbs-datastore/Cargo.toml b/pbs-datastore/Cargo.toml
index 8ce930a9..42f49a7b 100644
--- a/pbs-datastore/Cargo.toml
+++ b/pbs-datastore/Cargo.toml
@@ -40,6 +40,7 @@ proxmox-io.workspace = true
proxmox-lang.workspace=true
proxmox-s3-client = { workspace = true, features = [ "impl" ] }
proxmox-schema = { workspace = true, features = [ "api-macro" ] }
+proxmox-section-config.workspace = true
proxmox-serde = { workspace = true, features = [ "serde_json" ] }
proxmox-sys.workspace = true
proxmox-systemd.workspace = true
diff --git a/pbs-datastore/src/datastore.rs b/pbs-datastore/src/datastore.rs
index 0a517923..8c687097 100644
--- a/pbs-datastore/src/datastore.rs
+++ b/pbs-datastore/src/datastore.rs
@@ -32,7 +32,8 @@ use pbs_api_types::{
MaintenanceType, Operation, UPID,
};
use pbs_config::s3::S3_CFG_TYPE_ID;
-use pbs_config::BackupLockGuard;
+use pbs_config::{BackupLockGuard, ConfigVersionCache};
+use proxmox_section_config::SectionConfigData;
use crate::backup_info::{
BackupDir, BackupGroup, BackupInfo, OLD_LOCKING, PROTECTED_MARKER_FILENAME,
@@ -46,6 +47,17 @@ use crate::s3::S3_CONTENT_PREFIX;
use crate::task_tracking::{self, update_active_operations};
use crate::{DataBlob, LocalDatastoreLruCache};
+// Cache for fully parsed datastore.cfg
+struct DatastoreConfigCache {
+ // Parsed datastore.cfg file
+ config: Arc<SectionConfigData>,
+ // Generation number from ConfigVersionCache
+ last_generation: usize,
+}
+
+static DATASTORE_CONFIG_CACHE: LazyLock<Mutex<Option<DatastoreConfigCache>>> =
+ LazyLock::new(|| Mutex::new(None));
+
static DATASTORE_MAP: LazyLock<Mutex<HashMap<String, Arc<DataStoreImpl>>>> =
LazyLock::new(|| Mutex::new(HashMap::new()));
@@ -142,10 +154,12 @@ pub struct DataStoreImpl {
last_gc_status: Mutex<GarbageCollectionStatus>,
verify_new: bool,
chunk_order: ChunkOrder,
- last_digest: Option<[u8; 32]>,
sync_level: DatastoreFSyncLevel,
backend_config: DatastoreBackendConfig,
lru_store_caching: Option<LocalDatastoreLruCache>,
+ /// Datastore generation number from `ConfigVersionCache` at creation time, used to
+ /// validate reuse of this cached `DataStoreImpl`.
+ config_generation: Option<usize>,
}
impl DataStoreImpl {
@@ -158,10 +172,10 @@ impl DataStoreImpl {
last_gc_status: Mutex::new(GarbageCollectionStatus::default()),
verify_new: false,
chunk_order: Default::default(),
- last_digest: None,
sync_level: Default::default(),
backend_config: Default::default(),
lru_store_caching: None,
+ config_generation: None,
})
}
}
@@ -256,6 +270,37 @@ impl DatastoreBackend {
}
}
+/// Return the cached datastore SectionConfig and its generation.
+fn datastore_section_config_cached() -> Result<(Arc<SectionConfigData>, Option<usize>), Error> {
+ let gen = ConfigVersionCache::new()
+ .ok()
+ .map(|c| c.datastore_generation());
+
+ let mut guard = DATASTORE_CONFIG_CACHE.lock().unwrap();
+
+ // Fast path: re-use cached datastore.cfg
+ if let (Some(gen), Some(cache)) = (gen, guard.as_ref()) {
+ if cache.last_generation == gen {
+ return Ok((cache.config.clone(), Some(gen)));
+ }
+ }
+
+ // Slow path: re-read datastore.cfg
+ let (config_raw, _digest) = pbs_config::datastore::config()?;
+ let config = Arc::new(config_raw);
+
+ if let Some(gen_val) = gen {
+ *guard = Some(DatastoreConfigCache {
+ config: config.clone(),
+ last_generation: gen_val,
+ });
+ } else {
+ *guard = None;
+ }
+
+ Ok((config, gen))
+}
+
impl DataStore {
// This one just panics on everything
#[doc(hidden)]
@@ -327,56 +372,63 @@ impl DataStore {
name: &str,
operation: Option<Operation>,
) -> Result<Arc<DataStore>, Error> {
- // Avoid TOCTOU between checking maintenance mode and updating active operation counter, as
- // we use it to decide whether it is okay to delete the datastore.
+ // Avoid TOCTOU between checking maintenance mode and updating active operations.
let _config_lock = pbs_config::datastore::lock_config()?;
- // we could use the ConfigVersionCache's generation for staleness detection, but we load
- // the config anyway -> just use digest, additional benefit: manual changes get detected
- let (config, digest) = pbs_config::datastore::config()?;
- let config: DataStoreConfig = config.lookup("datastore", name)?;
+ // Get the current datastore.cfg generation number and cached config
+ let (section_config, gen_num) = datastore_section_config_cached()?;
+
+ let datastore_cfg: DataStoreConfig = section_config.lookup("datastore", name)?;
+ let maintenance_mode = datastore_cfg.get_maintenance_mode();
+ let mount_status = get_datastore_mount_status(&datastore_cfg);
- if let Some(maintenance_mode) = config.get_maintenance_mode() {
- if let Err(error) = maintenance_mode.check(operation) {
+ if let Some(mm) = &maintenance_mode {
+ if let Err(error) = mm.check(operation.clone()) {
bail!("datastore '{name}' is unavailable: {error}");
}
}
- if get_datastore_mount_status(&config) == Some(false) {
- let mut datastore_cache = DATASTORE_MAP.lock().unwrap();
- datastore_cache.remove(&config.name);
- bail!("datastore '{}' is not mounted", config.name);
+ let mut datastore_cache = DATASTORE_MAP.lock().unwrap();
+
+ if mount_status == Some(false) {
+ datastore_cache.remove(&datastore_cfg.name);
+ bail!("datastore '{}' is not mounted", datastore_cfg.name);
}
- let mut datastore_cache = DATASTORE_MAP.lock().unwrap();
- let entry = datastore_cache.get(name);
-
- // reuse chunk store so that we keep using the same process locker instance!
- let chunk_store = if let Some(datastore) = &entry {
- let last_digest = datastore.last_digest.as_ref();
- if let Some(true) = last_digest.map(|last_digest| last_digest == &digest) {
- if let Some(operation) = operation {
- update_active_operations(name, operation, 1)?;
+ // Re-use DataStoreImpl
+ if let Some(existing) = datastore_cache.get(name).cloned() {
+ if let (Some(last_generation), Some(gen_num)) = (existing.config_generation, gen_num) {
+ if last_generation == gen_num {
+ if let Some(op) = operation {
+ update_active_operations(name, op, 1)?;
+ }
+
+ return Ok(Arc::new(Self {
+ inner: existing,
+ operation,
+ }));
}
- return Ok(Arc::new(Self {
- inner: Arc::clone(datastore),
- operation,
- }));
}
- Arc::clone(&datastore.chunk_store)
+ }
+
+ // (Re)build DataStoreImpl
+
+ // Reuse chunk store so that we keep using the same process locker instance!
+ let chunk_store = if let Some(existing) = datastore_cache.get(name) {
+ Arc::clone(&existing.chunk_store)
} else {
let tuning: DatastoreTuning = serde_json::from_value(
DatastoreTuning::API_SCHEMA
- .parse_property_string(config.tuning.as_deref().unwrap_or(""))?,
+ .parse_property_string(datastore_cfg.tuning.as_deref().unwrap_or(""))?,
)?;
Arc::new(ChunkStore::open(
name,
- config.absolute_path(),
+ datastore_cfg.absolute_path(),
tuning.sync_level.unwrap_or_default(),
)?)
};
- let datastore = DataStore::with_store_and_config(chunk_store, config, Some(digest))?;
+ let datastore = DataStore::with_store_and_config(chunk_store, datastore_cfg, gen_num)?;
let datastore = Arc::new(datastore);
datastore_cache.insert(name.to_string(), datastore.clone());
@@ -478,7 +530,7 @@ impl DataStore {
fn with_store_and_config(
chunk_store: Arc<ChunkStore>,
config: DataStoreConfig,
- last_digest: Option<[u8; 32]>,
+ generation: Option<usize>,
) -> Result<DataStoreImpl, Error> {
let mut gc_status_path = chunk_store.base_path();
gc_status_path.push(".gc-status");
@@ -538,10 +590,10 @@ impl DataStore {
last_gc_status: Mutex::new(gc_status),
verify_new: config.verify_new.unwrap_or(false),
chunk_order: tuning.chunk_order.unwrap_or_default(),
- last_digest,
sync_level: tuning.sync_level.unwrap_or_default(),
backend_config,
lru_store_caching,
+ config_generation: generation,
})
}
--
2.47.3
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
^ permalink raw reply [relevance 12%]
* [pbs-devel] [PATCH proxmox-backup v3 6/6] datastore: only bump generation when config digest changes
2025-11-20 13:03 10% [pbs-devel] [PATCH proxmox-backup v3 0/6] " Samuel Rufinatscha
` (4 preceding siblings ...)
2025-11-20 13:03 15% ` [pbs-devel] [PATCH proxmox-backup v3 5/6] partial fix #6049: datastore: add reload flag to config cache helper Samuel Rufinatscha
@ 2025-11-20 13:03 15% ` Samuel Rufinatscha
2025-11-20 14:50 5% ` Fabian Grünbichler
2025-11-20 14:50 5% ` [pbs-devel] [PATCH proxmox-backup v3 0/6] datastore: remove config reload on hot path Fabian Grünbichler
2025-11-24 15:35 13% ` [pbs-devel] superseded: " Samuel Rufinatscha
7 siblings, 1 reply; 200+ results
From: Samuel Rufinatscha @ 2025-11-20 13:03 UTC (permalink / raw)
To: pbs-devel
When reloading datastore.cfg in datastore_section_config_cached(),
we currently bump the datastore generation unconditionally. This is
only necessary when the on-disk content actually changed and when
we already had a previous cached entry.
This patch extends the DatastoreConfigCache to store the last digest of
datastore.cfg alongside the previously cached generation. The generation
is only bumped when the new digest differs from the cached one. On first
load, it reuses the existing datastore_generation without bumping.
This avoids unnecessary cache invalidations if the config did not
change.
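As an editorial aside, the bump decision described above reduces to a small
pure function. The name `next_generation` and its signature are illustrative
stand-ins, not part of the patch; the three cases mirror the match in the
diff below:

```rust
// Decide the new generation after a reload, given the previously cached
// (generation, digest) pair, the freshly computed digest, and the current
// shared counter value. Names are hypothetical, for illustration only.
fn next_generation(
    cached: Option<(usize, [u8; 32])>,
    new_digest: [u8; 32],
    shared_gen: usize,
) -> usize {
    match cached {
        // Digest changed (manual edit or API write): bump the generation.
        Some((_, old_digest)) if old_digest != new_digest => shared_gen + 1,
        // Same digest: keep the cached generation, no invalidation.
        Some((generation, _)) => generation,
        // First load: adopt the current shared generation without bumping.
        None => shared_gen,
    }
}

fn main() {
    let d1 = [1u8; 32];
    let d2 = [2u8; 32];
    assert_eq!(next_generation(None, d1, 7), 7); // first load: no bump
    assert_eq!(next_generation(Some((7, d1)), d1, 7), 7); // unchanged digest
    assert_eq!(next_generation(Some((7, d1)), d2, 7), 8); // changed -> bump
}
```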
Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
---
pbs-datastore/src/datastore.rs | 43 ++++++++++++++++++++++++----------
1 file changed, 30 insertions(+), 13 deletions(-)
diff --git a/pbs-datastore/src/datastore.rs b/pbs-datastore/src/datastore.rs
index 12076f31..bf04332e 100644
--- a/pbs-datastore/src/datastore.rs
+++ b/pbs-datastore/src/datastore.rs
@@ -51,6 +51,8 @@ use crate::{DataBlob, LocalDatastoreLruCache};
struct DatastoreConfigCache {
// Parsed datastore.cfg file
config: Arc<SectionConfigData>,
+ // Digest of the datastore.cfg file
+ last_digest: [u8; 32],
// Generation number from ConfigVersionCache
last_generation: usize,
// Last update time (epoch seconds)
@@ -349,29 +351,44 @@ fn datastore_section_config_cached(
}
// Slow path: re-read datastore.cfg
- let (config_raw, _digest) = pbs_config::datastore::config()?;
+ let (config_raw, digest) = pbs_config::datastore::config()?;
let config = Arc::new(config_raw);
- // Update cache
+ // Decide whether to bump the shared generation.
+ // Only bump if we already had a cached generation and the digest changed (manual edit or API write)
+ let (prev_gen, prev_digest) = guard
+ .as_ref()
+ .map(|c| (Some(c.last_generation), Some(c.last_digest)))
+ .unwrap_or((None, None));
+
let new_gen = if let Some(handle) = version_cache {
- // Bump datastore generation whenever we reload the config.
- // This ensures that Drop handlers will detect that a newer config exists
- // and will not rely on a stale cached entry for maintenance mandate.
- let prev_gen = handle.increase_datastore_generation();
- let new_gen = prev_gen + 1;
+ match (prev_gen, prev_digest) {
+ // We had a previous generation and the digest changed => bump generation.
+ (Some(_prev_gen), Some(prev_digest)) if prev_digest != digest => {
+ let old = handle.increase_datastore_generation();
+ Some(old + 1)
+ }
+ // We had a previous generation but the digest stayed the same:
+ // keep the existing generation, just refresh the timestamp.
+ (Some(prev_gen), _) => Some(prev_gen),
+ // We didn't have a previous generation, just use the current one.
+ (None, _) => Some(handle.datastore_generation()),
+ }
+ } else {
+ None
+ };
+ if let Some(gen_val) = new_gen {
*guard = Some(DatastoreConfigCache {
config: config.clone(),
- last_generation: new_gen,
+ last_digest: digest,
+ last_generation: gen_val,
last_update: now,
});
-
- Some(new_gen)
} else {
- // if the cache was not available, use again the slow path next time
+ // If the shared version cache is not available, don't cache.
*guard = None;
- None
- };
+ }
Ok((config, new_gen))
}
--
2.47.3
* [pbs-devel] [PATCH proxmox-backup v3 4/6] partial fix #6049: datastore: add TTL fallback to catch manual config edits
2025-11-20 13:03 10% [pbs-devel] [PATCH proxmox-backup v3 0/6] " Samuel Rufinatscha
` (2 preceding siblings ...)
2025-11-20 13:03 16% ` [pbs-devel] [PATCH proxmox-backup v3 3/6] partial fix #6049: datastore: use config fast-path in Drop Samuel Rufinatscha
@ 2025-11-20 13:03 15% ` Samuel Rufinatscha
2025-11-20 13:03 15% ` [pbs-devel] [PATCH proxmox-backup v3 5/6] partial fix #6049: datastore: add reload flag to config cache helper Samuel Rufinatscha
` (3 subsequent siblings)
7 siblings, 0 replies; 200+ results
From: Samuel Rufinatscha @ 2025-11-20 13:03 UTC (permalink / raw)
To: pbs-devel
The lookup fast path reacts to API-driven config changes because
save_config() bumps the generation. Manual edits of datastore.cfg do
not bump the counter. To keep the system robust against such edits
without reintroducing config reading and hashing on the hot path, this
patch adds a TTL to the cache entry.
If the cached config is older than
DATASTORE_CONFIG_CACHE_TTL_SECS (set to 60s), the next lookup takes
the slow path and refreshes the cached entry. Within
the TTL window, unchanged generations still use the fast path.
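The fast-path condition boils down to a two-part predicate (generation match
and TTL freshness). This standalone sketch uses made-up names
(`use_fast_path`) purely to spell out the cases; it is not the patch code:

```rust
// Max age in seconds to reuse the cached datastore config (matches the
// DATASTORE_CONFIG_CACHE_TTL_SECS constant introduced by the patch).
const TTL_SECS: i64 = 60;

// The fast path is taken only when the cached generation matches the
// shared counter AND the cache entry is younger than the TTL.
fn use_fast_path(cached_gen: usize, current_gen: usize, cached_at: i64, now: i64) -> bool {
    cached_gen == current_gen && (now - cached_at) < TTL_SECS
}

fn main() {
    assert!(use_fast_path(3, 3, 100, 130)); // same generation, 30s old
    assert!(!use_fast_path(3, 3, 100, 161)); // TTL expired: take slow path
    assert!(!use_fast_path(3, 4, 100, 110)); // generation bumped: slow path
}
```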
Links
[1] cargo-flamegraph: https://github.com/flamegraph-rs/flamegraph
Fixes: #6049
Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
---
pbs-datastore/src/datastore.rs | 46 +++++++++++++++++++++++++---------
1 file changed, 34 insertions(+), 12 deletions(-)
diff --git a/pbs-datastore/src/datastore.rs b/pbs-datastore/src/datastore.rs
index 1494521c..1711c753 100644
--- a/pbs-datastore/src/datastore.rs
+++ b/pbs-datastore/src/datastore.rs
@@ -22,7 +22,7 @@ use proxmox_sys::error::SysError;
use proxmox_sys::fs::{file_read_optional_string, replace_file, CreateOptions};
use proxmox_sys::linux::procfs::MountInfo;
use proxmox_sys::process_locker::{ProcessLockExclusiveGuard, ProcessLockSharedGuard};
-use proxmox_time::TimeSpan;
+use proxmox_time::{epoch_i64, TimeSpan};
use proxmox_worker_task::WorkerTaskContext;
use pbs_api_types::{
@@ -53,6 +53,8 @@ struct DatastoreConfigCache {
config: Arc<SectionConfigData>,
// Generation number from ConfigVersionCache
last_generation: usize,
+ // Last update time (epoch seconds)
+ last_update: i64,
}
static DATASTORE_CONFIG_CACHE: LazyLock<Mutex<Option<DatastoreConfigCache>>> =
@@ -61,6 +63,8 @@ static DATASTORE_CONFIG_CACHE: LazyLock<Mutex<Option<DatastoreConfigCache>>> =
static DATASTORE_MAP: LazyLock<Mutex<HashMap<String, Arc<DataStoreImpl>>>> =
LazyLock::new(|| Mutex::new(HashMap::new()));
+/// Max age in seconds to reuse the cached datastore config.
+const DATASTORE_CONFIG_CACHE_TTL_SECS: i64 = 60;
/// Filename to store backup group notes
pub const GROUP_NOTES_FILE_NAME: &str = "notes";
/// Filename to store backup group owner
@@ -297,16 +301,22 @@ impl DatastoreBackend {
/// Return the cached datastore SectionConfig and its generation.
fn datastore_section_config_cached() -> Result<(Arc<SectionConfigData>, Option<usize>), Error> {
- let gen = ConfigVersionCache::new()
- .ok()
- .map(|c| c.datastore_generation());
+ let now = epoch_i64();
+ let version_cache = ConfigVersionCache::new().ok();
+ let current_gen = version_cache.as_ref().map(|c| c.datastore_generation());
let mut guard = DATASTORE_CONFIG_CACHE.lock().unwrap();
- // Fast path: re-use cached datastore.cfg
- if let (Some(gen), Some(cache)) = (gen, guard.as_ref()) {
- if cache.last_generation == gen {
- return Ok((cache.config.clone(), Some(gen)));
+ // Fast path: re-use cached datastore.cfg if cache is available, generation matches and TTL not expired
+ if let (Some(current_gen), Some(config_cache)) = (current_gen, guard.as_ref()) {
+ let gen_matches = config_cache.last_generation == current_gen;
+ let ttl_ok = (now - config_cache.last_update) < DATASTORE_CONFIG_CACHE_TTL_SECS;
+
+ if gen_matches && ttl_ok {
+ return Ok((
+ config_cache.config.clone(),
+ Some(config_cache.last_generation),
+ ));
}
}
@@ -314,16 +324,28 @@ fn datastore_section_config_cached() -> Result<(Arc<SectionConfigData>, Option<u
let (config_raw, _digest) = pbs_config::datastore::config()?;
let config = Arc::new(config_raw);
- if let Some(gen_val) = gen {
+ // Update cache
+ let new_gen = if let Some(handle) = version_cache {
+ // Bump datastore generation whenever we reload the config.
+ // This ensures that Drop handlers will detect that a newer config exists
+ // and will not rely on a stale cached entry for maintenance mandate.
+ let prev_gen = handle.increase_datastore_generation();
+ let new_gen = prev_gen + 1;
+
*guard = Some(DatastoreConfigCache {
config: config.clone(),
- last_generation: gen_val,
+ last_generation: new_gen,
+ last_update: now,
});
+
+ Some(new_gen)
} else {
+ // if the cache was not available, use again the slow path next time
*guard = None;
- }
+ None
+ };
- Ok((config, gen))
+ Ok((config, new_gen))
}
impl DataStore {
--
2.47.3
* [pbs-devel] [PATCH proxmox-backup v3 0/6] datastore: remove config reload on hot path
@ 2025-11-20 13:03 10% Samuel Rufinatscha
2025-11-20 13:03 17% ` [pbs-devel] [PATCH proxmox-backup v3 1/6] partial fix #6049: config: enable config version cache for datastore Samuel Rufinatscha
` (7 more replies)
0 siblings, 8 replies; 200+ results
From: Samuel Rufinatscha @ 2025-11-20 13:03 UTC (permalink / raw)
To: pbs-devel
Hi,
this series reduces CPU time in datastore lookups by avoiding repeated
datastore.cfg reads/parses in both `lookup_datastore()` and
`DataStore::Drop`. It also adds a TTL so manual config edits are
noticed without reintroducing hashing on every request.
While investigating #6049 [1], cargo-flamegraph [2] showed hotspots during
repeated `/status` calls in `lookup_datastore()` and in `Drop`,
dominated by `pbs_config::datastore::config()` (config parse).
The parsing cost itself should eventually be investigated in a future
effort. Furthermore, cargo-flamegraph showed that when using a
token-based auth method to access the API, a significant amount of time
is spent in validation on every request [3].
## Approach
[PATCH 1/6] Extend ConfigVersionCache for datastore generation
Expose a dedicated datastore generation counter and an increment
helper so callers can cheaply track datastore.cfg versions.
[PATCH 2/6] Fast path for datastore lookups
Cache the parsed datastore.cfg keyed by the shared datastore
generation. lookup_datastore() reuses both the cached config and an
existing DataStoreImpl when the generation matches, and falls back
to the old slow path otherwise.
[PATCH 3/6] Fast path for Drop
Make DataStore::Drop use the cached config if possible instead of
rereading datastore.cfg from disk.
[PATCH 4/6] TTL to catch manual edits
Add a small TTL around the cached config and bump the datastore
generation whenever the config is reloaded. This catches manual
edits to datastore.cfg without reintroducing hashing or
config parsing on every request.
[PATCH 5/6] Add reload flag to config cache helper
Add a flag to the config cache helper to indicate whether a
config reload is acceptable.
[PATCH 6/6] Only bump generation on config digest change
Avoid unnecessary generation bumps when the config is reloaded
but the digest did not change.
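The overall scheme can be sketched in a few lines of standalone Rust. All
names here (`GENERATION`, `CachedConfig`, `config_cached`) are illustrative
stand-ins for `ConfigVersionCache`, `DatastoreConfigCache` and
`datastore_section_config_cached()`, and a `String` stands in for the parsed
`SectionConfigData`:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Mutex};

// Shared generation counter, standing in for ConfigVersionCache.
static GENERATION: AtomicUsize = AtomicUsize::new(0);

// Cached parsed config, keyed by the generation it was loaded under.
struct CachedConfig {
    config: Arc<String>, // stand-in for Arc<SectionConfigData>
    generation: usize,
}

static CACHE: Mutex<Option<CachedConfig>> = Mutex::new(None);

// Stand-in for the expensive pbs_config::datastore::config() parse.
fn load_config_from_disk() -> Arc<String> {
    Arc::new("parsed datastore.cfg".to_string())
}

/// Return the cached config when the generation matches, else reload.
fn config_cached() -> (Arc<String>, usize) {
    let current = GENERATION.load(Ordering::Acquire);
    let mut guard = CACHE.lock().unwrap();
    if let Some(cache) = guard.as_ref() {
        if cache.generation == current {
            return (cache.config.clone(), current); // fast path: no disk I/O
        }
    }
    let config = load_config_from_disk(); // slow path: re-read and re-parse
    *guard = Some(CachedConfig { config: config.clone(), generation: current });
    (config, current)
}

fn main() {
    let (a, gen_a) = config_cached();
    let (b, gen_b) = config_cached();
    assert!(Arc::ptr_eq(&a, &b)); // second call reuses the cached Arc
    assert_eq!(gen_a, gen_b);
    // A writer bumping the generation invalidates the cache.
    GENERATION.fetch_add(1, Ordering::AcqRel);
    let (c, gen_c) = config_cached();
    assert!(!Arc::ptr_eq(&a, &c)); // mismatch forced a reload
    assert_eq!(gen_c, gen_a + 1);
}
```

A writer bumps `GENERATION` after saving the config, so readers only pay the
parse cost when the counter has actually moved.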
## Benchmark results
All the following benchmarks are based on top of
https://lore.proxmox.com/pbs-devel/20251112131525.645971-1-f.gruenbichler@proxmox.com/T/#u
### End-to-end
Testing `/status?verbose=0` end-to-end with 1000 stores, 5 req/store
and parallel=16 before/after the series:
Metric                 Before    After
--------------------------------------
Total time             12s       9s
Throughput (all)       416.67    555.56
Cold RPS (round #1)    83.33     111.11
Warm RPS (#2..N)       333.33    444.44
Running under flamegraph [2], TLS appears to consume a significant
amount of CPU time and blurs the results. Still, the series yields ~33%
higher overall throughput and ~25% lower end-to-end time for this workload.
### Isolated benchmarks (hyperfine)
In addition to the end-to-end tests, I measured two standalone benchmarks
with hyperfine, each using a config with 1000
datastores. `M` is the number of distinct datastores looked up and
`N` is the number of lookups per datastore.
Drop-direct variant:
Drops the `DataStore` after every lookup, so the `Drop` path runs on
every iteration:
use anyhow::Error;
use pbs_api_types::Operation;
use pbs_datastore::DataStore;

fn main() -> Result<(), Error> {
    let mut args = std::env::args();
    args.next();
    let datastores = if let Some(n) = args.next() {
        n.parse::<usize>()?
    } else {
        1000
    };
    let iterations = if let Some(n) = args.next() {
        n.parse::<usize>()?
    } else {
        1000
    };
    for d in 1..=datastores {
        let name = format!("ds{:04}", d);
        for _ in 1..=iterations {
            DataStore::lookup_datastore(&name, Some(Operation::Write))?;
        }
    }
    Ok(())
}
+------+-------+------------+------------+----------+
| M | N | Baseline | Patched | Speedup |
+------+-------+------------+------------+----------+
| 1 | 1000 | 1.699 s | 37.3 ms | 45.5x |
| 10 | 100 | 1.710 s | 35.8 ms | 47.7x |
| 100 | 10 | 1.787 s | 36.6 ms | 48.9x |
| 1000 | 1 | 1.899 s | 46.0 ms | 41.3x |
+------+-------+------------+------------+----------+
Bulk-drop variant:
Keeps the `DataStore` instances alive for
all `N` lookups of a given datastore and then drops them in bulk,
mimicking a task that performs many lookups while it is running and
only triggers the expensive `Drop` logic when the last user exits.
use anyhow::Error;
use pbs_api_types::Operation;
use pbs_datastore::DataStore;

fn main() -> Result<(), Error> {
    let mut args = std::env::args();
    args.next();
    let datastores = if let Some(n) = args.next() {
        n.parse::<usize>()?
    } else {
        1000
    };
    let iterations = if let Some(n) = args.next() {
        n.parse::<usize>()?
    } else {
        1000
    };
    for d in 1..=datastores {
        let name = format!("ds{:04}", d);
        let mut stores = Vec::with_capacity(iterations);
        for _ in 1..=iterations {
            stores.push(DataStore::lookup_datastore(&name, Some(Operation::Write))?);
        }
    }
    Ok(())
}
+------+-------+--------------+-------------+----------+
| M | N | Baseline | Patched | Speedup |
+------+-------+--------------+-------------+----------+
| 1 | 1000 | 888.8 ms | 39.3 ms | 22.6x |
| 10 | 100 | 890.8 ms | 35.3 ms | 25.3x |
| 100 | 10 | 974.5 ms | 36.3 ms | 26.8x |
| 1000 | 1 | 1.848 s | 39.9 ms | 46.3x |
+------+-------+--------------+-------------+----------+
Both variants show that the combination of the cached config lookups
and the cheaper `Drop` handling reduces the hot-path cost from ~1.7 s
per run to a few tens of milliseconds in these benchmarks.
## Reproduction steps
VM: 4 vCPU, ~8 GiB RAM, VirtIO-SCSI; disks:
- scsi0 32G (OS)
- scsi1 1000G (datastores)
Install PBS from ISO on the VM.
Set up ZFS on /dev/sdb (adjust if different):
zpool create -f -o ashift=12 pbsbench /dev/sdb
zfs set mountpoint=/pbsbench pbsbench
zfs create pbsbench/pbs-bench
Raise file-descriptor limit:
sudo systemctl edit proxmox-backup-proxy.service
Add the following lines:
[Service]
LimitNOFILE=1048576
Reload systemd and restart the proxy:
sudo systemctl daemon-reload
sudo systemctl restart proxmox-backup-proxy.service
Verify the limit:
systemctl show proxmox-backup-proxy.service | grep LimitNOFILE
Create 1000 ZFS-backed datastores (as used in #6049 [1]):
seq -w 001 1000 | xargs -n1 -P1 bash -c '
  id=$0
  name="ds${id}"
  dataset="pbsbench/pbs-bench/${name}"
  path="/pbsbench/pbs-bench/${name}"
  zfs create -o mountpoint="$path" "$dataset"
  proxmox-backup-manager datastore create "$name" "$path" \
    --comment "ZFS dataset-based datastore"
'
Build PBS from this series, then run the server manually under
flamegraph:
systemctl stop proxmox-backup-proxy
cargo flamegraph --release --bin proxmox-backup-proxy
## Other resources:
### E2E benchmark script:
#!/usr/bin/env bash
set -euo pipefail
# --- Config ---------------------------------------------------------------
HOST='https://localhost:8007'
USER='root@pam'
PASS="$(cat passfile)"
DATASTORE_PATH="/pbsbench/pbs-bench"
MAX_STORES=1000 # how many stores to include
PARALLEL=16 # concurrent workers
REPEAT=5 # requests per store (1 cold + REPEAT-1 warm)
PRINT_FIRST=false # true => log first request's HTTP code per store
# --- Helpers --------------------------------------------------------------
fmt_rps () {
local n="$1" t="$2"
awk -v n="$n" -v t="$t" 'BEGIN { if (t > 0) printf("%.2f\n", n/t); else print "0.00" }'
}
# --- Login ---------------------------------------------------------------
auth=$(curl -ks -X POST "$HOST/api2/json/access/ticket" \
-d "username=$USER" -d "password=$PASS")
ticket=$(echo "$auth" | jq -r '.data.ticket')
if [[ -z "${ticket:-}" || "$ticket" == "null" ]]; then
echo "[ERROR] Login failed (no ticket)"
exit 1
fi
# --- Collect stores (deterministic order) --------------------------------
mapfile -t STORES < <(
find "$DATASTORE_PATH" -mindepth 1 -maxdepth 1 -type d -printf '%f\n' \
| sort | head -n "$MAX_STORES"
)
USED_STORES=${#STORES[@]}
if (( USED_STORES == 0 )); then
echo "[ERROR] No datastore dirs under $DATASTORE_PATH"
exit 1
fi
echo "[INFO] Running with stores=$USED_STORES, repeat=$REPEAT, parallel=$PARALLEL"
# --- Temp counters --------------------------------------------------------
SUCCESS_ALL="$(mktemp)"
FAIL_ALL="$(mktemp)"
COLD_OK="$(mktemp)"
WARM_OK="$(mktemp)"
trap 'rm -f "$SUCCESS_ALL" "$FAIL_ALL" "$COLD_OK" "$WARM_OK"' EXIT
export HOST ticket REPEAT SUCCESS_ALL FAIL_ALL COLD_OK WARM_OK PRINT_FIRST
SECONDS=0
# --- Fire requests --------------------------------------------------------
printf "%s\n" "${STORES[@]}" \
| xargs -P"$PARALLEL" -I{} bash -c '
store="$1"
url="$HOST/api2/json/admin/datastore/$store/status?verbose=0"
for ((i=1;i<=REPEAT;i++)); do
code=$(curl -ks -o /dev/null -w "%{http_code}" -b "PBSAuthCookie=$ticket" "$url" || echo 000)
if [[ "$code" == "200" ]]; then
echo 1 >> "$SUCCESS_ALL"
if (( i == 1 )); then
echo 1 >> "$COLD_OK"
else
echo 1 >> "$WARM_OK"
fi
if [[ "$PRINT_FIRST" == "true" && $i -eq 1 ]]; then
ts=$(date +%H:%M:%S)
echo "[$ts] $store #$i HTTP:200"
fi
else
echo 1 >> "$FAIL_ALL"
if [[ "$PRINT_FIRST" == "true" && $i -eq 1 ]]; then
ts=$(date +%H:%M:%S)
echo "[$ts] $store #$i HTTP:$code (FAIL)"
fi
fi
done
' _ {}
# --- Summary --------------------------------------------------------------
elapsed=$SECONDS
ok=$(wc -l < "$SUCCESS_ALL" 2>/dev/null || echo 0)
fail=$(wc -l < "$FAIL_ALL" 2>/dev/null || echo 0)
cold_ok=$(wc -l < "$COLD_OK" 2>/dev/null || echo 0)
warm_ok=$(wc -l < "$WARM_OK" 2>/dev/null || echo 0)
expected=$(( USED_STORES * REPEAT ))
total=$(( ok + fail ))
rps_all=$(fmt_rps "$ok" "$elapsed")
rps_cold=$(fmt_rps "$cold_ok" "$elapsed")
rps_warm=$(fmt_rps "$warm_ok" "$elapsed")
echo "===== Summary ====="
echo "Stores used: $USED_STORES"
echo "Expected requests: $expected"
echo "Executed requests: $total"
echo "OK (HTTP 200): $ok"
echo "Failed: $fail"
printf "Total time: %dm %ds\n" $((elapsed/60)) $((elapsed%60))
echo "Throughput all RPS: $rps_all"
echo "Cold RPS (round #1): $rps_cold"
echo "Warm RPS (#2..N): $rps_warm"
## Patch summary
[PATCH 1/6] partial fix #6049: config: enable config version cache for datastore
[PATCH 2/6] partial fix #6049: datastore: impl ConfigVersionCache fast path for lookups
[PATCH 3/6] partial fix #6049: datastore: use config fast-path in Drop
[PATCH 4/6] partial fix #6049: datastore: add TTL fallback to catch manual config edits
[PATCH 5/6] partial fix #6049: datastore: add reload flag to config cache helper
[PATCH 6/6] datastore: only bump generation when config digest changes
## Changes from v2:
Added:
- [PATCH 5/6]: Add a reload flag to the config cache helper.
- [PATCH 6/6]: Only bump generation when the config digest changes.
## Maintainer notes
No dependency bumps, no API changes and no breaking changes.
Thanks,
Samuel
[1] Bugzilla #6049: https://bugzilla.proxmox.com/show_bug.cgi?id=6049
[2] cargo-flamegraph: https://github.com/flamegraph-rs/flamegraph
[3] Bugzilla #7017: https://bugzilla.proxmox.com/show_bug.cgi?id=7017
Samuel Rufinatscha (6):
partial fix #6049: config: enable config version cache for datastore
partial fix #6049: datastore: impl ConfigVersionCache fast path for
lookups
partial fix #6049: datastore: use config fast-path in Drop
partial fix #6049: datastore: add TTL fallback to catch manual config
edits
partial fix #6049: datastore: add reload flag to config cache helper
datastore: only bump generation when config digest changes
pbs-config/src/config_version_cache.rs | 10 +-
pbs-datastore/Cargo.toml | 1 +
pbs-datastore/src/datastore.rs | 232 ++++++++++++++++++++-----
3 files changed, 197 insertions(+), 46 deletions(-)
--
2.47.3
* [pbs-devel] [PATCH proxmox-backup v3 5/6] partial fix #6049: datastore: add reload flag to config cache helper
2025-11-20 13:03 10% [pbs-devel] [PATCH proxmox-backup v3 0/6] " Samuel Rufinatscha
` (3 preceding siblings ...)
2025-11-20 13:03 15% ` [pbs-devel] [PATCH proxmox-backup v3 4/6] partial fix #6049: datastore: add TTL fallback to catch manual config edits Samuel Rufinatscha
@ 2025-11-20 13:03 15% ` Samuel Rufinatscha
2025-11-20 14:50 5% ` Fabian Grünbichler
2025-11-20 13:03 15% ` [pbs-devel] [PATCH proxmox-backup v3 6/6] datastore: only bump generation when config digest changes Samuel Rufinatscha
` (2 subsequent siblings)
7 siblings, 1 reply; 200+ results
From: Samuel Rufinatscha @ 2025-11-20 13:03 UTC (permalink / raw)
To: pbs-devel
Extend datastore_section_config_cached() with an `allow_reload` flag to
separate two use cases:
1) lookup_datastore() passes `true` and is allowed to reload
datastore.cfg from disk when the cache is missing, the generation
changed or the TTL expired. The helper may bump the datastore
generation if the digest changed.
2) DataStore::drop() passes `false` and only consumes the most recent
cached entry without touching the disk, TTL or generation. If the
cache was never initialised, it returns an error.
This avoids races between Drop and concurrent config changes.
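The two call sites can be modeled with a toy cache. `config_cached` and the
`String` payload here are illustrative stand-ins for the real helper and the
parsed section config, not the patch code itself:

```rust
use std::sync::Mutex;

// Stand-in for DATASTORE_CONFIG_CACHE; a String replaces SectionConfigData.
static CACHE: Mutex<Option<String>> = Mutex::new(None);

// allow_reload = true: lookup path, may populate the cache from "disk".
// allow_reload = false: Drop path, only consumes what is already cached.
fn config_cached(allow_reload: bool) -> Result<String, String> {
    let mut guard = CACHE.lock().unwrap();
    if !allow_reload {
        return guard
            .clone()
            .ok_or_else(|| "datastore config cache not initialized".to_string());
    }
    // Reload path: stand-in for re-reading and parsing datastore.cfg.
    let config = "parsed datastore.cfg".to_string();
    *guard = Some(config.clone());
    Ok(config)
}

fn main() {
    // Before any reload, the read-only path must fail rather than touch disk.
    assert!(config_cached(false).is_err());
    // A lookup with allow_reload = true initializes the cache...
    assert!(config_cached(true).is_ok());
    // ...after which the Drop path can consume it without any I/O.
    assert_eq!(config_cached(false).unwrap(), "parsed datastore.cfg");
}
```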
Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
---
pbs-datastore/src/datastore.rs | 36 ++++++++++++++++++++++++++++++----
1 file changed, 32 insertions(+), 4 deletions(-)
diff --git a/pbs-datastore/src/datastore.rs b/pbs-datastore/src/datastore.rs
index 1711c753..12076f31 100644
--- a/pbs-datastore/src/datastore.rs
+++ b/pbs-datastore/src/datastore.rs
@@ -226,7 +226,7 @@ impl Drop for DataStore {
return;
}
- let (section_config, _gen) = match datastore_section_config_cached() {
+ let (section_config, _gen) = match datastore_section_config_cached(false) {
Ok(v) => v,
Err(err) => {
log::error!(
@@ -299,14 +299,42 @@ impl DatastoreBackend {
}
}
-/// Return the cached datastore SectionConfig and its generation.
-fn datastore_section_config_cached() -> Result<(Arc<SectionConfigData>, Option<usize>), Error> {
+/// Returns the cached `datastore.cfg` and its generation.
+///
+/// When `allow_reload` is `true`, callers are expected to hold the datastore config lock. It may:
+/// - Reload `datastore.cfg` from disk if either
+/// - no cache exists yet, or cache is unavailable
+/// - the cached generation does not match the shared generation
+/// - the cache entry is older than `DATASTORE_CONFIG_CACHE_TTL_SECS`
+/// - Updates the cache with the new config, timestamp and digest.
+/// - Bumps the datastore generation in `ConfigVersionCache` only if
+/// there was a previous cached entry and the digest changed (manual edit or
+/// API write). If the digest is unchanged, the timestamp is refreshed but the
+/// generation is kept to avoid unnecessary invalidations.
+///
+/// When `allow_reload` is `false`:
+/// - Never touches the disk or the shared generation.
+/// - Ignores TTL and simply returns the most recent cached entry if available.
+/// - Returns an error if the cache has not been initialised yet.
+///
+/// Intended for use with `Datastore::drop` where no config lock is held
+/// and eventual stale data is acceptable.
+fn datastore_section_config_cached(
+ allow_reload: bool,
+) -> Result<(Arc<SectionConfigData>, Option<usize>), Error> {
let now = epoch_i64();
let version_cache = ConfigVersionCache::new().ok();
let current_gen = version_cache.as_ref().map(|c| c.datastore_generation());
let mut guard = DATASTORE_CONFIG_CACHE.lock().unwrap();
+ if !allow_reload {
+ if let Some(cache) = guard.as_ref() {
+ return Ok((cache.config.clone(), Some(cache.last_generation)));
+ }
+ bail!("datastore config cache not initialized");
+ }
+
// Fast path: re-use cached datastore.cfg if cache is available, generation matches and TTL not expired
if let (Some(current_gen), Some(config_cache)) = (current_gen, guard.as_ref()) {
let gen_matches = config_cache.last_generation == current_gen;
@@ -423,7 +451,7 @@ impl DataStore {
let _config_lock = pbs_config::datastore::lock_config()?;
// Get the current datastore.cfg generation number and cached config
- let (section_config, gen_num) = datastore_section_config_cached()?;
+ let (section_config, gen_num) = datastore_section_config_cached(true)?;
let datastore_cfg: DataStoreConfig = section_config.lookup("datastore", name)?;
let maintenance_mode = datastore_cfg.get_maintenance_mode();
--
2.47.3
* [pbs-devel] [PATCH proxmox-backup v3 1/6] partial fix #6049: config: enable config version cache for datastore
2025-11-20 13:03 10% [pbs-devel] [PATCH proxmox-backup v3 0/6] " Samuel Rufinatscha
@ 2025-11-20 13:03 17% ` Samuel Rufinatscha
2025-11-20 13:03 12% ` [pbs-devel] [PATCH proxmox-backup v3 2/6] partial fix #6049: datastore: impl ConfigVersionCache fast path for lookups Samuel Rufinatscha
` (6 subsequent siblings)
7 siblings, 0 replies; 200+ results
From: Samuel Rufinatscha @ 2025-11-20 13:03 UTC (permalink / raw)
To: pbs-devel
Repeated /status requests caused lookup_datastore() to re-read and
parse datastore.cfg on every call. The issue was mentioned in report
#6049 [1]. cargo-flamegraph [2] confirmed that the hot path is
dominated by pbs_config::datastore::config() (config parsing).
To solve the issue, this patch prepares the config version cache,
so that datastore config caching can be built on top of it.
This patch specifically:
(1) implements increment function in order to invalidate generations
(2) removes obsolete comments
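A minimal standalone model of the counter pair (the field and method names
are borrowed from the patch; the shared-memory backing is simplified away to
a plain struct for illustration):

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Simplified stand-in for the shmem-backed ConfigVersionCache data.
struct VersionCache {
    datastore_generation: AtomicUsize,
}

impl VersionCache {
    /// Returns the datastore generation number.
    fn datastore_generation(&self) -> usize {
        self.datastore_generation.load(Ordering::Acquire)
    }

    /// Increase the datastore generation number.
    /// fetch_add returns the *previous* value, as in the real helper.
    fn increase_datastore_generation(&self) -> usize {
        self.datastore_generation.fetch_add(1, Ordering::AcqRel)
    }
}

fn main() {
    let cache = VersionCache { datastore_generation: AtomicUsize::new(0) };
    assert_eq!(cache.datastore_generation(), 0);
    let prev = cache.increase_datastore_generation();
    assert_eq!(prev, 0); // previous value is returned
    assert_eq!(cache.datastore_generation(), 1);
}
```

This is why later patches compute the new generation as `prev_gen + 1` after
calling `increase_datastore_generation()`.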
Links
[1] Bugzilla: https://bugzilla.proxmox.com/show_bug.cgi?id=6049
[2] cargo-flamegraph: https://github.com/flamegraph-rs/flamegraph
Fixes: #6049
Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
---
pbs-config/src/config_version_cache.rs | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)
diff --git a/pbs-config/src/config_version_cache.rs b/pbs-config/src/config_version_cache.rs
index e8fb994f..b875f7e0 100644
--- a/pbs-config/src/config_version_cache.rs
+++ b/pbs-config/src/config_version_cache.rs
@@ -26,7 +26,6 @@ struct ConfigVersionCacheDataInner {
// Traffic control (traffic-control.cfg) generation/version.
traffic_control_generation: AtomicUsize,
// datastore (datastore.cfg) generation/version
- // FIXME: remove with PBS 3.0
datastore_generation: AtomicUsize,
// Add further atomics here
}
@@ -145,8 +144,15 @@ impl ConfigVersionCache {
.fetch_add(1, Ordering::AcqRel);
}
+ /// Returns the datastore generation number.
+ pub fn datastore_generation(&self) -> usize {
+ self.shmem
+ .data()
+ .datastore_generation
+ .load(Ordering::Acquire)
+ }
+
/// Increase the datastore generation number.
- // FIXME: remove with PBS 3.0 or make actually useful again in datastore lookup
pub fn increase_datastore_generation(&self) -> usize {
self.shmem
.data()
--
2.47.3
* [pbs-devel] [PATCH proxmox-backup v3 3/6] partial fix #6049: datastore: use config fast-path in Drop
2025-11-20 13:03 10% [pbs-devel] [PATCH proxmox-backup v3 0/6] " Samuel Rufinatscha
2025-11-20 13:03 17% ` [pbs-devel] [PATCH proxmox-backup v3 1/6] partial fix #6049: config: enable config version cache for datastore Samuel Rufinatscha
2025-11-20 13:03 12% ` [pbs-devel] [PATCH proxmox-backup v3 2/6] partial fix #6049: datastore: impl ConfigVersionCache fast path for lookups Samuel Rufinatscha
@ 2025-11-20 13:03 16% ` Samuel Rufinatscha
2025-11-20 13:03 15% ` [pbs-devel] [PATCH proxmox-backup v3 4/6] partial fix #6049: datastore: add TTL fallback to catch manual config edits Samuel Rufinatscha
` (4 subsequent siblings)
7 siblings, 0 replies; 200+ results
From: Samuel Rufinatscha @ 2025-11-20 13:03 UTC (permalink / raw)
To: pbs-devel
The Drop impl of DataStore re-read datastore.cfg to decide whether
the entry should be evicted from the in-process cache (based on
maintenance mode’s clear_from_cache). During the investigation of
issue #6049 [1], a flamegraph [2] showed that the config reload in Drop
accounted for a measurable share of CPU time under load.
This patch routes the Drop impl through the datastore config fast path,
so that determining the maintenance mandate usually avoids an expensive
config reload from disk.
Links
[1] Bugzilla: https://bugzilla.proxmox.com/show_bug.cgi?id=6049
[2] cargo-flamegraph: https://github.com/flamegraph-rs/flamegraph
Fixes: #6049
Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
---
pbs-datastore/src/datastore.rs | 43 +++++++++++++++++++++++++++-------
1 file changed, 34 insertions(+), 9 deletions(-)
diff --git a/pbs-datastore/src/datastore.rs b/pbs-datastore/src/datastore.rs
index 8c687097..1494521c 100644
--- a/pbs-datastore/src/datastore.rs
+++ b/pbs-datastore/src/datastore.rs
@@ -216,15 +216,40 @@ impl Drop for DataStore {
// remove datastore from cache iff
// - last task finished, and
// - datastore is in a maintenance mode that mandates it
- let remove_from_cache = last_task
- && pbs_config::datastore::config()
- .and_then(|(s, _)| s.lookup::<DataStoreConfig>("datastore", self.name()))
- .is_ok_and(|c| {
- c.get_maintenance_mode()
- .is_some_and(|m| m.clear_from_cache())
- });
-
- if remove_from_cache {
+
+ // first check: did the last task finish?
+ if !last_task {
+ return;
+ }
+
+ let (section_config, _gen) = match datastore_section_config_cached() {
+ Ok(v) => v,
+ Err(err) => {
+ log::error!(
+ "failed to load datastore config in Drop for {} - {err}",
+ self.name()
+ );
+ return;
+ }
+ };
+
+ let datastore_cfg: DataStoreConfig =
+ match section_config.lookup("datastore", self.name()) {
+ Ok(cfg) => cfg,
+ Err(err) => {
+ log::error!(
+ "failed to look up datastore '{}' in Drop - {err}",
+ self.name()
+ );
+ return;
+ }
+ };
+
+ // second check: does the maintenance mode mandate eviction?
+ if datastore_cfg
+ .get_maintenance_mode()
+ .is_some_and(|m| m.clear_from_cache())
+ {
DATASTORE_MAP.lock().unwrap().remove(self.name());
}
}
--
2.47.3
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
^ permalink raw reply [relevance 16%]
* Re: [pbs-devel] [PATCH proxmox-backup v2 4/4] partial fix #6049: datastore: add TTL fallback to catch manual config edits
2025-11-19 13:24 5% ` Fabian Grünbichler
@ 2025-11-19 17:25 6% ` Samuel Rufinatscha
0 siblings, 0 replies; 200+ results
From: Samuel Rufinatscha @ 2025-11-19 17:25 UTC (permalink / raw)
To: Proxmox Backup Server development discussion, Fabian Grünbichler
Comments inline.
On 11/19/25 2:24 PM, Fabian Grünbichler wrote:
> On November 14, 2025 4:05 pm, Samuel Rufinatscha wrote:
>> The lookup fast path reacts to API-driven config changes because
>> save_config() bumps the generation. Manual edits of datastore.cfg do
>> not bump the counter. To keep the system robust against such edits
>> without reintroducing config reading and hashing on the hot path, this
>> patch adds a TTL to the cache entry.
>>
>> If the cached config is older than
>> DATASTORE_CONFIG_CACHE_TTL_SECS (set to 60s), the next lookup takes
>> the slow path and refreshes the cached entry. Within
>> the TTL window, unchanged generations still use the fast path.
>>
>> Links
>>
>> [1] cargo-flamegraph: https://github.com/flamegraph-rs/flamegraph
>>
>> Refs: #6049
>> Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
>> ---
>> pbs-datastore/src/datastore.rs | 46 +++++++++++++++++++++++++---------
>> 1 file changed, 34 insertions(+), 12 deletions(-)
>>
>> diff --git a/pbs-datastore/src/datastore.rs b/pbs-datastore/src/datastore.rs
>> index 0fabf592..7a18435c 100644
>> --- a/pbs-datastore/src/datastore.rs
>> +++ b/pbs-datastore/src/datastore.rs
>> @@ -22,7 +22,7 @@ use proxmox_sys::error::SysError;
>> use proxmox_sys::fs::{file_read_optional_string, replace_file, CreateOptions};
>> use proxmox_sys::linux::procfs::MountInfo;
>> use proxmox_sys::process_locker::{ProcessLockExclusiveGuard, ProcessLockSharedGuard};
>> -use proxmox_time::TimeSpan;
>> +use proxmox_time::{epoch_i64, TimeSpan};
>> use proxmox_worker_task::WorkerTaskContext;
>>
>> use pbs_api_types::{
>> @@ -53,6 +53,8 @@ struct DatastoreConfigCache {
>> config: Arc<SectionConfigData>,
>> // Generation number from ConfigVersionCache
>> last_generation: usize,
>> + // Last update time (epoch seconds)
>> + last_update: i64,
>> }
>>
>> static DATASTORE_CONFIG_CACHE: LazyLock<Mutex<Option<DatastoreConfigCache>>> =
>> @@ -61,6 +63,8 @@ static DATASTORE_CONFIG_CACHE: LazyLock<Mutex<Option<DatastoreConfigCache>>> =
>> static DATASTORE_MAP: LazyLock<Mutex<HashMap<String, Arc<DataStoreImpl>>>> =
>> LazyLock::new(|| Mutex::new(HashMap::new()));
>>
>> +/// Max age in seconds to reuse the cached datastore config.
>> +const DATASTORE_CONFIG_CACHE_TTL_SECS: i64 = 60;
>> /// Filename to store backup group notes
>> pub const GROUP_NOTES_FILE_NAME: &str = "notes";
>> /// Filename to store backup group owner
>> @@ -295,16 +299,22 @@ impl DatastoreBackend {
>>
>> /// Return the cached datastore SectionConfig and its generation.
>> fn datastore_section_config_cached() -> Result<(Arc<SectionConfigData>, Option<usize>), Error> {
>> - let gen = ConfigVersionCache::new()
>> - .ok()
>> - .map(|c| c.datastore_generation());
>> + let now = epoch_i64();
>> + let version_cache = ConfigVersionCache::new().ok();
>> + let current_gen = version_cache.as_ref().map(|c| c.datastore_generation());
>>
>> let mut guard = DATASTORE_CONFIG_CACHE.lock().unwrap();
>>
>> - // Fast path: re-use cached datastore.cfg
>> - if let (Some(gen), Some(cache)) = (gen, guard.as_ref()) {
>> - if cache.last_generation == gen {
>> - return Ok((cache.config.clone(), Some(gen)));
>> + // Fast path: re-use cached datastore.cfg if cache is available, generation matches and TTL not expired
>> + if let (Some(current_gen), Some(config_cache)) = (current_gen, guard.as_ref()) {
>> + let gen_matches = config_cache.last_generation == current_gen;
>> + let ttl_ok = (now - config_cache.last_update) < DATASTORE_CONFIG_CACHE_TTL_SECS;
>> +
>> + if gen_matches && ttl_ok {
>> + return Ok((
>> + config_cache.config.clone(),
>> + Some(config_cache.last_generation),
>> + ));
>> }
>> }
>>
>> @@ -312,16 +322,28 @@ fn datastore_section_config_cached() -> Result<(Arc<SectionConfigData>, Option<u
>> let (config_raw, _digest) = pbs_config::datastore::config()?;
>> let config = Arc::new(config_raw);
>>
>> - if let Some(gen_val) = gen {
>> + // Update cache
>> + let new_gen = if let Some(handle) = version_cache {
>> + // Bump datastore generation whenever we reload the config.
>> + // This ensures that Drop handlers will detect that a newer config exists
>> + // and will not rely on a stale cached entry for maintenance mandate.
>> + let prev_gen = handle.increase_datastore_generation();
>
> this could be optimized (further) if we keep the digest when we
> load+parse the config above, because we only need to bump the generation
> if the digest changed. we need to bump the timestamp always of course ;)
> also we only want to bump if we previously had a generation saved, if we
> didn't, then this is the first load and bumping is meaningless anyway..
>
Good point, I think this would be a great optimization: the TTL would
then only invalidate cached DataStoreImpls when the config actually
changed manually. Will add!
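For reference, the digest-gated bump could look roughly like this. This
is only an illustrative sketch: `CachedConfig`, `next_generation` and
the `bump` closure are hypothetical stand-ins for the real cache entry
and `ConfigVersionCache::increase_datastore_generation()`, not the
actual PBS types.

```rust
// Sketch: only bump the shared generation when the reloaded config's
// digest actually differs from the cached one; on a first load, bumping
// is meaningless and the current shared generation is reused as-is.

struct CachedConfig {
    digest: [u8; 32], // digest kept from the last load+parse
    generation: usize,
}

/// Pick the generation for a freshly reloaded config:
/// - unchanged digest: keep the old generation (caller only refreshes
///   the TTL timestamp)
/// - changed digest: bump (the closure returns the previous value)
/// - no previous entry: first load, just use the current generation
fn next_generation(
    prev: Option<&CachedConfig>,
    new_digest: &[u8; 32],
    current_gen: usize,
    bump: impl FnOnce() -> usize,
) -> usize {
    match prev {
        Some(c) if &c.digest == new_digest => c.generation,
        Some(_) => bump() + 1,
        None => current_gen,
    }
}

fn main() {
    let prev = CachedConfig { digest: [1; 32], generation: 7 };
    // unchanged content: generation stays, no shared-memory bump
    assert_eq!(next_generation(Some(&prev), &[1; 32], 7, || unreachable!()), 7);
    // changed content: bump runs, new generation is previous + 1
    assert_eq!(next_generation(Some(&prev), &[2; 32], 7, || 7), 8);
    // first load: no bump, reuse the current generation
    assert_eq!(next_generation(None, &[3; 32], 5, || unreachable!()), 5);
    println!("ok");
}
```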
> but there is another issue here - this is now called in the Drop
> handler, where we don't hold the config lock, so we have no guard
> against a parallel config change API call that also bumps the generation
> between us reloading and us bumping here.. which means we could have a
> mismatch between the value in new_gen and the actual config we loaded..
>
> I think we need to extend this helper here with a bool flag that
> determines whether we want to reload if the TTL expired, or return
> potentially outdated information? *every* lookup will handle the TTL
> anyway (by setting that parameter), so I think just fetching the
> "freshest" info we can get without reloading (by not setting it) is fine
> for the Drop handler..
>
Good point, will add the flag!
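One possible reading of the proposed flag, reduced to a pure decision
helper (names hypothetical, not the actual helper signature):

```rust
// Sketch: should datastore_section_config_cached() take the slow
// (reload) path? Lookup callers pass `reload_if_stale = true`; the Drop
// handler passes `false` and accepts the freshest cached data without
// reloading, since it holds no config lock and bumping there can race
// with a parallel config change.
fn must_reload(
    cache_present: bool,
    gen_matches: bool,
    ttl_expired: bool,
    reload_if_stale: bool,
) -> bool {
    if !cache_present || !gen_matches {
        return true; // no usable cache entry at all
    }
    // entry is consistent; only reload on TTL expiry when the caller
    // opted in (lookup path), never from the Drop handler
    ttl_expired && reload_if_stale
}

fn main() {
    // lookup path: an expired TTL forces a reload
    assert!(must_reload(true, true, true, true));
    // Drop handler: an expired TTL is tolerated, no reload
    assert!(!must_reload(true, true, true, false));
    // a generation mismatch always reloads, regardless of caller
    assert!(must_reload(true, false, false, false));
    println!("ok");
}
```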
>> + let new_gen = prev_gen + 1;
>> +
>> *guard = Some(DatastoreConfigCache {
>> config: config.clone(),
>> - last_generation: gen_val,
>> + last_generation: new_gen,
>> + last_update: now,
>> });
>> +
>> + Some(new_gen)
>> } else {
>> + // if the cache was not available, use again the slow path next time
>> *guard = None;
>> - }
>> + None
>> + };
>>
>> - Ok((config, gen))
>> + Ok((config, new_gen))
>> }
>>
>> impl DataStore {
>> --
>> 2.47.3
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
^ permalink raw reply [relevance 6%]
* Re: [pbs-devel] [PATCH proxmox-backup v2 2/4] partial fix #6049: datastore: impl ConfigVersionCache fast path for lookups
2025-11-14 15:05 12% ` [pbs-devel] [PATCH proxmox-backup v2 2/4] partial fix #6049: datastore: impl ConfigVersionCache fast path for lookups Samuel Rufinatscha
@ 2025-11-19 13:24 5% ` Fabian Grünbichler
0 siblings, 0 replies; 200+ results
From: Fabian Grünbichler @ 2025-11-19 13:24 UTC (permalink / raw)
To: Proxmox Backup Server development discussion
On November 14, 2025 4:05 pm, Samuel Rufinatscha wrote:
> Repeated /status requests caused lookup_datastore() to re-read and
> parse datastore.cfg on every call. The issue was mentioned in report
> #6049 [1]. cargo-flamegraph [2] confirmed that the hot path is
> dominated by pbs_config::datastore::config() (config parsing).
>
> This patch implements caching of the global datastore.cfg using the
> generation numbers from the shared config version cache. It caches the
> datastore.cfg along with the generation number and, when a subsequent
> lookup sees the same generation, it reuses the cached config without
> re-reading it from disk. If the generation differs
> (or the cache is unavailable), it falls back to the existing slow path
> with no behavioral changes.
>
> Behavioral notes
>
> - The generation is bumped via the existing save_config() path, so
> API-driven config changes are detected immediately.
> - Manual edits to datastore.cfg are not detected; a TTL
> guard is introduced in a dedicated patch in this series.
> - DataStore::drop still performs a config read on the common path,
> this is covered in a dedicated patch in this series.
>
> Links
>
> [1] Bugzilla: https://bugzilla.proxmox.com/show_bug.cgi?id=6049
> [2] cargo-flamegraph: https://github.com/flamegraph-rs/flamegraph
>
> Fixes: #6049
> Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
> ---
> pbs-datastore/Cargo.toml | 1 +
> pbs-datastore/src/datastore.rs | 120 +++++++++++++++++++++++----------
> 2 files changed, 87 insertions(+), 34 deletions(-)
>
> diff --git a/pbs-datastore/Cargo.toml b/pbs-datastore/Cargo.toml
> index 8ce930a9..42f49a7b 100644
> --- a/pbs-datastore/Cargo.toml
> +++ b/pbs-datastore/Cargo.toml
> @@ -40,6 +40,7 @@ proxmox-io.workspace = true
> proxmox-lang.workspace=true
> proxmox-s3-client = { workspace = true, features = [ "impl" ] }
> proxmox-schema = { workspace = true, features = [ "api-macro" ] }
> +proxmox-section-config.workspace = true
> proxmox-serde = { workspace = true, features = [ "serde_json" ] }
> proxmox-sys.workspace = true
> proxmox-systemd.workspace = true
> diff --git a/pbs-datastore/src/datastore.rs b/pbs-datastore/src/datastore.rs
> index 031fa958..e7748872 100644
> --- a/pbs-datastore/src/datastore.rs
> +++ b/pbs-datastore/src/datastore.rs
> @@ -32,7 +32,8 @@ use pbs_api_types::{
> MaintenanceType, Operation, UPID,
> };
> use pbs_config::s3::S3_CFG_TYPE_ID;
> -use pbs_config::BackupLockGuard;
> +use pbs_config::{BackupLockGuard, ConfigVersionCache};
> +use proxmox_section_config::SectionConfigData;
>
> use crate::backup_info::{
> BackupDir, BackupGroup, BackupInfo, OLD_LOCKING, PROTECTED_MARKER_FILENAME,
> @@ -46,6 +47,17 @@ use crate::s3::S3_CONTENT_PREFIX;
> use crate::task_tracking::{self, update_active_operations};
> use crate::{DataBlob, LocalDatastoreLruCache};
>
> +// Cache for fully parsed datastore.cfg
> +struct DatastoreConfigCache {
> + // Parsed datastore.cfg file
> + config: Arc<SectionConfigData>,
> + // Generation number from ConfigVersionCache
> + last_generation: usize,
> +}
> +
> +static DATASTORE_CONFIG_CACHE: LazyLock<Mutex<Option<DatastoreConfigCache>>> =
> + LazyLock::new(|| Mutex::new(None));
> +
> static DATASTORE_MAP: LazyLock<Mutex<HashMap<String, Arc<DataStoreImpl>>>> =
> LazyLock::new(|| Mutex::new(HashMap::new()));
>
> @@ -140,10 +152,12 @@ pub struct DataStoreImpl {
> last_gc_status: Mutex<GarbageCollectionStatus>,
> verify_new: bool,
> chunk_order: ChunkOrder,
> - last_digest: Option<[u8; 32]>,
> sync_level: DatastoreFSyncLevel,
> backend_config: DatastoreBackendConfig,
> lru_store_caching: Option<LocalDatastoreLruCache>,
> + /// Datastore generation number from `ConfigVersionCache` at creation time, used to
> + /// validate reuse of this cached `DataStoreImpl`.
> + config_generation: Option<usize>,
> }
>
> impl DataStoreImpl {
> @@ -156,10 +170,10 @@ impl DataStoreImpl {
> last_gc_status: Mutex::new(GarbageCollectionStatus::default()),
> verify_new: false,
> chunk_order: Default::default(),
> - last_digest: None,
> sync_level: Default::default(),
> backend_config: Default::default(),
> lru_store_caching: None,
> + config_generation: None,
> })
> }
> }
> @@ -254,6 +268,37 @@ impl DatastoreBackend {
> }
> }
>
> +/// Return the cached datastore SectionConfig and its generation.
> +fn datastore_section_config_cached() -> Result<(Arc<SectionConfigData>, Option<usize>), Error> {
> + let gen = ConfigVersionCache::new()
> + .ok()
> + .map(|c| c.datastore_generation());
> +
> + let mut guard = DATASTORE_CONFIG_CACHE.lock().unwrap();
> +
> + // Fast path: re-use cached datastore.cfg
> + if let (Some(gen), Some(cache)) = (gen, guard.as_ref()) {
> + if cache.last_generation == gen {
> + return Ok((cache.config.clone(), Some(gen)));
> + }
> + }
> +
> + // Slow path: re-read datastore.cfg
> + let (config_raw, _digest) = pbs_config::datastore::config()?;
> + let config = Arc::new(config_raw);
> +
> + if let Some(gen_val) = gen {
> + *guard = Some(DatastoreConfigCache {
> + config: config.clone(),
> + last_generation: gen_val,
> + });
> + } else {
> + *guard = None;
> + }
> +
> + Ok((config, gen))
I think this would be more readable (especially with the extensions
coming later, and with what I propose in my reply to patch #4) if
ordered like this:
let mut guard = DATASTORE_CONFIG_CACHE.lock().unwrap();
if let Ok(version_cache) = ConfigVersionCache::new() {
let gen = version_cache.datastore_generation();
if let Some(cached) = guard.as_ref() {
// Fast path: re-use cached datastore.cfg
if gen == cached.last_generation {
return Ok((cached.config.clone(), Some(gen)));
}
}
// Slow path: re-read datastore.cfg
let (config_raw, _digest) = pbs_config::datastore::config()?;
let config = Arc::new(config_raw);
*guard = Some(DatastoreConfigCache {
config: config.clone(),
last_generation: gen,
});
Ok((config, Some(gen)))
} else {
// Fallback path, no config version cache: read datastore.cfg
*guard = None;
let (config_raw, _digest) = pbs_config::datastore::config()?;
Ok((Arc::new(config_raw), None))
}
with the later changes it would then look like this (but this still has
the issues I mentioned in my comment to patch #4 ;)):
let mut guard = DATASTORE_CONFIG_CACHE.lock().unwrap();
if let Some(version_cache) = ConfigVersionCache::new().ok() {
let now = epoch_i64();
let current_gen = version_cache.datastore_generation();
if let Some(cached) = guard.as_ref() {
// Fast path: re-use cached datastore.cfg if cache is available, generation matches and TTL not expired
if cached.last_generation == current_gen
&& now - cached.last_update < DATASTORE_CONFIG_CACHE_TTL_SECS
{
return Ok((cached.config.clone(), Some(cached.last_generation)));
}
}
// Slow path: re-read datastore.cfg
let (config_raw, _digest) = pbs_config::datastore::config()?;
let config = Arc::new(config_raw);
// Bump datastore generation whenever we reload the config.
// This ensures that Drop handlers will detect that a newer config exists
// and will not rely on a stale cached entry for maintenance mandate.
let prev_gen = version_cache.increase_datastore_generation();
let new_gen = prev_gen + 1;
// Update cache
*guard = Some(DatastoreConfigCache {
config: config.clone(),
last_generation: new_gen,
last_update: now,
});
Ok((config, Some(new_gen)))
} else {
// Fallback path, no config version cache: read datastore.cfg
*guard = None;
let (config_raw, _digest) = pbs_config::datastore::config()?;
Ok((Arc::new(config_raw), None))
}
technically setting the guard to None in the else branch is not needed,
since if we ever get an Ok result back it has been initialized and
subsequent calls cannot fail..
> +}
> +
> impl DataStore {
> // This one just panics on everything
> #[doc(hidden)]
> @@ -325,56 +370,63 @@ impl DataStore {
> name: &str,
> operation: Option<Operation>,
> ) -> Result<Arc<DataStore>, Error> {
> - // Avoid TOCTOU between checking maintenance mode and updating active operation counter, as
> - // we use it to decide whether it is okay to delete the datastore.
> + // Avoid TOCTOU between checking maintenance mode and updating active operations.
> let _config_lock = pbs_config::datastore::lock_config()?;
>
> - // we could use the ConfigVersionCache's generation for staleness detection, but we load
> - // the config anyway -> just use digest, additional benefit: manual changes get detected
> - let (config, digest) = pbs_config::datastore::config()?;
> - let config: DataStoreConfig = config.lookup("datastore", name)?;
> + // Get the current datastore.cfg generation number and cached config
> + let (section_config, gen_num) = datastore_section_config_cached()?;
> +
> + let datastore_cfg: DataStoreConfig = section_config.lookup("datastore", name)?;
> + let maintenance_mode = datastore_cfg.get_maintenance_mode();
> + let mount_status = get_datastore_mount_status(&datastore_cfg);
>
> - if let Some(maintenance_mode) = config.get_maintenance_mode() {
> - if let Err(error) = maintenance_mode.check(operation) {
> + if let Some(mm) = &maintenance_mode {
> + if let Err(error) = mm.check(operation.clone()) {
> bail!("datastore '{name}' is unavailable: {error}");
> }
> }
>
> - if get_datastore_mount_status(&config) == Some(false) {
> - let mut datastore_cache = DATASTORE_MAP.lock().unwrap();
> - datastore_cache.remove(&config.name);
> - bail!("datastore '{}' is not mounted", config.name);
> + let mut datastore_cache = DATASTORE_MAP.lock().unwrap();
> +
> + if mount_status == Some(false) {
> + datastore_cache.remove(&datastore_cfg.name);
> + bail!("datastore '{}' is not mounted", datastore_cfg.name);
> }
>
> - let mut datastore_cache = DATASTORE_MAP.lock().unwrap();
> - let entry = datastore_cache.get(name);
> -
> - // reuse chunk store so that we keep using the same process locker instance!
> - let chunk_store = if let Some(datastore) = &entry {
> - let last_digest = datastore.last_digest.as_ref();
> - if let Some(true) = last_digest.map(|last_digest| last_digest == &digest) {
> - if let Some(operation) = operation {
> - update_active_operations(name, operation, 1)?;
> + // Re-use DataStoreImpl
> + if let Some(existing) = datastore_cache.get(name).cloned() {
> + if let (Some(last_generation), Some(gen_num)) = (existing.config_generation, gen_num) {
> + if last_generation == gen_num {
> + if let Some(op) = operation {
> + update_active_operations(name, op, 1)?;
> + }
> +
> + return Ok(Arc::new(Self {
> + inner: existing,
> + operation,
> + }));
> }
> - return Ok(Arc::new(Self {
> - inner: Arc::clone(datastore),
> - operation,
> - }));
> }
> - Arc::clone(&datastore.chunk_store)
> + }
> +
> + // (Re)build DataStoreImpl
> +
> + // Reuse chunk store so that we keep using the same process locker instance!
> + let chunk_store = if let Some(existing) = datastore_cache.get(name) {
> + Arc::clone(&existing.chunk_store)
> } else {
> let tuning: DatastoreTuning = serde_json::from_value(
> DatastoreTuning::API_SCHEMA
> - .parse_property_string(config.tuning.as_deref().unwrap_or(""))?,
> + .parse_property_string(datastore_cfg.tuning.as_deref().unwrap_or(""))?,
> )?;
> Arc::new(ChunkStore::open(
> name,
> - config.absolute_path(),
> + datastore_cfg.absolute_path(),
> tuning.sync_level.unwrap_or_default(),
> )?)
> };
>
> - let datastore = DataStore::with_store_and_config(chunk_store, config, Some(digest))?;
> + let datastore = DataStore::with_store_and_config(chunk_store, datastore_cfg, gen_num)?;
>
> let datastore = Arc::new(datastore);
> datastore_cache.insert(name.to_string(), datastore.clone());
> @@ -476,7 +528,7 @@ impl DataStore {
> fn with_store_and_config(
> chunk_store: Arc<ChunkStore>,
> config: DataStoreConfig,
> - last_digest: Option<[u8; 32]>,
> + generation: Option<usize>,
> ) -> Result<DataStoreImpl, Error> {
> let mut gc_status_path = chunk_store.base_path();
> gc_status_path.push(".gc-status");
> @@ -536,10 +588,10 @@ impl DataStore {
> last_gc_status: Mutex::new(gc_status),
> verify_new: config.verify_new.unwrap_or(false),
> chunk_order: tuning.chunk_order.unwrap_or_default(),
> - last_digest,
> sync_level: tuning.sync_level.unwrap_or_default(),
> backend_config,
> lru_store_caching,
> + config_generation: generation,
> })
> }
>
> --
> 2.47.3
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
^ permalink raw reply [relevance 5%]
* Re: [pbs-devel] [PATCH proxmox-backup v2 4/4] partial fix #6049: datastore: add TTL fallback to catch manual config edits
2025-11-14 15:05 15% ` [pbs-devel] [PATCH proxmox-backup v2 4/4] partial fix #6049: datastore: add TTL fallback to catch manual config edits Samuel Rufinatscha
@ 2025-11-19 13:24 5% ` Fabian Grünbichler
2025-11-19 17:25 6% ` Samuel Rufinatscha
0 siblings, 1 reply; 200+ results
From: Fabian Grünbichler @ 2025-11-19 13:24 UTC (permalink / raw)
To: Proxmox Backup Server development discussion
On November 14, 2025 4:05 pm, Samuel Rufinatscha wrote:
> The lookup fast path reacts to API-driven config changes because
> save_config() bumps the generation. Manual edits of datastore.cfg do
> not bump the counter. To keep the system robust against such edits
> without reintroducing config reading and hashing on the hot path, this
> patch adds a TTL to the cache entry.
>
> If the cached config is older than
> DATASTORE_CONFIG_CACHE_TTL_SECS (set to 60s), the next lookup takes
> the slow path and refreshes the cached entry. Within
> the TTL window, unchanged generations still use the fast path.
>
> Links
>
> [1] cargo-flamegraph: https://github.com/flamegraph-rs/flamegraph
>
> Refs: #6049
> Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
> ---
> pbs-datastore/src/datastore.rs | 46 +++++++++++++++++++++++++---------
> 1 file changed, 34 insertions(+), 12 deletions(-)
>
> diff --git a/pbs-datastore/src/datastore.rs b/pbs-datastore/src/datastore.rs
> index 0fabf592..7a18435c 100644
> --- a/pbs-datastore/src/datastore.rs
> +++ b/pbs-datastore/src/datastore.rs
> @@ -22,7 +22,7 @@ use proxmox_sys::error::SysError;
> use proxmox_sys::fs::{file_read_optional_string, replace_file, CreateOptions};
> use proxmox_sys::linux::procfs::MountInfo;
> use proxmox_sys::process_locker::{ProcessLockExclusiveGuard, ProcessLockSharedGuard};
> -use proxmox_time::TimeSpan;
> +use proxmox_time::{epoch_i64, TimeSpan};
> use proxmox_worker_task::WorkerTaskContext;
>
> use pbs_api_types::{
> @@ -53,6 +53,8 @@ struct DatastoreConfigCache {
> config: Arc<SectionConfigData>,
> // Generation number from ConfigVersionCache
> last_generation: usize,
> + // Last update time (epoch seconds)
> + last_update: i64,
> }
>
> static DATASTORE_CONFIG_CACHE: LazyLock<Mutex<Option<DatastoreConfigCache>>> =
> @@ -61,6 +63,8 @@ static DATASTORE_CONFIG_CACHE: LazyLock<Mutex<Option<DatastoreConfigCache>>> =
> static DATASTORE_MAP: LazyLock<Mutex<HashMap<String, Arc<DataStoreImpl>>>> =
> LazyLock::new(|| Mutex::new(HashMap::new()));
>
> +/// Max age in seconds to reuse the cached datastore config.
> +const DATASTORE_CONFIG_CACHE_TTL_SECS: i64 = 60;
> /// Filename to store backup group notes
> pub const GROUP_NOTES_FILE_NAME: &str = "notes";
> /// Filename to store backup group owner
> @@ -295,16 +299,22 @@ impl DatastoreBackend {
>
> /// Return the cached datastore SectionConfig and its generation.
> fn datastore_section_config_cached() -> Result<(Arc<SectionConfigData>, Option<usize>), Error> {
> - let gen = ConfigVersionCache::new()
> - .ok()
> - .map(|c| c.datastore_generation());
> + let now = epoch_i64();
> + let version_cache = ConfigVersionCache::new().ok();
> + let current_gen = version_cache.as_ref().map(|c| c.datastore_generation());
>
> let mut guard = DATASTORE_CONFIG_CACHE.lock().unwrap();
>
> - // Fast path: re-use cached datastore.cfg
> - if let (Some(gen), Some(cache)) = (gen, guard.as_ref()) {
> - if cache.last_generation == gen {
> - return Ok((cache.config.clone(), Some(gen)));
> + // Fast path: re-use cached datastore.cfg if cache is available, generation matches and TTL not expired
> + if let (Some(current_gen), Some(config_cache)) = (current_gen, guard.as_ref()) {
> + let gen_matches = config_cache.last_generation == current_gen;
> + let ttl_ok = (now - config_cache.last_update) < DATASTORE_CONFIG_CACHE_TTL_SECS;
> +
> + if gen_matches && ttl_ok {
> + return Ok((
> + config_cache.config.clone(),
> + Some(config_cache.last_generation),
> + ));
> }
> }
>
> @@ -312,16 +322,28 @@ fn datastore_section_config_cached() -> Result<(Arc<SectionConfigData>, Option<u
> let (config_raw, _digest) = pbs_config::datastore::config()?;
> let config = Arc::new(config_raw);
>
> - if let Some(gen_val) = gen {
> + // Update cache
> + let new_gen = if let Some(handle) = version_cache {
> + // Bump datastore generation whenever we reload the config.
> + // This ensures that Drop handlers will detect that a newer config exists
> + // and will not rely on a stale cached entry for maintenance mandate.
> + let prev_gen = handle.increase_datastore_generation();
this could be optimized (further) if we keep the digest when we
load+parse the config above, because we only need to bump the generation
if the digest changed. we need to bump the timestamp always of course ;)
also we only want to bump if we previously had a generation saved, if we
didn't, then this is the first load and bumping is meaningless anyway..
but there is another issue here - this is now called in the Drop
handler, where we don't hold the config lock, so we have no guard
against a parallel config change API call that also bumps the generation
between us reloading and us bumping here.. which means we could have a
mismatch between the value in new_gen and the actual config we loaded..
I think we need to extend this helper here with a bool flag that
determines whether we want to reload if the TTL expired, or return
potentially outdated information? *every* lookup will handle the TTL
anyway (by setting that parameter), so I think just fetching the
"freshest" info we can get without reloading (by not setting it) is fine
for the Drop handler..
> + let new_gen = prev_gen + 1;
> +
> *guard = Some(DatastoreConfigCache {
> config: config.clone(),
> - last_generation: gen_val,
> + last_generation: new_gen,
> + last_update: now,
> });
> +
> + Some(new_gen)
> } else {
> + // if the cache was not available, use again the slow path next time
> *guard = None;
> - }
> + None
> + };
>
> - Ok((config, gen))
> + Ok((config, new_gen))
> }
>
> impl DataStore {
> --
> 2.47.3
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
^ permalink raw reply [relevance 5%]
* Re: [pbs-devel] [PATCH proxmox-backup] task tracking: fix adding new entry if other PID is tracked
@ 2025-11-17 8:41 13% ` Samuel Rufinatscha
0 siblings, 0 replies; 200+ results
From: Samuel Rufinatscha @ 2025-11-17 8:41 UTC (permalink / raw)
To: pbs-devel
Looks good to me!
Reviewed-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
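To make the fixed case concrete, a small self-contained sketch
(simplified types, illustrative only, not the real TaskOperations
handling): with an entry for another still-running PID in the file, the
list is non-empty, so the old `updated_tasks.is_empty()` check never
fired and the current PID was silently dropped.

```rust
// Simplified model of the tracking-file update logic after the patch:
// track whether we updated an entry for *our* PID, instead of checking
// whether the whole list is empty.

#[derive(Debug, PartialEq)]
struct Entry {
    pid: u32,
    reads: i64,
}

fn update(mut tasks: Vec<Entry>, pid: u32, count: i64) -> Vec<Entry> {
    let mut found_entry = false;
    for task in tasks.iter_mut() {
        if task.pid == pid {
            found_entry = true;
            task.reads += count;
        }
    }
    // pre-patch condition was `tasks.is_empty()`, which misses the
    // case where only *other* PIDs are tracked
    if !found_entry {
        tasks.push(Entry { pid, reads: count });
    }
    tasks
}

fn main() {
    // another PID (42) is already tracked; our PID (100) must still be added
    let tasks = update(vec![Entry { pid: 42, reads: 3 }], 100, 1);
    assert_eq!(tasks.len(), 2);
    assert_eq!(tasks[1], Entry { pid: 100, reads: 1 });
    // an existing entry for our own PID is updated in place
    let tasks = update(tasks, 100, 1);
    assert_eq!(tasks.len(), 2);
    assert_eq!(tasks[1].reads, 2);
    println!("ok");
}
```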
On 11/12/25 2:15 PM, Fabian Grünbichler wrote:
> if the tracking file contains an entry for another, still running PID, that
> entry must be preserved, but a new entry for the current PID should still be
> inserted..
>
> Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
> ---
> found while benchmarking Samuel's datastore lookup caching series..
>
> pbs-datastore/src/task_tracking.rs | 4 +++-
> 1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/pbs-datastore/src/task_tracking.rs b/pbs-datastore/src/task_tracking.rs
> index 77851cab6..44a4522dc 100644
> --- a/pbs-datastore/src/task_tracking.rs
> +++ b/pbs-datastore/src/task_tracking.rs
> @@ -108,6 +108,7 @@ pub fn update_active_operations(
> Operation::Write => ActiveOperationStats { read: 0, write: 1 },
> Operation::Lookup => ActiveOperationStats { read: 0, write: 0 },
> };
> + let mut found_entry = false;
> let mut updated_tasks: Vec<TaskOperations> = match file_read_optional_string(&path)? {
> Some(data) => serde_json::from_str::<Vec<TaskOperations>>(&data)?
> .iter_mut()
> @@ -116,6 +117,7 @@ pub fn update_active_operations(
> Some(stat) if pid == task.pid && stat.starttime != task.starttime => None,
> Some(_) => {
> if pid == task.pid {
> + found_entry = true;
> match operation {
> Operation::Read => task.active_operations.read += count,
> Operation::Write => task.active_operations.write += count,
> @@ -132,7 +134,7 @@ pub fn update_active_operations(
> None => Vec::new(),
> };
>
> - if updated_tasks.is_empty() {
> + if !found_entry {
> updated_tasks.push(TaskOperations {
> pid,
> starttime,
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
^ permalink raw reply [relevance 13%]
* [pbs-devel] superseded: [PATCH proxmox-backup 0/3] datastore: remove config reload on hot path
@ 2025-11-14 15:08 13% ` Samuel Rufinatscha
0 siblings, 0 replies; 200+ results
From: Samuel Rufinatscha @ 2025-11-14 15:08 UTC (permalink / raw)
To: pbs-devel
https://lore.proxmox.com/pbs-devel/20251114150544.224839-1-s.rufinatscha@proxmox.com/T/#t
On 11/11/25 1:29 PM, Samuel Rufinatscha wrote:
> Hi,
>
> this series reduces CPU time in datastore lookups by avoiding repeated
> datastore.cfg reads/parses in both `lookup_datastore()` and
> `DataStore::Drop`. It also adds a TTL so manual config edits are
> noticed without reintroducing hashing on every request.
>
> While investigating #6049 [1], cargo-flamegraph [2] showed hotspots during
> repeated `/status` calls in `lookup_datastore()` and in `Drop`,
> dominated by `pbs_config::datastore::config()` (config parse).
>
> The parsing cost itself deserves a separate investigation in a future
> effort. Furthermore, cargo-flamegraph showed that when using a
> token-based auth method to access the API, a significant amount of time
> is spent in validation on every request, likely related to bcrypt; this,
> too, should be revisited in a future effort.
>
> ## Approach
>
> [PATCH 1/3] Fast path for datastore lookups
> Use the shared-memory `ConfigVersionCache` generation for `datastore.cfg`.
> Tag each cached `DataStoreImpl` with the last seen generation; when it
> matches, reuse the cached instance. Fall back to the existing slow path
> on mismatch or when the cache is unavailable.
>
> [PATCH 2/3] Fast path for `Drop`
> Reuse the maintenance mode eviction decision captured at lookup time,
> removing the config reload from `Drop`.
>
> [PATCH 3/3] TTL to catch manual edits
> If a cached entry is older than `DATASTORE_CONFIG_CACHE_TTL_SECS`
> (default 60s), the next lookup refreshes it via the slow path. This
> detects manual file edits without hashing on every request.
>
> ## Results
>
> End-to-end `/status?verbose=0` (1000 stores, 5 req/store, parallel=16):
>
> Metric Baseline [1/3] [2/3]
> ------------------------------------------------
> Total time 13s 11s 10s
> Throughput (all) 384.62 454.55 500.00
> Cold RPS (round #1) 76.92 90.91 100.00
> Warm RPS (2..N) 307.69 363.64 400.00
>
> Patch 1 improves overall throughput by ~18% (−15% total time). Patch 2
> adds ~10% on top. Patch 3 is a robustness feature; a 0.1 s probe shows
> periodic latency spikes at TTL expiry and flat latencies otherwise.
>
> ## Reproduction steps
>
> VM: 4 vCPU, ~8 GiB RAM, VirtIO-SCSI; disks:
> - scsi0 32G (OS)
> - scsi1 1000G (datastores)
>
> Install PBS from ISO on the VM.
>
> Set up ZFS on /dev/sdb (adjust if different):
>
> zpool create -f -o ashift=12 pbsbench /dev/sdb
> zfs set mountpoint=/pbsbench pbsbench
> zfs create pbsbench/pbs-bench
>
> Raise file-descriptor limit:
>
> sudo systemctl edit proxmox-backup-proxy.service
>
> Add the following lines:
>
> [Service]
> LimitNOFILE=1048576
>
> Reload systemd and restart the proxy:
>
> sudo systemctl daemon-reload
> sudo systemctl restart proxmox-backup-proxy.service
>
> Verify the limit:
>
> systemctl show proxmox-backup-proxy.service | grep LimitNOFILE
>
> Create 1000 ZFS-backed datastores (as used in #6049 [1]):
>
> seq -w 001 1000 | xargs -n1 -P1 bash -c '
> id=$0
> name="ds${id}"
> dataset="pbsbench/pbs-bench/${name}"
> path="/pbsbench/pbs-bench/${name}"
> zfs create -o mountpoint="$path" "$dataset"
> proxmox-backup-manager datastore create "$name" "$path" \
> --comment "ZFS dataset-based datastore"
> '
>
> Build PBS from this series, then manually run the server under
> flamegraph:
>
> systemctl stop proxmox-backup-proxy
> cargo flamegraph --release --bin proxmox-backup-proxy
>
> Benchmark script (`bench.sh`) used for the numbers above:
>
> #!/usr/bin/env bash
> set -euo pipefail
>
> # --- Config ---------------------------------------------------------------
> HOST='https://localhost:8007'
> USER='root@pam'
> PASS="$(cat passfile)"
>
> DATASTORE_PATH="/pbsbench/pbs-bench"
> MAX_STORES=1000 # how many stores to include
> PARALLEL=16 # concurrent workers
> REPEAT=5 # requests per store (1 cold + REPEAT-1 warm)
>
> PRINT_FIRST=false # true => log first request's HTTP code per store
>
> # --- Helpers --------------------------------------------------------------
> fmt_rps () {
> local n="$1" t="$2"
> awk -v n="$n" -v t="$t" 'BEGIN { if (t > 0) printf("%.2f\n", n/t); else print "0.00" }'
> }
>
> # --- Login ---------------------------------------------------------------
> auth=$(curl -ks -X POST "$HOST/api2/json/access/ticket" \
> -d "username=$USER" -d "password=$PASS")
> ticket=$(echo "$auth" | jq -r '.data.ticket')
>
> if [[ -z "${ticket:-}" || "$ticket" == "null" ]]; then
> echo "[ERROR] Login failed (no ticket)"
> exit 1
> fi
>
> # --- Collect stores (deterministic order) --------------------------------
> mapfile -t STORES < <(
> find "$DATASTORE_PATH" -mindepth 1 -maxdepth 1 -type d -printf '%f\n' \
> | sort | head -n "$MAX_STORES"
> )
>
> USED_STORES=${#STORES[@]}
> if (( USED_STORES == 0 )); then
> echo "[ERROR] No datastore dirs under $DATASTORE_PATH"
> exit 1
> fi
>
> echo "[INFO] Running with stores=$USED_STORES, repeat=$REPEAT, parallel=$PARALLEL"
>
> # --- Temp counters --------------------------------------------------------
> SUCCESS_ALL="$(mktemp)"
> FAIL_ALL="$(mktemp)"
> COLD_OK="$(mktemp)"
> WARM_OK="$(mktemp)"
> trap 'rm -f "$SUCCESS_ALL" "$FAIL_ALL" "$COLD_OK" "$WARM_OK"' EXIT
>
> export HOST ticket REPEAT SUCCESS_ALL FAIL_ALL COLD_OK WARM_OK PRINT_FIRST
>
> SECONDS=0
>
> # --- Fire requests --------------------------------------------------------
> printf "%s\n" "${STORES[@]}" \
> | xargs -P"$PARALLEL" -I{} bash -c '
> store="$1"
> url="$HOST/api2/json/admin/datastore/$store/status?verbose=0"
>
> for ((i=1;i<=REPEAT;i++)); do
> code=$(curl -ks -o /dev/null -w "%{http_code}" -b "PBSAuthCookie=$ticket" "$url" || echo 000)
>
> if [[ "$code" == "200" ]]; then
> echo 1 >> "$SUCCESS_ALL"
> if (( i == 1 )); then
> echo 1 >> "$COLD_OK"
> else
> echo 1 >> "$WARM_OK"
> fi
> if [[ "$PRINT_FIRST" == "true" && $i -eq 1 ]]; then
> ts=$(date +%H:%M:%S)
> echo "[$ts] $store #$i HTTP:200"
> fi
> else
> echo 1 >> "$FAIL_ALL"
> if [[ "$PRINT_FIRST" == "true" && $i -eq 1 ]]; then
> ts=$(date +%H:%M:%S)
> echo "[$ts] $store #$i HTTP:$code (FAIL)"
> fi
> fi
> done
> ' _ {}
>
> # --- Summary --------------------------------------------------------------
> elapsed=$SECONDS
> ok=$(wc -l < "$SUCCESS_ALL" 2>/dev/null || echo 0)
> fail=$(wc -l < "$FAIL_ALL" 2>/dev/null || echo 0)
> cold_ok=$(wc -l < "$COLD_OK" 2>/dev/null || echo 0)
> warm_ok=$(wc -l < "$WARM_OK" 2>/dev/null || echo 0)
>
> expected=$(( USED_STORES * REPEAT ))
> total=$(( ok + fail ))
>
> rps_all=$(fmt_rps "$ok" "$elapsed")
> rps_cold=$(fmt_rps "$cold_ok" "$elapsed")
> rps_warm=$(fmt_rps "$warm_ok" "$elapsed")
>
> echo "===== Summary ====="
> echo "Stores used: $USED_STORES"
> echo "Expected requests: $expected"
> echo "Executed requests: $total"
> echo "OK (HTTP 200): $ok"
> echo "Failed: $fail"
> printf "Total time: %dm %ds\n" $((elapsed/60)) $((elapsed%60))
> echo "Throughput all RPS: $rps_all"
> echo "Cold RPS (round #1): $rps_cold"
> echo "Warm RPS (#2..N): $rps_warm"
>
> ## Maintainer notes
>
> - No dependency bumps, no API changes, no breaking changes in this
> series.
>
> ## Patch summary
>
> [PATCH 1/3] partial fix #6049: datastore: impl ConfigVersionCache fast path for lookups
> [PATCH 2/3] partial fix #6049: datastore: use config fast-path in Drop
> [PATCH 3/3] datastore: add TTL fallback to catch manual config edits
>
> Thanks for reviewing!
>
> [1] Bugzilla #6049: https://bugzilla.proxmox.com/show_bug.cgi?id=6049
> [2] cargo-flamegraph: https://github.com/flamegraph-rs/flamegraph
>
> Samuel Rufinatscha (3):
> partial fix #6049: datastore: impl ConfigVersionCache fast path for
> lookups
> partial fix #6049: datastore: use config fast-path in Drop
> datastore: add TTL fallback to catch manual config edits
>
> pbs-config/src/config_version_cache.rs | 10 ++-
> pbs-datastore/src/datastore.rs | 119 ++++++++++++++++++-------
> 2 files changed, 96 insertions(+), 33 deletions(-)
>
* [pbs-devel] [PATCH proxmox-backup v2 0/4] datastore: remove config reload on hot path
@ 2025-11-14 15:05 10% Samuel Rufinatscha
2025-11-14 15:05 17% ` [pbs-devel] [PATCH proxmox-backup v2 1/4] partial fix #6049: config: enable config version cache for datastore Samuel Rufinatscha
` (3 more replies)
0 siblings, 4 replies; 200+ results
From: Samuel Rufinatscha @ 2025-11-14 15:05 UTC (permalink / raw)
To: pbs-devel
Hi,
this series reduces CPU time in datastore lookups by avoiding repeated
datastore.cfg reads/parses in both `lookup_datastore()` and
`DataStore::Drop`. It also adds a TTL so manual config edits are
noticed without reintroducing hashing on every request.
While investigating #6049 [1], cargo-flamegraph [2] showed hotspots during
repeated `/status` calls in `lookup_datastore()` and in `Drop`,
dominated by `pbs_config::datastore::config()` (config parse).
The parsing cost itself should be investigated in a future effort.
Furthermore, cargo-flamegraph showed that when the API is accessed with
token-based authentication, a significant amount of time is spent in
validation on every request [3].
## Approach
[PATCH 1/4] Extend ConfigVersionCache for datastore generation
Expose a dedicated datastore generation counter and an increment
helper so callers can cheaply track datastore.cfg versions.
[PATCH 2/4] Fast path for datastore lookups
Cache the parsed datastore.cfg keyed by the shared datastore
generation. lookup_datastore() reuses both the cached config and an
existing DataStoreImpl when the generation matches, and falls back
to the old slow path otherwise.
[PATCH 3/4] Fast path for Drop
Make DataStore::Drop use the cached config if possible instead of
rereading datastore.cfg from disk.
[PATCH 4/4] TTL to catch manual edits
Add a small TTL around the cached config and bump the datastore
generation whenever the config is reloaded. This catches manual
edits to datastore.cfg without reintroducing hashing or
config parsing on every request.
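Conceptually, the fast path built up across these patches boils down to
comparing a shared generation counter. A minimal sketch of the idea follows;
all names are hypothetical and the real counter lives in pbs-config's
`ConfigVersionCache` backed by shared memory, not a process-local static:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Hypothetical stand-in for the shared datastore generation counter:
// save_config() bumps it, lookup_datastore() compares it against the
// generation tagged onto the cached entry at creation time.
static DATASTORE_GENERATION: AtomicUsize = AtomicUsize::new(0);

fn datastore_generation() -> usize {
    DATASTORE_GENERATION.load(Ordering::Acquire)
}

fn increase_datastore_generation() -> usize {
    DATASTORE_GENERATION.fetch_add(1, Ordering::AcqRel)
}

fn main() {
    let tagged = datastore_generation();        // stored with the cached config
    assert_eq!(tagged, datastore_generation()); // match -> reuse cached parse
    increase_datastore_generation();            // e.g. API-driven config change
    assert_ne!(tagged, datastore_generation()); // mismatch -> slow path, re-read
}
```

The Acquire/AcqRel orderings mirror what the existing
`increase_datastore_generation()` helper already uses.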
## Benchmark results
All the following benchmarks are based on top of
https://lore.proxmox.com/pbs-devel/20251112131525.645971-1-f.gruenbichler@proxmox.com/T/#u
### End-to-end
Testing `/status?verbose=0` end-to-end with 1000 stores, 5 req/store
and parallel=16 before/after the series:
Metric Before After
----------------------------------------
Total time 12s 9s
Throughput (all) 416.67 555.56
Cold RPS (round #1) 83.33 111.11
Warm RPS (#2..N) 333.33 444.44
When running under flamegraph [2], TLS appears to consume a significant
amount of CPU time and blurs the results. Still, the series yields ~33%
higher overall throughput and ~25% less end-to-end time for this workload.
### Isolated benchmarks (hyperfine)
In addition to the end-to-end tests, I measured two standalone benchmarks
with hyperfine, each using a config with 1000
datastores. `M` is the number of distinct datastores looked up and
`N` is the number of lookups per datastore.
Drop-direct variant:
Drops the `DataStore` after every lookup, so the `Drop` path runs on
every iteration:
use anyhow::Error;
use pbs_api_types::Operation;
use pbs_datastore::DataStore;
fn main() -> Result<(), Error> {
let mut args = std::env::args();
args.next();
let datastores = if let Some(n) = args.next() {
n.parse::<usize>()?
} else {
1000
};
let iterations = if let Some(n) = args.next() {
n.parse::<usize>()?
} else {
1000
};
for d in 1..=datastores {
let name = format!("ds{:04}", d);
for _ in 1..=iterations {
DataStore::lookup_datastore(&name, Some(Operation::Write))?;
}
}
Ok(())
}
+------+------+----------+----------+---------+
|    M |    N | Baseline | Patched  | Speedup |
+------+------+----------+----------+---------+
|    1 | 1000 | 1.670 s  | 34.3 ms  | 48.7x   |
|   10 |  100 | 1.672 s  | 34.5 ms  | 48.4x   |
|  100 |   10 | 1.679 s  | 35.1 ms  | 47.8x   |
| 1000 |    1 | 1.787 s  | 38.2 ms  | 46.8x   |
+------+------+----------+----------+---------+
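As a sanity check, the Speedup column is simply the ratio of the two means;
for the first row:

```shell
# baseline 1.670 s vs patched 34.3 ms (= 0.0343 s)
awk 'BEGIN { printf "%.1fx\n", 1.670 / 0.0343 }'
```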
Bulk-drop variant:
Keeps the `DataStore` instances alive for
all `N` lookups of a given datastore and then drops them in bulk,
mimicking a task that performs many lookups while it is running and
only triggers the expensive `Drop` logic when the last user exits.
use anyhow::Error;
use pbs_api_types::Operation;
use pbs_datastore::DataStore;
fn main() -> Result<(), Error> {
let mut args = std::env::args();
args.next();
let datastores = if let Some(n) = args.next() {
n.parse::<usize>()?
} else {
1000
};
let iterations = if let Some(n) = args.next() {
n.parse::<usize>()?
} else {
1000
};
for d in 1..=datastores {
let name = format!("ds{:04}", d);
let mut stores = Vec::with_capacity(iterations);
for _ in 1..=iterations {
stores.push(DataStore::lookup_datastore(&name, Some(Operation::Write))?);
}
}
Ok(())
}
+------+------+---------------+--------------+---------+
| M | N | Baseline mean | Patched mean | Speedup |
+------+------+---------------+--------------+---------+
| 1 | 1000 | 884.0 ms | 33.9 ms | 26.1x |
| 10 | 100 | 881.8 ms | 35.3 ms | 25.0x |
| 100 | 10 | 969.3 ms | 35.9 ms | 27.0x |
| 1000 | 1 | 1827.0 ms | 40.7 ms | 44.9x |
+------+------+---------------+--------------+---------+
Both variants show that the combination of the cached config lookups
and the cheaper `Drop` handling reduces the hot-path cost from ~1.7 s
per run to a few tens of milliseconds in these benchmarks.
## Reproduction steps
VM: 4 vCPU, ~8 GiB RAM, VirtIO-SCSI; disks:
- scsi0 32G (OS)
- scsi1 1000G (datastores)
Install PBS from ISO on the VM.
Set up ZFS on /dev/sdb (adjust if different):
zpool create -f -o ashift=12 pbsbench /dev/sdb
zfs set mountpoint=/pbsbench pbsbench
zfs create pbsbench/pbs-bench
Raise file-descriptor limit:
sudo systemctl edit proxmox-backup-proxy.service
Add the following lines:
[Service]
LimitNOFILE=1048576
Reload systemd and restart the proxy:
sudo systemctl daemon-reload
sudo systemctl restart proxmox-backup-proxy.service
Verify the limit:
systemctl show proxmox-backup-proxy.service | grep LimitNOFILE
Create 1000 ZFS-backed datastores (as used in #6049 [1]):
seq -w 001 1000 | xargs -n1 -P1 bash -c '
id=$0
name="ds${id}"
dataset="pbsbench/pbs-bench/${name}"
path="/pbsbench/pbs-bench/${name}"
zfs create -o mountpoint="$path" "$dataset"
proxmox-backup-manager datastore create "$name" "$path" \
--comment "ZFS dataset-based datastore"
'
Build PBS from this series, then manually run the server under
flamegraph:
systemctl stop proxmox-backup-proxy
cargo flamegraph --release --bin proxmox-backup-proxy
## Other resources:
### E2E benchmark script:
#!/usr/bin/env bash
set -euo pipefail
# --- Config ---------------------------------------------------------------
HOST='https://localhost:8007'
USER='root@pam'
PASS="$(cat passfile)"
DATASTORE_PATH="/pbsbench/pbs-bench"
MAX_STORES=1000 # how many stores to include
PARALLEL=16 # concurrent workers
REPEAT=5 # requests per store (1 cold + REPEAT-1 warm)
PRINT_FIRST=false # true => log first request's HTTP code per store
# --- Helpers --------------------------------------------------------------
fmt_rps () {
local n="$1" t="$2"
awk -v n="$n" -v t="$t" 'BEGIN { if (t > 0) printf("%.2f\n", n/t); else print "0.00" }'
}
# --- Login ---------------------------------------------------------------
auth=$(curl -ks -X POST "$HOST/api2/json/access/ticket" \
-d "username=$USER" -d "password=$PASS")
ticket=$(echo "$auth" | jq -r '.data.ticket')
if [[ -z "${ticket:-}" || "$ticket" == "null" ]]; then
echo "[ERROR] Login failed (no ticket)"
exit 1
fi
# --- Collect stores (deterministic order) --------------------------------
mapfile -t STORES < <(
find "$DATASTORE_PATH" -mindepth 1 -maxdepth 1 -type d -printf '%f\n' \
| sort | head -n "$MAX_STORES"
)
USED_STORES=${#STORES[@]}
if (( USED_STORES == 0 )); then
echo "[ERROR] No datastore dirs under $DATASTORE_PATH"
exit 1
fi
echo "[INFO] Running with stores=$USED_STORES, repeat=$REPEAT, parallel=$PARALLEL"
# --- Temp counters --------------------------------------------------------
SUCCESS_ALL="$(mktemp)"
FAIL_ALL="$(mktemp)"
COLD_OK="$(mktemp)"
WARM_OK="$(mktemp)"
trap 'rm -f "$SUCCESS_ALL" "$FAIL_ALL" "$COLD_OK" "$WARM_OK"' EXIT
export HOST ticket REPEAT SUCCESS_ALL FAIL_ALL COLD_OK WARM_OK PRINT_FIRST
SECONDS=0
# --- Fire requests --------------------------------------------------------
printf "%s\n" "${STORES[@]}" \
| xargs -P"$PARALLEL" -I{} bash -c '
store="$1"
url="$HOST/api2/json/admin/datastore/$store/status?verbose=0"
for ((i=1;i<=REPEAT;i++)); do
code=$(curl -ks -o /dev/null -w "%{http_code}" -b "PBSAuthCookie=$ticket" "$url" || echo 000)
if [[ "$code" == "200" ]]; then
echo 1 >> "$SUCCESS_ALL"
if (( i == 1 )); then
echo 1 >> "$COLD_OK"
else
echo 1 >> "$WARM_OK"
fi
if [[ "$PRINT_FIRST" == "true" && $i -eq 1 ]]; then
ts=$(date +%H:%M:%S)
echo "[$ts] $store #$i HTTP:200"
fi
else
echo 1 >> "$FAIL_ALL"
if [[ "$PRINT_FIRST" == "true" && $i -eq 1 ]]; then
ts=$(date +%H:%M:%S)
echo "[$ts] $store #$i HTTP:$code (FAIL)"
fi
fi
done
' _ {}
# --- Summary --------------------------------------------------------------
elapsed=$SECONDS
ok=$(wc -l < "$SUCCESS_ALL" 2>/dev/null || echo 0)
fail=$(wc -l < "$FAIL_ALL" 2>/dev/null || echo 0)
cold_ok=$(wc -l < "$COLD_OK" 2>/dev/null || echo 0)
warm_ok=$(wc -l < "$WARM_OK" 2>/dev/null || echo 0)
expected=$(( USED_STORES * REPEAT ))
total=$(( ok + fail ))
rps_all=$(fmt_rps "$ok" "$elapsed")
rps_cold=$(fmt_rps "$cold_ok" "$elapsed")
rps_warm=$(fmt_rps "$warm_ok" "$elapsed")
echo "===== Summary ====="
echo "Stores used: $USED_STORES"
echo "Expected requests: $expected"
echo "Executed requests: $total"
echo "OK (HTTP 200): $ok"
echo "Failed: $fail"
printf "Total time: %dm %ds\n" $((elapsed/60)) $((elapsed%60))
echo "Throughput all RPS: $rps_all"
echo "Cold RPS (round #1): $rps_cold"
echo "Warm RPS (#2..N): $rps_warm"
## Maintainer notes
No dependency bumps, no API changes and no breaking changes.
## Patch summary
[PATCH 1/4] partial fix #6049: config: enable config version cache for datastore
[PATCH 2/4] partial fix #6049: datastore: impl ConfigVersionCache fast path for lookups
[PATCH 3/4] partial fix #6049: datastore: use config fast-path in Drop
[PATCH 4/4] partial fix #6049: datastore: add TTL fallback to catch manual config edits
Thanks,
Samuel
[1] Bugzilla #6049: https://bugzilla.proxmox.com/show_bug.cgi?id=6049
[2] cargo-flamegraph: https://github.com/flamegraph-rs/flamegraph
[3] Bugzilla #7017: https://bugzilla.proxmox.com/show_bug.cgi?id=7017
Samuel Rufinatscha (4):
partial fix #6049: config: enable config version cache for datastore
partial fix #6049: datastore: impl ConfigVersionCache fast path for
lookups
partial fix #6049: datastore: use config fast-path in Drop
partial fix #6049: datastore: add TTL fallback to catch manual config
edits
pbs-config/src/config_version_cache.rs | 10 +-
pbs-datastore/Cargo.toml | 1 +
pbs-datastore/src/datastore.rs | 187 +++++++++++++++++++------
3 files changed, 152 insertions(+), 46 deletions(-)
--
2.47.3
* [pbs-devel] [PATCH proxmox-backup v2 2/4] partial fix #6049: datastore: impl ConfigVersionCache fast path for lookups
2025-11-14 15:05 10% [pbs-devel] [PATCH proxmox-backup v2 0/4] datastore: remove config reload on hot path Samuel Rufinatscha
2025-11-14 15:05 17% ` [pbs-devel] [PATCH proxmox-backup v2 1/4] partial fix #6049: config: enable config version cache for datastore Samuel Rufinatscha
@ 2025-11-14 15:05 12% ` Samuel Rufinatscha
2025-11-19 13:24 5% ` Fabian Grünbichler
2025-11-14 15:05 15% ` [pbs-devel] [PATCH proxmox-backup v2 4/4] partial fix #6049: datastore: add TTL fallback to catch manual config edits Samuel Rufinatscha
2025-11-20 13:07 13% ` [pbs-devel] superseded: [PATCH proxmox-backup v2 0/4] datastore: remove config reload on hot path Samuel Rufinatscha
3 siblings, 1 reply; 200+ results
From: Samuel Rufinatscha @ 2025-11-14 15:05 UTC (permalink / raw)
To: pbs-devel
Repeated /status requests caused lookup_datastore() to re-read and
parse datastore.cfg on every call. The issue was mentioned in report
#6049 [1]. cargo-flamegraph [2] confirmed that the hot path is
dominated by pbs_config::datastore::config() (config parsing).
This patch implements caching of the global datastore.cfg using the
generation numbers from the shared config version cache. It caches the
datastore.cfg along with the generation number and, when a subsequent
lookup sees the same generation, it reuses the cached config without
re-reading it from disk. If the generation differs
(or the cache is unavailable), it falls back to the existing slow path
with no behavioral changes.
Behavioral notes
- The generation is bumped via the existing save_config() path, so
API-driven config changes are detected immediately.
- Manual edits to datastore.cfg are not detected; a TTL
guard is introduced in a dedicated patch in this series.
- DataStore::drop still performs a config read on the common path,
this is covered in a dedicated patch in this series.
Links
[1] Bugzilla: https://bugzilla.proxmox.com/show_bug.cgi?id=6049
[2] cargo-flamegraph: https://github.com/flamegraph-rs/flamegraph
Fixes: #6049
Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
---
pbs-datastore/Cargo.toml | 1 +
pbs-datastore/src/datastore.rs | 120 +++++++++++++++++++++++----------
2 files changed, 87 insertions(+), 34 deletions(-)
diff --git a/pbs-datastore/Cargo.toml b/pbs-datastore/Cargo.toml
index 8ce930a9..42f49a7b 100644
--- a/pbs-datastore/Cargo.toml
+++ b/pbs-datastore/Cargo.toml
@@ -40,6 +40,7 @@ proxmox-io.workspace = true
proxmox-lang.workspace=true
proxmox-s3-client = { workspace = true, features = [ "impl" ] }
proxmox-schema = { workspace = true, features = [ "api-macro" ] }
+proxmox-section-config.workspace = true
proxmox-serde = { workspace = true, features = [ "serde_json" ] }
proxmox-sys.workspace = true
proxmox-systemd.workspace = true
diff --git a/pbs-datastore/src/datastore.rs b/pbs-datastore/src/datastore.rs
index 031fa958..e7748872 100644
--- a/pbs-datastore/src/datastore.rs
+++ b/pbs-datastore/src/datastore.rs
@@ -32,7 +32,8 @@ use pbs_api_types::{
MaintenanceType, Operation, UPID,
};
use pbs_config::s3::S3_CFG_TYPE_ID;
-use pbs_config::BackupLockGuard;
+use pbs_config::{BackupLockGuard, ConfigVersionCache};
+use proxmox_section_config::SectionConfigData;
use crate::backup_info::{
BackupDir, BackupGroup, BackupInfo, OLD_LOCKING, PROTECTED_MARKER_FILENAME,
@@ -46,6 +47,17 @@ use crate::s3::S3_CONTENT_PREFIX;
use crate::task_tracking::{self, update_active_operations};
use crate::{DataBlob, LocalDatastoreLruCache};
+// Cache for fully parsed datastore.cfg
+struct DatastoreConfigCache {
+ // Parsed datastore.cfg file
+ config: Arc<SectionConfigData>,
+ // Generation number from ConfigVersionCache
+ last_generation: usize,
+}
+
+static DATASTORE_CONFIG_CACHE: LazyLock<Mutex<Option<DatastoreConfigCache>>> =
+ LazyLock::new(|| Mutex::new(None));
+
static DATASTORE_MAP: LazyLock<Mutex<HashMap<String, Arc<DataStoreImpl>>>> =
LazyLock::new(|| Mutex::new(HashMap::new()));
@@ -140,10 +152,12 @@ pub struct DataStoreImpl {
last_gc_status: Mutex<GarbageCollectionStatus>,
verify_new: bool,
chunk_order: ChunkOrder,
- last_digest: Option<[u8; 32]>,
sync_level: DatastoreFSyncLevel,
backend_config: DatastoreBackendConfig,
lru_store_caching: Option<LocalDatastoreLruCache>,
+ /// Datastore generation number from `ConfigVersionCache` at creation time, used to
+ /// validate reuse of this cached `DataStoreImpl`.
+ config_generation: Option<usize>,
}
impl DataStoreImpl {
@@ -156,10 +170,10 @@ impl DataStoreImpl {
last_gc_status: Mutex::new(GarbageCollectionStatus::default()),
verify_new: false,
chunk_order: Default::default(),
- last_digest: None,
sync_level: Default::default(),
backend_config: Default::default(),
lru_store_caching: None,
+ config_generation: None,
})
}
}
@@ -254,6 +268,37 @@ impl DatastoreBackend {
}
}
+/// Return the cached datastore SectionConfig and its generation.
+fn datastore_section_config_cached() -> Result<(Arc<SectionConfigData>, Option<usize>), Error> {
+ let gen = ConfigVersionCache::new()
+ .ok()
+ .map(|c| c.datastore_generation());
+
+ let mut guard = DATASTORE_CONFIG_CACHE.lock().unwrap();
+
+ // Fast path: re-use cached datastore.cfg
+ if let (Some(gen), Some(cache)) = (gen, guard.as_ref()) {
+ if cache.last_generation == gen {
+ return Ok((cache.config.clone(), Some(gen)));
+ }
+ }
+
+ // Slow path: re-read datastore.cfg
+ let (config_raw, _digest) = pbs_config::datastore::config()?;
+ let config = Arc::new(config_raw);
+
+ if let Some(gen_val) = gen {
+ *guard = Some(DatastoreConfigCache {
+ config: config.clone(),
+ last_generation: gen_val,
+ });
+ } else {
+ *guard = None;
+ }
+
+ Ok((config, gen))
+}
+
impl DataStore {
// This one just panics on everything
#[doc(hidden)]
@@ -325,56 +370,63 @@ impl DataStore {
name: &str,
operation: Option<Operation>,
) -> Result<Arc<DataStore>, Error> {
- // Avoid TOCTOU between checking maintenance mode and updating active operation counter, as
- // we use it to decide whether it is okay to delete the datastore.
+ // Avoid TOCTOU between checking maintenance mode and updating active operations.
let _config_lock = pbs_config::datastore::lock_config()?;
- // we could use the ConfigVersionCache's generation for staleness detection, but we load
- // the config anyway -> just use digest, additional benefit: manual changes get detected
- let (config, digest) = pbs_config::datastore::config()?;
- let config: DataStoreConfig = config.lookup("datastore", name)?;
+ // Get the current datastore.cfg generation number and cached config
+ let (section_config, gen_num) = datastore_section_config_cached()?;
+
+ let datastore_cfg: DataStoreConfig = section_config.lookup("datastore", name)?;
+ let maintenance_mode = datastore_cfg.get_maintenance_mode();
+ let mount_status = get_datastore_mount_status(&datastore_cfg);
- if let Some(maintenance_mode) = config.get_maintenance_mode() {
- if let Err(error) = maintenance_mode.check(operation) {
+ if let Some(mm) = &maintenance_mode {
+ if let Err(error) = mm.check(operation.clone()) {
bail!("datastore '{name}' is unavailable: {error}");
}
}
- if get_datastore_mount_status(&config) == Some(false) {
- let mut datastore_cache = DATASTORE_MAP.lock().unwrap();
- datastore_cache.remove(&config.name);
- bail!("datastore '{}' is not mounted", config.name);
+ let mut datastore_cache = DATASTORE_MAP.lock().unwrap();
+
+ if mount_status == Some(false) {
+ datastore_cache.remove(&datastore_cfg.name);
+ bail!("datastore '{}' is not mounted", datastore_cfg.name);
}
- let mut datastore_cache = DATASTORE_MAP.lock().unwrap();
- let entry = datastore_cache.get(name);
-
- // reuse chunk store so that we keep using the same process locker instance!
- let chunk_store = if let Some(datastore) = &entry {
- let last_digest = datastore.last_digest.as_ref();
- if let Some(true) = last_digest.map(|last_digest| last_digest == &digest) {
- if let Some(operation) = operation {
- update_active_operations(name, operation, 1)?;
+ // Re-use DataStoreImpl
+ if let Some(existing) = datastore_cache.get(name).cloned() {
+ if let (Some(last_generation), Some(gen_num)) = (existing.config_generation, gen_num) {
+ if last_generation == gen_num {
+ if let Some(op) = operation {
+ update_active_operations(name, op, 1)?;
+ }
+
+ return Ok(Arc::new(Self {
+ inner: existing,
+ operation,
+ }));
}
- return Ok(Arc::new(Self {
- inner: Arc::clone(datastore),
- operation,
- }));
}
- Arc::clone(&datastore.chunk_store)
+ }
+
+ // (Re)build DataStoreImpl
+
+ // Reuse chunk store so that we keep using the same process locker instance!
+ let chunk_store = if let Some(existing) = datastore_cache.get(name) {
+ Arc::clone(&existing.chunk_store)
} else {
let tuning: DatastoreTuning = serde_json::from_value(
DatastoreTuning::API_SCHEMA
- .parse_property_string(config.tuning.as_deref().unwrap_or(""))?,
+ .parse_property_string(datastore_cfg.tuning.as_deref().unwrap_or(""))?,
)?;
Arc::new(ChunkStore::open(
name,
- config.absolute_path(),
+ datastore_cfg.absolute_path(),
tuning.sync_level.unwrap_or_default(),
)?)
};
- let datastore = DataStore::with_store_and_config(chunk_store, config, Some(digest))?;
+ let datastore = DataStore::with_store_and_config(chunk_store, datastore_cfg, gen_num)?;
let datastore = Arc::new(datastore);
datastore_cache.insert(name.to_string(), datastore.clone());
@@ -476,7 +528,7 @@ impl DataStore {
fn with_store_and_config(
chunk_store: Arc<ChunkStore>,
config: DataStoreConfig,
- last_digest: Option<[u8; 32]>,
+ generation: Option<usize>,
) -> Result<DataStoreImpl, Error> {
let mut gc_status_path = chunk_store.base_path();
gc_status_path.push(".gc-status");
@@ -536,10 +588,10 @@ impl DataStore {
last_gc_status: Mutex::new(gc_status),
verify_new: config.verify_new.unwrap_or(false),
chunk_order: tuning.chunk_order.unwrap_or_default(),
- last_digest,
sync_level: tuning.sync_level.unwrap_or_default(),
backend_config,
lru_store_caching,
+ config_generation: generation,
})
}
--
2.47.3
* [pbs-devel] [PATCH proxmox-backup v2 1/4] partial fix #6049: config: enable config version cache for datastore
2025-11-14 15:05 10% [pbs-devel] [PATCH proxmox-backup v2 0/4] datastore: remove config reload on hot path Samuel Rufinatscha
@ 2025-11-14 15:05 17% ` Samuel Rufinatscha
2025-11-14 15:05 12% ` [pbs-devel] [PATCH proxmox-backup v2 2/4] partial fix #6049: datastore: impl ConfigVersionCache fast path for lookups Samuel Rufinatscha
` (2 subsequent siblings)
3 siblings, 0 replies; 200+ results
From: Samuel Rufinatscha @ 2025-11-14 15:05 UTC (permalink / raw)
To: pbs-devel
Repeated /status requests caused lookup_datastore() to re-read and
parse datastore.cfg on every call. The issue was mentioned in report
#6049 [1]. cargo-flamegraph [2] confirmed that the hot path is
dominated by pbs_config::datastore::config() (config parsing).
To solve the issue, this patch prepares the config version cache,
so that datastore config caching can be built on top of it.
This patch specifically:
(1) implements an increment function to invalidate generations
(2) removes obsolete comments
Links
[1] Bugzilla: https://bugzilla.proxmox.com/show_bug.cgi?id=6049
[2] cargo-flamegraph: https://github.com/flamegraph-rs/flamegraph
Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
---
pbs-config/src/config_version_cache.rs | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)
diff --git a/pbs-config/src/config_version_cache.rs b/pbs-config/src/config_version_cache.rs
index e8fb994f..b875f7e0 100644
--- a/pbs-config/src/config_version_cache.rs
+++ b/pbs-config/src/config_version_cache.rs
@@ -26,7 +26,6 @@ struct ConfigVersionCacheDataInner {
// Traffic control (traffic-control.cfg) generation/version.
traffic_control_generation: AtomicUsize,
// datastore (datastore.cfg) generation/version
- // FIXME: remove with PBS 3.0
datastore_generation: AtomicUsize,
// Add further atomics here
}
@@ -145,8 +144,15 @@ impl ConfigVersionCache {
.fetch_add(1, Ordering::AcqRel);
}
+ /// Returns the datastore generation number.
+ pub fn datastore_generation(&self) -> usize {
+ self.shmem
+ .data()
+ .datastore_generation
+ .load(Ordering::Acquire)
+ }
+
/// Increase the datastore generation number.
- // FIXME: remove with PBS 3.0 or make actually useful again in datastore lookup
pub fn increase_datastore_generation(&self) -> usize {
self.shmem
.data()
--
2.47.3
* [pbs-devel] [PATCH proxmox-backup v2 4/4] partial fix #6049: datastore: add TTL fallback to catch manual config edits
2025-11-14 15:05 10% [pbs-devel] [PATCH proxmox-backup v2 0/4] datastore: remove config reload on hot path Samuel Rufinatscha
2025-11-14 15:05 17% ` [pbs-devel] [PATCH proxmox-backup v2 1/4] partial fix #6049: config: enable config version cache for datastore Samuel Rufinatscha
2025-11-14 15:05 12% ` [pbs-devel] [PATCH proxmox-backup v2 2/4] partial fix #6049: datastore: impl ConfigVersionCache fast path for lookups Samuel Rufinatscha
@ 2025-11-14 15:05 15% ` Samuel Rufinatscha
2025-11-19 13:24 5% ` Fabian Grünbichler
2025-11-20 13:07 13% ` [pbs-devel] superseded: [PATCH proxmox-backup v2 0/4] datastore: remove config reload on hot path Samuel Rufinatscha
3 siblings, 1 reply; 200+ results
From: Samuel Rufinatscha @ 2025-11-14 15:05 UTC (permalink / raw)
To: pbs-devel
The lookup fast path reacts to API-driven config changes because
save_config() bumps the generation. Manual edits of datastore.cfg do
not bump the counter. To keep the system robust against such edits
without reintroducing config reading and hashing on the hot path, this
patch adds a TTL to the cache entry.
If the cached config is older than
DATASTORE_CONFIG_CACHE_TTL_SECS (set to 60s), the next lookup takes
the slow path and refreshes the cached entry. Within
the TTL window, unchanged generations still use the fast path.
Refs: #6049
Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
---
pbs-datastore/src/datastore.rs | 46 +++++++++++++++++++++++++---------
1 file changed, 34 insertions(+), 12 deletions(-)
diff --git a/pbs-datastore/src/datastore.rs b/pbs-datastore/src/datastore.rs
index 0fabf592..7a18435c 100644
--- a/pbs-datastore/src/datastore.rs
+++ b/pbs-datastore/src/datastore.rs
@@ -22,7 +22,7 @@ use proxmox_sys::error::SysError;
use proxmox_sys::fs::{file_read_optional_string, replace_file, CreateOptions};
use proxmox_sys::linux::procfs::MountInfo;
use proxmox_sys::process_locker::{ProcessLockExclusiveGuard, ProcessLockSharedGuard};
-use proxmox_time::TimeSpan;
+use proxmox_time::{epoch_i64, TimeSpan};
use proxmox_worker_task::WorkerTaskContext;
use pbs_api_types::{
@@ -53,6 +53,8 @@ struct DatastoreConfigCache {
config: Arc<SectionConfigData>,
// Generation number from ConfigVersionCache
last_generation: usize,
+ // Last update time (epoch seconds)
+ last_update: i64,
}
static DATASTORE_CONFIG_CACHE: LazyLock<Mutex<Option<DatastoreConfigCache>>> =
@@ -61,6 +63,8 @@ static DATASTORE_CONFIG_CACHE: LazyLock<Mutex<Option<DatastoreConfigCache>>> =
static DATASTORE_MAP: LazyLock<Mutex<HashMap<String, Arc<DataStoreImpl>>>> =
LazyLock::new(|| Mutex::new(HashMap::new()));
+/// Max age in seconds to reuse the cached datastore config.
+const DATASTORE_CONFIG_CACHE_TTL_SECS: i64 = 60;
/// Filename to store backup group notes
pub const GROUP_NOTES_FILE_NAME: &str = "notes";
/// Filename to store backup group owner
@@ -295,16 +299,22 @@ impl DatastoreBackend {
/// Return the cached datastore SectionConfig and its generation.
fn datastore_section_config_cached() -> Result<(Arc<SectionConfigData>, Option<usize>), Error> {
- let gen = ConfigVersionCache::new()
- .ok()
- .map(|c| c.datastore_generation());
+ let now = epoch_i64();
+ let version_cache = ConfigVersionCache::new().ok();
+ let current_gen = version_cache.as_ref().map(|c| c.datastore_generation());
let mut guard = DATASTORE_CONFIG_CACHE.lock().unwrap();
- // Fast path: re-use cached datastore.cfg
- if let (Some(gen), Some(cache)) = (gen, guard.as_ref()) {
- if cache.last_generation == gen {
- return Ok((cache.config.clone(), Some(gen)));
+ // Fast path: re-use cached datastore.cfg if the cache is available, the generation matches, and the TTL has not expired
+ if let (Some(current_gen), Some(config_cache)) = (current_gen, guard.as_ref()) {
+ let gen_matches = config_cache.last_generation == current_gen;
+ let ttl_ok = (now - config_cache.last_update) < DATASTORE_CONFIG_CACHE_TTL_SECS;
+
+ if gen_matches && ttl_ok {
+ return Ok((
+ config_cache.config.clone(),
+ Some(config_cache.last_generation),
+ ));
}
}
@@ -312,16 +322,28 @@ fn datastore_section_config_cached() -> Result<(Arc<SectionConfigData>, Option<u
let (config_raw, _digest) = pbs_config::datastore::config()?;
let config = Arc::new(config_raw);
- if let Some(gen_val) = gen {
+ // Update cache
+ let new_gen = if let Some(handle) = version_cache {
+ // Bump datastore generation whenever we reload the config.
+ // This ensures that Drop handlers will detect that a newer config exists
+ // and will not rely on a stale cached entry for maintenance mode checks.
+ let prev_gen = handle.increase_datastore_generation();
+ let new_gen = prev_gen + 1;
+
*guard = Some(DatastoreConfigCache {
config: config.clone(),
- last_generation: gen_val,
+ last_generation: new_gen,
+ last_update: now,
});
+
+ Some(new_gen)
} else {
+ // if the version cache was not available, take the slow path again next time
*guard = None;
- }
+ None
+ };
- Ok((config, gen))
+ Ok((config, new_gen))
}
impl DataStore {
--
2.47.3
-- links below jump to the message on this page --
2025-11-03 10:13 [pbs-devel] [PATCH proxmox{, -backup} v3 0/2] fix #6939: acme: support servers returning 204 for nonce requests Samuel Rufinatscha
2025-12-03 10:23 13% ` [pbs-devel] superseded: " Samuel Rufinatscha
2025-11-11 12:29 [pbs-devel] [PATCH proxmox-backup 0/3] datastore: remove config reload on hot path Samuel Rufinatscha
2025-11-14 15:08 13% ` [pbs-devel] superseded: " Samuel Rufinatscha
2025-11-12 13:14 [pbs-devel] [PATCH proxmox-backup] task tracking: fix adding new entry if other PID is tracked Fabian Grünbichler
2025-11-17 8:41 13% ` Samuel Rufinatscha
2025-11-14 15:05 10% [pbs-devel] [PATCH proxmox-backup v2 0/4] datastore: remove config reload on hot path Samuel Rufinatscha
2025-11-14 15:05 17% ` [pbs-devel] [PATCH proxmox-backup v2 1/4] partial fix #6049: config: enable config version cache for datastore Samuel Rufinatscha
2025-11-14 15:05 12% ` [pbs-devel] [PATCH proxmox-backup v2 2/4] partial fix #6049: datastore: impl ConfigVersionCache fast path for lookups Samuel Rufinatscha
2025-11-19 13:24 5% ` Fabian Grünbichler
2025-11-14 15:05 15% ` [pbs-devel] [PATCH proxmox-backup v2 4/4] partial fix #6049: datastore: add TTL fallback to catch manual config edits Samuel Rufinatscha
2025-11-19 13:24 5% ` Fabian Grünbichler
2025-11-19 17:25 6% ` Samuel Rufinatscha
2025-11-20 13:07 13% ` [pbs-devel] superseded: [PATCH proxmox-backup v2 0/4] datastore: remove config reload on hot path Samuel Rufinatscha
2025-11-20 13:03 10% [pbs-devel] [PATCH proxmox-backup v3 0/6] " Samuel Rufinatscha
2025-11-20 13:03 17% ` [pbs-devel] [PATCH proxmox-backup v3 1/6] partial fix #6049: config: enable config version cache for datastore Samuel Rufinatscha
2025-11-20 13:03 12% ` [pbs-devel] [PATCH proxmox-backup v3 2/6] partial fix #6049: datastore: impl ConfigVersionCache fast path for lookups Samuel Rufinatscha
2025-11-20 13:03 16% ` [pbs-devel] [PATCH proxmox-backup v3 3/6] partial fix #6049: datastore: use config fast-path in Drop Samuel Rufinatscha
2025-11-20 13:03 15% ` [pbs-devel] [PATCH proxmox-backup v3 4/6] partial fix #6049: datastore: add TTL fallback to catch manual config edits Samuel Rufinatscha
2025-11-20 13:03 15% ` [pbs-devel] [PATCH proxmox-backup v3 5/6] partial fix #6049: datastore: add reload flag to config cache helper Samuel Rufinatscha
2025-11-20 14:50 5% ` Fabian Grünbichler
2025-11-20 18:15 6% ` Samuel Rufinatscha
2025-11-20 13:03 15% ` [pbs-devel] [PATCH proxmox-backup v3 6/6] datastore: only bump generation when config digest changes Samuel Rufinatscha
2025-11-20 14:50 5% ` Fabian Grünbichler
2025-11-21 8:37 6% ` Samuel Rufinatscha
2025-11-20 14:50 5% ` [pbs-devel] [PATCH proxmox-backup v3 0/6] datastore: remove config reload on hot path Fabian Grünbichler
2025-11-20 15:17 6% ` Samuel Rufinatscha
2025-11-24 15:35 13% ` [pbs-devel] superseded: " Samuel Rufinatscha
2025-11-24 15:33 12% [pbs-devel] [PATCH proxmox-backup v4 0/4] " Samuel Rufinatscha
2025-11-24 15:33 16% ` [pbs-devel] [PATCH proxmox-backup v4 1/4] partial fix #6049: config: enable config version cache for datastore Samuel Rufinatscha
2025-11-24 15:33 11% ` [pbs-devel] [PATCH proxmox-backup v4 2/4] partial fix #6049: datastore: impl ConfigVersionCache fast path for lookups Samuel Rufinatscha
2025-11-24 15:33 14% ` [pbs-devel] [PATCH proxmox-backup v4 3/4] partial fix #6049: datastore: use config fast-path in Drop Samuel Rufinatscha
2025-11-24 15:33 13% ` [pbs-devel] [PATCH proxmox-backup v4 4/4] partial fix #6049: datastore: add TTL fallback to catch manual config edits Samuel Rufinatscha
2025-11-24 17:06 13% ` [pbs-devel] superseded: [PATCH proxmox-backup v4 0/4] datastore: remove config reload on hot path Samuel Rufinatscha
2025-11-24 17:04 12% [pbs-devel] [PATCH proxmox-backup v5 " Samuel Rufinatscha
2025-11-24 17:04 16% ` [pbs-devel] [PATCH proxmox-backup v5 1/4] partial fix #6049: config: enable config version cache for datastore Samuel Rufinatscha
2025-11-26 15:15 5% ` Fabian Grünbichler
2025-11-24 17:04 11% ` [pbs-devel] [PATCH proxmox-backup v5 2/4] partial fix #6049: datastore: impl ConfigVersionCache fast path for lookups Samuel Rufinatscha
2025-11-26 15:15 5% ` Fabian Grünbichler
2025-11-26 17:21 6% ` Samuel Rufinatscha
2025-11-24 17:04 14% ` [pbs-devel] [PATCH proxmox-backup v5 3/4] partial fix #6049: datastore: use config fast-path in Drop Samuel Rufinatscha
2025-11-26 15:15 5% ` Fabian Grünbichler
2025-11-28 9:03 6% ` Samuel Rufinatscha
2025-11-28 10:46 5% ` Fabian Grünbichler
2025-11-28 11:10 6% ` Samuel Rufinatscha
2025-11-24 17:04 14% ` [pbs-devel] [PATCH proxmox-backup v5 4/4] partial fix #6049: datastore: add TTL fallback to catch manual config edits Samuel Rufinatscha
2025-11-26 15:15 5% ` Fabian Grünbichler
2025-11-26 15:16 5% ` [pbs-devel] [PATCH proxmox-backup v5 0/4] datastore: remove config reload on hot path Fabian Grünbichler
2025-11-26 16:10 6% ` Samuel Rufinatscha
2026-01-05 14:21 13% ` [pbs-devel] superseded: " Samuel Rufinatscha
2025-12-02 15:56 12% [pbs-devel] [PATCH proxmox{-backup, } 0/8] fix #6939: acme: support servers returning 204 for nonce requests Samuel Rufinatscha
2025-12-02 15:56 15% ` [pbs-devel] [PATCH proxmox-backup 1/4] acme: include proxmox-acme-api dependency Samuel Rufinatscha
2025-12-02 15:56 6% ` [pbs-devel] [PATCH proxmox-backup 2/4] acme: drop local AcmeClient Samuel Rufinatscha
2025-12-02 15:56 8% ` [pbs-devel] [PATCH proxmox-backup 3/4] acme: change API impls to use proxmox-acme-api handlers Samuel Rufinatscha
2025-12-02 15:56 7% ` [pbs-devel] [PATCH proxmox-backup 4/4] acme: certificate ordering through proxmox-acme-api Samuel Rufinatscha
2025-12-02 15:56 12% ` [pbs-devel] [PATCH proxmox 1/4] acme: reduce visibility of Request type Samuel Rufinatscha
2025-12-02 15:56 15% ` [pbs-devel] [PATCH proxmox 2/4] acme: introduce http_status module Samuel Rufinatscha
2025-12-02 15:56 17% ` [pbs-devel] [PATCH proxmox 3/4] acme-api: add helper to load client for an account Samuel Rufinatscha
2025-12-02 15:56 14% ` [pbs-devel] [PATCH proxmox 4/4] fix #6939: acme: support servers returning 204 for nonce requests Samuel Rufinatscha
2025-12-02 16:02 6% ` [pbs-devel] [PATCH proxmox{-backup, } 0/8] " Samuel Rufinatscha
2025-12-03 10:22 11% [pbs-devel] [PATCH proxmox{-backup, } v4 " Samuel Rufinatscha
2025-12-03 10:22 15% ` [pbs-devel] [PATCH proxmox-backup v4 1/4] acme: include proxmox-acme-api dependency Samuel Rufinatscha
2025-12-03 10:22 6% ` [pbs-devel] [PATCH proxmox-backup v4 2/4] acme: drop local AcmeClient Samuel Rufinatscha
2025-12-09 16:50 4% ` Max R. Carrara
2025-12-03 10:22 8% ` [pbs-devel] [PATCH proxmox-backup v4 3/4] acme: change API impls to use proxmox-acme-api handlers Samuel Rufinatscha
2025-12-09 16:50 5% ` Max R. Carrara
2025-12-03 10:22 7% ` [pbs-devel] [PATCH proxmox-backup v4 4/4] acme: certificate ordering through proxmox-acme-api Samuel Rufinatscha
2025-12-09 16:50 5% ` Max R. Carrara
2025-12-03 10:22 17% ` [pbs-devel] [PATCH proxmox v4 1/4] acme-api: add helper to load client for an account Samuel Rufinatscha
2025-12-09 16:51 5% ` Max R. Carrara
2025-12-10 10:08 6% ` Samuel Rufinatscha
2025-12-03 10:22 12% ` [pbs-devel] [PATCH proxmox v4 2/4] acme: reduce visibility of Request type Samuel Rufinatscha
2025-12-09 16:51 5% ` Max R. Carrara
2025-12-03 10:22 15% ` [pbs-devel] [PATCH proxmox v4 3/4] acme: introduce http_status module Samuel Rufinatscha
2025-12-03 10:22 14% ` [pbs-devel] [PATCH proxmox v4 4/4] fix #6939: acme: support servers returning 204 for nonce requests Samuel Rufinatscha
2025-12-09 16:50 5% ` [pbs-devel] [PATCH proxmox{-backup, } v4 0/8] " Max R. Carrara
2025-12-10 9:44 6% ` Samuel Rufinatscha
2026-01-08 11:48 13% ` [pbs-devel] superseded: " Samuel Rufinatscha
2025-12-05 13:25 15% [pbs-devel] [PATCH proxmox{-backup, } 0/6] Reduce token.shadow verification overhead Samuel Rufinatscha
2025-12-05 13:25 14% ` [pbs-devel] [PATCH proxmox-backup 1/3] pbs-config: cache verified API token secrets Samuel Rufinatscha
2025-12-05 14:04 5% ` Shannon Sterz
2025-12-09 13:29 6% ` Samuel Rufinatscha
2025-12-17 11:16 5% ` Christian Ebner
2025-12-17 11:25 0% ` Shannon Sterz
2025-12-10 11:47 5% ` Fabian Grünbichler
2025-12-10 15:35 6% ` Samuel Rufinatscha
2025-12-15 15:05 12% ` Samuel Rufinatscha
2025-12-15 19:00 12% ` Samuel Rufinatscha
2025-12-16 8:16 5% ` Fabian Grünbichler
2025-12-05 13:25 15% ` [pbs-devel] [PATCH proxmox-backup 2/3] pbs-config: invalidate token-secret cache on token.shadow changes Samuel Rufinatscha
2025-12-05 13:25 16% ` [pbs-devel] [PATCH proxmox-backup 3/3] pbs-config: add TTL window to token secret cache Samuel Rufinatscha
2025-12-05 13:25 14% ` [pbs-devel] [PATCH proxmox 1/3] proxmox-access-control: cache verified API token secrets Samuel Rufinatscha
2025-12-05 13:25 15% ` [pbs-devel] [PATCH proxmox 2/3] proxmox-access-control: invalidate token-secret cache on token.shadow changes Samuel Rufinatscha
2025-12-05 13:25 16% ` [pbs-devel] [PATCH proxmox 3/3] proxmox-access-control: add TTL window to token secret cache Samuel Rufinatscha
2025-12-05 14:06 5% ` [pbs-devel] [PATCH proxmox{-backup, } 0/6] Reduce token.shadow verification overhead Shannon Sterz
2025-12-09 13:58 6% ` Samuel Rufinatscha
2025-12-17 16:27 13% ` [pbs-devel] superseded: " Samuel Rufinatscha
2025-12-17 16:25 14% [pbs-devel] [PATCH proxmox{-backup, , -datacenter-manager} v2 0/7] token-shadow: reduce api token " Samuel Rufinatscha
2025-12-17 16:25 13% ` [pbs-devel] [PATCH proxmox-backup v2 1/3] pbs-config: cache verified API token secrets Samuel Rufinatscha
2025-12-17 16:25 12% ` [pbs-devel] [PATCH proxmox-backup v2 2/3] pbs-config: invalidate token-secret cache on token.shadow changes Samuel Rufinatscha
2025-12-17 16:25 14% ` [pbs-devel] [PATCH proxmox-backup v2 3/3] pbs-config: add TTL window to token secret cache Samuel Rufinatscha
2025-12-17 16:25 13% ` [pbs-devel] [PATCH proxmox v2 1/3] proxmox-access-control: cache verified API token secrets Samuel Rufinatscha
2025-12-17 16:25 12% ` [pbs-devel] [PATCH proxmox v2 2/3] proxmox-access-control: invalidate token-secret cache on token.shadow changes Samuel Rufinatscha
2025-12-17 16:25 15% ` [pbs-devel] [PATCH proxmox v2 3/3] proxmox-access-control: add TTL window to token secret cache Samuel Rufinatscha
2025-12-17 16:25 17% ` [pbs-devel] [PATCH proxmox-datacenter-manager v2 1/1] docs: document API token-cache TTL effects Samuel Rufinatscha
2025-12-18 11:03 12% ` [pbs-devel] [PATCH proxmox{-backup, , -datacenter-manager} v2 0/7] token-shadow: reduce api token verification overhead Samuel Rufinatscha
2026-01-02 16:09 13% ` [pbs-devel] superseded: " Samuel Rufinatscha
2026-01-02 16:07 13% [pbs-devel] [PATCH proxmox{-backup, , -datacenter-manager} v3 00/10] " Samuel Rufinatscha
2026-01-02 16:07 17% ` [pbs-devel] [PATCH proxmox-backup v3 1/4] pbs-config: add token.shadow generation to ConfigVersionCache Samuel Rufinatscha
2026-01-14 10:44 5% ` Fabian Grünbichler
2026-01-16 13:53 6% ` Samuel Rufinatscha
2026-01-02 16:07 12% ` [pbs-devel] [PATCH proxmox-backup v3 2/4] pbs-config: cache verified API token secrets Samuel Rufinatscha
2026-01-14 10:44 5% ` Fabian Grünbichler
2026-01-16 15:13 6% ` Samuel Rufinatscha
2026-01-16 15:29 5% ` Fabian Grünbichler
2026-01-16 15:33 6% ` Samuel Rufinatscha
2026-01-16 16:00 5% ` Fabian Grünbichler
2026-01-16 16:56 6% ` Samuel Rufinatscha
2026-01-02 16:07 12% ` [pbs-devel] [PATCH proxmox-backup v3 3/4] pbs-config: invalidate token-secret cache on token.shadow changes Samuel Rufinatscha
2026-01-14 10:44 5% ` Fabian Grünbichler
2026-01-20 9:21 6% ` Samuel Rufinatscha
2026-01-02 16:07 15% ` [pbs-devel] [PATCH proxmox-backup v3 4/4] pbs-config: add TTL window to token secret cache Samuel Rufinatscha
2026-01-02 16:07 17% ` [pbs-devel] [PATCH proxmox v3 1/4] proxmox-access-control: extend AccessControlConfig for token.shadow invalidation Samuel Rufinatscha
2026-01-02 16:07 12% ` [pbs-devel] [PATCH proxmox v3 2/4] proxmox-access-control: cache verified API token secrets Samuel Rufinatscha
2026-01-02 16:07 12% ` [pbs-devel] [PATCH proxmox v3 3/4] proxmox-access-control: invalidate token-secret cache on token.shadow changes Samuel Rufinatscha
2026-01-02 16:07 15% ` [pbs-devel] [PATCH proxmox v3 4/4] proxmox-access-control: add TTL window to token secret cache Samuel Rufinatscha
2026-01-02 16:07 13% ` [pbs-devel] [PATCH proxmox-datacenter-manager v3 1/2] pdm-config: implement token.shadow generation Samuel Rufinatscha
2026-01-14 10:45 5% ` Fabian Grünbichler
2026-01-16 16:28 6% ` Samuel Rufinatscha
2026-01-16 16:48 5% ` Shannon Sterz
2026-01-19 7:56 6% ` Samuel Rufinatscha
2026-01-02 16:07 17% ` [pbs-devel] [PATCH proxmox-datacenter-manager v3 2/2] docs: document API token-cache TTL effects Samuel Rufinatscha
2026-01-14 10:45 5% ` Fabian Grünbichler
2026-01-14 11:24 6% ` Samuel Rufinatscha
2026-01-21 15:15 13% ` [pbs-devel] superseded: [PATCH proxmox{-backup, , -datacenter-manager} v3 00/10] token-shadow: reduce api token verification overhead Samuel Rufinatscha
2026-01-05 10:34 [pbs-devel] [PATCH proxmox-backup 1/1] fix: s3: make s3_refresh apihandler sync Nicolas Frey
2026-01-05 15:22 13% ` Samuel Rufinatscha
2026-01-05 14:16 12% [pbs-devel] [PATCH proxmox-backup v6 0/4] datastore: remove config reload on hot path Samuel Rufinatscha
2026-01-05 14:16 16% ` [pbs-devel] [PATCH proxmox-backup v6 1/4] config: enable config version cache for datastore Samuel Rufinatscha
2026-01-05 14:16 11% ` [pbs-devel] [PATCH proxmox-backup v6 2/4] partial fix #6049: datastore: impl ConfigVersionCache fast path for lookups Samuel Rufinatscha
2026-01-05 14:16 15% ` [pbs-devel] [PATCH proxmox-backup v6 3/4] partial fix #6049: datastore: use config fast-path in Drop Samuel Rufinatscha
2026-01-05 14:16 13% ` [pbs-devel] [PATCH proxmox-backup v6 4/4] partial fix #6049: datastore: add TTL fallback to catch manual config edits Samuel Rufinatscha
2026-01-14 9:54 5% ` [pbs-devel] applied-series: [PATCH proxmox-backup v6 0/4] datastore: remove config reload on hot path Fabian Grünbichler
2026-01-06 14:24 [pve-devel] [PATCH pve-cluster 00/15 v1] Rewrite pmxcfs with Rust Kefu Chai
2026-01-06 14:24 ` [pve-devel] [PATCH pve-cluster 01/15] pmxcfs-rs: add workspace and pmxcfs-api-types crate Kefu Chai
2026-01-23 14:17 6% ` Samuel Rufinatscha
2026-01-26 9:00 6% ` Kefu Chai
2026-01-06 14:24 ` [pve-devel] [PATCH pve-cluster 02/15] pmxcfs-rs: add pmxcfs-config crate Kefu Chai
2026-01-23 15:01 6% ` Samuel Rufinatscha
2026-01-26 9:43 5% ` Kefu Chai
2026-01-06 14:24 ` [pve-devel] [PATCH pve-cluster 03/15] pmxcfs-rs: add pmxcfs-logger crate Kefu Chai
2026-01-27 13:16 6% ` Samuel Rufinatscha
2026-01-06 14:24 ` [pve-devel] [PATCH pve-cluster 04/15] pmxcfs-rs: add pmxcfs-rrd crate Kefu Chai
2026-01-29 14:44 5% ` Samuel Rufinatscha
2026-01-06 14:24 ` [pve-devel] [PATCH pve-cluster 05/15] pmxcfs-rs: add pmxcfs-memdb crate Kefu Chai
2026-01-30 15:35 5% ` Samuel Rufinatscha
2026-01-06 14:24 ` [pve-devel] [PATCH pve-cluster 06/15] pmxcfs-rs: add pmxcfs-status crate Kefu Chai
2026-02-02 16:07 5% ` Samuel Rufinatscha
2026-01-06 14:24 ` [pve-devel] [PATCH pve-cluster 07/15] pmxcfs-rs: add pmxcfs-test-utils infrastructure crate Kefu Chai
2026-02-03 17:03 6% ` Samuel Rufinatscha
2026-01-07 12:46 6% [pbs-devel] [PATCH proxmox-backup v2 1/1] fix: s3: make s3_refresh apihandler sync Nicolas Frey
2026-01-08 11:26 11% [pbs-devel] [PATCH proxmox{, -backup} v5 0/9] fix #6939: acme: support servers returning 204 for nonce requests Samuel Rufinatscha
2026-01-08 11:26 10% ` [pbs-devel] [PATCH proxmox v5 1/4] acme: reduce visibility of Request type Samuel Rufinatscha
2026-01-13 13:46 5% ` Fabian Grünbichler
2026-01-14 15:07 6% ` Samuel Rufinatscha
2026-01-08 11:26 15% ` [pbs-devel] [PATCH proxmox v5 2/4] acme: introduce http_status module Samuel Rufinatscha
2026-01-13 13:45 5% ` Fabian Grünbichler
2026-01-14 10:29 6% ` Samuel Rufinatscha
2026-01-15 9:25 5% ` Fabian Grünbichler
2026-01-08 11:26 14% ` [pbs-devel] [PATCH proxmox v5 3/4] fix #6939: acme: support servers returning 204 for nonce requests Samuel Rufinatscha
2026-01-08 11:26 17% ` [pbs-devel] [PATCH proxmox v5 4/4] acme-api: add helper to load client for an account Samuel Rufinatscha
2026-01-13 13:45 5% ` Fabian Grünbichler
2026-01-13 16:57 6% ` Samuel Rufinatscha
2026-01-08 11:26 13% ` [pbs-devel] [PATCH proxmox-backup v5 1/5] acme: clean up ACME-related imports Samuel Rufinatscha
2026-01-13 13:45 5% ` [pbs-devel] applied: " Fabian Grünbichler
2026-01-08 11:26 15% ` [pbs-devel] [PATCH proxmox-backup v5 2/5] acme: include proxmox-acme-api dependency Samuel Rufinatscha
2026-01-13 13:45 5% ` Fabian Grünbichler
2026-01-13 16:41 6% ` Samuel Rufinatscha
2026-01-08 11:26 6% ` [pbs-devel] [PATCH proxmox-backup v5 3/5] acme: drop local AcmeClient Samuel Rufinatscha
2026-01-13 13:45 5% ` Fabian Grünbichler
2026-01-14 8:56 6% ` Samuel Rufinatscha
2026-01-14 9:58 5% ` Fabian Grünbichler
2026-01-14 10:52 6% ` Samuel Rufinatscha
2026-01-14 16:41 12% ` Samuel Rufinatscha
2026-01-08 11:26 8% ` [pbs-devel] [PATCH proxmox-backup v5 4/5] acme: change API impls to use proxmox-acme-api handlers Samuel Rufinatscha
2026-01-13 13:45 5% ` Fabian Grünbichler
2026-01-13 16:53 6% ` Samuel Rufinatscha
2026-01-08 11:26 7% ` [pbs-devel] [PATCH proxmox-backup v5 5/5] acme: certificate ordering through proxmox-acme-api Samuel Rufinatscha
2026-01-13 13:45 5% ` Fabian Grünbichler
2026-01-13 16:51 6% ` Samuel Rufinatscha
2026-01-13 13:48 5% ` [pbs-devel] [PATCH proxmox{, -backup} v5 0/9] fix #6939: acme: support servers returning 204 for nonce requests Fabian Grünbichler
2026-01-15 10:24 0% ` Max R. Carrara
2026-01-16 11:30 13% ` [pbs-devel] superseded: " Samuel Rufinatscha
2026-01-08 13:06 [pdm-devel] [PATCH datacenter-manager] fix #7120: remote updates: drop vanished nodes/remotes from cache file Lukas Wagner
2026-01-08 14:38 15% ` Samuel Rufinatscha
2026-01-16 11:28 10% [pbs-devel] [PATCH proxmox{, -backup} v6 0/5] fix #6939: acme: support servers returning 204 for nonce requests Samuel Rufinatscha
2026-01-16 11:28 16% ` [pbs-devel] [PATCH proxmox v6 1/3] acme-api: add ACME completion helpers Samuel Rufinatscha
2026-01-16 11:28 15% ` [pbs-devel] [PATCH proxmox v6 2/3] acme: introduce http_status module Samuel Rufinatscha
2026-01-16 11:28 14% ` [pbs-devel] [PATCH proxmox v6 3/3] fix #6939: acme: support servers returning 204 for nonce requests Samuel Rufinatscha
2026-01-16 11:28 4% ` [pbs-devel] [PATCH proxmox-backup v6 1/2] acme: remove local AcmeClient and use proxmox-acme-api handlers Samuel Rufinatscha
2026-01-16 11:28 9% ` [pbs-devel] [PATCH proxmox-backup v6 2/2] acme: remove unused src/acme and plugin code Samuel Rufinatscha
2026-01-21 15:13 14% [pbs-devel] [PATCH proxmox{-backup, , -datacenter-manager} v4 00/11] token-shadow: reduce api token verification overhead Samuel Rufinatscha
2026-01-21 15:13 17% ` [pbs-devel] [PATCH proxmox-backup v4 1/4] pbs-config: add token.shadow generation to ConfigVersionCache Samuel Rufinatscha
2026-01-21 15:13 12% ` [pbs-devel] [PATCH proxmox-backup v4 2/4] pbs-config: cache verified API token secrets Samuel Rufinatscha
2026-01-21 15:13 12% ` [pbs-devel] [PATCH proxmox-backup v4 3/4] pbs-config: invalidate token-secret cache on token.shadow changes Samuel Rufinatscha
2026-01-21 15:14 15% ` [pbs-devel] [PATCH proxmox-backup v4 4/4] pbs-config: add TTL window to token secret cache Samuel Rufinatscha
2026-01-21 15:14 13% ` [pbs-devel] [PATCH proxmox v4 1/4] proxmox-access-control: split AccessControlConfig and add token.shadow gen Samuel Rufinatscha
2026-01-21 15:14 12% ` [pbs-devel] [PATCH proxmox v4 2/4] proxmox-access-control: cache verified API token secrets Samuel Rufinatscha
2026-01-21 15:14 12% ` [pbs-devel] [PATCH proxmox v4 3/4] proxmox-access-control: invalidate token-secret cache on token.shadow changes Samuel Rufinatscha
2026-01-21 15:14 15% ` [pbs-devel] [PATCH proxmox v4 4/4] proxmox-access-control: add TTL window to token secret cache Samuel Rufinatscha
2026-01-21 15:14 13% ` [pbs-devel] [PATCH proxmox-datacenter-manager v4 1/3] pdm-config: implement token.shadow generation Samuel Rufinatscha
2026-01-21 15:14 17% ` [pbs-devel] [PATCH proxmox-datacenter-manager v4 2/3] docs: document API token-cache TTL effects Samuel Rufinatscha
2026-01-21 15:14 16% ` [pbs-devel] [PATCH proxmox-datacenter-manager v4 3/3] pdm-config: wire user+acl cache generation Samuel Rufinatscha