* [pbs-devel] [PATCH proxmox v6 2/3] acme: introduce http_status module
  2026-01-16 11:28 11% [pbs-devel] [PATCH proxmox{, -backup} v6 0/5] " Samuel Rufinatscha
  2026-01-16 11:28 16% ` [pbs-devel] [PATCH proxmox v6 1/3] acme-api: add ACME completion helpers Samuel Rufinatscha
@ 2026-01-16 11:28 15% ` Samuel Rufinatscha
  2026-01-16 11:28 14% ` [pbs-devel] [PATCH proxmox v6 3/3] fix #6939: acme: support servers returning 204 for nonce requests Samuel Rufinatscha
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 117+ results
From: Samuel Rufinatscha @ 2026-01-16 11:28 UTC (permalink / raw)
  To: pbs-devel

Introduce an internal http_status module with the common ACME HTTP
response codes, and replace use of crate::request::CREATED as well as
direct numeric status code usages.

Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
---
 proxmox-acme/src/account.rs      |  8 ++++----
 proxmox-acme/src/async_client.rs |  4 ++--
 proxmox-acme/src/lib.rs          |  2 ++
 proxmox-acme/src/request.rs      | 11 ++++++++++-
 4 files changed, 18 insertions(+), 7 deletions(-)

diff --git a/proxmox-acme/src/account.rs b/proxmox-acme/src/account.rs
index f763c1e9..c62e60e0 100644
--- a/proxmox-acme/src/account.rs
+++ b/proxmox-acme/src/account.rs
@@ -85,7 +85,7 @@ impl Account {
             method: "POST",
             content_type: crate::request::JSON_CONTENT_TYPE,
             body,
-            expected: crate::request::CREATED,
+            expected: crate::http_status::CREATED,
         };
 
         Ok(NewOrder::new(request))
@@ -107,7 +107,7 @@ impl Account {
             method: "POST",
             content_type: crate::request::JSON_CONTENT_TYPE,
             body,
-            expected: 200,
+            expected: crate::http_status::OK,
         })
     }
 
@@ -132,7 +132,7 @@ impl Account {
             method: "POST",
             content_type: crate::request::JSON_CONTENT_TYPE,
             body,
-            expected: 200,
+            expected: crate::http_status::OK,
         })
     }
 
@@ -405,7 +405,7 @@ impl AccountCreator {
             method: "POST",
             content_type: crate::request::JSON_CONTENT_TYPE,
             body,
-            expected: crate::request::CREATED,
+            expected: crate::http_status::CREATED,
         })
     }
 
diff --git a/proxmox-acme/src/async_client.rs b/proxmox-acme/src/async_client.rs
index dc755fb9..c803823d 100644
--- a/proxmox-acme/src/async_client.rs
+++ b/proxmox-acme/src/async_client.rs
@@ -498,7 +498,7 @@ impl AcmeClient {
                 method: "GET",
                 content_type: "",
                 body: String::new(),
-                expected: 200,
+                expected: crate::http_status::OK,
             },
             nonce,
         )
@@ -550,7 +550,7 @@ impl AcmeClient {
                 method: "HEAD",
                 content_type: "",
                 body: String::new(),
-                expected: 200,
+                expected: crate::http_status::OK,
             },
             nonce,
         )
diff --git a/proxmox-acme/src/lib.rs b/proxmox-acme/src/lib.rs
index df722629..b1be9d15 100644
--- a/proxmox-acme/src/lib.rs
+++ b/proxmox-acme/src/lib.rs
@@ -74,6 +74,8 @@ pub use request::Request;
 #[cfg(feature = "impl")]
 pub use order::NewOrder;
 #[cfg(feature = "impl")]
+pub(crate) use request::http_status;
+#[cfg(feature = "impl")]
 pub use request::ErrorResponse;
 
 /// Header name for nonces.
diff --git a/proxmox-acme/src/request.rs b/proxmox-acme/src/request.rs
index 78a90913..2c83255a 100644
--- a/proxmox-acme/src/request.rs
+++ b/proxmox-acme/src/request.rs
@@ -1,7 +1,6 @@
 use serde::Deserialize;
 
 pub(crate) const JSON_CONTENT_TYPE: &str = "application/jose+json";
-pub(crate) const CREATED: u16 = 201;
 
 /// A request which should be performed on the ACME provider.
 pub struct Request {
@@ -21,6 +20,16 @@ pub struct Request {
     pub expected: u16,
 }
 
+/// Common HTTP status codes used in ACME responses.
+pub(crate) mod http_status {
+    /// 200 OK
+    pub(crate) const OK: u16 = 200;
+    /// 201 Created
+    pub(crate) const CREATED: u16 = 201;
+    /// 204 No Content
+    pub(crate) const NO_CONTENT: u16 = 204;
+}
+
 /// An ACME error response contains a specially formatted type string, and can optionally
 /// contain textual details and a set of sub problems.
 #[derive(Clone, Debug, Deserialize)]
-- 
2.47.3



_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel



* [pbs-devel] [PATCH proxmox v6 3/3] fix #6939: acme: support servers returning 204 for nonce requests
  2026-01-16 11:28 11% [pbs-devel] [PATCH proxmox{, -backup} v6 0/5] " Samuel Rufinatscha
  2026-01-16 11:28 16% ` [pbs-devel] [PATCH proxmox v6 1/3] acme-api: add ACME completion helpers Samuel Rufinatscha
  2026-01-16 11:28 15% ` [pbs-devel] [PATCH proxmox v6 2/3] acme: introduce http_status module Samuel Rufinatscha
@ 2026-01-16 11:28 14% ` Samuel Rufinatscha
  2026-01-16 11:28  4% ` [pbs-devel] [PATCH proxmox-backup v6 1/2] acme: remove local AcmeClient and use proxmox-acme-api handlers Samuel Rufinatscha
  2026-01-16 11:28  9% ` [pbs-devel] [PATCH proxmox-backup v6 2/2] acme: remove unused src/acme and plugin code Samuel Rufinatscha
  4 siblings, 0 replies; 117+ results
From: Samuel Rufinatscha @ 2026-01-16 11:28 UTC (permalink / raw)
  To: pbs-devel

Some ACME servers (notably custom or legacy implementations) respond
to HEAD /newNonce with 204 No Content instead of the 200 OK that
RFC 8555 recommends [1]. Since 200 is only a SHOULD there, this
deviates from the recommendation without violating the specification.
This issue was reported on our bug tracker [2].

The previous implementation treated any non-200 response as an error,
causing account registration to fail against such servers. Allow a
request to list several expected status codes and accept both 200 and
204 for the nonce request to improve interoperability. This also makes
it possible to accept other 2xx codes later.

Note: In comparison, PVE’s Perl ACME client performs a GET request [3]
instead of a HEAD request and accepts any 2xx success code when
retrieving the nonce [4]. This difference in behavior does not affect
functionality but is worth noting for consistency across
implementations.

[1] https://datatracker.ietf.org/doc/html/rfc8555/#section-7.2
[2] https://bugzilla.proxmox.com/show_bug.cgi?id=6939
[3] https://git.proxmox.com/?p=proxmox-acme.git;a=blob;f=src/PVE/ACME.pm;h=f1e9bb7d316e3cea1e376c610b0479119217aecc;hb=HEAD#l219
[4] https://git.proxmox.com/?p=proxmox-acme.git;a=blob;f=src/PVE/ACME.pm;h=f1e9bb7d316e3cea1e376c610b0479119217aecc;hb=HEAD#l597

Fixes: #6939
Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
---
 proxmox-acme/src/account.rs      | 10 +++++-----
 proxmox-acme/src/async_client.rs |  6 +++---
 proxmox-acme/src/client.rs       |  2 +-
 proxmox-acme/src/request.rs      |  4 ++--
 4 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/proxmox-acme/src/account.rs b/proxmox-acme/src/account.rs
index c62e60e0..8df19a29 100644
--- a/proxmox-acme/src/account.rs
+++ b/proxmox-acme/src/account.rs
@@ -85,7 +85,7 @@ impl Account {
             method: "POST",
             content_type: crate::request::JSON_CONTENT_TYPE,
             body,
-            expected: crate::http_status::CREATED,
+            expected: &[crate::http_status::CREATED],
         };
 
         Ok(NewOrder::new(request))
@@ -107,7 +107,7 @@ impl Account {
             method: "POST",
             content_type: crate::request::JSON_CONTENT_TYPE,
             body,
-            expected: crate::http_status::OK,
+            expected: &[crate::http_status::OK],
         })
     }
 
@@ -132,7 +132,7 @@ impl Account {
             method: "POST",
             content_type: crate::request::JSON_CONTENT_TYPE,
             body,
-            expected: crate::http_status::OK,
+            expected: &[crate::http_status::OK],
         })
     }
 
@@ -157,7 +157,7 @@ impl Account {
             method: "POST",
             content_type: crate::request::JSON_CONTENT_TYPE,
             body,
-            expected: 200,
+            expected: &[crate::http_status::OK],
         })
     }
 
@@ -405,7 +405,7 @@ impl AccountCreator {
             method: "POST",
             content_type: crate::request::JSON_CONTENT_TYPE,
             body,
-            expected: crate::http_status::CREATED,
+            expected: &[crate::http_status::CREATED],
         })
     }
 
diff --git a/proxmox-acme/src/async_client.rs b/proxmox-acme/src/async_client.rs
index c803823d..66ec6024 100644
--- a/proxmox-acme/src/async_client.rs
+++ b/proxmox-acme/src/async_client.rs
@@ -420,7 +420,7 @@ impl AcmeClient {
         };
 
         if parts.status.is_success() {
-            if status != request.expected {
+            if !request.expected.contains(&status) {
                 return Err(Error::InvalidApi(format!(
                     "ACME server responded with unexpected status code: {:?}",
                     parts.status
@@ -498,7 +498,7 @@ impl AcmeClient {
                 method: "GET",
                 content_type: "",
                 body: String::new(),
-                expected: crate::http_status::OK,
+                expected: &[crate::http_status::OK],
             },
             nonce,
         )
@@ -550,7 +550,7 @@ impl AcmeClient {
                 method: "HEAD",
                 content_type: "",
                 body: String::new(),
-                expected: crate::http_status::OK,
+                expected: &[crate::http_status::OK, crate::http_status::NO_CONTENT],
             },
             nonce,
         )
diff --git a/proxmox-acme/src/client.rs b/proxmox-acme/src/client.rs
index 931f7245..881ee83d 100644
--- a/proxmox-acme/src/client.rs
+++ b/proxmox-acme/src/client.rs
@@ -203,7 +203,7 @@ impl Inner {
         let got_nonce = self.update_nonce(&mut response)?;
 
         if response.is_success() {
-            if response.status != request.expected {
+            if !request.expected.contains(&response.status) {
                 return Err(Error::InvalidApi(format!(
                     "API server responded with unexpected status code: {:?}",
                     response.status
diff --git a/proxmox-acme/src/request.rs b/proxmox-acme/src/request.rs
index 2c83255a..8a4017dc 100644
--- a/proxmox-acme/src/request.rs
+++ b/proxmox-acme/src/request.rs
@@ -16,8 +16,8 @@ pub struct Request {
     /// The body to pass along with request, or an empty string.
     pub body: String,
 
-    /// The expected status code a compliant ACME provider will return on success.
-    pub expected: u16,
+    /// The set of HTTP status codes that indicate a successful response from an ACME provider.
+    pub expected: &'static [u16],
 }
 
 /// Common HTTP status codes used in ACME responses.
-- 
2.47.3




* [pbs-devel] [PATCH proxmox{, -backup} v6 0/5] fix #6939: acme: support servers returning 204 for nonce requests
@ 2026-01-16 11:28 11% Samuel Rufinatscha
  2026-01-16 11:28 16% ` [pbs-devel] [PATCH proxmox v6 1/3] acme-api: add ACME completion helpers Samuel Rufinatscha
                   ` (4 more replies)
  0 siblings, 5 replies; 117+ results
From: Samuel Rufinatscha @ 2026-01-16 11:28 UTC (permalink / raw)
  To: pbs-devel

Hi,

this series fixes account registration for ACME providers that return
HTTP 204 No Content to the newNonce request. Currently, both the PBS
ACME client and the shared ACME client in proxmox-acme only accept
HTTP 200 OK for this request. The issue was observed in PBS against a
custom ACME deployment and reported as bug #6939 [1].

## Problem

During ACME account registration, PBS first fetches an anti-replay
nonce by sending a HEAD request to the CA’s newNonce URL.
RFC 8555 §7.2 [2] states that:

* the server MUST include a Replay-Nonce header with a fresh nonce,
* the server SHOULD use status 200 OK for the HEAD request,
* the server MUST also handle GET on the same resource and may return
  204 No Content with an empty body.

The reporter observed the following error message:

  "ACME server responded with unexpected status code: 204"

and mentioned that the issue did not appear with PVE 9 [1]. Looking at
PVE’s Perl ACME client [3], it uses a GET request instead of HEAD and
accepts any 2xx success code when retrieving the nonce. This difference
in behavior is worth noting.
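
The RFC requirements listed above can be sketched as a small check. This
is only an illustration and not the code from proxmox-acme: the
NonceResponse type and the extract_nonce helper are hypothetical names,
and the response is reduced to a status code plus a header list.

```rust
/// Name of the header carrying the anti-replay nonce.
const REPLAY_NONCE: &str = "Replay-Nonce";

/// Hypothetical, simplified view of an HTTP response (not a proxmox-acme type).
pub struct NonceResponse<'a> {
    pub status: u16,
    pub headers: &'a [(&'a str, &'a str)],
}

/// Returns the nonce if the response is acceptable for a newNonce request:
/// the status is 200 or 204 and a non-empty Replay-Nonce header is present.
pub fn extract_nonce<'a>(resp: &NonceResponse<'a>) -> Result<&'a str, String> {
    if !matches!(resp.status, 200 | 204) {
        return Err(format!("unexpected status code: {}", resp.status));
    }
    resp.headers
        .iter()
        // HTTP header names are case-insensitive.
        .find(|(name, _)| name.eq_ignore_ascii_case(REPLAY_NONCE))
        .map(|(_, value)| *value)
        .filter(|value| !value.is_empty())
        .ok_or_else(|| "missing Replay-Nonce header".to_string())
}
```

With this check, a 204 response that carries a nonce is accepted, while
a 2xx response without the header is still rejected.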

## Approach

This series changes the expected field of the internal Request type
from a single u16 to &'static [u16], so one request can explicitly
accept multiple success codes.

To avoid fixing the issue separately in PBS and in PDM (which uses the
shared ACME stack), this series fixes the bug in proxmox-acme and then
refactors PBS to use the shared ACME stack too.
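
As a rough sketch of the type change: the Request struct below is a
reduced stand-in with only two fields, and is_expected,
new_nonce_request and new_account_request are hypothetical helpers that
do not exist in proxmox-acme. The status constants mirror the
http_status module from this series.

```rust
/// Status codes, mirroring the http_status module added in this series.
mod http_status {
    pub const OK: u16 = 200;
    pub const CREATED: u16 = 201;
    pub const NO_CONTENT: u16 = 204;
}

/// Reduced stand-in for the Request type; the real one has more fields.
pub struct Request {
    pub method: &'static str,
    /// Status codes that count as success for this request.
    pub expected: &'static [u16],
}

/// Hypothetical helper: does `status` match one of the accepted codes?
pub fn is_expected(request: &Request, status: u16) -> bool {
    request.expected.contains(&status)
}

/// The nonce request accepts both 200 OK and 204 No Content.
pub fn new_nonce_request() -> Request {
    Request {
        method: "HEAD",
        expected: &[http_status::OK, http_status::NO_CONTENT],
    }
}

/// Account creation keeps accepting only 201 Created.
pub fn new_account_request() -> Request {
    Request {
        method: "POST",
        expected: &[http_status::CREATED],
    }
}
```

Because the slices are built from constants, they are promoted to
'static data and no allocation is needed per request.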

## Testing

I tested the refactor using Pebble with the HTTP challenge type.
The DNS challenge type will be tested as mentioned by Max (see v5).

*HTTP Challenge Type Test*

To test the refactor, I
(1) installed latest stable PBS on a VM
(2) created .deb package from latest PBS (master), containing the
 refactor
(3) installed created .deb package
(4) installed Pebble from Let's Encrypt [4] on the same VM
(5) created an ACME account and ordered the new certificate for the
 host domain.

Steps to reproduce:

(1) install latest stable PBS on a VM, create .deb package from latest
 PBS (master) containing the refactor, install created .deb package
(2) install Pebble from Let's Encrypt [4] on the same VM:

    cd
    apt update
    apt install -y golang git
    git clone https://github.com/letsencrypt/pebble
    cd pebble
    go build ./cmd/pebble

then, download and trust the Pebble cert:

    wget https://raw.githubusercontent.com/letsencrypt/pebble/main/test/certs/pebble.minica.pem
    cp pebble.minica.pem /usr/local/share/ca-certificates/pebble.minica.crt
    update-ca-certificates

We want Pebble to perform HTTP-01 validation against port 80, because
PBS’s standalone plugin will bind port 80. Set httpPort to 80.

    nano ./test/config/pebble-config.json

Start the Pebble server in the background:

    ./pebble -config ./test/config/pebble-config.json &

Create a Pebble ACME account:

    proxmox-backup-manager acme account register default admin@example.com --directory 'https://127.0.0.1:14000/dir'

To verify persistence of the account I checked

    ls /etc/proxmox-backup/acme/accounts

Verified that update-account works:

    proxmox-backup-manager acme account update default --contact "a@example.com,b@example.com"
    proxmox-backup-manager acme account info default

In the PBS GUI, you can create a new domain. You can use your host
domain name (see /etc/hosts). Select the created account and order the
certificate.

After a page reload, you might need to accept the new certificate in the browser.
In the PBS dashboard, you should see the new Pebble certificate.

Note: on reboot, the created Pebble ACME account will be gone and you
will need to create a new one, because Pebble does not persist account
info. In that case, remove the previously created account in
/etc/proxmox-backup/acme/accounts.

*Testing the newNonce fix*

To test the ACME newNonce fix, I put nginx in front of Pebble to
intercept the newNonce request and return 204 No Content instead of
200 OK. All other requests are forwarded to Pebble unchanged. This
requires trusting the nginx CA certificates on the VM via
/usr/local/share/ca-certificates and update-ca-certificates.

Then I ran the following command against nginx:

    proxmox-backup-manager acme account register proxytest root@backup.local --directory 'https://nginx-address/dir'

The account was created successfully. When adjusting the nginx
configuration to return any other, unexpected success status code,
PBS rejects the response as expected.

## Patch summary

0001 - proxmox: acme-api: add ACME completion helpers
0002 - proxmox: acme: introduce http_status module
0003 - proxmox: fix #6939: acme: support servers returning 204 for
       nonce requests
0004 - proxmox-backup: acme: remove local AcmeClient and use
       proxmox-acme-api handlers
0005 - proxmox-backup: acme: remove unused src/acme and plugin code

## Maintainer notes

proxmox-acme: requires version bump (breaking Request::expected change)
proxmox-backup: requires version bump
- NodeConfig::acme_config() signature changed from
Option<Result<AcmeConfig, Error>> to Result<AcmeConfig, Error>
- NodeConfig::acme_client() function removed
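
For callers, the signature change could look roughly like the sketch
below. It rests on assumptions: AcmeConfig here is a hypothetical
stand-in with a single field, parse_acme_config is an invented parser,
and the fallback to a default configuration when nothing is set is
inferred from the "inline transpose/default logic" changelog entry.

```rust
/// Hypothetical stand-in for the real AcmeConfig type.
#[derive(Debug, Default, PartialEq)]
pub struct AcmeConfig {
    pub account: String,
}

/// Invented parser for illustration: accepts "account=<name>".
fn parse_acme_config(raw: &str) -> Result<AcmeConfig, String> {
    match raw.strip_prefix("account=") {
        Some(name) if !name.is_empty() => Ok(AcmeConfig {
            account: name.to_string(),
        }),
        _ => Err(format!("invalid acme property string: {raw:?}")),
    }
}

/// Old shape: None when unset, Some(Err(_)) on a parse failure.
pub fn acme_config_old(raw: Option<&str>) -> Option<Result<AcmeConfig, String>> {
    raw.map(parse_acme_config)
}

/// New shape: transpose the Option<Result<..>> and fall back to the
/// default config when nothing is set (assumed behaviour).
pub fn acme_config_new(raw: Option<&str>) -> Result<AcmeConfig, String> {
    Ok(raw.map(parse_acme_config).transpose()?.unwrap_or_default())
}
```

Callers then only have to handle a Result instead of matching on
both layers.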

0001 - proxmox: acme-api: add ACME completion helpers could be applied
as an independent patch, so that https://bugzilla.proxmox.com/show_bug.cgi?id=7179
is not blocked and duplicate work is avoided.

## Changelog

Changes from v5 to v6:

* rebased
* proxmox-acme: revert visibility changes and dead-code removal
* proxmox-acme-api: remove load_client_with_account
* proxmox-backup: remove pub Node::acme_client()
* proxmox-backup: Node::acme_config() inline transpose/default logic
* proxmox-backup: merge PBS Client removal and API handler changes in
one patch
* improve commit messages

Changes from v4 to v5:

* rebased
* re-ordered series (proxmox-acme fix first)
* proxmox-backup: cleaned up imports based on an initial clean-up patch
* proxmox-acme: removed now unused post_request_raw_payload(),
  update_account_request(), deactivate_account_request()
* proxmox-acme: removed now obsolete/unused get_authorization() and
  GetAuthorization impl

Verified removal by compiling PBS, PDM, and proxmox-perl-rs
with all features.

Changes from v3 to v4:

* add proxmox-acme-api as a dependency and initialize it in
 PBS so PBS can use the shared ACME API instead.
* remove the PBS-local AcmeClient implementation and switch PBS
 over to the shared proxmox-acme async client.
* rework PBS’ ACME API endpoints to delegate to
 proxmox-acme-api handlers instead of duplicating logic locally.
* move PBS’ ACME certificate ordering logic over to
 proxmox-acme-api, keeping only certificate installation/reload in PBS.
* add a load_client_with_account helper in proxmox-acme-api so PBS
 (and others) can construct an AcmeClient for a configured account
 without duplicating boilerplate.
* hide the low-level Request type and its fields behind constructors
 / reduced visibility so changes to “expected” no longer affect the
 public API as they did in v3.
* split out the HTTP status constants into an internal http_status
 module as a separate preparatory cleanup before the bug fix, instead
 of doing this inline like in v3.
* rebased on top of the refactor: keep the same behavioural fix as in
 v3 (accept 204 for newNonce with Replay-Nonce present), but implement
 it on top of the http_status module that is part of the refactor.

Changes from v2 to v3:

* rename `http_success` module to `http_status`
* replace `http_success` usage
* introduced `http_success` module to contain the http success codes
* replaced `Vec<u16>` with `&[u16]` for expected codes to avoid allocations.
* clarified the behaviour of PVE's Perl ACME client in the commit message.

[1] Bugzilla report #6939:
https://bugzilla.proxmox.com/show_bug.cgi?id=6939
[2] RFC 8555 (ACME):
https://datatracker.ietf.org/doc/html/rfc8555/#section-7.2
[3] PVE’s Perl ACME client (allows 2xx codes for nonce requests):
https://git.proxmox.com/?p=proxmox-acme.git;a=blob;f=src/PVE/ACME.pm;h=f1e9bb7d316e3cea1e376c610b0479119217aecc;hb=HEAD#l597
[4] Pebble ACME server:
https://github.com/letsencrypt/pebble
[5] PVE’s Perl ACME client (performs a GET request):
https://git.proxmox.com/?p=proxmox-acme.git;a=blob;f=src/PVE/ACME.pm;h=f1e9bb7d316e3cea1e376c610b0479119217aecc;hb=HEAD#l219

proxmox:

Samuel Rufinatscha (3):
  acme-api: add ACME completion helpers
  acme: introduce http_status module
  fix #6939: acme: support servers returning 204 for nonce requests

 proxmox-acme-api/src/challenge_schemas.rs |  2 +-
 proxmox-acme-api/src/lib.rs               | 57 +++++++++++++++++++++++
 proxmox-acme/src/account.rs               | 10 ++--
 proxmox-acme/src/async_client.rs          |  6 +--
 proxmox-acme/src/client.rs                |  2 +-
 proxmox-acme/src/lib.rs                   |  2 +
 proxmox-acme/src/request.rs               | 15 ++++--
 7 files changed, 81 insertions(+), 13 deletions(-)


proxmox-backup:

Samuel Rufinatscha (2):
  acme: remove local AcmeClient and use proxmox-acme-api handlers
  acme: remove unused src/acme and plugin code

 Cargo.toml                             |   3 +
 src/acme/client.rs                     | 691 -------------------------
 src/acme/mod.rs                        |   5 -
 src/acme/plugin.rs                     | 335 ------------
 src/api2/config/acme.rs                | 399 ++------------
 src/api2/node/certificates.rs          | 221 +-------
 src/api2/types/acme.rs                 |  97 ----
 src/api2/types/mod.rs                  |   3 -
 src/bin/proxmox-backup-api.rs          |   2 +
 src/bin/proxmox-backup-manager.rs      |   3 +-
 src/bin/proxmox-backup-proxy.rs        |   1 +
 src/bin/proxmox_backup_manager/acme.rs |  37 +-
 src/config/acme/mod.rs                 | 168 ------
 src/config/acme/plugin.rs              | 189 -------
 src/config/mod.rs                      |   1 -
 src/config/node.rs                     |  43 +-
 src/lib.rs                             |   2 -
 17 files changed, 94 insertions(+), 2106 deletions(-)
 delete mode 100644 src/acme/client.rs
 delete mode 100644 src/acme/mod.rs
 delete mode 100644 src/acme/plugin.rs
 delete mode 100644 src/api2/types/acme.rs
 delete mode 100644 src/config/acme/mod.rs
 delete mode 100644 src/config/acme/plugin.rs


Summary over all repositories:
  24 files changed, 175 insertions(+), 2119 deletions(-)

-- 
Generated by git-murpp 0.8.1



* [pbs-devel] [PATCH proxmox v6 1/3] acme-api: add ACME completion helpers
  2026-01-16 11:28 11% [pbs-devel] [PATCH proxmox{, -backup} v6 0/5] " Samuel Rufinatscha
@ 2026-01-16 11:28 16% ` Samuel Rufinatscha
  2026-01-16 11:28 15% ` [pbs-devel] [PATCH proxmox v6 2/3] acme: introduce http_status module Samuel Rufinatscha
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 117+ results
From: Samuel Rufinatscha @ 2026-01-16 11:28 UTC (permalink / raw)
  To: pbs-devel

Factors out the PBS ACME completion helpers and adds them to
proxmox-acme-api.

Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
---
 proxmox-acme-api/src/challenge_schemas.rs |  2 +-
 proxmox-acme-api/src/lib.rs               | 57 +++++++++++++++++++++++
 2 files changed, 58 insertions(+), 1 deletion(-)

diff --git a/proxmox-acme-api/src/challenge_schemas.rs b/proxmox-acme-api/src/challenge_schemas.rs
index e66e327e..4e94d3ff 100644
--- a/proxmox-acme-api/src/challenge_schemas.rs
+++ b/proxmox-acme-api/src/challenge_schemas.rs
@@ -29,7 +29,7 @@ impl Serialize for ChallengeSchemaWrapper {
     }
 }
 
-fn load_dns_challenge_schema() -> Result<Vec<AcmeChallengeSchema>, Error> {
+pub(crate) fn load_dns_challenge_schema() -> Result<Vec<AcmeChallengeSchema>, Error> {
     let raw = file_read_string(ACME_DNS_SCHEMA_FN)?;
     let schemas: serde_json::Map<String, Value> = serde_json::from_str(&raw)?;
 
diff --git a/proxmox-acme-api/src/lib.rs b/proxmox-acme-api/src/lib.rs
index 623e9e23..ba64569d 100644
--- a/proxmox-acme-api/src/lib.rs
+++ b/proxmox-acme-api/src/lib.rs
@@ -46,3 +46,60 @@ pub(crate) mod acme_plugin;
 mod certificate_helpers;
 #[cfg(feature = "impl")]
 pub use certificate_helpers::{create_self_signed_cert, order_certificate, revoke_certificate};
+
+#[cfg(feature = "impl")]
+pub mod completion {
+
+    use std::collections::HashMap;
+    use std::ops::ControlFlow;
+
+    use crate::account_config::foreach_acme_account;
+    use crate::challenge_schemas::load_dns_challenge_schema;
+    use crate::plugin_config::plugin_config;
+
+    pub fn complete_acme_account(_arg: &str, _param: &HashMap<String, String>) -> Vec<String> {
+        let mut out = Vec::new();
+        let _ = foreach_acme_account(|name| {
+            out.push(name.into_string());
+            ControlFlow::Continue(())
+        });
+        out
+    }
+
+    pub fn complete_acme_plugin(_arg: &str, _param: &HashMap<String, String>) -> Vec<String> {
+        match plugin_config() {
+            Ok((config, _digest)) => config
+                .iter()
+                .map(|(id, (_type, _cfg))| id.clone())
+                .collect(),
+            Err(_) => Vec::new(),
+        }
+    }
+
+    pub fn complete_acme_plugin_type(_arg: &str, _param: &HashMap<String, String>) -> Vec<String> {
+        vec![
+            "dns".to_string(),
+            //"http".to_string(), // makes currently not really sense to create or the like
+        ]
+    }
+
+    pub fn complete_acme_api_challenge_type(
+        _arg: &str,
+        param: &HashMap<String, String>,
+    ) -> Vec<String> {
+        if param.get("type") == Some(&"dns".to_string()) {
+            match load_dns_challenge_schema() {
+                Ok(schema) => schema.into_iter().map(|s| s.id).collect(),
+                Err(_) => Vec::new(),
+            }
+        } else {
+            Vec::new()
+        }
+    }
+}
+
+#[cfg(feature = "impl")]
+pub use completion::{
+    complete_acme_account, complete_acme_api_challenge_type, complete_acme_plugin,
+    complete_acme_plugin_type,
+};
-- 
2.47.3




* [pbs-devel] [PATCH proxmox-backup v6 2/2] acme: remove unused src/acme and plugin code
  2026-01-16 11:28 11% [pbs-devel] [PATCH proxmox{, -backup} v6 0/5] " Samuel Rufinatscha
                   ` (3 preceding siblings ...)
  2026-01-16 11:28  4% ` [pbs-devel] [PATCH proxmox-backup v6 1/2] acme: remove local AcmeClient and use proxmox-acme-api handlers Samuel Rufinatscha
@ 2026-01-16 11:28  9% ` Samuel Rufinatscha
  4 siblings, 0 replies; 117+ results
From: Samuel Rufinatscha @ 2026-01-16 11:28 UTC (permalink / raw)
  To: pbs-devel

Removes the unused src/acme module and plugin code as PBS now uses the
factored out client/API handlers.

Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
---
 src/acme/mod.rs           |   1 -
 src/acme/plugin.rs        | 335 --------------------------------------
 src/api2/types/acme.rs    |  38 -----
 src/api2/types/mod.rs     |   3 -
 src/config/acme/mod.rs    |   1 -
 src/config/acme/plugin.rs | 105 ------------
 src/config/mod.rs         |   1 -
 src/lib.rs                |   2 -
 8 files changed, 486 deletions(-)
 delete mode 100644 src/acme/mod.rs
 delete mode 100644 src/acme/plugin.rs
 delete mode 100644 src/api2/types/acme.rs
 delete mode 100644 src/config/acme/mod.rs
 delete mode 100644 src/config/acme/plugin.rs

diff --git a/src/acme/mod.rs b/src/acme/mod.rs
deleted file mode 100644
index 700d90d7..00000000
--- a/src/acme/mod.rs
+++ /dev/null
@@ -1 +0,0 @@
-pub(crate) mod plugin;
diff --git a/src/acme/plugin.rs b/src/acme/plugin.rs
deleted file mode 100644
index 6804243c..00000000
--- a/src/acme/plugin.rs
+++ /dev/null
@@ -1,335 +0,0 @@
-use std::future::Future;
-use std::net::{IpAddr, SocketAddr};
-use std::pin::Pin;
-use std::process::Stdio;
-use std::sync::Arc;
-use std::time::Duration;
-
-use anyhow::{bail, format_err, Error};
-use bytes::Bytes;
-use futures::TryFutureExt;
-use http_body_util::Full;
-use hyper::body::Incoming;
-use hyper::server::conn::http1;
-use hyper::service::service_fn;
-use hyper::{Request, Response};
-use hyper_util::rt::TokioIo;
-use tokio::io::{AsyncBufReadExt, AsyncRead, AsyncWriteExt, BufReader};
-use tokio::net::TcpListener;
-use tokio::process::Command;
-
-use proxmox_acme::async_client::AcmeClient;
-use proxmox_acme::{Authorization, Challenge};
-use proxmox_rest_server::WorkerTask;
-
-use crate::api2::types::AcmeDomain;
-use crate::config::acme::plugin::{DnsPlugin, PluginData};
-
-const PROXMOX_ACME_SH_PATH: &str = "/usr/share/proxmox-acme/proxmox-acme";
-
-pub(crate) fn get_acme_plugin(
-    plugin_data: &PluginData,
-    name: &str,
-) -> Result<Option<Box<dyn AcmePlugin + Send + Sync + 'static>>, Error> {
-    let (ty, data) = match plugin_data.get(name) {
-        Some(plugin) => plugin,
-        None => return Ok(None),
-    };
-
-    Ok(Some(match ty.as_str() {
-        "dns" => {
-            let plugin: DnsPlugin = serde::Deserialize::deserialize(data)?;
-            Box::new(plugin)
-        }
-        "standalone" => {
-            // this one has no config
-            Box::<StandaloneServer>::default()
-        }
-        other => bail!("missing implementation for plugin type '{}'", other),
-    }))
-}
-
-pub(crate) trait AcmePlugin {
-    /// Setup everything required to trigger the validation and return the corresponding validation
-    /// URL.
-    fn setup<'fut, 'a: 'fut, 'b: 'fut, 'c: 'fut, 'd: 'fut>(
-        &'a mut self,
-        client: &'b mut AcmeClient,
-        authorization: &'c Authorization,
-        domain: &'d AcmeDomain,
-        task: Arc<WorkerTask>,
-    ) -> Pin<Box<dyn Future<Output = Result<&'c str, Error>> + Send + 'fut>>;
-
-    fn teardown<'fut, 'a: 'fut, 'b: 'fut, 'c: 'fut, 'd: 'fut>(
-        &'a mut self,
-        client: &'b mut AcmeClient,
-        authorization: &'c Authorization,
-        domain: &'d AcmeDomain,
-        task: Arc<WorkerTask>,
-    ) -> Pin<Box<dyn Future<Output = Result<(), Error>> + Send + 'fut>>;
-}
-
-fn extract_challenge<'a>(
-    authorization: &'a Authorization,
-    ty: &str,
-) -> Result<&'a Challenge, Error> {
-    authorization
-        .challenges
-        .iter()
-        .find(|ch| ch.ty == ty)
-        .ok_or_else(|| format_err!("no supported challenge type ({}) found", ty))
-}
-
-async fn pipe_to_tasklog<T: AsyncRead + Unpin>(
-    pipe: T,
-    task: Arc<WorkerTask>,
-) -> Result<(), std::io::Error> {
-    let mut pipe = BufReader::new(pipe);
-    let mut line = String::new();
-    loop {
-        line.clear();
-        match pipe.read_line(&mut line).await {
-            Ok(0) => return Ok(()),
-            Ok(_) => task.log_message(line.as_str()),
-            Err(err) => return Err(err),
-        }
-    }
-}
-
-impl DnsPlugin {
-    async fn action<'a>(
-        &self,
-        client: &mut AcmeClient,
-        authorization: &'a Authorization,
-        domain: &AcmeDomain,
-        task: Arc<WorkerTask>,
-        action: &str,
-    ) -> Result<&'a str, Error> {
-        let challenge = extract_challenge(authorization, "dns-01")?;
-        let mut stdin_data = client
-            .dns_01_txt_value(
-                challenge
-                    .token()
-                    .ok_or_else(|| format_err!("missing token in challenge"))?,
-            )?
-            .into_bytes();
-        stdin_data.push(b'\n');
-        stdin_data.extend(self.data.as_bytes());
-        if stdin_data.last() != Some(&b'\n') {
-            stdin_data.push(b'\n');
-        }
-
-        let mut command = Command::new("/usr/bin/setpriv");
-
-        #[rustfmt::skip]
-        command.args([
-            "--reuid", "nobody",
-            "--regid", "nogroup",
-            "--clear-groups",
-            "--reset-env",
-            "--",
-            "/bin/bash",
-                PROXMOX_ACME_SH_PATH,
-                action,
-                &self.core.api,
-                domain.alias.as_deref().unwrap_or(&domain.domain),
-        ]);
-
-        // We could use 1 socketpair, but tokio wraps them all in `File` internally causing `close`
-        // to be called separately on all of them without exception, so we need 3 pipes :-(
-
-        let mut child = command
-            .stdin(Stdio::piped())
-            .stdout(Stdio::piped())
-            .stderr(Stdio::piped())
-            .spawn()?;
-
-        let mut stdin = child.stdin.take().expect("Stdio::piped()");
-        let stdout = child.stdout.take().expect("Stdio::piped() failed?");
-        let stdout = pipe_to_tasklog(stdout, Arc::clone(&task));
-        let stderr = child.stderr.take().expect("Stdio::piped() failed?");
-        let stderr = pipe_to_tasklog(stderr, Arc::clone(&task));
-        let stdin = async move {
-            stdin.write_all(&stdin_data).await?;
-            stdin.flush().await?;
-            Ok::<_, std::io::Error>(())
-        };
-        match futures::try_join!(stdin, stdout, stderr) {
-            Ok(((), (), ())) => (),
-            Err(err) => {
-                if let Err(err) = child.kill().await {
-                    task.log_message(format!(
-                        "failed to kill '{PROXMOX_ACME_SH_PATH} {action}' command: {err}"
-                    ));
-                }
-                bail!("'{}' failed: {}", PROXMOX_ACME_SH_PATH, err);
-            }
-        }
-
-        let status = child.wait().await?;
-        if !status.success() {
-            bail!(
-                "'{} {}' exited with error ({})",
-                PROXMOX_ACME_SH_PATH,
-                action,
-                status.code().unwrap_or(-1)
-            );
-        }
-
-        Ok(&challenge.url)
-    }
-}
-
-impl AcmePlugin for DnsPlugin {
-    fn setup<'fut, 'a: 'fut, 'b: 'fut, 'c: 'fut, 'd: 'fut>(
-        &'a mut self,
-        client: &'b mut AcmeClient,
-        authorization: &'c Authorization,
-        domain: &'d AcmeDomain,
-        task: Arc<WorkerTask>,
-    ) -> Pin<Box<dyn Future<Output = Result<&'c str, Error>> + Send + 'fut>> {
-        Box::pin(async move {
-            let result = self
-                .action(client, authorization, domain, task.clone(), "setup")
-                .await;
-
-            let validation_delay = self.core.validation_delay.unwrap_or(30) as u64;
-            if validation_delay > 0 {
-                task.log_message(format!(
-                    "Sleeping {validation_delay} seconds to wait for TXT record propagation"
-                ));
-                tokio::time::sleep(Duration::from_secs(validation_delay)).await;
-            }
-            result
-        })
-    }
-
-    fn teardown<'fut, 'a: 'fut, 'b: 'fut, 'c: 'fut, 'd: 'fut>(
-        &'a mut self,
-        client: &'b mut AcmeClient,
-        authorization: &'c Authorization,
-        domain: &'d AcmeDomain,
-        task: Arc<WorkerTask>,
-    ) -> Pin<Box<dyn Future<Output = Result<(), Error>> + Send + 'fut>> {
-        Box::pin(async move {
-            self.action(client, authorization, domain, task, "teardown")
-                .await
-                .map(drop)
-        })
-    }
-}
-
-#[derive(Default)]
-struct StandaloneServer {
-    abort_handle: Option<futures::future::AbortHandle>,
-}
-
-// In case the "order_certificates" future gets dropped between setup & teardown, let's also cancel
-// the HTTP listener on Drop:
-impl Drop for StandaloneServer {
-    fn drop(&mut self) {
-        self.stop();
-    }
-}
-
-impl StandaloneServer {
-    fn stop(&mut self) {
-        if let Some(abort) = self.abort_handle.take() {
-            abort.abort();
-        }
-    }
-}
-
-async fn standalone_respond(
-    req: Request<Incoming>,
-    path: Arc<String>,
-    key_auth: Arc<String>,
-) -> Result<Response<Full<Bytes>>, hyper::Error> {
-    if req.method() == hyper::Method::GET && req.uri().path() == path.as_str() {
-        Ok(Response::builder()
-            .status(hyper::http::StatusCode::OK)
-            .body(key_auth.as_bytes().to_vec().into())
-            .unwrap())
-    } else {
-        Ok(Response::builder()
-            .status(hyper::http::StatusCode::NOT_FOUND)
-            .body("Not found.".into())
-            .unwrap())
-    }
-}
-
-impl AcmePlugin for StandaloneServer {
-    fn setup<'fut, 'a: 'fut, 'b: 'fut, 'c: 'fut, 'd: 'fut>(
-        &'a mut self,
-        client: &'b mut AcmeClient,
-        authorization: &'c Authorization,
-        _domain: &'d AcmeDomain,
-        _task: Arc<WorkerTask>,
-    ) -> Pin<Box<dyn Future<Output = Result<&'c str, Error>> + Send + 'fut>> {
-        Box::pin(async move {
-            self.stop();
-
-            let challenge = extract_challenge(authorization, "http-01")?;
-            let token = challenge
-                .token()
-                .ok_or_else(|| format_err!("missing token in challenge"))?;
-            let key_auth = Arc::new(client.key_authorization(token)?);
-            let path = Arc::new(format!("/.well-known/acme-challenge/{token}"));
-
-            // `[::]:80` first, then `*:80`
-            let dual = SocketAddr::new(IpAddr::from([0u16; 8]), 80);
-            let ipv4 = SocketAddr::new(IpAddr::from([0u8; 4]), 80);
-            let incoming = TcpListener::bind(dual)
-                .or_else(|_| TcpListener::bind(ipv4))
-                .await?;
-
-            let server = async move {
-                loop {
-                    let key_auth = Arc::clone(&key_auth);
-                    let path = Arc::clone(&path);
-                    match incoming.accept().await {
-                        Ok((tcp, _)) => {
-                            let io = TokioIo::new(tcp);
-                            let service = service_fn(move |request| {
-                                standalone_respond(
-                                    request,
-                                    Arc::clone(&path),
-                                    Arc::clone(&key_auth),
-                                )
-                            });
-
-                            tokio::task::spawn(async move {
-                                if let Err(err) =
-                                    http1::Builder::new().serve_connection(io, service).await
-                                {
-                                    println!("Error serving connection: {err:?}");
-                                }
-                            });
-                        }
-                        Err(err) => println!("Error accepting connection: {err:?}"),
-                    }
-                }
-            };
-            let (future, abort) = futures::future::abortable(server);
-            self.abort_handle = Some(abort);
-            tokio::spawn(future);
-
-            Ok(challenge.url.as_str())
-        })
-    }
-
-    fn teardown<'fut, 'a: 'fut, 'b: 'fut, 'c: 'fut, 'd: 'fut>(
-        &'a mut self,
-        _client: &'b mut AcmeClient,
-        _authorization: &'c Authorization,
-        _domain: &'d AcmeDomain,
-        _task: Arc<WorkerTask>,
-    ) -> Pin<Box<dyn Future<Output = Result<(), Error>> + Send + 'fut>> {
-        Box::pin(async move {
-            if let Some(abort) = self.abort_handle.take() {
-                abort.abort();
-            }
-            Ok(())
-        })
-    }
-}
diff --git a/src/api2/types/acme.rs b/src/api2/types/acme.rs
deleted file mode 100644
index b83b9882..00000000
--- a/src/api2/types/acme.rs
+++ /dev/null
@@ -1,38 +0,0 @@
-use serde::{Deserialize, Serialize};
-
-use pbs_api_types::{DNS_ALIAS_FORMAT, DNS_NAME_FORMAT, PROXMOX_SAFE_ID_FORMAT};
-use proxmox_schema::api;
-
-#[api(
-    properties: {
-        "domain": { format: &DNS_NAME_FORMAT },
-        "alias": {
-            optional: true,
-            format: &DNS_ALIAS_FORMAT,
-        },
-        "plugin": {
-            optional: true,
-            format: &PROXMOX_SAFE_ID_FORMAT,
-        },
-    },
-    default_key: "domain",
-)]
-#[derive(Deserialize, Serialize)]
-/// A domain entry for an ACME certificate.
-pub struct AcmeDomain {
-    /// The domain to certify for.
-    pub domain: String,
-
-    /// The domain to use for challenges instead of the default acme challenge domain.
-    ///
-    /// This is useful if you use CNAME entries to redirect `_acme-challenge.*` domains to a
-    /// different DNS server.
-    #[serde(skip_serializing_if = "Option::is_none")]
-    pub alias: Option<String>,
-
-    /// The plugin to use to validate this domain.
-    ///
-    /// Empty means standalone HTTP validation is used.
-    #[serde(skip_serializing_if = "Option::is_none")]
-    pub plugin: Option<String>,
-}
diff --git a/src/api2/types/mod.rs b/src/api2/types/mod.rs
index afc34b30..34193685 100644
--- a/src/api2/types/mod.rs
+++ b/src/api2/types/mod.rs
@@ -4,9 +4,6 @@ use anyhow::bail;
 
 use proxmox_schema::*;
 
-mod acme;
-pub use acme::*;
-
 // File names: may not contain slashes, may not start with "."
 pub const FILENAME_FORMAT: ApiStringFormat = ApiStringFormat::VerifyFn(|name| {
     if name.starts_with('.') {
diff --git a/src/config/acme/mod.rs b/src/config/acme/mod.rs
deleted file mode 100644
index 962cb1bb..00000000
--- a/src/config/acme/mod.rs
+++ /dev/null
@@ -1 +0,0 @@
-pub mod plugin;
diff --git a/src/config/acme/plugin.rs b/src/config/acme/plugin.rs
deleted file mode 100644
index e5a41f99..00000000
--- a/src/config/acme/plugin.rs
+++ /dev/null
@@ -1,105 +0,0 @@
-use anyhow::Error;
-use serde::{Deserialize, Serialize};
-use serde_json::Value;
-
-use pbs_api_types::PROXMOX_SAFE_ID_FORMAT;
-use proxmox_schema::{api, Schema, StringSchema, Updater};
-use proxmox_section_config::SectionConfigData;
-
-pub const PLUGIN_ID_SCHEMA: Schema = StringSchema::new("ACME Challenge Plugin ID.")
-    .format(&PROXMOX_SAFE_ID_FORMAT)
-    .min_length(1)
-    .max_length(32)
-    .schema();
-
-#[api(
-    properties: {
-        id: { schema: PLUGIN_ID_SCHEMA },
-        disable: {
-            optional: true,
-            default: false,
-        },
-        "validation-delay": {
-            default: 30,
-            optional: true,
-            minimum: 0,
-            maximum: 2 * 24 * 60 * 60,
-        },
-    },
-)]
-/// DNS ACME Challenge Plugin core data.
-#[derive(Deserialize, Serialize, Updater)]
-#[serde(rename_all = "kebab-case")]
-pub struct DnsPluginCore {
-    /// Plugin ID.
-    #[updater(skip)]
-    pub id: String,
-
-    /// DNS API Plugin Id.
-    pub api: String,
-
-    /// Extra delay in seconds to wait before requesting validation.
-    ///
-    /// Allows to cope with long TTL of DNS records.
-    #[serde(skip_serializing_if = "Option::is_none", default)]
-    pub validation_delay: Option<u32>,
-
-    /// Flag to disable the config.
-    #[serde(skip_serializing_if = "Option::is_none", default)]
-    pub disable: Option<bool>,
-}
-
-#[api(
-    properties: {
-        core: { type: DnsPluginCore },
-    },
-)]
-/// DNS ACME Challenge Plugin.
-#[derive(Deserialize, Serialize)]
-#[serde(rename_all = "kebab-case")]
-pub struct DnsPlugin {
-    #[serde(flatten)]
-    pub core: DnsPluginCore,
-
-    // We handle this property separately in the API calls.
-    /// DNS plugin data (base64url encoded without padding).
-    #[serde(with = "proxmox_serde::string_as_base64url_nopad")]
-    pub data: String,
-}
-
-impl DnsPlugin {
-    pub fn decode_data(&self, output: &mut Vec<u8>) -> Result<(), Error> {
-        Ok(proxmox_base64::url::decode_to_vec(&self.data, output)?)
-    }
-}
-
-pub struct PluginData {
-    data: SectionConfigData,
-}
-
-// And some convenience helpers.
-impl PluginData {
-    pub fn remove(&mut self, name: &str) -> Option<(String, Value)> {
-        self.data.sections.remove(name)
-    }
-
-    pub fn contains_key(&mut self, name: &str) -> bool {
-        self.data.sections.contains_key(name)
-    }
-
-    pub fn get(&self, name: &str) -> Option<&(String, Value)> {
-        self.data.sections.get(name)
-    }
-
-    pub fn get_mut(&mut self, name: &str) -> Option<&mut (String, Value)> {
-        self.data.sections.get_mut(name)
-    }
-
-    pub fn insert(&mut self, id: String, ty: String, plugin: Value) {
-        self.data.sections.insert(id, (ty, plugin));
-    }
-
-    pub fn iter(&self) -> impl Iterator<Item = (&String, &(String, Value))> + Send {
-        self.data.sections.iter()
-    }
-}
diff --git a/src/config/mod.rs b/src/config/mod.rs
index 19246742..f05af90d 100644
--- a/src/config/mod.rs
+++ b/src/config/mod.rs
@@ -15,7 +15,6 @@ use proxmox_lang::try_block;
 use pbs_api_types::{PamRealmConfig, PbsRealmConfig};
 use pbs_buildcfg::{self, configdir};
 
-pub mod acme;
 pub mod node;
 pub mod tfa;
 
diff --git a/src/lib.rs b/src/lib.rs
index 8633378c..828f5842 100644
--- a/src/lib.rs
+++ b/src/lib.rs
@@ -27,8 +27,6 @@ pub(crate) mod auth;
 
 pub mod tape;
 
-pub mod acme;
-
 pub mod client_helpers;
 
 pub mod traffic_control_cache;
-- 
2.47.3






* [pbs-devel] [PATCH proxmox-backup v6 1/2] acme: remove local AcmeClient and use proxmox-acme-api handlers
  2026-01-16 11:28 11% [pbs-devel] [PATCH proxmox{, -backup} v6 0/5] " Samuel Rufinatscha
                   ` (2 preceding siblings ...)
  2026-01-16 11:28 14% ` [pbs-devel] [PATCH proxmox v6 3/3] fix #6939: acme: support servers returning 204 for nonce requests Samuel Rufinatscha
@ 2026-01-16 11:28  4% ` Samuel Rufinatscha
  2026-01-16 11:28  9% ` [pbs-devel] [PATCH proxmox-backup v6 2/2] acme: remove unused src/acme and plugin code Samuel Rufinatscha
  4 siblings, 0 replies; 117+ results
From: Samuel Rufinatscha @ 2026-01-16 11:28 UTC (permalink / raw)
  To: pbs-devel

PBS currently uses its own ACME client and API logic, while PDM uses the
factored-out proxmox-acme and proxmox-acme-api crates, so the same ACME
code has to be maintained in two places. This patch moves PBS over to
the shared ACME stack.

Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
---
 Cargo.toml                             |   3 +
 src/acme/client.rs                     | 691 -------------------------
 src/acme/mod.rs                        |   4 -
 src/acme/plugin.rs                     |   2 +-
 src/api2/config/acme.rs                | 399 ++------------
 src/api2/node/certificates.rs          | 221 +-------
 src/api2/types/acme.rs                 |  61 +--
 src/bin/proxmox-backup-api.rs          |   2 +
 src/bin/proxmox-backup-manager.rs      |   3 +-
 src/bin/proxmox-backup-proxy.rs        |   1 +
 src/bin/proxmox_backup_manager/acme.rs |  37 +-
 src/config/acme/mod.rs                 | 167 ------
 src/config/acme/plugin.rs              |  88 +---
 src/config/node.rs                     |  43 +-
 14 files changed, 98 insertions(+), 1624 deletions(-)
 delete mode 100644 src/acme/client.rs

diff --git a/Cargo.toml b/Cargo.toml
index 49548ecc..5c94bfaa 100644
--- a/Cargo.toml
+++ b/Cargo.toml
@@ -101,6 +101,7 @@ pbs-api-types = "1.0.8"
 # other proxmox crates
 pathpatterns = "1"
 proxmox-acme = "1"
+proxmox-acme-api = { version = "1", features = [ "impl" ] }
 pxar = "1"
 
 # PBS workspace
@@ -251,6 +252,7 @@ pbs-api-types.workspace = true
 
 # in their respective repo
 proxmox-acme.workspace = true
+proxmox-acme-api.workspace = true
 pxar.workspace = true
 
 # proxmox-backup workspace/internal crates
@@ -269,6 +271,7 @@ proxmox-rrd-api-types.workspace = true
 [patch.crates-io]
 #pbs-api-types = { path = "../proxmox/pbs-api-types" }
 #proxmox-acme = { path = "../proxmox/proxmox-acme" }
+#proxmox-acme-api = { path = "../proxmox/proxmox-acme-api" }
 #proxmox-api-macro = { path = "../proxmox/proxmox-api-macro" }
 #proxmox-apt = { path = "../proxmox/proxmox-apt" }
 #proxmox-apt-api-types = { path = "../proxmox/proxmox-apt-api-types" }
diff --git a/src/acme/client.rs b/src/acme/client.rs
deleted file mode 100644
index 9fb6ad55..00000000
--- a/src/acme/client.rs
+++ /dev/null
@@ -1,691 +0,0 @@
-//! HTTP Client for the ACME protocol.
-
-use std::fs::OpenOptions;
-use std::io;
-use std::os::unix::fs::OpenOptionsExt;
-
-use anyhow::{bail, format_err};
-use bytes::Bytes;
-use http_body_util::BodyExt;
-use hyper::Request;
-use nix::sys::stat::Mode;
-use proxmox_http::Body;
-use serde::{Deserialize, Serialize};
-
-use proxmox_acme::account::AccountCreator;
-use proxmox_acme::order::{Order, OrderData};
-use proxmox_acme::types::AccountData as AcmeAccountData;
-use proxmox_acme::Request as AcmeRequest;
-use proxmox_acme::{Account, Authorization, Challenge, Directory, Error, ErrorResponse};
-use proxmox_http::client::Client;
-use proxmox_sys::fs::{replace_file, CreateOptions};
-
-use crate::api2::types::AcmeAccountName;
-use crate::config::acme::account_path;
-use crate::tools::pbs_simple_http;
-
-/// Our on-disk format inherited from PVE's proxmox-acme code.
-#[derive(Deserialize, Serialize)]
-#[serde(rename_all = "camelCase")]
-pub struct AccountData {
-    /// The account's location URL.
-    location: String,
-
-    /// The account data.
-    account: AcmeAccountData,
-
-    /// The private key as PEM formatted string.
-    key: String,
-
-    /// ToS URL the user agreed to.
-    #[serde(skip_serializing_if = "Option::is_none")]
-    tos: Option<String>,
-
-    #[serde(skip_serializing_if = "is_false", default)]
-    debug: bool,
-
-    /// The directory's URL.
-    directory_url: String,
-}
-
-#[inline]
-fn is_false(b: &bool) -> bool {
-    !*b
-}
-
-pub struct AcmeClient {
-    directory_url: String,
-    debug: bool,
-    account_path: Option<String>,
-    tos: Option<String>,
-    account: Option<Account>,
-    directory: Option<Directory>,
-    nonce: Option<String>,
-    http_client: Client,
-}
-
-impl AcmeClient {
-    /// Create a new ACME client for a given ACME directory URL.
-    pub fn new(directory_url: String) -> Self {
-        Self {
-            directory_url,
-            debug: false,
-            account_path: None,
-            tos: None,
-            account: None,
-            directory: None,
-            nonce: None,
-            http_client: pbs_simple_http(None),
-        }
-    }
-
-    /// Load an existing ACME account by name.
-    pub async fn load(account_name: &AcmeAccountName) -> Result<Self, anyhow::Error> {
-        let account_path = account_path(account_name.as_ref());
-        let data = match tokio::fs::read(&account_path).await {
-            Ok(data) => data,
-            Err(err) if err.kind() == io::ErrorKind::NotFound => {
-                bail!("acme account '{}' does not exist", account_name)
-            }
-            Err(err) => bail!(
-                "failed to load acme account from '{}' - {}",
-                account_path,
-                err
-            ),
-        };
-        let data: AccountData = serde_json::from_slice(&data).map_err(|err| {
-            format_err!(
-                "failed to parse acme account from '{}' - {}",
-                account_path,
-                err
-            )
-        })?;
-
-        let account = Account::from_parts(data.location, data.key, data.account);
-
-        let mut me = Self::new(data.directory_url);
-        me.debug = data.debug;
-        me.account_path = Some(account_path);
-        me.tos = data.tos;
-        me.account = Some(account);
-
-        Ok(me)
-    }
-
-    pub async fn new_account<'a>(
-        &'a mut self,
-        account_name: &AcmeAccountName,
-        tos_agreed: bool,
-        contact: Vec<String>,
-        rsa_bits: Option<u32>,
-        eab_creds: Option<(String, String)>,
-    ) -> Result<&'a Account, anyhow::Error> {
-        self.tos = if tos_agreed {
-            self.terms_of_service_url().await?.map(str::to_owned)
-        } else {
-            None
-        };
-
-        let mut account = Account::creator()
-            .set_contacts(contact)
-            .agree_to_tos(tos_agreed);
-
-        if let Some((eab_kid, eab_hmac_key)) = eab_creds {
-            account = account.set_eab_credentials(eab_kid, eab_hmac_key)?;
-        }
-
-        let account = if let Some(bits) = rsa_bits {
-            account.generate_rsa_key(bits)?
-        } else {
-            account.generate_ec_key()?
-        };
-
-        let _ = self.register_account(account).await?;
-
-        crate::config::acme::make_acme_account_dir()?;
-        let account_path = account_path(account_name.as_ref());
-        let file = OpenOptions::new()
-            .write(true)
-            .create_new(true)
-            .mode(0o600)
-            .open(&account_path)
-            .map_err(|err| format_err!("failed to open {:?} for writing: {}", account_path, err))?;
-        self.write_to(file).map_err(|err| {
-            format_err!(
-                "failed to write acme account to {:?}: {}",
-                account_path,
-                err
-            )
-        })?;
-        self.account_path = Some(account_path);
-
-        // unwrap: Setting `self.account` is literally this function's job, we just can't keep
-        // the borrow from from `self.register_account()` active due to clashes.
-        Ok(self.account.as_ref().unwrap())
-    }
-
-    fn save(&self) -> Result<(), anyhow::Error> {
-        let mut data = Vec::<u8>::new();
-        self.write_to(&mut data)?;
-        let account_path = self.account_path.as_ref().ok_or_else(|| {
-            format_err!("no account path set, cannot save updated account information")
-        })?;
-        crate::config::acme::make_acme_account_dir()?;
-        replace_file(
-            account_path,
-            &data,
-            CreateOptions::new()
-                .perm(Mode::from_bits_truncate(0o600))
-                .owner(nix::unistd::ROOT)
-                .group(nix::unistd::Gid::from_raw(0)),
-            true,
-        )
-    }
-
-    /// Shortcut to `account().ok_or_else(...).key_authorization()`.
-    pub fn key_authorization(&self, token: &str) -> Result<String, anyhow::Error> {
-        Ok(Self::need_account(&self.account)?.key_authorization(token)?)
-    }
-
-    /// Shortcut to `account().ok_or_else(...).dns_01_txt_value()`.
-    /// the key authorization value.
-    pub fn dns_01_txt_value(&self, token: &str) -> Result<String, anyhow::Error> {
-        Ok(Self::need_account(&self.account)?.dns_01_txt_value(token)?)
-    }
-
-    async fn register_account(
-        &mut self,
-        account: AccountCreator,
-    ) -> Result<&Account, anyhow::Error> {
-        let mut retry = retry();
-        let mut response = loop {
-            retry.tick()?;
-
-            let (directory, nonce) = Self::get_dir_nonce(
-                &mut self.http_client,
-                &self.directory_url,
-                &mut self.directory,
-                &mut self.nonce,
-            )
-            .await?;
-            let request = account.request(directory, nonce)?;
-            match self.run_request(request).await {
-                Ok(response) => break response,
-                Err(err) if err.is_bad_nonce() => continue,
-                Err(err) => return Err(err.into()),
-            }
-        };
-
-        let account = account.response(response.location_required()?, &response.body)?;
-
-        self.account = Some(account);
-        Ok(self.account.as_ref().unwrap())
-    }
-
-    pub async fn update_account<T: Serialize>(
-        &mut self,
-        data: &T,
-    ) -> Result<&Account, anyhow::Error> {
-        let account = Self::need_account(&self.account)?;
-
-        let mut retry = retry();
-        let response = loop {
-            retry.tick()?;
-
-            let (_directory, nonce) = Self::get_dir_nonce(
-                &mut self.http_client,
-                &self.directory_url,
-                &mut self.directory,
-                &mut self.nonce,
-            )
-            .await?;
-
-            let request = account.post_request(&account.location, nonce, data)?;
-            match Self::execute(&mut self.http_client, request, &mut self.nonce).await {
-                Ok(response) => break response,
-                Err(err) if err.is_bad_nonce() => continue,
-                Err(err) => return Err(err.into()),
-            }
-        };
-
-        // unwrap: we've been keeping an immutable reference to it from the top of the method
-        let _ = account;
-        self.account.as_mut().unwrap().data = response.json()?;
-        self.save()?;
-        Ok(self.account.as_ref().unwrap())
-    }
-
-    pub async fn new_order<I>(&mut self, domains: I) -> Result<Order, anyhow::Error>
-    where
-        I: IntoIterator<Item = String>,
-    {
-        let account = Self::need_account(&self.account)?;
-
-        let order = domains
-            .into_iter()
-            .fold(OrderData::new(), |order, domain| order.domain(domain));
-
-        let mut retry = retry();
-        loop {
-            retry.tick()?;
-
-            let (directory, nonce) = Self::get_dir_nonce(
-                &mut self.http_client,
-                &self.directory_url,
-                &mut self.directory,
-                &mut self.nonce,
-            )
-            .await?;
-
-            let mut new_order = account.new_order(&order, directory, nonce)?;
-            let mut response = match Self::execute(
-                &mut self.http_client,
-                new_order.request.take().unwrap(),
-                &mut self.nonce,
-            )
-            .await
-            {
-                Ok(response) => response,
-                Err(err) if err.is_bad_nonce() => continue,
-                Err(err) => return Err(err.into()),
-            };
-
-            return Ok(
-                new_order.response(response.location_required()?, response.bytes().as_ref())?
-            );
-        }
-    }
-
-    /// Low level "POST-as-GET" request.
-    async fn post_as_get(&mut self, url: &str) -> Result<AcmeResponse, anyhow::Error> {
-        let account = Self::need_account(&self.account)?;
-
-        let mut retry = retry();
-        loop {
-            retry.tick()?;
-
-            let (_directory, nonce) = Self::get_dir_nonce(
-                &mut self.http_client,
-                &self.directory_url,
-                &mut self.directory,
-                &mut self.nonce,
-            )
-            .await?;
-
-            let request = account.get_request(url, nonce)?;
-            match Self::execute(&mut self.http_client, request, &mut self.nonce).await {
-                Ok(response) => return Ok(response),
-                Err(err) if err.is_bad_nonce() => continue,
-                Err(err) => return Err(err.into()),
-            }
-        }
-    }
-
-    /// Low level POST request.
-    async fn post<T: Serialize>(
-        &mut self,
-        url: &str,
-        data: &T,
-    ) -> Result<AcmeResponse, anyhow::Error> {
-        let account = Self::need_account(&self.account)?;
-
-        let mut retry = retry();
-        loop {
-            retry.tick()?;
-
-            let (_directory, nonce) = Self::get_dir_nonce(
-                &mut self.http_client,
-                &self.directory_url,
-                &mut self.directory,
-                &mut self.nonce,
-            )
-            .await?;
-
-            let request = account.post_request(url, nonce, data)?;
-            match Self::execute(&mut self.http_client, request, &mut self.nonce).await {
-                Ok(response) => return Ok(response),
-                Err(err) if err.is_bad_nonce() => continue,
-                Err(err) => return Err(err.into()),
-            }
-        }
-    }
-
-    /// Request challenge validation. Afterwards, the challenge should be polled.
-    pub async fn request_challenge_validation(
-        &mut self,
-        url: &str,
-    ) -> Result<Challenge, anyhow::Error> {
-        Ok(self
-            .post(url, &serde_json::Value::Object(Default::default()))
-            .await?
-            .json()?)
-    }
-
-    /// Assuming the provided URL is an 'Authorization' URL, get and deserialize it.
-    pub async fn get_authorization(&mut self, url: &str) -> Result<Authorization, anyhow::Error> {
-        Ok(self.post_as_get(url).await?.json()?)
-    }
-
-    /// Assuming the provided URL is an 'Order' URL, get and deserialize it.
-    pub async fn get_order(&mut self, url: &str) -> Result<OrderData, anyhow::Error> {
-        Ok(self.post_as_get(url).await?.json()?)
-    }
-
-    /// Finalize an Order via its `finalize` URL property and the DER encoded CSR.
-    pub async fn finalize(&mut self, url: &str, csr: &[u8]) -> Result<(), anyhow::Error> {
-        let csr = proxmox_base64::url::encode_no_pad(csr);
-        let data = serde_json::json!({ "csr": csr });
-        self.post(url, &data).await?;
-        Ok(())
-    }
-
-    /// Download a certificate via its 'certificate' URL property.
-    ///
-    /// The certificate will be a PEM certificate chain.
-    pub async fn get_certificate(&mut self, url: &str) -> Result<Bytes, anyhow::Error> {
-        Ok(self.post_as_get(url).await?.body)
-    }
-
-    /// Revoke an existing certificate (PEM or DER formatted).
-    pub async fn revoke_certificate(
-        &mut self,
-        certificate: &[u8],
-        reason: Option<u32>,
-    ) -> Result<(), anyhow::Error> {
-        // TODO: This can also work without an account.
-        let account = Self::need_account(&self.account)?;
-
-        let revocation = account.revoke_certificate(certificate, reason)?;
-
-        let mut retry = retry();
-        loop {
-            retry.tick()?;
-
-            let (directory, nonce) = Self::get_dir_nonce(
-                &mut self.http_client,
-                &self.directory_url,
-                &mut self.directory,
-                &mut self.nonce,
-            )
-            .await?;
-
-            let request = revocation.request(directory, nonce)?;
-            match Self::execute(&mut self.http_client, request, &mut self.nonce).await {
-                Ok(_response) => return Ok(()),
-                Err(err) if err.is_bad_nonce() => continue,
-                Err(err) => return Err(err.into()),
-            }
-        }
-    }
-
-    fn need_account(account: &Option<Account>) -> Result<&Account, anyhow::Error> {
-        account
-            .as_ref()
-            .ok_or_else(|| format_err!("cannot use client without an account"))
-    }
-
-    pub(crate) fn account(&self) -> Result<&Account, anyhow::Error> {
-        Self::need_account(&self.account)
-    }
-
-    pub fn tos(&self) -> Option<&str> {
-        self.tos.as_deref()
-    }
-
-    pub fn directory_url(&self) -> &str {
-        &self.directory_url
-    }
-
-    fn to_account_data(&self) -> Result<AccountData, anyhow::Error> {
-        let account = self.account()?;
-
-        Ok(AccountData {
-            location: account.location.clone(),
-            key: account.private_key.clone(),
-            account: AcmeAccountData {
-                only_return_existing: false, // don't actually write this out in case it's set
-                ..account.data.clone()
-            },
-            tos: self.tos.clone(),
-            debug: self.debug,
-            directory_url: self.directory_url.clone(),
-        })
-    }
-
-    fn write_to<T: io::Write>(&self, out: T) -> Result<(), anyhow::Error> {
-        let data = self.to_account_data()?;
-
-        Ok(serde_json::to_writer_pretty(out, &data)?)
-    }
-}
-
-struct AcmeResponse {
-    body: Bytes,
-    location: Option<String>,
-    got_nonce: bool,
-}
-
-impl AcmeResponse {
-    /// Convenience helper to assert that a location header was part of the response.
-    fn location_required(&mut self) -> Result<String, anyhow::Error> {
-        self.location
-            .take()
-            .ok_or_else(|| format_err!("missing Location header"))
-    }
-
-    /// Convenience shortcut to perform json deserialization of the returned body.
-    fn json<T: for<'a> Deserialize<'a>>(&self) -> Result<T, Error> {
-        Ok(serde_json::from_slice(&self.body)?)
-    }
-
-    /// Convenience shortcut to get the body as bytes.
-    fn bytes(&self) -> &[u8] {
-        &self.body
-    }
-}
-
-impl AcmeClient {
-    /// Non-self-borrowing run_request version for borrow workarounds.
-    async fn execute(
-        http_client: &mut Client,
-        request: AcmeRequest,
-        nonce: &mut Option<String>,
-    ) -> Result<AcmeResponse, Error> {
-        let req_builder = Request::builder().method(request.method).uri(&request.url);
-
-        let http_request = if !request.content_type.is_empty() {
-            req_builder
-                .header("Content-Type", request.content_type)
-                .header("Content-Length", request.body.len())
-                .body(request.body.into())
-        } else {
-            req_builder.body(Body::empty())
-        }
-        .map_err(|err| Error::Custom(format!("failed to create http request: {err}")))?;
-
-        let response = http_client
-            .request(http_request)
-            .await
-            .map_err(|err| Error::Custom(err.to_string()))?;
-        let (parts, body) = response.into_parts();
-
-        let status = parts.status.as_u16();
-        let body = body
-            .collect()
-            .await
-            .map_err(|err| Error::Custom(format!("failed to retrieve response body: {err}")))?
-            .to_bytes();
-
-        let got_nonce = if let Some(new_nonce) = parts.headers.get(proxmox_acme::REPLAY_NONCE) {
-            let new_nonce = new_nonce.to_str().map_err(|err| {
-                Error::Client(format!(
-                    "received invalid replay-nonce header from ACME server: {err}"
-                ))
-            })?;
-            *nonce = Some(new_nonce.to_owned());
-            true
-        } else {
-            false
-        };
-
-        if parts.status.is_success() {
-            if status != request.expected {
-                return Err(Error::InvalidApi(format!(
-                    "ACME server responded with unexpected status code: {:?}",
-                    parts.status
-                )));
-            }
-
-            let location = parts
-                .headers
-                .get("Location")
-                .map(|header| {
-                    header.to_str().map(str::to_owned).map_err(|err| {
-                        Error::Client(format!(
-                            "received invalid location header from ACME server: {err}"
-                        ))
-                    })
-                })
-                .transpose()?;
-
-            return Ok(AcmeResponse {
-                body,
-                location,
-                got_nonce,
-            });
-        }
-
-        let error: ErrorResponse = serde_json::from_slice(&body).map_err(|err| {
-            Error::Client(format!(
-                "error status with improper error ACME response: {err}"
-            ))
-        })?;
-
-        if error.ty == proxmox_acme::error::BAD_NONCE {
-            if !got_nonce {
-                return Err(Error::InvalidApi(
-                    "badNonce without a new Replay-Nonce header".to_string(),
-                ));
-            }
-            return Err(Error::BadNonce);
-        }
-
-        Err(Error::Api(error))
-    }
-
-    /// Low-level API to run an n API request. This automatically updates the current nonce!
-    async fn run_request(&mut self, request: AcmeRequest) -> Result<AcmeResponse, Error> {
-        Self::execute(&mut self.http_client, request, &mut self.nonce).await
-    }
-
-    pub async fn directory(&mut self) -> Result<&Directory, Error> {
-        Ok(Self::get_directory(
-            &mut self.http_client,
-            &self.directory_url,
-            &mut self.directory,
-            &mut self.nonce,
-        )
-        .await?
-        .0)
-    }
-
-    async fn get_directory<'a, 'b>(
-        http_client: &mut Client,
-        directory_url: &str,
-        directory: &'a mut Option<Directory>,
-        nonce: &'b mut Option<String>,
-    ) -> Result<(&'a Directory, Option<&'b str>), Error> {
-        if let Some(d) = directory {
-            return Ok((d, nonce.as_deref()));
-        }
-
-        let response = Self::execute(
-            http_client,
-            AcmeRequest {
-                url: directory_url.to_string(),
-                method: "GET",
-                content_type: "",
-                body: String::new(),
-                expected: 200,
-            },
-            nonce,
-        )
-        .await?;
-
-        *directory = Some(Directory::from_parts(
-            directory_url.to_string(),
-            response.json()?,
-        ));
-
-        Ok((directory.as_mut().unwrap(), nonce.as_deref()))
-    }
-
-    /// Like `get_directory`, but if the directory provides no nonce, also performs a `HEAD`
-    /// request on the new nonce URL.
-    async fn get_dir_nonce<'a, 'b>(
-        http_client: &mut Client,
-        directory_url: &str,
-        directory: &'a mut Option<Directory>,
-        nonce: &'b mut Option<String>,
-    ) -> Result<(&'a Directory, &'b str), Error> {
-        // this let construct is a lifetime workaround:
-        let _ = Self::get_directory(http_client, directory_url, directory, nonce).await?;
-        let dir = directory.as_ref().unwrap(); // the above fails if it couldn't fill this option
-        if nonce.is_none() {
-            // this is also a lifetime issue...
-            let _ = Self::get_nonce(http_client, nonce, dir.new_nonce_url()).await?;
-        };
-        Ok((dir, nonce.as_deref().unwrap()))
-    }
-
-    pub async fn terms_of_service_url(&mut self) -> Result<Option<&str>, Error> {
-        Ok(self.directory().await?.terms_of_service_url())
-    }
-
-    async fn get_nonce<'a>(
-        http_client: &mut Client,
-        nonce: &'a mut Option<String>,
-        new_nonce_url: &str,
-    ) -> Result<&'a str, Error> {
-        let response = Self::execute(
-            http_client,
-            AcmeRequest {
-                url: new_nonce_url.to_owned(),
-                method: "HEAD",
-                content_type: "",
-                body: String::new(),
-                expected: 200,
-            },
-            nonce,
-        )
-        .await?;
-
-        if !response.got_nonce {
-            return Err(Error::InvalidApi(
-                "no new nonce received from new nonce URL".to_string(),
-            ));
-        }
-
-        nonce
-            .as_deref()
-            .ok_or_else(|| Error::Client("failed to update nonce".to_string()))
-    }
-}
-
-/// bad nonce retry count helper
-struct Retry(usize);
-
-const fn retry() -> Retry {
-    Retry(0)
-}
-
-impl Retry {
-    fn tick(&mut self) -> Result<(), Error> {
-        if self.0 >= 3 {
-            Err(Error::Client("kept getting a badNonce error!".to_string()))
-        } else {
-            self.0 += 1;
-            Ok(())
-        }
-    }
-}
diff --git a/src/acme/mod.rs b/src/acme/mod.rs
index bf61811c..700d90d7 100644
--- a/src/acme/mod.rs
+++ b/src/acme/mod.rs
@@ -1,5 +1 @@
-mod client;
-pub use client::AcmeClient;
-
 pub(crate) mod plugin;
-pub(crate) use plugin::get_acme_plugin;
diff --git a/src/acme/plugin.rs b/src/acme/plugin.rs
index 993d729b..6804243c 100644
--- a/src/acme/plugin.rs
+++ b/src/acme/plugin.rs
@@ -18,10 +18,10 @@ use tokio::io::{AsyncBufReadExt, AsyncRead, AsyncWriteExt, BufReader};
 use tokio::net::TcpListener;
 use tokio::process::Command;
 
+use proxmox_acme::async_client::AcmeClient;
 use proxmox_acme::{Authorization, Challenge};
 use proxmox_rest_server::WorkerTask;
 
-use crate::acme::AcmeClient;
 use crate::api2::types::AcmeDomain;
 use crate::config::acme::plugin::{DnsPlugin, PluginData};
 
diff --git a/src/api2/config/acme.rs b/src/api2/config/acme.rs
index 18671639..fb1a8a6f 100644
--- a/src/api2/config/acme.rs
+++ b/src/api2/config/acme.rs
@@ -1,29 +1,19 @@
-use std::fs;
-use std::ops::ControlFlow;
+use anyhow::Error;
 use std::path::Path;
-use std::sync::{Arc, LazyLock, Mutex};
-use std::time::SystemTime;
-
-use anyhow::{bail, format_err, Error};
-use hex::FromHex;
-use serde::{Deserialize, Serialize};
-use serde_json::{json, Value};
-use tracing::{info, warn};
+use tracing::info;
 
 use pbs_api_types::{Authid, PRIV_SYS_MODIFY};
-use proxmox_acme::types::AccountData as AcmeAccountData;
-use proxmox_acme::Account;
+use proxmox_acme_api::{
+    AccountEntry, AccountInfo, AcmeAccountName, AcmeChallengeSchema, ChallengeSchemaWrapper,
+    DeletablePluginProperty, DnsPluginCore, DnsPluginCoreUpdater, KnownAcmeDirectory, PluginConfig,
+    DEFAULT_ACME_DIRECTORY_ENTRY, PLUGIN_ID_SCHEMA,
+};
+use proxmox_config_digest::ConfigDigest;
 use proxmox_rest_server::WorkerTask;
 use proxmox_router::{
     http_bail, list_subdirs_api_method, Permission, Router, RpcEnvironment, SubdirMap,
 };
-use proxmox_schema::{api, param_bail};
-
-use crate::acme::AcmeClient;
-use crate::api2::types::{AcmeAccountName, AcmeChallengeSchema, KnownAcmeDirectory};
-use crate::config::acme::plugin::{
-    self, DnsPlugin, DnsPluginCore, DnsPluginCoreUpdater, PLUGIN_ID_SCHEMA,
-};
+use proxmox_schema::api;
 
 pub(crate) const ROUTER: Router = Router::new()
     .get(&list_subdirs_api_method!(SUBDIRS))
@@ -65,19 +55,6 @@ const PLUGIN_ITEM_ROUTER: Router = Router::new()
     .put(&API_METHOD_UPDATE_PLUGIN)
     .delete(&API_METHOD_DELETE_PLUGIN);
 
-#[api(
-    properties: {
-        name: { type: AcmeAccountName },
-    },
-)]
-/// An ACME Account entry.
-///
-/// Currently only contains a 'name' property.
-#[derive(Serialize)]
-pub struct AccountEntry {
-    name: AcmeAccountName,
-}
-
 #[api(
     access: {
         permission: &Permission::Privilege(&["system", "certificates"], PRIV_SYS_MODIFY, false),
@@ -91,40 +68,7 @@ pub struct AccountEntry {
 )]
 /// List ACME accounts.
 pub fn list_accounts() -> Result<Vec<AccountEntry>, Error> {
-    let mut entries = Vec::new();
-    crate::config::acme::foreach_acme_account(|name| {
-        entries.push(AccountEntry { name });
-        ControlFlow::Continue(())
-    })?;
-    Ok(entries)
-}
-
-#[api(
-    properties: {
-        account: { type: Object, properties: {}, additional_properties: true },
-        tos: {
-            type: String,
-            optional: true,
-        },
-    },
-)]
-/// ACME Account information.
-///
-/// This is what we return via the API.
-#[derive(Serialize)]
-pub struct AccountInfo {
-    /// Raw account data.
-    account: AcmeAccountData,
-
-    /// The ACME directory URL the account was created at.
-    directory: String,
-
-    /// The account's own URL within the ACME directory.
-    location: String,
-
-    /// The ToS URL, if the user agreed to one.
-    #[serde(skip_serializing_if = "Option::is_none")]
-    tos: Option<String>,
+    proxmox_acme_api::list_accounts()
 }
 
 #[api(
@@ -141,23 +85,7 @@ pub struct AccountInfo {
 )]
 /// Return existing ACME account information.
 pub async fn get_account(name: AcmeAccountName) -> Result<AccountInfo, Error> {
-    let client = AcmeClient::load(&name).await?;
-    let account = client.account()?;
-    Ok(AccountInfo {
-        location: account.location.clone(),
-        tos: client.tos().map(str::to_owned),
-        directory: client.directory_url().to_owned(),
-        account: AcmeAccountData {
-            only_return_existing: false, // don't actually write this out in case it's set
-            ..account.data.clone()
-        },
-    })
-}
-
-fn account_contact_from_string(s: &str) -> Vec<String> {
-    s.split(&[' ', ';', ',', '\0'][..])
-        .map(|s| format!("mailto:{s}"))
-        .collect()
+    proxmox_acme_api::get_account(name).await
 }
 
 #[api(
@@ -222,15 +150,11 @@ fn register_account(
         );
     }
 
-    if Path::new(&crate::config::acme::account_path(&name)).exists() {
+    if Path::new(&proxmox_acme_api::account_config_filename(&name)).exists() {
         http_bail!(BAD_REQUEST, "account {} already exists", name);
     }
 
-    let directory = directory.unwrap_or_else(|| {
-        crate::config::acme::DEFAULT_ACME_DIRECTORY_ENTRY
-            .url
-            .to_owned()
-    });
+    let directory = directory.unwrap_or_else(|| DEFAULT_ACME_DIRECTORY_ENTRY.url.to_string());
 
     WorkerTask::spawn(
         "acme-register",
@@ -238,41 +162,24 @@ fn register_account(
         auth_id.to_string(),
         true,
         move |_worker| async move {
-            let mut client = AcmeClient::new(directory);
-
             info!("Registering ACME account '{}'...", &name);
 
-            let account = do_register_account(
-                &mut client,
+            let location = proxmox_acme_api::register_account(
                 &name,
-                tos_url.is_some(),
                 contact,
-                None,
+                tos_url,
+                Some(directory),
                 eab_kid.zip(eab_hmac_key),
             )
             .await?;
 
-            info!("Registration successful, account URL: {}", account.location);
+            info!("Registration successful, account URL: {}", location);
 
             Ok(())
         },
     )
 }
 
-pub async fn do_register_account<'a>(
-    client: &'a mut AcmeClient,
-    name: &AcmeAccountName,
-    agree_to_tos: bool,
-    contact: String,
-    rsa_bits: Option<u32>,
-    eab_creds: Option<(String, String)>,
-) -> Result<&'a Account, Error> {
-    let contact = account_contact_from_string(&contact);
-    client
-        .new_account(name, agree_to_tos, contact, rsa_bits, eab_creds)
-        .await
-}
-
 #[api(
     input: {
         properties: {
@@ -303,14 +210,7 @@ pub fn update_account(
         auth_id.to_string(),
         true,
         move |_worker| async move {
-            let data = match contact {
-                Some(data) => json!({
-                    "contact": account_contact_from_string(&data),
-                }),
-                None => json!({}),
-            };
-
-            AcmeClient::load(&name).await?.update_account(&data).await?;
+            proxmox_acme_api::update_account(&name, contact).await?;
 
             Ok(())
         },
@@ -348,18 +248,8 @@ pub fn deactivate_account(
         auth_id.to_string(),
         true,
         move |_worker| async move {
-            match AcmeClient::load(&name)
-                .await?
-                .update_account(&json!({"status": "deactivated"}))
-                .await
-            {
-                Ok(_account) => (),
-                Err(err) if !force => return Err(err),
-                Err(err) => {
-                    warn!("error deactivating account {name}, proceeding anyway - {err}");
-                }
-            }
-            crate::config::acme::mark_account_deactivated(&name)?;
+            proxmox_acme_api::deactivate_account(&name, force).await?;
+
             Ok(())
         },
     )
@@ -386,15 +276,7 @@ pub fn deactivate_account(
 )]
 /// Get the Terms of Service URL for an ACME directory.
 async fn get_tos(directory: Option<String>) -> Result<Option<String>, Error> {
-    let directory = directory.unwrap_or_else(|| {
-        crate::config::acme::DEFAULT_ACME_DIRECTORY_ENTRY
-            .url
-            .to_owned()
-    });
-    Ok(AcmeClient::new(directory)
-        .terms_of_service_url()
-        .await?
-        .map(str::to_owned))
+    proxmox_acme_api::get_tos(directory).await
 }
 
 #[api(
@@ -409,52 +291,7 @@ async fn get_tos(directory: Option<String>) -> Result<Option<String>, Error> {
 )]
 /// Get named known ACME directory endpoints.
 fn get_directories() -> Result<&'static [KnownAcmeDirectory], Error> {
-    Ok(crate::config::acme::KNOWN_ACME_DIRECTORIES)
-}
-
-/// Wrapper for efficient Arc use when returning the ACME challenge-plugin schema for serializing
-struct ChallengeSchemaWrapper {
-    inner: Arc<Vec<AcmeChallengeSchema>>,
-}
-
-impl Serialize for ChallengeSchemaWrapper {
-    fn serialize<S>(&self, serializer: S) -> Result<S::Ok, S::Error>
-    where
-        S: serde::Serializer,
-    {
-        self.inner.serialize(serializer)
-    }
-}
-
-struct CachedSchema {
-    schema: Arc<Vec<AcmeChallengeSchema>>,
-    cached_mtime: SystemTime,
-}
-
-fn get_cached_challenge_schemas() -> Result<ChallengeSchemaWrapper, Error> {
-    static CACHE: LazyLock<Mutex<Option<CachedSchema>>> = LazyLock::new(|| Mutex::new(None));
-
-    // the actual loading code
-    let mut last = CACHE.lock().unwrap();
-
-    let actual_mtime = fs::metadata(crate::config::acme::ACME_DNS_SCHEMA_FN)?.modified()?;
-
-    let schema = match &*last {
-        Some(CachedSchema {
-            schema,
-            cached_mtime,
-        }) if *cached_mtime >= actual_mtime => schema.clone(),
-        _ => {
-            let new_schema = Arc::new(crate::config::acme::load_dns_challenge_schema()?);
-            *last = Some(CachedSchema {
-                schema: Arc::clone(&new_schema),
-                cached_mtime: actual_mtime,
-            });
-            new_schema
-        }
-    };
-
-    Ok(ChallengeSchemaWrapper { inner: schema })
+    Ok(proxmox_acme_api::KNOWN_ACME_DIRECTORIES)
 }
 
 #[api(
@@ -469,69 +306,7 @@ fn get_cached_challenge_schemas() -> Result<ChallengeSchemaWrapper, Error> {
 )]
 /// Get named known ACME directory endpoints.
 fn get_challenge_schema() -> Result<ChallengeSchemaWrapper, Error> {
-    get_cached_challenge_schemas()
-}
-
-#[api]
-#[derive(Default, Deserialize, Serialize)]
-#[serde(rename_all = "kebab-case")]
-/// The API's format is inherited from PVE/PMG:
-pub struct PluginConfig {
-    /// Plugin ID.
-    plugin: String,
-
-    /// Plugin type.
-    #[serde(rename = "type")]
-    ty: String,
-
-    /// DNS Api name.
-    #[serde(skip_serializing_if = "Option::is_none", default)]
-    api: Option<String>,
-
-    /// Plugin configuration data.
-    #[serde(skip_serializing_if = "Option::is_none", default)]
-    data: Option<String>,
-
-    /// Extra delay in seconds to wait before requesting validation.
-    ///
-    /// Allows to cope with long TTL of DNS records.
-    #[serde(skip_serializing_if = "Option::is_none", default)]
-    validation_delay: Option<u32>,
-
-    /// Flag to disable the config.
-    #[serde(skip_serializing_if = "Option::is_none", default)]
-    disable: Option<bool>,
-}
-
-// See PMG/PVE's $modify_cfg_for_api sub
-fn modify_cfg_for_api(id: &str, ty: &str, data: &Value) -> PluginConfig {
-    let mut entry = data.clone();
-
-    let obj = entry.as_object_mut().unwrap();
-    obj.remove("id");
-    obj.insert("plugin".to_string(), Value::String(id.to_owned()));
-    obj.insert("type".to_string(), Value::String(ty.to_owned()));
-
-    // FIXME: This needs to go once the `Updater` is fixed.
-    // None of these should be able to fail unless the user changed the files by hand, in which
-    // case we leave the unmodified string in the Value for now. This will be handled with an error
-    // later.
-    if let Some(Value::String(ref mut data)) = obj.get_mut("data") {
-        if let Ok(new) = proxmox_base64::url::decode_no_pad(&data) {
-            if let Ok(utf8) = String::from_utf8(new) {
-                *data = utf8;
-            }
-        }
-    }
-
-    // PVE/PMG do this explicitly for ACME plugins...
-    // obj.insert("digest".to_string(), Value::String(digest.clone()));
-
-    serde_json::from_value(entry).unwrap_or_else(|_| PluginConfig {
-        plugin: "*Error*".to_string(),
-        ty: "*Error*".to_string(),
-        ..Default::default()
-    })
+    proxmox_acme_api::get_cached_challenge_schemas()
 }
 
 #[api(
@@ -547,12 +322,7 @@ fn modify_cfg_for_api(id: &str, ty: &str, data: &Value) -> PluginConfig {
 )]
 /// List ACME challenge plugins.
 pub fn list_plugins(rpcenv: &mut dyn RpcEnvironment) -> Result<Vec<PluginConfig>, Error> {
-    let (plugins, digest) = plugin::config()?;
-    rpcenv["digest"] = hex::encode(digest).into();
-    Ok(plugins
-        .iter()
-        .map(|(id, (ty, data))| modify_cfg_for_api(id, ty, data))
-        .collect())
+    proxmox_acme_api::list_plugins(rpcenv)
 }
 
 #[api(
@@ -569,13 +339,7 @@ pub fn list_plugins(rpcenv: &mut dyn RpcEnvironment) -> Result<Vec<PluginConfig>
 )]
 /// List ACME challenge plugins.
 pub fn get_plugin(id: String, rpcenv: &mut dyn RpcEnvironment) -> Result<PluginConfig, Error> {
-    let (plugins, digest) = plugin::config()?;
-    rpcenv["digest"] = hex::encode(digest).into();
-
-    match plugins.get(&id) {
-        Some((ty, data)) => Ok(modify_cfg_for_api(&id, ty, data)),
-        None => http_bail!(NOT_FOUND, "no such plugin"),
-    }
+    proxmox_acme_api::get_plugin(id, rpcenv)
 }
 
 // Currently we only have "the" standalone plugin and DNS plugins so we can just flatten a
@@ -607,30 +371,7 @@ pub fn get_plugin(id: String, rpcenv: &mut dyn RpcEnvironment) -> Result<PluginC
 )]
 /// Add ACME plugin configuration.
 pub fn add_plugin(r#type: String, core: DnsPluginCore, data: String) -> Result<(), Error> {
-    // Currently we only support DNS plugins and the standalone plugin is "fixed":
-    if r#type != "dns" {
-        param_bail!("type", "invalid ACME plugin type: {:?}", r#type);
-    }
-
-    let data = String::from_utf8(proxmox_base64::decode(data)?)
-        .map_err(|_| format_err!("data must be valid UTF-8"))?;
-
-    let id = core.id.clone();
-
-    let _lock = plugin::lock()?;
-
-    let (mut plugins, _digest) = plugin::config()?;
-    if plugins.contains_key(&id) {
-        param_bail!("id", "ACME plugin ID {:?} already exists", id);
-    }
-
-    let plugin = serde_json::to_value(DnsPlugin { core, data })?;
-
-    plugins.insert(id, r#type, plugin);
-
-    plugin::save_config(&plugins)?;
-
-    Ok(())
+    proxmox_acme_api::add_plugin(r#type, core, data)
 }
 
 #[api(
@@ -646,26 +387,7 @@ pub fn add_plugin(r#type: String, core: DnsPluginCore, data: String) -> Result<(
 )]
 /// Delete an ACME plugin configuration.
 pub fn delete_plugin(id: String) -> Result<(), Error> {
-    let _lock = plugin::lock()?;
-
-    let (mut plugins, _digest) = plugin::config()?;
-    if plugins.remove(&id).is_none() {
-        http_bail!(NOT_FOUND, "no such plugin");
-    }
-    plugin::save_config(&plugins)?;
-
-    Ok(())
-}
-
-#[api()]
-#[derive(Serialize, Deserialize)]
-#[serde(rename_all = "kebab-case")]
-/// Deletable property name
-pub enum DeletableProperty {
-    /// Delete the disable property
-    Disable,
-    /// Delete the validation-delay property
-    ValidationDelay,
+    proxmox_acme_api::delete_plugin(id)
 }
 
 #[api(
@@ -687,12 +409,12 @@ pub enum DeletableProperty {
                 type: Array,
                 optional: true,
                 items: {
-                    type: DeletableProperty,
+                    type: DeletablePluginProperty,
                 }
             },
             digest: {
-                description: "Digest to protect against concurrent updates",
                 optional: true,
+                type: ConfigDigest,
             },
         },
     },
@@ -706,65 +428,8 @@ pub fn update_plugin(
     id: String,
     update: DnsPluginCoreUpdater,
     data: Option<String>,
-    delete: Option<Vec<DeletableProperty>>,
-    digest: Option<String>,
+    delete: Option<Vec<DeletablePluginProperty>>,
+    digest: Option<ConfigDigest>,
 ) -> Result<(), Error> {
-    let data = data
-        .as_deref()
-        .map(proxmox_base64::decode)
-        .transpose()?
-        .map(String::from_utf8)
-        .transpose()
-        .map_err(|_| format_err!("data must be valid UTF-8"))?;
-
-    let _lock = plugin::lock()?;
-
-    let (mut plugins, expected_digest) = plugin::config()?;
-
-    if let Some(digest) = digest {
-        let digest = <[u8; 32]>::from_hex(digest)?;
-        crate::tools::detect_modified_configuration_file(&digest, &expected_digest)?;
-    }
-
-    match plugins.get_mut(&id) {
-        Some((ty, ref mut entry)) => {
-            if ty != "dns" {
-                bail!("cannot update plugin of type {:?}", ty);
-            }
-
-            let mut plugin = DnsPlugin::deserialize(&*entry)?;
-
-            if let Some(delete) = delete {
-                for delete_prop in delete {
-                    match delete_prop {
-                        DeletableProperty::ValidationDelay => {
-                            plugin.core.validation_delay = None;
-                        }
-                        DeletableProperty::Disable => {
-                            plugin.core.disable = None;
-                        }
-                    }
-                }
-            }
-            if let Some(data) = data {
-                plugin.data = data;
-            }
-            if let Some(api) = update.api {
-                plugin.core.api = api;
-            }
-            if update.validation_delay.is_some() {
-                plugin.core.validation_delay = update.validation_delay;
-            }
-            if update.disable.is_some() {
-                plugin.core.disable = update.disable;
-            }
-
-            *entry = serde_json::to_value(plugin)?;
-        }
-        None => http_bail!(NOT_FOUND, "no such plugin"),
-    }
-
-    plugin::save_config(&plugins)?;
-
-    Ok(())
+    proxmox_acme_api::update_plugin(id, update, data, delete, digest)
 }
diff --git a/src/api2/node/certificates.rs b/src/api2/node/certificates.rs
index 6b1d87d2..7fb3a478 100644
--- a/src/api2/node/certificates.rs
+++ b/src/api2/node/certificates.rs
@@ -1,13 +1,11 @@
-use std::sync::Arc;
-use std::time::Duration;
-
 use anyhow::{bail, format_err, Error};
 use openssl::pkey::PKey;
 use openssl::x509::X509;
 use serde::{Deserialize, Serialize};
-use tracing::{info, warn};
+use tracing::info;
 
 use pbs_api_types::{NODE_SCHEMA, PRIV_SYS_MODIFY};
+use proxmox_acme_api::AcmeDomain;
 use proxmox_rest_server::WorkerTask;
 use proxmox_router::list_subdirs_api_method;
 use proxmox_router::SubdirMap;
@@ -17,9 +15,6 @@ use proxmox_schema::api;
 use pbs_buildcfg::configdir;
 use pbs_tools::cert;
 
-use crate::acme::AcmeClient;
-use crate::api2::types::AcmeDomain;
-use crate::config::node::NodeConfig;
 use crate::server::send_certificate_renewal_mail;
 
 pub const ROUTER: Router = Router::new()
@@ -268,193 +263,6 @@ pub async fn delete_custom_certificate() -> Result<(), Error> {
     Ok(())
 }
 
-struct OrderedCertificate {
-    certificate: hyper::body::Bytes,
-    private_key_pem: Vec<u8>,
-}
-
-async fn order_certificate(
-    worker: Arc<WorkerTask>,
-    node_config: &NodeConfig,
-) -> Result<Option<OrderedCertificate>, Error> {
-    use proxmox_acme::authorization::Status;
-    use proxmox_acme::order::Identifier;
-
-    let domains = node_config.acme_domains().try_fold(
-        Vec::<AcmeDomain>::new(),
-        |mut acc, domain| -> Result<_, Error> {
-            let mut domain = domain?;
-            domain.domain.make_ascii_lowercase();
-            if let Some(alias) = &mut domain.alias {
-                alias.make_ascii_lowercase();
-            }
-            acc.push(domain);
-            Ok(acc)
-        },
-    )?;
-
-    let get_domain_config = |domain: &str| {
-        domains
-            .iter()
-            .find(|d| d.domain == domain)
-            .ok_or_else(|| format_err!("no config for domain '{}'", domain))
-    };
-
-    if domains.is_empty() {
-        info!("No domains configured to be ordered from an ACME server.");
-        return Ok(None);
-    }
-
-    let (plugins, _) = crate::config::acme::plugin::config()?;
-
-    let mut acme = node_config.acme_client().await?;
-
-    info!("Placing ACME order");
-    let order = acme
-        .new_order(domains.iter().map(|d| d.domain.to_ascii_lowercase()))
-        .await?;
-    info!("Order URL: {}", order.location);
-
-    let identifiers: Vec<String> = order
-        .data
-        .identifiers
-        .iter()
-        .map(|identifier| match identifier {
-            Identifier::Dns(domain) => domain.clone(),
-        })
-        .collect();
-
-    for auth_url in &order.data.authorizations {
-        info!("Getting authorization details from '{auth_url}'");
-        let mut auth = acme.get_authorization(auth_url).await?;
-
-        let domain = match &mut auth.identifier {
-            Identifier::Dns(domain) => domain.to_ascii_lowercase(),
-        };
-
-        if auth.status == Status::Valid {
-            info!("{domain} is already validated!");
-            continue;
-        }
-
-        info!("The validation for {domain} is pending");
-        let domain_config: &AcmeDomain = get_domain_config(&domain)?;
-        let plugin_id = domain_config.plugin.as_deref().unwrap_or("standalone");
-        let mut plugin_cfg = crate::acme::get_acme_plugin(&plugins, plugin_id)?
-            .ok_or_else(|| format_err!("plugin '{plugin_id}' for domain '{domain}' not found!"))?;
-
-        info!("Setting up validation plugin");
-        let validation_url = plugin_cfg
-            .setup(&mut acme, &auth, domain_config, Arc::clone(&worker))
-            .await?;
-
-        let result = request_validation(&mut acme, auth_url, validation_url).await;
-
-        if let Err(err) = plugin_cfg
-            .teardown(&mut acme, &auth, domain_config, Arc::clone(&worker))
-            .await
-        {
-            warn!("Failed to teardown plugin '{plugin_id}' for domain '{domain}' - {err}");
-        }
-
-        result?;
-    }
-
-    info!("All domains validated");
-    info!("Creating CSR");
-
-    let csr = proxmox_acme::util::Csr::generate(&identifiers, &Default::default())?;
-    let mut finalize_error_cnt = 0u8;
-    let order_url = &order.location;
-    let mut order;
-    loop {
-        use proxmox_acme::order::Status;
-
-        order = acme.get_order(order_url).await?;
-
-        match order.status {
-            Status::Pending => {
-                info!("still pending, trying to finalize anyway");
-                let finalize = order
-                    .finalize
-                    .as_deref()
-                    .ok_or_else(|| format_err!("missing 'finalize' URL in order"))?;
-                if let Err(err) = acme.finalize(finalize, &csr.data).await {
-                    if finalize_error_cnt >= 5 {
-                        return Err(err);
-                    }
-
-                    finalize_error_cnt += 1;
-                }
-                tokio::time::sleep(Duration::from_secs(5)).await;
-            }
-            Status::Ready => {
-                info!("order is ready, finalizing");
-                let finalize = order
-                    .finalize
-                    .as_deref()
-                    .ok_or_else(|| format_err!("missing 'finalize' URL in order"))?;
-                acme.finalize(finalize, &csr.data).await?;
-                tokio::time::sleep(Duration::from_secs(5)).await;
-            }
-            Status::Processing => {
-                info!("still processing, trying again in 30 seconds");
-                tokio::time::sleep(Duration::from_secs(30)).await;
-            }
-            Status::Valid => {
-                info!("valid");
-                break;
-            }
-            other => bail!("order status: {:?}", other),
-        }
-    }
-
-    info!("Downloading certificate");
-    let certificate = acme
-        .get_certificate(
-            order
-                .certificate
-                .as_deref()
-                .ok_or_else(|| format_err!("missing certificate url in finalized order"))?,
-        )
-        .await?;
-
-    Ok(Some(OrderedCertificate {
-        certificate,
-        private_key_pem: csr.private_key_pem,
-    }))
-}
-
-async fn request_validation(
-    acme: &mut AcmeClient,
-    auth_url: &str,
-    validation_url: &str,
-) -> Result<(), Error> {
-    info!("Triggering validation");
-    acme.request_challenge_validation(validation_url).await?;
-
-    info!("Sleeping for 5 seconds");
-    tokio::time::sleep(Duration::from_secs(5)).await;
-
-    loop {
-        use proxmox_acme::authorization::Status;
-
-        let auth = acme.get_authorization(auth_url).await?;
-        match auth.status {
-            Status::Pending => {
-                info!("Status is still 'pending', trying again in 10 seconds");
-                tokio::time::sleep(Duration::from_secs(10)).await;
-            }
-            Status::Valid => return Ok(()),
-            other => bail!(
-                "validating challenge '{}' failed - status: {:?}",
-                validation_url,
-                other
-            ),
-        }
-    }
-}
-
 #[api(
     input: {
         properties: {
@@ -524,9 +332,26 @@ fn spawn_certificate_worker(
 
     let auth_id = rpcenv.get_auth_id().unwrap();
 
+    let acme_config = node_config.acme_config()?;
+
+    let domains = node_config.acme_domains().try_fold(
+        Vec::<AcmeDomain>::new(),
+        |mut acc, domain| -> Result<_, Error> {
+            let mut domain = domain?;
+            domain.domain.make_ascii_lowercase();
+            if let Some(alias) = &mut domain.alias {
+                alias.make_ascii_lowercase();
+            }
+            acc.push(domain);
+            Ok(acc)
+        },
+    )?;
+
     WorkerTask::spawn(name, None, auth_id, true, move |worker| async move {
         let work = || async {
-            if let Some(cert) = order_certificate(worker, &node_config).await? {
+            if let Some(cert) =
+                proxmox_acme_api::order_certificate(worker, &acme_config, &domains).await?
+            {
                 crate::config::set_proxy_certificate(&cert.certificate, &cert.private_key_pem)?;
                 crate::server::reload_proxy_certificate().await?;
             }
@@ -562,16 +387,16 @@ pub fn revoke_acme_cert(rpcenv: &mut dyn RpcEnvironment) -> Result<String, Error
 
     let auth_id = rpcenv.get_auth_id().unwrap();
 
+    let acme_config = node_config.acme_config()?;
+
     WorkerTask::spawn(
         "acme-revoke-cert",
         None,
         auth_id,
         true,
         move |_worker| async move {
-            info!("Loading ACME account");
-            let mut acme = node_config.acme_client().await?;
             info!("Revoking old certificate");
-            acme.revoke_certificate(cert_pem.as_bytes(), None).await?;
+            proxmox_acme_api::revoke_certificate(&acme_config, &cert_pem.as_bytes()).await?;
             info!("Deleting certificate and regenerating a self-signed one");
             delete_custom_certificate().await?;
             Ok(())
diff --git a/src/api2/types/acme.rs b/src/api2/types/acme.rs
index 8661f9e8..b83b9882 100644
--- a/src/api2/types/acme.rs
+++ b/src/api2/types/acme.rs
@@ -1,8 +1,7 @@
 use serde::{Deserialize, Serialize};
-use serde_json::Value;
 
 use pbs_api_types::{DNS_ALIAS_FORMAT, DNS_NAME_FORMAT, PROXMOX_SAFE_ID_FORMAT};
-use proxmox_schema::{api, ApiStringFormat, ApiType, Schema, StringSchema};
+use proxmox_schema::api;
 
 #[api(
     properties: {
@@ -37,61 +36,3 @@ pub struct AcmeDomain {
     #[serde(skip_serializing_if = "Option::is_none")]
     pub plugin: Option<String>,
 }
-
-pub const ACME_DOMAIN_PROPERTY_SCHEMA: Schema =
-    StringSchema::new("ACME domain configuration string")
-        .format(&ApiStringFormat::PropertyString(&AcmeDomain::API_SCHEMA))
-        .schema();
-
-#[api(
-    properties: {
-        name: { type: String },
-        url: { type: String },
-    },
-)]
-/// An ACME directory endpoint with a name and URL.
-#[derive(Serialize)]
-pub struct KnownAcmeDirectory {
-    /// The ACME directory's name.
-    pub name: &'static str,
-
-    /// The ACME directory's endpoint URL.
-    pub url: &'static str,
-}
-
-proxmox_schema::api_string_type! {
-    #[api(format: &PROXMOX_SAFE_ID_FORMAT)]
-    /// ACME account name.
-    #[derive(Clone, Eq, PartialEq, Hash, Deserialize, Serialize)]
-    #[serde(transparent)]
-    pub struct AcmeAccountName(String);
-}
-
-#[api(
-    properties: {
-        schema: {
-            type: Object,
-            additional_properties: true,
-            properties: {},
-        },
-        type: {
-            type: String,
-        },
-    },
-)]
-#[derive(Serialize)]
-/// Schema for an ACME challenge plugin.
-pub struct AcmeChallengeSchema {
-    /// Plugin ID.
-    pub id: String,
-
-    /// Human readable name, falls back to id.
-    pub name: String,
-
-    /// Plugin Type.
-    #[serde(rename = "type")]
-    pub ty: &'static str,
-
-    /// The plugin's parameter schema.
-    pub schema: Value,
-}
diff --git a/src/bin/proxmox-backup-api.rs b/src/bin/proxmox-backup-api.rs
index 417e9e97..d0091dca 100644
--- a/src/bin/proxmox-backup-api.rs
+++ b/src/bin/proxmox-backup-api.rs
@@ -14,6 +14,7 @@ use proxmox_rest_server::{ApiConfig, RestServer};
 use proxmox_router::RpcEnvironmentType;
 use proxmox_sys::fs::CreateOptions;
 
+use pbs_buildcfg::configdir;
 use proxmox_backup::auth_helpers::*;
 use proxmox_backup::config;
 use proxmox_backup::server::auth::check_pbs_auth;
@@ -78,6 +79,7 @@ async fn run() -> Result<(), Error> {
     let mut command_sock = proxmox_daemon::command_socket::CommandSocket::new(backup_user.gid);
 
     proxmox_product_config::init(backup_user.clone(), pbs_config::priv_user()?);
+    proxmox_acme_api::init(configdir!("/acme"), true)?;
 
     let dir_opts = CreateOptions::new()
         .owner(backup_user.uid)
diff --git a/src/bin/proxmox-backup-manager.rs b/src/bin/proxmox-backup-manager.rs
index f8365070..f041ba0b 100644
--- a/src/bin/proxmox-backup-manager.rs
+++ b/src/bin/proxmox-backup-manager.rs
@@ -19,12 +19,12 @@ use proxmox_router::{cli::*, RpcEnvironment};
 use proxmox_schema::api;
 use proxmox_sys::fs::CreateOptions;
 
+use pbs_buildcfg::configdir;
 use pbs_client::{display_task_log, view_task_result};
 use pbs_config::sync;
 use pbs_tools::json::required_string_param;
 use proxmox_backup::api2;
 use proxmox_backup::client_helpers::connect_to_localhost;
-use proxmox_backup::config;
 
 mod proxmox_backup_manager;
 use proxmox_backup_manager::*;
@@ -667,6 +667,7 @@ async fn run() -> Result<(), Error> {
         .init()?;
     proxmox_backup::server::notifications::init()?;
     proxmox_product_config::init(pbs_config::backup_user()?, pbs_config::priv_user()?);
+    proxmox_acme_api::init(configdir!("/acme"), false)?;
 
     let cmd_def = CliCommandMap::new()
         .insert("acl", acl_commands())
diff --git a/src/bin/proxmox-backup-proxy.rs b/src/bin/proxmox-backup-proxy.rs
index 870208fe..eea44a7d 100644
--- a/src/bin/proxmox-backup-proxy.rs
+++ b/src/bin/proxmox-backup-proxy.rs
@@ -188,6 +188,7 @@ async fn run() -> Result<(), Error> {
     proxmox_backup::server::notifications::init()?;
     metric_collection::init()?;
     proxmox_product_config::init(pbs_config::backup_user()?, pbs_config::priv_user()?);
+    proxmox_acme_api::init(configdir!("/acme"), false)?;
 
     let mut indexpath = PathBuf::from(pbs_buildcfg::JS_DIR);
     indexpath.push("index.hbs");
diff --git a/src/bin/proxmox_backup_manager/acme.rs b/src/bin/proxmox_backup_manager/acme.rs
index 0f0eafea..57431225 100644
--- a/src/bin/proxmox_backup_manager/acme.rs
+++ b/src/bin/proxmox_backup_manager/acme.rs
@@ -3,15 +3,13 @@ use std::io::Write;
 use anyhow::{bail, Error};
 use serde_json::Value;
 
+use proxmox_acme::async_client::AcmeClient;
+use proxmox_acme_api::{AcmeAccountName, DnsPluginCore, KNOWN_ACME_DIRECTORIES};
 use proxmox_router::{cli::*, ApiHandler, RpcEnvironment};
 use proxmox_schema::api;
 use proxmox_sys::fs::file_get_contents;
 
-use proxmox_backup::acme::AcmeClient;
 use proxmox_backup::api2;
-use proxmox_backup::api2::types::AcmeAccountName;
-use proxmox_backup::config::acme::plugin::DnsPluginCore;
-use proxmox_backup::config::acme::KNOWN_ACME_DIRECTORIES;
 
 pub fn acme_mgmt_cli() -> CommandLineInterface {
     let cmd_def = CliCommandMap::new()
@@ -122,7 +120,7 @@ async fn register_account(
 
                 match input.trim().parse::<usize>() {
                     Ok(n) if n < KNOWN_ACME_DIRECTORIES.len() => {
-                        break (KNOWN_ACME_DIRECTORIES[n].url.to_owned(), false);
+                        break (KNOWN_ACME_DIRECTORIES[n].url.to_string(), false);
                     }
                     Ok(n) if n == KNOWN_ACME_DIRECTORIES.len() => {
                         input.clear();
@@ -188,17 +186,20 @@ async fn register_account(
 
     println!("Attempting to register account with {directory_url:?}...");
 
-    let account = api2::config::acme::do_register_account(
-        &mut client,
+    let tos_agreed = tos_agreed
+        .then(|| directory.terms_of_service_url().map(str::to_owned))
+        .flatten();
+
+    let location = proxmox_acme_api::register_account(
         &name,
-        tos_agreed,
         contact,
-        None,
+        tos_agreed,
+        Some(directory_url),
         eab_creds,
     )
     .await?;
 
-    println!("Registration successful, account URL: {}", account.location);
+    println!("Registration successful, account URL: {}", location);
 
     Ok(())
 }
@@ -266,19 +267,19 @@ pub fn account_cli() -> CommandLineInterface {
             "deactivate",
             CliCommand::new(&API_METHOD_DEACTIVATE_ACCOUNT)
                 .arg_param(&["name"])
-                .completion_cb("name", crate::config::acme::complete_acme_account),
+                .completion_cb("name", proxmox_acme_api::complete_acme_account),
         )
         .insert(
             "info",
             CliCommand::new(&API_METHOD_GET_ACCOUNT)
                 .arg_param(&["name"])
-                .completion_cb("name", crate::config::acme::complete_acme_account),
+                .completion_cb("name", proxmox_acme_api::complete_acme_account),
         )
         .insert(
             "update",
             CliCommand::new(&API_METHOD_UPDATE_ACCOUNT)
                 .arg_param(&["name"])
-                .completion_cb("name", crate::config::acme::complete_acme_account),
+                .completion_cb("name", proxmox_acme_api::complete_acme_account),
         );
 
     cmd_def.into()
@@ -373,26 +374,26 @@ pub fn plugin_cli() -> CommandLineInterface {
             "config", // name comes from pve/pmg
             CliCommand::new(&API_METHOD_GET_PLUGIN)
                 .arg_param(&["id"])
-                .completion_cb("id", crate::config::acme::complete_acme_plugin),
+                .completion_cb("id", proxmox_acme_api::complete_acme_plugin),
         )
         .insert(
             "add",
             CliCommand::new(&API_METHOD_ADD_PLUGIN)
                 .arg_param(&["type", "id"])
-                .completion_cb("api", crate::config::acme::complete_acme_api_challenge_type)
-                .completion_cb("type", crate::config::acme::complete_acme_plugin_type),
+                .completion_cb("api", proxmox_acme_api::complete_acme_api_challenge_type)
+                .completion_cb("type", proxmox_acme_api::complete_acme_plugin_type),
         )
         .insert(
             "remove",
             CliCommand::new(&acme::API_METHOD_DELETE_PLUGIN)
                 .arg_param(&["id"])
-                .completion_cb("id", crate::config::acme::complete_acme_plugin),
+                .completion_cb("id", proxmox_acme_api::complete_acme_plugin),
         )
         .insert(
             "set",
             CliCommand::new(&acme::API_METHOD_UPDATE_PLUGIN)
                 .arg_param(&["id"])
-                .completion_cb("id", crate::config::acme::complete_acme_plugin),
+                .completion_cb("id", proxmox_acme_api::complete_acme_plugin),
         );
 
     cmd_def.into()
diff --git a/src/config/acme/mod.rs b/src/config/acme/mod.rs
index ac89ae5e..962cb1bb 100644
--- a/src/config/acme/mod.rs
+++ b/src/config/acme/mod.rs
@@ -1,168 +1 @@
-use std::collections::HashMap;
-use std::ops::ControlFlow;
-use std::path::Path;
-
-use anyhow::{bail, format_err, Error};
-use serde_json::Value;
-
-use pbs_api_types::PROXMOX_SAFE_ID_REGEX;
-use proxmox_sys::error::SysError;
-use proxmox_sys::fs::{file_read_string, CreateOptions};
-
-use crate::api2::types::{AcmeAccountName, AcmeChallengeSchema, KnownAcmeDirectory};
-
-pub(crate) const ACME_DIR: &str = pbs_buildcfg::configdir!("/acme");
-pub(crate) const ACME_ACCOUNT_DIR: &str = pbs_buildcfg::configdir!("/acme/accounts");
-
-pub(crate) const ACME_DNS_SCHEMA_FN: &str = "/usr/share/proxmox-acme/dns-challenge-schema.json";
-
 pub mod plugin;
-
-// `const fn`ify this once it is supported in `proxmox`
-fn root_only() -> CreateOptions {
-    CreateOptions::new()
-        .owner(nix::unistd::ROOT)
-        .group(nix::unistd::Gid::from_raw(0))
-        .perm(nix::sys::stat::Mode::from_bits_truncate(0o700))
-}
-
-fn create_acme_subdir(dir: &str) -> Result<(), Error> {
-    proxmox_sys::fs::ensure_dir_exists(dir, &root_only(), false)
-}
-
-pub(crate) fn make_acme_dir() -> Result<(), Error> {
-    create_acme_subdir(ACME_DIR)
-}
-
-pub(crate) fn make_acme_account_dir() -> Result<(), Error> {
-    make_acme_dir()?;
-    create_acme_subdir(ACME_ACCOUNT_DIR)
-}
-
-pub const KNOWN_ACME_DIRECTORIES: &[KnownAcmeDirectory] = &[
-    KnownAcmeDirectory {
-        name: "Let's Encrypt V2",
-        url: "https://acme-v02.api.letsencrypt.org/directory",
-    },
-    KnownAcmeDirectory {
-        name: "Let's Encrypt V2 Staging",
-        url: "https://acme-staging-v02.api.letsencrypt.org/directory",
-    },
-];
-
-pub const DEFAULT_ACME_DIRECTORY_ENTRY: &KnownAcmeDirectory = &KNOWN_ACME_DIRECTORIES[0];
-
-pub fn account_path(name: &str) -> String {
-    format!("{ACME_ACCOUNT_DIR}/{name}")
-}
-
-pub fn foreach_acme_account<F>(mut func: F) -> Result<(), Error>
-where
-    F: FnMut(AcmeAccountName) -> ControlFlow<Result<(), Error>>,
-{
-    match proxmox_sys::fs::scan_subdir(-1, ACME_ACCOUNT_DIR, &PROXMOX_SAFE_ID_REGEX) {
-        Ok(files) => {
-            for file in files {
-                let file = file?;
-                let file_name = unsafe { file.file_name_utf8_unchecked() };
-
-                if file_name.starts_with('_') {
-                    continue;
-                }
-
-                let account_name = match AcmeAccountName::from_string(file_name.to_owned()) {
-                    Ok(account_name) => account_name,
-                    Err(_) => continue,
-                };
-
-                if let ControlFlow::Break(result) = func(account_name) {
-                    return result;
-                }
-            }
-            Ok(())
-        }
-        Err(err) if err.not_found() => Ok(()),
-        Err(err) => Err(err.into()),
-    }
-}
-
-pub fn mark_account_deactivated(name: &str) -> Result<(), Error> {
-    let from = account_path(name);
-    for i in 0..100 {
-        let to = account_path(&format!("_deactivated_{name}_{i}"));
-        if !Path::new(&to).exists() {
-            return std::fs::rename(&from, &to).map_err(|err| {
-                format_err!(
-                    "failed to move account path {:?} to {:?} - {}",
-                    from,
-                    to,
-                    err
-                )
-            });
-        }
-    }
-    bail!(
-        "No free slot to rename deactivated account {:?}, please cleanup {:?}",
-        from,
-        ACME_ACCOUNT_DIR
-    );
-}
-
-pub fn load_dns_challenge_schema() -> Result<Vec<AcmeChallengeSchema>, Error> {
-    let raw = file_read_string(ACME_DNS_SCHEMA_FN)?;
-    let schemas: serde_json::Map<String, Value> = serde_json::from_str(&raw)?;
-
-    Ok(schemas
-        .iter()
-        .map(|(id, schema)| AcmeChallengeSchema {
-            id: id.to_owned(),
-            name: schema
-                .get("name")
-                .and_then(Value::as_str)
-                .unwrap_or(id)
-                .to_owned(),
-            ty: "dns",
-            schema: schema.to_owned(),
-        })
-        .collect())
-}
-
-pub fn complete_acme_account(_arg: &str, _param: &HashMap<String, String>) -> Vec<String> {
-    let mut out = Vec::new();
-    let _ = foreach_acme_account(|name| {
-        out.push(name.into_string());
-        ControlFlow::Continue(())
-    });
-    out
-}
-
-pub fn complete_acme_plugin(_arg: &str, _param: &HashMap<String, String>) -> Vec<String> {
-    match plugin::config() {
-        Ok((config, _digest)) => config
-            .iter()
-            .map(|(id, (_type, _cfg))| id.clone())
-            .collect(),
-        Err(_) => Vec::new(),
-    }
-}
-
-pub fn complete_acme_plugin_type(_arg: &str, _param: &HashMap<String, String>) -> Vec<String> {
-    vec![
-        "dns".to_string(),
-        //"http".to_string(), // makes currently not really sense to create or the like
-    ]
-}
-
-pub fn complete_acme_api_challenge_type(
-    _arg: &str,
-    param: &HashMap<String, String>,
-) -> Vec<String> {
-    if param.get("type") == Some(&"dns".to_string()) {
-        match load_dns_challenge_schema() {
-            Ok(schema) => schema.into_iter().map(|s| s.id).collect(),
-            Err(_) => Vec::new(),
-        }
-    } else {
-        Vec::new()
-    }
-}
diff --git a/src/config/acme/plugin.rs b/src/config/acme/plugin.rs
index 8ce852ec..e5a41f99 100644
--- a/src/config/acme/plugin.rs
+++ b/src/config/acme/plugin.rs
@@ -1,14 +1,10 @@
-use std::sync::LazyLock;
-
 use anyhow::Error;
 use serde::{Deserialize, Serialize};
 use serde_json::Value;
 
 use pbs_api_types::PROXMOX_SAFE_ID_FORMAT;
-use proxmox_schema::{api, ApiType, Schema, StringSchema, Updater};
-use proxmox_section_config::{SectionConfig, SectionConfigData, SectionConfigPlugin};
-
-use pbs_config::{open_backup_lockfile, BackupLockGuard};
+use proxmox_schema::{api, Schema, StringSchema, Updater};
+use proxmox_section_config::SectionConfigData;
 
 pub const PLUGIN_ID_SCHEMA: Schema = StringSchema::new("ACME Challenge Plugin ID.")
     .format(&PROXMOX_SAFE_ID_FORMAT)
@@ -16,28 +12,6 @@ pub const PLUGIN_ID_SCHEMA: Schema = StringSchema::new("ACME Challenge Plugin ID
     .max_length(32)
     .schema();
 
-pub static CONFIG: LazyLock<SectionConfig> = LazyLock::new(init);
-
-#[api(
-    properties: {
-        id: { schema: PLUGIN_ID_SCHEMA },
-    },
-)]
-#[derive(Deserialize, Serialize)]
-/// Standalone ACME Plugin for the http-1 challenge.
-pub struct StandalonePlugin {
-    /// Plugin ID.
-    id: String,
-}
-
-impl Default for StandalonePlugin {
-    fn default() -> Self {
-        Self {
-            id: "standalone".to_string(),
-        }
-    }
-}
-
 #[api(
     properties: {
         id: { schema: PLUGIN_ID_SCHEMA },
@@ -99,64 +73,6 @@ impl DnsPlugin {
     }
 }
 
-fn init() -> SectionConfig {
-    let mut config = SectionConfig::new(&PLUGIN_ID_SCHEMA);
-
-    let standalone_schema = match &StandalonePlugin::API_SCHEMA {
-        Schema::Object(schema) => schema,
-        _ => unreachable!(),
-    };
-    let standalone_plugin = SectionConfigPlugin::new(
-        "standalone".to_string(),
-        Some("id".to_string()),
-        standalone_schema,
-    );
-    config.register_plugin(standalone_plugin);
-
-    let dns_challenge_schema = match DnsPlugin::API_SCHEMA {
-        Schema::AllOf(ref schema) => schema,
-        _ => unreachable!(),
-    };
-    let dns_challenge_plugin = SectionConfigPlugin::new(
-        "dns".to_string(),
-        Some("id".to_string()),
-        dns_challenge_schema,
-    );
-    config.register_plugin(dns_challenge_plugin);
-
-    config
-}
-
-const ACME_PLUGIN_CFG_FILENAME: &str = pbs_buildcfg::configdir!("/acme/plugins.cfg");
-const ACME_PLUGIN_CFG_LOCKFILE: &str = pbs_buildcfg::configdir!("/acme/.plugins.lck");
-
-pub fn lock() -> Result<BackupLockGuard, Error> {
-    super::make_acme_dir()?;
-    open_backup_lockfile(ACME_PLUGIN_CFG_LOCKFILE, None, true)
-}
-
-pub fn config() -> Result<(PluginData, [u8; 32]), Error> {
-    let content =
-        proxmox_sys::fs::file_read_optional_string(ACME_PLUGIN_CFG_FILENAME)?.unwrap_or_default();
-
-    let digest = openssl::sha::sha256(content.as_bytes());
-    let mut data = CONFIG.parse(ACME_PLUGIN_CFG_FILENAME, &content)?;
-
-    if !data.sections.contains_key("standalone") {
-        let standalone = StandalonePlugin::default();
-        data.set_data("standalone", "standalone", &standalone)
-            .unwrap();
-    }
-
-    Ok((PluginData { data }, digest))
-}
-
-pub fn save_config(config: &PluginData) -> Result<(), Error> {
-    super::make_acme_dir()?;
-    let raw = CONFIG.write(ACME_PLUGIN_CFG_FILENAME, &config.data)?;
-    pbs_config::replace_backup_config(ACME_PLUGIN_CFG_FILENAME, raw.as_bytes())
-}
-
 pub struct PluginData {
     data: SectionConfigData,
 }
diff --git a/src/config/node.rs b/src/config/node.rs
index 253b2e36..81eecb24 100644
--- a/src/config/node.rs
+++ b/src/config/node.rs
@@ -8,16 +8,14 @@ use pbs_api_types::{
     EMAIL_SCHEMA, MULTI_LINE_COMMENT_SCHEMA, OPENSSL_CIPHERS_TLS_1_2_SCHEMA,
     OPENSSL_CIPHERS_TLS_1_3_SCHEMA,
 };
+use proxmox_acme_api::{AcmeConfig, AcmeDomain, ACME_DOMAIN_PROPERTY_SCHEMA};
 use proxmox_http::ProxyConfig;
 use proxmox_schema::{api, ApiStringFormat, ApiType, Updater};
 
 use pbs_buildcfg::configdir;
 use pbs_config::{open_backup_lockfile, BackupLockGuard};
 
-use crate::acme::AcmeClient;
-use crate::api2::types::{
-    AcmeAccountName, AcmeDomain, ACME_DOMAIN_PROPERTY_SCHEMA, HTTP_PROXY_SCHEMA,
-};
+use crate::api2::types::HTTP_PROXY_SCHEMA;
 
 const CONF_FILE: &str = configdir!("/node.cfg");
 const LOCK_FILE: &str = configdir!("/.node.lck");
@@ -44,20 +42,6 @@ pub fn save_config(config: &NodeConfig) -> Result<(), Error> {
     pbs_config::replace_backup_config(CONF_FILE, &raw)
 }
 
-#[api(
-    properties: {
-        account: { type: AcmeAccountName },
-    }
-)]
-#[derive(Deserialize, Serialize)]
-/// The ACME configuration.
-///
-/// Currently only contains the name of the account use.
-pub struct AcmeConfig {
-    /// Account to use to acquire ACME certificates.
-    account: AcmeAccountName,
-}
-
 /// All available languages in Proxmox. Taken from proxmox-i18n repository.
 /// pt_BR, zh_CN, and zh_TW use the same case in the translation files.
 // TODO: auto-generate from available translations
@@ -235,19 +219,16 @@ pub struct NodeConfig {
 }
 
 impl NodeConfig {
-    pub fn acme_config(&self) -> Option<Result<AcmeConfig, Error>> {
-        self.acme.as_deref().map(|config| -> Result<_, Error> {
-            crate::tools::config::from_property_string(config, &AcmeConfig::API_SCHEMA)
-        })
-    }
-
-    pub async fn acme_client(&self) -> Result<AcmeClient, Error> {
-        let account = if let Some(cfg) = self.acme_config().transpose()? {
-            cfg.account
-        } else {
-            AcmeAccountName::from_string("default".to_string())? // should really not happen
-        };
-        AcmeClient::load(&account).await
+    pub fn acme_config(&self) -> Result<AcmeConfig, Error> {
+        self.acme
+            .as_deref()
+            .map(|config| {
+                crate::tools::config::from_property_string::<AcmeConfig>(
+                    config,
+                    &AcmeConfig::API_SCHEMA,
+                )
+            })
+            .unwrap_or_else(|| proxmox_acme_api::parse_acme_config_string("account=default"))
     }
 
     pub fn acme_domains(&'_ self) -> AcmeDomainIter<'_> {
-- 
2.47.3
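
The last hunk changes `NodeConfig::acme_config()` from returning `Option<Result<..>>` to returning a plain `Result` that falls back to the "default" account when no ACME property string is configured. A minimal sketch of that pattern follows. `AcmeConfig`, `NodeConfig` and `parse_property_string` here are simplified, hypothetical stand-ins written for illustration, not the real proxmox-acme-api or PBS types.

```rust
// Simplified stand-in for the real AcmeConfig (assumption: only the account name matters here).
#[derive(Debug, PartialEq)]
struct AcmeConfig {
    account: String,
}

// Hypothetical parser for a "key=value,key=value" property string.
// The real code parses against an API schema instead.
fn parse_property_string(s: &str) -> Result<AcmeConfig, String> {
    for part in s.split(',') {
        if let Some(account) = part.trim().strip_prefix("account=") {
            if !account.is_empty() {
                return Ok(AcmeConfig {
                    account: account.to_owned(),
                });
            }
        }
    }
    Err(format!("missing 'account' in ACME config {s:?}"))
}

// Simplified stand-in for the node configuration.
struct NodeConfig {
    acme: Option<String>,
}

impl NodeConfig {
    // Parse the configured value if present, otherwise fall back to
    // the "default" account. A present but invalid value is still an error.
    fn acme_config(&self) -> Result<AcmeConfig, String> {
        self.acme
            .as_deref()
            .map(parse_property_string)
            .unwrap_or_else(|| parse_property_string("account=default"))
    }
}
```

With this shape, callers can write `node_config.acme_config()?` directly instead of handling the `Option` and transposing it first.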




* [pbs-devel] superseded: [PATCH proxmox{, -backup} v5 0/9] fix #6939: acme: support servers returning 204 for nonce requests
  @ 2026-01-16 11:30 13% ` Samuel Rufinatscha
  0 siblings, 0 replies; 117+ results
From: Samuel Rufinatscha @ 2026-01-16 11:30 UTC (permalink / raw)
  To: pbs-devel

https://lore.proxmox.com/pbs-devel/20260116112859.194016-1-s.rufinatscha@proxmox.com/T/#t

On 1/8/26 12:25 PM, Samuel Rufinatscha wrote:
> Hi,
> 
> this series fixes account registration for ACME providers that return
> HTTP 204 No Content to the newNonce request. Currently, both the PBS
> ACME client and the shared ACME client in proxmox-acme only accept
> HTTP 200 OK for this request. The issue was observed in PBS against a
> custom ACME deployment and reported as bug #6939 [1].
> 
> ## Problem
> 
> During ACME account registration, PBS first fetches an anti-replay
> nonce by sending a HEAD request to the CA’s newNonce URL.
> RFC 8555 §7.2 [2] states that:
> 
> * the server MUST include a Replay-Nonce header with a fresh nonce,
> * the server SHOULD use status 200 OK for the HEAD request,
> * the server MUST also handle GET on the same resource and may return
>    204 No Content with an empty body.
> 
> The reporter observed the following error message:
> 
>    *ACME server responded with unexpected status code: 204*
> 
> and mentioned that the issue did not appear with PVE 9 [1]. Looking at
> PVE’s Perl ACME client [3], it uses a GET request instead of HEAD and
> accepts any 2xx success code when retrieving the nonce. This difference
> in behavior does not affect functionality but is worth noting for
> consistency across implementations.
> 
> ## Approach
> 
> To support ACME providers which return 204 No Content, the Rust ACME
> clients in proxmox-backup and proxmox need to treat both 200 OK and 204
> No Content as valid responses for the nonce request, as long as a
> Replay-Nonce header is present.
> 
> This series changes the expected field of the internal Request type
> from a single u16 to a list of allowed status codes
> (e.g. &'static [u16]), so one request can explicitly accept multiple
> success codes.
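
The approach quoted above can be sketched roughly as follows. This is only an illustration under stated assumptions: the `Request` and `Response` structs, `new_nonce_request` and `extract_nonce` are hypothetical, simplified stand-ins and not the actual proxmox-acme types or functions. Only the idea of an `expected: &'static [u16]` list and the `http_status` constants is taken from the series.

```rust
// Hypothetical, simplified stand-ins; not the actual proxmox-acme code.
#[allow(dead_code)]
mod http_status {
    pub const OK: u16 = 200;
    pub const CREATED: u16 = 201;
    pub const NO_CONTENT: u16 = 204;
}

#[allow(dead_code)]
struct Request {
    url: String,
    method: &'static str,
    // A list of accepted status codes instead of a single u16.
    expected: &'static [u16],
}

struct Response {
    status: u16,
    // Value of the Replay-Nonce header, if the server sent one.
    replay_nonce: Option<String>,
}

// Build a newNonce request that tolerates both 200 and 204.
fn new_nonce_request(url: &str) -> Request {
    Request {
        url: url.to_owned(),
        method: "HEAD",
        expected: &[http_status::OK, http_status::NO_CONTENT],
    }
}

// Accept the response only if the status is in the allowed list
// and a Replay-Nonce header is present.
fn extract_nonce(request: &Request, response: &Response) -> Result<String, String> {
    if !request.expected.contains(&response.status) {
        return Err(format!(
            "ACME server responded with unexpected status code: {}",
            response.status
        ));
    }
    response
        .replay_nonce
        .clone()
        .ok_or_else(|| "missing Replay-Nonce header".to_string())
}
```

Using a `&'static [u16]` slice keeps the per-request cost at a pointer and a length, so widening the accepted set for one request needs no allocation.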
> 
> To avoid fixing the issue twice (once in PBS’ own ACME client and once
> in the shared Rust client), this series first refactors PBS to use the
> shared AcmeClient from proxmox-acme / proxmox-acme-api, similar to PDM,
> and then applies the bug fix in that shared implementation so that all
> consumers benefit from the more tolerant behavior.
> 
> ## Testing
> 
> *Testing the refactor*
> 
> To test the refactor, I
> (1) installed the latest stable PBS on a VM,
> (2) created a .deb package from the latest PBS (master), containing the
>   refactor,
> (3) installed the created .deb package,
> (4) installed Pebble from Let's Encrypt [4] on the same VM, and
> (5) created an ACME account and ordered a new certificate for the
>   host domain.
> 
> Steps to reproduce:
> 
> (1) install the latest stable PBS on a VM, create a .deb package from the
>   latest PBS (master) containing the refactor, and install the created
>   .deb package
> (2) install Pebble from Let's Encrypt [4] on the same VM:
> 
>      cd
>      apt update
>      apt install -y golang git
>      git clone https://github.com/letsencrypt/pebble
>      cd pebble
>      go build ./cmd/pebble
> 
> then, download and trust the Pebble cert:
> 
>      wget https://raw.githubusercontent.com/letsencrypt/pebble/main/test/certs/pebble.minica.pem
>      cp pebble.minica.pem /usr/local/share/ca-certificates/pebble.minica.crt
>      update-ca-certificates
> 
> We want Pebble to perform HTTP-01 validation against port 80, because
> PBS’s standalone plugin will bind port 80. Set httpPort to 80.
> 
>      nano ./test/config/pebble-config.json
> 
> Start the Pebble server in the background:
> 
>      ./pebble -config ./test/config/pebble-config.json &
> 
> Create a Pebble ACME account:
> 
>      proxmox-backup-manager acme account register default admin@example.com --directory 'https://127.0.0.1:14000/dir'
> 
> To verify persistence of the account, I checked:
> 
>      ls /etc/proxmox-backup/acme/accounts
> 
> Then I verified that updating the account works:
> 
>      proxmox-backup-manager acme account update default --contact "a@example.com,b@example.com"
>      proxmox-backup-manager acme account info default
> 
> In the PBS GUI, you can create a new domain. You can use your host
> domain name (see /etc/hosts). Select the created account and order the
> certificate.
> 
> After a page reload, you might need to accept the new certificate in the browser.
> In the PBS dashboard, you should see the new Pebble certificate.
> 
> Note: Pebble does not persist account info, so after a reboot the
> created Pebble ACME account is gone and a new one has to be created.
> In that case, first remove the previously created account in
> /etc/proxmox-backup/acme/accounts.
> 
> *Testing the newNonce fix*
> 
> To verify the ACME newNonce fix, I put nginx in front of Pebble to
> intercept the newNonce request and return 204 No Content instead of
> 200 OK. All other requests are forwarded to Pebble unchanged. This
> requires trusting the nginx CAs via
> /usr/local/share/ca-certificates + update-ca-certificates on the VM.
> 
> Then I ran the following command against nginx:
> 
>      proxmox-backup-manager acme account register proxytest root@backup.local --directory 'https://nginx-address/dir'
> 
> The account was created successfully. When the nginx configuration is
> adjusted to return any other, unexpected success status code, PBS
> rejects the response as expected.
> 
> ## Patch summary
> 
> 0001 – [PATCH proxmox v5 1/4] acme: reduce visibility of Request type
>   Restricts the visibility of the low-level Request type. Consumers
>   should rely on proxmox-acme-api or AcmeClient handlers.
> 
> 0002 – [PATCH proxmox v5 2/4] acme: introduce http_status module
> 
> 0003 – [PATCH proxmox v5 3/4] fix #6939: acme: support servers
> returning 204 for nonce requests
>   Adjusts nonce handling to support ACME servers that return HTTP 204
>   (No Content) for new-nonce requests.
> 
> 0004 – [PATCH proxmox v5 4/4] acme-api: add helper to load client for
> an account
>   Introduces a helper function to load an ACME client instance for a
>   given account. Required for the following PBS ACME refactor.
> 
> 0005 – [PATCH proxmox-backup v5 1/5] acme: clean up ACME-related imports
> 
> 0006 – [PATCH proxmox-backup v5 2/5] acme: include proxmox-acme-api
> dependency
>   Prepares the codebase to use the factored out ACME API impl.
> 
> 0007 – [PATCH proxmox-backup v5 3/5] acme: drop local AcmeClient
>   Removes the local AcmeClient implementation. Represents the minimal
>   set of changes to replace it with the factored out AcmeClient.
> 
> 0008 – [PATCH proxmox-backup v5 4/5] acme: change API impls to use
> proxmox-acme-api handlers
> 
> 0009 – [PATCH proxmox-backup v5 5/5] acme: certificate ordering through
> proxmox-acme-api
> 
> Thanks for considering this patch series, I look forward to your
> feedback.
> 
> Best,
> Samuel Rufinatscha
> 
> ## Changelog
> 
> Changes from v4 to v5:
> 
> * rebased series
> * re-ordered series (proxmox-acme fix first)
> * proxmox-backup: cleaned up imports based on an initial clean-up patch
> * proxmox-acme: removed now unused post_request_raw_payload(),
>    update_account_request(), deactivate_account_request()
> * proxmox-acme: removed now obsolete/unused get_authorization() and
>    GetAuthorization impl
> 
> Verified removal by compiling PBS, PDM, and proxmox-perl-rs
> with all features.
> 
> Changes from v3 to v4:
> 
> * add proxmox-acme-api as a dependency and initialize it in
>   PBS so PBS can use the shared ACME API instead.
> * remove the PBS-local AcmeClient implementation and switch PBS
>   over to the shared proxmox-acme async client.
> * rework PBS’ ACME API endpoints to delegate to
>   proxmox-acme-api handlers instead of duplicating logic locally.
> * move PBS’ ACME certificate ordering logic over to
>   proxmox-acme-api, keeping only certificate installation/reload in PBS.
> * add a load_client_with_account helper in proxmox-acme-api so PBS
>   (and others) can construct an AcmeClient for a configured account
>   without duplicating boilerplate.
> * hide the low-level Request type and its fields behind constructors
>   / reduced visibility so changes to “expected” no longer affect the
>   public API as they did in v3.
> * split out the HTTP status constants into an internal http_status
>   module as a separate preparatory cleanup before the bug fix, instead
>   of doing this inline like in v3.
> * Rebased on top of the refactor: keep the same behavioural fix as in
>   v3 (accept 204 for newNonce with Replay-Nonce present), but implement
>   it on top of the http_status module that is part of the refactor.
> 
> Changes from v2 to v3:
> 
> * rename `http_success` module to `http_status`
> * replace `http_success` usage
> * introduced `http_success` module to contain the http success codes
> * replaced `Vec<u16>` with `&[u16]` for expected codes to avoid allocations.
> * clarified the PVE's Perl ACME client behaviour in the commit message.
> * integrated the `http_success` module, replacing `Vec<u16>` with `&[u16]`
> 
> [1] Bugzilla report #6939:
> [https://bugzilla.proxmox.com/show_bug.cgi?id=6939](https://bugzilla.proxmox.com/show_bug.cgi?id=6939)
> [2] RFC 8555 (ACME):
> [https://datatracker.ietf.org/doc/html/rfc8555/#section-7.2](https://datatracker.ietf.org/doc/html/rfc8555/#section-7.2)
> [3] PVE’s Perl ACME client (allow 2xx codes for nonce requests):
> [https://git.proxmox.com/?p=proxmox-acme.git;a=blob;f=src/PVE/ACME.pm;h=f1e9bb7d316e3cea1e376c610b0479119217aecc;hb=HEAD#l597](https://git.proxmox.com/?p=proxmox-acme.git;a=blob;f=src/PVE/ACME.pm;h=f1e9bb7d316e3cea1e376c610b0479119217aecc;hb=HEAD#l597)
> [4] Pebble ACME server:
> [https://github.com/letsencrypt/pebble](https://github.com/letsencrypt/pebble)
> [5] Pebble ACME server (perform GET request):
> [https://git.proxmox.com/?p=proxmox-acme.git;a=blob;f=src/PVE/ACME.pm;h=f1e9bb7d316e3cea1e376c610b0479119217aecc;hb=HEAD#l219](https://git.proxmox.com/?p=proxmox-acme.git;a=blob;f=src/PVE/ACME.pm;h=f1e9bb7d316e3cea1e376c610b0479119217aecc;hb=HEAD#l219)
> 
> proxmox:
> 
> Samuel Rufinatscha (4):
>    acme: reduce visibility of Request type
>    acme: introduce http_status module
>    fix #6939: acme: support servers returning 204 for nonce requests
>    acme-api: add helper to load client for an account
> 
>   proxmox-acme-api/src/account_api_impl.rs |   5 ++
>   proxmox-acme-api/src/lib.rs              |   3 +-
>   proxmox-acme/src/account.rs              | 102 ++---------------------
>   proxmox-acme/src/async_client.rs         |   8 +-
>   proxmox-acme/src/authorization.rs        |  30 -------
>   proxmox-acme/src/client.rs               |   8 +-
>   proxmox-acme/src/lib.rs                  |   6 +-
>   proxmox-acme/src/order.rs                |   2 +-
>   proxmox-acme/src/request.rs              |  25 ++++--
>   9 files changed, 44 insertions(+), 145 deletions(-)
> 
> 
> proxmox-backup:
> 
> Samuel Rufinatscha (5):
>    acme: clean up ACME-related imports
>    acme: include proxmox-acme-api dependency
>    acme: drop local AcmeClient
>    acme: change API impls to use proxmox-acme-api handlers
>    acme: certificate ordering through proxmox-acme-api
> 
>   Cargo.toml                             |   3 +
>   src/acme/client.rs                     | 691 -------------------------
>   src/acme/mod.rs                        |   5 -
>   src/acme/plugin.rs                     | 336 ------------
>   src/api2/config/acme.rs                | 406 ++-------------
>   src/api2/node/certificates.rs          | 232 ++-------
>   src/api2/types/acme.rs                 |  98 ----
>   src/api2/types/mod.rs                  |   3 -
>   src/bin/proxmox-backup-api.rs          |   2 +
>   src/bin/proxmox-backup-manager.rs      |  14 +-
>   src/bin/proxmox-backup-proxy.rs        |  15 +-
>   src/bin/proxmox_backup_manager/acme.rs |  21 +-
>   src/config/acme/mod.rs                 |  55 +-
>   src/config/acme/plugin.rs              |  92 +---
>   src/config/node.rs                     |  31 +-
>   src/lib.rs                             |   2 -
>   16 files changed, 109 insertions(+), 1897 deletions(-)
>   delete mode 100644 src/acme/client.rs
>   delete mode 100644 src/acme/mod.rs
>   delete mode 100644 src/acme/plugin.rs
>   delete mode 100644 src/api2/types/acme.rs
> 
> 
> Summary over all repositories:
>    25 files changed, 153 insertions(+), 2042 deletions(-)
> 



_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel

^ permalink raw reply	[relevance 13%]

* Re: [pbs-devel] [PATCH proxmox-backup v3 1/4] pbs-config: add token.shadow generation to ConfigVersionCache
  @ 2026-01-16 13:53  6%     ` Samuel Rufinatscha
  0 siblings, 0 replies; 117+ results
From: Samuel Rufinatscha @ 2026-01-16 13:53 UTC (permalink / raw)
  To: Proxmox Backup Server development discussion,
	Fabian Grünbichler

On 1/14/26 11:44 AM, Fabian Grünbichler wrote:
> On January 2, 2026 5:07 pm, Samuel Rufinatscha wrote:
>> Currently, every token-based API request reads the token.shadow file and
>> runs the expensive password hash verification for the given token
>> secret. This shows up as a hotspot in /status profiling (see
>> bug #7017 [1]).
>>
>> To solve the issue, this patch prepares the config version cache,
>> so that token_shadow_generation config caching can be built on
>> top of it.
>>
>> This patch specifically:
>> (1) implements increment function in order to invalidate generations
> 
> this is needlessly verbose..
> 
>>
>> This patch is part of the series which fixes bug #7017 [1].
> 
> this is already mentioned higher up and doesn't need to be repeated
> here.
> 

Makes sense, will adjust this. Thanks!

> this patch needs a rebase. it would be good to call out why it is safe
> to add to this struct, since it is accessed/mapped by both old and new
> processes.
>

Will add a note on why this is safe: the shmem mapping is fixed to 4096
bytes via the #[repr(C)] union padding and enforced
by assert_cache_size(). The new AtomicUsize is appended at the end of
the struct, so existing field offsets are unchanged. Old
processes keep accessing the same bytes; the new field consumes
previously reserved padding.
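
To illustrate the layout argument, here is a minimal sketch. It is not the
actual pbs-config code: InnerV1/InnerV2, PageV1/PageV2, the `magic` field and
layouts_are_compatible() are made-up names for illustration; only the
4096-byte size and the two generation fields are taken from the discussion
above. It assumes Rust 1.77 or newer for offset_of!.

```rust
use std::mem::{offset_of, size_of, ManuallyDrop};
use std::sync::atomic::AtomicUsize;

/// Size of the shared mapping (4096 bytes, as stated above).
const PAGE_SIZE: usize = 4096;

/// Hypothetical layout before the patch.
#[allow(dead_code)]
#[repr(C)]
struct InnerV1 {
    magic: [u8; 8],
    datastore_generation: AtomicUsize,
}

/// Hypothetical layout after the patch: the new atomic is appended at the end.
#[allow(dead_code)]
#[repr(C)]
struct InnerV2 {
    magic: [u8; 8],
    datastore_generation: AtomicUsize,
    token_shadow_generation: AtomicUsize,
}

/// The padding member pins the union to one page, whatever the size of
/// the data member (as long as it fits).
#[allow(dead_code)]
#[repr(C)]
union PageV1 {
    data: ManuallyDrop<InnerV1>,
    _padding: [u8; PAGE_SIZE],
}

#[allow(dead_code)]
#[repr(C)]
union PageV2 {
    data: ManuallyDrop<InnerV2>,
    _padding: [u8; PAGE_SIZE],
}

/// True if old and new processes agree on the mapping: same total size,
/// same offsets for the old fields, and the new field placed in bytes that
/// were padding in the old layout.
fn layouts_are_compatible() -> bool {
    size_of::<PageV1>() == PAGE_SIZE
        && size_of::<PageV2>() == PAGE_SIZE
        && offset_of!(InnerV1, datastore_generation)
            == offset_of!(InnerV2, datastore_generation)
        && offset_of!(InnerV2, token_shadow_generation) >= size_of::<InnerV1>()
}
```

With #[repr(C)], fields are laid out in declaration order, so appending a
field cannot move the existing ones; the union keeps the total size fixed.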
>>
>> [1] https://bugzilla.proxmox.com/show_bug.cgi?id=7017
>>
>> Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
>> ---
>>   pbs-config/src/config_version_cache.rs | 18 ++++++++++++++++++
>>   1 file changed, 18 insertions(+)
>>
>> diff --git a/pbs-config/src/config_version_cache.rs b/pbs-config/src/config_version_cache.rs
>> index e8fb994f..1376b11d 100644
>> --- a/pbs-config/src/config_version_cache.rs
>> +++ b/pbs-config/src/config_version_cache.rs
>> @@ -28,6 +28,8 @@ struct ConfigVersionCacheDataInner {
>>       // datastore (datastore.cfg) generation/version
>>       // FIXME: remove with PBS 3.0
>>       datastore_generation: AtomicUsize,
>> +    // Token shadow (token.shadow) generation/version.
>> +    token_shadow_generation: AtomicUsize,
>>       // Add further atomics here
>>   }
>>   
>> @@ -153,4 +155,20 @@ impl ConfigVersionCache {
>>               .datastore_generation
>>               .fetch_add(1, Ordering::AcqRel)
>>       }
>> +
>> +    /// Returns the token shadow generation number.
>> +    pub fn token_shadow_generation(&self) -> usize {
>> +        self.shmem
>> +            .data()
>> +            .token_shadow_generation
>> +            .load(Ordering::Acquire)
>> +    }
>> +
>> +    /// Increase the token shadow generation number.
>> +    pub fn increase_token_shadow_generation(&self) -> usize {
>> +        self.shmem
>> +            .data()
>> +            .token_shadow_generation
>> +            .fetch_add(1, Ordering::AcqRel)
>> +    }
>>   }
>> -- 
>> 2.47.3



_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel

^ permalink raw reply	[relevance 6%]

* Re: [pbs-devel] [PATCH proxmox-backup v3 2/4] pbs-config: cache verified API token secrets
  @ 2026-01-16 15:13  6%     ` Samuel Rufinatscha
  2026-01-16 15:29  6%       ` Fabian Grünbichler
  2026-01-16 16:00  5%       ` Fabian Grünbichler
  0 siblings, 2 replies; 117+ results
From: Samuel Rufinatscha @ 2026-01-16 15:13 UTC (permalink / raw)
  To: Proxmox Backup Server development discussion,
	Fabian Grünbichler

On 1/14/26 11:44 AM, Fabian Grünbichler wrote:
> On January 2, 2026 5:07 pm, Samuel Rufinatscha wrote:
>> Currently, every token-based API request reads the token.shadow file and
>> runs the expensive password hash verification for the given token
>> secret. This shows up as a hotspot in /status profiling (see
>> bug #7017 [1]).
>>
>> This patch introduces an in-memory cache of successfully verified token
>> secrets. Subsequent requests for the same token+secret combination only
>> perform a comparison using openssl::memcmp::eq and avoid re-running the
>> password hash. The cache is updated when a token secret is set and
>> cleared when a token is deleted. Note, this does NOT include manual
>> config changes, which will be covered in a subsequent patch.
>>
>> This patch is part of the series which fixes bug #7017 [1].
>>
>> [1] https://bugzilla.proxmox.com/show_bug.cgi?id=7017
>>
>> Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
>> ---
>> Changes from v1 to v2:
>>
>> * Replace OnceCell with LazyLock, and std::sync::RwLock with
>> parking_lot::RwLock.
>> * Add API_MUTATION_GENERATION and guard cache inserts
>> to prevent “zombie inserts” across concurrent set/delete.
>> * Refactor cache operations into cache_try_secret_matches,
>> cache_try_insert_secret, and centralize write-side behavior in
>> apply_api_mutation.
>> * Switch fast-path cache access to try_read/try_write (best-effort).
>>
>> Changes from v2 to v3:
>>
>> * Replaced process-local cache invalidation (AtomicU64
>> API_MUTATION_GENERATION) with a cross-process shared generation via
>> ConfigVersionCache.
>> * Validate shared generation before/after the constant-time secret
>> compare; only insert into cache if the generation is unchanged.
>> * invalidate_cache_state() on insert if shared generation changed.
>>
>>   Cargo.toml                     |   1 +
>>   pbs-config/Cargo.toml          |   1 +
>>   pbs-config/src/token_shadow.rs | 157 ++++++++++++++++++++++++++++++++-
>>   3 files changed, 158 insertions(+), 1 deletion(-)
>>
>> diff --git a/Cargo.toml b/Cargo.toml
>> index 1aa57ae5..821b63b7 100644
>> --- a/Cargo.toml
>> +++ b/Cargo.toml
>> @@ -143,6 +143,7 @@ nom = "7"
>>   num-traits = "0.2"
>>   once_cell = "1.3.1"
>>   openssl = "0.10.40"
>> +parking_lot = "0.12"
>>   percent-encoding = "2.1"
>>   pin-project-lite = "0.2"
>>   regex = "1.5.5"
>> diff --git a/pbs-config/Cargo.toml b/pbs-config/Cargo.toml
>> index 74afb3c6..eb81ce00 100644
>> --- a/pbs-config/Cargo.toml
>> +++ b/pbs-config/Cargo.toml
>> @@ -13,6 +13,7 @@ libc.workspace = true
>>   nix.workspace = true
>>   once_cell.workspace = true
>>   openssl.workspace = true
>> +parking_lot.workspace = true
>>   regex.workspace = true
>>   serde.workspace = true
>>   serde_json.workspace = true
>> diff --git a/pbs-config/src/token_shadow.rs b/pbs-config/src/token_shadow.rs
>> index 640fabbf..fa84aee5 100644
>> --- a/pbs-config/src/token_shadow.rs
>> +++ b/pbs-config/src/token_shadow.rs
>> @@ -1,6 +1,8 @@
>>   use std::collections::HashMap;
>> +use std::sync::LazyLock;
>>   
>>   use anyhow::{bail, format_err, Error};
>> +use parking_lot::RwLock;
>>   use serde::{Deserialize, Serialize};
>>   use serde_json::{from_value, Value};
>>   
>> @@ -13,6 +15,18 @@ use crate::{open_backup_lockfile, BackupLockGuard};
>>   const LOCK_FILE: &str = pbs_buildcfg::configdir!("/token.shadow.lock");
>>   const CONF_FILE: &str = pbs_buildcfg::configdir!("/token.shadow");
>>   
>> +/// Global in-memory cache for successfully verified API token secrets.
>> +/// The cache stores plain text secrets for token Authids that have already been
>> +/// verified against the hashed values in `token.shadow`. This allows for cheap
>> +/// subsequent authentications for the same token+secret combination, avoiding
>> +/// recomputing the password hash on every request.
>> +static TOKEN_SECRET_CACHE: LazyLock<RwLock<ApiTokenSecretCache>> = LazyLock::new(|| {
>> +    RwLock::new(ApiTokenSecretCache {
>> +        secrets: HashMap::new(),
>> +        shared_gen: 0,
>> +    })
>> +});
>> +
>>   #[derive(Serialize, Deserialize)]
>>   #[serde(rename_all = "kebab-case")]
>>   /// ApiToken id / secret pair
>> @@ -54,9 +68,27 @@ pub fn verify_secret(tokenid: &Authid, secret: &str) -> Result<(), Error> {
>>           bail!("not an API token ID");
>>       }
>>   
>> +    // Fast path
>> +    if cache_try_secret_matches(tokenid, secret) {
>> +        return Ok(());
>> +    }
>> +
>> +    // Slow path
>> +    // First, capture the shared generation before doing the hash verification.
>> +    let gen_before = token_shadow_shared_gen();
>> +
>>       let data = read_file()?;
>>       match data.get(tokenid) {
>> -        Some(hashed_secret) => proxmox_sys::crypt::verify_crypt_pw(secret, hashed_secret),
>> +        Some(hashed_secret) => {
>> +            proxmox_sys::crypt::verify_crypt_pw(secret, hashed_secret)?;
>> +
>> +            // Try to cache only if nothing changed while verifying the secret.
>> +            if let Some(gen) = gen_before {
>> +                cache_try_insert_secret(tokenid.clone(), secret.to_owned(), gen);
>> +            }
>> +
>> +            Ok(())
>> +        }
>>           None => bail!("invalid API token"),
>>       }
>>   }
>> @@ -82,6 +114,8 @@ fn set_secret(tokenid: &Authid, secret: &str) -> Result<(), Error> {
>>       data.insert(tokenid.clone(), hashed_secret);
>>       write_file(data)?;
>>   
>> +    apply_api_mutation(tokenid, Some(secret));
>> +
>>       Ok(())
>>   }
>>   
>> @@ -97,5 +131,126 @@ pub fn delete_secret(tokenid: &Authid) -> Result<(), Error> {
>>       data.remove(tokenid);
>>       write_file(data)?;
>>   
>> +    apply_api_mutation(tokenid, None);
>> +
>>       Ok(())
>>   }
>> +
>> +struct ApiTokenSecretCache {
>> +    /// Keys are token Authids, values are the corresponding plain text secrets.
>> +    /// Entries are added after a successful on-disk verification in
>> +    /// `verify_secret` or when a new token secret is generated by
>> +    /// `generate_and_set_secret`. Used to avoid repeated
>> +    /// password-hash computation on subsequent authentications.
>> +    secrets: HashMap<Authid, CachedSecret>,
>> +    /// Shared generation to detect mutations of the underlying token.shadow file.
>> +    shared_gen: usize,
>> +}
>> +
>> +/// Cached secret.
>> +struct CachedSecret {
>> +    secret: String,
>> +}
>> +
>> +fn cache_try_insert_secret(tokenid: Authid, secret: String, shared_gen_before: usize) {
>> +    let Some(mut cache) = TOKEN_SECRET_CACHE.try_write() else {
>> +        return;
>> +    };
>> +
>> +    let Some(shared_gen_now) = token_shadow_shared_gen() else {
>> +        return;
>> +    };
>> +
>> +    // If this process missed a generation bump, its cache is stale.
>> +    if cache.shared_gen != shared_gen_now {
>> +        invalidate_cache_state(&mut cache);
>> +        cache.shared_gen = shared_gen_now;
>> +    }
>> +
>> +    // If a mutation happened while we were verifying the secret, do not insert.
>> +    if shared_gen_now == shared_gen_before {
>> +        cache.secrets.insert(tokenid, CachedSecret { secret });
>> +    }
>> +}
>> +
>> +// Tries to match the given token secret against the cached secret.
>> +// Checks the generation before and after the constant-time compare to avoid a
>> +// TOCTOU window. If another process rotates/deletes a token while we're validating
>> +// the cached secret, the generation will change, and we
>> +// must not trust the cache for this request.
>> +fn cache_try_secret_matches(tokenid: &Authid, secret: &str) -> bool {
>> +    let Some(cache) = TOKEN_SECRET_CACHE.try_read() else {
>> +        return false;
>> +    };
>> +    let Some(entry) = cache.secrets.get(tokenid) else {
>> +        return false;
>> +    };
>> +
>> +    let cache_gen = cache.shared_gen;
>> +
>> +    let Some(gen1) = token_shadow_shared_gen() else {
>> +        return false;
>> +    };
>> +    if gen1 != cache_gen {
>> +        return false;
>> +    }
>> +
>> +    let eq = openssl::memcmp::eq(entry.secret.as_bytes(), secret.as_bytes());
> 
> should we invalidate the cache here for this particular authid in case
> of a mismatch, to avoid making brute forcing too easy/cheap?
>

This isn't a cheap reject: on a mismatch we still fall through to
verify_crypt_pw(), so a wrong guess still pays the full hash cost.
Evicting on mismatch, however, could enable cache thrashing, where wrong
secrets for a known tokenid would evict cached entries. So I think we
should not invalidate here on mismatch.
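
A toy model of that behaviour, as a sketch only: Verifier, with_token(),
slow_verify() and the slow_checks counter are hypothetical names, token IDs
are plain strings instead of Authid, and the plain `==` stands in for both
verify_crypt_pw() and the constant-time openssl::memcmp::eq used in the
patch.

```rust
use std::collections::HashMap;

/// Toy verifier; `slow_checks` counts how often the expensive path ran.
struct Verifier {
    /// token id -> stored secret (stand-in for the hashed value on disk)
    stored: HashMap<String, String>,
    /// token id -> plain-text secret that was verified successfully
    cache: HashMap<String, String>,
    slow_checks: usize,
}

impl Verifier {
    fn with_token(id: &str, secret: &str) -> Self {
        let mut stored = HashMap::new();
        stored.insert(id.to_owned(), secret.to_owned());
        Self {
            stored,
            cache: HashMap::new(),
            slow_checks: 0,
        }
    }

    /// Stand-in for the expensive password hash verification.
    fn slow_verify(&mut self, id: &str, secret: &str) -> bool {
        self.slow_checks += 1;
        self.stored.get(id).is_some_and(|s| s == secret)
    }

    fn verify(&mut self, id: &str, secret: &str) -> bool {
        // Fast path: only a cache hit short-circuits.
        // (NOT constant-time here; the real code must use a
        // constant-time compare.)
        if self.cache.get(id).is_some_and(|c| c == secret) {
            return true;
        }
        // A mismatch neither rejects nor evicts: it pays the full cost.
        let ok = self.slow_verify(id, secret);
        if ok {
            self.cache.insert(id.to_owned(), secret.to_owned());
        }
        ok
    }
}
```

In this model a wrong secret always reaches slow_verify(), and the cached
entry for the correct secret stays in place afterwards.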

>> +    let Some(gen2) = token_shadow_shared_gen() else {
>> +        return false;
>> +    };
>> +
>> +    eq && gen2 == cache_gen
>> +}
>> +
>> +fn apply_api_mutation(tokenid: &Authid, new_secret: Option<&str>) {
>> +    // Signal cache invalidation to other processes (best-effort).
>> +    let new_shared_gen = bump_token_shadow_shared_gen();
>> +
>> +    let mut cache = TOKEN_SECRET_CACHE.write();
>> +
>> +    // If we cannot read/bump the shared generation, we cannot safely trust the cache.
>> +    let Some(gen) = new_shared_gen else {
>> +        invalidate_cache_state(&mut cache);
>> +        cache.shared_gen = 0;
>> +        return;
>> +    };
>> +
>> +    // Update to the post-mutation generation.
>> +    cache.shared_gen = gen;
>> +
>> +    // Apply the new mutation.
>> +    match new_secret {
>> +        Some(secret) => {
>> +            cache.secrets.insert(
>> +                tokenid.clone(),
>> +                CachedSecret {
>> +                    secret: secret.to_owned(),
>> +                },
>> +            );
>> +        }
>> +        None => {
>> +            cache.secrets.remove(tokenid);
>> +        }
>> +    }
>> +}
>> +
>> +/// Get the current shared generation.
>> +fn token_shadow_shared_gen() -> Option<usize> {
>> +    crate::ConfigVersionCache::new()
>> +        .ok()
>> +        .map(|cvc| cvc.token_shadow_generation())
>> +}
>> +
>> +/// Bump and return the new shared generation.
>> +fn bump_token_shadow_shared_gen() -> Option<usize> {
>> +    crate::ConfigVersionCache::new()
>> +        .ok()
>> +        .map(|cvc| cvc.increase_token_shadow_generation() + 1)
>> +}
>> +
>> +/// Invalidates the cache state and only keeps the shared generation.
> 
> both calls to this actually set the cached generation to some value
> right after, so maybe this should take a generation directly and set it?
>

Patch 3/4 doesn’t always update the gen on cache invalidation
(the shadow_mtime_len() error branch in apply_api_mutation), but most
other call sites do. Agreed, this can be refactored, maybe:

fn invalidate_cache_state(cache: &mut ApiTokenSecretCache) {
    cache.secrets.clear();
    // clear other cache fields (mtime/len/last_checked) as needed
}

fn invalidate_cache_state_and_set_gen(cache: &mut ApiTokenSecretCache, gen: usize) {
    invalidate_cache_state(cache);
    cache.shared_gen = gen;
}

We could also do a single helper with Option<usize> but two helpers make 
the call sites more explicit.
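
For comparison, the single-helper variant could look roughly like this.
This is only a sketch: the cache struct is reduced to two fields with String
keys instead of Authid, and the parameter is named new_gen here because
`gen` is a reserved keyword in the Rust 2024 edition.

```rust
use std::collections::HashMap;

/// Reduced stand-in for the cache struct from the patch.
#[derive(Default)]
struct ApiTokenSecretCache {
    secrets: HashMap<String, String>,
    shared_gen: usize,
}

/// Clears the cached secrets. `Some(g)` also records `g` as the cached
/// generation, `None` leaves the cached generation untouched.
fn invalidate_cache_state(cache: &mut ApiTokenSecretCache, new_gen: Option<usize>) {
    cache.secrets.clear();
    if let Some(g) = new_gen {
        cache.shared_gen = g;
    }
}
```

A call site then reads invalidate_cache_state(&mut cache, None) or
invalidate_cache_state(&mut cache, Some(gen)), which is where the two named
helpers are arguably easier to scan.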

>> +fn invalidate_cache_state(cache: &mut ApiTokenSecretCache) {
>> +    cache.secrets.clear();
>> +}
>> -- 
>> 2.47.3



_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel

^ permalink raw reply	[relevance 6%]

* Re: [pbs-devel] [PATCH proxmox-backup v3 2/4] pbs-config: cache verified API token secrets
  2026-01-16 15:13  6%     ` Samuel Rufinatscha
@ 2026-01-16 15:29  6%       ` Fabian Grünbichler
  2026-01-16 15:33  6%         ` Samuel Rufinatscha
  2026-01-16 16:00  5%       ` Fabian Grünbichler
  1 sibling, 1 reply; 117+ results
From: Fabian Grünbichler @ 2026-01-16 15:29 UTC (permalink / raw)
  To: Proxmox Backup Server development discussion, Samuel Rufinatscha

Quoting Samuel Rufinatscha (2026-01-16 16:13:17)
> On 1/14/26 11:44 AM, Fabian Grünbichler wrote:
> > On January 2, 2026 5:07 pm, Samuel Rufinatscha wrote:
> >> Currently, every token-based API request reads the token.shadow file and
> >> runs the expensive password hash verification for the given token
> >> secret. This shows up as a hotspot in /status profiling (see
> >> bug #7017 [1]).
> >>
> >> This patch introduces an in-memory cache of successfully verified token
> >> secrets. Subsequent requests for the same token+secret combination only
> >> perform a comparison using openssl::memcmp::eq and avoid re-running the
> >> password hash. The cache is updated when a token secret is set and
> >> cleared when a token is deleted. Note, this does NOT include manual
> >> config changes, which will be covered in a subsequent patch.
> >>
> >> This patch is part of the series which fixes bug #7017 [1].
> >>
> >> [1] https://bugzilla.proxmox.com/show_bug.cgi?id=7017
> >>
> >> Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
> >> ---
> >> Changes from v1 to v2:
> >>
> >> * Replace OnceCell with LazyLock, and std::sync::RwLock with
> >> parking_lot::RwLock.
> >> * Add API_MUTATION_GENERATION and guard cache inserts
> >> to prevent “zombie inserts” across concurrent set/delete.
> >> * Refactor cache operations into cache_try_secret_matches,
> >> cache_try_insert_secret, and centralize write-side behavior in
> >> apply_api_mutation.
> >> * Switch fast-path cache access to try_read/try_write (best-effort).
> >>
> >> Changes from v2 to v3:
> >>
> >> * Replaced process-local cache invalidation (AtomicU64
> >> API_MUTATION_GENERATION) with a cross-process shared generation via
> >> ConfigVersionCache.
> >> * Validate shared generation before/after the constant-time secret
> >> compare; only insert into cache if the generation is unchanged.
> >> * invalidate_cache_state() on insert if shared generation changed.
> >>
> >>   Cargo.toml                     |   1 +
> >>   pbs-config/Cargo.toml          |   1 +
> >>   pbs-config/src/token_shadow.rs | 157 ++++++++++++++++++++++++++++++++-
> >>   3 files changed, 158 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/Cargo.toml b/Cargo.toml
> >> index 1aa57ae5..821b63b7 100644
> >> --- a/Cargo.toml
> >> +++ b/Cargo.toml
> >> @@ -143,6 +143,7 @@ nom = "7"
> >>   num-traits = "0.2"
> >>   once_cell = "1.3.1"
> >>   openssl = "0.10.40"
> >> +parking_lot = "0.12"
> >>   percent-encoding = "2.1"
> >>   pin-project-lite = "0.2"
> >>   regex = "1.5.5"
> >> diff --git a/pbs-config/Cargo.toml b/pbs-config/Cargo.toml
> >> index 74afb3c6..eb81ce00 100644
> >> --- a/pbs-config/Cargo.toml
> >> +++ b/pbs-config/Cargo.toml
> >> @@ -13,6 +13,7 @@ libc.workspace = true
> >>   nix.workspace = true
> >>   once_cell.workspace = true
> >>   openssl.workspace = true
> >> +parking_lot.workspace = true
> >>   regex.workspace = true
> >>   serde.workspace = true
> >>   serde_json.workspace = true
> >> diff --git a/pbs-config/src/token_shadow.rs b/pbs-config/src/token_shadow.rs
> >> index 640fabbf..fa84aee5 100644
> >> --- a/pbs-config/src/token_shadow.rs
> >> +++ b/pbs-config/src/token_shadow.rs
> >> @@ -1,6 +1,8 @@
> >>   use std::collections::HashMap;
> >> +use std::sync::LazyLock;
> >>   
> >>   use anyhow::{bail, format_err, Error};
> >> +use parking_lot::RwLock;
> >>   use serde::{Deserialize, Serialize};
> >>   use serde_json::{from_value, Value};
> >>   
> >> @@ -13,6 +15,18 @@ use crate::{open_backup_lockfile, BackupLockGuard};
> >>   const LOCK_FILE: &str = pbs_buildcfg::configdir!("/token.shadow.lock");
> >>   const CONF_FILE: &str = pbs_buildcfg::configdir!("/token.shadow");
> >>   
> >> +/// Global in-memory cache for successfully verified API token secrets.
> >> +/// The cache stores plain text secrets for token Authids that have already been
> >> +/// verified against the hashed values in `token.shadow`. This allows for cheap
> >> +/// subsequent authentications for the same token+secret combination, avoiding
> >> +/// recomputing the password hash on every request.
> >> +static TOKEN_SECRET_CACHE: LazyLock<RwLock<ApiTokenSecretCache>> = LazyLock::new(|| {
> >> +    RwLock::new(ApiTokenSecretCache {
> >> +        secrets: HashMap::new(),
> >> +        shared_gen: 0,
> >> +    })
> >> +});
> >> +
> >>   #[derive(Serialize, Deserialize)]
> >>   #[serde(rename_all = "kebab-case")]
> >>   /// ApiToken id / secret pair
> >> @@ -54,9 +68,27 @@ pub fn verify_secret(tokenid: &Authid, secret: &str) -> Result<(), Error> {
> >>           bail!("not an API token ID");
> >>       }
> >>   
> >> +    // Fast path
> >> +    if cache_try_secret_matches(tokenid, secret) {
> >> +        return Ok(());
> >> +    }
> >> +
> >> +    // Slow path
> >> +    // First, capture the shared generation before doing the hash verification.
> >> +    let gen_before = token_shadow_shared_gen();
> >> +
> >>       let data = read_file()?;
> >>       match data.get(tokenid) {
> >> -        Some(hashed_secret) => proxmox_sys::crypt::verify_crypt_pw(secret, hashed_secret),
> >> +        Some(hashed_secret) => {
> >> +            proxmox_sys::crypt::verify_crypt_pw(secret, hashed_secret)?;
> >> +
> >> +            // Try to cache only if nothing changed while verifying the secret.
> >> +            if let Some(gen) = gen_before {
> >> +                cache_try_insert_secret(tokenid.clone(), secret.to_owned(), gen);
> >> +            }
> >> +
> >> +            Ok(())
> >> +        }
> >>           None => bail!("invalid API token"),
> >>       }
> >>   }
> >> @@ -82,6 +114,8 @@ fn set_secret(tokenid: &Authid, secret: &str) -> Result<(), Error> {
> >>       data.insert(tokenid.clone(), hashed_secret);
> >>       write_file(data)?;
> >>   
> >> +    apply_api_mutation(tokenid, Some(secret));
> >> +
> >>       Ok(())
> >>   }
> >>   
> >> @@ -97,5 +131,126 @@ pub fn delete_secret(tokenid: &Authid) -> Result<(), Error> {
> >>       data.remove(tokenid);
> >>       write_file(data)?;
> >>   
> >> +    apply_api_mutation(tokenid, None);
> >> +
> >>       Ok(())
> >>   }
> >> +
> >> +struct ApiTokenSecretCache {
> >> +    /// Keys are token Authids, values are the corresponding plain text secrets.
> >> +    /// Entries are added after a successful on-disk verification in
> >> +    /// `verify_secret` or when a new token secret is generated by
> >> +    /// `generate_and_set_secret`. Used to avoid repeated
> >> +    /// password-hash computation on subsequent authentications.
> >> +    secrets: HashMap<Authid, CachedSecret>,
> >> +    /// Shared generation to detect mutations of the underlying token.shadow file.
> >> +    shared_gen: usize,
> >> +}
> >> +
> >> +/// Cached secret.
> >> +struct CachedSecret {
> >> +    secret: String,
> >> +}
> >> +
> >> +fn cache_try_insert_secret(tokenid: Authid, secret: String, shared_gen_before: usize) {
> >> +    let Some(mut cache) = TOKEN_SECRET_CACHE.try_write() else {
> >> +        return;
> >> +    };
> >> +
> >> +    let Some(shared_gen_now) = token_shadow_shared_gen() else {
> >> +        return;
> >> +    };
> >> +
> >> +    // If this process missed a generation bump, its cache is stale.
> >> +    if cache.shared_gen != shared_gen_now {
> >> +        invalidate_cache_state(&mut cache);
> >> +        cache.shared_gen = shared_gen_now;
> >> +    }
> >> +
> >> +    // If a mutation happened while we were verifying the secret, do not insert.
> >> +    if shared_gen_now == shared_gen_before {
> >> +        cache.secrets.insert(tokenid, CachedSecret { secret });
> >> +    }
> >> +}
> >> +
> >> +// Tries to match the given token secret against the cached secret.
> >> +// Checks the generation before and after the constant-time compare to avoid a
> >> +// TOCTOU window. If another process rotates/deletes a token while we're validating
> >> +// the cached secret, the generation will change, and we
> >> +// must not trust the cache for this request.
> >> +fn cache_try_secret_matches(tokenid: &Authid, secret: &str) -> bool {
> >> +    let Some(cache) = TOKEN_SECRET_CACHE.try_read() else {
> >> +        return false;
> >> +    };
> >> +    let Some(entry) = cache.secrets.get(tokenid) else {
> >> +        return false;
> >> +    };
> >> +
> >> +    let cache_gen = cache.shared_gen;
> >> +
> >> +    let Some(gen1) = token_shadow_shared_gen() else {
> >> +        return false;
> >> +    };
> >> +    if gen1 != cache_gen {
> >> +        return false;
> >> +    }
> >> +
> >> +    let eq = openssl::memcmp::eq(entry.secret.as_bytes(), secret.as_bytes());
> > 
> > should we invalidate the cache here for this particular authid in case
> > of a mismatch, to avoid making brute forcing too easy/cheap?
> >
> 
> This isn't a cheap reject: on a mismatch we still fall through to
> verify_crypt_pw(), so a wrong guess still pays the full hash cost.
> Evicting on mismatch, however, could enable cache thrashing, where wrong
> secrets for a known tokenid would evict cached entries. So I think we
> should not invalidate here on mismatch.
> 
> >> +    let Some(gen2) = token_shadow_shared_gen() else {
> >> +        return false;
> >> +    };
> >> +
> >> +    eq && gen2 == cache_gen
> >> +}
> >> +
> >> +fn apply_api_mutation(tokenid: &Authid, new_secret: Option<&str>) {
> >> +    // Signal cache invalidation to other processes (best-effort).
> >> +    let new_shared_gen = bump_token_shadow_shared_gen();
> >> +
> >> +    let mut cache = TOKEN_SECRET_CACHE.write();
> >> +
> >> +    // If we cannot read/bump the shared generation, we cannot safely trust the cache.
> >> +    let Some(gen) = new_shared_gen else {
> >> +        invalidate_cache_state(&mut cache);
> >> +        cache.shared_gen = 0;
> >> +        return;
> >> +    };
> >> +
> >> +    // Update to the post-mutation generation.
> >> +    cache.shared_gen = gen;
> >> +
> >> +    // Apply the new mutation.
> >> +    match new_secret {
> >> +        Some(secret) => {
> >> +            cache.secrets.insert(
> >> +                tokenid.clone(),
> >> +                CachedSecret {
> >> +                    secret: secret.to_owned(),
> >> +                },
> >> +            );
> >> +        }
> >> +        None => {
> >> +            cache.secrets.remove(tokenid);
> >> +        }
> >> +    }
> >> +}
> >> +
> >> +/// Get the current shared generation.
> >> +fn token_shadow_shared_gen() -> Option<usize> {
> >> +    crate::ConfigVersionCache::new()
> >> +        .ok()
> >> +        .map(|cvc| cvc.token_shadow_generation())
> >> +}
> >> +
> >> +/// Bump and return the new shared generation.
> >> +fn bump_token_shadow_shared_gen() -> Option<usize> {
> >> +    crate::ConfigVersionCache::new()
> >> +        .ok()
> >> +        .map(|cvc| cvc.increase_token_shadow_generation() + 1)
> >> +}
> >> +
> >> +/// Invalidates the cache state and only keeps the shared generation.
> > 
> > both calls to this actually set the cached generation to some value
> > right after, so maybe this should take a generation directly and set it?
> >
> 
> Patch 3/4 doesn’t always update the gen on cache invalidation
> (the shadow_mtime_len() error branch in apply_api_mutation), but most
> other call sites do. Agreed, this can be refactored, maybe:

that one sets the generation before (potentially) invalidating the cache
though, so we could unconditionally reset the generation to that value when
invalidating.. we should maybe also re-order the lock and bump there?

> 
> fn invalidate_cache_state(cache: &mut ApiTokenSecretCache) {
>     cache.secrets.clear();
>     // clear other cache fields (mtime/len/last_checked) as needed
> }
> 
> fn invalidate_cache_state_and_set_gen(cache: &mut ApiTokenSecretCache, gen: usize) {
>     invalidate_cache_state(cache);
>     cache.shared_gen = gen;
> }
> 
> We could also do a single helper with Option<usize> but two helpers make 
> the call sites more explicit.
> 
> >> +fn invalidate_cache_state(cache: &mut ApiTokenSecretCache) {
> >> +    cache.secrets.clear();
> >> +}
> >> -- 
> >> 2.47.3
> >>
> >>
> >>
> >> _______________________________________________
> >> pbs-devel mailing list
> >> pbs-devel@lists.proxmox.com
> >> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
> >>

^ permalink raw reply	[relevance 6%]

* Re: [pbs-devel] [PATCH proxmox-backup v3 2/4] pbs-config: cache verified API token secrets
  2026-01-16 15:29  6%       ` Fabian Grünbichler
@ 2026-01-16 15:33  6%         ` Samuel Rufinatscha
  0 siblings, 0 replies; 117+ results
From: Samuel Rufinatscha @ 2026-01-16 15:33 UTC (permalink / raw)
  To: Fabian Grünbichler,
	Proxmox Backup Server development discussion

On 1/16/26 4:28 PM, Fabian Grünbichler wrote:
> Quoting Samuel Rufinatscha (2026-01-16 16:13:17)
>> On 1/14/26 11:44 AM, Fabian Grünbichler wrote:
>>> On January 2, 2026 5:07 pm, Samuel Rufinatscha wrote:
>>>> Currently, every token-based API request reads the token.shadow file and
>>>> runs the expensive password hash verification for the given token
>>>> secret. This shows up as a hotspot in /status profiling (see
>>>> bug #7017 [1]).
>>>>
>>>> This patch introduces an in-memory cache of successfully verified token
>>>> secrets. Subsequent requests for the same token+secret combination only
>>>> perform a comparison using openssl::memcmp::eq and avoid re-running the
>>>> password hash. The cache is updated when a token secret is set and
>>>> cleared when a token is deleted. Note, this does NOT include manual
>>>> config changes, which will be covered in a subsequent patch.
>>>>
>>>> This patch is part of the series which fixes bug #7017 [1].
>>>>
>>>> [1] https://bugzilla.proxmox.com/show_bug.cgi?id=7017
>>>>
>>>> Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
>>>> ---
>>>> Changes from v1 to v2:
>>>>
>>>> * Replace OnceCell with LazyLock, and std::sync::RwLock with
>>>> parking_lot::RwLock.
>>>> * Add API_MUTATION_GENERATION and guard cache inserts
>>>> to prevent “zombie inserts” across concurrent set/delete.
>>>> * Refactor cache operations into cache_try_secret_matches,
>>>> cache_try_insert_secret, and centralize write-side behavior in
>>>> apply_api_mutation.
>>>> * Switch fast-path cache access to try_read/try_write (best-effort).
>>>>
>>>> Changes from v2 to v3:
>>>>
>>>> * Replaced process-local cache invalidation (AtomicU64
>>>> API_MUTATION_GENERATION) with a cross-process shared generation via
>>>> ConfigVersionCache.
>>>> * Validate shared generation before/after the constant-time secret
>>>> compare; only insert into cache if the generation is unchanged.
>>>> * invalidate_cache_state() on insert if shared generation changed.
>>>>
>>>>    Cargo.toml                     |   1 +
>>>>    pbs-config/Cargo.toml          |   1 +
>>>>    pbs-config/src/token_shadow.rs | 157 ++++++++++++++++++++++++++++++++-
>>>>    3 files changed, 158 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/Cargo.toml b/Cargo.toml
>>>> index 1aa57ae5..821b63b7 100644
>>>> --- a/Cargo.toml
>>>> +++ b/Cargo.toml
>>>> @@ -143,6 +143,7 @@ nom = "7"
>>>>    num-traits = "0.2"
>>>>    once_cell = "1.3.1"
>>>>    openssl = "0.10.40"
>>>> +parking_lot = "0.12"
>>>>    percent-encoding = "2.1"
>>>>    pin-project-lite = "0.2"
>>>>    regex = "1.5.5"
>>>> diff --git a/pbs-config/Cargo.toml b/pbs-config/Cargo.toml
>>>> index 74afb3c6..eb81ce00 100644
>>>> --- a/pbs-config/Cargo.toml
>>>> +++ b/pbs-config/Cargo.toml
>>>> @@ -13,6 +13,7 @@ libc.workspace = true
>>>>    nix.workspace = true
>>>>    once_cell.workspace = true
>>>>    openssl.workspace = true
>>>> +parking_lot.workspace = true
>>>>    regex.workspace = true
>>>>    serde.workspace = true
>>>>    serde_json.workspace = true
>>>> diff --git a/pbs-config/src/token_shadow.rs b/pbs-config/src/token_shadow.rs
>>>> index 640fabbf..fa84aee5 100644
>>>> --- a/pbs-config/src/token_shadow.rs
>>>> +++ b/pbs-config/src/token_shadow.rs
>>>> @@ -1,6 +1,8 @@
>>>>    use std::collections::HashMap;
>>>> +use std::sync::LazyLock;
>>>>    
>>>>    use anyhow::{bail, format_err, Error};
>>>> +use parking_lot::RwLock;
>>>>    use serde::{Deserialize, Serialize};
>>>>    use serde_json::{from_value, Value};
>>>>    
>>>> @@ -13,6 +15,18 @@ use crate::{open_backup_lockfile, BackupLockGuard};
>>>>    const LOCK_FILE: &str = pbs_buildcfg::configdir!("/token.shadow.lock");
>>>>    const CONF_FILE: &str = pbs_buildcfg::configdir!("/token.shadow");
>>>>    
>>>> +/// Global in-memory cache for successfully verified API token secrets.
>>>> +/// The cache stores plain text secrets for token Authids that have already been
>>>> +/// verified against the hashed values in `token.shadow`. This allows for cheap
>>>> +/// subsequent authentications for the same token+secret combination, avoiding
>>>> +/// recomputing the password hash on every request.
>>>> +static TOKEN_SECRET_CACHE: LazyLock<RwLock<ApiTokenSecretCache>> = LazyLock::new(|| {
>>>> +    RwLock::new(ApiTokenSecretCache {
>>>> +        secrets: HashMap::new(),
>>>> +        shared_gen: 0,
>>>> +    })
>>>> +});
>>>> +
>>>>    #[derive(Serialize, Deserialize)]
>>>>    #[serde(rename_all = "kebab-case")]
>>>>    /// ApiToken id / secret pair
>>>> @@ -54,9 +68,27 @@ pub fn verify_secret(tokenid: &Authid, secret: &str) -> Result<(), Error> {
>>>>            bail!("not an API token ID");
>>>>        }
>>>>    
>>>> +    // Fast path
>>>> +    if cache_try_secret_matches(tokenid, secret) {
>>>> +        return Ok(());
>>>> +    }
>>>> +
>>>> +    // Slow path
>>>> +    // First, capture the shared generation before doing the hash verification.
>>>> +    let gen_before = token_shadow_shared_gen();
>>>> +
>>>>        let data = read_file()?;
>>>>        match data.get(tokenid) {
>>>> -        Some(hashed_secret) => proxmox_sys::crypt::verify_crypt_pw(secret, hashed_secret),
>>>> +        Some(hashed_secret) => {
>>>> +            proxmox_sys::crypt::verify_crypt_pw(secret, hashed_secret)?;
>>>> +
>>>> +            // Try to cache only if nothing changed while verifying the secret.
>>>> +            if let Some(gen) = gen_before {
>>>> +                cache_try_insert_secret(tokenid.clone(), secret.to_owned(), gen);
>>>> +            }
>>>> +
>>>> +            Ok(())
>>>> +        }
>>>>            None => bail!("invalid API token"),
>>>>        }
>>>>    }
>>>> @@ -82,6 +114,8 @@ fn set_secret(tokenid: &Authid, secret: &str) -> Result<(), Error> {
>>>>        data.insert(tokenid.clone(), hashed_secret);
>>>>        write_file(data)?;
>>>>    
>>>> +    apply_api_mutation(tokenid, Some(secret));
>>>> +
>>>>        Ok(())
>>>>    }
>>>>    
>>>> @@ -97,5 +131,126 @@ pub fn delete_secret(tokenid: &Authid) -> Result<(), Error> {
>>>>        data.remove(tokenid);
>>>>        write_file(data)?;
>>>>    
>>>> +    apply_api_mutation(tokenid, None);
>>>> +
>>>>        Ok(())
>>>>    }
>>>> +
>>>> +struct ApiTokenSecretCache {
>>>> +    /// Keys are token Authids, values are the corresponding plain text secrets.
>>>> +    /// Entries are added after a successful on-disk verification in
>>>> +    /// `verify_secret` or when a new token secret is generated by
>>>> +    /// `generate_and_set_secret`. Used to avoid repeated
>>>> +    /// password-hash computation on subsequent authentications.
>>>> +    secrets: HashMap<Authid, CachedSecret>,
>>>> +    /// Shared generation to detect mutations of the underlying token.shadow file.
>>>> +    shared_gen: usize,
>>>> +}
>>>> +
>>>> +/// Cached secret.
>>>> +struct CachedSecret {
>>>> +    secret: String,
>>>> +}
>>>> +
>>>> +fn cache_try_insert_secret(tokenid: Authid, secret: String, shared_gen_before: usize) {
>>>> +    let Some(mut cache) = TOKEN_SECRET_CACHE.try_write() else {
>>>> +        return;
>>>> +    };
>>>> +
>>>> +    let Some(shared_gen_now) = token_shadow_shared_gen() else {
>>>> +        return;
>>>> +    };
>>>> +
>>>> +    // If this process missed a generation bump, its cache is stale.
>>>> +    if cache.shared_gen != shared_gen_now {
>>>> +        invalidate_cache_state(&mut cache);
>>>> +        cache.shared_gen = shared_gen_now;
>>>> +    }
>>>> +
>>>> +    // If a mutation happened while we were verifying the secret, do not insert.
>>>> +    if shared_gen_now == shared_gen_before {
>>>> +        cache.secrets.insert(tokenid, CachedSecret { secret });
>>>> +    }
>>>> +}
>>>> +
>>>> +// Tries to match the given token secret against the cached secret.
>>>> +// Checks the generation before and after the constant-time compare to avoid a
>>>> +// TOCTOU window. If another process rotates/deletes a token while we're validating
>>>> +// the cached secret, the generation will change, and we
>>>> +// must not trust the cache for this request.
>>>> +fn cache_try_secret_matches(tokenid: &Authid, secret: &str) -> bool {
>>>> +    let Some(cache) = TOKEN_SECRET_CACHE.try_read() else {
>>>> +        return false;
>>>> +    };
>>>> +    let Some(entry) = cache.secrets.get(tokenid) else {
>>>> +        return false;
>>>> +    };
>>>> +
>>>> +    let cache_gen = cache.shared_gen;
>>>> +
>>>> +    let Some(gen1) = token_shadow_shared_gen() else {
>>>> +        return false;
>>>> +    };
>>>> +    if gen1 != cache_gen {
>>>> +        return false;
>>>> +    }
>>>> +
>>>> +    let eq = openssl::memcmp::eq(entry.secret.as_bytes(), secret.as_bytes());
>>>
>>> should we invalidate the cache here for this particular authid in case
>>> of a mismatch, to avoid making brute forcing too easy/cheap?
>>>
>>
>> We are not doing a cheap reject, in mismatch we do still fall through to
>> verify_crypt_pw(). Evicting on mismatch could however enable cache
>> thrashing where wrong secrets for a known tokenid would evict cached
>> entries. So I think we should not invalidate here on mismatch.
>>
>>>> +    let Some(gen2) = token_shadow_shared_gen() else {
>>>> +        return false;
>>>> +    };
>>>> +
>>>> +    eq && gen2 == cache_gen
>>>> +}
>>>> +
>>>> +fn apply_api_mutation(tokenid: &Authid, new_secret: Option<&str>) {
>>>> +    // Signal cache invalidation to other processes (best-effort).
>>>> +    let new_shared_gen = bump_token_shadow_shared_gen();
>>>> +
>>>> +    let mut cache = TOKEN_SECRET_CACHE.write();
>>>> +
>>>> +    // If we cannot read/bump the shared generation, we cannot safely trust the cache.
>>>> +    let Some(gen) = new_shared_gen else {
>>>> +        invalidate_cache_state(&mut cache);
>>>> +        cache.shared_gen = 0;
>>>> +        return;
>>>> +    };
>>>> +
>>>> +    // Update to the post-mutation generation.
>>>> +    cache.shared_gen = gen;
>>>> +
>>>> +    // Apply the new mutation.
>>>> +    match new_secret {
>>>> +        Some(secret) => {
>>>> +            cache.secrets.insert(
>>>> +                tokenid.clone(),
>>>> +                CachedSecret {
>>>> +                    secret: secret.to_owned(),
>>>> +                },
>>>> +            );
>>>> +        }
>>>> +        None => {
>>>> +            cache.secrets.remove(tokenid);
>>>> +        }
>>>> +    }
>>>> +}
>>>> +
>>>> +/// Get the current shared generation.
>>>> +fn token_shadow_shared_gen() -> Option<usize> {
>>>> +    crate::ConfigVersionCache::new()
>>>> +        .ok()
>>>> +        .map(|cvc| cvc.token_shadow_generation())
>>>> +}
>>>> +
>>>> +/// Bump and return the new shared generation.
>>>> +fn bump_token_shadow_shared_gen() -> Option<usize> {
>>>> +    crate::ConfigVersionCache::new()
>>>> +        .ok()
>>>> +        .map(|cvc| cvc.increase_token_shadow_generation() + 1)
>>>> +}
>>>> +
>>>> +/// Invalidates the cache state and only keeps the shared generation.
>>>
>>> both calls to this actually set the cached generation to some value
>>> right after, so maybe this should take a generation directly and set it?
>>>
>>
>> patch 3/4 doesn’t always update the gen on cache invalidation
>> (shadow_mtime_len() error branch in apply_api_mutation) but most other
>> call sites do. Agreed this can be refactored, maybe:
> 
> that one sets the generation before (potentially) invalidating the cache
> though, so we could unconditionally reset the generation to that value when
> invalidating.. we should maybe also re-order the lock and bump there?
>

Good point, I will check this! Thanks, Fabian! :)

>>
>> fn invalidate_cache_state(cache: &mut ApiTokenSecretCache) {
>>       cache.secrets.clear();
>>       // clear other cache fields (mtime/len/last_checked) as needed
>> }
>>
>> fn invalidate_cache_state_and_set_gen(cache: &mut ApiTokenSecretCache,
>> gen: usize) {
>>       invalidate_cache_state(cache);
>>       cache.shared_gen = gen;
>> }
>>
>> We could also do a single helper with Option<usize> but two helpers make
>> the call sites more explicit.
>>
>>>> +fn invalidate_cache_state(cache: &mut ApiTokenSecretCache) {
>>>> +    cache.secrets.clear();
>>>> +}
>>>> -- 
>>>> 2.47.3

^ permalink raw reply	[relevance 6%]

* Re: [pbs-devel] [PATCH proxmox-backup v3 2/4] pbs-config: cache verified API token secrets
  2026-01-16 15:13  6%     ` Samuel Rufinatscha
  2026-01-16 15:29  6%       ` Fabian Grünbichler
@ 2026-01-16 16:00  5%       ` Fabian Grünbichler
  2026-01-16 16:56  6%         ` Samuel Rufinatscha
  1 sibling, 1 reply; 117+ results
From: Fabian Grünbichler @ 2026-01-16 16:00 UTC (permalink / raw)
  To: Proxmox Backup Server development discussion, Samuel Rufinatscha

Quoting Samuel Rufinatscha (2026-01-16 16:13:17)
> On 1/14/26 11:44 AM, Fabian Grünbichler wrote:
> > On January 2, 2026 5:07 pm, Samuel Rufinatscha wrote:
> >> Currently, every token-based API request reads the token.shadow file and
> >> runs the expensive password hash verification for the given token
> >> secret. This shows up as a hotspot in /status profiling (see
> >> bug #7017 [1]).
> >>
> >> This patch introduces an in-memory cache of successfully verified token
> >> secrets. Subsequent requests for the same token+secret combination only
> >> perform a comparison using openssl::memcmp::eq and avoid re-running the
> >> password hash. The cache is updated when a token secret is set and
> >> cleared when a token is deleted. Note, this does NOT include manual
> >> config changes, which will be covered in a subsequent patch.
> >>
> >> This patch is part of the series which fixes bug #7017 [1].
> >>
> >> [1] https://bugzilla.proxmox.com/show_bug.cgi?id=7017
> >>
> >> Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
> >> ---

[..]

> >> +
> >> +// Tries to match the given token secret against the cached secret.
> >> +// Checks the generation before and after the constant-time compare to avoid a
> >> +// TOCTOU window. If another process rotates/deletes a token while we're validating
> >> +// the cached secret, the generation will change, and we
> >> +// must not trust the cache for this request.
> >> +fn cache_try_secret_matches(tokenid: &Authid, secret: &str) -> bool {
> >> +    let Some(cache) = TOKEN_SECRET_CACHE.try_read() else {
> >> +        return false;
> >> +    };
> >> +    let Some(entry) = cache.secrets.get(tokenid) else {
> >> +        return false;
> >> +    };
> >> +
> >> +    let cache_gen = cache.shared_gen;
> >> +
> >> +    let Some(gen1) = token_shadow_shared_gen() else {
> >> +        return false;
> >> +    };
> >> +    if gen1 != cache_gen {
> >> +        return false;
> >> +    }
> >> +
> >> +    let eq = openssl::memcmp::eq(entry.secret.as_bytes(), secret.as_bytes());
> > 
> > should we invalidate the cache here for this particular authid in case
> > of a mismatch, to avoid making brute forcing too easy/cheap?
> >
> 
> We are not doing a cheap reject, in mismatch we do still fall through to
> verify_crypt_pw(). Evicting on mismatch could however enable cache
> thrashing where wrong secrets for a known tokenid would evict cached
> entries. So I think we should not invalidate here on mismatch.

Forgot this part here, sorry. You are right, this *should* be okay. I do think
the second generation check there serves no purpose, though. The token config
can change at any point after we've validated the secret using the old state,
and there is nothing we can do about that. It's totally fine to accept a token
that is modified at exactly the same moment, even if that same token wouldn't
be valid 2 seconds later.

There has to be a point where we say "this token is valid", and at the
point of memcmp here we have already:
- verified we don't need to reload the file
- verified we didn't have any API changes to the token config
- verified that the secret matches what we have cached

Redoing the first two checks after that point doesn't protect us against
changes afterwards either, so we might as well skip that extra work, since it
doesn't give us any extra safety guarantees anyway.
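
Roughly what I mean, as a self-contained sketch. Assumptions: String keys
instead of Authid, std::sync::RwLock instead of parking_lot, an AtomicUsize
passed in instead of ConfigVersionCache, and a hand-rolled `secrets_equal`
standing in for openssl::memcmp::eq (it still returns early on a length
mismatch, so it is illustrative only):

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::RwLock;

struct ApiTokenSecretCache {
    secrets: HashMap<String, String>,
    shared_gen: usize,
}

/// Compares all bytes without stopping at the first difference.
fn secrets_equal(a: &[u8], b: &[u8]) -> bool {
    if a.len() != b.len() {
        return false;
    }
    a.iter().zip(b).fold(0u8, |acc, (x, y)| acc | (x ^ y)) == 0
}

fn cache_try_secret_matches(
    lock: &RwLock<ApiTokenSecretCache>,
    shared: &AtomicUsize,
    tokenid: &str,
    secret: &str,
) -> bool {
    // Best-effort: fall back to the slow path if the lock is contended.
    let Ok(cache) = lock.try_read() else {
        return false;
    };
    let Some(cached) = cache.secrets.get(tokenid) else {
        return false;
    };
    // Single generation check: a mismatch means the cache may be stale.
    if shared.load(Ordering::Acquire) != cache.shared_gen {
        return false;
    }
    // Decision point: no second generation read after the compare.
    secrets_equal(cached.as_bytes(), secret.as_bytes())
}
```

A mismatch still returns false, so the caller falls through to the hash
verification exactly as before.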

> 
> >> +    let Some(gen2) = token_shadow_shared_gen() else {
> >> +        return false;
> >> +    };
> >> +
> >> +    eq && gen2 == cache_gen
> >> +}
> >> +
> >> +fn apply_api_mutation(tokenid: &Authid, new_secret: Option<&str>) {
> >> +    // Signal cache invalidation to other processes (best-effort).
> >> +    let new_shared_gen = bump_token_shadow_shared_gen();
> >> +
> >> +    let mut cache = TOKEN_SECRET_CACHE.write();

Because I mentioned switching those two around: this actually requires more
thought, I think.

Right now, apply_api_mutation is called under a lock, but there are other
calls that bump the generation, so this is actually racy here. OTOH, bumping
the generation before locking the cache means faster cache invalidation.

Maybe we should re-verify the generation after obtaining the lock? And maybe
make apply_api_mutation consume the shadow config file lock, to ensure it's
only called while that lock is being held?
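
One possible shape for that, as a rough sketch only. `ShadowLockGuard` is a
hypothetical stand-in for the token.shadow lock guard, the shared generation
is a plain AtomicUsize instead of ConfigVersionCache, and clearing the whole
cache when the generation moved is just one conservative choice:

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::RwLock;

/// Hypothetical stand-in for the guard returned when locking token.shadow.
struct ShadowLockGuard;

struct ApiTokenSecretCache {
    secrets: HashMap<String, String>,
    shared_gen: usize,
}

/// Taking the guard by value means callers must hold the file lock.
fn apply_api_mutation(
    _guard: ShadowLockGuard,
    shared: &AtomicUsize,
    lock: &RwLock<ApiTokenSecretCache>,
    tokenid: &str,
    new_secret: Option<&str>,
) {
    // Bump first so other processes stop trusting their caches early.
    let bumped = shared.fetch_add(1, Ordering::AcqRel) + 1;

    let mut cache = lock.write().unwrap_or_else(|e| e.into_inner());

    // Re-read after taking the cache lock: another code path may have
    // bumped the generation in between.
    let now = shared.load(Ordering::Acquire);
    if now != bumped {
        // We cannot tell what else changed, so drop everything.
        cache.secrets.clear();
        cache.shared_gen = now;
        return;
    }

    cache.shared_gen = bumped;
    match new_secret {
        Some(secret) => {
            cache.secrets.insert(tokenid.to_owned(), secret.to_owned());
        }
        None => {
            cache.secrets.remove(tokenid);
        }
    }
    // `_guard` is dropped here, which would release the file lock.
}
```

This keeps the early bump for fast invalidation, but only trusts the
targeted insert/remove if nobody else bumped the generation in the meantime.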

> >> +
> >> +    // If we cannot read/bump the shared generation, we cannot safely trust the cache.
> >> +    let Some(gen) = new_shared_gen else {
> >> +        invalidate_cache_state(&mut cache);
> >> +        cache.shared_gen = 0;
> >> +        return;
> >> +    };
> >> +
> >> +    // Update to the post-mutation generation.
> >> +    cache.shared_gen = gen;
> >> +
> >> +    // Apply the new mutation.
> >> +    match new_secret {
> >> +        Some(secret) => {
> >> +            cache.secrets.insert(
> >> +                tokenid.clone(),
> >> +                CachedSecret {
> >> +                    secret: secret.to_owned(),
> >> +                },
> >> +            );
> >> +        }
> >> +        None => {
> >> +            cache.secrets.remove(tokenid);
> >> +        }
> >> +    }
> >> +}
> >> +
> >> +/// Get the current shared generation.
> >> +fn token_shadow_shared_gen() -> Option<usize> {
> >> +    crate::ConfigVersionCache::new()
> >> +        .ok()
> >> +        .map(|cvc| cvc.token_shadow_generation())
> >> +}
> >> +
> >> +/// Bump and return the new shared generation.
> >> +fn bump_token_shadow_shared_gen() -> Option<usize> {
> >> +    crate::ConfigVersionCache::new()
> >> +        .ok()
> >> +        .map(|cvc| cvc.increase_token_shadow_generation() + 1)
> >> +}
> >> +
> >> +/// Invalidates the cache state and only keeps the shared generation.
> > 
> > both calls to this actually set the cached generation to some value
> > right after, so maybe this should take a generation directly and set it?
> >
> 
> patch 3/4 doesn’t always update the gen on cache invalidation
> (shadow_mtime_len() error branch in apply_api_mutation) but most other
> call sites do. Agreed this can be refactored, maybe:
> 
> fn invalidate_cache_state(cache: &mut ApiTokenSecretCache) {
>      cache.secrets.clear();
>      // clear other cache fields (mtime/len/last_checked) as needed
> }
> 
> fn invalidate_cache_state_and_set_gen(cache: &mut ApiTokenSecretCache, 
> gen: usize) {
>      invalidate_cache_state(cache);
>      cache.shared_gen = gen;
> }
> 
> We could also do a single helper with Option<usize> but two helpers make 
> the call sites more explicit.
> 
> >> +fn invalidate_cache_state(cache: &mut ApiTokenSecretCache) {
> >> +    cache.secrets.clear();
> >> +}
> >> -- 
> >> 2.47.3

^ permalink raw reply	[relevance 5%]

* Re: [pbs-devel] [PATCH proxmox-datacenter-manager v3 1/2] pdm-config: implement token.shadow generation
  @ 2026-01-16 16:28  6%     ` Samuel Rufinatscha
  2026-01-16 16:48  6%       ` Shannon Sterz
  0 siblings, 1 reply; 117+ results
From: Samuel Rufinatscha @ 2026-01-16 16:28 UTC (permalink / raw)
  To: Proxmox Backup Server development discussion,
	Fabian Grünbichler

On 1/14/26 11:44 AM, Fabian Grünbichler wrote:
> On January 2, 2026 5:07 pm, Samuel Rufinatscha wrote:
>> PDM depends on the shared proxmox/proxmox-access-control crate for
>> token.shadow handling, which expects the product to provide a
>> cross-process invalidation signal so it can safely cache verified API
>> token secrets and invalidate them when token.shadow is changed.
>>
>> This patch
>>
>> * adds a token_shadow_generation to PDM’s shared-memory
>> ConfigVersionCache
>> * implements proxmox_access_control::init::AccessControlConfig
>> for pdm_config::AccessControlConfig, which
>>     - delegates roles/privs/path checks to the existing
>> pdm_api_types::AccessControlConfig implementation
>>     - implements the shadow cache generation trait functions
>> * switches the AccessControlConfig init paths (server + CLI) to use
>> pdm_config::AccessControlConfig instead of
>> pdm_api_types::AccessControlConfig
>>
>> This patch is part of the series which fixes bug #7017 [1].
>>
>> [1] https://bugzilla.proxmox.com/show_bug.cgi?id=7017
>>
>> Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
>> ---
>>   cli/admin/src/main.rs                       |  2 +-
>>   lib/pdm-config/Cargo.toml                   |  1 +
>>   lib/pdm-config/src/access_control_config.rs | 73 +++++++++++++++++++++
>>   lib/pdm-config/src/config_version_cache.rs  | 18 +++++
>>   lib/pdm-config/src/lib.rs                   |  2 +
>>   server/src/acl.rs                           |  3 +-
>>   6 files changed, 96 insertions(+), 3 deletions(-)
>>   create mode 100644 lib/pdm-config/src/access_control_config.rs
>>
>> diff --git a/cli/admin/src/main.rs b/cli/admin/src/main.rs
>> index f698fa2..916c633 100644
>> --- a/cli/admin/src/main.rs
>> +++ b/cli/admin/src/main.rs
>> @@ -19,7 +19,7 @@ fn main() {
>>       proxmox_product_config::init(api_user, priv_user);
>>   
>>       proxmox_access_control::init::init(
>> -        &pdm_api_types::AccessControlConfig,
>> +        &pdm_config::AccessControlConfig,
>>           pdm_buildcfg::configdir!("/access"),
>>       )
>>       .expect("failed to setup access control config");
>> diff --git a/lib/pdm-config/Cargo.toml b/lib/pdm-config/Cargo.toml
>> index d39c2ad..19781d2 100644
>> --- a/lib/pdm-config/Cargo.toml
>> +++ b/lib/pdm-config/Cargo.toml
>> @@ -13,6 +13,7 @@ once_cell.workspace = true
>>   openssl.workspace = true
>>   serde.workspace = true
>>   
>> +proxmox-access-control.workspace = true
>>   proxmox-config-digest = { workspace = true, features = [ "openssl" ] }
>>   proxmox-http = { workspace = true, features = [ "http-helpers" ] }
>>   proxmox-ldap = { workspace = true, features = [ "types" ]}
>> diff --git a/lib/pdm-config/src/access_control_config.rs b/lib/pdm-config/src/access_control_config.rs
>> new file mode 100644
>> index 0000000..6f2e6b3
>> --- /dev/null
>> +++ b/lib/pdm-config/src/access_control_config.rs
>> @@ -0,0 +1,73 @@
>> +// e.g. in src/main.rs or server::context mod, wherever convenient
>> +
>> +use anyhow::Error;
>> +use pdm_api_types::{Authid, Userid};
>> +use proxmox_section_config::SectionConfigData;
>> +use std::collections::HashMap;
>> +
>> +pub struct AccessControlConfig;
>> +
>> +impl proxmox_access_control::init::AccessControlConfig for AccessControlConfig {
> 
> should we then remove the impl from the api type?
>

Thanks for pointing this out, Fabian! Currently, /ui/src/main.rs still
makes use of pdm_api_types::AccessControlConfig. This looks like a WASM
module and is based on ticket-based auth
(proxmox_login::Authentication), as far as I can see. Do you maybe know
if it actually requires the token cache / can work with CVC? If it does
not, then I think we should keep the API impl. I left this unchanged
and only touched the server and CLI call sites.

>> +    fn privileges(&self) -> &HashMap<&str, u64> {
>> +        pdm_api_types::AccessControlConfig.privileges()
>> +    }
>> +
>> +    fn roles(&self) -> &HashMap<&str, (u64, &str)> {
>> +        pdm_api_types::AccessControlConfig.roles()
>> +    }
>> +
>> +    fn is_superuser(&self, auth_id: &Authid) -> bool {
>> +        pdm_api_types::AccessControlConfig.is_superuser(auth_id)
>> +    }
>> +
>> +    fn is_group_member(&self, user_id: &Userid, group: &str) -> bool {
>> +        pdm_api_types::AccessControlConfig.is_group_member(user_id, group)
>> +    }
>> +
>> +    fn role_admin(&self) -> Option<&str> {
>> +        pdm_api_types::AccessControlConfig.role_admin()
>> +    }
>> +
>> +    fn role_no_access(&self) -> Option<&str> {
>> +        pdm_api_types::AccessControlConfig.role_no_access()
>> +    }
>> +
>> +    fn init_user_config(&self, config: &mut SectionConfigData) -> Result<(), Error> {
>> +        pdm_api_types::AccessControlConfig.init_user_config(config)
>> +    }
>> +
>> +    fn acl_audit_privileges(&self) -> u64 {
>> +        pdm_api_types::AccessControlConfig.acl_audit_privileges()
>> +    }
>> +
>> +    fn acl_modify_privileges(&self) -> u64 {
>> +        pdm_api_types::AccessControlConfig.acl_modify_privileges()
>> +    }
>> +
>> +    fn check_acl_path(&self, path: &str) -> Result<(), Error> {
>> +        pdm_api_types::AccessControlConfig.check_acl_path(path)
>> +    }
>> +
>> +    fn allow_partial_permission_match(&self) -> bool {
>> +        pdm_api_types::AccessControlConfig.allow_partial_permission_match()
>> +    }
>> +
>> +    fn cache_generation(&self) -> Option<usize> {
>> +        pdm_api_types::AccessControlConfig.cache_generation()
>> +    }
> 
> shouldn't this be wired up to the ConfigVersionCache?
>

If I understand correctly, cache_generation() and
increment_cache_generation() below do not appear to have been wired up
so far, meaning that the caches were not enabled. To enable them,
a PDM AccessControlConfig implementation would probably be required
(as suggested in this patch) in order to be able to integrate with
ConfigVersionCache.

I think we should check whether we want to enable these two functions
or not, probably best as part of a dedicated scope? I can create a
bug report for this.
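
To illustrate the difference between "not wired" and "wired", here is a
small sketch. The `CacheGeneration` trait is a simplified, hypothetical
stand-in for the two trait methods above (its defaults and the String error
type are assumptions), and the AtomicUsize stands in for a field that would
live in ConfigVersionCache:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Simplified, hypothetical stand-in for the two trait methods discussed here.
trait CacheGeneration {
    /// `None` means "no generation available", i.e. callers cannot cache.
    fn cache_generation(&self) -> Option<usize> {
        None
    }
    fn increment_cache_generation(&self) -> Result<(), String> {
        Ok(())
    }
}

/// Keeps the defaults, so caching stays effectively disabled.
struct Unwired;
impl CacheGeneration for Unwired {}

/// Backed by a counter, so callers can detect config changes.
struct Wired {
    generation: AtomicUsize,
}

impl CacheGeneration for Wired {
    fn cache_generation(&self) -> Option<usize> {
        Some(self.generation.load(Ordering::Acquire))
    }
    fn increment_cache_generation(&self) -> Result<(), String> {
        self.generation.fetch_add(1, Ordering::AcqRel);
        Ok(())
    }
}
```

Whether PDM actually wants the second behaviour is the open question for
the separate report.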

>> +
>> +    fn increment_cache_generation(&self) -> Result<(), Error> {
>> +        pdm_api_types::AccessControlConfig.increment_cache_generation()
> 
> shouldn't this be wired up to the ConfigVersionCache?
> 
>> +    }
>> +
>> +    fn token_shadow_cache_generation(&self) -> Option<usize> {
>> +        crate::ConfigVersionCache::new()
>> +            .ok()
>> +            .map(|c| c.token_shadow_generation())
>> +    }
>> +
>> +    fn increment_token_shadow_cache_generation(&self) -> Result<usize, Error> {
>> +        let c = crate::ConfigVersionCache::new()?;
>> +        Ok(c.increase_token_shadow_generation())
>> +    }
>> +}
>> diff --git a/lib/pdm-config/src/config_version_cache.rs b/lib/pdm-config/src/config_version_cache.rs
>> index 36a6a77..933140c 100644
>> --- a/lib/pdm-config/src/config_version_cache.rs
>> +++ b/lib/pdm-config/src/config_version_cache.rs
>> @@ -27,6 +27,8 @@ struct ConfigVersionCacheDataInner {
>>       traffic_control_generation: AtomicUsize,
>>       // Tracks updates to the remote/hostname/nodename mapping cache.
>>       remote_mapping_cache: AtomicUsize,
>> +    // Token shadow (token.shadow) generation/version.
>> +    token_shadow_generation: AtomicUsize,
> 
> explanation why this is safe for the commit message would be nice ;)
>

Will add :)

>>       // Add further atomics here
>>   }
>>   
>> @@ -172,4 +174,20 @@ impl ConfigVersionCache {
>>               .fetch_add(1, Ordering::Relaxed)
>>               + 1
>>       }
>> +
>> +    /// Returns the token shadow generation number.
>> +    pub fn token_shadow_generation(&self) -> usize {
>> +        self.shmem
>> +            .data()
>> +            .token_shadow_generation
>> +            .load(Ordering::Acquire)
>> +    }
>> +
>> +    /// Increase the token shadow generation number.
>> +    pub fn increase_token_shadow_generation(&self) -> usize {
>> +        self.shmem
>> +            .data()
>> +            .token_shadow_generation
>> +            .fetch_add(1, Ordering::AcqRel)
>> +    }
>>   }
>> diff --git a/lib/pdm-config/src/lib.rs b/lib/pdm-config/src/lib.rs
>> index 4c49054..a15a006 100644
>> --- a/lib/pdm-config/src/lib.rs
>> +++ b/lib/pdm-config/src/lib.rs
>> @@ -9,6 +9,8 @@ pub mod remotes;
>>   pub mod setup;
>>   pub mod views;
>>   
>> +mod access_control_config;
>> +pub use access_control_config::AccessControlConfig;
>>   mod config_version_cache;
>>   pub use config_version_cache::ConfigVersionCache;
>>   
>> diff --git a/server/src/acl.rs b/server/src/acl.rs
>> index f421814..e6e007b 100644
>> --- a/server/src/acl.rs
>> +++ b/server/src/acl.rs
>> @@ -1,6 +1,5 @@
>>   pub(crate) fn init() {
>> -    static ACCESS_CONTROL_CONFIG: pdm_api_types::AccessControlConfig =
>> -        pdm_api_types::AccessControlConfig;
>> +    static ACCESS_CONTROL_CONFIG: pdm_config::AccessControlConfig = pdm_config::AccessControlConfig;
>>   
>>       proxmox_access_control::init::init(&ACCESS_CONTROL_CONFIG, pdm_buildcfg::configdir!("/access"))
>>           .expect("failed to setup access control config");
>> -- 
>> 2.47.3

^ permalink raw reply	[relevance 6%]

* Re: [pbs-devel] [PATCH proxmox-datacenter-manager v3 1/2] pdm-config: implement token.shadow generation
  2026-01-16 16:28  6%     ` Samuel Rufinatscha
@ 2026-01-16 16:48  6%       ` Shannon Sterz
  2026-01-19  7:56  6%         ` Samuel Rufinatscha
  0 siblings, 1 reply; 117+ results
From: Shannon Sterz @ 2026-01-16 16:48 UTC (permalink / raw)
  To: Samuel Rufinatscha; +Cc: Proxmox Backup Server development discussion

On Fri Jan 16, 2026 at 5:28 PM CET, Samuel Rufinatscha wrote:
> On 1/14/26 11:44 AM, Fabian Grünbichler wrote:
>> On January 2, 2026 5:07 pm, Samuel Rufinatscha wrote:
>>> PDM depends on the shared proxmox/proxmox-access-control crate for
>>> token.shadow handling, which expects the product to provide a
>>> cross-process invalidation signal so it can safely cache verified API
>>> token secrets and invalidate them when token.shadow is changed.
>>>
>>> This patch
>>>
>>> * adds a token_shadow_generation to PDM’s shared-memory
>>> ConfigVersionCache
>>> * implements proxmox_access_control::init::AccessControlConfig
>>> for pdm_config::AccessControlConfig, which
>>>     - delegates roles/privs/path checks to the existing
>>> pdm_api_types::AccessControlConfig implementation
>>>     - implements the shadow cache generation trait functions
>>> * switches the AccessControlConfig init paths (server + CLI) to use
>>> pdm_config::AccessControlConfig instead of
>>> pdm_api_types::AccessControlConfig
>>>
>>> This patch is part of the series which fixes bug #7017 [1].
>>>
>>> [1] https://bugzilla.proxmox.com/show_bug.cgi?id=7017
>>>
>>> Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
>>> ---
>>>   cli/admin/src/main.rs                       |  2 +-
>>>   lib/pdm-config/Cargo.toml                   |  1 +
>>>   lib/pdm-config/src/access_control_config.rs | 73 +++++++++++++++++++++
>>>   lib/pdm-config/src/config_version_cache.rs  | 18 +++++
>>>   lib/pdm-config/src/lib.rs                   |  2 +
>>>   server/src/acl.rs                           |  3 +-
>>>   6 files changed, 96 insertions(+), 3 deletions(-)
>>>   create mode 100644 lib/pdm-config/src/access_control_config.rs
>>>
>>> diff --git a/cli/admin/src/main.rs b/cli/admin/src/main.rs
>>> index f698fa2..916c633 100644
>>> --- a/cli/admin/src/main.rs
>>> +++ b/cli/admin/src/main.rs
>>> @@ -19,7 +19,7 @@ fn main() {
>>>       proxmox_product_config::init(api_user, priv_user);
>>>
>>>       proxmox_access_control::init::init(
>>> -        &pdm_api_types::AccessControlConfig,
>>> +        &pdm_config::AccessControlConfig,
>>>           pdm_buildcfg::configdir!("/access"),
>>>       )
>>>       .expect("failed to setup access control config");
>>> diff --git a/lib/pdm-config/Cargo.toml b/lib/pdm-config/Cargo.toml
>>> index d39c2ad..19781d2 100644
>>> --- a/lib/pdm-config/Cargo.toml
>>> +++ b/lib/pdm-config/Cargo.toml
>>> @@ -13,6 +13,7 @@ once_cell.workspace = true
>>>   openssl.workspace = true
>>>   serde.workspace = true
>>>
>>> +proxmox-access-control.workspace = true
>>>   proxmox-config-digest = { workspace = true, features = [ "openssl" ] }
>>>   proxmox-http = { workspace = true, features = [ "http-helpers" ] }
>>>   proxmox-ldap = { workspace = true, features = [ "types" ]}
>>> diff --git a/lib/pdm-config/src/access_control_config.rs b/lib/pdm-config/src/access_control_config.rs
>>> new file mode 100644
>>> index 0000000..6f2e6b3
>>> --- /dev/null
>>> +++ b/lib/pdm-config/src/access_control_config.rs
>>> @@ -0,0 +1,73 @@
>>> +// e.g. in src/main.rs or server::context mod, wherever convenient
>>> +
>>> +use anyhow::Error;
>>> +use pdm_api_types::{Authid, Userid};
>>> +use proxmox_section_config::SectionConfigData;
>>> +use std::collections::HashMap;
>>> +
>>> +pub struct AccessControlConfig;
>>> +
>>> +impl proxmox_access_control::init::AccessControlConfig for AccessControlConfig {
>>
>> should we then remove the impl from the api type?
>>
>
> Thanks for pointing this out Fabian! Currently, /ui/src/main.rs still
> makes use of pdm_api_types::AccessControlConfig. This looks like a WASM
> module, and is based on ticket based auth
> (proxmox_login::Authentication) as far as I can see. Do you maybe know
> if it actually requires the token cache / can work with CVC? If it does
> not, then I think we should keep the API impl. I left this unchanged
> and only touched server and CLI call sites.

i mostly exposed that there to get access to the privileges, roles, and
is_superuser functions. they are needed in the ui to selectively render
ui elements depending on a user's privileges.

this should probably be factored out though and shared differently if we
want to extend this trait with more caching functions.

>>> +    fn privileges(&self) -> &HashMap<&str, u64> {
>>> +        pdm_api_types::AccessControlConfig.privileges()
>>> +    }
>>> +
>>> +    fn roles(&self) -> &HashMap<&str, (u64, &str)> {
>>> +        pdm_api_types::AccessControlConfig.roles()
>>> +    }
>>> +
>>> +    fn is_superuser(&self, auth_id: &Authid) -> bool {
>>> +        pdm_api_types::AccessControlConfig.is_superuser(auth_id)
>>> +    }
>>> +
>>> +    fn is_group_member(&self, user_id: &Userid, group: &str) -> bool {
>>> +        pdm_api_types::AccessControlConfig.is_group_member(user_id, group)
>>> +    }
>>> +
>>> +    fn role_admin(&self) -> Option<&str> {
>>> +        pdm_api_types::AccessControlConfig.role_admin()
>>> +    }
>>> +
>>> +    fn role_no_access(&self) -> Option<&str> {
>>> +        pdm_api_types::AccessControlConfig.role_no_access()
>>> +    }
>>> +
>>> +    fn init_user_config(&self, config: &mut SectionConfigData) -> Result<(), Error> {
>>> +        pdm_api_types::AccessControlConfig.init_user_config(config)
>>> +    }
>>> +
>>> +    fn acl_audit_privileges(&self) -> u64 {
>>> +        pdm_api_types::AccessControlConfig.acl_audit_privileges()
>>> +    }
>>> +
>>> +    fn acl_modify_privileges(&self) -> u64 {
>>> +        pdm_api_types::AccessControlConfig.acl_modify_privileges()
>>> +    }
>>> +
>>> +    fn check_acl_path(&self, path: &str) -> Result<(), Error> {
>>> +        pdm_api_types::AccessControlConfig.check_acl_path(path)
>>> +    }
>>> +
>>> +    fn allow_partial_permission_match(&self) -> bool {
>>> +        pdm_api_types::AccessControlConfig.allow_partial_permission_match()
>>> +    }
>>> +
>>> +    fn cache_generation(&self) -> Option<usize> {
>>> +        pdm_api_types::AccessControlConfig.cache_generation()
>>> +    }
>>
>> shouldn't this be wired up to the ConfigVersionCache?
>>
>
> If I understand correctly, cache_generation() and the
> increment_cache_generation() below do not appear to have been wired
> so far, meaning that caches were not enabled. To enable them,
> a PDM AccessControlConfig implementation would probably be required
> (as suggested in this patch) in order to be able to integrate with
> ConfigVersionCache.
>
> I think these two functions should be checked, whether we want to enable
> them or not, probably best as part of a dedicated scope? I can create a
> bug report for this.
>

sure, i think it's not too much effort, though. if you split out the
caching parts, the ui should be fine without them. it really has no need
for them afair.
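
A rough sketch of what "splitting out the caching parts" could look like, assuming the permission lookups and the cache-generation hooks end up in two separate traits. The trait names (`PermissionInfo`, `CacheGeneration`), the `UiConfig`/`ServerConfig` types, the `root@pam` check and the plain `AtomicUsize` counter are all hypothetical stand-ins and not the actual proxmox-access-control API:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Read-only permission data, usable from the UI (hypothetical trait name).
pub trait PermissionInfo {
    fn is_superuser(&self, auth_id: &str) -> bool;
}

/// Cache-generation hooks, only needed on the server/CLI side
/// (hypothetical trait name).
pub trait CacheGeneration {
    fn cache_generation(&self) -> Option<usize>;
    fn increment_cache_generation(&self) -> usize;
}

/// Minimal implementation for the UI: no caching, no shared memory.
pub struct UiConfig;

impl PermissionInfo for UiConfig {
    fn is_superuser(&self, auth_id: &str) -> bool {
        // Assumption for illustration only.
        auth_id == "root@pam"
    }
}

/// Server-side implementation; the atomic stands in for ConfigVersionCache.
pub struct ServerConfig {
    generation: AtomicUsize,
}

impl ServerConfig {
    pub const fn new() -> Self {
        Self {
            generation: AtomicUsize::new(0),
        }
    }
}

impl PermissionInfo for ServerConfig {
    fn is_superuser(&self, auth_id: &str) -> bool {
        // Delegate to the minimal implementation, as the patch does today.
        UiConfig.is_superuser(auth_id)
    }
}

impl CacheGeneration for ServerConfig {
    fn cache_generation(&self) -> Option<usize> {
        Some(self.generation.load(Ordering::Acquire))
    }

    fn increment_cache_generation(&self) -> usize {
        // fetch_add returns the previous value, so add 1 to report the new one.
        self.generation.fetch_add(1, Ordering::AcqRel) + 1
    }
}
```

With such a split, the WASM side would only need to depend on the first trait, while server and CLI would implement both.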

>>> +
>>> +    fn increment_cache_generation(&self) -> Result<(), Error> {
>>> +        pdm_api_types::AccessControlConfig.increment_cache_generation()
>>
>> shouldn't this be wired up to the ConfigVersionCache?
>>
>>> +    }
>>> +
>>> +    fn token_shadow_cache_generation(&self) -> Option<usize> {
>>> +        crate::ConfigVersionCache::new()
>>> +            .ok()
>>> +            .map(|c| c.token_shadow_generation())
>>> +    }
>>> +
>>> +    fn increment_token_shadow_cache_generation(&self) -> Result<usize, Error> {
>>> +        let c = crate::ConfigVersionCache::new()?;
>>> +        Ok(c.increase_token_shadow_generation())
>>> +    }
>>> +}
>>> diff --git a/lib/pdm-config/src/config_version_cache.rs b/lib/pdm-config/src/config_version_cache.rs
>>> index 36a6a77..933140c 100644
>>> --- a/lib/pdm-config/src/config_version_cache.rs
>>> +++ b/lib/pdm-config/src/config_version_cache.rs
>>> @@ -27,6 +27,8 @@ struct ConfigVersionCacheDataInner {
>>>       traffic_control_generation: AtomicUsize,
>>>       // Tracks updates to the remote/hostname/nodename mapping cache.
>>>       remote_mapping_cache: AtomicUsize,
>>> +    // Token shadow (token.shadow) generation/version.
>>> +    token_shadow_generation: AtomicUsize,
>>
>> explanation why this is safe for the commit message would be nice ;)
>>
>
> Will add :)
>
>>>       // Add further atomics here
>>>   }
>>>
>>> @@ -172,4 +174,20 @@ impl ConfigVersionCache {
>>>               .fetch_add(1, Ordering::Relaxed)
>>>               + 1
>>>       }
>>> +
>>> +    /// Returns the token shadow generation number.
>>> +    pub fn token_shadow_generation(&self) -> usize {
>>> +        self.shmem
>>> +            .data()
>>> +            .token_shadow_generation
>>> +            .load(Ordering::Acquire)
>>> +    }
>>> +
>>> +    /// Increase the token shadow generation number.
>>> +    pub fn increase_token_shadow_generation(&self) -> usize {
>>> +        self.shmem
>>> +            .data()
>>> +            .token_shadow_generation
>>> +            .fetch_add(1, Ordering::AcqRel)
>>> +    }
>>>   }
>>> diff --git a/lib/pdm-config/src/lib.rs b/lib/pdm-config/src/lib.rs
>>> index 4c49054..a15a006 100644
>>> --- a/lib/pdm-config/src/lib.rs
>>> +++ b/lib/pdm-config/src/lib.rs
>>> @@ -9,6 +9,8 @@ pub mod remotes;
>>>   pub mod setup;
>>>   pub mod views;
>>>
>>> +mod access_control_config;
>>> +pub use access_control_config::AccessControlConfig;
>>>   mod config_version_cache;
>>>   pub use config_version_cache::ConfigVersionCache;
>>>
>>> diff --git a/server/src/acl.rs b/server/src/acl.rs
>>> index f421814..e6e007b 100644
>>> --- a/server/src/acl.rs
>>> +++ b/server/src/acl.rs
>>> @@ -1,6 +1,5 @@
>>>   pub(crate) fn init() {
>>> -    static ACCESS_CONTROL_CONFIG: pdm_api_types::AccessControlConfig =
>>> -        pdm_api_types::AccessControlConfig;
>>> +    static ACCESS_CONTROL_CONFIG: pdm_config::AccessControlConfig = pdm_config::AccessControlConfig;
>>>
>>>       proxmox_access_control::init::init(&ACCESS_CONTROL_CONFIG, pdm_buildcfg::configdir!("/access"))
>>>           .expect("failed to setup access control config");
>>> --
>>> 2.47.3

^ permalink raw reply	[relevance 6%]

* Re: [pbs-devel] [PATCH proxmox-backup v3 2/4] pbs-config: cache verified API token secrets
  2026-01-16 16:00  5%       ` Fabian Grünbichler
@ 2026-01-16 16:56  6%         ` Samuel Rufinatscha
  0 siblings, 0 replies; 117+ results
From: Samuel Rufinatscha @ 2026-01-16 16:56 UTC (permalink / raw)
  To: Fabian Grünbichler,
	Proxmox Backup Server development discussion

On 1/16/26 4:59 PM, Fabian Grünbichler wrote:
> Quoting Samuel Rufinatscha (2026-01-16 16:13:17)
>> On 1/14/26 11:44 AM, Fabian Grünbichler wrote:
>>> On January 2, 2026 5:07 pm, Samuel Rufinatscha wrote:
>>>> Currently, every token-based API request reads the token.shadow file and
>>>> runs the expensive password hash verification for the given token
>>>> secret. This shows up as a hotspot in /status profiling (see
>>>> bug #7017 [1]).
>>>>
>>>> This patch introduces an in-memory cache of successfully verified token
>>>> secrets. Subsequent requests for the same token+secret combination only
>>>> perform a comparison using openssl::memcmp::eq and avoid re-running the
>>>> password hash. The cache is updated when a token secret is set and
>>>> cleared when a token is deleted. Note, this does NOT include manual
>>>> config changes, which will be covered in a subsequent patch.
>>>>
>>>> This patch is part of the series which fixes bug #7017 [1].
>>>>
>>>> [1] https://bugzilla.proxmox.com/show_bug.cgi?id=7017
>>>>
>>>> Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
>>>> ---
> 
> [..]
> 
>>>> +
>>>> +// Tries to match the given token secret against the cached secret.
>>>> +// Checks the generation before and after the constant-time compare to avoid a
>>>> +// TOCTOU window. If another process rotates/deletes a token while we're validating
>>>> +// the cached secret, the generation will change, and we
>>>> +// must not trust the cache for this request.
>>>> +fn cache_try_secret_matches(tokenid: &Authid, secret: &str) -> bool {
>>>> +    let Some(cache) = TOKEN_SECRET_CACHE.try_read() else {
>>>> +        return false;
>>>> +    };
>>>> +    let Some(entry) = cache.secrets.get(tokenid) else {
>>>> +        return false;
>>>> +    };
>>>> +
>>>> +    let cache_gen = cache.shared_gen;
>>>> +
>>>> +    let Some(gen1) = token_shadow_shared_gen() else {
>>>> +        return false;
>>>> +    };
>>>> +    if gen1 != cache_gen {
>>>> +        return false;
>>>> +    }
>>>> +
>>>> +    let eq = openssl::memcmp::eq(entry.secret.as_bytes(), secret.as_bytes());
>>>
>>> should we invalidate the cache here for this particular authid in case
>>> of a mismatch, to avoid making brute forcing too easy/cheap?
>>>
>>
>> We are not doing a cheap reject; on mismatch we still fall through to
>> verify_crypt_pw(). Evicting on mismatch could, however, enable cache
>> thrashing where wrong secrets for a known tokenid would evict cached
>> entries. So I think we should not invalidate here on mismatch.
> 
> forgot this part here, sorry. you are right, this *should* be okay. I do think
> the second generation check there serves no purpose though. the token config
> can change at any point after we've validated the secret using the old state,
> there is nothing we can do about that, and it's totally fine to accept a token
> that is modified at exactly the same moment, even if that same token wouldn't
> be valid 2 seconds later..
> 
> there has to be a point where we have to say "this token is valid", and at the
> point of memcmp here we have already:
> - verified we don't need to reload the file
> - verified we didn't have any API changes to the token config
> - verified that the secret matches what we have cached
> 
> redoing the first two checks after that point doesn't protect us against
> changes afterwards either, so we might as well not do that extra work that
> doesn't give us any extra safety guarantees anyway..

Agreed, the second generation check only narrows down a very small 
window around memcmp (tried to avoid the TOCTOU at this point), but as 
you said, it doesn’t provide a strong additional guarantee and is 
unnecessary. Will remove!
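
For illustration, a minimal sketch of the lookup with only the single generation check before the compare. It uses the standard library only: `SharedGen` is a hypothetical stand-in for the ConfigVersionCache counter, `ct_eq` stands in for the constant-time compare, `std::sync::RwLock` replaces the parking_lot lock, and token IDs are plain strings. None of this is the actual patch code:

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::RwLock;

/// Stand-in for the shared-memory generation counter (hypothetical).
pub struct SharedGen(AtomicUsize);

impl SharedGen {
    pub const fn new() -> Self {
        Self(AtomicUsize::new(0))
    }

    pub fn current(&self) -> usize {
        self.0.load(Ordering::Acquire)
    }

    /// Bumps the counter and returns the new value.
    pub fn bump(&self) -> usize {
        self.0.fetch_add(1, Ordering::AcqRel) + 1
    }
}

pub struct SecretCache {
    pub secrets: HashMap<String, String>,
    pub shared_gen: usize,
}

/// Compares without early exit on the first differing byte.
/// Stand-in for the constant-time compare; returns false on length mismatch.
fn ct_eq(a: &[u8], b: &[u8]) -> bool {
    if a.len() != b.len() {
        return false;
    }
    a.iter().zip(b).fold(0u8, |acc, (x, y)| acc | (x ^ y)) == 0
}

/// Returns true only if the cache is current and the secret matches.
/// A `false` result means "fall back to the slow verification path".
pub fn cache_try_secret_matches(
    cache: &RwLock<SecretCache>,
    shared: &SharedGen,
    tokenid: &str,
    secret: &str,
) -> bool {
    // Best effort: never block a request on the cache lock.
    let Ok(guard) = cache.try_read() else {
        return false;
    };
    let Some(cached) = guard.secrets.get(tokenid) else {
        return false;
    };
    // Single generation check; no re-check after the compare.
    if shared.current() != guard.shared_gen {
        return false;
    }
    ct_eq(cached.as_bytes(), secret.as_bytes())
}
```

A mismatch does not evict the entry here, in line with the cache-thrashing concern discussed above.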

> 
>>
>>>> +    let Some(gen2) = token_shadow_shared_gen() else {
>>>> +        return false;
>>>> +    };
>>>> +
>>>> +    eq && gen2 == cache_gen
>>>> +}
>>>> +
>>>> +fn apply_api_mutation(tokenid: &Authid, new_secret: Option<&str>) {
>>>> +    // Signal cache invalidation to other processes (best-effort).
>>>> +    let new_shared_gen = bump_token_shadow_shared_gen();
>>>> +
>>>> +    let mut cache = TOKEN_SECRET_CACHE.write();
> 
> because I mentioned switching those two around - this actually requires more
> thought I think..
> 
> right now, calling apply_api_mutation happens under a lock, but there are other
> calls that bump the generation, so this is actually racy here. OTOH, bumping
> the generation before locking the cache means faster cache invalidation..

Yes, I favored bumping the generation before taking the write lock for
faster cache invalidation and therefore better security.
> 
> maybe we should re-verify the generation after obtaining the lock? and maybe
> make apply_api_mutation consume the shadow config file lock, to ensure it's
> only called while that lock is being held?

Agree, I think we should re-verify the generation after the write lock.
Also agree, I think we should pass the file lock down. Good idea! :)
This should make it more robust.
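
A possible shape for this, as a sketch only: the function takes a reference to a lock guard so it can only be called while the file lock is held, and it re-reads the generation after obtaining the write lock. `ShadowLockGuard`, the plain `AtomicUsize` and `std::sync::RwLock` are stand-ins, and clearing the cache on a generation mismatch is my assumption about the desired fallback:

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::RwLock;

/// Hypothetical marker type proving that the token.shadow file lock is held.
pub struct ShadowLockGuard(());

impl ShadowLockGuard {
    /// A real implementation would acquire the file lock here.
    pub fn acquire() -> Self {
        Self(())
    }
}

pub struct SecretCache {
    pub secrets: HashMap<String, String>,
    pub shared_gen: usize,
}

pub fn apply_api_mutation(
    _lock: &ShadowLockGuard,
    cache: &RwLock<SecretCache>,
    shared: &AtomicUsize,
    tokenid: &str,
    new_secret: Option<&str>,
) {
    // Bump first so other processes stop trusting their caches early.
    // fetch_add returns the previous value, hence the + 1.
    let bumped = shared.fetch_add(1, Ordering::AcqRel) + 1;

    // Recover the guard even if another thread panicked while holding it.
    let mut guard = cache.write().unwrap_or_else(|e| e.into_inner());

    // Re-verify after obtaining the lock: if someone else bumped in the
    // meantime, do not trust anything we hold (assumed fallback).
    let current = shared.load(Ordering::Acquire);
    if current != bumped {
        guard.secrets.clear();
        guard.shared_gen = current;
        return;
    }

    guard.shared_gen = bumped;
    match new_secret {
        Some(secret) => {
            guard.secrets.insert(tokenid.to_owned(), secret.to_owned());
        }
        None => {
            guard.secrets.remove(tokenid);
        }
    }
}
```

Passing the guard by reference costs nothing at runtime; it only makes "called under the file lock" visible in the signature.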

> 
>>>> +
>>>> +    // If we cannot read/bump the shared generation, we cannot safely trust the cache.
>>>> +    let Some(gen) = new_shared_gen else {
>>>> +        invalidate_cache_state(&mut cache);
>>>> +        cache.shared_gen = 0;
>>>> +        return;
>>>> +    };
>>>> +
>>>> +    // Update to the post-mutation generation.
>>>> +    cache.shared_gen = gen;
>>>> +
>>>> +    // Apply the new mutation.
>>>> +    match new_secret {
>>>> +        Some(secret) => {
>>>> +            cache.secrets.insert(
>>>> +                tokenid.clone(),
>>>> +                CachedSecret {
>>>> +                    secret: secret.to_owned(),
>>>> +                },
>>>> +            );
>>>> +        }
>>>> +        None => {
>>>> +            cache.secrets.remove(tokenid);
>>>> +        }
>>>> +    }
>>>> +}
>>>> +
>>>> +/// Get the current shared generation.
>>>> +fn token_shadow_shared_gen() -> Option<usize> {
>>>> +    crate::ConfigVersionCache::new()
>>>> +        .ok()
>>>> +        .map(|cvc| cvc.token_shadow_generation())
>>>> +}
>>>> +
>>>> +/// Bump and return the new shared generation.
>>>> +fn bump_token_shadow_shared_gen() -> Option<usize> {
>>>> +    crate::ConfigVersionCache::new()
>>>> +        .ok()
>>>> +        .map(|cvc| cvc.increase_token_shadow_generation() + 1)
>>>> +}
>>>> +
>>>> +/// Invalidates the cache state and only keeps the shared generation.
>>>
>>> both calls to this actually set the cached generation to some value
>>> right after, so maybe this should take a generation directly and set it?
>>>
>>
>> patch 3/4 doesn’t always update the gen on cache invalidation
>> (shadow_mtime_len() error branch in apply_api_mutation) but most other
>> call sites do. Agreed this can be refactored, maybe:
>>
>> fn invalidate_cache_state(cache: &mut ApiTokenSecretCache) {
>>       cache.secrets.clear();
>>       // clear other cache fields (mtime/len/last_checked) as needed
>> }
>>
>> fn invalidate_cache_state_and_set_gen(cache: &mut ApiTokenSecretCache,
>> gen: usize) {
>>       invalidate_cache_state(cache);
>>       cache.shared_gen = gen;
>> }
>>
>> We could also do a single helper with Option<usize> but two helpers make
>> the call sites more explicit.
>>
>>>> +fn invalidate_cache_state(cache: &mut ApiTokenSecretCache) {
>>>> +    cache.secrets.clear();
>>>> +}
>>>> -- 
>>>> 2.47.3

^ permalink raw reply	[relevance 6%]

* Re: [pbs-devel] [PATCH proxmox-datacenter-manager v3 1/2] pdm-config: implement token.shadow generation
  2026-01-16 16:48  6%       ` Shannon Sterz
@ 2026-01-19  7:56  6%         ` Samuel Rufinatscha
  0 siblings, 0 replies; 117+ results
From: Samuel Rufinatscha @ 2026-01-19  7:56 UTC (permalink / raw)
  To: Shannon Sterz; +Cc: Proxmox Backup Server development discussion

comments inline

On 1/16/26 5:47 PM, Shannon Sterz wrote:
> On Fri Jan 16, 2026 at 5:28 PM CET, Samuel Rufinatscha wrote:
>> On 1/14/26 11:44 AM, Fabian Grünbichler wrote:
>>> On January 2, 2026 5:07 pm, Samuel Rufinatscha wrote:
>>>> PDM depends on the shared proxmox/proxmox-access-control crate for
>>>> token.shadow handling, which expects the product to provide a
>>>> cross-process invalidation signal so it can safely cache verified API
>>>> token secrets and invalidate them when token.shadow is changed.
>>>>
>>>> This patch
>>>>
>>>> * adds a token_shadow_generation to PDM’s shared-memory
>>>> ConfigVersionCache
>>>> * implements proxmox_access_control::init::AccessControlConfig
>>>> for pdm_config::AccessControlConfig, which
>>>>      - delegates roles/privs/path checks to the existing
>>>> pdm_api_types::AccessControlConfig implementation
>>>>      - implements the shadow cache generation trait functions
>>>> * switches the AccessControlConfig init paths (server + CLI) to use
>>>> pdm_config::AccessControlConfig instead of
>>>> pdm_api_types::AccessControlConfig
>>>>
>>>> This patch is part of the series which fixes bug #7017 [1].
>>>>
>>>> [1] https://bugzilla.proxmox.com/show_bug.cgi?id=7017
>>>>
>>>> Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
>>>> ---
>>>>    cli/admin/src/main.rs                       |  2 +-
>>>>    lib/pdm-config/Cargo.toml                   |  1 +
>>>>    lib/pdm-config/src/access_control_config.rs | 73 +++++++++++++++++++++
>>>>    lib/pdm-config/src/config_version_cache.rs  | 18 +++++
>>>>    lib/pdm-config/src/lib.rs                   |  2 +
>>>>    server/src/acl.rs                           |  3 +-
>>>>    6 files changed, 96 insertions(+), 3 deletions(-)
>>>>    create mode 100644 lib/pdm-config/src/access_control_config.rs
>>>>
>>>> diff --git a/cli/admin/src/main.rs b/cli/admin/src/main.rs
>>>> index f698fa2..916c633 100644
>>>> --- a/cli/admin/src/main.rs
>>>> +++ b/cli/admin/src/main.rs
>>>> @@ -19,7 +19,7 @@ fn main() {
>>>>        proxmox_product_config::init(api_user, priv_user);
>>>>
>>>>        proxmox_access_control::init::init(
>>>> -        &pdm_api_types::AccessControlConfig,
>>>> +        &pdm_config::AccessControlConfig,
>>>>            pdm_buildcfg::configdir!("/access"),
>>>>        )
>>>>        .expect("failed to setup access control config");
>>>> diff --git a/lib/pdm-config/Cargo.toml b/lib/pdm-config/Cargo.toml
>>>> index d39c2ad..19781d2 100644
>>>> --- a/lib/pdm-config/Cargo.toml
>>>> +++ b/lib/pdm-config/Cargo.toml
>>>> @@ -13,6 +13,7 @@ once_cell.workspace = true
>>>>    openssl.workspace = true
>>>>    serde.workspace = true
>>>>
>>>> +proxmox-access-control.workspace = true
>>>>    proxmox-config-digest = { workspace = true, features = [ "openssl" ] }
>>>>    proxmox-http = { workspace = true, features = [ "http-helpers" ] }
>>>>    proxmox-ldap = { workspace = true, features = [ "types" ]}
>>>> diff --git a/lib/pdm-config/src/access_control_config.rs b/lib/pdm-config/src/access_control_config.rs
>>>> new file mode 100644
>>>> index 0000000..6f2e6b3
>>>> --- /dev/null
>>>> +++ b/lib/pdm-config/src/access_control_config.rs
>>>> @@ -0,0 +1,73 @@
>>>> +// e.g. in src/main.rs or server::context mod, wherever convenient
>>>> +
>>>> +use anyhow::Error;
>>>> +use pdm_api_types::{Authid, Userid};
>>>> +use proxmox_section_config::SectionConfigData;
>>>> +use std::collections::HashMap;
>>>> +
>>>> +pub struct AccessControlConfig;
>>>> +
>>>> +impl proxmox_access_control::init::AccessControlConfig for AccessControlConfig {
>>>
>>> should we then remove the impl from the api type?
>>>
>>
>> Thanks for pointing this out Fabian! Currently, /ui/src/main.rs still
>> makes use of pdm_api_types::AccessControlConfig. This looks like a WASM
>> module, and is based on ticket based auth
>> (proxmox_login::Authentication) as far as I can see. Do you maybe know
>> if it actually requires the token cache / can work with CVC? If it does
>> not, then I think we should keep the API impl. I left this unchanged
>> and only touched server and CLI call sites.
> 
> i mostly exposed that there to get access to the privileges, roles, and
> is_superuser functions. they are needed in the ui to selectively render
> ui elements depending on a user's privileges.
> 
> this should probably be factored out though and shared differently if we
> want to extend this trait with more caching functions.
>

Good point.

>>>> +    fn privileges(&self) -> &HashMap<&str, u64> {
>>>> +        pdm_api_types::AccessControlConfig.privileges()
>>>> +    }
>>>> +
>>>> +    fn roles(&self) -> &HashMap<&str, (u64, &str)> {
>>>> +        pdm_api_types::AccessControlConfig.roles()
>>>> +    }
>>>> +
>>>> +    fn is_superuser(&self, auth_id: &Authid) -> bool {
>>>> +        pdm_api_types::AccessControlConfig.is_superuser(auth_id)
>>>> +    }
>>>> +
>>>> +    fn is_group_member(&self, user_id: &Userid, group: &str) -> bool {
>>>> +        pdm_api_types::AccessControlConfig.is_group_member(user_id, group)
>>>> +    }
>>>> +
>>>> +    fn role_admin(&self) -> Option<&str> {
>>>> +        pdm_api_types::AccessControlConfig.role_admin()
>>>> +    }
>>>> +
>>>> +    fn role_no_access(&self) -> Option<&str> {
>>>> +        pdm_api_types::AccessControlConfig.role_no_access()
>>>> +    }
>>>> +
>>>> +    fn init_user_config(&self, config: &mut SectionConfigData) -> Result<(), Error> {
>>>> +        pdm_api_types::AccessControlConfig.init_user_config(config)
>>>> +    }
>>>> +
>>>> +    fn acl_audit_privileges(&self) -> u64 {
>>>> +        pdm_api_types::AccessControlConfig.acl_audit_privileges()
>>>> +    }
>>>> +
>>>> +    fn acl_modify_privileges(&self) -> u64 {
>>>> +        pdm_api_types::AccessControlConfig.acl_modify_privileges()
>>>> +    }
>>>> +
>>>> +    fn check_acl_path(&self, path: &str) -> Result<(), Error> {
>>>> +        pdm_api_types::AccessControlConfig.check_acl_path(path)
>>>> +    }
>>>> +
>>>> +    fn allow_partial_permission_match(&self) -> bool {
>>>> +        pdm_api_types::AccessControlConfig.allow_partial_permission_match()
>>>> +    }
>>>> +
>>>> +    fn cache_generation(&self) -> Option<usize> {
>>>> +        pdm_api_types::AccessControlConfig.cache_generation()
>>>> +    }
>>>
>>> shouldn't this be wired up to the ConfigVersionCache?
>>>
>>
>> If I understand correctly, cache_generation() and the
>> increment_cache_generation() below do not appear to have been wired
>> so far, meaning that caches were not enabled. To enable them,
>> a PDM AccessControlConfig implementation would probably be required
>> (as suggested in this patch) in order to be able to integrate with
>> ConfigVersionCache.
>>
>> I think these two functions should be checked, whether we want to enable
>> them or not, probably best as part of a dedicated scope? I can create a
>> bug report for this.
>>
> 
> sure, i think it's not too much effort, though. if you split out the
> caching parts, the ui should be fine without them. it really has no need
> for them afair.

If the UI doesn't make use of it, maybe it would be best to simply keep
two different impls? One minimal impl for the UI, also since not all
parts might be WASM compatible, and one impl as proposed to wire up CVC
(and maybe other things in the future).

I will also wire up CVC for the other two existing caching functions as
part of this series.

> 
>>>> +
>>>> +    fn increment_cache_generation(&self) -> Result<(), Error> {
>>>> +        pdm_api_types::AccessControlConfig.increment_cache_generation()
>>>
>>> shouldn't this be wired up to the ConfigVersionCache?
>>>
>>>> +    }
>>>> +
>>>> +    fn token_shadow_cache_generation(&self) -> Option<usize> {
>>>> +        crate::ConfigVersionCache::new()
>>>> +            .ok()
>>>> +            .map(|c| c.token_shadow_generation())
>>>> +    }
>>>> +
>>>> +    fn increment_token_shadow_cache_generation(&self) -> Result<usize, Error> {
>>>> +        let c = crate::ConfigVersionCache::new()?;
>>>> +        Ok(c.increase_token_shadow_generation())
>>>> +    }
>>>> +}
>>>> diff --git a/lib/pdm-config/src/config_version_cache.rs b/lib/pdm-config/src/config_version_cache.rs
>>>> index 36a6a77..933140c 100644
>>>> --- a/lib/pdm-config/src/config_version_cache.rs
>>>> +++ b/lib/pdm-config/src/config_version_cache.rs
>>>> @@ -27,6 +27,8 @@ struct ConfigVersionCacheDataInner {
>>>>        traffic_control_generation: AtomicUsize,
>>>>        // Tracks updates to the remote/hostname/nodename mapping cache.
>>>>        remote_mapping_cache: AtomicUsize,
>>>> +    // Token shadow (token.shadow) generation/version.
>>>> +    token_shadow_generation: AtomicUsize,
>>>
>>> explanation why this is safe for the commit message would be nice ;)
>>>
>>
>> Will add :)
>>
>>>>        // Add further atomics here
>>>>    }
>>>>
>>>> @@ -172,4 +174,20 @@ impl ConfigVersionCache {
>>>>                .fetch_add(1, Ordering::Relaxed)
>>>>                + 1
>>>>        }
>>>> +
>>>> +    /// Returns the token shadow generation number.
>>>> +    pub fn token_shadow_generation(&self) -> usize {
>>>> +        self.shmem
>>>> +            .data()
>>>> +            .token_shadow_generation
>>>> +            .load(Ordering::Acquire)
>>>> +    }
>>>> +
>>>> +    /// Increase the token shadow generation number.
>>>> +    pub fn increase_token_shadow_generation(&self) -> usize {
>>>> +        self.shmem
>>>> +            .data()
>>>> +            .token_shadow_generation
>>>> +            .fetch_add(1, Ordering::AcqRel)
>>>> +    }
>>>>    }
>>>> diff --git a/lib/pdm-config/src/lib.rs b/lib/pdm-config/src/lib.rs
>>>> index 4c49054..a15a006 100644
>>>> --- a/lib/pdm-config/src/lib.rs
>>>> +++ b/lib/pdm-config/src/lib.rs
>>>> @@ -9,6 +9,8 @@ pub mod remotes;
>>>>    pub mod setup;
>>>>    pub mod views;
>>>>
>>>> +mod access_control_config;
>>>> +pub use access_control_config::AccessControlConfig;
>>>>    mod config_version_cache;
>>>>    pub use config_version_cache::ConfigVersionCache;
>>>>
>>>> diff --git a/server/src/acl.rs b/server/src/acl.rs
>>>> index f421814..e6e007b 100644
>>>> --- a/server/src/acl.rs
>>>> +++ b/server/src/acl.rs
>>>> @@ -1,6 +1,5 @@
>>>>    pub(crate) fn init() {
>>>> -    static ACCESS_CONTROL_CONFIG: pdm_api_types::AccessControlConfig =
>>>> -        pdm_api_types::AccessControlConfig;
>>>> +    static ACCESS_CONTROL_CONFIG: pdm_config::AccessControlConfig = pdm_config::AccessControlConfig;
>>>>
>>>>        proxmox_access_control::init::init(&ACCESS_CONTROL_CONFIG, pdm_buildcfg::configdir!("/access"))
>>>>            .expect("failed to setup access control config");
>>>> --
>>>> 2.47.3

^ permalink raw reply	[relevance 6%]

* Re: [pbs-devel] [PATCH proxmox-backup v3 3/4] pbs-config: invalidate token-secret cache on token.shadow changes
  @ 2026-01-20  9:21  6%     ` Samuel Rufinatscha
  0 siblings, 0 replies; 117+ results
From: Samuel Rufinatscha @ 2026-01-20  9:21 UTC (permalink / raw)
  To: Proxmox Backup Server development discussion,
	Fabian Grünbichler

On 1/14/26 11:44 AM, Fabian Grünbichler wrote:
> On January 2, 2026 5:07 pm, Samuel Rufinatscha wrote:
>> Previously the in-memory token-secret cache was only updated via
>> set_secret() and delete_secret(), so manual edits to token.shadow were
>> not reflected.
>>
>> This patch adds file change detection to the cache. It tracks the mtime
>> and length of token.shadow and clears the in-memory token secret cache
>> whenever these values change.
>>
>> Note, this patch fetches file stats on every request. A TTL-based
>> optimization will be covered in a subsequent patch of the series.
>>
>> This patch is part of the series which fixes bug #7017 [1].
>>
>> [1] https://bugzilla.proxmox.com/show_bug.cgi?id=7017
>>
>> Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
>> ---
>> Changes from v1 to v2:
>>
>> * Add file metadata tracking (file_mtime, file_len) and
>>    FILE_GENERATION.
>> * Store file_gen in CachedSecret and verify it against the current
>>    FILE_GENERATION to ensure cached entries belong to the current file
>>    state.
>> * Add shadow_mtime_len() helper and convert refresh to best-effort
>>    (try_write, returns bool).
>> * Pass a pre-write metadata snapshot into apply_api_mutation and
>>    clear/bump generation if the cache metadata indicates missed external
>>    edits.
>>
>> Changes from v2 to v3:
>>
>> * Cache now tracks last_checked (epoch seconds).
>> * Simplified refresh_cache_if_file_changed, removed
>> FILE_GENERATION logic
>> * On first load, initializes file metadata and keeps empty cache.
>>
>>   pbs-config/src/token_shadow.rs | 122 +++++++++++++++++++++++++++++++--
>>   1 file changed, 118 insertions(+), 4 deletions(-)
>>
>> diff --git a/pbs-config/src/token_shadow.rs b/pbs-config/src/token_shadow.rs
>> index fa84aee5..02fb191b 100644
>> --- a/pbs-config/src/token_shadow.rs
>> +++ b/pbs-config/src/token_shadow.rs
>> @@ -1,5 +1,8 @@
>>   use std::collections::HashMap;
>> +use std::fs;
>> +use std::io::ErrorKind;
>>   use std::sync::LazyLock;
>> +use std::time::SystemTime;
>>   
>>   use anyhow::{bail, format_err, Error};
>>   use parking_lot::RwLock;
>> @@ -7,6 +10,7 @@ use serde::{Deserialize, Serialize};
>>   use serde_json::{from_value, Value};
>>   
>>   use proxmox_sys::fs::CreateOptions;
>> +use proxmox_time::epoch_i64;
>>   
>>   use pbs_api_types::Authid;
>>   //use crate::auth;
>> @@ -24,6 +28,9 @@ static TOKEN_SECRET_CACHE: LazyLock<RwLock<ApiTokenSecretCache>> = LazyLock::new
>>       RwLock::new(ApiTokenSecretCache {
>>           secrets: HashMap::new(),
>>           shared_gen: 0,
>> +        file_mtime: None,
>> +        file_len: None,
>> +        last_checked: None,
>>       })
>>   });
>>   
>> @@ -62,6 +69,63 @@ fn write_file(data: HashMap<Authid, String>) -> Result<(), Error> {
>>       proxmox_sys::fs::replace_file(CONF_FILE, &json, options, true)
>>   }
>>   
>> +/// Refreshes the in-memory cache if the on-disk token.shadow file changed.
>> +/// Returns true if the cache is valid to use, false if not.
>> +fn refresh_cache_if_file_changed() -> bool {
>> +    let now = epoch_i64();
>> +
>> +    // Best-effort refresh under write lock.
>> +    let Some(mut cache) = TOKEN_SECRET_CACHE.try_write() else {
>> +        return false;
>> +    };
>> +
>> +    let Some(shared_gen_now) = token_shadow_shared_gen() else {
>> +        return false;
>> +    };
>> +
>> +    // If another process bumped the generation, we don't know what changed -> clear cache
>> +    if cache.shared_gen != shared_gen_now {
>> +        invalidate_cache_state(&mut cache);
>> +        cache.shared_gen = shared_gen_now;
>> +    }
>> +
>> +    // Stat the file to detect manual edits.
>> +    let Ok((new_mtime, new_len)) = shadow_mtime_len() else {
>> +        return false;
>> +    };
>> +
>> +    // Initialize file stats if we have no prior state.
>> +    if cache.last_checked.is_none() {
>> +        cache.secrets.clear(); // ensure cache is empty on first load
>> +        cache.file_mtime = new_mtime;
>> +        cache.file_len = new_len;
>> +        cache.last_checked = Some(now);
>> +        return true;
> 
> this code here
> 
>> +    }
>> +
>> +    // No change detected.
>> +    if cache.file_mtime == new_mtime && cache.file_len == new_len {
>> +        cache.last_checked = Some(now);
>> +        return true;
>> +    }
>> +
>> +    // Manual edit detected -> invalidate cache and update stat.
>> +    cache.secrets.clear();
>> +    cache.file_mtime = new_mtime;
>> +    cache.file_len = new_len;
>> +    cache.last_checked = Some(now);
> 
> and this code here are identical. if this is the first invocation, then
> the change detection check above cannot be true (the cached mtime and
> len will be None).
> 
> so we can drop the first if above, and replace the last line in this
> hunk with
> 
> let prev_last_checked = cache.last_checked.replace(now);
>
> and then skip bumping the generation if this is_none()

Great idea about the .replace()! Integrating it with the new
ShadowFileInfo :)

> 
> OTOH, if we just cleared the cache here, does it make sense to return
> true? the cache is empty, so likely querying it *now* makes no sense?

Agree, we should just return false here
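
A minimal sketch of how both points could be combined. The field layout
of ShadowFileInfo, the Refresh enum and the record_stat helper are
assumptions made for illustration and are not taken from the patch. The
caller would clear the cached secrets and return false for FirstLoad and
Changed, and bump the shared generation only for Changed:

```rust
use std::time::SystemTime;

/// Hypothetical grouping of the three fields that are always set together.
#[derive(Debug, Clone, PartialEq)]
struct ShadowFileInfo {
    mtime: Option<SystemTime>,
    len: Option<u64>,
    last_checked: i64,
}

/// Hypothetical outcome of comparing a fresh stat with the cached one.
#[derive(Debug, PartialEq)]
enum Refresh {
    /// No prior state: nothing to compare against, no generation bump needed.
    FirstLoad,
    /// mtime and length match the cached values.
    Unchanged,
    /// The file changed behind our back (manual edit).
    Changed,
}

/// Stores the new stat result and reports what happened.
fn record_stat(
    slot: &mut Option<ShadowFileInfo>,
    mtime: Option<SystemTime>,
    len: Option<u64>,
    now: i64,
) -> Refresh {
    let info = ShadowFileInfo { mtime, len, last_checked: now };
    // Option::replace takes the new value directly (not wrapped in Some)
    // and hands back the previous content of the slot.
    match slot.replace(info) {
        None => Refresh::FirstLoad,
        Some(prev) if prev.mtime == mtime && prev.len == len => Refresh::Unchanged,
        Some(_) => Refresh::Changed,
    }
}
```

This keeps a single place where the metadata is written, so the first
load and the "file changed" path no longer duplicate code.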

> 
>> +
>> +    // Best-effort propagation to other processes + update local view.
>> +    if let Some(shared_gen_new) = bump_token_shadow_shared_gen() {
>> +        cache.shared_gen = shared_gen_new;
>> +    } else {
>> +        // Do not fail: local cache is already safe as we cleared it above.
>> +        // Keep local shared_gen as-is to avoid repeated failed attempts.
>> +    }
>> +
>> +    true
>> +}
>> +
>>   /// Verifies that an entry for given tokenid / API token secret exists
>>   pub fn verify_secret(tokenid: &Authid, secret: &str) -> Result<(), Error> {
>>       if !tokenid.is_token() {
>> @@ -69,7 +133,7 @@ pub fn verify_secret(tokenid: &Authid, secret: &str) -> Result<(), Error> {
>>       }
>>   
>>       // Fast path
>> -    if cache_try_secret_matches(tokenid, secret) {
>> +    if refresh_cache_if_file_changed() && cache_try_secret_matches(tokenid, secret) {
>>           return Ok(());
>>       }
>>   
>> @@ -109,12 +173,15 @@ fn set_secret(tokenid: &Authid, secret: &str) -> Result<(), Error> {
>>   
>>       let _guard = lock_config()?;
>>   
>> +    // Capture state before we write to detect external edits.
>> +    let pre_meta = shadow_mtime_len().unwrap_or((None, None));
>> +
>>       let mut data = read_file()?;
>>       let hashed_secret = proxmox_sys::crypt::encrypt_pw(secret)?;
>>       data.insert(tokenid.clone(), hashed_secret);
>>       write_file(data)?;
>>   
>> -    apply_api_mutation(tokenid, Some(secret));
>> +    apply_api_mutation(tokenid, Some(secret), pre_meta);
>>   
>>       Ok(())
>>   }
>> @@ -127,11 +194,14 @@ pub fn delete_secret(tokenid: &Authid) -> Result<(), Error> {
>>   
>>       let _guard = lock_config()?;
>>   
>> +    // Capture state before we write to detect external edits.
>> +    let pre_meta = shadow_mtime_len().unwrap_or((None, None));
>> +
>>       let mut data = read_file()?;
>>       data.remove(tokenid);
>>       write_file(data)?;
>>   
>> -    apply_api_mutation(tokenid, None);
>> +    apply_api_mutation(tokenid, None, pre_meta);
>>   
>>       Ok(())
>>   }
>> @@ -145,6 +215,12 @@ struct ApiTokenSecretCache {
>>       secrets: HashMap<Authid, CachedSecret>,
>>       /// Shared generation to detect mutations of the underlying token.shadow file.
>>       shared_gen: usize,
>> +    // shadow file mtime to detect changes
>> +    file_mtime: Option<SystemTime>,
>> +    // shadow file length to detect changes
>> +    file_len: Option<u64>,
>> +    // last time the file metadata was checked
>> +    last_checked: Option<i64>,
> 
> these three are always set together, so wouldn't it make more sense to
> make them an Option<ShadowFileInfo> ?
>
>>   }
>>   
>>   /// Cached secret.
>> @@ -204,7 +280,13 @@ fn cache_try_secret_matches(tokenid: &Authid, secret: &str) -> bool {
>>       eq && gen2 == cache_gen
>>   }
>>   
>> -fn apply_api_mutation(tokenid: &Authid, new_secret: Option<&str>) {
>> +fn apply_api_mutation(
>> +    tokenid: &Authid,
>> +    new_secret: Option<&str>,
>> +    pre_write_meta: (Option<SystemTime>, Option<u64>),
>> +) {
>> +    let now = epoch_i64();
>> +
>>       // Signal cache invalidation to other processes (best-effort).
>>       let new_shared_gen = bump_token_shadow_shared_gen();
>>   
>> @@ -220,6 +302,13 @@ fn apply_api_mutation(tokenid: &Authid, new_secret: Option<&str>) {
>>       // Update to the post-mutation generation.
>>       cache.shared_gen = gen;
>>   
>> +    // If our cached file metadata does not match the on-disk state before our write,
>> +    // we likely missed an external/manual edit. We can no longer trust any cached secrets.
>> +    let (pre_mtime, pre_len) = pre_write_meta;
>> +    if cache.file_mtime != pre_mtime || cache.file_len != pre_len {
>> +        cache.secrets.clear();
>> +    }
>> +
>>       // Apply the new mutation.
>>       match new_secret {
>>           Some(secret) => {
>> @@ -234,6 +323,20 @@ fn apply_api_mutation(tokenid: &Authid, new_secret: Option<&str>) {
>>               cache.secrets.remove(tokenid);
>>           }
>>       }
>> +
>> +    // Update our view of the file metadata to the post-write state (best-effort).
>> +    // (If this fails, drop local cache so callers fall back to slow path until refreshed.)
>> +    match shadow_mtime_len() {
>> +        Ok((mtime, len)) => {
>> +            cache.file_mtime = mtime;
>> +            cache.file_len = len;
>> +            cache.last_checked = Some(now);
>> +        }
>> +        Err(_) => {
>> +            // If we cannot validate state, do not trust cache.
>> +            invalidate_cache_state(&mut cache);
>> +        }
>> +    }
>>   }
>>   
>>   /// Get the current shared generation.
>> @@ -253,4 +356,15 @@ fn bump_token_shadow_shared_gen() -> Option<usize> {
>>   /// Invalidates the cache state and only keeps the shared generation.
>>   fn invalidate_cache_state(cache: &mut ApiTokenSecretCache) {
>>       cache.secrets.clear();
>> +    cache.file_mtime = None;
>> +    cache.file_len = None;
>> +    cache.last_checked = None;
>> +}
>> +
>> +fn shadow_mtime_len() -> Result<(Option<SystemTime>, Option<u64>), Error> {
>> +    match fs::metadata(CONF_FILE) {
>> +        Ok(meta) => Ok((meta.modified().ok(), Some(meta.len()))),
>> +        Err(e) if e.kind() == ErrorKind::NotFound => Ok((None, None)),
>> +        Err(e) => Err(e.into()),
>> +    }
>>   }
>> -- 
>> 2.47.3

^ permalink raw reply	[relevance 6%]

* [pbs-devel] [PATCH proxmox-backup v4 4/4] pbs-config: add TTL window to token secret cache
  2026-01-21 15:13 14% [pbs-devel] [PATCH proxmox{-backup, , -datacenter-manager} v4 00/11] token-shadow: reduce api token verification overhead Samuel Rufinatscha
                   ` (2 preceding siblings ...)
  2026-01-21 15:13 12% ` [pbs-devel] [PATCH proxmox-backup v4 3/4] pbs-config: invalidate token-secret cache on token.shadow changes Samuel Rufinatscha
@ 2026-01-21 15:14 15% ` Samuel Rufinatscha
  2026-02-10 12:58  6%   ` Christian Ebner
  2026-01-21 15:14 14% ` [pbs-devel] [PATCH proxmox v4 1/4] proxmox-access-control: split AccessControlConfig and add token.shadow gen Samuel Rufinatscha
                   ` (7 subsequent siblings)
  11 siblings, 1 reply; 117+ results
From: Samuel Rufinatscha @ 2026-01-21 15:14 UTC (permalink / raw)
  To: pbs-devel

verify_secret() currently calls refresh_cache_if_file_changed() on every
request, which performs a metadata() call on token.shadow each time.
Under load this adds unnecessary overhead, especially since the file
rarely changes.

This patch introduces a TTL boundary, controlled by
TOKEN_SECRET_CACHE_TTL_SECS. File metadata is only re-read once the
TTL has expired. The patch also documents the TTL effects.
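
For illustration, the TTL condition can be sketched as a standalone
helper. The name within_ttl is hypothetical; the patch inlines the same
expression in both the read-locked and the write-locked path:

```rust
/// Max age in seconds before the file metadata is checked again.
const TOKEN_SECRET_CACHE_TTL_SECS: i64 = 60;

/// Returns true while the last metadata check is still recent enough.
/// A `now` that lies before `last_checked` (clock moved backwards) is
/// treated as expired, which forces a fresh stat of the file.
fn within_ttl(now: i64, last_checked: i64) -> bool {
    now >= last_checked && (now - last_checked) < TOKEN_SECRET_CACHE_TTL_SECS
}
```

Treating a backwards clock jump as "expired" errs on the side of
re-checking the file rather than trusting stale state.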

Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
---
Changes from v3 to v4:
* Adjusted commit message

Changes from v2 to v3:
* Refactored refresh_cache_if_file_changed TTL logic.
* Remove had_prior_state check (replaced by last_checked logic).
* Improve TTL bound checks.
* Reword documentation warning for clarity.

Changes from v1 to v2:
* Add TOKEN_SECRET_CACHE_TTL_SECS and last_checked.
* Implement double-checked TTL: check with try_read first; only attempt
  refresh with try_write if expired/unknown.
* Fix TTL bookkeeping: update last_checked on the “file unchanged” path
  and after API mutations.
* Add documentation warning about TTL-delayed effect of manual
  token.shadow edits.

 docs/user-management.rst       |  4 ++++
 pbs-config/src/token_shadow.rs | 29 ++++++++++++++++++++++++++++-
 2 files changed, 32 insertions(+), 1 deletion(-)

diff --git a/docs/user-management.rst b/docs/user-management.rst
index 41b43d60..8dfae528 100644
--- a/docs/user-management.rst
+++ b/docs/user-management.rst
@@ -156,6 +156,10 @@ metadata:
 Similarly, the ``user delete-token`` subcommand can be used to delete a token
 again.
 
+.. WARNING:: Direct/manual edits to ``token.shadow`` may take up to 60 seconds (or
+   longer in edge cases) to take effect due to caching. Restart services for
+   immediate effect of manual edits.
+
 Newly generated API tokens don't have any permissions. Please read the next
 section to learn how to set access permissions.
 
diff --git a/pbs-config/src/token_shadow.rs b/pbs-config/src/token_shadow.rs
index a5bd1525..24633f6e 100644
--- a/pbs-config/src/token_shadow.rs
+++ b/pbs-config/src/token_shadow.rs
@@ -31,6 +31,8 @@ static TOKEN_SECRET_CACHE: LazyLock<RwLock<ApiTokenSecretCache>> = LazyLock::new
         shadow: None,
     })
 });
+/// Max age in seconds of the token secret cache before checking for file changes.
+const TOKEN_SECRET_CACHE_TTL_SECS: i64 = 60;
 
 #[derive(Serialize, Deserialize)]
 #[serde(rename_all = "kebab-case")]
@@ -72,11 +74,29 @@ fn write_file(data: HashMap<Authid, String>) -> Result<(), Error> {
 fn refresh_cache_if_file_changed() -> bool {
     let now = epoch_i64();
 
-    // Best-effort refresh under write lock.
+    // Fast path: cache is fresh if shared-gen matches and TTL not expired.
+    if let (Some(cache), Some(shared_gen_read)) =
+        (TOKEN_SECRET_CACHE.try_read(), token_shadow_shared_gen())
+    {
+        if cache.shared_gen == shared_gen_read
+            && cache.shadow.as_ref().is_some_and(|cached| {
+                now >= cached.last_checked
+                    && (now - cached.last_checked) < TOKEN_SECRET_CACHE_TTL_SECS
+            })
+        {
+            return true;
+        }
+        // read lock drops here
+    } else {
+        return false;
+    }
+
+    // Slow path: best-effort refresh under write lock.
     let Some(mut cache) = TOKEN_SECRET_CACHE.try_write() else {
         return false;
     };
 
+    // Re-read generation after acquiring the lock (may have changed meanwhile).
     let Some(shared_gen_now) = token_shadow_shared_gen() else {
         return false;
     };
@@ -86,6 +106,13 @@ fn refresh_cache_if_file_changed() -> bool {
         invalidate_cache_state_and_set_gen(&mut cache, shared_gen_now);
     }
 
+    // TTL check again after acquiring the lock
+    if cache.shadow.as_ref().is_some_and(|cached| {
+        now >= cached.last_checked && (now - cached.last_checked) < TOKEN_SECRET_CACHE_TTL_SECS
+    }) {
+        return true;
+    }
+
     // Stat the file to detect manual edits.
     let Ok((new_mtime, new_len)) = shadow_mtime_len() else {
         return false;
-- 
2.47.3




^ permalink raw reply related	[relevance 15%]

* [pbs-devel] [PATCH proxmox{-backup, , -datacenter-manager} v4 00/11] token-shadow: reduce api token verification overhead
@ 2026-01-21 15:13 14% Samuel Rufinatscha
  2026-01-21 15:13 17% ` [pbs-devel] [PATCH proxmox-backup v4 1/4] pbs-config: add token.shadow generation to ConfigVersionCache Samuel Rufinatscha
                   ` (11 more replies)
  0 siblings, 12 replies; 117+ results
From: Samuel Rufinatscha @ 2026-01-21 15:13 UTC (permalink / raw)
  To: pbs-devel

Hi,

this series improves the performance of token-based API authentication
in PBS (pbs-config) and in PDM (underlying proxmox-access-control
crate), addressing the API token verification hotspot reported in our
bugtracker #7017 [1].

When profiling PBS /status endpoint with cargo flamegraph [2],
token-based authentication showed up as a dominant hotspot via
proxmox_sys::crypt::verify_crypt_pw. Applying this series removes that
path from the hot section of the flamegraph. The same performance issue
was measured [2] for PDM. PDM uses the underlying shared
proxmox-access-control library for token handling, which is a
factored out version of the token.shadow handling code from PBS.

While this series fixes the immediate performance issue both in PBS
(pbs-config) and in the shared proxmox-access-control crate used by
PDM, PBS should ideally be refactored in a separate effort to use
proxmox-access-control for token handling instead of its local
implementation.

Approach

The goal is to reduce the cost of token-based authentication while
preserving the existing token handling semantics (including detecting
manual edits to token.shadow) and staying consistent between PBS
(pbs-config) and PDM (proxmox-access-control). For both, this series
proposes to:

1. Introduce an in-memory cache for verified token secrets and
invalidate it through a shared ConfigVersionCache generation. Note, a
shared generation is required to keep the privileged and unprivileged
daemons in sync and to avoid caching inconsistencies across processes.
2. Invalidate on token.shadow API changes (set_secret,
delete_secret)
3. Invalidate on direct/manual token.shadow file changes (mtime +
length)
4. Avoid per-request file stat calls using a TTL window
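
The generation-based invalidation from points 1 and 2 can be sketched
roughly as follows. This is a simplified, single-process illustration:
the AtomicUsize stands in for the shared ConfigVersionCache generation,
the SecretCache type and its method names are hypothetical, and the
plain string comparison is not constant-time (the series uses
openssl::memcmp::eq instead):

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicUsize, Ordering};

/// Stand-in for the shared generation (assumption: the real counter is
/// shared between processes, this one is process-local).
static SHARED_GEN: AtomicUsize = AtomicUsize::new(0);

fn current_generation() -> usize {
    SHARED_GEN.load(Ordering::Acquire)
}

/// Called after every set/delete of a token secret.
fn bump_generation() -> usize {
    SHARED_GEN.fetch_add(1, Ordering::AcqRel) + 1
}

struct SecretCache {
    secrets: HashMap<String, String>,
    seen_gen: usize,
}

impl SecretCache {
    fn new() -> Self {
        Self { secrets: HashMap::new(), seen_gen: 0 }
    }

    /// Drops all entries if someone bumped the generation meanwhile.
    fn sync(&mut self) {
        let gen = current_generation();
        if self.seen_gen != gen {
            self.secrets.clear();
            self.seen_gen = gen;
        }
    }

    /// Fast path: true only for a cached, still valid token+secret pair.
    fn lookup(&mut self, token: &str, secret: &str) -> bool {
        self.sync();
        // NOTE: not constant-time, for illustration only.
        self.secrets.get(token).is_some_and(|s| s.as_str() == secret)
    }

    /// Inserts a verified secret unless a mutation happened since
    /// `gen_before` was captured (i.e. while the hash was being verified).
    fn insert_verified(&mut self, token: &str, secret: &str, gen_before: usize) {
        self.sync();
        if self.seen_gen == gen_before {
            self.secrets.insert(token.to_owned(), secret.to_owned());
        }
    }
}
```

Capturing the generation before the expensive hash verification and
re-checking it before the insert is what prevents a secret verified
against an outdated token.shadow from entering the cache.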

Testing

To verify the effect in PBS (pbs-config changes), I:
1. Set up test environment based on latest PBS ISO, installed Rust
   toolchain, cloned proxmox-backup repository to use with cargo
   flamegraph. Reproduced bug #7017 [1] by profiling the /status
   endpoint with token-based authentication using cargo flamegraph [2].
2. Built PBS with pbs-config patches and re-ran the same workload and
   profiling setup. Confirmed that
   proxmox_sys::crypt::verify_crypt_pw path no longer appears in the
   hot section of the flamegraph. CPU usage is now dominated by TLS
   overhead.
3. Functionally, I verified that:
   * valid tokens authenticate correctly when used in API requests
   * invalid secrets are rejected as before
   * generating a new token secret via dashboard (create token for
   user, regenerate existing secret) works and authenticates correctly

To verify the effect in PDM (proxmox-access-control changes), instead
of PBS’ /status, I profiled the /version endpoint with cargo flamegraph
[2] and verified that the expensive hashing path disappears from the
hot section after introducing caching. Functionally, I verified
that:
   * valid tokens authenticate correctly when used in API requests
   * invalid secrets are rejected as before
   * generating a new token secret via dashboard (create token for user,
   regenerate existing secret) works and authenticates correctly

Benchmarks

Two different benchmarks have been run to measure caching effects
and RwLock contention:

(1) Requests per second for PBS /status endpoint (E2E)

Benchmarked parallel token auth requests for
/status?verbose=0 on top of the datastore lookup cache series [3]
to check throughput impact. With datastores=1, repeat=5000, parallel=16
this series gives ~172 req/s compared to ~65 req/s without it.
This is a ~2.6x improvement (and aligns with the ~179 req/s from the
previous series, which used per-process cache invalidation).

(2) RwLock contention for token create/delete under heavy load of
token-authenticated requests

The previous version of the series compared std::sync::RwLock and
parking_lot::RwLock contention for token create/delete under heavy
parallel token-authenticated readers. parking_lot::RwLock has been
chosen for the added fairness guarantees.

Patch summary

pbs-config:
0001 – pbs-config: add token.shadow generation to ConfigVersionCache
0002 – pbs-config: cache verified API token secrets
0003 – pbs-config: invalidate token-secret cache on token.shadow
changes
0004 – pbs-config: add TTL window to token-secret cache

proxmox-access-control:
0005 – access-control: extend AccessControlConfig for token.shadow invalidation
0006 – access-control: cache verified API token secrets
0007 – access-control: invalidate token-secret cache on token.shadow changes
0008 – access-control: add TTL window to token-secret cache

proxmox-datacenter-manager:
0009 – pdm-config: add token.shadow generation to ConfigVersionCache
0010 – docs: document API token-cache TTL effects
0011 – pdm-config: wire user+acl cache generation

Maintainer notes
* proxmox-access-control trait split: permissions now live in
 AccessControlPermissions, and AccessControlConfig now requires
 fn permissions(&self) -> &dyn AccessControlPermissions ->
 version bump
* Renames ConfigVersionCache's pub user_cache_generation and
 increase_user_cache_generation -> version bump
* Adds parking_lot::RwLock dependency in PBS and proxmox-access-control

Kind regards,
Samuel Rufinatscha

[1] https://bugzilla.proxmox.com/show_bug.cgi?id=7017
[2] attachment 1767 [1]: Flamegraph showing the proxmox_sys::crypt::verify_crypt_pw stack
[3] https://bugzilla.proxmox.com/show_bug.cgi?id=6049

proxmox-backup:

Samuel Rufinatscha (4):
  pbs-config: add token.shadow generation to ConfigVersionCache
  pbs-config: cache verified API token secrets
  pbs-config: invalidate token-secret cache on token.shadow changes
  pbs-config: add TTL window to token secret cache

 Cargo.toml                             |   1 +
 docs/user-management.rst               |   4 +
 pbs-config/Cargo.toml                  |   1 +
 pbs-config/src/config_version_cache.rs |  18 ++
 pbs-config/src/token_shadow.rs         | 302 ++++++++++++++++++++++++-
 5 files changed, 323 insertions(+), 3 deletions(-)


proxmox:

Samuel Rufinatscha (4):
  proxmox-access-control: split AccessControlConfig and add token.shadow
    gen
  proxmox-access-control: cache verified API token secrets
  proxmox-access-control: invalidate token-secret cache on token.shadow
    changes
  proxmox-access-control: add TTL window to token secret cache

 Cargo.toml                                 |   1 +
 proxmox-access-control/Cargo.toml          |   1 +
 proxmox-access-control/src/acl.rs          |  10 +-
 proxmox-access-control/src/init.rs         | 113 ++++++--
 proxmox-access-control/src/token_shadow.rs | 303 ++++++++++++++++++++-
 5 files changed, 401 insertions(+), 27 deletions(-)


proxmox-datacenter-manager:

Samuel Rufinatscha (3):
  pdm-config: implement token.shadow generation
  docs: document API token-cache TTL effects
  pdm-config: wire user+acl cache generation

 cli/admin/src/main.rs                      |  2 +-
 docs/access-control.rst                    |  4 +++
 lib/pdm-api-types/src/acl.rs               |  4 +--
 lib/pdm-config/Cargo.toml                  |  1 +
 lib/pdm-config/src/access_control.rs       | 31 ++++++++++++++++++++
 lib/pdm-config/src/config_version_cache.rs | 34 +++++++++++++++++-----
 lib/pdm-config/src/lib.rs                  |  2 ++
 server/src/acl.rs                          |  3 +-
 ui/src/main.rs                             | 10 ++++++-
 9 files changed, 77 insertions(+), 14 deletions(-)
 create mode 100644 lib/pdm-config/src/access_control.rs


Summary over all repositories:
  19 files changed, 801 insertions(+), 44 deletions(-)

-- 
Generated by git-murpp 0.8.1



^ permalink raw reply	[relevance 14%]

* [pbs-devel] [PATCH proxmox-datacenter-manager v4 2/3] docs: document API token-cache TTL effects
  2026-01-21 15:13 14% [pbs-devel] [PATCH proxmox{-backup, , -datacenter-manager} v4 00/11] token-shadow: reduce api token verification overhead Samuel Rufinatscha
                   ` (8 preceding siblings ...)
  2026-01-21 15:14 14% ` [pbs-devel] [PATCH proxmox-datacenter-manager v4 1/3] pdm-config: implement token.shadow generation Samuel Rufinatscha
@ 2026-01-21 15:14 17% ` Samuel Rufinatscha
  2026-01-21 15:14 16% ` [pbs-devel] [PATCH proxmox-datacenter-manager v4 3/3] pdm-config: wire user+acl cache generation Samuel Rufinatscha
  2026-02-17 11:14 13% ` [pbs-devel] superseded: [PATCH proxmox{-backup,,-datacenter-manager} v4 00/11] token-shadow: reduce api token verification overhead Samuel Rufinatscha
  11 siblings, 0 replies; 117+ results
From: Samuel Rufinatscha @ 2026-01-21 15:14 UTC (permalink / raw)
  To: pbs-devel

Documents the effects of the added API token-cache in the
proxmox-access-control crate.

Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
---
Changes from v3 to v4:
* Adjusted commit message

 docs/access-control.rst | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/docs/access-control.rst b/docs/access-control.rst
index adf26cd..18e57a2 100644
--- a/docs/access-control.rst
+++ b/docs/access-control.rst
@@ -47,6 +47,10 @@ place of the user ID (``user@realm``) and the user password, respectively.
 The API token is passed from the client to the server by setting the ``Authorization`` HTTP header
 with method ``PDMAPIToken`` to the value ``TOKENID:TOKENSECRET``.
 
+.. WARNING:: Direct/manual edits to ``token.shadow`` may take up to 60 seconds (or
+   longer in edge cases) to take effect due to caching. Restart services for
+   immediate effect of manual edits.
+
 .. _access_control:
 
 Access Control
-- 
2.47.3





^ permalink raw reply related	[relevance 17%]

* [pbs-devel] [PATCH proxmox-backup v4 2/4] pbs-config: cache verified API token secrets
  2026-01-21 15:13 14% [pbs-devel] [PATCH proxmox{-backup, , -datacenter-manager} v4 00/11] token-shadow: reduce api token verification overhead Samuel Rufinatscha
  2026-01-21 15:13 17% ` [pbs-devel] [PATCH proxmox-backup v4 1/4] pbs-config: add token.shadow generation to ConfigVersionCache Samuel Rufinatscha
@ 2026-01-21 15:13 12% ` Samuel Rufinatscha
  2026-02-10 12:54  5%   ` Christian Ebner
  2026-01-21 15:13 12% ` [pbs-devel] [PATCH proxmox-backup v4 3/4] pbs-config: invalidate token-secret cache on token.shadow changes Samuel Rufinatscha
                   ` (9 subsequent siblings)
  11 siblings, 1 reply; 117+ results
From: Samuel Rufinatscha @ 2026-01-21 15:13 UTC (permalink / raw)
  To: pbs-devel

Adds an in-memory cache of successfully verified token secrets.
Subsequent requests for the same token+secret combination only perform a
comparison using openssl::memcmp::eq and avoid re-running the password
hash. The cache is updated when a token secret is set and cleared when a
token is deleted.
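
To illustrate the comparison step, here is a std-only sketch with a
hypothetical secrets_match helper. It is a stand-in for
openssl::memcmp::eq, not a vetted constant-time implementation. As far
as I am aware, openssl::memcmp::eq panics when the two slices differ in
length, so checking the lengths before the call seems advisable; treat
that as an assumption to verify against the openssl crate documentation:

```rust
/// Compares a cached secret with the presented one.
fn secrets_match(cached: &str, presented: &str) -> bool {
    let (a, b) = (cached.as_bytes(), presented.as_bytes());

    // Only equal-length inputs are compared byte by byte.
    if a.len() != b.len() {
        return false;
    }

    // Accumulate all differences instead of returning on the first
    // mismatching byte.
    let mut diff = 0u8;
    for (x, y) in a.iter().zip(b.iter()) {
        diff |= x ^ y;
    }
    diff == 0
}
```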

Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
---
Changes from v3 to v4:
* Add gen param to invalidate_cache_state()
* Validates the generation bump after obtaining write lock in
apply_api_mutation
* Pass lock to apply_api_mutation
* Remove unnecessary gen check cache_try_secret_matches
* Adjusted commit message

Changes from v2 to v3:
* Replaced process-local cache invalidation (AtomicU64
API_MUTATION_GENERATION) with a cross-process shared generation via
ConfigVersionCache.
* Validate shared generation before/after the constant-time secret
compare; only insert into cache if the generation is unchanged.
* invalidate_cache_state() on insert if shared generation changed.

Changes from v1 to v2:
* Replace OnceCell with LazyLock, and std::sync::RwLock with
parking_lot::RwLock.
* Add API_MUTATION_GENERATION and guard cache inserts
to prevent “zombie inserts” across concurrent set/delete.
* Refactor cache operations into cache_try_secret_matches,
cache_try_insert_secret, and centralize write-side behavior in
apply_api_mutation.
* Switch fast-path cache access to try_read/try_write (best-effort).

 Cargo.toml                     |   1 +
 pbs-config/Cargo.toml          |   1 +
 pbs-config/src/token_shadow.rs | 160 ++++++++++++++++++++++++++++++++-
 3 files changed, 159 insertions(+), 3 deletions(-)

diff --git a/Cargo.toml b/Cargo.toml
index 0da18383..aed66fe3 100644
--- a/Cargo.toml
+++ b/Cargo.toml
@@ -143,6 +143,7 @@ nom = "7"
 num-traits = "0.2"
 once_cell = "1.3.1"
 openssl = "0.10.40"
+parking_lot = "0.12"
 percent-encoding = "2.1"
 pin-project-lite = "0.2"
 regex = "1.5.5"
diff --git a/pbs-config/Cargo.toml b/pbs-config/Cargo.toml
index 74afb3c6..eb81ce00 100644
--- a/pbs-config/Cargo.toml
+++ b/pbs-config/Cargo.toml
@@ -13,6 +13,7 @@ libc.workspace = true
 nix.workspace = true
 once_cell.workspace = true
 openssl.workspace = true
+parking_lot.workspace = true
 regex.workspace = true
 serde.workspace = true
 serde_json.workspace = true
diff --git a/pbs-config/src/token_shadow.rs b/pbs-config/src/token_shadow.rs
index 640fabbf..d5aa5de2 100644
--- a/pbs-config/src/token_shadow.rs
+++ b/pbs-config/src/token_shadow.rs
@@ -1,6 +1,8 @@
 use std::collections::HashMap;
+use std::sync::LazyLock;
 
 use anyhow::{bail, format_err, Error};
+use parking_lot::RwLock;
 use serde::{Deserialize, Serialize};
 use serde_json::{from_value, Value};
 
@@ -13,6 +15,18 @@ use crate::{open_backup_lockfile, BackupLockGuard};
 const LOCK_FILE: &str = pbs_buildcfg::configdir!("/token.shadow.lock");
 const CONF_FILE: &str = pbs_buildcfg::configdir!("/token.shadow");
 
+/// Global in-memory cache for successfully verified API token secrets.
+/// The cache stores plain text secrets for token Authids that have already been
+/// verified against the hashed values in `token.shadow`. This allows for cheap
+/// subsequent authentications for the same token+secret combination, avoiding
+/// recomputing the password hash on every request.
+static TOKEN_SECRET_CACHE: LazyLock<RwLock<ApiTokenSecretCache>> = LazyLock::new(|| {
+    RwLock::new(ApiTokenSecretCache {
+        secrets: HashMap::new(),
+        shared_gen: 0,
+    })
+});
+
 #[derive(Serialize, Deserialize)]
 #[serde(rename_all = "kebab-case")]
 /// ApiToken id / secret pair
@@ -54,9 +68,27 @@ pub fn verify_secret(tokenid: &Authid, secret: &str) -> Result<(), Error> {
         bail!("not an API token ID");
     }
 
+    // Fast path
+    if cache_try_secret_matches(tokenid, secret) {
+        return Ok(());
+    }
+
+    // Slow path
+    // First, capture the shared generation before doing the hash verification.
+    let gen_before = token_shadow_shared_gen();
+
     let data = read_file()?;
     match data.get(tokenid) {
-        Some(hashed_secret) => proxmox_sys::crypt::verify_crypt_pw(secret, hashed_secret),
+        Some(hashed_secret) => {
+            proxmox_sys::crypt::verify_crypt_pw(secret, hashed_secret)?;
+
+            // Try to cache only if nothing changed while verifying the secret.
+            if let Some(gen) = gen_before {
+                cache_try_insert_secret(tokenid.clone(), secret.to_owned(), gen);
+            }
+
+            Ok(())
+        }
         None => bail!("invalid API token"),
     }
 }
@@ -75,13 +107,15 @@ fn set_secret(tokenid: &Authid, secret: &str) -> Result<(), Error> {
         bail!("not an API token ID");
     }
 
-    let _guard = lock_config()?;
+    let guard = lock_config()?;
 
     let mut data = read_file()?;
     let hashed_secret = proxmox_sys::crypt::encrypt_pw(secret)?;
     data.insert(tokenid.clone(), hashed_secret);
     write_file(data)?;
 
+    apply_api_mutation(guard, tokenid, Some(secret));
+
     Ok(())
 }
 
@@ -91,11 +125,131 @@ pub fn delete_secret(tokenid: &Authid) -> Result<(), Error> {
         bail!("not an API token ID");
     }
 
-    let _guard = lock_config()?;
+    let guard = lock_config()?;
 
     let mut data = read_file()?;
     data.remove(tokenid);
     write_file(data)?;
 
+    apply_api_mutation(guard, tokenid, None);
+
     Ok(())
 }
+
+struct ApiTokenSecretCache {
+    /// Keys are token Authids, values are the corresponding plain text secrets.
+    /// Entries are added after a successful on-disk verification in
+    /// `verify_secret` or when a new token secret is generated by
+    /// `generate_and_set_secret`. Used to avoid repeated
+    /// password-hash computation on subsequent authentications.
+    secrets: HashMap<Authid, CachedSecret>,
+    /// Shared generation to detect mutations of the underlying token.shadow file.
+    shared_gen: usize,
+}
+
+/// Cached secret.
+struct CachedSecret {
+    secret: String,
+}
+
+fn cache_try_insert_secret(tokenid: Authid, secret: String, shared_gen_before: usize) {
+    let Some(mut cache) = TOKEN_SECRET_CACHE.try_write() else {
+        return;
+    };
+
+    let Some(shared_gen_now) = token_shadow_shared_gen() else {
+        return;
+    };
+
+    // If this process missed a generation bump, its cache is stale.
+    if cache.shared_gen != shared_gen_now {
+        invalidate_cache_state_and_set_gen(&mut cache, shared_gen_now);
+    }
+
+    // If a mutation happened while we were verifying the secret, do not insert.
+    if shared_gen_now == shared_gen_before {
+        cache.secrets.insert(tokenid, CachedSecret { secret });
+    }
+}
+
+/// Tries to match the given token secret against the cached secret.
+///
+/// Verifies the generation/version before doing the constant-time
+/// comparison to reduce TOCTOU risk. During token rotation or deletion
+/// tokens for in-flight requests may still validate against the previous
+/// generation.
+fn cache_try_secret_matches(tokenid: &Authid, secret: &str) -> bool {
+    let Some(cache) = TOKEN_SECRET_CACHE.try_read() else {
+        return false;
+    };
+    let Some(entry) = cache.secrets.get(tokenid) else {
+        return false;
+    };
+    let Some(current_gen) = token_shadow_shared_gen() else {
+        return false;
+    };
+
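+    // openssl::memcmp::eq() panics on length mismatch, so compare lengths first.
+    if current_gen == cache.shared_gen && entry.secret.len() == secret.len() {
+        return openssl::memcmp::eq(entry.secret.as_bytes(), secret.as_bytes());
+    }
+    false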
+}
+
+fn apply_api_mutation(_guard: BackupLockGuard, tokenid: &Authid, new_secret: Option<&str>) {
+    // Signal cache invalidation to other processes (best-effort).
+    let bumped_gen = bump_token_shadow_shared_gen();
+
+    let mut cache = TOKEN_SECRET_CACHE.write();
+
+    // If we cannot get the current generation, we cannot trust the cache
+    let Some(current_gen) = token_shadow_shared_gen() else {
+        invalidate_cache_state_and_set_gen(&mut cache, 0);
+        return;
+    };
+
+    // If we cannot bump the shared generation, or if it changed after
+    // obtaining the cache write lock, we cannot trust the cache
+    if bumped_gen != Some(current_gen) {
+        invalidate_cache_state_and_set_gen(&mut cache, current_gen);
+        return;
+    }
+
+    // Update to the post-mutation generation.
+    cache.shared_gen = current_gen;
+
+    // Apply the new mutation.
+    match new_secret {
+        Some(secret) => {
+            cache.secrets.insert(
+                tokenid.clone(),
+                CachedSecret {
+                    secret: secret.to_owned(),
+                },
+            );
+        }
+        None => {
+            cache.secrets.remove(tokenid);
+        }
+    }
+}
+
+/// Get the current shared generation.
+fn token_shadow_shared_gen() -> Option<usize> {
+    crate::ConfigVersionCache::new()
+        .ok()
+        .map(|cvc| cvc.token_shadow_generation())
+}
+
+/// Bump and return the new shared generation.
+fn bump_token_shadow_shared_gen() -> Option<usize> {
+    crate::ConfigVersionCache::new()
+        .ok()
+        .map(|cvc| cvc.increase_token_shadow_generation() + 1)
+}
+
+/// Invalidates local cache contents and sets/updates the cached generation.
+fn invalidate_cache_state_and_set_gen(cache: &mut ApiTokenSecretCache, gen: usize) {
+    cache.secrets.clear();
+    cache.shared_gen = gen;
+}
-- 
2.47.3



_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel

^ permalink raw reply related	[relevance 12%]

* [pbs-devel] [PATCH proxmox v4 4/4] proxmox-access-control: add TTL window to token secret cache
  2026-01-21 15:13 14% [pbs-devel] [PATCH proxmox{-backup, , -datacenter-manager} v4 00/11] token-shadow: reduce api token verification overhead Samuel Rufinatscha
                   ` (6 preceding siblings ...)
  2026-01-21 15:14 12% ` [pbs-devel] [PATCH proxmox v4 3/4] proxmox-access-control: invalidate token-secret cache on token.shadow changes Samuel Rufinatscha
@ 2026-01-21 15:14 15% ` Samuel Rufinatscha
  2026-01-21 15:14 14% ` [pbs-devel] [PATCH proxmox-datacenter-manager v4 1/3] pdm-config: implement token.shadow generation Samuel Rufinatscha
                   ` (3 subsequent siblings)
  11 siblings, 0 replies; 117+ results
From: Samuel Rufinatscha @ 2026-01-21 15:14 UTC (permalink / raw)
  To: pbs-devel

verify_secret() currently calls refresh_cache_if_file_changed() on every
request, which performs a metadata() call on token.shadow each time.
Under load this adds unnecessary overhead, especially since the file
rarely changes.

This patch introduces a TTL boundary, controlled by
TOKEN_SECRET_CACHE_TTL_SECS. File metadata is only re-checked once the
TTL has expired. The patch also documents the TTL effects.
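
As a rough illustration of the double-checked TTL logic described above, here is a self-contained sketch using only the standard library. `CacheState`, `stat_calls`, `is_fresh` and `refresh_if_needed` are hypothetical names, and `std::sync::RwLock` stands in for the `parking_lot::RwLock` used by the patch. The real function additionally compares the shared generation and the file's mtime/length, which this sketch leaves out.

```rust
use std::sync::RwLock;

/// Mirrors TOKEN_SECRET_CACHE_TTL_SECS from the patch.
const TTL_SECS: i64 = 60;

/// Hypothetical, reduced cache state: only the TTL bookkeeping.
struct CacheState {
    /// Epoch seconds of the last file check, `None` if never checked.
    last_checked: Option<i64>,
    /// Counts how often the (simulated) metadata() call ran.
    stat_calls: u32,
}

/// True while `now` is inside the TTL window. A `last_checked` in the
/// future (clock jumped backwards) is treated as expired.
fn is_fresh(state: &CacheState, now: i64) -> bool {
    state
        .last_checked
        .is_some_and(|t| now >= t && now - t < TTL_SECS)
}

/// Returns true if the cache may be used, false if it could not be checked.
fn refresh_if_needed(cache: &RwLock<CacheState>, now: i64) -> bool {
    // Fast path: only a read lock, no file access.
    match cache.try_read() {
        Ok(state) if is_fresh(&state, now) => return true,
        Ok(_) => {} // expired or unknown; read lock is released after the match
        Err(_) => return false, // lock contended: best-effort, give up
    }

    // Slow path: take the write lock and check the TTL again, since
    // another thread may have refreshed the cache in the meantime.
    let Ok(mut state) = cache.try_write() else {
        return false;
    };
    if is_fresh(&state, now) {
        return true;
    }

    // Here the real code stats token.shadow; the sketch only counts the call.
    state.stat_calls += 1;
    state.last_checked = Some(now);
    true
}
```

With this shape, a burst of requests inside one TTL window triggers at most one simulated stat call. The trade-off is that a manual edit of the file may go unnoticed for up to `TTL_SECS` seconds.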

Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
---
Changes from v3 to v4:
* Adjusted commit message

Changes from v2 to v3:
* Refactor refresh_cache_if_file_changed TTL logic.
* Remove had_prior_state check (replaced by last_checked logic).
* Improve TTL bound checks.
* Reword documentation warning for clarity.

Changes from v1 to v2:
* Add TOKEN_SECRET_CACHE_TTL_SECS and last_checked.
* Implement double-checked TTL: check with try_read first; only attempt
  refresh with try_write if expired/unknown.
* Fix TTL bookkeeping: update last_checked on the “file unchanged” path
  and after API mutations.
* Add documentation warning about TTL-delayed effect of manual
  token.shadow edits.

 proxmox-access-control/src/token_shadow.rs | 30 +++++++++++++++++++++-
 1 file changed, 29 insertions(+), 1 deletion(-)

diff --git a/proxmox-access-control/src/token_shadow.rs b/proxmox-access-control/src/token_shadow.rs
index 05813b52..a361fd72 100644
--- a/proxmox-access-control/src/token_shadow.rs
+++ b/proxmox-access-control/src/token_shadow.rs
@@ -28,6 +28,9 @@ static TOKEN_SECRET_CACHE: LazyLock<RwLock<ApiTokenSecretCache>> = LazyLock::new
     })
 });
 
+/// Max age in seconds of the token secret cache before checking for file changes.
+const TOKEN_SECRET_CACHE_TTL_SECS: i64 = 60;
+
 // Get exclusive lock
 fn lock_config() -> Result<ApiLockGuard, Error> {
     open_api_lockfile(token_shadow_lock(), None, true)
@@ -55,11 +58,29 @@ fn write_file(data: HashMap<Authid, String>) -> Result<(), Error> {
 fn refresh_cache_if_file_changed() -> bool {
     let now = epoch_i64();
 
-    // Best-effort refresh under write lock.
+    // Fast path: cache is fresh if shared-gen matches and TTL not expired.
+    if let (Some(cache), Some(shared_gen_read)) =
+        (TOKEN_SECRET_CACHE.try_read(), token_shadow_shared_gen())
+    {
+        if cache.shared_gen == shared_gen_read
+            && cache.shadow.as_ref().is_some_and(|cached| {
+                now >= cached.last_checked
+                    && (now - cached.last_checked) < TOKEN_SECRET_CACHE_TTL_SECS
+            })
+        {
+            return true;
+        }
+        // read lock drops here
+    } else {
+        return false;
+    }
+
+    // Slow path: best-effort refresh under write lock.
     let Some(mut cache) = TOKEN_SECRET_CACHE.try_write() else {
         return false;
     };
 
+    // Re-read generation after acquiring the lock (may have changed meanwhile).
     let Some(shared_gen_now) = token_shadow_shared_gen() else {
         return false;
     };
@@ -69,6 +90,13 @@ fn refresh_cache_if_file_changed() -> bool {
         invalidate_cache_state_and_set_gen(&mut cache, shared_gen_now);
     }
 
+    // TTL check again after acquiring the lock
+    if cache.shadow.as_ref().is_some_and(|cached| {
+        now >= cached.last_checked && (now - cached.last_checked) < TOKEN_SECRET_CACHE_TTL_SECS
+    }) {
+        return true;
+    }
+
     // Stat the file to detect manual edits.
     let Ok((new_mtime, new_len)) = shadow_mtime_len() else {
         return false;
-- 
2.47.3




^ permalink raw reply related	[relevance 15%]

* [pbs-devel] [PATCH proxmox v4 1/4] proxmox-access-control: split AccessControlConfig and add token.shadow gen
  2026-01-21 15:13 14% [pbs-devel] [PATCH proxmox{-backup, , -datacenter-manager} v4 00/11] token-shadow: reduce api token verification overhead Samuel Rufinatscha
                   ` (3 preceding siblings ...)
  2026-01-21 15:14 15% ` [pbs-devel] [PATCH proxmox-backup v4 4/4] pbs-config: add TTL window to token secret cache Samuel Rufinatscha
@ 2026-01-21 15:14 14% ` Samuel Rufinatscha
  2026-01-21 15:14 12% ` [pbs-devel] [PATCH proxmox v4 2/4] proxmox-access-control: cache verified API token secrets Samuel Rufinatscha
                   ` (6 subsequent siblings)
  11 siblings, 0 replies; 117+ results
From: Samuel Rufinatscha @ 2026-01-21 15:14 UTC (permalink / raw)
  To: pbs-devel

Split the AccessControlConfig trait into the AccessControlPermissions
and AccessControlConfig traits, and add token.shadow generation support
to AccessControlConfig (with default implementations).
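
The split can be pictured with a reduced, self-contained sketch. The trait and type names below (`Permissions`, `Config`, `ProductPermissions`, `UiConfig`) are hypothetical, shortened stand-ins for the real traits, only one delegated method is shown, and `String` replaces `anyhow::Error` to avoid external crates.

```rust
/// Stand-in for AccessControlPermissions: pure permission metadata.
trait Permissions {
    fn is_superuser(&self, auth_id: &str) -> bool;
}

/// Stand-in for AccessControlConfig: delegates permission queries and
/// adds optional cache hooks with defaults.
trait Config {
    /// The only method an implementor has to provide.
    fn permissions(&self) -> &dyn Permissions;

    /// Delegating default, so existing callers keep using the config object.
    fn is_superuser(&self, auth_id: &str) -> bool {
        self.permissions().is_superuser(auth_id)
    }

    /// Default: no shared generation available.
    fn token_shadow_cache_generation(&self) -> Option<usize> {
        None
    }

    /// Default: bumping the generation is unsupported.
    fn increment_token_shadow_cache_generation(&self) -> Result<usize, String> {
        Err("token shadow generation not supported".to_string())
    }
}

/// Hypothetical product-specific permission provider.
struct ProductPermissions;

impl Permissions for ProductPermissions {
    fn is_superuser(&self, auth_id: &str) -> bool {
        auth_id == "root@pam"
    }
}

/// A consumer without a shared version cache (for example a UI) only
/// needs to point at the permission provider.
struct UiConfig;

impl Config for UiConfig {
    fn permissions(&self) -> &dyn Permissions {
        &ProductPermissions
    }
}
```

A consumer that does have a cross-process version cache would override the two generation hooks in addition to `permissions()`.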

Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
---
Changes from v3 to v4:
* Split AccessControlConfig: introduced AccessControlPermissions to
provide permissions for AccessControlConfig
* Adjusted commit message

 proxmox-access-control/src/acl.rs  |  10 ++-
 proxmox-access-control/src/init.rs | 113 +++++++++++++++++++++++------
 2 files changed, 99 insertions(+), 24 deletions(-)

diff --git a/proxmox-access-control/src/acl.rs b/proxmox-access-control/src/acl.rs
index 38cb7edf..4b4eac09 100644
--- a/proxmox-access-control/src/acl.rs
+++ b/proxmox-access-control/src/acl.rs
@@ -763,7 +763,7 @@ fn privs_to_priv_names(privs: u64) -> Vec<&'static str> {
 mod test {
     use std::{collections::HashMap, sync::OnceLock};
 
-    use crate::init::{init_access_config, AccessControlConfig};
+    use crate::init::{init_access_config, AccessControlConfig, AccessControlPermissions};
 
     use super::AclTree;
     use anyhow::Error;
@@ -775,7 +775,7 @@ mod test {
         roles: HashMap<&'a str, (u64, &'a str)>,
     }
 
-    impl AccessControlConfig for TestAcmConfig<'_> {
+    impl AccessControlPermissions for TestAcmConfig<'_> {
         fn roles(&self) -> &HashMap<&str, (u64, &str)> {
             &self.roles
         }
@@ -793,6 +793,12 @@ mod test {
         }
     }
 
+    impl AccessControlConfig for TestAcmConfig<'_> {
+        fn permissions(&self) -> &dyn AccessControlPermissions {
+            self
+        }
+    }
+
     fn setup_acl_tree_config() {
         static ACL_CONFIG: OnceLock<TestAcmConfig> = OnceLock::new();
         let config = ACL_CONFIG.get_or_init(|| {
diff --git a/proxmox-access-control/src/init.rs b/proxmox-access-control/src/init.rs
index e64398e8..dfd7784b 100644
--- a/proxmox-access-control/src/init.rs
+++ b/proxmox-access-control/src/init.rs
@@ -8,9 +8,8 @@ use proxmox_section_config::SectionConfigData;
 
 static ACCESS_CONF: OnceLock<&'static dyn AccessControlConfig> = OnceLock::new();
 
-/// This trait specifies the functions a product needs to implement to get ACL tree based access
-/// control management from this plugin.
-pub trait AccessControlConfig: Send + Sync {
+/// Provides permission metadata used by access control.
+pub trait AccessControlPermissions: Send + Sync {
     /// Returns a mapping of all recognized privileges and their corresponding `u64` value.
     fn privileges(&self) -> &HashMap<&str, u64>;
 
@@ -32,25 +31,6 @@ pub trait AccessControlConfig: Send + Sync {
         false
     }
 
-    /// Returns the current cache generation of the user and acl configs. If the generation was
-    /// incremented since the last time the cache was queried, the configs are loaded again from
-    /// disk.
-    ///
-    /// Returning `None` will always reload the cache.
-    ///
-    /// Default: Always returns `None`.
-    fn cache_generation(&self) -> Option<usize> {
-        None
-    }
-
-    /// Increment the cache generation of user and acl configs. This indicates that they were
-    /// changed on disk.
-    ///
-    /// Default: Does nothing.
-    fn increment_cache_generation(&self) -> Result<(), Error> {
-        Ok(())
-    }
-
     /// Optionally returns a role that has no access to any resource.
     ///
     /// Default: Returns `None`.
@@ -103,6 +83,95 @@ pub trait AccessControlConfig: Send + Sync {
     }
 }
 
+/// This trait specifies the functions a product needs to implement to get ACL tree based access
+/// control management from this plugin.
+pub trait AccessControlConfig: Send + Sync {
+    /// Return the permissions provider.
+    fn permissions(&self) -> &dyn AccessControlPermissions;
+
+    fn privileges(&self) -> &HashMap<&str, u64> {
+        self.permissions().privileges()
+    }
+
+    fn roles(&self) -> &HashMap<&str, (u64, &str)> {
+        self.permissions().roles()
+    }
+
+    fn is_superuser(&self, auth_id: &Authid) -> bool {
+        self.permissions().is_superuser(auth_id)
+    }
+
+    fn is_group_member(&self, user_id: &Userid, group: &str) -> bool {
+        self.permissions().is_group_member(user_id, group)
+    }
+
+    fn role_no_access(&self) -> Option<&str> {
+        self.permissions().role_no_access()
+    }
+
+    fn role_admin(&self) -> Option<&str> {
+        self.permissions().role_admin()
+    }
+
+    fn init_user_config(&self, config: &mut SectionConfigData) -> Result<(), Error> {
+        self.permissions().init_user_config(config)
+    }
+
+    fn acl_audit_privileges(&self) -> u64 {
+        self.permissions().acl_audit_privileges()
+    }
+
+    fn acl_modify_privileges(&self) -> u64 {
+        self.permissions().acl_modify_privileges()
+    }
+
+    fn check_acl_path(&self, path: &str) -> Result<(), Error> {
+        self.permissions().check_acl_path(path)
+    }
+
+    fn allow_partial_permission_match(&self) -> bool {
+        self.permissions().allow_partial_permission_match()
+    }
+
+    // Cache hooks
+
+    /// Returns the current cache generation of the user and acl configs. If the generation was
+    /// incremented since the last time the cache was queried, the configs are loaded again from
+    /// disk.
+    ///
+    /// Returning `None` will always reload the cache.
+    ///
+    /// Default: Always returns `None`.
+    fn cache_generation(&self) -> Option<usize> {
+        None
+    }
+
+    /// Increment the cache generation of user and acl configs. This indicates that they were
+    /// changed on disk.
+    ///
+    /// Default: Does nothing.
+    fn increment_cache_generation(&self) -> Result<(), Error> {
+        Ok(())
+    }
+
+    /// Returns the current cache generation of the token shadow cache. If the generation was
+    /// incremented since the last time the cache was queried, the token shadow cache is reloaded
+    /// from disk.
+    ///
+    /// Default: Always returns `None`.
+    fn token_shadow_cache_generation(&self) -> Option<usize> {
+        None
+    }
+
+    /// Increment the cache generation of the token shadow cache, indicating that it was changed
+    /// on disk. Returns the generation value before the increment.
+    ///
+    /// Default: Returns an error as token shadow generation is not supported.
+    fn increment_token_shadow_cache_generation(&self) -> Result<usize, Error> {
+        anyhow::bail!("token shadow generation not supported");
+    }
+}
+
 pub fn init_access_config(config: &'static dyn AccessControlConfig) -> Result<(), Error> {
     ACCESS_CONF
         .set(config)
-- 
2.47.3





^ permalink raw reply related	[relevance 14%]

* [pbs-devel] [PATCH proxmox-datacenter-manager v4 1/3] pdm-config: implement token.shadow generation
  2026-01-21 15:13 14% [pbs-devel] [PATCH proxmox{-backup, , -datacenter-manager} v4 00/11] token-shadow: reduce api token verification overhead Samuel Rufinatscha
                   ` (7 preceding siblings ...)
  2026-01-21 15:14 15% ` [pbs-devel] [PATCH proxmox v4 4/4] proxmox-access-control: add TTL window to token secret cache Samuel Rufinatscha
@ 2026-01-21 15:14 14% ` Samuel Rufinatscha
  2026-01-21 15:14 17% ` [pbs-devel] [PATCH proxmox-datacenter-manager v4 2/3] docs: document API token-cache TTL effects Samuel Rufinatscha
                   ` (2 subsequent siblings)
  11 siblings, 0 replies; 117+ results
From: Samuel Rufinatscha @ 2026-01-21 15:14 UTC (permalink / raw)
  To: pbs-devel

PDM depends on the shared proxmox/proxmox-access-control crate for
token.shadow handling, which expects the product to provide a
cross-process invalidation signal so it can cache/invalidate
token.shadow secrets.

This patch wires AccessControlConfig to ConfigVersionCache for
token.shadow invalidation and switches server/CLI to use
pdm-config’s AccessControlConfig and UI to use
UiAccessControlConfig.
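
A minimal sketch of the generation counter behind this invalidation signal follows. `VersionCache` and `bump` are hypothetical names; the real ConfigVersionCache keeps its atomics in shared memory so that several processes see the same counter, whereas this sketch is process-local.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical, process-local stand-in for the shared version cache.
struct VersionCache {
    token_shadow_generation: AtomicUsize,
}

impl VersionCache {
    /// Returns the current generation.
    fn token_shadow_generation(&self) -> usize {
        self.token_shadow_generation.load(Ordering::Acquire)
    }

    /// Increments the generation and returns the value *before* the
    /// increment, which is what `fetch_add` yields.
    fn increase_token_shadow_generation(&self) -> usize {
        self.token_shadow_generation.fetch_add(1, Ordering::AcqRel)
    }
}

/// Bumps the generation and returns the *new* value, so a caller can
/// compare it directly against a later read of the counter.
fn bump(cache: &VersionCache) -> usize {
    cache.increase_token_shadow_generation().wrapping_add(1)
}
```

Because `fetch_add` returns the previous value, a caller that wants the post-mutation generation has to add one, as the consuming crate does in its bump helper.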

Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
---
Changes from v3 to v4:
* pdm-api-types: replace AccessControlConfig with
AccessControlPermissions and implement init::AccessControlPermissions
there
* pdm-config: add new AccessControlConfig implementing
init::AccessControlConfig
* UI: init uses a local UiAccessControlConfig for init_access_config()
* Adjusted commit message

 cli/admin/src/main.rs                      |  2 +-
 lib/pdm-api-types/src/acl.rs               |  4 ++--
 lib/pdm-config/Cargo.toml                  |  1 +
 lib/pdm-config/src/access_control.rs       | 20 ++++++++++++++++++++
 lib/pdm-config/src/config_version_cache.rs | 18 ++++++++++++++++++
 lib/pdm-config/src/lib.rs                  |  2 ++
 server/src/acl.rs                          |  3 +--
 ui/src/main.rs                             | 10 +++++++++-
 8 files changed, 54 insertions(+), 6 deletions(-)
 create mode 100644 lib/pdm-config/src/access_control.rs

diff --git a/cli/admin/src/main.rs b/cli/admin/src/main.rs
index f698fa2..916c633 100644
--- a/cli/admin/src/main.rs
+++ b/cli/admin/src/main.rs
@@ -19,7 +19,7 @@ fn main() {
     proxmox_product_config::init(api_user, priv_user);
 
     proxmox_access_control::init::init(
-        &pdm_api_types::AccessControlConfig,
+        &pdm_config::AccessControlConfig,
         pdm_buildcfg::configdir!("/access"),
     )
     .expect("failed to setup access control config");
diff --git a/lib/pdm-api-types/src/acl.rs b/lib/pdm-api-types/src/acl.rs
index 405982a..7c405a7 100644
--- a/lib/pdm-api-types/src/acl.rs
+++ b/lib/pdm-api-types/src/acl.rs
@@ -187,9 +187,9 @@ pub struct AclListItem {
     pub roleid: String,
 }
 
-pub struct AccessControlConfig;
+pub struct AccessControlPermissions;
 
-impl proxmox_access_control::init::AccessControlConfig for AccessControlConfig {
+impl proxmox_access_control::init::AccessControlPermissions for AccessControlPermissions {
     fn privileges(&self) -> &HashMap<&str, u64> {
         static PRIVS: LazyLock<HashMap<&str, u64>> =
             LazyLock::new(|| PRIVILEGES.iter().copied().collect());
diff --git a/lib/pdm-config/Cargo.toml b/lib/pdm-config/Cargo.toml
index d39c2ad..19781d2 100644
--- a/lib/pdm-config/Cargo.toml
+++ b/lib/pdm-config/Cargo.toml
@@ -13,6 +13,7 @@ once_cell.workspace = true
 openssl.workspace = true
 serde.workspace = true
 
+proxmox-access-control.workspace = true
 proxmox-config-digest = { workspace = true, features = [ "openssl" ] }
 proxmox-http = { workspace = true, features = [ "http-helpers" ] }
 proxmox-ldap = { workspace = true, features = [ "types" ]}
diff --git a/lib/pdm-config/src/access_control.rs b/lib/pdm-config/src/access_control.rs
new file mode 100644
index 0000000..389b3f4
--- /dev/null
+++ b/lib/pdm-config/src/access_control.rs
@@ -0,0 +1,20 @@
+use anyhow::Error;
+
+pub struct AccessControlConfig;
+
+impl proxmox_access_control::init::AccessControlConfig for AccessControlConfig {
+    fn permissions(&self) -> &dyn proxmox_access_control::init::AccessControlPermissions {
+        &pdm_api_types::AccessControlPermissions
+    }
+
+    fn token_shadow_cache_generation(&self) -> Option<usize> {
+        crate::ConfigVersionCache::new()
+            .ok()
+            .map(|c| c.token_shadow_generation())
+    }
+
+    fn increment_token_shadow_cache_generation(&self) -> Result<usize, Error> {
+        let c = crate::ConfigVersionCache::new()?;
+        Ok(c.increase_token_shadow_generation())
+    }
+}
diff --git a/lib/pdm-config/src/config_version_cache.rs b/lib/pdm-config/src/config_version_cache.rs
index 36a6a77..933140c 100644
--- a/lib/pdm-config/src/config_version_cache.rs
+++ b/lib/pdm-config/src/config_version_cache.rs
@@ -27,6 +27,8 @@ struct ConfigVersionCacheDataInner {
     traffic_control_generation: AtomicUsize,
     // Tracks updates to the remote/hostname/nodename mapping cache.
     remote_mapping_cache: AtomicUsize,
+    // Token shadow (token.shadow) generation/version.
+    token_shadow_generation: AtomicUsize,
     // Add further atomics here
 }
 
@@ -172,4 +174,20 @@ impl ConfigVersionCache {
             .fetch_add(1, Ordering::Relaxed)
             + 1
     }
+
+    /// Returns the token shadow generation number.
+    pub fn token_shadow_generation(&self) -> usize {
+        self.shmem
+            .data()
+            .token_shadow_generation
+            .load(Ordering::Acquire)
+    }
+
+    /// Increase the token shadow generation number and return the previous value.
+    pub fn increase_token_shadow_generation(&self) -> usize {
+        self.shmem
+            .data()
+            .token_shadow_generation
+            .fetch_add(1, Ordering::AcqRel)
+    }
 }
diff --git a/lib/pdm-config/src/lib.rs b/lib/pdm-config/src/lib.rs
index 4c49054..614f7ae 100644
--- a/lib/pdm-config/src/lib.rs
+++ b/lib/pdm-config/src/lib.rs
@@ -9,6 +9,8 @@ pub mod remotes;
 pub mod setup;
 pub mod views;
 
+mod access_control;
+pub use access_control::AccessControlConfig;
 mod config_version_cache;
 pub use config_version_cache::ConfigVersionCache;
 
diff --git a/server/src/acl.rs b/server/src/acl.rs
index f421814..e6e007b 100644
--- a/server/src/acl.rs
+++ b/server/src/acl.rs
@@ -1,6 +1,5 @@
 pub(crate) fn init() {
-    static ACCESS_CONTROL_CONFIG: pdm_api_types::AccessControlConfig =
-        pdm_api_types::AccessControlConfig;
+    static ACCESS_CONTROL_CONFIG: pdm_config::AccessControlConfig = pdm_config::AccessControlConfig;
 
     proxmox_access_control::init::init(&ACCESS_CONTROL_CONFIG, pdm_buildcfg::configdir!("/access"))
         .expect("failed to setup access control config");
diff --git a/ui/src/main.rs b/ui/src/main.rs
index 2bd900e..9f87505 100644
--- a/ui/src/main.rs
+++ b/ui/src/main.rs
@@ -390,10 +390,18 @@ fn main() {
     pwt::state::set_available_languages(proxmox_yew_comp::available_language_list());
 
     if let Err(e) =
-        proxmox_access_control::init::init_access_config(&pdm_api_types::AccessControlConfig)
+        proxmox_access_control::init::init_access_config(&UiAccessControlConfig)
     {
         log::error!("could not initialize access control config - {e:#}");
     }
 
     yew::Renderer::<DatacenterManagerApp>::new().render();
 }
+
+struct UiAccessControlConfig;
+
+impl proxmox_access_control::init::AccessControlConfig for UiAccessControlConfig {
+    fn permissions(&self) -> &dyn proxmox_access_control::init::AccessControlPermissions {
+        &pdm_api_types::AccessControlPermissions
+    }
+}
-- 
2.47.3




^ permalink raw reply related	[relevance 14%]

* [pbs-devel] [PATCH proxmox v4 2/4] proxmox-access-control: cache verified API token secrets
  2026-01-21 15:13 14% [pbs-devel] [PATCH proxmox{-backup, , -datacenter-manager} v4 00/11] token-shadow: reduce api token verification overhead Samuel Rufinatscha
                   ` (4 preceding siblings ...)
  2026-01-21 15:14 14% ` [pbs-devel] [PATCH proxmox v4 1/4] proxmox-access-control: split AccessControlConfig and add token.shadow gen Samuel Rufinatscha
@ 2026-01-21 15:14 12% ` Samuel Rufinatscha
  2026-02-10 13:38  6%   ` Christian Ebner
  2026-01-21 15:14 12% ` [pbs-devel] [PATCH proxmox v4 3/4] proxmox-access-control: invalidate token-secret cache on token.shadow changes Samuel Rufinatscha
                   ` (5 subsequent siblings)
  11 siblings, 1 reply; 117+ results
From: Samuel Rufinatscha @ 2026-01-21 15:14 UTC (permalink / raw)
  To: pbs-devel

Adds an in-memory cache of successfully verified token secrets.
Subsequent requests for the same token+secret combination only perform a
comparison using openssl::memcmp::eq and avoid re-running the password
hash. The cache is updated when a token secret is set and cleared when a
token is deleted.
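
The fast-path/slow-path idea can be sketched in a few lines of standalone Rust. Everything here is illustrative: `TokenVerifier`, `slow_hash` and `constant_time_eq` are hypothetical names, the "hash" is a string prefix rather than a real password hash, and `constant_time_eq` is a simple stand-in for `openssl::memcmp::eq`. Locking and the shared generation are omitted.

```rust
use std::collections::HashMap;

/// Length-checked comparison whose running time does not depend on the
/// position of the first differing byte.
fn constant_time_eq(a: &[u8], b: &[u8]) -> bool {
    if a.len() != b.len() {
        return false;
    }
    a.iter().zip(b).fold(0u8, |acc, (x, y)| acc | (x ^ y)) == 0
}

#[derive(Default)]
struct TokenVerifier {
    /// token id -> stored hash (stand-in for token.shadow).
    shadow: HashMap<String, String>,
    /// token id -> plain text secret that already verified successfully.
    cache: HashMap<String, String>,
    /// Counts how often the expensive hash ran.
    hash_calls: u32,
}

impl TokenVerifier {
    /// Placeholder for the expensive password hash.
    fn slow_hash(&mut self, secret: &str) -> String {
        self.hash_calls += 1;
        format!("hashed:{secret}")
    }

    fn verify_secret(&mut self, tokenid: &str, secret: &str) -> Result<(), String> {
        // Fast path: compare against the cached plain text secret.
        if let Some(cached) = self.cache.get(tokenid) {
            if constant_time_eq(cached.as_bytes(), secret.as_bytes()) {
                return Ok(());
            }
        }

        // Slow path: hash and compare against the stored value.
        let Some(expected) = self.shadow.get(tokenid).cloned() else {
            return Err("invalid API token".to_string());
        };
        if self.slow_hash(secret) != expected {
            return Err("invalid token secret".to_string());
        }

        // Only successfully verified secrets are cached.
        self.cache.insert(tokenid.to_owned(), secret.to_owned());
        Ok(())
    }

    fn delete_secret(&mut self, tokenid: &str) {
        self.shadow.remove(tokenid);
        self.cache.remove(tokenid);
    }
}
```

A cache miss, including a wrong secret for a cached token, always falls through to the slow path, so the cache can only shortcut successful verifications.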

Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
---
Changes from v3 to v4:
* Add gen param to invalidate_cache_state()
* Validate the generation bump after obtaining the write lock in
apply_api_mutation
* Pass lock to apply_api_mutation
* Remove unnecessary gen check in cache_try_secret_matches
* Adjusted commit message

Changes from v2 to v3:
* Replaced process-local cache invalidation (AtomicU64
API_MUTATION_GENERATION) with a cross-process shared generation via
ConfigVersionCache.
* Validate shared generation before/after the constant-time secret
compare; only insert into cache if the generation is unchanged.
* invalidate_cache_state() on insert if shared generation changed.

Changes from v1 to v2:
* Replace OnceCell with LazyLock, and std::sync::RwLock with
parking_lot::RwLock.
* Add API_MUTATION_GENERATION and guard cache inserts
to prevent “zombie inserts” across concurrent set/delete.
* Refactor cache operations into cache_try_secret_matches,
cache_try_insert_secret, and centralize write-side behavior in
apply_api_mutation.
* Switch fast-path cache access to try_read/try_write (best-effort).
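
The generation-guarded insert mentioned in the changelog can be sketched as follows. `SecretCache` and `try_insert` are hypothetical names, and the two generation values are passed in as plain arguments here, whereas the patch reads them from the shared version cache.

```rust
use std::collections::HashMap;

/// Hypothetical, reduced version of the token secret cache.
struct SecretCache {
    secrets: HashMap<String, String>,
    shared_gen: usize,
}

/// Inserts a verified secret only if no mutation happened between
/// `gen_before` (read before hashing) and `gen_now` (read under the lock).
/// Returns true if the secret was cached.
fn try_insert(
    cache: &mut SecretCache,
    tokenid: &str,
    secret: &str,
    gen_before: usize,
    gen_now: usize,
) -> bool {
    // The process missed a generation bump: drop everything it cached.
    if cache.shared_gen != gen_now {
        cache.secrets.clear();
        cache.shared_gen = gen_now;
    }

    // A set/delete raced with the verification: do not cache the result.
    if gen_now != gen_before {
        return false;
    }

    cache.secrets.insert(tokenid.to_owned(), secret.to_owned());
    true
}
```

Skipping the insert on a generation mismatch avoids caching a secret that was verified against data which has since been replaced or deleted.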

 Cargo.toml                                 |   1 +
 proxmox-access-control/Cargo.toml          |   1 +
 proxmox-access-control/src/token_shadow.rs | 160 ++++++++++++++++++++-
 3 files changed, 159 insertions(+), 3 deletions(-)

diff --git a/Cargo.toml b/Cargo.toml
index 27a69afa..59a2ec93 100644
--- a/Cargo.toml
+++ b/Cargo.toml
@@ -112,6 +112,7 @@ native-tls = "0.2"
 nix = "0.29"
 openssl = "0.10"
 pam-sys = "0.5"
+parking_lot = "0.12"
 percent-encoding = "2.1"
 pin-utils = "0.1.0"
 proc-macro2 = "1.0"
diff --git a/proxmox-access-control/Cargo.toml b/proxmox-access-control/Cargo.toml
index ec189664..1de2842c 100644
--- a/proxmox-access-control/Cargo.toml
+++ b/proxmox-access-control/Cargo.toml
@@ -16,6 +16,7 @@ anyhow.workspace = true
 const_format.workspace = true
 nix = { workspace = true, optional = true }
 openssl = { workspace = true, optional = true }
+parking_lot.workspace = true
 regex.workspace = true
 hex = { workspace = true, optional = true }
 serde.workspace = true
diff --git a/proxmox-access-control/src/token_shadow.rs b/proxmox-access-control/src/token_shadow.rs
index c586d834..e4dfab50 100644
--- a/proxmox-access-control/src/token_shadow.rs
+++ b/proxmox-access-control/src/token_shadow.rs
@@ -1,13 +1,28 @@
 use std::collections::HashMap;
+use std::sync::LazyLock;
 
 use anyhow::{bail, format_err, Error};
+use parking_lot::RwLock;
 use serde_json::{from_value, Value};
 
 use proxmox_auth_api::types::Authid;
 use proxmox_product_config::{open_api_lockfile, replace_config, ApiLockGuard};
 
+use crate::init::access_conf;
 use crate::init::impl_feature::{token_shadow, token_shadow_lock};
 
+/// Global in-memory cache for successfully verified API token secrets.
+/// The cache stores plain text secrets for token Authids that have already been
+/// verified against the hashed values in `token.shadow`. This allows for cheap
+/// subsequent authentications for the same token+secret combination, avoiding
+/// recomputing the password hash on every request.
+static TOKEN_SECRET_CACHE: LazyLock<RwLock<ApiTokenSecretCache>> = LazyLock::new(|| {
+    RwLock::new(ApiTokenSecretCache {
+        secrets: HashMap::new(),
+        shared_gen: 0,
+    })
+});
+
 // Get exclusive lock
 fn lock_config() -> Result<ApiLockGuard, Error> {
     open_api_lockfile(token_shadow_lock(), None, true)
@@ -36,9 +51,27 @@ pub fn verify_secret(tokenid: &Authid, secret: &str) -> Result<(), Error> {
         bail!("not an API token ID");
     }
 
+    // Fast path
+    if cache_try_secret_matches(tokenid, secret) {
+        return Ok(());
+    }
+
+    // Slow path
+    // First, capture the shared generation before doing the hash verification.
+    let gen_before = token_shadow_shared_gen();
+
     let data = read_file()?;
     match data.get(tokenid) {
-        Some(hashed_secret) => proxmox_sys::crypt::verify_crypt_pw(secret, hashed_secret),
+        Some(hashed_secret) => {
+            proxmox_sys::crypt::verify_crypt_pw(secret, hashed_secret)?;
+
+            // Try to cache only if nothing changed while verifying the secret.
+            if let Some(gen) = gen_before {
+                cache_try_insert_secret(tokenid.clone(), secret.to_owned(), gen);
+            }
+
+            Ok(())
+        }
         None => bail!("invalid API token"),
     }
 }
@@ -49,13 +82,15 @@ pub fn set_secret(tokenid: &Authid, secret: &str) -> Result<(), Error> {
         bail!("not an API token ID");
     }
 
-    let _guard = lock_config()?;
+    let guard = lock_config()?;
 
     let mut data = read_file()?;
     let hashed_secret = proxmox_sys::crypt::encrypt_pw(secret)?;
     data.insert(tokenid.clone(), hashed_secret);
     write_file(data)?;
 
+    apply_api_mutation(guard, tokenid, Some(secret));
+
     Ok(())
 }
 
@@ -65,12 +100,14 @@ pub fn delete_secret(tokenid: &Authid) -> Result<(), Error> {
         bail!("not an API token ID");
     }
 
-    let _guard = lock_config()?;
+    let guard = lock_config()?;
 
     let mut data = read_file()?;
     data.remove(tokenid);
     write_file(data)?;
 
+    apply_api_mutation(guard, tokenid, None);
+
     Ok(())
 }
 
@@ -81,3 +118,120 @@ pub fn generate_and_set_secret(tokenid: &Authid) -> Result<String, Error> {
     set_secret(tokenid, &secret)?;
     Ok(secret)
 }
+
+struct ApiTokenSecretCache {
+    /// Keys are token Authids, values are the corresponding plain text secrets.
+    /// Entries are added after a successful on-disk verification in
+    /// `verify_secret` or when a new token secret is generated by
+    /// `generate_and_set_secret`. Used to avoid repeated
+    /// password-hash computation on subsequent authentications.
+    secrets: HashMap<Authid, CachedSecret>,
+    /// Shared generation to detect mutations of the underlying token.shadow file.
+    shared_gen: usize,
+}
+
+/// Cached secret.
+struct CachedSecret {
+    secret: String,
+}
+
+fn cache_try_insert_secret(tokenid: Authid, secret: String, shared_gen_before: usize) {
+    let Some(mut cache) = TOKEN_SECRET_CACHE.try_write() else {
+        return;
+    };
+
+    let Some(shared_gen_now) = token_shadow_shared_gen() else {
+        return;
+    };
+
+    // If this process missed a generation bump, its cache is stale.
+    if cache.shared_gen != shared_gen_now {
+        invalidate_cache_state_and_set_gen(&mut cache, shared_gen_now);
+    }
+
+    // If a mutation happened while we were verifying the secret, do not insert.
+    if shared_gen_now == shared_gen_before {
+        cache.secrets.insert(tokenid, CachedSecret { secret });
+    }
+}
+
+/// Tries to match the given token secret against the cached secret.
+///
+/// Verifies the generation/version before doing the constant-time
+/// comparison to reduce TOCTOU risk. During token rotation or deletion
+/// tokens for in-flight requests may still validate against the previous
+/// generation.
+fn cache_try_secret_matches(tokenid: &Authid, secret: &str) -> bool {
+    let Some(cache) = TOKEN_SECRET_CACHE.try_read() else {
+        return false;
+    };
+    let Some(entry) = cache.secrets.get(tokenid) else {
+        return false;
+    };
+    let Some(current_gen) = token_shadow_shared_gen() else {
+        return false;
+    };
+
+    if current_gen == cache.shared_gen {
+        return openssl::memcmp::eq(entry.secret.as_bytes(), secret.as_bytes());
+    }
+
+    false
+}
+
+fn apply_api_mutation(_guard: BackupLockGuard, tokenid: &Authid, new_secret: Option<&str>) {
+    // Signal cache invalidation to other processes (best-effort).
+    let bumped_gen = bump_token_shadow_shared_gen();
+
+    let mut cache = TOKEN_SECRET_CACHE.write();
+
+    // If we cannot get the current generation, we cannot trust the cache
+    let Some(current_gen) = token_shadow_shared_gen() else {
+        invalidate_cache_state_and_set_gen(&mut cache, 0);
+        return;
+    };
+
+    // If we cannot bump the shared generation, or if it changed after
+    // obtaining the cache write lock, we cannot trust the cache
+    if bumped_gen != Some(current_gen) {
+        invalidate_cache_state_and_set_gen(&mut cache, current_gen);
+        return;
+    }
+
+    // Update to the post-mutation generation.
+    cache.shared_gen = current_gen;
+
+    // Apply the new mutation.
+    match new_secret {
+        Some(secret) => {
+            cache.secrets.insert(
+                tokenid.clone(),
+                CachedSecret {
+                    secret: secret.to_owned(),
+                },
+            );
+        }
+        None => {
+            cache.secrets.remove(tokenid);
+        }
+    }
+}
+
+/// Get the current shared generation.
+fn token_shadow_shared_gen() -> Option<usize> {
+    access_conf().token_shadow_cache_generation()
+}
+
+/// Bump and return the new shared generation.
+fn bump_token_shadow_shared_gen() -> Option<usize> {
+    access_conf()
+        .increment_token_shadow_cache_generation()
+        .ok()
+        .map(|prev| prev + 1)
+}
+
+/// Invalidates local cache contents and sets/updates the cached generation.
+fn invalidate_cache_state_and_set_gen(cache: &mut ApiTokenSecretCache, gen: usize) {
+    cache.secrets.clear();
+    cache.shared_gen = gen;
+}
-- 
2.47.3



_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel

^ permalink raw reply related	[relevance 12%]

* [pbs-devel] [PATCH proxmox-backup v4 1/4] pbs-config: add token.shadow generation to ConfigVersionCache
  2026-01-21 15:13 14% [pbs-devel] [PATCH proxmox{-backup, , -datacenter-manager} v4 00/11] token-shadow: reduce api token verification overhead Samuel Rufinatscha
@ 2026-01-21 15:13 17% ` Samuel Rufinatscha
  2026-01-21 15:13 12% ` [pbs-devel] [PATCH proxmox-backup v4 2/4] pbs-config: cache verified API token secrets Samuel Rufinatscha
                   ` (10 subsequent siblings)
  11 siblings, 0 replies; 117+ results
From: Samuel Rufinatscha @ 2026-01-21 15:13 UTC (permalink / raw)
  To: pbs-devel

Prepares the config version cache to support token_shadow caching.

Safety: the shmem mapping is fixed to 4096 bytes via the #[repr(C)]
union padding, and the new atomic is appended to the end of the
#[repr(C)] inner struct, so all existing field offsets stay unchanged.
Old processes keep accessing the same bytes and new processes consume
previously reserved padding.
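
To illustrate the layout argument, here is a minimal, self-contained
sketch. The struct, union, and helper names are hypothetical stand-ins
that only mirror the shape described above; they are not the actual
pbs-config definitions. The sketch checks that appending an atomic to
the end of a #[repr(C)] struct leaves the earlier field offsets
untouched, and that a union with a 4096-byte padding member keeps the
overall size fixed:

```rust
use std::mem::{size_of, ManuallyDrop};
use std::sync::atomic::AtomicUsize;

/// Hypothetical layout before the change (names are illustrative only).
#[repr(C)]
struct InnerOld {
    magic: [u8; 8],
    traffic_control_generation: AtomicUsize,
    datastore_generation: AtomicUsize,
}

/// Same layout with one atomic appended at the end.
#[repr(C)]
struct InnerNew {
    magic: [u8; 8],
    traffic_control_generation: AtomicUsize,
    datastore_generation: AtomicUsize,
    token_shadow_generation: AtomicUsize,
}

/// The padding member pins the union size to 4096 bytes as long as the
/// inner struct stays smaller than that.
#[repr(C)]
union Mapping {
    inner: ManuallyDrop<InnerNew>,
    _padding: [u8; 4096],
}

/// Byte offset of `field` inside `base`.
fn offset<T, F>(base: &T, field: &F) -> usize {
    field as *const F as usize - base as *const T as usize
}

fn old_offsets() -> Vec<usize> {
    let v = InnerOld {
        magic: [0; 8],
        traffic_control_generation: AtomicUsize::new(0),
        datastore_generation: AtomicUsize::new(0),
    };
    vec![
        offset(&v, &v.magic),
        offset(&v, &v.traffic_control_generation),
        offset(&v, &v.datastore_generation),
    ]
}

fn new_offsets() -> Vec<usize> {
    let v = InnerNew {
        magic: [0; 8],
        traffic_control_generation: AtomicUsize::new(0),
        datastore_generation: AtomicUsize::new(0),
        token_shadow_generation: AtomicUsize::new(0),
    };
    vec![
        offset(&v, &v.magic),
        offset(&v, &v.traffic_control_generation),
        offset(&v, &v.datastore_generation),
        offset(&v, &v.token_shadow_generation),
    ]
}

fn mapping_size() -> usize {
    size_of::<Mapping>()
}
```

Comparing old_offsets() with the first entries of new_offsets() shows
that the existing fields stay where they were; only the appended field
occupies bytes that were padding before.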

Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
---
Changes from v3 to v4:
* Rebased
* Adjusted commit message

Changes from v2 to v3:
* Rebased

Changes from v1 to v2:
* Rebased

 pbs-config/src/config_version_cache.rs | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/pbs-config/src/config_version_cache.rs b/pbs-config/src/config_version_cache.rs
index b875f7e0..399a6f79 100644
--- a/pbs-config/src/config_version_cache.rs
+++ b/pbs-config/src/config_version_cache.rs
@@ -27,6 +27,8 @@ struct ConfigVersionCacheDataInner {
     traffic_control_generation: AtomicUsize,
     // datastore (datastore.cfg) generation/version
     datastore_generation: AtomicUsize,
+    // Token shadow (token.shadow) generation/version.
+    token_shadow_generation: AtomicUsize,
     // Add further atomics here
 }
 
@@ -159,4 +161,20 @@ impl ConfigVersionCache {
             .datastore_generation
             .fetch_add(1, Ordering::AcqRel)
     }
+
+    /// Returns the token shadow generation number.
+    pub fn token_shadow_generation(&self) -> usize {
+        self.shmem
+            .data()
+            .token_shadow_generation
+            .load(Ordering::Acquire)
+    }
+
+    /// Increases the token shadow generation number and returns the previous value.
+    pub fn increase_token_shadow_generation(&self) -> usize {
+        self.shmem
+            .data()
+            .token_shadow_generation
+            .fetch_add(1, Ordering::AcqRel)
+    }
 }
-- 
2.47.3





^ permalink raw reply related	[relevance 17%]

* [pbs-devel] [PATCH proxmox-backup v4 3/4] pbs-config: invalidate token-secret cache on token.shadow changes
  2026-01-21 15:13 14% [pbs-devel] [PATCH proxmox{-backup, , -datacenter-manager} v4 00/11] token-shadow: reduce api token verification overhead Samuel Rufinatscha
  2026-01-21 15:13 17% ` [pbs-devel] [PATCH proxmox-backup v4 1/4] pbs-config: add token.shadow generation to ConfigVersionCache Samuel Rufinatscha
  2026-01-21 15:13 12% ` [pbs-devel] [PATCH proxmox-backup v4 2/4] pbs-config: cache verified API token secrets Samuel Rufinatscha
@ 2026-01-21 15:13 12% ` Samuel Rufinatscha
  2026-01-21 15:14 15% ` [pbs-devel] [PATCH proxmox-backup v4 4/4] pbs-config: add TTL window to token secret cache Samuel Rufinatscha
                   ` (8 subsequent siblings)
  11 siblings, 0 replies; 117+ results
From: Samuel Rufinatscha @ 2026-01-21 15:13 UTC (permalink / raw)
  To: pbs-devel

This patch adds manual/direct file change detection by tracking the
mtime and length of token.shadow and clears the in-memory token secret
cache whenever these values change.
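
As a rough illustration of the mtime + length check, the following
standalone sketch uses only the standard library. FileStamp,
ChangeDetector, and stamp are hypothetical names for this example and
do not appear in the patch; the sketch also leaves out the shared
generation handling and locking that the real code combines with the
file check:

```rust
use std::fs;
use std::io::{self, ErrorKind};
use std::path::Path;
use std::time::SystemTime;

/// Snapshot of the metadata used to detect changes (hypothetical type).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct FileStamp {
    mtime: Option<SystemTime>,
    len: Option<u64>,
}

/// Stat the file; a missing file is reported as (None, None), not an error.
fn stamp(path: &Path) -> io::Result<FileStamp> {
    match fs::metadata(path) {
        Ok(meta) => Ok(FileStamp {
            mtime: meta.modified().ok(),
            len: Some(meta.len()),
        }),
        Err(e) if e.kind() == ErrorKind::NotFound => Ok(FileStamp {
            mtime: None,
            len: None,
        }),
        Err(e) => Err(e),
    }
}

/// Remembers the last seen stamp (hypothetical helper).
struct ChangeDetector {
    last: Option<FileStamp>,
}

impl ChangeDetector {
    fn new() -> Self {
        Self { last: None }
    }

    /// Returns true if the file looks unchanged since the previous call.
    /// The first call only records the stamp and returns false.
    fn unchanged(&mut self, path: &Path) -> io::Result<bool> {
        let now = stamp(path)?;
        let same = self.last == Some(now);
        self.last = Some(now);
        Ok(same)
    }
}
```

A caller would clear its cached secrets whenever unchanged() returns
false. Note that a same-length edit within the filesystem's timestamp
granularity can go unnoticed by this kind of check.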

Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
---
Changes from v3 to v4:
* make use of .replace() in refresh_cache_if_file_changed to get
previous state
* Group file stats with ShadowFileInfo
* Return false in refresh_cache_if_file_changed to avoid unnecessary cache
queries
* Adjusted commit message

Changes from v2 to v3:
* Cache now tracks last_checked (epoch seconds).
* Simplified refresh_cache_if_file_changed, removed
FILE_GENERATION logic
* On first load, initializes file metadata and keeps empty cache.

Changes from v1 to v2:
* Add file metadata tracking (file_mtime, file_len) and
  FILE_GENERATION.
* Store file_gen in CachedSecret and verify it against the current
  FILE_GENERATION to ensure cached entries belong to the current file
  state.
* Add shadow_mtime_len() helper and convert refresh to best-effort
  (try_write, returns bool).
* Pass a pre-write metadata snapshot into apply_api_mutation and
  clear/bump generation if the cache metadata indicates missed external
  edits.

 pbs-config/src/token_shadow.rs | 123 +++++++++++++++++++++++++++++++--
 1 file changed, 119 insertions(+), 4 deletions(-)

diff --git a/pbs-config/src/token_shadow.rs b/pbs-config/src/token_shadow.rs
index d5aa5de2..a5bd1525 100644
--- a/pbs-config/src/token_shadow.rs
+++ b/pbs-config/src/token_shadow.rs
@@ -1,5 +1,8 @@
 use std::collections::HashMap;
+use std::fs;
+use std::io::ErrorKind;
 use std::sync::LazyLock;
+use std::time::SystemTime;
 
 use anyhow::{bail, format_err, Error};
 use parking_lot::RwLock;
@@ -7,6 +10,7 @@ use serde::{Deserialize, Serialize};
 use serde_json::{from_value, Value};
 
 use proxmox_sys::fs::CreateOptions;
+use proxmox_time::epoch_i64;
 
 use pbs_api_types::Authid;
 //use crate::auth;
@@ -24,6 +28,7 @@ static TOKEN_SECRET_CACHE: LazyLock<RwLock<ApiTokenSecretCache>> = LazyLock::new
     RwLock::new(ApiTokenSecretCache {
         secrets: HashMap::new(),
         shared_gen: 0,
+        shadow: None,
     })
 });
 
@@ -62,6 +67,56 @@ fn write_file(data: HashMap<Authid, String>) -> Result<(), Error> {
     proxmox_sys::fs::replace_file(CONF_FILE, &json, options, true)
 }
 
+/// Refreshes the in-memory cache if the on-disk token.shadow file changed.
+/// Returns true if the cache is valid to use, false if not.
+fn refresh_cache_if_file_changed() -> bool {
+    let now = epoch_i64();
+
+    // Best-effort refresh under write lock.
+    let Some(mut cache) = TOKEN_SECRET_CACHE.try_write() else {
+        return false;
+    };
+
+    let Some(shared_gen_now) = token_shadow_shared_gen() else {
+        return false;
+    };
+
+    // If another process bumped the generation, we don't know what changed -> clear cache
+    if cache.shared_gen != shared_gen_now {
+        invalidate_cache_state_and_set_gen(&mut cache, shared_gen_now);
+    }
+
+    // Stat the file to detect manual edits.
+    let Ok((new_mtime, new_len)) = shadow_mtime_len() else {
+        return false;
+    };
+
+    // If the file didn't change, only update last_checked
+    if let Some(shadow) = cache.shadow.as_mut() {
+        if shadow.mtime == new_mtime && shadow.len == new_len {
+            shadow.last_checked = now;
+            return true;
+        }
+    }
+
+    cache.secrets.clear();
+
+    let prev = cache.shadow.replace(ShadowFileInfo {
+        mtime: new_mtime,
+        len: new_len,
+        last_checked: now,
+    });
+
+    if prev.is_some() {
+        // Best-effort propagation to other processes if a change was detected
+        if let Some(shared_gen_new) = bump_token_shadow_shared_gen() {
+            cache.shared_gen = shared_gen_new;
+        }
+    }
+
+    false
+}
+
 /// Verifies that an entry for given tokenid / API token secret exists
 pub fn verify_secret(tokenid: &Authid, secret: &str) -> Result<(), Error> {
     if !tokenid.is_token() {
@@ -69,7 +124,7 @@ pub fn verify_secret(tokenid: &Authid, secret: &str) -> Result<(), Error> {
     }
 
     // Fast path
-    if cache_try_secret_matches(tokenid, secret) {
+    if refresh_cache_if_file_changed() && cache_try_secret_matches(tokenid, secret) {
         return Ok(());
     }
 
@@ -109,12 +164,15 @@ fn set_secret(tokenid: &Authid, secret: &str) -> Result<(), Error> {
 
     let guard = lock_config()?;
 
+    // Capture state before we write to detect external edits.
+    let pre_meta = shadow_mtime_len().unwrap_or((None, None));
+
     let mut data = read_file()?;
     let hashed_secret = proxmox_sys::crypt::encrypt_pw(secret)?;
     data.insert(tokenid.clone(), hashed_secret);
     write_file(data)?;
 
-    apply_api_mutation(guard, tokenid, Some(secret));
+    apply_api_mutation(guard, tokenid, Some(secret), pre_meta);
 
     Ok(())
 }
@@ -127,11 +185,14 @@ pub fn delete_secret(tokenid: &Authid) -> Result<(), Error> {
 
     let guard = lock_config()?;
 
+    // Capture state before we write to detect external edits.
+    let pre_meta = shadow_mtime_len().unwrap_or((None, None));
+
     let mut data = read_file()?;
     data.remove(tokenid);
     write_file(data)?;
 
-    apply_api_mutation(guard, tokenid, None);
+    apply_api_mutation(guard, tokenid, None, pre_meta);
 
     Ok(())
 }
@@ -145,6 +206,8 @@ struct ApiTokenSecretCache {
     secrets: HashMap<Authid, CachedSecret>,
     /// Shared generation to detect mutations of the underlying token.shadow file.
     shared_gen: usize,
+    /// Shadow file info to detect changes
+    shadow: Option<ShadowFileInfo>,
 }
 
 /// Cached secret.
@@ -152,6 +215,16 @@ struct CachedSecret {
     secret: String,
 }
 
+/// Shadow file info
+struct ShadowFileInfo {
+    // shadow file mtime to detect changes
+    mtime: Option<SystemTime>,
+    // shadow file length to detect changes
+    len: Option<u64>,
+    // last time the file metadata was checked
+    last_checked: i64,
+}
+
 fn cache_try_insert_secret(tokenid: Authid, secret: String, shared_gen_before: usize) {
     let Some(mut cache) = TOKEN_SECRET_CACHE.try_write() else {
         return;
@@ -196,7 +269,14 @@ fn cache_try_secret_matches(tokenid: &Authid, secret: &str) -> bool {
     false
 }
 
-fn apply_api_mutation(_guard: BackupLockGuard, tokenid: &Authid, new_secret: Option<&str>) {
+fn apply_api_mutation(
+    _guard: BackupLockGuard,
+    tokenid: &Authid,
+    new_secret: Option<&str>,
+    pre_write_meta: (Option<SystemTime>, Option<u64>),
+) {
+    let now = epoch_i64();
+
     // Signal cache invalidation to other processes (best-effort).
     let bumped_gen = bump_token_shadow_shared_gen();
 
@@ -215,6 +295,16 @@ fn apply_api_mutation(_guard: BackupLockGuard, tokenid: &Authid, new_secret: Opt
         return;
     }
 
+    // If our cached file metadata does not match the on-disk state before our write,
+    // we likely missed an external/manual edit. We can no longer trust any cached secrets.
+    if cache
+        .shadow
+        .as_ref()
+        .is_some_and(|s| (s.mtime, s.len) != pre_write_meta)
+    {
+        cache.secrets.clear();
+    }
+
     // Update to the post-mutation generation.
     cache.shared_gen = current_gen;
 
@@ -232,6 +322,22 @@ fn apply_api_mutation(_guard: BackupLockGuard, tokenid: &Authid, new_secret: Opt
             cache.secrets.remove(tokenid);
         }
     }
+
+    // Update our view of the file metadata to the post-write state (best-effort).
+    // (If this fails, drop local cache so callers fall back to slow path until refreshed.)
+    match shadow_mtime_len() {
+        Ok((mtime, len)) => {
+            cache.shadow = Some(ShadowFileInfo {
+                mtime,
+                len,
+                last_checked: now,
+            });
+        }
+        Err(_) => {
+            // If we cannot validate state, do not trust cache.
+            invalidate_cache_state_and_set_gen(&mut cache, current_gen);
+        }
+    }
 }
 
 /// Get the current shared generation.
@@ -252,4 +358,13 @@ fn bump_token_shadow_shared_gen() -> Option<usize> {
 fn invalidate_cache_state_and_set_gen(cache: &mut ApiTokenSecretCache, gen: usize) {
     cache.secrets.clear();
     cache.shared_gen = gen;
+    cache.shadow = None;
+}
+
+fn shadow_mtime_len() -> Result<(Option<SystemTime>, Option<u64>), Error> {
+    match fs::metadata(CONF_FILE) {
+        Ok(meta) => Ok((meta.modified().ok(), Some(meta.len()))),
+        Err(e) if e.kind() == ErrorKind::NotFound => Ok((None, None)),
+        Err(e) => Err(e.into()),
+    }
 }
-- 
2.47.3





^ permalink raw reply related	[relevance 12%]

* [pbs-devel] [PATCH proxmox-datacenter-manager v4 3/3] pdm-config: wire user+acl cache generation
  2026-01-21 15:13 14% [pbs-devel] [PATCH proxmox{-backup, , -datacenter-manager} v4 00/11] token-shadow: reduce api token verification overhead Samuel Rufinatscha
                   ` (9 preceding siblings ...)
  2026-01-21 15:14 17% ` [pbs-devel] [PATCH proxmox-datacenter-manager v4 2/3] docs: document API token-cache TTL effects Samuel Rufinatscha
@ 2026-01-21 15:14 16% ` Samuel Rufinatscha
  2026-02-17 11:14 13% ` [pbs-devel] superseded: [PATCH proxmox{-backup,,-datacenter-manager} v4 00/11] token-shadow: reduce api token verification overhead Samuel Rufinatscha
  11 siblings, 0 replies; 117+ results
From: Samuel Rufinatscha @ 2026-01-21 15:14 UTC (permalink / raw)
  To: pbs-devel

Rename ConfigVersionCache’s user_cache_generation to
user_and_acl_generation to match AccessControlConfig::cache_generation
and increment_cache_generation semantics: it expects the same shared
generation for both user and ACL configs.

Safety: no layout change, the shared-memory size and field order remain
unchanged.

Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
---
 lib/pdm-config/src/access_control.rs       | 11 +++++++++++
 lib/pdm-config/src/config_version_cache.rs | 16 ++++++++--------
 2 files changed, 19 insertions(+), 8 deletions(-)

diff --git a/lib/pdm-config/src/access_control.rs b/lib/pdm-config/src/access_control.rs
index 389b3f4..1d498d3 100644
--- a/lib/pdm-config/src/access_control.rs
+++ b/lib/pdm-config/src/access_control.rs
@@ -7,6 +7,17 @@ impl proxmox_access_control::init::AccessControlConfig for AccessControlConfig {
         &pdm_api_types::AccessControlPermissions
     }
 
+    fn cache_generation(&self) -> Option<usize> {
+        crate::ConfigVersionCache::new()
+            .ok()
+            .map(|c| c.user_and_acl_generation())
+    }
+
+    fn increment_cache_generation(&self) -> Result<(), Error> {
+        let c = crate::ConfigVersionCache::new()?;
+        Ok(c.increase_user_and_acl_generation())
+    }
+
     fn token_shadow_cache_generation(&self) -> Option<usize> {
         crate::ConfigVersionCache::new()
             .ok()
diff --git a/lib/pdm-config/src/config_version_cache.rs b/lib/pdm-config/src/config_version_cache.rs
index 933140c..f3d52a0 100644
--- a/lib/pdm-config/src/config_version_cache.rs
+++ b/lib/pdm-config/src/config_version_cache.rs
@@ -21,8 +21,8 @@ use proxmox_shared_memory::*;
 #[repr(C)]
 struct ConfigVersionCacheDataInner {
     magic: [u8; 8],
-    // User (user.cfg) cache generation/version.
-    user_cache_generation: AtomicUsize,
+    // User (user.cfg) and ACL (acl.cfg) generation/version.
+    user_and_acl_generation: AtomicUsize,
     // Traffic control (traffic-control.cfg) generation/version.
     traffic_control_generation: AtomicUsize,
     // Tracks updates to the remote/hostname/nodename mapping cache.
@@ -126,19 +126,19 @@ impl ConfigVersionCache {
         Ok(Arc::new(Self { shmem }))
     }
 
-    /// Returns the user cache generation number.
-    pub fn user_cache_generation(&self) -> usize {
+    /// Returns the user and ACL cache generation number.
+    pub fn user_and_acl_generation(&self) -> usize {
         self.shmem
             .data()
-            .user_cache_generation
+            .user_and_acl_generation
             .load(Ordering::Acquire)
     }
 
-    /// Increase the user cache generation number.
-    pub fn increase_user_cache_generation(&self) {
+    /// Increase the user and ACL cache generation number.
+    pub fn increase_user_and_acl_generation(&self) {
         self.shmem
             .data()
-            .user_cache_generation
+            .user_and_acl_generation
             .fetch_add(1, Ordering::AcqRel);
     }
 
-- 
2.47.3




^ permalink raw reply related	[relevance 16%]

* [pbs-devel] [PATCH proxmox v4 3/4] proxmox-access-control: invalidate token-secret cache on token.shadow changes
  2026-01-21 15:13 14% [pbs-devel] [PATCH proxmox{-backup, , -datacenter-manager} v4 00/11] token-shadow: reduce api token verification overhead Samuel Rufinatscha
                   ` (5 preceding siblings ...)
  2026-01-21 15:14 12% ` [pbs-devel] [PATCH proxmox v4 2/4] proxmox-access-control: cache verified API token secrets Samuel Rufinatscha
@ 2026-01-21 15:14 12% ` Samuel Rufinatscha
  2026-01-21 15:14 15% ` [pbs-devel] [PATCH proxmox v4 4/4] proxmox-access-control: add TTL window to token secret cache Samuel Rufinatscha
                   ` (4 subsequent siblings)
  11 siblings, 0 replies; 117+ results
From: Samuel Rufinatscha @ 2026-01-21 15:14 UTC (permalink / raw)
  To: pbs-devel

This patch adds manual/direct file change detection by tracking the
mtime and length of token.shadow and clears the in-memory token secret
cache whenever these values change.

Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
---
Changes from v3 to v4:
* make use of .replace() in refresh_cache_if_file_changed to get
previous state
* Group file stats with ShadowFileInfo
* Return false in refresh_cache_if_file_changed to avoid unnecessary cache
queries
* Adjusted commit message

Changes from v2 to v3:
* Cache now tracks last_checked (epoch seconds).
* Simplified refresh_cache_if_file_changed, removed
FILE_GENERATION logic
* On first load, initializes file metadata and keeps empty cache.

Changes from v1 to v2:
* Add file metadata tracking (file_mtime, file_len) and
  FILE_GENERATION.
* Store file_gen in CachedSecret and verify it against the current
  FILE_GENERATION to ensure cached entries belong to the current file
  state.
* Add shadow_mtime_len() helper and convert refresh to best-effort
  (try_write, returns bool).
* Pass a pre-write metadata snapshot into apply_api_mutation and
  clear/bump generation if the cache metadata indicates missed external
  edits.

 proxmox-access-control/src/token_shadow.rs | 123 ++++++++++++++++++++-
 1 file changed, 119 insertions(+), 4 deletions(-)

diff --git a/proxmox-access-control/src/token_shadow.rs b/proxmox-access-control/src/token_shadow.rs
index e4dfab50..05813b52 100644
--- a/proxmox-access-control/src/token_shadow.rs
+++ b/proxmox-access-control/src/token_shadow.rs
@@ -1,5 +1,8 @@
 use std::collections::HashMap;
+use std::fs;
+use std::io::ErrorKind;
 use std::sync::LazyLock;
+use std::time::SystemTime;
 
 use anyhow::{bail, format_err, Error};
 use parking_lot::RwLock;
@@ -7,6 +10,7 @@ use serde_json::{from_value, Value};
 
 use proxmox_auth_api::types::Authid;
 use proxmox_product_config::{open_api_lockfile, replace_config, ApiLockGuard};
+use proxmox_time::epoch_i64;
 
 use crate::init::access_conf;
 use crate::init::impl_feature::{token_shadow, token_shadow_lock};
@@ -20,6 +24,7 @@ static TOKEN_SECRET_CACHE: LazyLock<RwLock<ApiTokenSecretCache>> = LazyLock::new
     RwLock::new(ApiTokenSecretCache {
         secrets: HashMap::new(),
         shared_gen: 0,
+        shadow: None,
     })
 });
 
@@ -45,6 +50,56 @@ fn write_file(data: HashMap<Authid, String>) -> Result<(), Error> {
     replace_config(token_shadow(), &json)
 }
 
+/// Refreshes the in-memory cache if the on-disk token.shadow file changed.
+/// Returns true if the cache is valid to use, false if not.
+fn refresh_cache_if_file_changed() -> bool {
+    let now = epoch_i64();
+
+    // Best-effort refresh under write lock.
+    let Some(mut cache) = TOKEN_SECRET_CACHE.try_write() else {
+        return false;
+    };
+
+    let Some(shared_gen_now) = token_shadow_shared_gen() else {
+        return false;
+    };
+
+    // If another process bumped the generation, we don't know what changed -> clear cache
+    if cache.shared_gen != shared_gen_now {
+        invalidate_cache_state_and_set_gen(&mut cache, shared_gen_now);
+    }
+
+    // Stat the file to detect manual edits.
+    let Ok((new_mtime, new_len)) = shadow_mtime_len() else {
+        return false;
+    };
+
+    // If the file didn't change, only update last_checked
+    if let Some(shadow) = cache.shadow.as_mut() {
+        if shadow.mtime == new_mtime && shadow.len == new_len {
+            shadow.last_checked = now;
+            return true;
+        }
+    }
+
+    cache.secrets.clear();
+
+    let prev = cache.shadow.replace(ShadowFileInfo {
+        mtime: new_mtime,
+        len: new_len,
+        last_checked: now,
+    });
+
+    if prev.is_some() {
+        // Best-effort propagation to other processes if a change was detected
+        if let Some(shared_gen_new) = bump_token_shadow_shared_gen() {
+            cache.shared_gen = shared_gen_new;
+        }
+    }
+
+    false
+}
+
 /// Verifies that an entry for given tokenid / API token secret exists
 pub fn verify_secret(tokenid: &Authid, secret: &str) -> Result<(), Error> {
     if !tokenid.is_token() {
@@ -52,7 +107,7 @@ pub fn verify_secret(tokenid: &Authid, secret: &str) -> Result<(), Error> {
     }
 
     // Fast path
-    if cache_try_secret_matches(tokenid, secret) {
+    if refresh_cache_if_file_changed() && cache_try_secret_matches(tokenid, secret) {
         return Ok(());
     }
 
@@ -84,12 +139,15 @@ pub fn set_secret(tokenid: &Authid, secret: &str) -> Result<(), Error> {
 
     let guard = lock_config()?;
 
+    // Capture state before we write to detect external edits.
+    let pre_meta = shadow_mtime_len().unwrap_or((None, None));
+
     let mut data = read_file()?;
     let hashed_secret = proxmox_sys::crypt::encrypt_pw(secret)?;
     data.insert(tokenid.clone(), hashed_secret);
     write_file(data)?;
 
-    apply_api_mutation(guard, tokenid, Some(secret));
+    apply_api_mutation(guard, tokenid, Some(secret), pre_meta);
 
     Ok(())
 }
@@ -102,11 +160,14 @@ pub fn delete_secret(tokenid: &Authid) -> Result<(), Error> {
 
     let guard = lock_config()?;
 
+    // Capture state before we write to detect external edits.
+    let pre_meta = shadow_mtime_len().unwrap_or((None, None));
+
     let mut data = read_file()?;
     data.remove(tokenid);
     write_file(data)?;
 
-    apply_api_mutation(guard, tokenid, None);
+    apply_api_mutation(guard, tokenid, None, pre_meta);
 
     Ok(())
 }
@@ -128,6 +189,8 @@ struct ApiTokenSecretCache {
     secrets: HashMap<Authid, CachedSecret>,
     /// Shared generation to detect mutations of the underlying token.shadow file.
     shared_gen: usize,
+    /// Shadow file info to detect changes
+    shadow: Option<ShadowFileInfo>,
 }
 
 /// Cached secret.
@@ -135,6 +198,16 @@ struct CachedSecret {
     secret: String,
 }
 
+/// Shadow file info
+struct ShadowFileInfo {
+    // shadow file mtime to detect changes
+    mtime: Option<SystemTime>,
+    // shadow file length to detect changes
+    len: Option<u64>,
+    // last time the file metadata was checked
+    last_checked: i64,
+}
+
 fn cache_try_insert_secret(tokenid: Authid, secret: String, shared_gen_before: usize) {
     let Some(mut cache) = TOKEN_SECRET_CACHE.try_write() else {
         return;
@@ -179,7 +252,14 @@ fn cache_try_secret_matches(tokenid: &Authid, secret: &str) -> bool {
     false
 }
 
-fn apply_api_mutation(_guard: BackupLockGuard, tokenid: &Authid, new_secret: Option<&str>) {
+fn apply_api_mutation(
+    _guard: ApiLockGuard,
+    tokenid: &Authid,
+    new_secret: Option<&str>,
+    pre_write_meta: (Option<SystemTime>, Option<u64>),
+) {
+    let now = epoch_i64();
+
     // Signal cache invalidation to other processes (best-effort).
     let bumped_gen = bump_token_shadow_shared_gen();
 
@@ -198,6 +278,16 @@ fn apply_api_mutation(_guard: BackupLockGuard, tokenid: &Authid, new_secret: Opt
         return;
     }
 
+    // If our cached file metadata does not match the on-disk state before our write,
+    // we likely missed an external/manual edit. We can no longer trust any cached secrets.
+    if cache
+        .shadow
+        .as_ref()
+        .is_some_and(|s| (s.mtime, s.len) != pre_write_meta)
+    {
+        cache.secrets.clear();
+    }
+
     // Update to the post-mutation generation.
     cache.shared_gen = current_gen;
 
@@ -215,6 +305,22 @@ fn apply_api_mutation(_guard: BackupLockGuard, tokenid: &Authid, new_secret: Opt
             cache.secrets.remove(tokenid);
         }
     }
+
+    // Update our view of the file metadata to the post-write state (best-effort).
+    // (If this fails, drop local cache so callers fall back to slow path until refreshed.)
+    match shadow_mtime_len() {
+        Ok((mtime, len)) => {
+            cache.shadow = Some(ShadowFileInfo {
+                mtime,
+                len,
+                last_checked: now,
+            });
+        }
+        Err(_) => {
+            // If we cannot validate state, do not trust cache.
+            invalidate_cache_state_and_set_gen(&mut cache, current_gen);
+        }
+    }
 }
 
 /// Get the current shared generation.
@@ -234,4 +340,13 @@ fn bump_token_shadow_shared_gen() -> Option<usize> {
 fn invalidate_cache_state_and_set_gen(cache: &mut ApiTokenSecretCache, gen: usize) {
     cache.secrets.clear();
     cache.shared_gen = gen;
+    cache.shadow = None;
+}
+
+fn shadow_mtime_len() -> Result<(Option<SystemTime>, Option<u64>), Error> {
+    match fs::metadata(token_shadow()) {
+        Ok(meta) => Ok((meta.modified().ok(), Some(meta.len()))),
+        Err(e) if e.kind() == ErrorKind::NotFound => Ok((None, None)),
+        Err(e) => Err(e.into()),
+    }
 }
-- 
2.47.3





^ permalink raw reply related	[relevance 12%]

* [pbs-devel] superseded: [PATCH proxmox{-backup, , -datacenter-manager} v3 00/10] token-shadow: reduce api token verification overhead
                     ` (3 preceding siblings ...)
  @ 2026-01-21 15:15 13% ` Samuel Rufinatscha
  4 siblings, 0 replies; 117+ results
From: Samuel Rufinatscha @ 2026-01-21 15:15 UTC (permalink / raw)
  To: pbs-devel

https://lore.proxmox.com/pbs-devel/20260121151408.731516-1-s.rufinatscha@proxmox.com/T/#t

On 1/2/26 5:07 PM, Samuel Rufinatscha wrote:
> Hi,
> 
> this series improves the performance of token-based API authentication
> in PBS (pbs-config) and in PDM (underlying proxmox-access-control
> crate), addressing the API token verification hotspot reported in our
> bugtracker #7017 [1].
> 
> When profiling PBS /status endpoint with cargo flamegraph [2],
> token-based authentication showed up as a dominant hotspot via
> proxmox_sys::crypt::verify_crypt_pw. Applying this series removes that
> path from the hot section of the flamegraph. The same performance issue
> was measured [2] for PDM. PDM uses the underlying shared
> proxmox-access-control library for token handling, which is a
> factored out version of the token.shadow handling code from PBS.
> 
> While this series fixes the immediate performance issue both in PBS
> (pbs-config) and in the shared proxmox-access-control crate used by
> PDM, PBS should ideally be refactored in a separate effort to use
> proxmox-access-control for token handling instead of its local
> implementation.
> 
> Problem
> 
> For token-based API requests, both PBS’s pbs-config token.shadow
> handling and PDM’s proxmox-access-control token.shadow handling
> currently:
> 
> 1. read the token.shadow file on each request
> 2. deserialize it into a HashMap<Authid, String>
> 3. run password hash verification via
>     proxmox_sys::crypt::verify_crypt_pw for the provided token secret
> 
> Under load, this results in significant CPU usage spent in repeated
> password hashing for the same token+secret pairs. The attached
> flamegraphs for PBS [2] and PDM [3] show
> proxmox_sys::crypt::verify_crypt_pw dominating the hot path.
> 
> Approach
> 
> The goal is to reduce the cost of token-based authentication while
> preserving the existing token handling semantics (including detecting
> manual edits to token.shadow) and staying consistent between PBS
> (pbs-config) and PDM (proxmox-access-control). For both, this series
> proposes to:
> 
> 1. Introduce an in-memory cache for verified token secrets and
> invalidate it through a shared ConfigVersionCache generation. Note: a
> shared generation is required to keep the privileged and unprivileged
> daemons in sync and avoid caching inconsistencies across processes.
> 2. Invalidate on API changes to the token.shadow file (set_secret,
> delete_secret)
> 3. Invalidate on direct/manual token.shadow file changes (mtime +
> length)
> 4. Avoid per-request file stat calls using a TTL window
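
The generation-based invalidation from step 1 can be sketched roughly
as follows. This is a simplified, single-process model under stated
assumptions: SharedGen stands in for the shared-memory counter,
SecretCache and all method names are hypothetical, and the file stat
and TTL handling from steps 3 and 4 are left out. It also compares
secrets with a plain ==, whereas the patches use a constant-time
comparison:

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicUsize, Ordering};

/// Stand-in for the generation counter shared between processes.
struct SharedGen(AtomicUsize);

impl SharedGen {
    fn new() -> Self {
        SharedGen(AtomicUsize::new(0))
    }

    fn current(&self) -> usize {
        self.0.load(Ordering::Acquire)
    }

    /// Called by whoever mutates the underlying file.
    fn bump(&self) {
        self.0.fetch_add(1, Ordering::AcqRel);
    }
}

/// Per-process cache of already verified secrets (hypothetical type).
struct SecretCache {
    seen_gen: usize,
    secrets: HashMap<String, String>,
}

impl SecretCache {
    fn new() -> Self {
        SecretCache {
            seen_gen: 0,
            secrets: HashMap::new(),
        }
    }

    /// Drop all entries if the shared generation moved since the last look.
    fn sync(&mut self, shared: &SharedGen) {
        let now = shared.current();
        if now != self.seen_gen {
            self.secrets.clear();
            self.seen_gen = now;
        }
    }

    /// Fast path: true only if a still-valid cached secret matches.
    /// NOTE: plain comparison for brevity, not constant-time.
    fn matches(&mut self, shared: &SharedGen, token: &str, secret: &str) -> bool {
        self.sync(shared);
        self.secrets.get(token).map(String::as_str) == Some(secret)
    }

    /// Record a secret after it passed the slow, hash-based verification.
    fn insert_verified(&mut self, shared: &SharedGen, token: &str, secret: &str) {
        self.sync(shared);
        self.secrets.insert(token.to_owned(), secret.to_owned());
    }
}
```

After insert_verified(), repeated matches() calls for the same token
and secret succeed without hashing until some process calls bump(), at
which point the next lookup clears the cache and falls back to the slow
path.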
> 
> Testing
> 
> *PBS (pbs-config)*
> 
> To verify the effect in PBS, I:
> 1. Set up test environment based on latest PBS ISO, installed Rust
>     toolchain, cloned proxmox-backup repository to use with cargo
>     flamegraph. Reproduced bug #7017 [1] by profiling the /status
>     endpoint with token-based authentication using cargo flamegraph [2].
> 2. Built PBS with pbs-config patches and re-ran the same workload and
>     profiling setup. Confirmed that
>     proxmox_sys::crypt::verify_crypt_pw path no longer appears in the
>     hot section of the flamegraph. CPU usage is now dominated by TLS
>     overhead.
> 3. Functionally, I verified that:
>     * valid tokens authenticate correctly when used in API requests
>     * invalid secrets are rejected as before
>     * generating a new token secret via dashboard (create token for user,
>     regenerate existing secret) works and authenticates correctly
> 
> *PDM (proxmox-access-control)*
> 
> To verify the effect in PDM, I followed a similar testing approach.
> Instead of PBS’ /status, I profiled the /version endpoint with cargo
> flamegraph [2] and verified that the expensive hashing path disappears
> from the hot section after introducing caching.
> 
> Functionally, I verified that:
>     * valid tokens authenticate correctly when used in API requests
>     * invalid secrets are rejected as before
>     * generating a new token secret via dashboard (create token for user,
>     regenerate existing secret) works and authenticates correctly
> 
> Benchmarks:
> 
> Two different benchmarks have been run to measure caching effects
> and RwLock contention:
> 
> (1) Requests per second for PBS /status endpoint (E2E)
> 
> Benchmarked parallel token auth requests for
> /status?verbose=0 on top of the datastore lookup cache series [4]
> to check throughput impact. With datastores=1, repeat=5000, parallel=16
> this series gives ~172 req/s compared to ~65 req/s without it.
> This is a ~2.6x improvement (and aligns with the ~179 req/s from the
> previous series, which used per-process cache invalidation).
> 
> (2) RwLock contention for token create/delete under heavy load of
> token-authenticated requests
> 
> The previous version of the series compared std::sync::RwLock and
> parking_lot::RwLock contention for token create/delete under heavy
> parallel token-authenticated readers. parking_lot::RwLock has been
> chosen for the added fairness guarantees.
> 
> Patch summary
> 
> pbs-config:
> 
> 0001 – pbs-config: add token.shadow generation to ConfigVersionCache
> Extends ConfigVersionCache to provide a process-shared generation
> number for token.shadow changes.
> 
> 0002 – pbs-config: cache verified API token secrets
> Adds an in-memory cache for verified, plain-text API token secrets.
> The cache is invalidated through the process-shared ConfigVersionCache
> generation number. Uses openssl’s constant-time memcmp for matching
> secrets.
> 
> 0003 – pbs-config: invalidate token-secret cache on token.shadow
> changes
> Stats token.shadow mtime and length and clears the cache when the
> file changes, on each token verification request.
> 
> 0004 – pbs-config: add TTL window to token-secret cache
> Introduces a TTL (TOKEN_SECRET_CACHE_TTL_SECS, default 60) for metadata
> checks so that fs::metadata calls are not performed on each request.
> 
> proxmox-access-control:
> 
> 0005 – access-control: extend AccessControlConfig for token.shadow invalidation
> 
> Extends the AccessControlConfig trait with
> token_shadow_cache_generation() and
> increment_token_shadow_cache_generation() for
> proxmox-access-control to get the shared token.shadow generation number
> and bump it on token shadow changes.
> 
> 0006 – access-control: cache verified API token secrets
> Mirrors PBS PATCH 0002.
> 
> 0007 – access-control: invalidate token-secret cache on token.shadow changes
> Mirrors PBS PATCH 0003.
> 
> 0008 – access-control: add TTL window to token-secret cache
> Mirrors PBS PATCH 0004.
> 
> proxmox-datacenter-manager:
> 
> 0009 – pdm-config: add token.shadow generation to ConfigVersionCache
> Extends PDM ConfigVersionCache and implements
> token_shadow_cache_generation() and
> increment_token_shadow_cache_generation() from AccessControlConfig for
> PDM.
> 
> 0010 – docs: document API token-cache TTL effects
> Documents the effects of the TTL window on token.shadow edits
> 
> Changes from v1 to v2:
> 
> * (refactor) Switched cache initialization to LazyLock
> * (perf) Use parking_lot::RwLock and best-effort cache access on the
>    read/refresh path (try_read/try_write) to avoid lock contention
> * (doc) Document TTL-delayed effect of manual token.shadow edits
> * (fix) Add generation guards (API_MUTATION_GENERATION +
>    FILE_GENERATION) to prevent caching across concurrent set/delete and
>    external edits
> 
> Changes from v2 to v3:
> 
> * (refactor) Replace PBS per-process cache invalidation with a
>    cross-process token.shadow generation based on PBS
>    ConfigVersionCache, ensuring cache consistency between privileged
>    and unprivileged daemons.
> * (refactor) Decoupling generation source from the
>    proxmox/proxmox-access-control cache implementation: extend
>    AccessControlConfig hooks so that products can provide the shared
>    token.shadow generation source.
> * (refactor) Extend PDM's ConfigVersionCache with
>    token_shadow_generation
>    and introduce a pdm_config::AccessControlConfig wrapper implementing
>    the new proxmox-access-control trait hooks. Switch server and CLI
>    initialization to use pdm_config::AccessControlConfig instead of
>    pdm_api_types::AccessControlConfig.
> * (refactor) Adapt generation checks around cached-secret comparison to
>    use the new shared generation source.
> * (fix/logic) cache_try_insert_secret: Update the local cache
>    generation if stale, allowing the new secret to be inserted
>    immediately
> * (refactor) Extract cache invalidation logic into a
>    invalidate_cache_state helper to reduce duplication and ensure
>    consistent state resets
> * (refactor) Simplify refresh_cache_if_file_changed: handle the
>    un-initialized/reset state and adjust the generation mismatch
>    path to ensure file metadata is always re-read.
> * (doc) Clarify TTL-delayed effects of manual token.shadow edits.
> 
> Please see the patch specific changelogs for more details.
> 
> Thanks for considering this patch series, I look forward to your
> feedback.
> 
> Best,
> Samuel Rufinatscha
> 
> [1] https://bugzilla.proxmox.com/show_bug.cgi?id=7017
> [2] attachment 1767 [1]: Flamegraph showing the proxmox_sys::crypt::verify_crypt_pw stack
> [3] attachment 1794 [1]: Flamegraph PDM baseline
> [4] https://bugzilla.proxmox.com/show_bug.cgi?id=6049
> 
> proxmox-backup:
> 
> Samuel Rufinatscha (4):
>    pbs-config: add token.shadow generation to ConfigVersionCache
>    pbs-config: cache verified API token secrets
>    pbs-config: invalidate token-secret cache on token.shadow changes
>    pbs-config: add TTL window to token secret cache
> 
>   Cargo.toml                             |   1 +
>   docs/user-management.rst               |   4 +
>   pbs-config/Cargo.toml                  |   1 +
>   pbs-config/src/config_version_cache.rs |  18 ++
>   pbs-config/src/token_shadow.rs         | 298 ++++++++++++++++++++++++-
>   5 files changed, 321 insertions(+), 1 deletion(-)
> 
> 
> proxmox:
> 
> Samuel Rufinatscha (4):
>    proxmox-access-control: extend AccessControlConfig for token.shadow
>      invalidation
>    proxmox-access-control: cache verified API token secrets
>    proxmox-access-control: invalidate token-secret cache on token.shadow
>      changes
>    proxmox-access-control: add TTL window to token secret cache
> 
>   Cargo.toml                                 |   1 +
>   proxmox-access-control/Cargo.toml          |   1 +
>   proxmox-access-control/src/init.rs         |  17 ++
>   proxmox-access-control/src/token_shadow.rs | 299 ++++++++++++++++++++-
>   4 files changed, 317 insertions(+), 1 deletion(-)
> 
> 
> proxmox-datacenter-manager:
> 
> Samuel Rufinatscha (2):
>    pdm-config: implement token.shadow generation
>    docs: document API token-cache TTL effects
> 
>   cli/admin/src/main.rs                       |  2 +-
>   docs/access-control.rst                     |  4 ++
>   lib/pdm-config/Cargo.toml                   |  1 +
>   lib/pdm-config/src/access_control_config.rs | 73 +++++++++++++++++++++
>   lib/pdm-config/src/config_version_cache.rs  | 18 +++++
>   lib/pdm-config/src/lib.rs                   |  2 +
>   server/src/acl.rs                           |  3 +-
>   7 files changed, 100 insertions(+), 3 deletions(-)
>   create mode 100644 lib/pdm-config/src/access_control_config.rs
> 
> 
> Summary over all repositories:
>    16 files changed, 738 insertions(+), 5 deletions(-)
> 




^ permalink raw reply	[relevance 13%]

* Re: [pve-devel] [PATCH pve-cluster 01/15] pmxcfs-rs: add workspace and pmxcfs-api-types crate
  @ 2026-01-23 14:17  6%   ` Samuel Rufinatscha
  2026-01-26  9:00  6%     ` Kefu Chai
  0 siblings, 1 reply; 117+ results
From: Samuel Rufinatscha @ 2026-01-23 14:17 UTC (permalink / raw)
  To: Proxmox VE development discussion, Kefu Chai

Thanks for the series. I’ve started reviewing patches 1–6; sending
notes for patch 1 first, and I’ll follow up with comments on the
others once I’ve gone through them in more depth.

comments inline

On 1/6/26 3:25 PM, Kefu Chai wrote:
> Initialize the Rust workspace for the pmxcfs rewrite project.
> 
> Add pmxcfs-api-types crate which provides foundational types:
> - PmxcfsError: Error type with errno mapping for FUSE operations
> - FuseMessage: Filesystem operation messages
> - KvStoreMessage: Status synchronization messages
> - ApplicationMessage: Wrapper enum for both message types
> - VmType: VM type enum (Qemu, Lxc)
> 
> This is the foundation crate with no internal dependencies, only
> requiring thiserror and libc. All other crates will depend on these
> shared type definitions.
> 
> Signed-off-by: Kefu Chai <k.chai@proxmox.com>
> ---
>   src/pmxcfs-rs/Cargo.lock                  | 2067 +++++++++++++++++++++

Following the .gitignore pattern in our other repos, Cargo.lock is
ignored, so I’d suggest dropping it from the series.

>   src/pmxcfs-rs/Cargo.toml                  |   83 +
>   src/pmxcfs-rs/pmxcfs-api-types/Cargo.toml |   19 +
>   src/pmxcfs-rs/pmxcfs-api-types/README.md  |  105 ++
>   src/pmxcfs-rs/pmxcfs-api-types/src/lib.rs |  152 ++
>   5 files changed, 2426 insertions(+)
>   create mode 100644 src/pmxcfs-rs/Cargo.lock
>   create mode 100644 src/pmxcfs-rs/Cargo.toml
>   create mode 100644 src/pmxcfs-rs/pmxcfs-api-types/Cargo.toml
>   create mode 100644 src/pmxcfs-rs/pmxcfs-api-types/README.md
>   create mode 100644 src/pmxcfs-rs/pmxcfs-api-types/src/lib.rs
> 
> diff --git a/src/pmxcfs-rs/Cargo.lock b/src/pmxcfs-rs/Cargo.lock

[..]

> +++ b/src/pmxcfs-rs/Cargo.toml
> @@ -0,0 +1,83 @@
> +# Workspace root for pmxcfs Rust implementation
> +[workspace]
> +members = [
> +    "pmxcfs-api-types", # Shared types and error definitions
> +]
> +resolver = "2"
> +
> +[workspace.package]
> +version = "9.0.6"
> +edition = "2024"
> +authors = ["Proxmox Support Team <support@proxmox.com>"]
> +license = "AGPL-3.0"
> +repository = "https://git.proxmox.com/?p=pve-cluster.git"
> +rust-version = "1.85"
> +
> +[workspace.dependencies]

Here we already declare workspace path deps for crates that aren’t
present yet (pmxcfs-config, pmxcfs-memdb, ...). For bisectability,
could we keep this patch minimal and add those workspace
members/path deps in the patches where the crates are introduced?

> +# Internal workspace dependencies
> +pmxcfs-api-types = { path = "pmxcfs-api-types" }
> +pmxcfs-config = { path = "pmxcfs-config" }
> +pmxcfs-memdb = { path = "pmxcfs-memdb" }
> +pmxcfs-dfsm = { path = "pmxcfs-dfsm" }
> +pmxcfs-rrd = { path = "pmxcfs-rrd" }
> +pmxcfs-status = { path = "pmxcfs-status" }
> +pmxcfs-ipc = { path = "pmxcfs-ipc" }
> +pmxcfs-services = { path = "pmxcfs-services" }
> +pmxcfs-logger = { path = "pmxcfs-logger" }
> +
> +# Core async runtime
> +tokio = { version = "1.35", features = ["full"] }
> +tokio-util = "0.7"
> +async-trait = "0.1"
> +

If the goal is to pin external crate versions centrally from the start,
maybe limit [workspace.dependencies] here to the crates actually used
by pmxcfs-api-types (thiserror, libc) and extend it as new crates are
added.

> +# Error handling
> +anyhow = "1.0"
> +thiserror = "1.0"
> +
> +# Logging and tracing
> +tracing = "0.1"
> +tracing-subscriber = { version = "0.3", features = ["env-filter"] }
> +
> +# Serialization
> +serde = { version = "1.0", features = ["derive"] }
> +serde_json = "1.0"
> +bincode = "1.3"
> +
> +# Network and cluster
> +bytes = "1.5"
> +sha2 = "0.10"
> +bytemuck = { version = "1.14", features = ["derive"] }
> +
> +# System integration
> +libc = "0.2"
> +nix = { version = "0.27", features = ["fs", "process", "signal", "user", "socket"] }
> +users = "0.11"
> +
> +# Corosync/CPG bindings
> +rust-corosync = "0.1"
> +
> +# Enum conversions
> +num_enum = "0.7"
> +
> +# Concurrency primitives
> +parking_lot = "0.12"
> +
> +# Utilities
> +chrono = "0.4"
> +futures = "0.3"
> +
> +# Development dependencies
> +tempfile = "3.8"
> +
> +[workspace.lints.clippy]
> +uninlined_format_args = "warn"
> +
> +[profile.release]
> +lto = true
> +codegen-units = 1
> +opt-level = 3
> +strip = true
> +
> +[profile.dev]
> +opt-level = 1
> +debug = true
> diff --git a/src/pmxcfs-rs/pmxcfs-api-types/Cargo.toml b/src/pmxcfs-rs/pmxcfs-api-types/Cargo.toml
> new file mode 100644
> index 00000000..cdce7951
> --- /dev/null
> +++ b/src/pmxcfs-rs/pmxcfs-api-types/Cargo.toml
> @@ -0,0 +1,19 @@
> +[package]
> +name = "pmxcfs-api-types"
> +description = "Shared types and error definitions for pmxcfs"
> +
> +version.workspace = true
> +edition.workspace = true
> +authors.workspace = true
> +license.workspace = true
> +repository.workspace = true
> +
> +[lints]
> +workspace = true
> +
> +[dependencies]
> +# Error handling
> +thiserror.workspace = true
> +
> +# System integration
> +libc.workspace = true
> diff --git a/src/pmxcfs-rs/pmxcfs-api-types/README.md b/src/pmxcfs-rs/pmxcfs-api-types/README.md
> new file mode 100644
> index 00000000..da8304ae
> --- /dev/null
> +++ b/src/pmxcfs-rs/pmxcfs-api-types/README.md
> @@ -0,0 +1,105 @@
> +# pmxcfs-api-types
> +
> +**Shared Types and Error Definitions** for pmxcfs.
> +
> +This crate provides common types, error definitions, and message formats used across all pmxcfs crates. It serves as the "API contract" between different components.
> +
> +## Overview
> +
> +The crate contains:
> +- **Error types**: `PmxcfsError` with errno mapping for FUSE
> +- **Message types**: `FuseMessage`, `KvStoreMessage`, `ApplicationMessage`

These types and the serialization helpers mentioned below aren’t part
of this diff. Could you re-check README.md and the commit message so
they match the code?

> +- **Shared types**: `MemberInfo`, `NodeSyncInfo`
> +- **Serialization**: C-compatible wire format helpers
> +
> +## Error Types
> +
> +### PmxcfsError
> +
> +Type-safe error enum with automatic errno conversion.
> +
> +### errno Mapping
> +
> +Errors automatically convert to POSIX errno values for FUSE.
> +
> +| Error | errno | Value |
> +|-------|-------|-------|
> +| `NotFound` | `ENOENT` | 2 |
> +| `PermissionDenied` | `EPERM` | 1 |
> +| `AlreadyExists` | `EEXIST` | 17 |
> +| `NotADirectory` | `ENOTDIR` | 20 |
> +| `IsADirectory` | `EISDIR` | 21 |
> +| `DirectoryNotEmpty` | `ENOTEMPTY` | 39 |
> +| `FileTooLarge` | `EFBIG` | 27 |
> +| `ReadOnlyFilesystem` | `EROFS` | 30 |
> +| `NoQuorum` | `EACCES` | 13 |
> +| `Timeout` | `ETIMEDOUT` | 110 |
> +
> +## Message Types
> +
> +### FuseMessage
> +
> +Filesystem operations broadcast through the cluster (via DFSM). Uses C-compatible wire format compatible with `dcdb.c`.
> +
> +### KvStoreMessage
> +
> +Status and metrics synchronization (via kvstore DFSM). Uses C-compatible wire format.
> +
> +### ApplicationMessage
> +
> +Wrapper for either FuseMessage or KvStoreMessage, used by DFSM to handle both filesystem and status messages with type safety.
> +
> +## Shared Types
> +
> +### MemberInfo
> +
> +Cluster member information.
> +
> +### NodeSyncInfo
> +
> +DFSM synchronization state.
> +
> +## C to Rust Mapping
> +
> +### Error Handling
> +
> +**C Version (cfs-utils.h):**
> +- Return codes: `0` = success, negative = error
> +- errno-based error reporting
> +- Manual error checking everywhere
> +
> +**Rust Version:**
> +- `Result<T, PmxcfsError>` type
> +
> +### Message Types
> +
> +**C Version (dcdb.h):**
> +
> +**Rust Version:**
> +- Type-safe enums
> +
> +## Key Differences from C Implementation
> +
> +All message types have `serialize()` and `deserialize()` methods that produce byte-for-byte compatible formats with the C implementation.
> +
> +## Known Issues / TODOs
> +
> +### Missing Features
> +- None identified
> +
> +### Compatibility
> +- **Wire format**: 100% compatible with C implementation
> +- **errno values**: Match POSIX standards
> +- **Message types**: All C message types covered
> +
> +## References
> +
> +### C Implementation
> +- `src/pmxcfs/cfs-utils.h` - Utility types and error codes
> +- `src/pmxcfs/dcdb.h` - FUSE message types
> +- `src/pmxcfs/status.h` - KvStore message types
> +
> +### Related Crates
> +- **pmxcfs-dfsm**: Uses ApplicationMessage for cluster sync
> +- **pmxcfs-memdb**: Uses PmxcfsError for database operations
> +- **pmxcfs**: Uses FuseMessage for FUSE operations
> diff --git a/src/pmxcfs-rs/pmxcfs-api-types/src/lib.rs b/src/pmxcfs-rs/pmxcfs-api-types/src/lib.rs
> new file mode 100644
> index 00000000..ae0e5eb0
> --- /dev/null
> +++ b/src/pmxcfs-rs/pmxcfs-api-types/src/lib.rs
> @@ -0,0 +1,152 @@
> +use thiserror::Error;
> +
> +/// Error types for pmxcfs operations
> +#[derive(Error, Debug)]
> +pub enum PmxcfsError {

nit: the error-related parts could be moved into a dedicated error.rs
module

> +    #[error("I/O error: {0}")]
> +    Io(#[from] std::io::Error),
> +
> +    #[error("Database error: {0}")]
> +    Database(String),
> +
> +    #[error("FUSE error: {0}")]
> +    Fuse(String),
> +
> +    #[error("Cluster error: {0}")]
> +    Cluster(String),
> +
> +    #[error("Corosync error: {0}")]
> +    Corosync(String),
> +
> +    #[error("Configuration error: {0}")]
> +    Configuration(String),
> +
> +    #[error("System error: {0}")]
> +    System(String),
> +
> +    #[error("IPC error: {0}")]
> +    Ipc(String),
> +
> +    #[error("Permission denied")]
> +    PermissionDenied,
> +
> +    #[error("Not found: {0}")]
> +    NotFound(String),
> +
> +    #[error("Already exists: {0}")]
> +    AlreadyExists(String),
> +
> +    #[error("Invalid argument: {0}")]
> +    InvalidArgument(String),
> +
> +    #[error("Not a directory: {0}")]
> +    NotADirectory(String),
> +
> +    #[error("Is a directory: {0}")]
> +    IsADirectory(String),
> +
> +    #[error("Directory not empty: {0}")]
> +    DirectoryNotEmpty(String),
> +
> +    #[error("No quorum")]
> +    NoQuorum,
> +
> +    #[error("Read-only filesystem")]
> +    ReadOnlyFilesystem,
> +
> +    #[error("File too large")]
> +    FileTooLarge,
> +
> +    #[error("Lock error: {0}")]
> +    Lock(String),
> +
> +    #[error("Timeout")]
> +    Timeout,
> +
> +    #[error("Invalid path: {0}")]
> +    InvalidPath(String),
> +}
> +
> +impl PmxcfsError {
> +    /// Convert error to errno value for FUSE operations
> +    pub fn to_errno(&self) -> i32 {
> +        match self {
> +            PmxcfsError::NotFound(_) => libc::ENOENT,
> +            PmxcfsError::PermissionDenied => libc::EPERM,
> +            PmxcfsError::AlreadyExists(_) => libc::EEXIST,
> +            PmxcfsError::NotADirectory(_) => libc::ENOTDIR,
> +            PmxcfsError::IsADirectory(_) => libc::EISDIR,
> +            PmxcfsError::DirectoryNotEmpty(_) => libc::ENOTEMPTY,
> +            PmxcfsError::InvalidArgument(_) => libc::EINVAL,
> +            PmxcfsError::FileTooLarge => libc::EFBIG,
> +            PmxcfsError::ReadOnlyFilesystem => libc::EROFS,
> +            PmxcfsError::NoQuorum => libc::EACCES,
> +            PmxcfsError::Timeout => libc::ETIMEDOUT,
> +            PmxcfsError::Io(e) => match e.raw_os_error() {
> +                Some(errno) => errno,
> +                None => libc::EIO,
> +            },
> +            _ => libc::EIO,

Please check against the C implementation, but:

"PermissionDenied" should likely map to EACCES rather than EPERM. In
FUSE/POSIX, EACCES is the usual error when file permissions deny
access, whereas EPERM is usually for operations that require special
privileges (like changing ownership).

"InvalidPath" maps better to EINVAL. EIO suggests a hardware/disk
failure, whereas InvalidPath implies an argument issue.

Also, "Lock" should be mapped explicitly: EBUSY (resource busy / lock
contention), EDEADLK (deadlock) or EAGAIN, depending on the semantics.

In general, can we minimize the number of errors falling into the
generic EIO branch?
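
Roughly what I have in mind, as a sketch only: the enum below is a
reduced copy with just the variants discussed here, the errno constants
are inlined with their Linux values so the snippet builds without the
libc crate (the real code would keep using libc::*), and EBUSY for
"Lock" is an assumption that still needs checking against the C side.

```rust
// Linux errno values, inlined only so this sketch builds without the
// libc crate; the actual code would keep using libc::EACCES etc.
const ENOENT: i32 = 2;
const EIO: i32 = 5;
const EACCES: i32 = 13;
const EBUSY: i32 = 16;
const EINVAL: i32 = 22;

/// Reduced, illustrative copy of PmxcfsError.
#[derive(Debug)]
pub enum PmxcfsError {
    NotFound(String),
    PermissionDenied,
    InvalidPath(String),
    Lock(String),
    Database(String),
}

impl PmxcfsError {
    /// Convert error to errno value for FUSE operations.
    pub fn to_errno(&self) -> i32 {
        match self {
            PmxcfsError::NotFound(_) => ENOENT,
            // File permission failure rather than a privilege problem.
            PmxcfsError::PermissionDenied => EACCES,
            // A bad path is an argument error, not an I/O failure.
            PmxcfsError::InvalidPath(_) => EINVAL,
            // Assumes "lock held by someone else"; EAGAIN or EDEADLK
            // may fit better depending on the semantics.
            PmxcfsError::Lock(_) => EBUSY,
            // No wildcard arm: a new variant fails to compile until
            // someone picks an errno for it.
            PmxcfsError::Database(_) => EIO,
        }
    }
}
```

Dropping the `_ =>` arm is the main point: the compiler then forces an
explicit decision for every variant instead of silently returning EIO.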

> +        }
> +    }
> +}
> +
> +/// Result type for pmxcfs operations
> +pub type Result<T> = std::result::Result<T, PmxcfsError>;
> +
> +/// VM/CT types
> +#[derive(Debug, Clone, Copy, PartialEq, Eq)]

If this is used in wire contexts please add #[repr(u8)] to ensure a
stable ABI.

> +pub enum VmType {
> +    Qemu = 1,
> +    Lxc = 3,

There’s a gap between values 1 -> 3: is 2 reserved?
If so, maybe add a short comment.
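
For illustration, a sketch of both points together. The helper names
from_wire()/as_wire() are hypothetical and not part of the patch, and
the sketch simply rejects 2 since I don't know whether it is reserved.

```rust
/// VM/CT types; the discriminants are the values used on the wire.
#[repr(u8)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum VmType {
    Qemu = 1,
    // 2 is not assigned in this sketch; if it is reserved, a comment
    // here should say for what.
    Lxc = 3,
}

impl VmType {
    /// Hypothetical helper: decode a wire byte, rejecting unknown values.
    pub fn from_wire(raw: u8) -> Option<Self> {
        match raw {
            1 => Some(VmType::Qemu),
            3 => Some(VmType::Lxc),
            _ => None,
        }
    }

    /// Hypothetical helper: encode as the single wire byte.
    pub fn as_wire(self) -> u8 {
        self as u8
    }
}
```

With #[repr(u8)] the enum is guaranteed to occupy one byte, and going
through an explicit decode function avoids ever constructing an invalid
discriminant from untrusted input.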

> +}
> +
> +impl VmType {
> +    /// Returns the directory name where config files are stored
> +    pub fn config_dir(&self) -> &'static str {
> +        match self {
> +            VmType::Qemu => "qemu-server",
> +            VmType::Lxc => "lxc",
> +        }
> +    }
> +}
> +
> +impl std::fmt::Display for VmType {
> +    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
> +        match self {
> +            VmType::Qemu => write!(f, "qemu"),
> +            VmType::Lxc => write!(f, "lxc"),
> +        }
> +    }
> +}
> +
> +/// VM/CT entry for vmlist
> +#[derive(Debug, Clone)]
> +pub struct VmEntry {
> +    pub vmid: u32,
> +    pub vmtype: VmType,
> +    pub node: String,
> +    /// Per-VM version counter (increments when this VM's config changes)
> +    pub version: u32,
> +}
> +
> +/// Information about a cluster member
> +///
> +/// This is a shared type used by both cluster and DFSM modules
> +#[derive(Debug, Clone)]
> +pub struct MemberInfo {
> +    pub node_id: u32,
> +    pub pid: u32,
> +    pub joined_at: u64,
> +}
> +
> +/// Node synchronization info for DFSM state sync
> +///
> +/// Used during DFSM synchronization to track which nodes have provided state
> +#[derive(Debug, Clone)]
> +pub struct NodeSyncInfo {
> +    pub nodeid: u32,

We have "nodeid" here but "node_id" in MemberInfo; these should be
aligned.

> +    pub pid: u32,
> +    pub state: Option<Vec<u8>>,
> +    pub synced: bool,
> +}



_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel

^ permalink raw reply	[relevance 6%]

* Re: [pve-devel] [PATCH pve-cluster 02/15] pmxcfs-rs: add pmxcfs-config crate
  @ 2026-01-23 15:01  6%   ` Samuel Rufinatscha
  2026-01-26  9:43  6%     ` Kefu Chai
  0 siblings, 1 reply; 117+ results
From: Samuel Rufinatscha @ 2026-01-23 15:01 UTC (permalink / raw)
  To: Proxmox VE development discussion, Kefu Chai

comments inline

On 1/6/26 3:25 PM, Kefu Chai wrote:
> Add configuration management crate that provides:
> - Config struct for runtime configuration
> - Node hostname, IP, and group ID tracking
> - Debug and local mode flags
> - Thread-safe configuration access via parking_lot Mutex
> 
> This is a foundational crate with no internal dependencies, only
> requiring parking_lot for synchronization. Other crates will use
> this for accessing runtime configuration.
> 
> Signed-off-by: Kefu Chai <k.chai@proxmox.com>
> ---
>   src/pmxcfs-rs/Cargo.toml               |   3 +-
>   src/pmxcfs-rs/pmxcfs-config/Cargo.toml |  16 +
>   src/pmxcfs-rs/pmxcfs-config/README.md  | 127 +++++++
>   src/pmxcfs-rs/pmxcfs-config/src/lib.rs | 471 +++++++++++++++++++++++++
>   4 files changed, 616 insertions(+), 1 deletion(-)
>   create mode 100644 src/pmxcfs-rs/pmxcfs-config/Cargo.toml
>   create mode 100644 src/pmxcfs-rs/pmxcfs-config/README.md
>   create mode 100644 src/pmxcfs-rs/pmxcfs-config/src/lib.rs
> 
> diff --git a/src/pmxcfs-rs/Cargo.toml b/src/pmxcfs-rs/Cargo.toml
> index 15d88f52..28e20bb7 100644
> --- a/src/pmxcfs-rs/Cargo.toml
> +++ b/src/pmxcfs-rs/Cargo.toml
> @@ -1,7 +1,8 @@
>   # Workspace root for pmxcfs Rust implementation
>   [workspace]
>   members = [
> -    "pmxcfs-api-types", # Shared types and error definitions
> +    "pmxcfs-api-types",  # Shared types and error definitions
> +    "pmxcfs-config",     # Configuration management
>   ]
>   resolver = "2"
>   
> diff --git a/src/pmxcfs-rs/pmxcfs-config/Cargo.toml b/src/pmxcfs-rs/pmxcfs-config/Cargo.toml
> new file mode 100644
> index 00000000..f5a60995
> --- /dev/null
> +++ b/src/pmxcfs-rs/pmxcfs-config/Cargo.toml
> @@ -0,0 +1,16 @@
> +[package]
> +name = "pmxcfs-config"
> +description = "Configuration management for pmxcfs"
> +
> +version.workspace = true
> +edition.workspace = true
> +authors.workspace = true
> +license.workspace = true
> +repository.workspace = true
> +
> +[lints]
> +workspace = true
> +
> +[dependencies]
> +# Concurrency primitives
> +parking_lot.workspace = true
> diff --git a/src/pmxcfs-rs/pmxcfs-config/README.md b/src/pmxcfs-rs/pmxcfs-config/README.md
> new file mode 100644
> index 00000000..c06b2170
> --- /dev/null
> +++ b/src/pmxcfs-rs/pmxcfs-config/README.md
> @@ -0,0 +1,127 @@
> +# pmxcfs-config
> +
> +**Configuration Management** and **Cluster Services** for pmxcfs.
> +
> +This crate provides configuration structures and cluster integration services including quorum tracking and cluster configuration monitoring via Corosync APIs.
> +
> +## Overview
> +
> +This crate contains:
> +1. **Config struct**: Runtime configuration (node name, IPs, flags)
> +2. Integration with Corosync services (tracked in main pmxcfs crate):
> +   - **QuorumService** (`pmxcfs/src/quorum_service.rs`) - Quorum monitoring
> +   - **ClusterConfigService** (`pmxcfs/src/cluster_config_service.rs`) - Config tracking

This patch only contains the Config struct, not the cluster services
(QuorumService, ClusterConfigService). Please revisit the commit
message and README.

> +
> +## Config Struct
> +
> +The `Config` struct holds daemon-wide configuration including node hostname, IP address, www-data group ID, debug flag, local mode flag, and cluster name.
> +
> +## Cluster Services
> +
> +The following services are implemented in the main pmxcfs crate but documented here for completeness.
> +
> +### QuorumService
> +
> +**C Equivalent:** `src/pmxcfs/quorum.c` - `service_quorum_new()`
> +**Rust Location:** `src/pmxcfs-rs/pmxcfs/src/quorum_service.rs`
> +
> +Monitors cluster quorum status via Corosync quorum API.
> +
> +#### Features
> +- Tracks quorum state (quorate/inquorate)
> +- Monitors member list changes
> +- Automatic reconnection on Corosync restart
> +- Updates `Status` quorum flag
> +
> +#### C to Rust Mapping
> +
> +| C Function | Rust Equivalent | Location |
> +|-----------|-----------------|----------|
> +| `service_quorum_new()` | `QuorumService::new()` | quorum_service.rs |
> +| `service_quorum_destroy()` | (Drop trait / finalize) | Automatic |
> +| `quorum_notification_fn` | quorum_notification closure | quorum_service.rs |
> +| `nodelist_notification_fn` | nodelist_notification closure | quorum_service.rs |
> +
> +#### Quorum Notifications
> +
> +The service monitors quorum state changes and member list changes, updating the Status accordingly.
> +
> +### ClusterConfigService
> +
> +**C Equivalent:** `src/pmxcfs/confdb.c` - `service_confdb_new()`
> +**Rust Location:** `src/pmxcfs-rs/pmxcfs/src/cluster_config_service.rs`
> +
> +Monitors Corosync cluster configuration (cmap) and tracks node membership.
> +
> +#### Features
> +- Monitors cluster membership via Corosync cmap API
> +- Tracks node additions/removals
> +- Registers nodes in Status
> +- Automatic reconnection on Corosync restart
> +
> +#### C to Rust Mapping
> +
> +| C Function | Rust Equivalent | Location |
> +|-----------|-----------------|----------|
> +| `service_confdb_new()` | `ClusterConfigService::new()` | cluster_config_service.rs |
> +| `service_confdb_destroy()` | (Drop trait / finalize) | Automatic |
> +| `confdb_track_fn` | (direct cmap queries) | Different approach |
> +
> +#### Configuration Tracking
> +
> +The service monitors:
> +- `nodelist.node.*.nodeid` - Node IDs
> +- `nodelist.node.*.name` - Node names
> +- `nodelist.node.*.ring*_addr` - Node IP addresses
> +
> +Updates `Status` with current cluster membership.
> +
> +## Key Differences from C Implementation
> +
> +### Cluster Config Service API
> +
> +**C Version (confdb.c):**
> +- Uses deprecated confdb API
> +- Track changes via confdb notifications
> +
> +**Rust Version:**
> +- Uses modern cmap API
> +- Direct cmap queries
> +
> +Both read the same data, but Rust uses the modern Corosync API.
> +
> +### Service Integration
> +
> +**C Version:**
> +- qb_loop manages lifecycle
> +
> +**Rust Version:**
> +- Service trait abstracts lifecycle
> +- ServiceManager handles retry
> +- Tokio async dispatch
> +
> +## Known Issues / TODOs
> +
> +### Compatibility
> +- **Quorum tracking**: Compatible with C implementation
> +- **Node registration**: Equivalent behavior
> +- **cmap vs confdb**: Rust uses modern cmap API (C uses deprecated confdb)
> +
> +### Missing Features
> +- None identified
> +
> +### Behavioral Differences (Benign)
> +- **API choice**: Rust uses cmap, C uses confdb (both read same data)
> +- **Lifecycle**: Rust uses Service trait, C uses manual lifecycle
> +
> +## References
> +
> +### C Implementation
> +- `src/pmxcfs/quorum.c` / `quorum.h` - Quorum service
> +- `src/pmxcfs/confdb.c` / `confdb.h` - Cluster config service
> +
> +### Related Crates
> +- **pmxcfs**: Main daemon with QuorumService and ClusterConfigService
> +- **pmxcfs-status**: Status tracking updated by these services
> +- **pmxcfs-services**: Service framework used by both services
> +- **rust-corosync**: Corosync FFI bindings
> diff --git a/src/pmxcfs-rs/pmxcfs-config/src/lib.rs b/src/pmxcfs-rs/pmxcfs-config/src/lib.rs
> new file mode 100644
> index 00000000..5e1ee1b2
> --- /dev/null
> +++ b/src/pmxcfs-rs/pmxcfs-config/src/lib.rs
> @@ -0,0 +1,471 @@
> +use parking_lot::RwLock;
> +use std::sync::Arc;
> +
> +/// Global configuration for pmxcfs
> +pub struct Config {
> +    /// Node name (hostname without domain)
> +    pub nodename: String,
> +
> +    /// Node IP address
> +    pub node_ip: String,

Consider using std::net::IpAddr (or SocketAddr if a port is part of the
value). Tests currently mix IP vs IP:PORT, so it’s unclear what node_ip
is supposed to represent.
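
A small sketch of the typed variant, assuming node_ip is meant to be a
bare address without a port. The struct is reduced to two fields and
the parse() constructor is a made-up name for illustration.

```rust
use std::net::{AddrParseError, IpAddr};

/// Reduced, illustrative Config with a typed address.
#[derive(Debug, Clone)]
pub struct Config {
    pub nodename: String,
    pub node_ip: IpAddr,
}

impl Config {
    /// Hypothetical constructor: rejects anything that is not a bare
    /// IPv4/IPv6 address, e.g. "192.168.1.10:8006".
    pub fn parse(nodename: &str, node_ip: &str) -> Result<Self, AddrParseError> {
        Ok(Self {
            nodename: nodename.to_string(),
            node_ip: node_ip.parse()?,
        })
    }
}
```

This way the IP vs IP:PORT ambiguity surfaces as a parse error at
construction time instead of later in whatever consumes the string.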

> +
> +    /// www-data group ID for file permissions
> +    pub www_data_gid: u32,
> +
> +    /// Debug mode enabled
> +    pub debug: bool,
> +
> +    /// Force local mode (no clustering)
> +    pub local_mode: bool,
> +
> +    /// Cluster name (CPG group name)
> +    pub cluster_name: String,
> +
> +    /// Debug level (0 = normal, 1+ = debug) - mutable at runtime
> +    debug_level: RwLock<u8>,

The crate docs say “The Config struct uses Arc<AtomicU8> for
debug_level”, but the implementation uses parking_lot::RwLock<u8>.
Unless we need lock coupling with other fields, AtomicU8 would likely
be sufficient (and cheaper) for debug_level. Also, please re-check the
commit message, which mentions parking_lot::Mutex.
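
A sketch of the AtomicU8 variant, reduced to the one runtime-mutable
field. Ordering::Relaxed is an assumption that holds only as long as
debug_level does not have to be ordered against other memory accesses.

```rust
use std::sync::atomic::{AtomicU8, Ordering};

/// Reduced, illustrative Config; only the mutable field is shown.
#[derive(Debug)]
pub struct Config {
    debug_level: AtomicU8,
}

impl Config {
    pub fn new(debug_level: u8) -> Self {
        Self {
            debug_level: AtomicU8::new(debug_level),
        }
    }

    /// Get current debug level (0 = normal, 1+ = debug)
    pub fn debug_level(&self) -> u8 {
        self.debug_level.load(Ordering::Relaxed)
    }

    /// Set debug level (0 = normal, 1+ = debug)
    pub fn set_debug_level(&self, level: u8) {
        self.debug_level.store(level, Ordering::Relaxed);
    }
}

// AtomicU8 is not Clone, so a manual impl that snapshots the current
// value is still needed, same as with the RwLock today.
impl Clone for Config {
    fn clone(&self) -> Self {
        Self::new(self.debug_level())
    }
}
```

This would also let pmxcfs-config drop the parking_lot dependency, as
long as nothing else in the crate needs it.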

> +}
> +
> +impl Clone for Config {
> +    fn clone(&self) -> Self {
> +        Self {
> +            nodename: self.nodename.clone(),
> +            node_ip: self.node_ip.clone(),
> +            www_data_gid: self.www_data_gid,
> +            debug: self.debug,
> +            local_mode: self.local_mode,
> +            cluster_name: self.cluster_name.clone(),
> +            debug_level: RwLock::new(*self.debug_level.read()),
> +        }
> +    }
> +}
> +
> +impl std::fmt::Debug for Config {
> +    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
> +        f.debug_struct("Config")
> +            .field("nodename", &self.nodename)
> +            .field("node_ip", &self.node_ip)
> +            .field("www_data_gid", &self.www_data_gid)
> +            .field("debug", &self.debug)
> +            .field("local_mode", &self.local_mode)
> +            .field("cluster_name", &self.cluster_name)
> +            .field("debug_level", &*self.debug_level.read())
> +            .finish()
> +    }
> +}
> +
> +impl Config {
> +    pub fn new(
> +        nodename: String,
> +        node_ip: String,
> +        www_data_gid: u32,
> +        debug: bool,
> +        local_mode: bool,
> +        cluster_name: String,
> +    ) -> Arc<Self> {

The constructor returns Arc<Config>. I think we could keep
new() -> Self and provide a convenience constructor
shared() -> Arc<Self>. This would allow local usage (e.g. in tests)
without heap-allocating the struct.
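
Roughly like this (a sketch with only two of the fields, to keep it
short; shared() is the name proposed above, not an existing API):

```rust
use std::sync::Arc;

/// Reduced stand-in for the reviewed Config.
#[derive(Debug, Clone)]
pub struct Config {
    nodename: String,
    local_mode: bool,
}

impl Config {
    /// Plain constructor: the caller decides where the value lives.
    pub fn new(nodename: String, local_mode: bool) -> Self {
        Self {
            nodename,
            local_mode,
        }
    }

    /// Convenience constructor for the common shared case.
    pub fn shared(nodename: String, local_mode: bool) -> Arc<Self> {
        Arc::new(Self::new(nodename, local_mode))
    }

    pub fn nodename(&self) -> &str {
        &self.nodename
    }

    pub fn is_local_mode(&self) -> bool {
        self.local_mode
    }
}
```

Callers that need sharing use Config::shared(..), while unit tests can
use Config::new(..) on the stack.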

> +        let debug_level = if debug { 1 } else { 0 };

debug_level is derived from debug at creation time, but thereafter
set_debug_level() does not update debug, so is_debug() continues to
reflect the initial flag rather than the effective debug level.
is_debug() should just be a helper that returns self.debug_level() > 0,
and the debug field should probably be removed entirely.
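
As a sketch of that shape (reduced to the debug state; the storage type
here is an assumption and independent of the point being made):

```rust
use std::sync::atomic::{AtomicU8, Ordering};

/// Reduced stand-in for the reviewed Config, without a separate
/// `debug: bool` field: the level is the single source of truth.
pub struct Config {
    debug_level: AtomicU8,
}

impl Config {
    /// The boolean flag only seeds the initial level (false -> 0, true -> 1).
    pub fn new(debug: bool) -> Self {
        Self {
            debug_level: AtomicU8::new(u8::from(debug)),
        }
    }

    pub fn debug_level(&self) -> u8 {
        self.debug_level.load(Ordering::Relaxed)
    }

    pub fn set_debug_level(&self, level: u8) {
        self.debug_level.store(level, Ordering::Relaxed);
    }

    /// Derived from the level, so it cannot go stale.
    pub fn is_debug(&self) -> bool {
        self.debug_level() > 0
    }
}
```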

> +        Arc::new(Self {
> +            nodename,
> +            node_ip,
> +            www_data_gid,
> +            debug,
> +            local_mode,
> +            cluster_name,
> +            debug_level: RwLock::new(debug_level),
> +        })
> +    }
> +
> +    pub fn cluster_name(&self) -> &str {
> +        &self.cluster_name
> +    }
> +
> +    pub fn nodename(&self) -> &str {
> +        &self.nodename
> +    }
> +
> +    pub fn node_ip(&self) -> &str {
> +        &self.node_ip
> +    }
> +
> +    pub fn www_data_gid(&self) -> u32 {
> +        self.www_data_gid
> +    }
> +
> +    pub fn is_debug(&self) -> bool {
> +        self.debug
> +    }
> +
> +    pub fn is_local_mode(&self) -> bool {
> +        self.local_mode
> +    }
> +
> +    /// Get current debug level (0 = normal, 1+ = debug)
> +    pub fn debug_level(&self) -> u8 {
> +        *self.debug_level.read()
> +    }
> +
> +    /// Set debug level (0 = normal, 1+ = debug)
> +    pub fn set_debug_level(&self, level: u8) {
> +        *self.debug_level.write() = level;
> +    }

Right now most fields are pub, but getters are exposed as well. This
will make it harder to enforce invariants.
I would suggest either making the fields private and keeping the
getters, or keeping the fields public and dropping the getters.

> +}
> +
> +#[cfg(test)]
> +mod tests {
> +    //! Unit tests for Config struct
> +    //!
> +    //! This test module provides comprehensive coverage for:
> +    //! - Configuration creation and initialization
> +    //! - Getter methods for all configuration fields
> +    //! - Debug level mutation and thread safety
> +    //! - Concurrent access patterns (reads and writes)
> +    //! - Clone independence
> +    //! - Debug formatting
> +    //! - Edge cases (empty strings, long strings, special characters, unicode)
> +    //!
> +    //! ## Thread Safety
> +    //!
> +    //! The Config struct uses `Arc<AtomicU8>` for debug_level to allow
> +    //! safe concurrent reads and writes. Tests verify:
> +    //! - 10 threads × 100 operations (concurrent modifications)
> +    //! - 20 threads × 1000 operations (concurrent reads)
> +    //!
> +    //! ## Edge Cases
> +    //!
> +    //! Tests cover various edge cases including:
> +    //! - Empty strings for node/cluster names
> +    //! - Long strings (1000+ characters)
> +    //! - Special characters in strings
> +    //! - Unicode support (emoji, non-ASCII characters)
> +
> +    use super::*;
> +    use std::thread;
> +
> +    // ===== Basic Construction Tests =====
> +
> +    #[test]
> +    fn test_config_creation() {
> +        let config = Config::new(
> +            "node1".to_string(),
> +            "192.168.1.10".to_string(),
> +            33,
> +            false,
> +            false,
> +            "pmxcfs".to_string(),
> +        );
> +
> +        assert_eq!(config.nodename(), "node1");
> +        assert_eq!(config.node_ip(), "192.168.1.10");
> +        assert_eq!(config.www_data_gid(), 33);
> +        assert!(!config.is_debug());
> +        assert!(!config.is_local_mode());
> +        assert_eq!(config.cluster_name(), "pmxcfs");
> +        assert_eq!(
> +            config.debug_level(),
> +            0,
> +            "Debug level should be 0 when debug is false"
> +        );
> +    }
> +
> +    #[test]
> +    fn test_config_creation_with_debug() {
> +        let config = Config::new(
> +            "node2".to_string(),
> +            "10.0.0.5".to_string(),
> +            1000,
> +            true,
> +            false,
> +            "test-cluster".to_string(),
> +        );
> +
> +        assert!(config.is_debug());
> +        assert_eq!(
> +            config.debug_level(),
> +            1,
> +            "Debug level should be 1 when debug is true"
> +        );
> +    }
> +
> +    #[test]
> +    fn test_config_creation_local_mode() {
> +        let config = Config::new(
> +            "localhost".to_string(),
> +            "127.0.0.1".to_string(),
> +            33,
> +            false,
> +            true,
> +            "local".to_string(),
> +        );
> +
> +        assert!(config.is_local_mode());
> +        assert!(!config.is_debug());
> +    }
> +
> +    // ===== Getter Tests =====
> +
> +    #[test]
> +    fn test_all_getters() {
> +        let config = Config::new(
> +            "testnode".to_string(),
> +            "172.16.0.1".to_string(),
> +            999,
> +            true,
> +            true,
> +            "my-cluster".to_string(),
> +        );
> +
> +        // Test all getter methods
> +        assert_eq!(config.nodename(), "testnode");
> +        assert_eq!(config.node_ip(), "172.16.0.1");
> +        assert_eq!(config.www_data_gid(), 999);
> +        assert!(config.is_debug());
> +        assert!(config.is_local_mode());
> +        assert_eq!(config.cluster_name(), "my-cluster");
> +        assert_eq!(config.debug_level(), 1);
> +    }
> +
> +    // ===== Debug Level Mutation Tests =====
> +
> +    #[test]
> +    fn test_debug_level_mutation() {
> +        let config = Config::new(
> +            "node1".to_string(),
> +            "192.168.1.1".to_string(),
> +            33,
> +            false,
> +            false,
> +            "pmxcfs".to_string(),
> +        );
> +
> +        assert_eq!(config.debug_level(), 0);
> +
> +        config.set_debug_level(1);
> +        assert_eq!(config.debug_level(), 1);
> +
> +        config.set_debug_level(5);
> +        assert_eq!(config.debug_level(), 5);
> +
> +        config.set_debug_level(0);
> +        assert_eq!(config.debug_level(), 0);
> +    }
> +
> +    #[test]
> +    fn test_debug_level_max_value() {
> +        let config = Config::new(
> +            "node1".to_string(),
> +            "192.168.1.1".to_string(),
> +            33,
> +            false,
> +            false,
> +            "pmxcfs".to_string(),
> +        );
> +
> +        config.set_debug_level(255);
> +        assert_eq!(config.debug_level(), 255);
> +
> +        config.set_debug_level(0);
> +        assert_eq!(config.debug_level(), 0);
> +    }
> +
> +    // ===== Thread Safety Tests =====
> +
> +    #[test]
> +    fn test_debug_level_thread_safety() {
> +        let config = Config::new(
> +            "node1".to_string(),
> +            "192.168.1.1".to_string(),
> +            33,
> +            false,
> +            false,
> +            "pmxcfs".to_string(),
> +        );
> +
> +        let config_clone = Arc::clone(&config);
> +
> +        // Spawn multiple threads that concurrently modify debug level
> +        let handles: Vec<_> = (0..10)
> +            .map(|i| {
> +                let cfg = Arc::clone(&config);
> +                thread::spawn(move || {
> +                    for _ in 0..100 {
> +                        cfg.set_debug_level(i);
> +                        let _ = cfg.debug_level();
> +                    }
> +                })
> +            })
> +            .collect();
> +
> +        // All threads should complete without panicking
> +        for handle in handles {
> +            handle.join().unwrap();
> +        }
> +
> +        // Final value should be one of the values set by threads
> +        let final_level = config_clone.debug_level();
> +        assert!(
> +            final_level < 10,
> +            "Debug level should be < 10, got {final_level}"
> +        );
> +    }
> +
> +    #[test]
> +    fn test_concurrent_reads() {
> +        let config = Config::new(
> +            "node1".to_string(),
> +            "192.168.1.1".to_string(),
> +            33,
> +            true,
> +            false,
> +            "pmxcfs".to_string(),
> +        );
> +
> +        // Spawn multiple threads that concurrently read config
> +        let handles: Vec<_> = (0..20)
> +            .map(|_| {
> +                let cfg = Arc::clone(&config);
> +                thread::spawn(move || {
> +                    for _ in 0..1000 {
> +                        assert_eq!(cfg.nodename(), "node1");
> +                        assert_eq!(cfg.node_ip(), "192.168.1.1");
> +                        assert_eq!(cfg.www_data_gid(), 33);
> +                        assert!(cfg.is_debug());
> +                        assert!(!cfg.is_local_mode());
> +                        assert_eq!(cfg.cluster_name(), "pmxcfs");
> +                    }
> +                })
> +            })
> +            .collect();
> +
> +        for handle in handles {
> +            handle.join().unwrap();
> +        }
> +    }
> +
> +    // ===== Clone Tests =====
> +
> +    #[test]
> +    fn test_config_clone() {
> +        let config1 = Config::new(
> +            "node1".to_string(),
> +            "192.168.1.1".to_string(),
> +            33,
> +            true,
> +            false,
> +            "pmxcfs".to_string(),
> +        );
> +
> +        config1.set_debug_level(5);
> +
> +        let config2 = (*config1).clone();
> +
> +        // Cloned config should have same values
> +        assert_eq!(config2.nodename(), config1.nodename());
> +        assert_eq!(config2.node_ip(), config1.node_ip());
> +        assert_eq!(config2.www_data_gid(), config1.www_data_gid());
> +        assert_eq!(config2.is_debug(), config1.is_debug());
> +        assert_eq!(config2.is_local_mode(), config1.is_local_mode());
> +        assert_eq!(config2.cluster_name(), config1.cluster_name());
> +        assert_eq!(config2.debug_level(), 5);
> +
> +        // Modifying one should not affect the other
> +        config2.set_debug_level(10);
> +        assert_eq!(config1.debug_level(), 5);
> +        assert_eq!(config2.debug_level(), 10);
> +    }
> +
> +    // ===== Debug Formatting Tests =====
> +
> +    #[test]
> +    fn test_debug_format() {
> +        let config = Config::new(
> +            "node1".to_string(),
> +            "192.168.1.1".to_string(),
> +            33,
> +            true,
> +            false,
> +            "pmxcfs".to_string(),
> +        );
> +
> +        let debug_str = format!("{config:?}");
> +
> +        // Check that debug output contains all fields
> +        assert!(debug_str.contains("Config"));
> +        assert!(debug_str.contains("nodename"));
> +        assert!(debug_str.contains("node1"));
> +        assert!(debug_str.contains("node_ip"));
> +        assert!(debug_str.contains("192.168.1.1"));
> +        assert!(debug_str.contains("www_data_gid"));
> +        assert!(debug_str.contains("33"));
> +        assert!(debug_str.contains("debug"));
> +        assert!(debug_str.contains("true"));
> +        assert!(debug_str.contains("local_mode"));
> +        assert!(debug_str.contains("false"));
> +        assert!(debug_str.contains("cluster_name"));
> +        assert!(debug_str.contains("pmxcfs"));
> +        assert!(debug_str.contains("debug_level"));
> +    }
> +
> +    // ===== Edge Cases and Boundary Tests =====
> +
> +    #[test]
> +    fn test_empty_strings() {
> +        let config = Config::new(String::new(), String::new(), 0, false, false, String::new());
> +
> +        assert_eq!(config.nodename(), "");
> +        assert_eq!(config.node_ip(), "");
> +        assert_eq!(config.cluster_name(), "");
> +        assert_eq!(config.www_data_gid(), 0);
> +    }
> +
> +    #[test]
> +    fn test_long_strings() {
> +        let long_name = "a".repeat(1000);
> +        let long_ip = "192.168.1.".to_string() + &"1".repeat(100);
> +        let long_cluster = "cluster-".to_string() + &"x".repeat(500);
> +
> +        let config = Config::new(
> +            long_name.clone(),
> +            long_ip.clone(),
> +            u32::MAX,
> +            true,
> +            true,
> +            long_cluster.clone(),
> +        );
> +
> +        assert_eq!(config.nodename(), long_name);
> +        assert_eq!(config.node_ip(), long_ip);
> +        assert_eq!(config.cluster_name(), long_cluster);
> +        assert_eq!(config.www_data_gid(), u32::MAX);
> +    }
> +
> +    #[test]
> +    fn test_special_characters_in_strings() {
> +        let config = Config::new(
> +            "node-1_test.local".to_string(),
> +            "192.168.1.10:8006".to_string(),
> +            33,
> +            false,
> +            false,
> +            "my-cluster_v2.0".to_string(),
> +        );
> +
> +        assert_eq!(config.nodename(), "node-1_test.local");
> +        assert_eq!(config.node_ip(), "192.168.1.10:8006");
> +        assert_eq!(config.cluster_name(), "my-cluster_v2.0");
> +    }
> +
> +    #[test]
> +    fn test_unicode_in_strings() {
> +        let config = Config::new(
> +            "ノード1".to_string(),
> +            "::1".to_string(),
> +            33,
> +            false,
> +            false,
> +            "集群".to_string(),
> +        );
> +
> +        assert_eq!(config.nodename(), "ノード1");
> +        assert_eq!(config.node_ip(), "::1");
> +        assert_eq!(config.cluster_name(), "集群");
> +    }
> +}



_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel


* Re: [pve-devel] [PATCH pve-cluster 01/15] pmxcfs-rs: add workspace and pmxcfs-api-types crate
  2026-01-23 14:17  6%   ` Samuel Rufinatscha
@ 2026-01-26  9:00  6%     ` Kefu Chai
  0 siblings, 0 replies; 117+ results
From: Kefu Chai @ 2026-01-26  9:00 UTC (permalink / raw)
  To: Samuel Rufinatscha, Proxmox VE development discussion

On Fri Jan 23, 2026 at 10:17 PM CST, Samuel Rufinatscha wrote:
> Thanks for the series. I’ve started reviewing patches 1–6; sending
> notes for patch 1 first, and I’ll follow up with comments on the
> others once I’ve gone through them in more depth.

Hi Samuel, thanks for your review. Replies inline.
>
> comments inline
>
> On 1/6/26 3:25 PM, Kefu Chai wrote:
>> Initialize the Rust workspace for the pmxcfs rewrite project.
>> 
>> Add pmxcfs-api-types crate which provides foundational types:
>> - PmxcfsError: Error type with errno mapping for FUSE operations
>> - FuseMessage: Filesystem operation messages
>> - KvStoreMessage: Status synchronization messages
>> - ApplicationMessage: Wrapper enum for both message types
>> - VmType: VM type enum (Qemu, Lxc)
>> 
>> This is the foundation crate with no internal dependencies, only
>> requiring thiserror and libc. All other crates will depend on these
>> shared type definitions.
>> 
>> Signed-off-by: Kefu Chai <k.chai@proxmox.com>
>> ---
>>   src/pmxcfs-rs/Cargo.lock                  | 2067 +++++++++++++++++++++
>
> Following the .gitignore pattern in our other repos, Cargo.lock is
> ignored, so I’d suggest dropping it from the series.

dropped.

>
>>   src/pmxcfs-rs/Cargo.toml                  |   83 +
>>   src/pmxcfs-rs/pmxcfs-api-types/Cargo.toml |   19 +
>>   src/pmxcfs-rs/pmxcfs-api-types/README.md  |  105 ++
>>   src/pmxcfs-rs/pmxcfs-api-types/src/lib.rs |  152 ++
>>   5 files changed, 2426 insertions(+)
>>   create mode 100644 src/pmxcfs-rs/Cargo.lock
>>   create mode 100644 src/pmxcfs-rs/Cargo.toml
>>   create mode 100644 src/pmxcfs-rs/pmxcfs-api-types/Cargo.toml
>>   create mode 100644 src/pmxcfs-rs/pmxcfs-api-types/README.md
>>   create mode 100644 src/pmxcfs-rs/pmxcfs-api-types/src/lib.rs
>> 
>> diff --git a/src/pmxcfs-rs/Cargo.lock b/src/pmxcfs-rs/Cargo.lock
>
> [..]
>
>> +++ b/src/pmxcfs-rs/Cargo.toml
>> @@ -0,0 +1,83 @@
>> +# Workspace root for pmxcfs Rust implementation
>> +[workspace]
>> +members = [
>> +    "pmxcfs-api-types", # Shared types and error definitions
>> +]
>> +resolver = "2"
>> +
>> +[workspace.package]
>> +version = "9.0.6"
>> +edition = "2024"
>> +authors = ["Proxmox Support Team <support@proxmox.com>"]
>> +license = "AGPL-3.0"
>> +repository = "https://git.proxmox.com/?p=pve-cluster.git"
>> +rust-version = "1.85"
>> +
>> +[workspace.dependencies]
>
> Here we already declare workspace path deps for crates that aren’t
> present yet (pmxcfs-config, pmxcfs-memdb, ...). For bisectability,
> could we keep this patch minimal and add those workspace
> members/path deps in the patches where the crates are introduced?

restructured the commits to add the deps only when they are used.

>
>> +# Internal workspace dependencies
>> +pmxcfs-api-types = { path = "pmxcfs-api-types" }
>> +pmxcfs-config = { path = "pmxcfs-config" }
>> +pmxcfs-memdb = { path = "pmxcfs-memdb" }
>> +pmxcfs-dfsm = { path = "pmxcfs-dfsm" }
>> +pmxcfs-rrd = { path = "pmxcfs-rrd" }
>> +pmxcfs-status = { path = "pmxcfs-status" }
>> +pmxcfs-ipc = { path = "pmxcfs-ipc" }
>> +pmxcfs-services = { path = "pmxcfs-services" }
>> +pmxcfs-logger = { path = "pmxcfs-logger" }
>> +
>> +# Core async runtime
>> +tokio = { version = "1.35", features = ["full"] }
>> +tokio-util = "0.7"
>> +async-trait = "0.1"
>> +
>
> If the goal is to centrally pin external crate versions early, maybe
> limit [workspace.dependencies] here generally to the crates actually
> used by pmxcfs-api-types (thiserror, libc) and extend as new crates
> are added.

likewise.

>
>> +# Error handling
>> +anyhow = "1.0"
>> +thiserror = "1.0"
>> +
>> +# Logging and tracing
>> +tracing = "0.1"
>> +tracing-subscriber = { version = "0.3", features = ["env-filter"] }
>> +
>> +# Serialization
>> +serde = { version = "1.0", features = ["derive"] }
>> +serde_json = "1.0"
>> +bincode = "1.3"
>> +
>> +# Network and cluster
>> +bytes = "1.5"
>> +sha2 = "0.10"
>> +bytemuck = { version = "1.14", features = ["derive"] }
>> +
>> +# System integration
>> +libc = "0.2"
>> +nix = { version = "0.27", features = ["fs", "process", "signal", "user", "socket"] }
>> +users = "0.11"
>> +
>> +# Corosync/CPG bindings
>> +rust-corosync = "0.1"
>> +
>> +# Enum conversions
>> +num_enum = "0.7"
>> +
>> +# Concurrency primitives
>> +parking_lot = "0.12"
>> +
>> +# Utilities
>> +chrono = "0.4"
>> +futures = "0.3"
>> +
>> +# Development dependencies
>> +tempfile = "3.8"
>> +
>> +[workspace.lints.clippy]
>> +uninlined_format_args = "warn"
>> +
>> +[profile.release]
>> +lto = true
>> +codegen-units = 1
>> +opt-level = 3
>> +strip = true
>> +
>> +[profile.dev]
>> +opt-level = 1
>> +debug = true
>> diff --git a/src/pmxcfs-rs/pmxcfs-api-types/Cargo.toml b/src/pmxcfs-rs/pmxcfs-api-types/Cargo.toml
>> new file mode 100644
>> index 00000000..cdce7951
>> --- /dev/null
>> +++ b/src/pmxcfs-rs/pmxcfs-api-types/Cargo.toml
>> @@ -0,0 +1,19 @@
>> +[package]
>> +name = "pmxcfs-api-types"
>> +description = "Shared types and error definitions for pmxcfs"
>> +
>> +version.workspace = true
>> +edition.workspace = true
>> +authors.workspace = true
>> +license.workspace = true
>> +repository.workspace = true
>> +
>> +[lints]
>> +workspace = true
>> +
>> +[dependencies]
>> +# Error handling
>> +thiserror.workspace = true
>> +
>> +# System integration
>> +libc.workspace = true
>> diff --git a/src/pmxcfs-rs/pmxcfs-api-types/README.md b/src/pmxcfs-rs/pmxcfs-api-types/README.md
>> new file mode 100644
>> index 00000000..da8304ae
>> --- /dev/null
>> +++ b/src/pmxcfs-rs/pmxcfs-api-types/README.md
>> @@ -0,0 +1,105 @@
>> +# pmxcfs-api-types
>> +
>> +**Shared Types and Error Definitions** for pmxcfs.
>> +
>> +This crate provides common types, error definitions, and message formats used across all pmxcfs crates. It serves as the "API contract" between different components.
>> +
>> +## Overview
>> +
>> +The crate contains:
>> +- **Error types**: `PmxcfsError` with errno mapping for FUSE
>> +- **Message types**: `FuseMessage`, `KvStoreMessage`, `ApplicationMessage`
>
> These types and the mentioned serialization helpers aren’t part of this
> diff. Could you re-check both the README.md and the commit message so
> they match?

Sorry, this README was last revised before the latest refactoring. Fixed.

>
>> +- **Shared types**: `MemberInfo`, `NodeSyncInfo`
>> +- **Serialization**: C-compatible wire format helpers
>> +
>> +## Error Types
>> +
>> +### PmxcfsError
>> +
>> +Type-safe error enum with automatic errno conversion.
>> +
>> +### errno Mapping
>> +
>> +Errors automatically convert to POSIX errno values for FUSE.
>> +
>> +| Error | errno | Value |
>> +|-------|-------|-------|
>> +| `NotFound` | `ENOENT` | 2 |
>> +| `PermissionDenied` | `EPERM` | 1 |
>> +| `AlreadyExists` | `EEXIST` | 17 |
>> +| `NotADirectory` | `ENOTDIR` | 20 |
>> +| `IsADirectory` | `EISDIR` | 21 |
>> +| `DirectoryNotEmpty` | `ENOTEMPTY` | 39 |
>> +| `FileTooLarge` | `EFBIG` | 27 |
>> +| `ReadOnlyFilesystem` | `EROFS` | 30 |
>> +| `NoQuorum` | `EACCES` | 13 |
>> +| `Timeout` | `ETIMEDOUT` | 110 |
>> +
>> +## Message Types
>> +
>> +### FuseMessage
>> +
>> +Filesystem operations broadcast through the cluster (via DFSM). Uses C-compatible wire format compatible with `dcdb.c`.
>> +
>> +### KvStoreMessage
>> +
>> +Status and metrics synchronization (via kvstore DFSM). Uses C-compatible wire format.
>> +
>> +### ApplicationMessage
>> +
>> +Wrapper for either FuseMessage or KvStoreMessage, used by DFSM to handle both filesystem and status messages with type safety.
>> +
>> +## Shared Types
>> +
>> +### MemberInfo
>> +
>> +Cluster member information.
>> +
>> +### NodeSyncInfo
>> +
>> +DFSM synchronization state.
>> +
>> +## C to Rust Mapping
>> +
>> +### Error Handling
>> +
>> +**C Version (cfs-utils.h):**
>> +- Return codes: `0` = success, negative = error
>> +- errno-based error reporting
>> +- Manual error checking everywhere
>> +
>> +**Rust Version:**
>> +- `Result<T, PmxcfsError>` type
>> +
>> +### Message Types
>> +
>> +**C Version (dcdb.h):**
>> +
>> +**Rust Version:**
>> +- Type-safe enums
>> +
>> +## Key Differences from C Implementation
>> +
>> +All message types have `serialize()` and `deserialize()` methods that produce byte-for-byte compatible formats with the C implementation.
>> +
>> +## Known Issues / TODOs
>> +
>> +### Missing Features
>> +- None identified
>> +
>> +### Compatibility
>> +- **Wire format**: 100% compatible with C implementation
>> +- **errno values**: Match POSIX standards
>> +- **Message types**: All C message types covered
>> +
>> +## References
>> +
>> +### C Implementation
>> +- `src/pmxcfs/cfs-utils.h` - Utility types and error codes
>> +- `src/pmxcfs/dcdb.h` - FUSE message types
>> +- `src/pmxcfs/status.h` - KvStore message types
>> +
>> +### Related Crates
>> +- **pmxcfs-dfsm**: Uses ApplicationMessage for cluster sync
>> +- **pmxcfs-memdb**: Uses PmxcfsError for database operations
>> +- **pmxcfs**: Uses FuseMessage for FUSE operations
>> diff --git a/src/pmxcfs-rs/pmxcfs-api-types/src/lib.rs b/src/pmxcfs-rs/pmxcfs-api-types/src/lib.rs
>> new file mode 100644
>> index 00000000..ae0e5eb0
>> --- /dev/null
>> +++ b/src/pmxcfs-rs/pmxcfs-api-types/src/lib.rs
>> @@ -0,0 +1,152 @@
>> +use thiserror::Error;
>> +
>> +/// Error types for pmxcfs operations
>> +#[derive(Error, Debug)]
>> +pub enum PmxcfsError {
>
> nit: the error related parts could be added into a dedicated error.rs
> module

thanks! extracted.

>
>> +    #[error("I/O error: {0}")]
>> +    Io(#[from] std::io::Error),
>> +
>> +    #[error("Database error: {0}")]
>> +    Database(String),
>> +
>> +    #[error("FUSE error: {0}")]
>> +    Fuse(String),
>> +
>> +    #[error("Cluster error: {0}")]
>> +    Cluster(String),
>> +
>> +    #[error("Corosync error: {0}")]
>> +    Corosync(String),
>> +
>> +    #[error("Configuration error: {0}")]
>> +    Configuration(String),
>> +
>> +    #[error("System error: {0}")]
>> +    System(String),
>> +
>> +    #[error("IPC error: {0}")]
>> +    Ipc(String),
>> +
>> +    #[error("Permission denied")]
>> +    PermissionDenied,
>> +
>> +    #[error("Not found: {0}")]
>> +    NotFound(String),
>> +
>> +    #[error("Already exists: {0}")]
>> +    AlreadyExists(String),
>> +
>> +    #[error("Invalid argument: {0}")]
>> +    InvalidArgument(String),
>> +
>> +    #[error("Not a directory: {0}")]
>> +    NotADirectory(String),
>> +
>> +    #[error("Is a directory: {0}")]
>> +    IsADirectory(String),
>> +
>> +    #[error("Directory not empty: {0}")]
>> +    DirectoryNotEmpty(String),
>> +
>> +    #[error("No quorum")]
>> +    NoQuorum,
>> +
>> +    #[error("Read-only filesystem")]
>> +    ReadOnlyFilesystem,
>> +
>> +    #[error("File too large")]
>> +    FileTooLarge,
>> +
>> +    #[error("Lock error: {0}")]
>> +    Lock(String),
>> +
>> +    #[error("Timeout")]
>> +    Timeout,
>> +
>> +    #[error("Invalid path: {0}")]
>> +    InvalidPath(String),
>> +}
>> +
>> +impl PmxcfsError {
>> +    /// Convert error to errno value for FUSE operations
>> +    pub fn to_errno(&self) -> i32 {
>> +        match self {
>> +            PmxcfsError::NotFound(_) => libc::ENOENT,
>> +            PmxcfsError::PermissionDenied => libc::EPERM,
>> +            PmxcfsError::AlreadyExists(_) => libc::EEXIST,
>> +            PmxcfsError::NotADirectory(_) => libc::ENOTDIR,
>> +            PmxcfsError::IsADirectory(_) => libc::EISDIR,
>> +            PmxcfsError::DirectoryNotEmpty(_) => libc::ENOTEMPTY,
>> +            PmxcfsError::InvalidArgument(_) => libc::EINVAL,
>> +            PmxcfsError::FileTooLarge => libc::EFBIG,
>> +            PmxcfsError::ReadOnlyFilesystem => libc::EROFS,
>> +            PmxcfsError::NoQuorum => libc::EACCES,
>> +            PmxcfsError::Timeout => libc::ETIMEDOUT,
>> +            PmxcfsError::Io(e) => match e.raw_os_error() {
>> +                Some(errno) => errno,
>> +                None => libc::EIO,
>> +            },
>> +            _ => libc::EIO,
>
> Please check with C implementation, but:
>
> "PermissionDenied" should likely map to EACCES rather than EPERM. In
> FUSE/POSIX, EACCES is the standard return for file permission blocks,
> whereas EPERM is usually for administrative restrictions
> (like ownership)
>
> "InvalidPath" maps better to EINVAL. EIO suggests a hardware/disk
> failure, whereas InvalidPath implies an argument issue
>
> Also, "Lock" should explicitly be mapped.
> EBUSY (resource busy / lock contention)
> or EDEADLK (deadlock) / EAGAIN depending on semantics
>
> In general, can we minimize the number of errors falling into the
> generic EIO branch?
>

Indeed. The way the errors were categorized was far too
coarse-grained. Fixed accordingly.
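
For illustration only, the direction suggested in the review could look
roughly like the sketch below. It is not the updated patch: the enum is
cut down to a few variants, EBUSY for Lock is just one of the options
named above, and the errno numbers are hard-coded Linux values so the
snippet needs no external crate (the patch itself uses the libc
constants).

```rust
// Linux errno values, hard-coded only to keep this sketch dependency-free.
const ENOENT: i32 = 2;
const EIO: i32 = 5;
const EACCES: i32 = 13;
const EBUSY: i32 = 16;
const EINVAL: i32 = 22;

/// Cut-down stand-in for the reviewed error enum.
#[derive(Debug)]
pub enum PmxcfsError {
    NotFound(String),
    PermissionDenied,
    InvalidPath(String),
    Lock(String),
    Database(String),
}

impl PmxcfsError {
    /// Every variant is listed explicitly. Without a `_ =>` arm, adding
    /// a variant fails to compile until its errno has been chosen.
    pub fn to_errno(&self) -> i32 {
        match self {
            PmxcfsError::NotFound(_) => ENOENT,
            PmxcfsError::PermissionDenied => EACCES,
            PmxcfsError::InvalidPath(_) => EINVAL,
            PmxcfsError::Lock(_) => EBUSY,
            PmxcfsError::Database(_) => EIO,
        }
    }
}
```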

>> +        }
>> +    }
>> +}
>> +
>> +/// Result type for pmxcfs operations
>> +pub type Result<T> = std::result::Result<T, PmxcfsError>;
>> +
>> +/// VM/CT types
>> +#[derive(Debug, Clone, Copy, PartialEq, Eq)]
>
> If this is used in wire contexts please add #[repr(u8)] to ensure a
> stable ABI.

It's not used in the wire format. I removed the explicit discriminant
assignments since, in this case, we don't need predictable values for
these variants: they are only used in in-memory comparisons as
distinct identifiers.

>
>> +pub enum VmType {
>> +    Qemu = 1,
>> +    Lxc = 3,
>
> There’s a gap between values 1 -> 3: is 2 reserved?
> If so, maybe add a short comment.

It's not reserved. Actually, 2 was OpenVZ, which is no longer supported; see
https://www.proxmox.com/en/about/company-details/press-releases/proxmox-ve-4-0-released
Now that specific values are no longer assigned to the enum variants,
we don't need to keep the gap. I added a short comment anyway to
explain that OpenVZ support was removed.
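
For comparison, if the type ever did end up in a wire format, the
reviewer's suggestion would amount to something like this sketch (the
TryFrom impl and its error type are my own illustration, not code from
the series):

```rust
use std::convert::TryFrom;

/// VM/CT types with a fixed one-byte representation.
#[repr(u8)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum VmType {
    Qemu = 1,
    // 2 was OpenVZ, which is no longer supported.
    Lxc = 3,
}

impl TryFrom<u8> for VmType {
    /// The rejected raw byte is handed back to the caller.
    type Error = u8;

    fn try_from(raw: u8) -> Result<Self, Self::Error> {
        match raw {
            1 => Ok(VmType::Qemu),
            3 => Ok(VmType::Lxc),
            other => Err(other),
        }
    }
}
```

Encoding is then `vmtype as u8`, and decoding goes through
VmType::try_from(byte) so that unknown values are rejected explicitly.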

>
>> +}
>> +
>> +impl VmType {
>> +    /// Returns the directory name where config files are stored
>> +    pub fn config_dir(&self) -> &'static str {
>> +        match self {
>> +            VmType::Qemu => "qemu-server",
>> +            VmType::Lxc => "lxc",
>> +        }
>> +    }
>> +}
>> +
>> +impl std::fmt::Display for VmType {
>> +    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
>> +        match self {
>> +            VmType::Qemu => write!(f, "qemu"),
>> +            VmType::Lxc => write!(f, "lxc"),
>> +        }
>> +    }
>> +}
>> +
>> +/// VM/CT entry for vmlist
>> +#[derive(Debug, Clone)]
>> +pub struct VmEntry {
>> +    pub vmid: u32,
>> +    pub vmtype: VmType,
>> +    pub node: String,
>> +    /// Per-VM version counter (increments when this VM's config changes)
>> +    pub version: u32,
>> +}
>> +
>> +/// Information about a cluster member
>> +///
>> +/// This is a shared type used by both cluster and DFSM modules
>> +#[derive(Debug, Clone)]
>> +pub struct MemberInfo {
>> +    pub node_id: u32,
>> +    pub pid: u32,
>> +    pub joined_at: u64,
>> +}
>> +
>> +/// Node synchronization info for DFSM state sync
>> +///
>> +/// Used during DFSM synchronization to track which nodes have provided state
>> +#[derive(Debug, Clone)]
>> +pub struct NodeSyncInfo {
>> +    pub nodeid: u32,
>
> We have "nodeid" here but "node_id" in MemberInfo, this should be
> aligned.

thanks for pointing this out! changed to "node_id".
>
>> +    pub pid: u32,
>> +    pub state: Option<Vec<u8>>,
>> +    pub synced: bool,
>> +}





* Re: [pve-devel] [PATCH pve-cluster 02/15] pmxcfs-rs: add pmxcfs-config crate
  2026-01-23 15:01  6%   ` Samuel Rufinatscha
@ 2026-01-26  9:43  6%     ` Kefu Chai
  0 siblings, 0 replies; 117+ results
From: Kefu Chai @ 2026-01-26  9:43 UTC (permalink / raw)
  To: Samuel Rufinatscha, Proxmox VE development discussion

On Fri Jan 23, 2026 at 11:01 PM CST, Samuel Rufinatscha wrote:
> comments inline
>
> On 1/6/26 3:25 PM, Kefu Chai wrote:
>> Add configuration management crate that provides:
>> - Config struct for runtime configuration
>> - Node hostname, IP, and group ID tracking
>> - Debug and local mode flags
>> - Thread-safe configuration access via parking_lot Mutex
>> 
>> This is a foundational crate with no internal dependencies, only
>> requiring parking_lot for synchronization. Other crates will use
>> this for accessing runtime configuration.
>> 
>> Signed-off-by: Kefu Chai <k.chai@proxmox.com>
>> ---
>>   src/pmxcfs-rs/Cargo.toml               |   3 +-
>>   src/pmxcfs-rs/pmxcfs-config/Cargo.toml |  16 +
>>   src/pmxcfs-rs/pmxcfs-config/README.md  | 127 +++++++
>>   src/pmxcfs-rs/pmxcfs-config/src/lib.rs | 471 +++++++++++++++++++++++++
>>   4 files changed, 616 insertions(+), 1 deletion(-)
>>   create mode 100644 src/pmxcfs-rs/pmxcfs-config/Cargo.toml
>>   create mode 100644 src/pmxcfs-rs/pmxcfs-config/README.md
>>   create mode 100644 src/pmxcfs-rs/pmxcfs-config/src/lib.rs
>> 
>> diff --git a/src/pmxcfs-rs/Cargo.toml b/src/pmxcfs-rs/Cargo.toml
>> index 15d88f52..28e20bb7 100644
>> --- a/src/pmxcfs-rs/Cargo.toml
>> +++ b/src/pmxcfs-rs/Cargo.toml
>> @@ -1,7 +1,8 @@
>>   # Workspace root for pmxcfs Rust implementation
>>   [workspace]
>>   members = [
>> -    "pmxcfs-api-types", # Shared types and error definitions
>> +    "pmxcfs-api-types",  # Shared types and error definitions
>> +    "pmxcfs-config",     # Configuration management
>>   ]
>>   resolver = "2"
>>   
>> diff --git a/src/pmxcfs-rs/pmxcfs-config/Cargo.toml b/src/pmxcfs-rs/pmxcfs-config/Cargo.toml
>> new file mode 100644
>> index 00000000..f5a60995
>> --- /dev/null
>> +++ b/src/pmxcfs-rs/pmxcfs-config/Cargo.toml
>> @@ -0,0 +1,16 @@
>> +[package]
>> +name = "pmxcfs-config"
>> +description = "Configuration management for pmxcfs"
>> +
>> +version.workspace = true
>> +edition.workspace = true
>> +authors.workspace = true
>> +license.workspace = true
>> +repository.workspace = true
>> +
>> +[lints]
>> +workspace = true
>> +
>> +[dependencies]
>> +# Concurrency primitives
>> +parking_lot.workspace = true
>> diff --git a/src/pmxcfs-rs/pmxcfs-config/README.md b/src/pmxcfs-rs/pmxcfs-config/README.md
>> new file mode 100644
>> index 00000000..c06b2170
>> --- /dev/null
>> +++ b/src/pmxcfs-rs/pmxcfs-config/README.md
>> @@ -0,0 +1,127 @@
>> +# pmxcfs-config
>> +
>> +**Configuration Management** and **Cluster Services** for pmxcfs.
>> +
>> +This crate provides configuration structures and cluster integration services including quorum tracking and cluster configuration monitoring via Corosync APIs.
>> +
>> +## Overview
>> +
>> +This crate contains:
>> +1. **Config struct**: Runtime configuration (node name, IPs, flags)
>> +2. Integration with Corosync services (tracked in main pmxcfs crate):
>> +   - **QuorumService** (`pmxcfs/src/quorum_service.rs`) - Quorum monitoring
>> +   - **ClusterConfigService** (`pmxcfs/src/cluster_config_service.rs`) - Config tracking
>
> This patch only contains the Config struct, not the Cluster Services
> or QuorumService. Please revisit the commit message and README.

Sorry, the README.md was out of sync after the latest refactoring. Fixed.

>
>> +
>> +## Config Struct
>> +
>> +The `Config` struct holds daemon-wide configuration including node hostname, IP address, www-data group ID, debug flag, local mode flag, and cluster name.
>> +
>> +## Cluster Services
>> +
>> +The following services are implemented in the main pmxcfs crate but documented here for completeness.
>> +
>> +### QuorumService
>> +
>> +**C Equivalent:** `src/pmxcfs/quorum.c` - `service_quorum_new()`
>> +**Rust Location:** `src/pmxcfs-rs/pmxcfs/src/quorum_service.rs`
>> +
>> +Monitors cluster quorum status via Corosync quorum API.
>> +
>> +#### Features
>> +- Tracks quorum state (quorate/inquorate)
>> +- Monitors member list changes
>> +- Automatic reconnection on Corosync restart
>> +- Updates `Status` quorum flag
>> +
>> +#### C to Rust Mapping
>> +
>> +| C Function | Rust Equivalent | Location |
>> +|-----------|-----------------|----------|
>> +| `service_quorum_new()` | `QuorumService::new()` | quorum_service.rs |
>> +| `service_quorum_destroy()` | (Drop trait / finalize) | Automatic |
>> +| `quorum_notification_fn` | quorum_notification closure | quorum_service.rs |
>> +| `nodelist_notification_fn` | nodelist_notification closure | quorum_service.rs |
>> +
>> +#### Quorum Notifications
>> +
>> +The service monitors quorum state changes and member list changes, updating the Status accordingly.
>> +
>> +### ClusterConfigService
>> +
>> +**C Equivalent:** `src/pmxcfs/confdb.c` - `service_confdb_new()`
>> +**Rust Location:** `src/pmxcfs-rs/pmxcfs/src/cluster_config_service.rs`
>> +
>> +Monitors Corosync cluster configuration (cmap) and tracks node membership.
>> +
>> +#### Features
>> +- Monitors cluster membership via Corosync cmap API
>> +- Tracks node additions/removals
>> +- Registers nodes in Status
>> +- Automatic reconnection on Corosync restart
>> +
>> +#### C to Rust Mapping
>> +
>> +| C Function | Rust Equivalent | Location |
>> +|-----------|-----------------|----------|
>> +| `service_confdb_new()` | `ClusterConfigService::new()` | cluster_config_service.rs |
>> +| `service_confdb_destroy()` | (Drop trait / finalize) | Automatic |
>> +| `confdb_track_fn` | (direct cmap queries) | Different approach |
>> +
>> +#### Configuration Tracking
>> +
>> +The service monitors:
>> +- `nodelist.node.*.nodeid` - Node IDs
>> +- `nodelist.node.*.name` - Node names
>> +- `nodelist.node.*.ring*_addr` - Node IP addresses
>> +
>> +Updates `Status` with current cluster membership.
>> +
>> +## Key Differences from C Implementation
>> +
>> +### Cluster Config Service API
>> +
>> +**C Version (confdb.c):**
>> +- Uses deprecated confdb API
>> +- Track changes via confdb notifications
>> +
>> +**Rust Version:**
>> +- Uses modern cmap API
>> +- Direct cmap queries
>> +
>> +Both read the same data, but Rust uses the modern Corosync API.
>> +
>> +### Service Integration
>> +
>> +**C Version:**
>> +- qb_loop manages lifecycle
>> +
>> +**Rust Version:**
>> +- Service trait abstracts lifecycle
>> +- ServiceManager handles retry
>> +- Tokio async dispatch
>> +
>> +## Known Issues / TODOs
>> +
>> +### Compatibility
>> +- **Quorum tracking**: Compatible with C implementation
>> +- **Node registration**: Equivalent behavior
>> +- **cmap vs confdb**: Rust uses modern cmap API (C uses deprecated confdb)
>> +
>> +### Missing Features
>> +- None identified
>> +
>> +### Behavioral Differences (Benign)
>> +- **API choice**: Rust uses cmap, C uses confdb (both read same data)
>> +- **Lifecycle**: Rust uses Service trait, C uses manual lifecycle
>> +
>> +## References
>> +
>> +### C Implementation
>> +- `src/pmxcfs/quorum.c` / `quorum.h` - Quorum service
>> +- `src/pmxcfs/confdb.c` / `confdb.h` - Cluster config service
>> +
>> +### Related Crates
>> +- **pmxcfs**: Main daemon with QuorumService and ClusterConfigService
>> +- **pmxcfs-status**: Status tracking updated by these services
>> +- **pmxcfs-services**: Service framework used by both services
>> +- **rust-corosync**: Corosync FFI bindings
>> diff --git a/src/pmxcfs-rs/pmxcfs-config/src/lib.rs b/src/pmxcfs-rs/pmxcfs-config/src/lib.rs
>> new file mode 100644
>> index 00000000..5e1ee1b2
>> --- /dev/null
>> +++ b/src/pmxcfs-rs/pmxcfs-config/src/lib.rs
>> @@ -0,0 +1,471 @@
>> +use parking_lot::RwLock;
>> +use std::sync::Arc;
>> +
>> +/// Global configuration for pmxcfs
>> +pub struct Config {
>> +    /// Node name (hostname without domain)
>> +    pub nodename: String,
>> +
>> +    /// Node IP address
>> +    pub node_ip: String,
>
> Consider using std::net::IpAddr (or SocketAddr if a port is part of the
> value). Tests currently mix IP vs IP:PORT, so it’s unclear what node_ip
> is supposed to represent.

It's a value extracted from resolve_node_ip(), so it's just an IP
address. I switched to IpAddr and updated the tests accordingly.
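
For reference, a minimal sketch of why the IP:PORT test input no longer
fits once the field is an IpAddr. The helper name parse_node_ip is
hypothetical and only used for illustration; it is not part of the
patch.

```rust
use std::net::{AddrParseError, IpAddr};

/// Hypothetical helper (not from the patch): parse the textual form of a
/// node IP into `IpAddr`. Accepts IPv4 and IPv6, rejects anything else.
pub fn parse_node_ip(s: &str) -> Result<IpAddr, AddrParseError> {
    s.parse::<IpAddr>()
}

/// Hypothetical helper: true if the input is an IP:PORT pair, which is a
/// `SocketAddr` and therefore not a valid `IpAddr`.
pub fn is_socket_addr(s: &str) -> bool {
    s.parse::<std::net::SocketAddr>().is_ok()
}
```

With this, "192.168.1.10" and "::1" parse, while "192.168.1.10:8006",
the empty string, and over-long strings are rejected at construction
time rather than being stored unchecked.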

>
>> +
>> +    /// www-data group ID for file permissions
>> +    pub www_data_gid: u32,
>> +
>> +    /// Debug mode enabled
>> +    pub debug: bool,
>> +
>> +    /// Force local mode (no clustering)
>> +    pub local_mode: bool,
>> +
>> +    /// Cluster name (CPG group name)
>> +    pub cluster_name: String,
>> +
>> +    /// Debug level (0 = normal, 1+ = debug) - mutable at runtime
>> +    debug_level: RwLock<u8>,
>
> in the crate docs it says: “The Config struct uses Arc<AtomicU8> for
> debug_level” but the implementation uses parking_lot::RwLock<u8>.
> Unless we need lock coupling with other fields, AtomicU8 would likely
> be sufficient (and cheaper) for debug_level. Also please re-check the
> commit message, which mentions parking_lot::Mutex.

Indeed. AtomicU8 is more lightweight and simpler than RwLock. Changed
accordingly.
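
A minimal sketch of the atomic variant, reduced to the debug level
alone. The DebugLevel wrapper and the choice of Ordering::Relaxed are
assumptions for illustration; Relaxed is only appropriate as long as
the level does not guard access to any other data.

```rust
use std::sync::atomic::{AtomicU8, Ordering};

/// Illustrative stand-in for the debug-level part of `Config`.
/// The type name is hypothetical; only the AtomicU8 idea is from the review.
pub struct DebugLevel {
    level: AtomicU8,
}

impl DebugLevel {
    pub fn new(level: u8) -> Self {
        Self { level: AtomicU8::new(level) }
    }

    /// Read the current level without taking a lock.
    pub fn get(&self) -> u8 {
        self.level.load(Ordering::Relaxed)
    }

    /// Update the level through a shared reference.
    pub fn set(&self, level: u8) {
        self.level.store(level, Ordering::Relaxed);
    }
}

// AtomicU8 does not implement Clone, so a manual impl snapshots the
// current value into an independent atomic.
impl Clone for DebugLevel {
    fn clone(&self) -> Self {
        Self::new(self.get())
    }
}
```

A clone taken this way stays independent of the original, which is the
behavior the existing clone test checks for.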

>
>> +}
>> +
>> +impl Clone for Config {
>> +    fn clone(&self) -> Self {
>> +        Self {
>> +            nodename: self.nodename.clone(),
>> +            node_ip: self.node_ip.clone(),
>> +            www_data_gid: self.www_data_gid,
>> +            debug: self.debug,
>> +            local_mode: self.local_mode,
>> +            cluster_name: self.cluster_name.clone(),
>> +            debug_level: RwLock::new(*self.debug_level.read()),
>> +        }
>> +    }
>> +}
>> +
>> +impl std::fmt::Debug for Config {
>> +    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
>> +        f.debug_struct("Config")
>> +            .field("nodename", &self.nodename)
>> +            .field("node_ip", &self.node_ip)
>> +            .field("www_data_gid", &self.www_data_gid)
>> +            .field("debug", &self.debug)
>> +            .field("local_mode", &self.local_mode)
>> +            .field("cluster_name", &self.cluster_name)
>> +            .field("debug_level", &*self.debug_level.read())
>> +            .finish()
>> +    }
>> +}
>> +
>> +impl Config {
>> +    pub fn new(
>> +        nodename: String,
>> +        node_ip: String,
>> +        www_data_gid: u32,
>> +        debug: bool,
>> +        local_mode: bool,
>> +        cluster_name: String,
>> +    ) -> Arc<Self> {
>
> The constructor returns Arc<Config>
> I think we could keep new() -> Self, and provide convenience
> constructor shared() -> Arc<Self>.
> This would allow local usage (e.g. for tests) without heap allocation
> of the struct

Config::new() is added, and the tests now use it.
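
A sketch of the constructor split suggested above, assuming a
trimmed-down field set for brevity. The name shared() is taken from the
review comment; everything else here is illustrative and not the
actual patch.

```rust
use std::sync::Arc;

/// Trimmed-down, hypothetical Config with only two fields.
#[derive(Debug, Clone)]
pub struct Config {
    nodename: String,
    local_mode: bool,
}

impl Config {
    /// Plain constructor: no heap allocation for the struct itself.
    pub fn new(nodename: String, local_mode: bool) -> Self {
        Self { nodename, local_mode }
    }

    /// Convenience constructor for callers that share the config.
    pub fn shared(nodename: String, local_mode: bool) -> Arc<Self> {
        Arc::new(Self::new(nodename, local_mode))
    }

    pub fn nodename(&self) -> &str {
        &self.nodename
    }

    pub fn is_local_mode(&self) -> bool {
        self.local_mode
    }
}
```

Tests that only need a local value can call new(), and the thread
safety tests can call shared() and Arc::clone() the result.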

>
>> +        let debug_level = if debug { 1 } else { 0 };
>
> debug_level is derived from debug at creation time, but after that
> set_debug_level() does not update debug, so is_debug() continues
> to reflect the initial flag, not the effective debug level.
> is_debug() should just be a helper that returns self.debug_level() > 0.
> The debug field should probably be removed entirely.

Ahh, thanks for pointing this out. Fixed.
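
A minimal sketch of the derived is_debug(), assuming the debug field is
dropped and the level is kept in an AtomicU8. Keeping a bool parameter
on new() is an assumption made here to stay close to the current call
sites.

```rust
use std::sync::atomic::{AtomicU8, Ordering};

/// Hypothetical reduced Config: the level is the single source of truth.
pub struct Config {
    debug_level: AtomicU8,
}

impl Config {
    /// `debug` only seeds the initial level (false -> 0, true -> 1).
    pub fn new(debug: bool) -> Self {
        Self { debug_level: AtomicU8::new(u8::from(debug)) }
    }

    pub fn debug_level(&self) -> u8 {
        self.debug_level.load(Ordering::Relaxed)
    }

    pub fn set_debug_level(&self, level: u8) {
        self.debug_level.store(level, Ordering::Relaxed);
    }

    /// Derived from the level, so it cannot go stale after a runtime change.
    pub fn is_debug(&self) -> bool {
        self.debug_level() > 0
    }
}
```

With this shape, set_debug_level(0) on a config created with debug
enabled makes is_debug() return false, which the stored flag could not
express.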

>
>> +        Arc::new(Self {
>> +            nodename,
>> +            node_ip,
>> +            www_data_gid,
>> +            debug,
>> +            local_mode,
>> +            cluster_name,
>> +            debug_level: RwLock::new(debug_level),
>> +        })
>> +    }
>> +
>> +    pub fn cluster_name(&self) -> &str {
>> +        &self.cluster_name
>> +    }
>> +
>> +    pub fn nodename(&self) -> &str {
>> +        &self.nodename
>> +    }
>> +
>> +    pub fn node_ip(&self) -> &str {
>> +        &self.node_ip
>> +    }
>> +
>> +    pub fn www_data_gid(&self) -> u32 {
>> +        self.www_data_gid
>> +    }
>> +
>> +    pub fn is_debug(&self) -> bool {
>> +        self.debug
>> +    }
>> +
>> +    pub fn is_local_mode(&self) -> bool {
>> +        self.local_mode
>> +    }
>> +
>> +    /// Get current debug level (0 = normal, 1+ = debug)
>> +    pub fn debug_level(&self) -> u8 {
>> +        *self.debug_level.read()
>> +    }
>> +
>> +    /// Set debug level (0 = normal, 1+ = debug)
>> +    pub fn set_debug_level(&self, level: u8) {
>> +        *self.debug_level.write() = level;
>> +    }
>
> Right now most fields are pub but also getters are exposed. This will
> make it harder to enforce invariants.
> I would suggest making the fields private and keeping the getters, or
> keeping the fields public and dropping the getters.

Indeed. I made all fields private and kept the getters.

>
>> +}
>> +
>> +#[cfg(test)]
>> +mod tests {
>> +    //! Unit tests for Config struct
>> +    //!
>> +    //! This test module provides comprehensive coverage for:
>> +    //! - Configuration creation and initialization
>> +    //! - Getter methods for all configuration fields
>> +    //! - Debug level mutation and thread safety
>> +    //! - Concurrent access patterns (reads and writes)
>> +    //! - Clone independence
>> +    //! - Debug formatting
>> +    //! - Edge cases (empty strings, long strings, special characters, unicode)
>> +    //!
>> +    //! ## Thread Safety
>> +    //!
>> +    //! The Config struct uses `Arc<AtomicU8>` for debug_level to allow
>> +    //! safe concurrent reads and writes. Tests verify:
>> +    //! - 10 threads × 100 operations (concurrent modifications)
>> +    //! - 20 threads × 1000 operations (concurrent reads)
>> +    //!
>> +    //! ## Edge Cases
>> +    //!
>> +    //! Tests cover various edge cases including:
>> +    //! - Empty strings for node/cluster names
>> +    //! - Long strings (1000+ characters)
>> +    //! - Special characters in strings
>> +    //! - Unicode support (emoji, non-ASCII characters)
>> +
>> +    use super::*;
>> +    use std::thread;
>> +
>> +    // ===== Basic Construction Tests =====
>> +
>> +    #[test]
>> +    fn test_config_creation() {
>> +        let config = Config::new(
>> +            "node1".to_string(),
>> +            "192.168.1.10".to_string(),
>> +            33,
>> +            false,
>> +            false,
>> +            "pmxcfs".to_string(),
>> +        );
>> +
>> +        assert_eq!(config.nodename(), "node1");
>> +        assert_eq!(config.node_ip(), "192.168.1.10");
>> +        assert_eq!(config.www_data_gid(), 33);
>> +        assert!(!config.is_debug());
>> +        assert!(!config.is_local_mode());
>> +        assert_eq!(config.cluster_name(), "pmxcfs");
>> +        assert_eq!(
>> +            config.debug_level(),
>> +            0,
>> +            "Debug level should be 0 when debug is false"
>> +        );
>> +    }
>> +
>> +    #[test]
>> +    fn test_config_creation_with_debug() {
>> +        let config = Config::new(
>> +            "node2".to_string(),
>> +            "10.0.0.5".to_string(),
>> +            1000,
>> +            true,
>> +            false,
>> +            "test-cluster".to_string(),
>> +        );
>> +
>> +        assert!(config.is_debug());
>> +        assert_eq!(
>> +            config.debug_level(),
>> +            1,
>> +            "Debug level should be 1 when debug is true"
>> +        );
>> +    }
>> +
>> +    #[test]
>> +    fn test_config_creation_local_mode() {
>> +        let config = Config::new(
>> +            "localhost".to_string(),
>> +            "127.0.0.1".to_string(),
>> +            33,
>> +            false,
>> +            true,
>> +            "local".to_string(),
>> +        );
>> +
>> +        assert!(config.is_local_mode());
>> +        assert!(!config.is_debug());
>> +    }
>> +
>> +    // ===== Getter Tests =====
>> +
>> +    #[test]
>> +    fn test_all_getters() {
>> +        let config = Config::new(
>> +            "testnode".to_string(),
>> +            "172.16.0.1".to_string(),
>> +            999,
>> +            true,
>> +            true,
>> +            "my-cluster".to_string(),
>> +        );
>> +
>> +        // Test all getter methods
>> +        assert_eq!(config.nodename(), "testnode");
>> +        assert_eq!(config.node_ip(), "172.16.0.1");
>> +        assert_eq!(config.www_data_gid(), 999);
>> +        assert!(config.is_debug());
>> +        assert!(config.is_local_mode());
>> +        assert_eq!(config.cluster_name(), "my-cluster");
>> +        assert_eq!(config.debug_level(), 1);
>> +    }
>> +
>> +    // ===== Debug Level Mutation Tests =====
>> +
>> +    #[test]
>> +    fn test_debug_level_mutation() {
>> +        let config = Config::new(
>> +            "node1".to_string(),
>> +            "192.168.1.1".to_string(),
>> +            33,
>> +            false,
>> +            false,
>> +            "pmxcfs".to_string(),
>> +        );
>> +
>> +        assert_eq!(config.debug_level(), 0);
>> +
>> +        config.set_debug_level(1);
>> +        assert_eq!(config.debug_level(), 1);
>> +
>> +        config.set_debug_level(5);
>> +        assert_eq!(config.debug_level(), 5);
>> +
>> +        config.set_debug_level(0);
>> +        assert_eq!(config.debug_level(), 0);
>> +    }
>> +
>> +    #[test]
>> +    fn test_debug_level_max_value() {
>> +        let config = Config::new(
>> +            "node1".to_string(),
>> +            "192.168.1.1".to_string(),
>> +            33,
>> +            false,
>> +            false,
>> +            "pmxcfs".to_string(),
>> +        );
>> +
>> +        config.set_debug_level(255);
>> +        assert_eq!(config.debug_level(), 255);
>> +
>> +        config.set_debug_level(0);
>> +        assert_eq!(config.debug_level(), 0);
>> +    }
>> +
>> +    // ===== Thread Safety Tests =====
>> +
>> +    #[test]
>> +    fn test_debug_level_thread_safety() {
>> +        let config = Config::new(
>> +            "node1".to_string(),
>> +            "192.168.1.1".to_string(),
>> +            33,
>> +            false,
>> +            false,
>> +            "pmxcfs".to_string(),
>> +        );
>> +
>> +        let config_clone = Arc::clone(&config);
>> +
>> +        // Spawn multiple threads that concurrently modify debug level
>> +        let handles: Vec<_> = (0..10)
>> +            .map(|i| {
>> +                let cfg = Arc::clone(&config);
>> +                thread::spawn(move || {
>> +                    for _ in 0..100 {
>> +                        cfg.set_debug_level(i);
>> +                        let _ = cfg.debug_level();
>> +                    }
>> +                })
>> +            })
>> +            .collect();
>> +
>> +        // All threads should complete without panicking
>> +        for handle in handles {
>> +            handle.join().unwrap();
>> +        }
>> +
>> +        // Final value should be one of the values set by threads
>> +        let final_level = config_clone.debug_level();
>> +        assert!(
>> +            final_level < 10,
>> +            "Debug level should be < 10, got {final_level}"
>> +        );
>> +    }
>> +
>> +    #[test]
>> +    fn test_concurrent_reads() {
>> +        let config = Config::new(
>> +            "node1".to_string(),
>> +            "192.168.1.1".to_string(),
>> +            33,
>> +            true,
>> +            false,
>> +            "pmxcfs".to_string(),
>> +        );
>> +
>> +        // Spawn multiple threads that concurrently read config
>> +        let handles: Vec<_> = (0..20)
>> +            .map(|_| {
>> +                let cfg = Arc::clone(&config);
>> +                thread::spawn(move || {
>> +                    for _ in 0..1000 {
>> +                        assert_eq!(cfg.nodename(), "node1");
>> +                        assert_eq!(cfg.node_ip(), "192.168.1.1");
>> +                        assert_eq!(cfg.www_data_gid(), 33);
>> +                        assert!(cfg.is_debug());
>> +                        assert!(!cfg.is_local_mode());
>> +                        assert_eq!(cfg.cluster_name(), "pmxcfs");
>> +                    }
>> +                })
>> +            })
>> +            .collect();
>> +
>> +        for handle in handles {
>> +            handle.join().unwrap();
>> +        }
>> +    }
>> +
>> +    // ===== Clone Tests =====
>> +
>> +    #[test]
>> +    fn test_config_clone() {
>> +        let config1 = Config::new(
>> +            "node1".to_string(),
>> +            "192.168.1.1".to_string(),
>> +            33,
>> +            true,
>> +            false,
>> +            "pmxcfs".to_string(),
>> +        );
>> +
>> +        config1.set_debug_level(5);
>> +
>> +        let config2 = (*config1).clone();
>> +
>> +        // Cloned config should have same values
>> +        assert_eq!(config2.nodename(), config1.nodename());
>> +        assert_eq!(config2.node_ip(), config1.node_ip());
>> +        assert_eq!(config2.www_data_gid(), config1.www_data_gid());
>> +        assert_eq!(config2.is_debug(), config1.is_debug());
>> +        assert_eq!(config2.is_local_mode(), config1.is_local_mode());
>> +        assert_eq!(config2.cluster_name(), config1.cluster_name());
>> +        assert_eq!(config2.debug_level(), 5);
>> +
>> +        // Modifying one should not affect the other
>> +        config2.set_debug_level(10);
>> +        assert_eq!(config1.debug_level(), 5);
>> +        assert_eq!(config2.debug_level(), 10);
>> +    }
>> +
>> +    // ===== Debug Formatting Tests =====
>> +
>> +    #[test]
>> +    fn test_debug_format() {
>> +        let config = Config::new(
>> +            "node1".to_string(),
>> +            "192.168.1.1".to_string(),
>> +            33,
>> +            true,
>> +            false,
>> +            "pmxcfs".to_string(),
>> +        );
>> +
>> +        let debug_str = format!("{config:?}");
>> +
>> +        // Check that debug output contains all fields
>> +        assert!(debug_str.contains("Config"));
>> +        assert!(debug_str.contains("nodename"));
>> +        assert!(debug_str.contains("node1"));
>> +        assert!(debug_str.contains("node_ip"));
>> +        assert!(debug_str.contains("192.168.1.1"));
>> +        assert!(debug_str.contains("www_data_gid"));
>> +        assert!(debug_str.contains("33"));
>> +        assert!(debug_str.contains("debug"));
>> +        assert!(debug_str.contains("true"));
>> +        assert!(debug_str.contains("local_mode"));
>> +        assert!(debug_str.contains("false"));
>> +        assert!(debug_str.contains("cluster_name"));
>> +        assert!(debug_str.contains("pmxcfs"));
>> +        assert!(debug_str.contains("debug_level"));
>> +    }
>> +
>> +    // ===== Edge Cases and Boundary Tests =====
>> +
>> +    #[test]
>> +    fn test_empty_strings() {
>> +        let config = Config::new(String::new(), String::new(), 0, false, false, String::new());
>> +
>> +        assert_eq!(config.nodename(), "");
>> +        assert_eq!(config.node_ip(), "");
>> +        assert_eq!(config.cluster_name(), "");
>> +        assert_eq!(config.www_data_gid(), 0);
>> +    }
>> +
>> +    #[test]
>> +    fn test_long_strings() {
>> +        let long_name = "a".repeat(1000);
>> +        let long_ip = "192.168.1.".to_string() + &"1".repeat(100);
>> +        let long_cluster = "cluster-".to_string() + &"x".repeat(500);
>> +
>> +        let config = Config::new(
>> +            long_name.clone(),
>> +            long_ip.clone(),
>> +            u32::MAX,
>> +            true,
>> +            true,
>> +            long_cluster.clone(),
>> +        );
>> +
>> +        assert_eq!(config.nodename(), long_name);
>> +        assert_eq!(config.node_ip(), long_ip);
>> +        assert_eq!(config.cluster_name(), long_cluster);
>> +        assert_eq!(config.www_data_gid(), u32::MAX);
>> +    }
>> +
>> +    #[test]
>> +    fn test_special_characters_in_strings() {
>> +        let config = Config::new(
>> +            "node-1_test.local".to_string(),
>> +            "192.168.1.10:8006".to_string(),
>> +            33,
>> +            false,
>> +            false,
>> +            "my-cluster_v2.0".to_string(),
>> +        );
>> +
>> +        assert_eq!(config.nodename(), "node-1_test.local");
>> +        assert_eq!(config.node_ip(), "192.168.1.10:8006");
>> +        assert_eq!(config.cluster_name(), "my-cluster_v2.0");
>> +    }
>> +
>> +    #[test]
>> +    fn test_unicode_in_strings() {
>> +        let config = Config::new(
>> +            "ノード1".to_string(),
>> +            "::1".to_string(),
>> +            33,
>> +            false,
>> +            false,
>> +            "集群".to_string(),
>> +        );
>> +
>> +        assert_eq!(config.nodename(), "ノード1");
>> +        assert_eq!(config.node_ip(), "::1");
>> +        assert_eq!(config.cluster_name(), "集群");
>> +    }
>> +}




^ permalink raw reply	[relevance 6%]

* Re: [pve-devel] [PATCH pve-cluster 03/15] pmxcfs-rs: add pmxcfs-logger crate
  @ 2026-01-27 13:16  6%   ` Samuel Rufinatscha
  0 siblings, 0 replies; 117+ results
From: Samuel Rufinatscha @ 2026-01-27 13:16 UTC (permalink / raw)
  To: Proxmox VE development discussion, Kefu Chai

Thanks for the patch, Kefu.

The overall structure looks solid.

The main points concern C compatibility details. It might
also be worth adding a couple of binary compatibility tests
(known C blobs/fixtures) and a perf test for merging large logs.
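
As a rough sketch of what a fixture-style check could look like for the
hashing part: the function below is standard 64-bit FNV-1a, written
here for illustration and not taken from hash.rs, and the vectors are
the published FNV-1a test values. Whether C's fnv_64a_buf is fed
exactly the same bytes (for example with or without a trailing NUL) is
an assumption that real fixtures generated from the C build would have
to confirm.

```rust
/// Standard 64-bit FNV-1a over a byte slice (illustrative, not hash.rs).
pub fn fnv1a_64(data: &[u8]) -> u64 {
    const OFFSET_BASIS: u64 = 0xcbf2_9ce4_8422_2325;
    const PRIME: u64 = 0x0000_0100_0000_01b3;
    data.iter().fold(OFFSET_BASIS, |hash, &byte| {
        // XOR the byte in first, then multiply: this order is what
        // distinguishes FNV-1a from FNV-1.
        (hash ^ u64::from(byte)).wrapping_mul(PRIME)
    })
}

/// Published FNV-1a 64-bit vectors, usable as a fixture table.
pub const KNOWN_VECTORS: [(&[u8], u64); 3] = [
    (b"", 0xcbf2_9ce4_8422_2325),
    (b"a", 0xaf63_dc4c_8601_ec8c),
    (b"foobar", 0x8594_4171_f739_67e8),
];
```

The same table-driven pattern would extend to packed log entries: store
a blob produced by the C code next to the expected field values and
assert that the Rust side decodes and re-encodes it byte for byte.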

Please see inline comments below.

On 1/6/26 3:24 PM, Kefu Chai wrote:
> Add cluster logging system with:
> - ClusterLog: Main API with automatic deduplication
> - RingBuffer: Circular buffer (50,000 entries)
> - FNV-1a hashing for duplicate detection
> - JSON export matching C format
> - Binary serialization for efficient storage
> - Time-based and node-digest sorting
> 
> This is a self-contained crate with no internal dependencies,
> only requiring serde and parking_lot. It provides ~24% of the
> C version's LOC (740 vs 3000+) while maintaining full
> compatibility with the existing log format.
> 
> Includes comprehensive unit tests for ring buffer operations,
> serialization, and filtering.
> 
> Signed-off-by: Kefu Chai <k.chai@proxmox.com>
> ---
>   src/pmxcfs-rs/Cargo.toml                      |   1 +
>   src/pmxcfs-rs/pmxcfs-logger/Cargo.toml        |  15 +
>   src/pmxcfs-rs/pmxcfs-logger/README.md         |  58 ++
>   .../pmxcfs-logger/src/cluster_log.rs          | 550 +++++++++++++++++
>   src/pmxcfs-rs/pmxcfs-logger/src/entry.rs      | 579 +++++++++++++++++
>   src/pmxcfs-rs/pmxcfs-logger/src/hash.rs       | 173 ++++++
>   src/pmxcfs-rs/pmxcfs-logger/src/lib.rs        |  27 +
>   .../pmxcfs-logger/src/ring_buffer.rs          | 581 ++++++++++++++++++
>   8 files changed, 1984 insertions(+)
>   create mode 100644 src/pmxcfs-rs/pmxcfs-logger/Cargo.toml
>   create mode 100644 src/pmxcfs-rs/pmxcfs-logger/README.md
>   create mode 100644 src/pmxcfs-rs/pmxcfs-logger/src/cluster_log.rs
>   create mode 100644 src/pmxcfs-rs/pmxcfs-logger/src/entry.rs
>   create mode 100644 src/pmxcfs-rs/pmxcfs-logger/src/hash.rs
>   create mode 100644 src/pmxcfs-rs/pmxcfs-logger/src/lib.rs
>   create mode 100644 src/pmxcfs-rs/pmxcfs-logger/src/ring_buffer.rs
> 
> diff --git a/src/pmxcfs-rs/Cargo.toml b/src/pmxcfs-rs/Cargo.toml
> index 28e20bb7..4d17e87e 100644
> --- a/src/pmxcfs-rs/Cargo.toml
> +++ b/src/pmxcfs-rs/Cargo.toml
> @@ -3,6 +3,7 @@
>   members = [
>       "pmxcfs-api-types",  # Shared types and error definitions
>       "pmxcfs-config",     # Configuration management
> +    "pmxcfs-logger",     # Cluster log with ring buffer and deduplication
>   ]
>   resolver = "2"
>   
> diff --git a/src/pmxcfs-rs/pmxcfs-logger/Cargo.toml b/src/pmxcfs-rs/pmxcfs-logger/Cargo.toml
> new file mode 100644
> index 00000000..1af3f015
> --- /dev/null
> +++ b/src/pmxcfs-rs/pmxcfs-logger/Cargo.toml
> @@ -0,0 +1,15 @@
> +[package]
> +name = "pmxcfs-logger"
> +version = "0.1.0"
> +edition = "2021"
> +
> +[dependencies]
> +anyhow = "1.0"
> +parking_lot = "0.12"
> +serde = { version = "1.0", features = ["derive"] }
> +serde_json = "1.0"
> +tracing = "0.1"
> +
> +[dev-dependencies]
> +tempfile = "3.0"
> +
> diff --git a/src/pmxcfs-rs/pmxcfs-logger/README.md b/src/pmxcfs-rs/pmxcfs-logger/README.md
> new file mode 100644
> index 00000000..38f102c2
> --- /dev/null
> +++ b/src/pmxcfs-rs/pmxcfs-logger/README.md
> @@ -0,0 +1,58 @@
> +# pmxcfs-logger
> +
> +Cluster-wide log management for pmxcfs, fully compatible with the C implementation (logger.c).
> +
> +## Overview
> +
> +This crate implements a cluster log system matching Proxmox's C-based logger.c behavior. It provides:
> +
> +- **Ring Buffer Storage**: Circular buffer for log entries with automatic capacity management
> +- **FNV-1a Hashing**: Hashing for node and identity-based deduplication
> +- **Deduplication**: Per-node tracking of latest log entries to avoid duplicates
> +- **Time-based Sorting**: Chronological ordering of log entries across nodes
> +- **Multi-node Merging**: Combining logs from multiple cluster nodes
> +- **JSON Export**: Web UI-compatible JSON output matching C format
> +
> +## Architecture
> +
> +### Key Components
> +
> +1. **LogEntry** (`entry.rs`): Individual log entry with automatic UID generation
> +2. **RingBuffer** (`ring_buffer.rs`): Circular buffer with capacity management
> +3. **ClusterLog** (`lib.rs`): Main API with deduplication and merging
> +4. **Hash Functions** (`hash.rs`): FNV-1a implementation matching C
> +
> +## C to Rust Mapping
> +
> +| C Function | Rust Equivalent | Location |
> +|------------|-----------------|----------|
> +| `fnv_64a_buf` | `hash::fnv_64a` | hash.rs |
> +| `clog_pack` | `LogEntry::pack` | entry.rs |
> +| `clog_copy` | `RingBuffer::add_entry` | ring_buffer.rs |
> +| `clog_sort` | `RingBuffer::sort` | ring_buffer.rs |
> +| `clog_dump_json` | `RingBuffer::dump_json` | ring_buffer.rs |
> +| `clusterlog_insert` | `ClusterLog::insert` | lib.rs |
> +| `clusterlog_add` | `ClusterLog::add` | lib.rs |
> +| `clusterlog_merge` | `ClusterLog::merge` | lib.rs |
> +| `dedup_lookup` | `ClusterLog::dedup_lookup` | lib.rs |
> +
> +## Key Differences from C
> +
> +1. **No `node_digest` in DedupEntry**: C stores `node_digest` both as HashMap key and in the struct. Rust only uses it as the key, saving 8 bytes per entry.
> +
> +2. **Mutex granularity**: C uses a single global mutex. Rust uses separate Arc<Mutex<>> for buffer and dedup table, allowing better concurrency.
> +
> +3. **Code size**: Rust implementation is ~24% the size of C (740 lines vs 3,000+) while maintaining equivalent functionality.
> +
> +## Integration
> +
> +This crate is integrated into `pmxcfs-status` to provide cluster log functionality. The `.clusterlog` FUSE plugin uses this to provide JSON log output compatible with the Proxmox web UI.
> +
> +## References
> +
> +### C Implementation
> +- `src/pmxcfs/logger.c` / `logger.h` - Cluster log implementation
> +
> +### Related Crates
> +- **pmxcfs-status**: Integrates ClusterLog for status tracking
> +- **pmxcfs**: FUSE plugin exposes cluster log via `.clusterlog`
> diff --git a/src/pmxcfs-rs/pmxcfs-logger/src/cluster_log.rs b/src/pmxcfs-rs/pmxcfs-logger/src/cluster_log.rs
> new file mode 100644
> index 00000000..3eb6c68c
> --- /dev/null
> +++ b/src/pmxcfs-rs/pmxcfs-logger/src/cluster_log.rs
> @@ -0,0 +1,550 @@
> +/// Cluster Log Implementation
> +///
> +/// This module implements the cluster-wide log system with deduplication
> +/// and merging support, matching C's clusterlog_t.
> +use crate::entry::LogEntry;
> +use crate::ring_buffer::{RingBuffer, CLOG_DEFAULT_SIZE};
> +use anyhow::Result;
> +use parking_lot::Mutex;
> +use std::collections::{BTreeMap, HashMap};
> +use std::sync::Arc;
> +
> +/// Deduplication entry - tracks the latest UID and time for each node
> +///
> +/// Note: C's `dedup_entry_t` (logger.c:70-74) includes node_digest field because
> +/// GHashTable stores the struct pointer both as key and value. In Rust, we use
> +/// HashMap<u64, DedupEntry> where node_digest is the key, so we don't need to
> +/// duplicate it in the value. This is functionally equivalent but more efficient.
> +#[derive(Debug, Clone)]
> +pub(crate) struct DedupEntry {
> +    /// Latest UID seen from this node
> +    pub uid: u32,
> +    /// Latest timestamp seen from this node
> +    pub time: u32,
> +}
> +
> +/// Cluster-wide log with deduplication and merging support
> +/// Matches C's `clusterlog_t`
> +pub struct ClusterLog {
> +    /// Ring buffer for log storage
> +    pub(crate) buffer: Arc<Mutex<RingBuffer>>,
> +
> +    /// Deduplication tracker (node_digest -> latest entry info)
> +    /// Matches C's dedup hash table
> +    pub(crate) dedup: Arc<Mutex<HashMap<u64, DedupEntry>>>,
> +}
> +
> +impl ClusterLog {
> +    /// Create a new cluster log with default size
> +    pub fn new() -> Self {
> +        Self::with_capacity(CLOG_DEFAULT_SIZE)
> +    }
> +
> +    /// Create a new cluster log with specified capacity
> +    pub fn with_capacity(capacity: usize) -> Self {
> +        Self {
> +            buffer: Arc::new(Mutex::new(RingBuffer::new(capacity))),
> +            dedup: Arc::new(Mutex::new(HashMap::new())),
> +        }
> +    }
> +
> +    /// Matches C's `clusterlog_add` function (logger.c:588-615)
> +    #[allow(clippy::too_many_arguments)]
> +    pub fn add(
> +        &self,
> +        node: &str,
> +        ident: &str,
> +        tag: &str,
> +        pid: u32,
> +        priority: u8,
> +        time: u32,
> +        message: &str,
> +    ) -> Result<()> {
> +        let entry = LogEntry::pack(node, ident, tag, pid, time, priority, message)?;
> +        self.insert(&entry)
> +    }
> +
> +    /// Insert a log entry (with deduplication)
> +    ///
> +    /// Matches C's `clusterlog_insert` function (logger.c:573-586)
> +    pub fn insert(&self, entry: &LogEntry) -> Result<()> {
> +        let mut dedup = self.dedup.lock();
> +
> +        // Check deduplication
> +        if self.is_not_duplicate(&mut dedup, entry) {
> +            // Entry is not a duplicate, add it
> +            let mut buffer = self.buffer.lock();
> +            buffer.add_entry(entry)?;
> +        } else {
> +            tracing::debug!("Ignoring duplicate cluster log entry");
> +        }
> +
> +        Ok(())
> +    }
> +
> +    /// Check if entry is a duplicate (returns true if NOT a duplicate)
> +    ///
> +    /// Matches C's `dedup_lookup` function (logger.c:362-388)
> +    fn is_not_duplicate(&self, dedup: &mut HashMap<u64, DedupEntry>, entry: &LogEntry) -> bool {
> +        match dedup.get_mut(&entry.node_digest) {
> +            None => {
> +                dedup.insert(
> +                    entry.node_digest,
> +                    DedupEntry {
> +                        time: entry.time,
> +                        uid: entry.uid,
> +                    },
> +                );
> +                true
> +            }
> +            Some(dd) => {
> +                if entry.time > dd.time || (entry.time == dd.time && entry.uid > dd.uid) {
> +                    dd.time = entry.time;
> +                    dd.uid = entry.uid;
> +                    true
> +                } else {
> +                    false
> +                }
> +            }
> +        }
> +    }
> +
> +    pub fn get_entries(&self, max: usize) -> Vec<LogEntry> {
> +        let buffer = self.buffer.lock();
> +        buffer.iter().take(max).cloned().collect()
> +    }
> +
> +    /// Clear all log entries (for testing)
> +    pub fn clear(&self) {
> +        let mut buffer = self.buffer.lock();
> +        let capacity = buffer.capacity();
> +        *buffer = RingBuffer::new(capacity);
> +        drop(buffer);
> +
> +        self.dedup.lock().clear();
> +    }
> +
> +    /// Sort the log entries by time
> +    ///
> +    /// Matches C's `clog_sort` function (logger.c:321-355)
> +    pub fn sort(&self) -> Result<RingBuffer> {
> +        let buffer = self.buffer.lock();
> +        buffer.sort()
> +    }
> +
> +    /// Merge logs from multiple nodes
> +    ///
> +    /// Matches C's `clusterlog_merge` function (logger.c:405-512)
> +    pub fn merge(&self, remote_logs: Vec<RingBuffer>, include_local: bool) -> Result<RingBuffer> {
> +        let mut sorted_entries: BTreeMap<(u32, u64, u32), LogEntry> = BTreeMap::new();
> +        let mut merge_dedup: HashMap<u64, DedupEntry> = HashMap::new();
> +
> +        // Calculate maximum capacity
> +        let max_size = if include_local {
> +            let local = self.buffer.lock();
> +            let local_cap = local.capacity();
> +            drop(local);
> +
> +            std::iter::once(local_cap)
> +                .chain(remote_logs.iter().map(|b| b.capacity()))
> +                .max()
> +                .unwrap_or(CLOG_DEFAULT_SIZE)
> +        } else {
> +            remote_logs
> +                .iter()
> +                .map(|b| b.capacity())
> +                .max()
> +                .unwrap_or(CLOG_DEFAULT_SIZE)
> +        };
> +
> +        // Add local entries if requested
> +        if include_local {
> +            let buffer = self.buffer.lock();
> +            for entry in buffer.iter() {
> +                let key = (entry.time, entry.node_digest, entry.uid);
> +                sorted_entries.insert(key, entry.clone());

BTreeMap::insert overwrites the existing value when the key is already
present. Please re-check whether that is intended; if we want
keep-first semantics, use entry(key).or_insert(...) and only update
merge_dedup when the entry was newly inserted.
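
A minimal sketch of the keep-first variant, in case it helps. The helper
name insert_keep_first and the generic value type are hypothetical and
not part of the patch; only the (time, node_digest, uid) key layout is
taken from merge(). The returned bool could gate the merge_dedup update.

```rust
use std::collections::btree_map::Entry;
use std::collections::BTreeMap;

/// Key layout used by merge() in the patch: (time, node_digest, uid).
type Key = (u32, u64, u32);

/// Hypothetical helper: insert `value` only if `key` is not present yet.
/// Returns true when the value was newly inserted, false when an
/// existing value was kept.
fn insert_keep_first<V>(map: &mut BTreeMap<Key, V>, key: Key, value: V) -> bool {
    match map.entry(key) {
        Entry::Vacant(slot) => {
            slot.insert(value);
            true
        }
        // Unlike BTreeMap::insert, the existing value is left untouched.
        Entry::Occupied(_) => false,
    }
}
```

With this, the dedup tracker would only be touched when the call
returns true.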

> +                self.is_not_duplicate(&mut merge_dedup, entry);
> +            }
> +        }
> +
> +        // Add remote entries
> +        for remote_buffer in &remote_logs {
> +            for entry in remote_buffer.iter() {
> +                let key = (entry.time, entry.node_digest, entry.uid);
> +                sorted_entries.insert(key, entry.clone());
> +                self.is_not_duplicate(&mut merge_dedup, entry);
> +            }
> +        }
> +
> +        let mut result = RingBuffer::new(max_size);
> +
> +        // BTreeMap iterates in key order, entries are already sorted by (time, node_digest, uid)
> +        for (_key, entry) in sorted_entries.iter().rev() {

C iterates oldest -> newest, and clog_copy() makes each entry the new
head, so the result is newest first. With .rev() and push_front we
likely invert that order. Maybe drop .rev()? Please re-check.
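
To illustrate the ordering concern, here is a small stand-alone sketch.
It assumes that add_entry() behaves like a push_front (each new entry
becomes the head); the VecDeque and the helper fill_head_first are
stand-ins for illustration, not the real RingBuffer.

```rust
use std::collections::VecDeque;

/// Stand-in for the ring buffer: every added entry becomes the new head,
/// so the returned Vec is listed from head to tail.
fn fill_head_first<'a>(times: impl Iterator<Item = &'a u32>) -> Vec<u32> {
    let mut buf = VecDeque::new();
    for t in times {
        buf.push_front(*t);
    }
    buf.into_iter().collect()
}
```

Feeding the timestamps oldest -> newest leaves the newest entry at the
head, which is what the C code ends up with. Feeding them newest ->
oldest (what .rev() does) leaves the oldest entry at the head.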

> +            if result.is_near_full() {
> +                break;
> +            }
> +            result.add_entry(entry)?;
> +        }
> +
> +        *self.dedup.lock() = merge_dedup;

clusterlog_merge() in C updates both cl->dedup and cl->base under the
same mutex. Here we update only dedup and return a RingBuffer, which
then requires a separate update_buffer() call. Shouldn't this be an
atomic operation? Also, we currently have two mutexes (dedup and
buffer), which increases deadlock risk. Couldn't we put buffer and
dedup behind one mutex and make merge() update both atomically
inside the same lock?
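
Roughly what I have in mind, as a sketch only. LogState,
ClusterLogSketch and apply_merge are hypothetical names, a Vec of
(time, node_digest, uid) tuples stands in for RingBuffer, and
std::sync::Mutex is used here just to keep the example self-contained.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

#[derive(Debug, Clone, Copy, PartialEq)]
struct DedupEntry {
    time: u32,
    uid: u32,
}

/// Buffer and dedup table live behind ONE lock.
#[derive(Default)]
struct LogState {
    entries: Vec<(u32, u64, u32)>, // stand-in for RingBuffer
    dedup: HashMap<u64, DedupEntry>,
}

#[derive(Clone, Default)]
struct ClusterLogSketch {
    state: Arc<Mutex<LogState>>,
}

impl ClusterLogSketch {
    /// Replace buffer and dedup table together, so readers never see a
    /// new dedup table paired with the old buffer (or the reverse).
    fn apply_merge(&self, merged: Vec<(u32, u64, u32)>) {
        // Build the new dedup table before taking the lock.
        let mut dedup = HashMap::new();
        for &(time, node_digest, uid) in &merged {
            let e = dedup
                .entry(node_digest)
                .or_insert(DedupEntry { time, uid });
            // Track the latest (time, uid) per node.
            if (time, uid) > (e.time, e.uid) {
                *e = DedupEntry { time, uid };
            }
        }

        let mut state = self.state.lock().unwrap();
        state.entries = merged;
        state.dedup = dedup;
    }
}
```

With a single lock there is also no lock ordering to get wrong between
insert() and merge().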

> +
> +        Ok(result)
> +    }
> +
> +    /// Export log to JSON format
> +    ///
> +    /// Matches C's `clog_dump_json` function (logger.c:139-199)
> +    pub fn dump_json(&self, ident_filter: Option<&str>, max_entries: usize) -> String {
> +        let buffer = self.buffer.lock();
> +        buffer.dump_json(ident_filter, max_entries)
> +    }
> +
> +    /// Export log to JSON format with sorted entries
> +    pub fn dump_json_sorted(
> +        &self,
> +        ident_filter: Option<&str>,
> +        max_entries: usize,
> +    ) -> Result<String> {
> +        let sorted = self.sort()?;
> +        Ok(sorted.dump_json(ident_filter, max_entries))
> +    }
> +
> +    /// Matches C's `clusterlog_get_state` function (logger.c:553-571)
> +    ///
> +    /// Returns binary-serialized clog_base_t structure for network transmission.
> +    /// This format is compatible with C nodes for mixed-cluster operation.
> +    pub fn get_state(&self) -> Result<Vec<u8>> {
> +        let sorted = self.sort()?;
> +        Ok(sorted.serialize_binary())
> +    }
> +
> +    pub fn deserialize_state(data: &[u8]) -> Result<RingBuffer> {
> +        RingBuffer::deserialize_binary(data)
> +    }
> +
> +    /// Replace the entire buffer after merging logs from multiple nodes
> +    pub fn update_buffer(&self, new_buffer: RingBuffer) {
> +        *self.buffer.lock() = new_buffer;
> +    }
> +}
> +
> +impl Default for ClusterLog {
> +    fn default() -> Self {
> +        Self::new()
> +    }
> +}
> +
> +#[cfg(test)]
> +mod tests {
> +    use super::*;
> +
> +    #[test]
> +    fn test_cluster_log_creation() {
> +        let log = ClusterLog::new();
> +        assert!(log.buffer.lock().is_empty());
> +    }
> +
> +    #[test]
> +    fn test_add_entry() {
> +        let log = ClusterLog::new();
> +
> +        let result = log.add(
> +            "node1",
> +            "root",
> +            "cluster",
> +            12345,
> +            6, // Info priority
> +            1234567890,
> +            "Test message",
> +        );
> +
> +        assert!(result.is_ok());
> +        assert!(!log.buffer.lock().is_empty());
> +    }
> +
> +    #[test]
> +    fn test_deduplication() {
> +        let log = ClusterLog::new();
> +
> +        // Add same entry twice (but with different UIDs since each add creates a new entry)
> +        let _ = log.add("node1", "root", "cluster", 123, 6, 1000, "Message 1");
> +        let _ = log.add("node1", "root", "cluster", 123, 6, 1000, "Message 1");
> +
> +        // Both entries are added because they have different UIDs
> +        // Deduplication tracks the latest (time, UID) per node, not content
> +        let buffer = log.buffer.lock();
> +        assert_eq!(buffer.len(), 2);
> +    }
> +
> +    #[test]
> +    fn test_newer_entry_replaces() {
> +        let log = ClusterLog::new();
> +
> +        // Add older entry
> +        let _ = log.add("node1", "root", "cluster", 123, 6, 1000, "Old message");
> +
> +        // Add newer entry from same node
> +        let _ = log.add("node1", "root", "cluster", 123, 6, 1001, "New message");
> +
> +        // Should have both entries (newer doesn't remove older, just updates dedup tracker)
> +        let buffer = log.buffer.lock();
> +        assert_eq!(buffer.len(), 2);
> +    }
> +
> +    #[test]
> +    fn test_json_export() {
> +        let log = ClusterLog::new();
> +
> +        let _ = log.add(
> +            "node1",
> +            "root",
> +            "cluster",
> +            123,
> +            6,
> +            1234567890,
> +            "Test message",
> +        );
> +
> +        let json = log.dump_json(None, 50);
> +
> +        // Should be valid JSON
> +        assert!(serde_json::from_str::<serde_json::Value>(&json).is_ok());
> +
> +        // Should contain "data" field
> +        let value: serde_json::Value = serde_json::from_str(&json).unwrap();
> +        assert!(value.get("data").is_some());
> +    }
> +
> +    #[test]
> +    fn test_merge_logs() {
> +        let log1 = ClusterLog::new();
> +        let log2 = ClusterLog::new();
> +
> +        // Add entries to first log
> +        let _ = log1.add(
> +            "node1",
> +            "root",
> +            "cluster",
> +            123,
> +            6,
> +            1000,
> +            "Message from node1",
> +        );
> +
> +        // Add entries to second log
> +        let _ = log2.add(
> +            "node2",
> +            "root",
> +            "cluster",
> +            456,
> +            6,
> +            1001,
> +            "Message from node2",
> +        );
> +
> +        // Get log2's buffer for merging
> +        let log2_buffer = log2.buffer.lock().clone();
> +
> +        // Merge into log1
> +        let merged = log1.merge(vec![log2_buffer], true).unwrap();
> +
> +        // Should contain entries from both logs
> +        assert!(merged.len() >= 2);
> +    }
> +
> +    // ========================================================================
> +    // HIGH PRIORITY TESTS - Merge Edge Cases
> +    // ========================================================================
> +
> +    #[test]
> +    fn test_merge_empty_logs() {
> +        let log = ClusterLog::new();
> +
> +        // Add some entries to local log
> +        let _ = log.add("node1", "root", "cluster", 123, 6, 1000, "Local entry");
> +
> +        // Merge with empty remote logs
> +        let merged = log.merge(vec![], true).unwrap();
> +
> +        // Should have 1 entry (from local log)
> +        assert_eq!(merged.len(), 1);
> +        let entry = merged.iter().next().unwrap();
> +        assert_eq!(entry.node, "node1");
> +    }
> +
> +    #[test]
> +    fn test_merge_single_node_only() {
> +        let log = ClusterLog::new();
> +
> +        // Add entries only from single node
> +        let _ = log.add("node1", "root", "cluster", 123, 6, 1000, "Entry 1");
> +        let _ = log.add("node1", "root", "cluster", 124, 6, 1001, "Entry 2");
> +        let _ = log.add("node1", "root", "cluster", 125, 6, 1002, "Entry 3");
> +
> +        // Merge with no remote logs (just sort local)
> +        let merged = log.merge(vec![], true).unwrap();
> +
> +        // Should have all 3 entries
> +        assert_eq!(merged.len(), 3);
> +
> +        // Entries should be sorted by time (buffer stores newest first after reversing during add)
> +        // Merge reverses the BTreeMap iteration, so newest entries are added first
> +        let times: Vec<u32> = merged.iter().map(|e| e.time).collect();
> +        let mut expected = vec![1002, 1001, 1000];
> +        expected.sort();
> +        expected.reverse(); // Newest first
> +
> +        let mut actual = times.clone();
> +        actual.sort();
> +        actual.reverse();
> +
> +        assert_eq!(actual, expected);
> +    }
> +
> +    #[test]
> +    fn test_merge_all_duplicates() {
> +        let log1 = ClusterLog::new();
> +        let log2 = ClusterLog::new();
> +
> +        // Add same entries to both logs (same node, time, but different UIDs)
> +        let _ = log1.add("node1", "root", "cluster", 123, 6, 1000, "Entry 1");
> +        let _ = log1.add("node1", "root", "cluster", 124, 6, 1001, "Entry 2");
> +
> +        let _ = log2.add("node1", "root", "cluster", 125, 6, 1000, "Entry 1");
> +        let _ = log2.add("node1", "root", "cluster", 126, 6, 1001, "Entry 2");
> +
> +        let log2_buffer = log2.buffer.lock().clone();
> +
> +        // Merge - should handle entries from same node at same times
> +        let merged = log1.merge(vec![log2_buffer], true).unwrap();
> +
> +        // Should have 4 entries (all are unique by UID despite same time/node)
> +        assert_eq!(merged.len(), 4);
> +    }
> +
> +    #[test]
> +    fn test_merge_exceeding_capacity() {
> +        // Create small buffer to test capacity enforcement
> +        let log = ClusterLog::with_capacity(50_000); // Small buffer
> +
> +        // Add many entries to fill beyond capacity
> +        for i in 0..100 {
> +            let _ = log.add(
> +                "node1",
> +                "root",
> +                "cluster",
> +                100 + i,
> +                6,
> +                1000 + i,
> +                &format!("Entry {}", i),
> +            );
> +        }
> +
> +        // Create remote log with many entries
> +        let remote = ClusterLog::with_capacity(50_000);
> +        for i in 0..100 {
> +            let _ = remote.add(
> +                "node2",
> +                "root",
> +                "cluster",
> +                200 + i,
> +                6,
> +                1000 + i,
> +                &format!("Remote {}", i),
> +            );
> +        }
> +
> +        let remote_buffer = remote.buffer.lock().clone();
> +
> +        // Merge - should stop when buffer is near full
> +        let merged = log.merge(vec![remote_buffer], true).unwrap();
> +
> +        // Buffer should be limited by capacity, not necessarily < 200
> +        // The actual limit depends on entry sizes and capacity
> +        // Just verify we got some reasonable number of entries
> +        assert!(!merged.is_empty(), "Should have some entries");
> +        assert!(
> +            merged.len() <= 200,
> +            "Should not exceed total available entries"
> +        );
> +    }
> +
> +    #[test]
> +    fn test_merge_preserves_dedup_state() {
> +        let log = ClusterLog::new();
> +
> +        // Add entries from node1
> +        let _ = log.add("node1", "root", "cluster", 123, 6, 1000, "Entry 1");
> +        let _ = log.add("node1", "root", "cluster", 124, 6, 1001, "Entry 2");
> +
> +        // Create remote log with later entries from node1
> +        let remote = ClusterLog::new();
> +        let _ = remote.add("node1", "root", "cluster", 125, 6, 1002, "Entry 3");
> +
> +        let remote_buffer = remote.buffer.lock().clone();
> +
> +        // Merge
> +        let _ = log.merge(vec![remote_buffer], true).unwrap();
> +
> +        // Check that dedup state was updated
> +        let dedup = log.dedup.lock();
> +        let node1_digest = crate::hash::fnv_64a_str("node1");
> +        let dedup_entry = dedup.get(&node1_digest).unwrap();
> +
> +        // Should track the latest time from node1
> +        assert_eq!(dedup_entry.time, 1002);
> +        // UID is auto-generated, so just verify it exists and is reasonable
> +        assert!(dedup_entry.uid > 0);
> +    }
> +
> +    #[test]
> +    fn test_get_state_binary_format() {
> +        let log = ClusterLog::new();
> +
> +        // Add some entries
> +        let _ = log.add("node1", "root", "cluster", 123, 6, 1000, "Entry 1");
> +        let _ = log.add("node2", "admin", "system", 456, 6, 1001, "Entry 2");
> +
> +        // Get state
> +        let state = log.get_state().unwrap();
> +
> +        // Should be binary format, not JSON
> +        assert!(state.len() >= 8); // At least header
> +
> +        // Check header format (clog_base_t)
> +        let size = u32::from_le_bytes(state[0..4].try_into().unwrap()) as usize;
> +        let cpos = u32::from_le_bytes(state[4..8].try_into().unwrap());
> +
> +        assert_eq!(size, state.len());
> +        assert_eq!(cpos, 8); // First entry at offset 8
> +
> +        // Should be able to deserialize back
> +        let deserialized = ClusterLog::deserialize_state(&state).unwrap();
> +        assert_eq!(deserialized.len(), 2);
> +    }
> +
> +    #[test]
> +    fn test_state_roundtrip() {
> +        let log = ClusterLog::new();
> +
> +        // Add entries
> +        let _ = log.add("node1", "root", "cluster", 123, 6, 1000, "Test 1");
> +        let _ = log.add("node2", "admin", "system", 456, 6, 1001, "Test 2");
> +
> +        // Serialize
> +        let state = log.get_state().unwrap();
> +
> +        // Deserialize
> +        let deserialized = ClusterLog::deserialize_state(&state).unwrap();
> +
> +        // Check entries preserved
> +        assert_eq!(deserialized.len(), 2);
> +
> +        // Buffer is stored newest-first after sorting and serialization
> +        let entries: Vec<_> = deserialized.iter().collect();
> +        assert_eq!(entries[0].node, "node2"); // Newest (time 1001)
> +        assert_eq!(entries[0].message, "Test 2");
> +        assert_eq!(entries[1].node, "node1"); // Oldest (time 1000)
> +        assert_eq!(entries[1].message, "Test 1");
> +    }
> +}
> diff --git a/src/pmxcfs-rs/pmxcfs-logger/src/entry.rs b/src/pmxcfs-rs/pmxcfs-logger/src/entry.rs
> new file mode 100644
> index 00000000..187667ad
> --- /dev/null
> +++ b/src/pmxcfs-rs/pmxcfs-logger/src/entry.rs
> @@ -0,0 +1,579 @@
> +/// Log Entry Implementation
> +///
> +/// This module implements the cluster log entry structure, matching the C
> +/// implementation's clog_entry_t (logger.c).
> +use super::hash::fnv_64a_str;
> +use anyhow::{bail, Result};
> +use serde::Serialize;
> +use std::sync::atomic::{AtomicU32, Ordering};
> +
> +// Constants from C implementation
> +pub(crate) const CLOG_MAX_ENTRY_SIZE: usize = 8192 + 4096; // SYSLOG_MAX_LINE_LENGTH + overhead

This constant is also defined in ring_buffer.rs. Please define it in
one place and reuse it.

> +
> +/// Global UID counter (matches C's `uid_counter` in logger.c:62)
> +static UID_COUNTER: AtomicU32 = AtomicU32::new(0);
> +
> +/// Log entry structure
> +///
> +/// Matches C's `clog_entry_t` from logger.c:
> +/// ```c
> +/// typedef struct {
> +///     uint32_t prev;          // Previous entry offset
> +///     uint32_t next;          // Next entry offset
> +///     uint32_t uid;           // Unique ID
> +///     uint32_t time;          // Timestamp
> +///     uint64_t node_digest;   // FNV-1a hash of node name
> +///     uint64_t ident_digest;  // FNV-1a hash of ident
> +///     uint32_t pid;           // Process ID
> +///     uint8_t priority;       // Syslog priority (0-7)
> +///     uint8_t node_len;       // Length of node name (including null)
> +///     uint8_t ident_len;      // Length of ident (including null)
> +///     uint8_t tag_len;        // Length of tag (including null)
> +///     uint32_t msg_len;       // Length of message (including null)
> +///     char data[];            // Variable length data: node + ident + tag + msg
> +/// } clog_entry_t;
> +/// ```
> +#[derive(Debug, Clone, Serialize)]
> +pub struct LogEntry {
> +    /// Unique ID for this entry (auto-incrementing)
> +    pub uid: u32,
> +
> +    /// Unix timestamp
> +    pub time: u32,
> +
> +    /// FNV-1a hash of node name
> +    pub node_digest: u64,
> +
> +    /// FNV-1a hash of ident (user)
> +    pub ident_digest: u64,
> +
> +    /// Process ID
> +    pub pid: u32,
> +
> +    /// Syslog priority (0-7)
> +    pub priority: u8,
> +
> +    /// Node name
> +    pub node: String,
> +
> +    /// Identity/user
> +    pub ident: String,
> +
> +    /// Tag (e.g., "cluster", "pmxcfs")
> +    pub tag: String,
> +
> +    /// Log message
> +    pub message: String,
> +}
> +
> +impl LogEntry {
> +    /// Matches C's `clog_pack` function (logger.c:220-278)
> +    pub fn pack(
> +        node: &str,
> +        ident: &str,
> +        tag: &str,
> +        pid: u32,
> +        time: u32,
> +        priority: u8,
> +        message: &str,
> +    ) -> Result<Self> {
> +        if priority >= 8 {
> +            bail!("Invalid priority: {priority} (must be 0-7)");
> +        }
> +
> +        let node = Self::truncate_string(node, 255);
> +        let ident = Self::truncate_string(ident, 255);
> +        let tag = Self::truncate_string(tag, 255);
> +        let message = Self::utf8_to_ascii(message);
> +
> +        let node_len = node.len() + 1;
> +        let ident_len = ident.len() + 1;
> +        let tag_len = tag.len() + 1;
> +        let mut msg_len = message.len() + 1;
> +
> +        let total_size = std::mem::size_of::<u32>() * 4  // prev, next, uid, time
> +            + std::mem::size_of::<u64>() * 2  // node_digest, ident_digest
> +            + std::mem::size_of::<u32>() * 2  // pid, msg_len
> +            + std::mem::size_of::<u8>() * 4   // priority, node_len, ident_len, tag_len
> +            + node_len
> +            + ident_len
> +            + tag_len
> +            + msg_len;
> +
> +        if total_size > CLOG_MAX_ENTRY_SIZE {
> +            let diff = total_size - CLOG_MAX_ENTRY_SIZE;
> +            msg_len = msg_len.saturating_sub(diff);
> +        }
> +
> +        let node_digest = fnv_64a_str(&node);
> +        let ident_digest = fnv_64a_str(&ident);
> +        let uid = UID_COUNTER.fetch_add(1, Ordering::SeqCst).wrapping_add(1);
> +
> +        Ok(Self {
> +            uid,
> +            time,
> +            node_digest,
> +            ident_digest,
> +            pid,
> +            priority,
> +            node,
> +            ident,
> +            tag,
> +            message: message[..msg_len.saturating_sub(1)].to_string(),
> +        })
> +    }
> +
> +    /// Truncate string to max length
> +    fn truncate_string(s: &str, max_len: usize) -> String {
> +        if s.len() > max_len {
> +            s[..max_len].to_string()
> +        } else {
> +            s.to_string()
> +        }
> +    }
> +
> +    /// Convert UTF-8 to ASCII with proper escaping
> +    ///
> +    /// Matches C's `utf8_to_ascii` behavior (cfs-utils.c:40-107):
> +    /// - Control characters (0x00-0x1F, 0x7F): Escaped as #0XXX (e.g., #007 for BEL)
> +    /// - Unicode (U+0080 to U+FFFF): Escaped as \uXXXX (e.g., \u4e16 for 世)
> +    /// - Quotes (when quotequote=true): Escaped as \"
> +    /// - Characters > U+FFFF: Silently dropped
> +    /// - ASCII printable (0x20-0x7E except quotes): Passed through unchanged
> +    fn utf8_to_ascii(s: &str) -> String {
> +        let mut result = String::with_capacity(s.len());
> +
> +        for c in s.chars() {
> +            match c {
> +                // Control characters: #0XXX format (3 decimal digits with leading 0)
> +                '\x00'..='\x1F' | '\x7F' => {
> +                    let code = c as u32;
> +                    result.push('#');
> +                    result.push('0');
> +                    // Format as 3 decimal digits with leading zeros (e.g., #0007 for BEL)
> +                    result.push_str(&format!("{:03}", code));
> +                }
> +                // ASCII printable characters: pass through
> +                c if c.is_ascii() => {
> +                    result.push(c);
> +                }
> +                // Unicode U+0080 to U+FFFF: \uXXXX format
> +                c if (c as u32) < 0x10000 => {
> +                    result.push('\\');
> +                    result.push('u');
> +                    result.push_str(&format!("{:04x}", c as u32));
> +                }
> +                // Characters > U+FFFF: silently drop (matches C behavior)
> +                _ => {}
> +            }
> +        }
> +
> +        result
> +    }
> +
> +    /// Matches C's `clog_entry_size` function (logger.c:201-206)
> +    pub fn size(&self) -> usize {
> +        std::mem::size_of::<u32>() * 4  // prev, next, uid, time
> +            + std::mem::size_of::<u64>() * 2  // node_digest, ident_digest
> +            + std::mem::size_of::<u32>() * 2  // pid, msg_len
> +            + std::mem::size_of::<u8>() * 4   // priority, node_len, ident_len, tag_len
> +            + self.node.len() + 1
> +            + self.ident.len() + 1
> +            + self.tag.len() + 1
> +            + self.message.len() + 1
> +    }
> +
> +    /// C implementation: `uint32_t realsize = ((size + 7) & 0xfffffff8);`
> +    pub fn aligned_size(&self) -> usize {
> +        let size = self.size();
> +        (size + 7) & !7
> +    }
> +
> +    pub fn to_json_object(&self) -> serde_json::Value {
> +        serde_json::json!({
> +            "uid": self.uid,
> +            "time": self.time,
> +            "pri": self.priority,
> +            "tag": self.tag,
> +            "pid": self.pid,
> +            "node": self.node,
> +            "user": self.ident,
> +            "msg": self.message,
> +        })
> +    }
> +
> +    /// Serialize to C binary format (clog_entry_t)
> +    ///
> +    /// Binary layout matches C structure:
> +    /// ```c
> +    /// struct {
> +    ///     uint32_t prev;          // Will be filled by ring buffer
> +    ///     uint32_t next;          // Will be filled by ring buffer
> +    ///     uint32_t uid;
> +    ///     uint32_t time;
> +    ///     uint64_t node_digest;
> +    ///     uint64_t ident_digest;
> +    ///     uint32_t pid;
> +    ///     uint8_t priority;
> +    ///     uint8_t node_len;
> +    ///     uint8_t ident_len;
> +    ///     uint8_t tag_len;
> +    ///     uint32_t msg_len;
> +    ///     char data[];  // node + ident + tag + msg (null-terminated)
> +    /// }
> +    /// ```
> +    pub(crate) fn serialize_binary(&self, prev: u32, next: u32) -> Vec<u8> {
> +        let mut buf = Vec::new();
> +
> +        buf.extend_from_slice(&prev.to_le_bytes());
> +        buf.extend_from_slice(&next.to_le_bytes());
> +        buf.extend_from_slice(&self.uid.to_le_bytes());
> +        buf.extend_from_slice(&self.time.to_le_bytes());
> +        buf.extend_from_slice(&self.node_digest.to_le_bytes());
> +        buf.extend_from_slice(&self.ident_digest.to_le_bytes());
> +        buf.extend_from_slice(&self.pid.to_le_bytes());
> +        buf.push(self.priority);
> +
> +        let node_len = (self.node.len() + 1) as u8;
> +        let ident_len = (self.ident.len() + 1) as u8;
> +        let tag_len = (self.tag.len() + 1) as u8;


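These three fields are u8 and include the trailing NUL, so the payload
must be capped at 254 bytes; otherwise len + 1 wraps to 0 for a
255-byte string. pack() currently truncates to 255 bytes, so this case
is reachable. C does MIN(strlen + 1, 255).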
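
A possible shape for the clamp, as a sketch. The helper names
field_len_u8 and field_payload are hypothetical. The char-boundary walk
is my own addition on top of the C behaviour, since slicing a &str in
the middle of a multi-byte character panics.

```rust
/// Length byte for a NUL-terminated field, clamped the way C does it
/// with MIN(strlen + 1, 255), so it can never wrap to 0.
fn field_len_u8(s: &str) -> u8 {
    (s.len() + 1).min(255) as u8
}

/// Payload that fits next to the trailing NUL: at most 254 bytes.
/// Assumption (not in the C code): cut on a UTF-8 char boundary.
fn field_payload(s: &str) -> &str {
    let mut end = s.len().min(254);
    // is_char_boundary(0) is always true, so this loop terminates.
    while !s.is_char_boundary(end) {
        end -= 1;
    }
    &s[..end]
}
```

Truncating node, ident and tag with field_payload() in pack() would
keep the stored string and the serialized length byte consistent.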

> +        let msg_len = (self.message.len() + 1) as u32;
> +
> +        buf.push(node_len);
> +        buf.push(ident_len);
> +        buf.push(tag_len);
> +        buf.extend_from_slice(&msg_len.to_le_bytes());
> +
> +        buf.extend_from_slice(self.node.as_bytes());
> +        buf.push(0);
> +
> +        buf.extend_from_slice(self.ident.as_bytes());
> +        buf.push(0);
> +
> +        buf.extend_from_slice(self.tag.as_bytes());
> +        buf.push(0);
> +
> +        buf.extend_from_slice(self.message.as_bytes());
> +        buf.push(0);
> +
> +        buf
> +    }
> +
> +    pub(crate) fn deserialize_binary(data: &[u8]) -> Result<(Self, u32, u32)> {
> +        if data.len() < 48 {
> +            bail!(
> +                "Entry too small: {} bytes (need at least 48 for header)",
> +                data.len()
> +            );
> +        }
> +
> +        let mut offset = 0;
> +
> +        let prev = u32::from_le_bytes(data[offset..offset + 4].try_into()?);
> +        offset += 4;
> +
> +        let next = u32::from_le_bytes(data[offset..offset + 4].try_into()?);
> +        offset += 4;
> +
> +        let uid = u32::from_le_bytes(data[offset..offset + 4].try_into()?);
> +        offset += 4;
> +
> +        let time = u32::from_le_bytes(data[offset..offset + 4].try_into()?);
> +        offset += 4;
> +
> +        let node_digest = u64::from_le_bytes(data[offset..offset + 8].try_into()?);
> +        offset += 8;
> +
> +        let ident_digest = u64::from_le_bytes(data[offset..offset + 8].try_into()?);
> +        offset += 8;
> +
> +        let pid = u32::from_le_bytes(data[offset..offset + 4].try_into()?);
> +        offset += 4;
> +
> +        let priority = data[offset];
> +        offset += 1;
> +
> +        let node_len = data[offset] as usize;
> +        offset += 1;
> +
> +        let ident_len = data[offset] as usize;
> +        offset += 1;
> +
> +        let tag_len = data[offset] as usize;
> +        offset += 1;
> +
> +        let msg_len = u32::from_le_bytes(data[offset..offset + 4].try_into()?) as usize;
> +        offset += 4;
> +
> +        if offset + node_len + ident_len + tag_len + msg_len > data.len() {
> +            bail!("Entry data exceeds buffer size");
> +        }
> +
> +        let node = read_null_terminated(&data[offset..offset + node_len])?;
> +        offset += node_len;
> +
> +        let ident = read_null_terminated(&data[offset..offset + ident_len])?;
> +        offset += ident_len;
> +
> +        let tag = read_null_terminated(&data[offset..offset + tag_len])?;
> +        offset += tag_len;
> +
> +        let message = read_null_terminated(&data[offset..offset + msg_len])?;
> +
> +        Ok((
> +            Self {
> +                uid,
> +                time,
> +                node_digest,
> +                ident_digest,
> +                pid,
> +                priority,
> +                node,
> +                ident,
> +                tag,
> +                message,
> +            },
> +            prev,
> +            next,
> +        ))
> +    }
> +}
> +
> +fn read_null_terminated(data: &[u8]) -> Result<String> {
> +    let len = data.iter().position(|&b| b == 0).unwrap_or(data.len());
> +    Ok(String::from_utf8_lossy(&data[..len]).into_owned())
> +}
> +
> +#[cfg(test)]
> +pub fn reset_uid_counter() {
> +    UID_COUNTER.store(0, Ordering::SeqCst);
> +}
> +
> +#[cfg(test)]
> +mod tests {
> +    use super::*;
> +
> +    #[test]
> +    fn test_pack_entry() {
> +        reset_uid_counter();
> +
> +        let entry = LogEntry::pack(
> +            "node1",
> +            "root",
> +            "cluster",
> +            12345,
> +            1234567890,
> +            6, // Info priority
> +            "Test message",
> +        )
> +        .unwrap();
> +
> +        assert_eq!(entry.uid, 1);
> +        assert_eq!(entry.time, 1234567890);
> +        assert_eq!(entry.node, "node1");
> +        assert_eq!(entry.ident, "root");
> +        assert_eq!(entry.tag, "cluster");
> +        assert_eq!(entry.pid, 12345);
> +        assert_eq!(entry.priority, 6);
> +        assert_eq!(entry.message, "Test message");
> +    }
> +
> +    #[test]
> +    fn test_uid_increment() {
> +        reset_uid_counter();
> +
> +        let entry1 = LogEntry::pack("node1", "root", "tag", 0, 1000, 6, "msg1").unwrap();
> +        let entry2 = LogEntry::pack("node1", "root", "tag", 0, 1001, 6, "msg2").unwrap();
> +
> +        assert_eq!(entry1.uid, 1);
> +        assert_eq!(entry2.uid, 2);
> +    }
> +
> +    #[test]
> +    fn test_invalid_priority() {
> +        let result = LogEntry::pack("node1", "root", "tag", 0, 1000, 8, "message");
> +        assert!(result.is_err());
> +    }
> +
> +    #[test]
> +    fn test_node_digest() {
> +        let entry1 = LogEntry::pack("node1", "root", "tag", 0, 1000, 6, "msg").unwrap();
> +        let entry2 = LogEntry::pack("node1", "root", "tag", 0, 1001, 6, "msg").unwrap();
> +        let entry3 = LogEntry::pack("node2", "root", "tag", 0, 1000, 6, "msg").unwrap();
> +
> +        // Same node should have same digest
> +        assert_eq!(entry1.node_digest, entry2.node_digest);
> +
> +        // Different node should have different digest
> +        assert_ne!(entry1.node_digest, entry3.node_digest);
> +    }
> +
> +    #[test]
> +    fn test_ident_digest() {
> +        let entry1 = LogEntry::pack("node1", "root", "tag", 0, 1000, 6, "msg").unwrap();
> +        let entry2 = LogEntry::pack("node1", "root", "tag", 0, 1001, 6, "msg").unwrap();
> +        let entry3 = LogEntry::pack("node1", "admin", "tag", 0, 1000, 6, "msg").unwrap();
> +
> +        // Same ident should have same digest
> +        assert_eq!(entry1.ident_digest, entry2.ident_digest);
> +
> +        // Different ident should have different digest
> +        assert_ne!(entry1.ident_digest, entry3.ident_digest);
> +    }
> +
> +    #[test]
> +    fn test_utf8_to_ascii() {
> +        let entry = LogEntry::pack("node1", "root", "tag", 0, 1000, 6, "Hello 世界").unwrap();
> +        assert!(entry.message.is_ascii());
> +        // Unicode chars escaped as \uXXXX format (matches C implementation)
> +        assert!(entry.message.contains("\\u4e16")); // 世 = U+4E16
> +        assert!(entry.message.contains("\\u754c")); // 界 = U+754C
> +    }
> +
> +    #[test]
> +    fn test_utf8_control_chars() {
> +        // Test control character escaping
> +        let entry = LogEntry::pack("node1", "root", "tag", 0, 1000, 6, "Hello\x07World").unwrap();
> +        assert!(entry.message.is_ascii());
> +        // BEL (0x07) should be escaped as #0007
> +        assert!(entry.message.contains("#0007"));
> +    }
> +
> +    #[test]
> +    fn test_utf8_mixed_content() {
> +        // Test mix of ASCII, Unicode, and control chars
> +        let entry = LogEntry::pack(
> +            "node1",
> +            "root",
> +            "tag",
> +            0,
> +            1000,
> +            6,
> +            "Test\x01\nUnicode世\ttab",
> +        )
> +        .unwrap();
> +        assert!(entry.message.is_ascii());
> +        // SOH (0x01) -> #0001
> +        assert!(entry.message.contains("#0001"));
> +        // Newline (0x0A) -> #0010
> +        assert!(entry.message.contains("#0010"));
> +        // Unicode 世 (U+4E16) -> \u4e16
> +        assert!(entry.message.contains("\\u4e16"));
> +        // Tab (0x09) -> #0009
> +        assert!(entry.message.contains("#0009"));
> +    }
> +
> +    #[test]
> +    fn test_string_truncation() {
> +        let long_node = "a".repeat(300);
> +        let entry = LogEntry::pack(&long_node, "root", "tag", 0, 1000, 6, "msg").unwrap();
> +        assert!(entry.node.len() <= 255);
> +    }
> +
> +    #[test]
> +    fn test_message_truncation() {
> +        let long_message = "a".repeat(CLOG_MAX_ENTRY_SIZE);
> +        let entry = LogEntry::pack("node1", "root", "tag", 0, 1000, 6, &long_message).unwrap();
> +        // Entry should fit within max size
> +        assert!(entry.size() <= CLOG_MAX_ENTRY_SIZE);
> +    }
> +
> +    #[test]
> +    fn test_aligned_size() {
> +        let entry = LogEntry::pack("node1", "root", "tag", 0, 1000, 6, "msg").unwrap();
> +        let aligned = entry.aligned_size();
> +
> +        // Aligned size should be multiple of 8
> +        assert_eq!(aligned % 8, 0);
> +
> +        // Aligned size should be >= actual size
> +        assert!(aligned >= entry.size());
> +
> +        // Aligned size should be within 7 bytes of actual size
> +        assert!(aligned - entry.size() < 8);
> +    }
> +
> +    #[test]
> +    fn test_json_export() {
> +        let entry = LogEntry::pack("node1", "root", "cluster", 123, 1234567890, 6, "Test").unwrap();
> +        let json = entry.to_json_object();
> +
> +        assert_eq!(json["node"], "node1");
> +        assert_eq!(json["user"], "root");
> +        assert_eq!(json["tag"], "cluster");
> +        assert_eq!(json["pid"], 123);
> +        assert_eq!(json["time"], 1234567890);
> +        assert_eq!(json["pri"], 6);
> +        assert_eq!(json["msg"], "Test");
> +    }
> +
> +    #[test]
> +    fn test_binary_serialization_roundtrip() {
> +        let entry = LogEntry::pack(
> +            "node1",
> +            "root",
> +            "cluster",
> +            12345,
> +            1234567890,
> +            6,
> +            "Test message",
> +        )
> +        .unwrap();
> +
> +        // Serialize with prev/next pointers
> +        let binary = entry.serialize_binary(100, 200);
> +
> +        // Deserialize
> +        let (deserialized, prev, next) = LogEntry::deserialize_binary(&binary).unwrap();
> +
> +        // Check prev/next pointers
> +        assert_eq!(prev, 100);
> +        assert_eq!(next, 200);
> +
> +        // Check entry fields
> +        assert_eq!(deserialized.uid, entry.uid);
> +        assert_eq!(deserialized.time, entry.time);
> +        assert_eq!(deserialized.node_digest, entry.node_digest);
> +        assert_eq!(deserialized.ident_digest, entry.ident_digest);
> +        assert_eq!(deserialized.pid, entry.pid);
> +        assert_eq!(deserialized.priority, entry.priority);
> +        assert_eq!(deserialized.node, entry.node);
> +        assert_eq!(deserialized.ident, entry.ident);
> +        assert_eq!(deserialized.tag, entry.tag);
> +        assert_eq!(deserialized.message, entry.message);
> +    }
> +
> +    #[test]
> +    fn test_binary_format_header_size() {
> +        let entry = LogEntry::pack("n", "u", "t", 1, 1000, 6, "m").unwrap();
> +        let binary = entry.serialize_binary(0, 0);
> +
> +        // Header should be exactly 48 bytes
> +        // prev(4) + next(4) + uid(4) + time(4) + node_digest(8) + ident_digest(8) +
> +        // pid(4) + priority(1) + node_len(1) + ident_len(1) + tag_len(1) + msg_len(4)
> +        assert!(binary.len() >= 48);
> +
> +        // First 48 bytes are header
> +        assert_eq!(&binary[0..4], &0u32.to_le_bytes()); // prev
> +        assert_eq!(&binary[4..8], &0u32.to_le_bytes()); // next
> +    }
> +
> +    #[test]
> +    fn test_binary_deserialize_invalid_size() {
> +        let too_small = vec![0u8; 40]; // Less than 48 byte header
> +        let result = LogEntry::deserialize_binary(&too_small);
> +        assert!(result.is_err());
> +    }
> +
> +    #[test]
> +    fn test_binary_null_terminators() {
> +        let entry = LogEntry::pack("node1", "root", "tag", 123, 1000, 6, "message").unwrap();
> +        let binary = entry.serialize_binary(0, 0);
> +
> +        // Check that strings are null-terminated
> +        // Find null bytes in data section (after 48-byte header)
> +        let data_section = &binary[48..];
> +        let null_count = data_section.iter().filter(|&&b| b == 0).count();
> +        assert_eq!(null_count, 4); // 4 null terminators (node, ident, tag, msg)
> +    }
> +}
> diff --git a/src/pmxcfs-rs/pmxcfs-logger/src/hash.rs b/src/pmxcfs-rs/pmxcfs-logger/src/hash.rs
> new file mode 100644
> index 00000000..710c9ab3
> --- /dev/null
> +++ b/src/pmxcfs-rs/pmxcfs-logger/src/hash.rs
> @@ -0,0 +1,173 @@
> +/// FNV-1a (Fowler-Noll-Vo) 64-bit hash function
> +///
> +/// This matches the C implementation's fnv_64a_buf function (logger.c:52-60)
> +/// Used for generating node and ident digests for deduplication.
> +/// FNV-1a 64-bit non-zero initial basis
> +pub(crate) const FNV1A_64_INIT: u64 = 0xcbf29ce484222325;
> +
> +/// Compute 64-bit FNV-1a hash
> +///
> +/// This is a faithful port of the C implementation from logger.c lines 52-60:
> +/// ```c
> +/// static inline uint64_t fnv_64a_buf(const void *buf, size_t len, uint64_t hval) {
> +///     unsigned char *bp = (unsigned char *)buf;
> +///     unsigned char *be = bp + len;
> +///     while (bp < be) {
> +///         hval ^= (uint64_t)*bp++;
> +///         hval += (hval << 1) + (hval << 4) + (hval << 5) + (hval << 7) + (hval << 8) + (hval << 40);
> +///     }
> +///     return hval;
> +/// }
> +/// ```
> +///
> +/// # Arguments
> +/// * `data` - The data to hash
> +/// * `init` - Initial hash value (use FNV1A_64_INIT for first hash)
> +///
> +/// # Returns
> +/// 64-bit hash value
> +///
> +/// Note: This function appears unused but is actually called via `fnv_64a_str` below,
> +/// which provides the primary API for string hashing. Both functions share the core
> +/// FNV-1a implementation logic.
> +#[inline]
> +#[allow(dead_code)] // Used via fnv_64a_str wrapper
> +pub(crate) fn fnv_64a(data: &[u8], init: u64) -> u64 {
> +    let mut hval = init;
> +
> +    for &byte in data {
> +        hval ^= byte as u64;
> +        // FNV magic prime multiplication done via shifts and adds
> +        // This is equivalent to: hval *= 0x100000001b3 (FNV 64-bit prime)
> +        hval = hval.wrapping_add(
> +            (hval << 1)
> +                .wrapping_add(hval << 4)
> +                .wrapping_add(hval << 5)
> +                .wrapping_add(hval << 7)
> +                .wrapping_add(hval << 8)
> +                .wrapping_add(hval << 40),
> +        );
> +    }
> +
> +    hval
> +}
> +
> +/// Hash a null-terminated string (includes the null byte)
> +///
> +/// The C implementation includes the null terminator in the hash:
> +/// `fnv_64a_buf(node, node_len, FNV1A_64_INIT)` where node_len includes the '\0'
> +///
> +/// This function adds a null byte to match that behavior.
> +#[inline]
> +pub(crate) fn fnv_64a_str(s: &str) -> u64 {
> +    let bytes = s.as_bytes();
> +    let mut hval = FNV1A_64_INIT;
> +
> +    for &byte in bytes {
> +        hval ^= byte as u64;
> +        hval = hval.wrapping_add(
> +            (hval << 1)
> +                .wrapping_add(hval << 4)
> +                .wrapping_add(hval << 5)
> +                .wrapping_add(hval << 7)
> +                .wrapping_add(hval << 8)
> +                .wrapping_add(hval << 40),
> +        );
> +    }
> +
> +    // Hash the null terminator (C compatibility: original XORs with 0 which is a no-op)
> +    // We skip the no-op XOR and proceed directly to the final avalanche
> +    hval.wrapping_add(
> +        (hval << 1)
> +            .wrapping_add(hval << 4)
> +            .wrapping_add(hval << 5)
> +            .wrapping_add(hval << 7)
> +            .wrapping_add(hval << 8)
> +            .wrapping_add(hval << 40),
> +    )
> +}
> +
> +#[cfg(test)]
> +mod tests {
> +    use super::*;
> +
> +    #[test]
> +    fn test_fnv1a_init() {
> +        // Test that init constant matches C implementation
> +        assert_eq!(FNV1A_64_INIT, 0xcbf29ce484222325);
> +    }
> +
> +    #[test]
> +    fn test_fnv1a_empty() {
> +        // Empty string with null terminator
> +        let hash = fnv_64a(&[0], FNV1A_64_INIT);
> +        assert_ne!(hash, FNV1A_64_INIT); // Should be different from init
> +    }
> +
> +    #[test]
> +    fn test_fnv1a_consistency() {
> +        // Same input should produce same output
> +        let data = b"test";
> +        let hash1 = fnv_64a(data, FNV1A_64_INIT);
> +        let hash2 = fnv_64a(data, FNV1A_64_INIT);
> +        assert_eq!(hash1, hash2);
> +    }
> +
> +    #[test]
> +    fn test_fnv1a_different_data() {
> +        // Different input should (usually) produce different output
> +        let hash1 = fnv_64a(b"test1", FNV1A_64_INIT);
> +        let hash2 = fnv_64a(b"test2", FNV1A_64_INIT);
> +        assert_ne!(hash1, hash2);
> +    }
> +
> +    #[test]
> +    fn test_fnv1a_str() {
> +        // Test string hashing with null terminator
> +        let hash1 = fnv_64a_str("node1");
> +        let hash2 = fnv_64a_str("node1");
> +        let hash3 = fnv_64a_str("node2");
> +
> +        assert_eq!(hash1, hash2); // Same string should hash the same
> +        assert_ne!(hash1, hash3); // Different strings should hash differently
> +    }
> +
> +    #[test]
> +    fn test_fnv1a_node_names() {
> +        // Test with typical Proxmox node names
> +        let nodes = vec!["pve1", "pve2", "pve3"];
> +        let mut hashes = Vec::new();
> +
> +        for node in &nodes {
> +            let hash = fnv_64a_str(node);
> +            hashes.push(hash);
> +        }
> +
> +        // All hashes should be unique
> +        for i in 0..hashes.len() {
> +            for j in (i + 1)..hashes.len() {
> +                assert_ne!(
> +                    hashes[i], hashes[j],
> +                    "Hashes for {} and {} should differ",
> +                    nodes[i], nodes[j]
> +                );
> +            }
> +        }
> +    }
> +
> +    #[test]
> +    fn test_fnv1a_chaining() {
> +        // Test that we can chain hashes
> +        let data1 = b"first";
> +        let data2 = b"second";
> +
> +        let hash1 = fnv_64a(data1, FNV1A_64_INIT);
> +        let hash2 = fnv_64a(data2, hash1); // Use previous hash as init
> +
> +        // Should produce a deterministic result
> +        let hash1_again = fnv_64a(data1, FNV1A_64_INIT);
> +        let hash2_again = fnv_64a(data2, hash1_again);
> +
> +        assert_eq!(hash2, hash2_again);
> +    }
> +}
> diff --git a/src/pmxcfs-rs/pmxcfs-logger/src/lib.rs b/src/pmxcfs-rs/pmxcfs-logger/src/lib.rs
> new file mode 100644
> index 00000000..964f0b3a
> --- /dev/null
> +++ b/src/pmxcfs-rs/pmxcfs-logger/src/lib.rs
> @@ -0,0 +1,27 @@
> +/// Cluster Log Implementation
> +///
> +/// This module provides a cluster-wide log system compatible with the C implementation.
> +/// It maintains a ring buffer of log entries that can be merged from multiple nodes,
> +/// deduplicated, and exported to JSON.
> +///
> +/// Key features:
> +/// - Ring buffer storage for efficient memory usage
> +/// - FNV-1a hashing for node and ident tracking
> +/// - Deduplication across nodes
> +/// - Time-based sorting
> +/// - Multi-node log merging
> +/// - JSON export for web UI
> +// Internal modules (not exposed)
> +mod cluster_log;
> +mod entry;
> +mod hash;
> +mod ring_buffer;
> +
> +// Public API - only expose what's needed externally
> +pub use cluster_log::ClusterLog;
> +
> +// Re-export types only for testing or internal crate use
> +#[doc(hidden)]
> +pub use entry::LogEntry;
> +#[doc(hidden)]
> +pub use ring_buffer::RingBuffer;
> diff --git a/src/pmxcfs-rs/pmxcfs-logger/src/ring_buffer.rs b/src/pmxcfs-rs/pmxcfs-logger/src/ring_buffer.rs
> new file mode 100644
> index 00000000..4f6db63e
> --- /dev/null
> +++ b/src/pmxcfs-rs/pmxcfs-logger/src/ring_buffer.rs
> @@ -0,0 +1,581 @@
> +/// Ring Buffer Implementation for Cluster Log
> +///
> +/// This module implements a circular buffer for storing log entries,
> +/// matching the C implementation's clog_base_t structure.
> +use super::entry::LogEntry;
> +use super::hash::fnv_64a_str;
> +use anyhow::{bail, Result};
> +use std::collections::VecDeque;
> +
> +pub(crate) const CLOG_DEFAULT_SIZE: usize = 5 * 1024 * 1024; // 5MB
> +pub(crate) const CLOG_MAX_ENTRY_SIZE: usize = 8192 + 4096;

These constants don't match the C constants:

#define CLOG_DEFAULT_SIZE (8192 * 16)
#define CLOG_MAX_ENTRY_SIZE 4096

That likely affects capacity semantics, merge limits, and the binary
format.
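
For illustration, a minimal sketch of the constants using the C values
quoted above, assuming the Rust port is meant to keep the same limits as
the C implementation:

```rust
/// Default ring buffer size, mirroring C's `#define CLOG_DEFAULT_SIZE (8192 * 16)`.
pub const CLOG_DEFAULT_SIZE: usize = 8192 * 16; // 128 KiB

/// Maximum size of a single entry, mirroring C's `#define CLOG_MAX_ENTRY_SIZE 4096`.
pub const CLOG_MAX_ENTRY_SIZE: usize = 4096;
```

With these values the minimum-capacity check in RingBuffer::new()
(CLOG_MAX_ENTRY_SIZE * 10) would still be below the default size.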

> +
> +/// Ring buffer for log entries
> +///
> +/// This is a simplified Rust version of the C implementation's ring buffer.
> +/// The C version uses a raw byte buffer with manual pointer arithmetic,
> +/// but we use a VecDeque for safety and simplicity while maintaining
> +/// the same conceptual behavior.
> +///
> +/// C structure (logger.c:64-68):
> +/// ```c
> +/// struct clog_base {
> +///     uint32_t size;    // Total buffer size
> +///     uint32_t cpos;    // Current position
> +///     char data[];      // Variable length data
> +/// };
> +/// ```
> +#[derive(Debug, Clone)]
> +pub struct RingBuffer {
> +    /// Maximum capacity in bytes
> +    capacity: usize,
> +
> +    /// Current size in bytes (approximate)
> +    current_size: usize,
> +
> +    /// Entries stored in the buffer (newest first)
> +    /// We use VecDeque for efficient push/pop at both ends
> +    entries: VecDeque<LogEntry>,
> +}
> +
> +impl RingBuffer {
> +    /// Create a new ring buffer with specified capacity
> +    pub fn new(capacity: usize) -> Self {
> +        // Ensure minimum capacity
> +        let capacity = if capacity < CLOG_MAX_ENTRY_SIZE * 10 {
> +            CLOG_DEFAULT_SIZE
> +        } else {
> +            capacity
> +        };
> +
> +        Self {
> +            capacity,
> +            current_size: 0,
> +            entries: VecDeque::new(),
> +        }
> +    }
> +
> +    /// Add an entry to the buffer
> +    ///
> +    /// Matches C's `clog_copy` function (logger.c:208-218) which calls
> +    /// `clog_alloc_entry` (logger.c:76-102) to allocate space in the ring buffer.
> +    pub fn add_entry(&mut self, entry: &LogEntry) -> Result<()> {
> +        let entry_size = entry.aligned_size();
> +
> +        // Make room if needed (remove oldest entries)
> +        while self.current_size + entry_size > self.capacity && !self.entries.is_empty() {
> +            if let Some(old_entry) = self.entries.pop_back() {
> +                self.current_size = self.current_size.saturating_sub(old_entry.aligned_size());
> +            }
> +        }
> +
> +        // Add new entry at the front (newest first)
> +        self.entries.push_front(entry.clone());
> +        self.current_size += entry_size;
> +
> +        Ok(())
> +    }
> +
> +    /// Check if buffer is near full (>90% capacity)
> +    pub fn is_near_full(&self) -> bool {
> +        self.current_size > (self.capacity * 9 / 10)
> +    }
> +
> +    /// Check if buffer is empty
> +    pub fn is_empty(&self) -> bool {
> +        self.entries.is_empty()
> +    }
> +
> +    /// Get number of entries
> +    pub fn len(&self) -> usize {
> +        self.entries.len()
> +    }
> +
> +    /// Get buffer capacity
> +    pub fn capacity(&self) -> usize {
> +        self.capacity
> +    }
> +
> +    /// Iterate over entries (newest first)
> +    pub fn iter(&self) -> impl Iterator<Item = &LogEntry> {
> +        self.entries.iter()
> +    }
> +
> +    /// Sort entries by time, node_digest, and uid
> +    ///
> +    /// Matches C's `clog_sort` function (logger.c:321-355)
> +    ///
> +    /// C uses GTree with custom comparison function `clog_entry_sort_fn`
> +    /// (logger.c:297-310):
> +    /// ```c
> +    /// if (entry1->time != entry2->time) {
> +    ///     return entry1->time - entry2->time;
> +    /// }
> +    /// if (entry1->node_digest != entry2->node_digest) {
> +    ///     return entry1->node_digest - entry2->node_digest;
> +    /// }
> +    /// return entry1->uid - entry2->uid;
> +    /// ```
> +    pub fn sort(&self) -> Result<Self> {
> +        let mut new_buffer = Self::new(self.capacity);
> +
> +        // Collect and sort entries
> +        let mut sorted: Vec<LogEntry> = self.entries.iter().cloned().collect();
> +
> +        // Sort by time (ascending), then node_digest, then uid
> +        sorted.sort_by_key(|e| (e.time, e.node_digest, e.uid));
> +
> +        // Add sorted entries to new buffer
> +        // Since add_entry pushes to front, we add in forward order to get newest-first
> +        // sorted = [oldest...newest], add_entry pushes to front, so:
> +        // - Add oldest: [oldest]
> +        // - Add next: [next, oldest]
> +        // - Add newest: [newest, next, oldest]
> +        for entry in sorted.iter() {
> +            new_buffer.add_entry(entry)?;
> +        }
> +
> +        Ok(new_buffer)
> +    }
> +
> +    /// Dump buffer to JSON format
> +    ///
> +    /// Matches C's `clog_dump_json` function (logger.c:139-199)
> +    ///
> +    /// # Arguments
> +    /// * `ident_filter` - Optional ident filter (user filter)
> +    /// * `max_entries` - Maximum number of entries to include
> +    pub fn dump_json(&self, ident_filter: Option<&str>, max_entries: usize) -> String {
> +        // Compute ident digest if filter is provided
> +        let ident_digest = ident_filter.map(fnv_64a_str);
> +
> +        let mut data = Vec::new();
> +        let mut count = 0;
> +
> +        // Iterate over entries (newest first)
> +        for entry in self.iter() {
> +            if count >= max_entries {
> +                break;
> +            }
> +
> +            // Apply ident filter if specified
> +            if let Some(digest) = ident_digest {
> +                if digest != entry.ident_digest {
> +                    continue;
> +                }
> +            }
> +
> +            data.push(entry.to_json_object());
> +            count += 1;
> +        }
> +
> +        // Reverse to show oldest first (matching C behavior)
> +        data.reverse();

C prints entries newest to oldest (it walks prev from cpos).
Shouldn't this line be removed?
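
Roughly what I have in mind, as a simplified sketch. `newest_first` is a
hypothetical helper that works on plain values instead of the real
LogEntry/serde_json types, and it assumes the input already iterates
newest first, as RingBuffer::iter() does:

```rust
/// Pick at most `max_entries` matching items and keep the newest-first
/// order of the input iterator (no final reverse).
fn newest_first<T>(
    entries: impl Iterator<Item = T>,
    keep: impl Fn(&T) -> bool,
    max_entries: usize,
) -> Vec<T> {
    entries.filter(|e| keep(e)).take(max_entries).collect()
}
```

The filter is applied before the limit, so filtered-out entries do not
count towards max_entries, which is also what the loop in the patch does.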

> +
> +        let result = serde_json::json!({
> +            "data": data
> +        });
> +
> +        serde_json::to_string_pretty(&result).unwrap_or_else(|_| "{}".to_string())
> +    }
> +
> +    /// Dump buffer contents (for debugging)
> +    ///
> +    /// Matches C's `clog_dump` function (logger.c:122-137)
> +    #[allow(dead_code)]
> +    pub fn dump(&self) {
> +        for (idx, entry) in self.entries.iter().enumerate() {
> +            println!(
> +                "[{}] uid={:08x} time={} node={}{{{:016X}}} tag={}[{}{{{:016X}}}]: {}",
> +                idx,
> +                entry.uid,
> +                entry.time,
> +                entry.node,
> +                entry.node_digest,
> +                entry.tag,
> +                entry.ident,
> +                entry.ident_digest,
> +                entry.message
> +            );
> +        }
> +    }
> +
> +    /// Serialize to C binary format (clog_base_t)
> +    ///
> +    /// Binary layout matches C structure:
> +    /// ```c
> +    /// struct clog_base {
> +    ///     uint32_t size;    // Total buffer size
> +    ///     uint32_t cpos;    // Current position (offset to newest entry)
> +    ///     char data[];      // Entry data
> +    /// };
> +    /// ```
> +    pub(crate) fn serialize_binary(&self) -> Vec<u8> {

Please re-check, but in C, clusterlog_get_state() returns a full
memdump (the allocated ring buffer capacity), with cpos pointing at the
offset of the newest entry (not always 8). Also, in C, entry.next is not
a pointer to the next/newer entry; it is the end offset of this entry
(entry_off + aligned_size), which is used to find where the next entry
should be written.
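
To make the offset semantics concrete, here is a minimal sketch of the
allocation step as I understand it from the description above. The
`alloc_entry` helper, the tuple it returns, and the exact wrap condition
are assumptions for illustration, not a copy of logger.c:

```rust
/// Size of the `clog_base` header (size + cpos), in bytes.
const HEADER_SIZE: u32 = 8;

/// Returns `(new_cpos, prev, next)` for a new entry of `entry_size` bytes.
///
/// * `size`     - total buffer size from the header
/// * `cpos`     - offset of the current newest entry, 0 if the buffer is empty
/// * `cur_next` - `next` field of the current newest entry (ignored if empty)
fn alloc_entry(size: u32, cpos: u32, cur_next: u32, entry_size: u32) -> (u32, u32, u32) {
    // Round the entry size up to a multiple of 8.
    let aligned = (entry_size + 7) & !7;

    // The new entry starts where the current newest entry ends.
    let mut new_pos = if cpos == 0 { HEADER_SIZE } else { cur_next };

    // Assumed wrap rule: restart right after the header if it does not fit.
    if new_pos + aligned >= size {
        new_pos = HEADER_SIZE;
    }

    // prev = old cpos, next = end offset of this entry.
    (new_pos, cpos, new_pos + aligned)
}
```

After two allocations in an empty buffer, cpos is the offset of the
second entry rather than 8, and each next value equals the start offset
of the entry written after it.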

> +        // Empty buffer case
> +        if self.entries.is_empty() {
> +            let mut buf = Vec::with_capacity(8);
> +            buf.extend_from_slice(&8u32.to_le_bytes()); // size = header only
> +            buf.extend_from_slice(&0u32.to_le_bytes()); // cpos = 0 (empty)
> +            return buf;
> +        }
> +
> +        // Calculate total size needed
> +        let mut data_size = 0usize;
> +        for entry in self.iter() {
> +            data_size += entry.aligned_size();
> +        }
> +
> +        let total_size = 8 + data_size; // 8 bytes header + data
> +        let mut buf = Vec::with_capacity(total_size);
> +
> +        // Write header
> +        buf.extend_from_slice(&(total_size as u32).to_le_bytes()); // size
> +        buf.extend_from_slice(&8u32.to_le_bytes()); // cpos (points to first entry at offset 8)
> +
> +        // Write entries with linked list structure
> +        // Entries are in newest-first order in our VecDeque
> +        let entry_count = self.entries.len();
> +        let mut offsets = Vec::with_capacity(entry_count);
> +        let mut current_offset = 8u32; // Start after header
> +
> +        // Calculate offsets first
> +        for entry in self.iter() {
> +            offsets.push(current_offset);
> +            current_offset += entry.aligned_size() as u32;
> +        }
> +
> +        // Write entries with prev/next pointers
> +        // Build circular linked list: newest -> ... -> oldest
> +        // Entry 0 (newest) has prev pointing to entry 1
> +        // Last entry has prev = 0 (end of list)
> +        for (i, entry) in self.iter().enumerate() {
> +            let prev = if i + 1 < entry_count {
> +                offsets[i + 1]
> +            } else {
> +                0
> +            };
> +            let next = if i > 0 { offsets[i - 1] } else { 0 };
> +
> +            let entry_bytes = entry.serialize_binary(prev, next);
> +            buf.extend_from_slice(&entry_bytes);
> +
> +            // Add padding to maintain 8-byte alignment
> +            let aligned_size = entry.aligned_size();
> +            let padding = aligned_size - entry_bytes.len();
> +            buf.resize(buf.len() + padding, 0);
> +        }
> +
> +        buf
> +    }
> +
> +    /// Deserialize from C binary format
> +    ///
> +    /// Parses clog_base_t structure and extracts all entries
> +    pub(crate) fn deserialize_binary(data: &[u8]) -> Result<Self> {
> +        if data.len() < 8 {
> +            bail!(
> +                "Buffer too small: {} bytes (need at least 8 for header)",
> +                data.len()
> +            );
> +        }
> +
> +        // Read header
> +        let size = u32::from_le_bytes(data[0..4].try_into()?) as usize;
> +        let cpos = u32::from_le_bytes(data[4..8].try_into()?) as usize;
> +
> +        if size != data.len() {
> +            bail!(
> +                "Size mismatch: header says {}, got {} bytes",
> +                size,
> +                data.len()
> +            );
> +        }
> +
> +        if cpos < 8 || cpos >= size {
> +            // Empty buffer (cpos == 0) or invalid
> +            if cpos == 0 {
> +                return Ok(Self::new(size));
> +            }
> +            bail!("Invalid cpos: {cpos} (size: {size})");
> +        }
> +
> +        // Parse entries starting from cpos, walking backwards via prev pointers
> +        let mut entries = VecDeque::new();
> +        let mut current_pos = cpos;
> +

C has wrap/overwrite guards when walking prev.
We should probably mirror those checks here too.
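
A sketch of what such a guard could look like. `walk_prev` is a
hypothetical helper that only reads the leading prev field of each entry.
The loop condition is my reading of how the C walk avoids entries that
the writer has already overwritten and should be checked against
logger.c. The step limit is an extra safety net that I added and is not
taken from C:

```rust
/// Offset of the first entry, directly after the 8-byte header.
const FIRST_ENTRY_OFFSET: usize = 8;

/// Walk the prev chain starting at `cpos` and return the visited offsets,
/// newest first. Each entry is assumed to start with a little-endian u32 prev.
fn walk_prev(data: &[u8], cpos: usize, max_entry_size: usize) -> Vec<usize> {
    let mut offsets = Vec::new();
    // Entries are 8-byte aligned, so there can never be more than this many.
    let max_steps = data.len() / 8;
    let mut pos = cpos;

    // Stop at the end of the list (0) or when prev lands in the region just
    // after cpos, where older entries may have been partially overwritten.
    while pos != 0 && (pos <= cpos || pos > cpos + max_entry_size) {
        // Bounds and cycle protection before touching the buffer.
        if pos < FIRST_ENTRY_OFFSET || pos + 4 > data.len() || offsets.len() >= max_steps {
            break;
        }
        offsets.push(pos);
        let prev = u32::from_le_bytes([data[pos], data[pos + 1], data[pos + 2], data[pos + 3]]);
        pos = prev as usize;
    }

    offsets
}
```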

> +        loop {
> +            if current_pos == 0 || current_pos < 8 || current_pos >= size {
> +                break;
> +            }
> +
> +            // Parse entry at current_pos
> +            let entry_data = &data[current_pos..];
> +            let (entry, prev, _next) = LogEntry::deserialize_binary(entry_data)?;
> +
> +            // Add to back (we're walking backwards in time, newest to oldest)
> +            // VecDeque should end up as [newest, ..., oldest]
> +            entries.push_back(entry);
> +
> +            current_pos = prev as usize;
> +        }
> +
> +        // Create ring buffer with entries
> +        let mut ring = Self::new(size);
> +        ring.entries = entries;
> +        ring.current_size = size - 8; // Approximate
> +
> +        Ok(ring)
> +    }
> +}
> +
> +impl Default for RingBuffer {
> +    fn default() -> Self {
> +        Self::new(CLOG_DEFAULT_SIZE)
> +    }
> +}
> +
> +#[cfg(test)]
> +mod tests {
> +    use super::*;
> +
> +    #[test]
> +    fn test_ring_buffer_creation() {
> +        let buffer = RingBuffer::new(CLOG_DEFAULT_SIZE);
> +        assert_eq!(buffer.capacity, CLOG_DEFAULT_SIZE);
> +        assert_eq!(buffer.len(), 0);
> +        assert!(buffer.is_empty());
> +    }
> +
> +    #[test]
> +    fn test_add_entry() {
> +        let mut buffer = RingBuffer::new(CLOG_DEFAULT_SIZE);
> +        let entry = LogEntry::pack("node1", "root", "tag", 0, 1000, 6, "message").unwrap();
> +
> +        let result = buffer.add_entry(&entry);
> +        assert!(result.is_ok());
> +        assert_eq!(buffer.len(), 1);
> +        assert!(!buffer.is_empty());
> +    }
> +
> +    #[test]
> +    fn test_ring_buffer_wraparound() {
> +        // Create a buffer with minimum required size (CLOG_MAX_ENTRY_SIZE * 10)
> +        // but fill it beyond 90% to trigger wraparound
> +        let mut buffer = RingBuffer::new(CLOG_MAX_ENTRY_SIZE * 10);
> +
> +        // Add many small entries to fill the buffer
> +        // Each entry is small, so we need many to fill the buffer
> +        let initial_count = 50_usize;
> +        for i in 0..initial_count {
> +            let entry =
> +                LogEntry::pack("node1", "root", "tag", 0, 1000 + i as u32, 6, "msg").unwrap();
> +            let _ = buffer.add_entry(&entry);
> +        }
> +
> +        // All entries should fit initially
> +        let count_before = buffer.len();
> +        assert_eq!(count_before, initial_count);
> +
> +        // Now add entries with large messages to trigger wraparound
> +        // Make messages large enough to fill the buffer beyond capacity
> +        let large_msg = "x".repeat(7000); // Very large message (close to max)
> +        let large_entries_count = 20_usize;
> +        for i in 0..large_entries_count {
> +            let entry =
> +                LogEntry::pack("node1", "root", "tag", 0, 2000 + i as u32, 6, &large_msg).unwrap();
> +            let _ = buffer.add_entry(&entry);
> +        }
> +
> +        // Should have removed some old entries due to capacity limits
> +        assert!(
> +            buffer.len() < count_before + large_entries_count,
> +            "Expected wraparound to remove old entries (have {} entries, expected < {})",
> +            buffer.len(),
> +            count_before + large_entries_count
> +        );
> +
> +        // Newest entry should be present
> +        let newest = buffer.iter().next().unwrap();
> +        assert_eq!(newest.time, 2000 + large_entries_count as u32 - 1); // Last added entry
> +    }
> +
> +    #[test]
> +    fn test_sort_by_time() {
> +        let mut buffer = RingBuffer::new(CLOG_DEFAULT_SIZE);
> +
> +        // Add entries in random time order
> +        let _ = buffer.add_entry(&LogEntry::pack("node1", "root", "tag", 0, 1002, 6, "c").unwrap());
> +        let _ = buffer.add_entry(&LogEntry::pack("node1", "root", "tag", 0, 1000, 6, "a").unwrap());
> +        let _ = buffer.add_entry(&LogEntry::pack("node1", "root", "tag", 0, 1001, 6, "b").unwrap());
> +
> +        let sorted = buffer.sort().unwrap();
> +
> +        // Check that entries are sorted by time (oldest first after reversing)
> +        let times: Vec<u32> = sorted.iter().map(|e| e.time).collect();
> +        let mut times_sorted = times.clone();
> +        times_sorted.sort();
> +        times_sorted.reverse(); // Newest first in buffer
> +        assert_eq!(times, times_sorted);
> +    }
> +
> +    #[test]
> +    fn test_sort_by_node_digest() {
> +        let mut buffer = RingBuffer::new(CLOG_DEFAULT_SIZE);
> +
> +        // Add entries with same time but different nodes
> +        let _ = buffer.add_entry(&LogEntry::pack("node3", "root", "tag", 0, 1000, 6, "c").unwrap());
> +        let _ = buffer.add_entry(&LogEntry::pack("node1", "root", "tag", 0, 1000, 6, "a").unwrap());
> +        let _ = buffer.add_entry(&LogEntry::pack("node2", "root", "tag", 0, 1000, 6, "b").unwrap());
> +
> +        let sorted = buffer.sort().unwrap();
> +
> +        // Entries with same time should be sorted by node_digest
> +        // Within same time, should be sorted
> +        for entries in sorted.iter().collect::<Vec<_>>().windows(2) {
> +            if entries[0].time == entries[1].time {
> +                assert!(entries[0].node_digest >= entries[1].node_digest);
> +            }
> +        }
> +    }
> +
> +    #[test]
> +    fn test_json_dump() {
> +        let mut buffer = RingBuffer::new(CLOG_DEFAULT_SIZE);
> +        let _ = buffer
> +            .add_entry(&LogEntry::pack("node1", "root", "cluster", 123, 1000, 6, "msg").unwrap());
> +
> +        let json = buffer.dump_json(None, 50);
> +
> +        // Should be valid JSON
> +        let parsed: serde_json::Value = serde_json::from_str(&json).unwrap();
> +        assert!(parsed.get("data").is_some());
> +
> +        let data = parsed["data"].as_array().unwrap();
> +        assert_eq!(data.len(), 1);
> +
> +        let entry = &data[0];
> +        assert_eq!(entry["node"], "node1");
> +        assert_eq!(entry["user"], "root");
> +        assert_eq!(entry["tag"], "cluster");
> +    }
> +
> +    #[test]
> +    fn test_json_dump_with_filter() {
> +        let mut buffer = RingBuffer::new(CLOG_DEFAULT_SIZE);
> +
> +        // Add entries with different users
> +        let _ =
> +            buffer.add_entry(&LogEntry::pack("node1", "root", "tag", 0, 1000, 6, "msg1").unwrap());
> +        let _ =
> +            buffer.add_entry(&LogEntry::pack("node1", "admin", "tag", 0, 1001, 6, "msg2").unwrap());
> +        let _ =
> +            buffer.add_entry(&LogEntry::pack("node1", "root", "tag", 0, 1002, 6, "msg3").unwrap());
> +
> +        // Filter for "root" only
> +        let json = buffer.dump_json(Some("root"), 50);
> +
> +        let parsed: serde_json::Value = serde_json::from_str(&json).unwrap();
> +        let data = parsed["data"].as_array().unwrap();
> +
> +        // Should only have 2 entries (the ones from "root")
> +        assert_eq!(data.len(), 2);
> +
> +        for entry in data {
> +            assert_eq!(entry["user"], "root");
> +        }
> +    }
> +
> +    #[test]
> +    fn test_json_dump_max_entries() {
> +        let mut buffer = RingBuffer::new(CLOG_DEFAULT_SIZE);
> +
> +        // Add 10 entries
> +        for i in 0..10 {
> +            let _ = buffer
> +                .add_entry(&LogEntry::pack("node1", "root", "tag", 0, 1000 + i, 6, "msg").unwrap());
> +        }
> +
> +        // Request only 5 entries
> +        let json = buffer.dump_json(None, 5);
> +
> +        let parsed: serde_json::Value = serde_json::from_str(&json).unwrap();
> +        let data = parsed["data"].as_array().unwrap();
> +
> +        assert_eq!(data.len(), 5);
> +    }
> +
> +    #[test]
> +    fn test_iterator() {
> +        let mut buffer = RingBuffer::new(CLOG_DEFAULT_SIZE);
> +
> +        let _ = buffer.add_entry(&LogEntry::pack("node1", "root", "tag", 0, 1000, 6, "a").unwrap());
> +        let _ = buffer.add_entry(&LogEntry::pack("node1", "root", "tag", 0, 1001, 6, "b").unwrap());
> +        let _ = buffer.add_entry(&LogEntry::pack("node1", "root", "tag", 0, 1002, 6, "c").unwrap());
> +
> +        let messages: Vec<String> = buffer.iter().map(|e| e.message.clone()).collect();
> +
> +        // Should be in reverse order (newest first)
> +        assert_eq!(messages, vec!["c", "b", "a"]);
> +    }
> +
> +    #[test]
> +    fn test_binary_serialization_roundtrip() {
> +        let mut buffer = RingBuffer::new(CLOG_DEFAULT_SIZE);
> +
> +        let _ = buffer.add_entry(
> +            &LogEntry::pack("node1", "root", "cluster", 123, 1000, 6, "Entry 1").unwrap(),
> +        );
> +        let _ = buffer.add_entry(
> +            &LogEntry::pack("node2", "admin", "system", 456, 1001, 5, "Entry 2").unwrap(),
> +        );
> +
> +        // Serialize
> +        let binary = buffer.serialize_binary();
> +
> +        // Deserialize
> +        let deserialized = RingBuffer::deserialize_binary(&binary).unwrap();
> +
> +        // Check entry count
> +        assert_eq!(deserialized.len(), buffer.len());
> +
> +        // Check entries match
> +        let orig_entries: Vec<_> = buffer.iter().collect();
> +        let deser_entries: Vec<_> = deserialized.iter().collect();
> +
> +        for (orig, deser) in orig_entries.iter().zip(deser_entries.iter()) {
> +            assert_eq!(deser.uid, orig.uid);
> +            assert_eq!(deser.time, orig.time);
> +            assert_eq!(deser.node, orig.node);
> +            assert_eq!(deser.message, orig.message);
> +        }
> +    }
> +
> +    #[test]
> +    fn test_binary_format_header() {
> +        let mut buffer = RingBuffer::new(CLOG_DEFAULT_SIZE);
> +        let _ = buffer.add_entry(&LogEntry::pack("n", "u", "t", 1, 1000, 6, "m").unwrap());
> +
> +        let binary = buffer.serialize_binary();
> +
> +        // Check header format
> +        assert!(binary.len() >= 8);
> +
> +        let size = u32::from_le_bytes(binary[0..4].try_into().unwrap()) as usize;
> +        let cpos = u32::from_le_bytes(binary[4..8].try_into().unwrap());
> +
> +        assert_eq!(size, binary.len());
> +        assert_eq!(cpos, 8); // First entry at offset 8
> +    }
> +
> +    #[test]
> +    fn test_binary_empty_buffer() {
> +        let buffer = RingBuffer::new(1024);
> +        let binary = buffer.serialize_binary();
> +
> +        // Empty buffer should just be header
> +        assert_eq!(binary.len(), 8);
> +
> +        let deserialized = RingBuffer::deserialize_binary(&binary).unwrap();
> +        assert_eq!(deserialized.len(), 0);
> +    }
> +}




* Re: [pve-devel] [PATCH pve-cluster 04/15] pmxcfs-rs: add pmxcfs-rrd crate
  @ 2026-01-29 14:44  5%   ` Samuel Rufinatscha
  0 siblings, 0 replies; 117+ results
From: Samuel Rufinatscha @ 2026-01-29 14:44 UTC (permalink / raw)
  To: Proxmox VE development discussion, Kefu Chai

Thanks for the patch, Kefu.

Overall looks good (nice backend abstraction and schema separation).
I left a few inline notes around transform_data() skip logic,
sanitizing key path components, .rrd on-disk naming consistency across
backends/tests, and adding a few actual payload fixtures for the
transform tests.

Please see comments inline.
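
For the key path point, this is roughly the kind of check I have in
mind. It is only a sketch: `is_safe_component` is a hypothetical helper
name, not something from the patch, and the exact rule set is up for
discussion.

```rust
/// Hypothetical helper: accept a key component only if it is a single,
/// plain path segment that cannot escape the RRD base directory.
fn is_safe_component(s: &str) -> bool {
    !s.is_empty()
        && s != "."
        && s != ".."
        // Reject path separators and embedded NUL bytes.
        && !s.contains(|c: char| c == '/' || c == '\0')
}
```

Each component of a parsed key would be checked before it is joined
onto the base directory.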

On 1/7/26 10:15 AM, Kefu Chai wrote:
> Add RRD (Round-Robin Database) file persistence system:
> - RrdWriter: Main API for RRD operations
> - Schema definitions for CPU, memory, network metrics
> - Format migration support (v1/v2/v3)
> - rrdcached integration for batched writes
> - Data transformation for legacy formats
> 
> This is an independent crate with no internal dependencies,
> only requiring external RRD libraries (rrd, rrdcached-client)
> and tokio for async operations. It handles time-series data
> storage compatible with the C implementation.
> 
> Includes comprehensive unit tests for data transformation,
> schema generation, and multi-source data processing.
> 
> Signed-off-by: Kefu Chai <k.chai@proxmox.com>
> ---
>   src/pmxcfs-rs/Cargo.toml                      |   1 +
>   src/pmxcfs-rs/pmxcfs-rrd/Cargo.toml           |  18 +
>   src/pmxcfs-rs/pmxcfs-rrd/README.md            |  51 ++
>   src/pmxcfs-rs/pmxcfs-rrd/src/backend.rs       |  67 ++
>   .../pmxcfs-rrd/src/backend/backend_daemon.rs  | 214 +++++++
>   .../pmxcfs-rrd/src/backend/backend_direct.rs  | 606 ++++++++++++++++++
>   .../src/backend/backend_fallback.rs           | 229 +++++++
>   src/pmxcfs-rs/pmxcfs-rrd/src/daemon.rs        | 140 ++++
>   src/pmxcfs-rs/pmxcfs-rrd/src/key_type.rs      | 313 +++++++++
>   src/pmxcfs-rs/pmxcfs-rrd/src/lib.rs           |  21 +
>   src/pmxcfs-rs/pmxcfs-rrd/src/schema.rs        | 577 +++++++++++++++++
>   src/pmxcfs-rs/pmxcfs-rrd/src/writer.rs        | 397 ++++++++++++
>   12 files changed, 2634 insertions(+)
>   create mode 100644 src/pmxcfs-rs/pmxcfs-rrd/Cargo.toml
>   create mode 100644 src/pmxcfs-rs/pmxcfs-rrd/README.md
>   create mode 100644 src/pmxcfs-rs/pmxcfs-rrd/src/backend.rs
>   create mode 100644 src/pmxcfs-rs/pmxcfs-rrd/src/backend/backend_daemon.rs
>   create mode 100644 src/pmxcfs-rs/pmxcfs-rrd/src/backend/backend_direct.rs
>   create mode 100644 src/pmxcfs-rs/pmxcfs-rrd/src/backend/backend_fallback.rs
>   create mode 100644 src/pmxcfs-rs/pmxcfs-rrd/src/daemon.rs
>   create mode 100644 src/pmxcfs-rs/pmxcfs-rrd/src/key_type.rs
>   create mode 100644 src/pmxcfs-rs/pmxcfs-rrd/src/lib.rs
>   create mode 100644 src/pmxcfs-rs/pmxcfs-rrd/src/schema.rs
>   create mode 100644 src/pmxcfs-rs/pmxcfs-rrd/src/writer.rs
> 
> diff --git a/src/pmxcfs-rs/Cargo.toml b/src/pmxcfs-rs/Cargo.toml
> index 4d17e87e..dd36c81f 100644
> --- a/src/pmxcfs-rs/Cargo.toml
> +++ b/src/pmxcfs-rs/Cargo.toml
> @@ -4,6 +4,7 @@ members = [
>       "pmxcfs-api-types",  # Shared types and error definitions
>       "pmxcfs-config",     # Configuration management
>       "pmxcfs-logger",     # Cluster log with ring buffer and deduplication
> +    "pmxcfs-rrd",        # RRD (Round-Robin Database) persistence
>   ]
>   resolver = "2"
>   
> diff --git a/src/pmxcfs-rs/pmxcfs-rrd/Cargo.toml b/src/pmxcfs-rs/pmxcfs-rrd/Cargo.toml
> new file mode 100644
> index 00000000..bab71423
> --- /dev/null
> +++ b/src/pmxcfs-rs/pmxcfs-rrd/Cargo.toml
> @@ -0,0 +1,18 @@
> +[package]
> +name = "pmxcfs-rrd"
> +version.workspace = true
> +edition.workspace = true
> +authors.workspace = true
> +license.workspace = true
> +
> +[dependencies]
> +anyhow.workspace = true
> +async-trait = "0.1"
> +chrono = { version = "0.4", default-features = false, features = ["clock"] }
> +rrd = "0.2"
> +rrdcached-client = "0.1.5"

This crate looks fairly young and small. Are we comfortable depending
on it? We could vendor or fork it to keep control over stability.

> +tokio.workspace = true
> +tracing.workspace = true
> +
> +[dev-dependencies]
> +tempfile.workspace = true
> diff --git a/src/pmxcfs-rs/pmxcfs-rrd/README.md b/src/pmxcfs-rs/pmxcfs-rrd/README.md
> new file mode 100644
> index 00000000..800d78cf
> --- /dev/null
> +++ b/src/pmxcfs-rs/pmxcfs-rrd/README.md
> @@ -0,0 +1,51 @@
> +# pmxcfs-rrd
> +
> +RRD (Round-Robin Database) persistence for pmxcfs performance metrics.
> +
> +## Overview
> +
> +This crate provides RRD file management for storing time-series performance data from Proxmox nodes and VMs. It handles file creation, updates, and integration with rrdcached daemon for efficient writes.

Can we elaborate on the usage and flow of this crate? For example: how
it is called, what data is passed in, how the transformation works, and
how the backend implementations differ. This would help reviewers a lot.
A small code example showing how the library is used would also be
valuable.
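
As an illustration of the level of detail I mean, the daemon-first,
direct-fallback flow could be shown along these lines. This is a
simplified, synchronous sketch with hypothetical names (`Backend`,
`Failing`, `Recording`, `Fallback`); it is not the trait from this patch.

```rust
/// Simplified, synchronous stand-in for a backend trait (hypothetical).
trait Backend {
    fn update(&mut self, data: &str) -> Result<(), String>;
    fn name(&self) -> &'static str;
}

/// Stand-in for a daemon backend whose connection is down.
struct Failing;

impl Backend for Failing {
    fn update(&mut self, _data: &str) -> Result<(), String> {
        Err("daemon unavailable".to_string())
    }
    fn name(&self) -> &'static str {
        "daemon"
    }
}

/// Stand-in for a direct backend; it only records what it was given.
#[derive(Default)]
struct Recording {
    seen: Vec<String>,
}

impl Backend for Recording {
    fn update(&mut self, data: &str) -> Result<(), String> {
        self.seen.push(data.to_string());
        Ok(())
    }
    fn name(&self) -> &'static str {
        "direct"
    }
}

/// Composite that tries the primary backend first, then the secondary.
struct Fallback<P, S> {
    primary: P,
    secondary: S,
}

impl<P: Backend, S: Backend> Fallback<P, S> {
    /// Returns the name of the backend that handled the update.
    fn update(&mut self, data: &str) -> Result<&'static str, String> {
        if self.primary.update(data).is_ok() {
            return Ok(self.primary.name());
        }
        self.secondary.update(data)?;
        Ok(self.secondary.name())
    }
}
```

With `Failing` as primary and `Recording` as secondary, an update such
as "N:1:2" ends up in the secondary backend.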

> +
> +### Key Features
> +
> +- RRD file creation with schema-based initialization
> +- RRD updates (write metrics to disk)
> +- rrdcached integration for batched writes
> +- Support for both legacy and current schema versions
> +- Type-safe key parsing and validation
> +- Compatible with existing C-created RRD files
> +
> +## Module Structure
> +
> +| Module | Purpose |
> +|--------|---------|
> +| `writer.rs` | Main RrdWriter API |
> +| `schema.rs` | RRD schema definitions (DS, RRA) |
> +| `key_type.rs` | RRD key parsing and validation |
> +| `daemon.rs` | rrdcached daemon client |

The backend module is missing from this table.
I think we could drop the table entirely. If we keep it, it would be
helpful to describe each component in a bit more detail.

> +
> +## External Dependencies
> +
> +- **librrd**: RRDtool library (via FFI bindings)

Let's explicitly note the `rrd` crate here, which provides the bindings.

> +- **rrdcached**: Optional daemon for batched writes and improved performance

Since rrdcached is optional, we could also add a feature flag to reduce
dependencies/build surface?
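
A rough sketch of what I mean, assuming a hypothetical `rrdcached`
cargo feature that makes the `rrdcached-client` dependency optional.
The feature name, the Cargo.toml fragment in the comment, and the
`compiled_backends` helper are assumptions for illustration only.

```rust
// Assumed Cargo.toml fragment (not part of the patch):
//
//   [features]
//   default = ["rrdcached"]
//   rrdcached = ["dep:rrdcached-client"]
//
//   [dependencies]
//   rrdcached-client = { version = "0.1.5", optional = true }

/// Hypothetical helper: names of the backends compiled into this build.
pub fn compiled_backends() -> Vec<&'static str> {
    #[allow(unused_mut)]
    let mut names = vec!["direct"];
    // Only present when the (assumed) `rrdcached` feature is enabled.
    #[cfg(feature = "rrdcached")]
    names.push("rrdcached");
    names
}
```

The daemon backend module and its re-export would be gated with the
same `#[cfg(feature = "rrdcached")]` attribute.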

> +
> +## Testing
> +
> +Unit tests verify:
> +- Schema generation and validation
> +- Key parsing for different RRD types (node, VM, storage)
> +- RRD file creation and update operations
> +- rrdcached client connection and fallback behavior
> +
> +Run tests with:
> +```bash
> +cargo test -p pmxcfs-rrd
> +```
> +
> +## References
> +
> +- **C Implementation**: `src/pmxcfs/status.c` (RRD code embedded)
> +- **Related Crates**:
> +  - `pmxcfs-status` - Uses RrdWriter for metrics persistence
> +  - `pmxcfs` - FUSE `.rrd` plugin reads RRD files
> +- **RRDtool Documentation**: https://oss.oetiker.ch/rrdtool/

Thanks for adding the references and how they are used in C. This is
very helpful.

> diff --git a/src/pmxcfs-rs/pmxcfs-rrd/src/backend.rs b/src/pmxcfs-rs/pmxcfs-rrd/src/backend.rs
> new file mode 100644
> index 00000000..58652831
> --- /dev/null
> +++ b/src/pmxcfs-rs/pmxcfs-rrd/src/backend.rs
> @@ -0,0 +1,67 @@
> +/// RRD Backend Trait and Implementations
> +///
> +/// This module provides an abstraction over different RRD writing mechanisms:
> +/// - Daemon-based (via rrdcached) for performance and batching
> +/// - Direct file writing for reliability and fallback scenarios
> +/// - Fallback composite that tries daemon first, then falls back to direct
> +///
> +/// This design matches the C implementation's behavior in status.c where
> +/// it attempts daemon update first, then falls back to direct file writes.
> +use super::schema::RrdSchema;
> +use anyhow::Result;
> +use async_trait::async_trait;
> +use std::path::Path;
> +
> +/// Trait for RRD backend implementations
> +///
> +/// Provides abstraction over different RRD writing mechanisms.
> +/// All methods are async to support both async (daemon) and sync (direct file) operations.
> +#[async_trait]
> +pub trait RrdBackend: Send + Sync {

Great idea to abstract this!

> +    /// Update RRD file with new data
> +    ///
> +    /// # Arguments
> +    /// * `file_path` - Full path to the RRD file
> +    /// * `data` - Update data in format "timestamp:value1:value2:..."
> +    async fn update(&mut self, file_path: &Path, data: &str) -> Result<()>;
> +
> +    /// Create new RRD file with schema
> +    ///
> +    /// # Arguments
> +    /// * `file_path` - Full path where RRD file should be created
> +    /// * `schema` - RRD schema defining data sources and archives
> +    /// * `start_timestamp` - Start time for the RRD file (Unix timestamp)
> +    async fn create(
> +        &mut self,
> +        file_path: &Path,
> +        schema: &RrdSchema,
> +        start_timestamp: i64,
> +    ) -> Result<()>;
> +
> +    /// Flush pending updates to disk
> +    ///
> +    /// For daemon backends, this sends a FLUSH command.
> +    /// For direct backends, this is a no-op (writes are immediate).
> +    #[allow(dead_code)] // Used in backend implementations via trait dispatch
> +    async fn flush(&mut self) -> Result<()>;
> +
> +    /// Check if backend is available and healthy
> +    ///
> +    /// Returns true if the backend can be used for operations.
> +    /// For daemon backends, this checks if the connection is alive.
> +    /// For direct backends, this always returns true.
> +    #[allow(dead_code)] // Used in fallback backend via trait dispatch
> +    async fn is_available(&self) -> bool;
> +
> +    /// Get a human-readable name for this backend
> +    fn name(&self) -> &str;
> +}
> +
> +// Backend implementations
> +mod backend_daemon;
> +mod backend_direct;
> +mod backend_fallback;
> +
> +pub use backend_daemon::RrdCachedBackend;
> +pub use backend_direct::RrdDirectBackend;
> +pub use backend_fallback::RrdFallbackBackend;
> diff --git a/src/pmxcfs-rs/pmxcfs-rrd/src/backend/backend_daemon.rs b/src/pmxcfs-rs/pmxcfs-rrd/src/backend/backend_daemon.rs
> new file mode 100644
> index 00000000..28c1a99a
> --- /dev/null
> +++ b/src/pmxcfs-rs/pmxcfs-rrd/src/backend/backend_daemon.rs
> @@ -0,0 +1,214 @@
> +/// RRD Backend: rrdcached daemon
> +///
> +/// Uses rrdcached for batched, high-performance RRD updates.
> +/// This is the preferred backend when the daemon is available.
> +use super::super::schema::RrdSchema;
> +use anyhow::{Context, Result};
> +use async_trait::async_trait;
> +use rrdcached_client::RRDCachedClient;
> +use rrdcached_client::consolidation_function::ConsolidationFunction;
> +use rrdcached_client::create::{
> +    CreateArguments, CreateDataSource, CreateDataSourceType, CreateRoundRobinArchive,
> +};
> +use std::path::Path;
> +
> +/// RRD backend using rrdcached daemon
> +pub struct RrdCachedBackend {
> +    client: RRDCachedClient<tokio::net::UnixStream>,
> +}
> +
> +impl RrdCachedBackend {
> +    /// Connect to rrdcached daemon
> +    ///
> +    /// # Arguments
> +    /// * `socket_path` - Path to rrdcached Unix socket (default: /var/run/rrdcached.sock)
> +    pub async fn connect(socket_path: &str) -> Result<Self> {
> +        let client = RRDCachedClient::connect_unix(socket_path)
> +            .await
> +            .with_context(|| format!("Failed to connect to rrdcached at {socket_path}"))?;
> +
> +        tracing::info!("Connected to rrdcached at {}", socket_path);
> +
> +        Ok(Self { client })
> +    }
> +}
> +
> +#[async_trait]
> +impl super::super::backend::RrdBackend for RrdCachedBackend {
> +    async fn update(&mut self, file_path: &Path, data: &str) -> Result<()> {
> +        // Parse the update data
> +        let parts: Vec<&str> = data.split(':').collect();
> +        if parts.len() < 2 {
> +            anyhow::bail!("Invalid update data format: {data}");
> +        }
> +
> +        let timestamp = if parts[0] == "N" {
> +            None
> +        } else {
> +            Some(
> +                parts[0]
> +                    .parse::<usize>()
> +                    .with_context(|| format!("Invalid timestamp: {}", parts[0]))?,
> +            )
> +        };
> +
> +        let values: Vec<f64> = parts[1..]
> +            .iter()
> +            .map(|v| {
> +                if *v == "U" {
> +                    Ok(f64::NAN)
> +                } else {
> +                    v.parse::<f64>()
> +                        .with_context(|| format!("Invalid value: {v}"))
> +                }
> +            })
> +            .collect::<Result<Vec<_>>>()?;
> +
> +        // Get file path without .rrd extension (rrdcached-client adds it)
> +        let path_str = file_path.to_string_lossy();
> +        let path_without_ext = path_str.strip_suffix(".rrd").unwrap_or(&path_str);
> +
> +        // Send update via rrdcached
> +        self.client
> +            .update(path_without_ext, timestamp, values)
> +            .await
> +            .with_context(|| format!("rrdcached update failed for {:?}", file_path))?;
> +
> +        tracing::trace!("Updated RRD via daemon: {:?} -> {}", file_path, data);
> +
> +        Ok(())
> +    }
> +
> +    async fn create(
> +        &mut self,
> +        file_path: &Path,
> +        schema: &RrdSchema,
> +        start_timestamp: i64,
> +    ) -> Result<()> {
> +        tracing::debug!(
> +            "Creating RRD file via daemon: {:?} with {} data sources",
> +            file_path,
> +            schema.column_count()
> +        );
> +
> +        // Convert our data sources to rrdcached-client CreateDataSource objects
> +        let mut data_sources = Vec::new();
> +        for ds in &schema.data_sources {
> +            let serie_type = match ds.ds_type {
> +                "GAUGE" => CreateDataSourceType::Gauge,
> +                "DERIVE" => CreateDataSourceType::Derive,
> +                "COUNTER" => CreateDataSourceType::Counter,
> +                "ABSOLUTE" => CreateDataSourceType::Absolute,
> +                _ => anyhow::bail!("Unsupported data source type: {}", ds.ds_type),
> +            };
> +
> +            // Parse min/max values
> +            let minimum = if ds.min == "U" {
> +                None
> +            } else {
> +                ds.min.parse().ok()
> +            };
> +            let maximum = if ds.max == "U" {
> +                None
> +            } else {
> +                ds.max.parse().ok()
> +            };
> +
> +            let data_source = CreateDataSource {
> +                name: ds.name.to_string(),
> +                minimum,
> +                maximum,
> +                heartbeat: ds.heartbeat as i64,
> +                serie_type,
> +            };
> +
> +            data_sources.push(data_source);
> +        }
> +
> +        // Convert our RRA definitions to rrdcached-client CreateRoundRobinArchive objects
> +        let mut archives = Vec::new();
> +        for rra in &schema.archives {
> +            // Parse RRA string: "RRA:AVERAGE:0.5:1:70"
> +            let parts: Vec<&str> = rra.split(':').collect();
> +            if parts.len() != 5 || parts[0] != "RRA" {
> +                anyhow::bail!("Invalid RRA format: {rra}");
> +            }
> +
> +            let consolidation_function = match parts[1] {
> +                "AVERAGE" => ConsolidationFunction::Average,
> +                "MIN" => ConsolidationFunction::Min,
> +                "MAX" => ConsolidationFunction::Max,
> +                "LAST" => ConsolidationFunction::Last,
> +                _ => anyhow::bail!("Unsupported consolidation function: {}", parts[1]),
> +            };
> +
> +            let xfiles_factor: f64 = parts[2]
> +                .parse()
> +                .with_context(|| format!("Invalid xff in RRA: {rra}"))?;
> +            let steps: i64 = parts[3]
> +                .parse()
> +                .with_context(|| format!("Invalid steps in RRA: {rra}"))?;
> +            let rows: i64 = parts[4]
> +                .parse()
> +                .with_context(|| format!("Invalid rows in RRA: {rra}"))?;
> +
> +            let archive = CreateRoundRobinArchive {
> +                consolidation_function,
> +                xfiles_factor,
> +                steps,
> +                rows,
> +            };
> +            archives.push(archive);
> +        }
> +
> +        // Get path without .rrd extension (rrdcached-client adds it)
> +        let path_str = file_path.to_string_lossy();
> +        let path_without_ext = path_str
> +            .strip_suffix(".rrd")
> +            .unwrap_or(&path_str)
> +            .to_string();
> +
> +        // Create CreateArguments
> +        let create_args = CreateArguments {
> +            path: path_without_ext,
> +            data_sources,
> +            round_robin_archives: archives,
> +            start_timestamp: start_timestamp as u64,
> +            step_seconds: 60, // 60-second step (1 minute resolution)
> +        };
> +
> +        // Validate before sending
> +        create_args.validate().context("Invalid CREATE arguments")?;
> +
> +        // Send CREATE command via rrdcached
> +        self.client
> +            .create(create_args)
> +            .await
> +            .with_context(|| format!("Failed to create RRD file via daemon: {file_path:?}"))?;
> +
> +        tracing::info!("Created RRD file via daemon: {:?} ({})", file_path, schema);
> +
> +        Ok(())
> +    }
> +
> +    async fn flush(&mut self) -> Result<()> {
> +        self.client
> +            .flush_all()
> +            .await
> +            .context("Failed to flush rrdcached")?;
> +
> +        tracing::debug!("Flushed all pending RRD updates");
> +
> +        Ok(())
> +    }
> +
> +    async fn is_available(&self) -> bool {
> +        // For now, assume we're available if we have a client
> +        // Could add a PING command in the future
> +        true
> +    }
> +
> +    fn name(&self) -> &str {
> +        "rrdcached"
> +    }
> +}
> diff --git a/src/pmxcfs-rs/pmxcfs-rrd/src/backend/backend_direct.rs b/src/pmxcfs-rs/pmxcfs-rrd/src/backend/backend_direct.rs
> new file mode 100644
> index 00000000..6be3eb5d
> --- /dev/null
> +++ b/src/pmxcfs-rs/pmxcfs-rrd/src/backend/backend_direct.rs
> @@ -0,0 +1,606 @@
> +/// RRD Backend: Direct file writing
> +///
> +/// Uses the `rrd` crate (librrd bindings) for direct RRD file operations.
> +/// This backend is used as a fallback when rrdcached is unavailable.
> +///
> +/// This matches the C implementation's behavior in status.c:1416-1420 where
> +/// it falls back to rrd_update_r() and rrd_create_r() for direct file access.
> +use super::super::schema::RrdSchema;
> +use anyhow::{Context, Result};
> +use async_trait::async_trait;
> +use std::path::Path;
> +use std::time::Duration;
> +
> +/// RRD backend using direct file operations via librrd
> +pub struct RrdDirectBackend {
> +    // Currently stateless, but kept as struct for future enhancements
> +}
> +
> +impl RrdDirectBackend {
> +    /// Create a new direct file backend
> +    pub fn new() -> Self {
> +        tracing::info!("Using direct RRD file backend (via librrd)");
> +        Self {}
> +    }
> +}
> +
> +impl Default for RrdDirectBackend {
> +    fn default() -> Self {
> +        Self::new()
> +    }
> +}
> +
> +#[async_trait]
> +impl super::super::backend::RrdBackend for RrdDirectBackend {
> +    async fn update(&mut self, file_path: &Path, data: &str) -> Result<()> {
> +        let path = file_path.to_path_buf();
> +        let data_str = data.to_string();
> +
> +        // Use tokio::task::spawn_blocking for sync rrd operations
> +        // This prevents blocking the async runtime
> +        tokio::task::spawn_blocking(move || {
> +            // Parse the update data to extract timestamp and values
> +            // Format: "timestamp:value1:value2:..."
> +            let parts: Vec<&str> = data_str.split(':').collect();
> +            if parts.is_empty() {
> +                anyhow::bail!("Empty update data");
> +            }
> +
> +            // Use rrd::ops::update::update_all_with_timestamp
> +            // This is the most direct way to update RRD files
> +            let timestamp_str = parts[0];
> +            let timestamp: i64 = if timestamp_str == "N" {
> +                // "N" means "now" in RRD terminology
> +                chrono::Utc::now().timestamp()
> +            } else {
> +                timestamp_str
> +                    .parse()
> +                    .with_context(|| format!("Invalid timestamp: {}", timestamp_str))?
> +            };
> +
> +            let timestamp = chrono::DateTime::from_timestamp(timestamp, 0)
> +                .ok_or_else(|| anyhow::anyhow!("Invalid timestamp value: {}", timestamp))?;
> +
> +            // Convert values to Datum
> +            let values: Vec<rrd::ops::update::Datum> = parts[1..]
> +                .iter()
> +                .map(|v| {
> +                    if *v == "U" {
> +                        // Unknown/unspecified value
> +                        rrd::ops::update::Datum::Unspecified
> +                    } else if let Ok(int_val) = v.parse::<u64>() {
> +                        rrd::ops::update::Datum::Int(int_val)
> +                    } else if let Ok(float_val) = v.parse::<f64>() {
> +                        rrd::ops::update::Datum::Float(float_val)
> +                    } else {
> +                        rrd::ops::update::Datum::Unspecified
> +                    }
> +                })
> +                .collect();
> +
> +            // Perform the update
> +            rrd::ops::update::update_all(
> +                &path,
> +                rrd::ops::update::ExtraFlags::empty(),
> +                &[(
> +                    rrd::ops::update::BatchTime::Timestamp(timestamp),
> +                    values.as_slice(),
> +                )],
> +            )
> +            .with_context(|| format!("Direct RRD update failed for {:?}", path))?;
> +
> +            tracing::trace!("Updated RRD via direct file: {:?} -> {}", path, data_str);
> +
> +            Ok::<(), anyhow::Error>(())
> +        })
> +        .await
> +        .context("Failed to spawn blocking task for RRD update")??;
> +
> +        Ok(())
> +    }
> +
> +    async fn create(
> +        &mut self,
> +        file_path: &Path,
> +        schema: &RrdSchema,
> +        start_timestamp: i64,
> +    ) -> Result<()> {
> +        tracing::debug!(
> +            "Creating RRD file via direct: {:?} with {} data sources",
> +            file_path,
> +            schema.column_count()
> +        );
> +
> +        let path = file_path.to_path_buf();
> +        let schema = schema.clone();
> +
> +        // Ensure parent directory exists
> +        if let Some(parent) = path.parent() {
> +            std::fs::create_dir_all(parent)
> +                .with_context(|| format!("Failed to create directory: {parent:?}"))?;
> +        }
> +
> +        // Use tokio::task::spawn_blocking for sync rrd operations
> +        tokio::task::spawn_blocking(move || {
> +            // Convert timestamp
> +            let start = chrono::DateTime::from_timestamp(start_timestamp, 0)
> +                .ok_or_else(|| anyhow::anyhow!("Invalid start timestamp: {}", start_timestamp))?;
> +
> +            // Convert data sources
> +            let data_sources: Vec<rrd::ops::create::DataSource> = schema
> +                .data_sources
> +                .iter()
> +                .map(|ds| {
> +                    let name = rrd::ops::create::DataSourceName::new(ds.name);
> +
> +                    match ds.ds_type {
> +                        "GAUGE" => {
> +                            let min = if ds.min == "U" {
> +                                None
> +                            } else {
> +                                Some(ds.min.parse().context("Invalid min value")?)
> +                            };
> +                            let max = if ds.max == "U" {
> +                                None
> +                            } else {
> +                                Some(ds.max.parse().context("Invalid max value")?)
> +                            };
> +                            Ok(rrd::ops::create::DataSource::gauge(
> +                                name,
> +                                ds.heartbeat,
> +                                min,
> +                                max,
> +                            ))
> +                        }
> +                        "DERIVE" => {
> +                            let min = if ds.min == "U" {
> +                                None
> +                            } else {
> +                                Some(ds.min.parse().context("Invalid min value")?)
> +                            };
> +                            let max = if ds.max == "U" {
> +                                None
> +                            } else {
> +                                Some(ds.max.parse().context("Invalid max value")?)
> +                            };
> +                            Ok(rrd::ops::create::DataSource::derive(
> +                                name,
> +                                ds.heartbeat,
> +                                min,
> +                                max,
> +                            ))
> +                        }
> +                        "COUNTER" => {
> +                            let min = if ds.min == "U" {
> +                                None
> +                            } else {
> +                                Some(ds.min.parse().context("Invalid min value")?)
> +                            };
> +                            let max = if ds.max == "U" {
> +                                None
> +                            } else {
> +                                Some(ds.max.parse().context("Invalid max value")?)
> +                            };
> +                            Ok(rrd::ops::create::DataSource::counter(
> +                                name,
> +                                ds.heartbeat,
> +                                min,
> +                                max,
> +                            ))
> +                        }
> +                        "ABSOLUTE" => {
> +                            let min = if ds.min == "U" {
> +                                None
> +                            } else {
> +                                Some(ds.min.parse().context("Invalid min value")?)
> +                            };
> +                            let max = if ds.max == "U" {
> +                                None
> +                            } else {
> +                                Some(ds.max.parse().context("Invalid max value")?)
> +                            };
> +                            Ok(rrd::ops::create::DataSource::absolute(
> +                                name,
> +                                ds.heartbeat,
> +                                min,
> +                                max,
> +                            ))
> +                        }
> +                        _ => anyhow::bail!("Unsupported data source type: {}", ds.ds_type),
> +                    }
> +                })
> +                .collect::<Result<Vec<_>>>()?;
> +
> +            // Convert RRAs
> +            let archives: Result<Vec<rrd::ops::create::Archive>> = schema
> +                .archives
> +                .iter()
> +                .map(|rra| {
> +                    // Parse RRA string: "RRA:AVERAGE:0.5:1:1440"
> +                    let parts: Vec<&str> = rra.split(':').collect();
> +                    if parts.len() != 5 || parts[0] != "RRA" {
> +                        anyhow::bail!("Invalid RRA format: {}", rra);
> +                    }
> +
> +                    let cf = match parts[1] {
> +                        "AVERAGE" => rrd::ConsolidationFn::Avg,
> +                        "MIN" => rrd::ConsolidationFn::Min,
> +                        "MAX" => rrd::ConsolidationFn::Max,
> +                        "LAST" => rrd::ConsolidationFn::Last,
> +                        _ => anyhow::bail!("Unsupported consolidation function: {}", parts[1]),
> +                    };
> +
> +                    let xff: f64 = parts[2]
> +                        .parse()
> +                        .with_context(|| format!("Invalid xff in RRA: {}", rra))?;
> +                    let steps: u32 = parts[3]
> +                        .parse()
> +                        .with_context(|| format!("Invalid steps in RRA: {}", rra))?;
> +                    let rows: u32 = parts[4]
> +                        .parse()
> +                        .with_context(|| format!("Invalid rows in RRA: {}", rra))?;
> +
> +                    rrd::ops::create::Archive::new(cf, xff, steps, rows)
> +                        .map_err(|e| anyhow::anyhow!("Failed to create archive: {}", e))
> +                })
> +                .collect();
> +
> +            let archives = archives?;
> +
> +            // Call rrd::ops::create::create
> +            rrd::ops::create::create(
> +                &path,
> +                start,
> +                Duration::from_secs(60), // 60-second step
> +                false,                   // no_overwrite = false

With overwrite allowed, a second, concurrent create could race with
this one and replace a file that was just created.
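
One way to close that window might be to create exclusively and treat
"already exists" as a no-op. Below is a minimal std-only sketch of the
semantics I have in mind. create_exclusive() is a hypothetical helper,
not part of the patch, and whether no_overwrite = true in librrd gives
the same guarantee would still need to be checked:

```rust
use std::fs::OpenOptions;
use std::io::{self, Write};
use std::path::Path;

/// Hypothetical helper: create `path` only if it does not exist yet.
/// Returns Ok(true) if this call created the file, Ok(false) if some
/// other creator got there first.
fn create_exclusive(path: &Path, contents: &[u8]) -> io::Result<bool> {
    // create_new(true) makes "check for existence" and "create" one
    // atomic step, so two concurrent callers cannot both succeed.
    match OpenOptions::new().write(true).create_new(true).open(path) {
        Ok(mut file) => {
            file.write_all(contents)?;
            Ok(true)
        }
        Err(e) if e.kind() == io::ErrorKind::AlreadyExists => Ok(false),
        Err(e) => Err(e),
    }
}
```

The loser of the race sees Ok(false) and can go straight to update
instead of re-creating the file.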

> +                None,                    // template
> +                &[],                     // sources
> +                data_sources.iter(),
> +                archives.iter(),
> +            )
> +            .with_context(|| format!("Direct RRD create failed for {:?}", path))?;
> +
> +            tracing::info!("Created RRD file via direct: {:?} ({})", path, schema);
> +
> +            Ok::<(), anyhow::Error>(())
> +        })
> +        .await
> +        .context("Failed to spawn blocking task for RRD create")??;
> +
> +        Ok(())
> +    }
> +
> +    async fn flush(&mut self) -> Result<()> {
> +        // No-op for direct backend - writes are immediate
> +        tracing::trace!("Flush called on direct backend (no-op)");
> +        Ok(())
> +    }
> +
> +    async fn is_available(&self) -> bool {
> +        // Direct backend is always available (no external dependencies)
> +        true
> +    }
> +
> +    fn name(&self) -> &str {
> +        "direct"
> +    }
> +}
> +
> +#[cfg(test)]
> +mod tests {
> +    use super::*;
> +    use crate::backend::RrdBackend;
> +    use crate::schema::{RrdFormat, RrdSchema};
> +    use std::path::PathBuf;
> +    use tempfile::TempDir;
> +
> +    // ===== Test Helpers =====
> +
> +    /// Create a temporary directory for RRD files
> +    fn setup_temp_dir() -> TempDir {
> +        TempDir::new().expect("Failed to create temp directory")
> +    }
> +
> +    /// Create a test RRD file path
> +    fn test_rrd_path(dir: &TempDir, name: &str) -> PathBuf {
> +        dir.path().join(format!("{}.rrd", name))

What’s the canonical on-disk naming here (with or without .rrd)?
file_path() and the daemon path handling suggest no extension,
but direct/tests currently create *.rrd. Can we make this consistent
across writer/backends/tests?
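
If "no extension" turns out to be the canonical form, a single
normalization helper used by all backends and tests could enforce it.
A rough sketch, where canonical_rrd_path() is a made-up name and the
choice of the extension-less form is my assumption:

```rust
use std::ffi::OsStr;
use std::path::{Path, PathBuf};

/// Hypothetical helper: map a path to its extension-less form, assuming
/// "no .rrd extension" is the canonical on-disk name.
fn canonical_rrd_path(path: &Path) -> PathBuf {
    // Only strip a literal "rrd" extension, so dotted names such as
    // "node1.example" are left intact.
    if path.extension() == Some(OsStr::new("rrd")) {
        path.with_extension("")
    } else {
        path.to_path_buf()
    }
}
```

Checking for the literal "rrd" extension matters here, because node
names may contain dots and a blanket with_extension("") would truncate
them.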

> +    }
> +
> +    // ===== RrdDirectBackend Tests =====
> +
> +    #[tokio::test]
> +    async fn test_direct_backend_create_node_rrd() {
> +        let temp_dir = setup_temp_dir();
> +        let rrd_path = test_rrd_path(&temp_dir, "node_test");
> +
> +        let mut backend = RrdDirectBackend::new();
> +        let schema = RrdSchema::node(RrdFormat::Pve9_0);
> +        let start_time = 1704067200; // 2024-01-01 00:00:00
> +
> +        // Create RRD file
> +        let result = backend.create(&rrd_path, &schema, start_time).await;
> +        assert!(
> +            result.is_ok(),
> +            "Failed to create node RRD: {:?}",
> +            result.err()
> +        );
> +
> +        // Verify file was created
> +        assert!(rrd_path.exists(), "RRD file should exist after create");
> +
> +        // Verify backend name
> +        assert_eq!(backend.name(), "direct");
> +    }
> +
> +    #[tokio::test]
> +    async fn test_direct_backend_create_vm_rrd() {
> +        let temp_dir = setup_temp_dir();
> +        let rrd_path = test_rrd_path(&temp_dir, "vm_test");
> +
> +        let mut backend = RrdDirectBackend::new();
> +        let schema = RrdSchema::vm(RrdFormat::Pve9_0);
> +        let start_time = 1704067200;
> +
> +        let result = backend.create(&rrd_path, &schema, start_time).await;
> +        assert!(
> +            result.is_ok(),
> +            "Failed to create VM RRD: {:?}",
> +            result.err()
> +        );
> +        assert!(rrd_path.exists());
> +    }
> +
> +    #[tokio::test]
> +    async fn test_direct_backend_create_storage_rrd() {
> +        let temp_dir = setup_temp_dir();
> +        let rrd_path = test_rrd_path(&temp_dir, "storage_test");
> +
> +        let mut backend = RrdDirectBackend::new();
> +        let schema = RrdSchema::storage(RrdFormat::Pve2);
> +        let start_time = 1704067200;
> +
> +        let result = backend.create(&rrd_path, &schema, start_time).await;
> +        assert!(
> +            result.is_ok(),
> +            "Failed to create storage RRD: {:?}",
> +            result.err()
> +        );
> +        assert!(rrd_path.exists());
> +    }
> +
> +    #[tokio::test]
> +    async fn test_direct_backend_update_with_timestamp() {
> +        let temp_dir = setup_temp_dir();
> +        let rrd_path = test_rrd_path(&temp_dir, "update_test");
> +
> +        let mut backend = RrdDirectBackend::new();
> +        let schema = RrdSchema::storage(RrdFormat::Pve2);
> +        let start_time = 1704067200;
> +
> +        // Create RRD file
> +        backend
> +            .create(&rrd_path, &schema, start_time)
> +            .await
> +            .expect("Failed to create RRD");
> +
> +        // Update with explicit timestamp and values
> +        // Format: "timestamp:value1:value2"
> +        let update_data = "1704067260:1000000:500000"; // total=1MB, used=500KB
> +        let result = backend.update(&rrd_path, update_data).await;
> +
> +        assert!(result.is_ok(), "Failed to update RRD: {:?}", result.err());
> +    }
> +
> +    #[tokio::test]
> +    async fn test_direct_backend_update_with_n_timestamp() {
> +        let temp_dir = setup_temp_dir();
> +        let rrd_path = test_rrd_path(&temp_dir, "update_n_test");
> +
> +        let mut backend = RrdDirectBackend::new();
> +        let schema = RrdSchema::storage(RrdFormat::Pve2);
> +        let start_time = 1704067200;
> +
> +        backend
> +            .create(&rrd_path, &schema, start_time)
> +            .await
> +            .expect("Failed to create RRD");
> +
> +        // Update with "N" (current time) timestamp
> +        let update_data = "N:2000000:750000";
> +        let result = backend.update(&rrd_path, update_data).await;
> +
> +        assert!(
> +            result.is_ok(),
> +            "Failed to update RRD with N timestamp: {:?}",
> +            result.err()
> +        );
> +    }
> +
> +    #[tokio::test]
> +    async fn test_direct_backend_update_with_unknown_values() {
> +        let temp_dir = setup_temp_dir();
> +        let rrd_path = test_rrd_path(&temp_dir, "update_u_test");
> +
> +        let mut backend = RrdDirectBackend::new();
> +        let schema = RrdSchema::storage(RrdFormat::Pve2);
> +        let start_time = 1704067200;
> +
> +        backend
> +            .create(&rrd_path, &schema, start_time)
> +            .await
> +            .expect("Failed to create RRD");
> +
> +        // Update with "U" (unknown) values
> +        let update_data = "N:U:1000000"; // total unknown, used known
> +        let result = backend.update(&rrd_path, update_data).await;
> +
> +        assert!(
> +            result.is_ok(),
> +            "Failed to update RRD with U values: {:?}",
> +            result.err()
> +        );
> +    }
> +
> +    #[tokio::test]
> +    async fn test_direct_backend_update_invalid_data() {
> +        let temp_dir = setup_temp_dir();
> +        let rrd_path = test_rrd_path(&temp_dir, "invalid_test");
> +
> +        let mut backend = RrdDirectBackend::new();
> +        let schema = RrdSchema::storage(RrdFormat::Pve2);
> +        let start_time = 1704067200;
> +
> +        backend
> +            .create(&rrd_path, &schema, start_time)
> +            .await
> +            .expect("Failed to create RRD");
> +
> +        // Test truly invalid data formats that MUST fail
> +        // Note: Invalid values like "abc" are converted to Unspecified (U), which is valid RRD behavior
> +        let invalid_cases = vec![
> +            "",            // Empty string
> +            ":",           // Only separator
> +            "timestamp",   // Missing values
> +            "N",           // No colon separator
> +            "abc:123:456", // Invalid timestamp (not N or integer)
> +        ];
> +
> +        for invalid_data in invalid_cases {
> +            let result = backend.update(&rrd_path, invalid_data).await;
> +            assert!(
> +                result.is_err(),
> +                "Update should fail for invalid data: '{}', but got Ok",
> +                invalid_data
> +            );
> +        }
> +
> +        // Test lenient data formats that succeed (invalid values become Unspecified)
> +        // Use explicit timestamps to avoid "same timestamp" errors
> +        let mut timestamp = start_time + 60;
> +        let lenient_cases = vec![
> +            "abc:456", // Invalid first value -> becomes U
> +            "123:def", // Invalid second value -> becomes U
> +            "U:U",     // All unknown
> +        ];
> +
> +        for valid_data in lenient_cases {
> +            let update_data = format!("{}:{}", timestamp, valid_data);
> +            let result = backend.update(&rrd_path, &update_data).await;
> +            assert!(
> +                result.is_ok(),
> +                "Update should succeed for lenient data: '{}', but got Err: {:?}",
> +                update_data,
> +                result.err()
> +            );
> +            timestamp += 60; // Increment timestamp for next update
> +        }
> +    }
> +
> +    #[tokio::test]
> +    async fn test_direct_backend_update_nonexistent_file() {
> +        let temp_dir = setup_temp_dir();
> +        let rrd_path = test_rrd_path(&temp_dir, "nonexistent");
> +
> +        let mut backend = RrdDirectBackend::new();
> +
> +        // Try to update a file that doesn't exist
> +        let result = backend.update(&rrd_path, "N:100:200").await;
> +
> +        assert!(result.is_err(), "Update should fail for nonexistent file");
> +    }
> +
> +    #[tokio::test]
> +    async fn test_direct_backend_flush() {
> +        let mut backend = RrdDirectBackend::new();
> +
> +        // Flush should always succeed for direct backend (no-op)
> +        let result = backend.flush().await;
> +        assert!(
> +            result.is_ok(),
> +            "Flush should always succeed for direct backend"
> +        );
> +    }
> +
> +    #[tokio::test]
> +    async fn test_direct_backend_is_available() {
> +        let backend = RrdDirectBackend::new();
> +
> +        // Direct backend should always be available
> +        assert!(
> +            backend.is_available().await,
> +            "Direct backend should always be available"
> +        );
> +    }
> +
> +    #[tokio::test]
> +    async fn test_direct_backend_multiple_updates() {
> +        let temp_dir = setup_temp_dir();
> +        let rrd_path = test_rrd_path(&temp_dir, "multi_update_test");
> +
> +        let mut backend = RrdDirectBackend::new();
> +        let schema = RrdSchema::storage(RrdFormat::Pve2);
> +        let start_time = 1704067200;
> +
> +        backend
> +            .create(&rrd_path, &schema, start_time)
> +            .await
> +            .expect("Failed to create RRD");
> +
> +        // Perform multiple updates
> +        for i in 0..10 {
> +            let timestamp = start_time + 60 * (i + 1); // 1 minute intervals
> +            let total = 1000000 + (i * 100000);
> +            let used = 500000 + (i * 50000);
> +            let update_data = format!("{}:{}:{}", timestamp, total, used);
> +
> +            let result = backend.update(&rrd_path, &update_data).await;
> +            assert!(result.is_ok(), "Update {} failed: {:?}", i, result.err());
> +        }
> +    }
> +
> +    #[tokio::test]
> +    async fn test_direct_backend_overwrite_file() {
> +        let temp_dir = setup_temp_dir();
> +        let rrd_path = test_rrd_path(&temp_dir, "overwrite_test");
> +
> +        let mut backend = RrdDirectBackend::new();
> +        let schema = RrdSchema::storage(RrdFormat::Pve2);
> +        let start_time = 1704067200;
> +
> +        // Create file first time
> +        backend
> +            .create(&rrd_path, &schema, start_time)
> +            .await
> +            .expect("First create failed");
> +
> +        // Create same file again - should succeed (overwrites)
> +        // Note: librrd create() with no_overwrite=false allows overwriting
> +        let result = backend.create(&rrd_path, &schema, start_time).await;
> +        assert!(
> +            result.is_ok(),
> +            "Creating file again should succeed (overwrite mode): {:?}",
> +            result.err()
> +        );
> +    }
> +
> +    #[tokio::test]
> +    async fn test_direct_backend_large_schema() {
> +        let temp_dir = setup_temp_dir();
> +        let rrd_path = test_rrd_path(&temp_dir, "large_schema_test");
> +
> +        let mut backend = RrdDirectBackend::new();
> +        let schema = RrdSchema::node(RrdFormat::Pve9_0); // 19 data sources
> +        let start_time = 1704067200;
> +
> +        // Create RRD with large schema
> +        let result = backend.create(&rrd_path, &schema, start_time).await;
> +        assert!(result.is_ok(), "Failed to create RRD with large schema");
> +
> +        // Update with all values
> +        let values = "100:200:50.5:10.2:8000000:4000000:2000000:500000:50000000:25000000:1000000:2000000:6000000:1000000:0.5:1.2:0.8:0.3:0.1";
> +        let update_data = format!("N:{}", values);
> +
> +        let result = backend.update(&rrd_path, &update_data).await;
> +        assert!(result.is_ok(), "Failed to update RRD with large schema");
> +    }
> +}
> diff --git a/src/pmxcfs-rs/pmxcfs-rrd/src/backend/backend_fallback.rs b/src/pmxcfs-rs/pmxcfs-rrd/src/backend/backend_fallback.rs
> new file mode 100644
> index 00000000..7d574e5b
> --- /dev/null
> +++ b/src/pmxcfs-rs/pmxcfs-rrd/src/backend/backend_fallback.rs
> @@ -0,0 +1,229 @@
> +/// RRD Backend: Fallback (Daemon + Direct)
> +///
> +/// Composite backend that tries daemon first, falls back to direct file writing.
> +/// This matches the C implementation's behavior in status.c:1405-1420 where
> +/// it attempts rrdc_update() first, then falls back to rrd_update_r().
> +use super::super::schema::RrdSchema;
> +use super::{RrdCachedBackend, RrdDirectBackend};
> +use anyhow::{Context, Result};
> +use async_trait::async_trait;
> +use std::path::Path;
> +
> +/// Composite backend that tries daemon first, falls back to direct
> +///
> +/// This provides the same behavior as the C implementation:
> +/// 1. Try to use rrdcached daemon for performance
> +/// 2. If daemon fails or is unavailable, fall back to direct file writes
> +pub struct RrdFallbackBackend {
> +    /// Optional daemon backend (None if daemon is unavailable/failed)
> +    daemon: Option<RrdCachedBackend>,
> +    /// Direct backend (always available)
> +    direct: RrdDirectBackend,
> +}
> +
> +impl RrdFallbackBackend {
> +    /// Create a new fallback backend
> +    ///
> +    /// Attempts to connect to rrdcached daemon. If successful, will prefer daemon.
> +    /// If daemon is unavailable, will use direct mode only.
> +    ///
> +    /// # Arguments
> +    /// * `daemon_socket` - Path to rrdcached Unix socket
> +    pub async fn new(daemon_socket: &str) -> Self {
> +        let daemon = match RrdCachedBackend::connect(daemon_socket).await {
> +            Ok(backend) => {
> +                tracing::info!("RRD fallback backend: daemon available, will prefer daemon mode");
> +                Some(backend)
> +            }
> +            Err(e) => {
> +                tracing::warn!(
> +                    "RRD fallback backend: daemon unavailable ({}), using direct mode only",
> +                    e
> +                );
> +                None
> +            }
> +        };
> +
> +        let direct = RrdDirectBackend::new();
> +
> +        Self { daemon, direct }
> +    }
> +
> +    /// Create a fallback backend with explicit daemon and direct backends
> +    ///
> +    /// Useful for testing or custom configurations
> +    #[allow(dead_code)] // Used in tests for custom backend configurations
> +    pub fn with_backends(daemon: Option<RrdCachedBackend>, direct: RrdDirectBackend) -> Self {
> +        Self { daemon, direct }
> +    }
> +
> +    /// Check if daemon is currently being used
> +    #[allow(dead_code)] // Used for debugging/monitoring daemon status
> +    pub fn is_using_daemon(&self) -> bool {
> +        self.daemon.is_some()
> +    }
> +
> +    /// Disable daemon mode and switch to direct mode only
> +    ///
> +    /// Called automatically when daemon operations fail
> +    fn disable_daemon(&mut self) {
> +        if self.daemon.is_some() {
> +            tracing::warn!("Disabling daemon mode, switching to direct file writes");
> +            self.daemon = None;
> +        }
> +    }
> +}
> +
> +#[async_trait]
> +impl super::super::backend::RrdBackend for RrdFallbackBackend {
> +    async fn update(&mut self, file_path: &Path, data: &str) -> Result<()> {
> +        // Try daemon first if available
> +        if let Some(daemon) = &mut self.daemon {
> +            match daemon.update(file_path, data).await {
> +                Ok(()) => {
> +                    tracing::trace!("Updated RRD via daemon (fallback backend)");
> +                    return Ok(());
> +                }
> +                Err(e) => {
> +                    tracing::warn!("Daemon update failed, falling back to direct: {}", e);
> +                    self.disable_daemon();

Currently, we permanently disable the daemon here after a single
failure. The C implementation seems to retry the daemon on every
update call.
I think it's fine to go with this for now, but it should then be
noted as a difference in the README, and it may be something to
change in the future.
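
For the future change, a middle ground between "retry on every call"
and "never retry" could be a simple cooldown. A minimal sketch, where
DaemonRetryGate and its methods are hypothetical names and the actual
reconnect (e.g. via RrdCachedBackend::connect()) is left out:

```rust
use std::time::{Duration, Instant};

/// Hypothetical retry gate: instead of dropping the daemon for good,
/// remember when it failed and allow another attempt after `cooldown`.
struct DaemonRetryGate {
    cooldown: Duration,
    failed_at: Option<Instant>,
}

impl DaemonRetryGate {
    fn new(cooldown: Duration) -> Self {
        Self { cooldown, failed_at: None }
    }

    /// Record a failed daemon call at `now`.
    fn record_failure(&mut self, now: Instant) {
        self.failed_at = Some(now);
    }

    /// Record a successful daemon call; clears any previous failure.
    fn record_success(&mut self) {
        self.failed_at = None;
    }

    /// True if the daemon should be tried (again) at `now`.
    fn should_try(&self, now: Instant) -> bool {
        match self.failed_at {
            None => true,
            Some(t) => now.saturating_duration_since(t) >= self.cooldown,
        }
    }
}
```

Passing `now` in explicitly keeps the gate testable without sleeping.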

> +                }
> +            }
> +        }
> +
> +        // Fallback to direct
> +        self.direct
> +            .update(file_path, data)
> +            .await
> +            .context("Both daemon and direct update failed")
> +    }
> +
> +    async fn create(
> +        &mut self,
> +        file_path: &Path,
> +        schema: &RrdSchema,
> +        start_timestamp: i64,
> +    ) -> Result<()> {
> +        // Try daemon first if available
> +        if let Some(daemon) = &mut self.daemon {
> +            match daemon.create(file_path, schema, start_timestamp).await {
> +                Ok(()) => {
> +                    tracing::trace!("Created RRD via daemon (fallback backend)");
> +                    return Ok(());
> +                }
> +                Err(e) => {
> +                    tracing::warn!("Daemon create failed, falling back to direct: {}", e);
> +                    self.disable_daemon();
> +                }
> +            }
> +        }
> +
> +        // Fallback to direct
> +        self.direct
> +            .create(file_path, schema, start_timestamp)
> +            .await
> +            .context("Both daemon and direct create failed")
> +    }
> +
> +    async fn flush(&mut self) -> Result<()> {
> +        // Only flush if using daemon
> +        if let Some(daemon) = &mut self.daemon {
> +            match daemon.flush().await {
> +                Ok(()) => return Ok(()),
> +                Err(e) => {
> +                    tracing::warn!("Daemon flush failed: {}", e);
> +                    self.disable_daemon();
> +                }
> +            }
> +        }
> +
> +        // Direct backend flush is a no-op
> +        self.direct.flush().await
> +    }
> +
> +    async fn is_available(&self) -> bool {
> +        // Always available - either daemon or direct will work
> +        true
> +    }
> +
> +    fn name(&self) -> &str {
> +        if self.daemon.is_some() {
> +            "fallback(daemon+direct)"
> +        } else {
> +            "fallback(direct-only)"
> +        }
> +    }
> +}
> +
> +#[cfg(test)]
> +mod tests {
> +    use super::*;
> +    use crate::backend::RrdBackend;
> +    use crate::schema::{RrdFormat, RrdSchema};
> +    use std::path::PathBuf;
> +    use tempfile::TempDir;
> +
> +    /// Create a temporary directory for RRD files
> +    fn setup_temp_dir() -> TempDir {
> +        TempDir::new().expect("Failed to create temp directory")
> +    }
> +
> +    /// Create a test RRD file path
> +    fn test_rrd_path(dir: &TempDir, name: &str) -> PathBuf {
> +        dir.path().join(format!("{}.rrd", name))
> +    }
> +
> +    #[test]
> +    fn test_fallback_backend_without_daemon() {
> +        let direct = RrdDirectBackend::new();
> +        let backend = RrdFallbackBackend::with_backends(None, direct);
> +
> +        assert!(!backend.is_using_daemon());
> +        assert_eq!(backend.name(), "fallback(direct-only)");
> +    }
> +
> +    #[tokio::test]
> +    async fn test_fallback_backend_direct_mode_operations() {
> +        let temp_dir = setup_temp_dir();
> +        let rrd_path = test_rrd_path(&temp_dir, "fallback_test");
> +
> +        // Create fallback backend without daemon (direct mode only)
> +        let direct = RrdDirectBackend::new();
> +        let mut backend = RrdFallbackBackend::with_backends(None, direct);
> +
> +        assert!(!backend.is_using_daemon(), "Should not be using daemon");
> +        assert_eq!(backend.name(), "fallback(direct-only)");
> +
> +        // Test create and update operations work in direct mode
> +        let schema = RrdSchema::storage(RrdFormat::Pve2);
> +        let start_time = 1704067200;
> +
> +        let result = backend.create(&rrd_path, &schema, start_time).await;
> +        assert!(result.is_ok(), "Create should work in direct mode");
> +
> +        let result = backend.update(&rrd_path, "N:1000:500").await;
> +        assert!(result.is_ok(), "Update should work in direct mode");
> +    }
> +
> +    #[tokio::test]
> +    async fn test_fallback_backend_is_always_available() {
> +        let direct = RrdDirectBackend::new();
> +        let backend = RrdFallbackBackend::with_backends(None, direct);
> +
> +        // Fallback backend should always be available (even without daemon)
> +        assert!(
> +            backend.is_available().await,
> +            "Fallback backend should always be available"
> +        );
> +    }
> +
> +    #[tokio::test]
> +    async fn test_fallback_backend_flush_without_daemon() {
> +        let direct = RrdDirectBackend::new();
> +        let mut backend = RrdFallbackBackend::with_backends(None, direct);
> +
> +        // Flush should succeed even without daemon (no-op for direct)
> +        let result = backend.flush().await;
> +        assert!(result.is_ok(), "Flush should succeed without daemon");
> +    }
> +}
> diff --git a/src/pmxcfs-rs/pmxcfs-rrd/src/daemon.rs b/src/pmxcfs-rs/pmxcfs-rrd/src/daemon.rs
> new file mode 100644
> index 00000000..e53b6dad
> --- /dev/null
> +++ b/src/pmxcfs-rs/pmxcfs-rrd/src/daemon.rs
> @@ -0,0 +1,140 @@
> +/// RRDCached Daemon Client (wrapper around rrdcached-client crate)
> +///
> +/// This module provides a thin wrapper around the rrdcached-client crate.
> +use anyhow::{Context, Result};
> +use std::path::Path;
> +
> +/// Wrapper around rrdcached-client
> +#[allow(dead_code)] // Used in backend_daemon.rs via module-level access
> +pub struct RrdCachedClient {
> +    pub(crate) client:
> +        tokio::sync::Mutex<rrdcached_client::RRDCachedClient<tokio::net::UnixStream>>,
> +}
> +
> +impl RrdCachedClient {
> +    /// Connect to rrdcached daemon via Unix socket
> +    ///
> +    /// # Arguments
> +    /// * `socket_path` - Path to rrdcached Unix socket (default: /var/run/rrdcached.sock)
> +    #[allow(dead_code)] // Used via backend modules
> +    pub async fn connect<P: AsRef<Path>>(socket_path: P) -> Result<Self> {
> +        let socket_path = socket_path.as_ref().to_string_lossy().to_string();
> +
> +        tracing::debug!("Connecting to rrdcached at {}", socket_path);
> +
> +        // Connect to daemon (async operation)
> +        let client = rrdcached_client::RRDCachedClient::connect_unix(&socket_path)
> +            .await
> +            .with_context(|| format!("Failed to connect to rrdcached: {socket_path}"))?;
> +
> +        tracing::info!("Connected to rrdcached at {}", socket_path);
> +
> +        Ok(Self {
> +            client: tokio::sync::Mutex::new(client),
> +        })
> +    }
> +
> +    /// Update RRD file via rrdcached
> +    ///
> +    /// # Arguments
> +    /// * `file_path` - Full path to RRD file
> +    /// * `data` - Update data in format "timestamp:value1:value2:..."
> +    #[allow(dead_code)] // Used via backend modules
> +    pub async fn update<P: AsRef<Path>>(&self, file_path: P, data: &str) -> Result<()> {

There is a lot of duplication between this function and
RrdCachedBackend::update(); I think this can be refactored a bit.

> +        let file_path = file_path.as_ref();
> +
> +        // Parse the update data
> +        let parts: Vec<&str> = data.split(':').collect();
> +        if parts.len() < 2 {
> +            anyhow::bail!("Invalid update data format: {data}");
> +        }
> +
> +        let timestamp = if parts[0] == "N" {
> +            None
> +        } else {
> +            Some(
> +                parts[0]
> +                    .parse::<usize>()
> +                    .with_context(|| format!("Invalid timestamp: {}", parts[0]))?,
> +            )
> +        };
> +
> +        let values: Vec<f64> = parts[1..]
> +            .iter()
> +            .map(|v| {
> +                if *v == "U" {
> +                    Ok(f64::NAN)
> +                } else {
> +                    v.parse::<f64>()
> +                        .with_context(|| format!("Invalid value: {v}"))

While we fail here on parsing of non-U values,
RrdCachedBackend::update() treats many invalid tokens as
Datum::Unspecified and succeeds.

This makes the behavior depend on which backend is active.
We should stick to one rule.
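
A single shared parser that every backend calls would give us one rule
and would also remove most of the duplication with
RrdCachedBackend::update(). A minimal sketch, assuming we pick the
strict rule (the lenient one would map bad tokens to None instead).
ParsedUpdate and parse_update() are hypothetical names, and u64 for
the timestamp is my choice for illustration:

```rust
/// Parsed form of "timestamp:value1:value2:...".
#[derive(Debug, PartialEq)]
struct ParsedUpdate {
    /// None means "N" (use the current time).
    timestamp: Option<u64>,
    /// None means "U" (unknown).
    values: Vec<Option<f64>>,
}

/// Hypothetical shared parser. Assumed rule: strict, i.e. any value that
/// is neither "U" nor a valid float is an error.
fn parse_update(data: &str) -> Result<ParsedUpdate, String> {
    let mut parts = data.split(':');
    // split() always yields at least one item, so the fallback is unused.
    let ts = parts.next().unwrap_or("");
    let timestamp = match ts {
        "N" => None,
        other => Some(
            other
                .parse::<u64>()
                .map_err(|e| format!("invalid timestamp {other:?}: {e}"))?,
        ),
    };
    let values = parts
        .map(|v| match v {
            "U" => Ok(None),
            other => other
                .parse::<f64>()
                .map(Some)
                .map_err(|e| format!("invalid value {other:?}: {e}")),
        })
        .collect::<Result<Vec<_>, String>>()?;
    if values.is_empty() {
        return Err(format!("no values in update data: {data:?}"));
    }
    Ok(ParsedUpdate { timestamp, values })
}
```

Each backend would then only convert ParsedUpdate into its own
argument types.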

> +                }
> +            })
> +            .collect::<Result<Vec<_>>>()?;
> +
> +        // Get file path without .rrd extension (rrdcached-client adds it)
> +        let path_str = file_path.to_string_lossy();
> +        let path_without_ext = path_str.strip_suffix(".rrd").unwrap_or(&path_str);
> +
> +        // Send update via rrdcached
> +        let mut client = self.client.lock().await;
> +        client
> +            .update(path_without_ext, timestamp, values)
> +            .await
> +            .context("Failed to send update to rrdcached")?;
> +
> +        tracing::trace!("Updated RRD via daemon: {:?} -> {}", file_path, data);
> +
> +        Ok(())
> +    }
> +
> +    /// Create RRD file via rrdcached
> +    #[allow(dead_code)] // Used via backend modules
> +    pub async fn create(&self, args: rrdcached_client::create::CreateArguments) -> Result<()> {
> +        let mut client = self.client.lock().await;
> +        client
> +            .create(args)
> +            .await
> +            .context("Failed to create RRD via rrdcached")?;
> +        Ok(())
> +    }
> +
> +    /// Flush all pending updates
> +    #[allow(dead_code)] // Used via backend modules
> +    pub async fn flush(&self) -> Result<()> {
> +        let mut client = self.client.lock().await;
> +        client
> +            .flush_all()
> +            .await
> +            .context("Failed to flush rrdcached")?;
> +
> +        tracing::debug!("Flushed all RRD files");
> +
> +        Ok(())
> +    }
> +}
> +
> +#[cfg(test)]
> +mod tests {
> +    use super::*;
> +
> +    #[tokio::test]
> +    #[ignore] // Only runs if rrdcached daemon is actually running
> +    async fn test_connect_to_daemon() {
> +        // This test requires a running rrdcached daemon
> +        let result = RrdCachedClient::connect("/var/run/rrdcached.sock").await;
> +
> +        match result {
> +            Ok(client) => {
> +                // Try to flush (basic connectivity test)
> +                let result = client.flush().await;
> +                println!("RRDCached flush result: {:?}", result);
> +
> +                // Connection successful (flush may fail if no files, that's OK)
> +                assert!(result.is_ok() || result.is_err());
> +            }
> +            Err(e) => {
> +                println!("Note: rrdcached not running (expected in test env): {}", e);
> +            }
> +        }
> +    }
> +}
> diff --git a/src/pmxcfs-rs/pmxcfs-rrd/src/key_type.rs b/src/pmxcfs-rs/pmxcfs-rrd/src/key_type.rs
> new file mode 100644
> index 00000000..54021c14
> --- /dev/null
> +++ b/src/pmxcfs-rs/pmxcfs-rrd/src/key_type.rs
> @@ -0,0 +1,313 @@
> +/// RRD Key Type Parsing and Path Resolution
> +///
> +/// This module handles parsing RRD status update keys and mapping them
> +/// to the appropriate file paths and schemas.
> +use anyhow::{Context, Result};
> +use std::path::{Path, PathBuf};
> +
> +use super::schema::{RrdFormat, RrdSchema};
> +
> +/// RRD key types for routing to correct schema and path
> +///
> +/// This enum represents the different types of RRD metrics that pmxcfs tracks:
> +/// - Node metrics (CPU, memory, network for a node)
> +/// - VM metrics (CPU, memory, disk, network for a VM/CT)
> +/// - Storage metrics (total/used space for a storage)
> +#[derive(Debug, Clone, PartialEq, Eq)]
> +pub(crate) enum RrdKeyType {
> +    /// Node metrics: pve2-node/{nodename} or pve-node-9.0/{nodename}
> +    Node { nodename: String, format: RrdFormat },
> +    /// VM metrics: pve2.3-vm/{vmid} or pve-vm-9.0/{vmid}
> +    Vm { vmid: String, format: RrdFormat },
> +    /// Storage metrics: pve2-storage/{node}/{storage} or pve-storage-9.0/{node}/{storage}
> +    Storage {
> +        nodename: String,
> +        storage: String,
> +        format: RrdFormat,
> +    },
> +}
> +
> +impl RrdKeyType {
> +    /// Parse RRD key from status update key
> +    ///
> +    /// Supported formats:
> +    /// - "pve2-node/node1" → Node { nodename: "node1", format: Pve2 }
> +    /// - "pve-node-9.0/node1" → Node { nodename: "node1", format: Pve9_0 }
> +    /// - "pve2.3-vm/100" → Vm { vmid: "100", format: Pve2 }
> +    /// - "pve-storage-9.0/node1/local" → Storage { nodename: "node1", storage: "local", format: Pve9_0 }
> +    pub(crate) fn parse(key: &str) -> Result<Self> {
> +        let parts: Vec<&str> = key.split('/').collect();
> +
> +        if parts.is_empty() {
> +            anyhow::bail!("Empty RRD key");
> +        }
> +
> +        match parts[0] {
> +            "pve2-node" => {
> +                let nodename = parts.get(1).context("Missing nodename")?.to_string();
> +                Ok(RrdKeyType::Node {
> +                    nodename,
> +                    format: RrdFormat::Pve2,
> +                })
> +            }
> +            prefix if prefix.starts_with("pve-node-") => {

pve-node-9.1/... would be treated as 9.0, so we lose the ability to
distinguish future formats.
Shouldn't we parse the suffix? Otherwise, please explicitly document
the assumption.
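
Parsing the suffix could be as small as the following sketch.
parse_format_version() is a hypothetical helper, and the
"major.minor" shape of the suffix is an assumption based on the 9.0
keys in this patch:

```rust
/// Hypothetical helper: split e.g. "pve-node-9.1" into (major, minor),
/// given the expected family prefix "pve-node-".
/// Returns None if the prefix does not match or the suffix is not of
/// the form "<major>.<minor>".
fn parse_format_version(key_prefix: &str, family: &str) -> Option<(u32, u32)> {
    let version = key_prefix.strip_prefix(family)?;
    let (major, minor) = version.split_once('.')?;
    Some((major.parse().ok()?, minor.parse().ok()?))
}
```

The match arm could then map (9, 0) to RrdFormat::Pve9_0 and decide
explicitly what to do with unknown versions, instead of silently
folding them into 9.0.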

> +                let nodename = parts.get(1).context("Missing nodename")?.to_string();
> +                Ok(RrdKeyType::Node {
> +                    nodename,
> +                    format: RrdFormat::Pve9_0,
> +                })
> +            }
> +            "pve2.3-vm" => {
> +                let vmid = parts.get(1).context("Missing vmid")?.to_string();
> +                Ok(RrdKeyType::Vm {
> +                    vmid,
> +                    format: RrdFormat::Pve2,
> +                })
> +            }
> +            prefix if prefix.starts_with("pve-vm-") => {
> +                let vmid = parts.get(1).context("Missing vmid")?.to_string();
> +                Ok(RrdKeyType::Vm {
> +                    vmid,
> +                    format: RrdFormat::Pve9_0,
> +                })
> +            }
> +            "pve2-storage" => {
> +                let nodename = parts.get(1).context("Missing nodename")?.to_string();
> +                let storage = parts.get(2).context("Missing storage")?.to_string();
> +                Ok(RrdKeyType::Storage {
> +                    nodename,
> +                    storage,
> +                    format: RrdFormat::Pve2,
> +                })
> +            }
> +            prefix if prefix.starts_with("pve-storage-") => {
> +                let nodename = parts.get(1).context("Missing nodename")?.to_string();
> +                let storage = parts.get(2).context("Missing storage")?.to_string();
> +                Ok(RrdKeyType::Storage {
> +                    nodename,
> +                    storage,
> +                    format: RrdFormat::Pve9_0,
> +                })
> +            }
> +            _ => anyhow::bail!("Unknown RRD key format: {key}"),
> +        }
> +    }
> +
> +    /// Get the RRD file path for this key type
> +    ///
> +    /// Always returns paths using the current format (9.0), regardless of the input format.
> +    /// This enables transparent format migration: old PVE8 nodes can send `pve2-node/` keys,
> +    /// and they'll be written to `pve-node-9.0/` files automatically.
> +    ///
> +    /// # Format Migration Strategy
> +    ///
> +    /// The C implementation always creates files in the current format directory
> +    /// (see status.c:1287). This Rust implementation follows the same approach:
> +    /// - Input: `pve2-node/node1` → Output: `/var/lib/rrdcached/db/pve-node-9.0/node1`
> +    /// - Input: `pve-node-9.0/node1` → Output: `/var/lib/rrdcached/db/pve-node-9.0/node1`
> +    ///
> +    /// This allows rolling upgrades where old and new nodes coexist in the same cluster.
> +    pub(crate) fn file_path(&self, base_dir: &Path) -> PathBuf {
> +        match self {
> +            RrdKeyType::Node { nodename, .. } => {
> +                // Always use current format path
> +                base_dir.join("pve-node-9.0").join(nodename)

If nodename or storage contains .. or /, the resulting path could
escape base_dir and the write could happen anywhere.

I think we need to validate/sanitize the input paths, if that is not
already done. Ideally already as part of RrdKeyType?
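
A minimal sketch of such a check, assuming it would be called from
parse() for nodename, vmid and storage. validate_component() is a
hypothetical name, and the String error only keeps the example
self-contained (the patch itself uses anyhow):

```rust
/// Hypothetical check: accept only a single, plain path component.
///
/// Rejects empty names, "." and "..", and anything containing a path
/// separator or NUL byte, so that `base_dir.join(name)` cannot leave
/// `base_dir`.
fn validate_component(name: &str) -> Result<&str, String> {
    if name.is_empty() || name == "." || name == ".." {
        return Err(format!("invalid RRD key component: {name:?}"));
    }
    if name.chars().any(|c| matches!(c, '/' | '\\' | '\0')) {
        return Err(format!("invalid character in RRD key component: {name:?}"));
    }
    Ok(name)
}
```

Doing this once in parse() would mean file_path() can rely on every
RrdKeyType value being safe to join.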

> +            }
> +            RrdKeyType::Vm { vmid, .. } => {
> +                // Always use current format path
> +                base_dir.join("pve-vm-9.0").join(vmid)
> +            }
> +            RrdKeyType::Storage {
> +                nodename, storage, ..
> +            } => {
> +                // Always use current format path
> +                base_dir
> +                    .join("pve-storage-9.0")
> +                    .join(nodename)
> +                    .join(storage)
> +            }
> +        }
> +    }
> +
> +    /// Get the source format from the input key
> +    ///
> +    /// This is used for data transformation (padding/truncation).
> +    pub(crate) fn source_format(&self) -> RrdFormat {
> +        match self {
> +            RrdKeyType::Node { format, .. }
> +            | RrdKeyType::Vm { format, .. }
> +            | RrdKeyType::Storage { format, .. } => *format,
> +        }
> +    }
> +
> +    /// Get the target RRD schema (always current format)
> +    ///
> +    /// Files are always created using the current format (Pve9_0),
> +    /// regardless of the source format in the key.
> +    pub(crate) fn schema(&self) -> RrdSchema {
> +        match self {
> +            RrdKeyType::Node { .. } => RrdSchema::node(RrdFormat::Pve9_0),
> +            RrdKeyType::Vm { .. } => RrdSchema::vm(RrdFormat::Pve9_0),
> +            RrdKeyType::Storage { .. } => RrdSchema::storage(RrdFormat::Pve9_0),
> +        }
> +    }
> +}
> +
> +#[cfg(test)]
> +mod tests {
> +    use super::*;
> +
> +    #[test]
> +    fn test_parse_node_keys() {
> +        let key = RrdKeyType::parse("pve2-node/testnode").unwrap();
> +        assert_eq!(
> +            key,
> +            RrdKeyType::Node {
> +                nodename: "testnode".to_string(),
> +                format: RrdFormat::Pve2
> +            }
> +        );
> +
> +        let key = RrdKeyType::parse("pve-node-9.0/testnode").unwrap();
> +        assert_eq!(
> +            key,
> +            RrdKeyType::Node {
> +                nodename: "testnode".to_string(),
> +                format: RrdFormat::Pve9_0
> +            }
> +        );
> +    }
> +
> +    #[test]
> +    fn test_parse_vm_keys() {
> +        let key = RrdKeyType::parse("pve2.3-vm/100").unwrap();
> +        assert_eq!(
> +            key,
> +            RrdKeyType::Vm {
> +                vmid: "100".to_string(),
> +                format: RrdFormat::Pve2
> +            }
> +        );
> +
> +        let key = RrdKeyType::parse("pve-vm-9.0/100").unwrap();
> +        assert_eq!(
> +            key,
> +            RrdKeyType::Vm {
> +                vmid: "100".to_string(),
> +                format: RrdFormat::Pve9_0
> +            }
> +        );
> +    }
> +
> +    #[test]
> +    fn test_parse_storage_keys() {
> +        let key = RrdKeyType::parse("pve2-storage/node1/local").unwrap();
> +        assert_eq!(
> +            key,
> +            RrdKeyType::Storage {
> +                nodename: "node1".to_string(),
> +                storage: "local".to_string(),
> +                format: RrdFormat::Pve2
> +            }
> +        );
> +
> +        let key = RrdKeyType::parse("pve-storage-9.0/node1/local").unwrap();
> +        assert_eq!(
> +            key,
> +            RrdKeyType::Storage {
> +                nodename: "node1".to_string(),
> +                storage: "local".to_string(),
> +                format: RrdFormat::Pve9_0
> +            }
> +        );
> +    }
> +
> +    #[test]
> +    fn test_file_paths() {
> +        let base = Path::new("/var/lib/rrdcached/db");
> +
> +        // New format key → new format path
> +        let key = RrdKeyType::Node {
> +            nodename: "node1".to_string(),
> +            format: RrdFormat::Pve9_0,
> +        };
> +        assert_eq!(
> +            key.file_path(base),
> +            PathBuf::from("/var/lib/rrdcached/db/pve-node-9.0/node1")
> +        );
> +
> +        // Old format key → new format path (auto-upgrade!)
> +        let key = RrdKeyType::Node {
> +            nodename: "node1".to_string(),
> +            format: RrdFormat::Pve2,
> +        };
> +        assert_eq!(
> +            key.file_path(base),
> +            PathBuf::from("/var/lib/rrdcached/db/pve-node-9.0/node1"),
> +            "Old format keys should create new format files"
> +        );
> +
> +        // VM: Old format → new format
> +        let key = RrdKeyType::Vm {
> +            vmid: "100".to_string(),
> +            format: RrdFormat::Pve2,
> +        };
> +        assert_eq!(
> +            key.file_path(base),
> +            PathBuf::from("/var/lib/rrdcached/db/pve-vm-9.0/100"),
> +            "Old VM format should upgrade to new format"
> +        );
> +
> +        // Storage: Always uses current format
> +        let key = RrdKeyType::Storage {
> +            nodename: "node1".to_string(),
> +            storage: "local".to_string(),
> +            format: RrdFormat::Pve2,
> +        };
> +        assert_eq!(
> +            key.file_path(base),
> +            PathBuf::from("/var/lib/rrdcached/db/pve-storage-9.0/node1/local"),
> +            "Old storage format should upgrade to new format"
> +        );
> +    }
> +
> +    #[test]
> +    fn test_source_format() {
> +        let key = RrdKeyType::Node {
> +            nodename: "node1".to_string(),
> +            format: RrdFormat::Pve2,
> +        };
> +        assert_eq!(key.source_format(), RrdFormat::Pve2);
> +
> +        let key = RrdKeyType::Vm {
> +            vmid: "100".to_string(),
> +            format: RrdFormat::Pve9_0,
> +        };
> +        assert_eq!(key.source_format(), RrdFormat::Pve9_0);
> +    }
> +
> +    #[test]
> +    fn test_schema_always_current_format() {
> +        // Even with Pve2 source format, schema should return Pve9_0
> +        let key = RrdKeyType::Node {
> +            nodename: "node1".to_string(),
> +            format: RrdFormat::Pve2,
> +        };
> +        let schema = key.schema();
> +        assert_eq!(
> +            schema.format,
> +            RrdFormat::Pve9_0,
> +            "Schema should always use current format"
> +        );
> +        assert_eq!(schema.column_count(), 19, "Should have Pve9_0 column count");
> +
> +        // Pve9_0 source also gets Pve9_0 schema
> +        let key = RrdKeyType::Node {
> +            nodename: "node1".to_string(),
> +            format: RrdFormat::Pve9_0,
> +        };
> +        let schema = key.schema();
> +        assert_eq!(schema.format, RrdFormat::Pve9_0);
> +        assert_eq!(schema.column_count(), 19);
> +    }
> +}
> diff --git a/src/pmxcfs-rs/pmxcfs-rrd/src/lib.rs b/src/pmxcfs-rs/pmxcfs-rrd/src/lib.rs
> new file mode 100644
> index 00000000..7a439676
> --- /dev/null
> +++ b/src/pmxcfs-rs/pmxcfs-rrd/src/lib.rs
> @@ -0,0 +1,21 @@
> +/// RRD (Round-Robin Database) Persistence Module
> +///
> +/// This module provides RRD file persistence compatible with the C pmxcfs implementation.
> +/// It handles:
> +/// - RRD file creation with proper schemas (node, VM, storage)
> +/// - RRD file updates (writing metrics to disk)
> +/// - Multiple backend strategies:
> +///   - Daemon mode: High-performance batched updates via rrdcached
> +///   - Direct mode: Reliable fallback using direct file writes
> +///   - Fallback mode: Tries daemon first, falls back to direct (matches C behavior)
> +/// - Version management (pve2 vs pve-9.0 formats)
> +///
> +/// The implementation matches the C behavior in status.c where it attempts
> +/// daemon updates first, then falls back to direct file operations.
> +mod backend;
> +mod daemon;
> +mod key_type;
> +pub(crate) mod schema;
> +mod writer;
> +
> +pub use writer::RrdWriter;
> diff --git a/src/pmxcfs-rs/pmxcfs-rrd/src/schema.rs b/src/pmxcfs-rs/pmxcfs-rrd/src/schema.rs
> new file mode 100644
> index 00000000..d449bd6e
> --- /dev/null
> +++ b/src/pmxcfs-rs/pmxcfs-rrd/src/schema.rs
> @@ -0,0 +1,577 @@
> +/// RRD Schema Definitions
> +///
> +/// Defines RRD database schemas matching the C pmxcfs implementation.
> +/// Each schema specifies data sources (DS) and round-robin archives (RRA).
> +use std::fmt;
> +
> +/// RRD format version
> +#[derive(Debug, Clone, Copy, PartialEq, Eq)]
> +pub enum RrdFormat {
> +    /// Legacy pve2 format (12 columns for node, 10 for VM, 2 for storage)
> +    Pve2,
> +    /// New pve-9.0 format (19 columns for node, 17 for VM, 2 for storage)
> +    Pve9_0,
> +}
> +
> +/// RRD data source definition
> +#[derive(Debug, Clone)]
> +pub struct RrdDataSource {
> +    /// Data source name
> +    pub name: &'static str,
> +    /// Data source type (GAUGE, COUNTER, DERIVE, ABSOLUTE)
> +    pub ds_type: &'static str,
> +    /// Heartbeat (seconds before marking as unknown)
> +    pub heartbeat: u32,
> +    /// Minimum value (U for unknown)
> +    pub min: &'static str,
> +    /// Maximum value (U for unknown)
> +    pub max: &'static str,
> +}
> +
> +impl RrdDataSource {
> +    /// Create GAUGE data source with no min/max limits
> +    pub(super) const fn gauge(name: &'static str) -> Self {
> +        Self {
> +            name,
> +            ds_type: "GAUGE",
> +            heartbeat: 120,
> +            min: "0",
> +            max: "U",
> +        }
> +    }
> +
> +    /// Create DERIVE data source (for counters that can wrap)
> +    pub(super) const fn derive(name: &'static str) -> Self {
> +        Self {
> +            name,
> +            ds_type: "DERIVE",
> +            heartbeat: 120,
> +            min: "0",
> +            max: "U",
> +        }
> +    }
> +
> +    /// Format as RRD command line argument
> +    ///
> +    /// Matches C implementation format: "DS:name:TYPE:heartbeat:min:max"
> +    /// (see rrd_def_node in src/pmxcfs/status.c:1100)
> +    ///
> +    /// Currently unused but kept for debugging/testing and C format compatibility.
> +    #[allow(dead_code)]
> +    pub(super) fn to_arg(&self) -> String {
> +        format!(
> +            "DS:{}:{}:{}:{}:{}",
> +            self.name, self.ds_type, self.heartbeat, self.min, self.max
> +        )
> +    }
> +}
> +
> +/// RRD schema with data sources and archives
> +#[derive(Debug, Clone)]
> +pub struct RrdSchema {
> +    /// RRD format version
> +    pub format: RrdFormat,
> +    /// Data sources
> +    pub data_sources: Vec<RrdDataSource>,
> +    /// Round-robin archives (RRA definitions)
> +    pub archives: Vec<String>,
> +}
> +
> +impl RrdSchema {
> +    /// Create node RRD schema
> +    pub fn node(format: RrdFormat) -> Self {
> +        let data_sources = match format {
> +            RrdFormat::Pve2 => vec![
> +                RrdDataSource::gauge("loadavg"),
> +                RrdDataSource::gauge("maxcpu"),
> +                RrdDataSource::gauge("cpu"),
> +                RrdDataSource::gauge("iowait"),
> +                RrdDataSource::gauge("memtotal"),
> +                RrdDataSource::gauge("memused"),
> +                RrdDataSource::gauge("swaptotal"),
> +                RrdDataSource::gauge("swapused"),
> +                RrdDataSource::gauge("roottotal"),
> +                RrdDataSource::gauge("rootused"),
> +                RrdDataSource::derive("netin"),
> +                RrdDataSource::derive("netout"),
> +            ],
> +            RrdFormat::Pve9_0 => vec![
> +                RrdDataSource::gauge("loadavg"),
> +                RrdDataSource::gauge("maxcpu"),
> +                RrdDataSource::gauge("cpu"),
> +                RrdDataSource::gauge("iowait"),
> +                RrdDataSource::gauge("memtotal"),
> +                RrdDataSource::gauge("memused"),
> +                RrdDataSource::gauge("swaptotal"),
> +                RrdDataSource::gauge("swapused"),
> +                RrdDataSource::gauge("roottotal"),
> +                RrdDataSource::gauge("rootused"),
> +                RrdDataSource::derive("netin"),
> +                RrdDataSource::derive("netout"),
> +                RrdDataSource::gauge("memavailable"),
> +                RrdDataSource::gauge("arcsize"),
> +                RrdDataSource::gauge("pressurecpusome"),
> +                RrdDataSource::gauge("pressureiosome"),
> +                RrdDataSource::gauge("pressureiofull"),
> +                RrdDataSource::gauge("pressurememorysome"),
> +                RrdDataSource::gauge("pressurememoryfull"),
> +            ],
> +        };
> +
> +        Self {
> +            format,
> +            data_sources,
> +            archives: Self::default_archives(),
> +        }
> +    }
> +
> +    /// Create VM RRD schema
> +    pub fn vm(format: RrdFormat) -> Self {
> +        let data_sources = match format {
> +            RrdFormat::Pve2 => vec![
> +                RrdDataSource::gauge("maxcpu"),
> +                RrdDataSource::gauge("cpu"),
> +                RrdDataSource::gauge("maxmem"),
> +                RrdDataSource::gauge("mem"),
> +                RrdDataSource::gauge("maxdisk"),
> +                RrdDataSource::gauge("disk"),
> +                RrdDataSource::derive("netin"),
> +                RrdDataSource::derive("netout"),
> +                RrdDataSource::derive("diskread"),
> +                RrdDataSource::derive("diskwrite"),
> +            ],
> +            RrdFormat::Pve9_0 => vec![
> +                RrdDataSource::gauge("maxcpu"),
> +                RrdDataSource::gauge("cpu"),
> +                RrdDataSource::gauge("maxmem"),
> +                RrdDataSource::gauge("mem"),
> +                RrdDataSource::gauge("maxdisk"),
> +                RrdDataSource::gauge("disk"),
> +                RrdDataSource::derive("netin"),
> +                RrdDataSource::derive("netout"),
> +                RrdDataSource::derive("diskread"),
> +                RrdDataSource::derive("diskwrite"),
> +                RrdDataSource::gauge("memhost"),
> +                RrdDataSource::gauge("pressurecpusome"),
> +                RrdDataSource::gauge("pressurecpufull"),
> +                RrdDataSource::gauge("pressureiosome"),
> +                RrdDataSource::gauge("pressureiofull"),
> +                RrdDataSource::gauge("pressurememorysome"),
> +                RrdDataSource::gauge("pressurememoryfull"),
> +            ],
> +        };
> +
> +        Self {
> +            format,
> +            data_sources,
> +            archives: Self::default_archives(),
> +        }
> +    }
> +
> +    /// Create storage RRD schema
> +    pub fn storage(format: RrdFormat) -> Self {
> +        let data_sources = vec![RrdDataSource::gauge("total"), RrdDataSource::gauge("used")];
> +
> +        Self {
> +            format,
> +            data_sources,
> +            archives: Self::default_archives(),
> +        }
> +    }
> +
> +    /// Default RRA (Round-Robin Archive) definitions
> +    ///
> +    /// These match the C implementation's archives for 60-second step size:
> +    /// - RRA:AVERAGE:0.5:1:1440      -> 1 min * 1440 => 1 day
> +    /// - RRA:AVERAGE:0.5:30:1440     -> 30 min * 1440 => 30 days
> +    /// - RRA:AVERAGE:0.5:360:1440    -> 6 hours * 1440 => 360 days (~1 year)
> +    /// - RRA:AVERAGE:0.5:10080:570   -> 1 week * 570 => ~10 years
> +    /// - RRA:MAX:0.5:1:1440          -> 1 min * 1440 => 1 day
> +    /// - RRA:MAX:0.5:30:1440         -> 30 min * 1440 => 30 days
> +    /// - RRA:MAX:0.5:360:1440        -> 6 hours * 1440 => 360 days (~1 year)
> +    /// - RRA:MAX:0.5:10080:570       -> 1 week * 570 => ~10 years
> +    pub(super) fn default_archives() -> Vec<String> {
> +        vec![
> +            "RRA:AVERAGE:0.5:1:1440".to_string(),
> +            "RRA:AVERAGE:0.5:30:1440".to_string(),
> +            "RRA:AVERAGE:0.5:360:1440".to_string(),
> +            "RRA:AVERAGE:0.5:10080:570".to_string(),
> +            "RRA:MAX:0.5:1:1440".to_string(),
> +            "RRA:MAX:0.5:30:1440".to_string(),
> +            "RRA:MAX:0.5:360:1440".to_string(),
> +            "RRA:MAX:0.5:10080:570".to_string(),
> +        ]
> +    }
> +
> +    /// Get number of data sources
> +    pub fn column_count(&self) -> usize {
> +        self.data_sources.len()
> +    }
> +}
> +
> +impl fmt::Display for RrdSchema {
> +    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
> +        write!(
> +            f,
> +            "{:?} schema with {} data sources",
> +            self.format,
> +            self.column_count()
> +        )
> +    }
> +}
> +
> +#[cfg(test)]
> +mod tests {
> +    use super::*;
> +
> +    fn assert_ds_properties(
> +        ds: &RrdDataSource,
> +        expected_name: &str,
> +        expected_type: &str,
> +        index: usize,
> +    ) {
> +        assert_eq!(ds.name, expected_name, "DS[{}] name mismatch", index);
> +        assert_eq!(ds.ds_type, expected_type, "DS[{}] type mismatch", index);
> +        assert_eq!(ds.heartbeat, 120, "DS[{}] heartbeat should be 120", index);
> +        assert_eq!(ds.min, "0", "DS[{}] min should be 0", index);
> +        assert_eq!(ds.max, "U", "DS[{}] max should be U", index);
> +    }
> +
> +    #[test]
> +    fn test_datasource_construction() {
> +        let gauge_ds = RrdDataSource::gauge("cpu");
> +        assert_eq!(gauge_ds.name, "cpu");
> +        assert_eq!(gauge_ds.ds_type, "GAUGE");
> +        assert_eq!(gauge_ds.heartbeat, 120);
> +        assert_eq!(gauge_ds.min, "0");
> +        assert_eq!(gauge_ds.max, "U");
> +        assert_eq!(gauge_ds.to_arg(), "DS:cpu:GAUGE:120:0:U");
> +
> +        let derive_ds = RrdDataSource::derive("netin");
> +        assert_eq!(derive_ds.name, "netin");
> +        assert_eq!(derive_ds.ds_type, "DERIVE");
> +        assert_eq!(derive_ds.heartbeat, 120);
> +        assert_eq!(derive_ds.min, "0");
> +        assert_eq!(derive_ds.max, "U");
> +        assert_eq!(derive_ds.to_arg(), "DS:netin:DERIVE:120:0:U");
> +    }
> +
> +    #[test]
> +    fn test_node_schema_pve2() {
> +        let schema = RrdSchema::node(RrdFormat::Pve2);
> +
> +        assert_eq!(schema.column_count(), 12);
> +        assert_eq!(schema.format, RrdFormat::Pve2);
> +
> +        let expected_ds = vec![
> +            ("loadavg", "GAUGE"),
> +            ("maxcpu", "GAUGE"),
> +            ("cpu", "GAUGE"),
> +            ("iowait", "GAUGE"),
> +            ("memtotal", "GAUGE"),
> +            ("memused", "GAUGE"),
> +            ("swaptotal", "GAUGE"),
> +            ("swapused", "GAUGE"),
> +            ("roottotal", "GAUGE"),
> +            ("rootused", "GAUGE"),
> +            ("netin", "DERIVE"),
> +            ("netout", "DERIVE"),
> +        ];
> +
> +        for (i, (name, ds_type)) in expected_ds.iter().enumerate() {
> +            assert_ds_properties(&schema.data_sources[i], name, ds_type, i);
> +        }
> +    }
> +
> +    #[test]
> +    fn test_node_schema_pve9() {
> +        let schema = RrdSchema::node(RrdFormat::Pve9_0);
> +
> +        assert_eq!(schema.column_count(), 19);
> +        assert_eq!(schema.format, RrdFormat::Pve9_0);
> +
> +        let pve2_schema = RrdSchema::node(RrdFormat::Pve2);
> +        for i in 0..12 {
> +            assert_eq!(
> +                schema.data_sources[i].name, pve2_schema.data_sources[i].name,
> +                "First 12 DS should match pve2"
> +            );
> +            assert_eq!(
> +                schema.data_sources[i].ds_type, pve2_schema.data_sources[i].ds_type,
> +                "First 12 DS types should match pve2"
> +            );
> +        }
> +
> +        let pve9_additions = vec![
> +            ("memavailable", "GAUGE"),
> +            ("arcsize", "GAUGE"),
> +            ("pressurecpusome", "GAUGE"),
> +            ("pressureiosome", "GAUGE"),
> +            ("pressureiofull", "GAUGE"),
> +            ("pressurememorysome", "GAUGE"),
> +            ("pressurememoryfull", "GAUGE"),
> +        ];
> +
> +        for (i, (name, ds_type)) in pve9_additions.iter().enumerate() {
> +            assert_ds_properties(&schema.data_sources[12 + i], name, ds_type, 12 + i);
> +        }
> +    }
> +
> +    #[test]
> +    fn test_vm_schema_pve2() {
> +        let schema = RrdSchema::vm(RrdFormat::Pve2);
> +
> +        assert_eq!(schema.column_count(), 10);
> +        assert_eq!(schema.format, RrdFormat::Pve2);
> +
> +        let expected_ds = vec![
> +            ("maxcpu", "GAUGE"),
> +            ("cpu", "GAUGE"),
> +            ("maxmem", "GAUGE"),
> +            ("mem", "GAUGE"),
> +            ("maxdisk", "GAUGE"),
> +            ("disk", "GAUGE"),
> +            ("netin", "DERIVE"),
> +            ("netout", "DERIVE"),
> +            ("diskread", "DERIVE"),
> +            ("diskwrite", "DERIVE"),
> +        ];
> +
> +        for (i, (name, ds_type)) in expected_ds.iter().enumerate() {
> +            assert_ds_properties(&schema.data_sources[i], name, ds_type, i);
> +        }
> +    }
> +
> +    #[test]
> +    fn test_vm_schema_pve9() {
> +        let schema = RrdSchema::vm(RrdFormat::Pve9_0);
> +
> +        assert_eq!(schema.column_count(), 17);
> +        assert_eq!(schema.format, RrdFormat::Pve9_0);
> +
> +        let pve2_schema = RrdSchema::vm(RrdFormat::Pve2);
> +        for i in 0..10 {
> +            assert_eq!(
> +                schema.data_sources[i].name, pve2_schema.data_sources[i].name,
> +                "First 10 DS should match pve2"
> +            );
> +            assert_eq!(
> +                schema.data_sources[i].ds_type, pve2_schema.data_sources[i].ds_type,
> +                "First 10 DS types should match pve2"
> +            );
> +        }
> +
> +        let pve9_additions = vec![
> +            ("memhost", "GAUGE"),
> +            ("pressurecpusome", "GAUGE"),
> +            ("pressurecpufull", "GAUGE"),
> +            ("pressureiosome", "GAUGE"),
> +            ("pressureiofull", "GAUGE"),
> +            ("pressurememorysome", "GAUGE"),
> +            ("pressurememoryfull", "GAUGE"),
> +        ];
> +
> +        for (i, (name, ds_type)) in pve9_additions.iter().enumerate() {
> +            assert_ds_properties(&schema.data_sources[10 + i], name, ds_type, 10 + i);
> +        }
> +    }
> +
> +    #[test]
> +    fn test_storage_schema() {
> +        for format in [RrdFormat::Pve2, RrdFormat::Pve9_0] {
> +            let schema = RrdSchema::storage(format);
> +
> +            assert_eq!(schema.column_count(), 2);
> +            assert_eq!(schema.format, format);
> +
> +            assert_ds_properties(&schema.data_sources[0], "total", "GAUGE", 0);
> +            assert_ds_properties(&schema.data_sources[1], "used", "GAUGE", 1);
> +        }
> +    }
> +
> +    #[test]
> +    fn test_rra_archives() {
> +        let expected_rras = [
> +            "RRA:AVERAGE:0.5:1:1440",
> +            "RRA:AVERAGE:0.5:30:1440",
> +            "RRA:AVERAGE:0.5:360:1440",
> +            "RRA:AVERAGE:0.5:10080:570",
> +            "RRA:MAX:0.5:1:1440",
> +            "RRA:MAX:0.5:30:1440",
> +            "RRA:MAX:0.5:360:1440",
> +            "RRA:MAX:0.5:10080:570",
> +        ];
> +
> +        let schemas = vec![
> +            RrdSchema::node(RrdFormat::Pve2),
> +            RrdSchema::node(RrdFormat::Pve9_0),
> +            RrdSchema::vm(RrdFormat::Pve2),
> +            RrdSchema::vm(RrdFormat::Pve9_0),
> +            RrdSchema::storage(RrdFormat::Pve2),
> +            RrdSchema::storage(RrdFormat::Pve9_0),
> +        ];
> +
> +        for schema in schemas {
> +            assert_eq!(schema.archives.len(), 8);
> +
> +            for (i, expected) in expected_rras.iter().enumerate() {
> +                assert_eq!(
> +                    &schema.archives[i], expected,
> +                    "RRA[{}] mismatch in {:?}",
> +                    i, schema.format
> +                );
> +            }
> +        }
> +    }
> +
> +    #[test]
> +    fn test_heartbeat_consistency() {
> +        let schemas = vec![
> +            RrdSchema::node(RrdFormat::Pve2),
> +            RrdSchema::node(RrdFormat::Pve9_0),
> +            RrdSchema::vm(RrdFormat::Pve2),
> +            RrdSchema::vm(RrdFormat::Pve9_0),
> +            RrdSchema::storage(RrdFormat::Pve2),
> +            RrdSchema::storage(RrdFormat::Pve9_0),
> +        ];
> +
> +        for schema in schemas {
> +            for ds in &schema.data_sources {
> +                assert_eq!(ds.heartbeat, 120);
> +                assert_eq!(ds.min, "0");
> +                assert_eq!(ds.max, "U");
> +            }
> +        }
> +    }
> +
> +    #[test]
> +    fn test_gauge_vs_derive_correctness() {
> +        // GAUGE: instantaneous values (CPU%, memory bytes)
> +        // DERIVE: cumulative counters that can wrap (network/disk bytes)
> +
> +        let node = RrdSchema::node(RrdFormat::Pve2);
> +        let node_derive_indices = [10, 11]; // netin, netout
> +        for (i, ds) in node.data_sources.iter().enumerate() {
> +            if node_derive_indices.contains(&i) {
> +                assert_eq!(
> +                    ds.ds_type, "DERIVE",
> +                    "Node DS[{}] ({}) should be DERIVE",
> +                    i, ds.name
> +                );
> +            } else {
> +                assert_eq!(
> +                    ds.ds_type, "GAUGE",
> +                    "Node DS[{}] ({}) should be GAUGE",
> +                    i, ds.name
> +                );
> +            }
> +        }
> +
> +        let vm = RrdSchema::vm(RrdFormat::Pve2);
> +        let vm_derive_indices = [6, 7, 8, 9]; // netin, netout, diskread, diskwrite
> +        for (i, ds) in vm.data_sources.iter().enumerate() {
> +            if vm_derive_indices.contains(&i) {
> +                assert_eq!(
> +                    ds.ds_type, "DERIVE",
> +                    "VM DS[{}] ({}) should be DERIVE",
> +                    i, ds.name
> +                );
> +            } else {
> +                assert_eq!(
> +                    ds.ds_type, "GAUGE",
> +                    "VM DS[{}] ({}) should be GAUGE",
> +                    i, ds.name
> +                );
> +            }
> +        }
> +
> +        let storage = RrdSchema::storage(RrdFormat::Pve2);
> +        for ds in &storage.data_sources {
> +            assert_eq!(
> +                ds.ds_type, "GAUGE",
> +                "Storage DS ({}) should be GAUGE",
> +                ds.name
> +            );
> +        }
> +    }
> +
> +    #[test]
> +    fn test_pve9_backward_compatibility() {
> +        let node_pve2 = RrdSchema::node(RrdFormat::Pve2);
> +        let node_pve9 = RrdSchema::node(RrdFormat::Pve9_0);
> +
> +        assert!(node_pve9.column_count() > node_pve2.column_count());
> +
> +        for i in 0..node_pve2.column_count() {
> +            assert_eq!(
> +                node_pve2.data_sources[i].name, node_pve9.data_sources[i].name,
> +                "Node DS[{}] name must match between pve2 and pve9.0",
> +                i
> +            );
> +            assert_eq!(
> +                node_pve2.data_sources[i].ds_type, node_pve9.data_sources[i].ds_type,
> +                "Node DS[{}] type must match between pve2 and pve9.0",
> +                i
> +            );
> +        }
> +
> +        let vm_pve2 = RrdSchema::vm(RrdFormat::Pve2);
> +        let vm_pve9 = RrdSchema::vm(RrdFormat::Pve9_0);
> +
> +        assert!(vm_pve9.column_count() > vm_pve2.column_count());
> +
> +        for i in 0..vm_pve2.column_count() {
> +            assert_eq!(
> +                vm_pve2.data_sources[i].name, vm_pve9.data_sources[i].name,
> +                "VM DS[{}] name must match between pve2 and pve9.0",
> +                i
> +            );
> +            assert_eq!(
> +                vm_pve2.data_sources[i].ds_type, vm_pve9.data_sources[i].ds_type,
> +                "VM DS[{}] type must match between pve2 and pve9.0",
> +                i
> +            );
> +        }
> +
> +        let storage_pve2 = RrdSchema::storage(RrdFormat::Pve2);
> +        let storage_pve9 = RrdSchema::storage(RrdFormat::Pve9_0);
> +        assert_eq!(storage_pve2.column_count(), storage_pve9.column_count());
> +    }
> +
> +    #[test]
> +    fn test_schema_display() {
> +        let test_cases = vec![
> +            (RrdSchema::node(RrdFormat::Pve2), "Pve2", "12 data sources"),
> +            (
> +                RrdSchema::node(RrdFormat::Pve9_0),
> +                "Pve9_0",
> +                "19 data sources",
> +            ),
> +            (RrdSchema::vm(RrdFormat::Pve2), "Pve2", "10 data sources"),
> +            (
> +                RrdSchema::vm(RrdFormat::Pve9_0),
> +                "Pve9_0",
> +                "17 data sources",
> +            ),
> +            (
> +                RrdSchema::storage(RrdFormat::Pve2),
> +                "Pve2",
> +                "2 data sources",
> +            ),
> +        ];
> +
> +        for (schema, expected_format, expected_count) in test_cases {
> +            let display = format!("{}", schema);
> +            assert!(
> +                display.contains(expected_format),
> +                "Display should contain format: {}",
> +                display
> +            );
> +            assert!(
> +                display.contains(expected_count),
> +                "Display should contain count: {}",
> +                display
> +            );
> +        }
> +    }
> +}
> diff --git a/src/pmxcfs-rs/pmxcfs-rrd/src/writer.rs b/src/pmxcfs-rs/pmxcfs-rrd/src/writer.rs
> new file mode 100644
> index 00000000..79ed202a
> --- /dev/null
> +++ b/src/pmxcfs-rs/pmxcfs-rrd/src/writer.rs
> @@ -0,0 +1,397 @@
> +/// RRD File Writer
> +///
> +/// Handles creating and updating RRD files via pluggable backends.
> +/// Supports daemon-based (rrdcached) and direct file writing modes.
> +use super::key_type::RrdKeyType;
> +use super::schema::{RrdFormat, RrdSchema};
> +use anyhow::{Context, Result};
> +use chrono::Utc;
> +use std::collections::HashMap;
> +use std::fs;
> +use std::path::{Path, PathBuf};
> +
> +/// Metric type for determining column skipping rules
> +#[derive(Debug, Clone, Copy, PartialEq, Eq)]
> +enum MetricType {
> +    Node,
> +    Vm,
> +    Storage,
> +}
> +
> +impl MetricType {
> +    /// Number of non-archivable columns to skip
> +    ///
> +    /// C implementation (status.c:1300, 1335):
> +    /// - Node: skip 2 (uptime, status)
> +    /// - VM: skip 4 (uptime, status, template, pid)
> +    /// - Storage: skip 0
> +    fn skip_columns(self) -> usize {
> +        match self {
> +            MetricType::Node => 2,
> +            MetricType::Vm => 4,
> +            MetricType::Storage => 0,
> +        }
> +    }
> +}
> +
> +impl RrdFormat {
> +    /// Get column count for a specific metric type
> +    #[allow(dead_code)]
> +    fn column_count(self, metric_type: &MetricType) -> usize {
> +        match (self, metric_type) {
> +            (RrdFormat::Pve2, MetricType::Node) => 12,
> +            (RrdFormat::Pve9_0, MetricType::Node) => 19,
> +            (RrdFormat::Pve2, MetricType::Vm) => 10,
> +            (RrdFormat::Pve9_0, MetricType::Vm) => 17,
> +            (_, MetricType::Storage) => 2, // Same for both formats
> +        }
> +    }
> +}
> +
> +impl RrdKeyType {
> +    /// Get the metric type for this key
> +    fn metric_type(&self) -> MetricType {
> +        match self {
> +            RrdKeyType::Node { .. } => MetricType::Node,
> +            RrdKeyType::Vm { .. } => MetricType::Vm,
> +            RrdKeyType::Storage { .. } => MetricType::Storage,
> +        }
> +    }
> +}
> +
> +/// RRD writer for persistent metric storage
> +///
> +/// Uses pluggable backends (daemon, direct, or fallback) for RRD operations.
> +pub struct RrdWriter {
> +    /// Base directory for RRD files (default: /var/lib/rrdcached/db)
> +    base_dir: PathBuf,
> +    /// Backend for RRD operations (daemon, direct, or fallback)
> +    backend: Box<dyn super::backend::RrdBackend>,
> +    /// Track which RRD files we've already created
> +    created_files: HashMap<String, ()>,

We currently don't clear this cache?
Since it only ever grows, this risks unbounded memory use (DoS).
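
One possible way to bound it, as a rough sketch only: keep the keys in a
HashSet with a hard cap and reset the set when the cap is hit. The set
is just an optimization over the file_path.exists() check, so losing
entries should be harmless. CreatedFiles and MAX_TRACKED are made-up
names and the cap value is arbitrary.

```rust
use std::collections::HashSet;

/// Hypothetical cap; the real value would need to be chosen deliberately.
const MAX_TRACKED: usize = 4;

/// Hypothetical bounded replacement for `created_files: HashMap<String, ()>`.
#[derive(Default)]
struct CreatedFiles {
    keys: HashSet<String>,
}

impl CreatedFiles {
    fn contains(&self, key: &str) -> bool {
        self.keys.contains(key)
    }

    /// Remember a key; drop the whole set once the cap is reached.
    /// A forgotten key only costs one extra exists() check later.
    fn insert(&mut self, key: &str) {
        if self.keys.len() >= MAX_TRACKED {
            self.keys.clear();
        }
        self.keys.insert(key.to_string());
    }

    fn len(&self) -> usize {
        self.keys.len()
    }
}
```

An LRU would keep hot keys around longer, but a plain reset is probably
enough given how cheap the fallback check is.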

> +}
> +
> +impl RrdWriter {
> +    /// Create new RRD writer with default fallback backend
> +    ///
> +    /// Uses the fallback backend that tries daemon first, then falls back to direct file writes.
> +    /// This matches the C implementation's behavior.
> +    ///
> +    /// # Arguments
> +    /// * `base_dir` - Base directory for RRD files
> +    pub async fn new<P: AsRef<Path>>(base_dir: P) -> Result<Self> {
> +        let backend = Self::default_backend().await?;
> +        Self::with_backend(base_dir, backend).await
> +    }
> +
> +    /// Create new RRD writer with specific backend
> +    ///
> +    /// # Arguments
> +    /// * `base_dir` - Base directory for RRD files
> +    /// * `backend` - RRD backend to use (daemon, direct, or fallback)
> +    pub(crate) async fn with_backend<P: AsRef<Path>>(
> +        base_dir: P,
> +        backend: Box<dyn super::backend::RrdBackend>,
> +    ) -> Result<Self> {
> +        let base_dir = base_dir.as_ref().to_path_buf();
> +
> +        // Create base directory if it doesn't exist
> +        fs::create_dir_all(&base_dir)
> +            .with_context(|| format!("Failed to create RRD base directory: {base_dir:?}"))?;
> +
> +        tracing::info!("RRD writer using backend: {}", backend.name());
> +
> +        Ok(Self {
> +            base_dir,
> +            backend,
> +            created_files: HashMap::new(),
> +        })
> +    }
> +
> +    /// Create default backend (fallback: daemon + direct)
> +    ///
> +    /// This matches the C implementation's behavior:
> +    /// - Tries rrdcached daemon first for performance
> +    /// - Falls back to direct file writes if daemon fails
> +    async fn default_backend() -> Result<Box<dyn super::backend::RrdBackend>> {
> +        let backend = super::backend::RrdFallbackBackend::new("/var/run/rrdcached.sock").await;
> +        Ok(Box::new(backend))
> +    }
> +
> +    /// Update RRD file with metric data
> +    ///
> +    /// This will:
> +    /// 1. Transform data from source format to target format (padding/truncation/column skipping)
> +    /// 2. Create the RRD file if it doesn't exist
> +    /// 3. Update via rrdcached daemon
> +    ///
> +    /// # Arguments
> +    /// * `key` - RRD key (e.g., "pve2-node/node1", "pve-vm-9.0/100")
> +    /// * `data` - Metric data string (format: "timestamp:value1:value2:...")
> +    pub async fn update(&mut self, key: &str, data: &str) -> Result<()> {
> +        // Parse the key to determine file path and schema
> +        let key_type = RrdKeyType::parse(key).with_context(|| format!("Invalid RRD key: {key}"))?;
> +
> +        // Get source format and target schema
> +        let source_format = key_type.source_format();
> +        let target_schema = key_type.schema();
> +        let metric_type = key_type.metric_type();
> +
> +        // Transform data from source to target format
> +        let transformed_data =
> +            Self::transform_data(data, source_format, &target_schema, metric_type)
> +                .with_context(|| format!("Failed to transform RRD data for key: {key}"))?;
> +
> +        // Get the file path (always uses current format)
> +        let file_path = key_type.file_path(&self.base_dir);
> +
> +        // Ensure the RRD file exists
> +        if !self.created_files.contains_key(key) && !file_path.exists() {

If an RRD file is deleted/rotated while the process is running,
created_files still contains the key, so it won’t recreate and
updates will fail. Maybe check file_path.exists() unconditionally?
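
Roughly what I have in mind, as a sketch with a made-up helper name
(needs_create) and a HashSet standing in for created_files: the on-disk
check always wins, and a stale cache entry is dropped.

```rust
use std::collections::HashSet;
use std::path::Path;

/// Decide whether the RRD file has to be (re)created before an update.
/// Hypothetical helper; the cache is never trusted over the filesystem.
fn needs_create(created: &mut HashSet<String>, key: &str, file_path: &Path) -> bool {
    if file_path.exists() {
        return false;
    }
    // File is gone (never created, deleted, or rotated away):
    // forget any stale entry so the caller recreates it.
    created.remove(key);
    true
}
```

This costs one stat per update, which may or may not matter here; if it
does, the cache could be kept but invalidated on update failure instead.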

> +            self.create_rrd_file(&key_type, &file_path).await?;
> +            self.created_files.insert(key.to_string(), ());
> +        }
> +
> +        // Update the RRD file via backend
> +        self.backend.update(&file_path, &transformed_data).await?;
> +
> +        Ok(())
> +    }
> +
> +    /// Create RRD file with appropriate schema via backend
> +    async fn create_rrd_file(&mut self, key_type: &RrdKeyType, file_path: &Path) -> Result<()> {
> +        // Ensure parent directory exists
> +        if let Some(parent) = file_path.parent() {
> +            fs::create_dir_all(parent)
> +                .with_context(|| format!("Failed to create directory: {parent:?}"))?;
> +        }
> +
> +        // Get schema for this RRD type
> +        let schema = key_type.schema();
> +
> +        // Calculate start time (at day boundary, matching C implementation)
> +        let now = Utc::now();
> +        let start = now
> +            .date_naive()
> +            .and_hms_opt(0, 0, 0)
> +            .expect("00:00:00 is always a valid time")
> +            .and_utc();

The start time uses UTC midnight here; I think the C code uses the
localtime day boundary. Worth double-checking.
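
To illustrate the difference, a small std-only sketch. It assumes a
fixed UTC offset passed in by the caller (local_day_start and
utc_offset_secs are made-up names) and ignores DST changes within the
day; in the real code the offset would come from the local timezone.

```rust
/// Start of the local day containing `now`, as a UNIX timestamp.
/// `utc_offset_secs` is the local offset east of UTC (e.g. 3600 for CET).
/// Assumption: a fixed offset; DST transitions are not handled.
fn local_day_start(now: i64, utc_offset_secs: i64) -> i64 {
    const DAY: i64 = 86_400;
    // Shift into local time, round down to the day, shift back to UTC.
    let local = now + utc_offset_secs;
    local.div_euclid(DAY) * DAY - utc_offset_secs
}
```

With offset 0 this is the UTC midnight the patch computes; with any
other offset the start timestamp differs by up to a day.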

> +        let start_timestamp = start.timestamp();
> +
> +        tracing::debug!(
> +            "Creating RRD file: {:?} with {} data sources via {}",
> +            file_path,
> +            schema.column_count(),
> +            self.backend.name()
> +        );
> +
> +        // Delegate to backend for creation
> +        self.backend
> +            .create(file_path, &schema, start_timestamp)
> +            .await?;
> +
> +        tracing::info!("Created RRD file: {:?} ({})", file_path, schema);
> +
> +        Ok(())
> +    }
> +
> +    /// Transform data from source format to target format
> +    ///
> +    /// This implements the C behavior from status.c:
> +    /// 1. Skip non-archivable columns only for old formats (uptime, status for nodes)
> +    /// 2. Pad old format data with `:U` for missing columns
> +    /// 3. Truncate future format data to known columns
> +    ///
> +    /// # Arguments
> +    /// * `data` - Raw data string from status update (format: "timestamp:v1:v2:...")
> +    /// * `source_format` - Format indicated by the input key
> +    /// * `target_schema` - Target RRD schema (always Pve9_0 currently)
> +    /// * `metric_type` - Type of metric (Node, VM, Storage) for column skipping
> +    ///
> +    /// # Returns
> +    /// Transformed data string ready for RRD update
> +    fn transform_data(
> +        data: &str,
> +        source_format: RrdFormat,
> +        target_schema: &RrdSchema,
> +        metric_type: MetricType,
> +    ) -> Result<String> {
> +        let mut parts = data.split(':');
> +
> +        let timestamp = parts
> +            .next()
> +            .ok_or_else(|| anyhow::anyhow!("Empty data string"))?;

Not required for correctness, as the backend will reject it, but early
validation of the timestamp here would improve the error message
and avoid doing work before failing.
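
A minimal sketch of such a check, assuming only plain decimal seconds
are expected (if "N" for "now" has to be supported, it needs an extra
branch). parse_timestamp is a made-up name. Side note: str::split
yields one empty item for an empty input, so the ok_or_else branch
above is probably never hit and the empty case needs its own check.

```rust
/// Check that the leading field looks like a UNIX timestamp.
/// Assumption: only plain decimal seconds are valid here.
fn parse_timestamp(data: &str) -> Result<u64, String> {
    // split() always yields at least one item, possibly empty.
    let field = data.split(':').next().unwrap_or("");
    if field.is_empty() {
        return Err("empty timestamp in RRD data".to_string());
    }
    field
        .parse::<u64>()
        .map_err(|e| format!("invalid timestamp {field:?} in RRD data: {e}"))
}
```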

> +
> +        // Skip non-archivable columns for old format only (C: status.c:1300, 1335, 1385)
> +        let skip_count = if source_format == RrdFormat::Pve2 {
> +            metric_type.skip_columns()
> +        } else {
> +            0
> +        };

Likely a bug: here we only skip the non-archivable prefix fields for
Pve2, but not for Pve9_0. If pve9 payloads still include
uptime/status/template/pid, the mapping will be shifted and metrics
will be written into the wrong columns.

status.c:update_rrd_data() skips unconditionally by type:

if (strncmp(key, "pve2-node/", 10) == 0 || strncmp(key, "pve-node-", 9) == 0) {
     ...
     skip = 2; // first two columns are live data that isn't archived
     ...
} else if (strncmp(key, "pve2.3-vm/", 10) == 0 || strncmp(key, "pve-vm-", 7) == 0) {
     ...
     skip = 4; // first 4 columns are live data that isn't archived
     ...
}

skip = 2 / skip = 4 is not "PVE2 only".

Let’s either apply skip_columns() based on metric type for all formats,
or show with captured fixtures that pve9 payloads are already stripped.
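
For the first option, a simplified stand-alone sketch of what I mean.
It re-declares a minimal MetricType and takes the target column count
directly, so transform_all_formats is illustrative only and not a
drop-in replacement for transform_data():

```rust
#[derive(Clone, Copy)]
enum MetricType {
    Node,
    Vm,
    Storage,
}

impl MetricType {
    /// Same skip counts as in the patch.
    fn skip_columns(self) -> usize {
        match self {
            MetricType::Node => 2,
            MetricType::Vm => 4,
            MetricType::Storage => 0,
        }
    }
}

/// Skip by metric type regardless of the source format, then pad with
/// "U" or truncate to `target_cols` values.
fn transform_all_formats(data: &str, metric_type: MetricType, target_cols: usize) -> String {
    let mut parts = data.split(':');
    let timestamp = parts.next().unwrap_or("");
    let values: Vec<&str> = parts
        .skip(metric_type.skip_columns())
        .chain(std::iter::repeat("U"))
        .take(target_cols)
        .collect();
    format!("{timestamp}:{}", values.join(":"))
}
```

If pve9 payloads turn out to be pre-stripped, this would be wrong in the
other direction, which is why the fixtures matter either way.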

> +
> +        // Build transformed data: timestamp + values (skipped, padded/truncated to target_cols)
> +        let target_cols = target_schema.column_count();
> +
> +        // Join values with ':' separator, efficiently building the string without Vec allocation
> +        let mut iter = parts
> +            .skip(skip_count)
> +            .chain(std::iter::repeat("U"))
> +            .take(target_cols);
> +        let values = match iter.next() {
> +            Some(first) => {
> +                // Start with first value, fold remaining values with separator
> +                iter.fold(first.to_string(), |mut acc, value| {
> +                    acc.push(':');
> +                    acc.push_str(value);
> +                    acc
> +                })
> +            }
> +            None => String::new(),
> +        };
> +
> +        Ok(format!("{timestamp}:{values}"))
> +    }
> +
> +    /// Flush all pending updates
> +    #[allow(dead_code)] // Used via RRD update cycle
> +    pub(crate) async fn flush(&mut self) -> Result<()> {
> +        self.backend.flush().await
> +    }
> +
> +    /// Get base directory
> +    #[allow(dead_code)] // Used for path resolution in updates
> +    pub(crate) fn base_dir(&self) -> &Path {
> +        &self.base_dir
> +    }
> +}
> +
> +impl Drop for RrdWriter {
> +    fn drop(&mut self) {
> +        // Note: We can't flush in Drop since it's async
> +        // Users should call flush() explicitly before dropping if needed
> +        tracing::debug!("RrdWriter dropped");
> +    }
> +}
> +
> +#[cfg(test)]
> +mod tests {
> +    use super::super::schema::{RrdFormat, RrdSchema};
> +    use super::*;
> +
> +    #[test]
> +    fn test_rrd_file_path_generation() {
> +        let temp_dir = std::path::PathBuf::from("/tmp/test");
> +
> +        let key_node = RrdKeyType::Node {
> +            nodename: "testnode".to_string(),
> +            format: RrdFormat::Pve9_0,
> +        };
> +        let path = key_node.file_path(&temp_dir);
> +        assert_eq!(path, temp_dir.join("pve-node-9.0").join("testnode"));
> +    }
> +
> +    // ===== Format Adaptation Tests =====

The transform tests are helpful, but can we add some real sample
payloads?
If we can capture a few actual update strings produced by the current
C impl / running system for:

then we could add them as fixtures and assert transform_data() produces
exactly the expected column layout for the target schema

> +
> +    #[test]
> +    fn test_transform_data_node_pve2_to_pve9() {
> +        // Test padding old format (12 cols) to new format (19 cols)
> +        // Input: timestamp:uptime:status:load:maxcpu:cpu:iowait:memtotal:memused:swap_t:swap_u:netin:netout
> +        let data = "1234567890:1000:0:1.5:4:2.0:0.5:8000000000:6000000000:0:0:1000000:500000";
> +
> +        let schema = RrdSchema::node(RrdFormat::Pve9_0);
> +        let result =
> +            RrdWriter::transform_data(data, RrdFormat::Pve2, &schema, MetricType::Node).unwrap();
> +
> +        // After skipping 2 cols (uptime, status) and padding with 7 U's:
> +        // timestamp:load:maxcpu:cpu:iowait:memtotal:memused:swap_t:swap_u:netin:netout:U:U:U:U:U:U:U
> +        let parts: Vec<&str> = result.split(':').collect();
> +        assert_eq!(parts[0], "1234567890", "Timestamp should be preserved");
> +        assert_eq!(parts.len(), 20, "Should have timestamp + 19 values"); // 1 + 19
> +        assert_eq!(parts[1], "1.5", "First value after skip should be load");
> +        assert_eq!(parts[2], "4", "Second value should be maxcpu");
> +
> +        // Check padding
> +        for (i, item) in parts.iter().enumerate().take(20).skip(12) {
> +            assert_eq!(item, &"U", "Column {} should be padded with U", i);
> +        }
> +    }
> +
> +    #[test]
> +    fn test_transform_data_vm_pve2_to_pve9() {
> +        // Test VM transformation with 4 columns skipped
> +        // Input: timestamp:uptime:status:template:pid:maxcpu:cpu:maxmem:mem:maxdisk:disk:netin:netout:diskread:diskwrite
> +        let data = "1234567890:1000:1:0:12345:4:2:4096:2048:100000:50000:1000:500:100:50";
> +
> +        let schema = RrdSchema::vm(RrdFormat::Pve9_0);
> +        let result =
> +            RrdWriter::transform_data(data, RrdFormat::Pve2, &schema, MetricType::Vm).unwrap();
> +
> +        let parts: Vec<&str> = result.split(':').collect();
> +        assert_eq!(parts[0], "1234567890");
> +        assert_eq!(parts.len(), 18, "Should have timestamp + 17 values");
> +        assert_eq!(parts[1], "4", "First value after skip should be maxcpu");
> +
> +        // Check padding (last 7 columns)
> +        for (i, item) in parts.iter().enumerate().take(18).skip(11) {
> +            assert_eq!(item, &"U", "Column {} should be padded", i);
> +        }
> +    }
> +
> +    #[test]
> +    fn test_transform_data_no_padding_needed() {
> +        // Test when source and target have same column count
> +        let data = "1234567890:1.5:4:2.0:0.5:8000000000:6000000000:0:0:0:0:1000000:500000:7000000000:0:0:0:0:0:0";
> +
> +        let schema = RrdSchema::node(RrdFormat::Pve9_0);
> +        let result =
> +            RrdWriter::transform_data(data, RrdFormat::Pve9_0, &schema, MetricType::Node).unwrap();
> +
> +        // No transformation should occur (same format)
> +        let parts: Vec<&str> = result.split(':').collect();
> +        assert_eq!(parts.len(), 20); // timestamp + 19 values
> +        assert_eq!(parts[1], "1.5");
> +    }
> +
> +    #[test]
> +    fn test_transform_data_future_format_truncation() {
> +        // Test truncation of future format with extra columns
> +        let data = "1234567890:1:2:3:4:5:6:7:8:9:10:11:12:13:14:15:16:17:18:19:20:21:22:23:24:25";
> +
> +        let schema = RrdSchema::node(RrdFormat::Pve9_0);
> +        // Simulating future format that has 25 columns
> +        let result =
> +            RrdWriter::transform_data(data, RrdFormat::Pve9_0, &schema, MetricType::Node).unwrap();
> +
> +        let parts: Vec<&str> = result.split(':').collect();
> +        assert_eq!(parts.len(), 20, "Should truncate to timestamp + 19 values");
> +        assert_eq!(parts[19], "19", "Last value should be column 19");
> +    }
> +
> +    #[test]
> +    fn test_transform_data_storage_no_change() {
> +        // Storage format is same for Pve2 and Pve9_0 (2 columns, no skipping)
> +        let data = "1234567890:1000000000000:500000000000";
> +
> +        let schema = RrdSchema::storage(RrdFormat::Pve9_0);
> +        let result =
> +            RrdWriter::transform_data(data, RrdFormat::Pve2, &schema, MetricType::Storage).unwrap();
> +
> +        assert_eq!(result, data, "Storage data should not be transformed");
> +    }
> +
> +    #[test]
> +    fn test_metric_type_methods() {
> +        assert_eq!(MetricType::Node.skip_columns(), 2);
> +        assert_eq!(MetricType::Vm.skip_columns(), 4);
> +        assert_eq!(MetricType::Storage.skip_columns(), 0);
> +    }
> +
> +    #[test]
> +    fn test_format_column_counts() {
> +        assert_eq!(RrdFormat::Pve2.column_count(&MetricType::Node), 12);
> +        assert_eq!(RrdFormat::Pve9_0.column_count(&MetricType::Node), 19);
> +        assert_eq!(RrdFormat::Pve2.column_count(&MetricType::Vm), 10);
> +        assert_eq!(RrdFormat::Pve9_0.column_count(&MetricType::Vm), 17);
> +        assert_eq!(RrdFormat::Pve2.column_count(&MetricType::Storage), 2);
> +        assert_eq!(RrdFormat::Pve9_0.column_count(&MetricType::Storage), 2);
> +    }
> +}






* Re: [pve-devel] [PATCH pve-cluster 05/15] pmxcfs-rs: add pmxcfs-memdb crate
  @ 2026-01-30 15:35  5%   ` Samuel Rufinatscha
  0 siblings, 0 replies; 117+ results
From: Samuel Rufinatscha @ 2026-01-30 15:35 UTC (permalink / raw)
  To: Proxmox VE development discussion, Kefu Chai

Thanks for this substantial patch, Kefu! The overall structure looks
good and is already a solid step forward.

Main issues I noted are around C compatibility.
Besides that, the version handling differs between operations.
I'd suggest centralizing this into a single mutation helper that
handles version bump + version update + entry change in one
transaction. A single write guard mutex would also help avoid the lock
ordering issues and race conditions I noted.
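
To make the suggestion a bit more concrete, a rough sketch using std
only. Entry, State, Db and mutate are made-up, heavily simplified names
and the SQLite part is left out; the point is just that all mutable
state sits behind one lock and every write goes through one helper.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

/// Hypothetical, simplified entry; the real TreeEntry has more fields.
#[derive(Debug, Clone, PartialEq)]
struct Entry {
    version: u64,
    data: Vec<u8>,
}

/// All mutable state behind one lock, so there is no lock order to get wrong.
#[derive(Default)]
struct State {
    version: u64,
    index: HashMap<u64, Entry>,
}

#[derive(Default)]
struct Db {
    state: Mutex<State>,
}

impl Db {
    /// Single entry point for writes: bump the global version, then hand
    /// the new version to the closure that changes the entry. In the real
    /// crate the SQLite transaction would be opened and committed here too.
    fn mutate<R>(&self, change: impl FnOnce(&mut State, u64) -> R) -> R {
        let mut state = self.state.lock().unwrap();
        state.version += 1;
        let version = state.version;
        change(&mut *state, version)
    }

    fn write(&self, inode: u64, data: &[u8]) -> u64 {
        self.mutate(|state, version| {
            state.index.insert(inode, Entry { version, data: data.to_vec() });
            version
        })
    }
}
```

Rollback on a failed DB write is not shown; the helper would be the
natural place for it.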

Details inline.

On 1/7/26 10:15 AM, Kefu Chai wrote:
> Add in-memory database with SQLite persistence:
> - MemDb: Main database handle (thread-safe via Arc)
> - TreeEntry: File/directory entries with metadata
> - SQLite schema version 5 (C-compatible)
> - Plugin system (6 functional + 4 link plugins)
> - Resource locking with timeout-based expiration
> - Version tracking and checksumming
> - Index encoding/decoding for cluster synchronization
> 
> This crate depends only on pmxcfs-api-types and external
> libraries (rusqlite, sha2, bincode). It provides the core
> storage layer used by the distributed file system.
> 
> Includes comprehensive unit tests for:
> - CRUD operations on files and directories
> - Lock acquisition and expiration
> - SQLite persistence and recovery
> - Index encoding/decoding for sync
> - Tree entry application
> 
> Signed-off-by: Kefu Chai <k.chai@proxmox.com>
> ---
>   src/pmxcfs-rs/Cargo.toml                      |    1 +
>   src/pmxcfs-rs/pmxcfs-memdb/Cargo.toml         |   42 +
>   src/pmxcfs-rs/pmxcfs-memdb/README.md          |  220 ++
>   src/pmxcfs-rs/pmxcfs-memdb/src/database.rs    | 2227 +++++++++++++++++
>   src/pmxcfs-rs/pmxcfs-memdb/src/index.rs       |  814 ++++++
>   src/pmxcfs-rs/pmxcfs-memdb/src/lib.rs         |   26 +
>   src/pmxcfs-rs/pmxcfs-memdb/src/locks.rs       |  286 +++
>   src/pmxcfs-rs/pmxcfs-memdb/src/sync.rs        |  249 ++
>   src/pmxcfs-rs/pmxcfs-memdb/src/traits.rs      |  101 +
>   src/pmxcfs-rs/pmxcfs-memdb/src/types.rs       |  325 +++
>   src/pmxcfs-rs/pmxcfs-memdb/src/vmlist.rs      |  189 ++
>   .../pmxcfs-memdb/tests/checksum_test.rs       |  158 ++
>   .../tests/sync_integration_tests.rs           |  394 +++
>   13 files changed, 5032 insertions(+)
>   create mode 100644 src/pmxcfs-rs/pmxcfs-memdb/Cargo.toml
>   create mode 100644 src/pmxcfs-rs/pmxcfs-memdb/README.md
>   create mode 100644 src/pmxcfs-rs/pmxcfs-memdb/src/database.rs
>   create mode 100644 src/pmxcfs-rs/pmxcfs-memdb/src/index.rs
>   create mode 100644 src/pmxcfs-rs/pmxcfs-memdb/src/lib.rs
>   create mode 100644 src/pmxcfs-rs/pmxcfs-memdb/src/locks.rs
>   create mode 100644 src/pmxcfs-rs/pmxcfs-memdb/src/sync.rs
>   create mode 100644 src/pmxcfs-rs/pmxcfs-memdb/src/traits.rs
>   create mode 100644 src/pmxcfs-rs/pmxcfs-memdb/src/types.rs
>   create mode 100644 src/pmxcfs-rs/pmxcfs-memdb/src/vmlist.rs
>   create mode 100644 src/pmxcfs-rs/pmxcfs-memdb/tests/checksum_test.rs
>   create mode 100644 src/pmxcfs-rs/pmxcfs-memdb/tests/sync_integration_tests.rs
> 
> diff --git a/src/pmxcfs-rs/Cargo.toml b/src/pmxcfs-rs/Cargo.toml
> index dd36c81f..2e41ac93 100644
> --- a/src/pmxcfs-rs/Cargo.toml
> +++ b/src/pmxcfs-rs/Cargo.toml
> @@ -5,6 +5,7 @@ members = [
>       "pmxcfs-config",     # Configuration management
>       "pmxcfs-logger",     # Cluster log with ring buffer and deduplication
>       "pmxcfs-rrd",        # RRD (Round-Robin Database) persistence
> +    "pmxcfs-memdb",      # In-memory database with SQLite persistence
>   ]
>   resolver = "2"
>   
> diff --git a/src/pmxcfs-rs/pmxcfs-memdb/Cargo.toml b/src/pmxcfs-rs/pmxcfs-memdb/Cargo.toml
> new file mode 100644
> index 00000000..409b87ce
> --- /dev/null
> +++ b/src/pmxcfs-rs/pmxcfs-memdb/Cargo.toml
> @@ -0,0 +1,42 @@
> +[package]
> +name = "pmxcfs-memdb"
> +description = "In-memory database with SQLite persistence for pmxcfs"
> +
> +version.workspace = true
> +edition.workspace = true
> +authors.workspace = true
> +license.workspace = true
> +repository.workspace = true
> +
> +[lints]
> +workspace = true
> +
> +[dependencies]
> +# Error handling
> +anyhow.workspace = true
> +
> +# Database
> +rusqlite = { version = "0.30", features = ["bundled"] }
> +
> +# Concurrency primitives
> +parking_lot.workspace = true
> +
> +# System integration
> +libc.workspace = true
> +
> +# Cryptography (for checksums)
> +sha2.workspace = true
> +bytes.workspace = true
> +
> +# Serialization
> +serde.workspace = true
> +bincode.workspace = true
> +
> +# Logging
> +tracing.workspace = true
> +
> +# pmxcfs types
> +pmxcfs-api-types = { path = "../pmxcfs-api-types" }
> +
> +[dev-dependencies]
> +tempfile.workspace = true
> diff --git a/src/pmxcfs-rs/pmxcfs-memdb/README.md b/src/pmxcfs-rs/pmxcfs-memdb/README.md
> new file mode 100644
> index 00000000..172e7351
> --- /dev/null
> +++ b/src/pmxcfs-rs/pmxcfs-memdb/README.md
> @@ -0,0 +1,220 @@
> +# pmxcfs-memdb
> +
> +**In-Memory Database** with SQLite persistence for pmxcfs cluster filesystem.
> +
> +This crate provides a thread-safe, cluster-synchronized in-memory database that serves as the backend storage for the Proxmox cluster filesystem. All filesystem operations (read, write, create, delete) are performed on in-memory structures with SQLite providing durable persistence.
> +
> +## Overview
> +
> +The MemDb is the core data structure that stores all cluster configuration files in memory for fast access while maintaining durability through SQLite. Changes are synchronized across the cluster using the DFSM protocol.
> +
> +### Key Features
> +
> +- **In-memory tree structure**: All filesystem entries cached in memory
> +- **SQLite persistence**: Durable storage with ACID guarantees
> +- **Cluster synchronization**: State replication via DFSM (pmxcfs-dfsm crate)
> +- **Version tracking**: Monotonically increasing version numbers for conflict detection
> +- **Resource locking**: File-level locks with timeout-based expiration
> +- **Thread-safe**: All operations protected by mutex
> +- **Size limits**: Enforces max file size (1 MiB) and total filesystem size (128 MiB)
> +
> +## Architecture
> +
> +### Module Structure
> +
> +| Module | Purpose | C Equivalent |
> +|--------|---------|--------------|
> +| `database.rs` | Core MemDb struct and CRUD operations | `memdb.c` (main functions) |
> +| `types.rs` | TreeEntry, LockInfo, constants | `memdb.h:38-51, 71-74` |
> +| `locks.rs` | Resource locking functionality | `memdb.c:memdb_lock_*` |
> +| `sync.rs` | State serialization for cluster sync | `memdb.c:memdb_encode_index` |
> +| `index.rs` | Index comparison for DFSM updates | `memdb.c:memdb_index_*` |
> +
> +## C to Rust Mapping
> +
> +### Data Structures
> +
> +| C Type | Rust Type | Notes |
> +|--------|-----------|-------|
> +| `memdb_t` | `MemDb` | Main database handle (Clone-able via Arc) |
> +| `memdb_tree_entry_t` | `TreeEntry` | File/directory entry |
> +| `memdb_index_t` | `MemDbIndex` | Serialized state for sync |
> +| `memdb_index_extry_t` | `IndexEntry` | Single index entry |
> +| `memdb_lock_info_t` | `LockInfo` | Lock metadata |
> +| `db_backend_t` | `Connection` | SQLite backend (rusqlite) |
> +| `GHashTable *index` | `HashMap<u64, TreeEntry>` | Inode index |
> +| `GHashTable *locks` | `HashMap<String, LockInfo>` | Lock table |
> +| `GMutex mutex` | `Mutex` | Thread synchronization |
> +
> +### Core Functions
> +
> +#### Database Lifecycle
> +
> +| C Function | Rust Equivalent | Location |
> +|-----------|-----------------|----------|
> +| `memdb_open()` | `MemDb::open()` | database.rs |
> +| `memdb_close()` | (Drop trait) | Automatic |
> +| `memdb_checkpoint()` | (implicit in writes) | Auto-commit |
> +
> +#### File Operations
> +
> +| C Function | Rust Equivalent | Location |
> +|-----------|-----------------|----------|
> +| `memdb_read()` | `MemDb::read()` | database.rs |
> +| `memdb_write()` | `MemDb::write()` | database.rs |
> +| `memdb_create()` | `MemDb::create()` | database.rs |
> +| `memdb_delete()` | `MemDb::delete()` | database.rs |
> +| `memdb_mkdir()` | `MemDb::create()` (with DT_DIR) | database.rs |
> +| `memdb_rename()` | `MemDb::rename()` | database.rs |
> +| `memdb_mtime()` | (included in write) | database.rs |
> +
> +#### Directory Operations
> +
> +| C Function | Rust Equivalent | Location |
> +|-----------|-----------------|----------|
> +| `memdb_readdir()` | `MemDb::readdir()` | database.rs |
> +| `memdb_dirlist_free()` | (automatic) | Rust's Vec drops automatically |
> +
> +#### Metadata Operations
> +
> +| C Function | Rust Equivalent | Location |
> +|-----------|-----------------|----------|
> +| `memdb_getattr()` | `MemDb::lookup_path()` | database.rs |
> +| `memdb_statfs()` | `MemDb::statfs()` | database.rs |

The statfs impl is missing from the diff; please revisit.

> +
> +#### Tree Entry Functions
> +
> +| C Function | Rust Equivalent | Location |
> +|-----------|-----------------|----------|
> +| `memdb_tree_entry_new()` | `TreeEntry { ... }` | Struct initialization |
> +| `memdb_tree_entry_copy()` | `.clone()` | Automatic (derive Clone) |
> +| `memdb_tree_entry_free()` | (Drop trait) | Automatic |
> +| `tree_entry_debug()` | `{:?}` format | Automatic (derive Debug) |
> +| `memdb_tree_entry_csum()` | `TreeEntry::compute_checksum()` | types.rs |
> +
> +#### Lock Operations
> +
> +| C Function | Rust Equivalent | Location |
> +|-----------|-----------------|----------|
> +| `memdb_lock_expired()` | `MemDb::is_lock_expired()` | locks.rs |
> +| `memdb_update_locks()` | `MemDb::update_locks()` | locks.rs |
> +
> +#### Index/Sync Operations
> +
> +| C Function | Rust Equivalent | Location |
> +|-----------|-----------------|----------|
> +| `memdb_encode_index()` | `MemDb::get_index()` | sync.rs |
> +| `memdb_index_copy()` | `.clone()` | Automatic (derive Clone) |
> +| `memdb_compute_checksum()` | `MemDb::compute_checksum()` | sync.rs |
> +| `bdb_backend_commit_update()` | `MemDb::apply_tree_entry()` | database.rs |
> +
> +#### State Synchronization
> +
> +| C Function | Rust Equivalent | Location |
> +|-----------|-----------------|----------|
> +| `memdb_recreate_vmlist()` | (handled by status crate) | External |
> +| (implicit) | `MemDb::replace_all_entries()` | database.rs |
> +
> +### SQLite Backend
> +
> +**C Version (database.c):**
> +- Direct SQLite3 C API
> +- Manual statement preparation
> +- Explicit transaction management
> +- Manual memory management
> +
> +**Rust Version (database.rs):**
> +- `rusqlite` crate for type-safe SQLite access
> +
> +## Database Schema
> +
> +The SQLite schema stores all filesystem entries with metadata:
> +- `inode = 1` is always the root directory
> +- `parent = 0` for root, otherwise parent directory's inode
> +- `version` increments on each modification (monotonic)
> +- `writer` is the node ID that made the change
> +- `mtime` is seconds since UNIX epoch
> +- `data` is NULL for directories, BLOB for files
> +
> +## TreeEntry Wire Format
> +
> +For cluster synchronization (DFSM Update messages), TreeEntry uses C-compatible serialization that is byte-compatible with C's implementation.
> +
> +## Key Differences from C Implementation
> +
> +### Thread Safety
> +
> +**C Version:**
> +- Single `GMutex` protects entire memdb_t
> +- Callback-based access from qb_loop (single-threaded)
> +
> +**Rust Version:**
> +- Mutex for each data structure (index, tree, locks, conn)
> +- More granular locking
> +- Can be shared across tokio tasks
> +
> +### Data Structures
> +
> +**C Version:**
> +- `GHashTable` (GLib) for index and tree
> +- Recursive tree structure with pointers
> +
> +**Rust Version:**
> +- `HashMap` from std
> +- Flat structure: `HashMap<u64, HashMap<String, u64>>` for tree
> +- Separate `HashMap<u64, TreeEntry>` for index
> +- No recursive pointers (eliminates cycles)
> +
> +### SQLite Integration
> +
> +**C Version (database.c):**
> +- Direct SQLite3 C API
> +
> +**Rust Version (database.rs):**
> +- `rusqlite` crate for type-safe SQLite access
> +
> +## Constants
> +
> +| Constant | Value | Purpose |
> +|----------|-------|---------|
> +| `MEMDB_MAX_FILE_SIZE` | 1 MiB | Maximum file size (matches C) |
> +| `MEMDB_MAX_FSSIZE` | 128 MiB | Maximum total filesystem size |
> +| `MEMDB_MAX_INODES` | 256k | Maximum number of files/dirs |
> +| `MEMDB_BLOCKSIZE` | 4096 | Block size for statfs |
> +| `LOCK_TIMEOUT` | 120 sec | Lock expiration timeout |
> +| `DT_DIR` | 4 | Directory type (matches POSIX) |
> +| `DT_REG` | 8 | Regular file type (matches POSIX) |
> +
> +## Known Issues / TODOs
> +
> +### Missing Features
> +
> +- [ ] **vmlist regeneration**: `memdb_recreate_vmlist()` not implemented (handled by status crate's `scan_vmlist()`)
> +
> +### Behavioral Differences (Benign)
> +
> +- **Lock storage**: C reads from filesystem at startup, Rust does the same but implementation differs
> +- **Index encoding**: Rust uses `Vec<IndexEntry>` instead of flexible array member
> +- **Checksum algorithm**: Same (SHA-256) but implementation differs (ring vs OpenSSL)
> +
> +### Compatibility
> +
> +- **Database format**: 100% compatible with C version (same SQLite schema)
> +- **Wire format**: TreeEntry serialization matches C byte-for-byte
> +- **Constants**: All limits match C version exactly
> +
> +## References
> +
> +### C Implementation
> +- `src/pmxcfs/memdb.c` / `memdb.h` - In-memory database
> +- `src/pmxcfs/database.c` - SQLite backend
> +
> +### Related Crates
> +- **pmxcfs-dfsm**: Uses MemDb for cluster synchronization
> +- **pmxcfs-api-types**: Message types for FUSE operations
> +- **pmxcfs**: Main daemon and FUSE integration
> +
> +### External Dependencies
> +- **rusqlite**: SQLite bindings
> +- **parking_lot**: Fast mutex implementation
> +- **sha2**: SHA-256 checksums
> diff --git a/src/pmxcfs-rs/pmxcfs-memdb/src/database.rs b/src/pmxcfs-rs/pmxcfs-memdb/src/database.rs
> new file mode 100644
> index 00000000..ee280683
> --- /dev/null
> +++ b/src/pmxcfs-rs/pmxcfs-memdb/src/database.rs
> @@ -0,0 +1,2227 @@
> +/// Core MemDb implementation - in-memory database with SQLite persistence
> +use anyhow::{Context, Result};
> +use parking_lot::Mutex;
> +use rusqlite::{Connection, params};
> +use std::collections::HashMap;
> +use std::path::Path;
> +use std::sync::Arc;
> +use std::sync::atomic::{AtomicU64, Ordering};
> +use std::time::{SystemTime, UNIX_EPOCH};
> +
> +use super::types::LockInfo;
> +use super::types::{
> +    DT_DIR, DT_REG, LOCK_DIR_PATH, LoadDbResult, MEMDB_MAX_FILE_SIZE, ROOT_INODE, TreeEntry,
> +    VERSION_FILENAME,
> +};
> +
> +/// In-memory database with SQLite persistence
> +#[derive(Clone)]
> +pub struct MemDb {
> +    pub(super) inner: Arc<MemDbInner>,
> +}
> +
> +pub(super) struct MemDbInner {
> +    /// SQLite connection for persistence (wrapped in Mutex for thread-safety)
> +    pub(super) conn: Mutex<Connection>,
> +
> +    /// In-memory index of all entries (inode -> TreeEntry)
> +    /// This is a cache of the database for fast lookups
> +    pub(super) index: Mutex<HashMap<u64, TreeEntry>>,
> +
> +    /// In-memory tree structure (parent inode -> children)
> +    pub(super) tree: Mutex<HashMap<u64, HashMap<String, u64>>>,
> +
> +    /// Root entry
> +    pub(super) root_inode: u64,
> +
> +    /// Current version (incremented on each write)
> +    pub(super) version: AtomicU64,
> +
> +    /// Resource locks (path -> LockInfo)
> +    pub(super) locks: Mutex<HashMap<String, LockInfo>>,

In C we set memdb->errors = 1 after DB errors and refuse subsequent
operations. We should likely also have an error flag here and update
/ check it when performing operations.
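
A rough sketch of what such a flag could look like, using only std.
ErrorFlag, check() and guard() are made-up names for illustration and
not part of the patch or of the C code:

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Sticky error flag, loosely modelled on the C `memdb->errors` field.
#[derive(Default)]
pub struct ErrorFlag {
    failed: AtomicBool,
}

impl ErrorFlag {
    /// Refuse to continue once an earlier database error was recorded.
    pub fn check(&self) -> Result<(), String> {
        if self.failed.load(Ordering::SeqCst) {
            Err("memdb is in error state, refusing operation".to_string())
        } else {
            Ok(())
        }
    }

    /// Run a database operation and latch the flag if it fails.
    pub fn guard<T, E>(&self, op: impl FnOnce() -> Result<T, E>) -> Result<T, E> {
        let res = op();
        if res.is_err() {
            self.failed.store(true, Ordering::SeqCst);
        }
        res
    }
}
```

Each public operation would call check() first and wrap its SQLite
calls in guard(), so one failed write blocks all later ones.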

> +}
> +
> +// Manually implement Send and Sync for MemDb
> +// This is safe because we protect the Connection with a Mutex
> +unsafe impl Send for MemDbInner {}
> +unsafe impl Sync for MemDbInner {}

Mutex<Connection> should allow us to avoid any unsafe impls here.
Please remove them and let the compiler enforce the guarantees.
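
To illustrate why the unsafe impls should be unnecessary: Mutex<T> is
Sync whenever T is Send. The sketch below uses a stand-in type
(FakeConn, hypothetical) that is Send but not Sync, which is how I
understand rusqlite's Connection to behave; please double-check that
assumption against the rusqlite docs.

```rust
use std::cell::RefCell;
use std::sync::Mutex;

/// Stand-in for a connection type that is `Send` but not `Sync`
/// (interior mutability through `RefCell`).
pub struct FakeConn {
    pub cache: RefCell<Vec<u8>>,
}

/// Same shape as `MemDbInner`, without any `unsafe impl`.
pub struct Inner {
    pub conn: Mutex<FakeConn>,
}

/// Only compiles for types the compiler has proven `Send + Sync`.
pub fn assert_send_sync<T: Send + Sync>() {}

/// Compile-time check: `Mutex<T>` is `Sync` whenever `T: Send`.
pub fn inner_is_thread_safe() {
    assert_send_sync::<Inner>();
}
```

If a field that is not thread-safe gets added later, this fails to
compile, which is exactly the protection the unsafe impls remove.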

> +
> +impl MemDb {
> +    pub fn open(path: &Path, create: bool) -> Result<Self> {
> +        let conn = Connection::open(path)?;
> +
> +        if create {
> +            Self::init_schema(&conn)?;
> +        }
> +
> +        let (index, tree, root_inode, version) = Self::load_from_db(&conn)?;
> +
> +        let memdb = Self {
> +            inner: Arc::new(MemDbInner {
> +                conn: Mutex::new(conn),
> +                index: Mutex::new(index),
> +                tree: Mutex::new(tree),
> +                root_inode,
> +                version: AtomicU64::new(version),
> +                locks: Mutex::new(HashMap::new()),
> +            }),
> +        };
> +
> +        memdb.update_locks();
> +
> +        Ok(memdb)
> +    }
> +
> +    fn init_schema(conn: &Connection) -> Result<()> {
> +        conn.execute_batch(
> +            r#"
> +            CREATE TABLE tree (
> +                inode INTEGER PRIMARY KEY,
> +                parent INTEGER NOT NULL,
> +                version INTEGER NOT NULL,
> +                writer INTEGER NOT NULL,
> +                mtime INTEGER NOT NULL,
> +                type INTEGER NOT NULL,
> +                name TEXT NOT NULL,
> +                data BLOB,
> +                size INTEGER NOT NULL
> +            );
> +
> +            CREATE INDEX tree_parent_idx ON tree(parent, name);
> +
> +            CREATE TABLE config (
> +                name TEXT PRIMARY KEY,
> +                value TEXT
> +            );
> +            "#,
> +        )?;
> +
> +        // Create root metadata entry as inode ROOT_INODE with name "__version__"
> +        // Matching C implementation: root inode is NEVER in database as a regular entry
> +        // Root metadata is stored as inode ROOT_INODE with special name "__version__"
> +        let now = SystemTime::now()
> +            .duration_since(SystemTime::UNIX_EPOCH)?
> +            .as_secs() as u32;
> +
> +        conn.execute(
> +            "INSERT INTO tree (inode, parent, version, writer, mtime, type, name, data, size) VALUES (?1, ?2, ?3, ?4, ?5, ?6, ?7, ?8, ?9)",
> +            params![ROOT_INODE, ROOT_INODE, 1, 0, now, DT_REG, VERSION_FILENAME, None::<Vec<u8>>, 0],
> +        )?;
> +
> +        Ok(())
> +    }
> +
> +    fn load_from_db(conn: &Connection) -> Result<LoadDbResult> {
> +        let mut index = HashMap::new();
> +        let mut tree: HashMap<u64, HashMap<String, u64>> = HashMap::new();
> +        let mut max_version = 0u64;
> +
> +        let mut stmt = conn.prepare(
> +            "SELECT inode, parent, version, writer, mtime, type, name, data, size FROM tree",
> +        )?;
> +        let rows = stmt.query_map([], |row| {
> +            let inode: u64 = row.get(0)?;
> +            let parent: u64 = row.get(1)?;
> +            let version: u64 = row.get(2)?;
> +            let writer: u32 = row.get(3)?;
> +            let mtime: u32 = row.get(4)?;
> +            let entry_type: u8 = row.get(5)?;
> +            let name: String = row.get(6)?;
> +            let data: Option<Vec<u8>> = row.get(7)?;
> +            let size: i64 = row.get(8)?;
> +
> +            Ok(TreeEntry {
> +                inode,
> +                parent,
> +                version,
> +                writer,
> +                mtime,
> +                size: size as usize,
> +                entry_type,
> +                name,
> +                data: data.unwrap_or_default(),
> +            })
> +        })?;
> +
> +        // Create root entry in memory first (matching C implementation in database.c:559-567)
> +        // Root is NEVER stored in database, only its metadata via inode ROOT_INODE
> +        let now = SystemTime::now()
> +            .duration_since(SystemTime::UNIX_EPOCH)?
> +            .as_secs() as u32;
> +        let mut root = TreeEntry {
> +            inode: ROOT_INODE,
> +            parent: ROOT_INODE, // Root's parent is itself
> +            version: 0,         // Will be populated from __version__ entry
> +            writer: 0,
> +            mtime: now,
> +            size: 0,
> +            entry_type: DT_DIR,
> +            name: String::new(),
> +            data: Vec::new(),
> +        };
> +
> +        for row in rows {
> +            let entry = row?;
> +
> +            // Handle __version__ entry (inode ROOT_INODE) - populate root metadata (C: database.c:372-382)
> +            if entry.inode == ROOT_INODE {
> +                if entry.name == VERSION_FILENAME {
> +                    tracing::debug!(
> +                        "Loading root metadata from __version__: version={}, writer={}, mtime={}",
> +                        entry.version,
> +                        entry.writer,
> +                        entry.mtime
> +                    );
> +                    root.version = entry.version;
> +                    root.writer = entry.writer;
> +                    root.mtime = entry.mtime;
> +                    if entry.version > max_version {
> +                        max_version = entry.version;
> +                    }
> +                } else {
> +                    tracing::warn!("Ignoring inode 0 with unexpected name: {}", entry.name);
> +                }
> +                continue; // Don't add __version__ to index
> +            }
> +
> +            // Track max version from all entries
> +            if entry.version > max_version {
> +                max_version = entry.version;
> +            }
> +
> +            // Add to tree structure
> +            tree.entry(entry.parent)
> +                .or_default()
> +                .insert(entry.name.clone(), entry.inode);
> +
> +            // If this is a directory, ensure it has an entry in the tree map
> +            if entry.is_dir() {
> +                tree.entry(entry.inode).or_default();
> +            }
> +
> +            // Add to index
> +            index.insert(entry.inode, entry);
> +        }
> +
> +        // If root version is still 0, set it to 1 (new database)
> +        if root.version == 0 {
> +            root.version = 1;
> +            max_version = 1;
> +            tracing::debug!("No __version__ entry found, initializing root with version 1");
> +        }
> +
> +        // Add root to index and ensure it has a tree entry (use entry() to not overwrite children!)
> +        index.insert(ROOT_INODE, root);
> +        tree.entry(ROOT_INODE).or_default();
> +
> +        Ok((index, tree, ROOT_INODE, max_version))
> +    }
> +
> +    pub fn get_entry_by_inode(&self, inode: u64) -> Option<TreeEntry> {
> +        let index = self.inner.index.lock();
> +        index.get(&inode).cloned()
> +    }
> +
> +    /// Increment global version and synchronize root entry version
> +    ///
> +    /// CRITICAL: The C implementation uses root->version as the index version.
> +    /// We must keep the root entry's version synchronized with the global version counter
> +    /// to ensure C nodes can verify the index after applying updates.
> +    ///
> +    /// This function acquires the index lock and database connection lock internally,
> +    /// so it must NOT be called while holding either lock.

We could use a single "write guard" mutex for all mutating operations
to avoid consistency issues and races.
It seems to me that C does exactly that, and it would help us avoid
these issues here as well.
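
A minimal sketch of the single-mutex idea, with simplified stand-in
types (State and SingleLockDb are hypothetical, entries are reduced to
a name, and SQLite is left out):

```rust
use std::collections::HashMap;
use std::sync::Mutex;

/// Everything a mutation touches lives behind one lock.
#[derive(Default)]
pub struct State {
    pub version: u64,
    pub index: HashMap<u64, String>,              // inode -> name (simplified)
    pub tree: HashMap<u64, HashMap<String, u64>>, // parent -> (name -> inode)
}

#[derive(Default)]
pub struct SingleLockDb {
    state: Mutex<State>,
}

impl SingleLockDb {
    /// Create an entry: version bump, index and tree change under one guard.
    pub fn create(&self, parent: u64, name: &str) -> u64 {
        let mut st = self.state.lock().unwrap();
        st.version += 1;
        let inode = st.version; // inode == version, as in the patch
        st.index.insert(inode, name.to_string());
        st.tree.entry(parent).or_default().insert(name.to_string(), inode);
        inode
    }

    /// Readers take the same lock, so they never see a half-applied change.
    pub fn lookup(&self, parent: u64, name: &str) -> Option<u64> {
        let st = self.state.lock().unwrap();
        st.tree.get(&parent)?.get(name).copied()
    }
}
```

With one guard there is no window in which the version is already
bumped but the entry is not yet visible.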

> +    fn increment_version(&self) -> Result<u64> {
> +        let new_version = self.inner.version.fetch_add(1, Ordering::SeqCst) + 1;
> +
> +        // Update root entry version in memory and database
> +        {
> +            let mut index = self.inner.index.lock();
> +            if let Some(root_entry) = index.get_mut(&self.inner.root_inode) {
> +                root_entry.version = new_version;
> +            }
> +            drop(index);  // Release lock before DB access
> +        }
> +
> +        // Persist to database (outside index lock to avoid deadlock)
> +        {
> +            let conn = self.inner.conn.lock();
> +            conn.execute(
> +                "UPDATE tree SET version = ? WHERE inode = ?",
> +                rusqlite::params![new_version as i64, self.inner.root_inode as i64],
> +            )
> +            .context("Failed to update root version in database")?;
> +        }
> +
> +        Ok(new_version)
> +    }

Can we please centralize version bumps and __version__ updates?
Right now increment_version() updates the root version in
memory + DB separately from the actual entry mutation, while
other paths update __version__ differently (and sometimes
not at all).
It would be much safer if every mutation bumped the version, updated
__version__ and applied the entry change in the same transaction,
and only then updated the in-memory state.

For example we could have a helper like this:

fn with_mutation<R>(
    &self,
    writer: u32,
    mtime: u32,
    f: impl FnOnce(&Transaction<'_>, u64) -> Result<R>,
) -> Result<R>;
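
To make the intended semantics concrete, here is a std-only sketch.
It does not use rusqlite: StagedTx is a hypothetical stand-in for the
transaction, and the closure takes &mut StagedTx instead of
&Transaction, so treat it as an illustration of the control flow only.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

/// Stand-in for a DB transaction: writes are staged, not applied.
#[derive(Default)]
pub struct StagedTx {
    pub puts: Vec<(u64, String)>, // (inode, name)
}

#[derive(Default)]
pub struct State {
    pub version: u64,
    pub root_writer: u32,
    pub root_mtime: u32,
    pub entries: HashMap<u64, String>,
}

#[derive(Default)]
pub struct Db {
    pub state: Mutex<State>,
}

impl Db {
    /// Run one mutation: the closure sees the bumped version, and the version,
    /// root metadata and staged writes become visible together or not at all.
    pub fn with_mutation<R>(
        &self,
        writer: u32,
        mtime: u32,
        f: impl FnOnce(&mut StagedTx, u64) -> Result<R, String>,
    ) -> Result<R, String> {
        let mut st = self.state.lock().unwrap();
        let new_version = st.version + 1;
        let mut tx = StagedTx::default();
        let out = f(&mut tx, new_version)?; // on error: nothing was changed

        // "Commit": apply staged writes and root metadata in one step.
        st.entries.extend(tx.puts);
        st.version = new_version;
        st.root_writer = writer;
        st.root_mtime = mtime;
        Ok(out)
    }
}
```

In the real code the commit step would be tx.commit() followed by the
in-memory update, still under the same guard.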

> +
> +    /// Get the __version__ entry for sending updates to C nodes
> +    ///
> +    /// The __version__ entry (inode ROOT_INODE) stores root metadata in the database
> +    /// but is not kept in the in-memory index. This method queries it directly
> +    /// from the database to send as an UPDATE message to C nodes.
> +    pub fn get_version_entry(&self) -> anyhow::Result<TreeEntry> {
> +        let index = self.inner.index.lock();
> +        let root_entry = index
> +            .get(&self.inner.root_inode)
> +            .ok_or_else(|| anyhow::anyhow!("Root entry not found"))?;
> +
> +        // Create a __version__ entry matching C's format
> +        // This is what C expects to receive as inode ROOT_INODE
> +        Ok(TreeEntry {
> +            inode: ROOT_INODE, // __version__ is always inode ROOT_INODE in database/wire format
> +            parent: ROOT_INODE, // Root's parent is itself
> +            version: root_entry.version,
> +            writer: root_entry.writer,
> +            mtime: root_entry.mtime,
> +            size: 0,
> +            entry_type: DT_REG,
> +            name: VERSION_FILENAME.to_string(),
> +            data: Vec::new(),
> +        })
> +    }
> +
> +    pub fn lookup_path(&self, path: &str) -> Option<TreeEntry> {
> +        let index = self.inner.index.lock();
> +        let tree = self.inner.tree.lock();

Here we lock in the order index, tree,
but in fn readdir(..) we lock in the order tree, index.

I think we should at least enforce a strict lock ordering across
all methods, or collapse to a single mutex as mentioned.
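
If we keep two mutexes, one way to enforce the ordering is to funnel
every double acquisition through a single helper. A sketch with
std::sync::Mutex and simplified types (TwoLockDb and lock_both are
hypothetical names):

```rust
use std::collections::HashMap;
use std::sync::{Mutex, MutexGuard};

type Index = HashMap<u64, String>;              // inode -> name (simplified)
type Tree = HashMap<u64, HashMap<String, u64>>; // parent -> (name -> inode)

#[derive(Default)]
pub struct TwoLockDb {
    index: Mutex<Index>,
    tree: Mutex<Tree>,
}

impl TwoLockDb {
    /// The only place that takes both locks: always index first, then tree.
    fn lock_both(&self) -> (MutexGuard<'_, Index>, MutexGuard<'_, Tree>) {
        let index = self.index.lock().unwrap();
        let tree = self.tree.lock().unwrap();
        (index, tree)
    }

    pub fn insert(&self, parent: u64, inode: u64, name: &str) {
        let (mut index, mut tree) = self.lock_both();
        index.insert(inode, name.to_string());
        tree.entry(parent).or_default().insert(name.to_string(), inode);
    }

    /// readdir goes through the same helper, so it cannot invert the order.
    pub fn readdir(&self, parent: u64) -> Vec<String> {
        let (index, tree) = self.lock_both();
        let mut names: Vec<String> = tree
            .get(&parent)
            .into_iter()
            .flat_map(|children| children.values())
            .filter_map(|inode| index.get(inode).cloned())
            .collect();
        names.sort();
        names
    }
}
```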

> +
> +        if path.is_empty() || path == "/" || path == "." {
> +            return index.get(&self.inner.root_inode).cloned();
> +        }
> +
> +        let parts: Vec<&str> = path.split('/').filter(|s| !s.is_empty()).collect();
> +        let mut current_inode = self.inner.root_inode;
> +
> +        for part in parts {
> +            let children = tree.get(&current_inode)?;
> +            current_inode = *children.get(part)?;
> +        }
> +
> +        index.get(&current_inode).cloned()
> +    }
> +
> +    /// Split a path into parent directory and basename
> +    ///
> +    /// Paths should be absolute (starting with `/`). While the implementation
> +    /// handles relative paths for C compatibility, all new code should use absolute paths.
> +    fn split_path(path: &str) -> (String, String) {
> +        debug_assert!(
> +            path.starts_with('/') || path.is_empty(),
> +            "Path should be absolute (start with /), got: {path}"
> +        );

This only validates in debug builds. You could replace it with a
real check that also runs in release builds and returns an error.
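
For instance, something along these lines. This is a sketch only: it
uses a String error instead of anyhow so that it stands alone, and the
Result return type would need the callers to be adjusted.

```rust
/// Split an absolute path into (dirname, basename), rejecting anything else.
pub fn split_path(path: &str) -> Result<(String, String), String> {
    // Same condition as the debug_assert!, but enforced in release builds too.
    if !(path.is_empty() || path.starts_with('/')) {
        return Err(format!("path must be absolute (start with /), got: {path}"));
    }

    let path = path.trim_end_matches('/');
    match path.rfind('/') {
        Some(0) => Ok(("/".to_string(), path[1..].to_string())),
        Some(pos) => Ok((path[..pos].to_string(), path[pos + 1..].to_string())),
        None => Ok(("/".to_string(), path.to_string())),
    }
}
```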

> +
> +        let path = path.trim_end_matches('/');
> +
> +        if let Some(pos) = path.rfind('/') {
> +            let dirname = if pos == 0 { "/" } else { &path[..pos] };
> +            let basename = &path[pos + 1..];
> +            (dirname.to_string(), basename.to_string())
> +        } else {
> +            ("/".to_string(), path.to_string())
> +        }
> +    }
> +
> +    pub fn exists(&self, path: &str) -> Result<bool> {
> +        Ok(self.lookup_path(path).is_some())
> +    }
> +
> +    pub fn read(&self, path: &str, offset: u64, size: usize) -> Result<Vec<u8>> {
> +        let entry = self
> +            .lookup_path(path)
> +            .ok_or_else(|| anyhow::anyhow!("File not found: {path}"))?;
> +
> +        if entry.is_dir() {
> +            return Err(anyhow::anyhow!("Cannot read directory: {path}"));
> +        }
> +
> +        let offset = offset as usize;
> +        if offset >= entry.data.len() {
> +            return Ok(Vec::new());
> +        }
> +
> +        let end = std::cmp::min(offset + size, entry.data.len());
> +        Ok(entry.data[offset..end].to_vec())
> +    }
> +
> +    /// Helper to update __version__ entry in database
> +    ///
> +    /// This is called for EVERY write operation to keep root metadata synchronized
> +    /// (matching C behavior in database.c:275-278)
> +    fn update_version_entry(
> +        conn: &rusqlite::Connection,
> +        version: u64,
> +        writer: u32,
> +        mtime: u32,
> +    ) -> Result<()> {
> +        conn.execute(
> +            "UPDATE tree SET version = ?1, writer = ?2, mtime = ?3 WHERE inode = ?4",
> +            params![version, writer, mtime, ROOT_INODE],
> +        )?;
> +        Ok(())
> +    }
> +
> +    /// Helper to update root entry in index
> +    ///
> +    /// Keeps the in-memory root entry synchronized with database __version__
> +    fn update_root_metadata(
> +        index: &mut HashMap<u64, TreeEntry>,
> +        root_inode: u64,
> +        version: u64,
> +        writer: u32,
> +        mtime: u32,
> +    ) {
> +        if let Some(root_entry) = index.get_mut(&root_inode) {
> +            root_entry.version = version;
> +            root_entry.writer = writer;
> +            root_entry.mtime = mtime;
> +        }
> +    }
> +
> +    pub fn create(&self, path: &str, mode: u32, mtime: u32) -> Result<()> {
> +        if self.exists(path)? {
> +            return Err(anyhow::anyhow!("File already exists: {path}"));
> +        }
> +
> +        let (parent_path, basename) = Self::split_path(path);
> +
> +        let parent_entry = self
> +            .lookup_path(&parent_path)
> +            .ok_or_else(|| anyhow::anyhow!("Parent directory not found: {parent_path}"))?;
> +
> +        if !parent_entry.is_dir() {
> +            return Err(anyhow::anyhow!("Parent is not a directory: {parent_path}"));
> +        }
> +
> +        let entry_type = if mode & libc::S_IFDIR != 0 {
> +            DT_DIR
> +        } else {
> +            DT_REG
> +        };
> +
> +        // CRITICAL: Increment version FIRST, then assign inode = version
> +        // This matches C's behavior: te->inode = memdb->root->version
> +        // (see src/pmxcfs/memdb.c:760)
> +        let version = self.increment_version()?;
> +        let new_inode = version;  // Inode equals version number (C compatibility)
> +
> +        let entry = TreeEntry {
> +            inode: new_inode,
> +            parent: parent_entry.inode,
> +            version,
> +            writer: 0, // Local operations always use writer 0 (matching C)
> +            mtime,
> +            size: 0,
> +            entry_type,
> +            name: basename.clone(),
> +            data: Vec::new(),
> +        };
> +
> +        {
> +            let conn = self.inner.conn.lock();
> +            let tx = conn.unchecked_transaction()?;
> +
> +            tx.execute(
> +                "INSERT INTO tree (inode, parent, version, writer, mtime, type, name, data, size) VALUES (?1, ?2, ?3, ?4, ?5, ?6, ?7, ?8, ?9)",
> +                params![
> +                    entry.inode,
> +                    entry.parent,
> +                    entry.version,
> +                    entry.writer,
> +                    entry.mtime,
> +                    entry.entry_type,
> +                    entry.name,
> +                    if entry.is_dir() { None::<Vec<u8>> } else { Some(entry.data.clone()) },
> +                    entry.size
> +                ],
> +            )?;
> +
> +            // CRITICAL: Update __version__ entry (matching C in database.c:275-278)
> +            Self::update_version_entry(&tx, entry.version, entry.writer, entry.mtime)?;
> +
> +            tx.commit()?;
> +        }
> +
> +        {
> +            let mut index = self.inner.index.lock();
> +            let mut tree = self.inner.tree.lock();
> +
> +            index.insert(new_inode, entry.clone());
> +            Self::update_root_metadata(
> +                &mut index,
> +                self.inner.root_inode,
> +                entry.version,
> +                entry.writer,
> +                entry.mtime,
> +            );
> +
> +            tree.entry(parent_entry.inode)
> +                .or_default()
> +                .insert(basename, new_inode);
> +
> +            if entry.is_dir() {
> +                tree.insert(new_inode, HashMap::new());
> +            }
> +        }
> +
> +        // If this is a directory in priv/lock/, register it in the lock table
> +        if entry.is_dir() && parent_path == LOCK_DIR_PATH {
> +            let csum = entry.compute_checksum();
> +            let _ = self.lock_expired(path, &csum);
> +            tracing::debug!("Registered lock directory: {}", path);
> +        }
> +
> +        Ok(())
> +    }
> +
> +    pub fn write(
> +        &self,
> +        path: &str,
> +        offset: u64,
> +        mtime: u32,
> +        data: &[u8],
> +        truncate: bool,
> +    ) -> Result<usize> {
> +        let mut entry = self
> +            .lookup_path(path)
> +            .ok_or_else(|| anyhow::anyhow!("File not found: {path}"))?;
> +
> +        if entry.is_dir() {
> +            return Err(anyhow::anyhow!("Cannot write to directory: {path}"));
> +        }
> +
> +        // Truncate before writing if requested (matches C implementation behavior)

C preserves the prefix bytes on truncate, so clearing the whole
buffer here does not match the C behaviour.
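
A sketch of the semantics I have in mind, assuming "prefix" means the
bytes before offset and that a truncating write sets the new size to
offset + data.len(). This should be verified against memdb.c;
write_at is a hypothetical helper, not part of the patch.

```rust
/// Write `data` at `offset`; with `truncate`, the result ends right after
/// the written range, but bytes before `offset` are kept.
pub fn write_at(buf: &mut Vec<u8>, offset: usize, data: &[u8], truncate: bool) {
    let end = offset + data.len(); // overflow/limit checks omitted in this sketch
    if truncate || end > buf.len() {
        // resize() both shrinks and zero-extends, leaving the prefix untouched.
        buf.resize(end, 0);
    }
    buf[offset..end].copy_from_slice(data);
}
```

With the patch as posted, a truncating write at offset 2 into
"abcdef" would zero-fill the first two bytes instead of keeping "ab".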

> +        if truncate {
> +            entry.data.clear();
> +        }
> +
> +        // Check size limit
> +        let new_size = std::cmp::max(entry.data.len(), (offset as usize) + data.len());

I think we should use checked arithmetic to avoid possible overflows
on 32-bit systems. We should also check that

offset + data.len() <= MEMDB_MAX_FILE_SIZE
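
Roughly like this. It is a sketch: checked_write_end is a hypothetical
helper, and the limit is passed in as a parameter so the example does
not depend on the actual value of MEMDB_MAX_FILE_SIZE.

```rust
use std::convert::TryFrom;

/// Compute the end offset of a write, failing instead of wrapping.
pub fn checked_write_end(offset: u64, len: usize, max_size: usize) -> Result<usize, String> {
    // On 32-bit targets a u64 offset may not fit into usize at all.
    let offset = usize::try_from(offset).map_err(|_| "offset too large".to_string())?;
    let end = offset
        .checked_add(len)
        .ok_or_else(|| "offset + len overflows".to_string())?;
    if end > max_size {
        return Err(format!("write end {end} exceeds maximum {max_size}"));
    }
    Ok(end)
}
```

The returned end can then be reused for the resize and the slice
copy, so the unchecked `offset + data.len()` disappears entirely.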

> +
> +        if new_size > MEMDB_MAX_FILE_SIZE {
> +            return Err(anyhow::anyhow!(
> +                "File size exceeds maximum: {MEMDB_MAX_FILE_SIZE}"
> +            ));
> +        }
> +
> +        // Extend if necessary
> +        let offset = offset as usize;
> +        if offset + data.len() > entry.data.len() {
> +            entry.data.resize(offset + data.len(), 0);
> +        }
> +
> +        // Write data
> +        entry.data[offset..offset + data.len()].copy_from_slice(data);
> +        entry.size = entry.data.len();
> +        entry.mtime = mtime;
> +        entry.writer = 0; // Local operations always use writer 0 (matching C)
> +
> +        // Increment version
> +        let version = self.increment_version()?;
> +        entry.version = version;
> +
> +        // Update database
> +        {
> +            let conn = self.inner.conn.lock();
> +            let tx = conn.unchecked_transaction()?;
> +
> +            tx.execute(
> +                "UPDATE tree SET version = ?1, writer = ?2, mtime = ?3, size = ?4, data = ?5 WHERE inode = ?6",
> +                params![
> +                    entry.version,
> +                    entry.writer,
> +                    entry.mtime,
> +                    entry.size,
> +                    &entry.data,
> +                    entry.inode
> +                ],
> +            )?;
> +
> +            // CRITICAL: Update __version__ entry (matching C in database.c:275-278)
> +            Self::update_version_entry(&tx, entry.version, entry.writer, entry.mtime)?;
> +
> +            tx.commit()?;
> +        }
> +
> +        // Update in-memory index
> +        {
> +            let mut index = self.inner.index.lock();
> +            index.insert(entry.inode, entry.clone());
> +            Self::update_root_metadata(
> +                &mut index,
> +                self.inner.root_inode,
> +                entry.version,
> +                entry.writer,
> +                entry.mtime,
> +            );
> +        }
> +
> +        Ok(data.len())
> +    }
> +
> +    /// Update modification time of a file or directory
> +    ///
> +    /// This implements the C version's `memdb_mtime` function (memdb.c:860-932)
> +    /// with full lock protection semantics for directories in `priv/lock/`.
> +    ///
> +    /// # Lock Protection
> +    ///
> +    /// For lock directories (`priv/lock/*`), this function enforces:
> +    /// 1. Only the same writer (node ID) can update the lock
> +    /// 2. Only newer mtime values are accepted (to prevent replay attacks)
> +    /// 3. Lock cache is refreshed after successful update
> +    ///
> +    /// # Arguments
> +    ///
> +    /// * `path` - Path to the file/directory
> +    /// * `writer` - Writer ID (node ID in cluster)
> +    /// * `mtime` - New modification time (seconds since UNIX epoch)
> +    pub fn set_mtime(&self, path: &str, writer: u32, mtime: u32) -> Result<()> {
> +        let mut entry = self
> +            .lookup_path(path)
> +            .ok_or_else(|| anyhow::anyhow!("File not found: {path}"))?;
> +
> +        // Don't allow updating root
> +        if entry.inode == self.inner.root_inode {
> +            return Err(anyhow::anyhow!("Cannot update root directory"));
> +        }
> +
> +        // Check if this is a lock directory (matching C logic in memdb.c:882)
> +        let (parent_path, _) = Self::split_path(path);
> +        let is_lock = parent_path.trim_start_matches('/') == LOCK_DIR_PATH && entry.is_dir();
> +
> +        if is_lock {
> +            // Lock protection: Only allow newer mtime (C: memdb.c:886-889)
> +            // This prevents replay attacks and ensures lock renewal works correctly
> +            if mtime < entry.mtime {
> +                tracing::warn!(
> +                    "Rejecting mtime update for lock '{}': {} < {} (locked)",
> +                    path,
> +                    mtime,
> +                    entry.mtime
> +                );
> +                return Err(anyhow::anyhow!(
> +                    "Cannot set older mtime on locked directory (dir is locked)"
> +                ));
> +            }
> +
> +            // Lock protection: Only same writer can update (C: memdb.c:890-894)
> +            // This prevents lock hijacking from other nodes
> +            if entry.writer != writer {
> +                tracing::warn!(
> +                    "Rejecting mtime update for lock '{}': writer {} != {} (wrong owner)",
> +                    path,
> +                    writer,
> +                    entry.writer
> +                );
> +                return Err(anyhow::anyhow!(
> +                    "Lock owned by different writer (cannot hijack lock)"
> +                ));
> +            }
> +
> +            tracing::debug!(
> +                "Updating lock directory: {} (mtime: {} -> {})",
> +                path,
> +                entry.mtime,
> +                mtime
> +            );
> +        }
> +
> +        // Increment version
> +        let version = self.increment_version()?;
> +
> +        // Update entry
> +        entry.version = version;
> +        entry.writer = writer;
> +        entry.mtime = mtime;
> +
> +        // Update database
> +        {
> +            let conn = self.inner.conn.lock();
> +            conn.execute(
> +                "UPDATE tree SET version = ?1, writer = ?2, mtime = ?3 WHERE inode = ?4",
> +                params![entry.version, entry.writer, entry.mtime, entry.inode],
> +            )?;
> +        }
> +
> +        // Update in-memory index
> +        {
> +            let mut index = self.inner.index.lock();
> +            index.insert(entry.inode, entry.clone());
> +        }
> +
> +        // Refresh lock cache if this is a lock directory (C: memdb.c:924-929)
> +        // Remove old entry and insert new one with updated checksum
> +        if is_lock {
> +            let mut locks = self.inner.locks.lock();
> +            locks.remove(path);
> +
> +            let csum = entry.compute_checksum();
> +            let now = SystemTime::now()
> +                .duration_since(UNIX_EPOCH)
> +                .unwrap_or_default()
> +                .as_secs();
> +
> +            locks.insert(path.to_string(), LockInfo { ltime: now, csum });
> +
> +            tracing::debug!("Refreshed lock cache for: {}", path);
> +        }
> +
> +        Ok(())
> +    }
> +
> +    pub fn readdir(&self, path: &str) -> Result<Vec<TreeEntry>> {
> +        let entry = self
> +            .lookup_path(path)
> +            .ok_or_else(|| anyhow::anyhow!("Directory not found: {path}"))?;
> +
> +        if !entry.is_dir() {
> +            return Err(anyhow::anyhow!("Not a directory: {path}"));
> +        }
> +
> +        let tree = self.inner.tree.lock();
> +        let index = self.inner.index.lock();
> +
> +        let children = tree
> +            .get(&entry.inode)
> +            .ok_or_else(|| anyhow::anyhow!("Directory structure corrupted"))?;
> +
> +        let mut entries = Vec::new();
> +        for child_inode in children.values() {
> +            if let Some(child) = index.get(child_inode) {
> +                entries.push(child.clone());
> +            }
> +        }
> +
> +        Ok(entries)
> +    }
> +
> +    pub fn delete(&self, path: &str) -> Result<()> {
> +        let entry = self
> +            .lookup_path(path)
> +            .ok_or_else(|| anyhow::anyhow!("File not found: {path}"))?;
> +
> +        // Don't allow deleting root
> +        if entry.inode == self.inner.root_inode {
> +            return Err(anyhow::anyhow!("Cannot delete root directory"));
> +        }
> +
> +        // If directory, check if empty
> +        if entry.is_dir() {
> +            let tree = self.inner.tree.lock();
> +            if let Some(children) = tree.get(&entry.inode)
> +                && !children.is_empty()
> +            {
> +                return Err(anyhow::anyhow!("Directory not empty: {path}"));
> +            }
> +        }

C's memdb_delete() increments the root version, but here we don't.
The __version__ entry needs to be bumped as well.
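
Reduced to the bare minimum, the delete path would look something
like this. State and delete() are simplified, hypothetical stand-ins,
with the DELETE and the __version__ UPDATE assumed to share one
transaction in the real code:

```rust
use std::collections::HashMap;

#[derive(Default)]
pub struct State {
    pub root_version: u64,             // mirrors the __version__ row
    pub entries: HashMap<u64, String>, // inode -> name (simplified)
}

/// Delete an entry and bump the root version in the same step.
/// Returns the new version, or None (and no bump) if the inode is unknown.
pub fn delete(state: &mut State, inode: u64) -> Option<u64> {
    state.entries.remove(&inode)?;
    state.root_version += 1;
    Some(state.root_version)
}
```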

> +
> +        // Delete from database
> +        {
> +            let conn = self.inner.conn.lock();
> +            conn.execute("DELETE FROM tree WHERE inode = ?1", params![entry.inode])?;
> +        }
> +
> +        // Update in-memory structures
> +        {
> +            let mut index = self.inner.index.lock();
> +            let mut tree = self.inner.tree.lock();
> +
> +            // Remove from index
> +            index.remove(&entry.inode);
> +
> +            // Remove from parent's children
> +            if let Some(parent_children) = tree.get_mut(&entry.parent) {
> +                parent_children.remove(&entry.name);
> +            }
> +
> +            // Remove from tree if directory
> +            if entry.is_dir() {
> +                tree.remove(&entry.inode);
> +            }
> +        }
> +
> +        // Clean up lock cache for directories (matching C behavior in memdb.c:1235)
> +        // This prevents stale lock cache entries and memory leaks
> +        if entry.is_dir() {
> +            let mut locks = self.inner.locks.lock();
> +            locks.remove(path);
> +            tracing::debug!("Removed lock cache entry for deleted directory: {}", path);
> +        }
> +
> +        Ok(())
> +    }
> +
> +    pub fn rename(&self, old_path: &str, new_path: &str) -> Result<()> {
> +        let mut entry = self
> +            .lookup_path(old_path)
> +            .ok_or_else(|| anyhow::anyhow!("Source not found: {old_path}"))?;
> +
> +        if entry.inode == self.inner.root_inode {
> +            return Err(anyhow::anyhow!("Cannot rename root directory"));
> +        }
> +
> +        if self.exists(new_path)? {
> +            return Err(anyhow::anyhow!("Destination already exists: {new_path}"));
> +        }
> +
> +        let (new_parent_path, new_basename) = Self::split_path(new_path);
> +
> +        let new_parent_entry = self
> +            .lookup_path(&new_parent_path)
> +            .ok_or_else(|| anyhow::anyhow!("New parent directory not found: {new_parent_path}"))?;
> +
> +        if !new_parent_entry.is_dir() {
> +            return Err(anyhow::anyhow!(
> +                "New parent is not a directory: {new_parent_path}"
> +            ));
> +        }
> +
> +        let old_parent = entry.parent;
> +        let old_name = entry.name.clone();
> +
> +        entry.parent = new_parent_entry.inode;
> +        entry.name = new_basename.clone();
> +
> +        let version = self.increment_version()?;
> +        entry.version = version;
> +
> +        // Update database
> +        {
> +            let conn = self.inner.conn.lock();
> +            let tx = conn.unchecked_transaction()?;
> +
> +            tx.execute(
> +                "UPDATE tree SET parent = ?1, name = ?2, version = ?3 WHERE inode = ?4",
> +                params![entry.parent, entry.name, entry.version, entry.inode],
> +            )?;
> +
> +            // CRITICAL: Update __version__ entry (matching C in database.c:275-278)
> +            Self::update_version_entry(&tx, entry.version, entry.writer, entry.mtime)?;
> +
> +            tx.commit()?;
> +        }
> +
> +        {
> +            let mut index = self.inner.index.lock();
> +            let mut tree = self.inner.tree.lock();
> +
> +            index.insert(entry.inode, entry.clone());
> +            Self::update_root_metadata(
> +                &mut index,
> +                self.inner.root_inode,
> +                entry.version,
> +                entry.writer,
> +                entry.mtime,
> +            );
> +
> +            if let Some(old_parent_children) = tree.get_mut(&old_parent) {
> +                old_parent_children.remove(&old_name);
> +            }
> +
> +            tree.entry(new_parent_entry.inode)
> +                .or_default()
> +                .insert(new_basename, entry.inode);
> +        }
> +
> +        Ok(())
> +    }
> +
> +    pub fn get_all_entries(&self) -> Result<Vec<TreeEntry>> {
> +        let index = self.inner.index.lock();
> +        let entries: Vec<TreeEntry> = index.values().cloned().collect();
> +        Ok(entries)
> +    }
> +
> +    pub fn get_version(&self) -> u64 {
> +        self.inner.version.load(Ordering::SeqCst)
> +    }
> +
> +    /// Replace all entries (for full state synchronization)
> +    pub fn replace_all_entries(&self, entries: Vec<TreeEntry>) -> Result<()> {
> +        tracing::info!(
> +            "Replacing all database entries with {} new entries",
> +            entries.len()
> +        );
> +
> +        let conn = self.inner.conn.lock();
> +        let tx = conn.unchecked_transaction()?;
> +
> +        tx.execute("DELETE FROM tree", [])?;

Here we delete all entries, including the root one ..

> +
> +        let max_version = entries.iter().map(|e| e.version).max().unwrap_or(0);
> +
> +        for entry in &entries {
> +            tx.execute(
> +                "INSERT INTO tree (inode, parent, version, writer, mtime, type, name, data, size) VALUES (?1, ?2, ?3, ?4, ?5, ?6, ?7, ?8, ?9)",
> +                params![
> +                    entry.inode,
> +                    entry.parent,
> +                    entry.version,
> +                    entry.writer,
> +                    entry.mtime,
> +                    entry.entry_type,
> +                    entry.name,
> +                    if entry.is_dir() { None::<Vec<u8>> } else { Some(entry.data.clone()) },
> +                    entry.size
> +                ],
> +            )?;
> +        }

.. but if Vec<TreeEntry> contains the root in its in-memory format
instead of the DB format, the database may end up corrupted: on
restart, load_from_db() will ignore the malformed root entry and reset
the version to 1. We need to handle the root case explicitly.
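
One possible shape for this, as a rough sketch only: map the root to its
on-disk name before the INSERT, the same way apply_tree_entry() already
does further down. `Entry` and `db_name_for` are stand-ins made up for
this example, and the constant values are assumptions (based on the
"inode 0 is __version__" comment), not checked against types.rs:

```rust
// Hypothetical stand-ins, not the real pmxcfs-memdb definitions.
const ROOT_INODE: u64 = 0;
const VERSION_FILENAME: &str = "__version__"; // assumed value

struct Entry {
    inode: u64,
    name: String,
}

/// Name to persist for `entry`: the root is stored under the version
/// filename, every other entry keeps its own name.
fn db_name_for(entry: &Entry) -> &str {
    if entry.inode == ROOT_INODE {
        VERSION_FILENAME
    } else {
        entry.name.as_str()
    }
}
```

The INSERT loop would then bind db_name_for(entry) instead of
entry.name for the name column.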

> +
> +        tx.commit()?;
> +        drop(conn);
> +
> +        let mut index = self.inner.index.lock();
> +        let mut tree = self.inner.tree.lock();
> +
> +        index.clear();
> +        tree.clear();
> +
> +        for entry in entries {
> +            tree.entry(entry.parent)
> +                .or_default()
> +                .insert(entry.name.clone(), entry.inode);
> +
> +            if entry.is_dir() {
> +                tree.entry(entry.inode).or_default();
> +            }
> +
> +            index.insert(entry.inode, entry);
> +        }
> +
> +        self.inner.version.store(max_version, Ordering::SeqCst);
> +
> +        tracing::info!(
> +            "Database state replaced successfully, version now: {}",
> +            max_version
> +        );
> +        Ok(())
> +    }
> +
> +    /// Apply a single TreeEntry during incremental synchronization
> +    ///
> +    /// This is used when receiving Update messages from the leader.
> +    /// It directly inserts or updates the entry in the database without
> +    /// going through the path-based API.
> +    pub fn apply_tree_entry(&self, entry: TreeEntry) -> Result<()> {
> +        tracing::debug!(
> +            "Applying TreeEntry: inode={}, parent={}, name='{}', version={}",
> +            entry.inode,
> +            entry.parent,
> +            entry.name,
> +            entry.version
> +        );
> +
> +        // Begin transaction for atomicity
> +        let conn = self.inner.conn.lock();
> +        let tx = conn.unchecked_transaction()?;
> +
> +        // Handle root inode specially (inode 0 is __version__)
> +        let db_name = if entry.inode == self.inner.root_inode {
> +            VERSION_FILENAME
> +        } else {
> +            entry.name.as_str()
> +        };
> +
> +        // Insert or replace the entry in database
> +        tx.execute(
> +            "INSERT OR REPLACE INTO tree (inode, parent, version, writer, mtime, type, name, data, size) VALUES (?1, ?2, ?3, ?4, ?5, ?6, ?7, ?8, ?9)",
> +            params![
> +                entry.inode,
> +                entry.parent,
> +                entry.version,
> +                entry.writer,
> +                entry.mtime,
> +                entry.entry_type,
> +                db_name,
> +                if entry.is_dir() { None::<Vec<u8>> } else { Some(entry.data.clone()) },
> +                entry.size
> +            ],
> +        )?;
> +
> +        // CRITICAL: Update __version__ entry with the same metadata (matching C in database.c:275-278)
> +        // Only do this if we're not already writing __version__ itself
> +        if entry.inode != ROOT_INODE {
> +            Self::update_version_entry(&tx, entry.version, entry.writer, entry.mtime)?;
> +        }
> +
> +        tx.commit()?;
> +        drop(conn);
> +
> +        // Update in-memory structures
> +        let mut index = self.inner.index.lock();
> +        let mut tree = self.inner.tree.lock();
> +
> +        // Check if this entry already exists
> +        let old_entry = index.get(&entry.inode).cloned();
> +
> +        // If entry exists with different parent or name, update tree structure
> +        if let Some(old) = old_entry {
> +            if old.parent != entry.parent || old.name != entry.name {
> +                // Remove from old parent's children
> +                if let Some(old_parent_children) = tree.get_mut(&old.parent) {
> +                    old_parent_children.remove(&old.name);
> +                }
> +
> +                // Add to new parent's children
> +                tree.entry(entry.parent)
> +                    .or_default()
> +                    .insert(entry.name.clone(), entry.inode);
> +            }
> +        } else {
> +            // New entry - add to parent's children
> +            tree.entry(entry.parent)
> +                .or_default()
> +                .insert(entry.name.clone(), entry.inode);
> +        }
> +
> +        // If this is a directory, ensure it has an entry in the tree map
> +        if entry.is_dir() {
> +            tree.entry(entry.inode).or_default();
> +        }
> +
> +        // Update index
> +        index.insert(entry.inode, entry.clone());

Incoming updates may include inode 0. This would overwrite the
in-memory root dir entry (DT_DIR) with a file entry.
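
A guard along these lines might be enough; this is only a sketch under
assumptions. `Entry`, `apply_to_index` and the DT_* constants are made
up for the example (4 and 8 are the usual dirent values for DT_DIR and
DT_REG). For inode 0, only the version metadata is merged into the
existing root entry, while its type and name stay untouched:

```rust
use std::collections::HashMap;

// Hypothetical stand-ins for the example.
const ROOT_INODE: u64 = 0;
const DT_DIR: u8 = 4;
const DT_REG: u8 = 8;

#[derive(Clone, Debug, PartialEq)]
struct Entry {
    inode: u64,
    entry_type: u8,
    name: String,
    version: u64,
    writer: u32,
    mtime: u32,
}

/// Apply an incoming entry to the in-memory index.
///
/// An update for the root inode only refreshes version/writer/mtime on
/// the existing root entry instead of replacing it. If no root entry
/// exists yet, the incoming entry is inserted as-is.
fn apply_to_index(index: &mut HashMap<u64, Entry>, incoming: Entry) {
    if incoming.inode == ROOT_INODE {
        if let Some(root) = index.get_mut(&ROOT_INODE) {
            root.version = incoming.version;
            root.writer = incoming.writer;
            root.mtime = incoming.mtime;
            return;
        }
    }
    index.insert(incoming.inode, incoming);
}
```

The tree map update further up probably wants the same early-out, so
that the root does not get linked as a child under its on-disk name.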

> +
> +        // Update root entry's metadata to match __version__ (if we wrote a non-root entry)
> +        if entry.inode != self.inner.root_inode {
> +            Self::update_root_metadata(
> +                &mut index,
> +                self.inner.root_inode,
> +                entry.version,
> +                entry.writer,
> +                entry.mtime,
> +            );
> +            tracing::debug!(
> +                version = entry.version,
> +                writer = entry.writer,
> +                mtime = entry.mtime,
> +                "Updated root entry metadata"
> +            );
> +        }
> +
> +        // Update version counter if this entry has a higher version
> +        self.inner
> +            .version
> +            .fetch_max(entry.version, Ordering::SeqCst);
> +
> +        tracing::debug!("TreeEntry applied successfully");
> +        Ok(())
> +    }
> +
> +    /// **TEST ONLY**: Manually set lock timestamp for testing expiration behavior
> +    ///
> +    /// This method is exposed for testing purposes only to simulate lock expiration
> +    /// without waiting the full 120 seconds. Do not use in production code.
> +    #[cfg(test)]
> +    pub fn test_set_lock_timestamp(&self, path: &str, timestamp_secs: u64) {
> +        let mut locks = self.inner.locks.lock();
> +        if let Some(lock_info) = locks.get_mut(path) {
> +            lock_info.ltime = timestamp_secs;
> +        }
> +    }
> +}
> +
> +// ============================================================================
> +// Trait Implementation for Dependency Injection
> +// ============================================================================
> +
> +impl crate::traits::MemDbOps for MemDb {
> +    fn create(&self, path: &str, mode: u32, mtime: u32) -> Result<()> {
> +        self.create(path, mode, mtime)
> +    }
> +
> +    fn read(&self, path: &str, offset: u64, size: usize) -> Result<Vec<u8>> {
> +        self.read(path, offset, size)
> +    }
> +
> +    fn write(
> +        &self,
> +        path: &str,
> +        offset: u64,
> +        mtime: u32,
> +        data: &[u8],
> +        truncate: bool,
> +    ) -> Result<usize> {
> +        self.write(path, offset, mtime, data, truncate)
> +    }
> +
> +    fn delete(&self, path: &str) -> Result<()> {
> +        self.delete(path)
> +    }
> +
> +    fn rename(&self, old_path: &str, new_path: &str) -> Result<()> {
> +        self.rename(old_path, new_path)
> +    }
> +
> +    fn exists(&self, path: &str) -> Result<bool> {
> +        self.exists(path)
> +    }
> +
> +    fn readdir(&self, path: &str) -> Result<Vec<crate::types::TreeEntry>> {
> +        self.readdir(path)
> +    }
> +
> +    fn set_mtime(&self, path: &str, writer: u32, mtime: u32) -> Result<()> {
> +        self.set_mtime(path, writer, mtime)
> +    }
> +
> +    fn lookup_path(&self, path: &str) -> Option<crate::types::TreeEntry> {
> +        self.lookup_path(path)
> +    }
> +
> +    fn get_entry_by_inode(&self, inode: u64) -> Option<crate::types::TreeEntry> {
> +        self.get_entry_by_inode(inode)
> +    }
> +
> +    fn acquire_lock(&self, path: &str, csum: &[u8; 32]) -> Result<()> {
> +        self.acquire_lock(path, csum)
> +    }
> +
> +    fn release_lock(&self, path: &str, csum: &[u8; 32]) -> Result<()> {
> +        self.release_lock(path, csum)
> +    }
> +
> +    fn is_locked(&self, path: &str) -> bool {
> +        self.is_locked(path)
> +    }
> +
> +    fn lock_expired(&self, path: &str, csum: &[u8; 32]) -> bool {
> +        self.lock_expired(path, csum)
> +    }
> +
> +    fn get_version(&self) -> u64 {
> +        self.get_version()
> +    }
> +
> +    fn get_all_entries(&self) -> Result<Vec<crate::types::TreeEntry>> {
> +        self.get_all_entries()
> +    }
> +
> +    fn replace_all_entries(&self, entries: Vec<crate::types::TreeEntry>) -> Result<()> {
> +        self.replace_all_entries(entries)
> +    }
> +
> +    fn apply_tree_entry(&self, entry: crate::types::TreeEntry) -> Result<()> {
> +        self.apply_tree_entry(entry)
> +    }
> +
> +    fn encode_database(&self) -> Result<Vec<u8>> {
> +        self.encode_database()
> +    }
> +
> +    fn compute_database_checksum(&self) -> Result<[u8; 32]> {
> +        self.compute_database_checksum()
> +    }
> +}
> +

[..]

> diff --git a/src/pmxcfs-rs/pmxcfs-memdb/src/index.rs b/src/pmxcfs-rs/pmxcfs-memdb/src/index.rs
> new file mode 100644
> index 00000000..5bf9c102
> --- /dev/null
> +++ b/src/pmxcfs-rs/pmxcfs-memdb/src/index.rs
> @@ -0,0 +1,814 @@
> +/// MemDB Index structures for C-compatible state synchronization
> +///
> +/// This module implements the memdb_index_t format used by the C implementation
> +/// for efficient state comparison during cluster synchronization.
> +use anyhow::Result;
> +use sha2::{Digest, Sha256};
> +
> +/// Index entry matching C's memdb_index_extry_t
> +///
> +/// Wire format (40 bytes):
> +/// ```c
> +/// typedef struct {
> +///     guint64 inode;      // 8 bytes
> +///     char digest[32];    // 32 bytes (SHA256)
> +/// } memdb_index_extry_t;
> +/// ```
> +#[derive(Debug, Clone, PartialEq, Eq)]
> +pub struct IndexEntry {
> +    pub inode: u64,
> +    pub digest: [u8; 32],
> +}
> +
> +impl IndexEntry {
> +    pub fn serialize(&self) -> Vec<u8> {
> +        let mut data = Vec::with_capacity(40);
> +        data.extend_from_slice(&self.inode.to_le_bytes());
> +        data.extend_from_slice(&self.digest);
> +        data
> +    }
> +
> +    pub fn deserialize(data: &[u8]) -> Result<Self> {
> +        if data.len() < 40 {
> +            anyhow::bail!("IndexEntry too short: {} bytes (need 40)", data.len());
> +        }
> +
> +        let inode = u64::from_le_bytes(data[0..8].try_into().unwrap());
> +        let mut digest = [0u8; 32];
> +        digest.copy_from_slice(&data[8..40]);
> +
> +        Ok(Self { inode, digest })
> +    }
> +}
> +
> +/// MemDB index matching C's memdb_index_t
> +///
> +/// Wire format header (24 bytes) + entries:

This should be 32 bytes (8 + 8 + 4 + 4 + 4 + 4)? Also, please fix the
comment on the `bytes` field below and the reference in the README.
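
A quick way to double check the number and keep the comments honest, as
a sketch: `IndexHeader`, `HEADER_SIZE`, `ENTRY_SIZE` and
`index_wire_size` are names made up for this example. The struct mirrors
the fixed fields of memdb_index_t with #[repr(C)]:

```rust
use std::mem::size_of;

/// Hypothetical mirror of the fixed-size part of memdb_index_t.
#[repr(C)]
struct IndexHeader {
    version: u64,
    last_inode: u64,
    writer: u32,
    mtime: u32,
    size: u32,
    bytes: u32,
}

/// Header size derived from the layout rather than hard-coded.
const HEADER_SIZE: usize = size_of::<IndexHeader>();
/// One index entry: 8-byte inode plus 32-byte SHA256 digest.
const ENTRY_SIZE: usize = 8 + 32;

/// Total wire size for an index holding `entries` entries.
fn index_wire_size(entries: usize) -> usize {
    HEADER_SIZE + entries * ENTRY_SIZE
}
```

Using named constants like these in new(), serialize() and
deserialize() would also avoid repeating the literals 32 and 40.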

> +/// ```c
> +/// typedef struct {
> +///     guint64 version;        // 8 bytes
> +///     guint64 last_inode;     // 8 bytes
> +///     guint32 writer;         // 4 bytes
> +///     guint32 mtime;          // 4 bytes
> +///     guint32 size;           // 4 bytes (number of entries)
> +///     guint32 bytes;          // 4 bytes (total bytes allocated)
> +///     memdb_index_extry_t entries[];  // variable length
> +/// } memdb_index_t;
> +/// ```
> +#[derive(Debug, Clone, PartialEq, Eq)]
> +pub struct MemDbIndex {
> +    pub version: u64,
> +    pub last_inode: u64,
> +    pub writer: u32,
> +    pub mtime: u32,
> +    pub size: u32,  // number of entries
> +    pub bytes: u32, // total bytes (24 + size * 40)
> +    pub entries: Vec<IndexEntry>,
> +}
> +
> +impl MemDbIndex {
> +    /// Create a new index from entries
> +    ///
> +    /// Entries are automatically sorted by inode for efficient comparison
> +    /// and to match C implementation behavior.
> +    pub fn new(
> +        version: u64,
> +        last_inode: u64,
> +        writer: u32,
> +        mtime: u32,
> +        mut entries: Vec<IndexEntry>,
> +    ) -> Self {
> +        // Sort entries by inode (matching C implementation)
> +        entries.sort_by_key(|e| e.inode);
> +
> +        let size = entries.len() as u32;
> +        let bytes = 32 + size * 40; // header (32) + entries
> +
> +        Self {
> +            version,
> +            last_inode,
> +            writer,
> +            mtime,
> +            size,
> +            bytes,
> +            entries,
> +        }
> +    }
> +
> +    /// Serialize to C-compatible wire format
> +    pub fn serialize(&self) -> Vec<u8> {
> +        let mut data = Vec::with_capacity(self.bytes as usize);
> +
> +        // Header (32 bytes)
> +        data.extend_from_slice(&self.version.to_le_bytes());
> +        data.extend_from_slice(&self.last_inode.to_le_bytes());
> +        data.extend_from_slice(&self.writer.to_le_bytes());
> +        data.extend_from_slice(&self.mtime.to_le_bytes());
> +        data.extend_from_slice(&self.size.to_le_bytes());
> +        data.extend_from_slice(&self.bytes.to_le_bytes());
> +
> +        // Entries (40 bytes each)
> +        for entry in &self.entries {
> +            data.extend_from_slice(&entry.serialize());
> +        }
> +
> +        data
> +    }
> +
> +    /// Deserialize from C-compatible wire format
> +    pub fn deserialize(data: &[u8]) -> Result<Self> {
> +        if data.len() < 32 {
> +            anyhow::bail!(
> +                "MemDbIndex too short: {} bytes (need at least 32)",
> +                data.len()
> +            );
> +        }
> +
> +        // Parse header
> +        let version = u64::from_le_bytes(data[0..8].try_into().unwrap());
> +        let last_inode = u64::from_le_bytes(data[8..16].try_into().unwrap());
> +        let writer = u32::from_le_bytes(data[16..20].try_into().unwrap());
> +        let mtime = u32::from_le_bytes(data[20..24].try_into().unwrap());
> +        let size = u32::from_le_bytes(data[24..28].try_into().unwrap());
> +        let bytes = u32::from_le_bytes(data[28..32].try_into().unwrap());
> +
> +        // Validate size
> +        let expected_bytes = 32 + size * 40;
> +        if bytes != expected_bytes {
> +            anyhow::bail!("MemDbIndex bytes mismatch: got {bytes}, expected {expected_bytes}");
> +        }
> +
> +        if data.len() < bytes as usize {
> +            anyhow::bail!(
> +                "MemDbIndex data too short: {} bytes (need {})",
> +                data.len(),
> +                bytes
> +            );
> +        }
> +
> +        // Parse entries
> +        let mut entries = Vec::with_capacity(size as usize);
> +        let mut offset = 32;
> +        for _ in 0..size {
> +            let entry = IndexEntry::deserialize(&data[offset..offset + 40])?;
> +            entries.push(entry);
> +            offset += 40;
> +        }
> +
> +        Ok(Self {
> +            version,
> +            last_inode,
> +            writer,
> +            mtime,
> +            size,
> +            bytes,
> +            entries,
> +        })
> +    }
> +
> +    /// Compute SHA256 digest of a tree entry for the index
> +    ///
> +    /// Matches C's memdb_encode_index() digest computation (memdb.c:1497-1507)
> +    /// CRITICAL: Order and fields must match exactly:
> +    ///   1. version, 2. writer, 3. mtime, 4. size, 5. type, 6. parent, 7. name, 8. data
> +    ///
> +    /// NOTE: inode is NOT included in the digest (only used as the index key)
> +    #[allow(clippy::too_many_arguments)]
> +    pub fn compute_entry_digest(
> +        _inode: u64, // Not included in digest, only for signature compatibility
> +        parent: u64,
> +        version: u64,
> +        writer: u32,
> +        mtime: u32,
> +        size: usize,
> +        entry_type: u8,
> +        name: &str,
> +        data: &[u8],
> +    ) -> [u8; 32] {
> +        let mut hasher = Sha256::new();
> +
> +        // Hash entry metadata in C's exact order (memdb.c:1497-1503)
> +        hasher.update(version.to_le_bytes());
> +        hasher.update(writer.to_le_bytes());
> +        hasher.update(mtime.to_le_bytes());
> +        hasher.update((size as u32).to_le_bytes()); // C uses u32 for te->size
> +        hasher.update([entry_type]);
> +        hasher.update(parent.to_le_bytes());
> +        hasher.update(name.as_bytes());
> +
> +        // Hash data only for regular files with non-zero size (memdb.c:1505-1507)
> +        if entry_type == 8 /* DT_REG */ && size > 0 {
> +            hasher.update(data);
> +        }
> +
> +        hasher.finalize().into()
> +    }
> +}
> +
> +/// Implement comparison for MemDbIndex
> +///
> +/// Matches C's dcdb_choose_leader_with_highest_index() logic:
> +/// - If same version, higher mtime wins
> +/// - If different version, higher version wins
> +impl PartialOrd for MemDbIndex {
> +    fn partial_cmp(&self, other: &Self) -> Option<std::cmp::Ordering> {
> +        Some(self.cmp(other))
> +    }
> +}
> +
> +impl Ord for MemDbIndex {
> +    fn cmp(&self, other: &Self) -> std::cmp::Ordering {
> +        // First compare by version (higher version wins)
> +        // Then by mtime (higher mtime wins) if versions are equal
> +        self.version
> +            .cmp(&other.version)
> +            .then_with(|| self.mtime.cmp(&other.mtime))
> +    }
> +}
> +
> +impl MemDbIndex {
> +    /// Find entries that differ from another index
> +    ///
> +    /// Returns the set of inodes that need to be sent as updates.
> +    /// Matches C's dcdb_create_and_send_updates() comparison logic.
> +    pub fn find_differences(&self, other: &MemDbIndex) -> Vec<u64> {
> +        let mut differences = Vec::new();
> +
> +        // Walk through master index, comparing with slave
> +        let mut j = 0; // slave position
> +
> +        for i in 0..self.entries.len() {
> +            let master_entry = &self.entries[i];
> +            let inode = master_entry.inode;
> +
> +            // Advance slave pointer to matching or higher inode
> +            while j < other.entries.len() && other.entries[j].inode < inode {
> +                j += 1;
> +            }
> +
> +            // Check if entries match
> +            if j < other.entries.len() {
> +                let slave_entry = &other.entries[j];
> +                if slave_entry.inode == inode && slave_entry.digest == master_entry.digest {
> +                    // Entries match - skip
> +                    continue;
> +                }
> +            }
> +
> +            // Entry differs or missing - needs update
> +            differences.push(inode);
> +        }
> +
> +        differences
> +    }
> +}
> +
> +#[cfg(test)]

[..]

> diff --git a/src/pmxcfs-rs/pmxcfs-memdb/src/lib.rs b/src/pmxcfs-rs/pmxcfs-memdb/src/lib.rs
> new file mode 100644
> index 00000000..f5c6d97a
> --- /dev/null
> +++ b/src/pmxcfs-rs/pmxcfs-memdb/src/lib.rs
> @@ -0,0 +1,26 @@
> +/// In-memory database with SQLite persistence
> +///
> +/// This module provides a cluster-synchronized in-memory database with SQLite persistence.
> +/// The implementation is organized into focused submodules:
> +///
> +/// - `types`: Type definitions and constants
> +/// - `database`: Core MemDb struct and CRUD operations
> +/// - `locks`: Resource locking functionality
> +/// - `sync`: State synchronization and serialization
> +/// - `index`: C-compatible memdb index structures for efficient state comparison
> +/// - `traits`: Trait abstractions for dependency injection and testing
> +mod database;
> +mod index;
> +mod locks;
> +mod sync;
> +mod traits;
> +mod types;
> +mod vmlist;
> +
> +// Re-export public types
> +pub use database::MemDb;
> +pub use index::{IndexEntry, MemDbIndex};
> +pub use locks::is_lock_path;
> +pub use traits::MemDbOps;
> +pub use types::{ROOT_INODE, TreeEntry};
> +pub use vmlist::recreate_vmlist;
> diff --git a/src/pmxcfs-rs/pmxcfs-memdb/src/locks.rs b/src/pmxcfs-rs/pmxcfs-memdb/src/locks.rs
> new file mode 100644
> index 00000000..6d797fd0
> --- /dev/null
> +++ b/src/pmxcfs-rs/pmxcfs-memdb/src/locks.rs
> @@ -0,0 +1,286 @@
> +/// Lock management for memdb
> +///
> +/// Locks in pmxcfs are implemented as directory entries stored in the database at
> +/// `priv/lock/<lockname>`. This ensures locks are:
> +/// 1. Persistent across restarts
> +/// 2. Synchronized across the cluster via DFSM
> +/// 3. Visible to both C and Rust nodes
> +///
> +/// The in-memory lock table is a cache rebuilt from the database on startup
> +/// and updated dynamically during runtime.
> +use anyhow::Result;
> +use std::time::{SystemTime, UNIX_EPOCH};
> +
> +use super::database::MemDb;
> +use super::types::{LOCK_DIR_PATH, LOCK_TIMEOUT, LockInfo};
> +
> +/// Check if a path is in the lock directory
> +///
> +/// Matches C's path_is_lockdir() function (cfs-utils.c:306)
> +/// Returns true if path is "{LOCK_DIR_PATH}/<something>" (with or without leading /)
> +pub fn is_lock_path(path: &str) -> bool {
> +    let path = path.trim_start_matches('/');
> +    let lock_prefix = format!("{LOCK_DIR_PATH}/");
> +    path.starts_with(&lock_prefix) && path.len() > lock_prefix.len()
> +}
> +
> +impl MemDb {
> +    /// Check if a lock has expired (with side effects matching C semantics)
> +    ///
> +    /// This function implements the same behavior as the C version (memdb.c:330-358):
> +    /// - If no lock exists in cache: Reads from database, creates cache entry, returns `false`
> +    /// - If lock exists but csum mismatches: Updates csum, resets timeout, logs critical error, returns `false`
> +    /// - If lock exists, csum matches, and time > LOCK_TIMEOUT: Returns `true` (expired)
> +    /// - Otherwise: Returns `false` (not expired)
> +    ///
> +    /// This function is used for both checking AND managing locks, matching C semantics.
> +    ///
> +    /// # Current Usage
> +    /// - Called from `database::create()` when creating lock directories (matching C memdb.c:928)
> +    /// - Called from FUSE utimens operation (pmxcfs/src/fuse/filesystem.rs:717) for mtime=0 unlock requests
> +    /// - Called from DFSM unlock message handlers (pmxcfs/src/memdb_callbacks.rs:142,161)
> +    ///
> +    /// Note: DFSM broadcasting of unlock messages to cluster nodes is not yet fully implemented.
> +    /// See TODOs in filesystem.rs:723 and memdb_callbacks.rs:154 for remaining work.
> +    pub fn lock_expired(&self, path: &str, csum: &[u8; 32]) -> bool {
> +        let mut locks = self.inner.locks.lock();
> +        let now = SystemTime::now()
> +            .duration_since(UNIX_EPOCH)
> +            .unwrap_or_default()
> +            .as_secs();
> +
> +        match locks.get_mut(path) {
> +            Some(lock_info) => {
> +                // Lock exists in cache - check csum
> +                if lock_info.csum != *csum {
> +                    // Wrong csum - update and reset timeout
> +                    lock_info.ltime = now;
> +                    lock_info.csum = *csum;
> +                    tracing::error!("Lock checksum mismatch for '{}' - resetting timeout", path);
> +                    return false;
> +                }
> +
> +                // Csum matches - check if expired
> +                let elapsed = now - lock_info.ltime;
> +                if elapsed > LOCK_TIMEOUT {
> +                    tracing::debug!(path, elapsed, "Lock expired");
> +                    return true; // Expired
> +                }
> +
> +                false // Not expired
> +            }
> +            None => {
> +                // No lock in cache - create new cache entry
> +                locks.insert(
> +                    path.to_string(),
> +                    LockInfo {
> +                        ltime: now,
> +                        csum: *csum,
> +                    },
> +                );
> +                tracing::debug!(path, "Created new lock cache entry");
> +                false // Not expired (just created)
> +            }
> +        }
> +    }
> +
> +    /// Acquire a lock on a path
> +    ///
> +    /// This creates a directory entry in the database at `priv/lock/<lockname>`
> +    /// and broadcasts the operation to the cluster via DFSM.
> +    pub fn acquire_lock(&self, path: &str, csum: &[u8; 32]) -> Result<()> {
> +        let now = SystemTime::now()
> +            .duration_since(UNIX_EPOCH)
> +            .unwrap_or_default()
> +            .as_secs();
> +
> +        let locks = self.inner.locks.lock();
> +
> +        // Check if there's an existing valid lock in cache
> +        if let Some(existing_lock) = locks.get(path) {
> +            let lock_age = now - existing_lock.ltime;
> +            if lock_age <= LOCK_TIMEOUT && existing_lock.csum != *csum {
> +                return Err(anyhow::anyhow!("Lock already held by another process"));
> +            }
> +        }
> +
> +        // Convert path like "/priv/lock/foo.lock" to just the lock name
> +        let lock_dir_with_slash = format!("/{LOCK_DIR_PATH}/");
> +        let lock_name = if let Some(name) = path.strip_prefix(&lock_dir_with_slash) {
> +            name
> +        } else {
> +            path.strip_prefix('/').unwrap_or(path)
> +        };
> +
> +        let lock_path = format!("/{LOCK_DIR_PATH}/{lock_name}");

Here the lock path is built with a leading slash, but update_locks()
builds it without one:
format!("{}/{}", LOCK_DIR_PATH, entry.name)
so the two cache keys would not match.

Please standardize on paths without a leading slash and adjust the
prefix stripping logic accordingly.

We should also validate the lock names to avoid path traversal.
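
Something like the following could cover both points, as a rough
sketch. `normalize_lock_path` is a hypothetical helper, and I am
assuming LOCK_DIR_PATH is "priv/lock" as in the doc comment at the top
of this file:

```rust
// Assumed value, taken from the module doc comment.
const LOCK_DIR_PATH: &str = "priv/lock";

/// Turn "/priv/lock/foo", "priv/lock/foo" or a bare "foo" into the
/// canonical cache key "priv/lock/foo" (no leading slash), rejecting
/// names that could escape the lock directory.
fn normalize_lock_path(path: &str) -> Result<String, String> {
    let path = path.trim_start_matches('/');
    let prefix = format!("{LOCK_DIR_PATH}/");
    let name = path.strip_prefix(prefix.as_str()).unwrap_or(path);

    // Empty names and the special directory entries are never valid.
    if name.is_empty() || name == "." || name == ".." {
        return Err(format!("invalid lock name: '{name}'"));
    }
    // A lock name is a single path component.
    if name.contains('/') || name.contains('\0') {
        return Err(format!("invalid lock name: '{name}'"));
    }

    Ok(format!("{LOCK_DIR_PATH}/{name}"))
}
```

acquire_lock(), release_lock(), is_locked() and lock_expired() could
all run their `path` argument through one such helper, so every lookup
uses the same key format as update_locks().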

> +
> +        // Release locks mutex before database operations to avoid deadlock
> +        drop(locks);
> +
> +        // Create or update lock directory in database
> +        // First check if it exists
> +        if self.exists(&lock_path)? {
> +            // Lock directory exists - update its mtime to refresh
> +            // In C this is implicit through the checksum, we'll update the entry
> +            tracing::debug!("Refreshing existing lock directory: {}", lock_path);
> +            // We don't need to do anything - the lock cache entry will be updated below
> +        } else {
> +            // Create lock directory in database
> +            let mode = libc::S_IFDIR | 0o755;
> +            let mtime = now as u32;
> +
> +            // Ensure lock directory exists
> +            let lock_dir_full = format!("/{LOCK_DIR_PATH}");
> +            if !self.exists(&lock_dir_full)? {
> +                self.create(&lock_dir_full, libc::S_IFDIR | 0o755, mtime)?;
> +            }
> +
> +            self.create(&lock_path, mode, mtime)?;
> +            tracing::debug!("Created lock directory in database: {}", lock_path);
> +        }
> +
> +        // Update in-memory cache
> +        let mut locks = self.inner.locks.lock();
> +        locks.insert(
> +            lock_path.clone(),
> +            LockInfo {
> +                ltime: now,
> +                csum: *csum,
> +            },
> +        );
> +
> +        tracing::debug!("Lock acquired on path: {}", lock_path);
> +        Ok(())
> +    }
> +
> +    /// Release a lock on a path
> +    ///
> +    /// This deletes the directory entry from the database and broadcasts
> +    /// the delete operation to the cluster via DFSM.
> +    pub fn release_lock(&self, path: &str, csum: &[u8; 32]) -> Result<()> {
> +        let locks = self.inner.locks.lock();
> +
> +        if let Some(lock_info) = locks.get(path) {
> +            // Only release if checksum matches
> +            if lock_info.csum != *csum {
> +                return Err(anyhow::anyhow!("Cannot release lock: checksum mismatch"));
> +            }
> +        } else {
> +            return Err(anyhow::anyhow!("No lock found on path: {path}"));
> +        }
> +
> +        // Release locks mutex before database operations
> +        drop(locks);
> +
> +        // Delete lock directory from database
> +        if self.exists(path)? {
> +            self.delete(path)?;
> +            tracing::debug!("Deleted lock directory from database: {}", path);
> +        }
> +
> +        // Remove from in-memory cache
> +        let mut locks = self.inner.locks.lock();
> +        locks.remove(path);
> +
> +        tracing::debug!("Lock released on path: {}", path);
> +        Ok(())
> +    }
> +
> +    /// Update lock cache by scanning the priv/lock directory in database
> +    ///
> +    /// This implements the C version's behavior (memdb.c:360-89):
> +    /// - Scans the `priv/lock` directory in the database
> +    /// - Rebuilds the entire lock hash table from database state
> +    /// - Preserves `ltime` from old entries if csum matches
> +    /// - Is called on database open and after synchronization
> +    ///
> +    /// This ensures locks are visible across C/Rust nodes and survive restarts.
> +    pub(crate) fn update_locks(&self) {
> +        // Check if lock directory exists
> +        let _lock_dir = match self.lookup_path(LOCK_DIR_PATH) {
> +            Some(entry) if entry.is_dir() => entry,
> +            _ => {
> +                tracing::debug!(
> +                    "{} directory does not exist, initializing empty lock table",
> +                    LOCK_DIR_PATH
> +                );
> +                self.inner.locks.lock().clear();
> +                return;
> +            }
> +        };
> +
> +        let now = SystemTime::now()
> +            .duration_since(UNIX_EPOCH)
> +            .unwrap_or_default()
> +            .as_secs();
> +
> +        // Get old locks table for preserving ltimes
> +        let old_locks = {
> +            let locks = self.inner.locks.lock();
> +            locks.clone()
> +        };
> +
> +        // Build new locks table from database
> +        let mut new_locks = std::collections::HashMap::new();
> +
> +        // Read all lock directories
> +        match self.readdir(LOCK_DIR_PATH) {
> +            Ok(entries) => {
> +                for entry in entries {
> +                    // Only process directories (locks are stored as directories)
> +                    if !entry.is_dir() {
> +                        continue;
> +                    }
> +
> +                    let lock_path = format!("{}/{}", LOCK_DIR_PATH, entry.name);
> +                    let csum = entry.compute_checksum();
> +
> +                    // Check if we have an old entry with matching checksum
> +                    let ltime = if let Some(old_lock) = old_locks.get(&lock_path) {
> +                        if old_lock.csum == csum {
> +                            // Checksum matches - preserve old ltime
> +                            old_lock.ltime
> +                        } else {
> +                            // Checksum changed - reset ltime
> +                            now
> +                        }
> +                    } else {
> +                        // New lock - set ltime to now
> +                        now
> +                    };
> +
> +                    new_locks.insert(lock_path.clone(), LockInfo { ltime, csum });
> +                    tracing::debug!("Loaded lock from database: {}", lock_path);
> +                }
> +            }
> +            Err(e) => {
> +                tracing::warn!("Failed to read {} directory: {}", LOCK_DIR_PATH, e);
> +                return;
> +            }
> +        }
> +
> +        // Replace lock table
> +        *self.inner.locks.lock() = new_locks;
> +
> +        tracing::debug!(
> +            "Updated lock table from database: {} locks",
> +            self.inner.locks.lock().len()
> +        );
> +    }
> +
> +    /// Check if a path is locked
> +    pub fn is_locked(&self, path: &str) -> bool {
> +        let locks = self.inner.locks.lock();
> +        if let Some(lock_info) = locks.get(path) {
> +            let now = SystemTime::now()
> +                .duration_since(UNIX_EPOCH)
> +                .unwrap_or_default()
> +                .as_secs();
> +
> +            // Check if lock is still valid (not expired)
> +            (now - lock_info.ltime) <= LOCK_TIMEOUT
> +        } else {
> +            false
> +        }
> +    }
> +}
> diff --git a/src/pmxcfs-rs/pmxcfs-memdb/src/sync.rs b/src/pmxcfs-rs/pmxcfs-memdb/src/sync.rs
> new file mode 100644
> index 00000000..719a2cf0
> --- /dev/null
> +++ b/src/pmxcfs-rs/pmxcfs-memdb/src/sync.rs
> @@ -0,0 +1,249 @@
> +/// State synchronization and serialization for memdb
> +use anyhow::{Context, Result};
> +use sha2::{Digest, Sha256};
> +use std::sync::atomic::Ordering;
> +
> +use super::database::MemDb;
> +use super::index::{IndexEntry, MemDbIndex};
> +use super::types::TreeEntry;
> +
> +impl MemDb {
> +    /// Encode database index for C-compatible state synchronization
> +    ///
> +    /// This creates a memdb_index_t structure matching the C implementation,
> +    /// containing metadata and a sorted list of (inode, digest) pairs.
> +    /// This is sent as the "state" during DFSM synchronization.
> +    pub fn encode_index(&self) -> Result<MemDbIndex> {
> +        let mut index = self.inner.index.lock();
> +
> +        // CRITICAL: Synchronize root entry version with global version counter
> +        // The C implementation uses root->version as the index version,
> +        // so we must ensure they match before encoding.
> +        let global_version = self.inner.version.load(Ordering::SeqCst);
> +
> +        let root_inode = self.inner.root_inode;
> +        let mut root_version_updated = false;
> +        if let Some(root_entry) = index.get_mut(&root_inode) {
> +            if root_entry.version != global_version {
> +                root_entry.version = global_version;
> +                root_version_updated = true;
> +            }
> +        } else {
> +            anyhow::bail!("Root entry not found in index");
> +        }
> +
> +        // If root version was updated, persist to database
> +        if root_version_updated {
> +            let conn = self.inner.conn.lock();
> +            let root_entry = index.get(&root_inode).unwrap();  // Safe: we just checked it exists
> +
> +            conn.execute(
> +                "UPDATE entries SET version = ? WHERE inode = ?",

Please revisit the schema: should this statement refer to the `tree` table instead of `entries`?

> +                rusqlite::params![root_entry.version as i64, root_inode as i64],
> +            )
> +            .context("Failed to update root version in database")?;
> +
> +            drop(conn);
> +        }
> +
> +        // Collect ALL entries including root, sorted by inode
> +        let mut entries: Vec<&TreeEntry> = index.values().collect();
> +        entries.sort_by_key(|e| e.inode);
> +
> +        tracing::info!("=== encode_index: Encoding {} entries ===", entries.len());
> +        for te in entries.iter() {
> +            tracing::info!(
> +                "  Entry: inode={:#018x}, parent={:#018x}, name='{}', type={}, version={}, writer={}, mtime={}, size={}",
> +                te.inode, te.parent, te.name, te.entry_type, te.version, te.writer, te.mtime, te.size
> +            );
> +        }
> +
> +        // Create index entries with digests
> +        let index_entries: Vec<IndexEntry> = entries
> +            .iter()
> +            .map(|te| {
> +                let digest = MemDbIndex::compute_entry_digest(
> +                    te.inode,
> +                    te.parent,
> +                    te.version,
> +                    te.writer,
> +                    te.mtime,
> +                    te.size,
> +                    te.entry_type,
> +                    &te.name,
> +                    &te.data,
> +                );
> +                tracing::debug!(
> +                    "  Digest for inode {:#018x}: {:02x}{:02x}{:02x}{:02x}...{:02x}{:02x}{:02x}{:02x}",
> +                    te.inode,
> +                    digest[0], digest[1], digest[2], digest[3],
> +                    digest[28], digest[29], digest[30], digest[31]
> +                );
> +                IndexEntry { inode: te.inode, digest }
> +            })
> +            .collect();
> +
> +        // Get root entry for mtime and writer_id (now updated with global version)
> +        let root_entry = index
> +            .get(&self.inner.root_inode)
> +            .ok_or_else(|| anyhow::anyhow!("Root entry not found in index"))?;
> +
> +        let version = global_version;  // Already synchronized above
> +        let last_inode = index.keys().max().copied().unwrap_or(1);
> +        let writer = root_entry.writer;
> +        let mtime = root_entry.mtime;
> +
> +        drop(index);
> +
> +        Ok(MemDbIndex::new(
> +            version,
> +            last_inode,
> +            writer,
> +            mtime,
> +            index_entries,
> +        ))
> +    }
> +
> +    /// Encode the entire database state into a byte array
> +    /// Matches C version's memdb_encode() function
> +    pub fn encode_database(&self) -> Result<Vec<u8>> {
> +        let index = self.inner.index.lock();
> +
> +        // Collect all entries sorted by inode for consistent ordering
> +        // This matches the C implementation's memdb_tree_compare function
> +        let mut entries: Vec<&TreeEntry> = index.values().collect();
> +        entries.sort_by_key(|e| e.inode);
> +
> +        // Log all entries for debugging
> +        tracing::info!(
> +            "Encoding database: {} entries",
> +            entries.len()
> +        );
> +        for entry in entries.iter() {
> +            tracing::info!(
> +                "  Entry: inode={}, name='{}', parent={}, type={}, size={}, version={}",
> +                entry.inode,
> +                entry.name,
> +                entry.parent,
> +                entry.entry_type,
> +                entry.size,
> +                entry.version
> +            );
> +        }
> +
> +        // Serialize using bincode (compatible with C struct layout)
> +        let encoded = bincode::serialize(&entries)
> +            .map_err(|e| anyhow::anyhow!("Failed to encode database: {e}"))?;
> +
> +        tracing::debug!(
> +            "Encoded database: {} entries, {} bytes",
> +            entries.len(),
> +            encoded.len()
> +        );
> +
> +        Ok(encoded)
> +    }
> +
> +    /// Compute checksum of the entire database state
> +    /// Used for DFSM state verification
> +    pub fn compute_database_checksum(&self) -> Result<[u8; 32]> {
> +        let encoded = self.encode_database()?;

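This currently serializes the entries via bincode and then hashes the
resulting buffer. C’s memdb_compute_checksum hashes the entry fields
directly, so the two checksums are computed over different byte streams.
This does not look C compatible.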
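
To illustrate the concern, here is a minimal sketch of how a
length-prefixed encoding changes the bytes that end up in the hash.
`raw_stream` and `length_prefixed` are hypothetical helpers, and
`length_prefixed` is only a hand-rolled stand-in, not bincode itself. It
assumes an encoding that prefixes strings and byte vectors with a
little-endian u64 length, which is how I understand bincode's default
configuration.

```rust
/// Bytes a direct field hash would consume for name + data (no framing).
fn raw_stream(name: &str, data: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(name.len() + data.len());
    out.extend_from_slice(name.as_bytes());
    out.extend_from_slice(data);
    out
}

/// Hypothetical stand-in for a length-prefixed encoding: each field is
/// preceded by its length as a little-endian u64.
fn length_prefixed(name: &str, data: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(16 + name.len() + data.len());
    out.extend_from_slice(&(name.len() as u64).to_le_bytes());
    out.extend_from_slice(name.as_bytes());
    out.extend_from_slice(&(data.len() as u64).to_le_bytes());
    out.extend_from_slice(data);
    out
}
```

For the same name and data the two buffers differ in both length and
content, so a SHA-256 over the serialized form will differ from one
computed field by field.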

> +
> +        let mut hasher = Sha256::new();
> +        hasher.update(&encoded);
> +
> +        Ok(hasher.finalize().into())
> +    }
> +
> +    /// Decode database state from a byte array
> +    /// Used during DFSM state synchronization
> +    pub fn decode_database(data: &[u8]) -> Result<Vec<TreeEntry>> {
> +        let entries: Vec<TreeEntry> = bincode::deserialize(data)
> +            .map_err(|e| anyhow::anyhow!("Failed to decode database: {e}"))?;
> +
> +        tracing::debug!("Decoded database: {} entries", entries.len());
> +
> +        Ok(entries)
> +    }
> +
> +    /// Synchronize corosync configuration from MemDb to filesystem
> +    ///
> +    /// Reads corosync.conf from memdb and writes to system file if changed.
> +    /// This syncs the cluster configuration from the distributed database
> +    /// to the local filesystem.
> +    ///
> +    /// # Arguments
> +    /// * `system_path` - Path to write the corosync.conf file (default: /etc/corosync/corosync.conf)
> +    /// * `force` - Force write even if unchanged
> +    pub fn sync_corosync_conf(&self, system_path: Option<&str>, force: bool) -> Result<()> {
> +        let system_path = system_path.unwrap_or("/etc/corosync/corosync.conf");
> +        tracing::info!(
> +            "Syncing corosync configuration to {} (force={})",
> +            system_path,
> +            force
> +        );
> +
> +        // Path in memdb for corosync.conf
> +        let memdb_path = "/corosync.conf";
> +
> +        // Try to read from memdb
> +        let memdb_data = match self.lookup_path(memdb_path) {
> +            Some(entry) if entry.is_file() => entry.data,
> +            Some(_) => {
> +                return Err(anyhow::anyhow!("{memdb_path} exists but is not a file"));
> +            }
> +            None => {
> +                tracing::debug!("{} not found in memdb, nothing to sync", memdb_path);
> +                return Ok(());
> +            }
> +        };
> +
> +        // Read current system file if it exists
> +        let system_data = std::fs::read(system_path).ok();
> +
> +        // Determine if we need to write
> +        let should_write = force || system_data.as_ref() != Some(&memdb_data);
> +
> +        if !should_write {
> +            tracing::debug!("Corosync configuration unchanged, skipping write");
> +            return Ok(());
> +        }
> +
> +        // SAFETY CHECK: Writing to /etc requires root permissions
> +        // We'll attempt the write but log clearly if it fails
> +        tracing::info!(
> +            "Corosync configuration changed (size: {} bytes), updating {}",
> +            memdb_data.len(),
> +            system_path
> +        );
> +
> +        // Basic validation: check if it looks like a valid corosync config
> +        let config_str =
> +            std::str::from_utf8(&memdb_data).context("Corosync config is not valid UTF-8")?;
> +
> +        if !config_str.contains("totem") {
> +            tracing::warn!("Corosync config validation: missing 'totem' section");
> +        }
> +        if !config_str.contains("nodelist") {
> +            tracing::warn!("Corosync config validation: missing 'nodelist' section");
> +        }
> +
> +        // Attempt to write (will fail if not root or no permissions)
> +        match std::fs::write(system_path, &memdb_data) {
> +            Ok(()) => {
> +                tracing::info!("Successfully updated {}", system_path);
> +                Ok(())
> +            }
> +            Err(e) if e.kind() == std::io::ErrorKind::PermissionDenied => {
> +                tracing::warn!(
> +                    "Permission denied writing {}: {}. Run as root to enable corosync sync.",
> +                    system_path,
> +                    e
> +                );
> +                // Don't return error - this is expected in non-root mode
> +                Ok(())
> +            }
> +            Err(e) => Err(anyhow::anyhow!("Failed to write {system_path}: {e}")),
> +        }
> +    }
> +}
> diff --git a/src/pmxcfs-rs/pmxcfs-memdb/src/traits.rs b/src/pmxcfs-rs/pmxcfs-memdb/src/traits.rs
> new file mode 100644
> index 00000000..efe3ff36
> --- /dev/null
> +++ b/src/pmxcfs-rs/pmxcfs-memdb/src/traits.rs
> @@ -0,0 +1,101 @@
> +//! Traits for MemDb operations
> +//!
> +//! This module provides the `MemDbOps` trait which abstracts MemDb operations
> +//! for dependency injection and testing. Similar to `StatusOps` in pmxcfs-status.
> +
> +use crate::types::TreeEntry;
> +use anyhow::Result;
> +
> +/// Trait abstracting MemDb operations for dependency injection and mocking
> +///
> +/// This trait enables:
> +/// - Dependency injection of MemDb into components
> +/// - Testing with MockMemDb instead of real database
> +/// - Trait objects for runtime polymorphism
> +///
> +/// # Example
> +/// ```no_run
> +/// use pmxcfs_memdb::{MemDb, MemDbOps};
> +/// use std::sync::Arc;
> +///
> +/// fn use_database(db: Arc<dyn MemDbOps>) {
> +///     // Can work with real MemDb or MockMemDb
> +///     let exists = db.exists("/test").unwrap();
> +/// }
> +/// ```
> +pub trait MemDbOps: Send + Sync {
> +    // ===== Basic File Operations =====
> +
> +    /// Create a new file or directory
> +    fn create(&self, path: &str, mode: u32, mtime: u32) -> Result<()>;
> +
> +    /// Read data from a file
> +    fn read(&self, path: &str, offset: u64, size: usize) -> Result<Vec<u8>>;
> +
> +    /// Write data to a file
> +    fn write(
> +        &self,
> +        path: &str,
> +        offset: u64,
> +        mtime: u32,
> +        data: &[u8],
> +        truncate: bool,
> +    ) -> Result<usize>;
> +
> +    /// Delete a file or directory
> +    fn delete(&self, path: &str) -> Result<()>;
> +
> +    /// Rename a file or directory
> +    fn rename(&self, old_path: &str, new_path: &str) -> Result<()>;
> +
> +    /// Check if a path exists
> +    fn exists(&self, path: &str) -> Result<bool>;
> +
> +    /// List directory contents
> +    fn readdir(&self, path: &str) -> Result<Vec<TreeEntry>>;
> +
> +    /// Set modification time
> +    fn set_mtime(&self, path: &str, writer: u32, mtime: u32) -> Result<()>;
> +
> +    // ===== Path Lookup =====
> +
> +    /// Look up a path and return its entry
> +    fn lookup_path(&self, path: &str) -> Option<TreeEntry>;
> +
> +    /// Get entry by inode number
> +    fn get_entry_by_inode(&self, inode: u64) -> Option<TreeEntry>;
> +
> +    // ===== Lock Operations =====
> +
> +    /// Acquire a lock on a path
> +    fn acquire_lock(&self, path: &str, csum: &[u8; 32]) -> Result<()>;
> +
> +    /// Release a lock on a path
> +    fn release_lock(&self, path: &str, csum: &[u8; 32]) -> Result<()>;
> +
> +    /// Check if a path is locked
> +    fn is_locked(&self, path: &str) -> bool;
> +
> +    /// Check if a lock has expired
> +    fn lock_expired(&self, path: &str, csum: &[u8; 32]) -> bool;
> +
> +    // ===== Database Operations =====
> +
> +    /// Get the current database version
> +    fn get_version(&self) -> u64;
> +
> +    /// Get all entries in the database
> +    fn get_all_entries(&self) -> Result<Vec<TreeEntry>>;
> +
> +    /// Replace all entries (for synchronization)
> +    fn replace_all_entries(&self, entries: Vec<TreeEntry>) -> Result<()>;
> +
> +    /// Apply a single tree entry update
> +    fn apply_tree_entry(&self, entry: TreeEntry) -> Result<()>;
> +
> +    /// Encode the entire database for network transmission
> +    fn encode_database(&self) -> Result<Vec<u8>>;
> +
> +    /// Compute database checksum
> +    fn compute_database_checksum(&self) -> Result<[u8; 32]>;
> +}
> diff --git a/src/pmxcfs-rs/pmxcfs-memdb/src/types.rs b/src/pmxcfs-rs/pmxcfs-memdb/src/types.rs
> new file mode 100644
> index 00000000..988596c8
> --- /dev/null
> +++ b/src/pmxcfs-rs/pmxcfs-memdb/src/types.rs
> @@ -0,0 +1,325 @@
> +/// Type definitions for memdb module
> +use sha2::{Digest, Sha256};
> +use std::collections::HashMap;
> +
> +pub(super) const MEMDB_MAX_FILE_SIZE: usize = 1024 * 1024; // 1 MiB (matches C version)
> +pub(super) const LOCK_TIMEOUT: u64 = 120; // Lock timeout in seconds
> +pub(super) const DT_DIR: u8 = 4; // Directory type
> +pub(super) const DT_REG: u8 = 8; // Regular file type
> +
> +/// Root inode number (matches C implementation's memdb root inode)
> +/// IMPORTANT: This is the MEMDB root inode, which is 0 in both C and Rust.
> +/// The FUSE layer exposes this as inode 1 to the filesystem (FUSE_ROOT_ID).
> +/// See pmxcfs/src/fuse.rs for the inode mapping logic between memdb and FUSE.
> +pub const ROOT_INODE: u64 = 0;
> +
> +/// Version file name (matches C VERSIONFILENAME)
> +/// Used to store root metadata as inode ROOT_INODE in the database
> +pub const VERSION_FILENAME: &str = "__version__";
> +
> +/// Lock directory path (where cluster resource locks are stored)
> +/// Locks are implemented as directory entries stored at `priv/lock/<lockname>`
> +pub const LOCK_DIR_PATH: &str = "priv/lock";
> +
> +/// Lock information for resource locking
> +///
> +/// In the C version (memdb.h:71-74), the lock info struct includes a `path` field
> +/// that serves as the hash table key. In Rust, we use `HashMap<String, LockInfo>`
> +/// where the path is stored as the HashMap key, so we don't duplicate it here.
> +#[derive(Clone, Debug)]
> +pub(crate) struct LockInfo {
> +    /// Lock timestamp (seconds since UNIX epoch)
> +    pub(crate) ltime: u64,
> +
> +    /// Checksum of the locked resource (used to detect changes)
> +    pub(crate) csum: [u8; 32],
> +}
> +
> +/// Tree entry representing a file or directory
> +#[derive(Clone, Debug, serde::Serialize, serde::Deserialize)]
> +pub struct TreeEntry {
> +    pub inode: u64,
> +    pub parent: u64,
> +    pub version: u64,
> +    pub writer: u32,
> +    pub mtime: u32,
> +    pub size: usize,
> +    pub entry_type: u8, // DT_DIR or DT_REG
> +    pub name: String,
> +    pub data: Vec<u8>, // File data (empty for directories)
> +}
> +
> +impl TreeEntry {
> +    pub fn is_dir(&self) -> bool {
> +        self.entry_type == DT_DIR
> +    }
> +
> +    pub fn is_file(&self) -> bool {
> +        self.entry_type == DT_REG
> +    }
> +
> +    /// Serialize TreeEntry to C-compatible wire format for Update messages
> +    ///
> +    /// Wire format (matches dcdb_send_update_inode):
> +    /// ```c
> +    /// [parent: u64][inode: u64][version: u64][writer: u32][mtime: u32]
> +    /// [size: u32][namelen: u32][type: u8][name: namelen bytes][data: size bytes]
> +    /// ```
> +    pub fn serialize_for_update(&self) -> Vec<u8> {
> +        let namelen = (self.name.len() + 1) as u32; // Include null terminator
> +        let header_size = 8 + 8 + 8 + 4 + 4 + 4 + 4 + 1; // 41 bytes
> +        let total_size = header_size + namelen as usize + self.data.len();
> +
> +        let mut buf = Vec::with_capacity(total_size);
> +
> +        // Header fields
> +        buf.extend_from_slice(&self.parent.to_le_bytes());
> +        buf.extend_from_slice(&self.inode.to_le_bytes());
> +        buf.extend_from_slice(&self.version.to_le_bytes());
> +        buf.extend_from_slice(&self.writer.to_le_bytes());
> +        buf.extend_from_slice(&self.mtime.to_le_bytes());
> +        buf.extend_from_slice(&(self.size as u32).to_le_bytes());
> +        buf.extend_from_slice(&namelen.to_le_bytes());
> +        buf.push(self.entry_type);
> +
> +        // Name (null-terminated)
> +        buf.extend_from_slice(self.name.as_bytes());
> +        buf.push(0); // null terminator
> +
> +        // Data (only for files)
> +        if self.entry_type == DT_REG && !self.data.is_empty() {
> +            buf.extend_from_slice(&self.data);
> +        }
> +
> +        buf
> +    }
> +
> +    /// Deserialize TreeEntry from C-compatible wire format
> +    ///
> +    /// Matches dcdb_parse_update_inode
> +    pub fn deserialize_from_update(data: &[u8]) -> anyhow::Result<Self> {
> +        if data.len() < 41 {
> +            anyhow::bail!(
> +                "Update message too short: {} bytes (need at least 41)",
> +                data.len()
> +            );
> +        }
> +
> +        let mut offset = 0;
> +
> +        // Parse header
> +        let parent = u64::from_le_bytes(data[offset..offset + 8].try_into().unwrap());
> +        offset += 8;
> +        let inode = u64::from_le_bytes(data[offset..offset + 8].try_into().unwrap());
> +        offset += 8;
> +        let version = u64::from_le_bytes(data[offset..offset + 8].try_into().unwrap());
> +        offset += 8;
> +        let writer = u32::from_le_bytes(data[offset..offset + 4].try_into().unwrap());
> +        offset += 4;
> +        let mtime = u32::from_le_bytes(data[offset..offset + 4].try_into().unwrap());
> +        offset += 4;
> +        let size = u32::from_le_bytes(data[offset..offset + 4].try_into().unwrap()) as usize;
> +        offset += 4;
> +        let namelen = u32::from_le_bytes(data[offset..offset + 4].try_into().unwrap()) as usize;
> +        offset += 4;
> +        let entry_type = data[offset];
> +        offset += 1;
> +
> +        // Validate type
> +        if entry_type != DT_REG && entry_type != DT_DIR {
> +            anyhow::bail!("Invalid entry type: {entry_type}");
> +        }
> +
> +        // Validate lengths
> +        if data.len() < offset + namelen + size {
> +            anyhow::bail!(
> +                "Update message too short: {} bytes (need {})",
> +                data.len(),
> +                offset + namelen + size
> +            );
> +        }
> +
> +        // Parse name (null-terminated)
> +        let name_bytes = &data[offset..offset + namelen];
> +        if name_bytes.is_empty() || name_bytes[namelen - 1] != 0 {
> +            anyhow::bail!("Name not null-terminated");
> +        }
> +        let name = std::str::from_utf8(&name_bytes[..namelen - 1])
> +            .map_err(|e| anyhow::anyhow!("Invalid UTF-8 in name: {e}"))?
> +            .to_string();
> +        offset += namelen;
> +
> +        // Parse data
> +        let data_vec = if entry_type == DT_REG && size > 0 {
> +            data[offset..offset + size].to_vec()
> +        } else {
> +            Vec::new()
> +        };
> +
> +        Ok(TreeEntry {
> +            inode,
> +            parent,
> +            version,
> +            writer,
> +            mtime,
> +            size,
> +            entry_type,
> +            name,
> +            data: data_vec,
> +        })
> +    }
> +
> +    /// Compute SHA-256 checksum of this tree entry
> +    ///
> +    /// This checksum is used by the lock system to detect changes to lock directory entries.
> +    /// Matches C version's memdb_tree_entry_csum() function (memdb.c:1389).
> +    ///
> +    /// The checksum includes all entry metadata (inode, parent, version, writer, mtime, size,
> +    /// entry_type, name) and data (for files). This ensures any modification to a lock directory
> +    /// entry is detected, triggering lock timeout reset.

Since C hashes raw integer bytes, should we use to_ne_bytes() here?
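
For reference, a small sketch of the difference (`ne_matches_le` is
only an illustrative name): to_ne_bytes() yields the in-memory
representation, which is what a hash over the raw integer bytes sees,
while to_le_bytes() matches that only on little-endian hosts.

```rust
/// Returns true when the native-endian and little-endian encodings of `v`
/// are byte-identical on the current target.
fn ne_matches_le(v: u64) -> bool {
    v.to_ne_bytes() == v.to_le_bytes()
}
```

On little-endian targets both calls produce the same bytes, so the
choice only changes behavior on big-endian hosts.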

> +    pub fn compute_checksum(&self) -> [u8; 32] {
> +        let mut hasher = Sha256::new();
> +
> +        // Hash entry metadata in the same order as C version
> +        hasher.update(self.inode.to_le_bytes());
> +        hasher.update(self.parent.to_le_bytes());

`parent` seems to be hashed at the wrong position here.
In C it is at the 7th position.

> +        hasher.update(self.version.to_le_bytes());
> +        hasher.update(self.writer.to_le_bytes());
> +        hasher.update(self.mtime.to_le_bytes());
> +        hasher.update(self.size.to_le_bytes());

C hashes only 4 bytes here (guint32), while usize is 8 bytes on 64-bit
targets. I think this should be:

hasher.update((self.size as u32).to_le_bytes());
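
Putting the comments on `parent` and `size` together, a rough sketch of
the byte stream I would expect. `Entry` and `feed_checksum_fields` are
hypothetical names, the `sink` closure stands in for `hasher.update`,
and the field order is an assumption based on `parent` being 7th in C,
so please double-check it against memdb.c. to_le_bytes() is kept as in
the patch; the endianness question is separate.

```rust
/// Hypothetical stand-in for the patch's TreeEntry (only the hashed fields).
struct Entry {
    inode: u64,
    parent: u64,
    version: u64,
    writer: u32,
    mtime: u32,
    size: usize,
    entry_type: u8,
    name: String,
    data: Vec<u8>,
}

/// Feeds the entry fields to `sink` in the assumed C order:
/// inode, version, writer, mtime, size (4 bytes), type, parent, name, data.
fn feed_checksum_fields(e: &Entry, mut sink: impl FnMut(&[u8])) {
    sink(&e.inode.to_le_bytes());
    sink(&e.version.to_le_bytes());
    sink(&e.writer.to_le_bytes());
    sink(&e.mtime.to_le_bytes());
    sink(&(e.size as u32).to_le_bytes()); // guint32 in C, not usize
    sink(&[e.entry_type]);
    sink(&e.parent.to_le_bytes()); // 7th position
    sink(e.name.as_bytes());
    // Same condition as the patch: only hash data when present.
    if !e.data.is_empty() {
        sink(&e.data);
    }
}
```

In compute_checksum() this could be called as
`feed_checksum_fields(self, |b| hasher.update(b))`, and a unit test can
pass a closure that appends to a Vec<u8> to check offsets and widths.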

> +        hasher.update([self.entry_type]);
> +        hasher.update(self.name.as_bytes());
> +
> +        // Hash data if present
> +        if !self.data.is_empty() {
> +            hasher.update(&self.data);
> +        }
> +
> +        hasher.finalize().into()
> +    }
> +}
> +
> +/// Return type for load_from_db: (index, tree, root_inode, max_version)
> +pub(super) type LoadDbResult = (
> +    HashMap<u64, TreeEntry>,
> +    HashMap<u64, HashMap<String, u64>>,
> +    u64,
> +    u64,
> +);
> +

[..]

> +}





^ permalink raw reply	[relevance 5%]

* Re: [pve-devel] [PATCH pve-cluster 06/15] pmxcfs-rs: add pmxcfs-status crate
  @ 2026-02-02 16:07  5%   ` Samuel Rufinatscha
  0 siblings, 0 replies; 117+ results
From: Samuel Rufinatscha @ 2026-02-02 16:07 UTC (permalink / raw)
  To: Proxmox VE development discussion, Kefu Chai

Comments inline.

On 1/6/26 3:25 PM, Kefu Chai wrote:
> Add cluster status tracking and monitoring:
> - Status: Central status container (thread-safe)
> - Cluster membership tracking
> - VM/CT registry with version tracking
> - RRD data management
> - Cluster log integration
> - Quorum state tracking
> - Configuration file version tracking
> 
> This integrates pmxcfs-memdb, pmxcfs-rrd, pmxcfs-logger, and
> pmxcfs-api-types to provide centralized cluster state management.
> It also uses procfs for system metrics collection.
> 
> Includes comprehensive unit tests for:
> - VM registration and deletion
> - Cluster membership updates
> - Version tracking
> - Configuration file monitoring
> 
> Signed-off-by: Kefu Chai <k.chai@proxmox.com>
> ---
>   src/pmxcfs-rs/Cargo.toml                  |    1 +
>   src/pmxcfs-rs/pmxcfs-status/Cargo.toml    |   40 +
>   src/pmxcfs-rs/pmxcfs-status/README.md     |  142 ++
>   src/pmxcfs-rs/pmxcfs-status/src/lib.rs    |   54 +
>   src/pmxcfs-rs/pmxcfs-status/src/status.rs | 1561 +++++++++++++++++++++
>   src/pmxcfs-rs/pmxcfs-status/src/traits.rs |  486 +++++++
>   src/pmxcfs-rs/pmxcfs-status/src/types.rs  |   62 +
>   7 files changed, 2346 insertions(+)
>   create mode 100644 src/pmxcfs-rs/pmxcfs-status/Cargo.toml
>   create mode 100644 src/pmxcfs-rs/pmxcfs-status/README.md
>   create mode 100644 src/pmxcfs-rs/pmxcfs-status/src/lib.rs
>   create mode 100644 src/pmxcfs-rs/pmxcfs-status/src/status.rs
>   create mode 100644 src/pmxcfs-rs/pmxcfs-status/src/traits.rs
>   create mode 100644 src/pmxcfs-rs/pmxcfs-status/src/types.rs
> 
> diff --git a/src/pmxcfs-rs/Cargo.toml b/src/pmxcfs-rs/Cargo.toml
> index 2e41ac93..b5191c31 100644
> --- a/src/pmxcfs-rs/Cargo.toml
> +++ b/src/pmxcfs-rs/Cargo.toml
> @@ -6,6 +6,7 @@ members = [
>       "pmxcfs-logger",     # Cluster log with ring buffer and deduplication
>       "pmxcfs-rrd",        # RRD (Round-Robin Database) persistence
>       "pmxcfs-memdb",      # In-memory database with SQLite persistence
> +    "pmxcfs-status",     # Status monitoring and RRD data management
>   ]
>   resolver = "2"
>   
> diff --git a/src/pmxcfs-rs/pmxcfs-status/Cargo.toml b/src/pmxcfs-rs/pmxcfs-status/Cargo.toml
> new file mode 100644
> index 00000000..e4a817d7
> --- /dev/null
> +++ b/src/pmxcfs-rs/pmxcfs-status/Cargo.toml
> @@ -0,0 +1,40 @@
> +[package]
> +name = "pmxcfs-status"
> +description = "Status monitoring and RRD data management for pmxcfs"
> +
> +version.workspace = true
> +edition.workspace = true
> +authors.workspace = true
> +license.workspace = true
> +repository.workspace = true
> +
> +[lints]
> +workspace = true
> +
> +[dependencies]
> +# Workspace dependencies
> +pmxcfs-api-types.workspace = true
> +pmxcfs-rrd.workspace = true
> +pmxcfs-memdb.workspace = true
> +pmxcfs-logger.workspace = true
> +
> +# Error handling
> +anyhow.workspace = true
> +
> +# Async runtime
> +tokio.workspace = true
> +
> +# Concurrency primitives
> +parking_lot.workspace = true
> +
> +# Logging
> +tracing.workspace = true
> +
> +# Utilities
> +chrono.workspace = true

The chrono dependency is not used in this crate and can be dropped.

> +
> +# System information (Linux /proc filesystem)
> +procfs = "0.17"
> +
> +[dev-dependencies]
> +tempfile.workspace = true
> diff --git a/src/pmxcfs-rs/pmxcfs-status/README.md b/src/pmxcfs-rs/pmxcfs-status/README.md
> new file mode 100644
> index 00000000..b6958af3
> --- /dev/null
> +++ b/src/pmxcfs-rs/pmxcfs-status/README.md
> @@ -0,0 +1,142 @@
> +# pmxcfs-status
> +
> +**Cluster Status** tracking and monitoring for pmxcfs.
> +
> +This crate manages all runtime cluster state information including membership, VM lists, node status, RRD metrics, and cluster logs. It serves as the central repository for dynamic cluster information that changes during runtime.
> +
> +## Overview
> +
> +The Status subsystem tracks:
> +- **Cluster membership**: Which nodes are in the cluster and their states
> +- **VM/CT tracking**: Registry of all virtual machines and containers
> +- **Node status**: Per-node health and resource information
> +- **RRD data**: Performance metrics (CPU, memory, disk, network)
> +- **Cluster log**: Centralized log aggregation
> +- **Quorum state**: Whether cluster has quorum
> +- **Version tracking**: Monitors configuration file changes
> +
> +## Usage
> +
> +### Initialization
> +
> +```rust
> +use pmxcfs_status;
> +
> +// For tests or when RRD persistence is not needed
> +let status = pmxcfs_status::init();
> +
> +// For production with RRD file persistence
> +let status = pmxcfs_status::init_with_rrd("/var/lib/rrdcached/db").await;
> +```
> +
> +The default `init()` is synchronous and doesn't require a directory parameter, making tests simpler. Use `init_with_rrd()` for production deployments that need RRD persistence.
> +
> +### Integration with Other Components
> +
> +**FUSE Plugins**:
> +- `.version` plugin reads from Status
> +- `.vmlist` plugin generates VM list from Status
> +- `.members` plugin generates member list from Status
> +- `.rrd` plugin accesses RRD data from Status
> +- `.clusterlog` plugin reads cluster log from Status
> +
> +**DFSM Status Sync**:
> +- `StatusSyncService` (pmxcfs-dfsm) broadcasts status updates
> +- Uses `pve_kvstore_v1` CPG group
> +- KV store data synchronized across nodes
> +
> +**IPC Server**:
> +- `set_status` IPC call updates Status
> +- Used by `pvecm`/`pvenode` tools
> +- RRD data received via IPC
> +
> +**MemDb Integration**:
> +- Scans VM configs to populate vmlist
> +- Tracks version changes on file modifications
> +- Used for `.version` plugin timestamps
> +
> +## Architecture
> +
> +### Module Structure
> +
> +| Module | Purpose |
> +|--------|---------|
> +| `lib.rs` | Public API and initialization |
> +| `status.rs` | Core Status struct and operations |
> +| `types.rs` | Type definitions (ClusterNode, ClusterInfo, etc.) |
> +
> +### Key Features
> +
> +**Thread-Safe**: All operations use `RwLock` or `AtomicU64` for concurrent access
> +**Version Tracking**: Monotonically increasing counters for change detection
> +**Structured Logging**: Field-based tracing for better observability
> +**Optional RRD**: RRD persistence is opt-in, simplifying testing
> +
> +## C to Rust Mapping
> +
> +### Data Structures
> +
> +| C Type | Rust Type | Notes |
> +|--------|-----------|-------|
> +| `cfs_status_t` | `Status` | Main status container |
> +| `cfs_clinfo_t` | `ClusterInfo` | Cluster membership info |
> +| `cfs_clnode_t` | `ClusterNode` | Individual node info |
> +| `vminfo_t` | `VmEntry` | VM/CT registry entry (in pmxcfs-api-types) |
> +| `clog_entry_t` | `ClusterLogEntry` | Cluster log entry |
> +
> +### Core Functions
> +
> +| C Function | Rust Equivalent | Notes |
> +|-----------|-----------------|-------|
> +| `cfs_status_init()` | `init()` or `init_with_rrd()` | Two variants for flexibility |
> +| `cfs_set_quorate()` | `Status::set_quorate()` | Quorum tracking |
> +| `cfs_is_quorate()` | `Status::is_quorate()` | Quorum checking |
> +| `vmlist_register_vm()` | `Status::register_vm()` | VM registration |
> +| `vmlist_delete_vm()` | `Status::delete_vm()` | VM deletion |
> +| `cfs_status_set()` | `Status::set_node_status()` | Status updates (including RRD) |
> +
> +## Key Differences from C Implementation
> +
> +### RRD Decoupling
> +
> +**C Version (status.c)**:
> +- RRD code embedded in status.c
> +- Async initialization always required
> +
> +**Rust Version**:
> +- Separate `pmxcfs-rrd` crate
> +- `init()` is synchronous (no RRD)
> +- `init_with_rrd()` is async (with RRD)
> +- Tests don't need temp directories
> +
> +### Concurrency
> +
> +**C Version**:
> +- Single `GMutex` for entire status structure
> +
> +**Rust Version**:
> +- Fine-grained `RwLock` for different data structures
> +- `AtomicU64` for version counters
> +- Better read parallelism
> +
> +## Configuration File Tracking
> +
> +Status tracks version numbers for these common Proxmox config files:
> +
> +- `corosync.conf`, `corosync.conf.new`
> +- `storage.cfg`, `user.cfg`, `domains.cfg`
> +- `datacenter.cfg`, `vzdump.cron`, `vzdump.conf`
> +- `ha/` directory files (crm_commands, manager_status, resources.cfg, etc.)
> +- `sdn/` directory files (vnets.cfg, zones.cfg, controllers.cfg, etc.)
> +- And many more (see `Status::new()` in status.rs for complete list)
> +
> +## References
> +
> +### C Implementation
> +- `src/pmxcfs/status.c` / `status.h` - Status tracking
> +
> +### Related Crates
> +- **pmxcfs-rrd**: RRD file persistence
> +- **pmxcfs-dfsm**: Status synchronization via StatusSyncService
> +- **pmxcfs-logger**: Cluster log implementation
> +- **pmxcfs**: FUSE plugins that read from Status
> diff --git a/src/pmxcfs-rs/pmxcfs-status/src/lib.rs b/src/pmxcfs-rs/pmxcfs-status/src/lib.rs
> new file mode 100644
> index 00000000..282e007d
> --- /dev/null
> +++ b/src/pmxcfs-rs/pmxcfs-status/src/lib.rs
> @@ -0,0 +1,54 @@
> +/// Status information and monitoring
> +///
> +/// This module manages:
> +/// - Cluster membership (nodes, IPs, online status)
> +/// - RRD (Round Robin Database) data for metrics
> +/// - Cluster log
> +/// - Node status information
> +/// - VM/CT list tracking
> +mod status;
> +mod traits;
> +mod types;
> +
> +// Re-export public types
> +pub use pmxcfs_api_types::{VmEntry, VmType};
> +pub use types::{ClusterInfo, ClusterLogEntry, ClusterNode, NodeStatus};
> +
> +// Re-export Status struct and trait
> +pub use status::Status;
> +pub use traits::{BoxFuture, MockStatus, StatusOps};
> +
> +use std::sync::Arc;
> +
> +/// Initialize status subsystem without RRD persistence
> +///
> +/// This is the default initialization that creates a Status instance
> +/// without file-based RRD persistence. RRD data will be kept in memory only.
> +pub fn init() -> Arc<Status> {
> +    tracing::info!("Status subsystem initialized (RRD persistence disabled)");
> +    Arc::new(Status::new(None))
> +}
> +
> +/// Initialize status subsystem with RRD file persistence
> +///
> +/// Creates a Status instance with RRD data written to disk in the specified directory.
> +/// This requires the RRD directory to exist and be writable.
> +pub async fn init_with_rrd<P: AsRef<std::path::Path>>(rrd_dir: P) -> Arc<Status> {
> +    let rrd_dir_path = rrd_dir.as_ref();
> +    let rrd_writer = match pmxcfs_rrd::RrdWriter::new(rrd_dir_path).await {
> +        Ok(writer) => {
> +            tracing::info!(
> +                directory = %rrd_dir_path.display(),
> +                "RRD file persistence enabled"
> +            );
> +            Some(writer)
> +        }
> +        Err(e) => {
> +            tracing::warn!(error = %e, "RRD file persistence disabled");
> +            None
> +        }
> +    };
> +
> +    tracing::info!("Status subsystem initialized");
> +    Arc::new(Status::new(rrd_writer))
> +}
> diff --git a/src/pmxcfs-rs/pmxcfs-status/src/status.rs b/src/pmxcfs-rs/pmxcfs-status/src/status.rs
> new file mode 100644
> index 00000000..94b6483d
> --- /dev/null
> +++ b/src/pmxcfs-rs/pmxcfs-status/src/status.rs
> @@ -0,0 +1,1561 @@
> +/// Status subsystem implementation
> +use crate::types::{ClusterInfo, ClusterLogEntry, ClusterNode, NodeStatus, RrdEntry};
> +use anyhow::Result;
> +use parking_lot::RwLock;
> +use pmxcfs_api_types::{VmEntry, VmType};
> +use std::collections::HashMap;
> +use std::sync::Arc;
> +use std::sync::atomic::{AtomicU64, Ordering};
> +use std::time::{SystemTime, UNIX_EPOCH};
> +
> +/// Status subsystem (matches C implementation's cfs_status_t)
> +pub struct Status {
> +    /// Cluster information (nodes, membership) - matches C's clinfo
> +    cluster_info: RwLock<Option<ClusterInfo>>,
> +
> +    /// Cluster info version counter - increments on membership changes (matches C's clinfo_version)
> +    cluster_version: AtomicU64,

This field is used as a change counter in multiple places, but
update_cluster_info() overwrites it with the config version. C keeps
these apart as clinfo_version and cman_version. We need two separate
fields here as well, otherwise update_cluster_info() clobbers the
monotonic change counter that other call sites depend on.
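
A rough sketch of the split I have in mind. The struct name, method
names and initial values are hypothetical, and I am assuming that a
config update should also bump the change counter:

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical replacement for the single `cluster_version` field.
pub struct ClusterVersions {
    /// Monotonic change counter (C: clinfo_version).
    clinfo_version: AtomicU64,
    /// Config version reported by corosync (C: cman_version).
    cman_version: AtomicU64,
}

impl ClusterVersions {
    pub fn new() -> Self {
        Self {
            clinfo_version: AtomicU64::new(1),
            cman_version: AtomicU64::new(0),
        }
    }

    /// Bump the change counter (membership or online status changed).
    pub fn bump(&self) {
        self.clinfo_version.fetch_add(1, Ordering::SeqCst);
    }

    /// Record a new config version without touching the counter's
    /// monotonicity (assumption: a config update counts as a change).
    pub fn set_config_version(&self, config_version: u64) {
        self.cman_version.store(config_version, Ordering::SeqCst);
        self.bump();
    }

    pub fn change_counter(&self) -> u64 {
        self.clinfo_version.load(Ordering::SeqCst)
    }

    pub fn config_version(&self) -> u64 {
        self.cman_version.load(Ordering::SeqCst)
    }
}
```

With this, update_cluster_info() would call set_config_version() and
all other call sites would keep using bump().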

> +
> +    /// VM list version counter - increments when VM list changes (matches C's vmlist_version)
> +    vmlist_version: AtomicU64,
> +
> +    /// MemDB path version counters (matches C's memdb_change_array)
> +    /// Tracks versions for specific config files like "corosync.conf", "user.cfg", etc.
> +    memdb_path_versions: RwLock<HashMap<String, AtomicU64>>,
> +
> +    /// Node status data by name
> +    node_status: RwLock<HashMap<String, NodeStatus>>,
> +
> +    /// Cluster log with ring buffer and deduplication (matches C's clusterlog_t)
> +    cluster_log: pmxcfs_logger::ClusterLog,
> +
> +    /// RRD entries by key (e.g., "pve2-node/nodename" or "pve2.3-vm/vmid")
> +    pub(crate) rrd_data: RwLock<HashMap<String, RrdEntry>>,
> +
> +    /// RRD file writer for persistent storage (using tokio RwLock for async compatibility)
> +    rrd_writer: Option<Arc<tokio::sync::RwLock<pmxcfs_rrd::RrdWriter>>>,
> +
> +    /// VM/CT list (vmid -> VmEntry)
> +    vmlist: RwLock<HashMap<u32, VmEntry>>,
> +
> +    /// Quorum status (matches C's cfs_status.quorate)
> +    quorate: RwLock<bool>,
> +
> +    /// Current cluster members (CPG membership)
> +    members: RwLock<Vec<pmxcfs_api_types::MemberInfo>>,
> +
> +    /// Daemon start timestamp (UNIX epoch) - for .version plugin
> +    start_time: u64,
> +
> +    /// KV store data from nodes (nodeid -> key -> value)
> +    /// Matches C implementation's kvhash
> +    kvstore: RwLock<HashMap<u32, HashMap<String, Vec<u8>>>>,

C removes a kvstore entry when len == 0 and maintains a per-key
entry->version counter (incremented on overwrite). Our kvstore
currently stores only a Vec<u8> per key and doesn't reflect these
semantics.
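
Something along these lines could work. This is only a sketch: KvEntry,
NodeKv and kv_set are made-up names, and the starting version of 1 is
my assumption, so it should be checked against status.c:

```rust
use std::collections::HashMap;

/// Hypothetical per-key entry carrying a version next to the payload.
#[derive(Debug, Clone, PartialEq)]
pub struct KvEntry {
    pub data: Vec<u8>,
    pub version: u32,
}

/// Per-node key/value map (the inner map of `kvstore`).
pub type NodeKv = HashMap<String, KvEntry>;

/// Set a key: an empty value removes the entry, an overwrite bumps
/// the per-key version.
pub fn kv_set(kv: &mut NodeKv, key: &str, value: Vec<u8>) {
    if value.is_empty() {
        kv.remove(key);
        return;
    }
    match kv.get_mut(key) {
        Some(entry) => {
            entry.data = value;
            entry.version = entry.version.wrapping_add(1);
        }
        None => {
            // Assumed initial version, not verified against C.
            kv.insert(key.to_string(), KvEntry { data: value, version: 1 });
        }
    }
}
```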

> +}
> +
> +impl Status {
> +    /// Create a new Status instance
> +    ///
> +    /// For production use with RRD persistence, use `pmxcfs_status::init_with_rrd()`.
> +    /// For tests or when RRD persistence is not needed, use `pmxcfs_status::init()`.
> +    /// This constructor is public to allow custom initialization patterns.
> +    pub fn new(rrd_writer: Option<pmxcfs_rrd::RrdWriter>) -> Self {
> +        // Wrap RrdWriter in Arc<tokio::sync::RwLock> if provided (for async compatibility)
> +        let rrd_writer = rrd_writer.map(|w| Arc::new(tokio::sync::RwLock::new(w)));
> +
> +        // Initialize memdb path versions for common Proxmox config files
> +        // Matches C implementation's memdb_change_array (status.c:79-120)
> +        // These are the exact paths tracked by the C implementation
> +        let mut path_versions = HashMap::new();
> +        let common_paths = vec![
> +            "corosync.conf",
> +            "corosync.conf.new",
> +            "storage.cfg",
> +            "user.cfg",
> +            "domains.cfg",
> +            "notifications.cfg",
> +            "priv/notifications.cfg",
> +            "priv/shadow.cfg",
> +            "priv/acme/plugins.cfg",
> +            "priv/tfa.cfg",
> +            "priv/token.cfg",
> +            "datacenter.cfg",
> +            "vzdump.cron",
> +            "vzdump.conf",
> +            "jobs.cfg",
> +            "ha/crm_commands",
> +            "ha/manager_status",
> +            "ha/resources.cfg",
> +            "ha/rules.cfg",
> +            "ha/groups.cfg",
> +            "ha/fence.cfg",
> +            "status.cfg",
> +            "replication.cfg",
> +            "ceph.conf",
> +            "sdn/vnets.cfg",
> +            "sdn/zones.cfg",
> +            "sdn/controllers.cfg",
> +            "sdn/subnets.cfg",
> +            "sdn/ipams.cfg",
> +            "sdn/mac-cache.json",            // SDN MAC address cache
> +            "sdn/pve-ipam-state.json",       // SDN IPAM state
> +            "sdn/dns.cfg",                   // SDN DNS configuration
> +            "sdn/fabrics.cfg",               // SDN fabrics configuration
> +            "sdn/.running-config",           // SDN running configuration
> +            "virtual-guest/cpu-models.conf", // Virtual guest CPU models
> +            "virtual-guest/profiles.cfg",    // Virtual guest profiles
> +            "firewall/cluster.fw",           // Cluster firewall rules
> +            "mapping/directory.cfg",         // Directory mappings
> +            "mapping/pci.cfg",               // PCI device mappings
> +            "mapping/usb.cfg",               // USB device mappings
> +        ];
> +
> +        for path in common_paths {
> +            path_versions.insert(path.to_string(), AtomicU64::new(0));
> +        }
> +
> +        // Get start time (matches C implementation's cfs_status.start_time)
> +        let start_time = SystemTime::now()
> +            .duration_since(UNIX_EPOCH)
> +            .unwrap_or_default()
> +            .as_secs();
> +
> +        Self {
> +            cluster_info: RwLock::new(None),
> +            cluster_version: AtomicU64::new(1),
> +            vmlist_version: AtomicU64::new(1),
> +            memdb_path_versions: RwLock::new(path_versions),
> +            node_status: RwLock::new(HashMap::new()),
> +            cluster_log: pmxcfs_logger::ClusterLog::new(),
> +            rrd_data: RwLock::new(HashMap::new()),
> +            rrd_writer,
> +            vmlist: RwLock::new(HashMap::new()),
> +            quorate: RwLock::new(false),
> +            members: RwLock::new(Vec::new()),
> +            start_time,
> +            kvstore: RwLock::new(HashMap::new()),
> +        }
> +    }
> +
> +    /// Get node status
> +    pub fn get_node_status(&self, name: &str) -> Option<NodeStatus> {
> +        self.node_status.read().get(name).cloned()
> +    }
> +
> +    /// Set node status (matches C implementation's cfs_status_set)
> +    ///
> +    /// This handles status updates received via IPC from external clients.
> +    /// If the key starts with "rrd/", it's RRD data that should be written to disk.
> +    /// Otherwise, it's generic node status data.
> +    pub async fn set_node_status(&self, name: String, data: Vec<u8>) -> Result<()> {

We need to check the payload length against CFS_MAX_STATUS_SIZE here,
to avoid accepting unbounded payloads (and to avoid possible state
divergence from C).
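
A minimal sketch of such a guard. The constant value below is what I
remember from status.h and needs to be verified; the function name and
error type are placeholders:

```rust
/// Assumed to mirror C's CFS_MAX_STATUS_SIZE (verify against status.h).
pub const MAX_STATUS_SIZE: usize = 32 * 1024;

/// Reject payloads above the limit before they are stored anywhere.
pub fn check_status_size(data: &[u8]) -> Result<(), String> {
    if data.len() > MAX_STATUS_SIZE {
        return Err(format!(
            "status payload too large: {} > {} bytes",
            data.len(),
            MAX_STATUS_SIZE
        ));
    }
    Ok(())
}
```

This would be the first statement in set_node_status(), before the
"rrd/" prefix check.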

> +        // Check if this is RRD data (matching C's cfs_status_set behavior)
> +        if let Some(rrd_key) = name.strip_prefix("rrd/") {
> +            // Strip "rrd/" prefix to get the actual RRD key
> +            // Convert data to string (RRD data is text format)
> +            let data_str = String::from_utf8(data)
> +                .map_err(|e| anyhow::anyhow!("Invalid UTF-8 in RRD data: {e}"))?;

We need to strip the trailing \0 here: C payloads are NUL-terminated
and from_utf8 preserves the terminator, so it would otherwise end up
in the RRD dump output.
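
For example (helper name is hypothetical), dropping trailing NUL bytes
before the UTF-8 conversion:

```rust
/// Convert a C-style payload into a String, dropping any trailing
/// NUL terminator(s) before validating UTF-8.
pub fn rrd_payload_to_string(mut data: Vec<u8>) -> Result<String, std::string::FromUtf8Error> {
    while data.last() == Some(&0) {
        data.pop();
    }
    String::from_utf8(data)
}
```

Payloads without a terminator pass through unchanged, so this is safe
for Rust-side callers as well.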

> +
> +            // Write to RRD (stores in memory and writes to disk)
> +            self.set_rrd_data(rrd_key.to_string(), data_str).await?;
> +        } else {

The nodeip handling is missing here; C has a dedicated branch for it.
The backing data structure (iphash) is also missing.

> +            // Regular node status (not RRD)
> +            let now = SystemTime::now().duration_since(UNIX_EPOCH)?.as_secs();
> +            let status = NodeStatus {
> +                name: name.clone(),
> +                data,
> +                timestamp: now,
> +            };
> +            self.node_status.write().insert(name, status);
> +        }
> +
> +        Ok(())
> +    }
> +
> +    /// Add cluster log entry
> +    pub fn add_log_entry(&self, entry: ClusterLogEntry) {
> +        // Convert ClusterLogEntry to ClusterLog format and add
> +        // The ClusterLog handles size limits and deduplication internally
> +        let _ = self.cluster_log.add(
> +            &entry.node,
> +            &entry.ident,
> +            &entry.tag,
> +            0, // pid not tracked in our entries
> +            entry.priority,
> +            entry.timestamp as u32,
> +            &entry.message,
> +        );
> +    }
> +
> +    /// Get cluster log entries
> +    pub fn get_log_entries(&self, max: usize) -> Vec<ClusterLogEntry> {
> +        // Get entries from ClusterLog and convert to ClusterLogEntry
> +        self.cluster_log
> +            .get_entries(max)
> +            .into_iter()
> +            .map(|entry| ClusterLogEntry {
> +                timestamp: entry.time as u64,
> +                node: entry.node,
> +                priority: entry.priority,
> +                ident: entry.ident,
> +                tag: entry.tag,
> +                message: entry.message,
> +            })
> +            .collect()
> +    }
> +
> +    /// Clear all cluster log entries (for testing)
> +    pub fn clear_cluster_log(&self) {
> +        self.cluster_log.clear();
> +    }
> +
> +    /// Set RRD data (C-compatible format)
> +    /// Key format: "pve2-node/{nodename}" or "pve2.3-vm/{vmid}"
> +    /// Data format: "{timestamp}:{val1}:{val2}:..."
> +    pub async fn set_rrd_data(&self, key: String, data: String) -> Result<()> {
> +        let now = SystemTime::now()
> +            .duration_since(UNIX_EPOCH)
> +            .unwrap_or_default()
> +            .as_secs();
> +
> +        let entry = RrdEntry {
> +            key: key.clone(),
> +            data: data.clone(),
> +            timestamp: now,
> +        };
> +
> +        // Store in memory for .rrd plugin file
> +        self.rrd_data.write().insert(key.clone(), entry);
> +
> +        // Also write to RRD file on disk (if persistence is enabled)
> +        if let Some(writer_lock) = &self.rrd_writer {
> +            let mut writer = writer_lock.write().await;
> +            writer.update(&key, &data).await?;
> +            tracing::trace!("Updated RRD file: {} -> {}", key, data);
> +        }
> +
> +        Ok(())
> +    }
> +
> +    /// Remove old RRD entries (older than 5 minutes)
> +    pub fn remove_old_rrd_data(&self) {
> +        let now = SystemTime::now()
> +            .duration_since(UNIX_EPOCH)
> +            .unwrap_or_default()
> +            .as_secs();
> +
> +        const EXPIRE_SECONDS: u64 = 60 * 5; // 5 minutes
> +
> +        self.rrd_data
> +            .write()
> +            .retain(|_, entry| now - entry.timestamp <= EXPIRE_SECONDS);

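If the system clock jumps backwards, now can be less than
entry.timestamp and the u64 subtraction underflows: a panic in debug
builds, and in release builds the wrapped difference is huge, so the
entry is expired immediately.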
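
saturating_sub() avoids both outcomes. A small sketch, with is_fresh as
a made-up helper name:

```rust
pub const EXPIRE_SECONDS: u64 = 60 * 5; // 5 minutes

/// True if the entry should be kept. A timestamp in the future
/// (clock jumped backwards) yields an age of 0 instead of underflowing.
pub fn is_fresh(now: u64, timestamp: u64) -> bool {
    now.saturating_sub(timestamp) <= EXPIRE_SECONDS
}
```

The retain() closure would then become
|_, entry| is_fresh(now, entry.timestamp).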

> +    }
> +
> +    /// Get RRD data dump (text format matching C implementation)
> +    pub fn get_rrd_dump(&self) -> String {

This rebuilds the dump on every call and runs remove_old_rrd_data()
under the write lock each time. The result could be cached for a short
period to improve performance, similar to what C does.
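
One possible shape for such a cache. DumpCache is a hypothetical type,
and the TTL is left to the caller since I have not checked which
interval C uses:

```rust
use std::time::{Duration, Instant};

/// Hypothetical time-based cache for the rendered RRD dump.
pub struct DumpCache {
    ttl: Duration,
    cached: Option<(Instant, String)>,
}

impl DumpCache {
    pub fn new(ttl: Duration) -> Self {
        Self { ttl, cached: None }
    }

    /// Return the cached dump if it is younger than `ttl`, otherwise
    /// call `rebuild` and cache its result.
    pub fn get_or_rebuild<F: FnOnce() -> String>(&mut self, now: Instant, rebuild: F) -> String {
        if let Some((built_at, dump)) = &self.cached {
            if now.duration_since(*built_at) < self.ttl {
                return dump.clone();
            }
        }
        let dump = rebuild();
        self.cached = Some((now, dump.clone()));
        dump
    }
}
```

get_rrd_dump() would keep this behind a lock and pass the current
expire-and-format logic as the rebuild closure, so the write lock on
rrd_data is only taken when the cache has expired.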

> +        // Remove old entries first
> +        self.remove_old_rrd_data();
> +
> +        let rrd = self.rrd_data.read();
> +        let mut result = String::new();
> +
> +        for entry in rrd.values() {
> +            result.push_str(&entry.key);
> +            result.push(':');
> +            result.push_str(&entry.data);
> +            result.push('\n');
> +        }
> +
> +        result
> +    }
> +
> +    /// Collect disk I/O statistics (bytes read, bytes written)
> +    ///
> +    /// Note: This is for future VM RRD implementation. Per C implementation:
> +    /// - Node RRD (rrd_def_node) has 12 fields and does NOT include diskread/diskwrite
> +    /// - VM RRD (rrd_def_vm) has 10 fields and DOES include diskread/diskwrite at indices 8-9
> +    ///
> +    /// This method will be used when implementing VM RRD collection.
> +    ///
> +    /// # Sector Size
> +    /// The Linux kernel reports disk statistics in /proc/diskstats using 512-byte sectors
> +    /// as the standard unit, regardless of the device's actual physical sector size.
> +    /// This is a kernel reporting convention (see Documentation/admin-guide/iostats.rst).
> +    #[allow(dead_code)]
> +    fn collect_disk_io() -> Result<(u64, u64)> {
> +        // /proc/diskstats always uses 512-byte sectors (kernel convention)
> +        const DISKSTATS_SECTOR_SIZE: u64 = 512;
> +
> +        let diskstats = procfs::diskstats()?;
> +
> +        let mut total_read = 0u64;
> +        let mut total_write = 0u64;
> +
> +        for stat in diskstats {
> +            // Skip partitions (only look at whole disks: sda, vda, etc.)
> +            if stat
> +                .name
> +                .chars()
> +                .last()
> +                .map(|c| c.is_numeric())
> +                .unwrap_or(false)
> +            {
> +                continue;
> +            }
> +
> +            // Convert sectors to bytes using kernel's reporting unit
> +            total_read += stat.sectors_read * DISKSTATS_SECTOR_SIZE;
> +            total_write += stat.sectors_written * DISKSTATS_SECTOR_SIZE;
> +        }
> +
> +        Ok((total_read, total_write))
> +    }
> +
> +    /// Register a VM/CT
> +    pub fn register_vm(&self, vmid: u32, vmtype: VmType, node: String) {
> +        tracing::debug!(vmid, vmtype = ?vmtype, node = %node, "Registered VM");
> +
> +        // Get existing VM version or start at 1
> +        let version = self
> +            .vmlist
> +            .read()
> +            .get(&vmid)
> +            .map(|vm| vm.version + 1)

In C we have the global static uint32_t vminfo_version_counter.
Here we have per-VM counters. Why the difference?
Wouldn't it be more helpful to have a global order here as well, so
that we can determine the update order of VMs from it?

> +            .unwrap_or(1);
> +
> +        let entry = VmEntry {
> +            vmid,
> +            vmtype,
> +            node,
> +            version,
> +        };
> +        self.vmlist.write().insert(vmid, entry);

Between the read() and the write() we have a TOCTOU window, similar to
the one in set_quorate().
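
A sketch that takes the version from a global counter while the write
lock is held, which would address both points. It uses std's RwLock
and a trimmed-down VmEntry to stay self-contained, and the static's
name and starting value are assumptions:

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicU32, Ordering};
use std::sync::RwLock;

/// Hypothetical global counter, in the spirit of C's
/// vminfo_version_counter.
static VMINFO_VERSION_COUNTER: AtomicU32 = AtomicU32::new(0);

/// Trimmed-down stand-in for pmxcfs_api_types::VmEntry.
pub struct VmEntry {
    pub node: String,
    pub version: u32,
}

/// Register a VM: version assignment and insert happen under one
/// write lock, so there is no window between check and update.
pub fn register_vm(vmlist: &RwLock<HashMap<u32, VmEntry>>, vmid: u32, node: String) -> u32 {
    let mut guard = vmlist.write().unwrap();
    let version = VMINFO_VERSION_COUNTER.fetch_add(1, Ordering::SeqCst) + 1;
    guard.insert(vmid, VmEntry { node, version });
    version
}
```

Since the versions come from one counter, comparing two entries tells
us which one was updated last.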

> +
> +        // Increment vmlist version counter
> +        self.increment_vmlist_version();
> +    }
> +
> +    /// Delete a VM/CT
> +    pub fn delete_vm(&self, vmid: u32) {
> +        if self.vmlist.write().remove(&vmid).is_some() {

This should bump the vmlist version unconditionally to match C.
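
Roughly like this (free function and simplified map type only for
illustration):

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicU64, Ordering};

/// Remove a VM and bump the list version whether or not it existed.
/// Returns true if an entry was actually removed.
pub fn delete_vm(
    vmlist: &mut HashMap<u32, String>,
    vmlist_version: &AtomicU64,
    vmid: u32,
) -> bool {
    let removed = vmlist.remove(&vmid).is_some();
    // Unconditional bump, independent of `removed`.
    vmlist_version.fetch_add(1, Ordering::SeqCst);
    removed
}
```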

> +            tracing::debug!(vmid, "Deleted VM");
> +
> +            // Increment vmlist version counter
> +            self.increment_vmlist_version();
> +        }
> +    }
> +
> +    /// Check if VM/CT exists
> +    pub fn vm_exists(&self, vmid: u32) -> bool {
> +        self.vmlist.read().contains_key(&vmid)
> +    }
> +
> +    /// Check if a different VM/CT exists (different node or type)
> +    pub fn different_vm_exists(&self, vmid: u32, vmtype: VmType, node: &str) -> bool {
> +        if let Some(entry) = self.vmlist.read().get(&vmid) {
> +            entry.vmtype != vmtype || entry.node != node
> +        } else {
> +            false
> +        }
> +    }
> +
> +    /// Get VM list
> +    pub fn get_vmlist(&self) -> HashMap<u32, VmEntry> {
> +        self.vmlist.read().clone()
> +    }
> +
> +    /// Scan directories for VMs/CTs and update vmlist
> +    ///
> +    /// Uses memdb's `recreate_vmlist()` to properly scan nodes/*/qemu-server/
> +    /// and nodes/*/lxc/ directories to track which node each VM belongs to.
> +    pub fn scan_vmlist(&self, memdb: &pmxcfs_memdb::MemDb) {
> +        // Use the proper recreate_vmlist from memdb which scans nodes/*/qemu-server/ and nodes/*/lxc/
> +        match pmxcfs_memdb::recreate_vmlist(memdb) {
> +            Ok(new_vmlist) => {
> +                let vmlist_len = new_vmlist.len();
> +                let mut vmlist = self.vmlist.write();
> +                *vmlist = new_vmlist;

This replaces the entire HashMap, which resets all per-VM version
counters.

> +                drop(vmlist);
> +
> +                tracing::info!(vms = vmlist_len, "VM list scan complete");
> +
> +                // Increment vmlist version counter
> +                self.increment_vmlist_version();
> +            }
> +            Err(err) => {
> +                tracing::error!(error = %err, "Failed to recreate vmlist");
> +            }
> +        }
> +    }
> +
> +    /// Initialize cluster information with cluster name
> +    pub fn init_cluster(&self, cluster_name: String) {
> +        let info = ClusterInfo::new(cluster_name);
> +        *self.cluster_info.write() = Some(info);
> +        self.cluster_version.fetch_add(1, Ordering::SeqCst);
> +    }
> +
> +    /// Register a node in the cluster (name, ID, IP)
> +    pub fn register_node(&self, node_id: u32, name: String, ip: String) {
> +        tracing::debug!(node_id, node = %name, ip = %ip, "Registering cluster node");
> +
> +        let mut cluster_info = self.cluster_info.write();
> +        if let Some(ref mut info) = *cluster_info {
> +            let node = ClusterNode {
> +                name,
> +                node_id,
> +                ip,
> +                online: false, // Will be updated by cluster module
> +            };
> +            info.add_node(node);
> +            self.cluster_version.fetch_add(1, Ordering::SeqCst);
> +        }
> +    }
> +
> +    /// Get cluster information (for .members plugin)
> +    pub fn get_cluster_info(&self) -> Option<ClusterInfo> {
> +        self.cluster_info.read().clone()
> +    }
> +
> +    /// Get cluster version
> +    pub fn get_cluster_version(&self) -> u64 {
> +        self.cluster_version.load(Ordering::SeqCst)
> +    }
> +
> +    /// Increment cluster version (called when membership changes)
> +    pub fn increment_cluster_version(&self) {
> +        self.cluster_version.fetch_add(1, Ordering::SeqCst);
> +    }
> +
> +    /// Update cluster info from CMAP (called by ClusterConfigService)
> +    pub fn update_cluster_info(
> +        &self,
> +        cluster_name: String,
> +        config_version: u64,
> +        nodes: Vec<(u32, String, String)>,
> +    ) -> Result<()> {
> +        let mut cluster_info = self.cluster_info.write();
> +
> +        // Create or update cluster info
> +        let mut info = cluster_info
> +            .take()
> +            .unwrap_or_else(|| ClusterInfo::new(cluster_name.clone()));
> +
> +        // Update cluster name if changed
> +        if info.cluster_name != cluster_name {
> +            info.cluster_name = cluster_name;
> +        }
> +
> +        // Clear existing nodes
> +        info.nodes_by_id.clear();
> +        info.nodes_by_name.clear();
> +
> +        // Add updated nodes
> +        for (nodeid, name, ip) in nodes {
> +            let node = ClusterNode {
> +                name: name.clone(),
> +                node_id: nodeid,
> +                ip,
> +                online: false, // Will be updated by quorum module

This drops online status. C's cfs_status_set_clinfo preserves it by
copying from oldnode. This needs the same treatment here.
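
A sketch of what I mean, using a local copy of ClusterNode and a
hypothetical rebuild_nodes helper that looks up the previous entry by
node ID:

```rust
use std::collections::HashMap;

/// Local stand-in for the ClusterNode type from this patch.
#[derive(Debug, Clone)]
pub struct ClusterNode {
    pub name: String,
    pub node_id: u32,
    pub ip: String,
    pub online: bool,
}

/// Build the new node map, carrying over `online` from the old map
/// for nodes that already existed. New nodes start offline.
pub fn rebuild_nodes(
    old: &HashMap<u32, ClusterNode>,
    nodes: Vec<(u32, String, String)>,
) -> HashMap<u32, ClusterNode> {
    nodes
        .into_iter()
        .map(|(node_id, name, ip)| {
            let online = old.get(&node_id).map(|n| n.online).unwrap_or(false);
            (node_id, ClusterNode { name, node_id, ip, online })
        })
        .collect()
}
```

In update_cluster_info() this means keeping the old nodes_by_id around
until the new nodes have been built, instead of clearing it first.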

> +            };
> +            info.add_node(node);
> +        }

Do we need to clean up the kvstore on node removal?

> +
> +        *cluster_info = Some(info);
> +
> +        // Update version to reflect configuration change
> +        self.cluster_version.store(config_version, Ordering::SeqCst);
> +
> +        tracing::info!(version = config_version, "Updated cluster configuration");
> +        Ok(())
> +    }
> +
> +    /// Update node online status (called by cluster module)
> +    pub fn set_node_online(&self, node_id: u32, online: bool) {
> +        let mut cluster_info = self.cluster_info.write();
> +        if let Some(ref mut info) = *cluster_info
> +            && let Some(node) = info.nodes_by_id.get_mut(&node_id)
> +            && node.online != online
> +        {
> +            node.online = online;
> +            // Also update in nodes_by_name
> +            if let Some(name_node) = info.nodes_by_name.get_mut(&node.name) {
> +                name_node.online = online;
> +            }
> +            self.cluster_version.fetch_add(1, Ordering::SeqCst);
> +            tracing::debug!(
> +                node = %node.name,
> +                node_id,
> +                online = if online { "true" } else { "false" },
> +                "Node online status changed"
> +            );
> +        }
> +    }
> +
> +    /// Check if cluster is quorate (matches C's cfs_is_quorate)
> +    pub fn is_quorate(&self) -> bool {
> +        *self.quorate.read()
> +    }
> +
> +    /// Set quorum status (matches C's cfs_set_quorate)
> +    pub fn set_quorate(&self, quorate: bool) {
> +        let old_quorate = *self.quorate.read();

Between this

> +        *self.quorate.write() = quorate;

and this line we have a TOCTOU window.
The * copies the bool out of the read guard, and the guard is dropped
at the semicolon, so no lock is held between the two statements.
Doing both operations under a single write lock would solve it.
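
For example, as a free function over std's RwLock to keep the sketch
self-contained (parking_lot's write() returns the guard directly, so
the unwrap() goes away there):

```rust
use std::sync::RwLock;

/// Swap in the new quorum state under a single write lock.
/// Returns true if the state actually changed.
pub fn set_quorate(state: &RwLock<bool>, quorate: bool) -> bool {
    let old = {
        let mut guard = state.write().unwrap();
        std::mem::replace(&mut *guard, quorate)
    }; // guard dropped here, after read and write both happened
    old != quorate
}
```

The logging can then stay outside the lock and just branch on the
returned flag.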

> +
> +        if old_quorate != quorate {
> +            if quorate {
> +                tracing::info!("Node has quorum");
> +            } else {
> +                tracing::info!("Node lost quorum");
> +            }
> +        }
> +    }
> +
> +    /// Get current cluster members (CPG membership)
> +    pub fn get_members(&self) -> Vec<pmxcfs_api_types::MemberInfo> {
> +        self.members.read().clone()
> +    }
> +
> +    /// Update cluster members and sync online status (matches C's dfsm_confchg callback)
> +    ///
> +    /// This updates the CPG member list and synchronizes the online status
> +    /// in cluster_info to match current membership.
> +    pub fn update_members(&self, members: Vec<pmxcfs_api_types::MemberInfo>) {
> +        *self.members.write() = members.clone();
> +
> +        // Update online status in cluster_info based on members
> +        // (matches C implementation's dfsm_confchg in status.c:1989-2025)
> +        let mut cluster_info = self.cluster_info.write();
> +        if let Some(ref mut info) = *cluster_info {
> +            // First mark all nodes as offline
> +            for node in info.nodes_by_id.values_mut() {
> +                node.online = false;
> +            }
> +            for node in info.nodes_by_name.values_mut() {
> +                node.online = false;
> +            }
> +
> +            // Then mark active members as online
> +            for member in &members {
> +                if let Some(node) = info.nodes_by_id.get_mut(&member.node_id) {
> +                    node.online = true;
> +                    // Also update in nodes_by_name
> +                    if let Some(name_node) = info.nodes_by_name.get_mut(&node.name) {
> +                        name_node.online = true;
> +                    }
> +                }
> +            }
> +
> +            self.cluster_version.fetch_add(1, Ordering::SeqCst);
> +        }
> +    }
> +
> +    /// Get daemon start timestamp (for .version plugin)
> +    pub fn get_start_time(&self) -> u64 {
> +        self.start_time
> +    }
> +
> +    /// Increment VM list version (matches C's cfs_status.vmlist_version++)
> +    pub fn increment_vmlist_version(&self) {
> +        self.vmlist_version.fetch_add(1, Ordering::SeqCst);
> +    }
> +
> +    /// Get VM list version
> +    pub fn get_vmlist_version(&self) -> u64 {
> +        self.vmlist_version.load(Ordering::SeqCst)
> +    }
> +
> +    /// Increment version for a specific memdb path (matches C's record_memdb_change)
> +    pub fn increment_path_version(&self, path: &str) {
> +        let versions = self.memdb_path_versions.read();
> +        if let Some(counter) = versions.get(path) {
> +            counter.fetch_add(1, Ordering::SeqCst);
> +        }
> +    }
> +
> +    /// Get version for a specific memdb path
> +    pub fn get_path_version(&self, path: &str) -> u64 {
> +        let versions = self.memdb_path_versions.read();
> +        versions
> +            .get(path)
> +            .map(|counter| counter.load(Ordering::SeqCst))
> +            .unwrap_or(0)
> +    }
> +
> +    /// Get all memdb path versions (for .version plugin)
> +    pub fn get_all_path_versions(&self) -> HashMap<String, u64> {
> +        let versions = self.memdb_path_versions.read();
> +        versions
> +            .iter()
> +            .map(|(path, counter)| (path.clone(), counter.load(Ordering::SeqCst)))
> +            .collect()
> +    }
> +
> +    /// Increment ALL configuration file versions (matches C's record_memdb_reload)
> +    ///
> +    /// Called when the entire database is reloaded from cluster peers.
> +    /// This ensures clients know that all configuration files should be re-read.
> +    pub fn increment_all_path_versions(&self) {
> +        let versions = self.memdb_path_versions.read();
> +        for (_, counter) in versions.iter() {
> +            counter.fetch_add(1, Ordering::SeqCst);
> +        }
> +    }
> +
> +    /// Set key-value data from a node (kvstore DFSM)
> +    ///
> +    /// Matches C implementation's cfs_kvstore_node_set in status.c.
> +    /// Stores ephemeral status data like RRD metrics, IP addresses, etc.
> +    pub fn set_node_kv(&self, nodeid: u32, key: String, value: Vec<u8>) {

We accept unknown nodeids here. Maybe something like this would work:

     let cluster_info = self.cluster_info.read();
     match &*cluster_info {
         Some(info) if info.nodes_by_id.contains_key(&nodeid) => {},
         _ => return,
     }
     drop(cluster_info);

Also, shouldn't we have the same three branches here that
set_node_status should have? Basically:

     if let Some(rrd_key) = key.strip_prefix("rrd/") {
        ..
     } else if key == "nodeip" {
        ..
     } else {
        ..
     }

> +        let mut kvstore = self.kvstore.write();
> +        kvstore.entry(nodeid).or_default().insert(key, value);
> +    }
> +
> +    /// Get key-value data from a node
> +    pub fn get_node_kv(&self, nodeid: u32, key: &str) -> Option<Vec<u8>> {
> +        let kvstore = self.kvstore.read();
> +        kvstore.get(&nodeid)?.get(key).cloned()
> +    }
> +
> +    /// Add cluster log entry (called by kvstore DFSM)
> +    ///
> +    /// This is the wrapper for kvstore LOG messages.
> +    /// Matches C implementation's clusterlog_insert call.
> +    pub fn add_cluster_log(
> +        &self,
> +        timestamp: u32,
> +        priority: u8,
> +        tag: String,
> +        node: String,
> +        message: String,
> +    ) {
> +        let entry = ClusterLogEntry {
> +            timestamp: timestamp as u64,
> +            node,
> +            priority,
> +            ident: String::new(), // Not used in kvstore messages
> +            tag,
> +            message,
> +        };
> +        self.add_log_entry(entry);
> +    }
> +
> +    /// Update node online status based on CPG membership (kvstore DFSM confchg callback)
> +    ///
> +    /// This is called when kvstore CPG membership changes.
> +    /// Matches C implementation's dfsm_confchg in status.c.
> +    pub fn update_member_status(&self, member_list: &[u32]) {
> +        let mut cluster_info = self.cluster_info.write();
> +        if let Some(ref mut info) = *cluster_info {
> +            // Mark all nodes as offline
> +            for node in info.nodes_by_id.values_mut() {
> +                node.online = false;
> +            }
> +            for node in info.nodes_by_name.values_mut() {
> +                node.online = false;
> +            }
> +
> +            // Mark nodes in member_list as online
> +            for &nodeid in member_list {
> +                if let Some(node) = info.nodes_by_id.get_mut(&nodeid) {
> +                    node.online = true;
> +                    // Also update in nodes_by_name
> +                    if let Some(name_node) = info.nodes_by_name.get_mut(&node.name) {
> +                        name_node.online = true;
> +                    }
> +                }
> +            }
> +
> +            self.cluster_version.fetch_add(1, Ordering::SeqCst);
> +        }
> +    }
> +
> +    /// Get cluster log state (for DFSM synchronization)
> +    ///
> +    /// Returns the cluster log in C-compatible binary format (clog_base_t).
> +    /// Matches C implementation's clusterlog_get_state() in logger.c:553-571.
> +    pub fn get_cluster_log_state(&self) -> Result<Vec<u8>> {
> +        self.cluster_log.get_state()
> +    }
> +
> +    /// Merge cluster log states from remote nodes
> +    ///
> +    /// Deserializes binary states from remote nodes and merges them with the local log.
> +    /// Matches C implementation's dfsm_process_state_update() in status.c:2049-2074.
> +    pub fn merge_cluster_log_states(
> +        &self,
> +        states: &[pmxcfs_api_types::NodeSyncInfo],
> +    ) -> Result<()> {
> +        use pmxcfs_logger::ClusterLog;
> +
> +        let mut remote_logs = Vec::new();
> +
> +        for state_info in states {
> +            // Check if this node has state data
> +            let state_data = match &state_info.state {
> +                Some(data) if !data.is_empty() => data,
> +                _ => continue,
> +            };
> +
> +            match ClusterLog::deserialize_state(state_data) {
> +                Ok(ring_buffer) => {
> +                    tracing::debug!(
> +                        "Deserialized cluster log from node {}: {} entries",
> +                        state_info.nodeid,
> +                        ring_buffer.len()
> +                    );
> +                    remote_logs.push(ring_buffer);
> +                }
> +                Err(e) => {
> +                    tracing::warn!(
> +                        nodeid = state_info.nodeid,
> +                        error = %e,
> +                        "Failed to deserialize cluster log from node"
> +                    );
> +                }
> +            }
> +        }
> +
> +        if !remote_logs.is_empty() {
> +            // Merge remote logs with local log (include_local = true)
> +            match self.cluster_log.merge(remote_logs, true) {
> +                Ok(merged) => {
> +                    // Update our buffer with the merged result
> +                    self.cluster_log.update_buffer(merged);
> +                    tracing::debug!("Successfully merged cluster logs");
> +                }
> +                Err(e) => {
> +                    tracing::error!(error = %e, "Failed to merge cluster logs");
> +                }
> +            }
> +        }
> +
> +        Ok(())
> +    }
> +
> +    /// Add cluster log entry from remote node (kvstore LOG message)
> +    ///
> +    /// Matches C implementation's clusterlog_insert() via kvstore message handling.
> +    pub fn add_remote_cluster_log(
> +        &self,
> +        time: u32,
> +        priority: u8,
> +        node: String,
> +        ident: String,
> +        tag: String,
> +        message: String,
> +    ) -> Result<()> {
> +        self.cluster_log
> +            .add(&node, &ident, &tag, 0, priority, time, &message)?;
> +        Ok(())
> +    }
> +}
> +
> +// Implement StatusOps trait for Status
> +impl crate::traits::StatusOps for Status {
> +    fn get_node_status(&self, name: &str) -> Option<NodeStatus> {
> +        self.get_node_status(name)
> +    }
> +
> +    fn set_node_status<'a>(
> +        &'a self,
> +        name: String,
> +        data: Vec<u8>,
> +    ) -> crate::traits::BoxFuture<'a, Result<()>> {
> +        Box::pin(self.set_node_status(name, data))
> +    }
> +
> +    fn add_log_entry(&self, entry: ClusterLogEntry) {
> +        self.add_log_entry(entry)
> +    }
> +
> +    fn get_log_entries(&self, max: usize) -> Vec<ClusterLogEntry> {
> +        self.get_log_entries(max)
> +    }
> +
> +    fn clear_cluster_log(&self) {
> +        self.clear_cluster_log()
> +    }
> +
> +    fn add_cluster_log(
> +        &self,
> +        timestamp: u32,
> +        priority: u8,
> +        tag: String,
> +        node: String,
> +        msg: String,
> +    ) {
> +        self.add_cluster_log(timestamp, priority, tag, node, msg)
> +    }
> +
> +    fn get_cluster_log_state(&self) -> Result<Vec<u8>> {
> +        self.get_cluster_log_state()
> +    }
> +
> +    fn merge_cluster_log_states(&self, states: &[pmxcfs_api_types::NodeSyncInfo]) -> Result<()> {
> +        self.merge_cluster_log_states(states)
> +    }
> +
> +    fn add_remote_cluster_log(
> +        &self,
> +        time: u32,
> +        priority: u8,
> +        node: String,
> +        ident: String,
> +        tag: String,
> +        message: String,
> +    ) -> Result<()> {
> +        self.add_remote_cluster_log(time, priority, node, ident, tag, message)
> +    }
> +
> +    fn set_rrd_data<'a>(
> +        &'a self,
> +        key: String,
> +        data: String,
> +    ) -> crate::traits::BoxFuture<'a, Result<()>> {
> +        Box::pin(self.set_rrd_data(key, data))
> +    }
> +
> +    fn remove_old_rrd_data(&self) {
> +        self.remove_old_rrd_data()
> +    }
> +
> +    fn get_rrd_dump(&self) -> String {
> +        self.get_rrd_dump()
> +    }
> +
> +    fn register_vm(&self, vmid: u32, vmtype: VmType, node: String) {
> +        self.register_vm(vmid, vmtype, node)
> +    }
> +
> +    fn delete_vm(&self, vmid: u32) {
> +        self.delete_vm(vmid)
> +    }
> +
> +    fn vm_exists(&self, vmid: u32) -> bool {
> +        self.vm_exists(vmid)
> +    }
> +
> +    fn different_vm_exists(&self, vmid: u32, vmtype: VmType, node: &str) -> bool {
> +        self.different_vm_exists(vmid, vmtype, node)
> +    }
> +
> +    fn get_vmlist(&self) -> HashMap<u32, VmEntry> {
> +        self.get_vmlist()
> +    }
> +
> +    fn scan_vmlist(&self, memdb: &pmxcfs_memdb::MemDb) {
> +        self.scan_vmlist(memdb)
> +    }
> +
> +    fn init_cluster(&self, cluster_name: String) {
> +        self.init_cluster(cluster_name)
> +    }
> +
> +    fn register_node(&self, node_id: u32, name: String, ip: String) {
> +        self.register_node(node_id, name, ip)
> +    }
> +
> +    fn get_cluster_info(&self) -> Option<ClusterInfo> {
> +        self.get_cluster_info()
> +    }
> +
> +    fn get_cluster_version(&self) -> u64 {
> +        self.get_cluster_version()
> +    }
> +
> +    fn increment_cluster_version(&self) {
> +        self.increment_cluster_version()
> +    }
> +
> +    fn update_cluster_info(
> +        &self,
> +        cluster_name: String,
> +        config_version: u64,
> +        nodes: Vec<(u32, String, String)>,
> +    ) -> Result<()> {
> +        self.update_cluster_info(cluster_name, config_version, nodes)
> +    }
> +
> +    fn set_node_online(&self, node_id: u32, online: bool) {
> +        self.set_node_online(node_id, online)
> +    }
> +
> +    fn is_quorate(&self) -> bool {
> +        self.is_quorate()
> +    }
> +
> +    fn set_quorate(&self, quorate: bool) {
> +        self.set_quorate(quorate)
> +    }
> +
> +    fn get_members(&self) -> Vec<pmxcfs_api_types::MemberInfo> {
> +        self.get_members()
> +    }
> +
> +    fn update_members(&self, members: Vec<pmxcfs_api_types::MemberInfo>) {
> +        self.update_members(members)
> +    }
> +
> +    fn update_member_status(&self, member_list: &[u32]) {
> +        self.update_member_status(member_list)
> +    }
> +
> +    fn get_start_time(&self) -> u64 {
> +        self.get_start_time()
> +    }
> +
> +    fn increment_vmlist_version(&self) {
> +        self.increment_vmlist_version()
> +    }
> +
> +    fn get_vmlist_version(&self) -> u64 {
> +        self.get_vmlist_version()
> +    }
> +
> +    fn increment_path_version(&self, path: &str) {
> +        self.increment_path_version(path)
> +    }
> +
> +    fn get_path_version(&self, path: &str) -> u64 {
> +        self.get_path_version(path)
> +    }
> +
> +    fn get_all_path_versions(&self) -> HashMap<String, u64> {
> +        self.get_all_path_versions()
> +    }
> +
> +    fn increment_all_path_versions(&self) {
> +        self.increment_all_path_versions()
> +    }
> +
> +    fn set_node_kv(&self, nodeid: u32, key: String, value: Vec<u8>) {
> +        self.set_node_kv(nodeid, key, value)
> +    }
> +
> +    fn get_node_kv(&self, nodeid: u32, key: &str) -> Option<Vec<u8>> {
> +        self.get_node_kv(nodeid, key)
> +    }
> +}
> +
> +#[cfg(test)]
> +mod tests {

[..]

> +}
> diff --git a/src/pmxcfs-rs/pmxcfs-status/src/traits.rs b/src/pmxcfs-rs/pmxcfs-status/src/traits.rs

[..]
> diff --git a/src/pmxcfs-rs/pmxcfs-status/src/types.rs b/src/pmxcfs-rs/pmxcfs-status/src/types.rs
> new file mode 100644
> index 00000000..393ce63a
> --- /dev/null
> +++ b/src/pmxcfs-rs/pmxcfs-status/src/types.rs
> @@ -0,0 +1,62 @@
> +/// Data types for the status module
> +use std::collections::HashMap;
> +
> +/// Cluster node information (matches C implementation's cfs_clnode_t)
> +#[derive(Debug, Clone)]
> +pub struct ClusterNode {
> +    pub name: String,
> +    pub node_id: u32,
> +    pub ip: String,
> +    pub online: bool,
> +}
> +
> +/// Cluster information (matches C implementation's cfs_clinfo_t)
> +#[derive(Debug, Clone)]
> +pub struct ClusterInfo {
> +    pub cluster_name: String,
> +    pub nodes_by_id: HashMap<u32, ClusterNode>,
> +    pub nodes_by_name: HashMap<String, ClusterNode>,

Both maps hold their own ClusterNode copy, so every mutation site has to
remember to update both (as update_member_status() does above for `online`).
A safer pattern would be to make nodes_by_name just an index:

pub nodes_by_id: HashMap<u32, ClusterNode>,
pub nodes_by_name: HashMap<String, u32>,
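
To make the idea concrete, here is a rough, untested sketch of how the
index variant could look. The helper names node_by_name() and set_online()
are only made up for illustration, they are not part of the patch:

```rust
use std::collections::HashMap;

#[derive(Debug, Clone)]
pub struct ClusterNode {
    pub name: String,
    pub node_id: u32,
    pub ip: String,
    pub online: bool,
}

#[derive(Debug, Clone)]
pub struct ClusterInfo {
    pub cluster_name: String,
    /// Single owner of the node data.
    pub nodes_by_id: HashMap<u32, ClusterNode>,
    /// Index only: node name -> node_id.
    pub nodes_by_name: HashMap<String, u32>,
}

impl ClusterInfo {
    pub fn new(cluster_name: String) -> Self {
        Self {
            cluster_name,
            nodes_by_id: HashMap::new(),
            nodes_by_name: HashMap::new(),
        }
    }

    /// Add or update a node, keeping the name index in sync.
    pub fn add_node(&mut self, node: ClusterNode) {
        // If this node_id was known under a different name, drop the stale index entry.
        if let Some(old) = self.nodes_by_id.get(&node.node_id) {
            if old.name != node.name {
                self.nodes_by_name.remove(&old.name);
            }
        }
        self.nodes_by_name.insert(node.name.clone(), node.node_id);
        self.nodes_by_id.insert(node.node_id, node);
    }

    /// Hypothetical helper: resolve a name through the index.
    pub fn node_by_name(&self, name: &str) -> Option<&ClusterNode> {
        let id = self.nodes_by_name.get(name)?;
        self.nodes_by_id.get(id)
    }

    /// Hypothetical helper: one write is enough, there is no second copy to update.
    /// Returns false if the node_id is unknown.
    pub fn set_online(&mut self, node_id: u32, online: bool) -> bool {
        match self.nodes_by_id.get_mut(&node_id) {
            Some(node) => {
                node.online = online;
                true
            }
            None => false,
        }
    }
}
```

With this layout, code like update_member_status() would only need to touch
nodes_by_id, and lookups by name cost one extra hash lookup.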

> +}
> +
> +impl ClusterInfo {
> +    pub(crate) fn new(cluster_name: String) -> Self {
> +        Self {
> +            cluster_name,
> +            nodes_by_id: HashMap::new(),
> +            nodes_by_name: HashMap::new(),
> +        }
> +    }
> +
> +    /// Add or update a node in the cluster
> +    pub(crate) fn add_node(&mut self, node: ClusterNode) {
> +        self.nodes_by_name.insert(node.name.clone(), node.clone());
> +        self.nodes_by_id.insert(node.node_id, node);
> +    }
> +}
> +
> +/// Node status data
> +#[derive(Clone, Debug)]
> +pub struct NodeStatus {
> +    pub name: String,
> +    pub data: Vec<u8>,
> +    pub timestamp: u64,
> +}
> +
> +/// Cluster log entry
> +#[derive(Clone, Debug)]
> +pub struct ClusterLogEntry {
> +    pub timestamp: u64,
> +    pub node: String,
> +    pub priority: u8,
> +    pub ident: String,
> +    pub tag: String,
> +    pub message: String,
> +}
> +
> +/// RRD (Round Robin Database) entry
> +#[derive(Clone, Debug)]
> +pub(crate) struct RrdEntry {
> +    pub key: String,
> +    pub data: String,
> +    pub timestamp: u64,
> +}





^ permalink raw reply	[relevance 5%]

* Re: [pve-devel] [PATCH pve-cluster 07/15] pmxcfs-rs: add pmxcfs-test-utils infrastructure crate
  @ 2026-02-03 17:03  6%   ` Samuel Rufinatscha
  0 siblings, 0 replies; 117+ results
From: Samuel Rufinatscha @ 2026-02-03 17:03 UTC (permalink / raw)
  To: Proxmox VE development discussion, Kefu Chai; +Cc: Kefu Chai

Thanks for the patch. Having shared test utilities in a dedicated crate
makes a lot of sense.

Comments inline.

On 1/6/26 3:25 PM, Kefu Chai wrote:
> From: Kefu Chai <tchaikov@gmail.com>
> 
> This commit introduces a dedicated testing infrastructure crate to support
> comprehensive unit and integration testing across the pmxcfs-rs workspace.
> 
> Why a dedicated crate?
> - Provides shared test utilities without creating circular dependencies
> - Enables consistent test patterns across all pmxcfs crates
> - Centralizes mock implementations for dependency injection
> 
> What this crate provides:
> 1. MockMemDb: Fast, in-memory implementation of MemDbOps trait
>     - Eliminates SQLite I/O overhead in unit tests (~100x faster)
>     - Enables isolated testing without filesystem dependencies
>     - Uses HashMap for storage instead of SQLite persistence
> 
> 2. MockStatus: Re-exported mock implementation for StatusOps trait
>     - Allows testing without global singleton state
>     - Enables parallel test execution
> 
> 3. TestEnv builder: Fluent interface for test environment setup
>     - Standardizes test configuration across different test types
>     - Provides common directory structures and test data
> 
> 4. Async helpers: Condition polling utilities (wait_for_condition)
>     - Replaces sleep-based synchronization with active polling
> 
> This crate is marked as dev-only in the workspace and is used by other
> crates through [dev-dependencies] to avoid circular dependencies.
> 
> Signed-off-by: Kefu Chai <k.chai@proxmox.com>
> ---
>   src/pmxcfs-rs/Cargo.toml                      |   2 +
>   src/pmxcfs-rs/pmxcfs-test-utils/Cargo.toml    |  34 +
>   src/pmxcfs-rs/pmxcfs-test-utils/src/lib.rs    | 526 +++++++++++++++
>   .../pmxcfs-test-utils/src/mock_memdb.rs       | 636 ++++++++++++++++++
>   4 files changed, 1198 insertions(+)
>   create mode 100644 src/pmxcfs-rs/pmxcfs-test-utils/Cargo.toml
>   create mode 100644 src/pmxcfs-rs/pmxcfs-test-utils/src/lib.rs
>   create mode 100644 src/pmxcfs-rs/pmxcfs-test-utils/src/mock_memdb.rs
> 
> diff --git a/src/pmxcfs-rs/Cargo.toml b/src/pmxcfs-rs/Cargo.toml
> index b5191c31..8fe06b88 100644
> --- a/src/pmxcfs-rs/Cargo.toml
> +++ b/src/pmxcfs-rs/Cargo.toml
> @@ -7,6 +7,7 @@ members = [
>       "pmxcfs-rrd",        # RRD (Round-Robin Database) persistence
>       "pmxcfs-memdb",      # In-memory database with SQLite persistence
>       "pmxcfs-status",     # Status monitoring and RRD data management
> +    "pmxcfs-test-utils", # Test utilities and helpers (dev-only)
>   ]
>   resolver = "2"
>   
> @@ -29,6 +30,7 @@ pmxcfs-status = { path = "pmxcfs-status" }
>   pmxcfs-ipc = { path = "pmxcfs-ipc" }
>   pmxcfs-services = { path = "pmxcfs-services" }
>   pmxcfs-logger = { path = "pmxcfs-logger" }
> +pmxcfs-test-utils = { path = "pmxcfs-test-utils" }
>   
>   # Core async runtime
>   tokio = { version = "1.35", features = ["full"] }
> diff --git a/src/pmxcfs-rs/pmxcfs-test-utils/Cargo.toml b/src/pmxcfs-rs/pmxcfs-test-utils/Cargo.toml
> new file mode 100644
> index 00000000..41cdce64
> --- /dev/null
> +++ b/src/pmxcfs-rs/pmxcfs-test-utils/Cargo.toml
> @@ -0,0 +1,34 @@
> +[package]
> +name = "pmxcfs-test-utils"
> +version.workspace = true
> +edition.workspace = true
> +authors.workspace = true
> +license.workspace = true
> +repository.workspace = true
> +rust-version.workspace = true
> +
> +[lib]
> +name = "pmxcfs_test_utils"
> +path = "src/lib.rs"
> +
> +[dependencies]
> +# Internal workspace dependencies
> +pmxcfs-api-types.workspace = true
> +pmxcfs-config.workspace = true
> +pmxcfs-memdb.workspace = true
> +pmxcfs-status.workspace = true
> +
> +# Error handling
> +anyhow.workspace = true
> +
> +# Concurrency
> +parking_lot.workspace = true
> +
> +# System integration
> +libc.workspace = true
> +
> +# Development utilities
> +tempfile.workspace = true
> +
> +# Async runtime
> +tokio.workspace = true
> diff --git a/src/pmxcfs-rs/pmxcfs-test-utils/src/lib.rs b/src/pmxcfs-rs/pmxcfs-test-utils/src/lib.rs
> new file mode 100644
> index 00000000..a2b732a5
> --- /dev/null
> +++ b/src/pmxcfs-rs/pmxcfs-test-utils/src/lib.rs
> @@ -0,0 +1,526 @@
> +//! Test utilities for pmxcfs integration and unit tests
> +//!
> +//! This crate provides:
> +//! - Common test setup and helper functions
> +//! - TestEnv builder for standard test configurations
> +//! - Mock implementations (MockStatus, MockMemDb for isolated testing)
> +//! - Test constants and utilities
> +
> +use anyhow::Result;
> +use pmxcfs_config::Config;
> +use pmxcfs_memdb::MemDb;
> +use std::sync::Arc;
> +use std::time::{Duration, Instant};
> +use tempfile::TempDir;
> +
> +// Re-export MockStatus for easy test access
> +pub use pmxcfs_status::{MockStatus, StatusOps};
> +
> +// Mock implementations
> +mod mock_memdb;
> +pub use mock_memdb::MockMemDb;
> +
> +// Re-export MemDbOps for convenience in tests
> +pub use pmxcfs_memdb::MemDbOps;
> +
> +// Test constants
> +pub const TEST_MTIME: u32 = 1234567890;
> +pub const TEST_NODE_NAME: &str = "testnode";
> +pub const TEST_CLUSTER_NAME: &str = "test-cluster";
> +pub const TEST_WWW_DATA_GID: u32 = 33;
> +
> +/// Test environment builder for standard test setups
> +///
> +/// This builder provides a fluent interface for creating test environments
> +/// with optional components (database, status, config).
> +///
> +/// # Example
> +/// ```
> +/// use pmxcfs_test_utils::TestEnv;
> +///
> +/// # fn example() -> anyhow::Result<()> {
> +/// let env = TestEnv::new()
> +///     .with_database()?
> +///     .with_mock_status()
> +///     .build();
> +///
> +/// // Use env.db, env.status, etc.
> +/// # Ok(())
> +/// # }
> +/// ```
> +pub struct TestEnv {
> +    pub config: Arc<Config>,
> +    pub db: Option<MemDb>,
> +    pub status: Option<Arc<dyn StatusOps>>,

These fields are pub, but db and status also have accessor functions below
(which can panic if the component was not configured). Exposing both looks
inconsistent.
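
One possible direction, only as an untested sketch: keep the fields private
and let the accessors return Option, so the caller decides whether to
unwrap. MemDb, StatusOps and FakeStatus below are simplified stand-ins so
the snippet compiles on its own, and with_db() / with_status() taking
arguments is my assumption, not what the patch does:

```rust
use std::sync::Arc;

// Stand-in for pmxcfs_memdb::MemDb (assumption, for illustration only).
pub struct MemDb;

// Stand-in for the StatusOps trait, reduced to a single method.
pub trait StatusOps {
    fn is_quorate(&self) -> bool;
}

// Trivial implementation used to exercise the sketch.
pub struct FakeStatus;

impl StatusOps for FakeStatus {
    fn is_quorate(&self) -> bool {
        true
    }
}

pub struct TestEnv {
    // Private: the accessors below are the only way to reach these.
    db: Option<MemDb>,
    status: Option<Arc<dyn StatusOps>>,
}

impl TestEnv {
    pub fn new() -> Self {
        Self {
            db: None,
            status: None,
        }
    }

    pub fn with_db(mut self, db: MemDb) -> Self {
        self.db = Some(db);
        self
    }

    pub fn with_status(mut self, status: Arc<dyn StatusOps>) -> Self {
        self.status = Some(status);
        self
    }

    /// Non-panicking accessor: None if no database was configured.
    pub fn db(&self) -> Option<&MemDb> {
        self.db.as_ref()
    }

    /// Non-panicking accessor: None if no status was configured.
    pub fn status(&self) -> Option<&Arc<dyn StatusOps>> {
        self.status.as_ref()
    }
}
```

Tests that want the old behaviour can still write
env.db().expect("...") at the call site. Keeping the fields pub and
dropping the accessors would work as well.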

> +    pub temp_dir: Option<TempDir>,
> +}
> +
> +impl TestEnv {
> +    /// Create a new test environment builder with default config
> +    pub fn new() -> Self {
> +        Self::new_with_config(false)
> +    }
> +
> +    /// Create a new test environment builder with local mode config
> +    pub fn new_local() -> Self {
> +        Self::new_with_config(true)
> +    }
> +
> +    /// Create a new test environment builder with custom local_mode setting
> +    pub fn new_with_config(local_mode: bool) -> Self {
> +        let config = create_test_config(local_mode);
> +        Self {
> +            config,
> +            db: None,
> +            status: None,
> +            temp_dir: None,
> +        }
> +    }
> +
> +    /// Add a database with standard directory structure
> +    pub fn with_database(mut self) -> Result<Self> {
> +        let (temp_dir, db) = create_test_db()?;
> +        self.temp_dir = Some(temp_dir);
> +        self.db = Some(db);
> +        Ok(self)
> +    }
> +
> +    /// Add a minimal database (no standard directories)
> +    pub fn with_minimal_database(mut self) -> Result<Self> {
> +        let (temp_dir, db) = create_minimal_test_db()?;
> +        self.temp_dir = Some(temp_dir);
> +        self.db = Some(db);
> +        Ok(self)
> +    }
> +
> +    /// Add a MockStatus instance for isolated testing
> +    pub fn with_mock_status(mut self) -> Self {
> +        self.status = Some(Arc::new(MockStatus::new()));
> +        self
> +    }
> +
> +    /// Add the real Status instance (uses global singleton)
> +    pub fn with_status(mut self) -> Self {
> +        self.status = Some(pmxcfs_status::init());
> +        self
> +    }
> +
> +    /// Build and return the test environment
> +    pub fn build(self) -> Self {
> +        self
> +    }

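This function seems redundant: it only returns self, so the with_*() chain
already yields a usable TestEnv without calling build().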

> +
> +    /// Get a reference to the database (panics if not configured)
> +    pub fn db(&self) -> &MemDb {
> +        self.db
> +            .as_ref()
> +            .expect("Database not configured. Call with_database() first")
> +    }
> +
> +    /// Get a reference to the status (panics if not configured)
> +    pub fn status(&self) -> &Arc<dyn StatusOps> {
> +        self.status
> +            .as_ref()
> +            .expect("Status not configured. Call with_status() or with_mock_status() first")
> +    }
> +}
> +
> +impl Default for TestEnv {
> +    fn default() -> Self {
> +        Self::new()
> +    }
> +}
> +
> +/// Creates a standard test configuration
> +///
> +/// # Arguments
> +/// * `local_mode` - Whether to run in local mode (no cluster)
> +///
> +/// # Returns
> +/// Arc-wrapped Config suitable for testing
> +pub fn create_test_config(local_mode: bool) -> Arc<Config> {
> +    Config::new(
> +        TEST_NODE_NAME.to_string(),
> +        "127.0.0.1".to_string(),
> +        TEST_WWW_DATA_GID,
> +        false, // debug mode
> +        local_mode,
> +        TEST_CLUSTER_NAME.to_string(),
> +    )
> +}
> +
> +/// Creates a test database with standard directory structure
> +///
> +/// Creates the following directories:
> +/// - /nodes/{nodename}/qemu-server
> +/// - /nodes/{nodename}/lxc
> +/// - /nodes/{nodename}/priv
> +/// - /priv/lock/qemu-server
> +/// - /priv/lock/lxc
> +/// - /qemu-server
> +/// - /lxc
> +///
> +/// # Returns
> +/// (TempDir, MemDb) - The temp directory must be kept alive for database to persist
> +pub fn create_test_db() -> Result<(TempDir, MemDb)> {
> +    let temp_dir = TempDir::new()?;
> +    let db_path = temp_dir.path().join("test.db");
> +    let db = MemDb::open(&db_path, true)?;
> +
> +    // Create standard directory structure
> +    let now = TEST_MTIME;
> +
> +    // Node-specific directories
> +    db.create("/nodes", libc::S_IFDIR, now)?;
> +    db.create(&format!("/nodes/{}", TEST_NODE_NAME), libc::S_IFDIR, now)?;
> +    db.create(
> +        &format!("/nodes/{}/qemu-server", TEST_NODE_NAME),
> +        libc::S_IFDIR,
> +        now,
> +    )?;
> +    db.create(
> +        &format!("/nodes/{}/lxc", TEST_NODE_NAME),
> +        libc::S_IFDIR,
> +        now,
> +    )?;
> +    db.create(
> +        &format!("/nodes/{}/priv", TEST_NODE_NAME),
> +        libc::S_IFDIR,
> +        now,
> +    )?;
> +
> +    // Global directories
> +    db.create("/priv", libc::S_IFDIR, now)?;
> +    db.create("/priv/lock", libc::S_IFDIR, now)?;
> +    db.create("/priv/lock/qemu-server", libc::S_IFDIR, now)?;
> +    db.create("/priv/lock/lxc", libc::S_IFDIR, now)?;
> +    db.create("/qemu-server", libc::S_IFDIR, now)?;
> +    db.create("/lxc", libc::S_IFDIR, now)?;
> +
> +    Ok((temp_dir, db))
> +}
> +
> +/// Creates a minimal test database (no standard directories)
> +///
> +/// Use this when you want full control over database structure
> +///
> +/// # Returns
> +/// (TempDir, MemDb) - The temp directory must be kept alive for database to persist
> +pub fn create_minimal_test_db() -> Result<(TempDir, MemDb)> {
> +    let temp_dir = TempDir::new()?;
> +    let db_path = temp_dir.path().join("test.db");
> +    let db = MemDb::open(&db_path, true)?;
> +    Ok((temp_dir, db))
> +}
> +
> +/// Creates test VM configuration content
> +///
> +/// # Arguments
> +/// * `vmid` - VM ID
> +/// * `cores` - Number of CPU cores
> +/// * `memory` - Memory in MB
> +///
> +/// # Returns
> +/// Configuration file content as bytes
> +pub fn create_vm_config(vmid: u32, cores: u32, memory: u32) -> Vec<u8> {
> +    format!(
> +        "name: test-vm-{}\ncores: {}\nmemory: {}\nbootdisk: scsi0\n",
> +        vmid, cores, memory
> +    )
> +    .into_bytes()
> +}
> +
> +/// Creates test CT (container) configuration content
> +///
> +/// # Arguments
> +/// * `vmid` - Container ID
> +/// * `cores` - Number of CPU cores
> +/// * `memory` - Memory in MB
> +///
> +/// # Returns
> +/// Configuration file content as bytes
> +pub fn create_ct_config(vmid: u32, cores: u32, memory: u32) -> Vec<u8> {
> +    format!(
> +        "cores: {}\nmemory: {}\nrootfs: local:100/vm-{}-disk-0.raw\n",
> +        cores, memory, vmid
> +    )
> +    .into_bytes()
> +}
> +
> +/// Creates a test lock path for a VM config
> +///
> +/// # Arguments
> +/// * `vmid` - VM ID
> +/// * `vm_type` - "qemu" or "lxc"
> +///
> +/// # Returns
> +/// Lock path in format `/priv/lock/{vm_type}/{vmid}.conf`
> +pub fn create_lock_path(vmid: u32, vm_type: &str) -> String {
> +    format!("/priv/lock/{}/{}.conf", vm_type, vmid)
> +}
> +
> +/// Creates a test config path for a VM
> +///
> +/// # Arguments
> +/// * `vmid` - VM ID
> +/// * `vm_type` - "qemu-server" or "lxc"
> +///
> +/// # Returns
> +/// Config path in format `/{vm_type}/{vmid}.conf`
> +pub fn create_config_path(vmid: u32, vm_type: &str) -> String {
> +    format!("/{}/{}.conf", vm_type, vmid)
> +}
> +
> +/// Clears all VMs from a status instance
> +///
> +/// Useful for ensuring clean state before tests that register VMs.
> +///
> +/// # Arguments
> +/// * `status` - The status instance to clear
> +pub fn clear_test_vms(status: &dyn StatusOps) {
> +    let existing_vms: Vec<u32> = status.get_vmlist().keys().copied().collect();
> +    for vmid in existing_vms {
> +        status.delete_vm(vmid);
> +    }
> +}
> +
> +/// Wait for a condition to become true, polling at regular intervals
> +///
> +/// This is a replacement for sleep-based synchronization in integration tests.
> +/// Instead of sleeping for an arbitrary duration and hoping the condition is met,
> +/// this function polls the condition and returns as soon as it becomes true.
> +///
> +/// # Arguments
> +/// * `predicate` - Function that returns true when the condition is met
> +/// * `timeout` - Maximum time to wait for the condition
> +/// * `check_interval` - How often to check the condition
> +///
> +/// # Returns
> +/// * `true` if condition was met within timeout
> +/// * `false` if timeout was reached without condition being met
> +///
> +/// # Example
> +/// ```no_run
> +/// use pmxcfs_test_utils::wait_for_condition;
> +/// use std::time::Duration;
> +/// use std::sync::atomic::{AtomicBool, Ordering};
> +/// use std::sync::Arc;
> +///
> +/// # async fn example() {
> +/// let ready = Arc::new(AtomicBool::new(false));
> +///
> +/// // Wait for service to be ready (with timeout)
> +/// let result = wait_for_condition(
> +///     || ready.load(Ordering::SeqCst),
> +///     Duration::from_secs(5),
> +///     Duration::from_millis(10),
> +/// ).await;
> +///
> +/// assert!(result, "Service should be ready within 5 seconds");
> +/// # }
> +/// ```
> +pub async fn wait_for_condition<F>(
> +    predicate: F,
> +    timeout: Duration,
> +    check_interval: Duration,
> +) -> bool
> +where
> +    F: Fn() -> bool,
> +{
> +    let start = Instant::now();
> +    loop {
> +        if predicate() {
> +            return true;
> +        }
> +        if start.elapsed() >= timeout {
> +            return false;
> +        }
> +        tokio::time::sleep(check_interval).await;
> +    }
> +}
> +
> +/// Wait for a condition with a custom error message
> +///
> +/// Similar to `wait_for_condition`, but returns a Result with a custom error message
> +/// if the timeout is reached.
> +///
> +/// # Arguments
> +/// * `predicate` - Function that returns true when the condition is met
> +/// * `timeout` - Maximum time to wait for the condition
> +/// * `check_interval` - How often to check the condition
> +/// * `error_msg` - Error message to return if timeout is reached
> +///
> +/// # Returns
> +/// * `Ok(())` if condition was met within timeout
> +/// * `Err(anyhow::Error)` with custom message if timeout was reached
> +///
> +/// # Example
> +/// ```no_run
> +/// use pmxcfs_test_utils::wait_for_condition_or_fail;
> +/// use std::time::Duration;
> +/// use std::sync::atomic::{AtomicU64, Ordering};
> +/// use std::sync::Arc;
> +///
> +/// # async fn example() -> anyhow::Result<()> {
> +/// let counter = Arc::new(AtomicU64::new(0));
> +///
> +/// wait_for_condition_or_fail(
> +///     || counter.load(Ordering::SeqCst) >= 1,
> +///     Duration::from_secs(5),
> +///     Duration::from_millis(10),
> +///     "Service should initialize within 5 seconds",
> +/// ).await?;
> +///
> +/// # Ok(())
> +/// # }
> +/// ```
> +pub async fn wait_for_condition_or_fail<F>(
> +    predicate: F,
> +    timeout: Duration,
> +    check_interval: Duration,
> +    error_msg: &str,
> +) -> Result<()>
> +where
> +    F: Fn() -> bool,
> +{
> +    if wait_for_condition(predicate, timeout, check_interval).await {
> +        Ok(())
> +    } else {
> +        anyhow::bail!("{}", error_msg)
> +    }
> +}
> +
> +/// Blocking version of wait_for_condition for synchronous tests
> +///
> +/// Similar to `wait_for_condition`, but works in synchronous contexts.
> +/// Polls the condition and returns as soon as it becomes true or timeout is reached.
> +///
> +/// # Arguments
> +/// * `predicate` - Function that returns true when the condition is met
> +/// * `timeout` - Maximum time to wait for the condition
> +/// * `check_interval` - How often to check the condition
> +///
> +/// # Returns
> +/// * `true` if condition was met within timeout
> +/// * `false` if timeout was reached without condition being met
> +///
> +/// # Example
> +/// ```no_run
> +/// use pmxcfs_test_utils::wait_for_condition_blocking;
> +/// use std::time::Duration;
> +/// use std::sync::atomic::{AtomicBool, Ordering};
> +/// use std::sync::Arc;
> +///
> +/// let ready = Arc::new(AtomicBool::new(false));
> +///
> +/// // Wait for service to be ready (with timeout)
> +/// let result = wait_for_condition_blocking(
> +///     || ready.load(Ordering::SeqCst),
> +///     Duration::from_secs(5),
> +///     Duration::from_millis(10),
> +/// );
> +///
> +/// assert!(result, "Service should be ready within 5 seconds");
> +/// ```
> +pub fn wait_for_condition_blocking<F>(
> +    predicate: F,
> +    timeout: Duration,
> +    check_interval: Duration,
> +) -> bool
> +where
> +    F: Fn() -> bool,
> +{
> +    let start = Instant::now();
> +    loop {
> +        if predicate() {
> +            return true;
> +        }
> +        if start.elapsed() >= timeout {
> +            return false;
> +        }
> +        std::thread::sleep(check_interval);
> +    }
> +}
> +
> +#[cfg(test)]
> +mod tests {
> +    use super::*;
> +
> +    #[test]
> +    fn test_create_test_config() {
> +        let config = create_test_config(true);
> +        assert_eq!(config.nodename, TEST_NODE_NAME);
> +        assert_eq!(config.cluster_name, TEST_CLUSTER_NAME);
> +        assert!(config.local_mode);
> +    }
> +
> +    #[test]
> +    fn test_create_test_db() -> Result<()> {
> +        let (_temp_dir, db) = create_test_db()?;
> +
> +        // Verify standard directories exist
> +        assert!(db.exists("/nodes")?, "Should have /nodes");
> +        assert!(db.exists("/qemu-server")?, "Should have /qemu-server");
> +        assert!(db.exists("/priv/lock")?, "Should have /priv/lock");
> +
> +        Ok(())
> +    }
> +
> +    #[test]
> +    fn test_path_helpers() {
> +        assert_eq!(
> +            create_lock_path(100, "qemu-server"),

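The docs of create_lock_path say vm_type is "qemu" or "lxc", but here we
pass "qemu-server", which also matches the /priv/lock/qemu-server directory
that create_test_db() creates. The doc comment should probably say
"qemu-server".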
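
If we want to rule out this kind of mismatch entirely, the helpers could
take a small enum instead of a free-form &str. Untested sketch; GuestKind
and dir_name() are hypothetical names (the existing VmType might already
fit, I have not checked its variants):

```rust
/// Hypothetical guest type used only by the path helpers.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum GuestKind {
    Qemu,
    Lxc,
}

impl GuestKind {
    /// Directory name as created by create_test_db().
    pub fn dir_name(self) -> &'static str {
        match self {
            GuestKind::Qemu => "qemu-server",
            GuestKind::Lxc => "lxc",
        }
    }
}

/// Lock path in format `/priv/lock/{dir_name}/{vmid}.conf`
pub fn create_lock_path(vmid: u32, kind: GuestKind) -> String {
    format!("/priv/lock/{}/{}.conf", kind.dir_name(), vmid)
}

/// Config path in format `/{dir_name}/{vmid}.conf`
pub fn create_config_path(vmid: u32, kind: GuestKind) -> String {
    format!("/{}/{}.conf", kind.dir_name(), vmid)
}
```

That way the doc comment and the accepted values cannot drift apart, and a
typo in the directory name becomes a compile error instead of a wrong path.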

> +            "/priv/lock/qemu-server/100.conf"
> +        );
> +        assert_eq!(
> +            create_config_path(100, "qemu-server"),
> +            "/qemu-server/100.conf"
> +        );
> +    }
> +
> +    #[test]
> +    fn test_env_builder_basic() {
> +        let env = TestEnv::new().build();
> +        assert_eq!(env.config.nodename, TEST_NODE_NAME);
> +        assert!(env.db.is_none());
> +        assert!(env.status.is_none());
> +    }
> +
> +    #[test]
> +    fn test_env_builder_with_database() -> Result<()> {
> +        let env = TestEnv::new().with_database()?.build();
> +        assert!(env.db.is_some());
> +        assert!(env.db().exists("/nodes")?);
> +        Ok(())
> +    }
> +
> +    #[test]
> +    fn test_env_builder_with_mock_status() {
> +        let env = TestEnv::new().with_mock_status().build();
> +        assert!(env.status.is_some());
> +
> +        // Test that MockStatus works
> +        let status = env.status();
> +        status.set_quorate(true);
> +        assert!(status.is_quorate());
> +    }
> +
> +    #[test]
> +    fn test_env_builder_full() -> Result<()> {
> +        let env = TestEnv::new().with_database()?.with_mock_status().build();
> +
> +        assert!(env.db.is_some());
> +        assert!(env.status.is_some());
> +        assert!(env.config.nodename == TEST_NODE_NAME);
> +
> +        Ok(())
> +    }
> +
> +    // NOTE: Tokio tests for wait_for_condition functions are REMOVED because they
> +    // cause the test runner to hang when running `cargo test --lib --workspace`.
> +    // Root cause: tokio multi-threaded runtime doesn't shut down properly when
> +    // these async tests complete, blocking the entire test suite.
> +    //
> +    // These utility functions work correctly and are verified in integration tests
> +    // that actually use them (e.g., integration-tests/).
> +}
> diff --git a/src/pmxcfs-rs/pmxcfs-test-utils/src/mock_memdb.rs b/src/pmxcfs-rs/pmxcfs-test-utils/src/mock_memdb.rs
> new file mode 100644
> index 00000000..c341f9eb
> --- /dev/null
> +++ b/src/pmxcfs-rs/pmxcfs-test-utils/src/mock_memdb.rs
> @@ -0,0 +1,636 @@
> +//! Mock in-memory database implementation for testing
> +//!
> +//! This module provides `MockMemDb`, a lightweight in-memory implementation
> +//! of the `MemDbOps` trait for use in unit tests.
> +
> +use anyhow::{Result, bail};
> +use parking_lot::RwLock;
> +use pmxcfs_memdb::{MemDbOps, ROOT_INODE, TreeEntry};
> +use std::collections::HashMap;
> +use std::sync::atomic::{AtomicU64, Ordering};
> +use std::time::{SystemTime, UNIX_EPOCH};
> +
> +// Directory and file type constants from dirent.h
> +const DT_DIR: u8 = 4;
> +const DT_REG: u8 = 8;
> +
> +/// Mock in-memory database for testing
> +///
> +/// Unlike the real `MemDb` which uses SQLite persistence, `MockMemDb` stores
> +/// everything in memory using HashMap. This makes it:
> +/// - Faster for unit tests (no disk I/O)
> +/// - Easier to inject failures for error testing
> +/// - Completely isolated (no shared state between tests)
> +///
> +/// # Example
> +/// ```
> +/// use pmxcfs_test_utils::MockMemDb;
> +/// use pmxcfs_memdb::MemDbOps;
> +/// use std::sync::Arc;
> +///
> +/// let db: Arc<dyn MemDbOps> = Arc::new(MockMemDb::new());
> +/// db.create("/test.txt", 0, 1234).unwrap();
> +/// assert!(db.exists("/test.txt").unwrap());
> +/// ```
> +pub struct MockMemDb {
> +    /// Files and directories stored as path -> data
> +    files: RwLock<HashMap<String, Vec<u8>>>,
> +    /// Directory entries stored as path -> Vec<child_names>
> +    directories: RwLock<HashMap<String, Vec<String>>>,
> +    /// Metadata stored as path -> TreeEntry
> +    entries: RwLock<HashMap<String, TreeEntry>>,
> +    /// Lock state stored as path -> (timestamp, checksum)
> +    locks: RwLock<HashMap<String, (u64, [u8; 32])>>,
> +    /// Version counter
> +    version: AtomicU64,
> +    /// Inode counter
> +    next_inode: AtomicU64,
> +}
> +
> +impl MockMemDb {
> +    /// Create a new empty mock database
> +    pub fn new() -> Self {
> +        let mut directories = HashMap::new();
> +        directories.insert("/".to_string(), Vec::new());
> +
> +        let mut entries = HashMap::new();
> +        let now = SystemTime::now()
> +            .duration_since(UNIX_EPOCH)
> +            .unwrap()
> +            .as_secs() as u32;
> +
> +        // Create root entry
> +        entries.insert(
> +            "/".to_string(),
> +            TreeEntry {
> +                inode: ROOT_INODE,
> +                parent: 0,
> +                version: 0,
> +                writer: 1,
> +                mtime: now,
> +                size: 0,
> +                entry_type: DT_DIR,
> +                data: Vec::new(),
> +                name: String::new(),
> +            },
> +        );
> +
> +        Self {
> +            files: RwLock::new(HashMap::new()),
> +            directories: RwLock::new(directories),
> +            entries: RwLock::new(entries),
> +            locks: RwLock::new(HashMap::new()),
> +            version: AtomicU64::new(1),
> +            next_inode: AtomicU64::new(ROOT_INODE + 1),
> +        }
> +    }
> +
> +    /// Helper to check if path is a directory
> +    fn is_directory(&self, path: &str) -> bool {
> +        self.directories.read().contains_key(path)
> +    }
> +
> +    /// Helper to get parent path
> +    fn parent_path(path: &str) -> Option<String> {
> +        if path == "/" {
> +            return None;
> +        }
> +        let parent = path.rsplit_once('/')?.0;
> +        if parent.is_empty() {
> +            Some("/".to_string())
> +        } else {
> +            Some(parent.to_string())
> +        }
> +    }
> +
> +    /// Helper to get file name from path
> +    fn file_name(path: &str) -> String {
> +        if path == "/" {
> +            return String::new();
> +        }
> +        path.rsplit('/').next().unwrap_or("").to_string()
> +    }
> +}
> +
> +impl Default for MockMemDb {
> +    fn default() -> Self {
> +        Self::new()
> +    }
> +}
> +
> +impl MemDbOps for MockMemDb {
> +    fn create(&self, path: &str, mode: u32, mtime: u32) -> Result<()> {
> +        if path.is_empty() {
> +            bail!("Empty path");
> +        }
> +
> +        if self.entries.read().contains_key(path) {
> +            bail!("File exists: {}", path);
> +        }
> +
> +        let is_dir = (mode & libc::S_IFMT) == libc::S_IFDIR;
> +        let entry_type = if is_dir { DT_DIR } else { DT_REG };
> +        let inode = self.next_inode.fetch_add(1, Ordering::SeqCst);
> +
> +        // Add to parent directory
> +        if let Some(parent) = Self::parent_path(path) {
> +            if !self.is_directory(&parent) {
> +                bail!("Parent is not a directory: {}", parent);
> +            }
> +            let mut dirs = self.directories.write();
> +            if let Some(children) = dirs.get_mut(&parent) {
> +                children.push(Self::file_name(path));
> +            }
> +        }
> +
> +        // Create entry
> +        let entry = TreeEntry {
> +            inode,
> +            parent: 0, // Simplified
> +            version: self.version.load(Ordering::SeqCst),
> +            writer: 1,
> +            mtime,
> +            size: 0,
> +            entry_type,
> +            data: Vec::new(),
> +            name: Self::file_name(path),
> +        };
> +
> +        self.entries.write().insert(path.to_string(), entry);
> +
> +        if is_dir {
> +            self.directories
> +                .write()
> +                .insert(path.to_string(), Vec::new());
> +        } else {
> +            self.files.write().insert(path.to_string(), Vec::new());
> +        }
> +
> +        self.version.fetch_add(1, Ordering::SeqCst);
> +        Ok(())
> +    }
> +
> +    fn read(&self, path: &str, offset: u64, size: usize) -> Result<Vec<u8>> {
> +        let files = self.files.read();
> +        let data = files
> +            .get(path)
> +            .ok_or_else(|| anyhow::anyhow!("File not found: {}", path))?;
> +
> +        let offset = offset as usize;
> +        if offset >= data.len() {
> +            return Ok(Vec::new());
> +        }
> +
> +        let end = std::cmp::min(offset + size, data.len());
> +        Ok(data[offset..end].to_vec())
> +    }
> +
> +    fn write(
> +        &self,
> +        path: &str,
> +        offset: u64,
> +        mtime: u32,
> +        data: &[u8],
> +        truncate: bool,
> +    ) -> Result<usize> {
> +        let mut files = self.files.write();
> +        let file_data = files
> +            .get_mut(path)
> +            .ok_or_else(|| anyhow::anyhow!("File not found: {}", path))?;
> +
> +        let offset = offset as usize;
> +
> +        if truncate {
> +            file_data.clear();
> +        }
> +
> +        // Expand if needed
> +        if offset + data.len() > file_data.len() {
> +            file_data.resize(offset + data.len(), 0);
> +        }
> +
> +        file_data[offset..offset + data.len()].copy_from_slice(data);
> +
> +        // Update entry
> +        if let Some(entry) = self.entries.write().get_mut(path) {
> +            entry.mtime = mtime;
> +            entry.size = file_data.len();
> +        }
> +
> +        self.version.fetch_add(1, Ordering::SeqCst);
> +        Ok(data.len())
> +    }
> +
> +    fn delete(&self, path: &str) -> Result<()> {
> +        if !self.entries.read().contains_key(path) {
> +            bail!("File not found: {}", path);
> +        }
> +
> +        // Check if directory is empty
> +        if let Some(children) = self.directories.read().get(path) {
> +            if !children.is_empty() {
> +                bail!("Directory not empty: {}", path);
> +            }
> +        }
> +
> +        self.entries.write().remove(path);
> +        self.files.write().remove(path);
> +        self.directories.write().remove(path);
> +
> +        // Remove from parent
> +        if let Some(parent) = Self::parent_path(path) {
> +            if let Some(children) = self.directories.write().get_mut(&parent) {
> +                children.retain(|name| name != &Self::file_name(path));
> +            }
> +        }
> +
> +        self.version.fetch_add(1, Ordering::SeqCst);
> +        Ok(())
> +    }
> +
> +    fn rename(&self, old_path: &str, new_path: &str) -> Result<()> {
> +        // Check existence first with read locks (released immediately)
> +        {
> +            let entries = self.entries.read();
> +            if !entries.contains_key(old_path) {
> +                bail!("Source not found: {}", old_path);
> +            }
> +            if entries.contains_key(new_path) {
> +                bail!("Destination already exists: {}", new_path);
> +            }
> +        }

rename() currently doesn't update the children lists of the old and
new parent directories.
Also, if rename() can be used for directories, we likely need to
rewrite/move all descendant keys (/old/... -> /new/...) across
entries/files/directories to keep the tree consistent.
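
A rough, untested sketch of what I have in mind. It only uses std, and
the helper names (rekey_subtree, relink_child, split_path) are made up
for illustration, they are not part of the patch. It also assumes the
new path does not lie inside the old subtree:

```rust
use std::collections::HashMap;

/// Hypothetical helper: move `old` and every key below it to `new`.
/// Works for any of the path-keyed maps (entries/files/directories).
fn rekey_subtree<V>(map: &mut HashMap<String, V>, old: &str, new: &str) {
    let prefix = format!("{}/", old);
    let keys: Vec<String> = map
        .keys()
        .filter(|k| k.as_str() == old || k.starts_with(&prefix))
        .cloned()
        .collect();

    // Remove everything first, then reinsert under the new prefix.
    let moved: Vec<(String, V)> = keys
        .into_iter()
        .filter_map(|k| map.remove(&k).map(|v| (k, v)))
        .collect();
    for (key, value) in moved {
        map.insert(format!("{}{}", new, &key[old.len()..]), value);
    }
}

/// Hypothetical helper: split "/a/b" into ("/a", "b").
fn split_path(path: &str) -> Option<(String, String)> {
    let (parent, name) = path.rsplit_once('/')?;
    let parent = if parent.is_empty() { "/" } else { parent };
    Some((parent.to_string(), name.to_string()))
}

/// Hypothetical helper: fix up the children lists of both parents.
fn relink_child(dirs: &mut HashMap<String, Vec<String>>, old_path: &str, new_path: &str) {
    if let Some((parent, name)) = split_path(old_path) {
        if let Some(children) = dirs.get_mut(&parent) {
            children.retain(|c| c != &name);
        }
    }
    if let Some((parent, name)) = split_path(new_path) {
        if let Some(children) = dirs.get_mut(&parent) {
            children.push(name);
        }
    }
}
```

The prefix check uses "/old/" rather than "/old" so that a sibling such
as "/older" is not moved by accident.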

> +
> +        // Move entry - hold write lock for entire operation
> +        {
> +            let mut entries = self.entries.write();
> +            if let Some(mut entry) = entries.remove(old_path) {
> +                entry.name = Self::file_name(new_path);
> +                entries.insert(new_path.to_string(), entry);
> +            }
> +        }

Between releasing the read lock and taking the write lock we have a
TOCTOU window. Couldn't we just hold the write lock for the check as
well?
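
Roughly like this (untested sketch). Note that it uses std::sync::RwLock
and a plain String value instead of parking_lot and TreeEntry, only to
keep the example self-contained; rename_entry is a made-up name:

```rust
use std::collections::HashMap;
use std::sync::RwLock;

/// Check and move under a single write guard, so no other writer can
/// change the map between the existence check and the move.
fn rename_entry(
    entries: &RwLock<HashMap<String, String>>,
    old_path: &str,
    new_path: &str,
) -> Result<(), String> {
    // std's RwLock reports poisoning via Result; parking_lot's write()
    // returns the guard directly.
    let mut entries = entries.write().map_err(|e| e.to_string())?;

    if entries.contains_key(new_path) {
        return Err(format!("Destination already exists: {}", new_path));
    }

    let entry = entries
        .remove(old_path)
        .ok_or_else(|| format!("Source not found: {}", old_path))?;
    entries.insert(new_path.to_string(), entry);
    Ok(())
}
```

The files and directories maps would still be updated under their own
locks afterwards, so this only closes the window on the entries map.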

> +
> +        // Move file data - hold write lock for entire operation
> +        {
> +            let mut files = self.files.write();
> +            if let Some(data) = files.remove(old_path) {
> +                files.insert(new_path.to_string(), data);
> +            }
> +        }
> +
> +        // Move directory - hold write lock for entire operation
> +        {
> +            let mut directories = self.directories.write();
> +            if let Some(children) = directories.remove(old_path) {
> +                directories.insert(new_path.to_string(), children);
> +            }
> +        }
> +
> +        self.version.fetch_add(1, Ordering::SeqCst);
> +        Ok(())
> +    }
> +
> +    fn exists(&self, path: &str) -> Result<bool> {
> +        Ok(self.entries.read().contains_key(path))
> +    }
> +
> +    fn readdir(&self, path: &str) -> Result<Vec<TreeEntry>> {
> +        let directories = self.directories.read();
> +        let children = directories
> +            .get(path)
> +            .ok_or_else(|| anyhow::anyhow!("Not a directory: {}", path))?;
> +
> +        let entries = self.entries.read();
> +        let mut result = Vec::new();
> +
> +        for child_name in children {
> +            let child_path = if path == "/" {
> +                format!("/{}", child_name)
> +            } else {
> +                format!("{}/{}", path, child_name)
> +            };
> +
> +            if let Some(entry) = entries.get(&child_path) {
> +                result.push(entry.clone());
> +            }
> +        }
> +
> +        Ok(result)
> +    }
> +
> +    fn set_mtime(&self, path: &str, _writer: u32, mtime: u32) -> Result<()> {
> +        let mut entries = self.entries.write();
> +        let entry = entries
> +            .get_mut(path)
> +            .ok_or_else(|| anyhow::anyhow!("File not found: {}", path))?;
> +        entry.mtime = mtime;
> +        Ok(())
> +    }
> +
> +    fn lookup_path(&self, path: &str) -> Option<TreeEntry> {
> +        self.entries.read().get(path).cloned()
> +    }
> +
> +    fn get_entry_by_inode(&self, inode: u64) -> Option<TreeEntry> {
> +        self.entries
> +            .read()
> +            .values()
> +            .find(|e| e.inode == inode)
> +            .cloned()
> +    }
> +
> +    fn acquire_lock(&self, path: &str, csum: &[u8; 32]) -> Result<()> {
> +        let mut locks = self.locks.write();
> +        let now = SystemTime::now()
> +            .duration_since(UNIX_EPOCH)
> +            .unwrap()
> +            .as_secs();
> +
> +        if let Some((timestamp, existing_csum)) = locks.get(path) {
> +            // Check if expired
> +            if now - timestamp > 120 {

nit: magic number here, could we use a
const LOCK_TIMEOUT_SECS: u64 = 120; for example?
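
A small helper could then be shared by acquire_lock(), is_locked() and
lock_expired(). Untested sketch; lock_is_expired is a made-up name, and
the saturating_sub is my own addition (not in the patch) to avoid an
underflow if the stored timestamp is ahead of `now`:

```rust
/// Lock timeout in seconds (120 is the value used in the patch).
const LOCK_TIMEOUT_SECS: u64 = 120;

/// Returns true if a lock taken at `timestamp` has expired at `now`.
/// Both values are seconds since the Unix epoch.
fn lock_is_expired(now: u64, timestamp: u64) -> bool {
    // saturating_sub yields 0 instead of underflowing when
    // timestamp > now (e.g. after a clock adjustment).
    now.saturating_sub(timestamp) > LOCK_TIMEOUT_SECS
}
```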

> +                // Expired, can acquire
> +                locks.insert(path.to_string(), (now, *csum));
> +                return Ok(());
> +            }
> +
> +            // Not expired, check if same checksum (refresh)
> +            if existing_csum == csum {
> +                locks.insert(path.to_string(), (now, *csum));
> +                return Ok(());
> +            }
> +
> +            bail!("Lock already held with different checksum");
> +        }
> +
> +        locks.insert(path.to_string(), (now, *csum));
> +        Ok(())
> +    }
> +
> +    fn release_lock(&self, path: &str, csum: &[u8; 32]) -> Result<()> {
> +        let mut locks = self.locks.write();
> +        if let Some((_, existing_csum)) = locks.get(path) {
> +            if existing_csum == csum {
> +                locks.remove(path);
> +                return Ok(());
> +            }
> +            bail!("Lock checksum mismatch");
> +        }
> +        bail!("No lock found");
> +    }
> +
> +    fn is_locked(&self, path: &str) -> bool {
> +        if let Some((timestamp, _)) = self.locks.read().get(path) {
> +            let now = SystemTime::now()
> +                .duration_since(UNIX_EPOCH)
> +                .unwrap()
> +                .as_secs();
> +            now - timestamp <= 120
> +        } else {
> +            false
> +        }
> +    }
> +
> +    fn lock_expired(&self, path: &str, csum: &[u8; 32]) -> bool {
> +        if let Some((timestamp, existing_csum)) = self.locks.read().get(path).cloned() {
> +            let now = SystemTime::now()
> +                .duration_since(UNIX_EPOCH)
> +                .unwrap()
> +                .as_secs();
> +
> +            // Checksum mismatch - reset timeout
> +            if &existing_csum != csum {
> +                self.locks.write().insert(path.to_string(), (now, *csum));

Could we please document why we modify state here when the checksums
mismatch?

> +                return false;
> +            }
> +
> +            // Check expiration
> +            now - timestamp > 120
> +        } else {
> +            false
> +        }
> +    }
> +
> +    fn get_version(&self) -> u64 {
> +        self.version.load(Ordering::SeqCst)
> +    }
> +
> +    fn get_all_entries(&self) -> Result<Vec<TreeEntry>> {
> +        Ok(self.entries.read().values().cloned().collect())
> +    }
> +
> +    fn replace_all_entries(&self, entries: Vec<TreeEntry>) -> Result<()> {

Also, replace_all_entries() / apply_tree_entry() don’t rebuild the
parents' children lists in directories[..].

> +        self.entries.write().clear();

This clears entries, so shouldn't the root TreeEntry ("/") be
reinserted to preserve the invariants (similar to directories below)?

> +        self.files.write().clear();
> +        self.directories.write().clear();

Clearing directories removes "/" but doesn’t reinsert it.

If possible, we could acquire all write locks once (in the right order)
before the loop.

> +
> +        for entry in entries {
> +            let path = format!("/{}", entry.name); // Simplified
> +            self.entries.write().insert(path.clone(), entry.clone());
> +
> +            if entry.size > 0 {

Use entry.entry_type == DT_DIR to distinguish directories from files.
The current entry.size > 0 check incorrectly classifies empty files
(size 0) as directories.
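
To illustrate the points above (root reinserted, classification by
entry_type, children list rebuilt), an untested sketch. Entry is a
reduced stand-in for TreeEntry and Tables/rebuild are made-up names; it
keeps the flat "/{name}" simplification from the patch and does not
show reinserting the root TreeEntry into entries:

```rust
use std::collections::HashMap;

const DT_DIR: u8 = 4;

/// Reduced stand-in for TreeEntry, only the fields used here.
struct Entry {
    name: String,
    entry_type: u8,
    data: Vec<u8>,
}

/// Stand-in for the two maps rebuilt by replace_all_entries().
struct Tables {
    files: HashMap<String, Vec<u8>>,
    directories: HashMap<String, Vec<String>>,
}

fn rebuild(entries: &[Entry]) -> Tables {
    let mut files = HashMap::new();
    let mut directories: HashMap<String, Vec<String>> = HashMap::new();

    // The root directory must always exist.
    directories.insert("/".to_string(), Vec::new());

    for entry in entries {
        let path = format!("/{}", entry.name); // Simplified, as in the patch

        // Classify by type, so empty files are not treated as directories.
        if entry.entry_type == DT_DIR {
            directories.entry(path).or_default();
        } else {
            files.insert(path, entry.data.clone());
        }

        // Register the entry in its parent's children list (always "/"
        // under the flat simplification).
        directories
            .get_mut("/")
            .expect("root inserted above")
            .push(entry.name.clone());
    }

    Tables { files, directories }
}
```

In the real implementation the maps would be the locked fields of
MockMemDb, with the write guards taken once before the loop.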

> +                self.files.write().insert(path, entry.data.clone());
> +            } else {
> +                self.directories.write().insert(path, Vec::new());
> +            }
> +        }
> +
> +        self.version.fetch_add(1, Ordering::SeqCst);
> +        Ok(())
> +    }
> +
> +    fn apply_tree_entry(&self, entry: TreeEntry) -> Result<()> {
> +        let path = format!("/{}", entry.name); // Simplified
> +        self.entries.write().insert(path.clone(), entry.clone());
> +
> +        if entry.size > 0 {

Same here: please check entry.entry_type (DT_DIR vs. regular file)
instead of entry.size > 0.

> +            self.files.write().insert(path, entry.data.clone());
> +        }
> +
> +        self.version.fetch_add(1, Ordering::SeqCst);
> +        Ok(())
> +    }
> +
> +    fn encode_database(&self) -> Result<Vec<u8>> {
> +        // Simplified - just return empty vec
> +        Ok(Vec::new())
> +    }
> +
> +    fn compute_database_checksum(&self) -> Result<[u8; 32]> {
> +        // Simplified - return deterministic checksum based on version
> +        let version = self.version.load(Ordering::SeqCst);
> +        let mut checksum = [0u8; 32];
> +        checksum[0..8].copy_from_slice(&version.to_le_bytes());
> +        Ok(checksum)
> +    }
> +}
> +
> +#[cfg(test)]
> +mod tests {
> +    use super::*;
> +    use std::sync::Arc;
> +
> +    #[test]
> +    fn test_mock_memdb_basic_operations() {
> +        let db = MockMemDb::new();
> +
> +        // Create file
> +        db.create("/test.txt", libc::S_IFREG, 1234).unwrap();
> +        assert!(db.exists("/test.txt").unwrap());
> +
> +        // Write data
> +        let data = b"Hello, MockMemDb!";
> +        db.write("/test.txt", 0, 1235, data, false).unwrap();
> +
> +        // Read data
> +        let read_data = db.read("/test.txt", 0, 100).unwrap();
> +        assert_eq!(&read_data[..], data);
> +
> +        // Check entry
> +        let entry = db.lookup_path("/test.txt").unwrap();
> +        assert_eq!(entry.size, data.len());
> +        assert_eq!(entry.mtime, 1235);
> +    }
> +
> +    #[test]
> +    fn test_mock_memdb_directory_operations() {
> +        let db = MockMemDb::new();
> +
> +        // Create directory
> +        db.create("/mydir", libc::S_IFDIR, 1000).unwrap();
> +        assert!(db.exists("/mydir").unwrap());
> +
> +        // Create file in directory
> +        db.create("/mydir/file.txt", libc::S_IFREG, 1001).unwrap();
> +
> +        // Read directory
> +        let entries = db.readdir("/mydir").unwrap();
> +        assert_eq!(entries.len(), 1);
> +        assert_eq!(entries[0].name, "file.txt");
> +    }
> +
> +    #[test]
> +    fn test_mock_memdb_lock_operations() {
> +        let db = MockMemDb::new();
> +        let csum1 = [1u8; 32];
> +        let csum2 = [2u8; 32];
> +
> +        // Acquire lock
> +        db.acquire_lock("/priv/lock/resource", &csum1).unwrap();
> +        assert!(db.is_locked("/priv/lock/resource"));
> +
> +        // Lock with same checksum should succeed (refresh)
> +        assert!(db.acquire_lock("/priv/lock/resource", &csum1).is_ok());
> +
> +        // Lock with different checksum should fail
> +        assert!(db.acquire_lock("/priv/lock/resource", &csum2).is_err());
> +
> +        // Release lock
> +        db.release_lock("/priv/lock/resource", &csum1).unwrap();
> +        assert!(!db.is_locked("/priv/lock/resource"));
> +
> +        // Can acquire with different checksum now
> +        db.acquire_lock("/priv/lock/resource", &csum2).unwrap();
> +        assert!(db.is_locked("/priv/lock/resource"));
> +    }
> +
> +    #[test]
> +    fn test_mock_memdb_rename() {
> +        let db = MockMemDb::new();
> +
> +        // Create file
> +        db.create("/old.txt", libc::S_IFREG, 1000).unwrap();
> +        db.write("/old.txt", 0, 1001, b"content", false).unwrap();
> +
> +        // Rename
> +        db.rename("/old.txt", "/new.txt").unwrap();
> +
> +        // Old path should not exist
> +        assert!(!db.exists("/old.txt").unwrap());
> +
> +        // New path should exist with same content
> +        assert!(db.exists("/new.txt").unwrap());
> +        let data = db.read("/new.txt", 0, 100).unwrap();
> +        assert_eq!(&data[..], b"content");
> +    }
> +
> +    #[test]
> +    fn test_mock_memdb_delete() {
> +        let db = MockMemDb::new();
> +
> +        // Create and delete file
> +        db.create("/delete-me.txt", libc::S_IFREG, 1000).unwrap();
> +        assert!(db.exists("/delete-me.txt").unwrap());
> +
> +        db.delete("/delete-me.txt").unwrap();
> +        assert!(!db.exists("/delete-me.txt").unwrap());
> +
> +        // Delete non-existent file should fail
> +        assert!(db.delete("/nonexistent.txt").is_err());
> +    }
> +
> +    #[test]
> +    fn test_mock_memdb_version_tracking() {
> +        let db = MockMemDb::new();
> +        let initial_version = db.get_version();
> +
> +        // Version should increment on modifications
> +        db.create("/file1.txt", libc::S_IFREG, 1000).unwrap();
> +        assert!(db.get_version() > initial_version);
> +
> +        let v1 = db.get_version();
> +        db.write("/file1.txt", 0, 1001, b"data", false).unwrap();
> +        assert!(db.get_version() > v1);
> +
> +        let v2 = db.get_version();
> +        db.delete("/file1.txt").unwrap();
> +        assert!(db.get_version() > v2);
> +    }
> +
> +    #[test]
> +    fn test_mock_memdb_isolation() {
> +        // Each MockMemDb instance is completely isolated
> +        let db1 = MockMemDb::new();
> +        let db2 = MockMemDb::new();
> +
> +        db1.create("/test.txt", libc::S_IFREG, 1000).unwrap();
> +
> +        // db2 should not see db1's files
> +        assert!(db1.exists("/test.txt").unwrap());
> +        assert!(!db2.exists("/test.txt").unwrap());
> +    }
> +
> +    #[test]
> +    fn test_mock_memdb_as_trait_object() {
> +        // Demonstrate using MockMemDb through trait object
> +        let db: Arc<dyn MemDbOps> = Arc::new(MockMemDb::new());
> +
> +        db.create("/trait-test.txt", libc::S_IFREG, 2000).unwrap();
> +        assert!(db.exists("/trait-test.txt").unwrap());
> +
> +        db.write("/trait-test.txt", 0, 2001, b"via trait", false)
> +            .unwrap();
> +        let data = db.read("/trait-test.txt", 0, 100).unwrap();
> +        assert_eq!(&data[..], b"via trait");
> +    }
> +
> +    #[test]
> +    fn test_mock_memdb_error_cases() {
> +        let db = MockMemDb::new();
> +
> +        // Create duplicate should fail
> +        db.create("/dup.txt", libc::S_IFREG, 1000).unwrap();
> +        assert!(db.create("/dup.txt", libc::S_IFREG, 1000).is_err());
> +
> +        // Read non-existent file should fail
> +        assert!(db.read("/nonexistent.txt", 0, 100).is_err());
> +
> +        // Write to non-existent file should fail
> +        assert!(
> +            db.write("/nonexistent.txt", 0, 1000, b"data", false)
> +                .is_err()
> +        );
> +
> +        // Empty path should fail
> +        assert!(db.create("", libc::S_IFREG, 1000).is_err());
> +    }
> +}

* Re: [pve-devel] [PATCH pve-cluster 14/15] pmxcfs-rs: add Makefile for build automation
  @ 2026-02-09 16:25  6%   ` Samuel Rufinatscha
  0 siblings, 0 replies; 117+ results
From: Samuel Rufinatscha @ 2026-02-09 16:25 UTC (permalink / raw)
  To: Proxmox VE development discussion, Kefu Chai

On 1/6/26 3:25 PM, Kefu Chai wrote:
> Add Makefile with standard targets for building, testing, and linting:
> - test: Run all workspace tests
> - clippy: Lint code with clippy
> - fmt: Check code formatting
> - check: Full quality check (fmt + clippy + test)
> - build: Build release version
> - clean: Clean build artifacts
> 
> This provides a consistent interface for building and testing the
> Rust implementation.
> 
> Signed-off-by: Kefu Chai <k.chai@proxmox.com>
> ---
>   src/pmxcfs-rs/.gitignore |  1 +
>   src/pmxcfs-rs/Makefile   | 39 +++++++++++++++++++++++++++++++++++++++
>   2 files changed, 40 insertions(+)
>   create mode 100644 src/pmxcfs-rs/.gitignore
>   create mode 100644 src/pmxcfs-rs/Makefile
> 
> diff --git a/src/pmxcfs-rs/.gitignore b/src/pmxcfs-rs/.gitignore
> new file mode 100644
> index 00000000..ea8c4bf7
> --- /dev/null
> +++ b/src/pmxcfs-rs/.gitignore
> @@ -0,0 +1 @@
> +/target

nit: Since patch 1 introduces the workspace, could we add the
.gitignore there (and possibly fold this Makefile into that
patch as well)?

> diff --git a/src/pmxcfs-rs/Makefile b/src/pmxcfs-rs/Makefile
> new file mode 100644
> index 00000000..eaa96317
> --- /dev/null
> +++ b/src/pmxcfs-rs/Makefile
> @@ -0,0 +1,39 @@
> +.PHONY: all test lint clippy fmt check build clean help
> +
> +# Default target
> +all: check build
> +
> +# Run all tests
> +test:
> +	cargo test --workspace
> +
> +# Lint with clippy (using proxmox-backup style: only fail on correctness issues)
> +clippy:
> +	cargo clippy --workspace -- -A clippy::all -D clippy::correctness
> +
> +# Check code formatting
> +fmt:
> +	cargo fmt --all --check
> +
> +# Full quality check (format + lint + test)
> +check: fmt clippy test
> +
> +# Build release version
> +build:
> +	cargo build --workspace --release
> +
> +# Clean build artifacts
> +clean:
> +	cargo clean
> +
> +# Show available targets
> +help:
> +	@echo "Available targets:"
> +	@echo "  all      - Run check and build (default)"
> +	@echo "  test     - Run all tests"
> +	@echo "  clippy   - Run clippy linter"
> +	@echo "  fmt      - Check code formatting"
> +	@echo "  check    - Run fmt + clippy + test"
> +	@echo "  build    - Build release version"
> +	@echo "  clean    - Clean build artifacts"
> +	@echo "  help     - Show this help message"


* Re: [pbs-devel] [PATCH proxmox-backup v4 2/4] pbs-config: cache verified API token secrets
  2026-01-21 15:13 12% ` [pbs-devel] [PATCH proxmox-backup v4 2/4] pbs-config: cache verified API token secrets Samuel Rufinatscha
@ 2026-02-10 12:54  5%   ` Christian Ebner
  2026-02-10 13:08  6%     ` Samuel Rufinatscha
  0 siblings, 1 reply; 117+ results
From: Christian Ebner @ 2026-02-10 12:54 UTC (permalink / raw)
  To: Proxmox Backup Server development discussion, Samuel Rufinatscha

one suggestion below

On 1/21/26 4:13 PM, Samuel Rufinatscha wrote:
> Adds an in-memory cache of successfully verified token secrets.
> Subsequent requests for the same token+secret combination only perform a
> comparison using openssl::memcmp::eq and avoid re-running the password
> hash. The cache is updated when a token secret is set and cleared when a
> token is deleted.
> 
> Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
> ---
> Changes from v3 to v4:
> * Add gen param to invalidate_cache_state()
> * Validates the generation bump after obtaining write lock in
> apply_api_mutation
> * Pass lock to apply_api_mutation
> * Remove unnecessary gen check cache_try_secret_matches
> * Adjusted commit message
> 
> Changes from v2 to v3:
> * Replaced process-local cache invalidation (AtomicU64
> API_MUTATION_GENERATION) with a cross-process shared generation via
> ConfigVersionCache.
> * Validate shared generation before/after the constant-time secret
> compare; only insert into cache if the generation is unchanged.
> * invalidate_cache_state() on insert if shared generation changed.
> 
> Changes from v1 to v2:
> * Replace OnceCell with LazyLock, and std::sync::RwLock with
> parking_lot::RwLock.
> * Add API_MUTATION_GENERATION and guard cache inserts
> to prevent “zombie inserts” across concurrent set/delete.
> * Refactor cache operations into cache_try_secret_matches,
> cache_try_insert_secret, and centralize write-side behavior in
> apply_api_mutation.
> * Switch fast-path cache access to try_read/try_write (best-effort).
> 
>   Cargo.toml                     |   1 +
>   pbs-config/Cargo.toml          |   1 +
>   pbs-config/src/token_shadow.rs | 160 ++++++++++++++++++++++++++++++++-
>   3 files changed, 159 insertions(+), 3 deletions(-)
> 
> diff --git a/Cargo.toml b/Cargo.toml
> index 0da18383..aed66fe3 100644
> --- a/Cargo.toml
> +++ b/Cargo.toml
> @@ -143,6 +143,7 @@ nom = "7"
>   num-traits = "0.2"
>   once_cell = "1.3.1"
>   openssl = "0.10.40"
> +parking_lot = "0.12"
>   percent-encoding = "2.1"
>   pin-project-lite = "0.2"
>   regex = "1.5.5"
> diff --git a/pbs-config/Cargo.toml b/pbs-config/Cargo.toml
> index 74afb3c6..eb81ce00 100644
> --- a/pbs-config/Cargo.toml
> +++ b/pbs-config/Cargo.toml
> @@ -13,6 +13,7 @@ libc.workspace = true
>   nix.workspace = true
>   once_cell.workspace = true
>   openssl.workspace = true
> +parking_lot.workspace = true
>   regex.workspace = true
>   serde.workspace = true
>   serde_json.workspace = true
> diff --git a/pbs-config/src/token_shadow.rs b/pbs-config/src/token_shadow.rs
> index 640fabbf..d5aa5de2 100644
> --- a/pbs-config/src/token_shadow.rs
> +++ b/pbs-config/src/token_shadow.rs
> @@ -1,6 +1,8 @@
>   use std::collections::HashMap;
> +use std::sync::LazyLock;
>   
>   use anyhow::{bail, format_err, Error};
> +use parking_lot::RwLock;
>   use serde::{Deserialize, Serialize};
>   use serde_json::{from_value, Value};
>   
> @@ -13,6 +15,18 @@ use crate::{open_backup_lockfile, BackupLockGuard};
>   const LOCK_FILE: &str = pbs_buildcfg::configdir!("/token.shadow.lock");
>   const CONF_FILE: &str = pbs_buildcfg::configdir!("/token.shadow");
>   
> +/// Global in-memory cache for successfully verified API token secrets.
> +/// The cache stores plain text secrets for token Authids that have already been
> +/// verified against the hashed values in `token.shadow`. This allows for cheap
> +/// subsequent authentications for the same token+secret combination, avoiding
> +/// recomputing the password hash on every request.
> +static TOKEN_SECRET_CACHE: LazyLock<RwLock<ApiTokenSecretCache>> = LazyLock::new(|| {
> +    RwLock::new(ApiTokenSecretCache {
> +        secrets: HashMap::new(),
> +        shared_gen: 0,
> +    })
> +});
> +
>   #[derive(Serialize, Deserialize)]
>   #[serde(rename_all = "kebab-case")]
>   /// ApiToken id / secret pair
> @@ -54,9 +68,27 @@ pub fn verify_secret(tokenid: &Authid, secret: &str) -> Result<(), Error> {
>           bail!("not an API token ID");
>       }
>   
> +    // Fast path
> +    if cache_try_secret_matches(tokenid, secret) {
> +        return Ok(());
> +    }
> +
> +    // Slow path
> +    // First, capture the shared generation before doing the hash verification.
> +    let gen_before = token_shadow_shared_gen();
> +
>       let data = read_file()?;
>       match data.get(tokenid) {
> -        Some(hashed_secret) => proxmox_sys::crypt::verify_crypt_pw(secret, hashed_secret),
> +        Some(hashed_secret) => {
> +            proxmox_sys::crypt::verify_crypt_pw(secret, hashed_secret)?;
> +
> +            // Try to cache only if nothing changed while verifying the secret.
> +            if let Some(gen) = gen_before {
> +                cache_try_insert_secret(tokenid.clone(), secret.to_owned(), gen);
> +            }
> +
> +            Ok(())
> +        }
>           None => bail!("invalid API token"),
>       }
>   }
> @@ -75,13 +107,15 @@ fn set_secret(tokenid: &Authid, secret: &str) -> Result<(), Error> {
>           bail!("not an API token ID");
>       }
>   
> -    let _guard = lock_config()?;
> +    let guard = lock_config()?;
>   
>       let mut data = read_file()?;
>       let hashed_secret = proxmox_sys::crypt::encrypt_pw(secret)?;
>       data.insert(tokenid.clone(), hashed_secret);
>       write_file(data)?;
>   
> +    apply_api_mutation(guard, tokenid, Some(secret));
> +
>       Ok(())
>   }
>   
> @@ -91,11 +125,131 @@ pub fn delete_secret(tokenid: &Authid) -> Result<(), Error> {
>           bail!("not an API token ID");
>       }
>   
> -    let _guard = lock_config()?;
> +    let guard = lock_config()?;
>   
>       let mut data = read_file()?;
>       data.remove(tokenid);
>       write_file(data)?;
>   
> +    apply_api_mutation(guard, tokenid, None);
> +
>       Ok(())
>   }
> +
> +struct ApiTokenSecretCache {
> +    /// Keys are token Authids, values are the corresponding plain text secrets.
> +    /// Entries are added after a successful on-disk verification in
> +    /// `verify_secret` or when a new token secret is generated by
> +    /// `generate_and_set_secret`. Used to avoid repeated
> +    /// password-hash computation on subsequent authentications.
> +    secrets: HashMap<Authid, CachedSecret>,
> +    /// Shared generation to detect mutations of the underlying token.shadow file.
> +    shared_gen: usize,
> +}
> +
> +/// Cached secret.
> +struct CachedSecret {
> +    secret: String,
> +}
> +
> +fn cache_try_insert_secret(tokenid: Authid, secret: String, shared_gen_before: usize) {
> +    let Some(mut cache) = TOKEN_SECRET_CACHE.try_write() else {
> +        return;
> +    };
> +
> +    let Some(shared_gen_now) = token_shadow_shared_gen() else {
> +        return;
> +    };
> +
> +    // If this process missed a generation bump, its cache is stale.
> +    if cache.shared_gen != shared_gen_now {
> +        invalidate_cache_state_and_set_gen(&mut cache, shared_gen_now);
> +    }
> +
> +    // If a mutation happened while we were verifying the secret, do not insert.
> +    if shared_gen_now == shared_gen_before {
> +        cache.secrets.insert(tokenid, CachedSecret { secret });
> +    }
> +}
> +
> +/// Tries to match the given token secret against the cached secret.
> +///
> +/// Verifies the generation/version before doing the constant-time
> +/// comparison to reduce TOCTOU risk. During token rotation or deletion
> +/// tokens for in-flight requests may still validate against the previous
> +/// generation.
> +fn cache_try_secret_matches(tokenid: &Authid, secret: &str) -> bool {
> +    let Some(cache) = TOKEN_SECRET_CACHE.try_read() else {
> +        return false;
> +    };
> +    let Some(entry) = cache.secrets.get(tokenid) else {
> +        return false;
> +    };
> +    let Some(current_gen) = token_shadow_shared_gen() else {
> +        return false;
> +    };
> +
> +    if current_gen == cache.shared_gen {
> +        return openssl::memcmp::eq(entry.secret.as_bytes(), secret.as_bytes());
> +    }
> +
> +    false
> +}
> +
> +fn apply_api_mutation(_guard: BackupLockGuard, tokenid: &Authid, new_secret: Option<&str>) {
> +    // Signal cache invalidation to other processes (best-effort).
> +    let bumped_gen = bump_token_shadow_shared_gen();
> +
> +    let mut cache = TOKEN_SECRET_CACHE.write();
> +
> +    // If we cannot get the current generation, we cannot trust the cache
> +    let Some(current_gen) = token_shadow_shared_gen() else {
> +        invalidate_cache_state_and_set_gen(&mut cache, 0);
> +        return;
> +    };
> +
> +    // If we cannot bump the shared generation, or if it changed after
> +    // obtaining the cache write lock, we cannot trust the cache
> +    if bumped_gen != Some(current_gen) {
> +        invalidate_cache_state_and_set_gen(&mut cache, current_gen);
> +        return;
> +    }
> +
> +    // Update to the post-mutation generation.
> +    cache.shared_gen = current_gen;
> +
> +    // Apply the new mutation.
> +    match new_secret {
> +        Some(secret) => {
> +            cache.secrets.insert(
> +                tokenid.clone(),
> +                CachedSecret {
> +                    secret: secret.to_owned(),
> +                },
> +            );
> +        }
> +        None => {
> +            cache.secrets.remove(tokenid);
> +        }
> +    }
> +}
> +
> +/// Get the current shared generation.
> +fn token_shadow_shared_gen() -> Option<usize> {
> +    crate::ConfigVersionCache::new()
> +        .ok()
> +        .map(|cvc| cvc.token_shadow_generation())
> +}
> +
> +/// Bump and return the new shared generation.
> +fn bump_token_shadow_shared_gen() -> Option<usize> {
> +    crate::ConfigVersionCache::new()
> +        .ok()
> +        .map(|cvc| cvc.increase_token_shadow_generation() + 1)
> +}
> +
> +/// Invalidates local cache contents and sets/updates the cached generation.
> +fn invalidate_cache_state_and_set_gen(cache: &mut ApiTokenSecretCache, gen: usize) {
> +    cache.secrets.clear();
> +    cache.shared_gen = gen;
> +}

The above function operates on the cache, so why not make it a method
thereof? And also bundle the generation updates with the mutations, so
they cannot be forgotten.

Something along the lines of the following diff on top of this patch:

diff --git a/pbs-config/src/token_shadow.rs b/pbs-config/src/token_shadow.rs
index d5aa5de28..a8104f142 100644
--- a/pbs-config/src/token_shadow.rs
+++ b/pbs-config/src/token_shadow.rs
@@ -136,6 +136,11 @@ pub fn delete_secret(tokenid: &Authid) -> Result<(), Error> {
      Ok(())
  }

+/// Cached secret.
+struct CachedSecret {
+    secret: String,
+}
+
  struct ApiTokenSecretCache {
      /// Keys are token Authids, values are the corresponding plain text secrets.
      /// Entries are added after a successful on-disk verification in
@@ -147,9 +152,22 @@ struct ApiTokenSecretCache {
      shared_gen: usize,
  }

-/// Cached secret.
-struct CachedSecret {
-    secret: String,
+impl ApiTokenSecretCache {
+    /// Invalidates local cache contents and sets/updates the cached generation.
+    fn invalidate_state_and_set_gen(&mut self, gen: usize) {
+        self.secrets.clear();
+        self.shared_gen = gen;
+    }
+
+    fn insert_and_set_gen(&mut self, tokenid: Authid, secret: CachedSecret, gen: usize) {
+        self.secrets.insert(tokenid, secret);
+        self.shared_gen = gen;
+    }
+
+    fn evict_and_set_gen(&mut self, tokenid: &Authid, gen: usize) {
+        self.secrets.remove(tokenid);
+        self.shared_gen = gen;
+    }
  }

  fn cache_try_insert_secret(tokenid: Authid, secret: String, shared_gen_before: usize) {
@@ -163,12 +181,12 @@ fn cache_try_insert_secret(tokenid: Authid, secret: String, shared_gen_before: u

      // If this process missed a generation bump, its cache is stale.
      if cache.shared_gen != shared_gen_now {
-        invalidate_cache_state_and_set_gen(&mut cache, shared_gen_now);
+        cache.invalidate_state_and_set_gen(shared_gen_now);
      }

      // If a mutation happened while we were verifying the secret, do not insert.
      if shared_gen_now == shared_gen_before {
-        cache.secrets.insert(tokenid, CachedSecret { secret });
+        cache.insert_and_set_gen(tokenid, CachedSecret { secret }, shared_gen_now);
      }
  }

@@ -204,33 +222,24 @@ fn apply_api_mutation(_guard: BackupLockGuard, tokenid: &Authid, new_secret: Opt

      // If we cannot get the current generation, we cannot trust the cache
      let Some(current_gen) = token_shadow_shared_gen() else {
-        invalidate_cache_state_and_set_gen(&mut cache, 0);
+        cache.invalidate_state_and_set_gen(0);
          return;
      };

      // If we cannot bump the shared generation, or if it changed after
      // obtaining the cache write lock, we cannot trust the cache
      if bumped_gen != Some(current_gen) {
-        invalidate_cache_state_and_set_gen(&mut cache, current_gen);
+        cache.invalidate_state_and_set_gen(current_gen);
          return;
      }

-    // Update to the post-mutation generation.
-    cache.shared_gen = current_gen;
-
      // Apply the new mutation.
      match new_secret {
          Some(secret) => {
-            cache.secrets.insert(
-                tokenid.clone(),
-                CachedSecret {
-                    secret: secret.to_owned(),
-                },
-            );
-        }
-        None => {
-            cache.secrets.remove(tokenid);
+            let cached_secret = CachedSecret { secret: secret.to_owned() };
+            cache.insert_and_set_gen(tokenid.clone(), cached_secret, current_gen);
          }
+        None => cache.evict_and_set_gen(tokenid, current_gen),
      }
  }

@@ -248,8 +257,3 @@ fn bump_token_shadow_shared_gen() -> Option<usize> {
          .map(|cvc| cvc.increase_token_shadow_generation() + 1)
  }

-/// Invalidates local cache contents and sets/updates the cached generation.
-fn invalidate_cache_state_and_set_gen(cache: &mut ApiTokenSecretCache, gen: usize) {
-    cache.secrets.clear();
-    cache.shared_gen = gen;
-}
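
For reference, the end state the diff above aims for can be sketched as a
standalone snippet. This is only an illustration under assumptions: a plain
String alias (TokenId) stands in for Authid, the parameter is named
`generation` rather than `gen`, and the `new` constructor is hypothetical.
None of those are taken from the patch.

```rust
use std::collections::HashMap;

/// Stand-in for `Authid` (assumption, keeps the sketch self-contained).
type TokenId = String;

/// Cached secret.
struct CachedSecret {
    secret: String,
}

struct ApiTokenSecretCache {
    secrets: HashMap<TokenId, CachedSecret>,
    shared_gen: usize,
}

impl ApiTokenSecretCache {
    /// Hypothetical constructor, not part of the patch.
    fn new() -> Self {
        Self {
            secrets: HashMap::new(),
            shared_gen: 0,
        }
    }

    /// Drops all cached secrets and records the given generation.
    fn invalidate_state_and_set_gen(&mut self, generation: usize) {
        self.secrets.clear();
        self.shared_gen = generation;
    }

    /// Inserts a secret and records the generation in the same step.
    fn insert_and_set_gen(&mut self, tokenid: TokenId, secret: CachedSecret, generation: usize) {
        self.secrets.insert(tokenid, secret);
        self.shared_gen = generation;
    }

    /// Removes a secret and records the generation in the same step.
    fn evict_and_set_gen(&mut self, tokenid: &str, generation: usize) {
        self.secrets.remove(tokenid);
        self.shared_gen = generation;
    }
}
```

Since every mutating method takes the generation as an argument, a caller
cannot change the map without also stating which generation the change
belongs to.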





^ permalink raw reply related	[relevance 5%]

* Re: [pbs-devel] [PATCH proxmox-backup v4 4/4] pbs-config: add TTL window to token secret cache
  2026-01-21 15:14 15% ` [pbs-devel] [PATCH proxmox-backup v4 4/4] pbs-config: add TTL window to token secret cache Samuel Rufinatscha
@ 2026-02-10 12:58  6%   ` Christian Ebner
  2026-02-10 13:18  6%     ` Samuel Rufinatscha
  0 siblings, 1 reply; 117+ results
From: Christian Ebner @ 2026-02-10 12:58 UTC (permalink / raw)
  To: Proxmox Backup Server development discussion, Samuel Rufinatscha

one suggestion inline

On 1/21/26 4:13 PM, Samuel Rufinatscha wrote:
> verify_secret() currently calls refresh_cache_if_file_changed() on every
> request, which performs a metadata() call on token.shadow each time.
> Under load this adds unnecessary overhead, especially since the file
> rarely changes.
> 
> This patch introduces a TTL boundary, controlled by
> TOKEN_SECRET_CACHE_TTL_SECS. File metadata is only re-loaded once the
> TTL has expired. The patch also documents the TTL effects.
> 
> Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
> ---
> Changes from v3 to v4:
> * Adjusted commit message
> 
> Changes from v2 to v3:
> * Refactored refresh_cache_if_file_changed TTL logic.
> * Remove had_prior_state check (replaced by last_checked logic).
> * Improve TTL bound checks.
> * Reword documentation warning for clarity.
> 
> Changes from v1 to v2:
> * Add TOKEN_SECRET_CACHE_TTL_SECS and last_checked.
> * Implement double-checked TTL: check with try_read first; only attempt
>    refresh with try_write if expired/unknown.
> * Fix TTL bookkeeping: update last_checked on the “file unchanged” path
>    and after API mutations.
> * Add documentation warning about TTL-delayed effect of manual
>    token.shadow edits.
> 
>   docs/user-management.rst       |  4 ++++
>   pbs-config/src/token_shadow.rs | 29 ++++++++++++++++++++++++++++-
>   2 files changed, 32 insertions(+), 1 deletion(-)
> 
> diff --git a/docs/user-management.rst b/docs/user-management.rst
> index 41b43d60..8dfae528 100644
> --- a/docs/user-management.rst
> +++ b/docs/user-management.rst
> @@ -156,6 +156,10 @@ metadata:
>   Similarly, the ``user delete-token`` subcommand can be used to delete a token
>   again.
>   
> +.. WARNING:: Direct/manual edits to ``token.shadow`` may take up to 60 seconds (or
> +   longer in edge cases) to take effect due to caching. Restart services for
> +   immediate effect of manual edits.
> +
>   Newly generated API tokens don't have any permissions. Please read the next
>   section to learn how to set access permissions.
>   
> diff --git a/pbs-config/src/token_shadow.rs b/pbs-config/src/token_shadow.rs
> index a5bd1525..24633f6e 100644
> --- a/pbs-config/src/token_shadow.rs
> +++ b/pbs-config/src/token_shadow.rs
> @@ -31,6 +31,8 @@ static TOKEN_SECRET_CACHE: LazyLock<RwLock<ApiTokenSecretCache>> = LazyLock::new
>           shadow: None,
>       })
>   });
> +/// Max age in seconds of the token secret cache before checking for file changes.
> +const TOKEN_SECRET_CACHE_TTL_SECS: i64 = 60;
>   
>   #[derive(Serialize, Deserialize)]
>   #[serde(rename_all = "kebab-case")]
> @@ -72,11 +74,29 @@ fn write_file(data: HashMap<Authid, String>) -> Result<(), Error> {
>   fn refresh_cache_if_file_changed() -> bool {
>       let now = epoch_i64();
>   
> -    // Best-effort refresh under write lock.
> +    // Fast path: cache is fresh if shared-gen matches and TTL not expired.
> +    if let (Some(cache), Some(shared_gen_read)) =
> +        (TOKEN_SECRET_CACHE.try_read(), token_shadow_shared_gen())
> +    {
> +        if cache.shared_gen == shared_gen_read

starting here ..

> +            && cache.shadow.as_ref().is_some_and(|cached| {
> +                now >= cached.last_checked
> +                    && (now - cached.last_checked) < TOKEN_SECRET_CACHE_TTL_SECS

... to here is the same as ...

> +            })
> +        {
> +            return true;
> +        }
> +        // read lock drops here
> +    } else {
> +        return false;
> +    }
> +
> +    // Slow path: best-effort refresh under write lock.
>       let Some(mut cache) = TOKEN_SECRET_CACHE.try_write() else {
>           return false;
>       };
>   
> +    // Re-read generation after acquiring the lock (may have changed meanwhile).
>       let Some(shared_gen_now) = token_shadow_shared_gen() else {
>           return false;
>       };
> @@ -86,6 +106,13 @@ fn refresh_cache_if_file_changed() -> bool {
>           invalidate_cache_state_and_set_gen(&mut cache, shared_gen_now);
>       }
>   
> +    // TTL check again after acquiring the lock
> +    if cache.shadow.as_ref().is_some_and(|cached| {
> +        now >= cached.last_checked && (now - cached.last_checked) < TOKEN_SECRET_CACHE_TTL_SECS

... this check above. I think this could be defined as method on the 
cache object so it can be easily reused and the code is more readable.

> +    }) {
> +        return true;
> +    }
> +
>       // Stat the file to detect manual edits.
>       let Ok((new_mtime, new_len)) = shadow_mtime_len() else {
>           return false;
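
The duplicated TTL check flagged inline could be factored out roughly as
follows. This is a minimal sketch, not the actual implementation:
ShadowFileInfo is a hypothetical, reduced stand-in for whatever the `shadow`
field really holds, and the method names shadow_is_fresh and is_fresh are
made up for illustration. Only TOKEN_SECRET_CACHE_TTL_SECS, `shadow`,
`last_checked` and `shared_gen` come from the patch.

```rust
/// Max age in seconds of the token secret cache before checking for file changes.
const TOKEN_SECRET_CACHE_TTL_SECS: i64 = 60;

/// Hypothetical stand-in for the cached file state; only the field
/// needed by the TTL check is modelled here.
struct ShadowFileInfo {
    last_checked: i64,
}

struct ApiTokenSecretCache {
    shadow: Option<ShadowFileInfo>,
    shared_gen: usize,
}

impl ApiTokenSecretCache {
    /// Returns true if the file state was checked less than
    /// TOKEN_SECRET_CACHE_TTL_SECS seconds ago. A `last_checked` in the
    /// future (clock stepped backwards) or a missing state counts as expired.
    fn shadow_is_fresh(&self, now: i64) -> bool {
        self.shadow.as_ref().is_some_and(|cached| {
            now >= cached.last_checked
                && (now - cached.last_checked) < TOKEN_SECRET_CACHE_TTL_SECS
        })
    }

    /// Fast-path condition: generation matches and the TTL has not expired.
    fn is_fresh(&self, shared_gen_now: usize, now: i64) -> bool {
        self.shared_gen == shared_gen_now && self.shadow_is_fresh(now)
    }
}
```

With something like this, the fast path would reduce to
`cache.is_fresh(shared_gen_read, now)` and the re-check under the write lock
to `cache.shadow_is_fresh(now)`.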





^ permalink raw reply	[relevance 6%]

* Re: [pbs-devel] [PATCH proxmox-backup v4 2/4] pbs-config: cache verified API token secrets
  2026-02-10 12:54  5%   ` Christian Ebner
@ 2026-02-10 13:08  6%     ` Samuel Rufinatscha
  0 siblings, 0 replies; 117+ results
From: Samuel Rufinatscha @ 2026-02-10 13:08 UTC (permalink / raw)
  To: Christian Ebner, Proxmox Backup Server development discussion

On 2/10/26 1:52 PM, Christian Ebner wrote:
> one suggestion below
> 
> On 1/21/26 4:13 PM, Samuel Rufinatscha wrote:
>> Adds an in-memory cache of successfully verified token secrets.
>> Subsequent requests for the same token+secret combination only perform a
>> comparison using openssl::memcmp::eq and avoid re-running the password
>> hash. The cache is updated when a token secret is set and cleared when a
>> token is deleted.
>>
>> Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
>> ---
>> Changes from v3 to v4:
>> * Add gen param to invalidate_cache_state()
>> * Validates the generation bump after obtaining write lock in
>> apply_api_mutation
>> * Pass lock to apply_api_mutation
>> * Remove unnecessary gen check cache_try_secret_matches
>> * Adjusted commit message
>>
>> Changes from v2 to v3:
>> * Replaced process-local cache invalidation (AtomicU64
>> API_MUTATION_GENERATION) with a cross-process shared generation via
>> ConfigVersionCache.
>> * Validate shared generation before/after the constant-time secret
>> compare; only insert into cache if the generation is unchanged.
>> * invalidate_cache_state() on insert if shared generation changed.
>>
>> Changes from v1 to v2:
>> * Replace OnceCell with LazyLock, and std::sync::RwLock with
>> parking_lot::RwLock.
>> * Add API_MUTATION_GENERATION and guard cache inserts
>> to prevent “zombie inserts” across concurrent set/delete.
>> * Refactor cache operations into cache_try_secret_matches,
>> cache_try_insert_secret, and centralize write-side behavior in
>> apply_api_mutation.
>> * Switch fast-path cache access to try_read/try_write (best-effort).
>>
>>   Cargo.toml                     |   1 +
>>   pbs-config/Cargo.toml          |   1 +
>>   pbs-config/src/token_shadow.rs | 160 ++++++++++++++++++++++++++++++++-
>>   3 files changed, 159 insertions(+), 3 deletions(-)
>>
>> diff --git a/Cargo.toml b/Cargo.toml
>> index 0da18383..aed66fe3 100644
>> --- a/Cargo.toml
>> +++ b/Cargo.toml
>> @@ -143,6 +143,7 @@ nom = "7"
>>   num-traits = "0.2"
>>   once_cell = "1.3.1"
>>   openssl = "0.10.40"
>> +parking_lot = "0.12"
>>   percent-encoding = "2.1"
>>   pin-project-lite = "0.2"
>>   regex = "1.5.5"
>> diff --git a/pbs-config/Cargo.toml b/pbs-config/Cargo.toml
>> index 74afb3c6..eb81ce00 100644
>> --- a/pbs-config/Cargo.toml
>> +++ b/pbs-config/Cargo.toml
>> @@ -13,6 +13,7 @@ libc.workspace = true
>>   nix.workspace = true
>>   once_cell.workspace = true
>>   openssl.workspace = true
>> +parking_lot.workspace = true
>>   regex.workspace = true
>>   serde.workspace = true
>>   serde_json.workspace = true
>> diff --git a/pbs-config/src/token_shadow.rs b/pbs-config/src/ 
>> token_shadow.rs
>> index 640fabbf..d5aa5de2 100644
>> --- a/pbs-config/src/token_shadow.rs
>> +++ b/pbs-config/src/token_shadow.rs
>> @@ -1,6 +1,8 @@
>>   use std::collections::HashMap;
>> +use std::sync::LazyLock;
>>   use anyhow::{bail, format_err, Error};
>> +use parking_lot::RwLock;
>>   use serde::{Deserialize, Serialize};
>>   use serde_json::{from_value, Value};
>> @@ -13,6 +15,18 @@ use crate::{open_backup_lockfile, BackupLockGuard};
>>   const LOCK_FILE: &str = pbs_buildcfg::configdir!("/token.shadow.lock");
>>   const CONF_FILE: &str = pbs_buildcfg::configdir!("/token.shadow");
>> +/// Global in-memory cache for successfully verified API token secrets.
>> +/// The cache stores plain text secrets for token Authids that have 
>> already been
>> +/// verified against the hashed values in `token.shadow`. This allows 
>> for cheap
>> +/// subsequent authentications for the same token+secret combination, 
>> avoiding
>> +/// recomputing the password hash on every request.
>> +static TOKEN_SECRET_CACHE: LazyLock<RwLock<ApiTokenSecretCache>> = 
>> LazyLock::new(|| {
>> +    RwLock::new(ApiTokenSecretCache {
>> +        secrets: HashMap::new(),
>> +        shared_gen: 0,
>> +    })
>> +});
>> +
>>   #[derive(Serialize, Deserialize)]
>>   #[serde(rename_all = "kebab-case")]
>>   /// ApiToken id / secret pair
>> @@ -54,9 +68,27 @@ pub fn verify_secret(tokenid: &Authid, secret: 
>> &str) -> Result<(), Error> {
>>           bail!("not an API token ID");
>>       }
>> +    // Fast path
>> +    if cache_try_secret_matches(tokenid, secret) {
>> +        return Ok(());
>> +    }
>> +
>> +    // Slow path
>> +    // First, capture the shared generation before doing the hash 
>> verification.
>> +    let gen_before = token_shadow_shared_gen();
>> +
>>       let data = read_file()?;
>>       match data.get(tokenid) {
>> -        Some(hashed_secret) => 
>> proxmox_sys::crypt::verify_crypt_pw(secret, hashed_secret),
>> +        Some(hashed_secret) => {
>> +            proxmox_sys::crypt::verify_crypt_pw(secret, hashed_secret)?;
>> +
>> +            // Try to cache only if nothing changed while verifying 
>> the secret.
>> +            if let Some(gen) = gen_before {
>> +                cache_try_insert_secret(tokenid.clone(), 
>> secret.to_owned(), gen);
>> +            }
>> +
>> +            Ok(())
>> +        }
>>           None => bail!("invalid API token"),
>>       }
>>   }
>> @@ -75,13 +107,15 @@ fn set_secret(tokenid: &Authid, secret: &str) -> 
>> Result<(), Error> {
>>           bail!("not an API token ID");
>>       }
>> -    let _guard = lock_config()?;
>> +    let guard = lock_config()?;
>>       let mut data = read_file()?;
>>       let hashed_secret = proxmox_sys::crypt::encrypt_pw(secret)?;
>>       data.insert(tokenid.clone(), hashed_secret);
>>       write_file(data)?;
>> +    apply_api_mutation(guard, tokenid, Some(secret));
>> +
>>       Ok(())
>>   }
>> @@ -91,11 +125,131 @@ pub fn delete_secret(tokenid: &Authid) -> 
>> Result<(), Error> {
>>           bail!("not an API token ID");
>>       }
>> -    let _guard = lock_config()?;
>> +    let guard = lock_config()?;
>>       let mut data = read_file()?;
>>       data.remove(tokenid);
>>       write_file(data)?;
>> +    apply_api_mutation(guard, tokenid, None);
>> +
>>       Ok(())
>>   }
>> +
>> +struct ApiTokenSecretCache {
>> +    /// Keys are token Authids, values are the corresponding plain 
>> text secrets.
>> +    /// Entries are added after a successful on-disk verification in
>> +    /// `verify_secret` or when a new token secret is generated by
>> +    /// `generate_and_set_secret`. Used to avoid repeated
>> +    /// password-hash computation on subsequent authentications.
>> +    secrets: HashMap<Authid, CachedSecret>,
>> +    /// Shared generation to detect mutations of the underlying 
>> token.shadow file.
>> +    shared_gen: usize,
>> +}
>> +
>> +/// Cached secret.
>> +struct CachedSecret {
>> +    secret: String,
>> +}
>> +
>> +fn cache_try_insert_secret(tokenid: Authid, secret: String, 
>> shared_gen_before: usize) {
>> +    let Some(mut cache) = TOKEN_SECRET_CACHE.try_write() else {
>> +        return;
>> +    };
>> +
>> +    let Some(shared_gen_now) = token_shadow_shared_gen() else {
>> +        return;
>> +    };
>> +
>> +    // If this process missed a generation bump, its cache is stale.
>> +    if cache.shared_gen != shared_gen_now {
>> +        invalidate_cache_state_and_set_gen(&mut cache, shared_gen_now);
>> +    }
>> +
>> +    // If a mutation happened while we were verifying the secret, do 
>> not insert.
>> +    if shared_gen_now == shared_gen_before {
>> +        cache.secrets.insert(tokenid, CachedSecret { secret });
>> +    }
>> +}
>> +
>> +/// Tries to match the given token secret against the cached secret.
>> +///
>> +/// Verifies the generation/version before doing the constant-time
>> +/// comparison to reduce TOCTOU risk. During token rotation or deletion
>> +/// tokens for in-flight requests may still validate against the 
>> previous
>> +/// generation.
>> +fn cache_try_secret_matches(tokenid: &Authid, secret: &str) -> bool {
>> +    let Some(cache) = TOKEN_SECRET_CACHE.try_read() else {
>> +        return false;
>> +    };
>> +    let Some(entry) = cache.secrets.get(tokenid) else {
>> +        return false;
>> +    };
>> +    let Some(current_gen) = token_shadow_shared_gen() else {
>> +        return false;
>> +    };
>> +
>> +    if current_gen == cache.shared_gen {
>> +        return openssl::memcmp::eq(entry.secret.as_bytes(), 
>> secret.as_bytes());
>> +    }
>> +
>> +    false
>> +}
>> +
>> +fn apply_api_mutation(_guard: BackupLockGuard, tokenid: &Authid, 
>> new_secret: Option<&str>) {
>> +    // Signal cache invalidation to other processes (best-effort).
>> +    let bumped_gen = bump_token_shadow_shared_gen();
>> +
>> +    let mut cache = TOKEN_SECRET_CACHE.write();
>> +
>> +    // If we cannot get the current generation, we cannot trust the 
>> cache
>> +    let Some(current_gen) = token_shadow_shared_gen() else {
>> +        invalidate_cache_state_and_set_gen(&mut cache, 0);
>> +        return;
>> +    };
>> +
>> +    // If we cannot bump the shared generation, or if it changed after
>> +    // obtaining the cache write lock, we cannot trust the cache
>> +    if bumped_gen != Some(current_gen) {
>> +        invalidate_cache_state_and_set_gen(&mut cache, current_gen);
>> +        return;
>> +    }
>> +
>> +    // Update to the post-mutation generation.
>> +    cache.shared_gen = current_gen;
>> +
>> +    // Apply the new mutation.
>> +    match new_secret {
>> +        Some(secret) => {
>> +            cache.secrets.insert(
>> +                tokenid.clone(),
>> +                CachedSecret {
>> +                    secret: secret.to_owned(),
>> +                },
>> +            );
>> +        }
>> +        None => {
>> +            cache.secrets.remove(tokenid);
>> +        }
>> +    }
>> +}
>> +
>> +/// Get the current shared generation.
>> +fn token_shadow_shared_gen() -> Option<usize> {
>> +    crate::ConfigVersionCache::new()
>> +        .ok()
>> +        .map(|cvc| cvc.token_shadow_generation())
>> +}
>> +
>> +/// Bump and return the new shared generation.
>> +fn bump_token_shadow_shared_gen() -> Option<usize> {
>> +    crate::ConfigVersionCache::new()
>> +        .ok()
>> +        .map(|cvc| cvc.increase_token_shadow_generation() + 1)
>> +}
>> +
>> +/// Invalidates local cache contents and sets/updates the cached 
>> generation.
>> +fn invalidate_cache_state_and_set_gen(cache: &mut 
>> ApiTokenSecretCache, gen: usize) {
>> +    cache.secrets.clear();
>> +    cache.shared_gen = gen;
>> +}
> 
> The above function operates on the cache, so why not make it a method
> thereof? And also bundle the generation updates with the mutations, so
> they cannot be forgotten.

Good catch, makes sense. I’ll adjust!

> 
> Something along the lines of the following diff on top of this patch:
>

Thanks for the code suggestion below, Christian.

> diff --git a/pbs-config/src/token_shadow.rs b/pbs-config/src/ 
> token_shadow.rs
> index d5aa5de28..a8104f142 100644
> --- a/pbs-config/src/token_shadow.rs
> +++ b/pbs-config/src/token_shadow.rs
> @@ -136,6 +136,11 @@ pub fn delete_secret(tokenid: &Authid) -> 
> Result<(), Error> {
>       Ok(())
>   }
> 
> +/// Cached secret.
> +struct CachedSecret {
> +    secret: String,
> +}
> +
>   struct ApiTokenSecretCache {
>       /// Keys are token Authids, values are the corresponding plain 
> text secrets.
>       /// Entries are added after a successful on-disk verification in
> @@ -147,9 +152,22 @@ struct ApiTokenSecretCache {
>       shared_gen: usize,
>   }
> 
> -/// Cached secret.
> -struct CachedSecret {
> -    secret: String,
> +impl ApiTokenSecretCache {
> +    /// Invalidates local cache contents and sets/updates the cached 
> generation.
> +    fn invalidate_state_and_set_gen(&mut self, gen: usize) {
> +        self.secrets.clear();
> +        self.shared_gen = gen;
> +    }
> +
> +    fn insert_and_set_gen(&mut self, tokenid: Authid, secret: 
> CachedSecret, gen: usize) {
> +        self.secrets.insert(tokenid.clone(), secret);
> +        self.shared_gen = gen;
> +    }
> +
> +    fn evict_and_set_gen(&mut self, tokenid: &Authid, gen: usize) {
> +        self.secrets.remove(tokenid);
> +        self.shared_gen = gen;
> +    }
>   }
> 
>   fn cache_try_insert_secret(tokenid: Authid, secret: String, 
> shared_gen_before: usize) {
> @@ -163,12 +181,12 @@ fn cache_try_insert_secret(tokenid: Authid, 
> secret: String, shared_gen_before: u
> 
>       // If this process missed a generation bump, its cache is stale.
>       if cache.shared_gen != shared_gen_now {
> -        invalidate_cache_state_and_set_gen(&mut cache, shared_gen_now);
> +        cache.invalidate_state_and_set_gen(shared_gen_now);
>       }
> 
>       // If a mutation happened while we were verifying the secret, do 
> not insert.
>       if shared_gen_now == shared_gen_before {
> -        cache.secrets.insert(tokenid, CachedSecret { secret });
> +        cache.insert_and_set_gen(tokenid, CachedSecret { secret }, 
> shared_gen_now);
>       }
>   }
> 
> @@ -204,33 +222,24 @@ fn apply_api_mutation(_guard: BackupLockGuard, 
> tokenid: &Authid, new_secret: Opt
> 
>       // If we cannot get the current generation, we cannot trust the cache
>       let Some(current_gen) = token_shadow_shared_gen() else {
> -        invalidate_cache_state_and_set_gen(&mut cache, 0);
> +        cache.invalidate_state_and_set_gen(0);
>           return;
>       };
> 
>       // If we cannot bump the shared generation, or if it changed after
>       // obtaining the cache write lock, we cannot trust the cache
>       if bumped_gen != Some(current_gen) {
> -        invalidate_cache_state_and_set_gen(&mut cache, current_gen);
> +        cache.invalidate_state_and_set_gen(current_gen);
>           return;
>       }
> 
> -    // Update to the post-mutation generation.
> -    cache.shared_gen = current_gen;
> -
>       // Apply the new mutation.
>       match new_secret {
>           Some(secret) => {
> -            cache.secrets.insert(
> -                tokenid.clone(),
> -                CachedSecret {
> -                    secret: secret.to_owned(),
> -                },
> -            );
> -        }
> -        None => {
> -            cache.secrets.remove(tokenid);
> +            let cached_secret = CachedSecret { secret: 
> secret.to_owned() };
> +            cache.insert_and_set_gen(tokenid.clone(), cached_secret, 
> current_gen);
>           }
> +        None => cache.evict_and_set_gen(tokenid, current_gen),
>       }
>   }
> 
> @@ -248,8 +257,3 @@ fn bump_token_shadow_shared_gen() -> Option<usize> {
>           .map(|cvc| cvc.increase_token_shadow_generation() + 1)
>   }
> 
> -/// Invalidates local cache contents and sets/updates the cached 
> generation.
> -fn invalidate_cache_state_and_set_gen(cache: &mut ApiTokenSecretCache, 
> gen: usize) {
> -    cache.secrets.clear();
> -    cache.shared_gen = gen;
> -}
> 





^ permalink raw reply	[relevance 6%]

* Re: [pbs-devel] [PATCH proxmox-backup v4 4/4] pbs-config: add TTL window to token secret cache
  2026-02-10 12:58  6%   ` Christian Ebner
@ 2026-02-10 13:18  6%     ` Samuel Rufinatscha
  0 siblings, 0 replies; 117+ results
From: Samuel Rufinatscha @ 2026-02-10 13:18 UTC (permalink / raw)
  To: Christian Ebner, Proxmox Backup Server development discussion

On 2/10/26 1:57 PM, Christian Ebner wrote:
> one suggestion inline
> 
> On 1/21/26 4:13 PM, Samuel Rufinatscha wrote:
>> verify_secret() currently calls refresh_cache_if_file_changed() on every
>> request, which performs a metadata() call on token.shadow each time.
>> Under load this adds unnecessary overhead, especially since the file
>> rarely changes.
>>
>> This patch introduces a TTL boundary, controlled by
>> TOKEN_SECRET_CACHE_TTL_SECS. File metadata is only re-loaded once the
>> TTL has expired. The patch also documents the TTL effects.
>>
>> Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
>> ---
>> Changes from v3 to v4:
>> * Adjusted commit message
>>
>> Changes from v2 to v3:
>> * Refactored refresh_cache_if_file_changed TTL logic.
>> * Remove had_prior_state check (replaced by last_checked logic).
>> * Improve TTL bound checks.
>> * Reword documentation warning for clarity.
>>
>> Changes from v1 to v2:
>> * Add TOKEN_SECRET_CACHE_TTL_SECS and last_checked.
>> * Implement double-checked TTL: check with try_read first; only attempt
>>    refresh with try_write if expired/unknown.
>> * Fix TTL bookkeeping: update last_checked on the “file unchanged” path
>>    and after API mutations.
>> * Add documentation warning about TTL-delayed effect of manual
>>    token.shadow edits.
>>
>>   docs/user-management.rst       |  4 ++++
>>   pbs-config/src/token_shadow.rs | 29 ++++++++++++++++++++++++++++-
>>   2 files changed, 32 insertions(+), 1 deletion(-)
>>
>> diff --git a/docs/user-management.rst b/docs/user-management.rst
>> index 41b43d60..8dfae528 100644
>> --- a/docs/user-management.rst
>> +++ b/docs/user-management.rst
>> @@ -156,6 +156,10 @@ metadata:
>>   Similarly, the ``user delete-token`` subcommand can be used to 
>> delete a token
>>   again.
>> +.. WARNING:: Direct/manual edits to ``token.shadow`` may take up to 
>> 60 seconds (or
>> +   longer in edge cases) to take effect due to caching. Restart 
>> services for
>> +   immediate effect of manual edits.
>> +
>>   Newly generated API tokens don't have any permissions. Please read 
>> the next
>>   section to learn how to set access permissions.
>> diff --git a/pbs-config/src/token_shadow.rs b/pbs-config/src/ 
>> token_shadow.rs
>> index a5bd1525..24633f6e 100644
>> --- a/pbs-config/src/token_shadow.rs
>> +++ b/pbs-config/src/token_shadow.rs
>> @@ -31,6 +31,8 @@ static TOKEN_SECRET_CACHE: 
>> LazyLock<RwLock<ApiTokenSecretCache>> = LazyLock::new
>>           shadow: None,
>>       })
>>   });
>> +/// Max age in seconds of the token secret cache before checking for 
>> file changes.
>> +const TOKEN_SECRET_CACHE_TTL_SECS: i64 = 60;
>>   #[derive(Serialize, Deserialize)]
>>   #[serde(rename_all = "kebab-case")]
>> @@ -72,11 +74,29 @@ fn write_file(data: HashMap<Authid, String>) -> 
>> Result<(), Error> {
>>   fn refresh_cache_if_file_changed() -> bool {
>>       let now = epoch_i64();
>> -    // Best-effort refresh under write lock.
>> +    // Fast path: cache is fresh if shared-gen matches and TTL not 
>> expired.
>> +    if let (Some(cache), Some(shared_gen_read)) =
>> +        (TOKEN_SECRET_CACHE.try_read(), token_shadow_shared_gen())
>> +    {
>> +        if cache.shared_gen == shared_gen_read
> 
> starting here ..
> 
>> +            && cache.shadow.as_ref().is_some_and(|cached| {
>> +                now >= cached.last_checked
>> +                    && (now - cached.last_checked) < 
>> TOKEN_SECRET_CACHE_TTL_SECS
> 
> ... to here is the same as ...
> 
>> +            })
>> +        {
>> +            return true;
>> +        }
>> +        // read lock drops here
>> +    } else {
>> +        return false;
>> +    }
>> +
>> +    // Slow path: best-effort refresh under write lock.
>>       let Some(mut cache) = TOKEN_SECRET_CACHE.try_write() else {
>>           return false;
>>       };
>> +    // Re-read generation after acquiring the lock (may have changed 
>> meanwhile).
>>       let Some(shared_gen_now) = token_shadow_shared_gen() else {
>>           return false;
>>       };
>> @@ -86,6 +106,13 @@ fn refresh_cache_if_file_changed() -> bool {
>>           invalidate_cache_state_and_set_gen(&mut cache, shared_gen_now);
>>       }
>> +    // TTL check again after acquiring the lock
>> +    if cache.shadow.as_ref().is_some_and(|cached| {
>> +        now >= cached.last_checked && (now - cached.last_checked) < 
>> TOKEN_SECRET_CACHE_TTL_SECS
> 
> ... this check above. I think this could be defined as method on the 
> cache object so it can be easily reused and the code is more readable.

Makes sense, I’ll update it to match this. Thanks!

> 
>> +    }) {
>> +        return true;
>> +    }
>> +
>>       // Stat the file to detect manual edits.
>>       let Ok((new_mtime, new_len)) = shadow_mtime_len() else {
>>           return false;
> 





^ permalink raw reply	[relevance 6%]

* Re: [pbs-devel] [PATCH proxmox v4 2/4] proxmox-access-control: cache verified API token secrets
  2026-01-21 15:14 12% ` [pbs-devel] [PATCH proxmox v4 2/4] proxmox-access-control: cache verified API token secrets Samuel Rufinatscha
@ 2026-02-10 13:38  6%   ` Christian Ebner
  2026-02-10 14:07  6%     ` Samuel Rufinatscha
  0 siblings, 1 reply; 117+ results
From: Christian Ebner @ 2026-02-10 13:38 UTC (permalink / raw)
  To: Proxmox Backup Server development discussion, Samuel Rufinatscha

On 1/21/26 4:13 PM, Samuel Rufinatscha wrote:
> Adds an in-memory cache of successfully verified token secrets.
> Subsequent requests for the same token+secret combination only perform a
> comparison using openssl::memcmp::eq and avoid re-running the password
> hash. The cache is updated when a token secret is set and cleared when a
> token is deleted.
> 
> Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
> ---

IMHO the commit message should contain a concise additional statement 
specifying the purpose of the shared shadow generation for inter-process 
cache invalidation, as the code in this patch covers that to a 
significant extent.

Forgot to mention this on patch 2 for proxmox-backup, where it applies 
as well.





^ permalink raw reply	[relevance 6%]

* Re: [pbs-devel] [PATCH proxmox v4 2/4] proxmox-access-control: cache verified API token secrets
  2026-02-10 13:38  6%   ` Christian Ebner
@ 2026-02-10 14:07  6%     ` Samuel Rufinatscha
  0 siblings, 0 replies; 117+ results
From: Samuel Rufinatscha @ 2026-02-10 14:07 UTC (permalink / raw)
  To: Christian Ebner, Proxmox Backup Server development discussion

On 2/10/26 2:36 PM, Christian Ebner wrote:
> On 1/21/26 4:13 PM, Samuel Rufinatscha wrote:
>> Adds an in-memory cache of successfully verified token secrets.
>> Subsequent requests for the same token+secret combination only perform a
>> comparison using openssl::memcmp::eq and avoid re-running the password
>> hash. The cache is updated when a token secret is set and cleared when a
>> token is deleted.
>>
>> Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
>> ---
> 
> IMHO the commit message should contain a concise additional statement 
> specifying the purpose of the shared shadow generation for inter-process 
> cache invalidation, as the code in this patch covers that to a 
> significant extent.
> 
> Forgot to mention this on patch 2 for proxmox-backup, where it applies 
> as well.
> 

Agreed, this should be added; it will make the implementation a bit
more obvious. I will also add it to patch 2, thanks!




^ permalink raw reply	[relevance 6%]

* Re: [pve-devel] [PATCH pve-cluster 08/15] pmxcfs-rs: add pmxcfs-services crate
  @ 2026-02-11 11:52  5%   ` Samuel Rufinatscha
  0 siblings, 0 replies; 117+ results
From: Samuel Rufinatscha @ 2026-02-11 11:52 UTC (permalink / raw)
  To: Proxmox VE development discussion, Kefu Chai

Thanks for this patch, Kefu!

Comments inline.

On 1/7/26 10:16 AM, Kefu Chai wrote:
> Add service lifecycle management framework providing:
> - Service trait: Lifecycle interface for async services
> - ServiceManager: Orchestrates multiple services
> - Automatic retry logic for failed services
> - Event-driven dispatching via file descriptors
> - Graceful shutdown coordination
> 
> This is a generic framework with no pmxcfs-specific dependencies,
> only requiring tokio, async-trait, and standard error handling.
> It replaces the C version's qb_loop-based event management.
> 
> Signed-off-by: Kefu Chai <k.chai@proxmox.com>
> ---
>   src/pmxcfs-rs/Cargo.lock                      | 1798 +----------------
>   src/pmxcfs-rs/Cargo.toml                      |    1 +
>   src/pmxcfs-rs/pmxcfs-services/Cargo.toml      |   17 +
>   src/pmxcfs-rs/pmxcfs-services/README.md       |  167 ++
>   src/pmxcfs-rs/pmxcfs-services/src/error.rs    |   37 +
>   src/pmxcfs-rs/pmxcfs-services/src/lib.rs      |   16 +
>   src/pmxcfs-rs/pmxcfs-services/src/manager.rs  |  477 +++++
>   src/pmxcfs-rs/pmxcfs-services/src/service.rs  |  173 ++
>   .../pmxcfs-services/tests/service_tests.rs    |  808 ++++++++
>   9 files changed, 1778 insertions(+), 1716 deletions(-)
>   create mode 100644 src/pmxcfs-rs/pmxcfs-services/Cargo.toml
>   create mode 100644 src/pmxcfs-rs/pmxcfs-services/README.md
>   create mode 100644 src/pmxcfs-rs/pmxcfs-services/src/error.rs
>   create mode 100644 src/pmxcfs-rs/pmxcfs-services/src/lib.rs
>   create mode 100644 src/pmxcfs-rs/pmxcfs-services/src/manager.rs
>   create mode 100644 src/pmxcfs-rs/pmxcfs-services/src/service.rs
>   create mode 100644 src/pmxcfs-rs/pmxcfs-services/tests/service_tests.rs
> 
> diff --git a/src/pmxcfs-rs/Cargo.lock b/src/pmxcfs-rs/Cargo.lock
> index 31a30e13..f0ec6231 100644
> --- a/src/pmxcfs-rs/Cargo.lock
> +++ b/src/pmxcfs-rs/Cargo.lock

[..]

> diff --git a/src/pmxcfs-rs/Cargo.toml b/src/pmxcfs-rs/Cargo.toml
> index 8fe06b88..b00ca68f 100644
> --- a/src/pmxcfs-rs/Cargo.toml
> +++ b/src/pmxcfs-rs/Cargo.toml
> @@ -8,6 +8,7 @@ members = [
>       "pmxcfs-memdb",      # In-memory database with SQLite persistence
>       "pmxcfs-status",     # Status monitoring and RRD data management
>       "pmxcfs-test-utils", # Test utilities and helpers (dev-only)
> +    "pmxcfs-services",   # Service framework for automatic retry and lifecycle management
>   ]
>   resolver = "2"
>   
> diff --git a/src/pmxcfs-rs/pmxcfs-services/Cargo.toml b/src/pmxcfs-rs/pmxcfs-services/Cargo.toml
> new file mode 100644
> index 00000000..7991b913
> --- /dev/null
> +++ b/src/pmxcfs-rs/pmxcfs-services/Cargo.toml
> @@ -0,0 +1,17 @@
> +[package]
> +name = "pmxcfs-services"
> +version = "0.1.0"
> +edition = "2024"
> +
> +[dependencies]
> +anyhow = "1.0"
> +async-trait = "0.1"
> +tokio = { version = "1.41", features = ["full"] }
> +tokio-util = "0.7"
> +tracing = "0.1"
> +thiserror = "2.0"
> +parking_lot = "0.12"
> +scopeguard = "1.2"

This dependency is unused.

> +
> +[dev-dependencies]
> +pmxcfs-test-utils = { path = "../pmxcfs-test-utils" }
> diff --git a/src/pmxcfs-rs/pmxcfs-services/README.md b/src/pmxcfs-rs/pmxcfs-services/README.md

The lifecycle overview, C to Rust mapping, and usage example are great
for reviewers coming from the C codebase.
I think the rest should move to rustdoc to avoid duplication and drift;
it's easier to keep one source of truth in sync. The README would
benefit from being tighter and covering only:

- what the crate does and why
- how the pieces fit together (who calls what, and when), as a brief
  summary that gives readers a good first idea
- the C mapping (maybe with local links to the C files, ideally
  pointing at the mentioned functions if possible, to make comparing
  the implementations easier)
- the usage example
- the differences from the C implementation

This would apply to the other crates in the series too.

> new file mode 100644
> index 00000000..ca17e3e9
> --- /dev/null
> +++ b/src/pmxcfs-rs/pmxcfs-services/README.md
> @@ -0,0 +1,167 @@
> +# pmxcfs-services
> +
> +**Service Management Framework** for pmxcfs - tokio-based replacement for qb_loop.
> +
> +This crate provides a robust, async service management framework with automatic retry, event-driven dispatching, periodic timers, and graceful shutdown. It replaces the C implementation's libqb loop with a modern tokio-based architecture.
> +
> +## Overview
> +
> +The service framework manages long-running services that need:
> +- **Automatic initialization retry** when connections fail
> +- **Event-driven dispatching** for file descriptor-based services (Corosync)
> +- **Periodic timers** for maintenance tasks
> +- **Error tracking** with throttled logging
> +- **Graceful shutdown** with resource cleanup
> +
> +## Key Concepts
> +
> +- **Service**: A trait implementing lifecycle methods (`initialize`, `dispatch`, `finalize`)
> +- **ServiceManager**: Orchestrates multiple services, handles retries, timers, and shutdown
> +- **ManagedService**: Internal wrapper that tracks state and handles recovery
> +
> +## Service Trait
> +
> +The `Service` trait defines the lifecycle of a managed service:
> +
> +```rust
> +#[async_trait]
> +pub trait Service: Send + Sync {
> +    fn name(&self) -> &str;
> +    async fn initialize(&mut self) -> Result<InitResult>;
> +    async fn dispatch(&mut self) -> Result<DispatchAction>;
> +    async fn finalize(&mut self) -> Result<()>;
> +
> +    // Optional overrides:
> +    fn timer_period(&self) -> Option<Duration> { None }
> +    async fn timer_callback(&mut self) -> Result<()> { Ok(()) }
> +    fn is_restartable(&self) -> bool { true }
> +    fn retry_interval(&self) -> Duration { Duration::from_secs(5) }
> +    fn dispatch_interval(&self) -> Duration { Duration::from_millis(100) }
> +}
> +```
> +
> +## InitResult
> +
> +Services return `InitResult` to indicate their dispatch mode:
> +
> +**WithFileDescriptor(fd)**:
> +- **Use case**: Corosync services (CPG, quorum, cmap)
> +- **Behavior**: `dispatch()` called when fd becomes readable
> +- **Efficiency**: Event-driven, no polling overhead
> +- **Example**: ClusterDatabaseService, QuorumService
> +
> +**NoFileDescriptor**:
> +- **Use case**: Services without external event sources
> +- **Behavior**: `dispatch()` called periodically at `dispatch_interval()`
> +- **Efficiency**: Polling overhead (default: 100ms interval)
> +
> +## ServiceManager
> +
> +Orchestrates multiple services with automatic management:
> +
> +```rust
> +let mut manager = ServiceManager::new();
> +manager.add_service(Box::new(MyService::new()));
> +manager.add_service(Box::new(AnotherService::new()));
> +let handle = manager.spawn();  // Returns JoinHandle for lifecycle control
> +// ... later ...
> +handle.abort();  // Gracefully shuts down all services

This does not gracefully shut down all services, as it doesn't invoke
the finalization code.

Please consider this approach:

let shutdown_token = manager.shutdown_token();
let handle = manager.spawn();

...

shutdown_token.cancel();  // Signal graceful shutdown
handle.await;             // Wait for all services to finalize

> +```
> +
> +### Features
> +
> +1. **Automatic Retry**: Failed services automatically retry initialization
> +2. **Event-Driven**: Services with file descriptors use tokio AsyncFd (no polling)
> +3. **Timers**: Optional periodic callbacks for maintenance
> +4. **Error Tracking**: Counts consecutive failures, throttles error logs
> +5. **Graceful Shutdown**: Finalizes all services on exit
> +
> +## Usage Example
> +
> +```rust
> +use pmxcfs_services::{Service, InitResult, DispatchAction, ServiceManager};
> +
> +struct MyService {
> +    fd: Option<i32>,
> +}
> +
> +#[async_trait]
> +impl Service for MyService {
> +    fn name(&self) -> &str { "my-service" }
> +
> +    async fn initialize(&mut self) -> Result<InitResult> {
> +        let fd = connect_to_external_service()?;
> +        self.fd = Some(fd);
> +        Ok(InitResult::WithFileDescriptor(fd))
> +    }
> +
> +    async fn dispatch(&mut self) -> Result<DispatchAction> {
> +        handle_events()?;
> +        Ok(DispatchAction::Continue)
> +    }
> +
> +    async fn finalize(&mut self) -> Result<()> {
> +        close_connection(self.fd.take())?;
> +        Ok(())
> +    }
> +}
> +```
> +
> +## C to Rust Mapping
> +
> +### Data Structures
> +
> +| C Type | Rust Type | Notes |
> +|--------|-----------|-------|
> +| `cfs_loop_t` | `ServiceManager` | Event loop manager |
> +| `cfs_service_t` | `dyn Service` | Service trait |
> +| `cfs_service_callbacks_t` | (trait methods) | Callbacks as trait methods |
> +
> +### Functions
> +
> +| C Function | Rust Equivalent | Location |
> +|-----------|-----------------|----------|
> +| `cfs_loop_new()` | `ServiceManager::new()` | manager.rs |
> +| `cfs_loop_add_service()` | `ServiceManager::add_service()` | manager.rs |
> +| `cfs_loop_start_worker()` | `ServiceManager::spawn()` | manager.rs |
> +| `cfs_loop_stop_worker()` | `handle.abort()` | Tokio abort |

The pattern should be shutdown_token.cancel() + handle.await

> +| `cfs_service_new()` | (struct + impl Service) | User code |
> +
> +## Key Differences from C Implementation
> +
> +### Event Loop Architecture
> +
> +**C Version (loop.c)**:
> +- Uses libqb's `qb_loop` event loop
> +- Manual fd registration with `qb_loop_poll_add()`
> +- Single-threaded callback-based model
> +- Priority levels for services
> +
> +**Rust Version**:
> +- Uses tokio async runtime
> +- Automatic fd monitoring with `AsyncFd`
> +- Concurrent task-based model
> +- No priority levels (all equal)
> +
> +### Concurrency
> +
> +**C Version**:
> +- Single-threaded qb_loop
> +- Callbacks run sequentially
> +
> +**Rust Version**:
> +- Multi-threaded tokio runtime
> +- Services can run in parallel
> +
> +## References
> +
> +### C Implementation
> +- `src/pmxcfs/loop.c` / `loop.h` - Service loop
> +
> +### Related Crates
> +- **pmxcfs-dfsm**: Uses Service trait for ClusterDatabaseService, StatusSyncService
> +- **pmxcfs**: Uses ServiceManager to orchestrate all cluster services
> +
> +### External Dependencies
> +- **tokio**: Async runtime and I/O
> +- **async-trait**: Async methods in traits
> diff --git a/src/pmxcfs-rs/pmxcfs-services/src/error.rs b/src/pmxcfs-rs/pmxcfs-services/src/error.rs
> new file mode 100644
> index 00000000..c0dde47b
> --- /dev/null
> +++ b/src/pmxcfs-rs/pmxcfs-services/src/error.rs
> @@ -0,0 +1,37 @@
> +//! Error types for the service framework
> +
> +use thiserror::Error;
> +
> +/// Errors that can occur during service operations
> +#[derive(Error, Debug)]
> +pub enum ServiceError {

Several variants are dead code, please remove if not needed.

> +    /// Service initialization failed
> +    #[error("Failed to initialize service: {0}")]
> +    InitializationFailed(String),
> +
> +    /// Service dispatch failed
> +    #[error("Failed to dispatch service events: {0}")]
> +    DispatchFailed(String),
> +
> +    /// Service finalization failed
> +    #[error("Failed to finalize service: {0}")]
> +    FinalizationFailed(String),
> +
> +    /// Timer callback failed
> +    #[error("Timer callback failed: {0}")]
> +    TimerFailed(String),
> +
> +    /// Service is not running
> +    #[error("Service is not running")]
> +    NotRunning,
> +
> +    /// Service is already running
> +    #[error("Service is already running")]
> +    AlreadyRunning,
> +
> +    /// Generic error with context
> +    #[error("{0}")]
> +    Other(#[from] anyhow::Error),
> +}
> +
> +pub type Result<T> = std::result::Result<T, ServiceError>;
> diff --git a/src/pmxcfs-rs/pmxcfs-services/src/lib.rs b/src/pmxcfs-rs/pmxcfs-services/src/lib.rs
> new file mode 100644
> index 00000000..cf894cc5
> --- /dev/null
> +++ b/src/pmxcfs-rs/pmxcfs-services/src/lib.rs
> @@ -0,0 +1,16 @@
> +//! Service framework for pmxcfs
> +//!
> +//! This crate provides a robust, tokio-based service management framework with:
> +//! - Automatic retry on failure
> +//! - Event-driven file descriptor monitoring
> +//! - Periodic timer callbacks
> +//! - Error tracking and throttled logging
> +//! - Graceful shutdown
> +
> +mod error;
> +mod manager;
> +mod service;
> +
> +pub use error::{Result, ServiceError};
> +pub use manager::ServiceManager;
> +pub use service::{DispatchAction, InitResult, Service};
> diff --git a/src/pmxcfs-rs/pmxcfs-services/src/manager.rs b/src/pmxcfs-rs/pmxcfs-services/src/manager.rs

manager.rs is currently doing quite a lot.
Can we split it, for example like this?

manager/mod.rs
manager/retry.rs
manager/timer.rs
manager/dispatch.rs
manager/state.rs

> new file mode 100644
> index 00000000..48c09c15
> --- /dev/null
> +++ b/src/pmxcfs-rs/pmxcfs-services/src/manager.rs
> @@ -0,0 +1,477 @@
> +//! Service manager for orchestrating multiple managed services
> +//!
> +//! The ServiceManager handles automatic retry, error tracking, event dispatching,
> +//! and timer callbacks for all registered services. It uses tokio for async I/O
> +//! and provides graceful shutdown capabilities.
> +
> +use crate::service::{DispatchAction, InitResult, Service};
> +use parking_lot::RwLock;
> +use std::collections::HashMap;
> +use std::os::unix::io::{AsRawFd, RawFd};
> +use std::sync::Arc;
> +use std::time::{Duration, Instant};
> +use tokio::io::unix::AsyncFd;
> +use tokio::task::JoinHandle;
> +use tokio::time::{MissedTickBehavior, interval};
> +use tokio_util::sync::CancellationToken;
> +use tracing::{debug, error, info, warn};
> +
> +/// Shared state for a managed service
> +struct ManagedService {
> +    /// The service implementation (wrapped in Mutex for interior mutability)
> +    service: tokio::sync::Mutex<Box<dyn Service>>,
> +    /// Current service state
> +    state: RwLock<ServiceState>,

The fields guarded by parking_lot::RwLock
are simple values that could be atomics, which would also let you
drop the parking_lot dependency.
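
A rough sketch of what this could look like for the state and the error
counter, using only std atomics. ServiceStatus and its methods are
hypothetical names, and the Instant and AsyncFd fields would need a
different treatment (for example storing elapsed milliseconds, or
keeping a lock just for the fd):

```rust
use std::sync::atomic::{AtomicU64, AtomicU8, Ordering};

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(u8)]
enum ServiceState {
    Uninitialized = 0,
    Initializing = 1,
    Running = 2,
    Failed = 3,
}

impl ServiceState {
    /// Map the stored byte back to a state; unknown values fall back
    /// to Uninitialized.
    fn from_u8(v: u8) -> Self {
        match v {
            1 => Self::Initializing,
            2 => Self::Running,
            3 => Self::Failed,
            _ => Self::Uninitialized,
        }
    }
}

/// Hypothetical lock-free replacement for the RwLock-guarded fields.
struct ServiceStatus {
    state: AtomicU8,
    error_count: AtomicU64,
}

impl ServiceStatus {
    fn new() -> Self {
        Self {
            state: AtomicU8::new(ServiceState::Uninitialized as u8),
            error_count: AtomicU64::new(0),
        }
    }

    fn state(&self) -> ServiceState {
        ServiceState::from_u8(self.state.load(Ordering::Acquire))
    }

    fn set_state(&self, s: ServiceState) {
        self.state.store(s as u8, Ordering::Release);
    }

    /// Increment the error counter and return the new value.
    fn record_error(&self) -> u64 {
        self.error_count.fetch_add(1, Ordering::Relaxed) + 1
    }

    fn reset_errors(&self) {
        self.error_count.store(0, Ordering::Relaxed);
    }
}
```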

> +    /// Consecutive error count (reset on successful initialization)
> +    error_count: RwLock<u64>,
> +    /// Last initialization attempt timestamp
> +    last_init_attempt: RwLock<Option<Instant>>,
> +    /// Async file descriptor for event monitoring (if applicable)
> +    async_fd: RwLock<Option<Arc<AsyncFd<FdWrapper>>>>,
> +    /// Last timer callback invocation
> +    last_timer_invoke: RwLock<Option<Instant>>,
> +}
> +
> +/// Service state
> +#[derive(Debug, Clone, Copy, PartialEq, Eq)]
> +enum ServiceState {
> +    /// Service not yet initialized
> +    Uninitialized,
> +    /// Service currently initializing
> +    Initializing,
> +    /// Service running successfully
> +    Running,
> +    /// Service failed, awaiting retry
> +    Failed,
> +}
> +
> +/// Wrapper for raw file descriptor to implement AsRawFd
> +struct FdWrapper(RawFd);
> +
> +impl AsRawFd for FdWrapper {
> +    fn as_raw_fd(&self) -> RawFd {
> +        self.0
> +    }
> +}
> +
> +impl Drop for FdWrapper {
> +    fn drop(&mut self) {
> +        // File descriptor ownership is managed by the service
> +        // We just monitor it, so don't close it here
> +    }
> +}
> +
> +/// Service manager for orchestrating multiple services
> +///
> +/// The ServiceManager provides:
> +/// - Automatic retry of failed initializations
> +/// - Event-driven dispatching for file descriptor-based services
> +/// - Periodic polling for services without file descriptors
> +/// - Timer callbacks for periodic maintenance
> +/// - Error tracking and throttled logging
> +/// - Graceful shutdown
> +pub struct ServiceManager {
> +    /// Registered services by name
> +    services: HashMap<String, Arc<ManagedService>>,
> +    /// Cancellation token for graceful shutdown
> +    shutdown_token: CancellationToken,
> +}
> +
> +impl ServiceManager {
> +    /// Create a new service manager
> +    pub fn new() -> Self {
> +        Self {
> +            services: HashMap::new(),
> +            shutdown_token: CancellationToken::new(),
> +        }
> +    }
> +
> +    /// Add a service to be managed
> +    ///
> +    /// Services will be started when `run()` is called.
> +    ///
> +    /// # Panics
> +    ///
> +    /// Panics if a service with the same name is already registered.
> +    pub fn add_service(&mut self, service: Box<dyn Service>) {
> +        let name = service.name().to_string();
> +
> +        if self.services.contains_key(&name) {
> +            panic!("Service '{name}' is already registered");
> +        }
> +
> +        let managed = Arc::new(ManagedService {
> +            service: tokio::sync::Mutex::new(service),
> +            state: RwLock::new(ServiceState::Uninitialized),
> +            error_count: RwLock::new(0),
> +            last_init_attempt: RwLock::new(None),
> +            async_fd: RwLock::new(None),
> +            last_timer_invoke: RwLock::new(None),
> +        });
> +
> +        self.services.insert(name, managed);
> +    }
> +
> +    /// Get a handle to trigger shutdown
> +    ///
> +    /// Call `cancel()` on the returned token to initiate graceful shutdown.
> +    pub fn shutdown_token(&self) -> CancellationToken {
> +        self.shutdown_token.clone()
> +    }
> +
> +    /// Spawn the service manager in a background task
> +    ///
> +    /// Returns a JoinHandle that can be used to await completion.
> +    /// To gracefully shut down, call `.shutdown_token().cancel()` then await the handle.
> +    ///
> +    /// # Example
> +    ///
> +    /// ```ignore
> +    /// let shutdown_token = manager.shutdown_token();
> +    /// let handle = manager.spawn();
> +    /// // ... later ...
> +    /// shutdown_token.cancel();  // Trigger graceful shutdown
> +    /// handle.await;              // Wait for shutdown to complete
> +    /// ```
> +    pub fn spawn(self) -> JoinHandle<()> {
> +        tokio::spawn(async move { self.run().await })
> +    }
> +
> +    /// Run the service manager (private - use spawn() instead)
> +    ///
> +    /// This starts all registered services and runs until shutdown is requested.
> +    /// Services are automatically retried on failure according to their configuration.
> +    async fn run(self) {
> +        info!(
> +            "Starting ServiceManager with {} services",
> +            self.services.len()
> +        );
> +
> +        let services = Arc::new(self.services);
> +
> +        // Spawn retry task for failed services
> +        let retry_handle = Self::spawn_retry_task_static(Arc::clone(&services));
> +
> +        // Spawn timer callback task
> +        let timer_handle = Self::spawn_timer_task_static(Arc::clone(&services));
> +
> +        // Spawn dispatch tasks for each service
> +        let dispatch_handles = Self::spawn_dispatch_tasks_static(Arc::clone(&services));
> +
> +        // Wait for shutdown signal
> +        self.shutdown_token.cancelled().await;
> +
> +        // Graceful shutdown sequence
> +        info!("ServiceManager shutting down...");
> +
> +        // Shutdown all services gracefully
> +        Self::shutdown_all_services_static(&services).await;

Between finalize releasing the Mutex ...

> +
> +        // Cancel background tasks
> +        retry_handle.abort();
> +        timer_handle.abort();
> +        for handle in dispatch_handles {
> +            handle.abort();
> +        }

... and the abort calls, a re-initialization could happen.

Please pass the CancellationToken to every spawned task
and make each loop select! on token.cancelled().
Also, reverse the shutdown order: stop the tasks first and finalize
second, to be aligned with the C implementation.
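
To illustrate only the ordering (signal, wait for the workers, then
finalize), here is a small std-only sketch. It deliberately swaps the
tokio tasks and the CancellationToken for plain threads and an
AtomicBool, so it is not the actual implementation, and
run_and_shutdown is a made-up name:

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::{Arc, Mutex};
use std::thread;
use std::time::Duration;

/// Runs one worker, shuts it down cooperatively and returns the order
/// in which the shutdown steps happened.
fn run_and_shutdown() -> Vec<&'static str> {
    let cancelled = Arc::new(AtomicBool::new(false));
    let log: Arc<Mutex<Vec<&'static str>>> = Arc::new(Mutex::new(Vec::new()));

    let worker = {
        let cancelled = Arc::clone(&cancelled);
        let log = Arc::clone(&log);
        thread::spawn(move || {
            // The worker checks the flag on every iteration instead of
            // being aborted from the outside.
            while !cancelled.load(Ordering::Acquire) {
                thread::sleep(Duration::from_millis(1));
            }
            log.lock().unwrap().push("worker stopped");
        })
    };

    // 1. Signal shutdown.
    cancelled.store(true, Ordering::Release);
    // 2. Wait until the worker has really stopped.
    worker.join().unwrap();
    // 3. Only now finalize; no worker can re-initialize anymore.
    log.lock().unwrap().push("finalized");

    let events = log.lock().unwrap().clone();
    events
}
```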

> +
> +        info!("ServiceManager stopped");
> +    }
> +
> +    /// Spawn task that retries failed service initializations
> +    fn spawn_retry_task_static(
> +        services: Arc<HashMap<String, Arc<ManagedService>>>,
> +    ) -> JoinHandle<()> {
> +        tokio::spawn(async move {
> +            let mut retry_interval = interval(Duration::from_secs(1));
> +            retry_interval.set_missed_tick_behavior(MissedTickBehavior::Skip);
> +
> +            loop {
> +                retry_interval.tick().await;
> +                Self::retry_failed_services(&services).await;
> +            }
> +        })
> +    }
> +
> +    /// Retry initialization for failed services
> +    async fn retry_failed_services(services: &HashMap<String, Arc<ManagedService>>) {
> +        for (name, managed) in services {
> +            // Check if service needs retry
> +            let state = *managed.state.read();
> +            if state != ServiceState::Uninitialized {

So this only retries on Uninitialized?
But the doc comment of the function says "for failed services".
Also, the ServiceState::Failed docs say it awaits retry.
Please either remove Failed or actually use it (and retry it). Right now
this is misleading.
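
If Failed is kept, the check could be expressed as a small predicate.
This is only a sketch; needs_init_attempt is a hypothetical helper, not
something from the patch:

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ServiceState {
    Uninitialized,
    Initializing,
    Running,
    Failed,
}

/// States from which the retry task should attempt initialize().
fn needs_init_attempt(state: ServiceState) -> bool {
    matches!(state, ServiceState::Uninitialized | ServiceState::Failed)
}
```

The loop would then `continue` when !needs_init_attempt(state).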

> +                continue;
> +            }
> +
> +            let (is_restartable, retry_interval) = {
> +                let service = managed.service.lock().await;
> +                (service.is_restartable(), service.retry_interval())
> +            };
> +
> +            // Check if this is a retry or first attempt
> +            let now = Instant::now();
> +            let is_first_attempt = managed.last_init_attempt.read().is_none();
> +
> +            // Allow first attempt for all services, but block retries for non-restartable services
> +            if !is_first_attempt && !is_restartable {
> +                continue;
> +            }
> +
> +            // Check retry throttle (only for retries)
> +            if let Some(last) = *managed.last_init_attempt.read()
> +                && now.duration_since(last) < retry_interval
> +            {
> +                continue;
> +            }
> +
> +            // Attempt initialization
> +            *managed.last_init_attempt.write() = Some(now);
> +            *managed.state.write() = ServiceState::Initializing;
> +
> +            debug!(service = %name, "Attempting to initialize service");
> +
> +            let mut service = managed.service.lock().await;
> +
> +            match service.initialize().await {

When calling this, resources are held already ..

> +                Ok(InitResult::WithFileDescriptor(fd)) => match AsyncFd::new(FdWrapper(fd)) {
> +                    Ok(async_fd) => {
> +                        *managed.async_fd.write() = Some(Arc::new(async_fd));
> +                        *managed.state.write() = ServiceState::Running;
> +                        *managed.error_count.write() = 0;
> +                        info!(service = %name, fd, "Service initialized successfully");
> +                    }
> +                    Err(e) => {
> +                        error!(service = %name, fd, error = %e, "Failed to register fd");
> +                        *managed.state.write() = ServiceState::Failed;
> +                        *managed.error_count.write() += 1;

.. but if we fail here afterwards, the service is marked Failed but
finalize() is never called, which would leak resources.

Also, shutdown doesn't finalize it currently, as it skips non-Running
services. To make sure the resources don’t leak, call finalize() before
marking Failed.

Also, will this failed service ever be retried?
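
A simplified, synchronous sketch of the cleanup I have in mind.
MockService, init_and_register and the register_fd closure are
stand-ins for the real async code and AsyncFd::new():

```rust
#[derive(Debug, PartialEq, Eq)]
enum ServiceState {
    Running,
    Failed,
}

/// Hypothetical service that holds a resource after initialize().
struct MockService {
    resource_open: bool,
}

impl MockService {
    fn initialize(&mut self) -> i32 {
        self.resource_open = true;
        3 // pretend file descriptor
    }

    fn finalize(&mut self) {
        self.resource_open = false;
    }
}

/// Initialize, then register the fd. If registration fails, release
/// the service's resources before reporting Failed.
fn init_and_register(
    service: &mut MockService,
    register_fd: impl Fn(i32) -> Result<(), String>,
) -> ServiceState {
    let fd = service.initialize();
    match register_fd(fd) {
        Ok(()) => ServiceState::Running,
        Err(_) => {
            service.finalize();
            ServiceState::Failed
        }
    }
}
```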

> +                    }
> +                },
> +                Ok(InitResult::NoFileDescriptor) => {
> +                    *managed.state.write() = ServiceState::Running;
> +                    *managed.error_count.write() = 0;
> +                    info!(service = %name, "Service initialized successfully (no fd)");
> +                }
> +                Err(e) => {
> +                    let err_count = {
> +                        let mut count = managed.error_count.write();
> +                        *count += 1;
> +                        *count
> +                    };
> +
> +                    // Only log first failure to avoid spam
> +                    if err_count == 1 {
> +                        error!(service = %name, error = %e, "Failed to initialize service");
> +                    } else {
> +                        debug!(service = %name, attempt = err_count, error = %e, "Service initialization failed");
> +                    }
> +
> +                    *managed.state.write() = ServiceState::Uninitialized;
> +                }
> +            }
> +        }
> +    }
> +
> +    /// Spawn task that invokes timer callbacks
> +    fn spawn_timer_task_static(
> +        services: Arc<HashMap<String, Arc<ManagedService>>>,
> +    ) -> JoinHandle<()> {
> +        tokio::spawn(async move {
> +            let mut timer_interval = interval(Duration::from_secs(1));

The timer task ticks at 1 second, but the API allows any Duration.

fn timer_period(&self) -> Option<Duration> {
     None
}

A service returning Some(Duration::from_millis(200)) would still only 
fire roughly every 1 second.
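
To make the effect concrete, here is a small sketch that computes the
period a service would effectively see. It assumes ideal ticks without
jitter and that the last invocation is always recorded at tick time;
effective_period is a hypothetical helper:

```rust
use std::time::Duration;

/// The requested period rounded up to the next multiple of the tick,
/// which is what a fixed-tick scheduler can deliver at best.
fn effective_period(tick: Duration, period: Duration) -> Duration {
    let tick_ns = tick.as_nanos().max(1);
    let period_ns = period.as_nanos().max(1);
    // Ceiling division: number of ticks until the period has elapsed.
    let ticks = (period_ns + tick_ns - 1) / tick_ns;
    Duration::from_nanos((ticks * tick_ns) as u64)
}
```

With a 1 second tick, a requested 200ms becomes 1s and a requested 1.5s
becomes 2s. Either document the 1 second granularity or derive the tick
from the smallest registered period.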

> +            timer_interval.set_missed_tick_behavior(MissedTickBehavior::Skip);
> +
> +            loop {
> +                timer_interval.tick().await;
> +                Self::invoke_timer_callbacks(&services).await;
> +            }
> +        })
> +    }
> +
> +    /// Invoke timer callbacks for running services
> +    async fn invoke_timer_callbacks(services: &HashMap<String, Arc<ManagedService>>) {
> +        let now = Instant::now();
> +
> +        for (name, managed) in services {
> +            // Check if service is running
> +            if *managed.state.read() != ServiceState::Running {
> +                continue;
> +            }
> +
> +            let Some(period) = ({ managed.service.lock().await.timer_period() }) else {
> +                continue;
> +            };

The mutex is acquired twice per service per tick, the first time only
to read timer_period(), which is likely constant.
The same pattern applies to dispatch_interval().

You could probably cache this config at registration time in
ManagedService?
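
Roughly like this, assuming the values really are constant for the
lifetime of a service. The trait below is a trimmed-down copy with only
the two relevant methods, and ServiceConfig / snapshot() / Heartbeat are
hypothetical names:

```rust
use std::time::Duration;

/// Trimmed-down version of the Service trait (sync, two methods only).
trait Service {
    fn timer_period(&self) -> Option<Duration> {
        None
    }
    fn dispatch_interval(&self) -> Duration {
        Duration::from_millis(100)
    }
}

/// Values read once in add_service() and stored next to the service,
/// so the hot loops do not need to take the mutex to read them.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct ServiceConfig {
    timer_period: Option<Duration>,
    dispatch_interval: Duration,
}

impl ServiceConfig {
    fn snapshot(service: &dyn Service) -> Self {
        Self {
            timer_period: service.timer_period(),
            dispatch_interval: service.dispatch_interval(),
        }
    }
}

/// Example service overriding only the timer period.
struct Heartbeat;

impl Service for Heartbeat {
    fn timer_period(&self) -> Option<Duration> {
        Some(Duration::from_secs(5))
    }
}
```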

> +
> +            // Check if it's time to invoke timer
> +            let should_invoke = match *managed.last_timer_invoke.read() {
> +                Some(last) => now.duration_since(last) >= period,
> +                None => true, // First invocation
> +            };
> +
> +            if !should_invoke {
> +                continue;
> +            }
> +
> +            *managed.last_timer_invoke.write() = Some(now);
> +
> +            debug!(service = %name, "Invoking timer callback");
> +
> +            let mut service = managed.service.lock().await;
> +
> +            if let Err(e) = service.timer_callback().await {
> +                warn!(service = %name, error = %e, "Timer callback failed");
> +            }
> +        }
> +    }
> +
> +    /// Spawn dispatch tasks for all services
> +    fn spawn_dispatch_tasks_static(
> +        services: Arc<HashMap<String, Arc<ManagedService>>>,
> +    ) -> Vec<JoinHandle<()>> {
> +        let mut handles = Vec::new();
> +
> +        for (name, managed) in services.iter() {
> +            let name = name.clone();
> +            let managed = Arc::clone(managed);
> +
> +            let handle = tokio::spawn(async move {
> +                loop {
> +                    // Wait for service to be running
> +                    loop {
> +                        tokio::time::sleep(Duration::from_millis(100)).await;

Optional: a tokio::sync::Notify or a watch channel would be cleaner
than polling the state every 100ms.

> +                        let state = *managed.state.read();
> +                        if state == ServiceState::Running {
> +                            break;
> +                        }
> +                    }
> +
> +                    // Dispatch based on service type
> +                    let async_fd = managed.async_fd.read().clone();
> +
> +                    if let Some(fd) = async_fd {
> +                        // Event-driven dispatch
> +                        Self::dispatch_with_fd(&name, &managed, &fd).await;
> +                    } else {
> +                        // Polling dispatch
> +                        Self::dispatch_polling(&name, &managed).await;
> +                    }
> +                }
> +            });
> +
> +            handles.push(handle);
> +        }
> +
> +        handles
> +    }
> +
> +    /// Dispatch events for service with file descriptor
> +    async fn dispatch_with_fd(
> +        name: &str,
> +        managed: &Arc<ManagedService>,
> +        async_fd: &Arc<AsyncFd<FdWrapper>>,
> +    ) {
> +        loop {
> +            let readable = match async_fd.readable().await {
> +                Ok(r) => r,
> +                Err(e) => {
> +                    warn!(service = %name, error = %e, "Error waiting for fd readability");
> +                    break;

This breaks the loop and returns without doing a re-initialization,
but a few lines later the other error paths call reinitialize_service().
Is that call missing here?

> +                }
> +            };
> +
> +            let mut guard = readable;
> +            let mut service = managed.service.lock().await;
> +
> +            match service.dispatch().await {
> +                Ok(DispatchAction::Continue) => {
> +                    guard.clear_ready();
> +                }
> +                Ok(DispatchAction::Reinitialize) => {
> +                    info!(service = %name, "Service requested reinitialization");
> +                    guard.clear_ready();
> +                    drop(service);
> +                    Self::reinitialize_service(name, managed).await;
> +                    break;
> +                }
> +                Err(e) => {
> +                    error!(service = %name, error = %e, "Service dispatch failed");
> +                    guard.clear_ready();
> +                    drop(service);
> +                    Self::reinitialize_service(name, managed).await;
> +                    break;
> +                }
> +            }
> +        }
> +    }
> +
> +    /// Dispatch events for service without file descriptor (polling)
> +    async fn dispatch_polling(name: &str, managed: &Arc<ManagedService>) {
> +        let dispatch_interval = managed.service.lock().await.dispatch_interval();
> +        let mut interval_timer = interval(dispatch_interval);
> +        interval_timer.set_missed_tick_behavior(MissedTickBehavior::Skip);
> +
> +        loop {
> +            interval_timer.tick().await;
> +
> +            // Check if still running
> +            if *managed.state.read() != ServiceState::Running {
> +                break;
> +            }
> +
> +            let mut service = managed.service.lock().await;
> +
> +            match service.dispatch().await {
> +                Ok(DispatchAction::Continue) => {}
> +                Ok(DispatchAction::Reinitialize) => {
> +                    info!(service = %name, "Service requested reinitialization");
> +                    drop(service);
> +                    Self::reinitialize_service(name, managed).await;
> +                    break;
> +                }
> +                Err(e) => {
> +                    error!(service = %name, error = %e, "Service dispatch failed");
> +                    drop(service);
> +                    Self::reinitialize_service(name, managed).await;

This sets ServiceState::Uninitialized. After the first attempt,
retry_failed_services() refuses to re-initialize a service whose
is_restartable() returns false, so such a service would get stuck
because of this check:

if !is_first_attempt && !is_restartable {
   continue;
}
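
One possible way out, assuming a non-restartable service should end up
in an explicit Failed state instead of a silent Uninitialized one. This
is only a std-only sketch: `should_retry` restates the check quoted
above, and `state_after_dispatch_failure` is a hypothetical helper that
does not exist in the patch.

```rust
/// Mirrors the four lifecycle states documented on the Service trait.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ServiceState {
    Uninitialized,
    Initializing,
    Running,
    Failed,
}

/// Same rule as the quoted check: skip only when this is not the first
/// attempt and the service is not restartable.
pub fn should_retry(is_first_attempt: bool, is_restartable: bool) -> bool {
    is_first_attempt || is_restartable
}

/// Hypothetical helper: the state to store after a dispatch failure.
/// A non-restartable service is marked Failed so that it is visibly
/// dead rather than waiting for a retry that never happens.
pub fn state_after_dispatch_failure(is_restartable: bool) -> ServiceState {
    if is_restartable {
        ServiceState::Uninitialized
    } else {
        ServiceState::Failed
    }
}
```

Whether Failed is the right terminal state here is a design decision
for the series; the point is only that the outcome should be explicit.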

> +                    break;
> +                }
> +            }
> +        }
> +    }
> +
> +    /// Reinitialize a service (finalize, then mark for retry)
> +    async fn reinitialize_service(name: &str, managed: &Arc<ManagedService>) {
> +        debug!(service = %name, "Reinitializing service");
> +
> +        let mut service = managed.service.lock().await;
> +
> +        if let Err(e) = service.finalize().await {
> +            warn!(service = %name, error = %e, "Error finalizing service");
> +        }
> +
> +        drop(service);

At this point, after the lock has been dropped, the timer task can still
see Running, acquire the mutex and call timer_callback() on a finalized
service. Maybe make the service invisible to the timer/dispatch tasks
first?
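
A minimal sketch of that ordering, modelled with std locks instead of
the tokio mutex and the RwLock used in the patch. `Managed`,
`FakeService`, `reinitialize` and `timer_tick` are hypothetical
stand-ins; the idea is to flip the state before finalizing and to have
the timer re-check the state while it holds the service lock.

```rust
use std::sync::{Mutex, RwLock};

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ServiceState {
    Uninitialized,
    Running,
}

/// Stand-in for a service; records calls in the order they happen.
#[derive(Default)]
pub struct FakeService {
    pub log: Vec<&'static str>,
}

pub struct Managed {
    pub state: RwLock<ServiceState>,
    pub service: Mutex<FakeService>,
}

/// Hide the service first, then finalize it.
pub fn reinitialize(managed: &Managed) {
    // 1. Flip the state so new timer/dispatch ticks back off.
    *managed.state.write().unwrap() = ServiceState::Uninitialized;
    // 2. Finalize under the service lock.
    managed.service.lock().unwrap().log.push("finalize");
}

/// Timer tick: take the service lock, then check the state under it.
pub fn timer_tick(managed: &Managed) {
    let mut service = managed.service.lock().unwrap();
    if *managed.state.read().unwrap() != ServiceState::Running {
        return; // service is being (or has been) torn down
    }
    service.log.push("timer_callback");
}
```

A tick that wins the lock before the state flip still runs against a
live service; a tick that comes later sees Uninitialized and returns.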

> +
> +        // Clear async fd and mark for retry
> +        *managed.async_fd.write() = None;
> +        *managed.state.write() = ServiceState::Uninitialized;
> +        *managed.error_count.write() = 0;
> +    }
> +
> +    /// Shutdown all services gracefully
> +    async fn shutdown_all_services_static(services: &HashMap<String, Arc<ManagedService>>) {
> +        for (name, managed) in services {
> +            if *managed.state.read() != ServiceState::Running {

In C we finalize all services unconditionally, regardless of state.
finalize() should be idempotent, so that we do not have to skip Failed,
Uninitialized or Initializing services here.
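
For illustration, an idempotent finalize() could look roughly like the
following. `QuorumLike` and its fields are made up for this sketch (the
`u64` stands in for whatever handle a real service holds); the pattern
is simply to `take()` the resource so that repeated calls are no-ops.

```rust
/// Hypothetical service holding an optional connection handle.
pub struct QuorumLike {
    conn: Option<u64>, // stand-in for a real connection handle
    pub closed: u32,   // how many times a handle was actually closed
}

impl QuorumLike {
    pub fn new() -> Self {
        Self { conn: None, closed: 0 }
    }

    pub fn initialize(&mut self) {
        self.conn = Some(42);
    }

    /// Safe to call in any state and any number of times.
    pub fn finalize(&mut self) {
        // take() leaves None behind, so a second call does nothing.
        if let Some(_handle) = self.conn.take() {
            self.closed += 1;
        }
    }
}
```

Note that MockService in the tests counts finalize() calls, so
test_manager_partial_service_failure (which expects 0 calls for a
service that never initialized) would need adjusting if shutdown
finalizes unconditionally.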

> +                continue;
> +            }
> +
> +            info!(service = %name, "Shutting down service");
> +
> +            let mut service = managed.service.lock().await;
> +
> +            if let Err(e) = service.finalize().await {
> +                error!(service = %name, error = %e, "Error finalizing service");
> +            }

The state is still Running at this point.
Don’t we need to set ServiceState::Finalized or Uninitialized?
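
Something along these lines, as a simplified synchronous model.
`Managed` and `shutdown_all` are hypothetical, the counter stands in for
the awaited finalize() call, and I use Uninitialized because the
documented lifecycle has no Finalized state yet.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ServiceState {
    Uninitialized,
    Initializing,
    Running,
    Failed,
}

/// Hypothetical, flattened version of ManagedService.
pub struct Managed {
    pub state: ServiceState,
    pub finalize_calls: u32,
}

/// Finalize every service and record that it is no longer running.
pub fn shutdown_all(services: &mut HashMap<String, Managed>) {
    for managed in services.values_mut() {
        // Stands in for `service.finalize().await`.
        managed.finalize_calls += 1;
        // Without this, observers would still see Running after shutdown.
        managed.state = ServiceState::Uninitialized;
    }
}
```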

> +        }
> +    }
> +}
> +
> +impl Default for ServiceManager {
> +    fn default() -> Self {
> +        Self::new()
> +    }
> +}
> diff --git a/src/pmxcfs-rs/pmxcfs-services/src/service.rs b/src/pmxcfs-rs/pmxcfs-services/src/service.rs
> new file mode 100644
> index 00000000..395ba67f
> --- /dev/null
> +++ b/src/pmxcfs-rs/pmxcfs-services/src/service.rs
> @@ -0,0 +1,173 @@
> +//! Service trait and related types
> +//!
> +//! This module provides the core abstraction for managed services that can
> +//! automatically retry initialization, handle errors gracefully, and provide
> +//! timer-based periodic callbacks.
> +
> +use crate::error::Result;
> +use async_trait::async_trait;
> +use std::time::Duration;
> +
> +/// A managed service that can be monitored and restarted automatically
> +///
> +/// This trait provides the core abstraction for services in the pmxcfs daemon.
> +/// Services implementing this trait gain automatic retry on failure, graceful
> +/// error handling, and optional periodic timer callbacks.
> +///
> +/// ## Lifecycle
> +///
> +/// 1. **Uninitialized** - Service created but not yet initialized
> +/// 2. **Initializing** - `initialize()` in progress
> +/// 3. **Running** - Service initialized successfully, dispatching events
> +/// 4. **Failed** - Service encountered an error, will retry if restartable
> +#[async_trait]
> +pub trait Service: Send + Sync {
> +    /// Service name for logging and identification
> +    ///
> +    /// Should be a short, descriptive identifier (e.g., "quorum", "dfsm", "confdb")
> +    fn name(&self) -> &str;
> +
> +    /// Initialize the service
> +    ///
> +    /// Called when the service is first started or after a failure (if restartable).
> +    /// Returns an `InitResult` indicating whether the service needs file descriptor
> +    /// monitoring.
> +    ///
> +    /// # Errors
> +    ///
> +    /// Returns an error if initialization fails. The ServiceManager will automatically
> +    /// retry initialization based on `retry_interval()` if `is_restartable()` returns true.
> +    ///
> +    /// # Implementation Notes
> +    ///
> +    /// - Initialize connections to external services (Corosync, CPG, etc.)
> +    /// - Set up internal state
> +    /// - Return file descriptor if the service needs event-driven dispatching
> +    /// - Keep initialization lightweight - heavy work should be in `dispatch()`
> +    async fn initialize(&mut self) -> Result<InitResult>;
> +
> +    /// Handle events for this service
> +    ///
> +    /// Called when:
> +    /// - The file descriptor returned by `initialize()` becomes readable (if WithFileDescriptor)
> +    /// - Periodically for services without file descriptors (if NoFileDescriptor)
> +    ///
> +    /// # Returns
> +    ///
> +    /// - `DispatchAction::Continue` - Continue normal operation
> +    /// - `DispatchAction::Reinitialize` - Request reinitialization (triggers `finalize()` then `initialize()`)
> +    ///
> +    /// # Errors
> +    ///
> +    /// Errors automatically trigger reinitialization if the service is restartable.
> +    /// The service will be finalized and reinitialized according to `retry_interval()`.
> +    async fn dispatch(&mut self) -> Result<DispatchAction>;
> +
> +    /// Clean up service resources
> +    ///
> +    /// Called when:
> +    /// - Service is being shut down
> +    /// - Service is being reinitialized after dispatch failure
> +    /// - ServiceManager is shutting down
> +    ///
> +    /// # Implementation Notes
> +    ///
> +    /// - Close connections
> +    /// - Release resources
> +    /// - Should not fail - log errors but return Ok(())
> +    async fn finalize(&mut self) -> Result<()>;
> +
> +    /// Optional periodic callback
> +    ///
> +    /// Called at the interval specified by `timer_period()` if the service is running.
> +    /// Useful for periodic maintenance tasks like state verification or cleanup.
> +    ///
> +    /// # Default Implementation
> +    ///
> +    /// Does nothing by default. Override to implement periodic behavior.
> +    async fn timer_callback(&mut self) -> Result<()> {
> +        Ok(())
> +    }
> +
> +    /// Timer period for periodic callbacks
> +    ///
> +    /// If `Some(duration)`, `timer_callback()` will be invoked every `duration`.
> +    /// If `None`, timer callbacks are disabled.
> +    ///
> +    /// # Default
> +    ///
> +    /// Returns `None` (no timer callbacks)
> +    fn timer_period(&self) -> Option<Duration> {
> +        None
> +    }
> +
> +    /// Whether to automatically retry initialization after failure
> +    ///
> +    /// If `true`, the ServiceManager will automatically retry `initialize()`
> +    /// after failures using the interval specified by `retry_interval()`.
> +    ///
> +    /// If `false`, the service will remain in a failed state after the first
> +    /// initialization failure.
> +    ///
> +    /// # Default
> +    ///
> +    /// Returns `true` (auto-retry enabled)
> +    fn is_restartable(&self) -> bool {
> +        true
> +    }
> +
> +    /// Minimum interval between retry attempts
> +    ///
> +    /// When `initialize()` fails, the ServiceManager will wait at least this
> +    /// long before attempting to reinitialize.
> +    ///
> +    /// # Default
> +    ///
> +    /// Returns 5 seconds (matching C implementation)
> +    fn retry_interval(&self) -> Duration {
> +        Duration::from_secs(5)
> +    }
> +
> +    /// Dispatch interval for services without file descriptors
> +    ///
> +    /// For services that return `InitResult::NoFileDescriptor`, this determines
> +    /// how often `dispatch()` is called.
> +    ///
> +    /// # Default
> +    ///
> +    /// Returns 100ms (matching current Rust implementation)
> +    fn dispatch_interval(&self) -> Duration {
> +        Duration::from_millis(100)
> +    }
> +}
> +
> +/// Result of service initialization
> +#[derive(Debug, Clone, Copy)]
> +pub enum InitResult {
> +    /// Service uses a file descriptor for event notification
> +    ///
> +    /// The ServiceManager will use tokio's AsyncFd to monitor this file descriptor
> +    /// and call `dispatch()` when it becomes readable. This is the most efficient
> +    /// mode for services that interact with Corosync (quorum, CPG, cmap).
> +    WithFileDescriptor(i32),

This should use RawFd (a type alias for i32 on Unix) to better reflect
the intent.
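
Roughly like this. The enum mirrors the one in the patch with only the
payload type changed; `fd_of` is a hypothetical helper added just to
show that callers passing a plain i32 keep compiling, since RawFd is an
alias and not a new type.

```rust
use std::os::unix::io::RawFd;

/// Result of service initialization, with the fd typed as RawFd.
#[derive(Debug, Clone, Copy)]
pub enum InitResult {
    WithFileDescriptor(RawFd),
    NoFileDescriptor,
}

/// Hypothetical helper: extract the fd, if the service has one.
pub fn fd_of(result: InitResult) -> Option<RawFd> {
    match result {
        InitResult::WithFileDescriptor(fd) => Some(fd),
        InitResult::NoFileDescriptor => None,
    }
}
```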

> +
> +    /// Service does not use a file descriptor
> +    ///
> +    /// The ServiceManager will call `dispatch()` periodically at the interval
> +    /// specified by `dispatch_interval()`. Use this for services that poll
> +    /// or have no external event source.
> +    NoFileDescriptor,
> +}
> +
> +/// Action requested by service dispatch
> +#[derive(Debug, Clone, Copy, PartialEq, Eq)]
> +pub enum DispatchAction {
> +    /// Continue normal operation
> +    Continue,
> +
> +    /// Request reinitialization
> +    ///
> +    /// The service will be finalized and reinitialized. This is useful when
> +    /// the underlying connection is lost or becomes invalid.
> +    Reinitialize,
> +}
> diff --git a/src/pmxcfs-rs/pmxcfs-services/tests/service_tests.rs b/src/pmxcfs-rs/pmxcfs-services/tests/service_tests.rs
> new file mode 100644
> index 00000000..4574a8d6
> --- /dev/null
> +++ b/src/pmxcfs-rs/pmxcfs-services/tests/service_tests.rs
> @@ -0,0 +1,808 @@
> +//! Comprehensive tests for the service framework
> +//!
> +//! Tests cover:
> +//! - Service lifecycle (start, stop, restart)
> +//! - Service manager orchestration
> +//! - Error handling and retry logic
> +//! - Timer callbacks
> +//! - File descriptor and polling dispatch modes
> +//! - Service coordination and state management
> +
> +use async_trait::async_trait;
> +use pmxcfs_services::{DispatchAction, InitResult, Service, ServiceError, ServiceManager};
> +use pmxcfs_test_utils::wait_for_condition;
> +use std::sync::Arc;
> +use std::sync::atomic::{AtomicBool, AtomicU32, Ordering};
> +use std::time::Duration;
> +use tokio::time::sleep;
> +
> +// ===== Test Service Implementations =====
> +
> +/// Mock service for testing lifecycle
> +struct MockService {
> +    name: String,
> +    init_count: Arc<AtomicU32>,
> +    dispatch_count: Arc<AtomicU32>,
> +    finalize_count: Arc<AtomicU32>,
> +    timer_count: Arc<AtomicU32>,
> +    should_fail_init: Arc<AtomicBool>,
> +    should_fail_dispatch: Arc<AtomicBool>,
> +    should_reinit: Arc<AtomicBool>,
> +    use_fd: bool,
> +    timer_period: Option<Duration>,
> +    restartable: bool,
> +}
> +
> +impl MockService {
> +    fn new(name: &str) -> Self {
> +        Self {
> +            name: name.to_string(),
> +            init_count: Arc::new(AtomicU32::new(0)),
> +            dispatch_count: Arc::new(AtomicU32::new(0)),
> +            finalize_count: Arc::new(AtomicU32::new(0)),
> +            timer_count: Arc::new(AtomicU32::new(0)),
> +            should_fail_init: Arc::new(AtomicBool::new(false)),
> +            should_fail_dispatch: Arc::new(AtomicBool::new(false)),
> +            should_reinit: Arc::new(AtomicBool::new(false)),
> +            use_fd: false,
> +            timer_period: None,
> +            restartable: true,
> +        }
> +    }
> +
> +    fn with_timer(mut self, period: Duration) -> Self {
> +        self.timer_period = Some(period);
> +        self
> +    }
> +
> +    fn with_restartable(mut self, restartable: bool) -> Self {
> +        self.restartable = restartable;
> +        self
> +    }
> +
> +    fn counters(&self) -> ServiceCounters {
> +        ServiceCounters {
> +            init_count: self.init_count.clone(),
> +            dispatch_count: self.dispatch_count.clone(),
> +            finalize_count: self.finalize_count.clone(),
> +            timer_count: self.timer_count.clone(),
> +            should_fail_init: self.should_fail_init.clone(),
> +            should_fail_dispatch: self.should_fail_dispatch.clone(),
> +            should_reinit: self.should_reinit.clone(),
> +        }
> +    }
> +}
> +
> +#[async_trait]
> +impl Service for MockService {
> +    fn name(&self) -> &str {
> +        &self.name
> +    }
> +
> +    async fn initialize(&mut self) -> pmxcfs_services::Result<InitResult> {
> +        self.init_count.fetch_add(1, Ordering::SeqCst);
> +
> +        if self.should_fail_init.load(Ordering::SeqCst) {
> +            return Err(ServiceError::InitializationFailed(
> +                "Mock init failure".to_string(),
> +            ));
> +        }
> +
> +        if self.use_fd {
> +            // Return a dummy fd (stderr is always available)
> +            Ok(InitResult::WithFileDescriptor(2))
> +        } else {
> +            Ok(InitResult::NoFileDescriptor)
> +        }
> +    }
> +
> +    async fn dispatch(&mut self) -> pmxcfs_services::Result<DispatchAction> {
> +        self.dispatch_count.fetch_add(1, Ordering::SeqCst);
> +
> +        if self.should_fail_dispatch.load(Ordering::SeqCst) {
> +            return Err(ServiceError::DispatchFailed(
> +                "Mock dispatch failure".to_string(),
> +            ));
> +        }
> +
> +        if self.should_reinit.load(Ordering::SeqCst) {
> +            return Ok(DispatchAction::Reinitialize);
> +        }
> +
> +        Ok(DispatchAction::Continue)
> +    }
> +
> +    async fn finalize(&mut self) -> pmxcfs_services::Result<()> {
> +        self.finalize_count.fetch_add(1, Ordering::SeqCst);
> +        Ok(())
> +    }
> +
> +    async fn timer_callback(&mut self) -> pmxcfs_services::Result<()> {
> +        self.timer_count.fetch_add(1, Ordering::SeqCst);
> +        Ok(())
> +    }
> +
> +    fn timer_period(&self) -> Option<Duration> {
> +        self.timer_period
> +    }
> +
> +    fn is_restartable(&self) -> bool {
> +        self.restartable
> +    }
> +
> +    fn retry_interval(&self) -> Duration {
> +        Duration::from_millis(100) // Fast retry for tests
> +    }
> +
> +    fn dispatch_interval(&self) -> Duration {
> +        Duration::from_millis(50) // Fast polling for tests
> +    }
> +}
> +
> +/// Helper struct to access service counters from tests
> +#[derive(Clone)]
> +struct ServiceCounters {
> +    init_count: Arc<AtomicU32>,
> +    dispatch_count: Arc<AtomicU32>,
> +    finalize_count: Arc<AtomicU32>,
> +    timer_count: Arc<AtomicU32>,
> +    should_fail_init: Arc<AtomicBool>,
> +    should_fail_dispatch: Arc<AtomicBool>,
> +    should_reinit: Arc<AtomicBool>,
> +}
> +
> +impl ServiceCounters {
> +    fn init_count(&self) -> u32 {
> +        self.init_count.load(Ordering::SeqCst)
> +    }
> +
> +    fn dispatch_count(&self) -> u32 {
> +        self.dispatch_count.load(Ordering::SeqCst)
> +    }
> +
> +    fn finalize_count(&self) -> u32 {
> +        self.finalize_count.load(Ordering::SeqCst)
> +    }
> +
> +    fn timer_count(&self) -> u32 {
> +        self.timer_count.load(Ordering::SeqCst)
> +    }
> +
> +    fn set_fail_init(&self, fail: bool) {
> +        self.should_fail_init.store(fail, Ordering::SeqCst);
> +    }
> +
> +    fn set_fail_dispatch(&self, fail: bool) {
> +        self.should_fail_dispatch.store(fail, Ordering::SeqCst);
> +    }
> +
> +    fn set_reinit(&self, reinit: bool) {
> +        self.should_reinit.store(reinit, Ordering::SeqCst);
> +    }
> +}
> +
> +// ===== Lifecycle Tests =====
> +
> +#[tokio::test]
> +async fn test_service_lifecycle_basic() {
> +    let service = MockService::new("test_service");
> +    let counters = service.counters();
> +
> +    let mut manager = ServiceManager::new();
> +    manager.add_service(Box::new(service));
> +
> +    let shutdown_token = manager.shutdown_token();
> +    let handle = manager.spawn();
> +
> +    // Wait for initialization and dispatching
> +    assert!(
> +        wait_for_condition(
> +            || counters.init_count() >= 1 && counters.dispatch_count() >= 1,
> +            Duration::from_secs(5),
> +            Duration::from_millis(10),
> +        )
> +        .await,
> +        "Service should initialize and dispatch within 5 seconds"
> +    );
> +
> +    // Shutdown
> +    shutdown_token.cancel();
> +    let _ = handle.await;
> +
> +    // Service should be finalized
> +    assert_eq!(
> +        counters.finalize_count(),
> +        1,
> +        "Service should be finalized exactly once"
> +    );
> +}
> +
> +#[tokio::test]
> +async fn test_service_with_file_descriptor() {
> +    // Don't use FD-based service in tests since we can't easily create a readable FD
> +    // Just test that WithFileDescriptor variant works with manager
> +    let service = MockService::new("no_fd_service"); // Changed to not use FD

The dispatch_with_fd code path is untested. Let's try to add tests for
that.
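
Getting a readable fd in a test should not be hard: a Unix socket pair
from std gives one end to register and one end to write to. A minimal
fixture sketch follows; `ReadableFd` and its methods are hypothetical
names, and how it gets wired into MockService is left open.

```rust
use std::io::{ErrorKind, Read, Write};
use std::os::unix::io::{AsRawFd, RawFd};
use std::os::unix::net::UnixStream;

/// Hypothetical test fixture: owns both ends so the fd stays valid.
pub struct ReadableFd {
    reader: UnixStream,
    writer: UnixStream,
}

impl ReadableFd {
    pub fn new() -> std::io::Result<Self> {
        let (reader, writer) = UnixStream::pair()?;
        // Readiness-based polling expects a non-blocking descriptor.
        reader.set_nonblocking(true)?;
        Ok(Self { reader, writer })
    }

    /// The fd a mock initialize() could return to the manager.
    pub fn raw_fd(&self) -> RawFd {
        self.reader.as_raw_fd()
    }

    /// Make the fd readable, i.e. trigger a dispatch.
    pub fn notify(&mut self) -> std::io::Result<()> {
        self.writer.write_all(&[1])
    }

    /// Drain pending bytes, as a mock dispatch() would do before
    /// returning Continue. Returns the number of bytes consumed.
    pub fn drain(&mut self) -> std::io::Result<usize> {
        let mut buf = [0u8; 64];
        let mut total = 0;
        loop {
            match self.reader.read(&mut buf) {
                Ok(0) => return Ok(total), // peer closed
                Ok(n) => total += n,
                Err(e) if e.kind() == ErrorKind::WouldBlock => return Ok(total),
                Err(e) => return Err(e),
            }
        }
    }
}
```

A test could then call notify() and wait for dispatch_count() to go up,
instead of handing out fd 2 as the current mock does.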

> +    let counters = service.counters();
> +
> +    let mut manager = ServiceManager::new();
> +    manager.add_service(Box::new(service));
> +
> +    let shutdown_token = manager.shutdown_token();
> +    let handle = manager.spawn();
> +
> +    // Wait for initialization and some dispatches
> +    assert!(
> +        wait_for_condition(
> +            || counters.init_count() == 1 && counters.dispatch_count() >= 1,
> +            Duration::from_secs(5),
> +            Duration::from_millis(10),
> +        )
> +        .await,
> +        "Service should initialize once and dispatch within 5 seconds"
> +    );
> +
> +    // Shutdown
> +    shutdown_token.cancel();
> +    let _ = handle.await;
> +
> +    assert_eq!(counters.finalize_count(), 1, "Service should finalize once");
> +}
> +
> +#[tokio::test]
> +async fn test_service_initialization_failure() {
> +    let service = MockService::new("failing_service");
> +    let counters = service.counters();
> +
> +    // Make initialization fail
> +    counters.set_fail_init(true);
> +
> +    let mut manager = ServiceManager::new();
> +    manager.add_service(Box::new(service));
> +
> +    let shutdown_token = manager.shutdown_token();
> +    let handle = manager.spawn();
> +
> +    // Wait for several retry attempts
> +    assert!(
> +        wait_for_condition(
> +            || counters.init_count() >= 3,
> +            Duration::from_secs(5),
> +            Duration::from_millis(10),
> +        )
> +        .await,
> +        "Service should retry initialization at least 3 times within 5 seconds"
> +    );
> +
> +    // Dispatch should not run if init fails
> +    assert_eq!(
> +        counters.dispatch_count(),
> +        0,
> +        "Service should not dispatch if init fails"
> +    );
> +
> +    // Shutdown
> +    shutdown_token.cancel();
> +    let _ = handle.await;
> +}
> +
> +#[tokio::test]
> +async fn test_service_initialization_recovery() {
> +    let service = MockService::new("recovering_service");
> +    let counters = service.counters();
> +
> +    // Start with failing initialization
> +    counters.set_fail_init(true);
> +
> +    let mut manager = ServiceManager::new();
> +    manager.add_service(Box::new(service));
> +
> +    let shutdown_token = manager.shutdown_token();
> +    let handle = manager.spawn();
> +
> +    // Wait for some failed attempts
> +    assert!(
> +        wait_for_condition(
> +            || counters.init_count() >= 2,
> +            Duration::from_secs(5),
> +            Duration::from_millis(10),
> +        )
> +        .await,
> +        "Should have at least 2 failed initialization attempts within 5 seconds"
> +    );
> +
> +    let failed_attempts = counters.init_count();
> +
> +    // Allow initialization to succeed
> +    counters.set_fail_init(false);
> +
> +    // Wait for recovery
> +    assert!(
> +        wait_for_condition(
> +            || counters.init_count() > failed_attempts && counters.dispatch_count() >= 1,
> +            Duration::from_secs(5),
> +            Duration::from_millis(10),
> +        )
> +        .await,
> +        "Service should recover and start dispatching within 5 seconds"
> +    );
> +
> +    // Shutdown
> +    shutdown_token.cancel();
> +    let _ = handle.await;
> +}
> +
> +#[tokio::test]
> +async fn test_service_not_restartable() {

This only tests a non-restartable service with an init failure.

Please add a test for a non-restartable service that succeeds at init
but then fails in dispatch. It would catch the path where
reinitialize_service() sets Uninitialized but the retry logic refuses
to re-initialize because of !is_first_attempt && !is_restartable.

> +    let service = MockService::new("non_restartable").with_restartable(false);
> +    let counters = service.counters();
> +
> +    // Make initialization fail
> +    counters.set_fail_init(true);
> +
> +    let mut manager = ServiceManager::new();
> +    manager.add_service(Box::new(service));
> +
> +    let shutdown_token = manager.shutdown_token();
> +    let handle = manager.spawn();
> +
> +    // Wait for initialization attempt
> +    assert!(
> +        wait_for_condition(
> +            || counters.init_count() >= 1,
> +            Duration::from_secs(5),
> +            Duration::from_millis(10),
> +        )
> +        .await,
> +        "Service should attempt initialization within 5 seconds"
> +    );
> +
> +    // Service should only try once (not restartable)
> +    assert_eq!(
> +        counters.init_count(),
> +        1,
> +        "Non-restartable service should only try initialization once"
> +    );
> +
> +    // Wait another cycle to confirm it doesn't retry
> +    sleep(Duration::from_millis(1500)).await;
> +
> +    // Should still be 1
> +    assert_eq!(
> +        counters.init_count(),
> +        1,
> +        "Non-restartable service should not retry, got {}",
> +        counters.init_count()
> +    );
> +
> +    // Shutdown
> +    shutdown_token.cancel();
> +    let _ = handle.await;
> +}
> +
> +// ===== Dispatch Tests =====
> +
> +#[tokio::test]
> +async fn test_service_dispatch_failure_triggers_reinit() {
> +    let service = MockService::new("dispatch_fail_service");
> +    let counters = service.counters();
> +
> +    let mut manager = ServiceManager::new();
> +    manager.add_service(Box::new(service));
> +
> +    let shutdown_token = manager.shutdown_token();
> +    let handle = manager.spawn();
> +
> +    // Wait for initialization and first dispatches
> +    assert!(
> +        wait_for_condition(
> +            || counters.init_count() == 1 && counters.dispatch_count() >= 1,
> +            Duration::from_secs(5),
> +            Duration::from_millis(10),
> +        )
> +        .await,
> +        "Service should initialize once and dispatch within 5 seconds"
> +    );
> +
> +    // Make dispatch fail
> +    counters.set_fail_dispatch(true);
> +
> +    // Wait for dispatch failure and reinitialization
> +    assert!(
> +        wait_for_condition(
> +            || counters.init_count() >= 2 && counters.finalize_count() >= 1,
> +            Duration::from_secs(5),
> +            Duration::from_millis(10),
> +        )
> +        .await,
> +        "Service should reinitialize and finalize after dispatch failure within 5 seconds"
> +    );
> +
> +    // Shutdown
> +    shutdown_token.cancel();
> +    let _ = handle.await;
> +}
> +
> +#[tokio::test]
> +async fn test_service_dispatch_requests_reinit() {
> +    let service = MockService::new("reinit_request_service");
> +    let counters = service.counters();
> +
> +    let mut manager = ServiceManager::new();
> +    manager.add_service(Box::new(service));
> +
> +    let shutdown_token = manager.shutdown_token();
> +    let handle = manager.spawn();
> +
> +    // Wait for initialization
> +    assert!(
> +        wait_for_condition(
> +            || counters.init_count() == 1,
> +            Duration::from_secs(5),
> +            Duration::from_millis(10),
> +        )
> +        .await,
> +        "Service should initialize once within 5 seconds"
> +    );
> +
> +    // Request reinitialization from dispatch
> +    counters.set_reinit(true);
> +
> +    // Wait for reinitialization
> +    assert!(
> +        wait_for_condition(
> +            || counters.init_count() >= 2 && counters.finalize_count() >= 1,
> +            Duration::from_secs(5),
> +            Duration::from_millis(10),
> +        )
> +        .await,
> +        "Service should reinitialize and finalize when dispatch requests it within 5 seconds"
> +    );
> +
> +    // Shutdown
> +    shutdown_token.cancel();
> +    let _ = handle.await;
> +}
> +
> +// ===== Timer Callback Tests =====
> +
> +#[tokio::test]
> +async fn test_service_timer_callback() {
> +    let service = MockService::new("timer_service").with_timer(Duration::from_millis(300));
> +    let counters = service.counters();
> +
> +    let mut manager = ServiceManager::new();
> +    manager.add_service(Box::new(service));
> +
> +    let shutdown_token = manager.shutdown_token();
> +    let handle = manager.spawn();
> +
> +    // Wait for initialization plus several timer periods
> +    assert!(
> +        wait_for_condition(
> +            || counters.timer_count() >= 3,
> +            Duration::from_secs(5),
> +            Duration::from_millis(10),
> +        )
> +        .await,
> +        "Timer should fire at least 3 times within 5 seconds"
> +    );
> +
> +    let timer_count = counters.timer_count();
> +
> +    // Wait for more timer invocations
> +    assert!(
> +        wait_for_condition(
> +            || counters.timer_count() > timer_count,
> +            Duration::from_secs(2),
> +            Duration::from_millis(10),
> +        )
> +        .await,
> +        "Timer should continue firing"
> +    );
> +
> +    // Shutdown
> +    shutdown_token.cancel();
> +    let _ = handle.await;
> +}
> +
> +#[tokio::test]
> +async fn test_service_timer_callback_not_invoked_when_failed() {
> +    let service = MockService::new("failed_timer_service").with_timer(Duration::from_millis(100));
> +    let counters = service.counters();
> +
> +    // Make initialization fail
> +    counters.set_fail_init(true);
> +
> +    let mut manager = ServiceManager::new();
> +    manager.add_service(Box::new(service));
> +
> +    let shutdown_token = manager.shutdown_token();
> +    let handle = manager.spawn();
> +
> +    // Wait for several timer periods
> +    sleep(Duration::from_millis(2000)).await;
> +
> +    // Timer should NOT fire if service is not running
> +    assert_eq!(
> +        counters.timer_count(),
> +        0,
> +        "Timer should not fire when service is not running"
> +    );
> +
> +    // Shutdown
> +    shutdown_token.cancel();
> +    let _ = handle.await;
> +}
> +
> +// ===== Service Manager Tests =====
> +
> +#[tokio::test]
> +async fn test_manager_multiple_services() {
> +    let service1 = MockService::new("service1");
> +    let service2 = MockService::new("service2");
> +    let service3 = MockService::new("service3");
> +
> +    let counters1 = service1.counters();
> +    let counters2 = service2.counters();
> +    let counters3 = service3.counters();
> +
> +    let mut manager = ServiceManager::new();
> +    manager.add_service(Box::new(service1));
> +    manager.add_service(Box::new(service2));
> +    manager.add_service(Box::new(service3));
> +
> +    let shutdown_token = manager.shutdown_token();
> +    let handle = manager.spawn();
> +
> +    // Wait for initialization
> +    assert!(
> +        wait_for_condition(
> +            || counters1.init_count() == 1
> +                && counters2.init_count() == 1
> +                && counters3.init_count() == 1
> +                && counters1.dispatch_count() >= 1
> +                && counters2.dispatch_count() >= 1
> +                && counters3.dispatch_count() >= 1,
> +            Duration::from_secs(5),
> +            Duration::from_millis(10),
> +        )
> +        .await,
> +        "All services should initialize and dispatch within 5 seconds"
> +    );
> +
> +    // Shutdown
> +    shutdown_token.cancel();
> +    let _ = handle.await;
> +
> +    // All services should be finalized
> +    assert_eq!(counters1.finalize_count(), 1, "Service1 should finalize");
> +    assert_eq!(counters2.finalize_count(), 1, "Service2 should finalize");
> +    assert_eq!(counters3.finalize_count(), 1, "Service3 should finalize");
> +}
> +
> +#[tokio::test]
> +#[should_panic(expected = "already registered")]
> +async fn test_manager_duplicate_service_name() {
> +    let service1 = MockService::new("duplicate");
> +    let service2 = MockService::new("duplicate");
> +
> +    let mut manager = ServiceManager::new();
> +    manager.add_service(Box::new(service1));
> +    manager.add_service(Box::new(service2)); // Should panic
> +}
> +
> +#[tokio::test]
> +async fn test_manager_partial_service_failure() {
> +    let service1 = MockService::new("working_service");
> +    let service2 = MockService::new("failing_service");
> +
> +    let counters1 = service1.counters();
> +    let counters2 = service2.counters();
> +
> +    // Make service2 fail
> +    counters2.set_fail_init(true);
> +
> +    let mut manager = ServiceManager::new();
> +    manager.add_service(Box::new(service1));
> +    manager.add_service(Box::new(service2));
> +
> +    let shutdown_token = manager.shutdown_token();
> +    let handle = manager.spawn();
> +
> +    // Wait for initialization
> +    assert!(
> +        wait_for_condition(
> +            || counters1.init_count() == 1
> +                && counters1.dispatch_count() >= 1
> +                && counters2.init_count() >= 2,
> +            Duration::from_secs(5),
> +            Duration::from_millis(10),
> +        )
> +        .await,
> +        "Service1 should work normally and Service2 should retry within 5 seconds"
> +    );
> +
> +    // Service2 should not dispatch when failing
> +    assert_eq!(
> +        counters2.dispatch_count(),
> +        0,
> +        "Service2 should not dispatch when failing"
> +    );
> +
> +    // Shutdown
> +    shutdown_token.cancel();
> +    let _ = handle.await;
> +
> +    // Only service1 should finalize (service2 never initialized)
> +    assert_eq!(counters1.finalize_count(), 1, "Service1 should finalize");
> +    assert_eq!(
> +        counters2.finalize_count(),
> +        0,
> +        "Service2 should not finalize if never initialized"
> +    );
> +}
> +
> +// ===== Error Handling Tests =====
> +
> +#[tokio::test]
> +async fn test_service_error_count_tracking() {
> +    let service = MockService::new("error_tracking_service");
> +    let counters = service.counters();
> +
> +    // Make initialization fail
> +    counters.set_fail_init(true);
> +
> +    let mut manager = ServiceManager::new();
> +    manager.add_service(Box::new(service));
> +
> +    let shutdown_token = manager.shutdown_token();
> +    let handle = manager.spawn();
> +
> +    // Wait for multiple failures
> +    assert!(
> +        wait_for_condition(
> +            || counters.init_count() >= 4,
> +            Duration::from_secs(5),
> +            Duration::from_millis(10),
> +        )
> +        .await,
> +        "Should accumulate at least 4 failures within 5 seconds"
> +    );
> +
> +    // Allow recovery
> +    counters.set_fail_init(false);
> +
> +    // Wait for recovery
> +    assert!(
> +        wait_for_condition(
> +            || counters.dispatch_count() >= 1,
> +            Duration::from_secs(5),
> +            Duration::from_millis(10),
> +        )
> +        .await,
> +        "Service should recover within 5 seconds"
> +    );
> +
> +    // Shutdown
> +    shutdown_token.cancel();
> +    let _ = handle.await;
> +}
> +
> +#[tokio::test]
> +async fn test_service_graceful_shutdown() {
> +    let service = MockService::new("shutdown_test");
> +    let counters = service.counters();
> +
> +    let mut manager = ServiceManager::new();
> +    manager.add_service(Box::new(service));
> +
> +    let shutdown_token = manager.shutdown_token();
> +    let handle = manager.spawn();
> +
> +    // Wait for service to be running
> +    assert!(
> +        wait_for_condition(
> +            || counters.dispatch_count() >= 1,
> +            Duration::from_secs(5),
> +            Duration::from_millis(10),
> +        )
> +        .await,
> +        "Service should be running within 5 seconds"
> +    );
> +
> +    // Graceful shutdown
> +    shutdown_token.cancel();
> +    let _ = handle.await;
> +
> +    // Service should be properly finalized
> +    assert_eq!(
> +        counters.finalize_count(),
> +        1,
> +        "Service should finalize during shutdown"
> +    );
> +}
> +
> +// ===== Concurrency Tests =====
> +
> +#[tokio::test]
> +async fn test_service_concurrent_operations() {
> +    let service = MockService::new("concurrent_service").with_timer(Duration::from_millis(200));
> +    let counters = service.counters();
> +
> +    let mut manager = ServiceManager::new();
> +    manager.add_service(Box::new(service));
> +
> +    let shutdown_token = manager.shutdown_token();
> +    let handle = manager.spawn();
> +
> +    // Wait for service to run with both dispatch and timer
> +    assert!(
> +        wait_for_condition(
> +            || counters.dispatch_count() >= 3 && counters.timer_count() >= 3,
> +            Duration::from_secs(5),
> +            Duration::from_millis(10),
> +        )
> +        .await,
> +        "Service should dispatch and timer should fire multiple times within 5 seconds"
> +    );
> +
> +    // Shutdown
> +    shutdown_token.cancel();
> +    let _ = handle.await;
> +}
> +
> +#[tokio::test]
> +async fn test_service_state_consistency_after_reinit() {
> +    let service = MockService::new("consistency_service");
> +    let counters = service.counters();
> +
> +    let mut manager = ServiceManager::new();
> +    manager.add_service(Box::new(service));
> +
> +    let shutdown_token = manager.shutdown_token();
> +    let handle = manager.spawn();
> +
> +    // Wait for initialization
> +    assert!(
> +        wait_for_condition(
> +            || counters.init_count() >= 1,
> +            Duration::from_secs(5),
> +            Duration::from_millis(10),
> +        )
> +        .await,
> +        "Service should initialize within 5 seconds"
> +    );
> +
> +    // Trigger reinitialization
> +    counters.set_reinit(true);
> +
> +    // Wait for reinit
> +    assert!(
> +        wait_for_condition(
> +            || counters.init_count() >= 2,
> +            Duration::from_secs(5),
> +            Duration::from_millis(10),
> +        )
> +        .await,
> +        "Service should reinitialize within 5 seconds"
> +    );
> +
> +    // Clear reinit flag
> +    counters.set_reinit(false);
> +
> +    // Wait for more dispatches
> +    let dispatch_count = counters.dispatch_count();
> +    assert!(
> +        wait_for_condition(
> +            || counters.dispatch_count() > dispatch_count,
> +            Duration::from_secs(2),
> +            Duration::from_millis(10),
> +        )
> +        .await,
> +        "Service should continue dispatching after reinit"
> +    );
> +
> +    // Shutdown
> +    shutdown_token.cancel();
> +    let _ = handle.await;
> +}





^ permalink raw reply	[relevance 5%]

* Re: [pve-devel] [PATCH pve-cluster 11/15] pmxcfs-rs: vendor patched rust-corosync for CPG compatibility
  @ 2026-02-11 12:55  6%   ` Samuel Rufinatscha
  0 siblings, 0 replies; 117+ results
From: Samuel Rufinatscha @ 2026-02-11 12:55 UTC (permalink / raw)
  To: Proxmox VE development discussion, Kefu Chai

comments inline

On 1/7/26 10:16 AM, Kefu Chai wrote:
> Add vendored rust-corosync library with CPG group name fix to support
> optional trailing nuls in group names, ensuring compatibility between
> Rust and C pmxcfs implementations.
> 
> The patch addresses a limitation in CString::new() which doesn't allow
> trailing \0 in its input, while C code uses strlen(name) + 1 for CPG
> group names (including the trailing nul).
> 
> This vendored version will be replaced once the fix is upstreamed and
> a new rust-corosync crate version is published.
> 
> See: vendor/rust-corosync/README.PATCH.md for details

Could you please link the relevant GH issue / PR for more context? Also 
please mention it in the README.PATCH.md.

> ---
>   src/pmxcfs-rs/Cargo.toml                      |    6 +
>   src/pmxcfs-rs/vendor/rust-corosync/Cargo.toml |   33 +
>   .../vendor/rust-corosync/Cargo.toml.orig      |   19 +
>   src/pmxcfs-rs/vendor/rust-corosync/LICENSE    |   21 +
>   .../vendor/rust-corosync/README.PATCH.md      |   36 +
>   src/pmxcfs-rs/vendor/rust-corosync/README.md  |   13 +
>   src/pmxcfs-rs/vendor/rust-corosync/build.rs   |   64 +
>   .../vendor/rust-corosync/regenerate-sys.sh    |   15 +
>   src/pmxcfs-rs/vendor/rust-corosync/src/cfg.rs |  392 ++
>   .../vendor/rust-corosync/src/cmap.rs          |  812 ++++
>   src/pmxcfs-rs/vendor/rust-corosync/src/cpg.rs |  657 ++++
>   src/pmxcfs-rs/vendor/rust-corosync/src/lib.rs |  297 ++
>   .../vendor/rust-corosync/src/quorum.rs        |  337 ++
>   .../vendor/rust-corosync/src/sys/cfg.rs       | 1239 ++++++
>   .../vendor/rust-corosync/src/sys/cmap.rs      | 3323 +++++++++++++++++
>   .../vendor/rust-corosync/src/sys/cpg.rs       | 1310 +++++++
>   .../vendor/rust-corosync/src/sys/mod.rs       |    8 +
>   .../vendor/rust-corosync/src/sys/quorum.rs    |  537 +++
>   .../rust-corosync/src/sys/votequorum.rs       |  574 +++
>   .../vendor/rust-corosync/src/votequorum.rs    |  556 +++
>   20 files changed, 10249 insertions(+)
>   create mode 100644 src/pmxcfs-rs/vendor/rust-corosync/Cargo.toml
>   create mode 100644 src/pmxcfs-rs/vendor/rust-corosync/Cargo.toml.orig
>   create mode 100644 src/pmxcfs-rs/vendor/rust-corosync/LICENSE
>   create mode 100644 src/pmxcfs-rs/vendor/rust-corosync/README.PATCH.md
>   create mode 100644 src/pmxcfs-rs/vendor/rust-corosync/README.md
>   create mode 100644 src/pmxcfs-rs/vendor/rust-corosync/build.rs
>   create mode 100644 src/pmxcfs-rs/vendor/rust-corosync/regenerate-sys.sh
>   create mode 100644 src/pmxcfs-rs/vendor/rust-corosync/src/cfg.rs
>   create mode 100644 src/pmxcfs-rs/vendor/rust-corosync/src/cmap.rs
>   create mode 100644 src/pmxcfs-rs/vendor/rust-corosync/src/cpg.rs
>   create mode 100644 src/pmxcfs-rs/vendor/rust-corosync/src/lib.rs
>   create mode 100644 src/pmxcfs-rs/vendor/rust-corosync/src/quorum.rs
>   create mode 100644 src/pmxcfs-rs/vendor/rust-corosync/src/sys/cfg.rs
>   create mode 100644 src/pmxcfs-rs/vendor/rust-corosync/src/sys/cmap.rs
>   create mode 100644 src/pmxcfs-rs/vendor/rust-corosync/src/sys/cpg.rs
>   create mode 100644 src/pmxcfs-rs/vendor/rust-corosync/src/sys/mod.rs
>   create mode 100644 src/pmxcfs-rs/vendor/rust-corosync/src/sys/quorum.rs
>   create mode 100644 src/pmxcfs-rs/vendor/rust-corosync/src/sys/votequorum.rs
>   create mode 100644 src/pmxcfs-rs/vendor/rust-corosync/src/votequorum.rs
> 
> diff --git a/src/pmxcfs-rs/Cargo.toml b/src/pmxcfs-rs/Cargo.toml
> index 4d18aa93..a178bc27 100644
> --- a/src/pmxcfs-rs/Cargo.toml
> +++ b/src/pmxcfs-rs/Cargo.toml
> @@ -91,3 +91,9 @@ strip = true
>   [profile.dev]
>   opt-level = 1
>   debug = true
> +
> +[patch.crates-io]
> +# Temporary patch for CPG group name length bug
> +# Fixed in corosync upstream (commit 71d6d93c) but not yet released
> +# Remove this patch when rust-corosync > 0.1.0 is published
> +rust-corosync = { path = "vendor/rust-corosync" }
> diff --git a/src/pmxcfs-rs/vendor/rust-corosync/Cargo.toml b/src/pmxcfs-rs/vendor/rust-corosync/Cargo.toml
> new file mode 100644
> index 00000000..f299ca76
> --- /dev/null
> +++ b/src/pmxcfs-rs/vendor/rust-corosync/Cargo.toml
> @@ -0,0 +1,33 @@
> +# THIS FILE IS AUTOMATICALLY GENERATED BY CARGO
> +#
> +# When uploading crates to the registry Cargo will automatically
> +# "normalize" Cargo.toml files for maximal compatibility
> +# with all versions of Cargo and also rewrite `path` dependencies
> +# to registry (e.g., crates.io) dependencies
> +#
> +# If you believe there's an error in this file please file an
> +# issue against the rust-lang/cargo repository. If you're
> +# editing this file be aware that the upstream Cargo.toml
> +# will likely look very different (and much more reasonable)
> +
> +[package]
> +edition = "2018"
> +name = "rust-corosync"
> +version = "0.1.0"

I'm not sure I fully understand: why are we using 0.1.0? Crates.io lists
newer versions:
https://crates.io/crates/rust-corosync/versions

> +authors = ["Christine Caulfield <ccaulfie@redhat.com>"]
> +description = "Rust bindings for corosync libraries"
> +readme = "README.md"
> +keywords = ["cluster", "high-availability"]
> +categories = ["api-bindings"]
> +license = "MIT OR Apache-2.0"
> +repository = "https://github.com/chrissie-c/rust-corosync"
> +[dependencies.bitflags]
> +version = "1.2.1"
> +
> +[dependencies.lazy_static]
> +version = "1.4.0"
> +
> +[dependencies.num_enum]
> +version = "0.5.1"
> +[build-dependencies.pkg-config]
> +version = "0.3"
> diff --git a/src/pmxcfs-rs/vendor/rust-corosync/Cargo.toml.orig b/src/pmxcfs-rs/vendor/rust-corosync/Cargo.toml.orig
> new file mode 100644
> index 00000000..2165c8e9
> --- /dev/null
> +++ b/src/pmxcfs-rs/vendor/rust-corosync/Cargo.toml.orig
> @@ -0,0 +1,19 @@
> +[package]
> +name = "rust-corosync"
> +version = "0.1.0"
> +authors = ["Christine Caulfield <ccaulfie@redhat.com>"]
> +edition = "2018"
> +readme = "README.md"
> +license = "MIT OR Apache-2.0"
> +repository = "https://github.com/chrissie-c/rust-corosync"
> +description = "Rust bindings for corosync libraries"
> +categories = ["api-bindings"]
> +keywords = ["cluster", "high-availability"]
> +
> +[dependencies]
> +lazy_static = "1.4.0"
> +num_enum = "0.5.1"
> +bitflags = "1.2.1"
> +
> +[build-dependencies]
> +pkg-config = "0.3"
> diff --git a/src/pmxcfs-rs/vendor/rust-corosync/LICENSE b/src/pmxcfs-rs/vendor/rust-corosync/LICENSE
> new file mode 100644
> index 00000000..43da7b99
> --- /dev/null
> +++ b/src/pmxcfs-rs/vendor/rust-corosync/LICENSE
> @@ -0,0 +1,21 @@
> +MIT License
> +
> +Copyright (c) 2021 Chrissie Caulfield
> +
> +Permission is hereby granted, free of charge, to any person obtaining a copy
> +of this software and associated documentation files (the "Software"), to deal
> +in the Software without restriction, including without limitation the rights
> +to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
> +copies of the Software, and to permit persons to whom the Software is
> +furnished to do so, subject to the following conditions:
> +
> +The above copyright notice and this permission notice shall be included in all
> +copies or substantial portions of the Software.
> +
> +THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> +IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> +FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
> +AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> +LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
> +OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
> +SOFTWARE.
> diff --git a/src/pmxcfs-rs/vendor/rust-corosync/README.PATCH.md b/src/pmxcfs-rs/vendor/rust-corosync/README.PATCH.md
> new file mode 100644
> index 00000000..c8ba2d6f
> --- /dev/null
> +++ b/src/pmxcfs-rs/vendor/rust-corosync/README.PATCH.md
> @@ -0,0 +1,36 @@
> +# Temporary Vendored rust-corosync v0.1.0
> +
> +This is a temporary vendored copy of `rust-corosync` v0.1.0 with a critical bug fix.
> +
> +## Why Vendored?
> +
> +The published `rust-corosync` v0.1.0 on crates.io has a bug that prevents Rust and C applications from joining the same CPG groups. This bug has been fixed in corosync upstream but not yet released.

Can you please link the commit?

> +
> +## Upstream Fix
> +
> +The fix has been committed to the corosync repository:
> +- Repository: https://github.com/corosync/corosync
> +- Local commit: `~/dev/corosync` commit 71d6d93c

So this is the local commit after applying the fix diff, right?

> +- File: `bindings/rust/src/cpg.rs`
> +- Lines changed: 209-220

Since the fix is quite small, can we please add the diff here?

> +
> +## The Bug
> +
> +CPG group name length calculation was excluding the null terminator:
> +- C code: `length = strlen(name) + 1` (includes \0)
> +- Rust (before): `length = name.len()` (excludes \0)
> +- Rust (after): `length = name.len() + 1` (includes \0)
> +
> +This caused Rust and C nodes to be isolated in separate CPG groups even when using identical group names.
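
To make the off-by-one concrete, here is a minimal sketch of the two length
conventions described above. It is my own illustration using only the
standard library, not code from the vendored crate; the function names are
hypothetical.

```rust
use std::ffi::CString;

/// Length the C side reports for a group name: strlen(name) + 1,
/// i.e. the bytes of the name plus the trailing NUL.
fn len_with_nul(name: &str) -> usize {
    // CString::new() appends the terminator itself and rejects input that
    // already contains a NUL byte, including a trailing one.
    let c_name = CString::new(name).expect("name must not contain a NUL byte");
    c_name.as_bytes_with_nul().len()
}

/// Length the README describes for the unpatched bindings: name.len(),
/// i.e. the bytes of the name without the trailing NUL.
fn len_without_nul(name: &str) -> usize {
    name.len()
}
```

For any name, `len_with_nul(name)` is exactly `len_without_nul(name) + 1`,
which is the one-byte difference the README describes.
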
> +
> +## Removal Plan
> +
> +Once `rust-corosync` v0.1.1+ is published with this fix:
> +
> +1. Remove this `vendor/rust-corosync` directory
> +2. Remove the `[patch.crates-io]` section from `../Cargo.toml`
> +3. Update workspace dependency to `rust-corosync = "0.1.1"`
> +
> +## Testing
> +
> +The fix has been tested with mixed C/Rust pmxcfs clusters and verified that all nodes successfully join the same CPG group and communicate properly.
> diff --git a/src/pmxcfs-rs/vendor/rust-corosync/README.md b/src/pmxcfs-rs/vendor/rust-corosync/README.md
> new file mode 100644
> index 00000000..9c376b8a
> --- /dev/null
> +++ b/src/pmxcfs-rs/vendor/rust-corosync/README.md
> @@ -0,0 +1,13 @@
> +# rust-corosync
> +Rust bindings for corosync
> +
> +This crate covers Rust bindings for the
> +cfg, cmap, cpg, quorum, votequorum
> +libraries in corosync.
> +
> +It is very much in an alpha state at the moment and APIs
> +may well change as and when people start to use them.
> +
> +Please report bugs and offer any suggestions to ccaulfie@redhat.com
> +
> +https://corosync.github.io/corosync/
> diff --git a/src/pmxcfs-rs/vendor/rust-corosync/build.rs b/src/pmxcfs-rs/vendor/rust-corosync/build.rs
> new file mode 100644
> index 00000000..8635b5e4
> --- /dev/null

[..]





^ permalink raw reply	[relevance 6%]

* [PATCH proxmox-backup 0/1] fix #7311: bin: init proxmox_acme_api in proxmox-daily-update
@ 2026-02-12 13:58 15% Samuel Rufinatscha
  2026-02-12 13:58 17% ` [PATCH proxmox-backup 1/1] " Samuel Rufinatscha
  2026-02-12 14:37  6% ` applied: [PATCH proxmox-backup 0/1] " Fabian Grünbichler
  0 siblings, 2 replies; 117+ results
From: Samuel Rufinatscha @ 2026-02-12 13:58 UTC (permalink / raw)
  To: pbs-devel

This patch adds the missing proxmox_acme_api::init() call in
proxmox-daily-update, fixing the regression introduced in
4.1.2-1 where certificate renewal fails [0].
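
For context, a minimal sketch of the general init-once pattern that makes a
missing init() call fatal at runtime. This is an illustration under my own
assumptions, not the actual proxmox_acme_api code: CONFIG_DIR, the
single-argument init() and config_dir() are hypothetical names.

```rust
use std::path::{Path, PathBuf};
use std::sync::OnceLock;

// Hypothetical process-wide setting that must be set before first use.
static CONFIG_DIR: OnceLock<PathBuf> = OnceLock::new();

/// Set the config directory once; a second call is reported as an error.
pub fn init<P: AsRef<Path>>(dir: P) -> Result<(), String> {
    CONFIG_DIR
        .set(dir.as_ref().to_path_buf())
        .map_err(|_| "already initialized".to_string())
}

/// Return the config directory; panics if init() was never called.
pub fn config_dir() -> &'static Path {
    CONFIG_DIR
        .get()
        .expect("init() must be called before config_dir()")
        .as_path()
}
```

With this shape, every binary that reaches config_dir() has to call init()
during startup, which is what this patch adds for proxmox-daily-update.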

Tested by running:

    /usr/lib/x86_64-linux-gnu/proxmox-backup/proxmox-daily-update

which now completes successfully without panicking or hanging.
The command was tested against Pebble [1] for both
HTTP-01 and DNS-01 challenge types.

HTTP-01 Challenge Test

(1) make deb, deployed package
(2) installed Pebble on the same VM:

        cd
        apt update
        apt install -y golang git
        git clone https://github.com/letsencrypt/pebble
        cd pebble
        go build ./cmd/pebble

(3) downloaded and trusted the Pebble cert:

        wget https://raw.githubusercontent.com/letsencrypt/pebble/main/test/certs/pebble.minica.pem
        cp pebble.minica.pem /usr/local/share/ca-certificates/pebble.minica.crt
        update-ca-certificates

(4) set httpPort to 80 in Pebble's config so PBS's standalone plugin
    can handle HTTP-01 validation on port 80:

        nano ./test/config/pebble-config.json

(5) started Pebble:

        ./pebble -config ./test/config/pebble-config.json &

(6) created an ACME account:

        proxmox-backup-manager acme account register default admin@example.com \
            --directory 'https://127.0.0.1:14000/dir'

(7) created a domain (using my host's domain name from /etc/hosts) and ordered
the certificate via proxmox-daily-update.

DNS-01 Challenge Test

Same VM setup as above, additionally:

(1) built and started the challenge test server:

    go build ./cmd/pebble-challtestsrv
    ./pebble-challtestsrv -http01 "" -https01 "" -tlsalpn01 "" \
        -dns01 :8053 -defaultIPv4 127.0.0.1 &

(2) started Pebble with the DNS resolver pointing at the challenge test
    server:

    ./pebble -config ./test/config/pebble-config.json \
        -dnsserver 127.0.0.1:8053 &

(3) created and registered a custom DNS plugin script at
    /usr/share/proxmox-acme/dnsapi/dns_pebble.sh.

(4) created an ACME account, changed the challenge type of the existing domain
to DNS, and ordered the certificate via proxmox-daily-update.

Note: Pebble does not persist account info across restarts. On reboot,
remove the old account from /etc/proxmox-backup/acme/accounts and
create a new one.

*Maintainer notes*
- this fix requires a version bump

[0] https://bugzilla.proxmox.com/show_bug.cgi?id=7311
[1] https://github.com/letsencrypt/pebble

Samuel Rufinatscha (1):
  fix #7311: bin: init proxmox_acme_api in proxmox-daily-update

 src/bin/proxmox-daily-update.rs | 3 +++
 1 file changed, 3 insertions(+)

-- 
2.47.3





^ permalink raw reply	[relevance 15%]

* [PATCH proxmox-backup 1/1] fix #7311: bin: init proxmox_acme_api in proxmox-daily-update
  2026-02-12 13:58 15% [PATCH proxmox-backup 0/1] fix #7311: bin: init proxmox_acme_api in proxmox-daily-update Samuel Rufinatscha
@ 2026-02-12 13:58 17% ` Samuel Rufinatscha
  2026-02-12 14:37  6% ` applied: [PATCH proxmox-backup 0/1] " Fabian Grünbichler
  1 sibling, 0 replies; 117+ results
From: Samuel Rufinatscha @ 2026-02-12 13:58 UTC (permalink / raw)
  To: pbs-devel

The daily-update binary was missing initialization of the ACME config directory,
causing certificate renewal to panic.

Fixes: https://bugzilla.proxmox.com/show_bug.cgi?id=7311
Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
---
 src/bin/proxmox-daily-update.rs | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/src/bin/proxmox-daily-update.rs b/src/bin/proxmox-daily-update.rs
index 224103cc..025eb47f 100644
--- a/src/bin/proxmox-daily-update.rs
+++ b/src/bin/proxmox-daily-update.rs
@@ -6,6 +6,7 @@ use proxmox_router::{cli::*, ApiHandler, RpcEnvironment};
 use proxmox_subscription::SubscriptionStatus;
 use proxmox_sys::fs::CreateOptions;
 
+use pbs_buildcfg::configdir;
 use proxmox_backup::api2;
 
 async fn wait_for_local_worker(upid_str: &str) -> Result<(), Error> {
@@ -104,6 +105,8 @@ async fn run(rpcenv: &mut dyn RpcEnvironment) -> Result<(), Error> {
 
     proxmox_notify::context::set_context(&PBS_CONTEXT);
 
+    proxmox_acme_api::init(configdir!("/acme"), false)?;
+
     do_update(rpcenv).await
 }
 
-- 
2.47.3





^ permalink raw reply related	[relevance 17%]

* applied: [PATCH proxmox-backup 0/1] fix #7311: bin: init proxmox_acme_api in proxmox-daily-update
  2026-02-12 13:58 15% [PATCH proxmox-backup 0/1] fix #7311: bin: init proxmox_acme_api in proxmox-daily-update Samuel Rufinatscha
  2026-02-12 13:58 17% ` [PATCH proxmox-backup 1/1] " Samuel Rufinatscha
@ 2026-02-12 14:37  6% ` Fabian Grünbichler
  1 sibling, 0 replies; 117+ results
From: Fabian Grünbichler @ 2026-02-12 14:37 UTC (permalink / raw)
  To: pbs-devel, Samuel Rufinatscha


On Thu, 12 Feb 2026 14:58:28 +0100, Samuel Rufinatscha wrote:
> This patch adds the missing proxmox_acme_api::init() call in
> proxmox-daily-update, fixing the regression introduced in
> 4.1.2-1 where certificate renewal fails [0].
> 
> Tested by running:
> 
>     /usr/lib/x86_64-linux-gnu/proxmox-backup/proxmox-daily-update
> 
> [...]

Applied, thanks!

[1/1] fix #7311: bin: init proxmox_acme_api in proxmox-daily-update
      commit: ec54e5cd87f7c41c3776deb3164dea0d5347e153

Best regards,
-- 
Fabian Grünbichler <f.gruenbichler@proxmox.com>




^ permalink raw reply	[relevance 6%]

* Re: [pve-devel] [PATCH pve-cluster 09/15] pmxcfs-rs: add pmxcfs-ipc crate
  @ 2026-02-12 15:21  5%   ` Samuel Rufinatscha
  0 siblings, 0 replies; 117+ results
From: Samuel Rufinatscha @ 2026-02-12 15:21 UTC (permalink / raw)
  To: Proxmox VE development discussion, Kefu Chai

Thanks for this IPC implementation, Kefu :)

Quite a comprehensive patch.
Already a good step in the right direction.

Comments inlined.

On 1/7/26 10:16 AM, Kefu Chai wrote:
> Add libqb-compatible IPC server implementation:
> - QB_IPC_SHM protocol (shared memory ring buffers)
> - Abstract Unix socket (@pve2) for handshake
> - Lock-free SPSC ring buffers
> - Authentication via SO_PASSCRED (uid/gid/pid)
> - 13 IPC operations (GET_FS_VERSION, GET_CLUSTER_INFO, etc.)
> 
> This is an independent crate with no internal dependencies,
> only requiring tokio, nix, and memmap2. It provides wire-
> compatible IPC with the C implementation's libqb-based server,
> allowing existing clients to work unchanged.
> 
> Includes wire protocol compatibility tests (require root to run).

IMO the commit message is a bit hard to read (mostly because of the
bullet points). Could you please reword it so that it's easier to
follow?

This also applies to the other patches. Please also revisit the READMEs,
as noted in the last patch.

> 
> Signed-off-by: Kefu Chai <k.chai@proxmox.com>
> ---
>   src/pmxcfs-rs/Cargo.toml                      |    1 +
>   src/pmxcfs-rs/pmxcfs-ipc/Cargo.toml           |   44 +
>   src/pmxcfs-rs/pmxcfs-ipc/README.md            |  182 +++
>   .../pmxcfs-ipc/examples/test_server.rs        |   92 ++
>   src/pmxcfs-rs/pmxcfs-ipc/src/connection.rs    |  657 ++++++++++
>   src/pmxcfs-rs/pmxcfs-ipc/src/handler.rs       |   93 ++
>   src/pmxcfs-rs/pmxcfs-ipc/src/lib.rs           |   37 +
>   src/pmxcfs-rs/pmxcfs-ipc/src/protocol.rs      |  332 +++++
>   src/pmxcfs-rs/pmxcfs-ipc/src/ringbuffer.rs    | 1158 +++++++++++++++++
>   src/pmxcfs-rs/pmxcfs-ipc/src/server.rs        |  278 ++++
>   src/pmxcfs-rs/pmxcfs-ipc/src/socket.rs        |   84 ++
>   src/pmxcfs-rs/pmxcfs-ipc/tests/auth_test.rs   |  450 +++++++
>   .../pmxcfs-ipc/tests/qb_wire_compat.rs        |  413 ++++++
>   13 files changed, 3821 insertions(+)
>   create mode 100644 src/pmxcfs-rs/pmxcfs-ipc/Cargo.toml
>   create mode 100644 src/pmxcfs-rs/pmxcfs-ipc/README.md
>   create mode 100644 src/pmxcfs-rs/pmxcfs-ipc/examples/test_server.rs
>   create mode 100644 src/pmxcfs-rs/pmxcfs-ipc/src/connection.rs
>   create mode 100644 src/pmxcfs-rs/pmxcfs-ipc/src/handler.rs
>   create mode 100644 src/pmxcfs-rs/pmxcfs-ipc/src/lib.rs
>   create mode 100644 src/pmxcfs-rs/pmxcfs-ipc/src/protocol.rs
>   create mode 100644 src/pmxcfs-rs/pmxcfs-ipc/src/ringbuffer.rs
>   create mode 100644 src/pmxcfs-rs/pmxcfs-ipc/src/server.rs
>   create mode 100644 src/pmxcfs-rs/pmxcfs-ipc/src/socket.rs
>   create mode 100644 src/pmxcfs-rs/pmxcfs-ipc/tests/auth_test.rs
>   create mode 100644 src/pmxcfs-rs/pmxcfs-ipc/tests/qb_wire_compat.rs
> 
> diff --git a/src/pmxcfs-rs/Cargo.toml b/src/pmxcfs-rs/Cargo.toml
> index b00ca68f..f4497d58 100644
> --- a/src/pmxcfs-rs/Cargo.toml
> +++ b/src/pmxcfs-rs/Cargo.toml
> @@ -9,6 +9,7 @@ members = [
>       "pmxcfs-status",     # Status monitoring and RRD data management
>       "pmxcfs-test-utils", # Test utilities and helpers (dev-only)
>       "pmxcfs-services",   # Service framework for automatic retry and lifecycle management
> +    "pmxcfs-ipc",        # libqb-compatible IPC server
>   ]
>   resolver = "2"
>   
> diff --git a/src/pmxcfs-rs/pmxcfs-ipc/Cargo.toml b/src/pmxcfs-rs/pmxcfs-ipc/Cargo.toml
> new file mode 100644
> index 00000000..dbee2e9a
> --- /dev/null
> +++ b/src/pmxcfs-rs/pmxcfs-ipc/Cargo.toml
> @@ -0,0 +1,44 @@
> +[package]
> +name = "pmxcfs-ipc"
> +description = "libqb-compatible IPC server implementation in pure Rust"
> +
> +version.workspace = true
> +edition.workspace = true
> +authors.workspace = true
> +license.workspace = true
> +repository.workspace = true
> +
> +[lints]
> +workspace = true
> +
> +# System dependencies:
> +# - libqb (runtime) - QB IPC library for client compatibility
> +# - libqb-dev (build/test only) - Required to run wire protocol tests
> +
> +[dependencies]
> +# Error handling
> +anyhow.workspace = true
> +
> +# Async runtime
> +tokio.workspace = true
> +tokio-util.workspace = true
> +
> +# Concurrency primitives
> +parking_lot.workspace = true
> +
> +# System integration
> +libc.workspace = true
> +nix.workspace = true
> +memmap2 = "0.9"
> +
> +# Logging
> +tracing.workspace = true
> +
> +# Async trait support
> +async-trait.workspace = true
> +
> +[dev-dependencies]
> +pmxcfs-test-utils = { path = "../pmxcfs-test-utils" }
> +tempfile.workspace = true
> +tokio = { workspace = true, features = ["rt", "macros"] }
> +tracing-subscriber.workspace = true
> diff --git a/src/pmxcfs-rs/pmxcfs-ipc/README.md b/src/pmxcfs-rs/pmxcfs-ipc/README.md
> new file mode 100644
> index 00000000..5b5b98ae
> --- /dev/null
> +++ b/src/pmxcfs-rs/pmxcfs-ipc/README.md
> @@ -0,0 +1,182 @@
> +# pmxcfs-ipc: libqb-Compatible IPC Server
> +
> +**Rust implementation of libqb IPC server for pmxcfs using shared memory ring buffers**
> +
> +This crate provides a wire-compatible IPC server that works with libqb clients (C `qb_ipcc_*` API) without depending on the libqb C library.
> +
> +## Table of Contents
> +
> +- [Overview](#overview)
> +- [Architecture](#architecture)
> +- [Protocol Implementation](#protocol-implementation)
> +- [Usage](#usage)
> +- [Testing](#testing)
> +- [References](#references)

nit: not all READMEs have a table of contents.
For consistency I think we should either have it everywhere (if it 
helps), or simply drop it.

> +
> +---
> +
> +## Overview
> +
> +pmxcfs uses libqb for IPC communication between the daemon and client tools (`pvecm`, `pvenode`, etc.). This crate implements a server using QB_IPC_SHM (shared memory ring buffers) that is wire-compatible with libqb clients, enabling the Rust pmxcfs implementation to communicate with existing C-based tools.
> +
> +**Key Features**:
> +- Wire-compatible with libqb clients
> +- QB_IPC_SHM transport (shared memory ring buffers)
> +- Async I/O via tokio
> +- Lock-free SPSC ring buffers
> +- Supports authentication via uid/gid
> +- Per-connection context (uid, gid, pid, read-only flag)
> +- Connection statistics tracking
> +- Abstract Unix sockets for setup handshake (Linux-specific)
> +
> +---
> +
> +## Architecture
> +
> +### Transport: QB_IPC_SHM (Shared Memory Ring Buffers)
> +
> +**Rust pmxcfs uses**: `QB_IPC_SHM` (shared memory ring buffers)
> +
> +We implemented shared memory transport using lock-free SPSC (single-producer single-consumer) ring buffers. This provides:
> +
> +- **Wire compatibility**: Same handshake protocol as libqb
> +- **Async I/O**: Integration with tokio ecosystem
> +
> +**Ring Buffer Design**:
> +- Each connection has 3 ring buffers:
> +  1. **Request ring**: Client writes, server reads
> +  2. **Response ring**: Server writes, client reads
> +  3. **Event ring**: Server writes, client reads (for async notifications)
> +- Ring buffers stored in `/dev/shm` (Linux shared memory)
> +- Chunk-based protocol matching libqb
> +
> +### Server Structure
> +
> +### Connection Statistics
> +
> +Tracks statistics for C compatibility (matching `qb_ipcs_stats`).
> +
> +---
> +
> +## Protocol Implementation
> +
> +### Connection Handshake
> +
> +Server creates an abstract Unix socket `@pve2` (@ prefix indicates abstract namespace) for initial connection setup.
> +
> +### Request/Response Communication
> +
> +After handshake, communication happens via shared memory ring buffers using libqb-compatible chunk format.
> +
> +### Wire Format Structures
> +
> +All structures use `#[repr(C, align(8))]` to match C's alignment requirements.
> +
> +Error codes must be negative errno values (e.g., `-EPERM`, `-EINVAL`) to match libqb convention.
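
A small example in the README might help here. Below is a minimal sketch of
what these two rules look like together. The struct is my own illustration
and not the actual libqb layout: ExampleResponseHeader and its fields are
hypothetical, and the errno values are the Linux ones, hard-coded to keep
the sketch dependency-free.

```rust
use std::mem::size_of;

// Linux errno values (assumption: the target is Linux, as for pmxcfs).
const EPERM: i32 = 1;
const EINVAL: i32 = 22;

/// Hypothetical response header, used only to show the layout attributes.
#[repr(C, align(8))]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct ExampleResponseHeader {
    id: i32,
    size: i32,
    error: i32,
}

impl ExampleResponseHeader {
    /// Successful response: error is 0, size covers header plus payload.
    fn ok(id: i32, payload_len: usize) -> Self {
        let size = (size_of::<Self>() + payload_len) as i32;
        Self { id, size, error: 0 }
    }

    /// Failed response: the error is always stored as a negative errno
    /// (e.g. -EPERM or -EINVAL), whichever sign the caller passed in.
    fn err(id: i32, errno: i32) -> Self {
        let size = size_of::<Self>() as i32;
        Self { id, size, error: -errno.abs() }
    }
}
```

Note that `align(8)` pads the three `i32` fields (12 bytes) to a 16-byte
struct, so the Rust size only matches the C side if the C struct is padded
the same way.
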
> +
> +---
> +
> +## Testing
> +
> +Requires Corosync running for integration tests. See `tests/` directory for C client FFI compatibility tests.
> +
> +## Implementation Status
> +
> +### Implemented
> +
> +- Connection handshake (SOCK_STREAM setup socket)
> +- Authentication via SO_PASSCRED (uid/gid/pid)
> +- QB_IPC_SHM transport (shared memory ring buffers)
> +- Lock-free SPSC ring buffers
> +- Async I/O via tokio
> +- Abstract Unix sockets for setup handshake
> +- Message header parsing (request/response)
> +- Error code propagation (negative errno)
> +- Ring buffer file management (creation/cleanup)
> +- Event channel ring buffers (created, not actively used)
> +- Connection statistics tracking
> +- Disconnect detection
> +- Read-only flag based on gid
> +
> +### Not Implemented
> +
> +- Event channel message sending (pmxcfs doesn't use events yet)
> +
> +## Application-Level IPC Operations
> +
> +### Operation Summary
> +
> +The following IPC operations are supported (defined in pmxcfs):
> +
> +| Operation | Request Data | Response Data | Description |
> +|-----------|-------------|---------------|-------------|
> +| GET_FS_VERSION | Empty | uint32_t version | Get filesystem version number |
> +| GET_CLUSTER_INFO | Empty | JSON string | Get cluster information |
> +| GET_GUEST_LIST | Empty | JSON array | Get list of all VMs/containers |
> +| SET_STATUS | name + data | Empty | Set status key-value pair |
> +| GET_STATUS | name | Binary data | Get status value by name |
> +| GET_CONFIG | name | File contents | Read configuration file |
> +| LOG_CLUSTER_MSG | priority + msg | Empty | Add cluster log entry |
> +| GET_CLUSTER_LOG | max_entries | JSON array | Get cluster log entries |
> +| GET_RRD_DUMP | Empty | RRD dump text | Get all RRD data |
> +| GET_GUEST_CONFIG_PROPERTY | vmid + key | String value | Get single VM config property |
> +| GET_GUEST_CONFIG_PROPERTIES | vmid | JSON object | Get all VM config properties |
> +| VERIFY_TOKEN | userid + token | Boolean | Verify API token validity |
> +
> +### Common Clients
> +
> +The following Proxmox components use the IPC interface:
> +
> +- **pvestatd**: Updates node/VM/storage metrics (SET_STATUS, GET_STATUS)
> +- **pve-ha-crm**: HA cluster resource manager (GET_CLUSTER_INFO, GET_GUEST_LIST)
> +- **pve-ha-lrm**: HA local resource manager (GET_CONFIG, LOG_CLUSTER_MSG)
> +- **pvecm**: Cluster management CLI (GET_CLUSTER_INFO, GET_CLUSTER_LOG)
> +- **pvedaemon**: PVE API daemon (All query operations)
> +
> +### Permission Model
> +
> +**Write Operations** (require root):
> +- SET_STATUS
> +- LOG_CLUSTER_MSG
> +
> +**Read Operations** (any authenticated user):
> +- All GET_* operations
> +- VERIFY_TOKEN
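
The permission model could be illustrated with a short example. Here is a
minimal sketch of the check as I read it from the two lists above. The names
are stand-ins of my own (this Permissions enum only mirrors the one used in
examples/test_server.rs; Op and check_permission are hypothetical), so
please treat it as an assumption rather than the crate's API.

```rust
// Linux errno value for "operation not permitted".
const EPERM: i32 = 1;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Permissions {
    ReadOnly,
    ReadWrite,
}

/// A subset of the operations from the table above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Op {
    GetFsVersion,
    GetStatus,
    SetStatus,
    LogClusterMsg,
    VerifyToken,
}

impl Op {
    /// Only SET_STATUS and LOG_CLUSTER_MSG are listed as write operations.
    fn is_write(self) -> bool {
        matches!(self, Op::SetStatus | Op::LogClusterMsg)
    }
}

/// Reject write operations on read-only connections with a negative errno.
fn check_permission(op: Op, perms: Permissions) -> Result<(), i32> {
    if op.is_write() && perms == Permissions::ReadOnly {
        Err(-EPERM)
    } else {
        Ok(())
    }
}
```

With this shape, a read-only connection can still use the GET_* operations
and VERIFY_TOKEN, and only the two write operations are refused.
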
> +
> +---
> +
> +## References
> +
> +### libqb Source
> +
> +Reference implementation of QB IPC protocol (available at https://github.com/ClusterLabs/libqb):
> +
> +- `libqb/lib/ringbuffer.c` - Ring buffer implementation
> +- `libqb/lib/ipc_shm.c` - Shared memory transport
> +- `libqb/lib/ipc_setup.c` - Connection setup/handshake
> +- `libqb/include/qb/qbipc_common.h` - Wire protocol structures
> +
> +### C pmxcfs (pve-cluster)
> +
> +- `src/pmxcfs/server.c` - C IPC server using libqb
> +- `src/pmxcfs/cfs-ipc-ops.h` - pmxcfs IPC operation codes
> +
> +### Related Documentation
> +
> +- `../C_COMPATIBILITY.md` - General C compatibility notes (if exists)
> +
> +---
> +
> +## Notes
> +
> +### Ring Buffer Naming Convention
> +
> +Ring buffer files are created in `/dev/shm` with names based on connection descriptor and ring type (request/response/event).
> +
> +### Error Handling
> +
> +Always use **negative errno values** for errors to maintain compatibility with libqb clients.
> +
> +### Alignment and Padding
> +
> +All wire format structures must use `#[repr(C, align(8))]` to ensure 8-byte alignment matching C's requirements.
> diff --git a/src/pmxcfs-rs/pmxcfs-ipc/examples/test_server.rs b/src/pmxcfs-rs/pmxcfs-ipc/examples/test_server.rs
> new file mode 100644
> index 00000000..6b9695ce
> --- /dev/null
> +++ b/src/pmxcfs-rs/pmxcfs-ipc/examples/test_server.rs
> @@ -0,0 +1,92 @@
> +//! Simple test server for debugging libqb connectivity
> +
> +use async_trait::async_trait;
> +use pmxcfs_ipc::{Handler, Permissions, Request, Response, Server};
> +
> +/// Example handler implementation
> +struct TestHandler;
> +
> +#[async_trait]
> +impl Handler for TestHandler {
> +    fn authenticate(&self, uid: u32, gid: u32) -> Option<Permissions> {
> +        // Accept root with read-write access
> +        if uid == 0 {
> +            eprintln!("Authenticated uid={uid}, gid={gid} as root (read-write)");
> +            return Some(Permissions::ReadWrite);
> +        }
> +
> +        // Accept all other users with read-only access for testing
> +        eprintln!("Authenticated uid={uid}, gid={gid} as regular user (read-only)");
> +        Some(Permissions::ReadOnly)
> +    }
> +
> +    async fn handle(&self, request: Request) -> Response {
> +        eprintln!(
> +            "Received request: id={}, data_len={}, conn={}, uid={}, gid={}, pid={}, read_only={}",
> +            request.msg_id,
> +            request.data.len(),
> +            request.conn_id,
> +            request.uid,
> +            request.gid,
> +            request.pid,
> +            request.is_read_only
> +        );
> +
> +        match request.msg_id {
> +            1 => {
> +                // CFS_IPC_GET_FS_VERSION
> +                let response_str = r#"{"version":1,"protocol":1}"#;
> +                eprintln!("Responding with: {response_str}");
> +                Response::ok(response_str.as_bytes().to_vec())
> +            }
> +            2 => {
> +                // CFS_IPC_GET_CLUSTER_INFO
> +                let response_str = r#"{"nodes":["node1","node2"],"quorate":true}"#;
> +                eprintln!("Responding with: {response_str}");
> +                Response::ok(response_str.as_bytes().to_vec())
> +            }
> +            3 => {
> +                // CFS_IPC_GET_GUEST_LIST
> +                let response_str = r#"{"data":[{"vmid":100}]}"#;
> +                eprintln!("Responding with: {response_str}");
> +                Response::ok(response_str.as_bytes().to_vec())
> +            }
> +            _ => {
> +                eprintln!("Unknown message id: {}", request.msg_id);
> +                Response::err(-libc::EINVAL)
> +            }
> +        }
> +    }
> +}
> +
> +#[tokio::main]
> +async fn main() {
> +    // Initialize tracing
> +    tracing_subscriber::fmt()
> +        .with_max_level(tracing::Level::DEBUG)
> +        .with_target(true)
> +        .init();
> +
> +    println!("Starting QB IPC test server on 'pve2'...");
> +
> +    // Create handler and server
> +    let handler = TestHandler;
> +    let mut server = Server::new("pve2", handler);
> +
> +    println!("Server created, starting...");
> +
> +    if let Err(e) = server.start() {
> +        eprintln!("Failed to start server: {e}");
> +        std::process::exit(1);
> +    }
> +
> +    println!("Server started successfully!");
> +    println!("Waiting for connections...");
> +
> +    // Keep server running
> +    tokio::signal::ctrl_c()
> +        .await
> +        .expect("Failed to wait for Ctrl-C");
> +
> +    println!("Shutting down...");
> +}
> diff --git a/src/pmxcfs-rs/pmxcfs-ipc/src/connection.rs b/src/pmxcfs-rs/pmxcfs-ipc/src/connection.rs
> new file mode 100644
> index 00000000..d6d77e6c
> --- /dev/null
> +++ b/src/pmxcfs-rs/pmxcfs-ipc/src/connection.rs
> @@ -0,0 +1,657 @@
> +/// Per-connection handling for libqb IPC with shared memory ring buffers
> +///
> +/// This module contains all connection-specific logic including connection
> +/// establishment, authentication, request handling, and shared memory ring buffer management.
> +use anyhow::{Context, Result};
> +use std::os::unix::io::AsRawFd;
> +use std::path::PathBuf;
> +use std::sync::Arc;
> +use tokio::io::{AsyncReadExt, AsyncWriteExt};
> +use tokio::net::UnixStream;
> +use tokio_util::sync::CancellationToken;
> +
> +use super::handler::{Handler, Permissions};
> +use super::protocol::*;
> +use super::ringbuffer::{FlowControl, RingBuffer};
> +
> +/// Per-connection state using shared memory ring buffers
> +///
> +/// Uses SHM transport (shared memory ring buffers).
> +#[allow(dead_code)] // Fields are intentionally stored for lifecycle management
> +pub(super) struct QbConnection {
> +    /// Connection ID for logging and debugging
> +    conn_id: u64,
> +
> +    /// Client process ID (from SO_PEERCRED)
> +    pid: u32,
> +
> +    /// Client user ID (from SO_PEERCRED)
> +    uid: u32,
> +
> +    /// Client group ID (from SO_PEERCRED)
> +    gid: u32,
> +
> +    /// Whether this connection has read-only access (determined by Handler::authenticate)
> +    pub(super) read_only: bool,
> +
> +    /// Setup socket (kept open for disconnect detection)
> +    _setup_stream: UnixStream,
> +
> +    /// Ring buffers for shared memory IPC
> +    /// Request ring: client writes, server reads
> +    request_rb: Option<RingBuffer>,
> +    /// Response ring: server writes, client reads
> +    response_rb: Option<RingBuffer>,
> +    /// Event ring: server writes, client reads (for async notifications)
> +    /// NOTE: The existing PVE/IPCC.xs Perl client only uses qb_ipcc_sendv_recv()
> +    /// and never calls qb_ipcc_event_recv(), so this ring buffer is created
> +    /// for libqb compatibility but remains unused in practice.
> +    _event_rb: Option<RingBuffer>,
> +
> +    /// Paths to ring buffer data files (for debugging/cleanup)
> +    pub(super) ring_buffer_paths: Vec<PathBuf>,
> +
> +    /// Task handle for request handler (auto-aborted on drop)
> +    pub(super) task_handle: Option<tokio::task::JoinHandle<()>>,

The comment says "auto-aborted on drop", which is not correct. Dropping 
a tokio JoinHandle detaches the task; it keeps running in the 
background. Please abort it explicitly.

Also, the "task handles will be aborted when drop" comment in the server 
stop() is wrong for the same reason, please revisit.
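
To illustrate what I mean by explicit abort, here is a minimal std-only 
sketch of a drop guard. `Abortable`, `AbortOnDrop` and `FakeHandle` are 
made-up names for this mail, not part of the patch or of tokio; in the 
real code the guard would wrap `tokio::task::JoinHandle<()>` and call its 
`abort()` method, or `QbConnection` could simply implement `Drop` and 
abort `task_handle` there.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

/// Anything that can be cancelled explicitly. Hypothetical trait;
/// tokio's `JoinHandle::abort(&self)` has this shape.
trait Abortable {
    fn abort(&self);
}

/// Hypothetical guard: aborts the wrapped task when it goes out of scope.
struct AbortOnDrop<H: Abortable>(Option<H>);

impl<H: Abortable> Drop for AbortOnDrop<H> {
    fn drop(&mut self) {
        // Without this call, dropping the handle would only detach the task.
        if let Some(handle) = self.0.take() {
            handle.abort();
        }
    }
}

/// Stand-in for a task handle, only here to keep the sketch std-only.
struct FakeHandle {
    aborted: Arc<AtomicBool>,
}

impl Abortable for FakeHandle {
    fn abort(&self) {
        self.aborted.store(true, Ordering::SeqCst);
    }
}

/// Returns true if dropping the guard triggered the abort.
fn drop_guard_aborts() -> bool {
    let flag = Arc::new(AtomicBool::new(false));
    let guard = AbortOnDrop(Some(FakeHandle { aborted: flag.clone() }));
    drop(guard);
    flag.load(Ordering::SeqCst)
}
```

If I remember correctly, recent tokio-util versions also ship a ready-made 
wrapper for this, so it may not need to be hand-written.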

> +}
> +
> +impl QbConnection {
> +    /// Accept a new connection from the setup socket
> +    ///
> +    /// Performs authentication, creates ring buffers, spawns request handler task,
> +    /// and returns the connection object.
> +    pub(super) async fn accept(
> +        mut stream: UnixStream,
> +        conn_id: u64,
> +        service_name: &str,
> +        handler: Arc<dyn Handler>,
> +        cancellation_token: CancellationToken,
> +    ) -> Result<Self> {
> +        // Read connection request
> +        let fd = stream.as_raw_fd();
> +        let mut req_bytes = vec![0u8; std::mem::size_of::<ConnectionRequest>()];
> +        stream
> +            .read_exact(&mut req_bytes)
> +            .await
> +            .context("Failed to read connection request")?;
> +
> +        tracing::debug!(
> +            "Connection request raw bytes ({} bytes): {:02x?}",
> +            req_bytes.len(),
> +            req_bytes
> +        );
> +
> +        let req =
> +            unsafe { std::ptr::read_unaligned(req_bytes.as_ptr() as *const ConnectionRequest) };

Can we please validate the handshake / the sent data?
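
A rough sketch of the kind of check I have in mind, right after parsing 
the request. Everything here is an assumption on my side: I assume the 
header `id` and `size` are `i32`, that `expected_id` would be 
`MSG_AUTHENTICATE` and `expected_size` would be 
`size_of::<ConnectionRequest>()`. `HandshakeError` and 
`validate_handshake` are illustrative names only.

```rust
/// Reasons to reject a connection request (illustrative names).
#[derive(Debug, PartialEq)]
enum HandshakeError {
    /// The header id is not the expected authenticate message id.
    UnexpectedMsgId(i32),
    /// The size field does not match the structure we just read.
    SizeMismatch { got: i32, expected: usize },
}

/// Check the header fields of a freshly parsed connection request.
///
/// `expected_id` and `expected_size` are passed in so the sketch does not
/// have to guess the real protocol constants.
fn validate_handshake(
    id: i32,
    size: i32,
    expected_id: i32,
    expected_size: usize,
) -> Result<(), HandshakeError> {
    if id != expected_id {
        return Err(HandshakeError::UnexpectedMsgId(id));
    }
    // Reject negative sizes before casting, then require an exact match.
    if size < 0 || size as usize != expected_size {
        return Err(HandshakeError::SizeMismatch {
            got: size,
            expected: expected_size,
        });
    }
    Ok(())
}
```

On failure the server could answer with a negative errno through 
send_connection_response(), the same way the authentication failure path 
already does.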

> +
> +        tracing::debug!(
> +            "Connection request: id={}, size={}, max_msg_size={}",
> +            *req.hdr.id,
> +            *req.hdr.size,
> +            req.max_msg_size
> +        );
> +
> +        // Get peer credentials (SO_PEERCRED on Linux)
> +        let (uid, gid, pid) = get_peer_credentials(fd)?;
> +
> +        // Authenticate using Handler trait
> +        let read_only = match handler.authenticate(uid, gid) {
> +            Some(Permissions::ReadWrite) => {
> +                tracing::info!(pid, uid, gid, "Connection accepted with read-write access");
> +                false
> +            }
> +            Some(Permissions::ReadOnly) => {
> +                tracing::info!(pid, uid, gid, "Connection accepted with read-only access");
> +                true
> +            }
> +            None => {
> +                tracing::warn!(
> +                    pid,
> +                    uid,
> +                    gid,
> +                    "Connection rejected by authentication policy"
> +                );
> +                send_connection_response(&mut stream, -libc::EPERM, conn_id, 0, "", "", "").await?;
> +                anyhow::bail!("Connection authentication failed");
> +            }
> +        };
> +
> +        // Create connection descriptor for ring buffer naming
> +        let conn_desc = format!("{}-{}-{}", std::process::id(), pid, conn_id);
> +        let max_msg_size = req.max_msg_size.max(8192);

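`.max(8192)` only enforces a lower bound, so a client can still request 
up to u32::MAX and all three rings would be sized from it. Please clamp 
this to a reasonable server-side maximum.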
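
For example, something like the following. The lower bound of 8192 is the 
one from the patch; the upper bound is a placeholder I picked for the 
sketch, so please choose a value that matches what the existing clients 
actually request. `negotiated_msg_size` is a hypothetical helper name.

```rust
/// Lower bound already used in the patch.
const MIN_MSG_SIZE: u32 = 8192;
/// Placeholder upper bound (assumption, not taken from the C code).
const MAX_MSG_SIZE: u32 = 8 * 1024 * 1024;

/// Bound the client-requested message size on both sides.
fn negotiated_msg_size(requested: u32) -> u32 {
    requested.clamp(MIN_MSG_SIZE, MAX_MSG_SIZE)
}
```

With that, a request of 0 still yields 8192, while a request of u32::MAX 
is cut down to the server limit instead of driving the ring allocation.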

> +
> +        // Create ring buffers in /dev/shm
> +        // Pass max_msg_size directly - RingBuffer::new() will add QB_RB_CHUNK_MARGIN and round up
> +        // (just like qb_rb_open() does on the client side)
> +        let ring_size = max_msg_size as usize;
> +
> +        tracing::debug!(
> +            "Creating ring buffers for connection {}: size={} bytes",
> +            conn_id,
> +            ring_size
> +        );
> +
> +        // Request ring: client writes, server reads
> +        // Request ring needs sizeof(int32_t) for flow control (shared_user_data)
> +        let request_rb_name = format!("{conn_desc}-{service_name}-request");
> +        let request_rb = RingBuffer::new(
> +            "/dev/shm",
> +            &request_rb_name,
> +            ring_size,
> +            std::mem::size_of::<i32>(),
> +        )
> +        .context("Failed to create request ring buffer")?;
> +
> +        // Response ring: server writes, client reads
> +        // Response ring doesn't need shared_user_data
> +        let response_rb_name = format!("{conn_desc}-{service_name}-response");
> +        let response_rb = RingBuffer::new("/dev/shm", &response_rb_name, ring_size, 0)
> +            .context("Failed to create response ring buffer")?;
> +
> +        // Event ring: server writes, client reads (for async notifications)
> +        // Event ring doesn't need shared_user_data
> +        let event_rb_name = format!("{conn_desc}-{service_name}-event");
> +        let event_rb = RingBuffer::new("/dev/shm", &event_rb_name, ring_size, 0)
> +            .context("Failed to create event ring buffer")?;
> +
> +        // Collect full paths for cleanup tracking
> +        let request_data_path = PathBuf::from(format!("/dev/shm/qb-{request_rb_name}-data"));
> +        let response_data_path = PathBuf::from(format!("/dev/shm/qb-{response_rb_name}-data"));
> +        let event_data_path = PathBuf::from(format!("/dev/shm/qb-{event_rb_name}-data"));

The request, response and event header files (`qb-<name>-header`) should 
be tracked too, not only the `-data` files.
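
A small sketch of how both files per ring could be collected in one 
place. It relies on the naming described in the comment further down in 
this hunk ("/dev/shm/qb-" prefix, "-header" or "-data" suffix); 
`ring_buffer_files` is a hypothetical helper name.

```rust
use std::path::PathBuf;

/// Build the header and data file paths for each ring buffer base name.
fn ring_buffer_files(base_names: &[&str]) -> Vec<PathBuf> {
    let mut paths = Vec::new();
    for name in base_names {
        // Each ring is backed by two files, so track both of them.
        for suffix in ["header", "data"].iter() {
            paths.push(PathBuf::from(format!("/dev/shm/qb-{}-{}", name, suffix)));
        }
    }
    paths
}
```

Called with the three base names (request, response, event), this yields 
six paths for `ring_buffer_paths` instead of three.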

> +
> +        // Send connection response with ring buffer BASE NAMES (not full paths)
> +        // libqb client expects base names (e.g., "123-456-1-pve2-request")
> +        // It will internally prepend "/dev/shm/qb-" and append "-header" or "-data"
> +        send_connection_response(
> +            &mut stream,
> +            0,
> +            conn_id,
> +            max_msg_size,
> +            &request_rb_name,
> +            &response_rb_name,
> +            &event_rb_name,
> +        )
> +        .await?;
> +
> +        // Spawn request handler task
> +        let handler_for_task = handler.clone();
> +        let cancellation_for_task = cancellation_token.child_token();
> +
> +        let task_handle = tokio::spawn(async move {
> +            Self::handle_requests(
> +                request_rb,
> +                response_rb,
> +                handler_for_task,
> +                cancellation_for_task,
> +                conn_id,
> +                uid,
> +                gid,
> +                pid,
> +                read_only,
> +            )
> +            .await;
> +        });
> +
> +        tracing::info!("Connection {} established (SHM transport)", conn_id);
> +
> +        Ok(Self {
> +            conn_id,
> +            pid,
> +            uid,
> +            gid,
> +            read_only,
> +            _setup_stream: stream,
> +            request_rb: None,  // Moved to task
> +            response_rb: None, // Moved to task
> +            _event_rb: Some(event_rb),
> +            ring_buffer_paths: vec![request_data_path, response_data_path, event_data_path],
> +            task_handle: Some(task_handle),
> +        })
> +    }
> +
> +    /// Request handler loop - receives and processes messages via ring buffers
> +    ///
> +    /// Runs in a background async task, receiving requests and sending responses
> +    /// through shared memory ring buffers.
> +    ///
> +    /// Uses tokio channels to implement a workqueue with flow control:
> +    /// - FlowControl::OK: Proceed with sending
> +    /// - FlowControl::SLOW_DOWN: Reduce send rate
> +    /// - FlowControl::STOP: Do not send
> +    ///
> +    /// Architecture: Three concurrent tasks communicating via tokio channels:
> +    /// 1. Request receiver: reads from request ring buffer, queues work
> +    /// 2. Worker: processes requests from work queue, sends to response queue
> +    /// 3. Response sender: writes responses from response queue to response ring buffer
> +    #[allow(clippy::too_many_arguments)]
> +    async fn handle_requests(
> +        mut request_rb: RingBuffer,
> +        mut response_rb: RingBuffer,
> +        handler: Arc<dyn Handler>,
> +        cancellation_token: CancellationToken,
> +        conn_id: u64,
> +        uid: u32,
> +        gid: u32,
> +        pid: u32,
> +        read_only: bool,
> +    ) {
> +        tracing::debug!("Request handler started for connection {}", conn_id);
> +
> +        // Workqueue capacity and flow control thresholds
> +        //
> +        // NOTE: The C implementation (using libqb) processes requests synchronously
> +        // in the event loop callback (server.c:159 s1_msg_process_fn), so there's
> +        // no explicit queue. We add async queueing in Rust to allow non-blocking
> +        // request handling with tokio.
> +        //
> +        // Queue capacity of 8 is chosen as a reasonable default for:
> +        // - Typical PVE workloads: Most IPC operations are fast (file reads/writes)
> +        // - Memory efficiency: Each queued item = ~1KB (request header + data)
> +        // - Backpressure: Small queue encourages flow control to activate quickly
> +        // - Testing: Flow control test (02-flow-control.sh) verifies 20 concurrent
> +        //   operations work correctly with capacity 8
> +        //
> +        // Flow control thresholds match libqb's rate limiting (ipcs.c:199-203):
> +        // - FlowControl::OK (0): Proceed with sending (QB_IPCS_RATE_NORMAL)
> +        // - FlowControl::SLOW_DOWN (1): Reduce send rate (QB_IPCS_RATE_OFF)
> +        // - FlowControl::STOP (2): Do not send (QB_IPCS_RATE_OFF_2)
> +        const MAX_PENDING_REQUESTS: usize = 8;
> +
> +        // Set SLOW_DOWN when queue reaches 75% capacity (6/8 items)
> +        // This provides early warning before the queue fills completely,
> +        // allowing clients to throttle before hitting STOP
> +        const FC_WARNING_THRESHOLD: usize = 6;
> +
> +        // Work queue: (header, request) -> worker
> +        let (work_tx, mut work_rx) =
> +            tokio::sync::mpsc::channel::<(RequestHeader, Request)>(MAX_PENDING_REQUESTS);
> +
> +        // Response queue: worker -> response sender
> +        // Unbounded because responses must not block the worker
> +        let (response_tx, mut response_rx) =
> +            tokio::sync::mpsc::unbounded_channel::<(RequestHeader, Response)>();

If the response ring buffer fills up (slow/stuck client), responses
queue up in memory without limit and can OOM the daemon. This queue
should be bounded, like the work queue already is.
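
To show the behaviour I would expect, here is a std-only sketch using 
`std::sync::mpsc::sync_channel` as a stand-in for a bounded tokio 
channel. The capacity of 8 is an assumption that simply mirrors 
MAX_PENDING_REQUESTS, and `queue_responses` is a made-up helper. The 
receiver is kept alive but never drained, which models a stuck client.

```rust
use std::sync::mpsc::{sync_channel, TrySendError};

/// Assumed capacity, mirroring MAX_PENDING_REQUESTS from the patch.
const MAX_PENDING_RESPONSES: usize = 8;

/// Try to queue `n` responses while nobody drains the queue.
/// Returns (accepted, rejected).
fn queue_responses(n: usize) -> (usize, usize) {
    // `_rx` stays alive for the whole function but is never read from.
    let (tx, _rx) = sync_channel::<Vec<u8>>(MAX_PENDING_RESPONSES);
    let mut accepted = 0;
    let mut rejected = 0;
    for i in 0..n {
        match tx.try_send(vec![i as u8]) {
            Ok(()) => accepted += 1,
            // Bounded queue is full: memory use stops growing here.
            Err(TrySendError::Full(_)) => rejected += 1,
            Err(TrySendError::Disconnected(_)) => break,
        }
    }
    (accepted, rejected)
}
```

In the tokio version the worker would probably use `send().await` on a 
bounded `tokio::sync::mpsc::channel`, so it waits instead of dropping 
responses. The EAGAIN path in the receiver loop also sends on this queue 
and would need the same treatment.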

> +
> +        // Spawn worker task to process requests
> +        let worker_handler = handler.clone();
> +        let worker_response_tx = response_tx.clone();
> +        let worker_task = tokio::spawn(async move {
> +            while let Some((header, request)) = work_rx.recv().await {
> +                let handler_response = worker_handler.handle(request).await;
> +                // Send to response queue (unbounded, never blocks)
> +                let _ = worker_response_tx.send((header, handler_response));
> +            }
> +        });
> +
> +        // Spawn response sender task
> +        let response_task = tokio::spawn(async move {
> +            while let Some((header, handler_response)) = response_rx.recv().await {
> +                Self::send_response(&mut response_rb, header, handler_response).await;
> +            }
> +        });
> +
> +        // Main request receiver loop
> +        loop {
> +            // Wait for incoming request (async, yields to tokio scheduler)
> +            let request_data = tokio::select! {
> +                _ = cancellation_token.cancelled() => {
> +                    tracing::debug!("Request handler cancelled for connection {}", conn_id);
> +                    break;
> +                }
> +                result = request_rb.recv() => {
> +                    match result {
> +                        Ok(data) => data,
> +                        Err(e) => {
> +                            tracing::error!("Error receiving request on conn {}: {}", conn_id, e);
> +                            break;
> +                        }
> +                    }
> +                }
> +            };
> +
> +            // After receiving from ring buffer, flow control is already set to 0
> +            // by RingBufferShared::read_chunk()
> +
> +            // Parse request header
> +            if request_data.len() < std::mem::size_of::<RequestHeader>() {
> +                tracing::warn!(
> +                    "Request too small: {} bytes (need {} for header)",
> +                    request_data.len(),
> +                    std::mem::size_of::<RequestHeader>()
> +                );
> +                continue;
> +            }
> +
> +            let header =
> +                unsafe { std::ptr::read_unaligned(request_data.as_ptr() as *const RequestHeader) };
> +
> +            tracing::debug!(
> +                "Received request on conn {}: id={}, size={}",
> +                conn_id,
> +                *header.id,
> +                *header.size
> +            );
> +
> +            // Extract message data (after header)
> +            let header_size = std::mem::size_of::<RequestHeader>();
> +            let msg_data = &request_data[header_size..];
> +
> +            // Build request object with full context
> +            let request = Request {
> +                msg_id: *header.id,
> +                data: msg_data.to_vec(),
> +                is_read_only: read_only,
> +                conn_id,
> +                uid,
> +                gid,
> +                pid,
> +            };
> +
> +            // Send to workqueue - implements backpressure via flow control
> +            match work_tx.try_send((header, request)) {
> +                Ok(()) => {
> +                    // Request queued successfully
> +
> +                    // Update flow control based on queue depth
> +                    // This matches libqb's rate limiting behavior
> +                    let queue_len = MAX_PENDING_REQUESTS - work_tx.capacity();
> +                    let fc_value = if queue_len >= MAX_PENDING_REQUESTS {
> +                        FlowControl::STOP // Queue full - stop sending
> +                    } else if queue_len >= FC_WARNING_THRESHOLD {
> +                        FlowControl::SLOW_DOWN // Queue approaching full - slow down
> +                    } else {
> +                        FlowControl::OK // Queue has space - OK to send
> +                    };
> +
> +                    if fc_value > FlowControl::OK {
> +                        tracing::debug!(
> +                            "Setting flow control to {} (queue: {}/{})",
> +                            fc_value,
> +                            queue_len,
> +                            MAX_PENDING_REQUESTS
> +                        );
> +                    }
> +                    request_rb.flow_control.set(fc_value);
> +                }
> +                Err(tokio::sync::mpsc::error::TrySendError::Full(_)) => {
> +                    // Queue is full - set flow control to STOP and send EAGAIN
> +                    tracing::warn!("Work queue full on conn {}, sending EAGAIN", conn_id);
> +                    request_rb.flow_control.set(FlowControl::STOP);
> +
> +                    let error_response = Response {
> +                        error_code: -libc::EAGAIN,
> +                        data: Vec::new(),
> +                    };
> +                    // Send error response directly (bypassing queue)
> +                    let _ = response_tx.send((header, error_response));
> +                }
> +                Err(tokio::sync::mpsc::error::TrySendError::Closed(_)) => {
> +                    tracing::error!("Work queue closed on conn {}", conn_id);
> +                    break;
> +                }
> +            }
> +        }
> +
> +        // Cleanup: drop channels to signal tasks to exit
> +        drop(work_tx);
> +        drop(response_tx);
> +        let _ = worker_task.await;
> +        let _ = response_task.await;
> +
> +        tracing::debug!("Request handler finished for connection {}", conn_id);
> +    }
> +
> +    /// Send a response to the client
> +    async fn send_response(
> +        response_rb: &mut RingBuffer,
> +        header: RequestHeader,
> +        handler_response: Response,
> +    ) {
> +        // Build and serialize response: [header][data]
> +        let response_size = std::mem::size_of::<ResponseHeader>() + handler_response.data.len();
> +        let mut response_bytes = Vec::with_capacity(response_size);
> +
> +        let response_header = ResponseHeader {
> +            id: header.id,
> +            size: (response_size as i32).into(),
> +            error: handler_response.error_code.into(),
> +        };
> +
> +        response_bytes.extend_from_slice(unsafe {
> +            std::slice::from_raw_parts(
> +                &response_header as *const _ as *const u8,
> +                std::mem::size_of::<ResponseHeader>(),
> +            )
> +        });
> +        response_bytes.extend_from_slice(&handler_response.data);
> +
> +        tracing::debug!("Response header bytes (24): {:02x?}", &response_bytes[..24]);
> +
> +        // Send response (async, yields if buffer full)
> +        match response_rb.send(&response_bytes).await {
> +            Ok(()) => {
> +                // Response sent successfully
> +            }
> +            Err(e) => {
> +                tracing::error!("Failed to send response: {}", e);
> +            }
> +        }
> +    }
> +}
> +
> +/// Get peer credentials from Unix socket
> +fn get_peer_credentials(fd: i32) -> Result<(u32, u32, u32)> {
> +    #[cfg(target_os = "linux")]
> +    {
> +        let mut ucred: libc::ucred = unsafe { std::mem::zeroed() };
> +        let mut ucred_size = std::mem::size_of::<libc::ucred>() as libc::socklen_t;
> +
> +        let res = unsafe {
> +            libc::getsockopt(
> +                fd,
> +                libc::SOL_SOCKET,
> +                libc::SO_PEERCRED,
> +                &mut ucred as *mut _ as *mut libc::c_void,
> +                &mut ucred_size,
> +            )
> +        };
> +
> +        if res != 0 {
> +            anyhow::bail!(
> +                "getsockopt SO_PEERCRED failed: {}",
> +                std::io::Error::last_os_error()
> +            );
> +        }
> +
> +        Ok((ucred.uid, ucred.gid, ucred.pid as u32))
> +    }
> +
> +    #[cfg(not(target_os = "linux"))]
> +    {
> +        anyhow::bail!("Peer credentials not supported on this platform");
> +    }
> +}
> +
> +/// Send connection response to client
> +async fn send_connection_response(
> +    stream: &mut UnixStream,
> +    error: i32,
> +    conn_id: u64,
> +    max_msg_size: u32,
> +    request_path: &str,
> +    response_path: &str,
> +    event_path: &str,
> +) -> Result<()> {
> +    let mut response = ConnectionResponse {
> +        hdr: ResponseHeader {
> +            id: MSG_AUTHENTICATE.into(),
> +            size: (std::mem::size_of::<ConnectionResponse>() as i32).into(),
> +            error: error.into(),
> +        },
> +        connection_type: CONNECTION_TYPE_SHM, // Shared memory transport
> +        max_msg_size,
> +        connection: conn_id as usize,
> +        request: [0u8; PATH_MAX],
> +        response: [0u8; PATH_MAX],
> +        event: [0u8; PATH_MAX],
> +    };
> +
> +    // Helper to copy path strings into fixed-size buffers
> +    let copy_path = |dest: &mut [u8; PATH_MAX], src: &str| {
> +        if !src.is_empty() {
> +            let len = src.len().min(PATH_MAX - 1);
> +            dest[..len].copy_from_slice(&src.as_bytes()[..len]);
> +            tracing::debug!("Connection response path: '{}'", src);
> +        }
> +    };
> +
> +    copy_path(&mut response.request, request_path);
> +    copy_path(&mut response.response, response_path);
> +    copy_path(&mut response.event, event_path);
> +
> +    // Serialize and send
> +    let response_bytes = unsafe {
> +        std::slice::from_raw_parts(
> +            &response as *const _ as *const u8,
> +            std::mem::size_of::<ConnectionResponse>(),
> +        )
> +    };
> +
> +    stream
> +        .write_all(response_bytes)
> +        .await
> +        .context("Failed to send connection response")?;
> +
> +    tracing::debug!(
> +        "Sent connection response: error={}, connection_type=SHM",
> +        error
> +    );
> +
> +    Ok(())
> +}
> +
> +#[cfg(test)]
> +mod tests {
> +    use super::*;
> +
> +    #[test]
> +    fn test_malformed_request_size_validation() {
> +        // This test verifies the size validation logic for malformed requests
> +        // The actual validation happens in handle_requests() at line 247-254
> +
> +        let header_size = std::mem::size_of::<RequestHeader>();
> +        assert_eq!(header_size, 16, "RequestHeader should be 16 bytes");
> +
> +        // Test case 1: Request too small (would be rejected)
> +        let too_small_data = [0x01, 0x02, 0x03]; // Only 3 bytes
> +        assert!(
> +            too_small_data.len() < header_size,
> +            "Malformed request with {} bytes should be less than header size {}",
> +            too_small_data.len(),
> +            header_size
> +        );
> +
> +        // Test case 2: More realistic too-small cases
> +        let test_cases = vec![
> +            (vec![0u8; 0], 0),   // Empty request
> +            (vec![0u8; 1], 1),   // 1 byte
> +            (vec![0u8; 8], 8),   // 8 bytes (half header)
> +            (vec![0u8; 15], 15), // 15 bytes (just short of header)
> +        ];
> +
> +        for (data, expected_len) in test_cases {
> +            assert_eq!(data.len(), expected_len);
> +            assert!(
> +                data.len() < header_size,
> +                "Request with {} bytes should be rejected (need {})",
> +                data.len(),
> +                header_size
> +            );
> +        }
> +
> +        // Test case 3: Valid size requests (would pass size check)
> +        let valid_cases = vec![
> +            vec![0u8; 16],   // Exact header size
> +            vec![0u8; 32],   // Header + data
> +            vec![0u8; 1024], // Large request
> +        ];
> +
> +        for data in valid_cases {
> +            assert!(
> +                data.len() >= header_size,
> +                "Request with {} bytes should pass size check",
> +                data.len()
> +            );
> +        }
> +    }
> +
> +    #[test]
> +    fn test_malformed_header_structure() {
> +        // This test verifies that the header structure is correctly defined
> +        // and that we can safely parse various header patterns
> +
> +        let header_size = std::mem::size_of::<RequestHeader>();
> +
> +        // Create a valid-sized buffer with various patterns
> +        let patterns = vec![
> +            vec![0x00; header_size], // All zeros
> +            vec![0xFF; header_size], // All ones
> +            vec![0xAA; header_size], // Alternating pattern
> +        ];
> +
> +        for pattern in patterns {
> +            assert_eq!(pattern.len(), header_size);
> +
> +            // Parse header (same unsafe code as in handle_requests:256-258)
> +            let header =
> +                unsafe { std::ptr::read_unaligned(pattern.as_ptr() as *const RequestHeader) };
> +
> +            // The parsing should not crash, regardless of values
> +            // The actual values don't matter for this safety test
> +            let _id = *header.id;
> +            let _size = *header.size;
> +        }
> +    }
> +
> +    #[test]
> +    fn test_request_header_alignment() {
> +        // Verify that RequestHeader can be read with read_unaligned
> +        // This is important because data from ring buffers may not be aligned
> +
> +        let header_size = std::mem::size_of::<RequestHeader>();
> +
> +        // Create misaligned buffer (offset by 1 byte to test unaligned access)
> +        let mut buffer = vec![0u8; header_size + 1];
> +        buffer[1..].fill(0x42);
> +
> +        // Read from misaligned offset (this is what read_unaligned is for)
> +        let header =
> +            unsafe { std::ptr::read_unaligned(&buffer[1] as *const u8 as *const RequestHeader) };
> +
> +        // Should successfully read without crashing
> +        let _id = *header.id;
> +        let _size = *header.size;
> +    }
> +
> +    #[test]
> +    fn test_connection_request_structure() {
> +        // Verify ConnectionRequest structure for connection setup
> +
> +        let conn_req_size = std::mem::size_of::<ConnectionRequest>();
> +
> +        // ConnectionRequest should be properly sized
> +        assert!(
> +            conn_req_size > std::mem::size_of::<RequestHeader>(),
> +            "ConnectionRequest should include header plus additional fields"
> +        );
> +
> +        // Test that we can parse a zero-filled connection request
> +        let data = vec![0u8; conn_req_size];
> +        let conn_req =
> +            unsafe { std::ptr::read_unaligned(data.as_ptr() as *const ConnectionRequest) };
> +
> +        // Should not crash when accessing fields
> +        let _id = *conn_req.hdr.id;
> +        let _size = *conn_req.hdr.size;
> +        let _max_msg_size = conn_req.max_msg_size;
> +    }
> +}
> diff --git a/src/pmxcfs-rs/pmxcfs-ipc/src/handler.rs b/src/pmxcfs-rs/pmxcfs-ipc/src/handler.rs
> new file mode 100644
> index 00000000..12b40cd4
> --- /dev/null
> +++ b/src/pmxcfs-rs/pmxcfs-ipc/src/handler.rs
> @@ -0,0 +1,93 @@
> +//! Handler trait for processing IPC requests
> +//!
> +//! This module defines the core `Handler` trait that users implement to process
> +//! IPC requests. The trait-based approach provides a more idiomatic and extensible
> +//! API compared to raw function closures.
> +
> +use crate::protocol::{Request, Response};
> +use async_trait::async_trait;
> +
> +/// Permissions for IPC connections
> +///
> +/// Determines the access level for authenticated connections.
> +#[derive(Debug, Clone, Copy, PartialEq, Eq)]
> +pub enum Permissions {
> +    /// Read-only access
> +    ReadOnly,
> +    /// Read-write access
> +    ReadWrite,
> +}
> +
> +/// Handler trait for processing IPC requests and authentication
> +///
> +/// Implement this trait to define custom request handling logic and authentication
> +/// policy for your IPC server. The handler receives a `Request` containing the
> +/// message ID, payload data, and connection context, and returns a `Response` with
> +/// an error code and response data.
> +///
> +/// ## Authentication
> +///
> +/// The `authenticate` method is called during connection setup to determine whether
> +/// a client with given credentials should be accepted. This allows the handler to
> +/// implement application-specific authentication policies.
> +///
> +/// ## Async Support
> +///
> +/// The `handle` method is async, allowing you to perform I/O operations, database
> +/// queries, or other async work within your handler.
> +///
> +/// ## Thread Safety
> +///
> +/// Handlers must be `Send + Sync` as they may be called from multiple tokio tasks
> +/// concurrently. Use `Arc<Mutex<T>>` or other synchronization primitives if you need
> +/// mutable shared state.
> +///
> +/// ## Error Handling
> +///
> +/// Return negative errno values in `Response::error_code` to indicate errors.
> +/// Use 0 for success. See `libc::*` constants for standard errno values.
> +#[async_trait]
> +pub trait Handler: Send + Sync {
> +    /// Authenticate a connecting client and determine access level
> +    ///
> +    /// Called during connection setup to determine whether to accept the connection
> +    /// and what access level to grant.
> +    ///
> +    /// # Arguments
> +    ///
> +    /// * `uid` - Client user ID (from SO_PEERCRED)
> +    /// * `gid` - Client group ID (from SO_PEERCRED)
> +    ///
> +    /// # Returns
> +    ///
> +    /// - `Some(Permissions::ReadWrite)` to accept with read-write access
> +    /// - `Some(Permissions::ReadOnly)` to accept with read-only access
> +    /// - `None` to reject the connection
> +    fn authenticate(&self, uid: u32, gid: u32) -> Option<Permissions>;
> +
> +    /// Handle an IPC request
> +    ///
> +    /// # Arguments
> +    ///
> +    /// * `request` - The incoming request with message ID, data, and connection context
> +    ///
> +    /// # Returns
> +    ///
> +    /// A `Response` containing the error code (0 = success, negative = errno) and
> +    /// optional response data to send back to the client.
> +    async fn handle(&self, request: Request) -> Response;
> +}
> +
> +/// Blanket implementation for Arc<T> where T: Handler
> +///
> +/// This allows passing `Arc<MyHandler>` directly to `Server::new()`.
> +#[async_trait]
> +impl<T: Handler> Handler for std::sync::Arc<T> {
> +    fn authenticate(&self, uid: u32, gid: u32) -> Option<Permissions> {
> +        (**self).authenticate(uid, gid)
> +    }
> +
> +    async fn handle(&self, request: Request) -> Response {
> +        (**self).handle(request).await
> +    }
> +}
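
For illustration, an `authenticate` policy could look roughly like the
sketch below. It is std-only, uses a local copy of `Permissions`, and is
written as a free function rather than a trait impl. The IDs (uid 0 for
root, gid 33 for www-data) are assumptions borrowed from the test data in
protocol.rs, not the policy this series implements.

```rust
/// Local stand-in for the `Permissions` enum from the patch.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Permissions {
    ReadOnly,
    ReadWrite,
}

/// Hypothetical IDs, assumed for this sketch only.
const ROOT_UID: u32 = 0;
const WWW_DATA_GID: u32 = 33;

/// Hypothetical policy: root gets read-write access, members of the
/// assumed `www-data` group get read-only access, everyone else is
/// rejected.
pub fn authenticate(uid: u32, gid: u32) -> Option<Permissions> {
    if uid == ROOT_UID {
        Some(Permissions::ReadWrite)
    } else if gid == WWW_DATA_GID {
        Some(Permissions::ReadOnly)
    } else {
        None
    }
}
```
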
> diff --git a/src/pmxcfs-rs/pmxcfs-ipc/src/lib.rs b/src/pmxcfs-rs/pmxcfs-ipc/src/lib.rs
> new file mode 100644
> index 00000000..923c359e
> --- /dev/null
> +++ b/src/pmxcfs-rs/pmxcfs-ipc/src/lib.rs
> @@ -0,0 +1,37 @@
> +/// libqb-compatible IPC server implementation in pure Rust
> +///
> +/// This crate implements a minimal libqb IPC server that is wire-compatible
> +/// with libqb clients (qb_ipcc_*), without depending on the libqb C library.
> +///
> +/// ## Protocol Overview
> +///
> +/// 1. **Connection Handshake** (SOCK_STREAM):
> +///    - Server listens on `/var/run/{service_name}`
> +///    - Client connects and sends `qb_ipc_connection_request`
> +///    - Server authenticates (uid/gid), creates per-connection datagram sockets
> +///    - Server sends `qb_ipc_connection_response` with socket paths
> +///
> +/// 2. **Request/Response** (SOCK_DGRAM):
> +///    - Client sends requests on datagram socket

The actual implementation uses SHM ring buffers, not datagram sockets.
Please revisit this documentation.

> +///    - Server receives, processes, and sends responses
> +///
> +/// ## Module Structure
> +///
> +/// - `protocol` - Wire protocol structures and constants
> +/// - `socket` - Abstract Unix socket utilities
> +/// - `connection` - Per-connection handling and request processing
> +/// - `server` - Main IPC server and connection acceptance
> +///
> +/// References:
> +/// - libqb source: ~/dev/libqb/lib/ipc_socket.c, ipc_setup.c
> +mod connection;
> +mod handler;
> +mod protocol;
> +mod ringbuffer;
> +mod server;
> +mod socket;
> +
> +// Public API
> +pub use handler::{Handler, Permissions};
> +pub use protocol::{Request, Response};
> +pub use server::Server;
> diff --git a/src/pmxcfs-rs/pmxcfs-ipc/src/protocol.rs b/src/pmxcfs-rs/pmxcfs-ipc/src/protocol.rs
> new file mode 100644
> index 00000000..469099f2
> --- /dev/null
> +++ b/src/pmxcfs-rs/pmxcfs-ipc/src/protocol.rs
> @@ -0,0 +1,332 @@
> +//! libqb wire protocol structures and constants
> +//!
> +//! This module contains the low-level protocol definitions for libqb IPC communication.
> +//! All structures must match the C counterparts exactly for binary compatibility.
> +
> +/// Message ID for authentication requests (matches libqb's QB_IPC_MSG_AUTHENTICATE)
> +pub(super) const MSG_AUTHENTICATE: i32 = 1;
> +
> +/// Connection type for shared memory transport (matches libqb's QB_IPC_SHM)
> +pub(super) const CONNECTION_TYPE_SHM: u32 = 1;
> +
> +/// Maximum path length - used in connection response
> +pub(super) const PATH_MAX: usize = 4096;
> +
> +/// Wrapper for i32 that aligns to 8-byte boundary with explicit padding
> +///
> +/// Simulates C's `__attribute__ ((aligned(8)))` on individual i32 fields.
> +/// This is used to match libqb's per-field alignment behavior.
> +///
> +/// Memory layout:
> +/// - Bytes 0-3: i32 value
> +/// - Bytes 4-7: zero padding
> +/// - Total: 8 bytes
> +#[repr(C, align(8))]
> +#[derive(Debug, Copy, Clone, PartialEq, Eq)]
> +pub struct Align8 {
> +    pub value: i32,
> +    _pad: u32, // 4 bytes padding for i32 -> 8 bytes total
> +}
> +
> +impl Align8 {
> +    #[inline]
> +    pub const fn new(value: i32) -> Self {
> +        Align8 { value, _pad: 0 }
> +    }
> +}
> +
> +impl std::ops::Deref for Align8 {
> +    type Target = i32;
> +
> +    #[inline]
> +    fn deref(&self) -> &i32 {
> +        &self.value
> +    }
> +}
> +
> +impl std::ops::DerefMut for Align8 {
> +    #[inline]
> +    fn deref_mut(&mut self) -> &mut i32 {
> +        &mut self.value
> +    }
> +}
> +
> +impl From<i32> for Align8 {
> +    #[inline]
> +    fn from(value: i32) -> Self {
> +        Align8::new(value)
> +    }
> +}
> +
> +impl Default for Align8 {
> +    #[inline]
> +    fn default() -> Self {
> +        Align8::new(0)
> +    }
> +}
> +
> +/// Request header (matches libqb's qb_ipc_request_header)
> +///
> +/// Each field is 8-byte aligned to match C's __attribute__ ((aligned(8)))
> +#[repr(C, align(8))]
> +#[derive(Debug, Copy, Clone)]
> +pub struct RequestHeader {
> +    pub id: Align8,
> +    pub size: Align8,
> +}
> +
> +/// Response header (matches libqb's qb_ipc_response_header)
> +#[repr(C, align(8))]
> +#[derive(Debug, Copy, Clone)]
> +pub struct ResponseHeader {
> +    pub id: Align8,
> +    pub size: Align8,
> +    pub error: Align8,
> +}
> +
> +/// Connection request sent by client during handshake (matches libqb's qb_ipc_connection_request)
> +#[repr(C)]
> +#[derive(Debug, Copy, Clone)]
> +pub(super) struct ConnectionRequest {
> +    pub hdr: RequestHeader,
> +    pub max_msg_size: u32,
> +}
> +
> +/// Connection response sent by server during handshake (matches libqb's qb_ipc_connection_response)
> +#[repr(C, align(8))]
> +#[derive(Debug)]
> +pub(super) struct ConnectionResponse {
> +    pub hdr: ResponseHeader,
> +    pub connection_type: u32,
> +    pub max_msg_size: u32,
> +    pub connection: usize,
> +    pub request: [u8; PATH_MAX],
> +    pub response: [u8; PATH_MAX],
> +    pub event: [u8; PATH_MAX],
> +}
> +
> +/// Request passed to handlers
> +///
> +/// Contains all information about an IPC request including the message ID,
> +/// payload data, and connection context (uid, gid, pid, permissions).
> +#[derive(Debug, Clone)]
> +pub struct Request {
> +    /// Message ID identifying the operation (application-defined)
> +    pub msg_id: i32,
> +
> +    /// Request payload data
> +    pub data: Vec<u8>,
> +
> +    /// Whether this connection has read-only access
> +    pub is_read_only: bool,
> +
> +    /// Connection ID (for logging/debugging)
> +    pub conn_id: u64,
> +
> +    /// Client user ID (from SO_PEERCRED)
> +    pub uid: u32,
> +
> +    /// Client group ID (from SO_PEERCRED)
> +    pub gid: u32,
> +
> +    /// Client process ID (from SO_PEERCRED)
> +    pub pid: u32,
> +}
> +
> +/// Response from handlers
> +///
> +/// Contains the error code and response data to send back to the client.
> +#[derive(Debug, Clone)]
> +pub struct Response {
> +    /// Error code (0 = success, negative = errno)
> +    pub error_code: i32,
> +
> +    /// Response payload data
> +    pub data: Vec<u8>,
> +}
> +
> +impl Response {
> +    /// Create a successful response with data
> +    pub fn ok(data: Vec<u8>) -> Self {
> +        Self {
> +            error_code: 0,
> +            data,
> +        }
> +    }
> +
> +    /// Create an error response with errno
> +    pub fn err(error_code: i32) -> Self {
> +        Self {
> +            error_code,
> +            data: Vec::new(),
> +        }
> +    }
> +
> +    /// Create an error response with errno and optional data
> +    pub fn with_error(error_code: i32, data: Vec<u8>) -> Self {
> +        Self { error_code, data }
> +    }
> +}
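
A small helper could make the negative-errno convention harder to get
wrong on the handler side. The sketch below is std-only; `errno_code` is
a hypothetical name, and falling back to EIO (5 on Linux) when the error
carries no OS code is an assumption of mine.

```rust
use std::io;

/// Fallback when the error carries no OS error code (EIO on Linux).
const EIO: i32 = 5;

/// Convert an `io::Error` into the negative-errno convention documented
/// for `Response::error_code` (0 = success, negative = errno).
pub fn errno_code(err: &io::Error) -> i32 {
    -err.raw_os_error().unwrap_or(EIO)
}
```

A handler could then build its reply with `Response::err(errno_code(&e))`.
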
> +
> +#[cfg(test)]
> +mod tests {
> +    use super::*;
> +
> +    #[test]
> +    fn test_header_sizes() {
> +        assert_eq!(std::mem::size_of::<RequestHeader>(), 16);
> +        assert_eq!(std::mem::align_of::<RequestHeader>(), 8);
> +        assert_eq!(std::mem::size_of::<ResponseHeader>(), 24);
> +        assert_eq!(std::mem::align_of::<ResponseHeader>(), 8);
> +        assert_eq!(std::mem::size_of::<ConnectionRequest>(), 24); // 16 (header) + 4 (max_msg_size) + 4 (padding)
> +
> +        println!(
> +            "ConnectionResponse size: {}",
> +            std::mem::size_of::<ConnectionResponse>()
> +        );
> +        println!(
> +            "ConnectionResponse align: {}",
> +            std::mem::align_of::<ConnectionResponse>()
> +        );
> +        println!("PATH_MAX: {PATH_MAX}");
> +
> +        // C expects: 24 (header) + 4 (connection_type) + 4 (max_msg_size) + 8 (connection pointer) + 3*4096 (paths) = 12328
> +        assert_eq!(std::mem::size_of::<ConnectionResponse>(), 12328);
> +    }
> +
> +    // ===== Align8 Tests =====
> +
> +    #[test]
> +    fn test_align8_size_and_alignment() {
> +        // Verify Align8 is exactly 8 bytes
> +        assert_eq!(std::mem::size_of::<Align8>(), 8);
> +        assert_eq!(std::mem::align_of::<Align8>(), 8);
> +    }
> +
> +    #[test]
> +    fn test_align8_creation_and_value_access() {
> +        let a = Align8::new(42);
> +        assert_eq!(a.value, 42);
> +        assert_eq!(*a, 42); // Test Deref
> +    }
> +
> +    #[test]
> +    fn test_align8_from_i32() {
> +        let a: Align8 = (-100).into();
> +        assert_eq!(a.value, -100);
> +    }
> +
> +    #[test]
> +    fn test_align8_default() {
> +        let a = Align8::default();
> +        assert_eq!(a.value, 0);
> +    }
> +
> +    #[test]
> +    fn test_align8_deref_mut() {
> +        let mut a = Align8::new(10);
> +        *a = 20; // Test DerefMut
> +        assert_eq!(a.value, 20);
> +    }
> +
> +    #[test]
> +    fn test_align8_padding_is_zero() {
> +        let a = Align8::new(123);
> +        // Padding should always be 0
> +        assert_eq!(a._pad, 0);
> +    }
> +
> +    // ===== Response Tests =====
> +
> +    #[test]
> +    fn test_response_ok_creation() {
> +        let data = b"test data".to_vec();
> +        let resp = Response::ok(data.clone());
> +
> +        assert_eq!(resp.error_code, 0);
> +        assert_eq!(resp.data, data);
> +    }
> +
> +    #[test]
> +    fn test_response_err_creation() {
> +        let resp = Response::err(-5); // ERRNO like EIO
> +
> +        assert_eq!(resp.error_code, -5);
> +        assert!(resp.data.is_empty());
> +    }
> +
> +    #[test]
> +    fn test_response_with_error_and_data() {
> +        let data = b"error details".to_vec();
> +        let resp = Response::with_error(-22, data.clone()); // EINVAL
> +
> +        assert_eq!(resp.error_code, -22);
> +        assert_eq!(resp.data, data);
> +    }
> +
> +    #[test]
> +    fn test_response_error_codes() {
> +        // Test various errno values
> +        let test_cases = vec![
> +            (0, "success"),
> +            (-1, "EPERM"),
> +            (-2, "ENOENT"),
> +            (-13, "EACCES"),
> +            (-22, "EINVAL"),
> +        ];
> +
> +        for (code, _name) in test_cases {
> +            let resp = Response::err(code);
> +            assert_eq!(resp.error_code, code);
> +        }
> +    }
> +
> +    // ===== Request Tests =====
> +
> +    #[test]
> +    fn test_request_creation() {
> +        let req = Request {
> +            msg_id: 100,
> +            data: b"payload".to_vec(),
> +            is_read_only: false,
> +            conn_id: 12345,
> +            uid: 0,
> +            gid: 0,
> +            pid: 999,
> +        };
> +
> +        assert_eq!(req.msg_id, 100);
> +        assert_eq!(req.data, b"payload");
> +        assert!(!req.is_read_only);
> +        assert_eq!(req.conn_id, 12345);
> +        assert_eq!(req.uid, 0);
> +        assert_eq!(req.gid, 0);
> +        assert_eq!(req.pid, 999);
> +    }
> +
> +    #[test]
> +    fn test_request_read_only_flag() {
> +        let req_ro = Request {
> +            msg_id: 1,
> +            data: vec![],
> +            is_read_only: true,
> +            conn_id: 1,
> +            uid: 33,
> +            gid: 33,
> +            pid: 1000,
> +        };
> +
> +        let req_rw = Request {
> +            msg_id: 1,
> +            data: vec![],
> +            is_read_only: false,
> +            conn_id: 2,
> +            uid: 0,
> +            gid: 0,
> +            pid: 1001,
> +        };
> +
> +        assert!(req_ro.is_read_only);
> +        assert!(!req_rw.is_read_only);
> +    }
> +}
> diff --git a/src/pmxcfs-rs/pmxcfs-ipc/src/ringbuffer.rs b/src/pmxcfs-rs/pmxcfs-ipc/src/ringbuffer.rs
> new file mode 100644
> index 00000000..96dd192b
> --- /dev/null
> +++ b/src/pmxcfs-rs/pmxcfs-ipc/src/ringbuffer.rs
> @@ -0,0 +1,1158 @@
> +/// Lock-free ring buffer implementation compatible with libqb's shared memory IPC
> +///
> +/// This module implements a SPSC (single-producer single-consumer) ring buffer
> +/// using shared memory, matching libqb's wire protocol and memory layout.
> +///
> +/// ## Design
> +///
> +/// - **Shared Memory**: Two mmap'd files (header + data) in /dev/shm
> +/// - **Lock-Free**: Uses atomic operations for read_pt/write_pt synchronization
> +/// - **Chunk-Based**: Messages stored as [size][magic][data] chunks
> +/// - **Wire-Compatible**: Matches libqb's qb_ringbuffer_shared_s layout
> +use anyhow::{Context, Result};
> +use memmap2::MmapMut;
> +use std::fs::OpenOptions;
> +use std::os::fd::AsRawFd;
> +use std::path::Path;
> +use std::sync::Arc;
> +use std::sync::atomic::{AtomicI32, AtomicU32, Ordering};
> +use tokio::sync::Notify;
> +
> +/// Circular mmap wrapper for ring buffer data
> +///
> +/// This struct manages a circular memory mapping where the same file is mapped
> +/// twice in consecutive virtual addresses. This allows ring buffer operations
> +/// to wrap around naturally without modulo arithmetic.
> +///
> +/// Matches libqb's qb_sys_circular_mmap() behavior.
> +struct CircularMmap {
> +    /// Starting address of the 2x circular mapping
> +    addr: *mut libc::c_void,
> +    /// Size of the file (virtual mapping is 2x this size)
> +    size: usize,
> +}
> +
> +impl CircularMmap {
> +    /// Create a circular mmap from a file descriptor
> +    ///
> +    /// Maps the file TWICE in consecutive virtual addresses, allowing ring buffer
> +    /// wraparound without modulo arithmetic. Matches libqb's qb_sys_circular_mmap().
> +    ///
> +    /// # Arguments
> +    /// - `fd`: File descriptor of the data file (must be sized to `size` bytes)
> +    /// - `size`: Size of the file in bytes (virtual mapping will be 2x this)
> +    ///
> +    /// # Safety
> +    /// The file must be properly sized before calling this function.
> +    unsafe fn new(fd: i32, size: usize) -> Result<Self> {
> +        // SAFETY: All operations in this function are inherently unsafe as they
> +        // manipulate raw memory mappings. The caller must ensure the fd is valid
> +        // and the file is properly sized.
> +        unsafe {
> +            // Step 1: Reserve 2x space with anonymous mmap
> +            let addr_orig = libc::mmap(
> +                std::ptr::null_mut(),
> +                size * 2,
> +                libc::PROT_NONE,
> +                libc::MAP_ANONYMOUS | libc::MAP_PRIVATE,
> +                -1,
> +                0,
> +            );
> +
> +            if addr_orig == libc::MAP_FAILED {
> +                anyhow::bail!(
> +                    "Failed to reserve circular mmap space: {}",
> +                    std::io::Error::last_os_error()
> +                );
> +            }
> +
> +            // Step 2: Map the file at the start of reserved space
> +            let addr1 = libc::mmap(
> +                addr_orig,
> +                size,
> +                libc::PROT_READ | libc::PROT_WRITE,
> +                libc::MAP_FIXED | libc::MAP_SHARED,
> +                fd,
> +                0,
> +            );
> +
> +            if addr1 != addr_orig {
> +                libc::munmap(addr_orig, size * 2);
> +                anyhow::bail!(
> +                    "Failed to map first half of circular buffer: {}",
> +                    std::io::Error::last_os_error()
> +                );
> +            }
> +
> +            // Step 3: Map the SAME file again right after
> +            let addr_next = (addr_orig as *mut u8).add(size) as *mut libc::c_void;
> +            let addr2 = libc::mmap(
> +                addr_next,
> +                size,
> +                libc::PROT_READ | libc::PROT_WRITE,
> +                libc::MAP_FIXED | libc::MAP_SHARED,
> +                fd,
> +                0,
> +            );
> +
> +            if addr2 != addr_next {
> +                libc::munmap(addr_orig, size * 2);
> +                anyhow::bail!(
> +                    "Failed to map second half of circular buffer: {}",
> +                    std::io::Error::last_os_error()
> +                );
> +            }
> +
> +            tracing::debug!(
> +                "Created circular mmap: {:p}, {} bytes (2x {} bytes file)",
> +                addr_orig,
> +                size * 2,
> +                size
> +            );
> +
> +            Ok(Self {
> +                addr: addr_orig,
> +                size,
> +            })
> +        }
> +    }
> +
> +    /// Get the base address as a mutable pointer to u32
> +    ///
> +    /// This is the most common use case for ring buffers which work with u32 words.
> +    fn as_mut_ptr(&self) -> *mut u32 {
> +        self.addr as *mut u32
> +    }
> +
> +    /// Zero-initialize the circular mapping
> +    ///
> +    /// Only needs to write to the first half due to the circular nature.
> +    ///
> +    /// # Safety
> +    /// The circular mmap must be properly initialized and the address valid.
> +    unsafe fn zero_initialize(&mut self) {
> +        // SAFETY: Caller ensures the circular mmap is valid and mapped
> +        unsafe {
> +            std::ptr::write_bytes(self.addr as *mut u8, 0, self.size);
> +        }
> +    }
> +}
> +
> +impl Drop for CircularMmap {
> +    fn drop(&mut self) {
> +        // Munmap the 2x circular mapping
> +        // Matches libqb's cleanup in qb_rb_close_helper
> +        unsafe {
> +            libc::munmap(self.addr, self.size * 2);
> +        }
> +        tracing::debug!(
> +            "Unmapped circular buffer: {:p}, {} bytes (2x {} bytes file)",
> +            self.addr,
> +            self.size * 2,
> +            self.size
> +        );
> +    }
> +}
> +
> +/// Process-shared POSIX semaphore wrapper
> +///
> +/// This wraps the native Linux sem_t (32 bytes on x86_64) for inter-process
> +/// synchronization in the ring buffer.
> +///
> +/// **libqb compatibility note**: This corresponds to libqb's `rpl_sem_t` type.
> +/// On Linux with HAVE_SEM_TIMEDWAIT defined, rpl_sem_t is just an alias for
> +/// the native sem_t. The "rpl" prefix stands for "replacement" - libqb provides
> +/// a fallback implementation using mutexes/condvars on systems without proper
> +/// POSIX semaphore support (like BSD). Since we only target Linux, we use the
> +/// native sem_t directly.
> +#[repr(C)]
> +struct PosixSem {
> +    /// Raw sem_t storage (32 bytes on Linux x86_64)
> +    _sem: [u8; 32],
> +}
> +
> +impl PosixSem {
> +    /// Initialize a POSIX semaphore in-place in shared memory
> +    ///
> +    /// This initializes the semaphore at its current memory location, which is
> +    /// critical for process-shared semaphores in mmap'd memory. The semaphore
> +    /// must not be moved after initialization.
> +    ///
> +    /// The semaphore is always initialized as:
> +    /// - **Process-shared** (pshared=1): Shared between processes via mmap
> +    /// - **Initial value 0**: No data available initially
> +    ///
> +    /// Matches libqb's semaphore initialization in `qb_rb_create_from_file`.
> +    ///
> +    /// # Safety
> +    /// The semaphore must remain at its current memory location and must not
> +    /// be moved or copied after initialization.
> +    unsafe fn init_in_place(&mut self) -> Result<()> {
> +        let sem_ptr = self._sem.as_mut_ptr() as *mut libc::sem_t;
> +
> +        // pshared=1: Process-shared semaphore (for cross-process IPC)
> +        // initial_value=0: No data available initially (producers will post)
> +        const PSHARED: libc::c_int = 1;
> +        const INITIAL_VALUE: libc::c_uint = 0;
> +
> +        // SAFETY: Caller ensures the semaphore memory is valid and will remain
> +        // at this location for its lifetime
> +        let ret = unsafe { libc::sem_init(sem_ptr, PSHARED, INITIAL_VALUE) };
> +
> +        if ret != 0 {
> +            anyhow::bail!("sem_init failed: {}", std::io::Error::last_os_error());
> +        }
> +
> +        Ok(())
> +    }
> +
> +    /// Destroy the semaphore
> +    ///
> +    /// This should be called when the semaphore is no longer needed.
> +    /// Matches libqb's rpl_sem_destroy (which is sem_destroy on Linux).
> +    ///
> +    /// # Safety
> +    /// The semaphore must have been properly initialized and no threads should
> +    /// be waiting on it.
> +    unsafe fn destroy(&mut self) -> Result<()> {
> +        let sem_ptr = self._sem.as_mut_ptr() as *mut libc::sem_t;
> +
> +        // SAFETY: Caller ensures the semaphore is initialized and not in use
> +        let ret = unsafe { libc::sem_destroy(sem_ptr) };
> +
> +        if ret != 0 {
> +            anyhow::bail!("sem_destroy failed: {}", std::io::Error::last_os_error());
> +        }
> +
> +        Ok(())
> +    }
> +
> +    /// Post to the semaphore (increment)
> +    ///
> +    /// Matches libqb's rpl_sem_post (which is sem_post on Linux).
> +    unsafe fn post(&self) -> Result<()> {
> +        let ret = unsafe { libc::sem_post(self._sem.as_ptr() as *mut libc::sem_t) };
> +
> +        if ret != 0 {
> +            anyhow::bail!("sem_post failed: {}", std::io::Error::last_os_error());
> +        }
> +
> +        Ok(())
> +    }
> +
> +    /// Wait on the semaphore asynchronously (decrement, blocking)
> +    ///
> +    /// Uses `spawn_blocking` to wait on the semaphore without blocking the tokio
> +    /// runtime. This provides true event-driven behavior while maintaining
> +    /// compatibility with libqb's semaphore-based notification mechanism.
> +    ///
> +    /// Matches libqb's `my_posix_sem_timedwait` / `sem_wait` behavior.
> +    ///
> +    /// # Safety
> +    /// The semaphore must be properly initialized and remain valid for the
> +    /// duration of the wait operation.
> +    async unsafe fn wait(&self) -> Result<()> {
> +        // Get raw pointer to semaphore
> +        let sem_ptr = self._sem.as_ptr() as *mut libc::sem_t;
> +
> +        // Convert to usize for safe transfer between threads
> +        // This is safe because:
> +        // 1. The semaphore is in process-shared memory (mmap'd file)
> +        // 2. The memory remains valid for the lifetime of the containing structure
> +        // 3. We're only using the pointer on the blocking thread pool
> +        let sem_ptr_addr = sem_ptr as usize;
> +
> +        // Use spawn_blocking to wait on the semaphore without blocking tokio runtime
> +        // This offloads the blocking sem_wait to tokio's dedicated blocking thread pool
> +        tokio::task::spawn_blocking(move || {
> +            // Reconstruct the pointer on the blocking thread
> +            // SAFETY: The semaphore is in shared memory and remains valid.
> +            // We're calling sem_wait on a process-shared semaphore from a thread
> +            // in the same process, which is safe.
> +            let sem_ptr = sem_ptr_addr as *mut libc::sem_t;
> +            let ret = unsafe { libc::sem_wait(sem_ptr) };

On shutdown, the async task can be dropped while the blocking sem_wait
keeps running on the blocking thread pool, but RingBuffer may then call
sem_destroy and unmap the memory underneath it. Please make the wait
shutdown-aware and wake the waiter before destroying the semaphore.
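
One possible shape, as a rough std-only sketch: wait in bounded slices
and check a shutdown flag between them. `wait_until_shutdown` and
`WaitOutcome` are hypothetical names, and `timed_wait` stands in for a
wrapper around `sem_timedwait` that would still need to be written.

```rust
use std::io;
use std::sync::atomic::{AtomicBool, Ordering};
use std::time::Duration;

/// Result of a shutdown-aware wait.
#[derive(Debug, PartialEq, Eq)]
pub enum WaitOutcome {
    /// The semaphore was acquired.
    Acquired,
    /// Shutdown was requested before the semaphore was acquired.
    Shutdown,
}

/// Wait in bounded slices so the blocking thread notices shutdown.
///
/// `timed_wait` is a hypothetical stand-in for a wrapper around
/// `sem_timedwait`: it returns `Ok(true)` once the semaphore was
/// acquired and `Ok(false)` when the slice timed out.
pub fn wait_until_shutdown<F>(
    shutdown: &AtomicBool,
    slice: Duration,
    mut timed_wait: F,
) -> io::Result<WaitOutcome>
where
    F: FnMut(Duration) -> io::Result<bool>,
{
    loop {
        // Check the flag before touching the semaphore, so a waiter never
        // starts a new wait after shutdown was requested.
        if shutdown.load(Ordering::Acquire) {
            return Ok(WaitOutcome::Shutdown);
        }
        if timed_wait(slice)? {
            return Ok(WaitOutcome::Acquired);
        }
    }
}
```

The owner would still have to set the flag and wait for the blocking
task to return before calling sem_destroy or unmapping the header.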

> +
> +            if ret != 0 {
> +                let err = std::io::Error::last_os_error();
> +                // Handle EINTR by returning an error that causes retry
> +                if err.raw_os_error() == Some(libc::EINTR) {
> +                    anyhow::bail!("sem_wait interrupted (EINTR), will retry");

The message says "will retry", but the code bails out with an error
instead of retrying. Does this actually crash?
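
If a retry is what is intended, a loop along these lines might do. This
is a minimal std-only sketch; `retry_on_eintr` is a hypothetical helper
and `op` stands in for a closure that calls `sem_wait` and returns
`io::Error::last_os_error()` on failure.

```rust
use std::io;

/// Call `op` until it stops failing with `EINTR`.
///
/// `op` is a hypothetical stand-in for a closure that calls `sem_wait`
/// and returns `io::Error::last_os_error()` on failure.
pub fn retry_on_eintr<F>(mut op: F) -> io::Result<()>
where
    F: FnMut() -> io::Result<()>,
{
    loop {
        match op() {
            // `ErrorKind::Interrupted` is how std classifies EINTR.
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            // Success or any other error goes back to the caller.
            other => return other,
        }
    }
}
```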

> +                }
> +                anyhow::bail!("sem_wait failed: {err}");
> +            }
> +
> +            Ok(())
> +        })
> +        .await
> +        .context("spawn_blocking task failed")??;
> +
> +        Ok(())
> +    }
> +}
> +
> +/// Shared memory header matching libqb's qb_ringbuffer_shared_s layout
> +///
> +/// This structure is mmap'd and shared between processes.
> +/// Field order and alignment must exactly match libqb for compatibility.
> +///
> +/// Note: libqb's struct has `char user_data[1]` which contributes 1 byte to sizeof(),
> +/// then the struct is padded to 8-byte alignment (7 bytes padding).
> +/// Additional shared_user_data_size bytes are allocated beyond sizeof().
> +#[repr(C, align(8))]
> +struct RingBufferShared {
> +    /// Write pointer (word index, not byte offset)
> +    write_pt: AtomicU32,
> +    /// Read pointer (word index, not byte offset)
> +    read_pt: AtomicU32,
> +    /// Ring buffer size in words (u32 units)
> +    word_size: u32,
> +    /// Path to header file
> +    hdr_path: [u8; libc::PATH_MAX as usize],
> +    /// Path to data file
> +    data_path: [u8; libc::PATH_MAX as usize],
> +    /// Reference count (for cleanup)
> +    ref_count: AtomicU32,
> +    /// Process-shared semaphore for notification
> +    posix_sem: PosixSem,
> +    /// Flexible array member placeholder (matches C's char user_data[1])
> +    /// Actual user_data starts here and continues beyond sizeof(RingBufferShared)
> +    user_data: [u8; 1],
> +    // 7 bytes of padding added by align(8) to reach 8248 bytes total
> +}
> +
> +impl RingBufferShared {
> +    /// Chunk header size in 32-bit words (matching libqb)
> +    const CHUNK_HEADER_WORDS: usize = 2;
> +
> +    /// Chunk magic numbers (matching libqb qb_ringbuffer_int.h)
> +    const CHUNK_MAGIC: u32 = 0xA1A1A1A1; // Valid allocated chunk
> +    const CHUNK_MAGIC_DEAD: u32 = 0xD0D0D0D0; // Reclaimed/dead chunk
> +    const CHUNK_MAGIC_ALLOC: u32 = 0xA110CED0; // Chunk being allocated
> +
> +    /// Calculate the next pointer position after a chunk of given size
> +    ///
> +    /// This implements libqb's qb_rb_chunk_step logic (ringbuffer.c:464-484):
> +    /// 1. Skip chunk header (CHUNK_HEADER_WORDS)
> +    /// 2. Skip user data (rounded up to word boundary)
> +    /// 3. Wrap around if needed
> +    ///
> +    /// # Arguments
> +    /// - `current_pt`: Current read or write pointer (in words)
> +    /// - `data_size_bytes`: Size of the data payload in bytes
> +    ///
> +    /// # Returns
> +    /// New pointer position (in words), wrapped to [0, word_size)
> +    fn chunk_step(&self, current_pt: u32, data_size_bytes: usize) -> u32 {
> +        let word_size = self.word_size as usize;
> +
> +        // Convert bytes to words, rounding up to word boundary
> +        // This matches libqb's logic:
> +        //   pointer += (chunk_size / sizeof(uint32_t));
> +        //   if ((chunk_size % (sizeof(uint32_t) * QB_RB_WORD_ALIGN)) != 0) pointer++;
> +        let data_words = data_size_bytes.div_ceil(std::mem::size_of::<u32>());
> +
> +        // Calculate new position: current + header + data (in words)
> +        let new_pt = (current_pt as usize + Self::CHUNK_HEADER_WORDS + data_words) % word_size;
> +
> +        new_pt as u32
> +    }
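
To make the pointer arithmetic easier to check, here is a free-function
version of the same calculation with `word_size` passed in explicitly
(my restructuring, for illustration only). With a 16-word buffer, a
5-byte payload rounds up to 2 words, so starting at 0 the next pointer
is 0 + 2 (header) + 2 = 4.

```rust
/// Words occupied by the chunk header, as in the patch.
const CHUNK_HEADER_WORDS: usize = 2;

/// Free-function version of `chunk_step`, with `word_size` passed in
/// explicitly so it can be exercised without shared memory.
pub fn chunk_step(current_pt: u32, data_size_bytes: usize, word_size: usize) -> u32 {
    // Round the payload up to whole 32-bit words.
    let data_words = data_size_bytes.div_ceil(std::mem::size_of::<u32>());
    // Advance past header and payload, wrapping at the end of the buffer.
    ((current_pt as usize + CHUNK_HEADER_WORDS + data_words) % word_size) as u32
}
```

A 4-byte payload starting at word 14 wraps: (14 + 2 + 1) % 16 = 1.
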
> +
> +    /// Initialize a RingBufferShared structure in-place in shared memory
> +    ///
> +    /// This initializes the ring buffer header at its current memory location, which is
> +    /// critical for process-shared data structures in mmap'd memory. The structure
> +    /// must not be moved after initialization.
> +    ///
> +    /// # Arguments
> +    /// - `word_size`: Size of ring buffer in 32-bit words
> +    /// - `hdr_path`: Path to the header file (will be copied into the structure)
> +    /// - `data_path`: Path to the data file (will be copied into the structure)
> +    ///
> +    /// # Safety
> +    /// The RingBufferShared must remain at its current memory location and must not
> +    /// be moved or copied after initialization.
> +    unsafe fn init_in_place(
> +        &mut self,
> +        word_size: u32,
> +        hdr_path: &std::path::Path,
> +        data_path: &std::path::Path,
> +    ) -> Result<()> {
> +        // SAFETY: Caller ensures this structure is in shared memory and will remain
> +        // at this location for its lifetime
> +        unsafe {
> +            // Zero-initialize the entire structure first
> +            std::ptr::write_bytes(self as *mut Self, 0, 1);
> +
> +            // Initialize atomic fields
> +            self.write_pt = AtomicU32::new(0);
> +            self.read_pt = AtomicU32::new(0);
> +            self.word_size = word_size;
> +            self.ref_count = AtomicU32::new(1);
> +
> +            // Initialize semaphore in-place in shared memory
> +            // This is critical - the semaphore must be initialized at its final location
> +            self.posix_sem
> +                .init_in_place()
> +                .context("Failed to initialize semaphore")?;
> +
> +            // Copy header path into structure
> +            let hdr_path_str = hdr_path.to_string_lossy();
> +            let hdr_path_bytes = hdr_path_str.as_bytes();
> +            let len = hdr_path_bytes.len().min(libc::PATH_MAX as usize - 1);
> +            self.hdr_path[..len].copy_from_slice(&hdr_path_bytes[..len]);
> +
> +            // Copy data path into structure
> +            let data_path_str = data_path.to_string_lossy();
> +            let data_path_bytes = data_path_str.as_bytes();
> +            let len = data_path_bytes.len().min(libc::PATH_MAX as usize - 1);
> +            self.data_path[..len].copy_from_slice(&data_path_bytes[..len]);
> +        }
> +
> +        Ok(())
> +    }
> +
> +    /// Calculate free space in the ring buffer (in words)
> +    ///
> +    /// Returns the number of free words (u32 units) available for allocation.
> +    /// This uses atomic loads to read the pointers safely.
> +    fn space_free_words(&self) -> usize {
> +        let write_pt = self.write_pt.load(Ordering::Acquire);
> +        let read_pt = self.read_pt.load(Ordering::Acquire);
> +        let word_size = self.word_size as usize;
> +
> +        if write_pt >= read_pt {
> +            if write_pt == read_pt {
> +                word_size // Buffer is empty, all space available
> +            } else {
> +                (read_pt as usize + word_size - write_pt as usize) - 1
> +            }
> +        } else {
> +            (read_pt as usize - write_pt as usize) - 1
> +        }
> +    }
> +
> +    /// Calculate free space in bytes
> +    ///
> +    /// Converts the word count to bytes by multiplying by sizeof(uint32_t).
> +    /// Matches libqb's qb_rb_space_free (ringbuffer.c:373).
> +    fn space_free_bytes(&self) -> usize {
> +        self.space_free_words() * std::mem::size_of::<u32>()
> +    }
> +
> +    /// Check if a chunk of given size (in bytes) can fit in the buffer
> +    ///
> +    /// Includes chunk header overhead and alignment requirements.
> +    fn chunk_fits(&self, message_size: usize, chunk_margin: usize) -> bool {
> +        let required_bytes = message_size + chunk_margin;
> +        self.space_free_bytes() >= required_bytes
> +    }
> +
> +    /// Write a chunk to the ring buffer
> +    ///
> +    /// This performs the complete chunk write operation:
> +    /// 1. Allocate space in the ring buffer
> +    /// 2. Write the message data (handling wraparound)
> +    /// 3. Commit the chunk (update write_pt, set magic)
> +    /// 4. Post to semaphore to wake readers
> +    ///
> +    /// # Safety
> +    /// Caller must ensure:
> +    /// - shared_data points to valid ring buffer data
> +    /// - There is sufficient space (checked via chunk_fits)
> +    /// - No other thread is writing concurrently
> +    unsafe fn write_chunk(&self, shared_data: *mut u32, message: &[u8]) -> Result<()> {
> +        let msg_len = message.len();
> +        let word_size = self.word_size as usize;
> +
> +        // Get current write pointer
> +        let write_pt = self.write_pt.load(Ordering::Acquire);
> +
> +        // Write chunk header: [size=0][magic=ALLOC]
> +        // Matches libqb's qb_rb_chunk_alloc (ringbuffer.c:439-440)
> +        unsafe {
> +            *shared_data.add(write_pt as usize) = 0; // Size is 0 during allocation
> +            *shared_data.add((write_pt as usize + 1) % word_size) = Self::CHUNK_MAGIC_ALLOC;
> +        }
> +
> +        // Write message data
> +        let data_offset = (write_pt as usize + Self::CHUNK_HEADER_WORDS) % word_size;
> +        let data_ptr = unsafe { shared_data.add(data_offset) as *mut u8 };
> +
> +        // Handle wraparound - calculate remaining bytes in buffer before wraparound
> +        let remaining = (word_size - data_offset) * std::mem::size_of::<u32>();
> +        if msg_len <= remaining {
> +            // No wraparound needed
> +            unsafe {
> +                std::ptr::copy_nonoverlapping(message.as_ptr(), data_ptr, msg_len);
> +            }
> +        } else {
> +            // Need to wrap around
> +            unsafe {
> +                std::ptr::copy_nonoverlapping(message.as_ptr(), data_ptr, remaining);
> +                std::ptr::copy_nonoverlapping(
> +                    message.as_ptr().add(remaining),
> +                    shared_data as *mut u8,
> +                    msg_len - remaining,
> +                );
> +            }
> +        }
> +
> +        // Calculate new write pointer - matches libqb's qb_rb_chunk_step logic
> +        let new_write_pt = self.chunk_step(write_pt, msg_len);
> +
> +        // Commit: write size, update write pointer, then set magic with atomic RELEASE
> +        // This matches libqb's qb_rb_chunk_commit behavior (ringbuffer.c:497-504)
> +        unsafe {
> +            // 1. Write chunk size
> +            *shared_data.add(write_pt as usize) = msg_len as u32;
> +
> +            // 2. Update write pointer
> +            self.write_pt.store(new_write_pt, Ordering::Relaxed);
> +
> +            // 3. Set magic with RELEASE
> +            // RELEASE ensures all previous writes (data, size, write_pt) are visible before magic
> +            let magic_offset = (write_pt as usize + 1) % word_size;
> +            let magic_ptr = shared_data.add(magic_offset) as *mut AtomicU32;
> +            (*magic_ptr).store(Self::CHUNK_MAGIC, Ordering::Release);

Could a reader observe the new write_pt before the chunk is committed?
write_pt is stored (Relaxed) before the magic is set.

We should publish the chunk by advancing write_pt with Release only
after writing data/size/magic, so readers can't observe the new
write_pt before the chunk is committed.
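
Roughly what I have in mind, as a self-contained toy (not the patch's
types): `ToyRing`, `commit` and `peek` are made-up names, the magic
value is a placeholder, and there is no free-space check. It only
illustrates the ordering, with write_pt stored last with Release and
loaded with Acquire on the reader side. Keeping Release on the magic
store as well may still be needed for libqb peers.

```rust
use std::sync::atomic::{AtomicU32, Ordering};

// Placeholder magic for the sketch; not taken from the patch.
const CHUNK_MAGIC: u32 = 0xA1A1_A1A1;
const CHUNK_HEADER_WORDS: usize = 2;

/// Toy single-producer/single-consumer ring.
/// Chunk layout: [size][magic][payload words...]
struct ToyRing {
    words: Vec<AtomicU32>,
    write_pt: AtomicU32,
    read_pt: AtomicU32,
}

impl ToyRing {
    fn new(word_size: usize) -> Self {
        Self {
            words: (0..word_size).map(|_| AtomicU32::new(0)).collect(),
            write_pt: AtomicU32::new(0),
            read_pt: AtomicU32::new(0),
        }
    }

    /// Write payload, size and magic first; advance write_pt last.
    /// Note: no free-space check, the caller must ensure the chunk fits.
    fn commit(&self, payload: &[u32]) {
        let n = self.words.len();
        let wp = self.write_pt.load(Ordering::Relaxed) as usize;
        for (i, w) in payload.iter().enumerate() {
            self.words[(wp + CHUNK_HEADER_WORDS + i) % n].store(*w, Ordering::Relaxed);
        }
        self.words[wp % n].store(payload.len() as u32, Ordering::Relaxed);
        self.words[(wp + 1) % n].store(CHUNK_MAGIC, Ordering::Relaxed);

        // Publish: everything stored above happens-before any reader
        // that sees the new write_pt through an Acquire load.
        let new_wp = ((wp + CHUNK_HEADER_WORDS + payload.len()) % n) as u32;
        self.write_pt.store(new_wp, Ordering::Release);
    }

    /// Returns the chunk at read_pt without reclaiming it.
    fn peek(&self) -> Option<Vec<u32>> {
        let n = self.words.len();
        let rp = self.read_pt.load(Ordering::Relaxed) as usize;
        let wp = self.write_pt.load(Ordering::Acquire) as usize;
        if rp == wp {
            return None; // empty
        }
        if self.words[(rp + 1) % n].load(Ordering::Relaxed) != CHUNK_MAGIC {
            return None; // not a committed chunk
        }
        let len = self.words[rp % n].load(Ordering::Relaxed) as usize;
        Some(
            (0..len)
                .map(|i| self.words[(rp + CHUNK_HEADER_WORDS + i) % n].load(Ordering::Relaxed))
                .collect(),
        )
    }
}
```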

> +
> +            // 4. Post to semaphore to wake up waiting readers
> +            self.posix_sem
> +                .post()
> +                .context("Failed to post to semaphore")?;
> +        }
> +
> +        tracing::debug!(
> +            "Wrote chunk: {} bytes, write_pt {} -> {}",
> +            msg_len,
> +            write_pt,
> +            new_write_pt
> +        );
> +
> +        Ok(())
> +    }
> +
> +    /// Read a chunk from the ring buffer
> +    ///
> +    /// This reads the chunk at the current read pointer, validates it,
> +    /// copies the data, and reclaims the chunk.
> +    ///
> +    /// Returns None if the buffer is empty (read_pt == write_pt).
> +    ///
> +    /// # Safety
> +    /// Caller must ensure:
> +    /// - shared_data points to valid ring buffer data
> +    /// - flow_control_ptr (if Some) points to valid i32
> +    /// - No other thread is reading concurrently
> +    unsafe fn read_chunk(
> +        &self,
> +        shared_data: *mut u32,
> +        flow_control_ptr: Option<*mut i32>,
> +    ) -> Result<Option<Vec<u8>>> {
> +        let word_size = self.word_size as usize;
> +
> +        // Get current read pointer
> +        let read_pt = self.read_pt.load(Ordering::Acquire);
> +        let write_pt = self.write_pt.load(Ordering::Acquire);
> +
> +        // Check if buffer is empty
> +        if read_pt == write_pt {
> +            return Ok(None);
> +        }
> +
> +        // Read chunk header with ACQUIRE to see all writes
> +        let magic_offset = (read_pt as usize + 1) % word_size;
> +        let magic_ptr = unsafe { shared_data.add(magic_offset) as *const AtomicU32 };
> +        let chunk_magic = unsafe { (*magic_ptr).load(Ordering::Acquire) };
> +
> +        // Read chunk size
> +        let chunk_size = unsafe { *shared_data.add(read_pt as usize) };

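chunk_size should be validated too: it is read from shared memory and
then used unchecked for the allocation and the copies below.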
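
Something along these lines, as a sketch only. `validate_chunk_size` is
a hypothetical helper, and it assumes a chunk occupies
CHUNK_HEADER_WORDS + ceil(size / 4) words (which is how I read
chunk_step, please double check). The idea is to reject any size that
does not fit into the region between read_pt and write_pt.

```rust
const CHUNK_HEADER_WORDS: usize = 2;

/// Hypothetical helper: check a chunk size read from shared memory
/// against the words currently in use (read_pt..write_pt, with wrap).
fn validate_chunk_size(
    chunk_size: u32,
    read_pt: u32,
    write_pt: u32,
    word_size: u32,
) -> Result<usize, String> {
    if word_size == 0 || read_pt >= word_size || write_pt >= word_size {
        return Err(format!(
            "pointers out of range: read_pt={read_pt}, write_pt={write_pt}, word_size={word_size}"
        ));
    }

    let (rp, wp, ws) = (read_pt as usize, write_pt as usize, word_size as usize);
    // Words between read_pt and write_pt, accounting for wraparound
    let used_words = if wp >= rp { wp - rp } else { ws - rp + wp };

    // Assumption: header plus payload rounded up to whole words
    let needed_words = CHUNK_HEADER_WORDS + (chunk_size as usize).div_ceil(4);

    if needed_words > used_words {
        return Err(format!(
            "chunk size {chunk_size} needs {needed_words} words, only {used_words} in use"
        ));
    }
    Ok(chunk_size as usize)
}
```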

> +
> +        tracing::debug!(
> +            "Reading chunk: read_pt={}, write_pt={}, size={}, magic=0x{:08x}",
> +            read_pt,
> +            write_pt,
> +            chunk_size,
> +            chunk_magic
> +        );
> +
> +        // Verify magic
> +        if chunk_magic != Self::CHUNK_MAGIC {
> +            anyhow::bail!(
> +                "Invalid chunk magic at read_pt={}: expected 0x{:08x}, got 0x{:08x}",
> +                read_pt,
> +                Self::CHUNK_MAGIC,
> +                chunk_magic
> +            );
> +        }
> +
> +        // Read message data
> +        let data_offset = (read_pt as usize + Self::CHUNK_HEADER_WORDS) % word_size;
> +        let data_ptr = unsafe { shared_data.add(data_offset) as *const u8 };
> +
> +        let mut message = vec![0u8; chunk_size as usize];
> +
> +        // Handle wraparound - calculate remaining bytes in buffer before wraparound
> +        let remaining = (word_size - data_offset) * std::mem::size_of::<u32>();
> +        if chunk_size as usize <= remaining {
> +            // No wraparound
> +            unsafe {
> +                std::ptr::copy_nonoverlapping(data_ptr, message.as_mut_ptr(), chunk_size as usize);
> +            }
> +        } else {
> +            // Wraparound
> +            unsafe {
> +                std::ptr::copy_nonoverlapping(data_ptr, message.as_mut_ptr(), remaining);
> +                std::ptr::copy_nonoverlapping(
> +                    shared_data as *const u8,
> +                    message.as_mut_ptr().add(remaining),
> +                    chunk_size as usize - remaining,
> +                );
> +            }
> +        }
> +
> +        // Reclaim chunk: clear header and update read pointer
> +        let new_read_pt = self.chunk_step(read_pt, chunk_size as usize);
> +
> +        unsafe {
> +            // Clear chunk size
> +            *shared_data.add(read_pt as usize) = 0;
> +
> +            // Set magic to DEAD with RELEASE
> +            let magic_ptr = shared_data.add(magic_offset) as *mut AtomicU32;
> +            (*magic_ptr).store(Self::CHUNK_MAGIC_DEAD, Ordering::Release);
> +
> +            // Update read_pt
> +            self.read_pt.store(new_read_pt, Ordering::Relaxed);
> +
> +            // Signal flow control - server is ready for next request
> +            if let Some(fc_ptr) = flow_control_ptr {
> +                let refcount = self.ref_count.load(Ordering::Acquire);
> +                if refcount == 2 {
> +                    let fc_atomic = fc_ptr as *mut AtomicI32;
> +                    (*fc_atomic).store(0, Ordering::Relaxed);
> +                }
> +            }
> +        }
> +
> +        Ok(Some(message))
> +    }
> +}
> +
> +/// Flow control mechanism for ring buffer backpressure
> +///
> +/// Implements libqb's flow control protocol for IPC communication.
> +/// The server writes flow control values to shared memory, and clients
> +/// read these values to determine if they should back off.
> +///
> +/// Flow control values (matching libqb's rate limiting):
> +/// - `OK`: Proceed with sending (QB_IPCS_RATE_NORMAL)
> +/// - `SLOW_DOWN`: Approaching capacity, reduce send rate (QB_IPCS_RATE_OFF)
> +/// - `STOP`: Queue full, do not send (QB_IPCS_RATE_OFF_2)
> +///
> +/// ## Disabled Flow Control
> +///
> +/// When constructed with a null fc_ptr, flow control is disabled and all
> +/// operations become no-ops. This matches libqb's behavior for response/event
> +/// rings which don't need backpressure signaling.
> +///
> +/// Matches libqb's qb_ipc_shm_fc_get/qb_ipc_shm_fc_set (ipc_shm.c:176-195)
> +pub struct FlowControl {
> +    /// Pointer to flow control field in shared memory (i32 atomic)
> +    /// Located in shared_user_data area of RingBufferShared
> +    /// If null, flow control is disabled (no-op mode)
> +    fc_ptr: *mut i32,
> +    /// Pointer to shared header for refcount checks
> +    /// If null, flow control is disabled (no-op mode)
> +    shared_hdr: *mut RingBufferShared,
> +}
> +
> +impl FlowControl {
> +    /// OK to send - queue has space (QB_IPCS_RATE_NORMAL)
> +    pub const OK: i32 = 0;
> +
> +    /// Slow down - queue approaching full (QB_IPCS_RATE_OFF)
> +    pub const SLOW_DOWN: i32 = 1;
> +
> +    /// Stop sending - queue full (QB_IPCS_RATE_OFF_2)
> +    pub const STOP: i32 = 2;
> +
> +    /// Create a new FlowControl instance
> +    ///
> +    /// Pass null pointers to create a disabled (no-op) flow control instance.
> +    /// This is used for response/event rings that don't need backpressure.
> +    ///
> +    /// # Safety
> +    /// - If fc_ptr is non-null, it must point to valid shared memory for an i32
> +    /// - If shared_hdr is non-null, it must point to valid RingBufferShared
> +    /// - Both must remain valid for the lifetime of FlowControl (if non-null)
> +    unsafe fn new(fc_ptr: *mut i32, shared_hdr: *mut RingBufferShared) -> Self {
> +        // Initialize to 0 if enabled - server is ready for requests
> +        // libqb clients check: if (fc > 0 && fc <= fc_enable_max) return EAGAIN
> +        // So 0 means "ready to transmit", > 0 means "flow control active/blocked"
> +        if !fc_ptr.is_null() {
> +            let fc_atomic = fc_ptr as *mut AtomicI32;
> +            unsafe {
> +                (*fc_atomic).store(0, Ordering::Relaxed);
> +            }
> +        }
> +
> +        Self { fc_ptr, shared_hdr }
> +    }
> +
> +    /// Check if flow control is enabled
> +    #[inline]
> +    fn is_enabled(&self) -> bool {
> +        !self.fc_ptr.is_null()
> +    }
> +
> +    /// Get the raw flow control pointer (for internal use)
> +    #[inline]
> +    fn fc_ptr(&self) -> *mut i32 {
> +        self.fc_ptr
> +    }
> +
> +    /// Get flow control value
> +    ///
> +    /// Matches libqb's qb_ipc_shm_fc_get (ipc_shm.c:185-195).
> +    /// Returns:
> +    /// - 0: Ready for requests (or flow control disabled)
> +    /// - >0: Flow control active (client should retry)
> +    /// - <0: Error (not connected)
> +    ///
> +    /// Note: This method is primarily for libqb clients, not used internally by server
> +    #[allow(dead_code)]
> +    pub fn get(&self) -> i32 {
> +        if !self.is_enabled() {
> +            return 0; // Disabled = always ready
> +        }
> +
> +        // Check if both client and server are connected (refcount == 2)
> +        let refcount = unsafe { (*self.shared_hdr).ref_count.load(Ordering::Acquire) };
> +        if refcount != 2 {
> +            return -libc::ENOTCONN;
> +        }
> +
> +        // Read flow control value atomically
> +        unsafe {
> +            let fc_atomic = self.fc_ptr as *const AtomicI32;
> +            (*fc_atomic).load(Ordering::Relaxed)
> +        }
> +    }
> +
> +    /// Set flow control value
> +    ///
> +    /// Matches libqb's qb_ipc_shm_fc_set (ipc_shm.c:176-182).
> +    /// - fc_enable = 0: Ready for requests
> +    /// - fc_enable > 0: Flow control active (backpressure)
> +    ///
> +    /// No-op if flow control is disabled.
> +    pub fn set(&self, fc_enable: i32) {
> +        if !self.is_enabled() {
> +            return; // Disabled = no-op
> +        }
> +
> +        tracing::trace!("Setting flow control to {}", fc_enable);
> +        unsafe {
> +            let fc_atomic = self.fc_ptr as *mut AtomicI32;
> +            (*fc_atomic).store(fc_enable, Ordering::Relaxed);
> +        }
> +    }
> +}
> +
> +// Safety: FlowControl uses atomic operations for synchronization
> +unsafe impl Send for FlowControl {}
> +unsafe impl Sync for FlowControl {}
> +
> +/// Ring buffer handle
> +///
> +/// Owns the mmap'd memory regions and provides async message-passing API.
> +pub struct RingBuffer {
> +    /// Mmap of shared header
> +    _mmap_hdr: MmapMut,
> +    /// Circular mmap of shared data (2x virtual mapping)
> +    _mmap_data: CircularMmap,
> +    /// Pointer to shared header (inside _mmap_hdr)
> +    shared_hdr: *mut RingBufferShared,
> +    /// Pointer to shared data array (inside _mmap_data)
> +    shared_data: *mut u32,
> +    /// Flow control mechanism
> +    /// Always present, but may be disabled (no-op) for response/event rings
> +    pub flow_control: FlowControl,
> +    /// Notifier for when data becomes available (for consumers)
> +    data_available: Arc<Notify>,
> +    /// Notifier for when space becomes available (for producers)
> +    space_available: Arc<Notify>,
> +    /// Whether this instance created the ring buffer (and thus owns cleanup)
> +    /// Matches libqb's QB_RB_FLAG_CREATE flag
> +    is_creator: bool,
> +}
> +
> +// Safety: RingBuffer uses atomic operations for synchronization
> +unsafe impl Send for RingBuffer {}
> +unsafe impl Sync for RingBuffer {}
> +
> +impl RingBuffer {
> +    /// Chunk margin for space calculations (in bytes)
> +    /// Matches libqb: sizeof(uint32_t) * (CHUNK_HEADER_WORDS + WORD_ALIGN + CACHE_LINE_WORDS)
> +    /// We don't use cache line alignment, so CACHE_LINE_WORDS = 0
> +    const CHUNK_MARGIN: usize = 4 * (RingBufferShared::CHUNK_HEADER_WORDS + 1);
> +
> +    /// Create a new ring buffer in shared memory
> +    ///
> +    /// Creates two files in `/dev/shm`:
> +    /// - `{base_dir}/qb-{name}-header`
> +    /// - `{base_dir}/qb-{name}-data`
> +    ///
> +    /// # Arguments
> +    /// - `base_dir`: Directory for shared memory files (typically "/dev/shm")
> +    /// - `name`: Ring buffer name
> +    /// - `size_bytes`: Size of ring buffer data in bytes
> +    /// - `shared_user_data_size`: Extra bytes to allocate after RingBufferShared for flow control
> +    ///
> +    /// The header file size will be: sizeof(RingBufferShared) + shared_user_data_size
> +    /// This matches libqb's behavior: sizeof(qb_ringbuffer_shared_s) + shared_user_data_size
> +    pub fn new(
> +        base_dir: impl AsRef<Path>,
> +        name: &str,
> +        size_bytes: usize,
> +        shared_user_data_size: usize,
> +    ) -> Result<Self> {
> +        let base_dir = base_dir.as_ref();
> +
> +        // Match libqb's size calculation exactly:
> +        // 1. Add CHUNK_MARGIN + 1 (13 bytes)
> +        //    CHUNK_MARGIN = sizeof(uint32_t) * (CHUNK_HEADER_WORDS + WORD_ALIGN + CACHE_LINE_WORDS)
> +        //    = 4 * (2 + 1 + 0) = 12 bytes (without cache line alignment)
> +        let size = size_bytes + Self::CHUNK_MARGIN + 1;
> +
> +        // 2. Round up to page size (typically 4096)
> +        let page_size = 4096; // Standard page size on Linux
> +        let real_size = size.div_ceil(page_size) * page_size;
> +
> +        // 3. Calculate word_size from rounded size
> +        let word_size = real_size / 4;
> +
> +        tracing::info!(
> +            "Creating ring buffer '{}': size_bytes={}, real_size={}, word_size={} ({}words = {} bytes)",
> +            name,
> +            size_bytes,
> +            real_size,
> +            word_size,
> +            word_size,
> +            real_size
> +        );
> +
> +        // Create header file
> +        let hdr_filename = format!("qb-{name}-header");
> +        let hdr_path = base_dir.join(&hdr_filename);
> +
> +        let hdr_file = OpenOptions::new()
> +            .read(true)
> +            .write(true)
> +            .create(true)
> +            .truncate(true)
> +            .open(&hdr_path)
> +            .context("Failed to create header file")?;

Should we explicitly set .mode(0o600) here?
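
For reference, a minimal sketch using std's `OpenOptionsExt`
(`create_private` is just an illustrative name). Note that the mode is
only applied when the file is actually created and is still subject to
the umask; an already existing file keeps its permissions.

```rust
use std::fs::{File, OpenOptions};
use std::io;
use std::os::unix::fs::OpenOptionsExt;
use std::path::Path;

/// Create (or truncate) a file that is readable/writable by the owner only.
fn create_private(path: &Path) -> io::Result<File> {
    OpenOptions::new()
        .read(true)
        .write(true)
        .create(true)
        .truncate(true)
        .mode(0o600) // only takes effect if the file is newly created
        .open(path)
}
```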

> +
> +        // Resize to fit RingBufferShared structure + shared_user_data
> +        // This matches libqb: sizeof(qb_ringbuffer_shared_s) + shared_user_data_size
> +        let hdr_size = std::mem::size_of::<RingBufferShared>() + shared_user_data_size;
> +        hdr_file
> +            .set_len(hdr_size as u64)
> +            .context("Failed to resize header file")?;
> +
> +        // Mmap header
> +        let mut mmap_hdr =
> +            unsafe { MmapMut::map_mut(&hdr_file) }.context("Failed to mmap header")?;
> +
> +        // Create data file path (needed for init_in_place)
> +        let data_filename = format!("qb-{name}-data");
> +        let data_path = base_dir.join(&data_filename);
> +
> +        // Initialize shared header
> +        let shared_hdr = mmap_hdr.as_mut_ptr() as *mut RingBufferShared;
> +
> +        unsafe {
> +            (*shared_hdr).init_in_place(word_size as u32, &hdr_path, &data_path)?;
> +        }
> +
> +        // Create data file
> +        let data_file = OpenOptions::new()
> +            .read(true)
> +            .write(true)
> +            .create(true)
> +            .truncate(true)
> +            .open(&data_path)
> +            .context("Failed to create data file")?;
> +
> +        // Create data file with real_size (NOT 2x real_size!)
> +        // libqb creates the file with real_size, then uses circular mmap to map it TWICE
> +        // in consecutive virtual address space. The file itself is only real_size bytes.
> +        // During cleanup, libqb unmaps 2*real_size bytes (the circular mmap), but the
> +        // file itself remains real_size bytes.
> +        data_file
> +            .set_len(real_size as u64)
> +            .context("Failed to resize data file")?;
> +
> +        // Create circular mmap - maps the file TWICE in consecutive virtual memory
> +        // This matches libqb's qb_sys_circular_mmap implementation
> +        let data_fd = data_file.as_raw_fd();
> +        let mut mmap_data = unsafe {
> +            CircularMmap::new(data_fd, real_size).context("Failed to create circular mmap")?
> +        };
> +
> +        // Zero-initialize the data (only need to zero first half due to circular mapping)
> +        unsafe {
> +            mmap_data.zero_initialize();
> +        }
> +
> +        let shared_data = mmap_data.as_mut_ptr();
> +
> +        // Write sentinel value at end of buffer (matches libqb behavior)
> +        // This works now because we have circular mmap with 2x virtual space!
> +        unsafe {
> +            *shared_data.add(word_size) = 5;
> +        }
> +
> +        // Initialize flow control
> +        // If shared_user_data_size >= sizeof(i32), flow control is enabled (for request ring)
> +        // Otherwise, flow control is disabled (for response/event rings)
> +        let flow_control = if shared_user_data_size >= std::mem::size_of::<i32>() {
> +            unsafe {
> +                // Get pointer to user_data field within the structure
> +                // This matches libqb's: return rb->shared_hdr->user_data;
> +                let fc_ptr = std::ptr::addr_of_mut!((*shared_hdr).user_data) as *mut i32;
> +                FlowControl::new(fc_ptr, shared_hdr)
> +            }
> +        } else {
> +            // Disabled flow control (null pointers = no-op mode)
> +            unsafe { FlowControl::new(std::ptr::null_mut(), std::ptr::null_mut()) }
> +        };
> +
> +        Ok(Self {
> +            _mmap_hdr: mmap_hdr,
> +            _mmap_data: mmap_data,
> +            shared_hdr,
> +            shared_data,
> +            flow_control,
> +            data_available: Arc::new(Notify::new()),
> +            space_available: Arc::new(Notify::new()),
> +            is_creator: true, // This instance created the ring buffer
> +        })
> +    }
> +
> +    /// Send a message into the ring buffer (async)
> +    ///
> +    /// Allocates a chunk, writes the message data, and commits the chunk.
> +    /// Awaits if there's insufficient space.
> +    pub async fn send(&mut self, message: &[u8]) -> Result<()> {
> +        loop {
> +            match self.try_send(message) {
> +                Ok(()) => {
> +                    // Notify consumers that data is available
> +                    self.data_available.notify_one();

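As far as I can see nobody waits on data_available; recv() blocks on
the POSIX semaphore instead. Can this notifier be dropped?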

> +                    return Ok(());
> +                }
> +                Err(e) if e.to_string().contains("Insufficient space") => {

Can we please replace this string matching with a dedicated error enum
variant?
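
A std-only sketch of the idea; `RingBufferError`, `check_space` and
`should_retry` are made-up names, not part of the patch. With anyhow in
the picture, the caller could presumably use `downcast_ref` on the error
instead of inspecting the message.

```rust
use std::fmt;

/// Hypothetical typed error for the ring buffer send path.
#[derive(Debug, PartialEq, Eq)]
enum RingBufferError {
    /// Not enough free space for the chunk; the caller may wait and retry.
    InsufficientSpace { required: usize, available: usize },
    /// Anything that is not recoverable by waiting.
    Corrupted(String),
}

impl fmt::Display for RingBufferError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Self::InsufficientSpace { required, available } => write!(
                f,
                "insufficient space: need {required} bytes, have {available} bytes free"
            ),
            Self::Corrupted(msg) => write!(f, "ring buffer corrupted: {msg}"),
        }
    }
}

impl std::error::Error for RingBufferError {}

/// Stand-in for the space check done in try_send().
fn check_space(required: usize, available: usize) -> Result<(), RingBufferError> {
    if available < required {
        return Err(RingBufferError::InsufficientSpace { required, available });
    }
    Ok(())
}

/// The send loop can match on the variant instead of the message text.
fn should_retry(err: &RingBufferError) -> bool {
    matches!(err, RingBufferError::InsufficientSpace { .. })
}
```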

> +                    // Wait for space to become available
> +                    self.space_available.notified().await;

Tokio's Notify only works within a single process, but here it is
another process that frees up the space. So this would likely hang
forever?
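
One possible fallback, sketched with std only: re-check the shared
state with a bounded wait instead of relying on an in-process
notification. `wait_for_space` and its parameters are hypothetical; in
the async code the sleep would be a tokio timer rather than
`thread::sleep`, and returning an EAGAIN-style error to the caller may
be the better fit.

```rust
use std::time::{Duration, Instant};

/// Poll `has_space` until it returns true or `timeout` elapses.
/// Returns true if space became available, false on timeout.
fn wait_for_space(
    mut has_space: impl FnMut() -> bool,
    timeout: Duration,
    interval: Duration,
) -> bool {
    let deadline = Instant::now() + timeout;
    loop {
        // Re-read the shared state; the peer process may have advanced read_pt
        if has_space() {
            return true;
        }
        if Instant::now() >= deadline {
            return false;
        }
        std::thread::sleep(interval);
    }
}
```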

> +                    continue;
> +                }
> +                Err(e) => return Err(e),
> +            }
> +        }
> +    }
> +
> +    /// Try to send a message without blocking
> +    ///
> +    /// Returns an error if there's insufficient space.
> +    pub fn try_send(&mut self, message: &[u8]) -> Result<()> {
> +        // Check if we have enough space
> +        if !unsafe { (*self.shared_hdr).chunk_fits(message.len(), Self::CHUNK_MARGIN) } {
> +            let space_free = self.space_free();
> +            let required = Self::CHUNK_MARGIN + message.len();
> +            anyhow::bail!(
> +                "Insufficient space: need {required} bytes, have {space_free} bytes free"
> +            );
> +        }
> +
> +        // Write the chunk using RingBufferShared
> +        unsafe { (*self.shared_hdr).write_chunk(self.shared_data, message)? };
> +
> +        Ok(())
> +    }
> +
> +    /// Receive a message from the ring buffer (async)
> +    ///
> +    /// Awaits if no message is available.
> +    /// After processing, the chunk is automatically reclaimed.
> +    ///
> +    /// ## Implementation Note
> +    ///
> +    /// libqb uses semaphore-based blocking (sem_timedwait) to wait for data
> +    /// (see qb_rb_chunk_peek in libqb/lib/ringbuffer.c).
> +    ///
> +    /// We use tokio's `spawn_blocking` to wait on the POSIX semaphore without
> +    /// blocking the async runtime. This provides true event-driven behavior with
> +    /// zero polling overhead, while maintaining compatibility with libqb clients.
> +    pub async fn recv(&mut self) -> Result<Vec<u8>> {
> +        loop {
> +            // Wait on POSIX semaphore asynchronously
> +            // This matches libqb's timedwait_fn behavior in qb_rb_chunk_peek
> +            // SAFETY: The semaphore is properly initialized in new() and remains
> +            // valid for the lifetime of RingBuffer
> +            unsafe { (*self.shared_hdr).posix_sem.wait().await? };
> +
> +            // Semaphore was decremented, data should be available
> +            // Read and reclaim the chunk
> +            match self.recv_after_semwait()? {
> +                Some(data) => {
> +                    // Notify producers that space is available
> +                    self.space_available.notify_one();
> +                    return Ok(data);
> +                }
> +                None => {
> +                    // Spurious wakeup or race condition - semaphore was decremented
> +                    // but no valid data found. This shouldn't happen in normal operation.
> +                    tracing::warn!("Spurious semaphore wakeup detected, retrying");
> +                    continue;
> +                }
> +            }
> +        }
> +    }
> +
> +    /// Receive a message after semaphore has been decremented
> +    ///
> +    /// This is called after `PosixSem::wait()` has successfully decremented
> +    /// the semaphore. It reads the chunk data and reclaims the chunk.
> +    ///
> +    /// Returns `None` if the buffer is empty despite semaphore being decremented
> +    /// (which indicates a bug or race condition).
> +    fn recv_after_semwait(&mut self) -> Result<Option<Vec<u8>>> {
> +        // Get fc_ptr if flow control is enabled, otherwise null
> +        let fc_ptr = if self.flow_control.is_enabled() {
> +            Some(self.flow_control.fc_ptr())
> +        } else {
> +            None
> +        };
> +        unsafe { (*self.shared_hdr).read_chunk(self.shared_data, fc_ptr) }
> +    }
> +
> +    /// Calculate free space in the ring buffer (in bytes)
> +    fn space_free(&self) -> usize {
> +        unsafe { (*self.shared_hdr).space_free_bytes() }
> +    }
> +}
> +
> +impl Drop for RingBuffer {
> +    fn drop(&mut self) {
> +        // Decrement ref count
> +        let ref_count = unsafe { (*self.shared_hdr).ref_count.fetch_sub(1, Ordering::AcqRel) };
> +
> +        tracing::debug!(
> +            "Dropping ring buffer, ref_count: {} -> {}",
> +            ref_count,
> +            ref_count - 1
> +        );
> +
> +        // If last reference AND we created it, clean up semaphore and files
> +        // This matches libqb's behavior: only the creator (QB_RB_FLAG_CREATE) destroys the semaphore
> +        if ref_count == 1 && self.is_creator {
> +            unsafe {
> +                // Destroy the semaphore before cleaning up the mmap
> +                // Matches libqb's cleanup in qb_rb_close_helper
> +                if let Err(e) = (*self.shared_hdr).posix_sem.destroy() {
> +                    tracing::warn!("Failed to destroy semaphore: {}", e);
> +                }
> +
> +                let hdr_path =
> +                    std::ffi::CStr::from_ptr((*self.shared_hdr).hdr_path.as_ptr() as *const i8);
> +                let data_path =
> +                    std::ffi::CStr::from_ptr((*self.shared_hdr).data_path.as_ptr() as *const i8);
> +
> +                if let Ok(hdr_path_str) = hdr_path.to_str()
> +                    && !hdr_path_str.is_empty()
> +                {
> +                    let _ = std::fs::remove_file(hdr_path_str);
> +                    tracing::debug!("Removed header file: {}", hdr_path_str);
> +                }
> +
> +                if let Ok(data_path_str) = data_path.to_str()
> +                    && !data_path_str.is_empty()
> +                {
> +                    let _ = std::fs::remove_file(data_path_str);
> +                    tracing::debug!("Removed data file: {}", data_path_str);
> +                }
> +            }
> +        }
> +    }
> +}
> +
> +#[cfg(test)]
> +mod tests {
> +    use super::*;
> +
> +    #[tokio::test]
> +    async fn test_ringbuffer_basic() -> Result<()> {
> +        let temp_dir = tempfile::tempdir()?;
> +        let mut rb = RingBuffer::new(temp_dir.path(), "test", 4096, 0)?;
> +
> +        // Send a message
> +        rb.send(b"hello world").await?;
> +
> +        // Receive the message
> +        let msg = rb.recv().await?;
> +        assert_eq!(msg, b"hello world");
> +
> +        Ok(())
> +    }
> +
> +    #[tokio::test]
> +    async fn test_ringbuffer_multiple_messages() -> Result<()> {
> +        let temp_dir = tempfile::tempdir()?;
> +        let mut rb = RingBuffer::new(temp_dir.path(), "test", 4096, 0)?;
> +
> +        // Send multiple messages
> +        rb.send(b"message 1").await?;
> +        rb.send(b"message 2").await?;
> +        rb.send(b"message 3").await?;
> +
> +        // Receive in order
> +        assert_eq!(rb.recv().await?, b"message 1");
> +        assert_eq!(rb.recv().await?, b"message 2");
> +        assert_eq!(rb.recv().await?, b"message 3");
> +
> +        Ok(())
> +    }
> +
> +    #[tokio::test]
> +    async fn test_ringbuffer_nonblocking_send() -> Result<()> {
> +        let temp_dir = tempfile::tempdir()?;
> +        let mut rb = RingBuffer::new(temp_dir.path(), "test", 4096, 0)?;
> +
> +        // Test try_send (non-blocking send) with async recv
> +        rb.try_send(b"data")?;
> +        let msg = rb.recv().await?;
> +        assert_eq!(msg, b"data");
> +
> +        Ok(())
> +    }
> +
> +    #[tokio::test]
> +    async fn test_ringbuffer_wraparound() -> Result<()> {
> +        let temp_dir = tempfile::tempdir()?;
> +        let mut rb = RingBuffer::new(temp_dir.path(), "test", 256, 0)?;
> +
> +        // Fill and drain to force wraparound
> +        for _ in 0..10 {
> +            rb.send(b"data").await?;
> +            rb.recv().await?;
> +        }
> +
> +        // Should still work
> +        rb.send(b"after wrap").await?;
> +        assert_eq!(rb.recv().await?, b"after wrap");
> +
> +        Ok(())
> +    }
> +}
> diff --git a/src/pmxcfs-rs/pmxcfs-ipc/src/server.rs b/src/pmxcfs-rs/pmxcfs-ipc/src/server.rs
> new file mode 100644
> index 00000000..73d63de0
> --- /dev/null
> +++ b/src/pmxcfs-rs/pmxcfs-ipc/src/server.rs
> @@ -0,0 +1,278 @@
> +/// Main libqb IPC server implementation
> +///
> +/// This module contains the Server struct and its implementation,
> +/// including connection acceptance and server lifecycle management.
> +use anyhow::{Context, Result};
> +use parking_lot::Mutex;
> +use std::collections::HashMap;
> +use std::sync::Arc;
> +use std::sync::atomic::{AtomicU64, AtomicUsize, Ordering};
> +use tokio::net::UnixListener;
> +use tokio_util::sync::CancellationToken;
> +
> +use super::connection::QbConnection;
> +use super::handler::Handler;
> +use super::socket::bind_abstract_socket;
> +
> +/// Server-level connection statistics (matches libqb qb_ipcs_stats)
> +#[derive(Debug, Default)]
> +pub struct ServerStats {
> +    /// Number of currently active connections
> +    pub active_connections: AtomicUsize,
> +    /// Total number of closed connections since server start
> +    pub closed_connections: AtomicUsize,
> +}
> +
> +impl ServerStats {
> +    fn new() -> Self {
> +        Self {
> +            active_connections: AtomicUsize::new(0),
> +            closed_connections: AtomicUsize::new(0),
> +        }
> +    }
> +
> +    /// Increment active connections count (new connection established)
> +    fn connection_created(&self) {
> +        self.active_connections.fetch_add(1, Ordering::Relaxed);
> +        tracing::debug!(
> +            active = self.active_connections.load(Ordering::Relaxed),
> +            closed = self.closed_connections.load(Ordering::Relaxed),
> +            "Connection created"
> +        );
> +    }
> +
> +    /// Decrement active, increment closed (connection terminated)
> +    fn connection_closed(&self) {
> +        self.active_connections.fetch_sub(1, Ordering::Relaxed);
> +        self.closed_connections.fetch_add(1, Ordering::Relaxed);
> +        tracing::debug!(
> +            active = self.active_connections.load(Ordering::Relaxed),
> +            closed = self.closed_connections.load(Ordering::Relaxed),
> +            "Connection closed"
> +        );
> +    }
> +
> +    /// Get current statistics (for monitoring/debugging)
> +    pub fn get(&self) -> (usize, usize) {
> +        (
> +            self.active_connections.load(Ordering::Relaxed),
> +            self.closed_connections.load(Ordering::Relaxed),
> +        )
> +    }
> +}
> +
> +/// libqb-compatible IPC server
> +pub struct Server {
> +    service_name: String,
> +
> +    // Setup socket (SOCK_STREAM) - accepts new connections
> +    setup_listener: Option<Arc<UnixListener>>,
> +
> +    // Per-connection state
> +    connections: Arc<Mutex<HashMap<u64, QbConnection>>>,
> +    next_conn_id: Arc<AtomicU64>,
> +
> +    // Connection statistics (matches libqb behavior)
> +    stats: Arc<ServerStats>,
> +
> +    // Message handler (trait object, also handles authentication)
> +    handler: Arc<dyn Handler>,
> +
> +    // Cancellation token for graceful shutdown
> +    cancellation_token: CancellationToken,
> +}
> +
> +impl Server {
> +    /// Create a new libqb-compatible IPC server
> +    ///
> +    /// Uses Linux abstract Unix sockets for IPC (no filesystem paths needed).
> +    ///
> +    /// # Arguments
> +    /// * `service_name` - Service name (e.g., "pve2"), used as abstract socket name
> +    /// * `handler` - Handler implementing the Handler trait (handles both authentication and requests)
> +    pub fn new(service_name: &str, handler: impl Handler + 'static) -> Self {
> +        Self {
> +            service_name: service_name.to_string(),
> +            setup_listener: None,
> +            connections: Arc::new(Mutex::new(HashMap::new())),
> +            next_conn_id: Arc::new(AtomicU64::new(1)),
> +            stats: Arc::new(ServerStats::new()),
> +            handler: Arc::new(handler),
> +            cancellation_token: CancellationToken::new(),
> +        }
> +    }
> +
> +    /// Start the IPC server
> +    ///
> +    /// Creates abstract Unix socket that libqb clients can connect to
> +    pub fn start(&mut self) -> Result<()> {
> +        tracing::info!(
> +            "Starting libqb-compatible IPC server: {}",
> +            self.service_name
> +        );
> +
> +        // Create abstract Unix socket (no filesystem paths needed)
> +        let std_listener =
> +            bind_abstract_socket(&self.service_name).context("Failed to bind abstract socket")?;
> +
> +        // Convert to tokio listener
> +        std_listener.set_nonblocking(true)?;
> +        let listener = UnixListener::from_std(std_listener)?;
> +
> +        tracing::info!("Bound abstract Unix socket: @{}", self.service_name);
> +
> +        let listener_arc = Arc::new(listener);
> +        self.setup_listener = Some(listener_arc.clone());
> +
> +        // Start connection acceptor task
> +        let context = AcceptorContext {
> +            listener: listener_arc,
> +            service_name: self.service_name.clone(),
> +            connections: self.connections.clone(),
> +            next_conn_id: self.next_conn_id.clone(),
> +            stats: self.stats.clone(),
> +            handler: self.handler.clone(),
> +            cancellation_token: self.cancellation_token.child_token(),
> +        };
> +
> +        tokio::spawn(async move {
> +            context.run().await;
> +        });
> +
> +        tracing::info!("libqb IPC server started: {}", self.service_name);
> +        Ok(())
> +    }
> +
> +    /// Stop the IPC server
> +    pub fn stop(&mut self) {
> +        tracing::info!("Stopping libqb IPC server: {}", self.service_name);
> +
> +        // Signal all tasks to stop
> +        self.cancellation_token.cancel();
> +
> +        // Close all connections
> +        let connections = std::mem::take(&mut *self.connections.lock());
> +        let num_connections = connections.len();
> +
> +        for (_id, conn) in connections {
> +            // Clean up ring buffer files
> +            for rb_path in &conn.ring_buffer_paths {
> +                if let Err(e) = std::fs::remove_file(rb_path) {
> +                    tracing::debug!(
> +                        "Failed to remove ring buffer file {} (may already be cleaned up): {}",
> +                        rb_path.display(),
> +                        e
> +                    );
> +                }
> +            }
> +
> +            // Update statistics
> +            self.stats.connection_closed();
> +
> +            // Task handles will be aborted when dropped
> +        }
> +
> +        // Final stats
> +        if num_connections > 0 {
> +            let (active, closed) = self.stats.get();
> +            tracing::info!(
> +                "Closed {} connections (final stats: active={}, closed={})",
> +                num_connections,
> +                active,
> +                closed
> +            );
> +        }
> +
> +        self.setup_listener = None;
> +
> +        tracing::info!("libqb IPC server stopped");
> +    }
> +}
> +
> +impl Drop for Server {
> +    fn drop(&mut self) {
> +        self.stop();
> +    }
> +}
> +
> +/// Context for the connection acceptor task
> +///
> +/// Bundles all the state needed by the acceptor loop to avoid passing many parameters.
> +struct AcceptorContext {
> +    listener: Arc<UnixListener>,
> +    service_name: String,
> +    connections: Arc<Mutex<HashMap<u64, QbConnection>>>,
> +    next_conn_id: Arc<AtomicU64>,
> +    stats: Arc<ServerStats>,
> +    handler: Arc<dyn Handler>,
> +    cancellation_token: CancellationToken,
> +}
> +
> +impl AcceptorContext {
> +    /// Run the connection acceptor loop
> +    ///
> +    /// Accepts new connections and spawns handler tasks for each.
> +    async fn run(self) {
> +        tracing::debug!("libqb IPC connection acceptor started");
> +
> +        loop {
> +            // Accept new connection with cancellation support
> +            let accept_result = tokio::select! {
> +                _ = self.cancellation_token.cancelled() => {
> +                    tracing::debug!("Connection acceptor cancelled");
> +                    break;
> +                }
> +                result = self.listener.accept() => result,
> +            };
> +
> +            let (stream, _addr) = match accept_result {
> +                Ok((stream, addr)) => (stream, addr),
> +                Err(e) => {
> +                    if !self.cancellation_token.is_cancelled() {
> +                        tracing::error!("Error accepting connection: {}", e);
> +                    }
> +                    break;
> +                }
> +            };
> +
> +            tracing::debug!("Accepted new setup connection");
> +
> +            // Handle connection
> +            let conn_id = self.next_conn_id.fetch_add(1, Ordering::SeqCst);
> +            match QbConnection::accept(
> +                stream,
> +                conn_id,
> +                &self.service_name,
> +                self.handler.clone(),
> +                self.cancellation_token.child_token(),
> +            )
> +            .await
> +            {
> +                Ok(conn) => {
> +                    self.connections.lock().insert(conn_id, conn);

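Connections are never removed from this map while the server is running
(only stop() drains it), so entries for closed connections accumulate,
which could result in a memory leak.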
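
One possible shape for the cleanup, as a rough sketch only: a guard owned
by the per-connection task removes the map entry when that task ends.
`Conn`, `ConnGuard` and `register` are hypothetical names for
illustration, std's Mutex stands in for parking_lot, and the real code
would also need to call stats.connection_closed() and remove the ring
buffer files at the same point.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

/// Stand-in for QbConnection (hypothetical; the real type owns tasks and ring buffers).
struct Conn;

/// Shared connection table, shaped like the one in Server/AcceptorContext.
type Connections = Arc<Mutex<HashMap<u64, Conn>>>;

/// Guard meant to be held by the per-connection task.
struct ConnGuard {
    id: u64,
    connections: Connections,
}

impl Drop for ConnGuard {
    fn drop(&mut self) {
        // Runs when the owning task finishes, fails, or is cancelled.
        if let Ok(mut map) = self.connections.lock() {
            map.remove(&self.id);
        }
    }
}

/// Insert the connection and hand back the guard that will remove it again.
fn register(connections: &Connections, id: u64, conn: Conn) -> ConnGuard {
    connections
        .lock()
        .expect("connection table lock poisoned")
        .insert(id, conn);
    ConnGuard {
        id,
        connections: Arc::clone(connections),
    }
}
```

With something like this, the acceptor would call register() instead of
inserting directly, and the task serving the connection would keep the
guard alive for as long as the client is connected.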

> +                    // Update statistics
> +                    self.stats.connection_created();
> +                }
> +                Err(e) => {
> +                    tracing::error!("Failed to accept connection {}: {}", conn_id, e);
> +                }
> +            }
> +        }
> +
> +        tracing::debug!("libqb IPC connection acceptor finished");
> +    }
> +}
> +
> +#[cfg(test)]
> +mod tests {
> +    use crate::protocol::*;
> +
> +    #[test]
> +    fn test_header_sizes() {
> +        // Verify C struct compatibility
> +        assert_eq!(std::mem::size_of::<RequestHeader>(), 16);
> +        assert_eq!(std::mem::align_of::<RequestHeader>(), 8);
> +        assert_eq!(std::mem::size_of::<ResponseHeader>(), 24);
> +        assert_eq!(std::mem::align_of::<ResponseHeader>(), 8);
> +    }
> +}
> diff --git a/src/pmxcfs-rs/pmxcfs-ipc/src/socket.rs b/src/pmxcfs-rs/pmxcfs-ipc/src/socket.rs
> new file mode 100644
> index 00000000..5831b329
> --- /dev/null
> +++ b/src/pmxcfs-rs/pmxcfs-ipc/src/socket.rs
> @@ -0,0 +1,84 @@
> +/// Abstract Unix socket utilities
> +///
> +/// This module provides functions for working with Linux abstract Unix sockets,
> +/// which are used by libqb for IPC communication.
> +use anyhow::Result;
> +use std::os::unix::io::FromRawFd;
> +use std::os::unix::net::UnixListener;
> +
> +/// Bind to an abstract Unix socket (Linux-specific)
> +///
> +/// Abstract sockets are identified by a name in the kernel's socket namespace,
> +/// not a filesystem path. They are automatically removed when all references are closed.
> +///
> +/// libqb clients create abstract sockets with FULL 108-byte sun_path (null-padded).
> +/// Linux abstract sockets are length-sensitive, so we must match exactly.
> +pub(super) fn bind_abstract_socket(name: &str) -> Result<UnixListener> {
> +    // Create a Unix socket using libc directly
> +    let sock_fd = unsafe { libc::socket(libc::AF_UNIX, libc::SOCK_STREAM, 0) };
> +    if sock_fd < 0 {
> +        anyhow::bail!(
> +            "Failed to create Unix socket: {}",
> +            std::io::Error::last_os_error()
> +        );
> +    }
> +
> +    // RAII guard to ensure socket is closed on error
> +    struct SocketGuard(i32);
> +    impl Drop for SocketGuard {
> +        fn drop(&mut self) {
> +            unsafe { libc::close(self.0) };
> +        }
> +    }
> +    let guard = SocketGuard(sock_fd);
> +
> +    // Create sockaddr_un with full 108-byte abstract address (matching libqb)
> +    // libqb format: sun_path[0] = '\0', sun_path[1..] = "name\0\0..." (null-padded)
> +    let mut addr: libc::sockaddr_un = unsafe { std::mem::zeroed() };
> +    addr.sun_family = libc::AF_UNIX as libc::sa_family_t;
> +
> +    // sun_path[0] is already 0 (abstract socket marker)
> +    // Copy name starting at sun_path[1]
> +    let name_bytes = name.as_bytes();
> +    let copy_len = name_bytes.len().min(107); // Leave room for initial \0
> +    unsafe {
> +        std::ptr::copy_nonoverlapping(
> +            name_bytes.as_ptr(),
> +            addr.sun_path.as_mut_ptr().offset(1) as *mut u8,
> +            copy_len,
> +        );
> +    }
> +
> +    // Use FULL sockaddr_un length for libqb compatibility!
> +    // libqb clients use the full 110-byte structure (2 + 108) when connecting,
> +    // so we MUST bind with the same length. Verified via strace.
> +    let addr_len = std::mem::size_of::<libc::sockaddr_un>() as libc::socklen_t;
> +    let bind_res = unsafe {
> +        libc::bind(
> +            sock_fd,
> +            &addr as *const _ as *const libc::sockaddr,
> +            addr_len,
> +        )
> +    };
> +    if bind_res < 0 {
> +        anyhow::bail!(
> +            "Failed to bind abstract socket: {}",
> +            std::io::Error::last_os_error()
> +        );
> +    }
> +
> +    // Set socket to listen mode (backlog = 128)
> +    let listen_res = unsafe { libc::listen(sock_fd, 128) };
> +    if listen_res < 0 {
> +        anyhow::bail!(
> +            "Failed to listen on socket: {}",
> +            std::io::Error::last_os_error()
> +        );
> +    }
> +
> +    // Convert raw fd to UnixListener (takes ownership, forget guard)
> +    std::mem::forget(guard);
> +    let listener = unsafe { UnixListener::from_raw_fd(sock_fd) };
> +
> +    Ok(listener)
> +}
> diff --git a/src/pmxcfs-rs/pmxcfs-ipc/tests/auth_test.rs b/src/pmxcfs-rs/pmxcfs-ipc/tests/auth_test.rs
> new file mode 100644
> index 00000000..f8e541b0
> --- /dev/null
> +++ b/src/pmxcfs-rs/pmxcfs-ipc/tests/auth_test.rs
> @@ -0,0 +1,450 @@
> +//! Authentication tests for pmxcfs-ipc
> +//!
> +//! These tests verify that the Handler::authenticate() mechanism works correctly
> +//! for different authentication policies.
> +//!
> +//! Note: These tests use real Unix sockets, so they test authentication behavior
> +//! from the server's perspective. The UID/GID will be the test process's credentials,
> +//! so we test the Handler logic rather than OS-level credential checking.
> +use async_trait::async_trait;
> +use pmxcfs_ipc::{Handler, Permissions, Request, Response, Server};
> +use pmxcfs_test_utils::wait_for_condition_blocking;
> +use std::sync::Arc;
> +use std::sync::atomic::{AtomicU32, Ordering};
> +use std::thread;
> +use std::time::Duration;
> +
> +/// Helper to create a unique service name for each test
> +fn unique_service_name() -> String {
> +    static COUNTER: AtomicU32 = AtomicU32::new(0);
> +    format!("auth-test-{}", COUNTER.fetch_add(1, Ordering::SeqCst))
> +}
> +
> +/// Helper to connect using the qb_wire_compat FFI client
> +/// Returns true if connection succeeded, false if rejected
> +fn try_connect(service_name: &str) -> bool {
> +    use std::ffi::CString;
> +
> +    #[repr(C)]
> +    struct QbIpccConnection {
> +        _private: [u8; 0],
> +    }
> +
> +    #[link(name = "qb")]
> +    unsafe extern "C" {
> +        fn qb_ipcc_connect(name: *const libc::c_char, max_msg_size: usize)
> +        -> *mut QbIpccConnection;
> +        fn qb_ipcc_disconnect(conn: *mut QbIpccConnection);
> +    }
> +
> +    let name = CString::new(service_name).expect("Invalid service name");
> +    let conn = unsafe { qb_ipcc_connect(name.as_ptr(), 8192) };
> +
> +    let success = !conn.is_null();
> +
> +    if success {
> +        unsafe { qb_ipcc_disconnect(conn) };
> +    }
> +
> +    success
> +}
> +
> +// ============================================================================
> +// Test Handlers with Different Authentication Policies
> +// ============================================================================
> +
> +/// Handler that accepts all connections with read-write access
> +struct AcceptAllHandler;
> +
> +#[async_trait]
> +impl Handler for AcceptAllHandler {
> +    fn authenticate(&self, _uid: u32, _gid: u32) -> Option<Permissions> {
> +        Some(Permissions::ReadWrite)
> +    }
> +
> +    async fn handle(&self, _request: Request) -> Response {
> +        Response::ok(b"test".to_vec())
> +    }
> +}
> +
> +/// Handler that rejects all connections
> +struct RejectAllHandler;
> +
> +#[async_trait]
> +impl Handler for RejectAllHandler {
> +    fn authenticate(&self, _uid: u32, _gid: u32) -> Option<Permissions> {
> +        None
> +    }
> +
> +    async fn handle(&self, _request: Request) -> Response {
> +        Response::ok(b"test".to_vec())
> +    }
> +}
> +
> +/// Handler that only accepts root (uid=0)
> +struct RootOnlyHandler;
> +
> +#[async_trait]
> +impl Handler for RootOnlyHandler {
> +    fn authenticate(&self, uid: u32, _gid: u32) -> Option<Permissions> {
> +        if uid == 0 {
> +            Some(Permissions::ReadWrite)
> +        } else {
> +            None
> +        }
> +    }
> +
> +    async fn handle(&self, _request: Request) -> Response {
> +        Response::ok(b"test".to_vec())
> +    }
> +}
> +
> +/// Handler that tracks authentication calls
> +struct TrackingHandler {
> +    call_count: Arc<AtomicU32>,
> +    last_uid: Arc<AtomicU32>,
> +    last_gid: Arc<AtomicU32>,
> +}
> +
> +impl TrackingHandler {
> +    fn new() -> (Self, Arc<AtomicU32>, Arc<AtomicU32>, Arc<AtomicU32>) {
> +        let call_count = Arc::new(AtomicU32::new(0));
> +        let last_uid = Arc::new(AtomicU32::new(0));
> +        let last_gid = Arc::new(AtomicU32::new(0));
> +
> +        (
> +            Self {
> +                call_count: call_count.clone(),
> +                last_uid: last_uid.clone(),
> +                last_gid: last_gid.clone(),
> +            },
> +            call_count,
> +            last_uid,
> +            last_gid,
> +        )
> +    }
> +}
> +
> +#[async_trait]
> +impl Handler for TrackingHandler {
> +    fn authenticate(&self, uid: u32, gid: u32) -> Option<Permissions> {
> +        self.call_count.fetch_add(1, Ordering::SeqCst);
> +        self.last_uid.store(uid, Ordering::SeqCst);
> +        self.last_gid.store(gid, Ordering::SeqCst);
> +        Some(Permissions::ReadWrite)
> +    }
> +
> +    async fn handle(&self, _request: Request) -> Response {
> +        Response::ok(b"test".to_vec())
> +    }
> +}
> +
> +/// Handler that grants read-only access to non-root
> +struct ReadOnlyForNonRootHandler;
> +
> +#[async_trait]
> +impl Handler for ReadOnlyForNonRootHandler {
> +    fn authenticate(&self, uid: u32, _gid: u32) -> Option<Permissions> {
> +        if uid == 0 {
> +            Some(Permissions::ReadWrite)
> +        } else {
> +            Some(Permissions::ReadOnly)
> +        }
> +    }
> +
> +    async fn handle(&self, request: Request) -> Response {
> +        // read_only field is visible to the handler via the connection
> +        // For testing purposes, just accept requests
> +        Response::ok(format!("handled msg_id {}", request.msg_id).into_bytes())
> +    }
> +}
> +
> +// ============================================================================
> +// Helper to start server in background thread
> +// ============================================================================
> +
> +fn start_server<H: Handler + 'static>(service_name: String, handler: H) -> thread::JoinHandle<()> {
> +    thread::spawn(move || {
> +        let rt = tokio::runtime::Runtime::new().expect("Failed to create tokio runtime");
> +        rt.block_on(async {
> +            let mut server = Server::new(&service_name, handler);
> +            server.start().expect("Server startup failed");
> +            std::future::pending::<()>().await;
> +        });
> +    })
> +}
> +
> +/// Wait for server to be ready by checking if socket file exists
> +fn wait_for_server_ready(service_name: &str) {

Isn't there already another wait_for_server_ready? Please check.
Can we move this into shared test utils instead of duplicating it?

> +    // The socket is created in /dev/shm/qb-{service_name}-*
> +    // We'll just try to connect repeatedly until successful or timeout
> +    assert!(
> +        wait_for_condition_blocking(
> +            || {
> +                // Try a quick connection attempt
> +                // For servers that accept connections, this will succeed
> +                // For servers that reject, the socket will at least exist
> +
> +                let socket_pattern = format!("/dev/shm/qb-{service_name}-");
> +                // Check if any socket file matching the pattern exists
> +                if let Ok(entries) = std::fs::read_dir("/dev/shm") {
> +                    for entry in entries.flatten() {
> +                        if let Ok(name) = entry.file_name().into_string()
> +                            && name.starts_with(&socket_pattern)
> +                        {
> +                            return true;
> +                        }
> +                    }
> +                }
> +                false
> +            },
> +            Duration::from_secs(5),
> +            Duration::from_millis(10),
> +        ),
> +        "Server should be ready within 5 seconds"

This helper can time out even though the server is already listening:
in the "reject all" case the server never creates ring buffer files, so
the check never succeeds.
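
Note also that entry.file_name() yields only the final path component,
so comparing it against a prefix that starts with "/dev/shm/" cannot
match as written.

A readiness check that does not depend on ring buffer files could probe
the setup socket directly. The sketch below assumes the address layout
from bind_abstract_socket() (name followed by NUL padding to the full
sun_path size); `padded_abstract_addr` and `wait_for_listener` are
hypothetical names. A plain connect() succeeds as soon as the server
listens, regardless of what authenticate() returns later. The server
will see a setup connection that closes without a handshake, which it
should treat as a failed accept.

```rust
use std::os::linux::net::SocketAddrExt;
use std::os::unix::net::{SocketAddr, UnixStream};
use std::time::{Duration, Instant};

/// Build the abstract address the way bind_abstract_socket() does:
/// the service name followed by NUL padding up to the full sun_path size.
fn padded_abstract_addr(service_name: &str) -> std::io::Result<SocketAddr> {
    // sun_path is 108 bytes on Linux; byte 0 is the abstract-namespace marker,
    // which leaves 107 bytes for the (padded) name.
    let mut name = [0u8; 107];
    let bytes = service_name.as_bytes();
    let len = bytes.len().min(name.len());
    name[..len].copy_from_slice(&bytes[..len]);
    SocketAddr::from_abstract_name(name)
}

/// Poll until a plain connect() to the setup socket succeeds or the timeout expires.
fn wait_for_listener(service_name: &str, timeout: Duration) -> bool {
    let Ok(addr) = padded_abstract_addr(service_name) else {
        return false;
    };
    let deadline = Instant::now() + timeout;
    loop {
        // The probe stream is dropped (closed) immediately after connecting.
        if UnixStream::connect_addr(&addr).is_ok() {
            return true;
        }
        if Instant::now() >= deadline {
            return false;
        }
        std::thread::sleep(Duration::from_millis(10));
    }
}
```

A helper along these lines could live in pmxcfs_test_utils and be used
by both test files, e.g. wait_for_listener(&service_name,
Duration::from_secs(5)).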

> +    );
> +}
> +
> +// ============================================================================
> +// Tests
> +// ============================================================================
> +
> +#[test]
> +#[ignore] // Requires libqb-dev
> +fn test_accept_all_handler() {
> +    let service_name = unique_service_name();
> +    let _server = start_server(service_name.clone(), AcceptAllHandler);
> +
> +    wait_for_server_ready(&service_name);
> +
> +    assert!(
> +        try_connect(&service_name),
> +        "AcceptAllHandler should accept connection"
> +    );
> +}
> +
> +#[test]
> +#[ignore] // Requires libqb-dev
> +fn test_reject_all_handler() {
> +    let service_name = unique_service_name();
> +    let _server = start_server(service_name.clone(), RejectAllHandler);
> +
> +    wait_for_server_ready(&service_name);
> +
> +    assert!(
> +        !try_connect(&service_name),
> +        "RejectAllHandler should reject connection"
> +    );
> +}
> +
> +#[test]
> +#[ignore] // Requires libqb-dev
> +fn test_root_only_handler() {
> +    let service_name = unique_service_name();
> +    let _server = start_server(service_name.clone(), RootOnlyHandler);
> +
> +    wait_for_server_ready(&service_name);
> +
> +    let connected = try_connect(&service_name);
> +
> +    // Get current uid
> +    let current_uid = unsafe { libc::getuid() };
> +
> +    if current_uid == 0 {
> +        assert!(
> +            connected,
> +            "RootOnlyHandler should accept connection when running as root"
> +        );
> +    } else {
> +        assert!(
> +            !connected,
> +            "RootOnlyHandler should reject connection when not running as root (uid={current_uid})"
> +        );
> +    }
> +}
> +
> +#[test]
> +#[ignore] // Requires libqb-dev
> +fn test_authentication_called_with_credentials() {
> +    let service_name = unique_service_name();
> +    let (handler, call_count, last_uid, last_gid) = TrackingHandler::new();
> +    let _server = start_server(service_name.clone(), handler);
> +
> +    wait_for_server_ready(&service_name);
> +
> +    let current_uid = unsafe { libc::getuid() };
> +    let current_gid = unsafe { libc::getgid() };
> +
> +    assert_eq!(
> +        call_count.load(Ordering::SeqCst),
> +        0,
> +        "Should not be called yet"
> +    );
> +
> +    let connected = try_connect(&service_name);
> +
> +    assert!(connected, "TrackingHandler should accept connection");
> +    assert_eq!(
> +        call_count.load(Ordering::SeqCst),
> +        1,
> +        "authenticate() should be called once"
> +    );
> +    assert_eq!(
> +        last_uid.load(Ordering::SeqCst),
> +        current_uid,
> +        "authenticate() should receive correct uid"
> +    );
> +    assert_eq!(
> +        last_gid.load(Ordering::SeqCst),
> +        current_gid,
> +        "authenticate() should receive correct gid"
> +    );
> +}
> +
> +#[test]
> +#[ignore] // Requires libqb-dev
> +fn test_multiple_connections_call_authenticate_each_time() {
> +    let service_name = unique_service_name();
> +    let (handler, call_count, _, _) = TrackingHandler::new();
> +    let _server = start_server(service_name.clone(), handler);
> +
> +    wait_for_server_ready(&service_name);
> +
> +    // First connection
> +    assert!(try_connect(&service_name));
> +    assert_eq!(call_count.load(Ordering::SeqCst), 1);
> +
> +    // Second connection
> +    assert!(try_connect(&service_name));
> +    assert_eq!(call_count.load(Ordering::SeqCst), 2);
> +
> +    // Third connection
> +    assert!(try_connect(&service_name));
> +    assert_eq!(call_count.load(Ordering::SeqCst), 3);
> +}
> +
> +#[test]
> +#[ignore] // Requires libqb-dev
> +fn test_read_only_permissions_accepted() {
> +    let service_name = unique_service_name();
> +    let _server = start_server(service_name.clone(), ReadOnlyForNonRootHandler);
> +
> +    wait_for_server_ready(&service_name);
> +
> +    // Connection should succeed regardless of whether we get ReadOnly or ReadWrite
> +    // (both are accepted, just with different permissions)
> +    assert!(
> +        try_connect(&service_name),
> +        "ReadOnlyForNonRootHandler should accept connections with appropriate permissions"
> +    );
> +}
> +
> +/// Test that demonstrates the authentication policy is enforced at connection time
> +#[test]
> +#[ignore] // Requires libqb-dev
> +fn test_authentication_enforced_at_connection_time() {
> +    // This test verifies that authentication happens during connection setup,
> +    // not during request handling
> +    let service_name = unique_service_name();
> +    let _server = start_server(service_name.clone(), RejectAllHandler);
> +
> +    wait_for_server_ready(&service_name);
> +
> +    // Connection should fail immediately, before any request is sent
> +    let start = std::time::Instant::now();
> +    let connected = try_connect(&service_name);
> +    let duration = start.elapsed();
> +
> +    assert!(!connected, "Connection should be rejected");
> +    assert!(
> +        duration < Duration::from_millis(100),
> +        "Rejection should happen quickly during handshake, not during request processing"
> +    );
> +}
> +
> +#[cfg(test)]
> +mod policy_examples {
> +    use super::*;
> +
> +    /// Example: Handler that mimics Proxmox VE authentication policy
> +    /// - Root (uid=0) gets read-write
> +    /// - www-data (uid=33) gets read-only (for web UI)
> +    /// - Others are rejected
> +    struct ProxmoxStyleHandler;
> +
> +    #[async_trait]
> +    impl Handler for ProxmoxStyleHandler {
> +        fn authenticate(&self, uid: u32, _gid: u32) -> Option<Permissions> {
> +            match uid {
> +                0 => Some(Permissions::ReadWrite), // root
> +                33 => Some(Permissions::ReadOnly), // www-data
> +                _ => None,                         // reject others
> +            }
> +        }
> +
> +        async fn handle(&self, request: Request) -> Response {
> +            // In real implementation, would check request.read_only
> +            // to enforce read-only restrictions
> +            Response::ok(format!("msg_id {}", request.msg_id).into_bytes())
> +        }
> +    }
> +
> +    #[test]
> +    #[ignore] // Requires libqb-dev
> +    fn test_proxmox_style_policy() {
> +        let service_name = unique_service_name();
> +        let _server = start_server(service_name.clone(), ProxmoxStyleHandler);
> +
> +        wait_for_server_ready(&service_name);
> +
> +        let current_uid = unsafe { libc::getuid() };
> +        let connected = try_connect(&service_name);
> +
> +        match current_uid {
> +            0 => assert!(connected, "Root should be accepted"),
> +            33 => assert!(connected, "www-data should be accepted"),
> +            _ => assert!(!connected, "Other users should be rejected"),
> +        }
> +    }
> +
> +    /// Example: Handler that uses group-based authentication
> +    struct GroupBasedHandler {
> +        allowed_gid: u32,
> +    }
> +
> +    impl GroupBasedHandler {
> +        fn new(allowed_gid: u32) -> Self {
> +            Self { allowed_gid }
> +        }
> +    }
> +
> +    #[async_trait]
> +    impl Handler for GroupBasedHandler {
> +        fn authenticate(&self, _uid: u32, gid: u32) -> Option<Permissions> {
> +            if gid == self.allowed_gid {
> +                Some(Permissions::ReadWrite)
> +            } else {
> +                None
> +            }
> +        }
> +
> +        async fn handle(&self, _request: Request) -> Response {
> +            Response::ok(b"ok".to_vec())
> +        }
> +    }
> +
> +    #[test]
> +    #[ignore] // Requires libqb-dev
> +    fn test_group_based_authentication() {
> +        let service_name = unique_service_name();
> +        let current_gid = unsafe { libc::getgid() };
> +        let _server = start_server(service_name.clone(), GroupBasedHandler::new(current_gid));
> +
> +        wait_for_server_ready(&service_name);
> +
> +        assert!(
> +            try_connect(&service_name),
> +            "Should accept connection from same group"
> +        );
> +    }
> +}
> diff --git a/src/pmxcfs-rs/pmxcfs-ipc/tests/qb_wire_compat.rs b/src/pmxcfs-rs/pmxcfs-ipc/tests/qb_wire_compat.rs
> new file mode 100644
> index 00000000..8c0db962
> --- /dev/null
> +++ b/src/pmxcfs-rs/pmxcfs-ipc/tests/qb_wire_compat.rs
> @@ -0,0 +1,413 @@
> +//! Wire protocol compatibility test with libqb C clients
> +//!
> +//! This integration test verifies that our Rust Server is fully compatible
> +//! with real libqb C clients by using libqb's client API via FFI.
> +//!
> +//! Run with: cargo test --package pmxcfs-ipc --test qb_wire_compat -- --ignored --nocapture
> +//!
> +//! Requires: libqb-dev installed
> +
> +use pmxcfs_test_utils::wait_for_condition_blocking;
> +use std::ffi::CString;
> +use std::thread;
> +use std::time::Duration;
> +
> +// ============================================================================
> +// Minimal libqb FFI bindings (client-side only)
> +// ============================================================================
> +
> +/// libqb request header matching C's __attribute__ ((aligned(8)))
> +/// Each field is i32 with 8-byte alignment, achieved via explicit padding
> +#[repr(C, align(8))]
> +#[derive(Debug, Copy, Clone)]
> +struct QbIpcRequestHeader {
> +    id: i32,    // 4 bytes
> +    _pad1: u32, // 4 bytes padding
> +    size: i32,  // 4 bytes
> +    _pad2: u32, // 4 bytes padding
> +}
> +
> +/// libqb response header matching C's __attribute__ ((aligned(8)))
> +/// Each field is i32 with 8-byte alignment, achieved via explicit padding
> +#[repr(C, align(8))]
> +#[derive(Debug, Copy, Clone)]
> +struct QbIpcResponseHeader {
> +    id: i32,    // 4 bytes
> +    _pad1: u32, // 4 bytes padding
> +    size: i32,  // 4 bytes
> +    _pad2: u32, // 4 bytes padding
> +    error: i32, // 4 bytes
> +    _pad3: u32, // 4 bytes padding
> +}
> +
> +// Opaque type for connection handle
> +#[repr(C)]
> +struct QbIpccConnection {
> +    _private: [u8; 0],
> +}
> +
> +#[link(name = "qb")]
> +unsafe extern "C" {
> +    /// Connect to a QB IPC service
> +    /// Returns NULL on failure
> +    fn qb_ipcc_connect(name: *const libc::c_char, max_msg_size: usize) -> *mut QbIpccConnection;
> +
> +    /// Send request and receive response (with iovec)
> +    /// Returns number of bytes received, or negative errno on error
> +    fn qb_ipcc_sendv_recv(
> +        conn: *mut QbIpccConnection,
> +        iov: *const libc::iovec,
> +        iov_len: u32,
> +        res_buf: *mut libc::c_void,
> +        res_buf_size: usize,
> +        timeout_ms: i32,
> +    ) -> libc::ssize_t;
> +
> +    /// Disconnect from service
> +    fn qb_ipcc_disconnect(conn: *mut QbIpccConnection);
> +
> +    /// Initialize libqb logging
> +    fn qb_log_init(name: *const libc::c_char, facility: i32, priority: i32);
> +
> +    /// Control log targets
> +    fn qb_log_ctl(target: i32, conf: i32, arg: i32) -> i32;
> +
> +    /// Filter control
> +    fn qb_log_filter_ctl(
> +        target: i32,
> +        op: i32,
> +        type_: i32,
> +        text: *const libc::c_char,
> +        priority: i32,
> +    ) -> i32;
> +}
> +
> +// Log targets
> +const QB_LOG_STDERR: i32 = 2;
> +
> +// Log control operations
> +const QB_LOG_CONF_ENABLED: i32 = 1;
> +
> +// Log filter operations
> +const QB_LOG_FILTER_ADD: i32 = 0;
> +const QB_LOG_FILTER_FILE: i32 = 1;
> +
> +// Log levels (from syslog.h)
> +const LOG_TRACE: i32 = 8; // LOG_DEBUG + 1
> +
> +// ============================================================================
> +// Safe Rust wrapper around libqb client
> +// ============================================================================
> +
> +struct QbIpcClient {
> +    conn: *mut QbIpccConnection,
> +}
> +
> +impl QbIpcClient {
> +    fn connect(service_name: &str, max_msg_size: usize) -> Result<Self, String> {
> +        let name = CString::new(service_name).map_err(|e| format!("Invalid service name: {e}"))?;
> +
> +        let conn = unsafe { qb_ipcc_connect(name.as_ptr(), max_msg_size) };
> +
> +        if conn.is_null() {
> +            let errno = unsafe { *libc::__errno_location() };
> +            let error_str = unsafe {
> +                let err_ptr = libc::strerror(errno);
> +                std::ffi::CStr::from_ptr(err_ptr)
> +                    .to_string_lossy()
> +                    .to_string()
> +            };
> +            Err(format!(
> +                "qb_ipcc_connect returned NULL (errno={errno}: {error_str})"
> +            ))
> +        } else {
> +            Ok(Self { conn })
> +        }
> +    }
> +
> +    fn send_recv(
> +        &self,
> +        request_id: i32,
> +        request_data: &[u8],
> +        timeout_ms: i32,
> +    ) -> Result<(i32, Vec<u8>), String> {
> +        // Build request
> +        let req_header = QbIpcRequestHeader {
> +            id: request_id,
> +            _pad1: 0,
> +            size: (std::mem::size_of::<QbIpcRequestHeader>() + request_data.len()) as i32,
> +            _pad2: 0,
> +        };
> +
> +        // Setup iovec
> +        let mut iov = vec![libc::iovec {
> +            iov_base: &req_header as *const _ as *mut libc::c_void,
> +            iov_len: std::mem::size_of::<QbIpcRequestHeader>(),
> +        }];
> +
> +        if !request_data.is_empty() {
> +            iov.push(libc::iovec {
> +                iov_base: request_data.as_ptr() as *mut libc::c_void,
> +                iov_len: request_data.len(),
> +            });
> +        }
> +
> +        // Response buffer
> +        const MAX_RESPONSE: usize = 8192 * 128;
> +        let mut resp_buf = vec![0u8; MAX_RESPONSE];
> +
> +        // Send and receive
> +        let result = unsafe {
> +            qb_ipcc_sendv_recv(
> +                self.conn,
> +                iov.as_ptr(),
> +                iov.len() as u32,
> +                resp_buf.as_mut_ptr() as *mut libc::c_void,
> +                resp_buf.len(),
> +                timeout_ms,
> +            )
> +        };
> +
> +        if result < 0 {
> +            return Err(format!("qb_ipcc_sendv_recv failed: {}", -result));
> +        }
> +
> +        let bytes_received = result as usize;
> +
> +        // Parse response header
> +        if bytes_received < std::mem::size_of::<QbIpcResponseHeader>() {
> +            return Err("Response too short".to_string());
> +        }
> +
> +        let resp_header = unsafe { *(resp_buf.as_ptr() as *const QbIpcResponseHeader) };
> +
> +        // Verify response ID matches request
> +        if resp_header.id != request_id {
> +            return Err(format!(
> +                "Response ID mismatch: expected {}, got {}",
> +                request_id, resp_header.id
> +            ));
> +        }
> +
> +        // Extract data
> +        let data_start = std::mem::size_of::<QbIpcResponseHeader>();
> +        let data = resp_buf[data_start..bytes_received].to_vec();
> +
> +        Ok((resp_header.error, data))
> +    }
> +}
> +
> +impl Drop for QbIpcClient {
> +    fn drop(&mut self) {
> +        unsafe {
> +            qb_ipcc_disconnect(self.conn);
> +        }
> +    }
> +}
> +
> +// ============================================================================
> +// Integration Test
> +// ============================================================================
> +
> +#[test]
> +#[ignore] // Run with: cargo test -- --ignored
> +fn test_libqb_wire_protocol_compatibility() {
> +    eprintln!("🧪 Starting wire protocol compatibility test");
> +
> +    // Check if libqb is available
> +    eprintln!("🔍 Checking if libqb is available...");
> +    if !check_libqb_available() {
> +        eprintln!("⏭️  SKIP: libqb not installed");
> +        eprintln!("   Install with: sudo apt-get install libqb-dev");
> +        return;
> +    }
> +    eprintln!("✓ libqb is available");
> +
> +    // Start test server
> +    eprintln!("🚀 Starting test server...");
> +    let server_handle = start_test_server();
> +    eprintln!("✓ Server thread spawned");
> +
> +    // Wait for server to be ready
> +    eprintln!("⏳ Waiting for server initialization...");
> +    wait_for_server_ready("pve2");
> +    eprintln!("✓ Server is ready");
> +
> +    // Run tests
> +    eprintln!("🧪 Running client tests...");
> +    let test_result = run_client_tests();
> +
> +    // Cleanup
> +    drop(server_handle);
> +
> +    // Assert results
> +    assert!(
> +        test_result.is_ok(),
> +        "Client tests failed: {:?}",
> +        test_result.err()
> +    );
> +}
> +
> +fn check_libqb_available() -> bool {
> +    std::process::Command::new("pkg-config")
> +        .args(["--exists", "libqb"])
> +        .status()
> +        .map(|s| s.success())
> +        .unwrap_or(false)
> +}
> +
> +fn start_test_server() -> thread::JoinHandle<()> {
> +    use async_trait::async_trait;
> +    use pmxcfs_ipc::{Handler, Request, Response, Server};
> +
> +    // Create test handler
> +    struct TestHandler;
> +
> +    #[async_trait]
> +    impl Handler for TestHandler {
> +        fn authenticate(&self, _uid: u32, _gid: u32) -> Option<pmxcfs_ipc::Permissions> {
> +            // Accept all connections with read-write access for testing
> +            Some(pmxcfs_ipc::Permissions::ReadWrite)
> +        }
> +
> +        async fn handle(&self, request: Request) -> Response {
> +            match request.msg_id {
> +                1 => {
> +                    // CFS_IPC_GET_FS_VERSION
> +                    let response_str = r#"{"version":1,"protocol":1}"#;
> +                    Response::ok(response_str.as_bytes().to_vec())
> +                }
> +                2 => {
> +                    // CFS_IPC_GET_CLUSTER_INFO
> +                    let response_str = r#"{"nodes":[],"quorate":false}"#;
> +                    Response::ok(response_str.as_bytes().to_vec())
> +                }
> +                3 => {
> +                    // CFS_IPC_GET_GUEST_LIST
> +                    let response_str = r#"{"data":[]}"#;
> +                    Response::ok(response_str.as_bytes().to_vec())
> +                }
> +                _ => Response::err(-libc::EINVAL),
> +            }
> +        }
> +    }
> +
> +    // Spawn server thread with tokio runtime
> +    thread::spawn(move || {
> +        // Initialize tracing for server (WARN level - silent on success)
> +        tracing_subscriber::fmt()
> +            .with_max_level(tracing::Level::WARN)
> +            .with_target(false)
> +            .init();
> +
> +        // Create tokio runtime for async server
> +        let rt = tokio::runtime::Runtime::new().expect("Failed to create tokio runtime");
> +
> +        rt.block_on(async {
> +            let mut server = Server::new("pve2", TestHandler);
> +
> +            // Server uses abstract Unix socket (Linux-specific)
> +            if let Err(e) = server.start() {
> +                eprintln!("Server startup failed: {e}");
> +                eprintln!("Error details: {e:?}");
> +                panic!("Server startup failed");
> +            }
> +
> +            // Give tokio a chance to start the acceptor task
> +            tokio::task::yield_now().await;
> +
> +            // Block forever to keep server alive
> +            std::future::pending::<()>().await;
> +        });
> +    })
> +}
> +
> +/// Wait for server to be ready by checking if socket file exists
> +fn wait_for_server_ready(service_name: &str) {
> +    assert!(
> +        wait_for_condition_blocking(
> +            || {
> +                // Check if socket file exists in /dev/shm
> +                let socket_pattern = format!("/dev/shm/qb-{service_name}-");
> +                if let Ok(entries) = std::fs::read_dir("/dev/shm") {
> +                    for entry in entries.flatten() {
> +                        if let Ok(name) = entry.file_name().into_string()
> +                            && name.starts_with(&socket_pattern)
> +                        {
> +                            return true;
> +                        }
> +                    }
> +                }
> +                false
> +            },
> +            Duration::from_secs(5),
> +            Duration::from_millis(10),
> +        ),
> +        "Server should be ready within 5 seconds"

The /dev/shm/qb-* ringbuffer files are created per accepted connection. 
For RejectAllHandler no connection is accepted, so no files appear and 
this can time out even though the server is already ready.
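
One possible direction, as a rough sketch only: poll a probe that actually
attempts a connection instead of looking for ring buffer files. `poll_until`
is a hypothetical helper name, not something from this series, and what
counts as "ready" for a rejecting handler is an assumption that would need
checking against libqb.

```rust
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical helper: call `probe` until it returns true or `timeout`
/// elapses. Returns whether the probe succeeded in time.
fn poll_until(mut probe: impl FnMut() -> bool, timeout: Duration, interval: Duration) -> bool {
    let deadline = Instant::now() + timeout;
    loop {
        // Always probe at least once, even with a zero timeout.
        if probe() {
            return true;
        }
        if Instant::now() >= deadline {
            return false;
        }
        thread::sleep(interval);
    }
}
```

In the test, the probe could be something like
`|| QbIpcClient::connect("pve2", 8192 * 1024).is_ok()`. For a handler that
rejects everything, the probe would probably have to treat an explicit
rejection as "server is up" as well.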

> +    );
> +}
> +
> +fn run_client_tests() -> Result<(), String> {
> +    // Enable libqb debug logging to see what's happening
> +    eprintln!("🔧 Enabling libqb debug logging...");
> +    unsafe {
> +        let name = CString::new("qb_test").unwrap();
> +        qb_log_init(name.as_ptr(), libc::LOG_USER, LOG_TRACE);
> +        qb_log_ctl(QB_LOG_STDERR, QB_LOG_CONF_ENABLED, 1);
> +        // Enable all log messages from all files at TRACE level
> +        let all_files = CString::new("*").unwrap();
> +        qb_log_filter_ctl(
> +            QB_LOG_STDERR,
> +            QB_LOG_FILTER_ADD,
> +            QB_LOG_FILTER_FILE,
> +            all_files.as_ptr(),
> +            LOG_TRACE,
> +        );
> +    }
> +    eprintln!("✓ libqb logging enabled (TRACE level)");
> +
> +    eprintln!("📡 Connecting to server...");

nit: for consistency with the other repos, I suggest removing the emojis :)
This also applies to the other patches.

> +    // Connect to abstract socket "pve2"
> +    // Use a very large buffer size to rule out space issues
> +    let client = QbIpcClient::connect("pve2", 8192 * 1024)?; // 8MB instead of 1MB
> +    eprintln!("✓ Connected successfully");
> +
> +    eprintln!("🧪 Test 1: GET_FS_VERSION");
> +    // Test 1: GET_FS_VERSION
> +    let (error, data) = client.send_recv(1, &[], 5000)?;
> +    eprintln!("✓ Got response: error={}, data_len={}", error, data.len());
> +    if error == 0 {
> +        let response = String::from_utf8_lossy(&data);
> +        eprintln!("  Response: {response}");
> +        assert!(
> +            response.contains("version"),
> +            "Response should contain version field"
> +        );
> +    }
> +
> +    eprintln!("🧪 Test 2: GET_CLUSTER_INFO");
> +    // Test 2: GET_CLUSTER_INFO
> +    let (error, data) = client.send_recv(2, &[], 5000)?;
> +    eprintln!("✓ Got response: error={}, data_len={}", error, data.len());
> +    if error == 0 {
> +        let response = String::from_utf8_lossy(&data);
> +        eprintln!("  Response: {response}");
> +        assert!(
> +            response.contains("nodes"),
> +            "Response should contain nodes field"
> +        );
> +    }
> +
> +    eprintln!("🧪 Test 3: Request with data payload");
> +    // Test 3: Request with data payload
> +    let test_payload = b"test_payload_data";
> +    let (_error, _data) = client.send_recv(1, test_payload, 5000)?;
> +    eprintln!("✓ Request with payload succeeded");
> +
> +    eprintln!("🧪 Test 4: GET_GUEST_LIST");
> +    // Test 4: GET_GUEST_LIST
> +    let (_error, _data) = client.send_recv(3, &[], 5000)?;
> +    eprintln!("✓ GET_GUEST_LIST succeeded");
> +
> +    Ok(())
> +}

Can we please additionally test the following?
- the expected behaviour when the ring buffer is full
- whether ring buffer files are deleted on connection disconnect
- adversarial inputs
- graceful shutdown
- concurrent connections
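
For the concurrent case, the test could be shaped roughly like this. It is a
minimal sketch with a made-up helper name (`run_concurrently`); the actual
per-client work is left to the caller.

```rust
use std::sync::Arc;
use std::thread;

/// Hypothetical helper: run `work(i)` on `n` threads at once and collect the
/// per-thread results in spawn order. A panicking thread is reported as Err.
fn run_concurrently<F>(n: usize, work: F) -> Vec<Result<(), String>>
where
    F: Fn(usize) -> Result<(), String> + Send + Sync + 'static,
{
    let work = Arc::new(work);
    let handles: Vec<_> = (0..n)
        .map(|i| {
            let work = Arc::clone(&work);
            thread::spawn(move || work(i))
        })
        .collect();

    handles
        .into_iter()
        .map(|h| {
            h.join()
                .unwrap_or_else(|_| Err("client thread panicked".to_string()))
        })
        .collect()
}
```

Each worker would open its own QbIpcClient inside the closure and issue a
send_recv() call, mapping the outcome to `Result<(), String>`. Creating the
client inside the thread also sidesteps the fact that a struct holding a raw
pointer is not `Send`.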




^ permalink raw reply	[relevance 5%]

* [PATCH proxmox-backup v5 4/4] pbs-config: add TTL window to token secret cache
  2026-02-17 11:12 14% [PATCH proxmox{-backup,,-datacenter-manager} v5 00/11] token-shadow: reduce api token verification overhead Samuel Rufinatscha
                   ` (2 preceding siblings ...)
  2026-02-17 11:12 12% ` [PATCH proxmox-backup v5 3/4] pbs-config: invalidate token-secret cache on token.shadow changes Samuel Rufinatscha
@ 2026-02-17 11:12 15% ` Samuel Rufinatscha
  2026-02-17 11:12 14% ` [PATCH proxmox v5 1/4] proxmox-access-control: split AccessControlConfig and add token.shadow gen Samuel Rufinatscha
                   ` (7 subsequent siblings)
  11 siblings, 0 replies; 117+ results
From: Samuel Rufinatscha @ 2026-02-17 11:12 UTC (permalink / raw)
  To: pbs-devel

verify_secret() currently calls refresh_cache_if_file_changed() on every
request, which performs a metadata() call on token.shadow each time.
Under load this adds unnecessary overhead, especially since the file
rarely changes.

This patch introduces a TTL boundary, controlled by
TOKEN_SECRET_CACHE_TTL_SECS. File metadata is only re-checked once the
TTL has expired. The TTL effects are also documented.

Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
---
Changes from v4 to v5:
* Rebased
* Introduce shadow_check_within_ttl() helper

Changes from v3 to v4:
* Adjusted commit message

Changes from v2 to v3:
* Refactored refresh_cache_if_file_changed TTL logic.
* Remove had_prior_state check (replaced by last_checked logic).
* Improve TTL bound checks.
* Reword documentation warning for clarity.

Changes from v1 to v2:
* Add TOKEN_SECRET_CACHE_TTL_SECS and last_checked.
* Implement double-checked TTL: check with try_read first; only attempt
  refresh with try_write if expired/unknown.
* Fix TTL bookkeeping: update last_checked on the “file unchanged” path
  and after API mutations.
* Add documentation warning about TTL-delayed effect of manual
  token.shadow edits.
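
For readers skimming the changelog, the double-checked TTL boils down to the
pattern below. This is only a minimal sketch, not the patch code: it uses
std::sync::RwLock instead of parking_lot, leaves out the shared generation,
and `State`, `check_fresh` and `stat_unchanged` are made-up names standing in
for the cache, refresh_cache_if_file_changed() and the metadata() call.

```rust
use std::sync::RwLock;

const TTL_SECS: i64 = 60;

struct State {
    /// Epoch seconds of the last metadata check, None if never checked.
    last_checked: Option<i64>,
}

impl State {
    /// True if a check happened and is less than TTL_SECS old.
    /// A clock that moved backwards counts as expired.
    fn within_ttl(&self, now: i64) -> bool {
        self.last_checked
            .is_some_and(|t| now >= t && now - t < TTL_SECS)
    }
}

/// Returns true if the cached state may be used at time `now`.
/// `stat_unchanged` stands in for the expensive metadata comparison.
fn check_fresh(state: &RwLock<State>, now: i64, stat_unchanged: impl FnOnce() -> bool) -> bool {
    // Fast path: shared lock only, no stat.
    match state.try_read() {
        Ok(guard) if guard.within_ttl(now) => return true,
        Ok(_) => {}             // expired or unknown: fall through to slow path
        Err(_) => return false, // contended: best-effort, do not block
    }

    // Slow path: exclusive lock, again without blocking.
    let Ok(mut guard) = state.try_write() else {
        return false;
    };

    // Re-check: another thread may have refreshed in the meantime.
    if guard.within_ttl(now) {
        return true;
    }

    let unchanged = stat_unchanged();
    guard.last_checked = Some(now);
    unchanged
}
```

Returning false on lock contention simply sends the caller down the slow
verification path, which matches the best-effort intent of try_read/try_write.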

 docs/user-management.rst       |  4 ++++
 pbs-config/src/token_shadow.rs | 30 +++++++++++++++++++++++++++++-
 2 files changed, 33 insertions(+), 1 deletion(-)

diff --git a/docs/user-management.rst b/docs/user-management.rst
index 41b43d60..8dfae528 100644
--- a/docs/user-management.rst
+++ b/docs/user-management.rst
@@ -156,6 +156,10 @@ metadata:
 Similarly, the ``user delete-token`` subcommand can be used to delete a token
 again.
 
+.. WARNING:: Direct/manual edits to ``token.shadow`` may take up to 60 seconds (or
+   longer in edge cases) to take effect due to caching. Restart services for
+   immediate effect of manual edits.
+
 Newly generated API tokens don't have any permissions. Please read the next
 section to learn how to set access permissions.
 
diff --git a/pbs-config/src/token_shadow.rs b/pbs-config/src/token_shadow.rs
index 82c4a7f1..2930465f 100644
--- a/pbs-config/src/token_shadow.rs
+++ b/pbs-config/src/token_shadow.rs
@@ -31,6 +31,8 @@ static TOKEN_SECRET_CACHE: LazyLock<RwLock<ApiTokenSecretCache>> = LazyLock::new
         shadow: None,
     })
 });
+/// Max age in seconds of the token secret cache before checking for file changes.
+const TOKEN_SECRET_CACHE_TTL_SECS: i64 = 60;
 
 #[derive(Serialize, Deserialize)]
 #[serde(rename_all = "kebab-case")]
@@ -72,11 +74,24 @@ fn write_file(data: HashMap<Authid, String>) -> Result<(), Error> {
 fn refresh_cache_if_file_changed() -> bool {
     let now = epoch_i64();
 
-    // Best-effort refresh under write lock.
+    // Fast path: cache is fresh if shared-gen matches and TTL not expired.
+    if let (Some(cache), Some(shared_gen_read)) =
+        (TOKEN_SECRET_CACHE.try_read(), token_shadow_shared_gen())
+    {
+        if cache.shared_gen == shared_gen_read && cache.shadow_check_within_ttl(now) {
+            return true;
+        }
+        // read lock drops here
+    } else {
+        return false;
+    }
+
+    // Slow path: best-effort refresh under write lock.
     let Some(mut cache) = TOKEN_SECRET_CACHE.try_write() else {
         return false;
     };
 
+    // Re-read generation after acquiring the lock (may have changed meanwhile).
     let Some(shared_gen_now) = token_shadow_shared_gen() else {
         return false;
     };
@@ -86,6 +101,12 @@ fn refresh_cache_if_file_changed() -> bool {
         cache.reset_and_set_gen(shared_gen_now);
     }
 
+    // TTL check again after acquiring the lock
+    let now = epoch_i64();
+    if cache.shadow_check_within_ttl(now) {
+        return true;
+    }
+
     // Stat the file to detect manual edits.
     let Ok((new_mtime, new_len)) = shadow_mtime_len() else {
         return false;
@@ -234,6 +255,13 @@ impl ApiTokenSecretCache {
         self.secrets.remove(tokenid);
         self.shared_gen = gen;
     }
+
+    /// Returns true if cached token.shadow metadata exists and was checked within the TTL window.
+    fn shadow_check_within_ttl(&self, now: i64) -> bool {
+        self.shadow.as_ref().is_some_and(|cached| {
+            now >= cached.last_checked && (now - cached.last_checked) < TOKEN_SECRET_CACHE_TTL_SECS
+        })
+    }
 }
 
 /// Shadow file info
-- 
2.47.3





^ permalink raw reply related	[relevance 15%]

* [PATCH proxmox-backup v5 3/4] pbs-config: invalidate token-secret cache on token.shadow changes
  2026-02-17 11:12 14% [PATCH proxmox{-backup,,-datacenter-manager} v5 00/11] token-shadow: reduce api token verification overhead Samuel Rufinatscha
  2026-02-17 11:12 17% ` [PATCH proxmox-backup v5 1/4] pbs-config: add token.shadow generation to ConfigVersionCache Samuel Rufinatscha
  2026-02-17 11:12 12% ` [PATCH proxmox-backup v5 2/4] pbs-config: cache verified API token secrets Samuel Rufinatscha
@ 2026-02-17 11:12 12% ` Samuel Rufinatscha
  2026-02-17 11:12 15% ` [PATCH proxmox-backup v5 4/4] pbs-config: add TTL window to token secret cache Samuel Rufinatscha
                   ` (8 subsequent siblings)
  11 siblings, 0 replies; 117+ results
From: Samuel Rufinatscha @ 2026-02-17 11:12 UTC (permalink / raw)
  To: pbs-devel

This patch adds manual/direct file change detection by tracking the
mtime and length of token.shadow and clears the in-memory token secret
cache whenever these values change.

Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
---
Changes from v4 to v5:
* Rebased

Changes from v3 to v4:
* Make use of .replace() in refresh_cache_if_file_changed to get the
  previous state
* Group file stats with ShadowFileInfo
* Return false in refresh_cache_if_file_changed to avoid unnecessary cache
  queries
* Adjusted commit message

Changes from v2 to v3:
* Cache now tracks last_checked (epoch seconds).
* Simplified refresh_cache_if_file_changed, removed
  FILE_GENERATION logic
* On first load, initializes file metadata and keeps empty cache.

Changes from v1 to v2:
* Add file metadata tracking (file_mtime, file_len) and
  FILE_GENERATION.
* Store file_gen in CachedSecret and verify it against the current
  FILE_GENERATION to ensure cached entries belong to the current file
  state.
* Add shadow_mtime_len() helper and convert refresh to best-effort
  (try_write, returns bool).
* Pass a pre-write metadata snapshot into apply_api_mutation and
  clear/bump generation if the cache metadata indicates missed external
  edits.
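
The change detection itself can be illustrated in isolation. The following is
a minimal sketch under assumptions, not the code from this patch: `FileStamp`,
`file_stamp` and `record_and_detect_change` are illustrative names, and cache
clearing and generation bumping are left out.

```rust
use std::fs;
use std::io::{self, ErrorKind};
use std::path::Path;
use std::time::SystemTime;

/// Snapshot of the metadata used for change detection.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct FileStamp {
    mtime: Option<SystemTime>,
    len: Option<u64>,
}

/// Stat `path`; a missing file maps to an "empty" stamp rather than an error.
fn file_stamp(path: &Path) -> io::Result<FileStamp> {
    match fs::metadata(path) {
        Ok(meta) => Ok(FileStamp {
            mtime: meta.modified().ok(),
            len: Some(meta.len()),
        }),
        Err(e) if e.kind() == ErrorKind::NotFound => Ok(FileStamp {
            mtime: None,
            len: None,
        }),
        Err(e) => Err(e),
    }
}

/// Stores `current` and returns true only if a previous stamp existed and
/// differs from it. The very first observation is not reported as a change.
fn record_and_detect_change(stored: &mut Option<FileStamp>, current: FileStamp) -> bool {
    let prev = stored.replace(current);
    prev.is_some_and(|p| p != current)
}
```

Note that an edit which keeps the length and lands within the filesystem's
mtime granularity would go unnoticed by a check like this.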

 pbs-config/src/token_shadow.rs | 123 +++++++++++++++++++++++++++++++--
 1 file changed, 119 insertions(+), 4 deletions(-)

diff --git a/pbs-config/src/token_shadow.rs b/pbs-config/src/token_shadow.rs
index ad766671..82c4a7f1 100644
--- a/pbs-config/src/token_shadow.rs
+++ b/pbs-config/src/token_shadow.rs
@@ -1,5 +1,8 @@
 use std::collections::HashMap;
+use std::fs;
+use std::io::ErrorKind;
 use std::sync::LazyLock;
+use std::time::SystemTime;
 
 use anyhow::{bail, format_err, Error};
 use parking_lot::RwLock;
@@ -7,6 +10,7 @@ use serde::{Deserialize, Serialize};
 use serde_json::{from_value, Value};
 
 use proxmox_sys::fs::CreateOptions;
+use proxmox_time::epoch_i64;
 
 use pbs_api_types::Authid;
 //use crate::auth;
@@ -24,6 +28,7 @@ static TOKEN_SECRET_CACHE: LazyLock<RwLock<ApiTokenSecretCache>> = LazyLock::new
     RwLock::new(ApiTokenSecretCache {
         secrets: HashMap::new(),
         shared_gen: 0,
+        shadow: None,
     })
 });
 
@@ -62,6 +67,56 @@ fn write_file(data: HashMap<Authid, String>) -> Result<(), Error> {
     proxmox_sys::fs::replace_file(CONF_FILE, &json, options, true)
 }
 
+/// Refreshes the in-memory cache if the on-disk token.shadow file changed.
+/// Returns true if the cache is valid to use, false if not.
+fn refresh_cache_if_file_changed() -> bool {
+    let now = epoch_i64();
+
+    // Best-effort refresh under write lock.
+    let Some(mut cache) = TOKEN_SECRET_CACHE.try_write() else {
+        return false;
+    };
+
+    let Some(shared_gen_now) = token_shadow_shared_gen() else {
+        return false;
+    };
+
+    // If another process bumped the generation, we don't know what changed -> clear cache
+    if cache.shared_gen != shared_gen_now {
+        cache.reset_and_set_gen(shared_gen_now);
+    }
+
+    // Stat the file to detect manual edits.
+    let Ok((new_mtime, new_len)) = shadow_mtime_len() else {
+        return false;
+    };
+
+    // If the file didn't change, only update last_checked
+    if let Some(shadow) = cache.shadow.as_mut() {
+        if shadow.mtime == new_mtime && shadow.len == new_len {
+            shadow.last_checked = now;
+            return true;
+        }
+    }
+
+    cache.secrets.clear();
+
+    let prev = cache.shadow.replace(ShadowFileInfo {
+        mtime: new_mtime,
+        len: new_len,
+        last_checked: now,
+    });
+
+    if prev.is_some() {
+        // Best-effort propagation to other processes if a change was detected
+        if let Some(shared_gen_new) = bump_token_shadow_shared_gen() {
+            cache.shared_gen = shared_gen_new;
+        }
+    }
+
+    false
+}
+
 /// Verifies that an entry for given tokenid / API token secret exists
 pub fn verify_secret(tokenid: &Authid, secret: &str) -> Result<(), Error> {
     if !tokenid.is_token() {
@@ -69,7 +124,7 @@ pub fn verify_secret(tokenid: &Authid, secret: &str) -> Result<(), Error> {
     }
 
     // Fast path
-    if cache_try_secret_matches(tokenid, secret) {
+    if refresh_cache_if_file_changed() && cache_try_secret_matches(tokenid, secret) {
         return Ok(());
     }
 
@@ -109,12 +164,15 @@ fn set_secret(tokenid: &Authid, secret: &str) -> Result<(), Error> {
 
     let guard = lock_config()?;
 
+    // Capture state before we write to detect external edits.
+    let pre_meta = shadow_mtime_len().unwrap_or((None, None));
+
     let mut data = read_file()?;
     let hashed_secret = proxmox_sys::crypt::encrypt_pw(secret)?;
     data.insert(tokenid.clone(), hashed_secret);
     write_file(data)?;
 
-    apply_api_mutation(guard, tokenid, Some(secret));
+    apply_api_mutation(guard, tokenid, Some(secret), pre_meta);
 
     Ok(())
 }
@@ -127,11 +185,14 @@ pub fn delete_secret(tokenid: &Authid) -> Result<(), Error> {
 
     let guard = lock_config()?;
 
+    // Capture state before we write to detect external edits.
+    let pre_meta = shadow_mtime_len().unwrap_or((None, None));
+
     let mut data = read_file()?;
     data.remove(tokenid);
     write_file(data)?;
 
-    apply_api_mutation(guard, tokenid, None);
+    apply_api_mutation(guard, tokenid, None, pre_meta);
 
     Ok(())
 }
@@ -150,6 +211,8 @@ struct ApiTokenSecretCache {
     secrets: HashMap<Authid, CachedSecret>,
     /// Shared generation to detect mutations of the underlying token.shadow file.
     shared_gen: usize,
+    /// Shadow file info to detect changes
+    shadow: Option<ShadowFileInfo>,
 }
 
 impl ApiTokenSecretCache {
@@ -157,6 +220,7 @@ impl ApiTokenSecretCache {
     fn reset_and_set_gen(&mut self, gen: usize) {
         self.secrets.clear();
         self.shared_gen = gen;
+        self.shadow = None;
     }
 
     /// Caches a secret and sets/updates the cache generation.
@@ -172,6 +236,16 @@ impl ApiTokenSecretCache {
     }
 }
 
+/// Shadow file info
+struct ShadowFileInfo {
+    // shadow file mtime to detect changes
+    mtime: Option<SystemTime>,
+    // shadow file length to detect changes
+    len: Option<u64>,
+    // last time the file metadata was checked
+    last_checked: i64,
+}
+
 fn cache_try_insert_secret(tokenid: Authid, secret: String, shared_gen_before: usize) {
     let Some(mut cache) = TOKEN_SECRET_CACHE.try_write() else {
         return;
@@ -216,7 +290,14 @@ fn cache_try_secret_matches(tokenid: &Authid, secret: &str) -> bool {
     false
 }
 
-fn apply_api_mutation(_guard: BackupLockGuard, tokenid: &Authid, new_secret: Option<&str>) {
+fn apply_api_mutation(
+    _guard: BackupLockGuard,
+    tokenid: &Authid,
+    new_secret: Option<&str>,
+    pre_write_meta: (Option<SystemTime>, Option<u64>),
+) {
+    let now = epoch_i64();
+
     // Signal cache invalidation to other processes (best-effort).
     let bumped_gen = bump_token_shadow_shared_gen();
 
@@ -235,6 +316,16 @@ fn apply_api_mutation(_guard: BackupLockGuard, tokenid: &Authid, new_secret: Opt
         return;
     }
 
+    // If our cached file metadata does not match the on-disk state before our write,
+    // we likely missed an external/manual edit. We can no longer trust any cached secrets.
+    if cache
+        .shadow
+        .as_ref()
+        .is_some_and(|s| (s.mtime, s.len) != pre_write_meta)
+    {
+        cache.secrets.clear();
+    }
+
     // Apply the new mutation.
     match new_secret {
         Some(secret) => {
@@ -245,6 +336,22 @@ fn apply_api_mutation(_guard: BackupLockGuard, tokenid: &Authid, new_secret: Opt
         }
         None => cache.evict_and_set_gen(tokenid, current_gen),
     }
+
+    // Update our view of the file metadata to the post-write state (best-effort).
+    // (If this fails, drop local cache so callers fall back to slow path until refreshed.)
+    match shadow_mtime_len() {
+        Ok((mtime, len)) => {
+            cache.shadow = Some(ShadowFileInfo {
+                mtime,
+                len,
+                last_checked: now,
+            });
+        }
+        Err(_) => {
+            // If we cannot validate state, do not trust cache.
+            cache.reset_and_set_gen(current_gen);
+        }
+    }
 }
 
 /// Get the current shared generation.
@@ -260,3 +367,11 @@ fn bump_token_shadow_shared_gen() -> Option<usize> {
         .ok()
         .map(|cvc| cvc.increase_token_shadow_generation() + 1)
 }
+
+fn shadow_mtime_len() -> Result<(Option<SystemTime>, Option<u64>), Error> {
+    match fs::metadata(CONF_FILE) {
+        Ok(meta) => Ok((meta.modified().ok(), Some(meta.len()))),
+        Err(e) if e.kind() == ErrorKind::NotFound => Ok((None, None)),
+        Err(e) => Err(e.into()),
+    }
+}
-- 
2.47.3





^ permalink raw reply related	[relevance 12%]

* [PATCH proxmox-backup v5 2/4] pbs-config: cache verified API token secrets
  2026-02-17 11:12 14% [PATCH proxmox{-backup,,-datacenter-manager} v5 00/11] token-shadow: reduce api token verification overhead Samuel Rufinatscha
  2026-02-17 11:12 17% ` [PATCH proxmox-backup v5 1/4] pbs-config: add token.shadow generation to ConfigVersionCache Samuel Rufinatscha
@ 2026-02-17 11:12 12% ` Samuel Rufinatscha
  2026-02-25 15:44  6%   ` Shannon Sterz
  2026-02-17 11:12 12% ` [PATCH proxmox-backup v5 3/4] pbs-config: invalidate token-secret cache on token.shadow changes Samuel Rufinatscha
                   ` (9 subsequent siblings)
  11 siblings, 1 reply; 117+ results
From: Samuel Rufinatscha @ 2026-02-17 11:12 UTC (permalink / raw)
  To: pbs-devel

Adds an in-memory cache of successfully verified token secrets.
Subsequent requests for the same token+secret combination only perform
a comparison using openssl::memcmp::eq and avoid re-running the
password hash. The cache is updated when a token secret is set and
cleared when a token is deleted. A shared generation counter (via
ConfigVersionCache) is used to invalidate caches across processes when
token secrets are modified or deleted. This keeps privileged and
unprivileged daemons in sync.

Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
---
Changes from v4 to v5:
* Rebased
* Move invalidate_cache_state_and_set_gen into cache object impl,
  rename to reset_and_set_gen
* Add additional insert/remove helpers which set/update the generation
  directly
* Clarified the usage of the shared generation counter in the commit
  message

Changes from v3 to v4:
* Add gen param to invalidate_cache_state()
* Validates the generation bump after obtaining write lock in
  apply_api_mutation
* Pass lock to apply_api_mutation
* Remove unnecessary gen check in cache_try_secret_matches
* Adjusted commit message

Changes from v2 to v3:
* Replaced process-local cache invalidation (AtomicU64
  API_MUTATION_GENERATION) with a cross-process shared generation via
  ConfigVersionCache.
* Validate shared generation before/after the constant-time secret
  compare; only insert into cache if the generation is unchanged.
* invalidate_cache_state() on insert if shared generation changed.

Changes from v1 to v2:
* Replace OnceCell with LazyLock, and std::sync::RwLock with
  parking_lot::RwLock.
* Add API_MUTATION_GENERATION and guard cache inserts
  to prevent “zombie inserts” across concurrent set/delete.
* Refactor cache operations into cache_try_secret_matches,
  cache_try_insert_secret, and centralize write-side behavior in
  apply_api_mutation.
* Switch fast-path cache access to try_read/try_write (best-effort).
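
To illustrate the "zombie insert" guard mentioned above, here is a minimal,
self-contained sketch. It is an assumption-laden stand-in, not the patch code:
a plain AtomicUsize replaces the ConfigVersionCache generation, and `Cache`
and `try_insert` are illustrative names.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicUsize, Ordering};

struct Cache {
    entries: HashMap<String, String>,
    /// Generation the cached entries belong to.
    generation: usize,
}

/// Inserts `key -> secret` only if no mutation happened since `gen_before`
/// was sampled (i.e. before the expensive verification started).
/// Returns true if the entry was cached.
fn try_insert(
    cache: &mut Cache,
    shared: &AtomicUsize,
    gen_before: usize,
    key: &str,
    secret: &str,
) -> bool {
    let gen_now = shared.load(Ordering::Acquire);

    // We missed a bump: everything cached so far may be stale.
    if cache.generation != gen_now {
        cache.entries.clear();
        cache.generation = gen_now;
    }

    // A set/delete raced with the verification: drop the result.
    if gen_before != gen_now {
        return false;
    }

    cache.entries.insert(key.to_owned(), secret.to_owned());
    true
}
```

The caller samples the generation first, verifies the secret against the
hash, and only then calls `try_insert` with the sampled value, so a delete
that lands in between cannot be overwritten by a stale insert.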

 Cargo.toml                     |   1 +
 pbs-config/Cargo.toml          |   1 +
 pbs-config/src/token_shadow.rs | 167 ++++++++++++++++++++++++++++++++-
 3 files changed, 166 insertions(+), 3 deletions(-)

diff --git a/Cargo.toml b/Cargo.toml
index dd8af85f..469538bb 100644
--- a/Cargo.toml
+++ b/Cargo.toml
@@ -144,6 +144,7 @@ nom = "7"
 num-traits = "0.2"
 once_cell = "1.3.1"
 openssl = "0.10.40"
+parking_lot = "0.12"
 percent-encoding = "2.1"
 pin-project-lite = "0.2"
 regex = "1.5.5"
diff --git a/pbs-config/Cargo.toml b/pbs-config/Cargo.toml
index 74afb3c6..eb81ce00 100644
--- a/pbs-config/Cargo.toml
+++ b/pbs-config/Cargo.toml
@@ -13,6 +13,7 @@ libc.workspace = true
 nix.workspace = true
 once_cell.workspace = true
 openssl.workspace = true
+parking_lot.workspace = true
 regex.workspace = true
 serde.workspace = true
 serde_json.workspace = true
diff --git a/pbs-config/src/token_shadow.rs b/pbs-config/src/token_shadow.rs
index 640fabbf..ad766671 100644
--- a/pbs-config/src/token_shadow.rs
+++ b/pbs-config/src/token_shadow.rs
@@ -1,6 +1,8 @@
 use std::collections::HashMap;
+use std::sync::LazyLock;
 
 use anyhow::{bail, format_err, Error};
+use parking_lot::RwLock;
 use serde::{Deserialize, Serialize};
 use serde_json::{from_value, Value};
 
@@ -13,6 +15,18 @@ use crate::{open_backup_lockfile, BackupLockGuard};
 const LOCK_FILE: &str = pbs_buildcfg::configdir!("/token.shadow.lock");
 const CONF_FILE: &str = pbs_buildcfg::configdir!("/token.shadow");
 
+/// Global in-memory cache for successfully verified API token secrets.
+/// The cache stores plain text secrets for token Authids that have already been
+/// verified against the hashed values in `token.shadow`. This allows for cheap
+/// subsequent authentications for the same token+secret combination, avoiding
+/// recomputing the password hash on every request.
+static TOKEN_SECRET_CACHE: LazyLock<RwLock<ApiTokenSecretCache>> = LazyLock::new(|| {
+    RwLock::new(ApiTokenSecretCache {
+        secrets: HashMap::new(),
+        shared_gen: 0,
+    })
+});
+
 #[derive(Serialize, Deserialize)]
 #[serde(rename_all = "kebab-case")]
 /// ApiToken id / secret pair
@@ -54,9 +68,27 @@ pub fn verify_secret(tokenid: &Authid, secret: &str) -> Result<(), Error> {
         bail!("not an API token ID");
     }
 
+    // Fast path
+    if cache_try_secret_matches(tokenid, secret) {
+        return Ok(());
+    }
+
+    // Slow path
+    // First, capture the shared generation before doing the hash verification.
+    let gen_before = token_shadow_shared_gen();
+
     let data = read_file()?;
     match data.get(tokenid) {
-        Some(hashed_secret) => proxmox_sys::crypt::verify_crypt_pw(secret, hashed_secret),
+        Some(hashed_secret) => {
+            proxmox_sys::crypt::verify_crypt_pw(secret, hashed_secret)?;
+
+            // Try to cache only if nothing changed while verifying the secret.
+            if let Some(gen) = gen_before {
+                cache_try_insert_secret(tokenid.clone(), secret.to_owned(), gen);
+            }
+
+            Ok(())
+        }
         None => bail!("invalid API token"),
     }
 }
@@ -75,13 +107,15 @@ fn set_secret(tokenid: &Authid, secret: &str) -> Result<(), Error> {
         bail!("not an API token ID");
     }
 
-    let _guard = lock_config()?;
+    let guard = lock_config()?;
 
     let mut data = read_file()?;
     let hashed_secret = proxmox_sys::crypt::encrypt_pw(secret)?;
     data.insert(tokenid.clone(), hashed_secret);
     write_file(data)?;
 
+    apply_api_mutation(guard, tokenid, Some(secret));
+
     Ok(())
 }
 
@@ -91,11 +125,138 @@ pub fn delete_secret(tokenid: &Authid) -> Result<(), Error> {
         bail!("not an API token ID");
     }
 
-    let _guard = lock_config()?;
+    let guard = lock_config()?;
 
     let mut data = read_file()?;
     data.remove(tokenid);
     write_file(data)?;
 
+    apply_api_mutation(guard, tokenid, None);
+
     Ok(())
 }
+
+/// Cached secret.
+struct CachedSecret {
+    secret: String,
+}
+
+struct ApiTokenSecretCache {
+    /// Keys are token Authids, values are the corresponding plain text secrets.
+    /// Entries are added after a successful on-disk verification in
+    /// `verify_secret` or when a new token secret is generated by
+    /// `generate_and_set_secret`. Used to avoid repeated
+    /// password-hash computation on subsequent authentications.
+    secrets: HashMap<Authid, CachedSecret>,
+    /// Shared generation to detect mutations of the underlying token.shadow file.
+    shared_gen: usize,
+}
+
+impl ApiTokenSecretCache {
+    /// Resets all local cache contents and sets/updates the cached generation.
+    fn reset_and_set_gen(&mut self, gen: usize) {
+        self.secrets.clear();
+        self.shared_gen = gen;
+    }
+
+    /// Caches a secret and sets/updates the cache generation.
+    fn insert_and_set_gen(&mut self, tokenid: Authid, secret: CachedSecret, gen: usize) {
+        self.secrets.insert(tokenid, secret);
+        self.shared_gen = gen;
+    }
+
+    /// Evicts a cached secret and sets/updates the cached generation.
+    fn evict_and_set_gen(&mut self, tokenid: &Authid, gen: usize) {
+        self.secrets.remove(tokenid);
+        self.shared_gen = gen;
+    }
+}
+
+fn cache_try_insert_secret(tokenid: Authid, secret: String, shared_gen_before: usize) {
+    let Some(mut cache) = TOKEN_SECRET_CACHE.try_write() else {
+        return;
+    };
+
+    let Some(shared_gen_now) = token_shadow_shared_gen() else {
+        return;
+    };
+
+    // If this process missed a generation bump, its cache is stale.
+    if cache.shared_gen != shared_gen_now {
+        cache.reset_and_set_gen(shared_gen_now);
+    }
+
+    // If a mutation happened while we were verifying the secret, do not insert.
+    if shared_gen_now == shared_gen_before {
+        cache.insert_and_set_gen(tokenid, CachedSecret { secret }, shared_gen_now);
+    }
+}
+
+/// Tries to match the given token secret against the cached secret.
+///
+/// Verifies the generation/version before doing the constant-time
+/// comparison to reduce TOCTOU risk. During token rotation or deletion
+/// tokens for in-flight requests may still validate against the previous
+/// generation.
+fn cache_try_secret_matches(tokenid: &Authid, secret: &str) -> bool {
+    let Some(cache) = TOKEN_SECRET_CACHE.try_read() else {
+        return false;
+    };
+    let Some(entry) = cache.secrets.get(tokenid) else {
+        return false;
+    };
+    let Some(current_gen) = token_shadow_shared_gen() else {
+        return false;
+    };
+
+    if current_gen == cache.shared_gen {
+        return openssl::memcmp::eq(entry.secret.as_bytes(), secret.as_bytes());
+    }
+
+    false
+}
+
+fn apply_api_mutation(_guard: BackupLockGuard, tokenid: &Authid, new_secret: Option<&str>) {
+    // Signal cache invalidation to other processes (best-effort).
+    let bumped_gen = bump_token_shadow_shared_gen();
+
+    let mut cache = TOKEN_SECRET_CACHE.write();
+
+    // If we cannot get the current generation, we cannot trust the cache
+    let Some(current_gen) = token_shadow_shared_gen() else {
+        cache.reset_and_set_gen(0);
+        return;
+    };
+
+    // If we cannot bump the shared generation, or if it changed after
+    // obtaining the cache write lock, we cannot trust the cache
+    if bumped_gen != Some(current_gen) {
+        cache.reset_and_set_gen(current_gen);
+        return;
+    }
+
+    // Apply the new mutation.
+    match new_secret {
+        Some(secret) => {
+            let cached_secret = CachedSecret {
+                secret: secret.to_owned(),
+            };
+            cache.insert_and_set_gen(tokenid.clone(), cached_secret, current_gen);
+        }
+        None => cache.evict_and_set_gen(tokenid, current_gen),
+    }
+}
+
+/// Get the current shared generation.
+fn token_shadow_shared_gen() -> Option<usize> {
+    crate::ConfigVersionCache::new()
+        .ok()
+        .map(|cvc| cvc.token_shadow_generation())
+}
+
+/// Bump and return the new shared generation.
+fn bump_token_shadow_shared_gen() -> Option<usize> {
+    crate::ConfigVersionCache::new()
+        .ok()
+        .map(|cvc| cvc.increase_token_shadow_generation() + 1)
+}
-- 
2.47.3






* [PATCH proxmox v5 3/4] proxmox-access-control: invalidate token-secret cache on token.shadow changes
  2026-02-17 11:12 14% [PATCH proxmox{-backup,,-datacenter-manager} v5 00/11] token-shadow: reduce api token verification overhead Samuel Rufinatscha
                   ` (5 preceding siblings ...)
  2026-02-17 11:12 11% ` [PATCH proxmox v5 2/4] proxmox-access-control: cache verified API token secrets Samuel Rufinatscha
@ 2026-02-17 11:12 12% ` Samuel Rufinatscha
  2026-02-17 11:12 15% ` [PATCH proxmox v5 4/4] proxmox-access-control: add TTL window to token secret cache Samuel Rufinatscha
                   ` (4 subsequent siblings)
  11 siblings, 0 replies; 117+ results
From: Samuel Rufinatscha @ 2026-02-17 11:12 UTC (permalink / raw)
  To: pbs-devel

This patch detects manual/direct edits to token.shadow by tracking the
file's mtime and length, and clears the in-memory token secret cache
whenever either value changes.

Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
---
Changes from v4 to v5:
* Rebased

Changes from v3 to v4:
* make use of .replace() in refresh_cache_if_file_changed to get
previous state
* Group file stats with ShadowFileInfo
* Return false in refresh_cache_if_file_changed to avoid unnecessary cache
queries
* Adjusted commit message

Changes from v2 to v3:
* Cache now tracks last_checked (epoch seconds).
* Simplified refresh_cache_if_file_changed, removed
FILE_GENERATION logic
* On first load, initializes file metadata and keeps empty cache.

Changes from v1 to v2:
* Add file metadata tracking (file_mtime, file_len) and
  FILE_GENERATION.
* Store file_gen in CachedSecret and verify it against the current
  FILE_GENERATION to ensure cached entries belong to the current file
  state.
* Add shadow_mtime_len() helper and convert refresh to best-effort
  (try_write, returns bool).
* Pass a pre-write metadata snapshot into apply_api_mutation and
  clear/bump generation if the cache metadata indicates missed external
  edits.

 proxmox-access-control/src/token_shadow.rs | 123 ++++++++++++++++++++-
 1 file changed, 119 insertions(+), 4 deletions(-)

diff --git a/proxmox-access-control/src/token_shadow.rs b/proxmox-access-control/src/token_shadow.rs
index ba5fb3f8..d1b7d4cb 100644
--- a/proxmox-access-control/src/token_shadow.rs
+++ b/proxmox-access-control/src/token_shadow.rs
@@ -1,5 +1,8 @@
 use std::collections::HashMap;
+use std::fs;
+use std::io::ErrorKind;
 use std::sync::LazyLock;
+use std::time::SystemTime;
 
 use anyhow::{bail, format_err, Error};
 use parking_lot::RwLock;
@@ -7,6 +10,7 @@ use serde_json::{from_value, Value};
 
 use proxmox_auth_api::types::Authid;
 use proxmox_product_config::{open_api_lockfile, replace_config, ApiLockGuard};
+use proxmox_time::epoch_i64;
 
 use crate::init::access_conf;
 use crate::init::impl_feature::{token_shadow, token_shadow_lock};
@@ -20,6 +24,7 @@ static TOKEN_SECRET_CACHE: LazyLock<RwLock<ApiTokenSecretCache>> = LazyLock::new
     RwLock::new(ApiTokenSecretCache {
         secrets: HashMap::new(),
         shared_gen: 0,
+        shadow: None,
     })
 });
 
@@ -45,6 +50,56 @@ fn write_file(data: HashMap<Authid, String>) -> Result<(), Error> {
     replace_config(token_shadow(), &json)
 }
 
+/// Refreshes the in-memory cache if the on-disk token.shadow file changed.
+/// Returns true if the cache is valid to use, false if not.
+fn refresh_cache_if_file_changed() -> bool {
+    let now = epoch_i64();
+
+    // Best-effort refresh under write lock.
+    let Some(mut cache) = TOKEN_SECRET_CACHE.try_write() else {
+        return false;
+    };
+
+    let Some(shared_gen_now) = token_shadow_shared_gen() else {
+        return false;
+    };
+
+    // If another process bumped the generation, we don't know what changed -> clear cache
+    if cache.shared_gen != shared_gen_now {
+        cache.reset_and_set_gen(shared_gen_now);
+    }
+
+    // Stat the file to detect manual edits.
+    let Ok((new_mtime, new_len)) = shadow_mtime_len() else {
+        return false;
+    };
+
+    // If the file didn't change, only update last_checked
+    if let Some(shadow) = cache.shadow.as_mut() {
+        if shadow.mtime == new_mtime && shadow.len == new_len {
+            shadow.last_checked = now;
+            return true;
+        }
+    }
+
+    cache.secrets.clear();
+
+    let prev = cache.shadow.replace(ShadowFileInfo {
+        mtime: new_mtime,
+        len: new_len,
+        last_checked: now,
+    });
+
+    if prev.is_some() {
+        // Best-effort propagation to other processes if a change was detected
+        if let Some(shared_gen_new) = bump_token_shadow_shared_gen() {
+            cache.shared_gen = shared_gen_new;
+        }
+    }
+
+    false
+}
+
 /// Verifies that an entry for given tokenid / API token secret exists
 pub fn verify_secret(tokenid: &Authid, secret: &str) -> Result<(), Error> {
     if !tokenid.is_token() {
@@ -52,7 +107,7 @@ pub fn verify_secret(tokenid: &Authid, secret: &str) -> Result<(), Error> {
     }
 
     // Fast path
-    if cache_try_secret_matches(tokenid, secret) {
+    if refresh_cache_if_file_changed() && cache_try_secret_matches(tokenid, secret) {
         return Ok(());
     }
 
@@ -84,12 +139,15 @@ pub fn set_secret(tokenid: &Authid, secret: &str) -> Result<(), Error> {
 
     let guard = lock_config()?;
 
+    // Capture state before we write to detect external edits.
+    let pre_meta = shadow_mtime_len().unwrap_or((None, None));
+
     let mut data = read_file()?;
     let hashed_secret = proxmox_sys::crypt::encrypt_pw(secret)?;
     data.insert(tokenid.clone(), hashed_secret);
     write_file(data)?;
 
-    apply_api_mutation(guard, tokenid, Some(secret));
+    apply_api_mutation(guard, tokenid, Some(secret), pre_meta);
 
     Ok(())
 }
@@ -102,11 +160,14 @@ pub fn delete_secret(tokenid: &Authid) -> Result<(), Error> {
 
     let guard = lock_config()?;
 
+    // Capture state before we write to detect external edits.
+    let pre_meta = shadow_mtime_len().unwrap_or((None, None));
+
     let mut data = read_file()?;
     data.remove(tokenid);
     write_file(data)?;
 
-    apply_api_mutation(guard, tokenid, None);
+    apply_api_mutation(guard, tokenid, None, pre_meta);
 
     Ok(())
 }
@@ -133,6 +194,8 @@ struct ApiTokenSecretCache {
     secrets: HashMap<Authid, CachedSecret>,
     /// Shared generation to detect mutations of the underlying token.shadow file.
     shared_gen: usize,
+    /// Shadow file info to detect changes
+    shadow: Option<ShadowFileInfo>,
 }
 
 impl ApiTokenSecretCache {
@@ -140,6 +203,7 @@ impl ApiTokenSecretCache {
     fn reset_and_set_gen(&mut self, gen: usize) {
         self.secrets.clear();
         self.shared_gen = gen;
+        self.shadow = None;
     }
 
     /// Caches a secret and sets/updates the cache generation.
@@ -155,6 +219,16 @@ impl ApiTokenSecretCache {
     }
 }
 
+/// Shadow file info
+struct ShadowFileInfo {
+    // shadow file mtime to detect changes
+    mtime: Option<SystemTime>,
+    // shadow file length to detect changes
+    len: Option<u64>,
+    // last time the file metadata was checked
+    last_checked: i64,
+}
+
 fn cache_try_insert_secret(tokenid: Authid, secret: String, shared_gen_before: usize) {
     let Some(mut cache) = TOKEN_SECRET_CACHE.try_write() else {
         return;
@@ -199,7 +273,14 @@ fn cache_try_secret_matches(tokenid: &Authid, secret: &str) -> bool {
     false
 }
 
-fn apply_api_mutation(_guard: ApiLockGuard, tokenid: &Authid, new_secret: Option<&str>) {
+fn apply_api_mutation(
+    _guard: ApiLockGuard,
+    tokenid: &Authid,
+    new_secret: Option<&str>,
+    pre_write_meta: (Option<SystemTime>, Option<u64>),
+) {
+    let now = epoch_i64();
+
     // Signal cache invalidation to other processes (best-effort).
     let bumped_gen = bump_token_shadow_shared_gen();
 
@@ -218,6 +299,16 @@ fn apply_api_mutation(_guard: ApiLockGuard, tokenid: &Authid, new_secret: Option
         return;
     }
 
+    // If our cached file metadata does not match the on-disk state before our write,
+    // we likely missed an external/manual edit. We can no longer trust any cached secrets.
+    if cache
+        .shadow
+        .as_ref()
+        .is_some_and(|s| (s.mtime, s.len) != pre_write_meta)
+    {
+        cache.secrets.clear();
+    }
+
     // Apply the new mutation.
     match new_secret {
         Some(secret) => {
@@ -228,6 +319,22 @@ fn apply_api_mutation(_guard: ApiLockGuard, tokenid: &Authid, new_secret: Option
         }
         None => cache.evict_and_set_gen(tokenid, current_gen),
     }
+
+    // Update our view of the file metadata to the post-write state (best-effort).
+    // (If this fails, drop local cache so callers fall back to slow path until refreshed.)
+    match shadow_mtime_len() {
+        Ok((mtime, len)) => {
+            cache.shadow = Some(ShadowFileInfo {
+                mtime,
+                len,
+                last_checked: now,
+            });
+        }
+        Err(_) => {
+            // If we cannot validate state, do not trust cache.
+            cache.reset_and_set_gen(current_gen);
+        }
+    }
 }
 
 /// Get the current shared generation.
@@ -242,3 +349,11 @@ fn bump_token_shadow_shared_gen() -> Option<usize> {
         .ok()
         .map(|prev| prev + 1)
 }
+
+fn shadow_mtime_len() -> Result<(Option<SystemTime>, Option<u64>), Error> {
+    match fs::metadata(token_shadow()) {
+        Ok(meta) => Ok((meta.modified().ok(), Some(meta.len()))),
+        Err(e) if e.kind() == ErrorKind::NotFound => Ok((None, None)),
+        Err(e) => Err(e.into()),
+    }
+}
-- 
2.47.3






* [PATCH proxmox{-backup,,-datacenter-manager} v5 00/11] token-shadow: reduce api token verification overhead
@ 2026-02-17 11:12 14% Samuel Rufinatscha
  2026-02-17 11:12 17% ` [PATCH proxmox-backup v5 1/4] pbs-config: add token.shadow generation to ConfigVersionCache Samuel Rufinatscha
                   ` (11 more replies)
  0 siblings, 12 replies; 117+ results
From: Samuel Rufinatscha @ 2026-02-17 11:12 UTC (permalink / raw)
  To: pbs-devel

Hi,

this series improves the performance of token-based API authentication
in PBS (pbs-config) and in PDM (underlying proxmox-access-control
crate), addressing the API token verification hotspot reported in our
bugtracker #7017 [1].

When profiling PBS /status endpoint with cargo flamegraph [2],
token-based authentication showed up as a dominant hotspot via
proxmox_sys::crypt::verify_crypt_pw. Applying this series removes that
path from the hot section of the flamegraph. The same performance issue
was measured [2] for PDM. PDM uses the underlying shared
proxmox-access-control library for token handling, which is a
factored out version of the token.shadow handling code from PBS.

While this series fixes the immediate performance issue both in PBS
(pbs-config) and in the shared proxmox-access-control crate used by
PDM, PBS should ideally be refactored, in a separate effort, to use
proxmox-access-control for token handling instead of its local
implementation.

Approach

The goal is to reduce the cost of token-based authentication while
preserving the existing token handling semantics (including detecting
manual edits to token.shadow) and staying consistent between PBS
(pbs-config) and PDM (proxmox-access-control). For both, this series
proposes to:

1. Introduce an in-memory cache for verified token secrets and
invalidate it through a shared ConfigVersionCache generation. Note that
a shared generation is required to keep the privileged and unprivileged
daemons in sync and to avoid caching inconsistencies across processes.
2. Invalidate on token.shadow API changes (set_secret,
delete_secret)
3. Invalidate on direct/manual token.shadow file changes (mtime +
length)
4. Avoid per-request file stat calls using a TTL window
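
As a rough illustration of points 1 and 2 (a simplified sketch, not the
code from this series): the names SharedGen and SecretCache are
hypothetical, the shared generation is modelled as an in-process atomic
rather than the shared-memory ConfigVersionCache, and the comparison is
a plain string compare instead of a constant-time one.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical stand-in for the cross-process generation counter.
struct SharedGen(AtomicUsize);

impl SharedGen {
    fn current(&self) -> usize {
        self.0.load(Ordering::Acquire)
    }

    /// Bumps the generation (e.g. after set_secret/delete_secret) and
    /// returns the new value.
    fn bump(&self) -> usize {
        self.0.fetch_add(1, Ordering::AcqRel) + 1
    }
}

/// Hypothetical cache of secrets verified under generation `seen_gen`.
struct SecretCache {
    secrets: HashMap<String, String>,
    seen_gen: usize,
}

impl SecretCache {
    fn new() -> Self {
        SecretCache { secrets: HashMap::new(), seen_gen: 0 }
    }

    /// Fast path: a cached entry only counts if the generation is unchanged.
    /// NOTE: real code should use a constant-time comparison here.
    fn matches(&mut self, shared: &SharedGen, token: &str, secret: &str) -> bool {
        let now = shared.current();
        if now != self.seen_gen {
            // Some mutation happened; we do not know which, so drop everything.
            self.secrets.clear();
            self.seen_gen = now;
            return false;
        }
        self.secrets.get(token).map(String::as_str) == Some(secret)
    }

    /// Called after a successful slow-path verification that started when the
    /// generation was `gen_before`; inserts only if nothing changed since.
    fn insert_if_unchanged(
        &mut self,
        shared: &SharedGen,
        gen_before: usize,
        token: &str,
        secret: &str,
    ) {
        let now = shared.current();
        if now != self.seen_gen {
            self.secrets.clear();
            self.seen_gen = now;
        }
        if now == gen_before {
            self.secrets.insert(token.to_owned(), secret.to_owned());
        }
    }
}
```

Clearing the whole cache on a generation mismatch is the conservative
choice in this sketch: the counter only says that something changed, not
which token was affected.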

Testing

To verify the effect in PBS (pbs-config changes), I:
1. Set up a test environment based on the latest PBS ISO, installed the
   Rust toolchain, and cloned the proxmox-backup repository for use with
   cargo flamegraph. Reproduced bug #7017 [1] by profiling the /status
   endpoint with token-based authentication using cargo flamegraph [2].
2. Built PBS with pbs-config patches and re-ran the same workload and
   profiling setup. Confirmed that
   proxmox_sys::crypt::verify_crypt_pw path no longer appears in the
   hot section of the flamegraph. CPU usage is now dominated by TLS
   overhead.
3. Functionally, I verified that:
   * valid tokens authenticate correctly when used in API requests
   * invalid secrets are rejected as before
   * generating a new token secret via dashboard (create token for
   user, regenerate existing secret) works and authenticates correctly

To verify the effect in PDM (proxmox-access-control changes), instead
of PBS’ /status, I profiled the /version endpoint with cargo flamegraph
[2] and verified that the expensive hashing path disappears from the
hot section after introducing caching. Functionally, I verified that:
   * valid tokens authenticate correctly when used in API requests
   * invalid secrets are rejected as before
   * generating a new token secret via dashboard (create token for user,
   regenerate existing secret) works and authenticates correctly

Results

To measure the caching effect, I benchmarked parallel token auth
requests for /status?verbose=0 on top of the datastore lookup cache
series [3] to check the throughput impact. With datastores=1,
repeat=5000, parallel=16 this series gives ~172 req/s compared to
~65 req/s without it. This is a ~2.6x improvement (and aligns with the
~179 req/s from the previous series, which used per-process cache
invalidation).

Patch summary

pbs-config:
0001 – pbs-config: add token.shadow generation to ConfigVersionCache
0002 – pbs-config: cache verified API token secrets
0003 – pbs-config: invalidate token-secret cache on token.shadow
changes
0004 – pbs-config: add TTL window to token-secret cache

proxmox-access-control:
0005 – access-control: extend AccessControlConfig for token.shadow invalidation
0006 – access-control: cache verified API token secrets
0007 – access-control: invalidate token-secret cache on token.shadow changes
0008 – access-control: add TTL window to token-secret cache

proxmox-datacenter-manager:
0009 – pdm-config: add token.shadow generation to ConfigVersionCache
0010 – docs: document API token-cache TTL effects
0011 – pdm-config: wire user+acl cache generation

Maintainer Notes:
* proxmox-access-control trait split: permissions now live in
 AccessControlPermissions, and AccessControlConfig now requires
 fn permissions(&self) -> &dyn AccessControlPermissions ->
 version bump
* Renames ConfigVersionCache's pub user_cache_generation and
 increase_user_cache_generation -> version bump
* Adds parking_lot::RwLock dependency in PBS and proxmox-access-control

[1] https://bugzilla.proxmox.com/show_bug.cgi?id=7017
[2] attachment 1767 [1]: Flamegraph showing the proxmox_sys::crypt::verify_crypt_pw stack
[3] https://bugzilla.proxmox.com/show_bug.cgi?id=6049

proxmox-backup:

Samuel Rufinatscha (4):
  pbs-config: add token.shadow generation to ConfigVersionCache
  pbs-config: cache verified API token secrets
  pbs-config: invalidate token-secret cache on token.shadow changes
  pbs-config: add TTL window to token secret cache

 Cargo.toml                             |   1 +
 docs/user-management.rst               |   4 +
 pbs-config/Cargo.toml                  |   1 +
 pbs-config/src/config_version_cache.rs |  18 ++
 pbs-config/src/token_shadow.rs         | 310 ++++++++++++++++++++++++-
 5 files changed, 331 insertions(+), 3 deletions(-)


proxmox:

Samuel Rufinatscha (4):
  proxmox-access-control: split AccessControlConfig and add token.shadow
    gen
  proxmox-access-control: cache verified API token secrets
  proxmox-access-control: invalidate token-secret cache on token.shadow
    changes
  proxmox-access-control: add TTL window to token secret cache

 Cargo.toml                                 |   1 +
 proxmox-access-control/Cargo.toml          |   1 +
 proxmox-access-control/src/acl.rs          |  10 +-
 proxmox-access-control/src/init.rs         | 113 ++++++--
 proxmox-access-control/src/token_shadow.rs | 311 ++++++++++++++++++++-
 5 files changed, 409 insertions(+), 27 deletions(-)


proxmox-datacenter-manager:

Samuel Rufinatscha (3):
  pdm-config: implement token.shadow generation
  docs: document API token-cache TTL effects
  pdm-config: wire user+acl cache generation

 cli/admin/src/main.rs                      |  2 +-
 docs/access-control.rst                    |  4 +++
 lib/pdm-api-types/src/acl.rs               |  4 +--
 lib/pdm-config/Cargo.toml                  |  1 +
 lib/pdm-config/src/access_control.rs       | 31 ++++++++++++++++++++
 lib/pdm-config/src/config_version_cache.rs | 34 +++++++++++++++++-----
 lib/pdm-config/src/lib.rs                  |  2 ++
 server/src/acl.rs                          |  3 +-
 ui/src/main.rs                             | 10 ++++++-
 9 files changed, 77 insertions(+), 14 deletions(-)
 create mode 100644 lib/pdm-config/src/access_control.rs


Summary over all repositories:
  19 files changed, 817 insertions(+), 44 deletions(-)

-- 
Generated by git-murpp 0.8.1





* [PATCH proxmox v5 1/4] proxmox-access-control: split AccessControlConfig and add token.shadow gen
  2026-02-17 11:12 14% [PATCH proxmox{-backup,,-datacenter-manager} v5 00/11] token-shadow: reduce api token verification overhead Samuel Rufinatscha
                   ` (3 preceding siblings ...)
  2026-02-17 11:12 15% ` [PATCH proxmox-backup v5 4/4] pbs-config: add TTL window to token secret cache Samuel Rufinatscha
@ 2026-02-17 11:12 14% ` Samuel Rufinatscha
  2026-02-17 11:12 11% ` [PATCH proxmox v5 2/4] proxmox-access-control: cache verified API token secrets Samuel Rufinatscha
                   ` (6 subsequent siblings)
  11 siblings, 0 replies; 117+ results
From: Samuel Rufinatscha @ 2026-02-17 11:12 UTC (permalink / raw)
  To: pbs-devel

Splits AccessControlConfig into permissions and config traits and adds
token.shadow generation support. The trait split separates permission
concerns from cache/invalidation concerns while keeping existing call
sites working via default delegation.
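
The "default delegation" pattern can be sketched in isolation as
follows. This is a reduced illustration with hypothetical names
(Permissions, Config, Product) and a single forwarded method, not the
actual traits from this patch.

```rust
/// Permission-related queries (illustrative only).
trait Permissions {
    fn is_superuser(&self, user: &str) -> bool;
}

/// Config trait: requires a permissions provider and forwards to it by
/// default, so callers that used `config.is_superuser(..)` keep working.
trait Config {
    fn permissions(&self) -> &dyn Permissions;

    /// Default delegation to the permissions provider.
    fn is_superuser(&self, user: &str) -> bool {
        self.permissions().is_superuser(user)
    }

    /// Cache hook with a default; a product overrides it if it supports
    /// a shared generation.
    fn token_shadow_cache_generation(&self) -> Option<usize> {
        None
    }
}

struct Product;

impl Permissions for Product {
    fn is_superuser(&self, user: &str) -> bool {
        user == "root@pam"
    }
}

impl Config for Product {
    // The same type may serve as its own permissions provider.
    fn permissions(&self) -> &dyn Permissions {
        self
    }
}
```

With this shape, an implementor only has to provide permissions(); the
cache hooks stay optional and fall back to their defaults.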

Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
---
Changes from v4 to v5:
* Rebased

 proxmox-access-control/src/acl.rs  |  10 ++-
 proxmox-access-control/src/init.rs | 113 +++++++++++++++++++++++------
 2 files changed, 99 insertions(+), 24 deletions(-)

diff --git a/proxmox-access-control/src/acl.rs b/proxmox-access-control/src/acl.rs
index 38cb7edf..4b4eac09 100644
--- a/proxmox-access-control/src/acl.rs
+++ b/proxmox-access-control/src/acl.rs
@@ -763,7 +763,7 @@ fn privs_to_priv_names(privs: u64) -> Vec<&'static str> {
 mod test {
     use std::{collections::HashMap, sync::OnceLock};
 
-    use crate::init::{init_access_config, AccessControlConfig};
+    use crate::init::{init_access_config, AccessControlConfig, AccessControlPermissions};
 
     use super::AclTree;
     use anyhow::Error;
@@ -775,7 +775,7 @@ mod test {
         roles: HashMap<&'a str, (u64, &'a str)>,
     }
 
-    impl AccessControlConfig for TestAcmConfig<'_> {
+    impl AccessControlPermissions for TestAcmConfig<'_> {
         fn roles(&self) -> &HashMap<&str, (u64, &str)> {
             &self.roles
         }
@@ -793,6 +793,12 @@ mod test {
         }
     }
 
+    impl AccessControlConfig for TestAcmConfig<'_> {
+        fn permissions(&self) -> &dyn AccessControlPermissions {
+            self
+        }
+    }
+
     fn setup_acl_tree_config() {
         static ACL_CONFIG: OnceLock<TestAcmConfig> = OnceLock::new();
         let config = ACL_CONFIG.get_or_init(|| {
diff --git a/proxmox-access-control/src/init.rs b/proxmox-access-control/src/init.rs
index e64398e8..dfd7784b 100644
--- a/proxmox-access-control/src/init.rs
+++ b/proxmox-access-control/src/init.rs
@@ -8,9 +8,8 @@ use proxmox_section_config::SectionConfigData;
 
 static ACCESS_CONF: OnceLock<&'static dyn AccessControlConfig> = OnceLock::new();
 
-/// This trait specifies the functions a product needs to implement to get ACL tree based access
-/// control management from this plugin.
-pub trait AccessControlConfig: Send + Sync {
+/// Provides permission metadata used by access control.
+pub trait AccessControlPermissions: Send + Sync {
     /// Returns a mapping of all recognized privileges and their corresponding `u64` value.
     fn privileges(&self) -> &HashMap<&str, u64>;
 
@@ -32,25 +31,6 @@ pub trait AccessControlConfig: Send + Sync {
         false
     }
 
-    /// Returns the current cache generation of the user and acl configs. If the generation was
-    /// incremented since the last time the cache was queried, the configs are loaded again from
-    /// disk.
-    ///
-    /// Returning `None` will always reload the cache.
-    ///
-    /// Default: Always returns `None`.
-    fn cache_generation(&self) -> Option<usize> {
-        None
-    }
-
-    /// Increment the cache generation of user and acl configs. This indicates that they were
-    /// changed on disk.
-    ///
-    /// Default: Does nothing.
-    fn increment_cache_generation(&self) -> Result<(), Error> {
-        Ok(())
-    }
-
     /// Optionally returns a role that has no access to any resource.
     ///
     /// Default: Returns `None`.
@@ -103,6 +83,95 @@ pub trait AccessControlConfig: Send + Sync {
     }
 }
 
+/// This trait specifies the functions a product needs to implement to get ACL tree based access
+/// control management from this plugin.
+pub trait AccessControlConfig: Send + Sync {
+    /// Return the permissions provider.
+    fn permissions(&self) -> &dyn AccessControlPermissions;
+
+    fn privileges(&self) -> &HashMap<&str, u64> {
+        self.permissions().privileges()
+    }
+
+    fn roles(&self) -> &HashMap<&str, (u64, &str)> {
+        self.permissions().roles()
+    }
+
+    fn is_superuser(&self, auth_id: &Authid) -> bool {
+        self.permissions().is_superuser(auth_id)
+    }
+
+    fn is_group_member(&self, user_id: &Userid, group: &str) -> bool {
+        self.permissions().is_group_member(user_id, group)
+    }
+
+    fn role_no_access(&self) -> Option<&str> {
+        self.permissions().role_no_access()
+    }
+
+    fn role_admin(&self) -> Option<&str> {
+        self.permissions().role_admin()
+    }
+
+    fn init_user_config(&self, config: &mut SectionConfigData) -> Result<(), Error> {
+        self.permissions().init_user_config(config)
+    }
+
+    fn acl_audit_privileges(&self) -> u64 {
+        self.permissions().acl_audit_privileges()
+    }
+
+    fn acl_modify_privileges(&self) -> u64 {
+        self.permissions().acl_modify_privileges()
+    }
+
+    fn check_acl_path(&self, path: &str) -> Result<(), Error> {
+        self.permissions().check_acl_path(path)
+    }
+
+    fn allow_partial_permission_match(&self) -> bool {
+        self.permissions().allow_partial_permission_match()
+    }
+
+    // Cache hooks
+
+    /// Returns the current cache generation of the user and acl configs. If the generation was
+    /// incremented since the last time the cache was queried, the configs are loaded again from
+    /// disk.
+    ///
+    /// Returning `None` will always reload the cache.
+    ///
+    /// Default: Always returns `None`.
+    fn cache_generation(&self) -> Option<usize> {
+        None
+    }
+
+    /// Increment the cache generation of user and acl configs. This indicates that they were
+    /// changed on disk.
+    ///
+    /// Default: Does nothing.
+    fn increment_cache_generation(&self) -> Result<(), Error> {
+        Ok(())
+    }
+
+    /// Returns the current cache generation of the token shadow cache. If the generation was
+    /// incremented since the last time the cache was queried, the token shadow cache is reloaded
+    /// from disk.
+    ///
+    /// Default: Always returns `None`.
+    fn token_shadow_cache_generation(&self) -> Option<usize> {
+        None
+    }
+
+    /// Increment the cache generation of the token shadow cache. This indicates that it was
+    /// changed on disk.
+    ///
+    /// Default: Returns an error as token shadow generation is not supported.
+    fn increment_token_shadow_cache_generation(&self) -> Result<usize, Error> {
+        anyhow::bail!("token shadow generation not supported");
+    }
+}
+
 pub fn init_access_config(config: &'static dyn AccessControlConfig) -> Result<(), Error> {
     ACCESS_CONF
         .set(config)
-- 
2.47.3






* [PATCH proxmox-datacenter-manager v5 1/3] pdm-config: implement token.shadow generation
  2026-02-17 11:12 14% [PATCH proxmox{-backup,,-datacenter-manager} v5 00/11] token-shadow: reduce api token verification overhead Samuel Rufinatscha
                   ` (7 preceding siblings ...)
  2026-02-17 11:12 15% ` [PATCH proxmox v5 4/4] proxmox-access-control: add TTL window to token secret cache Samuel Rufinatscha
@ 2026-02-17 11:12 13% ` Samuel Rufinatscha
  2026-02-17 11:12 17% ` [PATCH proxmox-datacenter-manager v5 2/3] docs: document API token-cache TTL effects Samuel Rufinatscha
                   ` (2 subsequent siblings)
  11 siblings, 0 replies; 117+ results
From: Samuel Rufinatscha @ 2026-02-17 11:12 UTC (permalink / raw)
  To: pbs-devel

PDM depends on the shared proxmox/proxmox-access-control crate for
token.shadow handling, which expects the product to provide a
cross-process invalidation signal so it can cache/invalidate
token.shadow secrets.

This patch wires AccessControlConfig to ConfigVersionCache for
token.shadow invalidation and switches server/CLI/UI init to use
pdm-config’s AccessControlConfig.

Safety: the shmem mapping is fixed to 4096 bytes via the #[repr(C)]
union padding, and the new atomic is appended to the end of the
#[repr(C)] inner struct, so all existing field offsets stay unchanged.
Old processes keep accessing the same bytes and new processes consume
previously reserved padding.
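
The layout argument can be checked with a small stand-alone sketch. The
struct and field names below are placeholders (the real inner struct has
different fields); only the #[repr(C)] union-with-padding shape and the
4096-byte size are taken from the note above.

```rust
use std::mem::{size_of, ManuallyDrop};
use std::sync::atomic::AtomicUsize;

/// Size of the shared mapping, as stated in the commit message.
const MAPPING_SIZE: usize = 4096;

/// Hypothetical "old" inner layout.
#[repr(C)]
struct InnerOld {
    magic: [u8; 8],
    user_cache_gen: AtomicUsize,
}

/// Hypothetical "new" inner layout: one atomic appended at the end.
#[repr(C)]
struct InnerNew {
    magic: [u8; 8],
    user_cache_gen: AtomicUsize,
    token_shadow_gen: AtomicUsize,
}

/// The padding member pins the union size regardless of the inner struct.
#[repr(C)]
union MappingOld {
    data: ManuallyDrop<InnerOld>,
    _padding: [u8; MAPPING_SIZE],
}

#[repr(C)]
union MappingNew {
    data: ManuallyDrop<InnerNew>,
    _padding: [u8; MAPPING_SIZE],
}

/// Byte offset of `user_cache_gen` in the old layout.
fn old_user_gen_offset() -> usize {
    let v = InnerOld { magic: [0; 8], user_cache_gen: AtomicUsize::new(0) };
    let base = &v as *const InnerOld as usize;
    let field = &v.user_cache_gen as *const AtomicUsize as usize;
    field - base
}

/// Byte offset of `user_cache_gen` in the new layout.
fn new_user_gen_offset() -> usize {
    let v = InnerNew {
        magic: [0; 8],
        user_cache_gen: AtomicUsize::new(0),
        token_shadow_gen: AtomicUsize::new(0),
    };
    let base = &v as *const InnerNew as usize;
    let field = &v.user_cache_gen as *const AtomicUsize as usize;
    field - base
}
```

Because #[repr(C)] lays fields out in declaration order, appending a
field leaves earlier offsets untouched, and the padding array keeps the
union at 4096 bytes as long as the inner struct fits.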

Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
---
Changes from v4 to v5:
* Rebased
* Added safety note to commit message

Changes from v3 to v4:
* pdm-api-types: replace AccessControlConfig with
AccessControlPermissions and implement init::AccessControlPermissions
there
* pdm-config: add new AccessControlConfig implementing
init::AccessControlConfig
* UI: init uses a local UiAccessControlConfig for init_access_config()
* Adjusted commit message

 cli/admin/src/main.rs                      |  2 +-
 lib/pdm-api-types/src/acl.rs               |  4 ++--
 lib/pdm-config/Cargo.toml                  |  1 +
 lib/pdm-config/src/access_control.rs       | 20 ++++++++++++++++++++
 lib/pdm-config/src/config_version_cache.rs | 18 ++++++++++++++++++
 lib/pdm-config/src/lib.rs                  |  2 ++
 server/src/acl.rs                          |  3 +--
 ui/src/main.rs                             | 10 +++++++++-
 8 files changed, 54 insertions(+), 6 deletions(-)
 create mode 100644 lib/pdm-config/src/access_control.rs

diff --git a/cli/admin/src/main.rs b/cli/admin/src/main.rs
index f698fa2..916c633 100644
--- a/cli/admin/src/main.rs
+++ b/cli/admin/src/main.rs
@@ -19,7 +19,7 @@ fn main() {
     proxmox_product_config::init(api_user, priv_user);
 
     proxmox_access_control::init::init(
-        &pdm_api_types::AccessControlConfig,
+        &pdm_config::AccessControlConfig,
         pdm_buildcfg::configdir!("/access"),
     )
     .expect("failed to setup access control config");
diff --git a/lib/pdm-api-types/src/acl.rs b/lib/pdm-api-types/src/acl.rs
index 405982a..7c405a7 100644
--- a/lib/pdm-api-types/src/acl.rs
+++ b/lib/pdm-api-types/src/acl.rs
@@ -187,9 +187,9 @@ pub struct AclListItem {
     pub roleid: String,
 }
 
-pub struct AccessControlConfig;
+pub struct AccessControlPermissions;
 
-impl proxmox_access_control::init::AccessControlConfig for AccessControlConfig {
+impl proxmox_access_control::init::AccessControlPermissions for AccessControlPermissions {
     fn privileges(&self) -> &HashMap<&str, u64> {
         static PRIVS: LazyLock<HashMap<&str, u64>> =
             LazyLock::new(|| PRIVILEGES.iter().copied().collect());
diff --git a/lib/pdm-config/Cargo.toml b/lib/pdm-config/Cargo.toml
index d39c2ad..19781d2 100644
--- a/lib/pdm-config/Cargo.toml
+++ b/lib/pdm-config/Cargo.toml
@@ -13,6 +13,7 @@ once_cell.workspace = true
 openssl.workspace = true
 serde.workspace = true
 
+proxmox-access-control.workspace = true
 proxmox-config-digest = { workspace = true, features = [ "openssl" ] }
 proxmox-http = { workspace = true, features = [ "http-helpers" ] }
 proxmox-ldap = { workspace = true, features = [ "types" ]}
diff --git a/lib/pdm-config/src/access_control.rs b/lib/pdm-config/src/access_control.rs
new file mode 100644
index 0000000..389b3f4
--- /dev/null
+++ b/lib/pdm-config/src/access_control.rs
@@ -0,0 +1,20 @@
+use anyhow::Error;
+
+pub struct AccessControlConfig;
+
+impl proxmox_access_control::init::AccessControlConfig for AccessControlConfig {
+    fn permissions(&self) -> &dyn proxmox_access_control::init::AccessControlPermissions {
+        &pdm_api_types::AccessControlPermissions
+    }
+
+    fn token_shadow_cache_generation(&self) -> Option<usize> {
+        crate::ConfigVersionCache::new()
+            .ok()
+            .map(|c| c.token_shadow_generation())
+    }
+
+    fn increment_token_shadow_cache_generation(&self) -> Result<usize, Error> {
+        let c = crate::ConfigVersionCache::new()?;
+        Ok(c.increase_token_shadow_generation())
+    }
+}
diff --git a/lib/pdm-config/src/config_version_cache.rs b/lib/pdm-config/src/config_version_cache.rs
index 36a6a77..933140c 100644
--- a/lib/pdm-config/src/config_version_cache.rs
+++ b/lib/pdm-config/src/config_version_cache.rs
@@ -27,6 +27,8 @@ struct ConfigVersionCacheDataInner {
     traffic_control_generation: AtomicUsize,
     // Tracks updates to the remote/hostname/nodename mapping cache.
     remote_mapping_cache: AtomicUsize,
+    // Token shadow (token.shadow) generation/version.
+    token_shadow_generation: AtomicUsize,
     // Add further atomics here
 }
 
@@ -172,4 +174,20 @@ impl ConfigVersionCache {
             .fetch_add(1, Ordering::Relaxed)
             + 1
     }
+
+    /// Returns the token shadow generation number.
+    pub fn token_shadow_generation(&self) -> usize {
+        self.shmem
+            .data()
+            .token_shadow_generation
+            .load(Ordering::Acquire)
+    }
+
+    /// Increase the token shadow generation number.
+    pub fn increase_token_shadow_generation(&self) -> usize {
+        self.shmem
+            .data()
+            .token_shadow_generation
+            .fetch_add(1, Ordering::AcqRel)
+    }
 }
diff --git a/lib/pdm-config/src/lib.rs b/lib/pdm-config/src/lib.rs
index 4c49054..614f7ae 100644
--- a/lib/pdm-config/src/lib.rs
+++ b/lib/pdm-config/src/lib.rs
@@ -9,6 +9,8 @@ pub mod remotes;
 pub mod setup;
 pub mod views;
 
+mod access_control;
+pub use access_control::AccessControlConfig;
 mod config_version_cache;
 pub use config_version_cache::ConfigVersionCache;
 
diff --git a/server/src/acl.rs b/server/src/acl.rs
index f421814..e6e007b 100644
--- a/server/src/acl.rs
+++ b/server/src/acl.rs
@@ -1,6 +1,5 @@
 pub(crate) fn init() {
-    static ACCESS_CONTROL_CONFIG: pdm_api_types::AccessControlConfig =
-        pdm_api_types::AccessControlConfig;
+    static ACCESS_CONTROL_CONFIG: pdm_config::AccessControlConfig = pdm_config::AccessControlConfig;
 
     proxmox_access_control::init::init(&ACCESS_CONTROL_CONFIG, pdm_buildcfg::configdir!("/access"))
         .expect("failed to setup access control config");
diff --git a/ui/src/main.rs b/ui/src/main.rs
index 2bd900e..9f87505 100644
--- a/ui/src/main.rs
+++ b/ui/src/main.rs
@@ -390,10 +390,18 @@ fn main() {
     pwt::state::set_available_languages(proxmox_yew_comp::available_language_list());
 
     if let Err(e) =
-        proxmox_access_control::init::init_access_config(&pdm_api_types::AccessControlConfig)
+        proxmox_access_control::init::init_access_config(&UiAccessControlConfig)
     {
         log::error!("could not initialize access control config - {e:#}");
     }
 
     yew::Renderer::<DatacenterManagerApp>::new().render();
 }
+
+struct UiAccessControlConfig;
+
+impl proxmox_access_control::init::AccessControlConfig for UiAccessControlConfig {
+    fn permissions(&self) -> &dyn proxmox_access_control::init::AccessControlPermissions {
+        &pdm_api_types::AccessControlPermissions
+    }
+}
-- 
2.47.3





^ permalink raw reply related	[relevance 13%]

* [PATCH proxmox-backup v5 1/4] pbs-config: add token.shadow generation to ConfigVersionCache
  2026-02-17 11:12 14% [PATCH proxmox{-backup,,-datacenter-manager} v5 00/11] token-shadow: reduce api token verification overhead Samuel Rufinatscha
@ 2026-02-17 11:12 17% ` Samuel Rufinatscha
  2026-02-17 11:12 12% ` [PATCH proxmox-backup v5 2/4] pbs-config: cache verified API token secrets Samuel Rufinatscha
                   ` (10 subsequent siblings)
  11 siblings, 0 replies; 117+ results
From: Samuel Rufinatscha @ 2026-02-17 11:12 UTC (permalink / raw)
  To: pbs-devel

Prepares the config version cache to support token_shadow caching.

Safety: the shmem mapping is fixed to 4096 bytes via the #[repr(C)]
union padding, and the new atomic is appended to the end of the
#[repr(C)] inner struct, so all existing field offsets stay unchanged.
Old processes keep accessing the same bytes and new processes consume
previously reserved padding.
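
The layout argument above can be checked mechanically. The following is a
minimal sketch, not the actual pbs-config code: InnerOld, InnerNew and Padded
are hypothetical, simplified stand-ins for the real inner struct and its
padded union, and only illustrate why appending a field keeps the mapping
size and the existing offsets stable.

```rust
use std::mem::{offset_of, size_of, ManuallyDrop};
use std::sync::atomic::AtomicUsize;

/// Hypothetical, simplified layout before the patch.
#[repr(C)]
struct InnerOld {
    magic: [u8; 8],
    traffic_control_generation: AtomicUsize,
    datastore_generation: AtomicUsize,
}

/// Hypothetical layout after the patch: one atomic appended at the end.
#[repr(C)]
struct InnerNew {
    magic: [u8; 8],
    traffic_control_generation: AtomicUsize,
    datastore_generation: AtomicUsize,
    token_shadow_generation: AtomicUsize,
}

/// The padding member pins the total size, independent of the fields in `T`.
#[repr(C)]
union Padded<T> {
    _data: ManuallyDrop<T>,
    _padding: [u8; 4096],
}

/// Returns true if old and new layouts can share the same mapping.
fn layout_is_compatible() -> bool {
    // Both variants occupy exactly the fixed mapping size.
    size_of::<Padded<InnerOld>>() == 4096
        && size_of::<Padded<InnerNew>>() == 4096
        // Existing fields keep their offsets.
        && offset_of!(InnerOld, traffic_control_generation)
            == offset_of!(InnerNew, traffic_control_generation)
        && offset_of!(InnerOld, datastore_generation)
            == offset_of!(InnerNew, datastore_generation)
        // The new field only uses bytes that were padding before.
        && offset_of!(InnerNew, token_shadow_generation) >= size_of::<InnerOld>()
}

fn main() {
    assert!(layout_is_compatible());
    println!("layouts are compatible");
}
```

Inserting the new atomic anywhere but at the end would shift the offsets of
the fields after it, and the offset comparisons would fail.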

Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
---
Changes from v4 to v5:
* Rebased

Changes from v3 to v4:
* Rebased
* Adjusted commit message

Changes from v2 to v3:
* Rebased

Changes from v1 to v2:
* Rebased

 pbs-config/src/config_version_cache.rs | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/pbs-config/src/config_version_cache.rs b/pbs-config/src/config_version_cache.rs
index b875f7e0..399a6f79 100644
--- a/pbs-config/src/config_version_cache.rs
+++ b/pbs-config/src/config_version_cache.rs
@@ -27,6 +27,8 @@ struct ConfigVersionCacheDataInner {
     traffic_control_generation: AtomicUsize,
     // datastore (datastore.cfg) generation/version
     datastore_generation: AtomicUsize,
+    // Token shadow (token.shadow) generation/version.
+    token_shadow_generation: AtomicUsize,
     // Add further atomics here
 }
 
@@ -159,4 +161,20 @@ impl ConfigVersionCache {
             .datastore_generation
             .fetch_add(1, Ordering::AcqRel)
     }
+
+    /// Returns the token shadow generation number.
+    pub fn token_shadow_generation(&self) -> usize {
+        self.shmem
+            .data()
+            .token_shadow_generation
+            .load(Ordering::Acquire)
+    }
+
+    /// Increase the token shadow generation number.
+    pub fn increase_token_shadow_generation(&self) -> usize {
+        self.shmem
+            .data()
+            .token_shadow_generation
+            .fetch_add(1, Ordering::AcqRel)
+    }
 }
-- 
2.47.3





^ permalink raw reply related	[relevance 17%]

* [PATCH proxmox-datacenter-manager v5 2/3] docs: document API token-cache TTL effects
  2026-02-17 11:12 14% [PATCH proxmox{-backup,,-datacenter-manager} v5 00/11] token-shadow: reduce api token verification overhead Samuel Rufinatscha
                   ` (8 preceding siblings ...)
  2026-02-17 11:12 13% ` [PATCH proxmox-datacenter-manager v5 1/3] pdm-config: implement token.shadow generation Samuel Rufinatscha
@ 2026-02-17 11:12 17% ` Samuel Rufinatscha
  2026-02-17 11:12 16% ` [PATCH proxmox-datacenter-manager v5 3/3] pdm-config: wire user+acl cache generation Samuel Rufinatscha
  2026-03-03 16:52 13% ` [pbs-devel] superseded: [PATCH proxmox{-backup,,-datacenter-manager} v5 00/11] token-shadow: reduce api token verification overhead Samuel Rufinatscha
  11 siblings, 0 replies; 117+ results
From: Samuel Rufinatscha @ 2026-02-17 11:12 UTC (permalink / raw)
  To: pbs-devel

Documents the effects of the added API token-cache in the
proxmox-access-control crate.

Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
---
Changes from v4 to v5:
* Rebased

Changes from v3 to v4:
* Adjusted commit message

 docs/access-control.rst | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/docs/access-control.rst b/docs/access-control.rst
index adf26cd..18e57a2 100644
--- a/docs/access-control.rst
+++ b/docs/access-control.rst
@@ -47,6 +47,10 @@ place of the user ID (``user@realm``) and the user password, respectively.
 The API token is passed from the client to the server by setting the ``Authorization`` HTTP header
 with method ``PDMAPIToken`` to the value ``TOKENID:TOKENSECRET``.
 
+.. WARNING:: Direct/manual edits to ``token.shadow`` may take up to 60 seconds (or
+   longer in edge cases) to take effect due to caching. Restart services for
+   immediate effect of manual edits.
+
 .. _access_control:
 
 Access Control
-- 
2.47.3





^ permalink raw reply related	[relevance 17%]

* [PATCH proxmox v5 4/4] proxmox-access-control: add TTL window to token secret cache
  2026-02-17 11:12 14% [PATCH proxmox{-backup,,-datacenter-manager} v5 00/11] token-shadow: reduce api token verification overhead Samuel Rufinatscha
                   ` (6 preceding siblings ...)
  2026-02-17 11:12 12% ` [PATCH proxmox v5 3/4] proxmox-access-control: invalidate token-secret cache on token.shadow changes Samuel Rufinatscha
@ 2026-02-17 11:12 15% ` Samuel Rufinatscha
  2026-02-17 11:12 13% ` [PATCH proxmox-datacenter-manager v5 1/3] pdm-config: implement token.shadow generation Samuel Rufinatscha
                   ` (3 subsequent siblings)
  11 siblings, 0 replies; 117+ results
From: Samuel Rufinatscha @ 2026-02-17 11:12 UTC (permalink / raw)
  To: pbs-devel

verify_secret() currently calls refresh_cache_if_file_changed() on every
request, which performs a metadata() call on token.shadow each time.
Under load this adds unnecessary overhead, especially since the file
rarely changes.

This patch introduces a TTL boundary, controlled by
TOKEN_SECRET_CACHE_TTL_SECS. File metadata is only re-checked once the
TTL has expired. The patch also documents the TTL effects.
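
As a rough illustration of the TTL window, here is a minimal standalone
sketch. TTL_SECS and within_ttl are hypothetical names that mirror
TOKEN_SECRET_CACHE_TTL_SECS and the shadow_check_within_ttl() helper in
spirit only; the real helper reads last_checked from the cached shadow file
metadata.

```rust
/// Hypothetical constant mirroring TOKEN_SECRET_CACHE_TTL_SECS.
const TTL_SECS: i64 = 60;

/// Returns true if a previous check exists and is less than TTL_SECS old.
/// A clock that moved backwards (now < last_checked) counts as expired.
fn within_ttl(last_checked: Option<i64>, now: i64) -> bool {
    last_checked
        .and_then(|t| now.checked_sub(t))
        .is_some_and(|age| (0..TTL_SECS).contains(&age))
}

fn main() {
    assert!(within_ttl(Some(100), 159)); // 59s old: still fresh
    assert!(!within_ttl(Some(100), 160)); // 60s old: expired
    assert!(!within_ttl(Some(100), 99)); // clock went backwards
    assert!(!within_ttl(None, 100)); // never checked
}
```

Treating a backwards clock as expired forces a re-check of the file, which
is the conservative choice.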

Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
---
Changes from v4 to v5:
* Rebased
* Introduce shadow_check_within_ttl() helper

Changes from v3 to v4:
* Adjusted commit message

Changes from v2 to v3:
* Refactored refresh_cache_if_file_changed TTL logic.
* Remove had_prior_state check (replaced by last_checked logic).
* Improve TTL bound checks.
* Reword documentation warning for clarity.

Changes from v1 to v2:
* Add TOKEN_SECRET_CACHE_TTL_SECS and last_checked.
* Implement double-checked TTL: check with try_read first; only attempt
  refresh with try_write if expired/unknown.
* Fix TTL bookkeeping: update last_checked on the “file unchanged” path
  and after API mutations.
* Add documentation warning about TTL-delayed effect of manual
  token.shadow edits.

 proxmox-access-control/src/token_shadow.rs | 31 +++++++++++++++++++++-
 1 file changed, 30 insertions(+), 1 deletion(-)

diff --git a/proxmox-access-control/src/token_shadow.rs b/proxmox-access-control/src/token_shadow.rs
index d1b7d4cb..2d318f64 100644
--- a/proxmox-access-control/src/token_shadow.rs
+++ b/proxmox-access-control/src/token_shadow.rs
@@ -28,6 +28,9 @@ static TOKEN_SECRET_CACHE: LazyLock<RwLock<ApiTokenSecretCache>> = LazyLock::new
     })
 });
 
+/// Max age in seconds of the token secret cache before checking for file changes.
+const TOKEN_SECRET_CACHE_TTL_SECS: i64 = 60;
+
 // Get exclusive lock
 fn lock_config() -> Result<ApiLockGuard, Error> {
     open_api_lockfile(token_shadow_lock(), None, true)
@@ -55,11 +58,24 @@ fn write_file(data: HashMap<Authid, String>) -> Result<(), Error> {
 fn refresh_cache_if_file_changed() -> bool {
     let now = epoch_i64();
 
-    // Best-effort refresh under write lock.
+    // Fast path: cache is fresh if shared-gen matches and TTL not expired.
+    if let (Some(cache), Some(shared_gen_read)) =
+        (TOKEN_SECRET_CACHE.try_read(), token_shadow_shared_gen())
+    {
+        if cache.shared_gen == shared_gen_read && cache.shadow_check_within_ttl(now) {
+            return true;
+        }
+        // read lock drops here
+    } else {
+        return false;
+    }
+
+    // Slow path: best-effort refresh under write lock.
     let Some(mut cache) = TOKEN_SECRET_CACHE.try_write() else {
         return false;
     };
 
+    // Re-read generation after acquiring the lock (may have changed meanwhile).
     let Some(shared_gen_now) = token_shadow_shared_gen() else {
         return false;
     };
@@ -69,6 +85,12 @@ fn refresh_cache_if_file_changed() -> bool {
         cache.reset_and_set_gen(shared_gen_now);
     }
 
+    // TTL check again after acquiring the lock
+    let now = epoch_i64();
+    if cache.shadow_check_within_ttl(now) {
+        return true;
+    }
+
     // Stat the file to detect manual edits.
     let Ok((new_mtime, new_len)) = shadow_mtime_len() else {
         return false;
@@ -217,6 +239,13 @@ impl ApiTokenSecretCache {
         self.secrets.remove(tokenid);
         self.shared_gen = gen;
     }
+
+    /// Returns true if cached token.shadow metadata exists and was checked within the TTL window.
+    fn shadow_check_within_ttl(&self, now: i64) -> bool {
+        self.shadow.as_ref().is_some_and(|cached| {
+            now >= cached.last_checked && (now - cached.last_checked) < TOKEN_SECRET_CACHE_TTL_SECS
+        })
+    }
 }
 
 /// Shadow file info
-- 
2.47.3





^ permalink raw reply related	[relevance 15%]

* [PATCH proxmox-datacenter-manager v5 3/3] pdm-config: wire user+acl cache generation
  2026-02-17 11:12 14% [PATCH proxmox{-backup,,-datacenter-manager} v5 00/11] token-shadow: reduce api token verification overhead Samuel Rufinatscha
                   ` (9 preceding siblings ...)
  2026-02-17 11:12 17% ` [PATCH proxmox-datacenter-manager v5 2/3] docs: document API token-cache TTL effects Samuel Rufinatscha
@ 2026-02-17 11:12 16% ` Samuel Rufinatscha
  2026-03-03 16:52 13% ` [pbs-devel] superseded: [PATCH proxmox{-backup,,-datacenter-manager} v5 00/11] token-shadow: reduce api token verification overhead Samuel Rufinatscha
  11 siblings, 0 replies; 117+ results
From: Samuel Rufinatscha @ 2026-02-17 11:12 UTC (permalink / raw)
  To: pbs-devel

Rename ConfigVersionCache’s user_cache_generation to
user_and_acl_generation to match AccessControlConfig::cache_generation
and increment_cache_generation semantics: it expects the same shared
generation for both user and ACL configs.

Safety: no layout change, the shared-memory size and field order remain
unchanged.

Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
---
Changes from v4 to v5:
* Rebased

 lib/pdm-config/src/access_control.rs       | 11 +++++++++++
 lib/pdm-config/src/config_version_cache.rs | 16 ++++++++--------
 2 files changed, 19 insertions(+), 8 deletions(-)

diff --git a/lib/pdm-config/src/access_control.rs b/lib/pdm-config/src/access_control.rs
index 389b3f4..1d498d3 100644
--- a/lib/pdm-config/src/access_control.rs
+++ b/lib/pdm-config/src/access_control.rs
@@ -7,6 +7,17 @@ impl proxmox_access_control::init::AccessControlConfig for AccessControlConfig {
         &pdm_api_types::AccessControlPermissions
     }
 
+    fn cache_generation(&self) -> Option<usize> {
+        crate::ConfigVersionCache::new()
+            .ok()
+            .map(|c| c.user_and_acl_generation())
+    }
+
+    fn increment_cache_generation(&self) -> Result<(), Error> {
+        let c = crate::ConfigVersionCache::new()?;
+        Ok(c.increase_user_and_acl_generation())
+    }
+
     fn token_shadow_cache_generation(&self) -> Option<usize> {
         crate::ConfigVersionCache::new()
             .ok()
diff --git a/lib/pdm-config/src/config_version_cache.rs b/lib/pdm-config/src/config_version_cache.rs
index 933140c..f3d52a0 100644
--- a/lib/pdm-config/src/config_version_cache.rs
+++ b/lib/pdm-config/src/config_version_cache.rs
@@ -21,8 +21,8 @@ use proxmox_shared_memory::*;
 #[repr(C)]
 struct ConfigVersionCacheDataInner {
     magic: [u8; 8],
-    // User (user.cfg) cache generation/version.
-    user_cache_generation: AtomicUsize,
+    // User (user.cfg) and ACL (acl.cfg) generation/version.
+    user_and_acl_generation: AtomicUsize,
     // Traffic control (traffic-control.cfg) generation/version.
     traffic_control_generation: AtomicUsize,
     // Tracks updates to the remote/hostname/nodename mapping cache.
@@ -126,19 +126,19 @@ impl ConfigVersionCache {
         Ok(Arc::new(Self { shmem }))
     }
 
-    /// Returns the user cache generation number.
-    pub fn user_cache_generation(&self) -> usize {
+    /// Returns the user and ACL cache generation number.
+    pub fn user_and_acl_generation(&self) -> usize {
         self.shmem
             .data()
-            .user_cache_generation
+            .user_and_acl_generation
             .load(Ordering::Acquire)
     }
 
-    /// Increase the user cache generation number.
-    pub fn increase_user_cache_generation(&self) {
+    /// Increase the user and ACL cache generation number.
+    pub fn increase_user_and_acl_generation(&self) {
         self.shmem
             .data()
-            .user_cache_generation
+            .user_and_acl_generation
             .fetch_add(1, Ordering::AcqRel);
     }
 
-- 
2.47.3





^ permalink raw reply related	[relevance 16%]

* [PATCH proxmox v5 2/4] proxmox-access-control: cache verified API token secrets
  2026-02-17 11:12 14% [PATCH proxmox{-backup,,-datacenter-manager} v5 00/11] token-shadow: reduce api token verification overhead Samuel Rufinatscha
                   ` (4 preceding siblings ...)
  2026-02-17 11:12 14% ` [PATCH proxmox v5 1/4] proxmox-access-control: split AccessControlConfig and add token.shadow gen Samuel Rufinatscha
@ 2026-02-17 11:12 11% ` Samuel Rufinatscha
  2026-02-17 11:12 12% ` [PATCH proxmox v5 3/4] proxmox-access-control: invalidate token-secret cache on token.shadow changes Samuel Rufinatscha
                   ` (5 subsequent siblings)
  11 siblings, 0 replies; 117+ results
From: Samuel Rufinatscha @ 2026-02-17 11:12 UTC (permalink / raw)
  To: pbs-devel

Adds an in-memory cache of successfully verified token secrets.
Subsequent requests for the same token+secret combination only perform
a comparison using openssl::memcmp::eq and avoid re-running the
password hash. The cache is updated when a token secret is set and
cleared when a token is deleted. A shared generation counter (via
ConfigVersionCache) is used to invalidate caches across processes when
token secrets are modified or deleted. This keeps privileged and
unprivileged daemons in sync.
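
The generation-guarded cache described above can be sketched roughly as
follows. This is a simplified illustration under stated assumptions, not the
patch itself: SecretCache and its methods are hypothetical names, a
process-local AtomicUsize stands in for the ConfigVersionCache counter, keys
are plain strings instead of Authid, and the comparison uses `==` where the
patch uses openssl::memcmp::eq.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical cache of verified secrets, tagged with the generation it saw.
struct SecretCache {
    secrets: HashMap<String, String>,
    gen: usize,
}

impl SecretCache {
    fn new() -> Self {
        Self { secrets: HashMap::new(), gen: 0 }
    }

    /// Fast path: only trust an entry if no mutation happened since caching.
    /// NOTE: `==` is not constant-time; it is a stand-in for illustration.
    fn matches(&self, shared: &AtomicUsize, token: &str, secret: &str) -> bool {
        self.gen == shared.load(Ordering::Acquire)
            && self.secrets.get(token).is_some_and(|s| s == secret)
    }

    /// Insert after a slow-path verification, unless the generation moved
    /// while the (expensive) hash check was running.
    fn insert_if_unchanged(
        &mut self,
        shared: &AtomicUsize,
        token: &str,
        secret: &str,
        gen_before: usize,
    ) {
        let gen_now = shared.load(Ordering::Acquire);
        if self.gen != gen_now {
            // This process missed a bump: drop everything it cached.
            self.secrets.clear();
            self.gen = gen_now;
        }
        if gen_now == gen_before {
            self.secrets.insert(token.to_owned(), secret.to_owned());
        }
    }
}

fn main() {
    let shared = AtomicUsize::new(0);
    let mut cache = SecretCache::new();

    let gen_before = shared.load(Ordering::Acquire);
    cache.insert_if_unchanged(&shared, "user@pbs!tok", "s3cret", gen_before);
    assert!(cache.matches(&shared, "user@pbs!tok", "s3cret"));

    // Another process changes token.shadow and bumps the generation.
    shared.fetch_add(1, Ordering::AcqRel);
    assert!(!cache.matches(&shared, "user@pbs!tok", "s3cret"));
}
```

Capturing the generation before the slow verification and comparing it again
at insert time is what prevents a stale secret from being cached after a
concurrent set or delete.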

Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
---
Changes from v4 to v5:
* Rebased
* Fix wrong type compilation issue; replaced with ApiLockGuard
* Move invalidate_cache_state_and_set_gen into the cache object impl
and rename it to reset_and_set_gen
* Add additional insert/remove helpers which set/update the generation
directly
* Clarified the usage of the shared generation counter in the commit
message

Changes from v3 to v4:
* Add gen param to invalidate_cache_state()
* Validates the generation bump after obtaining write lock in
apply_api_mutation
* Pass lock to apply_api_mutation
* Remove unnecessary gen check cache_try_secret_matches
* Adjusted commit message

Changes from v2 to v3:
* Replaced process-local cache invalidation (AtomicU64
API_MUTATION_GENERATION) with a cross-process shared generation via
ConfigVersionCache.
* Validate shared generation before/after the constant-time secret
compare; only insert into cache if the generation is unchanged.
* invalidate_cache_state() on insert if shared generation changed.

Changes from v1 to v2:
* Replace OnceCell with LazyLock, and std::sync::RwLock with
parking_lot::RwLock.
* Add API_MUTATION_GENERATION and guard cache inserts
to prevent “zombie inserts” across concurrent set/delete.
* Refactor cache operations into cache_try_secret_matches,
cache_try_insert_secret, and centralize write-side behavior in
apply_api_mutation.
* Switch fast-path cache access to try_read/try_write (best-effort).

 Cargo.toml                                 |   1 +
 proxmox-access-control/Cargo.toml          |   1 +
 proxmox-access-control/src/token_shadow.rs | 167 ++++++++++++++++++++-
 3 files changed, 166 insertions(+), 3 deletions(-)

diff --git a/Cargo.toml b/Cargo.toml
index 6ce4d5ec..d31e4e30 100644
--- a/Cargo.toml
+++ b/Cargo.toml
@@ -112,6 +112,7 @@ native-tls = "0.2"
 nix = "0.29"
 openssl = "0.10"
 pam-sys = "0.5"
+parking_lot = "0.12"
 percent-encoding = "2.1"
 pin-utils = "0.1.0"
 proc-macro2 = "1.0"
diff --git a/proxmox-access-control/Cargo.toml b/proxmox-access-control/Cargo.toml
index ec189664..1de2842c 100644
--- a/proxmox-access-control/Cargo.toml
+++ b/proxmox-access-control/Cargo.toml
@@ -16,6 +16,7 @@ anyhow.workspace = true
 const_format.workspace = true
 nix = { workspace = true, optional = true }
 openssl = { workspace = true, optional = true }
+parking_lot.workspace = true
 regex.workspace = true
 hex = { workspace = true, optional = true }
 serde.workspace = true
diff --git a/proxmox-access-control/src/token_shadow.rs b/proxmox-access-control/src/token_shadow.rs
index c586d834..ba5fb3f8 100644
--- a/proxmox-access-control/src/token_shadow.rs
+++ b/proxmox-access-control/src/token_shadow.rs
@@ -1,13 +1,28 @@
 use std::collections::HashMap;
+use std::sync::LazyLock;
 
 use anyhow::{bail, format_err, Error};
+use parking_lot::RwLock;
 use serde_json::{from_value, Value};
 
 use proxmox_auth_api::types::Authid;
 use proxmox_product_config::{open_api_lockfile, replace_config, ApiLockGuard};
 
+use crate::init::access_conf;
 use crate::init::impl_feature::{token_shadow, token_shadow_lock};
 
+/// Global in-memory cache for successfully verified API token secrets.
+/// The cache stores plain text secrets for token Authids that have already been
+/// verified against the hashed values in `token.shadow`. This allows for cheap
+/// subsequent authentications for the same token+secret combination, avoiding
+/// recomputing the password hash on every request.
+static TOKEN_SECRET_CACHE: LazyLock<RwLock<ApiTokenSecretCache>> = LazyLock::new(|| {
+    RwLock::new(ApiTokenSecretCache {
+        secrets: HashMap::new(),
+        shared_gen: 0,
+    })
+});
+
 // Get exclusive lock
 fn lock_config() -> Result<ApiLockGuard, Error> {
     open_api_lockfile(token_shadow_lock(), None, true)
@@ -36,9 +51,27 @@ pub fn verify_secret(tokenid: &Authid, secret: &str) -> Result<(), Error> {
         bail!("not an API token ID");
     }
 
+    // Fast path
+    if cache_try_secret_matches(tokenid, secret) {
+        return Ok(());
+    }
+
+    // Slow path
+    // First, capture the shared generation before doing the hash verification.
+    let gen_before = token_shadow_shared_gen();
+
     let data = read_file()?;
     match data.get(tokenid) {
-        Some(hashed_secret) => proxmox_sys::crypt::verify_crypt_pw(secret, hashed_secret),
+        Some(hashed_secret) => {
+            proxmox_sys::crypt::verify_crypt_pw(secret, hashed_secret)?;
+
+            // Try to cache only if nothing changed while verifying the secret.
+            if let Some(gen) = gen_before {
+                cache_try_insert_secret(tokenid.clone(), secret.to_owned(), gen);
+            }
+
+            Ok(())
+        }
         None => bail!("invalid API token"),
     }
 }
@@ -49,13 +82,15 @@ pub fn set_secret(tokenid: &Authid, secret: &str) -> Result<(), Error> {
         bail!("not an API token ID");
     }
 
-    let _guard = lock_config()?;
+    let guard = lock_config()?;
 
     let mut data = read_file()?;
     let hashed_secret = proxmox_sys::crypt::encrypt_pw(secret)?;
     data.insert(tokenid.clone(), hashed_secret);
     write_file(data)?;
 
+    apply_api_mutation(guard, tokenid, Some(secret));
+
     Ok(())
 }
 
@@ -65,12 +100,14 @@ pub fn delete_secret(tokenid: &Authid) -> Result<(), Error> {
         bail!("not an API token ID");
     }
 
-    let _guard = lock_config()?;
+    let guard = lock_config()?;
 
     let mut data = read_file()?;
     data.remove(tokenid);
     write_file(data)?;
 
+    apply_api_mutation(guard, tokenid, None);
+
     Ok(())
 }
 
@@ -81,3 +118,127 @@ pub fn generate_and_set_secret(tokenid: &Authid) -> Result<String, Error> {
     set_secret(tokenid, &secret)?;
     Ok(secret)
 }
+
+/// Cached secret.
+struct CachedSecret {
+    secret: String,
+}
+
+struct ApiTokenSecretCache {
+    /// Keys are token Authids, values are the corresponding plain text secrets.
+    /// Entries are added after a successful on-disk verification in
+    /// `verify_secret` or when a new token secret is generated by
+    /// `generate_and_set_secret`. Used to avoid repeated
+    /// password-hash computation on subsequent authentications.
+    secrets: HashMap<Authid, CachedSecret>,
+    /// Shared generation to detect mutations of the underlying token.shadow file.
+    shared_gen: usize,
+}
+
+impl ApiTokenSecretCache {
+    /// Resets all local cache contents and sets/updates the cached generation.
+    fn reset_and_set_gen(&mut self, gen: usize) {
+        self.secrets.clear();
+        self.shared_gen = gen;
+    }
+
+    /// Caches a secret and sets/updates the cache generation.
+    fn insert_and_set_gen(&mut self, tokenid: Authid, secret: CachedSecret, gen: usize) {
+        self.secrets.insert(tokenid, secret);
+        self.shared_gen = gen;
+    }
+
+    /// Evicts a cached secret and sets/updates the cached generation.
+    fn evict_and_set_gen(&mut self, tokenid: &Authid, gen: usize) {
+        self.secrets.remove(tokenid);
+        self.shared_gen = gen;
+    }
+}
+
+fn cache_try_insert_secret(tokenid: Authid, secret: String, shared_gen_before: usize) {
+    let Some(mut cache) = TOKEN_SECRET_CACHE.try_write() else {
+        return;
+    };
+
+    let Some(shared_gen_now) = token_shadow_shared_gen() else {
+        return;
+    };
+
+    // If this process missed a generation bump, its cache is stale.
+    if cache.shared_gen != shared_gen_now {
+        cache.reset_and_set_gen(shared_gen_now);
+    }
+
+    // If a mutation happened while we were verifying the secret, do not insert.
+    if shared_gen_now == shared_gen_before {
+        cache.insert_and_set_gen(tokenid, CachedSecret { secret }, shared_gen_now);
+    }
+}
+
+/// Tries to match the given token secret against the cached secret.
+///
+/// Verifies the generation/version before doing the constant-time
+/// comparison to reduce TOCTOU risk. During token rotation or deletion
+/// tokens for in-flight requests may still validate against the previous
+/// generation.
+fn cache_try_secret_matches(tokenid: &Authid, secret: &str) -> bool {
+    let Some(cache) = TOKEN_SECRET_CACHE.try_read() else {
+        return false;
+    };
+    let Some(entry) = cache.secrets.get(tokenid) else {
+        return false;
+    };
+    let Some(current_gen) = token_shadow_shared_gen() else {
+        return false;
+    };
+
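+    // openssl::memcmp::eq panics on length mismatch, so compare lengths first.
+    if current_gen != cache.shared_gen || entry.secret.len() != secret.len() {
+        return false;
+    }
+    openssl::memcmp::eq(entry.secret.as_bytes(), secret.as_bytes())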
+}
+
+fn apply_api_mutation(_guard: ApiLockGuard, tokenid: &Authid, new_secret: Option<&str>) {
+    // Signal cache invalidation to other processes (best-effort).
+    let bumped_gen = bump_token_shadow_shared_gen();
+
+    let mut cache = TOKEN_SECRET_CACHE.write();
+
+    // If we cannot get the current generation, we cannot trust the cache
+    let Some(current_gen) = token_shadow_shared_gen() else {
+        cache.reset_and_set_gen(0);
+        return;
+    };
+
+    // If we cannot bump the shared generation, or if it changed after
+    // obtaining the cache write lock, we cannot trust the cache
+    if bumped_gen != Some(current_gen) {
+        cache.reset_and_set_gen(current_gen);
+        return;
+    }
+
+    // Apply the new mutation.
+    match new_secret {
+        Some(secret) => {
+            let cached_secret = CachedSecret {
+                secret: secret.to_owned(),
+            };
+            cache.insert_and_set_gen(tokenid.clone(), cached_secret, current_gen);
+        }
+        None => cache.evict_and_set_gen(tokenid, current_gen),
+    }
+}
+
+/// Get the current shared generation.
+fn token_shadow_shared_gen() -> Option<usize> {
+    access_conf().token_shadow_cache_generation()
+}
+
+/// Bump and return the new shared generation.
+fn bump_token_shadow_shared_gen() -> Option<usize> {
+    access_conf()
+        .increment_token_shadow_cache_generation()
+        .ok()
+        .map(|prev| prev + 1)
+}
-- 
2.47.3





^ permalink raw reply related	[relevance 11%]

* [pbs-devel] superseded: [PATCH proxmox{-backup,,-datacenter-manager} v4 00/11] token-shadow: reduce api token verification overhead
  2026-01-21 15:13 14% [pbs-devel] [PATCH proxmox{-backup, , -datacenter-manager} v4 00/11] token-shadow: reduce api token verification overhead Samuel Rufinatscha
                   ` (10 preceding siblings ...)
  2026-01-21 15:14 16% ` [pbs-devel] [PATCH proxmox-datacenter-manager v4 3/3] pdm-config: wire user+acl cache generation Samuel Rufinatscha
@ 2026-02-17 11:14 13% ` Samuel Rufinatscha
  11 siblings, 0 replies; 117+ results
From: Samuel Rufinatscha @ 2026-02-17 11:14 UTC (permalink / raw)
  To: pbs-devel

https://lore.proxmox.com/pbs-devel/20260217111229.78661-1-s.rufinatscha@proxmox.com/T/#t

On 1/21/26 4:13 PM, Samuel Rufinatscha wrote:
> Hi,
> 
> this series improves the performance of token-based API authentication
> in PBS (pbs-config) and in PDM (underlying proxmox-access-control
> crate), addressing the API token verification hotspot reported in our
> bugtracker #7017 [1].
> 
> When profiling PBS /status endpoint with cargo flamegraph [2],
> token-based authentication showed up as a dominant hotspot via
> proxmox_sys::crypt::verify_crypt_pw. Applying this series removes that
> path from the hot section of the flamegraph. The same performance issue
> was measured [2] for PDM. PDM uses the underlying shared
> proxmox-access-control library for token handling, which is a
> factored out version of the token.shadow handling code from PBS.
> 
> While this series fixes the immediate performance issue both in PBS
> (pbs-config) and in the shared proxmox-access-control crate used by
> PDM, PBS should ideally be refactored in a separate effort to use
> proxmox-access-control for token handling instead of its local
> implementation.
> 
> Approach
> 
> The goal is to reduce the cost of token-based authentication while
> preserving the existing token handling semantics (including detecting
> manual edits to token.shadow) and staying consistent between PBS
> (pbs-config) and PDM (proxmox-access-control). For both, this series
> proposes to:
> 
> 1. Introduce an in-memory cache for verified token secrets and
> invalidate it through a shared ConfigVersionCache generation. Note, a
> shared generation is required to keep privileged and unprivileged
> daemons in sync and avoid caching inconsistencies across processes.
> 2. Invalidate on token.shadow API changes (set_secret,
> delete_secret)
> 3. Invalidate on direct/manual token.shadow file changes (mtime +
> length)
> 4. Avoid per-request file stat calls using a TTL window
> 
> Testing
> 
> To verify the effect in PBS (pbs-config changes), I:
> 1. Set up test environment based on latest PBS ISO, installed Rust
>     toolchain, cloned proxmox-backup repository to use with cargo
>     flamegraph. Reproduced bug #7017 [1] by profiling the /status
>     endpoint with token-based authentication using cargo flamegraph [2].
> 2. Built PBS with pbs-config patches and re-ran the same workload and
>     profiling setup. Confirmed that
>     proxmox_sys::crypt::verify_crypt_pw path no longer appears in the
>     hot section of the flamegraph. CPU usage is now dominated by TLS
>     overhead.
> 3. Functionally-wise, I verified that:
>     * valid tokens authenticate correctly when used in API requests
>     * invalid secrets are rejected as before
>     * generating a new token secret via dashboard (create token for
>     user, regenerate existing secret) works and authenticates correctly
> 
> To verify the effect in PDM (proxmox-access-control changes), instead
> of PBS’ /status, I profiled the /version endpoint with cargo flamegraph
> [2] and verified that the expensive hashing path disappears from the
> hot section after introducing caching. Functionally-wise, I verified
> that:
>     * valid tokens authenticate correctly when used in API requests
>     * invalid secrets are rejected as before
>     * generating a new token secret via dashboard (create token for user,
>     regenerate existing secret) works and authenticates correctly
> 
> Benchmarks
> 
> Two different benchmarks have been run to measure caching effects
> and RwLock contention:
> 
> (1) Requests per second for PBS /status endpoint (E2E)
> 
> Benchmarked parallel token auth requests for
> /status?verbose=0 on top of the datastore lookup cache series [3]
> to check throughput impact. With datastores=1, repeat=5000, parallel=16
> this series gives ~172 req/s compared to ~65 req/s without it.
> This is a ~2.6x improvement (and aligns with the ~179 req/s from the
> previous series, which used per-process cache invalidation).
> 
> (2) RwLock contention for token create/delete under heavy load of
> token-authenticated requests
> 
> The previous version of the series compared std::sync::RwLock and
> parking_lot::RwLock contention for token create/delete under heavy
> parallel token-authenticated readers. parking_lot::RwLock has been
> chosen for the added fairness guarantees.
> 
> Patch summary
> 
> pbs-config:
> 0001 – pbs-config: add token.shadow generation to ConfigVersionCache
> 0002 – pbs-config: cache verified API token secrets
> 0003 – pbs-config: invalidate token-secret cache on token.shadow
> changes
> 0004 – pbs-config: add TTL window to token-secret cache
> 
> proxmox-access-control:
> 0005 – access-control: extend AccessControlConfig for token.shadow invalidation
> 0006 – access-control: cache verified API token secrets
> 0007 – access-control: invalidate token-secret cache on token.shadow changes
> 0008 – access-control: add TTL window to token-secret cache
> 
> proxmox-datacenter-manager:
> 0009 – pdm-config: add token.shadow generation to ConfigVersionCache
> 0010 – docs: document API token-cache TTL effects
> 0011 – pdm-config: wire user+acl cache generation
> 
> Maintainer notes
> * proxmox-access-control trait split: permissions now live in
>   AccessControlPermissions, and AccessControlConfig now requires
>   fn permissions(&self) -> &dyn AccessControlPermissions ->
>   version bump
> * Renames ConfigVersionCache's pub user_cache_generation and
>   increase_user_cache_generation -> version bump
> * Adds parking_lot::RwLock dependency in PBS and proxmox-access-control
> 
> Kind regards,
> Samuel Rufinatscha
> 
> [1] https://bugzilla.proxmox.com/show_bug.cgi?id=7017
> [2] attachment 1767 [1]: Flamegraph showing the proxmox_sys::crypt::verify_crypt_pw stack
> [3] https://bugzilla.proxmox.com/show_bug.cgi?id=6049
> 
> proxmox-backup:
> 
> Samuel Rufinatscha (4):
>    pbs-config: add token.shadow generation to ConfigVersionCache
>    pbs-config: cache verified API token secrets
>    pbs-config: invalidate token-secret cache on token.shadow changes
>    pbs-config: add TTL window to token secret cache
> 
>   Cargo.toml                             |   1 +
>   docs/user-management.rst               |   4 +
>   pbs-config/Cargo.toml                  |   1 +
>   pbs-config/src/config_version_cache.rs |  18 ++
>   pbs-config/src/token_shadow.rs         | 302 ++++++++++++++++++++++++-
>   5 files changed, 323 insertions(+), 3 deletions(-)
> 
> 
> proxmox:
> 
> Samuel Rufinatscha (4):
>    proxmox-access-control: split AccessControlConfig and add token.shadow
>      gen
>    proxmox-access-control: cache verified API token secrets
>    proxmox-access-control: invalidate token-secret cache on token.shadow
>      changes
>    proxmox-access-control: add TTL window to token secret cache
> 
>   Cargo.toml                                 |   1 +
>   proxmox-access-control/Cargo.toml          |   1 +
>   proxmox-access-control/src/acl.rs          |  10 +-
>   proxmox-access-control/src/init.rs         | 113 ++++++--
>   proxmox-access-control/src/token_shadow.rs | 303 ++++++++++++++++++++-
>   5 files changed, 401 insertions(+), 27 deletions(-)
> 
> 
> proxmox-datacenter-manager:
> 
> Samuel Rufinatscha (3):
>    pdm-config: implement token.shadow generation
>    docs: document API token-cache TTL effects
>    pdm-config: wire user+acl cache generation
> 
>   cli/admin/src/main.rs                      |  2 +-
>   docs/access-control.rst                    |  4 +++
>   lib/pdm-api-types/src/acl.rs               |  4 +--
>   lib/pdm-config/Cargo.toml                  |  1 +
>   lib/pdm-config/src/access_control.rs       | 31 ++++++++++++++++++++
>   lib/pdm-config/src/config_version_cache.rs | 34 +++++++++++++++++-----
>   lib/pdm-config/src/lib.rs                  |  2 ++
>   server/src/acl.rs                          |  3 +-
>   ui/src/main.rs                             | 10 ++++++-
>   9 files changed, 77 insertions(+), 14 deletions(-)
>   create mode 100644 lib/pdm-config/src/access_control.rs
> 
> 
> Summary over all repositories:
>    19 files changed, 801 insertions(+), 44 deletions(-)
> 

* Re: [PATCH pve-cluster 01/14 v2] pmxcfs-rs: add Rust workspace configuration
  @ 2026-02-18 10:41  6%   ` Samuel Rufinatscha
  0 siblings, 0 replies; 117+ results
From: Samuel Rufinatscha @ 2026-02-18 10:41 UTC (permalink / raw)
  To: Kefu Chai, pve-devel

Thanks for the patch!

I think it’d be better to merge this with the next patch that adds the
first workspace member, since otherwise cargo build doesn’t work yet.

Also, could you please additionally add a rustfmt.toml so formatting is
consistent across repos? And a small inline comment below:

On 2/13/26 10:41 AM, Kefu Chai wrote:
> Initialize the Rust workspace for the pmxcfs rewrite project.
> 
> Signed-off-by: Kefu Chai <k.chai@proxmox.com>
> ---
>   src/pmxcfs-rs/.gitignore |  3 +++
>   src/pmxcfs-rs/Cargo.toml | 31 +++++++++++++++++++++++++++++++
>   src/pmxcfs-rs/Makefile   | 39 +++++++++++++++++++++++++++++++++++++++
>   3 files changed, 73 insertions(+)
>   create mode 100644 src/pmxcfs-rs/.gitignore
>   create mode 100644 src/pmxcfs-rs/Cargo.toml
>   create mode 100644 src/pmxcfs-rs/Makefile
> 
> diff --git a/src/pmxcfs-rs/.gitignore b/src/pmxcfs-rs/.gitignore
> new file mode 100644
> index 000000000..f2e56d3f7
> --- /dev/null
> +++ b/src/pmxcfs-rs/.gitignore
> @@ -0,0 +1,3 @@
> +/target

this entry should not be needed as it’s covered by target/ below

> +Cargo.lock
> +target/
> diff --git a/src/pmxcfs-rs/Cargo.toml b/src/pmxcfs-rs/Cargo.toml
> new file mode 100644
> index 000000000..d109221fb
> --- /dev/null
> +++ b/src/pmxcfs-rs/Cargo.toml
> @@ -0,0 +1,31 @@
> +# Workspace root for pmxcfs Rust implementation
> +[workspace]
> +members = [
> +]
> +resolver = "2"
> +
> +[workspace.package]
> +version = "9.0.6"
> +edition = "2024"
> +authors = ["Proxmox Support Team <support@proxmox.com>"]
> +license = "AGPL-3.0"
> +repository = "https://git.proxmox.com/?p=pve-cluster.git"
> +rust-version = "1.85"
> +
> +[workspace.dependencies]
> +# Dependencies will be added incrementally as crates are introduced
> +
> +[workspace.lints.clippy]
> +uninlined_format_args = "warn"
> +
> +[profile.release]
> +lto = true
> +codegen-units = 1
> +opt-level = 3
> +strip = true
> +
> +[profile.dev]
> +opt-level = 1
> +debug = true
> +
> +[patch.crates-io]
> diff --git a/src/pmxcfs-rs/Makefile b/src/pmxcfs-rs/Makefile
> new file mode 100644
> index 000000000..eaa96317f
> --- /dev/null
> +++ b/src/pmxcfs-rs/Makefile
> @@ -0,0 +1,39 @@
> +.PHONY: all test lint clippy fmt check build clean help
> +
> +# Default target
> +all: check build
> +
> +# Run all tests
> +test:
> +	cargo test --workspace
> +
> +# Lint with clippy (using proxmox-backup style: only fail on correctness issues)
> +clippy:
> +	cargo clippy --workspace -- -A clippy::all -D clippy::correctness
> +
> +# Check code formatting
> +fmt:
> +	cargo fmt --all --check
> +
> +# Full quality check (format + lint + test)
> +check: fmt clippy test
> +
> +# Build release version
> +build:
> +	cargo build --workspace --release
> +
> +# Clean build artifacts
> +clean:
> +	cargo clean
> +
> +# Show available targets
> +help:
> +	@echo "Available targets:"
> +	@echo "  all      - Run check and build (default)"
> +	@echo "  test     - Run all tests"
> +	@echo "  clippy   - Run clippy linter"
> +	@echo "  fmt      - Check code formatting"
> +	@echo "  check    - Run fmt + clippy + test"
> +	@echo "  build    - Build release version"
> +	@echo "  clean    - Clean build artifacts"
> +	@echo "  help     - Show this help message"

* Re: [PATCH pve-cluster 02/14 v2] pmxcfs-rs: add pmxcfs-api-types crate
  @ 2026-02-18 15:06  5%   ` Samuel Rufinatscha
  0 siblings, 0 replies; 117+ results
From: Samuel Rufinatscha @ 2026-02-18 15:06 UTC (permalink / raw)
  To: Kefu Chai, pve-devel

Thanks for the patch.

Looking across the series, only PmxcfsError::System and
PmxcfsError::Configuration are actually constructed. The remaining
variants, to_errno() and the defined Result<T> seem unused.
For my own understanding, how and when are they actually used?
Also the README mentions automatic errno conversion. What does
automatic mean here, and when is that triggered? I couldn't find
a to_errno() usage.

If not needed, please trim the unused variants and to_errno() to
just what the series actually needs.
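
As a rough idea of what a trimmed version could look like, here is a
minimal sketch that keeps only the two variants the series constructs. It
assumes the Display strings from the patch should stay as they are, and it
implements Display and Error by hand only so the snippet compiles without
thiserror; in the crate the derive would stay.

```rust
use std::fmt;

/// Sketch of a trimmed error type: only the variants that are constructed today.
#[derive(Debug)]
pub enum PmxcfsError {
    Configuration(String),
    System(String),
}

impl fmt::Display for PmxcfsError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            // Same message prefixes as the #[error(...)] attributes in the patch.
            PmxcfsError::Configuration(msg) => write!(f, "Configuration error: {msg}"),
            PmxcfsError::System(msg) => write!(f, "System error: {msg}"),
        }
    }
}

impl std::error::Error for PmxcfsError {}
```

Further variants and to_errno() can then be added back in the patch that
first needs them.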

On 2/13/26 10:41 AM, Kefu Chai wrote:
> Add pmxcfs-api-types crate which provides foundational types:
> - PmxcfsError: Error type with errno mapping for FUSE operations
> - FuseMessage: Filesystem operation messages
> - KvStoreMessage: Status synchronization messages
> - ApplicationMessage: Wrapper enum for both message types
> - VmType: VM type enum (Qemu, Lxc)

FuseMessage, KvStoreMessage and ApplicationMessage are not present in
this diff. Consider a more high-level prose message. IMO the message
doesn't necessarily need to mention the actual PmxcfsError or
VmType type names.

> 
> All other crates will depend on these shared type definitions.
> 
> Signed-off-by: Kefu Chai <k.chai@proxmox.com>
> ---
>   src/pmxcfs-rs/Cargo.toml                    |  10 +-
>   src/pmxcfs-rs/pmxcfs-api-types/Cargo.toml   |  19 +++
>   src/pmxcfs-rs/pmxcfs-api-types/README.md    |  88 ++++++++++++++
>   src/pmxcfs-rs/pmxcfs-api-types/src/error.rs | 122 ++++++++++++++++++++
>   src/pmxcfs-rs/pmxcfs-api-types/src/lib.rs   |  67 +++++++++++
>   5 files changed, 305 insertions(+), 1 deletion(-)
>   create mode 100644 src/pmxcfs-rs/pmxcfs-api-types/Cargo.toml
>   create mode 100644 src/pmxcfs-rs/pmxcfs-api-types/README.md
>   create mode 100644 src/pmxcfs-rs/pmxcfs-api-types/src/error.rs
>   create mode 100644 src/pmxcfs-rs/pmxcfs-api-types/src/lib.rs
> 
> diff --git a/src/pmxcfs-rs/Cargo.toml b/src/pmxcfs-rs/Cargo.toml
> index d109221fb..13407f402 100644
> --- a/src/pmxcfs-rs/Cargo.toml
> +++ b/src/pmxcfs-rs/Cargo.toml
> @@ -1,6 +1,7 @@
>   # Workspace root for pmxcfs Rust implementation
>   [workspace]
>   members = [
> +    "pmxcfs-api-types",  # Shared types and error definitions
>   ]
>   resolver = "2"
>   
> @@ -13,7 +14,14 @@ repository = "https://git.proxmox.com/?p=pve-cluster.git"
>   rust-version = "1.85"
>   
>   [workspace.dependencies]
> -# Dependencies will be added incrementally as crates are introduced
> +# Internal workspace dependencies
> +pmxcfs-api-types = { path = "pmxcfs-api-types" }
> +
> +# Error handling
> +thiserror = "1.0"
> +
> +# System integration
> +libc = "0.2"
>   
>   [workspace.lints.clippy]
>   uninlined_format_args = "warn"
> diff --git a/src/pmxcfs-rs/pmxcfs-api-types/Cargo.toml b/src/pmxcfs-rs/pmxcfs-api-types/Cargo.toml
> new file mode 100644
> index 000000000..cdce7951a
> --- /dev/null
> +++ b/src/pmxcfs-rs/pmxcfs-api-types/Cargo.toml
> @@ -0,0 +1,19 @@
> +[package]
> +name = "pmxcfs-api-types"
> +description = "Shared types and error definitions for pmxcfs"
> +
> +version.workspace = true
> +edition.workspace = true
> +authors.workspace = true
> +license.workspace = true
> +repository.workspace = true
> +
> +[lints]
> +workspace = true
> +
> +[dependencies]
> +# Error handling
> +thiserror.workspace = true
> +
> +# System integration
> +libc.workspace = true
> diff --git a/src/pmxcfs-rs/pmxcfs-api-types/README.md b/src/pmxcfs-rs/pmxcfs-api-types/README.md
> new file mode 100644
> index 000000000..ddcd4e478
> --- /dev/null
> +++ b/src/pmxcfs-rs/pmxcfs-api-types/README.md

The README needs a revisit. Type names and API surface are better
documented through rustdoc on the code itself, the README should focus
on why this crate exists and what a new developer needs to understand
before looking at the code. Please also check whether the READMEs in the
other patches can be improved in the same way.

The errno mapping table doesn't need to be in the README, I think.
To keep it brief, I think this can be looked up in the code.
The "Error Handling" section is not that informative and a bit hard
to follow. On the Rust side it only mentions Result<T> without further
explanation of how this maps. I assume this refers to the "automatic
errno conversion"? If this is important, please note it.

Also it mentions cfs-utils.h for error codes, but I couldn't find error
codes there. The "Known Issues / TODOs" section can be dropped if it
only says "None identified" anyway. I would keep it more brief.

Something along the following lines would likely be enough for this
crate (and if to_errno() is still required, please also include it):

# pmxcfs-api-types

This crate provides shared types and error definitions used across all
pmxcfs crates. Having them in a dedicated crate with no internal
dependencies avoids circular dependencies between the higher-level
crates.

Note that OpenVZ (historically present in the C implementation) is not
represented, as it was dropped in PVE 4.0.

## References

- [xyz](../actual link to C file)
- ...

> @@ -0,0 +1,88 @@
> +# pmxcfs-api-types
> +
> +**Shared Types and Error Definitions** for pmxcfs.
> +
> +This crate provides common types and error definitions used across all pmxcfs crates.
> +
> +## Overview
> +
> +The crate contains:
> +- **Error types**: `PmxcfsError` with errno mapping for FUSE
> +- **Shared types**: `MemberInfo`, `NodeSyncInfo`, `VmType`, `VmEntry`
> +
> +## Error Types
> +
> +### PmxcfsError
> +
> +Type-safe error enum with automatic errno conversion.
> +
> +### errno Mapping
> +
> +Errors automatically convert to POSIX errno values for FUSE.
> +
> +| Error | errno | Value | Note |
> +|-------|-------|-------|------|
> +| `NotFound(_)` | `ENOENT` | 2 | File or directory not found |
> +| `PermissionDenied` | `EACCES` | 13 | File permission denied |
> +| `AlreadyExists(_)` | `EEXIST` | 17 | File already exists |
> +| `NotADirectory(_)` | `ENOTDIR` | 20 | Not a directory |
> +| `IsADirectory(_)` | `EISDIR` | 21 | Is a directory |
> +| `DirectoryNotEmpty(_)` | `ENOTEMPTY` | 39 | Directory not empty |
> +| `InvalidArgument(_)` | `EINVAL` | 22 | Invalid argument |
> +| `InvalidPath(_)` | `EINVAL` | 22 | Invalid path |
> +| `FileTooLarge` | `EFBIG` | 27 | File too large |
> +| `ReadOnlyFilesystem` | `EROFS` | 30 | Read-only filesystem |
> +| `NoQuorum` | `EACCES` | 13 | No cluster quorum |
> +| `Lock(_)` | `EAGAIN` | 11 | Lock unavailable, try again |
> +| `Timeout` | `ETIMEDOUT` | 110 | Operation timed out |
> +| `Io(e)` | varies | varies | OS error code or `EIO` |
> +| Others* | `EIO` | 5 | Internal error |
> +
> +*Others include: `Database`, `Fuse`, `Cluster`, `Corosync`, `Configuration`, `System`, `Ipc`
> +
> +## Shared Types
> +
> +### MemberInfo
> +
> +Cluster member information.
> +
> +### NodeSyncInfo
> +
> +DFSM synchronization state.
> +
> +### VmType
> +
> +VM/CT type enum (Qemu or Lxc).
> +
> +### VmEntry
> +
> +VM/CT entry for vmlist.
> +
> +## C to Rust Mapping
> +
> +### Error Handling
> +
> +**C Version (cfs-utils.h):**
> +- Return codes: `0` = success, negative = error
> +- errno-based error reporting
> +- Manual error checking everywhere
> +
> +**Rust Version:**
> +- `Result<T, PmxcfsError>` type
> +
> +## Known Issues / TODOs
> +
> +### Missing Features
> +- None identified
> +
> +### Compatibility
> +- **errno values**: Match POSIX standards
> +
> +## References
> +
> +### C Implementation
> +- `src/pmxcfs/cfs-utils.h` - Utility types and error codes

This file does not contain error codes, maybe wrong ref?

> +
> +### Related Crates
> +- **pmxcfs-dfsm**: Uses shared types for cluster sync
> +- **pmxcfs-memdb**: Uses PmxcfsError for database operations

I could not find any PmxcfsError usage in these two crates.
Please re-visit.

> diff --git a/src/pmxcfs-rs/pmxcfs-api-types/src/error.rs b/src/pmxcfs-rs/pmxcfs-api-types/src/error.rs
> new file mode 100644
> index 000000000..dcb5d1e9e
> --- /dev/null
> +++ b/src/pmxcfs-rs/pmxcfs-api-types/src/error.rs
> @@ -0,0 +1,122 @@
> +use thiserror::Error;
> +
> +/// Error types for pmxcfs operations
> +#[derive(Error, Debug)]
> +pub enum PmxcfsError {
> +    #[error("I/O error: {0}")]
> +    Io(#[from] std::io::Error),
> +
> +    #[error("Database error: {0}")]
> +    Database(String),
> +
> +    #[error("FUSE error: {0}")]
> +    Fuse(String),
> +
> +    #[error("Cluster error: {0}")]
> +    Cluster(String),
> +
> +    #[error("Corosync error: {0}")]
> +    Corosync(String),
> +
> +    #[error("Configuration error: {0}")]
> +    Configuration(String),
> +
> +    #[error("System error: {0}")]
> +    System(String),
> +
> +    #[error("IPC error: {0}")]
> +    Ipc(String),
> +
> +    #[error("Permission denied")]
> +    PermissionDenied,
> +
> +    #[error("Not found: {0}")]
> +    NotFound(String),
> +
> +    #[error("Already exists: {0}")]
> +    AlreadyExists(String),
> +
> +    #[error("Invalid argument: {0}")]
> +    InvalidArgument(String),
> +
> +    #[error("Not a directory: {0}")]
> +    NotADirectory(String),
> +
> +    #[error("Is a directory: {0}")]
> +    IsADirectory(String),
> +
> +    #[error("Directory not empty: {0}")]
> +    DirectoryNotEmpty(String),
> +
> +    #[error("No quorum")]
> +    NoQuorum,
> +
> +    #[error("Read-only filesystem")]
> +    ReadOnlyFilesystem,
> +
> +    #[error("File too large")]
> +    FileTooLarge,
> +
> +    #[error("Filesystem full")]
> +    FilesystemFull,
> +
> +    #[error("Lock error: {0}")]
> +    Lock(String),
> +
> +    #[error("Timeout")]
> +    Timeout,
> +
> +    #[error("Invalid path: {0}")]
> +    InvalidPath(String),
> +}
> +
> +impl PmxcfsError {
> +    /// Convert error to errno value for FUSE operations
> +    pub fn to_errno(&self) -> i32 {
> +        match self {
> +            // File/directory errors
> +            PmxcfsError::NotFound(_) => libc::ENOENT,
> +            PmxcfsError::AlreadyExists(_) => libc::EEXIST,
> +            PmxcfsError::NotADirectory(_) => libc::ENOTDIR,
> +            PmxcfsError::IsADirectory(_) => libc::EISDIR,
> +            PmxcfsError::DirectoryNotEmpty(_) => libc::ENOTEMPTY,
> +            PmxcfsError::FileTooLarge => libc::EFBIG,
> +            PmxcfsError::FilesystemFull => libc::ENOSPC,
> +            PmxcfsError::ReadOnlyFilesystem => libc::EROFS,
> +
> +            // Permission and access errors
> +            // EACCES: Permission denied for file operations (standard POSIX)
> +            // C implementation uses EACCES as default for access/quorum issues
> +            PmxcfsError::PermissionDenied => libc::EACCES,
> +            PmxcfsError::NoQuorum => libc::EACCES,
> +
> +            // Validation errors
> +            PmxcfsError::InvalidArgument(_) => libc::EINVAL,
> +            PmxcfsError::InvalidPath(_) => libc::EINVAL,
> +
> +            // Lock errors - use EAGAIN for temporary failures
> +            PmxcfsError::Lock(_) => libc::EAGAIN,
> +
> +            // Timeout
> +            PmxcfsError::Timeout => libc::ETIMEDOUT,
> +
> +            // I/O errors with automatic errno extraction
> +            PmxcfsError::Io(e) => match e.raw_os_error() {
> +                Some(errno) => errno,
> +                None => libc::EIO,
> +            },
> +
> +            // Fallback to EIO for internal/system errors
> +            PmxcfsError::Database(_) |
> +            PmxcfsError::Fuse(_) |
> +            PmxcfsError::Cluster(_) |
> +            PmxcfsError::Corosync(_) |
> +            PmxcfsError::Configuration(_) |
> +            PmxcfsError::System(_) |
> +            PmxcfsError::Ipc(_) => libc::EIO,
> +        }
> +    }
> +}
> +
> +/// Result type for pmxcfs operations
> +pub type Result<T> = std::result::Result<T, PmxcfsError>;
> diff --git a/src/pmxcfs-rs/pmxcfs-api-types/src/lib.rs b/src/pmxcfs-rs/pmxcfs-api-types/src/lib.rs
> new file mode 100644
> index 000000000..99cafbaa3
> --- /dev/null
> +++ b/src/pmxcfs-rs/pmxcfs-api-types/src/lib.rs
> @@ -0,0 +1,67 @@
> +mod error;
> +
> +pub use error::{PmxcfsError, Result};
> +
> +/// Maximum size for status data (matches C implementation)
> +/// From status.h: #define CFS_MAX_STATUS_SIZE (32 * 1024)
> +pub const CFS_MAX_STATUS_SIZE: usize = 32 * 1024;

This const is only used in the status crate.
Do we need to share it here?

> +
> +/// VM/CT types
> +///
> +/// Note: OpenVZ was historically supported (VMTYPE_OPENVZ = 2 in C implementation)
> +/// but was removed in PVE 4.0 in favor of LXC. Only QEMU and LXC are currently supported.

Thanks for adding this note!

> +#[derive(Debug, Clone, Copy, PartialEq, Eq)]
> +pub enum VmType {
> +    Qemu,
> +    Lxc,
> +}
> +
> +impl VmType {
> +    /// Returns the directory name where config files are stored
> +    pub fn config_dir(&self) -> &'static str {
> +        match self {
> +            VmType::Qemu => "qemu-server",
> +            VmType::Lxc => "lxc",
> +        }
> +    }
> +}
> +
> +impl std::fmt::Display for VmType {
> +    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
> +        match self {
> +            VmType::Qemu => write!(f, "qemu"),
> +            VmType::Lxc => write!(f, "lxc"),
> +        }
> +    }
> +}
> +
> +/// VM/CT entry for vmlist
> +#[derive(Debug, Clone)]
> +pub struct VmEntry {
> +    pub vmid: u32,

vmid and vmtype should also be aligned to snake_case (e.g. vm_id, vm_type)

> +    pub vmtype: VmType,
> +    pub node: String,
> +    /// Per-VM version counter (increments when this VM's config changes)
> +    pub version: u32,
> +}
> +
> +/// Information about a cluster member
> +///
> +/// This is a shared type used by both cluster and DFSM modules
> +#[derive(Debug, Clone)]
> +pub struct MemberInfo {
> +    pub node_id: u32,
> +    pub pid: u32,
> +    pub joined_at: u64,
> +}
> +
> +/// Node synchronization info for DFSM state sync
> +///
> +/// Used during DFSM synchronization to track which nodes have provided state
> +#[derive(Debug, Clone)]
> +pub struct NodeSyncInfo {
> +    pub node_id: u32,
> +    pub pid: u32,
> +    pub state: Option<Vec<u8>>,

Does the state have a fixed size?
Also can we add a doc comment?

> +    pub synced: bool,

What does it mean if this is true/false?
Please add a doc comment for this pub field.

> +}

* Re: [PATCH pve-cluster 03/14 v2] pmxcfs-rs: add pmxcfs-config crate
  @ 2026-02-18 16:41  6%   ` Samuel Rufinatscha
  0 siblings, 0 replies; 117+ results
From: Samuel Rufinatscha @ 2026-02-18 16:41 UTC (permalink / raw)
  To: Kefu Chai, pve-devel

Thanks for the patch, Kefu!

Some comments inline.

On 2/13/26 10:42 AM, Kefu Chai wrote:
> Add configuration management crate for pmxcfs:
> - Config struct: Runtime configuration (node name, IP, flags)
> - Thread-safe debug level mutation via RwLock

Small issue here: with the latest changes the code uses AtomicU8, not RwLock

> - Arc-wrapped for shared ownership across components
> - Comprehensive unit tests including thread safety tests
> 
> This crate provides the foundational configuration structure used
> by all pmxcfs components. The Config is designed to be shared via
> Arc to allow multiple components to access the same configuration
> instance, with mutable debug level for runtime adjustments.
> 
> Signed-off-by: Kefu Chai <k.chai@proxmox.com>
> ---
>   src/pmxcfs-rs/Cargo.toml               |   5 +
>   src/pmxcfs-rs/pmxcfs-config/Cargo.toml |  19 +
>   src/pmxcfs-rs/pmxcfs-config/README.md  |  15 +
>   src/pmxcfs-rs/pmxcfs-config/src/lib.rs | 521 +++++++++++++++++++++++++
>   4 files changed, 560 insertions(+)
>   create mode 100644 src/pmxcfs-rs/pmxcfs-config/Cargo.toml
>   create mode 100644 src/pmxcfs-rs/pmxcfs-config/README.md
>   create mode 100644 src/pmxcfs-rs/pmxcfs-config/src/lib.rs
> 
> diff --git a/src/pmxcfs-rs/Cargo.toml b/src/pmxcfs-rs/Cargo.toml
> index 13407f402..f190968ed 100644
> --- a/src/pmxcfs-rs/Cargo.toml
> +++ b/src/pmxcfs-rs/Cargo.toml
> @@ -2,6 +2,7 @@
>   [workspace]
>   members = [
>       "pmxcfs-api-types",  # Shared types and error definitions
> +    "pmxcfs-config",     # Configuration management
>   ]
>   resolver = "2"
>   
> @@ -16,10 +17,14 @@ rust-version = "1.85"
>   [workspace.dependencies]
>   # Internal workspace dependencies
>   pmxcfs-api-types = { path = "pmxcfs-api-types" }
> +pmxcfs-config = { path = "pmxcfs-config" }
>   
>   # Error handling
>   thiserror = "1.0"

The tracing dependency needs to be added in the workspace config

>   
> +# Concurrency primitives
> +parking_lot = "0.12"

This is not needed anymore ...

> +
>   # System integration
>   libc = "0.2"
>   
> diff --git a/src/pmxcfs-rs/pmxcfs-config/Cargo.toml b/src/pmxcfs-rs/pmxcfs-config/Cargo.toml
> new file mode 100644
> index 000000000..a1aeba1d3
> --- /dev/null
> +++ b/src/pmxcfs-rs/pmxcfs-config/Cargo.toml
> @@ -0,0 +1,19 @@
> +[package]
> +name = "pmxcfs-config"
> +description = "Configuration management for pmxcfs"
> +
> +version.workspace = true
> +edition.workspace = true
> +authors.workspace = true
> +license.workspace = true
> +repository.workspace = true
> +
> +[lints]
> +workspace = true
> +
> +[dependencies]
> +# Concurrency primitives
> +parking_lot.workspace = true

.. as this is unused

> +
> +# Logging
> +tracing.workspace = true
> diff --git a/src/pmxcfs-rs/pmxcfs-config/README.md b/src/pmxcfs-rs/pmxcfs-config/README.md
> new file mode 100644
> index 000000000..53aaf443a
> --- /dev/null
> +++ b/src/pmxcfs-rs/pmxcfs-config/README.md
> @@ -0,0 +1,15 @@
> +# pmxcfs-config
> +
> +**Configuration Management** for pmxcfs.
> +
> +This crate provides configuration structures for the pmxcfs daemon.
> +
> +## Overview
> +
> +The `Config` struct holds daemon-wide configuration including:
> +- Node hostname
> +- IP address
> +- www-data group ID
> +- Debug flag
> +- Local mode flag
> +- Cluster name
> diff --git a/src/pmxcfs-rs/pmxcfs-config/src/lib.rs b/src/pmxcfs-rs/pmxcfs-config/src/lib.rs
> new file mode 100644
> index 000000000..dca3c76b1
> --- /dev/null
> +++ b/src/pmxcfs-rs/pmxcfs-config/src/lib.rs
> @@ -0,0 +1,521 @@
> +use std::net::IpAddr;
> +use std::sync::atomic::{AtomicU8, Ordering};
> +use std::sync::Arc;
> +
> +/// Global configuration for pmxcfs
> +pub struct Config {
> +    /// Node name (hostname without domain)

The validation code below allows dots, which contradicts "without domain".
Please re-visit.

> +    nodename: String,
> +
> +    /// Node IP address
> +    node_ip: IpAddr,
> +
> +    /// www-data group ID for file permissions
> +    www_data_gid: u32,
> +
> +    /// Force local mode (no clustering)
> +    local_mode: bool,
> +
> +    /// Cluster name (CPG group name)
> +    cluster_name: String,
> +
> +    /// Debug level (0 = normal, 1+ = debug) - mutable at runtime
> +    debug_level: AtomicU8,
> +}
> +
> +impl Clone for Config {
> +    fn clone(&self) -> Self {
> +        Self {
> +            nodename: self.nodename.clone(),
> +            node_ip: self.node_ip,
> +            www_data_gid: self.www_data_gid,
> +            local_mode: self.local_mode,
> +            cluster_name: self.cluster_name.clone(),
> +            debug_level: AtomicU8::new(self.debug_level.load(Ordering::Relaxed)),
> +        }
> +    }
> +}

Do we need this Clone impl actually?
If not we could remove it to avoid confusion with Arc::clone()
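
To illustrate the confusion I mean, here is a small sketch with a
reduced stand-in for Config (only the atomic field is kept, and new()
takes the level directly, both simplifications for this example).
observe_after_change() is a made-up helper: it changes the level on the
original and reports what an Arc::clone() handle and a Config::clone()
copy see afterwards.

```rust
use std::sync::atomic::{AtomicU8, Ordering};
use std::sync::Arc;

/// Reduced stand-in for Config: only the runtime-mutable field is kept.
pub struct Config {
    debug_level: AtomicU8,
}

impl Clone for Config {
    fn clone(&self) -> Self {
        // Deep copy: the new atomic starts with the current value but is detached.
        Self {
            debug_level: AtomicU8::new(self.debug_level.load(Ordering::Relaxed)),
        }
    }
}

impl Config {
    pub fn new(debug_level: u8) -> Self {
        Self {
            debug_level: AtomicU8::new(debug_level),
        }
    }

    pub fn debug_level(&self) -> u8 {
        self.debug_level.load(Ordering::Relaxed)
    }

    pub fn set_debug_level(&self, level: u8) {
        self.debug_level.store(level, Ordering::Relaxed);
    }
}

/// Returns (level seen through an Arc::clone handle, level seen by a deep copy)
/// after the level was changed on the original.
pub fn observe_after_change() -> (u8, u8) {
    let shared = Arc::new(Config::new(0));
    let handle = Arc::clone(&shared); // same Config instance
    let snapshot = Config::clone(&shared); // independent copy
    shared.set_debug_level(2);
    (handle.debug_level(), snapshot.debug_level())
}
```

The handle follows the change while the copy keeps the old level, so a
component that accidentally holds a deep copy would silently miss runtime
debug-level changes.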

> +
> +impl std::fmt::Debug for Config {
> +    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
> +        f.debug_struct("Config")
> +            .field("nodename", &self.nodename)
> +            .field("node_ip", &self.node_ip)
> +            .field("www_data_gid", &self.www_data_gid)
> +            .field("local_mode", &self.local_mode)
> +            .field("cluster_name", &self.cluster_name)
> +            .field("debug_level", &self.debug_level.load(Ordering::Relaxed))
> +            .finish()
> +    }
> +}
> +
> +impl Config {
> +    /// Validate a hostname according to RFC 1123
> +    ///
> +    /// Hostname requirements:
> +    /// - Length: 1-253 characters
> +    /// - Labels (dot-separated parts): 1-63 characters each
> +    /// - Characters: alphanumeric and hyphens
> +    /// - Cannot start or end with hyphen
> +    /// - Case insensitive (lowercase preferred)
> +    fn validate_hostname(hostname: &str) -> Result<(), String> {
> +        if hostname.is_empty() {
> +            return Err("Hostname cannot be empty".to_string());
> +        }
> +        if hostname.len() > 253 {
> +            return Err(format!("Hostname too long: {} > 253 characters", hostname.len()));
> +        }
> +
> +        for label in hostname.split('.') {
> +            if label.is_empty() {
> +                return Err("Hostname cannot have empty labels (consecutive dots)".to_string());
> +            }
> +            if label.len() > 63 {
> +                return Err(format!("Hostname label '{}' too long: {} > 63 characters", label, label.len()));
> +            }
> +            if label.starts_with('-') || label.ends_with('-') {
> +                return Err(format!("Hostname label '{}' cannot start or end with hyphen", label));
> +            }
> +            if !label.chars().all(|c| c.is_ascii_alphanumeric() || c == '-') {
> +                return Err(format!("Hostname label '{}' contains invalid characters (only alphanumeric and hyphen allowed)", label));
> +            }
> +        }
> +
> +        Ok(())
> +    }
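
Regarding my note on the nodename doc comment: the snippet below is a
condensed restatement of the checks above (error messages shortened) that
shows a dotted name passing. is_short_nodename() is a hypothetical helper,
only meant to show what a check matching "hostname without domain" could
look like if that is the intended contract.

```rust
/// Condensed restatement of the checks in validate_hostname from the patch.
pub fn validate_hostname(hostname: &str) -> Result<(), String> {
    if hostname.is_empty() || hostname.len() > 253 {
        return Err(format!("invalid hostname length: {}", hostname.len()));
    }
    for label in hostname.split('.') {
        let valid = !label.is_empty()
            && label.len() <= 63
            && !label.starts_with('-')
            && !label.ends_with('-')
            && label.chars().all(|c| c.is_ascii_alphanumeric() || c == '-');
        if !valid {
            return Err(format!("invalid hostname label '{label}'"));
        }
    }
    Ok(())
}

/// Hypothetical stricter check: a single label, i.e. no domain part.
pub fn is_short_nodename(name: &str) -> bool {
    !name.contains('.') && validate_hostname(name).is_ok()
}
```

So "node1.example.com" is accepted by the validation although the field
documentation says the domain is not part of the nodename.
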
> +
> +    pub fn new(
> +        nodename: String,

impl Into<String> or &str could be nicer here, and likewise for the other
String parameter (cluster_name) below.

> +        node_ip: IpAddr,
> +        www_data_gid: u32,
> +        debug: bool,

Maybe we should also take debug_level: u8 here?
There is a setter below which also expects a u8 level.
If we align this, we could avoid the bool to u8 conversion/indirection.

> +        local_mode: bool,
> +        cluster_name: String,
> +    ) -> Self {
> +        // Validate hostname (log warning but don't fail - matches C behavior)
> +        // The C implementation accepts any hostname from uname() without validation

The first comment says "log warning but don't fail - matches C behavior" 
but the second says C does no validation at all. Please clarify :)

If C does not validate, does not log about validity, and does not fail,
we maybe shouldn't do it on the Rust side either (for behavioral
consistency). What do you think?

> +        if let Err(e) = Self::validate_hostname(&nodename) {
> +            tracing::warn!("Invalid nodename '{}': {}", nodename, e);

nit: if we decide to keep the log, consider using structured fields:

tracing::warn!(nodename = %nodename, error = %e, "invalid nodename");

> +        }
> +
> +        let debug_level = if debug { 1 } else { 0 };
> +        Self {
> +            nodename,
> +            node_ip,
> +            www_data_gid,
> +            local_mode,
> +            cluster_name,
> +            debug_level: AtomicU8::new(debug_level),
> +        }
> +    }
> +
> +    pub fn shared(
> +        nodename: String,
> +        node_ip: IpAddr,
> +        www_data_gid: u32,
> +        debug: bool,
> +        local_mode: bool,
> +        cluster_name: String,
> +    ) -> Arc<Self> {
> +        Arc::new(Self::new(nodename, node_ip, www_data_gid, debug, local_mode, cluster_name))
> +    }

nit: maybe we should even change this to the following to avoid
duplication of all parameters of new()?

     pub fn into_shared(self) -> Arc<Self> {
         Arc::new(self)
     }

so we only need to maintain one signature on future
changes
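
A minimal usage sketch of that idea, again with a reduced stand-in for
Config (the single-field struct is only for illustration):

```rust
use std::sync::Arc;

// Reduced stand-in for the Config struct from the patch.
pub struct Config {
    nodename: String,
}

impl Config {
    pub fn new(nodename: String) -> Self {
        Self { nodename }
    }

    // Consumes the value and wraps it, so new() stays the only place
    // that knows the full parameter list.
    pub fn into_shared(self) -> Arc<Self> {
        Arc::new(self)
    }

    pub fn nodename(&self) -> &str {
        &self.nodename
    }
}
```

Call sites would then read Config::new(...).into_shared() instead of
Config::shared(...).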

> +
> +    pub fn cluster_name(&self) -> &str {
> +        &self.cluster_name
> +    }
> +
> +    pub fn nodename(&self) -> &str {
> +        &self.nodename
> +    }
> +
> +    pub fn node_ip(&self) -> IpAddr {
> +        self.node_ip
> +    }
> +
> +    pub fn www_data_gid(&self) -> u32 {
> +        self.www_data_gid
> +    }
> +
> +    pub fn is_debug(&self) -> bool {
> +        self.debug_level() > 0
> +    }
> +
> +    pub fn is_local_mode(&self) -> bool {
> +        self.local_mode
> +    }
> +
> +    /// Get current debug level (0 = normal, 1+ = debug)
> +    pub fn debug_level(&self) -> u8 {
> +        self.debug_level.load(Ordering::Relaxed)
> +    }
> +
> +    /// Set debug level (0 = normal, 1+ = debug)
> +    pub fn set_debug_level(&self, level: u8) {
> +        self.debug_level.store(level, Ordering::Relaxed);
> +    }
> +}
> +
> +#[cfg(test)]
> +mod tests {
> +    //! Unit tests for Config struct
> +    //!
> +    //! This test module provides comprehensive coverage for:
> +    //! - Configuration creation and initialization
> +    //! - Getter methods for all configuration fields
> +    //! - Debug level mutation and thread safety
> +    //! - Concurrent access patterns (reads and writes)
> +    //! - Clone independence
> +    //! - Debug formatting
> +    //! - Edge cases (empty strings, long strings, special characters, unicode)
> +    //!
> +    //! ## Thread Safety
> +    //!
> +    //! The Config struct uses `AtomicU8` for debug_level to allow
> +    //! safe concurrent reads and writes. Tests verify:
> +    //! - 10 threads × 100 operations (concurrent modifications)
> +    //! - 20 threads × 1000 operations (concurrent reads)
> +    //!
> +    //! ## Edge Cases
> +    //!
> +    //! Tests cover various edge cases including:
> +    //! - Empty strings for node/cluster names
> +    //! - Long strings (1000+ characters)
> +    //! - Special characters in strings
> +    //! - Unicode support (emoji, non-ASCII characters)
> +
> +    use super::*;
> +    use std::thread;
> +
> +    // ===== Basic Construction Tests =====
> +
> +    #[test]
> +    fn test_config_creation() {
> +        let config = Config::new(
> +            "node1".to_string(),
> +            "192.168.1.10".parse().unwrap(),
> +            33,
> +            false,
> +            false,
> +            "pmxcfs".to_string(),
> +        );
> +
> +        assert_eq!(config.nodename(), "node1");
> +        assert_eq!(config.node_ip(), "192.168.1.10".parse::<IpAddr>().unwrap());
> +        assert_eq!(config.www_data_gid(), 33);
> +        assert!(!config.is_debug());
> +        assert!(!config.is_local_mode());
> +        assert_eq!(config.cluster_name(), "pmxcfs");
> +        assert_eq!(
> +            config.debug_level(),
> +            0,
> +            "Debug level should be 0 when debug is false"
> +        );
> +    }
> +
> +    #[test]
> +    fn test_config_creation_with_debug() {
> +        let config = Config::new(
> +            "node2".to_string(),
> +            "10.0.0.5".parse().unwrap(),
> +            1000,
> +            true,
> +            false,
> +            "test-cluster".to_string(),
> +        );
> +
> +        assert!(config.is_debug());
> +        assert_eq!(
> +            config.debug_level(),
> +            1,
> +            "Debug level should be 1 when debug is true"
> +        );
> +    }
> +
> +    #[test]
> +    fn test_config_creation_local_mode() {
> +        let config = Config::new(
> +            "localhost".to_string(),
> +            "127.0.0.1".parse().unwrap(),
> +            33,
> +            false,
> +            true,
> +            "local".to_string(),
> +        );
> +
> +        assert!(config.is_local_mode());
> +        assert!(!config.is_debug());
> +    }
> +
> +    // ===== Getter Tests =====
> +
> +    #[test]
> +    fn test_all_getters() {
> +        let config = Config::new(
> +            "testnode".to_string(),
> +            "172.16.0.1".parse().unwrap(),
> +            999,
> +            true,
> +            true,
> +            "my-cluster".to_string(),
> +        );
> +
> +        // Test all getter methods
> +        assert_eq!(config.nodename(), "testnode");
> +        assert_eq!(config.node_ip(), "172.16.0.1".parse::<IpAddr>().unwrap());
> +        assert_eq!(config.www_data_gid(), 999);
> +        assert!(config.is_debug());
> +        assert!(config.is_local_mode());
> +        assert_eq!(config.cluster_name(), "my-cluster");
> +        assert_eq!(config.debug_level(), 1);
> +    }
> +
> +    // ===== Debug Level Mutation Tests =====
> +
> +    #[test]
> +    fn test_debug_level_mutation() {
> +        let config = Config::new(
> +            "node1".to_string(),
> +            "192.168.1.1".parse().unwrap(),
> +            33,
> +            false,
> +            false,
> +            "pmxcfs".to_string(),
> +        );
> +
> +        assert_eq!(config.debug_level(), 0);
> +
> +        config.set_debug_level(1);
> +        assert_eq!(config.debug_level(), 1);
> +
> +        config.set_debug_level(5);
> +        assert_eq!(config.debug_level(), 5);
> +
> +        config.set_debug_level(0);
> +        assert_eq!(config.debug_level(), 0);
> +    }
> +
> +    #[test]
> +    fn test_debug_level_max_value() {
> +        let config = Config::new(
> +            "node1".to_string(),
> +            "192.168.1.1".parse().unwrap(),
> +            33,
> +            false,
> +            false,
> +            "pmxcfs".to_string(),
> +        );
> +
> +        config.set_debug_level(255);
> +        assert_eq!(config.debug_level(), 255);
> +
> +        config.set_debug_level(0);
> +        assert_eq!(config.debug_level(), 0);
> +    }
> +
> +    // ===== Thread Safety Tests =====
> +
> +    #[test]
> +    fn test_debug_level_thread_safety() {
> +        let config = Config::shared(
> +            "node1".to_string(),
> +            "192.168.1.1".parse().unwrap(),
> +            33,
> +            false,
> +            false,
> +            "pmxcfs".to_string(),
> +        );
> +
> +        let config_clone = Arc::clone(&config);
> +
> +        // Spawn multiple threads that concurrently modify debug level
> +        let handles: Vec<_> = (0..10)
> +            .map(|i| {
> +                let cfg = Arc::clone(&config);
> +                thread::spawn(move || {
> +                    for _ in 0..100 {
> +                        cfg.set_debug_level(i);
> +                        let _ = cfg.debug_level();
> +                    }
> +                })
> +            })
> +            .collect();
> +
> +        // All threads should complete without panicking
> +        for handle in handles {
> +            handle.join().unwrap();
> +        }
> +
> +        // Final value should be one of the values set by threads
> +        let final_level = config_clone.debug_level();
> +        assert!(
> +            final_level < 10,
> +            "Debug level should be < 10, got {final_level}"
> +        );
> +    }
> +
> +    #[test]
> +    fn test_concurrent_reads() {
> +        let config = Config::shared(
> +            "node1".to_string(),
> +            "192.168.1.1".parse().unwrap(),
> +            33,
> +            true,
> +            false,
> +            "pmxcfs".to_string(),
> +        );
> +
> +        // Spawn multiple threads that concurrently read config
> +        let handles: Vec<_> = (0..20)
> +            .map(|_| {
> +                let cfg = Arc::clone(&config);
> +                thread::spawn(move || {
> +                    for _ in 0..1000 {
> +                        assert_eq!(cfg.nodename(), "node1");
> +                        assert_eq!(cfg.node_ip(), "192.168.1.1".parse::<IpAddr>().unwrap());
> +                        assert_eq!(cfg.www_data_gid(), 33);
> +                        assert!(cfg.is_debug());
> +                        assert!(!cfg.is_local_mode());
> +                        assert_eq!(cfg.cluster_name(), "pmxcfs");
> +                    }
> +                })
> +            })
> +            .collect();
> +
> +        for handle in handles {
> +            handle.join().unwrap();
> +        }
> +    }
> +
> +    // ===== Clone Tests =====
> +
> +    #[test]
> +    fn test_config_clone() {
> +        let config1 = Config::new(
> +            "node1".to_string(),
> +            "192.168.1.1".parse().unwrap(),
> +            33,
> +            true,
> +            false,
> +            "pmxcfs".to_string(),
> +        );
> +
> +        config1.set_debug_level(5);
> +
> +        let config2 = config1.clone();
> +
> +        // Cloned config should have same values
> +        assert_eq!(config2.nodename(), config1.nodename());
> +        assert_eq!(config2.node_ip(), config1.node_ip());
> +        assert_eq!(config2.www_data_gid(), config1.www_data_gid());
> +        assert_eq!(config2.is_debug(), config1.is_debug());
> +        assert_eq!(config2.is_local_mode(), config1.is_local_mode());
> +        assert_eq!(config2.cluster_name(), config1.cluster_name());
> +        assert_eq!(config2.debug_level(), 5);
> +
> +        // Modifying one should not affect the other
> +        config2.set_debug_level(10);
> +        assert_eq!(config1.debug_level(), 5);
> +        assert_eq!(config2.debug_level(), 10);
> +    }
> +
> +    // ===== Debug Formatting Tests =====
> +
> +    #[test]
> +    fn test_debug_format() {
> +        let config = Config::new(
> +            "node1".to_string(),
> +            "192.168.1.1".parse().unwrap(),
> +            33,
> +            true,
> +            false,
> +            "pmxcfs".to_string(),
> +        );
> +
> +        let debug_str = format!("{config:?}");
> +
> +        // Check that debug output contains all fields
> +        assert!(debug_str.contains("Config"));
> +        assert!(debug_str.contains("nodename"));
> +        assert!(debug_str.contains("node1"));
> +        assert!(debug_str.contains("node_ip"));
> +        assert!(debug_str.contains("192.168.1.1"));
> +        assert!(debug_str.contains("www_data_gid"));
> +        assert!(debug_str.contains("33"));
> +        assert!(debug_str.contains("local_mode"));
> +        assert!(debug_str.contains("false"));
> +        assert!(debug_str.contains("cluster_name"));
> +        assert!(debug_str.contains("pmxcfs"));
> +        assert!(debug_str.contains("debug_level"));
> +    }
> +
> +    // ===== Edge Cases and Boundary Tests =====
> +
> +    #[test]
> +    fn test_empty_strings() {
> +        let config = Config::new(
> +            String::new(),
> +            "127.0.0.1".parse().unwrap(),
> +            0,
> +            false,
> +            false,
> +            String::new(),
> +        );
> +
> +        assert_eq!(config.nodename(), "");
> +        assert_eq!(config.node_ip(), "127.0.0.1".parse::<IpAddr>().unwrap());
> +        assert_eq!(config.cluster_name(), "");
> +        assert_eq!(config.www_data_gid(), 0);
> +    }
> +
> +    #[test]
> +    fn test_long_strings() {
> +        let long_name = "a".repeat(1000);
> +        let long_cluster = "cluster-".to_string() + &"x".repeat(500);
> +
> +        let config = Config::new(
> +            long_name.clone(),
> +            "192.168.1.1".parse().unwrap(),
> +            u32::MAX,
> +            true,
> +            true,
> +            long_cluster.clone(),
> +        );
> +
> +        assert_eq!(config.nodename(), long_name);
> +        assert_eq!(config.node_ip(), "192.168.1.1".parse::<IpAddr>().unwrap());
> +        assert_eq!(config.cluster_name(), long_cluster);
> +        assert_eq!(config.www_data_gid(), u32::MAX);
> +    }
> +
> +    #[test]
> +    fn test_special_characters_in_strings() {
> +        let config = Config::new(
> +            "node-1_test.local".to_string(),
> +            "192.168.1.10".parse().unwrap(),
> +            33,
> +            false,
> +            false,
> +            "my-cluster_v2.0".to_string(),
> +        );
> +
> +        assert_eq!(config.nodename(), "node-1_test.local");
> +        assert_eq!(config.node_ip(), "192.168.1.10".parse::<IpAddr>().unwrap());
> +        assert_eq!(config.cluster_name(), "my-cluster_v2.0");
> +    }
> +
> +    #[test]
> +    fn test_unicode_in_strings() {
> +        let config = Config::new(
> +            "ノード1".to_string(),
> +            "::1".parse().unwrap(),
> +            33,
> +            false,
> +            false,
> +            "集群".to_string(),
> +        );
> +
> +        assert_eq!(config.nodename(), "ノード1");
> +        assert_eq!(config.node_ip(), "::1".parse::<IpAddr>().unwrap());
> +        assert_eq!(config.cluster_name(), "集群");
> +    }

If we keep validate_hostname(), we should also add tests for it.
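
Something along these lines could work. The function below is a
stand-alone copy of the checks quoted above so that the sketch compiles on
its own; its signature is my assumption, and in the patch the tests would
call Config::validate_hostname() instead:

```rust
// Stand-alone copy of the checks from the patch; the signature is assumed.
fn validate_hostname(hostname: &str) -> Result<(), String> {
    if hostname.is_empty() {
        return Err("Hostname cannot be empty".to_string());
    }
    if hostname.len() > 253 {
        return Err(format!("Hostname too long: {} > 253 characters", hostname.len()));
    }
    for label in hostname.split('.') {
        if label.is_empty() {
            return Err("Hostname cannot have empty labels (consecutive dots)".to_string());
        }
        if label.len() > 63 {
            return Err(format!("Hostname label '{}' too long: {} > 63 characters", label, label.len()));
        }
        if label.starts_with('-') || label.ends_with('-') {
            return Err(format!("Hostname label '{}' cannot start or end with hyphen", label));
        }
        if !label.chars().all(|c| c.is_ascii_alphanumeric() || c == '-') {
            return Err(format!("Hostname label '{}' contains invalid characters (only alphanumeric and hyphen allowed)", label));
        }
    }
    Ok(())
}

#[test]
fn test_validate_hostname_accepts_valid_names() {
    assert!(validate_hostname("node1").is_ok());
    assert!(validate_hostname("node-1.example.com").is_ok());
    // Boundary: a label of exactly 63 bytes is still allowed.
    assert!(validate_hostname(&"a".repeat(63)).is_ok());
    // Boundary: 63 + 63 + 63 + 61 bytes plus three dots is exactly 253.
    let l = "a".repeat(63);
    let name = format!("{l}.{l}.{l}.{}", "a".repeat(61));
    assert_eq!(name.len(), 253);
    assert!(validate_hostname(&name).is_ok());
}

#[test]
fn test_validate_hostname_rejects_invalid_names() {
    assert!(validate_hostname("").is_err());
    assert!(validate_hostname("a..b").is_err());
    // A trailing dot yields an empty last label.
    assert!(validate_hostname("node1.").is_err());
    assert!(validate_hostname(&"a".repeat(64)).is_err());
    assert!(validate_hostname("-node").is_err());
    assert!(validate_hostname("node-").is_err());
    assert!(validate_hostname("node_1").is_err());
    assert!(validate_hostname("ノード1").is_err());
}
```

Note that several nodenames used in the existing tests (the empty string,
the 1000 character name, "node-1_test.local" and "ノード1") fail these
checks, so those tests currently only trigger the warning path.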

> +}






* Re: [PATCH pve-cluster 04/14 v2] pmxcfs-rs: add pmxcfs-logger crate
  @ 2026-02-24 16:17  6%   ` Samuel Rufinatscha
  0 siblings, 0 replies; 117+ results
From: Samuel Rufinatscha @ 2026-02-24 16:17 UTC (permalink / raw)
  To: Kefu Chai, pve-devel

Nice work on v2 and thanks for applying my previous suggestions, Kefu.

A few things I’d like to suggest:

The binary compat tests should use real C fixtures (include_bytes!) and
assert actual contents. Please generate fixtures via
clusterlog_get_state() (binary blobs) and clog_dump_json() (JSON output)
and check:
* parsed entries
* header fields, including at least one multi-entry fixture with cpos != 8
* JSON output for non-ASCII content
* serialization of a buffer where not all entries fit
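
As a rough sketch of the header check: the code below assumes the blob
starts with two little-endian u32 fields (size, then cpos), which is how I
read the C clog_base_t on our platforms. The helper names and the fixture
path in the comment are hypothetical, and a real test would load a blob
produced by the C code rather than build one by hand:

```rust
/// Header of a serialized cluster log blob (assumed layout: size, cpos).
#[derive(Debug, PartialEq, Eq)]
struct ClogHeader {
    size: u32,
    cpos: u32,
}

/// Header length in bytes under the assumed layout.
const CLOG_HEADER_LEN: usize = 8;

/// Parse the header, returning None if the blob is too short.
fn parse_clog_header(blob: &[u8]) -> Option<ClogHeader> {
    let h = blob.get(..CLOG_HEADER_LEN)?;
    Some(ClogHeader {
        size: u32::from_le_bytes([h[0], h[1], h[2], h[3]]),
        cpos: u32::from_le_bytes([h[4], h[5], h[6], h[7]]),
    })
}

/// Check that a fixture looks like a multi-entry log: cpos must point
/// past the first entry slot (offset 8) and stay inside the blob.
fn check_multi_entry_header(blob: &[u8]) -> Result<ClogHeader, String> {
    // In a real test the blob would come from a C-generated fixture, e.g.
    // include_bytes!("fixtures/multi_entry.bin") (hypothetical path).
    let header = parse_clog_header(blob).ok_or("blob shorter than header")?;
    let cpos = header.cpos as usize;
    if cpos <= CLOG_HEADER_LEN {
        return Err(format!("cpos {cpos} does not point past the first entry"));
    }
    if cpos >= blob.len() {
        return Err(format!("cpos {cpos} is outside the blob ({} bytes)", blob.len()));
    }
    Ok(header)
}
```

The entry contents and the JSON output would then be asserted on top of
this, field by field, against what the C side produced.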

Separately, I’m wondering about the use of VecDeque<LogEntry> vs. the C
byte-ring abstraction. A benchmark at high log rates would help quantify
the serialization/allocation overhead.
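
A very rough, std-only sketch of such a measurement is below. FakeEntry is
a hypothetical stand-in for LogEntry and the eviction policy is
simplified to a fixed entry count, so the numbers would only be
indicative; a proper benchmark would use the real types and probably a
harness such as criterion:

```rust
use std::collections::VecDeque;
use std::time::{Duration, Instant};

// Hypothetical stand-in for LogEntry with heap-allocated string fields.
struct FakeEntry {
    time: u32,
    node: String,
    message: String,
}

/// Insert `count` entries at the front of a VecDeque that keeps at most
/// `max_entries`, evicting the oldest entry from the back when full.
/// Returns the elapsed time, the final length and the newest timestamp.
fn bench_vecdeque_inserts(count: u32, max_entries: usize) -> (Duration, usize, Option<u32>) {
    assert!(max_entries > 0, "max_entries must be non-zero");
    let mut ring: VecDeque<FakeEntry> = VecDeque::with_capacity(max_entries);
    let start = Instant::now();
    for i in 0..count {
        if ring.len() == max_entries {
            ring.pop_back();
        }
        ring.push_front(FakeEntry {
            time: i,
            node: "node1".to_string(),
            message: format!("message {i}"),
        });
    }
    let elapsed = start.elapsed();
    // Touch the string fields so the allocations are actually used.
    let payload: usize = ring.iter().map(|e| e.node.len() + e.message.len()).sum();
    assert!(payload > 0 || count == 0);
    (elapsed, ring.len(), ring.front().map(|e| e.time))
}
```

Running the same loop against a preallocated byte ring would give a first
idea of how much the per-entry allocations cost.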

Also left two small inline comments on the test offsets and a header
size comment.

On 2/13/26 10:48 AM, Kefu Chai wrote:
> Add configuration management crate for pmxcfs:
> - Config struct: Runtime configuration (node name, IP, flags)
> - Thread-safe debug level mutation via RwLock
> - Arc-wrapped for shared ownership across components
> - Comprehensive unit tests including thread safety tests
> 
> This crate provides the foundational configuration structure used
> by all pmxcfs components. The Config is designed to be shared via
> Arc to allow multiple components to access the same configuration
> instance, with mutable debug level for runtime adjustments.
> 
> Signed-off-by: Kefu Chai <k.chai@proxmox.com>
> ---
>   src/pmxcfs-rs/Cargo.toml                      |   2 +
>   src/pmxcfs-rs/pmxcfs-logger/Cargo.toml        |  15 +
>   src/pmxcfs-rs/pmxcfs-logger/README.md         |  58 ++
>   .../pmxcfs-logger/src/cluster_log.rs          | 615 ++++++++++++++++
>   src/pmxcfs-rs/pmxcfs-logger/src/entry.rs      | 694 ++++++++++++++++++
>   src/pmxcfs-rs/pmxcfs-logger/src/hash.rs       | 176 +++++
>   src/pmxcfs-rs/pmxcfs-logger/src/lib.rs        |  27 +
>   .../pmxcfs-logger/src/ring_buffer.rs          | 628 ++++++++++++++++
>   .../tests/binary_compatibility_tests.rs       | 315 ++++++++
>   .../pmxcfs-logger/tests/performance_tests.rs  | 294 ++++++++
>   10 files changed, 2824 insertions(+)
>   create mode 100644 src/pmxcfs-rs/pmxcfs-logger/Cargo.toml
>   create mode 100644 src/pmxcfs-rs/pmxcfs-logger/README.md
>   create mode 100644 src/pmxcfs-rs/pmxcfs-logger/src/cluster_log.rs
>   create mode 100644 src/pmxcfs-rs/pmxcfs-logger/src/entry.rs
>   create mode 100644 src/pmxcfs-rs/pmxcfs-logger/src/hash.rs
>   create mode 100644 src/pmxcfs-rs/pmxcfs-logger/src/lib.rs
>   create mode 100644 src/pmxcfs-rs/pmxcfs-logger/src/ring_buffer.rs
>   create mode 100644 src/pmxcfs-rs/pmxcfs-logger/tests/binary_compatibility_tests.rs
>   create mode 100644 src/pmxcfs-rs/pmxcfs-logger/tests/performance_tests.rs
> 
> diff --git a/src/pmxcfs-rs/Cargo.toml b/src/pmxcfs-rs/Cargo.toml
> index f190968ed..d26fac04c 100644
> --- a/src/pmxcfs-rs/Cargo.toml
> +++ b/src/pmxcfs-rs/Cargo.toml
> @@ -3,6 +3,7 @@
>   members = [
>       "pmxcfs-api-types",  # Shared types and error definitions
>       "pmxcfs-config",     # Configuration management
> +    "pmxcfs-logger",     # Cluster log with ring buffer and deduplication
>   ]
>   resolver = "2"
>   
> @@ -18,6 +19,7 @@ rust-version = "1.85"
>   # Internal workspace dependencies
>   pmxcfs-api-types = { path = "pmxcfs-api-types" }
>   pmxcfs-config = { path = "pmxcfs-config" }
> +pmxcfs-logger = { path = "pmxcfs-logger" }
>   
>   # Error handling
>   thiserror = "1.0"
> diff --git a/src/pmxcfs-rs/pmxcfs-logger/Cargo.toml b/src/pmxcfs-rs/pmxcfs-logger/Cargo.toml
> new file mode 100644
> index 000000000..1af3f015c
> --- /dev/null
> +++ b/src/pmxcfs-rs/pmxcfs-logger/Cargo.toml
> @@ -0,0 +1,15 @@
> +[package]
> +name = "pmxcfs-logger"
> +version = "0.1.0"
> +edition = "2021"
> +
> +[dependencies]
> +anyhow = "1.0"
> +parking_lot = "0.12"
> +serde = { version = "1.0", features = ["derive"] }
> +serde_json = "1.0"
> +tracing = "0.1"
> +
> +[dev-dependencies]
> +tempfile = "3.0"
> +
> diff --git a/src/pmxcfs-rs/pmxcfs-logger/README.md b/src/pmxcfs-rs/pmxcfs-logger/README.md
> new file mode 100644
> index 000000000..38f102c27
> --- /dev/null
> +++ b/src/pmxcfs-rs/pmxcfs-logger/README.md
> @@ -0,0 +1,58 @@
> +# pmxcfs-logger
> +
> +Cluster-wide log management for pmxcfs, fully compatible with the C implementation (logger.c).
> +
> +## Overview
> +
> +This crate implements a cluster log system matching Proxmox's C-based logger.c behavior. It provides:
> +
> +- **Ring Buffer Storage**: Circular buffer for log entries with automatic capacity management
> +- **FNV-1a Hashing**: Hashing for node and identity-based deduplication
> +- **Deduplication**: Per-node tracking of latest log entries to avoid duplicates
> +- **Time-based Sorting**: Chronological ordering of log entries across nodes
> +- **Multi-node Merging**: Combining logs from multiple cluster nodes
> +- **JSON Export**: Web UI-compatible JSON output matching C format
> +
> +## Architecture
> +
> +### Key Components
> +
> +1. **LogEntry** (`entry.rs`): Individual log entry with automatic UID generation
> +2. **RingBuffer** (`ring_buffer.rs`): Circular buffer with capacity management
> +3. **ClusterLog** (`lib.rs`): Main API with deduplication and merging
> +4. **Hash Functions** (`hash.rs`): FNV-1a implementation matching C
> +
> +## C to Rust Mapping
> +
> +| C Function | Rust Equivalent | Location |
> +|------------|-----------------|----------|
> +| `fnv_64a_buf` | `hash::fnv_64a` | hash.rs |
> +| `clog_pack` | `LogEntry::pack` | entry.rs |
> +| `clog_copy` | `RingBuffer::add_entry` | ring_buffer.rs |
> +| `clog_sort` | `RingBuffer::sort` | ring_buffer.rs |
> +| `clog_dump_json` | `RingBuffer::dump_json` | ring_buffer.rs |
> +| `clusterlog_insert` | `ClusterLog::insert` | lib.rs |
> +| `clusterlog_add` | `ClusterLog::add` | lib.rs |
> +| `clusterlog_merge` | `ClusterLog::merge` | lib.rs |
> +| `dedup_lookup` | `ClusterLog::dedup_lookup` | lib.rs |
> +
> +## Key Differences from C
> +
> +1. **No `node_digest` in DedupEntry**: C stores `node_digest` both as HashMap key and in the struct. Rust only uses it as the key, saving 8 bytes per entry.
> +
> +2. **Mutex granularity**: C uses a single global mutex. Rust uses separate Arc<Mutex<>> for buffer and dedup table, allowing better concurrency.
> +
> +3. **Code size**: Rust implementation is ~24% the size of C (740 lines vs 3,000+) while maintaining equivalent functionality.
> +
> +## Integration
> +
> +This crate is integrated into `pmxcfs-status` to provide cluster log functionality. The `.clusterlog` FUSE plugin uses this to provide JSON log output compatible with the Proxmox web UI.
> +
> +## References
> +
> +### C Implementation
> +- `src/pmxcfs/logger.c` / `logger.h` - Cluster log implementation
> +
> +### Related Crates
> +- **pmxcfs-status**: Integrates ClusterLog for status tracking
> +- **pmxcfs**: FUSE plugin exposes cluster log via `.clusterlog`
> diff --git a/src/pmxcfs-rs/pmxcfs-logger/src/cluster_log.rs b/src/pmxcfs-rs/pmxcfs-logger/src/cluster_log.rs
> new file mode 100644
> index 000000000..c9d04ee47
> --- /dev/null
> +++ b/src/pmxcfs-rs/pmxcfs-logger/src/cluster_log.rs
> @@ -0,0 +1,615 @@
> +/// Cluster Log Implementation
> +///
> +/// This module implements the cluster-wide log system with deduplication
> +/// and merging support, matching C's clusterlog_t.
> +use crate::entry::LogEntry;
> +use crate::ring_buffer::{RingBuffer, CLOG_DEFAULT_SIZE};
> +use anyhow::Result;
> +use parking_lot::Mutex;
> +use std::collections::{BTreeMap, HashMap};
> +use std::sync::Arc;
> +
> +/// Deduplication entry - tracks the latest UID and time for each node
> +///
> +/// Note: C's `dedup_entry_t` includes node_digest field because GHashTable stores
> +/// the struct pointer both as key and value. In Rust, we use HashMap<u64, DedupEntry>
> +/// where node_digest is the key, so we don't need to duplicate it in the value.
> +/// This is functionally equivalent but more efficient.
> +#[derive(Debug, Clone)]
> +pub(crate) struct DedupEntry {
> +    /// Latest UID seen from this node
> +    pub uid: u32,
> +    /// Latest timestamp seen from this node
> +    pub time: u32,
> +}
> +
> +/// Internal state protected by a single mutex
> +/// Matches C's clusterlog_t which uses a single mutex for both base and dedup
> +struct ClusterLogInner {
> +    /// Ring buffer for log storage (matches C's cl->base)
> +    buffer: RingBuffer,
> +    /// Deduplication tracker (matches C's cl->dedup)
> +    dedup: HashMap<u64, DedupEntry>,
> +}
> +
> +/// Cluster-wide log with deduplication and merging support
> +/// Matches C's `clusterlog_t`
> +///
> +/// Note: Unlike the initial implementation with separate mutexes, we use a single
> +/// mutex to match C's semantics and ensure atomic updates of buffer+dedup.
> +pub struct ClusterLog {
> +    /// Inner state protected by a single mutex
> +    /// Matches C's single g_mutex_t protecting both cl->base and cl->dedup
> +    inner: Arc<Mutex<ClusterLogInner>>,
> +}
> +
> +impl ClusterLog {
> +    /// Create a new cluster log with default size
> +    pub fn new() -> Self {
> +        Self::with_capacity(CLOG_DEFAULT_SIZE)
> +    }
> +
> +    /// Create a new cluster log with specified capacity
> +    pub fn with_capacity(capacity: usize) -> Self {
> +        Self {
> +            inner: Arc::new(Mutex::new(ClusterLogInner {
> +                buffer: RingBuffer::new(capacity),
> +                dedup: HashMap::new(),
> +            })),
> +        }
> +    }
> +
> +    /// Matches C's `clusterlog_add` function
> +    #[allow(clippy::too_many_arguments)]
> +    pub fn add(
> +        &self,
> +        node: &str,
> +        ident: &str,
> +        tag: &str,
> +        pid: u32,
> +        priority: u8,
> +        time: u32,
> +        message: &str,
> +    ) -> Result<()> {
> +        let entry = LogEntry::pack(node, ident, tag, pid, time, priority, message)?;
> +        self.insert(&entry)
> +    }
> +
> +    /// Insert a log entry (with deduplication)
> +    ///
> +    /// Matches C's `clusterlog_insert` function
> +    pub fn insert(&self, entry: &LogEntry) -> Result<()> {
> +        let mut inner = self.inner.lock();
> +
> +        // Check deduplication
> +        if Self::is_not_duplicate(&mut inner.dedup, entry) {
> +            // Entry is not a duplicate, add it
> +            inner.buffer.add_entry(entry)?;
> +        } else {
> +            tracing::debug!("Ignoring duplicate cluster log entry");
> +        }
> +
> +        Ok(())
> +    }
> +
> +    /// Check if entry is a duplicate (returns true if NOT a duplicate)
> +    ///
> +    /// Matches C's `dedup_lookup` function
> +    ///
> +    /// ## Hash Collision Risk
> +    ///
> +    /// Uses FNV-1a hash (`node_digest`) as deduplication key. Hash collisions
> +    /// are theoretically possible but extremely rare in practice:
> +    ///
> +    /// - FNV-1a produces 64-bit hashes (2^64 possible values)
> +    /// - Collision probability with N entries: ~N²/(2 × 2^64)
> +    /// - For 10,000 log entries: collision probability < 10^-11
> +    ///
> +    /// If a collision occurs, two different log entries (from different nodes
> +    /// or with different content) will be treated as duplicates, causing one
> +    /// to be silently dropped.
> +    ///
> +    /// This design is inherited from the C implementation for compatibility.
> +    /// The risk is acceptable because:
> +    /// 1. Collisions are astronomically rare
> +    /// 2. Only affects log deduplication, not critical data integrity
> +    /// 3. Lost log entries don't compromise cluster operation
> +    ///
> +    /// Changing this would break wire format compatibility with C nodes.
> +    fn is_not_duplicate(dedup: &mut HashMap<u64, DedupEntry>, entry: &LogEntry) -> bool {
> +        match dedup.get_mut(&entry.node_digest) {
> +            None => {
> +                dedup.insert(
> +                    entry.node_digest,
> +                    DedupEntry {
> +                        time: entry.time,
> +                        uid: entry.uid,
> +                    },
> +                );
> +                true
> +            }
> +            Some(dd) => {
> +                if entry.time > dd.time || (entry.time == dd.time && entry.uid > dd.uid) {
> +                    dd.time = entry.time;
> +                    dd.uid = entry.uid;
> +                    true
> +                } else {
> +                    false
> +                }
> +            }
> +        }
> +    }
> +
> +    pub fn get_entries(&self, max: usize) -> Vec<LogEntry> {
> +        let inner = self.inner.lock();
> +        inner.buffer.iter().take(max).cloned().collect()
> +    }
> +
> +    /// Get the current buffer (for testing)
> +    pub fn get_buffer(&self) -> RingBuffer {
> +        let inner = self.inner.lock();
> +        inner.buffer.clone()
> +    }
> +
> +    /// Get buffer length (for testing)
> +    pub fn len(&self) -> usize {
> +        let inner = self.inner.lock();
> +        inner.buffer.len()
> +    }
> +
> +    /// Get buffer capacity (for testing)
> +    pub fn capacity(&self) -> usize {
> +        let inner = self.inner.lock();
> +        inner.buffer.capacity()
> +    }
> +
> +    /// Check if buffer is empty (for testing)
> +    pub fn is_empty(&self) -> bool {
> +        let inner = self.inner.lock();
> +        inner.buffer.is_empty()
> +    }
> +
> +    /// Clear all log entries (for testing)
> +    pub fn clear(&self) {
> +        let mut inner = self.inner.lock();
> +        let capacity = inner.buffer.capacity();
> +        inner.buffer = RingBuffer::new(capacity);
> +        inner.dedup.clear();
> +    }
> +
> +    /// Sort the log entries by time
> +    ///
> +    /// Matches C's `clog_sort` function
> +    pub fn sort(&self) -> Result<RingBuffer> {
> +        let inner = self.inner.lock();
> +        inner.buffer.sort()
> +    }
> +
> +    /// Merge logs from multiple nodes
> +    ///
> +    /// Matches C's `clusterlog_merge` function
> +    ///
> +    /// This method atomically updates both the buffer and dedup state under a single
> +    /// mutex lock, matching C's behavior where both cl->base and cl->dedup are
> +    /// updated under cl->mutex.
> +    pub fn merge(&self, remote_logs: Vec<RingBuffer>, include_local: bool) -> Result<()> {
> +        let mut sorted_entries: BTreeMap<(u32, u64, u32), LogEntry> = BTreeMap::new();
> +        let mut merge_dedup: HashMap<u64, DedupEntry> = HashMap::new();
> +
> +        // Lock once for the entire operation (matching C's single mutex)
> +        let mut inner = self.inner.lock();
> +
> +        // Calculate maximum capacity
> +        let max_size = if include_local {
> +            let local_cap = inner.buffer.capacity();
> +
> +            std::iter::once(local_cap)
> +                .chain(remote_logs.iter().map(|b| b.capacity()))
> +                .max()
> +                .unwrap_or(CLOG_DEFAULT_SIZE)
> +        } else {
> +            remote_logs
> +                .iter()
> +                .map(|b| b.capacity())
> +                .max()
> +                .unwrap_or(CLOG_DEFAULT_SIZE)
> +        };
> +
> +        // Add local entries if requested
> +        if include_local {
> +            for entry in inner.buffer.iter() {
> +                let key = (entry.time, entry.node_digest, entry.uid);
> +                // Keep-first: only insert if key doesn't exist, matching C's g_tree_lookup guard
> +                if let std::collections::btree_map::Entry::Vacant(e) = sorted_entries.entry(key) {
> +                    e.insert(entry.clone());
> +                    Self::is_not_duplicate(&mut merge_dedup, entry);
> +                }
> +            }
> +        }
> +
> +        // Add remote entries
> +        for remote_buffer in &remote_logs {
> +            for entry in remote_buffer.iter() {
> +                let key = (entry.time, entry.node_digest, entry.uid);
> +                // Keep-first: only insert if key doesn't exist, matching C's g_tree_lookup guard
> +                if let std::collections::btree_map::Entry::Vacant(e) = sorted_entries.entry(key) {
> +                    e.insert(entry.clone());
> +                    Self::is_not_duplicate(&mut merge_dedup, entry);
> +                }
> +            }
> +        }
> +
> +        let mut result = RingBuffer::new(max_size);
> +
> +        // BTreeMap iterates oldest->newest. We add each as new head (push_front),
> +        // so result ends with newest at head, matching C's behavior.
> +        // Fill to 100% capacity (matching C's behavior), not just 90%
> +        for (_key, entry) in sorted_entries.iter() {
> +            // add_entry will automatically evict old entries if needed to stay within capacity
> +            result.add_entry(entry)?;
> +        }
> +
> +        // Atomically update both buffer and dedup (matches C lines 503-507)
> +        inner.buffer = result;
> +        inner.dedup = merge_dedup;
> +
> +        Ok(())
> +    }
> +
> +    /// Export log to JSON format
> +    ///
> +    /// Matches C's `clog_dump_json` function
> +    pub fn dump_json(&self, ident_filter: Option<&str>, max_entries: usize) -> String {
> +        let inner = self.inner.lock();
> +        inner.buffer.dump_json(ident_filter, max_entries)
> +    }
> +
> +    /// Export log to JSON format with sorted entries
> +    pub fn dump_json_sorted(
> +        &self,
> +        ident_filter: Option<&str>,
> +        max_entries: usize,
> +    ) -> Result<String> {
> +        let sorted = self.sort()?;
> +        Ok(sorted.dump_json(ident_filter, max_entries))
> +    }
> +
> +    /// Matches C's `clusterlog_get_state` function
> +    ///
> +    /// Returns binary-serialized clog_base_t structure for network transmission.
> +    /// This format is compatible with C nodes for mixed-cluster operation.
> +    pub fn get_state(&self) -> Result<Vec<u8>> {
> +        let sorted = self.sort()?;
> +        Ok(sorted.serialize_binary())
> +    }
> +
> +    pub fn deserialize_state(data: &[u8]) -> Result<RingBuffer> {
> +        RingBuffer::deserialize_binary(data)
> +    }
> +
> +}
> +
> +impl Default for ClusterLog {
> +    fn default() -> Self {
> +        Self::new()
> +    }
> +}
> +
> +#[cfg(test)]
> +mod tests {
> +    use super::*;
> +
> +    #[test]
> +    fn test_cluster_log_creation() {
> +        let log = ClusterLog::new();
> +        assert!(log.inner.lock().buffer.is_empty());
> +    }
> +
> +    #[test]
> +    fn test_add_entry() {
> +        let log = ClusterLog::new();
> +
> +        let result = log.add(
> +            "node1",
> +            "root",
> +            "cluster",
> +            12345,
> +            6, // Info priority
> +            1234567890,
> +            "Test message",
> +        );
> +
> +        assert!(result.is_ok());
> +        assert!(!log.inner.lock().buffer.is_empty());
> +    }
> +
> +    #[test]
> +    fn test_deduplication() {
> +        let log = ClusterLog::new();
> +
> +        // Add same entry twice (but with different UIDs since each add creates a new entry)
> +        let _ = log.add("node1", "root", "cluster", 123, 6, 1000, "Message 1");
> +        let _ = log.add("node1", "root", "cluster", 123, 6, 1000, "Message 1");
> +
> +        // Both entries are added because they have different UIDs
> +        // Deduplication tracks the latest (time, UID) per node, not content
> +        let inner = log.inner.lock();
> +        assert_eq!(inner.buffer.len(), 2);
> +    }
> +
> +    #[test]
> +    fn test_newer_entry_replaces() {
> +        let log = ClusterLog::new();
> +
> +        // Add older entry
> +        let _ = log.add("node1", "root", "cluster", 123, 6, 1000, "Old message");
> +
> +        // Add newer entry from same node
> +        let _ = log.add("node1", "root", "cluster", 123, 6, 1001, "New message");
> +
> +        // Should have both entries (newer doesn't remove older, just updates dedup tracker)
> +        let inner = log.inner.lock();
> +        assert_eq!(inner.buffer.len(), 2);
> +    }
> +
> +    #[test]
> +    fn test_json_export() {
> +        let log = ClusterLog::new();
> +
> +        let _ = log.add(
> +            "node1",
> +            "root",
> +            "cluster",
> +            123,
> +            6,
> +            1234567890,
> +            "Test message",
> +        );
> +
> +        let json = log.dump_json(None, 50);
> +
> +        // Should be valid JSON
> +        assert!(serde_json::from_str::<serde_json::Value>(&json).is_ok());
> +
> +        // Should contain "data" field
> +        let value: serde_json::Value = serde_json::from_str(&json).unwrap();
> +        assert!(value.get("data").is_some());
> +    }
> +
> +    #[test]
> +    fn test_merge_logs() {
> +        let log1 = ClusterLog::new();
> +        let log2 = ClusterLog::new();
> +
> +        // Add entries to first log
> +        let _ = log1.add(
> +            "node1",
> +            "root",
> +            "cluster",
> +            123,
> +            6,
> +            1000,
> +            "Message from node1",
> +        );
> +
> +        // Add entries to second log
> +        let _ = log2.add(
> +            "node2",
> +            "root",
> +            "cluster",
> +            456,
> +            6,
> +            1001,
> +            "Message from node2",
> +        );
> +
> +        // Get log2's buffer for merging
> +        let log2_buffer = log2.inner.lock().buffer.clone();
> +
> +        // Merge into log1 (updates log1's buffer atomically)
> +        log1.merge(vec![log2_buffer], true).unwrap();
> +
> +        // Check log1's buffer now contains entries from both logs
> +        let inner = log1.inner.lock();
> +        assert!(inner.buffer.len() >= 2);
> +    }
> +
> +    // ========================================================================
> +    // HIGH PRIORITY TESTS - Merge Edge Cases
> +    // ========================================================================
> +
> +    #[test]
> +    fn test_merge_empty_logs() {
> +        let log = ClusterLog::new();
> +
> +        // Add some entries to local log
> +        let _ = log.add("node1", "root", "cluster", 123, 6, 1000, "Local entry");
> +
> +        // Merge with empty remote logs (updates buffer atomically)
> +        log.merge(vec![], true).unwrap();
> +
> +        // Check buffer has 1 entry (from local log)
> +        let inner = log.inner.lock();
> +        assert_eq!(inner.buffer.len(), 1);
> +        let entry = inner.buffer.iter().next().unwrap();
> +        assert_eq!(entry.node, "node1");
> +    }
> +
> +    #[test]
> +    fn test_merge_single_node_only() {
> +        let log = ClusterLog::new();
> +
> +        // Add entries only from single node
> +        let _ = log.add("node1", "root", "cluster", 123, 6, 1000, "Entry 1");
> +        let _ = log.add("node1", "root", "cluster", 124, 6, 1001, "Entry 2");
> +        let _ = log.add("node1", "root", "cluster", 125, 6, 1002, "Entry 3");
> +
> +        // Merge with no remote logs (just sort local)
> +        log.merge(vec![], true).unwrap();
> +
> +        // Check buffer has all 3 entries
> +        let inner = log.inner.lock();
> +        assert_eq!(inner.buffer.len(), 3);
> +
> +        // Entries should be sorted by time (buffer stores newest first)
> +        let times: Vec<u32> = inner.buffer.iter().map(|e| e.time).collect();
> +        let mut expected = vec![1002, 1001, 1000];
> +        expected.sort();
> +        expected.reverse(); // Newest first
> +
> +        let mut actual = times.clone();
> +        actual.sort();
> +        actual.reverse();
> +
> +        assert_eq!(actual, expected);
> +    }
> +
> +    #[test]
> +    fn test_merge_all_duplicates() {
> +        let log1 = ClusterLog::new();
> +        let log2 = ClusterLog::new();
> +
> +        // Add same entries to both logs (same node, time, but different UIDs)
> +        let _ = log1.add("node1", "root", "cluster", 123, 6, 1000, "Entry 1");
> +        let _ = log1.add("node1", "root", "cluster", 124, 6, 1001, "Entry 2");
> +
> +        let _ = log2.add("node1", "root", "cluster", 125, 6, 1000, "Entry 1");
> +        let _ = log2.add("node1", "root", "cluster", 126, 6, 1001, "Entry 2");
> +
> +        let log2_buffer = log2.inner.lock().buffer.clone();
> +
> +        // Merge - should handle entries from same node at same times
> +        log1.merge(vec![log2_buffer], true).unwrap();
> +
> +        // Check merged buffer has 4 entries (all are unique by UID despite same time/node)
> +        let inner = log1.inner.lock();
> +        assert_eq!(inner.buffer.len(), 4);
> +    }
> +
> +    #[test]
> +    fn test_merge_exceeding_capacity() {
> +        // Create small buffer to test capacity enforcement
> +        let log = ClusterLog::with_capacity(50_000); // Small buffer
> +
> +        // Add many entries to fill beyond capacity
> +        for i in 0..100 {
> +            let _ = log.add(
> +                "node1",
> +                "root",
> +                "cluster",
> +                100 + i,
> +                6,
> +                1000 + i,
> +                &format!("Entry {}", i),
> +            );
> +        }
> +
> +        // Create remote log with many entries
> +        let remote = ClusterLog::with_capacity(50_000);
> +        for i in 0..100 {
> +            let _ = remote.add(
> +                "node2",
> +                "root",
> +                "cluster",
> +                200 + i,
> +                6,
> +                1000 + i,
> +                &format!("Remote {}", i),
> +            );
> +        }
> +
> +        let remote_buffer = remote.inner.lock().buffer.clone();
> +
> +        // Merge - should stop when buffer is near full
> +        log.merge(vec![remote_buffer], true).unwrap();
> +
> +        // Buffer should be limited by capacity, not necessarily < 200
> +        // The actual limit depends on entry sizes and capacity
> +        // Just verify we got some reasonable number of entries
> +        let inner = log.inner.lock();
> +        assert!(!inner.buffer.is_empty(), "Should have some entries");
> +        assert!(
> +            inner.buffer.len() <= 200,
> +            "Should not exceed total available entries"
> +        );
> +    }
> +
> +    #[test]
> +    fn test_merge_preserves_dedup_state() {
> +        let log = ClusterLog::new();
> +
> +        // Add entries from node1
> +        let _ = log.add("node1", "root", "cluster", 123, 6, 1000, "Entry 1");
> +        let _ = log.add("node1", "root", "cluster", 124, 6, 1001, "Entry 2");
> +
> +        // Create remote log with later entries from node1
> +        let remote = ClusterLog::new();
> +        let _ = remote.add("node1", "root", "cluster", 125, 6, 1002, "Entry 3");
> +
> +        let remote_buffer = remote.inner.lock().buffer.clone();
> +
> +        // Merge
> +        log.merge(vec![remote_buffer], true).unwrap();
> +
> +        // Check that dedup state was updated
> +        let inner = log.inner.lock();
> +        let node1_digest = crate::hash::fnv_64a_str("node1");
> +        let dedup_entry = inner.dedup.get(&node1_digest).unwrap();
> +
> +        // Should track the latest time from node1
> +        assert_eq!(dedup_entry.time, 1002);
> +        // UID is auto-generated, so just verify it exists and is reasonable
> +        assert!(dedup_entry.uid > 0);
> +    }
> +
> +    #[test]
> +    fn test_get_state_binary_format() {
> +        let log = ClusterLog::new();
> +
> +        // Add some entries
> +        let _ = log.add("node1", "root", "cluster", 123, 6, 1000, "Entry 1");
> +        let _ = log.add("node2", "admin", "system", 456, 6, 1001, "Entry 2");
> +
> +        // Get state
> +        let state = log.get_state().unwrap();
> +
> +        // Should be binary format, not JSON
> +        assert!(state.len() >= 8); // At least header
> +
> +        // Check header format (clog_base_t)
> +        let size = u32::from_le_bytes(state[0..4].try_into().unwrap()) as usize;
> +        let cpos = u32::from_le_bytes(state[4..8].try_into().unwrap());
> +
> +        assert_eq!(size, state.len());
> +        assert_eq!(cpos, 8); // First entry at offset 8
> +
> +        // Should be able to deserialize back
> +        let deserialized = ClusterLog::deserialize_state(&state).unwrap();
> +        assert_eq!(deserialized.len(), 2);
> +    }
> +
> +    #[test]
> +    fn test_state_roundtrip() {
> +        let log = ClusterLog::new();
> +
> +        // Add entries
> +        let _ = log.add("node1", "root", "cluster", 123, 6, 1000, "Test 1");
> +        let _ = log.add("node2", "admin", "system", 456, 6, 1001, "Test 2");
> +
> +        // Serialize
> +        let state = log.get_state().unwrap();
> +
> +        // Deserialize
> +        let deserialized = ClusterLog::deserialize_state(&state).unwrap();
> +
> +        // Check entries preserved
> +        assert_eq!(deserialized.len(), 2);
> +
> +        // Buffer is stored newest-first after sorting and serialization
> +        let entries: Vec<_> = deserialized.iter().collect();
> +        assert_eq!(entries[0].node, "node2"); // Newest (time 1001)
> +        assert_eq!(entries[0].message, "Test 2");
> +        assert_eq!(entries[1].node, "node1"); // Oldest (time 1000)
> +        assert_eq!(entries[1].message, "Test 1");
> +    }
> +}
> diff --git a/src/pmxcfs-rs/pmxcfs-logger/src/entry.rs b/src/pmxcfs-rs/pmxcfs-logger/src/entry.rs
> new file mode 100644
> index 000000000..81d5cecbc
> --- /dev/null
> +++ b/src/pmxcfs-rs/pmxcfs-logger/src/entry.rs
> @@ -0,0 +1,694 @@
> +/// Log Entry Implementation
> +///
> +/// This module implements the cluster log entry structure, matching the C
> +/// implementation's clog_entry_t (logger.c).
> +use super::hash::fnv_64a_str;
> +use anyhow::{bail, Result};
> +use serde::Serialize;
> +use std::sync::atomic::{AtomicU32, Ordering};
> +
> +// Import constant from ring_buffer to avoid duplication
> +use crate::ring_buffer::CLOG_MAX_ENTRY_SIZE;
> +
> +/// Global UID counter (matches C's `uid_counter` global variable)
> +///
> +/// # UID Wraparound Behavior
> +///
> +/// The UID counter is a 32-bit unsigned integer that wraps around after 2^32 entries.
> +/// This matches the C implementation's behavior (logger.c:62).
> +///
> +/// **Wraparound implications:**
> +/// - At 1000 entries/second: wraparound after ~49 days
> +/// - At 100 entries/second: wraparound after ~497 days
> +/// - After wraparound, UIDs restart from 1
> +///
> +/// **Impact on deduplication:**
> +/// The deduplication logic compares (time, UID) tuples. After wraparound, an entry
> +/// with UID=1 might be incorrectly considered older than an entry with UID=4294967295,
> +/// even if they have the same timestamp. This is a known limitation inherited from
> +/// the C implementation.
> +///
> +/// **Mitigation:**
> +/// - Entries with different timestamps are correctly ordered (time is primary sort key)
> +/// - Wraparound only affects entries with identical timestamps from the same node
> +/// - A warning is logged when wraparound occurs (see fetch_add below)
> +static UID_COUNTER: AtomicU32 = AtomicU32::new(0);
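
To make the wraparound described above concrete, here is a minimal sketch of the same counter arithmetic that `pack` uses further down (`fetch_add` followed by `wrapping_add(1)`). `next_uid` and `is_newer` are hypothetical helpers written for this illustration only. They are not part of the patch, and the tuple comparison is just an assumed stand-in for the (time, UID) ordering the comment mentions.

```rust
use std::sync::atomic::{AtomicU32, Ordering};

/// Same arithmetic as in `LogEntry::pack`, but on a caller-supplied counter
/// so it can be exercised in isolation (hypothetical helper).
fn next_uid(counter: &AtomicU32) -> u32 {
    // fetch_add returns the previous value and wraps on overflow
    let old = counter.fetch_add(1, Ordering::SeqCst);
    old.wrapping_add(1)
}

/// Assumed stand-in for the (time, uid) ordering used by deduplication:
/// tuples compare lexicographically, time first, then uid.
fn is_newer(a: (u32, u32), b: (u32, u32)) -> bool {
    a > b
}
```

With the counter one step before the limit, the sequence produced by this arithmetic is `u32::MAX`, then `0`, then `1`, so the first UID after the wrap would be 0 rather than 1. An entry with the same timestamp and a post-wrap UID compares as older, which is the limitation the comment describes.
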
> +
> +/// Log entry structure
> +///
> +/// Matches C's `clog_entry_t` from logger.c:
> +/// ```c
> +/// typedef struct {
> +///     uint32_t prev;          // Previous entry offset
> +///     uint32_t next;          // Next entry offset
> +///     uint32_t uid;           // Unique ID
> +///     uint32_t time;          // Timestamp
> +///     uint64_t node_digest;   // FNV-1a hash of node name
> +///     uint64_t ident_digest;  // FNV-1a hash of ident
> +///     uint32_t pid;           // Process ID
> +///     uint8_t priority;       // Syslog priority (0-7)
> +///     uint8_t node_len;       // Length of node name (including null)
> +///     uint8_t ident_len;      // Length of ident (including null)
> +///     uint8_t tag_len;        // Length of tag (including null)
> +///     uint32_t msg_len;       // Length of message (including null)
> +///     char data[];            // Variable length data: node + ident + tag + msg
> +/// } clog_entry_t;
> +/// ```
> +#[derive(Debug, Clone, Serialize)]
> +pub struct LogEntry {
> +    /// Unique ID for this entry (auto-incrementing)
> +    pub uid: u32,
> +
> +    /// Unix timestamp
> +    pub time: u32,
> +
> +    /// FNV-1a hash of node name
> +    pub node_digest: u64,
> +
> +    /// FNV-1a hash of ident (user)
> +    pub ident_digest: u64,
> +
> +    /// Process ID
> +    pub pid: u32,
> +
> +    /// Syslog priority (0-7)
> +    pub priority: u8,
> +
> +    /// Node name
> +    pub node: String,
> +
> +    /// Identity/user
> +    pub ident: String,
> +
> +    /// Tag (e.g., "cluster", "pmxcfs")
> +    pub tag: String,
> +
> +    /// Log message
> +    pub message: String,
> +}
> +
> +impl LogEntry {
> +    /// Matches C's `clog_pack` function
> +    pub fn pack(
> +        node: &str,
> +        ident: &str,
> +        tag: &str,
> +        pid: u32,
> +        time: u32,
> +        priority: u8,
> +        message: &str,
> +    ) -> Result<Self> {
> +        if priority >= 8 {
> +            bail!("Invalid priority: {priority} (must be 0-7)");
> +        }
> +
> +        // Truncate to 254 bytes to leave room for null terminator (C uses MIN(strlen+1, 255))
> +        let node = Self::truncate_string(node, 254);
> +        let ident = Self::truncate_string(ident, 254);
> +        let tag = Self::truncate_string(tag, 254);
> +        let message = Self::utf8_to_ascii(message);
> +
> +        let node_len = node.len() + 1;
> +        let ident_len = ident.len() + 1;
> +        let tag_len = tag.len() + 1;
> +        let mut msg_len = message.len() + 1;
> +
> +        // Use checked arithmetic to prevent integer overflow
> +        // Header: 48 bytes fixed (prev, next, uid, time, digests, pid, priority, lengths)

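Shouldn't this be 44 bytes? The fields listed in the comment add up to
4*4 + 2*8 + 2*4 + 4*1 = 44, which is also what the header_size expression
computes.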
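
A possible source of the 48 is struct padding. The sketch below mirrors the C struct quoted in the doc comment as a `#[repr(C)]` type. `ClogEntryHeader` and the three helper functions are hypothetical names used only for this illustration, and the zero-length array is an assumed stand-in for the C flexible array member. On a typical 64-bit target the fields sum to 44 and the data starts at offset 44, while the struct size is padded to 48.

```rust
use std::mem::{offset_of, size_of};

/// Hypothetical mirror of the C `clog_entry_t` layout, for illustration only.
#[repr(C)]
#[allow(dead_code)]
struct ClogEntryHeader {
    prev: u32,
    next: u32,
    uid: u32,
    time: u32,
    node_digest: u64,
    ident_digest: u64,
    pid: u32,
    priority: u8,
    node_len: u8,
    ident_len: u8,
    tag_len: u8,
    msg_len: u32,
    data: [u8; 0], // stands in for the C flexible array member
}

/// Sum of the fixed fields, the same arithmetic as `header_size` in the patch.
fn summed_header_size() -> usize {
    size_of::<u32>() * 4 + size_of::<u64>() * 2 + size_of::<u32>() * 2 + size_of::<u8>() * 4
}

/// Offset at which the variable-length data begins.
fn data_offset() -> usize {
    offset_of!(ClogEntryHeader, data)
}

/// Size of the struct including any trailing padding added for alignment.
fn padded_struct_size() -> usize {
    size_of::<ClogEntryHeader>()
}
```

If the C side sizes entries from the data offset, 44 is the number that matters. If it uses the struct size, the two implementations could disagree by 4 bytes, so it may be worth checking which one logger.c actually uses.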

> +        // Variable: node_len + ident_len + tag_len + msg_len
> +        let header_size = std::mem::size_of::<u32>() * 4  // prev, next, uid, time
> +            + std::mem::size_of::<u64>() * 2  // node_digest, ident_digest
> +            + std::mem::size_of::<u32>() * 2  // pid, msg_len
> +            + std::mem::size_of::<u8>() * 4;  // priority, node_len, ident_len, tag_len
> +
> +        let total_size = header_size
> +            .checked_add(node_len)
> +            .and_then(|s| s.checked_add(ident_len))
> +            .and_then(|s| s.checked_add(tag_len))
> +            .and_then(|s| s.checked_add(msg_len))
> +            .ok_or_else(|| anyhow::anyhow!("Entry size calculation overflow"))?;
> +
> +        if total_size > CLOG_MAX_ENTRY_SIZE {
> +            let diff = total_size - CLOG_MAX_ENTRY_SIZE;
> +            msg_len = msg_len.saturating_sub(diff);
> +        }
> +
> +        let node_digest = fnv_64a_str(&node);
> +        let ident_digest = fnv_64a_str(&ident);
> +
> +        // Increment UID counter with wraparound detection
> +        let old_uid = UID_COUNTER.fetch_add(1, Ordering::SeqCst);
> +
> +        // Warn on wraparound (when counter goes from u32::MAX to 0)
> +        // This happens approximately every 49 days at 1000 entries/second
> +        if old_uid == u32::MAX {
> +            tracing::warn!(
> +                "UID counter wrapped around (2^32 entries reached). \
> +                 Deduplication may be affected for entries with identical timestamps. \
> +                 This is expected behavior matching the C implementation."
> +            );
> +        }
> +
> +        let uid = old_uid.wrapping_add(1);
> +
> +        Ok(Self {
> +            uid,
> +            time,
> +            node_digest,
> +            ident_digest,
> +            pid,
> +            priority,
> +            node,
> +            ident,
> +            tag,
> +            message: message[..msg_len.saturating_sub(1)].to_string(),
> +        })
> +    }
> +
> +    /// Truncate string to max length (safe for multi-byte UTF-8)
> +    fn truncate_string(s: &str, max_len: usize) -> String {
> +        if s.len() <= max_len {
> +            return s.to_string();
> +        }
> +
> +        // Find the last valid UTF-8 character that fits within max_len
> +        let truncate_at = s
> +            .char_indices()
> +            .take_while(|(idx, ch)| idx + ch.len_utf8() <= max_len)
> +            .last()
> +            .map(|(idx, ch)| idx + ch.len_utf8())
> +            .unwrap_or(0);
> +
> +        s[..truncate_at].to_string()
> +    }
> +
> +    /// Convert UTF-8 to ASCII with proper escaping
> +    ///
> +    /// Matches C's `utf8_to_ascii` function behavior:
> +    /// - Control characters (0x00-0x1F, 0x7F): Escaped as #XXX (e.g., #007 for BEL)
> +    /// - Unicode (U+0080 to U+FFFF): Escaped as \uXXXX (e.g., \u4e16 for 世)
> +    /// - Quotes: Escaped as \" (matches C's quotequote=TRUE behavior)
> +    /// - Characters > U+FFFF: Silently dropped
> +    /// - ASCII printable (0x20-0x7E except quotes): Passed through unchanged
> +    fn utf8_to_ascii(s: &str) -> String {
> +        let mut result = String::with_capacity(s.len());
> +
> +        for c in s.chars() {
> +            match c {
> +                // Control characters: #XXX format (3 decimal digits)
> +                '\x00'..='\x1F' | '\x7F' => {
> +                    let code = c as u32;
> +                    result.push('#');
> +                    // Format as 3 decimal digits with leading zeros (e.g., #007 for BEL)
> +                    result.push_str(&format!("{:03}", code));
> +                }
> +                // Quote escaping: matches C's quotequote=TRUE behavior (logger.c:245)
> +                '"' => {
> +                    result.push('\\');
> +                    result.push('"');
> +                }
> +                // ASCII printable characters: pass through
> +                c if c.is_ascii() => {
> +                    result.push(c);
> +                }
> +                // Unicode U+0080 to U+FFFF: \uXXXX format
> +                c if (c as u32) < 0x10000 => {
> +                    result.push('\\');
> +                    result.push('u');
> +                    result.push_str(&format!("{:04x}", c as u32));
> +                }
> +                // Characters > U+FFFF: silently drop (matches C behavior)
> +                _ => {}
> +            }
> +        }
> +
> +        result
> +    }
> +
> +    /// Matches C's `clog_entry_size` function
> +    pub fn size(&self) -> usize {
> +        std::mem::size_of::<u32>() * 4  // prev, next, uid, time
> +            + std::mem::size_of::<u64>() * 2  // node_digest, ident_digest
> +            + std::mem::size_of::<u32>() * 2  // pid, msg_len
> +            + std::mem::size_of::<u8>() * 4   // priority, node_len, ident_len, tag_len
> +            + self.node.len() + 1
> +            + self.ident.len() + 1
> +            + self.tag.len() + 1
> +            + self.message.len() + 1
> +    }
> +
> +    /// C implementation: `uint32_t realsize = ((size + 7) & 0xfffffff8);`
> +    pub fn aligned_size(&self) -> usize {
> +        let size = self.size();
> +        (size + 7) & !7
> +    }
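
For readers less familiar with the mask trick, a small sketch of the rounding that `aligned_size` performs. `align_up_8` is a hypothetical free function used only for this illustration. It assumes sizes well below 2^32, where `!7` on `usize` and the C mask `0xfffffff8` on `uint32_t` give the same result.

```rust
/// Round `size` up to the next multiple of 8 (hypothetical helper).
/// Adding 7 and clearing the low three bits leaves multiples of 8 unchanged.
fn align_up_8(size: usize) -> usize {
    (size + 7) & !7
}
```

A 44-byte value rounds up to 48, and 48 stays at 48.
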
> +
> +    pub fn to_json_object(&self) -> serde_json::Value {
> +        serde_json::json!({
> +            "uid": self.uid,
> +            "time": self.time,
> +            "pri": self.priority,
> +            "tag": self.tag,
> +            "pid": self.pid,
> +            "node": self.node,
> +            "user": self.ident,
> +            "msg": self.message,
> +        })
> +    }
> +
> +    /// Serialize to C binary format (clog_entry_t)
> +    ///
> +    /// Binary layout matches C structure:
> +    /// ```c
> +    /// struct {
> +    ///     uint32_t prev;          // Will be filled by ring buffer
> +    ///     uint32_t next;          // Will be filled by ring buffer
> +    ///     uint32_t uid;
> +    ///     uint32_t time;
> +    ///     uint64_t node_digest;
> +    ///     uint64_t ident_digest;
> +    ///     uint32_t pid;
> +    ///     uint8_t priority;
> +    ///     uint8_t node_len;
> +    ///     uint8_t ident_len;
> +    ///     uint8_t tag_len;
> +    ///     uint32_t msg_len;
> +    ///     char data[];  // node + ident + tag + msg (null-terminated)
> +    /// }
> +    /// ```
> +    pub fn serialize_binary(&self, prev: u32, next: u32) -> Vec<u8> {
> +        let mut buf = Vec::new();
> +
> +        buf.extend_from_slice(&prev.to_le_bytes());
> +        buf.extend_from_slice(&next.to_le_bytes());
> +        buf.extend_from_slice(&self.uid.to_le_bytes());
> +        buf.extend_from_slice(&self.time.to_le_bytes());
> +        buf.extend_from_slice(&self.node_digest.to_le_bytes());
> +        buf.extend_from_slice(&self.ident_digest.to_le_bytes());
> +        buf.extend_from_slice(&self.pid.to_le_bytes());
> +        buf.push(self.priority);
> +
> +        // Cap at 255 to match C's MIN(strlen+1, 255) and prevent u8 overflow
> +        let node_len = (self.node.len() + 1).min(255) as u8;
> +        let ident_len = (self.ident.len() + 1).min(255) as u8;
> +        let tag_len = (self.tag.len() + 1).min(255) as u8;
> +        let msg_len = (self.message.len() + 1) as u32;
> +
> +        buf.push(node_len);
> +        buf.push(ident_len);
> +        buf.push(tag_len);
> +        buf.extend_from_slice(&msg_len.to_le_bytes());
> +
> +        buf.extend_from_slice(self.node.as_bytes());
> +        buf.push(0);
> +
> +        buf.extend_from_slice(self.ident.as_bytes());
> +        buf.push(0);
> +
> +        buf.extend_from_slice(self.tag.as_bytes());
> +        buf.push(0);
> +
> +        buf.extend_from_slice(self.message.as_bytes());
> +        buf.push(0);
> +
> +        buf
> +    }
> +
> +    pub(crate) fn deserialize_binary(data: &[u8]) -> Result<(Self, u32, u32)> {
> +        if data.len() < 48 {
> +            bail!(
> +                "Entry too small: {} bytes (need at least 48 for header)",
> +                data.len()
> +            );
> +        }
> +
> +        let mut offset = 0;
> +
> +        let prev = u32::from_le_bytes(data[offset..offset + 4].try_into()?);
> +        offset += 4;
> +
> +        let next = u32::from_le_bytes(data[offset..offset + 4].try_into()?);
> +        offset += 4;
> +
> +        let uid = u32::from_le_bytes(data[offset..offset + 4].try_into()?);
> +        offset += 4;
> +
> +        let time = u32::from_le_bytes(data[offset..offset + 4].try_into()?);
> +        offset += 4;
> +
> +        let node_digest = u64::from_le_bytes(data[offset..offset + 8].try_into()?);
> +        offset += 8;
> +
> +        let ident_digest = u64::from_le_bytes(data[offset..offset + 8].try_into()?);
> +        offset += 8;
> +
> +        let pid = u32::from_le_bytes(data[offset..offset + 4].try_into()?);
> +        offset += 4;
> +
> +        let priority = data[offset];
> +        offset += 1;
> +
> +        let node_len = data[offset] as usize;
> +        offset += 1;
> +
> +        let ident_len = data[offset] as usize;
> +        offset += 1;
> +
> +        let tag_len = data[offset] as usize;
> +        offset += 1;
> +
> +        let msg_len = u32::from_le_bytes(data[offset..offset + 4].try_into()?) as usize;
> +        offset += 4;
> +
> +        if offset + node_len + ident_len + tag_len + msg_len > data.len() {
> +            bail!("Entry data exceeds buffer size");
> +        }
> +
> +        let node = read_null_terminated(&data[offset..offset + node_len])?;
> +        offset += node_len;
> +
> +        let ident = read_null_terminated(&data[offset..offset + ident_len])?;
> +        offset += ident_len;
> +
> +        let tag = read_null_terminated(&data[offset..offset + tag_len])?;
> +        offset += tag_len;
> +
> +        let message = read_null_terminated(&data[offset..offset + msg_len])?;
> +
> +        Ok((
> +            Self {
> +                uid,
> +                time,
> +                node_digest,
> +                ident_digest,
> +                pid,
> +                priority,
> +                node,
> +                ident,
> +                tag,
> +                message,
> +            },
> +            prev,
> +            next,
> +        ))
> +    }
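
Related to the header size question above: the `data.len() < 48` check at the top of this function may still reject the right inputs, but for a different reason than the error message gives. Going by `serialize_binary`, the fixed fields take 44 bytes and each of the four strings contributes at least its null terminator. A minimal sketch of that arithmetic follows. `HEADER_LEN` and `serialized_len` are hypothetical names for this illustration, not part of the patch.

```rust
/// Bytes written by `serialize_binary` before the string data:
/// 4 x u32, 2 x u64, u32 pid, 4 x u8, u32 msg_len (assumed from the patch).
const HEADER_LEN: usize = 44;

/// Length of a serialized entry: fixed header plus four
/// null-terminated strings (hypothetical helper).
fn serialized_len(node: &str, ident: &str, tag: &str, message: &str) -> usize {
    HEADER_LEN
        + node.len() + 1
        + ident.len() + 1
        + tag.len() + 1
        + message.len() + 1
}
```

With four empty strings this gives 48, so 48 looks like the minimum entry size rather than the header size. If that reading is right, the message "need at least 48 for header" could be reworded.
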
> +}
> +
> +fn read_null_terminated(data: &[u8]) -> Result<String> {
> +    let len = data.iter().position(|&b| b == 0).unwrap_or(data.len());
> +    Ok(String::from_utf8_lossy(&data[..len]).into_owned())
> +}
> +
> +#[cfg(test)]
> +pub fn reset_uid_counter() {
> +    UID_COUNTER.store(0, Ordering::SeqCst);
> +}
> +
> +#[cfg(test)]
> +mod tests {
> +    use super::*;
> +
> +    #[test]
> +    fn test_pack_entry() {
> +        reset_uid_counter();
> +
> +        let entry = LogEntry::pack(
> +            "node1",
> +            "root",
> +            "cluster",
> +            12345,
> +            1234567890,
> +            6, // Info priority
> +            "Test message",
> +        )
> +        .unwrap();
> +
> +        assert_eq!(entry.uid, 1);
> +        assert_eq!(entry.time, 1234567890);
> +        assert_eq!(entry.node, "node1");
> +        assert_eq!(entry.ident, "root");
> +        assert_eq!(entry.tag, "cluster");
> +        assert_eq!(entry.pid, 12345);
> +        assert_eq!(entry.priority, 6);
> +        assert_eq!(entry.message, "Test message");
> +    }
> +
> +    #[test]
> +    fn test_uid_increment() {
> +        reset_uid_counter();
> +
> +        let entry1 = LogEntry::pack("node1", "root", "tag", 0, 1000, 6, "msg1").unwrap();
> +        let entry2 = LogEntry::pack("node1", "root", "tag", 0, 1001, 6, "msg2").unwrap();
> +
> +        assert_eq!(entry1.uid, 1);
> +        assert_eq!(entry2.uid, 2);
> +    }
> +
> +    #[test]
> +    fn test_invalid_priority() {
> +        let result = LogEntry::pack("node1", "root", "tag", 0, 1000, 8, "message");
> +        assert!(result.is_err());
> +    }
> +
> +    #[test]
> +    fn test_node_digest() {
> +        let entry1 = LogEntry::pack("node1", "root", "tag", 0, 1000, 6, "msg").unwrap();
> +        let entry2 = LogEntry::pack("node1", "root", "tag", 0, 1001, 6, "msg").unwrap();
> +        let entry3 = LogEntry::pack("node2", "root", "tag", 0, 1000, 6, "msg").unwrap();
> +
> +        // Same node should have same digest
> +        assert_eq!(entry1.node_digest, entry2.node_digest);
> +
> +        // Different node should have different digest
> +        assert_ne!(entry1.node_digest, entry3.node_digest);
> +    }
> +
> +    #[test]
> +    fn test_ident_digest() {
> +        let entry1 = LogEntry::pack("node1", "root", "tag", 0, 1000, 6, "msg").unwrap();
> +        let entry2 = LogEntry::pack("node1", "root", "tag", 0, 1001, 6, "msg").unwrap();
> +        let entry3 = LogEntry::pack("node1", "admin", "tag", 0, 1000, 6, "msg").unwrap();
> +
> +        // Same ident should have same digest
> +        assert_eq!(entry1.ident_digest, entry2.ident_digest);
> +
> +        // Different ident should have different digest
> +        assert_ne!(entry1.ident_digest, entry3.ident_digest);
> +    }
> +
> +    #[test]
> +    fn test_utf8_to_ascii() {
> +        let entry = LogEntry::pack("node1", "root", "tag", 0, 1000, 6, "Hello 世界").unwrap();
> +        assert!(entry.message.is_ascii());
> +        // Unicode chars escaped as \uXXXX format (matches C implementation)
> +        assert!(entry.message.contains("\\u4e16")); // 世 = U+4E16
> +        assert!(entry.message.contains("\\u754c")); // 界 = U+754C
> +    }
> +
> +    #[test]
> +    fn test_utf8_control_chars() {
> +        // Test control character escaping
> +        let entry = LogEntry::pack("node1", "root", "tag", 0, 1000, 6, "Hello\x07World").unwrap();
> +        assert!(entry.message.is_ascii());
> +        // BEL (0x07) should be escaped as #007 (matches C implementation)
> +        assert!(entry.message.contains("#007"));
> +    }
> +
> +    #[test]
> +    fn test_utf8_mixed_content() {
> +        // Test mix of ASCII, Unicode, and control chars
> +        let entry = LogEntry::pack(
> +            "node1",
> +            "root",
> +            "tag",
> +            0,
> +            1000,
> +            6,
> +            "Test\x01\nUnicode世\ttab",
> +        )
> +        .unwrap();
> +        assert!(entry.message.is_ascii());
> +        // SOH (0x01) -> #001
> +        assert!(entry.message.contains("#001"));
> +        // Newline (0x0A) -> #010
> +        assert!(entry.message.contains("#010"));
> +        // Unicode 世 (U+4E16) -> \u4e16
> +        assert!(entry.message.contains("\\u4e16"));
> +        // Tab (0x09) -> #009
> +        assert!(entry.message.contains("#009"));
> +    }
> +
> +    #[test]
> +    fn test_string_truncation() {
> +        let long_node = "a".repeat(300);
> +        let entry = LogEntry::pack(&long_node, "root", "tag", 0, 1000, 6, "msg").unwrap();
> +        assert!(entry.node.len() <= 255);
> +    }
> +
> +    #[test]
> +    fn test_truncate_multibyte_utf8() {
> +        // Test that truncate_string doesn't panic on multi-byte UTF-8 boundaries
> +        // "世" is 3 bytes in UTF-8 (0xE4 0xB8 0x96)
> +        let s = "x".repeat(253) + "世";
> +
> +        // This should not panic, even though 254 falls in the middle of "世"
> +        let entry = LogEntry::pack(&s, "root", "tag", 0, 1000, 6, "msg").unwrap();
> +
> +        // Should truncate to 253 bytes (before the multi-byte char)
> +        assert_eq!(entry.node.len(), 253);
> +        assert_eq!(entry.node, "x".repeat(253));
> +    }
> +
> +    #[test]
> +    fn test_message_truncation() {
> +        let long_message = "a".repeat(CLOG_MAX_ENTRY_SIZE);
> +        let entry = LogEntry::pack("node1", "root", "tag", 0, 1000, 6, &long_message).unwrap();
> +        // Entry should fit within max size
> +        assert!(entry.size() <= CLOG_MAX_ENTRY_SIZE);
> +    }
> +
> +    #[test]
> +    fn test_aligned_size() {
> +        let entry = LogEntry::pack("node1", "root", "tag", 0, 1000, 6, "msg").unwrap();
> +        let aligned = entry.aligned_size();
> +
> +        // Aligned size should be multiple of 8
> +        assert_eq!(aligned % 8, 0);
> +
> +        // Aligned size should be >= actual size
> +        assert!(aligned >= entry.size());
> +
> +        // Aligned size should be within 7 bytes of actual size
> +        assert!(aligned - entry.size() < 8);
> +    }
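
The three properties asserted here are exactly those of rounding up to the next multiple of 8. A minimal sketch, assuming `aligned_size()` does this rounding (the name `align8` is mine, not the patch's):

```rust
/// Hypothetical helper: round `size` up to the next multiple of 8.
fn align8(size: usize) -> usize {
    // Adding 7 and clearing the low three bits rounds up to a multiple of 8.
    (size + 7) & !7
}
```

The result is always a multiple of 8, never smaller than the input, and at most 7 bytes larger.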
> +
> +    #[test]
> +    fn test_json_export() {
> +        let entry = LogEntry::pack("node1", "root", "cluster", 123, 1234567890, 6, "Test").unwrap();
> +        let json = entry.to_json_object();
> +
> +        assert_eq!(json["node"], "node1");
> +        assert_eq!(json["user"], "root");
> +        assert_eq!(json["tag"], "cluster");
> +        assert_eq!(json["pid"], 123);
> +        assert_eq!(json["time"], 1234567890);
> +        assert_eq!(json["pri"], 6);
> +        assert_eq!(json["msg"], "Test");
> +    }
> +
> +    #[test]
> +    fn test_binary_serialization_roundtrip() {
> +        let entry = LogEntry::pack(
> +            "node1",
> +            "root",
> +            "cluster",
> +            12345,
> +            1234567890,
> +            6,
> +            "Test message",
> +        )
> +        .unwrap();
> +
> +        // Serialize with prev/next pointers
> +        let binary = entry.serialize_binary(100, 200);
> +
> +        // Deserialize
> +        let (deserialized, prev, next) = LogEntry::deserialize_binary(&binary).unwrap();
> +
> +        // Check prev/next pointers
> +        assert_eq!(prev, 100);
> +        assert_eq!(next, 200);
> +
> +        // Check entry fields
> +        assert_eq!(deserialized.uid, entry.uid);
> +        assert_eq!(deserialized.time, entry.time);
> +        assert_eq!(deserialized.node_digest, entry.node_digest);
> +        assert_eq!(deserialized.ident_digest, entry.ident_digest);
> +        assert_eq!(deserialized.pid, entry.pid);
> +        assert_eq!(deserialized.priority, entry.priority);
> +        assert_eq!(deserialized.node, entry.node);
> +        assert_eq!(deserialized.ident, entry.ident);
> +        assert_eq!(deserialized.tag, entry.tag);
> +        assert_eq!(deserialized.message, entry.message);
> +    }
> +
> +    #[test]
> +    fn test_binary_format_header_size() {
> +        let entry = LogEntry::pack("n", "u", "t", 1, 1000, 6, "m").unwrap();
> +        let binary = entry.serialize_binary(0, 0);
> +
> +        // Header should be exactly 48 bytes: 44 bytes of fields, presumably padded to 8-byte alignment
> +        // prev(4) + next(4) + uid(4) + time(4) + node_digest(8) + ident_digest(8) +
> +        // pid(4) + priority(1) + node_len(1) + ident_len(1) + tag_len(1) + msg_len(4)
> +        assert!(binary.len() >= 48);
> +
> +        // First 48 bytes are header
> +        assert_eq!(&binary[0..4], &0u32.to_le_bytes()); // prev
> +        assert_eq!(&binary[4..8], &0u32.to_le_bytes()); // next
> +    }
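
A note on the numbers in the comment above: the listed fields add up to 44 bytes, while the header is described as 48. A `#[repr(C)]` mirror of the field list shows where both numbers and the `node_len` offset of 37 come from. This is a sketch under assumptions: `EntryHeader` is a hypothetical stand-in for the C entry struct, and the 48-byte size holds on targets where `u64` is 8-byte aligned (e.g. x86_64).

```rust
use std::mem::size_of;

/// Hypothetical mirror of the header fields listed in the test comment.
#[allow(dead_code)]
#[repr(C)]
#[derive(Default)]
struct EntryHeader {
    prev: u32,
    next: u32,
    uid: u32,
    time: u32,
    node_digest: u64,
    ident_digest: u64,
    pid: u32,
    priority: u8,
    node_len: u8,
    ident_len: u8,
    tag_len: u8,
    msg_len: u32,
}

/// Sum of the individual field sizes, without any padding.
fn field_bytes() -> usize {
    4 * size_of::<u32>() + 2 * size_of::<u64>() + size_of::<u32>() + 4 * size_of::<u8>() + size_of::<u32>()
}

/// Size of the struct including trailing padding.
fn header_size() -> usize {
    size_of::<EntryHeader>()
}

/// Byte offset of `node_len` from the start of the struct.
fn node_len_offset() -> usize {
    let header = EntryHeader::default();
    let base = &header as *const EntryHeader as usize;
    let field = &header.node_len as *const u8 as usize;
    field - base
}
```

If the Rust serializer writes 48 header bytes, the last 4 would be padding. It might be worth stating that explicitly in the comment.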
> +
> +    #[test]
> +    fn test_binary_deserialize_invalid_size() {
> +        let too_small = vec![0u8; 40]; // Less than 48 byte header
> +        let result = LogEntry::deserialize_binary(&too_small);
> +        assert!(result.is_err());
> +    }
> +
> +    #[test]
> +    fn test_binary_null_terminators() {
> +        let entry = LogEntry::pack("node1", "root", "tag", 123, 1000, 6, "message").unwrap();
> +        let binary = entry.serialize_binary(0, 0);
> +
> +        // Check that strings are null-terminated
> +        // Find null bytes in data section (after 48-byte header)
> +        let data_section = &binary[48..];
> +        let null_count = data_section.iter().filter(|&&b| b == 0).count();
> +        assert_eq!(null_count, 4); // 4 null terminators (node, ident, tag, msg)
> +    }
> +
> +    #[test]
> +    fn test_length_field_overflow_prevention() {
> +        // Test that 255-byte strings are handled correctly (prevent u8 overflow)
> +        // C does: MIN(strlen(s) + 1, 255) to cap at 255
> +        let long_string = "a".repeat(255);
> +
> +        let entry = LogEntry::pack(&long_string, &long_string, &long_string, 123, 1000, 6, "msg").unwrap();
> +
> +        // Strings should be truncated to 254 bytes (leaving room for null)
> +        assert_eq!(entry.node.len(), 254);
> +        assert_eq!(entry.ident.len(), 254);
> +        assert_eq!(entry.tag.len(), 254);
> +
> +        // Serialize and check length fields are capped at 255 (254 bytes + null)
> +        let binary = entry.serialize_binary(0, 0);
> +
> +        // Extract length fields from header
> +        // Layout: prev(4) + next(4) + uid(4) + time(4) + node_digest(8) + ident_digest(8) +
> +        //         pid(4) + priority(1) + node_len(1) + ident_len(1) + tag_len(1) + msg_len(4)
> +        // Offsets: node_len=37, ident_len=38, tag_len=39
> +        let node_len = binary[37];
> +        let ident_len = binary[38];
> +        let tag_len = binary[39];
> +
> +        assert_eq!(node_len, 255); // 254 bytes + 1 null = 255
> +        assert_eq!(ident_len, 255);
> +        assert_eq!(tag_len, 255);
> +    }
> +
> +    #[test]
> +    fn test_length_field_no_wraparound() {
> +        // Even if somehow a 255+ byte string gets through, serialize should cap at 255
> +        // This tests the defensive .min(255) in serialize_binary
> +        let mut entry = LogEntry::pack("node", "ident", "tag", 123, 1000, 6, "msg").unwrap();
> +
> +        // Artificially create an edge case (though pack() already prevents this)
> +        entry.node = "x".repeat(254);  // Max valid size
> +
> +        let binary = entry.serialize_binary(0, 0);
> +        let node_len = binary[37];  // Offset 37 for node_len
> +
> +        // Should be 255 (254 + 1 for null), not wrap to 0
> +        assert_eq!(node_len, 255);
> +        assert_ne!(node_len, 0); // Ensure no wraparound
> +    }
> +}
> diff --git a/src/pmxcfs-rs/pmxcfs-logger/src/hash.rs b/src/pmxcfs-rs/pmxcfs-logger/src/hash.rs
> new file mode 100644
> index 000000000..09dad6afd
> --- /dev/null
> +++ b/src/pmxcfs-rs/pmxcfs-logger/src/hash.rs
> @@ -0,0 +1,176 @@
> +//! FNV-1a (Fowler-Noll-Vo) 64-bit hash function
> +//!
> +//! This matches the C implementation's `fnv_64a_buf` function and is
> +//! used for generating node and ident digests for deduplication.
> +/// FNV-1a 64-bit non-zero initial basis
> +pub(crate) const FNV1A_64_INIT: u64 = 0xcbf29ce484222325;
> +
> +/// Compute 64-bit FNV-1a hash
> +///
> +/// This is a faithful port of the C implementation's `fnv_64a_buf` function:
> +/// ```c
> +/// static inline uint64_t fnv_64a_buf(const void *buf, size_t len, uint64_t hval) {
> +///     unsigned char *bp = (unsigned char *)buf;
> +///     unsigned char *be = bp + len;
> +///     while (bp < be) {
> +///         hval ^= (uint64_t)*bp++;
> +///         hval += (hval << 1) + (hval << 4) + (hval << 5) + (hval << 7) + (hval << 8) + (hval << 40);
> +///     }
> +///     return hval;
> +/// }
> +/// ```
> +///
> +/// # Arguments
> +/// * `data` - The data to hash
> +/// * `init` - Initial hash value (use FNV1A_64_INIT for first hash)
> +///
> +/// # Returns
> +/// 64-bit hash value
> +///
> +/// Note: `fnv_64a_str` below does not call this function. It is the primary API
> +/// for string hashing and repeats the same FNV-1a loop inline, which is why this
> +/// function needs the `dead_code` allowance.
> +#[inline]
> +#[allow(dead_code)] // Not called by fnv_64a_str, which inlines the same loop
> +pub(crate) fn fnv_64a(data: &[u8], init: u64) -> u64 {
> +    let mut hval = init;
> +
> +    for &byte in data {
> +        hval ^= byte as u64;
> +        // FNV magic prime multiplication done via shifts and adds
> +        // This is equivalent to: hval *= 0x100000001b3 (FNV 64-bit prime)
> +        hval = hval.wrapping_add(
> +            (hval << 1)
> +                .wrapping_add(hval << 4)
> +                .wrapping_add(hval << 5)
> +                .wrapping_add(hval << 7)
> +                .wrapping_add(hval << 8)
> +                .wrapping_add(hval << 40),
> +        );
> +    }
> +
> +    hval
> +}
> +
> +/// Hash a null-terminated string (includes the null byte)
> +///
> +/// The C implementation includes the null terminator in the hash:
> +/// `fnv_64a_buf(node, node_len, FNV1A_64_INIT)` where node_len includes the '\0'
> +///
> +/// This function adds a null byte to match that behavior.
> +#[inline]
> +pub(crate) fn fnv_64a_str(s: &str) -> u64 {
> +    let bytes = s.as_bytes();
> +    let mut hval = FNV1A_64_INIT;
> +
> +    for &byte in bytes {
> +        hval ^= byte as u64;
> +        hval = hval.wrapping_add(
> +            (hval << 1)
> +                .wrapping_add(hval << 4)
> +                .wrapping_add(hval << 5)
> +                .wrapping_add(hval << 7)
> +                .wrapping_add(hval << 8)
> +                .wrapping_add(hval << 40),
> +        );
> +    }
> +
> +    // Hash the null terminator to match C behavior
> +    // C implementation: `hval ^= (uint64_t)*bp++` where *bp is '\0'
> +    // Since XOR with 0 is a no-op (hval ^ 0 == hval), we skip it and apply only
> +    // the multiplication step. The result is identical to hashing the string
> +    // followed by a single 0x00 byte, as the C implementation does.
> +    hval.wrapping_add(
> +        (hval << 1)
> +            .wrapping_add(hval << 4)
> +            .wrapping_add(hval << 5)
> +            .wrapping_add(hval << 7)
> +            .wrapping_add(hval << 8)
> +            .wrapping_add(hval << 40),
> +    )
> +}
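
The duplicated loop could be avoided by delegating to the byte-slice function. A standalone sketch of that idea follows; it reuses the patch's function names for readability but is not the patch's code, and `FNV_64_PRIME` and `fnv_64a_shift_add` are names I introduce here. It also checks that `wrapping_mul` by the 64-bit FNV prime matches the shift-and-add form.

```rust
const FNV1A_64_INIT: u64 = 0xcbf29ce484222325;
/// 64-bit FNV prime: 2^40 + 2^8 + 0xb3 (name introduced for this sketch).
const FNV_64_PRIME: u64 = 0x100000001b3;

/// FNV-1a over a byte slice, written with an explicit multiplication.
fn fnv_64a(data: &[u8], init: u64) -> u64 {
    data.iter()
        .fold(init, |hval, &byte| (hval ^ u64::from(byte)).wrapping_mul(FNV_64_PRIME))
}

/// Hash a string plus its trailing NUL by chaining two calls.
fn fnv_64a_str(s: &str) -> u64 {
    let hval = fnv_64a(s.as_bytes(), FNV1A_64_INIT);
    fnv_64a(&[0], hval)
}

/// Reference: the shift-and-add form used in the quoted patch.
fn fnv_64a_shift_add(data: &[u8], init: u64) -> u64 {
    let mut hval = init;
    for &byte in data {
        hval ^= u64::from(byte);
        hval = hval.wrapping_add(
            (hval << 1)
                .wrapping_add(hval << 4)
                .wrapping_add(hval << 5)
                .wrapping_add(hval << 7)
                .wrapping_add(hval << 8)
                .wrapping_add(hval << 40),
        );
    }
    hval
}
```

With this shape `fnv_64a` is used on the main path, so the `#[allow(dead_code)]` would no longer be needed.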
> +
> +#[cfg(test)]
> +mod tests {
> +    use super::*;
> +
> +    #[test]
> +    fn test_fnv1a_init() {
> +        // Test that init constant matches C implementation
> +        assert_eq!(FNV1A_64_INIT, 0xcbf29ce484222325);
> +    }
> +
> +    #[test]
> +    fn test_fnv1a_empty() {
> +        // Empty string with null terminator
> +        let hash = fnv_64a(&[0], FNV1A_64_INIT);
> +        assert_ne!(hash, FNV1A_64_INIT); // Should be different from init
> +    }
> +
> +    #[test]
> +    fn test_fnv1a_consistency() {
> +        // Same input should produce same output
> +        let data = b"test";
> +        let hash1 = fnv_64a(data, FNV1A_64_INIT);
> +        let hash2 = fnv_64a(data, FNV1A_64_INIT);
> +        assert_eq!(hash1, hash2);
> +    }
> +
> +    #[test]
> +    fn test_fnv1a_different_data() {
> +        // Different input should (usually) produce different output
> +        let hash1 = fnv_64a(b"test1", FNV1A_64_INIT);
> +        let hash2 = fnv_64a(b"test2", FNV1A_64_INIT);
> +        assert_ne!(hash1, hash2);
> +    }
> +
> +    #[test]
> +    fn test_fnv1a_str() {
> +        // Test string hashing with null terminator
> +        let hash1 = fnv_64a_str("node1");
> +        let hash2 = fnv_64a_str("node1");
> +        let hash3 = fnv_64a_str("node2");
> +
> +        assert_eq!(hash1, hash2); // Same string should hash the same
> +        assert_ne!(hash1, hash3); // Different strings should hash differently
> +    }
> +
> +    #[test]
> +    fn test_fnv1a_node_names() {
> +        // Test with typical Proxmox node names
> +        let nodes = vec!["pve1", "pve2", "pve3"];
> +        let mut hashes = Vec::new();
> +
> +        for node in &nodes {
> +            let hash = fnv_64a_str(node);
> +            hashes.push(hash);
> +        }
> +
> +        // All hashes should be unique
> +        for i in 0..hashes.len() {
> +            for j in (i + 1)..hashes.len() {
> +                assert_ne!(
> +                    hashes[i], hashes[j],
> +                    "Hashes for {} and {} should differ",
> +                    nodes[i], nodes[j]
> +                );
> +            }
> +        }
> +    }
> +
> +    #[test]
> +    fn test_fnv1a_chaining() {
> +        // Test that we can chain hashes
> +        let data1 = b"first";
> +        let data2 = b"second";
> +
> +        let hash1 = fnv_64a(data1, FNV1A_64_INIT);
> +        let hash2 = fnv_64a(data2, hash1); // Use previous hash as init
> +
> +        // Should produce a deterministic result
> +        let hash1_again = fnv_64a(data1, FNV1A_64_INIT);
> +        let hash2_again = fnv_64a(data2, hash1_again);
> +
> +        assert_eq!(hash2, hash2_again);
> +    }
> +}
> diff --git a/src/pmxcfs-rs/pmxcfs-logger/src/lib.rs b/src/pmxcfs-rs/pmxcfs-logger/src/lib.rs
> new file mode 100644
> index 000000000..964f0b3a6
> --- /dev/null
> +++ b/src/pmxcfs-rs/pmxcfs-logger/src/lib.rs
> @@ -0,0 +1,27 @@
> +//! Cluster Log Implementation
> +//!
> +//! This module provides a cluster-wide log system compatible with the C implementation.
> +//! It maintains a ring buffer of log entries that can be merged from multiple nodes,
> +//! deduplicated, and exported to JSON.
> +//!
> +//! Key features:
> +//! - Ring buffer storage for efficient memory usage
> +//! - FNV-1a hashing for node and ident tracking
> +//! - Deduplication across nodes
> +//! - Time-based sorting
> +//! - Multi-node log merging
> +//! - JSON export for web UI
> +// Internal modules (not exposed)
> +mod cluster_log;
> +mod entry;
> +mod hash;
> +mod ring_buffer;
> +
> +// Public API - only expose what's needed externally
> +pub use cluster_log::ClusterLog;
> +
> +// Re-export types only for testing or internal crate use
> +#[doc(hidden)]
> +pub use entry::LogEntry;
> +#[doc(hidden)]
> +pub use ring_buffer::RingBuffer;
> diff --git a/src/pmxcfs-rs/pmxcfs-logger/src/ring_buffer.rs b/src/pmxcfs-rs/pmxcfs-logger/src/ring_buffer.rs
> new file mode 100644
> index 000000000..2c82308c9
> --- /dev/null
> +++ b/src/pmxcfs-rs/pmxcfs-logger/src/ring_buffer.rs
> @@ -0,0 +1,628 @@
> +//! Ring Buffer Implementation for Cluster Log
> +//!
> +//! This module implements a circular buffer for storing log entries,
> +//! matching the C implementation's clog_base_t structure.
> +use super::entry::LogEntry;
> +use super::hash::fnv_64a_str;
> +use anyhow::{bail, Result};
> +use std::collections::VecDeque;
> +
> +/// Matches C's CLOG_DEFAULT_SIZE constant
> +pub(crate) const CLOG_DEFAULT_SIZE: usize = 8192 * 16; // 131,072 bytes (128 KB)
> +
> +/// Matches C's CLOG_MAX_ENTRY_SIZE constant
> +pub(crate) const CLOG_MAX_ENTRY_SIZE: usize = 4096; // 4,096 bytes (4 KB)
> +
> +/// Ring buffer for log entries
> +///
> +/// This is a simplified Rust version of the C implementation's ring buffer.
> +/// The C version uses a raw byte buffer with manual pointer arithmetic,
> +/// but we use a VecDeque for safety and simplicity while maintaining
> +/// the same conceptual behavior.
> +///
> +/// C structure (clog_base_t):
> +/// ```c
> +/// struct clog_base {
> +///     uint32_t size;    // Total buffer size
> +///     uint32_t cpos;    // Current position
> +///     char data[];      // Variable length data
> +/// };
> +/// ```
> +#[derive(Debug, Clone)]
> +pub struct RingBuffer {
> +    /// Maximum capacity in bytes
> +    capacity: usize,
> +
> +    /// Current size in bytes (approximate)
> +    current_size: usize,
> +
> +    /// Entries stored in the buffer (newest first)
> +    /// We use VecDeque for efficient push/pop at both ends
> +    entries: VecDeque<LogEntry>,
> +}
> +
> +impl RingBuffer {
> +    /// Create a new ring buffer with specified capacity
> +    pub fn new(capacity: usize) -> Self {
> +        // Fall back to the default size if the requested capacity is below the minimum
> +        let capacity = if capacity < CLOG_MAX_ENTRY_SIZE * 10 {
> +            CLOG_DEFAULT_SIZE
> +        } else {
> +            capacity
> +        };
> +
> +        Self {
> +            capacity,
> +            current_size: 0,
> +            entries: VecDeque::new(),
> +        }
> +    }
> +
> +    /// Add an entry to the buffer
> +    ///
> +    /// Matches C's `clog_copy` function which calls `clog_alloc_entry`
> +    /// to allocate space in the ring buffer.
> +    pub fn add_entry(&mut self, entry: &LogEntry) -> Result<()> {
> +        let entry_size = entry.aligned_size();
> +
> +        // Make room if needed (remove oldest entries)
> +        while self.current_size + entry_size > self.capacity && !self.entries.is_empty() {
> +            if let Some(old_entry) = self.entries.pop_back() {
> +                self.current_size = self.current_size.saturating_sub(old_entry.aligned_size());
> +            }
> +        }
> +
> +        // Add new entry at the front (newest first)
> +        self.entries.push_front(entry.clone());
> +        self.current_size += entry_size;
> +
> +        Ok(())
> +    }
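
The eviction logic in `add_entry` boils down to a byte-budgeted deque. A minimal standalone sketch with plain `(id, size)` pairs instead of `LogEntry` (the type `ByteBudgetQueue` and its methods are hypothetical, for illustration only):

```rust
use std::collections::VecDeque;

/// Hypothetical stand-in for the ring buffer: newest item at the front,
/// oldest items evicted from the back once the byte budget is exceeded.
struct ByteBudgetQueue {
    capacity: usize,
    used: usize,
    items: VecDeque<(u32, usize)>, // (id, size in bytes)
}

impl ByteBudgetQueue {
    fn new(capacity: usize) -> Self {
        Self { capacity, used: 0, items: VecDeque::new() }
    }

    fn push(&mut self, id: u32, size: usize) {
        // Evict oldest items until the new one fits or nothing is left.
        while self.used + size > self.capacity {
            match self.items.pop_back() {
                Some((_, old_size)) => self.used -= old_size,
                None => break,
            }
        }
        self.items.push_front((id, size));
        self.used += size;
    }

    fn ids(&self) -> Vec<u32> {
        self.items.iter().map(|&(id, _)| id).collect()
    }
}
```

As in the patch, an item larger than the whole capacity is still accepted once the queue is empty. Whether that should be an error instead is a design question for `add_entry`.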
> +
> +    /// Check if buffer is near full (>90% capacity)
> +    pub fn is_near_full(&self) -> bool {
> +        self.current_size > (self.capacity * 9 / 10)
> +    }
> +
> +    /// Check if buffer is empty
> +    pub fn is_empty(&self) -> bool {
> +        self.entries.is_empty()
> +    }
> +
> +    /// Get number of entries
> +    pub fn len(&self) -> usize {
> +        self.entries.len()
> +    }
> +
> +    /// Get buffer capacity
> +    pub fn capacity(&self) -> usize {
> +        self.capacity
> +    }
> +
> +    /// Iterate over entries (newest first)
> +    pub fn iter(&self) -> impl Iterator<Item = &LogEntry> {
> +        self.entries.iter()
> +    }
> +
> +    /// Sort entries by time, node_digest, and uid
> +    ///
> +    /// Matches C's `clog_sort` function
> +    ///
> +    /// C uses GTree with custom comparison function `clog_entry_sort_fn`:
> +    /// ```c
> +    /// if (entry1->time != entry2->time) {
> +    ///     return entry1->time - entry2->time;
> +    /// }
> +    /// if (entry1->node_digest != entry2->node_digest) {
> +    ///     return entry1->node_digest - entry2->node_digest;
> +    /// }
> +    /// return entry1->uid - entry2->uid;
> +    /// ```
> +    pub fn sort(&self) -> Result<Self> {
> +        let mut new_buffer = Self::new(self.capacity);
> +
> +        // Collect and sort entries
> +        let mut sorted: Vec<LogEntry> = self.entries.iter().cloned().collect();
> +
> +        // Sort by time (ascending), then node_digest, then uid
> +        sorted.sort_by_key(|e| (e.time, e.node_digest, e.uid));
> +
> +        // Add sorted entries to new buffer
> +        // Since add_entry pushes to front, we add in forward order to get newest-first
> +        // sorted = [oldest...newest], add_entry pushes to front, so:
> +        // - Add oldest: [oldest]
> +        // - Add next: [next, oldest]
> +        // - Add newest: [newest, next, oldest]
> +        for entry in sorted.iter() {
> +            new_buffer.add_entry(entry)?;
> +        }
> +
> +        Ok(new_buffer)
> +    }
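
The ordering produced by `sort()` can be illustrated in isolation. In this sketch `Key` and `sort_newest_first` are hypothetical names, and reversing the ascending vector stands in for the repeated `push_front` done by `add_entry`. Tuple keys compare lexicographically, so no subtraction is involved, unlike the quoted C comparator.

```rust
/// Hypothetical reduced entry carrying only the sort-relevant fields.
#[derive(Debug, Clone, PartialEq)]
struct Key {
    time: u32,
    node_digest: u64,
    uid: u32,
}

fn key(time: u32, node_digest: u64, uid: u32) -> Key {
    Key { time, node_digest, uid }
}

/// Sort ascending by (time, node_digest, uid), then store newest first.
fn sort_newest_first(mut entries: Vec<Key>) -> Vec<Key> {
    entries.sort_by_key(|e| (e.time, e.node_digest, e.uid));
    // Pushing each sorted entry to the front of a deque is equivalent
    // to reversing the ascending order.
    entries.reverse();
    entries
}
```

Entries with equal `time` therefore end up in descending `node_digest` order in the buffer, which is what `test_sort_by_node_digest` asserts further down.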
> +
> +    /// Dump buffer to JSON format
> +    ///
> +    /// Matches C's `clog_dump_json` function
> +    ///
> +    /// # Arguments
> +    /// * `ident_filter` - Optional ident filter (user filter)
> +    /// * `max_entries` - Maximum number of entries to include
> +    pub fn dump_json(&self, ident_filter: Option<&str>, max_entries: usize) -> String {
> +        // Compute ident digest if filter is provided
> +        let ident_digest = ident_filter.map(fnv_64a_str);
> +
> +        let mut data = Vec::new();
> +        let mut count = 0;
> +
> +        // Iterate over entries (newest first, matching C's walk from cpos->prev)
> +        for entry in self.iter() {
> +            if count >= max_entries {
> +                break;
> +            }
> +
> +            // Apply ident filter if specified
> +            if let Some(digest) = ident_digest {
> +                if digest != entry.ident_digest {
> +                    continue;
> +                }
> +            }
> +
> +            data.push(entry.to_json_object());
> +            count += 1;
> +        }
> +
> +        let result = serde_json::json!({
> +            "data": data
> +        });
> +
> +        serde_json::to_string_pretty(&result).unwrap_or_else(|_| "{}".to_string())
> +    }
> +
> +    /// Dump buffer contents (for debugging)
> +    ///
> +    /// Matches C's `clog_dump` function
> +    #[allow(dead_code)]
> +    pub fn dump(&self) {
> +        for (idx, entry) in self.entries.iter().enumerate() {
> +            println!(
> +                "[{}] uid={:08x} time={} node={}{{{:016X}}} tag={}[{}{{{:016X}}}]: {}",
> +                idx,
> +                entry.uid,
> +                entry.time,
> +                entry.node,
> +                entry.node_digest,
> +                entry.tag,
> +                entry.ident,
> +                entry.ident_digest,
> +                entry.message
> +            );
> +        }
> +    }
> +
> +    /// Serialize to C binary format (clog_base_t)
> +    ///
> +    /// Returns a full memory dump of the ring buffer matching C's format.
> +    /// C's clusterlog_get_state() returns g_memdup2(cl->base, clog->size),
> +    /// which is the entire allocated buffer capacity, not just used space.
> +    ///
> +    /// Binary layout matches C structure:
> +    /// ```c
> +    /// struct clog_base {
> +    ///     uint32_t size;    // Total allocated buffer capacity
> +    ///     uint32_t cpos;    // Offset to newest entry (not always 8!)
> +    ///     char data[];      // Ring buffer data (entries at various offsets)
> +    /// };
> +    /// ```
> +    ///
> +    /// Entry offsets and linkage:
> +    /// - entry.prev: offset to previous (older) entry
> +    /// - entry.next: end offset of THIS entry (offset + aligned_size), NOT pointer to next entry!
> +    pub fn serialize_binary(&self) -> Vec<u8> {
> +        // Allocate full buffer capacity (matching C's g_malloc0(size))
> +        let mut buf = vec![0u8; self.capacity];
> +
> +        // Empty buffer case
> +        if self.entries.is_empty() {
> +            buf[0..4].copy_from_slice(&(self.capacity as u32).to_le_bytes()); // size
> +            buf[4..8].copy_from_slice(&0u32.to_le_bytes()); // cpos = 0 (empty)
> +            return buf;
> +        }
> +
> +        // Calculate all offsets first
> +        let mut offsets = Vec::with_capacity(self.entries.len());
> +        let mut current_offset = 8usize;
> +
> +        for entry in self.iter() {
> +            let aligned_size = entry.aligned_size();
> +
> +            // Check if we have space
> +            if current_offset + aligned_size > self.capacity {
> +                break;
> +            }
> +
> +            offsets.push(current_offset as u32);
> +            current_offset += aligned_size;
> +        }
> +
> +        // Track where newest entry is (first entry at offset 8)
> +        let newest_offset = 8u32;
> +
> +        // Write entries (newest-first: [newest, second-newest, ..., oldest]) with their
> +        // prev/next pointers; take() skips entries that did not get an offset above
> +        for (i, entry) in self.iter().enumerate().take(offsets.len()) {
> +            let offset = offsets[i] as usize;
> +            let aligned_size = entry.aligned_size();
> +
> +            // entry.prev points to the next-older entry (or 0 if this is oldest)
> +            let prev = if i + 1 < offsets.len() {
> +                offsets[i + 1]
> +            } else {
> +                0 // Oldest entry has prev = 0
> +            };
> +
> +            // entry.next is the end offset of THIS entry
> +            let next = offset as u32 + aligned_size as u32;
> +
> +            let entry_bytes = entry.serialize_binary(prev, next);
> +
> +            // Write entry data
> +            buf[offset..offset + entry_bytes.len()].copy_from_slice(&entry_bytes);
> +
> +            // Padding is already zeroed in vec![0u8; capacity]
> +        }
> +
> +        // Write header
> +        buf[0..4].copy_from_slice(&(self.capacity as u32).to_le_bytes()); // size = full capacity
> +        buf[4..8].copy_from_slice(&newest_offset.to_le_bytes()); // cpos = offset to newest entry
> +
> +        buf
> +    }
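
To make the offset and linkage rules easier to check, here is the layout computation on its own, operating on aligned sizes only. `Slot` and `plan_layout` are hypothetical names for this sketch. It derives the slots from the offsets that actually fit, so it cannot index past the offset list when the capacity check stops early.

```rust
/// Hypothetical record of where one entry lands in the serialized buffer.
#[derive(Debug, PartialEq)]
struct Slot {
    offset: u32,
    prev: u32, // offset of the next-older entry, 0 for the oldest
    next: u32, // end offset of this entry
}

/// Lay out entries (newest first) after the 8-byte buffer header.
fn plan_layout(aligned_sizes: &[usize], capacity: usize) -> Vec<Slot> {
    let mut offsets: Vec<usize> = Vec::new();
    let mut cursor = 8usize;
    for &size in aligned_sizes {
        if cursor + size > capacity {
            break; // remaining (older) entries do not fit
        }
        offsets.push(cursor);
        cursor += size;
    }

    offsets
        .iter()
        .enumerate()
        .map(|(i, &offset)| Slot {
            offset: offset as u32,
            prev: offsets.get(i + 1).map_or(0, |&older| older as u32),
            next: (offset + aligned_sizes[i]) as u32,
        })
        .collect()
}
```

Since `current_size` may reach `capacity` while serialization starts at offset 8, the early-stop case looks reachable in practice, not just in theory.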
> +
> +    /// Deserialize from C binary format
> +    ///
> +    /// Parses clog_base_t structure and extracts all entries.
> +    /// Includes wrap-around guards matching C's logic in `clog_dump`, `clog_dump_json`,
> +    /// and `clog_sort` functions.
> +    pub fn deserialize_binary(data: &[u8]) -> Result<Self> {
> +        if data.len() < 8 {
> +            bail!(
> +                "Buffer too small: {} bytes (need at least 8 for header)",
> +                data.len()
> +            );
> +        }
> +
> +        // Read header
> +        let size = u32::from_le_bytes(data[0..4].try_into()?) as usize;
> +        let initial_cpos = u32::from_le_bytes(data[4..8].try_into()?) as usize;
> +
> +        if size != data.len() {
> +            bail!(
> +                "Size mismatch: header says {}, got {} bytes",
> +                size,
> +                data.len()
> +            );
> +        }
> +
> +        // Empty buffer (cpos == 0)
> +        if initial_cpos == 0 {
> +            return Ok(Self::new(size));
> +        }
> +
> +        // Validate cpos range
> +        if initial_cpos < 8 || initial_cpos >= size {
> +            bail!("Invalid cpos: {initial_cpos} (size: {size})");
> +        }
> +
> +        // Parse entries starting from cpos, walking backwards via prev pointers
> +        // Apply C's wrap-around guards from `clog_dump` and `clog_dump_json`
> +        let mut entries = VecDeque::new();
> +        let mut current_pos = initial_cpos;
> +        let mut visited = std::collections::HashSet::new();
> +
> +        loop {
> +            // Guard against infinite loops
> +            if !visited.insert(current_pos) {
> +                break; // Already visited this position
> +            }
> +
> +            // C guard: cpos must be non-zero
> +            if current_pos == 0 {
> +                break;
> +            }
> +
> +            // Validate bounds
> +            if current_pos >= size {
> +                break;
> +            }
> +
> +            // Parse entry at current_pos
> +            let entry_data = &data[current_pos..];
> +            let (entry, prev, _next) = LogEntry::deserialize_binary(entry_data)?;
> +
> +            // Add to back (we're walking backwards in time, newest to oldest)
> +            // VecDeque should end up as [newest, ..., oldest]
> +            entries.push_back(entry);
> +
> +            // C wrap-around guard: if (cpos < cur->prev && cur->prev <= clog->cpos) break;
> +            // Detects when prev wraps around past initial position
> +            if current_pos < prev as usize && prev as usize <= initial_cpos {
> +                break;
> +            }
> +
> +            current_pos = prev as usize;
> +        }
> +
> +        // Create ring buffer with entries
> +        let mut buffer = Self::new(size);
> +        buffer.entries = entries;
> +
> +        // Recalculate current_size
> +        buffer.current_size = buffer
> +            .entries
> +            .iter()
> +            .map(|e| e.aligned_size())
> +            .sum();
> +
> +        Ok(buffer)
> +    }
> +}
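
The termination conditions of the backwards walk in `deserialize_binary` can be exercised without real entry data. In this sketch `walk_prev_chain` and `chain_from` are hypothetical helpers, and a map from offset to `prev` replaces the parsed entries. The C wrap-around guard is intentionally left out, so only the zero, bounds and cycle checks are shown.

```rust
use std::collections::{HashMap, HashSet};

/// Build an offset -> prev map from pairs (helper for this sketch only).
fn chain_from(pairs: &[(usize, usize)]) -> HashMap<usize, usize> {
    pairs.iter().copied().collect()
}

/// Follow prev pointers from `cpos`, newest to oldest, and return the
/// visited offsets in walk order.
fn walk_prev_chain(prev_of: &HashMap<usize, usize>, cpos: usize, size: usize) -> Vec<usize> {
    let mut order = Vec::new();
    let mut visited = HashSet::new();
    let mut pos = cpos;
    // Stop on 0 (end of chain), out-of-range offsets, or a repeated offset.
    while pos != 0 && pos < size && visited.insert(pos) {
        order.push(pos);
        match prev_of.get(&pos) {
            Some(&prev) => pos = prev,
            None => break,
        }
    }
    order
}
```

A corrupted chain that loops back on itself terminates after each offset has been seen once, which is the purpose of the `visited` set in the patch.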
> +
> +impl Default for RingBuffer {
> +    fn default() -> Self {
> +        Self::new(CLOG_DEFAULT_SIZE)
> +    }
> +}
> +
> +#[cfg(test)]
> +mod tests {
> +    use super::*;
> +
> +    #[test]
> +    fn test_ring_buffer_creation() {
> +        let buffer = RingBuffer::new(CLOG_DEFAULT_SIZE);
> +        assert_eq!(buffer.capacity, CLOG_DEFAULT_SIZE);
> +        assert_eq!(buffer.len(), 0);
> +        assert!(buffer.is_empty());
> +    }
> +
> +    #[test]
> +    fn test_add_entry() {
> +        let mut buffer = RingBuffer::new(CLOG_DEFAULT_SIZE);
> +        let entry = LogEntry::pack("node1", "root", "tag", 0, 1000, 6, "message").unwrap();
> +
> +        let result = buffer.add_entry(&entry);
> +        assert!(result.is_ok());
> +        assert_eq!(buffer.len(), 1);
> +        assert!(!buffer.is_empty());
> +    }
> +
> +    #[test]
> +    fn test_ring_buffer_wraparound() {
> +        // Create a buffer with minimum required size (CLOG_MAX_ENTRY_SIZE * 10)
> +        // but fill it beyond 90% to trigger wraparound
> +        let mut buffer = RingBuffer::new(CLOG_MAX_ENTRY_SIZE * 10);
> +
> +        // Add many small entries to fill the buffer
> +        // Each entry is small, so we need many to fill the buffer
> +        let initial_count = 50_usize;
> +        for i in 0..initial_count {
> +            let entry =
> +                LogEntry::pack("node1", "root", "tag", 0, 1000 + i as u32, 6, "msg").unwrap();
> +            let _ = buffer.add_entry(&entry);
> +        }
> +
> +        // All entries should fit initially
> +        let count_before = buffer.len();
> +        assert_eq!(count_before, initial_count);
> +
> +        // Now add entries with large messages to trigger wraparound
> +        // Make messages large enough to fill the buffer beyond capacity
> +        let large_msg = "x".repeat(7000); // Exceeds CLOG_MAX_ENTRY_SIZE (4096); relies on pack() truncating it
> +        let large_entries_count = 20_usize;
> +        for i in 0..large_entries_count {
> +            let entry =
> +                LogEntry::pack("node1", "root", "tag", 0, 2000 + i as u32, 6, &large_msg).unwrap();
> +            let _ = buffer.add_entry(&entry);
> +        }
> +
> +        // Should have removed some old entries due to capacity limits
> +        assert!(
> +            buffer.len() < count_before + large_entries_count,
> +            "Expected wraparound to remove old entries (have {} entries, expected < {})",
> +            buffer.len(),
> +            count_before + large_entries_count
> +        );
> +
> +        // Newest entry should be present
> +        let newest = buffer.iter().next().unwrap();
> +        assert_eq!(newest.time, 2000 + large_entries_count as u32 - 1); // Last added entry
> +    }
> +
> +    #[test]
> +    fn test_sort_by_time() {
> +        let mut buffer = RingBuffer::new(CLOG_DEFAULT_SIZE);
> +
> +        // Add entries in random time order
> +        let _ = buffer.add_entry(&LogEntry::pack("node1", "root", "tag", 0, 1002, 6, "c").unwrap());
> +        let _ = buffer.add_entry(&LogEntry::pack("node1", "root", "tag", 0, 1000, 6, "a").unwrap());
> +        let _ = buffer.add_entry(&LogEntry::pack("node1", "root", "tag", 0, 1001, 6, "b").unwrap());
> +
> +        let sorted = buffer.sort().unwrap();
> +
> +        // Check that entries are sorted by time (newest first, as stored in the buffer)
> +        let times: Vec<u32> = sorted.iter().map(|e| e.time).collect();
> +        let mut times_sorted = times.clone();
> +        times_sorted.sort();
> +        times_sorted.reverse(); // Newest first in buffer
> +        assert_eq!(times, times_sorted);
> +    }
> +
> +    #[test]
> +    fn test_sort_by_node_digest() {
> +        let mut buffer = RingBuffer::new(CLOG_DEFAULT_SIZE);
> +
> +        // Add entries with same time but different nodes
> +        let _ = buffer.add_entry(&LogEntry::pack("node3", "root", "tag", 0, 1000, 6, "c").unwrap());
> +        let _ = buffer.add_entry(&LogEntry::pack("node1", "root", "tag", 0, 1000, 6, "a").unwrap());
> +        let _ = buffer.add_entry(&LogEntry::pack("node2", "root", "tag", 0, 1000, 6, "b").unwrap());
> +
> +        let sorted = buffer.sort().unwrap();
> +
> +        // Entries with the same time should be ordered by node_digest;
> +        // the buffer is newest-first, so digests appear in descending order
> +        for entries in sorted.iter().collect::<Vec<_>>().windows(2) {
> +            if entries[0].time == entries[1].time {
> +                assert!(entries[0].node_digest >= entries[1].node_digest);
> +            }
> +        }
> +    }
> +
> +    #[test]
> +    fn test_json_dump() {
> +        let mut buffer = RingBuffer::new(CLOG_DEFAULT_SIZE);
> +        let _ = buffer
> +            .add_entry(&LogEntry::pack("node1", "root", "cluster", 123, 1000, 6, "msg").unwrap());
> +
> +        let json = buffer.dump_json(None, 50);
> +
> +        // Should be valid JSON
> +        let parsed: serde_json::Value = serde_json::from_str(&json).unwrap();
> +        assert!(parsed.get("data").is_some());
> +
> +        let data = parsed["data"].as_array().unwrap();
> +        assert_eq!(data.len(), 1);
> +
> +        let entry = &data[0];
> +        assert_eq!(entry["node"], "node1");
> +        assert_eq!(entry["user"], "root");
> +        assert_eq!(entry["tag"], "cluster");
> +    }
> +
> +    #[test]
> +    fn test_json_dump_with_filter() {
> +        let mut buffer = RingBuffer::new(CLOG_DEFAULT_SIZE);
> +
> +        // Add entries with different users
> +        let _ =
> +            buffer.add_entry(&LogEntry::pack("node1", "root", "tag", 0, 1000, 6, "msg1").unwrap());
> +        let _ =
> +            buffer.add_entry(&LogEntry::pack("node1", "admin", "tag", 0, 1001, 6, "msg2").unwrap());
> +        let _ =
> +            buffer.add_entry(&LogEntry::pack("node1", "root", "tag", 0, 1002, 6, "msg3").unwrap());
> +
> +        // Filter for "root" only
> +        let json = buffer.dump_json(Some("root"), 50);
> +
> +        let parsed: serde_json::Value = serde_json::from_str(&json).unwrap();
> +        let data = parsed["data"].as_array().unwrap();
> +
> +        // Should only have 2 entries (the ones from "root")
> +        assert_eq!(data.len(), 2);
> +
> +        for entry in data {
> +            assert_eq!(entry["user"], "root");
> +        }
> +    }
> +
> +    #[test]
> +    fn test_json_dump_max_entries() {
> +        let mut buffer = RingBuffer::new(CLOG_DEFAULT_SIZE);
> +
> +        // Add 10 entries
> +        for i in 0..10 {
> +            let _ = buffer
> +                .add_entry(&LogEntry::pack("node1", "root", "tag", 0, 1000 + i, 6, "msg").unwrap());
> +        }
> +
> +        // Request only 5 entries
> +        let json = buffer.dump_json(None, 5);
> +
> +        let parsed: serde_json::Value = serde_json::from_str(&json).unwrap();
> +        let data = parsed["data"].as_array().unwrap();
> +
> +        assert_eq!(data.len(), 5);
> +    }
> +
> +    #[test]
> +    fn test_iterator() {
> +        let mut buffer = RingBuffer::new(CLOG_DEFAULT_SIZE);
> +
> +        let _ = buffer.add_entry(&LogEntry::pack("node1", "root", "tag", 0, 1000, 6, "a").unwrap());
> +        let _ = buffer.add_entry(&LogEntry::pack("node1", "root", "tag", 0, 1001, 6, "b").unwrap());
> +        let _ = buffer.add_entry(&LogEntry::pack("node1", "root", "tag", 0, 1002, 6, "c").unwrap());
> +
> +        let messages: Vec<String> = buffer.iter().map(|e| e.message.clone()).collect();
> +
> +        // Should be in reverse order (newest first)
> +        assert_eq!(messages, vec!["c", "b", "a"]);
> +    }
> +
> +    #[test]
> +    fn test_binary_serialization_roundtrip() {
> +        let mut buffer = RingBuffer::new(CLOG_DEFAULT_SIZE);
> +
> +        let _ = buffer.add_entry(
> +            &LogEntry::pack("node1", "root", "cluster", 123, 1000, 6, "Entry 1").unwrap(),
> +        );
> +        let _ = buffer.add_entry(
> +            &LogEntry::pack("node2", "admin", "system", 456, 1001, 5, "Entry 2").unwrap(),
> +        );
> +
> +        // Serialize
> +        let binary = buffer.serialize_binary();
> +
> +        // Deserialize
> +        let deserialized = RingBuffer::deserialize_binary(&binary).unwrap();
> +
> +        // Check entry count
> +        assert_eq!(deserialized.len(), buffer.len());
> +
> +        // Check entries match
> +        let orig_entries: Vec<_> = buffer.iter().collect();
> +        let deser_entries: Vec<_> = deserialized.iter().collect();
> +
> +        for (orig, deser) in orig_entries.iter().zip(deser_entries.iter()) {
> +            assert_eq!(deser.uid, orig.uid);
> +            assert_eq!(deser.time, orig.time);
> +            assert_eq!(deser.node, orig.node);
> +            assert_eq!(deser.message, orig.message);
> +        }
> +    }
> +
> +    #[test]
> +    fn test_binary_format_header() {
> +        let mut buffer = RingBuffer::new(CLOG_DEFAULT_SIZE);
> +        let _ = buffer.add_entry(&LogEntry::pack("n", "u", "t", 1, 1000, 6, "m").unwrap());
> +
> +        let binary = buffer.serialize_binary();
> +
> +        // Check header format
> +        assert!(binary.len() >= 8);
> +
> +        let size = u32::from_le_bytes(binary[0..4].try_into().unwrap()) as usize;
> +        let cpos = u32::from_le_bytes(binary[4..8].try_into().unwrap());
> +
> +        assert_eq!(size, binary.len());
> +        assert_eq!(cpos, 8); // First entry at offset 8
> +    }
> +
> +    #[test]
> +    fn test_binary_empty_buffer() {
> +        let buffer = RingBuffer::new(CLOG_DEFAULT_SIZE); // Use default size to avoid capacity upgrade
> +        let binary = buffer.serialize_binary();
> +
> +        // Empty buffer returns full capacity (matching C's g_memdup2(cl->base, clog->size))
> +        assert_eq!(binary.len(), CLOG_DEFAULT_SIZE); // Full capacity, not just header!
> +
> +        // Check header
> +        let size = u32::from_le_bytes(binary[0..4].try_into().unwrap()) as usize;
> +        let cpos = u32::from_le_bytes(binary[4..8].try_into().unwrap());
> +
> +        assert_eq!(size, CLOG_DEFAULT_SIZE);
> +        assert_eq!(cpos, 0); // Empty buffer has cpos = 0
> +
> +        let deserialized = RingBuffer::deserialize_binary(&binary).unwrap();
> +        assert_eq!(deserialized.len(), 0);
> +        assert_eq!(deserialized.capacity(), CLOG_DEFAULT_SIZE);
> +    }
> +}
> diff --git a/src/pmxcfs-rs/pmxcfs-logger/tests/binary_compatibility_tests.rs b/src/pmxcfs-rs/pmxcfs-logger/tests/binary_compatibility_tests.rs
> new file mode 100644
> index 000000000..5185386dc
> --- /dev/null
> +++ b/src/pmxcfs-rs/pmxcfs-logger/tests/binary_compatibility_tests.rs
> @@ -0,0 +1,315 @@
> +//! Binary compatibility tests for pmxcfs-logger
> +//!
> +//! These tests verify that the Rust implementation can correctly
> +//! serialize/deserialize binary data in a format compatible with
> +//! the C implementation.
> +
> +use pmxcfs_logger::{ClusterLog, LogEntry, RingBuffer};
> +
> +/// Test deserializing a minimal C-compatible binary blob
> +///
> +/// This test uses a hand-crafted binary blob that matches C's clog_base_t format:
> +/// - 8-byte header (size + cpos)
> +/// - Single entry at offset 8
> +#[test]
> +fn test_deserialize_minimal_c_blob() {
> +    // Create a minimal valid C binary blob
> +    // Header: size=8+entry_size, cpos=8 (points to first entry)
> +    // Entry: minimal valid entry with all required fields
> +
> +    let entry = LogEntry::pack("node1", "root", "test", 123, 1000, 6, "msg").unwrap();
> +    let entry_bytes = entry.serialize_binary(0, 0); // prev=0 (end), next=0
> +    let entry_size = entry_bytes.len();
> +
> +    // Allocate buffer with capacity for header + entry
> +    let total_size = 8 + entry_size;
> +    let mut blob = vec![0u8; total_size];
> +
> +    // Write header
> +    blob[0..4].copy_from_slice(&(total_size as u32).to_le_bytes()); // size
> +    blob[4..8].copy_from_slice(&8u32.to_le_bytes()); // cpos = 8
> +
> +    // Write entry
> +    blob[8..8 + entry_size].copy_from_slice(&entry_bytes);
> +
> +    // Deserialize
> +    let buffer = RingBuffer::deserialize_binary(&blob).expect("Should deserialize");
> +
> +    // Verify
> +    assert_eq!(buffer.len(), 1, "Should have 1 entry");
> +    let entries: Vec<_> = buffer.iter().collect();
> +    assert_eq!(entries[0].node, "node1");
> +    assert_eq!(entries[0].message, "msg");
> +}
> +
> +/// Test round-trip: Rust serialize -> deserialize
> +///
> +/// Verifies that Rust can serialize and deserialize its own format
> +#[test]
> +fn test_roundtrip_single_entry() {
> +    let mut buffer = RingBuffer::new(8192 * 16);
> +
> +    let entry = LogEntry::pack("node1", "root", "cluster", 123, 1000, 6, "Test message").unwrap();
> +    buffer.add_entry(&entry).unwrap();
> +
> +    // Serialize
> +    let blob = buffer.serialize_binary();
> +
> +    // Verify header
> +    let size = u32::from_le_bytes(blob[0..4].try_into().unwrap()) as usize;
> +    let cpos = u32::from_le_bytes(blob[4..8].try_into().unwrap()) as usize;
> +
> +    assert_eq!(size, blob.len(), "Size should match blob length");
> +    assert_eq!(cpos, 8, "First entry should be at offset 8");
> +
> +    // Deserialize
> +    let deserialized = RingBuffer::deserialize_binary(&blob).expect("Should deserialize");
> +
> +    // Verify
> +    assert_eq!(deserialized.len(), 1);
> +    let entries: Vec<_> = deserialized.iter().collect();
> +    assert_eq!(entries[0].node, "node1");
> +    assert_eq!(entries[0].ident, "root");
> +    assert_eq!(entries[0].message, "Test message");
> +}
> +
> +/// Test round-trip with multiple entries
> +///
> +/// Verifies linked list structure (prev/next pointers)
> +#[test]
> +fn test_roundtrip_multiple_entries() {
> +    let mut buffer = RingBuffer::new(8192 * 16);
> +
> +    // Add 3 entries
> +    for i in 0..3 {
> +        let entry = LogEntry::pack(
> +            "node1",
> +            "root",
> +            "test",
> +            100 + i,
> +            1000 + i,
> +            6,
> +            &format!("Message {}", i),
> +        )
> +        .unwrap();
> +        buffer.add_entry(&entry).unwrap();
> +    }
> +
> +    // Serialize
> +    let blob = buffer.serialize_binary();
> +
> +    // Deserialize
> +    let deserialized = RingBuffer::deserialize_binary(&blob).expect("Should deserialize");
> +
> +    // Verify all entries preserved
> +    assert_eq!(deserialized.len(), 3);
> +
> +    let entries: Vec<_> = deserialized.iter().collect();
> +    // Entries are stored newest-first
> +    assert_eq!(entries[0].message, "Message 2"); // Newest
> +    assert_eq!(entries[1].message, "Message 1");
> +    assert_eq!(entries[2].message, "Message 0"); // Oldest
> +}
> +
> +/// Test empty buffer serialization
> +///
> +/// C returns a buffer with size and cpos=0 for empty buffers
> +#[test]
> +fn test_empty_buffer_format() {
> +    let buffer = RingBuffer::new(8192 * 16);
> +
> +    // Serialize empty buffer
> +    let blob = buffer.serialize_binary();
> +
> +    // Verify format
> +    assert_eq!(blob.len(), 8192 * 16, "Should be full capacity");
> +
> +    let size = u32::from_le_bytes(blob[0..4].try_into().unwrap()) as usize;
> +    let cpos = u32::from_le_bytes(blob[4..8].try_into().unwrap()) as usize;
> +
> +    assert_eq!(size, 8192 * 16, "Size should match capacity");
> +    assert_eq!(cpos, 0, "Empty buffer should have cpos=0");
> +
> +    // Deserialize
> +    let deserialized = RingBuffer::deserialize_binary(&blob).expect("Should deserialize");
> +    assert_eq!(deserialized.len(), 0, "Should be empty");
> +}
> +
> +/// Test entry alignment (8-byte boundaries)
> +///
> +/// C uses ((size + 7) & ~7) for alignment
> +#[test]
> +fn test_entry_alignment() {
> +    let entry = LogEntry::pack("n", "u", "t", 1, 1000, 6, "m").unwrap();
> +
> +    let aligned_size = entry.aligned_size();
> +
> +    // Should be multiple of 8
> +    assert_eq!(aligned_size % 8, 0, "Aligned size should be multiple of 8");
> +
> +    // Should be >= actual size
> +    assert!(aligned_size >= entry.size());
> +
> +    // Should be within 7 bytes of actual size
> +    assert!(aligned_size - entry.size() < 8);
> +}
> +
> +/// Test string length capping (prevents u8 overflow)
> +///
> +/// node_len, ident_len, tag_len are u8 and must cap at 255
> +#[test]
> +fn test_string_length_capping() {
> +    // Create entry with very long strings
> +    let long_node = "a".repeat(300);
> +    let long_ident = "b".repeat(300);
> +    let long_tag = "c".repeat(300);
> +
> +    let entry = LogEntry::pack(&long_node, &long_ident, &long_tag, 1, 1000, 6, "msg").unwrap();
> +
> +    // Serialize
> +    let blob = entry.serialize_binary(0, 0);
> +
> +    // Check length fields (at offsets 32, 33, 34 after header)
> +    let node_len = blob[32];
> +    let ident_len = blob[33];
> +    let tag_len = blob[34];

Shouldn't this be 37/38/39?
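
A rough sketch of where 37/38/39 would come from, assuming the entry header follows the C clog_entry_t field order (prev, next, uid, time, node_digest, ident_digest, pid, priority, then the three length bytes). The field table and the offset_of_field helper are illustrative only and not part of the patch:

```rust
/// Assumed header layout of one log entry as (field name, size in bytes).
/// The order is an assumption taken from the C struct, not from this patch.
const FIELDS: [(&str, usize); 12] = [
    ("prev", 4),
    ("next", 4),
    ("uid", 4),
    ("time", 4),
    ("node_digest", 8),
    ("ident_digest", 8),
    ("pid", 4),
    ("priority", 1),
    ("node_len", 1),
    ("ident_len", 1),
    ("tag_len", 1),
    ("msg_len", 4),
];

/// Byte offset of `name` within the header, or None for an unknown field.
/// Summing sizes is enough here because every field in this order is
/// naturally aligned, so no padding is inserted before any of them.
fn offset_of_field(name: &str) -> Option<usize> {
    let mut offset = 0;
    for &(field, size) in FIELDS.iter() {
        if field == name {
            return Some(offset);
        }
        offset += size;
    }
    None
}

fn main() {
    for name in ["node_len", "ident_len", "tag_len"].iter() {
        println!("{} at offset {:?}", name, offset_of_field(name));
    }
}
```

Under that assumption, offsets 32, 33 and 34 would fall inside the pid field rather than on the length bytes.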

> +
> +    // All should be capped at 255 (including null terminator)
> +    assert!(node_len <= 255, "node_len should be capped at 255");
> +    assert!(ident_len <= 255, "ident_len should be capped at 255");
> +    assert!(tag_len <= 255, "tag_len should be capped at 255");

Assert the expected value instead.
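
The three checks above compare a u8 against 255, which holds for every possible value, so they cannot catch a regression. A minimal sketch of the stricter form, assuming the stored length counts the NUL terminator and is capped at 255 as the test comment says. capped_len is a hypothetical stand-in for whatever pack() does internally:

```rust
/// Hypothetical helper mirroring the assumed capping rule: string length
/// plus one byte for the NUL terminator, capped at 255 so it fits a u8.
fn capped_len(s: &str) -> u8 {
    (s.len() + 1).min(255) as u8
}

fn main() {
    let long_node = "a".repeat(300);
    let node_len = capped_len(&long_node);

    // Exact expectation: a 300-byte string must be capped to 255.
    assert_eq!(node_len, 255, "node_len should be capped at exactly 255");

    // A short string keeps its real length plus the terminator.
    assert_eq!(capped_len("node1"), 6);
}
```

In the real test the same assert_eq! would apply to blob[37], blob[38] and blob[39] if the offsets above are right.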

> +}
> +
> +/// Test ClusterLog state serialization
> +///
> +/// Verifies get_state() returns C-compatible format
> +#[test]
> +fn test_cluster_log_state_format() {
> +    let log = ClusterLog::new();
> +
> +    // Add some entries
> +    log.add("node1", "root", "cluster", 123, 6, 1000, "Entry 1")
> +        .unwrap();
> +    log.add("node2", "admin", "system", 456, 6, 1001, "Entry 2")
> +        .unwrap();
> +
> +    // Get state
> +    let state = log.get_state().expect("Should serialize");
> +
> +    // Verify header format
> +    assert!(state.len() >= 8, "Should have at least header");
> +
> +    let size = u32::from_le_bytes(state[0..4].try_into().unwrap()) as usize;
> +    let cpos = u32::from_le_bytes(state[4..8].try_into().unwrap()) as usize;
> +
> +    assert_eq!(size, state.len(), "Size should match blob length");
> +    assert!(cpos >= 8, "cpos should point into data section");
> +    assert!(cpos < size, "cpos should be within buffer");
> +
> +    // Deserialize and verify
> +    let deserialized = ClusterLog::deserialize_state(&state).expect("Should deserialize");
> +    assert_eq!(deserialized.len(), 2, "Should have 2 entries");
> +}
> +
> +/// Test wrap-around detection in deserialization
> +///
> +/// Verifies that circular buffer wrap-around is handled correctly
> +#[test]
> +fn test_wraparound_detection() {
> +    // Create a buffer with entries
> +    let mut buffer = RingBuffer::new(8192 * 16);
> +
> +    for i in 0..5 {
> +        let entry = LogEntry::pack("node1", "root", "test", 100 + i, 1000 + i, 6, "msg").unwrap();
> +        buffer.add_entry(&entry).unwrap();
> +    }
> +
> +    // Serialize
> +    let blob = buffer.serialize_binary();
> +
> +    // Deserialize (should handle prev pointers correctly)
> +    let deserialized = RingBuffer::deserialize_binary(&blob).expect("Should deserialize");
> +
> +    // Should get all entries
> +    assert_eq!(deserialized.len(), 5);
> +}
> +
> +/// Test invalid binary data handling
> +///
> +/// Verifies that malformed data is rejected
> +#[test]
> +fn test_invalid_binary_data() {
> +    // Too small
> +    let too_small = vec![0u8; 4];
> +    assert!(RingBuffer::deserialize_binary(&too_small).is_err());
> +
> +    // Size mismatch
> +    let mut size_mismatch = vec![0u8; 100];
> +    size_mismatch[0..4].copy_from_slice(&200u32.to_le_bytes()); // Claims 200 bytes
> +    assert!(RingBuffer::deserialize_binary(&size_mismatch).is_err());
> +
> +    // Invalid cpos (beyond buffer)
> +    let mut invalid_cpos = vec![0u8; 100];
> +    invalid_cpos[0..4].copy_from_slice(&100u32.to_le_bytes()); // size = 100
> +    invalid_cpos[4..8].copy_from_slice(&200u32.to_le_bytes()); // cpos = 200 (invalid)
> +    assert!(RingBuffer::deserialize_binary(&invalid_cpos).is_err());
> +}
> +
> +/// Test FNV-1a hash consistency
> +///
> +/// Verifies that node_digest and ident_digest are computed correctly
> +#[test]
> +fn test_hash_consistency() {
> +    let entry1 = LogEntry::pack("node1", "root", "test", 1, 1000, 6, "msg1").unwrap();
> +    let entry2 = LogEntry::pack("node1", "root", "test", 2, 1001, 6, "msg2").unwrap();
> +    let entry3 = LogEntry::pack("node2", "admin", "test", 3, 1002, 6, "msg3").unwrap();
> +
> +    // Same node should have same digest
> +    assert_eq!(entry1.node_digest, entry2.node_digest);
> +
> +    // Same ident should have same digest
> +    assert_eq!(entry1.ident_digest, entry2.ident_digest);
> +
> +    // Different node should have different digest
> +    assert_ne!(entry1.node_digest, entry3.node_digest);
> +
> +    // Different ident should have different digest
> +    assert_ne!(entry1.ident_digest, entry3.ident_digest);
> +}
> +
> +/// Test priority validation
> +///
> +/// Priority must be 0-7 (syslog priority)
> +#[test]
> +fn test_priority_validation() {
> +    // Valid priorities (0-7)
> +    for pri in 0..=7 {
> +        let result = LogEntry::pack("node1", "root", "test", 1, 1000, pri, "msg");
> +        assert!(result.is_ok(), "Priority {} should be valid", pri);
> +    }
> +
> +    // Invalid priority (8+)
> +    let result = LogEntry::pack("node1", "root", "test", 1, 1000, 8, "msg");
> +    assert!(result.is_err(), "Priority 8 should be invalid");
> +}
> +
> +/// Test UTF-8 to ASCII conversion
> +///
> +/// Verifies control character and Unicode escaping (matches C implementation)
> +#[test]
> +fn test_utf8_escaping() {
> +    // Control characters (C format: #XXX with 3 decimal digits)
> +    let entry = LogEntry::pack("node1", "root", "test", 1, 1000, 6, "Hello\x07World").unwrap();
> +    assert!(entry.message.contains("#007"), "BEL should be escaped as #007");
> +
> +    // Unicode characters
> +    let entry = LogEntry::pack("node1", "root", "test", 1, 1000, 6, "Hello 世界").unwrap();
> +    assert!(entry.message.contains("\\u4e16"), "世 should be escaped as \\u4e16");
> +    assert!(entry.message.contains("\\u754c"), "界 should be escaped as \\u754c");
> +
> +    // Mixed content
> +    let entry = LogEntry::pack("node1", "root", "test", 1, 1000, 6, "Test\x01\n世").unwrap();
> +    assert!(entry.message.contains("#001"), "SOH should be escaped");
> +    assert!(entry.message.contains("#010"), "LF should be escaped");
> +    assert!(entry.message.contains("\\u4e16"), "Unicode should be escaped");
> +}
> diff --git a/src/pmxcfs-rs/pmxcfs-logger/tests/performance_tests.rs b/src/pmxcfs-rs/pmxcfs-logger/tests/performance_tests.rs
> new file mode 100644
> index 000000000..eec7470d3
> --- /dev/null
> +++ b/src/pmxcfs-rs/pmxcfs-logger/tests/performance_tests.rs
> @@ -0,0 +1,294 @@
> +//! Performance tests for pmxcfs-logger
> +//!
> +//! These tests verify that the logger implementation scales properly
> +//! and handles large log merges efficiently.
> +
> +use pmxcfs_logger::ClusterLog;
> +
> +/// Test merging large logs from multiple nodes
> +///
> +/// This test verifies:
> +/// 1. Large log merge performance (multiple nodes with many entries)
> +/// 2. Memory usage stays bounded
> +/// 3. Deduplication works correctly at scale
> +#[test]
> +fn test_large_log_merge_performance() {
> +    // Create 3 nodes with large logs
> +    let node1 = ClusterLog::new();
> +    let node2 = ClusterLog::new();
> +    let node3 = ClusterLog::new();
> +
> +    // Add 1000 entries per node (3000 total)
> +    for i in 0..1000 {
> +        let _ = node1.add(
> +            "node1",
> +            "root",
> +            "cluster",
> +            1000 + i,
> +            6,
> +            1000000 + i,
> +            &format!("Node1 entry {}", i),
> +        );
> +        let _ = node2.add(
> +            "node2",
> +            "admin",
> +            "system",
> +            2000 + i,
> +            6,
> +            1000000 + i,
> +            &format!("Node2 entry {}", i),
> +        );
> +        let _ = node3.add(
> +            "node3",
> +            "user",
> +            "service",
> +            3000 + i,
> +            6,
> +            1000000 + i,
> +            &format!("Node3 entry {}", i),
> +        );
> +    }
> +
> +    // Get remote buffers
> +    let node2_buffer = node2.get_buffer();
> +    let node3_buffer = node3.get_buffer();
> +
> +    // Merge all logs into node1
> +    let start = std::time::Instant::now();
> +    node1
> +        .merge(vec![node2_buffer, node3_buffer], true)
> +        .expect("Merge should succeed");
> +    let duration = start.elapsed();
> +
> +    // Verify merge completed
> +    let merged_count = node1.len();
> +
> +    // Should have merged entries (may be less than 3000 due to capacity limits)
> +    assert!(
> +        merged_count > 0,
> +        "Should have some entries after merge (got {})",
> +        merged_count
> +    );
> +
> +    // Performance check: merge should complete in reasonable time
> +    // For 3000 entries, should be well under 1 second
> +    assert!(
> +        duration.as_millis() < 1000,
> +        "Large merge took too long: {:?}",
> +        duration
> +    );
> +
> +    println!(
> +        "[OK] Merged 3000 entries from 3 nodes in {:?} (result: {} entries)",
> +        duration, merged_count
> +    );
> +}
> +
> +/// Test deduplication performance with high duplicate rate
> +///
> +/// This test verifies that deduplication works efficiently when
> +/// many duplicate entries are present.
> +#[test]
> +fn test_deduplication_performance() {
> +    let log = ClusterLog::new();
> +
> +    // Add 500 entries from same node with overlapping times
> +    // This creates many potential duplicates
> +    for i in 0..500 {
> +        let _ = log.add(
> +            "node1",
> +            "root",
> +            "cluster",
> +            1000 + i,
> +            6,
> +            1000 + (i / 10), // Reuse timestamps (50 unique times)
> +            &format!("Entry {}", i),
> +        );
> +    }
> +
> +    // Create remote log with overlapping entries
> +    let remote = ClusterLog::new();
> +    for i in 0..500 {
> +        let _ = remote.add(
> +            "node1",
> +            "root",
> +            "cluster",
> +            2000 + i,
> +            6,
> +            1000 + (i / 10), // Same timestamp pattern
> +            &format!("Remote entry {}", i),
> +        );
> +    }
> +
> +    let remote_buffer = remote.get_buffer();
> +
> +    // Merge with deduplication
> +    let start = std::time::Instant::now();
> +    log.merge(vec![remote_buffer], true)
> +        .expect("Merge should succeed");
> +    let duration = start.elapsed();
> +
> +    let final_count = log.len();
> +
> +    // Should have deduplicated some entries
> +    assert!(
> +        final_count > 0,
> +        "Should have entries after deduplication"
> +    );
> +
> +    // Performance check
> +    assert!(
> +        duration.as_millis() < 500,
> +        "Deduplication took too long: {:?}",
> +        duration
> +    );
> +
> +    println!(
> +        "[OK] Deduplicated 1000 entries in {:?} (result: {} entries)",
> +        duration, final_count
> +    );
> +}
> +
> +/// Test memory usage stays bounded during large operations
> +///
> +/// This test verifies that the ring buffer properly limits memory
> +/// usage even when adding many entries.
> +#[test]
> +fn test_memory_bounded() {
> +    // Create log with default capacity
> +    let log = ClusterLog::new();
> +
> +    // Add many entries (more than capacity)
> +    for i in 0..10000 {
> +        let _ = log.add(
> +            "node1",
> +            "root",
> +            "cluster",
> +            1000 + i,
> +            6,
> +            1000000 + i,
> +            &format!("Entry with some message content {}", i),
> +        );
> +    }
> +
> +    let entry_count = log.len();
> +    let capacity = log.capacity();
> +
> +    // Buffer should not grow unbounded
> +    // Entry count should be reasonable relative to capacity
> +    assert!(
> +        entry_count < 10000,
> +        "Buffer should not store all 10000 entries (got {})",
> +        entry_count
> +    );
> +
> +    // Verify capacity is respected
> +    assert!(
> +        capacity > 0,
> +        "Capacity should be set (got {})",
> +        capacity
> +    );
> +
> +    println!(
> +        "[OK] Added 10000 entries, buffer contains {} (capacity: {} bytes)",
> +        entry_count, capacity
> +    );
> +}
> +
> +/// Test JSON export performance with large logs
> +///
> +/// This test verifies that JSON export scales properly.
> +#[test]
> +fn test_json_export_performance() {
> +    let log = ClusterLog::new();
> +
> +    // Add 1000 entries
> +    for i in 0..1000 {
> +        let _ = log.add(
> +            "node1",
> +            "root",
> +            "cluster",
> +            1000 + i,
> +            6,
> +            1000000 + i,
> +            &format!("Test message {}", i),
> +        );
> +    }
> +
> +    // Export to JSON
> +    let start = std::time::Instant::now();
> +    let json = log.dump_json(None, 1000);
> +    let duration = start.elapsed();
> +
> +    // Verify JSON is valid
> +    let parsed: serde_json::Value =
> +        serde_json::from_str(&json).expect("Should be valid JSON");
> +    let data = parsed["data"].as_array().expect("Should have data array");
> +
> +    assert!(data.len() > 0, "Should have entries in JSON");
> +
> +    // Performance check
> +    assert!(
> +        duration.as_millis() < 500,
> +        "JSON export took too long: {:?}",
> +        duration
> +    );
> +
> +    println!(
> +        "[OK] Exported {} entries to JSON in {:?}",
> +        data.len(),
> +        duration
> +    );
> +}
> +
> +/// Test binary serialization performance
> +///
> +/// This test verifies that binary serialization/deserialization
> +/// is efficient for large buffers.
> +#[test]
> +fn test_binary_serialization_performance() {
> +    let log = ClusterLog::new();
> +
> +    // Add 500 entries
> +    for i in 0..500 {
> +        let _ = log.add(
> +            "node1",
> +            "root",
> +            "cluster",
> +            1000 + i,
> +            6,
> +            1000000 + i,
> +            &format!("Entry {}", i),
> +        );
> +    }
> +
> +    // Serialize
> +    let start = std::time::Instant::now();
> +    let state = log.get_state().expect("Should serialize");
> +    let serialize_duration = start.elapsed();
> +
> +    // Deserialize
> +    let start = std::time::Instant::now();
> +    let deserialized = ClusterLog::deserialize_state(&state).expect("Should deserialize");
> +    let deserialize_duration = start.elapsed();
> +
> +    // Verify round-trip
> +    assert_eq!(deserialized.len(), 500, "Should preserve entry count");
> +
> +    // Performance checks
> +    assert!(
> +        serialize_duration.as_millis() < 200,
> +        "Serialization took too long: {:?}",
> +        serialize_duration
> +    );
> +    assert!(
> +        deserialize_duration.as_millis() < 200,
> +        "Deserialization took too long: {:?}",
> +        deserialize_duration
> +    );
> +
> +    println!(
> +        "[OK] Serialized 500 entries in {:?}, deserialized in {:?}",
> +        serialize_duration, deserialize_duration
> +    );
> +}






* Re: [PATCH proxmox-backup v5 2/4] pbs-config: cache verified API token secrets
  2026-02-17 11:12 12% ` [PATCH proxmox-backup v5 2/4] pbs-config: cache verified API token secrets Samuel Rufinatscha
@ 2026-02-25 15:44  6%   ` Shannon Sterz
  2026-02-27  9:28  6%     ` Samuel Rufinatscha
  0 siblings, 1 reply; 117+ results
From: Shannon Sterz @ 2026-02-25 15:44 UTC (permalink / raw)
  To: Samuel Rufinatscha; +Cc: pbs-devel

On Tue Feb 17, 2026 at 12:12 PM CET, Samuel Rufinatscha wrote:
> Adds an in-memory cache of successfully verified token secrets.
> Subsequent requests for the same token+secret combination only perform
> a comparison using openssl::memcmp::eq and avoid re-running the
> password hash. The cache is updated when a token secret is set and
> cleared when a token is deleted. A shared generation counter (via
> ConfigVersionCache) is used to invalidate caches across processes when
> token secrets are modified or deleted. This keeps privileged and
> unprivileged daemons in sync.
>
> Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
> ---
> Changes from v4 to v5:
> * Rebased
> * Move invalidate_cache_state_and_set_gen into cache object impl
> rename to reset_and_set_gen
> * Add additional insert/remove helpers which set/update the generation
> directly
> * Clarified the usage of shared generation counter in the commit
> message
>
> Changes from v3 to v4:
> * Add gen param to invalidate_cache_state()
> * Validates the generation bump after obtaining write lock in
> apply_api_mutation
> * Pass lock to apply_api_mutation
> * Remove unnecessary gen check cache_try_secret_matches
> * Adjusted commit message
>
> Changes from v2 to v3:
> * Replaced process-local cache invalidation (AtomicU64
> API_MUTATION_GENERATION) with a cross-process shared generation via
> ConfigVersionCache.
> * Validate shared generation before/after the constant-time secret
> compare; only insert into cache if the generation is unchanged.
> * invalidate_cache_state() on insert if shared generation changed.
>
> Changes from v1 to v2:
> * Replace OnceCell with LazyLock, and std::sync::RwLock with
> parking_lot::RwLock.
> * Add API_MUTATION_GENERATION and guard cache inserts
> to prevent “zombie inserts” across concurrent set/delete.
> * Refactor cache operations into cache_try_secret_matches,
> cache_try_insert_secret, and centralize write-side behavior in
> apply_api_mutation.
> * Switch fast-path cache access to try_read/try_write (best-effort).
>
>  Cargo.toml                     |   1 +
>  pbs-config/Cargo.toml          |   1 +
>  pbs-config/src/token_shadow.rs | 167 ++++++++++++++++++++++++++++++++-
>  3 files changed, 166 insertions(+), 3 deletions(-)
>
> diff --git a/Cargo.toml b/Cargo.toml
> index dd8af85f..469538bb 100644
> --- a/Cargo.toml
> +++ b/Cargo.toml
> @@ -144,6 +144,7 @@ nom = "7"
>  num-traits = "0.2"
>  once_cell = "1.3.1"
>  openssl = "0.10.40"
> +parking_lot = "0.12"
>  percent-encoding = "2.1"
>  pin-project-lite = "0.2"
>  regex = "1.5.5"
> diff --git a/pbs-config/Cargo.toml b/pbs-config/Cargo.toml
> index 74afb3c6..eb81ce00 100644
> --- a/pbs-config/Cargo.toml
> +++ b/pbs-config/Cargo.toml
> @@ -13,6 +13,7 @@ libc.workspace = true
>  nix.workspace = true
>  once_cell.workspace = true
>  openssl.workspace = true
> +parking_lot.workspace = true
>  regex.workspace = true
>  serde.workspace = true
>  serde_json.workspace = true
> diff --git a/pbs-config/src/token_shadow.rs b/pbs-config/src/token_shadow.rs
> index 640fabbf..ad766671 100644
> --- a/pbs-config/src/token_shadow.rs
> +++ b/pbs-config/src/token_shadow.rs
> @@ -1,6 +1,8 @@
>  use std::collections::HashMap;
> +use std::sync::LazyLock;
>
>  use anyhow::{bail, format_err, Error};
> +use parking_lot::RwLock;
>  use serde::{Deserialize, Serialize};
>  use serde_json::{from_value, Value};
>
> @@ -13,6 +15,18 @@ use crate::{open_backup_lockfile, BackupLockGuard};
>  const LOCK_FILE: &str = pbs_buildcfg::configdir!("/token.shadow.lock");
>  const CONF_FILE: &str = pbs_buildcfg::configdir!("/token.shadow");
>
> +/// Global in-memory cache for successfully verified API token secrets.
> +/// The cache stores plain text secrets for token Authids that have already been
> +/// verified against the hashed values in `token.shadow`. This allows for cheap
> +/// subsequent authentications for the same token+secret combination, avoiding
> +/// recomputing the password hash on every request.
> +static TOKEN_SECRET_CACHE: LazyLock<RwLock<ApiTokenSecretCache>> = LazyLock::new(|| {
> +    RwLock::new(ApiTokenSecretCache {
> +        secrets: HashMap::new(),
> +        shared_gen: 0,
> +    })
> +});
> +
>  #[derive(Serialize, Deserialize)]
>  #[serde(rename_all = "kebab-case")]
>  /// ApiToken id / secret pair
> @@ -54,9 +68,27 @@ pub fn verify_secret(tokenid: &Authid, secret: &str) -> Result<(), Error> {
>          bail!("not an API token ID");
>      }
>
> +    // Fast path
> +    if cache_try_secret_matches(tokenid, secret) {
> +        return Ok(());
> +    }
> +
> +    // Slow path
> +    // First, capture the shared generation before doing the hash verification.
> +    let gen_before = token_shadow_shared_gen();
> +
>      let data = read_file()?;
>      match data.get(tokenid) {
> -        Some(hashed_secret) => proxmox_sys::crypt::verify_crypt_pw(secret, hashed_secret),
> +        Some(hashed_secret) => {
> +            proxmox_sys::crypt::verify_crypt_pw(secret, hashed_secret)?;
> +
> +            // Try to cache only if nothing changed while verifying the secret.
> +            if let Some(gen) = gen_before {
> +                cache_try_insert_secret(tokenid.clone(), secret.to_owned(), gen);
> +            }
> +
> +            Ok(())
> +        }
>          None => bail!("invalid API token"),
>      }
>  }
> @@ -75,13 +107,15 @@ fn set_secret(tokenid: &Authid, secret: &str) -> Result<(), Error> {
>          bail!("not an API token ID");
>      }
>
> -    let _guard = lock_config()?;
> +    let guard = lock_config()?;
>
>      let mut data = read_file()?;
>      let hashed_secret = proxmox_sys::crypt::encrypt_pw(secret)?;
>      data.insert(tokenid.clone(), hashed_secret);
>      write_file(data)?;
>
> +    apply_api_mutation(guard, tokenid, Some(secret));
> +
>      Ok(())
>  }
>
> @@ -91,11 +125,138 @@ pub fn delete_secret(tokenid: &Authid) -> Result<(), Error> {
>          bail!("not an API token ID");
>      }
>
> -    let _guard = lock_config()?;
> +    let guard = lock_config()?;
>
>      let mut data = read_file()?;
>      data.remove(tokenid);
>      write_file(data)?;
>
> +    apply_api_mutation(guard, tokenid, None);
> +
>      Ok(())
>  }
> +
> +/// Cached secret.
> +struct CachedSecret {
> +    secret: String,
> +}
> +
> +struct ApiTokenSecretCache {
> +    /// Keys are token Authids, values are the corresponding plain text secrets.
> +    /// Entries are added after a successful on-disk verification in
> +    /// `verify_secret` or when a new token secret is generated by
> +    /// `generate_and_set_secret`. Used to avoid repeated
> +    /// password-hash computation on subsequent authentications.
> +    secrets: HashMap<Authid, CachedSecret>,
> +    /// Shared generation to detect mutations of the underlying token.shadow file.
> +    shared_gen: usize,
> +}
> +
> +impl ApiTokenSecretCache {
> +    /// Resets all local cache contents and sets/updates the cached generation.
> +    fn reset_and_set_gen(&mut self, gen: usize) {
> +        self.secrets.clear();
> +        self.shared_gen = gen;
> +    }
> +
> +    /// Caches a secret and sets/updates the cache generation.
> +    fn insert_and_set_gen(&mut self, tokenid: Authid, secret: CachedSecret, gen: usize) {
> +        self.secrets.insert(tokenid, secret);
> +        self.shared_gen = gen;
> +    }
> +
> +    /// Evicts a cached secret and sets/updates the cached generation.
> +    fn evict_and_set_gen(&mut self, tokenid: &Authid, gen: usize) {
> +        self.secrets.remove(tokenid);
> +        self.shared_gen = gen;
> +    }
> +}
> +
> +fn cache_try_insert_secret(tokenid: Authid, secret: String, shared_gen_before: usize) {
> +    let Some(mut cache) = TOKEN_SECRET_CACHE.try_write() else {
> +        return;
> +    };
> +
> +    let Some(shared_gen_now) = token_shadow_shared_gen() else {
> +        return;
> +    };
> +
> +    // If this process missed a generation bump, its cache is stale.
> +    if cache.shared_gen != shared_gen_now {
> +        cache.reset_and_set_gen(shared_gen_now);
> +    }
> +
> +    // If a mutation happened while we were verifying the secret, do not insert.
> +    if shared_gen_now == shared_gen_before {
> +        cache.insert_and_set_gen(tokenid, CachedSecret { secret }, shared_gen_now);
> +    }
> +}
> +
> +/// Tries to match the given token secret against the cached secret.
> +///
> +/// Verifies the generation/version before doing the constant-time
> +/// comparison to reduce TOCTOU risk. During token rotation or deletion
> +/// tokens for in-flight requests may still validate against the previous
> +/// generation.
> +fn cache_try_secret_matches(tokenid: &Authid, secret: &str) -> bool {
> +    let Some(cache) = TOKEN_SECRET_CACHE.try_read() else {
> +        return false;
> +    };
> +    let Some(entry) = cache.secrets.get(tokenid) else {
> +        return false;
> +    };
> +    let Some(current_gen) = token_shadow_shared_gen() else {
> +        return false;
> +    };
> +
> +    if current_gen == cache.shared_gen {
> +        return openssl::memcmp::eq(entry.secret.as_bytes(), secret.as_bytes());

tiny comment here: if we ever allow secrets to have different lengths
this could panic:

> This function will panic the current task if a and b do not have the
> same length.
> - https://docs.rs/openssl/latest/openssl/memcmp/fn.eq.html

might be worth guarding against that or at least documenting that we
expect these to always have the same length.

> +    }
> +
> +    false
> +}
> +
> +fn apply_api_mutation(_guard: BackupLockGuard, tokenid: &Authid, new_secret: Option<&str>) {
> +    // Signal cache invalidation to other processes (best-effort).
> +    let bumped_gen = bump_token_shadow_shared_gen();
> +
> +    let mut cache = TOKEN_SECRET_CACHE.write();
> +
> +    // If we cannot get the current generation, we cannot trust the cache
> +    let Some(current_gen) = token_shadow_shared_gen() else {
> +        cache.reset_and_set_gen(0);
> +        return;
> +    };
> +
> +    // If we cannot bump the shared generation, or if it changed after
> +    // obtaining the cache write lock, we cannot trust the cache
> +    if bumped_gen != Some(current_gen) {
> +        cache.reset_and_set_gen(current_gen);
> +        return;
> +    }
> +
> +    // Apply the new mutation.
> +    match new_secret {
> +        Some(secret) => {
> +            let cached_secret = CachedSecret {
> +                secret: secret.to_owned(),
> +            };
> +            cache.insert_and_set_gen(tokenid.clone(), cached_secret, current_gen);
> +        }
> +        None => cache.evict_and_set_gen(tokenid, current_gen),
> +    }
> +}
> +
> +/// Get the current shared generation.
> +fn token_shadow_shared_gen() -> Option<usize> {
> +    crate::ConfigVersionCache::new()
> +        .ok()
> +        .map(|cvc| cvc.token_shadow_generation())
> +}
> +
> +/// Bump and return the new shared generation.
> +fn bump_token_shadow_shared_gen() -> Option<usize> {
> +    crate::ConfigVersionCache::new()
> +        .ok()
> +        .map(|cvc| cvc.increase_token_shadow_generation() + 1)
> +}






* Re: [PATCH proxmox-backup v5 2/4] pbs-config: cache verified API token secrets
  2026-02-25 15:44  6%   ` Shannon Sterz
@ 2026-02-27  9:28  6%     ` Samuel Rufinatscha
  0 siblings, 0 replies; 117+ results
From: Samuel Rufinatscha @ 2026-02-27  9:28 UTC (permalink / raw)
  To: Shannon Sterz; +Cc: pbs-devel

On 2/25/26 4:44 PM, Shannon Sterz wrote:
> On Tue Feb 17, 2026 at 12:12 PM CET, Samuel Rufinatscha wrote:
>> Adds an in-memory cache of successfully verified token secrets.
>> Subsequent requests for the same token+secret combination only perform
>> a comparison using openssl::memcmp::eq and avoid re-running the
>> password hash. The cache is updated when a token secret is set and
>> cleared when a token is deleted. A shared generation counter (via
>> ConfigVersionCache) is used to invalidate caches across processes when
>> token secrets are modified or deleted. This keeps privileged and
>> unprivileged daemons in sync.
>>
>> Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
>> ---
>> Changes from v4 to v5:
>> * Rebased
>> * Move invalidate_cache_state_and_set_gen into cache object impl
>> rename to reset_and_set_gen
>> * Add additional insert/remove helpers which set/update the generation
>> directly
>> * Clarified the  usage of shared generation counter in the commit
>> message
>>
>> Changes from v3 to v4:
>> * Add gen param to invalidate_cache_state()
>> * Validates the generation bump after obtaining write lock in
>> apply_api_mutation
>> * Pass lock to apply_api_mutation
>> * Remove unnecessary gen check cache_try_secret_matches
>> * Adjusted commit message
>>
>> Changes from v2 to v3:
>> * Replaced process-local cache invalidation (AtomicU64
>> API_MUTATION_GENERATION) with a cross-process shared generation via
>> ConfigVersionCache.
>> * Validate shared generation before/after the constant-time secret
>> compare; only insert into cache if the generation is unchanged.
>> * invalidate_cache_state() on insert if shared generation changed.
>>
>> Changes from v1 to v2:
>> * Replace OnceCell with LazyLock, and std::sync::RwLock with
>> parking_lot::RwLock.
>> * Add API_MUTATION_GENERATION and guard cache inserts
>> to prevent “zombie inserts” across concurrent set/delete.
>> * Refactor cache operations into cache_try_secret_matches,
>> cache_try_insert_secret, and centralize write-side behavior in
>> apply_api_mutation.
>> * Switch fast-path cache access to try_read/try_write (best-effort).
>>
>>   Cargo.toml                     |   1 +
>>   pbs-config/Cargo.toml          |   1 +
>>   pbs-config/src/token_shadow.rs | 167 ++++++++++++++++++++++++++++++++-
>>   3 files changed, 166 insertions(+), 3 deletions(-)
>>
>> diff --git a/Cargo.toml b/Cargo.toml
>> index dd8af85f..469538bb 100644
>> --- a/Cargo.toml
>> +++ b/Cargo.toml
>> @@ -144,6 +144,7 @@ nom = "7"
>>   num-traits = "0.2"
>>   once_cell = "1.3.1"
>>   openssl = "0.10.40"
>> +parking_lot = "0.12"
>>   percent-encoding = "2.1"
>>   pin-project-lite = "0.2"
>>   regex = "1.5.5"
>> diff --git a/pbs-config/Cargo.toml b/pbs-config/Cargo.toml
>> index 74afb3c6..eb81ce00 100644
>> --- a/pbs-config/Cargo.toml
>> +++ b/pbs-config/Cargo.toml
>> @@ -13,6 +13,7 @@ libc.workspace = true
>>   nix.workspace = true
>>   once_cell.workspace = true
>>   openssl.workspace = true
>> +parking_lot.workspace = true
>>   regex.workspace = true
>>   serde.workspace = true
>>   serde_json.workspace = true
>> diff --git a/pbs-config/src/token_shadow.rs b/pbs-config/src/token_shadow.rs
>> index 640fabbf..ad766671 100644
>> --- a/pbs-config/src/token_shadow.rs
>> +++ b/pbs-config/src/token_shadow.rs
>> @@ -1,6 +1,8 @@
>>   use std::collections::HashMap;
>> +use std::sync::LazyLock;
>>
>>   use anyhow::{bail, format_err, Error};
>> +use parking_lot::RwLock;
>>   use serde::{Deserialize, Serialize};
>>   use serde_json::{from_value, Value};
>>
>> @@ -13,6 +15,18 @@ use crate::{open_backup_lockfile, BackupLockGuard};
>>   const LOCK_FILE: &str = pbs_buildcfg::configdir!("/token.shadow.lock");
>>   const CONF_FILE: &str = pbs_buildcfg::configdir!("/token.shadow");
>>
>> +/// Global in-memory cache for successfully verified API token secrets.
>> +/// The cache stores plain text secrets for token Authids that have already been
>> +/// verified against the hashed values in `token.shadow`. This allows for cheap
>> +/// subsequent authentications for the same token+secret combination, avoiding
>> +/// recomputing the password hash on every request.
>> +static TOKEN_SECRET_CACHE: LazyLock<RwLock<ApiTokenSecretCache>> = LazyLock::new(|| {
>> +    RwLock::new(ApiTokenSecretCache {
>> +        secrets: HashMap::new(),
>> +        shared_gen: 0,
>> +    })
>> +});
>> +
>>   #[derive(Serialize, Deserialize)]
>>   #[serde(rename_all = "kebab-case")]
>>   /// ApiToken id / secret pair
>> @@ -54,9 +68,27 @@ pub fn verify_secret(tokenid: &Authid, secret: &str) -> Result<(), Error> {
>>           bail!("not an API token ID");
>>       }
>>
>> +    // Fast path
>> +    if cache_try_secret_matches(tokenid, secret) {
>> +        return Ok(());
>> +    }
>> +
>> +    // Slow path
>> +    // First, capture the shared generation before doing the hash verification.
>> +    let gen_before = token_shadow_shared_gen();
>> +
>>       let data = read_file()?;
>>       match data.get(tokenid) {
>> -        Some(hashed_secret) => proxmox_sys::crypt::verify_crypt_pw(secret, hashed_secret),
>> +        Some(hashed_secret) => {
>> +            proxmox_sys::crypt::verify_crypt_pw(secret, hashed_secret)?;
>> +
>> +            // Try to cache only if nothing changed while verifying the secret.
>> +            if let Some(gen) = gen_before {
>> +                cache_try_insert_secret(tokenid.clone(), secret.to_owned(), gen);
>> +            }
>> +
>> +            Ok(())
>> +        }
>>           None => bail!("invalid API token"),
>>       }
>>   }
>> @@ -75,13 +107,15 @@ fn set_secret(tokenid: &Authid, secret: &str) -> Result<(), Error> {
>>           bail!("not an API token ID");
>>       }
>>
>> -    let _guard = lock_config()?;
>> +    let guard = lock_config()?;
>>
>>       let mut data = read_file()?;
>>       let hashed_secret = proxmox_sys::crypt::encrypt_pw(secret)?;
>>       data.insert(tokenid.clone(), hashed_secret);
>>       write_file(data)?;
>>
>> +    apply_api_mutation(guard, tokenid, Some(secret));
>> +
>>       Ok(())
>>   }
>>
>> @@ -91,11 +125,138 @@ pub fn delete_secret(tokenid: &Authid) -> Result<(), Error> {
>>           bail!("not an API token ID");
>>       }
>>
>> -    let _guard = lock_config()?;
>> +    let guard = lock_config()?;
>>
>>       let mut data = read_file()?;
>>       data.remove(tokenid);
>>       write_file(data)?;
>>
>> +    apply_api_mutation(guard, tokenid, None);
>> +
>>       Ok(())
>>   }
>> +
>> +/// Cached secret.
>> +struct CachedSecret {
>> +    secret: String,
>> +}
>> +
>> +struct ApiTokenSecretCache {
>> +    /// Keys are token Authids, values are the corresponding plain text secrets.
>> +    /// Entries are added after a successful on-disk verification in
>> +    /// `verify_secret` or when a new token secret is generated by
>> +    /// `generate_and_set_secret`. Used to avoid repeated
>> +    /// password-hash computation on subsequent authentications.
>> +    secrets: HashMap<Authid, CachedSecret>,
>> +    /// Shared generation to detect mutations of the underlying token.shadow file.
>> +    shared_gen: usize,
>> +}
>> +
>> +impl ApiTokenSecretCache {
>> +    /// Resets all local cache contents and sets/updates the cached generation.
>> +    fn reset_and_set_gen(&mut self, gen: usize) {
>> +        self.secrets.clear();
>> +        self.shared_gen = gen;
>> +    }
>> +
>> +    /// Caches a secret and sets/updates the cache generation.
>> +    fn insert_and_set_gen(&mut self, tokenid: Authid, secret: CachedSecret, gen: usize) {
>> +        self.secrets.insert(tokenid, secret);
>> +        self.shared_gen = gen;
>> +    }
>> +
>> +    /// Evicts a cached secret and sets/updates the cached generation.
>> +    fn evict_and_set_gen(&mut self, tokenid: &Authid, gen: usize) {
>> +        self.secrets.remove(tokenid);
>> +        self.shared_gen = gen;
>> +    }
>> +}
>> +
>> +fn cache_try_insert_secret(tokenid: Authid, secret: String, shared_gen_before: usize) {
>> +    let Some(mut cache) = TOKEN_SECRET_CACHE.try_write() else {
>> +        return;
>> +    };
>> +
>> +    let Some(shared_gen_now) = token_shadow_shared_gen() else {
>> +        return;
>> +    };
>> +
>> +    // If this process missed a generation bump, its cache is stale.
>> +    if cache.shared_gen != shared_gen_now {
>> +        cache.reset_and_set_gen(shared_gen_now);
>> +    }
>> +
>> +    // If a mutation happened while we were verifying the secret, do not insert.
>> +    if shared_gen_now == shared_gen_before {
>> +        cache.insert_and_set_gen(tokenid, CachedSecret { secret }, shared_gen_now);
>> +    }
>> +}
>> +
>> +/// Tries to match the given token secret against the cached secret.
>> +///
>> +/// Verifies the generation/version before doing the constant-time
>> +/// comparison to reduce TOCTOU risk. During token rotation or deletion
>> +/// tokens for in-flight requests may still validate against the previous
>> +/// generation.
>> +fn cache_try_secret_matches(tokenid: &Authid, secret: &str) -> bool {
>> +    let Some(cache) = TOKEN_SECRET_CACHE.try_read() else {
>> +        return false;
>> +    };
>> +    let Some(entry) = cache.secrets.get(tokenid) else {
>> +        return false;
>> +    };
>> +    let Some(current_gen) = token_shadow_shared_gen() else {
>> +        return false;
>> +    };
>> +
>> +    if current_gen == cache.shared_gen {
>> +        return openssl::memcmp::eq(entry.secret.as_bytes(), secret.as_bytes());
> 
> tiny comment here: if we ever allow secrets to have different lengths
> this could panic:
> 
>> This function will panic the current task if a and b do not have the
>> same length.
>> - https://docs.rs/openssl/latest/openssl/memcmp/fn.eq.html
> 
> might be worth guarding against that or at least documenting that we
> expect these to always have the same length.

Thanks, yes, I agree. I will add a length check. On a length mismatch we
should invalidate/clear the cache, run the slow path and then cache the
secret with the new length.
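
A rough sketch of such a guard, under stated assumptions: secrets_match()
is a hypothetical helper name, and the XOR loop only stands in for
openssl::memcmp::eq so that the snippet builds without the openssl crate.
The real code would presumably keep calling openssl::memcmp::eq after the
length check.

```rust
/// Length-guarded byte comparison (illustrative stand-in, not the patch code).
fn secrets_match(cached: &[u8], candidate: &[u8]) -> bool {
    // openssl::memcmp::eq panics if the slices differ in length, so a
    // mismatch is rejected before any bytes are compared.
    if cached.len() != candidate.len() {
        return false;
    }
    // Fold all byte differences into one value so the loop does not
    // return at the first differing byte.
    let mut diff = 0u8;
    for (a, b) in cached.iter().zip(candidate.iter()) {
        diff |= a ^ b;
    }
    diff == 0
}
```

With this shape a candidate of a different length simply misses the fast
path and falls through to the slow path instead of panicking.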

> 
>> +    }
>> +
>> +    false
>> +}
>> +
>> +fn apply_api_mutation(_guard: BackupLockGuard, tokenid: &Authid, new_secret: Option<&str>) {
>> +    // Signal cache invalidation to other processes (best-effort).
>> +    let bumped_gen = bump_token_shadow_shared_gen();
>> +
>> +    let mut cache = TOKEN_SECRET_CACHE.write();
>> +
>> +    // If we cannot get the current generation, we cannot trust the cache
>> +    let Some(current_gen) = token_shadow_shared_gen() else {
>> +        cache.reset_and_set_gen(0);
>> +        return;
>> +    };
>> +
>> +    // If we cannot bump the shared generation, or if it changed after
>> +    // obtaining the cache write lock, we cannot trust the cache
>> +    if bumped_gen != Some(current_gen) {
>> +        cache.reset_and_set_gen(current_gen);
>> +        return;
>> +    }
>> +
>> +    // Apply the new mutation.
>> +    match new_secret {
>> +        Some(secret) => {
>> +            let cached_secret = CachedSecret {
>> +                secret: secret.to_owned(),
>> +            };
>> +            cache.insert_and_set_gen(tokenid.clone(), cached_secret, current_gen);
>> +        }
>> +        None => cache.evict_and_set_gen(tokenid, current_gen),
>> +    }
>> +}
>> +
>> +/// Get the current shared generation.
>> +fn token_shadow_shared_gen() -> Option<usize> {
>> +    crate::ConfigVersionCache::new()
>> +        .ok()
>> +        .map(|cvc| cvc.token_shadow_generation())
>> +}
>> +
>> +/// Bump and return the new shared generation.
>> +fn bump_token_shadow_shared_gen() -> Option<usize> {
>> +    crate::ConfigVersionCache::new()
>> +        .ok()
>> +        .map(|cvc| cvc.increase_token_shadow_generation() + 1)
>> +}
> 






* [PATCH proxmox-backup v6 1/4] pbs-config: add token.shadow generation to ConfigVersionCache
  2026-03-03 16:49 14% [PATCH proxmox{-backup,,-datacenter-manager} v6 " Samuel Rufinatscha
@ 2026-03-03 16:49 17% ` Samuel Rufinatscha
  2026-03-03 16:49 11% ` [PATCH proxmox-backup v6 2/4] pbs-config: cache verified API token secrets Samuel Rufinatscha
                   ` (11 subsequent siblings)
  12 siblings, 0 replies; 117+ results
From: Samuel Rufinatscha @ 2026-03-03 16:49 UTC (permalink / raw)
  To: pbs-devel

Prepares the config version cache to support token_shadow caching.

Safety: the shmem mapping is fixed to 4096 bytes via the #[repr(C)]
union padding, and the new atomic is appended to the end of the
#[repr(C)] inner struct, so all existing field offsets stay unchanged.
Old processes keep accessing the same bytes and new processes consume
previously reserved padding.
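
The layout argument above can be checked mechanically. The following is a
minimal sketch, not the actual ConfigVersionCache definition: the struct
names, the magic field and the Mapping union are assumptions made for
illustration. Only the technique (a #[repr(C)] union padded to 4096 bytes,
with the new atomic appended last) follows the description.

```rust
use std::mem::{offset_of, size_of, ManuallyDrop};
use std::sync::atomic::AtomicUsize;

// Hypothetical "before" layout of the inner struct.
#[allow(dead_code)]
#[repr(C)]
struct InnerOld {
    magic: [u8; 8],
    datastore_generation: AtomicUsize,
}

// Hypothetical "after" layout with the new counter appended.
#[allow(dead_code)]
#[repr(C)]
struct InnerNew {
    magic: [u8; 8],
    datastore_generation: AtomicUsize,
    // Appended last, so it occupies bytes that were padding before.
    token_shadow_generation: AtomicUsize,
}

/// Fixed-size mapping: the padding member pins the union to 4096 bytes.
#[allow(dead_code)]
#[repr(C)]
union Mapping<T> {
    data: ManuallyDrop<T>,
    _padding: [u8; 4096],
}

/// True if the mapping size and the offsets of existing fields are unchanged.
fn layout_is_compatible() -> bool {
    size_of::<Mapping<InnerOld>>() == 4096
        && size_of::<Mapping<InnerNew>>() == 4096
        && offset_of!(InnerOld, datastore_generation)
            == offset_of!(InnerNew, datastore_generation)
        && offset_of!(InnerNew, token_shadow_generation) >= size_of::<InnerOld>()
}
```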

Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>

---
Changes from v5 to v6:
* Rebased

Changes from v4 to v5:
* Rebased

Changes from v3 to v4:
* Rebased
* Adjusted commit message

Changes from v2 to v3:
* Rebased

Changes from v1 to v2:
* Rebased

 pbs-config/src/config_version_cache.rs | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/pbs-config/src/config_version_cache.rs b/pbs-config/src/config_version_cache.rs
index b875f7e0..399a6f79 100644
--- a/pbs-config/src/config_version_cache.rs
+++ b/pbs-config/src/config_version_cache.rs
@@ -27,6 +27,8 @@ struct ConfigVersionCacheDataInner {
     traffic_control_generation: AtomicUsize,
     // datastore (datastore.cfg) generation/version
     datastore_generation: AtomicUsize,
+    // Token shadow (token.shadow) generation/version.
+    token_shadow_generation: AtomicUsize,
     // Add further atomics here
 }
 
@@ -159,4 +161,20 @@ impl ConfigVersionCache {
             .datastore_generation
             .fetch_add(1, Ordering::AcqRel)
     }
+
+    /// Returns the token shadow generation number.
+    pub fn token_shadow_generation(&self) -> usize {
+        self.shmem
+            .data()
+            .token_shadow_generation
+            .load(Ordering::Acquire)
+    }
+
+    /// Increase the token shadow generation number.
+    pub fn increase_token_shadow_generation(&self) -> usize {
+        self.shmem
+            .data()
+            .token_shadow_generation
+            .fetch_add(1, Ordering::AcqRel)
+    }
 }
-- 
2.47.3






* [PATCH proxmox v6 4/4] proxmox-access-control: add TTL window to token secret cache
  2026-03-03 16:49 14% [PATCH proxmox{-backup,,-datacenter-manager} v6 " Samuel Rufinatscha
                   ` (6 preceding siblings ...)
  2026-03-03 16:49 12% ` [PATCH proxmox v6 3/4] proxmox-access-control: invalidate token-secret cache on token.shadow changes Samuel Rufinatscha
@ 2026-03-03 16:49 15% ` Samuel Rufinatscha
  2026-03-03 16:49 13% ` [PATCH proxmox-datacenter-manager v6 1/3] pdm-config: implement token.shadow generation Samuel Rufinatscha
                   ` (4 subsequent siblings)
  12 siblings, 0 replies; 117+ results
From: Samuel Rufinatscha @ 2026-03-03 16:49 UTC (permalink / raw)
  To: pbs-devel

verify_secret() currently calls refresh_cache_if_file_changed() on every
request, which performs a metadata() call on token.shadow each time.
Under load this adds unnecessary overhead, especially since the file
rarely changes.

This patch introduces a TTL boundary, controlled by
TOKEN_SECRET_CACHE_TTL_SECS. File metadata is only re-read once the
TTL has expired. The patch also documents the TTL effects.
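
The boundary behaviour of the TTL window can be sketched in isolation.
This is a simplified, free-standing version for illustration: TTL_SECS,
ShadowInfo and within_ttl() are assumed names, and the cache and locking
around the check are left out.

```rust
/// Assumed TTL in seconds, matching the 60 s value used in the patch.
const TTL_SECS: i64 = 60;

/// Hypothetical stand-in for the cached token.shadow metadata.
struct ShadowInfo {
    last_checked: i64,
}

/// True only if metadata exists and was checked less than TTL_SECS ago.
/// A clock that jumped backwards (now < last_checked) counts as expired.
fn within_ttl(shadow: Option<&ShadowInfo>, now: i64) -> bool {
    shadow.is_some_and(|cached| {
        now >= cached.last_checked && (now - cached.last_checked) < TTL_SECS
    })
}
```

A check at last_checked + 59 is still inside the window, while
last_checked + 60 forces a new stat of the file.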

Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
---
Changes from v5 to v6:
* Rebased

Changes from v4 to v5:
* Rebased
* Introduce shadow_check_within_ttl() helper

Changes from v3 to v4:
* Adjusted commit message

Changes from v2 to v3:
* Refactored refresh_cache_if_file_changed TTL logic.
* Remove had_prior_state check (replaced by last_checked logic).
* Improve TTL bound checks.
* Reword documentation warning for clarity.

Changes from v1 to v2:
* Add TOKEN_SECRET_CACHE_TTL_SECS and last_checked.
* Implement double-checked TTL: check with try_read first; only attempt
  refresh with try_write if expired/unknown.
* Fix TTL bookkeeping: update last_checked on the “file unchanged” path
  and after API mutations.
* Add documentation warning about TTL-delayed effect of manual
  token.shadow edits.

 proxmox-access-control/src/token_shadow.rs | 31 +++++++++++++++++++++-
 1 file changed, 30 insertions(+), 1 deletion(-)

diff --git a/proxmox-access-control/src/token_shadow.rs b/proxmox-access-control/src/token_shadow.rs
index d237b63b..a0eb2317 100644
--- a/proxmox-access-control/src/token_shadow.rs
+++ b/proxmox-access-control/src/token_shadow.rs
@@ -28,6 +28,9 @@ static TOKEN_SECRET_CACHE: LazyLock<RwLock<ApiTokenSecretCache>> = LazyLock::new
     })
 });
 
+/// Max age in seconds of the token secret cache before checking for file changes.
+const TOKEN_SECRET_CACHE_TTL_SECS: i64 = 60;
+
 // Get exclusive lock
 fn lock_config() -> Result<ApiLockGuard, Error> {
     open_api_lockfile(token_shadow_lock(), None, true)
@@ -55,11 +58,24 @@ fn write_file(data: HashMap<Authid, String>) -> Result<(), Error> {
 fn refresh_cache_if_file_changed() -> bool {
     let now = epoch_i64();
 
-    // Best-effort refresh under write lock.
+    // Fast path: cache is fresh if shared-gen matches and TTL not expired.
+    if let (Some(cache), Some(shared_gen_read)) =
+        (TOKEN_SECRET_CACHE.try_read(), token_shadow_shared_gen())
+    {
+        if cache.shared_gen == shared_gen_read && cache.shadow_check_within_ttl(now) {
+            return true;
+        }
+        // read lock drops here
+    } else {
+        return false;
+    }
+
+    // Slow path: best-effort refresh under write lock.
     let Some(mut cache) = TOKEN_SECRET_CACHE.try_write() else {
         return false;
     };
 
+    // Re-read generation after acquiring the lock (may have changed meanwhile).
     let Some(shared_gen_now) = token_shadow_shared_gen() else {
         return false;
     };
@@ -69,6 +85,12 @@ fn refresh_cache_if_file_changed() -> bool {
         cache.reset_and_set_gen(shared_gen_now);
     }
 
+    // TTL check again after acquiring the lock
+    let now = epoch_i64();
+    if cache.shadow_check_within_ttl(now) {
+        return true;
+    }
+
     // Stat the file to detect manual edits.
     let Ok((new_mtime, new_len)) = shadow_mtime_len() else {
         return false;
@@ -217,6 +239,13 @@ impl ApiTokenSecretCache {
         self.secrets.remove(tokenid);
         self.shared_gen = gen;
     }
+
+    /// Returns true if cached token.shadow metadata exists and was checked within the TTL window.
+    fn shadow_check_within_ttl(&self, now: i64) -> bool {
+        self.shadow.as_ref().is_some_and(|cached| {
+            now >= cached.last_checked && (now - cached.last_checked) < TOKEN_SECRET_CACHE_TTL_SECS
+        })
+    }
 }
 
 /// Shadow file info
-- 
2.47.3






* [PATCH proxmox-backup v6 4/4] pbs-config: add TTL window to token secret cache
  2026-03-03 16:49 14% [PATCH proxmox{-backup,,-datacenter-manager} v6 " Samuel Rufinatscha
                   ` (2 preceding siblings ...)
  2026-03-03 16:49 12% ` [PATCH proxmox-backup v6 3/4] pbs-config: invalidate token-secret cache on token.shadow changes Samuel Rufinatscha
@ 2026-03-03 16:49 15% ` Samuel Rufinatscha
  2026-03-03 16:49 14% ` [PATCH proxmox v6 1/4] proxmox-access-control: split AccessControlConfig and add token.shadow gen Samuel Rufinatscha
                   ` (8 subsequent siblings)
  12 siblings, 0 replies; 117+ results
From: Samuel Rufinatscha @ 2026-03-03 16:49 UTC (permalink / raw)
  To: pbs-devel

verify_secret() currently calls refresh_cache_if_file_changed() on every
request, which performs a metadata() call on token.shadow each time.
Under load this adds unnecessary overhead, especially since the file
rarely changes.

This patch introduces a TTL boundary, controlled by
TOKEN_SECRET_CACHE_TTL_SECS. File metadata is only re-read once the
TTL has expired. The patch also documents the TTL effects.

Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
---
Changes from v5 to v6:
* Rebased

Changes from v4 to v5:
* Rebased
* Introduce shadow_check_within_ttl() helper

Changes from v3 to v4:
* Adjusted commit message

Changes from v2 to v3:
* Refactored refresh_cache_if_file_changed TTL logic.
* Remove had_prior_state check (replaced by last_checked logic).
* Improve TTL bound checks.
* Reword documentation warning for clarity.

Changes from v1 to v2:
* Add TOKEN_SECRET_CACHE_TTL_SECS and last_checked.
* Implement double-checked TTL: check with try_read first; only attempt
  refresh with try_write if expired/unknown.
* Fix TTL bookkeeping: update last_checked on the “file unchanged” path
  and after API mutations.
* Add documentation warning about TTL-delayed effect of manual
  token.shadow edits.

 docs/user-management.rst       |  4 ++++
 pbs-config/src/token_shadow.rs | 30 +++++++++++++++++++++++++++++-
 2 files changed, 33 insertions(+), 1 deletion(-)

diff --git a/docs/user-management.rst b/docs/user-management.rst
index 41b43d60..8dfae528 100644
--- a/docs/user-management.rst
+++ b/docs/user-management.rst
@@ -156,6 +156,10 @@ metadata:
 Similarly, the ``user delete-token`` subcommand can be used to delete a token
 again.
 
+.. WARNING:: Direct/manual edits to ``token.shadow`` may take up to 60 seconds (or
+   longer in edge cases) to take effect due to caching. Restart services for
+   immediate effect of manual edits.
+
 Newly generated API tokens don't have any permissions. Please read the next
 section to learn how to set access permissions.
 
diff --git a/pbs-config/src/token_shadow.rs b/pbs-config/src/token_shadow.rs
index dc1da728..f043ba38 100644
--- a/pbs-config/src/token_shadow.rs
+++ b/pbs-config/src/token_shadow.rs
@@ -31,6 +31,8 @@ static TOKEN_SECRET_CACHE: LazyLock<RwLock<ApiTokenSecretCache>> = LazyLock::new
         shadow: None,
     })
 });
+/// Max age in seconds of the token secret cache before checking for file changes.
+const TOKEN_SECRET_CACHE_TTL_SECS: i64 = 60;
 
 #[derive(Serialize, Deserialize)]
 #[serde(rename_all = "kebab-case")]
@@ -72,11 +74,24 @@ fn write_file(data: HashMap<Authid, String>) -> Result<(), Error> {
 fn refresh_cache_if_file_changed() -> bool {
     let now = epoch_i64();
 
-    // Best-effort refresh under write lock.
+    // Fast path: cache is fresh if shared-gen matches and TTL not expired.
+    if let (Some(cache), Some(shared_gen_read)) =
+        (TOKEN_SECRET_CACHE.try_read(), token_shadow_shared_gen())
+    {
+        if cache.shared_gen == shared_gen_read && cache.shadow_check_within_ttl(now) {
+            return true;
+        }
+        // read lock drops here
+    } else {
+        return false;
+    }
+
+    // Slow path: best-effort refresh under write lock.
     let Some(mut cache) = TOKEN_SECRET_CACHE.try_write() else {
         return false;
     };
 
+    // Re-read generation after acquiring the lock (may have changed meanwhile).
     let Some(shared_gen_now) = token_shadow_shared_gen() else {
         return false;
     };
@@ -86,6 +101,12 @@ fn refresh_cache_if_file_changed() -> bool {
         cache.reset_and_set_gen(shared_gen_now);
     }
 
+    // TTL check again after acquiring the lock
+    let now = epoch_i64();
+    if cache.shadow_check_within_ttl(now) {
+        return true;
+    }
+
     // Stat the file to detect manual edits.
     let Ok((new_mtime, new_len)) = shadow_mtime_len() else {
         return false;
@@ -234,6 +255,13 @@ impl ApiTokenSecretCache {
         self.secrets.remove(tokenid);
         self.shared_gen = gen;
     }
+
+    /// Returns true if cached token.shadow metadata exists and was checked within the TTL window.
+    fn shadow_check_within_ttl(&self, now: i64) -> bool {
+        self.shadow.as_ref().is_some_and(|cached| {
+            now >= cached.last_checked && (now - cached.last_checked) < TOKEN_SECRET_CACHE_TTL_SECS
+        })
+    }
 }
 
 /// Shadow file info
-- 
2.47.3






* [PATCH proxmox{-backup,,-datacenter-manager} v6 00/11] token-shadow: reduce api token verification overhead
@ 2026-03-03 16:49 14% Samuel Rufinatscha
  2026-03-03 16:49 17% ` [PATCH proxmox-backup v6 1/4] pbs-config: add token.shadow generation to ConfigVersionCache Samuel Rufinatscha
                   ` (12 more replies)
  0 siblings, 13 replies; 117+ results
From: Samuel Rufinatscha @ 2026-03-03 16:49 UTC (permalink / raw)
  To: pbs-devel

Hi,

this series improves the performance of token-based API authentication
in PBS (pbs-config) and in PDM (underlying proxmox-access-control
crate), addressing the API token verification hotspot reported in our
bugtracker #7017 [1].

When profiling the PBS /status endpoint with cargo flamegraph [2],
token-based authentication showed up as a dominant hotspot via
proxmox_sys::crypt::verify_crypt_pw. Applying this series removes that
path from the hot section of the flamegraph. The same performance issue
was measured [2] for PDM. PDM uses the underlying shared
proxmox-access-control library for token handling, which is a
factored-out version of the token.shadow handling code from PBS.

While this series fixes the immediate performance issue both in PBS
(pbs-config) and in the shared proxmox-access-control crate used by
PDM, PBS should ideally be refactored, in a separate effort, to use
proxmox-access-control for token handling instead of its local
implementation.

Approach

The goal is to reduce the cost of token-based authentication while
preserving the existing token handling semantics (including detecting
manual edits to token.shadow) and staying consistent between PBS
(pbs-config) and PDM (proxmox-access-control). For both, this series
proposes to:

1. Introduce an in-memory cache for verified token secrets and
   invalidate it through a shared ConfigVersionCache generation. Note
   that a shared generation is required to keep the privileged and
   unprivileged daemons in sync and avoid caching inconsistencies
   across processes.
2. Invalidate on token.shadow API changes (set_secret, delete_secret)
3. Invalidate on direct/manual token.shadow file changes (mtime +
   length)
4. Avoid per-request file stat calls using a TTL window
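
The generation-based invalidation from points 1 and 2 can be sketched
roughly as follows. This is a simplified, single-process illustration and
not the patch code: SecretCache and its methods are assumed names, a plain
AtomicUsize stands in for the shared ConfigVersionCache counter, token IDs
are plain strings, and the string comparison is not constant-time.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical cache keyed by token ID, tagged with the generation it saw.
pub struct SecretCache {
    secrets: HashMap<String, String>,
    seen_gen: usize,
}

impl SecretCache {
    pub fn new() -> Self {
        SecretCache { secrets: HashMap::new(), seen_gen: 0 }
    }

    /// Drop all entries if the shared counter moved since the last look.
    fn sync(&mut self, shared: &AtomicUsize) -> usize {
        let now = shared.load(Ordering::Acquire);
        if now != self.seen_gen {
            self.secrets.clear();
            self.seen_gen = now;
        }
        now
    }

    /// Insert only if no mutation happened since `gen_before` was captured.
    pub fn try_insert(
        &mut self,
        shared: &AtomicUsize,
        gen_before: usize,
        id: &str,
        secret: &str,
    ) -> bool {
        let now = self.sync(shared);
        if now != gen_before {
            return false;
        }
        self.secrets.insert(id.to_owned(), secret.to_owned());
        true
    }

    /// Fast path: a hit counts only while the generation is unchanged.
    pub fn matches(&self, shared: &AtomicUsize, id: &str, secret: &str) -> bool {
        shared.load(Ordering::Acquire) == self.seen_gen
            && self.secrets.get(id).is_some_and(|s| s.as_str() == secret)
    }
}
```

A writer bumps the shared counter after changing token.shadow. Any cache
that still carries the old generation then stops serving hits and is
cleared on its next insert attempt.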

Testing

To verify the effect in PBS (pbs-config changes), I:
1. Set up test environment based on latest PBS ISO, installed Rust
   toolchain, cloned proxmox-backup repository to use with cargo
   flamegraph. Reproduced bug #7017 [1] by profiling the /status
   endpoint with token-based authentication using cargo flamegraph [2].
2. Built PBS with pbs-config patches and re-ran the same workload and
   profiling setup. Confirmed that
   proxmox_sys::crypt::verify_crypt_pw path no longer appears in the
   hot section of the flamegraph. CPU usage is now dominated by TLS
   overhead.
3. Functionally, I verified that:
   * valid tokens authenticate correctly when used in API requests
   * invalid secrets are rejected as before
   * generating a new token secret via dashboard (create token for
   user, regenerate existing secret) works and authenticates correctly

To verify the effect in PDM (proxmox-access-control changes), instead
of PBS’ /status, I profiled the /version endpoint with cargo flamegraph
[2] and verified that the expensive hashing path disappears from the
hot section after introducing caching. Functionally, I verified
that:
   * valid tokens authenticate correctly when used in API requests
   * invalid secrets are rejected as before
   * generating a new token secret via dashboard (create token for user,
   regenerate existing secret) works and authenticates correctly

Results

To measure the caching effect, I benchmarked parallel token auth requests
for /status?verbose=0 on top of the datastore lookup cache series [3]
to check throughput impact. With datastores=1, repeat=5000, parallel=16
this series gives ~172 req/s compared to ~65 req/s without it.
This is a ~2.6x improvement (and aligns with the ~179 req/s from the
previous series, which used per-process cache invalidation).

Patch summary

pbs-config:
0001 – pbs-config: add token.shadow generation to ConfigVersionCache
0002 – pbs-config: cache verified API token secrets
0003 – pbs-config: invalidate token-secret cache on token.shadow
changes
0004 – pbs-config: add TTL window to token-secret cache

proxmox-access-control:
0005 – access-control: extend AccessControlConfig for token.shadow invalidation
0006 – access-control: cache verified API token secrets
0007 – access-control: invalidate token-secret cache on token.shadow changes
0008 – access-control: add TTL window to token-secret cache

proxmox-datacenter-manager:
0009 – pdm-config: add token.shadow generation to ConfigVersionCache
0010 – docs: document API token-cache TTL effects
0011 – pdm-config: wire user+acl cache generation

Maintainer Notes:
* proxmox-access-control trait split: permissions now live in
 AccessControlPermissions, and AccessControlConfig now requires
 fn permissions(&self) -> &dyn AccessControlPermissions ->
 version bump
* Renames ConfigVersionCache's pub user_cache_generation and
 increase_user_cache_generation -> version bump
* Adds parking_lot::RwLock dependency in PBS and proxmox-access-control

This version and the previous one only incorporate the reviewers'
feedback [4][5]. Please also consider Christian's R-b tag [4].

[1] https://bugzilla.proxmox.com/show_bug.cgi?id=7017
[2] attachment 1767 [1]: Flamegraph showing the proxmox_sys::crypt::verify_crypt_pw stack
[3] https://bugzilla.proxmox.com/show_bug.cgi?id=6049
[4] https://lore.proxmox.com/pbs-devel/20260121151408.731516-1-s.rufinatscha@proxmox.com/T/#t
[5] https://lore.proxmox.com/pbs-devel/20260217111229.78661-1-s.rufinatscha@proxmox.com/T/#t

proxmox-backup:

Samuel Rufinatscha (4):
  pbs-config: add token.shadow generation to ConfigVersionCache
  pbs-config: cache verified API token secrets
  pbs-config: invalidate token-secret cache on token.shadow changes
  pbs-config: add TTL window to token secret cache

 Cargo.toml                             |   1 +
 docs/user-management.rst               |   4 +
 pbs-config/Cargo.toml                  |   1 +
 pbs-config/src/config_version_cache.rs |  18 ++
 pbs-config/src/token_shadow.rs         | 314 ++++++++++++++++++++++++-
 5 files changed, 335 insertions(+), 3 deletions(-)


proxmox:

Samuel Rufinatscha (4):
  proxmox-access-control: split AccessControlConfig and add token.shadow
    gen
  proxmox-access-control: cache verified API token secrets
  proxmox-access-control: invalidate token-secret cache on token.shadow
    changes
  proxmox-access-control: add TTL window to token secret cache

 Cargo.toml                                 |   1 +
 proxmox-access-control/Cargo.toml          |   1 +
 proxmox-access-control/src/acl.rs          |  10 +-
 proxmox-access-control/src/init.rs         | 113 ++++++--
 proxmox-access-control/src/token_shadow.rs | 315 ++++++++++++++++++++-
 5 files changed, 413 insertions(+), 27 deletions(-)


proxmox-datacenter-manager:

Samuel Rufinatscha (3):
  pdm-config: implement token.shadow generation
  docs: document API token-cache TTL effects
  pdm-config: wire user+acl cache generation

 cli/admin/src/main.rs                      |  2 +-
 docs/access-control.rst                    |  4 +++
 lib/pdm-api-types/src/acl.rs               |  4 +--
 lib/pdm-config/Cargo.toml                  |  1 +
 lib/pdm-config/src/access_control.rs       | 31 ++++++++++++++++++++
 lib/pdm-config/src/config_version_cache.rs | 34 +++++++++++++++++-----
 lib/pdm-config/src/lib.rs                  |  2 ++
 server/src/acl.rs                          |  3 +-
 ui/src/main.rs                             | 10 ++++++-
 9 files changed, 77 insertions(+), 14 deletions(-)
 create mode 100644 lib/pdm-config/src/access_control.rs


Summary over all repositories:
  19 files changed, 825 insertions(+), 44 deletions(-)

-- 
Generated by git-murpp 0.8.1




^ permalink raw reply	[relevance 14%]

* [PATCH proxmox v6 3/4] proxmox-access-control: invalidate token-secret cache on token.shadow changes
  2026-03-03 16:49 14% [PATCH proxmox{-backup,,-datacenter-manager} v6 " Samuel Rufinatscha
                   ` (5 preceding siblings ...)
  2026-03-03 16:49 11% ` [PATCH proxmox v6 2/4] proxmox-access-control: cache verified API token secrets Samuel Rufinatscha
@ 2026-03-03 16:49 12% ` Samuel Rufinatscha
  2026-03-03 16:49 15% ` [PATCH proxmox v6 4/4] proxmox-access-control: add TTL window to token secret cache Samuel Rufinatscha
                   ` (5 subsequent siblings)
  12 siblings, 0 replies; 117+ results
From: Samuel Rufinatscha @ 2026-03-03 16:49 UTC (permalink / raw)
  To: pbs-devel

This patch adds manual/direct file change detection by tracking the
mtime and length of token.shadow and clears the in-memory token secret
cache whenever these values change.
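
To illustrate the detection idea in isolation, the following sketch
compares a stored (mtime, length) snapshot with a fresh stat of the
file. stat_snapshot and changed_since are hypothetical helper names,
and the sketch uses only the standard library. Note that an edit which
keeps both the mtime and the length unchanged would go unnoticed by
such a comparison.

```rust
use std::fs;
use std::io;
use std::path::Path;
use std::time::SystemTime;

/// (mtime, length) of a file; both are None if the file does not exist.
type Snapshot = (Option<SystemTime>, Option<u64>);

/// Stats the file and returns its metadata snapshot.
/// A missing file is not treated as an error.
fn stat_snapshot(path: &Path) -> io::Result<Snapshot> {
    match fs::metadata(path) {
        Ok(meta) => Ok((meta.modified().ok(), Some(meta.len()))),
        Err(e) if e.kind() == io::ErrorKind::NotFound => Ok((None, None)),
        Err(e) => Err(e),
    }
}

/// Returns true if the file metadata differs from the previous snapshot.
fn changed_since(path: &Path, previous: &Snapshot) -> io::Result<bool> {
    Ok(stat_snapshot(path)? != *previous)
}
```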

Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
---
Changes from v5 to v6:
* Rebased

Changes from v4 to v5:
* Rebased

Changes from v3 to v4:
* make use of .replace() in refresh_cache_if_file_changed to get
previous state
* Group file stats with ShadowFileInfo
* Return false in refresh_cache_if_file_changed to avoid unnecessary cache
queries
* Adjusted commit message

Changes from v2 to v3:
* Cache now tracks last_checked (epoch seconds).
* Simplified refresh_cache_if_file_changed, removed
FILE_GENERATION logic
* On first load, initializes file metadata and keeps empty cache.

Changes from v1 to v2:
* Add file metadata tracking (file_mtime, file_len) and
  FILE_GENERATION.
* Store file_gen in CachedSecret and verify it against the current
  FILE_GENERATION to ensure cached entries belong to the current file
  state.
* Add shadow_mtime_len() helper and convert refresh to best-effort
  (try_write, returns bool).
* Pass a pre-write metadata snapshot into apply_api_mutation and
  clear/bump generation if the cache metadata indicates missed external
  edits.

 proxmox-access-control/src/token_shadow.rs | 123 ++++++++++++++++++++-
 1 file changed, 119 insertions(+), 4 deletions(-)

diff --git a/proxmox-access-control/src/token_shadow.rs b/proxmox-access-control/src/token_shadow.rs
index 79e78555..d237b63b 100644
--- a/proxmox-access-control/src/token_shadow.rs
+++ b/proxmox-access-control/src/token_shadow.rs
@@ -1,5 +1,8 @@
 use std::collections::HashMap;
+use std::fs;
+use std::io::ErrorKind;
 use std::sync::LazyLock;
+use std::time::SystemTime;
 
 use anyhow::{bail, format_err, Error};
 use parking_lot::RwLock;
@@ -7,6 +10,7 @@ use serde_json::{from_value, Value};
 
 use proxmox_auth_api::types::Authid;
 use proxmox_product_config::{open_api_lockfile, replace_config, ApiLockGuard};
+use proxmox_time::epoch_i64;
 
 use crate::init::access_conf;
 use crate::init::impl_feature::{token_shadow, token_shadow_lock};
@@ -20,6 +24,7 @@ static TOKEN_SECRET_CACHE: LazyLock<RwLock<ApiTokenSecretCache>> = LazyLock::new
     RwLock::new(ApiTokenSecretCache {
         secrets: HashMap::new(),
         shared_gen: 0,
+        shadow: None,
     })
 });
 
@@ -45,6 +50,56 @@ fn write_file(data: HashMap<Authid, String>) -> Result<(), Error> {
     replace_config(token_shadow(), &json)
 }
 
+/// Refreshes the in-memory cache if the on-disk token.shadow file changed.
+/// Returns true if the cache is valid to use, false if not.
+fn refresh_cache_if_file_changed() -> bool {
+    let now = epoch_i64();
+
+    // Best-effort refresh under write lock.
+    let Some(mut cache) = TOKEN_SECRET_CACHE.try_write() else {
+        return false;
+    };
+
+    let Some(shared_gen_now) = token_shadow_shared_gen() else {
+        return false;
+    };
+
+    // If another process bumped the generation, we don't know what changed -> clear cache
+    if cache.shared_gen != shared_gen_now {
+        cache.reset_and_set_gen(shared_gen_now);
+    }
+
+    // Stat the file to detect manual edits.
+    let Ok((new_mtime, new_len)) = shadow_mtime_len() else {
+        return false;
+    };
+
+    // If the file didn't change, only update last_checked
+    if let Some(shadow) = cache.shadow.as_mut() {
+        if shadow.mtime == new_mtime && shadow.len == new_len {
+            shadow.last_checked = now;
+            return true;
+        }
+    }
+
+    cache.secrets.clear();
+
+    let prev = cache.shadow.replace(ShadowFileInfo {
+        mtime: new_mtime,
+        len: new_len,
+        last_checked: now,
+    });
+
+    if prev.is_some() {
+        // Best-effort propagation to other processes if a change was detected
+        if let Some(shared_gen_new) = bump_token_shadow_shared_gen() {
+            cache.shared_gen = shared_gen_new;
+        }
+    }
+
+    false
+}
+
 /// Verifies that an entry for given tokenid / API token secret exists
 pub fn verify_secret(tokenid: &Authid, secret: &str) -> Result<(), Error> {
     if !tokenid.is_token() {
@@ -52,7 +107,7 @@ pub fn verify_secret(tokenid: &Authid, secret: &str) -> Result<(), Error> {
     }
 
     // Fast path
-    if cache_try_secret_matches(tokenid, secret) {
+    if refresh_cache_if_file_changed() && cache_try_secret_matches(tokenid, secret) {
         return Ok(());
     }
 
@@ -84,12 +139,15 @@ pub fn set_secret(tokenid: &Authid, secret: &str) -> Result<(), Error> {
 
     let guard = lock_config()?;
 
+    // Capture state before we write to detect external edits.
+    let pre_meta = shadow_mtime_len().unwrap_or((None, None));
+
     let mut data = read_file()?;
     let hashed_secret = proxmox_sys::crypt::encrypt_pw(secret)?;
     data.insert(tokenid.clone(), hashed_secret);
     write_file(data)?;
 
-    apply_api_mutation(guard, tokenid, Some(secret));
+    apply_api_mutation(guard, tokenid, Some(secret), pre_meta);
 
     Ok(())
 }
@@ -102,11 +160,14 @@ pub fn delete_secret(tokenid: &Authid) -> Result<(), Error> {
 
     let guard = lock_config()?;
 
+    // Capture state before we write to detect external edits.
+    let pre_meta = shadow_mtime_len().unwrap_or((None, None));
+
     let mut data = read_file()?;
     data.remove(tokenid);
     write_file(data)?;
 
-    apply_api_mutation(guard, tokenid, None);
+    apply_api_mutation(guard, tokenid, None, pre_meta);
 
     Ok(())
 }
@@ -133,6 +194,8 @@ struct ApiTokenSecretCache {
     secrets: HashMap<Authid, CachedSecret>,
     /// Shared generation to detect mutations of the underlying token.shadow file.
     shared_gen: usize,
+    /// Shadow file info to detect changes
+    shadow: Option<ShadowFileInfo>,
 }
 
 impl ApiTokenSecretCache {
@@ -140,6 +203,7 @@ impl ApiTokenSecretCache {
     fn reset_and_set_gen(&mut self, gen: usize) {
         self.secrets.clear();
         self.shared_gen = gen;
+        self.shadow = None;
     }
 
     /// Caches a secret and sets/updates the cache generation.
@@ -155,6 +219,16 @@ impl ApiTokenSecretCache {
     }
 }
 
+/// Shadow file info
+struct ShadowFileInfo {
+    // shadow file mtime to detect changes
+    mtime: Option<SystemTime>,
+    // shadow file length to detect changes
+    len: Option<u64>,
+    // last time the file metadata was checked
+    last_checked: i64,
+}
+
 fn cache_try_insert_secret(tokenid: Authid, secret: String, shared_gen_before: usize) {
     let Some(mut cache) = TOKEN_SECRET_CACHE.try_write() else {
         return;
@@ -203,7 +277,14 @@ fn cache_try_secret_matches(tokenid: &Authid, secret: &str) -> bool {
     false
 }
 
-fn apply_api_mutation(_guard: ApiLockGuard, tokenid: &Authid, new_secret: Option<&str>) {
+fn apply_api_mutation(
+    _guard: ApiLockGuard,
+    tokenid: &Authid,
+    new_secret: Option<&str>,
+    pre_write_meta: (Option<SystemTime>, Option<u64>),
+) {
+    let now = epoch_i64();
+
     // Signal cache invalidation to other processes (best-effort).
     let bumped_gen = bump_token_shadow_shared_gen();
 
@@ -222,6 +303,16 @@ fn apply_api_mutation(_guard: ApiLockGuard, tokenid: &Authid, new_secret: Option
         return;
     }
 
+    // If our cached file metadata does not match the on-disk state before our write,
+    // we likely missed an external/manual edit. We can no longer trust any cached secrets.
+    if cache
+        .shadow
+        .as_ref()
+        .is_some_and(|s| (s.mtime, s.len) != pre_write_meta)
+    {
+        cache.secrets.clear();
+    }
+
     // Apply the new mutation.
     match new_secret {
         Some(secret) => {
@@ -232,6 +323,22 @@ fn apply_api_mutation(_guard: ApiLockGuard, tokenid: &Authid, new_secret: Option
         }
         None => cache.evict_and_set_gen(tokenid, current_gen),
     }
+
+    // Update our view of the file metadata to the post-write state (best-effort).
+    // (If this fails, drop local cache so callers fall back to slow path until refreshed.)
+    match shadow_mtime_len() {
+        Ok((mtime, len)) => {
+            cache.shadow = Some(ShadowFileInfo {
+                mtime,
+                len,
+                last_checked: now,
+            });
+        }
+        Err(_) => {
+            // If we cannot validate state, do not trust cache.
+            cache.reset_and_set_gen(current_gen);
+        }
+    }
 }
 
 /// Get the current shared generation.
@@ -246,3 +353,11 @@ fn bump_token_shadow_shared_gen() -> Option<usize> {
         .ok()
         .map(|prev| prev + 1)
 }
+
+fn shadow_mtime_len() -> Result<(Option<SystemTime>, Option<u64>), Error> {
+    match fs::metadata(token_shadow()) {
+        Ok(meta) => Ok((meta.modified().ok(), Some(meta.len()))),
+        Err(e) if e.kind() == ErrorKind::NotFound => Ok((None, None)),
+        Err(e) => Err(e.into()),
+    }
+}
-- 
2.47.3





^ permalink raw reply related	[relevance 12%]

* [PATCH proxmox-backup v6 3/4] pbs-config: invalidate token-secret cache on token.shadow changes
  2026-03-03 16:49 14% [PATCH proxmox{-backup,,-datacenter-manager} v6 " Samuel Rufinatscha
  2026-03-03 16:49 17% ` [PATCH proxmox-backup v6 1/4] pbs-config: add token.shadow generation to ConfigVersionCache Samuel Rufinatscha
  2026-03-03 16:49 11% ` [PATCH proxmox-backup v6 2/4] pbs-config: cache verified API token secrets Samuel Rufinatscha
@ 2026-03-03 16:49 12% ` Samuel Rufinatscha
  2026-03-03 16:49 15% ` [PATCH proxmox-backup v6 4/4] pbs-config: add TTL window to token secret cache Samuel Rufinatscha
                   ` (9 subsequent siblings)
  12 siblings, 0 replies; 117+ results
From: Samuel Rufinatscha @ 2026-03-03 16:49 UTC (permalink / raw)
  To: pbs-devel

This patch adds manual/direct file change detection by tracking the
mtime and length of token.shadow and clears the in-memory token secret
cache whenever these values change.

Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
---
Changes from v5 to v6:
* Rebased

Changes from v4 to v5:
* Rebased

Changes from v3 to v4:
* make use of .replace() in refresh_cache_if_file_changed to get
previous state
* Group file stats with ShadowFileInfo
* Return false in refresh_cache_if_file_changed to avoid unnecessary cache
queries
* Adjusted commit message

Changes from v2 to v3:
* Cache now tracks last_checked (epoch seconds).
* Simplified refresh_cache_if_file_changed, removed
FILE_GENERATION logic
* On first load, initializes file metadata and keeps empty cache.

Changes from v1 to v2:
* Add file metadata tracking (file_mtime, file_len) and
  FILE_GENERATION.
* Store file_gen in CachedSecret and verify it against the current
  FILE_GENERATION to ensure cached entries belong to the current file
  state.
* Add shadow_mtime_len() helper and convert refresh to best-effort
  (try_write, returns bool).
* Pass a pre-write metadata snapshot into apply_api_mutation and
  clear/bump generation if the cache metadata indicates missed external
  edits.

 pbs-config/src/token_shadow.rs | 123 +++++++++++++++++++++++++++++++--
 1 file changed, 119 insertions(+), 4 deletions(-)

diff --git a/pbs-config/src/token_shadow.rs b/pbs-config/src/token_shadow.rs
index 77efac0e..dc1da728 100644
--- a/pbs-config/src/token_shadow.rs
+++ b/pbs-config/src/token_shadow.rs
@@ -1,5 +1,8 @@
 use std::collections::HashMap;
+use std::fs;
+use std::io::ErrorKind;
 use std::sync::LazyLock;
+use std::time::SystemTime;
 
 use anyhow::{bail, format_err, Error};
 use parking_lot::RwLock;
@@ -7,6 +10,7 @@ use serde::{Deserialize, Serialize};
 use serde_json::{from_value, Value};
 
 use proxmox_sys::fs::CreateOptions;
+use proxmox_time::epoch_i64;
 
 use pbs_api_types::Authid;
 //use crate::auth;
@@ -24,6 +28,7 @@ static TOKEN_SECRET_CACHE: LazyLock<RwLock<ApiTokenSecretCache>> = LazyLock::new
     RwLock::new(ApiTokenSecretCache {
         secrets: HashMap::new(),
         shared_gen: 0,
+        shadow: None,
     })
 });
 
@@ -62,6 +67,56 @@ fn write_file(data: HashMap<Authid, String>) -> Result<(), Error> {
     proxmox_sys::fs::replace_file(CONF_FILE, &json, options, true)
 }
 
+/// Refreshes the in-memory cache if the on-disk token.shadow file changed.
+/// Returns true if the cache is valid to use, false if not.
+fn refresh_cache_if_file_changed() -> bool {
+    let now = epoch_i64();
+
+    // Best-effort refresh under write lock.
+    let Some(mut cache) = TOKEN_SECRET_CACHE.try_write() else {
+        return false;
+    };
+
+    let Some(shared_gen_now) = token_shadow_shared_gen() else {
+        return false;
+    };
+
+    // If another process bumped the generation, we don't know what changed -> clear cache
+    if cache.shared_gen != shared_gen_now {
+        cache.reset_and_set_gen(shared_gen_now);
+    }
+
+    // Stat the file to detect manual edits.
+    let Ok((new_mtime, new_len)) = shadow_mtime_len() else {
+        return false;
+    };
+
+    // If the file didn't change, only update last_checked
+    if let Some(shadow) = cache.shadow.as_mut() {
+        if shadow.mtime == new_mtime && shadow.len == new_len {
+            shadow.last_checked = now;
+            return true;
+        }
+    }
+
+    cache.secrets.clear();
+
+    let prev = cache.shadow.replace(ShadowFileInfo {
+        mtime: new_mtime,
+        len: new_len,
+        last_checked: now,
+    });
+
+    if prev.is_some() {
+        // Best-effort propagation to other processes if a change was detected
+        if let Some(shared_gen_new) = bump_token_shadow_shared_gen() {
+            cache.shared_gen = shared_gen_new;
+        }
+    }
+
+    false
+}
+
 /// Verifies that an entry for given tokenid / API token secret exists
 pub fn verify_secret(tokenid: &Authid, secret: &str) -> Result<(), Error> {
     if !tokenid.is_token() {
@@ -69,7 +124,7 @@ pub fn verify_secret(tokenid: &Authid, secret: &str) -> Result<(), Error> {
     }
 
     // Fast path
-    if cache_try_secret_matches(tokenid, secret) {
+    if refresh_cache_if_file_changed() && cache_try_secret_matches(tokenid, secret) {
         return Ok(());
     }
 
@@ -109,12 +164,15 @@ fn set_secret(tokenid: &Authid, secret: &str) -> Result<(), Error> {
 
     let guard = lock_config()?;
 
+    // Capture state before we write to detect external edits.
+    let pre_meta = shadow_mtime_len().unwrap_or((None, None));
+
     let mut data = read_file()?;
     let hashed_secret = proxmox_sys::crypt::encrypt_pw(secret)?;
     data.insert(tokenid.clone(), hashed_secret);
     write_file(data)?;
 
-    apply_api_mutation(guard, tokenid, Some(secret));
+    apply_api_mutation(guard, tokenid, Some(secret), pre_meta);
 
     Ok(())
 }
@@ -127,11 +185,14 @@ pub fn delete_secret(tokenid: &Authid) -> Result<(), Error> {
 
     let guard = lock_config()?;
 
+    // Capture state before we write to detect external edits.
+    let pre_meta = shadow_mtime_len().unwrap_or((None, None));
+
     let mut data = read_file()?;
     data.remove(tokenid);
     write_file(data)?;
 
-    apply_api_mutation(guard, tokenid, None);
+    apply_api_mutation(guard, tokenid, None, pre_meta);
 
     Ok(())
 }
@@ -150,6 +211,8 @@ struct ApiTokenSecretCache {
     secrets: HashMap<Authid, CachedSecret>,
     /// Shared generation to detect mutations of the underlying token.shadow file.
     shared_gen: usize,
+    /// Shadow file info to detect changes
+    shadow: Option<ShadowFileInfo>,
 }
 
 impl ApiTokenSecretCache {
@@ -157,6 +220,7 @@ impl ApiTokenSecretCache {
     fn reset_and_set_gen(&mut self, gen: usize) {
         self.secrets.clear();
         self.shared_gen = gen;
+        self.shadow = None;
     }
 
     /// Caches a secret and sets/updates the cache generation.
@@ -172,6 +236,16 @@ impl ApiTokenSecretCache {
     }
 }
 
+/// Shadow file info
+struct ShadowFileInfo {
+    // shadow file mtime to detect changes
+    mtime: Option<SystemTime>,
+    // shadow file length to detect changes
+    len: Option<u64>,
+    // last time the file metadata was checked
+    last_checked: i64,
+}
+
 fn cache_try_insert_secret(tokenid: Authid, secret: String, shared_gen_before: usize) {
     let Some(mut cache) = TOKEN_SECRET_CACHE.try_write() else {
         return;
@@ -220,7 +294,14 @@ fn cache_try_secret_matches(tokenid: &Authid, secret: &str) -> bool {
     false
 }
 
-fn apply_api_mutation(_guard: BackupLockGuard, tokenid: &Authid, new_secret: Option<&str>) {
+fn apply_api_mutation(
+    _guard: BackupLockGuard,
+    tokenid: &Authid,
+    new_secret: Option<&str>,
+    pre_write_meta: (Option<SystemTime>, Option<u64>),
+) {
+    let now = epoch_i64();
+
     // Signal cache invalidation to other processes (best-effort).
     let bumped_gen = bump_token_shadow_shared_gen();
 
@@ -239,6 +320,16 @@ fn apply_api_mutation(_guard: BackupLockGuard, tokenid: &Authid, new_secret: Opt
         return;
     }
 
+    // If our cached file metadata does not match the on-disk state before our write,
+    // we likely missed an external/manual edit. We can no longer trust any cached secrets.
+    if cache
+        .shadow
+        .as_ref()
+        .is_some_and(|s| (s.mtime, s.len) != pre_write_meta)
+    {
+        cache.secrets.clear();
+    }
+
     // Apply the new mutation.
     match new_secret {
         Some(secret) => {
@@ -249,6 +340,22 @@ fn apply_api_mutation(_guard: BackupLockGuard, tokenid: &Authid, new_secret: Opt
         }
         None => cache.evict_and_set_gen(tokenid, current_gen),
     }
+
+    // Update our view of the file metadata to the post-write state (best-effort).
+    // (If this fails, drop local cache so callers fall back to slow path until refreshed.)
+    match shadow_mtime_len() {
+        Ok((mtime, len)) => {
+            cache.shadow = Some(ShadowFileInfo {
+                mtime,
+                len,
+                last_checked: now,
+            });
+        }
+        Err(_) => {
+            // If we cannot validate state, do not trust cache.
+            cache.reset_and_set_gen(current_gen);
+        }
+    }
 }
 
 /// Get the current shared generation.
@@ -264,3 +371,11 @@ fn bump_token_shadow_shared_gen() -> Option<usize> {
         .ok()
         .map(|cvc| cvc.increase_token_shadow_generation() + 1)
 }
+
+fn shadow_mtime_len() -> Result<(Option<SystemTime>, Option<u64>), Error> {
+    match fs::metadata(CONF_FILE) {
+        Ok(meta) => Ok((meta.modified().ok(), Some(meta.len()))),
+        Err(e) if e.kind() == ErrorKind::NotFound => Ok((None, None)),
+        Err(e) => Err(e.into()),
+    }
+}
-- 
2.47.3





^ permalink raw reply related	[relevance 12%]

* [PATCH proxmox-datacenter-manager v6 2/3] docs: document API token-cache TTL effects
  2026-03-03 16:49 14% [PATCH proxmox{-backup,,-datacenter-manager} v6 " Samuel Rufinatscha
                   ` (8 preceding siblings ...)
  2026-03-03 16:49 13% ` [PATCH proxmox-datacenter-manager v6 1/3] pdm-config: implement token.shadow generation Samuel Rufinatscha
@ 2026-03-03 16:49 17% ` Samuel Rufinatscha
  2026-03-03 16:50 16% ` [PATCH proxmox-datacenter-manager v6 3/3] pdm-config: wire user+acl cache generation Samuel Rufinatscha
                   ` (2 subsequent siblings)
  12 siblings, 0 replies; 117+ results
From: Samuel Rufinatscha @ 2026-03-03 16:49 UTC (permalink / raw)
  To: pbs-devel

Documents the effects of the added API token-cache in the
proxmox-access-control crate.

Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
---
Changes from v5 to v6:
* Rebased

Changes from v4 to v5:
* Rebased

Changes from v3 to v4:
* Adjusted commit message

 docs/access-control.rst | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/docs/access-control.rst b/docs/access-control.rst
index adf26cd..18e57a2 100644
--- a/docs/access-control.rst
+++ b/docs/access-control.rst
@@ -47,6 +47,10 @@ place of the user ID (``user@realm``) and the user password, respectively.
 The API token is passed from the client to the server by setting the ``Authorization`` HTTP header
 with method ``PDMAPIToken`` to the value ``TOKENID:TOKENSECRET``.
 
+.. WARNING:: Direct/manual edits to ``token.shadow`` may take up to 60 seconds (or
+   longer in edge cases) to take effect due to caching. Restart services for
+   immediate effect of manual edits.
+
 .. _access_control:
 
 Access Control
-- 
2.47.3





^ permalink raw reply related	[relevance 17%]

* [PATCH proxmox-datacenter-manager v6 3/3] pdm-config: wire user+acl cache generation
  2026-03-03 16:49 14% [PATCH proxmox{-backup,,-datacenter-manager} v6 " Samuel Rufinatscha
                   ` (9 preceding siblings ...)
  2026-03-03 16:49 17% ` [PATCH proxmox-datacenter-manager v6 2/3] docs: document API token-cache TTL effects Samuel Rufinatscha
@ 2026-03-03 16:50 16% ` Samuel Rufinatscha
  2026-03-11  8:59  6% ` [PATCH proxmox{-backup,,-datacenter-manager} v6 00/11] token-shadow: reduce api token verification overhead Fabian Grünbichler
  2026-03-12 10:38 13% ` [pbs-devel] superseded: " Samuel Rufinatscha
  12 siblings, 0 replies; 117+ results
From: Samuel Rufinatscha @ 2026-03-03 16:50 UTC (permalink / raw)
  To: pbs-devel

Rename ConfigVersionCache’s user_cache_generation to
user_and_acl_generation to match AccessControlConfig::cache_generation
and increment_cache_generation semantics: it expects the same shared
generation for both user and ACL configs.

Safety: no layout change, the shared-memory size and field order remain
unchanged.
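
The layout claim can be checked mechanically. The sketch below is an
illustration only: Before and After are hypothetical, trimmed-down
stand-ins for ConfigVersionCacheDataInner with just the first three
fields, and layouts_match and bump are hypothetical helpers. It shows
that renaming a field of a #[repr(C)] struct changes neither the size
nor the field offsets.

```rust
use std::mem::{offset_of, size_of};
use std::sync::atomic::{AtomicUsize, Ordering};

/// Trimmed-down stand-in for the struct before the rename.
#[repr(C)]
struct Before {
    magic: [u8; 8],
    user_cache_generation: AtomicUsize,
    traffic_control_generation: AtomicUsize,
}

/// Trimmed-down stand-in for the struct after the rename.
#[repr(C)]
struct After {
    magic: [u8; 8],
    user_and_acl_generation: AtomicUsize,
    traffic_control_generation: AtomicUsize,
}

/// Returns true if size and field offsets are identical in both layouts.
fn layouts_match() -> bool {
    size_of::<Before>() == size_of::<After>()
        && offset_of!(Before, magic) == offset_of!(After, magic)
        && offset_of!(Before, user_cache_generation)
            == offset_of!(After, user_and_acl_generation)
        && offset_of!(Before, traffic_control_generation)
            == offset_of!(After, traffic_control_generation)
}

/// Bumps a generation counter and returns the new value.
fn bump(counter: &AtomicUsize) -> usize {
    counter.fetch_add(1, Ordering::AcqRel).wrapping_add(1)
}
```

offset_of! requires Rust 1.77 or newer.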

Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
---
Changes from v5 to v6:
* Rebased

Changes from v4 to v5:
* Rebased

 lib/pdm-config/src/access_control.rs       | 11 +++++++++++
 lib/pdm-config/src/config_version_cache.rs | 16 ++++++++--------
 2 files changed, 19 insertions(+), 8 deletions(-)

diff --git a/lib/pdm-config/src/access_control.rs b/lib/pdm-config/src/access_control.rs
index 389b3f4..1d498d3 100644
--- a/lib/pdm-config/src/access_control.rs
+++ b/lib/pdm-config/src/access_control.rs
@@ -7,6 +7,17 @@ impl proxmox_access_control::init::AccessControlConfig for AccessControlConfig {
         &pdm_api_types::AccessControlPermissions
     }
 
+    fn cache_generation(&self) -> Option<usize> {
+        crate::ConfigVersionCache::new()
+            .ok()
+            .map(|c| c.user_and_acl_generation())
+    }
+
+    fn increment_cache_generation(&self) -> Result<(), Error> {
+        let c = crate::ConfigVersionCache::new()?;
+        Ok(c.increase_user_and_acl_generation())
+    }
+
     fn token_shadow_cache_generation(&self) -> Option<usize> {
         crate::ConfigVersionCache::new()
             .ok()
diff --git a/lib/pdm-config/src/config_version_cache.rs b/lib/pdm-config/src/config_version_cache.rs
index 933140c..f3d52a0 100644
--- a/lib/pdm-config/src/config_version_cache.rs
+++ b/lib/pdm-config/src/config_version_cache.rs
@@ -21,8 +21,8 @@ use proxmox_shared_memory::*;
 #[repr(C)]
 struct ConfigVersionCacheDataInner {
     magic: [u8; 8],
-    // User (user.cfg) cache generation/version.
-    user_cache_generation: AtomicUsize,
+    // User (user.cfg) and ACL (acl.cfg) generation/version.
+    user_and_acl_generation: AtomicUsize,
     // Traffic control (traffic-control.cfg) generation/version.
     traffic_control_generation: AtomicUsize,
     // Tracks updates to the remote/hostname/nodename mapping cache.
@@ -126,19 +126,19 @@ impl ConfigVersionCache {
         Ok(Arc::new(Self { shmem }))
     }
 
-    /// Returns the user cache generation number.
-    pub fn user_cache_generation(&self) -> usize {
+    /// Returns the user and ACL cache generation number.
+    pub fn user_and_acl_generation(&self) -> usize {
         self.shmem
             .data()
-            .user_cache_generation
+            .user_and_acl_generation
             .load(Ordering::Acquire)
     }
 
-    /// Increase the user cache generation number.
-    pub fn increase_user_cache_generation(&self) {
+    /// Increase the user and ACL cache generation number.
+    pub fn increase_user_and_acl_generation(&self) {
         self.shmem
             .data()
-            .user_cache_generation
+            .user_and_acl_generation
             .fetch_add(1, Ordering::AcqRel);
     }
 
-- 
2.47.3





^ permalink raw reply related	[relevance 16%]

* [PATCH proxmox v6 1/4] proxmox-access-control: split AccessControlConfig and add token.shadow gen
  2026-03-03 16:49 14% [PATCH proxmox{-backup,,-datacenter-manager} v6 " Samuel Rufinatscha
                   ` (3 preceding siblings ...)
  2026-03-03 16:49 15% ` [PATCH proxmox-backup v6 4/4] pbs-config: add TTL window to token secret cache Samuel Rufinatscha
@ 2026-03-03 16:49 14% ` Samuel Rufinatscha
  2026-03-03 16:49 11% ` [PATCH proxmox v6 2/4] proxmox-access-control: cache verified API token secrets Samuel Rufinatscha
                   ` (7 subsequent siblings)
  12 siblings, 0 replies; 117+ results
From: Samuel Rufinatscha @ 2026-03-03 16:49 UTC (permalink / raw)
  To: pbs-devel

Splits AccessControlConfig into permissions and config traits and adds
token.shadow generation support. The trait split separates permission
from cache/invalidation concerns while keeping existing call sites
working via default delegation.
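
A minimal sketch of what such a split with default delegation can look
like follows. It is heavily reduced and partly assumed: only the trait
names, permissions(), privileges() and token_shadow_cache_generation()
appear in this series, while DemoConfig and the delegating default
body are illustrative guesses.

```rust
use std::collections::HashMap;

/// Permission metadata only (reduced to a single method here).
trait AccessControlPermissions: Send + Sync {
    fn privileges(&self) -> &HashMap<&str, u64>;
}

/// Cache/invalidation concerns plus access to the permissions.
trait AccessControlConfig: Send + Sync {
    fn permissions(&self) -> &dyn AccessControlPermissions;

    /// Assumed default delegation, so callers holding an
    /// AccessControlConfig can keep calling privileges() on it.
    fn privileges(&self) -> &HashMap<&str, u64> {
        self.permissions().privileges()
    }

    /// Default: no token.shadow generation available.
    fn token_shadow_cache_generation(&self) -> Option<usize> {
        None
    }
}

/// Hypothetical product config implementing both traits.
struct DemoConfig {
    privileges: HashMap<&'static str, u64>,
}

impl DemoConfig {
    fn new() -> Self {
        Self {
            privileges: HashMap::from([("Sys.Audit", 1), ("Sys.Modify", 2)]),
        }
    }
}

impl AccessControlPermissions for DemoConfig {
    fn privileges(&self) -> &HashMap<&str, u64> {
        &self.privileges
    }
}

impl AccessControlConfig for DemoConfig {
    fn permissions(&self) -> &dyn AccessControlPermissions {
        self
    }
}
```

Calling through a &dyn AccessControlConfig keeps the method lookup
unambiguous even though both traits declare privileges() in this
sketch.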

Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
---
Changes from v5 to v6:
* Rebased

Changes from v4 to v5:
* Rebased

 proxmox-access-control/src/acl.rs  |  10 ++-
 proxmox-access-control/src/init.rs | 113 +++++++++++++++++++++++------
 2 files changed, 99 insertions(+), 24 deletions(-)

diff --git a/proxmox-access-control/src/acl.rs b/proxmox-access-control/src/acl.rs
index 38cb7edf..4b4eac09 100644
--- a/proxmox-access-control/src/acl.rs
+++ b/proxmox-access-control/src/acl.rs
@@ -763,7 +763,7 @@ fn privs_to_priv_names(privs: u64) -> Vec<&'static str> {
 mod test {
     use std::{collections::HashMap, sync::OnceLock};
 
-    use crate::init::{init_access_config, AccessControlConfig};
+    use crate::init::{init_access_config, AccessControlConfig, AccessControlPermissions};
 
     use super::AclTree;
     use anyhow::Error;
@@ -775,7 +775,7 @@ mod test {
         roles: HashMap<&'a str, (u64, &'a str)>,
     }
 
-    impl AccessControlConfig for TestAcmConfig<'_> {
+    impl AccessControlPermissions for TestAcmConfig<'_> {
         fn roles(&self) -> &HashMap<&str, (u64, &str)> {
             &self.roles
         }
@@ -793,6 +793,12 @@ mod test {
         }
     }
 
+    impl AccessControlConfig for TestAcmConfig<'_> {
+        fn permissions(&self) -> &dyn AccessControlPermissions {
+            self
+        }
+    }
+
     fn setup_acl_tree_config() {
         static ACL_CONFIG: OnceLock<TestAcmConfig> = OnceLock::new();
         let config = ACL_CONFIG.get_or_init(|| {
diff --git a/proxmox-access-control/src/init.rs b/proxmox-access-control/src/init.rs
index e64398e8..dfd7784b 100644
--- a/proxmox-access-control/src/init.rs
+++ b/proxmox-access-control/src/init.rs
@@ -8,9 +8,8 @@ use proxmox_section_config::SectionConfigData;
 
 static ACCESS_CONF: OnceLock<&'static dyn AccessControlConfig> = OnceLock::new();
 
-/// This trait specifies the functions a product needs to implement to get ACL tree based access
-/// control management from this plugin.
-pub trait AccessControlConfig: Send + Sync {
+/// Provides permission metadata used by access control.
+pub trait AccessControlPermissions: Send + Sync {
     /// Returns a mapping of all recognized privileges and their corresponding `u64` value.
     fn privileges(&self) -> &HashMap<&str, u64>;
 
@@ -32,25 +31,6 @@ pub trait AccessControlConfig: Send + Sync {
         false
     }
 
-    /// Returns the current cache generation of the user and acl configs. If the generation was
-    /// incremented since the last time the cache was queried, the configs are loaded again from
-    /// disk.
-    ///
-    /// Returning `None` will always reload the cache.
-    ///
-    /// Default: Always returns `None`.
-    fn cache_generation(&self) -> Option<usize> {
-        None
-    }
-
-    /// Increment the cache generation of user and acl configs. This indicates that they were
-    /// changed on disk.
-    ///
-    /// Default: Does nothing.
-    fn increment_cache_generation(&self) -> Result<(), Error> {
-        Ok(())
-    }
-
     /// Optionally returns a role that has no access to any resource.
     ///
     /// Default: Returns `None`.
@@ -103,6 +83,95 @@ pub trait AccessControlConfig: Send + Sync {
     }
 }
 
+/// This trait specifies the functions a product needs to implement to get ACL tree based access
+/// control management from this plugin.
+pub trait AccessControlConfig: Send + Sync {
+    /// Return the permissions provider.
+    fn permissions(&self) -> &dyn AccessControlPermissions;
+
+    fn privileges(&self) -> &HashMap<&str, u64> {
+        self.permissions().privileges()
+    }
+
+    fn roles(&self) -> &HashMap<&str, (u64, &str)> {
+        self.permissions().roles()
+    }
+
+    fn is_superuser(&self, auth_id: &Authid) -> bool {
+        self.permissions().is_superuser(auth_id)
+    }
+
+    fn is_group_member(&self, user_id: &Userid, group: &str) -> bool {
+        self.permissions().is_group_member(user_id, group)
+    }
+
+    fn role_no_access(&self) -> Option<&str> {
+        self.permissions().role_no_access()
+    }
+
+    fn role_admin(&self) -> Option<&str> {
+        self.permissions().role_admin()
+    }
+
+    fn init_user_config(&self, config: &mut SectionConfigData) -> Result<(), Error> {
+        self.permissions().init_user_config(config)
+    }
+
+    fn acl_audit_privileges(&self) -> u64 {
+        self.permissions().acl_audit_privileges()
+    }
+
+    fn acl_modify_privileges(&self) -> u64 {
+        self.permissions().acl_modify_privileges()
+    }
+
+    fn check_acl_path(&self, path: &str) -> Result<(), Error> {
+        self.permissions().check_acl_path(path)
+    }
+
+    fn allow_partial_permission_match(&self) -> bool {
+        self.permissions().allow_partial_permission_match()
+    }
+
+    // Cache hooks
+
+    /// Returns the current cache generation of the user and acl configs. If the generation was
+    /// incremented since the last time the cache was queried, the configs are loaded again from
+    /// disk.
+    ///
+    /// Returning `None` will always reload the cache.
+    ///
+    /// Default: Always returns `None`.
+    fn cache_generation(&self) -> Option<usize> {
+        None
+    }
+
+    /// Increment the cache generation of user and acl configs. This indicates that they were
+    /// changed on disk.
+    ///
+    /// Default: Does nothing.
+    fn increment_cache_generation(&self) -> Result<(), Error> {
+        Ok(())
+    }
+
+    /// Returns the current cache generation of the token shadow cache. If the generation was
+    /// incremented since the last time the cache was queried, the token shadow cache is reloaded
+    /// from disk.
+    ///
+    /// Default: Always returns `None`.
+    fn token_shadow_cache_generation(&self) -> Option<usize> {
+        None
+    }
+
+    /// Increment the cache generation of the token shadow cache. This indicates that it was
+    /// changed on disk.
+    ///
+    /// Default: Returns an error as token shadow generation is not supported.
+    fn increment_token_shadow_cache_generation(&self) -> Result<usize, Error> {
+        anyhow::bail!("token shadow generation not supported");
+    }
+}
+
 pub fn init_access_config(config: &'static dyn AccessControlConfig) -> Result<(), Error> {
     ACCESS_CONF
         .set(config)
-- 
2.47.3






* [PATCH proxmox-backup v6 2/4] pbs-config: cache verified API token secrets
  2026-03-03 16:49 14% [PATCH proxmox{-backup,,-datacenter-manager} v6 " Samuel Rufinatscha
  2026-03-03 16:49 17% ` [PATCH proxmox-backup v6 1/4] pbs-config: add token.shadow generation to ConfigVersionCache Samuel Rufinatscha
@ 2026-03-03 16:49 11% ` Samuel Rufinatscha
  2026-03-03 16:49 12% ` [PATCH proxmox-backup v6 3/4] pbs-config: invalidate token-secret cache on token.shadow changes Samuel Rufinatscha
                   ` (10 subsequent siblings)
  12 siblings, 0 replies; 117+ results
From: Samuel Rufinatscha @ 2026-03-03 16:49 UTC (permalink / raw)
  To: pbs-devel

Adds an in-memory cache of successfully verified token secrets.
Subsequent requests for the same token+secret combination only perform
a comparison using openssl::memcmp::eq and avoid re-running the
password hash. The cache is updated when a token secret is set and
cleared when a token is deleted. A shared generation counter (via
ConfigVersionCache) is used to invalidate caches across processes when
token secrets are modified or deleted. This keeps privileged and
unprivileged daemons in sync.

Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
---
Changes from v5 to v6:
* Rebased
* Check that the input byte lengths are equal before calling
openssl::memcmp::eq(..).

Changes from v4 to v5:
* Rebased
* Move invalidate_cache_state_and_set_gen into the cache object impl
and rename it to reset_and_set_gen
* Add additional insert/remove helpers which set/update the generation
directly
* Clarified the usage of the shared generation counter in the commit
message

Changes from v3 to v4:
* Add gen param to invalidate_cache_state()
* Validate the generation bump after obtaining the write lock in
apply_api_mutation
* Pass lock to apply_api_mutation
* Remove unnecessary gen check in cache_try_secret_matches
* Adjusted commit message

Changes from v2 to v3:
* Replaced process-local cache invalidation (AtomicU64
API_MUTATION_GENERATION) with a cross-process shared generation via
ConfigVersionCache.
* Validate shared generation before/after the constant-time secret
compare; only insert into cache if the generation is unchanged.
* invalidate_cache_state() on insert if shared generation changed.

Changes from v1 to v2:
* Replace OnceCell with LazyLock, and std::sync::RwLock with
parking_lot::RwLock.
* Add API_MUTATION_GENERATION and guard cache inserts
to prevent “zombie inserts” across concurrent set/delete.
* Refactor cache operations into cache_try_secret_matches,
cache_try_insert_secret, and centralize write-side behavior in
apply_api_mutation.
* Switch fast-path cache access to try_read/try_write (best-effort).
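
Illustration of the generation-guarded insert described above, as a
std-only sketch. It assumes a plain `AtomicUsize` in place of the shared
ConfigVersionCache counter and `std::sync::RwLock` in place of
parking_lot; `Cache`, `bump` and `try_insert` are hypothetical names, not
the functions from this patch.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::RwLock;

/// Hypothetical cache: token id -> secret, plus the generation it belongs to.
#[derive(Default)]
struct Cache {
    secrets: HashMap<String, String>,
    generation: usize,
}

/// Bumps the shared counter and returns the NEW value.
/// `fetch_add` returns the previous value, hence the `+ 1`.
fn bump(shared: &AtomicUsize) -> usize {
    shared.fetch_add(1, Ordering::AcqRel).wrapping_add(1)
}

/// Inserts only if no mutation happened since `gen_before` was captured.
/// Returns whether the secret was cached.
fn try_insert(
    cache: &RwLock<Cache>,
    shared: &AtomicUsize,
    id: &str,
    secret: &str,
    gen_before: usize,
) -> bool {
    // Best effort: skip caching when the lock is contended.
    let Ok(mut guard) = cache.try_write() else {
        return false;
    };
    let gen_now = shared.load(Ordering::Acquire);

    // A missed bump means everything cached so far is stale.
    if guard.generation != gen_now {
        guard.secrets.clear();
        guard.generation = gen_now;
    }

    // A bump during verification means the verified secret may be outdated.
    if gen_now != gen_before {
        return false;
    }
    guard.secrets.insert(id.to_owned(), secret.to_owned());
    true
}
```

The caller captures the generation before the expensive hash verification
and passes it in afterwards, so a concurrent set/delete turns the insert
into a no-op instead of a stale cache entry.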

 Cargo.toml                     |   1 +
 pbs-config/Cargo.toml          |   1 +
 pbs-config/src/token_shadow.rs | 171 ++++++++++++++++++++++++++++++++-
 3 files changed, 170 insertions(+), 3 deletions(-)

diff --git a/Cargo.toml b/Cargo.toml
index 9bf7b79a..e8a4e0a9 100644
--- a/Cargo.toml
+++ b/Cargo.toml
@@ -145,6 +145,7 @@ nom = "7"
 num-traits = "0.2"
 once_cell = "1.3.1"
 openssl = "0.10.40"
+parking_lot = "0.12"
 percent-encoding = "2.1"
 pin-project-lite = "0.2"
 regex = "1.5.5"
diff --git a/pbs-config/Cargo.toml b/pbs-config/Cargo.toml
index 74afb3c6..eb81ce00 100644
--- a/pbs-config/Cargo.toml
+++ b/pbs-config/Cargo.toml
@@ -13,6 +13,7 @@ libc.workspace = true
 nix.workspace = true
 once_cell.workspace = true
 openssl.workspace = true
+parking_lot.workspace = true
 regex.workspace = true
 serde.workspace = true
 serde_json.workspace = true
diff --git a/pbs-config/src/token_shadow.rs b/pbs-config/src/token_shadow.rs
index 640fabbf..77efac0e 100644
--- a/pbs-config/src/token_shadow.rs
+++ b/pbs-config/src/token_shadow.rs
@@ -1,6 +1,8 @@
 use std::collections::HashMap;
+use std::sync::LazyLock;
 
 use anyhow::{bail, format_err, Error};
+use parking_lot::RwLock;
 use serde::{Deserialize, Serialize};
 use serde_json::{from_value, Value};
 
@@ -13,6 +15,18 @@ use crate::{open_backup_lockfile, BackupLockGuard};
 const LOCK_FILE: &str = pbs_buildcfg::configdir!("/token.shadow.lock");
 const CONF_FILE: &str = pbs_buildcfg::configdir!("/token.shadow");
 
+/// Global in-memory cache for successfully verified API token secrets.
+/// The cache stores plain text secrets for token Authids that have already been
+/// verified against the hashed values in `token.shadow`. This allows for cheap
+/// subsequent authentications for the same token+secret combination, avoiding
+/// recomputing the password hash on every request.
+static TOKEN_SECRET_CACHE: LazyLock<RwLock<ApiTokenSecretCache>> = LazyLock::new(|| {
+    RwLock::new(ApiTokenSecretCache {
+        secrets: HashMap::new(),
+        shared_gen: 0,
+    })
+});
+
 #[derive(Serialize, Deserialize)]
 #[serde(rename_all = "kebab-case")]
 /// ApiToken id / secret pair
@@ -54,9 +68,27 @@ pub fn verify_secret(tokenid: &Authid, secret: &str) -> Result<(), Error> {
         bail!("not an API token ID");
     }
 
+    // Fast path
+    if cache_try_secret_matches(tokenid, secret) {
+        return Ok(());
+    }
+
+    // Slow path
+    // First, capture the shared generation before doing the hash verification.
+    let gen_before = token_shadow_shared_gen();
+
     let data = read_file()?;
     match data.get(tokenid) {
-        Some(hashed_secret) => proxmox_sys::crypt::verify_crypt_pw(secret, hashed_secret),
+        Some(hashed_secret) => {
+            proxmox_sys::crypt::verify_crypt_pw(secret, hashed_secret)?;
+
+            // Try to cache only if nothing changed while verifying the secret.
+            if let Some(gen) = gen_before {
+                cache_try_insert_secret(tokenid.clone(), secret.to_owned(), gen);
+            }
+
+            Ok(())
+        }
         None => bail!("invalid API token"),
     }
 }
@@ -75,13 +107,15 @@ fn set_secret(tokenid: &Authid, secret: &str) -> Result<(), Error> {
         bail!("not an API token ID");
     }
 
-    let _guard = lock_config()?;
+    let guard = lock_config()?;
 
     let mut data = read_file()?;
     let hashed_secret = proxmox_sys::crypt::encrypt_pw(secret)?;
     data.insert(tokenid.clone(), hashed_secret);
     write_file(data)?;
 
+    apply_api_mutation(guard, tokenid, Some(secret));
+
     Ok(())
 }
 
@@ -91,11 +125,142 @@ pub fn delete_secret(tokenid: &Authid) -> Result<(), Error> {
         bail!("not an API token ID");
     }
 
-    let _guard = lock_config()?;
+    let guard = lock_config()?;
 
     let mut data = read_file()?;
     data.remove(tokenid);
     write_file(data)?;
 
+    apply_api_mutation(guard, tokenid, None);
+
     Ok(())
 }
+
+/// Cached secret.
+struct CachedSecret {
+    secret: String,
+}
+
+struct ApiTokenSecretCache {
+    /// Keys are token Authids, values are the corresponding plain text secrets.
+    /// Entries are added after a successful on-disk verification in
+    /// `verify_secret` or when a new token secret is generated by
+    /// `generate_and_set_secret`. Used to avoid repeated
+    /// password-hash computation on subsequent authentications.
+    secrets: HashMap<Authid, CachedSecret>,
+    /// Shared generation to detect mutations of the underlying token.shadow file.
+    shared_gen: usize,
+}
+
+impl ApiTokenSecretCache {
+    /// Resets all local cache contents and sets/updates the cached generation.
+    fn reset_and_set_gen(&mut self, gen: usize) {
+        self.secrets.clear();
+        self.shared_gen = gen;
+    }
+
+    /// Caches a secret and sets/updates the cache generation.
+    fn insert_and_set_gen(&mut self, tokenid: Authid, secret: CachedSecret, gen: usize) {
+        self.secrets.insert(tokenid, secret);
+        self.shared_gen = gen;
+    }
+
+    /// Evicts a cached secret and sets/updates the cached generation.
+    fn evict_and_set_gen(&mut self, tokenid: &Authid, gen: usize) {
+        self.secrets.remove(tokenid);
+        self.shared_gen = gen;
+    }
+}
+
+fn cache_try_insert_secret(tokenid: Authid, secret: String, shared_gen_before: usize) {
+    let Some(mut cache) = TOKEN_SECRET_CACHE.try_write() else {
+        return;
+    };
+
+    let Some(shared_gen_now) = token_shadow_shared_gen() else {
+        return;
+    };
+
+    // If this process missed a generation bump, its cache is stale.
+    if cache.shared_gen != shared_gen_now {
+        cache.reset_and_set_gen(shared_gen_now);
+    }
+
+    // If a mutation happened while we were verifying the secret, do not insert.
+    if shared_gen_now == shared_gen_before {
+        cache.insert_and_set_gen(tokenid, CachedSecret { secret }, shared_gen_now);
+    }
+}
+
+/// Tries to match the given token secret against the cached secret.
+///
+/// Verifies the generation/version before doing the constant-time
+/// comparison to reduce TOCTOU risk. During token rotation or deletion
+/// tokens for in-flight requests may still validate against the previous
+/// generation.
+fn cache_try_secret_matches(tokenid: &Authid, secret: &str) -> bool {
+    let Some(cache) = TOKEN_SECRET_CACHE.try_read() else {
+        return false;
+    };
+    let Some(entry) = cache.secrets.get(tokenid) else {
+        return false;
+    };
+    let Some(current_gen) = token_shadow_shared_gen() else {
+        return false;
+    };
+
+    if current_gen == cache.shared_gen {
+        let cached_secret_bytes = entry.secret.as_bytes();
+        let secret_bytes = secret.as_bytes();
+
+        return cached_secret_bytes.len() == secret_bytes.len()
+            && openssl::memcmp::eq(cached_secret_bytes, secret_bytes);
+    }
+
+    false
+}
+
+fn apply_api_mutation(_guard: BackupLockGuard, tokenid: &Authid, new_secret: Option<&str>) {
+    // Signal cache invalidation to other processes (best-effort).
+    let bumped_gen = bump_token_shadow_shared_gen();
+
+    let mut cache = TOKEN_SECRET_CACHE.write();
+
+    // If we cannot get the current generation, we cannot trust the cache
+    let Some(current_gen) = token_shadow_shared_gen() else {
+        cache.reset_and_set_gen(0);
+        return;
+    };
+
+    // If we cannot bump the shared generation, or if it changed after
+    // obtaining the cache write lock, we cannot trust the cache
+    if bumped_gen != Some(current_gen) {
+        cache.reset_and_set_gen(current_gen);
+        return;
+    }
+
+    // Apply the new mutation.
+    match new_secret {
+        Some(secret) => {
+            let cached_secret = CachedSecret {
+                secret: secret.to_owned(),
+            };
+            cache.insert_and_set_gen(tokenid.clone(), cached_secret, current_gen);
+        }
+        None => cache.evict_and_set_gen(tokenid, current_gen),
+    }
+}
+
+/// Get the current shared generation.
+fn token_shadow_shared_gen() -> Option<usize> {
+    crate::ConfigVersionCache::new()
+        .ok()
+        .map(|cvc| cvc.token_shadow_generation())
+}
+
+/// Bump and return the new shared generation.
+fn bump_token_shadow_shared_gen() -> Option<usize> {
+    crate::ConfigVersionCache::new()
+        .ok()
+        .map(|cvc| cvc.increase_token_shadow_generation() + 1)
+}
-- 
2.47.3






* [PATCH proxmox-datacenter-manager v6 1/3] pdm-config: implement token.shadow generation
  2026-03-03 16:49 14% [PATCH proxmox{-backup,,-datacenter-manager} v6 " Samuel Rufinatscha
                   ` (7 preceding siblings ...)
  2026-03-03 16:49 15% ` [PATCH proxmox v6 4/4] proxmox-access-control: add TTL window to token secret cache Samuel Rufinatscha
@ 2026-03-03 16:49 13% ` Samuel Rufinatscha
  2026-03-03 16:49 17% ` [PATCH proxmox-datacenter-manager v6 2/3] docs: document API token-cache TTL effects Samuel Rufinatscha
                   ` (3 subsequent siblings)
  12 siblings, 0 replies; 117+ results
From: Samuel Rufinatscha @ 2026-03-03 16:49 UTC (permalink / raw)
  To: pbs-devel

PDM depends on the shared proxmox/proxmox-access-control crate for
token.shadow handling which expects the product to provide a
cross-process invalidation signal so it can cache/invalidate
token.shadow secrets.

This patch wires AccessControlConfig to ConfigVersionCache for
token.shadow invalidation and switches server/CLI/UI init to use
pdm-config’s AccessControlConfig.

Safety: the shmem mapping is fixed to 4096 bytes via the #[repr(C)]
union padding, and the new atomic is appended to the end of the
#[repr(C)] inner struct, so all existing field offsets stay unchanged.
Old processes keep accessing the same bytes and new processes consume
previously reserved padding.

Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
---
Changes from v5 to v6:
* Rebased

Changes from v4 to v5:
* Rebased
* Added safety note to commit message

Changes from v3 to v4:
* pdm-api-types: replace AccessControlConfig with
AccessControlPermissions and implement init::AccessControlPermissions
there
* pdm-config: add new AccessControlConfig implementing
init::AccessControlConfig
* UI: init uses a local UiAccessControlConfig for init_access_config()
* Adjusted commit message
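
Illustration of the safety argument from the commit message, as a
std-only sketch. `InnerOld`, `InnerNew`, `Mapping` and
`layout_is_compatible` are hypothetical; the real struct has more fields,
but the reasoning is the same: with #[repr(C)], appending a field leaves
earlier offsets untouched, and the padded union pins the total size.

```rust
use std::mem::{offset_of, size_of, ManuallyDrop};
use std::sync::atomic::AtomicUsize;

/// Layout before the change (two made-up counters).
#[allow(dead_code)]
#[repr(C)]
struct InnerOld {
    a: AtomicUsize,
    b: AtomicUsize,
}

/// Layout after the change: the new atomic is appended at the end.
#[allow(dead_code)]
#[repr(C)]
struct InnerNew {
    a: AtomicUsize,
    b: AtomicUsize,
    token_shadow_generation: AtomicUsize,
}

/// The padding member fixes the mapping size at 4096 bytes.
#[allow(dead_code)]
#[repr(C)]
union Mapping<T> {
    inner: ManuallyDrop<T>,
    _padding: [u8; 4096],
}

/// Checks size stability, unchanged offsets, and that the new field
/// starts exactly where the old struct ended.
fn layout_is_compatible() -> bool {
    size_of::<Mapping<InnerOld>>() == 4096
        && size_of::<Mapping<InnerNew>>() == 4096
        && offset_of!(InnerOld, a) == offset_of!(InnerNew, a)
        && offset_of!(InnerOld, b) == offset_of!(InnerNew, b)
        && offset_of!(InnerNew, token_shadow_generation) == size_of::<InnerOld>()
}
```

Inserting the new field anywhere but the end would shift later offsets,
and an old process would then read the wrong counter.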

 cli/admin/src/main.rs                      |  2 +-
 lib/pdm-api-types/src/acl.rs               |  4 ++--
 lib/pdm-config/Cargo.toml                  |  1 +
 lib/pdm-config/src/access_control.rs       | 20 ++++++++++++++++++++
 lib/pdm-config/src/config_version_cache.rs | 18 ++++++++++++++++++
 lib/pdm-config/src/lib.rs                  |  2 ++
 server/src/acl.rs                          |  3 +--
 ui/src/main.rs                             | 10 +++++++++-
 8 files changed, 54 insertions(+), 6 deletions(-)
 create mode 100644 lib/pdm-config/src/access_control.rs

diff --git a/cli/admin/src/main.rs b/cli/admin/src/main.rs
index f698fa2..916c633 100644
--- a/cli/admin/src/main.rs
+++ b/cli/admin/src/main.rs
@@ -19,7 +19,7 @@ fn main() {
     proxmox_product_config::init(api_user, priv_user);
 
     proxmox_access_control::init::init(
-        &pdm_api_types::AccessControlConfig,
+        &pdm_config::AccessControlConfig,
         pdm_buildcfg::configdir!("/access"),
     )
     .expect("failed to setup access control config");
diff --git a/lib/pdm-api-types/src/acl.rs b/lib/pdm-api-types/src/acl.rs
index 405982a..7c405a7 100644
--- a/lib/pdm-api-types/src/acl.rs
+++ b/lib/pdm-api-types/src/acl.rs
@@ -187,9 +187,9 @@ pub struct AclListItem {
     pub roleid: String,
 }
 
-pub struct AccessControlConfig;
+pub struct AccessControlPermissions;
 
-impl proxmox_access_control::init::AccessControlConfig for AccessControlConfig {
+impl proxmox_access_control::init::AccessControlPermissions for AccessControlPermissions {
     fn privileges(&self) -> &HashMap<&str, u64> {
         static PRIVS: LazyLock<HashMap<&str, u64>> =
             LazyLock::new(|| PRIVILEGES.iter().copied().collect());
diff --git a/lib/pdm-config/Cargo.toml b/lib/pdm-config/Cargo.toml
index d39c2ad..19781d2 100644
--- a/lib/pdm-config/Cargo.toml
+++ b/lib/pdm-config/Cargo.toml
@@ -13,6 +13,7 @@ once_cell.workspace = true
 openssl.workspace = true
 serde.workspace = true
 
+proxmox-access-control.workspace = true
 proxmox-config-digest = { workspace = true, features = [ "openssl" ] }
 proxmox-http = { workspace = true, features = [ "http-helpers" ] }
 proxmox-ldap = { workspace = true, features = [ "types" ]}
diff --git a/lib/pdm-config/src/access_control.rs b/lib/pdm-config/src/access_control.rs
new file mode 100644
index 0000000..389b3f4
--- /dev/null
+++ b/lib/pdm-config/src/access_control.rs
@@ -0,0 +1,20 @@
+use anyhow::Error;
+
+pub struct AccessControlConfig;
+
+impl proxmox_access_control::init::AccessControlConfig for AccessControlConfig {
+    fn permissions(&self) -> &dyn proxmox_access_control::init::AccessControlPermissions {
+        &pdm_api_types::AccessControlPermissions
+    }
+
+    fn token_shadow_cache_generation(&self) -> Option<usize> {
+        crate::ConfigVersionCache::new()
+            .ok()
+            .map(|c| c.token_shadow_generation())
+    }
+
+    fn increment_token_shadow_cache_generation(&self) -> Result<usize, Error> {
+        let c = crate::ConfigVersionCache::new()?;
+        Ok(c.increase_token_shadow_generation())
+    }
+}
diff --git a/lib/pdm-config/src/config_version_cache.rs b/lib/pdm-config/src/config_version_cache.rs
index 36a6a77..933140c 100644
--- a/lib/pdm-config/src/config_version_cache.rs
+++ b/lib/pdm-config/src/config_version_cache.rs
@@ -27,6 +27,8 @@ struct ConfigVersionCacheDataInner {
     traffic_control_generation: AtomicUsize,
     // Tracks updates to the remote/hostname/nodename mapping cache.
     remote_mapping_cache: AtomicUsize,
+    // Token shadow (token.shadow) generation/version.
+    token_shadow_generation: AtomicUsize,
     // Add further atomics here
 }
 
@@ -172,4 +174,20 @@ impl ConfigVersionCache {
             .fetch_add(1, Ordering::Relaxed)
             + 1
     }
+
+    /// Returns the token shadow generation number.
+    pub fn token_shadow_generation(&self) -> usize {
+        self.shmem
+            .data()
+            .token_shadow_generation
+            .load(Ordering::Acquire)
+    }
+
+    /// Increase the token shadow generation number.
+    pub fn increase_token_shadow_generation(&self) -> usize {
+        self.shmem
+            .data()
+            .token_shadow_generation
+            .fetch_add(1, Ordering::AcqRel)
+    }
 }
diff --git a/lib/pdm-config/src/lib.rs b/lib/pdm-config/src/lib.rs
index 4c49054..614f7ae 100644
--- a/lib/pdm-config/src/lib.rs
+++ b/lib/pdm-config/src/lib.rs
@@ -9,6 +9,8 @@ pub mod remotes;
 pub mod setup;
 pub mod views;
 
+mod access_control;
+pub use access_control::AccessControlConfig;
 mod config_version_cache;
 pub use config_version_cache::ConfigVersionCache;
 
diff --git a/server/src/acl.rs b/server/src/acl.rs
index f421814..e6e007b 100644
--- a/server/src/acl.rs
+++ b/server/src/acl.rs
@@ -1,6 +1,5 @@
 pub(crate) fn init() {
-    static ACCESS_CONTROL_CONFIG: pdm_api_types::AccessControlConfig =
-        pdm_api_types::AccessControlConfig;
+    static ACCESS_CONTROL_CONFIG: pdm_config::AccessControlConfig = pdm_config::AccessControlConfig;
 
     proxmox_access_control::init::init(&ACCESS_CONTROL_CONFIG, pdm_buildcfg::configdir!("/access"))
         .expect("failed to setup access control config");
diff --git a/ui/src/main.rs b/ui/src/main.rs
index 2bd900e..9f87505 100644
--- a/ui/src/main.rs
+++ b/ui/src/main.rs
@@ -390,10 +390,18 @@ fn main() {
     pwt::state::set_available_languages(proxmox_yew_comp::available_language_list());
 
     if let Err(e) =
-        proxmox_access_control::init::init_access_config(&pdm_api_types::AccessControlConfig)
+        proxmox_access_control::init::init_access_config(&UiAccessControlConfig)
     {
         log::error!("could not initialize access control config - {e:#}");
     }
 
     yew::Renderer::<DatacenterManagerApp>::new().render();
 }
+
+struct UiAccessControlConfig;
+
+impl proxmox_access_control::init::AccessControlConfig for UiAccessControlConfig {
+    fn permissions(&self) -> &dyn proxmox_access_control::init::AccessControlPermissions {
+        &pdm_api_types::AccessControlPermissions
+    }
+}
-- 
2.47.3






* [PATCH proxmox v6 2/4] proxmox-access-control: cache verified API token secrets
  2026-03-03 16:49 14% [PATCH proxmox{-backup,,-datacenter-manager} v6 " Samuel Rufinatscha
                   ` (4 preceding siblings ...)
  2026-03-03 16:49 14% ` [PATCH proxmox v6 1/4] proxmox-access-control: split AccessControlConfig and add token.shadow gen Samuel Rufinatscha
@ 2026-03-03 16:49 11% ` Samuel Rufinatscha
  2026-03-03 16:49 12% ` [PATCH proxmox v6 3/4] proxmox-access-control: invalidate token-secret cache on token.shadow changes Samuel Rufinatscha
                   ` (6 subsequent siblings)
  12 siblings, 0 replies; 117+ results
From: Samuel Rufinatscha @ 2026-03-03 16:49 UTC (permalink / raw)
  To: pbs-devel

Adds an in-memory cache of successfully verified token secrets.
Subsequent requests for the same token+secret combination only perform
a comparison using openssl::memcmp::eq and avoid re-running the
password hash. The cache is updated when a token secret is set and
cleared when a token is deleted. A shared generation counter (via
ConfigVersionCache) is used to invalidate caches across processes when
token secrets are modified or deleted. This keeps privileged and
unprivileged daemons in sync.

Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
---
Changes from v5 to v6:
* Rebased
* Check that the input byte lengths are equal before calling
openssl::memcmp::eq(..).

Changes from v4 to v5:
* Rebased
* Fix wrong type compilation issue; replaced with ApiLockGuard
* Move invalidate_cache_state_and_set_gen into the cache object impl
and rename it to reset_and_set_gen
* Add additional insert/remove helpers which set/update the generation
directly
* Clarified the usage of the shared generation counter in the commit
message

Changes from v3 to v4:
* Add gen param to invalidate_cache_state()
* Validate the generation bump after obtaining the write lock in
apply_api_mutation
* Pass lock to apply_api_mutation
* Remove unnecessary gen check in cache_try_secret_matches
* Adjusted commit message

Changes from v2 to v3:
* Replaced process-local cache invalidation (AtomicU64
API_MUTATION_GENERATION) with a cross-process shared generation via
ConfigVersionCache.
* Validate shared generation before/after the constant-time secret
compare; only insert into cache if the generation is unchanged.
* invalidate_cache_state() on insert if shared generation changed.

Changes from v1 to v2:
* Replace OnceCell with LazyLock, and std::sync::RwLock with
parking_lot::RwLock.
* Add API_MUTATION_GENERATION and guard cache inserts
to prevent “zombie inserts” across concurrent set/delete.
* Refactor cache operations into cache_try_secret_matches,
cache_try_insert_secret, and centralize write-side behavior in
apply_api_mutation.
* Switch fast-path cache access to try_read/try_write (best-effort).
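
Illustration of the v6 length check, as a std-only sketch.
openssl::memcmp::eq panics when its two inputs differ in length, so the
lengths are compared first. `ct_eq` below is a hypothetical stand-in for
that function and `secrets_match` is a made-up wrapper. A plain Rust loop
is not guaranteed to compile to constant-time code, so this only shows
the control flow, not a replacement for the OpenSSL primitive.

```rust
/// Stand-in for `openssl::memcmp::eq`: touches every byte and, like the
/// real function, panics on unequal lengths.
fn ct_eq(a: &[u8], b: &[u8]) -> bool {
    assert_eq!(a.len(), b.len(), "inputs must have equal length");
    let mut diff = 0u8;
    for (x, y) in a.iter().zip(b) {
        // Accumulate differences instead of returning early.
        diff |= x ^ y;
    }
    diff == 0
}

/// Compares a cached secret with a presented one without risking a panic.
fn secrets_match(cached: &str, presented: &str) -> bool {
    let (a, b) = (cached.as_bytes(), presented.as_bytes());
    // Short-circuit: `ct_eq` is only reached for equal-length inputs.
    a.len() == b.len() && ct_eq(a, b)
}
```

The length comparison reveals only whether the lengths match, not the
content of the cached secret.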

 Cargo.toml                                 |   1 +
 proxmox-access-control/Cargo.toml          |   1 +
 proxmox-access-control/src/token_shadow.rs | 171 ++++++++++++++++++++-
 3 files changed, 170 insertions(+), 3 deletions(-)

diff --git a/Cargo.toml b/Cargo.toml
index 1a6cc5d7..3747b092 100644
--- a/Cargo.toml
+++ b/Cargo.toml
@@ -113,6 +113,7 @@ native-tls = "0.2"
 nix = "0.29"
 openssl = "0.10"
 pam-sys = "0.5"
+parking_lot = "0.12"
 percent-encoding = "2.1"
 pin-utils = "0.1.0"
 proc-macro2 = "1.0"
diff --git a/proxmox-access-control/Cargo.toml b/proxmox-access-control/Cargo.toml
index ec189664..1de2842c 100644
--- a/proxmox-access-control/Cargo.toml
+++ b/proxmox-access-control/Cargo.toml
@@ -16,6 +16,7 @@ anyhow.workspace = true
 const_format.workspace = true
 nix = { workspace = true, optional = true }
 openssl = { workspace = true, optional = true }
+parking_lot.workspace = true
 regex.workspace = true
 hex = { workspace = true, optional = true }
 serde.workspace = true
diff --git a/proxmox-access-control/src/token_shadow.rs b/proxmox-access-control/src/token_shadow.rs
index c586d834..79e78555 100644
--- a/proxmox-access-control/src/token_shadow.rs
+++ b/proxmox-access-control/src/token_shadow.rs
@@ -1,13 +1,28 @@
 use std::collections::HashMap;
+use std::sync::LazyLock;
 
 use anyhow::{bail, format_err, Error};
+use parking_lot::RwLock;
 use serde_json::{from_value, Value};
 
 use proxmox_auth_api::types::Authid;
 use proxmox_product_config::{open_api_lockfile, replace_config, ApiLockGuard};
 
+use crate::init::access_conf;
 use crate::init::impl_feature::{token_shadow, token_shadow_lock};
 
+/// Global in-memory cache for successfully verified API token secrets.
+/// The cache stores plain text secrets for token Authids that have already been
+/// verified against the hashed values in `token.shadow`. This allows for cheap
+/// subsequent authentications for the same token+secret combination, avoiding
+/// recomputing the password hash on every request.
+static TOKEN_SECRET_CACHE: LazyLock<RwLock<ApiTokenSecretCache>> = LazyLock::new(|| {
+    RwLock::new(ApiTokenSecretCache {
+        secrets: HashMap::new(),
+        shared_gen: 0,
+    })
+});
+
 // Get exclusive lock
 fn lock_config() -> Result<ApiLockGuard, Error> {
     open_api_lockfile(token_shadow_lock(), None, true)
@@ -36,9 +51,27 @@ pub fn verify_secret(tokenid: &Authid, secret: &str) -> Result<(), Error> {
         bail!("not an API token ID");
     }
 
+    // Fast path
+    if cache_try_secret_matches(tokenid, secret) {
+        return Ok(());
+    }
+
+    // Slow path
+    // First, capture the shared generation before doing the hash verification.
+    let gen_before = token_shadow_shared_gen();
+
     let data = read_file()?;
     match data.get(tokenid) {
-        Some(hashed_secret) => proxmox_sys::crypt::verify_crypt_pw(secret, hashed_secret),
+        Some(hashed_secret) => {
+            proxmox_sys::crypt::verify_crypt_pw(secret, hashed_secret)?;
+
+            // Try to cache only if nothing changed while verifying the secret.
+            if let Some(gen) = gen_before {
+                cache_try_insert_secret(tokenid.clone(), secret.to_owned(), gen);
+            }
+
+            Ok(())
+        }
         None => bail!("invalid API token"),
     }
 }
@@ -49,13 +82,15 @@ pub fn set_secret(tokenid: &Authid, secret: &str) -> Result<(), Error> {
         bail!("not an API token ID");
     }
 
-    let _guard = lock_config()?;
+    let guard = lock_config()?;
 
     let mut data = read_file()?;
     let hashed_secret = proxmox_sys::crypt::encrypt_pw(secret)?;
     data.insert(tokenid.clone(), hashed_secret);
     write_file(data)?;
 
+    apply_api_mutation(guard, tokenid, Some(secret));
+
     Ok(())
 }
 
@@ -65,12 +100,14 @@ pub fn delete_secret(tokenid: &Authid) -> Result<(), Error> {
         bail!("not an API token ID");
     }
 
-    let _guard = lock_config()?;
+    let guard = lock_config()?;
 
     let mut data = read_file()?;
     data.remove(tokenid);
     write_file(data)?;
 
+    apply_api_mutation(guard, tokenid, None);
+
     Ok(())
 }
 
@@ -81,3 +118,131 @@ pub fn generate_and_set_secret(tokenid: &Authid) -> Result<String, Error> {
     set_secret(tokenid, &secret)?;
     Ok(secret)
 }
+
+/// Cached secret.
+struct CachedSecret {
+    secret: String,
+}
+
+struct ApiTokenSecretCache {
+    /// Keys are token Authids, values are the corresponding plain text secrets.
+    /// Entries are added after a successful on-disk verification in
+    /// `verify_secret` or when a new token secret is generated by
+    /// `generate_and_set_secret`. Used to avoid repeated
+    /// password-hash computation on subsequent authentications.
+    secrets: HashMap<Authid, CachedSecret>,
+    /// Shared generation to detect mutations of the underlying token.shadow file.
+    shared_gen: usize,
+}
+
+impl ApiTokenSecretCache {
+    /// Resets all local cache contents and sets/updates the cached generation.
+    fn reset_and_set_gen(&mut self, gen: usize) {
+        self.secrets.clear();
+        self.shared_gen = gen;
+    }
+
+    /// Caches a secret and sets/updates the cache generation.
+    fn insert_and_set_gen(&mut self, tokenid: Authid, secret: CachedSecret, gen: usize) {
+        self.secrets.insert(tokenid, secret);
+        self.shared_gen = gen;
+    }
+
+    /// Evicts a cached secret and sets/updates the cached generation.
+    fn evict_and_set_gen(&mut self, tokenid: &Authid, gen: usize) {
+        self.secrets.remove(tokenid);
+        self.shared_gen = gen;
+    }
+}
+
+fn cache_try_insert_secret(tokenid: Authid, secret: String, shared_gen_before: usize) {
+    let Some(mut cache) = TOKEN_SECRET_CACHE.try_write() else {
+        return;
+    };
+
+    let Some(shared_gen_now) = token_shadow_shared_gen() else {
+        return;
+    };
+
+    // If this process missed a generation bump, its cache is stale.
+    if cache.shared_gen != shared_gen_now {
+        cache.reset_and_set_gen(shared_gen_now);
+    }
+
+    // If a mutation happened while we were verifying the secret, do not insert.
+    if shared_gen_now == shared_gen_before {
+        cache.insert_and_set_gen(tokenid, CachedSecret { secret }, shared_gen_now);
+    }
+}
+
+/// Tries to match the given token secret against the cached secret.
+///
+/// Verifies the generation/version before doing the constant-time
+/// comparison to reduce TOCTOU risk. During token rotation or deletion,
+/// tokens for in-flight requests may still validate against the previous
+/// generation.
+fn cache_try_secret_matches(tokenid: &Authid, secret: &str) -> bool {
+    let Some(cache) = TOKEN_SECRET_CACHE.try_read() else {
+        return false;
+    };
+    let Some(entry) = cache.secrets.get(tokenid) else {
+        return false;
+    };
+    let Some(current_gen) = token_shadow_shared_gen() else {
+        return false;
+    };
+
+    if current_gen == cache.shared_gen {
+        let cached_secret_bytes = entry.secret.as_bytes();
+        let secret_bytes = secret.as_bytes();
+
+        return cached_secret_bytes.len() == secret_bytes.len()
+            && openssl::memcmp::eq(cached_secret_bytes, secret_bytes);
+    }
+
+    false
+}
+
+fn apply_api_mutation(_guard: ApiLockGuard, tokenid: &Authid, new_secret: Option<&str>) {
+    // Signal cache invalidation to other processes (best-effort).
+    let bumped_gen = bump_token_shadow_shared_gen();
+
+    let mut cache = TOKEN_SECRET_CACHE.write();
+
+    // If we cannot get the current generation, we cannot trust the cache
+    let Some(current_gen) = token_shadow_shared_gen() else {
+        cache.reset_and_set_gen(0);
+        return;
+    };
+
+    // If we cannot bump the shared generation, or if it changed after
+    // obtaining the cache write lock, we cannot trust the cache
+    if bumped_gen != Some(current_gen) {
+        cache.reset_and_set_gen(current_gen);
+        return;
+    }
+
+    // Apply the new mutation.
+    match new_secret {
+        Some(secret) => {
+            let cached_secret = CachedSecret {
+                secret: secret.to_owned(),
+            };
+            cache.insert_and_set_gen(tokenid.clone(), cached_secret, current_gen);
+        }
+        None => cache.evict_and_set_gen(tokenid, current_gen),
+    }
+}
+
+/// Get the current shared generation.
+fn token_shadow_shared_gen() -> Option<usize> {
+    access_conf().token_shadow_cache_generation()
+}
+
+/// Bump and return the new shared generation.
+fn bump_token_shadow_shared_gen() -> Option<usize> {
+    access_conf()
+        .increment_token_shadow_cache_generation()
+        .ok()
+        .map(|prev| prev + 1)
+}
-- 
2.47.3
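
As a reading aid for the patch above, here is a minimal, self-contained sketch of the generation check. It is a simplified model, not the patch itself: the shared generation is a process-local AtomicUsize instead of the shmem-backed counter, keys are plain strings instead of Authid, locking is left out, and the secret comparison uses `==` rather than openssl::memcmp::eq. The names SHARED_GEN, SecretCache and bump_generation are made up for illustration.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicUsize, Ordering};

// Stand-in for the shared generation counter (assumption: process-local here,
// whereas the patch reads it from a config version cache shared across processes).
static SHARED_GEN: AtomicUsize = AtomicUsize::new(0);

#[derive(Default)]
struct SecretCache {
    secrets: HashMap<String, String>,
    // Generation this cache content belongs to.
    seen_gen: usize,
}

impl SecretCache {
    /// Inserts only if no mutation happened since `gen_before` was sampled.
    fn try_insert(&mut self, token: &str, secret: &str, gen_before: usize) {
        let gen_now = SHARED_GEN.load(Ordering::Acquire);
        // A missed generation bump means everything cached so far is stale.
        if self.seen_gen != gen_now {
            self.secrets.clear();
            self.seen_gen = gen_now;
        }
        // A bump during verification means the verified secret may be outdated.
        if gen_now == gen_before {
            self.secrets.insert(token.to_owned(), secret.to_owned());
        }
    }

    /// Returns true only for a cached secret of the current generation.
    /// Note: `==` is NOT constant-time; the patch uses a constant-time compare.
    fn matches(&self, token: &str, secret: &str) -> bool {
        let gen_now = SHARED_GEN.load(Ordering::Acquire);
        self.seen_gen == gen_now
            && self.secrets.get(token).map(String::as_str) == Some(secret)
    }
}

/// Models what set_secret/delete_secret do to signal invalidation.
fn bump_generation() -> usize {
    SHARED_GEN.fetch_add(1, Ordering::AcqRel) + 1
}
```

In this model, a caller samples the generation before the expensive verification, then hands that value to try_insert; any bump in between makes the insert a no-op, and any bump afterwards makes matches return false until the entry is re-verified.
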





^ permalink raw reply related	[relevance 11%]

* [pbs-devel] superseded: [PATCH proxmox{-backup,,-datacenter-manager} v5 00/11] token-shadow: reduce api token verification overhead
  2026-02-17 11:12 14% [PATCH proxmox{-backup,,-datacenter-manager} v5 00/11] token-shadow: reduce api token verification overhead Samuel Rufinatscha
                   ` (10 preceding siblings ...)
  2026-02-17 11:12 16% ` [PATCH proxmox-datacenter-manager v5 3/3] pdm-config: wire user+acl cache generation Samuel Rufinatscha
@ 2026-03-03 16:52 13% ` Samuel Rufinatscha
  11 siblings, 0 replies; 117+ results
From: Samuel Rufinatscha @ 2026-03-03 16:52 UTC (permalink / raw)
  To: pbs-devel

https://lore.proxmox.com/pbs-devel/20260303165005.373120-1-s.rufinatscha@proxmox.com/T/#t

On 2/17/26 12:12 PM, Samuel Rufinatscha wrote:
> Hi,
> 
> this series improves the performance of token-based API authentication
> in PBS (pbs-config) and in PDM (underlying proxmox-access-control
> crate), addressing the API token verification hotspot reported in our
> bugtracker #7017 [1].
> 
> When profiling PBS /status endpoint with cargo flamegraph [2],
> token-based authentication showed up as a dominant hotspot via
> proxmox_sys::crypt::verify_crypt_pw. Applying this series removes that
> path from the hot section of the flamegraph. The same performance issue
> was measured [2] for PDM. PDM uses the underlying shared
> proxmox-access-control library for token handling, which is a
> factored out version of the token.shadow handling code from PBS.
> 
> While this series fixes the immediate performance issue both in PBS
> (pbs-config) and in the shared proxmox-access-control crate used by
> PDM, PBS should ideally be refactored in a separate effort to use
> proxmox-access-control for token handling instead of its local
> implementation.
> 
> Approach
> 
> The goal is to reduce the cost of token-based authentication while
> preserving the existing token handling semantics (including detecting
> manual edits to token.shadow) and staying consistent between PBS
> (pbs-config) and PDM (proxmox-access-control). For both, this series
> proposes to:
> 
> 1. Introduce an in-memory cache for verified token secrets and
> invalidate it through a shared ConfigVersionCache generation. Note that
> a shared generation is required to keep the privileged and unprivileged
> daemons in sync and avoid caching inconsistencies across processes.
> 2. Invalidate on token.shadow API changes (set_secret,
> delete_secret)
> 3. Invalidate on direct/manual token.shadow file changes (mtime +
> length)
> 4. Avoid per-request file stat calls using a TTL window
> 
> Testing
> 
> To verify the effect in PBS (pbs-config changes), I:
> 1. Set up test environment based on latest PBS ISO, installed Rust
>     toolchain, cloned proxmox-backup repository to use with cargo
>     flamegraph. Reproduced bug #7017 [1] by profiling the /status
>     endpoint with token-based authentication using cargo flamegraph [2].
> 2. Built PBS with pbs-config patches and re-ran the same workload and
>     profiling setup. Confirmed that
>     proxmox_sys::crypt::verify_crypt_pw path no longer appears in the
>     hot section of the flamegraph. CPU usage is now dominated by TLS
>     overhead.
> 3. Functionally, I verified that:
>     * valid tokens authenticate correctly when used in API requests
>     * invalid secrets are rejected as before
>     * generating a new token secret via dashboard (create token for
>     user, regenerate existing secret) works and authenticates correctly
> 
> To verify the effect in PDM (proxmox-access-control changes), instead
> of PBS’ /status, I profiled the /version endpoint with cargo flamegraph
> [2] and verified that the expensive hashing path disappears from the
> hot section after introducing caching. Functionally, I verified that:
>     * valid tokens authenticate correctly when used in API requests
>     * invalid secrets are rejected as before
>     * generating a new token secret via dashboard (create token for user,
>     regenerate existing secret) works and authenticates correctly
> 
> Results
> 
> To measure the caching effect, I benchmarked parallel token auth
> requests for /status?verbose=0 on top of the datastore lookup cache
> series [3] to check the throughput impact. With datastores=1,
> repeat=5000, parallel=16 this series gives ~172 req/s compared to
> ~65 req/s without it. This is a ~2.6x improvement (and aligns with the
> ~179 req/s from the previous series, which used per-process cache
> invalidation).
> 
> Patch summary
> 
> pbs-config:
> 0001 – pbs-config: add token.shadow generation to ConfigVersionCache
> 0002 – pbs-config: cache verified API token secrets
> 0003 – pbs-config: invalidate token-secret cache on token.shadow
> changes
> 0004 – pbs-config: add TTL window to token-secret cache
> 
> proxmox-access-control:
> 0005 – access-control: extend AccessControlConfig for token.shadow invalidation
> 0006 – access-control: cache verified API token secrets
> 0007 – access-control: invalidate token-secret cache on token.shadow changes
> 0008 – access-control: add TTL window to token-secret cache
> 
> proxmox-datacenter-manager:
> 0009 – pdm-config: add token.shadow generation to ConfigVersionCache
> 0010 – docs: document API token-cache TTL effects
> 0011 – pdm-config: wire user+acl cache generation
> 
> Maintainer Notes:
> * proxmox-access-control trait split: permissions now live in
>   AccessControlPermissions, and AccessControlConfig now requires
>   fn permissions(&self) -> &dyn AccessControlPermissions ->
>   version bump
> * Renames ConfigVersionCache's pub user_cache_generation and
>   increase_user_cache_generation -> version bump
> * Adds parking_lot::RwLock dependency in PBS and proxmox-access-control
> 
> [1] https://bugzilla.proxmox.com/show_bug.cgi?id=7017
> [2] attachment 1767 [1]: Flamegraph showing the proxmox_sys::crypt::verify_crypt_pw stack
> [3] https://bugzilla.proxmox.com/show_bug.cgi?id=6049
> 
> proxmox-backup:
> 
> Samuel Rufinatscha (4):
>    pbs-config: add token.shadow generation to ConfigVersionCache
>    pbs-config: cache verified API token secrets
>    pbs-config: invalidate token-secret cache on token.shadow changes
>    pbs-config: add TTL window to token secret cache
> 
>   Cargo.toml                             |   1 +
>   docs/user-management.rst               |   4 +
>   pbs-config/Cargo.toml                  |   1 +
>   pbs-config/src/config_version_cache.rs |  18 ++
>   pbs-config/src/token_shadow.rs         | 310 ++++++++++++++++++++++++-
>   5 files changed, 331 insertions(+), 3 deletions(-)
> 
> 
> proxmox:
> 
> Samuel Rufinatscha (4):
>    proxmox-access-control: split AccessControlConfig and add token.shadow
>      gen
>    proxmox-access-control: cache verified API token secrets
>    proxmox-access-control: invalidate token-secret cache on token.shadow
>      changes
>    proxmox-access-control: add TTL window to token secret cache
> 
>   Cargo.toml                                 |   1 +
>   proxmox-access-control/Cargo.toml          |   1 +
>   proxmox-access-control/src/acl.rs          |  10 +-
>   proxmox-access-control/src/init.rs         | 113 ++++++--
>   proxmox-access-control/src/token_shadow.rs | 311 ++++++++++++++++++++-
>   5 files changed, 409 insertions(+), 27 deletions(-)
> 
> 
> proxmox-datacenter-manager:
> 
> Samuel Rufinatscha (3):
>    pdm-config: implement token.shadow generation
>    docs: document API token-cache TTL effects
>    pdm-config: wire user+acl cache generation
> 
>   cli/admin/src/main.rs                      |  2 +-
>   docs/access-control.rst                    |  4 +++
>   lib/pdm-api-types/src/acl.rs               |  4 +--
>   lib/pdm-config/Cargo.toml                  |  1 +
>   lib/pdm-config/src/access_control.rs       | 31 ++++++++++++++++++++
>   lib/pdm-config/src/config_version_cache.rs | 34 +++++++++++++++++-----
>   lib/pdm-config/src/lib.rs                  |  2 ++
>   server/src/acl.rs                          |  3 +-
>   ui/src/main.rs                             | 10 ++++++-
>   9 files changed, 77 insertions(+), 14 deletions(-)
>   create mode 100644 lib/pdm-config/src/access_control.rs
> 
> 
> Summary over all repositories:
>    19 files changed, 817 insertions(+), 44 deletions(-)
> 





^ permalink raw reply	[relevance 13%]

* Re: [PATCH proxmox{-backup,,-datacenter-manager} v6 00/11] token-shadow: reduce api token verification overhead
  2026-03-03 16:49 14% [PATCH proxmox{-backup,,-datacenter-manager} v6 " Samuel Rufinatscha
                   ` (10 preceding siblings ...)
  2026-03-03 16:50 16% ` [PATCH proxmox-datacenter-manager v6 3/3] pdm-config: wire user+acl cache generation Samuel Rufinatscha
@ 2026-03-11  8:59  6% ` Fabian Grünbichler
  2026-03-11 16:26  6%   ` Samuel Rufinatscha
  2026-03-12 10:38 13% ` [pbs-devel] superseded: " Samuel Rufinatscha
  12 siblings, 1 reply; 117+ results
From: Fabian Grünbichler @ 2026-03-11  8:59 UTC (permalink / raw)
  To: pbs-devel, Samuel Rufinatscha

On March 3, 2026 5:49 pm, Samuel Rufinatscha wrote:
> Hi,
> 
> this series improves the performance of token-based API authentication
> in PBS (pbs-config) and in PDM (underlying proxmox-access-control
> crate), addressing the API token verification hotspot reported in our
> bugtracker #7017 [1].

could you re-spin this so the proxmox/pdm parts compile (the proxmox
workspace got switched to edition 2024, which clashes with the `gen`
variable/binding name used in the patches here).

please also adapt the PBS patches to the new compatible naming - at some
point we probably want to switch that over as well ;)
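
For context: `gen` is a reserved keyword starting with the Rust 2024 edition, so a binding such as `if let Some(gen) = ...` no longer compiles there. A minimal sketch of the two usual ways out, renaming the binding or using the raw identifier `r#gen`. The function and parameter names are made up for illustration and are not taken from the series.

```rust
/// Renamed binding: compiles on every edition.
fn next_generation(generation_before: Option<usize>) -> usize {
    match generation_before {
        Some(generation) => generation + 1,
        None => 0,
    }
}

/// Raw identifier: keeps the name `gen` and stays valid on edition 2024.
fn next_generation_raw(r#gen: Option<usize>) -> usize {
    r#gen.map_or(0, |g| g + 1)
}
```

Renaming is usually the less noisy option, since a raw identifier has to be spelled `r#gen` at every use.
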

> 
> When profiling PBS /status endpoint with cargo flamegraph [2],
> token-based authentication showed up as a dominant hotspot via
> proxmox_sys::crypt::verify_crypt_pw. Applying this series removes that
> path from the hot section of the flamegraph. The same performance issue
> was measured [2] for PDM. PDM uses the underlying shared
> proxmox-access-control library for token handling, which is a
> factored out version of the token.shadow handling code from PBS.
> 
> While this series fixes the immediate performance issue both in PBS
> (pbs-config) and in the shared proxmox-access-control crate used by
> PDM, PBS should ideally be refactored in a separate effort to use
> proxmox-access-control for token handling instead of its local
> implementation.
> 
> Approach
> 
> The goal is to reduce the cost of token-based authentication while
> preserving the existing token handling semantics (including detecting
> manual edits to token.shadow) and staying consistent between PBS
> (pbs-config) and PDM (proxmox-access-control). For both, this series
> proposes to:
> 
> 1. Introduce an in-memory cache for verified token secrets and
> invalidate it through a shared ConfigVersionCache generation. Note that
> a shared generation is required to keep the privileged and unprivileged
> daemons in sync and avoid caching inconsistencies across processes.
> 2. Invalidate on token.shadow API changes (set_secret,
> delete_secret)
> 3. Invalidate on direct/manual token.shadow file changes (mtime +
> length)
> 4. Avoid per-request file stat calls using a TTL window
> 
> Testing
> 
> To verify the effect in PBS (pbs-config changes), I:
> 1. Set up test environment based on latest PBS ISO, installed Rust
>    toolchain, cloned proxmox-backup repository to use with cargo
>    flamegraph. Reproduced bug #7017 [1] by profiling the /status
>    endpoint with token-based authentication using cargo flamegraph [2].
> 2. Built PBS with pbs-config patches and re-ran the same workload and
>    profiling setup. Confirmed that
>    proxmox_sys::crypt::verify_crypt_pw path no longer appears in the
>    hot section of the flamegraph. CPU usage is now dominated by TLS
>    overhead.
> 3. Functionally, I verified that:
>    * valid tokens authenticate correctly when used in API requests
>    * invalid secrets are rejected as before
>    * generating a new token secret via dashboard (create token for
>    user, regenerate existing secret) works and authenticates correctly
> 
> To verify the effect in PDM (proxmox-access-control changes), instead
> of PBS’ /status, I profiled the /version endpoint with cargo flamegraph
> [2] and verified that the expensive hashing path disappears from the
> hot section after introducing caching. Functionally, I verified that:
>    * valid tokens authenticate correctly when used in API requests
>    * invalid secrets are rejected as before
>    * generating a new token secret via dashboard (create token for user,
>    regenerate existing secret) works and authenticates correctly
> 
> Results
> 
> To measure the caching effect, I benchmarked parallel token auth
> requests for /status?verbose=0 on top of the datastore lookup cache
> series [3] to check the throughput impact. With datastores=1,
> repeat=5000, parallel=16 this series gives ~172 req/s compared to
> ~65 req/s without it. This is a ~2.6x improvement (and aligns with the
> ~179 req/s from the previous series, which used per-process cache
> invalidation).
> 
> Patch summary
> 
> pbs-config:
> 0001 – pbs-config: add token.shadow generation to ConfigVersionCache
> 0002 – pbs-config: cache verified API token secrets
> 0003 – pbs-config: invalidate token-secret cache on token.shadow
> changes
> 0004 – pbs-config: add TTL window to token-secret cache
> 
> proxmox-access-control:
> 0005 – access-control: extend AccessControlConfig for token.shadow invalidation
> 0006 – access-control: cache verified API token secrets
> 0007 – access-control: invalidate token-secret cache on token.shadow changes
> 0008 – access-control: add TTL window to token-secret cache
> 
> proxmox-datacenter-manager:
> 0009 – pdm-config: add token.shadow generation to ConfigVersionCache
> 0010 – docs: document API token-cache TTL effects
> 0011 – pdm-config: wire user+acl cache generation
> 
> Maintainer Notes:
> * proxmox-access-control trait split: permissions now live in
>  AccessControlPermissions, and AccessControlConfig now requires
>  fn permissions(&self) -> &dyn AccessControlPermissions ->
>  version bump
> * Renames ConfigVersionCache's pub user_cache_generation and
>  increase_user_cache_generation -> version bump
> * Adds parking_lot::RwLock dependency in PBS and proxmox-access-control
> 
> This version and the version before only incorporate the reviewers'
> feedback [4][5]; please also consider Christian's R-b tag [4].
> 
> [1] https://bugzilla.proxmox.com/show_bug.cgi?id=7017
> [2] attachment 1767 [1]: Flamegraph showing the proxmox_sys::crypt::verify_crypt_pw stack
> [3] https://bugzilla.proxmox.com/show_bug.cgi?id=6049
> [4] https://lore.proxmox.com/pbs-devel/20260121151408.731516-1-s.rufinatscha@proxmox.com/T/#t
> [5] https://lore.proxmox.com/pbs-devel/20260217111229.78661-1-s.rufinatscha@proxmox.com/T/#t
> 
> proxmox-backup:
> 
> Samuel Rufinatscha (4):
>   pbs-config: add token.shadow generation to ConfigVersionCache
>   pbs-config: cache verified API token secrets
>   pbs-config: invalidate token-secret cache on token.shadow changes
>   pbs-config: add TTL window to token secret cache
> 
>  Cargo.toml                             |   1 +
>  docs/user-management.rst               |   4 +
>  pbs-config/Cargo.toml                  |   1 +
>  pbs-config/src/config_version_cache.rs |  18 ++
>  pbs-config/src/token_shadow.rs         | 314 ++++++++++++++++++++++++-
>  5 files changed, 335 insertions(+), 3 deletions(-)
> 
> 
> proxmox:
> 
> Samuel Rufinatscha (4):
>   proxmox-access-control: split AccessControlConfig and add token.shadow
>     gen
>   proxmox-access-control: cache verified API token secrets
>   proxmox-access-control: invalidate token-secret cache on token.shadow
>     changes
>   proxmox-access-control: add TTL window to token secret cache
> 
>  Cargo.toml                                 |   1 +
>  proxmox-access-control/Cargo.toml          |   1 +
>  proxmox-access-control/src/acl.rs          |  10 +-
>  proxmox-access-control/src/init.rs         | 113 ++++++--
>  proxmox-access-control/src/token_shadow.rs | 315 ++++++++++++++++++++-
>  5 files changed, 413 insertions(+), 27 deletions(-)
> 
> 
> proxmox-datacenter-manager:
> 
> Samuel Rufinatscha (3):
>   pdm-config: implement token.shadow generation
>   docs: document API token-cache TTL effects
>   pdm-config: wire user+acl cache generation
> 
>  cli/admin/src/main.rs                      |  2 +-
>  docs/access-control.rst                    |  4 +++
>  lib/pdm-api-types/src/acl.rs               |  4 +--
>  lib/pdm-config/Cargo.toml                  |  1 +
>  lib/pdm-config/src/access_control.rs       | 31 ++++++++++++++++++++
>  lib/pdm-config/src/config_version_cache.rs | 34 +++++++++++++++++-----
>  lib/pdm-config/src/lib.rs                  |  2 ++
>  server/src/acl.rs                          |  3 +-
>  ui/src/main.rs                             | 10 ++++++-
>  9 files changed, 77 insertions(+), 14 deletions(-)
>  create mode 100644 lib/pdm-config/src/access_control.rs
> 
> 
> Summary over all repositories:
>   19 files changed, 825 insertions(+), 44 deletions(-)
> 
> -- 
> Generated by git-murpp 0.8.1
> 
> 
> 
> 
> 




^ permalink raw reply	[relevance 6%]

* Re: [PATCH proxmox{-backup,,-datacenter-manager} v6 00/11] token-shadow: reduce api token verification overhead
  2026-03-11  8:59  6% ` [PATCH proxmox{-backup,,-datacenter-manager} v6 00/11] token-shadow: reduce api token verification overhead Fabian Grünbichler
@ 2026-03-11 16:26  6%   ` Samuel Rufinatscha
  0 siblings, 0 replies; 117+ results
From: Samuel Rufinatscha @ 2026-03-11 16:26 UTC (permalink / raw)
  To: Fabian Grünbichler, pbs-devel

On 3/11/26 9:59 AM, Fabian Grünbichler wrote:
> On March 3, 2026 5:49 pm, Samuel Rufinatscha wrote:
>> Hi,
>>
>> this series improves the performance of token-based API authentication
>> in PBS (pbs-config) and in PDM (underlying proxmox-access-control
>> crate), addressing the API token verification hotspot reported in our
>> bugtracker #7017 [1].
> 
> could you re-spin this so the proxmox/pdm parts compile (the proxmox
> workspace got switched to edition 2024, which clashes with the `gen`
> variable/binding name used in the patches here).
> 
> please also adapt the PBS patches to the new compatible naming - at some
> point we probably want to switch that over as well ;)
> 

Sure, will take care of this! :) Thanks!






^ permalink raw reply	[relevance 6%]

* [PATCH proxmox-backup v7 1/4] pbs-config: add token.shadow generation to ConfigVersionCache
  2026-03-12 10:36 14% [PATCH proxmox{-backup,,-datacenter-manager} v7 " Samuel Rufinatscha
@ 2026-03-12 10:36 17% ` Samuel Rufinatscha
  2026-03-12 10:36 11% ` [PATCH proxmox-backup v7 2/4] pbs-config: cache verified API token secrets Samuel Rufinatscha
                   ` (11 subsequent siblings)
  12 siblings, 0 replies; 117+ results
From: Samuel Rufinatscha @ 2026-03-12 10:36 UTC (permalink / raw)
  To: pbs-devel

Prepares the config version cache to support token_shadow caching.

Safety: the shmem mapping is fixed to 4096 bytes via the #[repr(C)]
union padding, and the new atomic is appended to the end of the
#[repr(C)] inner struct, so all existing field offsets stay unchanged.
Old processes keep accessing the same bytes and new processes consume
previously reserved padding.

Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
---
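To illustrate the layout argument from the commit message, here is a small sketch. It is not the actual ConfigVersionCache definition: the struct and field names are simplified stand-ins, and only the pattern follows the description above (a #[repr(C)] union that pads the inner struct to 4096 bytes, with the new atomic appended at the end).

```rust
use std::mem::{size_of, ManuallyDrop};
use std::sync::atomic::AtomicUsize;

// Hypothetical layout before the change.
#[repr(C)]
struct InnerBefore {
    user_generation: AtomicUsize,
    datastore_generation: AtomicUsize,
}

// Hypothetical layout after the change.
#[repr(C)]
struct InnerAfter {
    user_generation: AtomicUsize,
    datastore_generation: AtomicUsize,
    // Appended at the end, so the fields above keep their offsets.
    token_shadow_generation: AtomicUsize,
}

/// Pads `T` to a fixed 4096-byte mapping size.
#[repr(C)]
union Padded<T> {
    data: ManuallyDrop<T>,
    _padding: [u8; 4096],
}

/// Byte offset of `field` inside `base`.
fn offset_in<T, F>(base: &T, field: &F) -> usize {
    field as *const F as usize - base as *const T as usize
}

/// Checks that the mapping size and the existing field offsets are unchanged.
fn layouts_are_compatible() -> bool {
    let before = InnerBefore {
        user_generation: AtomicUsize::new(0),
        datastore_generation: AtomicUsize::new(0),
    };
    let after = InnerAfter {
        user_generation: AtomicUsize::new(0),
        datastore_generation: AtomicUsize::new(0),
        token_shadow_generation: AtomicUsize::new(0),
    };

    size_of::<Padded<InnerBefore>>() == 4096
        && size_of::<Padded<InnerAfter>>() == 4096
        && offset_in(&before, &before.user_generation)
            == offset_in(&after, &after.user_generation)
        && offset_in(&before, &before.datastore_generation)
            == offset_in(&after, &after.datastore_generation)
        && offset_in(&after, &after.token_shadow_generation)
            == 2 * size_of::<AtomicUsize>()
}
```

The new field only occupies bytes that were padding before, which is why processes built against the old layout can keep sharing the same mapping.
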
Changes from v6 to v7:
* Rebased

Changes from v5 to v6:
* Rebased

Changes from v4 to v5:
* Rebased

Changes from v3 to v4:
* Rebased
* Adjusted commit message

Changes from v2 to v3:
* Rebased

Changes from v1 to v2:
* Rebased

 pbs-config/src/config_version_cache.rs | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/pbs-config/src/config_version_cache.rs b/pbs-config/src/config_version_cache.rs
index b875f7e0..399a6f79 100644
--- a/pbs-config/src/config_version_cache.rs
+++ b/pbs-config/src/config_version_cache.rs
@@ -27,6 +27,8 @@ struct ConfigVersionCacheDataInner {
     traffic_control_generation: AtomicUsize,
     // datastore (datastore.cfg) generation/version
     datastore_generation: AtomicUsize,
+    // Token shadow (token.shadow) generation/version.
+    token_shadow_generation: AtomicUsize,
     // Add further atomics here
 }
 
@@ -159,4 +161,20 @@ impl ConfigVersionCache {
             .datastore_generation
             .fetch_add(1, Ordering::AcqRel)
     }
+
+    /// Returns the token shadow generation number.
+    pub fn token_shadow_generation(&self) -> usize {
+        self.shmem
+            .data()
+            .token_shadow_generation
+            .load(Ordering::Acquire)
+    }
+
+    /// Increase the token shadow generation number.
+    pub fn increase_token_shadow_generation(&self) -> usize {
+        self.shmem
+            .data()
+            .token_shadow_generation
+            .fetch_add(1, Ordering::AcqRel)
+    }
 }
-- 
2.47.3





^ permalink raw reply related	[relevance 17%]

* [PATCH proxmox-backup v7 4/4] pbs-config: add TTL window to token secret cache
  2026-03-12 10:36 14% [PATCH proxmox{-backup,,-datacenter-manager} v7 " Samuel Rufinatscha
                   ` (2 preceding siblings ...)
  2026-03-12 10:36 12% ` [PATCH proxmox-backup v7 3/4] pbs-config: invalidate token-secret cache on token.shadow changes Samuel Rufinatscha
@ 2026-03-12 10:37 15% ` Samuel Rufinatscha
  2026-03-12 10:37 14% ` [PATCH proxmox v7 1/4] proxmox-access-control: split AccessControlConfig and add token.shadow gen Samuel Rufinatscha
                   ` (8 subsequent siblings)
  12 siblings, 0 replies; 117+ results
From: Samuel Rufinatscha @ 2026-03-12 10:37 UTC (permalink / raw)
  To: pbs-devel

verify_secret() currently calls refresh_cache_if_file_changed() on every
request, which performs a metadata() call on token.shadow each time.
Under load this adds unnecessary overhead, especially since the file
rarely changes.

This patch introduces a TTL boundary, controlled by
TOKEN_SECRET_CACHE_TTL_SECS. File metadata is only re-checked once the
TTL has expired. The patch also documents the TTL effects.
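
As a rough illustration of the TTL gate, the sketch below decides
whether a metadata() call is due. FileCheck, within_ttl and needs_stat
are hypothetical names for this example only; the 60-second value
mirrors TOKEN_SECRET_CACHE_TTL_SECS from the patch, and locking is left
out entirely.

```rust
/// Assumed TTL in seconds (the patch uses 60).
const TTL_SECS: i64 = 60;

/// Hypothetical record of the last metadata check (epoch seconds).
struct FileCheck {
    last_checked: i64,
}

/// True if a previous check exists and is still inside the TTL window.
/// A clock that moved backwards (now < last_checked) counts as expired.
fn within_ttl(state: Option<&FileCheck>, now: i64) -> bool {
    state.is_some_and(|s| now >= s.last_checked && now - s.last_checked < TTL_SECS)
}

/// Returns true when the caller should stat the file, and records the
/// check time in that case.
fn needs_stat(state: &mut Option<FileCheck>, now: i64) -> bool {
    if within_ttl(state.as_ref(), now) {
        return false;
    }
    *state = Some(FileCheck { last_checked: now });
    true
}
```

Treating "no prior state" and "clock went backwards" as expired keeps
the gate conservative: in doubt, the file is checked again.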

Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
---
Changes from v6 to v7:
* Rebased

Changes from v5 to v6:
* Rebased

Changes from v4 to v5:
* Rebased
* Introduce shadow_check_within_ttl() helper

Changes from v3 to v4:
* Adjusted commit message

Changes from v2 to v3:
* Refactored refresh_cache_if_file_changed TTL logic.
* Remove had_prior_state check (replaced by last_checked logic).
* Improve TTL bound checks.
* Reword documentation warning for clarity.

Changes from v1 to v2:
* Add TOKEN_SECRET_CACHE_TTL_SECS and last_checked.
* Implement double-checked TTL: check with try_read first; only attempt
  refresh with try_write if expired/unknown.
* Fix TTL bookkeeping: update last_checked on the “file unchanged” path
  and after API mutations.
* Add documentation warning about TTL-delayed effect of manual
  token.shadow edits.

 docs/user-management.rst       |  4 ++++
 pbs-config/src/token_shadow.rs | 30 +++++++++++++++++++++++++++++-
 2 files changed, 33 insertions(+), 1 deletion(-)

diff --git a/docs/user-management.rst b/docs/user-management.rst
index 41b43d60..8dfae528 100644
--- a/docs/user-management.rst
+++ b/docs/user-management.rst
@@ -156,6 +156,10 @@ metadata:
 Similarly, the ``user delete-token`` subcommand can be used to delete a token
 again.
 
+.. WARNING:: Direct/manual edits to ``token.shadow`` may take up to 60 seconds (or
+   longer in edge cases) to take effect due to caching. Restart services for
+   immediate effect of manual edits.
+
 Newly generated API tokens don't have any permissions. Please read the next
 section to learn how to set access permissions.
 
diff --git a/pbs-config/src/token_shadow.rs b/pbs-config/src/token_shadow.rs
index c08252c8..8c5a7b97 100644
--- a/pbs-config/src/token_shadow.rs
+++ b/pbs-config/src/token_shadow.rs
@@ -31,6 +31,8 @@ static TOKEN_SECRET_CACHE: LazyLock<RwLock<ApiTokenSecretCache>> = LazyLock::new
         shadow: None,
     })
 });
+/// Max age in seconds of the token secret cache before checking for file changes.
+const TOKEN_SECRET_CACHE_TTL_SECS: i64 = 60;
 
 #[derive(Serialize, Deserialize)]
 #[serde(rename_all = "kebab-case")]
@@ -72,11 +74,24 @@ fn write_file(data: HashMap<Authid, String>) -> Result<(), Error> {
 fn refresh_cache_if_file_changed() -> bool {
     let now = epoch_i64();
 
-    // Best-effort refresh under write lock.
+    // Fast path: cache is fresh if shared-gen matches and TTL not expired.
+    if let (Some(cache), Some(shared_gen_read)) =
+        (TOKEN_SECRET_CACHE.try_read(), token_shadow_shared_gen())
+    {
+        if cache.shared_gen == shared_gen_read && cache.shadow_check_within_ttl(now) {
+            return true;
+        }
+        // read lock drops here
+    } else {
+        return false;
+    }
+
+    // Slow path: best-effort refresh under write lock.
     let Some(mut cache) = TOKEN_SECRET_CACHE.try_write() else {
         return false;
     };
 
+    // Re-read generation after acquiring the lock (may have changed meanwhile).
     let Some(shared_gen_now) = token_shadow_shared_gen() else {
         return false;
     };
@@ -86,6 +101,12 @@ fn refresh_cache_if_file_changed() -> bool {
         cache.reset_and_set_gen(shared_gen_now);
     }
 
+    // TTL check again after acquiring the lock
+    let now = epoch_i64();
+    if cache.shadow_check_within_ttl(now) {
+        return true;
+    }
+
     // Stat the file to detect manual edits.
     let Ok((new_mtime, new_len)) = shadow_mtime_len() else {
         return false;
@@ -234,6 +255,13 @@ impl ApiTokenSecretCache {
         self.secrets.remove(tokenid);
         self.shared_gen = new_gen;
     }
+
+    /// Returns true if cached token.shadow metadata exists and was checked within the TTL window.
+    fn shadow_check_within_ttl(&self, now: i64) -> bool {
+        self.shadow.as_ref().is_some_and(|cached| {
+            now >= cached.last_checked && (now - cached.last_checked) < TOKEN_SECRET_CACHE_TTL_SECS
+        })
+    }
 }
 
 /// Shadow file info
-- 
2.47.3





^ permalink raw reply related	[relevance 15%]

* [PATCH proxmox-datacenter-manager v7 2/3] docs: document API token-cache TTL effects
  2026-03-12 10:36 14% [PATCH proxmox{-backup,,-datacenter-manager} v7 " Samuel Rufinatscha
                   ` (8 preceding siblings ...)
  2026-03-12 10:37 13% ` [PATCH proxmox-datacenter-manager v7 1/3] pdm-config: implement token.shadow generation Samuel Rufinatscha
@ 2026-03-12 10:37 17% ` Samuel Rufinatscha
  2026-03-12 10:37 16% ` [PATCH proxmox-datacenter-manager v7 3/3] pdm-config: wire user+acl cache generation Samuel Rufinatscha
                   ` (2 subsequent siblings)
  12 siblings, 0 replies; 117+ results
From: Samuel Rufinatscha @ 2026-03-12 10:37 UTC (permalink / raw)
  To: pbs-devel

Documents the effects of the added API token-cache in the
proxmox-access-control crate.

Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
---
Changes from v6 to v7:
* Rebased

Changes from v5 to v6:
* Rebased

Changes from v4 to v5:
* Rebased

Changes from v3 to v4:
* Adjusted commit message

 docs/access-control.rst | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/docs/access-control.rst b/docs/access-control.rst
index adf26cd..18e57a2 100644
--- a/docs/access-control.rst
+++ b/docs/access-control.rst
@@ -47,6 +47,10 @@ place of the user ID (``user@realm``) and the user password, respectively.
 The API token is passed from the client to the server by setting the ``Authorization`` HTTP header
 with method ``PDMAPIToken`` to the value ``TOKENID:TOKENSECRET``.
 
+.. WARNING:: Direct/manual edits to ``token.shadow`` may take up to 60 seconds (or
+   longer in edge cases) to take effect due to caching. Restart services for
+   immediate effect of manual edits.
+
 .. _access_control:
 
 Access Control
-- 
2.47.3





^ permalink raw reply related	[relevance 17%]

* [PATCH proxmox-backup v7 3/4] pbs-config: invalidate token-secret cache on token.shadow changes
  2026-03-12 10:36 14% [PATCH proxmox{-backup,,-datacenter-manager} v7 " Samuel Rufinatscha
  2026-03-12 10:36 17% ` [PATCH proxmox-backup v7 1/4] pbs-config: add token.shadow generation to ConfigVersionCache Samuel Rufinatscha
  2026-03-12 10:36 11% ` [PATCH proxmox-backup v7 2/4] pbs-config: cache verified API token secrets Samuel Rufinatscha
@ 2026-03-12 10:36 12% ` Samuel Rufinatscha
  2026-03-12 10:37 15% ` [PATCH proxmox-backup v7 4/4] pbs-config: add TTL window to token secret cache Samuel Rufinatscha
                   ` (9 subsequent siblings)
  12 siblings, 0 replies; 117+ results
From: Samuel Rufinatscha @ 2026-03-12 10:36 UTC (permalink / raw)
  To: pbs-devel

This patch adds manual/direct file change detection by tracking the
mtime and length of token.shadow and clears the in-memory token secret
cache whenever these values change.
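
The detection idea can be sketched with the standard library alone.
Fingerprint, fingerprint() and changed() are names made up for this
example; only std::fs::metadata and its accessors are real API. A
missing file maps to (None, None) so that "file removed" also shows up
as a change.

```rust
use std::fs;
use std::io::{self, ErrorKind};
use std::path::Path;
use std::time::SystemTime;

/// (mtime, length) of a file; both None if the file does not exist.
type Fingerprint = (Option<SystemTime>, Option<u64>);

/// Stat the file and return its fingerprint.
fn fingerprint(path: &Path) -> io::Result<Fingerprint> {
    match fs::metadata(path) {
        Ok(meta) => Ok((meta.modified().ok(), Some(meta.len()))),
        Err(e) if e.kind() == ErrorKind::NotFound => Ok((None, None)),
        Err(e) => Err(e),
    }
}

/// True if the on-disk fingerprint differs from the cached one.
fn changed(cached: &Fingerprint, path: &Path) -> io::Result<bool> {
    Ok(fingerprint(path)? != *cached)
}
```

Note that this is a heuristic: an edit that keeps the length and lands
within the timestamp granularity of the file system may go unnoticed.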

Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
---
Changes from v6 to v7:
* Rebased

Changes from v5 to v6:
* Rebased

Changes from v4 to v5:
* Rebased

Changes from v3 to v4:
* make use of .replace() in refresh_cache_if_file_changed to get
previous state
* Group file stats with ShadowFileInfo
* Return false in refresh_cache_if_file_changed to avoid unnecessary cache
queries
* Adjusted commit message

Changes from v2 to v3:
* Cache now tracks last_checked (epoch seconds).
* Simplified refresh_cache_if_file_changed, removed
FILE_GENERATION logic
* On first load, initializes file metadata and keeps empty cache.

Changes from v1 to v2:
* Add file metadata tracking (file_mtime, file_len) and
  FILE_GENERATION.
* Store file_gen in CachedSecret and verify it against the current
  FILE_GENERATION to ensure cached entries belong to the current file
  state.
* Add shadow_mtime_len() helper and convert refresh to best-effort
  (try_write, returns bool).
* Pass a pre-write metadata snapshot into apply_api_mutation and
  clear/bump generation if the cache metadata indicates missed external
  edits.

 pbs-config/src/token_shadow.rs | 123 +++++++++++++++++++++++++++++++--
 1 file changed, 119 insertions(+), 4 deletions(-)

diff --git a/pbs-config/src/token_shadow.rs b/pbs-config/src/token_shadow.rs
index f6f962b3..c08252c8 100644
--- a/pbs-config/src/token_shadow.rs
+++ b/pbs-config/src/token_shadow.rs
@@ -1,5 +1,8 @@
 use std::collections::HashMap;
+use std::fs;
+use std::io::ErrorKind;
 use std::sync::LazyLock;
+use std::time::SystemTime;
 
 use anyhow::{bail, format_err, Error};
 use parking_lot::RwLock;
@@ -7,6 +10,7 @@ use serde::{Deserialize, Serialize};
 use serde_json::{from_value, Value};
 
 use proxmox_sys::fs::CreateOptions;
+use proxmox_time::epoch_i64;
 
 use pbs_api_types::Authid;
 //use crate::auth;
@@ -24,6 +28,7 @@ static TOKEN_SECRET_CACHE: LazyLock<RwLock<ApiTokenSecretCache>> = LazyLock::new
     RwLock::new(ApiTokenSecretCache {
         secrets: HashMap::new(),
         shared_gen: 0,
+        shadow: None,
     })
 });
 
@@ -62,6 +67,56 @@ fn write_file(data: HashMap<Authid, String>) -> Result<(), Error> {
     proxmox_sys::fs::replace_file(CONF_FILE, &json, options, true)
 }
 
+/// Refreshes the in-memory cache if the on-disk token.shadow file changed.
+/// Returns true if the cache is valid to use, false if not.
+fn refresh_cache_if_file_changed() -> bool {
+    let now = epoch_i64();
+
+    // Best-effort refresh under write lock.
+    let Some(mut cache) = TOKEN_SECRET_CACHE.try_write() else {
+        return false;
+    };
+
+    let Some(shared_gen_now) = token_shadow_shared_gen() else {
+        return false;
+    };
+
+    // If another process bumped the generation, we don't know what changed -> clear cache
+    if cache.shared_gen != shared_gen_now {
+        cache.reset_and_set_gen(shared_gen_now);
+    }
+
+    // Stat the file to detect manual edits.
+    let Ok((new_mtime, new_len)) = shadow_mtime_len() else {
+        return false;
+    };
+
+    // If the file didn't change, only update last_checked
+    if let Some(shadow) = cache.shadow.as_mut() {
+        if shadow.mtime == new_mtime && shadow.len == new_len {
+            shadow.last_checked = now;
+            return true;
+        }
+    }
+
+    cache.secrets.clear();
+
+    let prev = cache.shadow.replace(ShadowFileInfo {
+        mtime: new_mtime,
+        len: new_len,
+        last_checked: now,
+    });
+
+    if prev.is_some() {
+        // Best-effort propagation to other processes if a change was detected
+        if let Some(shared_gen_new) = bump_token_shadow_shared_gen() {
+            cache.shared_gen = shared_gen_new;
+        }
+    }
+
+    false
+}
+
 /// Verifies that an entry for given tokenid / API token secret exists
 pub fn verify_secret(tokenid: &Authid, secret: &str) -> Result<(), Error> {
     if !tokenid.is_token() {
@@ -69,7 +124,7 @@ pub fn verify_secret(tokenid: &Authid, secret: &str) -> Result<(), Error> {
     }
 
     // Fast path
-    if cache_try_secret_matches(tokenid, secret) {
+    if refresh_cache_if_file_changed() && cache_try_secret_matches(tokenid, secret) {
         return Ok(());
     }
 
@@ -109,12 +164,15 @@ fn set_secret(tokenid: &Authid, secret: &str) -> Result<(), Error> {
 
     let guard = lock_config()?;
 
+    // Capture state before we write to detect external edits.
+    let pre_meta = shadow_mtime_len().unwrap_or((None, None));
+
     let mut data = read_file()?;
     let hashed_secret = proxmox_sys::crypt::encrypt_pw(secret)?;
     data.insert(tokenid.clone(), hashed_secret);
     write_file(data)?;
 
-    apply_api_mutation(guard, tokenid, Some(secret));
+    apply_api_mutation(guard, tokenid, Some(secret), pre_meta);
 
     Ok(())
 }
@@ -127,11 +185,14 @@ pub fn delete_secret(tokenid: &Authid) -> Result<(), Error> {
 
     let guard = lock_config()?;
 
+    // Capture state before we write to detect external edits.
+    let pre_meta = shadow_mtime_len().unwrap_or((None, None));
+
     let mut data = read_file()?;
     data.remove(tokenid);
     write_file(data)?;
 
-    apply_api_mutation(guard, tokenid, None);
+    apply_api_mutation(guard, tokenid, None, pre_meta);
 
     Ok(())
 }
@@ -150,6 +211,8 @@ struct ApiTokenSecretCache {
     secrets: HashMap<Authid, CachedSecret>,
     /// Shared generation to detect mutations of the underlying token.shadow file.
     shared_gen: usize,
+    /// Shadow file info to detect changes
+    shadow: Option<ShadowFileInfo>,
 }
 
 impl ApiTokenSecretCache {
@@ -157,6 +220,7 @@ impl ApiTokenSecretCache {
     fn reset_and_set_gen(&mut self, new_gen: usize) {
         self.secrets.clear();
         self.shared_gen = new_gen;
+        self.shadow = None;
     }
 
     /// Caches a secret and sets/updates the cache generation.
@@ -172,6 +236,16 @@ impl ApiTokenSecretCache {
     }
 }
 
+/// Shadow file info
+struct ShadowFileInfo {
+    // shadow file mtime to detect changes
+    mtime: Option<SystemTime>,
+    // shadow file length to detect changes
+    len: Option<u64>,
+    // last time the file metadata was checked
+    last_checked: i64,
+}
+
 fn cache_try_insert_secret(tokenid: Authid, secret: String, shared_gen_before: usize) {
     let Some(mut cache) = TOKEN_SECRET_CACHE.try_write() else {
         return;
@@ -220,7 +294,14 @@ fn cache_try_secret_matches(tokenid: &Authid, secret: &str) -> bool {
     false
 }
 
-fn apply_api_mutation(_guard: BackupLockGuard, tokenid: &Authid, new_secret: Option<&str>) {
+fn apply_api_mutation(
+    _guard: BackupLockGuard,
+    tokenid: &Authid,
+    new_secret: Option<&str>,
+    pre_write_meta: (Option<SystemTime>, Option<u64>),
+) {
+    let now = epoch_i64();
+
     // Signal cache invalidation to other processes (best-effort).
     let bumped_gen = bump_token_shadow_shared_gen();
 
@@ -239,6 +320,16 @@ fn apply_api_mutation(_guard: BackupLockGuard, tokenid: &Authid, new_secret: Opt
         return;
     }
 
+    // If our cached file metadata does not match the on-disk state before our write,
+    // we likely missed an external/manual edit. We can no longer trust any cached secrets.
+    if cache
+        .shadow
+        .as_ref()
+        .is_some_and(|s| (s.mtime, s.len) != pre_write_meta)
+    {
+        cache.secrets.clear();
+    }
+
     // Apply the new mutation.
     match new_secret {
         Some(secret) => {
@@ -249,6 +340,22 @@ fn apply_api_mutation(_guard: BackupLockGuard, tokenid: &Authid, new_secret: Opt
         }
         None => cache.evict_and_set_gen(tokenid, current_gen),
     }
+
+    // Update our view of the file metadata to the post-write state (best-effort).
+    // (If this fails, drop local cache so callers fall back to slow path until refreshed.)
+    match shadow_mtime_len() {
+        Ok((mtime, len)) => {
+            cache.shadow = Some(ShadowFileInfo {
+                mtime,
+                len,
+                last_checked: now,
+            });
+        }
+        Err(_) => {
+            // If we cannot validate state, do not trust cache.
+            cache.reset_and_set_gen(current_gen);
+        }
+    }
 }
 
 /// Get the current shared generation.
@@ -264,3 +371,11 @@ fn bump_token_shadow_shared_gen() -> Option<usize> {
         .ok()
         .map(|cvc| cvc.increase_token_shadow_generation() + 1)
 }
+
+fn shadow_mtime_len() -> Result<(Option<SystemTime>, Option<u64>), Error> {
+    match fs::metadata(CONF_FILE) {
+        Ok(meta) => Ok((meta.modified().ok(), Some(meta.len()))),
+        Err(e) if e.kind() == ErrorKind::NotFound => Ok((None, None)),
+        Err(e) => Err(e.into()),
+    }
+}
-- 
2.47.3





^ permalink raw reply related	[relevance 12%]

* [PATCH proxmox-datacenter-manager v7 3/3] pdm-config: wire user+acl cache generation
  2026-03-12 10:36 14% [PATCH proxmox{-backup,,-datacenter-manager} v7 " Samuel Rufinatscha
                   ` (9 preceding siblings ...)
  2026-03-12 10:37 17% ` [PATCH proxmox-datacenter-manager v7 2/3] docs: document API token-cache TTL effects Samuel Rufinatscha
@ 2026-03-12 10:37 16% ` Samuel Rufinatscha
  2026-03-19 12:26  5% ` partially-applied: [PATCH proxmox{-backup,,-datacenter-manager} v7 00/11] token-shadow: reduce api token verification overhead Fabian Grünbichler
  2026-04-09 15:58 13% ` [pbs-devel] superseded: " Samuel Rufinatscha
  12 siblings, 0 replies; 117+ results
From: Samuel Rufinatscha @ 2026-03-12 10:37 UTC (permalink / raw)
  To: pbs-devel

Rename ConfigVersionCache’s user_cache_generation to
user_and_acl_generation to match AccessControlConfig::cache_generation
and increment_cache_generation semantics: it expects the same shared
generation for both user and ACL configs.

Safety: no layout change, the shared-memory size and field order remain
unchanged.

Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
---
Changes from v6 to v7:
* Rebased

Changes from v5 to v6:
* Rebased

Changes from v4 to v5:
* Rebased

 lib/pdm-config/src/access_control.rs       | 11 +++++++++++
 lib/pdm-config/src/config_version_cache.rs | 16 ++++++++--------
 2 files changed, 19 insertions(+), 8 deletions(-)

diff --git a/lib/pdm-config/src/access_control.rs b/lib/pdm-config/src/access_control.rs
index 389b3f4..1d498d3 100644
--- a/lib/pdm-config/src/access_control.rs
+++ b/lib/pdm-config/src/access_control.rs
@@ -7,6 +7,17 @@ impl proxmox_access_control::init::AccessControlConfig for AccessControlConfig {
         &pdm_api_types::AccessControlPermissions
     }
 
+    fn cache_generation(&self) -> Option<usize> {
+        crate::ConfigVersionCache::new()
+            .ok()
+            .map(|c| c.user_and_acl_generation())
+    }
+
+    fn increment_cache_generation(&self) -> Result<(), Error> {
+        let c = crate::ConfigVersionCache::new()?;
+        Ok(c.increase_user_and_acl_generation())
+    }
+
     fn token_shadow_cache_generation(&self) -> Option<usize> {
         crate::ConfigVersionCache::new()
             .ok()
diff --git a/lib/pdm-config/src/config_version_cache.rs b/lib/pdm-config/src/config_version_cache.rs
index 933140c..f3d52a0 100644
--- a/lib/pdm-config/src/config_version_cache.rs
+++ b/lib/pdm-config/src/config_version_cache.rs
@@ -21,8 +21,8 @@ use proxmox_shared_memory::*;
 #[repr(C)]
 struct ConfigVersionCacheDataInner {
     magic: [u8; 8],
-    // User (user.cfg) cache generation/version.
-    user_cache_generation: AtomicUsize,
+    // User (user.cfg) and ACL (acl.cfg) generation/version.
+    user_and_acl_generation: AtomicUsize,
     // Traffic control (traffic-control.cfg) generation/version.
     traffic_control_generation: AtomicUsize,
     // Tracks updates to the remote/hostname/nodename mapping cache.
@@ -126,19 +126,19 @@ impl ConfigVersionCache {
         Ok(Arc::new(Self { shmem }))
     }
 
-    /// Returns the user cache generation number.
-    pub fn user_cache_generation(&self) -> usize {
+    /// Returns the user and ACL cache generation number.
+    pub fn user_and_acl_generation(&self) -> usize {
         self.shmem
             .data()
-            .user_cache_generation
+            .user_and_acl_generation
             .load(Ordering::Acquire)
     }
 
-    /// Increase the user cache generation number.
-    pub fn increase_user_cache_generation(&self) {
+    /// Increase the user and ACL cache generation number.
+    pub fn increase_user_and_acl_generation(&self) {
         self.shmem
             .data()
-            .user_cache_generation
+            .user_and_acl_generation
             .fetch_add(1, Ordering::AcqRel);
     }
 
-- 
2.47.3





^ permalink raw reply related	[relevance 16%]

* [PATCH proxmox v7 1/4] proxmox-access-control: split AccessControlConfig and add token.shadow gen
  2026-03-12 10:36 14% [PATCH proxmox{-backup,,-datacenter-manager} v7 " Samuel Rufinatscha
                   ` (3 preceding siblings ...)
  2026-03-12 10:37 15% ` [PATCH proxmox-backup v7 4/4] pbs-config: add TTL window to token secret cache Samuel Rufinatscha
@ 2026-03-12 10:37 14% ` Samuel Rufinatscha
  2026-03-12 10:37 11% ` [PATCH proxmox v7 2/4] proxmox-access-control: cache verified API token secrets Samuel Rufinatscha
                   ` (7 subsequent siblings)
  12 siblings, 0 replies; 117+ results
From: Samuel Rufinatscha @ 2026-03-12 10:37 UTC (permalink / raw)
  To: pbs-devel

Splits AccessControlConfig into permissions and config traits and adds
token.shadow generation support. The trait split separates permission
from cache/invalidation concerns while keeping existing call sites
working via default delegation.
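
A reduced sketch of the delegation pattern, with made-up trait and type
names (Permissions, Config, Product) and a single permission method
instead of the full trait surface:

```rust
/// Permission-only concerns (stand-in for AccessControlPermissions).
trait Permissions {
    fn is_superuser(&self, user: &str) -> bool;
}

/// Config trait (stand-in for AccessControlConfig): forwards permission
/// queries by default and carries the cache hooks.
trait Config {
    /// The only method an implementor must provide.
    fn permissions(&self) -> &dyn Permissions;

    /// Default delegation keeps existing `config.is_superuser(..)`
    /// call sites compiling.
    fn is_superuser(&self, user: &str) -> bool {
        self.permissions().is_superuser(user)
    }

    /// Cache hook with a "no caching" default.
    fn cache_generation(&self) -> Option<usize> {
        None
    }
}

/// Hypothetical product type implementing both traits.
struct Product;

impl Permissions for Product {
    fn is_superuser(&self, user: &str) -> bool {
        user == "root@pam"
    }
}

impl Config for Product {
    fn permissions(&self) -> &dyn Permissions {
        self
    }
}
```

Callers that hold a `&dyn Config` see the same methods as before the
split, while a product only has to add the one-line permissions()
implementation.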

Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
---
Changes from v6 to v7:
* Rebased

Changes from v5 to v6:
* Rebased

Changes from v4 to v5:
* Rebased

 proxmox-access-control/src/acl.rs  |  10 ++-
 proxmox-access-control/src/init.rs | 113 +++++++++++++++++++++++------
 2 files changed, 99 insertions(+), 24 deletions(-)

diff --git a/proxmox-access-control/src/acl.rs b/proxmox-access-control/src/acl.rs
index 38cb7edf..4b4eac09 100644
--- a/proxmox-access-control/src/acl.rs
+++ b/proxmox-access-control/src/acl.rs
@@ -763,7 +763,7 @@ fn privs_to_priv_names(privs: u64) -> Vec<&'static str> {
 mod test {
     use std::{collections::HashMap, sync::OnceLock};
 
-    use crate::init::{init_access_config, AccessControlConfig};
+    use crate::init::{init_access_config, AccessControlConfig, AccessControlPermissions};
 
     use super::AclTree;
     use anyhow::Error;
@@ -775,7 +775,7 @@ mod test {
         roles: HashMap<&'a str, (u64, &'a str)>,
     }
 
-    impl AccessControlConfig for TestAcmConfig<'_> {
+    impl AccessControlPermissions for TestAcmConfig<'_> {
         fn roles(&self) -> &HashMap<&str, (u64, &str)> {
             &self.roles
         }
@@ -793,6 +793,12 @@ mod test {
         }
     }
 
+    impl AccessControlConfig for TestAcmConfig<'_> {
+        fn permissions(&self) -> &dyn AccessControlPermissions {
+            self
+        }
+    }
+
     fn setup_acl_tree_config() {
         static ACL_CONFIG: OnceLock<TestAcmConfig> = OnceLock::new();
         let config = ACL_CONFIG.get_or_init(|| {
diff --git a/proxmox-access-control/src/init.rs b/proxmox-access-control/src/init.rs
index e64398e8..dfd7784b 100644
--- a/proxmox-access-control/src/init.rs
+++ b/proxmox-access-control/src/init.rs
@@ -8,9 +8,8 @@ use proxmox_section_config::SectionConfigData;
 
 static ACCESS_CONF: OnceLock<&'static dyn AccessControlConfig> = OnceLock::new();
 
-/// This trait specifies the functions a product needs to implement to get ACL tree based access
-/// control management from this plugin.
-pub trait AccessControlConfig: Send + Sync {
+/// Provides permission metadata used by access control.
+pub trait AccessControlPermissions: Send + Sync {
     /// Returns a mapping of all recognized privileges and their corresponding `u64` value.
     fn privileges(&self) -> &HashMap<&str, u64>;
 
@@ -32,25 +31,6 @@ pub trait AccessControlConfig: Send + Sync {
         false
     }
 
-    /// Returns the current cache generation of the user and acl configs. If the generation was
-    /// incremented since the last time the cache was queried, the configs are loaded again from
-    /// disk.
-    ///
-    /// Returning `None` will always reload the cache.
-    ///
-    /// Default: Always returns `None`.
-    fn cache_generation(&self) -> Option<usize> {
-        None
-    }
-
-    /// Increment the cache generation of user and acl configs. This indicates that they were
-    /// changed on disk.
-    ///
-    /// Default: Does nothing.
-    fn increment_cache_generation(&self) -> Result<(), Error> {
-        Ok(())
-    }
-
     /// Optionally returns a role that has no access to any resource.
     ///
     /// Default: Returns `None`.
@@ -103,6 +83,95 @@ pub trait AccessControlConfig: Send + Sync {
     }
 }
 
+/// This trait specifies the functions a product needs to implement to get ACL tree based access
+/// control management from this plugin.
+pub trait AccessControlConfig: Send + Sync {
+    /// Return the permissions provider.
+    fn permissions(&self) -> &dyn AccessControlPermissions;
+
+    fn privileges(&self) -> &HashMap<&str, u64> {
+        self.permissions().privileges()
+    }
+
+    fn roles(&self) -> &HashMap<&str, (u64, &str)> {
+        self.permissions().roles()
+    }
+
+    fn is_superuser(&self, auth_id: &Authid) -> bool {
+        self.permissions().is_superuser(auth_id)
+    }
+
+    fn is_group_member(&self, user_id: &Userid, group: &str) -> bool {
+        self.permissions().is_group_member(user_id, group)
+    }
+
+    fn role_no_access(&self) -> Option<&str> {
+        self.permissions().role_no_access()
+    }
+
+    fn role_admin(&self) -> Option<&str> {
+        self.permissions().role_admin()
+    }
+
+    fn init_user_config(&self, config: &mut SectionConfigData) -> Result<(), Error> {
+        self.permissions().init_user_config(config)
+    }
+
+    fn acl_audit_privileges(&self) -> u64 {
+        self.permissions().acl_audit_privileges()
+    }
+
+    fn acl_modify_privileges(&self) -> u64 {
+        self.permissions().acl_modify_privileges()
+    }
+
+    fn check_acl_path(&self, path: &str) -> Result<(), Error> {
+        self.permissions().check_acl_path(path)
+    }
+
+    fn allow_partial_permission_match(&self) -> bool {
+        self.permissions().allow_partial_permission_match()
+    }
+
+    // Cache hooks
+
+    /// Returns the current cache generation of the user and acl configs. If the generation was
+    /// incremented since the last time the cache was queried, the configs are loaded again from
+    /// disk.
+    ///
+    /// Returning `None` will always reload the cache.
+    ///
+    /// Default: Always returns `None`.
+    fn cache_generation(&self) -> Option<usize> {
+        None
+    }
+
+    /// Increment the cache generation of user and acl configs. This indicates that they were
+    /// changed on disk.
+    ///
+    /// Default: Does nothing.
+    fn increment_cache_generation(&self) -> Result<(), Error> {
+        Ok(())
+    }
+
+    /// Returns the current cache generation of the token shadow cache. If the generation was
+    /// incremented since the last time the cache was queried, the token shadow cache is reloaded
+    /// from disk.
+    ///
+    /// Default: Always returns `None`.
+    fn token_shadow_cache_generation(&self) -> Option<usize> {
+        None
+    }
+
+    /// Increment the cache generation of the token shadow cache. This indicates that it was
+    /// changed on disk.
+    ///
+    /// Default: Returns an error as token shadow generation is not supported.
+    fn increment_token_shadow_cache_generation(&self) -> Result<usize, Error> {
+        anyhow::bail!("token shadow generation not supported");
+    }
+}
+
 pub fn init_access_config(config: &'static dyn AccessControlConfig) -> Result<(), Error> {
     ACCESS_CONF
         .set(config)
-- 
2.47.3





^ permalink raw reply related	[relevance 14%]

* [PATCH proxmox-backup v7 2/4] pbs-config: cache verified API token secrets
  2026-03-12 10:36 14% [PATCH proxmox{-backup,,-datacenter-manager} v7 " Samuel Rufinatscha
  2026-03-12 10:36 17% ` [PATCH proxmox-backup v7 1/4] pbs-config: add token.shadow generation to ConfigVersionCache Samuel Rufinatscha
@ 2026-03-12 10:36 11% ` Samuel Rufinatscha
  2026-03-12 10:36 12% ` [PATCH proxmox-backup v7 3/4] pbs-config: invalidate token-secret cache on token.shadow changes Samuel Rufinatscha
                   ` (10 subsequent siblings)
  12 siblings, 0 replies; 117+ results
From: Samuel Rufinatscha @ 2026-03-12 10:36 UTC (permalink / raw)
  To: pbs-devel

Adds an in-memory cache of successfully verified token secrets.
Subsequent requests for the same token+secret combination only perform
a comparison using openssl::memcmp::eq and avoid re-running the
password hash. The cache is updated when a token secret is set and
cleared when a token is deleted. A shared generation counter (via
ConfigVersionCache) is used to invalidate caches across processes when
token secrets are modified or deleted. This keeps privileged and
unprivileged daemons in sync.
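
The interplay of cache and generation counter can be sketched as
follows. This is a simplified, single-process model with hypothetical
names (SecretCache, ct_eq); a plain AtomicUsize stands in for the
shared-memory counter, and the hand-written byte comparison is only a
placeholder for openssl::memcmp::eq, without any timing guarantees.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical cache of verified secrets, tagged with the generation
/// it was filled under.
struct SecretCache {
    secrets: HashMap<String, String>,
    shared_gen: usize,
}

/// Placeholder comparison: checks the length first, then ORs all byte
/// differences together.
fn ct_eq(a: &[u8], b: &[u8]) -> bool {
    if a.len() != b.len() {
        return false;
    }
    a.iter().zip(b).fold(0u8, |acc, (x, y)| acc | (x ^ y)) == 0
}

impl SecretCache {
    fn new() -> Self {
        Self { secrets: HashMap::new(), shared_gen: 0 }
    }

    /// Fast path: drop everything if the generation moved, else compare.
    fn matches(&mut self, shared: &AtomicUsize, id: &str, secret: &str) -> bool {
        let gen_now = shared.load(Ordering::Acquire);
        if self.shared_gen != gen_now {
            self.secrets.clear();
            self.shared_gen = gen_now;
            return false;
        }
        self.secrets
            .get(id)
            .is_some_and(|cached| ct_eq(cached.as_bytes(), secret.as_bytes()))
    }

    /// Insert after a successful slow-path verification, but only if no
    /// mutation happened since `gen_before` was captured.
    fn insert_if_unchanged(&mut self, shared: &AtomicUsize, gen_before: usize, id: &str, secret: &str) {
        if shared.load(Ordering::Acquire) != gen_before {
            return; // stale result, do not cache
        }
        if self.shared_gen != gen_before {
            self.secrets.clear();
            self.shared_gen = gen_before;
        }
        self.secrets.insert(id.to_string(), secret.to_string());
    }
}
```

Capturing the generation before the expensive hash check and comparing
it again before the insert is what prevents a result verified against
an outdated token.shadow from being cached.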

Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
---
Changes from v6 to v7:
* Rebased
* Rename "gen" variables to be compatible with Rust 2024 keyword
changes

Changes from v5 to v6:
* Rebased
* Check that the input byte lengths are equal before calling
openssl::memcmp::eq(..).

Changes from v4 to v5:
* Rebased
* Move invalidate_cache_state_and_set_gen into cache object impl and
rename it to reset_and_set_gen
* Add additional insert/remove helpers which set/update the generation
directly
* Clarified the usage of shared generation counter in the commit
message

Changes from v3 to v4:
* Add gen param to invalidate_cache_state()
* Validates the generation bump after obtaining write lock in
apply_api_mutation
* Pass lock to apply_api_mutation
* Remove unnecessary gen check in cache_try_secret_matches
* Adjusted commit message

Changes from v2 to v3:
* Replaced process-local cache invalidation (AtomicU64
API_MUTATION_GENERATION) with a cross-process shared generation via
ConfigVersionCache.
* Validate shared generation before/after the constant-time secret
compare; only insert into cache if the generation is unchanged.
* invalidate_cache_state() on insert if shared generation changed.

Changes from v1 to v2:
* Replace OnceCell with LazyLock, and std::sync::RwLock with
parking_lot::RwLock.
* Add API_MUTATION_GENERATION and guard cache inserts
to prevent “zombie inserts” across concurrent set/delete.
* Refactor cache operations into cache_try_secret_matches,
cache_try_insert_secret, and centralize write-side behavior in
apply_api_mutation.
* Switch fast-path cache access to try_read/try_write (best-effort).

 Cargo.toml                     |   1 +
 pbs-config/Cargo.toml          |   1 +
 pbs-config/src/token_shadow.rs | 171 ++++++++++++++++++++++++++++++++-
 3 files changed, 170 insertions(+), 3 deletions(-)

diff --git a/Cargo.toml b/Cargo.toml
index ca0ee176..c3573238 100644
--- a/Cargo.toml
+++ b/Cargo.toml
@@ -146,6 +146,7 @@ nom = "7"
 num-traits = "0.2"
 once_cell = "1.3.1"
 openssl = "0.10.40"
+parking_lot = "0.12"
 percent-encoding = "2.1"
 pin-project-lite = "0.2"
 regex = "1.5.5"
diff --git a/pbs-config/Cargo.toml b/pbs-config/Cargo.toml
index 74afb3c6..eb81ce00 100644
--- a/pbs-config/Cargo.toml
+++ b/pbs-config/Cargo.toml
@@ -13,6 +13,7 @@ libc.workspace = true
 nix.workspace = true
 once_cell.workspace = true
 openssl.workspace = true
+parking_lot.workspace = true
 regex.workspace = true
 serde.workspace = true
 serde_json.workspace = true
diff --git a/pbs-config/src/token_shadow.rs b/pbs-config/src/token_shadow.rs
index 640fabbf..f6f962b3 100644
--- a/pbs-config/src/token_shadow.rs
+++ b/pbs-config/src/token_shadow.rs
@@ -1,6 +1,8 @@
 use std::collections::HashMap;
+use std::sync::LazyLock;
 
 use anyhow::{bail, format_err, Error};
+use parking_lot::RwLock;
 use serde::{Deserialize, Serialize};
 use serde_json::{from_value, Value};
 
@@ -13,6 +15,18 @@ use crate::{open_backup_lockfile, BackupLockGuard};
 const LOCK_FILE: &str = pbs_buildcfg::configdir!("/token.shadow.lock");
 const CONF_FILE: &str = pbs_buildcfg::configdir!("/token.shadow");
 
+/// Global in-memory cache for successfully verified API token secrets.
+/// The cache stores plain text secrets for token Authids that have already been
+/// verified against the hashed values in `token.shadow`. This allows for cheap
+/// subsequent authentications for the same token+secret combination, avoiding
+/// recomputing the password hash on every request.
+static TOKEN_SECRET_CACHE: LazyLock<RwLock<ApiTokenSecretCache>> = LazyLock::new(|| {
+    RwLock::new(ApiTokenSecretCache {
+        secrets: HashMap::new(),
+        shared_gen: 0,
+    })
+});
+
 #[derive(Serialize, Deserialize)]
 #[serde(rename_all = "kebab-case")]
 /// ApiToken id / secret pair
@@ -54,9 +68,27 @@ pub fn verify_secret(tokenid: &Authid, secret: &str) -> Result<(), Error> {
         bail!("not an API token ID");
     }
 
+    // Fast path
+    if cache_try_secret_matches(tokenid, secret) {
+        return Ok(());
+    }
+
+    // Slow path
+    // First, capture the shared generation before doing the hash verification.
+    let gen_before = token_shadow_shared_gen();
+
     let data = read_file()?;
     match data.get(tokenid) {
-        Some(hashed_secret) => proxmox_sys::crypt::verify_crypt_pw(secret, hashed_secret),
+        Some(hashed_secret) => {
+            proxmox_sys::crypt::verify_crypt_pw(secret, hashed_secret)?;
+
+            // Try to cache only if nothing changed while verifying the secret.
+            if let Some(gen_before) = gen_before {
+                cache_try_insert_secret(tokenid.clone(), secret.to_owned(), gen_before);
+            }
+
+            Ok(())
+        }
         None => bail!("invalid API token"),
     }
 }
@@ -75,13 +107,15 @@ fn set_secret(tokenid: &Authid, secret: &str) -> Result<(), Error> {
         bail!("not an API token ID");
     }
 
-    let _guard = lock_config()?;
+    let guard = lock_config()?;
 
     let mut data = read_file()?;
     let hashed_secret = proxmox_sys::crypt::encrypt_pw(secret)?;
     data.insert(tokenid.clone(), hashed_secret);
     write_file(data)?;
 
+    apply_api_mutation(guard, tokenid, Some(secret));
+
     Ok(())
 }
 
@@ -91,11 +125,142 @@ pub fn delete_secret(tokenid: &Authid) -> Result<(), Error> {
         bail!("not an API token ID");
     }
 
-    let _guard = lock_config()?;
+    let guard = lock_config()?;
 
     let mut data = read_file()?;
     data.remove(tokenid);
     write_file(data)?;
 
+    apply_api_mutation(guard, tokenid, None);
+
     Ok(())
 }
+
+/// Cached secret.
+struct CachedSecret {
+    secret: String,
+}
+
+struct ApiTokenSecretCache {
+    /// Keys are token Authids, values are the corresponding plain text secrets.
+    /// Entries are added after a successful on-disk verification in
+    /// `verify_secret` or when a new token secret is generated by
+    /// `generate_and_set_secret`. Used to avoid repeated
+    /// password-hash computation on subsequent authentications.
+    secrets: HashMap<Authid, CachedSecret>,
+    /// Shared generation to detect mutations of the underlying token.shadow file.
+    shared_gen: usize,
+}
+
+impl ApiTokenSecretCache {
+    /// Resets all local cache contents and sets/updates the cached generation.
+    fn reset_and_set_gen(&mut self, new_gen: usize) {
+        self.secrets.clear();
+        self.shared_gen = new_gen;
+    }
+
+    /// Caches a secret and sets/updates the cache generation.
+    fn insert_and_set_gen(&mut self, tokenid: Authid, secret: CachedSecret, new_gen: usize) {
+        self.secrets.insert(tokenid, secret);
+        self.shared_gen = new_gen;
+    }
+
+    /// Evicts a cached secret and sets/updates the cached generation.
+    fn evict_and_set_gen(&mut self, tokenid: &Authid, new_gen: usize) {
+        self.secrets.remove(tokenid);
+        self.shared_gen = new_gen;
+    }
+}
+
+fn cache_try_insert_secret(tokenid: Authid, secret: String, shared_gen_before: usize) {
+    let Some(mut cache) = TOKEN_SECRET_CACHE.try_write() else {
+        return;
+    };
+
+    let Some(shared_gen_now) = token_shadow_shared_gen() else {
+        return;
+    };
+
+    // If this process missed a generation bump, its cache is stale.
+    if cache.shared_gen != shared_gen_now {
+        cache.reset_and_set_gen(shared_gen_now);
+    }
+
+    // If a mutation happened while we were verifying the secret, do not insert.
+    if shared_gen_now == shared_gen_before {
+        cache.insert_and_set_gen(tokenid, CachedSecret { secret }, shared_gen_now);
+    }
+}
+
+/// Tries to match the given token secret against the cached secret.
+///
+/// Verifies the generation/version before doing the constant-time
+/// comparison to reduce TOCTOU risk. During token rotation or deletion
+/// tokens for in-flight requests may still validate against the previous
+/// generation.
+fn cache_try_secret_matches(tokenid: &Authid, secret: &str) -> bool {
+    let Some(cache) = TOKEN_SECRET_CACHE.try_read() else {
+        return false;
+    };
+    let Some(entry) = cache.secrets.get(tokenid) else {
+        return false;
+    };
+    let Some(current_gen) = token_shadow_shared_gen() else {
+        return false;
+    };
+
+    if current_gen == cache.shared_gen {
+        let cached_secret_bytes = entry.secret.as_bytes();
+        let secret_bytes = secret.as_bytes();
+
+        return cached_secret_bytes.len() == secret_bytes.len()
+            && openssl::memcmp::eq(cached_secret_bytes, secret_bytes);
+    }
+
+    false
+}
+
+fn apply_api_mutation(_guard: BackupLockGuard, tokenid: &Authid, new_secret: Option<&str>) {
+    // Signal cache invalidation to other processes (best-effort).
+    let bumped_gen = bump_token_shadow_shared_gen();
+
+    let mut cache = TOKEN_SECRET_CACHE.write();
+
+    // If we cannot get the current generation, we cannot trust the cache
+    let Some(current_gen) = token_shadow_shared_gen() else {
+        cache.reset_and_set_gen(0);
+        return;
+    };
+
+    // If we cannot bump the shared generation, or if it changed after
+    // obtaining the cache write lock, we cannot trust the cache
+    if bumped_gen != Some(current_gen) {
+        cache.reset_and_set_gen(current_gen);
+        return;
+    }
+
+    // Apply the new mutation.
+    match new_secret {
+        Some(secret) => {
+            let cached_secret = CachedSecret {
+                secret: secret.to_owned(),
+            };
+            cache.insert_and_set_gen(tokenid.clone(), cached_secret, current_gen);
+        }
+        None => cache.evict_and_set_gen(tokenid, current_gen),
+    }
+}
+
+/// Get the current shared generation.
+fn token_shadow_shared_gen() -> Option<usize> {
+    crate::ConfigVersionCache::new()
+        .ok()
+        .map(|cvc| cvc.token_shadow_generation())
+}
+
+/// Bump and return the new shared generation.
+fn bump_token_shadow_shared_gen() -> Option<usize> {
+    crate::ConfigVersionCache::new()
+        .ok()
+        .map(|cvc| cvc.increase_token_shadow_generation() + 1)
+}
-- 
2.47.3






* [PATCH proxmox-datacenter-manager v7 1/3] pdm-config: implement token.shadow generation
  2026-03-12 10:36 14% [PATCH proxmox{-backup,,-datacenter-manager} v7 " Samuel Rufinatscha
                   ` (7 preceding siblings ...)
  2026-03-12 10:37 15% ` [PATCH proxmox v7 4/4] proxmox-access-control: add TTL window to token secret cache Samuel Rufinatscha
@ 2026-03-12 10:37 13% ` Samuel Rufinatscha
  2026-03-12 10:37 17% ` [PATCH proxmox-datacenter-manager v7 2/3] docs: document API token-cache TTL effects Samuel Rufinatscha
                   ` (3 subsequent siblings)
  12 siblings, 0 replies; 117+ results
From: Samuel Rufinatscha @ 2026-03-12 10:37 UTC (permalink / raw)
  To: pbs-devel

PDM depends on the shared proxmox/proxmox-access-control crate for
token.shadow handling, which expects the product to provide a
cross-process invalidation signal so that it can cache and invalidate
token.shadow secrets.

This patch wires AccessControlConfig to ConfigVersionCache for
token.shadow invalidation and switches server/CLI/UI init to use
pdm-config’s AccessControlConfig.

Safety: the shmem mapping is fixed to 4096 bytes via the #[repr(C)]
union padding, and the new atomic is appended to the end of the
#[repr(C)] inner struct, so all existing field offsets stay unchanged.
Old processes keep accessing the same bytes and new processes consume
previously reserved padding.
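
As a rough illustration of the layout argument above, here is a minimal
sketch, not the actual pdm-config definition. The names Inner, Mapping,
magic and existing_generation are hypothetical stand-ins; only
token_shadow_generation and the 4096-byte size come from this patch.

```rust
use std::mem::{offset_of, size_of, ManuallyDrop};
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical stand-in for the shared-memory payload.
#[repr(C)]
pub struct Inner {
    pub magic: [u8; 8],
    pub existing_generation: AtomicUsize,
    // Appended last: with #[repr(C)] the earlier field offsets do not move.
    pub token_shadow_generation: AtomicUsize,
}

/// The padding member pins the mapping size, independent of `Inner`'s size
/// (as long as `Inner` stays below 4096 bytes).
#[repr(C)]
pub union Mapping {
    pub data: ManuallyDrop<Inner>,
    pub _padding: [u8; 4096],
}

/// Returns (mapping size, offset of the old field, offset of the new field).
pub fn layout() -> (usize, usize, usize) {
    (
        size_of::<Mapping>(),
        offset_of!(Inner, existing_generation),
        offset_of!(Inner, token_shadow_generation),
    )
}

/// Bumps the new counter on a zero-filled mapping and returns its value.
pub fn bump_on_zeroed_mapping() -> usize {
    let mapping = Mapping {
        _padding: [0u8; 4096],
    };
    // SAFETY: all-zero bytes are a valid `Inner` (a byte array and atomics).
    let inner: &Inner = unsafe { &*mapping.data };
    inner.token_shadow_generation.fetch_add(1, Ordering::AcqRel);
    inner.token_shadow_generation.load(Ordering::Acquire)
}
```

In this sketch, a process built before the new field existed would still
find existing_generation at offset 8, while the new counter occupies
bytes that were previously zeroed padding.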

Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
---
Changes from v6 to v7:
* Rebased

Changes from v5 to v6:
* Rebased

Changes from v4 to v5:
* Rebased
* Added safety note to commit message

Changes from v3 to v4:
* pdm-api-types: replace AccessControlConfig with
AccessControlPermissions and implement init::AccessControlPermissions
there
* pdm-config: add new AccessControlConfig implementing
init::AccessControlConfig
* UI: init uses a local UiAccessControlConfig for init_access_config()
* Adjusted commit message

 cli/admin/src/main.rs                      |  2 +-
 lib/pdm-api-types/src/acl.rs               |  4 ++--
 lib/pdm-config/Cargo.toml                  |  1 +
 lib/pdm-config/src/access_control.rs       | 20 ++++++++++++++++++++
 lib/pdm-config/src/config_version_cache.rs | 18 ++++++++++++++++++
 lib/pdm-config/src/lib.rs                  |  2 ++
 server/src/acl.rs                          |  3 +--
 ui/src/main.rs                             | 10 +++++++++-
 8 files changed, 54 insertions(+), 6 deletions(-)
 create mode 100644 lib/pdm-config/src/access_control.rs

diff --git a/cli/admin/src/main.rs b/cli/admin/src/main.rs
index f698fa2..916c633 100644
--- a/cli/admin/src/main.rs
+++ b/cli/admin/src/main.rs
@@ -19,7 +19,7 @@ fn main() {
     proxmox_product_config::init(api_user, priv_user);
 
     proxmox_access_control::init::init(
-        &pdm_api_types::AccessControlConfig,
+        &pdm_config::AccessControlConfig,
         pdm_buildcfg::configdir!("/access"),
     )
     .expect("failed to setup access control config");
diff --git a/lib/pdm-api-types/src/acl.rs b/lib/pdm-api-types/src/acl.rs
index 405982a..7c405a7 100644
--- a/lib/pdm-api-types/src/acl.rs
+++ b/lib/pdm-api-types/src/acl.rs
@@ -187,9 +187,9 @@ pub struct AclListItem {
     pub roleid: String,
 }
 
-pub struct AccessControlConfig;
+pub struct AccessControlPermissions;
 
-impl proxmox_access_control::init::AccessControlConfig for AccessControlConfig {
+impl proxmox_access_control::init::AccessControlPermissions for AccessControlPermissions {
     fn privileges(&self) -> &HashMap<&str, u64> {
         static PRIVS: LazyLock<HashMap<&str, u64>> =
             LazyLock::new(|| PRIVILEGES.iter().copied().collect());
diff --git a/lib/pdm-config/Cargo.toml b/lib/pdm-config/Cargo.toml
index d39c2ad..19781d2 100644
--- a/lib/pdm-config/Cargo.toml
+++ b/lib/pdm-config/Cargo.toml
@@ -13,6 +13,7 @@ once_cell.workspace = true
 openssl.workspace = true
 serde.workspace = true
 
+proxmox-access-control.workspace = true
 proxmox-config-digest = { workspace = true, features = [ "openssl" ] }
 proxmox-http = { workspace = true, features = [ "http-helpers" ] }
 proxmox-ldap = { workspace = true, features = [ "types" ]}
diff --git a/lib/pdm-config/src/access_control.rs b/lib/pdm-config/src/access_control.rs
new file mode 100644
index 0000000..389b3f4
--- /dev/null
+++ b/lib/pdm-config/src/access_control.rs
@@ -0,0 +1,20 @@
+use anyhow::Error;
+
+pub struct AccessControlConfig;
+
+impl proxmox_access_control::init::AccessControlConfig for AccessControlConfig {
+    fn permissions(&self) -> &dyn proxmox_access_control::init::AccessControlPermissions {
+        &pdm_api_types::AccessControlPermissions
+    }
+
+    fn token_shadow_cache_generation(&self) -> Option<usize> {
+        crate::ConfigVersionCache::new()
+            .ok()
+            .map(|c| c.token_shadow_generation())
+    }
+
+    fn increment_token_shadow_cache_generation(&self) -> Result<usize, Error> {
+        let c = crate::ConfigVersionCache::new()?;
+        Ok(c.increase_token_shadow_generation())
+    }
+}
diff --git a/lib/pdm-config/src/config_version_cache.rs b/lib/pdm-config/src/config_version_cache.rs
index 36a6a77..933140c 100644
--- a/lib/pdm-config/src/config_version_cache.rs
+++ b/lib/pdm-config/src/config_version_cache.rs
@@ -27,6 +27,8 @@ struct ConfigVersionCacheDataInner {
     traffic_control_generation: AtomicUsize,
     // Tracks updates to the remote/hostname/nodename mapping cache.
     remote_mapping_cache: AtomicUsize,
+    // Token shadow (token.shadow) generation/version.
+    token_shadow_generation: AtomicUsize,
     // Add further atomics here
 }
 
@@ -172,4 +174,20 @@ impl ConfigVersionCache {
             .fetch_add(1, Ordering::Relaxed)
             + 1
     }
+
+    /// Returns the token shadow generation number.
+    pub fn token_shadow_generation(&self) -> usize {
+        self.shmem
+            .data()
+            .token_shadow_generation
+            .load(Ordering::Acquire)
+    }
+
+    /// Increase the token shadow generation number, returning the previous value.
+    pub fn increase_token_shadow_generation(&self) -> usize {
+        self.shmem
+            .data()
+            .token_shadow_generation
+            .fetch_add(1, Ordering::AcqRel)
+    }
 }
diff --git a/lib/pdm-config/src/lib.rs b/lib/pdm-config/src/lib.rs
index 4c49054..614f7ae 100644
--- a/lib/pdm-config/src/lib.rs
+++ b/lib/pdm-config/src/lib.rs
@@ -9,6 +9,8 @@ pub mod remotes;
 pub mod setup;
 pub mod views;
 
+mod access_control;
+pub use access_control::AccessControlConfig;
 mod config_version_cache;
 pub use config_version_cache::ConfigVersionCache;
 
diff --git a/server/src/acl.rs b/server/src/acl.rs
index f421814..e6e007b 100644
--- a/server/src/acl.rs
+++ b/server/src/acl.rs
@@ -1,6 +1,5 @@
 pub(crate) fn init() {
-    static ACCESS_CONTROL_CONFIG: pdm_api_types::AccessControlConfig =
-        pdm_api_types::AccessControlConfig;
+    static ACCESS_CONTROL_CONFIG: pdm_config::AccessControlConfig = pdm_config::AccessControlConfig;
 
     proxmox_access_control::init::init(&ACCESS_CONTROL_CONFIG, pdm_buildcfg::configdir!("/access"))
         .expect("failed to setup access control config");
diff --git a/ui/src/main.rs b/ui/src/main.rs
index db27ecf..46249ae 100644
--- a/ui/src/main.rs
+++ b/ui/src/main.rs
@@ -391,10 +391,18 @@ fn main() {
     pwt::state::set_available_languages(proxmox_yew_comp::available_language_list());
 
     if let Err(e) =
-        proxmox_access_control::init::init_access_config(&pdm_api_types::AccessControlConfig)
+        proxmox_access_control::init::init_access_config(&UiAccessControlConfig)
     {
         log::error!("could not initialize access control config - {e:#}");
     }
 
     yew::Renderer::<DatacenterManagerApp>::new().render();
 }
+
+struct UiAccessControlConfig;
+
+impl proxmox_access_control::init::AccessControlConfig for UiAccessControlConfig {
+    fn permissions(&self) -> &dyn proxmox_access_control::init::AccessControlPermissions {
+        &pdm_api_types::AccessControlPermissions
+    }
+}
-- 
2.47.3






* [PATCH proxmox v7 3/4] proxmox-access-control: invalidate token-secret cache on token.shadow changes
  2026-03-12 10:36 14% [PATCH proxmox{-backup,,-datacenter-manager} v7 " Samuel Rufinatscha
                   ` (5 preceding siblings ...)
  2026-03-12 10:37 11% ` [PATCH proxmox v7 2/4] proxmox-access-control: cache verified API token secrets Samuel Rufinatscha
@ 2026-03-12 10:37 12% ` Samuel Rufinatscha
  2026-03-12 10:37 15% ` [PATCH proxmox v7 4/4] proxmox-access-control: add TTL window to token secret cache Samuel Rufinatscha
                   ` (5 subsequent siblings)
  12 siblings, 0 replies; 117+ results
From: Samuel Rufinatscha @ 2026-03-12 10:37 UTC (permalink / raw)
  To: pbs-devel

This patch adds manual/direct file change detection by tracking the
mtime and length of token.shadow and clears the in-memory token secret
cache whenever these values change.
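
The detection idea can be sketched in isolation as follows. This is a
simplified illustration using only std, not the code from this patch:
FileStamp, stamp and unchanged_since are hypothetical names, and the
sketch leaves out the locking and the shared generation.

```rust
use std::fs;
use std::io::{self, ErrorKind};
use std::path::Path;
use std::time::SystemTime;

/// Metadata snapshot used to detect edits made outside the API.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct FileStamp {
    pub mtime: Option<SystemTime>,
    pub len: Option<u64>,
}

/// Stats the file; a missing file is a valid state, not an error.
pub fn stamp(path: &Path) -> io::Result<FileStamp> {
    match fs::metadata(path) {
        Ok(meta) => Ok(FileStamp {
            mtime: meta.modified().ok(),
            len: Some(meta.len()),
        }),
        Err(e) if e.kind() == ErrorKind::NotFound => Ok(FileStamp {
            mtime: None,
            len: None,
        }),
        Err(e) => Err(e),
    }
}

/// Returns true if the file matches the previous snapshot. The snapshot is
/// always updated; on `false` the caller is expected to drop cached secrets.
pub fn unchanged_since(last: &mut Option<FileStamp>, path: &Path) -> io::Result<bool> {
    let current = stamp(path)?;
    let same = *last == Some(current);
    *last = Some(current);
    Ok(same)
}
```

Note that a metadata comparison like this cannot see an edit that keeps
the length and lands within the filesystem's timestamp granularity, so
it is a heuristic rather than a content check.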

Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
---
Changes from v6 to v7:
* Rebased

Changes from v5 to v6:
* Rebased

Changes from v4 to v5:
* Rebased

Changes from v3 to v4:
* Make use of .replace() in refresh_cache_if_file_changed to get the
previous state
* Group file stats with ShadowFileInfo
* Return false in refresh_cache_if_file_changed to avoid unnecessary cache
queries
* Adjusted commit message

Changes from v2 to v3:
* Cache now tracks last_checked (epoch seconds).
* Simplified refresh_cache_if_file_changed, removed
FILE_GENERATION logic
* On first load, initializes file metadata and keeps empty cache.

Changes from v1 to v2:
* Add file metadata tracking (file_mtime, file_len) and
  FILE_GENERATION.
* Store file_gen in CachedSecret and verify it against the current
  FILE_GENERATION to ensure cached entries belong to the current file
  state.
* Add shadow_mtime_len() helper and convert refresh to best-effort
  (try_write, returns bool).
* Pass a pre-write metadata snapshot into apply_api_mutation and
  clear/bump generation if the cache metadata indicates missed external
  edits.

 proxmox-access-control/src/token_shadow.rs | 123 ++++++++++++++++++++-
 1 file changed, 119 insertions(+), 4 deletions(-)

diff --git a/proxmox-access-control/src/token_shadow.rs b/proxmox-access-control/src/token_shadow.rs
index 389c57ec..063306d5 100644
--- a/proxmox-access-control/src/token_shadow.rs
+++ b/proxmox-access-control/src/token_shadow.rs
@@ -1,5 +1,8 @@
 use std::collections::HashMap;
+use std::fs;
+use std::io::ErrorKind;
 use std::sync::LazyLock;
+use std::time::SystemTime;
 
 use anyhow::{bail, format_err, Error};
 use parking_lot::RwLock;
@@ -7,6 +10,7 @@ use serde_json::{from_value, Value};
 
 use proxmox_auth_api::types::Authid;
 use proxmox_product_config::{open_api_lockfile, replace_config, ApiLockGuard};
+use proxmox_time::epoch_i64;
 
 use crate::init::access_conf;
 use crate::init::impl_feature::{token_shadow, token_shadow_lock};
@@ -20,6 +24,7 @@ static TOKEN_SECRET_CACHE: LazyLock<RwLock<ApiTokenSecretCache>> = LazyLock::new
     RwLock::new(ApiTokenSecretCache {
         secrets: HashMap::new(),
         shared_gen: 0,
+        shadow: None,
     })
 });
 
@@ -45,6 +50,56 @@ fn write_file(data: HashMap<Authid, String>) -> Result<(), Error> {
     replace_config(token_shadow(), &json)
 }
 
+/// Refreshes the in-memory cache if the on-disk token.shadow file changed.
+/// Returns true if the cache is valid to use, false if not.
+fn refresh_cache_if_file_changed() -> bool {
+    let now = epoch_i64();
+
+    // Best-effort refresh under write lock.
+    let Some(mut cache) = TOKEN_SECRET_CACHE.try_write() else {
+        return false;
+    };
+
+    let Some(shared_gen_now) = token_shadow_shared_gen() else {
+        return false;
+    };
+
+    // If another process bumped the generation, we don't know what changed -> clear cache
+    if cache.shared_gen != shared_gen_now {
+        cache.reset_and_set_gen(shared_gen_now);
+    }
+
+    // Stat the file to detect manual edits.
+    let Ok((new_mtime, new_len)) = shadow_mtime_len() else {
+        return false;
+    };
+
+    // If the file didn't change, only update last_checked
+    if let Some(shadow) = cache.shadow.as_mut() {
+        if shadow.mtime == new_mtime && shadow.len == new_len {
+            shadow.last_checked = now;
+            return true;
+        }
+    }
+
+    cache.secrets.clear();
+
+    let prev = cache.shadow.replace(ShadowFileInfo {
+        mtime: new_mtime,
+        len: new_len,
+        last_checked: now,
+    });
+
+    if prev.is_some() {
+        // Best-effort propagation to other processes if a change was detected
+        if let Some(shared_gen_new) = bump_token_shadow_shared_gen() {
+            cache.shared_gen = shared_gen_new;
+        }
+    }
+
+    false
+}
+
 /// Verifies that an entry for given tokenid / API token secret exists
 pub fn verify_secret(tokenid: &Authid, secret: &str) -> Result<(), Error> {
     if !tokenid.is_token() {
@@ -52,7 +107,7 @@ pub fn verify_secret(tokenid: &Authid, secret: &str) -> Result<(), Error> {
     }
 
     // Fast path
-    if cache_try_secret_matches(tokenid, secret) {
+    if refresh_cache_if_file_changed() && cache_try_secret_matches(tokenid, secret) {
         return Ok(());
     }
 
@@ -84,12 +139,15 @@ pub fn set_secret(tokenid: &Authid, secret: &str) -> Result<(), Error> {
 
     let guard = lock_config()?;
 
+    // Capture state before we write to detect external edits.
+    let pre_meta = shadow_mtime_len().unwrap_or((None, None));
+
     let mut data = read_file()?;
     let hashed_secret = proxmox_sys::crypt::encrypt_pw(secret)?;
     data.insert(tokenid.clone(), hashed_secret);
     write_file(data)?;
 
-    apply_api_mutation(guard, tokenid, Some(secret));
+    apply_api_mutation(guard, tokenid, Some(secret), pre_meta);
 
     Ok(())
 }
@@ -102,11 +160,14 @@ pub fn delete_secret(tokenid: &Authid) -> Result<(), Error> {
 
     let guard = lock_config()?;
 
+    // Capture state before we write to detect external edits.
+    let pre_meta = shadow_mtime_len().unwrap_or((None, None));
+
     let mut data = read_file()?;
     data.remove(tokenid);
     write_file(data)?;
 
-    apply_api_mutation(guard, tokenid, None);
+    apply_api_mutation(guard, tokenid, None, pre_meta);
 
     Ok(())
 }
@@ -133,6 +194,8 @@ struct ApiTokenSecretCache {
     secrets: HashMap<Authid, CachedSecret>,
     /// Shared generation to detect mutations of the underlying token.shadow file.
     shared_gen: usize,
+    /// Shadow file info to detect changes
+    shadow: Option<ShadowFileInfo>,
 }
 
 impl ApiTokenSecretCache {
@@ -140,6 +203,7 @@ impl ApiTokenSecretCache {
     fn reset_and_set_gen(&mut self, new_gen: usize) {
         self.secrets.clear();
         self.shared_gen = new_gen;
+        self.shadow = None;
     }
 
     /// Caches a secret and sets/updates the cache generation.
@@ -155,6 +219,16 @@ impl ApiTokenSecretCache {
     }
 }
 
+/// Shadow file info
+struct ShadowFileInfo {
+    // shadow file mtime to detect changes
+    mtime: Option<SystemTime>,
+    // shadow file length to detect changes
+    len: Option<u64>,
+    // last time the file metadata was checked
+    last_checked: i64,
+}
+
 fn cache_try_insert_secret(tokenid: Authid, secret: String, shared_gen_before: usize) {
     let Some(mut cache) = TOKEN_SECRET_CACHE.try_write() else {
         return;
@@ -203,7 +277,14 @@ fn cache_try_secret_matches(tokenid: &Authid, secret: &str) -> bool {
     false
 }
 
-fn apply_api_mutation(_guard: ApiLockGuard, tokenid: &Authid, new_secret: Option<&str>) {
+fn apply_api_mutation(
+    _guard: ApiLockGuard,
+    tokenid: &Authid,
+    new_secret: Option<&str>,
+    pre_write_meta: (Option<SystemTime>, Option<u64>),
+) {
+    let now = epoch_i64();
+
     // Signal cache invalidation to other processes (best-effort).
     let bumped_gen = bump_token_shadow_shared_gen();
 
@@ -222,6 +303,16 @@ fn apply_api_mutation(_guard: ApiLockGuard, tokenid: &Authid, new_secret: Option
         return;
     }
 
+    // If our cached file metadata does not match the on-disk state before our write,
+    // we likely missed an external/manual edit. We can no longer trust any cached secrets.
+    if cache
+        .shadow
+        .as_ref()
+        .is_some_and(|s| (s.mtime, s.len) != pre_write_meta)
+    {
+        cache.secrets.clear();
+    }
+
     // Apply the new mutation.
     match new_secret {
         Some(secret) => {
@@ -232,6 +323,22 @@ fn apply_api_mutation(_guard: ApiLockGuard, tokenid: &Authid, new_secret: Option
         }
         None => cache.evict_and_set_gen(tokenid, current_gen),
     }
+
+    // Update our view of the file metadata to the post-write state (best-effort).
+    // (If this fails, drop local cache so callers fall back to slow path until refreshed.)
+    match shadow_mtime_len() {
+        Ok((mtime, len)) => {
+            cache.shadow = Some(ShadowFileInfo {
+                mtime,
+                len,
+                last_checked: now,
+            });
+        }
+        Err(_) => {
+            // If we cannot validate state, do not trust cache.
+            cache.reset_and_set_gen(current_gen);
+        }
+    }
 }
 
 /// Get the current shared generation.
@@ -246,3 +353,11 @@ fn bump_token_shadow_shared_gen() -> Option<usize> {
         .ok()
         .map(|prev| prev + 1)
 }
+
+fn shadow_mtime_len() -> Result<(Option<SystemTime>, Option<u64>), Error> {
+    match fs::metadata(token_shadow()) {
+        Ok(meta) => Ok((meta.modified().ok(), Some(meta.len()))),
+        Err(e) if e.kind() == ErrorKind::NotFound => Ok((None, None)),
+        Err(e) => Err(e.into()),
+    }
+}
-- 
2.47.3






* [PATCH proxmox{-backup,,-datacenter-manager} v7 00/11] token-shadow: reduce api token verification overhead
@ 2026-03-12 10:36 14% Samuel Rufinatscha
  2026-03-12 10:36 17% ` [PATCH proxmox-backup v7 1/4] pbs-config: add token.shadow generation to ConfigVersionCache Samuel Rufinatscha
                   ` (12 more replies)
  0 siblings, 13 replies; 117+ results
From: Samuel Rufinatscha @ 2026-03-12 10:36 UTC (permalink / raw)
  To: pbs-devel

Hi,

this series improves the performance of token-based API authentication
in PBS (pbs-config) and in PDM (underlying proxmox-access-control
crate), addressing the API token verification hotspot reported in our
bugtracker #7017 [1].

When profiling PBS /status endpoint with cargo flamegraph [2],
token-based authentication showed up as a dominant hotspot via
proxmox_sys::crypt::verify_crypt_pw. Applying this series removes that
path from the hot section of the flamegraph. The same performance issue
was measured [2] for PDM. PDM uses the underlying shared
proxmox-access-control library for token handling, which is a
factored out version of the token.shadow handling code from PBS.

While this series fixes the immediate performance issue both in PBS
(pbs-config) and in the shared proxmox-access-control crate used by
PDM, PBS should ideally be refactored in a separate effort to use
proxmox-access-control for token handling instead of its local
implementation.

Approach

The goal is to reduce the cost of token-based authentication while
preserving the existing token handling semantics (including detecting
manual edits to token.shadow) and staying consistent between PBS
(pbs-config) and PDM (proxmox-access-control). For both, this series
proposes to:

1. Introduce an in-memory cache for verified token secrets and
invalidate it through a shared ConfigVersionCache generation. Note that
a shared generation is required to keep the privileged and unprivileged
daemons in sync and to avoid caching inconsistencies across processes.
2. Invalidate on token.shadow API changes (set_secret,
delete_secret)
3. Invalidate on direct/manual token.shadow file changes (mtime +
length)
4. Avoid per-request file stat calls using a TTL window

Testing

To verify the effect in PBS (pbs-config changes), I:
1. Set up test environment based on latest PBS ISO, installed Rust
   toolchain, cloned proxmox-backup repository to use with cargo
   flamegraph. Reproduced bug #7017 [1] by profiling the /status
   endpoint with token-based authentication using cargo flamegraph [2].
2. Built PBS with pbs-config patches and re-ran the same workload and
   profiling setup. Confirmed that
   proxmox_sys::crypt::verify_crypt_pw path no longer appears in the
   hot section of the flamegraph. CPU usage is now dominated by TLS
   overhead.
3. Functionally, I verified that:
   * valid tokens authenticate correctly when used in API requests
   * invalid secrets are rejected as before
   * generating a new token secret via dashboard (create token for
   user, regenerate existing secret) works and authenticates correctly

To verify the effect in PDM (proxmox-access-control changes), instead
of PBS’ /status, I profiled the /version endpoint with cargo flamegraph
[2] and verified that the expensive hashing path disappears from the
hot section after introducing caching. Functionally, I verified
that:
   * valid tokens authenticate correctly when used in API requests
   * invalid secrets are rejected as before
   * generating a new token secret via dashboard (create token for user,
   regenerate existing secret) works and authenticates correctly

Results

To measure the caching effect, I benchmarked parallel token auth
requests for /status?verbose=0 on top of the datastore lookup cache
series [3] to check the throughput impact. With datastores=1,
repeat=5000, parallel=16, this series gives ~172 req/s compared to
~65 req/s without it. This is a ~2.6x improvement (and aligns with the
~179 req/s from the previous series, which used per-process cache
invalidation).

Patch summary

pbs-config:
0001 – pbs-config: add token.shadow generation to ConfigVersionCache
0002 – pbs-config: cache verified API token secrets
0003 – pbs-config: invalidate token-secret cache on token.shadow
changes
0004 – pbs-config: add TTL window to token-secret cache

proxmox-access-control:
0005 – access-control: extend AccessControlConfig for token.shadow invalidation
0006 – access-control: cache verified API token secrets
0007 – access-control: invalidate token-secret cache on token.shadow changes
0008 – access-control: add TTL window to token-secret cache

proxmox-datacenter-manager:
0009 – pdm-config: add token.shadow generation to ConfigVersionCache
0010 – docs: document API token-cache TTL effects
0011 – pdm-config: wire user+acl cache generation

Maintainer Notes:
* proxmox-access-control trait split: permissions now live in
 AccessControlPermissions, and AccessControlConfig now requires
 fn permissions(&self) -> &dyn AccessControlPermissions ->
 version bump
* Renames ConfigVersionCache's pub user_cache_generation and
 increase_user_cache_generation -> version bump
* Adds parking_lot::RwLock dependency in PBS and proxmox-access-control

This version and the previous one only incorporate the reviewers'
feedback [4][5][6]. Please also consider Christian's R-b tag [4].

[1] https://bugzilla.proxmox.com/show_bug.cgi?id=7017
[2] attachment 1767 [1]: Flamegraph showing the proxmox_sys::crypt::verify_crypt_pw stack
[3] https://bugzilla.proxmox.com/show_bug.cgi?id=6049
[4] https://lore.proxmox.com/pbs-devel/20260121151408.731516-1-s.rufinatscha@proxmox.com/T/#t
[5] https://lore.proxmox.com/pbs-devel/20260217111229.78661-1-s.rufinatscha@proxmox.com/T/#t
[6] https://lore.proxmox.com/pbs-devel/725687dd-5a35-41ed-af62-6dc9f062cbd4@proxmox.com/T/#t

proxmox-backup:

Samuel Rufinatscha (4):
  pbs-config: add token.shadow generation to ConfigVersionCache
  pbs-config: cache verified API token secrets
  pbs-config: invalidate token-secret cache on token.shadow changes
  pbs-config: add TTL window to token secret cache

 Cargo.toml                             |   1 +
 docs/user-management.rst               |   4 +
 pbs-config/Cargo.toml                  |   1 +
 pbs-config/src/config_version_cache.rs |  18 ++
 pbs-config/src/token_shadow.rs         | 314 ++++++++++++++++++++++++-
 5 files changed, 335 insertions(+), 3 deletions(-)


proxmox:

Samuel Rufinatscha (4):
  proxmox-access-control: split AccessControlConfig and add token.shadow
    gen
  proxmox-access-control: cache verified API token secrets
  proxmox-access-control: invalidate token-secret cache on token.shadow
    changes
  proxmox-access-control: add TTL window to token secret cache

 Cargo.toml                                 |   1 +
 proxmox-access-control/Cargo.toml          |   1 +
 proxmox-access-control/src/acl.rs          |  10 +-
 proxmox-access-control/src/init.rs         | 113 ++++++--
 proxmox-access-control/src/token_shadow.rs | 315 ++++++++++++++++++++-
 5 files changed, 413 insertions(+), 27 deletions(-)


proxmox-datacenter-manager:

Samuel Rufinatscha (3):
  pdm-config: implement token.shadow generation
  docs: document API token-cache TTL effects
  pdm-config: wire user+acl cache generation

 cli/admin/src/main.rs                      |  2 +-
 docs/access-control.rst                    |  4 +++
 lib/pdm-api-types/src/acl.rs               |  4 +--
 lib/pdm-config/Cargo.toml                  |  1 +
 lib/pdm-config/src/access_control.rs       | 31 ++++++++++++++++++++
 lib/pdm-config/src/config_version_cache.rs | 34 +++++++++++++++++-----
 lib/pdm-config/src/lib.rs                  |  2 ++
 server/src/acl.rs                          |  3 +-
 ui/src/main.rs                             | 10 ++++++-
 9 files changed, 77 insertions(+), 14 deletions(-)
 create mode 100644 lib/pdm-config/src/access_control.rs


Summary over all repositories:
  19 files changed, 825 insertions(+), 44 deletions(-)

-- 
Generated by git-murpp 0.8.1




^ permalink raw reply	[relevance 14%]

* [PATCH proxmox v7 4/4] proxmox-access-control: add TTL window to token secret cache
  2026-03-12 10:36 14% [PATCH proxmox{-backup,,-datacenter-manager} v7 " Samuel Rufinatscha
                   ` (6 preceding siblings ...)
  2026-03-12 10:37 12% ` [PATCH proxmox v7 3/4] proxmox-access-control: invalidate token-secret cache on token.shadow changes Samuel Rufinatscha
@ 2026-03-12 10:37 15% ` Samuel Rufinatscha
  2026-03-12 10:37 13% ` [PATCH proxmox-datacenter-manager v7 1/3] pdm-config: implement token.shadow generation Samuel Rufinatscha
                   ` (4 subsequent siblings)
  12 siblings, 0 replies; 117+ results
From: Samuel Rufinatscha @ 2026-03-12 10:37 UTC (permalink / raw)
  To: pbs-devel

verify_secret() currently calls refresh_cache_if_file_changed() on every
request, which performs a metadata() call on token.shadow each time.
Under load this adds unnecessary overhead, especially since the file
rarely changes.

This patch introduces a TTL boundary, controlled by
TOKEN_SECRET_CACHE_TTL_SECS. File metadata is only re-loaded once the
TTL has expired; documents TTL effects.
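
As a rough illustration of the TTL check described above, here is a
minimal standalone sketch. The names TTL_SECS, ShadowInfo and within_ttl
are hypothetical stand-ins for this example only; the actual helper in
the diff below is shadow_check_within_ttl().

```rust
/// Hypothetical stand-in for TOKEN_SECRET_CACHE_TTL_SECS.
const TTL_SECS: i64 = 60;

/// Hypothetical stand-in for the cached token.shadow metadata.
struct ShadowInfo {
    /// Epoch seconds of the last metadata check.
    last_checked: i64,
}

/// Returns true if metadata is cached and was checked less than
/// TTL_SECS ago. A clock that moved backwards counts as expired.
fn within_ttl(shadow: Option<&ShadowInfo>, now: i64) -> bool {
    shadow.is_some_and(|s| now >= s.last_checked && (now - s.last_checked) < TTL_SECS)
}

fn main() {
    let info = ShadowInfo { last_checked: 1_000 };
    assert!(within_ttl(Some(&info), 1_000)); // just checked
    assert!(within_ttl(Some(&info), 1_059)); // still inside the window
    assert!(!within_ttl(Some(&info), 1_060)); // window elapsed, stat again
    assert!(!within_ttl(Some(&info), 999)); // clock went backwards
    assert!(!within_ttl(None, 1_000)); // nothing cached yet
}
```

Treating a backwards clock as expired errs on the side of one extra
stat call rather than trusting stale metadata.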

Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
---
Changes from v6 to v7:
* Rebased

Changes from v5 to v6:
* Rebased

Changes from v4 to v5:
* Rebased
* Introduce shadow_check_within_ttl() helper

Changes from v3 to v4:
* Adjusted commit message

Changes from v2 to v3:
* Refactored refresh_cache_if_file_changed TTL logic.
* Remove had_prior_state check (replaced by last_checked logic).
* Improve TTL bound checks.
* Reword documentation warning for clarity.

Changes from v1 to v2:
* Add TOKEN_SECRET_CACHE_TTL_SECS and last_checked.
* Implement double-checked TTL: check with try_read first; only attempt
  refresh with try_write if expired/unknown.
* Fix TTL bookkeeping: update last_checked on the “file unchanged” path
  and after API mutations.
* Add documentation warning about TTL-delayed effect of manual
  token.shadow edits.

 proxmox-access-control/src/token_shadow.rs | 31 +++++++++++++++++++++-
 1 file changed, 30 insertions(+), 1 deletion(-)

diff --git a/proxmox-access-control/src/token_shadow.rs b/proxmox-access-control/src/token_shadow.rs
index 063306d5..d37b0585 100644
--- a/proxmox-access-control/src/token_shadow.rs
+++ b/proxmox-access-control/src/token_shadow.rs
@@ -28,6 +28,9 @@ static TOKEN_SECRET_CACHE: LazyLock<RwLock<ApiTokenSecretCache>> = LazyLock::new
     })
 });
 
+/// Max age in seconds of the token secret cache before checking for file changes.
+const TOKEN_SECRET_CACHE_TTL_SECS: i64 = 60;
+
 // Get exclusive lock
 fn lock_config() -> Result<ApiLockGuard, Error> {
     open_api_lockfile(token_shadow_lock(), None, true)
@@ -55,11 +58,24 @@ fn write_file(data: HashMap<Authid, String>) -> Result<(), Error> {
 fn refresh_cache_if_file_changed() -> bool {
     let now = epoch_i64();
 
-    // Best-effort refresh under write lock.
+    // Fast path: cache is fresh if shared-gen matches and TTL not expired.
+    if let (Some(cache), Some(shared_gen_read)) =
+        (TOKEN_SECRET_CACHE.try_read(), token_shadow_shared_gen())
+    {
+        if cache.shared_gen == shared_gen_read && cache.shadow_check_within_ttl(now) {
+            return true;
+        }
+        // read lock drops here
+    } else {
+        return false;
+    }
+
+    // Slow path: best-effort refresh under write lock.
     let Some(mut cache) = TOKEN_SECRET_CACHE.try_write() else {
         return false;
     };
 
+    // Re-read generation after acquiring the lock (may have changed meanwhile).
     let Some(shared_gen_now) = token_shadow_shared_gen() else {
         return false;
     };
@@ -69,6 +85,12 @@ fn refresh_cache_if_file_changed() -> bool {
         cache.reset_and_set_gen(shared_gen_now);
     }
 
+    // TTL check again after acquiring the lock
+    let now = epoch_i64();
+    if cache.shadow_check_within_ttl(now) {
+        return true;
+    }
+
     // Stat the file to detect manual edits.
     let Ok((new_mtime, new_len)) = shadow_mtime_len() else {
         return false;
@@ -217,6 +239,13 @@ impl ApiTokenSecretCache {
         self.secrets.remove(tokenid);
         self.shared_gen = new_gen;
     }
+
+    /// Returns true if cached token.shadow metadata exists and was checked within the TTL window.
+    fn shadow_check_within_ttl(&self, now: i64) -> bool {
+        self.shadow.as_ref().is_some_and(|cached| {
+            now >= cached.last_checked && (now - cached.last_checked) < TOKEN_SECRET_CACHE_TTL_SECS
+        })
+    }
 }
 
 /// Shadow file info
-- 
2.47.3





^ permalink raw reply related	[relevance 15%]

* [PATCH proxmox v7 2/4] proxmox-access-control: cache verified API token secrets
  2026-03-12 10:36 14% [PATCH proxmox{-backup,,-datacenter-manager} v7 " Samuel Rufinatscha
                   ` (4 preceding siblings ...)
  2026-03-12 10:37 14% ` [PATCH proxmox v7 1/4] proxmox-access-control: split AccessControlConfig and add token.shadow gen Samuel Rufinatscha
@ 2026-03-12 10:37 11% ` Samuel Rufinatscha
  2026-03-12 10:37 12% ` [PATCH proxmox v7 3/4] proxmox-access-control: invalidate token-secret cache on token.shadow changes Samuel Rufinatscha
                   ` (6 subsequent siblings)
  12 siblings, 0 replies; 117+ results
From: Samuel Rufinatscha @ 2026-03-12 10:37 UTC (permalink / raw)
  To: pbs-devel

Adds an in-memory cache of successfully verified token secrets.
Subsequent requests for the same token+secret combination only perform
a comparison using openssl::memcmp::eq and avoid re-running the
password hash. The cache is updated when a token secret is set and
cleared when a token is deleted. A shared generation counter (via
ConfigVersionCache) is used to invalidate caches across processes when
token secrets are modified or deleted. This keeps privileged and
unprivileged daemons in sync.
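
The generation-guarded caching idea can be sketched in isolation roughly
as follows. This is only an illustration under simplifying assumptions:
an AtomicUsize stands in for the cross-process ConfigVersionCache
generation, std::sync::RwLock replaces parking_lot::RwLock, plain
strings replace Authid, and bytes_eq is a simple stand-in for
openssl::memcmp::eq (not a vetted constant-time implementation).

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::RwLock;

struct Cache {
    secrets: HashMap<String, String>,
    shared_gen: usize,
}

/// Compares all bytes without exiting early on the first mismatch.
fn bytes_eq(a: &[u8], b: &[u8]) -> bool {
    a.len() == b.len() && a.iter().zip(b).fold(0u8, |acc, (x, y)| acc | (x ^ y)) == 0
}

/// Fast path: only trust the cache if its generation is still current.
fn cache_matches(cache: &RwLock<Cache>, shared: &AtomicUsize, token: &str, secret: &str) -> bool {
    let Ok(guard) = cache.try_read() else {
        return false;
    };
    let Some(cached) = guard.secrets.get(token) else {
        return false;
    };
    guard.shared_gen == shared.load(Ordering::SeqCst)
        && bytes_eq(cached.as_bytes(), secret.as_bytes())
}

/// Insert after a successful slow-path verification, unless the shared
/// generation moved while the hash was being verified.
fn cache_insert(cache: &RwLock<Cache>, shared: &AtomicUsize, token: &str, secret: &str, gen_before: usize) {
    let Ok(mut guard) = cache.try_write() else {
        return;
    };
    let gen_now = shared.load(Ordering::SeqCst);
    if guard.shared_gen != gen_now {
        // This cache missed a bump, so its contents are stale.
        guard.secrets.clear();
        guard.shared_gen = gen_now;
    }
    if gen_now == gen_before {
        guard.secrets.insert(token.to_owned(), secret.to_owned());
    }
}

fn main() {
    let shared = AtomicUsize::new(0);
    let cache = RwLock::new(Cache { secrets: HashMap::new(), shared_gen: 0 });

    let gen_before = shared.load(Ordering::SeqCst);
    cache_insert(&cache, &shared, "user@pbs!tok", "s3cret", gen_before);
    assert!(cache_matches(&cache, &shared, "user@pbs!tok", "s3cret"));
    assert!(!cache_matches(&cache, &shared, "user@pbs!tok", "wrong"));

    // A set/delete elsewhere bumps the generation; the entry is no longer trusted.
    shared.fetch_add(1, Ordering::SeqCst);
    assert!(!cache_matches(&cache, &shared, "user@pbs!tok", "s3cret"));

    // An insert carrying the stale generation is dropped.
    cache_insert(&cache, &shared, "user@pbs!tok", "s3cret", gen_before);
    assert!(!cache_matches(&cache, &shared, "user@pbs!tok", "s3cret"));
}
```

Both helpers use try_read/try_write, so lock contention simply falls
back to the slow path instead of blocking a request.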

Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
---
Changes from v6 to v7:
* Rebased
* Rename "gen" variables to be compatible with Rust 2024 keyword
changes

Changes from v5 to v6:
* Rebased
* Check that the input byte lengths are equal before calling
openssl::memcmp::eq(..).

Changes from v4 to v5:
* Rebased
* Fix wrong type compilation issue; replaced with ApiLockGuard
* Move invalidate_cache_state_and_set_gen into cache object impl
rename to reset_and_set_gen
* Add additional insert/remove helpers which set/update the generation
directly
* Clarified the usage of the shared generation counter in the commit
message

Changes from v3 to v4:
* Add gen param to invalidate_cache_state()
* Validates the generation bump after obtaining write lock in
apply_api_mutation
* Pass lock to apply_api_mutation
* Remove unnecessary gen check cache_try_secret_matches
* Adjusted commit message

Changes from v2 to v3:
* Replaced process-local cache invalidation (AtomicU64
API_MUTATION_GENERATION) with a cross-process shared generation via
ConfigVersionCache.
* Validate shared generation before/after the constant-time secret
compare; only insert into cache if the generation is unchanged.
* invalidate_cache_state() on insert if shared generation changed.

Changes from v1 to v2:
* Replace OnceCell with LazyLock, and std::sync::RwLock with
parking_lot::RwLock.
* Add API_MUTATION_GENERATION and guard cache inserts
to prevent “zombie inserts” across concurrent set/delete.
* Refactor cache operations into cache_try_secret_matches,
cache_try_insert_secret, and centralize write-side behavior in
apply_api_mutation.
* Switch fast-path cache access to try_read/try_write (best-effort).

 Cargo.toml                                 |   1 +
 proxmox-access-control/Cargo.toml          |   1 +
 proxmox-access-control/src/token_shadow.rs | 171 ++++++++++++++++++++-
 3 files changed, 170 insertions(+), 3 deletions(-)

diff --git a/Cargo.toml b/Cargo.toml
index 1cb5f09e..d930854e 100644
--- a/Cargo.toml
+++ b/Cargo.toml
@@ -113,6 +113,7 @@ native-tls = "0.2"
 nix = "0.29"
 openssl = "0.10"
 pam-sys = "0.5"
+parking_lot = "0.12"
 percent-encoding = "2.1"
 pin-utils = "0.1.0"
 proc-macro2 = "1.0"
diff --git a/proxmox-access-control/Cargo.toml b/proxmox-access-control/Cargo.toml
index ec189664..1de2842c 100644
--- a/proxmox-access-control/Cargo.toml
+++ b/proxmox-access-control/Cargo.toml
@@ -16,6 +16,7 @@ anyhow.workspace = true
 const_format.workspace = true
 nix = { workspace = true, optional = true }
 openssl = { workspace = true, optional = true }
+parking_lot.workspace = true
 regex.workspace = true
 hex = { workspace = true, optional = true }
 serde.workspace = true
diff --git a/proxmox-access-control/src/token_shadow.rs b/proxmox-access-control/src/token_shadow.rs
index c586d834..389c57ec 100644
--- a/proxmox-access-control/src/token_shadow.rs
+++ b/proxmox-access-control/src/token_shadow.rs
@@ -1,13 +1,28 @@
 use std::collections::HashMap;
+use std::sync::LazyLock;
 
 use anyhow::{bail, format_err, Error};
+use parking_lot::RwLock;
 use serde_json::{from_value, Value};
 
 use proxmox_auth_api::types::Authid;
 use proxmox_product_config::{open_api_lockfile, replace_config, ApiLockGuard};
 
+use crate::init::access_conf;
 use crate::init::impl_feature::{token_shadow, token_shadow_lock};
 
+/// Global in-memory cache for successfully verified API token secrets.
+/// The cache stores plain text secrets for token Authids that have already been
+/// verified against the hashed values in `token.shadow`. This allows for cheap
+/// subsequent authentications for the same token+secret combination, avoiding
+/// recomputing the password hash on every request.
+static TOKEN_SECRET_CACHE: LazyLock<RwLock<ApiTokenSecretCache>> = LazyLock::new(|| {
+    RwLock::new(ApiTokenSecretCache {
+        secrets: HashMap::new(),
+        shared_gen: 0,
+    })
+});
+
 // Get exclusive lock
 fn lock_config() -> Result<ApiLockGuard, Error> {
     open_api_lockfile(token_shadow_lock(), None, true)
@@ -36,9 +51,27 @@ pub fn verify_secret(tokenid: &Authid, secret: &str) -> Result<(), Error> {
         bail!("not an API token ID");
     }
 
+    // Fast path
+    if cache_try_secret_matches(tokenid, secret) {
+        return Ok(());
+    }
+
+    // Slow path
+    // First, capture the shared generation before doing the hash verification.
+    let gen_before = token_shadow_shared_gen();
+
     let data = read_file()?;
     match data.get(tokenid) {
-        Some(hashed_secret) => proxmox_sys::crypt::verify_crypt_pw(secret, hashed_secret),
+        Some(hashed_secret) => {
+            proxmox_sys::crypt::verify_crypt_pw(secret, hashed_secret)?;
+
+            // Try to cache only if nothing changed while verifying the secret.
+            if let Some(gen_before) = gen_before {
+                cache_try_insert_secret(tokenid.clone(), secret.to_owned(), gen_before);
+            }
+
+            Ok(())
+        }
         None => bail!("invalid API token"),
     }
 }
@@ -49,13 +82,15 @@ pub fn set_secret(tokenid: &Authid, secret: &str) -> Result<(), Error> {
         bail!("not an API token ID");
     }
 
-    let _guard = lock_config()?;
+    let guard = lock_config()?;
 
     let mut data = read_file()?;
     let hashed_secret = proxmox_sys::crypt::encrypt_pw(secret)?;
     data.insert(tokenid.clone(), hashed_secret);
     write_file(data)?;
 
+    apply_api_mutation(guard, tokenid, Some(secret));
+
     Ok(())
 }
 
@@ -65,12 +100,14 @@ pub fn delete_secret(tokenid: &Authid) -> Result<(), Error> {
         bail!("not an API token ID");
     }
 
-    let _guard = lock_config()?;
+    let guard = lock_config()?;
 
     let mut data = read_file()?;
     data.remove(tokenid);
     write_file(data)?;
 
+    apply_api_mutation(guard, tokenid, None);
+
     Ok(())
 }
 
@@ -81,3 +118,131 @@ pub fn generate_and_set_secret(tokenid: &Authid) -> Result<String, Error> {
     set_secret(tokenid, &secret)?;
     Ok(secret)
 }
+
+/// Cached secret.
+struct CachedSecret {
+    secret: String,
+}
+
+struct ApiTokenSecretCache {
+    /// Keys are token Authids, values are the corresponding plain text secrets.
+    /// Entries are added after a successful on-disk verification in
+    /// `verify_secret` or when a new token secret is generated by
+    /// `generate_and_set_secret`. Used to avoid repeated
+    /// password-hash computation on subsequent authentications.
+    secrets: HashMap<Authid, CachedSecret>,
+    /// Shared generation to detect mutations of the underlying token.shadow file.
+    shared_gen: usize,
+}
+
+impl ApiTokenSecretCache {
+    /// Resets all local cache contents and sets/updates the cached generation.
+    fn reset_and_set_gen(&mut self, new_gen: usize) {
+        self.secrets.clear();
+        self.shared_gen = new_gen;
+    }
+
+    /// Caches a secret and sets/updates the cache generation.
+    fn insert_and_set_gen(&mut self, tokenid: Authid, secret: CachedSecret, new_gen: usize) {
+        self.secrets.insert(tokenid, secret);
+        self.shared_gen = new_gen;
+    }
+
+    /// Evicts a cached secret and sets/updates the cached generation.
+    fn evict_and_set_gen(&mut self, tokenid: &Authid, new_gen: usize) {
+        self.secrets.remove(tokenid);
+        self.shared_gen = new_gen;
+    }
+}
+
+fn cache_try_insert_secret(tokenid: Authid, secret: String, shared_gen_before: usize) {
+    let Some(mut cache) = TOKEN_SECRET_CACHE.try_write() else {
+        return;
+    };
+
+    let Some(shared_gen_now) = token_shadow_shared_gen() else {
+        return;
+    };
+
+    // If this process missed a generation bump, its cache is stale.
+    if cache.shared_gen != shared_gen_now {
+        cache.reset_and_set_gen(shared_gen_now);
+    }
+
+    // If a mutation happened while we were verifying the secret, do not insert.
+    if shared_gen_now == shared_gen_before {
+        cache.insert_and_set_gen(tokenid, CachedSecret { secret }, shared_gen_now);
+    }
+}
+
+/// Tries to match the given token secret against the cached secret.
+///
+/// Verifies the generation/version before doing the constant-time
+/// comparison to reduce TOCTOU risk. During token rotation or deletion
+/// tokens for in-flight requests may still validate against the previous
+/// generation.
+fn cache_try_secret_matches(tokenid: &Authid, secret: &str) -> bool {
+    let Some(cache) = TOKEN_SECRET_CACHE.try_read() else {
+        return false;
+    };
+    let Some(entry) = cache.secrets.get(tokenid) else {
+        return false;
+    };
+    let Some(current_gen) = token_shadow_shared_gen() else {
+        return false;
+    };
+
+    if current_gen == cache.shared_gen {
+        let cached_secret_bytes = entry.secret.as_bytes();
+        let secret_bytes = secret.as_bytes();
+
+        return cached_secret_bytes.len() == secret_bytes.len()
+            && openssl::memcmp::eq(cached_secret_bytes, secret_bytes);
+    }
+
+    false
+}
+
+fn apply_api_mutation(_guard: ApiLockGuard, tokenid: &Authid, new_secret: Option<&str>) {
+    // Signal cache invalidation to other processes (best-effort).
+    let bumped_gen = bump_token_shadow_shared_gen();
+
+    let mut cache = TOKEN_SECRET_CACHE.write();
+
+    // If we cannot get the current generation, we cannot trust the cache
+    let Some(current_gen) = token_shadow_shared_gen() else {
+        cache.reset_and_set_gen(0);
+        return;
+    };
+
+    // If we cannot bump the shared generation, or if it changed after
+    // obtaining the cache write lock, we cannot trust the cache
+    if bumped_gen != Some(current_gen) {
+        cache.reset_and_set_gen(current_gen);
+        return;
+    }
+
+    // Apply the new mutation.
+    match new_secret {
+        Some(secret) => {
+            let cached_secret = CachedSecret {
+                secret: secret.to_owned(),
+            };
+            cache.insert_and_set_gen(tokenid.clone(), cached_secret, current_gen);
+        }
+        None => cache.evict_and_set_gen(tokenid, current_gen),
+    }
+}
+
+/// Get the current shared generation.
+fn token_shadow_shared_gen() -> Option<usize> {
+    access_conf().token_shadow_cache_generation()
+}
+
+/// Bump and return the new shared generation.
+fn bump_token_shadow_shared_gen() -> Option<usize> {
+    access_conf()
+        .increment_token_shadow_cache_generation()
+        .ok()
+        .map(|prev| prev + 1)
+}
-- 
2.47.3





^ permalink raw reply related	[relevance 11%]

* [pbs-devel] superseded: [PATCH proxmox{-backup,,-datacenter-manager} v6 00/11] token-shadow: reduce api token verification overhead
  2026-03-03 16:49 14% [PATCH proxmox{-backup,,-datacenter-manager} v6 " Samuel Rufinatscha
                   ` (11 preceding siblings ...)
  2026-03-11  8:59  6% ` [PATCH proxmox{-backup,,-datacenter-manager} v6 00/11] token-shadow: reduce api token verification overhead Fabian Grünbichler
@ 2026-03-12 10:38 13% ` Samuel Rufinatscha
  12 siblings, 0 replies; 117+ results
From: Samuel Rufinatscha @ 2026-03-12 10:38 UTC (permalink / raw)
  To: pbs-devel

https://lore.proxmox.com/pbs-devel/20260312103708.125282-2-s.rufinatscha@proxmox.com/T/#t

On 3/3/26 5:49 PM, Samuel Rufinatscha wrote:
> Hi,
> 
> this series improves the performance of token-based API authentication
> in PBS (pbs-config) and in PDM (underlying proxmox-access-control
> crate), addressing the API token verification hotspot reported in our
> bugtracker #7017 [1].
> 
> When profiling PBS /status endpoint with cargo flamegraph [2],
> token-based authentication showed up as a dominant hotspot via
> proxmox_sys::crypt::verify_crypt_pw. Applying this series removes that
> path from the hot section of the flamegraph. The same performance issue
> was measured [2] for PDM. PDM uses the underlying shared
> proxmox-access-control library for token handling, which is a
> factored out version of the token.shadow handling code from PBS.
> 
> While this series fixes the immediate performance issue both in PBS
> (pbs-config) and in the shared proxmox-access-control crate used by
> PDM, PBS should eventually, ideally be refactored, in a separate
> effort, to use proxmox-access-control for token handling instead of its
> local implementation.
> 
> Approach
> 
> The goal is to reduce the cost of token-based authentication preserving
> the existing token handling semantics (including detecting manual edits
> to token.shadow) and be consistent between PBS (pbs-config) and
> PDM (proxmox-access-control). For both sites, this series proposes to:
> 
> 1. Introduce an in-memory cache for verified token secrets and
> invalidate it through a shared ConfigVersionCache generation. Note, a
> shared generation is required to keep privileged and unprivileged
> daemon in sync to avoid caching inconsistencies across processes.
> 2. Invalidate on token.shadow API changes (set_secret,
> delete_secret)
> 3. Invalidate on direct/manual token.shadow file changes (mtime +
> length)
> 4. Avoid per-request file stat calls using a TTL window
> 
> Testing
> 
> To verify the effect in PBS (pbs-config changes), I:
> 1. Set up test environment based on latest PBS ISO, installed Rust
>     toolchain, cloned proxmox-backup repository to use with cargo
>     flamegraph. Reproduced bug #7017 [1] by profiling the /status
>     endpoint with token-based authentication using cargo flamegraph [2].
> 2. Built PBS with pbs-config patches and re-ran the same workload and
>     profiling setup. Confirmed that
>     proxmox_sys::crypt::verify_crypt_pw path no longer appears in the
>     hot section of the flamegraph. CPU usage is now dominated by TLS
>     overhead.
> 3. Functionally, I verified that:
>     * valid tokens authenticate correctly when used in API requests
>     * invalid secrets are rejected as before
>     * generating a new token secret via dashboard (create token for
>     user, regenerate existing secret) works and authenticates correctly
> 
> To verify the effect in PDM (proxmox-access-control changes), instead
> of PBS’ /status, I profiled the /version endpoint with cargo flamegraph
> [2] and verified that the expensive hashing path disappears from the
> hot section after introducing caching. Functionally, I verified
> that:
>     * valid tokens authenticate correctly when used in API requests
>     * invalid secrets are rejected as before
>     * generating a new token secret via dashboard (create token for user,
>     regenerate existing secret) works and authenticates correctly
> 
> Results
> 
> To measure caching effect I benchmarked parallel token auth requests
> for /status?verbose=0 on top of the datastore lookup cache series [3]
> to check throughput impact. With datastores=1, repeat=5000, parallel=16
> this series gives ~172 req/s compared to ~65 req/s without it.
> This is a ~2.6x improvement (and aligns with the ~179 req/s from the
> previous series, which used per-process cache invalidation).
> 
> Patch summary
> 
> pbs-config:
> 0001 – pbs-config: add token.shadow generation to ConfigVersionCache
> 0002 – pbs-config: cache verified API token secrets
> 0003 – pbs-config: invalidate token-secret cache on token.shadow
> changes
> 0004 – pbs-config: add TTL window to token-secret cache
> 
> proxmox-access-control:
> 0005 – access-control: extend AccessControlConfig for token.shadow invalidation
> 0006 – access-control: cache verified API token secrets
> 0007 – access-control: invalidate token-secret cache on token.shadow changes
> 0008 – access-control: add TTL window to token-secret cache
> 
> proxmox-datacenter-manager:
> 0009 – pdm-config: add token.shadow generation to ConfigVersionCache
> 0010 – docs: document API token-cache TTL effects
> 0011 – pdm-config: wire user+acl cache generation
> 
> Maintainer Notes:
> * proxmox-access-control trait split: permissions now live in
>   AccessControlPermissions, and AccessControlConfig now requires
>   fn permissions(&self) -> &dyn AccessControlPermissions ->
>   version bump
> * Renames ConfigVersionCache's pub user_cache_generation and
>   increase_user_cache_generation -> version bump
> * Adds parking_lot::RwLock dependency in PBS and proxmox-access-control
> 
> This version and the version before only incorporate the reviewers'
> feedback [4][5], also please consider Christian's R-b tag [4].
> 
> [1] https://bugzilla.proxmox.com/show_bug.cgi?id=7017
> [2] attachment 1767 [1]: Flamegraph showing the proxmox_sys::crypt::verify_crypt_pw stack
> [3] https://bugzilla.proxmox.com/show_bug.cgi?id=6049
> [4] https://lore.proxmox.com/pbs-devel/20260121151408.731516-1-s.rufinatscha@proxmox.com/T/#t
> [5] https://lore.proxmox.com/pbs-devel/20260217111229.78661-1-s.rufinatscha@proxmox.com/T/#t
> 
> proxmox-backup:
> 
> Samuel Rufinatscha (4):
>    pbs-config: add token.shadow generation to ConfigVersionCache
>    pbs-config: cache verified API token secrets
>    pbs-config: invalidate token-secret cache on token.shadow changes
>    pbs-config: add TTL window to token secret cache
> 
>   Cargo.toml                             |   1 +
>   docs/user-management.rst               |   4 +
>   pbs-config/Cargo.toml                  |   1 +
>   pbs-config/src/config_version_cache.rs |  18 ++
>   pbs-config/src/token_shadow.rs         | 314 ++++++++++++++++++++++++-
>   5 files changed, 335 insertions(+), 3 deletions(-)
> 
> 
> proxmox:
> 
> Samuel Rufinatscha (4):
>    proxmox-access-control: split AccessControlConfig and add token.shadow
>      gen
>    proxmox-access-control: cache verified API token secrets
>    proxmox-access-control: invalidate token-secret cache on token.shadow
>      changes
>    proxmox-access-control: add TTL window to token secret cache
> 
>   Cargo.toml                                 |   1 +
>   proxmox-access-control/Cargo.toml          |   1 +
>   proxmox-access-control/src/acl.rs          |  10 +-
>   proxmox-access-control/src/init.rs         | 113 ++++++--
>   proxmox-access-control/src/token_shadow.rs | 315 ++++++++++++++++++++-
>   5 files changed, 413 insertions(+), 27 deletions(-)
> 
> 
> proxmox-datacenter-manager:
> 
> Samuel Rufinatscha (3):
>    pdm-config: implement token.shadow generation
>    docs: document API token-cache TTL effects
>    pdm-config: wire user+acl cache generation
> 
>   cli/admin/src/main.rs                      |  2 +-
>   docs/access-control.rst                    |  4 +++
>   lib/pdm-api-types/src/acl.rs               |  4 +--
>   lib/pdm-config/Cargo.toml                  |  1 +
>   lib/pdm-config/src/access_control.rs       | 31 ++++++++++++++++++++
>   lib/pdm-config/src/config_version_cache.rs | 34 +++++++++++++++++-----
>   lib/pdm-config/src/lib.rs                  |  2 ++
>   server/src/acl.rs                          |  3 +-
>   ui/src/main.rs                             | 10 ++++++-
>   9 files changed, 77 insertions(+), 14 deletions(-)
>   create mode 100644 lib/pdm-config/src/access_control.rs
> 
> 
> Summary over all repositories:
>    19 files changed, 825 insertions(+), 44 deletions(-)
> 





^ permalink raw reply	[relevance 13%]

* Re: [PATCH pve-cluster 05/14 v2] pmxcfs-rs: add pmxcfs-rrd crate
  @ 2026-03-13 14:09  3%   ` Samuel Rufinatscha
  0 siblings, 0 replies; 117+ results
From: Samuel Rufinatscha @ 2026-03-13 14:09 UTC (permalink / raw)
  To: Kefu Chai, pve-devel

Thanks also for this patch and for working through / including most of
my previous v1 suggestions.

Please see my comments inline.

On 2/13/26 10:47 AM, Kefu Chai wrote:
> Add RRD (Round-Robin Database) file persistence system:
> - RrdWriter: Main API for RRD operations
> - Schema definitions for CPU, memory, network metrics
> - Format migration support (v1/v2/v3)
> - rrdcached integration for batched writes
> - Data transformation for legacy formats
> 
> This is an independent crate with no internal dependencies,
> only requiring external RRD libraries (rrd, rrdcached-client)
> and tokio for async operations. It handles time-series data
> storage compatible with the C implementation.
> 
> Includes comprehensive unit tests for data transformation,
> schema generation, and multi-source data processing.
> 
> Signed-off-by: Kefu Chai <k.chai@proxmox.com>
> ---
>   src/pmxcfs-rs/Cargo.toml                      |  12 +
>   src/pmxcfs-rs/pmxcfs-rrd/Cargo.toml           |  23 +
>   src/pmxcfs-rs/pmxcfs-rrd/README.md            | 119 ++++
>   src/pmxcfs-rs/pmxcfs-rrd/src/backend.rs       |  62 ++
>   .../pmxcfs-rrd/src/backend/backend_daemon.rs  | 184 ++++++
>   .../pmxcfs-rrd/src/backend/backend_direct.rs  | 586 ++++++++++++++++++
>   .../src/backend/backend_fallback.rs           | 212 +++++++
>   src/pmxcfs-rs/pmxcfs-rrd/src/daemon.rs        | 140 +++++

This file doesn't seem to be included (there is no mod definition).
Please remove if not needed.

>   src/pmxcfs-rs/pmxcfs-rrd/src/key_type.rs      | 408 ++++++++++++
>   src/pmxcfs-rs/pmxcfs-rrd/src/lib.rs           |  23 +
>   src/pmxcfs-rs/pmxcfs-rrd/src/parse.rs         | 124 ++++
>   .../pmxcfs-rrd/src/rrdcached/LICENSE          |  21 +
>   .../pmxcfs-rrd/src/rrdcached/client.rs        | 208 +++++++
>   .../src/rrdcached/consolidation_function.rs   |  30 +
>   .../pmxcfs-rrd/src/rrdcached/create.rs        | 410 ++++++++++++
>   .../pmxcfs-rrd/src/rrdcached/errors.rs        |  29 +
>   src/pmxcfs-rs/pmxcfs-rrd/src/rrdcached/mod.rs |  45 ++
>   src/pmxcfs-rs/pmxcfs-rrd/src/rrdcached/now.rs |  18 +
>   .../pmxcfs-rrd/src/rrdcached/parsers.rs       |  65 ++
>   .../pmxcfs-rrd/src/rrdcached/sanitisation.rs  | 100 +++
>   src/pmxcfs-rs/pmxcfs-rrd/src/schema.rs        | 577 +++++++++++++++++
>   src/pmxcfs-rs/pmxcfs-rrd/src/writer.rs        | 582 +++++++++++++++++
>   22 files changed, 3978 insertions(+)
>   create mode 100644 src/pmxcfs-rs/pmxcfs-rrd/Cargo.toml
>   create mode 100644 src/pmxcfs-rs/pmxcfs-rrd/README.md
>   create mode 100644 src/pmxcfs-rs/pmxcfs-rrd/src/backend.rs
>   create mode 100644 src/pmxcfs-rs/pmxcfs-rrd/src/backend/backend_daemon.rs
>   create mode 100644 src/pmxcfs-rs/pmxcfs-rrd/src/backend/backend_direct.rs
>   create mode 100644 src/pmxcfs-rs/pmxcfs-rrd/src/backend/backend_fallback.rs
>   create mode 100644 src/pmxcfs-rs/pmxcfs-rrd/src/daemon.rs
>   create mode 100644 src/pmxcfs-rs/pmxcfs-rrd/src/key_type.rs
>   create mode 100644 src/pmxcfs-rs/pmxcfs-rrd/src/lib.rs
>   create mode 100644 src/pmxcfs-rs/pmxcfs-rrd/src/parse.rs
>   create mode 100644 src/pmxcfs-rs/pmxcfs-rrd/src/rrdcached/LICENSE
>   create mode 100644 src/pmxcfs-rs/pmxcfs-rrd/src/rrdcached/client.rs
>   create mode 100644 src/pmxcfs-rs/pmxcfs-rrd/src/rrdcached/consolidation_function.rs
>   create mode 100644 src/pmxcfs-rs/pmxcfs-rrd/src/rrdcached/create.rs
>   create mode 100644 src/pmxcfs-rs/pmxcfs-rrd/src/rrdcached/errors.rs
>   create mode 100644 src/pmxcfs-rs/pmxcfs-rrd/src/rrdcached/mod.rs
>   create mode 100644 src/pmxcfs-rs/pmxcfs-rrd/src/rrdcached/now.rs
>   create mode 100644 src/pmxcfs-rs/pmxcfs-rrd/src/rrdcached/parsers.rs
>   create mode 100644 src/pmxcfs-rs/pmxcfs-rrd/src/rrdcached/sanitisation.rs
>   create mode 100644 src/pmxcfs-rs/pmxcfs-rrd/src/schema.rs
>   create mode 100644 src/pmxcfs-rs/pmxcfs-rrd/src/writer.rs
> 
> diff --git a/src/pmxcfs-rs/Cargo.toml b/src/pmxcfs-rs/Cargo.toml
> index d26fac04c..2457fe368 100644
> --- a/src/pmxcfs-rs/Cargo.toml
> +++ b/src/pmxcfs-rs/Cargo.toml
> @@ -4,6 +4,7 @@ members = [
>       "pmxcfs-api-types",  # Shared types and error definitions
>       "pmxcfs-config",     # Configuration management
>       "pmxcfs-logger",     # Cluster log with ring buffer and deduplication
> +    "pmxcfs-rrd",        # RRD (Round-Robin Database) persistence
>   ]
>   resolver = "2"
>   
> @@ -20,16 +21,27 @@ rust-version = "1.85"
>   pmxcfs-api-types = { path = "pmxcfs-api-types" }
>   pmxcfs-config = { path = "pmxcfs-config" }
>   pmxcfs-logger = { path = "pmxcfs-logger" }
> +pmxcfs-rrd = { path = "pmxcfs-rrd" }
> +
> +# Core async runtime
> +tokio = { version = "1.35", features = ["full"] }
>   
>   # Error handling
> +anyhow = "1.0"
>   thiserror = "1.0"
>   
> +# Logging and tracing
> +tracing = "0.1"
> +
>   # Concurrency primitives
>   parking_lot = "0.12"
>   
>   # System integration
>   libc = "0.2"
>   
> +# Development dependencies
> +tempfile = "3.8"
> +
>   [workspace.lints.clippy]
>   uninlined_format_args = "warn"
>   
> diff --git a/src/pmxcfs-rs/pmxcfs-rrd/Cargo.toml b/src/pmxcfs-rs/pmxcfs-rrd/Cargo.toml
> new file mode 100644
> index 000000000..33c87ec91
> --- /dev/null
> +++ b/src/pmxcfs-rs/pmxcfs-rrd/Cargo.toml
> @@ -0,0 +1,23 @@
> +[package]
> +name = "pmxcfs-rrd"
> +version.workspace = true
> +edition.workspace = true
> +authors.workspace = true
> +license.workspace = true
> +
> +[features]
> +default = ["rrdcached"]
> +rrdcached = []
> +
> +[dependencies]
> +anyhow.workspace = true
> +async-trait = "0.1"
> +chrono = { version = "0.4", default-features = false, features = ["clock"] }
> +nom = "8.0"

Do we actually need this extra dependency?
It seems like we only use it for basic string operations.
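
To illustrate what I mean by basic string operations, here is a minimal
std-only sketch for parsing an rrdcached status line ("<code> <message>",
where a negative code signals an error and a non-negative code gives the
number of lines that follow). I'm assuming this is the kind of parsing
done in rrdcached/parsers.rs; StatusLine and parse_status_line are
made-up names, not taken from the patch.

```rust
/// Parsed first line of an rrdcached reply: "<code> <message>".
/// (Hypothetical type, for illustration only.)
#[derive(Debug, PartialEq)]
pub struct StatusLine<'a> {
    /// Negative on error; otherwise the number of lines that follow.
    pub code: i64,
    /// Free-form text after the status code (may be empty).
    pub message: &'a str,
}

/// Parse a status line using only `str` methods from std.
pub fn parse_status_line(line: &str) -> Option<StatusLine<'_>> {
    // Drop the trailing line terminator, if any.
    let line = line.trim_end_matches(|c| c == '\r' || c == '\n');
    // Split at the first space; a reply without a message is still accepted.
    let (code, message) = line.split_once(' ').unwrap_or((line, ""));
    // Anything that is not an integer status code is rejected.
    let code = code.parse::<i64>().ok()?;
    Some(StatusLine { code, message })
}
```

If the remaining parsers are of similar complexity, dropping nom looks
feasible to me.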

> +rrd = "0.2"
> +thiserror = "2.0"

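The workspace already declares thiserror = "1.0", while this crate
requests "2.0". Please align the two, e.g. by inheriting the workspace
version with thiserror.workspace = true.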

> +tokio.workspace = true
> +tracing.workspace = true
> +
> +[dev-dependencies]
> +tempfile.workspace = true
> diff --git a/src/pmxcfs-rs/pmxcfs-rrd/README.md b/src/pmxcfs-rs/pmxcfs-rrd/README.md
> new file mode 100644
> index 000000000..d6f6ad9b1
> --- /dev/null
> +++ b/src/pmxcfs-rs/pmxcfs-rrd/README.md
> @@ -0,0 +1,119 @@
> +# pmxcfs-rrd
> +
> +RRD (Round-Robin Database) persistence for pmxcfs performance metrics.
> +
> +## Overview
> +
> +This crate provides RRD file management for storing time-series performance data from Proxmox nodes and VMs. It handles file creation, updates, and integration with rrdcached daemon for efficient writes.
> +
> +### Key Features
> +
> +- RRD file creation with schema-based initialization
> +- RRD updates (write metrics to disk)
> +- rrdcached integration for batched writes
> +- Support for both legacy and current schema versions (v1/v2/v3)
> +- Type-safe key parsing and validation
> +- Compatible with existing C-created RRD files
> +
> +## Usage Flow
> +
> +The typical data flow through this crate:
> +
> +1. **Metrics Collection**: pmxcfs-status collects performance metrics (CPU, memory, network, etc.)
> +2. **Key Generation**: Metrics are organized by key type (node, VM, storage)
> +3. **Schema Selection**: Appropriate RRD schema is selected based on key type and version
> +4. **Data Transformation**: Legacy data (v1/v2) is transformed to current format (v3) if needed
> +5. **Backend Selection**:
> +   - **Daemon backend**: Preferred for performance, batches writes via rrdcached
> +   - **Direct backend**: Fallback using librrd directly when daemon unavailable
> +   - **Fallback backend**: Tries daemon first, falls back to direct on failure
> +6. **File Operations**: Create RRD files if needed, update with new data points
> +
> +### Data Transformation
> +
> +The crate handles migration between schema versions:
> +- **v1 → v2**: Adds additional data sources for extended metrics
> +- **v2 → v3**: Consolidates and optimizes data sources
> +- **Transform logic**: `schema.rs:transform_data()` handles conversion, skipping incompatible entries
> +
> +### Backend Differences
> +
> +- **Daemon Backend** (`backend_daemon.rs`):
> +  - Uses vendored rrdcached client for async communication
> +  - Batches multiple updates for efficiency
> +  - Requires rrdcached daemon running
> +  - Best for high-frequency updates

And:

The C code tries rrdc_update() on every call and only falls back to
rrd_update_r() for that individual call if it fails; it does not
permanently disable the daemon path. This is another behavioral
difference and should be either documented or fixed.
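
A rough sketch of the per-call behavior I have in mind, using a
simplified synchronous trait instead of the real async RrdBackend.
Updater, PerCallFallback and FailFirst are hypothetical names for
illustration only:

```rust
/// Simplified stand-in for the update half of a backend.
pub trait Updater {
    fn update(&mut self, path: &str, data: &str) -> Result<(), String>;
}

/// Tries the daemon on every call; a failure only affects that one call.
pub struct PerCallFallback<D: Updater, F: Updater> {
    pub daemon: D,
    pub direct: F,
}

impl<D: Updater, F: Updater> PerCallFallback<D, F> {
    pub fn new(daemon: D, direct: F) -> Self {
        Self { daemon, direct }
    }

    pub fn update(&mut self, path: &str, data: &str) -> Result<(), String> {
        match self.daemon.update(path, data) {
            Ok(()) => Ok(()),
            // No state is changed here, so the next call tries the daemon again.
            Err(_daemon_err) => self.direct.update(path, data),
        }
    }
}

/// Test double: fails the first `failures_left` calls, then succeeds.
pub struct FailFirst {
    pub failures_left: u32,
    pub calls: u32,
}

impl FailFirst {
    pub fn new(failures: u32) -> Self {
        Self { failures_left: failures, calls: 0 }
    }
}

impl Updater for FailFirst {
    fn update(&mut self, _path: &str, _data: &str) -> Result<(), String> {
        self.calls += 1;
        if self.failures_left > 0 {
            self.failures_left -= 1;
            return Err("simulated failure".to_string());
        }
        Ok(())
    }
}
```

With a daemon that fails once, the first update goes through the direct
path and the second one reaches the daemon again.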

> +
> +- **Direct Backend** (`backend_direct.rs`):
> +  - Uses rrd crate (librrd FFI bindings) directly
> +  - Synchronous file operations
> +  - No external daemon required
> +  - Reliable fallback option
> +
> +- **Fallback Backend** (`backend_fallback.rs`):
> +  - Composite pattern: tries daemon, falls back to direct
> +  - Matches C implementation behavior
> +  - Provides best of both worlds
> +
> +## Module Structure
> +
> +| Module | Purpose |
> +|--------|---------|
> +| `writer.rs` | Main RrdWriter API - high-level interface for RRD operations |
> +| `schema.rs` | RRD schema definitions (DS, RRA) and data transformation logic |
> +| `key_type.rs` | RRD key parsing, validation, and path sanitization |
> +| `daemon.rs` | rrdcached daemon client wrapper |
> +| `backend.rs` | Backend trait and implementations (daemon/direct/fallback) |
> +| `rrdcached/` | Vendored rrdcached client implementation (adapted from rrdcached-client v0.1.5) |
> +
> +## Usage Example
> +
> +```rust
> +use pmxcfs_rrd::{RrdWriter, RrdFallbackBackend};

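RrdFallbackBackend is not exported from lib.rs.
The calls below also don't match the current code: RrdWriter::new() takes
a base directory rather than a backend, and update() takes the key plus a
single colon-separated data string. Please verify and update the example.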

> +
> +// Create writer with fallback backend
> +let backend = RrdFallbackBackend::new("/var/run/rrdcached.sock").await?;
> +let writer = RrdWriter::new(backend);
> +
> +// Update node CPU metrics
> +writer.update(
> +    "pve/nodes/node1/cpu",
> +    &[0.45, 0.52, 0.38, 0.61], // CPU usage values
> +    None, // Use current timestamp
> +).await?;
> +
> +// Create new RRD file for VM
> +writer.create(
> +    "pve/qemu/100/cpu",
> +    1704067200, // Start timestamp
> +).await?;
> +```
> +
> +## External Dependencies
> +
> +- **rrd crate**: Provides Rust bindings to librrd (RRDtool C library)
> +- **rrdcached client**: Vendored and adapted from rrdcached-client v0.1.5 (Apache-2.0 license)
> +  - Original source: https://github.com/SINTEF/rrdcached-client
> +  - Vendored to gain full control and adapt to our specific needs
> +  - Can be disabled via the `rrdcached` feature flag
> +
> +## Testing
> +
> +Unit tests verify:
> +- Schema generation and validation
> +- Key parsing for different RRD types (node, VM, storage)
> +- RRD file creation and update operations
> +- rrdcached client connection and fallback behavior
> +
> +Run tests with:
> +```bash
> +cargo test -p pmxcfs-rrd
> +```
> +
> +## References
> +
> +- **C Implementation**: `src/pmxcfs/status.c` (RRD code embedded)
> +- **Related Crates**:
> +  - `pmxcfs-status` - Uses RrdWriter for metrics persistence
> +  - `pmxcfs` - FUSE `.rrd` plugin reads RRD files
> +- **RRDtool Documentation**: https://oss.oetiker.ch/rrdtool/
> diff --git a/src/pmxcfs-rs/pmxcfs-rrd/src/backend.rs b/src/pmxcfs-rs/pmxcfs-rrd/src/backend.rs
> new file mode 100644
> index 000000000..2fa4fa39d
> --- /dev/null
> +++ b/src/pmxcfs-rs/pmxcfs-rrd/src/backend.rs
> @@ -0,0 +1,62 @@
> +/// RRD Backend Trait and Implementations
> +///
> +/// This module provides an abstraction over different RRD writing mechanisms:
> +/// - Daemon-based (via rrdcached) for performance and batching
> +/// - Direct file writing for reliability and fallback scenarios
> +/// - Fallback composite that tries daemon first, then falls back to direct
> +///
> +/// This design matches the C implementation's behavior in status.c where
> +/// it attempts daemon update first, then falls back to direct file writes.
> +use super::schema::RrdSchema;
> +use anyhow::Result;
> +use async_trait::async_trait;
> +use std::path::Path;
> +
> +/// Constants for RRD configuration
> +pub const DEFAULT_SOCKET_PATH: &str = "/var/run/rrdcached.sock";
> +pub const RRD_STEP_SECONDS: u64 = 60;
> +
> +/// Trait for RRD backend implementations
> +///
> +/// Provides abstraction over different RRD writing mechanisms.
> +/// All methods are async to support both async (daemon) and sync (direct file) operations.
> +#[async_trait]
> +pub trait RrdBackend: Send + Sync {
> +    /// Update RRD file with new data
> +    ///
> +    /// # Arguments
> +    /// * `file_path` - Full path to the RRD file
> +    /// * `data` - Update data in format "timestamp:value1:value2:..."
> +    async fn update(&mut self, file_path: &Path, data: &str) -> Result<()>;
> +
> +    /// Create new RRD file with schema
> +    ///
> +    /// # Arguments
> +    /// * `file_path` - Full path where RRD file should be created
> +    /// * `schema` - RRD schema defining data sources and archives
> +    /// * `start_timestamp` - Start time for the RRD file (Unix timestamp)
> +    async fn create(
> +        &mut self,
> +        file_path: &Path,
> +        schema: &RrdSchema,
> +        start_timestamp: i64,
> +    ) -> Result<()>;
> +
> +    /// Flush pending updates to disk
> +    ///
> +    /// For daemon backends, this sends a FLUSH command.
> +    /// For direct backends, this is a no-op (writes are immediate).
> +    async fn flush(&mut self) -> Result<()>;
> +
> +    /// Get a human-readable name for this backend
> +    fn name(&self) -> &str;
> +}
> +
> +// Backend implementations
> +mod backend_daemon;

The rrdcached module is conditional, but the daemon backend is always
included. Please feature-gate this as well.

> +mod backend_direct;
> +mod backend_fallback;
> +
> +pub use backend_daemon::RrdCachedBackend;

This re-export should be gated as well, no?

Similarly, please gate the daemon usage in backend_fallback.rs and
writer.rs, where the fallback backend tries to connect to the daemon.
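
Roughly like this, as a self-contained sketch. The module bodies are
placeholders and default_backend_name() is a made-up helper that only
shows how the default could depend on the feature:

```rust
// Only compiled when the crate is built with the `rrdcached` feature.
#[cfg(feature = "rrdcached")]
mod backend_daemon {
    /// Placeholder for the real daemon backend.
    pub struct RrdCachedBackend;
}

mod backend_direct {
    /// Placeholder for the real direct backend.
    pub struct RrdDirectBackend;
}

// The re-export has to carry the same gate as the module it refers to.
#[cfg(feature = "rrdcached")]
pub use self::backend_daemon::RrdCachedBackend;
pub use self::backend_direct::RrdDirectBackend;

/// Hypothetical helper: which backend the writer would pick by default.
pub fn default_backend_name() -> &'static str {
    if cfg!(feature = "rrdcached") {
        // Daemon first, direct writes as fallback.
        "fallback"
    } else {
        // Without the feature there is no daemon client to try.
        "direct"
    }
}
```

Building with --no-default-features should then compile without any
reference to the vendored client.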

> +pub use backend_direct::RrdDirectBackend;
> +pub use backend_fallback::RrdFallbackBackend;
> diff --git a/src/pmxcfs-rs/pmxcfs-rrd/src/backend/backend_daemon.rs b/src/pmxcfs-rs/pmxcfs-rrd/src/backend/backend_daemon.rs

[..]

> diff --git a/src/pmxcfs-rs/pmxcfs-rrd/src/backend/backend_direct.rs b/src/pmxcfs-rs/pmxcfs-rrd/src/backend/backend_direct.rs

[..]

> +}
> diff --git a/src/pmxcfs-rs/pmxcfs-rrd/src/backend/backend_fallback.rs b/src/pmxcfs-rs/pmxcfs-rrd/src/backend/backend_fallback.rs

[..]

> diff --git a/src/pmxcfs-rs/pmxcfs-rrd/src/daemon.rs b/src/pmxcfs-rs/pmxcfs-rrd/src/daemon.rs

[..]

> diff --git a/src/pmxcfs-rs/pmxcfs-rrd/src/key_type.rs b/src/pmxcfs-rs/pmxcfs-rrd/src/key_type.rs
> new file mode 100644
> index 000000000..fabe7e669
> --- /dev/null
> +++ b/src/pmxcfs-rs/pmxcfs-rrd/src/key_type.rs
> @@ -0,0 +1,408 @@
> +/// RRD Key Type Parsing and Path Resolution
> +///
> +/// This module handles parsing RRD status update keys and mapping them
> +/// to the appropriate file paths and schemas.
> +use super::schema::{RrdFormat, RrdSchema};
> +use anyhow::{Context, Result};
> +use std::path::{Path, PathBuf};
> +
> +/// Metric type for determining column skipping rules
> +#[derive(Debug, Clone, Copy, PartialEq, Eq)]
> +pub enum MetricType {
> +    Node,
> +    Vm,
> +    Storage,
> +}
> +
> +impl MetricType {
> +    /// Number of non-archivable columns to skip from the start of the data string
> +    ///
> +    /// The data from pvestatd has non-archivable fields at the beginning:
> +    /// - Node: skip 2 (uptime, sublevel) - then ctime:loadavg:maxcpu:...
> +    /// - VM: skip 4 (uptime, name, status, template) - then ctime:maxcpu:cpu:...
> +    /// - Storage: skip 0 - data starts with ctime:total:used
> +    ///
> +    /// C implementation: status.c:1300 (node skip=2), status.c:1335 (VM skip=4)
> +    pub fn skip_columns(self) -> usize {
> +        match self {
> +            MetricType::Node => 2,
> +            MetricType::Vm => 4,
> +            MetricType::Storage => 0,
> +        }
> +    }
> +
> +    /// Get column count for a specific RRD format
> +    #[allow(dead_code)]
> +    pub fn column_count(self, format: RrdFormat) -> usize {
> +        match (format, self) {
> +            (RrdFormat::Pve2, MetricType::Node) => 12,
> +            (RrdFormat::Pve9_0, MetricType::Node) => 19,
> +            (RrdFormat::Pve2, MetricType::Vm) => 10,
> +            (RrdFormat::Pve9_0, MetricType::Vm) => 17,
> +            (_, MetricType::Storage) => 2, // Same for both formats
> +        }
> +    }
> +}
> +
> +/// RRD key types for routing to correct schema and path
> +///
> +/// This enum represents the different types of RRD metrics that pmxcfs tracks:
> +/// - Node metrics (CPU, memory, network for a node)
> +/// - VM metrics (CPU, memory, disk, network for a VM/CT)
> +/// - Storage metrics (total/used space for a storage)
> +#[derive(Debug, Clone, PartialEq, Eq)]
> +pub(crate) enum RrdKeyType {
> +    /// Node metrics: pve2-node/{nodename} or pve-node-9.0/{nodename}
> +    Node { nodename: String, format: RrdFormat },
> +    /// VM metrics: pve2.3-vm/{vmid} or pve-vm-9.0/{vmid}
> +    Vm { vmid: String, format: RrdFormat },
> +    /// Storage metrics: pve2-storage/{node}/{storage} or pve-storage-9.0/{node}/{storage}
> +    Storage {
> +        nodename: String,
> +        storage: String,
> +        format: RrdFormat,
> +    },
> +}
> +
> +impl RrdKeyType {
> +    /// Parse RRD key from status update key
> +    ///
> +    /// Supported formats:
> +    /// - "pve2-node/node1" → Node { nodename: "node1", format: Pve2 }
> +    /// - "pve-node-9.0/node1" → Node { nodename: "node1", format: Pve9_0 }
> +    /// - "pve2.3-vm/100" → Vm { vmid: "100", format: Pve2 }
> +    /// - "pve-storage-9.0/node1/local" → Storage { nodename: "node1", storage: "local", format: Pve9_0 }
> +    ///
> +    /// # Security
> +    ///
> +    /// Path components are validated to prevent directory traversal attacks:
> +    /// - Rejects paths containing ".."
> +    /// - Rejects absolute paths
> +    /// - Rejects paths with special characters that could be exploited
> +    pub(crate) fn parse(key: &str) -> Result<Self> {
> +        let parts: Vec<&str> = key.split('/').collect();
> +
> +        if parts.is_empty() {
> +            anyhow::bail!("Empty RRD key");
> +        }
> +
> +        // Validate all path components for security
> +        for part in &parts[1..] {
> +            Self::validate_path_component(part)?;
> +        }
> +
> +        match parts[0] {
> +            "pve2-node" => {
> +                let nodename = parts.get(1).context("Missing nodename")?.to_string();
> +                Ok(RrdKeyType::Node {
> +                    nodename,
> +                    format: RrdFormat::Pve2,
> +                })
> +            }
> +            prefix if prefix.starts_with("pve-node-") => {
> +                let nodename = parts.get(1).context("Missing nodename")?.to_string();
> +                Ok(RrdKeyType::Node {
> +                    nodename,
> +                    format: RrdFormat::Pve9_0,

"pve-node-9.0" matches, but so does "pve-node-9.1", "pve-node-10.0" all 
of which are treated as Pve9_0.

Maybe we should parse the version suffix and match it exactly? The same
applies to the "pve-vm-" and "pve-storage-" arms below.
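
Something along these lines, shown for the node prefix only. This is a
sketch: node_format() is a hypothetical helper, the enum is a local copy
so the snippet stands alone, and rejecting unknown versions is just one
option (mapping them to a dedicated variant would be another):

```rust
/// Local copy of the two formats used in the patch.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RrdFormat {
    Pve2,
    Pve9_0,
}

/// Map the first key component of a node key to its format.
/// Only the exact version suffix "9.0" is accepted as Pve9_0.
pub fn node_format(prefix: &str) -> Result<RrdFormat, String> {
    if prefix == "pve2-node" {
        return Ok(RrdFormat::Pve2);
    }
    match prefix.strip_prefix("pve-node-") {
        Some("9.0") => Ok(RrdFormat::Pve9_0),
        // Known prefix, but a version this code does not know about.
        Some(other) => Err(format!("Unsupported node RRD format version: {other}")),
        None => Err(format!("Unknown RRD key format: {prefix}")),
    }
}
```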

> +                })
> +            }
> +            "pve2.3-vm" => {
> +                let vmid = parts.get(1).context("Missing vmid")?.to_string();
> +                Ok(RrdKeyType::Vm {
> +                    vmid,
> +                    format: RrdFormat::Pve2,
> +                })
> +            }
> +            prefix if prefix.starts_with("pve-vm-") => {
> +                let vmid = parts.get(1).context("Missing vmid")?.to_string();
> +                Ok(RrdKeyType::Vm {
> +                    vmid,
> +                    format: RrdFormat::Pve9_0,
> +                })
> +            }
> +            "pve2-storage" => {
> +                let nodename = parts.get(1).context("Missing nodename")?.to_string();
> +                let storage = parts.get(2).context("Missing storage")?.to_string();
> +                Ok(RrdKeyType::Storage {
> +                    nodename,
> +                    storage,
> +                    format: RrdFormat::Pve2,
> +                })
> +            }
> +            prefix if prefix.starts_with("pve-storage-") => {
> +                let nodename = parts.get(1).context("Missing nodename")?.to_string();
> +                let storage = parts.get(2).context("Missing storage")?.to_string();
> +                Ok(RrdKeyType::Storage {
> +                    nodename,
> +                    storage,
> +                    format: RrdFormat::Pve9_0,
> +                })
> +            }
> +            _ => anyhow::bail!("Unknown RRD key format: {key}"),
> +        }
> +    }
> +
> +    /// Validate a path component for security
> +    ///
> +    /// Prevents directory traversal attacks by rejecting:
> +    /// - ".." (parent directory)
> +    /// - Absolute paths (starting with "/")
> +    /// - Empty components
> +    /// - Components with null bytes or other dangerous characters
> +    fn validate_path_component(component: &str) -> Result<()> {
> +        if component.is_empty() {
> +            anyhow::bail!("Empty path component");
> +        }
> +
> +        if component == ".." {
> +            anyhow::bail!("Path traversal attempt: '..' not allowed");
> +        }
> +
> +        if component.starts_with('/') {
> +            anyhow::bail!("Absolute paths not allowed");
> +        }
> +
> +        if component.contains('\0') {
> +            anyhow::bail!("Null byte in path component");
> +        }
> +
> +        // Reject other potentially dangerous characters
> +        if component.contains(['\\', '\n', '\r']) {
> +            anyhow::bail!("Invalid characters in path component");
> +        }
> +
> +        Ok(())
> +    }
> +
> +    /// Get the RRD file path for this key type
> +    ///
> +    /// Always returns paths using the current format (9.0), regardless of the input format.
> +    /// This enables transparent format migration: old PVE8 nodes can send `pve2-node/` keys,
> +    /// and they'll be written to `pve-node-9.0/` files automatically.
> +    ///
> +    /// # Format Migration Strategy
> +    ///
> +    /// Returns the file path for this RRD key (without .rrd extension)
> +    ///
> +    /// The C implementation always creates files in the current format directory
> +    /// (see status.c:1287). This Rust implementation follows the same approach:
> +    /// - Input: `pve2-node/node1` → Output: `/var/lib/rrdcached/db/pve-node-9.0/node1`
> +    /// - Input: `pve-node-9.0/node1` → Output: `/var/lib/rrdcached/db/pve-node-9.0/node1`
> +    ///
> +    /// This allows rolling upgrades where old and new nodes coexist in the same cluster.
> +    ///
> +    /// Note: The path does NOT include .rrd extension, matching C implementation.
> +    /// The librrd functions (rrd_create_r, rrdc_update) add .rrd internally.
> +    pub(crate) fn file_path(&self, base_dir: &Path) -> PathBuf {
> +        match self {
> +            RrdKeyType::Node { nodename, .. } => {
> +                // Always use current format path
> +                base_dir.join("pve-node-9.0").join(nodename)
> +            }
> +            RrdKeyType::Vm { vmid, .. } => {
> +                // Always use current format path
> +                base_dir.join("pve-vm-9.0").join(vmid)
> +            }
> +            RrdKeyType::Storage {
> +                nodename, storage, ..
> +            } => {
> +                // Always use current format path
> +                base_dir
> +                    .join("pve-storage-9.0")
> +                    .join(nodename)
> +                    .join(storage)
> +            }
> +        }
> +    }
> +
> +    /// Get the source format from the input key
> +    ///
> +    /// This is used for data transformation (padding/truncation).
> +    pub(crate) fn source_format(&self) -> RrdFormat {
> +        match self {
> +            RrdKeyType::Node { format, .. }
> +            | RrdKeyType::Vm { format, .. }
> +            | RrdKeyType::Storage { format, .. } => *format,
> +        }
> +    }
> +
> +    /// Get the target RRD schema (always current format)
> +    ///
> +    /// Files are always created using the current format (Pve9_0),
> +    /// regardless of the source format in the key.
> +    pub(crate) fn schema(&self) -> RrdSchema {
> +        match self {
> +            RrdKeyType::Node { .. } => RrdSchema::node(RrdFormat::Pve9_0),
> +            RrdKeyType::Vm { .. } => RrdSchema::vm(RrdFormat::Pve9_0),
> +            RrdKeyType::Storage { .. } => RrdSchema::storage(RrdFormat::Pve9_0),
> +        }
> +    }
> +
> +    /// Get the metric type for this key
> +    pub(crate) fn metric_type(&self) -> MetricType {
> +        match self {
> +            RrdKeyType::Node { .. } => MetricType::Node,
> +            RrdKeyType::Vm { .. } => MetricType::Vm,
> +            RrdKeyType::Storage { .. } => MetricType::Storage,
> +        }
> +    }
> +}
> +
> +#[cfg(test)]
> +mod tests {

[..]

> +}
> diff --git a/src/pmxcfs-rs/pmxcfs-rrd/src/lib.rs b/src/pmxcfs-rs/pmxcfs-rrd/src/lib.rs

[..]

> diff --git a/src/pmxcfs-rs/pmxcfs-rrd/src/parse.rs b/src/pmxcfs-rs/pmxcfs-rrd/src/parse.rs

[..]

> diff --git a/src/pmxcfs-rs/pmxcfs-rrd/src/rrdcached/LICENSE b/src/pmxcfs-rs/pmxcfs-rrd/src/rrdcached/LICENSE

[..]

> diff --git a/src/pmxcfs-rs/pmxcfs-rrd/src/rrdcached/client.rs b/src/pmxcfs-rs/pmxcfs-rrd/src/rrdcached/client.rs

[..]

> diff --git a/src/pmxcfs-rs/pmxcfs-rrd/src/rrdcached/consolidation_function.rs b/src/pmxcfs-rs/pmxcfs-rrd/src/rrdcached/consolidation_function.rs

[..]

> diff --git a/src/pmxcfs-rs/pmxcfs-rrd/src/rrdcached/create.rs b/src/pmxcfs-rs/pmxcfs-rrd/src/rrdcached/create.rs

[..]

> diff --git a/src/pmxcfs-rs/pmxcfs-rrd/src/rrdcached/errors.rs b/src/pmxcfs-rs/pmxcfs-rrd/src/rrdcached/errors.rs

[..]

> diff --git a/src/pmxcfs-rs/pmxcfs-rrd/src/rrdcached/mod.rs b/src/pmxcfs-rs/pmxcfs-rrd/src/rrdcached/mod.rs

[..]

> diff --git a/src/pmxcfs-rs/pmxcfs-rrd/src/rrdcached/now.rs b/src/pmxcfs-rs/pmxcfs-rrd/src/rrdcached/now.rs

[..]

> diff --git a/src/pmxcfs-rs/pmxcfs-rrd/src/rrdcached/parsers.rs b/src/pmxcfs-rs/pmxcfs-rrd/src/rrdcached/parsers.rs

[..]

> diff --git a/src/pmxcfs-rs/pmxcfs-rrd/src/rrdcached/sanitisation.rs b/src/pmxcfs-rs/pmxcfs-rrd/src/rrdcached/sanitisation.rs
> new file mode 100644
> index 000000000..8da6b633d
> --- /dev/null
> +++ b/src/pmxcfs-rs/pmxcfs-rrd/src/rrdcached/sanitisation.rs
> @@ -0,0 +1,100 @@
> +use super::errors::RRDCachedClientError;
> +
> +pub fn check_data_source_name(name: &str) -> Result<(), RRDCachedClientError> {
> +    if name.is_empty() || name.len() > 64 {
> +        return Err(RRDCachedClientError::InvalidDataSourceName(
> +            "name must be between 1 and 64 characters".to_string(),
> +        ));
> +    }
> +    if !name
> +        .chars()
> +        .all(|c| c.is_alphanumeric() || c == '_' || c == '-')
> +    {
> +        return Err(RRDCachedClientError::InvalidDataSourceName(
> +            "name must only contain alphanumeric characters and underscores".to_string(),
> +        ));
> +    }
> +    Ok(())
> +}
> +
> +pub fn check_rrd_path(name: &str) -> Result<(), RRDCachedClientError> {
> +    if name.is_empty() || name.len() > 64 {
> +        return Err(RRDCachedClientError::InvalidCreateDataSerie(
> +            "name must be between 1 and 64 characters".to_string(),
> +        ));
> +    }
> +    if !name
> +        .chars()
> +        .all(|c| c.is_alphanumeric() || c == '_' || c == '-')

This rejects "/" and ".", but we pass full system paths to it. Please
also check the 64-character length limit above, which looks too small
for full paths.
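
A possible direction, as a sketch only. The 4095-byte limit (Linux
PATH_MAX minus the trailing NUL) and the ASCII-only character set are
assumptions on my side, and the error type is simplified to String:

```rust
/// Assumed upper bound for a full path (Linux PATH_MAX is 4096 incl. NUL).
const MAX_PATH_LEN: usize = 4095;

/// Validate a full RRD path before sending it to the daemon.
pub fn check_rrd_path(path: &str) -> Result<(), String> {
    if path.is_empty() || path.len() > MAX_PATH_LEN {
        return Err(format!("path must be between 1 and {MAX_PATH_LEN} bytes"));
    }
    // Still refuse parent-directory components anywhere in the path.
    if path.split('/').any(|component| component == "..") {
        return Err("path must not contain '..' components".to_string());
    }
    // Allow '/' and '.' in addition to the previous character set; spaces
    // and control characters stay rejected.
    if !path
        .chars()
        .all(|c| c.is_ascii_alphanumeric() || matches!(c, '_' | '-' | '.' | '/'))
    {
        return Err("path contains unsupported characters".to_string());
    }
    Ok(())
}
```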

> +    {
> +        return Err(RRDCachedClientError::InvalidCreateDataSerie(
> +            "name must only contain alphanumeric characters and underscores".to_string(),
> +        ));
> +    }
> +    Ok(())
> +}

[..]

> diff --git a/src/pmxcfs-rs/pmxcfs-rrd/src/schema.rs b/src/pmxcfs-rs/pmxcfs-rrd/src/schema.rs

[..]

> diff --git a/src/pmxcfs-rs/pmxcfs-rrd/src/writer.rs b/src/pmxcfs-rs/pmxcfs-rrd/src/writer.rs
> new file mode 100644
> index 000000000..6c48940be
> --- /dev/null
> +++ b/src/pmxcfs-rs/pmxcfs-rrd/src/writer.rs
> @@ -0,0 +1,582 @@
> +/// RRD File Writer
> +///
> +/// Handles creating and updating RRD files via pluggable backends.
> +/// Supports daemon-based (rrdcached) and direct file writing modes.
> +use super::backend::{DEFAULT_SOCKET_PATH, RrdFallbackBackend};
> +use super::key_type::{MetricType, RrdKeyType};
> +use super::schema::{RrdFormat, RrdSchema};
> +use anyhow::{Context, Result};
> +use chrono::Local;
> +use std::fs;
> +use std::path::{Path, PathBuf};
> +
> +
> +/// RRD writer for persistent metric storage
> +///
> +/// Uses pluggable backends (daemon, direct, or fallback) for RRD operations.
> +pub struct RrdWriter {
> +    /// Base directory for RRD files (default: /var/lib/rrdcached/db)
> +    base_dir: PathBuf,
> +    /// Backend for RRD operations (daemon, direct, or fallback)
> +    backend: Box<dyn super::backend::RrdBackend>,
> +}
> +
> +impl RrdWriter {
> +    /// Create new RRD writer with default fallback backend
> +    ///
> +    /// Uses the fallback backend that tries daemon first, then falls back to direct file writes.
> +    /// This matches the C implementation's behavior.
> +    ///
> +    /// # Arguments
> +    /// * `base_dir` - Base directory for RRD files
> +    pub async fn new<P: AsRef<Path>>(base_dir: P) -> Result<Self> {
> +        let backend = Self::default_backend().await?;
> +        Self::with_backend(base_dir, backend).await
> +    }
> +
> +    /// Create new RRD writer with specific backend
> +    ///
> +    /// # Arguments
> +    /// * `base_dir` - Base directory for RRD files
> +    /// * `backend` - RRD backend to use (daemon, direct, or fallback)
> +    pub(crate) async fn with_backend<P: AsRef<Path>>(
> +        base_dir: P,
> +        backend: Box<dyn super::backend::RrdBackend>,
> +    ) -> Result<Self> {
> +        let base_dir = base_dir.as_ref().to_path_buf();
> +
> +        // Create base directory if it doesn't exist
> +        fs::create_dir_all(&base_dir)
> +            .with_context(|| format!("Failed to create RRD base directory: {base_dir:?}"))?;
> +
> +        tracing::info!("RRD writer using backend: {}", backend.name());
> +
> +        Ok(Self { base_dir, backend })
> +    }
> +
> +    /// Create default backend (fallback: daemon + direct)
> +    ///
> +    /// This matches the C implementation's behavior:
> +    /// - Tries rrdcached daemon first for performance
> +    /// - Falls back to direct file writes if daemon fails
> +    async fn default_backend() -> Result<Box<dyn super::backend::RrdBackend>> {
> +        let backend = RrdFallbackBackend::new(DEFAULT_SOCKET_PATH).await;
> +        Ok(Box::new(backend))
> +    }
> +
> +    /// Update RRD file with metric data
> +    ///
> +    /// This will:
> +    /// 1. Transform data from source format to target format (padding/truncation/column skipping)
> +    /// 2. Create the RRD file if it doesn't exist
> +    /// 3. Update via rrdcached daemon
> +    ///
> +    /// # Arguments
> +    /// * `key` - RRD key (e.g., "pve2-node/node1", "pve-vm-9.0/100")
> +    /// * `data` - Raw metric data string from pvestatd (format: "skipped_fields...:ctime:val1:val2:...")
> +    pub async fn update(&mut self, key: &str, data: &str) -> Result<()> {
> +        // Parse the key to determine file path and schema
> +        let key_type = RrdKeyType::parse(key).with_context(|| format!("Invalid RRD key: {key}"))?;
> +
> +        // Get source format and target schema
> +        let source_format = key_type.source_format();
> +        let target_schema = key_type.schema();
> +        let metric_type = key_type.metric_type();
> +
> +        // Transform data from source to target format
> +        let transformed_data =
> +            Self::transform_data(data, source_format, &target_schema, metric_type)
> +                .with_context(|| format!("Failed to transform RRD data for key: {key}"))?;
> +
> +        // Get the file path (always uses current format)
> +        let file_path = key_type.file_path(&self.base_dir);
> +
> +        // Ensure the RRD file exists
> +        // Always check file existence directly - handles file deletion/rotation
> +        if !file_path.exists() {
> +            self.create_rrd_file(&key_type, &file_path).await?;

The on-disk naming convention for .rrd is inconsistent across the crate,
and I think this can break the logic here.
file_path() in key_type.rs is documented as returning paths without
.rrd, and that is what this existence check runs against. But the
vendored rrdcached client in rrdcached/client.rs and rrdcached/create.rs
appends .rrd when building the update and create commands, so the
daemon backend creates files at path.rrd. Meanwhile, the direct backend
tests in backend_direct.rs also construct paths with .rrd explicitly.

Can we please pin down whether the rrd crate's create() / update_all()
automatically append .rrd or not, then make one consistent decision and
align file_path(), the existence check, the vendored client and the
direct backend tests with that convention?
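
If the decision ends up being "files on disk carry .rrd", a single
helper could own that rule. A sketch, with with_rrd_suffix() as a
hypothetical name; it has to be called exactly once per logical path:

```rust
use std::ffi::OsString;
use std::path::{Path, PathBuf};

/// Append ".rrd" to the full path.
///
/// `Path::with_extension("rrd")` is deliberately not used here: it would
/// replace an existing extension, so a storage named "local.lvm" would
/// end up as "local.rrd".
pub fn with_rrd_suffix(logical: &Path) -> PathBuf {
    let mut raw: OsString = logical.as_os_str().to_owned();
    raw.push(".rrd");
    PathBuf::from(raw)
}
```

The existence check, the vendored client and the tests could then all go
through this one function instead of each handling the suffix themselves.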

> +        }
> +
> +        // Update the RRD file via backend
> +        self.backend.update(&file_path, &transformed_data).await?;
> +
> +        Ok(())
> +    }
> +
> +    /// Create RRD file with appropriate schema via backend
> +    async fn create_rrd_file(&mut self, key_type: &RrdKeyType, file_path: &Path) -> Result<()> {
> +        // Ensure parent directory exists
> +        if let Some(parent) = file_path.parent() {
> +            fs::create_dir_all(parent)
> +                .with_context(|| format!("Failed to create directory: {parent:?}"))?;
> +        }
> +
> +        // Get schema for this RRD type
> +        let schema = key_type.schema();
> +
> +        // Calculate start time (at day boundary, matching C implementation)
> +        // C uses localtime() (status.c:1206-1219), not UTC
> +        let now = Local::now();
> +        let start = now
> +            .date_naive()
> +            .and_hms_opt(0, 0, 0)
> +            .expect("00:00:00 is always a valid time")
> +            .and_local_timezone(Local)
> +            .single()

single() returns None when local midnight is ambiguous or does not exist
(e.g. around a DST transition), and the expect() below would then panic.
Maybe earliest() would help here?
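
One caveat: in chrono, earliest() resolves the ambiguous case but still
returns None when the local time does not exist, so a fallback is needed
either way. The sketch below models the three cases with a stand-in enum
so that it compiles without chrono. Mapped and start_timestamp() are
hypothetical, and falling back to the current time is only a placeholder
choice:

```rust
/// Stand-in mirroring the three cases of chrono's `LocalResult<T>`.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Mapped<T> {
    /// The local time does not exist (gap).
    None,
    /// Exactly one mapping.
    Single(T),
    /// Two possible mappings (fold), earliest first.
    Ambiguous(T, T),
}

impl<T> Mapped<T> {
    /// Like `single()`: only the unambiguous case yields a value.
    pub fn single(self) -> Option<T> {
        match self {
            Mapped::Single(t) => Some(t),
            _ => None,
        }
    }

    /// Like `earliest()`: also resolves the ambiguous case.
    pub fn earliest(self) -> Option<T> {
        match self {
            Mapped::Single(t) | Mapped::Ambiguous(t, _) => Some(t),
            Mapped::None => None,
        }
    }
}

/// Pick a start timestamp without panicking in any of the three cases.
pub fn start_timestamp(local_midnight: Mapped<i64>, now: i64) -> i64 {
    local_midnight.earliest().unwrap_or(now)
}
```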

> +            .expect("Local midnight should have single timezone mapping");
> +        let start_timestamp = start.timestamp();
> +
> +        tracing::debug!(
> +            "Creating RRD file: {:?} with {} data sources via {}",
> +            file_path,
> +            schema.column_count(),
> +            self.backend.name()
> +        );
> +
> +        // Delegate to backend for creation
> +        self.backend
> +            .create(file_path, &schema, start_timestamp)
> +            .await?;
> +
> +        tracing::info!("Created RRD file: {:?} ({})", file_path, schema);
> +
> +        Ok(())
> +    }
> +#[cfg(test)]
> +mod tests {
> +    use super::super::schema::{RrdFormat, RrdSchema};
> +    use super::*;
> +
+    #[test]
+    fn test_rrd_file_path_generation() {
+        let temp_dir = std::path::PathBuf::from("/tmp/test");
+
+        let key_node = RrdKeyType::Node {
+            nodename: "testnode".to_string(),
+            format: RrdFormat::Pve9_0,
+        };
+        let path = key_node.file_path(&temp_dir);
+        assert_eq!(path, temp_dir.join("pve-node-9.0").join("testnode"));
+    }
+
+    // ===== Format Adaptation Tests =====
+
+    #[test]
+    fn test_transform_data_node_pve2_to_pve9() {
+        // Test padding old format (12 archivable cols) to new format (19 archivable cols)
+        // pvestatd data format for node: "uptime:sublevel:ctime:loadavg:maxcpu:cpu:iowait:memtotal:memused:swap_t:swap_u:root_t:root_u:netin:netout"
+        // = 2 non-archivable + 1 timestamp + 12 archivable = 15 fields
+        let data = "1000:0:1234567890:1.5:4:2.0:0.5:8000000000:6000000000:0:0:0:0:1000000:500000";
+
+        let schema = RrdSchema::node(RrdFormat::Pve9_0);
+        let result =
+            RrdWriter::transform_data(data, RrdFormat::Pve2, &schema, MetricType::Node).unwrap();
+
+        // After skip(2): "1234567890:1.5:4:2.0:0.5:...:500000" = 13 fields
+        // Pad to 20 total (timestamp + 19 values): 13 + 7 "U" = 20
+        let parts: Vec<&str> = result.split(':').collect();
+        assert_eq!(parts[0], "1234567890", "Timestamp should be preserved");
+        assert_eq!(parts.len(), 20, "Should have timestamp + 19 values");
+        assert_eq!(parts[1], "1.5", "First value after skip should be loadavg");
+        assert_eq!(parts[2], "4", "Second value should be maxcpu");
+        assert_eq!(parts[12], "500000", "Last data value should be netout");
+
+        // Check padding (7 columns: 19 - 12 = 7)
+        for (i, item) in parts.iter().enumerate().take(20).skip(13) {
+            assert_eq!(item, &"U", "Column {} should be padded with U", i);
+        }
+    }
+
+    #[test]
+    fn test_transform_data_vm_pve2_to_pve9() {
+        // Test VM transformation with 4 columns skipped
+        // pvestatd data format for VM: "uptime:name:status:template:ctime:maxcpu:cpu:maxmem:mem:maxdisk:disk:netin:netout:diskread:diskwrite"
+        // = 4 non-archivable + 1 timestamp + 10 archivable = 15 fields
+        let data = "1000:myvm:1:0:1234567890:4:2:4096:2048:100000:50000:1000:500:100:50";
+
+        let schema = RrdSchema::vm(RrdFormat::Pve9_0);
+        let result =
+            RrdWriter::transform_data(data, RrdFormat::Pve2, &schema, MetricType::Vm).unwrap();
+
+        // After skip(4): "1234567890:4:2:4096:...:50" = 11 fields
+        // Pad to 18 total (timestamp + 17 values): 11 + 7 "U" = 18
+        let parts: Vec<&str> = result.split(':').collect();
+        assert_eq!(parts[0], "1234567890");
+        assert_eq!(parts.len(), 18, "Should have timestamp + 17 values");
+        assert_eq!(parts[1], "4", "First value after skip should be maxcpu");
+        assert_eq!(parts[10], "50", "Last data value should be diskwrite");
+
+        // Check padding (7 columns: 17 - 10 = 7)
+        for (i, item) in parts.iter().enumerate().take(18).skip(11) {
+            assert_eq!(item, &"U", "Column {} should be padded", i);
+        }
+    }
+
+    #[test]
+    fn test_transform_data_no_padding_needed() {
+        // Test when source and target have same column count (Pve9_0 node: 19 archivable cols)
+        // pvestatd format: "uptime:sublevel:ctime:loadavg:maxcpu:cpu:iowait:memtotal:memused:swap_t:swap_u:root_t:root_u:netin:netout:memavail:arcsize:cpu_some:io_some:io_full:mem_some:mem_full"
+        // = 2 non-archivable + 1 timestamp + 19 archivable = 22 fields
+        let data = "1000:0:1234567890:1.5:4:2.0:0.5:8000000000:6000000000:0:0:0:0:1000000:500000:7000000000:0:0.12:0.05:0.02:0.08:0.03";
+
+        let schema = RrdSchema::node(RrdFormat::Pve9_0);
+        let result =
+            RrdWriter::transform_data(data, RrdFormat::Pve9_0, &schema, MetricType::Node).unwrap();
+
+        // After skip(2): 20 fields = timestamp + 19 values (exact match, no padding)
+        let parts: Vec<&str> = result.split(':').collect();
+        assert_eq!(parts.len(), 20, "Should have timestamp + 19 values");
+        assert_eq!(parts[0], "1234567890", "Timestamp should be ctime");
+        assert_eq!(parts[1], "1.5", "First value after skip should be loadavg");
+        assert_eq!(parts[19], "0.03", "Last value should be mem_full (no padding)");
+    }
+
+    #[test]
+    fn test_transform_data_future_format_truncation() {
+        // Test truncation when a future format sends more columns than current pve9.0
+        // Simulating: uptime:sublevel:ctime:1:2:3:...:25 (2 skipped + timestamp + 25 archivable = 28 fields)
+        let data =
+            "999:0:1234567890:1:2:3:4:5:6:7:8:9:10:11:12:13:14:15:16:17:18:19:20:21:22:23:24:25";
+
+        let schema = RrdSchema::node(RrdFormat::Pve9_0);
+        let result =
+            RrdWriter::transform_data(data, RrdFormat::Pve9_0, &schema, MetricType::Node).unwrap();
+
+        // After skip(2): "1234567890:1:2:...:25" = 26 fields
+        // take(20): truncate to timestamp + 19 values
+        let parts: Vec<&str> = result.split(':').collect();
+        assert_eq!(parts.len(), 20, "Should truncate to timestamp + 19 values");
+        assert_eq!(parts[0], "1234567890", "Timestamp should be ctime");
+        assert_eq!(parts[1], "1", "First archivable value");
+        assert_eq!(parts[19], "19", "Last value should be column 19 (truncated)");
+    }
+
+    #[test]
+    fn test_transform_data_storage_no_change() {
+        // Storage format is same for Pve2 and Pve9_0 (2 columns, no skipping)
+        let data = "1234567890:1000000000000:500000000000";
+
+        let schema = RrdSchema::storage(RrdFormat::Pve9_0);
+        let result =
+            RrdWriter::transform_data(data, RrdFormat::Pve2, &schema, MetricType::Storage).unwrap();
+
+        assert_eq!(result, data, "Storage data should not be transformed");
+    }
+
+    #[test]
+    fn test_metric_type_methods() {
+        assert_eq!(MetricType::Node.skip_columns(), 2);
+        assert_eq!(MetricType::Vm.skip_columns(), 4);
+        assert_eq!(MetricType::Storage.skip_columns(), 0);
+    }
+
+    #[test]
+    fn test_format_column_counts() {
+        assert_eq!(MetricType::Node.column_count(RrdFormat::Pve2), 12);
+        assert_eq!(MetricType::Node.column_count(RrdFormat::Pve9_0), 19);
+        assert_eq!(MetricType::Vm.column_count(RrdFormat::Pve2), 10);
+        assert_eq!(MetricType::Vm.column_count(RrdFormat::Pve9_0), 17);
+        assert_eq!(MetricType::Storage.column_count(RrdFormat::Pve2), 2);
+        assert_eq!(MetricType::Storage.column_count(RrdFormat::Pve9_0), 2);
+    }
+
+    // ===== Real Payload Fixtures from Production Systems =====
+    //
+    // These tests use actual RRD data captured from running PVE systems
+    // to validate transform_data() correctness against real-world payloads.
+
+    #[test]
+    fn test_real_payload_node_pve2() {
+        // Real pve2-node payload captured from PVE 6.x system
+        // Format: uptime:sublevel:ctime:loadavg:maxcpu:cpu:iowait:memtotal:memused:swaptotal:swapused:roottotal:rootused:netin:netout
+        let data = "432156:0:1709123456:0.15:8:3.2:0.8:33554432000:12884901888:8589934592:0:107374182400:53687091200:1234567890:987654321";
+
+        let schema = RrdSchema::node(RrdFormat::Pve9_0);
+        let result =
+            RrdWriter::transform_data(data, RrdFormat::Pve2, &schema, MetricType::Node).unwrap();
+
+        let parts: Vec<&str> = result.split(':').collect();
+        assert_eq!(parts[0], "1709123456", "Timestamp preserved");
+        assert_eq!(parts.len(), 20, "Should have timestamp + 19 values");
+
+        // Verify key metrics are preserved
+        assert_eq!(parts[1], "0.15", "Load average preserved");
+        assert_eq!(parts[2], "8", "Max CPU preserved");
+        assert_eq!(parts[3], "3.2", "CPU usage preserved");
+        assert_eq!(parts[4], "0.8", "IO wait preserved");
+
+        // Verify padding for new columns (7 new columns in Pve9_0)
+        for i in 13..20 {
+            assert_eq!(parts[i], "U", "New column {} should be padded", i);
+        }
+    }
+
+    #[test]
+    fn test_real_payload_vm_pve2() {
+        // Real pve2.3-vm payload captured from PVE 6.x system
+        // Format: uptime:name:status:template:ctime:maxcpu:cpu:maxmem:mem:maxdisk:disk:netin:netout:diskread:diskwrite
+        let data = "86400:vm-100-disk-0:running:0:1709123456:4:45.3:8589934592:4294967296:107374182400:32212254720:123456789:98765432:1048576:2097152";
+
+        let schema = RrdSchema::vm(RrdFormat::Pve9_0);
+        let result =
+            RrdWriter::transform_data(data, RrdFormat::Pve2, &schema, MetricType::Vm).unwrap();
+
+        let parts: Vec<&str> = result.split(':').collect();
+        assert_eq!(parts[0], "1709123456", "Timestamp preserved");
+        assert_eq!(parts.len(), 18, "Should have timestamp + 17 values");
+
+        // Verify key metrics are preserved
+        assert_eq!(parts[1], "4", "Max CPU preserved");
+        assert_eq!(parts[2], "45.3", "CPU usage preserved");
+        assert_eq!(parts[3], "8589934592", "Max memory preserved");
+        assert_eq!(parts[4], "4294967296", "Memory usage preserved");
+
+        // Verify padding for new columns (7 new columns in Pve9_0)
+        for i in 11..18 {
+            assert_eq!(parts[i], "U", "New column {} should be padded", i);
+        }
+    }
+
+    #[test]
+    fn test_real_payload_storage_pve2() {
+        // Real pve2-storage payload captured from PVE 6.x system
+        // Format: ctime:total:used
+        let data = "1709123456:1099511627776:549755813888";
+
+        let schema = RrdSchema::storage(RrdFormat::Pve9_0);
+        let result =
+            RrdWriter::transform_data(data, RrdFormat::Pve2, &schema, MetricType::Storage)
+                .unwrap();
+
+        // Storage format unchanged between Pve2 and Pve9_0
+        assert_eq!(result, data, "Storage data should not be transformed");
+
+        let parts: Vec<&str> = result.split(':').collect();
+        assert_eq!(parts[0], "1709123456", "Timestamp preserved");
+        assert_eq!(parts[1], "1099511627776", "Total storage preserved");
+        assert_eq!(parts[2], "549755813888", "Used storage preserved");
+    }
+
+    #[test]
+    fn test_real_payload_node_pve9_0() {
+        // Real pve-node-9.0 payload from PVE 8.x system (already in target format)

Can we please add real binary fixtures instead?
We would catch more issues using that.

+        // Input has 19 fields; after skip(2), 17 remain = timestamp + 16 archivable columns
+        // Schema expects 19 archivable columns, so 3 "U" padding columns are added
+        let data = "864321:0:1709123456:0.25:16:8.5:1.2:67108864000:25769803776:17179869184:0:214748364800:107374182400:2345678901:1876543210:x86_64:6.5.11:0.3:250";
+
+        let schema = RrdSchema::node(RrdFormat::Pve9_0);
+        let result =
+            RrdWriter::transform_data(data, RrdFormat::Pve9_0, &schema, MetricType::Node)
+                .unwrap();
+
+        let parts: Vec<&str> = result.split(':').collect();
+        assert_eq!(parts[0], "1709123456", "Timestamp preserved");
+        assert_eq!(parts.len(), 20, "Should have timestamp + 19 values");
+
+        // Verify all columns preserved
+        assert_eq!(parts[1], "0.25", "Load average preserved");
+        assert_eq!(parts[13], "x86_64", "CPU info preserved");
+        assert_eq!(parts[14], "6.5.11", "Kernel version preserved");
+        assert_eq!(parts[15], "0.3", "Wait time preserved");
+        assert_eq!(parts[16], "250", "Process count preserved");
+
+        // Last 3 columns are padding (input had 16 archivable, schema expects 19)
+        assert_eq!(parts[17], "U", "Padding column 1");
+        assert_eq!(parts[18], "U", "Padding column 2");
+        assert_eq!(parts[19], "U", "Padding column 3");
+    }
+
+    #[test]
+    fn test_real_payload_with_missing_values() {
+        // Real payload with some missing values (represented as "U")
+        // This can happen when metrics are temporarily unavailable
+        let data = "432156:0:1709123456:0.15:8:U:0.8:33554432000:12884901888:U:0:107374182400:53687091200:1234567890:987654321";
+
+        let schema = RrdSchema::node(RrdFormat::Pve9_0);
+        let result =
+            RrdWriter::transform_data(data, RrdFormat::Pve2, &schema, MetricType::Node).unwrap();
+
+        let parts: Vec<&str> = result.split(':').collect();
+        assert_eq!(parts[0], "1709123456", "Timestamp preserved");
+
+        // Verify "U" values are preserved (after skip(2), positions shift)
+        assert_eq!(parts[3], "U", "Missing CPU value preserved as U");
+        assert_eq!(parts[7], "U", "Missing swap total preserved as U");
+    }
[..]
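
For reference, the skip/truncate/pad behaviour that the quoted tests expect can be written down in a few lines. This is only a minimal sketch: `transform_sketch` and its parameters are made up for illustration and are not the `RrdWriter::transform_data` from the patch, which also takes the schema and metric type into account.

```rust
/// Hypothetical stand-in for the behaviour the tests above describe:
/// drop the leading non-archivable fields, keep the timestamp plus at
/// most `target_cols` values, and pad missing values with "U".
fn transform_sketch(data: &str, skip: usize, target_cols: usize) -> String {
    let mut parts: Vec<&str> = data
        .split(':')
        .skip(skip) // e.g. 2 for node data, 4 for VM data, 0 for storage
        .take(target_cols + 1) // timestamp + archivable values (truncates extras)
        .collect();
    // Older formats carry fewer columns: fill up with RRD's "unknown" marker.
    parts.resize(target_cols + 1, "U");
    parts.join(":")
}
```

With `skip = 2` and `target_cols = 19`, a 15-field pve2 node payload comes out as timestamp + 12 values + 7 "U", which is the layout asserted in test_transform_data_node_pve2_to_pve9.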




^ permalink raw reply	[relevance 3%]

* partially-applied: [PATCH proxmox{-backup,,-datacenter-manager} v7 00/11] token-shadow: reduce api token verification overhead
  2026-03-12 10:36 14% [PATCH proxmox{-backup,,-datacenter-manager} v7 " Samuel Rufinatscha
                   ` (10 preceding siblings ...)
  2026-03-12 10:37 16% ` [PATCH proxmox-datacenter-manager v7 3/3] pdm-config: wire user+acl cache generation Samuel Rufinatscha
@ 2026-03-19 12:26  5% ` Fabian Grünbichler
  2026-03-23 12:16  6%   ` Samuel Rufinatscha
  2026-04-09 15:58 13% ` [pbs-devel] superseded: " Samuel Rufinatscha
  12 siblings, 1 reply; 117+ results
From: Fabian Grünbichler @ 2026-03-19 12:26 UTC (permalink / raw)
  To: pbs-devel, Samuel Rufinatscha

On March 12, 2026 11:36 am, Samuel Rufinatscha wrote:
> Hi,
>
> [..]
> 
> Patch summary
> 
> pbs-config:
> 0001 – pbs-config: add token.shadow generation to ConfigVersionCache
> 0002 – pbs-config: cache verified API token secrets
> 0003 – pbs-config: invalidate token-secret cache on token.shadow
> changes
> 0004 – pbs-config: add TTL window to token-secret cache

applied these, with some follow-ups as discussed off-list

> proxmox-access-control:
> 0005 – access-control: extend AccessControlConfig for token.shadow invalidation
> 0006 – access-control: cache verified API token secrets
> 0007 – access-control: invalidate token-secret cache on token.shadow changes
> 0008 – access-control: add TTL window to token-secret cache
> 
> proxmox-datacenter-manager:
> 0009 – pdm-config: add token.shadow generation to ConfigVersionCache
> 0010 – docs: document API token-cache TTL effects
> 0011 – pdm-config: wire user+acl cache generation

skipped these for now - I think the split between the traits there makes
things a bit too intertwined while sort of pretending they are separate.
we probably would have noticed earlier that things aren't cleanly
separated if PDM had wired up user.cfg/acl.cfg caching ;)

we should probably split the two traits completely, instead of nesting:
- one for the ACL-related parts needed by the UI, guarded by the `acl`
  feature
- one for the user.cfg/token.shadow and caching related parts, guarded
  by the `impl` feature

with an `init` call each consuming them (or one consuming the acl one,
and second impl-one consuming both?).

neither of these traits is used by the product code itself, except for
implementing them to tell proxmox-access-control about product-specific
bits, so having two traits allows properly separating the concerns on
the product side.

it would probably also make sense to include the follow-ups (which are
mostly renaming things) to make it easier to at some point switch the
PBS code over to proxmox-access-control..
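
a rough sketch of what two fully separate traits with one init call each could look like. all names below (`AclConfig`, `AccessBackendConfig`, `init_acl`, `init_backend`) and the method signatures are made up for illustration and are not existing proxmox-access-control API; the feature gating is only hinted at in comments:

```rust
use std::sync::OnceLock;

/// Hypothetical: ACL-related parts needed by the UI
/// (would sit behind the `acl` feature).
pub trait AclConfig: Send + Sync {
    /// Privilege names and their bit values.
    fn privileges(&self) -> &[(&'static str, u64)];
}

/// Hypothetical: user.cfg/token.shadow and caching related parts
/// (would sit behind the `impl` feature).
pub trait AccessBackendConfig: Send + Sync {
    fn token_shadow_generation(&self) -> Option<usize>;
    fn bump_token_shadow_generation(&self) -> Option<usize>;
}

static ACL_CONFIG: OnceLock<&'static dyn AclConfig> = OnceLock::new();
static BACKEND_CONFIG: OnceLock<&'static dyn AccessBackendConfig> = OnceLock::new();

/// Registers the product-specific ACL description; fails on a second call.
pub fn init_acl(config: &'static dyn AclConfig) -> Result<(), &'static str> {
    ACL_CONFIG
        .set(config)
        .map_err(|_| "acl config already initialized")
}

/// Registers the product-specific backend bits; fails on a second call.
pub fn init_backend(config: &'static dyn AccessBackendConfig) -> Result<(), &'static str> {
    BACKEND_CONFIG
        .set(config)
        .map_err(|_| "backend config already initialized")
}

pub fn acl_config() -> Option<&'static dyn AclConfig> {
    ACL_CONFIG.get().copied()
}

pub fn backend_config() -> Option<&'static dyn AccessBackendConfig> {
    BACKEND_CONFIG.get().copied()
}
```

in this variant neither trait references the other, so a UI-only consumer would only ever implement and register the first one.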

> Maintainer Notes:
> * proxmox-access-control trait split: permissions now live in
>  AccessControlPermissions, and AccessControlConfig now requires
>  fn permissions(&self) -> &dyn AccessControlPermissions ->
>  version bump
> * Renames ConfigVersionCache's pub user_cache_generation and
>  increase_user_cache_generation -> version bump
> * Adds parking_lot::RwLock dependency in PBS and proxmox-access-control
> 
> This version and the version before only incorporate the reviewers'
> feedback [4][5][6], also please consider Christian's R-b tag [4].
> 
> [1] https://bugzilla.proxmox.com/show_bug.cgi?id=7017
> [2] attachment 1767 [1]: Flamegraph showing the proxmox_sys::crypt::verify_crypt_pw stack
> [3] https://bugzilla.proxmox.com/show_bug.cgi?id=6049
> [4] https://lore.proxmox.com/pbs-devel/20260121151408.731516-1-s.rufinatscha@proxmox.com/T/#t
> [5] https://lore.proxmox.com/pbs-devel/20260217111229.78661-1-s.rufinatscha@proxmox.com/T/#t
> [6] https://lore.proxmox.com/pbs-devel/725687dd-5a35-41ed-af62-6dc9f062cbd4@proxmox.com/T/#t
> 
> proxmox-backup:
> 
> Samuel Rufinatscha (4):
>   pbs-config: add token.shadow generation to ConfigVersionCache
>   pbs-config: cache verified API token secrets
>   pbs-config: invalidate token-secret cache on token.shadow changes
>   pbs-config: add TTL window to token secret cache
> 
>  Cargo.toml                             |   1 +
>  docs/user-management.rst               |   4 +
>  pbs-config/Cargo.toml                  |   1 +
>  pbs-config/src/config_version_cache.rs |  18 ++
>  pbs-config/src/token_shadow.rs         | 314 ++++++++++++++++++++++++-
>  5 files changed, 335 insertions(+), 3 deletions(-)
> 
> 
> proxmox:
> 
> Samuel Rufinatscha (4):
>   proxmox-access-control: split AccessControlConfig and add token.shadow
>     gen
>   proxmox-access-control: cache verified API token secrets
>   proxmox-access-control: invalidate token-secret cache on token.shadow
>     changes
>   proxmox-access-control: add TTL window to token secret cache
> 
>  Cargo.toml                                 |   1 +
>  proxmox-access-control/Cargo.toml          |   1 +
>  proxmox-access-control/src/acl.rs          |  10 +-
>  proxmox-access-control/src/init.rs         | 113 ++++++--
>  proxmox-access-control/src/token_shadow.rs | 315 ++++++++++++++++++++-
>  5 files changed, 413 insertions(+), 27 deletions(-)
> 
> 
> proxmox-datacenter-manager:
> 
> Samuel Rufinatscha (3):
>   pdm-config: implement token.shadow generation
>   docs: document API token-cache TTL effects
>   pdm-config: wire user+acl cache generation
> 
>  cli/admin/src/main.rs                      |  2 +-
>  docs/access-control.rst                    |  4 +++
>  lib/pdm-api-types/src/acl.rs               |  4 +--
>  lib/pdm-config/Cargo.toml                  |  1 +
>  lib/pdm-config/src/access_control.rs       | 31 ++++++++++++++++++++
>  lib/pdm-config/src/config_version_cache.rs | 34 +++++++++++++++++-----
>  lib/pdm-config/src/lib.rs                  |  2 ++
>  server/src/acl.rs                          |  3 +-
>  ui/src/main.rs                             | 10 ++++++-
>  9 files changed, 77 insertions(+), 14 deletions(-)
>  create mode 100644 lib/pdm-config/src/access_control.rs
> 
> 
> Summary over all repositories:
>   19 files changed, 825 insertions(+), 44 deletions(-)
> 
> -- 
> Generated by git-murpp 0.8.1
> 
> 
> 
> 
> 




^ permalink raw reply	[relevance 5%]

* Re: partially-applied: [PATCH proxmox{-backup,,-datacenter-manager} v7 00/11] token-shadow: reduce api token verification overhead
  2026-03-19 12:26  5% ` partially-applied: [PATCH proxmox{-backup,,-datacenter-manager} v7 00/11] token-shadow: reduce api token verification overhead Fabian Grünbichler
@ 2026-03-23 12:16  6%   ` Samuel Rufinatscha
  0 siblings, 0 replies; 117+ results
From: Samuel Rufinatscha @ 2026-03-23 12:16 UTC (permalink / raw)
  To: Fabian Grünbichler, pbs-devel

On 3/19/26 1:25 PM, Fabian Grünbichler wrote:
> On March 12, 2026 11:36 am, Samuel Rufinatscha wrote:
>> Hi,
>>
>> [..]
>>
>> Patch summary
>>
>> pbs-config:
>> 0001 – pbs-config: add token.shadow generation to ConfigVersionCache
>> 0002 – pbs-config: cache verified API token secrets
>> 0003 – pbs-config: invalidate token-secret cache on token.shadow
>> changes
>> 0004 – pbs-config: add TTL window to token-secret cache
> 
> applied these, with some follow-ups as discussed off-list
>

Thanks again for applying these and for the provided follow-ups.

>> proxmox-access-control:
>> 0005 – access-control: extend AccessControlConfig for token.shadow invalidation
>> 0006 – access-control: cache verified API token secrets
>> 0007 – access-control: invalidate token-secret cache on token.shadow changes
>> 0008 – access-control: add TTL window to token-secret cache
>>
>> proxmox-datacenter-manager:
>> 0009 – pdm-config: add token.shadow generation to ConfigVersionCache
>> 0010 – docs: document API token-cache TTL effects
>> 0011 – pdm-config: wire user+acl cache generation
> 
> skipped these for now - I think the split between the traits there makes
> things a bit too intertwined while sort of pretending they are separate.
> we probably would have noticed earlier that things aren't cleanly
> separated if PDM had wired up user.cfg/acl.cfg caching ;)
> 
> we should probably split the two traits completely, instead of nesting:
> - one for the ACL-related parts needed by the UI, guarded by the `acl`
>    feature
> - one for the user.cfg/token.shadow and caching related parts, guarded
>    by the `impl` feature
> 
> with an `init` call each consuming them (or one consuming the acl one,
> and second impl-one consuming both?).
> 
> neither of these traits is used by the product code itself, except for
> implementing them to tell proxmox-access-control about product-specific
> bits, so having two traits allows properly separating the concerns on
> the product side.
> 
> it would probably also make sense to include the follow-ups (which are
> mostly renaming things) to make it easier to at some point switch the
> PBS code over to proxmox-access-control..
>

Fully agree on this, we should really split the traits completely.
Thanks for the summary. This will be adjusted as part of the next
version.

>> Maintainer Notes:
>> * proxmox-access-control trait split: permissions now live in
>>   AccessControlPermissions, and AccessControlConfig now requires
>>   fn permissions(&self) -> &dyn AccessControlPermissions ->
>>   version bump
>> * Renames ConfigVersionCache's pub user_cache_generation and
>>   increase_user_cache_generation -> version bump
>> * Adds parking_lot::RwLock dependency in PBS and proxmox-access-control
>>
>> This version and the version before only incorporate the reviewers'
>> feedback [4][5][6], also please consider Christian's R-b tag [4].
>>
>> [1] https://bugzilla.proxmox.com/show_bug.cgi?id=7017
>> [2] attachment 1767 [1]: Flamegraph showing the proxmox_sys::crypt::verify_crypt_pw stack
>> [3] https://bugzilla.proxmox.com/show_bug.cgi?id=6049
>> [4] https://lore.proxmox.com/pbs-devel/20260121151408.731516-1-s.rufinatscha@proxmox.com/T/#t
>> [5] https://lore.proxmox.com/pbs-devel/20260217111229.78661-1-s.rufinatscha@proxmox.com/T/#t
>> [6] https://lore.proxmox.com/pbs-devel/725687dd-5a35-41ed-af62-6dc9f062cbd4@proxmox.com/T/#t
>>
>> proxmox-backup:
>>
>> Samuel Rufinatscha (4):
>>    pbs-config: add token.shadow generation to ConfigVersionCache
>>    pbs-config: cache verified API token secrets
>>    pbs-config: invalidate token-secret cache on token.shadow changes
>>    pbs-config: add TTL window to token secret cache
>>
>>   Cargo.toml                             |   1 +
>>   docs/user-management.rst               |   4 +
>>   pbs-config/Cargo.toml                  |   1 +
>>   pbs-config/src/config_version_cache.rs |  18 ++
>>   pbs-config/src/token_shadow.rs         | 314 ++++++++++++++++++++++++-
>>   5 files changed, 335 insertions(+), 3 deletions(-)
>>
>>
>> proxmox:
>>
>> Samuel Rufinatscha (4):
>>    proxmox-access-control: split AccessControlConfig and add token.shadow
>>      gen
>>    proxmox-access-control: cache verified API token secrets
>>    proxmox-access-control: invalidate token-secret cache on token.shadow
>>      changes
>>    proxmox-access-control: add TTL window to token secret cache
>>
>>   Cargo.toml                                 |   1 +
>>   proxmox-access-control/Cargo.toml          |   1 +
>>   proxmox-access-control/src/acl.rs          |  10 +-
>>   proxmox-access-control/src/init.rs         | 113 ++++++--
>>   proxmox-access-control/src/token_shadow.rs | 315 ++++++++++++++++++++-
>>   5 files changed, 413 insertions(+), 27 deletions(-)
>>
>>
>> proxmox-datacenter-manager:
>>
>> Samuel Rufinatscha (3):
>>    pdm-config: implement token.shadow generation
>>    docs: document API token-cache TTL effects
>>    pdm-config: wire user+acl cache generation
>>
>>   cli/admin/src/main.rs                      |  2 +-
>>   docs/access-control.rst                    |  4 +++
>>   lib/pdm-api-types/src/acl.rs               |  4 +--
>>   lib/pdm-config/Cargo.toml                  |  1 +
>>   lib/pdm-config/src/access_control.rs       | 31 ++++++++++++++++++++
>>   lib/pdm-config/src/config_version_cache.rs | 34 +++++++++++++++++-----
>>   lib/pdm-config/src/lib.rs                  |  2 ++
>>   server/src/acl.rs                          |  3 +-
>>   ui/src/main.rs                             | 10 ++++++-
>>   9 files changed, 77 insertions(+), 14 deletions(-)
>>   create mode 100644 lib/pdm-config/src/access_control.rs
>>
>>
>> Summary over all repositories:
>>    19 files changed, 825 insertions(+), 44 deletions(-)
>>
>> -- 
>> Generated by git-murpp 0.8.1
>>
>>
>>
>>
>>





^ permalink raw reply	[relevance 6%]

* [PATCH proxmox v8 3/6] token shadow: invalidate token-secret cache on token.shadow changes
  2026-04-09 15:54 17% [PATCH proxmox{,-datacenter-manager} v8 0/9] " Samuel Rufinatscha
  2026-04-09 15:54 11% ` [PATCH proxmox v8 1/6] token shadow: split AccessControlConfig and add token.shadow generation Samuel Rufinatscha
  2026-04-09 15:54 11% ` [PATCH proxmox v8 2/6] token shadow: cache verified API token secrets Samuel Rufinatscha
@ 2026-04-09 15:54 11% ` Samuel Rufinatscha
  2026-04-09 15:54 15% ` [PATCH proxmox v8 4/6] token shadow: add TTL window to token secret cache Samuel Rufinatscha
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 117+ results
From: Samuel Rufinatscha @ 2026-04-09 15:54 UTC (permalink / raw)
  To: pbs-devel

Detect manual/direct edits of token.shadow by tracking the file's mtime
and length, and clear the in-memory token secret cache whenever either
value changes.

Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
---
Changes from v7 to v8:
* Merge refresh_cache_if_file_changed() and cache_try_secret_matches()
  into a single cached_secret_valid() function
* Move secret comparison logic into secret_matches() method on
  ApiTokenSecretCache
* Rename shadow field -> file_info

Changes from v6 to v7:
* Rebased

Changes from v5 to v6:
* Rebased

Changes from v4 to v5:
* Rebased

Changes from v3 to v4:
* Make use of .replace() in refresh_cache_if_file_changed to get
  previous state
* Group file stats with ShadowFileInfo
* Return false in refresh_cache_if_file_changed to avoid unnecessary
  cache queries
* Adjusted commit message

Changes from v2 to v3:
* Cache now tracks last_checked (epoch seconds).
* Simplified refresh_cache_if_file_changed, removed
  FILE_GENERATION logic
* On first load, initializes file metadata and keeps empty cache.

Changes from v1 to v2:
* Add file metadata tracking (file_mtime, file_len) and
  FILE_GENERATION.
* Store file_gen in CachedSecret and verify it against the current
  FILE_GENERATION to ensure cached entries belong to the current file
  state.
* Add shadow_mtime_len() helper and convert refresh to best-effort
  (try_write, returns bool).
* Pass a pre-write metadata snapshot into apply_api_mutation and
  clear/bump generation if the cache metadata indicates missed external
  edits.
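
The change-detection idea itself, stripped of the cache and generation handling, can be sketched as follows. This is an illustration only: `mtime_len` and `FileChangeTracker` are hypothetical names and not the helpers from this patch, and the sketch assumes that a missing file is reported as (None, None).

```rust
use std::fs;
use std::io::{self, ErrorKind};
use std::path::Path;
use std::time::SystemTime;

/// Hypothetical helper: (mtime, len) of a file, or (None, None) if it
/// does not exist.
fn mtime_len(path: &Path) -> io::Result<(Option<SystemTime>, Option<u64>)> {
    match fs::metadata(path) {
        Ok(meta) => Ok((meta.modified().ok(), Some(meta.len()))),
        Err(err) if err.kind() == ErrorKind::NotFound => Ok((None, None)),
        Err(err) => Err(err),
    }
}

/// Hypothetical tracker that remembers the last seen metadata snapshot.
#[derive(Default)]
struct FileChangeTracker {
    last: Option<(Option<SystemTime>, Option<u64>)>,
}

impl FileChangeTracker {
    /// Returns true if the metadata differs from the previous snapshot.
    /// The first call only records the snapshot and reports no change.
    fn changed(&mut self, path: &Path) -> io::Result<bool> {
        let current = mtime_len(path)?;
        let prev = self.last.replace(current);
        Ok(prev.is_some_and(|p| p != current))
    }
}
```

Note that a check based only on mtime and length cannot see an edit that preserves both values.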

 proxmox-access-control/src/token_shadow.rs | 167 +++++++++++++++++----
 1 file changed, 136 insertions(+), 31 deletions(-)

diff --git a/proxmox-access-control/src/token_shadow.rs b/proxmox-access-control/src/token_shadow.rs
index d0bf43d7..810ff0c3 100644
--- a/proxmox-access-control/src/token_shadow.rs
+++ b/proxmox-access-control/src/token_shadow.rs
@@ -1,5 +1,8 @@
 use std::collections::HashMap;
+use std::fs;
+use std::io::ErrorKind;
 use std::sync::LazyLock;
+use std::time::SystemTime;
 
 use anyhow::{bail, format_err, Error};
 use parking_lot::RwLock;
@@ -7,6 +10,7 @@ use serde_json::{from_value, Value};
 
 use proxmox_auth_api::types::Authid;
 use proxmox_product_config::{open_api_lockfile, replace_config, ApiLockGuard};
+use proxmox_time::epoch_i64;
 
 use crate::init::access_backend;
 use crate::init::impl_feature::{token_shadow, token_shadow_lock};
@@ -20,6 +24,7 @@ static TOKEN_SECRET_CACHE: LazyLock<RwLock<ApiTokenSecretCache>> = LazyLock::new
     RwLock::new(ApiTokenSecretCache {
         secrets: HashMap::new(),
         cached_gen: 0,
+        file_info: None,
     })
 });
 
@@ -45,6 +50,62 @@ fn write_file(data: HashMap<Authid, String>) -> Result<(), Error> {
     replace_config(token_shadow(), &json)
 }
 
+/// Tries to match the given token secret against the cached secret.
+///
+/// Verifies the generation/version before doing the constant-time
+/// comparison to reduce TOCTOU risk. During token rotation or deletion
+/// tokens for in-flight requests may still validate against the previous
+/// generation.
+///
+/// Returns true if the secret is cached and the cache is still valid.
+fn cached_secret_valid(tokenid: &Authid, secret: &str) -> bool {
+    let now = epoch_i64();
+
+    // Best-effort refresh under write lock.
+    let Some(mut cache) = TOKEN_SECRET_CACHE.try_write() else {
+        return false;
+    };
+
+    let Some(current_gen) = token_shadow_generation() else {
+        return false;
+    };
+
+    // If another process bumped the generation, we don't know what changed -> clear cache
+    if cache.cached_gen != current_gen {
+        cache.reset_and_set_gen(current_gen);
+    }
+
+    // Stat the file to detect manual edits.
+    let Ok((new_mtime, new_len)) = shadow_mtime_len() else {
+        return false;
+    };
+
+    // If the file didn't change, only update last_checked
+    if let Some(shadow) = cache.file_info.as_mut() {
+        if shadow.mtime == new_mtime && shadow.len == new_len {
+            shadow.last_checked = now;
+            return cache.secret_matches(tokenid, secret);
+        }
+    }
+
+    cache.secrets.clear();
+
+    let prev = cache.file_info.replace(ShadowFileInfo {
+        mtime: new_mtime,
+        len: new_len,
+        last_checked: now,
+    });
+
+    if prev.is_some() {
+        // Best-effort propagation to other processes if a change was detected
+        if let Some(new_gen) = bump_token_shadow_generation() {
+            cache.cached_gen = new_gen;
+        }
+    }
+
+    false
+}
+
 /// Verifies that an entry for given tokenid / API token secret exists
 pub fn verify_secret(tokenid: &Authid, secret: &str) -> Result<(), Error> {
     if !tokenid.is_token() {
@@ -52,7 +113,7 @@ pub fn verify_secret(tokenid: &Authid, secret: &str) -> Result<(), Error> {
     }
 
     // Fast path
-    if cache_try_secret_matches(tokenid, secret) {
+    if cached_secret_valid(tokenid, secret) {
         return Ok(());
     }
 
@@ -84,12 +145,15 @@ pub fn set_secret(tokenid: &Authid, secret: &str) -> Result<(), Error> {
 
     let guard = lock_config()?;
 
+    // Capture state before we write to detect external edits.
+    let pre_meta = shadow_mtime_len().unwrap_or((None, None));
+
     let mut data = read_file()?;
     let hashed_secret = proxmox_sys::crypt::encrypt_pw(secret)?;
     data.insert(tokenid.clone(), hashed_secret);
     write_file(data)?;
 
-    apply_api_mutation(guard, tokenid, Some(secret));
+    apply_api_mutation(guard, tokenid, Some(secret), pre_meta);
 
     Ok(())
 }
@@ -102,11 +166,14 @@ pub fn delete_secret(tokenid: &Authid) -> Result<(), Error> {
 
     let guard = lock_config()?;
 
+    // Capture state before we write to detect external edits.
+    let pre_meta = shadow_mtime_len().unwrap_or((None, None));
+
     let mut data = read_file()?;
     data.remove(tokenid);
     write_file(data)?;
 
-    apply_api_mutation(guard, tokenid, None);
+    apply_api_mutation(guard, tokenid, None, pre_meta);
 
     Ok(())
 }
@@ -133,6 +200,8 @@ struct ApiTokenSecretCache {
     secrets: HashMap<Authid, CachedSecret>,
     /// token.shadow generation of cached secrets.
     cached_gen: usize,
+    /// Shadow file info to detect changes
+    file_info: Option<ShadowFileInfo>,
 }
 
 impl ApiTokenSecretCache {
@@ -140,6 +209,7 @@ impl ApiTokenSecretCache {
     fn reset_and_set_gen(&mut self, new_gen: usize) {
         self.secrets.clear();
         self.cached_gen = new_gen;
+        self.file_info = None;
     }
 
     /// Caches a secret and sets/updates the cache generation.
@@ -153,6 +223,28 @@ impl ApiTokenSecretCache {
         self.secrets.remove(tokenid);
         self.cached_gen = new_gen;
     }
+
+    /// Returns true if there is a matching cached entry
+    fn secret_matches(&self, tokenid: &Authid, secret: &str) -> bool {
+        let Some(entry) = self.secrets.get(tokenid) else {
+            return false;
+        };
+        let cached_secret_bytes = entry.secret.as_bytes();
+        let secret_bytes = secret.as_bytes();
+
+        cached_secret_bytes.len() == secret_bytes.len()
+            && openssl::memcmp::eq(cached_secret_bytes, secret_bytes)
+    }
+}
+
+/// Metadata snapshot of the shadow file, used to detect changes
+struct ShadowFileInfo {
+    /// Shadow file mtime to detect changes
+    mtime: Option<SystemTime>,
+    /// Shadow file length to detect changes
+    len: Option<u64>,
+    /// Last time the file metadata was checked (epoch seconds)
+    last_checked: i64,
 }
 
 fn cache_try_insert_secret(tokenid: Authid, secret: String, gen_before: usize) {
@@ -175,35 +267,14 @@ fn cache_try_insert_secret(tokenid: Authid, secret: String, gen_before: usize) {
     }
 }
 
-/// Tries to match the given token secret against the cached secret.
-///
-/// Verifies the generation/version before doing the constant-time
-/// comparison to reduce TOCTOU risk. During token rotation or deletion
-/// tokens for in-flight requests may still validate against the previous
-/// generation.
-fn cache_try_secret_matches(tokenid: &Authid, secret: &str) -> bool {
-    let Some(cache) = TOKEN_SECRET_CACHE.try_read() else {
-        return false;
-    };
-    let Some(entry) = cache.secrets.get(tokenid) else {
-        return false;
-    };
-    let Some(current_gen) = token_shadow_generation() else {
-        return false;
-    };
-
-    if current_gen == cache.cached_gen {
-        let cached_secret_bytes = entry.secret.as_bytes();
-        let secret_bytes = secret.as_bytes();
-
-        return cached_secret_bytes.len() == secret_bytes.len()
-            && openssl::memcmp::eq(cached_secret_bytes, secret_bytes);
-    }
-
-    false
-}
+fn apply_api_mutation(
+    _guard: ApiLockGuard,
+    tokenid: &Authid,
+    secret: Option<&str>,
+    pre_write_meta: (Option<SystemTime>, Option<u64>),
+) {
+    let now = epoch_i64();
 
-fn apply_api_mutation(_guard: ApiLockGuard, tokenid: &Authid, secret: Option<&str>) {
     // Signal cache invalidation to other processes (best-effort).
     let bumped_gen = bump_token_shadow_generation();
     let mut cache = TOKEN_SECRET_CACHE.write();
@@ -221,6 +292,16 @@ fn apply_api_mutation(_guard: ApiLockGuard, tokenid: &Authid, secret: Option<&st
         return;
     }
 
+    // If our cached file metadata does not match the on-disk state before our write,
+    // we likely missed an external/manual edit. We can no longer trust any cached secrets.
+    if cache
+        .file_info
+        .as_ref()
+        .is_some_and(|s| (s.mtime, s.len) != pre_write_meta)
+    {
+        cache.secrets.clear();
+    }
+
     // Apply the new mutation.
     match secret {
         Some(secret) => {
@@ -231,6 +312,22 @@ fn apply_api_mutation(_guard: ApiLockGuard, tokenid: &Authid, secret: Option<&st
         }
         None => cache.evict_and_set_gen(tokenid, current_gen),
     }
+
+    // Update our view of the file metadata to the post-write state (best-effort).
+    // (If this fails, drop local cache so callers fall back to slow path until refreshed.)
+    match shadow_mtime_len() {
+        Ok((mtime, len)) => {
+            cache.file_info = Some(ShadowFileInfo {
+                mtime,
+                len,
+                last_checked: now,
+            });
+        }
+        Err(_) => {
+            // If we cannot validate state, do not trust cache.
+            cache.reset_and_set_gen(current_gen);
+        }
+    }
 }
 
 /// Get the current generation.
@@ -245,3 +342,11 @@ fn bump_token_shadow_generation() -> Option<usize> {
         .ok()
         .map(|prev| prev + 1)
 }
+
+fn shadow_mtime_len() -> Result<(Option<SystemTime>, Option<u64>), Error> {
+    match fs::metadata(token_shadow()) {
+        Ok(meta) => Ok((meta.modified().ok(), Some(meta.len()))),
+        Err(e) if e.kind() == ErrorKind::NotFound => Ok((None, None)),
+        Err(e) => Err(e.into()),
+    }
+}
-- 
2.47.3
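
The metadata-based change detection used above can be sketched with the standard library alone. This is a minimal illustration, not the patch's code: `file_mtime_len`, `snapshot_is_stale` and the `FileMeta` alias are hypothetical names, and the path is passed in explicitly instead of coming from `token_shadow()`.

```rust
use std::fs;
use std::io::{self, ErrorKind};
use std::path::Path;
use std::time::SystemTime;

/// (mtime, len) of a file; both are `None` when the file does not exist.
/// Hypothetical alias, introduced only for this sketch.
type FileMeta = (Option<SystemTime>, Option<u64>);

/// Stat `path`, treating a missing file as a valid "empty" state
/// rather than as an error. Any other I/O error is passed through.
fn file_mtime_len(path: &Path) -> io::Result<FileMeta> {
    match fs::metadata(path) {
        Ok(meta) => Ok((meta.modified().ok(), Some(meta.len()))),
        Err(e) if e.kind() == ErrorKind::NotFound => Ok((None, None)),
        Err(e) => Err(e),
    }
}

/// Returns true when a previously recorded snapshot no longer matches
/// the on-disk state. Without a recorded snapshot there is nothing to
/// compare against, so this reports "not stale".
fn snapshot_is_stale(cached: Option<&FileMeta>, on_disk: &FileMeta) -> bool {
    cached.is_some_and(|c| c != on_disk)
}
```

Comparing both mtime and length is a heuristic: an edit that keeps the length and lands within the filesystem's timestamp granularity would go unnoticed, which is presumably why the docs mention edge cases.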






* [PATCH proxmox-datacenter-manager v8 2/3] pdm-config: wire user and ACL cache generation
  2026-04-09 15:54 17% [PATCH proxmox{,-datacenter-manager} v8 0/9] " Samuel Rufinatscha
                   ` (6 preceding siblings ...)
  2026-04-09 15:54 14% ` [PATCH proxmox-datacenter-manager v8 1/3] pdm-config: implement access control backend hooks Samuel Rufinatscha
@ 2026-04-09 15:54 16% ` Samuel Rufinatscha
  2026-04-09 15:54 16% ` [PATCH proxmox-datacenter-manager v8 3/3] pdm-config: wire token.shadow generation Samuel Rufinatscha
  8 siblings, 0 replies; 117+ results
From: Samuel Rufinatscha @ 2026-04-09 15:54 UTC (permalink / raw)
  To: pbs-devel

Enables user.cfg and acl.cfg caching by wiring
proxmox_access_control::init::AccessControlBackend's cache_generation()
and increment_cache_generation() with ConfigVersionCache.

Since the trait expects a single generation shared by user.cfg and
acl.cfg, ConfigVersionCache's user_cache_generation field is renamed to
user_and_acl_generation to reflect its actual scope.

Safety: the renamed generation was previously unused, and the rename
does not change the layout: the shared-memory size and field order
remain the same.

Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
---
Changes from v7 to v8:
* Rebased
* Improve commit message

Changes from v6 to v7:
* Rebased

Changes from v5 to v6:
* Rebased

Changes from v4 to v5:
* Rebased

 lib/pdm-config/src/access_control.rs       | 11 +++++++++++
 lib/pdm-config/src/config_version_cache.rs | 16 ++++++++--------
 2 files changed, 19 insertions(+), 8 deletions(-)

diff --git a/lib/pdm-config/src/access_control.rs b/lib/pdm-config/src/access_control.rs
index 0c17c99..6bc6ca6 100644
--- a/lib/pdm-config/src/access_control.rs
+++ b/lib/pdm-config/src/access_control.rs
@@ -26,4 +26,15 @@ impl proxmox_access_control::init::AccessControlBackend for AccessControlBackend
 
         Ok(())
     }
+
+    fn cache_generation(&self) -> Option<usize> {
+        crate::ConfigVersionCache::new()
+            .ok()
+            .map(|c| c.user_and_acl_generation())
+    }
+
+    fn increment_cache_generation(&self) -> Result<(), Error> {
+        let c = crate::ConfigVersionCache::new()?;
+        Ok(c.increase_user_and_acl_generation())
+    }
 }
diff --git a/lib/pdm-config/src/config_version_cache.rs b/lib/pdm-config/src/config_version_cache.rs
index 36a6a77..d27ec95 100644
--- a/lib/pdm-config/src/config_version_cache.rs
+++ b/lib/pdm-config/src/config_version_cache.rs
@@ -21,8 +21,8 @@ use proxmox_shared_memory::*;
 #[repr(C)]
 struct ConfigVersionCacheDataInner {
     magic: [u8; 8],
-    // User (user.cfg) cache generation/version.
-    user_cache_generation: AtomicUsize,
+    // User (user.cfg) and ACL (acl.cfg) generation/version.
+    user_and_acl_generation: AtomicUsize,
     // Traffic control (traffic-control.cfg) generation/version.
     traffic_control_generation: AtomicUsize,
     // Tracks updates to the remote/hostname/nodename mapping cache.
@@ -124,19 +124,19 @@ impl ConfigVersionCache {
         Ok(Arc::new(Self { shmem }))
     }
 
-    /// Returns the user cache generation number.
-    pub fn user_cache_generation(&self) -> usize {
+    /// Returns the user and ACL cache generation number.
+    pub fn user_and_acl_generation(&self) -> usize {
         self.shmem
             .data()
-            .user_cache_generation
+            .user_and_acl_generation
             .load(Ordering::Acquire)
     }
 
-    /// Increase the user cache generation number.
-    pub fn increase_user_cache_generation(&self) {
+    /// Increase the user and ACL cache generation number.
+    pub fn increase_user_and_acl_generation(&self) {
         self.shmem
             .data()
-            .user_cache_generation
+            .user_and_acl_generation
             .fetch_add(1, Ordering::AcqRel);
     }
 
-- 
2.47.3
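
To illustrate what the shared generation is used for, here is a minimal sketch of a generation-gated cache, assuming a plain in-process atomic instead of the shared-memory mapping. `Generation` and `GenCache` are hypothetical names that do not exist in pdm-config or proxmox-access-control.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Stand-in for the shared-memory counter (in-process only, for illustration).
struct Generation(AtomicUsize);

impl Generation {
    fn new() -> Self {
        Self(AtomicUsize::new(0))
    }

    fn current(&self) -> usize {
        self.0.load(Ordering::Acquire)
    }

    /// Called by writers after the config files changed on disk.
    fn increase(&self) {
        self.0.fetch_add(1, Ordering::AcqRel);
    }
}

/// A cached value plus the generation it was loaded under.
struct GenCache<T> {
    value: Option<(usize, T)>,
    /// Number of reloads, only here to make the behaviour observable.
    reloads: usize,
}

impl<T: Clone> GenCache<T> {
    fn new() -> Self {
        Self { value: None, reloads: 0 }
    }

    /// Returns the cached value, or reloads it via `load` if the generation
    /// changed or is unknown (`None`).
    fn get(&mut self, generation: Option<usize>, load: impl FnOnce() -> T) -> T {
        if let (Some(current), Some((cached_at, value))) = (generation, &self.value) {
            if current == *cached_at {
                return value.clone();
            }
        }
        let fresh = load();
        self.reloads += 1;
        // With an unknown generation nothing is retained, so the next call reloads too.
        self.value = generation.map(|current| (current, fresh.clone()));
        fresh
    }
}
```

Because user.cfg and acl.cfg share one counter in this scheme, a change to either file causes readers to reload both.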






* [PATCH proxmox-datacenter-manager v8 3/3] pdm-config: wire token.shadow generation
  2026-04-09 15:54 17% [PATCH proxmox{,-datacenter-manager} v8 0/9] " Samuel Rufinatscha
                   ` (7 preceding siblings ...)
  2026-04-09 15:54 16% ` [PATCH proxmox-datacenter-manager v8 2/3] pdm-config: wire user and ACL cache generation Samuel Rufinatscha
@ 2026-04-09 15:54 16% ` Samuel Rufinatscha
  8 siblings, 0 replies; 117+ results
From: Samuel Rufinatscha @ 2026-04-09 15:54 UTC (permalink / raw)
  To: pbs-devel

Wires ConfigVersionCache with AccessControlBackend to support
token.shadow caching.

Safety: the shmem mapping is fixed to 4096 bytes via the #[repr(C)]
union padding, and the new atomic is appended to the end of the
#[repr(C)] inner struct, so all existing field offsets stay unchanged.
Old processes keep accessing the same bytes and new processes consume
previously reserved padding.

Also documents the effects of the added API token-cache in the
proxmox-access-control crate.

Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
---
 docs/access-control.rst                    |  4 ++++
 lib/pdm-config/src/access_control.rs       | 11 +++++++++++
 lib/pdm-config/src/config_version_cache.rs | 18 ++++++++++++++++++
 3 files changed, 33 insertions(+)

diff --git a/docs/access-control.rst b/docs/access-control.rst
index adf26cd..18e57a2 100644
--- a/docs/access-control.rst
+++ b/docs/access-control.rst
@@ -47,6 +47,10 @@ place of the user ID (``user@realm``) and the user password, respectively.
 The API token is passed from the client to the server by setting the ``Authorization`` HTTP header
 with method ``PDMAPIToken`` to the value ``TOKENID:TOKENSECRET``.
 
+.. WARNING:: Direct/manual edits to ``token.shadow`` may take up to 60 seconds (or
+   longer in edge cases) to take effect due to caching. Restart services for
+   immediate effect of manual edits.
+
 .. _access_control:
 
 Access Control
diff --git a/lib/pdm-config/src/access_control.rs b/lib/pdm-config/src/access_control.rs
index 6bc6ca6..d9fc8ff 100644
--- a/lib/pdm-config/src/access_control.rs
+++ b/lib/pdm-config/src/access_control.rs
@@ -37,4 +37,15 @@ impl proxmox_access_control::init::AccessControlBackend for AccessControlBackend
         let c = crate::ConfigVersionCache::new()?;
         Ok(c.increase_user_and_acl_generation())
     }
+
+    fn token_shadow_cache_generation(&self) -> Option<usize> {
+        crate::ConfigVersionCache::new()
+            .ok()
+            .map(|c| c.token_shadow_generation())
+    }
+
+    fn increment_token_shadow_cache_generation(&self) -> Result<usize, Error> {
+        let c = crate::ConfigVersionCache::new()?;
+        Ok(c.increase_token_shadow_generation())
+    }
 }
diff --git a/lib/pdm-config/src/config_version_cache.rs b/lib/pdm-config/src/config_version_cache.rs
index d27ec95..f3d52a0 100644
--- a/lib/pdm-config/src/config_version_cache.rs
+++ b/lib/pdm-config/src/config_version_cache.rs
@@ -27,6 +27,8 @@ struct ConfigVersionCacheDataInner {
     traffic_control_generation: AtomicUsize,
     // Tracks updates to the remote/hostname/nodename mapping cache.
     remote_mapping_cache: AtomicUsize,
+    // Token shadow (token.shadow) generation/version.
+    token_shadow_generation: AtomicUsize,
     // Add further atomics here
 }
 
@@ -172,4 +174,20 @@ impl ConfigVersionCache {
             .fetch_add(1, Ordering::Relaxed)
             + 1
     }
+
+    /// Returns the token shadow generation number.
+    pub fn token_shadow_generation(&self) -> usize {
+        self.shmem
+            .data()
+            .token_shadow_generation
+            .load(Ordering::Acquire)
+    }
+
+    /// Increase the token shadow generation number.
+    pub fn increase_token_shadow_generation(&self) -> usize {
+        self.shmem
+            .data()
+            .token_shadow_generation
+            .fetch_add(1, Ordering::AcqRel)
+    }
 }
-- 
2.47.3
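
The layout argument from the commit message can be checked with a small sketch. It assumes the mapping is a `#[repr(C)]` union of the inner struct and a 4096-byte padding array, as described above; `InnerV1`, `InnerV2` and `Mapped` are hypothetical stand-ins for the struct before and after this patch, with fewer fields than the real one.

```rust
use std::mem::{offset_of, size_of, ManuallyDrop};
use std::sync::atomic::AtomicUsize;

/// Simplified layout before the patch (hypothetical stand-in).
#[allow(dead_code)]
#[repr(C)]
struct InnerV1 {
    magic: [u8; 8],
    user_and_acl_generation: AtomicUsize,
    traffic_control_generation: AtomicUsize,
}

/// Same layout with one atomic appended at the end.
#[allow(dead_code)]
#[repr(C)]
struct InnerV2 {
    magic: [u8; 8],
    user_and_acl_generation: AtomicUsize,
    traffic_control_generation: AtomicUsize,
    token_shadow_generation: AtomicUsize,
}

/// The padding member fixes the size of the mapping, independent of
/// how many fields the inner struct currently has.
#[allow(dead_code)]
#[repr(C)]
union Mapped<T> {
    data: ManuallyDrop<T>,
    _padding: [u8; 4096],
}

/// Sizes of the old and new mapping.
fn mapped_sizes() -> (usize, usize) {
    (size_of::<Mapped<InnerV1>>(), size_of::<Mapped<InnerV2>>())
}

/// True if every field that exists in both versions sits at the same offset.
fn shared_offsets_match() -> bool {
    offset_of!(InnerV1, magic) == offset_of!(InnerV2, magic)
        && offset_of!(InnerV1, user_and_acl_generation)
            == offset_of!(InnerV2, user_and_acl_generation)
        && offset_of!(InnerV1, traffic_control_generation)
            == offset_of!(InnerV2, traffic_control_generation)
}
```

With `#[repr(C)]`, fields are laid out in declaration order, so appending at the end only consumes bytes that were padding before. Inserting a field in the middle would shift the offsets of everything after it.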






* [PATCH proxmox v8 1/6] token shadow: split AccessControlConfig and add token.shadow generation
  2026-04-09 15:54 17% [PATCH proxmox{,-datacenter-manager} v8 0/9] " Samuel Rufinatscha
@ 2026-04-09 15:54 11% ` Samuel Rufinatscha
  2026-04-09 15:54 11% ` [PATCH proxmox v8 2/6] token shadow: cache verified API token secrets Samuel Rufinatscha
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 117+ results
From: Samuel Rufinatscha @ 2026-04-09 15:54 UTC (permalink / raw)
  To: pbs-devel

Splits implementation hooks from AccessControlConfig and introduces
AccessControlBackend to keep AccessControlConfig focused on ACL metadata and
validation.

Also introduces generation hooks in AccessControlBackend to support token.shadow
caching.

Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
---
Changes from v7 to v8:
* Split into AccessControlConfig + AccessControlBackend instead of
  AccessControlConfig + AccessControlPermissions
* Gate AccessControlBackend behind #[cfg(feature = "impl")]
* Move init_user_config and cache_generation/increment_cache_generation
  into AccessControlBackend
* Remove delegation methods on AccessControlConfig (no longer needed
  with this split)
* Add init_separate() for cases where config and backend are different
  objects; constrain init() with T: AccessControlConfig +
  AccessControlBackend
* Callers use access_backend() instead of access_conf() for
  cache/backend hooks

Changes from v6 to v7:
* Rebased

Changes from v5 to v6:
* Rebased

Changes from v4 to v5:
* Rebased

 proxmox-access-control/src/acl.rs             |   4 +-
 .../src/cached_user_info.rs                   |   4 +-
 proxmox-access-control/src/init.rs            | 113 +++++++++++++-----
 proxmox-access-control/src/lib.rs             |   2 +-
 proxmox-access-control/src/user.rs            |   6 +-
 5 files changed, 91 insertions(+), 38 deletions(-)

diff --git a/proxmox-access-control/src/acl.rs b/proxmox-access-control/src/acl.rs
index 38cb7edf..e4c35e02 100644
--- a/proxmox-access-control/src/acl.rs
+++ b/proxmox-access-control/src/acl.rs
@@ -660,7 +660,7 @@ mod impl_feature {
     use proxmox_product_config::{open_api_lockfile, replace_privileged_config, ApiLockGuard};
 
     use crate::acl::AclTree;
-    use crate::init::access_conf;
+    use crate::init::access_backend;
     use crate::init::impl_feature::{acl_config, acl_config_lock};
 
     /// Get exclusive lock
@@ -741,7 +741,7 @@ mod impl_feature {
         replace_privileged_config(conf, &raw)?;
 
         // increase cache generation so we reload it next time we access it
-        access_conf().increment_cache_generation()?;
+        access_backend().increment_cache_generation()?;
 
         Ok(())
     }
diff --git a/proxmox-access-control/src/cached_user_info.rs b/proxmox-access-control/src/cached_user_info.rs
index 8db37727..81df561f 100644
--- a/proxmox-access-control/src/cached_user_info.rs
+++ b/proxmox-access-control/src/cached_user_info.rs
@@ -10,7 +10,7 @@ use proxmox_section_config::SectionConfigData;
 use proxmox_time::epoch_i64;
 
 use crate::acl::AclTree;
-use crate::init::access_conf;
+use crate::init::{access_backend, access_conf};
 use crate::types::{ApiToken, User};
 
 /// Cache User/Group/Token/Acl configuration data for fast permission tests
@@ -30,7 +30,7 @@ impl CachedUserInfo {
     pub fn new() -> Result<Arc<Self>, Error> {
         let now = epoch_i64();
 
-        let cache_generation = access_conf().cache_generation();
+        let cache_generation = access_backend().cache_generation();
 
         static CACHED_CONFIG: OnceLock<RwLock<ConfigCache>> = OnceLock::new();
         let cached_config = CACHED_CONFIG.get_or_init(|| {
diff --git a/proxmox-access-control/src/init.rs b/proxmox-access-control/src/init.rs
index e64398e8..07b5df11 100644
--- a/proxmox-access-control/src/init.rs
+++ b/proxmox-access-control/src/init.rs
@@ -7,6 +7,8 @@ use proxmox_auth_api::types::{Authid, Userid};
 use proxmox_section_config::SectionConfigData;
 
 static ACCESS_CONF: OnceLock<&'static dyn AccessControlConfig> = OnceLock::new();
+#[cfg(feature = "impl")]
+static ACCESS_BACKEND: OnceLock<&'static dyn AccessControlBackend> = OnceLock::new();
 
 /// This trait specifies the functions a product needs to implement to get ACL tree based access
 /// control management from this plugin.
@@ -32,25 +34,6 @@ pub trait AccessControlConfig: Send + Sync {
         false
     }
 
-    /// Returns the current cache generation of the user and acl configs. If the generation was
-    /// incremented since the last time the cache was queried, the configs are loaded again from
-    /// disk.
-    ///
-    /// Returning `None` will always reload the cache.
-    ///
-    /// Default: Always returns `None`.
-    fn cache_generation(&self) -> Option<usize> {
-        None
-    }
-
-    /// Increment the cache generation of user and acl configs. This indicates that they were
-    /// changed on disk.
-    ///
-    /// Default: Does nothing.
-    fn increment_cache_generation(&self) -> Result<(), Error> {
-        Ok(())
-    }
-
     /// Optionally returns a role that has no access to any resource.
     ///
     /// Default: Returns `None`.
@@ -65,13 +48,6 @@ pub trait AccessControlConfig: Send + Sync {
         None
     }
 
-    /// Called after the user configuration is loaded to potentially re-add fixed users, such as a
-    /// `root@pam` user.
-    fn init_user_config(&self, config: &mut SectionConfigData) -> Result<(), Error> {
-        let _ = config;
-        Ok(())
-    }
-
     /// This is used to determined what access control list entries a user is allowed to read.
     ///
     /// Override this if you want to use the `api` feature.
@@ -103,6 +79,53 @@ pub trait AccessControlConfig: Send + Sync {
     }
 }
 
+/// Backend hooks for loading and caching access control state.
+#[cfg(feature = "impl")]
+pub trait AccessControlBackend: Send + Sync {
+    /// Called after the user configuration is loaded to potentially re-add fixed users, such as a
+    /// `root@pam` user.
+    fn init_user_config(&self, config: &mut SectionConfigData) -> Result<(), Error> {
+        let _ = config;
+        Ok(())
+    }
+
+    /// Returns the current cache generation of the user and acl configs. If the generation was
+    /// incremented since the last time the cache was queried, the configs are loaded again from
+    /// disk.
+    ///
+    /// Returning `None` will always reload the cache.
+    ///
+    /// Default: Always returns `None`.
+    fn cache_generation(&self) -> Option<usize> {
+        None
+    }
+
+    /// Increment the cache generation of user and acl configs. This indicates that they were
+    /// changed on disk.
+    ///
+    /// Default: Does nothing.
+    fn increment_cache_generation(&self) -> Result<(), Error> {
+        Ok(())
+    }
+
+    /// Returns the current cache generation of the token shadow cache. If the generation was
+    /// incremented since the last time the cache was queried, the token shadow cache is reloaded
+    /// from disk.
+    ///
+    /// Default: Always returns `None`.
+    fn token_shadow_cache_generation(&self) -> Option<usize> {
+        None
+    }
+
+    /// Increment the cache generation of the token shadow cache and return the previous value.
+    /// This indicates that it was changed on disk.
+    ///
+    /// Default: Returns an error as token shadow generation is not supported.
+    fn increment_token_shadow_cache_generation(&self) -> Result<usize, Error> {
+        anyhow::bail!("token shadow generation not supported");
+    }
+}
+
 pub fn init_access_config(config: &'static dyn AccessControlConfig) -> Result<(), Error> {
     ACCESS_CONF
         .set(config)
@@ -115,8 +138,24 @@ pub(crate) fn access_conf() -> &'static dyn AccessControlConfig {
         .expect("please initialize the acm config before using it!")
 }
 
+#[cfg(feature = "impl")]
+pub fn init_access_backend(config: &'static dyn AccessControlBackend) -> Result<(), Error> {
+    ACCESS_BACKEND
+        .set(config)
+        .map_err(|_| format_err!("cannot initialize access control backend twice!"))
+}
+
+#[cfg(feature = "impl")]
+pub(crate) fn access_backend() -> &'static dyn AccessControlBackend {
+    *ACCESS_BACKEND
+        .get()
+        .expect("please initialize the access control backend before using it!")
+}
+
 #[cfg(feature = "impl")]
 pub use impl_feature::init;
+#[cfg(feature = "impl")]
+pub use impl_feature::init_separate;
 
 #[cfg(feature = "impl")]
 pub(crate) mod impl_feature {
@@ -125,15 +164,29 @@ pub(crate) mod impl_feature {
 
     use anyhow::{format_err, Error};
 
-    use crate::init::{init_access_config, AccessControlConfig};
+    use crate::init::{
+        init_access_backend, init_access_config, AccessControlBackend, AccessControlConfig,
+    };
 
     static ACCESS_CONF_DIR: OnceLock<PathBuf> = OnceLock::new();
 
-    pub fn init<P: AsRef<Path>>(
-        acm_config: &'static dyn AccessControlConfig,
+    pub fn init<T, P>(config: &'static T, config_dir: P) -> Result<(), Error>
+    where
+        T: AccessControlConfig + AccessControlBackend,
+        P: AsRef<Path>,
+    {
+        init_access_config(config)?;
+        init_access_backend(config)?;
+        init_access_config_dir(config_dir)
+    }
+
+    pub fn init_separate<P: AsRef<Path>>(
+        acl_config: &'static dyn AccessControlConfig,
+        backend: &'static dyn AccessControlBackend,
         config_dir: P,
     ) -> Result<(), Error> {
-        init_access_config(acm_config)?;
+        init_access_config(acl_config)?;
+        init_access_backend(backend)?;
         init_access_config_dir(config_dir)
     }
 
diff --git a/proxmox-access-control/src/lib.rs b/proxmox-access-control/src/lib.rs
index 9195c999..dc17da2a 100644
--- a/proxmox-access-control/src/lib.rs
+++ b/proxmox-access-control/src/lib.rs
@@ -8,7 +8,7 @@ pub mod acl;
 #[cfg(feature = "api")]
 pub mod api;
 
-#[cfg(feature = "acl")]
+#[cfg(any(feature = "acl", feature = "impl"))]
 pub mod init;
 
 #[cfg(feature = "impl")]
diff --git a/proxmox-access-control/src/user.rs b/proxmox-access-control/src/user.rs
index a4b59edc..ec5336d2 100644
--- a/proxmox-access-control/src/user.rs
+++ b/proxmox-access-control/src/user.rs
@@ -9,7 +9,7 @@ use proxmox_product_config::{open_api_lockfile, replace_privileged_config, ApiLo
 use proxmox_schema::*;
 use proxmox_section_config::{SectionConfig, SectionConfigData, SectionConfigPlugin};
 
-use crate::init::access_conf;
+use crate::init::access_backend;
 use crate::init::impl_feature::{user_config, user_config_lock};
 use crate::types::{ApiToken, User};
 
@@ -52,7 +52,7 @@ pub fn config() -> Result<(SectionConfigData, ConfigDigest), Error> {
     let digest = ConfigDigest::from_slice(content.as_bytes());
     let mut data = get_or_init_config().parse(user_config(), &content)?;
 
-    access_conf().init_user_config(&mut data)?;
+    access_backend().init_user_config(&mut data)?;
 
     Ok((data, digest))
 }
@@ -113,7 +113,7 @@ pub fn save_config(config: &SectionConfigData) -> Result<(), Error> {
     replace_privileged_config(config_file, raw.as_bytes())?;
 
     // increase cache generation so we reload it next time we access it
-    access_conf().increment_cache_generation()?;
+    access_backend().increment_cache_generation()?;
 
     Ok(())
 }
-- 
2.47.3
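
The default hooks and the "returns the previous value" contract can be sketched as follows. This is a simplified illustration: the `Backend` trait here uses `String` errors instead of `anyhow::Error`, and `NoGeneration`, `SharedCounter` and `bump_generation` are hypothetical names.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Reduced version of the backend trait with only the token shadow hooks.
trait Backend {
    /// Default: no generation available.
    fn token_shadow_cache_generation(&self) -> Option<usize> {
        None
    }

    /// Default: not supported. Implementations return the previous value.
    fn increment_token_shadow_cache_generation(&self) -> Result<usize, String> {
        Err("token shadow generation not supported".to_string())
    }
}

/// A product that does not provide a generation keeps the defaults.
struct NoGeneration;
impl Backend for NoGeneration {}

/// A product backed by an atomic counter (in-process here, for illustration).
struct SharedCounter(AtomicUsize);

impl SharedCounter {
    fn new(start: usize) -> Self {
        Self(AtomicUsize::new(start))
    }
}

impl Backend for SharedCounter {
    fn token_shadow_cache_generation(&self) -> Option<usize> {
        Some(self.0.load(Ordering::Acquire))
    }

    fn increment_token_shadow_cache_generation(&self) -> Result<usize, String> {
        // fetch_add returns the value before the increment.
        Ok(self.0.fetch_add(1, Ordering::AcqRel))
    }
}

/// Bumps the generation and returns the new value, or `None` if the backend
/// does not support it (callers then skip caching).
fn bump_generation(backend: &dyn Backend) -> Option<usize> {
    backend
        .increment_token_shadow_cache_generation()
        .ok()
        .map(|prev| prev + 1)
}
```

Returning the previous value lets the caller derive the generation its own write produced, without a second load that could already observe another writer's increment.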






* [PATCH proxmox v8 4/6] token shadow: add TTL window to token secret cache
  2026-04-09 15:54 17% [PATCH proxmox{,-datacenter-manager} v8 0/9] " Samuel Rufinatscha
                   ` (2 preceding siblings ...)
  2026-04-09 15:54 11% ` [PATCH proxmox v8 3/6] token shadow: invalidate token-secret cache on token.shadow changes Samuel Rufinatscha
@ 2026-04-09 15:54 15% ` Samuel Rufinatscha
  2026-04-09 15:54 17% ` [PATCH proxmox v8 5/6] token shadow: inline set_secret fn Samuel Rufinatscha
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 117+ results
From: Samuel Rufinatscha @ 2026-04-09 15:54 UTC (permalink / raw)
  To: pbs-devel

cached_secret_valid() currently stats token.shadow on every request,
i.e. it performs a metadata() call each time. Under load this adds
unnecessary overhead, especially since the file rarely changes.

This patch introduces a TTL boundary, controlled by
TOKEN_SECRET_CACHE_TTL_SECS. File metadata is only re-loaded once the
TTL has expired.

Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
---
Changes from v7 to v8:
* TTL fast path (read lock) and write-lock re-check now return
  cache.secret_matches(tokenid, secret) instead of just true, following
  the cached_secret_valid() merge from the previous patch
* Adjusted commit message

Changes from v6 to v7:
* Rebased

Changes from v5 to v6:
* Rebased

Changes from v4 to v5:
* Rebased
* Introduce shadow_check_within_ttl() helper

Changes from v3 to v4:
* Adjusted commit message

Changes from v2 to v3:
* Refactored refresh_cache_if_file_changed TTL logic.
* Remove had_prior_state check (replaced by last_checked logic).
* Improve TTL bound checks.
* Reword documentation warning for clarity.

Changes from v1 to v2:
* Add TOKEN_SECRET_CACHE_TTL_SECS and last_checked.
* Implement double-checked TTL: check with try_read first; only attempt
  refresh with try_write if expired/unknown.
* Fix TTL bookkeeping: update last_checked on the “file unchanged” path
  and after API mutations.
* Add documentation warning about TTL-delayed effect of manual
  token.shadow edits.

 proxmox-access-control/src/token_shadow.rs | 33 +++++++++++++++++++++-
 1 file changed, 32 insertions(+), 1 deletion(-)

diff --git a/proxmox-access-control/src/token_shadow.rs b/proxmox-access-control/src/token_shadow.rs
index 810ff0c3..4185351e 100644
--- a/proxmox-access-control/src/token_shadow.rs
+++ b/proxmox-access-control/src/token_shadow.rs
@@ -27,6 +27,8 @@ static TOKEN_SECRET_CACHE: LazyLock<RwLock<ApiTokenSecretCache>> = LazyLock::new
         file_info: None,
     })
 });
+/// Max age in seconds of the token secret cache before checking for file changes.
+const TOKEN_SECRET_CACHE_TTL_SECS: i64 = 60;
 
 // Get exclusive lock
 fn lock_config() -> Result<ApiLockGuard, Error> {
@@ -57,15 +59,31 @@ fn write_file(data: HashMap<Authid, String>) -> Result<(), Error> {
 /// tokens for in-flight requests may still validate against the previous
 /// generation.
 ///
+/// If the cache file metadata's TTL has expired, will revalidate and invalidate the cache if
+/// needed.
+///
 /// Returns true if secret is cached and cache is still valid
 fn cached_secret_valid(tokenid: &Authid, secret: &str) -> bool {
     let now = epoch_i64();
 
-    // Best-effort refresh under write lock.
+    // Fast path: cache is fresh if generation matches and TTL not expired.
+    if let (Some(cache), Some(read_gen)) =
+        (TOKEN_SECRET_CACHE.try_read(), token_shadow_generation())
+    {
+        if cache.cached_gen == read_gen && cache.shadow_check_within_ttl(now) {
+            return cache.secret_matches(tokenid, secret);
+        }
+        // read lock drops here
+    } else {
+        return false;
+    }
+
+    // Slow path: best-effort refresh under write lock.
     let Some(mut cache) = TOKEN_SECRET_CACHE.try_write() else {
         return false;
     };
 
+    // Re-read generation after acquiring the lock (may have changed meanwhile).
     let Some(current_gen) = token_shadow_generation() else {
         return false;
     };
@@ -75,6 +93,12 @@ fn cached_secret_valid(tokenid: &Authid, secret: &str) -> bool {
         cache.reset_and_set_gen(current_gen);
     }
 
+    // TTL check again after acquiring the lock
+    let now = epoch_i64();
+    if cache.shadow_check_within_ttl(now) {
+        return cache.secret_matches(tokenid, secret);
+    }
+
     // Stat the file to detect manual edits.
     let Ok((new_mtime, new_len)) = shadow_mtime_len() else {
         return false;
@@ -224,6 +248,13 @@ impl ApiTokenSecretCache {
         self.cached_gen = new_gen;
     }
 
+    /// Returns true if cached token.shadow metadata exists and was checked within the TTL window.
+    fn shadow_check_within_ttl(&self, now: i64) -> bool {
+        self.file_info.as_ref().is_some_and(|cached| {
+            now >= cached.last_checked && (now - cached.last_checked) < TOKEN_SECRET_CACHE_TTL_SECS
+        })
+    }
+
     /// Returns true if there is a matching cached entry
     fn secret_matches(&self, tokenid: &Authid, secret: &str) -> bool {
         let Some(entry) = self.secrets.get(tokenid) else {
-- 
2.47.3
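
The double-checked TTL logic can be sketched in isolation. This is a minimal illustration under several assumptions: it uses `std::sync::RwLock` instead of `parking_lot::RwLock`, replaces the real stat call with a counter, and omits the generation and secret comparison. `Cache`, `cache_usable` and `TTL_SECS` are hypothetical names.

```rust
use std::sync::RwLock;

/// Max age in seconds of the last metadata check.
const TTL_SECS: i64 = 60;

struct Cache {
    /// Unix time of the last file metadata check; `None` if never checked.
    last_checked: Option<i64>,
    /// Number of simulated stat calls, to make the effect observable.
    stat_calls: u32,
}

impl Cache {
    fn new() -> Self {
        Self { last_checked: None, stat_calls: 0 }
    }

    /// True if a check exists and is younger than the TTL. A clock that
    /// jumped backwards (`now < last`) counts as expired.
    fn within_ttl(&self, now: i64) -> bool {
        self.last_checked
            .is_some_and(|last| now >= last && now - last < TTL_SECS)
    }
}

/// Returns true if the cache may be used at `now`. Refreshes the check
/// under the write lock once the TTL has expired.
fn cache_usable(cache: &RwLock<Cache>, now: i64) -> bool {
    // Fast path: shared lock only, no stat call.
    if let Ok(guard) = cache.try_read() {
        if guard.within_ttl(now) {
            return true;
        }
        // read guard drops at the end of this block
    } else {
        // Contended or poisoned: let the caller fall back to its slow path.
        return false;
    }

    // Slow path: best-effort refresh under the write lock.
    let Ok(mut guard) = cache.try_write() else {
        return false;
    };
    // Another thread may have refreshed between the two locks.
    if guard.within_ttl(now) {
        return true;
    }
    guard.stat_calls += 1; // stands in for the real metadata() call
    guard.last_checked = Some(now);
    true
}
```

Using `try_read`/`try_write` rather than blocking means that under contention a request simply takes the uncached verification path instead of waiting on the lock.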






* [PATCH proxmox v8 5/6] token shadow: inline set_secret fn
  2026-04-09 15:54 17% [PATCH proxmox{,-datacenter-manager} v8 0/9] " Samuel Rufinatscha
                   ` (3 preceding siblings ...)
  2026-04-09 15:54 15% ` [PATCH proxmox v8 4/6] token shadow: add TTL window to token secret cache Samuel Rufinatscha
@ 2026-04-09 15:54 17% ` Samuel Rufinatscha
  2026-04-09 15:54 15% ` [PATCH proxmox v8 6/6] token shadow: deduplicate more code into apply_api_mutation Samuel Rufinatscha
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 117+ results
From: Samuel Rufinatscha @ 2026-04-09 15:54 UTC (permalink / raw)
  To: pbs-devel

Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
---
 proxmox-access-control/src/token_shadow.rs | 21 ++++++++-------------
 1 file changed, 8 insertions(+), 13 deletions(-)

diff --git a/proxmox-access-control/src/token_shadow.rs b/proxmox-access-control/src/token_shadow.rs
index 4185351e..270f3bfa 100644
--- a/proxmox-access-control/src/token_shadow.rs
+++ b/proxmox-access-control/src/token_shadow.rs
@@ -161,8 +161,11 @@ pub fn verify_secret(tokenid: &Authid, secret: &str) -> Result<(), Error> {
     }
 }
 
-/// Adds a new entry for the given tokenid / API token secret. The secret is stored as salted hash.
-pub fn set_secret(tokenid: &Authid, secret: &str) -> Result<(), Error> {
+/// Generates a new secret for the given tokenid / API token, sets it then returns it.
+/// The secret is stored as salted hash.
+pub fn generate_and_set_secret(tokenid: &Authid) -> Result<String, Error> {
+    let secret = format!("{:x}", proxmox_uuid::Uuid::generate());
+
     if !tokenid.is_token() {
         bail!("not an API token ID");
     }
@@ -173,13 +176,13 @@ pub fn set_secret(tokenid: &Authid, secret: &str) -> Result<(), Error> {
     let pre_meta = shadow_mtime_len().unwrap_or((None, None));
 
     let mut data = read_file()?;
-    let hashed_secret = proxmox_sys::crypt::encrypt_pw(secret)?;
+    let hashed_secret = proxmox_sys::crypt::encrypt_pw(&secret)?;
     data.insert(tokenid.clone(), hashed_secret);
     write_file(data)?;
 
-    apply_api_mutation(guard, tokenid, Some(secret), pre_meta);
+    apply_api_mutation(guard, tokenid, Some(&secret), pre_meta);
 
-    Ok(())
+    Ok(secret)
 }
 
 /// Deletes the entry for the given tokenid.
@@ -202,14 +205,6 @@ pub fn delete_secret(tokenid: &Authid) -> Result<(), Error> {
     Ok(())
 }
 
-/// Generates a new secret for the given tokenid / API token, sets it then returns it.
-/// The secret is stored as salted hash.
-pub fn generate_and_set_secret(tokenid: &Authid) -> Result<String, Error> {
-    let secret = format!("{:x}", proxmox_uuid::Uuid::generate());
-    set_secret(tokenid, &secret)?;
-    Ok(secret)
-}
-
 /// Cached secret.
 struct CachedSecret {
     secret: String,
-- 
2.47.3






* [PATCH proxmox{,-datacenter-manager} v8 0/9] token-shadow: reduce api token verification overhead
@ 2026-04-09 15:54 17% Samuel Rufinatscha
  2026-04-09 15:54 11% ` [PATCH proxmox v8 1/6] token shadow: split AccessControlConfig and add token.shadow generation Samuel Rufinatscha
                   ` (8 more replies)
  0 siblings, 9 replies; 117+ results
From: Samuel Rufinatscha @ 2026-04-09 15:54 UTC (permalink / raw)
  To: pbs-devel

Hi,

This series mirrors the token secret caching approach from PBS [0] for
PDM through proxmox-access-control.

Since PDM implements permissions in pdm-api-types and cache/generation
hooks in pdm-config, the trait needed to be split. This series
introduces a separate AccessControlBackend trait (gated behind
cfg(feature = "impl")) for the cache and token.shadow generation hooks,
and moves init_user_config there as well. PDM wires the backend via
init_separate(), which accepts the two traits independently.
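
The split can be sketched roughly as follows. This is a simplified, standalone illustration: the traits are reduced to one method each (`describe` is a hypothetical placeholder), errors are plain strings, the config directory argument is omitted, and `PermissionsConfig`, `CacheBackend` and `Combined` are made-up types.

```rust
use std::sync::OnceLock;

/// ACL metadata and validation (reduced to a placeholder method).
trait AccessControlConfig: Send + Sync {
    fn describe(&self) -> &'static str;
}

/// Cache and generation hooks.
trait AccessControlBackend: Send + Sync {
    fn cache_generation(&self) -> Option<usize> {
        None
    }
}

static ACCESS_CONF: OnceLock<&'static dyn AccessControlConfig> = OnceLock::new();
static ACCESS_BACKEND: OnceLock<&'static dyn AccessControlBackend> = OnceLock::new();

/// For products where config and backend are different objects.
fn init_separate(
    config: &'static dyn AccessControlConfig,
    backend: &'static dyn AccessControlBackend,
) -> Result<(), String> {
    ACCESS_CONF
        .set(config)
        .map_err(|_| "config already initialized".to_string())?;
    ACCESS_BACKEND
        .set(backend)
        .map_err(|_| "backend already initialized".to_string())
}

/// For products where one type implements both traits.
#[allow(dead_code)]
fn init<T>(both: &'static T) -> Result<(), String>
where
    T: AccessControlConfig + AccessControlBackend,
{
    init_separate(both, both)
}

// Hypothetical product types: the first two would live in different crates.
struct PermissionsConfig;
struct CacheBackend;
struct Combined;

impl AccessControlConfig for PermissionsConfig {
    fn describe(&self) -> &'static str {
        "permissions"
    }
}

impl AccessControlBackend for CacheBackend {
    fn cache_generation(&self) -> Option<usize> {
        Some(7)
    }
}

impl AccessControlConfig for Combined {
    fn describe(&self) -> &'static str {
        "combined"
    }
}
impl AccessControlBackend for Combined {}

static CONFIG: PermissionsConfig = PermissionsConfig;
static BACKEND: CacheBackend = CacheBackend;
static COMBINED: Combined = Combined;
```

A product with a single implementing type would call `init(&COMBINED)`, while a PDM-like setup would call `init_separate(&CONFIG, &BACKEND)`. Either way, a second initialization attempt fails.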

This series also wires up the existing, but previously unused, user and
ACL cache generation.

Testing

I verified that this series mirrors the already applied PBS patches
including follow-ups by comparing patch diffs.

Functionally, I tested that:
   * valid tokens authenticate correctly when used in API requests
   * invalid secrets are rejected as before
   * generating a new token secret via dashboard (create token for user,
     regenerate existing secret) works and authenticates correctly
   * disabling the token or removing ACL permissions stops requests from
     being accepted

Patches 1-6 generally mirror the already applied PBS patches in
proxmox-access-control, including follow-ups (thanks @Fabian).
Patches 7-9 focus on PDM's AccessControlBackend implementation and
wire the cache generations.

Maintainer Notes:
* proxmox-access-control trait split -> version bump
* Renames ConfigVersionCache's pub user_cache_generation and
  increase_user_cache_generation -> version bump
* Adds parking_lot::RwLock dependency in proxmox-access-control

[0] https://lore.proxmox.com/pbs-devel/20260312103708.125282-1-s.rufinatscha@proxmox.com/T/#t

proxmox:

Samuel Rufinatscha (6):
  token shadow: split AccessControlConfig and add token.shadow
    generation
  token shadow: cache verified API token secrets
  token shadow: invalidate token-secret cache on token.shadow changes
  token shadow: add TTL window to token secret cache
  token shadow: inline set_secret fn
  token shadow: deduplicate more code into apply_api_mutation

 Cargo.toml                                    |   1 +
 proxmox-access-control/Cargo.toml             |   1 +
 proxmox-access-control/src/acl.rs             |   4 +-
 .../src/cached_user_info.rs                   |   4 +-
 proxmox-access-control/src/init.rs            | 113 ++++--
 proxmox-access-control/src/lib.rs             |   2 +-
 proxmox-access-control/src/token_shadow.rs    | 324 ++++++++++++++++--
 proxmox-access-control/src/user.rs            |   6 +-
 8 files changed, 396 insertions(+), 59 deletions(-)


proxmox-datacenter-manager:

Samuel Rufinatscha (3):
  pdm-config: implement access control backend hooks
  pdm-config: wire user and ACL cache generation
  pdm-config: wire token.shadow generation

 cli/admin/src/main.rs                      |  3 +-
 docs/access-control.rst                    |  4 ++
 lib/pdm-api-types/src/acl.rs               | 26 +----------
 lib/pdm-config/Cargo.toml                  |  1 +
 lib/pdm-config/src/access_control.rs       | 51 ++++++++++++++++++++++
 lib/pdm-config/src/config_version_cache.rs | 34 +++++++++++----
 lib/pdm-config/src/lib.rs                  |  2 +
 server/src/acl.rs                          | 10 ++++-
 8 files changed, 95 insertions(+), 36 deletions(-)
 create mode 100644 lib/pdm-config/src/access_control.rs


Summary over all repositories:
  16 files changed, 491 insertions(+), 95 deletions(-)

-- 
Generated by git-murpp 0.8.1




^ permalink raw reply	[relevance 17%]

* [PATCH proxmox v8 2/6] token shadow: cache verified API token secrets
  2026-04-09 15:54 17% [PATCH proxmox{,-datacenter-manager} v8 0/9] " Samuel Rufinatscha
  2026-04-09 15:54 11% ` [PATCH proxmox v8 1/6] token shadow: split AccessControlConfig and add token.shadow generation Samuel Rufinatscha
@ 2026-04-09 15:54 11% ` Samuel Rufinatscha
  2026-04-09 15:54 11% ` [PATCH proxmox v8 3/6] token shadow: invalidate token-secret cache on token.shadow changes Samuel Rufinatscha
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 117+ results
From: Samuel Rufinatscha @ 2026-04-09 15:54 UTC (permalink / raw)
  To: pbs-devel

Add an in-memory cache of successfully verified token secrets.
Subsequent requests for the same token+secret combination only perform
a comparison using openssl::memcmp::eq and avoid re-running the
password hash. The cache is updated when a token secret is set and
cleared when a token is deleted. A shared generation counter (via
ConfigVersionCache) is used to invalidate caches across processes when
token secrets are modified or deleted. This keeps privileged and
unprivileged daemons in sync.

Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
---
Changes from v7 to v8:
* Rename shared_gen -> cached_gen
* Rename token_shadow_shared_gen() -> token_shadow_generation()
* Rename bump_token_shadow_shared_gen() -> bump_token_shadow_generation()
* Use access_backend() instead of access_conf() for generation hooks

Changes from v6 to v7:
* Rebased
* Rename "gen" variables to be compatible with Rust 2024 keyword
changes

Changes from v5 to v6:
* Rebased
* Check that the input byte lengths are equal before calling
openssl::memcmp::eq(..).

Changes from v4 to v5:
* Rebased
* Fix wrong type compilation issue; replaced with ApiLockGuard
* Move invalidate_cache_state_and_set_gen into cache object impl and
rename it to reset_and_set_gen
* Add additional insert/remove helpers which set/update the generation
directly
* Clarified the usage of the shared generation counter in the commit
message

Changes from v3 to v4:
* Add gen param to invalidate_cache_state()
* Validate the generation bump after obtaining write lock in
apply_api_mutation
* Pass lock to apply_api_mutation
* Remove unnecessary gen check in cache_try_secret_matches
* Adjusted commit message

Changes from v2 to v3:
* Replaced process-local cache invalidation (AtomicU64
API_MUTATION_GENERATION) with a cross-process shared generation via
ConfigVersionCache.
* Validate shared generation before/after the constant-time secret
compare; only insert into cache if the generation is unchanged.
* invalidate_cache_state() on insert if shared generation changed.

Changes from v1 to v2:
* Replace OnceCell with LazyLock, and std::sync::RwLock with
parking_lot::RwLock.
* Add API_MUTATION_GENERATION and guard cache inserts
to prevent “zombie inserts” across concurrent set/delete.
* Refactor cache operations into cache_try_secret_matches,
cache_try_insert_secret, and centralize write-side behavior in
apply_api_mutation.
* Switch fast-path cache access to try_read/try_write (best-effort).
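
For readers skimming the changelog, the generation-guarded fast path
described in the commit message can be sketched roughly as follows. This
is a std-only illustration under stated assumptions, not the patch code:
SHARED_GEN stands in for the cross-process ConfigVersionCache counter,
ct_eq() stands in for openssl::memcmp::eq (which panics on inputs of
different length, hence the explicit length check), and plain String
keys replace Authid.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::RwLock;

/// Hypothetical stand-in for the shared, cross-process generation counter.
static SHARED_GEN: AtomicUsize = AtomicUsize::new(0);

fn current_gen() -> usize {
    SHARED_GEN.load(Ordering::Acquire)
}

/// Called on every set/delete; returns the new generation.
fn bump_gen() -> usize {
    SHARED_GEN.fetch_add(1, Ordering::AcqRel) + 1
}

struct SecretCache {
    secrets: HashMap<String, String>,
    cached_gen: usize,
}

impl SecretCache {
    fn new() -> Self {
        SecretCache {
            secrets: HashMap::new(),
            cached_gen: current_gen(),
        }
    }
}

/// Illustrative comparison that does not exit early on the first differing
/// byte. The length check comes first, mirroring the v6 change.
fn ct_eq(a: &[u8], b: &[u8]) -> bool {
    if a.len() != b.len() {
        return false;
    }
    a.iter().zip(b).fold(0u8, |acc, (x, y)| acc | (x ^ y)) == 0
}

/// Fast path: best-effort read lock, generation check, then compare.
fn cache_matches(cache: &RwLock<SecretCache>, tokenid: &str, secret: &str) -> bool {
    let Ok(guard) = cache.try_read() else {
        return false;
    };
    if guard.cached_gen != current_gen() {
        return false; // stale cache, fall back to the slow path
    }
    match guard.secrets.get(tokenid) {
        Some(cached) => ct_eq(cached.as_bytes(), secret.as_bytes()),
        None => false,
    }
}

/// Called after a successful slow-path verification. `gen_before` is the
/// generation captured before the hash was checked.
fn cache_try_insert(cache: &RwLock<SecretCache>, tokenid: &str, secret: &str, gen_before: usize) {
    let Ok(mut guard) = cache.try_write() else {
        return;
    };
    let gen_now = current_gen();
    // This process missed a bump: drop everything it has cached.
    if guard.cached_gen != gen_now {
        guard.secrets.clear();
        guard.cached_gen = gen_now;
    }
    // Only insert if no mutation happened while verifying.
    if gen_now == gen_before {
        guard.secrets.insert(tokenid.to_owned(), secret.to_owned());
    }
}
```

The slow path would capture current_gen(), verify the hash, and then
call cache_try_insert() with the captured value, so that a concurrent
set/delete (which calls bump_gen()) cannot leave a stale entry behind.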

 Cargo.toml                                 |   1 +
 proxmox-access-control/Cargo.toml          |   1 +
 proxmox-access-control/src/token_shadow.rs | 170 ++++++++++++++++++++-
 3 files changed, 169 insertions(+), 3 deletions(-)

diff --git a/Cargo.toml b/Cargo.toml
index 02ff7f81..cf55653f 100644
--- a/Cargo.toml
+++ b/Cargo.toml
@@ -115,6 +115,7 @@ native-tls = "0.2"
 nix = "0.29"
 openssl = "0.10"
 pam-sys = "0.5"
+parking_lot = "0.12"
 percent-encoding = "2.1"
 pin-utils = "0.1.0"
 proc-macro2 = "1.0"
diff --git a/proxmox-access-control/Cargo.toml b/proxmox-access-control/Cargo.toml
index ec189664..1de2842c 100644
--- a/proxmox-access-control/Cargo.toml
+++ b/proxmox-access-control/Cargo.toml
@@ -16,6 +16,7 @@ anyhow.workspace = true
 const_format.workspace = true
 nix = { workspace = true, optional = true }
 openssl = { workspace = true, optional = true }
+parking_lot.workspace = true
 regex.workspace = true
 hex = { workspace = true, optional = true }
 serde.workspace = true
diff --git a/proxmox-access-control/src/token_shadow.rs b/proxmox-access-control/src/token_shadow.rs
index c586d834..d0bf43d7 100644
--- a/proxmox-access-control/src/token_shadow.rs
+++ b/proxmox-access-control/src/token_shadow.rs
@@ -1,13 +1,28 @@
 use std::collections::HashMap;
+use std::sync::LazyLock;
 
 use anyhow::{bail, format_err, Error};
+use parking_lot::RwLock;
 use serde_json::{from_value, Value};
 
 use proxmox_auth_api::types::Authid;
 use proxmox_product_config::{open_api_lockfile, replace_config, ApiLockGuard};
 
+use crate::init::access_backend;
 use crate::init::impl_feature::{token_shadow, token_shadow_lock};
 
+/// Global in-memory cache for successfully verified API token secrets.
+/// The cache stores plain text secrets for token Authids that have already been
+/// verified against the hashed values in `token.shadow`. This allows for cheap
+/// subsequent authentications for the same token+secret combination, avoiding
+/// recomputing the password hash on every request.
+static TOKEN_SECRET_CACHE: LazyLock<RwLock<ApiTokenSecretCache>> = LazyLock::new(|| {
+    RwLock::new(ApiTokenSecretCache {
+        secrets: HashMap::new(),
+        cached_gen: 0,
+    })
+});
+
 // Get exclusive lock
 fn lock_config() -> Result<ApiLockGuard, Error> {
     open_api_lockfile(token_shadow_lock(), None, true)
@@ -36,9 +51,27 @@ pub fn verify_secret(tokenid: &Authid, secret: &str) -> Result<(), Error> {
         bail!("not an API token ID");
     }
 
+    // Fast path
+    if cache_try_secret_matches(tokenid, secret) {
+        return Ok(());
+    }
+
+    // Slow path
+    // First, capture the generation before doing the hash verification.
+    let gen_before = token_shadow_generation();
+
     let data = read_file()?;
     match data.get(tokenid) {
-        Some(hashed_secret) => proxmox_sys::crypt::verify_crypt_pw(secret, hashed_secret),
+        Some(hashed_secret) => {
+            proxmox_sys::crypt::verify_crypt_pw(secret, hashed_secret)?;
+
+            // Try to cache only if nothing changed while verifying the secret.
+            if let Some(gen_before) = gen_before {
+                cache_try_insert_secret(tokenid.clone(), secret.to_owned(), gen_before);
+            }
+
+            Ok(())
+        }
         None => bail!("invalid API token"),
     }
 }
@@ -49,13 +82,15 @@ pub fn set_secret(tokenid: &Authid, secret: &str) -> Result<(), Error> {
         bail!("not an API token ID");
     }
 
-    let _guard = lock_config()?;
+    let guard = lock_config()?;
 
     let mut data = read_file()?;
     let hashed_secret = proxmox_sys::crypt::encrypt_pw(secret)?;
     data.insert(tokenid.clone(), hashed_secret);
     write_file(data)?;
 
+    apply_api_mutation(guard, tokenid, Some(secret));
+
     Ok(())
 }
 
@@ -65,12 +100,14 @@ pub fn delete_secret(tokenid: &Authid) -> Result<(), Error> {
         bail!("not an API token ID");
     }
 
-    let _guard = lock_config()?;
+    let guard = lock_config()?;
 
     let mut data = read_file()?;
     data.remove(tokenid);
     write_file(data)?;
 
+    apply_api_mutation(guard, tokenid, None);
+
     Ok(())
 }
 
@@ -81,3 +118,130 @@ pub fn generate_and_set_secret(tokenid: &Authid) -> Result<String, Error> {
     set_secret(tokenid, &secret)?;
     Ok(secret)
 }
+
+/// Cached secret.
+struct CachedSecret {
+    secret: String,
+}
+
+struct ApiTokenSecretCache {
+    /// Keys are token Authids, values are the corresponding plain text secrets.
+    /// Entries are added after a successful on-disk verification in
+    /// `verify_secret` or when a new token secret is generated by
+    /// `generate_and_set_secret`. Used to avoid repeated
+    /// password-hash computation on subsequent authentications.
+    secrets: HashMap<Authid, CachedSecret>,
+    /// token.shadow generation of cached secrets.
+    cached_gen: usize,
+}
+
+impl ApiTokenSecretCache {
+    /// Resets all local cache contents and sets/updates the cached generation.
+    fn reset_and_set_gen(&mut self, new_gen: usize) {
+        self.secrets.clear();
+        self.cached_gen = new_gen;
+    }
+
+    /// Caches a secret and sets/updates the cache generation.
+    fn insert_and_set_gen(&mut self, tokenid: Authid, secret: CachedSecret, new_gen: usize) {
+        self.secrets.insert(tokenid, secret);
+        self.cached_gen = new_gen;
+    }
+
+    /// Evicts a cached secret and sets/updates the cached generation.
+    fn evict_and_set_gen(&mut self, tokenid: &Authid, new_gen: usize) {
+        self.secrets.remove(tokenid);
+        self.cached_gen = new_gen;
+    }
+}
+
+fn cache_try_insert_secret(tokenid: Authid, secret: String, gen_before: usize) {
+    let Some(mut cache) = TOKEN_SECRET_CACHE.try_write() else {
+        return;
+    };
+
+    let Some(gen_now) = token_shadow_generation() else {
+        return;
+    };
+
+    // If this process missed a generation bump, its cache is stale.
+    if cache.cached_gen != gen_now {
+        cache.reset_and_set_gen(gen_now);
+    }
+
+    // If a mutation happened while we were verifying the secret, do not insert.
+    if gen_now == gen_before {
+        cache.insert_and_set_gen(tokenid, CachedSecret { secret }, gen_now);
+    }
+}
+
+/// Tries to match the given token secret against the cached secret.
+///
+/// Verifies the generation/version before doing the constant-time
+/// comparison to reduce TOCTOU risk. During token rotation or deletion
+/// tokens for in-flight requests may still validate against the previous
+/// generation.
+fn cache_try_secret_matches(tokenid: &Authid, secret: &str) -> bool {
+    let Some(cache) = TOKEN_SECRET_CACHE.try_read() else {
+        return false;
+    };
+    let Some(entry) = cache.secrets.get(tokenid) else {
+        return false;
+    };
+    let Some(current_gen) = token_shadow_generation() else {
+        return false;
+    };
+
+    if current_gen == cache.cached_gen {
+        let cached_secret_bytes = entry.secret.as_bytes();
+        let secret_bytes = secret.as_bytes();
+
+        return cached_secret_bytes.len() == secret_bytes.len()
+            && openssl::memcmp::eq(cached_secret_bytes, secret_bytes);
+    }
+
+    false
+}
+
+fn apply_api_mutation(_guard: ApiLockGuard, tokenid: &Authid, secret: Option<&str>) {
+    // Signal cache invalidation to other processes (best-effort).
+    let bumped_gen = bump_token_shadow_generation();
+    let mut cache = TOKEN_SECRET_CACHE.write();
+
+    // If we cannot get the current generation, we cannot trust the cache
+    let Some(current_gen) = token_shadow_generation() else {
+        cache.reset_and_set_gen(0);
+        return;
+    };
+
+    // If we cannot bump the generation, or if it changed after
+    // obtaining the cache write lock, we cannot trust the cache
+    if bumped_gen != Some(current_gen) {
+        cache.reset_and_set_gen(current_gen);
+        return;
+    }
+
+    // Apply the new mutation.
+    match secret {
+        Some(secret) => {
+            let cached_secret = CachedSecret {
+                secret: secret.to_owned(),
+            };
+            cache.insert_and_set_gen(tokenid.clone(), cached_secret, current_gen);
+        }
+        None => cache.evict_and_set_gen(tokenid, current_gen),
+    }
+}
+
+/// Get the current generation.
+fn token_shadow_generation() -> Option<usize> {
+    access_backend().token_shadow_cache_generation()
+}
+
+/// Bump and return the new generation.
+fn bump_token_shadow_generation() -> Option<usize> {
+    access_backend()
+        .increment_token_shadow_cache_generation()
+        .ok()
+        .map(|prev| prev + 1)
+}
-- 
2.47.3





^ permalink raw reply related	[relevance 11%]

* [PATCH proxmox v8 6/6] token shadow: deduplicate more code into apply_api_mutation
  2026-04-09 15:54 17% [PATCH proxmox{,-datacenter-manager} v8 0/9] " Samuel Rufinatscha
                   ` (4 preceding siblings ...)
  2026-04-09 15:54 17% ` [PATCH proxmox v8 5/6] token shadow: inline set_secret fn Samuel Rufinatscha
@ 2026-04-09 15:54 15% ` Samuel Rufinatscha
  2026-04-09 15:54 14% ` [PATCH proxmox-datacenter-manager v8 1/3] pdm-config: implement access control backend hooks Samuel Rufinatscha
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 117+ results
From: Samuel Rufinatscha @ 2026-04-09 15:54 UTC (permalink / raw)
  To: pbs-devel

Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
---
 proxmox-access-control/src/token_shadow.rs | 71 +++++++++-------------
 1 file changed, 29 insertions(+), 42 deletions(-)

diff --git a/proxmox-access-control/src/token_shadow.rs b/proxmox-access-control/src/token_shadow.rs
index 270f3bfa..a8cd4209 100644
--- a/proxmox-access-control/src/token_shadow.rs
+++ b/proxmox-access-control/src/token_shadow.rs
@@ -164,43 +164,13 @@ pub fn verify_secret(tokenid: &Authid, secret: &str) -> Result<(), Error> {
 /// Generates a new secret for the given tokenid / API token, sets it then returns it.
 /// The secret is stored as salted hash.
 pub fn generate_and_set_secret(tokenid: &Authid) -> Result<String, Error> {
-    let secret = format!("{:x}", proxmox_uuid::Uuid::generate());
-
-    if !tokenid.is_token() {
-        bail!("not an API token ID");
-    }
-
-    let guard = lock_config()?;
-
-    // Capture state before we write to detect external edits.
-    let pre_meta = shadow_mtime_len().unwrap_or((None, None));
-
-    let mut data = read_file()?;
-    let hashed_secret = proxmox_sys::crypt::encrypt_pw(&secret)?;
-    data.insert(tokenid.clone(), hashed_secret);
-    write_file(data)?;
-
-    apply_api_mutation(guard, tokenid, Some(&secret), pre_meta);
-
-    Ok(secret)
+    apply_api_mutation(tokenid, true)?
+        .ok_or_else(|| format_err!("Failed to generate API token secret"))
 }
 
 /// Deletes the entry for the given tokenid.
 pub fn delete_secret(tokenid: &Authid) -> Result<(), Error> {
-    if !tokenid.is_token() {
-        bail!("not an API token ID");
-    }
-
-    let guard = lock_config()?;
-
-    // Capture state before we write to detect external edits.
-    let pre_meta = shadow_mtime_len().unwrap_or((None, None));
-
-    let mut data = read_file()?;
-    data.remove(tokenid);
-    write_file(data)?;
-
-    apply_api_mutation(guard, tokenid, None, pre_meta);
+    apply_api_mutation(tokenid, false)?;
 
     Ok(())
 }
@@ -293,12 +263,28 @@ fn cache_try_insert_secret(tokenid: Authid, secret: String, gen_before: usize) {
     }
 }
 
-fn apply_api_mutation(
-    _guard: ApiLockGuard,
-    tokenid: &Authid,
-    secret: Option<&str>,
-    pre_write_meta: (Option<SystemTime>, Option<u64>),
-) {
+fn apply_api_mutation(tokenid: &Authid, generate: bool) -> Result<Option<String>, Error> {
+    if !tokenid.is_token() {
+        bail!("not an API token ID");
+    }
+
+    let _guard = lock_config()?;
+
+    // Capture state before we write to detect external edits.
+    let pre_write_meta = shadow_mtime_len().unwrap_or((None, None));
+
+    let mut data = read_file()?;
+    let secret = if generate {
+        let secret = format!("{:x}", proxmox_uuid::Uuid::generate());
+        let hashed_secret = proxmox_sys::crypt::encrypt_pw(&secret)?;
+        data.insert(tokenid.clone(), hashed_secret);
+        Some(secret)
+    } else {
+        data.remove(tokenid);
+        None
+    };
+    write_file(data)?;
+
     let now = epoch_i64();
 
     // Signal cache invalidation to other processes (best-effort).
@@ -308,14 +294,14 @@ fn apply_api_mutation(
     // If we cannot get the current generation, we cannot trust the cache
     let Some(current_gen) = token_shadow_generation() else {
         cache.reset_and_set_gen(0);
-        return;
+        return Ok(secret);
     };
 
     // If we cannot bump the generation, or if it changed after
     // obtaining the cache write lock, we cannot trust the cache
     if bumped_gen != Some(current_gen) {
         cache.reset_and_set_gen(current_gen);
-        return;
+        return Ok(secret);
     }
 
     // If our cached file metadata does not match the on-disk state before our write,
@@ -329,7 +315,7 @@ fn apply_api_mutation(
     }
 
     // Apply the new mutation.
-    match secret {
+    match &secret {
         Some(secret) => {
             let cached_secret = CachedSecret {
                 secret: secret.to_owned(),
@@ -354,6 +340,7 @@ fn apply_api_mutation(
             cache.reset_and_set_gen(current_gen);
         }
     }
+    Ok(secret)
 }
 
 /// Get the current generation.
-- 
2.47.3





^ permalink raw reply related	[relevance 15%]

* [PATCH proxmox-datacenter-manager v8 1/3] pdm-config: implement access control backend hooks
  2026-04-09 15:54 17% [PATCH proxmox{,-datacenter-manager} v8 0/9] " Samuel Rufinatscha
                   ` (5 preceding siblings ...)
  2026-04-09 15:54 15% ` [PATCH proxmox v8 6/6] token shadow: deduplicate more code into apply_api_mutation Samuel Rufinatscha
@ 2026-04-09 15:54 14% ` Samuel Rufinatscha
  2026-04-09 15:54 16% ` [PATCH proxmox-datacenter-manager v8 2/3] pdm-config: wire user and ACL cache generation Samuel Rufinatscha
  2026-04-09 15:54 16% ` [PATCH proxmox-datacenter-manager v8 3/3] pdm-config: wire token.shadow generation Samuel Rufinatscha
  8 siblings, 0 replies; 117+ results
From: Samuel Rufinatscha @ 2026-04-09 15:54 UTC (permalink / raw)
  To: pbs-devel

Implement AccessControlBackend in pdm-config and move
init_user_config() there from the ACL config in pdm-api-types.

Update server and admin initialization to pass ACL config and backend
separately.

Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
---
 cli/admin/src/main.rs                |  3 ++-
 lib/pdm-api-types/src/acl.rs         | 26 +------------------------
 lib/pdm-config/Cargo.toml            |  1 +
 lib/pdm-config/src/access_control.rs | 29 ++++++++++++++++++++++++++++
 lib/pdm-config/src/lib.rs            |  2 ++
 server/src/acl.rs                    | 10 ++++++++--
 6 files changed, 43 insertions(+), 28 deletions(-)
 create mode 100644 lib/pdm-config/src/access_control.rs

diff --git a/cli/admin/src/main.rs b/cli/admin/src/main.rs
index f698fa2..d51f211 100644
--- a/cli/admin/src/main.rs
+++ b/cli/admin/src/main.rs
@@ -18,8 +18,9 @@ fn main() {
     let priv_user = pdm_config::priv_user().expect("cannot get privileged user");
     proxmox_product_config::init(api_user, priv_user);
 
-    proxmox_access_control::init::init(
+    proxmox_access_control::init::init_separate(
         &pdm_api_types::AccessControlConfig,
+        &pdm_config::AccessControlBackend,
         pdm_buildcfg::configdir!("/access"),
     )
     .expect("failed to setup access control config");
diff --git a/lib/pdm-api-types/src/acl.rs b/lib/pdm-api-types/src/acl.rs
index 405982a..0868f3d 100644
--- a/lib/pdm-api-types/src/acl.rs
+++ b/lib/pdm-api-types/src/acl.rs
@@ -2,17 +2,15 @@ use std::collections::HashMap;
 use std::str::FromStr;
 use std::sync::LazyLock;
 
-use anyhow::{format_err, Context, Error};
+use anyhow::{format_err, Error};
 use const_format::concatcp;
 use serde::de::{value, IntoDeserializer};
 use serde::{Deserialize, Serialize};
 
-use proxmox_access_control::types::User;
 use proxmox_auth_api::types::Authid;
 use proxmox_lang::constnamedbitmap;
 use proxmox_schema::api_types::SAFE_ID_REGEX_STR;
 use proxmox_schema::{api, const_regex, ApiStringFormat, BooleanSchema, Schema, StringSchema};
-use proxmox_section_config::SectionConfigData;
 
 const_regex! {
     pub ACL_PATH_REGEX = concatcp!(r"^(?:/|", r"(?:/", SAFE_ID_REGEX_STR, ")+", r")$");
@@ -224,28 +222,6 @@ impl proxmox_access_control::init::AccessControlConfig for AccessControlConfig {
         Some("Administrator")
     }
 
-    fn init_user_config(&self, config: &mut SectionConfigData) -> Result<(), Error> {
-        if !config.sections.contains_key("root@pam") {
-            config
-                .set_data(
-                    "root@pam",
-                    "user",
-                    User {
-                        userid: "root@pam".parse().expect("invalid user id"),
-                        comment: Some("Superuser".to_string()),
-                        enable: None,
-                        expire: None,
-                        firstname: None,
-                        lastname: None,
-                        email: None,
-                    },
-                )
-                .context("failed to insert default user into user config")?
-        }
-
-        Ok(())
-    }
-
     fn acl_audit_privileges(&self) -> u64 {
         PRIV_ACCESS_AUDIT
     }
diff --git a/lib/pdm-config/Cargo.toml b/lib/pdm-config/Cargo.toml
index d39c2ad..19781d2 100644
--- a/lib/pdm-config/Cargo.toml
+++ b/lib/pdm-config/Cargo.toml
@@ -13,6 +13,7 @@ once_cell.workspace = true
 openssl.workspace = true
 serde.workspace = true
 
+proxmox-access-control.workspace = true
 proxmox-config-digest = { workspace = true, features = [ "openssl" ] }
 proxmox-http = { workspace = true, features = [ "http-helpers" ] }
 proxmox-ldap = { workspace = true, features = [ "types" ]}
diff --git a/lib/pdm-config/src/access_control.rs b/lib/pdm-config/src/access_control.rs
new file mode 100644
index 0000000..0c17c99
--- /dev/null
+++ b/lib/pdm-config/src/access_control.rs
@@ -0,0 +1,29 @@
+use anyhow::{Context, Error};
+use proxmox_access_control::types::User;
+use proxmox_section_config::SectionConfigData;
+
+pub struct AccessControlBackend;
+
+impl proxmox_access_control::init::AccessControlBackend for AccessControlBackend {
+    fn init_user_config(&self, config: &mut SectionConfigData) -> Result<(), Error> {
+        if !config.sections.contains_key("root@pam") {
+            config
+                .set_data(
+                    "root@pam",
+                    "user",
+                    User {
+                        userid: "root@pam".parse().expect("invalid user id"),
+                        comment: Some("Superuser".to_string()),
+                        enable: None,
+                        expire: None,
+                        firstname: None,
+                        lastname: None,
+                        email: None,
+                    },
+                )
+                .context("failed to insert default user into user config")?
+        }
+
+        Ok(())
+    }
+}
diff --git a/lib/pdm-config/src/lib.rs b/lib/pdm-config/src/lib.rs
index 4c49054..6e5e760 100644
--- a/lib/pdm-config/src/lib.rs
+++ b/lib/pdm-config/src/lib.rs
@@ -9,6 +9,8 @@ pub mod remotes;
 pub mod setup;
 pub mod views;
 
+mod access_control;
+pub use access_control::AccessControlBackend;
 mod config_version_cache;
 pub use config_version_cache::ConfigVersionCache;
 
diff --git a/server/src/acl.rs b/server/src/acl.rs
index f421814..4150ef4 100644
--- a/server/src/acl.rs
+++ b/server/src/acl.rs
@@ -1,7 +1,13 @@
 pub(crate) fn init() {
     static ACCESS_CONTROL_CONFIG: pdm_api_types::AccessControlConfig =
         pdm_api_types::AccessControlConfig;
+    static ACCESS_CONTROL_BACKEND: pdm_config::AccessControlBackend =
+        pdm_config::AccessControlBackend;
 
-    proxmox_access_control::init::init(&ACCESS_CONTROL_CONFIG, pdm_buildcfg::configdir!("/access"))
-        .expect("failed to setup access control config");
+    proxmox_access_control::init::init_separate(
+        &ACCESS_CONTROL_CONFIG,
+        &ACCESS_CONTROL_BACKEND,
+        pdm_buildcfg::configdir!("/access"),
+    )
+    .expect("failed to setup access control config");
 }
-- 
2.47.3





^ permalink raw reply related	[relevance 14%]

* [pbs-devel] superseded: [PATCH proxmox{-backup,,-datacenter-manager} v7 00/11] token-shadow: reduce api token verification overhead
  2026-03-12 10:36 14% [PATCH proxmox{-backup,,-datacenter-manager} v7 " Samuel Rufinatscha
                   ` (11 preceding siblings ...)
  2026-03-19 12:26  5% ` partially-applied: [PATCH proxmox{-backup,,-datacenter-manager} v7 00/11] token-shadow: reduce api token verification overhead Fabian Grünbichler
@ 2026-04-09 15:58 13% ` Samuel Rufinatscha
  12 siblings, 0 replies; 117+ results
From: Samuel Rufinatscha @ 2026-04-09 15:58 UTC (permalink / raw)
  To: pbs-devel

https://lore.proxmox.com/pbs-devel/20260409155437.312760-1-s.rufinatscha@proxmox.com/T/#t

On 3/12/26 11:36 AM, Samuel Rufinatscha wrote:
> Hi,
> 
> this series improves the performance of token-based API authentication
> in PBS (pbs-config) and in PDM (underlying proxmox-access-control
> crate), addressing the API token verification hotspot reported in our
> bugtracker #7017 [1].
> 
> When profiling PBS /status endpoint with cargo flamegraph [2],
> token-based authentication showed up as a dominant hotspot via
> proxmox_sys::crypt::verify_crypt_pw. Applying this series removes that
> path from the hot section of the flamegraph. The same performance issue
> was measured [2] for PDM. PDM uses the underlying shared
> proxmox-access-control library for token handling, which is a
> factored out version of the token.shadow handling code from PBS.
> 
> While this series fixes the immediate performance issue both in PBS
> (pbs-config) and in the shared proxmox-access-control crate used by
> PDM, PBS should eventually, ideally be refactored, in a separate
> effort, to use proxmox-access-control for token handling instead of its
> local implementation.
> 
> Approach
> 
> The goal is to reduce the cost of token-based authentication preserving
> the existing token handling semantics (including detecting manual edits
> to token.shadow) and be consistent between PBS (pbs-config) and
> PDM (proxmox-access-control). For both sites, this series proposes to:
> 
> 1. Introduce an in-memory cache for verified token secrets and
> invalidate it through a shared ConfigVersionCache generation. Note, a
> shared generation is required to keep privileged and unprivileged
> daemon in sync to avoid caching inconsistencies across processes.
> 2. Invalidate on token.shadow API changes (set_secret,
> delete_secret)
> 3. Invalidate on direct/manual token.shadow file changes (mtime +
> length)
> 4. Avoid per-request file stat calls using a TTL window
> 
> Testing
> 
> To verify the effect in PBS (pbs-config changes), I:
> 1. Set up test environment based on latest PBS ISO, installed Rust
>     toolchain, cloned proxmox-backup repository to use with cargo
>     flamegraph. Reproduced bug #7017 [1] by profiling the /status
>     endpoint with token-based authentication using cargo flamegraph [2].
> 2. Built PBS with pbs-config patches and re-ran the same workload and
>     profiling setup. Confirmed that
>     proxmox_sys::crypt::verify_crypt_pw path no longer appears in the
>     hot section of the flamegraph. CPU usage is now dominated by TLS
>     overhead.
> 3. Functionally-wise, I verified that:
>     * valid tokens authenticate correctly when used in API requests
>     * invalid secrets are rejected as before
>     * generating a new token secret via dashboard (create token for
>     user, regenerate existing secret) works and authenticates correctly
> 
> To verify the effect in PDM (proxmox-access-control changes), instead
> of PBS’ /status, I profiled the /version endpoint with cargo flamegraph
> [2] and verified that the expensive hashing path disappears from the
> hot section after introducing caching. Functionally-wise, I verified
> that:
>     * valid tokens authenticate correctly when used in API requests
>     * invalid secrets are rejected as before
>     * generating a new token secret via dashboard (create token for user,
>     regenerate existing secret) works and authenticates correctly
> 
> Results
> 
> To measure caching effect I benchmarked parallel token auth requests
> for /status?verbose=0 on top of the datastore lookup cache series [3]
> to check throughput impact. With datastores=1, repeat=5000, parallel=16
> this series gives ~172 req/s compared to ~65 req/s without it.
> This is a ~2.6x improvement (and aligns with the ~179 req/s from the
> previous series, which used per-process cache invalidation).
> 
> Patch summary
> 
> pbs-config:
> 0001 – pbs-config: add token.shadow generation to ConfigVersionCache
> 0002 – pbs-config: cache verified API token secrets
> 0003 – pbs-config: invalidate token-secret cache on token.shadow
> changes
> 0004 – pbs-config: add TTL window to token-secret cache
> 
> proxmox-access-control:
> 0005 – access-control: extend AccessControlConfig for token.shadow invalidation
> 0006 – access-control: cache verified API token secrets
> 0007 – access-control: invalidate token-secret cache on token.shadow changes
> 0008 – access-control: add TTL window to token-secret cache
> 
> proxmox-datacenter-manager:
> 0009 – pdm-config: add token.shadow generation to ConfigVersionCache
> 0010 – docs: document API token-cache TTL effects
> 0011 – pdm-config: wire user+acl cache generation
> 
> Maintainer Notes:
> * proxmox-access-control trait split: permissions now live in
>   AccessControlPermissions, and AccessControlConfig now requires
>   fn permissions(&self) -> &dyn AccessControlPermissions ->
>   version bump
> * Renames ConfigVersionCache`s pub user_cache_generation and
>   increase_user_cache_generation -> version bump
> * Adds parking_lot::RwLock dependency in PBS and proxmox-access-control
> 
> This version and the version before only incorporate the reviewers'
> feedback [4][5][6], also please consider Christian's R-b tag [4].
> 
> [1] https://bugzilla.proxmox.com/show_bug.cgi?id=7017
> [2] attachment 1767 [1]: Flamegraph showing the proxmox_sys::crypt::verify_crypt_pw stack
> [3] https://bugzilla.proxmox.com/show_bug.cgi?id=6049
> [4] https://lore.proxmox.com/pbs-devel/20260121151408.731516-1-s.rufinatscha@proxmox.com/T/#t
> [5] https://lore.proxmox.com/pbs-devel/20260217111229.78661-1-s.rufinatscha@proxmox.com/T/#t
> [6] https://lore.proxmox.com/pbs-devel/725687dd-5a35-41ed-af62-6dc9f062cbd4@proxmox.com/T/#t
> 
> proxmox-backup:
> 
> Samuel Rufinatscha (4):
>    pbs-config: add token.shadow generation to ConfigVersionCache
>    pbs-config: cache verified API token secrets
>    pbs-config: invalidate token-secret cache on token.shadow changes
>    pbs-config: add TTL window to token secret cache
> 
>   Cargo.toml                             |   1 +
>   docs/user-management.rst               |   4 +
>   pbs-config/Cargo.toml                  |   1 +
>   pbs-config/src/config_version_cache.rs |  18 ++
>   pbs-config/src/token_shadow.rs         | 314 ++++++++++++++++++++++++-
>   5 files changed, 335 insertions(+), 3 deletions(-)
> 
> 
> proxmox:
> 
> Samuel Rufinatscha (4):
>    proxmox-access-control: split AccessControlConfig and add token.shadow
>      gen
>    proxmox-access-control: cache verified API token secrets
>    proxmox-access-control: invalidate token-secret cache on token.shadow
>      changes
>    proxmox-access-control: add TTL window to token secret cache
> 
>   Cargo.toml                                 |   1 +
>   proxmox-access-control/Cargo.toml          |   1 +
>   proxmox-access-control/src/acl.rs          |  10 +-
>   proxmox-access-control/src/init.rs         | 113 ++++++--
>   proxmox-access-control/src/token_shadow.rs | 315 ++++++++++++++++++++-
>   5 files changed, 413 insertions(+), 27 deletions(-)
> 
> 
> proxmox-datacenter-manager:
> 
> Samuel Rufinatscha (3):
>    pdm-config: implement token.shadow generation
>    docs: document API token-cache TTL effects
>    pdm-config: wire user+acl cache generation
> 
>   cli/admin/src/main.rs                      |  2 +-
>   docs/access-control.rst                    |  4 +++
>   lib/pdm-api-types/src/acl.rs               |  4 +--
>   lib/pdm-config/Cargo.toml                  |  1 +
>   lib/pdm-config/src/access_control.rs       | 31 ++++++++++++++++++++
>   lib/pdm-config/src/config_version_cache.rs | 34 +++++++++++++++++-----
>   lib/pdm-config/src/lib.rs                  |  2 ++
>   server/src/acl.rs                          |  3 +-
>   ui/src/main.rs                             | 10 ++++++-
>   9 files changed, 77 insertions(+), 14 deletions(-)
>   create mode 100644 lib/pdm-config/src/access_control.rs
> 
> 
> Summary over all repositories:
>    19 files changed, 825 insertions(+), 44 deletions(-)
> 





^ permalink raw reply	[relevance 13%]

* Re: [PATCH widget-toolkit] confirm remove dialog: improve layout for larger fields/components
  @ 2026-04-29 11:04 15% ` Samuel Rufinatscha
  0 siblings, 0 replies; 117+ results
From: Samuel Rufinatscha @ 2026-04-29 11:04 UTC (permalink / raw)
  To: Dominik Csapak, pbs-devel

On 4/29/26 12:11 PM, Dominik Csapak wrote:
> If there are larger fields e.g. in the additionalItems property, the
> current layout configuration leads to cut-off text and fields.
> 
> To fix this, apply 'flex: 1' to the overall body and actual content
> component (so they adapt their size) and use 'align: stretch' for the
> content vbox (so the elements adapt to the size of the other content
> elements).
> 
> With that, these larger elements (e.g. a hint box) will either shrink
> or wrap their text.
> 
> Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
> ---
>   src/window/ConfirmRemoveDialog.js | 7 ++++++-
>   1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/src/window/ConfirmRemoveDialog.js b/src/window/ConfirmRemoveDialog.js
> index 94f1e84..5f76e47 100644
> --- a/src/window/ConfirmRemoveDialog.js
> +++ b/src/window/ConfirmRemoveDialog.js
> @@ -133,6 +133,7 @@ Ext.define('Proxmox.window.ConfirmRemoveDialog', {
>           }
>   
>           let body = {
> +            flex: 1,
>               xtype: 'container',
>               layout: 'hbox',
>               items: [
> @@ -149,8 +150,12 @@ Ext.define('Proxmox.window.ConfirmRemoveDialog', {
>           }
>   
>           let content = {
> +            flex: 1,
>               xtype: 'container',
> -            layout: 'vbox',
> +            layout: {
> +                type: 'vbox',
> +                align: 'stretch',
> +            },
>               items: [
>                   {
>                       xtype: 'component',

Thanks for the patch, I tested it and it fixes the mentioned issue!

Reviewed-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
Tested-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>





^ permalink raw reply	[relevance 15%]

* Re: [PATCH v2 storage 07/15] iscsi: introduce helper to update discovery db
  @ 2026-05-11 16:46  6%   ` Samuel Rufinatscha
  0 siblings, 0 replies; 117+ results
From: Samuel Rufinatscha @ 2026-05-11 16:46 UTC (permalink / raw)
  To: Mira Limbeck, pve-devel

On 4/30/26 7:32 PM, Mira Limbeck wrote:
> In the case of using mappings, running a discovery against the
> configured portals could lead to lots of additional node entries, as can
> be seen with the original iSCSI plugin way.
> 
> Add a discovery db update helper that adds and removes node entries
> based on what is configured via mappings.
> For the non-mapping case a discovery is done. If no portal entries are
> returned, only the portal from the storage config is kept for the
> target.
> 
> Also adds a helper to gather all node entries.
> 
> Signed-off-by: Mira Limbeck <m.limbeck@proxmox.com>
> ---
>   src/PVE/Storage/ISCSIPlugin.pm | 164 +++++++++++++++++++++++++++++++++
>   1 file changed, 164 insertions(+)
> 
> diff --git a/src/PVE/Storage/ISCSIPlugin.pm b/src/PVE/Storage/ISCSIPlugin.pm
> index 8f2bdb5..0f77f3f 100644
> --- a/src/PVE/Storage/ISCSIPlugin.pm
> +++ b/src/PVE/Storage/ISCSIPlugin.pm
> @@ -79,6 +79,36 @@ my $get_local_config = sub {
>       }
>   };
>   
> +sub iscsi_node_list {
> +    assert_iscsi_support();
> +
> +    my $cmd = [$ISCSIADM, '--mode', 'node'];
> +
> +    my $res = {};
> +    eval {
> +        run_command(
> +            $cmd,
> +            errmsg => 'iscsi node scan failed',
> +            outfunc => sub {
> +                my $line = shift;
> +
> +                # example: 10.67.1.144:3260,4294967295 iqn.2003-01.org.linux-iscsi.iscsi.x8664:sn.81bb080df375
> +                if ($line =~ m/^$ISCSI_TARGET_RE$/) {
> +                    my ($portal, $target) = ($1, $2);
> +
> +                    push $res->{$target}->@*, $portal;
> +                }
> +            },
> +        );
> +    };
> +
> +    if (my $err = $@) {
> +        die $err if $err !~ m/: No records found$/i;
> +    }
> +
> +    return $res;
> +}
> +
>   sub iscsi_session_list {
>       assert_iscsi_support();
>   
> @@ -368,6 +398,140 @@ sub iscsi_device_list {
>       return $res;
>   }
>   
> +sub update_iscsi_discovery_db {
> +    my ($local_cfg, $cache) = @_;
> +
> +    if ($local_cfg->{is_mapping}) {
> +        # do mapping specific discoverydb update
> +
> +        my $host_targets = iscsi_node_list();

AFAICT this returns all local iscsiadm node entries

> +        my $mapped_targets = $local_cfg->{targets};
> +        my $added = {};
> +        my $removed = {};
> +
> +        for my $host_target (keys $host_targets->%*) {
> +            if (!defined($mapped_targets->{$host_target})) {

Could this therefore mark entries from another iSCSI storage/mapping as
stale and remove them below?

> +                # add all portals of that target to be removed
> +                $removed->{$host_target} = $host_targets->{$host_target};
> +            } else {
> +                for my $host_target_portal ($host_targets->{$host_target}->@*) {
> +                    if (!grep(/^\Q$host_target_portal\E$/, $mapped_targets->{$host_target}->@*)) {
> +                        push $removed->{$host_target}->@*, $host_target_portal;
> +                    }
> +                }
> +            }
> +        }
> +
> +        for my $mapped_target (keys $mapped_targets->%*) {
> +            if (!defined($host_targets->{$mapped_target})) {
> +                # add all portals of that target to be added
> +                $added->{$mapped_target} = $mapped_targets->{$mapped_target};
> +            } else {
> +                for my $mapped_target_portal ($mapped_targets->{$mapped_target}->@*) {
> +                    if (
> +                        !grep(/^\Q$mapped_target_portal\E$/,
> +                            $host_targets->{$mapped_target}->@*)
> +                    ) {
> +                        push $added->{$mapped_target}->@*, $mapped_target_portal;
> +                    }
> +                }
> +            }
> +        }
> +
> +        # remove stale entries
> +        for my $target (keys $removed->%*) {
> +            my $target_sessions = iscsi_session($cache, $target);
> +            my $sessions = {};
> +            for my $target_session ($target_sessions->@*) {
> +                $sessions->{ $target_session->{portal} } = $target_session->{session_id};
> +            }
> +
> +            my $removed_sessions = [];
> +            for my $portal ($removed->{$target}->@*) {
> +                # log out of session before removing the stale node entry
> +                # iscsiadm returns an error otherwise
> +                print "removing stale iscsi session: $target via $portal\n";
> +                if (defined($sessions->{$portal})) {
> +                    my $cmd = [
> +                        $ISCSIADM, '--mode', 'session', '--sid', $sessions->{$portal},
> +                        '--logout',
> +                    ];
> +                    eval { run_command($cmd); };
> +                    warn "failed to log out of stale session: $@\n" if $@;
> +
> +                    # remove logged out sessions
> +                    # the list of still active sessions will be used to update cache later
> +                    delete $sessions->{$portal};
> +                }
> +                print "removing stale iscsi target entry: $target via $portal\n";
> +                my $cmd = [
> +                    $ISCSIADM,
> +                    '--mode',
> +                    'node',
> +                    '--target',
> +                    $target,
> +                    '--portal',
> +                    $portal,
> +                    '-o',
> +                    'delete',
> +                ];
> +                eval { run_command($cmd); };
> +                warn "failed to remove stale node entry: $@\n" if $@;
> +            }
> +            # update cache with list of active sessions
> +            $cache->{iscsi_sessions}->{$target} =
> +                [map { { portal => $_, session_id => $sessions->{$_} } } keys $sessions->%*];
> +        }
> +
> +        # add new mapping entries
> +        for my $target (keys $added->%*) {
> +            for my $portal ($added->{$target}->@*) {
> +                print "adding new iscsi target entry: $target via $portal\n";
> +                my $cmd = [
> +                    $ISCSIADM,
> +                    '--mode',
> +                    'node',
> +                    '--target',
> +                    $target,
> +                    '--portal',
> +                    $portal,
> +                    '-o',
> +                    'new',
> +                ];
> +                eval { run_command($cmd); };
> +                warn "failed to add new node entry: $@\n" if $@;
> +            }
> +        }
> +
> +    } else {
> +        my $portals = undef;
> +        my $target = undef;
> +        for my $config_target (keys $local_cfg->{targets}->%*) {
> +            $target = $config_target;
> +            $portals = iscsi_portals($config_target, $local_cfg->{targets}->{$config_target}->[0]);
> +            last;
> +        }
> +        die "no target found for discovery\n" if !defined($target);
> +        die "no portals found for discovery\n" if !defined($portals);
> +
> +        my $res = eval { iscsi_discovery($target, $portals, $cache); };
> +        warn $@ if $@;
> +
> +        if (defined($res->{$target})) {
> +            $local_cfg->{targets}->{$target} = [map { $_->{portal} } $res->{$target}->@*];
> +        } else {
> +            # no portal is discovered, keep already configured node entries
> +            # in that case, otherwise we might remove sessions still in use
> +            #
> +            # cleanup in that case should be done manually after verifying that
> +            # active sessions are no longer needed
> +            $local_cfg->{targets}->{$target} = $portals;
> +        }
> +    }
> +
> +    return 1;
> +}
> +
>   # Configuration
>   
>   sub type {
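
To illustrate the concern raised inline above, here is a minimal sketch of
the add/remove comparison, written in Python rather than Perl so it can be
run standalone. The function name diff_node_entries, the IQNs and the portal
strings are hypothetical; it assumes, as the patch appears to, that the host
list contains every local iscsiadm node entry and not only those of the
mapping being processed.

```python
def diff_node_entries(host_targets, mapped_targets):
    """Compare local node entries with one mapping's configured entries.

    Both arguments map a target IQN to a list of portal strings.
    Returns (added, removed) in the same shape.
    """
    added, removed = {}, {}

    # Entries on the host that this mapping does not configure.
    for target, portals in host_targets.items():
        wanted = mapped_targets.get(target)
        if wanted is None:
            removed[target] = list(portals)
            continue
        stale = [p for p in portals if p not in wanted]
        if stale:
            removed[target] = stale

    # Entries this mapping configures that the host lacks.
    for target, portals in mapped_targets.items():
        present = host_targets.get(target, [])
        missing = [p for p in portals if p not in present]
        if missing:
            added[target] = missing

    return added, removed


# Hypothetical host state: node entries belonging to two different storages.
host = {
    "iqn.2003-01.example:storage-a": ["10.0.0.1:3260"],
    "iqn.2003-01.example:storage-b": ["10.0.0.2:3260"],
}
# The mapping currently being processed only knows about storage-a.
mapping_a = {"iqn.2003-01.example:storage-a": ["10.0.0.1:3260"]}

added, removed = diff_node_entries(host, mapping_a)
print(added)    # {}
print(removed)  # {'iqn.2003-01.example:storage-b': ['10.0.0.2:3260']}
```

If the comparison really works on the unfiltered host list, the storage-b
entry ends up in the removal set even though it belongs to another storage.
Restricting the host list to targets owned by the current mapping before
diffing might avoid that, but that is only a suggestion.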





^ permalink raw reply	[relevance 6%]

* Re: [PATCH v2 cluster/storage/manager 00/15] storage mapping
    @ 2026-05-11 17:05 13% ` Samuel Rufinatscha
  1 sibling, 0 replies; 117+ results
From: Samuel Rufinatscha @ 2026-05-11 17:05 UTC (permalink / raw)
  To: Mira Limbeck, pve-devel

Tested the series on an existing 3-node cluster: two nodes were patched,
and the third (unpatched) node served a targetcli-fb iSCSI target.

The iSCSI mappings were replicated via pmxcfs and the new API endpoint
returned the mappings. pvestatd activated mapped iSCSI storage
without errors on both nodes. Changing one node's mapping to another
target removed the stale session entry and added the new one. A failed
login to a non-existing target was logged but did not crash pvestatd.
Non-persistent discovery left the discovery DB unchanged.

For the ZFSPool POC, I tested file-backed pools with different names on
both nodes. A VM was created on the local mapped pool and replication
to the other node completed successfully with the dataset appearing in
the other mapped pool.

Tested-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>


On 4/30/26 7:32 PM, Mira Limbeck wrote:
> This patch series is the second iteration of storage mapping support.
> The first iteration can be found under [1].
> 
> What is included:
> * new mapping base plugin
> * new iscsi mapping plugin
> * reworked iscsi plugin to support mappings
> * api support for creating and updating mappings
> * optional non-persistent discovery
> * (optional) cleanup for leftover node entries that are no longer discovered
> * POC zfspool mapping plugin
> 
> What is missing:
> * fix for pvesh to support oneOf schemas
> * GUI for handling mappings
> * additional mapping plugins
> 
> 
> Some patches need to be applied in a specific order:
> pve-cluster > pve-storage > pve-manager
> 
> The pve-cluster patch adds support for mapping/storage.cfg in /etc/pve.
> This is required by the pve-storage changes, they won't compile
> otherwise.
> The pve-manager patch has to be applied after the API additions of
> pve-storage, otherwise it won't compile since it adds an API endpoint
> that forwards to the newly introduced pve-storage API.
> 
> 
> The idea behind mapping support:
> This stems mainly from iSCSI plugin limitations we've seen in support.
> The current iSCSI storage plugin assumes that the central part of its
> config is the `target`. It assumes there is only one target with one or
> more portals.
> But we've seen (mostly) proprietary SANs expose those LUNs in a
> slightly different way. Each LUN is exposed via a different target via a
> different portal.
> This is something the current iSCSI plugin does not handle nicely.
> Sometimes those targets and portals are even different on different
> Proxmox VE nodes, since not all portals will be reachable from every
> node.
> 
> To manage such setups, the idea of cluster-wide storage mappings was
> born. With this, rather than having one storage with each
> target/multiple targets and all the portals, we can now specify exactly
> which node has which portals and targets. And those are all combined
> into a `logical mapping target`, which can be used cluster-wide and will
> be resolved on each node separately.
> 
> Even though the idea came based on limitations of the iSCSI plugin, it
> can be used for other plugins as well, see the POC for the zfspool
> plugin.
> The idea is to create a base that would work for all storages where
> mapping makes sense.
> 
> How to test:
> 
> *iSCSI*:
> 
> Setup:
> * at least 2 Proxmox VE hosts
> * iSCSI target on one of the Proxmox VE hosts, or separate
> 
> targetcli [2] makes setting up iSCSI easy [3]
> 
> Create a mapping:
> * use the API via curl/something else
> * use pvesh (can't specify --map since oneOf schema is not supported)
> 
> An example mapping config might look like this:
> # cat /etc/pve/mapping/storage.cfg
> iscsi: logicaltarget
> 	discovery-portals 10.67.0.144,10.67.1.144
> 	map node=iscsi-test,portals=10.67.1.144;10.67.2.144;10.67.3.144:3260,target=iqn.2003-01.org.linux-iscsi.iscsi.x8664:sn.81bb080df375
> 	map node=iscsi-test,portals=10.67.4.144;10.67.5.144;10.67.8.144,target=iqn.2003-01.org.linux-iscsi.iscsi.x8664:sn.81bb080df375
> 	map node=iscsi-test2,portals=10.67.1.144;10.67.2.144;10.67.3.144:3260,target=iqn.2003-01.org.linux-iscsi.iscsi.x8664:sn.81bb080df375
> 
> And the corresponding /etc/pve/storage.cfg would look like this:
> iscsi: iscsi
> 	mapping logicaltarget
> 
> On the next iteration of pvestatd it should pick it up and log in to all
> configured portals when possible.
> This can be checked with:
> `iscsiadm -m session`
> 
> `iscsiadm -m node` will contain all configured entries, and those will
> be updated on the next pvestatd iteration after a change to the mapping
> config.
> 
> For the regular iSCSI storage config, everything should stay the same. A
> discovery is done every iteration now though.
> 
> With optional patch 09/15 it will clean up stale node entries that are no
> longer announced on discovery. This can be tested by removing mapped
> luns in targetcli under `acls`.
> 
> 
> *ZFS*:
> 
> Setup:
> * at least 2 Proxmox VE hosts
> * both hosts with different zpool names
> 
> Create a mapping:
> * same as iSCSI, but with `zfspool` storage type
> * map only accepts `pool` and `node` as options
> 
> To test:
> * set up replication and see if it works with different zpool names on
>    each host
> 
> 
> 
> [1] https://lore.proxmox.com/all/20251110170124.3460419-1-m.limbeck@proxmox.com/
> [2] https://packages.debian.org/trixie/targetcli-fb
> [3] https://wiki.archlinux.org/title/ISCSI/LIO
> 
> 
> v2:
>   - split up patch series
>   - fixed discovery in mapping case, entries are now added manually based
>     on the config
>   - added discovery-portals for future GUI usability
>   - added optional cleanup for stale node entries for regular iSCSI storages
>   - added POC ZFSPool mapping plugin with support for replication between
>     zpools with different names
> 
> 
> pve-cluster:
> 
> Mira Limbeck (1):
>    mapping: add storage.cfg
> 
>   src/PVE/Cluster.pm  | 1 +
>   src/pmxcfs/status.c | 1 +
>   2 files changed, 2 insertions(+)
> 
> pve-storage:
> 
> Mira Limbeck (13):
>    mapping: add base plugin
>    mapping: add iSCSI plugin
>    iscsi: introduce mapping support
>    iscsi: add helper to get local config
>    iscsi: change functions to handle mappings
>    iscsi: introduce helper to update discovery db
>    iscsi: rework to update discovery db and simplify login
>    iscsi: remove stale sessions in non-mapping case
>    api: add mapping support
>    mapping: iscsi: add discovery-portal config option
>    iscsi: add support for non-persistent discovery
>    api: add non-persistent iscsi discovery option
>    mapping: add zfspool poc
> 
>   src/PVE/API2/Storage/Makefile      |   2 +-
>   src/PVE/API2/Storage/Mapping.pm    | 213 +++++++++++++++
>   src/PVE/API2/Storage/Scan.pm       |  32 ++-
>   src/PVE/CLI/pvesm.pm               |   3 +-
>   src/PVE/Storage.pm                 |   5 +-
>   src/PVE/Storage/ISCSIPlugin.pm     | 399 +++++++++++++++++++++++++----
>   src/PVE/Storage/Makefile           |   4 +-
>   src/PVE/Storage/Mapping.pm         |  46 ++++
>   src/PVE/Storage/Mapping/ISCSI.pm   |  59 +++++
>   src/PVE/Storage/Mapping/Makefile   |   8 +
>   src/PVE/Storage/Mapping/Plugin.pm  |  90 +++++++
>   src/PVE/Storage/Mapping/ZFSPool.pm |  48 ++++
>   src/PVE/Storage/Plugin.pm          |   6 +
>   src/PVE/Storage/ZFSPoolPlugin.pm   | 133 +++++++---
>   14 files changed, 957 insertions(+), 91 deletions(-)
>   create mode 100644 src/PVE/API2/Storage/Mapping.pm
>   create mode 100644 src/PVE/Storage/Mapping.pm
>   create mode 100644 src/PVE/Storage/Mapping/ISCSI.pm
>   create mode 100644 src/PVE/Storage/Mapping/Makefile
>   create mode 100644 src/PVE/Storage/Mapping/Plugin.pm
>   create mode 100644 src/PVE/Storage/Mapping/ZFSPool.pm
> 
> pve-manager:
> 
> Mira Limbeck (1):
>    api: mapping: add storage mapping path
> 
>   PVE/API2/Cluster/Mapping.pm | 8 +++++++-
>   1 file changed, 7 insertions(+), 1 deletion(-)
> 
> 





^ permalink raw reply	[relevance 13%]

* Re: [PATCH manager/pmg-api/proxmox{,-backup,-perl-rs,-offline-mirror} 0/8] adapt subscription handling to alternative server IDs
  @ 2026-05-18 16:02 13% ` Samuel Rufinatscha
  0 siblings, 0 replies; 117+ results
From: Samuel Rufinatscha @ 2026-05-18 16:02 UTC (permalink / raw)
  To: Fabian Grünbichler, pve-devel

Please find below a summary of the tests I performed:

PVE:
- existing SshMd5 subscription stays "active" after the patch
- update using --force on a pre-existing subscription works
- delete + set without reissue is rejected with "Invalid Server ID"
   as expected, since the patched client sends MachineId
- After reissue, delete + set binds the subscription to the new
   MachineId
- Regenerating the SSH host keys on a MachineId-bound subscription
   does not change its status, no reissue needed

PMG: tested the same scenarios as PVE

POM:
- existing SshMd5 mirror subscription stays "active" after the
   patch
- key refresh on a pre-existing subscription works
- key remove + key add-mirror-key without reissue is rejected with
   "Invalid Server ID" as expected

For PDM, I tested whether the subscription of the patched
remote shows up as active in the PDM system report.
Tested using a Community Subscription.

I didn't run into any errors and didn't notice anything that looks off.
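
The behaviour exercised by the scenarios above can be sketched roughly as
follows. This is not the proxmox-subscription API: the function
pick_server_id and the ID strings are hypothetical, and the ordering
(MachineId preferred over SshMd5) is an assumption based on the observed
"patched client sends MachineId" behaviour.

```python
def pick_server_id(candidates, configured=None):
    """Choose which server ID to send to the subscription server.

    candidates: server IDs the host can produce, preferred variant first.
    configured: server ID stored with an existing subscription, if any.
    """
    if not candidates:
        raise ValueError("no server ID candidates available")
    # Reuse the ID the existing subscription is bound to, so no reissue
    # is needed after upgrading.
    if configured is not None and configured in candidates:
        return configured
    # No (matching) subscription configured: use the preferred variant.
    return candidates[0]


# Hypothetical IDs; real ones are hashes derived from host data.
candidates = ["MACHINE-ID-BASED", "SSH-MD5-BASED"]

# Pre-existing subscription bound to the SSH-key-based ID: it is kept.
print(pick_server_id(candidates, configured="SSH-MD5-BASED"))  # SSH-MD5-BASED

# After delete + set there is no configured ID: the preferred one is used.
print(pick_server_id(candidates))  # MACHINE-ID-BASED
```

Under these assumptions, the sketch matches the "Invalid Server ID"
rejection seen before a reissue: the server still expects the old ID while
the client now sends the machine-id based one.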

Tested-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>

On 5/7/26 1:59 PM, Fabian Grünbichler wrote:
> instead of only supporting one variant of server IDs, get a list of possible
> candidates via the proxmox-subscription crate. if a subscription is already
> configured, the matching server ID will be reused to avoid reissuing.
> 
> v1:
> - drop proxmox-systemd part, already applied
> - add PMG changes
> - add PDM changes
> - add POM changes
> - rebase
> 
> order of bumping:
> - proxmox-subscription (breaks PBS/PDM/POM)
> 
> - pve-rs/pmg-rs (needs proxmox-subscription)
> - pve-manager (needs pve-rs)
> - pmg-api (needs pmg-rs)
> 
> - pbs (needs proxmox-subscription)
> 
> - pdm (needs proxmox-subscription)
> 
> - pom (needs proxmox-subscription)
> 
> tested PBS/PVE, additional testing of PMG/POM/PDM would be highly
> appreciated.
> 
> sending to pve-devel, since it's our main list - this of course is a
> cross-product patch series ;)
> 
> 
> proxmox:
> 
> Fabian Grünbichler (1):
>    proxmox-subscription: add new machine-id based serverid
> 
>   proxmox-subscription/Cargo.toml               |   3 +-
>   proxmox-subscription/src/lib.rs               |   2 +-
>   proxmox-subscription/src/subscription_info.rs | 105 ++++++++++++++++--
>   3 files changed, 96 insertions(+), 14 deletions(-)
> 
> 
> proxmox-backup:
> 
> Fabian Grünbichler (1):
>    subscription: adapt to multiple server ID variants
> 
>   src/api2/node/subscription.rs | 38 ++++++++++++++++++++++++++---------
>   1 file changed, 28 insertions(+), 10 deletions(-)
> 
> 
> proxmox-perl-rs:
> 
> Fabian Grünbichler (1):
>    common: subscription: expose server ID candidates
> 
>   common/src/bindings/subscription.rs | 11 +++++++++++
>   1 file changed, 11 insertions(+)
> 
> 
> pve-manager:
> 
> Fabian Grünbichler (2):
>    subscription: adapt to multiple server ID variants
>    api2tools: remove unused get_hwaddress
> 
>   PVE/API2/Subscription.pm | 26 ++++++++++++++++++++------
>   PVE/API2Tools.pm         | 23 -----------------------
>   2 files changed, 20 insertions(+), 29 deletions(-)
> 
> 
> pmg-api:
> 
> Fabian Grünbichler (2):
>    subscription: adapt to multiple server ID variants
>    utils: drop now unused get_hwaddress
> 
>   src/PMG/API2/Subscription.pm | 27 +++++++++++++++++++++------
>   src/PMG/Utils.pm             | 23 -----------------------
>   2 files changed, 21 insertions(+), 29 deletions(-)
> 
> 
> proxmox-offline-mirror:
> 
> Fabian Grünbichler (1):
>    subscription handling: adapt to multiple server ID candidates
> 
>   src/bin/proxmox-offline-mirror-helper.rs      | 36 +++++++++++++++----
>   src/bin/proxmox-offline-mirror.rs             |  6 +++-
>   .../subscription.rs                           |  5 ++-
>   3 files changed, 39 insertions(+), 8 deletions(-)
> 
> 
> Summary over all repositories:
>    12 files changed, 215 insertions(+), 90 deletions(-)
> 





^ permalink raw reply	[relevance 13%]

-- pct% links below jump to the message on this page, permalinks otherwise --
2026-01-02 16:07     [pbs-devel] [PATCH proxmox{-backup, , -datacenter-manager} v3 00/10] token-shadow: reduce api token verification overhead Samuel Rufinatscha
2026-01-02 16:07     ` [pbs-devel] [PATCH proxmox-backup v3 1/4] pbs-config: add token.shadow generation to ConfigVersionCache Samuel Rufinatscha
2026-01-14 10:44       ` Fabian Grünbichler
2026-01-16 13:53  6%     ` Samuel Rufinatscha
2026-01-02 16:07     ` [pbs-devel] [PATCH proxmox-backup v3 2/4] pbs-config: cache verified API token secrets Samuel Rufinatscha
2026-01-14 10:44       ` Fabian Grünbichler
2026-01-16 15:13  6%     ` Samuel Rufinatscha
2026-01-16 15:29  6%       ` Fabian Grünbichler
2026-01-16 15:33  6%         ` Samuel Rufinatscha
2026-01-16 16:00  5%       ` Fabian Grünbichler
2026-01-16 16:56  6%         ` Samuel Rufinatscha
2026-01-02 16:07     ` [pbs-devel] [PATCH proxmox-backup v3 3/4] pbs-config: invalidate token-secret cache on token.shadow changes Samuel Rufinatscha
2026-01-14 10:44       ` Fabian Grünbichler
2026-01-20  9:21  6%     ` Samuel Rufinatscha
2026-01-02 16:07     ` [pbs-devel] [PATCH proxmox-datacenter-manager v3 1/2] pdm-config: implement token.shadow generation Samuel Rufinatscha
2026-01-14 10:45       ` Fabian Grünbichler
2026-01-16 16:28  6%     ` Samuel Rufinatscha
2026-01-16 16:48  6%       ` Shannon Sterz
2026-01-19  7:56  6%         ` Samuel Rufinatscha
2026-01-21 15:15 13% ` [pbs-devel] superseded: [PATCH proxmox{-backup, , -datacenter-manager} v3 00/10] token-shadow: reduce api token verification overhead Samuel Rufinatscha
2026-01-06 14:24     [pve-devel] [PATCH pve-cluster 00/15 v1] Rewrite pmxcfs with Rust Kefu Chai
2026-01-06 14:24     ` [pve-devel] [PATCH pve-cluster 01/15] pmxcfs-rs: add workspace and pmxcfs-api-types crate Kefu Chai
2026-01-23 14:17  6%   ` Samuel Rufinatscha
2026-01-26  9:00  6%     ` Kefu Chai
2026-01-06 14:24     ` [pve-devel] [PATCH pve-cluster 02/15] pmxcfs-rs: add pmxcfs-config crate Kefu Chai
2026-01-23 15:01  6%   ` Samuel Rufinatscha
2026-01-26  9:43  6%     ` Kefu Chai
2026-01-06 14:24     ` [pve-devel] [PATCH pve-cluster 03/15] pmxcfs-rs: add pmxcfs-logger crate Kefu Chai
2026-01-27 13:16  6%   ` Samuel Rufinatscha
2026-01-06 14:24     ` [pve-devel] [PATCH pve-cluster 04/15] pmxcfs-rs: add pmxcfs-rrd crate Kefu Chai
2026-01-29 14:44  5%   ` Samuel Rufinatscha
2026-01-06 14:24     ` [pve-devel] [PATCH pve-cluster 05/15] pmxcfs-rs: add pmxcfs-memdb crate Kefu Chai
2026-01-30 15:35  5%   ` Samuel Rufinatscha
2026-01-06 14:24     ` [pve-devel] [PATCH pve-cluster 06/15] pmxcfs-rs: add pmxcfs-status crate Kefu Chai
2026-02-02 16:07  5%   ` Samuel Rufinatscha
2026-01-06 14:24     ` [pve-devel] [PATCH pve-cluster 07/15] pmxcfs-rs: add pmxcfs-test-utils infrastructure crate Kefu Chai
2026-02-03 17:03  6%   ` Samuel Rufinatscha
2026-01-06 14:24     ` [pve-devel] [PATCH pve-cluster 08/15] pmxcfs-rs: add pmxcfs-services crate Kefu Chai
2026-02-11 11:52  5%   ` Samuel Rufinatscha
2026-01-06 14:24     ` [pve-devel] [PATCH pve-cluster 09/15] pmxcfs-rs: add pmxcfs-ipc crate Kefu Chai
2026-02-12 15:21  5%   ` Samuel Rufinatscha
2026-01-06 14:24     ` [pve-devel] [PATCH pve-cluster 11/15] pmxcfs-rs: vendor patched rust-corosync for CPG compatibility Kefu Chai
2026-02-11 12:55  6%   ` Samuel Rufinatscha
2026-01-06 14:24     ` [pve-devel] [PATCH pve-cluster 14/15] pmxcfs-rs: add Makefile for build automation Kefu Chai
2026-02-09 16:25  6%   ` Samuel Rufinatscha
2026-01-08 11:26     [pbs-devel] [PATCH proxmox{, -backup} v5 0/9] fix #6939: acme: support servers returning 204 for nonce requests Samuel Rufinatscha
2026-01-16 11:30 13% ` [pbs-devel] superseded: " Samuel Rufinatscha
2026-01-16 11:28 11% [pbs-devel] [PATCH proxmox{, -backup} v6 0/5] " Samuel Rufinatscha
2026-01-16 11:28 16% ` [pbs-devel] [PATCH proxmox v6 1/3] acme-api: add ACME completion helpers Samuel Rufinatscha
2026-01-16 11:28 15% ` [pbs-devel] [PATCH proxmox v6 2/3] acme: introduce http_status module Samuel Rufinatscha
2026-01-16 11:28 14% ` [pbs-devel] [PATCH proxmox v6 3/3] fix #6939: acme: support servers returning 204 for nonce requests Samuel Rufinatscha
2026-01-16 11:28  4% ` [pbs-devel] [PATCH proxmox-backup v6 1/2] acme: remove local AcmeClient and use proxmox-acme-api handlers Samuel Rufinatscha
2026-01-16 11:28  9% ` [pbs-devel] [PATCH proxmox-backup v6 2/2] acme: remove unused src/acme and plugin code Samuel Rufinatscha
2026-01-21 15:13 14% [pbs-devel] [PATCH proxmox{-backup, , -datacenter-manager} v4 00/11] token-shadow: reduce api token verification overhead Samuel Rufinatscha
2026-01-21 15:13 17% ` [pbs-devel] [PATCH proxmox-backup v4 1/4] pbs-config: add token.shadow generation to ConfigVersionCache Samuel Rufinatscha
2026-01-21 15:13 12% ` [pbs-devel] [PATCH proxmox-backup v4 2/4] pbs-config: cache verified API token secrets Samuel Rufinatscha
2026-02-10 12:54  5%   ` Christian Ebner
2026-02-10 13:08  6%     ` Samuel Rufinatscha
2026-01-21 15:13 12% ` [pbs-devel] [PATCH proxmox-backup v4 3/4] pbs-config: invalidate token-secret cache on token.shadow changes Samuel Rufinatscha
2026-01-21 15:14 15% ` [pbs-devel] [PATCH proxmox-backup v4 4/4] pbs-config: add TTL window to token secret cache Samuel Rufinatscha
2026-02-10 12:58  6%   ` Christian Ebner
2026-02-10 13:18  6%     ` Samuel Rufinatscha
2026-01-21 15:14 14% ` [pbs-devel] [PATCH proxmox v4 1/4] proxmox-access-control: split AccessControlConfig and add token.shadow gen Samuel Rufinatscha
2026-01-21 15:14 12% ` [pbs-devel] [PATCH proxmox v4 2/4] proxmox-access-control: cache verified API token secrets Samuel Rufinatscha
2026-02-10 13:38  6%   ` Christian Ebner
2026-02-10 14:07  6%     ` Samuel Rufinatscha
2026-01-21 15:14 12% ` [pbs-devel] [PATCH proxmox v4 3/4] proxmox-access-control: invalidate token-secret cache on token.shadow changes Samuel Rufinatscha
2026-01-21 15:14 15% ` [pbs-devel] [PATCH proxmox v4 4/4] proxmox-access-control: add TTL window to token secret cache Samuel Rufinatscha
2026-01-21 15:14 14% ` [pbs-devel] [PATCH proxmox-datacenter-manager v4 1/3] pdm-config: implement token.shadow generation Samuel Rufinatscha
2026-01-21 15:14 17% ` [pbs-devel] [PATCH proxmox-datacenter-manager v4 2/3] docs: document API token-cache TTL effects Samuel Rufinatscha
2026-01-21 15:14 16% ` [pbs-devel] [PATCH proxmox-datacenter-manager v4 3/3] pdm-config: wire user+acl cache generation Samuel Rufinatscha
2026-02-17 11:14 13% ` [pbs-devel] superseded: [PATCH proxmox{-backup,,-datacenter-manager} v4 00/11] token-shadow: reduce api token verification overhead Samuel Rufinatscha
2026-02-12 13:58 15% [PATCH proxmox-backup 0/1] fix #7311: bin: init proxmox_acme_api in proxmox-daily-update Samuel Rufinatscha
2026-02-12 13:58 17% ` [PATCH proxmox-backup 1/1] " Samuel Rufinatscha
2026-02-12 14:37  6% ` applied: [PATCH proxmox-backup 0/1] " Fabian Grünbichler
2026-02-13  9:33     [PATCH pve-cluster 00/14 v2] Rewrite pmxcfs with Rust Kefu Chai
2026-02-13  9:33     ` [PATCH pve-cluster 01/14 v2] pmxcfs-rs: add Rust workspace configuration Kefu Chai
2026-02-18 10:41  6%   ` Samuel Rufinatscha
2026-02-13  9:33     ` [PATCH pve-cluster 02/14 v2] pmxcfs-rs: add pmxcfs-api-types crate Kefu Chai
2026-02-18 15:06  5%   ` Samuel Rufinatscha
2026-02-13  9:33     ` [PATCH pve-cluster 03/14 v2] pmxcfs-rs: add pmxcfs-config crate Kefu Chai
2026-02-18 16:41  6%   ` Samuel Rufinatscha
2026-02-13  9:33     ` [PATCH pve-cluster 04/14 v2] pmxcfs-rs: add pmxcfs-logger crate Kefu Chai
2026-02-24 16:17  6%   ` Samuel Rufinatscha
2026-02-13  9:33     ` [PATCH pve-cluster 05/14 v2] pmxcfs-rs: add pmxcfs-rrd crate Kefu Chai
2026-03-13 14:09  3%   ` Samuel Rufinatscha
2026-02-17 11:12 14% [PATCH proxmox{-backup,,-datacenter-manager} v5 00/11] token-shadow: reduce api token verification overhead Samuel Rufinatscha
2026-02-17 11:12 17% ` [PATCH proxmox-backup v5 1/4] pbs-config: add token.shadow generation to ConfigVersionCache Samuel Rufinatscha
2026-02-17 11:12 12% ` [PATCH proxmox-backup v5 2/4] pbs-config: cache verified API token secrets Samuel Rufinatscha
2026-02-25 15:44  6%   ` Shannon Sterz
2026-02-27  9:28  6%     ` Samuel Rufinatscha
2026-02-17 11:12 12% ` [PATCH proxmox-backup v5 3/4] pbs-config: invalidate token-secret cache on token.shadow changes Samuel Rufinatscha
2026-02-17 11:12 15% ` [PATCH proxmox-backup v5 4/4] pbs-config: add TTL window to token secret cache Samuel Rufinatscha
2026-02-17 11:12 14% ` [PATCH proxmox v5 1/4] proxmox-access-control: split AccessControlConfig and add token.shadow gen Samuel Rufinatscha
2026-02-17 11:12 11% ` [PATCH proxmox v5 2/4] proxmox-access-control: cache verified API token secrets Samuel Rufinatscha
2026-02-17 11:12 12% ` [PATCH proxmox v5 3/4] proxmox-access-control: invalidate token-secret cache on token.shadow changes Samuel Rufinatscha
2026-02-17 11:12 15% ` [PATCH proxmox v5 4/4] proxmox-access-control: add TTL window to token secret cache Samuel Rufinatscha
2026-02-17 11:12 13% ` [PATCH proxmox-datacenter-manager v5 1/3] pdm-config: implement token.shadow generation Samuel Rufinatscha
2026-02-17 11:12 17% ` [PATCH proxmox-datacenter-manager v5 2/3] docs: document API token-cache TTL effects Samuel Rufinatscha
2026-02-17 11:12 16% ` [PATCH proxmox-datacenter-manager v5 3/3] pdm-config: wire user+acl cache generation Samuel Rufinatscha
2026-03-03 16:52 13% ` [pbs-devel] superseded: [PATCH proxmox{-backup,,-datacenter-manager} v5 00/11] token-shadow: reduce api token verification overhead Samuel Rufinatscha
2026-03-03 16:49 14% [PATCH proxmox{-backup,,-datacenter-manager} v6 " Samuel Rufinatscha
2026-03-03 16:49 17% ` [PATCH proxmox-backup v6 1/4] pbs-config: add token.shadow generation to ConfigVersionCache Samuel Rufinatscha
2026-03-03 16:49 11% ` [PATCH proxmox-backup v6 2/4] pbs-config: cache verified API token secrets Samuel Rufinatscha
2026-03-03 16:49 12% ` [PATCH proxmox-backup v6 3/4] pbs-config: invalidate token-secret cache on token.shadow changes Samuel Rufinatscha
2026-03-03 16:49 15% ` [PATCH proxmox-backup v6 4/4] pbs-config: add TTL window to token secret cache Samuel Rufinatscha
2026-03-03 16:49 14% ` [PATCH proxmox v6 1/4] proxmox-access-control: split AccessControlConfig and add token.shadow gen Samuel Rufinatscha
2026-03-03 16:49 11% ` [PATCH proxmox v6 2/4] proxmox-access-control: cache verified API token secrets Samuel Rufinatscha
2026-03-03 16:49 12% ` [PATCH proxmox v6 3/4] proxmox-access-control: invalidate token-secret cache on token.shadow changes Samuel Rufinatscha
2026-03-03 16:49 15% ` [PATCH proxmox v6 4/4] proxmox-access-control: add TTL window to token secret cache Samuel Rufinatscha
2026-03-03 16:49 13% ` [PATCH proxmox-datacenter-manager v6 1/3] pdm-config: implement token.shadow generation Samuel Rufinatscha
2026-03-03 16:49 17% ` [PATCH proxmox-datacenter-manager v6 2/3] docs: document API token-cache TTL effects Samuel Rufinatscha
2026-03-03 16:50 16% ` [PATCH proxmox-datacenter-manager v6 3/3] pdm-config: wire user+acl cache generation Samuel Rufinatscha
2026-03-11  8:59  6% ` [PATCH proxmox{-backup,,-datacenter-manager} v6 00/11] token-shadow: reduce api token verification overhead Fabian Grünbichler
2026-03-11 16:26  6%   ` Samuel Rufinatscha
2026-03-12 10:38 13% ` [pbs-devel] superseded: " Samuel Rufinatscha
2026-03-12 10:36 14% [PATCH proxmox{-backup,,-datacenter-manager} v7 " Samuel Rufinatscha
2026-03-12 10:36 17% ` [PATCH proxmox-backup v7 1/4] pbs-config: add token.shadow generation to ConfigVersionCache Samuel Rufinatscha
2026-03-12 10:36 11% ` [PATCH proxmox-backup v7 2/4] pbs-config: cache verified API token secrets Samuel Rufinatscha
2026-03-12 10:36 12% ` [PATCH proxmox-backup v7 3/4] pbs-config: invalidate token-secret cache on token.shadow changes Samuel Rufinatscha
2026-03-12 10:37 15% ` [PATCH proxmox-backup v7 4/4] pbs-config: add TTL window to token secret cache Samuel Rufinatscha
2026-03-12 10:37 14% ` [PATCH proxmox v7 1/4] proxmox-access-control: split AccessControlConfig and add token.shadow gen Samuel Rufinatscha
2026-03-12 10:37 11% ` [PATCH proxmox v7 2/4] proxmox-access-control: cache verified API token secrets Samuel Rufinatscha
2026-03-12 10:37 12% ` [PATCH proxmox v7 3/4] proxmox-access-control: invalidate token-secret cache on token.shadow changes Samuel Rufinatscha
2026-03-12 10:37 15% ` [PATCH proxmox v7 4/4] proxmox-access-control: add TTL window to token secret cache Samuel Rufinatscha
2026-03-12 10:37 13% ` [PATCH proxmox-datacenter-manager v7 1/3] pdm-config: implement token.shadow generation Samuel Rufinatscha
2026-03-12 10:37 17% ` [PATCH proxmox-datacenter-manager v7 2/3] docs: document API token-cache TTL effects Samuel Rufinatscha
2026-03-12 10:37 16% ` [PATCH proxmox-datacenter-manager v7 3/3] pdm-config: wire user+acl cache generation Samuel Rufinatscha
2026-03-19 12:26  5% ` partially-applied: [PATCH proxmox{-backup,,-datacenter-manager} v7 00/11] token-shadow: reduce api token verification overhead Fabian Grünbichler
2026-03-23 12:16  6%   ` Samuel Rufinatscha
2026-04-09 15:58 13% ` [pbs-devel] superseded: " Samuel Rufinatscha
2026-04-09 15:54 17% [PATCH proxmox{,-datacenter-manager} v8 0/9] " Samuel Rufinatscha
2026-04-09 15:54 11% ` [PATCH proxmox v8 1/6] token shadow: split AccessControlConfig and add token.shadow generation Samuel Rufinatscha
2026-04-09 15:54 11% ` [PATCH proxmox v8 2/6] token shadow: cache verified API token secrets Samuel Rufinatscha
2026-04-09 15:54 11% ` [PATCH proxmox v8 3/6] token shadow: invalidate token-secret cache on token.shadow changes Samuel Rufinatscha
2026-04-09 15:54 15% ` [PATCH proxmox v8 4/6] token shadow: add TTL window to token secret cache Samuel Rufinatscha
2026-04-09 15:54 17% ` [PATCH proxmox v8 5/6] token shadow: inline set_secret fn Samuel Rufinatscha
2026-04-09 15:54 15% ` [PATCH proxmox v8 6/6] token shadow: deduplicate more code into apply_api_mutation Samuel Rufinatscha
2026-04-09 15:54 14% ` [PATCH proxmox-datacenter-manager v8 1/3] pdm-config: implement access control backend hooks Samuel Rufinatscha
2026-04-09 15:54 16% ` [PATCH proxmox-datacenter-manager v8 2/3] pdm-config: wire user and ACL cache generation Samuel Rufinatscha
2026-04-09 15:54 16% ` [PATCH proxmox-datacenter-manager v8 3/3] pdm-config: wire token.shadow generation Samuel Rufinatscha
2026-04-29 10:12     [PATCH widget-toolkit] confirm remove dialog: improve layout for larger fields/components Dominik Csapak
2026-04-29 11:04 15% ` Samuel Rufinatscha
2026-04-30 17:26     [PATCH v2 cluster/storage/manager 00/15] storage mapping Mira Limbeck
2026-04-30 17:27     ` [PATCH v2 storage 07/15] iscsi: introduce helper to update discovery db Mira Limbeck
2026-05-11 16:46  6%   ` Samuel Rufinatscha
2026-05-11 17:05 13% ` [PATCH v2 cluster/storage/manager 00/15] storage mapping Samuel Rufinatscha
2026-05-07 11:59     [PATCH manager/pmg-api/proxmox{,-backup,-perl-rs,-offline-mirror} 0/8] adapt subscription handling to alternative server IDs Fabian Grünbichler
2026-05-18 16:02 13% ` Samuel Rufinatscha
