* [pbs-devel] [PATCH proxmox v6 2/3] acme: introduce http_status module
2026-01-16 11:28 10% [pbs-devel] [PATCH proxmox{, -backup} v6 0/5] " Samuel Rufinatscha
2026-01-16 11:28 16% ` [pbs-devel] [PATCH proxmox v6 1/3] acme-api: add ACME completion helpers Samuel Rufinatscha
@ 2026-01-16 11:28 15% ` Samuel Rufinatscha
2026-01-16 11:28 14% ` [pbs-devel] [PATCH proxmox v6 3/3] fix #6939: acme: support servers returning 204 for nonce requests Samuel Rufinatscha
` (2 subsequent siblings)
4 siblings, 0 replies; 39+ results
From: Samuel Rufinatscha @ 2026-01-16 11:28 UTC (permalink / raw)
To: pbs-devel
Introduce an internal http_status module with the common ACME HTTP
response codes, and replace both the use of crate::request::CREATED and
direct numeric status codes.
Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
---
proxmox-acme/src/account.rs | 8 ++++----
proxmox-acme/src/async_client.rs | 4 ++--
proxmox-acme/src/lib.rs | 2 ++
proxmox-acme/src/request.rs | 11 ++++++++++-
4 files changed, 18 insertions(+), 7 deletions(-)
diff --git a/proxmox-acme/src/account.rs b/proxmox-acme/src/account.rs
index f763c1e9..c62e60e0 100644
--- a/proxmox-acme/src/account.rs
+++ b/proxmox-acme/src/account.rs
@@ -85,7 +85,7 @@ impl Account {
method: "POST",
content_type: crate::request::JSON_CONTENT_TYPE,
body,
- expected: crate::request::CREATED,
+ expected: crate::http_status::CREATED,
};
Ok(NewOrder::new(request))
@@ -107,7 +107,7 @@ impl Account {
method: "POST",
content_type: crate::request::JSON_CONTENT_TYPE,
body,
- expected: 200,
+ expected: crate::http_status::OK,
})
}
@@ -132,7 +132,7 @@ impl Account {
method: "POST",
content_type: crate::request::JSON_CONTENT_TYPE,
body,
- expected: 200,
+ expected: crate::http_status::OK,
})
}
@@ -405,7 +405,7 @@ impl AccountCreator {
method: "POST",
content_type: crate::request::JSON_CONTENT_TYPE,
body,
- expected: crate::request::CREATED,
+ expected: crate::http_status::CREATED,
})
}
diff --git a/proxmox-acme/src/async_client.rs b/proxmox-acme/src/async_client.rs
index dc755fb9..c803823d 100644
--- a/proxmox-acme/src/async_client.rs
+++ b/proxmox-acme/src/async_client.rs
@@ -498,7 +498,7 @@ impl AcmeClient {
method: "GET",
content_type: "",
body: String::new(),
- expected: 200,
+ expected: crate::http_status::OK,
},
nonce,
)
@@ -550,7 +550,7 @@ impl AcmeClient {
method: "HEAD",
content_type: "",
body: String::new(),
- expected: 200,
+ expected: crate::http_status::OK,
},
nonce,
)
diff --git a/proxmox-acme/src/lib.rs b/proxmox-acme/src/lib.rs
index df722629..b1be9d15 100644
--- a/proxmox-acme/src/lib.rs
+++ b/proxmox-acme/src/lib.rs
@@ -74,6 +74,8 @@ pub use request::Request;
#[cfg(feature = "impl")]
pub use order::NewOrder;
#[cfg(feature = "impl")]
+pub(crate) use request::http_status;
+#[cfg(feature = "impl")]
pub use request::ErrorResponse;
/// Header name for nonces.
diff --git a/proxmox-acme/src/request.rs b/proxmox-acme/src/request.rs
index 78a90913..2c83255a 100644
--- a/proxmox-acme/src/request.rs
+++ b/proxmox-acme/src/request.rs
@@ -1,7 +1,6 @@
use serde::Deserialize;
pub(crate) const JSON_CONTENT_TYPE: &str = "application/jose+json";
-pub(crate) const CREATED: u16 = 201;
/// A request which should be performed on the ACME provider.
pub struct Request {
@@ -21,6 +20,16 @@ pub struct Request {
pub expected: u16,
}
+/// Common HTTP status codes used in ACME responses.
+pub(crate) mod http_status {
+ /// 200 OK
+ pub(crate) const OK: u16 = 200;
+ /// 201 Created
+ pub(crate) const CREATED: u16 = 201;
+ /// 204 No Content
+ pub(crate) const NO_CONTENT: u16 = 204;
+}
+
/// An ACME error response contains a specially formatted type string, and can optionally
/// contain textual details and a set of sub problems.
#[derive(Clone, Debug, Deserialize)]
--
2.47.3
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
* [pbs-devel] [PATCH proxmox v6 3/3] fix #6939: acme: support servers returning 204 for nonce requests
2026-01-16 11:28 10% [pbs-devel] [PATCH proxmox{, -backup} v6 0/5] " Samuel Rufinatscha
2026-01-16 11:28 16% ` [pbs-devel] [PATCH proxmox v6 1/3] acme-api: add ACME completion helpers Samuel Rufinatscha
2026-01-16 11:28 15% ` [pbs-devel] [PATCH proxmox v6 2/3] acme: introduce http_status module Samuel Rufinatscha
@ 2026-01-16 11:28 14% ` Samuel Rufinatscha
2026-01-16 11:28 4% ` [pbs-devel] [PATCH proxmox-backup v6 1/2] acme: remove local AcmeClient and use proxmox-acme-api handlers Samuel Rufinatscha
2026-01-16 11:28 9% ` [pbs-devel] [PATCH proxmox-backup v6 2/2] acme: remove unused src/acme and plugin code Samuel Rufinatscha
4 siblings, 0 replies; 39+ results
From: Samuel Rufinatscha @ 2026-01-16 11:28 UTC (permalink / raw)
To: pbs-devel
Some ACME servers (notably custom or legacy implementations) respond
to HEAD /newNonce with a 204 No Content instead of the
RFC 8555-recommended 200 OK [1]. While this behavior is technically
off-spec, it is not illegal. This issue was reported on our bug
tracker [2].
The previous implementation treated any non-200 response as an error,
causing account registration to fail against such servers. Relax the
status-code check to accept both 200 and 204 responses (and to allow
accepting further 2xx codes later) to improve interoperability.
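In essence, each request now carries a set of acceptable success codes. A minimal sketch with simplified stand-in types (not the actual proxmox-acme definitions, which carry more fields):

```rust
// Simplified stand-ins for the patched types; the real Request has more fields.
struct Request {
    expected: &'static [u16],
}

const OK: u16 = 200;
const NO_CONTENT: u16 = 204;

fn check_status(request: &Request, status: u16) -> Result<(), String> {
    // Accept any status listed in `expected` instead of one exact code.
    if !request.expected.contains(&status) {
        return Err(format!("unexpected status code: {status}"));
    }
    Ok(())
}

fn main() {
    let nonce_request = Request { expected: &[OK, NO_CONTENT] };
    assert!(check_status(&nonce_request, 200).is_ok());
    assert!(check_status(&nonce_request, 204).is_ok());
    assert!(check_status(&nonce_request, 201).is_err());
}
```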
Note: In comparison, PVE’s Perl ACME client performs a GET request [3]
instead of a HEAD request and accepts any 2xx success code when
retrieving the nonce [4]. This difference in behavior does not affect
functionality but is worth noting for consistency across
implementations.
[1] https://datatracker.ietf.org/doc/html/rfc8555/#section-7.2
[2] https://bugzilla.proxmox.com/show_bug.cgi?id=6939
[3] https://git.proxmox.com/?p=proxmox-acme.git;a=blob;f=src/PVE/ACME.pm;h=f1e9bb7d316e3cea1e376c610b0479119217aecc;hb=HEAD#l219
[4] https://git.proxmox.com/?p=proxmox-acme.git;a=blob;f=src/PVE/ACME.pm;h=f1e9bb7d316e3cea1e376c610b0479119217aecc;hb=HEAD#l597
Fixes: #6939
Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
---
proxmox-acme/src/account.rs | 10 +++++-----
proxmox-acme/src/async_client.rs | 6 +++---
proxmox-acme/src/client.rs | 2 +-
proxmox-acme/src/request.rs | 4 ++--
4 files changed, 11 insertions(+), 11 deletions(-)
diff --git a/proxmox-acme/src/account.rs b/proxmox-acme/src/account.rs
index c62e60e0..8df19a29 100644
--- a/proxmox-acme/src/account.rs
+++ b/proxmox-acme/src/account.rs
@@ -85,7 +85,7 @@ impl Account {
method: "POST",
content_type: crate::request::JSON_CONTENT_TYPE,
body,
- expected: crate::http_status::CREATED,
+ expected: &[crate::http_status::CREATED],
};
Ok(NewOrder::new(request))
@@ -107,7 +107,7 @@ impl Account {
method: "POST",
content_type: crate::request::JSON_CONTENT_TYPE,
body,
- expected: crate::http_status::OK,
+ expected: &[crate::http_status::OK],
})
}
@@ -132,7 +132,7 @@ impl Account {
method: "POST",
content_type: crate::request::JSON_CONTENT_TYPE,
body,
- expected: crate::http_status::OK,
+ expected: &[crate::http_status::OK],
})
}
@@ -157,7 +157,7 @@ impl Account {
method: "POST",
content_type: crate::request::JSON_CONTENT_TYPE,
body,
- expected: 200,
+ expected: &[crate::http_status::OK],
})
}
@@ -405,7 +405,7 @@ impl AccountCreator {
method: "POST",
content_type: crate::request::JSON_CONTENT_TYPE,
body,
- expected: crate::http_status::CREATED,
+ expected: &[crate::http_status::CREATED],
})
}
diff --git a/proxmox-acme/src/async_client.rs b/proxmox-acme/src/async_client.rs
index c803823d..66ec6024 100644
--- a/proxmox-acme/src/async_client.rs
+++ b/proxmox-acme/src/async_client.rs
@@ -420,7 +420,7 @@ impl AcmeClient {
};
if parts.status.is_success() {
- if status != request.expected {
+ if !request.expected.contains(&status) {
return Err(Error::InvalidApi(format!(
"ACME server responded with unexpected status code: {:?}",
parts.status
@@ -498,7 +498,7 @@ impl AcmeClient {
method: "GET",
content_type: "",
body: String::new(),
- expected: crate::http_status::OK,
+ expected: &[crate::http_status::OK],
},
nonce,
)
@@ -550,7 +550,7 @@ impl AcmeClient {
method: "HEAD",
content_type: "",
body: String::new(),
- expected: crate::http_status::OK,
+ expected: &[crate::http_status::OK, crate::http_status::NO_CONTENT],
},
nonce,
)
diff --git a/proxmox-acme/src/client.rs b/proxmox-acme/src/client.rs
index 931f7245..881ee83d 100644
--- a/proxmox-acme/src/client.rs
+++ b/proxmox-acme/src/client.rs
@@ -203,7 +203,7 @@ impl Inner {
let got_nonce = self.update_nonce(&mut response)?;
if response.is_success() {
- if response.status != request.expected {
+ if !request.expected.contains(&response.status) {
return Err(Error::InvalidApi(format!(
"API server responded with unexpected status code: {:?}",
response.status
diff --git a/proxmox-acme/src/request.rs b/proxmox-acme/src/request.rs
index 2c83255a..8a4017dc 100644
--- a/proxmox-acme/src/request.rs
+++ b/proxmox-acme/src/request.rs
@@ -16,8 +16,8 @@ pub struct Request {
/// The body to pass along with request, or an empty string.
pub body: String,
- /// The expected status code a compliant ACME provider will return on success.
- pub expected: u16,
+ /// The set of HTTP status codes that indicate a successful response from an ACME provider.
+ pub expected: &'static [u16],
}
/// Common HTTP status codes used in ACME responses.
--
2.47.3
* [pbs-devel] [PATCH proxmox{, -backup} v6 0/5] fix #6939: acme: support servers returning 204 for nonce requests
@ 2026-01-16 11:28 10% Samuel Rufinatscha
2026-01-16 11:28 16% ` [pbs-devel] [PATCH proxmox v6 1/3] acme-api: add ACME completion helpers Samuel Rufinatscha
` (4 more replies)
0 siblings, 5 replies; 39+ results
From: Samuel Rufinatscha @ 2026-01-16 11:28 UTC (permalink / raw)
To: pbs-devel
Hi,
this series fixes account registration for ACME providers that return
HTTP 204 No Content to the newNonce request. Currently, both the PBS
ACME client and the shared ACME client in proxmox-acme only accept
HTTP 200 OK for this request. The issue was observed in PBS against a
custom ACME deployment and reported as bug #6939 [1].
## Problem
During ACME account registration, PBS first fetches an anti-replay
nonce by sending a HEAD request to the CA’s newNonce URL.
RFC 8555 §7.2 [2] states that:
* the server MUST include a Replay-Nonce header with a fresh nonce,
* the server SHOULD use status 200 OK for the HEAD request,
* the server MUST also handle GET on the same resource and may return
204 No Content with an empty body.
The reporter observed the following error message:
"ACME server responded with unexpected status code: 204"
and mentioned that the issue did not appear with PVE 9 [1]. Looking at
PVE’s Perl ACME client [3], it uses a GET request instead of HEAD and
accepts any 2xx success code when retrieving the nonce. This difference
in behavior is worth noting.
## Approach
This series changes the expected field of the internal Request type
from a single u16 to &'static [u16], so one request can explicitly
accept multiple success codes.
To avoid fixing the issue twice, once in PBS and once in PDM (which
already uses the shared ACME stack), this series fixes the bug in
proxmox-acme and then refactors PBS to use the shared ACME stack as
well.
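A hedged sketch of that type change (the `expected` field name follows the series; the rest is simplified): call sites pass literal slices such as `&[200]` or `&[200, 204]`, which are promoted to 'static, so no heap allocation is needed.

```rust
// Simplified sketch of the field change; not the real proxmox_acme::Request.
struct Request {
    method: &'static str,
    expected: &'static [u16], // was `expected: u16` before this series
}

fn is_expected(req: &Request, status: u16) -> bool {
    // A slice lookup replaces the previous `status != expected` comparison.
    req.expected.contains(&status)
}

fn main() {
    let head_nonce = Request { method: "HEAD", expected: &[200, 204] };
    let new_order = Request { method: "POST", expected: &[201] };
    assert!(is_expected(&head_nonce, 204));
    assert!(!is_expected(&new_order, 204));
}
```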
## Testing
I tested the refactor using Pebble HTTP challenge type.
The DNS challange type will be tested as mentioned by Max (see v5).
*HTTP Challenge Type Test*
To test the refactor, I
(1) installed latest stable PBS on a VM
(2) created .deb package from latest PBS (master), containing the
refactor
(3) installed created .deb package
(4) installed Pebble from Let's Encrypt [5] on the same VM
(5) created an ACME account and ordered the new certificate for the
host domain.
Steps to reproduce:
(1) install latest stable PBS on a VM, create .deb package from latest
PBS (master) containing the refactor, install created .deb package
(2) install Pebble from Let's Encrypt [5] on the same VM:
cd
apt update
apt install -y golang git
git clone https://github.com/letsencrypt/pebble
cd pebble
go build ./cmd/pebble
then, download and trust the Pebble cert:
wget https://raw.githubusercontent.com/letsencrypt/pebble/main/test/certs/pebble.minica.pem
cp pebble.minica.pem /usr/local/share/ca-certificates/pebble.minica.crt
update-ca-certificates
We want Pebble to perform HTTP-01 validation against port 80, because
PBS’s standalone plugin will bind port 80. Set httpPort to 80.
nano ./test/config/pebble-config.json
Start the Pebble server in the background:
./pebble -config ./test/config/pebble-config.json &
Create a Pebble ACME account:
proxmox-backup-manager acme account register default admin@example.com --directory 'https://127.0.0.1:14000/dir'
To verify persistence of the account, I checked:
ls /etc/proxmox-backup/acme/accounts
Then I verified that updating the account works:
proxmox-backup-manager acme account update default --contact "a@example.com,b@example.com"
proxmox-backup-manager acme account info default
In the PBS GUI, you can create a new domain. You can use your host
domain name (see /etc/hosts). Select the created account and order the
certificate.
After a page reload, you might need to accept the new certificate in the browser.
In the PBS dashboard, you should see the new Pebble certificate.
*Note*: on reboot, the created Pebble ACME account will be gone and
you will need to create a new one, since Pebble does not persist
account info. In that case, remove the previously created account in
/etc/proxmox-backup/acme/accounts.
*Testing the newNonce fix*
To test the ACME newNonce fix, I put nginx in front of Pebble to
intercept the newNonce request and return 204 No Content instead of
200 OK; all other requests are forwarded to Pebble unchanged. This
requires trusting the nginx CA via /usr/local/share/ca-certificates +
update-ca-certificates on the VM.
Then I ran the following command against nginx:
proxmox-backup-manager acme account register proxytest root@backup.local --directory 'https://nginx-address/dir'
The account could be created successfully. When adjusting the nginx
configuration to return any other non-expected success status code,
PBS rejects the response as expected.
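For illustration, an nginx fragment along these lines could perform such an interception. This is a hypothetical sketch, not the configuration from the test setup; in particular, the nonce it returns is not valid for subsequent signed requests, which matters for a full registration flow:

```nginx
# Hypothetical sketch -- not the configuration used in the test.
# /nonce-plz is Pebble's default newNonce path; addresses are placeholders.
server {
    listen 443 ssl;
    # ssl_certificate / ssl_certificate_key for the locally trusted CA ...

    # Intercept the nonce request: reply 204 No Content instead of 200 OK.
    location = /nonce-plz {
        add_header Replay-Nonce $request_id always;
        return 204;
    }

    # Forward everything else to Pebble unchanged.
    location / {
        proxy_pass https://127.0.0.1:14000;
    }
}
```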
## Patch summary
0001 - proxmox: acme-api: add ACME completion helpers
0002 - proxmox: acme: introduce http_status module
0003 - proxmox: fix #6939: acme: support servers returning 204 for
nonce requests
0004 - proxmox-backup: acme: remove local AcmeClient and use
proxmox-acme-api handlers
0005 - proxmox-backup: acme: remove unused src/acme and plugin code
## Maintainer notes
proxmox-acme: requires version bump (breaking Request::expected change)
proxmox-backup: requires version bump
- NodeConfig::acme_config() signature changed from
Option<Result<AcmeConfig, Error>> to Result<AcmeConfig, Error>
- NodeConfig::acme_client() function removed
Patch 0001 (proxmox: acme-api: add ACME completion helpers) could be
applied independently to make sure
https://bugzilla.proxmox.com/show_bug.cgi?id=7179
is not blocked and to avoid duplicate work.
## Changelog
Changes from v5 to v6:
* rebased
* proxmox-acme: revert visibility changes and dead-code removal
* proxmox-acme-api: remove load_client_with_account
* proxmox-backup: remove pub Node::acme_client()
* proxmox-backup: Node::acme_config() inline transpose/default logic
* proxmox-backup: merge PBS Client removal and API handler changes in
one patch
* improve commit messages
Changes from v4 to v5:
* rebased
* re-ordered series (proxmox-acme fix first)
* proxmox-backup: cleaned up imports based on an initial clean-up patch
* proxmox-acme: removed now unused post_request_raw_payload(),
update_account_request(), deactivate_account_request()
* proxmox-acme: removed now obsolete/unused get_authorization() and
GetAuthorization impl
Verified removal by compiling PBS, PDM, and proxmox-perl-rs
with all features.
Changes from v3 to v4:
* add proxmox-acme-api as a dependency and initialize it in
PBS so PBS can use the shared ACME API instead.
* remove the PBS-local AcmeClient implementation and switch PBS
over to the shared proxmox-acme async client.
* rework PBS’ ACME API endpoints to delegate to
proxmox-acme-api handlers instead of duplicating logic locally.
* move PBS’ ACME certificate ordering logic over to
proxmox-acme-api, keeping only certificate installation/reload in PBS.
* add a load_client_with_account helper in proxmox-acme-api so PBS
(and others) can construct an AcmeClient for a configured account
without duplicating boilerplate.
* hide the low-level Request type and its fields behind constructors
/ reduced visibility so changes to “expected” no longer affect the
public API as they did in v3.
* split out the HTTP status constants into an internal http_status
module as a separate preparatory cleanup before the bug fix, instead
of doing this inline like in v3.
* Rebased on top of the refactor: keep the same behavioural fix as in
v3 (accept 204 for newNonce with Replay-Nonce present), but implement
it on top of the http_status module that is part of the refactor.
Changes from v2 to v3:
* renamed the `http_success` module to `http_status` and replaced its
usages
* introduced the `http_success` module to contain the HTTP success codes
* replaced `Vec<u16>` with `&[u16]` for expected codes to avoid
allocations
* clarified PVE's Perl ACME client behaviour in the commit message
[1] Bugzilla report #6939:
[https://bugzilla.proxmox.com/show_bug.cgi?id=6939](https://bugzilla.proxmox.com/show_bug.cgi?id=6939)
[2] RFC 8555 (ACME):
[https://datatracker.ietf.org/doc/html/rfc8555/#section-7.2](https://datatracker.ietf.org/doc/html/rfc8555/#section-7.2)
[3] PVE’s Perl ACME client (allows 2xx codes for nonce requests):
[https://git.proxmox.com/?p=proxmox-acme.git;a=blob;f=src/PVE/ACME.pm;h=f1e9bb7d316e3cea1e376c610b0479119217aecc;hb=HEAD#l597](https://git.proxmox.com/?p=proxmox-acme.git;a=blob;f=src/PVE/ACME.pm;h=f1e9bb7d316e3cea1e376c610b0479119217aecc;hb=HEAD#l597)
[4] PVE’s Perl ACME client (performs a GET request):
[https://git.proxmox.com/?p=proxmox-acme.git;a=blob;f=src/PVE/ACME.pm;h=f1e9bb7d316e3cea1e376c610b0479119217aecc;hb=HEAD#l219](https://git.proxmox.com/?p=proxmox-acme.git;a=blob;f=src/PVE/ACME.pm;h=f1e9bb7d316e3cea1e376c610b0479119217aecc;hb=HEAD#l219)
[5] Pebble ACME server:
[https://github.com/letsencrypt/pebble](https://github.com/letsencrypt/pebble)
proxmox:
Samuel Rufinatscha (3):
acme-api: add ACME completion helpers
acme: introduce http_status module
fix #6939: acme: support servers returning 204 for nonce requests
proxmox-acme-api/src/challenge_schemas.rs | 2 +-
proxmox-acme-api/src/lib.rs | 57 +++++++++++++++++++++++
proxmox-acme/src/account.rs | 10 ++--
proxmox-acme/src/async_client.rs | 6 +--
proxmox-acme/src/client.rs | 2 +-
proxmox-acme/src/lib.rs | 2 +
proxmox-acme/src/request.rs | 15 ++++--
7 files changed, 81 insertions(+), 13 deletions(-)
proxmox-backup:
Samuel Rufinatscha (2):
acme: remove local AcmeClient and use proxmox-acme-api handlers
acme: remove unused src/acme and plugin code
Cargo.toml | 3 +
src/acme/client.rs | 691 -------------------------
src/acme/mod.rs | 5 -
src/acme/plugin.rs | 335 ------------
src/api2/config/acme.rs | 399 ++------------
src/api2/node/certificates.rs | 221 +-------
src/api2/types/acme.rs | 97 ----
src/api2/types/mod.rs | 3 -
src/bin/proxmox-backup-api.rs | 2 +
src/bin/proxmox-backup-manager.rs | 3 +-
src/bin/proxmox-backup-proxy.rs | 1 +
src/bin/proxmox_backup_manager/acme.rs | 37 +-
src/config/acme/mod.rs | 168 ------
src/config/acme/plugin.rs | 189 -------
src/config/mod.rs | 1 -
src/config/node.rs | 43 +-
src/lib.rs | 2 -
17 files changed, 94 insertions(+), 2106 deletions(-)
delete mode 100644 src/acme/client.rs
delete mode 100644 src/acme/mod.rs
delete mode 100644 src/acme/plugin.rs
delete mode 100644 src/api2/types/acme.rs
delete mode 100644 src/config/acme/mod.rs
delete mode 100644 src/config/acme/plugin.rs
Summary over all repositories:
24 files changed, 175 insertions(+), 2119 deletions(-)
--
Generated by git-murpp 0.8.1
* [pbs-devel] [PATCH proxmox v6 1/3] acme-api: add ACME completion helpers
2026-01-16 11:28 10% [pbs-devel] [PATCH proxmox{, -backup} v6 0/5] " Samuel Rufinatscha
@ 2026-01-16 11:28 16% ` Samuel Rufinatscha
2026-01-16 11:28 15% ` [pbs-devel] [PATCH proxmox v6 2/3] acme: introduce http_status module Samuel Rufinatscha
` (3 subsequent siblings)
4 siblings, 0 replies; 39+ results
From: Samuel Rufinatscha @ 2026-01-16 11:28 UTC (permalink / raw)
To: pbs-devel
Factors out the PBS ACME completion helpers and adds them to
proxmox-acme-api.
Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
---
proxmox-acme-api/src/challenge_schemas.rs | 2 +-
proxmox-acme-api/src/lib.rs | 57 +++++++++++++++++++++++
2 files changed, 58 insertions(+), 1 deletion(-)
diff --git a/proxmox-acme-api/src/challenge_schemas.rs b/proxmox-acme-api/src/challenge_schemas.rs
index e66e327e..4e94d3ff 100644
--- a/proxmox-acme-api/src/challenge_schemas.rs
+++ b/proxmox-acme-api/src/challenge_schemas.rs
@@ -29,7 +29,7 @@ impl Serialize for ChallengeSchemaWrapper {
}
}
-fn load_dns_challenge_schema() -> Result<Vec<AcmeChallengeSchema>, Error> {
+pub(crate) fn load_dns_challenge_schema() -> Result<Vec<AcmeChallengeSchema>, Error> {
let raw = file_read_string(ACME_DNS_SCHEMA_FN)?;
let schemas: serde_json::Map<String, Value> = serde_json::from_str(&raw)?;
diff --git a/proxmox-acme-api/src/lib.rs b/proxmox-acme-api/src/lib.rs
index 623e9e23..ba64569d 100644
--- a/proxmox-acme-api/src/lib.rs
+++ b/proxmox-acme-api/src/lib.rs
@@ -46,3 +46,60 @@ pub(crate) mod acme_plugin;
mod certificate_helpers;
#[cfg(feature = "impl")]
pub use certificate_helpers::{create_self_signed_cert, order_certificate, revoke_certificate};
+
+#[cfg(feature = "impl")]
+pub mod completion {
+
+ use std::collections::HashMap;
+ use std::ops::ControlFlow;
+
+ use crate::account_config::foreach_acme_account;
+ use crate::challenge_schemas::load_dns_challenge_schema;
+ use crate::plugin_config::plugin_config;
+
+ pub fn complete_acme_account(_arg: &str, _param: &HashMap<String, String>) -> Vec<String> {
+ let mut out = Vec::new();
+ let _ = foreach_acme_account(|name| {
+ out.push(name.into_string());
+ ControlFlow::Continue(())
+ });
+ out
+ }
+
+ pub fn complete_acme_plugin(_arg: &str, _param: &HashMap<String, String>) -> Vec<String> {
+ match plugin_config() {
+ Ok((config, _digest)) => config
+ .iter()
+ .map(|(id, (_type, _cfg))| id.clone())
+ .collect(),
+ Err(_) => Vec::new(),
+ }
+ }
+
+ pub fn complete_acme_plugin_type(_arg: &str, _param: &HashMap<String, String>) -> Vec<String> {
+ vec![
+ "dns".to_string(),
+ //"http".to_string(), // currently does not really make sense to offer here
+ ]
+ }
+
+ pub fn complete_acme_api_challenge_type(
+ _arg: &str,
+ param: &HashMap<String, String>,
+ ) -> Vec<String> {
+ if param.get("type") == Some(&"dns".to_string()) {
+ match load_dns_challenge_schema() {
+ Ok(schema) => schema.into_iter().map(|s| s.id).collect(),
+ Err(_) => Vec::new(),
+ }
+ } else {
+ Vec::new()
+ }
+ }
+}
+
+#[cfg(feature = "impl")]
+pub use completion::{
+ complete_acme_account, complete_acme_api_challenge_type, complete_acme_plugin,
+ complete_acme_plugin_type,
+};
--
2.47.3
* [pbs-devel] [PATCH proxmox-backup v6 2/2] acme: remove unused src/acme and plugin code
2026-01-16 11:28 10% [pbs-devel] [PATCH proxmox{, -backup} v6 0/5] " Samuel Rufinatscha
` (3 preceding siblings ...)
2026-01-16 11:28 4% ` [pbs-devel] [PATCH proxmox-backup v6 1/2] acme: remove local AcmeClient and use proxmox-acme-api handlers Samuel Rufinatscha
@ 2026-01-16 11:28 9% ` Samuel Rufinatscha
4 siblings, 0 replies; 39+ results
From: Samuel Rufinatscha @ 2026-01-16 11:28 UTC (permalink / raw)
To: pbs-devel
Removes the now-unused src/acme module and plugin code, as PBS uses
the factored-out client and API handlers.
Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
---
src/acme/mod.rs | 1 -
src/acme/plugin.rs | 335 --------------------------------------
src/api2/types/acme.rs | 38 -----
src/api2/types/mod.rs | 3 -
src/config/acme/mod.rs | 1 -
src/config/acme/plugin.rs | 105 ------------
src/config/mod.rs | 1 -
src/lib.rs | 2 -
8 files changed, 486 deletions(-)
delete mode 100644 src/acme/mod.rs
delete mode 100644 src/acme/plugin.rs
delete mode 100644 src/api2/types/acme.rs
delete mode 100644 src/config/acme/mod.rs
delete mode 100644 src/config/acme/plugin.rs
diff --git a/src/acme/mod.rs b/src/acme/mod.rs
deleted file mode 100644
index 700d90d7..00000000
--- a/src/acme/mod.rs
+++ /dev/null
@@ -1 +0,0 @@
-pub(crate) mod plugin;
diff --git a/src/acme/plugin.rs b/src/acme/plugin.rs
deleted file mode 100644
index 6804243c..00000000
--- a/src/acme/plugin.rs
+++ /dev/null
@@ -1,335 +0,0 @@
-use std::future::Future;
-use std::net::{IpAddr, SocketAddr};
-use std::pin::Pin;
-use std::process::Stdio;
-use std::sync::Arc;
-use std::time::Duration;
-
-use anyhow::{bail, format_err, Error};
-use bytes::Bytes;
-use futures::TryFutureExt;
-use http_body_util::Full;
-use hyper::body::Incoming;
-use hyper::server::conn::http1;
-use hyper::service::service_fn;
-use hyper::{Request, Response};
-use hyper_util::rt::TokioIo;
-use tokio::io::{AsyncBufReadExt, AsyncRead, AsyncWriteExt, BufReader};
-use tokio::net::TcpListener;
-use tokio::process::Command;
-
-use proxmox_acme::async_client::AcmeClient;
-use proxmox_acme::{Authorization, Challenge};
-use proxmox_rest_server::WorkerTask;
-
-use crate::api2::types::AcmeDomain;
-use crate::config::acme::plugin::{DnsPlugin, PluginData};
-
-const PROXMOX_ACME_SH_PATH: &str = "/usr/share/proxmox-acme/proxmox-acme";
-
-pub(crate) fn get_acme_plugin(
- plugin_data: &PluginData,
- name: &str,
-) -> Result<Option<Box<dyn AcmePlugin + Send + Sync + 'static>>, Error> {
- let (ty, data) = match plugin_data.get(name) {
- Some(plugin) => plugin,
- None => return Ok(None),
- };
-
- Ok(Some(match ty.as_str() {
- "dns" => {
- let plugin: DnsPlugin = serde::Deserialize::deserialize(data)?;
- Box::new(plugin)
- }
- "standalone" => {
- // this one has no config
- Box::<StandaloneServer>::default()
- }
- other => bail!("missing implementation for plugin type '{}'", other),
- }))
-}
-
-pub(crate) trait AcmePlugin {
- /// Setup everything required to trigger the validation and return the corresponding validation
- /// URL.
- fn setup<'fut, 'a: 'fut, 'b: 'fut, 'c: 'fut, 'd: 'fut>(
- &'a mut self,
- client: &'b mut AcmeClient,
- authorization: &'c Authorization,
- domain: &'d AcmeDomain,
- task: Arc<WorkerTask>,
- ) -> Pin<Box<dyn Future<Output = Result<&'c str, Error>> + Send + 'fut>>;
-
- fn teardown<'fut, 'a: 'fut, 'b: 'fut, 'c: 'fut, 'd: 'fut>(
- &'a mut self,
- client: &'b mut AcmeClient,
- authorization: &'c Authorization,
- domain: &'d AcmeDomain,
- task: Arc<WorkerTask>,
- ) -> Pin<Box<dyn Future<Output = Result<(), Error>> + Send + 'fut>>;
-}
-
-fn extract_challenge<'a>(
- authorization: &'a Authorization,
- ty: &str,
-) -> Result<&'a Challenge, Error> {
- authorization
- .challenges
- .iter()
- .find(|ch| ch.ty == ty)
- .ok_or_else(|| format_err!("no supported challenge type ({}) found", ty))
-}
-
-async fn pipe_to_tasklog<T: AsyncRead + Unpin>(
- pipe: T,
- task: Arc<WorkerTask>,
-) -> Result<(), std::io::Error> {
- let mut pipe = BufReader::new(pipe);
- let mut line = String::new();
- loop {
- line.clear();
- match pipe.read_line(&mut line).await {
- Ok(0) => return Ok(()),
- Ok(_) => task.log_message(line.as_str()),
- Err(err) => return Err(err),
- }
- }
-}
-
-impl DnsPlugin {
- async fn action<'a>(
- &self,
- client: &mut AcmeClient,
- authorization: &'a Authorization,
- domain: &AcmeDomain,
- task: Arc<WorkerTask>,
- action: &str,
- ) -> Result<&'a str, Error> {
- let challenge = extract_challenge(authorization, "dns-01")?;
- let mut stdin_data = client
- .dns_01_txt_value(
- challenge
- .token()
- .ok_or_else(|| format_err!("missing token in challenge"))?,
- )?
- .into_bytes();
- stdin_data.push(b'\n');
- stdin_data.extend(self.data.as_bytes());
- if stdin_data.last() != Some(&b'\n') {
- stdin_data.push(b'\n');
- }
-
- let mut command = Command::new("/usr/bin/setpriv");
-
- #[rustfmt::skip]
- command.args([
- "--reuid", "nobody",
- "--regid", "nogroup",
- "--clear-groups",
- "--reset-env",
- "--",
- "/bin/bash",
- PROXMOX_ACME_SH_PATH,
- action,
- &self.core.api,
- domain.alias.as_deref().unwrap_or(&domain.domain),
- ]);
-
- // We could use 1 socketpair, but tokio wraps them all in `File` internally causing `close`
- // to be called separately on all of them without exception, so we need 3 pipes :-(
-
- let mut child = command
- .stdin(Stdio::piped())
- .stdout(Stdio::piped())
- .stderr(Stdio::piped())
- .spawn()?;
-
- let mut stdin = child.stdin.take().expect("Stdio::piped()");
- let stdout = child.stdout.take().expect("Stdio::piped() failed?");
- let stdout = pipe_to_tasklog(stdout, Arc::clone(&task));
- let stderr = child.stderr.take().expect("Stdio::piped() failed?");
- let stderr = pipe_to_tasklog(stderr, Arc::clone(&task));
- let stdin = async move {
- stdin.write_all(&stdin_data).await?;
- stdin.flush().await?;
- Ok::<_, std::io::Error>(())
- };
- match futures::try_join!(stdin, stdout, stderr) {
- Ok(((), (), ())) => (),
- Err(err) => {
- if let Err(err) = child.kill().await {
- task.log_message(format!(
- "failed to kill '{PROXMOX_ACME_SH_PATH} {action}' command: {err}"
- ));
- }
- bail!("'{}' failed: {}", PROXMOX_ACME_SH_PATH, err);
- }
- }
-
- let status = child.wait().await?;
- if !status.success() {
- bail!(
- "'{} {}' exited with error ({})",
- PROXMOX_ACME_SH_PATH,
- action,
- status.code().unwrap_or(-1)
- );
- }
-
- Ok(&challenge.url)
- }
-}
-
-impl AcmePlugin for DnsPlugin {
- fn setup<'fut, 'a: 'fut, 'b: 'fut, 'c: 'fut, 'd: 'fut>(
- &'a mut self,
- client: &'b mut AcmeClient,
- authorization: &'c Authorization,
- domain: &'d AcmeDomain,
- task: Arc<WorkerTask>,
- ) -> Pin<Box<dyn Future<Output = Result<&'c str, Error>> + Send + 'fut>> {
- Box::pin(async move {
- let result = self
- .action(client, authorization, domain, task.clone(), "setup")
- .await;
-
- let validation_delay = self.core.validation_delay.unwrap_or(30) as u64;
- if validation_delay > 0 {
- task.log_message(format!(
- "Sleeping {validation_delay} seconds to wait for TXT record propagation"
- ));
- tokio::time::sleep(Duration::from_secs(validation_delay)).await;
- }
- result
- })
- }
-
- fn teardown<'fut, 'a: 'fut, 'b: 'fut, 'c: 'fut, 'd: 'fut>(
- &'a mut self,
- client: &'b mut AcmeClient,
- authorization: &'c Authorization,
- domain: &'d AcmeDomain,
- task: Arc<WorkerTask>,
- ) -> Pin<Box<dyn Future<Output = Result<(), Error>> + Send + 'fut>> {
- Box::pin(async move {
- self.action(client, authorization, domain, task, "teardown")
- .await
- .map(drop)
- })
- }
-}
-
-#[derive(Default)]
-struct StandaloneServer {
- abort_handle: Option<futures::future::AbortHandle>,
-}
-
-// In case the "order_certificates" future gets dropped between setup & teardown, let's also cancel
-// the HTTP listener on Drop:
-impl Drop for StandaloneServer {
- fn drop(&mut self) {
- self.stop();
- }
-}
-
-impl StandaloneServer {
- fn stop(&mut self) {
- if let Some(abort) = self.abort_handle.take() {
- abort.abort();
- }
- }
-}
-
-async fn standalone_respond(
- req: Request<Incoming>,
- path: Arc<String>,
- key_auth: Arc<String>,
-) -> Result<Response<Full<Bytes>>, hyper::Error> {
- if req.method() == hyper::Method::GET && req.uri().path() == path.as_str() {
- Ok(Response::builder()
- .status(hyper::http::StatusCode::OK)
- .body(key_auth.as_bytes().to_vec().into())
- .unwrap())
- } else {
- Ok(Response::builder()
- .status(hyper::http::StatusCode::NOT_FOUND)
- .body("Not found.".into())
- .unwrap())
- }
-}
-
-impl AcmePlugin for StandaloneServer {
- fn setup<'fut, 'a: 'fut, 'b: 'fut, 'c: 'fut, 'd: 'fut>(
- &'a mut self,
- client: &'b mut AcmeClient,
- authorization: &'c Authorization,
- _domain: &'d AcmeDomain,
- _task: Arc<WorkerTask>,
- ) -> Pin<Box<dyn Future<Output = Result<&'c str, Error>> + Send + 'fut>> {
- Box::pin(async move {
- self.stop();
-
- let challenge = extract_challenge(authorization, "http-01")?;
- let token = challenge
- .token()
- .ok_or_else(|| format_err!("missing token in challenge"))?;
- let key_auth = Arc::new(client.key_authorization(token)?);
- let path = Arc::new(format!("/.well-known/acme-challenge/{token}"));
-
- // `[::]:80` first, then `*:80`
- let dual = SocketAddr::new(IpAddr::from([0u16; 8]), 80);
- let ipv4 = SocketAddr::new(IpAddr::from([0u8; 4]), 80);
- let incoming = TcpListener::bind(dual)
- .or_else(|_| TcpListener::bind(ipv4))
- .await?;
-
- let server = async move {
- loop {
- let key_auth = Arc::clone(&key_auth);
- let path = Arc::clone(&path);
- match incoming.accept().await {
- Ok((tcp, _)) => {
- let io = TokioIo::new(tcp);
- let service = service_fn(move |request| {
- standalone_respond(
- request,
- Arc::clone(&path),
- Arc::clone(&key_auth),
- )
- });
-
- tokio::task::spawn(async move {
- if let Err(err) =
- http1::Builder::new().serve_connection(io, service).await
- {
- println!("Error serving connection: {err:?}");
- }
- });
- }
- Err(err) => println!("Error accepting connection: {err:?}"),
- }
- }
- };
- let (future, abort) = futures::future::abortable(server);
- self.abort_handle = Some(abort);
- tokio::spawn(future);
-
- Ok(challenge.url.as_str())
- })
- }
-
- fn teardown<'fut, 'a: 'fut, 'b: 'fut, 'c: 'fut, 'd: 'fut>(
- &'a mut self,
- _client: &'b mut AcmeClient,
- _authorization: &'c Authorization,
- _domain: &'d AcmeDomain,
- _task: Arc<WorkerTask>,
- ) -> Pin<Box<dyn Future<Output = Result<(), Error>> + Send + 'fut>> {
- Box::pin(async move {
- if let Some(abort) = self.abort_handle.take() {
- abort.abort();
- }
- Ok(())
- })
- }
-}
diff --git a/src/api2/types/acme.rs b/src/api2/types/acme.rs
deleted file mode 100644
index b83b9882..00000000
--- a/src/api2/types/acme.rs
+++ /dev/null
@@ -1,38 +0,0 @@
-use serde::{Deserialize, Serialize};
-
-use pbs_api_types::{DNS_ALIAS_FORMAT, DNS_NAME_FORMAT, PROXMOX_SAFE_ID_FORMAT};
-use proxmox_schema::api;
-
-#[api(
- properties: {
- "domain": { format: &DNS_NAME_FORMAT },
- "alias": {
- optional: true,
- format: &DNS_ALIAS_FORMAT,
- },
- "plugin": {
- optional: true,
- format: &PROXMOX_SAFE_ID_FORMAT,
- },
- },
- default_key: "domain",
-)]
-#[derive(Deserialize, Serialize)]
-/// A domain entry for an ACME certificate.
-pub struct AcmeDomain {
- /// The domain to certify for.
- pub domain: String,
-
- /// The domain to use for challenges instead of the default acme challenge domain.
- ///
- /// This is useful if you use CNAME entries to redirect `_acme-challenge.*` domains to a
- /// different DNS server.
- #[serde(skip_serializing_if = "Option::is_none")]
- pub alias: Option<String>,
-
- /// The plugin to use to validate this domain.
- ///
- /// Empty means standalone HTTP validation is used.
- #[serde(skip_serializing_if = "Option::is_none")]
- pub plugin: Option<String>,
-}
diff --git a/src/api2/types/mod.rs b/src/api2/types/mod.rs
index afc34b30..34193685 100644
--- a/src/api2/types/mod.rs
+++ b/src/api2/types/mod.rs
@@ -4,9 +4,6 @@ use anyhow::bail;
use proxmox_schema::*;
-mod acme;
-pub use acme::*;
-
// File names: may not contain slashes, may not start with "."
pub const FILENAME_FORMAT: ApiStringFormat = ApiStringFormat::VerifyFn(|name| {
if name.starts_with('.') {
diff --git a/src/config/acme/mod.rs b/src/config/acme/mod.rs
deleted file mode 100644
index 962cb1bb..00000000
--- a/src/config/acme/mod.rs
+++ /dev/null
@@ -1 +0,0 @@
-pub mod plugin;
diff --git a/src/config/acme/plugin.rs b/src/config/acme/plugin.rs
deleted file mode 100644
index e5a41f99..00000000
--- a/src/config/acme/plugin.rs
+++ /dev/null
@@ -1,105 +0,0 @@
-use anyhow::Error;
-use serde::{Deserialize, Serialize};
-use serde_json::Value;
-
-use pbs_api_types::PROXMOX_SAFE_ID_FORMAT;
-use proxmox_schema::{api, Schema, StringSchema, Updater};
-use proxmox_section_config::SectionConfigData;
-
-pub const PLUGIN_ID_SCHEMA: Schema = StringSchema::new("ACME Challenge Plugin ID.")
- .format(&PROXMOX_SAFE_ID_FORMAT)
- .min_length(1)
- .max_length(32)
- .schema();
-
-#[api(
- properties: {
- id: { schema: PLUGIN_ID_SCHEMA },
- disable: {
- optional: true,
- default: false,
- },
- "validation-delay": {
- default: 30,
- optional: true,
- minimum: 0,
- maximum: 2 * 24 * 60 * 60,
- },
- },
-)]
-/// DNS ACME Challenge Plugin core data.
-#[derive(Deserialize, Serialize, Updater)]
-#[serde(rename_all = "kebab-case")]
-pub struct DnsPluginCore {
- /// Plugin ID.
- #[updater(skip)]
- pub id: String,
-
- /// DNS API Plugin Id.
- pub api: String,
-
- /// Extra delay in seconds to wait before requesting validation.
- ///
- /// Allows to cope with long TTL of DNS records.
- #[serde(skip_serializing_if = "Option::is_none", default)]
- pub validation_delay: Option<u32>,
-
- /// Flag to disable the config.
- #[serde(skip_serializing_if = "Option::is_none", default)]
- pub disable: Option<bool>,
-}
-
-#[api(
- properties: {
- core: { type: DnsPluginCore },
- },
-)]
-/// DNS ACME Challenge Plugin.
-#[derive(Deserialize, Serialize)]
-#[serde(rename_all = "kebab-case")]
-pub struct DnsPlugin {
- #[serde(flatten)]
- pub core: DnsPluginCore,
-
- // We handle this property separately in the API calls.
- /// DNS plugin data (base64url encoded without padding).
- #[serde(with = "proxmox_serde::string_as_base64url_nopad")]
- pub data: String,
-}
-
-impl DnsPlugin {
- pub fn decode_data(&self, output: &mut Vec<u8>) -> Result<(), Error> {
- Ok(proxmox_base64::url::decode_to_vec(&self.data, output)?)
- }
-}
-
-pub struct PluginData {
- data: SectionConfigData,
-}
-
-// And some convenience helpers.
-impl PluginData {
- pub fn remove(&mut self, name: &str) -> Option<(String, Value)> {
- self.data.sections.remove(name)
- }
-
- pub fn contains_key(&mut self, name: &str) -> bool {
- self.data.sections.contains_key(name)
- }
-
- pub fn get(&self, name: &str) -> Option<&(String, Value)> {
- self.data.sections.get(name)
- }
-
- pub fn get_mut(&mut self, name: &str) -> Option<&mut (String, Value)> {
- self.data.sections.get_mut(name)
- }
-
- pub fn insert(&mut self, id: String, ty: String, plugin: Value) {
- self.data.sections.insert(id, (ty, plugin));
- }
-
- pub fn iter(&self) -> impl Iterator<Item = (&String, &(String, Value))> + Send {
- self.data.sections.iter()
- }
-}
diff --git a/src/config/mod.rs b/src/config/mod.rs
index 19246742..f05af90d 100644
--- a/src/config/mod.rs
+++ b/src/config/mod.rs
@@ -15,7 +15,6 @@ use proxmox_lang::try_block;
use pbs_api_types::{PamRealmConfig, PbsRealmConfig};
use pbs_buildcfg::{self, configdir};
-pub mod acme;
pub mod node;
pub mod tfa;
diff --git a/src/lib.rs b/src/lib.rs
index 8633378c..828f5842 100644
--- a/src/lib.rs
+++ b/src/lib.rs
@@ -27,8 +27,6 @@ pub(crate) mod auth;
pub mod tape;
-pub mod acme;
-
pub mod client_helpers;
pub mod traffic_control_cache;
--
2.47.3
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
* [pbs-devel] [PATCH proxmox-backup v6 1/2] acme: remove local AcmeClient and use proxmox-acme-api handlers
2026-01-16 11:28 10% [pbs-devel] [PATCH proxmox{, -backup} v6 0/5] " Samuel Rufinatscha
` (2 preceding siblings ...)
2026-01-16 11:28 14% ` [pbs-devel] [PATCH proxmox v6 3/3] fix #6939: acme: support servers returning 204 for nonce requests Samuel Rufinatscha
@ 2026-01-16 11:28 4% ` Samuel Rufinatscha
2026-01-16 11:28 9% ` [pbs-devel] [PATCH proxmox-backup v6 2/2] acme: remove unused src/acme and plugin code Samuel Rufinatscha
4 siblings, 0 replies; 39+ results
From: Samuel Rufinatscha @ 2026-01-16 11:28 UTC (permalink / raw)
To: pbs-devel
PBS currently maintains its own ACME client and API logic, while PDM
already uses the factored-out proxmox-acme and proxmox-acme-api crates,
so the same functionality has to be maintained in two places. This
patch moves PBS over to the shared ACME stack.
Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
---
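For context on what moves into the shared crate: the removed client
(src/acme/client.rs) wrapped every signed request in a small badNonce
retry guard, re-fetching a Replay-Nonce and retrying up to three times
before giving up. A self-contained sketch of just that guard (names and
error type are illustrative, not the shared crate's actual API):

```rust
/// Bad-nonce retry counter, as used by the removed AcmeClient:
/// each request loop calls `tick()` before (re)trying, so a request
/// is attempted at most three times before the loop errors out.
struct Retry(usize);

const fn retry() -> Retry {
    Retry(0)
}

impl Retry {
    /// Ok for the first three calls, Err once the budget is spent.
    fn tick(&mut self) -> Result<(), String> {
        if self.0 >= 3 {
            Err("kept getting a badNonce error!".to_string())
        } else {
            self.0 += 1;
            Ok(())
        }
    }
}

fn main() {
    let mut r = retry();
    // Three attempts are allowed; the fourth tick reports exhaustion.
    assert!(r.tick().is_ok());
    assert!(r.tick().is_ok());
    assert!(r.tick().is_ok());
    assert!(r.tick().is_err());
    println!("retry guard exhausted after three attempts");
}
```

The same bounded-retry behavior is what callers now get implicitly from
proxmox_acme::async_client::AcmeClient.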
Cargo.toml | 3 +
src/acme/client.rs | 691 -------------------------
src/acme/mod.rs | 4 -
src/acme/plugin.rs | 2 +-
src/api2/config/acme.rs | 399 ++------------
src/api2/node/certificates.rs | 221 +-------
src/api2/types/acme.rs | 61 +--
src/bin/proxmox-backup-api.rs | 2 +
src/bin/proxmox-backup-manager.rs | 3 +-
src/bin/proxmox-backup-proxy.rs | 1 +
src/bin/proxmox_backup_manager/acme.rs | 37 +-
src/config/acme/mod.rs | 167 ------
src/config/acme/plugin.rs | 88 +---
src/config/node.rs | 43 +-
14 files changed, 98 insertions(+), 1624 deletions(-)
delete mode 100644 src/acme/client.rs
diff --git a/Cargo.toml b/Cargo.toml
index 49548ecc..5c94bfaa 100644
--- a/Cargo.toml
+++ b/Cargo.toml
@@ -101,6 +101,7 @@ pbs-api-types = "1.0.8"
# other proxmox crates
pathpatterns = "1"
proxmox-acme = "1"
+proxmox-acme-api = { version = "1", features = [ "impl" ] }
pxar = "1"
# PBS workspace
@@ -251,6 +252,7 @@ pbs-api-types.workspace = true
# in their respective repo
proxmox-acme.workspace = true
+proxmox-acme-api.workspace = true
pxar.workspace = true
# proxmox-backup workspace/internal crates
@@ -269,6 +271,7 @@ proxmox-rrd-api-types.workspace = true
[patch.crates-io]
#pbs-api-types = { path = "../proxmox/pbs-api-types" }
#proxmox-acme = { path = "../proxmox/proxmox-acme" }
+#proxmox-acme-api = { path = "../proxmox/proxmox-acme-api" }
#proxmox-api-macro = { path = "../proxmox/proxmox-api-macro" }
#proxmox-apt = { path = "../proxmox/proxmox-apt" }
#proxmox-apt-api-types = { path = "../proxmox/proxmox-apt-api-types" }
diff --git a/src/acme/client.rs b/src/acme/client.rs
deleted file mode 100644
index 9fb6ad55..00000000
--- a/src/acme/client.rs
+++ /dev/null
@@ -1,691 +0,0 @@
-//! HTTP Client for the ACME protocol.
-
-use std::fs::OpenOptions;
-use std::io;
-use std::os::unix::fs::OpenOptionsExt;
-
-use anyhow::{bail, format_err};
-use bytes::Bytes;
-use http_body_util::BodyExt;
-use hyper::Request;
-use nix::sys::stat::Mode;
-use proxmox_http::Body;
-use serde::{Deserialize, Serialize};
-
-use proxmox_acme::account::AccountCreator;
-use proxmox_acme::order::{Order, OrderData};
-use proxmox_acme::types::AccountData as AcmeAccountData;
-use proxmox_acme::Request as AcmeRequest;
-use proxmox_acme::{Account, Authorization, Challenge, Directory, Error, ErrorResponse};
-use proxmox_http::client::Client;
-use proxmox_sys::fs::{replace_file, CreateOptions};
-
-use crate::api2::types::AcmeAccountName;
-use crate::config::acme::account_path;
-use crate::tools::pbs_simple_http;
-
-/// Our on-disk format inherited from PVE's proxmox-acme code.
-#[derive(Deserialize, Serialize)]
-#[serde(rename_all = "camelCase")]
-pub struct AccountData {
- /// The account's location URL.
- location: String,
-
- /// The account data.
- account: AcmeAccountData,
-
- /// The private key as PEM formatted string.
- key: String,
-
- /// ToS URL the user agreed to.
- #[serde(skip_serializing_if = "Option::is_none")]
- tos: Option<String>,
-
- #[serde(skip_serializing_if = "is_false", default)]
- debug: bool,
-
- /// The directory's URL.
- directory_url: String,
-}
-
-#[inline]
-fn is_false(b: &bool) -> bool {
- !*b
-}
-
-pub struct AcmeClient {
- directory_url: String,
- debug: bool,
- account_path: Option<String>,
- tos: Option<String>,
- account: Option<Account>,
- directory: Option<Directory>,
- nonce: Option<String>,
- http_client: Client,
-}
-
-impl AcmeClient {
- /// Create a new ACME client for a given ACME directory URL.
- pub fn new(directory_url: String) -> Self {
- Self {
- directory_url,
- debug: false,
- account_path: None,
- tos: None,
- account: None,
- directory: None,
- nonce: None,
- http_client: pbs_simple_http(None),
- }
- }
-
- /// Load an existing ACME account by name.
- pub async fn load(account_name: &AcmeAccountName) -> Result<Self, anyhow::Error> {
- let account_path = account_path(account_name.as_ref());
- let data = match tokio::fs::read(&account_path).await {
- Ok(data) => data,
- Err(err) if err.kind() == io::ErrorKind::NotFound => {
- bail!("acme account '{}' does not exist", account_name)
- }
- Err(err) => bail!(
- "failed to load acme account from '{}' - {}",
- account_path,
- err
- ),
- };
- let data: AccountData = serde_json::from_slice(&data).map_err(|err| {
- format_err!(
- "failed to parse acme account from '{}' - {}",
- account_path,
- err
- )
- })?;
-
- let account = Account::from_parts(data.location, data.key, data.account);
-
- let mut me = Self::new(data.directory_url);
- me.debug = data.debug;
- me.account_path = Some(account_path);
- me.tos = data.tos;
- me.account = Some(account);
-
- Ok(me)
- }
-
- pub async fn new_account<'a>(
- &'a mut self,
- account_name: &AcmeAccountName,
- tos_agreed: bool,
- contact: Vec<String>,
- rsa_bits: Option<u32>,
- eab_creds: Option<(String, String)>,
- ) -> Result<&'a Account, anyhow::Error> {
- self.tos = if tos_agreed {
- self.terms_of_service_url().await?.map(str::to_owned)
- } else {
- None
- };
-
- let mut account = Account::creator()
- .set_contacts(contact)
- .agree_to_tos(tos_agreed);
-
- if let Some((eab_kid, eab_hmac_key)) = eab_creds {
- account = account.set_eab_credentials(eab_kid, eab_hmac_key)?;
- }
-
- let account = if let Some(bits) = rsa_bits {
- account.generate_rsa_key(bits)?
- } else {
- account.generate_ec_key()?
- };
-
- let _ = self.register_account(account).await?;
-
- crate::config::acme::make_acme_account_dir()?;
- let account_path = account_path(account_name.as_ref());
- let file = OpenOptions::new()
- .write(true)
- .create_new(true)
- .mode(0o600)
- .open(&account_path)
- .map_err(|err| format_err!("failed to open {:?} for writing: {}", account_path, err))?;
- self.write_to(file).map_err(|err| {
- format_err!(
- "failed to write acme account to {:?}: {}",
- account_path,
- err
- )
- })?;
- self.account_path = Some(account_path);
-
- // unwrap: Setting `self.account` is literally this function's job, we just can't keep
- // the borrow from from `self.register_account()` active due to clashes.
- Ok(self.account.as_ref().unwrap())
- }
-
- fn save(&self) -> Result<(), anyhow::Error> {
- let mut data = Vec::<u8>::new();
- self.write_to(&mut data)?;
- let account_path = self.account_path.as_ref().ok_or_else(|| {
- format_err!("no account path set, cannot save updated account information")
- })?;
- crate::config::acme::make_acme_account_dir()?;
- replace_file(
- account_path,
- &data,
- CreateOptions::new()
- .perm(Mode::from_bits_truncate(0o600))
- .owner(nix::unistd::ROOT)
- .group(nix::unistd::Gid::from_raw(0)),
- true,
- )
- }
-
- /// Shortcut to `account().ok_or_else(...).key_authorization()`.
- pub fn key_authorization(&self, token: &str) -> Result<String, anyhow::Error> {
- Ok(Self::need_account(&self.account)?.key_authorization(token)?)
- }
-
- /// Shortcut to `account().ok_or_else(...).dns_01_txt_value()`.
- /// the key authorization value.
- pub fn dns_01_txt_value(&self, token: &str) -> Result<String, anyhow::Error> {
- Ok(Self::need_account(&self.account)?.dns_01_txt_value(token)?)
- }
-
- async fn register_account(
- &mut self,
- account: AccountCreator,
- ) -> Result<&Account, anyhow::Error> {
- let mut retry = retry();
- let mut response = loop {
- retry.tick()?;
-
- let (directory, nonce) = Self::get_dir_nonce(
- &mut self.http_client,
- &self.directory_url,
- &mut self.directory,
- &mut self.nonce,
- )
- .await?;
- let request = account.request(directory, nonce)?;
- match self.run_request(request).await {
- Ok(response) => break response,
- Err(err) if err.is_bad_nonce() => continue,
- Err(err) => return Err(err.into()),
- }
- };
-
- let account = account.response(response.location_required()?, &response.body)?;
-
- self.account = Some(account);
- Ok(self.account.as_ref().unwrap())
- }
-
- pub async fn update_account<T: Serialize>(
- &mut self,
- data: &T,
- ) -> Result<&Account, anyhow::Error> {
- let account = Self::need_account(&self.account)?;
-
- let mut retry = retry();
- let response = loop {
- retry.tick()?;
-
- let (_directory, nonce) = Self::get_dir_nonce(
- &mut self.http_client,
- &self.directory_url,
- &mut self.directory,
- &mut self.nonce,
- )
- .await?;
-
- let request = account.post_request(&account.location, nonce, data)?;
- match Self::execute(&mut self.http_client, request, &mut self.nonce).await {
- Ok(response) => break response,
- Err(err) if err.is_bad_nonce() => continue,
- Err(err) => return Err(err.into()),
- }
- };
-
- // unwrap: we've been keeping an immutable reference to it from the top of the method
- let _ = account;
- self.account.as_mut().unwrap().data = response.json()?;
- self.save()?;
- Ok(self.account.as_ref().unwrap())
- }
-
- pub async fn new_order<I>(&mut self, domains: I) -> Result<Order, anyhow::Error>
- where
- I: IntoIterator<Item = String>,
- {
- let account = Self::need_account(&self.account)?;
-
- let order = domains
- .into_iter()
- .fold(OrderData::new(), |order, domain| order.domain(domain));
-
- let mut retry = retry();
- loop {
- retry.tick()?;
-
- let (directory, nonce) = Self::get_dir_nonce(
- &mut self.http_client,
- &self.directory_url,
- &mut self.directory,
- &mut self.nonce,
- )
- .await?;
-
- let mut new_order = account.new_order(&order, directory, nonce)?;
- let mut response = match Self::execute(
- &mut self.http_client,
- new_order.request.take().unwrap(),
- &mut self.nonce,
- )
- .await
- {
- Ok(response) => response,
- Err(err) if err.is_bad_nonce() => continue,
- Err(err) => return Err(err.into()),
- };
-
- return Ok(
- new_order.response(response.location_required()?, response.bytes().as_ref())?
- );
- }
- }
-
- /// Low level "POST-as-GET" request.
- async fn post_as_get(&mut self, url: &str) -> Result<AcmeResponse, anyhow::Error> {
- let account = Self::need_account(&self.account)?;
-
- let mut retry = retry();
- loop {
- retry.tick()?;
-
- let (_directory, nonce) = Self::get_dir_nonce(
- &mut self.http_client,
- &self.directory_url,
- &mut self.directory,
- &mut self.nonce,
- )
- .await?;
-
- let request = account.get_request(url, nonce)?;
- match Self::execute(&mut self.http_client, request, &mut self.nonce).await {
- Ok(response) => return Ok(response),
- Err(err) if err.is_bad_nonce() => continue,
- Err(err) => return Err(err.into()),
- }
- }
- }
-
- /// Low level POST request.
- async fn post<T: Serialize>(
- &mut self,
- url: &str,
- data: &T,
- ) -> Result<AcmeResponse, anyhow::Error> {
- let account = Self::need_account(&self.account)?;
-
- let mut retry = retry();
- loop {
- retry.tick()?;
-
- let (_directory, nonce) = Self::get_dir_nonce(
- &mut self.http_client,
- &self.directory_url,
- &mut self.directory,
- &mut self.nonce,
- )
- .await?;
-
- let request = account.post_request(url, nonce, data)?;
- match Self::execute(&mut self.http_client, request, &mut self.nonce).await {
- Ok(response) => return Ok(response),
- Err(err) if err.is_bad_nonce() => continue,
- Err(err) => return Err(err.into()),
- }
- }
- }
-
- /// Request challenge validation. Afterwards, the challenge should be polled.
- pub async fn request_challenge_validation(
- &mut self,
- url: &str,
- ) -> Result<Challenge, anyhow::Error> {
- Ok(self
- .post(url, &serde_json::Value::Object(Default::default()))
- .await?
- .json()?)
- }
-
- /// Assuming the provided URL is an 'Authorization' URL, get and deserialize it.
- pub async fn get_authorization(&mut self, url: &str) -> Result<Authorization, anyhow::Error> {
- Ok(self.post_as_get(url).await?.json()?)
- }
-
- /// Assuming the provided URL is an 'Order' URL, get and deserialize it.
- pub async fn get_order(&mut self, url: &str) -> Result<OrderData, anyhow::Error> {
- Ok(self.post_as_get(url).await?.json()?)
- }
-
- /// Finalize an Order via its `finalize` URL property and the DER encoded CSR.
- pub async fn finalize(&mut self, url: &str, csr: &[u8]) -> Result<(), anyhow::Error> {
- let csr = proxmox_base64::url::encode_no_pad(csr);
- let data = serde_json::json!({ "csr": csr });
- self.post(url, &data).await?;
- Ok(())
- }
-
- /// Download a certificate via its 'certificate' URL property.
- ///
- /// The certificate will be a PEM certificate chain.
- pub async fn get_certificate(&mut self, url: &str) -> Result<Bytes, anyhow::Error> {
- Ok(self.post_as_get(url).await?.body)
- }
-
- /// Revoke an existing certificate (PEM or DER formatted).
- pub async fn revoke_certificate(
- &mut self,
- certificate: &[u8],
- reason: Option<u32>,
- ) -> Result<(), anyhow::Error> {
- // TODO: This can also work without an account.
- let account = Self::need_account(&self.account)?;
-
- let revocation = account.revoke_certificate(certificate, reason)?;
-
- let mut retry = retry();
- loop {
- retry.tick()?;
-
- let (directory, nonce) = Self::get_dir_nonce(
- &mut self.http_client,
- &self.directory_url,
- &mut self.directory,
- &mut self.nonce,
- )
- .await?;
-
- let request = revocation.request(directory, nonce)?;
- match Self::execute(&mut self.http_client, request, &mut self.nonce).await {
- Ok(_response) => return Ok(()),
- Err(err) if err.is_bad_nonce() => continue,
- Err(err) => return Err(err.into()),
- }
- }
- }
-
- fn need_account(account: &Option<Account>) -> Result<&Account, anyhow::Error> {
- account
- .as_ref()
- .ok_or_else(|| format_err!("cannot use client without an account"))
- }
-
- pub(crate) fn account(&self) -> Result<&Account, anyhow::Error> {
- Self::need_account(&self.account)
- }
-
- pub fn tos(&self) -> Option<&str> {
- self.tos.as_deref()
- }
-
- pub fn directory_url(&self) -> &str {
- &self.directory_url
- }
-
- fn to_account_data(&self) -> Result<AccountData, anyhow::Error> {
- let account = self.account()?;
-
- Ok(AccountData {
- location: account.location.clone(),
- key: account.private_key.clone(),
- account: AcmeAccountData {
- only_return_existing: false, // don't actually write this out in case it's set
- ..account.data.clone()
- },
- tos: self.tos.clone(),
- debug: self.debug,
- directory_url: self.directory_url.clone(),
- })
- }
-
- fn write_to<T: io::Write>(&self, out: T) -> Result<(), anyhow::Error> {
- let data = self.to_account_data()?;
-
- Ok(serde_json::to_writer_pretty(out, &data)?)
- }
-}
-
-struct AcmeResponse {
- body: Bytes,
- location: Option<String>,
- got_nonce: bool,
-}
-
-impl AcmeResponse {
- /// Convenience helper to assert that a location header was part of the response.
- fn location_required(&mut self) -> Result<String, anyhow::Error> {
- self.location
- .take()
- .ok_or_else(|| format_err!("missing Location header"))
- }
-
- /// Convenience shortcut to perform json deserialization of the returned body.
- fn json<T: for<'a> Deserialize<'a>>(&self) -> Result<T, Error> {
- Ok(serde_json::from_slice(&self.body)?)
- }
-
- /// Convenience shortcut to get the body as bytes.
- fn bytes(&self) -> &[u8] {
- &self.body
- }
-}
-
-impl AcmeClient {
- /// Non-self-borrowing run_request version for borrow workarounds.
- async fn execute(
- http_client: &mut Client,
- request: AcmeRequest,
- nonce: &mut Option<String>,
- ) -> Result<AcmeResponse, Error> {
- let req_builder = Request::builder().method(request.method).uri(&request.url);
-
- let http_request = if !request.content_type.is_empty() {
- req_builder
- .header("Content-Type", request.content_type)
- .header("Content-Length", request.body.len())
- .body(request.body.into())
- } else {
- req_builder.body(Body::empty())
- }
- .map_err(|err| Error::Custom(format!("failed to create http request: {err}")))?;
-
- let response = http_client
- .request(http_request)
- .await
- .map_err(|err| Error::Custom(err.to_string()))?;
- let (parts, body) = response.into_parts();
-
- let status = parts.status.as_u16();
- let body = body
- .collect()
- .await
- .map_err(|err| Error::Custom(format!("failed to retrieve response body: {err}")))?
- .to_bytes();
-
- let got_nonce = if let Some(new_nonce) = parts.headers.get(proxmox_acme::REPLAY_NONCE) {
- let new_nonce = new_nonce.to_str().map_err(|err| {
- Error::Client(format!(
- "received invalid replay-nonce header from ACME server: {err}"
- ))
- })?;
- *nonce = Some(new_nonce.to_owned());
- true
- } else {
- false
- };
-
- if parts.status.is_success() {
- if status != request.expected {
- return Err(Error::InvalidApi(format!(
- "ACME server responded with unexpected status code: {:?}",
- parts.status
- )));
- }
-
- let location = parts
- .headers
- .get("Location")
- .map(|header| {
- header.to_str().map(str::to_owned).map_err(|err| {
- Error::Client(format!(
- "received invalid location header from ACME server: {err}"
- ))
- })
- })
- .transpose()?;
-
- return Ok(AcmeResponse {
- body,
- location,
- got_nonce,
- });
- }
-
- let error: ErrorResponse = serde_json::from_slice(&body).map_err(|err| {
- Error::Client(format!(
- "error status with improper error ACME response: {err}"
- ))
- })?;
-
- if error.ty == proxmox_acme::error::BAD_NONCE {
- if !got_nonce {
- return Err(Error::InvalidApi(
- "badNonce without a new Replay-Nonce header".to_string(),
- ));
- }
- return Err(Error::BadNonce);
- }
-
- Err(Error::Api(error))
- }
-
- /// Low-level API to run an n API request. This automatically updates the current nonce!
- async fn run_request(&mut self, request: AcmeRequest) -> Result<AcmeResponse, Error> {
- Self::execute(&mut self.http_client, request, &mut self.nonce).await
- }
-
- pub async fn directory(&mut self) -> Result<&Directory, Error> {
- Ok(Self::get_directory(
- &mut self.http_client,
- &self.directory_url,
- &mut self.directory,
- &mut self.nonce,
- )
- .await?
- .0)
- }
-
- async fn get_directory<'a, 'b>(
- http_client: &mut Client,
- directory_url: &str,
- directory: &'a mut Option<Directory>,
- nonce: &'b mut Option<String>,
- ) -> Result<(&'a Directory, Option<&'b str>), Error> {
- if let Some(d) = directory {
- return Ok((d, nonce.as_deref()));
- }
-
- let response = Self::execute(
- http_client,
- AcmeRequest {
- url: directory_url.to_string(),
- method: "GET",
- content_type: "",
- body: String::new(),
- expected: 200,
- },
- nonce,
- )
- .await?;
-
- *directory = Some(Directory::from_parts(
- directory_url.to_string(),
- response.json()?,
- ));
-
- Ok((directory.as_mut().unwrap(), nonce.as_deref()))
- }
-
- /// Like `get_directory`, but if the directory provides no nonce, also performs a `HEAD`
- /// request on the new nonce URL.
- async fn get_dir_nonce<'a, 'b>(
- http_client: &mut Client,
- directory_url: &str,
- directory: &'a mut Option<Directory>,
- nonce: &'b mut Option<String>,
- ) -> Result<(&'a Directory, &'b str), Error> {
- // this let construct is a lifetime workaround:
- let _ = Self::get_directory(http_client, directory_url, directory, nonce).await?;
- let dir = directory.as_ref().unwrap(); // the above fails if it couldn't fill this option
- if nonce.is_none() {
- // this is also a lifetime issue...
- let _ = Self::get_nonce(http_client, nonce, dir.new_nonce_url()).await?;
- };
- Ok((dir, nonce.as_deref().unwrap()))
- }
-
- pub async fn terms_of_service_url(&mut self) -> Result<Option<&str>, Error> {
- Ok(self.directory().await?.terms_of_service_url())
- }
-
- async fn get_nonce<'a>(
- http_client: &mut Client,
- nonce: &'a mut Option<String>,
- new_nonce_url: &str,
- ) -> Result<&'a str, Error> {
- let response = Self::execute(
- http_client,
- AcmeRequest {
- url: new_nonce_url.to_owned(),
- method: "HEAD",
- content_type: "",
- body: String::new(),
- expected: 200,
- },
- nonce,
- )
- .await?;
-
- if !response.got_nonce {
- return Err(Error::InvalidApi(
- "no new nonce received from new nonce URL".to_string(),
- ));
- }
-
- nonce
- .as_deref()
- .ok_or_else(|| Error::Client("failed to update nonce".to_string()))
- }
-}
-
-/// bad nonce retry count helper
-struct Retry(usize);
-
-const fn retry() -> Retry {
- Retry(0)
-}
-
-impl Retry {
- fn tick(&mut self) -> Result<(), Error> {
- if self.0 >= 3 {
- Err(Error::Client("kept getting a badNonce error!".to_string()))
- } else {
- self.0 += 1;
- Ok(())
- }
- }
-}
diff --git a/src/acme/mod.rs b/src/acme/mod.rs
index bf61811c..700d90d7 100644
--- a/src/acme/mod.rs
+++ b/src/acme/mod.rs
@@ -1,5 +1 @@
-mod client;
-pub use client::AcmeClient;
-
pub(crate) mod plugin;
-pub(crate) use plugin::get_acme_plugin;
diff --git a/src/acme/plugin.rs b/src/acme/plugin.rs
index 993d729b..6804243c 100644
--- a/src/acme/plugin.rs
+++ b/src/acme/plugin.rs
@@ -18,10 +18,10 @@ use tokio::io::{AsyncBufReadExt, AsyncRead, AsyncWriteExt, BufReader};
use tokio::net::TcpListener;
use tokio::process::Command;
+use proxmox_acme::async_client::AcmeClient;
use proxmox_acme::{Authorization, Challenge};
use proxmox_rest_server::WorkerTask;
-use crate::acme::AcmeClient;
use crate::api2::types::AcmeDomain;
use crate::config::acme::plugin::{DnsPlugin, PluginData};
diff --git a/src/api2/config/acme.rs b/src/api2/config/acme.rs
index 18671639..fb1a8a6f 100644
--- a/src/api2/config/acme.rs
+++ b/src/api2/config/acme.rs
@@ -1,29 +1,19 @@
-use std::fs;
-use std::ops::ControlFlow;
+use anyhow::Error;
use std::path::Path;
-use std::sync::{Arc, LazyLock, Mutex};
-use std::time::SystemTime;
-
-use anyhow::{bail, format_err, Error};
-use hex::FromHex;
-use serde::{Deserialize, Serialize};
-use serde_json::{json, Value};
-use tracing::{info, warn};
+use tracing::info;
use pbs_api_types::{Authid, PRIV_SYS_MODIFY};
-use proxmox_acme::types::AccountData as AcmeAccountData;
-use proxmox_acme::Account;
+use proxmox_acme_api::{
+ AccountEntry, AccountInfo, AcmeAccountName, AcmeChallengeSchema, ChallengeSchemaWrapper,
+ DeletablePluginProperty, DnsPluginCore, DnsPluginCoreUpdater, KnownAcmeDirectory, PluginConfig,
+ DEFAULT_ACME_DIRECTORY_ENTRY, PLUGIN_ID_SCHEMA,
+};
+use proxmox_config_digest::ConfigDigest;
use proxmox_rest_server::WorkerTask;
use proxmox_router::{
http_bail, list_subdirs_api_method, Permission, Router, RpcEnvironment, SubdirMap,
};
-use proxmox_schema::{api, param_bail};
-
-use crate::acme::AcmeClient;
-use crate::api2::types::{AcmeAccountName, AcmeChallengeSchema, KnownAcmeDirectory};
-use crate::config::acme::plugin::{
- self, DnsPlugin, DnsPluginCore, DnsPluginCoreUpdater, PLUGIN_ID_SCHEMA,
-};
+use proxmox_schema::api;
pub(crate) const ROUTER: Router = Router::new()
.get(&list_subdirs_api_method!(SUBDIRS))
@@ -65,19 +55,6 @@ const PLUGIN_ITEM_ROUTER: Router = Router::new()
.put(&API_METHOD_UPDATE_PLUGIN)
.delete(&API_METHOD_DELETE_PLUGIN);
-#[api(
- properties: {
- name: { type: AcmeAccountName },
- },
-)]
-/// An ACME Account entry.
-///
-/// Currently only contains a 'name' property.
-#[derive(Serialize)]
-pub struct AccountEntry {
- name: AcmeAccountName,
-}
-
#[api(
access: {
permission: &Permission::Privilege(&["system", "certificates"], PRIV_SYS_MODIFY, false),
@@ -91,40 +68,7 @@ pub struct AccountEntry {
)]
/// List ACME accounts.
pub fn list_accounts() -> Result<Vec<AccountEntry>, Error> {
- let mut entries = Vec::new();
- crate::config::acme::foreach_acme_account(|name| {
- entries.push(AccountEntry { name });
- ControlFlow::Continue(())
- })?;
- Ok(entries)
-}
-
-#[api(
- properties: {
- account: { type: Object, properties: {}, additional_properties: true },
- tos: {
- type: String,
- optional: true,
- },
- },
-)]
-/// ACME Account information.
-///
-/// This is what we return via the API.
-#[derive(Serialize)]
-pub struct AccountInfo {
- /// Raw account data.
- account: AcmeAccountData,
-
- /// The ACME directory URL the account was created at.
- directory: String,
-
- /// The account's own URL within the ACME directory.
- location: String,
-
- /// The ToS URL, if the user agreed to one.
- #[serde(skip_serializing_if = "Option::is_none")]
- tos: Option<String>,
+ proxmox_acme_api::list_accounts()
}
#[api(
@@ -141,23 +85,7 @@ pub struct AccountInfo {
)]
/// Return existing ACME account information.
pub async fn get_account(name: AcmeAccountName) -> Result<AccountInfo, Error> {
- let client = AcmeClient::load(&name).await?;
- let account = client.account()?;
- Ok(AccountInfo {
- location: account.location.clone(),
- tos: client.tos().map(str::to_owned),
- directory: client.directory_url().to_owned(),
- account: AcmeAccountData {
- only_return_existing: false, // don't actually write this out in case it's set
- ..account.data.clone()
- },
- })
-}
-
-fn account_contact_from_string(s: &str) -> Vec<String> {
- s.split(&[' ', ';', ',', '\0'][..])
- .map(|s| format!("mailto:{s}"))
- .collect()
+ proxmox_acme_api::get_account(name).await
}
#[api(
@@ -222,15 +150,11 @@ fn register_account(
);
}
- if Path::new(&crate::config::acme::account_path(&name)).exists() {
+ if Path::new(&proxmox_acme_api::account_config_filename(&name)).exists() {
http_bail!(BAD_REQUEST, "account {} already exists", name);
}
- let directory = directory.unwrap_or_else(|| {
- crate::config::acme::DEFAULT_ACME_DIRECTORY_ENTRY
- .url
- .to_owned()
- });
+ let directory = directory.unwrap_or_else(|| DEFAULT_ACME_DIRECTORY_ENTRY.url.to_string());
WorkerTask::spawn(
"acme-register",
@@ -238,41 +162,24 @@ fn register_account(
auth_id.to_string(),
true,
move |_worker| async move {
- let mut client = AcmeClient::new(directory);
-
info!("Registering ACME account '{}'...", &name);
- let account = do_register_account(
- &mut client,
+ let location = proxmox_acme_api::register_account(
&name,
- tos_url.is_some(),
contact,
- None,
+ tos_url,
+ Some(directory),
eab_kid.zip(eab_hmac_key),
)
.await?;
- info!("Registration successful, account URL: {}", account.location);
+ info!("Registration successful, account URL: {}", location);
Ok(())
},
)
}
-pub async fn do_register_account<'a>(
- client: &'a mut AcmeClient,
- name: &AcmeAccountName,
- agree_to_tos: bool,
- contact: String,
- rsa_bits: Option<u32>,
- eab_creds: Option<(String, String)>,
-) -> Result<&'a Account, Error> {
- let contact = account_contact_from_string(&contact);
- client
- .new_account(name, agree_to_tos, contact, rsa_bits, eab_creds)
- .await
-}
-
#[api(
input: {
properties: {
@@ -303,14 +210,7 @@ pub fn update_account(
auth_id.to_string(),
true,
move |_worker| async move {
- let data = match contact {
- Some(data) => json!({
- "contact": account_contact_from_string(&data),
- }),
- None => json!({}),
- };
-
- AcmeClient::load(&name).await?.update_account(&data).await?;
+ proxmox_acme_api::update_account(&name, contact).await?;
Ok(())
},
@@ -348,18 +248,8 @@ pub fn deactivate_account(
auth_id.to_string(),
true,
move |_worker| async move {
- match AcmeClient::load(&name)
- .await?
- .update_account(&json!({"status": "deactivated"}))
- .await
- {
- Ok(_account) => (),
- Err(err) if !force => return Err(err),
- Err(err) => {
- warn!("error deactivating account {name}, proceeding anyway - {err}");
- }
- }
- crate::config::acme::mark_account_deactivated(&name)?;
+ proxmox_acme_api::deactivate_account(&name, force).await?;
+
Ok(())
},
)
@@ -386,15 +276,7 @@ pub fn deactivate_account(
)]
/// Get the Terms of Service URL for an ACME directory.
async fn get_tos(directory: Option<String>) -> Result<Option<String>, Error> {
- let directory = directory.unwrap_or_else(|| {
- crate::config::acme::DEFAULT_ACME_DIRECTORY_ENTRY
- .url
- .to_owned()
- });
- Ok(AcmeClient::new(directory)
- .terms_of_service_url()
- .await?
- .map(str::to_owned))
+ proxmox_acme_api::get_tos(directory).await
}
#[api(
@@ -409,52 +291,7 @@ async fn get_tos(directory: Option<String>) -> Result<Option<String>, Error> {
)]
/// Get named known ACME directory endpoints.
fn get_directories() -> Result<&'static [KnownAcmeDirectory], Error> {
- Ok(crate::config::acme::KNOWN_ACME_DIRECTORIES)
-}
-
-/// Wrapper for efficient Arc use when returning the ACME challenge-plugin schema for serializing
-struct ChallengeSchemaWrapper {
- inner: Arc<Vec<AcmeChallengeSchema>>,
-}
-
-impl Serialize for ChallengeSchemaWrapper {
- fn serialize<S>(&self, serializer: S) -> Result<S::Ok, S::Error>
- where
- S: serde::Serializer,
- {
- self.inner.serialize(serializer)
- }
-}
-
-struct CachedSchema {
- schema: Arc<Vec<AcmeChallengeSchema>>,
- cached_mtime: SystemTime,
-}
-
-fn get_cached_challenge_schemas() -> Result<ChallengeSchemaWrapper, Error> {
- static CACHE: LazyLock<Mutex<Option<CachedSchema>>> = LazyLock::new(|| Mutex::new(None));
-
- // the actual loading code
- let mut last = CACHE.lock().unwrap();
-
- let actual_mtime = fs::metadata(crate::config::acme::ACME_DNS_SCHEMA_FN)?.modified()?;
-
- let schema = match &*last {
- Some(CachedSchema {
- schema,
- cached_mtime,
- }) if *cached_mtime >= actual_mtime => schema.clone(),
- _ => {
- let new_schema = Arc::new(crate::config::acme::load_dns_challenge_schema()?);
- *last = Some(CachedSchema {
- schema: Arc::clone(&new_schema),
- cached_mtime: actual_mtime,
- });
- new_schema
- }
- };
-
- Ok(ChallengeSchemaWrapper { inner: schema })
+ Ok(proxmox_acme_api::KNOWN_ACME_DIRECTORIES)
}
#[api(
@@ -469,69 +306,7 @@ fn get_cached_challenge_schemas() -> Result<ChallengeSchemaWrapper, Error> {
)]
/// Get named known ACME directory endpoints.
fn get_challenge_schema() -> Result<ChallengeSchemaWrapper, Error> {
- get_cached_challenge_schemas()
-}
-
-#[api]
-#[derive(Default, Deserialize, Serialize)]
-#[serde(rename_all = "kebab-case")]
-/// The API's format is inherited from PVE/PMG:
-pub struct PluginConfig {
- /// Plugin ID.
- plugin: String,
-
- /// Plugin type.
- #[serde(rename = "type")]
- ty: String,
-
- /// DNS Api name.
- #[serde(skip_serializing_if = "Option::is_none", default)]
- api: Option<String>,
-
- /// Plugin configuration data.
- #[serde(skip_serializing_if = "Option::is_none", default)]
- data: Option<String>,
-
- /// Extra delay in seconds to wait before requesting validation.
- ///
- /// Allows to cope with long TTL of DNS records.
- #[serde(skip_serializing_if = "Option::is_none", default)]
- validation_delay: Option<u32>,
-
- /// Flag to disable the config.
- #[serde(skip_serializing_if = "Option::is_none", default)]
- disable: Option<bool>,
-}
-
-// See PMG/PVE's $modify_cfg_for_api sub
-fn modify_cfg_for_api(id: &str, ty: &str, data: &Value) -> PluginConfig {
- let mut entry = data.clone();
-
- let obj = entry.as_object_mut().unwrap();
- obj.remove("id");
- obj.insert("plugin".to_string(), Value::String(id.to_owned()));
- obj.insert("type".to_string(), Value::String(ty.to_owned()));
-
- // FIXME: This needs to go once the `Updater` is fixed.
- // None of these should be able to fail unless the user changed the files by hand, in which
- // case we leave the unmodified string in the Value for now. This will be handled with an error
- // later.
- if let Some(Value::String(ref mut data)) = obj.get_mut("data") {
- if let Ok(new) = proxmox_base64::url::decode_no_pad(&data) {
- if let Ok(utf8) = String::from_utf8(new) {
- *data = utf8;
- }
- }
- }
-
- // PVE/PMG do this explicitly for ACME plugins...
- // obj.insert("digest".to_string(), Value::String(digest.clone()));
-
- serde_json::from_value(entry).unwrap_or_else(|_| PluginConfig {
- plugin: "*Error*".to_string(),
- ty: "*Error*".to_string(),
- ..Default::default()
- })
+ proxmox_acme_api::get_cached_challenge_schemas()
}
#[api(
@@ -547,12 +322,7 @@ fn modify_cfg_for_api(id: &str, ty: &str, data: &Value) -> PluginConfig {
)]
/// List ACME challenge plugins.
pub fn list_plugins(rpcenv: &mut dyn RpcEnvironment) -> Result<Vec<PluginConfig>, Error> {
- let (plugins, digest) = plugin::config()?;
- rpcenv["digest"] = hex::encode(digest).into();
- Ok(plugins
- .iter()
- .map(|(id, (ty, data))| modify_cfg_for_api(id, ty, data))
- .collect())
+ proxmox_acme_api::list_plugins(rpcenv)
}
#[api(
@@ -569,13 +339,7 @@ pub fn list_plugins(rpcenv: &mut dyn RpcEnvironment) -> Result<Vec<PluginConfig>
)]
/// List ACME challenge plugins.
pub fn get_plugin(id: String, rpcenv: &mut dyn RpcEnvironment) -> Result<PluginConfig, Error> {
- let (plugins, digest) = plugin::config()?;
- rpcenv["digest"] = hex::encode(digest).into();
-
- match plugins.get(&id) {
- Some((ty, data)) => Ok(modify_cfg_for_api(&id, ty, data)),
- None => http_bail!(NOT_FOUND, "no such plugin"),
- }
+ proxmox_acme_api::get_plugin(id, rpcenv)
}
// Currently we only have "the" standalone plugin and DNS plugins so we can just flatten a
@@ -607,30 +371,7 @@ pub fn get_plugin(id: String, rpcenv: &mut dyn RpcEnvironment) -> Result<PluginC
)]
/// Add ACME plugin configuration.
pub fn add_plugin(r#type: String, core: DnsPluginCore, data: String) -> Result<(), Error> {
- // Currently we only support DNS plugins and the standalone plugin is "fixed":
- if r#type != "dns" {
- param_bail!("type", "invalid ACME plugin type: {:?}", r#type);
- }
-
- let data = String::from_utf8(proxmox_base64::decode(data)?)
- .map_err(|_| format_err!("data must be valid UTF-8"))?;
-
- let id = core.id.clone();
-
- let _lock = plugin::lock()?;
-
- let (mut plugins, _digest) = plugin::config()?;
- if plugins.contains_key(&id) {
- param_bail!("id", "ACME plugin ID {:?} already exists", id);
- }
-
- let plugin = serde_json::to_value(DnsPlugin { core, data })?;
-
- plugins.insert(id, r#type, plugin);
-
- plugin::save_config(&plugins)?;
-
- Ok(())
+ proxmox_acme_api::add_plugin(r#type, core, data)
}
#[api(
@@ -646,26 +387,7 @@ pub fn add_plugin(r#type: String, core: DnsPluginCore, data: String) -> Result<(
)]
/// Delete an ACME plugin configuration.
pub fn delete_plugin(id: String) -> Result<(), Error> {
- let _lock = plugin::lock()?;
-
- let (mut plugins, _digest) = plugin::config()?;
- if plugins.remove(&id).is_none() {
- http_bail!(NOT_FOUND, "no such plugin");
- }
- plugin::save_config(&plugins)?;
-
- Ok(())
-}
-
-#[api()]
-#[derive(Serialize, Deserialize)]
-#[serde(rename_all = "kebab-case")]
-/// Deletable property name
-pub enum DeletableProperty {
- /// Delete the disable property
- Disable,
- /// Delete the validation-delay property
- ValidationDelay,
+ proxmox_acme_api::delete_plugin(id)
}
#[api(
@@ -687,12 +409,12 @@ pub enum DeletableProperty {
type: Array,
optional: true,
items: {
- type: DeletableProperty,
+ type: DeletablePluginProperty,
}
},
digest: {
- description: "Digest to protect against concurrent updates",
optional: true,
+ type: ConfigDigest,
},
},
},
@@ -706,65 +428,8 @@ pub fn update_plugin(
id: String,
update: DnsPluginCoreUpdater,
data: Option<String>,
- delete: Option<Vec<DeletableProperty>>,
- digest: Option<String>,
+ delete: Option<Vec<DeletablePluginProperty>>,
+ digest: Option<ConfigDigest>,
) -> Result<(), Error> {
- let data = data
- .as_deref()
- .map(proxmox_base64::decode)
- .transpose()?
- .map(String::from_utf8)
- .transpose()
- .map_err(|_| format_err!("data must be valid UTF-8"))?;
-
- let _lock = plugin::lock()?;
-
- let (mut plugins, expected_digest) = plugin::config()?;
-
- if let Some(digest) = digest {
- let digest = <[u8; 32]>::from_hex(digest)?;
- crate::tools::detect_modified_configuration_file(&digest, &expected_digest)?;
- }
-
- match plugins.get_mut(&id) {
- Some((ty, ref mut entry)) => {
- if ty != "dns" {
- bail!("cannot update plugin of type {:?}", ty);
- }
-
- let mut plugin = DnsPlugin::deserialize(&*entry)?;
-
- if let Some(delete) = delete {
- for delete_prop in delete {
- match delete_prop {
- DeletableProperty::ValidationDelay => {
- plugin.core.validation_delay = None;
- }
- DeletableProperty::Disable => {
- plugin.core.disable = None;
- }
- }
- }
- }
- if let Some(data) = data {
- plugin.data = data;
- }
- if let Some(api) = update.api {
- plugin.core.api = api;
- }
- if update.validation_delay.is_some() {
- plugin.core.validation_delay = update.validation_delay;
- }
- if update.disable.is_some() {
- plugin.core.disable = update.disable;
- }
-
- *entry = serde_json::to_value(plugin)?;
- }
- None => http_bail!(NOT_FOUND, "no such plugin"),
- }
-
- plugin::save_config(&plugins)?;
-
- Ok(())
+ proxmox_acme_api::update_plugin(id, update, data, delete, digest)
}
diff --git a/src/api2/node/certificates.rs b/src/api2/node/certificates.rs
index 6b1d87d2..7fb3a478 100644
--- a/src/api2/node/certificates.rs
+++ b/src/api2/node/certificates.rs
@@ -1,13 +1,11 @@
-use std::sync::Arc;
-use std::time::Duration;
-
use anyhow::{bail, format_err, Error};
use openssl::pkey::PKey;
use openssl::x509::X509;
use serde::{Deserialize, Serialize};
-use tracing::{info, warn};
+use tracing::info;
use pbs_api_types::{NODE_SCHEMA, PRIV_SYS_MODIFY};
+use proxmox_acme_api::AcmeDomain;
use proxmox_rest_server::WorkerTask;
use proxmox_router::list_subdirs_api_method;
use proxmox_router::SubdirMap;
@@ -17,9 +15,6 @@ use proxmox_schema::api;
use pbs_buildcfg::configdir;
use pbs_tools::cert;
-use crate::acme::AcmeClient;
-use crate::api2::types::AcmeDomain;
-use crate::config::node::NodeConfig;
use crate::server::send_certificate_renewal_mail;
pub const ROUTER: Router = Router::new()
@@ -268,193 +263,6 @@ pub async fn delete_custom_certificate() -> Result<(), Error> {
Ok(())
}
-struct OrderedCertificate {
- certificate: hyper::body::Bytes,
- private_key_pem: Vec<u8>,
-}
-
-async fn order_certificate(
- worker: Arc<WorkerTask>,
- node_config: &NodeConfig,
-) -> Result<Option<OrderedCertificate>, Error> {
- use proxmox_acme::authorization::Status;
- use proxmox_acme::order::Identifier;
-
- let domains = node_config.acme_domains().try_fold(
- Vec::<AcmeDomain>::new(),
- |mut acc, domain| -> Result<_, Error> {
- let mut domain = domain?;
- domain.domain.make_ascii_lowercase();
- if let Some(alias) = &mut domain.alias {
- alias.make_ascii_lowercase();
- }
- acc.push(domain);
- Ok(acc)
- },
- )?;
-
- let get_domain_config = |domain: &str| {
- domains
- .iter()
- .find(|d| d.domain == domain)
- .ok_or_else(|| format_err!("no config for domain '{}'", domain))
- };
-
- if domains.is_empty() {
- info!("No domains configured to be ordered from an ACME server.");
- return Ok(None);
- }
-
- let (plugins, _) = crate::config::acme::plugin::config()?;
-
- let mut acme = node_config.acme_client().await?;
-
- info!("Placing ACME order");
- let order = acme
- .new_order(domains.iter().map(|d| d.domain.to_ascii_lowercase()))
- .await?;
- info!("Order URL: {}", order.location);
-
- let identifiers: Vec<String> = order
- .data
- .identifiers
- .iter()
- .map(|identifier| match identifier {
- Identifier::Dns(domain) => domain.clone(),
- })
- .collect();
-
- for auth_url in &order.data.authorizations {
- info!("Getting authorization details from '{auth_url}'");
- let mut auth = acme.get_authorization(auth_url).await?;
-
- let domain = match &mut auth.identifier {
- Identifier::Dns(domain) => domain.to_ascii_lowercase(),
- };
-
- if auth.status == Status::Valid {
- info!("{domain} is already validated!");
- continue;
- }
-
- info!("The validation for {domain} is pending");
- let domain_config: &AcmeDomain = get_domain_config(&domain)?;
- let plugin_id = domain_config.plugin.as_deref().unwrap_or("standalone");
- let mut plugin_cfg = crate::acme::get_acme_plugin(&plugins, plugin_id)?
- .ok_or_else(|| format_err!("plugin '{plugin_id}' for domain '{domain}' not found!"))?;
-
- info!("Setting up validation plugin");
- let validation_url = plugin_cfg
- .setup(&mut acme, &auth, domain_config, Arc::clone(&worker))
- .await?;
-
- let result = request_validation(&mut acme, auth_url, validation_url).await;
-
- if let Err(err) = plugin_cfg
- .teardown(&mut acme, &auth, domain_config, Arc::clone(&worker))
- .await
- {
- warn!("Failed to teardown plugin '{plugin_id}' for domain '{domain}' - {err}");
- }
-
- result?;
- }
-
- info!("All domains validated");
- info!("Creating CSR");
-
- let csr = proxmox_acme::util::Csr::generate(&identifiers, &Default::default())?;
- let mut finalize_error_cnt = 0u8;
- let order_url = &order.location;
- let mut order;
- loop {
- use proxmox_acme::order::Status;
-
- order = acme.get_order(order_url).await?;
-
- match order.status {
- Status::Pending => {
- info!("still pending, trying to finalize anyway");
- let finalize = order
- .finalize
- .as_deref()
- .ok_or_else(|| format_err!("missing 'finalize' URL in order"))?;
- if let Err(err) = acme.finalize(finalize, &csr.data).await {
- if finalize_error_cnt >= 5 {
- return Err(err);
- }
-
- finalize_error_cnt += 1;
- }
- tokio::time::sleep(Duration::from_secs(5)).await;
- }
- Status::Ready => {
- info!("order is ready, finalizing");
- let finalize = order
- .finalize
- .as_deref()
- .ok_or_else(|| format_err!("missing 'finalize' URL in order"))?;
- acme.finalize(finalize, &csr.data).await?;
- tokio::time::sleep(Duration::from_secs(5)).await;
- }
- Status::Processing => {
- info!("still processing, trying again in 30 seconds");
- tokio::time::sleep(Duration::from_secs(30)).await;
- }
- Status::Valid => {
- info!("valid");
- break;
- }
- other => bail!("order status: {:?}", other),
- }
- }
-
- info!("Downloading certificate");
- let certificate = acme
- .get_certificate(
- order
- .certificate
- .as_deref()
- .ok_or_else(|| format_err!("missing certificate url in finalized order"))?,
- )
- .await?;
-
- Ok(Some(OrderedCertificate {
- certificate,
- private_key_pem: csr.private_key_pem,
- }))
-}
-
-async fn request_validation(
- acme: &mut AcmeClient,
- auth_url: &str,
- validation_url: &str,
-) -> Result<(), Error> {
- info!("Triggering validation");
- acme.request_challenge_validation(validation_url).await?;
-
- info!("Sleeping for 5 seconds");
- tokio::time::sleep(Duration::from_secs(5)).await;
-
- loop {
- use proxmox_acme::authorization::Status;
-
- let auth = acme.get_authorization(auth_url).await?;
- match auth.status {
- Status::Pending => {
- info!("Status is still 'pending', trying again in 10 seconds");
- tokio::time::sleep(Duration::from_secs(10)).await;
- }
- Status::Valid => return Ok(()),
- other => bail!(
- "validating challenge '{}' failed - status: {:?}",
- validation_url,
- other
- ),
- }
- }
-}
-
#[api(
input: {
properties: {
@@ -524,9 +332,26 @@ fn spawn_certificate_worker(
let auth_id = rpcenv.get_auth_id().unwrap();
+ let acme_config = node_config.acme_config()?;
+
+ let domains = node_config.acme_domains().try_fold(
+ Vec::<AcmeDomain>::new(),
+ |mut acc, domain| -> Result<_, Error> {
+ let mut domain = domain?;
+ domain.domain.make_ascii_lowercase();
+ if let Some(alias) = &mut domain.alias {
+ alias.make_ascii_lowercase();
+ }
+ acc.push(domain);
+ Ok(acc)
+ },
+ )?;
+
WorkerTask::spawn(name, None, auth_id, true, move |worker| async move {
let work = || async {
- if let Some(cert) = order_certificate(worker, &node_config).await? {
+ if let Some(cert) =
+ proxmox_acme_api::order_certificate(worker, &acme_config, &domains).await?
+ {
crate::config::set_proxy_certificate(&cert.certificate, &cert.private_key_pem)?;
crate::server::reload_proxy_certificate().await?;
}
@@ -562,16 +387,16 @@ pub fn revoke_acme_cert(rpcenv: &mut dyn RpcEnvironment) -> Result<String, Error
let auth_id = rpcenv.get_auth_id().unwrap();
+ let acme_config = node_config.acme_config()?;
+
WorkerTask::spawn(
"acme-revoke-cert",
None,
auth_id,
true,
move |_worker| async move {
- info!("Loading ACME account");
- let mut acme = node_config.acme_client().await?;
info!("Revoking old certificate");
- acme.revoke_certificate(cert_pem.as_bytes(), None).await?;
+ proxmox_acme_api::revoke_certificate(&acme_config, cert_pem.as_bytes()).await?;
info!("Deleting certificate and regenerating a self-signed one");
delete_custom_certificate().await?;
Ok(())
diff --git a/src/api2/types/acme.rs b/src/api2/types/acme.rs
index 8661f9e8..b83b9882 100644
--- a/src/api2/types/acme.rs
+++ b/src/api2/types/acme.rs
@@ -1,8 +1,7 @@
use serde::{Deserialize, Serialize};
-use serde_json::Value;
use pbs_api_types::{DNS_ALIAS_FORMAT, DNS_NAME_FORMAT, PROXMOX_SAFE_ID_FORMAT};
-use proxmox_schema::{api, ApiStringFormat, ApiType, Schema, StringSchema};
+use proxmox_schema::api;
#[api(
properties: {
@@ -37,61 +36,3 @@ pub struct AcmeDomain {
#[serde(skip_serializing_if = "Option::is_none")]
pub plugin: Option<String>,
}
-
-pub const ACME_DOMAIN_PROPERTY_SCHEMA: Schema =
- StringSchema::new("ACME domain configuration string")
- .format(&ApiStringFormat::PropertyString(&AcmeDomain::API_SCHEMA))
- .schema();
-
-#[api(
- properties: {
- name: { type: String },
- url: { type: String },
- },
-)]
-/// An ACME directory endpoint with a name and URL.
-#[derive(Serialize)]
-pub struct KnownAcmeDirectory {
- /// The ACME directory's name.
- pub name: &'static str,
-
- /// The ACME directory's endpoint URL.
- pub url: &'static str,
-}
-
-proxmox_schema::api_string_type! {
- #[api(format: &PROXMOX_SAFE_ID_FORMAT)]
- /// ACME account name.
- #[derive(Clone, Eq, PartialEq, Hash, Deserialize, Serialize)]
- #[serde(transparent)]
- pub struct AcmeAccountName(String);
-}
-
-#[api(
- properties: {
- schema: {
- type: Object,
- additional_properties: true,
- properties: {},
- },
- type: {
- type: String,
- },
- },
-)]
-#[derive(Serialize)]
-/// Schema for an ACME challenge plugin.
-pub struct AcmeChallengeSchema {
- /// Plugin ID.
- pub id: String,
-
- /// Human readable name, falls back to id.
- pub name: String,
-
- /// Plugin Type.
- #[serde(rename = "type")]
- pub ty: &'static str,
-
- /// The plugin's parameter schema.
- pub schema: Value,
-}
diff --git a/src/bin/proxmox-backup-api.rs b/src/bin/proxmox-backup-api.rs
index 417e9e97..d0091dca 100644
--- a/src/bin/proxmox-backup-api.rs
+++ b/src/bin/proxmox-backup-api.rs
@@ -14,6 +14,7 @@ use proxmox_rest_server::{ApiConfig, RestServer};
use proxmox_router::RpcEnvironmentType;
use proxmox_sys::fs::CreateOptions;
+use pbs_buildcfg::configdir;
use proxmox_backup::auth_helpers::*;
use proxmox_backup::config;
use proxmox_backup::server::auth::check_pbs_auth;
@@ -78,6 +79,7 @@ async fn run() -> Result<(), Error> {
let mut command_sock = proxmox_daemon::command_socket::CommandSocket::new(backup_user.gid);
proxmox_product_config::init(backup_user.clone(), pbs_config::priv_user()?);
+ proxmox_acme_api::init(configdir!("/acme"), true)?;
let dir_opts = CreateOptions::new()
.owner(backup_user.uid)
diff --git a/src/bin/proxmox-backup-manager.rs b/src/bin/proxmox-backup-manager.rs
index f8365070..f041ba0b 100644
--- a/src/bin/proxmox-backup-manager.rs
+++ b/src/bin/proxmox-backup-manager.rs
@@ -19,12 +19,12 @@ use proxmox_router::{cli::*, RpcEnvironment};
use proxmox_schema::api;
use proxmox_sys::fs::CreateOptions;
+use pbs_buildcfg::configdir;
use pbs_client::{display_task_log, view_task_result};
use pbs_config::sync;
use pbs_tools::json::required_string_param;
use proxmox_backup::api2;
use proxmox_backup::client_helpers::connect_to_localhost;
-use proxmox_backup::config;
mod proxmox_backup_manager;
use proxmox_backup_manager::*;
@@ -667,6 +667,7 @@ async fn run() -> Result<(), Error> {
.init()?;
proxmox_backup::server::notifications::init()?;
proxmox_product_config::init(pbs_config::backup_user()?, pbs_config::priv_user()?);
+ proxmox_acme_api::init(configdir!("/acme"), false)?;
let cmd_def = CliCommandMap::new()
.insert("acl", acl_commands())
diff --git a/src/bin/proxmox-backup-proxy.rs b/src/bin/proxmox-backup-proxy.rs
index 870208fe..eea44a7d 100644
--- a/src/bin/proxmox-backup-proxy.rs
+++ b/src/bin/proxmox-backup-proxy.rs
@@ -188,6 +188,7 @@ async fn run() -> Result<(), Error> {
proxmox_backup::server::notifications::init()?;
metric_collection::init()?;
proxmox_product_config::init(pbs_config::backup_user()?, pbs_config::priv_user()?);
+ proxmox_acme_api::init(configdir!("/acme"), false)?;
let mut indexpath = PathBuf::from(pbs_buildcfg::JS_DIR);
indexpath.push("index.hbs");
diff --git a/src/bin/proxmox_backup_manager/acme.rs b/src/bin/proxmox_backup_manager/acme.rs
index 0f0eafea..57431225 100644
--- a/src/bin/proxmox_backup_manager/acme.rs
+++ b/src/bin/proxmox_backup_manager/acme.rs
@@ -3,15 +3,13 @@ use std::io::Write;
use anyhow::{bail, Error};
use serde_json::Value;
+use proxmox_acme::async_client::AcmeClient;
+use proxmox_acme_api::{AcmeAccountName, DnsPluginCore, KNOWN_ACME_DIRECTORIES};
use proxmox_router::{cli::*, ApiHandler, RpcEnvironment};
use proxmox_schema::api;
use proxmox_sys::fs::file_get_contents;
-use proxmox_backup::acme::AcmeClient;
use proxmox_backup::api2;
-use proxmox_backup::api2::types::AcmeAccountName;
-use proxmox_backup::config::acme::plugin::DnsPluginCore;
-use proxmox_backup::config::acme::KNOWN_ACME_DIRECTORIES;
pub fn acme_mgmt_cli() -> CommandLineInterface {
let cmd_def = CliCommandMap::new()
@@ -122,7 +120,7 @@ async fn register_account(
match input.trim().parse::<usize>() {
Ok(n) if n < KNOWN_ACME_DIRECTORIES.len() => {
- break (KNOWN_ACME_DIRECTORIES[n].url.to_owned(), false);
+ break (KNOWN_ACME_DIRECTORIES[n].url.to_string(), false);
}
Ok(n) if n == KNOWN_ACME_DIRECTORIES.len() => {
input.clear();
@@ -188,17 +186,20 @@ async fn register_account(
println!("Attempting to register account with {directory_url:?}...");
- let account = api2::config::acme::do_register_account(
- &mut client,
+ let tos_agreed = tos_agreed
+ .then(|| directory.terms_of_service_url().map(str::to_owned))
+ .flatten();
+
+ let location = proxmox_acme_api::register_account(
&name,
- tos_agreed,
contact,
- None,
+ tos_agreed,
+ Some(directory_url),
eab_creds,
)
.await?;
- println!("Registration successful, account URL: {}", account.location);
+ println!("Registration successful, account URL: {}", location);
Ok(())
}
@@ -266,19 +267,19 @@ pub fn account_cli() -> CommandLineInterface {
"deactivate",
CliCommand::new(&API_METHOD_DEACTIVATE_ACCOUNT)
.arg_param(&["name"])
- .completion_cb("name", crate::config::acme::complete_acme_account),
+ .completion_cb("name", proxmox_acme_api::complete_acme_account),
)
.insert(
"info",
CliCommand::new(&API_METHOD_GET_ACCOUNT)
.arg_param(&["name"])
- .completion_cb("name", crate::config::acme::complete_acme_account),
+ .completion_cb("name", proxmox_acme_api::complete_acme_account),
)
.insert(
"update",
CliCommand::new(&API_METHOD_UPDATE_ACCOUNT)
.arg_param(&["name"])
- .completion_cb("name", crate::config::acme::complete_acme_account),
+ .completion_cb("name", proxmox_acme_api::complete_acme_account),
);
cmd_def.into()
@@ -373,26 +374,26 @@ pub fn plugin_cli() -> CommandLineInterface {
"config", // name comes from pve/pmg
CliCommand::new(&API_METHOD_GET_PLUGIN)
.arg_param(&["id"])
- .completion_cb("id", crate::config::acme::complete_acme_plugin),
+ .completion_cb("id", proxmox_acme_api::complete_acme_plugin),
)
.insert(
"add",
CliCommand::new(&API_METHOD_ADD_PLUGIN)
.arg_param(&["type", "id"])
- .completion_cb("api", crate::config::acme::complete_acme_api_challenge_type)
- .completion_cb("type", crate::config::acme::complete_acme_plugin_type),
+ .completion_cb("api", proxmox_acme_api::complete_acme_api_challenge_type)
+ .completion_cb("type", proxmox_acme_api::complete_acme_plugin_type),
)
.insert(
"remove",
CliCommand::new(&acme::API_METHOD_DELETE_PLUGIN)
.arg_param(&["id"])
- .completion_cb("id", crate::config::acme::complete_acme_plugin),
+ .completion_cb("id", proxmox_acme_api::complete_acme_plugin),
)
.insert(
"set",
CliCommand::new(&acme::API_METHOD_UPDATE_PLUGIN)
.arg_param(&["id"])
- .completion_cb("id", crate::config::acme::complete_acme_plugin),
+ .completion_cb("id", proxmox_acme_api::complete_acme_plugin),
);
cmd_def.into()
diff --git a/src/config/acme/mod.rs b/src/config/acme/mod.rs
index ac89ae5e..962cb1bb 100644
--- a/src/config/acme/mod.rs
+++ b/src/config/acme/mod.rs
@@ -1,168 +1 @@
-use std::collections::HashMap;
-use std::ops::ControlFlow;
-use std::path::Path;
-
-use anyhow::{bail, format_err, Error};
-use serde_json::Value;
-
-use pbs_api_types::PROXMOX_SAFE_ID_REGEX;
-use proxmox_sys::error::SysError;
-use proxmox_sys::fs::{file_read_string, CreateOptions};
-
-use crate::api2::types::{AcmeAccountName, AcmeChallengeSchema, KnownAcmeDirectory};
-
-pub(crate) const ACME_DIR: &str = pbs_buildcfg::configdir!("/acme");
-pub(crate) const ACME_ACCOUNT_DIR: &str = pbs_buildcfg::configdir!("/acme/accounts");
-
-pub(crate) const ACME_DNS_SCHEMA_FN: &str = "/usr/share/proxmox-acme/dns-challenge-schema.json";
-
pub mod plugin;
-
-// `const fn`ify this once it is supported in `proxmox`
-fn root_only() -> CreateOptions {
- CreateOptions::new()
- .owner(nix::unistd::ROOT)
- .group(nix::unistd::Gid::from_raw(0))
- .perm(nix::sys::stat::Mode::from_bits_truncate(0o700))
-}
-
-fn create_acme_subdir(dir: &str) -> Result<(), Error> {
- proxmox_sys::fs::ensure_dir_exists(dir, &root_only(), false)
-}
-
-pub(crate) fn make_acme_dir() -> Result<(), Error> {
- create_acme_subdir(ACME_DIR)
-}
-
-pub(crate) fn make_acme_account_dir() -> Result<(), Error> {
- make_acme_dir()?;
- create_acme_subdir(ACME_ACCOUNT_DIR)
-}
-
-pub const KNOWN_ACME_DIRECTORIES: &[KnownAcmeDirectory] = &[
- KnownAcmeDirectory {
- name: "Let's Encrypt V2",
- url: "https://acme-v02.api.letsencrypt.org/directory",
- },
- KnownAcmeDirectory {
- name: "Let's Encrypt V2 Staging",
- url: "https://acme-staging-v02.api.letsencrypt.org/directory",
- },
-];
-
-pub const DEFAULT_ACME_DIRECTORY_ENTRY: &KnownAcmeDirectory = &KNOWN_ACME_DIRECTORIES[0];
-
-pub fn account_path(name: &str) -> String {
- format!("{ACME_ACCOUNT_DIR}/{name}")
-}
-
-pub fn foreach_acme_account<F>(mut func: F) -> Result<(), Error>
-where
- F: FnMut(AcmeAccountName) -> ControlFlow<Result<(), Error>>,
-{
- match proxmox_sys::fs::scan_subdir(-1, ACME_ACCOUNT_DIR, &PROXMOX_SAFE_ID_REGEX) {
- Ok(files) => {
- for file in files {
- let file = file?;
- let file_name = unsafe { file.file_name_utf8_unchecked() };
-
- if file_name.starts_with('_') {
- continue;
- }
-
- let account_name = match AcmeAccountName::from_string(file_name.to_owned()) {
- Ok(account_name) => account_name,
- Err(_) => continue,
- };
-
- if let ControlFlow::Break(result) = func(account_name) {
- return result;
- }
- }
- Ok(())
- }
- Err(err) if err.not_found() => Ok(()),
- Err(err) => Err(err.into()),
- }
-}
-
-pub fn mark_account_deactivated(name: &str) -> Result<(), Error> {
- let from = account_path(name);
- for i in 0..100 {
- let to = account_path(&format!("_deactivated_{name}_{i}"));
- if !Path::new(&to).exists() {
- return std::fs::rename(&from, &to).map_err(|err| {
- format_err!(
- "failed to move account path {:?} to {:?} - {}",
- from,
- to,
- err
- )
- });
- }
- }
- bail!(
- "No free slot to rename deactivated account {:?}, please cleanup {:?}",
- from,
- ACME_ACCOUNT_DIR
- );
-}
-
-pub fn load_dns_challenge_schema() -> Result<Vec<AcmeChallengeSchema>, Error> {
- let raw = file_read_string(ACME_DNS_SCHEMA_FN)?;
- let schemas: serde_json::Map<String, Value> = serde_json::from_str(&raw)?;
-
- Ok(schemas
- .iter()
- .map(|(id, schema)| AcmeChallengeSchema {
- id: id.to_owned(),
- name: schema
- .get("name")
- .and_then(Value::as_str)
- .unwrap_or(id)
- .to_owned(),
- ty: "dns",
- schema: schema.to_owned(),
- })
- .collect())
-}
-
-pub fn complete_acme_account(_arg: &str, _param: &HashMap<String, String>) -> Vec<String> {
- let mut out = Vec::new();
- let _ = foreach_acme_account(|name| {
- out.push(name.into_string());
- ControlFlow::Continue(())
- });
- out
-}
-
-pub fn complete_acme_plugin(_arg: &str, _param: &HashMap<String, String>) -> Vec<String> {
- match plugin::config() {
- Ok((config, _digest)) => config
- .iter()
- .map(|(id, (_type, _cfg))| id.clone())
- .collect(),
- Err(_) => Vec::new(),
- }
-}
-
-pub fn complete_acme_plugin_type(_arg: &str, _param: &HashMap<String, String>) -> Vec<String> {
- vec![
- "dns".to_string(),
- //"http".to_string(), // makes currently not really sense to create or the like
- ]
-}
-
-pub fn complete_acme_api_challenge_type(
- _arg: &str,
- param: &HashMap<String, String>,
-) -> Vec<String> {
- if param.get("type") == Some(&"dns".to_string()) {
- match load_dns_challenge_schema() {
- Ok(schema) => schema.into_iter().map(|s| s.id).collect(),
- Err(_) => Vec::new(),
- }
- } else {
- Vec::new()
- }
-}
diff --git a/src/config/acme/plugin.rs b/src/config/acme/plugin.rs
index 8ce852ec..e5a41f99 100644
--- a/src/config/acme/plugin.rs
+++ b/src/config/acme/plugin.rs
@@ -1,14 +1,10 @@
-use std::sync::LazyLock;
-
use anyhow::Error;
use serde::{Deserialize, Serialize};
use serde_json::Value;
use pbs_api_types::PROXMOX_SAFE_ID_FORMAT;
-use proxmox_schema::{api, ApiType, Schema, StringSchema, Updater};
-use proxmox_section_config::{SectionConfig, SectionConfigData, SectionConfigPlugin};
-
-use pbs_config::{open_backup_lockfile, BackupLockGuard};
+use proxmox_schema::{api, Schema, StringSchema, Updater};
+use proxmox_section_config::SectionConfigData;
pub const PLUGIN_ID_SCHEMA: Schema = StringSchema::new("ACME Challenge Plugin ID.")
.format(&PROXMOX_SAFE_ID_FORMAT)
@@ -16,28 +12,6 @@ pub const PLUGIN_ID_SCHEMA: Schema = StringSchema::new("ACME Challenge Plugin ID
.max_length(32)
.schema();
-pub static CONFIG: LazyLock<SectionConfig> = LazyLock::new(init);
-
-#[api(
- properties: {
- id: { schema: PLUGIN_ID_SCHEMA },
- },
-)]
-#[derive(Deserialize, Serialize)]
-/// Standalone ACME Plugin for the http-1 challenge.
-pub struct StandalonePlugin {
- /// Plugin ID.
- id: String,
-}
-
-impl Default for StandalonePlugin {
- fn default() -> Self {
- Self {
- id: "standalone".to_string(),
- }
- }
-}
-
#[api(
properties: {
id: { schema: PLUGIN_ID_SCHEMA },
@@ -99,64 +73,6 @@ impl DnsPlugin {
}
}
-fn init() -> SectionConfig {
- let mut config = SectionConfig::new(&PLUGIN_ID_SCHEMA);
-
- let standalone_schema = match &StandalonePlugin::API_SCHEMA {
- Schema::Object(schema) => schema,
- _ => unreachable!(),
- };
- let standalone_plugin = SectionConfigPlugin::new(
- "standalone".to_string(),
- Some("id".to_string()),
- standalone_schema,
- );
- config.register_plugin(standalone_plugin);
-
- let dns_challenge_schema = match DnsPlugin::API_SCHEMA {
- Schema::AllOf(ref schema) => schema,
- _ => unreachable!(),
- };
- let dns_challenge_plugin = SectionConfigPlugin::new(
- "dns".to_string(),
- Some("id".to_string()),
- dns_challenge_schema,
- );
- config.register_plugin(dns_challenge_plugin);
-
- config
-}
-
-const ACME_PLUGIN_CFG_FILENAME: &str = pbs_buildcfg::configdir!("/acme/plugins.cfg");
-const ACME_PLUGIN_CFG_LOCKFILE: &str = pbs_buildcfg::configdir!("/acme/.plugins.lck");
-
-pub fn lock() -> Result<BackupLockGuard, Error> {
- super::make_acme_dir()?;
- open_backup_lockfile(ACME_PLUGIN_CFG_LOCKFILE, None, true)
-}
-
-pub fn config() -> Result<(PluginData, [u8; 32]), Error> {
- let content =
- proxmox_sys::fs::file_read_optional_string(ACME_PLUGIN_CFG_FILENAME)?.unwrap_or_default();
-
- let digest = openssl::sha::sha256(content.as_bytes());
- let mut data = CONFIG.parse(ACME_PLUGIN_CFG_FILENAME, &content)?;
-
- if !data.sections.contains_key("standalone") {
- let standalone = StandalonePlugin::default();
- data.set_data("standalone", "standalone", &standalone)
- .unwrap();
- }
-
- Ok((PluginData { data }, digest))
-}
-
-pub fn save_config(config: &PluginData) -> Result<(), Error> {
- super::make_acme_dir()?;
- let raw = CONFIG.write(ACME_PLUGIN_CFG_FILENAME, &config.data)?;
- pbs_config::replace_backup_config(ACME_PLUGIN_CFG_FILENAME, raw.as_bytes())
-}
-
pub struct PluginData {
data: SectionConfigData,
}
diff --git a/src/config/node.rs b/src/config/node.rs
index 253b2e36..81eecb24 100644
--- a/src/config/node.rs
+++ b/src/config/node.rs
@@ -8,16 +8,14 @@ use pbs_api_types::{
EMAIL_SCHEMA, MULTI_LINE_COMMENT_SCHEMA, OPENSSL_CIPHERS_TLS_1_2_SCHEMA,
OPENSSL_CIPHERS_TLS_1_3_SCHEMA,
};
+use proxmox_acme_api::{AcmeConfig, AcmeDomain, ACME_DOMAIN_PROPERTY_SCHEMA};
use proxmox_http::ProxyConfig;
use proxmox_schema::{api, ApiStringFormat, ApiType, Updater};
use pbs_buildcfg::configdir;
use pbs_config::{open_backup_lockfile, BackupLockGuard};
-use crate::acme::AcmeClient;
-use crate::api2::types::{
- AcmeAccountName, AcmeDomain, ACME_DOMAIN_PROPERTY_SCHEMA, HTTP_PROXY_SCHEMA,
-};
+use crate::api2::types::HTTP_PROXY_SCHEMA;
const CONF_FILE: &str = configdir!("/node.cfg");
const LOCK_FILE: &str = configdir!("/.node.lck");
@@ -44,20 +42,6 @@ pub fn save_config(config: &NodeConfig) -> Result<(), Error> {
pbs_config::replace_backup_config(CONF_FILE, &raw)
}
-#[api(
- properties: {
- account: { type: AcmeAccountName },
- }
-)]
-#[derive(Deserialize, Serialize)]
-/// The ACME configuration.
-///
-/// Currently only contains the name of the account use.
-pub struct AcmeConfig {
- /// Account to use to acquire ACME certificates.
- account: AcmeAccountName,
-}
-
/// All available languages in Proxmox. Taken from proxmox-i18n repository.
/// pt_BR, zh_CN, and zh_TW use the same case in the translation files.
// TODO: auto-generate from available translations
@@ -235,19 +219,16 @@ pub struct NodeConfig {
}
impl NodeConfig {
- pub fn acme_config(&self) -> Option<Result<AcmeConfig, Error>> {
- self.acme.as_deref().map(|config| -> Result<_, Error> {
- crate::tools::config::from_property_string(config, &AcmeConfig::API_SCHEMA)
- })
- }
-
- pub async fn acme_client(&self) -> Result<AcmeClient, Error> {
- let account = if let Some(cfg) = self.acme_config().transpose()? {
- cfg.account
- } else {
- AcmeAccountName::from_string("default".to_string())? // should really not happen
- };
- AcmeClient::load(&account).await
+ pub fn acme_config(&self) -> Result<AcmeConfig, Error> {
+ self.acme
+ .as_deref()
+ .map(|config| {
+ crate::tools::config::from_property_string::<AcmeConfig>(
+ config,
+ &AcmeConfig::API_SCHEMA,
+ )
+ })
+ .unwrap_or_else(|| proxmox_acme_api::parse_acme_config_string("account=default"))
}
pub fn acme_domains(&'_ self) -> AcmeDomainIter<'_> {
--
2.47.3
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
^ permalink raw reply [relevance 4%]
* [pbs-devel] superseded: [PATCH proxmox{, -backup} v5 0/9] fix #6939: acme: support servers returning 204 for nonce requests
@ 2026-01-16 11:30 13% ` Samuel Rufinatscha
0 siblings, 0 replies; 39+ results
From: Samuel Rufinatscha @ 2026-01-16 11:30 UTC (permalink / raw)
To: pbs-devel
https://lore.proxmox.com/pbs-devel/20260116112859.194016-1-s.rufinatscha@proxmox.com/T/#t
On 1/8/26 12:25 PM, Samuel Rufinatscha wrote:
> Hi,
>
> this series fixes account registration for ACME providers that return
> HTTP 204 No Content to the newNonce request. Currently, both the PBS
> ACME client and the shared ACME client in proxmox-acme only accept
> HTTP 200 OK for this request. The issue was observed in PBS against a
> custom ACME deployment and reported as bug #6939 [1].
>
> ## Problem
>
> During ACME account registration, PBS first fetches an anti-replay
> nonce by sending a HEAD request to the CA’s newNonce URL.
> RFC 8555 §7.2 [2] states that:
>
> * the server MUST include a Replay-Nonce header with a fresh nonce,
> * the server SHOULD use status 200 OK for the HEAD request,
> * the server MUST also handle GET on the same resource and may return
> 204 No Content with an empty body.
>
> The reporter observed the following error message:
>
> *ACME server responded with unexpected status code: 204*
>
> and mentioned that the issue did not appear with PVE 9 [1]. Looking at
> PVE’s Perl ACME client [3], it uses a GET request instead of HEAD and
> accepts any 2xx success code when retrieving the nonce. This difference
> in behavior does not affect functionality but is worth noting for
> consistency across implementations.
>
> ## Approach
>
> To support ACME providers which return 204 No Content, the Rust ACME
> clients in proxmox-backup and proxmox need to treat both 200 OK and 204
> No Content as valid responses for the nonce request, as long as a
> Replay-Nonce header is present.
>
> This series changes the expected field of the internal Request type
> from a single u16 to a list of allowed status codes
> (e.g. &'static [u16]), so one request can explicitly accept multiple
> success codes.
>
> To avoid fixing the issue twice (once in PBS’ own ACME client and once
> in the shared Rust client), this series first refactors PBS to use the
> shared AcmeClient from proxmox-acme / proxmox-acme-api, similar to PDM,
> and then applies the bug fix in that shared implementation so that all
> consumers benefit from the more tolerant behavior.
>
> ## Testing
>
> *Testing the refactor*
>
> To test the refactor, I
> (1) installed latest stable PBS on a VM
> (2) created .deb package from latest PBS (master), containing the
> refactor
> (3) installed created .deb package
> (4) installed Pebble from Let's Encrypt [5] on the same VM
> (5) created an ACME account and ordered the new certificate for the
> host domain.
>
> Steps to reproduce:
>
> (1) install latest stable PBS on a VM, create .deb package from latest
> PBS (master) containing the refactor, install created .deb package
> (2) install Pebble from Let's Encrypt [5] on the same VM:
>
> cd
> apt update
> apt install -y golang git
> git clone https://github.com/letsencrypt/pebble
> cd pebble
> go build ./cmd/pebble
>
> then, download and trust the Pebble cert:
>
> wget https://raw.githubusercontent.com/letsencrypt/pebble/main/test/certs/pebble.minica.pem
> cp pebble.minica.pem /usr/local/share/ca-certificates/pebble.minica.crt
> update-ca-certificates
>
> We want Pebble to perform HTTP-01 validation against port 80, because
> PBS’s standalone plugin will bind port 80. Set httpPort to 80.
>
> nano ./test/config/pebble-config.json
>
> Start the Pebble server in the background:
>
> ./pebble -config ./test/config/pebble-config.json &
>
> Create a Pebble ACME account:
>
> proxmox-backup-manager acme account register default admin@example.com --directory 'https://127.0.0.1:14000/dir'
>
> To verify persistence of the account I checked
>
> ls /etc/proxmox-backup/acme/accounts
>
> Verified if update-account works
>
> proxmox-backup-manager acme account update default --contact "a@example.com,b@example.com"
> proxmox-backup-manager acme account info default
>
> In the PBS GUI, you can create a new domain. You can use your host
> domain name (see /etc/hosts). Select the created account and order the
> certificate.
>
> After a page reload, you might need to accept the new certificate in the browser.
> In the PBS dashboard, you should see the new Pebble certificate.
>
> *Note: on reboot, the created Pebble ACME account will be gone and you
> will need to create a new one. Pebble does not persist account info.
> In that case remove the previously created account in
> /etc/proxmox-backup/acme/accounts.
>
> *Testing the newNonce fix*
>
> To prove the ACME newNonce fix, I put nginx in front of Pebble to
> intercept the newNonce request and return 204 No Content instead of
> 200 OK; all other requests are forwarded to Pebble unchanged. This
> requires trusting the nginx CAs via
> /usr/local/share/ca-certificates + update-ca-certificates on the VM.
>
> Then I ran following command against nginx:
>
> proxmox-backup-manager acme account register proxytest root@backup.local --directory 'https://nginx-address/dir'
>
> The account could be created successfully. When adjusting the nginx
> configuration to return any other unexpected success status code,
> PBS rejects it as expected.
>
> ## Patch summary
>
> 0001 – [PATCH proxmox v5 1/4] acme: reduce visibility of Request type
> Restricts the visibility of the low-level Request type. Consumers
> should rely on proxmox-acme-api or AcmeClient handlers.
>
> 0002 – [PATCH proxmox v5 2/4] acme: introduce http_status module
>
> 0003 – [PATCH proxmox v5 3/4] fix #6939: acme: support servers
> returning 204 for nonce requests
> Adjusts nonce handling to support ACME servers that return HTTP 204
> (No Content) for new-nonce requests.
>
> 0004 – [PATCH proxmox v5 4/4] acme-api: add helper to load client for
> an account
> Introduces a helper function to load an ACME client instance for a
> given account. Required for the following PBS ACME refactor.
>
> 0005 – [PATCH proxmox-backup v5 1/5] acme: clean up ACME-related imports
>
> 0006 – [PATCH proxmox-backup v5 2/5] acme: include proxmox-acme-api
> dependency
> Prepares the codebase to use the factored out ACME API impl.
>
> 0007 – [PATCH proxmox-backup v5 3/5] acme: drop local AcmeClient
> Removes the local AcmeClient implementation. Represents the minimal
> set of changes to replace it with the factored out AcmeClient.
>
> 0008 – [PATCH proxmox-backup v5 4/5] acme: change API impls to use
> proxmox-acme-api handlers
>
> 0009 – [PATCH proxmox-backup v5 5/5] acme: certificate ordering through
> proxmox-acme-api
>
> Thanks for considering this patch series, I look forward to your
> feedback.
>
> Best,
> Samuel Rufinatscha
>
> ## Changelog
>
> Changes from v4 to v5:
>
> * rebased series
> * re-ordered series (proxmox-acme fix first)
> * proxmox-backup: cleaned up imports based on an initial clean-up patch
> * proxmox-acme: removed now unused post_request_raw_payload(),
> update_account_request(), deactivate_account_request()
> * proxmox-acme: removed now obsolete/unused get_authorization() and
> GetAuthorization impl
>
> Verified removal by compiling PBS, PDM, and proxmox-perl-rs
> with all features.
>
> Changes from v3 to v4:
>
> * add proxmox-acme-api as a dependency and initialize it in
> PBS so PBS can use the shared ACME API instead.
> * remove the PBS-local AcmeClient implementation and switch PBS
> over to the shared proxmox-acme async client.
> * rework PBS’ ACME API endpoints to delegate to
> proxmox-acme-api handlers instead of duplicating logic locally.
> * move PBS’ ACME certificate ordering logic over to
> proxmox-acme-api, keeping only certificate installation/reload in PBS.
> * add a load_client_with_account helper in proxmox-acme-api so PBS
> (and others) can construct an AcmeClient for a configured account
> without duplicating boilerplate.
> * hide the low-level Request type and its fields behind constructors
> / reduced visibility so changes to “expected” no longer affect the
> public API as they did in v3.
> * split out the HTTP status constants into an internal http_status
> module as a separate preparatory cleanup before the bug fix, instead
> of doing this inline like in v3.
> * Rebased on top of the refactor: keep the same behavioural fix as in
> v3 (accept 204 for newNonce with Replay-Nonce present), but implement
> it on top of the http_status module that is part of the refactor.
>
> Changes from v2 to v3:
>
> * rename `http_success` module to `http_status`
> * replace `http_success` usage
> * introduced `http_success` module to contain the http success codes
> * replaced `Vec<u16>` with `&[u16]` for expected codes to avoid allocations.
> * clarified PVE's Perl ACME client behaviour in the commit message.
>
> [1] Bugzilla report #6939:
> [https://bugzilla.proxmox.com/show_bug.cgi?id=6939](https://bugzilla.proxmox.com/show_bug.cgi?id=6939)
> [2] RFC 8555 (ACME):
> [https://datatracker.ietf.org/doc/html/rfc8555/#section-7.2](https://datatracker.ietf.org/doc/html/rfc8555/#section-7.2)
> [3] PVE’s Perl ACME client (allow 2xx codes for nonce requests):
> [https://git.proxmox.com/?p=proxmox-acme.git;a=blob;f=src/PVE/ACME.pm;h=f1e9bb7d316e3cea1e376c610b0479119217aecc;hb=HEAD#l597](https://git.proxmox.com/?p=proxmox-acme.git;a=blob;f=src/PVE/ACME.pm;h=f1e9bb7d316e3cea1e376c610b0479119217aecc;hb=HEAD#l597)
> [4] Pebble ACME server:
> [https://github.com/letsencrypt/pebble](https://github.com/letsencrypt/pebble)
> [5] PVE's Perl ACME client (performs a GET request for the nonce):
> [https://git.proxmox.com/?p=proxmox-acme.git;a=blob;f=src/PVE/ACME.pm;h=f1e9bb7d316e3cea1e376c610b0479119217aecc;hb=HEAD#l219](https://git.proxmox.com/?p=proxmox-acme.git;a=blob;f=src/PVE/ACME.pm;h=f1e9bb7d316e3cea1e376c610b0479119217aecc;hb=HEAD#l219)
>
> proxmox:
>
> Samuel Rufinatscha (4):
> acme: reduce visibility of Request type
> acme: introduce http_status module
> fix #6939: acme: support servers returning 204 for nonce requests
> acme-api: add helper to load client for an account
>
> proxmox-acme-api/src/account_api_impl.rs | 5 ++
> proxmox-acme-api/src/lib.rs | 3 +-
> proxmox-acme/src/account.rs | 102 ++---------------------
> proxmox-acme/src/async_client.rs | 8 +-
> proxmox-acme/src/authorization.rs | 30 -------
> proxmox-acme/src/client.rs | 8 +-
> proxmox-acme/src/lib.rs | 6 +-
> proxmox-acme/src/order.rs | 2 +-
> proxmox-acme/src/request.rs | 25 ++++--
> 9 files changed, 44 insertions(+), 145 deletions(-)
>
>
> proxmox-backup:
>
> Samuel Rufinatscha (5):
> acme: clean up ACME-related imports
> acme: include proxmox-acme-api dependency
> acme: drop local AcmeClient
> acme: change API impls to use proxmox-acme-api handlers
> acme: certificate ordering through proxmox-acme-api
>
> Cargo.toml | 3 +
> src/acme/client.rs | 691 -------------------------
> src/acme/mod.rs | 5 -
> src/acme/plugin.rs | 336 ------------
> src/api2/config/acme.rs | 406 ++-------------
> src/api2/node/certificates.rs | 232 ++-------
> src/api2/types/acme.rs | 98 ----
> src/api2/types/mod.rs | 3 -
> src/bin/proxmox-backup-api.rs | 2 +
> src/bin/proxmox-backup-manager.rs | 14 +-
> src/bin/proxmox-backup-proxy.rs | 15 +-
> src/bin/proxmox_backup_manager/acme.rs | 21 +-
> src/config/acme/mod.rs | 55 +-
> src/config/acme/plugin.rs | 92 +---
> src/config/node.rs | 31 +-
> src/lib.rs | 2 -
> 16 files changed, 109 insertions(+), 1897 deletions(-)
> delete mode 100644 src/acme/client.rs
> delete mode 100644 src/acme/mod.rs
> delete mode 100644 src/acme/plugin.rs
> delete mode 100644 src/api2/types/acme.rs
>
>
> Summary over all repositories:
> 25 files changed, 153 insertions(+), 2042 deletions(-)
>
^ permalink raw reply [relevance 13%]
* Re: [pbs-devel] [PATCH proxmox-backup v3 1/4] pbs-config: add token.shadow generation to ConfigVersionCache
@ 2026-01-16 13:53 6% ` Samuel Rufinatscha
0 siblings, 0 replies; 39+ results
From: Samuel Rufinatscha @ 2026-01-16 13:53 UTC (permalink / raw)
To: Proxmox Backup Server development discussion, Fabian Grünbichler
On 1/14/26 11:44 AM, Fabian Grünbichler wrote:
> On January 2, 2026 5:07 pm, Samuel Rufinatscha wrote:
>> Currently, every token-based API request reads the token.shadow file and
>> runs the expensive password hash verification for the given token
>> secret. This shows up as a hotspot in /status profiling (see
>> bug #7017 [1]).
>>
>> To solve the issue, this patch prepares the config version cache,
>> so that token_shadow_generation config caching can be built on
>> top of it.
>>
>> This patch specifically:
>> (1) implements increment function in order to invalidate generations
>
> this is needlessly verbose..
>
>>
>> This patch is part of the series which fixes bug #7017 [1].
>
> this is already mentioned higher up and doesn't need to be repeated
> here.
>
Makes sense, will adjust this. Thanks!
> this patch needs a rebase. it would be good to call out why it is safe
> to add to this struct, since it is accessed/mapped by both old and new
> processes.
>
Will add a note on why this is safe: the shmem mapping is fixed to 4096
bytes via the #[repr(C)] union padding and enforced
by assert_cache_size(). The new AtomicUsize is appended at the end of
the struct, so existing field offsets are unchanged. Old
processes keep accessing the same bytes; the new field consumes
previously reserved padding.
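
For illustration, the layout invariant described above can be sketched
roughly as follows. This is a toy model, not the actual
ConfigVersionCacheData: the field names, the 4096-byte figure, and the
compile-time check standing in for assert_cache_size() follow the
description above, but the real struct differs in detail.

```rust
use std::mem::ManuallyDrop;
use std::sync::atomic::{AtomicUsize, Ordering};

const SHMEM_SIZE: usize = 4096;

// Hypothetical stand-in for ConfigVersionCacheDataInner.
#[repr(C)]
struct Inner {
    user_cache_generation: AtomicUsize,
    datastore_generation: AtomicUsize,
    // Appended at the end: earlier field offsets stay unchanged, the new
    // counter only consumes bytes that were previously padding.
    token_shadow_generation: AtomicUsize,
}

// The union padding pins the mapped region to exactly 4096 bytes no
// matter how many atomics `Inner` grows (up to the padding size).
#[repr(C)]
union Data {
    inner: ManuallyDrop<Inner>,
    _padding: [u8; SHMEM_SIZE],
}

// Compile-time equivalent of assert_cache_size().
const _: () = assert!(std::mem::size_of::<Data>() == SHMEM_SIZE);

fn main() {
    // A zeroed mapping reads as AtomicUsize(0) for every counter, so new
    // fields start at generation 0 for old and new processes alike.
    let data = Data { _padding: [0u8; SHMEM_SIZE] };
    let generation = unsafe { &data.inner.token_shadow_generation };
    generation.fetch_add(1, Ordering::AcqRel);
    println!("{}", generation.load(Ordering::Acquire));
}
```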
>>
>> [1] https://bugzilla.proxmox.com/show_bug.cgi?id=7017
>>
>> Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
>> ---
>> pbs-config/src/config_version_cache.rs | 18 ++++++++++++++++++
>> 1 file changed, 18 insertions(+)
>>
>> diff --git a/pbs-config/src/config_version_cache.rs b/pbs-config/src/config_version_cache.rs
>> index e8fb994f..1376b11d 100644
>> --- a/pbs-config/src/config_version_cache.rs
>> +++ b/pbs-config/src/config_version_cache.rs
>> @@ -28,6 +28,8 @@ struct ConfigVersionCacheDataInner {
>> // datastore (datastore.cfg) generation/version
>> // FIXME: remove with PBS 3.0
>> datastore_generation: AtomicUsize,
>> + // Token shadow (token.shadow) generation/version.
>> + token_shadow_generation: AtomicUsize,
>> // Add further atomics here
>> }
>>
>> @@ -153,4 +155,20 @@ impl ConfigVersionCache {
>> .datastore_generation
>> .fetch_add(1, Ordering::AcqRel)
>> }
>> +
>> + /// Returns the token shadow generation number.
>> + pub fn token_shadow_generation(&self) -> usize {
>> + self.shmem
>> + .data()
>> + .token_shadow_generation
>> + .load(Ordering::Acquire)
>> + }
>> +
>> + /// Increase the token shadow generation number.
>> + pub fn increase_token_shadow_generation(&self) -> usize {
>> + self.shmem
>> + .data()
>> + .token_shadow_generation
>> + .fetch_add(1, Ordering::AcqRel)
>> + }
>> }
>> --
>> 2.47.3
>>
>>
>>
>>
>>
>>
>
>
>
>
^ permalink raw reply [relevance 6%]
* Re: [pbs-devel] [PATCH proxmox-backup v3 2/4] pbs-config: cache verified API token secrets
@ 2026-01-16 15:13 6% ` Samuel Rufinatscha
2026-01-16 15:29 5% ` Fabian Grünbichler
2026-01-16 16:00 5% ` Fabian Grünbichler
0 siblings, 2 replies; 39+ results
From: Samuel Rufinatscha @ 2026-01-16 15:13 UTC (permalink / raw)
To: Proxmox Backup Server development discussion, Fabian Grünbichler
On 1/14/26 11:44 AM, Fabian Grünbichler wrote:
> On January 2, 2026 5:07 pm, Samuel Rufinatscha wrote:
>> Currently, every token-based API request reads the token.shadow file and
>> runs the expensive password hash verification for the given token
>> secret. This shows up as a hotspot in /status profiling (see
>> bug #7017 [1]).
>>
>> This patch introduces an in-memory cache of successfully verified token
>> secrets. Subsequent requests for the same token+secret combination only
>> perform a comparison using openssl::memcmp::eq and avoid re-running the
>> password hash. The cache is updated when a token secret is set and
>> cleared when a token is deleted. Note, this does NOT include manual
>> config changes, which will be covered in a subsequent patch.
>>
>> This patch is part of the series which fixes bug #7017 [1].
>>
>> [1] https://bugzilla.proxmox.com/show_bug.cgi?id=7017
>>
>> Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
>> ---
>> Changes from v1 to v2:
>>
>> * Replace OnceCell with LazyLock, and std::sync::RwLock with
>> parking_lot::RwLock.
>> * Add API_MUTATION_GENERATION and guard cache inserts
>> to prevent “zombie inserts” across concurrent set/delete.
>> * Refactor cache operations into cache_try_secret_matches,
>> cache_try_insert_secret, and centralize write-side behavior in
>> apply_api_mutation.
>> * Switch fast-path cache access to try_read/try_write (best-effort).
>>
>> Changes from v2 to v3:
>>
>> * Replaced process-local cache invalidation (AtomicU64
>> API_MUTATION_GENERATION) with a cross-process shared generation via
>> ConfigVersionCache.
>> * Validate shared generation before/after the constant-time secret
>> compare; only insert into cache if the generation is unchanged.
>> * invalidate_cache_state() on insert if shared generation changed.
>>
>> Cargo.toml | 1 +
>> pbs-config/Cargo.toml | 1 +
>> pbs-config/src/token_shadow.rs | 157 ++++++++++++++++++++++++++++++++-
>> 3 files changed, 158 insertions(+), 1 deletion(-)
>>
>> diff --git a/Cargo.toml b/Cargo.toml
>> index 1aa57ae5..821b63b7 100644
>> --- a/Cargo.toml
>> +++ b/Cargo.toml
>> @@ -143,6 +143,7 @@ nom = "7"
>> num-traits = "0.2"
>> once_cell = "1.3.1"
>> openssl = "0.10.40"
>> +parking_lot = "0.12"
>> percent-encoding = "2.1"
>> pin-project-lite = "0.2"
>> regex = "1.5.5"
>> diff --git a/pbs-config/Cargo.toml b/pbs-config/Cargo.toml
>> index 74afb3c6..eb81ce00 100644
>> --- a/pbs-config/Cargo.toml
>> +++ b/pbs-config/Cargo.toml
>> @@ -13,6 +13,7 @@ libc.workspace = true
>> nix.workspace = true
>> once_cell.workspace = true
>> openssl.workspace = true
>> +parking_lot.workspace = true
>> regex.workspace = true
>> serde.workspace = true
>> serde_json.workspace = true
>> diff --git a/pbs-config/src/token_shadow.rs b/pbs-config/src/token_shadow.rs
>> index 640fabbf..fa84aee5 100644
>> --- a/pbs-config/src/token_shadow.rs
>> +++ b/pbs-config/src/token_shadow.rs
>> @@ -1,6 +1,8 @@
>> use std::collections::HashMap;
>> +use std::sync::LazyLock;
>>
>> use anyhow::{bail, format_err, Error};
>> +use parking_lot::RwLock;
>> use serde::{Deserialize, Serialize};
>> use serde_json::{from_value, Value};
>>
>> @@ -13,6 +15,18 @@ use crate::{open_backup_lockfile, BackupLockGuard};
>> const LOCK_FILE: &str = pbs_buildcfg::configdir!("/token.shadow.lock");
>> const CONF_FILE: &str = pbs_buildcfg::configdir!("/token.shadow");
>>
>> +/// Global in-memory cache for successfully verified API token secrets.
>> +/// The cache stores plain text secrets for token Authids that have already been
>> +/// verified against the hashed values in `token.shadow`. This allows for cheap
>> +/// subsequent authentications for the same token+secret combination, avoiding
>> +/// recomputing the password hash on every request.
>> +static TOKEN_SECRET_CACHE: LazyLock<RwLock<ApiTokenSecretCache>> = LazyLock::new(|| {
>> + RwLock::new(ApiTokenSecretCache {
>> + secrets: HashMap::new(),
>> + shared_gen: 0,
>> + })
>> +});
>> +
>> #[derive(Serialize, Deserialize)]
>> #[serde(rename_all = "kebab-case")]
>> /// ApiToken id / secret pair
>> @@ -54,9 +68,27 @@ pub fn verify_secret(tokenid: &Authid, secret: &str) -> Result<(), Error> {
>> bail!("not an API token ID");
>> }
>>
>> + // Fast path
>> + if cache_try_secret_matches(tokenid, secret) {
>> + return Ok(());
>> + }
>> +
>> + // Slow path
>> + // First, capture the shared generation before doing the hash verification.
>> + let gen_before = token_shadow_shared_gen();
>> +
>> let data = read_file()?;
>> match data.get(tokenid) {
>> - Some(hashed_secret) => proxmox_sys::crypt::verify_crypt_pw(secret, hashed_secret),
>> + Some(hashed_secret) => {
>> + proxmox_sys::crypt::verify_crypt_pw(secret, hashed_secret)?;
>> +
>> + // Try to cache only if nothing changed while verifying the secret.
>> + if let Some(gen) = gen_before {
>> + cache_try_insert_secret(tokenid.clone(), secret.to_owned(), gen);
>> + }
>> +
>> + Ok(())
>> + }
>> None => bail!("invalid API token"),
>> }
>> }
>> @@ -82,6 +114,8 @@ fn set_secret(tokenid: &Authid, secret: &str) -> Result<(), Error> {
>> data.insert(tokenid.clone(), hashed_secret);
>> write_file(data)?;
>>
>> + apply_api_mutation(tokenid, Some(secret));
>> +
>> Ok(())
>> }
>>
>> @@ -97,5 +131,126 @@ pub fn delete_secret(tokenid: &Authid) -> Result<(), Error> {
>> data.remove(tokenid);
>> write_file(data)?;
>>
>> + apply_api_mutation(tokenid, None);
>> +
>> Ok(())
>> }
>> +
>> +struct ApiTokenSecretCache {
>> + /// Keys are token Authids, values are the corresponding plain text secrets.
>> + /// Entries are added after a successful on-disk verification in
>> + /// `verify_secret` or when a new token secret is generated by
>> + /// `generate_and_set_secret`. Used to avoid repeated
>> + /// password-hash computation on subsequent authentications.
>> + secrets: HashMap<Authid, CachedSecret>,
>> + /// Shared generation to detect mutations of the underlying token.shadow file.
>> + shared_gen: usize,
>> +}
>> +
>> +/// Cached secret.
>> +struct CachedSecret {
>> + secret: String,
>> +}
>> +
>> +fn cache_try_insert_secret(tokenid: Authid, secret: String, shared_gen_before: usize) {
>> + let Some(mut cache) = TOKEN_SECRET_CACHE.try_write() else {
>> + return;
>> + };
>> +
>> + let Some(shared_gen_now) = token_shadow_shared_gen() else {
>> + return;
>> + };
>> +
>> + // If this process missed a generation bump, its cache is stale.
>> + if cache.shared_gen != shared_gen_now {
>> + invalidate_cache_state(&mut cache);
>> + cache.shared_gen = shared_gen_now;
>> + }
>> +
>> + // If a mutation happened while we were verifying the secret, do not insert.
>> + if shared_gen_now == shared_gen_before {
>> + cache.secrets.insert(tokenid, CachedSecret { secret });
>> + }
>> +}
>> +
>> +// Tries to match the given token secret against the cached secret.
>> +// Checks the generation before and after the constant-time compare to avoid a
>> +// TOCTOU window. If another process rotates/deletes a token while we're validating
>> +// the cached secret, the generation will change, and we
>> +// must not trust the cache for this request.
>> +fn cache_try_secret_matches(tokenid: &Authid, secret: &str) -> bool {
>> + let Some(cache) = TOKEN_SECRET_CACHE.try_read() else {
>> + return false;
>> + };
>> + let Some(entry) = cache.secrets.get(tokenid) else {
>> + return false;
>> + };
>> +
>> + let cache_gen = cache.shared_gen;
>> +
>> + let Some(gen1) = token_shadow_shared_gen() else {
>> + return false;
>> + };
>> + if gen1 != cache_gen {
>> + return false;
>> + }
>> +
>> + let eq = openssl::memcmp::eq(entry.secret.as_bytes(), secret.as_bytes());
>
> should we invalidate the cache here for this particular authid in case
> of a mismatch, to avoid making brute forcing too easy/cheap?
>
We are not doing a cheap reject: on a mismatch we still fall through to
verify_crypt_pw(). Evicting on mismatch could, however, enable cache
thrashing, where wrong secrets for a known tokenid would evict cached
entries. So I think we should not invalidate on mismatch here.
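
The fall-through behaviour can be sketched like this. It is a toy model,
not the real token_shadow code: the hand-rolled constant-time compare
stands in for openssl::memcmp::eq, plain string equality stands in for
verify_crypt_pw(), and the generation checks are omitted.

```rust
use std::collections::HashMap;

/// Constant-time byte comparison (stand-in for openssl::memcmp::eq).
fn ct_eq(a: &[u8], b: &[u8]) -> bool {
    a.len() == b.len() && a.iter().zip(b).fold(0u8, |acc, (x, y)| acc | (x ^ y)) == 0
}

struct Verifier {
    /// tokenid -> previously verified plain secret
    cache: HashMap<String, String>,
    /// counts slow-path (password-hash) verifications
    slow_path_calls: usize,
}

impl Verifier {
    fn verify(&mut self, tokenid: &str, secret: &str, stored: &str) -> bool {
        if let Some(cached) = self.cache.get(tokenid) {
            // Fast path: only a cache *match* short-circuits.
            if ct_eq(cached.as_bytes(), secret.as_bytes()) {
                return true;
            }
            // A mismatch neither rejects nor evicts: evicting here would
            // let wrong guesses for a known tokenid thrash the cache.
        }
        // Slow path, standing in for verify_crypt_pw() against token.shadow.
        self.slow_path_calls += 1;
        let ok = secret == stored;
        if ok {
            self.cache.insert(tokenid.to_string(), secret.to_string());
        }
        ok
    }
}

fn main() {
    let mut v = Verifier { cache: HashMap::new(), slow_path_calls: 0 };
    assert!(v.verify("u@pbs!tok", "s3cret", "s3cret")); // slow path, cached
    assert!(!v.verify("u@pbs!tok", "wrong", "s3cret")); // falls through, no evict
    assert!(v.verify("u@pbs!tok", "s3cret", "s3cret")); // fast path, still cached
    assert_eq!(v.slow_path_calls, 2);
}
```

Note that a wrong secret still costs one full hash verification per
attempt, so brute forcing is not made cheaper by the cache.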
>> + let Some(gen2) = token_shadow_shared_gen() else {
>> + return false;
>> + };
>> +
>> + eq && gen2 == cache_gen
>> +}
>> +
>> +fn apply_api_mutation(tokenid: &Authid, new_secret: Option<&str>) {
>> + // Signal cache invalidation to other processes (best-effort).
>> + let new_shared_gen = bump_token_shadow_shared_gen();
>> +
>> + let mut cache = TOKEN_SECRET_CACHE.write();
>> +
>> + // If we cannot read/bump the shared generation, we cannot safely trust the cache.
>> + let Some(gen) = new_shared_gen else {
>> + invalidate_cache_state(&mut cache);
>> + cache.shared_gen = 0;
>> + return;
>> + };
>> +
>> + // Update to the post-mutation generation.
>> + cache.shared_gen = gen;
>> +
>> + // Apply the new mutation.
>> + match new_secret {
>> + Some(secret) => {
>> + cache.secrets.insert(
>> + tokenid.clone(),
>> + CachedSecret {
>> + secret: secret.to_owned(),
>> + },
>> + );
>> + }
>> + None => {
>> + cache.secrets.remove(tokenid);
>> + }
>> + }
>> +}
>> +
>> +/// Get the current shared generation.
>> +fn token_shadow_shared_gen() -> Option<usize> {
>> + crate::ConfigVersionCache::new()
>> + .ok()
>> + .map(|cvc| cvc.token_shadow_generation())
>> +}
>> +
>> +/// Bump and return the new shared generation.
>> +fn bump_token_shadow_shared_gen() -> Option<usize> {
>> + crate::ConfigVersionCache::new()
>> + .ok()
>> + .map(|cvc| cvc.increase_token_shadow_generation() + 1)
>> +}
>> +
>> +/// Invalidates the cache state and only keeps the shared generation.
>
> both calls to this actually set the cached generation to some value
> right after, so maybe this should take a generation directly and set it?
>
patch 3/4 doesn’t always update the gen on cache invalidation
(shadow_mtime_len() error branch in apply_api_mutation) but most other
call sites do. Agreed this can be refactored, maybe:
fn invalidate_cache_state(cache: &mut ApiTokenSecretCache) {
    cache.secrets.clear();
    // clear other cache fields (mtime/len/last_checked) as needed
}

fn invalidate_cache_state_and_set_gen(cache: &mut ApiTokenSecretCache, gen: usize) {
    invalidate_cache_state(cache);
    cache.shared_gen = gen;
}
We could also do a single helper with Option<usize>, but two helpers
make the call sites more explicit.
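
A compilable sketch of that single-helper Option<usize> variant, under
the same assumptions (ApiTokenSecretCache reduced to the two fields from
the patch; `None` models the error branch where no trustworthy
generation is available and the old one is kept):

```rust
use std::collections::HashMap;

// Reduced model of the cache from the patch.
struct ApiTokenSecretCache {
    secrets: HashMap<String, String>,
    shared_gen: usize,
}

/// Clear all cached secrets; adopt a new shared generation if one is
/// known, otherwise keep the current one.
fn invalidate_cache_state(cache: &mut ApiTokenSecretCache, new_gen: Option<usize>) {
    cache.secrets.clear();
    if let Some(generation) = new_gen {
        cache.shared_gen = generation;
    }
}

fn main() {
    let mut cache = ApiTokenSecretCache {
        secrets: HashMap::from([("u@pbs!tok".to_string(), "s3cret".to_string())]),
        shared_gen: 1,
    };
    // Mutation observed: drop secrets and adopt the post-mutation generation.
    invalidate_cache_state(&mut cache, Some(2));
    assert!(cache.secrets.is_empty());
    assert_eq!(cache.shared_gen, 2);
    // Generation unreadable: drop secrets, keep the old generation.
    invalidate_cache_state(&mut cache, None);
    assert_eq!(cache.shared_gen, 2);
}
```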
>> +fn invalidate_cache_state(cache: &mut ApiTokenSecretCache) {
>> + cache.secrets.clear();
>> +}
>> --
>> 2.47.3
>>
>>
>>
>> _______________________________________________
>> pbs-devel mailing list
>> pbs-devel@lists.proxmox.com
>> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
>>
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
^ permalink raw reply [relevance 6%]
* Re: [pbs-devel] [PATCH proxmox-backup v3 2/4] pbs-config: cache verified API token secrets
2026-01-16 15:13 6% ` Samuel Rufinatscha
@ 2026-01-16 15:29 5% ` Fabian Grünbichler
2026-01-16 15:33 6% ` Samuel Rufinatscha
2026-01-16 16:00 5% ` Fabian Grünbichler
1 sibling, 1 reply; 39+ results
From: Fabian Grünbichler @ 2026-01-16 15:29 UTC (permalink / raw)
To: Proxmox Backup Server development discussion, Samuel Rufinatscha
Quoting Samuel Rufinatscha (2026-01-16 16:13:17)
> On 1/14/26 11:44 AM, Fabian Grünbichler wrote:
> > On January 2, 2026 5:07 pm, Samuel Rufinatscha wrote:
> >> Currently, every token-based API request reads the token.shadow file and
> >> runs the expensive password hash verification for the given token
> >> secret. This shows up as a hotspot in /status profiling (see
> >> bug #7017 [1]).
> >>
> >> This patch introduces an in-memory cache of successfully verified token
> >> secrets. Subsequent requests for the same token+secret combination only
> >> perform a comparison using openssl::memcmp::eq and avoid re-running the
> >> password hash. The cache is updated when a token secret is set and
> >> cleared when a token is deleted. Note, this does NOT include manual
> >> config changes, which will be covered in a subsequent patch.
> >>
> >> This patch is part of the series which fixes bug #7017 [1].
> >>
> >> [1] https://bugzilla.proxmox.com/show_bug.cgi?id=7017
> >>
> >> Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
> >> ---
> >> Changes from v1 to v2:
> >>
> >> * Replace OnceCell with LazyLock, and std::sync::RwLock with
> >> parking_lot::RwLock.
> >> * Add API_MUTATION_GENERATION and guard cache inserts
> >> to prevent “zombie inserts” across concurrent set/delete.
> >> * Refactor cache operations into cache_try_secret_matches,
> >> cache_try_insert_secret, and centralize write-side behavior in
> >> apply_api_mutation.
> >> * Switch fast-path cache access to try_read/try_write (best-effort).
> >>
> >> Changes from v2 to v3:
> >>
> >> * Replaced process-local cache invalidation (AtomicU64
> >> API_MUTATION_GENERATION) with a cross-process shared generation via
> >> ConfigVersionCache.
> >> * Validate shared generation before/after the constant-time secret
> >> compare; only insert into cache if the generation is unchanged.
> >> * invalidate_cache_state() on insert if shared generation changed.
> >>
> >> Cargo.toml | 1 +
> >> pbs-config/Cargo.toml | 1 +
> >> pbs-config/src/token_shadow.rs | 157 ++++++++++++++++++++++++++++++++-
> >> 3 files changed, 158 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/Cargo.toml b/Cargo.toml
> >> index 1aa57ae5..821b63b7 100644
> >> --- a/Cargo.toml
> >> +++ b/Cargo.toml
> >> @@ -143,6 +143,7 @@ nom = "7"
> >> num-traits = "0.2"
> >> once_cell = "1.3.1"
> >> openssl = "0.10.40"
> >> +parking_lot = "0.12"
> >> percent-encoding = "2.1"
> >> pin-project-lite = "0.2"
> >> regex = "1.5.5"
> >> diff --git a/pbs-config/Cargo.toml b/pbs-config/Cargo.toml
> >> index 74afb3c6..eb81ce00 100644
> >> --- a/pbs-config/Cargo.toml
> >> +++ b/pbs-config/Cargo.toml
> >> @@ -13,6 +13,7 @@ libc.workspace = true
> >> nix.workspace = true
> >> once_cell.workspace = true
> >> openssl.workspace = true
> >> +parking_lot.workspace = true
> >> regex.workspace = true
> >> serde.workspace = true
> >> serde_json.workspace = true
> >> diff --git a/pbs-config/src/token_shadow.rs b/pbs-config/src/token_shadow.rs
> >> index 640fabbf..fa84aee5 100644
> >> --- a/pbs-config/src/token_shadow.rs
> >> +++ b/pbs-config/src/token_shadow.rs
> >> @@ -1,6 +1,8 @@
> >> use std::collections::HashMap;
> >> +use std::sync::LazyLock;
> >>
> >> use anyhow::{bail, format_err, Error};
> >> +use parking_lot::RwLock;
> >> use serde::{Deserialize, Serialize};
> >> use serde_json::{from_value, Value};
> >>
> >> @@ -13,6 +15,18 @@ use crate::{open_backup_lockfile, BackupLockGuard};
> >> const LOCK_FILE: &str = pbs_buildcfg::configdir!("/token.shadow.lock");
> >> const CONF_FILE: &str = pbs_buildcfg::configdir!("/token.shadow");
> >>
> >> +/// Global in-memory cache for successfully verified API token secrets.
> >> +/// The cache stores plain text secrets for token Authids that have already been
> >> +/// verified against the hashed values in `token.shadow`. This allows for cheap
> >> +/// subsequent authentications for the same token+secret combination, avoiding
> >> +/// recomputing the password hash on every request.
> >> +static TOKEN_SECRET_CACHE: LazyLock<RwLock<ApiTokenSecretCache>> = LazyLock::new(|| {
> >> + RwLock::new(ApiTokenSecretCache {
> >> + secrets: HashMap::new(),
> >> + shared_gen: 0,
> >> + })
> >> +});
> >> +
> >> #[derive(Serialize, Deserialize)]
> >> #[serde(rename_all = "kebab-case")]
> >> /// ApiToken id / secret pair
> >> @@ -54,9 +68,27 @@ pub fn verify_secret(tokenid: &Authid, secret: &str) -> Result<(), Error> {
> >> bail!("not an API token ID");
> >> }
> >>
> >> + // Fast path
> >> + if cache_try_secret_matches(tokenid, secret) {
> >> + return Ok(());
> >> + }
> >> +
> >> + // Slow path
> >> + // First, capture the shared generation before doing the hash verification.
> >> + let gen_before = token_shadow_shared_gen();
> >> +
> >> let data = read_file()?;
> >> match data.get(tokenid) {
> >> - Some(hashed_secret) => proxmox_sys::crypt::verify_crypt_pw(secret, hashed_secret),
> >> + Some(hashed_secret) => {
> >> + proxmox_sys::crypt::verify_crypt_pw(secret, hashed_secret)?;
> >> +
> >> + // Try to cache only if nothing changed while verifying the secret.
> >> + if let Some(gen) = gen_before {
> >> + cache_try_insert_secret(tokenid.clone(), secret.to_owned(), gen);
> >> + }
> >> +
> >> + Ok(())
> >> + }
> >> None => bail!("invalid API token"),
> >> }
> >> }
> >> @@ -82,6 +114,8 @@ fn set_secret(tokenid: &Authid, secret: &str) -> Result<(), Error> {
> >> data.insert(tokenid.clone(), hashed_secret);
> >> write_file(data)?;
> >>
> >> + apply_api_mutation(tokenid, Some(secret));
> >> +
> >> Ok(())
> >> }
> >>
> >> @@ -97,5 +131,126 @@ pub fn delete_secret(tokenid: &Authid) -> Result<(), Error> {
> >> data.remove(tokenid);
> >> write_file(data)?;
> >>
> >> + apply_api_mutation(tokenid, None);
> >> +
> >> Ok(())
> >> }
> >> +
> >> +struct ApiTokenSecretCache {
> >> + /// Keys are token Authids, values are the corresponding plain text secrets.
> >> + /// Entries are added after a successful on-disk verification in
> >> + /// `verify_secret` or when a new token secret is generated by
> >> + /// `generate_and_set_secret`. Used to avoid repeated
> >> + /// password-hash computation on subsequent authentications.
> >> + secrets: HashMap<Authid, CachedSecret>,
> >> + /// Shared generation to detect mutations of the underlying token.shadow file.
> >> + shared_gen: usize,
> >> +}
> >> +
> >> +/// Cached secret.
> >> +struct CachedSecret {
> >> + secret: String,
> >> +}
> >> +
> >> +fn cache_try_insert_secret(tokenid: Authid, secret: String, shared_gen_before: usize) {
> >> + let Some(mut cache) = TOKEN_SECRET_CACHE.try_write() else {
> >> + return;
> >> + };
> >> +
> >> + let Some(shared_gen_now) = token_shadow_shared_gen() else {
> >> + return;
> >> + };
> >> +
> >> + // If this process missed a generation bump, its cache is stale.
> >> + if cache.shared_gen != shared_gen_now {
> >> + invalidate_cache_state(&mut cache);
> >> + cache.shared_gen = shared_gen_now;
> >> + }
> >> +
> >> + // If a mutation happened while we were verifying the secret, do not insert.
> >> + if shared_gen_now == shared_gen_before {
> >> + cache.secrets.insert(tokenid, CachedSecret { secret });
> >> + }
> >> +}
> >> +
> >> +// Tries to match the given token secret against the cached secret.
> >> +// Checks the generation before and after the constant-time compare to avoid a
> >> +// TOCTOU window. If another process rotates/deletes a token while we're validating
> >> +// the cached secret, the generation will change, and we
> >> +// must not trust the cache for this request.
> >> +fn cache_try_secret_matches(tokenid: &Authid, secret: &str) -> bool {
> >> + let Some(cache) = TOKEN_SECRET_CACHE.try_read() else {
> >> + return false;
> >> + };
> >> + let Some(entry) = cache.secrets.get(tokenid) else {
> >> + return false;
> >> + };
> >> +
> >> + let cache_gen = cache.shared_gen;
> >> +
> >> + let Some(gen1) = token_shadow_shared_gen() else {
> >> + return false;
> >> + };
> >> + if gen1 != cache_gen {
> >> + return false;
> >> + }
> >> +
> >> + let eq = openssl::memcmp::eq(entry.secret.as_bytes(), secret.as_bytes());
> >
> > should we invalidate the cache here for this particular authid in case
> > of a mismatch, to avoid making brute forcing too easy/cheap?
> >
>
> We are not doing a cheap reject; on a mismatch we still fall through to
> verify_crypt_pw(). Evicting on mismatch could, however, enable cache
> thrashing, where wrong secrets for a known tokenid would evict cached
> entries. So I think we should not invalidate here on mismatch.
>
> >> + let Some(gen2) = token_shadow_shared_gen() else {
> >> + return false;
> >> + };
> >> +
> >> + eq && gen2 == cache_gen
> >> +}
> >> +
> >> +fn apply_api_mutation(tokenid: &Authid, new_secret: Option<&str>) {
> >> + // Signal cache invalidation to other processes (best-effort).
> >> + let new_shared_gen = bump_token_shadow_shared_gen();
> >> +
> >> + let mut cache = TOKEN_SECRET_CACHE.write();
> >> +
> >> + // If we cannot read/bump the shared generation, we cannot safely trust the cache.
> >> + let Some(gen) = new_shared_gen else {
> >> + invalidate_cache_state(&mut cache);
> >> + cache.shared_gen = 0;
> >> + return;
> >> + };
> >> +
> >> + // Update to the post-mutation generation.
> >> + cache.shared_gen = gen;
> >> +
> >> + // Apply the new mutation.
> >> + match new_secret {
> >> + Some(secret) => {
> >> + cache.secrets.insert(
> >> + tokenid.clone(),
> >> + CachedSecret {
> >> + secret: secret.to_owned(),
> >> + },
> >> + );
> >> + }
> >> + None => {
> >> + cache.secrets.remove(tokenid);
> >> + }
> >> + }
> >> +}
> >> +
> >> +/// Get the current shared generation.
> >> +fn token_shadow_shared_gen() -> Option<usize> {
> >> + crate::ConfigVersionCache::new()
> >> + .ok()
> >> + .map(|cvc| cvc.token_shadow_generation())
> >> +}
> >> +
> >> +/// Bump and return the new shared generation.
> >> +fn bump_token_shadow_shared_gen() -> Option<usize> {
> >> + crate::ConfigVersionCache::new()
> >> + .ok()
> >> + .map(|cvc| cvc.increase_token_shadow_generation() + 1)
> >> +}
> >> +
> >> +/// Invalidates the cache state and only keeps the shared generation.
> >
> > both calls to this actually set the cached generation to some value
> > right after, so maybe this should take a generation directly and set it?
> >
>
> patch 3/4 doesn’t always update the gen on cache invalidation
> (shadow_mtime_len() error branch in apply_api_mutation) but most other
> call sites do. Agreed this can be refactored, maybe:
that one sets the generation before (potentially) invalidating the cache
though, so we could unconditionally reset the generation to that value when
invalidating.. we should maybe also re-order the lock and bump there?
>
> fn invalidate_cache_state(cache: &mut ApiTokenSecretCache) {
>     cache.secrets.clear();
>     // clear other cache fields (mtime/len/last_checked) as needed
> }
>
> fn invalidate_cache_state_and_set_gen(cache: &mut ApiTokenSecretCache,
>                                       gen: usize) {
>     invalidate_cache_state(cache);
>     cache.shared_gen = gen;
> }
>
> We could also do a single helper with Option<usize> but two helpers make
> the call sites more explicit.
>
> >> +fn invalidate_cache_state(cache: &mut ApiTokenSecretCache) {
> >> + cache.secrets.clear();
> >> +}
> >> --
> >> 2.47.3
> >>
> >>
> >>
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
^ permalink raw reply [relevance 5%]
* Re: [pbs-devel] [PATCH proxmox-backup v3 2/4] pbs-config: cache verified API token secrets
2026-01-16 15:29 5% ` Fabian Grünbichler
@ 2026-01-16 15:33 6% ` Samuel Rufinatscha
0 siblings, 0 replies; 39+ results
From: Samuel Rufinatscha @ 2026-01-16 15:33 UTC (permalink / raw)
To: Fabian Grünbichler, Proxmox Backup Server development discussion
On 1/16/26 4:28 PM, Fabian Grünbichler wrote:
> Quoting Samuel Rufinatscha (2026-01-16 16:13:17)
>> On 1/14/26 11:44 AM, Fabian Grünbichler wrote:
>>> On January 2, 2026 5:07 pm, Samuel Rufinatscha wrote:
>>>> Currently, every token-based API request reads the token.shadow file and
>>>> runs the expensive password hash verification for the given token
>>>> secret. This shows up as a hotspot in /status profiling (see
>>>> bug #7017 [1]).
>>>>
>>>> This patch introduces an in-memory cache of successfully verified token
>>>> secrets. Subsequent requests for the same token+secret combination only
>>>> perform a comparison using openssl::memcmp::eq and avoid re-running the
>>>> password hash. The cache is updated when a token secret is set and
>>>> cleared when a token is deleted. Note, this does NOT include manual
>>>> config changes, which will be covered in a subsequent patch.
>>>>
>>>> This patch is part of the series which fixes bug #7017 [1].
>>>>
>>>> [1] https://bugzilla.proxmox.com/show_bug.cgi?id=7017
>>>>
>>>> Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
>>>> ---
>>>> Changes from v1 to v2:
>>>>
>>>> * Replace OnceCell with LazyLock, and std::sync::RwLock with
>>>> parking_lot::RwLock.
>>>> * Add API_MUTATION_GENERATION and guard cache inserts
>>>> to prevent “zombie inserts” across concurrent set/delete.
>>>> * Refactor cache operations into cache_try_secret_matches,
>>>> cache_try_insert_secret, and centralize write-side behavior in
>>>> apply_api_mutation.
>>>> * Switch fast-path cache access to try_read/try_write (best-effort).
>>>>
>>>> Changes from v2 to v3:
>>>>
>>>> * Replaced process-local cache invalidation (AtomicU64
>>>> API_MUTATION_GENERATION) with a cross-process shared generation via
>>>> ConfigVersionCache.
>>>> * Validate shared generation before/after the constant-time secret
>>>> compare; only insert into cache if the generation is unchanged.
>>>> * invalidate_cache_state() on insert if shared generation changed.
>>>>
>>>> Cargo.toml | 1 +
>>>> pbs-config/Cargo.toml | 1 +
>>>> pbs-config/src/token_shadow.rs | 157 ++++++++++++++++++++++++++++++++-
>>>> 3 files changed, 158 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/Cargo.toml b/Cargo.toml
>>>> index 1aa57ae5..821b63b7 100644
>>>> --- a/Cargo.toml
>>>> +++ b/Cargo.toml
>>>> @@ -143,6 +143,7 @@ nom = "7"
>>>> num-traits = "0.2"
>>>> once_cell = "1.3.1"
>>>> openssl = "0.10.40"
>>>> +parking_lot = "0.12"
>>>> percent-encoding = "2.1"
>>>> pin-project-lite = "0.2"
>>>> regex = "1.5.5"
>>>> diff --git a/pbs-config/Cargo.toml b/pbs-config/Cargo.toml
>>>> index 74afb3c6..eb81ce00 100644
>>>> --- a/pbs-config/Cargo.toml
>>>> +++ b/pbs-config/Cargo.toml
>>>> @@ -13,6 +13,7 @@ libc.workspace = true
>>>> nix.workspace = true
>>>> once_cell.workspace = true
>>>> openssl.workspace = true
>>>> +parking_lot.workspace = true
>>>> regex.workspace = true
>>>> serde.workspace = true
>>>> serde_json.workspace = true
>>>> diff --git a/pbs-config/src/token_shadow.rs b/pbs-config/src/token_shadow.rs
>>>> index 640fabbf..fa84aee5 100644
>>>> --- a/pbs-config/src/token_shadow.rs
>>>> +++ b/pbs-config/src/token_shadow.rs
>>>> @@ -1,6 +1,8 @@
>>>> use std::collections::HashMap;
>>>> +use std::sync::LazyLock;
>>>>
>>>> use anyhow::{bail, format_err, Error};
>>>> +use parking_lot::RwLock;
>>>> use serde::{Deserialize, Serialize};
>>>> use serde_json::{from_value, Value};
>>>>
>>>> @@ -13,6 +15,18 @@ use crate::{open_backup_lockfile, BackupLockGuard};
>>>> const LOCK_FILE: &str = pbs_buildcfg::configdir!("/token.shadow.lock");
>>>> const CONF_FILE: &str = pbs_buildcfg::configdir!("/token.shadow");
>>>>
>>>> +/// Global in-memory cache for successfully verified API token secrets.
>>>> +/// The cache stores plain text secrets for token Authids that have already been
>>>> +/// verified against the hashed values in `token.shadow`. This allows for cheap
>>>> +/// subsequent authentications for the same token+secret combination, avoiding
>>>> +/// recomputing the password hash on every request.
>>>> +static TOKEN_SECRET_CACHE: LazyLock<RwLock<ApiTokenSecretCache>> = LazyLock::new(|| {
>>>> + RwLock::new(ApiTokenSecretCache {
>>>> + secrets: HashMap::new(),
>>>> + shared_gen: 0,
>>>> + })
>>>> +});
>>>> +
>>>> #[derive(Serialize, Deserialize)]
>>>> #[serde(rename_all = "kebab-case")]
>>>> /// ApiToken id / secret pair
>>>> @@ -54,9 +68,27 @@ pub fn verify_secret(tokenid: &Authid, secret: &str) -> Result<(), Error> {
>>>> bail!("not an API token ID");
>>>> }
>>>>
>>>> + // Fast path
>>>> + if cache_try_secret_matches(tokenid, secret) {
>>>> + return Ok(());
>>>> + }
>>>> +
>>>> + // Slow path
>>>> + // First, capture the shared generation before doing the hash verification.
>>>> + let gen_before = token_shadow_shared_gen();
>>>> +
>>>> let data = read_file()?;
>>>> match data.get(tokenid) {
>>>> - Some(hashed_secret) => proxmox_sys::crypt::verify_crypt_pw(secret, hashed_secret),
>>>> + Some(hashed_secret) => {
>>>> + proxmox_sys::crypt::verify_crypt_pw(secret, hashed_secret)?;
>>>> +
>>>> + // Try to cache only if nothing changed while verifying the secret.
>>>> + if let Some(gen) = gen_before {
>>>> + cache_try_insert_secret(tokenid.clone(), secret.to_owned(), gen);
>>>> + }
>>>> +
>>>> + Ok(())
>>>> + }
>>>> None => bail!("invalid API token"),
>>>> }
>>>> }
>>>> @@ -82,6 +114,8 @@ fn set_secret(tokenid: &Authid, secret: &str) -> Result<(), Error> {
>>>> data.insert(tokenid.clone(), hashed_secret);
>>>> write_file(data)?;
>>>>
>>>> + apply_api_mutation(tokenid, Some(secret));
>>>> +
>>>> Ok(())
>>>> }
>>>>
>>>> @@ -97,5 +131,126 @@ pub fn delete_secret(tokenid: &Authid) -> Result<(), Error> {
>>>> data.remove(tokenid);
>>>> write_file(data)?;
>>>>
>>>> + apply_api_mutation(tokenid, None);
>>>> +
>>>> Ok(())
>>>> }
>>>> +
>>>> +struct ApiTokenSecretCache {
>>>> + /// Keys are token Authids, values are the corresponding plain text secrets.
>>>> + /// Entries are added after a successful on-disk verification in
>>>> + /// `verify_secret` or when a new token secret is generated by
>>>> + /// `generate_and_set_secret`. Used to avoid repeated
>>>> + /// password-hash computation on subsequent authentications.
>>>> + secrets: HashMap<Authid, CachedSecret>,
>>>> + /// Shared generation to detect mutations of the underlying token.shadow file.
>>>> + shared_gen: usize,
>>>> +}
>>>> +
>>>> +/// Cached secret.
>>>> +struct CachedSecret {
>>>> + secret: String,
>>>> +}
>>>> +
>>>> +fn cache_try_insert_secret(tokenid: Authid, secret: String, shared_gen_before: usize) {
>>>> + let Some(mut cache) = TOKEN_SECRET_CACHE.try_write() else {
>>>> + return;
>>>> + };
>>>> +
>>>> + let Some(shared_gen_now) = token_shadow_shared_gen() else {
>>>> + return;
>>>> + };
>>>> +
>>>> + // If this process missed a generation bump, its cache is stale.
>>>> + if cache.shared_gen != shared_gen_now {
>>>> + invalidate_cache_state(&mut cache);
>>>> + cache.shared_gen = shared_gen_now;
>>>> + }
>>>> +
>>>> + // If a mutation happened while we were verifying the secret, do not insert.
>>>> + if shared_gen_now == shared_gen_before {
>>>> + cache.secrets.insert(tokenid, CachedSecret { secret });
>>>> + }
>>>> +}
>>>> +
>>>> +// Tries to match the given token secret against the cached secret.
>>>> +// Checks the generation before and after the constant-time compare to avoid a
>>>> +// TOCTOU window. If another process rotates/deletes a token while we're validating
>>>> +// the cached secret, the generation will change, and we
>>>> +// must not trust the cache for this request.
>>>> +fn cache_try_secret_matches(tokenid: &Authid, secret: &str) -> bool {
>>>> + let Some(cache) = TOKEN_SECRET_CACHE.try_read() else {
>>>> + return false;
>>>> + };
>>>> + let Some(entry) = cache.secrets.get(tokenid) else {
>>>> + return false;
>>>> + };
>>>> +
>>>> + let cache_gen = cache.shared_gen;
>>>> +
>>>> + let Some(gen1) = token_shadow_shared_gen() else {
>>>> + return false;
>>>> + };
>>>> + if gen1 != cache_gen {
>>>> + return false;
>>>> + }
>>>> +
>>>> + let eq = openssl::memcmp::eq(entry.secret.as_bytes(), secret.as_bytes());
>>>
>>> should we invalidate the cache here for this particular authid in case
>>> of a mismatch, to avoid making brute forcing too easy/cheap?
>>>
>>
>> We are not doing a cheap reject; on a mismatch we still fall through to
>> verify_crypt_pw(). Evicting on mismatch could, however, enable cache
>> thrashing, where wrong secrets for a known tokenid would evict cached
>> entries. So I think we should not invalidate here on mismatch.
>>
>>>> + let Some(gen2) = token_shadow_shared_gen() else {
>>>> + return false;
>>>> + };
>>>> +
>>>> + eq && gen2 == cache_gen
>>>> +}
>>>> +
>>>> +fn apply_api_mutation(tokenid: &Authid, new_secret: Option<&str>) {
>>>> + // Signal cache invalidation to other processes (best-effort).
>>>> + let new_shared_gen = bump_token_shadow_shared_gen();
>>>> +
>>>> + let mut cache = TOKEN_SECRET_CACHE.write();
>>>> +
>>>> + // If we cannot read/bump the shared generation, we cannot safely trust the cache.
>>>> + let Some(gen) = new_shared_gen else {
>>>> + invalidate_cache_state(&mut cache);
>>>> + cache.shared_gen = 0;
>>>> + return;
>>>> + };
>>>> +
>>>> + // Update to the post-mutation generation.
>>>> + cache.shared_gen = gen;
>>>> +
>>>> + // Apply the new mutation.
>>>> + match new_secret {
>>>> + Some(secret) => {
>>>> + cache.secrets.insert(
>>>> + tokenid.clone(),
>>>> + CachedSecret {
>>>> + secret: secret.to_owned(),
>>>> + },
>>>> + );
>>>> + }
>>>> + None => {
>>>> + cache.secrets.remove(tokenid);
>>>> + }
>>>> + }
>>>> +}
>>>> +
>>>> +/// Get the current shared generation.
>>>> +fn token_shadow_shared_gen() -> Option<usize> {
>>>> + crate::ConfigVersionCache::new()
>>>> + .ok()
>>>> + .map(|cvc| cvc.token_shadow_generation())
>>>> +}
>>>> +
>>>> +/// Bump and return the new shared generation.
>>>> +fn bump_token_shadow_shared_gen() -> Option<usize> {
>>>> + crate::ConfigVersionCache::new()
>>>> + .ok()
>>>> + .map(|cvc| cvc.increase_token_shadow_generation() + 1)
>>>> +}
>>>> +
>>>> +/// Invalidates the cache state and only keeps the shared generation.
>>>
>>> both calls to this actually set the cached generation to some value
>>> right after, so maybe this should take a generation directly and set it?
>>>
>>
>> patch 3/4 doesn’t always update the gen on cache invalidation
>> (shadow_mtime_len() error branch in apply_api_mutation) but most other
>> call sites do. Agreed this can be refactored, maybe:
>
> that one sets the generation before (potentially) invalidating the cache
> though, so we could unconditionally reset the generation to that value when
> invalidating.. we should maybe also re-order the lock and bump there?
>
Good point, I will check this! thanks Fabian! :)
>>
>> fn invalidate_cache_state(cache: &mut ApiTokenSecretCache) {
>>     cache.secrets.clear();
>>     // clear other cache fields (mtime/len/last_checked) as needed
>> }
>>
>> fn invalidate_cache_state_and_set_gen(cache: &mut ApiTokenSecretCache,
>>                                       gen: usize) {
>>     invalidate_cache_state(cache);
>>     cache.shared_gen = gen;
>> }
>>
>> We could also do a single helper with Option<usize> but two helpers make
>> the call sites more explicit.
>>
>>>> +fn invalidate_cache_state(cache: &mut ApiTokenSecretCache) {
>>>> + cache.secrets.clear();
>>>> +}
>>>> --
>>>> 2.47.3
>>>>
>>>>
>>>>
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
^ permalink raw reply [relevance 6%]
* Re: [pbs-devel] [PATCH proxmox-backup v3 2/4] pbs-config: cache verified API token secrets
2026-01-16 15:13 6% ` Samuel Rufinatscha
2026-01-16 15:29 5% ` Fabian Grünbichler
@ 2026-01-16 16:00 5% ` Fabian Grünbichler
2026-01-16 16:56 6% ` Samuel Rufinatscha
1 sibling, 1 reply; 39+ results
From: Fabian Grünbichler @ 2026-01-16 16:00 UTC (permalink / raw)
To: Proxmox Backup Server development discussion, Samuel Rufinatscha
Quoting Samuel Rufinatscha (2026-01-16 16:13:17)
> On 1/14/26 11:44 AM, Fabian Grünbichler wrote:
> > On January 2, 2026 5:07 pm, Samuel Rufinatscha wrote:
> >> Currently, every token-based API request reads the token.shadow file and
> >> runs the expensive password hash verification for the given token
> >> secret. This shows up as a hotspot in /status profiling (see
> >> bug #7017 [1]).
> >>
> >> This patch introduces an in-memory cache of successfully verified token
> >> secrets. Subsequent requests for the same token+secret combination only
> >> perform a comparison using openssl::memcmp::eq and avoid re-running the
> >> password hash. The cache is updated when a token secret is set and
> >> cleared when a token is deleted. Note, this does NOT include manual
> >> config changes, which will be covered in a subsequent patch.
> >>
> >> This patch is part of the series which fixes bug #7017 [1].
> >>
> >> [1] https://bugzilla.proxmox.com/show_bug.cgi?id=7017
> >>
> >> Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
> >> ---
[..]
> >> +
> >> +// Tries to match the given token secret against the cached secret.
> >> +// Checks the generation before and after the constant-time compare to avoid a
> >> +// TOCTOU window. If another process rotates/deletes a token while we're validating
> >> +// the cached secret, the generation will change, and we
> >> +// must not trust the cache for this request.
> >> +fn cache_try_secret_matches(tokenid: &Authid, secret: &str) -> bool {
> >> + let Some(cache) = TOKEN_SECRET_CACHE.try_read() else {
> >> + return false;
> >> + };
> >> + let Some(entry) = cache.secrets.get(tokenid) else {
> >> + return false;
> >> + };
> >> +
> >> + let cache_gen = cache.shared_gen;
> >> +
> >> + let Some(gen1) = token_shadow_shared_gen() else {
> >> + return false;
> >> + };
> >> + if gen1 != cache_gen {
> >> + return false;
> >> + }
> >> +
> >> + let eq = openssl::memcmp::eq(entry.secret.as_bytes(), secret.as_bytes());
> >
> > should we invalidate the cache here for this particular authid in case
> > of a mismatch, to avoid making brute forcing too easy/cheap?
> >
>
> We are not doing a cheap reject; on a mismatch we still fall through to
> verify_crypt_pw(). Evicting on mismatch could, however, enable cache
> thrashing, where wrong secrets for a known tokenid would evict cached
> entries. So I think we should not invalidate here on mismatch.
forgot this part here, sorry. you are right, this *should* be okay. I do think
the second generation check there serves no purpose though. the token config
can change at any point after we've validated the secret using the old state,
there is nothing we can do about that, and it's totally fine to accept a token
that is modified at exactly the same moment, even if that same token wouldn't
be valid 2 seconds later..
there has to be a point where we have to say "this token is valid", and at the
point of memcmp here we have already:
- verified we don't need to reload the file
- verified we didn't have any API changes to the token config
- verified that the secret matches what we have cached
redoing the first two checks after that point doesn't protect us against
changes afterwards either, so we might as well not do that extra work that
doesn't give us any extra safety guarantees anyway..
>
> >> + let Some(gen2) = token_shadow_shared_gen() else {
> >> + return false;
> >> + };
> >> +
> >> + eq && gen2 == cache_gen
> >> +}
> >> +
> >> +fn apply_api_mutation(tokenid: &Authid, new_secret: Option<&str>) {
> >> + // Signal cache invalidation to other processes (best-effort).
> >> + let new_shared_gen = bump_token_shadow_shared_gen();
> >> +
> >> + let mut cache = TOKEN_SECRET_CACHE.write();
because I mentioned switching those two around - this actually requires more
thought I think..
right now, calling apply_api_mutation happens under a lock, but there are other
calls that bump the generation, so this is actually racy here. OTOH, bumping
the generation before locking the cache means faster cache invalidation..
maybe we should re-verify the generation after obtaining the lock? and maybe
make apply_api_mutation consume the shadow config file lock, to ensure it's
only called while that lock is being held?
> >> +
> >> + // If we cannot read/bump the shared generation, we cannot safely trust the cache.
> >> + let Some(gen) = new_shared_gen else {
> >> + invalidate_cache_state(&mut cache);
> >> + cache.shared_gen = 0;
> >> + return;
> >> + };
> >> +
> >> + // Update to the post-mutation generation.
> >> + cache.shared_gen = gen;
> >> +
> >> + // Apply the new mutation.
> >> + match new_secret {
> >> + Some(secret) => {
> >> + cache.secrets.insert(
> >> + tokenid.clone(),
> >> + CachedSecret {
> >> + secret: secret.to_owned(),
> >> + },
> >> + );
> >> + }
> >> + None => {
> >> + cache.secrets.remove(tokenid);
> >> + }
> >> + }
> >> +}
> >> +
> >> +/// Get the current shared generation.
> >> +fn token_shadow_shared_gen() -> Option<usize> {
> >> + crate::ConfigVersionCache::new()
> >> + .ok()
> >> + .map(|cvc| cvc.token_shadow_generation())
> >> +}
> >> +
> >> +/// Bump and return the new shared generation.
> >> +fn bump_token_shadow_shared_gen() -> Option<usize> {
> >> + crate::ConfigVersionCache::new()
> >> + .ok()
> >> + .map(|cvc| cvc.increase_token_shadow_generation() + 1)
> >> +}
> >> +
> >> +/// Invalidates the cache state and only keeps the shared generation.
> >
> > both calls to this actually set the cached generation to some value
> > right after, so maybe this should take a generation directly and set it?
> >
>
> patch 3/4 doesn’t always update the gen on cache invalidation
> (shadow_mtime_len() error branch in apply_api_mutation) but most other
> call sites do. Agreed this can be refactored, maybe:
>
> fn invalidate_cache_state(cache: &mut ApiTokenSecretCache) {
>     cache.secrets.clear();
>     // clear other cache fields (mtime/len/last_checked) as needed
> }
>
> fn invalidate_cache_state_and_set_gen(cache: &mut ApiTokenSecretCache,
>                                       gen: usize) {
>     invalidate_cache_state(cache);
>     cache.shared_gen = gen;
> }
>
> We could also do a single helper with Option<usize> but two helpers make
> the call sites more explicit.
>
> >> +fn invalidate_cache_state(cache: &mut ApiTokenSecretCache) {
> >> + cache.secrets.clear();
> >> +}
> >> --
> >> 2.47.3
> >>
> >>
> >>
^ permalink raw reply [relevance 5%]
* Re: [pbs-devel] [PATCH proxmox-datacenter-manager v3 1/2] pdm-config: implement token.shadow generation
@ 2026-01-16 16:28 6% ` Samuel Rufinatscha
2026-01-16 16:48 5% ` Shannon Sterz
0 siblings, 1 reply; 39+ results
From: Samuel Rufinatscha @ 2026-01-16 16:28 UTC (permalink / raw)
To: Proxmox Backup Server development discussion, Fabian Grünbichler
On 1/14/26 11:44 AM, Fabian Grünbichler wrote:
> On January 2, 2026 5:07 pm, Samuel Rufinatscha wrote:
>> PDM depends on the shared proxmox/proxmox-access-control crate for
>> token.shadow handling, which expects the product to provide a
>> cross-process invalidation signal so it can safely cache verified API
>> token secrets and invalidate them when token.shadow is changed.
>>
>> This patch
>>
>> * adds a token_shadow_generation to PDM’s shared-memory
>> ConfigVersionCache
>> * implements proxmox_access_control::init::AccessControlConfig
>> for pdm_config::AccessControlConfig, which
>> - delegates roles/privs/path checks to the existing
>> pdm_api_types::AccessControlConfig implementation
>> - implements the shadow cache generation trait functions
>> * switches the AccessControlConfig init paths (server + CLI) to use
>> pdm_config::AccessControlConfig instead of
>> pdm_api_types::AccessControlConfig
>>
>> This patch is part of the series which fixes bug #7017 [1].
>>
>> [1] https://bugzilla.proxmox.com/show_bug.cgi?id=7017
>>
>> Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
>> ---
>> cli/admin/src/main.rs | 2 +-
>> lib/pdm-config/Cargo.toml | 1 +
>> lib/pdm-config/src/access_control_config.rs | 73 +++++++++++++++++++++
>> lib/pdm-config/src/config_version_cache.rs | 18 +++++
>> lib/pdm-config/src/lib.rs | 2 +
>> server/src/acl.rs | 3 +-
>> 6 files changed, 96 insertions(+), 3 deletions(-)
>> create mode 100644 lib/pdm-config/src/access_control_config.rs
>>
>> diff --git a/cli/admin/src/main.rs b/cli/admin/src/main.rs
>> index f698fa2..916c633 100644
>> --- a/cli/admin/src/main.rs
>> +++ b/cli/admin/src/main.rs
>> @@ -19,7 +19,7 @@ fn main() {
>> proxmox_product_config::init(api_user, priv_user);
>>
>> proxmox_access_control::init::init(
>> - &pdm_api_types::AccessControlConfig,
>> + &pdm_config::AccessControlConfig,
>> pdm_buildcfg::configdir!("/access"),
>> )
>> .expect("failed to setup access control config");
>> diff --git a/lib/pdm-config/Cargo.toml b/lib/pdm-config/Cargo.toml
>> index d39c2ad..19781d2 100644
>> --- a/lib/pdm-config/Cargo.toml
>> +++ b/lib/pdm-config/Cargo.toml
>> @@ -13,6 +13,7 @@ once_cell.workspace = true
>> openssl.workspace = true
>> serde.workspace = true
>>
>> +proxmox-access-control.workspace = true
>> proxmox-config-digest = { workspace = true, features = [ "openssl" ] }
>> proxmox-http = { workspace = true, features = [ "http-helpers" ] }
>> proxmox-ldap = { workspace = true, features = [ "types" ]}
>> diff --git a/lib/pdm-config/src/access_control_config.rs b/lib/pdm-config/src/access_control_config.rs
>> new file mode 100644
>> index 0000000..6f2e6b3
>> --- /dev/null
>> +++ b/lib/pdm-config/src/access_control_config.rs
>> @@ -0,0 +1,73 @@
>> +// e.g. in src/main.rs or server::context mod, wherever convenient
>> +
>> +use anyhow::Error;
>> +use pdm_api_types::{Authid, Userid};
>> +use proxmox_section_config::SectionConfigData;
>> +use std::collections::HashMap;
>> +
>> +pub struct AccessControlConfig;
>> +
>> +impl proxmox_access_control::init::AccessControlConfig for AccessControlConfig {
>
> should we then remove the impl from the api type?
>
Thanks for pointing this out, Fabian! Currently, /ui/src/main.rs still
makes use of pdm_api_types::AccessControlConfig. This looks like a WASM
module and is based on ticket-based auth
(proxmox_login::Authentication) as far as I can see. Do you maybe know
whether it actually requires the token cache / can work with the CVC? If
it does not, then I think we should keep the API impl. I left this
unchanged and only touched the server and CLI call sites.
>> + fn privileges(&self) -> &HashMap<&str, u64> {
>> + pdm_api_types::AccessControlConfig.privileges()
>> + }
>> +
>> + fn roles(&self) -> &HashMap<&str, (u64, &str)> {
>> + pdm_api_types::AccessControlConfig.roles()
>> + }
>> +
>> + fn is_superuser(&self, auth_id: &Authid) -> bool {
>> + pdm_api_types::AccessControlConfig.is_superuser(auth_id)
>> + }
>> +
>> + fn is_group_member(&self, user_id: &Userid, group: &str) -> bool {
>> + pdm_api_types::AccessControlConfig.is_group_member(user_id, group)
>> + }
>> +
>> + fn role_admin(&self) -> Option<&str> {
>> + pdm_api_types::AccessControlConfig.role_admin()
>> + }
>> +
>> + fn role_no_access(&self) -> Option<&str> {
>> + pdm_api_types::AccessControlConfig.role_no_access()
>> + }
>> +
>> + fn init_user_config(&self, config: &mut SectionConfigData) -> Result<(), Error> {
>> + pdm_api_types::AccessControlConfig.init_user_config(config)
>> + }
>> +
>> + fn acl_audit_privileges(&self) -> u64 {
>> + pdm_api_types::AccessControlConfig.acl_audit_privileges()
>> + }
>> +
>> + fn acl_modify_privileges(&self) -> u64 {
>> + pdm_api_types::AccessControlConfig.acl_modify_privileges()
>> + }
>> +
>> + fn check_acl_path(&self, path: &str) -> Result<(), Error> {
>> + pdm_api_types::AccessControlConfig.check_acl_path(path)
>> + }
>> +
>> + fn allow_partial_permission_match(&self) -> bool {
>> + pdm_api_types::AccessControlConfig.allow_partial_permission_match()
>> + }
>> +
>> + fn cache_generation(&self) -> Option<usize> {
>> + pdm_api_types::AccessControlConfig.cache_generation()
>> + }
>
> shouldn't this be wired up to the ConfigVersionCache?
>
If I understand correctly, cache_generation() and the
increment_cache_generation() below do not appear to have been wired up
so far, meaning those caches were not enabled. To enable them,
a PDM AccessControlConfig implementation would probably be required
(as suggested in this patch) in order to be able to integrate with
ConfigVersionCache.
I think we should review these two functions and decide whether we want
to enable them or not, probably best as part of a dedicated scope? I can
create a bug report for this.
>> +
>> + fn increment_cache_generation(&self) -> Result<(), Error> {
>> + pdm_api_types::AccessControlConfig.increment_cache_generation()
>
> shouldn't this be wired up to the ConfigVersionCache?
>
>> + }
>> +
>> + fn token_shadow_cache_generation(&self) -> Option<usize> {
>> + crate::ConfigVersionCache::new()
>> + .ok()
>> + .map(|c| c.token_shadow_generation())
>> + }
>> +
>> + fn increment_token_shadow_cache_generation(&self) -> Result<usize, Error> {
>> + let c = crate::ConfigVersionCache::new()?;
>> + Ok(c.increase_token_shadow_generation())
>> + }
>> +}
>> diff --git a/lib/pdm-config/src/config_version_cache.rs b/lib/pdm-config/src/config_version_cache.rs
>> index 36a6a77..933140c 100644
>> --- a/lib/pdm-config/src/config_version_cache.rs
>> +++ b/lib/pdm-config/src/config_version_cache.rs
>> @@ -27,6 +27,8 @@ struct ConfigVersionCacheDataInner {
>> traffic_control_generation: AtomicUsize,
>> // Tracks updates to the remote/hostname/nodename mapping cache.
>> remote_mapping_cache: AtomicUsize,
>> + // Token shadow (token.shadow) generation/version.
>> + token_shadow_generation: AtomicUsize,
>
> explanation why this is safe for the commit message would be nice ;)
>
Will add :)
>> // Add further atomics here
>> }
>>
>> @@ -172,4 +174,20 @@ impl ConfigVersionCache {
>> .fetch_add(1, Ordering::Relaxed)
>> + 1
>> }
>> +
>> + /// Returns the token shadow generation number.
>> + pub fn token_shadow_generation(&self) -> usize {
>> + self.shmem
>> + .data()
>> + .token_shadow_generation
>> + .load(Ordering::Acquire)
>> + }
>> +
>> + /// Increase the token shadow generation number.
>> + pub fn increase_token_shadow_generation(&self) -> usize {
>> + self.shmem
>> + .data()
>> + .token_shadow_generation
>> + .fetch_add(1, Ordering::AcqRel)
>> + }
>> }
>> diff --git a/lib/pdm-config/src/lib.rs b/lib/pdm-config/src/lib.rs
>> index 4c49054..a15a006 100644
>> --- a/lib/pdm-config/src/lib.rs
>> +++ b/lib/pdm-config/src/lib.rs
>> @@ -9,6 +9,8 @@ pub mod remotes;
>> pub mod setup;
>> pub mod views;
>>
>> +mod access_control_config;
>> +pub use access_control_config::AccessControlConfig;
>> mod config_version_cache;
>> pub use config_version_cache::ConfigVersionCache;
>>
>> diff --git a/server/src/acl.rs b/server/src/acl.rs
>> index f421814..e6e007b 100644
>> --- a/server/src/acl.rs
>> +++ b/server/src/acl.rs
>> @@ -1,6 +1,5 @@
>> pub(crate) fn init() {
>> - static ACCESS_CONTROL_CONFIG: pdm_api_types::AccessControlConfig =
>> - pdm_api_types::AccessControlConfig;
>> + static ACCESS_CONTROL_CONFIG: pdm_config::AccessControlConfig = pdm_config::AccessControlConfig;
>>
>> proxmox_access_control::init::init(&ACCESS_CONTROL_CONFIG, pdm_buildcfg::configdir!("/access"))
>> .expect("failed to setup access control config");
>> --
>> 2.47.3
>>
>>
>>
>
>
^ permalink raw reply [relevance 6%]
* Re: [pbs-devel] [PATCH proxmox-datacenter-manager v3 1/2] pdm-config: implement token.shadow generation
2026-01-16 16:28 6% ` Samuel Rufinatscha
@ 2026-01-16 16:48 5% ` Shannon Sterz
2026-01-19 7:56 6% ` Samuel Rufinatscha
0 siblings, 1 reply; 39+ results
From: Shannon Sterz @ 2026-01-16 16:48 UTC (permalink / raw)
To: Samuel Rufinatscha; +Cc: Proxmox Backup Server development discussion
On Fri Jan 16, 2026 at 5:28 PM CET, Samuel Rufinatscha wrote:
> On 1/14/26 11:44 AM, Fabian Grünbichler wrote:
>> On January 2, 2026 5:07 pm, Samuel Rufinatscha wrote:
>>> PDM depends on the shared proxmox/proxmox-access-control crate for
>>> token.shadow handling, which expects the product to provide a
>>> cross-process invalidation signal so it can safely cache verified API
>>> token secrets and invalidate them when token.shadow is changed.
>>>
>>> This patch
>>>
>>> * adds a token_shadow_generation to PDM’s shared-memory
>>> ConfigVersionCache
>>> * implements proxmox_access_control::init::AccessControlConfig
>>> for pdm_config::AccessControlConfig, which
>>> - delegates roles/privs/path checks to the existing
>>> pdm_api_types::AccessControlConfig implementation
>>> - implements the shadow cache generation trait functions
>>> * switches the AccessControlConfig init paths (server + CLI) to use
>>> pdm_config::AccessControlConfig instead of
>>> pdm_api_types::AccessControlConfig
>>>
>>> This patch is part of the series which fixes bug #7017 [1].
>>>
>>> [1] https://bugzilla.proxmox.com/show_bug.cgi?id=7017
>>>
>>> Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
>>> ---
>>> cli/admin/src/main.rs | 2 +-
>>> lib/pdm-config/Cargo.toml | 1 +
>>> lib/pdm-config/src/access_control_config.rs | 73 +++++++++++++++++++++
>>> lib/pdm-config/src/config_version_cache.rs | 18 +++++
>>> lib/pdm-config/src/lib.rs | 2 +
>>> server/src/acl.rs | 3 +-
>>> 6 files changed, 96 insertions(+), 3 deletions(-)
>>> create mode 100644 lib/pdm-config/src/access_control_config.rs
>>>
>>> diff --git a/cli/admin/src/main.rs b/cli/admin/src/main.rs
>>> index f698fa2..916c633 100644
>>> --- a/cli/admin/src/main.rs
>>> +++ b/cli/admin/src/main.rs
>>> @@ -19,7 +19,7 @@ fn main() {
>>> proxmox_product_config::init(api_user, priv_user);
>>>
>>> proxmox_access_control::init::init(
>>> - &pdm_api_types::AccessControlConfig,
>>> + &pdm_config::AccessControlConfig,
>>> pdm_buildcfg::configdir!("/access"),
>>> )
>>> .expect("failed to setup access control config");
>>> diff --git a/lib/pdm-config/Cargo.toml b/lib/pdm-config/Cargo.toml
>>> index d39c2ad..19781d2 100644
>>> --- a/lib/pdm-config/Cargo.toml
>>> +++ b/lib/pdm-config/Cargo.toml
>>> @@ -13,6 +13,7 @@ once_cell.workspace = true
>>> openssl.workspace = true
>>> serde.workspace = true
>>>
>>> +proxmox-access-control.workspace = true
>>> proxmox-config-digest = { workspace = true, features = [ "openssl" ] }
>>> proxmox-http = { workspace = true, features = [ "http-helpers" ] }
>>> proxmox-ldap = { workspace = true, features = [ "types" ]}
>>> diff --git a/lib/pdm-config/src/access_control_config.rs b/lib/pdm-config/src/access_control_config.rs
>>> new file mode 100644
>>> index 0000000..6f2e6b3
>>> --- /dev/null
>>> +++ b/lib/pdm-config/src/access_control_config.rs
>>> @@ -0,0 +1,73 @@
>>> +// e.g. in src/main.rs or server::context mod, wherever convenient
>>> +
>>> +use anyhow::Error;
>>> +use pdm_api_types::{Authid, Userid};
>>> +use proxmox_section_config::SectionConfigData;
>>> +use std::collections::HashMap;
>>> +
>>> +pub struct AccessControlConfig;
>>> +
>>> +impl proxmox_access_control::init::AccessControlConfig for AccessControlConfig {
>>
>> should we then remove the impl from the api type?
>>
>
> Thanks for pointing this out Fabian! Currently, /ui/src/main.rs still
> makes use of pdm_api_types::AccessControlConfig. This looks like a WASM
> module, and is based on ticket based auth
> (proxmox_login::Authentication) as far as I can see. Do you maybe know
> if it actually requires the token cache / can work with CVC? If it does
> not, then I think we should keep the API impl. I left this unchanged
> and only touched server and CLI call sites.
i mostly exposed that there to get access to the privileges, roles, and
is_superuser functions. they are needed in the ui to selectively render
ui elements depending on a users privileges.
this should probably be factored out though and shared differently if we
want to extend this trait with more caching functions.
>>> + fn privileges(&self) -> &HashMap<&str, u64> {
>>> + pdm_api_types::AccessControlConfig.privileges()
>>> + }
>>> +
>>> + fn roles(&self) -> &HashMap<&str, (u64, &str)> {
>>> + pdm_api_types::AccessControlConfig.roles()
>>> + }
>>> +
>>> + fn is_superuser(&self, auth_id: &Authid) -> bool {
>>> + pdm_api_types::AccessControlConfig.is_superuser(auth_id)
>>> + }
>>> +
>>> + fn is_group_member(&self, user_id: &Userid, group: &str) -> bool {
>>> + pdm_api_types::AccessControlConfig.is_group_member(user_id, group)
>>> + }
>>> +
>>> + fn role_admin(&self) -> Option<&str> {
>>> + pdm_api_types::AccessControlConfig.role_admin()
>>> + }
>>> +
>>> + fn role_no_access(&self) -> Option<&str> {
>>> + pdm_api_types::AccessControlConfig.role_no_access()
>>> + }
>>> +
>>> + fn init_user_config(&self, config: &mut SectionConfigData) -> Result<(), Error> {
>>> + pdm_api_types::AccessControlConfig.init_user_config(config)
>>> + }
>>> +
>>> + fn acl_audit_privileges(&self) -> u64 {
>>> + pdm_api_types::AccessControlConfig.acl_audit_privileges()
>>> + }
>>> +
>>> + fn acl_modify_privileges(&self) -> u64 {
>>> + pdm_api_types::AccessControlConfig.acl_modify_privileges()
>>> + }
>>> +
>>> + fn check_acl_path(&self, path: &str) -> Result<(), Error> {
>>> + pdm_api_types::AccessControlConfig.check_acl_path(path)
>>> + }
>>> +
>>> + fn allow_partial_permission_match(&self) -> bool {
>>> + pdm_api_types::AccessControlConfig.allow_partial_permission_match()
>>> + }
>>> +
>>> + fn cache_generation(&self) -> Option<usize> {
>>> + pdm_api_types::AccessControlConfig.cache_generation()
>>> + }
>>
>> shouldn't this be wired up to the ConfigVersionCache?
>>
>
> If I understand correctly, cache_generation() and the
> increment_cache_generation() below do not appear to have been wired
> so far, meaning that caches were not enabled. To enable them,
> a PDM AccessControlConfig implementation would probably be required
> (as suggested in this patch) in order to be able integrate with
> ConfigVersionCache.
>
> I think these two functions should be checked, if we want to enabled
> them or not, probably best as part of a dedicated scope? I can create a
> bug report for this.
>
sure, i think it's not too much effort, though. if you split out the
caching parts, the ui should be fine without them. it really has no need
for them afair.
>>> +
>>> + fn increment_cache_generation(&self) -> Result<(), Error> {
>>> + pdm_api_types::AccessControlConfig.increment_cache_generation()
>>
>> shouldn't this be wired up to the ConfigVersionCache?
>>
>>> + }
>>> +
>>> + fn token_shadow_cache_generation(&self) -> Option<usize> {
>>> + crate::ConfigVersionCache::new()
>>> + .ok()
>>> + .map(|c| c.token_shadow_generation())
>>> + }
>>> +
>>> + fn increment_token_shadow_cache_generation(&self) -> Result<usize, Error> {
>>> + let c = crate::ConfigVersionCache::new()?;
>>> + Ok(c.increase_token_shadow_generation())
>>> + }
>>> +}
>>> diff --git a/lib/pdm-config/src/config_version_cache.rs b/lib/pdm-config/src/config_version_cache.rs
>>> index 36a6a77..933140c 100644
>>> --- a/lib/pdm-config/src/config_version_cache.rs
>>> +++ b/lib/pdm-config/src/config_version_cache.rs
>>> @@ -27,6 +27,8 @@ struct ConfigVersionCacheDataInner {
>>> traffic_control_generation: AtomicUsize,
>>> // Tracks updates to the remote/hostname/nodename mapping cache.
>>> remote_mapping_cache: AtomicUsize,
>>> + // Token shadow (token.shadow) generation/version.
>>> + token_shadow_generation: AtomicUsize,
>>
>> explanation why this is safe for the commit message would be nice ;)
>>
>
> Will add :)
>
>>> // Add further atomics here
>>> }
>>>
>>> @@ -172,4 +174,20 @@ impl ConfigVersionCache {
>>> .fetch_add(1, Ordering::Relaxed)
>>> + 1
>>> }
>>> +
>>> + /// Returns the token shadow generation number.
>>> + pub fn token_shadow_generation(&self) -> usize {
>>> + self.shmem
>>> + .data()
>>> + .token_shadow_generation
>>> + .load(Ordering::Acquire)
>>> + }
>>> +
>>> + /// Increase the token shadow generation number.
>>> + pub fn increase_token_shadow_generation(&self) -> usize {
>>> + self.shmem
>>> + .data()
>>> + .token_shadow_generation
>>> + .fetch_add(1, Ordering::AcqRel)
>>> + }
>>> }
>>> diff --git a/lib/pdm-config/src/lib.rs b/lib/pdm-config/src/lib.rs
>>> index 4c49054..a15a006 100644
>>> --- a/lib/pdm-config/src/lib.rs
>>> +++ b/lib/pdm-config/src/lib.rs
>>> @@ -9,6 +9,8 @@ pub mod remotes;
>>> pub mod setup;
>>> pub mod views;
>>>
>>> +mod access_control_config;
>>> +pub use access_control_config::AccessControlConfig;
>>> mod config_version_cache;
>>> pub use config_version_cache::ConfigVersionCache;
>>>
>>> diff --git a/server/src/acl.rs b/server/src/acl.rs
>>> index f421814..e6e007b 100644
>>> --- a/server/src/acl.rs
>>> +++ b/server/src/acl.rs
>>> @@ -1,6 +1,5 @@
>>> pub(crate) fn init() {
>>> - static ACCESS_CONTROL_CONFIG: pdm_api_types::AccessControlConfig =
>>> - pdm_api_types::AccessControlConfig;
>>> + static ACCESS_CONTROL_CONFIG: pdm_config::AccessControlConfig = pdm_config::AccessControlConfig;
>>>
>>> proxmox_access_control::init::init(&ACCESS_CONTROL_CONFIG, pdm_buildcfg::configdir!("/access"))
>>> .expect("failed to setup access control config");
>>> --
>>> 2.47.3
>>>
>>>
>>>
>>
>>
^ permalink raw reply [relevance 5%]
* Re: [pbs-devel] [PATCH proxmox-backup v3 2/4] pbs-config: cache verified API token secrets
2026-01-16 16:00 5% ` Fabian Grünbichler
@ 2026-01-16 16:56 6% ` Samuel Rufinatscha
0 siblings, 0 replies; 39+ results
From: Samuel Rufinatscha @ 2026-01-16 16:56 UTC (permalink / raw)
To: Fabian Grünbichler, Proxmox Backup Server development discussion
On 1/16/26 4:59 PM, Fabian Grünbichler wrote:
> Quoting Samuel Rufinatscha (2026-01-16 16:13:17)
>> On 1/14/26 11:44 AM, Fabian Grünbichler wrote:
>>> On January 2, 2026 5:07 pm, Samuel Rufinatscha wrote:
>>>> Currently, every token-based API request reads the token.shadow file and
>>>> runs the expensive password hash verification for the given token
>>>> secret. This shows up as a hotspot in /status profiling (see
>>>> bug #7017 [1]).
>>>>
>>>> This patch introduces an in-memory cache of successfully verified token
>>>> secrets. Subsequent requests for the same token+secret combination only
>>>> perform a comparison using openssl::memcmp::eq and avoid re-running the
>>>> password hash. The cache is updated when a token secret is set and
>>>> cleared when a token is deleted. Note, this does NOT include manual
>>>> config changes, which will be covered in a subsequent patch.
>>>>
>>>> This patch is part of the series which fixes bug #7017 [1].
>>>>
>>>> [1] https://bugzilla.proxmox.com/show_bug.cgi?id=7017
>>>>
>>>> Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
>>>> ---
>
> [..]
>
>>>> +
>>>> +// Tries to match the given token secret against the cached secret.
>>>> +// Checks the generation before and after the constant-time compare to avoid a
>>>> +// TOCTOU window. If another process rotates/deletes a token while we're validating
>>>> +// the cached secret, the generation will change, and we
>>>> +// must not trust the cache for this request.
>>>> +fn cache_try_secret_matches(tokenid: &Authid, secret: &str) -> bool {
>>>> + let Some(cache) = TOKEN_SECRET_CACHE.try_read() else {
>>>> + return false;
>>>> + };
>>>> + let Some(entry) = cache.secrets.get(tokenid) else {
>>>> + return false;
>>>> + };
>>>> +
>>>> + let cache_gen = cache.shared_gen;
>>>> +
>>>> + let Some(gen1) = token_shadow_shared_gen() else {
>>>> + return false;
>>>> + };
>>>> + if gen1 != cache_gen {
>>>> + return false;
>>>> + }
>>>> +
>>>> + let eq = openssl::memcmp::eq(entry.secret.as_bytes(), secret.as_bytes());
>>>
>>> should we invalidate the cache here for this particular authid in case
>>> of a mismatch, to avoid making brute forcing too easy/cheap?
>>>
>>
>> We are not doing a cheap reject, in mismatch we do still fall through to
>> verify_crypt_pw(). Evicting on mismatch could however enable cache
>> thrashing where wrong secrets for a known tokenid would evict cached
>> entries. So I think we should not invalidate here on mismatch.
>
> forgot this part here, sorry. you are right, this *should* be okay. I do think
> the second generation check there serves no purpose though. the token config
> can change at any point after we've validated the secret using the old state,
> there is nothing we can do about that, and it's totally fine to accept a token
> that is modified at exactly the same moment, even if that same token wouldn't
> be valid 2 seconds later..
>
> there has to be a point where we have to say "this token is valid", and at the
> point of memcmp here we have already:
> - verified we don't need to reload the file
> - verified we didn't have any API changes to the token config
> - verified that the secret matches what we have cached
>
> redoing the first two changes after that point doesn't protect us against
> changes afterwards either, so we might as well not do that extra work that
> doesn't give us any extra safety guarantees anyway..
Agreed, the second generation check only narrows a very small window
around the memcmp (I tried to avoid the TOCTOU at that point), but as
you said, it doesn’t provide a strong additional guarantee and is
unnecessary. Will remove!
>
>>
>>>> + let Some(gen2) = token_shadow_shared_gen() else {
>>>> + return false;
>>>> + };
>>>> +
>>>> + eq && gen2 == cache_gen
>>>> +}
>>>> +
>>>> +fn apply_api_mutation(tokenid: &Authid, new_secret: Option<&str>) {
>>>> + // Signal cache invalidation to other processes (best-effort).
>>>> + let new_shared_gen = bump_token_shadow_shared_gen();
>>>> +
>>>> + let mut cache = TOKEN_SECRET_CACHE.write();
>
> because I mentioned switching those two around - this actually requires more
> thought I think..
>
> right now, calling apply_api_mutation happens under a lock, but there are other
> calls that bump the generation, so this is actually racy here. OTOH, bumping
> the generation before locking the cache means faster cache invalidation..
Yes, I favored bumping the gen before taking the write lock for faster
cache invalidation and better security.
>
> maybe we should re-verify the generation after obtaining the lock? and maybe
> make apply_api_mutation consume the shadow config file lock, to ensure it's
> only called while that lock is being held?
Agreed, I think we should re-verify the generation after taking the
write lock. I also agree we should pass the file lock down. Good idea! :)
This should make it more robust.
>
>>>> +
>>>> + // If we cannot read/bump the shared generation, we cannot safely trust the cache.
>>>> + let Some(gen) = new_shared_gen else {
>>>> + invalidate_cache_state(&mut cache);
>>>> + cache.shared_gen = 0;
>>>> + return;
>>>> + };
>>>> +
>>>> + // Update to the post-mutation generation.
>>>> + cache.shared_gen = gen;
>>>> +
>>>> + // Apply the new mutation.
>>>> + match new_secret {
>>>> + Some(secret) => {
>>>> + cache.secrets.insert(
>>>> + tokenid.clone(),
>>>> + CachedSecret {
>>>> + secret: secret.to_owned(),
>>>> + },
>>>> + );
>>>> + }
>>>> + None => {
>>>> + cache.secrets.remove(tokenid);
>>>> + }
>>>> + }
>>>> +}
>>>> +
>>>> +/// Get the current shared generation.
>>>> +fn token_shadow_shared_gen() -> Option<usize> {
>>>> + crate::ConfigVersionCache::new()
>>>> + .ok()
>>>> + .map(|cvc| cvc.token_shadow_generation())
>>>> +}
>>>> +
>>>> +/// Bump and return the new shared generation.
>>>> +fn bump_token_shadow_shared_gen() -> Option<usize> {
>>>> + crate::ConfigVersionCache::new()
>>>> + .ok()
>>>> + .map(|cvc| cvc.increase_token_shadow_generation() + 1)
>>>> +}
>>>> +
>>>> +/// Invalidates the cache state and only keeps the shared generation.
>>>
>>> both calls to this actually set the cached generation to some value
>>> right after, so maybe this should take a generation directly and set it?
>>>
>>
>> patch 3/4 doesn’t always update the gen on cache invalidation
>> (shadow_mtime_len() error branch in apply_api_mutation) but most other
>> call sites do. Agreed this can be refactored, maybe:
>>
>> fn invalidate_cache_state(cache: &mut ApiTokenSecretCache) {
>> cache.secrets.clear();
>> // clear other cache fields (mtime/len/last_checked) as needed
>> }
>>
>> fn invalidate_cache_state_and_set_gen(cache: &mut ApiTokenSecretCache,
>> gen: usize) {
>> invalidate_cache_state(cache);
>> cache.shared_gen = gen;
>> }
>>
>> We could also do a single helper with Option<usize> but two helpers make
>> the call sites more explicit.
>>
>>>> +fn invalidate_cache_state(cache: &mut ApiTokenSecretCache) {
>>>> + cache.secrets.clear();
>>>> +}
>>>> --
>>>> 2.47.3
>>>>
>>>>
>>>>
>>>
>>>
>>
>>
^ permalink raw reply [relevance 6%]
* Re: [pbs-devel] [PATCH proxmox-datacenter-manager v3 1/2] pdm-config: implement token.shadow generation
2026-01-16 16:48 5% ` Shannon Sterz
@ 2026-01-19 7:56 6% ` Samuel Rufinatscha
0 siblings, 0 replies; 39+ results
From: Samuel Rufinatscha @ 2026-01-19 7:56 UTC (permalink / raw)
To: Shannon Sterz; +Cc: Proxmox Backup Server development discussion
comments inline
On 1/16/26 5:47 PM, Shannon Sterz wrote:
> On Fri Jan 16, 2026 at 5:28 PM CET, Samuel Rufinatscha wrote:
>> On 1/14/26 11:44 AM, Fabian Grünbichler wrote:
>>> On January 2, 2026 5:07 pm, Samuel Rufinatscha wrote:
>>>> PDM depends on the shared proxmox/proxmox-access-control crate for
>>>> token.shadow handling, which expects the product to provide a
>>>> cross-process invalidation signal so it can safely cache verified API
>>>> token secrets and invalidate them when token.shadow is changed.
>>>>
>>>> This patch
>>>>
>>>> * adds a token_shadow_generation to PDM’s shared-memory
>>>> ConfigVersionCache
>>>> * implements proxmox_access_control::init::AccessControlConfig
>>>> for pdm_config::AccessControlConfig, which
>>>> - delegates roles/privs/path checks to the existing
>>>> pdm_api_types::AccessControlConfig implementation
>>>> - implements the shadow cache generation trait functions
>>>> * switches the AccessControlConfig init paths (server + CLI) to use
>>>> pdm_config::AccessControlConfig instead of
>>>> pdm_api_types::AccessControlConfig
>>>>
>>>> This patch is part of the series which fixes bug #7017 [1].
>>>>
>>>> [1] https://bugzilla.proxmox.com/show_bug.cgi?id=7017
>>>>
>>>> Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
>>>> ---
>>>> cli/admin/src/main.rs | 2 +-
>>>> lib/pdm-config/Cargo.toml | 1 +
>>>> lib/pdm-config/src/access_control_config.rs | 73 +++++++++++++++++++++
>>>> lib/pdm-config/src/config_version_cache.rs | 18 +++++
>>>> lib/pdm-config/src/lib.rs | 2 +
>>>> server/src/acl.rs | 3 +-
>>>> 6 files changed, 96 insertions(+), 3 deletions(-)
>>>> create mode 100644 lib/pdm-config/src/access_control_config.rs
>>>>
>>>> diff --git a/cli/admin/src/main.rs b/cli/admin/src/main.rs
>>>> index f698fa2..916c633 100644
>>>> --- a/cli/admin/src/main.rs
>>>> +++ b/cli/admin/src/main.rs
>>>> @@ -19,7 +19,7 @@ fn main() {
>>>> proxmox_product_config::init(api_user, priv_user);
>>>>
>>>> proxmox_access_control::init::init(
>>>> - &pdm_api_types::AccessControlConfig,
>>>> + &pdm_config::AccessControlConfig,
>>>> pdm_buildcfg::configdir!("/access"),
>>>> )
>>>> .expect("failed to setup access control config");
>>>> diff --git a/lib/pdm-config/Cargo.toml b/lib/pdm-config/Cargo.toml
>>>> index d39c2ad..19781d2 100644
>>>> --- a/lib/pdm-config/Cargo.toml
>>>> +++ b/lib/pdm-config/Cargo.toml
>>>> @@ -13,6 +13,7 @@ once_cell.workspace = true
>>>> openssl.workspace = true
>>>> serde.workspace = true
>>>>
>>>> +proxmox-access-control.workspace = true
>>>> proxmox-config-digest = { workspace = true, features = [ "openssl" ] }
>>>> proxmox-http = { workspace = true, features = [ "http-helpers" ] }
>>>> proxmox-ldap = { workspace = true, features = [ "types" ]}
>>>> diff --git a/lib/pdm-config/src/access_control_config.rs b/lib/pdm-config/src/access_control_config.rs
>>>> new file mode 100644
>>>> index 0000000..6f2e6b3
>>>> --- /dev/null
>>>> +++ b/lib/pdm-config/src/access_control_config.rs
>>>> @@ -0,0 +1,73 @@
>>>> +// e.g. in src/main.rs or server::context mod, wherever convenient
>>>> +
>>>> +use anyhow::Error;
>>>> +use pdm_api_types::{Authid, Userid};
>>>> +use proxmox_section_config::SectionConfigData;
>>>> +use std::collections::HashMap;
>>>> +
>>>> +pub struct AccessControlConfig;
>>>> +
>>>> +impl proxmox_access_control::init::AccessControlConfig for AccessControlConfig {
>>>
>>> should we then remove the impl from the api type?
>>>
>>
>> Thanks for pointing this out, Fabian! Currently, /ui/src/main.rs still
>> makes use of pdm_api_types::AccessControlConfig. This looks like a WASM
>> module, and is based on ticket-based auth
>> (proxmox_login::Authentication) as far as I can see. Do you maybe know
>> whether it actually requires the token cache / can work with CVC? If it
>> does not, then I think we should keep the API impl. I left this
>> unchanged and only touched server and CLI call sites.
>
> i mostly exposed that there to get access to the privileges, roles, and
> is_superuser functions. they are needed in the ui to selectively render
> ui elements depending on a users privileges.
>
> this should probably be factored out though and shared differently if we
> want to extend this trait with more caching functions.
>
Good point.
>>>> + fn privileges(&self) -> &HashMap<&str, u64> {
>>>> + pdm_api_types::AccessControlConfig.privileges()
>>>> + }
>>>> +
>>>> + fn roles(&self) -> &HashMap<&str, (u64, &str)> {
>>>> + pdm_api_types::AccessControlConfig.roles()
>>>> + }
>>>> +
>>>> + fn is_superuser(&self, auth_id: &Authid) -> bool {
>>>> + pdm_api_types::AccessControlConfig.is_superuser(auth_id)
>>>> + }
>>>> +
>>>> + fn is_group_member(&self, user_id: &Userid, group: &str) -> bool {
>>>> + pdm_api_types::AccessControlConfig.is_group_member(user_id, group)
>>>> + }
>>>> +
>>>> + fn role_admin(&self) -> Option<&str> {
>>>> + pdm_api_types::AccessControlConfig.role_admin()
>>>> + }
>>>> +
>>>> + fn role_no_access(&self) -> Option<&str> {
>>>> + pdm_api_types::AccessControlConfig.role_no_access()
>>>> + }
>>>> +
>>>> + fn init_user_config(&self, config: &mut SectionConfigData) -> Result<(), Error> {
>>>> + pdm_api_types::AccessControlConfig.init_user_config(config)
>>>> + }
>>>> +
>>>> + fn acl_audit_privileges(&self) -> u64 {
>>>> + pdm_api_types::AccessControlConfig.acl_audit_privileges()
>>>> + }
>>>> +
>>>> + fn acl_modify_privileges(&self) -> u64 {
>>>> + pdm_api_types::AccessControlConfig.acl_modify_privileges()
>>>> + }
>>>> +
>>>> + fn check_acl_path(&self, path: &str) -> Result<(), Error> {
>>>> + pdm_api_types::AccessControlConfig.check_acl_path(path)
>>>> + }
>>>> +
>>>> + fn allow_partial_permission_match(&self) -> bool {
>>>> + pdm_api_types::AccessControlConfig.allow_partial_permission_match()
>>>> + }
>>>> +
>>>> + fn cache_generation(&self) -> Option<usize> {
>>>> + pdm_api_types::AccessControlConfig.cache_generation()
>>>> + }
>>>
>>> shouldn't this be wired up to the ConfigVersionCache?
>>>
>>
>> If I understand correctly, cache_generation() and
>> increment_cache_generation() below do not appear to have been wired
>> up so far, meaning that caches were not enabled. To enable them,
>> a PDM AccessControlConfig implementation would probably be required
>> (as suggested in this patch) in order to be able to integrate with
>> ConfigVersionCache.
>>
>> I think these two functions should be checked to decide whether we
>> want to enable them or not, probably best as part of a dedicated
>> scope? I can create a bug report for this.
>>
>
> sure, i think it's not too much effort, though. if you split out the
> caching parts, the ui should be fine without them. it really has no need
> for them afair.
If the UI doesn't make use of it, maybe it would simply be best to keep
two different impls? One to keep it minimal (also since not all parts
might be WASM-compatible), and one impl as proposed here to wire up CVC
(and maybe other things in the future).
I will also wire up CVC for the other two existing caching functions as
part of this series.
>
>>>> +
>>>> + fn increment_cache_generation(&self) -> Result<(), Error> {
>>>> + pdm_api_types::AccessControlConfig.increment_cache_generation()
>>>
>>> shouldn't this be wired up to the ConfigVersionCache?
>>>
>>>> + }
>>>> +
>>>> + fn token_shadow_cache_generation(&self) -> Option<usize> {
>>>> + crate::ConfigVersionCache::new()
>>>> + .ok()
>>>> + .map(|c| c.token_shadow_generation())
>>>> + }
>>>> +
>>>> + fn increment_token_shadow_cache_generation(&self) -> Result<usize, Error> {
>>>> + let c = crate::ConfigVersionCache::new()?;
>>>> + Ok(c.increase_token_shadow_generation())
>>>> + }
>>>> +}
>>>> diff --git a/lib/pdm-config/src/config_version_cache.rs b/lib/pdm-config/src/config_version_cache.rs
>>>> index 36a6a77..933140c 100644
>>>> --- a/lib/pdm-config/src/config_version_cache.rs
>>>> +++ b/lib/pdm-config/src/config_version_cache.rs
>>>> @@ -27,6 +27,8 @@ struct ConfigVersionCacheDataInner {
>>>> traffic_control_generation: AtomicUsize,
>>>> // Tracks updates to the remote/hostname/nodename mapping cache.
>>>> remote_mapping_cache: AtomicUsize,
>>>> + // Token shadow (token.shadow) generation/version.
>>>> + token_shadow_generation: AtomicUsize,
>>>
>>> explanation why this is safe for the commit message would be nice ;)
>>>
>>
>> Will add :)
>>
>>>> // Add further atomics here
>>>> }
>>>>
>>>> @@ -172,4 +174,20 @@ impl ConfigVersionCache {
>>>> .fetch_add(1, Ordering::Relaxed)
>>>> + 1
>>>> }
>>>> +
>>>> + /// Returns the token shadow generation number.
>>>> + pub fn token_shadow_generation(&self) -> usize {
>>>> + self.shmem
>>>> + .data()
>>>> + .token_shadow_generation
>>>> + .load(Ordering::Acquire)
>>>> + }
>>>> +
>>>> + /// Increase the token shadow generation number.
>>>> + pub fn increase_token_shadow_generation(&self) -> usize {
>>>> + self.shmem
>>>> + .data()
>>>> + .token_shadow_generation
>>>> + .fetch_add(1, Ordering::AcqRel)
>>>> + }
>>>> }
>>>> diff --git a/lib/pdm-config/src/lib.rs b/lib/pdm-config/src/lib.rs
>>>> index 4c49054..a15a006 100644
>>>> --- a/lib/pdm-config/src/lib.rs
>>>> +++ b/lib/pdm-config/src/lib.rs
>>>> @@ -9,6 +9,8 @@ pub mod remotes;
>>>> pub mod setup;
>>>> pub mod views;
>>>>
>>>> +mod access_control_config;
>>>> +pub use access_control_config::AccessControlConfig;
>>>> mod config_version_cache;
>>>> pub use config_version_cache::ConfigVersionCache;
>>>>
>>>> diff --git a/server/src/acl.rs b/server/src/acl.rs
>>>> index f421814..e6e007b 100644
>>>> --- a/server/src/acl.rs
>>>> +++ b/server/src/acl.rs
>>>> @@ -1,6 +1,5 @@
>>>> pub(crate) fn init() {
>>>> - static ACCESS_CONTROL_CONFIG: pdm_api_types::AccessControlConfig =
>>>> - pdm_api_types::AccessControlConfig;
>>>> + static ACCESS_CONTROL_CONFIG: pdm_config::AccessControlConfig = pdm_config::AccessControlConfig;
>>>>
>>>> proxmox_access_control::init::init(&ACCESS_CONTROL_CONFIG, pdm_buildcfg::configdir!("/access"))
>>>> .expect("failed to setup access control config");
>>>> --
>>>> 2.47.3
>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> pbs-devel mailing list
>>>> pbs-devel@lists.proxmox.com
>>>> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
>>>>
>>>
>>>
>
* Re: [pbs-devel] [PATCH proxmox-backup v3 3/4] pbs-config: invalidate token-secret cache on token.shadow changes
@ 2026-01-20 9:21 6% ` Samuel Rufinatscha
0 siblings, 0 replies; 39+ results
From: Samuel Rufinatscha @ 2026-01-20 9:21 UTC (permalink / raw)
To: Proxmox Backup Server development discussion, Fabian Grünbichler
On 1/14/26 11:44 AM, Fabian Grünbichler wrote:
> On January 2, 2026 5:07 pm, Samuel Rufinatscha wrote:
>> Previously the in-memory token-secret cache was only updated via
>> set_secret() and delete_secret(), so manual edits to token.shadow were
>> not reflected.
>>
>> This patch adds file change detection to the cache. It tracks the mtime
>> and length of token.shadow and clears the in-memory token secret cache
>> whenever these values change.
>>
>> Note, this patch fetches file stats on every request. A TTL-based
>> optimization will be covered in a subsequent patch of the series.
>>
>> This patch is part of the series which fixes bug #7017 [1].
>>
>> [1] https://bugzilla.proxmox.com/show_bug.cgi?id=7017
>>
>> Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
>> ---
>> Changes from v1 to v2:
>>
>> * Add file metadata tracking (file_mtime, file_len) and
>> FILE_GENERATION.
>> * Store file_gen in CachedSecret and verify it against the current
>> FILE_GENERATION to ensure cached entries belong to the current file
>> state.
>> * Add shadow_mtime_len() helper and convert refresh to best-effort
>> (try_write, returns bool).
>> * Pass a pre-write metadata snapshot into apply_api_mutation and
>> clear/bump generation if the cache metadata indicates missed external
>> edits.
>>
>> Changes from v2 to v3:
>>
>> * Cache now tracks last_checked (epoch seconds).
>> * Simplified refresh_cache_if_file_changed, removed
>> FILE_GENERATION logic
>> * On first load, initializes file metadata and keeps empty cache.
>>
>> pbs-config/src/token_shadow.rs | 122 +++++++++++++++++++++++++++++++--
>> 1 file changed, 118 insertions(+), 4 deletions(-)
>>
>> diff --git a/pbs-config/src/token_shadow.rs b/pbs-config/src/token_shadow.rs
>> index fa84aee5..02fb191b 100644
>> --- a/pbs-config/src/token_shadow.rs
>> +++ b/pbs-config/src/token_shadow.rs
>> @@ -1,5 +1,8 @@
>> use std::collections::HashMap;
>> +use std::fs;
>> +use std::io::ErrorKind;
>> use std::sync::LazyLock;
>> +use std::time::SystemTime;
>>
>> use anyhow::{bail, format_err, Error};
>> use parking_lot::RwLock;
>> @@ -7,6 +10,7 @@ use serde::{Deserialize, Serialize};
>> use serde_json::{from_value, Value};
>>
>> use proxmox_sys::fs::CreateOptions;
>> +use proxmox_time::epoch_i64;
>>
>> use pbs_api_types::Authid;
>> //use crate::auth;
>> @@ -24,6 +28,9 @@ static TOKEN_SECRET_CACHE: LazyLock<RwLock<ApiTokenSecretCache>> = LazyLock::new
>> RwLock::new(ApiTokenSecretCache {
>> secrets: HashMap::new(),
>> shared_gen: 0,
>> + file_mtime: None,
>> + file_len: None,
>> + last_checked: None,
>> })
>> });
>>
>> @@ -62,6 +69,63 @@ fn write_file(data: HashMap<Authid, String>) -> Result<(), Error> {
>> proxmox_sys::fs::replace_file(CONF_FILE, &json, options, true)
>> }
>>
>> +/// Refreshes the in-memory cache if the on-disk token.shadow file changed.
>> +/// Returns true if the cache is valid to use, false if not.
>> +fn refresh_cache_if_file_changed() -> bool {
>> + let now = epoch_i64();
>> +
>> + // Best-effort refresh under write lock.
>> + let Some(mut cache) = TOKEN_SECRET_CACHE.try_write() else {
>> + return false;
>> + };
>> +
>> + let Some(shared_gen_now) = token_shadow_shared_gen() else {
>> + return false;
>> + };
>> +
>> + // If another process bumped the generation, we don't know what changed -> clear cache
>> + if cache.shared_gen != shared_gen_now {
>> + invalidate_cache_state(&mut cache);
>> + cache.shared_gen = shared_gen_now;
>> + }
>> +
>> + // Stat the file to detect manual edits.
>> + let Ok((new_mtime, new_len)) = shadow_mtime_len() else {
>> + return false;
>> + };
>> +
>> + // Initialize file stats if we have no prior state.
>> + if cache.last_checked.is_none() {
>> + cache.secrets.clear(); // ensure cache is empty on first load
>> + cache.file_mtime = new_mtime;
>> + cache.file_len = new_len;
>> + cache.last_checked = Some(now);
>> + return true;
>
> this code here
>
>> + }
>> +
>> + // No change detected.
>> + if cache.file_mtime == new_mtime && cache.file_len == new_len {
>> + cache.last_checked = Some(now);
>> + return true;
>> + }
>> +
>> + // Manual edit detected -> invalidate cache and update stat.
>> + cache.secrets.clear();
>> + cache.file_mtime = new_mtime;
>> + cache.file_len = new_len;
>> + cache.last_checked = Some(now);
>
> and this code here are identical. if this is the first invocation, then
> the change detection check above cannot be true (the cached mtime and
> len will be None).
>
> so we can drop the first if above, and replace the last line in this
> hunk with
>
> let prev_last_checked = cache.last_checked.replace(Some(now));
>
> and then skip bumping the generation if this is_none()
Great idea about the .replace()! Integrating it with the new
ShadowFileInfo :)
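A minimal, dependency-free sketch of what such a grouping could look like, using the `Option::replace` pattern suggested above (the `ShadowFileInfo` name and fields here are illustrative only; the actual patch may differ):

```rust
use std::time::SystemTime;

/// Hypothetical grouping of the three fields that were always set together.
struct ShadowFileInfo {
    mtime: Option<SystemTime>,
    len: u64,
    last_checked: i64,
}

struct ApiTokenSecretCache {
    shared_gen: usize,
    shadow: Option<ShadowFileInfo>,
}

impl ApiTokenSecretCache {
    /// Store the new file info and return the previous one, so the caller
    /// can skip the generation bump when there was no prior state.
    fn update_shadow(&mut self, info: ShadowFileInfo) -> Option<ShadowFileInfo> {
        self.shadow.replace(info)
    }
}

fn main() {
    let mut cache = ApiTokenSecretCache { shared_gen: 0, shadow: None };
    let prev = cache.update_shadow(ShadowFileInfo {
        mtime: None,
        len: 0,
        last_checked: 100,
    });
    // First invocation: no prior state, so no generation bump is needed.
    assert!(prev.is_none());
    let prev = cache.update_shadow(ShadowFileInfo {
        mtime: None,
        len: 42,
        last_checked: 160,
    });
    // Subsequent invocations see the previous state.
    assert!(prev.is_some());
}
```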
>
> OTOH, if we just cleared the cache here, does it make sense to return
> true? the cache is empty, so likely querying it *now* makes no sense?
Agreed, we should just return false here.
>
>> +
>> + // Best-effort propagation to other processes + update local view.
>> + if let Some(shared_gen_new) = bump_token_shadow_shared_gen() {
>> + cache.shared_gen = shared_gen_new;
>> + } else {
>> + // Do not fail: local cache is already safe as we cleared it above.
>> + // Keep local shared_gen as-is to avoid repeated failed attempts.
>> + }
>> +
>> + true
>> +}
>> +
>> /// Verifies that an entry for given tokenid / API token secret exists
>> pub fn verify_secret(tokenid: &Authid, secret: &str) -> Result<(), Error> {
>> if !tokenid.is_token() {
>> @@ -69,7 +133,7 @@ pub fn verify_secret(tokenid: &Authid, secret: &str) -> Result<(), Error> {
>> }
>>
>> // Fast path
>> - if cache_try_secret_matches(tokenid, secret) {
>> + if refresh_cache_if_file_changed() && cache_try_secret_matches(tokenid, secret) {
>> return Ok(());
>> }
>>
>> @@ -109,12 +173,15 @@ fn set_secret(tokenid: &Authid, secret: &str) -> Result<(), Error> {
>>
>> let _guard = lock_config()?;
>>
>> + // Capture state before we write to detect external edits.
>> + let pre_meta = shadow_mtime_len().unwrap_or((None, None));
>> +
>> let mut data = read_file()?;
>> let hashed_secret = proxmox_sys::crypt::encrypt_pw(secret)?;
>> data.insert(tokenid.clone(), hashed_secret);
>> write_file(data)?;
>>
>> - apply_api_mutation(tokenid, Some(secret));
>> + apply_api_mutation(tokenid, Some(secret), pre_meta);
>>
>> Ok(())
>> }
>> @@ -127,11 +194,14 @@ pub fn delete_secret(tokenid: &Authid) -> Result<(), Error> {
>>
>> let _guard = lock_config()?;
>>
>> + // Capture state before we write to detect external edits.
>> + let pre_meta = shadow_mtime_len().unwrap_or((None, None));
>> +
>> let mut data = read_file()?;
>> data.remove(tokenid);
>> write_file(data)?;
>>
>> - apply_api_mutation(tokenid, None);
>> + apply_api_mutation(tokenid, None, pre_meta);
>>
>> Ok(())
>> }
>> @@ -145,6 +215,12 @@ struct ApiTokenSecretCache {
>> secrets: HashMap<Authid, CachedSecret>,
>> /// Shared generation to detect mutations of the underlying token.shadow file.
>> shared_gen: usize,
>> + // shadow file mtime to detect changes
>> + file_mtime: Option<SystemTime>,
>> + // shadow file length to detect changes
>> + file_len: Option<u64>,
>> + // last time the file metadata was checked
>> + last_checked: Option<i64>,
>
> these three are always set together, so wouldn't it make more sense to
> make them an Option<ShadowFileInfo> ?
>
>> }
>>
>> /// Cached secret.
>> @@ -204,7 +280,13 @@ fn cache_try_secret_matches(tokenid: &Authid, secret: &str) -> bool {
>> eq && gen2 == cache_gen
>> }
>>
>> -fn apply_api_mutation(tokenid: &Authid, new_secret: Option<&str>) {
>> +fn apply_api_mutation(
>> + tokenid: &Authid,
>> + new_secret: Option<&str>,
>> + pre_write_meta: (Option<SystemTime>, Option<u64>),
>> +) {
>> + let now = epoch_i64();
>> +
>> // Signal cache invalidation to other processes (best-effort).
>> let new_shared_gen = bump_token_shadow_shared_gen();
>>
>> @@ -220,6 +302,13 @@ fn apply_api_mutation(tokenid: &Authid, new_secret: Option<&str>) {
>> // Update to the post-mutation generation.
>> cache.shared_gen = gen;
>>
>> + // If our cached file metadata does not match the on-disk state before our write,
>> + // we likely missed an external/manual edit. We can no longer trust any cached secrets.
>> + let (pre_mtime, pre_len) = pre_write_meta;
>> + if cache.file_mtime != pre_mtime || cache.file_len != pre_len {
>> + cache.secrets.clear();
>> + }
>> +
>> // Apply the new mutation.
>> match new_secret {
>> Some(secret) => {
>> @@ -234,6 +323,20 @@ fn apply_api_mutation(tokenid: &Authid, new_secret: Option<&str>) {
>> cache.secrets.remove(tokenid);
>> }
>> }
>> +
>> + // Update our view of the file metadata to the post-write state (best-effort).
>> + // (If this fails, drop local cache so callers fall back to slow path until refreshed.)
>> + match shadow_mtime_len() {
>> + Ok((mtime, len)) => {
>> + cache.file_mtime = mtime;
>> + cache.file_len = len;
>> + cache.last_checked = Some(now);
>> + }
>> + Err(_) => {
>> + // If we cannot validate state, do not trust cache.
>> + invalidate_cache_state(&mut cache);
>> + }
>> + }
>> }
>>
>> /// Get the current shared generation.
>> @@ -253,4 +356,15 @@ fn bump_token_shadow_shared_gen() -> Option<usize> {
>> /// Invalidates the cache state and only keeps the shared generation.
>> fn invalidate_cache_state(cache: &mut ApiTokenSecretCache) {
>> cache.secrets.clear();
>> + cache.file_mtime = None;
>> + cache.file_len = None;
>> + cache.last_checked = None;
>> +}
>> +
>> +fn shadow_mtime_len() -> Result<(Option<SystemTime>, Option<u64>), Error> {
>> + match fs::metadata(CONF_FILE) {
>> + Ok(meta) => Ok((meta.modified().ok(), Some(meta.len()))),
>> + Err(e) if e.kind() == ErrorKind::NotFound => Ok((None, None)),
>> + Err(e) => Err(e.into()),
>> + }
>> }
>> --
>> 2.47.3
>>
>>
>>
>>
>>
>>
>
>
>
>
* [pbs-devel] [PATCH proxmox-backup v4 4/4] pbs-config: add TTL window to token secret cache
2026-01-21 15:13 14% [pbs-devel] [PATCH proxmox{-backup, , -datacenter-manager} v4 00/11] token-shadow: reduce api token verification overhead Samuel Rufinatscha
` (2 preceding siblings ...)
2026-01-21 15:13 12% ` [pbs-devel] [PATCH proxmox-backup v4 3/4] pbs-config: invalidate token-secret cache on token.shadow changes Samuel Rufinatscha
@ 2026-01-21 15:14 15% ` Samuel Rufinatscha
2026-01-21 15:14 13% ` [pbs-devel] [PATCH proxmox v4 1/4] proxmox-access-control: split AccessControlConfig and add token.shadow gen Samuel Rufinatscha
` (6 subsequent siblings)
10 siblings, 0 replies; 39+ results
From: Samuel Rufinatscha @ 2026-01-21 15:14 UTC (permalink / raw)
To: pbs-devel
verify_secret() currently calls refresh_cache_if_file_changed() on every
request, which performs a metadata() call on token.shadow each time.
Under load this adds unnecessary overhead, especially since the file
should rarely change.
This patch introduces a TTL boundary, controlled by
TOKEN_SECRET_CACHE_TTL_SECS: file metadata is only re-read once the
TTL has expired. It also documents the TTL effects.
Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
---
Changes from v3 to 4:
* Adjusted commit message
Changes from v2 to v3:
* Refactored refresh_cache_if_file_changed TTL logic.
* Remove had_prior_state check (replaced by last_checked logic).
* Improve TTL bound checks.
* Reword documentation warning for clarity.
Changes from v1 to v2:
* Add TOKEN_SECRET_CACHE_TTL_SECS and last_checked.
* Implement double-checked TTL: check with try_read first; only attempt
refresh with try_write if expired/unknown.
* Fix TTL bookkeeping: update last_checked on the “file unchanged” path
and after API mutations.
* Add documentation warning about TTL-delayed effect of manual
token.shadow edits.
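As an aside, the TTL gate itself reduces to a small pure check that can be illustrated in isolation (names here are illustrative; the patch inlines this logic in refresh_cache_if_file_changed):

```rust
/// Max age in seconds before the on-disk file metadata is re-checked.
const TTL_SECS: i64 = 60;

/// Returns true if the metadata re-check can be skipped. The
/// `now >= last_checked` half guards against the clock jumping
/// backwards, in which case we force a re-check.
fn ttl_fresh(now: i64, last_checked: i64) -> bool {
    now >= last_checked && (now - last_checked) < TTL_SECS
}

fn main() {
    assert!(ttl_fresh(100, 90)); // 10s old -> fresh, skip stat()
    assert!(!ttl_fresh(200, 100)); // 100s old -> expired, re-check
    assert!(!ttl_fresh(50, 100)); // clock went backwards -> re-check
}
```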
docs/user-management.rst | 4 ++++
pbs-config/src/token_shadow.rs | 29 ++++++++++++++++++++++++++++-
2 files changed, 32 insertions(+), 1 deletion(-)
diff --git a/docs/user-management.rst b/docs/user-management.rst
index 41b43d60..8dfae528 100644
--- a/docs/user-management.rst
+++ b/docs/user-management.rst
@@ -156,6 +156,10 @@ metadata:
Similarly, the ``user delete-token`` subcommand can be used to delete a token
again.
+.. WARNING:: Direct/manual edits to ``token.shadow`` may take up to 60 seconds (or
+ longer in edge cases) to take effect due to caching. Restart services for
+ immediate effect of manual edits.
+
Newly generated API tokens don't have any permissions. Please read the next
section to learn how to set access permissions.
diff --git a/pbs-config/src/token_shadow.rs b/pbs-config/src/token_shadow.rs
index a5bd1525..24633f6e 100644
--- a/pbs-config/src/token_shadow.rs
+++ b/pbs-config/src/token_shadow.rs
@@ -31,6 +31,8 @@ static TOKEN_SECRET_CACHE: LazyLock<RwLock<ApiTokenSecretCache>> = LazyLock::new
shadow: None,
})
});
+/// Max age in seconds of the token secret cache before checking for file changes.
+const TOKEN_SECRET_CACHE_TTL_SECS: i64 = 60;
#[derive(Serialize, Deserialize)]
#[serde(rename_all = "kebab-case")]
@@ -72,11 +74,29 @@ fn write_file(data: HashMap<Authid, String>) -> Result<(), Error> {
fn refresh_cache_if_file_changed() -> bool {
let now = epoch_i64();
- // Best-effort refresh under write lock.
+ // Fast path: cache is fresh if shared-gen matches and TTL not expired.
+ if let (Some(cache), Some(shared_gen_read)) =
+ (TOKEN_SECRET_CACHE.try_read(), token_shadow_shared_gen())
+ {
+ if cache.shared_gen == shared_gen_read
+ && cache.shadow.as_ref().is_some_and(|cached| {
+ now >= cached.last_checked
+ && (now - cached.last_checked) < TOKEN_SECRET_CACHE_TTL_SECS
+ })
+ {
+ return true;
+ }
+ // read lock drops here
+ } else {
+ return false;
+ }
+
+ // Slow path: best-effort refresh under write lock.
let Some(mut cache) = TOKEN_SECRET_CACHE.try_write() else {
return false;
};
+ // Re-read generation after acquiring the lock (may have changed meanwhile).
let Some(shared_gen_now) = token_shadow_shared_gen() else {
return false;
};
@@ -86,6 +106,13 @@ fn refresh_cache_if_file_changed() -> bool {
invalidate_cache_state_and_set_gen(&mut cache, shared_gen_now);
}
+ // TTL check again after acquiring the lock
+ if cache.shadow.as_ref().is_some_and(|cached| {
+ now >= cached.last_checked && (now - cached.last_checked) < TOKEN_SECRET_CACHE_TTL_SECS
+ }) {
+ return true;
+ }
+
// Stat the file to detect manual edits.
let Ok((new_mtime, new_len)) = shadow_mtime_len() else {
return false;
--
2.47.3
* [pbs-devel] [PATCH proxmox{-backup, , -datacenter-manager} v4 00/11] token-shadow: reduce api token verification overhead
@ 2026-01-21 15:13 14% Samuel Rufinatscha
2026-01-21 15:13 17% ` [pbs-devel] [PATCH proxmox-backup v4 1/4] pbs-config: add token.shadow generation to ConfigVersionCache Samuel Rufinatscha
` (10 more replies)
0 siblings, 11 replies; 39+ results
From: Samuel Rufinatscha @ 2026-01-21 15:13 UTC (permalink / raw)
To: pbs-devel
Hi,
this series improves the performance of token-based API authentication
in PBS (pbs-config) and in PDM (underlying proxmox-access-control
crate), addressing the API token verification hotspot reported in our
bugtracker #7017 [1].
When profiling PBS /status endpoint with cargo flamegraph [2],
token-based authentication showed up as a dominant hotspot via
proxmox_sys::crypt::verify_crypt_pw. Applying this series removes that
path from the hot section of the flamegraph. The same performance issue
was measured [2] for PDM. PDM uses the underlying shared
proxmox-access-control library for token handling, which is a
factored out version of the token.shadow handling code from PBS.
While this series fixes the immediate performance issue both in PBS
(pbs-config) and in the shared proxmox-access-control crate used by
PDM, PBS should ideally be refactored eventually, in a separate effort,
to use proxmox-access-control for token handling instead of its local
implementation.
Approach
The goal is to reduce the cost of token-based authentication while
preserving the existing token handling semantics (including detecting
manual edits to token.shadow) and staying consistent between PBS
(pbs-config) and PDM (proxmox-access-control). For both sides, this
series proposes to:
1. Introduce an in-memory cache for verified token secrets and
invalidate it through a shared ConfigVersionCache generation. Note, a
shared generation is required to keep the privileged and unprivileged
daemons in sync and avoid caching inconsistencies across processes.
2. Invalidate on token.shadow API changes (set_secret,
delete_secret)
3. Invalidate on direct/manual token.shadow file changes (mtime +
length)
4. Avoid per-request file stat calls using a TTL window
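The cross-process invalidation in step 1 can be sketched roughly as follows. SHARED_GEN here is a hypothetical stand-in for the shared-memory counter that ConfigVersionCache provides; the real types and names in the series differ:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Stand-in for the ConfigVersionCache generation shared across processes.
static SHARED_GEN: AtomicUsize = AtomicUsize::new(0);

struct LocalCache {
    shared_gen: usize,
    // (token id, verified secret hash) pairs; simplified for the sketch.
    secrets: Vec<(String, String)>,
}

impl LocalCache {
    /// Returns true if the local view is still valid; clears it otherwise.
    fn validate(&mut self) -> bool {
        let now = SHARED_GEN.load(Ordering::Acquire);
        if self.shared_gen != now {
            // Another process mutated token.shadow -> drop everything.
            self.secrets.clear();
            self.shared_gen = now;
            return false;
        }
        true
    }
}

fn main() {
    let mut cache = LocalCache {
        shared_gen: SHARED_GEN.load(Ordering::Acquire),
        secrets: vec![("user@pbs!token".into(), "hash".into())],
    };
    assert!(cache.validate()); // generation unchanged -> cache usable
    SHARED_GEN.fetch_add(1, Ordering::AcqRel); // simulate another process
    assert!(!cache.validate()); // stale generation -> cache dropped
    assert!(cache.secrets.is_empty());
}
```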
Testing
To verify the effect in PBS (pbs-config changes), I:
1. Set up test environment based on latest PBS ISO, installed Rust
toolchain, cloned proxmox-backup repository to use with cargo
flamegraph. Reproduced bug #7017 [1] by profiling the /status
endpoint with token-based authentication using cargo flamegraph [2].
2. Built PBS with pbs-config patches and re-ran the same workload and
profiling setup. Confirmed that
proxmox_sys::crypt::verify_crypt_pw path no longer appears in the
hot section of the flamegraph. CPU usage is now dominated by TLS
overhead.
3. Functionality-wise, I verified that:
* valid tokens authenticate correctly when used in API requests
* invalid secrets are rejected as before
* generating a new token secret via dashboard (create token for
user, regenerate existing secret) works and authenticates correctly
To verify the effect in PDM (proxmox-access-control changes), instead
of PBS’ /status, I profiled the /version endpoint with cargo flamegraph
[2] and verified that the expensive hashing path disappears from the
hot section after introducing caching. Functionality-wise, I verified
that:
* valid tokens authenticate correctly when used in API requests
* invalid secrets are rejected as before
* generating a new token secret via dashboard (create token for user,
regenerate existing secret) works and authenticates correctly
Benchmarks
Two different benchmarks have been run to measure caching effects
and RwLock contention:
(1) Requests per second for PBS /status endpoint (E2E)
Benchmarked parallel token auth requests for
/status?verbose=0 on top of the datastore lookup cache series [3]
to check throughput impact. With datastores=1, repeat=5000, parallel=16
this series gives ~172 req/s compared to ~65 req/s without it.
This is a ~2.6x improvement (and aligns with the ~179 req/s from the
previous series, which used per-process cache invalidation).
(2) RwLock contention for token create/delete under heavy load of
token-authenticated requests
The previous version of the series compared std::sync::RwLock and
parking_lot::RwLock contention for token create/delete under heavy
parallel token-authenticated readers. parking_lot::RwLock has been
chosen for the added fairness guarantees.
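The best-effort locking used on the hot path can be sketched as follows: if the lock is contended, the fast path simply reports a miss and the caller falls back to the full hash verification instead of blocking. The series uses parking_lot::RwLock for its fairness guarantees; std::sync::RwLock is used here only to keep the example dependency-free:

```rust
use std::sync::RwLock;

// Simplified stand-in for the token-secret cache (real code maps
// Authid -> cached secret; a Vec suffices for the sketch).
static CACHE: RwLock<Vec<u32>> = RwLock::new(Vec::new());

/// Best-effort fast path: never blocks. On contention we report a miss,
/// so the caller falls back to the slow (hashing) path, which is always
/// correct on its own.
fn cache_lookup_fast_path(needle: u32) -> bool {
    match CACHE.try_read() {
        Ok(guard) => guard.contains(&needle),
        Err(_) => false, // lock contended -> treat as cache miss
    }
}

fn main() {
    CACHE.try_write().unwrap().push(7);
    assert!(cache_lookup_fast_path(7));
    assert!(!cache_lookup_fast_path(8));
}
```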
Patch summary
pbs-config:
0001 – pbs-config: add token.shadow generation to ConfigVersionCache
0002 – pbs-config: cache verified API token secrets
0003 – pbs-config: invalidate token-secret cache on token.shadow
changes
0004 – pbs-config: add TTL window to token-secret cache
proxmox-access-control:
0005 – access-control: extend AccessControlConfig for token.shadow invalidation
0006 – access-control: cache verified API token secrets
0007 – access-control: invalidate token-secret cache on token.shadow changes
0008 – access-control: add TTL window to token-secret cache
proxmox-datacenter-manager:
0009 – pdm-config: add token.shadow generation to ConfigVersionCache
0010 – docs: document API token-cache TTL effects
0011 – pdm-config: wire user+acl cache generation
Maintainer notes
* proxmox-access-control trait split: permissions now live in
AccessControlPermissions, and AccessControlConfig now requires
fn permissions(&self) -> &dyn AccessControlPermissions ->
version bump
* Renames ConfigVersionCache's pub user_cache_generation and
increase_user_cache_generation -> version bump
* Adds parking_lot::RwLock dependency in PBS and proxmox-access-control
Kind regards,
Samuel Rufinatscha
[1] https://bugzilla.proxmox.com/show_bug.cgi?id=7017
[2] attachment 1767 [1]: Flamegraph showing the proxmox_sys::crypt::verify_crypt_pw stack
[3] https://bugzilla.proxmox.com/show_bug.cgi?id=6049
proxmox-backup:
Samuel Rufinatscha (4):
pbs-config: add token.shadow generation to ConfigVersionCache
pbs-config: cache verified API token secrets
pbs-config: invalidate token-secret cache on token.shadow changes
pbs-config: add TTL window to token secret cache
Cargo.toml | 1 +
docs/user-management.rst | 4 +
pbs-config/Cargo.toml | 1 +
pbs-config/src/config_version_cache.rs | 18 ++
pbs-config/src/token_shadow.rs | 302 ++++++++++++++++++++++++-
5 files changed, 323 insertions(+), 3 deletions(-)
proxmox:
Samuel Rufinatscha (4):
proxmox-access-control: split AccessControlConfig and add token.shadow
gen
proxmox-access-control: cache verified API token secrets
proxmox-access-control: invalidate token-secret cache on token.shadow
changes
proxmox-access-control: add TTL window to token secret cache
Cargo.toml | 1 +
proxmox-access-control/Cargo.toml | 1 +
proxmox-access-control/src/acl.rs | 10 +-
proxmox-access-control/src/init.rs | 113 ++++++--
proxmox-access-control/src/token_shadow.rs | 303 ++++++++++++++++++++-
5 files changed, 401 insertions(+), 27 deletions(-)
proxmox-datacenter-manager:
Samuel Rufinatscha (3):
pdm-config: implement token.shadow generation
docs: document API token-cache TTL effects
pdm-config: wire user+acl cache generation
cli/admin/src/main.rs | 2 +-
docs/access-control.rst | 4 +++
lib/pdm-api-types/src/acl.rs | 4 +--
lib/pdm-config/Cargo.toml | 1 +
lib/pdm-config/src/access_control.rs | 31 ++++++++++++++++++++
lib/pdm-config/src/config_version_cache.rs | 34 +++++++++++++++++-----
lib/pdm-config/src/lib.rs | 2 ++
server/src/acl.rs | 3 +-
ui/src/main.rs | 10 ++++++-
9 files changed, 77 insertions(+), 14 deletions(-)
create mode 100644 lib/pdm-config/src/access_control.rs
Summary over all repositories:
19 files changed, 801 insertions(+), 44 deletions(-)
--
Generated by git-murpp 0.8.1
* [pbs-devel] [PATCH proxmox-datacenter-manager v4 2/3] docs: document API token-cache TTL effects
2026-01-21 15:13 14% [pbs-devel] [PATCH proxmox{-backup, , -datacenter-manager} v4 00/11] token-shadow: reduce api token verification overhead Samuel Rufinatscha
` (8 preceding siblings ...)
2026-01-21 15:14 13% ` [pbs-devel] [PATCH proxmox-datacenter-manager v4 1/3] pdm-config: implement token.shadow generation Samuel Rufinatscha
@ 2026-01-21 15:14 17% ` Samuel Rufinatscha
2026-01-21 15:14 16% ` [pbs-devel] [PATCH proxmox-datacenter-manager v4 3/3] pdm-config: wire user+acl cache generation Samuel Rufinatscha
10 siblings, 0 replies; 39+ results
From: Samuel Rufinatscha @ 2026-01-21 15:14 UTC (permalink / raw)
To: pbs-devel
Documents the effects of the added API token-cache in the
proxmox-access-control crate.
Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
---
Changes from v3 to 4:
* Adjusted commit message
docs/access-control.rst | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/docs/access-control.rst b/docs/access-control.rst
index adf26cd..18e57a2 100644
--- a/docs/access-control.rst
+++ b/docs/access-control.rst
@@ -47,6 +47,10 @@ place of the user ID (``user@realm``) and the user password, respectively.
The API token is passed from the client to the server by setting the ``Authorization`` HTTP header
with method ``PDMAPIToken`` to the value ``TOKENID:TOKENSECRET``.
+.. WARNING:: Direct/manual edits to ``token.shadow`` may take up to 60 seconds (or
+ longer in edge cases) to take effect due to caching. Restart services for
+ immediate effect of manual edits.
+
.. _access_control:
Access Control
--
2.47.3
* [pbs-devel] [PATCH proxmox-backup v4 2/4] pbs-config: cache verified API token secrets
2026-01-21 15:13 14% [pbs-devel] [PATCH proxmox{-backup, , -datacenter-manager} v4 00/11] token-shadow: reduce api token verification overhead Samuel Rufinatscha
2026-01-21 15:13 17% ` [pbs-devel] [PATCH proxmox-backup v4 1/4] pbs-config: add token.shadow generation to ConfigVersionCache Samuel Rufinatscha
@ 2026-01-21 15:13 12% ` Samuel Rufinatscha
2026-01-21 15:13 12% ` [pbs-devel] [PATCH proxmox-backup v4 3/4] pbs-config: invalidate token-secret cache on token.shadow changes Samuel Rufinatscha
` (8 subsequent siblings)
10 siblings, 0 replies; 39+ results
From: Samuel Rufinatscha @ 2026-01-21 15:13 UTC (permalink / raw)
To: pbs-devel
Adds an in-memory cache of successfully verified token secrets.
Subsequent requests for the same token+secret combination only perform a
comparison using openssl::memcmp::eq and avoid re-running the password
hash. The cache is updated when a token secret is set and cleared when a
token is deleted.
Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
---
Changes from v3 to v4:
* Add gen param to invalidate_cache_state()
* Validates the generation bump after obtaining write lock in
apply_api_mutation
* Pass lock to apply_api_mutation
* Remove unnecessary gen check cache_try_secret_matches
* Adjusted commit message
Changes from v2 to v3:
* Replaced process-local cache invalidation (AtomicU64
API_MUTATION_GENERATION) with a cross-process shared generation via
ConfigVersionCache.
* Validate shared generation before/after the constant-time secret
compare; only insert into cache if the generation is unchanged.
* invalidate_cache_state() on insert if shared generation changed.
Changes from v1 to v2:
* Replace OnceCell with LazyLock, and std::sync::RwLock with
parking_lot::RwLock.
* Add API_MUTATION_GENERATION and guard cache inserts
to prevent “zombie inserts” across concurrent set/delete.
* Refactor cache operations into cache_try_secret_matches,
cache_try_insert_secret, and centralize write-side behavior in
apply_api_mutation.
* Switch fast-path cache access to try_read/try_write (best-effort).
Cargo.toml | 1 +
pbs-config/Cargo.toml | 1 +
pbs-config/src/token_shadow.rs | 160 ++++++++++++++++++++++++++++++++-
3 files changed, 159 insertions(+), 3 deletions(-)
diff --git a/Cargo.toml b/Cargo.toml
index 0da18383..aed66fe3 100644
--- a/Cargo.toml
+++ b/Cargo.toml
@@ -143,6 +143,7 @@ nom = "7"
num-traits = "0.2"
once_cell = "1.3.1"
openssl = "0.10.40"
+parking_lot = "0.12"
percent-encoding = "2.1"
pin-project-lite = "0.2"
regex = "1.5.5"
diff --git a/pbs-config/Cargo.toml b/pbs-config/Cargo.toml
index 74afb3c6..eb81ce00 100644
--- a/pbs-config/Cargo.toml
+++ b/pbs-config/Cargo.toml
@@ -13,6 +13,7 @@ libc.workspace = true
nix.workspace = true
once_cell.workspace = true
openssl.workspace = true
+parking_lot.workspace = true
regex.workspace = true
serde.workspace = true
serde_json.workspace = true
diff --git a/pbs-config/src/token_shadow.rs b/pbs-config/src/token_shadow.rs
index 640fabbf..d5aa5de2 100644
--- a/pbs-config/src/token_shadow.rs
+++ b/pbs-config/src/token_shadow.rs
@@ -1,6 +1,8 @@
use std::collections::HashMap;
+use std::sync::LazyLock;
use anyhow::{bail, format_err, Error};
+use parking_lot::RwLock;
use serde::{Deserialize, Serialize};
use serde_json::{from_value, Value};
@@ -13,6 +15,18 @@ use crate::{open_backup_lockfile, BackupLockGuard};
const LOCK_FILE: &str = pbs_buildcfg::configdir!("/token.shadow.lock");
const CONF_FILE: &str = pbs_buildcfg::configdir!("/token.shadow");
+/// Global in-memory cache for successfully verified API token secrets.
+/// The cache stores plain text secrets for token Authids that have already been
+/// verified against the hashed values in `token.shadow`. This allows for cheap
+/// subsequent authentications for the same token+secret combination, avoiding
+/// recomputing the password hash on every request.
+static TOKEN_SECRET_CACHE: LazyLock<RwLock<ApiTokenSecretCache>> = LazyLock::new(|| {
+ RwLock::new(ApiTokenSecretCache {
+ secrets: HashMap::new(),
+ shared_gen: 0,
+ })
+});
+
#[derive(Serialize, Deserialize)]
#[serde(rename_all = "kebab-case")]
/// ApiToken id / secret pair
@@ -54,9 +68,27 @@ pub fn verify_secret(tokenid: &Authid, secret: &str) -> Result<(), Error> {
bail!("not an API token ID");
}
+ // Fast path
+ if cache_try_secret_matches(tokenid, secret) {
+ return Ok(());
+ }
+
+ // Slow path
+ // First, capture the shared generation before doing the hash verification.
+ let gen_before = token_shadow_shared_gen();
+
let data = read_file()?;
match data.get(tokenid) {
- Some(hashed_secret) => proxmox_sys::crypt::verify_crypt_pw(secret, hashed_secret),
+ Some(hashed_secret) => {
+ proxmox_sys::crypt::verify_crypt_pw(secret, hashed_secret)?;
+
+ // Try to cache only if nothing changed while verifying the secret.
+ if let Some(gen) = gen_before {
+ cache_try_insert_secret(tokenid.clone(), secret.to_owned(), gen);
+ }
+
+ Ok(())
+ }
None => bail!("invalid API token"),
}
}
@@ -75,13 +107,15 @@ fn set_secret(tokenid: &Authid, secret: &str) -> Result<(), Error> {
bail!("not an API token ID");
}
- let _guard = lock_config()?;
+ let guard = lock_config()?;
let mut data = read_file()?;
let hashed_secret = proxmox_sys::crypt::encrypt_pw(secret)?;
data.insert(tokenid.clone(), hashed_secret);
write_file(data)?;
+ apply_api_mutation(guard, tokenid, Some(secret));
+
Ok(())
}
@@ -91,11 +125,131 @@ pub fn delete_secret(tokenid: &Authid) -> Result<(), Error> {
bail!("not an API token ID");
}
- let _guard = lock_config()?;
+ let guard = lock_config()?;
let mut data = read_file()?;
data.remove(tokenid);
write_file(data)?;
+ apply_api_mutation(guard, tokenid, None);
+
Ok(())
}
+
+struct ApiTokenSecretCache {
+ /// Keys are token Authids, values are the corresponding plain text secrets.
+ /// Entries are added after a successful on-disk verification in
+ /// `verify_secret` or when a new token secret is generated by
+ /// `generate_and_set_secret`. Used to avoid repeated
+ /// password-hash computation on subsequent authentications.
+ secrets: HashMap<Authid, CachedSecret>,
+ /// Shared generation to detect mutations of the underlying token.shadow file.
+ shared_gen: usize,
+}
+
+/// Cached secret.
+struct CachedSecret {
+ secret: String,
+}
+
+fn cache_try_insert_secret(tokenid: Authid, secret: String, shared_gen_before: usize) {
+ let Some(mut cache) = TOKEN_SECRET_CACHE.try_write() else {
+ return;
+ };
+
+ let Some(shared_gen_now) = token_shadow_shared_gen() else {
+ return;
+ };
+
+ // If this process missed a generation bump, its cache is stale.
+ if cache.shared_gen != shared_gen_now {
+ invalidate_cache_state_and_set_gen(&mut cache, shared_gen_now);
+ }
+
+ // If a mutation happened while we were verifying the secret, do not insert.
+ if shared_gen_now == shared_gen_before {
+ cache.secrets.insert(tokenid, CachedSecret { secret });
+ }
+}
+
+/// Tries to match the given token secret against the cached secret.
+///
+/// Verifies the generation/version before doing the constant-time
+/// comparison to reduce TOCTOU risk. During token rotation or deletion
+/// tokens for in-flight requests may still validate against the previous
+/// generation.
+fn cache_try_secret_matches(tokenid: &Authid, secret: &str) -> bool {
+ let Some(cache) = TOKEN_SECRET_CACHE.try_read() else {
+ return false;
+ };
+ let Some(entry) = cache.secrets.get(tokenid) else {
+ return false;
+ };
+ let Some(current_gen) = token_shadow_shared_gen() else {
+ return false;
+ };
+
+ if current_gen == cache.shared_gen {
+ return openssl::memcmp::eq(entry.secret.as_bytes(), secret.as_bytes());
+ }
+
+ false
+}
+
+fn apply_api_mutation(_guard: BackupLockGuard, tokenid: &Authid, new_secret: Option<&str>) {
+ // Signal cache invalidation to other processes (best-effort).
+ let bumped_gen = bump_token_shadow_shared_gen();
+
+ let mut cache = TOKEN_SECRET_CACHE.write();
+
+ // If we cannot get the current generation, we cannot trust the cache
+ let Some(current_gen) = token_shadow_shared_gen() else {
+ invalidate_cache_state_and_set_gen(&mut cache, 0);
+ return;
+ };
+
+ // If we cannot bump the shared generation, or if it changed after
+ // obtaining the cache write lock, we cannot trust the cache
+ if bumped_gen != Some(current_gen) {
+ invalidate_cache_state_and_set_gen(&mut cache, current_gen);
+ return;
+ }
+
+ // Update to the post-mutation generation.
+ cache.shared_gen = current_gen;
+
+ // Apply the new mutation.
+ match new_secret {
+ Some(secret) => {
+ cache.secrets.insert(
+ tokenid.clone(),
+ CachedSecret {
+ secret: secret.to_owned(),
+ },
+ );
+ }
+ None => {
+ cache.secrets.remove(tokenid);
+ }
+ }
+}
+
+/// Get the current shared generation.
+fn token_shadow_shared_gen() -> Option<usize> {
+ crate::ConfigVersionCache::new()
+ .ok()
+ .map(|cvc| cvc.token_shadow_generation())
+}
+
+/// Bump and return the new shared generation.
+fn bump_token_shadow_shared_gen() -> Option<usize> {
+ crate::ConfigVersionCache::new()
+ .ok()
+ .map(|cvc| cvc.increase_token_shadow_generation() + 1)
+}
+
+/// Invalidates local cache contents and sets/updates the cached generation.
+fn invalidate_cache_state_and_set_gen(cache: &mut ApiTokenSecretCache, gen: usize) {
+ cache.secrets.clear();
+ cache.shared_gen = gen;
+}
--
2.47.3
^ permalink raw reply [relevance 12%]
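The verify/insert protocol from the patch above (capture the shared generation before the expensive hash verification, then only cache the secret if the generation is still unchanged) can be sketched with stdlib types only. This is an illustrative reduction, not the patch itself: `ConfigVersionCache` is stood in by a process-local atomic, and `openssl::memcmp::eq` by a hand-rolled constant-time compare, so all names here are hypothetical.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{LazyLock, RwLock};

// Stand-in for the shared ConfigVersionCache generation counter
// (the real one lives in shared memory, visible to all processes).
static SHARED_GEN: AtomicUsize = AtomicUsize::new(0);

fn shared_gen() -> usize {
    SHARED_GEN.load(Ordering::Acquire)
}

// Constant-time byte comparison; the patch uses openssl::memcmp::eq instead.
fn ct_eq(a: &[u8], b: &[u8]) -> bool {
    if a.len() != b.len() {
        return false;
    }
    a.iter().zip(b).fold(0u8, |acc, (x, y)| acc | (x ^ y)) == 0
}

struct SecretCache {
    secrets: HashMap<String, String>,
    cached_gen: usize,
}

static CACHE: LazyLock<RwLock<SecretCache>> = LazyLock::new(|| {
    RwLock::new(SecretCache {
        secrets: HashMap::new(),
        cached_gen: 0,
    })
});

// Fast path: best-effort read lock, entry only trusted if no mutation
// happened since it was cached.
fn cache_lookup(id: &str, secret: &str) -> bool {
    let cache = match CACHE.try_read() {
        Ok(c) => c,
        Err(_) => return false,
    };
    let Some(entry) = cache.secrets.get(id) else {
        return false;
    };
    cache.cached_gen == shared_gen() && ct_eq(entry.as_bytes(), secret.as_bytes())
}

// Called after a successful slow-path verification; gen_before was
// captured before hashing started.
fn cache_insert(id: String, secret: String, gen_before: usize) {
    let Ok(mut cache) = CACHE.try_write() else { return };
    let now = shared_gen();
    if cache.cached_gen != now {
        // This process missed a generation bump: local contents are stale.
        cache.secrets.clear();
        cache.cached_gen = now;
    }
    // Skip the insert if a mutation raced with the hash verification.
    if now == gen_before {
        cache.secrets.insert(id, secret);
    }
}

fn main() {
    let g = shared_gen();
    cache_insert("user@pam!mytoken".into(), "s3cret".into(), g);
    assert!(cache_lookup("user@pam!mytoken", "s3cret"));
    // Simulate a token.shadow mutation in another process.
    SHARED_GEN.fetch_add(1, Ordering::AcqRel);
    assert!(!cache_lookup("user@pam!mytoken", "s3cret"));
    println!("ok");
}
```

The generation check on both sides of the hash verification is what prevents the "zombie insert" mentioned in the v2 changelog: a secret verified against a pre-rotation file can never be cached after the rotation bumped the counter.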
* [pbs-devel] [PATCH proxmox v4 4/4] proxmox-access-control: add TTL window to token secret cache
2026-01-21 15:13 14% [pbs-devel] [PATCH proxmox{-backup, , -datacenter-manager} v4 00/11] token-shadow: reduce api token verification overhead Samuel Rufinatscha
` (6 preceding siblings ...)
2026-01-21 15:14 12% ` [pbs-devel] [PATCH proxmox v4 3/4] proxmox-access-control: invalidate token-secret cache on token.shadow changes Samuel Rufinatscha
@ 2026-01-21 15:14 15% ` Samuel Rufinatscha
2026-01-21 15:14 13% ` [pbs-devel] [PATCH proxmox-datacenter-manager v4 1/3] pdm-config: implement token.shadow generation Samuel Rufinatscha
` (2 subsequent siblings)
10 siblings, 0 replies; 39+ results
From: Samuel Rufinatscha @ 2026-01-21 15:14 UTC (permalink / raw)
To: pbs-devel
verify_secret() currently calls refresh_cache_if_file_changed() on every
request, which performs a metadata() call on token.shadow each time.
Under load this adds unnecessary overhead, especially since the file
rarely changes.
This patch introduces a TTL boundary, controlled by
TOKEN_SECRET_CACHE_TTL_SECS: file metadata is only re-checked once the
TTL has expired. It also documents the TTL effects.
Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
---
Changes from v3 to v4:
* Adjusted commit message
Changes from v2 to v3:
* Refactored refresh_cache_if_file_changed TTL logic.
* Remove had_prior_state check (replaced by last_checked logic).
* Improve TTL bound checks.
* Reword documentation warning for clarity.
Changes from v1 to v2:
* Add TOKEN_SECRET_CACHE_TTL_SECS and last_checked.
* Implement double-checked TTL: check with try_read first; only attempt
refresh with try_write if expired/unknown.
* Fix TTL bookkeeping: update last_checked on the “file unchanged” path
and after API mutations.
* Add documentation warning about TTL-delayed effect of manual
token.shadow edits.
proxmox-access-control/src/token_shadow.rs | 30 +++++++++++++++++++++-
1 file changed, 29 insertions(+), 1 deletion(-)
diff --git a/proxmox-access-control/src/token_shadow.rs b/proxmox-access-control/src/token_shadow.rs
index 05813b52..a361fd72 100644
--- a/proxmox-access-control/src/token_shadow.rs
+++ b/proxmox-access-control/src/token_shadow.rs
@@ -28,6 +28,9 @@ static TOKEN_SECRET_CACHE: LazyLock<RwLock<ApiTokenSecretCache>> = LazyLock::new
})
});
+/// Max age in seconds of the token secret cache before checking for file changes.
+const TOKEN_SECRET_CACHE_TTL_SECS: i64 = 60;
+
// Get exclusive lock
fn lock_config() -> Result<ApiLockGuard, Error> {
open_api_lockfile(token_shadow_lock(), None, true)
@@ -55,11 +58,29 @@ fn write_file(data: HashMap<Authid, String>) -> Result<(), Error> {
fn refresh_cache_if_file_changed() -> bool {
let now = epoch_i64();
- // Best-effort refresh under write lock.
+ // Fast path: cache is fresh if shared-gen matches and TTL not expired.
+ if let (Some(cache), Some(shared_gen_read)) =
+ (TOKEN_SECRET_CACHE.try_read(), token_shadow_shared_gen())
+ {
+ if cache.shared_gen == shared_gen_read
+ && cache.shadow.as_ref().is_some_and(|cached| {
+ now >= cached.last_checked
+ && (now - cached.last_checked) < TOKEN_SECRET_CACHE_TTL_SECS
+ })
+ {
+ return true;
+ }
+ // read lock drops here
+ } else {
+ return false;
+ }
+
+ // Slow path: best-effort refresh under write lock.
let Some(mut cache) = TOKEN_SECRET_CACHE.try_write() else {
return false;
};
+ // Re-read generation after acquiring the lock (may have changed meanwhile).
let Some(shared_gen_now) = token_shadow_shared_gen() else {
return false;
};
@@ -69,6 +90,13 @@ fn refresh_cache_if_file_changed() -> bool {
invalidate_cache_state_and_set_gen(&mut cache, shared_gen_now);
}
+ // TTL check again after acquiring the lock
+ if cache.shadow.as_ref().is_some_and(|cached| {
+ now >= cached.last_checked && (now - cached.last_checked) < TOKEN_SECRET_CACHE_TTL_SECS
+ }) {
+ return true;
+ }
+
// Stat the file to detect manual edits.
let Ok((new_mtime, new_len)) = shadow_mtime_len() else {
return false;
--
2.47.3
^ permalink raw reply [relevance 15%]
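The double-checked TTL logic in this patch can be sketched with stdlib types. This is a hedged reduction: `epoch_i64()` from the proxmox helpers is stood in by `SystemTime`, parking_lot's `Option`-returning try-locks by std's `Result`-returning ones, and the actual stat/reload of token.shadow is collapsed into a comment, so the names are illustrative.

```rust
use std::sync::RwLock;
use std::time::{SystemTime, UNIX_EPOCH};

const TTL_SECS: i64 = 60;

// Stand-in for proxmox_time::epoch_i64().
fn epoch_i64() -> i64 {
    SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .map(|d| d.as_secs() as i64)
        .unwrap_or(0)
}

struct CacheState {
    last_checked: Option<i64>,
}

// Fresh only if last_checked is set, not in the future, and within TTL.
fn ttl_fresh(last_checked: Option<i64>, now: i64) -> bool {
    last_checked.is_some_and(|t| now >= t && (now - t) < TTL_SECS)
}

static STATE: RwLock<CacheState> = RwLock::new(CacheState { last_checked: None });

fn refresh_if_expired(now: i64) -> bool {
    // Fast path: shared read lock, no file metadata access while fresh.
    if let Ok(state) = STATE.try_read() {
        if ttl_fresh(state.last_checked, now) {
            return true;
        }
        // read guard drops here
    } else {
        return false;
    }

    // Slow path: best-effort write lock, then re-check the TTL, since
    // another thread may have refreshed between the two lock phases.
    let Ok(mut state) = STATE.try_write() else {
        return false;
    };
    if ttl_fresh(state.last_checked, now) {
        return true;
    }
    // Here the real code stats token.shadow and reloads it on change.
    state.last_checked = Some(now);
    true
}

fn main() {
    let now = epoch_i64();
    assert!(refresh_if_expired(now)); // slow path populates last_checked
    assert!(refresh_if_expired(now + 30)); // fast path: still within TTL
    println!("ok");
}
```

The `now >= last_checked` half of the check guards against a clock that jumped backwards, in which case the cache is treated as expired rather than fresh forever.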
* [pbs-devel] [PATCH proxmox v4 1/4] proxmox-access-control: split AccessControlConfig and add token.shadow gen
2026-01-21 15:13 14% [pbs-devel] [PATCH proxmox{-backup, , -datacenter-manager} v4 00/11] token-shadow: reduce api token verification overhead Samuel Rufinatscha
` (3 preceding siblings ...)
2026-01-21 15:14 15% ` [pbs-devel] [PATCH proxmox-backup v4 4/4] pbs-config: add TTL window to token secret cache Samuel Rufinatscha
@ 2026-01-21 15:14 13% ` Samuel Rufinatscha
2026-01-21 15:14 12% ` [pbs-devel] [PATCH proxmox v4 2/4] proxmox-access-control: cache verified API token secrets Samuel Rufinatscha
` (5 subsequent siblings)
10 siblings, 0 replies; 39+ results
From: Samuel Rufinatscha @ 2026-01-21 15:14 UTC (permalink / raw)
To: pbs-devel
Splits the AccessControlConfig trait into the AccessControlPermissions
and AccessControlConfig traits, and adds token.shadow generation support
to AccessControlConfig (with default impls).
Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
---
Changes from v3 to v4:
* Split AccessControlConfig: introduced AccessControlPermissions to
provide permissions for AccessControlConfig
* Adjusted commit message
proxmox-access-control/src/acl.rs | 10 ++-
proxmox-access-control/src/init.rs | 113 +++++++++++++++++++++++------
2 files changed, 99 insertions(+), 24 deletions(-)
diff --git a/proxmox-access-control/src/acl.rs b/proxmox-access-control/src/acl.rs
index 38cb7edf..4b4eac09 100644
--- a/proxmox-access-control/src/acl.rs
+++ b/proxmox-access-control/src/acl.rs
@@ -763,7 +763,7 @@ fn privs_to_priv_names(privs: u64) -> Vec<&'static str> {
mod test {
use std::{collections::HashMap, sync::OnceLock};
- use crate::init::{init_access_config, AccessControlConfig};
+ use crate::init::{init_access_config, AccessControlConfig, AccessControlPermissions};
use super::AclTree;
use anyhow::Error;
@@ -775,7 +775,7 @@ mod test {
roles: HashMap<&'a str, (u64, &'a str)>,
}
- impl AccessControlConfig for TestAcmConfig<'_> {
+ impl AccessControlPermissions for TestAcmConfig<'_> {
fn roles(&self) -> &HashMap<&str, (u64, &str)> {
&self.roles
}
@@ -793,6 +793,12 @@ mod test {
}
}
+ impl AccessControlConfig for TestAcmConfig<'_> {
+ fn permissions(&self) -> &dyn AccessControlPermissions {
+ self
+ }
+ }
+
fn setup_acl_tree_config() {
static ACL_CONFIG: OnceLock<TestAcmConfig> = OnceLock::new();
let config = ACL_CONFIG.get_or_init(|| {
diff --git a/proxmox-access-control/src/init.rs b/proxmox-access-control/src/init.rs
index e64398e8..dfd7784b 100644
--- a/proxmox-access-control/src/init.rs
+++ b/proxmox-access-control/src/init.rs
@@ -8,9 +8,8 @@ use proxmox_section_config::SectionConfigData;
static ACCESS_CONF: OnceLock<&'static dyn AccessControlConfig> = OnceLock::new();
-/// This trait specifies the functions a product needs to implement to get ACL tree based access
-/// control management from this plugin.
-pub trait AccessControlConfig: Send + Sync {
+/// Provides permission metadata used by access control.
+pub trait AccessControlPermissions: Send + Sync {
/// Returns a mapping of all recognized privileges and their corresponding `u64` value.
fn privileges(&self) -> &HashMap<&str, u64>;
@@ -32,25 +31,6 @@ pub trait AccessControlConfig: Send + Sync {
false
}
- /// Returns the current cache generation of the user and acl configs. If the generation was
- /// incremented since the last time the cache was queried, the configs are loaded again from
- /// disk.
- ///
- /// Returning `None` will always reload the cache.
- ///
- /// Default: Always returns `None`.
- fn cache_generation(&self) -> Option<usize> {
- None
- }
-
- /// Increment the cache generation of user and acl configs. This indicates that they were
- /// changed on disk.
- ///
- /// Default: Does nothing.
- fn increment_cache_generation(&self) -> Result<(), Error> {
- Ok(())
- }
-
/// Optionally returns a role that has no access to any resource.
///
/// Default: Returns `None`.
@@ -103,6 +83,95 @@ pub trait AccessControlConfig: Send + Sync {
}
}
+/// This trait specifies the functions a product needs to implement to get ACL tree based access
+/// control management from this plugin.
+pub trait AccessControlConfig: Send + Sync {
+ /// Return the permissions provider.
+ fn permissions(&self) -> &dyn AccessControlPermissions;
+
+ fn privileges(&self) -> &HashMap<&str, u64> {
+ self.permissions().privileges()
+ }
+
+ fn roles(&self) -> &HashMap<&str, (u64, &str)> {
+ self.permissions().roles()
+ }
+
+ fn is_superuser(&self, auth_id: &Authid) -> bool {
+ self.permissions().is_superuser(auth_id)
+ }
+
+ fn is_group_member(&self, user_id: &Userid, group: &str) -> bool {
+ self.permissions().is_group_member(user_id, group)
+ }
+
+ fn role_no_access(&self) -> Option<&str> {
+ self.permissions().role_no_access()
+ }
+
+ fn role_admin(&self) -> Option<&str> {
+ self.permissions().role_admin()
+ }
+
+ fn init_user_config(&self, config: &mut SectionConfigData) -> Result<(), Error> {
+ self.permissions().init_user_config(config)
+ }
+
+ fn acl_audit_privileges(&self) -> u64 {
+ self.permissions().acl_audit_privileges()
+ }
+
+ fn acl_modify_privileges(&self) -> u64 {
+ self.permissions().acl_modify_privileges()
+ }
+
+ fn check_acl_path(&self, path: &str) -> Result<(), Error> {
+ self.permissions().check_acl_path(path)
+ }
+
+ fn allow_partial_permission_match(&self) -> bool {
+ self.permissions().allow_partial_permission_match()
+ }
+
+ // Cache hooks
+
+ /// Returns the current cache generation of the user and acl configs. If the generation was
+ /// incremented since the last time the cache was queried, the configs are loaded again from
+ /// disk.
+ ///
+ /// Returning `None` will always reload the cache.
+ ///
+ /// Default: Always returns `None`.
+ fn cache_generation(&self) -> Option<usize> {
+ None
+ }
+
+ /// Increment the cache generation of user and acl configs. This indicates that they were
+ /// changed on disk.
+ ///
+ /// Default: Does nothing.
+ fn increment_cache_generation(&self) -> Result<(), Error> {
+ Ok(())
+ }
+
+ /// Returns the current cache generation of the token shadow cache. If the generation was
+ /// incremented since the last time the cache was queried, the token shadow cache is reloaded
+ /// from disk.
+ ///
+ /// Default: Always returns `None`.
+ fn token_shadow_cache_generation(&self) -> Option<usize> {
+ None
+ }
+
+ /// Increment the cache generation of the token shadow cache. This indicates that it was
+ /// changed on disk.
+ ///
+ /// Default: Returns an error as token shadow generation is not supported.
+ fn increment_token_shadow_cache_generation(&self) -> Result<usize, Error> {
+ anyhow::bail!("token shadow generation not supported");
+ }
+}
+
pub fn init_access_config(config: &'static dyn AccessControlConfig) -> Result<(), Error> {
ACCESS_CONF
.set(config)
--
2.47.3
^ permalink raw reply [relevance 13%]
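The trait split above can be sketched in miniature. Trait and type names here are simplified stand-ins for `AccessControlPermissions`/`AccessControlConfig`, showing the delegation pattern: the combined trait's default methods forward to the provider, while the cache hooks keep their own defaults so existing products compile unchanged.

```rust
use std::collections::HashMap;

// Provider trait: the product supplies static permission metadata.
trait Permissions: Send + Sync {
    fn roles(&self) -> &HashMap<&'static str, u64>;
}

// Combined trait: callers keep using one trait object; default
// methods delegate to the permissions provider.
trait Config: Send + Sync {
    fn permissions(&self) -> &dyn Permissions;

    fn roles(&self) -> &HashMap<&'static str, u64> {
        self.permissions().roles()
    }

    // Cache hook default: no cross-process generation available,
    // so callers always reload.
    fn token_shadow_cache_generation(&self) -> Option<usize> {
        None
    }
}

struct ProductPermissions {
    roles: HashMap<&'static str, u64>,
}

impl Permissions for ProductPermissions {
    fn roles(&self) -> &HashMap<&'static str, u64> {
        &self.roles
    }
}

struct ProductConfig {
    perms: ProductPermissions,
}

impl Config for ProductConfig {
    // Only the provider accessor is mandatory to implement.
    fn permissions(&self) -> &dyn Permissions {
        &self.perms
    }
}

fn main() {
    let cfg = ProductConfig {
        perms: ProductPermissions {
            roles: HashMap::from([("Admin", u64::MAX)]),
        },
    };
    // Delegation is transparent to callers of the combined trait.
    assert_eq!(cfg.roles().get("Admin"), Some(&u64::MAX));
    assert_eq!(cfg.token_shadow_cache_generation(), None);
    println!("ok");
}
```

This mirrors why the UI in the PDM patch can implement only `permissions()` and inherit no-op cache hooks, while pdm-config overrides the token.shadow hooks with real ConfigVersionCache calls.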
* [pbs-devel] [PATCH proxmox-datacenter-manager v4 1/3] pdm-config: implement token.shadow generation
2026-01-21 15:13 14% [pbs-devel] [PATCH proxmox{-backup, , -datacenter-manager} v4 00/11] token-shadow: reduce api token verification overhead Samuel Rufinatscha
` (7 preceding siblings ...)
2026-01-21 15:14 15% ` [pbs-devel] [PATCH proxmox v4 4/4] proxmox-access-control: add TTL window to token secret cache Samuel Rufinatscha
@ 2026-01-21 15:14 13% ` Samuel Rufinatscha
2026-01-21 15:14 17% ` [pbs-devel] [PATCH proxmox-datacenter-manager v4 2/3] docs: document API token-cache TTL effects Samuel Rufinatscha
2026-01-21 15:14 16% ` [pbs-devel] [PATCH proxmox-datacenter-manager v4 3/3] pdm-config: wire user+acl cache generation Samuel Rufinatscha
10 siblings, 0 replies; 39+ results
From: Samuel Rufinatscha @ 2026-01-21 15:14 UTC (permalink / raw)
To: pbs-devel
PDM depends on the shared proxmox/proxmox-access-control crate for
token.shadow handling, which expects the product to provide a
cross-process invalidation signal so it can cache and invalidate
token.shadow secrets.
This patch wires AccessControlConfig to ConfigVersionCache for
token.shadow invalidation, and switches the server and CLI to
pdm-config's AccessControlConfig and the UI to UiAccessControlConfig.
Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
---
Changes from v3 to v4:
* pdm-api-types: replace AccessControlConfig with
AccessControlPermissions and implement init::AccessControlPermissions
there
* pdm-config: add new AccessControlConfig implementing
init::AccessControlConfig
* UI: init uses a local UiAccessControlConfig for init_access_config()
* Adjusted commit message
cli/admin/src/main.rs | 2 +-
lib/pdm-api-types/src/acl.rs | 4 ++--
lib/pdm-config/Cargo.toml | 1 +
lib/pdm-config/src/access_control.rs | 20 ++++++++++++++++++++
lib/pdm-config/src/config_version_cache.rs | 18 ++++++++++++++++++
lib/pdm-config/src/lib.rs | 2 ++
server/src/acl.rs | 3 +--
ui/src/main.rs | 10 +++++++++-
8 files changed, 54 insertions(+), 6 deletions(-)
create mode 100644 lib/pdm-config/src/access_control.rs
diff --git a/cli/admin/src/main.rs b/cli/admin/src/main.rs
index f698fa2..916c633 100644
--- a/cli/admin/src/main.rs
+++ b/cli/admin/src/main.rs
@@ -19,7 +19,7 @@ fn main() {
proxmox_product_config::init(api_user, priv_user);
proxmox_access_control::init::init(
- &pdm_api_types::AccessControlConfig,
+ &pdm_config::AccessControlConfig,
pdm_buildcfg::configdir!("/access"),
)
.expect("failed to setup access control config");
diff --git a/lib/pdm-api-types/src/acl.rs b/lib/pdm-api-types/src/acl.rs
index 405982a..7c405a7 100644
--- a/lib/pdm-api-types/src/acl.rs
+++ b/lib/pdm-api-types/src/acl.rs
@@ -187,9 +187,9 @@ pub struct AclListItem {
pub roleid: String,
}
-pub struct AccessControlConfig;
+pub struct AccessControlPermissions;
-impl proxmox_access_control::init::AccessControlConfig for AccessControlConfig {
+impl proxmox_access_control::init::AccessControlPermissions for AccessControlPermissions {
fn privileges(&self) -> &HashMap<&str, u64> {
static PRIVS: LazyLock<HashMap<&str, u64>> =
LazyLock::new(|| PRIVILEGES.iter().copied().collect());
diff --git a/lib/pdm-config/Cargo.toml b/lib/pdm-config/Cargo.toml
index d39c2ad..19781d2 100644
--- a/lib/pdm-config/Cargo.toml
+++ b/lib/pdm-config/Cargo.toml
@@ -13,6 +13,7 @@ once_cell.workspace = true
openssl.workspace = true
serde.workspace = true
+proxmox-access-control.workspace = true
proxmox-config-digest = { workspace = true, features = [ "openssl" ] }
proxmox-http = { workspace = true, features = [ "http-helpers" ] }
proxmox-ldap = { workspace = true, features = [ "types" ]}
diff --git a/lib/pdm-config/src/access_control.rs b/lib/pdm-config/src/access_control.rs
new file mode 100644
index 0000000..389b3f4
--- /dev/null
+++ b/lib/pdm-config/src/access_control.rs
@@ -0,0 +1,20 @@
+use anyhow::Error;
+
+pub struct AccessControlConfig;
+
+impl proxmox_access_control::init::AccessControlConfig for AccessControlConfig {
+ fn permissions(&self) -> &dyn proxmox_access_control::init::AccessControlPermissions {
+ &pdm_api_types::AccessControlPermissions
+ }
+
+ fn token_shadow_cache_generation(&self) -> Option<usize> {
+ crate::ConfigVersionCache::new()
+ .ok()
+ .map(|c| c.token_shadow_generation())
+ }
+
+ fn increment_token_shadow_cache_generation(&self) -> Result<usize, Error> {
+ let c = crate::ConfigVersionCache::new()?;
+ Ok(c.increase_token_shadow_generation())
+ }
+}
diff --git a/lib/pdm-config/src/config_version_cache.rs b/lib/pdm-config/src/config_version_cache.rs
index 36a6a77..933140c 100644
--- a/lib/pdm-config/src/config_version_cache.rs
+++ b/lib/pdm-config/src/config_version_cache.rs
@@ -27,6 +27,8 @@ struct ConfigVersionCacheDataInner {
traffic_control_generation: AtomicUsize,
// Tracks updates to the remote/hostname/nodename mapping cache.
remote_mapping_cache: AtomicUsize,
+ // Token shadow (token.shadow) generation/version.
+ token_shadow_generation: AtomicUsize,
// Add further atomics here
}
@@ -172,4 +174,20 @@ impl ConfigVersionCache {
.fetch_add(1, Ordering::Relaxed)
+ 1
}
+
+ /// Returns the token shadow generation number.
+ pub fn token_shadow_generation(&self) -> usize {
+ self.shmem
+ .data()
+ .token_shadow_generation
+ .load(Ordering::Acquire)
+ }
+
+ /// Increase the token shadow generation number.
+ pub fn increase_token_shadow_generation(&self) -> usize {
+ self.shmem
+ .data()
+ .token_shadow_generation
+ .fetch_add(1, Ordering::AcqRel)
+ }
}
diff --git a/lib/pdm-config/src/lib.rs b/lib/pdm-config/src/lib.rs
index 4c49054..614f7ae 100644
--- a/lib/pdm-config/src/lib.rs
+++ b/lib/pdm-config/src/lib.rs
@@ -9,6 +9,8 @@ pub mod remotes;
pub mod setup;
pub mod views;
+mod access_control;
+pub use access_control::AccessControlConfig;
mod config_version_cache;
pub use config_version_cache::ConfigVersionCache;
diff --git a/server/src/acl.rs b/server/src/acl.rs
index f421814..e6e007b 100644
--- a/server/src/acl.rs
+++ b/server/src/acl.rs
@@ -1,6 +1,5 @@
pub(crate) fn init() {
- static ACCESS_CONTROL_CONFIG: pdm_api_types::AccessControlConfig =
- pdm_api_types::AccessControlConfig;
+ static ACCESS_CONTROL_CONFIG: pdm_config::AccessControlConfig = pdm_config::AccessControlConfig;
proxmox_access_control::init::init(&ACCESS_CONTROL_CONFIG, pdm_buildcfg::configdir!("/access"))
.expect("failed to setup access control config");
diff --git a/ui/src/main.rs b/ui/src/main.rs
index 2bd900e..9f87505 100644
--- a/ui/src/main.rs
+++ b/ui/src/main.rs
@@ -390,10 +390,18 @@ fn main() {
pwt::state::set_available_languages(proxmox_yew_comp::available_language_list());
if let Err(e) =
- proxmox_access_control::init::init_access_config(&pdm_api_types::AccessControlConfig)
+ proxmox_access_control::init::init_access_config(&UiAccessControlConfig)
{
log::error!("could not initialize access control config - {e:#}");
}
yew::Renderer::<DatacenterManagerApp>::new().render();
}
+
+struct UiAccessControlConfig;
+
+impl proxmox_access_control::init::AccessControlConfig for UiAccessControlConfig {
+ fn permissions(&self) -> &dyn proxmox_access_control::init::AccessControlPermissions {
+ &pdm_api_types::AccessControlPermissions
+ }
+}
--
2.47.3
^ permalink raw reply [relevance 13%]
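The generation counter added to ConfigVersionCache above follows a simple atomic pattern: readers load with `Acquire`, writers bump with `fetch_add` (which returns the pre-increment value). A sketch with a process-local static standing in for the shared-memory mapping; in the real patch the atomic lives in shmem so every daemon observes bumps, and whether the bump helper returns the old or the new value differs between the helpers shown, so this sketch just picks the new-value convention:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Stand-in for the atomic living in the shared-memory ConfigVersionCache.
static TOKEN_SHADOW_GENERATION: AtomicUsize = AtomicUsize::new(0);

/// Current generation; the Acquire load pairs with the Release half
/// of the AcqRel bump below.
fn token_shadow_generation() -> usize {
    TOKEN_SHADOW_GENERATION.load(Ordering::Acquire)
}

/// Bump the generation. fetch_add returns the previous value, so the
/// freshly written value is previous + 1.
fn increase_token_shadow_generation() -> usize {
    TOKEN_SHADOW_GENERATION.fetch_add(1, Ordering::AcqRel) + 1
}

fn main() {
    let before = token_shadow_generation();
    let bumped = increase_token_shadow_generation();
    assert_eq!(bumped, before + 1);
    assert_eq!(token_shadow_generation(), before + 1);
    println!("ok");
}
```

A monotonically increasing counter is enough here because cache consumers never interpret the value; they only compare it for equality against the generation they captured earlier.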
* [pbs-devel] [PATCH proxmox v4 2/4] proxmox-access-control: cache verified API token secrets
2026-01-21 15:13 14% [pbs-devel] [PATCH proxmox{-backup, , -datacenter-manager} v4 00/11] token-shadow: reduce api token verification overhead Samuel Rufinatscha
` (4 preceding siblings ...)
2026-01-21 15:14 13% ` [pbs-devel] [PATCH proxmox v4 1/4] proxmox-access-control: split AccessControlConfig and add token.shadow gen Samuel Rufinatscha
@ 2026-01-21 15:14 12% ` Samuel Rufinatscha
2026-01-21 15:14 12% ` [pbs-devel] [PATCH proxmox v4 3/4] proxmox-access-control: invalidate token-secret cache on token.shadow changes Samuel Rufinatscha
` (4 subsequent siblings)
10 siblings, 0 replies; 39+ results
From: Samuel Rufinatscha @ 2026-01-21 15:14 UTC (permalink / raw)
To: pbs-devel
Adds an in-memory cache of successfully verified token secrets.
Subsequent requests for the same token+secret combination only perform a
comparison using openssl::memcmp::eq and avoid re-running the password
hash. The cache is updated when a token secret is set and cleared when a
token is deleted.
Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
---
Changes from v3 to v4:
* Add gen param to invalidate_cache_state()
* Validates the generation bump after obtaining write lock in
apply_api_mutation
* Pass lock to apply_api_mutation
* Remove unnecessary gen check cache_try_secret_matches
* Adjusted commit message
Changes from v2 to v3:
* Replaced process-local cache invalidation (AtomicU64
API_MUTATION_GENERATION) with a cross-process shared generation via
ConfigVersionCache.
* Validate shared generation before/after the constant-time secret
compare; only insert into cache if the generation is unchanged.
* invalidate_cache_state() on insert if shared generation changed.
Changes from v1 to v2:
* Replace OnceCell with LazyLock, and std::sync::RwLock with
parking_lot::RwLock.
* Add API_MUTATION_GENERATION and guard cache inserts
to prevent “zombie inserts” across concurrent set/delete.
* Refactor cache operations into cache_try_secret_matches,
cache_try_insert_secret, and centralize write-side behavior in
apply_api_mutation.
* Switch fast-path cache access to try_read/try_write (best-effort).
Cargo.toml | 1 +
proxmox-access-control/Cargo.toml | 1 +
proxmox-access-control/src/token_shadow.rs | 160 ++++++++++++++++++++-
3 files changed, 159 insertions(+), 3 deletions(-)
diff --git a/Cargo.toml b/Cargo.toml
index 27a69afa..59a2ec93 100644
--- a/Cargo.toml
+++ b/Cargo.toml
@@ -112,6 +112,7 @@ native-tls = "0.2"
nix = "0.29"
openssl = "0.10"
pam-sys = "0.5"
+parking_lot = "0.12"
percent-encoding = "2.1"
pin-utils = "0.1.0"
proc-macro2 = "1.0"
diff --git a/proxmox-access-control/Cargo.toml b/proxmox-access-control/Cargo.toml
index ec189664..1de2842c 100644
--- a/proxmox-access-control/Cargo.toml
+++ b/proxmox-access-control/Cargo.toml
@@ -16,6 +16,7 @@ anyhow.workspace = true
const_format.workspace = true
nix = { workspace = true, optional = true }
openssl = { workspace = true, optional = true }
+parking_lot.workspace = true
regex.workspace = true
hex = { workspace = true, optional = true }
serde.workspace = true
diff --git a/proxmox-access-control/src/token_shadow.rs b/proxmox-access-control/src/token_shadow.rs
index c586d834..e4dfab50 100644
--- a/proxmox-access-control/src/token_shadow.rs
+++ b/proxmox-access-control/src/token_shadow.rs
@@ -1,13 +1,28 @@
use std::collections::HashMap;
+use std::sync::LazyLock;
use anyhow::{bail, format_err, Error};
+use parking_lot::RwLock;
use serde_json::{from_value, Value};
use proxmox_auth_api::types::Authid;
use proxmox_product_config::{open_api_lockfile, replace_config, ApiLockGuard};
+use crate::init::access_conf;
use crate::init::impl_feature::{token_shadow, token_shadow_lock};
+/// Global in-memory cache for successfully verified API token secrets.
+/// The cache stores plain text secrets for token Authids that have already been
+/// verified against the hashed values in `token.shadow`. This allows for cheap
+/// subsequent authentications for the same token+secret combination, avoiding
+/// recomputing the password hash on every request.
+static TOKEN_SECRET_CACHE: LazyLock<RwLock<ApiTokenSecretCache>> = LazyLock::new(|| {
+ RwLock::new(ApiTokenSecretCache {
+ secrets: HashMap::new(),
+ shared_gen: 0,
+ })
+});
+
// Get exclusive lock
fn lock_config() -> Result<ApiLockGuard, Error> {
open_api_lockfile(token_shadow_lock(), None, true)
@@ -36,9 +51,27 @@ pub fn verify_secret(tokenid: &Authid, secret: &str) -> Result<(), Error> {
bail!("not an API token ID");
}
+ // Fast path
+ if cache_try_secret_matches(tokenid, secret) {
+ return Ok(());
+ }
+
+ // Slow path
+ // First, capture the shared generation before doing the hash verification.
+ let gen_before = token_shadow_shared_gen();
+
let data = read_file()?;
match data.get(tokenid) {
- Some(hashed_secret) => proxmox_sys::crypt::verify_crypt_pw(secret, hashed_secret),
+ Some(hashed_secret) => {
+ proxmox_sys::crypt::verify_crypt_pw(secret, hashed_secret)?;
+
+ // Try to cache only if nothing changed while verifying the secret.
+ if let Some(gen) = gen_before {
+ cache_try_insert_secret(tokenid.clone(), secret.to_owned(), gen);
+ }
+
+ Ok(())
+ }
None => bail!("invalid API token"),
}
}
@@ -49,13 +82,15 @@ pub fn set_secret(tokenid: &Authid, secret: &str) -> Result<(), Error> {
bail!("not an API token ID");
}
- let _guard = lock_config()?;
+ let guard = lock_config()?;
let mut data = read_file()?;
let hashed_secret = proxmox_sys::crypt::encrypt_pw(secret)?;
data.insert(tokenid.clone(), hashed_secret);
write_file(data)?;
+ apply_api_mutation(guard, tokenid, Some(secret));
+
Ok(())
}
@@ -65,12 +100,14 @@ pub fn delete_secret(tokenid: &Authid) -> Result<(), Error> {
bail!("not an API token ID");
}
- let _guard = lock_config()?;
+ let guard = lock_config()?;
let mut data = read_file()?;
data.remove(tokenid);
write_file(data)?;
+ apply_api_mutation(guard, tokenid, None);
+
Ok(())
}
@@ -81,3 +118,120 @@ pub fn generate_and_set_secret(tokenid: &Authid) -> Result<String, Error> {
set_secret(tokenid, &secret)?;
Ok(secret)
}
+
+struct ApiTokenSecretCache {
+ /// Keys are token Authids, values are the corresponding plain text secrets.
+ /// Entries are added after a successful on-disk verification in
+ /// `verify_secret` or when a new token secret is generated by
+ /// `generate_and_set_secret`. Used to avoid repeated
+ /// password-hash computation on subsequent authentications.
+ secrets: HashMap<Authid, CachedSecret>,
+ /// Shared generation to detect mutations of the underlying token.shadow file.
+ shared_gen: usize,
+}
+
+/// Cached secret.
+struct CachedSecret {
+ secret: String,
+}
+
+fn cache_try_insert_secret(tokenid: Authid, secret: String, shared_gen_before: usize) {
+ let Some(mut cache) = TOKEN_SECRET_CACHE.try_write() else {
+ return;
+ };
+
+ let Some(shared_gen_now) = token_shadow_shared_gen() else {
+ return;
+ };
+
+ // If this process missed a generation bump, its cache is stale.
+ if cache.shared_gen != shared_gen_now {
+ invalidate_cache_state_and_set_gen(&mut cache, shared_gen_now);
+ }
+
+ // If a mutation happened while we were verifying the secret, do not insert.
+ if shared_gen_now == shared_gen_before {
+ cache.secrets.insert(tokenid, CachedSecret { secret });
+ }
+}
+
+/// Tries to match the given token secret against the cached secret.
+///
+/// Verifies the generation/version before doing the constant-time
+/// comparison to reduce TOCTOU risk. During token rotation or deletion
+/// tokens for in-flight requests may still validate against the previous
+/// generation.
+fn cache_try_secret_matches(tokenid: &Authid, secret: &str) -> bool {
+ let Some(cache) = TOKEN_SECRET_CACHE.try_read() else {
+ return false;
+ };
+ let Some(entry) = cache.secrets.get(tokenid) else {
+ return false;
+ };
+ let Some(current_gen) = token_shadow_shared_gen() else {
+ return false;
+ };
+
+ if current_gen == cache.shared_gen {
+ return openssl::memcmp::eq(entry.secret.as_bytes(), secret.as_bytes());
+ }
+
+ false
+}
+
+fn apply_api_mutation(_guard: BackupLockGuard, tokenid: &Authid, new_secret: Option<&str>) {
+ // Signal cache invalidation to other processes (best-effort).
+ let bumped_gen = bump_token_shadow_shared_gen();
+
+ let mut cache = TOKEN_SECRET_CACHE.write();
+
+ // If we cannot get the current generation, we cannot trust the cache
+ let Some(current_gen) = token_shadow_shared_gen() else {
+ invalidate_cache_state_and_set_gen(&mut cache, 0);
+ return;
+ };
+
+ // If we cannot bump the shared generation, or if it changed after
+ // obtaining the cache write lock, we cannot trust the cache
+ if bumped_gen != Some(current_gen) {
+ invalidate_cache_state_and_set_gen(&mut cache, current_gen);
+ return;
+ }
+
+ // Update to the post-mutation generation.
+ cache.shared_gen = current_gen;
+
+ // Apply the new mutation.
+ match new_secret {
+ Some(secret) => {
+ cache.secrets.insert(
+ tokenid.clone(),
+ CachedSecret {
+ secret: secret.to_owned(),
+ },
+ );
+ }
+ None => {
+ cache.secrets.remove(tokenid);
+ }
+ }
+}
+
+/// Get the current shared generation.
+fn token_shadow_shared_gen() -> Option<usize> {
+ access_conf().token_shadow_cache_generation()
+}
+
+/// Bump and return the new shared generation.
+fn bump_token_shadow_shared_gen() -> Option<usize> {
+ access_conf()
+ .increment_token_shadow_cache_generation()
+ .ok()
+ .map(|prev| prev + 1)
+}
+
+/// Invalidates local cache contents and sets/updates the cached generation.
+fn invalidate_cache_state_and_set_gen(cache: &mut ApiTokenSecretCache, gen: usize) {
+ cache.secrets.clear();
+ cache.shared_gen = gen;
+}
--
2.47.3
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
^ permalink raw reply [relevance 12%]
* [pbs-devel] [PATCH proxmox-backup v4 1/4] pbs-config: add token.shadow generation to ConfigVersionCache
2026-01-21 15:13 14% [pbs-devel] [PATCH proxmox{-backup, , -datacenter-manager} v4 00/11] token-shadow: reduce api token verification overhead Samuel Rufinatscha
@ 2026-01-21 15:13 17% ` Samuel Rufinatscha
2026-01-21 15:13 12% ` [pbs-devel] [PATCH proxmox-backup v4 2/4] pbs-config: cache verified API token secrets Samuel Rufinatscha
` (9 subsequent siblings)
10 siblings, 0 replies; 39+ results
From: Samuel Rufinatscha @ 2026-01-21 15:13 UTC (permalink / raw)
To: pbs-devel
Prepares the config version cache to support token_shadow caching.
Safety: the shmem mapping is fixed to 4096 bytes via the #[repr(C)]
union padding, and the new atomic is appended to the end of the
#[repr(C)] inner struct, so all existing field offsets stay unchanged.
Old processes keep accessing the same bytes and new processes consume
previously reserved padding.
Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
---
Changes from v3 to v4:
* Rebased
* Adjusted commit message
Changes from v2 to v3:
* Rebased
Changes from v1 to v2:
* Rebased
pbs-config/src/config_version_cache.rs | 18 ++++++++++++++++++
1 file changed, 18 insertions(+)
diff --git a/pbs-config/src/config_version_cache.rs b/pbs-config/src/config_version_cache.rs
index b875f7e0..399a6f79 100644
--- a/pbs-config/src/config_version_cache.rs
+++ b/pbs-config/src/config_version_cache.rs
@@ -27,6 +27,8 @@ struct ConfigVersionCacheDataInner {
traffic_control_generation: AtomicUsize,
// datastore (datastore.cfg) generation/version
datastore_generation: AtomicUsize,
+ // Token shadow (token.shadow) generation/version.
+ token_shadow_generation: AtomicUsize,
// Add further atomics here
}
@@ -159,4 +161,20 @@ impl ConfigVersionCache {
.datastore_generation
.fetch_add(1, Ordering::AcqRel)
}
+
+ /// Returns the token shadow generation number.
+ pub fn token_shadow_generation(&self) -> usize {
+ self.shmem
+ .data()
+ .token_shadow_generation
+ .load(Ordering::Acquire)
+ }
+
+ /// Increase the token shadow generation number.
+ pub fn increase_token_shadow_generation(&self) -> usize {
+ self.shmem
+ .data()
+ .token_shadow_generation
+ .fetch_add(1, Ordering::AcqRel)
+ }
}
--
2.47.3
* [pbs-devel] [PATCH proxmox-backup v4 3/4] pbs-config: invalidate token-secret cache on token.shadow changes
2026-01-21 15:13 14% [pbs-devel] [PATCH proxmox{-backup, , -datacenter-manager} v4 00/11] token-shadow: reduce api token verification overhead Samuel Rufinatscha
2026-01-21 15:13 17% ` [pbs-devel] [PATCH proxmox-backup v4 1/4] pbs-config: add token.shadow generation to ConfigVersionCache Samuel Rufinatscha
2026-01-21 15:13 12% ` [pbs-devel] [PATCH proxmox-backup v4 2/4] pbs-config: cache verified API token secrets Samuel Rufinatscha
@ 2026-01-21 15:13 12% ` Samuel Rufinatscha
2026-01-21 15:14 15% ` [pbs-devel] [PATCH proxmox-backup v4 4/4] pbs-config: add TTL window to token secret cache Samuel Rufinatscha
` (7 subsequent siblings)
10 siblings, 0 replies; 39+ results
From: Samuel Rufinatscha @ 2026-01-21 15:13 UTC (permalink / raw)
To: pbs-devel
This patch adds detection of manual/direct file changes by tracking the
mtime and length of token.shadow and clearing the in-memory token secret
cache whenever either value changes.
Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
---
Changes from v3 to v4:
* make use of .replace() in refresh_cache_if_file_changed to get
previous state
* Group file stats with ShadowFileInfo
* Return false in refresh_cache_if_file_changed to avoid unnecessary cache
queries
* Adjusted commit message
Changes from v2 to v3:
* Cache now tracks last_checked (epoch seconds).
* Simplified refresh_cache_if_file_changed, removed
FILE_GENERATION logic
* On first load, initializes file metadata and keeps empty cache.
Changes from v1 to v2:
* Add file metadata tracking (file_mtime, file_len) and
FILE_GENERATION.
* Store file_gen in CachedSecret and verify it against the current
FILE_GENERATION to ensure cached entries belong to the current file
state.
* Add shadow_mtime_len() helper and convert refresh to best-effort
(try_write, returns bool).
* Pass a pre-write metadata snapshot into apply_api_mutation and
clear/bump generation if the cache metadata indicates missed external
edits.
pbs-config/src/token_shadow.rs | 123 +++++++++++++++++++++++++++++++--
1 file changed, 119 insertions(+), 4 deletions(-)
diff --git a/pbs-config/src/token_shadow.rs b/pbs-config/src/token_shadow.rs
index d5aa5de2..a5bd1525 100644
--- a/pbs-config/src/token_shadow.rs
+++ b/pbs-config/src/token_shadow.rs
@@ -1,5 +1,8 @@
use std::collections::HashMap;
+use std::fs;
+use std::io::ErrorKind;
use std::sync::LazyLock;
+use std::time::SystemTime;
use anyhow::{bail, format_err, Error};
use parking_lot::RwLock;
@@ -7,6 +10,7 @@ use serde::{Deserialize, Serialize};
use serde_json::{from_value, Value};
use proxmox_sys::fs::CreateOptions;
+use proxmox_time::epoch_i64;
use pbs_api_types::Authid;
//use crate::auth;
@@ -24,6 +28,7 @@ static TOKEN_SECRET_CACHE: LazyLock<RwLock<ApiTokenSecretCache>> = LazyLock::new
RwLock::new(ApiTokenSecretCache {
secrets: HashMap::new(),
shared_gen: 0,
+ shadow: None,
})
});
@@ -62,6 +67,56 @@ fn write_file(data: HashMap<Authid, String>) -> Result<(), Error> {
proxmox_sys::fs::replace_file(CONF_FILE, &json, options, true)
}
+/// Refreshes the in-memory cache if the on-disk token.shadow file changed.
+/// Returns true if the cache is valid to use, false if not.
+fn refresh_cache_if_file_changed() -> bool {
+ let now = epoch_i64();
+
+ // Best-effort refresh under write lock.
+ let Some(mut cache) = TOKEN_SECRET_CACHE.try_write() else {
+ return false;
+ };
+
+ let Some(shared_gen_now) = token_shadow_shared_gen() else {
+ return false;
+ };
+
+ // If another process bumped the generation, we don't know what changed -> clear cache
+ if cache.shared_gen != shared_gen_now {
+ invalidate_cache_state_and_set_gen(&mut cache, shared_gen_now);
+ }
+
+ // Stat the file to detect manual edits.
+ let Ok((new_mtime, new_len)) = shadow_mtime_len() else {
+ return false;
+ };
+
+ // If the file didn't change, only update last_checked
+ if let Some(shadow) = cache.shadow.as_mut() {
+ if shadow.mtime == new_mtime && shadow.len == new_len {
+ shadow.last_checked = now;
+ return true;
+ }
+ }
+
+ cache.secrets.clear();
+
+ let prev = cache.shadow.replace(ShadowFileInfo {
+ mtime: new_mtime,
+ len: new_len,
+ last_checked: now,
+ });
+
+ if prev.is_some() {
+ // Best-effort propagation to other processes if a change was detected
+ if let Some(shared_gen_new) = bump_token_shadow_shared_gen() {
+ cache.shared_gen = shared_gen_new;
+ }
+ }
+
+ false
+}
+
/// Verifies that an entry for given tokenid / API token secret exists
pub fn verify_secret(tokenid: &Authid, secret: &str) -> Result<(), Error> {
if !tokenid.is_token() {
@@ -69,7 +124,7 @@ pub fn verify_secret(tokenid: &Authid, secret: &str) -> Result<(), Error> {
}
// Fast path
- if cache_try_secret_matches(tokenid, secret) {
+ if refresh_cache_if_file_changed() && cache_try_secret_matches(tokenid, secret) {
return Ok(());
}
@@ -109,12 +164,15 @@ fn set_secret(tokenid: &Authid, secret: &str) -> Result<(), Error> {
let guard = lock_config()?;
+ // Capture state before we write to detect external edits.
+ let pre_meta = shadow_mtime_len().unwrap_or((None, None));
+
let mut data = read_file()?;
let hashed_secret = proxmox_sys::crypt::encrypt_pw(secret)?;
data.insert(tokenid.clone(), hashed_secret);
write_file(data)?;
- apply_api_mutation(guard, tokenid, Some(secret));
+ apply_api_mutation(guard, tokenid, Some(secret), pre_meta);
Ok(())
}
@@ -127,11 +185,14 @@ pub fn delete_secret(tokenid: &Authid) -> Result<(), Error> {
let guard = lock_config()?;
+ // Capture state before we write to detect external edits.
+ let pre_meta = shadow_mtime_len().unwrap_or((None, None));
+
let mut data = read_file()?;
data.remove(tokenid);
write_file(data)?;
- apply_api_mutation(guard, tokenid, None);
+ apply_api_mutation(guard, tokenid, None, pre_meta);
Ok(())
}
@@ -145,6 +206,8 @@ struct ApiTokenSecretCache {
secrets: HashMap<Authid, CachedSecret>,
/// Shared generation to detect mutations of the underlying token.shadow file.
shared_gen: usize,
+ /// Shadow file info to detect changes
+ shadow: Option<ShadowFileInfo>,
}
/// Cached secret.
@@ -152,6 +215,16 @@ struct CachedSecret {
secret: String,
}
+/// Shadow file info
+struct ShadowFileInfo {
+ // shadow file mtime to detect changes
+ mtime: Option<SystemTime>,
+ // shadow file length to detect changes
+ len: Option<u64>,
+ // last time the file metadata was checked
+ last_checked: i64,
+}
+
fn cache_try_insert_secret(tokenid: Authid, secret: String, shared_gen_before: usize) {
let Some(mut cache) = TOKEN_SECRET_CACHE.try_write() else {
return;
@@ -196,7 +269,14 @@ fn cache_try_secret_matches(tokenid: &Authid, secret: &str) -> bool {
false
}
-fn apply_api_mutation(_guard: BackupLockGuard, tokenid: &Authid, new_secret: Option<&str>) {
+fn apply_api_mutation(
+ _guard: BackupLockGuard,
+ tokenid: &Authid,
+ new_secret: Option<&str>,
+ pre_write_meta: (Option<SystemTime>, Option<u64>),
+) {
+ let now = epoch_i64();
+
// Signal cache invalidation to other processes (best-effort).
let bumped_gen = bump_token_shadow_shared_gen();
@@ -215,6 +295,16 @@ fn apply_api_mutation(_guard: BackupLockGuard, tokenid: &Authid, new_secret: Opt
return;
}
+ // If our cached file metadata does not match the on-disk state before our write,
+ // we likely missed an external/manual edit. We can no longer trust any cached secrets.
+ if cache
+ .shadow
+ .as_ref()
+ .is_some_and(|s| (s.mtime, s.len) != pre_write_meta)
+ {
+ cache.secrets.clear();
+ }
+
// Update to the post-mutation generation.
cache.shared_gen = current_gen;
@@ -232,6 +322,22 @@ fn apply_api_mutation(_guard: BackupLockGuard, tokenid: &Authid, new_secret: Opt
cache.secrets.remove(tokenid);
}
}
+
+ // Update our view of the file metadata to the post-write state (best-effort).
+ // (If this fails, drop local cache so callers fall back to slow path until refreshed.)
+ match shadow_mtime_len() {
+ Ok((mtime, len)) => {
+ cache.shadow = Some(ShadowFileInfo {
+ mtime,
+ len,
+ last_checked: now,
+ });
+ }
+ Err(_) => {
+ // If we cannot validate state, do not trust cache.
+ invalidate_cache_state_and_set_gen(&mut cache, current_gen);
+ }
+ }
}
/// Get the current shared generation.
@@ -252,4 +358,13 @@ fn bump_token_shadow_shared_gen() -> Option<usize> {
fn invalidate_cache_state_and_set_gen(cache: &mut ApiTokenSecretCache, gen: usize) {
cache.secrets.clear();
cache.shared_gen = gen;
+ cache.shadow = None;
+}
+
+fn shadow_mtime_len() -> Result<(Option<SystemTime>, Option<u64>), Error> {
+ match fs::metadata(CONF_FILE) {
+ Ok(meta) => Ok((meta.modified().ok(), Some(meta.len()))),
+ Err(e) if e.kind() == ErrorKind::NotFound => Ok((None, None)),
+ Err(e) => Err(e.into()),
+ }
}
--
2.47.3
* [pbs-devel] [PATCH proxmox-datacenter-manager v4 3/3] pdm-config: wire user+acl cache generation
2026-01-21 15:13 14% [pbs-devel] [PATCH proxmox{-backup, , -datacenter-manager} v4 00/11] token-shadow: reduce api token verification overhead Samuel Rufinatscha
` (9 preceding siblings ...)
2026-01-21 15:14 17% ` [pbs-devel] [PATCH proxmox-datacenter-manager v4 2/3] docs: document API token-cache TTL effects Samuel Rufinatscha
@ 2026-01-21 15:14 16% ` Samuel Rufinatscha
10 siblings, 0 replies; 39+ results
From: Samuel Rufinatscha @ 2026-01-21 15:14 UTC (permalink / raw)
To: pbs-devel
Rename ConfigVersionCache’s user_cache_generation to
user_and_acl_generation to match the semantics of
AccessControlConfig::cache_generation and increment_cache_generation,
which expect a single shared generation covering both the user and ACL
configs.
Safety: no layout change; the shared-memory size and field order remain
unchanged.
Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
---
lib/pdm-config/src/access_control.rs | 11 +++++++++++
lib/pdm-config/src/config_version_cache.rs | 16 ++++++++--------
2 files changed, 19 insertions(+), 8 deletions(-)
diff --git a/lib/pdm-config/src/access_control.rs b/lib/pdm-config/src/access_control.rs
index 389b3f4..1d498d3 100644
--- a/lib/pdm-config/src/access_control.rs
+++ b/lib/pdm-config/src/access_control.rs
@@ -7,6 +7,17 @@ impl proxmox_access_control::init::AccessControlConfig for AccessControlConfig {
&pdm_api_types::AccessControlPermissions
}
+ fn cache_generation(&self) -> Option<usize> {
+ crate::ConfigVersionCache::new()
+ .ok()
+ .map(|c| c.user_and_acl_generation())
+ }
+
+ fn increment_cache_generation(&self) -> Result<(), Error> {
+ let c = crate::ConfigVersionCache::new()?;
+ Ok(c.increase_user_and_acl_generation())
+ }
+
fn token_shadow_cache_generation(&self) -> Option<usize> {
crate::ConfigVersionCache::new()
.ok()
diff --git a/lib/pdm-config/src/config_version_cache.rs b/lib/pdm-config/src/config_version_cache.rs
index 933140c..f3d52a0 100644
--- a/lib/pdm-config/src/config_version_cache.rs
+++ b/lib/pdm-config/src/config_version_cache.rs
@@ -21,8 +21,8 @@ use proxmox_shared_memory::*;
#[repr(C)]
struct ConfigVersionCacheDataInner {
magic: [u8; 8],
- // User (user.cfg) cache generation/version.
- user_cache_generation: AtomicUsize,
+ // User (user.cfg) and ACL (acl.cfg) generation/version.
+ user_and_acl_generation: AtomicUsize,
// Traffic control (traffic-control.cfg) generation/version.
traffic_control_generation: AtomicUsize,
// Tracks updates to the remote/hostname/nodename mapping cache.
@@ -126,19 +126,19 @@ impl ConfigVersionCache {
Ok(Arc::new(Self { shmem }))
}
- /// Returns the user cache generation number.
- pub fn user_cache_generation(&self) -> usize {
+ /// Returns the user and ACL cache generation number.
+ pub fn user_and_acl_generation(&self) -> usize {
self.shmem
.data()
- .user_cache_generation
+ .user_and_acl_generation
.load(Ordering::Acquire)
}
- /// Increase the user cache generation number.
- pub fn increase_user_cache_generation(&self) {
+ /// Increase the user and ACL cache generation number.
+ pub fn increase_user_and_acl_generation(&self) {
self.shmem
.data()
- .user_cache_generation
+ .user_and_acl_generation
.fetch_add(1, Ordering::AcqRel);
}
--
2.47.3
* [pbs-devel] [PATCH proxmox v4 3/4] proxmox-access-control: invalidate token-secret cache on token.shadow changes
2026-01-21 15:13 14% [pbs-devel] [PATCH proxmox{-backup, , -datacenter-manager} v4 00/11] token-shadow: reduce api token verification overhead Samuel Rufinatscha
` (5 preceding siblings ...)
2026-01-21 15:14 12% ` [pbs-devel] [PATCH proxmox v4 2/4] proxmox-access-control: cache verified API token secrets Samuel Rufinatscha
@ 2026-01-21 15:14 12% ` Samuel Rufinatscha
2026-01-21 15:14 15% ` [pbs-devel] [PATCH proxmox v4 4/4] proxmox-access-control: add TTL window to token secret cache Samuel Rufinatscha
` (3 subsequent siblings)
10 siblings, 0 replies; 39+ results
From: Samuel Rufinatscha @ 2026-01-21 15:14 UTC (permalink / raw)
To: pbs-devel
This patch adds detection of manual/direct file changes by tracking the
mtime and length of token.shadow and clearing the in-memory token secret
cache whenever either value changes.
Signed-off-by: Samuel Rufinatscha <s.rufinatscha@proxmox.com>
---
Changes from v3 to v4:
* make use of .replace() in refresh_cache_if_file_changed to get
previous state
* Group file stats with ShadowFileInfo
* Return false in refresh_cache_if_file_changed to avoid unnecessary cache
queries
* Adjusted commit message
Changes from v2 to v3:
* Cache now tracks last_checked (epoch seconds).
* Simplified refresh_cache_if_file_changed, removed
FILE_GENERATION logic
* On first load, initializes file metadata and keeps empty cache.
Changes from v1 to v2:
* Add file metadata tracking (file_mtime, file_len) and
FILE_GENERATION.
* Store file_gen in CachedSecret and verify it against the current
FILE_GENERATION to ensure cached entries belong to the current file
state.
* Add shadow_mtime_len() helper and convert refresh to best-effort
(try_write, returns bool).
* Pass a pre-write metadata snapshot into apply_api_mutation and
clear/bump generation if the cache metadata indicates missed external
edits.
proxmox-access-control/src/token_shadow.rs | 123 ++++++++++++++++++++-
1 file changed, 119 insertions(+), 4 deletions(-)
diff --git a/proxmox-access-control/src/token_shadow.rs b/proxmox-access-control/src/token_shadow.rs
index e4dfab50..05813b52 100644
--- a/proxmox-access-control/src/token_shadow.rs
+++ b/proxmox-access-control/src/token_shadow.rs
@@ -1,5 +1,8 @@
use std::collections::HashMap;
+use std::fs;
+use std::io::ErrorKind;
use std::sync::LazyLock;
+use std::time::SystemTime;
use anyhow::{bail, format_err, Error};
use parking_lot::RwLock;
@@ -7,6 +10,7 @@ use serde_json::{from_value, Value};
use proxmox_auth_api::types::Authid;
use proxmox_product_config::{open_api_lockfile, replace_config, ApiLockGuard};
+use proxmox_time::epoch_i64;
use crate::init::access_conf;
use crate::init::impl_feature::{token_shadow, token_shadow_lock};
@@ -20,6 +24,7 @@ static TOKEN_SECRET_CACHE: LazyLock<RwLock<ApiTokenSecretCache>> = LazyLock::new
RwLock::new(ApiTokenSecretCache {
secrets: HashMap::new(),
shared_gen: 0,
+ shadow: None,
})
});
@@ -45,6 +50,56 @@ fn write_file(data: HashMap<Authid, String>) -> Result<(), Error> {
replace_config(token_shadow(), &json)
}
+/// Refreshes the in-memory cache if the on-disk token.shadow file changed.
+/// Returns true if the cache is valid to use, false if not.
+fn refresh_cache_if_file_changed() -> bool {
+ let now = epoch_i64();
+
+ // Best-effort refresh under write lock.
+ let Some(mut cache) = TOKEN_SECRET_CACHE.try_write() else {
+ return false;
+ };
+
+ let Some(shared_gen_now) = token_shadow_shared_gen() else {
+ return false;
+ };
+
+ // If another process bumped the generation, we don't know what changed -> clear cache
+ if cache.shared_gen != shared_gen_now {
+ invalidate_cache_state_and_set_gen(&mut cache, shared_gen_now);
+ }
+
+ // Stat the file to detect manual edits.
+ let Ok((new_mtime, new_len)) = shadow_mtime_len() else {
+ return false;
+ };
+
+ // If the file didn't change, only update last_checked
+ if let Some(shadow) = cache.shadow.as_mut() {
+ if shadow.mtime == new_mtime && shadow.len == new_len {
+ shadow.last_checked = now;
+ return true;
+ }
+ }
+
+ cache.secrets.clear();
+
+ let prev = cache.shadow.replace(ShadowFileInfo {
+ mtime: new_mtime,
+ len: new_len,
+ last_checked: now,
+ });
+
+ if prev.is_some() {
+ // Best-effort propagation to other processes if a change was detected
+ if let Some(shared_gen_new) = bump_token_shadow_shared_gen() {
+ cache.shared_gen = shared_gen_new;
+ }
+ }
+
+ false
+}
+
/// Verifies that an entry for given tokenid / API token secret exists
pub fn verify_secret(tokenid: &Authid, secret: &str) -> Result<(), Error> {
if !tokenid.is_token() {
@@ -52,7 +107,7 @@ pub fn verify_secret(tokenid: &Authid, secret: &str) -> Result<(), Error> {
}
// Fast path
- if cache_try_secret_matches(tokenid, secret) {
+ if refresh_cache_if_file_changed() && cache_try_secret_matches(tokenid, secret) {
return Ok(());
}
@@ -84,12 +139,15 @@ pub fn set_secret(tokenid: &Authid, secret: &str) -> Result<(), Error> {
let guard = lock_config()?;
+ // Capture state before we write to detect external edits.
+ let pre_meta = shadow_mtime_len().unwrap_or((None, None));
+
let mut data = read_file()?;
let hashed_secret = proxmox_sys::crypt::encrypt_pw(secret)?;
data.insert(tokenid.clone(), hashed_secret);
write_file(data)?;
- apply_api_mutation(guard, tokenid, Some(secret));
+ apply_api_mutation(guard, tokenid, Some(secret), pre_meta);
Ok(())
}
@@ -102,11 +160,14 @@ pub fn delete_secret(tokenid: &Authid) -> Result<(), Error> {
let guard = lock_config()?;
+ // Capture state before we write to detect external edits.
+ let pre_meta = shadow_mtime_len().unwrap_or((None, None));
+
let mut data = read_file()?;
data.remove(tokenid);
write_file(data)?;
- apply_api_mutation(guard, tokenid, None);
+ apply_api_mutation(guard, tokenid, None, pre_meta);
Ok(())
}
@@ -128,6 +189,8 @@ struct ApiTokenSecretCache {
secrets: HashMap<Authid, CachedSecret>,
/// Shared generation to detect mutations of the underlying token.shadow file.
shared_gen: usize,
+ /// Shadow file info to detect changes
+ shadow: Option<ShadowFileInfo>,
}
/// Cached secret.
@@ -135,6 +198,16 @@ struct CachedSecret {
secret: String,
}
+/// Shadow file info
+struct ShadowFileInfo {
+ // shadow file mtime to detect changes
+ mtime: Option<SystemTime>,
+ // shadow file length to detect changes
+ len: Option<u64>,
+ // last time the file metadata was checked
+ last_checked: i64,
+}
+
fn cache_try_insert_secret(tokenid: Authid, secret: String, shared_gen_before: usize) {
let Some(mut cache) = TOKEN_SECRET_CACHE.try_write() else {
return;
@@ -179,7 +252,14 @@ fn cache_try_secret_matches(tokenid: &Authid, secret: &str) -> bool {
false
}
-fn apply_api_mutation(_guard: BackupLockGuard, tokenid: &Authid, new_secret: Option<&str>) {
+fn apply_api_mutation(
+ _guard: ApiLockGuard,
+ tokenid: &Authid,
+ new_secret: Option<&str>,
+ pre_write_meta: (Option<SystemTime>, Option<u64>),
+) {
+ let now = epoch_i64();
+
// Signal cache invalidation to other processes (best-effort).
let bumped_gen = bump_token_shadow_shared_gen();
@@ -198,6 +278,16 @@ fn apply_api_mutation(_guard: BackupLockGuard, tokenid: &Authid, new_secret: Opt
return;
}
+ // If our cached file metadata does not match the on-disk state before our write,
+ // we likely missed an external/manual edit. We can no longer trust any cached secrets.
+ if cache
+ .shadow
+ .as_ref()
+ .is_some_and(|s| (s.mtime, s.len) != pre_write_meta)
+ {
+ cache.secrets.clear();
+ }
+
// Update to the post-mutation generation.
cache.shared_gen = current_gen;
@@ -215,6 +305,22 @@ fn apply_api_mutation(_guard: BackupLockGuard, tokenid: &Authid, new_secret: Opt
cache.secrets.remove(tokenid);
}
}
+
+ // Update our view of the file metadata to the post-write state (best-effort).
+ // (If this fails, drop local cache so callers fall back to slow path until refreshed.)
+ match shadow_mtime_len() {
+ Ok((mtime, len)) => {
+ cache.shadow = Some(ShadowFileInfo {
+ mtime,
+ len,
+ last_checked: now,
+ });
+ }
+ Err(_) => {
+ // If we cannot validate state, do not trust cache.
+ invalidate_cache_state_and_set_gen(&mut cache, current_gen);
+ }
+ }
}
/// Get the current shared generation.
@@ -234,4 +340,13 @@ fn bump_token_shadow_shared_gen() -> Option<usize> {
fn invalidate_cache_state_and_set_gen(cache: &mut ApiTokenSecretCache, gen: usize) {
cache.secrets.clear();
cache.shared_gen = gen;
+ cache.shadow = None;
+}
+
+fn shadow_mtime_len() -> Result<(Option<SystemTime>, Option<u64>), Error> {
+ match fs::metadata(token_shadow()) {
+ Ok(meta) => Ok((meta.modified().ok(), Some(meta.len()))),
+ Err(e) if e.kind() == ErrorKind::NotFound => Ok((None, None)),
+ Err(e) => Err(e.into()),
+ }
}
--
2.47.3
* [pbs-devel] superseded: [PATCH proxmox{-backup, , -datacenter-manager} v3 00/10] token-shadow: reduce api token verification overhead
` (3 preceding siblings ...)
@ 2026-01-21 15:15 13% ` Samuel Rufinatscha
4 siblings, 0 replies; 39+ results
From: Samuel Rufinatscha @ 2026-01-21 15:15 UTC (permalink / raw)
To: pbs-devel
https://lore.proxmox.com/pbs-devel/20260121151408.731516-1-s.rufinatscha@proxmox.com/T/#t
On 1/2/26 5:07 PM, Samuel Rufinatscha wrote:
> Hi,
>
> this series improves the performance of token-based API authentication
> in PBS (pbs-config) and in PDM (underlying proxmox-access-control
> crate), addressing the API token verification hotspot reported in our
> bugtracker #7017 [1].
>
> When profiling PBS /status endpoint with cargo flamegraph [2],
> token-based authentication showed up as a dominant hotspot via
> proxmox_sys::crypt::verify_crypt_pw. Applying this series removes that
> path from the hot section of the flamegraph. The same performance issue
> was measured [2] for PDM. PDM uses the underlying shared
> proxmox-access-control library for token handling, which is a
> factored out version of the token.shadow handling code from PBS.
>
> While this series fixes the immediate performance issue both in PBS
> (pbs-config) and in the shared proxmox-access-control crate used by
> PDM, PBS should eventually, ideally be refactored, in a separate
> effort, to use proxmox-access-control for token handling instead of its
> local implementation.
>
> Problem
>
> For token-based API requests, both PBS’s pbs-config token.shadow
> handling and PDM proxmox-access-control’s token.shadow handling
> currently:
>
> 1. read the token.shadow file on each request
> 2. deserialize it into a HashMap<Authid, String>
> 3. run password hash verification via
> proxmox_sys::crypt::verify_crypt_pw for the provided token secret
>
> Under load, this results in significant CPU usage spent in repeated
> password hashing for the same token+secret pairs. The attached
> flamegraphs for PBS [2] and PDM [3] show
> proxmox_sys::crypt::verify_crypt_pw dominating the hot path.
>
> Approach
>
> The goal is to reduce the cost of token-based authentication while
> preserving the existing token handling semantics (including detecting
> manual edits to token.shadow) and staying consistent between PBS
> (pbs-config) and PDM (proxmox-access-control). For both code bases,
> this series proposes to:
>
> 1. Introduce an in-memory cache for verified token secrets and
> invalidate it through a shared ConfigVersionCache generation. Note, a
> shared generation is required to keep privileged and unprivileged
> daemon in sync to avoid caching inconsistencies across processes.
> 2. Invalidate on token.shadow file API changes (set_secret,
> delete_secret)
> 3. Invalidate on direct/manual token.shadow file changes (mtime +
> length)
> 4. Avoid per-request file stat calls using a TTL window
>
> Testing
>
> *PBS (pbs-config)*
>
> To verify the effect in PBS, I:
> 1. Set up test environment based on latest PBS ISO, installed Rust
> toolchain, cloned proxmox-backup repository to use with cargo
> flamegraph. Reproduced bug #7017 [1] by profiling the /status
> endpoint with token-based authentication using cargo flamegraph [2].
> 2. Built PBS with pbs-config patches and re-ran the same workload and
> profiling setup. Confirmed that
> proxmox_sys::crypt::verify_crypt_pw path no longer appears in the
> hot section of the flamegraph. CPU usage is now dominated by TLS
> overhead.
> 3. Functionally, I verified that:
> * valid tokens authenticate correctly when used in API requests
> * invalid secrets are rejected as before
> * generating a new token secret via dashboard (create token for user,
> regenerate existing secret) works and authenticates correctly
>
> *PDM (proxmox-access-control)*
>
> To verify the effect in PDM, I followed a similar testing approach.
> Instead of PBS’ /status, I profiled the /version endpoint with cargo
> flamegraph [2] and verified that the expensive hashing path disappears
> from the hot section after introducing caching.
>
> Functionally, I verified that:
> * valid tokens authenticate correctly when used in API requests
> * invalid secrets are rejected as before
> * generating a new token secret via dashboard (create token for user,
> regenerate existing secret) works and authenticates correctly
>
> Benchmarks:
>
> Two different benchmarks have been run to measure caching effects
> and RwLock contention:
>
> (1) Requests per second for PBS /status endpoint (E2E)
>
> Benchmarked parallel token auth requests for
> /status?verbose=0 on top of the datastore lookup cache series [4]
> to check throughput impact. With datastores=1, repeat=5000, parallel=16
> this series gives ~172 req/s compared to ~65 req/s without it.
> This is a ~2.6x improvement (and aligns with the ~179 req/s from the
> previous series, which used per-process cache invalidation).
>
> (2) RwLock contention for token create/delete under heavy load of
> token-authenticated requests
>
> The previous version of the series compared std::sync::RwLock and
> parking_lot::RwLock contention for token create/delete under heavy
> parallel token-authenticated readers. parking_lot::RwLock has been
> chosen for the added fairness guarantees.
>
> Patch summary
>
> pbs-config:
>
> 0001 – pbs-config: add token.shadow generation to ConfigVersionCache
> Extends ConfigVersionCache to provide a process-shared generation
> number for token.shadow changes.
>
> 0002 – pbs-config: cache verified API token secrets
> Adds an in-memory cache for verified, plain-text API token secrets.
> The cache is invalidated through the process-shared ConfigVersionCache
> generation number. Uses openssl’s constant-time memcmp for matching
> secrets.
>
> 0003 – pbs-config: invalidate token-secret cache on token.shadow
> changes
> On each token verification request, stats token.shadow (mtime and
> length) and clears the cache when the file changes.
>
> 0004 – pbs-config: add TTL window to token-secret cache
> Introduces a TTL (TOKEN_SECRET_CACHE_TTL_SECS, default 60) for metadata
> checks so that fs::metadata calls are not performed on each request.
>
> proxmox-access-control:
>
> 0005 – access-control: extend AccessControlConfig for token.shadow invalidation
>
> Extends the AccessControlConfig trait with
> token_shadow_cache_generation() and
> increment_token_shadow_cache_generation() so that
> proxmox-access-control can get the shared token.shadow generation
> number and bump it on token.shadow changes.
>
> 0006 – access-control: cache verified API token secrets
> Mirrors PBS PATCH 0002.
>
> 0007 – access-control: invalidate token-secret cache on token.shadow changes
> Mirrors PBS PATCH 0003.
>
> 0008 – access-control: add TTL window to token-secret cache
> Mirrors PBS PATCH 0004.
>
> proxmox-datacenter-manager:
>
> 0009 – pdm-config: add token.shadow generation to ConfigVersionCache
> Extends PDM ConfigVersionCache and implements
> token_shadow_cache_generation() and
> increment_token_shadow_cache_generation() from AccessControlConfig for
> PDM.
>
> 0010 – docs: document API token-cache TTL effects
> Documents the effects of the TTL window on token.shadow edits.
>
> Changes from v1 to v2:
>
> * (refactor) Switched cache initialization to LazyLock
> * (perf) Use parking_lot::RwLock and best-effort cache access on the
> read/refresh path (try_read/try_write) to avoid lock contention
> * (doc) Document TTL-delayed effect of manual token.shadow edits
> * (fix) Add generation guards (API_MUTATION_GENERATION +
> FILE_GENERATION) to prevent caching across concurrent set/delete and
> external edits
>
> Changes from v2 to v3:
>
> * (refactor) Replace PBS per-process cache invalidation with a
> cross-process token.shadow generation based on PBS
> ConfigVersionCache, ensuring cache consistency between privileged
> and unprivileged daemons.
> * (refactor) Decouple the generation source from the
> proxmox/proxmox-access-control cache implementation: extend
> AccessControlConfig hooks so that products can provide the shared
> token.shadow generation source.
> * (refactor) Extend PDM's ConfigVersionCache with
> token_shadow_generation
> and introduce a pdm_config::AccessControlConfig wrapper implementing
> the new proxmox-access-control trait hooks. Switch server and CLI
> initialization to use pdm_config::AccessControlConfig instead of
> pdm_api_types::AccessControlConfig.
> * (refactor) Adapt generation checks around cached-secret comparison to
> use the new shared generation source.
> * (fix/logic) cache_try_insert_secret: Update the local cache
> generation if stale, allowing the new secret to be inserted
> immediately
> * (refactor) Extract cache invalidation logic into an
> invalidate_cache_state helper to reduce duplication and ensure
> consistent state resets
> * (refactor) Simplify refresh_cache_if_file_changed: handle the
> uninitialized/reset state and adjust the generation mismatch
> path to ensure file metadata is always re-read.
> * (doc) Clarify TTL-delayed effects of manual token.shadow edits.
>
> Please see the patch specific changelogs for more details.
>
> Thanks for considering this patch series, I look forward to your
> feedback.
>
> Best,
> Samuel Rufinatscha
>
> [1] https://bugzilla.proxmox.com/show_bug.cgi?id=7017
> [2] attachment 1767 [1]: Flamegraph showing the proxmox_sys::crypt::verify_crypt_pw stack
> [3] attachment 1794 [1]: Flamegraph PDM baseline
> [4] https://bugzilla.proxmox.com/show_bug.cgi?id=6049
>
> proxmox-backup:
>
> Samuel Rufinatscha (4):
> pbs-config: add token.shadow generation to ConfigVersionCache
> pbs-config: cache verified API token secrets
> pbs-config: invalidate token-secret cache on token.shadow changes
> pbs-config: add TTL window to token secret cache
>
> Cargo.toml | 1 +
> docs/user-management.rst | 4 +
> pbs-config/Cargo.toml | 1 +
> pbs-config/src/config_version_cache.rs | 18 ++
> pbs-config/src/token_shadow.rs | 298 ++++++++++++++++++++++++-
> 5 files changed, 321 insertions(+), 1 deletion(-)
>
>
> proxmox:
>
> Samuel Rufinatscha (4):
> proxmox-access-control: extend AccessControlConfig for token.shadow
> invalidation
> proxmox-access-control: cache verified API token secrets
> proxmox-access-control: invalidate token-secret cache on token.shadow
> changes
> proxmox-access-control: add TTL window to token secret cache
>
> Cargo.toml | 1 +
> proxmox-access-control/Cargo.toml | 1 +
> proxmox-access-control/src/init.rs | 17 ++
> proxmox-access-control/src/token_shadow.rs | 299 ++++++++++++++++++++-
> 4 files changed, 317 insertions(+), 1 deletion(-)
>
>
> proxmox-datacenter-manager:
>
> Samuel Rufinatscha (2):
> pdm-config: implement token.shadow generation
> docs: document API token-cache TTL effects
>
> cli/admin/src/main.rs | 2 +-
> docs/access-control.rst | 4 ++
> lib/pdm-config/Cargo.toml | 1 +
> lib/pdm-config/src/access_control_config.rs | 73 +++++++++++++++++++++
> lib/pdm-config/src/config_version_cache.rs | 18 +++++
> lib/pdm-config/src/lib.rs | 2 +
> server/src/acl.rs | 3 +-
> 7 files changed, 100 insertions(+), 3 deletions(-)
> create mode 100644 lib/pdm-config/src/access_control_config.rs
>
>
> Summary over all repositories:
> 16 files changed, 738 insertions(+), 5 deletions(-)
>
_______________________________________________
pbs-devel mailing list
pbs-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pbs-devel
* Re: [pve-devel] [PATCH pve-cluster 01/15] pmxcfs-rs: add workspace and pmxcfs-api-types crate
@ 2026-01-23 14:17 6% ` Samuel Rufinatscha
2026-01-26 9:00 6% ` Kefu Chai
0 siblings, 1 reply; 39+ results
From: Samuel Rufinatscha @ 2026-01-23 14:17 UTC (permalink / raw)
To: Proxmox VE development discussion, Kefu Chai
Thanks for the series. I’ve started reviewing patches 1–6; sending
notes for patch 1 first, and I’ll follow up with comments on the
others once I’ve gone through them in more depth.
comments inline
On 1/6/26 3:25 PM, Kefu Chai wrote:
> Initialize the Rust workspace for the pmxcfs rewrite project.
>
> Add pmxcfs-api-types crate which provides foundational types:
> - PmxcfsError: Error type with errno mapping for FUSE operations
> - FuseMessage: Filesystem operation messages
> - KvStoreMessage: Status synchronization messages
> - ApplicationMessage: Wrapper enum for both message types
> - VmType: VM type enum (Qemu, Lxc)
>
> This is the foundation crate with no internal dependencies, only
> requiring thiserror and libc. All other crates will depend on these
> shared type definitions.
>
> Signed-off-by: Kefu Chai <k.chai@proxmox.com>
> ---
> src/pmxcfs-rs/Cargo.lock | 2067 +++++++++++++++++++++
Following the .gitignore pattern in our other repos, Cargo.lock is
ignored, so I’d suggest dropping it from the series.
> src/pmxcfs-rs/Cargo.toml | 83 +
> src/pmxcfs-rs/pmxcfs-api-types/Cargo.toml | 19 +
> src/pmxcfs-rs/pmxcfs-api-types/README.md | 105 ++
> src/pmxcfs-rs/pmxcfs-api-types/src/lib.rs | 152 ++
> 5 files changed, 2426 insertions(+)
> create mode 100644 src/pmxcfs-rs/Cargo.lock
> create mode 100644 src/pmxcfs-rs/Cargo.toml
> create mode 100644 src/pmxcfs-rs/pmxcfs-api-types/Cargo.toml
> create mode 100644 src/pmxcfs-rs/pmxcfs-api-types/README.md
> create mode 100644 src/pmxcfs-rs/pmxcfs-api-types/src/lib.rs
>
> diff --git a/src/pmxcfs-rs/Cargo.lock b/src/pmxcfs-rs/Cargo.lock
[..]
> +++ b/src/pmxcfs-rs/Cargo.toml
> @@ -0,0 +1,83 @@
> +# Workspace root for pmxcfs Rust implementation
> +[workspace]
> +members = [
> + "pmxcfs-api-types", # Shared types and error definitions
> +]
> +resolver = "2"
> +
> +[workspace.package]
> +version = "9.0.6"
> +edition = "2024"
> +authors = ["Proxmox Support Team <support@proxmox.com>"]
> +license = "AGPL-3.0"
> +repository = "https://git.proxmox.com/?p=pve-cluster.git"
> +rust-version = "1.85"
> +
> +[workspace.dependencies]
Here we already declare workspace path deps for crates that aren’t
present yet (pmxcfs-config, pmxcfs-memdb, ...). For bisectability,
could we keep this patch minimal and add those workspace
members/path deps in the patches where the crates are introduced?
> +# Internal workspace dependencies
> +pmxcfs-api-types = { path = "pmxcfs-api-types" }
> +pmxcfs-config = { path = "pmxcfs-config" }
> +pmxcfs-memdb = { path = "pmxcfs-memdb" }
> +pmxcfs-dfsm = { path = "pmxcfs-dfsm" }
> +pmxcfs-rrd = { path = "pmxcfs-rrd" }
> +pmxcfs-status = { path = "pmxcfs-status" }
> +pmxcfs-ipc = { path = "pmxcfs-ipc" }
> +pmxcfs-services = { path = "pmxcfs-services" }
> +pmxcfs-logger = { path = "pmxcfs-logger" }
> +
> +# Core async runtime
> +tokio = { version = "1.35", features = ["full"] }
> +tokio-util = "0.7"
> +async-trait = "0.1"
> +
If the goal is to centrally pin external crate versions early, maybe
limit [workspace.dependencies] here generally to the crates actually
used by pmxcfs-api-types (thiserror, libc) and extend as new crates
are added.
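To illustrate both points at once, the workspace manifest for this first patch could hypothetically shrink to something like:

```toml
# Workspace root for pmxcfs Rust implementation
[workspace]
members = [
    "pmxcfs-api-types", # Shared types and error definitions
]
resolver = "2"

[workspace.dependencies]
# Internal workspace dependencies (extended as crates are introduced)
pmxcfs-api-types = { path = "pmxcfs-api-types" }

# External dependencies actually used by pmxcfs-api-types
thiserror = "1.0"
libc = "0.2"
```

with the remaining path deps and external pins added in the later patches that first use them.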
> +# Error handling
> +anyhow = "1.0"
> +thiserror = "1.0"
> +
> +# Logging and tracing
> +tracing = "0.1"
> +tracing-subscriber = { version = "0.3", features = ["env-filter"] }
> +
> +# Serialization
> +serde = { version = "1.0", features = ["derive"] }
> +serde_json = "1.0"
> +bincode = "1.3"
> +
> +# Network and cluster
> +bytes = "1.5"
> +sha2 = "0.10"
> +bytemuck = { version = "1.14", features = ["derive"] }
> +
> +# System integration
> +libc = "0.2"
> +nix = { version = "0.27", features = ["fs", "process", "signal", "user", "socket"] }
> +users = "0.11"
> +
> +# Corosync/CPG bindings
> +rust-corosync = "0.1"
> +
> +# Enum conversions
> +num_enum = "0.7"
> +
> +# Concurrency primitives
> +parking_lot = "0.12"
> +
> +# Utilities
> +chrono = "0.4"
> +futures = "0.3"
> +
> +# Development dependencies
> +tempfile = "3.8"
> +
> +[workspace.lints.clippy]
> +uninlined_format_args = "warn"
> +
> +[profile.release]
> +lto = true
> +codegen-units = 1
> +opt-level = 3
> +strip = true
> +
> +[profile.dev]
> +opt-level = 1
> +debug = true
> diff --git a/src/pmxcfs-rs/pmxcfs-api-types/Cargo.toml b/src/pmxcfs-rs/pmxcfs-api-types/Cargo.toml
> new file mode 100644
> index 00000000..cdce7951
> --- /dev/null
> +++ b/src/pmxcfs-rs/pmxcfs-api-types/Cargo.toml
> @@ -0,0 +1,19 @@
> +[package]
> +name = "pmxcfs-api-types"
> +description = "Shared types and error definitions for pmxcfs"
> +
> +version.workspace = true
> +edition.workspace = true
> +authors.workspace = true
> +license.workspace = true
> +repository.workspace = true
> +
> +[lints]
> +workspace = true
> +
> +[dependencies]
> +# Error handling
> +thiserror.workspace = true
> +
> +# System integration
> +libc.workspace = true
> diff --git a/src/pmxcfs-rs/pmxcfs-api-types/README.md b/src/pmxcfs-rs/pmxcfs-api-types/README.md
> new file mode 100644
> index 00000000..da8304ae
> --- /dev/null
> +++ b/src/pmxcfs-rs/pmxcfs-api-types/README.md
> @@ -0,0 +1,105 @@
> +# pmxcfs-api-types
> +
> +**Shared Types and Error Definitions** for pmxcfs.
> +
> +This crate provides common types, error definitions, and message formats used across all pmxcfs crates. It serves as the "API contract" between different components.
> +
> +## Overview
> +
> +The crate contains:
> +- **Error types**: `PmxcfsError` with errno mapping for FUSE
> +- **Message types**: `FuseMessage`, `KvStoreMessage`, `ApplicationMessage`
These types and the mentioned serialization helpers aren’t part of this
diff; could you re-check both the README.md and the commit message so
they match?
> +- **Shared types**: `MemberInfo`, `NodeSyncInfo`
> +- **Serialization**: C-compatible wire format helpers
> +
> +## Error Types
> +
> +### PmxcfsError
> +
> +Type-safe error enum with automatic errno conversion.
> +
> +### errno Mapping
> +
> +Errors automatically convert to POSIX errno values for FUSE.
> +
> +| Error | errno | Value |
> +|-------|-------|-------|
> +| `NotFound` | `ENOENT` | 2 |
> +| `PermissionDenied` | `EPERM` | 1 |
> +| `AlreadyExists` | `EEXIST` | 17 |
> +| `NotADirectory` | `ENOTDIR` | 20 |
> +| `IsADirectory` | `EISDIR` | 21 |
> +| `DirectoryNotEmpty` | `ENOTEMPTY` | 39 |
> +| `FileTooLarge` | `EFBIG` | 27 |
> +| `ReadOnlyFilesystem` | `EROFS` | 30 |
> +| `NoQuorum` | `EACCES` | 13 |
> +| `Timeout` | `ETIMEDOUT` | 110 |
> +
> +## Message Types
> +
> +### FuseMessage
> +
> +Filesystem operations broadcast through the cluster (via DFSM). Uses C-compatible wire format compatible with `dcdb.c`.
> +
> +### KvStoreMessage
> +
> +Status and metrics synchronization (via kvstore DFSM). Uses C-compatible wire format.
> +
> +### ApplicationMessage
> +
> +Wrapper for either FuseMessage or KvStoreMessage, used by DFSM to handle both filesystem and status messages with type safety.
> +
> +## Shared Types
> +
> +### MemberInfo
> +
> +Cluster member information.
> +
> +### NodeSyncInfo
> +
> +DFSM synchronization state.
> +
> +## C to Rust Mapping
> +
> +### Error Handling
> +
> +**C Version (cfs-utils.h):**
> +- Return codes: `0` = success, negative = error
> +- errno-based error reporting
> +- Manual error checking everywhere
> +
> +**Rust Version:**
> +- `Result<T, PmxcfsError>` type
> +
> +### Message Types
> +
> +**C Version (dcdb.h):**
> +
> +**Rust Version:**
> +- Type-safe enums
> +
> +## Key Differences from C Implementation
> +
> +All message types have `serialize()` and `deserialize()` methods that produce byte-for-byte compatible formats with the C implementation.
> +
> +## Known Issues / TODOs
> +
> +### Missing Features
> +- None identified
> +
> +### Compatibility
> +- **Wire format**: 100% compatible with C implementation
> +- **errno values**: Match POSIX standards
> +- **Message types**: All C message types covered
> +
> +## References
> +
> +### C Implementation
> +- `src/pmxcfs/cfs-utils.h` - Utility types and error codes
> +- `src/pmxcfs/dcdb.h` - FUSE message types
> +- `src/pmxcfs/status.h` - KvStore message types
> +
> +### Related Crates
> +- **pmxcfs-dfsm**: Uses ApplicationMessage for cluster sync
> +- **pmxcfs-memdb**: Uses PmxcfsError for database operations
> +- **pmxcfs**: Uses FuseMessage for FUSE operations
> diff --git a/src/pmxcfs-rs/pmxcfs-api-types/src/lib.rs b/src/pmxcfs-rs/pmxcfs-api-types/src/lib.rs
> new file mode 100644
> index 00000000..ae0e5eb0
> --- /dev/null
> +++ b/src/pmxcfs-rs/pmxcfs-api-types/src/lib.rs
> @@ -0,0 +1,152 @@
> +use thiserror::Error;
> +
> +/// Error types for pmxcfs operations
> +#[derive(Error, Debug)]
> +pub enum PmxcfsError {
nit: the error-related parts could be moved into a dedicated error.rs
module
> + #[error("I/O error: {0}")]
> + Io(#[from] std::io::Error),
> +
> + #[error("Database error: {0}")]
> + Database(String),
> +
> + #[error("FUSE error: {0}")]
> + Fuse(String),
> +
> + #[error("Cluster error: {0}")]
> + Cluster(String),
> +
> + #[error("Corosync error: {0}")]
> + Corosync(String),
> +
> + #[error("Configuration error: {0}")]
> + Configuration(String),
> +
> + #[error("System error: {0}")]
> + System(String),
> +
> + #[error("IPC error: {0}")]
> + Ipc(String),
> +
> + #[error("Permission denied")]
> + PermissionDenied,
> +
> + #[error("Not found: {0}")]
> + NotFound(String),
> +
> + #[error("Already exists: {0}")]
> + AlreadyExists(String),
> +
> + #[error("Invalid argument: {0}")]
> + InvalidArgument(String),
> +
> + #[error("Not a directory: {0}")]
> + NotADirectory(String),
> +
> + #[error("Is a directory: {0}")]
> + IsADirectory(String),
> +
> + #[error("Directory not empty: {0}")]
> + DirectoryNotEmpty(String),
> +
> + #[error("No quorum")]
> + NoQuorum,
> +
> + #[error("Read-only filesystem")]
> + ReadOnlyFilesystem,
> +
> + #[error("File too large")]
> + FileTooLarge,
> +
> + #[error("Lock error: {0}")]
> + Lock(String),
> +
> + #[error("Timeout")]
> + Timeout,
> +
> + #[error("Invalid path: {0}")]
> + InvalidPath(String),
> +}
> +
> +impl PmxcfsError {
> + /// Convert error to errno value for FUSE operations
> + pub fn to_errno(&self) -> i32 {
> + match self {
> + PmxcfsError::NotFound(_) => libc::ENOENT,
> + PmxcfsError::PermissionDenied => libc::EPERM,
> + PmxcfsError::AlreadyExists(_) => libc::EEXIST,
> + PmxcfsError::NotADirectory(_) => libc::ENOTDIR,
> + PmxcfsError::IsADirectory(_) => libc::EISDIR,
> + PmxcfsError::DirectoryNotEmpty(_) => libc::ENOTEMPTY,
> + PmxcfsError::InvalidArgument(_) => libc::EINVAL,
> + PmxcfsError::FileTooLarge => libc::EFBIG,
> + PmxcfsError::ReadOnlyFilesystem => libc::EROFS,
> + PmxcfsError::NoQuorum => libc::EACCES,
> + PmxcfsError::Timeout => libc::ETIMEDOUT,
> + PmxcfsError::Io(e) => match e.raw_os_error() {
> + Some(errno) => errno,
> + None => libc::EIO,
> + },
> + _ => libc::EIO,
Please check against the C implementation, but:
"PermissionDenied" should likely map to EACCES rather than EPERM. In
FUSE/POSIX, EACCES is the standard return for file-permission denials,
whereas EPERM is usually reserved for administrative restrictions
(like ownership).
"InvalidPath" maps better to EINVAL: EIO suggests a hardware/disk
failure, whereas InvalidPath implies an argument issue.
Also, "Lock" should be mapped explicitly, e.g. to EBUSY (resource
busy / lock contention), or to EDEADLK (deadlock) / EAGAIN, depending
on the semantics.
In general, can we minimize the number of errors falling into the
generic EIO branch?
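As a rough sketch of the mapping I had in mind (standalone stand-in
enum, errno values hard-coded to the usual Linux numbers purely for
illustration; the real code would keep using libc):

```rust
// Usual Linux errno values, hard-coded only for this sketch.
const EACCES: i32 = 13; // permission denied (file permissions)
const EINVAL: i32 = 22; // invalid argument
const EBUSY: i32 = 16;  // device or resource busy
const EIO: i32 = 5;     // generic I/O error

// Minimal stand-in for the real PmxcfsError enum.
pub enum PmxcfsError {
    PermissionDenied,
    InvalidPath(String),
    Lock(String),
    Other,
}

impl PmxcfsError {
    pub fn to_errno(&self) -> i32 {
        match self {
            PmxcfsError::PermissionDenied => EACCES, // EACCES, not EPERM
            PmxcfsError::InvalidPath(_) => EINVAL,   // argument issue, not I/O
            PmxcfsError::Lock(_) => EBUSY,           // lock contention
            _ => EIO,                                // keep this branch small
        }
    }
}
```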
> + }
> + }
> +}
> +
> +/// Result type for pmxcfs operations
> +pub type Result<T> = std::result::Result<T, PmxcfsError>;
> +
> +/// VM/CT types
> +#[derive(Debug, Clone, Copy, PartialEq, Eq)]
If this is used in wire contexts, please add #[repr(u8)] to ensure a
stable ABI.
> +pub enum VmType {
> + Qemu = 1,
> + Lxc = 3,
There’s a gap between values 1 -> 3: is 2 reserved?
If so, maybe add a short comment.
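Concretely, something along these lines (the actual reason for the gap
would need to be confirmed and stated in the comment):

```rust
/// VM/CT types; explicit discriminants match the C wire values.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(u8)]
pub enum VmType {
    Qemu = 1,
    // 2 is intentionally skipped; if it is a reserved legacy value,
    // a comment (or placeholder variant) here would make that explicit.
    Lxc = 3,
}
```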
> +}
> +
> +impl VmType {
> + /// Returns the directory name where config files are stored
> + pub fn config_dir(&self) -> &'static str {
> + match self {
> + VmType::Qemu => "qemu-server",
> + VmType::Lxc => "lxc",
> + }
> + }
> +}
> +
> +impl std::fmt::Display for VmType {
> + fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
> + match self {
> + VmType::Qemu => write!(f, "qemu"),
> + VmType::Lxc => write!(f, "lxc"),
> + }
> + }
> +}
> +
> +/// VM/CT entry for vmlist
> +#[derive(Debug, Clone)]
> +pub struct VmEntry {
> + pub vmid: u32,
> + pub vmtype: VmType,
> + pub node: String,
> + /// Per-VM version counter (increments when this VM's config changes)
> + pub version: u32,
> +}
> +
> +/// Information about a cluster member
> +///
> +/// This is a shared type used by both cluster and DFSM modules
> +#[derive(Debug, Clone)]
> +pub struct MemberInfo {
> + pub node_id: u32,
> + pub pid: u32,
> + pub joined_at: u64,
> +}
> +
> +/// Node synchronization info for DFSM state sync
> +///
> +/// Used during DFSM synchronization to track which nodes have provided state
> +#[derive(Debug, Clone)]
> +pub struct NodeSyncInfo {
> + pub nodeid: u32,
We have "nodeid" here but "node_id" in MemberInfo; these should be
aligned.
> + pub pid: u32,
> + pub state: Option<Vec<u8>>,
> + pub synced: bool,
> +}
_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
* Re: [pve-devel] [PATCH pve-cluster 02/15] pmxcfs-rs: add pmxcfs-config crate
@ 2026-01-23 15:01 6% ` Samuel Rufinatscha
2026-01-26 9:43 5% ` Kefu Chai
0 siblings, 1 reply; 39+ results
From: Samuel Rufinatscha @ 2026-01-23 15:01 UTC (permalink / raw)
To: Proxmox VE development discussion, Kefu Chai
comments inline
On 1/6/26 3:25 PM, Kefu Chai wrote:
> Add configuration management crate that provides:
> - Config struct for runtime configuration
> - Node hostname, IP, and group ID tracking
> - Debug and local mode flags
> - Thread-safe configuration access via parking_lot Mutex
>
> This is a foundational crate with no internal dependencies, only
> requiring parking_lot for synchronization. Other crates will use
> this for accessing runtime configuration.
>
> Signed-off-by: Kefu Chai <k.chai@proxmox.com>
> ---
> src/pmxcfs-rs/Cargo.toml | 3 +-
> src/pmxcfs-rs/pmxcfs-config/Cargo.toml | 16 +
> src/pmxcfs-rs/pmxcfs-config/README.md | 127 +++++++
> src/pmxcfs-rs/pmxcfs-config/src/lib.rs | 471 +++++++++++++++++++++++++
> 4 files changed, 616 insertions(+), 1 deletion(-)
> create mode 100644 src/pmxcfs-rs/pmxcfs-config/Cargo.toml
> create mode 100644 src/pmxcfs-rs/pmxcfs-config/README.md
> create mode 100644 src/pmxcfs-rs/pmxcfs-config/src/lib.rs
>
> diff --git a/src/pmxcfs-rs/Cargo.toml b/src/pmxcfs-rs/Cargo.toml
> index 15d88f52..28e20bb7 100644
> --- a/src/pmxcfs-rs/Cargo.toml
> +++ b/src/pmxcfs-rs/Cargo.toml
> @@ -1,7 +1,8 @@
> # Workspace root for pmxcfs Rust implementation
> [workspace]
> members = [
> - "pmxcfs-api-types", # Shared types and error definitions
> + "pmxcfs-api-types", # Shared types and error definitions
> + "pmxcfs-config", # Configuration management
> ]
> resolver = "2"
>
> diff --git a/src/pmxcfs-rs/pmxcfs-config/Cargo.toml b/src/pmxcfs-rs/pmxcfs-config/Cargo.toml
> new file mode 100644
> index 00000000..f5a60995
> --- /dev/null
> +++ b/src/pmxcfs-rs/pmxcfs-config/Cargo.toml
> @@ -0,0 +1,16 @@
> +[package]
> +name = "pmxcfs-config"
> +description = "Configuration management for pmxcfs"
> +
> +version.workspace = true
> +edition.workspace = true
> +authors.workspace = true
> +license.workspace = true
> +repository.workspace = true
> +
> +[lints]
> +workspace = true
> +
> +[dependencies]
> +# Concurrency primitives
> +parking_lot.workspace = true
> diff --git a/src/pmxcfs-rs/pmxcfs-config/README.md b/src/pmxcfs-rs/pmxcfs-config/README.md
> new file mode 100644
> index 00000000..c06b2170
> --- /dev/null
> +++ b/src/pmxcfs-rs/pmxcfs-config/README.md
> @@ -0,0 +1,127 @@
> +# pmxcfs-config
> +
> +**Configuration Management** and **Cluster Services** for pmxcfs.
> +
> +This crate provides configuration structures and cluster integration services including quorum tracking and cluster configuration monitoring via Corosync APIs.
> +
> +## Overview
> +
> +This crate contains:
> +1. **Config struct**: Runtime configuration (node name, IPs, flags)
> +2. Integration with Corosync services (tracked in main pmxcfs crate):
> + - **QuorumService** (`pmxcfs/src/quorum_service.rs`) - Quorum monitoring
> + - **ClusterConfigService** (`pmxcfs/src/cluster_config_service.rs`) - Config tracking
This patch only contains the Config struct, not the Cluster Services
or QuorumService; please revisit the commit message and README.
> +
> +## Config Struct
> +
> +The `Config` struct holds daemon-wide configuration including node hostname, IP address, www-data group ID, debug flag, local mode flag, and cluster name.
> +
> +## Cluster Services
> +
> +The following services are implemented in the main pmxcfs crate but documented here for completeness.
> +
> +### QuorumService
> +
> +**C Equivalent:** `src/pmxcfs/quorum.c` - `service_quorum_new()`
> +**Rust Location:** `src/pmxcfs-rs/pmxcfs/src/quorum_service.rs`
> +
> +Monitors cluster quorum status via Corosync quorum API.
> +
> +#### Features
> +- Tracks quorum state (quorate/inquorate)
> +- Monitors member list changes
> +- Automatic reconnection on Corosync restart
> +- Updates `Status` quorum flag
> +
> +#### C to Rust Mapping
> +
> +| C Function | Rust Equivalent | Location |
> +|-----------|-----------------|----------|
> +| `service_quorum_new()` | `QuorumService::new()` | quorum_service.rs |
> +| `service_quorum_destroy()` | (Drop trait / finalize) | Automatic |
> +| `quorum_notification_fn` | quorum_notification closure | quorum_service.rs |
> +| `nodelist_notification_fn` | nodelist_notification closure | quorum_service.rs |
> +
> +#### Quorum Notifications
> +
> +The service monitors quorum state changes and member list changes, updating the Status accordingly.
> +
> +### ClusterConfigService
> +
> +**C Equivalent:** `src/pmxcfs/confdb.c` - `service_confdb_new()`
> +**Rust Location:** `src/pmxcfs-rs/pmxcfs/src/cluster_config_service.rs`
> +
> +Monitors Corosync cluster configuration (cmap) and tracks node membership.
> +
> +#### Features
> +- Monitors cluster membership via Corosync cmap API
> +- Tracks node additions/removals
> +- Registers nodes in Status
> +- Automatic reconnection on Corosync restart
> +
> +#### C to Rust Mapping
> +
> +| C Function | Rust Equivalent | Location |
> +|-----------|-----------------|----------|
> +| `service_confdb_new()` | `ClusterConfigService::new()` | cluster_config_service.rs |
> +| `service_confdb_destroy()` | (Drop trait / finalize) | Automatic |
> +| `confdb_track_fn` | (direct cmap queries) | Different approach |
> +
> +#### Configuration Tracking
> +
> +The service monitors:
> +- `nodelist.node.*.nodeid` - Node IDs
> +- `nodelist.node.*.name` - Node names
> +- `nodelist.node.*.ring*_addr` - Node IP addresses
> +
> +Updates `Status` with current cluster membership.
> +
> +## Key Differences from C Implementation
> +
> +### Cluster Config Service API
> +
> +**C Version (confdb.c):**
> +- Uses deprecated confdb API
> +- Track changes via confdb notifications
> +
> +**Rust Version:**
> +- Uses modern cmap API
> +- Direct cmap queries
> +
> +Both read the same data, but Rust uses the modern Corosync API.
> +
> +### Service Integration
> +
> +**C Version:**
> +- qb_loop manages lifecycle
> +
> +**Rust Version:**
> +- Service trait abstracts lifecycle
> +- ServiceManager handles retry
> +- Tokio async dispatch
> +
> +## Known Issues / TODOs
> +
> +### Compatibility
> +- **Quorum tracking**: Compatible with C implementation
> +- **Node registration**: Equivalent behavior
> +- **cmap vs confdb**: Rust uses modern cmap API (C uses deprecated confdb)
> +
> +### Missing Features
> +- None identified
> +
> +### Behavioral Differences (Benign)
> +- **API choice**: Rust uses cmap, C uses confdb (both read same data)
> +- **Lifecycle**: Rust uses Service trait, C uses manual lifecycle
> +
> +## References
> +
> +### C Implementation
> +- `src/pmxcfs/quorum.c` / `quorum.h` - Quorum service
> +- `src/pmxcfs/confdb.c` / `confdb.h` - Cluster config service
> +
> +### Related Crates
> +- **pmxcfs**: Main daemon with QuorumService and ClusterConfigService
> +- **pmxcfs-status**: Status tracking updated by these services
> +- **pmxcfs-services**: Service framework used by both services
> +- **rust-corosync**: Corosync FFI bindings
> diff --git a/src/pmxcfs-rs/pmxcfs-config/src/lib.rs b/src/pmxcfs-rs/pmxcfs-config/src/lib.rs
> new file mode 100644
> index 00000000..5e1ee1b2
> --- /dev/null
> +++ b/src/pmxcfs-rs/pmxcfs-config/src/lib.rs
> @@ -0,0 +1,471 @@
> +use parking_lot::RwLock;
> +use std::sync::Arc;
> +
> +/// Global configuration for pmxcfs
> +pub struct Config {
> + /// Node name (hostname without domain)
> + pub nodename: String,
> +
> + /// Node IP address
> + pub node_ip: String,
Consider using std::net::IpAddr (or SocketAddr if a port is part of the
value). Tests currently mix IP vs IP:PORT, so it’s unclear what node_ip
is supposed to represent.
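For illustration, parsing up front would catch the mismatch early
(helper name hypothetical):

```rust
use std::net::IpAddr;

// Parsing at construction time makes the intended shape explicit and
// rejects accidental "IP:PORT" strings before they are stored.
fn parse_node_ip(s: &str) -> Result<IpAddr, std::net::AddrParseError> {
    s.parse()
}
```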
> +
> + /// www-data group ID for file permissions
> + pub www_data_gid: u32,
> +
> + /// Debug mode enabled
> + pub debug: bool,
> +
> + /// Force local mode (no clustering)
> + pub local_mode: bool,
> +
> + /// Cluster name (CPG group name)
> + pub cluster_name: String,
> +
> + /// Debug level (0 = normal, 1+ = debug) - mutable at runtime
> + debug_level: RwLock<u8>,
In the crate docs it says “The Config struct uses Arc<AtomicU8> for
debug_level”, but the implementation uses parking_lot::RwLock<u8>.
Unless we need lock coupling with other fields, an AtomicU8 would
likely be sufficient (and cheaper) for debug_level. Also, please
re-check the commit message, which mentions parking_lot::Mutex.
> +}
> +
> +impl Clone for Config {
> + fn clone(&self) -> Self {
> + Self {
> + nodename: self.nodename.clone(),
> + node_ip: self.node_ip.clone(),
> + www_data_gid: self.www_data_gid,
> + debug: self.debug,
> + local_mode: self.local_mode,
> + cluster_name: self.cluster_name.clone(),
> + debug_level: RwLock::new(*self.debug_level.read()),
> + }
> + }
> +}
> +
> +impl std::fmt::Debug for Config {
> + fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
> + f.debug_struct("Config")
> + .field("nodename", &self.nodename)
> + .field("node_ip", &self.node_ip)
> + .field("www_data_gid", &self.www_data_gid)
> + .field("debug", &self.debug)
> + .field("local_mode", &self.local_mode)
> + .field("cluster_name", &self.cluster_name)
> + .field("debug_level", &*self.debug_level.read())
> + .finish()
> + }
> +}
> +
> +impl Config {
> + pub fn new(
> + nodename: String,
> + node_ip: String,
> + www_data_gid: u32,
> + debug: bool,
> + local_mode: bool,
> + cluster_name: String,
> + ) -> Arc<Self> {
The constructor returns Arc<Config>. I think we could keep
new() -> Self and provide a convenience constructor
shared() -> Arc<Self>. This would allow local usage (e.g. in tests)
without heap-allocating the struct.
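Something like (trimmed-down Config, field set hypothetical):

```rust
use std::sync::Arc;

// Illustrative subset of the real Config struct.
pub struct Config {
    pub nodename: String,
}

impl Config {
    /// Plain constructor, usable in tests without an Arc.
    pub fn new(nodename: String) -> Self {
        Self { nodename }
    }

    /// Convenience constructor for the shared case.
    pub fn shared(nodename: String) -> Arc<Self> {
        Arc::new(Self::new(nodename))
    }
}
```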
> + let debug_level = if debug { 1 } else { 0 };
debug_level is derived from debug at creation time, but thereafter
set_debug_level() does not update debug, so is_debug() would continue
to reflect the initial flag rather than the effective debug level.
is_debug() should just be a helper that returns self.debug_level() > 0,
and the debug field should probably be removed entirely.
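i.e. roughly this (a sketch using std's RwLock so the snippet stands
alone; the patch's parking_lot guard has no unwrap()):

```rust
use std::sync::RwLock;

// Sketch: no separate `debug` flag; is_debug() is derived from the
// effective level, so it stays consistent after set_debug_level().
struct Config {
    debug_level: RwLock<u8>,
}

impl Config {
    fn debug_level(&self) -> u8 {
        *self.debug_level.read().unwrap()
    }

    fn set_debug_level(&self, level: u8) {
        *self.debug_level.write().unwrap() = level;
    }

    fn is_debug(&self) -> bool {
        self.debug_level() > 0
    }
}
```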
> + Arc::new(Self {
> + nodename,
> + node_ip,
> + www_data_gid,
> + debug,
> + local_mode,
> + cluster_name,
> + debug_level: RwLock::new(debug_level),
> + })
> + }
> +
> + pub fn cluster_name(&self) -> &str {
> + &self.cluster_name
> + }
> +
> + pub fn nodename(&self) -> &str {
> + &self.nodename
> + }
> +
> + pub fn node_ip(&self) -> &str {
> + &self.node_ip
> + }
> +
> + pub fn www_data_gid(&self) -> u32 {
> + self.www_data_gid
> + }
> +
> + pub fn is_debug(&self) -> bool {
> + self.debug
> + }
> +
> + pub fn is_local_mode(&self) -> bool {
> + self.local_mode
> + }
> +
> + /// Get current debug level (0 = normal, 1+ = debug)
> + pub fn debug_level(&self) -> u8 {
> + *self.debug_level.read()
> + }
> +
> + /// Set debug level (0 = normal, 1+ = debug)
> + pub fn set_debug_level(&self, level: u8) {
> + *self.debug_level.write() = level;
> + }
Right now most fields are pub, but getters are also exposed. This will
make it harder to enforce invariants. I would suggest making the fields
private and keeping the getters, or keeping the fields public and
dropping the getters.
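If the fields stay private, the constructor can enforce invariants
(sketch with a hypothetical non-empty-name invariant, trimmed to one
field):

```rust
// Sketch: private field + getter; the constructor can now validate
// input (the non-empty check is a hypothetical example).
struct Config {
    cluster_name: String,
}

impl Config {
    fn new(cluster_name: String) -> Result<Self, String> {
        if cluster_name.is_empty() {
            return Err("cluster_name must not be empty".to_string());
        }
        Ok(Self { cluster_name })
    }

    fn cluster_name(&self) -> &str {
        &self.cluster_name
    }
}
```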
> +}
> +
> +#[cfg(test)]
> +mod tests {
> + //! Unit tests for Config struct
> + //!
> + //! This test module provides comprehensive coverage for:
> + //! - Configuration creation and initialization
> + //! - Getter methods for all configuration fields
> + //! - Debug level mutation and thread safety
> + //! - Concurrent access patterns (reads and writes)
> + //! - Clone independence
> + //! - Debug formatting
> + //! - Edge cases (empty strings, long strings, special characters, unicode)
> + //!
> + //! ## Thread Safety
> + //!
> + //! The Config struct uses `Arc<AtomicU8>` for debug_level to allow
> + //! safe concurrent reads and writes. Tests verify:
> + //! - 10 threads × 100 operations (concurrent modifications)
> + //! - 20 threads × 1000 operations (concurrent reads)
> + //!
> + //! ## Edge Cases
> + //!
> + //! Tests cover various edge cases including:
> + //! - Empty strings for node/cluster names
> + //! - Long strings (1000+ characters)
> + //! - Special characters in strings
> + //! - Unicode support (emoji, non-ASCII characters)
> +
> + use super::*;
> + use std::thread;
> +
> + // ===== Basic Construction Tests =====
> +
> + #[test]
> + fn test_config_creation() {
> + let config = Config::new(
> + "node1".to_string(),
> + "192.168.1.10".to_string(),
> + 33,
> + false,
> + false,
> + "pmxcfs".to_string(),
> + );
> +
> + assert_eq!(config.nodename(), "node1");
> + assert_eq!(config.node_ip(), "192.168.1.10");
> + assert_eq!(config.www_data_gid(), 33);
> + assert!(!config.is_debug());
> + assert!(!config.is_local_mode());
> + assert_eq!(config.cluster_name(), "pmxcfs");
> + assert_eq!(
> + config.debug_level(),
> + 0,
> + "Debug level should be 0 when debug is false"
> + );
> + }
> +
> + #[test]
> + fn test_config_creation_with_debug() {
> + let config = Config::new(
> + "node2".to_string(),
> + "10.0.0.5".to_string(),
> + 1000,
> + true,
> + false,
> + "test-cluster".to_string(),
> + );
> +
> + assert!(config.is_debug());
> + assert_eq!(
> + config.debug_level(),
> + 1,
> + "Debug level should be 1 when debug is true"
> + );
> + }
> +
> + #[test]
> + fn test_config_creation_local_mode() {
> + let config = Config::new(
> + "localhost".to_string(),
> + "127.0.0.1".to_string(),
> + 33,
> + false,
> + true,
> + "local".to_string(),
> + );
> +
> + assert!(config.is_local_mode());
> + assert!(!config.is_debug());
> + }
> +
> + // ===== Getter Tests =====
> +
> + #[test]
> + fn test_all_getters() {
> + let config = Config::new(
> + "testnode".to_string(),
> + "172.16.0.1".to_string(),
> + 999,
> + true,
> + true,
> + "my-cluster".to_string(),
> + );
> +
> + // Test all getter methods
> + assert_eq!(config.nodename(), "testnode");
> + assert_eq!(config.node_ip(), "172.16.0.1");
> + assert_eq!(config.www_data_gid(), 999);
> + assert!(config.is_debug());
> + assert!(config.is_local_mode());
> + assert_eq!(config.cluster_name(), "my-cluster");
> + assert_eq!(config.debug_level(), 1);
> + }
> +
> + // ===== Debug Level Mutation Tests =====
> +
> + #[test]
> + fn test_debug_level_mutation() {
> + let config = Config::new(
> + "node1".to_string(),
> + "192.168.1.1".to_string(),
> + 33,
> + false,
> + false,
> + "pmxcfs".to_string(),
> + );
> +
> + assert_eq!(config.debug_level(), 0);
> +
> + config.set_debug_level(1);
> + assert_eq!(config.debug_level(), 1);
> +
> + config.set_debug_level(5);
> + assert_eq!(config.debug_level(), 5);
> +
> + config.set_debug_level(0);
> + assert_eq!(config.debug_level(), 0);
> + }
> +
> + #[test]
> + fn test_debug_level_max_value() {
> + let config = Config::new(
> + "node1".to_string(),
> + "192.168.1.1".to_string(),
> + 33,
> + false,
> + false,
> + "pmxcfs".to_string(),
> + );
> +
> + config.set_debug_level(255);
> + assert_eq!(config.debug_level(), 255);
> +
> + config.set_debug_level(0);
> + assert_eq!(config.debug_level(), 0);
> + }
> +
> + // ===== Thread Safety Tests =====
> +
> + #[test]
> + fn test_debug_level_thread_safety() {
> + let config = Config::new(
> + "node1".to_string(),
> + "192.168.1.1".to_string(),
> + 33,
> + false,
> + false,
> + "pmxcfs".to_string(),
> + );
> +
> + let config_clone = Arc::clone(&config);
> +
> + // Spawn multiple threads that concurrently modify debug level
> + let handles: Vec<_> = (0..10)
> + .map(|i| {
> + let cfg = Arc::clone(&config);
> + thread::spawn(move || {
> + for _ in 0..100 {
> + cfg.set_debug_level(i);
> + let _ = cfg.debug_level();
> + }
> + })
> + })
> + .collect();
> +
> + // All threads should complete without panicking
> + for handle in handles {
> + handle.join().unwrap();
> + }
> +
> + // Final value should be one of the values set by threads
> + let final_level = config_clone.debug_level();
> + assert!(
> + final_level < 10,
> + "Debug level should be < 10, got {final_level}"
> + );
> + }
> +
> + #[test]
> + fn test_concurrent_reads() {
> + let config = Config::new(
> + "node1".to_string(),
> + "192.168.1.1".to_string(),
> + 33,
> + true,
> + false,
> + "pmxcfs".to_string(),
> + );
> +
> + // Spawn multiple threads that concurrently read config
> + let handles: Vec<_> = (0..20)
> + .map(|_| {
> + let cfg = Arc::clone(&config);
> + thread::spawn(move || {
> + for _ in 0..1000 {
> + assert_eq!(cfg.nodename(), "node1");
> + assert_eq!(cfg.node_ip(), "192.168.1.1");
> + assert_eq!(cfg.www_data_gid(), 33);
> + assert!(cfg.is_debug());
> + assert!(!cfg.is_local_mode());
> + assert_eq!(cfg.cluster_name(), "pmxcfs");
> + }
> + })
> + })
> + .collect();
> +
> + for handle in handles {
> + handle.join().unwrap();
> + }
> + }
> +
> + // ===== Clone Tests =====
> +
> + #[test]
> + fn test_config_clone() {
> + let config1 = Config::new(
> + "node1".to_string(),
> + "192.168.1.1".to_string(),
> + 33,
> + true,
> + false,
> + "pmxcfs".to_string(),
> + );
> +
> + config1.set_debug_level(5);
> +
> + let config2 = (*config1).clone();
> +
> + // Cloned config should have same values
> + assert_eq!(config2.nodename(), config1.nodename());
> + assert_eq!(config2.node_ip(), config1.node_ip());
> + assert_eq!(config2.www_data_gid(), config1.www_data_gid());
> + assert_eq!(config2.is_debug(), config1.is_debug());
> + assert_eq!(config2.is_local_mode(), config1.is_local_mode());
> + assert_eq!(config2.cluster_name(), config1.cluster_name());
> + assert_eq!(config2.debug_level(), 5);
> +
> + // Modifying one should not affect the other
> + config2.set_debug_level(10);
> + assert_eq!(config1.debug_level(), 5);
> + assert_eq!(config2.debug_level(), 10);
> + }
> +
> + // ===== Debug Formatting Tests =====
> +
> + #[test]
> + fn test_debug_format() {
> + let config = Config::new(
> + "node1".to_string(),
> + "192.168.1.1".to_string(),
> + 33,
> + true,
> + false,
> + "pmxcfs".to_string(),
> + );
> +
> + let debug_str = format!("{config:?}");
> +
> + // Check that debug output contains all fields
> + assert!(debug_str.contains("Config"));
> + assert!(debug_str.contains("nodename"));
> + assert!(debug_str.contains("node1"));
> + assert!(debug_str.contains("node_ip"));
> + assert!(debug_str.contains("192.168.1.1"));
> + assert!(debug_str.contains("www_data_gid"));
> + assert!(debug_str.contains("33"));
> + assert!(debug_str.contains("debug"));
> + assert!(debug_str.contains("true"));
> + assert!(debug_str.contains("local_mode"));
> + assert!(debug_str.contains("false"));
> + assert!(debug_str.contains("cluster_name"));
> + assert!(debug_str.contains("pmxcfs"));
> + assert!(debug_str.contains("debug_level"));
> + }
> +
> + // ===== Edge Cases and Boundary Tests =====
> +
> + #[test]
> + fn test_empty_strings() {
> + let config = Config::new(String::new(), String::new(), 0, false, false, String::new());
> +
> + assert_eq!(config.nodename(), "");
> + assert_eq!(config.node_ip(), "");
> + assert_eq!(config.cluster_name(), "");
> + assert_eq!(config.www_data_gid(), 0);
> + }
> +
> + #[test]
> + fn test_long_strings() {
> + let long_name = "a".repeat(1000);
> + let long_ip = "192.168.1.".to_string() + &"1".repeat(100);
> + let long_cluster = "cluster-".to_string() + &"x".repeat(500);
> +
> + let config = Config::new(
> + long_name.clone(),
> + long_ip.clone(),
> + u32::MAX,
> + true,
> + true,
> + long_cluster.clone(),
> + );
> +
> + assert_eq!(config.nodename(), long_name);
> + assert_eq!(config.node_ip(), long_ip);
> + assert_eq!(config.cluster_name(), long_cluster);
> + assert_eq!(config.www_data_gid(), u32::MAX);
> + }
> +
> + #[test]
> + fn test_special_characters_in_strings() {
> + let config = Config::new(
> + "node-1_test.local".to_string(),
> + "192.168.1.10:8006".to_string(),
> + 33,
> + false,
> + false,
> + "my-cluster_v2.0".to_string(),
> + );
> +
> + assert_eq!(config.nodename(), "node-1_test.local");
> + assert_eq!(config.node_ip(), "192.168.1.10:8006");
> + assert_eq!(config.cluster_name(), "my-cluster_v2.0");
> + }
> +
> + #[test]
> + fn test_unicode_in_strings() {
> + let config = Config::new(
> + "ノード1".to_string(),
> + "::1".to_string(),
> + 33,
> + false,
> + false,
> + "集群".to_string(),
> + );
> +
> + assert_eq!(config.nodename(), "ノード1");
> + assert_eq!(config.node_ip(), "::1");
> + assert_eq!(config.cluster_name(), "集群");
> + }
> +}
_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
^ permalink raw reply [relevance 6%]
* Re: [pve-devel] [PATCH pve-cluster 01/15] pmxcfs-rs: add workspace and pmxcfs-api-types crate
2026-01-23 14:17 6% ` Samuel Rufinatscha
@ 2026-01-26 9:00 6% ` Kefu Chai
0 siblings, 0 replies; 39+ results
From: Kefu Chai @ 2026-01-26 9:00 UTC (permalink / raw)
To: Samuel Rufinatscha, Proxmox VE development discussion
On Fri Jan 23, 2026 at 10:17 PM CST, Samuel Rufinatscha wrote:
> Thanks for the series. I’ve started reviewing patches 1–6; sending
> notes for patch 1 first, and I’ll follow up with comments on the
> others once I’ve gone through them in more depth.
Hi Samuel, thanks for your review. Replies are inlined.
>
> comments inline
>
> On 1/6/26 3:25 PM, Kefu Chai wrote:
>> Initialize the Rust workspace for the pmxcfs rewrite project.
>>
>> Add pmxcfs-api-types crate which provides foundational types:
>> - PmxcfsError: Error type with errno mapping for FUSE operations
>> - FuseMessage: Filesystem operation messages
>> - KvStoreMessage: Status synchronization messages
>> - ApplicationMessage: Wrapper enum for both message types
>> - VmType: VM type enum (Qemu, Lxc)
>>
>> This is the foundation crate with no internal dependencies, only
>> requiring thiserror and libc. All other crates will depend on these
>> shared type definitions.
>>
>> Signed-off-by: Kefu Chai <k.chai@proxmox.com>
>> ---
>> src/pmxcfs-rs/Cargo.lock | 2067 +++++++++++++++++++++
>
> Following the .gitignore pattern in our other repos, Cargo.lock is
> ignored, so I’d suggest dropping it from the series.
Dropped.
>
>> src/pmxcfs-rs/Cargo.toml | 83 +
>> src/pmxcfs-rs/pmxcfs-api-types/Cargo.toml | 19 +
>> src/pmxcfs-rs/pmxcfs-api-types/README.md | 105 ++
>> src/pmxcfs-rs/pmxcfs-api-types/src/lib.rs | 152 ++
>> 5 files changed, 2426 insertions(+)
>> create mode 100644 src/pmxcfs-rs/Cargo.lock
>> create mode 100644 src/pmxcfs-rs/Cargo.toml
>> create mode 100644 src/pmxcfs-rs/pmxcfs-api-types/Cargo.toml
>> create mode 100644 src/pmxcfs-rs/pmxcfs-api-types/README.md
>> create mode 100644 src/pmxcfs-rs/pmxcfs-api-types/src/lib.rs
>>
>> diff --git a/src/pmxcfs-rs/Cargo.lock b/src/pmxcfs-rs/Cargo.lock
>
> [..]
>
>> +++ b/src/pmxcfs-rs/Cargo.toml
>> @@ -0,0 +1,83 @@
>> +# Workspace root for pmxcfs Rust implementation
>> +[workspace]
>> +members = [
>> + "pmxcfs-api-types", # Shared types and error definitions
>> +]
>> +resolver = "2"
>> +
>> +[workspace.package]
>> +version = "9.0.6"
>> +edition = "2024"
>> +authors = ["Proxmox Support Team <support@proxmox.com>"]
>> +license = "AGPL-3.0"
>> +repository = "https://git.proxmox.com/?p=pve-cluster.git"
>> +rust-version = "1.85"
>> +
>> +[workspace.dependencies]
>
> Here we already declare workspace path deps for crates that aren’t
> present yet (pmxcfs-config, pmxcfs-memdb, ...). For bisectability,
> could we keep this patch minimal and add those workspace
> members/path deps in the patches where the crates are introduced?
Restructured the commits to add the deps only when they are used.
>
>> +# Internal workspace dependencies
>> +pmxcfs-api-types = { path = "pmxcfs-api-types" }
>> +pmxcfs-config = { path = "pmxcfs-config" }
>> +pmxcfs-memdb = { path = "pmxcfs-memdb" }
>> +pmxcfs-dfsm = { path = "pmxcfs-dfsm" }
>> +pmxcfs-rrd = { path = "pmxcfs-rrd" }
>> +pmxcfs-status = { path = "pmxcfs-status" }
>> +pmxcfs-ipc = { path = "pmxcfs-ipc" }
>> +pmxcfs-services = { path = "pmxcfs-services" }
>> +pmxcfs-logger = { path = "pmxcfs-logger" }
>> +
>> +# Core async runtime
>> +tokio = { version = "1.35", features = ["full"] }
>> +tokio-util = "0.7"
>> +async-trait = "0.1"
>> +
>
> If the goal is to centrally pin external crate versions early, maybe
> limit [workspace.dependencies] here generally to the crates actually
> used by pmxcfs-api-types (thiserror, libc) and extend as new crates
> are added.
Likewise.
>
>> +# Error handling
>> +anyhow = "1.0"
>> +thiserror = "1.0"
>> +
>> +# Logging and tracing
>> +tracing = "0.1"
>> +tracing-subscriber = { version = "0.3", features = ["env-filter"] }
>> +
>> +# Serialization
>> +serde = { version = "1.0", features = ["derive"] }
>> +serde_json = "1.0"
>> +bincode = "1.3"
>> +
>> +# Network and cluster
>> +bytes = "1.5"
>> +sha2 = "0.10"
>> +bytemuck = { version = "1.14", features = ["derive"] }
>> +
>> +# System integration
>> +libc = "0.2"
>> +nix = { version = "0.27", features = ["fs", "process", "signal", "user", "socket"] }
>> +users = "0.11"
>> +
>> +# Corosync/CPG bindings
>> +rust-corosync = "0.1"
>> +
>> +# Enum conversions
>> +num_enum = "0.7"
>> +
>> +# Concurrency primitives
>> +parking_lot = "0.12"
>> +
>> +# Utilities
>> +chrono = "0.4"
>> +futures = "0.3"
>> +
>> +# Development dependencies
>> +tempfile = "3.8"
>> +
>> +[workspace.lints.clippy]
>> +uninlined_format_args = "warn"
>> +
>> +[profile.release]
>> +lto = true
>> +codegen-units = 1
>> +opt-level = 3
>> +strip = true
>> +
>> +[profile.dev]
>> +opt-level = 1
>> +debug = true
>> diff --git a/src/pmxcfs-rs/pmxcfs-api-types/Cargo.toml b/src/pmxcfs-rs/pmxcfs-api-types/Cargo.toml
>> new file mode 100644
>> index 00000000..cdce7951
>> --- /dev/null
>> +++ b/src/pmxcfs-rs/pmxcfs-api-types/Cargo.toml
>> @@ -0,0 +1,19 @@
>> +[package]
>> +name = "pmxcfs-api-types"
>> +description = "Shared types and error definitions for pmxcfs"
>> +
>> +version.workspace = true
>> +edition.workspace = true
>> +authors.workspace = true
>> +license.workspace = true
>> +repository.workspace = true
>> +
>> +[lints]
>> +workspace = true
>> +
>> +[dependencies]
>> +# Error handling
>> +thiserror.workspace = true
>> +
>> +# System integration
>> +libc.workspace = true
>> diff --git a/src/pmxcfs-rs/pmxcfs-api-types/README.md b/src/pmxcfs-rs/pmxcfs-api-types/README.md
>> new file mode 100644
>> index 00000000..da8304ae
>> --- /dev/null
>> +++ b/src/pmxcfs-rs/pmxcfs-api-types/README.md
>> @@ -0,0 +1,105 @@
>> +# pmxcfs-api-types
>> +
>> +**Shared Types and Error Definitions** for pmxcfs.
>> +
>> +This crate provides common types, error definitions, and message formats used across all pmxcfs crates. It serves as the "API contract" between different components.
>> +
>> +## Overview
>> +
>> +The crate contains:
>> +- **Error types**: `PmxcfsError` with errno mapping for FUSE
>> +- **Message types**: `FuseMessage`, `KvStoreMessage`, `ApplicationMessage`
>
> These types and the mentioned serialization helpers aren’t part of this
> diff, could you re-check both README.md (and the commit message) so they
> match?
Sorry, this README was revised before the last refactoring. Fixed.
>
>> +- **Shared types**: `MemberInfo`, `NodeSyncInfo`
>> +- **Serialization**: C-compatible wire format helpers
>> +
>> +## Error Types
>> +
>> +### PmxcfsError
>> +
>> +Type-safe error enum with automatic errno conversion.
>> +
>> +### errno Mapping
>> +
>> +Errors automatically convert to POSIX errno values for FUSE.
>> +
>> +| Error | errno | Value |
>> +|-------|-------|-------|
>> +| `NotFound` | `ENOENT` | 2 |
>> +| `PermissionDenied` | `EPERM` | 1 |
>> +| `AlreadyExists` | `EEXIST` | 17 |
>> +| `NotADirectory` | `ENOTDIR` | 20 |
>> +| `IsADirectory` | `EISDIR` | 21 |
>> +| `DirectoryNotEmpty` | `ENOTEMPTY` | 39 |
>> +| `FileTooLarge` | `EFBIG` | 27 |
>> +| `ReadOnlyFilesystem` | `EROFS` | 30 |
>> +| `NoQuorum` | `EACCES` | 13 |
>> +| `Timeout` | `ETIMEDOUT` | 110 |
>> +
>> +## Message Types
>> +
>> +### FuseMessage
>> +
>> +Filesystem operations broadcast through the cluster (via DFSM). Uses C-compatible wire format compatible with `dcdb.c`.
>> +
>> +### KvStoreMessage
>> +
>> +Status and metrics synchronization (via kvstore DFSM). Uses C-compatible wire format.
>> +
>> +### ApplicationMessage
>> +
>> +Wrapper for either FuseMessage or KvStoreMessage, used by DFSM to handle both filesystem and status messages with type safety.
>> +
>> +## Shared Types
>> +
>> +### MemberInfo
>> +
>> +Cluster member information.
>> +
>> +### NodeSyncInfo
>> +
>> +DFSM synchronization state.
>> +
>> +## C to Rust Mapping
>> +
>> +### Error Handling
>> +
>> +**C Version (cfs-utils.h):**
>> +- Return codes: `0` = success, negative = error
>> +- errno-based error reporting
>> +- Manual error checking everywhere
>> +
>> +**Rust Version:**
>> +- `Result<T, PmxcfsError>` type
>> +
>> +### Message Types
>> +
>> +**C Version (dcdb.h):**
>> +
>> +**Rust Version:**
>> +- Type-safe enums
>> +
>> +## Key Differences from C Implementation
>> +
>> +All message types have `serialize()` and `deserialize()` methods that produce byte-for-byte compatible formats with the C implementation.
>> +
>> +## Known Issues / TODOs
>> +
>> +### Missing Features
>> +- None identified
>> +
>> +### Compatibility
>> +- **Wire format**: 100% compatible with C implementation
>> +- **errno values**: Match POSIX standards
>> +- **Message types**: All C message types covered
>> +
>> +## References
>> +
>> +### C Implementation
>> +- `src/pmxcfs/cfs-utils.h` - Utility types and error codes
>> +- `src/pmxcfs/dcdb.h` - FUSE message types
>> +- `src/pmxcfs/status.h` - KvStore message types
>> +
>> +### Related Crates
>> +- **pmxcfs-dfsm**: Uses ApplicationMessage for cluster sync
>> +- **pmxcfs-memdb**: Uses PmxcfsError for database operations
>> +- **pmxcfs**: Uses FuseMessage for FUSE operations
>> diff --git a/src/pmxcfs-rs/pmxcfs-api-types/src/lib.rs b/src/pmxcfs-rs/pmxcfs-api-types/src/lib.rs
>> new file mode 100644
>> index 00000000..ae0e5eb0
>> --- /dev/null
>> +++ b/src/pmxcfs-rs/pmxcfs-api-types/src/lib.rs
>> @@ -0,0 +1,152 @@
>> +use thiserror::Error;
>> +
>> +/// Error types for pmxcfs operations
>> +#[derive(Error, Debug)]
>> +pub enum PmxcfsError {
>
> nit: the error related parts could be added into a dedicated error.rs
> module
Thanks! Extracted.
>
>> + #[error("I/O error: {0}")]
>> + Io(#[from] std::io::Error),
>> +
>> + #[error("Database error: {0}")]
>> + Database(String),
>> +
>> + #[error("FUSE error: {0}")]
>> + Fuse(String),
>> +
>> + #[error("Cluster error: {0}")]
>> + Cluster(String),
>> +
>> + #[error("Corosync error: {0}")]
>> + Corosync(String),
>> +
>> + #[error("Configuration error: {0}")]
>> + Configuration(String),
>> +
>> + #[error("System error: {0}")]
>> + System(String),
>> +
>> + #[error("IPC error: {0}")]
>> + Ipc(String),
>> +
>> + #[error("Permission denied")]
>> + PermissionDenied,
>> +
>> + #[error("Not found: {0}")]
>> + NotFound(String),
>> +
>> + #[error("Already exists: {0}")]
>> + AlreadyExists(String),
>> +
>> + #[error("Invalid argument: {0}")]
>> + InvalidArgument(String),
>> +
>> + #[error("Not a directory: {0}")]
>> + NotADirectory(String),
>> +
>> + #[error("Is a directory: {0}")]
>> + IsADirectory(String),
>> +
>> + #[error("Directory not empty: {0}")]
>> + DirectoryNotEmpty(String),
>> +
>> + #[error("No quorum")]
>> + NoQuorum,
>> +
>> + #[error("Read-only filesystem")]
>> + ReadOnlyFilesystem,
>> +
>> + #[error("File too large")]
>> + FileTooLarge,
>> +
>> + #[error("Lock error: {0}")]
>> + Lock(String),
>> +
>> + #[error("Timeout")]
>> + Timeout,
>> +
>> + #[error("Invalid path: {0}")]
>> + InvalidPath(String),
>> +}
>> +
>> +impl PmxcfsError {
>> + /// Convert error to errno value for FUSE operations
>> + pub fn to_errno(&self) -> i32 {
>> + match self {
>> + PmxcfsError::NotFound(_) => libc::ENOENT,
>> + PmxcfsError::PermissionDenied => libc::EPERM,
>> + PmxcfsError::AlreadyExists(_) => libc::EEXIST,
>> + PmxcfsError::NotADirectory(_) => libc::ENOTDIR,
>> + PmxcfsError::IsADirectory(_) => libc::EISDIR,
>> + PmxcfsError::DirectoryNotEmpty(_) => libc::ENOTEMPTY,
>> + PmxcfsError::InvalidArgument(_) => libc::EINVAL,
>> + PmxcfsError::FileTooLarge => libc::EFBIG,
>> + PmxcfsError::ReadOnlyFilesystem => libc::EROFS,
>> + PmxcfsError::NoQuorum => libc::EACCES,
>> + PmxcfsError::Timeout => libc::ETIMEDOUT,
>> + PmxcfsError::Io(e) => match e.raw_os_error() {
>> + Some(errno) => errno,
>> + None => libc::EIO,
>> + },
>> + _ => libc::EIO,
>
> Please check with C implementation, but:
>
> "PermissionDenied" should likely map to EACCES rather than EPERM. In
> FUSE/POSIX, EACCES is the standard return for file permission blocks,
> whereas EPERM is usually for administrative restrictions
> (like ownership)
>
> "InvalidPath" maps better to EINVAL. EIO suggests a hardware/disk
> failure, whereas InvalidPath implies an argument issue
>
> Also, "Lock" should explicitly be mapped.
> EBUSY (resource busy / lock contention)
> or EDEADLK (deadlock) / EAGAIN depending on semantics
>
> In general, can we minimize the number of errors falling into the
> generic EIO branch?
>
Indeed, the way the errors were categorized was too coarse-grained.
Fixed accordingly.
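As an illustration of the kind of refinement suggested above (a sketch,
not the actual fix: the enum is trimmed to the variants whose mapping
changes, the errno constants are inlined with their Linux values so the
snippet stands alone, and the real code would use the libc constants):

```rust
// Linux values of the corresponding libc constants, inlined for the sketch.
const EACCES: i32 = 13;
const EINVAL: i32 = 22;
const EBUSY: i32 = 16;

enum PmxcfsError {
    PermissionDenied,
    InvalidPath(String),
    Lock(String),
}

impl PmxcfsError {
    fn to_errno(&self) -> i32 {
        match self {
            // EACCES for file permission checks; EPERM would be for
            // administrative restrictions such as ownership.
            PmxcfsError::PermissionDenied => EACCES,
            // An invalid path is an argument problem, not an I/O failure.
            PmxcfsError::InvalidPath(_) => EINVAL,
            // Lock contention; EDEADLK or EAGAIN may fit instead,
            // depending on the semantics of the Lock variant.
            PmxcfsError::Lock(_) => EBUSY,
        }
    }
}
```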
>> + }
>> + }
>> +}
>> +
>> +/// Result type for pmxcfs operations
>> +pub type Result<T> = std::result::Result<T, PmxcfsError>;
>> +
>> +/// VM/CT types
>> +#[derive(Debug, Clone, Copy, PartialEq, Eq)]
>
> If this is used in wire contexts please add #[repr(u8)] to ensure a
> stable ABI.
It's not used in a wire format. I removed the explicit discriminant
values, since we don't need predictable values here -- the variants are
only used as distinct identifiers for in-memory comparisons.
>
>> +pub enum VmType {
>> + Qemu = 1,
>> + Lxc = 3,
>
> There’s a gap between values 1 -> 3: is 2 reserved?
> If so, maybe add a short comment.
It's not reserved; value 2 was OpenVZ, which is no longer supported.
See https://www.proxmox.com/en/about/company-details/press-releases/proxmox-ve-4-0-released
Now that explicit values are no longer assigned to the enum variants,
we don't need to keep the gap, but I added a short comment anyway to
explain that OpenVZ support was removed.
>
>> +}
>> +
>> +impl VmType {
>> + /// Returns the directory name where config files are stored
>> + pub fn config_dir(&self) -> &'static str {
>> + match self {
>> + VmType::Qemu => "qemu-server",
>> + VmType::Lxc => "lxc",
>> + }
>> + }
>> +}
>> +
>> +impl std::fmt::Display for VmType {
>> + fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
>> + match self {
>> + VmType::Qemu => write!(f, "qemu"),
>> + VmType::Lxc => write!(f, "lxc"),
>> + }
>> + }
>> +}
>> +
>> +/// VM/CT entry for vmlist
>> +#[derive(Debug, Clone)]
>> +pub struct VmEntry {
>> + pub vmid: u32,
>> + pub vmtype: VmType,
>> + pub node: String,
>> + /// Per-VM version counter (increments when this VM's config changes)
>> + pub version: u32,
>> +}
>> +
>> +/// Information about a cluster member
>> +///
>> +/// This is a shared type used by both cluster and DFSM modules
>> +#[derive(Debug, Clone)]
>> +pub struct MemberInfo {
>> + pub node_id: u32,
>> + pub pid: u32,
>> + pub joined_at: u64,
>> +}
>> +
>> +/// Node synchronization info for DFSM state sync
>> +///
>> +/// Used during DFSM synchronization to track which nodes have provided state
>> +#[derive(Debug, Clone)]
>> +pub struct NodeSyncInfo {
>> + pub nodeid: u32,
>
> We have "nodeid" here but "node_id" in MemberInfo, this should be
> aligned.
Thanks for pointing this out! Changed to "node_id".
>
>> + pub pid: u32,
>> + pub state: Option<Vec<u8>>,
>> + pub synced: bool,
>> +}
* Re: [pve-devel] [PATCH pve-cluster 02/15] pmxcfs-rs: add pmxcfs-config crate
2026-01-23 15:01 6% ` Samuel Rufinatscha
@ 2026-01-26 9:43 5% ` Kefu Chai
0 siblings, 0 replies; 39+ results
From: Kefu Chai @ 2026-01-26 9:43 UTC (permalink / raw)
To: Samuel Rufinatscha, Proxmox VE development discussion
On Fri Jan 23, 2026 at 11:01 PM CST, Samuel Rufinatscha wrote:
> comments inline
>
> On 1/6/26 3:25 PM, Kefu Chai wrote:
>> Add configuration management crate that provides:
>> - Config struct for runtime configuration
>> - Node hostname, IP, and group ID tracking
>> - Debug and local mode flags
>> - Thread-safe configuration access via parking_lot Mutex
>>
>> This is a foundational crate with no internal dependencies, only
>> requiring parking_lot for synchronization. Other crates will use
>> this for accessing runtime configuration.
>>
>> Signed-off-by: Kefu Chai <k.chai@proxmox.com>
>> ---
>> src/pmxcfs-rs/Cargo.toml | 3 +-
>> src/pmxcfs-rs/pmxcfs-config/Cargo.toml | 16 +
>> src/pmxcfs-rs/pmxcfs-config/README.md | 127 +++++++
>> src/pmxcfs-rs/pmxcfs-config/src/lib.rs | 471 +++++++++++++++++++++++++
>> 4 files changed, 616 insertions(+), 1 deletion(-)
>> create mode 100644 src/pmxcfs-rs/pmxcfs-config/Cargo.toml
>> create mode 100644 src/pmxcfs-rs/pmxcfs-config/README.md
>> create mode 100644 src/pmxcfs-rs/pmxcfs-config/src/lib.rs
>>
>> diff --git a/src/pmxcfs-rs/Cargo.toml b/src/pmxcfs-rs/Cargo.toml
>> index 15d88f52..28e20bb7 100644
>> --- a/src/pmxcfs-rs/Cargo.toml
>> +++ b/src/pmxcfs-rs/Cargo.toml
>> @@ -1,7 +1,8 @@
>> # Workspace root for pmxcfs Rust implementation
>> [workspace]
>> members = [
>> - "pmxcfs-api-types", # Shared types and error definitions
>> + "pmxcfs-api-types", # Shared types and error definitions
>> + "pmxcfs-config", # Configuration management
>> ]
>> resolver = "2"
>>
>> diff --git a/src/pmxcfs-rs/pmxcfs-config/Cargo.toml b/src/pmxcfs-rs/pmxcfs-config/Cargo.toml
>> new file mode 100644
>> index 00000000..f5a60995
>> --- /dev/null
>> +++ b/src/pmxcfs-rs/pmxcfs-config/Cargo.toml
>> @@ -0,0 +1,16 @@
>> +[package]
>> +name = "pmxcfs-config"
>> +description = "Configuration management for pmxcfs"
>> +
>> +version.workspace = true
>> +edition.workspace = true
>> +authors.workspace = true
>> +license.workspace = true
>> +repository.workspace = true
>> +
>> +[lints]
>> +workspace = true
>> +
>> +[dependencies]
>> +# Concurrency primitives
>> +parking_lot.workspace = true
>> diff --git a/src/pmxcfs-rs/pmxcfs-config/README.md b/src/pmxcfs-rs/pmxcfs-config/README.md
>> new file mode 100644
>> index 00000000..c06b2170
>> --- /dev/null
>> +++ b/src/pmxcfs-rs/pmxcfs-config/README.md
>> @@ -0,0 +1,127 @@
>> +# pmxcfs-config
>> +
>> +**Configuration Management** and **Cluster Services** for pmxcfs.
>> +
>> +This crate provides configuration structures and cluster integration services including quorum tracking and cluster configuration monitoring via Corosync APIs.
>> +
>> +## Overview
>> +
>> +This crate contains:
>> +1. **Config struct**: Runtime configuration (node name, IPs, flags)
>> +2. Integration with Corosync services (tracked in main pmxcfs crate):
>> + - **QuorumService** (`pmxcfs/src/quorum_service.rs`) - Quorum monitoring
>> + - **ClusterConfigService** (`pmxcfs/src/cluster_config_service.rs`) - Config tracking
>
> This patch only contains the Config struct, but not Cluster Services
> or QuorumService, please revist commit message and README.
Sorry, the README.md was out of sync after the latest refactoring. Fixed.
>
>> +
>> +## Config Struct
>> +
>> +The `Config` struct holds daemon-wide configuration including node hostname, IP address, www-data group ID, debug flag, local mode flag, and cluster name.
>> +
>> +## Cluster Services
>> +
>> +The following services are implemented in the main pmxcfs crate but documented here for completeness.
>> +
>> +### QuorumService
>> +
>> +**C Equivalent:** `src/pmxcfs/quorum.c` - `service_quorum_new()`
>> +**Rust Location:** `src/pmxcfs-rs/pmxcfs/src/quorum_service.rs`
>> +
>> +Monitors cluster quorum status via Corosync quorum API.
>> +
>> +#### Features
>> +- Tracks quorum state (quorate/inquorate)
>> +- Monitors member list changes
>> +- Automatic reconnection on Corosync restart
>> +- Updates `Status` quorum flag
>> +
>> +#### C to Rust Mapping
>> +
>> +| C Function | Rust Equivalent | Location |
>> +|-----------|-----------------|----------|
>> +| `service_quorum_new()` | `QuorumService::new()` | quorum_service.rs |
>> +| `service_quorum_destroy()` | (Drop trait / finalize) | Automatic |
>> +| `quorum_notification_fn` | quorum_notification closure | quorum_service.rs |
>> +| `nodelist_notification_fn` | nodelist_notification closure | quorum_service.rs |
>> +
>> +#### Quorum Notifications
>> +
>> +The service monitors quorum state changes and member list changes, updating the Status accordingly.
>> +
>> +### ClusterConfigService
>> +
>> +**C Equivalent:** `src/pmxcfs/confdb.c` - `service_confdb_new()`
>> +**Rust Location:** `src/pmxcfs-rs/pmxcfs/src/cluster_config_service.rs`
>> +
>> +Monitors Corosync cluster configuration (cmap) and tracks node membership.
>> +
>> +#### Features
>> +- Monitors cluster membership via Corosync cmap API
>> +- Tracks node additions/removals
>> +- Registers nodes in Status
>> +- Automatic reconnection on Corosync restart
>> +
>> +#### C to Rust Mapping
>> +
>> +| C Function | Rust Equivalent | Location |
>> +|-----------|-----------------|----------|
>> +| `service_confdb_new()` | `ClusterConfigService::new()` | cluster_config_service.rs |
>> +| `service_confdb_destroy()` | (Drop trait / finalize) | Automatic |
>> +| `confdb_track_fn` | (direct cmap queries) | Different approach |
>> +
>> +#### Configuration Tracking
>> +
>> +The service monitors:
>> +- `nodelist.node.*.nodeid` - Node IDs
>> +- `nodelist.node.*.name` - Node names
>> +- `nodelist.node.*.ring*_addr` - Node IP addresses
>> +
>> +Updates `Status` with current cluster membership.
>> +
>> +## Key Differences from C Implementation
>> +
>> +### Cluster Config Service API
>> +
>> +**C Version (confdb.c):**
>> +- Uses deprecated confdb API
>> +- Track changes via confdb notifications
>> +
>> +**Rust Version:**
>> +- Uses modern cmap API
>> +- Direct cmap queries
>> +
>> +Both read the same data, but Rust uses the modern Corosync API.
>> +
>> +### Service Integration
>> +
>> +**C Version:**
>> +- qb_loop manages lifecycle
>> +
>> +**Rust Version:**
>> +- Service trait abstracts lifecycle
>> +- ServiceManager handles retry
>> +- Tokio async dispatch
>> +
>> +## Known Issues / TODOs
>> +
>> +### Compatibility
>> +- **Quorum tracking**: Compatible with C implementation
>> +- **Node registration**: Equivalent behavior
>> +- **cmap vs confdb**: Rust uses modern cmap API (C uses deprecated confdb)
>> +
>> +### Missing Features
>> +- None identified
>> +
>> +### Behavioral Differences (Benign)
>> +- **API choice**: Rust uses cmap, C uses confdb (both read same data)
>> +- **Lifecycle**: Rust uses Service trait, C uses manual lifecycle
>> +
>> +## References
>> +
>> +### C Implementation
>> +- `src/pmxcfs/quorum.c` / `quorum.h` - Quorum service
>> +- `src/pmxcfs/confdb.c` / `confdb.h` - Cluster config service
>> +
>> +### Related Crates
>> +- **pmxcfs**: Main daemon with QuorumService and ClusterConfigService
>> +- **pmxcfs-status**: Status tracking updated by these services
>> +- **pmxcfs-services**: Service framework used by both services
>> +- **rust-corosync**: Corosync FFI bindings
>> diff --git a/src/pmxcfs-rs/pmxcfs-config/src/lib.rs b/src/pmxcfs-rs/pmxcfs-config/src/lib.rs
>> new file mode 100644
>> index 00000000..5e1ee1b2
>> --- /dev/null
>> +++ b/src/pmxcfs-rs/pmxcfs-config/src/lib.rs
>> @@ -0,0 +1,471 @@
>> +use parking_lot::RwLock;
>> +use std::sync::Arc;
>> +
>> +/// Global configuration for pmxcfs
>> +pub struct Config {
>> + /// Node name (hostname without domain)
>> + pub nodename: String,
>> +
>> + /// Node IP address
>> + pub node_ip: String,
>
> Consider using std::net::IpAddr (or SocketAddr if a port is part of the
> value). Tests currently mix IP vs IP:PORT, so it’s unclear what node_ip
> is supposed to represent.
It's the value returned by resolve_node_ip(), so it is always a plain
IP address. Switched to IpAddr and updated the tests accordingly.
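For reference, a minimal sketch of what the typed field buys us; the
string literals are illustrative, but the parse behavior is standard
library semantics:

```rust
use std::net::IpAddr;

fn main() {
    // resolve_node_ip() yields a plain address, so IpAddr parses it:
    let ip: IpAddr = "192.168.1.10".parse().expect("valid IPv4");
    assert!(ip.is_ipv4());

    // IPv6 works without any special-casing:
    let ip6: IpAddr = "::1".parse().expect("valid IPv6");
    assert!(ip6.is_ipv6());

    // An IP:PORT string is rejected at construction time, which
    // surfaces the IP-vs-IP:PORT ambiguity the old String field
    // silently allowed in the tests:
    assert!("192.168.1.10:8006".parse::<IpAddr>().is_err());
}
```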
>
>> +
>> + /// www-data group ID for file permissions
>> + pub www_data_gid: u32,
>> +
>> + /// Debug mode enabled
>> + pub debug: bool,
>> +
>> + /// Force local mode (no clustering)
>> + pub local_mode: bool,
>> +
>> + /// Cluster name (CPG group name)
>> + pub cluster_name: String,
>> +
>> + /// Debug level (0 = normal, 1+ = debug) - mutable at runtime
>> + debug_level: RwLock<u8>,
>
> in the crate docs it says: “The Config struct uses Arc<AtomicU8> for
> debug_level” but the implementation uses parking_lot::RwLock<u8>.
> Unless we need lock coupling with other fields, AtomicU8 would likely
> be sufficient (and cheaper) for debug_level. Also please re-check the
> commit message, which mentions parking_lot::Mutex.
Indeed, AtomicU8 is lighter-weight and simpler than RwLock for a
single byte. Changed accordingly.
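A minimal sketch of the AtomicU8 variant (field subset is illustrative;
Relaxed ordering suffices since the level is an independent value with
no ordering requirements against other data):

```rust
use std::sync::atomic::{AtomicU8, Ordering};
use std::sync::Arc;
use std::thread;

struct Config {
    debug_level: AtomicU8,
}

impl Config {
    fn debug_level(&self) -> u8 {
        self.debug_level.load(Ordering::Relaxed)
    }

    fn set_debug_level(&self, level: u8) {
        self.debug_level.store(level, Ordering::Relaxed);
    }
}

fn main() {
    let cfg = Arc::new(Config { debug_level: AtomicU8::new(0) });

    // Readers and writers never block each other, unlike with a lock.
    let handles: Vec<_> = (0..4)
        .map(|i| {
            let c = Arc::clone(&cfg);
            thread::spawn(move || {
                c.set_debug_level(i);
                let _ = c.debug_level();
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    assert!(cfg.debug_level() < 4);
}
```

As a side effect this also simplifies Clone: since AtomicU8 is not
Clone, a manual impl can just use `AtomicU8::new(self.debug_level())`,
mirroring what the hand-written RwLock Clone does today.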
>
>> +}
>> +
>> +impl Clone for Config {
>> + fn clone(&self) -> Self {
>> + Self {
>> + nodename: self.nodename.clone(),
>> + node_ip: self.node_ip.clone(),
>> + www_data_gid: self.www_data_gid,
>> + debug: self.debug,
>> + local_mode: self.local_mode,
>> + cluster_name: self.cluster_name.clone(),
>> + debug_level: RwLock::new(*self.debug_level.read()),
>> + }
>> + }
>> +}
>> +
>> +impl std::fmt::Debug for Config {
>> + fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
>> + f.debug_struct("Config")
>> + .field("nodename", &self.nodename)
>> + .field("node_ip", &self.node_ip)
>> + .field("www_data_gid", &self.www_data_gid)
>> + .field("debug", &self.debug)
>> + .field("local_mode", &self.local_mode)
>> + .field("cluster_name", &self.cluster_name)
>> + .field("debug_level", &*self.debug_level.read())
>> + .finish()
>> + }
>> +}
>> +
>> +impl Config {
>> + pub fn new(
>> + nodename: String,
>> + node_ip: String,
>> + www_data_gid: u32,
>> + debug: bool,
>> + local_mode: bool,
>> + cluster_name: String,
>> + ) -> Arc<Self> {
>
> The constructor returns Arc<Config>
> I think we could keep new() -> Self, and provide convenience
> constructor shared() -> Arc<Self>.
> This would allow local usage (e.g. for tests) without heap allocation
> of the struct
Config::new() -> Self is added, and the tests now use it directly.
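For reference, a sketch of the split constructors (field subset and the
shared() name are taken from the suggestion above, not from the final
patch):

```rust
use std::sync::Arc;

struct Config {
    nodename: String,
}

impl Config {
    // Plain constructor: usable on the stack, e.g. in unit tests,
    // without a heap allocation for the struct itself.
    fn new(nodename: String) -> Self {
        Self { nodename }
    }

    // Convenience constructor for shared ownership in the daemon.
    fn shared(nodename: String) -> Arc<Self> {
        Arc::new(Self::new(nodename))
    }
}

fn main() {
    let local = Config::new("node1".to_string());
    let shared = Config::shared("node1".to_string());
    assert_eq!(local.nodename, shared.nodename);
}
```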
>
>> + let debug_level = if debug { 1 } else { 0 };
>
> debug_level is derived from debug at creation time, but thereafter:
> set_debug_level() does not update debug and is_debug() would continue
> to reflect the initial flag, not the effective debug level
> is_debug() should just be a helper that returns self.debug_level() > 0.
> The debug field should probably be removed entirely.
Ahh, thanks for pointing this out. Fixed.
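A minimal sketch of the fix: drop the `debug` field and derive
is_debug() from the effective level (std::sync::RwLock stands in for
parking_lot here; the storage type is orthogonal to the point):

```rust
use std::sync::RwLock;

struct Config {
    // Single source of truth; no separate `debug` flag to drift.
    debug_level: RwLock<u8>,
}

impl Config {
    fn debug_level(&self) -> u8 {
        *self.debug_level.read().unwrap()
    }

    fn set_debug_level(&self, level: u8) {
        *self.debug_level.write().unwrap() = level;
    }

    // Derived helper: always reflects the effective level, even
    // after runtime set_debug_level() calls.
    fn is_debug(&self) -> bool {
        self.debug_level() > 0
    }
}

fn main() {
    let cfg = Config { debug_level: RwLock::new(0) };
    assert!(!cfg.is_debug());
    cfg.set_debug_level(2);
    assert!(cfg.is_debug());
}
```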
>
>> + Arc::new(Self {
>> + nodename,
>> + node_ip,
>> + www_data_gid,
>> + debug,
>> + local_mode,
>> + cluster_name,
>> + debug_level: RwLock::new(debug_level),
>> + })
>> + }
>> +
>> + pub fn cluster_name(&self) -> &str {
>> + &self.cluster_name
>> + }
>> +
>> + pub fn nodename(&self) -> &str {
>> + &self.nodename
>> + }
>> +
>> + pub fn node_ip(&self) -> &str {
>> + &self.node_ip
>> + }
>> +
>> + pub fn www_data_gid(&self) -> u32 {
>> + self.www_data_gid
>> + }
>> +
>> + pub fn is_debug(&self) -> bool {
>> + self.debug
>> + }
>> +
>> + pub fn is_local_mode(&self) -> bool {
>> + self.local_mode
>> + }
>> +
>> + /// Get current debug level (0 = normal, 1+ = debug)
>> + pub fn debug_level(&self) -> u8 {
>> + *self.debug_level.read()
>> + }
>> +
>> + /// Set debug level (0 = normal, 1+ = debug)
>> + pub fn set_debug_level(&self, level: u8) {
>> + *self.debug_level.write() = level;
>> + }
>
> Right now most fields are pub but also getters are exposed. This will
> make it harder to enforce invariants.
> I would suggest to make fields private and keep getters, or keep fields
> public and drop the getters.
Indeed. I made all fields private and kept the getters.
>
>> +}
>> +
>> +#[cfg(test)]
>> +mod tests {
>> + //! Unit tests for Config struct
>> + //!
>> + //! This test module provides comprehensive coverage for:
>> + //! - Configuration creation and initialization
>> + //! - Getter methods for all configuration fields
>> + //! - Debug level mutation and thread safety
>> + //! - Concurrent access patterns (reads and writes)
>> + //! - Clone independence
>> + //! - Debug formatting
>> + //! - Edge cases (empty strings, long strings, special characters, unicode)
>> + //!
>> + //! ## Thread Safety
>> + //!
>> + //! The Config struct uses `Arc<AtomicU8>` for debug_level to allow
>> + //! safe concurrent reads and writes. Tests verify:
>> + //! - 10 threads × 100 operations (concurrent modifications)
>> + //! - 20 threads × 1000 operations (concurrent reads)
>> + //!
>> + //! ## Edge Cases
>> + //!
>> + //! Tests cover various edge cases including:
>> + //! - Empty strings for node/cluster names
>> + //! - Long strings (1000+ characters)
>> + //! - Special characters in strings
>> + //! - Unicode support (emoji, non-ASCII characters)
>> +
>> + use super::*;
>> + use std::thread;
>> +
>> + // ===== Basic Construction Tests =====
>> +
>> + #[test]
>> + fn test_config_creation() {
>> + let config = Config::new(
>> + "node1".to_string(),
>> + "192.168.1.10".to_string(),
>> + 33,
>> + false,
>> + false,
>> + "pmxcfs".to_string(),
>> + );
>> +
>> + assert_eq!(config.nodename(), "node1");
>> + assert_eq!(config.node_ip(), "192.168.1.10");
>> + assert_eq!(config.www_data_gid(), 33);
>> + assert!(!config.is_debug());
>> + assert!(!config.is_local_mode());
>> + assert_eq!(config.cluster_name(), "pmxcfs");
>> + assert_eq!(
>> + config.debug_level(),
>> + 0,
>> + "Debug level should be 0 when debug is false"
>> + );
>> + }
>> +
>> + #[test]
>> + fn test_config_creation_with_debug() {
>> + let config = Config::new(
>> + "node2".to_string(),
>> + "10.0.0.5".to_string(),
>> + 1000,
>> + true,
>> + false,
>> + "test-cluster".to_string(),
>> + );
>> +
>> + assert!(config.is_debug());
>> + assert_eq!(
>> + config.debug_level(),
>> + 1,
>> + "Debug level should be 1 when debug is true"
>> + );
>> + }
>> +
>> + #[test]
>> + fn test_config_creation_local_mode() {
>> + let config = Config::new(
>> + "localhost".to_string(),
>> + "127.0.0.1".to_string(),
>> + 33,
>> + false,
>> + true,
>> + "local".to_string(),
>> + );
>> +
>> + assert!(config.is_local_mode());
>> + assert!(!config.is_debug());
>> + }
>> +
>> + // ===== Getter Tests =====
>> +
>> + #[test]
>> + fn test_all_getters() {
>> + let config = Config::new(
>> + "testnode".to_string(),
>> + "172.16.0.1".to_string(),
>> + 999,
>> + true,
>> + true,
>> + "my-cluster".to_string(),
>> + );
>> +
>> + // Test all getter methods
>> + assert_eq!(config.nodename(), "testnode");
>> + assert_eq!(config.node_ip(), "172.16.0.1");
>> + assert_eq!(config.www_data_gid(), 999);
>> + assert!(config.is_debug());
>> + assert!(config.is_local_mode());
>> + assert_eq!(config.cluster_name(), "my-cluster");
>> + assert_eq!(config.debug_level(), 1);
>> + }
>> +
>> + // ===== Debug Level Mutation Tests =====
>> +
>> + #[test]
>> + fn test_debug_level_mutation() {
>> + let config = Config::new(
>> + "node1".to_string(),
>> + "192.168.1.1".to_string(),
>> + 33,
>> + false,
>> + false,
>> + "pmxcfs".to_string(),
>> + );
>> +
>> + assert_eq!(config.debug_level(), 0);
>> +
>> + config.set_debug_level(1);
>> + assert_eq!(config.debug_level(), 1);
>> +
>> + config.set_debug_level(5);
>> + assert_eq!(config.debug_level(), 5);
>> +
>> + config.set_debug_level(0);
>> + assert_eq!(config.debug_level(), 0);
>> + }
>> +
>> + #[test]
>> + fn test_debug_level_max_value() {
>> + let config = Config::new(
>> + "node1".to_string(),
>> + "192.168.1.1".to_string(),
>> + 33,
>> + false,
>> + false,
>> + "pmxcfs".to_string(),
>> + );
>> +
>> + config.set_debug_level(255);
>> + assert_eq!(config.debug_level(), 255);
>> +
>> + config.set_debug_level(0);
>> + assert_eq!(config.debug_level(), 0);
>> + }
>> +
>> + // ===== Thread Safety Tests =====
>> +
>> + #[test]
>> + fn test_debug_level_thread_safety() {
>> + let config = Config::new(
>> + "node1".to_string(),
>> + "192.168.1.1".to_string(),
>> + 33,
>> + false,
>> + false,
>> + "pmxcfs".to_string(),
>> + );
>> +
>> + let config_clone = Arc::clone(&config);
>> +
>> + // Spawn multiple threads that concurrently modify debug level
>> + let handles: Vec<_> = (0..10)
>> + .map(|i| {
>> + let cfg = Arc::clone(&config);
>> + thread::spawn(move || {
>> + for _ in 0..100 {
>> + cfg.set_debug_level(i);
>> + let _ = cfg.debug_level();
>> + }
>> + })
>> + })
>> + .collect();
>> +
>> + // All threads should complete without panicking
>> + for handle in handles {
>> + handle.join().unwrap();
>> + }
>> +
>> + // Final value should be one of the values set by threads
>> + let final_level = config_clone.debug_level();
>> + assert!(
>> + final_level < 10,
>> + "Debug level should be < 10, got {final_level}"
>> + );
>> + }
>> +
>> + #[test]
>> + fn test_concurrent_reads() {
>> + let config = Config::new(
>> + "node1".to_string(),
>> + "192.168.1.1".to_string(),
>> + 33,
>> + true,
>> + false,
>> + "pmxcfs".to_string(),
>> + );
>> +
>> + // Spawn multiple threads that concurrently read config
>> + let handles: Vec<_> = (0..20)
>> + .map(|_| {
>> + let cfg = Arc::clone(&config);
>> + thread::spawn(move || {
>> + for _ in 0..1000 {
>> + assert_eq!(cfg.nodename(), "node1");
>> + assert_eq!(cfg.node_ip(), "192.168.1.1");
>> + assert_eq!(cfg.www_data_gid(), 33);
>> + assert!(cfg.is_debug());
>> + assert!(!cfg.is_local_mode());
>> + assert_eq!(cfg.cluster_name(), "pmxcfs");
>> + }
>> + })
>> + })
>> + .collect();
>> +
>> + for handle in handles {
>> + handle.join().unwrap();
>> + }
>> + }
>> +
>> + // ===== Clone Tests =====
>> +
>> + #[test]
>> + fn test_config_clone() {
>> + let config1 = Config::new(
>> + "node1".to_string(),
>> + "192.168.1.1".to_string(),
>> + 33,
>> + true,
>> + false,
>> + "pmxcfs".to_string(),
>> + );
>> +
>> + config1.set_debug_level(5);
>> +
>> + let config2 = (*config1).clone();
>> +
>> + // Cloned config should have same values
>> + assert_eq!(config2.nodename(), config1.nodename());
>> + assert_eq!(config2.node_ip(), config1.node_ip());
>> + assert_eq!(config2.www_data_gid(), config1.www_data_gid());
>> + assert_eq!(config2.is_debug(), config1.is_debug());
>> + assert_eq!(config2.is_local_mode(), config1.is_local_mode());
>> + assert_eq!(config2.cluster_name(), config1.cluster_name());
>> + assert_eq!(config2.debug_level(), 5);
>> +
>> + // Modifying one should not affect the other
>> + config2.set_debug_level(10);
>> + assert_eq!(config1.debug_level(), 5);
>> + assert_eq!(config2.debug_level(), 10);
>> + }
>> +
>> + // ===== Debug Formatting Tests =====
>> +
>> + #[test]
>> + fn test_debug_format() {
>> + let config = Config::new(
>> + "node1".to_string(),
>> + "192.168.1.1".to_string(),
>> + 33,
>> + true,
>> + false,
>> + "pmxcfs".to_string(),
>> + );
>> +
>> + let debug_str = format!("{config:?}");
>> +
>> + // Check that debug output contains all fields
>> + assert!(debug_str.contains("Config"));
>> + assert!(debug_str.contains("nodename"));
>> + assert!(debug_str.contains("node1"));
>> + assert!(debug_str.contains("node_ip"));
>> + assert!(debug_str.contains("192.168.1.1"));
>> + assert!(debug_str.contains("www_data_gid"));
>> + assert!(debug_str.contains("33"));
>> + assert!(debug_str.contains("debug"));
>> + assert!(debug_str.contains("true"));
>> + assert!(debug_str.contains("local_mode"));
>> + assert!(debug_str.contains("false"));
>> + assert!(debug_str.contains("cluster_name"));
>> + assert!(debug_str.contains("pmxcfs"));
>> + assert!(debug_str.contains("debug_level"));
>> + }
>> +
>> + // ===== Edge Cases and Boundary Tests =====
>> +
>> + #[test]
>> + fn test_empty_strings() {
>> + let config = Config::new(String::new(), String::new(), 0, false, false, String::new());
>> +
>> + assert_eq!(config.nodename(), "");
>> + assert_eq!(config.node_ip(), "");
>> + assert_eq!(config.cluster_name(), "");
>> + assert_eq!(config.www_data_gid(), 0);
>> + }
>> +
>> + #[test]
>> + fn test_long_strings() {
>> + let long_name = "a".repeat(1000);
>> + let long_ip = "192.168.1.".to_string() + &"1".repeat(100);
>> + let long_cluster = "cluster-".to_string() + &"x".repeat(500);
>> +
>> + let config = Config::new(
>> + long_name.clone(),
>> + long_ip.clone(),
>> + u32::MAX,
>> + true,
>> + true,
>> + long_cluster.clone(),
>> + );
>> +
>> + assert_eq!(config.nodename(), long_name);
>> + assert_eq!(config.node_ip(), long_ip);
>> + assert_eq!(config.cluster_name(), long_cluster);
>> + assert_eq!(config.www_data_gid(), u32::MAX);
>> + }
>> +
>> + #[test]
>> + fn test_special_characters_in_strings() {
>> + let config = Config::new(
>> + "node-1_test.local".to_string(),
>> + "192.168.1.10:8006".to_string(),
>> + 33,
>> + false,
>> + false,
>> + "my-cluster_v2.0".to_string(),
>> + );
>> +
>> + assert_eq!(config.nodename(), "node-1_test.local");
>> + assert_eq!(config.node_ip(), "192.168.1.10:8006");
>> + assert_eq!(config.cluster_name(), "my-cluster_v2.0");
>> + }
>> +
>> + #[test]
>> + fn test_unicode_in_strings() {
>> + let config = Config::new(
>> + "ノード1".to_string(),
>> + "::1".to_string(),
>> + 33,
>> + false,
>> + false,
>> + "集群".to_string(),
>> + );
>> +
>> + assert_eq!(config.nodename(), "ノード1");
>> + assert_eq!(config.node_ip(), "::1");
>> + assert_eq!(config.cluster_name(), "集群");
>> + }
>> +}
_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
* Re: [pve-devel] [PATCH pve-cluster 03/15] pmxcfs-rs: add pmxcfs-logger crate
@ 2026-01-27 13:16 6% ` Samuel Rufinatscha
0 siblings, 0 replies; 39+ results
From: Samuel Rufinatscha @ 2026-01-27 13:16 UTC (permalink / raw)
To: Proxmox VE development discussion, Kefu Chai
Thanks for the patch, Kefu.
The overall structure looks solid.
My main points are around C compatibility details. It might also be
worth adding a couple of binary-compatibility tests (known C blobs as
fixtures) and a performance test for merging large logs.
Please see inline comments below.
On 1/6/26 3:24 PM, Kefu Chai wrote:
> Add cluster logging system with:
> - ClusterLog: Main API with automatic deduplication
> - RingBuffer: Circular buffer (50,000 entries)
> - FNV-1a hashing for duplicate detection
> - JSON export matching C format
> - Binary serialization for efficient storage
> - Time-based and node-digest sorting
>
> This is a self-contained crate with no internal dependencies,
> only requiring serde and parking_lot. It provides ~24% of the
> C version's LOC (740 vs 3000+) while maintaining full
> compatibility with the existing log format.
>
> Includes comprehensive unit tests for ring buffer operations,
> serialization, and filtering.
>
> Signed-off-by: Kefu Chai <k.chai@proxmox.com>
> ---
> src/pmxcfs-rs/Cargo.toml | 1 +
> src/pmxcfs-rs/pmxcfs-logger/Cargo.toml | 15 +
> src/pmxcfs-rs/pmxcfs-logger/README.md | 58 ++
> .../pmxcfs-logger/src/cluster_log.rs | 550 +++++++++++++++++
> src/pmxcfs-rs/pmxcfs-logger/src/entry.rs | 579 +++++++++++++++++
> src/pmxcfs-rs/pmxcfs-logger/src/hash.rs | 173 ++++++
> src/pmxcfs-rs/pmxcfs-logger/src/lib.rs | 27 +
> .../pmxcfs-logger/src/ring_buffer.rs | 581 ++++++++++++++++++
> 8 files changed, 1984 insertions(+)
> create mode 100644 src/pmxcfs-rs/pmxcfs-logger/Cargo.toml
> create mode 100644 src/pmxcfs-rs/pmxcfs-logger/README.md
> create mode 100644 src/pmxcfs-rs/pmxcfs-logger/src/cluster_log.rs
> create mode 100644 src/pmxcfs-rs/pmxcfs-logger/src/entry.rs
> create mode 100644 src/pmxcfs-rs/pmxcfs-logger/src/hash.rs
> create mode 100644 src/pmxcfs-rs/pmxcfs-logger/src/lib.rs
> create mode 100644 src/pmxcfs-rs/pmxcfs-logger/src/ring_buffer.rs
>
> diff --git a/src/pmxcfs-rs/Cargo.toml b/src/pmxcfs-rs/Cargo.toml
> index 28e20bb7..4d17e87e 100644
> --- a/src/pmxcfs-rs/Cargo.toml
> +++ b/src/pmxcfs-rs/Cargo.toml
> @@ -3,6 +3,7 @@
> members = [
> "pmxcfs-api-types", # Shared types and error definitions
> "pmxcfs-config", # Configuration management
> + "pmxcfs-logger", # Cluster log with ring buffer and deduplication
> ]
> resolver = "2"
>
> diff --git a/src/pmxcfs-rs/pmxcfs-logger/Cargo.toml b/src/pmxcfs-rs/pmxcfs-logger/Cargo.toml
> new file mode 100644
> index 00000000..1af3f015
> --- /dev/null
> +++ b/src/pmxcfs-rs/pmxcfs-logger/Cargo.toml
> @@ -0,0 +1,15 @@
> +[package]
> +name = "pmxcfs-logger"
> +version = "0.1.0"
> +edition = "2021"
> +
> +[dependencies]
> +anyhow = "1.0"
> +parking_lot = "0.12"
> +serde = { version = "1.0", features = ["derive"] }
> +serde_json = "1.0"
> +tracing = "0.1"
> +
> +[dev-dependencies]
> +tempfile = "3.0"
> +
> diff --git a/src/pmxcfs-rs/pmxcfs-logger/README.md b/src/pmxcfs-rs/pmxcfs-logger/README.md
> new file mode 100644
> index 00000000..38f102c2
> --- /dev/null
> +++ b/src/pmxcfs-rs/pmxcfs-logger/README.md
> @@ -0,0 +1,58 @@
> +# pmxcfs-logger
> +
> +Cluster-wide log management for pmxcfs, fully compatible with the C implementation (logger.c).
> +
> +## Overview
> +
> +This crate implements a cluster log system matching Proxmox's C-based logger.c behavior. It provides:
> +
> +- **Ring Buffer Storage**: Circular buffer for log entries with automatic capacity management
> +- **FNV-1a Hashing**: Hashing for node and identity-based deduplication
> +- **Deduplication**: Per-node tracking of latest log entries to avoid duplicates
> +- **Time-based Sorting**: Chronological ordering of log entries across nodes
> +- **Multi-node Merging**: Combining logs from multiple cluster nodes
> +- **JSON Export**: Web UI-compatible JSON output matching C format
> +
> +## Architecture
> +
> +### Key Components
> +
> +1. **LogEntry** (`entry.rs`): Individual log entry with automatic UID generation
> +2. **RingBuffer** (`ring_buffer.rs`): Circular buffer with capacity management
> +3. **ClusterLog** (`lib.rs`): Main API with deduplication and merging
> +4. **Hash Functions** (`hash.rs`): FNV-1a implementation matching C
> +
> +## C to Rust Mapping
> +
> +| C Function | Rust Equivalent | Location |
> +|------------|-----------------|----------|
> +| `fnv_64a_buf` | `hash::fnv_64a` | hash.rs |
> +| `clog_pack` | `LogEntry::pack` | entry.rs |
> +| `clog_copy` | `RingBuffer::add_entry` | ring_buffer.rs |
> +| `clog_sort` | `RingBuffer::sort` | ring_buffer.rs |
> +| `clog_dump_json` | `RingBuffer::dump_json` | ring_buffer.rs |
> +| `clusterlog_insert` | `ClusterLog::insert` | lib.rs |
> +| `clusterlog_add` | `ClusterLog::add` | lib.rs |
> +| `clusterlog_merge` | `ClusterLog::merge` | lib.rs |
> +| `dedup_lookup` | `ClusterLog::dedup_lookup` | lib.rs |
> +
> +## Key Differences from C
> +
> +1. **No `node_digest` in DedupEntry**: C stores `node_digest` both as HashMap key and in the struct. Rust only uses it as the key, saving 8 bytes per entry.
> +
> +2. **Mutex granularity**: C uses a single global mutex. Rust uses separate Arc<Mutex<>> for buffer and dedup table, allowing better concurrency.
> +
> +3. **Code size**: Rust implementation is ~24% the size of C (740 lines vs 3,000+) while maintaining equivalent functionality.
> +
> +## Integration
> +
> +This crate is integrated into `pmxcfs-status` to provide cluster log functionality. The `.clusterlog` FUSE plugin uses this to provide JSON log output compatible with the Proxmox web UI.
> +
> +## References
> +
> +### C Implementation
> +- `src/pmxcfs/logger.c` / `logger.h` - Cluster log implementation
> +
> +### Related Crates
> +- **pmxcfs-status**: Integrates ClusterLog for status tracking
> +- **pmxcfs**: FUSE plugin exposes cluster log via `.clusterlog`
> diff --git a/src/pmxcfs-rs/pmxcfs-logger/src/cluster_log.rs b/src/pmxcfs-rs/pmxcfs-logger/src/cluster_log.rs
> new file mode 100644
> index 00000000..3eb6c68c
> --- /dev/null
> +++ b/src/pmxcfs-rs/pmxcfs-logger/src/cluster_log.rs
> @@ -0,0 +1,550 @@
> +/// Cluster Log Implementation
> +///
> +/// This module implements the cluster-wide log system with deduplication
> +/// and merging support, matching C's clusterlog_t.
> +use crate::entry::LogEntry;
> +use crate::ring_buffer::{RingBuffer, CLOG_DEFAULT_SIZE};
> +use anyhow::Result;
> +use parking_lot::Mutex;
> +use std::collections::{BTreeMap, HashMap};
> +use std::sync::Arc;
> +
> +/// Deduplication entry - tracks the latest UID and time for each node
> +///
> +/// Note: C's `dedup_entry_t` (logger.c:70-74) includes node_digest field because
> +/// GHashTable stores the struct pointer both as key and value. In Rust, we use
> +/// HashMap<u64, DedupEntry> where node_digest is the key, so we don't need to
> +/// duplicate it in the value. This is functionally equivalent but more efficient.
> +#[derive(Debug, Clone)]
> +pub(crate) struct DedupEntry {
> + /// Latest UID seen from this node
> + pub uid: u32,
> + /// Latest timestamp seen from this node
> + pub time: u32,
> +}
> +
> +/// Cluster-wide log with deduplication and merging support
> +/// Matches C's `clusterlog_t`
> +pub struct ClusterLog {
> + /// Ring buffer for log storage
> + pub(crate) buffer: Arc<Mutex<RingBuffer>>,
> +
> + /// Deduplication tracker (node_digest -> latest entry info)
> + /// Matches C's dedup hash table
> + pub(crate) dedup: Arc<Mutex<HashMap<u64, DedupEntry>>>,
> +}
> +
> +impl ClusterLog {
> + /// Create a new cluster log with default size
> + pub fn new() -> Self {
> + Self::with_capacity(CLOG_DEFAULT_SIZE)
> + }
> +
> + /// Create a new cluster log with specified capacity
> + pub fn with_capacity(capacity: usize) -> Self {
> + Self {
> + buffer: Arc::new(Mutex::new(RingBuffer::new(capacity))),
> + dedup: Arc::new(Mutex::new(HashMap::new())),
> + }
> + }
> +
> + /// Matches C's `clusterlog_add` function (logger.c:588-615)
> + #[allow(clippy::too_many_arguments)]
> + pub fn add(
> + &self,
> + node: &str,
> + ident: &str,
> + tag: &str,
> + pid: u32,
> + priority: u8,
> + time: u32,
> + message: &str,
> + ) -> Result<()> {
> + let entry = LogEntry::pack(node, ident, tag, pid, time, priority, message)?;
> + self.insert(&entry)
> + }
> +
> + /// Insert a log entry (with deduplication)
> + ///
> + /// Matches C's `clusterlog_insert` function (logger.c:573-586)
> + pub fn insert(&self, entry: &LogEntry) -> Result<()> {
> + let mut dedup = self.dedup.lock();
> +
> + // Check deduplication
> + if self.is_not_duplicate(&mut dedup, entry) {
> + // Entry is not a duplicate, add it
> + let mut buffer = self.buffer.lock();
> + buffer.add_entry(entry)?;
> + } else {
> + tracing::debug!("Ignoring duplicate cluster log entry");
> + }
> +
> + Ok(())
> + }
> +
> + /// Check if entry is a duplicate (returns true if NOT a duplicate)
> + ///
> + /// Matches C's `dedup_lookup` function (logger.c:362-388)
> + fn is_not_duplicate(&self, dedup: &mut HashMap<u64, DedupEntry>, entry: &LogEntry) -> bool {
> + match dedup.get_mut(&entry.node_digest) {
> + None => {
> + dedup.insert(
> + entry.node_digest,
> + DedupEntry {
> + time: entry.time,
> + uid: entry.uid,
> + },
> + );
> + true
> + }
> + Some(dd) => {
> + if entry.time > dd.time || (entry.time == dd.time && entry.uid > dd.uid) {
> + dd.time = entry.time;
> + dd.uid = entry.uid;
> + true
> + } else {
> + false
> + }
> + }
> + }
> + }
> +
> + pub fn get_entries(&self, max: usize) -> Vec<LogEntry> {
> + let buffer = self.buffer.lock();
> + buffer.iter().take(max).cloned().collect()
> + }
> +
> + /// Clear all log entries (for testing)
> + pub fn clear(&self) {
> + let mut buffer = self.buffer.lock();
> + let capacity = buffer.capacity();
> + *buffer = RingBuffer::new(capacity);
> + drop(buffer);
> +
> + self.dedup.lock().clear();
> + }
> +
> + /// Sort the log entries by time
> + ///
> + /// Matches C's `clog_sort` function (logger.c:321-355)
> + pub fn sort(&self) -> Result<RingBuffer> {
> + let buffer = self.buffer.lock();
> + buffer.sort()
> + }
> +
> + /// Merge logs from multiple nodes
> + ///
> + /// Matches C's `clusterlog_merge` function (logger.c:405-512)
> + pub fn merge(&self, remote_logs: Vec<RingBuffer>, include_local: bool) -> Result<RingBuffer> {
> + let mut sorted_entries: BTreeMap<(u32, u64, u32), LogEntry> = BTreeMap::new();
> + let mut merge_dedup: HashMap<u64, DedupEntry> = HashMap::new();
> +
> + // Calculate maximum capacity
> + let max_size = if include_local {
> + let local = self.buffer.lock();
> + let local_cap = local.capacity();
> + drop(local);
> +
> + std::iter::once(local_cap)
> + .chain(remote_logs.iter().map(|b| b.capacity()))
> + .max()
> + .unwrap_or(CLOG_DEFAULT_SIZE)
> + } else {
> + remote_logs
> + .iter()
> + .map(|b| b.capacity())
> + .max()
> + .unwrap_or(CLOG_DEFAULT_SIZE)
> + };
> +
> + // Add local entries if requested
> + if include_local {
> + let buffer = self.buffer.lock();
> + for entry in buffer.iter() {
> + let key = (entry.time, entry.node_digest, entry.uid);
> + sorted_entries.insert(key, entry.clone());
BTreeMap::insert overwrites the value on a duplicate key. Please
re-check whether that is what we want here; if we want keep-first
semantics, use entry(key).or_insert(...) and only update merge_dedup
when the entry was newly inserted.
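For reference, the two behaviors side by side (key and values are
illustrative stand-ins for the (time, node_digest, uid) tuples):

```rust
use std::collections::BTreeMap;

fn main() {
    let key = (1000_u32, 0xdead_u64, 1_u32);

    // insert() overwrites on a duplicate key: the remote copy wins.
    let mut last_wins: BTreeMap<(u32, u64, u32), &str> = BTreeMap::new();
    last_wins.insert(key, "local copy");
    last_wins.insert(key, "remote copy");
    assert_eq!(last_wins[&key], "remote copy");

    // entry().or_insert() keeps the first value: the local copy wins.
    let mut first_wins: BTreeMap<(u32, u64, u32), &str> = BTreeMap::new();
    first_wins.entry(key).or_insert("local copy");
    first_wins.entry(key).or_insert("remote copy");
    assert_eq!(first_wins[&key], "local copy");
}
```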
> + self.is_not_duplicate(&mut merge_dedup, entry);
> + }
> + }
> +
> + // Add remote entries
> + for remote_buffer in &remote_logs {
> + for entry in remote_buffer.iter() {
> + let key = (entry.time, entry.node_digest, entry.uid);
> + sorted_entries.insert(key, entry.clone());
> + self.is_not_duplicate(&mut merge_dedup, entry);
> + }
> + }
> +
> + let mut result = RingBuffer::new(max_size);
> +
> + // BTreeMap iterates in key order, entries are already sorted by (time, node_digest, uid)
> + for (_key, entry) in sorted_entries.iter().rev() {
C iterates oldest -> newest, and clog_copy() makes each entry the new
head, so the result ends up newest-first. With .rev() plus push-front
semantics we likely invert that. Maybe drop the .rev()? Please
re-check.
> + if result.is_near_full() {
> + break;
> + }
> + result.add_entry(entry)?;
> + }
> +
> + *self.dedup.lock() = merge_dedup;
clusterlog_merge() in C updates both cl->dedup and cl->base under the
same mutex. Here we update only dedup and return a RingBuffer, which
then requires a separate update_buffer() call. Shouldn't this be a
single atomic operation? Also, we currently have two mutexes (dedup
and buffer), which increases the risk of lock-ordering issues.
Couldn't we put buffer and dedup behind one mutex so that merge()
updates both atomically inside the same critical section?
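To illustrate, a minimal sketch of the single-mutex layout (the type
shapes are placeholders for the crate's real RingBuffer/DedupEntry,
and std::sync::Mutex stands in for parking_lot::Mutex):

```rust
use std::collections::HashMap;
use std::sync::Mutex;

// Placeholder shapes for the crate's types.
struct RingBuffer(Vec<String>);

#[allow(dead_code)]
struct DedupEntry {
    uid: u32,
    time: u32,
}

// One mutex guarding both pieces of state, so a merge can swap the
// buffer and the dedup table in a single critical section, as C does.
struct Inner {
    buffer: RingBuffer,
    dedup: HashMap<u64, DedupEntry>,
}

struct ClusterLog {
    inner: Mutex<Inner>,
}

impl ClusterLog {
    fn merge_commit(&self, new_buffer: RingBuffer, new_dedup: HashMap<u64, DedupEntry>) {
        // No window in which dedup and buffer can disagree.
        let mut inner = self.inner.lock().unwrap();
        inner.buffer = new_buffer;
        inner.dedup = new_dedup;
    }
}

fn main() {
    let log = ClusterLog {
        inner: Mutex::new(Inner {
            buffer: RingBuffer(Vec::new()),
            dedup: HashMap::new(),
        }),
    };
    log.merge_commit(RingBuffer(vec!["entry".to_string()]), HashMap::new());
    assert_eq!(log.inner.lock().unwrap().buffer.0.len(), 1);
}
```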
> +
> + Ok(result)
> + }
> +
> + /// Export log to JSON format
> + ///
> + /// Matches C's `clog_dump_json` function (logger.c:139-199)
> + pub fn dump_json(&self, ident_filter: Option<&str>, max_entries: usize) -> String {
> + let buffer = self.buffer.lock();
> + buffer.dump_json(ident_filter, max_entries)
> + }
> +
> + /// Export log to JSON format with sorted entries
> + pub fn dump_json_sorted(
> + &self,
> + ident_filter: Option<&str>,
> + max_entries: usize,
> + ) -> Result<String> {
> + let sorted = self.sort()?;
> + Ok(sorted.dump_json(ident_filter, max_entries))
> + }
> +
> + /// Matches C's `clusterlog_get_state` function (logger.c:553-571)
> + ///
> + /// Returns binary-serialized clog_base_t structure for network transmission.
> + /// This format is compatible with C nodes for mixed-cluster operation.
> + pub fn get_state(&self) -> Result<Vec<u8>> {
> + let sorted = self.sort()?;
> + Ok(sorted.serialize_binary())
> + }
> +
> + pub fn deserialize_state(data: &[u8]) -> Result<RingBuffer> {
> + RingBuffer::deserialize_binary(data)
> + }
> +
> + /// Replace the entire buffer after merging logs from multiple nodes
> + pub fn update_buffer(&self, new_buffer: RingBuffer) {
> + *self.buffer.lock() = new_buffer;
> + }
> +}
> +
> +impl Default for ClusterLog {
> + fn default() -> Self {
> + Self::new()
> + }
> +}
> +
> +#[cfg(test)]
> +mod tests {
> + use super::*;
> +
> + #[test]
> + fn test_cluster_log_creation() {
> + let log = ClusterLog::new();
> + assert!(log.buffer.lock().is_empty());
> + }
> +
> + #[test]
> + fn test_add_entry() {
> + let log = ClusterLog::new();
> +
> + let result = log.add(
> + "node1",
> + "root",
> + "cluster",
> + 12345,
> + 6, // Info priority
> + 1234567890,
> + "Test message",
> + );
> +
> + assert!(result.is_ok());
> + assert!(!log.buffer.lock().is_empty());
> + }
> +
> + #[test]
> + fn test_deduplication() {
> + let log = ClusterLog::new();
> +
> + // Add same entry twice (but with different UIDs since each add creates a new entry)
> + let _ = log.add("node1", "root", "cluster", 123, 6, 1000, "Message 1");
> + let _ = log.add("node1", "root", "cluster", 123, 6, 1000, "Message 1");
> +
> + // Both entries are added because they have different UIDs
> + // Deduplication tracks the latest (time, UID) per node, not content
> + let buffer = log.buffer.lock();
> + assert_eq!(buffer.len(), 2);
> + }
> +
> + #[test]
> + fn test_newer_entry_replaces() {
> + let log = ClusterLog::new();
> +
> + // Add older entry
> + let _ = log.add("node1", "root", "cluster", 123, 6, 1000, "Old message");
> +
> + // Add newer entry from same node
> + let _ = log.add("node1", "root", "cluster", 123, 6, 1001, "New message");
> +
> + // Should have both entries (newer doesn't remove older, just updates dedup tracker)
> + let buffer = log.buffer.lock();
> + assert_eq!(buffer.len(), 2);
> + }
> +
> + #[test]
> + fn test_json_export() {
> + let log = ClusterLog::new();
> +
> + let _ = log.add(
> + "node1",
> + "root",
> + "cluster",
> + 123,
> + 6,
> + 1234567890,
> + "Test message",
> + );
> +
> + let json = log.dump_json(None, 50);
> +
> + // Should be valid JSON
> + assert!(serde_json::from_str::<serde_json::Value>(&json).is_ok());
> +
> + // Should contain "data" field
> + let value: serde_json::Value = serde_json::from_str(&json).unwrap();
> + assert!(value.get("data").is_some());
> + }
> +
> + #[test]
> + fn test_merge_logs() {
> + let log1 = ClusterLog::new();
> + let log2 = ClusterLog::new();
> +
> + // Add entries to first log
> + let _ = log1.add(
> + "node1",
> + "root",
> + "cluster",
> + 123,
> + 6,
> + 1000,
> + "Message from node1",
> + );
> +
> + // Add entries to second log
> + let _ = log2.add(
> + "node2",
> + "root",
> + "cluster",
> + 456,
> + 6,
> + 1001,
> + "Message from node2",
> + );
> +
> + // Get log2's buffer for merging
> + let log2_buffer = log2.buffer.lock().clone();
> +
> + // Merge into log1
> + let merged = log1.merge(vec![log2_buffer], true).unwrap();
> +
> + // Should contain entries from both logs
> + assert!(merged.len() >= 2);
> + }
> +
> + // ========================================================================
> + // HIGH PRIORITY TESTS - Merge Edge Cases
> + // ========================================================================
> +
> + #[test]
> + fn test_merge_empty_logs() {
> + let log = ClusterLog::new();
> +
> + // Add some entries to local log
> + let _ = log.add("node1", "root", "cluster", 123, 6, 1000, "Local entry");
> +
> + // Merge with empty remote logs
> + let merged = log.merge(vec![], true).unwrap();
> +
> + // Should have 1 entry (from local log)
> + assert_eq!(merged.len(), 1);
> + let entry = merged.iter().next().unwrap();
> + assert_eq!(entry.node, "node1");
> + }
> +
> + #[test]
> + fn test_merge_single_node_only() {
> + let log = ClusterLog::new();
> +
> + // Add entries only from single node
> + let _ = log.add("node1", "root", "cluster", 123, 6, 1000, "Entry 1");
> + let _ = log.add("node1", "root", "cluster", 124, 6, 1001, "Entry 2");
> + let _ = log.add("node1", "root", "cluster", 125, 6, 1002, "Entry 3");
> +
> + // Merge with no remote logs (just sort local)
> + let merged = log.merge(vec![], true).unwrap();
> +
> + // Should have all 3 entries
> + assert_eq!(merged.len(), 3);
> +
> + // Entries should be sorted by time (buffer stores newest first after reversing during add)
> + // Merge reverses the BTreeMap iteration, so newest entries are added first
> + let times: Vec<u32> = merged.iter().map(|e| e.time).collect();
> + let mut expected = vec![1002, 1001, 1000];
> + expected.sort();
> + expected.reverse(); // Newest first
> +
> + let mut actual = times.clone();
> + actual.sort();
> + actual.reverse();
> +
> + assert_eq!(actual, expected);
> + }
> +
> + #[test]
> + fn test_merge_all_duplicates() {
> + let log1 = ClusterLog::new();
> + let log2 = ClusterLog::new();
> +
> + // Add same entries to both logs (same node, time, but different UIDs)
> + let _ = log1.add("node1", "root", "cluster", 123, 6, 1000, "Entry 1");
> + let _ = log1.add("node1", "root", "cluster", 124, 6, 1001, "Entry 2");
> +
> + let _ = log2.add("node1", "root", "cluster", 125, 6, 1000, "Entry 1");
> + let _ = log2.add("node1", "root", "cluster", 126, 6, 1001, "Entry 2");
> +
> + let log2_buffer = log2.buffer.lock().clone();
> +
> + // Merge - should handle entries from same node at same times
> + let merged = log1.merge(vec![log2_buffer], true).unwrap();
> +
> + // Should have 4 entries (all are unique by UID despite same time/node)
> + assert_eq!(merged.len(), 4);
> + }
> +
> + #[test]
> + fn test_merge_exceeding_capacity() {
> + // Create small buffer to test capacity enforcement
> + let log = ClusterLog::with_capacity(50_000); // Small buffer
> +
> + // Add many entries to fill beyond capacity
> + for i in 0..100 {
> + let _ = log.add(
> + "node1",
> + "root",
> + "cluster",
> + 100 + i,
> + 6,
> + 1000 + i,
> + &format!("Entry {}", i),
> + );
> + }
> +
> + // Create remote log with many entries
> + let remote = ClusterLog::with_capacity(50_000);
> + for i in 0..100 {
> + let _ = remote.add(
> + "node2",
> + "root",
> + "cluster",
> + 200 + i,
> + 6,
> + 1000 + i,
> + &format!("Remote {}", i),
> + );
> + }
> +
> + let remote_buffer = remote.buffer.lock().clone();
> +
> + // Merge - should stop when buffer is near full
> + let merged = log.merge(vec![remote_buffer], true).unwrap();
> +
> + // Buffer should be limited by capacity, not necessarily < 200
> + // The actual limit depends on entry sizes and capacity
> + // Just verify we got some reasonable number of entries
> + assert!(!merged.is_empty(), "Should have some entries");
> + assert!(
> + merged.len() <= 200,
> + "Should not exceed total available entries"
> + );
> + }
> +
> + #[test]
> + fn test_merge_preserves_dedup_state() {
> + let log = ClusterLog::new();
> +
> + // Add entries from node1
> + let _ = log.add("node1", "root", "cluster", 123, 6, 1000, "Entry 1");
> + let _ = log.add("node1", "root", "cluster", 124, 6, 1001, "Entry 2");
> +
> + // Create remote log with later entries from node1
> + let remote = ClusterLog::new();
> + let _ = remote.add("node1", "root", "cluster", 125, 6, 1002, "Entry 3");
> +
> + let remote_buffer = remote.buffer.lock().clone();
> +
> + // Merge
> + let _ = log.merge(vec![remote_buffer], true).unwrap();
> +
> + // Check that dedup state was updated
> + let dedup = log.dedup.lock();
> + let node1_digest = crate::hash::fnv_64a_str("node1");
> + let dedup_entry = dedup.get(&node1_digest).unwrap();
> +
> + // Should track the latest time from node1
> + assert_eq!(dedup_entry.time, 1002);
> + // UID is auto-generated, so just verify it exists and is reasonable
> + assert!(dedup_entry.uid > 0);
> + }
> +
> + #[test]
> + fn test_get_state_binary_format() {
> + let log = ClusterLog::new();
> +
> + // Add some entries
> + let _ = log.add("node1", "root", "cluster", 123, 6, 1000, "Entry 1");
> + let _ = log.add("node2", "admin", "system", 456, 6, 1001, "Entry 2");
> +
> + // Get state
> + let state = log.get_state().unwrap();
> +
> + // Should be binary format, not JSON
> + assert!(state.len() >= 8); // At least header
> +
> + // Check header format (clog_base_t)
> + let size = u32::from_le_bytes(state[0..4].try_into().unwrap()) as usize;
> + let cpos = u32::from_le_bytes(state[4..8].try_into().unwrap());
> +
> + assert_eq!(size, state.len());
> + assert_eq!(cpos, 8); // First entry at offset 8
> +
> + // Should be able to deserialize back
> + let deserialized = ClusterLog::deserialize_state(&state).unwrap();
> + assert_eq!(deserialized.len(), 2);
> + }
> +
> + #[test]
> + fn test_state_roundtrip() {
> + let log = ClusterLog::new();
> +
> + // Add entries
> + let _ = log.add("node1", "root", "cluster", 123, 6, 1000, "Test 1");
> + let _ = log.add("node2", "admin", "system", 456, 6, 1001, "Test 2");
> +
> + // Serialize
> + let state = log.get_state().unwrap();
> +
> + // Deserialize
> + let deserialized = ClusterLog::deserialize_state(&state).unwrap();
> +
> + // Check entries preserved
> + assert_eq!(deserialized.len(), 2);
> +
> + // Buffer is stored newest-first after sorting and serialization
> + let entries: Vec<_> = deserialized.iter().collect();
> + assert_eq!(entries[0].node, "node2"); // Newest (time 1001)
> + assert_eq!(entries[0].message, "Test 2");
> + assert_eq!(entries[1].node, "node1"); // Oldest (time 1000)
> + assert_eq!(entries[1].message, "Test 1");
> + }
> +}
> diff --git a/src/pmxcfs-rs/pmxcfs-logger/src/entry.rs b/src/pmxcfs-rs/pmxcfs-logger/src/entry.rs
> new file mode 100644
> index 00000000..187667ad
> --- /dev/null
> +++ b/src/pmxcfs-rs/pmxcfs-logger/src/entry.rs
> @@ -0,0 +1,579 @@
> +/// Log Entry Implementation
> +///
> +/// This module implements the cluster log entry structure, matching the C
> +/// implementation's clog_entry_t (logger.c).
> +use super::hash::fnv_64a_str;
> +use anyhow::{bail, Result};
> +use serde::Serialize;
> +use std::sync::atomic::{AtomicU32, Ordering};
> +
> +// Constants from C implementation
> +pub(crate) const CLOG_MAX_ENTRY_SIZE: usize = 8192 + 4096; // SYSLOG_MAX_LINE_LENGTH + overhead
This constant is also defined in ring_buffer.rs. Please define it once
and import it in the other module, otherwise the two copies can
silently drift apart.
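A minimal sketch of sharing a single definition (module names follow the
patch's layout; keeping ring_buffer.rs as the canonical location is my
assumption):

```rust
// Sketch: one canonical definition, re-exported by the second module,
// so the two call sites cannot diverge.
mod ring_buffer {
    pub(crate) const CLOG_MAX_ENTRY_SIZE: usize = 8192 + 4096;
}

mod entry {
    // Re-use the canonical constant instead of redefining it.
    pub(crate) use super::ring_buffer::CLOG_MAX_ENTRY_SIZE;
}

fn main() {
    // Both paths resolve to the same item.
    assert_eq!(entry::CLOG_MAX_ENTRY_SIZE, ring_buffer::CLOG_MAX_ENTRY_SIZE);
}
```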
> +
> +/// Global UID counter (matches C's `uid_counter` in logger.c:62)
> +static UID_COUNTER: AtomicU32 = AtomicU32::new(0);
> +
> +/// Log entry structure
> +///
> +/// Matches C's `clog_entry_t` from logger.c:
> +/// ```c
> +/// typedef struct {
> +/// uint32_t prev; // Previous entry offset
> +/// uint32_t next; // Next entry offset
> +/// uint32_t uid; // Unique ID
> +/// uint32_t time; // Timestamp
> +/// uint64_t node_digest; // FNV-1a hash of node name
> +/// uint64_t ident_digest; // FNV-1a hash of ident
> +/// uint32_t pid; // Process ID
> +/// uint8_t priority; // Syslog priority (0-7)
> +/// uint8_t node_len; // Length of node name (including null)
> +/// uint8_t ident_len; // Length of ident (including null)
> +/// uint8_t tag_len; // Length of tag (including null)
> +/// uint32_t msg_len; // Length of message (including null)
> +/// char data[]; // Variable length data: node + ident + tag + msg
> +/// } clog_entry_t;
> +/// ```
> +#[derive(Debug, Clone, Serialize)]
> +pub struct LogEntry {
> + /// Unique ID for this entry (auto-incrementing)
> + pub uid: u32,
> +
> + /// Unix timestamp
> + pub time: u32,
> +
> + /// FNV-1a hash of node name
> + pub node_digest: u64,
> +
> + /// FNV-1a hash of ident (user)
> + pub ident_digest: u64,
> +
> + /// Process ID
> + pub pid: u32,
> +
> + /// Syslog priority (0-7)
> + pub priority: u8,
> +
> + /// Node name
> + pub node: String,
> +
> + /// Identity/user
> + pub ident: String,
> +
> + /// Tag (e.g., "cluster", "pmxcfs")
> + pub tag: String,
> +
> + /// Log message
> + pub message: String,
> +}
> +
> +impl LogEntry {
> + /// Matches C's `clog_pack` function (logger.c:220-278)
> + pub fn pack(
> + node: &str,
> + ident: &str,
> + tag: &str,
> + pid: u32,
> + time: u32,
> + priority: u8,
> + message: &str,
> + ) -> Result<Self> {
> + if priority >= 8 {
> + bail!("Invalid priority: {priority} (must be 0-7)");
> + }
> +
> + let node = Self::truncate_string(node, 255);
> + let ident = Self::truncate_string(ident, 255);
> + let tag = Self::truncate_string(tag, 255);
> + let message = Self::utf8_to_ascii(message);
> +
> + let node_len = node.len() + 1;
> + let ident_len = ident.len() + 1;
> + let tag_len = tag.len() + 1;
> + let mut msg_len = message.len() + 1;
> +
> + let total_size = std::mem::size_of::<u32>() * 4 // prev, next, uid, time
> + + std::mem::size_of::<u64>() * 2 // node_digest, ident_digest
> + + std::mem::size_of::<u32>() * 2 // pid, msg_len
> + + std::mem::size_of::<u8>() * 4 // priority, node_len, ident_len, tag_len
> + + node_len
> + + ident_len
> + + tag_len
> + + msg_len;
> +
> + if total_size > CLOG_MAX_ENTRY_SIZE {
> + let diff = total_size - CLOG_MAX_ENTRY_SIZE;
> + msg_len = msg_len.saturating_sub(diff);
> + }
> +
> + let node_digest = fnv_64a_str(&node);
> + let ident_digest = fnv_64a_str(&ident);
> + let uid = UID_COUNTER.fetch_add(1, Ordering::SeqCst).wrapping_add(1);
> +
> + Ok(Self {
> + uid,
> + time,
> + node_digest,
> + ident_digest,
> + pid,
> + priority,
> + node,
> + ident,
> + tag,
> + message: message[..msg_len.saturating_sub(1)].to_string(),
> + })
> + }
> +
> + /// Truncate string to max length
> + fn truncate_string(s: &str, max_len: usize) -> String {
> + if s.len() > max_len {
> + s[..max_len].to_string()
> + } else {
> + s.to_string()
> + }
> + }
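Side note: `s[..max_len]` panics when byte `max_len` falls inside a
multi-byte UTF-8 character, and the inputs here are not guaranteed to be
pure ASCII. A boundary-safe variant could look like this (standalone
sketch; backing off to the previous char boundary is my reading of the
intended behavior):

```rust
// Truncate to at most max_len bytes, backing off to a UTF-8 character
// boundary instead of byte-slicing (which panics mid-character).
fn truncate_string(s: &str, max_len: usize) -> String {
    if s.len() <= max_len {
        return s.to_string();
    }
    let mut end = max_len;
    while !s.is_char_boundary(end) {
        end -= 1;
    }
    s[..end].to_string()
}

fn main() {
    // "é" is 2 bytes; cutting at byte 2 would split it, so we back off.
    assert_eq!(truncate_string("aé", 2), "a");
    assert_eq!(truncate_string("aé", 3), "aé");
    assert_eq!(truncate_string("abc", 2), "ab");
}
```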
> +
> + /// Convert UTF-8 to ASCII with proper escaping
> + ///
> + /// Matches C's `utf8_to_ascii` behavior (cfs-utils.c:40-107):
> + /// - Control characters (0x00-0x1F, 0x7F): Escaped as #0XXX (e.g., #0007 for BEL)
> + /// - Unicode (U+0080 to U+FFFF): Escaped as \uXXXX (e.g., \u4e16 for 世)
> + /// - Quotes (when quotequote=true): Escaped as \"
> + /// - Characters > U+FFFF: Silently dropped
> + /// - ASCII printable (0x20-0x7E except quotes): Passed through unchanged
> + fn utf8_to_ascii(s: &str) -> String {
> + let mut result = String::with_capacity(s.len());
> +
> + for c in s.chars() {
> + match c {
> + // Control characters: #0XXX format (3 decimal digits with leading 0)
> + '\x00'..='\x1F' | '\x7F' => {
> + let code = c as u32;
> + result.push('#');
> + result.push('0');
> + // Format as 3 decimal digits with leading zeros (e.g., #0007 for BEL)
> + result.push_str(&format!("{:03}", code));
> + }
> + // ASCII printable characters: pass through
> + c if c.is_ascii() => {
> + result.push(c);
> + }
> + // Unicode U+0080 to U+FFFF: \uXXXX format
> + c if (c as u32) < 0x10000 => {
> + result.push('\\');
> + result.push('u');
> + result.push_str(&format!("{:04x}", c as u32));
> + }
> + // Characters > U+FFFF: silently drop (matches C behavior)
> + _ => {}
> + }
> + }
> +
> + result
> + }
> +
> + /// Matches C's `clog_entry_size` function (logger.c:201-206)
> + pub fn size(&self) -> usize {
> + std::mem::size_of::<u32>() * 4 // prev, next, uid, time
> + + std::mem::size_of::<u64>() * 2 // node_digest, ident_digest
> + + std::mem::size_of::<u32>() * 2 // pid, msg_len
> + + std::mem::size_of::<u8>() * 4 // priority, node_len, ident_len, tag_len
> + + self.node.len() + 1
> + + self.ident.len() + 1
> + + self.tag.len() + 1
> + + self.message.len() + 1
> + }
> +
> + /// C implementation: `uint32_t realsize = ((size + 7) & 0xfffffff8);`
> + pub fn aligned_size(&self) -> usize {
> + let size = self.size();
> + (size + 7) & !7
> + }
> +
> + pub fn to_json_object(&self) -> serde_json::Value {
> + serde_json::json!({
> + "uid": self.uid,
> + "time": self.time,
> + "pri": self.priority,
> + "tag": self.tag,
> + "pid": self.pid,
> + "node": self.node,
> + "user": self.ident,
> + "msg": self.message,
> + })
> + }
> +
> + /// Serialize to C binary format (clog_entry_t)
> + ///
> + /// Binary layout matches C structure:
> + /// ```c
> + /// struct {
> + /// uint32_t prev; // Will be filled by ring buffer
> + /// uint32_t next; // Will be filled by ring buffer
> + /// uint32_t uid;
> + /// uint32_t time;
> + /// uint64_t node_digest;
> + /// uint64_t ident_digest;
> + /// uint32_t pid;
> + /// uint8_t priority;
> + /// uint8_t node_len;
> + /// uint8_t ident_len;
> + /// uint8_t tag_len;
> + /// uint32_t msg_len;
> + /// char data[]; // node + ident + tag + msg (null-terminated)
> + /// }
> + /// ```
> + pub(crate) fn serialize_binary(&self, prev: u32, next: u32) -> Vec<u8> {
> + let mut buf = Vec::new();
> +
> + buf.extend_from_slice(&prev.to_le_bytes());
> + buf.extend_from_slice(&next.to_le_bytes());
> + buf.extend_from_slice(&self.uid.to_le_bytes());
> + buf.extend_from_slice(&self.time.to_le_bytes());
> + buf.extend_from_slice(&self.node_digest.to_le_bytes());
> + buf.extend_from_slice(&self.ident_digest.to_le_bytes());
> + buf.extend_from_slice(&self.pid.to_le_bytes());
> + buf.push(self.priority);
> +
> + let node_len = (self.node.len() + 1) as u8;
> + let ident_len = (self.ident.len() + 1) as u8;
> + let tag_len = (self.tag.len() + 1) as u8;
These three length fields are u8 and include the trailing NUL, so the
payload must be capped at 254 bytes, otherwise `len + 1` wraps to 0
here. The C code uses `MIN(strlen(...) + 1, 255)`. Note that
`truncate_string` above caps at 255 bytes, so a 255-byte node name
already triggers the wrap.
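A defensive cap mirroring the C behavior could look like this
(standalone sketch; `u8_len_with_nul` is a name invented here):

```rust
// Mirror C's MIN(strlen(s) + 1, 255): the stored length includes the
// trailing NUL and must fit in a u8, so the payload caps at 254 bytes.
fn u8_len_with_nul(s: &str) -> u8 {
    (s.len() + 1).min(255) as u8
}

fn main() {
    assert_eq!(u8_len_with_nul("node1"), 6); // 5 bytes + NUL
    let long = "a".repeat(300);
    assert_eq!(u8_len_with_nul(&long), 255); // capped, no wrap
    // The unchecked cast would wrap instead:
    assert_eq!((255usize + 1) as u8, 0);
}
```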
> + let msg_len = (self.message.len() + 1) as u32;
> +
> + buf.push(node_len);
> + buf.push(ident_len);
> + buf.push(tag_len);
> + buf.extend_from_slice(&msg_len.to_le_bytes());
> +
> + buf.extend_from_slice(self.node.as_bytes());
> + buf.push(0);
> +
> + buf.extend_from_slice(self.ident.as_bytes());
> + buf.push(0);
> +
> + buf.extend_from_slice(self.tag.as_bytes());
> + buf.push(0);
> +
> + buf.extend_from_slice(self.message.as_bytes());
> + buf.push(0);
> +
> + buf
> + }
> +
> + pub(crate) fn deserialize_binary(data: &[u8]) -> Result<(Self, u32, u32)> {
> + if data.len() < 48 {
> + bail!(
> + "Entry too small: {} bytes (need at least 48 for header)",
> + data.len()
> + );
> + }
> +
> + let mut offset = 0;
> +
> + let prev = u32::from_le_bytes(data[offset..offset + 4].try_into()?);
> + offset += 4;
> +
> + let next = u32::from_le_bytes(data[offset..offset + 4].try_into()?);
> + offset += 4;
> +
> + let uid = u32::from_le_bytes(data[offset..offset + 4].try_into()?);
> + offset += 4;
> +
> + let time = u32::from_le_bytes(data[offset..offset + 4].try_into()?);
> + offset += 4;
> +
> + let node_digest = u64::from_le_bytes(data[offset..offset + 8].try_into()?);
> + offset += 8;
> +
> + let ident_digest = u64::from_le_bytes(data[offset..offset + 8].try_into()?);
> + offset += 8;
> +
> + let pid = u32::from_le_bytes(data[offset..offset + 4].try_into()?);
> + offset += 4;
> +
> + let priority = data[offset];
> + offset += 1;
> +
> + let node_len = data[offset] as usize;
> + offset += 1;
> +
> + let ident_len = data[offset] as usize;
> + offset += 1;
> +
> + let tag_len = data[offset] as usize;
> + offset += 1;
> +
> + let msg_len = u32::from_le_bytes(data[offset..offset + 4].try_into()?) as usize;
> + offset += 4;
> +
> + if offset + node_len + ident_len + tag_len + msg_len > data.len() {
> + bail!("Entry data exceeds buffer size");
> + }
> +
> + let node = read_null_terminated(&data[offset..offset + node_len])?;
> + offset += node_len;
> +
> + let ident = read_null_terminated(&data[offset..offset + ident_len])?;
> + offset += ident_len;
> +
> + let tag = read_null_terminated(&data[offset..offset + tag_len])?;
> + offset += tag_len;
> +
> + let message = read_null_terminated(&data[offset..offset + msg_len])?;
> +
> + Ok((
> + Self {
> + uid,
> + time,
> + node_digest,
> + ident_digest,
> + pid,
> + priority,
> + node,
> + ident,
> + tag,
> + message,
> + },
> + prev,
> + next,
> + ))
> + }
> +}
> +
> +fn read_null_terminated(data: &[u8]) -> Result<String> {
> + let len = data.iter().position(|&b| b == 0).unwrap_or(data.len());
> + Ok(String::from_utf8_lossy(&data[..len]).into_owned())
> +}
> +
> +#[cfg(test)]
> +pub fn reset_uid_counter() {
> + UID_COUNTER.store(0, Ordering::SeqCst);
> +}
> +
> +#[cfg(test)]
> +mod tests {
> + use super::*;
> +
> + #[test]
> + fn test_pack_entry() {
> + reset_uid_counter();
> +
> + let entry = LogEntry::pack(
> + "node1",
> + "root",
> + "cluster",
> + 12345,
> + 1234567890,
> + 6, // Info priority
> + "Test message",
> + )
> + .unwrap();
> +
> + assert_eq!(entry.uid, 1);
> + assert_eq!(entry.time, 1234567890);
> + assert_eq!(entry.node, "node1");
> + assert_eq!(entry.ident, "root");
> + assert_eq!(entry.tag, "cluster");
> + assert_eq!(entry.pid, 12345);
> + assert_eq!(entry.priority, 6);
> + assert_eq!(entry.message, "Test message");
> + }
> +
> + #[test]
> + fn test_uid_increment() {
> + reset_uid_counter();
> +
> + let entry1 = LogEntry::pack("node1", "root", "tag", 0, 1000, 6, "msg1").unwrap();
> + let entry2 = LogEntry::pack("node1", "root", "tag", 0, 1001, 6, "msg2").unwrap();
> +
> + assert_eq!(entry1.uid, 1);
> + assert_eq!(entry2.uid, 2);
> + }
> +
> + #[test]
> + fn test_invalid_priority() {
> + let result = LogEntry::pack("node1", "root", "tag", 0, 1000, 8, "message");
> + assert!(result.is_err());
> + }
> +
> + #[test]
> + fn test_node_digest() {
> + let entry1 = LogEntry::pack("node1", "root", "tag", 0, 1000, 6, "msg").unwrap();
> + let entry2 = LogEntry::pack("node1", "root", "tag", 0, 1001, 6, "msg").unwrap();
> + let entry3 = LogEntry::pack("node2", "root", "tag", 0, 1000, 6, "msg").unwrap();
> +
> + // Same node should have same digest
> + assert_eq!(entry1.node_digest, entry2.node_digest);
> +
> + // Different node should have different digest
> + assert_ne!(entry1.node_digest, entry3.node_digest);
> + }
> +
> + #[test]
> + fn test_ident_digest() {
> + let entry1 = LogEntry::pack("node1", "root", "tag", 0, 1000, 6, "msg").unwrap();
> + let entry2 = LogEntry::pack("node1", "root", "tag", 0, 1001, 6, "msg").unwrap();
> + let entry3 = LogEntry::pack("node1", "admin", "tag", 0, 1000, 6, "msg").unwrap();
> +
> + // Same ident should have same digest
> + assert_eq!(entry1.ident_digest, entry2.ident_digest);
> +
> + // Different ident should have different digest
> + assert_ne!(entry1.ident_digest, entry3.ident_digest);
> + }
> +
> + #[test]
> + fn test_utf8_to_ascii() {
> + let entry = LogEntry::pack("node1", "root", "tag", 0, 1000, 6, "Hello 世界").unwrap();
> + assert!(entry.message.is_ascii());
> + // Unicode chars escaped as \uXXXX format (matches C implementation)
> + assert!(entry.message.contains("\\u4e16")); // 世 = U+4E16
> + assert!(entry.message.contains("\\u754c")); // 界 = U+754C
> + }
> +
> + #[test]
> + fn test_utf8_control_chars() {
> + // Test control character escaping
> + let entry = LogEntry::pack("node1", "root", "tag", 0, 1000, 6, "Hello\x07World").unwrap();
> + assert!(entry.message.is_ascii());
> + // BEL (0x07) should be escaped as #0007
> + assert!(entry.message.contains("#0007"));
> + }
> +
> + #[test]
> + fn test_utf8_mixed_content() {
> + // Test mix of ASCII, Unicode, and control chars
> + let entry = LogEntry::pack(
> + "node1",
> + "root",
> + "tag",
> + 0,
> + 1000,
> + 6,
> + "Test\x01\nUnicode世\ttab",
> + )
> + .unwrap();
> + assert!(entry.message.is_ascii());
> + // SOH (0x01) -> #0001
> + assert!(entry.message.contains("#0001"));
> + // Newline (0x0A) -> #0010
> + assert!(entry.message.contains("#0010"));
> + // Unicode 世 (U+4E16) -> \u4e16
> + assert!(entry.message.contains("\\u4e16"));
> + // Tab (0x09) -> #0009
> + assert!(entry.message.contains("#0009"));
> + }
> +
> + #[test]
> + fn test_string_truncation() {
> + let long_node = "a".repeat(300);
> + let entry = LogEntry::pack(&long_node, "root", "tag", 0, 1000, 6, "msg").unwrap();
> + assert!(entry.node.len() <= 255);
> + }
> +
> + #[test]
> + fn test_message_truncation() {
> + let long_message = "a".repeat(CLOG_MAX_ENTRY_SIZE);
> + let entry = LogEntry::pack("node1", "root", "tag", 0, 1000, 6, &long_message).unwrap();
> + // Entry should fit within max size
> + assert!(entry.size() <= CLOG_MAX_ENTRY_SIZE);
> + }
> +
> + #[test]
> + fn test_aligned_size() {
> + let entry = LogEntry::pack("node1", "root", "tag", 0, 1000, 6, "msg").unwrap();
> + let aligned = entry.aligned_size();
> +
> + // Aligned size should be multiple of 8
> + assert_eq!(aligned % 8, 0);
> +
> + // Aligned size should be >= actual size
> + assert!(aligned >= entry.size());
> +
> + // Aligned size should be within 7 bytes of actual size
> + assert!(aligned - entry.size() < 8);
> + }
> +
> + #[test]
> + fn test_json_export() {
> + let entry = LogEntry::pack("node1", "root", "cluster", 123, 1234567890, 6, "Test").unwrap();
> + let json = entry.to_json_object();
> +
> + assert_eq!(json["node"], "node1");
> + assert_eq!(json["user"], "root");
> + assert_eq!(json["tag"], "cluster");
> + assert_eq!(json["pid"], 123);
> + assert_eq!(json["time"], 1234567890);
> + assert_eq!(json["pri"], 6);
> + assert_eq!(json["msg"], "Test");
> + }
> +
> + #[test]
> + fn test_binary_serialization_roundtrip() {
> + let entry = LogEntry::pack(
> + "node1",
> + "root",
> + "cluster",
> + 12345,
> + 1234567890,
> + 6,
> + "Test message",
> + )
> + .unwrap();
> +
> + // Serialize with prev/next pointers
> + let binary = entry.serialize_binary(100, 200);
> +
> + // Deserialize
> + let (deserialized, prev, next) = LogEntry::deserialize_binary(&binary).unwrap();
> +
> + // Check prev/next pointers
> + assert_eq!(prev, 100);
> + assert_eq!(next, 200);
> +
> + // Check entry fields
> + assert_eq!(deserialized.uid, entry.uid);
> + assert_eq!(deserialized.time, entry.time);
> + assert_eq!(deserialized.node_digest, entry.node_digest);
> + assert_eq!(deserialized.ident_digest, entry.ident_digest);
> + assert_eq!(deserialized.pid, entry.pid);
> + assert_eq!(deserialized.priority, entry.priority);
> + assert_eq!(deserialized.node, entry.node);
> + assert_eq!(deserialized.ident, entry.ident);
> + assert_eq!(deserialized.tag, entry.tag);
> + assert_eq!(deserialized.message, entry.message);
> + }
> +
> + #[test]
> + fn test_binary_format_header_size() {
> + let entry = LogEntry::pack("n", "u", "t", 1, 1000, 6, "m").unwrap();
> + let binary = entry.serialize_binary(0, 0);
> +
> + // Header should be exactly 48 bytes
> + // prev(4) + next(4) + uid(4) + time(4) + node_digest(8) + ident_digest(8) +
> + // pid(4) + priority(1) + node_len(1) + ident_len(1) + tag_len(1) + msg_len(4)
> + assert!(binary.len() >= 48);
> +
> + // First 48 bytes are header
> + assert_eq!(&binary[0..4], &0u32.to_le_bytes()); // prev
> + assert_eq!(&binary[4..8], &0u32.to_le_bytes()); // next
> + }
> +
> + #[test]
> + fn test_binary_deserialize_invalid_size() {
> + let too_small = vec![0u8; 40]; // Less than 48 byte header
> + let result = LogEntry::deserialize_binary(&too_small);
> + assert!(result.is_err());
> + }
> +
> + #[test]
> + fn test_binary_null_terminators() {
> + let entry = LogEntry::pack("node1", "root", "tag", 123, 1000, 6, "message").unwrap();
> + let binary = entry.serialize_binary(0, 0);
> +
> + // Check that strings are null-terminated
> + // Find null bytes in data section (after 48-byte header)
> + let data_section = &binary[48..];
> + let null_count = data_section.iter().filter(|&&b| b == 0).count();
> + assert_eq!(null_count, 4); // 4 null terminators (node, ident, tag, msg)
> + }
> +}
> diff --git a/src/pmxcfs-rs/pmxcfs-logger/src/hash.rs b/src/pmxcfs-rs/pmxcfs-logger/src/hash.rs
> new file mode 100644
> index 00000000..710c9ab3
> --- /dev/null
> +++ b/src/pmxcfs-rs/pmxcfs-logger/src/hash.rs
> @@ -0,0 +1,173 @@
> +/// FNV-1a (Fowler-Noll-Vo) 64-bit hash function
> +///
> +/// This matches the C implementation's fnv_64a_buf function (logger.c:52-60)
> +/// Used for generating node and ident digests for deduplication.
> +/// FNV-1a 64-bit non-zero initial basis
> +pub(crate) const FNV1A_64_INIT: u64 = 0xcbf29ce484222325;
> +
> +/// Compute 64-bit FNV-1a hash
> +///
> +/// This is a faithful port of the C implementation from logger.c lines 52-60:
> +/// ```c
> +/// static inline uint64_t fnv_64a_buf(const void *buf, size_t len, uint64_t hval) {
> +/// unsigned char *bp = (unsigned char *)buf;
> +/// unsigned char *be = bp + len;
> +/// while (bp < be) {
> +/// hval ^= (uint64_t)*bp++;
> +/// hval += (hval << 1) + (hval << 4) + (hval << 5) + (hval << 7) + (hval << 8) + (hval << 40);
> +/// }
> +/// return hval;
> +/// }
> +/// ```
> +///
> +/// # Arguments
> +/// * `data` - The data to hash
> +/// * `init` - Initial hash value (use FNV1A_64_INIT for first hash)
> +///
> +/// # Returns
> +/// 64-bit hash value
> +///
> +/// Note: This function appears unused but is actually called via `fnv_64a_str` below,
> +/// which provides the primary API for string hashing. Both functions share the core
> +/// FNV-1a implementation logic.
> +#[inline]
> +#[allow(dead_code)] // Used via fnv_64a_str wrapper
> +pub(crate) fn fnv_64a(data: &[u8], init: u64) -> u64 {
> + let mut hval = init;
> +
> + for &byte in data {
> + hval ^= byte as u64;
> + // FNV magic prime multiplication done via shifts and adds
> + // This is equivalent to: hval *= 0x100000001b3 (FNV 64-bit prime)
> + hval = hval.wrapping_add(
> + (hval << 1)
> + .wrapping_add(hval << 4)
> + .wrapping_add(hval << 5)
> + .wrapping_add(hval << 7)
> + .wrapping_add(hval << 8)
> + .wrapping_add(hval << 40),
> + );
> + }
> +
> + hval
> +}
> +
> +/// Hash a null-terminated string (includes the null byte)
> +///
> +/// The C implementation includes the null terminator in the hash:
> +/// `fnv_64a_buf(node, node_len, FNV1A_64_INIT)` where node_len includes the '\0'
> +///
> +/// This function adds a null byte to match that behavior.
> +#[inline]
> +pub(crate) fn fnv_64a_str(s: &str) -> u64 {
> + let bytes = s.as_bytes();
> + let mut hval = FNV1A_64_INIT;
> +
> + for &byte in bytes {
> + hval ^= byte as u64;
> + hval = hval.wrapping_add(
> + (hval << 1)
> + .wrapping_add(hval << 4)
> + .wrapping_add(hval << 5)
> + .wrapping_add(hval << 7)
> + .wrapping_add(hval << 8)
> + .wrapping_add(hval << 40),
> + );
> + }
> +
> + // Hash the null terminator (C compatibility: original XORs with 0 which is a no-op)
> + // We skip the no-op XOR and proceed directly to the final avalanche
> + hval.wrapping_add(
> + (hval << 1)
> + .wrapping_add(hval << 4)
> + .wrapping_add(hval << 5)
> + .wrapping_add(hval << 7)
> + .wrapping_add(hval << 8)
> + .wrapping_add(hval << 40),
> + )
> +}
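Side note: the doc comment on `fnv_64a` says it is "called via
fnv_64a_str", but `fnv_64a_str` actually duplicates the loop. Since the
trailing-NUL round is just one more iteration over a zero byte, the
string variant could delegate to the buffer variant (standalone sketch
reproducing the patch's hash):

```rust
const FNV1A_64_INIT: u64 = 0xcbf29ce484222325;

// Same shift-and-add FNV-1a round as in the patch.
fn fnv_64a(data: &[u8], init: u64) -> u64 {
    let mut hval = init;
    for &byte in data {
        hval ^= byte as u64;
        hval = hval.wrapping_add(
            (hval << 1)
                .wrapping_add(hval << 4)
                .wrapping_add(hval << 5)
                .wrapping_add(hval << 7)
                .wrapping_add(hval << 8)
                .wrapping_add(hval << 40),
        );
    }
    hval
}

// Hash the bytes, then one more round over the NUL terminator.
fn fnv_64a_str(s: &str) -> u64 {
    fnv_64a(&[0], fnv_64a(s.as_bytes(), FNV1A_64_INIT))
}

fn main() {
    // Identical to hashing the NUL-terminated byte string in one pass.
    assert_eq!(fnv_64a_str("node1"), fnv_64a(b"node1\0", FNV1A_64_INIT));
}
```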
> +
> +#[cfg(test)]
> +mod tests {
> + use super::*;
> +
> + #[test]
> + fn test_fnv1a_init() {
> + // Test that init constant matches C implementation
> + assert_eq!(FNV1A_64_INIT, 0xcbf29ce484222325);
> + }
> +
> + #[test]
> + fn test_fnv1a_empty() {
> + // Empty string with null terminator
> + let hash = fnv_64a(&[0], FNV1A_64_INIT);
> + assert_ne!(hash, FNV1A_64_INIT); // Should be different from init
> + }
> +
> + #[test]
> + fn test_fnv1a_consistency() {
> + // Same input should produce same output
> + let data = b"test";
> + let hash1 = fnv_64a(data, FNV1A_64_INIT);
> + let hash2 = fnv_64a(data, FNV1A_64_INIT);
> + assert_eq!(hash1, hash2);
> + }
> +
> + #[test]
> + fn test_fnv1a_different_data() {
> + // Different input should (usually) produce different output
> + let hash1 = fnv_64a(b"test1", FNV1A_64_INIT);
> + let hash2 = fnv_64a(b"test2", FNV1A_64_INIT);
> + assert_ne!(hash1, hash2);
> + }
> +
> + #[test]
> + fn test_fnv1a_str() {
> + // Test string hashing with null terminator
> + let hash1 = fnv_64a_str("node1");
> + let hash2 = fnv_64a_str("node1");
> + let hash3 = fnv_64a_str("node2");
> +
> + assert_eq!(hash1, hash2); // Same string should hash the same
> + assert_ne!(hash1, hash3); // Different strings should hash differently
> + }
> +
> + #[test]
> + fn test_fnv1a_node_names() {
> + // Test with typical Proxmox node names
> + let nodes = vec!["pve1", "pve2", "pve3"];
> + let mut hashes = Vec::new();
> +
> + for node in &nodes {
> + let hash = fnv_64a_str(node);
> + hashes.push(hash);
> + }
> +
> + // All hashes should be unique
> + for i in 0..hashes.len() {
> + for j in (i + 1)..hashes.len() {
> + assert_ne!(
> + hashes[i], hashes[j],
> + "Hashes for {} and {} should differ",
> + nodes[i], nodes[j]
> + );
> + }
> + }
> + }
> +
> + #[test]
> + fn test_fnv1a_chaining() {
> + // Test that we can chain hashes
> + let data1 = b"first";
> + let data2 = b"second";
> +
> + let hash1 = fnv_64a(data1, FNV1A_64_INIT);
> + let hash2 = fnv_64a(data2, hash1); // Use previous hash as init
> +
> + // Should produce a deterministic result
> + let hash1_again = fnv_64a(data1, FNV1A_64_INIT);
> + let hash2_again = fnv_64a(data2, hash1_again);
> +
> + assert_eq!(hash2, hash2_again);
> + }
> +}
> diff --git a/src/pmxcfs-rs/pmxcfs-logger/src/lib.rs b/src/pmxcfs-rs/pmxcfs-logger/src/lib.rs
> new file mode 100644
> index 00000000..964f0b3a
> --- /dev/null
> +++ b/src/pmxcfs-rs/pmxcfs-logger/src/lib.rs
> @@ -0,0 +1,27 @@
> +/// Cluster Log Implementation
> +///
> +/// This module provides a cluster-wide log system compatible with the C implementation.
> +/// It maintains a ring buffer of log entries that can be merged from multiple nodes,
> +/// deduplicated, and exported to JSON.
> +///
> +/// Key features:
> +/// - Ring buffer storage for efficient memory usage
> +/// - FNV-1a hashing for node and ident tracking
> +/// - Deduplication across nodes
> +/// - Time-based sorting
> +/// - Multi-node log merging
> +/// - JSON export for web UI
> +// Internal modules (not exposed)
> +mod cluster_log;
> +mod entry;
> +mod hash;
> +mod ring_buffer;
> +
> +// Public API - only expose what's needed externally
> +pub use cluster_log::ClusterLog;
> +
> +// Re-export types only for testing or internal crate use
> +#[doc(hidden)]
> +pub use entry::LogEntry;
> +#[doc(hidden)]
> +pub use ring_buffer::RingBuffer;
> diff --git a/src/pmxcfs-rs/pmxcfs-logger/src/ring_buffer.rs b/src/pmxcfs-rs/pmxcfs-logger/src/ring_buffer.rs
> new file mode 100644
> index 00000000..4f6db63e
> --- /dev/null
> +++ b/src/pmxcfs-rs/pmxcfs-logger/src/ring_buffer.rs
> @@ -0,0 +1,581 @@
> +/// Ring Buffer Implementation for Cluster Log
> +///
> +/// This module implements a circular buffer for storing log entries,
> +/// matching the C implementation's clog_base_t structure.
> +use super::entry::LogEntry;
> +use super::hash::fnv_64a_str;
> +use anyhow::{bail, Result};
> +use std::collections::VecDeque;
> +
> +pub(crate) const CLOG_DEFAULT_SIZE: usize = 5 * 1024 * 1024; // 5MB
> +pub(crate) const CLOG_MAX_ENTRY_SIZE: usize = 8192 + 4096;
These constants don't match the C ones:

    #define CLOG_DEFAULT_SIZE (8192 * 16)
    #define CLOG_MAX_ENTRY_SIZE 4096

Since nodes exchange the serialized state, the mismatch likely affects
capacity semantics, merge limits, and on-wire compatibility of the
binary format. Was this changed intentionally?
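If compatibility with the C pmxcfs is required, the Rust side would
presumably need to adopt the C values (copied from the quoted #defines;
whether strict compatibility is actually needed here is an open
question):

```rust
// Values from the C logger.c, as quoted above:
const CLOG_DEFAULT_SIZE: usize = 8192 * 16; // 128 KiB, not 5 MiB
const CLOG_MAX_ENTRY_SIZE: usize = 4096;    // not 8192 + 4096

// Compile-time sanity check: an entry must always fit in the buffer.
const _: () = assert!(CLOG_MAX_ENTRY_SIZE <= CLOG_DEFAULT_SIZE);

fn main() {
    assert_eq!(CLOG_DEFAULT_SIZE, 131072);
    assert_eq!(CLOG_MAX_ENTRY_SIZE, 4096);
}
```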
> +
> +/// Ring buffer for log entries
> +///
> +/// This is a simplified Rust version of the C implementation's ring buffer.
> +/// The C version uses a raw byte buffer with manual pointer arithmetic,
> +/// but we use a VecDeque for safety and simplicity while maintaining
> +/// the same conceptual behavior.
> +///
> +/// C structure (logger.c:64-68):
> +/// ```c
> +/// struct clog_base {
> +/// uint32_t size; // Total buffer size
> +/// uint32_t cpos; // Current position
> +/// char data[]; // Variable length data
> +/// };
> +/// ```
> +#[derive(Debug, Clone)]
> +pub struct RingBuffer {
> + /// Maximum capacity in bytes
> + capacity: usize,
> +
> + /// Current size in bytes (approximate)
> + current_size: usize,
> +
> + /// Entries stored in the buffer (newest first)
> + /// We use VecDeque for efficient push/pop at both ends
> + entries: VecDeque<LogEntry>,
> +}
> +
> +impl RingBuffer {
> + /// Create a new ring buffer with specified capacity
> + pub fn new(capacity: usize) -> Self {
> + // Ensure minimum capacity
> + let capacity = if capacity < CLOG_MAX_ENTRY_SIZE * 10 {
> + CLOG_DEFAULT_SIZE
> + } else {
> + capacity
> + };
> +
> + Self {
> + capacity,
> + current_size: 0,
> + entries: VecDeque::new(),
> + }
> + }
> +
> + /// Add an entry to the buffer
> + ///
> + /// Matches C's `clog_copy` function (logger.c:208-218) which calls
> + /// `clog_alloc_entry` (logger.c:76-102) to allocate space in the ring buffer.
> + pub fn add_entry(&mut self, entry: &LogEntry) -> Result<()> {
> + let entry_size = entry.aligned_size();
> +
> + // Make room if needed (remove oldest entries)
> + while self.current_size + entry_size > self.capacity && !self.entries.is_empty() {
> + if let Some(old_entry) = self.entries.pop_back() {
> + self.current_size = self.current_size.saturating_sub(old_entry.aligned_size());
> + }
> + }
> +
> + // Add new entry at the front (newest first)
> + self.entries.push_front(entry.clone());
> + self.current_size += entry_size;
> +
> + Ok(())
> + }
> +
> + /// Check if buffer is near full (>90% capacity)
> + pub fn is_near_full(&self) -> bool {
> + self.current_size > (self.capacity * 9 / 10)
> + }
> +
> + /// Check if buffer is empty
> + pub fn is_empty(&self) -> bool {
> + self.entries.is_empty()
> + }
> +
> + /// Get number of entries
> + pub fn len(&self) -> usize {
> + self.entries.len()
> + }
> +
> + /// Get buffer capacity
> + pub fn capacity(&self) -> usize {
> + self.capacity
> + }
> +
> + /// Iterate over entries (newest first)
> + pub fn iter(&self) -> impl Iterator<Item = &LogEntry> {
> + self.entries.iter()
> + }
> +
> + /// Sort entries by time, node_digest, and uid
> + ///
> + /// Matches C's `clog_sort` function (logger.c:321-355)
> + ///
> + /// C uses GTree with custom comparison function `clog_entry_sort_fn`
> + /// (logger.c:297-310):
> + /// ```c
> + /// if (entry1->time != entry2->time) {
> + /// return entry1->time - entry2->time;
> + /// }
> + /// if (entry1->node_digest != entry2->node_digest) {
> + /// return entry1->node_digest - entry2->node_digest;
> + /// }
> + /// return entry1->uid - entry2->uid;
> + /// ```
> + pub fn sort(&self) -> Result<Self> {
> + let mut new_buffer = Self::new(self.capacity);
> +
> + // Collect and sort entries
> + let mut sorted: Vec<LogEntry> = self.entries.iter().cloned().collect();
> +
> + // Sort by time (ascending), then node_digest, then uid
> + sorted.sort_by_key(|e| (e.time, e.node_digest, e.uid));
> +
> + // Add sorted entries to new buffer
> + // Since add_entry pushes to front, we add in forward order to get newest-first
> + // sorted = [oldest...newest], add_entry pushes to front, so:
> + // - Add oldest: [oldest]
> + // - Add next: [next, oldest]
> + // - Add newest: [newest, next, oldest]
> + for entry in sorted.iter() {
> + new_buffer.add_entry(entry)?;
> + }
> +
> + Ok(new_buffer)
> + }
> +
> + /// Dump buffer to JSON format
> + ///
> + /// Matches C's `clog_dump_json` function (logger.c:139-199)
> + ///
> + /// # Arguments
> + /// * `ident_filter` - Optional ident filter (user filter)
> + /// * `max_entries` - Maximum number of entries to include
> + pub fn dump_json(&self, ident_filter: Option<&str>, max_entries: usize) -> String {
> + // Compute ident digest if filter is provided
> + let ident_digest = ident_filter.map(fnv_64a_str);
> +
> + let mut data = Vec::new();
> + let mut count = 0;
> +
> + // Iterate over entries (newest first)
> + for entry in self.iter() {
> + if count >= max_entries {
> + break;
> + }
> +
> + // Apply ident filter if specified
> + if let Some(digest) = ident_digest {
> + if digest != entry.ident_digest {
> + continue;
> + }
> + }
> +
> + data.push(entry.to_json_object());
> + count += 1;
> + }
> +
> + // Reverse to show oldest first (matching C behavior)
> + data.reverse();
C prints entries newest to oldest (walking prev from cpos).
Shouldn't this reverse be removed?
> +
> + let result = serde_json::json!({
> + "data": data
> + });
> +
> + serde_json::to_string_pretty(&result).unwrap_or_else(|_| "{}".to_string())
> + }
> +
> + /// Dump buffer contents (for debugging)
> + ///
> + /// Matches C's `clog_dump` function (logger.c:122-137)
> + #[allow(dead_code)]
> + pub fn dump(&self) {
> + for (idx, entry) in self.entries.iter().enumerate() {
> + println!(
> + "[{}] uid={:08x} time={} node={}{{{:016X}}} tag={}[{}{{{:016X}}}]: {}",
> + idx,
> + entry.uid,
> + entry.time,
> + entry.node,
> + entry.node_digest,
> + entry.tag,
> + entry.ident,
> + entry.ident_digest,
> + entry.message
> + );
> + }
> + }
> +
> + /// Serialize to C binary format (clog_base_t)
> + ///
> + /// Binary layout matches C structure:
> + /// ```c
> + /// struct clog_base {
> + /// uint32_t size; // Total buffer size
> + /// uint32_t cpos; // Current position (offset to newest entry)
> + /// char data[]; // Entry data
> + /// };
> + /// ```
> + pub(crate) fn serialize_binary(&self) -> Vec<u8> {
Please re-check, but in C, clusterlog_get_state() returns a full
memdump (the allocated ring buffer capacity), with cpos pointing at the
newest entry's offset (not always 8). Also, in C, entry.next is not a
pointer to the next/newer entry; it's the end offset of this entry
(entry_off + aligned_size), used to find where the next entry should
be written.
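To illustrate the `next` semantics I mean, a tiny hypothetical helper (names are mine, not from the C code):

```rust
// Sketch of the C `next` semantics: `next` stores the end offset of the
// current entry (entry_off + aligned_size), i.e. where the next entry will
// be written, not a link to the newer entry.
fn next_field(entry_off: u32, aligned_size: u32) -> u32 {
    entry_off + aligned_size
}

fn main() {
    // An entry at offset 8 with an aligned size of 64 stores next = 72.
    assert_eq!(next_field(8, 64), 72);
}
```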
> + // Empty buffer case
> + if self.entries.is_empty() {
> + let mut buf = Vec::with_capacity(8);
> + buf.extend_from_slice(&8u32.to_le_bytes()); // size = header only
> + buf.extend_from_slice(&0u32.to_le_bytes()); // cpos = 0 (empty)
> + return buf;
> + }
> +
> + // Calculate total size needed
> + let mut data_size = 0usize;
> + for entry in self.iter() {
> + data_size += entry.aligned_size();
> + }
> +
> + let total_size = 8 + data_size; // 8 bytes header + data
> + let mut buf = Vec::with_capacity(total_size);
> +
> + // Write header
> + buf.extend_from_slice(&(total_size as u32).to_le_bytes()); // size
> + buf.extend_from_slice(&8u32.to_le_bytes()); // cpos (points to first entry at offset 8)
> +
> + // Write entries with linked list structure
> + // Entries are in newest-first order in our VecDeque
> + let entry_count = self.entries.len();
> + let mut offsets = Vec::with_capacity(entry_count);
> + let mut current_offset = 8u32; // Start after header
> +
> + // Calculate offsets first
> + for entry in self.iter() {
> + offsets.push(current_offset);
> + current_offset += entry.aligned_size() as u32;
> + }
> +
> + // Write entries with prev/next pointers
> + // Build circular linked list: newest -> ... -> oldest
> + // Entry 0 (newest) has prev pointing to entry 1
> + // Last entry has prev = 0 (end of list)
> + for (i, entry) in self.iter().enumerate() {
> + let prev = if i + 1 < entry_count {
> + offsets[i + 1]
> + } else {
> + 0
> + };
> + let next = if i > 0 { offsets[i - 1] } else { 0 };
> +
> + let entry_bytes = entry.serialize_binary(prev, next);
> + buf.extend_from_slice(&entry_bytes);
> +
> + // Add padding to maintain 8-byte alignment
> + let aligned_size = entry.aligned_size();
> + let padding = aligned_size - entry_bytes.len();
> + buf.resize(buf.len() + padding, 0);
> + }
> +
> + buf
> + }
> +
> + /// Deserialize from C binary format
> + ///
> + /// Parses clog_base_t structure and extracts all entries
> + pub(crate) fn deserialize_binary(data: &[u8]) -> Result<Self> {
> + if data.len() < 8 {
> + bail!(
> + "Buffer too small: {} bytes (need at least 8 for header)",
> + data.len()
> + );
> + }
> +
> + // Read header
> + let size = u32::from_le_bytes(data[0..4].try_into()?) as usize;
> + let cpos = u32::from_le_bytes(data[4..8].try_into()?) as usize;
> +
> + if size != data.len() {
> + bail!(
> + "Size mismatch: header says {}, got {} bytes",
> + size,
> + data.len()
> + );
> + }
> +
> + if cpos < 8 || cpos >= size {
> + // Empty buffer (cpos == 0) or invalid
> + if cpos == 0 {
> + return Ok(Self::new(size));
> + }
> + bail!("Invalid cpos: {cpos} (size: {size})");
> + }
> +
> + // Parse entries starting from cpos, walking backwards via prev pointers
> + let mut entries = VecDeque::new();
> + let mut current_pos = cpos;
> +
C has wrap/overwrite guards when walking prev.
We should probably mirror those checks here too.
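As one possible shape for such a guard, a hypothetical size-bound check (the real C checks also detect offsets that wrapped past cpos; this only sketches bounding the walk so a corrupted or cyclic prev chain cannot loop forever):

```rust
// Hypothetical guard: stop the prev-walk once the accumulated aligned entry
// sizes would exceed the buffer capacity, bounding the loop even if the
// prev offsets form a cycle in corrupted input.
fn should_stop(accumulated: usize, entry_size: usize, capacity: usize) -> bool {
    accumulated + entry_size > capacity
}

fn main() {
    assert!(!should_stop(0, 64, 1024));   // plenty of room left
    assert!(should_stop(1000, 64, 1024)); // would overrun the buffer
}
```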
> + loop {
> + if current_pos == 0 || current_pos < 8 || current_pos >= size {
> + break;
> + }
> +
> + // Parse entry at current_pos
> + let entry_data = &data[current_pos..];
> + let (entry, prev, _next) = LogEntry::deserialize_binary(entry_data)?;
> +
> + // Add to back (we're walking backwards in time, newest to oldest)
> + // VecDeque should end up as [newest, ..., oldest]
> + entries.push_back(entry);
> +
> + current_pos = prev as usize;
> + }
> +
> + // Create ring buffer with entries
> + let mut ring = Self::new(size);
> + ring.entries = entries;
> + ring.current_size = size - 8; // Approximate
> +
> + Ok(ring)
> + }
> +}
> +
> +impl Default for RingBuffer {
> + fn default() -> Self {
> + Self::new(CLOG_DEFAULT_SIZE)
> + }
> +}
> +
> +#[cfg(test)]
> +mod tests {
> + use super::*;
> +
> + #[test]
> + fn test_ring_buffer_creation() {
> + let buffer = RingBuffer::new(CLOG_DEFAULT_SIZE);
> + assert_eq!(buffer.capacity, CLOG_DEFAULT_SIZE);
> + assert_eq!(buffer.len(), 0);
> + assert!(buffer.is_empty());
> + }
> +
> + #[test]
> + fn test_add_entry() {
> + let mut buffer = RingBuffer::new(CLOG_DEFAULT_SIZE);
> + let entry = LogEntry::pack("node1", "root", "tag", 0, 1000, 6, "message").unwrap();
> +
> + let result = buffer.add_entry(&entry);
> + assert!(result.is_ok());
> + assert_eq!(buffer.len(), 1);
> + assert!(!buffer.is_empty());
> + }
> +
> + #[test]
> + fn test_ring_buffer_wraparound() {
> + // Create a buffer with minimum required size (CLOG_MAX_ENTRY_SIZE * 10)
> + // but fill it beyond 90% to trigger wraparound
> + let mut buffer = RingBuffer::new(CLOG_MAX_ENTRY_SIZE * 10);
> +
> + // Add many small entries to fill the buffer
> + // Each entry is small, so we need many to fill the buffer
> + let initial_count = 50_usize;
> + for i in 0..initial_count {
> + let entry =
> + LogEntry::pack("node1", "root", "tag", 0, 1000 + i as u32, 6, "msg").unwrap();
> + let _ = buffer.add_entry(&entry);
> + }
> +
> + // All entries should fit initially
> + let count_before = buffer.len();
> + assert_eq!(count_before, initial_count);
> +
> + // Now add entries with large messages to trigger wraparound
> + // Make messages large enough to fill the buffer beyond capacity
> + let large_msg = "x".repeat(7000); // Very large message (close to max)
> + let large_entries_count = 20_usize;
> + for i in 0..large_entries_count {
> + let entry =
> + LogEntry::pack("node1", "root", "tag", 0, 2000 + i as u32, 6, &large_msg).unwrap();
> + let _ = buffer.add_entry(&entry);
> + }
> +
> + // Should have removed some old entries due to capacity limits
> + assert!(
> + buffer.len() < count_before + large_entries_count,
> + "Expected wraparound to remove old entries (have {} entries, expected < {})",
> + buffer.len(),
> + count_before + large_entries_count
> + );
> +
> + // Newest entry should be present
> + let newest = buffer.iter().next().unwrap();
> + assert_eq!(newest.time, 2000 + large_entries_count as u32 - 1); // Last added entry
> + }
> +
> + #[test]
> + fn test_sort_by_time() {
> + let mut buffer = RingBuffer::new(CLOG_DEFAULT_SIZE);
> +
> + // Add entries in random time order
> + let _ = buffer.add_entry(&LogEntry::pack("node1", "root", "tag", 0, 1002, 6, "c").unwrap());
> + let _ = buffer.add_entry(&LogEntry::pack("node1", "root", "tag", 0, 1000, 6, "a").unwrap());
> + let _ = buffer.add_entry(&LogEntry::pack("node1", "root", "tag", 0, 1001, 6, "b").unwrap());
> +
> + let sorted = buffer.sort().unwrap();
> +
> + // Check that entries are sorted by time (oldest first after reversing)
> + let times: Vec<u32> = sorted.iter().map(|e| e.time).collect();
> + let mut times_sorted = times.clone();
> + times_sorted.sort();
> + times_sorted.reverse(); // Newest first in buffer
> + assert_eq!(times, times_sorted);
> + }
> +
> + #[test]
> + fn test_sort_by_node_digest() {
> + let mut buffer = RingBuffer::new(CLOG_DEFAULT_SIZE);
> +
> + // Add entries with same time but different nodes
> + let _ = buffer.add_entry(&LogEntry::pack("node3", "root", "tag", 0, 1000, 6, "c").unwrap());
> + let _ = buffer.add_entry(&LogEntry::pack("node1", "root", "tag", 0, 1000, 6, "a").unwrap());
> + let _ = buffer.add_entry(&LogEntry::pack("node2", "root", "tag", 0, 1000, 6, "b").unwrap());
> +
> + let sorted = buffer.sort().unwrap();
> +
> + // Entries with same time should be sorted by node_digest
> + // Within same time, should be sorted
> + for entries in sorted.iter().collect::<Vec<_>>().windows(2) {
> + if entries[0].time == entries[1].time {
> + assert!(entries[0].node_digest >= entries[1].node_digest);
> + }
> + }
> + }
> +
> + #[test]
> + fn test_json_dump() {
> + let mut buffer = RingBuffer::new(CLOG_DEFAULT_SIZE);
> + let _ = buffer
> + .add_entry(&LogEntry::pack("node1", "root", "cluster", 123, 1000, 6, "msg").unwrap());
> +
> + let json = buffer.dump_json(None, 50);
> +
> + // Should be valid JSON
> + let parsed: serde_json::Value = serde_json::from_str(&json).unwrap();
> + assert!(parsed.get("data").is_some());
> +
> + let data = parsed["data"].as_array().unwrap();
> + assert_eq!(data.len(), 1);
> +
> + let entry = &data[0];
> + assert_eq!(entry["node"], "node1");
> + assert_eq!(entry["user"], "root");
> + assert_eq!(entry["tag"], "cluster");
> + }
> +
> + #[test]
> + fn test_json_dump_with_filter() {
> + let mut buffer = RingBuffer::new(CLOG_DEFAULT_SIZE);
> +
> + // Add entries with different users
> + let _ =
> + buffer.add_entry(&LogEntry::pack("node1", "root", "tag", 0, 1000, 6, "msg1").unwrap());
> + let _ =
> + buffer.add_entry(&LogEntry::pack("node1", "admin", "tag", 0, 1001, 6, "msg2").unwrap());
> + let _ =
> + buffer.add_entry(&LogEntry::pack("node1", "root", "tag", 0, 1002, 6, "msg3").unwrap());
> +
> + // Filter for "root" only
> + let json = buffer.dump_json(Some("root"), 50);
> +
> + let parsed: serde_json::Value = serde_json::from_str(&json).unwrap();
> + let data = parsed["data"].as_array().unwrap();
> +
> + // Should only have 2 entries (the ones from "root")
> + assert_eq!(data.len(), 2);
> +
> + for entry in data {
> + assert_eq!(entry["user"], "root");
> + }
> + }
> +
> + #[test]
> + fn test_json_dump_max_entries() {
> + let mut buffer = RingBuffer::new(CLOG_DEFAULT_SIZE);
> +
> + // Add 10 entries
> + for i in 0..10 {
> + let _ = buffer
> + .add_entry(&LogEntry::pack("node1", "root", "tag", 0, 1000 + i, 6, "msg").unwrap());
> + }
> +
> + // Request only 5 entries
> + let json = buffer.dump_json(None, 5);
> +
> + let parsed: serde_json::Value = serde_json::from_str(&json).unwrap();
> + let data = parsed["data"].as_array().unwrap();
> +
> + assert_eq!(data.len(), 5);
> + }
> +
> + #[test]
> + fn test_iterator() {
> + let mut buffer = RingBuffer::new(CLOG_DEFAULT_SIZE);
> +
> + let _ = buffer.add_entry(&LogEntry::pack("node1", "root", "tag", 0, 1000, 6, "a").unwrap());
> + let _ = buffer.add_entry(&LogEntry::pack("node1", "root", "tag", 0, 1001, 6, "b").unwrap());
> + let _ = buffer.add_entry(&LogEntry::pack("node1", "root", "tag", 0, 1002, 6, "c").unwrap());
> +
> + let messages: Vec<String> = buffer.iter().map(|e| e.message.clone()).collect();
> +
> + // Should be in reverse order (newest first)
> + assert_eq!(messages, vec!["c", "b", "a"]);
> + }
> +
> + #[test]
> + fn test_binary_serialization_roundtrip() {
> + let mut buffer = RingBuffer::new(CLOG_DEFAULT_SIZE);
> +
> + let _ = buffer.add_entry(
> + &LogEntry::pack("node1", "root", "cluster", 123, 1000, 6, "Entry 1").unwrap(),
> + );
> + let _ = buffer.add_entry(
> + &LogEntry::pack("node2", "admin", "system", 456, 1001, 5, "Entry 2").unwrap(),
> + );
> +
> + // Serialize
> + let binary = buffer.serialize_binary();
> +
> + // Deserialize
> + let deserialized = RingBuffer::deserialize_binary(&binary).unwrap();
> +
> + // Check entry count
> + assert_eq!(deserialized.len(), buffer.len());
> +
> + // Check entries match
> + let orig_entries: Vec<_> = buffer.iter().collect();
> + let deser_entries: Vec<_> = deserialized.iter().collect();
> +
> + for (orig, deser) in orig_entries.iter().zip(deser_entries.iter()) {
> + assert_eq!(deser.uid, orig.uid);
> + assert_eq!(deser.time, orig.time);
> + assert_eq!(deser.node, orig.node);
> + assert_eq!(deser.message, orig.message);
> + }
> + }
> +
> + #[test]
> + fn test_binary_format_header() {
> + let mut buffer = RingBuffer::new(CLOG_DEFAULT_SIZE);
> + let _ = buffer.add_entry(&LogEntry::pack("n", "u", "t", 1, 1000, 6, "m").unwrap());
> +
> + let binary = buffer.serialize_binary();
> +
> + // Check header format
> + assert!(binary.len() >= 8);
> +
> + let size = u32::from_le_bytes(binary[0..4].try_into().unwrap()) as usize;
> + let cpos = u32::from_le_bytes(binary[4..8].try_into().unwrap());
> +
> + assert_eq!(size, binary.len());
> + assert_eq!(cpos, 8); // First entry at offset 8
> + }
> +
> + #[test]
> + fn test_binary_empty_buffer() {
> + let buffer = RingBuffer::new(1024);
> + let binary = buffer.serialize_binary();
> +
> + // Empty buffer should just be header
> + assert_eq!(binary.len(), 8);
> +
> + let deserialized = RingBuffer::deserialize_binary(&binary).unwrap();
> + assert_eq!(deserialized.len(), 0);
> + }
> +}
_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
* Re: [pve-devel] [PATCH pve-cluster 04/15] pmxcfs-rs: add pmxcfs-rrd crate
@ 2026-01-29 14:44 5% ` Samuel Rufinatscha
0 siblings, 0 replies; 39+ results
From: Samuel Rufinatscha @ 2026-01-29 14:44 UTC (permalink / raw)
To: Proxmox VE development discussion, Kefu Chai
Thanks for the patch, Kefu.
Overall looks good (nice backend abstraction and schema separation).
I left a few inline notes around transform_data() skip logic,
sanitizing key path components, .rrd on-disk naming consistency across
backends/tests, and adding a few actual payload fixtures for the
transform tests.
Please see comments inline.
On 1/7/26 10:15 AM, Kefu Chai wrote:
> Add RRD (Round-Robin Database) file persistence system:
> - RrdWriter: Main API for RRD operations
> - Schema definitions for CPU, memory, network metrics
> - Format migration support (v1/v2/v3)
> - rrdcached integration for batched writes
> - Data transformation for legacy formats
>
> This is an independent crate with no internal dependencies,
> only requiring external RRD libraries (rrd, rrdcached-client)
> and tokio for async operations. It handles time-series data
> storage compatible with the C implementation.
>
> Includes comprehensive unit tests for data transformation,
> schema generation, and multi-source data processing.
>
> Signed-off-by: Kefu Chai <k.chai@proxmox.com>
> ---
> src/pmxcfs-rs/Cargo.toml | 1 +
> src/pmxcfs-rs/pmxcfs-rrd/Cargo.toml | 18 +
> src/pmxcfs-rs/pmxcfs-rrd/README.md | 51 ++
> src/pmxcfs-rs/pmxcfs-rrd/src/backend.rs | 67 ++
> .../pmxcfs-rrd/src/backend/backend_daemon.rs | 214 +++++++
> .../pmxcfs-rrd/src/backend/backend_direct.rs | 606 ++++++++++++++++++
> .../src/backend/backend_fallback.rs | 229 +++++++
> src/pmxcfs-rs/pmxcfs-rrd/src/daemon.rs | 140 ++++
> src/pmxcfs-rs/pmxcfs-rrd/src/key_type.rs | 313 +++++++++
> src/pmxcfs-rs/pmxcfs-rrd/src/lib.rs | 21 +
> src/pmxcfs-rs/pmxcfs-rrd/src/schema.rs | 577 +++++++++++++++++
> src/pmxcfs-rs/pmxcfs-rrd/src/writer.rs | 397 ++++++++++++
> 12 files changed, 2634 insertions(+)
> create mode 100644 src/pmxcfs-rs/pmxcfs-rrd/Cargo.toml
> create mode 100644 src/pmxcfs-rs/pmxcfs-rrd/README.md
> create mode 100644 src/pmxcfs-rs/pmxcfs-rrd/src/backend.rs
> create mode 100644 src/pmxcfs-rs/pmxcfs-rrd/src/backend/backend_daemon.rs
> create mode 100644 src/pmxcfs-rs/pmxcfs-rrd/src/backend/backend_direct.rs
> create mode 100644 src/pmxcfs-rs/pmxcfs-rrd/src/backend/backend_fallback.rs
> create mode 100644 src/pmxcfs-rs/pmxcfs-rrd/src/daemon.rs
> create mode 100644 src/pmxcfs-rs/pmxcfs-rrd/src/key_type.rs
> create mode 100644 src/pmxcfs-rs/pmxcfs-rrd/src/lib.rs
> create mode 100644 src/pmxcfs-rs/pmxcfs-rrd/src/schema.rs
> create mode 100644 src/pmxcfs-rs/pmxcfs-rrd/src/writer.rs
>
> diff --git a/src/pmxcfs-rs/Cargo.toml b/src/pmxcfs-rs/Cargo.toml
> index 4d17e87e..dd36c81f 100644
> --- a/src/pmxcfs-rs/Cargo.toml
> +++ b/src/pmxcfs-rs/Cargo.toml
> @@ -4,6 +4,7 @@ members = [
> "pmxcfs-api-types", # Shared types and error definitions
> "pmxcfs-config", # Configuration management
> "pmxcfs-logger", # Cluster log with ring buffer and deduplication
> + "pmxcfs-rrd", # RRD (Round-Robin Database) persistence
> ]
> resolver = "2"
>
> diff --git a/src/pmxcfs-rs/pmxcfs-rrd/Cargo.toml b/src/pmxcfs-rs/pmxcfs-rrd/Cargo.toml
> new file mode 100644
> index 00000000..bab71423
> --- /dev/null
> +++ b/src/pmxcfs-rs/pmxcfs-rrd/Cargo.toml
> @@ -0,0 +1,18 @@
> +[package]
> +name = "pmxcfs-rrd"
> +version.workspace = true
> +edition.workspace = true
> +authors.workspace = true
> +license.workspace = true
> +
> +[dependencies]
> +anyhow.workspace = true
> +async-trait = "0.1"
> +chrono = { version = "0.4", default-features = false, features = ["clock"] }
> +rrd = "0.2"
> +rrdcached-client = "0.1.5"
This crate looks fairly young and small. Are we comfortable depending
on it? We could vendor or fork it to keep stability under our control.
> +tokio.workspace = true
> +tracing.workspace = true
> +
> +[dev-dependencies]
> +tempfile.workspace = true
> diff --git a/src/pmxcfs-rs/pmxcfs-rrd/README.md b/src/pmxcfs-rs/pmxcfs-rrd/README.md
> new file mode 100644
> index 00000000..800d78cf
> --- /dev/null
> +++ b/src/pmxcfs-rs/pmxcfs-rrd/README.md
> @@ -0,0 +1,51 @@
> +# pmxcfs-rrd
> +
> +RRD (Round-Robin Database) persistence for pmxcfs performance metrics.
> +
> +## Overview
> +
> +This crate provides RRD file management for storing time-series performance data from Proxmox nodes and VMs. It handles file creation, updates, and integration with rrdcached daemon for efficient writes.
Can we elaborate on the usage/flow of this crate? I.e. how it will be
called, what data will be passed, how the transformation works, and how
the backend impls differ. This will certainly help reviewers.
Maybe also add a small code example of how this lib is used, which I
think would be valuable.
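Even a small self-contained snippet illustrating the update payload format ("timestamp:value1:value2:..." with "N" for now and "U" for unknown, as parsed by the backends in this series) would help. For example (helper name is mine; only the format is taken from the patch):

```rust
// Builds the update string the backends parse: "N" means "use current time",
// "U" marks an unknown value; values are joined with ':'.
fn build_update(timestamp: Option<i64>, values: &[Option<f64>]) -> String {
    let ts = timestamp.map_or_else(|| "N".to_string(), |t| t.to_string());
    let vals: Vec<String> = values
        .iter()
        .map(|v| v.map_or_else(|| "U".to_string(), |x| x.to_string()))
        .collect();
    format!("{}:{}", ts, vals.join(":"))
}

fn main() {
    assert_eq!(build_update(None, &[Some(1.5), None]), "N:1.5:U");
    assert_eq!(build_update(Some(1000), &[Some(2.0)]), "1000:2");
}
```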
> +
> +### Key Features
> +
> +- RRD file creation with schema-based initialization
> +- RRD updates (write metrics to disk)
> +- rrdcached integration for batched writes
> +- Support for both legacy and current schema versions
> +- Type-safe key parsing and validation
> +- Compatible with existing C-created RRD files
> +
> +## Module Structure
> +
> +| Module | Purpose |
> +|--------|---------|
> +| `writer.rs` | Main RrdWriter API |
> +| `schema.rs` | RRD schema definitions (DS, RRA) |
> +| `key_type.rs` | RRD key parsing and validation |
> +| `daemon.rs` | rrdcached daemon client |
The backend module is not listed here. That said, I think we could drop
this table entirely; if we keep it, it would be helpful to elaborate a
bit more on the individual components.
> +
> +## External Dependencies
> +
> +- **librrd**: RRDtool library (via FFI bindings)
Let's explicitly note the `rrd` crate here, which provides the bindings.
> +- **rrdcached**: Optional daemon for batched writes and improved performance
Since rrdcached is optional, could we also add a feature flag to reduce
the dependency and build surface?
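A hypothetical sketch of such a feature gate in Cargo.toml (feature name assumed, not part of this patch):

```toml
[features]
default = []
rrdcached = ["dep:rrdcached-client"]

[dependencies]
rrdcached-client = { version = "0.1.5", optional = true }
```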
> +
> +## Testing
> +
> +Unit tests verify:
> +- Schema generation and validation
> +- Key parsing for different RRD types (node, VM, storage)
> +- RRD file creation and update operations
> +- rrdcached client connection and fallback behavior
> +
> +Run tests with:
> +```bash
> +cargo test -p pmxcfs-rrd
> +```
> +
> +## References
> +
> +- **C Implementation**: `src/pmxcfs/status.c` (RRD code embedded)
> +- **Related Crates**:
> + - `pmxcfs-status` - Uses RrdWriter for metrics persistence
> + - `pmxcfs` - FUSE `.rrd` plugin reads RRD files
> +- **RRDtool Documentation**: https://oss.oetiker.ch/rrdtool/
Thanks for adding the references and how they are used in C, this is
very helpful I think.
> diff --git a/src/pmxcfs-rs/pmxcfs-rrd/src/backend.rs b/src/pmxcfs-rs/pmxcfs-rrd/src/backend.rs
> new file mode 100644
> index 00000000..58652831
> --- /dev/null
> +++ b/src/pmxcfs-rs/pmxcfs-rrd/src/backend.rs
> @@ -0,0 +1,67 @@
> +/// RRD Backend Trait and Implementations
> +///
> +/// This module provides an abstraction over different RRD writing mechanisms:
> +/// - Daemon-based (via rrdcached) for performance and batching
> +/// - Direct file writing for reliability and fallback scenarios
> +/// - Fallback composite that tries daemon first, then falls back to direct
> +///
> +/// This design matches the C implementation's behavior in status.c where
> +/// it attempts daemon update first, then falls back to direct file writes.
> +use super::schema::RrdSchema;
> +use anyhow::Result;
> +use async_trait::async_trait;
> +use std::path::Path;
> +
> +/// Trait for RRD backend implementations
> +///
> +/// Provides abstraction over different RRD writing mechanisms.
> +/// All methods are async to support both async (daemon) and sync (direct file) operations.
> +#[async_trait]
> +pub trait RrdBackend: Send + Sync {
Great idea to abstract this!
> + /// Update RRD file with new data
> + ///
> + /// # Arguments
> + /// * `file_path` - Full path to the RRD file
> + /// * `data` - Update data in format "timestamp:value1:value2:..."
> + async fn update(&mut self, file_path: &Path, data: &str) -> Result<()>;
> +
> + /// Create new RRD file with schema
> + ///
> + /// # Arguments
> + /// * `file_path` - Full path where RRD file should be created
> + /// * `schema` - RRD schema defining data sources and archives
> + /// * `start_timestamp` - Start time for the RRD file (Unix timestamp)
> + async fn create(
> + &mut self,
> + file_path: &Path,
> + schema: &RrdSchema,
> + start_timestamp: i64,
> + ) -> Result<()>;
> +
> + /// Flush pending updates to disk
> + ///
> + /// For daemon backends, this sends a FLUSH command.
> + /// For direct backends, this is a no-op (writes are immediate).
> + #[allow(dead_code)] // Used in backend implementations via trait dispatch
> + async fn flush(&mut self) -> Result<()>;
> +
> + /// Check if backend is available and healthy
> + ///
> + /// Returns true if the backend can be used for operations.
> + /// For daemon backends, this checks if the connection is alive.
> + /// For direct backends, this always returns true.
> + #[allow(dead_code)] // Used in fallback backend via trait dispatch
> + async fn is_available(&self) -> bool;
> +
> + /// Get a human-readable name for this backend
> + fn name(&self) -> &str;
> +}
> +
> +// Backend implementations
> +mod backend_daemon;
> +mod backend_direct;
> +mod backend_fallback;
> +
> +pub use backend_daemon::RrdCachedBackend;
> +pub use backend_direct::RrdDirectBackend;
> +pub use backend_fallback::RrdFallbackBackend;
> diff --git a/src/pmxcfs-rs/pmxcfs-rrd/src/backend/backend_daemon.rs b/src/pmxcfs-rs/pmxcfs-rrd/src/backend/backend_daemon.rs
> new file mode 100644
> index 00000000..28c1a99a
> --- /dev/null
> +++ b/src/pmxcfs-rs/pmxcfs-rrd/src/backend/backend_daemon.rs
> @@ -0,0 +1,214 @@
> +/// RRD Backend: rrdcached daemon
> +///
> +/// Uses rrdcached for batched, high-performance RRD updates.
> +/// This is the preferred backend when the daemon is available.
> +use super::super::schema::RrdSchema;
> +use anyhow::{Context, Result};
> +use async_trait::async_trait;
> +use rrdcached_client::RRDCachedClient;
> +use rrdcached_client::consolidation_function::ConsolidationFunction;
> +use rrdcached_client::create::{
> + CreateArguments, CreateDataSource, CreateDataSourceType, CreateRoundRobinArchive,
> +};
> +use std::path::Path;
> +
> +/// RRD backend using rrdcached daemon
> +pub struct RrdCachedBackend {
> + client: RRDCachedClient<tokio::net::UnixStream>,
> +}
> +
> +impl RrdCachedBackend {
> + /// Connect to rrdcached daemon
> + ///
> + /// # Arguments
> + /// * `socket_path` - Path to rrdcached Unix socket (default: /var/run/rrdcached.sock)
> + pub async fn connect(socket_path: &str) -> Result<Self> {
> + let client = RRDCachedClient::connect_unix(socket_path)
> + .await
> + .with_context(|| format!("Failed to connect to rrdcached at {socket_path}"))?;
> +
> + tracing::info!("Connected to rrdcached at {}", socket_path);
> +
> + Ok(Self { client })
> + }
> +}
> +
> +#[async_trait]
> +impl super::super::backend::RrdBackend for RrdCachedBackend {
> + async fn update(&mut self, file_path: &Path, data: &str) -> Result<()> {
> + // Parse the update data
> + let parts: Vec<&str> = data.split(':').collect();
> + if parts.len() < 2 {
> + anyhow::bail!("Invalid update data format: {data}");
> + }
> +
> + let timestamp = if parts[0] == "N" {
> + None
> + } else {
> + Some(
> + parts[0]
> + .parse::<usize>()
> + .with_context(|| format!("Invalid timestamp: {}", parts[0]))?,
> + )
> + };
> +
> + let values: Vec<f64> = parts[1..]
> + .iter()
> + .map(|v| {
> + if *v == "U" {
> + Ok(f64::NAN)
> + } else {
> + v.parse::<f64>()
> + .with_context(|| format!("Invalid value: {v}"))
> + }
> + })
> + .collect::<Result<Vec<_>>>()?;
> +
> + // Get file path without .rrd extension (rrdcached-client adds it)
> + let path_str = file_path.to_string_lossy();
> + let path_without_ext = path_str.strip_suffix(".rrd").unwrap_or(&path_str);
> +
> + // Send update via rrdcached
> + self.client
> + .update(path_without_ext, timestamp, values)
> + .await
> + .with_context(|| format!("rrdcached update failed for {:?}", file_path))?;
> +
> + tracing::trace!("Updated RRD via daemon: {:?} -> {}", file_path, data);
> +
> + Ok(())
> + }
> +
> + async fn create(
> + &mut self,
> + file_path: &Path,
> + schema: &RrdSchema,
> + start_timestamp: i64,
> + ) -> Result<()> {
> + tracing::debug!(
> + "Creating RRD file via daemon: {:?} with {} data sources",
> + file_path,
> + schema.column_count()
> + );
> +
> + // Convert our data sources to rrdcached-client CreateDataSource objects
> + let mut data_sources = Vec::new();
> + for ds in &schema.data_sources {
> + let serie_type = match ds.ds_type {
> + "GAUGE" => CreateDataSourceType::Gauge,
> + "DERIVE" => CreateDataSourceType::Derive,
> + "COUNTER" => CreateDataSourceType::Counter,
> + "ABSOLUTE" => CreateDataSourceType::Absolute,
> + _ => anyhow::bail!("Unsupported data source type: {}", ds.ds_type),
> + };
> +
> + // Parse min/max values
> + let minimum = if ds.min == "U" {
> + None
> + } else {
> + ds.min.parse().ok()
> + };
> + let maximum = if ds.max == "U" {
> + None
> + } else {
> + ds.max.parse().ok()
> + };
> +
> + let data_source = CreateDataSource {
> + name: ds.name.to_string(),
> + minimum,
> + maximum,
> + heartbeat: ds.heartbeat as i64,
> + serie_type,
> + };
> +
> + data_sources.push(data_source);
> + }
> +
> + // Convert our RRA definitions to rrdcached-client CreateRoundRobinArchive objects
> + let mut archives = Vec::new();
> + for rra in &schema.archives {
> + // Parse RRA string: "RRA:AVERAGE:0.5:1:70"
> + let parts: Vec<&str> = rra.split(':').collect();
> + if parts.len() != 5 || parts[0] != "RRA" {
> + anyhow::bail!("Invalid RRA format: {rra}");
> + }
> +
> + let consolidation_function = match parts[1] {
> + "AVERAGE" => ConsolidationFunction::Average,
> + "MIN" => ConsolidationFunction::Min,
> + "MAX" => ConsolidationFunction::Max,
> + "LAST" => ConsolidationFunction::Last,
> + _ => anyhow::bail!("Unsupported consolidation function: {}", parts[1]),
> + };
> +
> + let xfiles_factor: f64 = parts[2]
> + .parse()
> + .with_context(|| format!("Invalid xff in RRA: {rra}"))?;
> + let steps: i64 = parts[3]
> + .parse()
> + .with_context(|| format!("Invalid steps in RRA: {rra}"))?;
> + let rows: i64 = parts[4]
> + .parse()
> + .with_context(|| format!("Invalid rows in RRA: {rra}"))?;
> +
> + let archive = CreateRoundRobinArchive {
> + consolidation_function,
> + xfiles_factor,
> + steps,
> + rows,
> + };
> + archives.push(archive);
> + }
> +
> + // Get path without .rrd extension (rrdcached-client adds it)
> + let path_str = file_path.to_string_lossy();
> + let path_without_ext = path_str
> + .strip_suffix(".rrd")
> + .unwrap_or(&path_str)
> + .to_string();
> +
> + // Create CreateArguments
> + let create_args = CreateArguments {
> + path: path_without_ext,
> + data_sources,
> + round_robin_archives: archives,
> + start_timestamp: start_timestamp as u64,
> + step_seconds: 60, // 60-second step (1 minute resolution)
> + };
> +
> + // Validate before sending
> + create_args.validate().context("Invalid CREATE arguments")?;
> +
> + // Send CREATE command via rrdcached
> + self.client
> + .create(create_args)
> + .await
> + .with_context(|| format!("Failed to create RRD file via daemon: {file_path:?}"))?;
> +
> + tracing::info!("Created RRD file via daemon: {:?} ({})", file_path, schema);
> +
> + Ok(())
> + }
> +
> + async fn flush(&mut self) -> Result<()> {
> + self.client
> + .flush_all()
> + .await
> + .context("Failed to flush rrdcached")?;
> +
> + tracing::debug!("Flushed all pending RRD updates");
> +
> + Ok(())
> + }
> +
> + async fn is_available(&self) -> bool {
> + // For now, assume we're available if we have a client
> + // Could add a PING command in the future
> + true
> + }
> +
> + fn name(&self) -> &str {
> + "rrdcached"
> + }
> +}
> diff --git a/src/pmxcfs-rs/pmxcfs-rrd/src/backend/backend_direct.rs b/src/pmxcfs-rs/pmxcfs-rrd/src/backend/backend_direct.rs
> new file mode 100644
> index 00000000..6be3eb5d
> --- /dev/null
> +++ b/src/pmxcfs-rs/pmxcfs-rrd/src/backend/backend_direct.rs
> @@ -0,0 +1,606 @@
> +/// RRD Backend: Direct file writing
> +///
> +/// Uses the `rrd` crate (librrd bindings) for direct RRD file operations.
> +/// This backend is used as a fallback when rrdcached is unavailable.
> +///
> +/// This matches the C implementation's behavior in status.c:1416-1420 where
> +/// it falls back to rrd_update_r() and rrd_create_r() for direct file access.
> +use super::super::schema::RrdSchema;
> +use anyhow::{Context, Result};
> +use async_trait::async_trait;
> +use std::path::Path;
> +use std::time::Duration;
> +
> +/// RRD backend using direct file operations via librrd
> +pub struct RrdDirectBackend {
> + // Currently stateless, but kept as struct for future enhancements
> +}
> +
> +impl RrdDirectBackend {
> + /// Create a new direct file backend
> + pub fn new() -> Self {
> + tracing::info!("Using direct RRD file backend (via librrd)");
> + Self {}
> + }
> +}
> +
> +impl Default for RrdDirectBackend {
> + fn default() -> Self {
> + Self::new()
> + }
> +}
> +
> +#[async_trait]
> +impl super::super::backend::RrdBackend for RrdDirectBackend {
> + async fn update(&mut self, file_path: &Path, data: &str) -> Result<()> {
> + let path = file_path.to_path_buf();
> + let data_str = data.to_string();
> +
> + // Use tokio::task::spawn_blocking for sync rrd operations
> + // This prevents blocking the async runtime
> + tokio::task::spawn_blocking(move || {
> + // Parse the update data to extract timestamp and values
> + // Format: "timestamp:value1:value2:..."
> + let parts: Vec<&str> = data_str.split(':').collect();
> + if parts.is_empty() {
> + anyhow::bail!("Empty update data");
> + }
> +
> + // Use rrd::ops::update::update_all_with_timestamp
> + // This is the most direct way to update RRD files
> + let timestamp_str = parts[0];
> + let timestamp: i64 = if timestamp_str == "N" {
> + // "N" means "now" in RRD terminology
> + chrono::Utc::now().timestamp()
> + } else {
> + timestamp_str
> + .parse()
> + .with_context(|| format!("Invalid timestamp: {}", timestamp_str))?
> + };
> +
> + let timestamp = chrono::DateTime::from_timestamp(timestamp, 0)
> + .ok_or_else(|| anyhow::anyhow!("Invalid timestamp value: {}", timestamp))?;
> +
> + // Convert values to Datum
> + let values: Vec<rrd::ops::update::Datum> = parts[1..]
> + .iter()
> + .map(|v| {
> + if *v == "U" {
> + // Unknown/unspecified value
> + rrd::ops::update::Datum::Unspecified
> + } else if let Ok(int_val) = v.parse::<u64>() {
> + rrd::ops::update::Datum::Int(int_val)
> + } else if let Ok(float_val) = v.parse::<f64>() {
> + rrd::ops::update::Datum::Float(float_val)
> + } else {
> + rrd::ops::update::Datum::Unspecified
> + }
> + })
> + .collect();
> +
> + // Perform the update
> + rrd::ops::update::update_all(
> + &path,
> + rrd::ops::update::ExtraFlags::empty(),
> + &[(
> + rrd::ops::update::BatchTime::Timestamp(timestamp),
> + values.as_slice(),
> + )],
> + )
> + .with_context(|| format!("Direct RRD update failed for {:?}", path))?;
> +
> + tracing::trace!("Updated RRD via direct file: {:?} -> {}", path, data_str);
> +
> + Ok::<(), anyhow::Error>(())
> + })
> + .await
> + .context("Failed to spawn blocking task for RRD update")??;
> +
> + Ok(())
> + }
> +
> + async fn create(
> + &mut self,
> + file_path: &Path,
> + schema: &RrdSchema,
> + start_timestamp: i64,
> + ) -> Result<()> {
> + tracing::debug!(
> + "Creating RRD file via direct: {:?} with {} data sources",
> + file_path,
> + schema.column_count()
> + );
> +
> + let path = file_path.to_path_buf();
> + let schema = schema.clone();
> +
> + // Ensure parent directory exists
> + if let Some(parent) = path.parent() {
> + std::fs::create_dir_all(parent)
> + .with_context(|| format!("Failed to create directory: {parent:?}"))?;
> + }
> +
> + // Use tokio::task::spawn_blocking for sync rrd operations
> + tokio::task::spawn_blocking(move || {
> + // Convert timestamp
> + let start = chrono::DateTime::from_timestamp(start_timestamp, 0)
> + .ok_or_else(|| anyhow::anyhow!("Invalid start timestamp: {}", start_timestamp))?;
> +
> + // Convert data sources
> + let data_sources: Vec<rrd::ops::create::DataSource> = schema
> + .data_sources
> + .iter()
> + .map(|ds| {
> + let name = rrd::ops::create::DataSourceName::new(ds.name);
> +
> + match ds.ds_type {
> + "GAUGE" => {
> + let min = if ds.min == "U" {
> + None
> + } else {
> + Some(ds.min.parse().context("Invalid min value")?)
> + };
> + let max = if ds.max == "U" {
> + None
> + } else {
> + Some(ds.max.parse().context("Invalid max value")?)
> + };
> + Ok(rrd::ops::create::DataSource::gauge(
> + name,
> + ds.heartbeat,
> + min,
> + max,
> + ))
> + }
> + "DERIVE" => {
> + let min = if ds.min == "U" {
> + None
> + } else {
> + Some(ds.min.parse().context("Invalid min value")?)
> + };
> + let max = if ds.max == "U" {
> + None
> + } else {
> + Some(ds.max.parse().context("Invalid max value")?)
> + };
> + Ok(rrd::ops::create::DataSource::derive(
> + name,
> + ds.heartbeat,
> + min,
> + max,
> + ))
> + }
> + "COUNTER" => {
> + let min = if ds.min == "U" {
> + None
> + } else {
> + Some(ds.min.parse().context("Invalid min value")?)
> + };
> + let max = if ds.max == "U" {
> + None
> + } else {
> + Some(ds.max.parse().context("Invalid max value")?)
> + };
> + Ok(rrd::ops::create::DataSource::counter(
> + name,
> + ds.heartbeat,
> + min,
> + max,
> + ))
> + }
> + "ABSOLUTE" => {
> + let min = if ds.min == "U" {
> + None
> + } else {
> + Some(ds.min.parse().context("Invalid min value")?)
> + };
> + let max = if ds.max == "U" {
> + None
> + } else {
> + Some(ds.max.parse().context("Invalid max value")?)
> + };
> + Ok(rrd::ops::create::DataSource::absolute(
> + name,
> + ds.heartbeat,
> + min,
> + max,
> + ))
> + }
> + _ => anyhow::bail!("Unsupported data source type: {}", ds.ds_type),
> + }
> + })
> + .collect::<Result<Vec<_>>>()?;
> +
> + // Convert RRAs
> + let archives: Result<Vec<rrd::ops::create::Archive>> = schema
> + .archives
> + .iter()
> + .map(|rra| {
> + // Parse RRA string: "RRA:AVERAGE:0.5:1:1440"
> + let parts: Vec<&str> = rra.split(':').collect();
> + if parts.len() != 5 || parts[0] != "RRA" {
> + anyhow::bail!("Invalid RRA format: {}", rra);
> + }
> +
> + let cf = match parts[1] {
> + "AVERAGE" => rrd::ConsolidationFn::Avg,
> + "MIN" => rrd::ConsolidationFn::Min,
> + "MAX" => rrd::ConsolidationFn::Max,
> + "LAST" => rrd::ConsolidationFn::Last,
> + _ => anyhow::bail!("Unsupported consolidation function: {}", parts[1]),
> + };
> +
> + let xff: f64 = parts[2]
> + .parse()
> + .with_context(|| format!("Invalid xff in RRA: {}", rra))?;
> + let steps: u32 = parts[3]
> + .parse()
> + .with_context(|| format!("Invalid steps in RRA: {}", rra))?;
> + let rows: u32 = parts[4]
> + .parse()
> + .with_context(|| format!("Invalid rows in RRA: {}", rra))?;
> +
> + rrd::ops::create::Archive::new(cf, xff, steps, rows)
> + .map_err(|e| anyhow::anyhow!("Failed to create archive: {}", e))
> + })
> + .collect();
> +
> + let archives = archives?;
> +
> + // Call rrd::ops::create::create
> + rrd::ops::create::create(
> + &path,
> + start,
> + Duration::from_secs(60), // 60-second step
> + false, // no_overwrite = false
With overwrite allowed, two concurrent creates could race and
silently clobber each other's file.
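If we want to keep overwrite semantics for the normal path but still guard the racy window, the path could be reserved atomically before calling into librrd. A std-only sketch; the `reserve_rrd_path` helper is hypothetical and not part of this patch:

```rust
use std::fs::OpenOptions;
use std::io::ErrorKind;
use std::path::Path;

/// Atomically reserve `path` with O_CREAT|O_EXCL semantics: exactly one of
/// any number of concurrent callers sees `true` and proceeds to the real
/// create; the others back off.
fn reserve_rrd_path(path: &Path) -> std::io::Result<bool> {
    match OpenOptions::new().write(true).create_new(true).open(path) {
        Ok(_) => Ok(true),
        Err(e) if e.kind() == ErrorKind::AlreadyExists => Ok(false),
        Err(e) => Err(e),
    }
}

fn main() -> std::io::Result<()> {
    let path = std::env::temp_dir().join("reserve-demo.rrd");
    let _ = std::fs::remove_file(&path);
    assert!(reserve_rrd_path(&path)?); // first caller wins the race
    assert!(!reserve_rrd_path(&path)?); // later callers see the file exists
    std::fs::remove_file(&path)
}
```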
> + None, // template
> + &[], // sources
> + data_sources.iter(),
> + archives.iter(),
> + )
> + .with_context(|| format!("Direct RRD create failed for {:?}", path))?;
> +
> + tracing::info!("Created RRD file via direct: {:?} ({})", path, schema);
> +
> + Ok::<(), anyhow::Error>(())
> + })
> + .await
> + .context("Failed to spawn blocking task for RRD create")??;
> +
> + Ok(())
> + }
> +
> + async fn flush(&mut self) -> Result<()> {
> + // No-op for direct backend - writes are immediate
> + tracing::trace!("Flush called on direct backend (no-op)");
> + Ok(())
> + }
> +
> + async fn is_available(&self) -> bool {
> + // Direct backend is always available (no external dependencies)
> + true
> + }
> +
> + fn name(&self) -> &str {
> + "direct"
> + }
> +}
> +
> +#[cfg(test)]
> +mod tests {
> + use super::*;
> + use crate::backend::RrdBackend;
> + use crate::schema::{RrdFormat, RrdSchema};
> + use std::path::PathBuf;
> + use tempfile::TempDir;
> +
> + // ===== Test Helpers =====
> +
> + /// Create a temporary directory for RRD files
> + fn setup_temp_dir() -> TempDir {
> + TempDir::new().expect("Failed to create temp directory")
> + }
> +
> + /// Create a test RRD file path
> + fn test_rrd_path(dir: &TempDir, name: &str) -> PathBuf {
> + dir.path().join(format!("{}.rrd", name))
What’s the canonical on-disk naming here (with or without .rrd)?
file_path() and the daemon path handling suggest no extension,
but direct/tests currently create *.rrd. Can we make this consistent
across writer/backends/tests?
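For illustration, one consistent rule would be to normalize everything to the extension-less key that the daemon path handling already implies; a hypothetical std-only helper (name and placement are illustrative only):

```rust
use std::path::Path;

/// Map both "foo.rrd" and "foo" to the same extension-less key, so the
/// writer, both backends, and the tests agree on on-disk naming.
fn canonical_rrd_key(path: &Path) -> String {
    let s = path.to_string_lossy();
    s.strip_suffix(".rrd").unwrap_or(&s).to_string()
}

fn main() {
    assert_eq!(
        canonical_rrd_key(Path::new("/var/lib/rrd/node1.rrd")),
        "/var/lib/rrd/node1"
    );
    assert_eq!(
        canonical_rrd_key(Path::new("/var/lib/rrd/node1")),
        "/var/lib/rrd/node1"
    );
}
```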
> + }
> +
> + // ===== RrdDirectBackend Tests =====
> +
> + #[tokio::test]
> + async fn test_direct_backend_create_node_rrd() {
> + let temp_dir = setup_temp_dir();
> + let rrd_path = test_rrd_path(&temp_dir, "node_test");
> +
> + let mut backend = RrdDirectBackend::new();
> + let schema = RrdSchema::node(RrdFormat::Pve9_0);
> + let start_time = 1704067200; // 2024-01-01 00:00:00
> +
> + // Create RRD file
> + let result = backend.create(&rrd_path, &schema, start_time).await;
> + assert!(
> + result.is_ok(),
> + "Failed to create node RRD: {:?}",
> + result.err()
> + );
> +
> + // Verify file was created
> + assert!(rrd_path.exists(), "RRD file should exist after create");
> +
> + // Verify backend name
> + assert_eq!(backend.name(), "direct");
> + }
> +
> + #[tokio::test]
> + async fn test_direct_backend_create_vm_rrd() {
> + let temp_dir = setup_temp_dir();
> + let rrd_path = test_rrd_path(&temp_dir, "vm_test");
> +
> + let mut backend = RrdDirectBackend::new();
> + let schema = RrdSchema::vm(RrdFormat::Pve9_0);
> + let start_time = 1704067200;
> +
> + let result = backend.create(&rrd_path, &schema, start_time).await;
> + assert!(
> + result.is_ok(),
> + "Failed to create VM RRD: {:?}",
> + result.err()
> + );
> + assert!(rrd_path.exists());
> + }
> +
> + #[tokio::test]
> + async fn test_direct_backend_create_storage_rrd() {
> + let temp_dir = setup_temp_dir();
> + let rrd_path = test_rrd_path(&temp_dir, "storage_test");
> +
> + let mut backend = RrdDirectBackend::new();
> + let schema = RrdSchema::storage(RrdFormat::Pve2);
> + let start_time = 1704067200;
> +
> + let result = backend.create(&rrd_path, &schema, start_time).await;
> + assert!(
> + result.is_ok(),
> + "Failed to create storage RRD: {:?}",
> + result.err()
> + );
> + assert!(rrd_path.exists());
> + }
> +
> + #[tokio::test]
> + async fn test_direct_backend_update_with_timestamp() {
> + let temp_dir = setup_temp_dir();
> + let rrd_path = test_rrd_path(&temp_dir, "update_test");
> +
> + let mut backend = RrdDirectBackend::new();
> + let schema = RrdSchema::storage(RrdFormat::Pve2);
> + let start_time = 1704067200;
> +
> + // Create RRD file
> + backend
> + .create(&rrd_path, &schema, start_time)
> + .await
> + .expect("Failed to create RRD");
> +
> + // Update with explicit timestamp and values
> + // Format: "timestamp:value1:value2"
> + let update_data = "1704067260:1000000:500000"; // total=1MB, used=500KB
> + let result = backend.update(&rrd_path, update_data).await;
> +
> + assert!(result.is_ok(), "Failed to update RRD: {:?}", result.err());
> + }
> +
> + #[tokio::test]
> + async fn test_direct_backend_update_with_n_timestamp() {
> + let temp_dir = setup_temp_dir();
> + let rrd_path = test_rrd_path(&temp_dir, "update_n_test");
> +
> + let mut backend = RrdDirectBackend::new();
> + let schema = RrdSchema::storage(RrdFormat::Pve2);
> + let start_time = 1704067200;
> +
> + backend
> + .create(&rrd_path, &schema, start_time)
> + .await
> + .expect("Failed to create RRD");
> +
> + // Update with "N" (current time) timestamp
> + let update_data = "N:2000000:750000";
> + let result = backend.update(&rrd_path, update_data).await;
> +
> + assert!(
> + result.is_ok(),
> + "Failed to update RRD with N timestamp: {:?}",
> + result.err()
> + );
> + }
> +
> + #[tokio::test]
> + async fn test_direct_backend_update_with_unknown_values() {
> + let temp_dir = setup_temp_dir();
> + let rrd_path = test_rrd_path(&temp_dir, "update_u_test");
> +
> + let mut backend = RrdDirectBackend::new();
> + let schema = RrdSchema::storage(RrdFormat::Pve2);
> + let start_time = 1704067200;
> +
> + backend
> + .create(&rrd_path, &schema, start_time)
> + .await
> + .expect("Failed to create RRD");
> +
> + // Update with "U" (unknown) values
> + let update_data = "N:U:1000000"; // total unknown, used known
> + let result = backend.update(&rrd_path, update_data).await;
> +
> + assert!(
> + result.is_ok(),
> + "Failed to update RRD with U values: {:?}",
> + result.err()
> + );
> + }
> +
> + #[tokio::test]
> + async fn test_direct_backend_update_invalid_data() {
> + let temp_dir = setup_temp_dir();
> + let rrd_path = test_rrd_path(&temp_dir, "invalid_test");
> +
> + let mut backend = RrdDirectBackend::new();
> + let schema = RrdSchema::storage(RrdFormat::Pve2);
> + let start_time = 1704067200;
> +
> + backend
> + .create(&rrd_path, &schema, start_time)
> + .await
> + .expect("Failed to create RRD");
> +
> + // Test truly invalid data formats that MUST fail
> + // Note: Invalid values like "abc" are converted to Unspecified (U), which is valid RRD behavior
> + let invalid_cases = vec![
> + "", // Empty string
> + ":", // Only separator
> + "timestamp", // Missing values
> + "N", // No colon separator
> + "abc:123:456", // Invalid timestamp (not N or integer)
> + ];
> +
> + for invalid_data in invalid_cases {
> + let result = backend.update(&rrd_path, invalid_data).await;
> + assert!(
> + result.is_err(),
> + "Update should fail for invalid data: '{}', but got Ok",
> + invalid_data
> + );
> + }
> +
> + // Test lenient data formats that succeed (invalid values become Unspecified)
> + // Use explicit timestamps to avoid "same timestamp" errors
> + let mut timestamp = start_time + 60;
> + let lenient_cases = vec![
> + "abc:456", // Invalid first value -> becomes U
> + "123:def", // Invalid second value -> becomes U
> + "U:U", // All unknown
> + ];
> +
> + for valid_data in lenient_cases {
> + let update_data = format!("{}:{}", timestamp, valid_data);
> + let result = backend.update(&rrd_path, &update_data).await;
> + assert!(
> + result.is_ok(),
> + "Update should succeed for lenient data: '{}', but got Err: {:?}",
> + update_data,
> + result.err()
> + );
> + timestamp += 60; // Increment timestamp for next update
> + }
> + }
> +
> + #[tokio::test]
> + async fn test_direct_backend_update_nonexistent_file() {
> + let temp_dir = setup_temp_dir();
> + let rrd_path = test_rrd_path(&temp_dir, "nonexistent");
> +
> + let mut backend = RrdDirectBackend::new();
> +
> + // Try to update a file that doesn't exist
> + let result = backend.update(&rrd_path, "N:100:200").await;
> +
> + assert!(result.is_err(), "Update should fail for nonexistent file");
> + }
> +
> + #[tokio::test]
> + async fn test_direct_backend_flush() {
> + let mut backend = RrdDirectBackend::new();
> +
> + // Flush should always succeed for direct backend (no-op)
> + let result = backend.flush().await;
> + assert!(
> + result.is_ok(),
> + "Flush should always succeed for direct backend"
> + );
> + }
> +
> + #[tokio::test]
> + async fn test_direct_backend_is_available() {
> + let backend = RrdDirectBackend::new();
> +
> + // Direct backend should always be available
> + assert!(
> + backend.is_available().await,
> + "Direct backend should always be available"
> + );
> + }
> +
> + #[tokio::test]
> + async fn test_direct_backend_multiple_updates() {
> + let temp_dir = setup_temp_dir();
> + let rrd_path = test_rrd_path(&temp_dir, "multi_update_test");
> +
> + let mut backend = RrdDirectBackend::new();
> + let schema = RrdSchema::storage(RrdFormat::Pve2);
> + let start_time = 1704067200;
> +
> + backend
> + .create(&rrd_path, &schema, start_time)
> + .await
> + .expect("Failed to create RRD");
> +
> + // Perform multiple updates
> + for i in 0..10 {
> + let timestamp = start_time + 60 * (i + 1); // 1 minute intervals
> + let total = 1000000 + (i * 100000);
> + let used = 500000 + (i * 50000);
> + let update_data = format!("{}:{}:{}", timestamp, total, used);
> +
> + let result = backend.update(&rrd_path, &update_data).await;
> + assert!(result.is_ok(), "Update {} failed: {:?}", i, result.err());
> + }
> + }
> +
> + #[tokio::test]
> + async fn test_direct_backend_overwrite_file() {
> + let temp_dir = setup_temp_dir();
> + let rrd_path = test_rrd_path(&temp_dir, "overwrite_test");
> +
> + let mut backend = RrdDirectBackend::new();
> + let schema = RrdSchema::storage(RrdFormat::Pve2);
> + let start_time = 1704067200;
> +
> + // Create file first time
> + backend
> + .create(&rrd_path, &schema, start_time)
> + .await
> + .expect("First create failed");
> +
> + // Create same file again - should succeed (overwrites)
> + // Note: librrd create() with no_overwrite=false allows overwriting
> + let result = backend.create(&rrd_path, &schema, start_time).await;
> + assert!(
> + result.is_ok(),
> + "Creating file again should succeed (overwrite mode): {:?}",
> + result.err()
> + );
> + }
> +
> + #[tokio::test]
> + async fn test_direct_backend_large_schema() {
> + let temp_dir = setup_temp_dir();
> + let rrd_path = test_rrd_path(&temp_dir, "large_schema_test");
> +
> + let mut backend = RrdDirectBackend::new();
> + let schema = RrdSchema::node(RrdFormat::Pve9_0); // 19 data sources
> + let start_time = 1704067200;
> +
> + // Create RRD with large schema
> + let result = backend.create(&rrd_path, &schema, start_time).await;
> + assert!(result.is_ok(), "Failed to create RRD with large schema");
> +
> + // Update with all values
> + let values = "100:200:50.5:10.2:8000000:4000000:2000000:500000:50000000:25000000:1000000:2000000:6000000:1000000:0.5:1.2:0.8:0.3:0.1";
> + let update_data = format!("N:{}", values);
> +
> + let result = backend.update(&rrd_path, &update_data).await;
> + assert!(result.is_ok(), "Failed to update RRD with large schema");
> + }
> +}
> diff --git a/src/pmxcfs-rs/pmxcfs-rrd/src/backend/backend_fallback.rs b/src/pmxcfs-rs/pmxcfs-rrd/src/backend/backend_fallback.rs
> new file mode 100644
> index 00000000..7d574e5b
> --- /dev/null
> +++ b/src/pmxcfs-rs/pmxcfs-rrd/src/backend/backend_fallback.rs
> @@ -0,0 +1,229 @@
> +/// RRD Backend: Fallback (Daemon + Direct)
> +///
> +/// Composite backend that tries daemon first, falls back to direct file writing.
> +/// This matches the C implementation's behavior in status.c:1405-1420 where
> +/// it attempts rrdc_update() first, then falls back to rrd_update_r().
> +use super::super::schema::RrdSchema;
> +use super::{RrdCachedBackend, RrdDirectBackend};
> +use anyhow::{Context, Result};
> +use async_trait::async_trait;
> +use std::path::Path;
> +
> +/// Composite backend that tries daemon first, falls back to direct
> +///
> +/// This provides the same behavior as the C implementation:
> +/// 1. Try to use rrdcached daemon for performance
> +/// 2. If daemon fails or is unavailable, fall back to direct file writes
> +pub struct RrdFallbackBackend {
> + /// Optional daemon backend (None if daemon is unavailable/failed)
> + daemon: Option<RrdCachedBackend>,
> + /// Direct backend (always available)
> + direct: RrdDirectBackend,
> +}
> +
> +impl RrdFallbackBackend {
> + /// Create a new fallback backend
> + ///
> + /// Attempts to connect to rrdcached daemon. If successful, will prefer daemon.
> + /// If daemon is unavailable, will use direct mode only.
> + ///
> + /// # Arguments
> + /// * `daemon_socket` - Path to rrdcached Unix socket
> + pub async fn new(daemon_socket: &str) -> Self {
> + let daemon = match RrdCachedBackend::connect(daemon_socket).await {
> + Ok(backend) => {
> + tracing::info!("RRD fallback backend: daemon available, will prefer daemon mode");
> + Some(backend)
> + }
> + Err(e) => {
> + tracing::warn!(
> + "RRD fallback backend: daemon unavailable ({}), using direct mode only",
> + e
> + );
> + None
> + }
> + };
> +
> + let direct = RrdDirectBackend::new();
> +
> + Self { daemon, direct }
> + }
> +
> + /// Create a fallback backend with explicit daemon and direct backends
> + ///
> + /// Useful for testing or custom configurations
> + #[allow(dead_code)] // Used in tests for custom backend configurations
> + pub fn with_backends(daemon: Option<RrdCachedBackend>, direct: RrdDirectBackend) -> Self {
> + Self { daemon, direct }
> + }
> +
> + /// Check if daemon is currently being used
> + #[allow(dead_code)] // Used for debugging/monitoring daemon status
> + pub fn is_using_daemon(&self) -> bool {
> + self.daemon.is_some()
> + }
> +
> + /// Disable daemon mode and switch to direct mode only
> + ///
> + /// Called automatically when daemon operations fail
> + fn disable_daemon(&mut self) {
> + if self.daemon.is_some() {
> + tracing::warn!("Disabling daemon mode, switching to direct file writes");
> + self.daemon = None;
> + }
> + }
> +}
> +
> +#[async_trait]
> +impl super::super::backend::RrdBackend for RrdFallbackBackend {
> + async fn update(&mut self, file_path: &Path, data: &str) -> Result<()> {
> + // Try daemon first if available
> + if let Some(daemon) = &mut self.daemon {
> + match daemon.update(file_path, data).await {
> + Ok(()) => {
> + tracing::trace!("Updated RRD via daemon (fallback backend)");
> + return Ok(());
> + }
> + Err(e) => {
> + tracing::warn!("Daemon update failed, falling back to direct: {}", e);
> + self.disable_daemon();
Currently we disable the daemon permanently after a single failure,
while the C implementation seems to retry the daemon on every update
call. I think it's fine to go with this for now, but the difference
should then be noted in the README, and it is maybe something to
revisit in the future.
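A middle ground between disabling permanently and retrying on every call would be a cooldown before the next daemon attempt. A sketch with illustrative names and an assumed 60s cooldown, not taken from the patch:

```rust
use std::time::{Duration, Instant};

/// Track daemon health: instead of dropping the daemon forever, remember
/// when it last failed and allow another attempt after a cooldown.
struct DaemonState {
    last_failure: Option<Instant>,
    cooldown: Duration,
}

impl DaemonState {
    fn new(cooldown: Duration) -> Self {
        Self { last_failure: None, cooldown }
    }

    /// Should we attempt the daemon for this operation?
    fn should_try(&self, now: Instant) -> bool {
        match self.last_failure {
            None => true,
            Some(t) => now.duration_since(t) >= self.cooldown,
        }
    }

    fn record_failure(&mut self, now: Instant) {
        self.last_failure = Some(now);
    }

    fn record_success(&mut self) {
        self.last_failure = None;
    }
}

fn main() {
    let mut state = DaemonState::new(Duration::from_secs(60));
    let t0 = Instant::now();
    assert!(state.should_try(t0));
    state.record_failure(t0);
    assert!(!state.should_try(t0 + Duration::from_secs(30)));
    assert!(state.should_try(t0 + Duration::from_secs(60)));
    state.record_success();
    assert!(state.should_try(t0));
}
```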
> + }
> + }
> + }
> +
> + // Fallback to direct
> + self.direct
> + .update(file_path, data)
> + .await
> + .context("Both daemon and direct update failed")
> + }
> +
> + async fn create(
> + &mut self,
> + file_path: &Path,
> + schema: &RrdSchema,
> + start_timestamp: i64,
> + ) -> Result<()> {
> + // Try daemon first if available
> + if let Some(daemon) = &mut self.daemon {
> + match daemon.create(file_path, schema, start_timestamp).await {
> + Ok(()) => {
> + tracing::trace!("Created RRD via daemon (fallback backend)");
> + return Ok(());
> + }
> + Err(e) => {
> + tracing::warn!("Daemon create failed, falling back to direct: {}", e);
> + self.disable_daemon();
> + }
> + }
> + }
> +
> + // Fallback to direct
> + self.direct
> + .create(file_path, schema, start_timestamp)
> + .await
> + .context("Both daemon and direct create failed")
> + }
> +
> + async fn flush(&mut self) -> Result<()> {
> + // Only flush if using daemon
> + if let Some(daemon) = &mut self.daemon {
> + match daemon.flush().await {
> + Ok(()) => return Ok(()),
> + Err(e) => {
> + tracing::warn!("Daemon flush failed: {}", e);
> + self.disable_daemon();
> + }
> + }
> + }
> +
> + // Direct backend flush is a no-op
> + self.direct.flush().await
> + }
> +
> + async fn is_available(&self) -> bool {
> + // Always available - either daemon or direct will work
> + true
> + }
> +
> + fn name(&self) -> &str {
> + if self.daemon.is_some() {
> + "fallback(daemon+direct)"
> + } else {
> + "fallback(direct-only)"
> + }
> + }
> +}
> +
> +#[cfg(test)]
> +mod tests {
> + use super::*;
> + use crate::backend::RrdBackend;
> + use crate::schema::{RrdFormat, RrdSchema};
> + use std::path::PathBuf;
> + use tempfile::TempDir;
> +
> + /// Create a temporary directory for RRD files
> + fn setup_temp_dir() -> TempDir {
> + TempDir::new().expect("Failed to create temp directory")
> + }
> +
> + /// Create a test RRD file path
> + fn test_rrd_path(dir: &TempDir, name: &str) -> PathBuf {
> + dir.path().join(format!("{}.rrd", name))
> + }
> +
> + #[test]
> + fn test_fallback_backend_without_daemon() {
> + let direct = RrdDirectBackend::new();
> + let backend = RrdFallbackBackend::with_backends(None, direct);
> +
> + assert!(!backend.is_using_daemon());
> + assert_eq!(backend.name(), "fallback(direct-only)");
> + }
> +
> + #[tokio::test]
> + async fn test_fallback_backend_direct_mode_operations() {
> + let temp_dir = setup_temp_dir();
> + let rrd_path = test_rrd_path(&temp_dir, "fallback_test");
> +
> + // Create fallback backend without daemon (direct mode only)
> + let direct = RrdDirectBackend::new();
> + let mut backend = RrdFallbackBackend::with_backends(None, direct);
> +
> + assert!(!backend.is_using_daemon(), "Should not be using daemon");
> + assert_eq!(backend.name(), "fallback(direct-only)");
> +
> + // Test create and update operations work in direct mode
> + let schema = RrdSchema::storage(RrdFormat::Pve2);
> + let start_time = 1704067200;
> +
> + let result = backend.create(&rrd_path, &schema, start_time).await;
> + assert!(result.is_ok(), "Create should work in direct mode");
> +
> + let result = backend.update(&rrd_path, "N:1000:500").await;
> + assert!(result.is_ok(), "Update should work in direct mode");
> + }
> +
> + #[tokio::test]
> + async fn test_fallback_backend_is_always_available() {
> + let direct = RrdDirectBackend::new();
> + let backend = RrdFallbackBackend::with_backends(None, direct);
> +
> + // Fallback backend should always be available (even without daemon)
> + assert!(
> + backend.is_available().await,
> + "Fallback backend should always be available"
> + );
> + }
> +
> + #[tokio::test]
> + async fn test_fallback_backend_flush_without_daemon() {
> + let direct = RrdDirectBackend::new();
> + let mut backend = RrdFallbackBackend::with_backends(None, direct);
> +
> + // Flush should succeed even without daemon (no-op for direct)
> + let result = backend.flush().await;
> + assert!(result.is_ok(), "Flush should succeed without daemon");
> + }
> +}
> diff --git a/src/pmxcfs-rs/pmxcfs-rrd/src/daemon.rs b/src/pmxcfs-rs/pmxcfs-rrd/src/daemon.rs
> new file mode 100644
> index 00000000..e53b6dad
> --- /dev/null
> +++ b/src/pmxcfs-rs/pmxcfs-rrd/src/daemon.rs
> @@ -0,0 +1,140 @@
> +/// RRDCached Daemon Client (wrapper around rrdcached-client crate)
> +///
> +/// This module provides a thin wrapper around the rrdcached-client crate.
> +use anyhow::{Context, Result};
> +use std::path::Path;
> +
> +/// Wrapper around rrdcached-client
> +#[allow(dead_code)] // Used in backend_daemon.rs via module-level access
> +pub struct RrdCachedClient {
> + pub(crate) client:
> + tokio::sync::Mutex<rrdcached_client::RRDCachedClient<tokio::net::UnixStream>>,
> +}
> +
> +impl RrdCachedClient {
> + /// Connect to rrdcached daemon via Unix socket
> + ///
> + /// # Arguments
> + /// * `socket_path` - Path to rrdcached Unix socket (default: /var/run/rrdcached.sock)
> + #[allow(dead_code)] // Used via backend modules
> + pub async fn connect<P: AsRef<Path>>(socket_path: P) -> Result<Self> {
> + let socket_path = socket_path.as_ref().to_string_lossy().to_string();
> +
> + tracing::debug!("Connecting to rrdcached at {}", socket_path);
> +
> + // Connect to daemon (async operation)
> + let client = rrdcached_client::RRDCachedClient::connect_unix(&socket_path)
> + .await
> + .with_context(|| format!("Failed to connect to rrdcached: {socket_path}"))?;
> +
> + tracing::info!("Connected to rrdcached at {}", socket_path);
> +
> + Ok(Self {
> + client: tokio::sync::Mutex::new(client),
> + })
> + }
> +
> + /// Update RRD file via rrdcached
> + ///
> + /// # Arguments
> + /// * `file_path` - Full path to RRD file
> + /// * `data` - Update data in format "timestamp:value1:value2:..."
> + #[allow(dead_code)] // Used via backend modules
> + pub async fn update<P: AsRef<Path>>(&self, file_path: P, data: &str) -> Result<()> {
There is a lot of duplication between this function and
RrdCachedBackend::update(); I think this can be refactored a bit.
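The shared part is mainly the "timestamp:v1:v2" parsing, so both call sites could go through one helper. A hypothetical sketch, deliberately strict so that both backends reject the same inputs ("N" maps to no timestamp, "U" to an unknown value):

```rust
/// Parse an RRD update string "timestamp:v1:v2:..." into a timestamp
/// (None for "N") and values (None for "U"). Strict on non-numeric tokens.
fn parse_update(data: &str) -> Result<(Option<i64>, Vec<Option<f64>>), String> {
    let mut parts = data.split(':');
    let ts = parts
        .next()
        .filter(|s| !s.is_empty())
        .ok_or_else(|| "empty update data".to_string())?;
    let timestamp = if ts == "N" {
        None
    } else {
        Some(ts.parse::<i64>().map_err(|_| format!("invalid timestamp: {ts}"))?)
    };
    let values: Vec<Option<f64>> = parts
        .map(|v| {
            if v == "U" {
                Ok(None)
            } else {
                v.parse::<f64>()
                    .map(Some)
                    .map_err(|_| format!("invalid value: {v}"))
            }
        })
        .collect::<Result<_, String>>()?;
    if values.is_empty() {
        return Err("missing values".to_string());
    }
    Ok((timestamp, values))
}

fn main() {
    assert_eq!(
        parse_update("1704067260:1000:U").unwrap(),
        (Some(1704067260), vec![Some(1000.0), None])
    );
    assert!(parse_update("N").is_err());
    assert!(parse_update("abc:1").is_err());
}
```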
> + let file_path = file_path.as_ref();
> +
> + // Parse the update data
> + let parts: Vec<&str> = data.split(':').collect();
> + if parts.len() < 2 {
> + anyhow::bail!("Invalid update data format: {data}");
> + }
> +
> + let timestamp = if parts[0] == "N" {
> + None
> + } else {
> + Some(
> + parts[0]
> + .parse::<usize>()
> + .with_context(|| format!("Invalid timestamp: {}", parts[0]))?,
> + )
> + };
> +
> + let values: Vec<f64> = parts[1..]
> + .iter()
> + .map(|v| {
> + if *v == "U" {
> + Ok(f64::NAN)
> + } else {
> + v.parse::<f64>()
> + .with_context(|| format!("Invalid value: {v}"))
While we fail here on parsing non-U values,
RrdCachedBackend::update() treats many invalid tokens as
Datum::Unspecified and succeeds. That makes the behavior depend on
which backend is active; we should stick to one rule.
> + }
> + })
> + .collect::<Result<Vec<_>>>()?;
> +
> + // Get file path without .rrd extension (rrdcached-client adds it)
> + let path_str = file_path.to_string_lossy();
> + let path_without_ext = path_str.strip_suffix(".rrd").unwrap_or(&path_str);
> +
> + // Send update via rrdcached
> + let mut client = self.client.lock().await;
> + client
> + .update(path_without_ext, timestamp, values)
> + .await
> + .context("Failed to send update to rrdcached")?;
> +
> + tracing::trace!("Updated RRD via daemon: {:?} -> {}", file_path, data);
> +
> + Ok(())
> + }
> +
> + /// Create RRD file via rrdcached
> + #[allow(dead_code)] // Used via backend modules
> + pub async fn create(&self, args: rrdcached_client::create::CreateArguments) -> Result<()> {
> + let mut client = self.client.lock().await;
> + client
> + .create(args)
> + .await
> + .context("Failed to create RRD via rrdcached")?;
> + Ok(())
> + }
> +
> + /// Flush all pending updates
> + #[allow(dead_code)] // Used via backend modules
> + pub async fn flush(&self) -> Result<()> {
> + let mut client = self.client.lock().await;
> + client
> + .flush_all()
> + .await
> + .context("Failed to flush rrdcached")?;
> +
> + tracing::debug!("Flushed all RRD files");
> +
> + Ok(())
> + }
> +}
> +
> +#[cfg(test)]
> +mod tests {
> + use super::*;
> +
> + #[tokio::test]
> + #[ignore] // Only runs if rrdcached daemon is actually running
> + async fn test_connect_to_daemon() {
> + // This test requires a running rrdcached daemon
> + let result = RrdCachedClient::connect("/var/run/rrdcached.sock").await;
> +
> + match result {
> + Ok(client) => {
> + // Try to flush (basic connectivity test)
> + let result = client.flush().await;
> + println!("RRDCached flush result: {:?}", result);
> +
> + // Connection successful (flush may fail if no files, that's OK)
> + assert!(result.is_ok() || result.is_err());
> + }
> + Err(e) => {
> + println!("Note: rrdcached not running (expected in test env): {}", e);
> + }
> + }
> + }
> +}
> diff --git a/src/pmxcfs-rs/pmxcfs-rrd/src/key_type.rs b/src/pmxcfs-rs/pmxcfs-rrd/src/key_type.rs
> new file mode 100644
> index 00000000..54021c14
> --- /dev/null
> +++ b/src/pmxcfs-rs/pmxcfs-rrd/src/key_type.rs
> @@ -0,0 +1,313 @@
> +/// RRD Key Type Parsing and Path Resolution
> +///
> +/// This module handles parsing RRD status update keys and mapping them
> +/// to the appropriate file paths and schemas.
> +use anyhow::{Context, Result};
> +use std::path::{Path, PathBuf};
> +
> +use super::schema::{RrdFormat, RrdSchema};
> +
> +/// RRD key types for routing to correct schema and path
> +///
> +/// This enum represents the different types of RRD metrics that pmxcfs tracks:
> +/// - Node metrics (CPU, memory, network for a node)
> +/// - VM metrics (CPU, memory, disk, network for a VM/CT)
> +/// - Storage metrics (total/used space for a storage)
> +#[derive(Debug, Clone, PartialEq, Eq)]
> +pub(crate) enum RrdKeyType {
> + /// Node metrics: pve2-node/{nodename} or pve-node-9.0/{nodename}
> + Node { nodename: String, format: RrdFormat },
> + /// VM metrics: pve2.3-vm/{vmid} or pve-vm-9.0/{vmid}
> + Vm { vmid: String, format: RrdFormat },
> + /// Storage metrics: pve2-storage/{node}/{storage} or pve-storage-9.0/{node}/{storage}
> + Storage {
> + nodename: String,
> + storage: String,
> + format: RrdFormat,
> + },
> +}
> +
> +impl RrdKeyType {
> + /// Parse RRD key from status update key
> + ///
> + /// Supported formats:
> + /// - "pve2-node/node1" → Node { nodename: "node1", format: Pve2 }
> + /// - "pve-node-9.0/node1" → Node { nodename: "node1", format: Pve9_0 }
> + /// - "pve2.3-vm/100" → Vm { vmid: "100", format: Pve2 }
> + /// - "pve-storage-9.0/node1/local" → Storage { nodename: "node1", storage: "local", format: Pve9_0 }
> + pub(crate) fn parse(key: &str) -> Result<Self> {
> + let parts: Vec<&str> = key.split('/').collect();
> +
> + if parts.is_empty() {
> + anyhow::bail!("Empty RRD key");
> + }
> +
> + match parts[0] {
> + "pve2-node" => {
> + let nodename = parts.get(1).context("Missing nodename")?.to_string();
> + Ok(RrdKeyType::Node {
> + nodename,
> + format: RrdFormat::Pve2,
> + })
> + }
> + prefix if prefix.starts_with("pve-node-") => {
A future "pve-node-9.1/..." key would be treated as 9.0 here, so we
lose the ability to distinguish future formats.
Shouldn't we parse the version suffix instead? Or please document this
assumption explicitly.
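Something along these lines is what I mean (hypothetical sketch;
`RrdFormat` here is a minimal stand-in for the real enum in
schema.rs):

```rust
// Match the version suffix explicitly so an unknown future version
// fails loudly instead of silently being treated as 9.0.
#[derive(Debug, PartialEq)]
enum RrdFormat {
    Pve9_0,
}

fn node_format(prefix: &str) -> Result<RrdFormat, String> {
    match prefix.strip_prefix("pve-node-") {
        Some("9.0") => Ok(RrdFormat::Pve9_0),
        Some(v) => Err(format!("unsupported pve-node format version: {v}")),
        None => Err(format!("not a pve-node key prefix: {prefix}")),
    }
}
```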
> + let nodename = parts.get(1).context("Missing nodename")?.to_string();
> + Ok(RrdKeyType::Node {
> + nodename,
> + format: RrdFormat::Pve9_0,
> + })
> + }
> + "pve2.3-vm" => {
> + let vmid = parts.get(1).context("Missing vmid")?.to_string();
> + Ok(RrdKeyType::Vm {
> + vmid,
> + format: RrdFormat::Pve2,
> + })
> + }
> + prefix if prefix.starts_with("pve-vm-") => {
> + let vmid = parts.get(1).context("Missing vmid")?.to_string();
> + Ok(RrdKeyType::Vm {
> + vmid,
> + format: RrdFormat::Pve9_0,
> + })
> + }
> + "pve2-storage" => {
> + let nodename = parts.get(1).context("Missing nodename")?.to_string();
> + let storage = parts.get(2).context("Missing storage")?.to_string();
> + Ok(RrdKeyType::Storage {
> + nodename,
> + storage,
> + format: RrdFormat::Pve2,
> + })
> + }
> + prefix if prefix.starts_with("pve-storage-") => {
> + let nodename = parts.get(1).context("Missing nodename")?.to_string();
> + let storage = parts.get(2).context("Missing storage")?.to_string();
> + Ok(RrdKeyType::Storage {
> + nodename,
> + storage,
> + format: RrdFormat::Pve9_0,
> + })
> + }
> + _ => anyhow::bail!("Unknown RRD key format: {key}"),
> + }
> + }
> +
> + /// Get the RRD file path for this key type
> + ///
> + /// Always returns paths using the current format (9.0), regardless of the input format.
> + /// This enables transparent format migration: old PVE8 nodes can send `pve2-node/` keys,
> + /// and they'll be written to `pve-node-9.0/` files automatically.
> + ///
> + /// # Format Migration Strategy
> + ///
> + /// The C implementation always creates files in the current format directory
> + /// (see status.c:1287). This Rust implementation follows the same approach:
> + /// - Input: `pve2-node/node1` → Output: `/var/lib/rrdcached/db/pve-node-9.0/node1`
> + /// - Input: `pve-node-9.0/node1` → Output: `/var/lib/rrdcached/db/pve-node-9.0/node1`
> + ///
> + /// This allows rolling upgrades where old and new nodes coexist in the same cluster.
> + pub(crate) fn file_path(&self, base_dir: &Path) -> PathBuf {
> + match self {
> + RrdKeyType::Node { nodename, .. } => {
> + // Always use current format path
> + base_dir.join("pve-node-9.0").join(nodename)
If nodename or storage contains ".." or "/", base_dir can be escaped
and the write could land anywhere on the filesystem.
I think we need to validate/sanitize these components if that isn't
already done elsewhere, ideally as part of RrdKeyType::parse().
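A sketch of the kind of component check I have in mind (name and
exact rules are just an illustration):

```rust
// Reject any path component that could escape base_dir when joined
// onto it: empty, ".", "..", separators, or embedded NUL.
fn is_safe_component(name: &str) -> bool {
    !name.is_empty()
        && name != "."
        && name != ".."
        && !name.contains('/')
        && !name.contains('\\')
        && !name.contains('\0')
}
```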
> + }
> + RrdKeyType::Vm { vmid, .. } => {
> + // Always use current format path
> + base_dir.join("pve-vm-9.0").join(vmid)
> + }
> + RrdKeyType::Storage {
> + nodename, storage, ..
> + } => {
> + // Always use current format path
> + base_dir
> + .join("pve-storage-9.0")
> + .join(nodename)
> + .join(storage)
> + }
> + }
> + }
> +
> + /// Get the source format from the input key
> + ///
> + /// This is used for data transformation (padding/truncation).
> + pub(crate) fn source_format(&self) -> RrdFormat {
> + match self {
> + RrdKeyType::Node { format, .. }
> + | RrdKeyType::Vm { format, .. }
> + | RrdKeyType::Storage { format, .. } => *format,
> + }
> + }
> +
> + /// Get the target RRD schema (always current format)
> + ///
> + /// Files are always created using the current format (Pve9_0),
> + /// regardless of the source format in the key.
> + pub(crate) fn schema(&self) -> RrdSchema {
> + match self {
> + RrdKeyType::Node { .. } => RrdSchema::node(RrdFormat::Pve9_0),
> + RrdKeyType::Vm { .. } => RrdSchema::vm(RrdFormat::Pve9_0),
> + RrdKeyType::Storage { .. } => RrdSchema::storage(RrdFormat::Pve9_0),
> + }
> + }
> +}
> +
> +#[cfg(test)]
> +mod tests {
> + use super::*;
> +
> + #[test]
> + fn test_parse_node_keys() {
> + let key = RrdKeyType::parse("pve2-node/testnode").unwrap();
> + assert_eq!(
> + key,
> + RrdKeyType::Node {
> + nodename: "testnode".to_string(),
> + format: RrdFormat::Pve2
> + }
> + );
> +
> + let key = RrdKeyType::parse("pve-node-9.0/testnode").unwrap();
> + assert_eq!(
> + key,
> + RrdKeyType::Node {
> + nodename: "testnode".to_string(),
> + format: RrdFormat::Pve9_0
> + }
> + );
> + }
> +
> + #[test]
> + fn test_parse_vm_keys() {
> + let key = RrdKeyType::parse("pve2.3-vm/100").unwrap();
> + assert_eq!(
> + key,
> + RrdKeyType::Vm {
> + vmid: "100".to_string(),
> + format: RrdFormat::Pve2
> + }
> + );
> +
> + let key = RrdKeyType::parse("pve-vm-9.0/100").unwrap();
> + assert_eq!(
> + key,
> + RrdKeyType::Vm {
> + vmid: "100".to_string(),
> + format: RrdFormat::Pve9_0
> + }
> + );
> + }
> +
> + #[test]
> + fn test_parse_storage_keys() {
> + let key = RrdKeyType::parse("pve2-storage/node1/local").unwrap();
> + assert_eq!(
> + key,
> + RrdKeyType::Storage {
> + nodename: "node1".to_string(),
> + storage: "local".to_string(),
> + format: RrdFormat::Pve2
> + }
> + );
> +
> + let key = RrdKeyType::parse("pve-storage-9.0/node1/local").unwrap();
> + assert_eq!(
> + key,
> + RrdKeyType::Storage {
> + nodename: "node1".to_string(),
> + storage: "local".to_string(),
> + format: RrdFormat::Pve9_0
> + }
> + );
> + }
> +
> + #[test]
> + fn test_file_paths() {
> + let base = Path::new("/var/lib/rrdcached/db");
> +
> + // New format key → new format path
> + let key = RrdKeyType::Node {
> + nodename: "node1".to_string(),
> + format: RrdFormat::Pve9_0,
> + };
> + assert_eq!(
> + key.file_path(base),
> + PathBuf::from("/var/lib/rrdcached/db/pve-node-9.0/node1")
> + );
> +
> + // Old format key → new format path (auto-upgrade!)
> + let key = RrdKeyType::Node {
> + nodename: "node1".to_string(),
> + format: RrdFormat::Pve2,
> + };
> + assert_eq!(
> + key.file_path(base),
> + PathBuf::from("/var/lib/rrdcached/db/pve-node-9.0/node1"),
> + "Old format keys should create new format files"
> + );
> +
> + // VM: Old format → new format
> + let key = RrdKeyType::Vm {
> + vmid: "100".to_string(),
> + format: RrdFormat::Pve2,
> + };
> + assert_eq!(
> + key.file_path(base),
> + PathBuf::from("/var/lib/rrdcached/db/pve-vm-9.0/100"),
> + "Old VM format should upgrade to new format"
> + );
> +
> + // Storage: Always uses current format
> + let key = RrdKeyType::Storage {
> + nodename: "node1".to_string(),
> + storage: "local".to_string(),
> + format: RrdFormat::Pve2,
> + };
> + assert_eq!(
> + key.file_path(base),
> + PathBuf::from("/var/lib/rrdcached/db/pve-storage-9.0/node1/local"),
> + "Old storage format should upgrade to new format"
> + );
> + }
> +
> + #[test]
> + fn test_source_format() {
> + let key = RrdKeyType::Node {
> + nodename: "node1".to_string(),
> + format: RrdFormat::Pve2,
> + };
> + assert_eq!(key.source_format(), RrdFormat::Pve2);
> +
> + let key = RrdKeyType::Vm {
> + vmid: "100".to_string(),
> + format: RrdFormat::Pve9_0,
> + };
> + assert_eq!(key.source_format(), RrdFormat::Pve9_0);
> + }
> +
> + #[test]
> + fn test_schema_always_current_format() {
> + // Even with Pve2 source format, schema should return Pve9_0
> + let key = RrdKeyType::Node {
> + nodename: "node1".to_string(),
> + format: RrdFormat::Pve2,
> + };
> + let schema = key.schema();
> + assert_eq!(
> + schema.format,
> + RrdFormat::Pve9_0,
> + "Schema should always use current format"
> + );
> + assert_eq!(schema.column_count(), 19, "Should have Pve9_0 column count");
> +
> + // Pve9_0 source also gets Pve9_0 schema
> + let key = RrdKeyType::Node {
> + nodename: "node1".to_string(),
> + format: RrdFormat::Pve9_0,
> + };
> + let schema = key.schema();
> + assert_eq!(schema.format, RrdFormat::Pve9_0);
> + assert_eq!(schema.column_count(), 19);
> + }
> +}
> diff --git a/src/pmxcfs-rs/pmxcfs-rrd/src/lib.rs b/src/pmxcfs-rs/pmxcfs-rrd/src/lib.rs
> new file mode 100644
> index 00000000..7a439676
> --- /dev/null
> +++ b/src/pmxcfs-rs/pmxcfs-rrd/src/lib.rs
> @@ -0,0 +1,21 @@
> +/// RRD (Round-Robin Database) Persistence Module
> +///
> +/// This module provides RRD file persistence compatible with the C pmxcfs implementation.
> +/// It handles:
> +/// - RRD file creation with proper schemas (node, VM, storage)
> +/// - RRD file updates (writing metrics to disk)
> +/// - Multiple backend strategies:
> +/// - Daemon mode: High-performance batched updates via rrdcached
> +/// - Direct mode: Reliable fallback using direct file writes
> +/// - Fallback mode: Tries daemon first, falls back to direct (matches C behavior)
> +/// - Version management (pve2 vs pve-9.0 formats)
> +///
> +/// The implementation matches the C behavior in status.c where it attempts
> +/// daemon updates first, then falls back to direct file operations.
> +mod backend;
> +mod daemon;
> +mod key_type;
> +pub(crate) mod schema;
> +mod writer;
> +
> +pub use writer::RrdWriter;
> diff --git a/src/pmxcfs-rs/pmxcfs-rrd/src/schema.rs b/src/pmxcfs-rs/pmxcfs-rrd/src/schema.rs
> new file mode 100644
> index 00000000..d449bd6e
> --- /dev/null
> +++ b/src/pmxcfs-rs/pmxcfs-rrd/src/schema.rs
> @@ -0,0 +1,577 @@
> +/// RRD Schema Definitions
> +///
> +/// Defines RRD database schemas matching the C pmxcfs implementation.
> +/// Each schema specifies data sources (DS) and round-robin archives (RRA).
> +use std::fmt;
> +
> +/// RRD format version
> +#[derive(Debug, Clone, Copy, PartialEq, Eq)]
> +pub enum RrdFormat {
> + /// Legacy pve2 format (12 columns for node, 10 for VM, 2 for storage)
> + Pve2,
> + /// New pve-9.0 format (19 columns for node, 17 for VM, 2 for storage)
> + Pve9_0,
> +}
> +
> +/// RRD data source definition
> +#[derive(Debug, Clone)]
> +pub struct RrdDataSource {
> + /// Data source name
> + pub name: &'static str,
> + /// Data source type (GAUGE, COUNTER, DERIVE, ABSOLUTE)
> + pub ds_type: &'static str,
> + /// Heartbeat (seconds before marking as unknown)
> + pub heartbeat: u32,
> + /// Minimum value (U for unknown)
> + pub min: &'static str,
> + /// Maximum value (U for unknown)
> + pub max: &'static str,
> +}
> +
> +impl RrdDataSource {
> + /// Create GAUGE data source with no min/max limits
> + pub(super) const fn gauge(name: &'static str) -> Self {
> + Self {
> + name,
> + ds_type: "GAUGE",
> + heartbeat: 120,
> + min: "0",
> + max: "U",
> + }
> + }
> +
> + /// Create DERIVE data source (for counters that can wrap)
> + pub(super) const fn derive(name: &'static str) -> Self {
> + Self {
> + name,
> + ds_type: "DERIVE",
> + heartbeat: 120,
> + min: "0",
> + max: "U",
> + }
> + }
> +
> + /// Format as RRD command line argument
> + ///
> + /// Matches C implementation format: "DS:name:TYPE:heartbeat:min:max"
> + /// (see rrd_def_node in src/pmxcfs/status.c:1100)
> + ///
> + /// Currently unused but kept for debugging/testing and C format compatibility.
> + #[allow(dead_code)]
> + pub(super) fn to_arg(&self) -> String {
> + format!(
> + "DS:{}:{}:{}:{}:{}",
> + self.name, self.ds_type, self.heartbeat, self.min, self.max
> + )
> + }
> +}
> +
> +/// RRD schema with data sources and archives
> +#[derive(Debug, Clone)]
> +pub struct RrdSchema {
> + /// RRD format version
> + pub format: RrdFormat,
> + /// Data sources
> + pub data_sources: Vec<RrdDataSource>,
> + /// Round-robin archives (RRA definitions)
> + pub archives: Vec<String>,
> +}
> +
> +impl RrdSchema {
> + /// Create node RRD schema
> + pub fn node(format: RrdFormat) -> Self {
> + let data_sources = match format {
> + RrdFormat::Pve2 => vec![
> + RrdDataSource::gauge("loadavg"),
> + RrdDataSource::gauge("maxcpu"),
> + RrdDataSource::gauge("cpu"),
> + RrdDataSource::gauge("iowait"),
> + RrdDataSource::gauge("memtotal"),
> + RrdDataSource::gauge("memused"),
> + RrdDataSource::gauge("swaptotal"),
> + RrdDataSource::gauge("swapused"),
> + RrdDataSource::gauge("roottotal"),
> + RrdDataSource::gauge("rootused"),
> + RrdDataSource::derive("netin"),
> + RrdDataSource::derive("netout"),
> + ],
> + RrdFormat::Pve9_0 => vec![
> + RrdDataSource::gauge("loadavg"),
> + RrdDataSource::gauge("maxcpu"),
> + RrdDataSource::gauge("cpu"),
> + RrdDataSource::gauge("iowait"),
> + RrdDataSource::gauge("memtotal"),
> + RrdDataSource::gauge("memused"),
> + RrdDataSource::gauge("swaptotal"),
> + RrdDataSource::gauge("swapused"),
> + RrdDataSource::gauge("roottotal"),
> + RrdDataSource::gauge("rootused"),
> + RrdDataSource::derive("netin"),
> + RrdDataSource::derive("netout"),
> + RrdDataSource::gauge("memavailable"),
> + RrdDataSource::gauge("arcsize"),
> + RrdDataSource::gauge("pressurecpusome"),
> + RrdDataSource::gauge("pressureiosome"),
> + RrdDataSource::gauge("pressureiofull"),
> + RrdDataSource::gauge("pressurememorysome"),
> + RrdDataSource::gauge("pressurememoryfull"),
> + ],
> + };
> +
> + Self {
> + format,
> + data_sources,
> + archives: Self::default_archives(),
> + }
> + }
> +
> + /// Create VM RRD schema
> + pub fn vm(format: RrdFormat) -> Self {
> + let data_sources = match format {
> + RrdFormat::Pve2 => vec![
> + RrdDataSource::gauge("maxcpu"),
> + RrdDataSource::gauge("cpu"),
> + RrdDataSource::gauge("maxmem"),
> + RrdDataSource::gauge("mem"),
> + RrdDataSource::gauge("maxdisk"),
> + RrdDataSource::gauge("disk"),
> + RrdDataSource::derive("netin"),
> + RrdDataSource::derive("netout"),
> + RrdDataSource::derive("diskread"),
> + RrdDataSource::derive("diskwrite"),
> + ],
> + RrdFormat::Pve9_0 => vec![
> + RrdDataSource::gauge("maxcpu"),
> + RrdDataSource::gauge("cpu"),
> + RrdDataSource::gauge("maxmem"),
> + RrdDataSource::gauge("mem"),
> + RrdDataSource::gauge("maxdisk"),
> + RrdDataSource::gauge("disk"),
> + RrdDataSource::derive("netin"),
> + RrdDataSource::derive("netout"),
> + RrdDataSource::derive("diskread"),
> + RrdDataSource::derive("diskwrite"),
> + RrdDataSource::gauge("memhost"),
> + RrdDataSource::gauge("pressurecpusome"),
> + RrdDataSource::gauge("pressurecpufull"),
> + RrdDataSource::gauge("pressureiosome"),
> + RrdDataSource::gauge("pressureiofull"),
> + RrdDataSource::gauge("pressurememorysome"),
> + RrdDataSource::gauge("pressurememoryfull"),
> + ],
> + };
> +
> + Self {
> + format,
> + data_sources,
> + archives: Self::default_archives(),
> + }
> + }
> +
> + /// Create storage RRD schema
> + pub fn storage(format: RrdFormat) -> Self {
> + let data_sources = vec![RrdDataSource::gauge("total"), RrdDataSource::gauge("used")];
> +
> + Self {
> + format,
> + data_sources,
> + archives: Self::default_archives(),
> + }
> + }
> +
> + /// Default RRA (Round-Robin Archive) definitions
> + ///
> + /// These match the C implementation's archives for 60-second step size:
> + /// - RRA:AVERAGE:0.5:1:1440 -> 1 min * 1440 => 1 day
> + /// - RRA:AVERAGE:0.5:30:1440 -> 30 min * 1440 => 30 days
> + /// - RRA:AVERAGE:0.5:360:1440 -> 6 hours * 1440 => 360 days (~1 year)
> + /// - RRA:AVERAGE:0.5:10080:570 -> 1 week * 570 => ~10 years
> + /// - RRA:MAX:0.5:1:1440 -> 1 min * 1440 => 1 day
> + /// - RRA:MAX:0.5:30:1440 -> 30 min * 1440 => 30 days
> + /// - RRA:MAX:0.5:360:1440 -> 6 hours * 1440 => 360 days (~1 year)
> + /// - RRA:MAX:0.5:10080:570 -> 1 week * 570 => ~10 years
> + pub(super) fn default_archives() -> Vec<String> {
> + vec![
> + "RRA:AVERAGE:0.5:1:1440".to_string(),
> + "RRA:AVERAGE:0.5:30:1440".to_string(),
> + "RRA:AVERAGE:0.5:360:1440".to_string(),
> + "RRA:AVERAGE:0.5:10080:570".to_string(),
> + "RRA:MAX:0.5:1:1440".to_string(),
> + "RRA:MAX:0.5:30:1440".to_string(),
> + "RRA:MAX:0.5:360:1440".to_string(),
> + "RRA:MAX:0.5:10080:570".to_string(),
> + ]
> + }
> +
> + /// Get number of data sources
> + pub fn column_count(&self) -> usize {
> + self.data_sources.len()
> + }
> +}
> +
> +impl fmt::Display for RrdSchema {
> + fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
> + write!(
> + f,
> + "{:?} schema with {} data sources",
> + self.format,
> + self.column_count()
> + )
> + }
> +}
> +
> +#[cfg(test)]
> +mod tests {
> + use super::*;
> +
> + fn assert_ds_properties(
> + ds: &RrdDataSource,
> + expected_name: &str,
> + expected_type: &str,
> + index: usize,
> + ) {
> + assert_eq!(ds.name, expected_name, "DS[{}] name mismatch", index);
> + assert_eq!(ds.ds_type, expected_type, "DS[{}] type mismatch", index);
> + assert_eq!(ds.heartbeat, 120, "DS[{}] heartbeat should be 120", index);
> + assert_eq!(ds.min, "0", "DS[{}] min should be 0", index);
> + assert_eq!(ds.max, "U", "DS[{}] max should be U", index);
> + }
> +
> + #[test]
> + fn test_datasource_construction() {
> + let gauge_ds = RrdDataSource::gauge("cpu");
> + assert_eq!(gauge_ds.name, "cpu");
> + assert_eq!(gauge_ds.ds_type, "GAUGE");
> + assert_eq!(gauge_ds.heartbeat, 120);
> + assert_eq!(gauge_ds.min, "0");
> + assert_eq!(gauge_ds.max, "U");
> + assert_eq!(gauge_ds.to_arg(), "DS:cpu:GAUGE:120:0:U");
> +
> + let derive_ds = RrdDataSource::derive("netin");
> + assert_eq!(derive_ds.name, "netin");
> + assert_eq!(derive_ds.ds_type, "DERIVE");
> + assert_eq!(derive_ds.heartbeat, 120);
> + assert_eq!(derive_ds.min, "0");
> + assert_eq!(derive_ds.max, "U");
> + assert_eq!(derive_ds.to_arg(), "DS:netin:DERIVE:120:0:U");
> + }
> +
> + #[test]
> + fn test_node_schema_pve2() {
> + let schema = RrdSchema::node(RrdFormat::Pve2);
> +
> + assert_eq!(schema.column_count(), 12);
> + assert_eq!(schema.format, RrdFormat::Pve2);
> +
> + let expected_ds = vec![
> + ("loadavg", "GAUGE"),
> + ("maxcpu", "GAUGE"),
> + ("cpu", "GAUGE"),
> + ("iowait", "GAUGE"),
> + ("memtotal", "GAUGE"),
> + ("memused", "GAUGE"),
> + ("swaptotal", "GAUGE"),
> + ("swapused", "GAUGE"),
> + ("roottotal", "GAUGE"),
> + ("rootused", "GAUGE"),
> + ("netin", "DERIVE"),
> + ("netout", "DERIVE"),
> + ];
> +
> + for (i, (name, ds_type)) in expected_ds.iter().enumerate() {
> + assert_ds_properties(&schema.data_sources[i], name, ds_type, i);
> + }
> + }
> +
> + #[test]
> + fn test_node_schema_pve9() {
> + let schema = RrdSchema::node(RrdFormat::Pve9_0);
> +
> + assert_eq!(schema.column_count(), 19);
> + assert_eq!(schema.format, RrdFormat::Pve9_0);
> +
> + let pve2_schema = RrdSchema::node(RrdFormat::Pve2);
> + for i in 0..12 {
> + assert_eq!(
> + schema.data_sources[i].name, pve2_schema.data_sources[i].name,
> + "First 12 DS should match pve2"
> + );
> + assert_eq!(
> + schema.data_sources[i].ds_type, pve2_schema.data_sources[i].ds_type,
> + "First 12 DS types should match pve2"
> + );
> + }
> +
> + let pve9_additions = vec![
> + ("memavailable", "GAUGE"),
> + ("arcsize", "GAUGE"),
> + ("pressurecpusome", "GAUGE"),
> + ("pressureiosome", "GAUGE"),
> + ("pressureiofull", "GAUGE"),
> + ("pressurememorysome", "GAUGE"),
> + ("pressurememoryfull", "GAUGE"),
> + ];
> +
> + for (i, (name, ds_type)) in pve9_additions.iter().enumerate() {
> + assert_ds_properties(&schema.data_sources[12 + i], name, ds_type, 12 + i);
> + }
> + }
> +
> + #[test]
> + fn test_vm_schema_pve2() {
> + let schema = RrdSchema::vm(RrdFormat::Pve2);
> +
> + assert_eq!(schema.column_count(), 10);
> + assert_eq!(schema.format, RrdFormat::Pve2);
> +
> + let expected_ds = vec![
> + ("maxcpu", "GAUGE"),
> + ("cpu", "GAUGE"),
> + ("maxmem", "GAUGE"),
> + ("mem", "GAUGE"),
> + ("maxdisk", "GAUGE"),
> + ("disk", "GAUGE"),
> + ("netin", "DERIVE"),
> + ("netout", "DERIVE"),
> + ("diskread", "DERIVE"),
> + ("diskwrite", "DERIVE"),
> + ];
> +
> + for (i, (name, ds_type)) in expected_ds.iter().enumerate() {
> + assert_ds_properties(&schema.data_sources[i], name, ds_type, i);
> + }
> + }
> +
> + #[test]
> + fn test_vm_schema_pve9() {
> + let schema = RrdSchema::vm(RrdFormat::Pve9_0);
> +
> + assert_eq!(schema.column_count(), 17);
> + assert_eq!(schema.format, RrdFormat::Pve9_0);
> +
> + let pve2_schema = RrdSchema::vm(RrdFormat::Pve2);
> + for i in 0..10 {
> + assert_eq!(
> + schema.data_sources[i].name, pve2_schema.data_sources[i].name,
> + "First 10 DS should match pve2"
> + );
> + assert_eq!(
> + schema.data_sources[i].ds_type, pve2_schema.data_sources[i].ds_type,
> + "First 10 DS types should match pve2"
> + );
> + }
> +
> + let pve9_additions = vec![
> + ("memhost", "GAUGE"),
> + ("pressurecpusome", "GAUGE"),
> + ("pressurecpufull", "GAUGE"),
> + ("pressureiosome", "GAUGE"),
> + ("pressureiofull", "GAUGE"),
> + ("pressurememorysome", "GAUGE"),
> + ("pressurememoryfull", "GAUGE"),
> + ];
> +
> + for (i, (name, ds_type)) in pve9_additions.iter().enumerate() {
> + assert_ds_properties(&schema.data_sources[10 + i], name, ds_type, 10 + i);
> + }
> + }
> +
> + #[test]
> + fn test_storage_schema() {
> + for format in [RrdFormat::Pve2, RrdFormat::Pve9_0] {
> + let schema = RrdSchema::storage(format);
> +
> + assert_eq!(schema.column_count(), 2);
> + assert_eq!(schema.format, format);
> +
> + assert_ds_properties(&schema.data_sources[0], "total", "GAUGE", 0);
> + assert_ds_properties(&schema.data_sources[1], "used", "GAUGE", 1);
> + }
> + }
> +
> + #[test]
> + fn test_rra_archives() {
> + let expected_rras = [
> + "RRA:AVERAGE:0.5:1:1440",
> + "RRA:AVERAGE:0.5:30:1440",
> + "RRA:AVERAGE:0.5:360:1440",
> + "RRA:AVERAGE:0.5:10080:570",
> + "RRA:MAX:0.5:1:1440",
> + "RRA:MAX:0.5:30:1440",
> + "RRA:MAX:0.5:360:1440",
> + "RRA:MAX:0.5:10080:570",
> + ];
> +
> + let schemas = vec![
> + RrdSchema::node(RrdFormat::Pve2),
> + RrdSchema::node(RrdFormat::Pve9_0),
> + RrdSchema::vm(RrdFormat::Pve2),
> + RrdSchema::vm(RrdFormat::Pve9_0),
> + RrdSchema::storage(RrdFormat::Pve2),
> + RrdSchema::storage(RrdFormat::Pve9_0),
> + ];
> +
> + for schema in schemas {
> + assert_eq!(schema.archives.len(), 8);
> +
> + for (i, expected) in expected_rras.iter().enumerate() {
> + assert_eq!(
> + &schema.archives[i], expected,
> + "RRA[{}] mismatch in {:?}",
> + i, schema.format
> + );
> + }
> + }
> + }
> +
> + #[test]
> + fn test_heartbeat_consistency() {
> + let schemas = vec![
> + RrdSchema::node(RrdFormat::Pve2),
> + RrdSchema::node(RrdFormat::Pve9_0),
> + RrdSchema::vm(RrdFormat::Pve2),
> + RrdSchema::vm(RrdFormat::Pve9_0),
> + RrdSchema::storage(RrdFormat::Pve2),
> + RrdSchema::storage(RrdFormat::Pve9_0),
> + ];
> +
> + for schema in schemas {
> + for ds in &schema.data_sources {
> + assert_eq!(ds.heartbeat, 120);
> + assert_eq!(ds.min, "0");
> + assert_eq!(ds.max, "U");
> + }
> + }
> + }
> +
> + #[test]
> + fn test_gauge_vs_derive_correctness() {
> + // GAUGE: instantaneous values (CPU%, memory bytes)
> + // DERIVE: cumulative counters that can wrap (network/disk bytes)
> +
> + let node = RrdSchema::node(RrdFormat::Pve2);
> + let node_derive_indices = [10, 11]; // netin, netout
> + for (i, ds) in node.data_sources.iter().enumerate() {
> + if node_derive_indices.contains(&i) {
> + assert_eq!(
> + ds.ds_type, "DERIVE",
> + "Node DS[{}] ({}) should be DERIVE",
> + i, ds.name
> + );
> + } else {
> + assert_eq!(
> + ds.ds_type, "GAUGE",
> + "Node DS[{}] ({}) should be GAUGE",
> + i, ds.name
> + );
> + }
> + }
> +
> + let vm = RrdSchema::vm(RrdFormat::Pve2);
> + let vm_derive_indices = [6, 7, 8, 9]; // netin, netout, diskread, diskwrite
> + for (i, ds) in vm.data_sources.iter().enumerate() {
> + if vm_derive_indices.contains(&i) {
> + assert_eq!(
> + ds.ds_type, "DERIVE",
> + "VM DS[{}] ({}) should be DERIVE",
> + i, ds.name
> + );
> + } else {
> + assert_eq!(
> + ds.ds_type, "GAUGE",
> + "VM DS[{}] ({}) should be GAUGE",
> + i, ds.name
> + );
> + }
> + }
> +
> + let storage = RrdSchema::storage(RrdFormat::Pve2);
> + for ds in &storage.data_sources {
> + assert_eq!(
> + ds.ds_type, "GAUGE",
> + "Storage DS ({}) should be GAUGE",
> + ds.name
> + );
> + }
> + }
> +
> + #[test]
> + fn test_pve9_backward_compatibility() {
> + let node_pve2 = RrdSchema::node(RrdFormat::Pve2);
> + let node_pve9 = RrdSchema::node(RrdFormat::Pve9_0);
> +
> + assert!(node_pve9.column_count() > node_pve2.column_count());
> +
> + for i in 0..node_pve2.column_count() {
> + assert_eq!(
> + node_pve2.data_sources[i].name, node_pve9.data_sources[i].name,
> + "Node DS[{}] name must match between pve2 and pve9.0",
> + i
> + );
> + assert_eq!(
> + node_pve2.data_sources[i].ds_type, node_pve9.data_sources[i].ds_type,
> + "Node DS[{}] type must match between pve2 and pve9.0",
> + i
> + );
> + }
> +
> + let vm_pve2 = RrdSchema::vm(RrdFormat::Pve2);
> + let vm_pve9 = RrdSchema::vm(RrdFormat::Pve9_0);
> +
> + assert!(vm_pve9.column_count() > vm_pve2.column_count());
> +
> + for i in 0..vm_pve2.column_count() {
> + assert_eq!(
> + vm_pve2.data_sources[i].name, vm_pve9.data_sources[i].name,
> + "VM DS[{}] name must match between pve2 and pve9.0",
> + i
> + );
> + assert_eq!(
> + vm_pve2.data_sources[i].ds_type, vm_pve9.data_sources[i].ds_type,
> + "VM DS[{}] type must match between pve2 and pve9.0",
> + i
> + );
> + }
> +
> + let storage_pve2 = RrdSchema::storage(RrdFormat::Pve2);
> + let storage_pve9 = RrdSchema::storage(RrdFormat::Pve9_0);
> + assert_eq!(storage_pve2.column_count(), storage_pve9.column_count());
> + }
> +
> + #[test]
> + fn test_schema_display() {
> + let test_cases = vec![
> + (RrdSchema::node(RrdFormat::Pve2), "Pve2", "12 data sources"),
> + (
> + RrdSchema::node(RrdFormat::Pve9_0),
> + "Pve9_0",
> + "19 data sources",
> + ),
> + (RrdSchema::vm(RrdFormat::Pve2), "Pve2", "10 data sources"),
> + (
> + RrdSchema::vm(RrdFormat::Pve9_0),
> + "Pve9_0",
> + "17 data sources",
> + ),
> + (
> + RrdSchema::storage(RrdFormat::Pve2),
> + "Pve2",
> + "2 data sources",
> + ),
> + ];
> +
> + for (schema, expected_format, expected_count) in test_cases {
> + let display = format!("{}", schema);
> + assert!(
> + display.contains(expected_format),
> + "Display should contain format: {}",
> + display
> + );
> + assert!(
> + display.contains(expected_count),
> + "Display should contain count: {}",
> + display
> + );
> + }
> + }
> +}
> diff --git a/src/pmxcfs-rs/pmxcfs-rrd/src/writer.rs b/src/pmxcfs-rs/pmxcfs-rrd/src/writer.rs
> new file mode 100644
> index 00000000..79ed202a
> --- /dev/null
> +++ b/src/pmxcfs-rs/pmxcfs-rrd/src/writer.rs
> @@ -0,0 +1,397 @@
> +/// RRD File Writer
> +///
> +/// Handles creating and updating RRD files via pluggable backends.
> +/// Supports daemon-based (rrdcached) and direct file writing modes.
> +use super::key_type::RrdKeyType;
> +use super::schema::{RrdFormat, RrdSchema};
> +use anyhow::{Context, Result};
> +use chrono::Utc;
> +use std::collections::HashMap;
> +use std::fs;
> +use std::path::{Path, PathBuf};
> +
> +/// Metric type for determining column skipping rules
> +#[derive(Debug, Clone, Copy, PartialEq, Eq)]
> +enum MetricType {
> + Node,
> + Vm,
> + Storage,
> +}
> +
> +impl MetricType {
> + /// Number of non-archivable columns to skip
> + ///
> + /// C implementation (status.c:1300, 1335):
> + /// - Node: skip 2 (uptime, status)
> + /// - VM: skip 4 (uptime, status, template, pid)
> + /// - Storage: skip 0
> + fn skip_columns(self) -> usize {
> + match self {
> + MetricType::Node => 2,
> + MetricType::Vm => 4,
> + MetricType::Storage => 0,
> + }
> + }
> +}
> +
> +impl RrdFormat {
> + /// Get column count for a specific metric type
> + #[allow(dead_code)]
> + fn column_count(self, metric_type: &MetricType) -> usize {
> + match (self, metric_type) {
> + (RrdFormat::Pve2, MetricType::Node) => 12,
> + (RrdFormat::Pve9_0, MetricType::Node) => 19,
> + (RrdFormat::Pve2, MetricType::Vm) => 10,
> + (RrdFormat::Pve9_0, MetricType::Vm) => 17,
> + (_, MetricType::Storage) => 2, // Same for both formats
> + }
> + }
> +}
> +
> +impl RrdKeyType {
> + /// Get the metric type for this key
> + fn metric_type(&self) -> MetricType {
> + match self {
> + RrdKeyType::Node { .. } => MetricType::Node,
> + RrdKeyType::Vm { .. } => MetricType::Vm,
> + RrdKeyType::Storage { .. } => MetricType::Storage,
> + }
> + }
> +}
> +
> +/// RRD writer for persistent metric storage
> +///
> +/// Uses pluggable backends (daemon, direct, or fallback) for RRD operations.
> +pub struct RrdWriter {
> + /// Base directory for RRD files (default: /var/lib/rrdcached/db)
> + base_dir: PathBuf,
> + /// Backend for RRD operations (daemon, direct, or fallback)
> + backend: Box<dyn super::backend::RrdBackend>,
> + /// Track which RRD files we've already created
> + created_files: HashMap<String, ()>,
This cache is never cleared, so it grows by one entry per distinct
key for the lifetime of the process. A flood of unique keys would
mean unbounded memory growth (DoS risk).
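Side note: `HashMap<String, ()>` is effectively a `HashSet<String>`.
A bounded variant could keep memory flat, e.g. (hypothetical sketch,
cap value arbitrary):

```rust
use std::collections::HashSet;

// Capped set of seen keys; past the cap we clear it and fall back to
// the exists() check, trading extra stat() calls for bounded memory.
struct CreatedCache {
    seen: HashSet<String>,
    cap: usize,
}

impl CreatedCache {
    fn remember(&mut self, key: &str) {
        if self.seen.len() >= self.cap {
            self.seen.clear();
        }
        self.seen.insert(key.to_string());
    }

    fn contains(&self, key: &str) -> bool {
        self.seen.contains(key)
    }
}
```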
> +}
> +
> +impl RrdWriter {
> + /// Create new RRD writer with default fallback backend
> + ///
> + /// Uses the fallback backend that tries daemon first, then falls back to direct file writes.
> + /// This matches the C implementation's behavior.
> + ///
> + /// # Arguments
> + /// * `base_dir` - Base directory for RRD files
> + pub async fn new<P: AsRef<Path>>(base_dir: P) -> Result<Self> {
> + let backend = Self::default_backend().await?;
> + Self::with_backend(base_dir, backend).await
> + }
> +
> + /// Create new RRD writer with specific backend
> + ///
> + /// # Arguments
> + /// * `base_dir` - Base directory for RRD files
> + /// * `backend` - RRD backend to use (daemon, direct, or fallback)
> + pub(crate) async fn with_backend<P: AsRef<Path>>(
> + base_dir: P,
> + backend: Box<dyn super::backend::RrdBackend>,
> + ) -> Result<Self> {
> + let base_dir = base_dir.as_ref().to_path_buf();
> +
> + // Create base directory if it doesn't exist
> + fs::create_dir_all(&base_dir)
> + .with_context(|| format!("Failed to create RRD base directory: {base_dir:?}"))?;
> +
> + tracing::info!("RRD writer using backend: {}", backend.name());
> +
> + Ok(Self {
> + base_dir,
> + backend,
> + created_files: HashMap::new(),
> + })
> + }
> +
> + /// Create default backend (fallback: daemon + direct)
> + ///
> + /// This matches the C implementation's behavior:
> + /// - Tries rrdcached daemon first for performance
> + /// - Falls back to direct file writes if daemon fails
> + async fn default_backend() -> Result<Box<dyn super::backend::RrdBackend>> {
> + let backend = super::backend::RrdFallbackBackend::new("/var/run/rrdcached.sock").await;
> + Ok(Box::new(backend))
> + }
> +
> + /// Update RRD file with metric data
> + ///
> + /// This will:
> + /// 1. Transform data from source format to target format (padding/truncation/column skipping)
> + /// 2. Create the RRD file if it doesn't exist
> + /// 3. Update via rrdcached daemon
> + ///
> + /// # Arguments
> + /// * `key` - RRD key (e.g., "pve2-node/node1", "pve-vm-9.0/100")
> + /// * `data` - Metric data string (format: "timestamp:value1:value2:...")
> + pub async fn update(&mut self, key: &str, data: &str) -> Result<()> {
> + // Parse the key to determine file path and schema
> + let key_type = RrdKeyType::parse(key).with_context(|| format!("Invalid RRD key: {key}"))?;
> +
> + // Get source format and target schema
> + let source_format = key_type.source_format();
> + let target_schema = key_type.schema();
> + let metric_type = key_type.metric_type();
> +
> + // Transform data from source to target format
> + let transformed_data =
> + Self::transform_data(data, source_format, &target_schema, metric_type)
> + .with_context(|| format!("Failed to transform RRD data for key: {key}"))?;
> +
> + // Get the file path (always uses current format)
> + let file_path = key_type.file_path(&self.base_dir);
> +
> + // Ensure the RRD file exists
> + if !self.created_files.contains_key(key) && !file_path.exists() {
If an RRD file is deleted/rotated while the process is running,
created_files still contains the key, so the file won't be recreated
and subsequent updates will fail. Maybe check file_path.exists()
unconditionally?
> + self.create_rrd_file(&key_type, &file_path).await?;
> + self.created_files.insert(key.to_string(), ());
> + }
> +
> + // Update the RRD file via backend
> + self.backend.update(&file_path, &transformed_data).await?;
> +
> + Ok(())
> + }
> +
> + /// Create RRD file with appropriate schema via backend
> + async fn create_rrd_file(&mut self, key_type: &RrdKeyType, file_path: &Path) -> Result<()> {
> + // Ensure parent directory exists
> + if let Some(parent) = file_path.parent() {
> + fs::create_dir_all(parent)
> + .with_context(|| format!("Failed to create directory: {parent:?}"))?;
> + }
> +
> + // Get schema for this RRD type
> + let schema = key_type.schema();
> +
> + // Calculate start time (at day boundary, matching C implementation)
> + let now = Utc::now();
> + let start = now
> + .date_naive()
> + .and_hms_opt(0, 0, 0)
> + .expect("00:00:00 is always a valid time")
> + .and_utc();
The start time uses UTC midnight here; I think the C code uses the
localtime day boundary. Worth double-checking.
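To make the difference concrete, here is a self-contained sketch
(helper name and the idea of parameterizing by UTC offset are mine):
the two behaviors differ only in which offset the day boundary is
computed at — the current code effectively uses offset 0, a
localtime()-based boundary would use the node's local offset.

```rust
// Hypothetical helper: truncate a unix timestamp to the most recent
// day boundary for a given UTC offset in seconds. offset 0 matches
// the current UTC-midnight behavior; the node's local UTC offset
// would match a localtime()-style boundary.
fn day_start(ts: i64, utc_offset_secs: i64) -> i64 {
    let shifted = ts + utc_offset_secs;
    shifted - shifted.rem_euclid(86_400) - utc_offset_secs
}
```

If the C behavior is indeed localtime-based, chrono's `Local` (already
pulled in transitively?) or `libc::localtime_r` would be the obvious
ways to obtain that offset.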
> + let start_timestamp = start.timestamp();
> +
> + tracing::debug!(
> + "Creating RRD file: {:?} with {} data sources via {}",
> + file_path,
> + schema.column_count(),
> + self.backend.name()
> + );
> +
> + // Delegate to backend for creation
> + self.backend
> + .create(file_path, &schema, start_timestamp)
> + .await?;
> +
> + tracing::info!("Created RRD file: {:?} ({})", file_path, schema);
> +
> + Ok(())
> + }
> +
> + /// Transform data from source format to target format
> + ///
> + /// This implements the C behavior from status.c:
> + /// 1. Skip non-archivable columns only for old formats (uptime, status for nodes)
> + /// 2. Pad old format data with `:U` for missing columns
> + /// 3. Truncate future format data to known columns
> + ///
> + /// # Arguments
> + /// * `data` - Raw data string from status update (format: "timestamp:v1:v2:...")
> + /// * `source_format` - Format indicated by the input key
> + /// * `target_schema` - Target RRD schema (always Pve9_0 currently)
> + /// * `metric_type` - Type of metric (Node, VM, Storage) for column skipping
> + ///
> + /// # Returns
> + /// Transformed data string ready for RRD update
> + fn transform_data(
> + data: &str,
> + source_format: RrdFormat,
> + target_schema: &RrdSchema,
> + metric_type: MetricType,
> + ) -> Result<String> {
> + let mut parts = data.split(':');
> +
> + let timestamp = parts
> + .next()
> + .ok_or_else(|| anyhow::anyhow!("Empty data string"))?;
Not required for correctness since the backend will reject bad input,
but validating the timestamp here early would improve the error
message and avoid doing the transform work before failing.
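Something as small as this would do (helper name is made up, and in
the actual code the error would go through anyhow/Context like the
surrounding calls):

```rust
// Hedged sketch of the suggested early check: reject non-numeric
// timestamps up front so the error names the offending field instead
// of surfacing later from the RRD backend.
fn validate_timestamp(ts: &str) -> Result<i64, String> {
    ts.parse::<i64>()
        .map_err(|e| format!("invalid RRD timestamp {ts:?}: {e}"))
}
```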
> +
> + // Skip non-archivable columns for old format only (C: status.c:1300, 1335, 1385)
> + let skip_count = if source_format == RrdFormat::Pve2 {
> + metric_type.skip_columns()
> + } else {
> + 0
> + };
Likely a bug: here we only skip the non-archivable prefix fields for
Pve2, but not for Pve9_0. If pve9 payloads still include
uptime/status/template/pid, the column mapping will be shifted and
metrics will be written into the wrong columns.
status.c:update_rrd_data() skips unconditionally, based only on the
key type:

    if (strncmp(key, "pve2-node/", 10) == 0 || strncmp(key, "pve-node-", 9) == 0) {
        ...
        skip = 2; // first two columns are live data that isn't archived
        ...
    } else if (strncmp(key, "pve2.3-vm/", 10) == 0 || strncmp(key, "pve-vm-", 7) == 0) {
        ...
        skip = 4; // first 4 columns are live data that isn't archived
        ...
    }

So skip = 2 / skip = 4 is not "Pve2 only". Let's either apply
skip_columns() based on metric type for all formats, or show with
captured fixtures that pve9 payloads are already stripped.
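For reference, a self-contained stand-in for transform_data() with the
suggested unconditional skip would look like this (names and structure
are simplified, this is not a drop-in patch):

```rust
// Sketch of the suggested behavior: skip the non-archivable prefix
// columns for *every* source format (skip derived from the metric
// type), then pad with "U" / truncate to the target column count.
fn transform(data: &str, skip: usize, target_cols: usize) -> Option<String> {
    let mut parts = data.split(':');
    let ts = parts.next()?; // leading timestamp is passed through
    let values: Vec<&str> = parts
        .skip(skip)
        .chain(std::iter::repeat("U")) // pad short rows
        .take(target_cols)             // truncate long rows
        .collect();
    Some(format!("{ts}:{}", values.join(":")))
}
```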
> +
> + // Build transformed data: timestamp + values (skipped, padded/truncated to target_cols)
> + let target_cols = target_schema.column_count();
> +
> + // Join values with ':' separator, efficiently building the string without Vec allocation
> + let mut iter = parts
> + .skip(skip_count)
> + .chain(std::iter::repeat("U"))
> + .take(target_cols);
> + let values = match iter.next() {
> + Some(first) => {
> + // Start with first value, fold remaining values with separator
> + iter.fold(first.to_string(), |mut acc, value| {
> + acc.push(':');
> + acc.push_str(value);
> + acc
> + })
> + }
> + None => String::new(),
> + };
> +
> + Ok(format!("{timestamp}:{values}"))
> + }
> +
> + /// Flush all pending updates
> + #[allow(dead_code)] // Used via RRD update cycle
> + pub(crate) async fn flush(&mut self) -> Result<()> {
> + self.backend.flush().await
> + }
> +
> + /// Get base directory
> + #[allow(dead_code)] // Used for path resolution in updates
> + pub(crate) fn base_dir(&self) -> &Path {
> + &self.base_dir
> + }
> +}
> +
> +impl Drop for RrdWriter {
> + fn drop(&mut self) {
> + // Note: We can't flush in Drop since it's async
> + // Users should call flush() explicitly before dropping if needed
> + tracing::debug!("RrdWriter dropped");
> + }
> +}
> +
> +#[cfg(test)]
> +mod tests {
> + use super::super::schema::{RrdFormat, RrdSchema};
> + use super::*;
> +
> + #[test]
> + fn test_rrd_file_path_generation() {
> + let temp_dir = std::path::PathBuf::from("/tmp/test");
> +
> + let key_node = RrdKeyType::Node {
> + nodename: "testnode".to_string(),
> + format: RrdFormat::Pve9_0,
> + };
> + let path = key_node.file_path(&temp_dir);
> + assert_eq!(path, temp_dir.join("pve-node-9.0").join("testnode"));
> + }
> +
> + // ===== Format Adaptation Tests =====
The transform tests are helpful, but can we add some real sample
payloads? If we can capture a few actual update strings produced by
the current C impl / a running system for the different key types
(node, VM, storage), we could add them as fixtures and assert
transform_data() produces exactly the expected column layout for the
target schema.
> +
> + #[test]
> + fn test_transform_data_node_pve2_to_pve9() {
> + // Test padding old format (12 cols) to new format (19 cols)
> + // Input: timestamp:uptime:status:load:maxcpu:cpu:iowait:memtotal:memused:swap_t:swap_u:netin:netout
> + let data = "1234567890:1000:0:1.5:4:2.0:0.5:8000000000:6000000000:0:0:1000000:500000";
> +
> + let schema = RrdSchema::node(RrdFormat::Pve9_0);
> + let result =
> + RrdWriter::transform_data(data, RrdFormat::Pve2, &schema, MetricType::Node).unwrap();
> +
> + // After skipping 2 cols (uptime, status) and padding with 7 U's:
> + // timestamp:load:maxcpu:cpu:iowait:memtotal:memused:swap_t:swap_u:netin:netout:U:U:U:U:U:U:U
> + let parts: Vec<&str> = result.split(':').collect();
> + assert_eq!(parts[0], "1234567890", "Timestamp should be preserved");
> + assert_eq!(parts.len(), 20, "Should have timestamp + 19 values"); // 1 + 19
> + assert_eq!(parts[1], "1.5", "First value after skip should be load");
> + assert_eq!(parts[2], "4", "Second value should be maxcpu");
> +
> + // Check padding
> + for (i, item) in parts.iter().enumerate().take(20).skip(12) {
> + assert_eq!(item, &"U", "Column {} should be padded with U", i);
> + }
> + }
> +
> + #[test]
> + fn test_transform_data_vm_pve2_to_pve9() {
> + // Test VM transformation with 4 columns skipped
> + // Input: timestamp:uptime:status:template:pid:maxcpu:cpu:maxmem:mem:maxdisk:disk:netin:netout:diskread:diskwrite
> + let data = "1234567890:1000:1:0:12345:4:2:4096:2048:100000:50000:1000:500:100:50";
> +
> + let schema = RrdSchema::vm(RrdFormat::Pve9_0);
> + let result =
> + RrdWriter::transform_data(data, RrdFormat::Pve2, &schema, MetricType::Vm).unwrap();
> +
> + let parts: Vec<&str> = result.split(':').collect();
> + assert_eq!(parts[0], "1234567890");
> + assert_eq!(parts.len(), 18, "Should have timestamp + 17 values");
> + assert_eq!(parts[1], "4", "First value after skip should be maxcpu");
> +
> + // Check padding (last 7 columns)
> + for (i, item) in parts.iter().enumerate().take(18).skip(11) {
> + assert_eq!(item, &"U", "Column {} should be padded", i);
> + }
> + }
> +
> + #[test]
> + fn test_transform_data_no_padding_needed() {
> + // Test when source and target have same column count
> + let data = "1234567890:1.5:4:2.0:0.5:8000000000:6000000000:0:0:0:0:1000000:500000:7000000000:0:0:0:0:0:0";
> +
> + let schema = RrdSchema::node(RrdFormat::Pve9_0);
> + let result =
> + RrdWriter::transform_data(data, RrdFormat::Pve9_0, &schema, MetricType::Node).unwrap();
> +
> + // No transformation should occur (same format)
> + let parts: Vec<&str> = result.split(':').collect();
> + assert_eq!(parts.len(), 20); // timestamp + 19 values
> + assert_eq!(parts[1], "1.5");
> + }
> +
> + #[test]
> + fn test_transform_data_future_format_truncation() {
> + // Test truncation of future format with extra columns
> + let data = "1234567890:1:2:3:4:5:6:7:8:9:10:11:12:13:14:15:16:17:18:19:20:21:22:23:24:25";
> +
> + let schema = RrdSchema::node(RrdFormat::Pve9_0);
> + // Simulating future format that has 25 columns
> + let result =
> + RrdWriter::transform_data(data, RrdFormat::Pve9_0, &schema, MetricType::Node).unwrap();
> +
> + let parts: Vec<&str> = result.split(':').collect();
> + assert_eq!(parts.len(), 20, "Should truncate to timestamp + 19 values");
> + assert_eq!(parts[19], "19", "Last value should be column 19");
> + }
> +
> + #[test]
> + fn test_transform_data_storage_no_change() {
> + // Storage format is same for Pve2 and Pve9_0 (2 columns, no skipping)
> + let data = "1234567890:1000000000000:500000000000";
> +
> + let schema = RrdSchema::storage(RrdFormat::Pve9_0);
> + let result =
> + RrdWriter::transform_data(data, RrdFormat::Pve2, &schema, MetricType::Storage).unwrap();
> +
> + assert_eq!(result, data, "Storage data should not be transformed");
> + }
> +
> + #[test]
> + fn test_metric_type_methods() {
> + assert_eq!(MetricType::Node.skip_columns(), 2);
> + assert_eq!(MetricType::Vm.skip_columns(), 4);
> + assert_eq!(MetricType::Storage.skip_columns(), 0);
> + }
> +
> + #[test]
> + fn test_format_column_counts() {
> + assert_eq!(RrdFormat::Pve2.column_count(&MetricType::Node), 12);
> + assert_eq!(RrdFormat::Pve9_0.column_count(&MetricType::Node), 19);
> + assert_eq!(RrdFormat::Pve2.column_count(&MetricType::Vm), 10);
> + assert_eq!(RrdFormat::Pve9_0.column_count(&MetricType::Vm), 17);
> + assert_eq!(RrdFormat::Pve2.column_count(&MetricType::Storage), 2);
> + assert_eq!(RrdFormat::Pve9_0.column_count(&MetricType::Storage), 2);
> + }
> +}
* Re: [pve-devel] [PATCH pve-cluster 05/15] pmxcfs-rs: add pmxcfs-memdb crate
@ 2026-01-30 15:35 5% ` Samuel Rufinatscha
0 siblings, 0 replies; 39+ results
From: Samuel Rufinatscha @ 2026-01-30 15:35 UTC (permalink / raw)
To: Proxmox VE development discussion, Kefu Chai
Thanks for this substantial patch, Kefu! The overall structure is
sound and this is already a solid step.
Main issues I noted are around C compatibility.
Besides that, the version handling differs between operations.
I'd suggest centralizing this into a single mutation helper that
handles version bump + version update + entry change in one
transaction. A single write guard mutex would also help avoid the lock
ordering issues and race conditions I noted.
Details inline.
On 1/7/26 10:15 AM, Kefu Chai wrote:
> Add in-memory database with SQLite persistence:
> - MemDb: Main database handle (thread-safe via Arc)
> - TreeEntry: File/directory entries with metadata
> - SQLite schema version 5 (C-compatible)
> - Plugin system (6 functional + 4 link plugins)
> - Resource locking with timeout-based expiration
> - Version tracking and checksumming
> - Index encoding/decoding for cluster synchronization
>
> This crate depends only on pmxcfs-api-types and external
> libraries (rusqlite, sha2, bincode). It provides the core
> storage layer used by the distributed file system.
>
> Includes comprehensive unit tests for:
> - CRUD operations on files and directories
> - Lock acquisition and expiration
> - SQLite persistence and recovery
> - Index encoding/decoding for sync
> - Tree entry application
>
> Signed-off-by: Kefu Chai <k.chai@proxmox.com>
> ---
> src/pmxcfs-rs/Cargo.toml | 1 +
> src/pmxcfs-rs/pmxcfs-memdb/Cargo.toml | 42 +
> src/pmxcfs-rs/pmxcfs-memdb/README.md | 220 ++
> src/pmxcfs-rs/pmxcfs-memdb/src/database.rs | 2227 +++++++++++++++++
> src/pmxcfs-rs/pmxcfs-memdb/src/index.rs | 814 ++++++
> src/pmxcfs-rs/pmxcfs-memdb/src/lib.rs | 26 +
> src/pmxcfs-rs/pmxcfs-memdb/src/locks.rs | 286 +++
> src/pmxcfs-rs/pmxcfs-memdb/src/sync.rs | 249 ++
> src/pmxcfs-rs/pmxcfs-memdb/src/traits.rs | 101 +
> src/pmxcfs-rs/pmxcfs-memdb/src/types.rs | 325 +++
> src/pmxcfs-rs/pmxcfs-memdb/src/vmlist.rs | 189 ++
> .../pmxcfs-memdb/tests/checksum_test.rs | 158 ++
> .../tests/sync_integration_tests.rs | 394 +++
> 13 files changed, 5032 insertions(+)
> create mode 100644 src/pmxcfs-rs/pmxcfs-memdb/Cargo.toml
> create mode 100644 src/pmxcfs-rs/pmxcfs-memdb/README.md
> create mode 100644 src/pmxcfs-rs/pmxcfs-memdb/src/database.rs
> create mode 100644 src/pmxcfs-rs/pmxcfs-memdb/src/index.rs
> create mode 100644 src/pmxcfs-rs/pmxcfs-memdb/src/lib.rs
> create mode 100644 src/pmxcfs-rs/pmxcfs-memdb/src/locks.rs
> create mode 100644 src/pmxcfs-rs/pmxcfs-memdb/src/sync.rs
> create mode 100644 src/pmxcfs-rs/pmxcfs-memdb/src/traits.rs
> create mode 100644 src/pmxcfs-rs/pmxcfs-memdb/src/types.rs
> create mode 100644 src/pmxcfs-rs/pmxcfs-memdb/src/vmlist.rs
> create mode 100644 src/pmxcfs-rs/pmxcfs-memdb/tests/checksum_test.rs
> create mode 100644 src/pmxcfs-rs/pmxcfs-memdb/tests/sync_integration_tests.rs
>
> diff --git a/src/pmxcfs-rs/Cargo.toml b/src/pmxcfs-rs/Cargo.toml
> index dd36c81f..2e41ac93 100644
> --- a/src/pmxcfs-rs/Cargo.toml
> +++ b/src/pmxcfs-rs/Cargo.toml
> @@ -5,6 +5,7 @@ members = [
> "pmxcfs-config", # Configuration management
> "pmxcfs-logger", # Cluster log with ring buffer and deduplication
> "pmxcfs-rrd", # RRD (Round-Robin Database) persistence
> + "pmxcfs-memdb", # In-memory database with SQLite persistence
> ]
> resolver = "2"
>
> diff --git a/src/pmxcfs-rs/pmxcfs-memdb/Cargo.toml b/src/pmxcfs-rs/pmxcfs-memdb/Cargo.toml
> new file mode 100644
> index 00000000..409b87ce
> --- /dev/null
> +++ b/src/pmxcfs-rs/pmxcfs-memdb/Cargo.toml
> @@ -0,0 +1,42 @@
> +[package]
> +name = "pmxcfs-memdb"
> +description = "In-memory database with SQLite persistence for pmxcfs"
> +
> +version.workspace = true
> +edition.workspace = true
> +authors.workspace = true
> +license.workspace = true
> +repository.workspace = true
> +
> +[lints]
> +workspace = true
> +
> +[dependencies]
> +# Error handling
> +anyhow.workspace = true
> +
> +# Database
> +rusqlite = { version = "0.30", features = ["bundled"] }
> +
> +# Concurrency primitives
> +parking_lot.workspace = true
> +
> +# System integration
> +libc.workspace = true
> +
> +# Cryptography (for checksums)
> +sha2.workspace = true
> +bytes.workspace = true
> +
> +# Serialization
> +serde.workspace = true
> +bincode.workspace = true
> +
> +# Logging
> +tracing.workspace = true
> +
> +# pmxcfs types
> +pmxcfs-api-types = { path = "../pmxcfs-api-types" }
> +
> +[dev-dependencies]
> +tempfile.workspace = true
> diff --git a/src/pmxcfs-rs/pmxcfs-memdb/README.md b/src/pmxcfs-rs/pmxcfs-memdb/README.md
> new file mode 100644
> index 00000000..172e7351
> --- /dev/null
> +++ b/src/pmxcfs-rs/pmxcfs-memdb/README.md
> @@ -0,0 +1,220 @@
> +# pmxcfs-memdb
> +
> +**In-Memory Database** with SQLite persistence for pmxcfs cluster filesystem.
> +
> +This crate provides a thread-safe, cluster-synchronized in-memory database that serves as the backend storage for the Proxmox cluster filesystem. All filesystem operations (read, write, create, delete) are performed on in-memory structures with SQLite providing durable persistence.
> +
> +## Overview
> +
> +The MemDb is the core data structure that stores all cluster configuration files in memory for fast access while maintaining durability through SQLite. Changes are synchronized across the cluster using the DFSM protocol.
> +
> +### Key Features
> +
> +- **In-memory tree structure**: All filesystem entries cached in memory
> +- **SQLite persistence**: Durable storage with ACID guarantees
> +- **Cluster synchronization**: State replication via DFSM (pmxcfs-dfsm crate)
> +- **Version tracking**: Monotonically increasing version numbers for conflict detection
> +- **Resource locking**: File-level locks with timeout-based expiration
> +- **Thread-safe**: All operations protected by mutex
> +- **Size limits**: Enforces max file size (1 MiB) and total filesystem size (128 MiB)
> +
> +## Architecture
> +
> +### Module Structure
> +
> +| Module | Purpose | C Equivalent |
> +|--------|---------|--------------|
> +| `database.rs` | Core MemDb struct and CRUD operations | `memdb.c` (main functions) |
> +| `types.rs` | TreeEntry, LockInfo, constants | `memdb.h:38-51, 71-74` |
> +| `locks.rs` | Resource locking functionality | `memdb.c:memdb_lock_*` |
> +| `sync.rs` | State serialization for cluster sync | `memdb.c:memdb_encode_index` |
> +| `index.rs` | Index comparison for DFSM updates | `memdb.c:memdb_index_*` |
> +
> +## C to Rust Mapping
> +
> +### Data Structures
> +
> +| C Type | Rust Type | Notes |
> +|--------|-----------|-------|
> +| `memdb_t` | `MemDb` | Main database handle (Clone-able via Arc) |
> +| `memdb_tree_entry_t` | `TreeEntry` | File/directory entry |
> +| `memdb_index_t` | `MemDbIndex` | Serialized state for sync |
> +| `memdb_index_extry_t` | `IndexEntry` | Single index entry |
> +| `memdb_lock_info_t` | `LockInfo` | Lock metadata |
> +| `db_backend_t` | `Connection` | SQLite backend (rusqlite) |
> +| `GHashTable *index` | `HashMap<u64, TreeEntry>` | Inode index |
> +| `GHashTable *locks` | `HashMap<String, LockInfo>` | Lock table |
> +| `GMutex mutex` | `Mutex` | Thread synchronization |
> +
> +### Core Functions
> +
> +#### Database Lifecycle
> +
> +| C Function | Rust Equivalent | Location |
> +|-----------|-----------------|----------|
> +| `memdb_open()` | `MemDb::open()` | database.rs |
> +| `memdb_close()` | (Drop trait) | Automatic |
> +| `memdb_checkpoint()` | (implicit in writes) | Auto-commit |
> +
> +#### File Operations
> +
> +| C Function | Rust Equivalent | Location |
> +|-----------|-----------------|----------|
> +| `memdb_read()` | `MemDb::read()` | database.rs |
> +| `memdb_write()` | `MemDb::write()` | database.rs |
> +| `memdb_create()` | `MemDb::create()` | database.rs |
> +| `memdb_delete()` | `MemDb::delete()` | database.rs |
> +| `memdb_mkdir()` | `MemDb::create()` (with DT_DIR) | database.rs |
> +| `memdb_rename()` | `MemDb::rename()` | database.rs |
> +| `memdb_mtime()` | (included in write) | database.rs |
> +
> +#### Directory Operations
> +
> +| C Function | Rust Equivalent | Location |
> +|-----------|-----------------|----------|
> +| `memdb_readdir()` | `MemDb::readdir()` | database.rs |
> +| `memdb_dirlist_free()` | (automatic) | Rust's Vec drops automatically |
> +
> +#### Metadata Operations
> +
> +| C Function | Rust Equivalent | Location |
> +|-----------|-----------------|----------|
> +| `memdb_getattr()` | `MemDb::lookup_path()` | database.rs |
> +| `memdb_statfs()` | `MemDb::statfs()` | database.rs |
The statfs implementation is missing from the diff, please revisit.
> +
> +#### Tree Entry Functions
> +
> +| C Function | Rust Equivalent | Location |
> +|-----------|-----------------|----------|
> +| `memdb_tree_entry_new()` | `TreeEntry { ... }` | Struct initialization |
> +| `memdb_tree_entry_copy()` | `.clone()` | Automatic (derive Clone) |
> +| `memdb_tree_entry_free()` | (Drop trait) | Automatic |
> +| `tree_entry_debug()` | `{:?}` format | Automatic (derive Debug) |
> +| `memdb_tree_entry_csum()` | `TreeEntry::compute_checksum()` | types.rs |
> +
> +#### Lock Operations
> +
> +| C Function | Rust Equivalent | Location |
> +|-----------|-----------------|----------|
> +| `memdb_lock_expired()` | `MemDb::is_lock_expired()` | locks.rs |
> +| `memdb_update_locks()` | `MemDb::update_locks()` | locks.rs |
> +
> +#### Index/Sync Operations
> +
> +| C Function | Rust Equivalent | Location |
> +|-----------|-----------------|----------|
> +| `memdb_encode_index()` | `MemDb::get_index()` | sync.rs |
> +| `memdb_index_copy()` | `.clone()` | Automatic (derive Clone) |
> +| `memdb_compute_checksum()` | `MemDb::compute_checksum()` | sync.rs |
> +| `bdb_backend_commit_update()` | `MemDb::apply_tree_entry()` | database.rs |
> +
> +#### State Synchronization
> +
> +| C Function | Rust Equivalent | Location |
> +|-----------|-----------------|----------|
> +| `memdb_recreate_vmlist()` | (handled by status crate) | External |
> +| (implicit) | `MemDb::replace_all_entries()` | database.rs |
> +
> +### SQLite Backend
> +
> +**C Version (database.c):**
> +- Direct SQLite3 C API
> +- Manual statement preparation
> +- Explicit transaction management
> +- Manual memory management
> +
> +**Rust Version (database.rs):**
> +- `rusqlite` crate for type-safe SQLite access
> +
> +## Database Schema
> +
> +The SQLite schema stores all filesystem entries with metadata:
> +- `inode = 1` is always the root directory
> +- `parent = 0` for root, otherwise parent directory's inode
> +- `version` increments on each modification (monotonic)
> +- `writer` is the node ID that made the change
> +- `mtime` is seconds since UNIX epoch
> +- `data` is NULL for directories, BLOB for files
> +
> +## TreeEntry Wire Format
> +
> +For cluster synchronization (DFSM Update messages), TreeEntry uses C-compatible serialization that is byte-compatible with C's implementation.
> +
> +## Key Differences from C Implementation
> +
> +### Thread Safety
> +
> +**C Version:**
> +- Single `GMutex` protects entire memdb_t
> +- Callback-based access from qb_loop (single-threaded)
> +
> +**Rust Version:**
> +- Mutex for each data structure (index, tree, locks, conn)
> +- More granular locking
> +- Can be shared across tokio tasks
> +
> +### Data Structures
> +
> +**C Version:**
> +- `GHashTable` (GLib) for index and tree
> +- Recursive tree structure with pointers
> +
> +**Rust Version:**
> +- `HashMap` from std
> +- Flat structure: `HashMap<u64, HashMap<String, u64>>` for tree
> +- Separate `HashMap<u64, TreeEntry>` for index
> +- No recursive pointers (eliminates cycles)
> +
> +### SQLite Integration
> +
> +**C Version (database.c):**
> +- Direct SQLite3 C API
> +
> +**Rust Version (database.rs):**
> +- `rusqlite` crate for type-safe SQLite access
> +
> +## Constants
> +
> +| Constant | Value | Purpose |
> +|----------|-------|---------|
> +| `MEMDB_MAX_FILE_SIZE` | 1 MiB | Maximum file size (matches C) |
> +| `MEMDB_MAX_FSSIZE` | 128 MiB | Maximum total filesystem size |
> +| `MEMDB_MAX_INODES` | 256k | Maximum number of files/dirs |
> +| `MEMDB_BLOCKSIZE` | 4096 | Block size for statfs |
> +| `LOCK_TIMEOUT` | 120 sec | Lock expiration timeout |
> +| `DT_DIR` | 4 | Directory type (matches POSIX) |
> +| `DT_REG` | 8 | Regular file type (matches POSIX) |
> +
> +## Known Issues / TODOs
> +
> +### Missing Features
> +
> +- [ ] **vmlist regeneration**: `memdb_recreate_vmlist()` not implemented (handled by status crate's `scan_vmlist()`)
> +
> +### Behavioral Differences (Benign)
> +
> +- **Lock storage**: C reads from filesystem at startup, Rust does the same but implementation differs
> +- **Index encoding**: Rust uses `Vec<IndexEntry>` instead of flexible array member
> +- **Checksum algorithm**: Same (SHA-256) but implementation differs (ring vs OpenSSL)
> +
> +### Compatibility
> +
> +- **Database format**: 100% compatible with C version (same SQLite schema)
> +- **Wire format**: TreeEntry serialization matches C byte-for-byte
> +- **Constants**: All limits match C version exactly
> +
> +## References
> +
> +### C Implementation
> +- `src/pmxcfs/memdb.c` / `memdb.h` - In-memory database
> +- `src/pmxcfs/database.c` - SQLite backend
> +
> +### Related Crates
> +- **pmxcfs-dfsm**: Uses MemDb for cluster synchronization
> +- **pmxcfs-api-types**: Message types for FUSE operations
> +- **pmxcfs**: Main daemon and FUSE integration
> +
> +### External Dependencies
> +- **rusqlite**: SQLite bindings
> +- **parking_lot**: Fast mutex implementation
> +- **sha2**: SHA-256 checksums
> diff --git a/src/pmxcfs-rs/pmxcfs-memdb/src/database.rs b/src/pmxcfs-rs/pmxcfs-memdb/src/database.rs
> new file mode 100644
> index 00000000..ee280683
> --- /dev/null
> +++ b/src/pmxcfs-rs/pmxcfs-memdb/src/database.rs
> @@ -0,0 +1,2227 @@
> +/// Core MemDb implementation - in-memory database with SQLite persistence
> +use anyhow::{Context, Result};
> +use parking_lot::Mutex;
> +use rusqlite::{Connection, params};
> +use std::collections::HashMap;
> +use std::path::Path;
> +use std::sync::Arc;
> +use std::sync::atomic::{AtomicU64, Ordering};
> +use std::time::{SystemTime, UNIX_EPOCH};
> +
> +use super::types::LockInfo;
> +use super::types::{
> + DT_DIR, DT_REG, LOCK_DIR_PATH, LoadDbResult, MEMDB_MAX_FILE_SIZE, ROOT_INODE, TreeEntry,
> + VERSION_FILENAME,
> +};
> +
> +/// In-memory database with SQLite persistence
> +#[derive(Clone)]
> +pub struct MemDb {
> + pub(super) inner: Arc<MemDbInner>,
> +}
> +
> +pub(super) struct MemDbInner {
> + /// SQLite connection for persistence (wrapped in Mutex for thread-safety)
> + pub(super) conn: Mutex<Connection>,
> +
> + /// In-memory index of all entries (inode -> TreeEntry)
> + /// This is a cache of the database for fast lookups
> + pub(super) index: Mutex<HashMap<u64, TreeEntry>>,
> +
> + /// In-memory tree structure (parent inode -> children)
> + pub(super) tree: Mutex<HashMap<u64, HashMap<String, u64>>>,
> +
> + /// Root entry
> + pub(super) root_inode: u64,
> +
> + /// Current version (incremented on each write)
> + pub(super) version: AtomicU64,
> +
> + /// Resource locks (path -> LockInfo)
> + pub(super) locks: Mutex<HashMap<String, LockInfo>>,
In C, memdb->errors is set to 1 after DB errors and subsequent
operations are refused. We should likely have an error flag here as
well, and set/check it when performing operations.
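A minimal sketch of such a latch, assuming we want the same
set-once/refuse-afterwards semantics as C (names are placeholders):

```rust
use std::sync::atomic::{AtomicBool, Ordering};

// Hedged sketch of an error latch like C's memdb->errors: set on a
// DB failure, checked at the top of every mutating operation.
struct ErrorFlag(AtomicBool);

impl ErrorFlag {
    const fn new() -> Self {
        Self(AtomicBool::new(false))
    }
    fn set(&self) {
        self.0.store(true, Ordering::SeqCst);
    }
    fn check(&self) -> Result<(), &'static str> {
        if self.0.load(Ordering::SeqCst) {
            Err("memdb is in error state, refusing operation")
        } else {
            Ok(())
        }
    }
}
```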
> +}
> +
> +// Manually implement Send and Sync for MemDb
> +// This is safe because we protect the Connection with a Mutex
> +unsafe impl Send for MemDbInner {}
> +unsafe impl Sync for MemDbInner {}
Mutex<Connection> should let us avoid any unsafe impls here. Please
remove them and let the compiler enforce the guarantees.
> +
> +impl MemDb {
> + pub fn open(path: &Path, create: bool) -> Result<Self> {
> + let conn = Connection::open(path)?;
> +
> + if create {
> + Self::init_schema(&conn)?;
> + }
> +
> + let (index, tree, root_inode, version) = Self::load_from_db(&conn)?;
> +
> + let memdb = Self {
> + inner: Arc::new(MemDbInner {
> + conn: Mutex::new(conn),
> + index: Mutex::new(index),
> + tree: Mutex::new(tree),
> + root_inode,
> + version: AtomicU64::new(version),
> + locks: Mutex::new(HashMap::new()),
> + }),
> + };
> +
> + memdb.update_locks();
> +
> + Ok(memdb)
> + }
> +
> + fn init_schema(conn: &Connection) -> Result<()> {
> + conn.execute_batch(
> + r#"
> + CREATE TABLE tree (
> + inode INTEGER PRIMARY KEY,
> + parent INTEGER NOT NULL,
> + version INTEGER NOT NULL,
> + writer INTEGER NOT NULL,
> + mtime INTEGER NOT NULL,
> + type INTEGER NOT NULL,
> + name TEXT NOT NULL,
> + data BLOB,
> + size INTEGER NOT NULL
> + );
> +
> + CREATE INDEX tree_parent_idx ON tree(parent, name);
> +
> + CREATE TABLE config (
> + name TEXT PRIMARY KEY,
> + value TEXT
> + );
> + "#,
> + )?;
> +
> + // Create root metadata entry as inode ROOT_INODE with name "__version__"
> + // Matching C implementation: root inode is NEVER in database as a regular entry
> + // Root metadata is stored as inode ROOT_INODE with special name "__version__"
> + let now = SystemTime::now()
> + .duration_since(SystemTime::UNIX_EPOCH)?
> + .as_secs() as u32;
> +
> + conn.execute(
> + "INSERT INTO tree (inode, parent, version, writer, mtime, type, name, data, size) VALUES (?1, ?2, ?3, ?4, ?5, ?6, ?7, ?8, ?9)",
> + params![ROOT_INODE, ROOT_INODE, 1, 0, now, DT_REG, VERSION_FILENAME, None::<Vec<u8>>, 0],
> + )?;
> +
> + Ok(())
> + }
> +
> + fn load_from_db(conn: &Connection) -> Result<LoadDbResult> {
> + let mut index = HashMap::new();
> + let mut tree: HashMap<u64, HashMap<String, u64>> = HashMap::new();
> + let mut max_version = 0u64;
> +
> + let mut stmt = conn.prepare(
> + "SELECT inode, parent, version, writer, mtime, type, name, data, size FROM tree",
> + )?;
> + let rows = stmt.query_map([], |row| {
> + let inode: u64 = row.get(0)?;
> + let parent: u64 = row.get(1)?;
> + let version: u64 = row.get(2)?;
> + let writer: u32 = row.get(3)?;
> + let mtime: u32 = row.get(4)?;
> + let entry_type: u8 = row.get(5)?;
> + let name: String = row.get(6)?;
> + let data: Option<Vec<u8>> = row.get(7)?;
> + let size: i64 = row.get(8)?;
> +
> + Ok(TreeEntry {
> + inode,
> + parent,
> + version,
> + writer,
> + mtime,
> + size: size as usize,
> + entry_type,
> + name,
> + data: data.unwrap_or_default(),
> + })
> + })?;
> +
> + // Create root entry in memory first (matching C implementation in database.c:559-567)
> + // Root is NEVER stored in database, only its metadata via inode ROOT_INODE
> + let now = SystemTime::now()
> + .duration_since(SystemTime::UNIX_EPOCH)?
> + .as_secs() as u32;
> + let mut root = TreeEntry {
> + inode: ROOT_INODE,
> + parent: ROOT_INODE, // Root's parent is itself
> + version: 0, // Will be populated from __version__ entry
> + writer: 0,
> + mtime: now,
> + size: 0,
> + entry_type: DT_DIR,
> + name: String::new(),
> + data: Vec::new(),
> + };
> +
> + for row in rows {
> + let entry = row?;
> +
> + // Handle __version__ entry (inode ROOT_INODE) - populate root metadata (C: database.c:372-382)
> + if entry.inode == ROOT_INODE {
> + if entry.name == VERSION_FILENAME {
> + tracing::debug!(
> + "Loading root metadata from __version__: version={}, writer={}, mtime={}",
> + entry.version,
> + entry.writer,
> + entry.mtime
> + );
> + root.version = entry.version;
> + root.writer = entry.writer;
> + root.mtime = entry.mtime;
> + if entry.version > max_version {
> + max_version = entry.version;
> + }
> + } else {
> + tracing::warn!("Ignoring inode 0 with unexpected name: {}", entry.name);
> + }
> + continue; // Don't add __version__ to index
> + }
> +
> + // Track max version from all entries
> + if entry.version > max_version {
> + max_version = entry.version;
> + }
> +
> + // Add to tree structure
> + tree.entry(entry.parent)
> + .or_default()
> + .insert(entry.name.clone(), entry.inode);
> +
> + // If this is a directory, ensure it has an entry in the tree map
> + if entry.is_dir() {
> + tree.entry(entry.inode).or_default();
> + }
> +
> + // Add to index
> + index.insert(entry.inode, entry);
> + }
> +
> + // If root version is still 0, set it to 1 (new database)
> + if root.version == 0 {
> + root.version = 1;
> + max_version = 1;
> + tracing::debug!("No __version__ entry found, initializing root with version 1");
> + }
> +
> + // Add root to index and ensure it has a tree entry (use entry() to not overwrite children!)
> + index.insert(ROOT_INODE, root);
> + tree.entry(ROOT_INODE).or_default();
> +
> + Ok((index, tree, ROOT_INODE, max_version))
> + }
> +
> + pub fn get_entry_by_inode(&self, inode: u64) -> Option<TreeEntry> {
> + let index = self.inner.index.lock();
> + index.get(&inode).cloned()
> + }
> +
> + /// Increment global version and synchronize root entry version
> + ///
> + /// CRITICAL: The C implementation uses root->version as the index version.
> + /// We must keep the root entry's version synchronized with the global version counter
> + /// to ensure C nodes can verify the index after applying updates.
> + ///
> + /// This function acquires the index lock and database connection lock internally,
> + /// so it must NOT be called while holding either lock.
We could use a single "write guard" mutex for all mutating
operations to rule out consistency issues and races between the
version counter, the index, and the database.
As far as I can tell the C implementation does exactly that, and
it would sidestep this whole class of issues.
> + fn increment_version(&self) -> Result<u64> {
> + let new_version = self.inner.version.fetch_add(1, Ordering::SeqCst) + 1;
> +
> + // Update root entry version in memory and database
> + {
> + let mut index = self.inner.index.lock();
> + if let Some(root_entry) = index.get_mut(&self.inner.root_inode) {
> + root_entry.version = new_version;
> + }
> + drop(index); // Release lock before DB access
> + }
> +
> + // Persist to database (outside index lock to avoid deadlock)
> + {
> + let conn = self.inner.conn.lock();
> + conn.execute(
> + "UPDATE tree SET version = ? WHERE inode = ?",
> + rusqlite::params![new_version as i64, self.inner.root_inode as i64],
> + )
> + .context("Failed to update root version in database")?;
> + }
> +
> + Ok(new_version)
> + }
Can we please centralize version bumps and __version__ updates?
Right now increment_version() updates the root version in memory
and in the database separately from the actual entry mutation,
while other paths update __version__ differently (and sometimes
not at all).
It would be much safer if every mutation bumped the version,
updated __version__, and applied the entry change in the same
transaction, and only then updated the in-memory state.
For example, we could have a helper like this:

fn with_mutation<R>(
    &self,
    writer: u32,
    mtime: u32,
    f: impl FnOnce(&Transaction<'_>, u64) -> Result<R>,
) -> Result<R>;
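A minimal sketch of the control flow I have in mind, with stand-in
types instead of rusqlite and the real TreeEntry (so the shape is
illustrative, not the actual API):

```rust
use std::sync::Mutex;

// Illustrative stand-ins; the real helper would hand the closure a
// rusqlite::Transaction and commit before touching in-memory state.
struct Db {
    version: Mutex<u64>,
    root_meta: Mutex<(u32, u32)>, // (writer, mtime) mirrored into __version__
}

impl Db {
    /// Every mutation goes through here: the version bump, the entry
    /// change, and the __version__ metadata update succeed or fail as
    /// one unit.
    fn with_mutation<R>(
        &self,
        writer: u32,
        mtime: u32,
        f: impl FnOnce(u64) -> Result<R, String>,
    ) -> Result<R, String> {
        let mut version = self.version.lock().unwrap();
        let new_version = *version + 1;
        let result = f(new_version)?; // entry change (and, really, the commit)
        *version = new_version; // only bump after the change succeeded
        *self.root_meta.lock().unwrap() = (writer, mtime);
        Ok(result)
    }
}
```

The key property is that a failing closure leaves both the version
counter and the __version__ metadata untouched.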
> +
> + /// Get the __version__ entry for sending updates to C nodes
> + ///
> + /// The __version__ entry (inode ROOT_INODE) stores root metadata in the database
> + /// but is not kept in the in-memory index. This method queries it directly
> + /// from the database to send as an UPDATE message to C nodes.
> + pub fn get_version_entry(&self) -> anyhow::Result<TreeEntry> {
> + let index = self.inner.index.lock();
> + let root_entry = index
> + .get(&self.inner.root_inode)
> + .ok_or_else(|| anyhow::anyhow!("Root entry not found"))?;
> +
> + // Create a __version__ entry matching C's format
> + // This is what C expects to receive as inode ROOT_INODE
> + Ok(TreeEntry {
> + inode: ROOT_INODE, // __version__ is always inode ROOT_INODE in database/wire format
> + parent: ROOT_INODE, // Root's parent is itself
> + version: root_entry.version,
> + writer: root_entry.writer,
> + mtime: root_entry.mtime,
> + size: 0,
> + entry_type: DT_REG,
> + name: VERSION_FILENAME.to_string(),
> + data: Vec::new(),
> + })
> + }
> +
> + pub fn lookup_path(&self, path: &str) -> Option<TreeEntry> {
> + let index = self.inner.index.lock();
> + let tree = self.inner.tree.lock();
Here we lock in the order index, then tree, but readdir() locks
tree, then index; taken concurrently, these two paths can deadlock.
We should at least enforce a strict lock ordering across all
methods, or collapse everything into a single mutex as mentioned
above.
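Roughly what I mean by enforcing an order: make one helper the only
way to take both locks (illustrative types, not the patch's actual
fields):

```rust
use std::collections::HashMap;
use std::sync::{Mutex, MutexGuard};

// Illustrative shape only.
struct Inner {
    index: Mutex<HashMap<u64, String>>,
    tree: Mutex<HashMap<u64, Vec<u64>>>,
}

type IndexGuard<'a> = MutexGuard<'a, HashMap<u64, String>>;
type TreeGuard<'a> = MutexGuard<'a, HashMap<u64, Vec<u64>>>;

impl Inner {
    /// Always index first, then tree. lookup_path() and readdir()
    /// would both call this instead of locking ad hoc.
    fn lock_both(&self) -> (IndexGuard<'_>, TreeGuard<'_>) {
        let index = self.index.lock().unwrap();
        let tree = self.tree.lock().unwrap();
        (index, tree)
    }
}
```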
> +
> + if path.is_empty() || path == "/" || path == "." {
> + return index.get(&self.inner.root_inode).cloned();
> + }
> +
> + let parts: Vec<&str> = path.split('/').filter(|s| !s.is_empty()).collect();
> + let mut current_inode = self.inner.root_inode;
> +
> + for part in parts {
> + let children = tree.get(&current_inode)?;
> + current_inode = *children.get(part)?;
> + }
> +
> + index.get(&current_inode).cloned()
> + }
> +
> + /// Split a path into parent directory and basename
> + ///
> + /// Paths should be absolute (starting with `/`). While the implementation
> + /// handles relative paths for C compatibility, all new code should use absolute paths.
> + fn split_path(path: &str) -> (String, String) {
> + debug_assert!(
> + path.starts_with('/') || path.is_empty(),
> + "Path should be absolute (start with /), got: {path}"
> + );
This only validates in debug builds; release builds silently
accept relative paths. Please replace the debug_assert with a
real runtime check that returns an error.
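Something along these lines (a sketch; the error type and exact
path handling are placeholders):

```rust
/// Sketch: reject relative paths in all builds instead of only
/// asserting in debug builds.
fn split_path(path: &str) -> Result<(String, String), String> {
    if !path.is_empty() && !path.starts_with('/') {
        return Err(format!("path must be absolute, got: {path}"));
    }
    let path = path.trim_end_matches('/');
    match path.rfind('/') {
        // parent is the root directory
        Some(0) => Ok(("/".to_string(), path[1..].to_string())),
        Some(pos) => Ok((path[..pos].to_string(), path[pos + 1..].to_string())),
        // no separator at all (empty path)
        None => Ok(("/".to_string(), path.to_string())),
    }
}
```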
> +
> + let path = path.trim_end_matches('/');
> +
> + if let Some(pos) = path.rfind('/') {
> + let dirname = if pos == 0 { "/" } else { &path[..pos] };
> + let basename = &path[pos + 1..];
> + (dirname.to_string(), basename.to_string())
> + } else {
> + ("/".to_string(), path.to_string())
> + }
> + }
> +
> + pub fn exists(&self, path: &str) -> Result<bool> {
> + Ok(self.lookup_path(path).is_some())
> + }
> +
> + pub fn read(&self, path: &str, offset: u64, size: usize) -> Result<Vec<u8>> {
> + let entry = self
> + .lookup_path(path)
> + .ok_or_else(|| anyhow::anyhow!("File not found: {path}"))?;
> +
> + if entry.is_dir() {
> + return Err(anyhow::anyhow!("Cannot read directory: {path}"));
> + }
> +
> + let offset = offset as usize;
> + if offset >= entry.data.len() {
> + return Ok(Vec::new());
> + }
> +
> + let end = std::cmp::min(offset + size, entry.data.len());
> + Ok(entry.data[offset..end].to_vec())
> + }
> +
> + /// Helper to update __version__ entry in database
> + ///
> + /// This is called for EVERY write operation to keep root metadata synchronized
> + /// (matching C behavior in database.c:275-278)
> + fn update_version_entry(
> + conn: &rusqlite::Connection,
> + version: u64,
> + writer: u32,
> + mtime: u32,
> + ) -> Result<()> {
> + conn.execute(
> + "UPDATE tree SET version = ?1, writer = ?2, mtime = ?3 WHERE inode = ?4",
> + params![version, writer, mtime, ROOT_INODE],
> + )?;
> + Ok(())
> + }
> +
> + /// Helper to update root entry in index
> + ///
> + /// Keeps the in-memory root entry synchronized with database __version__
> + fn update_root_metadata(
> + index: &mut HashMap<u64, TreeEntry>,
> + root_inode: u64,
> + version: u64,
> + writer: u32,
> + mtime: u32,
> + ) {
> + if let Some(root_entry) = index.get_mut(&root_inode) {
> + root_entry.version = version;
> + root_entry.writer = writer;
> + root_entry.mtime = mtime;
> + }
> + }
> +
> + pub fn create(&self, path: &str, mode: u32, mtime: u32) -> Result<()> {
> + if self.exists(path)? {
> + return Err(anyhow::anyhow!("File already exists: {path}"));
> + }
> +
> + let (parent_path, basename) = Self::split_path(path);
> +
> + let parent_entry = self
> + .lookup_path(&parent_path)
> + .ok_or_else(|| anyhow::anyhow!("Parent directory not found: {parent_path}"))?;
> +
> + if !parent_entry.is_dir() {
> + return Err(anyhow::anyhow!("Parent is not a directory: {parent_path}"));
> + }
> +
> + let entry_type = if mode & libc::S_IFDIR != 0 {
> + DT_DIR
> + } else {
> + DT_REG
> + };
> +
> + // CRITICAL: Increment version FIRST, then assign inode = version
> + // This matches C's behavior: te->inode = memdb->root->version
> + // (see src/pmxcfs/memdb.c:760)
> + let version = self.increment_version()?;
> + let new_inode = version; // Inode equals version number (C compatibility)
> +
> + let entry = TreeEntry {
> + inode: new_inode,
> + parent: parent_entry.inode,
> + version,
> + writer: 0, // Local operations always use writer 0 (matching C)
> + mtime,
> + size: 0,
> + entry_type,
> + name: basename.clone(),
> + data: Vec::new(),
> + };
> +
> + {
> + let conn = self.inner.conn.lock();
> + let tx = conn.unchecked_transaction()?;
> +
> + tx.execute(
> + "INSERT INTO tree (inode, parent, version, writer, mtime, type, name, data, size) VALUES (?1, ?2, ?3, ?4, ?5, ?6, ?7, ?8, ?9)",
> + params![
> + entry.inode,
> + entry.parent,
> + entry.version,
> + entry.writer,
> + entry.mtime,
> + entry.entry_type,
> + entry.name,
> + if entry.is_dir() { None::<Vec<u8>> } else { Some(entry.data.clone()) },
> + entry.size
> + ],
> + )?;
> +
> + // CRITICAL: Update __version__ entry (matching C in database.c:275-278)
> + Self::update_version_entry(&tx, entry.version, entry.writer, entry.mtime)?;
> +
> + tx.commit()?;
> + }
> +
> + {
> + let mut index = self.inner.index.lock();
> + let mut tree = self.inner.tree.lock();
> +
> + index.insert(new_inode, entry.clone());
> + Self::update_root_metadata(
> + &mut index,
> + self.inner.root_inode,
> + entry.version,
> + entry.writer,
> + entry.mtime,
> + );
> +
> + tree.entry(parent_entry.inode)
> + .or_default()
> + .insert(basename, new_inode);
> +
> + if entry.is_dir() {
> + tree.insert(new_inode, HashMap::new());
> + }
> + }
> +
> + // If this is a directory in priv/lock/, register it in the lock table
> + if entry.is_dir() && parent_path == LOCK_DIR_PATH {
> + let csum = entry.compute_checksum();
> + let _ = self.lock_expired(path, &csum);
> + tracing::debug!("Registered lock directory: {}", path);
> + }
> +
> + Ok(())
> + }
> +
> + pub fn write(
> + &self,
> + path: &str,
> + offset: u64,
> + mtime: u32,
> + data: &[u8],
> + truncate: bool,
> + ) -> Result<usize> {
> + let mut entry = self
> + .lookup_path(path)
> + .ok_or_else(|| anyhow::anyhow!("File not found: {path}"))?;
> +
> + if entry.is_dir() {
> + return Err(anyhow::anyhow!("Cannot write to directory: {path}"));
> + }
> +
> + // Truncate before writing if requested (matches C implementation behavior)
Note: this does not match the C implementation, which preserves the
prefix bytes on truncate instead of clearing the whole buffer.
> + if truncate {
> + entry.data.clear();
> + }
> +
> + // Check size limit
> + let new_size = std::cmp::max(entry.data.len(), (offset as usize) + data.len());
I think we should use checked arithmetic here to avoid a possible
overflow of `offset + data.len()` on 32-bit systems. We should also
verify offset + data.len() <= MEMDB_MAX_FILE_SIZE before resizing
the buffer.
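For example (the MEMDB_MAX_FILE_SIZE value here is made up, just to
show the checked_add pattern):

```rust
// Hypothetical limit for illustration; the real constant lives in the crate.
const MEMDB_MAX_FILE_SIZE: usize = 1024 * 1024;

/// Sketch: compute the new file size with checked arithmetic so a
/// huge offset cannot wrap around on 32-bit targets.
fn new_file_size(offset: u64, len: usize, current: usize) -> Result<usize, String> {
    // On 32-bit targets this already rejects offsets that don't fit in usize.
    let offset = usize::try_from(offset).map_err(|_| "offset too large".to_string())?;
    let end = offset
        .checked_add(len)
        .ok_or_else(|| "offset + len overflows".to_string())?;
    if end > MEMDB_MAX_FILE_SIZE {
        return Err(format!("write exceeds maximum file size {MEMDB_MAX_FILE_SIZE}"));
    }
    Ok(end.max(current))
}
```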
> +
> + if new_size > MEMDB_MAX_FILE_SIZE {
> + return Err(anyhow::anyhow!(
> + "File size exceeds maximum: {MEMDB_MAX_FILE_SIZE}"
> + ));
> + }
> +
> + // Extend if necessary
> + let offset = offset as usize;
> + if offset + data.len() > entry.data.len() {
> + entry.data.resize(offset + data.len(), 0);
> + }
> +
> + // Write data
> + entry.data[offset..offset + data.len()].copy_from_slice(data);
> + entry.size = entry.data.len();
> + entry.mtime = mtime;
> + entry.writer = 0; // Local operations always use writer 0 (matching C)
> +
> + // Increment version
> + let version = self.increment_version()?;
> + entry.version = version;
> +
> + // Update database
> + {
> + let conn = self.inner.conn.lock();
> + let tx = conn.unchecked_transaction()?;
> +
> + tx.execute(
> + "UPDATE tree SET version = ?1, writer = ?2, mtime = ?3, size = ?4, data = ?5 WHERE inode = ?6",
> + params![
> + entry.version,
> + entry.writer,
> + entry.mtime,
> + entry.size,
> + &entry.data,
> + entry.inode
> + ],
> + )?;
> +
> + // CRITICAL: Update __version__ entry (matching C in database.c:275-278)
> + Self::update_version_entry(&tx, entry.version, entry.writer, entry.mtime)?;
> +
> + tx.commit()?;
> + }
> +
> + // Update in-memory index
> + {
> + let mut index = self.inner.index.lock();
> + index.insert(entry.inode, entry.clone());
> + Self::update_root_metadata(
> + &mut index,
> + self.inner.root_inode,
> + entry.version,
> + entry.writer,
> + entry.mtime,
> + );
> + }
> +
> + Ok(data.len())
> + }
> +
> + /// Update modification time of a file or directory
> + ///
> + /// This implements the C version's `memdb_mtime` function (memdb.c:860-932)
> + /// with full lock protection semantics for directories in `priv/lock/`.
> + ///
> + /// # Lock Protection
> + ///
> + /// For lock directories (`priv/lock/*`), this function enforces:
> + /// 1. Only the same writer (node ID) can update the lock
> + /// 2. Only newer mtime values are accepted (to prevent replay attacks)
> + /// 3. Lock cache is refreshed after successful update
> + ///
> + /// # Arguments
> + ///
> + /// * `path` - Path to the file/directory
> + /// * `writer` - Writer ID (node ID in cluster)
> + /// * `mtime` - New modification time (seconds since UNIX epoch)
> + pub fn set_mtime(&self, path: &str, writer: u32, mtime: u32) -> Result<()> {
> + let mut entry = self
> + .lookup_path(path)
> + .ok_or_else(|| anyhow::anyhow!("File not found: {path}"))?;
> +
> + // Don't allow updating root
> + if entry.inode == self.inner.root_inode {
> + return Err(anyhow::anyhow!("Cannot update root directory"));
> + }
> +
> + // Check if this is a lock directory (matching C logic in memdb.c:882)
> + let (parent_path, _) = Self::split_path(path);
> + let is_lock = parent_path.trim_start_matches('/') == LOCK_DIR_PATH && entry.is_dir();
> +
> + if is_lock {
> + // Lock protection: Only allow newer mtime (C: memdb.c:886-889)
> + // This prevents replay attacks and ensures lock renewal works correctly
> + if mtime < entry.mtime {
> + tracing::warn!(
> + "Rejecting mtime update for lock '{}': {} < {} (locked)",
> + path,
> + mtime,
> + entry.mtime
> + );
> + return Err(anyhow::anyhow!(
> + "Cannot set older mtime on locked directory (dir is locked)"
> + ));
> + }
> +
> + // Lock protection: Only same writer can update (C: memdb.c:890-894)
> + // This prevents lock hijacking from other nodes
> + if entry.writer != writer {
> + tracing::warn!(
> + "Rejecting mtime update for lock '{}': writer {} != {} (wrong owner)",
> + path,
> + writer,
> + entry.writer
> + );
> + return Err(anyhow::anyhow!(
> + "Lock owned by different writer (cannot hijack lock)"
> + ));
> + }
> +
> + tracing::debug!(
> + "Updating lock directory: {} (mtime: {} -> {})",
> + path,
> + entry.mtime,
> + mtime
> + );
> + }
> +
> + // Increment version
> + let version = self.increment_version()?;
> +
> + // Update entry
> + entry.version = version;
> + entry.writer = writer;
> + entry.mtime = mtime;
> +
> + // Update database
> + {
> + let conn = self.inner.conn.lock();
> + conn.execute(
> + "UPDATE tree SET version = ?1, writer = ?2, mtime = ?3 WHERE inode = ?4",
> + params![entry.version, entry.writer, entry.mtime, entry.inode],
> + )?;
> + }
> +
> + // Update in-memory index
> + {
> + let mut index = self.inner.index.lock();
> + index.insert(entry.inode, entry.clone());
> + }
> +
> + // Refresh lock cache if this is a lock directory (C: memdb.c:924-929)
> + // Remove old entry and insert new one with updated checksum
> + if is_lock {
> + let mut locks = self.inner.locks.lock();
> + locks.remove(path);
> +
> + let csum = entry.compute_checksum();
> + let now = SystemTime::now()
> + .duration_since(UNIX_EPOCH)
> + .unwrap_or_default()
> + .as_secs();
> +
> + locks.insert(path.to_string(), LockInfo { ltime: now, csum });
> +
> + tracing::debug!("Refreshed lock cache for: {}", path);
> + }
> +
> + Ok(())
> + }
> +
> + pub fn readdir(&self, path: &str) -> Result<Vec<TreeEntry>> {
> + let entry = self
> + .lookup_path(path)
> + .ok_or_else(|| anyhow::anyhow!("Directory not found: {path}"))?;
> +
> + if !entry.is_dir() {
> + return Err(anyhow::anyhow!("Not a directory: {path}"));
> + }
> +
> + let tree = self.inner.tree.lock();
> + let index = self.inner.index.lock();
> +
> + let children = tree
> + .get(&entry.inode)
> + .ok_or_else(|| anyhow::anyhow!("Directory structure corrupted"))?;
> +
> + let mut entries = Vec::new();
> + for child_inode in children.values() {
> + if let Some(child) = index.get(child_inode) {
> + entries.push(child.clone());
> + }
> + }
> +
> + Ok(entries)
> + }
> +
> + pub fn delete(&self, path: &str) -> Result<()> {
> + let entry = self
> + .lookup_path(path)
> + .ok_or_else(|| anyhow::anyhow!("File not found: {path}"))?;
> +
> + // Don't allow deleting root
> + if entry.inode == self.inner.root_inode {
> + return Err(anyhow::anyhow!("Cannot delete root directory"));
> + }
> +
> + // If directory, check if empty
> + if entry.is_dir() {
> + let tree = self.inner.tree.lock();
> + if let Some(children) = tree.get(&entry.inode)
> + && !children.is_empty()
> + {
> + return Err(anyhow::anyhow!("Directory not empty: {path}"));
> + }
> + }
C's memdb_delete() increments the root version, but here we don't.
The __version__ entry also needs to be bumped, ideally in the same
transaction as the DELETE.
> +
> + // Delete from database
> + {
> + let conn = self.inner.conn.lock();
> + conn.execute("DELETE FROM tree WHERE inode = ?1", params![entry.inode])?;
> + }
> +
> + // Update in-memory structures
> + {
> + let mut index = self.inner.index.lock();
> + let mut tree = self.inner.tree.lock();
> +
> + // Remove from index
> + index.remove(&entry.inode);
> +
> + // Remove from parent's children
> + if let Some(parent_children) = tree.get_mut(&entry.parent) {
> + parent_children.remove(&entry.name);
> + }
> +
> + // Remove from tree if directory
> + if entry.is_dir() {
> + tree.remove(&entry.inode);
> + }
> + }
> +
> + // Clean up lock cache for directories (matching C behavior in memdb.c:1235)
> + // This prevents stale lock cache entries and memory leaks
> + if entry.is_dir() {
> + let mut locks = self.inner.locks.lock();
> + locks.remove(path);
> + tracing::debug!("Removed lock cache entry for deleted directory: {}", path);
> + }
> +
> + Ok(())
> + }
> +
> + pub fn rename(&self, old_path: &str, new_path: &str) -> Result<()> {
> + let mut entry = self
> + .lookup_path(old_path)
> + .ok_or_else(|| anyhow::anyhow!("Source not found: {old_path}"))?;
> +
> + if entry.inode == self.inner.root_inode {
> + return Err(anyhow::anyhow!("Cannot rename root directory"));
> + }
> +
> + if self.exists(new_path)? {
> + return Err(anyhow::anyhow!("Destination already exists: {new_path}"));
> + }
> +
> + let (new_parent_path, new_basename) = Self::split_path(new_path);
> +
> + let new_parent_entry = self
> + .lookup_path(&new_parent_path)
> + .ok_or_else(|| anyhow::anyhow!("New parent directory not found: {new_parent_path}"))?;
> +
> + if !new_parent_entry.is_dir() {
> + return Err(anyhow::anyhow!(
> + "New parent is not a directory: {new_parent_path}"
> + ));
> + }
> +
> + let old_parent = entry.parent;
> + let old_name = entry.name.clone();
> +
> + entry.parent = new_parent_entry.inode;
> + entry.name = new_basename.clone();
> +
> + let version = self.increment_version()?;
> + entry.version = version;
> +
> + // Update database
> + {
> + let conn = self.inner.conn.lock();
> + let tx = conn.unchecked_transaction()?;
> +
> + tx.execute(
> + "UPDATE tree SET parent = ?1, name = ?2, version = ?3 WHERE inode = ?4",
> + params![entry.parent, entry.name, entry.version, entry.inode],
> + )?;
> +
> + // CRITICAL: Update __version__ entry (matching C in database.c:275-278)
> + Self::update_version_entry(&tx, entry.version, entry.writer, entry.mtime)?;
> +
> + tx.commit()?;
> + }
> +
> + {
> + let mut index = self.inner.index.lock();
> + let mut tree = self.inner.tree.lock();
> +
> + index.insert(entry.inode, entry.clone());
> + Self::update_root_metadata(
> + &mut index,
> + self.inner.root_inode,
> + entry.version,
> + entry.writer,
> + entry.mtime,
> + );
> +
> + if let Some(old_parent_children) = tree.get_mut(&old_parent) {
> + old_parent_children.remove(&old_name);
> + }
> +
> + tree.entry(new_parent_entry.inode)
> + .or_default()
> + .insert(new_basename, entry.inode);
> + }
> +
> + Ok(())
> + }
> +
> + pub fn get_all_entries(&self) -> Result<Vec<TreeEntry>> {
> + let index = self.inner.index.lock();
> + let entries: Vec<TreeEntry> = index.values().cloned().collect();
> + Ok(entries)
> + }
> +
> + pub fn get_version(&self) -> u64 {
> + self.inner.version.load(Ordering::SeqCst)
> + }
> +
> + /// Replace all entries (for full state synchronization)
> + pub fn replace_all_entries(&self, entries: Vec<TreeEntry>) -> Result<()> {
> + tracing::info!(
> + "Replacing all database entries with {} new entries",
> + entries.len()
> + );
> +
> + let conn = self.inner.conn.lock();
> + let tx = conn.unchecked_transaction()?;
> +
> + tx.execute("DELETE FROM tree", [])?;
Here we delete all entries, including the root's __version__ row ..
> +
> + let max_version = entries.iter().map(|e| e.version).max().unwrap_or(0);
> +
> + for entry in &entries {
> + tx.execute(
> + "INSERT INTO tree (inode, parent, version, writer, mtime, type, name, data, size) VALUES (?1, ?2, ?3, ?4, ?5, ?6, ?7, ?8, ?9)",
> + params![
> + entry.inode,
> + entry.parent,
> + entry.version,
> + entry.writer,
> + entry.mtime,
> + entry.entry_type,
> + entry.name,
> + if entry.is_dir() { None::<Vec<u8>> } else { Some(entry.data.clone()) },
> + entry.size
> + ],
> + )?;
> + }
.. but if the Vec<TreeEntry> contains the root in its in-memory
format instead of the DB format, the database ends up corrupted:
on restart, load_from_db() will ignore the malformed root entry
and reset the version to 1.
We need to handle the root case explicitly.
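One way to handle it, sketched with a trimmed-down TreeEntry (the
field set and helper name are illustrative):

```rust
// Field set trimmed for illustration; names mirror the patch.
const ROOT_INODE: u64 = 0;
const DT_DIR: u8 = 4;
const DT_REG: u8 = 8;
const VERSION_FILENAME: &str = "__version__";

#[derive(Clone, Debug, PartialEq)]
struct TreeEntry {
    inode: u64,
    entry_type: u8,
    name: String,
}

/// Sketch: convert the in-memory root (DT_DIR, empty name) into the
/// on-disk __version__ row before inserting, so load_from_db() can
/// round-trip it instead of discarding it and resetting the version.
fn to_db_form(mut entry: TreeEntry) -> TreeEntry {
    if entry.inode == ROOT_INODE {
        entry.entry_type = DT_REG;
        entry.name = VERSION_FILENAME.to_string();
    }
    entry
}
```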
> +
> + tx.commit()?;
> + drop(conn);
> +
> + let mut index = self.inner.index.lock();
> + let mut tree = self.inner.tree.lock();
> +
> + index.clear();
> + tree.clear();
> +
> + for entry in entries {
> + tree.entry(entry.parent)
> + .or_default()
> + .insert(entry.name.clone(), entry.inode);
> +
> + if entry.is_dir() {
> + tree.entry(entry.inode).or_default();
> + }
> +
> + index.insert(entry.inode, entry);
> + }
> +
> + self.inner.version.store(max_version, Ordering::SeqCst);
> +
> + tracing::info!(
> + "Database state replaced successfully, version now: {}",
> + max_version
> + );
> + Ok(())
> + }
> +
> + /// Apply a single TreeEntry during incremental synchronization
> + ///
> + /// This is used when receiving Update messages from the leader.
> + /// It directly inserts or updates the entry in the database without
> + /// going through the path-based API.
> + pub fn apply_tree_entry(&self, entry: TreeEntry) -> Result<()> {
> + tracing::debug!(
> + "Applying TreeEntry: inode={}, parent={}, name='{}', version={}",
> + entry.inode,
> + entry.parent,
> + entry.name,
> + entry.version
> + );
> +
> + // Begin transaction for atomicity
> + let conn = self.inner.conn.lock();
> + let tx = conn.unchecked_transaction()?;
> +
> + // Handle root inode specially (inode 0 is __version__)
> + let db_name = if entry.inode == self.inner.root_inode {
> + VERSION_FILENAME
> + } else {
> + entry.name.as_str()
> + };
> +
> + // Insert or replace the entry in database
> + tx.execute(
> + "INSERT OR REPLACE INTO tree (inode, parent, version, writer, mtime, type, name, data, size) VALUES (?1, ?2, ?3, ?4, ?5, ?6, ?7, ?8, ?9)",
> + params![
> + entry.inode,
> + entry.parent,
> + entry.version,
> + entry.writer,
> + entry.mtime,
> + entry.entry_type,
> + db_name,
> + if entry.is_dir() { None::<Vec<u8>> } else { Some(entry.data.clone()) },
> + entry.size
> + ],
> + )?;
> +
> + // CRITICAL: Update __version__ entry with the same metadata (matching C in database.c:275-278)
> + // Only do this if we're not already writing __version__ itself
> + if entry.inode != ROOT_INODE {
> + Self::update_version_entry(&tx, entry.version, entry.writer, entry.mtime)?;
> + }
> +
> + tx.commit()?;
> + drop(conn);
> +
> + // Update in-memory structures
> + let mut index = self.inner.index.lock();
> + let mut tree = self.inner.tree.lock();
> +
> + // Check if this entry already exists
> + let old_entry = index.get(&entry.inode).cloned();
> +
> + // If entry exists with different parent or name, update tree structure
> + if let Some(old) = old_entry {
> + if old.parent != entry.parent || old.name != entry.name {
> + // Remove from old parent's children
> + if let Some(old_parent_children) = tree.get_mut(&old.parent) {
> + old_parent_children.remove(&old.name);
> + }
> +
> + // Add to new parent's children
> + tree.entry(entry.parent)
> + .or_default()
> + .insert(entry.name.clone(), entry.inode);
> + }
> + } else {
> + // New entry - add to parent's children
> + tree.entry(entry.parent)
> + .or_default()
> + .insert(entry.name.clone(), entry.inode);
> + }
> +
> + // If this is a directory, ensure it has an entry in the tree map
> + if entry.is_dir() {
> + tree.entry(entry.inode).or_default();
> + }
> +
> + // Update index
> + index.insert(entry.inode, entry.clone());
Incoming updates may include inode 0; this insert would overwrite
the in-memory root directory entry (DT_DIR) with a regular file
entry.
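A guard along these lines would keep the root intact (trimmed
TreeEntry, illustrative helper name):

```rust
use std::collections::HashMap;

const ROOT_INODE: u64 = 0;

// Trimmed TreeEntry for illustration.
#[derive(Clone, Debug)]
struct TreeEntry {
    inode: u64,
    version: u64,
    writer: u32,
    mtime: u32,
    entry_type: u8, // 4 = DT_DIR, 8 = DT_REG
}

/// Sketch: for inode 0, fold only the metadata into the existing root
/// directory entry; never replace the DT_DIR entry with the DT_REG
/// __version__ entry from the wire.
fn apply_to_index(index: &mut HashMap<u64, TreeEntry>, entry: TreeEntry) {
    if entry.inode == ROOT_INODE {
        if let Some(root) = index.get_mut(&ROOT_INODE) {
            root.version = entry.version;
            root.writer = entry.writer;
            root.mtime = entry.mtime;
        }
    } else {
        index.insert(entry.inode, entry);
    }
}
```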
> +
> + // Update root entry's metadata to match __version__ (if we wrote a non-root entry)
> + if entry.inode != self.inner.root_inode {
> + Self::update_root_metadata(
> + &mut index,
> + self.inner.root_inode,
> + entry.version,
> + entry.writer,
> + entry.mtime,
> + );
> + tracing::debug!(
> + version = entry.version,
> + writer = entry.writer,
> + mtime = entry.mtime,
> + "Updated root entry metadata"
> + );
> + }
> +
> + // Update version counter if this entry has a higher version
> + self.inner
> + .version
> + .fetch_max(entry.version, Ordering::SeqCst);
> +
> + tracing::debug!("TreeEntry applied successfully");
> + Ok(())
> + }
> +
> + /// **TEST ONLY**: Manually set lock timestamp for testing expiration behavior
> + ///
> + /// This method is exposed for testing purposes only to simulate lock expiration
> + /// without waiting the full 120 seconds. Do not use in production code.
> + #[cfg(test)]
> + pub fn test_set_lock_timestamp(&self, path: &str, timestamp_secs: u64) {
> + let mut locks = self.inner.locks.lock();
> + if let Some(lock_info) = locks.get_mut(path) {
> + lock_info.ltime = timestamp_secs;
> + }
> + }
> +}
> +
> +// ============================================================================
> +// Trait Implementation for Dependency Injection
> +// ============================================================================
> +
> +impl crate::traits::MemDbOps for MemDb {
> + fn create(&self, path: &str, mode: u32, mtime: u32) -> Result<()> {
> + self.create(path, mode, mtime)
> + }
> +
> + fn read(&self, path: &str, offset: u64, size: usize) -> Result<Vec<u8>> {
> + self.read(path, offset, size)
> + }
> +
> + fn write(
> + &self,
> + path: &str,
> + offset: u64,
> + mtime: u32,
> + data: &[u8],
> + truncate: bool,
> + ) -> Result<usize> {
> + self.write(path, offset, mtime, data, truncate)
> + }
> +
> + fn delete(&self, path: &str) -> Result<()> {
> + self.delete(path)
> + }
> +
> + fn rename(&self, old_path: &str, new_path: &str) -> Result<()> {
> + self.rename(old_path, new_path)
> + }
> +
> + fn exists(&self, path: &str) -> Result<bool> {
> + self.exists(path)
> + }
> +
> + fn readdir(&self, path: &str) -> Result<Vec<crate::types::TreeEntry>> {
> + self.readdir(path)
> + }
> +
> + fn set_mtime(&self, path: &str, writer: u32, mtime: u32) -> Result<()> {
> + self.set_mtime(path, writer, mtime)
> + }
> +
> + fn lookup_path(&self, path: &str) -> Option<crate::types::TreeEntry> {
> + self.lookup_path(path)
> + }
> +
> + fn get_entry_by_inode(&self, inode: u64) -> Option<crate::types::TreeEntry> {
> + self.get_entry_by_inode(inode)
> + }
> +
> + fn acquire_lock(&self, path: &str, csum: &[u8; 32]) -> Result<()> {
> + self.acquire_lock(path, csum)
> + }
> +
> + fn release_lock(&self, path: &str, csum: &[u8; 32]) -> Result<()> {
> + self.release_lock(path, csum)
> + }
> +
> + fn is_locked(&self, path: &str) -> bool {
> + self.is_locked(path)
> + }
> +
> + fn lock_expired(&self, path: &str, csum: &[u8; 32]) -> bool {
> + self.lock_expired(path, csum)
> + }
> +
> + fn get_version(&self) -> u64 {
> + self.get_version()
> + }
> +
> + fn get_all_entries(&self) -> Result<Vec<crate::types::TreeEntry>> {
> + self.get_all_entries()
> + }
> +
> + fn replace_all_entries(&self, entries: Vec<crate::types::TreeEntry>) -> Result<()> {
> + self.replace_all_entries(entries)
> + }
> +
> + fn apply_tree_entry(&self, entry: crate::types::TreeEntry) -> Result<()> {
> + self.apply_tree_entry(entry)
> + }
> +
> + fn encode_database(&self) -> Result<Vec<u8>> {
> + self.encode_database()
> + }
> +
> + fn compute_database_checksum(&self) -> Result<[u8; 32]> {
> + self.compute_database_checksum()
> + }
> +}
> +
[..]
> diff --git a/src/pmxcfs-rs/pmxcfs-memdb/src/index.rs b/src/pmxcfs-rs/pmxcfs-memdb/src/index.rs
> new file mode 100644
> index 00000000..5bf9c102
> --- /dev/null
> +++ b/src/pmxcfs-rs/pmxcfs-memdb/src/index.rs
> @@ -0,0 +1,814 @@
> +/// MemDB Index structures for C-compatible state synchronization
> +///
> +/// This module implements the memdb_index_t format used by the C implementation
> +/// for efficient state comparison during cluster synchronization.
> +use anyhow::Result;
> +use sha2::{Digest, Sha256};
> +
> +/// Index entry matching C's memdb_index_extry_t
> +///
> +/// Wire format (40 bytes):
> +/// ```c
> +/// typedef struct {
> +/// guint64 inode; // 8 bytes
> +/// char digest[32]; // 32 bytes (SHA256)
> +/// } memdb_index_extry_t;
> +/// ```
> +#[derive(Debug, Clone, PartialEq, Eq)]
> +pub struct IndexEntry {
> + pub inode: u64,
> + pub digest: [u8; 32],
> +}
> +
> +impl IndexEntry {
> + pub fn serialize(&self) -> Vec<u8> {
> + let mut data = Vec::with_capacity(40);
> + data.extend_from_slice(&self.inode.to_le_bytes());
> + data.extend_from_slice(&self.digest);
> + data
> + }
> +
> + pub fn deserialize(data: &[u8]) -> Result<Self> {
> + if data.len() < 40 {
> + anyhow::bail!("IndexEntry too short: {} bytes (need 40)", data.len());
> + }
> +
> + let inode = u64::from_le_bytes(data[0..8].try_into().unwrap());
> + let mut digest = [0u8; 32];
> + digest.copy_from_slice(&data[8..40]);
> +
> + Ok(Self { inode, digest })
> + }
> +}
> +
> +/// MemDB index matching C's memdb_index_t
> +///
> +/// Wire format header (24 bytes) + entries:
This should be 32 bytes (8 + 8 + 4 + 4 + 4 + 4), not 24. Please
also fix the `bytes` field comment below (it says 24 + size * 40,
while the code already uses 32) and the matching reference in the
README.
> +/// ```c
> +/// typedef struct {
> +/// guint64 version; // 8 bytes
> +/// guint64 last_inode; // 8 bytes
> +/// guint32 writer; // 4 bytes
> +/// guint32 mtime; // 4 bytes
> +/// guint32 size; // 4 bytes (number of entries)
> +/// guint32 bytes; // 4 bytes (total bytes allocated)
> +/// memdb_index_extry_t entries[]; // variable length
> +/// } memdb_index_t;
> +/// ```
> +#[derive(Debug, Clone, PartialEq, Eq)]
> +pub struct MemDbIndex {
> + pub version: u64,
> + pub last_inode: u64,
> + pub writer: u32,
> + pub mtime: u32,
> + pub size: u32, // number of entries
> + pub bytes: u32, // total bytes (24 + size * 40)
> + pub entries: Vec<IndexEntry>,
> +}
> +
> +impl MemDbIndex {
> + /// Create a new index from entries
> + ///
> + /// Entries are automatically sorted by inode for efficient comparison
> + /// and to match C implementation behavior.
> + pub fn new(
> + version: u64,
> + last_inode: u64,
> + writer: u32,
> + mtime: u32,
> + mut entries: Vec<IndexEntry>,
> + ) -> Self {
> + // Sort entries by inode (matching C implementation)
> + entries.sort_by_key(|e| e.inode);
> +
> + let size = entries.len() as u32;
> + let bytes = 32 + size * 40; // header (32) + entries
> +
> + Self {
> + version,
> + last_inode,
> + writer,
> + mtime,
> + size,
> + bytes,
> + entries,
> + }
> + }
> +
> + /// Serialize to C-compatible wire format
> + pub fn serialize(&self) -> Vec<u8> {
> + let mut data = Vec::with_capacity(self.bytes as usize);
> +
> + // Header (32 bytes)
> + data.extend_from_slice(&self.version.to_le_bytes());
> + data.extend_from_slice(&self.last_inode.to_le_bytes());
> + data.extend_from_slice(&self.writer.to_le_bytes());
> + data.extend_from_slice(&self.mtime.to_le_bytes());
> + data.extend_from_slice(&self.size.to_le_bytes());
> + data.extend_from_slice(&self.bytes.to_le_bytes());
> +
> + // Entries (40 bytes each)
> + for entry in &self.entries {
> + data.extend_from_slice(&entry.serialize());
> + }
> +
> + data
> + }
> +
> + /// Deserialize from C-compatible wire format
> + pub fn deserialize(data: &[u8]) -> Result<Self> {
> + if data.len() < 32 {
> + anyhow::bail!(
> + "MemDbIndex too short: {} bytes (need at least 32)",
> + data.len()
> + );
> + }
> +
> + // Parse header
> + let version = u64::from_le_bytes(data[0..8].try_into().unwrap());
> + let last_inode = u64::from_le_bytes(data[8..16].try_into().unwrap());
> + let writer = u32::from_le_bytes(data[16..20].try_into().unwrap());
> + let mtime = u32::from_le_bytes(data[20..24].try_into().unwrap());
> + let size = u32::from_le_bytes(data[24..28].try_into().unwrap());
> + let bytes = u32::from_le_bytes(data[28..32].try_into().unwrap());
> +
> + // Validate size
> + let expected_bytes = 32 + size * 40;
> + if bytes != expected_bytes {
> + anyhow::bail!("MemDbIndex bytes mismatch: got {bytes}, expected {expected_bytes}");
> + }
> +
> + if data.len() < bytes as usize {
> + anyhow::bail!(
> + "MemDbIndex data too short: {} bytes (need {})",
> + data.len(),
> + bytes
> + );
> + }
> +
> + // Parse entries
> + let mut entries = Vec::with_capacity(size as usize);
> + let mut offset = 32;
> + for _ in 0..size {
> + let entry = IndexEntry::deserialize(&data[offset..offset + 40])?;
> + entries.push(entry);
> + offset += 40;
> + }
> +
> + Ok(Self {
> + version,
> + last_inode,
> + writer,
> + mtime,
> + size,
> + bytes,
> + entries,
> + })
> + }
> +
> + /// Compute SHA256 digest of a tree entry for the index
> + ///
> + /// Matches C's memdb_encode_index() digest computation (memdb.c:1497-1507)
> + /// CRITICAL: Order and fields must match exactly:
> + /// 1. version, 2. writer, 3. mtime, 4. size, 5. type, 6. parent, 7. name, 8. data
> + ///
> + /// NOTE: inode is NOT included in the digest (only used as the index key)
> + #[allow(clippy::too_many_arguments)]
> + pub fn compute_entry_digest(
> + _inode: u64, // Not included in digest, only for signature compatibility
> + parent: u64,
> + version: u64,
> + writer: u32,
> + mtime: u32,
> + size: usize,
> + entry_type: u8,
> + name: &str,
> + data: &[u8],
> + ) -> [u8; 32] {
> + let mut hasher = Sha256::new();
> +
> + // Hash entry metadata in C's exact order (memdb.c:1497-1503)
> + hasher.update(version.to_le_bytes());
> + hasher.update(writer.to_le_bytes());
> + hasher.update(mtime.to_le_bytes());
> + hasher.update((size as u32).to_le_bytes()); // C uses u32 for te->size
> + hasher.update([entry_type]);
> + hasher.update(parent.to_le_bytes());
> + hasher.update(name.as_bytes());
> +
> + // Hash data only for regular files with non-zero size (memdb.c:1505-1507)
> + if entry_type == 8 /* DT_REG */ && size > 0 {
> + hasher.update(data);
> + }
> +
> + hasher.finalize().into()
> + }
> +}
> +
> +/// Implement comparison for MemDbIndex
> +///
> +/// Matches C's dcdb_choose_leader_with_highest_index() logic:
> +/// - If same version, higher mtime wins
> +/// - If different version, higher version wins
> +impl PartialOrd for MemDbIndex {
> + fn partial_cmp(&self, other: &Self) -> Option<std::cmp::Ordering> {
> + Some(self.cmp(other))
> + }
> +}
> +
> +impl Ord for MemDbIndex {
> + fn cmp(&self, other: &Self) -> std::cmp::Ordering {
> + // First compare by version (higher version wins)
> + // Then by mtime (higher mtime wins) if versions are equal
> + self.version
> + .cmp(&other.version)
> + .then_with(|| self.mtime.cmp(&other.mtime))
> + }
> +}
> +
> +impl MemDbIndex {
> + /// Find entries that differ from another index
> + ///
> + /// Returns the set of inodes that need to be sent as updates.
> + /// Matches C's dcdb_create_and_send_updates() comparison logic.
> + pub fn find_differences(&self, other: &MemDbIndex) -> Vec<u64> {
> + let mut differences = Vec::new();
> +
> + // Walk through master index, comparing with slave
> + let mut j = 0; // slave position
> +
> + for i in 0..self.entries.len() {
> + let master_entry = &self.entries[i];
> + let inode = master_entry.inode;
> +
> + // Advance slave pointer to matching or higher inode
> + while j < other.entries.len() && other.entries[j].inode < inode {
> + j += 1;
> + }
> +
> + // Check if entries match
> + if j < other.entries.len() {
> + let slave_entry = &other.entries[j];
> + if slave_entry.inode == inode && slave_entry.digest == master_entry.digest {
> + // Entries match - skip
> + continue;
> + }
> + }
> +
> + // Entry differs or missing - needs update
> + differences.push(inode);
> + }
> +
> + differences
> + }
> +}
> +
> +#[cfg(test)]
[..]
> diff --git a/src/pmxcfs-rs/pmxcfs-memdb/src/lib.rs b/src/pmxcfs-rs/pmxcfs-memdb/src/lib.rs
> new file mode 100644
> index 00000000..f5c6d97a
> --- /dev/null
> +++ b/src/pmxcfs-rs/pmxcfs-memdb/src/lib.rs
> @@ -0,0 +1,26 @@
> +/// In-memory database with SQLite persistence
> +///
> +/// This module provides a cluster-synchronized in-memory database with SQLite persistence.
> +/// The implementation is organized into focused submodules:
> +///
> +/// - `types`: Type definitions and constants
> +/// - `database`: Core MemDb struct and CRUD operations
> +/// - `locks`: Resource locking functionality
> +/// - `sync`: State synchronization and serialization
> +/// - `index`: C-compatible memdb index structures for efficient state comparison
> +/// - `traits`: Trait abstractions for dependency injection and testing
> +mod database;
> +mod index;
> +mod locks;
> +mod sync;
> +mod traits;
> +mod types;
> +mod vmlist;
> +
> +// Re-export public types
> +pub use database::MemDb;
> +pub use index::{IndexEntry, MemDbIndex};
> +pub use locks::is_lock_path;
> +pub use traits::MemDbOps;
> +pub use types::{ROOT_INODE, TreeEntry};
> +pub use vmlist::recreate_vmlist;
> diff --git a/src/pmxcfs-rs/pmxcfs-memdb/src/locks.rs b/src/pmxcfs-rs/pmxcfs-memdb/src/locks.rs
> new file mode 100644
> index 00000000..6d797fd0
> --- /dev/null
> +++ b/src/pmxcfs-rs/pmxcfs-memdb/src/locks.rs
> @@ -0,0 +1,286 @@
> +/// Lock management for memdb
> +///
> +/// Locks in pmxcfs are implemented as directory entries stored in the database at
> +/// `priv/lock/<lockname>`. This ensures locks are:
> +/// 1. Persistent across restarts
> +/// 2. Synchronized across the cluster via DFSM
> +/// 3. Visible to both C and Rust nodes
> +///
> +/// The in-memory lock table is a cache rebuilt from the database on startup
> +/// and updated dynamically during runtime.
> +use anyhow::Result;
> +use std::time::{SystemTime, UNIX_EPOCH};
> +
> +use super::database::MemDb;
> +use super::types::{LOCK_DIR_PATH, LOCK_TIMEOUT, LockInfo};
> +
> +/// Check if a path is in the lock directory
> +///
> +/// Matches C's path_is_lockdir() function (cfs-utils.c:306)
> +/// Returns true if path is "{LOCK_DIR_PATH}/<something>" (with or without leading /)
> +pub fn is_lock_path(path: &str) -> bool {
> + let path = path.trim_start_matches('/');
> + let lock_prefix = format!("{LOCK_DIR_PATH}/");
> + path.starts_with(&lock_prefix) && path.len() > lock_prefix.len()
> +}
> +
> +impl MemDb {
> + /// Check if a lock has expired (with side effects matching C semantics)
> + ///
> + /// This function implements the same behavior as the C version (memdb.c:330-358):
> + /// - If no lock exists in cache: Reads from database, creates cache entry, returns `false`
> + /// - If lock exists but csum mismatches: Updates csum, resets timeout, logs critical error, returns `false`
> + /// - If lock exists, csum matches, and time > LOCK_TIMEOUT: Returns `true` (expired)
> + /// - Otherwise: Returns `false` (not expired)
> + ///
> + /// This function is used for both checking AND managing locks, matching C semantics.
> + ///
> + /// # Current Usage
> + /// - Called from `database::create()` when creating lock directories (matching C memdb.c:928)
> + /// - Called from FUSE utimens operation (pmxcfs/src/fuse/filesystem.rs:717) for mtime=0 unlock requests
> + /// - Called from DFSM unlock message handlers (pmxcfs/src/memdb_callbacks.rs:142,161)
> + ///
> + /// Note: DFSM broadcasting of unlock messages to cluster nodes is not yet fully implemented.
> + /// See TODOs in filesystem.rs:723 and memdb_callbacks.rs:154 for remaining work.
> + pub fn lock_expired(&self, path: &str, csum: &[u8; 32]) -> bool {
> + let mut locks = self.inner.locks.lock();
> + let now = SystemTime::now()
> + .duration_since(UNIX_EPOCH)
> + .unwrap_or_default()
> + .as_secs();
> +
> + match locks.get_mut(path) {
> + Some(lock_info) => {
> + // Lock exists in cache - check csum
> + if lock_info.csum != *csum {
> + // Wrong csum - update and reset timeout
> + lock_info.ltime = now;
> + lock_info.csum = *csum;
> + tracing::error!("Lock checksum mismatch for '{}' - resetting timeout", path);
> + return false;
> + }
> +
> + // Csum matches - check if expired
> + let elapsed = now - lock_info.ltime;
> + if elapsed > LOCK_TIMEOUT {
> + tracing::debug!(path, elapsed, "Lock expired");
> + return true; // Expired
> + }
> +
> + false // Not expired
> + }
> + None => {
> + // No lock in cache - create new cache entry
> + locks.insert(
> + path.to_string(),
> + LockInfo {
> + ltime: now,
> + csum: *csum,
> + },
> + );
> + tracing::debug!(path, "Created new lock cache entry");
> + false // Not expired (just created)
> + }
> + }
> + }
> +
> + /// Acquire a lock on a path
> + ///
> + /// This creates a directory entry in the database at `priv/lock/<lockname>`
> + /// and broadcasts the operation to the cluster via DFSM.
> + pub fn acquire_lock(&self, path: &str, csum: &[u8; 32]) -> Result<()> {
> + let now = SystemTime::now()
> + .duration_since(UNIX_EPOCH)
> + .unwrap_or_default()
> + .as_secs();
> +
> + let locks = self.inner.locks.lock();
> +
> + // Check if there's an existing valid lock in cache
> + if let Some(existing_lock) = locks.get(path) {
> + let lock_age = now - existing_lock.ltime;
> + if lock_age <= LOCK_TIMEOUT && existing_lock.csum != *csum {
> + return Err(anyhow::anyhow!("Lock already held by another process"));
> + }
> + }
> +
> + // Convert path like "/priv/lock/foo.lock" to just the lock name
> + let lock_dir_with_slash = format!("/{LOCK_DIR_PATH}/");
> + let lock_name = if let Some(name) = path.strip_prefix(&lock_dir_with_slash) {
> + name
> + } else {
> + path.strip_prefix('/').unwrap_or(path)
> + };
> +
> + let lock_path = format!("/{LOCK_DIR_PATH}/{lock_name}");
This lock path is built with a leading slash, but update_locks builds
its cache keys without one (format!("{}/{}", LOCK_DIR_PATH, entry.name)),
so the two would never match. Please standardize on paths without a
leading slash and adjust the stripping logic accordingly.
The lock names should also be validated to prevent path traversal.
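Something along these lines might work (the helper names are made up and
the validation rules are only a suggestion, not taken from the C code):

```rust
// Hedged sketch: build lock paths without a leading slash so they match
// the keys update_locks creates, and reject names that could traverse
// out of priv/lock.
const LOCK_DIR_PATH: &str = "priv/lock";

/// Accept both "/priv/lock/foo" and "priv/lock/foo", return the bare name
fn lock_name_from(path: &str) -> Option<&str> {
    let path = path.trim_start_matches('/');
    path.strip_prefix("priv/lock/").filter(|n| !n.is_empty())
}

/// Build the canonical (slash-less) lock path, rejecting unsafe names
fn lock_path_for(name: &str) -> Result<String, String> {
    // Reject empty names, dot segments, separators and NULs
    if name.is_empty()
        || name == "."
        || name == ".."
        || name.contains('/')
        || name.contains('\0')
    {
        return Err(format!("invalid lock name: {name:?}"));
    }
    // No leading slash: same format as update_locks' cache keys
    Ok(format!("{LOCK_DIR_PATH}/{name}"))
}
```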
> +
> + // Release locks mutex before database operations to avoid deadlock
> + drop(locks);
> +
> + // Create or update lock directory in database
> + // First check if it exists
> + if self.exists(&lock_path)? {
> + // Lock directory exists - update its mtime to refresh
> + // In C this is implicit through the checksum, we'll update the entry
> + tracing::debug!("Refreshing existing lock directory: {}", lock_path);
> + // We don't need to do anything - the lock cache entry will be updated below
> + } else {
> + // Create lock directory in database
> + let mode = libc::S_IFDIR | 0o755;
> + let mtime = now as u32;
> +
> + // Ensure lock directory exists
> + let lock_dir_full = format!("/{LOCK_DIR_PATH}");
> + if !self.exists(&lock_dir_full)? {
> + self.create(&lock_dir_full, libc::S_IFDIR | 0o755, mtime)?;
> + }
> +
> + self.create(&lock_path, mode, mtime)?;
> + tracing::debug!("Created lock directory in database: {}", lock_path);
> + }
> +
> + // Update in-memory cache
> + let mut locks = self.inner.locks.lock();
> + locks.insert(
> + lock_path.clone(),
> + LockInfo {
> + ltime: now,
> + csum: *csum,
> + },
> + );
> +
> + tracing::debug!("Lock acquired on path: {}", lock_path);
> + Ok(())
> + }
> +
> + /// Release a lock on a path
> + ///
> + /// This deletes the directory entry from the database and broadcasts
> + /// the delete operation to the cluster via DFSM.
> + pub fn release_lock(&self, path: &str, csum: &[u8; 32]) -> Result<()> {
> + let locks = self.inner.locks.lock();
> +
> + if let Some(lock_info) = locks.get(path) {
> + // Only release if checksum matches
> + if lock_info.csum != *csum {
> + return Err(anyhow::anyhow!("Cannot release lock: checksum mismatch"));
> + }
> + } else {
> + return Err(anyhow::anyhow!("No lock found on path: {path}"));
> + }
> +
> + // Release locks mutex before database operations
> + drop(locks);
> +
> + // Delete lock directory from database
> + if self.exists(path)? {
> + self.delete(path)?;
> + tracing::debug!("Deleted lock directory from database: {}", path);
> + }
> +
> + // Remove from in-memory cache
> + let mut locks = self.inner.locks.lock();
> + locks.remove(path);
> +
> + tracing::debug!("Lock released on path: {}", path);
> + Ok(())
> + }
> +
> + /// Update lock cache by scanning the priv/lock directory in database
> + ///
> + /// This implements the C version's behavior (memdb.c:360-89):
> + /// - Scans the `priv/lock` directory in the database
> + /// - Rebuilds the entire lock hash table from database state
> + /// - Preserves `ltime` from old entries if csum matches
> + /// - Is called on database open and after synchronization
> + ///
> + /// This ensures locks are visible across C/Rust nodes and survive restarts.
> + pub(crate) fn update_locks(&self) {
> + // Check if lock directory exists
> + let _lock_dir = match self.lookup_path(LOCK_DIR_PATH) {
> + Some(entry) if entry.is_dir() => entry,
> + _ => {
> + tracing::debug!(
> + "{} directory does not exist, initializing empty lock table",
> + LOCK_DIR_PATH
> + );
> + self.inner.locks.lock().clear();
> + return;
> + }
> + };
> +
> + let now = SystemTime::now()
> + .duration_since(UNIX_EPOCH)
> + .unwrap_or_default()
> + .as_secs();
> +
> + // Get old locks table for preserving ltimes
> + let old_locks = {
> + let locks = self.inner.locks.lock();
> + locks.clone()
> + };
> +
> + // Build new locks table from database
> + let mut new_locks = std::collections::HashMap::new();
> +
> + // Read all lock directories
> + match self.readdir(LOCK_DIR_PATH) {
> + Ok(entries) => {
> + for entry in entries {
> + // Only process directories (locks are stored as directories)
> + if !entry.is_dir() {
> + continue;
> + }
> +
> + let lock_path = format!("{}/{}", LOCK_DIR_PATH, entry.name);
> + let csum = entry.compute_checksum();
> +
> + // Check if we have an old entry with matching checksum
> + let ltime = if let Some(old_lock) = old_locks.get(&lock_path) {
> + if old_lock.csum == csum {
> + // Checksum matches - preserve old ltime
> + old_lock.ltime
> + } else {
> + // Checksum changed - reset ltime
> + now
> + }
> + } else {
> + // New lock - set ltime to now
> + now
> + };
> +
> + new_locks.insert(lock_path.clone(), LockInfo { ltime, csum });
> + tracing::debug!("Loaded lock from database: {}", lock_path);
> + }
> + }
> + Err(e) => {
> + tracing::warn!("Failed to read {} directory: {}", LOCK_DIR_PATH, e);
> + return;
> + }
> + }
> +
> + // Replace lock table
> + *self.inner.locks.lock() = new_locks;
> +
> + tracing::debug!(
> + "Updated lock table from database: {} locks",
> + self.inner.locks.lock().len()
> + );
> + }
> +
> + /// Check if a path is locked
> + pub fn is_locked(&self, path: &str) -> bool {
> + let locks = self.inner.locks.lock();
> + if let Some(lock_info) = locks.get(path) {
> + let now = SystemTime::now()
> + .duration_since(UNIX_EPOCH)
> + .unwrap_or_default()
> + .as_secs();
> +
> + // Check if lock is still valid (not expired)
> + (now - lock_info.ltime) <= LOCK_TIMEOUT
> + } else {
> + false
> + }
> + }
> +}
> diff --git a/src/pmxcfs-rs/pmxcfs-memdb/src/sync.rs b/src/pmxcfs-rs/pmxcfs-memdb/src/sync.rs
> new file mode 100644
> index 00000000..719a2cf0
> --- /dev/null
> +++ b/src/pmxcfs-rs/pmxcfs-memdb/src/sync.rs
> @@ -0,0 +1,249 @@
> +/// State synchronization and serialization for memdb
> +use anyhow::{Context, Result};
> +use sha2::{Digest, Sha256};
> +use std::sync::atomic::Ordering;
> +
> +use super::database::MemDb;
> +use super::index::{IndexEntry, MemDbIndex};
> +use super::types::TreeEntry;
> +
> +impl MemDb {
> + /// Encode database index for C-compatible state synchronization
> + ///
> + /// This creates a memdb_index_t structure matching the C implementation,
> + /// containing metadata and a sorted list of (inode, digest) pairs.
> + /// This is sent as the "state" during DFSM synchronization.
> + pub fn encode_index(&self) -> Result<MemDbIndex> {
> + let mut index = self.inner.index.lock();
> +
> + // CRITICAL: Synchronize root entry version with global version counter
> + // The C implementation uses root->version as the index version,
> + // so we must ensure they match before encoding.
> + let global_version = self.inner.version.load(Ordering::SeqCst);
> +
> + let root_inode = self.inner.root_inode;
> + let mut root_version_updated = false;
> + if let Some(root_entry) = index.get_mut(&root_inode) {
> + if root_entry.version != global_version {
> + root_entry.version = global_version;
> + root_version_updated = true;
> + }
> + } else {
> + anyhow::bail!("Root entry not found in index");
> + }
> +
> + // If root version was updated, persist to database
> + if root_version_updated {
> + let conn = self.inner.conn.lock();
> + let root_entry = index.get(&root_inode).unwrap(); // Safe: we just checked it exists
> +
> + conn.execute(
> + "UPDATE entries SET version = ? WHERE inode = ?",
Please revisit the schema here: shouldn't this refer to the tree table?
> + rusqlite::params![root_entry.version as i64, root_inode as i64],
> + )
> + .context("Failed to update root version in database")?;
> +
> + drop(conn);
> + }
> +
> + // Collect ALL entries including root, sorted by inode
> + let mut entries: Vec<&TreeEntry> = index.values().collect();
> + entries.sort_by_key(|e| e.inode);
> +
> + tracing::info!("=== encode_index: Encoding {} entries ===", entries.len());
> + for te in entries.iter() {
> + tracing::info!(
> + " Entry: inode={:#018x}, parent={:#018x}, name='{}', type={}, version={}, writer={}, mtime={}, size={}",
> + te.inode, te.parent, te.name, te.entry_type, te.version, te.writer, te.mtime, te.size
> + );
> + }
> +
> + // Create index entries with digests
> + let index_entries: Vec<IndexEntry> = entries
> + .iter()
> + .map(|te| {
> + let digest = MemDbIndex::compute_entry_digest(
> + te.inode,
> + te.parent,
> + te.version,
> + te.writer,
> + te.mtime,
> + te.size,
> + te.entry_type,
> + &te.name,
> + &te.data,
> + );
> + tracing::debug!(
> + " Digest for inode {:#018x}: {:02x}{:02x}{:02x}{:02x}...{:02x}{:02x}{:02x}{:02x}",
> + te.inode,
> + digest[0], digest[1], digest[2], digest[3],
> + digest[28], digest[29], digest[30], digest[31]
> + );
> + IndexEntry { inode: te.inode, digest }
> + })
> + .collect();
> +
> + // Get root entry for mtime and writer_id (now updated with global version)
> + let root_entry = index
> + .get(&self.inner.root_inode)
> + .ok_or_else(|| anyhow::anyhow!("Root entry not found in index"))?;
> +
> + let version = global_version; // Already synchronized above
> + let last_inode = index.keys().max().copied().unwrap_or(1);
> + let writer = root_entry.writer;
> + let mtime = root_entry.mtime;
> +
> + drop(index);
> +
> + Ok(MemDbIndex::new(
> + version,
> + last_inode,
> + writer,
> + mtime,
> + index_entries,
> + ))
> + }
> +
> + /// Encode the entire database state into a byte array
> + /// Matches C version's memdb_encode() function
> + pub fn encode_database(&self) -> Result<Vec<u8>> {
> + let index = self.inner.index.lock();
> +
> + // Collect all entries sorted by inode for consistent ordering
> + // This matches the C implementation's memdb_tree_compare function
> + let mut entries: Vec<&TreeEntry> = index.values().collect();
> + entries.sort_by_key(|e| e.inode);
> +
> + // Log all entries for debugging
> + tracing::info!(
> + "Encoding database: {} entries",
> + entries.len()
> + );
> + for entry in entries.iter() {
> + tracing::info!(
> + " Entry: inode={}, name='{}', parent={}, type={}, size={}, version={}",
> + entry.inode,
> + entry.name,
> + entry.parent,
> + entry.entry_type,
> + entry.size,
> + entry.version
> + );
> + }
> +
> + // Serialize using bincode (compatible with C struct layout)
> + let encoded = bincode::serialize(&entries)
> + .map_err(|e| anyhow::anyhow!("Failed to encode database: {e}"))?;
> +
> + tracing::debug!(
> + "Encoded database: {} entries, {} bytes",
> + entries.len(),
> + encoded.len()
> + );
> +
> + Ok(encoded)
> + }
> +
> + /// Compute checksum of the entire database state
> + /// Used for DFSM state verification
> + pub fn compute_database_checksum(&self) -> Result<[u8; 32]> {
> + let encoded = self.encode_database()?;
This currently serializes the whole database via bincode and then hashes
the result, while C's memdb_compute_checksum hashes the entry fields
directly, so the two checksums will never match. This does not look
C-compatible.
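What I had in mind is something like the following: walk the entries in
inode order and feed the fields straight to the hasher, no bincode
involved. This is only a sketch (Entry mirrors TreeEntry, and the generic
`update` callback stands in for Sha256::update); the exact field set and
order still need to be verified against C's memdb_compute_checksum.

```rust
// Sketch only: field order copies compute_entry_digest plus the inode;
// whether that matches the C checksum is unverified.
struct Entry {
    inode: u64,
    parent: u64,
    version: u64,
    writer: u32,
    mtime: u32,
    size: u32,
    entry_type: u8,
    name: String,
    data: Vec<u8>,
}

fn feed_entries(entries: &mut [Entry], update: &mut impl FnMut(&[u8])) {
    // C iterates the tree sorted by inode, so sort before hashing
    entries.sort_by_key(|e| e.inode);
    for e in entries.iter() {
        update(&e.inode.to_le_bytes());
        update(&e.version.to_le_bytes());
        update(&e.writer.to_le_bytes());
        update(&e.mtime.to_le_bytes());
        update(&e.size.to_le_bytes());
        update(&[e.entry_type]);
        update(&e.parent.to_le_bytes());
        update(e.name.as_bytes());
        // data only for regular files (DT_REG = 8) with content
        if e.entry_type == 8 && e.size > 0 {
            update(&e.data);
        }
    }
}
```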
> +
> + let mut hasher = Sha256::new();
> + hasher.update(&encoded);
> +
> + Ok(hasher.finalize().into())
> + }
> +
> + /// Decode database state from a byte array
> + /// Used during DFSM state synchronization
> + pub fn decode_database(data: &[u8]) -> Result<Vec<TreeEntry>> {
> + let entries: Vec<TreeEntry> = bincode::deserialize(data)
> + .map_err(|e| anyhow::anyhow!("Failed to decode database: {e}"))?;
> +
> + tracing::debug!("Decoded database: {} entries", entries.len());
> +
> + Ok(entries)
> + }
> +
> + /// Synchronize corosync configuration from MemDb to filesystem
> + ///
> + /// Reads corosync.conf from memdb and writes to system file if changed.
> + /// This syncs the cluster configuration from the distributed database
> + /// to the local filesystem.
> + ///
> + /// # Arguments
> + /// * `system_path` - Path to write the corosync.conf file (default: /etc/corosync/corosync.conf)
> + /// * `force` - Force write even if unchanged
> + pub fn sync_corosync_conf(&self, system_path: Option<&str>, force: bool) -> Result<()> {
> + let system_path = system_path.unwrap_or("/etc/corosync/corosync.conf");
> + tracing::info!(
> + "Syncing corosync configuration to {} (force={})",
> + system_path,
> + force
> + );
> +
> + // Path in memdb for corosync.conf
> + let memdb_path = "/corosync.conf";
> +
> + // Try to read from memdb
> + let memdb_data = match self.lookup_path(memdb_path) {
> + Some(entry) if entry.is_file() => entry.data,
> + Some(_) => {
> + return Err(anyhow::anyhow!("{memdb_path} exists but is not a file"));
> + }
> + None => {
> + tracing::debug!("{} not found in memdb, nothing to sync", memdb_path);
> + return Ok(());
> + }
> + };
> +
> + // Read current system file if it exists
> + let system_data = std::fs::read(system_path).ok();
> +
> + // Determine if we need to write
> + let should_write = force || system_data.as_ref() != Some(&memdb_data);
> +
> + if !should_write {
> + tracing::debug!("Corosync configuration unchanged, skipping write");
> + return Ok(());
> + }
> +
> + // SAFETY CHECK: Writing to /etc requires root permissions
> + // We'll attempt the write but log clearly if it fails
> + tracing::info!(
> + "Corosync configuration changed (size: {} bytes), updating {}",
> + memdb_data.len(),
> + system_path
> + );
> +
> + // Basic validation: check if it looks like a valid corosync config
> + let config_str =
> + std::str::from_utf8(&memdb_data).context("Corosync config is not valid UTF-8")?;
> +
> + if !config_str.contains("totem") {
> + tracing::warn!("Corosync config validation: missing 'totem' section");
> + }
> + if !config_str.contains("nodelist") {
> + tracing::warn!("Corosync config validation: missing 'nodelist' section");
> + }
> +
> + // Attempt to write (will fail if not root or no permissions)
> + match std::fs::write(system_path, &memdb_data) {
> + Ok(()) => {
> + tracing::info!("Successfully updated {}", system_path);
> + Ok(())
> + }
> + Err(e) if e.kind() == std::io::ErrorKind::PermissionDenied => {
> + tracing::warn!(
> + "Permission denied writing {}: {}. Run as root to enable corosync sync.",
> + system_path,
> + e
> + );
> + // Don't return error - this is expected in non-root mode
> + Ok(())
> + }
> + Err(e) => Err(anyhow::anyhow!("Failed to write {system_path}: {e}")),
> + }
> + }
> +}
> diff --git a/src/pmxcfs-rs/pmxcfs-memdb/src/traits.rs b/src/pmxcfs-rs/pmxcfs-memdb/src/traits.rs
> new file mode 100644
> index 00000000..efe3ff36
> --- /dev/null
> +++ b/src/pmxcfs-rs/pmxcfs-memdb/src/traits.rs
> @@ -0,0 +1,101 @@
> +//! Traits for MemDb operations
> +//!
> +//! This module provides the `MemDbOps` trait which abstracts MemDb operations
> +//! for dependency injection and testing. Similar to `StatusOps` in pmxcfs-status.
> +
> +use crate::types::TreeEntry;
> +use anyhow::Result;
> +
> +/// Trait abstracting MemDb operations for dependency injection and mocking
> +///
> +/// This trait enables:
> +/// - Dependency injection of MemDb into components
> +/// - Testing with MockMemDb instead of real database
> +/// - Trait objects for runtime polymorphism
> +///
> +/// # Example
> +/// ```no_run
> +/// use pmxcfs_memdb::{MemDb, MemDbOps};
> +/// use std::sync::Arc;
> +///
> +/// fn use_database(db: Arc<dyn MemDbOps>) {
> +/// // Can work with real MemDb or MockMemDb
> +/// let exists = db.exists("/test").unwrap();
> +/// }
> +/// ```
> +pub trait MemDbOps: Send + Sync {
> + // ===== Basic File Operations =====
> +
> + /// Create a new file or directory
> + fn create(&self, path: &str, mode: u32, mtime: u32) -> Result<()>;
> +
> + /// Read data from a file
> + fn read(&self, path: &str, offset: u64, size: usize) -> Result<Vec<u8>>;
> +
> + /// Write data to a file
> + fn write(
> + &self,
> + path: &str,
> + offset: u64,
> + mtime: u32,
> + data: &[u8],
> + truncate: bool,
> + ) -> Result<usize>;
> +
> + /// Delete a file or directory
> + fn delete(&self, path: &str) -> Result<()>;
> +
> + /// Rename a file or directory
> + fn rename(&self, old_path: &str, new_path: &str) -> Result<()>;
> +
> + /// Check if a path exists
> + fn exists(&self, path: &str) -> Result<bool>;
> +
> + /// List directory contents
> + fn readdir(&self, path: &str) -> Result<Vec<TreeEntry>>;
> +
> + /// Set modification time
> + fn set_mtime(&self, path: &str, writer: u32, mtime: u32) -> Result<()>;
> +
> + // ===== Path Lookup =====
> +
> + /// Look up a path and return its entry
> + fn lookup_path(&self, path: &str) -> Option<TreeEntry>;
> +
> + /// Get entry by inode number
> + fn get_entry_by_inode(&self, inode: u64) -> Option<TreeEntry>;
> +
> + // ===== Lock Operations =====
> +
> + /// Acquire a lock on a path
> + fn acquire_lock(&self, path: &str, csum: &[u8; 32]) -> Result<()>;
> +
> + /// Release a lock on a path
> + fn release_lock(&self, path: &str, csum: &[u8; 32]) -> Result<()>;
> +
> + /// Check if a path is locked
> + fn is_locked(&self, path: &str) -> bool;
> +
> + /// Check if a lock has expired
> + fn lock_expired(&self, path: &str, csum: &[u8; 32]) -> bool;
> +
> + // ===== Database Operations =====
> +
> + /// Get the current database version
> + fn get_version(&self) -> u64;
> +
> + /// Get all entries in the database
> + fn get_all_entries(&self) -> Result<Vec<TreeEntry>>;
> +
> + /// Replace all entries (for synchronization)
> + fn replace_all_entries(&self, entries: Vec<TreeEntry>) -> Result<()>;
> +
> + /// Apply a single tree entry update
> + fn apply_tree_entry(&self, entry: TreeEntry) -> Result<()>;
> +
> + /// Encode the entire database for network transmission
> + fn encode_database(&self) -> Result<Vec<u8>>;
> +
> + /// Compute database checksum
> + fn compute_database_checksum(&self) -> Result<[u8; 32]>;
> +}
> diff --git a/src/pmxcfs-rs/pmxcfs-memdb/src/types.rs b/src/pmxcfs-rs/pmxcfs-memdb/src/types.rs
> new file mode 100644
> index 00000000..988596c8
> --- /dev/null
> +++ b/src/pmxcfs-rs/pmxcfs-memdb/src/types.rs
> @@ -0,0 +1,325 @@
> +/// Type definitions for memdb module
> +use sha2::{Digest, Sha256};
> +use std::collections::HashMap;
> +
> +pub(super) const MEMDB_MAX_FILE_SIZE: usize = 1024 * 1024; // 1 MiB (matches C version)
> +pub(super) const LOCK_TIMEOUT: u64 = 120; // Lock timeout in seconds
> +pub(super) const DT_DIR: u8 = 4; // Directory type
> +pub(super) const DT_REG: u8 = 8; // Regular file type
> +
> +/// Root inode number (matches C implementation's memdb root inode)
> +/// IMPORTANT: This is the MEMDB root inode, which is 0 in both C and Rust.
> +/// The FUSE layer exposes this as inode 1 to the filesystem (FUSE_ROOT_ID).
> +/// See pmxcfs/src/fuse.rs for the inode mapping logic between memdb and FUSE.
> +pub const ROOT_INODE: u64 = 0;
> +
> +/// Version file name (matches C VERSIONFILENAME)
> +/// Used to store root metadata as inode ROOT_INODE in the database
> +pub const VERSION_FILENAME: &str = "__version__";
> +
> +/// Lock directory path (where cluster resource locks are stored)
> +/// Locks are implemented as directory entries stored at `priv/lock/<lockname>`
> +pub const LOCK_DIR_PATH: &str = "priv/lock";
> +
> +/// Lock information for resource locking
> +///
> +/// In the C version (memdb.h:71-74), the lock info struct includes a `path` field
> +/// that serves as the hash table key. In Rust, we use `HashMap<String, LockInfo>`
> +/// where the path is stored as the HashMap key, so we don't duplicate it here.
> +#[derive(Clone, Debug)]
> +pub(crate) struct LockInfo {
> + /// Lock timestamp (seconds since UNIX epoch)
> + pub(crate) ltime: u64,
> +
> + /// Checksum of the locked resource (used to detect changes)
> + pub(crate) csum: [u8; 32],
> +}
> +
> +/// Tree entry representing a file or directory
> +#[derive(Clone, Debug, serde::Serialize, serde::Deserialize)]
> +pub struct TreeEntry {
> + pub inode: u64,
> + pub parent: u64,
> + pub version: u64,
> + pub writer: u32,
> + pub mtime: u32,
> + pub size: usize,
> + pub entry_type: u8, // DT_DIR or DT_REG
> + pub name: String,
> + pub data: Vec<u8>, // File data (empty for directories)
> +}
> +
> +impl TreeEntry {
> + pub fn is_dir(&self) -> bool {
> + self.entry_type == DT_DIR
> + }
> +
> + pub fn is_file(&self) -> bool {
> + self.entry_type == DT_REG
> + }
> +
> + /// Serialize TreeEntry to C-compatible wire format for Update messages
> + ///
> + /// Wire format (matches dcdb_send_update_inode):
> + /// ```c
> + /// [parent: u64][inode: u64][version: u64][writer: u32][mtime: u32]
> + /// [size: u32][namelen: u32][type: u8][name: namelen bytes][data: size bytes]
> + /// ```
> + pub fn serialize_for_update(&self) -> Vec<u8> {
> + let namelen = (self.name.len() + 1) as u32; // Include null terminator
> + let header_size = 8 + 8 + 8 + 4 + 4 + 4 + 4 + 1; // 41 bytes
> + let total_size = header_size + namelen as usize + self.data.len();
> +
> + let mut buf = Vec::with_capacity(total_size);
> +
> + // Header fields
> + buf.extend_from_slice(&self.parent.to_le_bytes());
> + buf.extend_from_slice(&self.inode.to_le_bytes());
> + buf.extend_from_slice(&self.version.to_le_bytes());
> + buf.extend_from_slice(&self.writer.to_le_bytes());
> + buf.extend_from_slice(&self.mtime.to_le_bytes());
> + buf.extend_from_slice(&(self.size as u32).to_le_bytes());
> + buf.extend_from_slice(&namelen.to_le_bytes());
> + buf.push(self.entry_type);
> +
> + // Name (null-terminated)
> + buf.extend_from_slice(self.name.as_bytes());
> + buf.push(0); // null terminator
> +
> + // Data (only for files)
> + if self.entry_type == DT_REG && !self.data.is_empty() {
> + buf.extend_from_slice(&self.data);
> + }
> +
> + buf
> + }
> +
> + /// Deserialize TreeEntry from C-compatible wire format
> + ///
> + /// Matches dcdb_parse_update_inode
> + pub fn deserialize_from_update(data: &[u8]) -> anyhow::Result<Self> {
> + if data.len() < 41 {
> + anyhow::bail!(
> + "Update message too short: {} bytes (need at least 41)",
> + data.len()
> + );
> + }
> +
> + let mut offset = 0;
> +
> + // Parse header
> + let parent = u64::from_le_bytes(data[offset..offset + 8].try_into().unwrap());
> + offset += 8;
> + let inode = u64::from_le_bytes(data[offset..offset + 8].try_into().unwrap());
> + offset += 8;
> + let version = u64::from_le_bytes(data[offset..offset + 8].try_into().unwrap());
> + offset += 8;
> + let writer = u32::from_le_bytes(data[offset..offset + 4].try_into().unwrap());
> + offset += 4;
> + let mtime = u32::from_le_bytes(data[offset..offset + 4].try_into().unwrap());
> + offset += 4;
> + let size = u32::from_le_bytes(data[offset..offset + 4].try_into().unwrap()) as usize;
> + offset += 4;
> + let namelen = u32::from_le_bytes(data[offset..offset + 4].try_into().unwrap()) as usize;
> + offset += 4;
> + let entry_type = data[offset];
> + offset += 1;
> +
> + // Validate type
> + if entry_type != DT_REG && entry_type != DT_DIR {
> + anyhow::bail!("Invalid entry type: {entry_type}");
> + }
> +
> + // Validate lengths
> + if data.len() < offset + namelen + size {
> + anyhow::bail!(
> + "Update message too short: {} bytes (need {})",
> + data.len(),
> + offset + namelen + size
> + );
> + }
> +
> + // Parse name (null-terminated)
> + let name_bytes = &data[offset..offset + namelen];
> + if name_bytes.is_empty() || name_bytes[namelen - 1] != 0 {
> + anyhow::bail!("Name not null-terminated");
> + }
> + let name = std::str::from_utf8(&name_bytes[..namelen - 1])
> + .map_err(|e| anyhow::anyhow!("Invalid UTF-8 in name: {e}"))?
> + .to_string();
> + offset += namelen;
> +
> + // Parse data
> + let data_vec = if entry_type == DT_REG && size > 0 {
> + data[offset..offset + size].to_vec()
> + } else {
> + Vec::new()
> + };
> +
> + Ok(TreeEntry {
> + inode,
> + parent,
> + version,
> + writer,
> + mtime,
> + size,
> + entry_type,
> + name,
> + data: data_vec,
> + })
> + }
> +
> + /// Compute SHA-256 checksum of this tree entry
> + ///
> + /// This checksum is used by the lock system to detect changes to lock directory entries.
> + /// Matches C version's memdb_tree_entry_csum() function (memdb.c:1389).
> + ///
> + /// The checksum includes all entry metadata (inode, parent, version, writer, mtime, size,
> + /// entry_type, name) and data (for files). This ensures any modification to a lock directory
> + /// entry is detected, triggering lock timeout reset.
Since C hashes the raw integer bytes (host byte order), should we use to_ne_bytes() here to match?
> + pub fn compute_checksum(&self) -> [u8; 32] {
> + let mut hasher = Sha256::new();
> +
> + // Hash entry metadata in the same order as C version
> + hasher.update(self.inode.to_le_bytes());
> + hasher.update(self.parent.to_le_bytes());
This seems to be hashed at the wrong position.
In C, parent comes 7th in the hashing order.
> + hasher.update(self.version.to_le_bytes());
> + hasher.update(self.writer.to_le_bytes());
> + hasher.update(self.mtime.to_le_bytes());
> + hasher.update(self.size.to_le_bytes());
C hashes only 4 bytes here (guint32), so I think this should be:
hasher.update((self.size as u32).to_le_bytes());
> + hasher.update([self.entry_type]);
> + hasher.update(self.name.as_bytes());
> +
> + // Hash data if present
> + if !self.data.is_empty() {
> + hasher.update(&self.data);
> + }
> +
> + hasher.finalize().into()
> + }
> +}
> +
> +/// Return type for load_from_db: (index, tree, root_inode, max_version)
> +pub(super) type LoadDbResult = (
> + HashMap<u64, TreeEntry>,
> + HashMap<u64, HashMap<String, u64>>,
> + u64,
> + u64,
> +);
> +
[..]
> +}
* Re: [pve-devel] [PATCH pve-cluster 06/15] pmxcfs-rs: add pmxcfs-status crate
@ 2026-02-02 16:07 5% ` Samuel Rufinatscha
0 siblings, 0 replies; 39+ results
From: Samuel Rufinatscha @ 2026-02-02 16:07 UTC (permalink / raw)
To: Proxmox VE development discussion, Kefu Chai
comments inline
On 1/6/26 3:25 PM, Kefu Chai wrote:
> Add cluster status tracking and monitoring:
> - Status: Central status container (thread-safe)
> - Cluster membership tracking
> - VM/CT registry with version tracking
> - RRD data management
> - Cluster log integration
> - Quorum state tracking
> - Configuration file version tracking
>
> This integrates pmxcfs-memdb, pmxcfs-rrd, pmxcfs-logger, and
> pmxcfs-api-types to provide centralized cluster state management.
> It also uses procfs for system metrics collection.
>
> Includes comprehensive unit tests for:
> - VM registration and deletion
> - Cluster membership updates
> - Version tracking
> - Configuration file monitoring
>
> Signed-off-by: Kefu Chai <k.chai@proxmox.com>
> ---
> src/pmxcfs-rs/Cargo.toml | 1 +
> src/pmxcfs-rs/pmxcfs-status/Cargo.toml | 40 +
> src/pmxcfs-rs/pmxcfs-status/README.md | 142 ++
> src/pmxcfs-rs/pmxcfs-status/src/lib.rs | 54 +
> src/pmxcfs-rs/pmxcfs-status/src/status.rs | 1561 +++++++++++++++++++++
> src/pmxcfs-rs/pmxcfs-status/src/traits.rs | 486 +++++++
> src/pmxcfs-rs/pmxcfs-status/src/types.rs | 62 +
> 7 files changed, 2346 insertions(+)
> create mode 100644 src/pmxcfs-rs/pmxcfs-status/Cargo.toml
> create mode 100644 src/pmxcfs-rs/pmxcfs-status/README.md
> create mode 100644 src/pmxcfs-rs/pmxcfs-status/src/lib.rs
> create mode 100644 src/pmxcfs-rs/pmxcfs-status/src/status.rs
> create mode 100644 src/pmxcfs-rs/pmxcfs-status/src/traits.rs
> create mode 100644 src/pmxcfs-rs/pmxcfs-status/src/types.rs
>
> diff --git a/src/pmxcfs-rs/Cargo.toml b/src/pmxcfs-rs/Cargo.toml
> index 2e41ac93..b5191c31 100644
> --- a/src/pmxcfs-rs/Cargo.toml
> +++ b/src/pmxcfs-rs/Cargo.toml
> @@ -6,6 +6,7 @@ members = [
> "pmxcfs-logger", # Cluster log with ring buffer and deduplication
> "pmxcfs-rrd", # RRD (Round-Robin Database) persistence
> "pmxcfs-memdb", # In-memory database with SQLite persistence
> + "pmxcfs-status", # Status monitoring and RRD data management
> ]
> resolver = "2"
>
> diff --git a/src/pmxcfs-rs/pmxcfs-status/Cargo.toml b/src/pmxcfs-rs/pmxcfs-status/Cargo.toml
> new file mode 100644
> index 00000000..e4a817d7
> --- /dev/null
> +++ b/src/pmxcfs-rs/pmxcfs-status/Cargo.toml
> @@ -0,0 +1,40 @@
> +[package]
> +name = "pmxcfs-status"
> +description = "Status monitoring and RRD data management for pmxcfs"
> +
> +version.workspace = true
> +edition.workspace = true
> +authors.workspace = true
> +license.workspace = true
> +repository.workspace = true
> +
> +[lints]
> +workspace = true
> +
> +[dependencies]
> +# Workspace dependencies
> +pmxcfs-api-types.workspace = true
> +pmxcfs-rrd.workspace = true
> +pmxcfs-memdb.workspace = true
> +pmxcfs-logger.workspace = true
> +
> +# Error handling
> +anyhow.workspace = true
> +
> +# Async runtime
> +tokio.workspace = true
> +
> +# Concurrency primitives
> +parking_lot.workspace = true
> +
> +# Logging
> +tracing.workspace = true
> +
> +# Utilities
> +chrono.workspace = true
This dependency (chrono) is not used and can be dropped.
> +
> +# System information (Linux /proc filesystem)
> +procfs = "0.17"
> +
> +[dev-dependencies]
> +tempfile.workspace = true
> diff --git a/src/pmxcfs-rs/pmxcfs-status/README.md b/src/pmxcfs-rs/pmxcfs-status/README.md
> new file mode 100644
> index 00000000..b6958af3
> --- /dev/null
> +++ b/src/pmxcfs-rs/pmxcfs-status/README.md
> @@ -0,0 +1,142 @@
> +# pmxcfs-status
> +
> +**Cluster Status** tracking and monitoring for pmxcfs.
> +
> +This crate manages all runtime cluster state information including membership, VM lists, node status, RRD metrics, and cluster logs. It serves as the central repository for dynamic cluster information that changes during runtime.
> +
> +## Overview
> +
> +The Status subsystem tracks:
> +- **Cluster membership**: Which nodes are in the cluster and their states
> +- **VM/CT tracking**: Registry of all virtual machines and containers
> +- **Node status**: Per-node health and resource information
> +- **RRD data**: Performance metrics (CPU, memory, disk, network)
> +- **Cluster log**: Centralized log aggregation
> +- **Quorum state**: Whether cluster has quorum
> +- **Version tracking**: Monitors configuration file changes
> +
> +## Usage
> +
> +### Initialization
> +
> +```rust
> +use pmxcfs_status;
> +
> +// For tests or when RRD persistence is not needed
> +let status = pmxcfs_status::init();
> +
> +// For production with RRD file persistence
> +let status = pmxcfs_status::init_with_rrd("/var/lib/rrdcached/db").await;
> +```
> +
> +The default `init()` is synchronous and doesn't require a directory parameter, making tests simpler. Use `init_with_rrd()` for production deployments that need RRD persistence.
> +
> +### Integration with Other Components
> +
> +**FUSE Plugins**:
> +- `.version` plugin reads from Status
> +- `.vmlist` plugin generates VM list from Status
> +- `.members` plugin generates member list from Status
> +- `.rrd` plugin accesses RRD data from Status
> +- `.clusterlog` plugin reads cluster log from Status
> +
> +**DFSM Status Sync**:
> +- `StatusSyncService` (pmxcfs-dfsm) broadcasts status updates
> +- Uses `pve_kvstore_v1` CPG group
> +- KV store data synchronized across nodes
> +
> +**IPC Server**:
> +- `set_status` IPC call updates Status
> +- Used by `pvecm`/`pvenode` tools
> +- RRD data received via IPC
> +
> +**MemDb Integration**:
> +- Scans VM configs to populate vmlist
> +- Tracks version changes on file modifications
> +- Used for `.version` plugin timestamps
> +
> +## Architecture
> +
> +### Module Structure
> +
> +| Module | Purpose |
> +|--------|---------|
> +| `lib.rs` | Public API and initialization |
> +| `status.rs` | Core Status struct and operations |
> +| `types.rs` | Type definitions (ClusterNode, ClusterInfo, etc.) |
> +
> +### Key Features
> +
> +**Thread-Safe**: All operations use `RwLock` or `AtomicU64` for concurrent access
> +**Version Tracking**: Monotonically increasing counters for change detection
> +**Structured Logging**: Field-based tracing for better observability
> +**Optional RRD**: RRD persistence is opt-in, simplifying testing
> +
> +## C to Rust Mapping
> +
> +### Data Structures
> +
> +| C Type | Rust Type | Notes |
> +|--------|-----------|-------|
> +| `cfs_status_t` | `Status` | Main status container |
> +| `cfs_clinfo_t` | `ClusterInfo` | Cluster membership info |
> +| `cfs_clnode_t` | `ClusterNode` | Individual node info |
> +| `vminfo_t` | `VmEntry` | VM/CT registry entry (in pmxcfs-api-types) |
> +| `clog_entry_t` | `ClusterLogEntry` | Cluster log entry |
> +
> +### Core Functions
> +
> +| C Function | Rust Equivalent | Notes |
> +|-----------|-----------------|-------|
> +| `cfs_status_init()` | `init()` or `init_with_rrd()` | Two variants for flexibility |
> +| `cfs_set_quorate()` | `Status::set_quorate()` | Quorum tracking |
> +| `cfs_is_quorate()` | `Status::is_quorate()` | Quorum checking |
> +| `vmlist_register_vm()` | `Status::register_vm()` | VM registration |
> +| `vmlist_delete_vm()` | `Status::delete_vm()` | VM deletion |
> +| `cfs_status_set()` | `Status::set_node_status()` | Status updates (including RRD) |
> +
> +## Key Differences from C Implementation
> +
> +### RRD Decoupling
> +
> +**C Version (status.c)**:
> +- RRD code embedded in status.c
> +- Async initialization always required
> +
> +**Rust Version**:
> +- Separate `pmxcfs-rrd` crate
> +- `init()` is synchronous (no RRD)
> +- `init_with_rrd()` is async (with RRD)
> +- Tests don't need temp directories
> +
> +### Concurrency
> +
> +**C Version**:
> +- Single `GMutex` for entire status structure
> +
> +**Rust Version**:
> +- Fine-grained `RwLock` for different data structures
> +- `AtomicU64` for version counters
> +- Better read parallelism
> +
> +## Configuration File Tracking
> +
> +Status tracks version numbers for these common Proxmox config files:
> +
> +- `corosync.conf`, `corosync.conf.new`
> +- `storage.cfg`, `user.cfg`, `domains.cfg`
> +- `datacenter.cfg`, `vzdump.cron`, `vzdump.conf`
> +- `ha/` directory files (crm_commands, manager_status, resources.cfg, etc.)
> +- `sdn/` directory files (vnets.cfg, zones.cfg, controllers.cfg, etc.)
> +- And many more (see `Status::new()` in status.rs for complete list)
> +
> +## References
> +
> +### C Implementation
> +- `src/pmxcfs/status.c` / `status.h` - Status tracking
> +
> +### Related Crates
> +- **pmxcfs-rrd**: RRD file persistence
> +- **pmxcfs-dfsm**: Status synchronization via StatusSyncService
> +- **pmxcfs-logger**: Cluster log implementation
> +- **pmxcfs**: FUSE plugins that read from Status
> diff --git a/src/pmxcfs-rs/pmxcfs-status/src/lib.rs b/src/pmxcfs-rs/pmxcfs-status/src/lib.rs
> new file mode 100644
> index 00000000..282e007d
> --- /dev/null
> +++ b/src/pmxcfs-rs/pmxcfs-status/src/lib.rs
> @@ -0,0 +1,54 @@
> +/// Status information and monitoring
> +///
> +/// This module manages:
> +/// - Cluster membership (nodes, IPs, online status)
> +/// - RRD (Round Robin Database) data for metrics
> +/// - Cluster log
> +/// - Node status information
> +/// - VM/CT list tracking
> +mod status;
> +mod traits;
> +mod types;
> +
> +// Re-export public types
> +pub use pmxcfs_api_types::{VmEntry, VmType};
> +pub use types::{ClusterInfo, ClusterLogEntry, ClusterNode, NodeStatus};
> +
> +// Re-export Status struct and trait
> +pub use status::Status;
> +pub use traits::{BoxFuture, MockStatus, StatusOps};
> +
> +use std::sync::Arc;
> +
> +/// Initialize status subsystem without RRD persistence
> +///
> +/// This is the default initialization that creates a Status instance
> +/// without file-based RRD persistence. RRD data will be kept in memory only.
> +pub fn init() -> Arc<Status> {
> + tracing::info!("Status subsystem initialized (RRD persistence disabled)");
> + Arc::new(Status::new(None))
> +}
> +
> +/// Initialize status subsystem with RRD file persistence
> +///
> +/// Creates a Status instance with RRD data written to disk in the specified directory.
> +/// This requires the RRD directory to exist and be writable.
> +pub async fn init_with_rrd<P: AsRef<std::path::Path>>(rrd_dir: P) -> Arc<Status> {
> + let rrd_dir_path = rrd_dir.as_ref();
> + let rrd_writer = match pmxcfs_rrd::RrdWriter::new(rrd_dir_path).await {
> + Ok(writer) => {
> + tracing::info!(
> + directory = %rrd_dir_path.display(),
> + "RRD file persistence enabled"
> + );
> + Some(writer)
> + }
> + Err(e) => {
> + tracing::warn!(error = %e, "RRD file persistence disabled");
> + None
> + }
> + };
> +
> + tracing::info!("Status subsystem initialized");
> + Arc::new(Status::new(rrd_writer))
> +}
> diff --git a/src/pmxcfs-rs/pmxcfs-status/src/status.rs b/src/pmxcfs-rs/pmxcfs-status/src/status.rs
> new file mode 100644
> index 00000000..94b6483d
> --- /dev/null
> +++ b/src/pmxcfs-rs/pmxcfs-status/src/status.rs
> @@ -0,0 +1,1561 @@
> +/// Status subsystem implementation
> +use crate::types::{ClusterInfo, ClusterLogEntry, ClusterNode, NodeStatus, RrdEntry};
> +use anyhow::Result;
> +use parking_lot::RwLock;
> +use pmxcfs_api_types::{VmEntry, VmType};
> +use std::collections::HashMap;
> +use std::sync::Arc;
> +use std::sync::atomic::{AtomicU64, Ordering};
> +use std::time::{SystemTime, UNIX_EPOCH};
> +
> +/// Status subsystem (matches C implementation's cfs_status_t)
> +pub struct Status {
> + /// Cluster information (nodes, membership) - matches C's clinfo
> + cluster_info: RwLock<Option<ClusterInfo>>,
> +
> + /// Cluster info version counter - increments on membership changes (matches C's clinfo_version)
> + cluster_version: AtomicU64,
This field is used as a change counter in multiple places, but gets
overwritten in update_cluster_info(). In C we have two separate
fields, clinfo_version and cman_version. These need to be separate
fields here as well, otherwise update_cluster_info() overwrites the
monotonic change counter that other call sites depend on.
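Something along these lines, sketched with the C field names (the exact layout here is just an assumption):

```rust
use std::sync::atomic::{AtomicU64, Ordering};

// Mirroring the C split: a monotonic change counter that only ever
// increments, plus the stored corosync config version that
// update_cluster_info() is allowed to overwrite.
struct Versions {
    clinfo_version: AtomicU64, // monotonic change counter
    cman_version: AtomicU64,   // last seen corosync config_version
}

impl Versions {
    fn record_change(&self) {
        self.clinfo_version.fetch_add(1, Ordering::SeqCst);
    }

    fn set_config_version(&self, v: u64) {
        self.cman_version.store(v, Ordering::SeqCst);
        self.record_change(); // a config change is also a change
    }
}
```

This way the change counter keeps increasing even when the same config version is stored twice.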
> +
> + /// VM list version counter - increments when VM list changes (matches C's vmlist_version)
> + vmlist_version: AtomicU64,
> +
> + /// MemDB path version counters (matches C's memdb_change_array)
> + /// Tracks versions for specific config files like "corosync.conf", "user.cfg", etc.
> + memdb_path_versions: RwLock<HashMap<String, AtomicU64>>,
> +
> + /// Node status data by name
> + node_status: RwLock<HashMap<String, NodeStatus>>,
> +
> + /// Cluster log with ring buffer and deduplication (matches C's clusterlog_t)
> + cluster_log: pmxcfs_logger::ClusterLog,
> +
> + /// RRD entries by key (e.g., "pve2-node/nodename" or "pve2.3-vm/vmid")
> + pub(crate) rrd_data: RwLock<HashMap<String, RrdEntry>>,
> +
> + /// RRD file writer for persistent storage (using tokio RwLock for async compatibility)
> + rrd_writer: Option<Arc<tokio::sync::RwLock<pmxcfs_rrd::RrdWriter>>>,
> +
> + /// VM/CT list (vmid -> VmEntry)
> + vmlist: RwLock<HashMap<u32, VmEntry>>,
> +
> + /// Quorum status (matches C's cfs_status.quorate)
> + quorate: RwLock<bool>,
> +
> + /// Current cluster members (CPG membership)
> + members: RwLock<Vec<pmxcfs_api_types::MemberInfo>>,
> +
> + /// Daemon start timestamp (UNIX epoch) - for .version plugin
> + start_time: u64,
> +
> + /// KV store data from nodes (nodeid -> key -> value)
> + /// Matches C implementation's kvhash
> + kvstore: RwLock<HashMap<u32, HashMap<String, Vec<u8>>>>,
C removes a kvstore entry when len == 0 and maintains a per-key
entry->version counter (incremented on overwrite). Our kvstore
currently stores only a Vec<u8> and doesn't reflect these semantics.
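A rough sketch of the C semantics (types simplified, names hypothetical):

```rust
use std::collections::HashMap;

// A kvstore value carrying the per-key version counter C maintains;
// an empty payload removes the entry instead of storing it.
struct KvEntry {
    version: u32,
    value: Vec<u8>,
}

fn kv_set(map: &mut HashMap<String, KvEntry>, key: &str, value: Vec<u8>) {
    if value.is_empty() {
        map.remove(key);
        return;
    }
    // Derive the next version from the existing entry, if any.
    let version = map.get(key).map(|e| e.version + 1).unwrap_or(1);
    map.insert(key.to_string(), KvEntry { version, value });
}
```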
> +}
> +
> +impl Status {
> + /// Create a new Status instance
> + ///
> + /// For production use with RRD persistence, use `pmxcfs_status::init_with_rrd()`.
> + /// For tests or when RRD persistence is not needed, use `pmxcfs_status::init()`.
> + /// This constructor is public to allow custom initialization patterns.
> + pub fn new(rrd_writer: Option<pmxcfs_rrd::RrdWriter>) -> Self {
> + // Wrap RrdWriter in Arc<tokio::sync::RwLock> if provided (for async compatibility)
> + let rrd_writer = rrd_writer.map(|w| Arc::new(tokio::sync::RwLock::new(w)));
> +
> + // Initialize memdb path versions for common Proxmox config files
> + // Matches C implementation's memdb_change_array (status.c:79-120)
> + // These are the exact paths tracked by the C implementation
> + let mut path_versions = HashMap::new();
> + let common_paths = vec![
> + "corosync.conf",
> + "corosync.conf.new",
> + "storage.cfg",
> + "user.cfg",
> + "domains.cfg",
> + "notifications.cfg",
> + "priv/notifications.cfg",
> + "priv/shadow.cfg",
> + "priv/acme/plugins.cfg",
> + "priv/tfa.cfg",
> + "priv/token.cfg",
> + "datacenter.cfg",
> + "vzdump.cron",
> + "vzdump.conf",
> + "jobs.cfg",
> + "ha/crm_commands",
> + "ha/manager_status",
> + "ha/resources.cfg",
> + "ha/rules.cfg",
> + "ha/groups.cfg",
> + "ha/fence.cfg",
> + "status.cfg",
> + "replication.cfg",
> + "ceph.conf",
> + "sdn/vnets.cfg",
> + "sdn/zones.cfg",
> + "sdn/controllers.cfg",
> + "sdn/subnets.cfg",
> + "sdn/ipams.cfg",
> + "sdn/mac-cache.json", // SDN MAC address cache
> + "sdn/pve-ipam-state.json", // SDN IPAM state
> + "sdn/dns.cfg", // SDN DNS configuration
> + "sdn/fabrics.cfg", // SDN fabrics configuration
> + "sdn/.running-config", // SDN running configuration
> + "virtual-guest/cpu-models.conf", // Virtual guest CPU models
> + "virtual-guest/profiles.cfg", // Virtual guest profiles
> + "firewall/cluster.fw", // Cluster firewall rules
> + "mapping/directory.cfg", // Directory mappings
> + "mapping/pci.cfg", // PCI device mappings
> + "mapping/usb.cfg", // USB device mappings
> + ];
> +
> + for path in common_paths {
> + path_versions.insert(path.to_string(), AtomicU64::new(0));
> + }
> +
> + // Get start time (matches C implementation's cfs_status.start_time)
> + let start_time = SystemTime::now()
> + .duration_since(UNIX_EPOCH)
> + .unwrap_or_default()
> + .as_secs();
> +
> + Self {
> + cluster_info: RwLock::new(None),
> + cluster_version: AtomicU64::new(1),
> + vmlist_version: AtomicU64::new(1),
> + memdb_path_versions: RwLock::new(path_versions),
> + node_status: RwLock::new(HashMap::new()),
> + cluster_log: pmxcfs_logger::ClusterLog::new(),
> + rrd_data: RwLock::new(HashMap::new()),
> + rrd_writer,
> + vmlist: RwLock::new(HashMap::new()),
> + quorate: RwLock::new(false),
> + members: RwLock::new(Vec::new()),
> + start_time,
> + kvstore: RwLock::new(HashMap::new()),
> + }
> + }
> +
> + /// Get node status
> + pub fn get_node_status(&self, name: &str) -> Option<NodeStatus> {
> + self.node_status.read().get(name).cloned()
> + }
> +
> + /// Set node status (matches C implementation's cfs_status_set)
> + ///
> + /// This handles status updates received via IPC from external clients.
> + /// If the key starts with "rrd/", it's RRD data that should be written to disk.
> + /// Otherwise, it's generic node status data.
> + pub async fn set_node_status(&self, name: String, data: Vec<u8>) -> Result<()> {
We need to check the payload size against CFS_MAX_STATUS_SIZE here, to
avoid accepting unbounded payloads (and possible state divergence from
the C implementation).
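For illustration, a minimal guard; the actual limit should be taken from the C headers (the 32 KiB constant below is an assumption):

```rust
// Value assumed here; use the constant from the C sources.
const CFS_MAX_STATUS_SIZE: usize = 32 * 1024;

/// Reject oversized status payloads up front.
fn check_status_size(data: &[u8]) -> Result<(), String> {
    if data.len() > CFS_MAX_STATUS_SIZE {
        return Err(format!(
            "status entry too large ({} > {} bytes)",
            data.len(),
            CFS_MAX_STATUS_SIZE
        ));
    }
    Ok(())
}
```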
> + // Check if this is RRD data (matching C's cfs_status_set behavior)
> + if let Some(rrd_key) = name.strip_prefix("rrd/") {
> + // Strip "rrd/" prefix to get the actual RRD key
> + // Convert data to string (RRD data is text format)
> + let data_str = String::from_utf8(data)
> + .map_err(|e| anyhow::anyhow!("Invalid UTF-8 in RRD data: {e}"))?;
We need to strip the trailing \0: C payloads are NUL-terminated and
from_utf8 preserves the terminator, so it would otherwise end up in
the RRD dump output.
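A minimal helper sketch (name hypothetical):

```rust
/// Strip a single trailing NUL byte before UTF-8 conversion, since C
/// status payloads are NUL-terminated and String::from_utf8 would
/// keep the terminator in the resulting string.
fn trim_trailing_nul(mut data: Vec<u8>) -> Vec<u8> {
    if data.last() == Some(&0) {
        data.pop();
    }
    data
}
```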
> +
> + // Write to RRD (stores in memory and writes to disk)
> + self.set_rrd_data(rrd_key.to_string(), data_str).await?;
> + } else {
nodeip handling is missing here; C has a dedicated branch for it. The
backing data structure (iphash) is also missing.
> + // Regular node status (not RRD)
> + let now = SystemTime::now().duration_since(UNIX_EPOCH)?.as_secs();
> + let status = NodeStatus {
> + name: name.clone(),
> + data,
> + timestamp: now,
> + };
> + self.node_status.write().insert(name, status);
> + }
> +
> + Ok(())
> + }
> +
> + /// Add cluster log entry
> + pub fn add_log_entry(&self, entry: ClusterLogEntry) {
> + // Convert ClusterLogEntry to ClusterLog format and add
> + // The ClusterLog handles size limits and deduplication internally
> + let _ = self.cluster_log.add(
> + &entry.node,
> + &entry.ident,
> + &entry.tag,
> + 0, // pid not tracked in our entries
> + entry.priority,
> + entry.timestamp as u32,
> + &entry.message,
> + );
> + }
> +
> + /// Get cluster log entries
> + pub fn get_log_entries(&self, max: usize) -> Vec<ClusterLogEntry> {
> + // Get entries from ClusterLog and convert to ClusterLogEntry
> + self.cluster_log
> + .get_entries(max)
> + .into_iter()
> + .map(|entry| ClusterLogEntry {
> + timestamp: entry.time as u64,
> + node: entry.node,
> + priority: entry.priority,
> + ident: entry.ident,
> + tag: entry.tag,
> + message: entry.message,
> + })
> + .collect()
> + }
> +
> + /// Clear all cluster log entries (for testing)
> + pub fn clear_cluster_log(&self) {
> + self.cluster_log.clear();
> + }
> +
> + /// Set RRD data (C-compatible format)
> + /// Key format: "pve2-node/{nodename}" or "pve2.3-vm/{vmid}"
> + /// Data format: "{timestamp}:{val1}:{val2}:..."
> + pub async fn set_rrd_data(&self, key: String, data: String) -> Result<()> {
> + let now = SystemTime::now()
> + .duration_since(UNIX_EPOCH)
> + .unwrap_or_default()
> + .as_secs();
> +
> + let entry = RrdEntry {
> + key: key.clone(),
> + data: data.clone(),
> + timestamp: now,
> + };
> +
> + // Store in memory for .rrd plugin file
> + self.rrd_data.write().insert(key.clone(), entry);
> +
> + // Also write to RRD file on disk (if persistence is enabled)
> + if let Some(writer_lock) = &self.rrd_writer {
> + let mut writer = writer_lock.write().await;
> + writer.update(&key, &data).await?;
> + tracing::trace!("Updated RRD file: {} -> {}", key, data);
> + }
> +
> + Ok(())
> + }
> +
> + /// Remove old RRD entries (older than 5 minutes)
> + pub fn remove_old_rrd_data(&self) {
> + let now = SystemTime::now()
> + .duration_since(UNIX_EPOCH)
> + .unwrap_or_default()
> + .as_secs();
> +
> + const EXPIRE_SECONDS: u64 = 60 * 5; // 5 minutes
> +
> + self.rrd_data
> + .write()
> + .retain(|_, entry| now - entry.timestamp <= EXPIRE_SECONDS);
If the system clock jumps backwards, now can be less than
entry.timestamp, and the u64 subtraction in the retain() above will
panic in debug builds (and wrap around in release builds);
saturating_sub would avoid this.
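A sketch of an expiry check that cannot underflow in that case:

```rust
const EXPIRE_SECONDS: u64 = 60 * 5; // 5 minutes, as in the patch

/// Expiry check that tolerates a backwards clock jump: when
/// `now < timestamp`, saturating_sub yields 0 and the entry is
/// simply treated as fresh instead of panicking/wrapping.
fn is_fresh(now: u64, timestamp: u64) -> bool {
    now.saturating_sub(timestamp) <= EXPIRE_SECONDS
}
```

The retain() closure would then become `retain(|_, entry| is_fresh(now, entry.timestamp))`.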
> + }
> +
> + /// Get RRD data dump (text format matching C implementation)
> + pub fn get_rrd_dump(&self) -> String {
This rebuilds the dump on every call and calls remove_old_rrd_data(),
which takes the write lock. The result could be cached for a short
time to improve performance, similar to what the C implementation
does.
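A possible shape for such a cache (the TTL value is just an example, not taken from C):

```rust
use std::time::{Duration, Instant};

// Time-based cache for the dump text: rebuild only when the cached
// copy is older than the TTL, otherwise hand out the stored string.
struct DumpCache {
    built_at: Option<Instant>,
    text: String,
}

impl DumpCache {
    const TTL: Duration = Duration::from_secs(2); // example value

    fn get_or_rebuild(&mut self, rebuild: impl FnOnce() -> String) -> &str {
        let stale = self.built_at.map_or(true, |t| t.elapsed() > Self::TTL);
        if stale {
            self.text = rebuild();
            self.built_at = Some(Instant::now());
        }
        &self.text
    }
}
```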
> + // Remove old entries first
> + self.remove_old_rrd_data();
> +
> + let rrd = self.rrd_data.read();
> + let mut result = String::new();
> +
> + for entry in rrd.values() {
> + result.push_str(&entry.key);
> + result.push(':');
> + result.push_str(&entry.data);
> + result.push('\n');
> + }
> +
> + result
> + }
> +
> + /// Collect disk I/O statistics (bytes read, bytes written)
> + ///
> + /// Note: This is for future VM RRD implementation. Per C implementation:
> + /// - Node RRD (rrd_def_node) has 12 fields and does NOT include diskread/diskwrite
> + /// - VM RRD (rrd_def_vm) has 10 fields and DOES include diskread/diskwrite at indices 8-9
> + ///
> + /// This method will be used when implementing VM RRD collection.
> + ///
> + /// # Sector Size
> + /// The Linux kernel reports disk statistics in /proc/diskstats using 512-byte sectors
> + /// as the standard unit, regardless of the device's actual physical sector size.
> + /// This is a kernel reporting convention (see Documentation/admin-guide/iostats.rst).
> + #[allow(dead_code)]
> + fn collect_disk_io() -> Result<(u64, u64)> {
> + // /proc/diskstats always uses 512-byte sectors (kernel convention)
> + const DISKSTATS_SECTOR_SIZE: u64 = 512;
> +
> + let diskstats = procfs::diskstats()?;
> +
> + let mut total_read = 0u64;
> + let mut total_write = 0u64;
> +
> + for stat in diskstats {
> + // Skip partitions (only look at whole disks: sda, vda, etc.)
> + if stat
> + .name
> + .chars()
> + .last()
> + .map(|c| c.is_numeric())
> + .unwrap_or(false)
> + {
> + continue;
> + }
> +
> + // Convert sectors to bytes using kernel's reporting unit
> + total_read += stat.sectors_read * DISKSTATS_SECTOR_SIZE;
> + total_write += stat.sectors_written * DISKSTATS_SECTOR_SIZE;
> + }
> +
> + Ok((total_read, total_write))
> + }
> +
> + /// Register a VM/CT
> + pub fn register_vm(&self, vmid: u32, vmtype: VmType, node: String) {
> + tracing::debug!(vmid, vmtype = ?vmtype, node = %node, "Registered VM");
> +
> + // Get existing VM version or start at 1
> + let version = self
> + .vmlist
> + .read()
> + .get(&vmid)
> + .map(|vm| vm.version + 1)
In C we have the global static uint32_t vminfo_version_counter, while
here we have per-VM counters. Why the difference? Wouldn't it be more
helpful to have a global order here as well, so that the update order
of VMs can be determined from it?
> + .unwrap_or(1);
> +
> + let entry = VmEntry {
> + vmid,
> + vmtype,
> + node,
> + version,
> + };
> + self.vmlist.write().insert(vmid, entry);
Between the read() and the write() we have a TOCTOU window, similar
to the one in set_quorate().
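Sketch of doing both under one write lock (types heavily simplified, and using std::sync::RwLock here instead of parking_lot):

```rust
use std::collections::HashMap;
use std::sync::RwLock;

// Closing the read/write race: take the write lock once and derive
// the next per-VM version through the same guard.
struct Vmlist {
    inner: RwLock<HashMap<u32, u64>>, // vmid -> version (simplified)
}

impl Vmlist {
    fn register_vm(&self, vmid: u32) -> u64 {
        let mut map = self.inner.write().unwrap();
        let version = map.get(&vmid).map(|v| v + 1).unwrap_or(1);
        map.insert(vmid, version);
        version
    }
}
```

With parking_lot the `.unwrap()` would go away, but the structure is the same.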
> +
> + // Increment vmlist version counter
> + self.increment_vmlist_version();
> + }
> +
> + /// Delete a VM/CT
> + pub fn delete_vm(&self, vmid: u32) {
> + if self.vmlist.write().remove(&vmid).is_some() {
This should bump the vmlist version unconditionally to match C.
> + tracing::debug!(vmid, "Deleted VM");
> +
> + // Increment vmlist version counter
> + self.increment_vmlist_version();
> + }
> + }
> +
> + /// Check if VM/CT exists
> + pub fn vm_exists(&self, vmid: u32) -> bool {
> + self.vmlist.read().contains_key(&vmid)
> + }
> +
> + /// Check if a different VM/CT exists (different node or type)
> + pub fn different_vm_exists(&self, vmid: u32, vmtype: VmType, node: &str) -> bool {
> + if let Some(entry) = self.vmlist.read().get(&vmid) {
> + entry.vmtype != vmtype || entry.node != node
> + } else {
> + false
> + }
> + }
> +
> + /// Get VM list
> + pub fn get_vmlist(&self) -> HashMap<u32, VmEntry> {
> + self.vmlist.read().clone()
> + }
> +
> + /// Scan directories for VMs/CTs and update vmlist
> + ///
> + /// Uses memdb's `recreate_vmlist()` to properly scan nodes/*/qemu-server/
> + /// and nodes/*/lxc/ directories to track which node each VM belongs to.
> + pub fn scan_vmlist(&self, memdb: &pmxcfs_memdb::MemDb) {
> + // Use the proper recreate_vmlist from memdb which scans nodes/*/qemu-server/ and nodes/*/lxc/
> + match pmxcfs_memdb::recreate_vmlist(memdb) {
> + Ok(new_vmlist) => {
> + let vmlist_len = new_vmlist.len();
> + let mut vmlist = self.vmlist.write();
> + *vmlist = new_vmlist;
This replaces the entire HashMap, which resets all per-VM version
counters.
> + drop(vmlist);
> +
> + tracing::info!(vms = vmlist_len, "VM list scan complete");
> +
> + // Increment vmlist version counter
> + self.increment_vmlist_version();
> + }
> + Err(err) => {
> + tracing::error!(error = %err, "Failed to recreate vmlist");
> + }
> + }
> + }
> +
> + /// Initialize cluster information with cluster name
> + pub fn init_cluster(&self, cluster_name: String) {
> + let info = ClusterInfo::new(cluster_name);
> + *self.cluster_info.write() = Some(info);
> + self.cluster_version.fetch_add(1, Ordering::SeqCst);
> + }
> +
> + /// Register a node in the cluster (name, ID, IP)
> + pub fn register_node(&self, node_id: u32, name: String, ip: String) {
> + tracing::debug!(node_id, node = %name, ip = %ip, "Registering cluster node");
> +
> + let mut cluster_info = self.cluster_info.write();
> + if let Some(ref mut info) = *cluster_info {
> + let node = ClusterNode {
> + name,
> + node_id,
> + ip,
> + online: false, // Will be updated by cluster module
> + };
> + info.add_node(node);
> + self.cluster_version.fetch_add(1, Ordering::SeqCst);
> + }
> + }
> +
> + /// Get cluster information (for .members plugin)
> + pub fn get_cluster_info(&self) -> Option<ClusterInfo> {
> + self.cluster_info.read().clone()
> + }
> +
> + /// Get cluster version
> + pub fn get_cluster_version(&self) -> u64 {
> + self.cluster_version.load(Ordering::SeqCst)
> + }
> +
> + /// Increment cluster version (called when membership changes)
> + pub fn increment_cluster_version(&self) {
> + self.cluster_version.fetch_add(1, Ordering::SeqCst);
> + }
> +
> + /// Update cluster info from CMAP (called by ClusterConfigService)
> + pub fn update_cluster_info(
> + &self,
> + cluster_name: String,
> + config_version: u64,
> + nodes: Vec<(u32, String, String)>,
> + ) -> Result<()> {
> + let mut cluster_info = self.cluster_info.write();
> +
> + // Create or update cluster info
> + let mut info = cluster_info
> + .take()
> + .unwrap_or_else(|| ClusterInfo::new(cluster_name.clone()));
> +
> + // Update cluster name if changed
> + if info.cluster_name != cluster_name {
> + info.cluster_name = cluster_name;
> + }
> +
> + // Clear existing nodes
> + info.nodes_by_id.clear();
> + info.nodes_by_name.clear();
> +
> + // Add updated nodes
> + for (nodeid, name, ip) in nodes {
> + let node = ClusterNode {
> + name: name.clone(),
> + node_id: nodeid,
> + ip,
> + online: false, // Will be updated by quorum module
This drops the online status. C's cfs_status_set_clinfo() preserves
it by copying it over from oldnode; this needs the same treatment
here.
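A sketch of carrying the flag over (types simplified):

```rust
use std::collections::HashMap;

struct Node {
    online: bool,
}

// Rebuild the node map from the new config, but preserve the online
// flag from the old entry (as cfs_status_set_clinfo does via oldnode);
// nodes new to the config start offline.
fn rebuild_nodes(old: &HashMap<u32, Node>, ids: &[u32]) -> HashMap<u32, Node> {
    ids.iter()
        .map(|&id| {
            let online = old.get(&id).map(|n| n.online).unwrap_or(false);
            (id, Node { online })
        })
        .collect()
}
```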
> + };
> + info.add_node(node);
> + }
Do we need to cleanup kvstore on node removal?
> +
> + *cluster_info = Some(info);
> +
> + // Update version to reflect configuration change
> + self.cluster_version.store(config_version, Ordering::SeqCst);
> +
> + tracing::info!(version = config_version, "Updated cluster configuration");
> + Ok(())
> + }
> +
> + /// Update node online status (called by cluster module)
> + pub fn set_node_online(&self, node_id: u32, online: bool) {
> + let mut cluster_info = self.cluster_info.write();
> + if let Some(ref mut info) = *cluster_info
> + && let Some(node) = info.nodes_by_id.get_mut(&node_id)
> + && node.online != online
> + {
> + node.online = online;
> + // Also update in nodes_by_name
> + if let Some(name_node) = info.nodes_by_name.get_mut(&node.name) {
> + name_node.online = online;
> + }
> + self.cluster_version.fetch_add(1, Ordering::SeqCst);
> + tracing::debug!(
> + node = %node.name,
> + node_id,
> + online = if online { "true" } else { "false" },
> + "Node online status changed"
> + );
> + }
> + }
> +
> + /// Check if cluster is quorate (matches C's cfs_is_quorate)
> + pub fn is_quorate(&self) -> bool {
> + *self.quorate.read()
> + }
> +
> + /// Set quorum status (matches C's cfs_set_quorate)
> + pub fn set_quorate(&self, quorate: bool) {
> + let old_quorate = *self.quorate.read();
Between this line
> + *self.quorate.write() = quorate;
and this line we have a TOCTOU window: the * copies the bool out of
the read guard, and the guard is dropped at the semicolon, so no lock
is held between the two statements. Performing both operations under
the write lock would fix this.
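For example (using std::sync::RwLock here instead of parking_lot):

```rust
use std::sync::RwLock;

/// Read the old value through the same write guard that stores the
/// new one, so there is no unlocked window between read and write.
/// Returns whether the value actually changed (for logging).
fn set_quorate(quorate_lock: &RwLock<bool>, quorate: bool) -> bool {
    let mut guard = quorate_lock.write().unwrap();
    let changed = *guard != quorate;
    *guard = quorate;
    changed
}
```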
> +
> + if old_quorate != quorate {
> + if quorate {
> + tracing::info!("Node has quorum");
> + } else {
> + tracing::info!("Node lost quorum");
> + }
> + }
> + }
> +
> + /// Get current cluster members (CPG membership)
> + pub fn get_members(&self) -> Vec<pmxcfs_api_types::MemberInfo> {
> + self.members.read().clone()
> + }
> +
> + /// Update cluster members and sync online status (matches C's dfsm_confchg callback)
> + ///
> + /// This updates the CPG member list and synchronizes the online status
> + /// in cluster_info to match current membership.
> + pub fn update_members(&self, members: Vec<pmxcfs_api_types::MemberInfo>) {
> + *self.members.write() = members.clone();
> +
> + // Update online status in cluster_info based on members
> + // (matches C implementation's dfsm_confchg in status.c:1989-2025)
> + let mut cluster_info = self.cluster_info.write();
> + if let Some(ref mut info) = *cluster_info {
> + // First mark all nodes as offline
> + for node in info.nodes_by_id.values_mut() {
> + node.online = false;
> + }
> + for node in info.nodes_by_name.values_mut() {
> + node.online = false;
> + }
> +
> + // Then mark active members as online
> + for member in &members {
> + if let Some(node) = info.nodes_by_id.get_mut(&member.node_id) {
> + node.online = true;
> + // Also update in nodes_by_name
> + if let Some(name_node) = info.nodes_by_name.get_mut(&node.name) {
> + name_node.online = true;
> + }
> + }
> + }
> +
> + self.cluster_version.fetch_add(1, Ordering::SeqCst);
> + }
> + }
> +
> + /// Get daemon start timestamp (for .version plugin)
> + pub fn get_start_time(&self) -> u64 {
> + self.start_time
> + }
> +
> + /// Increment VM list version (matches C's cfs_status.vmlist_version++)
> + pub fn increment_vmlist_version(&self) {
> + self.vmlist_version.fetch_add(1, Ordering::SeqCst);
> + }
> +
> + /// Get VM list version
> + pub fn get_vmlist_version(&self) -> u64 {
> + self.vmlist_version.load(Ordering::SeqCst)
> + }
> +
> + /// Increment version for a specific memdb path (matches C's record_memdb_change)
> + pub fn increment_path_version(&self, path: &str) {
> + let versions = self.memdb_path_versions.read();
> + if let Some(counter) = versions.get(path) {
> + counter.fetch_add(1, Ordering::SeqCst);
> + }
> + }
> +
> + /// Get version for a specific memdb path
> + pub fn get_path_version(&self, path: &str) -> u64 {
> + let versions = self.memdb_path_versions.read();
> + versions
> + .get(path)
> + .map(|counter| counter.load(Ordering::SeqCst))
> + .unwrap_or(0)
> + }
> +
> + /// Get all memdb path versions (for .version plugin)
> + pub fn get_all_path_versions(&self) -> HashMap<String, u64> {
> + let versions = self.memdb_path_versions.read();
> + versions
> + .iter()
> + .map(|(path, counter)| (path.clone(), counter.load(Ordering::SeqCst)))
> + .collect()
> + }
> +
> + /// Increment ALL configuration file versions (matches C's record_memdb_reload)
> + ///
> + /// Called when the entire database is reloaded from cluster peers.
> + /// This ensures clients know that all configuration files should be re-read.
> + pub fn increment_all_path_versions(&self) {
> + let versions = self.memdb_path_versions.read();
> + for (_, counter) in versions.iter() {
> + counter.fetch_add(1, Ordering::SeqCst);
> + }
> + }
> +
> + /// Set key-value data from a node (kvstore DFSM)
> + ///
> + /// Matches C implementation's cfs_kvstore_node_set in status.c.
> + /// Stores ephemeral status data like RRD metrics, IP addresses, etc.
> + pub fn set_node_kv(&self, nodeid: u32, key: String, value: Vec<u8>) {
We accept unknown nodeids here; maybe something like this would work:
let cluster_info = self.cluster_info.read();
match &*cluster_info {
Some(info) if info.nodes_by_id.contains_key(&nodeid) => {},
_ => return,
}
drop(cluster_info);
Also, shouldn't we have the same three checks here that set_node_status
should have? Basically:
if let Some(rrd_key) = key.strip_prefix("rrd/") {
..
} else if key == "nodeip" {
..
} else {
..
}
> + let mut kvstore = self.kvstore.write();
> + kvstore.entry(nodeid).or_default().insert(key, value);
> + }
> +
> + /// Get key-value data from a node
> + pub fn get_node_kv(&self, nodeid: u32, key: &str) -> Option<Vec<u8>> {
> + let kvstore = self.kvstore.read();
> + kvstore.get(&nodeid)?.get(key).cloned()
> + }
> +
> + /// Add cluster log entry (called by kvstore DFSM)
> + ///
> + /// This is the wrapper for kvstore LOG messages.
> + /// Matches C implementation's clusterlog_insert call.
> + pub fn add_cluster_log(
> + &self,
> + timestamp: u32,
> + priority: u8,
> + tag: String,
> + node: String,
> + message: String,
> + ) {
> + let entry = ClusterLogEntry {
> + timestamp: timestamp as u64,
> + node,
> + priority,
> + ident: String::new(), // Not used in kvstore messages
> + tag,
> + message,
> + };
> + self.add_log_entry(entry);
> + }
> +
> + /// Update node online status based on CPG membership (kvstore DFSM confchg callback)
> + ///
> + /// This is called when kvstore CPG membership changes.
> + /// Matches C implementation's dfsm_confchg in status.c.
> + pub fn update_member_status(&self, member_list: &[u32]) {
> + let mut cluster_info = self.cluster_info.write();
> + if let Some(ref mut info) = *cluster_info {
> + // Mark all nodes as offline
> + for node in info.nodes_by_id.values_mut() {
> + node.online = false;
> + }
> + for node in info.nodes_by_name.values_mut() {
> + node.online = false;
> + }
> +
> + // Mark nodes in member_list as online
> + for &nodeid in member_list {
> + if let Some(node) = info.nodes_by_id.get_mut(&nodeid) {
> + node.online = true;
> + // Also update in nodes_by_name
> + if let Some(name_node) = info.nodes_by_name.get_mut(&node.name) {
> + name_node.online = true;
> + }
> + }
> + }
> +
> + self.cluster_version.fetch_add(1, Ordering::SeqCst);
> + }
> + }
> +
> + /// Get cluster log state (for DFSM synchronization)
> + ///
> + /// Returns the cluster log in C-compatible binary format (clog_base_t).
> + /// Matches C implementation's clusterlog_get_state() in logger.c:553-571.
> + pub fn get_cluster_log_state(&self) -> Result<Vec<u8>> {
> + self.cluster_log.get_state()
> + }
> +
> + /// Merge cluster log states from remote nodes
> + ///
> + /// Deserializes binary states from remote nodes and merges them with the local log.
> + /// Matches C implementation's dfsm_process_state_update() in status.c:2049-2074.
> + pub fn merge_cluster_log_states(
> + &self,
> + states: &[pmxcfs_api_types::NodeSyncInfo],
> + ) -> Result<()> {
> + use pmxcfs_logger::ClusterLog;
> +
> + let mut remote_logs = Vec::new();
> +
> + for state_info in states {
> + // Check if this node has state data
> + let state_data = match &state_info.state {
> + Some(data) if !data.is_empty() => data,
> + _ => continue,
> + };
> +
> + match ClusterLog::deserialize_state(state_data) {
> + Ok(ring_buffer) => {
> + tracing::debug!(
> + "Deserialized cluster log from node {}: {} entries",
> + state_info.nodeid,
> + ring_buffer.len()
> + );
> + remote_logs.push(ring_buffer);
> + }
> + Err(e) => {
> + tracing::warn!(
> + nodeid = state_info.nodeid,
> + error = %e,
> + "Failed to deserialize cluster log from node"
> + );
> + }
> + }
> + }
> +
> + if !remote_logs.is_empty() {
> + // Merge remote logs with local log (include_local = true)
> + match self.cluster_log.merge(remote_logs, true) {
> + Ok(merged) => {
> + // Update our buffer with the merged result
> + self.cluster_log.update_buffer(merged);
> + tracing::debug!("Successfully merged cluster logs");
> + }
> + Err(e) => {
> + tracing::error!(error = %e, "Failed to merge cluster logs");
> + }
> + }
> + }
> +
> + Ok(())
> + }
> +
> + /// Add cluster log entry from remote node (kvstore LOG message)
> + ///
> + /// Matches C implementation's clusterlog_insert() via kvstore message handling.
> + pub fn add_remote_cluster_log(
> + &self,
> + time: u32,
> + priority: u8,
> + node: String,
> + ident: String,
> + tag: String,
> + message: String,
> + ) -> Result<()> {
> + self.cluster_log
> + .add(&node, &ident, &tag, 0, priority, time, &message)?;
> + Ok(())
> + }
> +}
> +
> +// Implement StatusOps trait for Status
> +impl crate::traits::StatusOps for Status {
> + fn get_node_status(&self, name: &str) -> Option<NodeStatus> {
> + self.get_node_status(name)
> + }
> +
> + fn set_node_status<'a>(
> + &'a self,
> + name: String,
> + data: Vec<u8>,
> + ) -> crate::traits::BoxFuture<'a, Result<()>> {
> + Box::pin(self.set_node_status(name, data))
> + }
> +
> + fn add_log_entry(&self, entry: ClusterLogEntry) {
> + self.add_log_entry(entry)
> + }
> +
> + fn get_log_entries(&self, max: usize) -> Vec<ClusterLogEntry> {
> + self.get_log_entries(max)
> + }
> +
> + fn clear_cluster_log(&self) {
> + self.clear_cluster_log()
> + }
> +
> + fn add_cluster_log(
> + &self,
> + timestamp: u32,
> + priority: u8,
> + tag: String,
> + node: String,
> + msg: String,
> + ) {
> + self.add_cluster_log(timestamp, priority, tag, node, msg)
> + }
> +
> + fn get_cluster_log_state(&self) -> Result<Vec<u8>> {
> + self.get_cluster_log_state()
> + }
> +
> + fn merge_cluster_log_states(&self, states: &[pmxcfs_api_types::NodeSyncInfo]) -> Result<()> {
> + self.merge_cluster_log_states(states)
> + }
> +
> + fn add_remote_cluster_log(
> + &self,
> + time: u32,
> + priority: u8,
> + node: String,
> + ident: String,
> + tag: String,
> + message: String,
> + ) -> Result<()> {
> + self.add_remote_cluster_log(time, priority, node, ident, tag, message)
> + }
> +
> + fn set_rrd_data<'a>(
> + &'a self,
> + key: String,
> + data: String,
> + ) -> crate::traits::BoxFuture<'a, Result<()>> {
> + Box::pin(self.set_rrd_data(key, data))
> + }
> +
> + fn remove_old_rrd_data(&self) {
> + self.remove_old_rrd_data()
> + }
> +
> + fn get_rrd_dump(&self) -> String {
> + self.get_rrd_dump()
> + }
> +
> + fn register_vm(&self, vmid: u32, vmtype: VmType, node: String) {
> + self.register_vm(vmid, vmtype, node)
> + }
> +
> + fn delete_vm(&self, vmid: u32) {
> + self.delete_vm(vmid)
> + }
> +
> + fn vm_exists(&self, vmid: u32) -> bool {
> + self.vm_exists(vmid)
> + }
> +
> + fn different_vm_exists(&self, vmid: u32, vmtype: VmType, node: &str) -> bool {
> + self.different_vm_exists(vmid, vmtype, node)
> + }
> +
> + fn get_vmlist(&self) -> HashMap<u32, VmEntry> {
> + self.get_vmlist()
> + }
> +
> + fn scan_vmlist(&self, memdb: &pmxcfs_memdb::MemDb) {
> + self.scan_vmlist(memdb)
> + }
> +
> + fn init_cluster(&self, cluster_name: String) {
> + self.init_cluster(cluster_name)
> + }
> +
> + fn register_node(&self, node_id: u32, name: String, ip: String) {
> + self.register_node(node_id, name, ip)
> + }
> +
> + fn get_cluster_info(&self) -> Option<ClusterInfo> {
> + self.get_cluster_info()
> + }
> +
> + fn get_cluster_version(&self) -> u64 {
> + self.get_cluster_version()
> + }
> +
> + fn increment_cluster_version(&self) {
> + self.increment_cluster_version()
> + }
> +
> + fn update_cluster_info(
> + &self,
> + cluster_name: String,
> + config_version: u64,
> + nodes: Vec<(u32, String, String)>,
> + ) -> Result<()> {
> + self.update_cluster_info(cluster_name, config_version, nodes)
> + }
> +
> + fn set_node_online(&self, node_id: u32, online: bool) {
> + self.set_node_online(node_id, online)
> + }
> +
> + fn is_quorate(&self) -> bool {
> + self.is_quorate()
> + }
> +
> + fn set_quorate(&self, quorate: bool) {
> + self.set_quorate(quorate)
> + }
> +
> + fn get_members(&self) -> Vec<pmxcfs_api_types::MemberInfo> {
> + self.get_members()
> + }
> +
> + fn update_members(&self, members: Vec<pmxcfs_api_types::MemberInfo>) {
> + self.update_members(members)
> + }
> +
> + fn update_member_status(&self, member_list: &[u32]) {
> + self.update_member_status(member_list)
> + }
> +
> + fn get_start_time(&self) -> u64 {
> + self.get_start_time()
> + }
> +
> + fn increment_vmlist_version(&self) {
> + self.increment_vmlist_version()
> + }
> +
> + fn get_vmlist_version(&self) -> u64 {
> + self.get_vmlist_version()
> + }
> +
> + fn increment_path_version(&self, path: &str) {
> + self.increment_path_version(path)
> + }
> +
> + fn get_path_version(&self, path: &str) -> u64 {
> + self.get_path_version(path)
> + }
> +
> + fn get_all_path_versions(&self) -> HashMap<String, u64> {
> + self.get_all_path_versions()
> + }
> +
> + fn increment_all_path_versions(&self) {
> + self.increment_all_path_versions()
> + }
> +
> + fn set_node_kv(&self, nodeid: u32, key: String, value: Vec<u8>) {
> + self.set_node_kv(nodeid, key, value)
> + }
> +
> + fn get_node_kv(&self, nodeid: u32, key: &str) -> Option<Vec<u8>> {
> + self.get_node_kv(nodeid, key)
> + }
> +}
> +
> +#[cfg(test)]
> +mod tests {
[..]
> +}
> diff --git a/src/pmxcfs-rs/pmxcfs-status/src/traits.rs b/src/pmxcfs-rs/pmxcfs-status/src/traits.rs
[..]
> diff --git a/src/pmxcfs-rs/pmxcfs-status/src/types.rs b/src/pmxcfs-rs/pmxcfs-status/src/types.rs
> new file mode 100644
> index 00000000..393ce63a
> --- /dev/null
> +++ b/src/pmxcfs-rs/pmxcfs-status/src/types.rs
> @@ -0,0 +1,62 @@
> +/// Data types for the status module
> +use std::collections::HashMap;
> +
> +/// Cluster node information (matches C implementation's cfs_clnode_t)
> +#[derive(Debug, Clone)]
> +pub struct ClusterNode {
> + pub name: String,
> + pub node_id: u32,
> + pub ip: String,
> + pub online: bool,
> +}
> +
> +/// Cluster information (matches C implementation's cfs_clinfo_t)
> +#[derive(Debug, Clone)]
> +pub struct ClusterInfo {
> + pub cluster_name: String,
> + pub nodes_by_id: HashMap<u32, ClusterNode>,
> + pub nodes_by_name: HashMap<String, ClusterNode>,
Mutation sites have to remember to update both maps, which is easy to
get wrong. A safer pattern would be to make nodes_by_name just an index:
pub nodes_by_id: HashMap<u32, ClusterNode>,
pub nodes_by_name: HashMap<String, u32>,
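A sketch of that index pattern (node_by_name_mut is a hypothetical helper
name), where the node data lives in exactly one map and name lookups go
through the id:

```rust
use std::collections::HashMap;

#[derive(Debug, Clone)]
struct ClusterNode {
    name: String,
    node_id: u32,
    online: bool,
}

#[derive(Default)]
struct ClusterInfo {
    // Single source of truth for node data.
    nodes_by_id: HashMap<u32, ClusterNode>,
    // Pure index: name -> id, never a second copy of the node.
    nodes_by_name: HashMap<String, u32>,
}

impl ClusterInfo {
    fn add_node(&mut self, node: ClusterNode) {
        self.nodes_by_name.insert(node.name.clone(), node.node_id);
        self.nodes_by_id.insert(node.node_id, node);
    }

    /// Mutable lookup by name via the index; mutation sites then only
    /// ever touch the one ClusterNode stored in nodes_by_id.
    fn node_by_name_mut(&mut self, name: &str) -> Option<&mut ClusterNode> {
        let id = *self.nodes_by_name.get(name)?;
        self.nodes_by_id.get_mut(&id)
    }
}
```

With that, set_node_online and the confchg handlers would update the
online flag in one place instead of two.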
> +}
> +
> +impl ClusterInfo {
> + pub(crate) fn new(cluster_name: String) -> Self {
> + Self {
> + cluster_name,
> + nodes_by_id: HashMap::new(),
> + nodes_by_name: HashMap::new(),
> + }
> + }
> +
> + /// Add or update a node in the cluster
> + pub(crate) fn add_node(&mut self, node: ClusterNode) {
> + self.nodes_by_name.insert(node.name.clone(), node.clone());
> + self.nodes_by_id.insert(node.node_id, node);
> + }
> +}
> +
> +/// Node status data
> +#[derive(Clone, Debug)]
> +pub struct NodeStatus {
> + pub name: String,
> + pub data: Vec<u8>,
> + pub timestamp: u64,
> +}
> +
> +/// Cluster log entry
> +#[derive(Clone, Debug)]
> +pub struct ClusterLogEntry {
> + pub timestamp: u64,
> + pub node: String,
> + pub priority: u8,
> + pub ident: String,
> + pub tag: String,
> + pub message: String,
> +}
> +
> +/// RRD (Round Robin Database) entry
> +#[derive(Clone, Debug)]
> +pub(crate) struct RrdEntry {
> + pub key: String,
> + pub data: String,
> + pub timestamp: u64,
> +}
* Re: [pve-devel] [PATCH pve-cluster 07/15] pmxcfs-rs: add pmxcfs-test-utils infrastructure crate
@ 2026-02-03 17:03 ` Samuel Rufinatscha
0 siblings, 0 replies
From: Samuel Rufinatscha @ 2026-02-03 17:03 UTC (permalink / raw)
To: Proxmox VE development discussion, Kefu Chai; +Cc: Kefu Chai
Thanks for the patch, having shared test utilities in a dedicated crate
makes a lot of sense.
Comments inline.
On 1/6/26 3:25 PM, Kefu Chai wrote:
> From: Kefu Chai <tchaikov@gmail.com>
>
> This commit introduces a dedicated testing infrastructure crate to support
> comprehensive unit and integration testing across the pmxcfs-rs workspace.
>
> Why a dedicated crate?
> - Provides shared test utilities without creating circular dependencies
> - Enables consistent test patterns across all pmxcfs crates
> - Centralizes mock implementations for dependency injection
>
> What this crate provides:
> 1. MockMemDb: Fast, in-memory implementation of MemDbOps trait
> - Eliminates SQLite I/O overhead in unit tests (~100x faster)
> - Enables isolated testing without filesystem dependencies
> - Uses HashMap for storage instead of SQLite persistence
>
> 2. MockStatus: Re-exported mock implementation for StatusOps trait
> - Allows testing without global singleton state
> - Enables parallel test execution
>
> 3. TestEnv builder: Fluent interface for test environment setup
> - Standardizes test configuration across different test types
> - Provides common directory structures and test data
>
> 4. Async helpers: Condition polling utilities (wait_for_condition)
> - Replaces sleep-based synchronization with active polling
>
> This crate is marked as dev-only in the workspace and is used by other
> crates through [dev-dependencies] to avoid circular dependencies.
>
> Signed-off-by: Kefu Chai <k.chai@proxmox.com>
> ---
> src/pmxcfs-rs/Cargo.toml | 2 +
> src/pmxcfs-rs/pmxcfs-test-utils/Cargo.toml | 34 +
> src/pmxcfs-rs/pmxcfs-test-utils/src/lib.rs | 526 +++++++++++++++
> .../pmxcfs-test-utils/src/mock_memdb.rs | 636 ++++++++++++++++++
> 4 files changed, 1198 insertions(+)
> create mode 100644 src/pmxcfs-rs/pmxcfs-test-utils/Cargo.toml
> create mode 100644 src/pmxcfs-rs/pmxcfs-test-utils/src/lib.rs
> create mode 100644 src/pmxcfs-rs/pmxcfs-test-utils/src/mock_memdb.rs
>
> diff --git a/src/pmxcfs-rs/Cargo.toml b/src/pmxcfs-rs/Cargo.toml
> index b5191c31..8fe06b88 100644
> --- a/src/pmxcfs-rs/Cargo.toml
> +++ b/src/pmxcfs-rs/Cargo.toml
> @@ -7,6 +7,7 @@ members = [
> "pmxcfs-rrd", # RRD (Round-Robin Database) persistence
> "pmxcfs-memdb", # In-memory database with SQLite persistence
> "pmxcfs-status", # Status monitoring and RRD data management
> + "pmxcfs-test-utils", # Test utilities and helpers (dev-only)
> ]
> resolver = "2"
>
> @@ -29,6 +30,7 @@ pmxcfs-status = { path = "pmxcfs-status" }
> pmxcfs-ipc = { path = "pmxcfs-ipc" }
> pmxcfs-services = { path = "pmxcfs-services" }
> pmxcfs-logger = { path = "pmxcfs-logger" }
> +pmxcfs-test-utils = { path = "pmxcfs-test-utils" }
>
> # Core async runtime
> tokio = { version = "1.35", features = ["full"] }
> diff --git a/src/pmxcfs-rs/pmxcfs-test-utils/Cargo.toml b/src/pmxcfs-rs/pmxcfs-test-utils/Cargo.toml
> new file mode 100644
> index 00000000..41cdce64
> --- /dev/null
> +++ b/src/pmxcfs-rs/pmxcfs-test-utils/Cargo.toml
> @@ -0,0 +1,34 @@
> +[package]
> +name = "pmxcfs-test-utils"
> +version.workspace = true
> +edition.workspace = true
> +authors.workspace = true
> +license.workspace = true
> +repository.workspace = true
> +rust-version.workspace = true
> +
> +[lib]
> +name = "pmxcfs_test_utils"
> +path = "src/lib.rs"
> +
> +[dependencies]
> +# Internal workspace dependencies
> +pmxcfs-api-types.workspace = true
> +pmxcfs-config.workspace = true
> +pmxcfs-memdb.workspace = true
> +pmxcfs-status.workspace = true
> +
> +# Error handling
> +anyhow.workspace = true
> +
> +# Concurrency
> +parking_lot.workspace = true
> +
> +# System integration
> +libc.workspace = true
> +
> +# Development utilities
> +tempfile.workspace = true
> +
> +# Async runtime
> +tokio.workspace = true
> diff --git a/src/pmxcfs-rs/pmxcfs-test-utils/src/lib.rs b/src/pmxcfs-rs/pmxcfs-test-utils/src/lib.rs
> new file mode 100644
> index 00000000..a2b732a5
> --- /dev/null
> +++ b/src/pmxcfs-rs/pmxcfs-test-utils/src/lib.rs
> @@ -0,0 +1,526 @@
> +//! Test utilities for pmxcfs integration and unit tests
> +//!
> +//! This crate provides:
> +//! - Common test setup and helper functions
> +//! - TestEnv builder for standard test configurations
> +//! - Mock implementations (MockStatus, MockMemDb for isolated testing)
> +//! - Test constants and utilities
> +
> +use anyhow::Result;
> +use pmxcfs_config::Config;
> +use pmxcfs_memdb::MemDb;
> +use std::sync::Arc;
> +use std::time::{Duration, Instant};
> +use tempfile::TempDir;
> +
> +// Re-export MockStatus for easy test access
> +pub use pmxcfs_status::{MockStatus, StatusOps};
> +
> +// Mock implementations
> +mod mock_memdb;
> +pub use mock_memdb::MockMemDb;
> +
> +// Re-export MemDbOps for convenience in tests
> +pub use pmxcfs_memdb::MemDbOps;
> +
> +// Test constants
> +pub const TEST_MTIME: u32 = 1234567890;
> +pub const TEST_NODE_NAME: &str = "testnode";
> +pub const TEST_CLUSTER_NAME: &str = "test-cluster";
> +pub const TEST_WWW_DATA_GID: u32 = 33;
> +
> +/// Test environment builder for standard test setups
> +///
> +/// This builder provides a fluent interface for creating test environments
> +/// with optional components (database, status, config).
> +///
> +/// # Example
> +/// ```
> +/// use pmxcfs_test_utils::TestEnv;
> +///
> +/// # fn example() -> anyhow::Result<()> {
> +/// let env = TestEnv::new()
> +/// .with_database()?
> +/// .with_mock_status()
> +/// .build();
> +///
> +/// // Use env.db, env.status, etc.
> +/// # Ok(())
> +/// # }
> +/// ```
> +pub struct TestEnv {
> + pub config: Arc<Config>,
> + pub db: Option<MemDb>,
> + pub status: Option<Arc<dyn StatusOps>>,
these fields are pub, but below we also have accessor functions for them
(which can panic) — maybe make the fields private, or drop the accessors?
> + pub temp_dir: Option<TempDir>,
> +}
> +
> +impl TestEnv {
> + /// Create a new test environment builder with default config
> + pub fn new() -> Self {
> + Self::new_with_config(false)
> + }
> +
> + /// Create a new test environment builder with local mode config
> + pub fn new_local() -> Self {
> + Self::new_with_config(true)
> + }
> +
> + /// Create a new test environment builder with custom local_mode setting
> + pub fn new_with_config(local_mode: bool) -> Self {
> + let config = create_test_config(local_mode);
> + Self {
> + config,
> + db: None,
> + status: None,
> + temp_dir: None,
> + }
> + }
> +
> + /// Add a database with standard directory structure
> + pub fn with_database(mut self) -> Result<Self> {
> + let (temp_dir, db) = create_test_db()?;
> + self.temp_dir = Some(temp_dir);
> + self.db = Some(db);
> + Ok(self)
> + }
> +
> + /// Add a minimal database (no standard directories)
> + pub fn with_minimal_database(mut self) -> Result<Self> {
> + let (temp_dir, db) = create_minimal_test_db()?;
> + self.temp_dir = Some(temp_dir);
> + self.db = Some(db);
> + Ok(self)
> + }
> +
> + /// Add a MockStatus instance for isolated testing
> + pub fn with_mock_status(mut self) -> Self {
> + self.status = Some(Arc::new(MockStatus::new()));
> + self
> + }
> +
> + /// Add the real Status instance (uses global singleton)
> + pub fn with_status(mut self) -> Self {
> + self.status = Some(pmxcfs_status::init());
> + self
> + }
> +
> + /// Build and return the test environment
> + pub fn build(self) -> Self {
> + self
> + }
this function is a no-op (it just returns self), so it seems redundant
> +
> + /// Get a reference to the database (panics if not configured)
> + pub fn db(&self) -> &MemDb {
> + self.db
> + .as_ref()
> + .expect("Database not configured. Call with_database() first")
> + }
> +
> + /// Get a reference to the status (panics if not configured)
> + pub fn status(&self) -> &Arc<dyn StatusOps> {
> + self.status
> + .as_ref()
> + .expect("Status not configured. Call with_status() or with_mock_status() first")
> + }
> +}
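If the panicking accessors stay next to pub fields, one option (sketch
with a String stand-in for MemDb; db_or_panic is a hypothetical name)
would be to make the field private, let db() return Option, and keep the
panicking variant as an explicit opt-in:

```rust
pub struct TestEnv {
    // Private: callers go through the accessors below.
    db: Option<String>, // stand-in for MemDb, just for this sketch
}

impl TestEnv {
    /// Fallible getter: the caller decides whether a missing DB is fatal.
    pub fn db(&self) -> Option<&String> {
        self.db.as_ref()
    }

    /// Explicitly panicking variant for tests that want the old behavior.
    pub fn db_or_panic(&self) -> &String {
        self.db()
            .expect("Database not configured. Call with_database() first")
    }
}
```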
> +
> +impl Default for TestEnv {
> + fn default() -> Self {
> + Self::new()
> + }
> +}
> +
> +/// Creates a standard test configuration
> +///
> +/// # Arguments
> +/// * `local_mode` - Whether to run in local mode (no cluster)
> +///
> +/// # Returns
> +/// Arc-wrapped Config suitable for testing
> +pub fn create_test_config(local_mode: bool) -> Arc<Config> {
> + Config::new(
> + TEST_NODE_NAME.to_string(),
> + "127.0.0.1".to_string(),
> + TEST_WWW_DATA_GID,
> + false, // debug mode
> + local_mode,
> + TEST_CLUSTER_NAME.to_string(),
> + )
> +}
> +
> +/// Creates a test database with standard directory structure
> +///
> +/// Creates the following directories:
> +/// - /nodes/{nodename}/qemu-server
> +/// - /nodes/{nodename}/lxc
> +/// - /nodes/{nodename}/priv
> +/// - /priv/lock/qemu-server
> +/// - /priv/lock/lxc
> +/// - /qemu-server
> +/// - /lxc
> +///
> +/// # Returns
> +/// (TempDir, MemDb) - The temp directory must be kept alive for database to persist
> +pub fn create_test_db() -> Result<(TempDir, MemDb)> {
> + let temp_dir = TempDir::new()?;
> + let db_path = temp_dir.path().join("test.db");
> + let db = MemDb::open(&db_path, true)?;
> +
> + // Create standard directory structure
> + let now = TEST_MTIME;
> +
> + // Node-specific directories
> + db.create("/nodes", libc::S_IFDIR, now)?;
> + db.create(&format!("/nodes/{}", TEST_NODE_NAME), libc::S_IFDIR, now)?;
> + db.create(
> + &format!("/nodes/{}/qemu-server", TEST_NODE_NAME),
> + libc::S_IFDIR,
> + now,
> + )?;
> + db.create(
> + &format!("/nodes/{}/lxc", TEST_NODE_NAME),
> + libc::S_IFDIR,
> + now,
> + )?;
> + db.create(
> + &format!("/nodes/{}/priv", TEST_NODE_NAME),
> + libc::S_IFDIR,
> + now,
> + )?;
> +
> + // Global directories
> + db.create("/priv", libc::S_IFDIR, now)?;
> + db.create("/priv/lock", libc::S_IFDIR, now)?;
> + db.create("/priv/lock/qemu-server", libc::S_IFDIR, now)?;
> + db.create("/priv/lock/lxc", libc::S_IFDIR, now)?;
> + db.create("/qemu-server", libc::S_IFDIR, now)?;
> + db.create("/lxc", libc::S_IFDIR, now)?;
> +
> + Ok((temp_dir, db))
> +}
> +
> +/// Creates a minimal test database (no standard directories)
> +///
> +/// Use this when you want full control over database structure
> +///
> +/// # Returns
> +/// (TempDir, MemDb) - The temp directory must be kept alive for database to persist
> +pub fn create_minimal_test_db() -> Result<(TempDir, MemDb)> {
> + let temp_dir = TempDir::new()?;
> + let db_path = temp_dir.path().join("test.db");
> + let db = MemDb::open(&db_path, true)?;
> + Ok((temp_dir, db))
> +}
> +
> +/// Creates test VM configuration content
> +///
> +/// # Arguments
> +/// * `vmid` - VM ID
> +/// * `cores` - Number of CPU cores
> +/// * `memory` - Memory in MB
> +///
> +/// # Returns
> +/// Configuration file content as bytes
> +pub fn create_vm_config(vmid: u32, cores: u32, memory: u32) -> Vec<u8> {
> + format!(
> + "name: test-vm-{}\ncores: {}\nmemory: {}\nbootdisk: scsi0\n",
> + vmid, cores, memory
> + )
> + .into_bytes()
> +}
> +
> +/// Creates test CT (container) configuration content
> +///
> +/// # Arguments
> +/// * `vmid` - Container ID
> +/// * `cores` - Number of CPU cores
> +/// * `memory` - Memory in MB
> +///
> +/// # Returns
> +/// Configuration file content as bytes
> +pub fn create_ct_config(vmid: u32, cores: u32, memory: u32) -> Vec<u8> {
> + format!(
> + "cores: {}\nmemory: {}\nrootfs: local:100/vm-{}-disk-0.raw\n",
> + cores, memory, vmid
> + )
> + .into_bytes()
> +}
> +
> +/// Creates a test lock path for a VM config
> +///
> +/// # Arguments
> +/// * `vmid` - VM ID
> +/// * `vm_type` - "qemu" or "lxc"
> +///
> +/// # Returns
> +/// Lock path in format `/priv/lock/{vm_type}/{vmid}.conf`
> +pub fn create_lock_path(vmid: u32, vm_type: &str) -> String {
> + format!("/priv/lock/{}/{}.conf", vm_type, vmid)
> +}
> +
> +/// Creates a test config path for a VM
> +///
> +/// # Arguments
> +/// * `vmid` - VM ID
> +/// * `vm_type` - "qemu-server" or "lxc"
> +///
> +/// # Returns
> +/// Config path in format `/{vm_type}/{vmid}.conf`
> +pub fn create_config_path(vmid: u32, vm_type: &str) -> String {
> + format!("/{}/{}.conf", vm_type, vmid)
> +}
> +
> +/// Clears all VMs from a status instance
> +///
> +/// Useful for ensuring clean state before tests that register VMs.
> +///
> +/// # Arguments
> +/// * `status` - The status instance to clear
> +pub fn clear_test_vms(status: &dyn StatusOps) {
> + let existing_vms: Vec<u32> = status.get_vmlist().keys().copied().collect();
> + for vmid in existing_vms {
> + status.delete_vm(vmid);
> + }
> +}
> +
> +/// Wait for a condition to become true, polling at regular intervals
> +///
> +/// This is a replacement for sleep-based synchronization in integration tests.
> +/// Instead of sleeping for an arbitrary duration and hoping the condition is met,
> +/// this function polls the condition and returns as soon as it becomes true.
> +///
> +/// # Arguments
> +/// * `predicate` - Function that returns true when the condition is met
> +/// * `timeout` - Maximum time to wait for the condition
> +/// * `check_interval` - How often to check the condition
> +///
> +/// # Returns
> +/// * `true` if condition was met within timeout
> +/// * `false` if timeout was reached without condition being met
> +///
> +/// # Example
> +/// ```no_run
> +/// use pmxcfs_test_utils::wait_for_condition;
> +/// use std::time::Duration;
> +/// use std::sync::atomic::{AtomicBool, Ordering};
> +/// use std::sync::Arc;
> +///
> +/// # async fn example() {
> +/// let ready = Arc::new(AtomicBool::new(false));
> +///
> +/// // Wait for service to be ready (with timeout)
> +/// let result = wait_for_condition(
> +/// || ready.load(Ordering::SeqCst),
> +/// Duration::from_secs(5),
> +/// Duration::from_millis(10),
> +/// ).await;
> +///
> +/// assert!(result, "Service should be ready within 5 seconds");
> +/// # }
> +/// ```
> +pub async fn wait_for_condition<F>(
> + predicate: F,
> + timeout: Duration,
> + check_interval: Duration,
> +) -> bool
> +where
> + F: Fn() -> bool,
> +{
> + let start = Instant::now();
> + loop {
> + if predicate() {
> + return true;
> + }
> + if start.elapsed() >= timeout {
> + return false;
> + }
> + tokio::time::sleep(check_interval).await;
> + }
> +}
> +
> +/// Wait for a condition with a custom error message
> +///
> +/// Similar to `wait_for_condition`, but returns a Result with a custom error message
> +/// if the timeout is reached.
> +///
> +/// # Arguments
> +/// * `predicate` - Function that returns true when the condition is met
> +/// * `timeout` - Maximum time to wait for the condition
> +/// * `check_interval` - How often to check the condition
> +/// * `error_msg` - Error message to return if timeout is reached
> +///
> +/// # Returns
> +/// * `Ok(())` if condition was met within timeout
> +/// * `Err(anyhow::Error)` with custom message if timeout was reached
> +///
> +/// # Example
> +/// ```no_run
> +/// use pmxcfs_test_utils::wait_for_condition_or_fail;
> +/// use std::time::Duration;
> +/// use std::sync::atomic::{AtomicU64, Ordering};
> +/// use std::sync::Arc;
> +///
> +/// # async fn example() -> anyhow::Result<()> {
> +/// let counter = Arc::new(AtomicU64::new(0));
> +///
> +/// wait_for_condition_or_fail(
> +/// || counter.load(Ordering::SeqCst) >= 1,
> +/// Duration::from_secs(5),
> +/// Duration::from_millis(10),
> +/// "Service should initialize within 5 seconds",
> +/// ).await?;
> +///
> +/// # Ok(())
> +/// # }
> +/// ```
> +pub async fn wait_for_condition_or_fail<F>(
> + predicate: F,
> + timeout: Duration,
> + check_interval: Duration,
> + error_msg: &str,
> +) -> Result<()>
> +where
> + F: Fn() -> bool,
> +{
> + if wait_for_condition(predicate, timeout, check_interval).await {
> + Ok(())
> + } else {
> + anyhow::bail!("{}", error_msg)
> + }
> +}
> +
> +/// Blocking version of wait_for_condition for synchronous tests
> +///
> +/// Similar to `wait_for_condition`, but works in synchronous contexts.
> +/// Polls the condition and returns as soon as it becomes true or timeout is reached.
> +///
> +/// # Arguments
> +/// * `predicate` - Function that returns true when the condition is met
> +/// * `timeout` - Maximum time to wait for the condition
> +/// * `check_interval` - How often to check the condition
> +///
> +/// # Returns
> +/// * `true` if condition was met within timeout
> +/// * `false` if timeout was reached without condition being met
> +///
> +/// # Example
> +/// ```no_run
> +/// use pmxcfs_test_utils::wait_for_condition_blocking;
> +/// use std::time::Duration;
> +/// use std::sync::atomic::{AtomicBool, Ordering};
> +/// use std::sync::Arc;
> +///
> +/// let ready = Arc::new(AtomicBool::new(false));
> +///
> +/// // Wait for service to be ready (with timeout)
> +/// let result = wait_for_condition_blocking(
> +/// || ready.load(Ordering::SeqCst),
> +/// Duration::from_secs(5),
> +/// Duration::from_millis(10),
> +/// );
> +///
> +/// assert!(result, "Service should be ready within 5 seconds");
> +/// ```
> +pub fn wait_for_condition_blocking<F>(
> + predicate: F,
> + timeout: Duration,
> + check_interval: Duration,
> +) -> bool
> +where
> + F: Fn() -> bool,
> +{
> + let start = Instant::now();
> + loop {
> + if predicate() {
> + return true;
> + }
> + if start.elapsed() >= timeout {
> + return false;
> + }
> + std::thread::sleep(check_interval);
> + }
> +}
> +
> +#[cfg(test)]
> +mod tests {
> + use super::*;
> +
> + #[test]
> + fn test_create_test_config() {
> + let config = create_test_config(true);
> + assert_eq!(config.nodename, TEST_NODE_NAME);
> + assert_eq!(config.cluster_name, TEST_CLUSTER_NAME);
> + assert!(config.local_mode);
> + }
> +
> + #[test]
> + fn test_create_test_db() -> Result<()> {
> + let (_temp_dir, db) = create_test_db()?;
> +
> + // Verify standard directories exist
> + assert!(db.exists("/nodes")?, "Should have /nodes");
> + assert!(db.exists("/qemu-server")?, "Should have /qemu-server");
> + assert!(db.exists("/priv/lock")?, "Should have /priv/lock");
> +
> + Ok(())
> + }
> +
> + #[test]
> + fn test_path_helpers() {
> + assert_eq!(
> + create_lock_path(100, "qemu-server"),
The docs of create_lock_path say the type argument is "qemu" or "lxc", but here we pass "qemu-server" — one of the two should be adjusted.
> + "/priv/lock/qemu-server/100.conf"
> + );
> + assert_eq!(
> + create_config_path(100, "qemu-server"),
> + "/qemu-server/100.conf"
> + );
> + }
> +
> + #[test]
> + fn test_env_builder_basic() {
> + let env = TestEnv::new().build();
> + assert_eq!(env.config.nodename, TEST_NODE_NAME);
> + assert!(env.db.is_none());
> + assert!(env.status.is_none());
> + }
> +
> + #[test]
> + fn test_env_builder_with_database() -> Result<()> {
> + let env = TestEnv::new().with_database()?.build();
> + assert!(env.db.is_some());
> + assert!(env.db().exists("/nodes")?);
> + Ok(())
> + }
> +
> + #[test]
> + fn test_env_builder_with_mock_status() {
> + let env = TestEnv::new().with_mock_status().build();
> + assert!(env.status.is_some());
> +
> + // Test that MockStatus works
> + let status = env.status();
> + status.set_quorate(true);
> + assert!(status.is_quorate());
> + }
> +
> + #[test]
> + fn test_env_builder_full() -> Result<()> {
> + let env = TestEnv::new().with_database()?.with_mock_status().build();
> +
> + assert!(env.db.is_some());
> + assert!(env.status.is_some());
> + assert!(env.config.nodename == TEST_NODE_NAME);
> +
> + Ok(())
> + }
> +
> + // NOTE: Tokio tests for wait_for_condition functions are REMOVED because they
> + // cause the test runner to hang when running `cargo test --lib --workspace`.
> + // Root cause: tokio multi-threaded runtime doesn't shut down properly when
> + // these async tests complete, blocking the entire test suite.
> + //
> + // These utility functions work correctly and are verified in integration tests
> + // that actually use them (e.g., integration-tests/).
> +}
> diff --git a/src/pmxcfs-rs/pmxcfs-test-utils/src/mock_memdb.rs b/src/pmxcfs-rs/pmxcfs-test-utils/src/mock_memdb.rs
> new file mode 100644
> index 00000000..c341f9eb
> --- /dev/null
> +++ b/src/pmxcfs-rs/pmxcfs-test-utils/src/mock_memdb.rs
> @@ -0,0 +1,636 @@
> +//! Mock in-memory database implementation for testing
> +//!
> +//! This module provides `MockMemDb`, a lightweight in-memory implementation
> +//! of the `MemDbOps` trait for use in unit tests.
> +
> +use anyhow::{Result, bail};
> +use parking_lot::RwLock;
> +use pmxcfs_memdb::{MemDbOps, ROOT_INODE, TreeEntry};
> +use std::collections::HashMap;
> +use std::sync::atomic::{AtomicU64, Ordering};
> +use std::time::{SystemTime, UNIX_EPOCH};
> +
> +// Directory and file type constants from dirent.h
> +const DT_DIR: u8 = 4;
> +const DT_REG: u8 = 8;
> +
> +/// Mock in-memory database for testing
> +///
> +/// Unlike the real `MemDb` which uses SQLite persistence, `MockMemDb` stores
> +/// everything in memory using HashMap. This makes it:
> +/// - Faster for unit tests (no disk I/O)
> +/// - Easier to inject failures for error testing
> +/// - Completely isolated (no shared state between tests)
> +///
> +/// # Example
> +/// ```
> +/// use pmxcfs_test_utils::MockMemDb;
> +/// use pmxcfs_memdb::MemDbOps;
> +/// use std::sync::Arc;
> +///
> +/// let db: Arc<dyn MemDbOps> = Arc::new(MockMemDb::new());
> +/// db.create("/test.txt", 0, 1234).unwrap();
> +/// assert!(db.exists("/test.txt").unwrap());
> +/// ```
> +pub struct MockMemDb {
> + /// Files and directories stored as path -> data
> + files: RwLock<HashMap<String, Vec<u8>>>,
> + /// Directory entries stored as path -> Vec<child_names>
> + directories: RwLock<HashMap<String, Vec<String>>>,
> + /// Metadata stored as path -> TreeEntry
> + entries: RwLock<HashMap<String, TreeEntry>>,
> + /// Lock state stored as path -> (timestamp, checksum)
> + locks: RwLock<HashMap<String, (u64, [u8; 32])>>,
> + /// Version counter
> + version: AtomicU64,
> + /// Inode counter
> + next_inode: AtomicU64,
> +}
> +
> +impl MockMemDb {
> + /// Create a new empty mock database
> + pub fn new() -> Self {
> + let mut directories = HashMap::new();
> + directories.insert("/".to_string(), Vec::new());
> +
> + let mut entries = HashMap::new();
> + let now = SystemTime::now()
> + .duration_since(UNIX_EPOCH)
> + .unwrap()
> + .as_secs() as u32;
> +
> + // Create root entry
> + entries.insert(
> + "/".to_string(),
> + TreeEntry {
> + inode: ROOT_INODE,
> + parent: 0,
> + version: 0,
> + writer: 1,
> + mtime: now,
> + size: 0,
> + entry_type: DT_DIR,
> + data: Vec::new(),
> + name: String::new(),
> + },
> + );
> +
> + Self {
> + files: RwLock::new(HashMap::new()),
> + directories: RwLock::new(directories),
> + entries: RwLock::new(entries),
> + locks: RwLock::new(HashMap::new()),
> + version: AtomicU64::new(1),
> + next_inode: AtomicU64::new(ROOT_INODE + 1),
> + }
> + }
> +
> + /// Helper to check if path is a directory
> + fn is_directory(&self, path: &str) -> bool {
> + self.directories.read().contains_key(path)
> + }
> +
> + /// Helper to get parent path
> + fn parent_path(path: &str) -> Option<String> {
> + if path == "/" {
> + return None;
> + }
> + let parent = path.rsplit_once('/')?.0;
> + if parent.is_empty() {
> + Some("/".to_string())
> + } else {
> + Some(parent.to_string())
> + }
> + }
> +
> + /// Helper to get file name from path
> + fn file_name(path: &str) -> String {
> + if path == "/" {
> + return String::new();
> + }
> + path.rsplit('/').next().unwrap_or("").to_string()
> + }
> +}
> +
> +impl Default for MockMemDb {
> + fn default() -> Self {
> + Self::new()
> + }
> +}
> +
> +impl MemDbOps for MockMemDb {
> + fn create(&self, path: &str, mode: u32, mtime: u32) -> Result<()> {
> + if path.is_empty() {
> + bail!("Empty path");
> + }
> +
> + if self.entries.read().contains_key(path) {
> + bail!("File exists: {}", path);
> + }
> +
> + let is_dir = (mode & libc::S_IFMT) == libc::S_IFDIR;
> + let entry_type = if is_dir { DT_DIR } else { DT_REG };
> + let inode = self.next_inode.fetch_add(1, Ordering::SeqCst);
> +
> + // Add to parent directory
> + if let Some(parent) = Self::parent_path(path) {
> + if !self.is_directory(&parent) {
> + bail!("Parent is not a directory: {}", parent);
> + }
> + let mut dirs = self.directories.write();
> + if let Some(children) = dirs.get_mut(&parent) {
> + children.push(Self::file_name(path));
> + }
> + }
> +
> + // Create entry
> + let entry = TreeEntry {
> + inode,
> + parent: 0, // Simplified
> + version: self.version.load(Ordering::SeqCst),
> + writer: 1,
> + mtime,
> + size: 0,
> + entry_type,
> + data: Vec::new(),
> + name: Self::file_name(path),
> + };
> +
> + self.entries.write().insert(path.to_string(), entry);
> +
> + if is_dir {
> + self.directories
> + .write()
> + .insert(path.to_string(), Vec::new());
> + } else {
> + self.files.write().insert(path.to_string(), Vec::new());
> + }
> +
> + self.version.fetch_add(1, Ordering::SeqCst);
> + Ok(())
> + }
> +
> + fn read(&self, path: &str, offset: u64, size: usize) -> Result<Vec<u8>> {
> + let files = self.files.read();
> + let data = files
> + .get(path)
> + .ok_or_else(|| anyhow::anyhow!("File not found: {}", path))?;
> +
> + let offset = offset as usize;
> + if offset >= data.len() {
> + return Ok(Vec::new());
> + }
> +
> + let end = std::cmp::min(offset + size, data.len());
> + Ok(data[offset..end].to_vec())
> + }
> +
> + fn write(
> + &self,
> + path: &str,
> + offset: u64,
> + mtime: u32,
> + data: &[u8],
> + truncate: bool,
> + ) -> Result<usize> {
> + let mut files = self.files.write();
> + let file_data = files
> + .get_mut(path)
> + .ok_or_else(|| anyhow::anyhow!("File not found: {}", path))?;
> +
> + let offset = offset as usize;
> +
> + if truncate {
> + file_data.clear();
> + }
> +
> + // Expand if needed
> + if offset + data.len() > file_data.len() {
> + file_data.resize(offset + data.len(), 0);
> + }
> +
> + file_data[offset..offset + data.len()].copy_from_slice(data);
> +
> + // Update entry
> + if let Some(entry) = self.entries.write().get_mut(path) {
> + entry.mtime = mtime;
> + entry.size = file_data.len();
> + }
> +
> + self.version.fetch_add(1, Ordering::SeqCst);
> + Ok(data.len())
> + }
> +
> + fn delete(&self, path: &str) -> Result<()> {
> + if !self.entries.read().contains_key(path) {
> + bail!("File not found: {}", path);
> + }
> +
> + // Check if directory is empty
> + if let Some(children) = self.directories.read().get(path) {
> + if !children.is_empty() {
> + bail!("Directory not empty: {}", path);
> + }
> + }
> +
> + self.entries.write().remove(path);
> + self.files.write().remove(path);
> + self.directories.write().remove(path);
> +
> + // Remove from parent
> + if let Some(parent) = Self::parent_path(path) {
> + if let Some(children) = self.directories.write().get_mut(&parent) {
> + children.retain(|name| name != &Self::file_name(path));
> + }
> + }
> +
> + self.version.fetch_add(1, Ordering::SeqCst);
> + Ok(())
> + }
> +
> + fn rename(&self, old_path: &str, new_path: &str) -> Result<()> {
> + // Check existence first with read locks (released immediately)
> + {
> + let entries = self.entries.read();
> + if !entries.contains_key(old_path) {
> + bail!("Source not found: {}", old_path);
> + }
> + if entries.contains_key(new_path) {
> + bail!("Destination already exists: {}", new_path);
> + }
> + }
We currently don't update the parent directories' children lists here.
Also, if rename() can be used for directories, we likely need to
rewrite/move all descendant keys (/old/... -> /new/...) across
entries/files/directories to keep the tree consistent.
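For the directory case, a self-contained sketch of moving the descendant keys (plain std `HashMap`; `remap_descendants` is just an illustrative name, not something from the patch — the same helper would be applied to entries, files and directories in turn):

```rust
use std::collections::HashMap;

/// Move every descendant key of a renamed directory
/// (`/old/...` -> `/new/...`) within one path-keyed map.
fn remap_descendants<V>(map: &mut HashMap<String, V>, old_path: &str, new_path: &str) {
    // Match only true descendants: "/old/..." but not "/older".
    let prefix = format!("{}/", old_path);
    let moved: Vec<String> = map
        .keys()
        .filter(|k| k.starts_with(&prefix))
        .cloned()
        .collect();
    for old_key in moved {
        if let Some(value) = map.remove(&old_key) {
            // Keep the suffix after the old directory path unchanged.
            let new_key = format!("{}{}", new_path, &old_key[old_path.len()..]);
            map.insert(new_key, value);
        }
    }
}
```
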
> +
> + // Move entry - hold write lock for entire operation
> + {
> + let mut entries = self.entries.write();
> + if let Some(mut entry) = entries.remove(old_path) {
> + entry.name = Self::file_name(new_path);
> + entries.insert(new_path.to_string(), entry);
> + }
> + }
Between the read-lock check and the write-lock move we have a TOCTOU race.
Couldn't we just hold the write lock for the whole operation?
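A self-contained sketch of the single-write-lock variant (`std::sync::RwLock` and a trimmed-down map instead of the real parking_lot/TreeEntry types, so it stands alone):

```rust
use std::collections::HashMap;
use std::sync::RwLock;

struct Entries(RwLock<HashMap<String, Vec<u8>>>);

impl Entries {
    /// Check existence and move the entry under one write lock, so no
    /// other thread can create `new_path` between the check and the insert.
    fn rename(&self, old_path: &str, new_path: &str) -> Result<(), String> {
        let mut entries = self.0.write().unwrap();
        if !entries.contains_key(old_path) {
            return Err(format!("Source not found: {}", old_path));
        }
        if entries.contains_key(new_path) {
            return Err(format!("Destination already exists: {}", new_path));
        }
        // Still holding the same write lock, so this cannot race.
        let entry = entries.remove(old_path).expect("checked above");
        entries.insert(new_path.to_string(), entry);
        Ok(())
    }
}
```
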
> +
> + // Move file data - hold write lock for entire operation
> + {
> + let mut files = self.files.write();
> + if let Some(data) = files.remove(old_path) {
> + files.insert(new_path.to_string(), data);
> + }
> + }
> +
> + // Move directory - hold write lock for entire operation
> + {
> + let mut directories = self.directories.write();
> + if let Some(children) = directories.remove(old_path) {
> + directories.insert(new_path.to_string(), children);
> + }
> + }
> +
> + self.version.fetch_add(1, Ordering::SeqCst);
> + Ok(())
> + }
> +
> + fn exists(&self, path: &str) -> Result<bool> {
> + Ok(self.entries.read().contains_key(path))
> + }
> +
> + fn readdir(&self, path: &str) -> Result<Vec<TreeEntry>> {
> + let directories = self.directories.read();
> + let children = directories
> + .get(path)
> + .ok_or_else(|| anyhow::anyhow!("Not a directory: {}", path))?;
> +
> + let entries = self.entries.read();
> + let mut result = Vec::new();
> +
> + for child_name in children {
> + let child_path = if path == "/" {
> + format!("/{}", child_name)
> + } else {
> + format!("{}/{}", path, child_name)
> + };
> +
> + if let Some(entry) = entries.get(&child_path) {
> + result.push(entry.clone());
> + }
> + }
> +
> + Ok(result)
> + }
> +
> + fn set_mtime(&self, path: &str, _writer: u32, mtime: u32) -> Result<()> {
> + let mut entries = self.entries.write();
> + let entry = entries
> + .get_mut(path)
> + .ok_or_else(|| anyhow::anyhow!("File not found: {}", path))?;
> + entry.mtime = mtime;
> + Ok(())
> + }
> +
> + fn lookup_path(&self, path: &str) -> Option<TreeEntry> {
> + self.entries.read().get(path).cloned()
> + }
> +
> + fn get_entry_by_inode(&self, inode: u64) -> Option<TreeEntry> {
> + self.entries
> + .read()
> + .values()
> + .find(|e| e.inode == inode)
> + .cloned()
> + }
> +
> + fn acquire_lock(&self, path: &str, csum: &[u8; 32]) -> Result<()> {
> + let mut locks = self.locks.write();
> + let now = SystemTime::now()
> + .duration_since(UNIX_EPOCH)
> + .unwrap()
> + .as_secs();
> +
> + if let Some((timestamp, existing_csum)) = locks.get(path) {
> + // Check if expired
> + if now - timestamp > 120 {
nit: magic number here — could we use a
const LOCK_TIMEOUT_SECS: u64 = 120; for example? The same literal also
appears in is_locked() and lock_expired().
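A sketch of what that could look like; the `saturating_sub` is an extra suggestion on top, to avoid underflow if a stored timestamp is ever in the future due to clock adjustments:

```rust
use std::time::{SystemTime, UNIX_EPOCH};

/// Shared by acquire_lock(), is_locked() and lock_expired()
/// instead of three literal 120s.
const LOCK_TIMEOUT_SECS: u64 = 120;

fn lock_is_expired(lock_timestamp: u64) -> bool {
    let now = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .expect("system clock before UNIX epoch")
        .as_secs();
    // saturating_sub: a future timestamp yields 0 elapsed, not a panic/wrap
    now.saturating_sub(lock_timestamp) > LOCK_TIMEOUT_SECS
}
```
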
> + // Expired, can acquire
> + locks.insert(path.to_string(), (now, *csum));
> + return Ok(());
> + }
> +
> + // Not expired, check if same checksum (refresh)
> + if existing_csum == csum {
> + locks.insert(path.to_string(), (now, *csum));
> + return Ok(());
> + }
> +
> + bail!("Lock already held with different checksum");
> + }
> +
> + locks.insert(path.to_string(), (now, *csum));
> + Ok(())
> + }
> +
> + fn release_lock(&self, path: &str, csum: &[u8; 32]) -> Result<()> {
> + let mut locks = self.locks.write();
> + if let Some((_, existing_csum)) = locks.get(path) {
> + if existing_csum == csum {
> + locks.remove(path);
> + return Ok(());
> + }
> + bail!("Lock checksum mismatch");
> + }
> + bail!("No lock found");
> + }
> +
> + fn is_locked(&self, path: &str) -> bool {
> + if let Some((timestamp, _)) = self.locks.read().get(path) {
> + let now = SystemTime::now()
> + .duration_since(UNIX_EPOCH)
> + .unwrap()
> + .as_secs();
> + now - timestamp <= 120
> + } else {
> + false
> + }
> + }
> +
> + fn lock_expired(&self, path: &str, csum: &[u8; 32]) -> bool {
> + if let Some((timestamp, existing_csum)) = self.locks.read().get(path).cloned() {
> + let now = SystemTime::now()
> + .duration_since(UNIX_EPOCH)
> + .unwrap()
> + .as_secs();
> +
> + // Checksum mismatch - reset timeout
> + if &existing_csum != csum {
> + self.locks.write().insert(path.to_string(), (now, *csum));
Can we please document why we are modifying state here when the
checksums mismatch? Resetting the lock timestamp as a side effect of
an expiry check is surprising.
> + return false;
> + }
> +
> + // Check expiration
> + now - timestamp > 120
> + } else {
> + false
> + }
> + }
> +
> + fn get_version(&self) -> u64 {
> + self.version.load(Ordering::SeqCst)
> + }
> +
> + fn get_all_entries(&self) -> Result<Vec<TreeEntry>> {
> + Ok(self.entries.read().values().cloned().collect())
> + }
> +
> + fn replace_all_entries(&self, entries: Vec<TreeEntry>) -> Result<()> {
Also, replace_all_entries() / apply_tree_entry() don't rebuild the
parent directories' children lists.
> + self.entries.write().clear();
This clears all entries, so shouldn't the root TreeEntry ("/") be
reinserted to preserve the invariant (similar to directories below)?
> + self.files.write().clear();
> + self.directories.write().clear();
Clearing directories removes "/" but never reinserts it.
If possible, we could also acquire all write locks once (in a fixed
order) before the loop instead of re-locking on every iteration.
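Roughly what I mean, as a standalone sketch (`std::sync::RwLock` and simplified maps; `replace_all` is an illustrative name): both guards are taken once up front, and "/" is reinserted right after the clear:

```rust
use std::collections::HashMap;
use std::sync::RwLock;

fn replace_all(
    files: &RwLock<HashMap<String, Vec<u8>>>,
    directories: &RwLock<HashMap<String, Vec<String>>>,
    new_files: Vec<(String, Vec<u8>)>,
) {
    // Acquire both write locks once, in a fixed order, for the whole rebuild.
    let mut files = files.write().unwrap();
    let mut directories = directories.write().unwrap();
    files.clear();
    directories.clear();
    // Reinsert the root directory that clear() removed.
    directories.insert("/".to_string(), Vec::new());
    for (path, data) in new_files {
        files.insert(path, data);
    }
}
```
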
> +
> + for entry in entries {
> + let path = format!("/{}", entry.name); // Simplified
> + self.entries.write().insert(path.clone(), entry.clone());
> +
> + if entry.size > 0 {
Use entry.entry_type == DT_DIR to distinguish directories from files.
The current entry.size > 0 check incorrectly classifies empty files
(size 0) as directories.
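For illustration, the type-based classification in isolation (constants copied from the dirent.h values at the top of the file):

```rust
const DT_DIR: u8 = 4; // directory, from dirent.h
const DT_REG: u8 = 8; // regular file, from dirent.h

/// Classify by entry type, not by payload size.
fn is_directory_entry(entry_type: u8, size: usize) -> bool {
    // size is deliberately ignored: an empty regular file has size 0,
    // which the size-based check would have misfiled as a directory.
    let _ = size;
    entry_type == DT_DIR
}
```
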
> + self.files.write().insert(path, entry.data.clone());
> + } else {
> + self.directories.write().insert(path, Vec::new());
> + }
> + }
> +
> + self.version.fetch_add(1, Ordering::SeqCst);
> + Ok(())
> + }
> +
> + fn apply_tree_entry(&self, entry: TreeEntry) -> Result<()> {
> + let path = format!("/{}", entry.name); // Simplified
> + self.entries.write().insert(path.clone(), entry.clone());
> +
> + if entry.size > 0 {
Also here, please use entry.entry_type == DT_DIR.
> + self.files.write().insert(path, entry.data.clone());
> + }
> +
> + self.version.fetch_add(1, Ordering::SeqCst);
> + Ok(())
> + }
> +
> + fn encode_database(&self) -> Result<Vec<u8>> {
> + // Simplified - just return empty vec
> + Ok(Vec::new())
> + }
> +
> + fn compute_database_checksum(&self) -> Result<[u8; 32]> {
> + // Simplified - return deterministic checksum based on version
> + let version = self.version.load(Ordering::SeqCst);
> + let mut checksum = [0u8; 32];
> + checksum[0..8].copy_from_slice(&version.to_le_bytes());
> + Ok(checksum)
> + }
> +}
> +
> +#[cfg(test)]
> +mod tests {
> + use super::*;
> + use std::sync::Arc;
> +
> + #[test]
> + fn test_mock_memdb_basic_operations() {
> + let db = MockMemDb::new();
> +
> + // Create file
> + db.create("/test.txt", libc::S_IFREG, 1234).unwrap();
> + assert!(db.exists("/test.txt").unwrap());
> +
> + // Write data
> + let data = b"Hello, MockMemDb!";
> + db.write("/test.txt", 0, 1235, data, false).unwrap();
> +
> + // Read data
> + let read_data = db.read("/test.txt", 0, 100).unwrap();
> + assert_eq!(&read_data[..], data);
> +
> + // Check entry
> + let entry = db.lookup_path("/test.txt").unwrap();
> + assert_eq!(entry.size, data.len());
> + assert_eq!(entry.mtime, 1235);
> + }
> +
> + #[test]
> + fn test_mock_memdb_directory_operations() {
> + let db = MockMemDb::new();
> +
> + // Create directory
> + db.create("/mydir", libc::S_IFDIR, 1000).unwrap();
> + assert!(db.exists("/mydir").unwrap());
> +
> + // Create file in directory
> + db.create("/mydir/file.txt", libc::S_IFREG, 1001).unwrap();
> +
> + // Read directory
> + let entries = db.readdir("/mydir").unwrap();
> + assert_eq!(entries.len(), 1);
> + assert_eq!(entries[0].name, "file.txt");
> + }
> +
> + #[test]
> + fn test_mock_memdb_lock_operations() {
> + let db = MockMemDb::new();
> + let csum1 = [1u8; 32];
> + let csum2 = [2u8; 32];
> +
> + // Acquire lock
> + db.acquire_lock("/priv/lock/resource", &csum1).unwrap();
> + assert!(db.is_locked("/priv/lock/resource"));
> +
> + // Lock with same checksum should succeed (refresh)
> + assert!(db.acquire_lock("/priv/lock/resource", &csum1).is_ok());
> +
> + // Lock with different checksum should fail
> + assert!(db.acquire_lock("/priv/lock/resource", &csum2).is_err());
> +
> + // Release lock
> + db.release_lock("/priv/lock/resource", &csum1).unwrap();
> + assert!(!db.is_locked("/priv/lock/resource"));
> +
> + // Can acquire with different checksum now
> + db.acquire_lock("/priv/lock/resource", &csum2).unwrap();
> + assert!(db.is_locked("/priv/lock/resource"));
> + }
> +
> + #[test]
> + fn test_mock_memdb_rename() {
> + let db = MockMemDb::new();
> +
> + // Create file
> + db.create("/old.txt", libc::S_IFREG, 1000).unwrap();
> + db.write("/old.txt", 0, 1001, b"content", false).unwrap();
> +
> + // Rename
> + db.rename("/old.txt", "/new.txt").unwrap();
> +
> + // Old path should not exist
> + assert!(!db.exists("/old.txt").unwrap());
> +
> + // New path should exist with same content
> + assert!(db.exists("/new.txt").unwrap());
> + let data = db.read("/new.txt", 0, 100).unwrap();
> + assert_eq!(&data[..], b"content");
> + }
> +
> + #[test]
> + fn test_mock_memdb_delete() {
> + let db = MockMemDb::new();
> +
> + // Create and delete file
> + db.create("/delete-me.txt", libc::S_IFREG, 1000).unwrap();
> + assert!(db.exists("/delete-me.txt").unwrap());
> +
> + db.delete("/delete-me.txt").unwrap();
> + assert!(!db.exists("/delete-me.txt").unwrap());
> +
> + // Delete non-existent file should fail
> + assert!(db.delete("/nonexistent.txt").is_err());
> + }
> +
> + #[test]
> + fn test_mock_memdb_version_tracking() {
> + let db = MockMemDb::new();
> + let initial_version = db.get_version();
> +
> + // Version should increment on modifications
> + db.create("/file1.txt", libc::S_IFREG, 1000).unwrap();
> + assert!(db.get_version() > initial_version);
> +
> + let v1 = db.get_version();
> + db.write("/file1.txt", 0, 1001, b"data", false).unwrap();
> + assert!(db.get_version() > v1);
> +
> + let v2 = db.get_version();
> + db.delete("/file1.txt").unwrap();
> + assert!(db.get_version() > v2);
> + }
> +
> + #[test]
> + fn test_mock_memdb_isolation() {
> + // Each MockMemDb instance is completely isolated
> + let db1 = MockMemDb::new();
> + let db2 = MockMemDb::new();
> +
> + db1.create("/test.txt", libc::S_IFREG, 1000).unwrap();
> +
> + // db2 should not see db1's files
> + assert!(db1.exists("/test.txt").unwrap());
> + assert!(!db2.exists("/test.txt").unwrap());
> + }
> +
> + #[test]
> + fn test_mock_memdb_as_trait_object() {
> + // Demonstrate using MockMemDb through trait object
> + let db: Arc<dyn MemDbOps> = Arc::new(MockMemDb::new());
> +
> + db.create("/trait-test.txt", libc::S_IFREG, 2000).unwrap();
> + assert!(db.exists("/trait-test.txt").unwrap());
> +
> + db.write("/trait-test.txt", 0, 2001, b"via trait", false)
> + .unwrap();
> + let data = db.read("/trait-test.txt", 0, 100).unwrap();
> + assert_eq!(&data[..], b"via trait");
> + }
> +
> + #[test]
> + fn test_mock_memdb_error_cases() {
> + let db = MockMemDb::new();
> +
> + // Create duplicate should fail
> + db.create("/dup.txt", libc::S_IFREG, 1000).unwrap();
> + assert!(db.create("/dup.txt", libc::S_IFREG, 1000).is_err());
> +
> + // Read non-existent file should fail
> + assert!(db.read("/nonexistent.txt", 0, 100).is_err());
> +
> + // Write to non-existent file should fail
> + assert!(
> + db.write("/nonexistent.txt", 0, 1000, b"data", false)
> + .is_err()
> + );
> +
> + // Empty path should fail
> + assert!(db.create("", libc::S_IFREG, 1000).is_err());
> + }
> +}