[pve-devel] PVE Auditing System

From: Thomas Skinner @ 2026-01-26  3:03 UTC
  To: Proxmox VE development discussion

Hello!

I'm looking to implement an auditing system for PVE to help
organizations better understand the actions performed by users via the
API. In reference to the conversation in bug #4244, it seems that
there's not currently any development on an auditing system. I am with
Fabian in that we need to flesh out a design before going too far into
development. Below are Thomas Lamprecht's thoughts from a couple years
ago with my comments inline.

----

> - [ ] Auditing Framework
>     - [ ] Explore some auditing projects and possibly some (security) standard
>           requirements about what could be a good feature set and design, and
>           about what is a requirement to have to help users with strict
>           requirements/rules on such things (e.g., gov agencies)

Most of the requirements I've seen call for configurable
success/failure auditing of elevated-privilege use, access control
CRUD (user/group/domain/ACL), and other organizationally defined
events (a catch-all for anything subjectively _important_ happening
in the application). The logs must be in a standardized format that
includes the entities associated with the event and an accurate
representation of what/when/where actions occurred. Some of the
requirements in the US stem from PCI, NIST (specifically
SP 800-53), and HIPAA.

>     - [ ] Probably add some (root only) log format on disk that can be
>           filtered, rotated and allows configuring some guarantees for how long
>           stuff is saved

I would say leave the size, rotation, and retention up to the user.
The way logrotate is already used for the pveproxy logs should be
sufficiently configurable for this. In some fiddling so far, I found
it easiest to write a least-privilege log file for each of pveproxy,
pvedaemon, and spiceproxy, all in JSON format, which is easy to ingest
and to extend later. Ultimately, I think the output should be a file
that a log ingester could read for aggregation on another system if
required.

>     - [ ] Then, one probably wants hook/trace on every config change of guests
>           and node relevant stuff with an signature like: `($type, $id,
>           $change-key, $old, $new)` where `$type` and `$change-key` to be
>           considered API (no arbitrary changes of existing ones) and `$old` and
>           `$new` are arbitrary (scalar, hash/array ref).

A good hook spot looks to be the `rest_handler` function in
`PVE/HTTPServer.pm` of the pve-manager package. I'd propose placing
the call outside of the eval so that it can log both successes and
failures. In my experience so far, it's necessary to log from each
proxy endpoint because of the way validation/permission checks occur:
e.g., pvedaemon will never see a failure due to a permissions check in
pveproxy because the code returns before the request is ever proxied
over. Putting the logging function here carries the risk that a
request does not get a valid response when audit logging fails, but I
have seen requirements that the application stop when it cannot audit
(this needs to be configurable and safe).
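
To make the "outside of the eval" idea concrete, here is a minimal
sketch in Python rather than PVE's Perl. Everything in it
(`handle_request`, `AuditError`, the `fail_closed` switch) is a
hypothetical name for illustration, not existing PVE code; the `try`
block stands in for the eval around the handler call.

```python
import time


class AuditError(Exception):
    """Hypothetical error raised when an audit record cannot be written."""


def handle_request(handler, params, audit, fail_closed=False):
    """Run an API handler and audit the outcome, success or failure."""
    status, result, error = "success", None, None
    try:
        result = handler(params)  # stands in for the eval'd handler call
    except Exception as exc:
        status, error = "failure", exc

    # The audit call sits outside the try block above, so it sees both
    # successful and failed requests.
    try:
        audit({"time": time.time(), "status": status})
    except Exception as exc:
        if fail_closed:
            # Strict mode: refuse to answer if the event cannot be audited.
            raise AuditError("audit log unavailable") from exc
        # Fail-open mode: the request outcome is still returned below.

    if error is not None:
        raise error
    return result
```

With `fail_closed=False` an audit failure never changes the response;
with `fail_closed=True` the caller gets an error instead, which is the
"stop when it cannot audit" behavior some requirements ask for.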

Another option that I've thought of is running a separate daemon (call
it pveauditd) that receives audit messages from the different PVE
processes. This requires some interprocess communication, but could
reduce lag because the daemon could buffer messages and absorb the
I/O wait instead of the calling process. Locking down file permissions
is easier here, too. The hook would probably still be in the same
spot, but audit failures would be handled differently. A nice pro is
that other PVE processes could send log messages this way, too. I
could use a little guidance or an example of existing code if the
developers want to go this route.
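
A rough sketch of the hand-off, again in Python and purely
illustrative: the API process serializes a record and sends it over a
local datagram socket, and the daemon side receives and collects the
lines it would later write to disk. The function names are
hypothetical, and a socket pair within one process stands in for a
real daemon listening on a Unix socket path.

```python
import json
import socket


def make_channel():
    """Return (sender, receiver) ends of a local datagram channel."""
    return socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)


def emit(sender, record):
    """API-process side: hand the record off without touching the disk."""
    sender.send(json.dumps(record).encode("utf-8"))


def drain(receiver, count, max_size=65536):
    """Daemon side: receive `count` records and return them as JSON lines."""
    lines = []
    for _ in range(count):
        payload = receiver.recv(max_size)  # one datagram is one record
        lines.append(payload.decode("utf-8"))
    return lines
```

In this model only the daemon needs write access to the audit files;
what the sender should do when the daemon is not reachable is the
open design question.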

As far as fields go, I think there are some steadfast required fields:
- Datetime in UTC
- Subject (who performed)
- Object (identifier of object modified and its type)
- Action (what did the subject attempt to/successfully do to the object)
- Status (success/failure)
- Node name or IP (where it was performed, could be implied by where
the log resides)
- Source IP (where the request originated)
- Process name and ID (name could be implied in log file name)

Some other interesting fields would be:
- Before/after whole objects (mentioned specifically in bug #4244):
I'd recommend this to be configurable because I think it would require
two extra calls: one to retrieve the object before and another after
it's modified.
  - Considerations: sensitive fields (especially credentials) would
need to be redacted (character replacement or hashed)
- What changed on the object: this would be a breakdown of the
differences of the above objects (either fields that changed or the
actual changes that were made)
  - Considerations: requires at least the information above and may be
redundant if the above is included
- API parameters: this would include the parameters passed to the API
endpoint; on the development side, this is easy to include, would be
useful for determining how an object was changed, and is less costly
than the before/after model.
  - Considerations: sensitive fields (especially credentials) would
need to be redacted (character replacement or hashed)
- Event ID: some auditing systems (particularly Microsoft's) use a
unique ID for every different type of audit event.
  - Pros: enables translated and custom-formatted messages, is easily
processed/filtered by automated systems, and the ID mapping could be
used across multiple products (e.g. pve and datacenter manager).
  - Cons: uniqueness of each ID must be guaranteed, not always human
friendly.
  - Considerations: having this field could potentially eliminate the
need for action/object-type fields if each action/object-type combo
has its own ID; the format of the ID would need to be decided.

For inclusion of auditing into each API endpoint, I think an addition
to the `method_info` construct for each method would be appropriate.
Some advanced validation could be done at build time to ensure
uniqueness of action/object type or event ID.
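
A sketch of what that build-time check might look like, written in
Python for illustration. The `audit` key and its `object`/`action`
members are hypothetical additions, not part of the existing
`method_info` schema, and the endpoint entries are examples only.

```python
# Illustrative method descriptions; the "audit" key is an assumption.
METHODS = [
    {"path": "/access/users", "method": "POST",
     "audit": {"object": "user", "action": "create"}},
    {"path": "/access/users/{userid}", "method": "DELETE",
     "audit": {"object": "user", "action": "delete"}},
    {"path": "/version", "method": "GET"},  # not audited
]


def validate_audit_keys(methods):
    """Fail if two endpoints declare the same (object, action) pair."""
    seen = {}
    for info in methods:
        audit = info.get("audit")
        if audit is None:
            continue  # endpoint does not take part in auditing
        key = (audit["object"], audit["action"])
        endpoint = f'{info["method"]} {info["path"]}'
        if key in seen:
            raise ValueError(
                f"duplicate audit key {key}: {seen[key]} and {endpoint}")
        seen[key] = endpoint
    return seen
```

The same loop would work with event IDs by using the ID as the key
instead of the pair.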

>     - [ ] Allow one to enable or disable auditing on some/all guests/nodes, and
>           disable it by default due to cost

I completely agree on this one, and I'd argue that it should be
built in from the start. The default could easily be to not audit
anything explicitly, which should have minimal impact on runtime. It
would be nice to have the auditing configuration managed through the
API and synced via the cluster filesystem. I'd recommend node-level
overrides to cluster-level settings. If done in the API, this should
have a separate permission or API path.

My idea for implementation here is to build a hash during application
startup that contains all of the action/object types or event IDs,
and to look up each event's status in that hash. The hash could
either be updated or reloaded on changes in auditing settings.
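
As a sketch of that lookup (Python for illustration, hypothetical
names throughout): every known event defaults to "not audited",
cluster-level settings are applied first, and node-level settings are
applied last so that they override.

```python
def build_audit_table(known_events, cluster_cfg, node_cfg):
    """Build the event -> enabled lookup table used on each request."""
    table = {event: False for event in known_events}  # default: audit nothing
    for cfg in (cluster_cfg, node_cfg):  # node applied last, so it wins
        for event, enabled in cfg.items():
            if event in table:  # ignore settings for unknown events
                table[event] = bool(enabled)
    return table


def should_audit(table, event):
    """Constant-time check performed in the request path."""
    return table.get(event, False)
```

Rebuilding the table on a settings change is cheap, so a full reload
is probably simpler than updating entries in place.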

>     - [ ] Add interface to view and filter audit events

Consistent format in the log files should make this easier on the dev
side. Event IDs or action/object combos could be used to determine
translatable message formats to be shown to the user. Interfaces
should have either a separate permission or API path (restricting this
even to administrators is a common requirement). Cluster-level view
would be nice with a reasonable default for how many messages to
retrieve per node. An option to retrieve more for each node would be
nice. Node-level view could also be adapted, similar to how the VM
events are currently shown.
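
A minimal sketch of the cluster-level view, assuming each node's
audit log is a list of JSON lines with the newest entry last. The
function and parameter names are hypothetical; this is Python for
illustration, not a proposed API.

```python
import json


def filter_events(lines_by_node, predicate, limit_per_node=50):
    """Return the newest matching events per node, newest first."""
    result = {}
    for node, lines in lines_by_node.items():
        matches = []
        for line in reversed(lines):  # newest entries are at the end
            if len(matches) >= limit_per_node:
                break
            event = json.loads(line)
            if predicate(event):
                matches.append(event)
        result[node] = matches
    return result
```

Raising `limit_per_node` for a single node would cover the "retrieve
more" case without changing the default for the whole cluster.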

>     - [ ] Allow to produce notification's for an audit filter

I'd use the same logic as for the view/filter interface above. Event
IDs might make this easier.

----

I appreciate you all taking the time to review and reply to this
thread. An auditing system would be a great addition to the PVE
project that makes it even more enterprise friendly. In the
implementation that I'm thinking of, this would be an addition to
current logging files, not changing any existing formats, which should
make it a non-breaking change.

-- Thomas Skinner
