From: "Laurențiu Leahu-Vlăducu" <l.leahu-vladucu@proxmox.com>
To: pve-devel@lists.proxmox.com
Subject: Re: [pve-devel] [PATCH many v3 00/34] Expand and migrate RRD data and add/change summary graphs
Date: Wed, 23 Jul 2025 12:15:23 +0200
Message-ID: <4c4a3036-b6bd-4a68-b1c3-5a66263afe7d@proxmox.com>
In-Reply-To: <20250715143218.1548306-1-a.lauterer@proxmox.com>
I tested this patch series on a fully up-to-date Proxmox VE 8.4.5
cluster of 3 nodes which I then updated to 9.0 BETA. I tested:
- Having both patched and unpatched nodes
- Migrating VMs from patched nodes to unpatched nodes.
- Having only patched nodes.
My test results and remarks:
1. The updated graphs work as expected. In some cases, I noticed that
the data was aggregated differently when looking at a certain node
depending on the node I was connected to (e.g. looking at data of node 1
from node 1 vs node 2 vs node 3) - but these differences were also
present on unpatched nodes (thus unrelated to this patch series).
2. Patching a node made the data appear in the new graphs. Migrating a
VM from node 1 (patched) to node 2 (unpatched) and looking at the node 2
data from node 1 also correctly shows the data before the migration (but
obviously does not generate new data, since node 2 was unpatched). In
other words, it works as expected.
3. Dominik already gave feedback on the tooltips, but I want to mention
one more thing that I noticed: the tooltips don't work when using a
touch screen (tested on latest Firefox and Chromium). This is unrelated
to your patch, since it also doesn't work in the graphs when clicking on
the data points (also on unpatched nodes). However, we should either
explain the graphs differently, or fix the tooltips on larger touch
devices (e.g. tablets).
4. The Hour, Day, Week and Month graphs now show the time spans more
accurately than before (e.g. an hour is really an hour, more or less),
but still not perfectly (e.g. an hour actually shows 59 minutes).
However, the Year graph seems to be off by a few months on my side (e.g.
currently shows graphs since mid October 2024 instead of mid July 2024).
5. I'm a bit worried that the "Summary" tab starts getting rather
crowded, with no way to change this (if desired). I think this might not
be a huge issue yet, but if we want to add even more information in the
future, it will probably be annoying, since there is currently:
- no way to minimize/maximize graphs
- no way to resize graphs
- no way to move graphs around (e.g. in case the user is mainly
interested in one or a few graphs)
I'm aware that such changes add additional complexity, but if we want to
add even more information in the future, this might eventually become
necessary. Either that, or moving some of the information to another
tab, e.g. to a dedicated "Pressure" tab separated from "Summary" (but
this would mean not being able to visualize everything at once, which is
also not great).
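To make the size of the Year offset from point 4 concrete, here is a minimal sketch. The nominal window lengths and the exact observed start date ("mid October 2024" taken as 2024-10-15) are assumptions for illustration, not values taken from the RRD definitions in this series.

```python
from datetime import datetime, timedelta

# Nominal window lengths. These are assumptions for illustration and are
# not read from the RRD definitions of the patch series.
NOMINAL_WINDOWS = {
    "hour": timedelta(hours=1),
    "day": timedelta(days=1),
    "week": timedelta(weeks=1),
    "month": timedelta(days=30),
    "year": timedelta(days=365),
}


def expected_start(now: datetime, timeframe: str) -> datetime:
    """Return the timestamp at which a graph for `timeframe` should begin."""
    return now - NOMINAL_WINDOWS[timeframe]


def missing_span(now: datetime, timeframe: str, observed_start: datetime) -> timedelta:
    """Return how much of the nominal window is missing from the graph."""
    return observed_start - expected_start(now, timeframe)


now = datetime(2025, 7, 23, 12, 15)
# "Mid October 2024" approximated as 2024-10-15 (assumption).
observed_year_start = datetime(2024, 10, 15, 12, 15)

print(expected_start(now, "year").date())                     # 2024-07-23
print(missing_span(now, "year", observed_year_start).days)    # 84
print(missing_span(now, "hour", now - timedelta(minutes=59)))  # 0:01:00
```

With these assumptions the Year graph is short by roughly 84 days, while the Hour graph is short by one minute.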
Laurențiu
On 15.07.25 16:31, Aaron Lauterer wrote:
> This patch series does a few things. It expands the RRD format for nodes and
> VMs. For all types (nodes, VMs, storage) we adjust the aggregation to align
> them with the way they are done on the Backup Server. Therefore, we have new
> RRD definitions for all 3 types.
>
> New values are added for nodes and VMs. In particular:
>
> Nodes:
> * memfree
> * arcsize
> * pressures:
> * cpu some
> * io some
> * io full
> * mem some
> * mem full
>
> VMs:
> * memhost (memory consumption of all processes in the guests cgroup, host view)
> * pressures:
> * cpu some
> * cpu full
> * io some
> * io full
> * mem some
> * mem full
>
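For readers unfamiliar with the "some" and "full" values: the Linux kernel exposes pressure stall information as text lines in /proc/pressure/{cpu,io,memory} and in the cgroup v2 *.pressure files. The following is a minimal Python sketch of parsing that text, not the Perl code from this series; the function name and the sample numbers are made up, and which of the averages ends up in the RRD is not something this sketch claims.

```python
def parse_pressure(text: str) -> dict[str, dict[str, float]]:
    """Parse PSI text into {"some": {"avg10": ..., ...}, "full": {...}}."""
    result: dict[str, dict[str, float]] = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip empty lines
        kind, pairs = fields[0], fields[1:]
        values = {}
        for pair in pairs:
            # Each field looks like "avg10=0.00" or "total=12345".
            key, _, value = pair.partition("=")
            values[key] = float(value)
        result[kind] = values
    return result


# Sample input in the kernel's PSI format (numbers are made up).
SAMPLE = """\
some avg10=1.50 avg60=0.75 avg300=0.10 total=123456
full avg10=0.50 avg60=0.25 avg300=0.00 total=7890
"""

parsed = parse_pressure(SAMPLE)
print(parsed["some"]["avg10"])  # 1.5
print(parsed["full"]["total"])  # 7890.0
```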
> The change in RRD columns and aggregation means that we need new RRD files. To
> not lose old RRD data, we need to migrate the old RRD files to the ones with
> the new schema. Some initial performance tests showed that migrating 10k VM
> RRD files took ~2m40s single threaded. This is way too long to do it within the
> pmxcfs itself. Therefore this will be a dedicated step. I wrote a small Rust
> tool that binds to librrd to do the migration.
>
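As a rough feel for the numbers quoted above: 2m40s for 10k files is about 16 ms per file. A small sketch of that arithmetic follows; it assumes the migration time scales linearly with the number of files, which the measurement does not prove.

```python
# Figures from the cover letter: 10k VM RRD files in ~2m40s, single threaded.
MEASURED_FILES = 10_000
MEASURED_SECONDS = 2 * 60 + 40  # 160 seconds


def per_file_ms() -> float:
    """Average migration time per RRD file in milliseconds."""
    return MEASURED_SECONDS * 1000 / MEASURED_FILES


def estimate_seconds(n_files: int) -> float:
    """Estimated single-threaded duration, assuming linear scaling."""
    return n_files * MEASURED_SECONDS / MEASURED_FILES


print(per_file_ms())          # 16.0
print(estimate_seconds(500))  # 8.0
```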
> We could include it in a post-install step when upgrading to PVE 9.
>
> This also means that we need to handle the situation of new and old RRD
> files and formats. Therefore we introduce new keys by which the metrics
> are broadcast in a cluster. Up until now (pre PVE9), it is in the format of
> 'pve2-{type}/{resource id}'.
> Having the version number this early in the string makes it tough to match
> against newer ones, especially in the C code of the pmxcfs. To make it easier
> in the future, we change the key format to 'pve-{type}-{version}/{resource id}'.
> This way, we can fuzzy match against unknown 'pve-{type}-{version}' in the C
> code too and handle those situations better.
>
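The benefit of the new key layout can be sketched as follows. This is an illustrative Python version, not the C code in pmxcfs; the regular expressions, the MetricKey type and the sample keys are assumptions based only on the two formats described above.

```python
import re
from typing import NamedTuple, Optional


class MetricKey(NamedTuple):
    rtype: str              # e.g. "node", "vm", "storage"
    version: Optional[str]  # None for the pre-PVE 9 format
    resource_id: str


# Old format: 'pve2-{type}/{resource id}'
OLD_KEY = re.compile(r"^pve2-(?P<rtype>[a-z]+)/(?P<rid>.+)$")
# New format: 'pve-{type}-{version}/{resource id}'
NEW_KEY = re.compile(
    r"^pve-(?P<rtype>[a-z]+)-(?P<version>\d+(?:\.\d+)*)/(?P<rid>.+)$"
)


def parse_key(key: str) -> Optional[MetricKey]:
    """Split a metric key into type, version and resource id."""
    m = NEW_KEY.match(key)
    if m:
        # Any version matches here, including ones this code does not know yet.
        return MetricKey(m["rtype"], m["version"], m["rid"])
    m = OLD_KEY.match(key)
    if m:
        return MetricKey(m["rtype"], None, m["rid"])
    return None


for key in ("pve2-node/node1", "pve-vm-9.0/100", "pve-node-9.1/node1"):
    print(key, "->", parse_key(key))
```

Because the version is a separate trailing component of the prefix, a key with an unknown newer version (the hypothetical "9.1" above) can still be recognized by type instead of being rejected outright.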
> The result is that, to avoid breaking changes, we are only allowed to add new
> columns, but not modify or remove existing columns!
>
>
> To avoid missing data and key errors in the journal, we need to ship some
> changes to PVE 8 that can handle the new format sent out by pvestatd. Those
> patches are the first in the series and are marked with a "-pve8" postfix in the
> repo name.
> Those patches are present twice, as we try to keep the same change history on
> the PVE9 branches as well.
>
>
> On the GUI side, we switch memory graphs to stacked area graphs and for VMs
> we also have a dedicated line for the memory consumption as the host sees it,
> because the current memory view of a VM will switch to the internal guest view
> if we get detailed info via the ballooning device.
> To make those slightly more complicated graphs possible, we need to adapt
> RRDChart.js in the widget-toolkit to allow for detailed overrides. Additionally
> we introduce info buttons with tooltips to give users a quick hint what certain
> graphs represent.
>
> While we are at it, we can also fix bug #6068 (Node Search tab incorrect Host
> memory usage %) by switching to memhost if available and fixing one wrong if
> check.
>
>
> As a side note, now that we got pressure graphs, we could start thinking about
> dropping the server load and IO wait graphs. Those are not very specific and
> mash many different metrics into a single one.
>
>
> Release notes:
> We should probably mention in the release notes that, due to the changed
> aggregation settings, it is expected that the resulting RRD files might have
> some data points that the originals didn't have. We observed that in some
> situations we could get a data point one time step earlier than before.
> This is most likely due to how RRD recalculates the aggregated data with the
> different resolution.
>
>
> Plans:
> * pve8to9:
> * have a check how many RRD files are present and verify that there is enough
> space on the root FS
>
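The planned pve8to9 check could look roughly like the sketch below. It is only an illustration in Python, not the actual pve8to9 code; the function names and the growth_factor parameter are hypothetical, and the real size ratio between old and new RRD files is not known from this thread.

```python
import os
import shutil


def count_rrd_files(rrd_root: str) -> tuple[int, int]:
    """Return (number of files, total size in bytes) below rrd_root."""
    count = 0
    total = 0
    for dirpath, _dirnames, filenames in os.walk(rrd_root):
        for name in filenames:
            count += 1
            total += os.path.getsize(os.path.join(dirpath, name))
    return count, total


def has_enough_space(rrd_root: str, target_fs: str = "/",
                     growth_factor: float = 2.0) -> bool:
    """Check whether target_fs has room for the migrated RRD files.

    growth_factor is a hypothetical safety margin: the migrated files are
    assumed to need at most growth_factor times the size of the old ones.
    """
    _count, total = count_rrd_files(rrd_root)
    free = shutil.disk_usage(target_fs).free
    return free >= total * growth_factor
```

The RRD directory is passed in as a parameter on purpose, so the sketch makes no claim about where the files live on a given installation.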
>
> How to test:
> 1. build pve-cluster on PVE8
> 2. build the -pve8 patches (cluster & manager) and install them on all PVE8 nodes
> 3. Upgrade the first node to PVE9/trixie and install all the other patches:
> build all the other repositories, copy the .deb files over and then ideally
> use something like the following to make sure that any dependency will be
> used from the deb files, and not the apt repositories.
> ```
> apt install ./*.deb --reinstall --allow-downgrades -y
> ```
> 4. build the migration tool with cargo and copy the binary to the nodes for now.
> 5. run the migration tool on the first host
> 6. continue running the migration tool on the other nodes one by one
>
>
> High level changes since:
> v2:
> * several bugfixes that I found, especially regarding pressure and memory
> collection for CTs and VMs
> * add missing return property descriptions for pressures
> * added all the GUI changes
>
> v1:
> * refactored the patches as they were a bit of a mess in v1, sorry for that
> now we have distinct patches for pve8 for both affected repos (cluster & manager)
>
> RFC:
> * drop membuffer and memcached in favor of already present memused and memavailable
> * switch from pve9-{type} to pve-{type}-9.0 schema in all places
> * add patch for PVE8 & 9 that handles different keys in live status to avoid
> question marks in the UI
>
> cluster-pve8:
>
> Aaron Lauterer (2):
> cfs status.c: drop old pve2-vm rrd schema support
> status: handle new metrics update data
>
> src/pmxcfs/status.c | 85 ++++++++++++++++++++++++++++-----------------
> 1 file changed, 53 insertions(+), 32 deletions(-)
>
>
> manager-pve8:
>
> Aaron Lauterer (2):
> api2tools: drop old VM rrd schema
> api2tools: extract stats: handle existence of new pve-{type}-9.0 data
>
> PVE/API2Tools.pm | 44 ++++++++++++++++++++++++--------------------
> 1 file changed, 24 insertions(+), 20 deletions(-)
>
>
> pve9-rrd-migration-tool:
>
> Aaron Lauterer (1):
> introduce rrd migration tool for pve8 -> pve9
>
>
> cluster:
>
> Aaron Lauterer (4):
> cfs status.c: drop old pve2-vm rrd schema support
> status: handle new metrics update data
> status: introduce new pve-{type}- rrd and metric format
> rrd: adapt to new RRD format with different aggregation windows
>
> src/PVE/RRD.pm | 52 ++++++--
> src/pmxcfs/status.c | 318 ++++++++++++++++++++++++++++++++++++++------
> 2 files changed, 317 insertions(+), 53 deletions(-)
>
>
> common:
>
> Folke Gleumes (2):
> fix error in pressure parsing
> add function to retrieve pressures from cgroup
>
> src/PVE/ProcFSTools.pm | 13 ++++++++++++-
> 1 file changed, 12 insertions(+), 1 deletion(-)
>
>
> widget-toolkit:
>
> Aaron Lauterer (2):
> rrdchart: allow to override the series object
> rrdchart: use reference for undo button
>
> src/panel/RRDChart.js | 56 +++++++++++++++++++++++++++++++++----------
> 1 file changed, 43 insertions(+), 13 deletions(-)
>
>
> manager:
>
> Aaron Lauterer (13):
> api2tools: drop old VM rrd schema
> api2tools: extract stats: handle existence of new pve-{type}-9.0 data
> pvestatd: collect and distribute new pve-{type}-9.0 metrics
> api: nodes: rrd and rrddata add decade option and use new pve-node-9.0
> rrd files
> api2tools: extract_vm_status add new vm memhost column
> ui: rrdmodels: add new columns and update existing
> ui: node summary: use stacked memory graph with zfs arc
> ui: GuestStatusView: add memhost for VM guests
> ui: GuestSummary: memory switch to stacked and add hostmem
> ui: nodesummary: guestsummary: add tooltip info buttons
> ui: summaries: use titles for disk and network series
> ui: ResourceStore: add memhost column
> fix #6068: ui: utils: calculate and render host memory usage correctly
>
> Folke Gleumes (1):
> ui: add pressure graphs to node and guest summary
>
> PVE/API2/Cluster.pm | 7 +
> PVE/API2/Nodes.pm | 16 +-
> PVE/API2Tools.pm | 47 ++--
> PVE/Service/pvestatd.pm | 342 +++++++++++++++++++-------
> www/manager6/Utils.js | 8 +-
> www/manager6/data/ResourceStore.js | 8 +
> www/manager6/data/model/RRDModels.js | 44 +++-
> www/manager6/node/Summary.js | 79 +++++-
> www/manager6/panel/GuestStatusView.js | 18 +-
> www/manager6/panel/GuestSummary.js | 88 ++++++-
> 10 files changed, 528 insertions(+), 129 deletions(-)
>
>
> storage:
>
> Aaron Lauterer (1):
> status: rrddata: use new pve-storage-9.0 rrd location if file is
> present
>
> src/PVE/API2/Storage/Status.pm | 9 ++++-----
> 1 file changed, 4 insertions(+), 5 deletions(-)
>
>
> qemu-server:
>
> Aaron Lauterer (3):
> vmstatus: add memhost for host view of vm mem consumption
> vmstatus: switch mem stat to PSS of VM cgroup
> rrddata: use new pve-vm-9.0 rrd location if file is present
>
> Folke Gleumes (1):
> metrics: add pressure to metrics
>
> src/PVE/API2/Qemu.pm | 11 ++++----
> src/PVE/QemuServer.pm | 65 +++++++++++++++++++++++++++++++++++++++++--
> 2 files changed, 69 insertions(+), 7 deletions(-)
>
>
> container:
>
> Aaron Lauterer (1):
> rrddata: use new pve-vm-9.0 rrd location if file is present
>
> Folke Gleumes (1):
> metrics: add pressures to metrics
>
> src/PVE/API2/LXC.pm | 11 ++++++-----
> src/PVE/LXC.pm | 34 ++++++++++++++++++++++++++++++++++
> 2 files changed, 40 insertions(+), 5 deletions(-)
>
>
> Summary over all repositories:
> 21 files changed, 1090 insertions(+), 265 deletions(-)
>