From: "Lukas Wagner" <l.wagner@proxmox.com>
To: "Proxmox VE development discussion" <pve-devel@lists.proxmox.com>,
"Aaron Lauterer" <a.lauterer@proxmox.com>
Subject: Re: [pve-devel] [PATCH many v4 00/31] Expand and migrate RRD data and add/change summary graphs
Date: Tue, 29 Jul 2025 14:19:16 +0200
Message-ID: <DBOJ7YLX5AL1.3M8J791J4K9J4@proxmox.com>
In-Reply-To: <20250726010626.1496866-1-a.lauterer@proxmox.com>
On Sat Jul 26, 2025 at 3:05 AM CEST, Aaron Lauterer wrote:
> This patch series does a few things. It expands the RRD format for nodes and
> VMs. For all types (nodes, VMs, storage) we adjust the aggregation to align
> them with the way they are done on the Backup Server. Therefore, we have new
> RRD definitions for all 3 types.
>
> New values are added for nodes and VMs. In particular:
>
> Nodes:
> * memfree
> * arcsize
> * pressures:
> * cpu some
> * io some
> * io full
> * mem some
> * mem full
>
> VMs:
> * memhost (memory consumption of all processes in the guests cgroup, host view)
> * pressures:
> * cpu some
> * cpu full
> * io some
> * io full
> * mem some
> * mem full
>
> The change in RRD columns and aggregation means that we need new RRD files. To
> not lose old RRD data, we need to migrate the old RRD files to the ones with
> the new schema. Some initial performance tests showed that migrating 10k VM
> RRD files took ~2m40s single threaded. This is way too long to do it within the
> pmxcfs itself. Therefore this will be a dedicated step:
> The new `proxmox-rrd-migration-tool` migrates the RRD files to the new location
> and aggregation schemas. It is run automatically by the postinst script of the
> pve-manager.
>
> This also means that we need to handle the situation of new and old RRD
> files and formats. Therefore we introduce new keys by which the metrics
> are broadcast in a cluster. Up until now (pre PVE9), it is in the format of
> 'pve2-{type}/{resource id}'.
> Having the version number this early in the string makes it tough to match
> against newer ones, especially in the C code of the pmxcfs. To make it easier
> in the future, we change the key format to 'pve-{type}-{version}/{resource id}'.
> This way, we can fuzzy match against unknown 'pve-{type}-{version}' in the C
> code too and handle those situations better.
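A minimal sketch of how the two key formats quoted above could be told apart, written in Python rather than the C of pmxcfs. The type names (node, vm, storage) come from the cover letter; the regular expressions, the `MetricKey` tuple and `parse_key` are hypothetical names for illustration and are not the actual pmxcfs implementation:

```python
import re
from typing import NamedTuple, Optional


class MetricKey(NamedTuple):
    """Hypothetical parsed form of a broadcast key (illustration only)."""
    rtype: str               # "node", "vm" or "storage"
    version: Optional[str]   # None for the old pve2-{type} format
    resource_id: str


# New format: pve-{type}-{version}/{resource id}
# The version pattern is an assumption: dot-separated numbers such as "9.0".
_NEW_KEY = re.compile(r"^pve-(node|vm|storage)-(\d+(?:\.\d+)*)/(.+)$")

# Old format (pre PVE 9): pve2-{type}/{resource id}
_OLD_KEY = re.compile(r"^pve2-(node|vm|storage)/(.+)$")


def parse_key(key: str) -> Optional[MetricKey]:
    """Return the parsed key, or None if it matches neither format."""
    m = _NEW_KEY.match(key)
    if m:
        return MetricKey(m.group(1), m.group(2), m.group(3))
    m = _OLD_KEY.match(key)
    if m:
        return MetricKey(m.group(1), None, m.group(2))
    return None
```

Because the version now follows the type, a key with a version this code has never seen, for example `pve-vm-10.1/100`, still parses into a known type, which is the kind of fuzzy matching described above.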
>
> As a result, to avoid breaking changes, we are only allowed to add new
> columns, but not modify or remove existing columns!
>
>
> To avoid missing data and key errors in the journal, we already bumped
> changes to PVE 8 so it can handle the new format sent out by pvestatd in the
> latest versions.
>
> On the GUI side, we switch memory graphs to stacked area graphs and for VMs
> we also have a dedicated line for the memory consumption as the host sees it,
> because the current memory view of a VM will switch to the internal guest view
> if we get detailed information via the ballooning device.
> To make those slightly more complicated graphs possible, we need to adapt
> RRDChart.js in the widget-toolkit to allow for detailed overrides.
>
> While we are at it, we can also fix bug #6068 (Node Search tab incorrect Host
> memory usage %) by switching to memhost if available and fixing one wrong if
> check.
>
>
> As a side note, now that we got pressure graphs, we could start thinking about
> dropping the server load and IO wait graphs. Those are not very specific and
> mash many different metrics into a single one.
>
>
> Release notes:
> We should probably mention in the release notes that, due to the changed
> aggregation settings, it is expected that the resulting RRD files might have
> some data points that the originals didn't have. We observed that in some
> situations we could get a data point one time step earlier than before.
> This is most likely due to how RRD recalculates the aggregated data with the
> different resolution.
>
> In the pve8to9 checks, we now have a check that makes sure we do have enough
> free space, as the new RRD files with the new columns and more detailed
> aggregation steps are quite a bit larger. We also check after install whether
> any RRD files have not yet been migrated, which would warrant another manual
> run of the migration tool.
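A rough sketch of what the "not yet migrated" check could look like, in Python. This is not the actual pve8to9 code. The directory names `pve2-node`, `pve2-vm` and `pve2-storage` are assumptions derived from the old key format, the base directory is passed in by the caller, and the `.old` suffix for migrated sources is taken from the known-issues note further down:

```python
from pathlib import Path
from typing import List

# Assumed names of the old-format RRD directories (not confirmed by this mail).
OLD_DIRS = ("pve2-node", "pve2-vm", "pve2-storage")


def unmigrated_files(base: Path) -> List[Path]:
    """List old-format RRD files under `base` that were not renamed to *.old."""
    found: List[Path] = []
    for name in OLD_DIRS:
        directory = base / name
        if not directory.is_dir():
            continue
        # Storage files may sit one level deeper, so walk recursively.
        for path in sorted(directory.rglob("*")):
            if path.is_file() and path.suffix != ".old":
                found.append(path)
    return found
```

A non-empty result would be the hint to run the migration tool again by hand. On a live system this may report false positives, given the known issue of source files reappearing without the `.old` suffix.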
>
> Plans:
> * add doc patches for the summary pages that explain the different graphs and
> make the help button point to those sections
>
> KNOWN ISSUES:
> * on a live system, renaming the source RRD files to FILE.old doesn't seem to
> work as expected and besides the renamed ones, new ones without the .old suffix
> show up again. I suspect some interaction with rrdcached and/or pmxcfs receiving
> new data.
>
> How to test:
> 1. have PVE8 nodes on the latest version (>= 8.4.4)
> 2. Upgrade the first node to PVE9/trixie and install all the other patches
> to see the automatic upgrade; pve-manager might need to be temporarily
> bumped to 9.0.0~12!
> Build all the other repositories, copy the .deb files over and then ideally
> use something like the following to make sure that any dependency will be
> used from the deb files, and not the apt repositories.
> ```
> apt install ./*.deb --reinstall --allow-downgrades -y
> ```
> 3. you should see the pve-manager package calling the
> proxmox-rrd-migration-tool
>

Gave this series a test on a pre-existing 3-node dev cluster.
I first upgraded the cluster to PVE9 and then installed the packages
from this series on top. I could verify that the
proxmox-rrd-migration-tool is executed by pve-manager's d/postinst
script.

Now, unfortunately that was a cluster that I don't use that often, so
the weekly/monthly/yearly RRD data was pretty empty, but I can at least
verify that the hourly data was migrated successfully. Also I could see
the new pressure graphs being populated immediately after the upgrade.
Nice!

During testing I did not really encounter any issues.

Regarding the memleak in pmxcfs that I mentioned during my review of the
pve-cluster patches: I could definitely see RSS creep up quite slowly.
Not sure how much of that is due to the leak and how much is 'normal',
where the heap size slowly converges to some final maximum. I'll keep
the cluster running for a bit more time and see where this goes.
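
For anyone who wants to watch the RSS trend themselves, here is a small Python sketch under stated assumptions. It reads the `VmRSS` line from `/proc/<pid>/status` on Linux. The function names and the idea of comparing the first and last sample are illustrative only and are not how the numbers above were obtained:

```python
import re
from typing import List, Optional, Tuple

# /proc/<pid>/status reports resident set size as e.g. "VmRSS:     51234 kB".
_VMRSS = re.compile(r"^VmRSS:\s+(\d+)\s+kB$", re.MULTILINE)


def parse_vmrss_kib(status_text: str) -> Optional[int]:
    """Extract VmRSS in KiB from the text of /proc/<pid>/status."""
    m = _VMRSS.search(status_text)
    return int(m.group(1)) if m else None


def read_rss_kib(pid: int) -> Optional[int]:
    """Read the current RSS of a process (Linux only)."""
    with open(f"/proc/{pid}/status", encoding="ascii", errors="replace") as f:
        return parse_vmrss_kib(f.read())


def growth_kib_per_hour(samples: List[Tuple[float, int]]) -> Optional[float]:
    """Average growth between the first and last (timestamp_s, rss_kib) sample."""
    if len(samples) < 2:
        return None
    (t0, r0), (t1, r1) = samples[0], samples[-1]
    if t1 <= t0:
        return None
    return (r1 - r0) * 3600 / (t1 - t0)
```

Sampling `read_rss_kib()` for the pmxcfs PID every few minutes and feeding the pairs into `growth_kib_per_hour()` gives a rough slope. A value that keeps falling toward zero over time would point to normal heap convergence rather than a leak.
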
Consider this:
Tested-by: Lukas Wagner <l.wagner@proxmox.com>