public inbox for pve-devel@lists.proxmox.com
From: Aaron Lauterer <a.lauterer@proxmox.com>
To: pve-devel@lists.proxmox.com
Subject: [pve-devel] [PATCH cluster 1/1] status: introduce new pve9- rrd and metric format
Date: Fri, 23 May 2025 18:00:14 +0200	[thread overview]
Message-ID: <20250523160029.404400-5-a.lauterer@proxmox.com> (raw)
In-Reply-To: <20250523160029.404400-1-a.lauterer@proxmox.com>

We add several new columns to the node and VM (guest) RRDs; see further
down for details. Additionally, we change the RRA definitions, i.e. how
the data is aggregated, to match what the Proxmox Backup Server does
[0].

The migration of an existing installation is handled by a dedicated
tool. Only once that has happened will we store new data in the new
format.
This leaves us with a few cases to handle:

  data recv →  |           old             |               new
  ↓ rrd files  |                           |
 --------------|---------------------------|-------------------------------------
  none         | check if directories exist:
               |     neither old nor new -> new
               |     new                 -> new
               |     old only            -> old
 --------------|---------------------------|-------------------------------------
  only old     | use old file as is        | cut new columns and use old file
 --------------|---------------------------|-------------------------------------
  new present  | pad data to match new fmt | use new file as is and pass data
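Sketched as plain logic, the table above boils down to the following (a simplified model; the patch implements this per key in status.c via g_file_test() on the pve9- and pve2- files and directories, so the enum and helper here are illustrative, not from the patch):

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { FMT_OLD, FMT_NEW } rrd_fmt;

/* Decide which on-disk RRD format a node/VM key ends up using. */
static rrd_fmt pick_format(bool new_file, bool old_file, bool new_dir, bool old_dir) {
    if (new_file)
        return FMT_NEW; /* already migrated: always keep using the new file */
    if (old_file)
        return FMT_OLD; /* not migrated yet: keep writing the old file */
    /* no file for this key yet: decide by which directory already exists */
    if (new_dir)
        return FMT_NEW;
    if (old_dir)
        return FMT_OLD;
    return FMT_NEW; /* fresh install: start out with the new format */
}
```

Combined with the format of the incoming data, this yields the pad/cut cases from the table.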

To handle the padding and cutting of the data, we use a buffer.
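In essence the buffer handling looks like this (a simplified sketch; skip_fields() stands in for the rrd_skip_data() helper the patch uses, so names and signatures here are ours):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Advance past `count` colon-separated fields. */
static const char *skip_fields(const char *data, int count) {
    while (count > 0 && *data) {
        if (*data++ == ':')
            count--;
    }
    return data;
}

/* Cut: new-format update data arrives, but only an old-format file
 * exists -> keep just the first `cutoff` columns. */
static void cut_data(char *buf, size_t buflen, const char *dp, int cutoff) {
    const char *cut = skip_fields(dp, cutoff);
    int len = (int)(cut - dp) - 1; /* -1 drops the trailing colon */
    snprintf(buf, buflen, "%.*s", len, dp);
}

/* Pad: old-format update data arrives for a new-format file ->
 * append "U" (unknown) for every missing column. */
static void pad_data(char *buf, size_t buflen, const char *dp, const char *padding) {
    snprintf(buf, buflen, "%s:%s", dp, padding);
}
```

For example, cutting "1:2:3:4:5" to 3 columns yields "1:2:3", and padding "1:2" with "U:U:U" yields "1:2:U:U:U".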

We add the following new columns:

Nodes:
* memfree
* membuffers
* memcached
* arcsize
* pressures:
  * cpu some
  * io some
  * io full
  * mem some
  * mem full

VMs:
* memhost (memory consumption of all processes in the guest's cgroup, host view)
* pressures:
  * cpu some
  * cpu full
  * io some
  * io full
  * mem some
  * mem full
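The pressure columns come from the kernel's PSI interface (/proc/pressure/{cpu,io,memory} on the host; the cgroup cpu.pressure etc. files use the same line format). A minimal sketch of pulling one average out of a PSI line (the helper is illustrative only; the actual collection happens in pvestatd/pve-common, not in this patch):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Parse the avg10 value out of one PSI line, e.g.
 * "some avg10=1.23 avg60=0.50 avg300=0.10 total=12345".
 * Returns -1.0 if the line contains no avg10 field. */
static double psi_avg10(const char *line) {
    const char *p = strstr(line, "avg10=");
    double val = -1.0;
    if (p)
        sscanf(p, "avg10=%lf", &val);
    return val;
}
```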

[0] https://git.proxmox.com/?p=proxmox-backup.git;a=blob;f=src/server/metric_collection/rrd.rs;h=ed39cc94ee056924b7adbc21b84c0209478bcf42;hb=dc324716a688a67d700fa133725740ac5d3795ce#l76
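Each RRA covers steps * stepsize * rows seconds. With the 60 second step size, the new RRAs work out to 1 day, 30 days, ~1 year, and ~10 years of retention. A quick sanity check of those windows (the helper is illustrative, not part of the patch):

```c
#include <assert.h>

/* Retention of one RRA in days: steps * step size in seconds * rows / 86400. */
static double rra_days(int steps, int step_seconds, int rows) {
    return (double)steps * step_seconds * rows / 86400.0;
}
```

E.g. rra_days(1, 60, 1440) gives 1 day, and rra_days(10080, 60, 570) gives 3990 days, roughly 10.9 years.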

Signed-off-by: Aaron Lauterer <a.lauterer@proxmox.com>
---
 src/pmxcfs/status.c | 242 +++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 217 insertions(+), 25 deletions(-)

diff --git a/src/pmxcfs/status.c b/src/pmxcfs/status.c
index 3fdb179..4f258f6 100644
--- a/src/pmxcfs/status.c
+++ b/src/pmxcfs/status.c
@@ -1129,6 +1129,21 @@ kventry_hash_set(
 	return TRUE;
 }
 
+// RRAs are defined in the following way:
+//
+// RRA:CF:xff:steps:rows
+// CF: consolidation function, AVERAGE or MAX
+// xff: 0.5
+// steps: how many primary data points are aggregated per row; the step size
+//	itself is defined on RRD file creation! Example with 60 second step size:
+//	1 => 60 sec, 30 => 1800 seconds or 30 min
+// rows: how many aggregated rows are kept, i.e. how far back in time we store data
+//
+// seconds covered per RRA: steps * stepsize * rows
+// hours covered per RRA: steps * stepsize * rows / 3600
+// days covered per RRA: steps * stepsize * rows / 3600 / 24
+// https://oss.oetiker.ch/rrdtool/tut/rrd-beginners.en.html#Understanding_by_an_example
+
+// Time step size 60 seconds
 static const char *rrd_def_node[] = {
 	"DS:loadavg:GAUGE:120:0:U",
 	"DS:maxcpu:GAUGE:120:0:U",
@@ -1157,6 +1172,43 @@ static const char *rrd_def_node[] = {
 	NULL,
 };
 
+// Time step size 60 seconds
+static const char *rrd_def_node_pve9[] = {
+	"DS:loadavg:GAUGE:120:0:U",
+	"DS:maxcpu:GAUGE:120:0:U",
+	"DS:cpu:GAUGE:120:0:U",
+	"DS:iowait:GAUGE:120:0:U",
+	"DS:memtotal:GAUGE:120:0:U",
+	"DS:memused:GAUGE:120:0:U",
+	"DS:swaptotal:GAUGE:120:0:U",
+	"DS:swapused:GAUGE:120:0:U",
+	"DS:roottotal:GAUGE:120:0:U",
+	"DS:rootused:GAUGE:120:0:U",
+	"DS:netin:DERIVE:120:0:U",
+	"DS:netout:DERIVE:120:0:U",
+	"DS:memfree:GAUGE:120:0:U",
+	"DS:membuffers:GAUGE:120:0:U",
+	"DS:memcached:GAUGE:120:0:U",
+	"DS:arcsize:GAUGE:120:0:U",
+	"DS:pressurecpusome:GAUGE:120:0:U",
+	"DS:pressureiosome:GAUGE:120:0:U",
+	"DS:pressureiofull:GAUGE:120:0:U",
+	"DS:pressurememorysome:GAUGE:120:0:U",
+	"DS:pressurememoryfull:GAUGE:120:0:U",
+
+	"RRA:AVERAGE:0.5:1:1440", // 1 min * 1440 => 1 day
+	"RRA:AVERAGE:0.5:30:1440", // 30 min * 1440 => 30 days
+	"RRA:AVERAGE:0.5:360:1440", // 6 hours * 1440 => 360 days, ~1 year
+	"RRA:AVERAGE:0.5:10080:570", // 1 week * 570 => ~10 years
+
+	"RRA:MAX:0.5:1:1440", // 1 min * 1440 => 1 day
+	"RRA:MAX:0.5:30:1440", // 30 min * 1440 => 30 days
+	"RRA:MAX:0.5:360:1440", // 6 hours * 1440 => 360 days, ~1 year
+	"RRA:MAX:0.5:10080:570", // 1 week * 570 => ~10 years
+	NULL,
+};
+
+// Time step size 60 seconds
 static const char *rrd_def_vm[] = {
 	"DS:maxcpu:GAUGE:120:0:U",
 	"DS:cpu:GAUGE:120:0:U",
@@ -1183,6 +1235,39 @@ static const char *rrd_def_vm[] = {
 	NULL,
 };
 
+// Time step size 60 seconds
+static const char *rrd_def_vm_pve9[] = {
+	"DS:maxcpu:GAUGE:120:0:U",
+	"DS:cpu:GAUGE:120:0:U",
+	"DS:maxmem:GAUGE:120:0:U",
+	"DS:mem:GAUGE:120:0:U",
+	"DS:maxdisk:GAUGE:120:0:U",
+	"DS:disk:GAUGE:120:0:U",
+	"DS:netin:DERIVE:120:0:U",
+	"DS:netout:DERIVE:120:0:U",
+	"DS:diskread:DERIVE:120:0:U",
+	"DS:diskwrite:DERIVE:120:0:U",
+	"DS:memhost:GAUGE:120:0:U",
+	"DS:pressurecpusome:GAUGE:120:0:U",
+	"DS:pressurecpufull:GAUGE:120:0:U",
+	"DS:pressureiosome:GAUGE:120:0:U",
+	"DS:pressureiofull:GAUGE:120:0:U",
+	"DS:pressurememorysome:GAUGE:120:0:U",
+	"DS:pressurememoryfull:GAUGE:120:0:U",
+
+	"RRA:AVERAGE:0.5:1:1440", // 1 min * 1440 => 1 day
+	"RRA:AVERAGE:0.5:30:1440", // 30 min * 1440 => 30 days
+	"RRA:AVERAGE:0.5:360:1440", // 6 hours * 1440 => 360 days, ~1 year
+	"RRA:AVERAGE:0.5:10080:570", // 1 week * 570 => ~10 years
+
+	"RRA:MAX:0.5:1:1440", // 1 min * 1440 => 1 day
+	"RRA:MAX:0.5:30:1440", // 30 min * 1440 => 30 days
+	"RRA:MAX:0.5:360:1440", // 6 hours * 1440 => 360 days, ~1 year
+	"RRA:MAX:0.5:10080:570", // 1 week * 570 => ~10 years
+	NULL,
+};
+
+// Time step size 60 seconds
 static const char *rrd_def_storage[] = {
 	"DS:total:GAUGE:120:0:U",
 	"DS:used:GAUGE:120:0:U",
@@ -1200,6 +1285,23 @@ static const char *rrd_def_storage[] = {
 	"RRA:MAX:0.5:10080:70", // 7 day max - ony year
 	NULL,
 };
+
+// Time step size 60 seconds
+static const char *rrd_def_storage_pve9[] = {
+	"DS:total:GAUGE:120:0:U",
+	"DS:used:GAUGE:120:0:U",
+
+	"RRA:AVERAGE:0.5:1:1440", // 1 min * 1440 => 1 day
+	"RRA:AVERAGE:0.5:30:1440", // 30 min * 1440 => 30 days
+	"RRA:AVERAGE:0.5:360:1440", // 6 hours * 1440 => 360 days, ~1 year
+	"RRA:AVERAGE:0.5:10080:570", // 1 week * 570 => ~10 years
+
+	"RRA:MAX:0.5:1:1440", // 1 min * 1440 => 1 day
+	"RRA:MAX:0.5:30:1440", // 30 min * 1440 => 30 days
+	"RRA:MAX:0.5:360:1440", // 6 hours * 1440 => 360 days, ~1 year
+	"RRA:MAX:0.5:10080:570", // 1 week * 570 => ~10 years
+	NULL,
+};
 
 #define RRDDIR "/var/lib/rrdcached/db"
 
@@ -1260,35 +1362,70 @@ update_rrd_data(
 	if (!rrd_format_update_buffer) {
 	    rrd_format_update_buffer = (char*)malloc(RRD_FORMAT_BUFFER_SIZE);
 	}
+	static const char* pve9_node_padding = "U:U:U:U:U:U:U:U:U";
+	static const char* pve9_vm_padding = "U:U:U:U:U:U:U";
+
+	const char *padding = NULL;
 
 	int skip = 0;
 	int data_cutoff = 0; // how many columns after initial skip should be a cut-off
 
+	// TODO drop pve2- data handling when not needed anymore
 	if (strncmp(key, "pve2-node/", 10) == 0 ||
 		strncmp(key, "pve9-node/", 10) == 0) {
 		const char *node = key + 10;
 
-		skip = 2;
-
 		if (strchr(node, '/') != NULL)
 			goto keyerror;
 
 		if (strlen(node) < 1)
 			goto keyerror;
 
-		if (strncmp(key, "pve9-node/", 10) == 0) {
-		    data_cutoff = 13;
-		}
+		filename = g_strdup_printf(RRDDIR "/pve9-node/%s", node);
+		char *filename_pve2 = g_strdup_printf(RRDDIR "/pve2-node/%s", node);
 
-		filename = g_strdup_printf(RRDDIR "/pve2-node/%s", node);
+		int use_pve2_file = 0;
 
-		if (!g_file_test(filename, G_FILE_TEST_EXISTS)) {
-
-			mkdir(RRDDIR "/pve2-node", 0755);
+		// check existing rrd files and directories
+		if (g_file_test(filename, G_FILE_TEST_EXISTS)) {
+		    // new file exists, we use that
+		    // TODO: get conditions so that we do not have this empty one
+		} else if (g_file_test(filename_pve2, G_FILE_TEST_EXISTS)) {
+		    // old file exists, use that
+		    use_pve2_file = 1;
+		    g_free(filename);
+		    filename = g_strdup(filename_pve2);
+		} else {
+		    // neither file exists, check for directories to decide and create file
+		    char *dir_pve2 = g_strdup_printf(RRDDIR "/pve2-node");
+		    char *dir_pve9 = g_strdup_printf(RRDDIR "/pve9-node");
+
+		    if (g_file_test(dir_pve9,G_FILE_TEST_IS_DIR)) {
+			int argcount = sizeof(rrd_def_node_pve9)/sizeof(void*) - 1;
+			create_rrd_file(filename, argcount, rrd_def_node_pve9);
+		    } else if (g_file_test(dir_pve2, G_FILE_TEST_IS_DIR)) {
+			use_pve2_file = 1;
+			g_free(filename);
+			filename = g_strdup(filename_pve2);
 			int argcount = sizeof(rrd_def_node)/sizeof(void*) - 1;
 			create_rrd_file(filename, argcount, rrd_def_node);
+		    } else {
+			// no dir exists yet, use new pve9
+			mkdir(RRDDIR "/pve9-node", 0755);
+			int argcount = sizeof(rrd_def_node_pve9)/sizeof(void*) - 1;
+			create_rrd_file(filename, argcount, rrd_def_node_pve9);
+		    }
+		    g_free(dir_pve2);
+		    g_free(dir_pve9);
+		}
+
+		skip = 2;
+
+		if (strncmp(key, "pve2-node/", 10) == 0 && !use_pve2_file) {
+		    padding = pve9_node_padding;
+		} else if (strncmp(key, "pve9-node/", 10) == 0 && use_pve2_file) {
+		    data_cutoff = 13;
 		}
 
+		g_free(filename_pve2);
 	} else if (strncmp(key, "pve2.3-vm/", 10) == 0 ||
 		strncmp(key, "pve9-vm/", 8) == 0) {
 
@@ -1299,27 +1436,57 @@ update_rrd_data(
 		    vmid = key + 8;
 		}
 
-		skip = 4;
-
 		if (strchr(vmid, '/') != NULL)
 			goto keyerror;
 
 		if (strlen(vmid) < 1)
 			goto keyerror;
 
-		if (strncmp(key, "pve9-vm/", 8) == 0) {
-		    data_cutoff = 11;
-		}
+		filename = g_strdup_printf(RRDDIR "/pve9-vm/%s", vmid);
+		char *filename_pve2 = g_strdup_printf(RRDDIR "/pve2-vm/%s", vmid);
 
-		filename = g_strdup_printf(RRDDIR "/%s/%s", "pve2-vm", vmid);
+		int use_pve2_file = 0;
 
-		if (!g_file_test(filename, G_FILE_TEST_EXISTS)) {
-
-			mkdir(RRDDIR "/pve2-vm", 0755);
+		// check existing rrd files and directories
+		if (g_file_test(filename, G_FILE_TEST_EXISTS)) {
+		    // new file exists, we use that
+		    // TODO: get conditions so that we do not have this empty one
+		} else if (g_file_test(filename_pve2, G_FILE_TEST_EXISTS)) {
+		    // old file exists, use that
+		    use_pve2_file = 1;
+		    g_free(filename);
+		    filename = g_strdup(filename_pve2);
+		} else {
+		    // neither file exists, check for directories to decide and create file
+		    char *dir_pve2 = g_strdup_printf(RRDDIR "/pve2-vm");
+		    char *dir_pve9 = g_strdup_printf(RRDDIR "/pve9-vm");
+
+		    if (g_file_test(dir_pve9,G_FILE_TEST_IS_DIR)) {
+			int argcount = sizeof(rrd_def_vm_pve9)/sizeof(void*) - 1;
+			create_rrd_file(filename, argcount, rrd_def_vm_pve9);
+		    } else if (g_file_test(dir_pve2, G_FILE_TEST_IS_DIR)) {
+			use_pve2_file = 1;
+			g_free(filename);
+			filename = g_strdup(filename_pve2);
 			int argcount = sizeof(rrd_def_vm)/sizeof(void*) - 1;
 			create_rrd_file(filename, argcount, rrd_def_vm);
+		    } else {
+			// no dir exists yet, use new pve9
+			mkdir(RRDDIR "/pve9-vm", 0755);
+			int argcount = sizeof(rrd_def_vm_pve9)/sizeof(void*) - 1;
+			create_rrd_file(filename, argcount, rrd_def_vm_pve9);
+		    }
+		    g_free(dir_pve2);
+		    g_free(dir_pve9);
 		}
 
+		skip = 4;
+
+		if (strncmp(key, "pve2.3-vm/", 10) == 0 && !use_pve2_file) {
+		    padding = pve9_vm_padding;
+		} else if (strncmp(key, "pve9-vm/", 8) == 0 && use_pve2_file) {
+		    data_cutoff = 11;
+		}
+
+		g_free(filename_pve2);
 	} else if (strncmp(key, "pve2-storage/", 13) == 0 ||
 		strncmp(key, "pve9-storage/", 13) == 0) {
 		const char *node = key + 13;
@@ -1339,20 +1506,43 @@ update_rrd_data(
 		if (strlen(storage) < 1)
 			goto keyerror;
 
-		filename = g_strdup_printf(RRDDIR "/pve2-storage/%s", node);
-
-		if (!g_file_test(filename, G_FILE_TEST_EXISTS)) {
+		filename = g_strdup_printf(RRDDIR "/pve9-storage/%s", node);
+		char *filename_pve2 = g_strdup_printf(RRDDIR "/pve2-storage/%s", node);
 
-			mkdir(RRDDIR "/pve2-storage", 0755);
+		// check existing rrd files and directories
+		if (g_file_test(filename, G_FILE_TEST_EXISTS)) {
+		    // new file exists, we use that
+		    // TODO: get conditions so that we do not have this empty one
+		} else if (g_file_test(filename_pve2, G_FILE_TEST_EXISTS)) {
+		    // old file exists, use that
+		    g_free(filename);
+		    filename = g_strdup(filename_pve2);
+		} else {
+		    // neither file exists, check for directories to decide and create file
+		    char *dir_pve2 = g_strdup_printf(RRDDIR "/pve2-storage");
+		    char *dir_pve9 = g_strdup_printf(RRDDIR "/pve9-storage");
+
+		    if (g_file_test(dir_pve9,G_FILE_TEST_IS_DIR)) {
+			int argcount = sizeof(rrd_def_storage_pve9)/sizeof(void*) - 1;
+			create_rrd_file(filename, argcount, rrd_def_storage_pve9);
+		    } else if (g_file_test(dir_pve2, G_FILE_TEST_IS_DIR)) {
+			g_free(filename);
+			filename = g_strdup(filename_pve2);
+			int argcount = sizeof(rrd_def_storage)/sizeof(void*) - 1;
+			create_rrd_file(filename, argcount, rrd_def_storage);
+		    } else {
+			// no dir exists yet, use new pve9
+			mkdir(RRDDIR "/pve9-storage", 0755);
 
 			char *dir = g_path_get_dirname(filename);
 			mkdir(dir, 0755);
 			g_free(dir);
 
-			int argcount = sizeof(rrd_def_storage)/sizeof(void*) - 1;
-			create_rrd_file(filename, argcount, rrd_def_storage);
+			int argcount = sizeof(rrd_def_storage_pve9)/sizeof(void*) - 1;
+			create_rrd_file(filename, argcount, rrd_def_storage_pve9);
+		    }
+		    g_free(dir_pve2);
+		    g_free(dir_pve9);
 		}
-
+		g_free(filename_pve2);
 	} else {
 		goto keyerror;
 	}
@@ -1363,6 +1553,8 @@ update_rrd_data(
 	    const char *cut = rrd_skip_data(dp, data_cutoff);
 	    const int data_len = cut - dp - 1; // -1 to remove last colon
 	    snprintf(rrd_format_update_buffer, RRD_FORMAT_BUFFER_SIZE, "%.*s", data_len, dp);
+	} else if (padding) {
+	    snprintf(rrd_format_update_buffer, RRD_FORMAT_BUFFER_SIZE, "%s:%s", dp, padding);
 	} else {
 	    snprintf(rrd_format_update_buffer, RRD_FORMAT_BUFFER_SIZE, "%s", dp);
 	}
-- 
2.39.5



_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel

Thread overview: 28+ messages
2025-05-23 16:00 [pve-devel] [RFC cluster/common/container/manager/pve9-rrd-migration-tool/qemu-server/storage 00/19] Expand and migrate RRD data Aaron Lauterer
2025-05-23 16:00 ` [pve-devel] [PATCH cluster-pve8 1/2] cfs status.c: drop old pve2-vm rrd schema support Aaron Lauterer
2025-05-23 16:00 ` [pve-devel] [PATCH cluster-pve8 2/2] status: handle new pve9- metrics update data Aaron Lauterer
2025-05-23 16:35   ` Aaron Lauterer
2025-06-02 13:31   ` Thomas Lamprecht
2025-06-11 14:18     ` Aaron Lauterer
2025-05-23 16:00 ` [pve-devel] [PATCH pve9-rrd-migration-tool 1/1] introduce rrd migration tool for pve8 -> pve9 Aaron Lauterer
2025-05-23 16:00 ` Aaron Lauterer [this message]
2025-05-23 16:37 ` [pve-devel] [PATCH common 1/4] fix error in pressure parsing Aaron Lauterer
2025-05-23 16:37 ` [pve-devel] [PATCH common 2/4] add functions to retrieve pressures for vm/ct Aaron Lauterer
2025-05-23 16:37   ` [pve-devel] [PATCH common 3/4] add helper to fetch value from smaps_rollup for pid Aaron Lauterer
2025-06-02 14:11     ` Thomas Lamprecht
2025-05-23 16:37   ` [pve-devel] [PATCH common 4/4] metrics: add buffer and cache to meminfo Aaron Lauterer
2025-06-02 14:07     ` Thomas Lamprecht
2025-06-11 15:17       ` Aaron Lauterer
2025-05-23 16:37   ` [pve-devel] [PATCH manager 1/5] api2tools: drop old VM rrd schema Aaron Lauterer
2025-05-23 16:37   ` [pve-devel] [PATCH manager 2/5] pvestatd: collect and distribute new pve9- metrics Aaron Lauterer
2025-05-23 16:37   ` [pve-devel] [PATCH manager 3/5] api: nodes: rrd and rrddata fetch from new pve9-node rrd files if present Aaron Lauterer
2025-05-23 16:37   ` [pve-devel] [PATCH manager 4/5] api2tools: extract stats: handle existence of new pve9- data Aaron Lauterer
2025-05-23 16:37   ` [pve-devel] [PATCH manager 5/5] ui: rrdmodels: add new columns Aaron Lauterer
2025-05-23 16:37   ` [pve-devel] [PATCH storage 1/1] status: rrddata: use new pve9 rrd location if file is present Aaron Lauterer
2025-05-23 16:37   ` [pve-devel] [PATCH qemu-server 1/4] metrics: add pressure to metrics Aaron Lauterer
2025-05-23 16:37   ` [pve-devel] [PATCH qemu-server 2/4] vmstatus: add memhost for host view of vm mem consumption Aaron Lauterer
2025-05-23 16:37   ` [pve-devel] [PATCH qemu-server 3/4] vmstatus: switch mem stat to PSS of VM cgroup Aaron Lauterer
2025-05-23 16:37   ` [pve-devel] [PATCH qemu-server 4/4] rrddata: use new pve9 rrd location if file is present Aaron Lauterer
2025-05-23 16:37   ` [pve-devel] [PATCH container 1/1] " Aaron Lauterer
2025-06-02 14:39   ` [pve-devel] [PATCH common 2/4] add functions to retrieve pressures for vm/ct Thomas Lamprecht
2025-05-26 11:52 ` [pve-devel] [RFC cluster/common/container/manager/pve9-rrd-migration-tool/qemu-server/storage 00/19] Expand and migrate RRD data DERUMIER, Alexandre via pve-devel
