* [pve-devel] [PATCH pve-common 1/3] Fix #5708: Add CPU raw counters
[not found] <20240917055020.10507-1-sascha.westermann@hl-services.de>
@ 2024-09-17 5:50 ` Sascha Westermann via pve-devel
2024-09-17 5:50 ` [pve-devel] [PATCH pve-manager 2/3] " Sascha Westermann via pve-devel
2024-09-17 5:50 ` [pve-devel] [PATCH qemu-server 3/3] " Sascha Westermann via pve-devel
2 siblings, 0 replies; 8+ messages in thread
From: Sascha Westermann via pve-devel @ 2024-09-17 5:50 UTC (permalink / raw)
To: pve-devel; +Cc: Sascha Westermann
From: Sascha Westermann <sascha.westermann@hl-services.de>
To: pve-devel@lists.proxmox.com
Cc: Sascha Westermann <sascha.westermann@hl-services.de>
Subject: [PATCH pve-common 1/3] Fix #5708: Add CPU raw counters
Date: Tue, 17 Sep 2024 07:50:18 +0200
Message-ID: <20240917055020.10507-2-sascha.westermann@hl-services.de>
Add "guest_time" (field 43) from /proc/<pid>/stat.
Signed-off-by: Sascha Westermann <sascha.westermann@hl-services.de>
---
src/PVE/ProcFSTools.pm | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/src/PVE/ProcFSTools.pm b/src/PVE/ProcFSTools.pm
index 3826fcc..5b0062e 100644
--- a/src/PVE/ProcFSTools.pm
+++ b/src/PVE/ProcFSTools.pm
@@ -239,7 +239,7 @@ sub read_proc_pid_stat {
my $statstr = PVE::Tools::file_read_firstline("/proc/$pid/stat");
- if ($statstr && $statstr =~ m/^$pid \(.*\) (\S) (-?\d+) -?\d+ -?\d+ -?\d+ -?\d+ \d+ \d+ \d+ \d+ \d+ (\d+) (\d+) (-?\d+) (-?\d+) -?\d+ -?\d+ -?\d+ 0 (\d+) (\d+) (-?\d+) \d+ \d+ \d+ \d+ \d+ \d+ \d+ \d+ \d+ \d+ \d+ \d+ \d+ -?\d+ -?\d+ \d+ \d+ \d+/) {
+ if ($statstr && $statstr =~ m/^$pid \(.*\) (\S) (-?\d+) -?\d+ -?\d+ -?\d+ -?\d+ \d+ \d+ \d+ \d+ \d+ (\d+) (\d+) (-?\d+) (-?\d+) -?\d+ -?\d+ -?\d+ 0 (\d+) (\d+) (-?\d+) \d+ \d+ \d+ \d+ \d+ \d+ \d+ \d+ \d+ \d+ \d+ \d+ \d+ -?\d+ -?\d+ \d+ \d+ \d+ (\d+)/) {
return {
status => $1,
ppid => $2,
@@ -248,6 +248,7 @@ sub read_proc_pid_stat {
starttime => $7,
vsize => $8,
rss => $9 * 4096,
+ guest_time => $10,
};
}
--
2.46.0
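For reference, "guest_time" is field 43 of /proc/<pid>/stat and is counted in clock ticks (USER_HZ), per proc(5). A rough, illustrative sketch in Python of extracting that field from a stat line (names are made up; the patch itself does this via the extended Perl regex above):

```python
def parse_guest_time(statstr):
    """Extract guest_time (field 43 of a /proc/<pid>/stat line), in clock ticks."""
    # Field 2 (comm) may contain spaces and parentheses, so split
    # after the last ')' instead of naively splitting on whitespace.
    rest = statstr.rsplit(")", 1)[1].split()
    # 'rest' begins at field 3 ("state"), so field 43 is rest[40].
    return int(rest[40])
```

The same last-')' trick is what the `\(.*\)` portion of the Perl regex relies on.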
_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
^ permalink raw reply [flat|nested] 8+ messages in thread
* [pve-devel] [PATCH pve-manager 2/3] Fix #5708: Add CPU raw counters
[not found] <20240917055020.10507-1-sascha.westermann@hl-services.de>
2024-09-17 5:50 ` [pve-devel] [PATCH pve-common 1/3] Fix #5708: Add CPU raw counters Sascha Westermann via pve-devel
@ 2024-09-17 5:50 ` Sascha Westermann via pve-devel
2024-09-24 12:25 ` Daniel Kral
2024-09-17 5:50 ` [pve-devel] [PATCH qemu-server 3/3] " Sascha Westermann via pve-devel
2 siblings, 1 reply; 8+ messages in thread
From: Sascha Westermann via pve-devel @ 2024-09-17 5:50 UTC (permalink / raw)
To: pve-devel; +Cc: Sascha Westermann
From: Sascha Westermann <sascha.westermann@hl-services.de>
To: pve-devel@lists.proxmox.com
Cc: Sascha Westermann <sascha.westermann@hl-services.de>
Subject: [PATCH pve-manager 2/3] Fix #5708: Add CPU raw counters
Date: Tue, 17 Sep 2024 07:50:19 +0200
Message-ID: <20240917055020.10507-3-sascha.westermann@hl-services.de>
Add a map containing raw values from /proc/stat and "uptime_ticks" which
can be used in combination with cpuinfo.user_hz to calculate CPU usage
from two samples. "uptime_ticks" is only defined at the top level, as
/proc/stat is read once, so that core-specific raw values match this
value.
Signed-off-by: Sascha Westermann <sascha.westermann@hl-services.de>
---
PVE/API2/Nodes.pm | 32 ++++++++++++++++++++++++++++++++
1 file changed, 32 insertions(+)
diff --git a/PVE/API2/Nodes.pm b/PVE/API2/Nodes.pm
index 9920e977..1943ec56 100644
--- a/PVE/API2/Nodes.pm
+++ b/PVE/API2/Nodes.pm
@@ -5,6 +5,7 @@ use warnings;
use Digest::MD5;
use Digest::SHA;
+use IO::File;
use Filesys::Df;
use HTTP::Status qw(:constants);
use JSON;
@@ -466,6 +467,37 @@ __PACKAGE__->register_method({
$res->{cpu} = $stat->{cpu};
$res->{wait} = $stat->{wait};
+ if (my $fh = IO::File->new ("/proc/stat", "r")) {
+ my ($uptime_ticks) = PVE::ProcFSTools::read_proc_uptime(1);
+ while (defined (my $line = <$fh>)) {
+ if ($line =~ m|^cpu\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)(?:\s+(\d+)\s+(\d+))?|) {
+ $res->{cpustat}->{user} = int($1);
+ $res->{cpustat}->{nice} = int($2);
+ $res->{cpustat}->{system} = int($3);
+ $res->{cpustat}->{idle} = int($4);
+ $res->{cpustat}->{iowait} = int($5);
+ $res->{cpustat}->{irq} = int($6);
+ $res->{cpustat}->{softirq} = int($7);
+ $res->{cpustat}->{steal} = int($8);
+ $res->{cpustat}->{guest} = int($9);
+ $res->{cpustat}->{guest_nice} = int($10);
+ $res->{cpustat}->{uptime_ticks} = $uptime_ticks;
+ } elsif ($line =~ m|^cpu(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)(?:\s+(\d+)\s+(\d+))?|) {
+ $res->{cpustat}->{"cpu" . $1}->{user} = int($2);
+ $res->{cpustat}->{"cpu" . $1}->{nice} = int($3);
+ $res->{cpustat}->{"cpu" . $1}->{system} = int($4);
+ $res->{cpustat}->{"cpu" . $1}->{idle} = int($5);
+ $res->{cpustat}->{"cpu" . $1}->{iowait} = int($6);
+ $res->{cpustat}->{"cpu" . $1}->{irq} = int($7);
+ $res->{cpustat}->{"cpu" . $1}->{softirq} = int($8);
+ $res->{cpustat}->{"cpu" . $1}->{steal} = int($9);
+ $res->{cpustat}->{"cpu" . $1}->{guest} = int($10);
+ $res->{cpustat}->{"cpu" . $1}->{guest_nice} = int($11);
+ }
+ }
+ $fh->close;
+ }
+
my $meminfo = PVE::ProcFSTools::read_meminfo();
$res->{memory} = {
free => $meminfo->{memfree},
--
2.46.0
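To sketch how a consumer of this endpoint might turn two samples of the new cpustat map into a usage percentage (illustrative Python, not part of the patch; the field names follow the map added above):

```python
def cpu_usage_percent(prev, curr):
    """Approximate overall CPU usage between two cpustat samples.

    'prev' and 'curr' are dicts of raw tick counters as returned by the
    patched endpoint (user, nice, system, idle, iowait, irq, softirq,
    steal); idle + iowait counts as idle time.
    """
    fields = ("user", "nice", "system", "idle", "iowait",
              "irq", "softirq", "steal")
    deltas = {f: curr[f] - prev[f] for f in fields}
    total = sum(deltas.values())
    if total == 0:
        return 0.0
    idle = deltas["idle"] + deltas["iowait"]
    return 100.0 * (total - idle) / total
```

Since all counters are in ticks, user_hz cancels out here; it is only needed when converting a delta to wall-clock time.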
* [pve-devel] [PATCH qemu-server 3/3] Fix #5708: Add CPU raw counters
[not found] <20240917055020.10507-1-sascha.westermann@hl-services.de>
2024-09-17 5:50 ` [pve-devel] [PATCH pve-common 1/3] Fix #5708: Add CPU raw counters Sascha Westermann via pve-devel
2024-09-17 5:50 ` [pve-devel] [PATCH pve-manager 2/3] " Sascha Westermann via pve-devel
@ 2024-09-17 5:50 ` Sascha Westermann via pve-devel
2024-09-24 12:25 ` Daniel Kral
2 siblings, 1 reply; 8+ messages in thread
From: Sascha Westermann via pve-devel @ 2024-09-17 5:50 UTC (permalink / raw)
To: pve-devel; +Cc: Sascha Westermann
From: Sascha Westermann <sascha.westermann@hl-services.de>
To: pve-devel@lists.proxmox.com
Cc: Sascha Westermann <sascha.westermann@hl-services.de>
Subject: [PATCH qemu-server 3/3] Fix #5708: Add CPU raw counters
Date: Tue, 17 Sep 2024 07:50:20 +0200
Message-ID: <20240917055020.10507-4-sascha.westermann@hl-services.de>
Add a map containing raw values from /proc/<pid>/stat (utime, stime and
guest_time), "uptime_ticks" and "user_hz" (from cpuinfo) to calculate
physical CPU usage from two samples. In addition, virtual CPU statistics
based on /proc/<pid>/task/<tid>/schedstat (<tid> for virtual cores) are
added - based on this data, the CPU usage can be calculated from the
perspective of the virtual machine.
The total usage corresponds to "cpu_ns + runqueue_ns". "cpu_ns" should
roughly reflect the physical CPU usage (excluding I/O threads and the
emulator), and "runqueue_ns" corresponds to %steal, i.e. the same as
"CPU ready" in VMware or "Wait for dispatch" in Hyper-V.
To calculate delta values, uptime_ticks and user_hz can be converted to
nanoseconds. uptime_ticks is read immediately after utime, stime and
guest_time are taken from /proc/<pid>/stat, i.e. before
/proc/<pid>/task/<tid>/schedstat is read. The timestamp is therefore
not exact, but it should be close enough to the sampling time that the
derived values are reasonably accurate.
Signed-off-by: Sascha Westermann <sascha.westermann@hl-services.de>
---
PVE/QemuServer.pm | 55 +++++++++++++++++++++++++++++++++++++++++++++--
1 file changed, 53 insertions(+), 2 deletions(-)
diff --git a/PVE/QemuServer.pm b/PVE/QemuServer.pm
index b26da505..39830709 100644
--- a/PVE/QemuServer.pm
+++ b/PVE/QemuServer.pm
@@ -2814,6 +2814,40 @@ our $vmstatus_return_properties = {
my $last_proc_pid_stat;
+sub get_vcpu_to_thread_id {
+ my ($pid) = @_;
+ my @cpu_to_thread_id;
+ my $task_dir = "/proc/$pid/task";
+
+ if (! -d $task_dir) {
+ return @cpu_to_thread_id;
+ }
+
+ opendir(my $dh, $task_dir);
+ if (!$dh) {
+ return @cpu_to_thread_id;
+ }
+ while (my $tid = readdir($dh)) {
+ next if $tid =~ /^\./;
+ my $comm_file = "$task_dir/$tid/comm";
+ next unless -f $comm_file;
+
+ open(my $fh, '<', $comm_file) or next;
+ my $comm = <$fh>;
+ close($fh);
+
+ chomp $comm;
+
+ if ($comm =~ /^CPU\s+(\d+)\/KVM$/) {
+ my $vcpu = $1;
+ push @cpu_to_thread_id, { tid => $tid, vcpu => $vcpu };
+ }
+ }
+ closedir($dh);
+
+ return @cpu_to_thread_id;
+}
+
# get VM status information
# This must be fast and should not block ($full == false)
# We only query KVM using QMP if $full == true (this can be slow)
@@ -2827,8 +2861,6 @@ sub vmstatus {
my $list = vzlist();
my $defaults = load_defaults();
- my ($uptime) = PVE::ProcFSTools::read_proc_uptime(1);
-
my $cpucount = $cpuinfo->{cpus} || 1;
foreach my $vmid (keys %$list) {
@@ -2911,6 +2943,25 @@ sub vmstatus {
my $pstat = PVE::ProcFSTools::read_proc_pid_stat($pid);
next if !$pstat; # not running
+ my ($uptime) = PVE::ProcFSTools::read_proc_uptime(1);
+ my $process_uptime_ticks = $uptime - $pstat->{starttime};
+
+ $d->{cpustat}->{guest_time} = int($pstat->{guest_time});
+ $d->{cpustat}->{process_uptime_ticks} = $process_uptime_ticks;
+ $d->{cpustat}->{stime} = int($pstat->{stime});
+ $d->{cpustat}->{user_hz} = $cpuinfo->{user_hz};
+ $d->{cpustat}->{utime} = int($pstat->{utime});
+
+ my @vcpu_to_thread_id = get_vcpu_to_thread_id($pid);
+ if (@vcpu_to_thread_id) {
+ foreach my $entry (@vcpu_to_thread_id) {
+ my $statstr = PVE::Tools::file_read_firstline("/proc/$pid/task/$entry->{tid}/schedstat") or next;
+ if ($statstr && $statstr =~ m/^(\d+) (\d+) \d/) {
+ $d->{cpustat}->{"vcpu" . $entry->{vcpu}}->{cpu_ns} = int($1);
+ $d->{cpustat}->{"vcpu" . $entry->{vcpu}}->{runqueue_ns} = int($2);
+ };
+ }
+ }
my $used = $pstat->{utime} + $pstat->{stime};
--
2.46.0
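As an illustration of the intended use, a rough Python sketch (not part of the patch) deriving the per-vCPU steal percentage from two samples of the cpu_ns/runqueue_ns counters added here:

```python
def vcpu_steal_percent(prev, curr):
    """Share of time a vCPU spent runnable but waiting ("%steal"),
    computed from two samples of the per-vCPU counters exposed by this
    patch (cpu_ns = time on a physical CPU, runqueue_ns = time waiting
    in the runqueue), both in nanoseconds.
    """
    cpu = curr["cpu_ns"] - prev["cpu_ns"]
    runqueue = curr["runqueue_ns"] - prev["runqueue_ns"]
    total = cpu + runqueue
    if total == 0:
        return 0.0
    return 100.0 * runqueue / total
```

Both counters come from /proc/<pid>/task/<tid>/schedstat, so no user_hz conversion is needed for this ratio.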
* Re: [pve-devel] [PATCH pve-manager 2/3] Fix #5708: Add CPU raw counters
2024-09-17 5:50 ` [pve-devel] [PATCH pve-manager 2/3] " Sascha Westermann via pve-devel
@ 2024-09-24 12:25 ` Daniel Kral
2024-09-24 14:00 ` Lukas Wagner
` (2 more replies)
0 siblings, 3 replies; 8+ messages in thread
From: Daniel Kral @ 2024-09-24 12:25 UTC (permalink / raw)
To: Proxmox VE development discussion; +Cc: Sascha Westermann
On 9/17/24 07:50, Sascha Westermann via pve-devel wrote:
> Add a map containing raw values from /proc/stat and "uptime_ticks" which
> can be used in combination with cpuinfo.user_hz to calculate CPU usage
> from two samples. "uptime_ticks" is only defined at the top level, as
> /proc/stat is read once, so that core-specific raw values match this
> value.
>
> Signed-off-by: Sascha Westermann <sascha.westermann@hl-services.de>
> ---
> PVE/API2/Nodes.pm | 32 ++++++++++++++++++++++++++++++++
> 1 file changed, 32 insertions(+)
>
> diff --git a/PVE/API2/Nodes.pm b/PVE/API2/Nodes.pm
> index 9920e977..1943ec56 100644
> --- a/PVE/API2/Nodes.pm
> +++ b/PVE/API2/Nodes.pm
> @@ -5,6 +5,7 @@ use warnings;
>
> use Digest::MD5;
> use Digest::SHA;
> +use IO::File;
> use Filesys::Df;
> use HTTP::Status qw(:constants);
> use JSON;
> @@ -466,6 +467,37 @@ __PACKAGE__->register_method({
note: the same route also gets called when using the WebGUI, and a
subset of the returned values is displayed on the "Node > Status" page.
From what I have seen, the added data size is negligible.
> $res->{cpu} = $stat->{cpu};
> $res->{wait} = $stat->{wait};
>
> + if (my $fh = IO::File->new ("/proc/stat", "r")) {
nit: Minor note, but there shouldn't be a space between the function's
name and its parameter list [0].
> + my ($uptime_ticks) = PVE::ProcFSTools::read_proc_uptime(1);
> + while (defined (my $line = <$fh>)) {
> + if ($line =~ m|^cpu\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)(?:\s+(\d+)\s+(\d+))?|) {
> + $res->{cpustat}->{user} = int($1);
> + $res->{cpustat}->{nice} = int($2);
> + $res->{cpustat}->{system} = int($3);
> + $res->{cpustat}->{idle} = int($4);
> + $res->{cpustat}->{iowait} = int($5);
> + $res->{cpustat}->{irq} = int($6);
> + $res->{cpustat}->{softirq} = int($7);
> + $res->{cpustat}->{steal} = int($8);
> + $res->{cpustat}->{guest} = int($9);
> + $res->{cpustat}->{guest_nice} = int($10);
> + $res->{cpustat}->{uptime_ticks} = $uptime_ticks;
nit: I think this could be placed rather nicely at
`$res->{uptime_ticks}`, like `$res->{uptime}`, to make `cpustat` a
little more consistent with `PVE::ProcFSTools::read_proc_stat()`.
> + } elsif ($line =~ m|^cpu(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)(?:\s+(\d+)\s+(\d+))?|) {
> + $res->{cpustat}->{"cpu" . $1}->{user} = int($2);
> + $res->{cpustat}->{"cpu" . $1}->{nice} = int($3);
> + $res->{cpustat}->{"cpu" . $1}->{system} = int($4);
> + $res->{cpustat}->{"cpu" . $1}->{idle} = int($5);
> + $res->{cpustat}->{"cpu" . $1}->{iowait} = int($6);
> + $res->{cpustat}->{"cpu" . $1}->{irq} = int($7);
> + $res->{cpustat}->{"cpu" . $1}->{softirq} = int($8);
> + $res->{cpustat}->{"cpu" . $1}->{steal} = int($9);
> + $res->{cpustat}->{"cpu" . $1}->{guest} = int($10);
> + $res->{cpustat}->{"cpu" . $1}->{guest_nice} = int($11);
> + }
> + }
> + $fh->close;
> + }
Is there something holding us back from moving this directly into
`PVE::ProcFSTools::read_proc_stat()`?
As far as I can tell, the output of `PVE::ProcFSTools::read_proc_stat()`
is used at these locations:
- the PVE `/nodes/{node}/status` API endpoint of course, which only uses
the values of `cpu` and `wait` at the moment
- `PMG::API2::Nodes`: also only uses the values of `cpu` and `wait`
- the PMG `/nodes/{node}/status` API endpoint, which also only uses the
values of `cpu` and `wait`
- `PVE::Service::pvestatd::update_node_status`: retrieve the current
node status and then update them for rrd via `broadcast_rrd` (uses only
the values of `cpu` and `wait` selectively) and external metric servers
The first three and a half (speaking of `broadcast_rrd` in the latter)
look fine to me, but we should take a closer look at how external
metric servers will handle the added data, especially for existing
queries/dashboards. There could also be a name collision, as 'cpustat'
is also used for the data that gets sent to the metric servers.
In my opinion, I think it would be a worthwhile feature to add the
properties for external metric servers (either as part of this or a
future patch series).
> +
> my $meminfo = PVE::ProcFSTools::read_meminfo();
> $res->{memory} = {
> free => $meminfo->{memfree},
> --
> 2.46.0
It would also be very beneficial if the added data properties returned
here were documented in the 'returns' JSONSchema, so that they can be
easily understood by other users as well (especially which units the
raw values are in, so it is easier to know how to convert them).
---
Otherwise, this works just as intended when querying the API endpoint
`/nodes/{node}/status` via curl and pvesh.
Reviewed-by: Daniel Kral <d.kral@proxmox.com>
Tested-by: Daniel Kral <d.kral@proxmox.com>
[0] https://pve.proxmox.com/wiki/Perl_Style_Guide#Spacing_and_syntax_usage
* Re: [pve-devel] [PATCH qemu-server 3/3] Fix #5708: Add CPU raw counters
2024-09-17 5:50 ` [pve-devel] [PATCH qemu-server 3/3] " Sascha Westermann via pve-devel
@ 2024-09-24 12:25 ` Daniel Kral
0 siblings, 0 replies; 8+ messages in thread
From: Daniel Kral @ 2024-09-24 12:25 UTC (permalink / raw)
To: Proxmox VE development discussion; +Cc: Sascha Westermann
On 9/17/24 07:50, Sascha Westermann via pve-devel wrote:
> Add a map containing raw values from /proc/<pid>/stat (utime, stime and
> guest_time), "uptime_ticks" and "user_hz" (from cpuinfo) to calcuate
> physical CPU usage from two samples. In addition, virtual CPU statistics
> based on /proc/<pid>/task/<tid>/schedstat (<tid> for virtual cores) are
> added - based on this data, the CPU usage can be calculated from the
> perspective of the virtual machine.
>
> The total usage corresponds to "cpu_ns + runqueue_ns", "cpu_ns" should
> roughly reflect the physical CPU usage (without I/O-threads and
> emulators) and "runqueue_ns" corresponds to the value of %steal, i.e.
> the same as "CPU ready" for VMware or "Wait for dispatch" for Hyper-V.
>
> To calculate the difference value, uptime_ticks and user_hz would be
> converted to nanoseconds - the value was determined immediately after
> utime, stime and guest_time were determined from /proc/<pid>/stat, i.e.
> before /proc/<pid>/task/<tid>/schedstat was determined. The time value
> is therefore not exact, but should be sufficiently close to the time of
> determination so that the values determined should be relatively
> accurate.
>
> Signed-off-by: Sascha Westermann <sascha.westermann@hl-services.de>
> ---
> PVE/QemuServer.pm | 55 +++++++++++++++++++++++++++++++++++++++++++++--
> 1 file changed, 53 insertions(+), 2 deletions(-)
>
> diff --git a/PVE/QemuServer.pm b/PVE/QemuServer.pm
> index b26da505..39830709 100644
> --- a/PVE/QemuServer.pm
> +++ b/PVE/QemuServer.pm
> @@ -2814,6 +2814,40 @@ our $vmstatus_return_properties = {
>
> my $last_proc_pid_stat;
>
> +sub get_vcpu_to_thread_id {
> + my ($pid) = @_;
> + my @cpu_to_thread_id;
> + my $task_dir = "/proc/$pid/task";
> +
> + if (! -d $task_dir) {
> + return @cpu_to_thread_id;
> + }
> +
> + opendir(my $dh, $task_dir);
> + if (!$dh) {
> + return @cpu_to_thread_id;
> + }
> + while (my $tid = readdir($dh)) {
> + next if $tid =~ /^\./;
> + my $comm_file = "$task_dir/$tid/comm";
> + next unless -f $comm_file;
> +
> + open(my $fh, '<', $comm_file) or next;
> + my $comm = <$fh>;
> + close($fh);
> +
> + chomp $comm;
> +
> + if ($comm =~ /^CPU\s+(\d+)\/KVM$/) {
> + my $vcpu = $1;
> + push @cpu_to_thread_id, { tid => $tid, vcpu => $vcpu };
> + }
> + }
> + closedir($dh);
> +
> + return @cpu_to_thread_id;
> +}
nit: since they are not part of the initial bug's intent, this probably
could be split into its own commit (adding vCPU counters).
> +
> # get VM status information
> # This must be fast and should not block ($full == false)
> # We only query KVM using QMP if $full == true (this can be slow)
> @@ -2827,8 +2861,6 @@ sub vmstatus {
> my $list = vzlist();
> my $defaults = load_defaults();
>
> - my ($uptime) = PVE::ProcFSTools::read_proc_uptime(1);
> -
> my $cpucount = $cpuinfo->{cpus} || 1;
>
> foreach my $vmid (keys %$list) {
> @@ -2911,6 +2943,25 @@ sub vmstatus {
>
> my $pstat = PVE::ProcFSTools::read_proc_pid_stat($pid);
> next if !$pstat; # not running
> + my ($uptime) = PVE::ProcFSTools::read_proc_uptime(1);
> + my $process_uptime_ticks = $uptime - $pstat->{starttime};
> +
> + $d->{cpustat}->{guest_time} = int($pstat->{guest_time});
> + $d->{cpustat}->{process_uptime_ticks} = $process_uptime_ticks;
> + $d->{cpustat}->{stime} = int($pstat->{stime});
> + $d->{cpustat}->{user_hz} = $cpuinfo->{user_hz};
> + $d->{cpustat}->{utime} = int($pstat->{utime});
> +
> + my @vcpu_to_thread_id = get_vcpu_to_thread_id($pid);
> + if (@vcpu_to_thread_id) {
> + foreach my $entry (@vcpu_to_thread_id) {
> + my $statstr = PVE::Tools::file_read_firstline("/proc/$pid/task/$entry->{tid}/schedstat") or next;
> + if ($statstr && $statstr =~ m/^(\d+) (\d+) \d/) {
> + $d->{cpustat}->{"vcpu" . $entry->{vcpu}}->{cpu_ns} = int($1);
> + $d->{cpustat}->{"vcpu" . $entry->{vcpu}}->{runqueue_ns} = int($2);
> + };
> + }
> + }
note: This might be useful information for patch #2 (if we decide to
make the added information available to metric servers as well), as
this data is actually sent to the external metric servers (in
`PVE::Service::pvestatd::update_qemu_status`). It seems fine to me, as
the vCPUs get separated via an "instance=vcpuX" field. I haven't tested
this with Grafana though.
e.g. for one of my VMs this will add the following to the InfluxDB API
write call:
```
cpustat,object=qemu,vmid=107,nodename=node1,host=test,instance=vcpu0
cpu_ns=10916152530,runqueue_ns=29127241 1727171085000000000
cpustat,object=qemu,vmid=107,nodename=node1,host=test,instance=vcpu1
cpu_ns=1341783516,runqueue_ns=6114069 1727171085000000000
cpustat,object=qemu,vmid=107,nodename=node1,host=test
guest_time=846,process_uptime_ticks=5234,stime=333,user_hz=100,utime=1004
1727171085000000000
```
>
> my $used = $pstat->{utime} + $pstat->{stime};
>
> --
> 2.46.0
As for patch #2, it would also be beneficial to the user if your added
data properties were documented in the JSONSchema for the function call
(`$vmstatus_return_properties`), so that they can be easily understood
by other users as well (especially which units the raw values are in,
so it is easier to know how to convert them).
---
Otherwise, this works just as intended for me for:
- `/nodes/{node}/qemu/{vmid}/status/current` (pvesh, curl, WebGUI)
- `qm status <vmid>` (cli)
- InfluxDB API write calls
Reviewed-by: Daniel Kral <d.kral@proxmox.com>
Tested-by: Daniel Kral <d.kral@proxmox.com>
[0] https://bugzilla.proxmox.com/show_bug.cgi?id=5708#c3
* Re: [pve-devel] [PATCH pve-manager 2/3] Fix #5708: Add CPU raw counters
2024-09-24 12:25 ` Daniel Kral
@ 2024-09-24 14:00 ` Lukas Wagner
2024-09-30 6:17 ` Sascha Westermann via pve-devel
[not found] ` <63c737f2-21cd-4fff-bf86-2369de65f886@hl-services.de>
2 siblings, 0 replies; 8+ messages in thread
From: Lukas Wagner @ 2024-09-24 14:00 UTC (permalink / raw)
To: Proxmox VE development discussion, Daniel Kral; +Cc: Sascha Westermann
On 2024-09-24 14:25, Daniel Kral wrote:
> On 9/17/24 07:50, Sascha Westermann via pve-devel wrote:
>> Add a map containing raw values from /proc/stat and "uptime_ticks" which
>> can be used in combination with cpuinfo.user_hz to calculate CPU usage
>> from two samples. "uptime_ticks" is only defined at the top level, as
>> /proc/stat is read once, so that core-specific raw values match this
>> value.
>>
>> Signed-off-by: Sascha Westermann <sascha.westermann@hl-services.de>
>> ---
>> PVE/API2/Nodes.pm | 32 ++++++++++++++++++++++++++++++++
>> 1 file changed, 32 insertions(+)
>>
>> diff --git a/PVE/API2/Nodes.pm b/PVE/API2/Nodes.pm
>> index 9920e977..1943ec56 100644
>> --- a/PVE/API2/Nodes.pm
>> +++ b/PVE/API2/Nodes.pm
>> @@ -5,6 +5,7 @@ use warnings;
>>
>> use Digest::MD5;
>> use Digest::SHA;
>> +use IO::File;
>> use Filesys::Df;
>> use HTTP::Status qw(:constants);
>> use JSON;
>> @@ -466,6 +467,37 @@ __PACKAGE__->register_method({
>
> note: the same route also gets called when using the WebGUI and a set of the values that get returned are displayed on the "Node > Status" page. What I have seen, the added data size is very negligible.
>
>> $res->{cpu} = $stat->{cpu};
>> $res->{wait} = $stat->{wait};
>>
>> + if (my $fh = IO::File->new ("/proc/stat", "r")) {
>
> nit: Minor note, but there shouldn't be a space between the function's name and its parameter list [0].
>
>> + my ($uptime_ticks) = PVE::ProcFSTools::read_proc_uptime(1);
>> + while (defined (my $line = <$fh>)) {
>> + if ($line =~ m|^cpu\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)(?:\s+(\d+)\s+(\d+))?|) {
>> + $res->{cpustat}->{user} = int($1);
>> + $res->{cpustat}->{nice} = int($2);
>> + $res->{cpustat}->{system} = int($3);
>> + $res->{cpustat}->{idle} = int($4);
>> + $res->{cpustat}->{iowait} = int($5);
>> + $res->{cpustat}->{irq} = int($6);
>> + $res->{cpustat}->{softirq} = int($7);
>> + $res->{cpustat}->{steal} = int($8);
>> + $res->{cpustat}->{guest} = int($9);
>> + $res->{cpustat}->{guest_nice} = int($10);
>> + $res->{cpustat}->{uptime_ticks} = $uptime_ticks;
>
> nit: I think this could be placed rather nicely at `$res->{uptime_ticks}`, like `$res->{uptime}`, to make `cpustat` a little more consistent with `PVE::ProcFSTools::read_proc_stat()` and
>
>> + } elsif ($line =~ m|^cpu(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)(?:\s+(\d+)\s+(\d+))?|) {
>> + $res->{cpustat}->{"cpu" . $1}->{user} = int($2);
>> + $res->{cpustat}->{"cpu" . $1}->{nice} = int($3);
>> + $res->{cpustat}->{"cpu" . $1}->{system} = int($4);
>> + $res->{cpustat}->{"cpu" . $1}->{idle} = int($5);
>> + $res->{cpustat}->{"cpu" . $1}->{iowait} = int($6);
>> + $res->{cpustat}->{"cpu" . $1}->{irq} = int($7);
>> + $res->{cpustat}->{"cpu" . $1}->{softirq} = int($8);
>> + $res->{cpustat}->{"cpu" . $1}->{steal} = int($9);
>> + $res->{cpustat}->{"cpu" . $1}->{guest} = int($10);
>> + $res->{cpustat}->{"cpu" . $1}->{guest_nice} = int($11);
>> + }
>> + }
>> + $fh->close;
>> + }
>
> Is there something that is holding us back to move this directly into `PVE::ProcFSTools::read_proc_stat()`?
>
> As far as I can tell, the output of `PVE::ProcFSTools::read_proc_stat()` is used at these locations:
>
> - the PVE `/nodes/{node}/status` API endpoint of course, which only uses the values of `cpu` and `wait` at the moment
> - `PMG::API2::Nodes`: also only uses the values of `cpu` and `wait`
> - the PMG `/nodes/{node}/status` API endpoint, which also only uses the values of `cpu` and `wait`
> - `PVE::Service::pvestatd::update_node_status`: retrieve the current node status and then update them for rrd via `broadcast_rrd` (uses only the values of `cpu` and `wait` selectively) and external metric servers
>
> The first three and a half (speaking of `broadcast_rrd` in the latter) look fine to me, but we should take a closer look how external metric servers will handle the added data, especially for existing queries/dashboards. It could also be a name collision, as 'cpustat' is also used for the data that gets sent to the metric servers.
Just as a side note from me, since I recently went down this rabbit
hole as well: in the long term it might be good to add a layer of
abstraction between the cpustat hash produced by read_proc_stat (and
similar functions) and the metric server integration.
Right now, the metric server implementation adds fields for every
single member of the hash to a datapoint, which is then sent to the
external metric server. As a result, the cpustat hash is essentially a
public interface at the moment. If we ever change the format of the
hash, we risk breaking custom monitoring dashboards in our users'
setups.
I think we should create a mapping/translation layer between these internal data structures and the fields that are sent to the metric server. At the same time, it would probably be wise to also document the structure and format
of the metric data somewhere in our docs.
Just some thoughts that I had while working on the metrics system; of
course this does not have to (and probably should not) be tackled in
this patch series.
--
- Lukas
* Re: [pve-devel] [PATCH pve-manager 2/3] Fix #5708: Add CPU raw counters
2024-09-24 12:25 ` Daniel Kral
2024-09-24 14:00 ` Lukas Wagner
@ 2024-09-30 6:17 ` Sascha Westermann via pve-devel
[not found] ` <63c737f2-21cd-4fff-bf86-2369de65f886@hl-services.de>
2 siblings, 0 replies; 8+ messages in thread
From: Sascha Westermann via pve-devel @ 2024-09-30 6:17 UTC (permalink / raw)
To: Daniel Kral, Proxmox VE development discussion; +Cc: Sascha Westermann
If the data is also to be processed by external metric servers, I think
the integration in `PVE::ProcFSTools::read_proc_stat()` makes sense. The
term `cpustat` would no longer conflict in this case, as the content
would be virtually the same. I would create a new patch series for this,
but when looking at `read_proc_stat` a few questions arose for me:
> sub read_proc_stat {
> my $res = { user => 0, nice => 0, system => 0, idle => 0 , iowait => 0, irq => 0, softirq => 0, steal => 0, guest => 0, guest_nice => 0, sum => 0};
In order to remain consistent with the structure, the same fields per
CPU would also need to be initialized with 0. However, this is only
done when I find an entry of the form cpu<num>, which implicitly gives
me values - so the initialization would not do anything. In this
context, I wonder whether there are any situations where no values can
be set? I would actually say that this is not the case and that the
initialization can be removed.
Side note: `sum` is not used; it probably means `total`, right?
> if (my $fh = IO::File->new ("/proc/stat", "r")) {
> while (defined (my $line = <$fh>)) {
> if ($line =~ m|^cpu\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)(?:\s+(\d+)\s+(\d+))?|) {
> $res->{user} = $1 - ($9 // 0);
> $res->{nice} = $2 - ($10 // 0);
> $res->{system} = $3;
`user` contains `guest` and `nice` contains `guest_nice`, so I
understand that the values are subtracted. Wouldn't it be better to use
the original values here? Especially if these are exposed as raw
values via the API, it would certainly be helpful if they corresponded
1:1 with the /proc/stat documentation.
The values are output as strings in the JSON output. Is there any
reason not to cast them to int?
> my $diff = ($ctime - $last_proc_stat->{ctime}) * $clock_ticks * $cpucount;
>
> if ($diff > 1000) { # don't update too often
I don't understand the condition. `$ctime - $last_proc_stat->{ctime}`
corresponds to the elapsed time in seconds as a float, `clock_ticks`
would normally be 100. So that would mean that on a system with one CPU
core an update may only take place every 10 seconds, but with 8 cores
every 1.25 seconds? Is that an error? Is it even necessary to suppress
updates if $diff > 0?
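Concretely, the minimum update interval implied by that condition works
out as follows (illustrative Python; clock_ticks assumed to be 100, as
above):

```python
def min_update_interval(cpucount, clock_ticks=100, threshold=1000):
    """Smallest elapsed time (in seconds) for which
    diff = elapsed * clock_ticks * cpucount exceeds the threshold,
    i.e. how often the cached stats may be refreshed."""
    return threshold / (clock_ticks * cpucount)
```

So with one CPU the stats refresh at most every 10 seconds, with eight
CPUs every 1.25 seconds, which is the asymmetry questioned above.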
* Re: [pve-devel] [PATCH pve-manager 2/3] Fix #5708: Add CPU raw counters
[not found] ` <63c737f2-21cd-4fff-bf86-2369de65f886@hl-services.de>
@ 2024-10-03 9:40 ` Daniel Kral
0 siblings, 0 replies; 8+ messages in thread
From: Daniel Kral @ 2024-10-03 9:40 UTC (permalink / raw)
To: Sascha Westermann, Proxmox VE development discussion
On 9/30/24 08:17, Sascha Westermann wrote:
> If the data is also to be processed by external metric servers, I think
> the integration in `PVE::ProcFSTools::read_proc_stat()` makes sense. The
> term `cpustat` would no longer conflict in this case, as the content
> would be virtually the same. I would create a new patch series for this,
> but when looking at `read_proc_stat` a few questions arose for me:
>
>> sub read_proc_stat {
>> my $res = { user => 0, nice => 0, system => 0, idle => 0 , iowait => 0, irq => 0, softirq => 0, steal => 0, guest => 0, guest_nice => 0, sum => 0};
>
> In order to remain consistent with the structure, the same fields per
> CPU would also need to be initialized with 0. However, this is only
> done when I find an entry of the form cpu<num>, which implicitly gives
> me values - so the initialization would not do anything. In this
> context, I wonder whether there are any situations where no values can
> be set? I would actually say that this is not the case and that the
> initialization can be removed.
Those fields seem to be initialized to zero here because they are summed
up later for `$res->{total}` [0]. If the `/proc/stat` file couldn't be read,
or for some reason no `cpu` line was captured, those fields would remain
`undef`. As far as I'm aware, an `undef` value is treated as `0` in numeric
context, so removing the initialization shouldn't be a problem, but I could
also be missing something here.
---
At least for me, I think you wouldn't need to initialize the `cpu<num>`
fields with zero; just make sure that any capture group inside an optional
`(?:...)?` group can become `undef` [1], so that case needs to be handled,
just like it is for `$res->{guest}` and `$res->{guest_nice}`.
>
> Side note: `sum` is not used, it probably means `total`, right?
As far as I can tell, the `sum` property isn't used anywhere
`read_proc_stat` is called, so it could be deleted.
>
>> if (my $fh = IO::File->new ("/proc/stat", "r")) {
>> while (defined (my $line = <$fh>)) {
>> if ($line =~ m|^cpu\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)(?:\s+(\d+)\s+(\d+))?|) {
>> $res->{user} = $1 - ($9 // 0);
>> $res->{nice} = $2 - ($10 // 0);
>> $res->{system} = $3;
>
> `user` contains `guest` and `nice` contains `guest_nice`, so I
> understand that the values are subtracted. Wouldn't it be better to use
> the original values here? Especially when these are passed out as raw
> values via API, it would certainly be helpful if they correspond 1:1
> with the documentation from /proc/stat.
I can only point you to the original author's commit; it seems like it was
inspired by other monitoring tools, but I haven't checked the latter two
references in the commit message [2].
I get your point, but there might already be many (external) users,
which depend on these values as they are calculated here (thinking of
InfluxDB/Grafana dashboards, API users, ...). So if it doesn't hurt your
use case too badly (and others that would use your new data points
too!), I would adopt that for the `cpu<num>` datapoints as well.
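For reference, a standalone sketch of the subtraction being discussed; the
field order follows proc(5), and the sample numbers are made up:

```perl
use strict;
use warnings;

# Example /proc/stat cpu line (made-up values):
# label user nice system idle iowait irq softirq steal guest guest_nice
my $line = "cpu  1000 50 300 9000 20 5 10 0 200 10";

if ($line =~ m|^cpu\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)(?:\s+(\d+)\s+(\d+))?|) {
    # In /proc/stat, `user` already includes `guest` and `nice`
    # already includes `guest_nice`, so the subtraction avoids
    # double counting when guest time is tracked separately.
    my $user = $1 - ($9 // 0);     # 1000 - 200 = 800
    my $nice = $2 - ($10 // 0);    # 50 - 10 = 40
    print "user=$user nice=$nice\n";
}
```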
>
> The values are output as a string in the JSON output. Is there anything
> against casting them to int?
Good catch! It would be great if every numerical value were also treated
as one. Be aware that Perl doesn't really have an explicit way to 'cast'
variables, as types are handled by context. The builtin `int` function
only makes sure that we get the integer part of an expression [3]. So be
aware, but if it doesn't break anything (or change existing expectations
of API users), go ahead.
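A quick sketch of how context-driven typing shows up in JSON encoding
(assuming a JSON::XS-compatible encoder here; I haven't checked which
module the API layer actually uses):

```perl
use strict;
use warnings;
use JSON;

# A value captured from a regex match is a string scalar, so it is
# encoded as a JSON string.
my $val = "42";
print encode_json({ raw => $val }), "\n";          # {"raw":"42"}

# Evaluating the scalar in numeric context marks it as a number.
# `int` additionally truncates toward zero; `+ 0` keeps fractions.
print encode_json({ as_int => int($val) }), "\n";  # {"as_int":42}
print encode_json({ as_num => $val + 0 }), "\n";   # {"as_num":42}
```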
>
>> my $diff = ($ctime - $last_proc_stat->{ctime}) * $clock_ticks * $cpucount;
>>
>> if ($diff > 1000) { # don't update too often
>
> I don't understand the condition. `$ctime - $last_proc_stat->{ctime}`
> corresponds to the elapsed time in seconds as a float, `clock_ticks`
> would normally be 100. So that would mean that on a system with one CPU
> core an update may only take place every 10 seconds, but with 8 cores
> every 1.25 seconds? Is that an error? Is it even necessary to suppress
> updates if $diff > 0?
Unfortunately, as with some other things here, these changes predate our
git repository and I couldn't tell you exactly why it is that way.
As far as I can tell, since this is retrieved from the WebGUI's node status
page around every second, it seems meant to "buffer" changes to these two
data points, so that there are no spikes for the cpu% and wait%. But I
don't quite get why it is multiplied with `$clock_ticks` and `$cpucount`
either.
If you really want to change this, I would do that in another patch
(series) and mark it as an "RFC" (Request for Comments), so it gets a
little more discussion if that change conflicts with anyone.
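For what it's worth, the condition can be rewritten as a minimum update
interval, which makes the core-count dependence from your question
explicit (names taken from the quoted code):

```perl
use strict;
use warnings;

my $clock_ticks = 100;    # typical value of sysconf(_SC_CLK_TCK)

# $diff > 1000 is equivalent to:
#   ($ctime - $last_ctime) > 1000 / ($clock_ticks * $cpucount)
for my $cpucount (1, 4, 8) {
    my $interval = 1000 / ($clock_ticks * $cpucount);
    printf "%d core(s): update at most every %.2fs\n", $cpucount, $interval;
}
# 1 core(s): update at most every 10.00s
# 4 core(s): update at most every 2.50s
# 8 core(s): update at most every 1.25s
```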
---
Hope these remarks clear up some problems. Looking forward to your patches!
Cheers,
Daniel
[0] https://git.proxmox.com/?p=pve-common.git;a=commitdiff;h=5224b31b
[1] https://perldoc.perl.org/perlvar#$%3Cdigits%3E-($1,-$2,-...)
[2] https://git.proxmox.com/?p=pve-common.git;a=commitdiff;h=c140206b
[3] https://perldoc.perl.org/functions/int
[not found] <20240917055020.10507-1-sascha.westermann@hl-services.de>
2024-09-17 5:50 ` [pve-devel] [PATCH pve-common 1/3] Fix #5708: Add CPU raw counters Sascha Westermann via pve-devel
2024-09-17 5:50 ` [pve-devel] [PATCH pve-manager 2/3] " Sascha Westermann via pve-devel
2024-09-24 12:25 ` Daniel Kral
2024-09-24 14:00 ` Lukas Wagner
2024-09-30 6:17 ` Sascha Westermann via pve-devel
[not found] ` <63c737f2-21cd-4fff-bf86-2369de65f886@hl-services.de>
2024-10-03 9:40 ` Daniel Kral
2024-09-17 5:50 ` [pve-devel] [PATCH qemu-server 3/3] " Sascha Westermann via pve-devel
2024-09-24 12:25 ` Daniel Kral