Re: [pve-devel] [PATCH pve-manager 2/3] Fix #5708: Add CPU raw counters

From: Lukas Wagner <l.wagner@proxmox.com>
To: Proxmox VE development discussion <pve-devel@lists.proxmox.com>,
	Daniel Kral <d.kral@proxmox.com>
Cc: Sascha Westermann <sascha.westermann@hl-services.de>
Subject: Re: [pve-devel] [PATCH pve-manager 2/3] Fix #5708: Add CPU raw counters
Date: Tue, 24 Sep 2024 16:00:47 +0200
Message-ID: <57e1dfc4-091e-43d6-b1a6-be65adc9bf40@proxmox.com>
In-Reply-To: <69e71cf0-a8c4-42a6-a1a5-36024e903687@proxmox.com>

On 2024-09-24 14:25, Daniel Kral wrote:
> On 9/17/24 07:50, Sascha Westermann via pve-devel wrote:
>> Add a map containing raw values from /proc/stat and "uptime_ticks" which
>> can be used in combination with cpuinfo.user_hz to calculate CPU usage
>> from two samples. "uptime_ticks" is only defined at the top level, as
>> /proc/stat is read once, so that core-specific raw values match this
>> value.
>>
>> Signed-off-by: Sascha Westermann <sascha.westermann@hl-services.de>
>> ---
>>  PVE/API2/Nodes.pm | 32 ++++++++++++++++++++++++++++++++
>>  1 file changed, 32 insertions(+)
>>
>> diff --git a/PVE/API2/Nodes.pm b/PVE/API2/Nodes.pm
>> index 9920e977..1943ec56 100644
>> --- a/PVE/API2/Nodes.pm
>> +++ b/PVE/API2/Nodes.pm
>> @@ -5,6 +5,7 @@ use warnings;
>>  
>>  use Digest::MD5;
>>  use Digest::SHA;
>> +use IO::File;
>>  use Filesys::Df;
>>  use HTTP::Status qw(:constants);
>>  use JSON;
>> @@ -466,6 +467,37 @@ __PACKAGE__->register_method({
> 
> note: the same route also gets called when using the WebGUI and a set of the values that get returned are displayed on the "Node > Status" page. What I have seen, the added data size is very negligible.
> 
>>      $res->{cpu} = $stat->{cpu};
>>      $res->{wait} = $stat->{wait};
>>  
>> +    if (my $fh = IO::File->new ("/proc/stat", "r")) {
> 
> nit: Minor note, but there shouldn't be a space between the function's name and its parameter list [0].
> 
>> +        my ($uptime_ticks) = PVE::ProcFSTools::read_proc_uptime(1);
>> +        while (defined (my $line = <$fh>)) {
>> +        if ($line =~ m|^cpu\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)(?:\s+(\d+)\s+(\d+))?|) {
>> +            $res->{cpustat}->{user} = int($1);
>> +            $res->{cpustat}->{nice} = int($2);
>> +            $res->{cpustat}->{system} = int($3);
>> +            $res->{cpustat}->{idle} = int($4);
>> +            $res->{cpustat}->{iowait} = int($5);
>> +            $res->{cpustat}->{irq} = int($6);
>> +            $res->{cpustat}->{softirq} = int($7);
>> +            $res->{cpustat}->{steal} = int($8);
>> +            $res->{cpustat}->{guest} = int($9);
>> +            $res->{cpustat}->{guest_nice} = int($10);
>> +            $res->{cpustat}->{uptime_ticks} = $uptime_ticks;
> 
> nit: I think this could be placed rather nicely at `$res->{uptime_ticks}`, like `$res->{uptime}`, to make `cpustat` a little more consistent with `PVE::ProcFSTools::read_proc_stat()` and
> 
>> +        } elsif ($line =~ m|^cpu(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)(?:\s+(\d+)\s+(\d+))?|) {
>> +            $res->{cpustat}->{"cpu" . $1}->{user} = int($2);
>> +            $res->{cpustat}->{"cpu" . $1}->{nice} = int($3);
>> +            $res->{cpustat}->{"cpu" . $1}->{system} = int($4);
>> +            $res->{cpustat}->{"cpu" . $1}->{idle} = int($5);
>> +            $res->{cpustat}->{"cpu" . $1}->{iowait} = int($6);
>> +            $res->{cpustat}->{"cpu" . $1}->{irq} = int($7);
>> +            $res->{cpustat}->{"cpu" . $1}->{softirq} = int($8);
>> +            $res->{cpustat}->{"cpu" . $1}->{steal} = int($9);
>> +            $res->{cpustat}->{"cpu" . $1}->{guest} = int($10);
>> +            $res->{cpustat}->{"cpu" . $1}->{guest_nice} = int($11);
>> +        }
>> +        }
>> +        $fh->close;
>> +    }
> 
> Is there something that is holding us back to move this directly into `PVE::ProcFSTools::read_proc_stat()`?
> 
> As far as I can tell, the output of `PVE::ProcFSTools::read_proc_stat()` is used at these locations:
> 
> - the PVE `/nodes/{node}/status` API endpoint of course, which only uses the values of `cpu` and `wait` at the moment
> - `PMG::API2::Nodes`: also only uses the values of `cpu` and `wait`
> - the PMG `/nodes/{node}/status` API endpoint, which also only uses the values of `cpu` and `wait`
> - `PVE::Service::pvestatd::update_node_status`: retrieve the current node status and then update them for rrd via `broadcast_rrd` (uses only the values of `cpu` and `wait` selectively) and external metric servers
> 
> The first three and a half (speaking of `broadcast_rrd` in the latter) look fine to me, but we should take a closer look how external metric servers will handle the added data, especially for existing queries/dashboards. It could also be a name collision, as 'cpustat' is also used for the data that gets sent to the metric servers.

Just as a side note from me, since I recently went down this rabbit hole as well:

In the long term, it might be good to add a layer of abstraction between the
cpustat hash produced by read_proc_stat (and similar functions) and the metric
server integration. Right now, the metric server implementation adds a field
for every single member of the hash to a data point, which is then sent to the
external metric server. As a result, the cpustat hash is essentially a public
interface at the moment. If we ever change the format of the hash, we risk
breaking custom monitoring dashboards in our users' setups.

I think we should create a mapping/translation layer between these internal
data structures and the fields that are sent to the metric server. At the same
time, it would probably be wise to also document the structure and format of
the metric data somewhere in our docs.
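
To make the idea a bit more concrete, here is a minimal sketch of what such a
translation layer could look like. Everything in it is an assumption for
illustration only: `$CPUSTAT_EXPORT_MAP` and `cpustat_to_metric_fields` do not
exist in our code base, and the set of exported keys is just an example, not
the actual list of fields we send today.

```perl
use strict;
use warnings;

# Hypothetical mapping: public metric field name => internal cpustat key.
# Only the fields listed here are exported, so keys that get added to the
# internal hash later stay private until someone lists them on purpose.
my $CPUSTAT_EXPORT_MAP = {
    cpu  => 'cpu',
    wait => 'wait',
    user => 'user',
};

# Hypothetical helper: build the field set for one data point from the
# internal cpustat hash, using only the explicit mapping above.
sub cpustat_to_metric_fields {
    my ($cpustat) = @_;

    my $fields = {};
    for my $public (sort keys %$CPUSTAT_EXPORT_MAP) {
        my $internal = $CPUSTAT_EXPORT_MAP->{$public};
        my $value = $cpustat->{$internal};
        next if !defined($value); # key not present in this sample
        next if ref($value);      # skip nested data such as per-core hashes
        $fields->{$public} = $value;
    }
    return $fields;
}

# Usage: 'cpu0' and 'uptime_ticks' are not in the mapping, so they are
# not exported even though they are present in the internal hash.
my $cpustat = {
    cpu => 0.12,
    wait => 0.01,
    user => 1234,
    cpu0 => { user => 600 },
    uptime_ticks => 99,
};
my $fields = cpustat_to_metric_fields($cpustat);
print join(", ", map { "$_=$fields->{$_}" } sort keys %$fields), "\n";
```

With something along these lines, renaming or restructuring the internal hash
would only require touching the mapping, while the field names seen by the
metric server stay stable.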

Just some thoughts that I had when working on the metrics system. Of course,
this does not have to (and probably should not) be tackled in this patch series.

-- 
- Lukas
