From: Daniel Kral <d.kral@proxmox.com>
To: Proxmox VE development discussion <pve-devel@lists.proxmox.com>
Cc: Sascha Westermann <sascha.westermann@hl-services.de>
Subject: Re: [pve-devel] [PATCH qemu-server 3/3] Fix #5708: Add CPU raw counters
Date: Tue, 24 Sep 2024 14:25:44 +0200	[thread overview]
Message-ID: <7c8fa551-1682-4556-9322-15fd280fcfad@proxmox.com>
In-Reply-To: <mailman.6.1726728980.332.pve-devel@lists.proxmox.com>

On 9/17/24 07:50, Sascha Westermann via pve-devel wrote:
> Add a map containing raw values from /proc/<pid>/stat (utime, stime and
> guest_time), "uptime_ticks" and "user_hz" (from cpuinfo) to calculate
> physical CPU usage from two samples. In addition, virtual CPU statistics
> based on /proc/<pid>/task/<tid>/schedstat (<tid> for virtual cores) are
> added - based on this data, the CPU usage can be calculated from the
> perspective of the virtual machine.
> 
> The total usage corresponds to "cpu_ns + runqueue_ns", "cpu_ns" should
> roughly reflect the physical CPU usage (without I/O-threads and
> emulators) and "runqueue_ns" corresponds to the value of %steal, i.e.
> the same as "CPU ready" for VMware or "Wait for dispatch" for Hyper-V.
> 
> To calculate difference values, uptime_ticks and user_hz would be
> converted to nanoseconds - uptime_ticks is read immediately after utime,
> stime and guest_time are read from /proc/<pid>/stat, i.e. before
> /proc/<pid>/task/<tid>/schedstat is read. The timestamp is therefore not
> exact, but should be close enough to the sampling point that the derived
> values are reasonably accurate.
>
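
Just for reference, the way I understand the intended calculation: with two
vmstatus samples of these raw counters, the physical CPU usage of the VM
process could be derived roughly like this (only a sketch, `$prev` and
`$curr` are made-up snapshot variables):

```
# Sketch: physical CPU usage between two cpustat snapshots; utime, stime and
# process_uptime_ticks are all in USER_HZ ticks, so user_hz cancels out here.
my $used_delta = ($curr->{utime} + $curr->{stime})
    - ($prev->{utime} + $prev->{stime});          # ticks spent on host CPUs
my $time_delta = $curr->{process_uptime_ticks}
    - $prev->{process_uptime_ticks};              # wall-clock ticks between samples
my $cpu = $time_delta > 0 ? $used_delta / $time_delta : 0;  # fraction of one core
```
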
> Signed-off-by: Sascha Westermann <sascha.westermann@hl-services.de>
> ---
>  PVE/QemuServer.pm | 55 +++++++++++++++++++++++++++++++++++++++++++++--
>  1 file changed, 53 insertions(+), 2 deletions(-)
> 
> diff --git a/PVE/QemuServer.pm b/PVE/QemuServer.pm
> index b26da505..39830709 100644
> --- a/PVE/QemuServer.pm
> +++ b/PVE/QemuServer.pm
> @@ -2814,6 +2814,40 @@ our $vmstatus_return_properties = {
>  
>  my $last_proc_pid_stat;
>  
> +sub get_vcpu_to_thread_id {
> +    my ($pid) = @_;
> +    my @cpu_to_thread_id;
> +    my $task_dir = "/proc/$pid/task";
> +
> +    if (! -d $task_dir) {
> +	return @cpu_to_thread_id;
> +    }
> +
> +    opendir(my $dh, $task_dir);
> +    if (!$dh) {
> +	return @cpu_to_thread_id;
> +    }
> +    while (my $tid = readdir($dh)) {
> +	next if $tid =~ /^\./;
> +	my $comm_file = "$task_dir/$tid/comm";
> +	next unless -f $comm_file;
> +
> +	open(my $fh, '<', $comm_file) or next;
> +	my $comm = <$fh>;
> +	close($fh);
> +
> +	chomp $comm;
> +
> +	if ($comm =~ /^CPU\s+(\d+)\/KVM$/) {
> +	    my $vcpu = $1;
> +	    push @cpu_to_thread_id, { tid => $tid, vcpu => $vcpu };
> +	}
> +    }
> +    closedir($dh);
> +
> +    return @cpu_to_thread_id;
> +}

nit: since the vCPU counters are not part of the initial bug report's 
intent, adding them could probably be split into its own commit.

> +
>  # get VM status information
>  # This must be fast and should not block ($full == false)
>  # We only query KVM using QMP if $full == true (this can be slow)
> @@ -2827,8 +2861,6 @@ sub vmstatus {
>      my $list = vzlist();
>      my $defaults = load_defaults();
>  
> -    my ($uptime) = PVE::ProcFSTools::read_proc_uptime(1);
> -
>      my $cpucount = $cpuinfo->{cpus} || 1;
>  
>      foreach my $vmid (keys %$list) {
> @@ -2911,6 +2943,25 @@ sub vmstatus {
>  
>  	my $pstat = PVE::ProcFSTools::read_proc_pid_stat($pid);
>  	next if !$pstat; # not running
> +	my ($uptime) = PVE::ProcFSTools::read_proc_uptime(1);
> +	my $process_uptime_ticks = $uptime - $pstat->{starttime};
> +
> +	$d->{cpustat}->{guest_time} = int($pstat->{guest_time});
> +	$d->{cpustat}->{process_uptime_ticks} = $process_uptime_ticks;
> +	$d->{cpustat}->{stime} = int($pstat->{stime});
> +	$d->{cpustat}->{user_hz} = $cpuinfo->{user_hz};
> +	$d->{cpustat}->{utime} = int($pstat->{utime});
> +
> +	my @vcpu_to_thread_id = get_vcpu_to_thread_id($pid);
> +	if (@vcpu_to_thread_id) {
> +	    foreach my $entry (@vcpu_to_thread_id) {
> +		my $statstr = PVE::Tools::file_read_firstline("/proc/$pid/task/$entry->{tid}/schedstat") or next;
> +		if ($statstr && $statstr =~ m/^(\d+) (\d+) \d/) {
> +		    $d->{cpustat}->{"vcpu" . $entry->{vcpu}}->{cpu_ns} = int($1);
> +		    $d->{cpustat}->{"vcpu" . $entry->{vcpu}}->{runqueue_ns} = int($2);
> +		};
> +	    }
> +	}

note: This might be useful information for patch #2 (if we decide to 
make the added information available to metric servers as well), since 
this data is actually sent to the external metric servers (in 
`PVE::Service::pvestatd::update_qemu_status`). That seems fine to me, as 
the vCPUs are separated via an "instance=vcpuX" field. I haven't tested 
this with Grafana, though.

e.g. for one of my VMs this will add the following to the InfluxDB API 
write call:

```
cpustat,object=qemu,vmid=107,nodename=node1,host=test,instance=vcpu0 cpu_ns=10916152530,runqueue_ns=29127241 1727171085000000000
cpustat,object=qemu,vmid=107,nodename=node1,host=test,instance=vcpu1 cpu_ns=1341783516,runqueue_ns=6114069 1727171085000000000
cpustat,object=qemu,vmid=107,nodename=node1,host=test guest_time=846,process_uptime_ticks=5234,stime=333,user_hz=100,utime=1004 1727171085000000000
```
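
From those per-vCPU counters, a consumer could then derive something like a
per-vCPU %steal over two samples (again only a sketch, `$prev` and `$curr`
are made-up snapshots of e.g. `cpustat->{vcpu0}`):

```
# Sketch: per-vCPU steal percentage from two schedstat-based snapshots (ns).
my $cpu_delta      = $curr->{cpu_ns}      - $prev->{cpu_ns};       # actually running
my $runqueue_delta = $curr->{runqueue_ns} - $prev->{runqueue_ns};  # runnable, but waiting
my $total          = $cpu_delta + $runqueue_delta;
my $steal_percent  = $total > 0 ? 100 * $runqueue_delta / $total : 0;
```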

>  
>  	my $used = $pstat->{utime} + $pstat->{stime};
>  
> -- 
> 2.46.0

As for patch #2, it would also be beneficial if the added data properties 
were documented in the JSONSchema for the function call 
(`$vmstatus_return_properties`), so that they can be easily understood by 
other users as well (especially which unit each raw value is in, so it is 
clear how they would need to be converted).
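
Something along these lines could work, for example (just a sketch - the
exact description text and structure would of course need adjusting):

```
# Illustrative only: a possible entry in $vmstatus_return_properties.
cpustat => {
    description => "Raw CPU accounting counters: utime, stime, guest_time and"
        ." process_uptime_ticks in USER_HZ ticks (see user_hz); per-vCPU cpu_ns"
        ." and runqueue_ns in nanoseconds.",
    type => 'object',
    optional => 1,
},
```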

---

Otherwise, this works just as intended for me with:

- `/nodes/{node}/qemu/{vmid}/status/current` (pvesh, curl, WebGUI)
- `qm status <vmid>` (cli)
- InfluxDB API write calls

Reviewed-by: Daniel Kral <d.kral@proxmox.com>
Tested-by: Daniel Kral <d.kral@proxmox.com>

[0] https://bugzilla.proxmox.com/show_bug.cgi?id=5708#c3




Thread overview:
     [not found] <20240917055020.10507-1-sascha.westermann@hl-services.de>
2024-09-17  5:50 ` [pve-devel] [PATCH pve-common 1/3] " Sascha Westermann via pve-devel
2024-09-17  5:50 ` [pve-devel] [PATCH pve-manager 2/3] " Sascha Westermann via pve-devel
2024-09-24 12:25   ` Daniel Kral
2024-09-24 14:00     ` Lukas Wagner
2024-09-30  6:17     ` Sascha Westermann via pve-devel
     [not found]     ` <63c737f2-21cd-4fff-bf86-2369de65f886@hl-services.de>
2024-10-03  9:40       ` Daniel Kral
2024-09-17  5:50 ` [pve-devel] [PATCH qemu-server 3/3] " Sascha Westermann via pve-devel
2024-09-24 12:25   ` Daniel Kral [this message]
