public inbox for pve-devel@lists.proxmox.com
* [pve-devel] [PATCH pve-common 0/1] ProcFSTools: add read_pressure
@ 2020-10-06 11:58 Alexandre Derumier
  2020-10-06 11:58 ` [pve-devel] [PATCH pve-common 1/1] " Alexandre Derumier
  0 siblings, 1 reply; 8+ messages in thread
From: Alexandre Derumier @ 2020-10-06 11:58 UTC (permalink / raw)
  To: pve-devel

Hi,

I'm currently working on a VM load balancing scheduler.

This patch adds new pressure (PSI) counters, which are very useful for knowing
whether a node is overloaded, with more granularity than the load average.
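For reference, each /proc/pressure/<type> file follows the PSI format shown
below (the values here are purely illustrative):

  some avg10=0.12 avg60=0.34 avg300=0.56 total=123456789
  full avg10=0.00 avg60=0.11 avg300=0.22 total=12345678

The memory and io files expose both a "some" and a "full" line; the cpu file
typically only exposes the "some" line.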


Alexandre Derumier (1):
  ProcFSTools: add read_pressure

 src/PVE/ProcFSTools.pm | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

-- 
2.20.1




^ permalink raw reply	[flat|nested] 8+ messages in thread

* [pve-devel] [PATCH pve-common 1/1] ProcFSTools: add read_pressure
  2020-10-06 11:58 [pve-devel] [PATCH pve-common 0/1] ProcFSTools: add read_pressure Alexandre Derumier
@ 2020-10-06 11:58 ` Alexandre Derumier
  2020-10-11  8:23   ` Alexandre Derumier
  2020-10-13  5:35   ` [pve-devel] applied: " Dietmar Maurer
  0 siblings, 2 replies; 8+ messages in thread
From: Alexandre Derumier @ 2020-10-06 11:58 UTC (permalink / raw)
  To: pve-devel

Read the new /proc/pressure/(cpu, memory, io) files introduced in kernel 4.20.

This gives more granular information than the load average.

Signed-off-by: Alexandre Derumier <aderumier@odiso.com>
---
 src/PVE/ProcFSTools.pm | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/src/PVE/ProcFSTools.pm b/src/PVE/ProcFSTools.pm
index 7cf1472..7687c13 100644
--- a/src/PVE/ProcFSTools.pm
+++ b/src/PVE/ProcFSTools.pm
@@ -132,6 +132,24 @@ sub read_loadavg {
     return wantarray ? (0, 0, 0) : 0;
 }
 
+sub read_pressure {
+
+    my $res = {};
+    foreach my $type (qw(cpu memory io)) {
+	if (my $fh = IO::File->new ("/proc/pressure/$type", "r")) {
+	    while (defined (my $line = <$fh>)) {
+		if ($line =~ /^(some|full)\s+avg10\=(\d+\.\d+)\s+avg60\=(\d+\.\d+)\s+avg300\=(\d+\.\d+)\s+total\=(\d+)/) {
+		    $res->{$type}->{$1}->{avg10} = $2;
+		    $res->{$type}->{$1}->{avg60} = $3;
+		    $res->{$type}->{$1}->{avg300} = $4;
+	        }
+	    }
+	    $fh->close;
+	}
+    }
+    return $res;
+}
+
 my $last_proc_stat;
 
 sub read_proc_stat {
-- 
2.20.1




^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [pve-devel] [PATCH pve-common 1/1] ProcFSTools: add read_pressure
  2020-10-06 11:58 ` [pve-devel] [PATCH pve-common 1/1] " Alexandre Derumier
@ 2020-10-11  8:23   ` Alexandre Derumier
  2020-10-13  6:05     ` Dietmar Maurer
  2020-10-13  5:35   ` [pve-devel] applied: " Dietmar Maurer
  1 sibling, 1 reply; 8+ messages in thread
From: Alexandre Derumier @ 2020-10-11  8:23 UTC (permalink / raw)
  To: pve-devel

Hi,
I have noticed that it's possible to get pressure info for each VM/CT
through cgroups:

/sys/fs/cgroup/unified/qemu.slice/<vmid>.scope/cpu.pressure
/sys/fs/cgroup/unified/lxc/<vmid>/cpu.pressure

Maybe it would be great to have some new RRD graphs for each VM/CT?
These counters are very useful for knowing whether a specific VM/CT is overloaded.



^ permalink raw reply	[flat|nested] 8+ messages in thread

* [pve-devel] applied: [PATCH pve-common 1/1] ProcFSTools: add read_pressure
  2020-10-06 11:58 ` [pve-devel] [PATCH pve-common 1/1] " Alexandre Derumier
  2020-10-11  8:23   ` Alexandre Derumier
@ 2020-10-13  5:35   ` Dietmar Maurer
  1 sibling, 0 replies; 8+ messages in thread
From: Dietmar Maurer @ 2020-10-13  5:35 UTC (permalink / raw)
  To: Proxmox VE development discussion, Alexandre Derumier

applied




^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [pve-devel] [PATCH pve-common 1/1] ProcFSTools: add read_pressure
  2020-10-11  8:23   ` Alexandre Derumier
@ 2020-10-13  6:05     ` Dietmar Maurer
  2020-10-13  6:32       ` Alexandre Derumier
  0 siblings, 1 reply; 8+ messages in thread
From: Dietmar Maurer @ 2020-10-13  6:05 UTC (permalink / raw)
  To: Proxmox VE development discussion, Alexandre Derumier

> I have noticed that it's possible to get pressure info for each VM/CT
> through cgroups:
> 
> /sys/fs/cgroup/unified/qemu.slice/<vmid>.scope/cpu.pressure
> /sys/fs/cgroup/unified/lxc/<vmid>/cpu.pressure
> 
> Maybe it would be great to have some new RRD graphs for each VM/CT?
> These counters are very useful for knowing whether a specific VM/CT is overloaded.

I have no idea how reliable this is, because we do not use cgroups v2. But yes,
I think this would be useful.




^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [pve-devel] [PATCH pve-common 1/1] ProcFSTools: add read_pressure
  2020-10-13  6:05     ` Dietmar Maurer
@ 2020-10-13  6:32       ` Alexandre Derumier
  2020-10-13  7:38         ` Dietmar Maurer
  0 siblings, 1 reply; 8+ messages in thread
From: Alexandre Derumier @ 2020-10-13  6:32 UTC (permalink / raw)
  To: Dietmar Maurer; +Cc: Proxmox VE development discussion

>> I have no idea how reliable this is, because we do not use cgroups v2.
>> But yes, I think this would be useful.

I have tested it on a host with a lot of small VMs (something like 400 VMs
on 48 cores). With that number of VMs there were a lot of context
switches, and the VMs were laggy.
CPU usage was ok (maybe 40%) and the load average was around 40, but pressure
was around 20%, so it seems more precise than the load average.

The global /proc/pressure/cpu was almost the sum of the per-VM cgroup values in
/sys/fs/cgroup/unified/qemu.slice/<vmid>.scope/cpu.pressure,
so it seems reliable.

(I don't have LXC containers in production, but I think it should be the
same.)

So, yes, I think we could add them to the RRDs for both host and VMs.


BTW, I'm currently playing with reading the RRD files, and I have noticed
that the finest resolution is 1 minute.
Since pvestatd sends values roughly every 10s, is this 1-minute resolution an
average of the 6x10s values sent by pvestatd?

I'm currently working on a PoC of VM balancing, but I would like to have
something like 15 minutes of 10s precision (90 samples of 10s).
So currently I'm getting stats every 10s manually
with PVE::API2Tools::extract_vm_stats, like the resource API.
(This uses PVE::Cluster::rrd_dump, but I don't understand the ipcc_* code.
Does it only return the currently streamed values,
and then the rrdcached daemon writes the per-minute averages to the RRD
files afterwards?)

I don't know if we could have RRD files with 15 minutes of 10s precision?
(I don't know the write load impact on disks.)






^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [pve-devel] [PATCH pve-common 1/1] ProcFSTools: add read_pressure
  2020-10-13  6:32       ` Alexandre Derumier
@ 2020-10-13  7:38         ` Dietmar Maurer
  2020-10-13 12:05           ` Alexandre Derumier
  0 siblings, 1 reply; 8+ messages in thread
From: Dietmar Maurer @ 2020-10-13  7:38 UTC (permalink / raw)
  To: Alexandre Derumier; +Cc: Proxmox VE development discussion

> BTW, I'm currently playing with reading the RRD files, and I have noticed that the finest resolution is 1 minute.
> Since pvestatd sends values roughly every 10s, is this 1-minute resolution an average of the 6x10s values sent by pvestatd?

Yes (we also store the MAX)

> I'm currently working on a PoC of VM balancing, but I would like to have something like 15 minutes of 10s precision (90 samples of 10s).

Why do you need 10s resolution? Isn't 1 min good enough?

> So currently I'm getting stats every 10s manually with PVE::API2Tools::extract_vm_stats, like the resource API.
> (This uses PVE::Cluster::rrd_dump, but I don't understand the ipcc_* code. Does it only return the currently streamed values,
> and then the rrdcached daemon writes the per-minute averages to the RRD files afterwards?)
> 
> I don't know if we could have RRD files with 15 minutes of 10s precision? (I don't know the write load impact on disks.)

We use the following RRD conf, step is 60 seconds (see pve-cluster/src/status.c):

static const char *rrd_def_node[] = {
	"DS:loadavg:GAUGE:120:0:U",
	"DS:maxcpu:GAUGE:120:0:U",
	"DS:cpu:GAUGE:120:0:U",
	"DS:iowait:GAUGE:120:0:U",
	"DS:memtotal:GAUGE:120:0:U",
	"DS:memused:GAUGE:120:0:U",
	"DS:swaptotal:GAUGE:120:0:U",
	"DS:swapused:GAUGE:120:0:U",
	"DS:roottotal:GAUGE:120:0:U",
	"DS:rootused:GAUGE:120:0:U",
	"DS:netin:DERIVE:120:0:U",
	"DS:netout:DERIVE:120:0:U",

	"RRA:AVERAGE:0.5:1:70", // 1 min avg - one hour
	"RRA:AVERAGE:0.5:30:70", // 30 min avg - one day
	"RRA:AVERAGE:0.5:180:70", // 3 hour avg - one week
	"RRA:AVERAGE:0.5:720:70", // 12 hour avg - one month
	"RRA:AVERAGE:0.5:10080:70", // 7 day avg - ony year

	"RRA:MAX:0.5:1:70", // 1 min max - one hour
	"RRA:MAX:0.5:30:70", // 30 min max - one day
	"RRA:MAX:0.5:180:70",  // 3 hour max - one week
	"RRA:MAX:0.5:720:70", // 12 hour max - one month
	"RRA:MAX:0.5:10080:70", // 7 day max - ony year
	NULL,
};

Also See: man rrdcreate

So no, you do not get 10s precision from RRD.
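For illustration, the time span each RRA covers is step * steps-per-row * rows,
so with the 60 second step above:

  "RRA:AVERAGE:0.5:1:70"  -> 60s * 1 * 70  = 70 minutes (about one hour)
  "RRA:AVERAGE:0.5:30:70" -> 60s * 30 * 70 = 35 hours

Purely hypothetically, an RRD created with a 10 second step and an
"RRA:AVERAGE:0.5:1:90" archive would keep 10s * 1 * 90 = 15 minutes of 10s
samples, but that is not how the pve-cluster RRDs are currently defined.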




^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [pve-devel] [PATCH pve-common 1/1] ProcFSTools: add read_pressure
  2020-10-13  7:38         ` Dietmar Maurer
@ 2020-10-13 12:05           ` Alexandre Derumier
  0 siblings, 0 replies; 8+ messages in thread
From: Alexandre Derumier @ 2020-10-13 12:05 UTC (permalink / raw)
  To: Dietmar Maurer; +Cc: Proxmox VE development discussion

>> Why do you need 10s resolution? Isn't 1 min good enough?

Well, if the 1-minute value is an average of the 10s metrics, it's ok.

I'm currently using the 1-minute and 5-minute averages, so it's not a problem
with the current RRDs.

Thanks for the information!


(I'll resend a patch to add pressure to the RRDs, and also add VM/CT pressure.)




^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread (newest: 2020-10-13 12:05 UTC)

Thread overview: 8+ messages
2020-10-06 11:58 [pve-devel] [PATCH pve-common 0/1] ProcFSTools: add read_pressure Alexandre Derumier
2020-10-06 11:58 ` [pve-devel] [PATCH pve-common 1/1] " Alexandre Derumier
2020-10-11  8:23   ` Alexandre Derumier
2020-10-13  6:05     ` Dietmar Maurer
2020-10-13  6:32       ` Alexandre Derumier
2020-10-13  7:38         ` Dietmar Maurer
2020-10-13 12:05           ` Alexandre Derumier
2020-10-13  5:35   ` [pve-devel] applied: " Dietmar Maurer
