* [pve-devel] [PATCH pve-container] pct: fix cpu load calculation on command line @ 2023-07-18 11:58 Philipp Hufnagl 2023-07-18 12:00 ` Philipp Hufnagl 0 siblings, 1 reply; 4+ messages in thread From: Philipp Hufnagl @ 2023-07-18 11:58 UTC (permalink / raw) To: pve-devel When called from the command line, it was not possible to calculate cpu load because there was no 2nd data point available for the calculation. Now (when called) from the command line, cpu stats will be fetched twice with a minimum delta of 20ms. That way the load can be calculated fixes #4765 Signed-off-by: Philipp Hufnagl <p.hufnagl@proxmox.com> --- src/PVE/CLI/pct.pm | 4 ++-- src/PVE/LXC.pm | 32 +++++++++++++++++++++++++++++--- 2 files changed, 31 insertions(+), 5 deletions(-) diff --git a/src/PVE/CLI/pct.pm b/src/PVE/CLI/pct.pm index ff75d33..e531b27 100755 --- a/src/PVE/CLI/pct.pm +++ b/src/PVE/CLI/pct.pm @@ -60,8 +60,8 @@ __PACKAGE__->register_method ({ # test if CT exists my $conf = PVE::LXC::Config->load_config ($param->{vmid}); - - my $vmstatus = PVE::LXC::vmstatus($param->{vmid}); + # workaround to get cpu usage is to fetch cpu stats twice with a delay + my $vmstatus = PVE::LXC::vmstatus($param->{vmid}, 20); my $stat = $vmstatus->{$param->{vmid}}; if ($param->{verbose}) { foreach my $k (sort (keys %$stat)) { diff --git a/src/PVE/LXC.pm b/src/PVE/LXC.pm index a531ea5..9fc171f 100644 --- a/src/PVE/LXC.pm +++ b/src/PVE/LXC.pm @@ -12,7 +12,7 @@ use IO::Poll qw(POLLIN POLLHUP); use IO::Socket::UNIX; use POSIX qw(EINTR); use Socket; -use Time::HiRes qw (gettimeofday); +use Time::HiRes qw (gettimeofday usleep); use PVE::AccessControl; use PVE::CGroup; @@ -171,11 +171,37 @@ our $vmstatus_return_properties = { } }; +sub get_first_cpu { + my ($list, $measure_timespan_ms) = @_; + my $cdtime = gettimeofday; + + foreach my $vmid (keys %$list) { + my $cgroups = PVE::LXC::CGroup->new($vmid); + if (defined(my $cpu = $cgroups->get_cpu_stat())) { + # Total time (in milliseconds) used up by the cpu. + my $used_ms = $cpu->{utime} + $cpu->{stime}; + $last_proc_vmid_stat->{$vmid} = { + time => $cdtime, + used => $used_ms, + cpu => 0, + }; + } + } + usleep($measure_timespan_ms * 1000); +} + sub vmstatus { - my ($opt_vmid) = @_; + my ($opt_vmid, $measure_timespan_ms) = @_; my $list = $opt_vmid ? { $opt_vmid => { type => 'lxc', vmid => int($opt_vmid) }} : config_list(); + if (defined($measure_timespan_ms)) + { + get_first_cpu($list, $measure_timespan_ms); + } + + $measure_timespan_ms //= 1000; + my $active_hash = list_active_containers(); my $cpucount = $cpuinfo->{cpus} || 1; @@ -285,7 +311,7 @@ sub vmstatus { } my $delta_ms = ($cdtime - $old->{time}) * $cpucount * 1000.0; - if ($delta_ms > 1000.0) { + if ($delta_ms > $measure_timespan_ms) { my $delta_used_ms = $used_ms - $old->{used}; $d->{cpu} = (($delta_used_ms / $delta_ms) * $cpucount) / $d->{cpus}; $last_proc_vmid_stat->{$vmid} = { -- 2.39.2 ^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [pve-devel] [PATCH pve-container] pct: fix cpu load calculation on command line 2023-07-18 11:58 [pve-devel] [PATCH pve-container] pct: fix cpu load calculation on command line Philipp Hufnagl @ 2023-07-18 12:00 ` Philipp Hufnagl 2023-07-18 13:02 ` Thomas Lamprecht 0 siblings, 1 reply; 4+ messages in thread From: Philipp Hufnagl @ 2023-07-18 12:00 UTC (permalink / raw) To: pve-devel Sorry forgott to tag as v2 On 7/18/23 13:58, Philipp Hufnagl wrote: > When called from the command line, it was not possible to calculate > cpu load because there was no 2nd data point available for the > calculation. Now (when called) from the command line, cpu stats will > be fetched twice with a minimum delta of 20ms. That way the load can > be calculated > > fixes #4765 > > Signed-off-by: Philipp Hufnagl <p.hufnagl@proxmox.com> > --- > src/PVE/CLI/pct.pm | 4 ++-- > src/PVE/LXC.pm | 32 +++++++++++++++++++++++++++++--- > 2 files changed, 31 insertions(+), 5 deletions(-) > > diff --git a/src/PVE/CLI/pct.pm b/src/PVE/CLI/pct.pm > index ff75d33..e531b27 100755 > --- a/src/PVE/CLI/pct.pm > +++ b/src/PVE/CLI/pct.pm > @@ -60,8 +60,8 @@ __PACKAGE__->register_method ({ > > # test if CT exists > my $conf = PVE::LXC::Config->load_config ($param->{vmid}); > - > - my $vmstatus = PVE::LXC::vmstatus($param->{vmid}); > + # workaround to get cpu usage is to fetch cpu stats twice with a delay > + my $vmstatus = PVE::LXC::vmstatus($param->{vmid}, 20); > my $stat = $vmstatus->{$param->{vmid}}; > if ($param->{verbose}) { > foreach my $k (sort (keys %$stat)) { > diff --git a/src/PVE/LXC.pm b/src/PVE/LXC.pm > index a531ea5..9fc171f 100644 > --- a/src/PVE/LXC.pm > +++ b/src/PVE/LXC.pm > @@ -12,7 +12,7 @@ use IO::Poll qw(POLLIN POLLHUP); > use IO::Socket::UNIX; > use POSIX qw(EINTR); > use Socket; > -use Time::HiRes qw (gettimeofday); > +use Time::HiRes qw (gettimeofday usleep); > > use PVE::AccessControl; > use PVE::CGroup; > @@ -171,11 +171,37 @@ our $vmstatus_return_properties = { > } > }; > > +sub get_first_cpu { > + my ($list, $measure_timespan_ms) = @_; > + my $cdtime = gettimeofday; > + > + foreach my $vmid (keys %$list) { > + my $cgroups = PVE::LXC::CGroup->new($vmid); > + if (defined(my $cpu = $cgroups->get_cpu_stat())) { > + # Total time (in milliseconds) used up by the cpu. > + my $used_ms = $cpu->{utime} + $cpu->{stime}; > + $last_proc_vmid_stat->{$vmid} = { > + time => $cdtime, > + used => $used_ms, > + cpu => 0, > + }; > + } > + } > + usleep($measure_timespan_ms * 1000); > +} > + > sub vmstatus { > - my ($opt_vmid) = @_; > + my ($opt_vmid, $measure_timespan_ms) = @_; > > my $list = $opt_vmid ? { $opt_vmid => { type => 'lxc', vmid => int($opt_vmid) }} : config_list(); > > + if (defined($measure_timespan_ms)) > + { > + get_first_cpu($list, $measure_timespan_ms); > + } > + > + $measure_timespan_ms //= 1000; > + > my $active_hash = list_active_containers(); > > my $cpucount = $cpuinfo->{cpus} || 1; > @@ -285,7 +311,7 @@ sub vmstatus { > } > > my $delta_ms = ($cdtime - $old->{time}) * $cpucount * 1000.0; > - if ($delta_ms > 1000.0) { > + if ($delta_ms > $measure_timespan_ms) { > my $delta_used_ms = $used_ms - $old->{used}; > $d->{cpu} = (($delta_used_ms / $delta_ms) * $cpucount) / $d->{cpus}; > $last_proc_vmid_stat->{$vmid} = { ^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [pve-devel] [PATCH pve-container] pct: fix cpu load calculation on command line 2023-07-18 12:00 ` Philipp Hufnagl @ 2023-07-18 13:02 ` Thomas Lamprecht 2023-07-18 13:48 ` Philipp Hufnagl 0 siblings, 1 reply; 4+ messages in thread From: Thomas Lamprecht @ 2023-07-18 13:02 UTC (permalink / raw) To: Philipp Hufnagl; +Cc: Proxmox VE development discussion, Maximiliano Sandoval Am 18/07/2023 um 14:00 schrieb Philipp Hufnagl: > Sorry forgott to tag as v2 and also forgot to document the patch changelog like asked yesterday.. > > On 7/18/23 13:58, Philipp Hufnagl wrote: >> When called from the command line, it was not possible to calculate >> cpu load because there was no 2nd data point available for the >> calculation. Now (when called) from the command line, cpu stats will >> be fetched twice with a minimum delta of 20ms. That way the load can >> be calculated @Maximiliano, didn't we decide to just drop it instead? This isn't really useful, once can get much better data from the pressure stall information (PSI) which is tracked per cgroup and tells a user much more than a 20 ms sample interval.. https://docs.kernel.org/accounting/psi.html#cgroup2-interface Still a few comments inline. >> >> fixes #4765 Add this to the start of the commit subject: fix #4765: >> >> Signed-off-by: Philipp Hufnagl <p.hufnagl@proxmox.com> >> --- >> src/PVE/CLI/pct.pm | 4 ++-- >> src/PVE/LXC.pm | 32 +++++++++++++++++++++++++++++--- >> 2 files changed, 31 insertions(+), 5 deletions(-) >> > diff --git a/src/PVE/CLI/pct.pm b/src/PVE/CLI/pct.pm > index ff75d33..e531b27 100755 > --- a/src/PVE/CLI/pct.pm > +++ b/src/PVE/CLI/pct.pm > @@ -60,8 +60,8 @@ __PACKAGE__->register_method ({ > > # test if CT exists > my $conf = PVE::LXC::Config->load_config ($param->{vmid}); > - > - my $vmstatus = PVE::LXC::vmstatus($param->{vmid}); > + # workaround to get cpu usage is to fetch cpu stats twice with a delay > + my $vmstatus = PVE::LXC::vmstatus($param->{vmid}, 20); > my $stat = $vmstatus->{$param->{vmid}}; > if ($param->{verbose}) { > foreach my $k (sort (keys %$stat)) { > diff --git a/src/PVE/LXC.pm b/src/PVE/LXC.pm > index a531ea5..9fc171f 100644 > --- a/src/PVE/LXC.pm > +++ b/src/PVE/LXC.pm > @@ -12,7 +12,7 @@ use IO::Poll qw(POLLIN POLLHUP); > use IO::Socket::UNIX; > use POSIX qw(EINTR); > use Socket; > -use Time::HiRes qw (gettimeofday); > +use Time::HiRes qw (gettimeofday usleep); > > use PVE::AccessControl; > use PVE::CGroup; > @@ -171,11 +171,37 @@ our $vmstatus_return_properties = { > } > }; > > +sub get_first_cpu { would expect that this actually returns something, i.e., the ID of the first CPU or something like that, so method name should be telling more about what this does, e.g.: prime_vmstatus_cpu_sampling > + my ($list, $measure_timespan_ms) = @_; > + my $cdtime = gettimeofday; > + > + foreach my $vmid (keys %$list) { > + my $cgroups = PVE::LXC::CGroup->new($vmid); > + if (defined(my $cpu = $cgroups->get_cpu_stat())) { > + # Total time (in milliseconds) used up by the cpu. > + my $used_ms = $cpu->{utime} + $cpu->{stime}; > + $last_proc_vmid_stat->{$vmid} = { > + time => $cdtime, > + used => $used_ms, > + cpu => 0, > + }; > + } > + } > + usleep($measure_timespan_ms * 1000); this is rather ugly, the reading the call site one definitively does not expect that a innocent named get_first_cpu unconditionally sleeps even though that only the caller would require this for their sampling.. If we don't just drop this CPU load stuff in pct status I'd rather do one of four options: 1) rename this to prime_vmstatus_cpu_sampling and just do it for a single vmid, then call this new method in PVE::CLI::pct->status and do the sleep there, as that's actually the one call sites that cares about it, the existing vmstatus method then just needs one change: - if ($delta_ms > 1000.0) { + if ($delta_ms > 1000.0 || $old->{cpu} == 0) { 2) The same as 1) but instead of adding the prime_vmstatus_cpu_sampling helper just call vmstatus twice with sleeping in-between (and the same change to the if condition as for 1). 3) get the data where it's already available, i.e., pvestatd, might need more rework though 4) switch over to reporting the PSI from /sys/fs/cgroup/lxc/VMID/cpu.pressure this is pretty simple as in PSI ~ 0 -> no overload 0 >> PSI > 1 -> some overload and PSI >> 1 a lot of overload. Option 4 sounds niceish, but needs more work and has not that high of a benefit (users can already query this easily themselves), option 1 or 2 would be OK-ish, but IMO not ideal, as we'd use a 20ms avg here compared to a >> 1s average elswhere, which can be confusing as it can be quite, well spikey. option 3 would be better here but as mentioned also more rework and possible more intrusive one, so IMO just dropping it sounds almost the nicest and def. most simple one. > +} > + > sub vmstatus { > - my ($opt_vmid) = @_; > + my ($opt_vmid, $measure_timespan_ms) = @_; nit: in control theory, signal processing and acquiring stats in general, using "sampling period" or "sampling interval" is a bit more common for describing what you do here with "$measure_timespan_ms". > > my $list = $opt_vmid ? { $opt_vmid => { type => 'lxc', vmid => int($opt_vmid) }} : config_list(); > > + if (defined($measure_timespan_ms)) > + { Doesn't follows our coding style guide: https://pve.proxmox.com/wiki/Perl_Style_Guide#Spacing_and_syntax_usage > + get_first_cpu($list, $measure_timespan_ms); > + } > + > + $measure_timespan_ms //= 1000; could just put that in the else block. > + > my $active_hash = list_active_containers(); > > my $cpucount = $cpuinfo->{cpus} || 1; > @@ -285,7 +311,7 @@ sub vmstatus { > } > > my $delta_ms = ($cdtime - $old->{time}) * $cpucount * 1000.0; > - if ($delta_ms > 1000.0) { > + if ($delta_ms > $measure_timespan_ms) { > my $delta_used_ms = $used_ms - $old->{used}; > $d->{cpu} = (($delta_used_ms / $delta_ms) * $cpucount) / $d->{cpus}; > $last_proc_vmid_stat->{$vmid} = { ^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [pve-devel] [PATCH pve-container] pct: fix cpu load calculation on command line 2023-07-18 13:02 ` Thomas Lamprecht @ 2023-07-18 13:48 ` Philipp Hufnagl 0 siblings, 0 replies; 4+ messages in thread From: Philipp Hufnagl @ 2023-07-18 13:48 UTC (permalink / raw) To: Thomas Lamprecht; +Cc: Proxmox VE development discussion, Maximiliano Sandoval Hello On 7/18/23 15:02, Thomas Lamprecht wrote: > Am 18/07/2023 um 14:00 schrieb Philipp Hufnagl: >> Sorry forgott to tag as v2 > and also forgot to document the patch changelog like asked yesterday.. Sorry. I did not know that. I will add a changelog > >> On 7/18/23 13:58, Philipp Hufnagl wrote: >>> When called from the command line, it was not possible to calculate >>> cpu load because there was no 2nd data point available for the >>> calculation. Now (when called) from the command line, cpu stats will >>> be fetched twice with a minimum delta of 20ms. That way the load can >>> be calculated > @Maximiliano, didn't we decide to just drop it instead? This isn't really > useful, once can get much better data from the pressure stall information > (PSI) which is tracked per cgroup and tells a user much more than a 20 ms > sample interval.. > > https://docs.kernel.org/accounting/psi.html#cgroup2-interface > > Still a few comments inline. Shall I wait with a v3 until a decision is made? > ust drop this CPU load stuff in pct status I'd rather do one of four > options: > > 1) rename this to prime_vmstatus_cpu_sampling and just do it for a single vmid, > then call this new method in PVE::CLI::pct->status and do the sleep there, as > that's actually the one call sites that cares about it, the existing vmstatus > method then just needs one change: > > - if ($delta_ms > 1000.0) { > + if ($delta_ms > 1000.0 || $old->{cpu} == 0) { > > 2) The same as 1) but instead of adding the prime_vmstatus_cpu_sampling helper > just call vmstatus twice with sleeping in-between (and the same change to the if > condition as for 1). > > 3) get the data where it's already available, i.e., pvestatd, might need more > rework though > > 4) switch over to reporting the PSI from /sys/fs/cgroup/lxc/VMID/cpu.pressure > this is pretty simple as in PSI ~ 0 -> no overload 0 >> PSI > 1 -> some overload > and PSI >> 1 a lot of overload. > > Option 4 sounds niceish, but needs more work and has not that high of a benefit > (users can already query this easily themselves), option 1 or 2 would be OK-ish, > but IMO not ideal, as we'd use a 20ms avg here compared to a >> 1s average elswhere, > which can be confusing as it can be quite, well spikey. option 3 would be better here > but as mentioned also more rework and possible more intrusive one, so IMO just > dropping it sounds almost the nicest and def. most simple one. > I think we should do Idea 1 as a solution until we finish a deeper rework ^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2023-07-18 13:48 UTC | newest] Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2023-07-18 11:58 [pve-devel] [PATCH pve-container] pct: fix cpu load calculation on command line Philipp Hufnagl 2023-07-18 12:00 ` Philipp Hufnagl 2023-07-18 13:02 ` Thomas Lamprecht 2023-07-18 13:48 ` Philipp Hufnagl
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox