From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <2d4ca785-09c3-0fc3-0658-e7f29f8579b9@proxmox.com>
Date: Tue, 18 Jul 2023 15:02:29 +0200
To: Philipp Hufnagl
References: <20230718115828.170254-1-p.hufnagl@proxmox.com>
From: Thomas Lamprecht
Cc: Proxmox VE development discussion, Maximiliano Sandoval
Subject: Re: [pve-devel] [PATCH pve-container] pct: fix cpu load calculation on command line
List-Id: Proxmox VE development discussion

On 18/07/2023 at 14:00, Philipp Hufnagl wrote:
> Sorry, forgot to tag as v2 and also forgot to document the patch changelog like asked yesterday..
>
> On 7/18/23 13:58, Philipp Hufnagl wrote:
>>   When called from the command line, it was not possible to calculate
>>   cpu load because there was no 2nd data point available for the
>>   calculation. Now (when called) from the command line, cpu stats will
>>   be fetched twice with a minimum delta of 20ms.
>>   That way the load can be calculated

@Maximiliano, didn't we decide to just drop it instead? This isn't really
useful; one can get much better data from the pressure stall information
(PSI), which is tracked per cgroup and tells a user much more than a 20 ms
sample interval.

https://docs.kernel.org/accounting/psi.html#cgroup2-interface

Still a few comments inline.

>>
>> fixes #4765

Add this to the start of the commit subject:

fix #4765: 

>>
>> Signed-off-by: Philipp Hufnagl
>> ---
>>   src/PVE/CLI/pct.pm |  4 ++--
>>   src/PVE/LXC.pm     | 32 +++++++++++++++++++++++++++++---
>>   2 files changed, 31 insertions(+), 5 deletions(-)
>>
> diff --git a/src/PVE/CLI/pct.pm b/src/PVE/CLI/pct.pm
> index ff75d33..e531b27 100755
> --- a/src/PVE/CLI/pct.pm
> +++ b/src/PVE/CLI/pct.pm
> @@ -60,8 +60,8 @@ __PACKAGE__->register_method ({
>
>          # test if CT exists
>          my $conf = PVE::LXC::Config->load_config ($param->{vmid});
> -
> -        my $vmstatus = PVE::LXC::vmstatus($param->{vmid});
> +        # workaround to get cpu usage is to fetch cpu stats twice with a delay
> +        my $vmstatus = PVE::LXC::vmstatus($param->{vmid}, 20);
>          my $stat = $vmstatus->{$param->{vmid}};
>          if ($param->{verbose}) {
>              foreach my $k (sort (keys %$stat)) {
> diff --git a/src/PVE/LXC.pm b/src/PVE/LXC.pm
> index a531ea5..9fc171f 100644
> --- a/src/PVE/LXC.pm
> +++ b/src/PVE/LXC.pm
> @@ -12,7 +12,7 @@ use IO::Poll qw(POLLIN POLLHUP);
>  use IO::Socket::UNIX;
>  use POSIX qw(EINTR);
>  use Socket;
> -use Time::HiRes qw (gettimeofday);
> +use Time::HiRes qw (gettimeofday usleep);
>
>  use PVE::AccessControl;
>  use PVE::CGroup;
> @@ -171,11 +171,37 @@ our $vmstatus_return_properties = {
>      }
>  };
>
> +sub get_first_cpu {

I would expect that this actually returns something, i.e., the ID of the
first CPU or something like that, so the method name should tell more about
what this does, e.g.: prime_vmstatus_cpu_sampling

> +    my ($list, $measure_timespan_ms) = @_;
> +    my $cdtime = gettimeofday;
> +
> +    foreach my $vmid (keys %$list) {
> +        my $cgroups = PVE::LXC::CGroup->new($vmid);
> +        if (defined(my $cpu = $cgroups->get_cpu_stat())) {
> +            # Total time (in milliseconds) used up by the cpu.
> +            my $used_ms = $cpu->{utime} + $cpu->{stime};
> +            $last_proc_vmid_stat->{$vmid} = {
> +                time => $cdtime,
> +                used => $used_ms,
> +                cpu => 0,
> +            };
> +        }
> +    }
> +    usleep($measure_timespan_ms * 1000);

This is rather ugly: reading the call site, one definitely does not expect
that an innocently named get_first_cpu unconditionally sleeps, even though
only the caller would require this for its sampling.

If we don't just drop this CPU load stuff in pct status, I'd rather do one
of four options:

1) rename this to prime_vmstatus_cpu_sampling and just do it for a single
   vmid, then call this new method in PVE::CLI::pct->status and do the sleep
   there, as that's actually the one call site that cares about it; the
   existing vmstatus method then just needs one change:
   -    if ($delta_ms > 1000.0) {
   +    if ($delta_ms > 1000.0 || $old->{cpu} == 0) {

2) The same as 1), but instead of adding the prime_vmstatus_cpu_sampling
   helper just call vmstatus twice with sleeping in between (and the same
   change to the if condition as for 1).

3) get the data where it's already available, i.e., pvestatd; might need
   more rework though

4) switch over to reporting the PSI from /sys/fs/cgroup/lxc/VMID/cpu.pressure;
   this is pretty simple, as in PSI ~ 0 -> no overload, 0 < PSI < 1 -> some
   overload, and PSI >> 1 -> a lot of overload.

Option 4 sounds niceish, but needs more work and does not have that high of
a benefit (users can already query this easily themselves). Option 1 or 2
would be OK-ish, but IMO not ideal, as we'd use a 20 ms average here compared
to a >> 1 s average elsewhere, which can be confusing as it can be quite,
well, spiky.
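
For illustration, option 4 could look roughly like the following minimal
sketch. It assumes the cgroup v2 PSI file format described in the kernel
documentation linked above (lines starting with "some" or "full", followed by
avg10=, avg60=, avg300= and total= fields). The helper names parse_psi and
read_cpu_pressure are hypothetical and not part of pve-container; the path is
the one mentioned in option 4, and the sample data is made up.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of pve-container): parse the content of a
# cgroup v2 cpu.pressure file into { some => {...}, full => {...} }.
sub parse_psi {
    my ($text) = @_;

    my $res = {};
    for my $line (split(/\n/, $text)) {
        # each line looks like: "some avg10=0.00 avg60=0.00 avg300=0.00 total=0"
        if ($line =~ m/^(some|full)\s+(.+)$/) {
            my ($kind, $fields) = ($1, $2);
            for my $pair (split(/\s+/, $fields)) {
                my ($key, $value) = split(/=/, $pair, 2);
                $res->{$kind}->{$key} = $value + 0 if defined($value);
            }
        }
    }
    return $res;
}

# Hypothetical helper: read the PSI data of one container; returns undef
# if the file cannot be opened (e.g., the container is not running).
sub read_cpu_pressure {
    my ($vmid) = @_;

    my $path = "/sys/fs/cgroup/lxc/$vmid/cpu.pressure";
    open(my $fh, '<', $path) or return undef;
    local $/ = undef; # slurp the whole file at once
    my $text = <$fh>;
    close($fh);

    return parse_psi($text);
}

# Example with made-up sample data instead of a real container:
my $sample = "some avg10=1.50 avg60=0.25 avg300=0.00 total=12345\n"
    . "full avg10=0.00 avg60=0.00 avg300=0.00 total=0\n";
my $psi = parse_psi($sample);
printf("cpu pressure (some): avg10=%.2f avg60=%.2f\n",
    $psi->{some}->{avg10}, $psi->{some}->{avg60});
```

Keeping the parsing separate from the file access means the parser can be
exercised with sample text, without needing a running container.
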
Option 3 would be better here, but as mentioned it also needs more rework and
is possibly more intrusive, so IMO just dropping it sounds almost the nicest
and def. the simplest one.

> +}
> +
>  sub vmstatus {
> -    my ($opt_vmid) = @_;
> +    my ($opt_vmid, $measure_timespan_ms) = @_;

nit: in control theory, signal processing and acquiring stats in general,
using "sampling period" or "sampling interval" is a bit more common for
describing what you do here with "$measure_timespan_ms".

>
>      my $list = $opt_vmid ? { $opt_vmid => { type => 'lxc', vmid => int($opt_vmid) }} : config_list();
>
> +    if (defined($measure_timespan_ms))
> +    {

Doesn't follow our coding style guide:
https://pve.proxmox.com/wiki/Perl_Style_Guide#Spacing_and_syntax_usage

> +        get_first_cpu($list, $measure_timespan_ms);
> +    }
> +
> +    $measure_timespan_ms //= 1000;

Could just put that in the else block.

> +
>      my $active_hash = list_active_containers();
>
>      my $cpucount = $cpuinfo->{cpus} || 1;
> @@ -285,7 +311,7 @@ sub vmstatus {
>          }
>
>          my $delta_ms = ($cdtime - $old->{time}) * $cpucount * 1000.0;
> -        if ($delta_ms > 1000.0) {
> +        if ($delta_ms > $measure_timespan_ms) {
>              my $delta_used_ms = $used_ms - $old->{used};
>              $d->{cpu} = (($delta_used_ms / $delta_ms) * $cpucount) / $d->{cpus};
>              $last_proc_vmid_stat->{$vmid} = {
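
To make the arithmetic in the last quoted hunk easier to follow, here is a
minimal, self-contained sketch of the same calculation with made-up numbers.
The function name calc_cpu_usage and the sample hashes are hypothetical; only
the formula itself is taken from the quoted vmstatus code.

```perl
use strict;
use warnings;

# Hypothetical stand-alone version of the calculation quoted above.
# $old and $new are samples of the form { time => seconds, used => ms },
# $cpucount is the number of host CPUs and $cpus the number of CPUs
# assigned to the container.
sub calc_cpu_usage {
    my ($old, $new, $cpucount, $cpus) = @_;

    # wall-clock time between both samples, scaled by the host CPU count
    my $delta_ms = ($new->{time} - $old->{time}) * $cpucount * 1000.0;
    return undef if $delta_ms <= 0; # two distinct samples are required

    # CPU time the container consumed between both samples
    my $delta_used_ms = $new->{used} - $old->{used};

    return (($delta_used_ms / $delta_ms) * $cpucount) / $cpus;
}

# Made-up example: the container used 500 ms of CPU time within 1 s on a
# host with 4 CPUs, with 2 CPUs assigned to the container.
my $old = { time => 0.0, used => 0 };
my $new = { time => 1.0, used => 500 };
printf("cpu usage: %.2f\n", calc_cpu_usage($old, $new, 4, 2)); # prints "cpu usage: 0.25"
```

The host CPU count cancels out in the final expression, so the result is the
consumed CPU time divided by the elapsed time and the number of assigned CPUs.
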