From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from firstgate.proxmox.com (firstgate.proxmox.com [IPv6:2a01:7e0:0:424::9]) by lore.proxmox.com (Postfix) with ESMTPS id 7FF821FF15D for ; Thu, 3 Oct 2024 11:41:02 +0200 (CEST) Received: from firstgate.proxmox.com (localhost [127.0.0.1]) by firstgate.proxmox.com (Proxmox) with ESMTP id D66441D011; Thu, 3 Oct 2024 11:41:21 +0200 (CEST) Message-ID: <88cc9386-c892-4070-9483-da4f187d909b@proxmox.com> Date: Thu, 3 Oct 2024 11:40:47 +0200 MIME-Version: 1.0 User-Agent: Mozilla Thunderbird To: Sascha Westermann , Proxmox VE development discussion References: <20240917055020.10507-1-sascha.westermann@hl-services.de> <69e71cf0-a8c4-42a6-a1a5-36024e903687@proxmox.com> <63c737f2-21cd-4fff-bf86-2369de65f886@hl-services.de> Content-Language: en-US From: Daniel Kral In-Reply-To: <63c737f2-21cd-4fff-bf86-2369de65f886@hl-services.de> X-SPAM-LEVEL: Spam detection results: 0 AWL 0.001 Adjusted score from AWL reputation of From: address BAYES_00 -1.9 Bayes spam probability is 0 to 1% DMARC_MISSING 0.1 Missing DMARC policy KAM_DMARC_STATUS 0.01 Test Rule for DKIM or SPF Failure with Strict Alignment RCVD_IN_VALIDITY_CERTIFIED_BLOCKED 0.001 ADMINISTRATOR NOTICE: The query to Validity was blocked. See https://knowledge.validity.com/hc/en-us/articles/20961730681243 for more information. RCVD_IN_VALIDITY_RPBL_BLOCKED 0.001 ADMINISTRATOR NOTICE: The query to Validity was blocked. See https://knowledge.validity.com/hc/en-us/articles/20961730681243 for more information. RCVD_IN_VALIDITY_SAFE_BLOCKED 0.001 ADMINISTRATOR NOTICE: The query to Validity was blocked. See https://knowledge.validity.com/hc/en-us/articles/20961730681243 for more information. SPF_HELO_NONE 0.001 SPF: HELO does not publish an SPF Record SPF_PASS -0.001 SPF: sender matches SPF record Subject: Re: [pve-devel] [PATCH pve-manager 2/3] Fix #5708: Add CPU raw counters X-BeenThere: pve-devel@lists.proxmox.com X-Mailman-Version: 2.1.29 Precedence: list List-Id: Proxmox VE development discussion List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Reply-To: Proxmox VE development discussion Content-Transfer-Encoding: 7bit Content-Type: text/plain; charset="us-ascii"; Format="flowed" Errors-To: pve-devel-bounces@lists.proxmox.com Sender: "pve-devel" On 9/30/24 08:17, Sascha Westermann wrote: > If the data is also to be processed by external metric servers, I think > the integration in `PVE::ProcFSTools::read_proc_stat()` makes sense. The > term `cpustat` would no longer conflict in this case, as the content > would be virtually the same. I would create a new patch series for this, > but when looking at `read_proc_stat` a few questions arose for me: > >> sub read_proc_stat { >> my $res = { user => 0, nice => 0, system => 0, idle => 0 , iowait => 0, irq => 0, softirq => 0, steal => 0, guest => 0, guest_nice => 0, sum => 0}; > > In order to remain consistent with the structure, the same fields per > CPU would also need to be initialized with 0. However, this is only > done when I find an entry of the form cpu, which implicitly gives > me values - so the initialization would not do anything. In this > context, I wonder whether there are any situations where no values can > be set? I would actually say that this is not the case and that the > initialization can be removed. Those fields seem to be initialized to zero here, as they are summed later for `$res->{total}` [0]. If the `/proc/stat` file couldn't be read or for some reason no `cpu` was captured, those would be initialized to `undef`. As far as I'm aware, an addition with a `undef` value would convert it to `0` in numerical contexts, so removing it shouldn't be a problem, but I could also be missing something here. --- At least for me, I think you wouldn't need to initialize the `cpu` fields with zero, but just make sure that any capture group that uses a prefixed `?` could become `undef` [1], so that case needs to be handled, just like `$res->{guest}` and `$res->{guest_nice}` is. > > Side note: `sum` is not used, it probably means `total`, right? As far as I'm aware, I couldn't find a usage for the `sum` property anywhere the `read_proc_stat` is used and could be deleted. > >> if (my $fh = IO::File->new ("/proc/stat", "r")) { >> while (defined (my $line = <$fh>)) { >> if ($line =~ m|^cpu\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)(?:\s+(\d+)\s+(\d+))?|) { >> $res->{user} = $1 - ($9 // 0); >> $res->{nice} = $2 - ($10 // 0); >> $res->{system} = $3; > > `user` contains `guest` and `nice` contains `guest_nice`, so I > understand that the values are subtracted. Wouldn't it be better to use > the original values here? Especially when these are passed out as raw > values via API, it would certainly be helpful if they correspond 1:1 > with the documentation from /proc/stat. I can only point you in the direction of the original author's commit, it seems like it was inspired by other monitoring tools, but I haven't checked the latter two references in the commit message [2]. I get your point, but there might already be many (external) users, which depend on these values as they are calculated here (thinking of InfluxDB/Grafana dashboards, API users, ...). So if it doesn't hurt your use case too badly (and others that would use your new data points too!), I would adopt that for the `cpu` datapoints as well. > > The values are output as a string in the JSON output. Is there anything > against casting them to int? Good catch! It would be great if any numerical value is also treated like that. Be aware that Perl doesn't really have an explicit way to 'cast' variables as types are handled by context. The builtin `int` function only makes sure that we only get the integer part of an expression [3]. So be aware, but if it doesn't break something (or change probable existing expectations for API users), go ahead. > >> my $diff = ($ctime - $last_proc_stat->{ctime}) * $clock_ticks * $cpucount; >> >> if ($diff > 1000) { # don't update too often > > I don't understand the condition. `$ctime - $last_proc_stat->{ctime}` > corresponds to the elapsed time in seconds as a float, `clock_ticks` > would normally be 100. So that would mean that on a system with one CPU > core an update may only take place every 10 seconds, but with 8 cores > every 1.25 seconds? Is that an error? Is it even necessary to suppress > updates if $diff > 0? Unfortunately, as some other things here, these changes predate our git repository and I couldn't tell you exactly why it is that way. As far as I can say, as this is retrieved from the WebGUI's node status page around every second, this seems like it should "buffer" changes to these two data points, so that there are no spikes for the cpu% and wait%. But I don't quite get why it is multiplied with `$clock_ticks` and `$cpu_count` either. If you really want to change this, I would do that in another patch (series) and mark it as an "RFC" (Read for Comments), so it gets a little more discussion if that change conflicts with anyone. --- Hope these remarks clear up some problems. Looking forward to your patches! Cheers, Daniel [0] https://git.proxmox.com/?p=pve-common.git;a=commitdiff;h=5224b31b [1] https://perldoc.perl.org/perlvar#$%3Cdigits%3E-($1,-$2,-...) [2] https://git.proxmox.com/?p=pve-common.git;a=commitdiff;h=c140206b [3] https://perldoc.perl.org/functions/int _______________________________________________ pve-devel mailing list pve-devel@lists.proxmox.com https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel