From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from firstgate.proxmox.com (firstgate.proxmox.com [212.224.123.68]) by lore.proxmox.com (Postfix) with ESMTPS id 35FCC1FF17E for ; Thu, 16 Oct 2025 17:15:35 +0200 (CEST) Received: from firstgate.proxmox.com (localhost [127.0.0.1]) by firstgate.proxmox.com (Proxmox) with ESMTP id 5E195168F4; Thu, 16 Oct 2025 17:15:53 +0200 (CEST) Mime-Version: 1.0 Date: Thu, 16 Oct 2025 17:15:48 +0200 Message-Id: From: "Daniel Kral" To: "Fiona Ebner" , "Proxmox VE development discussion" X-Mailer: aerc 0.20.0 References: <20250930142021.366529-1-d.kral@proxmox.com> <20250930142021.366529-5-d.kral@proxmox.com> In-Reply-To: X-Bm-Milter-Handled: 55990f41-d878-4baa-be0a-ee34c49e34d2 X-Bm-Transport-Timestamp: 1760627746108 X-SPAM-LEVEL: Spam detection results: 0 AWL 0.014 Adjusted score from AWL reputation of From: address BAYES_00 -1.9 Bayes spam probability is 0 to 1% DMARC_MISSING 0.1 Missing DMARC policy KAM_DMARC_STATUS 0.01 Test Rule for DKIM or SPF Failure with Strict Alignment RCVD_IN_VALIDITY_CERTIFIED_BLOCKED 0.001 ADMINISTRATOR NOTICE: The query to Validity was blocked. See https://knowledge.validity.com/hc/en-us/articles/20961730681243 for more information. RCVD_IN_VALIDITY_RPBL_BLOCKED 0.001 ADMINISTRATOR NOTICE: The query to Validity was blocked. See https://knowledge.validity.com/hc/en-us/articles/20961730681243 for more information. RCVD_IN_VALIDITY_SAFE_BLOCKED 0.001 ADMINISTRATOR NOTICE: The query to Validity was blocked. See https://knowledge.validity.com/hc/en-us/articles/20961730681243 for more information. SPF_HELO_NONE 0.001 SPF: HELO does not publish an SPF Record SPF_PASS -0.001 SPF: sender matches SPF record URIBL_BLOCKED 0.001 ADMINISTRATOR NOTICE: The query to URIBL was blocked. See http://wiki.apache.org/spamassassin/DnsBlocklists#dnsbl-block for more information. [env.pm, resources.pm, pve2.pm, manager.pm] Subject: Re: [pve-devel] [PATCH ha-manager 1/9] implement static service stats cache X-BeenThere: pve-devel@lists.proxmox.com X-Mailman-Version: 2.1.29 Precedence: list List-Id: Proxmox VE development discussion List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Reply-To: Proxmox VE development discussion Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Errors-To: pve-devel-bounces@lists.proxmox.com Sender: "pve-devel" On Thu Oct 16, 2025 at 1:12 PM CEST, Fiona Ebner wrote: > Am 30.09.25 um 4:21 PM schrieb Daniel Kral: >> As the HA Manager builds the static load scheduler, it queries the >> services' static usage by reading and parsing the static guest configs >> individually, which can take significant time with respect to how many >> times recompute_online_node_usage(...) is called. >> >> PVE::Cluster exposes an efficient interface to gather a set of >> properties from one or all guest configs [0]. This is used here to build >> a rather short-lived cache on every (re)initialization of the static >> load scheduler. >> >> The downside to this approach is if there are way more non-HA managed >> guests in the cluster than HA-managed guests, which causes this to load >> much more information than necessary. It also doesn't cache the default >> values for the environment-specific static service usage, which causes >> quite a bottleneck as well. >> >> [0] pve-cluster cf1b19d (add get_guest_config_property IPCC method) >> >> Suggested-by: Fiona Ebner >> Signed-off-by: Daniel Kral >> --- >> If the above mentioned downside is too large for some setups, we could >> also just add a switch to enable/disable the static stats cache or >> automatically figure out if it brings any benefits.. But I think this >> will be good enough, especially with the latter patches making way fewer >> calls to get_service_usage(...). > > Parsing the configs is much more costly than > PVE::Cluster::get_guest_config_properties(), so improving the situation > for the majority of use cases of the static scheduler still makes sense, > even if it hurts performance for some more exotic setups. And we would > like to consider usage of non-HA guests in the future too to make the > static scheduler more accurate even in such exotic setups. But with > PSI-based information that is already included in the node usage, no > extra preparation necessary there. I fully agree, that's a rather odd case and in hindsight I doubt that this has any significant performance improvements for the most common setups. Not that important, but I'd move this patch after the granular changes are implemented since only there this patch makes a performance improvement while at this point in time and with this implementation it actually degrades performance. > >> >> src/PVE/HA/Env.pm | 12 ++++++++++++ >> src/PVE/HA/Env/PVE2.pm | 21 +++++++++++++++++++++ >> src/PVE/HA/Manager.pm | 1 + > > Needs a rebase because of changes in Manager.pm > >> @@ -497,6 +499,25 @@ sub get_datacenter_settings { >> }; >> } >> >> +sub get_static_service_stats { >> + my ($self, $id) = @_; >> + >> + # undef if update_static_service_stats(...) failed before >> + return undef if !defined($self->{static_service_stats}); >> + >> + return $self->{static_service_stats}->{$id} // {}; > > Can't returning '{}' when nothing is there lead to issues down the line? > If we return undef instead, it's consistent with not having anything > cached and the caller will fall back to loading the config. This return value type definitely needs improvement and/or better documentation, but an undef $self->{static_service_stats}->{$id} value indicates that it should fallback to the default value as none of the properties requested by get_guest_config_properties(...) was included in that particular guest config, e.g. no 'cores', 'sockets', and 'memory' properties defined in a VM config. When $self->{static_service_stats} itself is undef, then the static cache couldn't be queried for some reason. > >> @@ -477,6 +467,8 @@ sub new { >> >> $self->{service_config} = $self->read_service_config(); >> >> + $self->{static_service_stats} = undef; >> + >> return $self; >> } >> >> @@ -943,6 +935,25 @@ sub watchdog_update { >> return &$modify_watchog($self, $code); >> } >> >> +sub get_static_service_stats { >> + my ($self, $id) = @_; >> + >> + # undef if update_static_service_stats(...) failed before >> + return undef if !defined($self->{static_service_stats}); >> + >> + return $self->{static_service_stats}->{$id} // {}; > > Same here. > >> +} >> + >> +sub update_static_service_stats { >> + my ($self) = @_; >> + >> + my $filename = "$self->{statusdir}/static_service_stats"; >> + my $stats = eval { PVE::HA::Tools::read_json_from_file($filename) }; >> + $self->log('warning', "unable to update static service stats cache - $@") if $@; >> + >> + $self->{static_service_stats} = $stats; >> +} >> + >> sub get_static_node_stats { >> my ($self) = @_; >> >> diff --git a/src/PVE/HA/Sim/Resources.pm b/src/PVE/HA/Sim/Resources.pm >> index 72623ee1..7641b1a9 100644 >> --- a/src/PVE/HA/Sim/Resources.pm >> +++ b/src/PVE/HA/Sim/Resources.pm >> @@ -143,8 +143,8 @@ sub get_static_stats { >> my $sid = $class->type() . ":$id"; >> my $hardware = $haenv->hardware(); >> >> - my $stats = $hardware->read_static_service_stats(); >> - if (my $service_stats = $stats->{$sid}) { >> + my $service_stats = $hardware->get_static_service_stats($sid); >> + if (%$service_stats) { > > Style nit: with the above change, this could be just $service_stats or a > definedness check for it. If really checking if there are any keys, I > prefer being explicit with scalar(keys ...) ACK will do that, don't know why I didn't see that already :) > >> return $service_stats; >> } elsif ($id =~ /^(\d)(\d\d)/) { >> # auto assign usage calculated from ID for convenience _______________________________________________ pve-devel mailing list pve-devel@lists.proxmox.com https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel