From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from firstgate.proxmox.com (firstgate.proxmox.com [212.224.123.68])
    (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
    key-exchange X25519 server-signature RSA-PSS (2048 bits))
    (No client certificate requested)
    by lists.proxmox.com (Postfix) with ESMTPS id D513E92FE
    for ; Thu, 17 Nov 2022 15:00:53 +0100 (CET)
Received: from firstgate.proxmox.com (localhost [127.0.0.1])
    by firstgate.proxmox.com (Proxmox) with ESMTP id 9D53B2D667
    for ; Thu, 17 Nov 2022 15:00:27 +0100 (CET)
Received: from proxmox-new.maurer-it.com (proxmox-new.maurer-it.com [94.136.29.106])
    (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
    key-exchange X25519 server-signature RSA-PSS (2048 bits))
    (No client certificate requested)
    by firstgate.proxmox.com (Proxmox) with ESMTPS
    for ; Thu, 17 Nov 2022 15:00:23 +0100 (CET)
Received: from proxmox-new.maurer-it.com (localhost.localdomain [127.0.0.1])
    by proxmox-new.maurer-it.com (Proxmox) with ESMTP id 7AC6C44D99
    for ; Thu, 17 Nov 2022 15:00:23 +0100 (CET)
From: Fiona Ebner
To: pve-devel@lists.proxmox.com
Date: Thu, 17 Nov 2022 15:00:11 +0100
Message-Id: <20221117140018.105004-11-f.ebner@proxmox.com>
X-Mailer: git-send-email 2.30.2
In-Reply-To: <20221117140018.105004-1-f.ebner@proxmox.com>
References: <20221117140018.105004-1-f.ebner@proxmox.com>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-SPAM-LEVEL: Spam detection results: 0
    AWL 0.027 Adjusted score from AWL reputation of From: address
    BAYES_00 -1.9 Bayes spam probability is 0 to 1%
    KAM_DMARC_STATUS 0.01 Test Rule for DKIM or SPF Failure with Strict Alignment
    SPF_HELO_NONE 0.001 SPF: HELO does not publish an SPF Record
    SPF_PASS -0.001 SPF: sender matches SPF record
Subject: [pve-devel] [PATCH v2 ha-manager 10/15] manager: use static resource scheduler when configured
X-BeenThere: pve-devel@lists.proxmox.com
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Proxmox VE development discussion
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Thu, 17 Nov 2022 14:00:53 -0000

Note that recompute_online_node_usage() becomes much slower when the
'static' resource scheduler mode is used. Tested it with ~300 HA
services (minimal containers) running on my virtual test cluster.

Timings with 'basic' mode were between 0.0004 - 0.001 seconds
Timings with 'static' mode were between 0.007 - 0.012 seconds

Combined with the fact that recompute_online_node_usage() is currently
called very often, this can lead to a lot of delay during recovery
situations with hundreds of services and low thousands of services
overall, and with generous estimates even run into the watchdog timer.

Ideas to remedy this are using PVE::Cluster's
get_guest_config_properties() instead of load_config() and/or
optimizing how often recompute_online_node_usage() is called.

Signed-off-by: Fiona Ebner
---

Changes from v1:
    * Add fixme note about overhead.
    * Add benchmark results to commit message.

 src/PVE/HA/Manager.pm | 26 ++++++++++++++++++++++++--
 1 file changed, 24 insertions(+), 2 deletions(-)

diff --git a/src/PVE/HA/Manager.pm b/src/PVE/HA/Manager.pm
index 1638442..7f1d1d7 100644
--- a/src/PVE/HA/Manager.pm
+++ b/src/PVE/HA/Manager.pm
@@ -8,6 +8,7 @@ use PVE::Tools;
 use PVE::HA::Tools ':exit_codes';
 use PVE::HA::NodeStatus;
 use PVE::HA::Usage::Basic;
+use PVE::HA::Usage::Static;
 
 ## Variable Name & Abbreviations Convention
 #
@@ -203,14 +204,35 @@ my $valid_service_states = {
     error => 1,
 };
 
+# FIXME with 'static' mode and thousands of services, the overhead can be noticeable and the fact
+# that this function is called for each state change and upon recovery doesn't help.
 sub recompute_online_node_usage {
     my ($self) = @_;
 
-    my $online_node_usage = PVE::HA::Usage::Basic->new($self->{haenv});
+    my $haenv = $self->{haenv};
 
     my $online_nodes = $self->{ns}->list_online_nodes();
 
-    $online_node_usage->add_node($_) for $online_nodes->@*;
+    my $online_node_usage;
+
+    if (my $mode = $self->{'scheduler-mode'}) {
+        if ($mode eq 'static') {
+            $online_node_usage = eval {
+                my $scheduler = PVE::HA::Usage::Static->new($haenv);
+                $scheduler->add_node($_) for $online_nodes->@*;
+                return $scheduler;
+            };
+            $haenv->log('warning', "using 'basic' scheduler mode, init for 'static' failed - $@")
+                if $@;
+        } elsif ($mode ne 'basic') {
+            $haenv->log('warning', "got unknown scheduler mode '$mode', using 'basic'");
+        }
+    }
+
+    if (!$online_node_usage) {
+        $online_node_usage = PVE::HA::Usage::Basic->new($haenv);
+        $online_node_usage->add_node($_) for $online_nodes->@*;
+    }
 
     foreach my $sid (keys %{$self->{ss}}) {
         my $sd = $self->{ss}->{$sid};
-- 
2.30.2
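
The mode selection with fallback to 'basic' in recompute_online_node_usage() can be tried out in isolation. The following is only a minimal sketch under stated assumptions: Usage::Basic, Usage::Static, the $fail knob and select_usage() are hypothetical stand-ins invented for illustration, not the PVE::HA::Usage classes or any ha-manager API, and warnings are collected in an array instead of going through $haenv->log(). It needs Perl 5.24 or newer for the postfix dereference syntax.

```perl
use strict;
use warnings;

# Hypothetical stand-in for PVE::HA::Usage::Basic (assumption, not the real class).
package Usage::Basic {
    sub new { my ($class) = @_; return bless { nodes => {} }, $class; }
    sub add_node { my ($self, $node) = @_; $self->{nodes}->{$node} = 0; }
}

# Hypothetical stand-in for PVE::HA::Usage::Static (assumption, not the real class).
package Usage::Static {
    our $fail = 0; # illustration knob: simulate a failing 'static' initialization
    sub new {
        my ($class) = @_;
        die "static init failed\n" if $fail;
        return bless { nodes => {} }, $class;
    }
    sub add_node { my ($self, $node) = @_; $self->{nodes}->{$node} = 0; }
}

package main;

# Returns the usage object for the requested mode together with a list of
# warnings; any problem with the requested mode falls back to 'basic'.
sub select_usage {
    my ($mode, $online_nodes) = @_;

    my @warnings;
    my $usage;

    if ($mode) {
        if ($mode eq 'static') {
            $usage = eval {
                my $scheduler = Usage::Static->new();
                $scheduler->add_node($_) for $online_nodes->@*;
                return $scheduler; # leaves only the eval block, not select_usage()
            };
            push @warnings, "using 'basic' scheduler mode, init for 'static' failed - $@"
                if $@;
        } elsif ($mode ne 'basic') {
            push @warnings, "got unknown scheduler mode '$mode', using 'basic'";
        }
    }

    # Either no mode was configured, 'basic' was requested, or 'static' failed.
    if (!$usage) {
        $usage = Usage::Basic->new();
        $usage->add_node($_) for $online_nodes->@*;
    }

    return ($usage, \@warnings);
}
```

Calling select_usage('static', ['node1', 'node2']) yields a Usage::Static object in this sketch. Setting $Usage::Static::fail = 1 beforehand makes the same call return a Usage::Basic object plus one warning, which mirrors the fallback path in the patch.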