From: Daniel Kral
To: pve-devel@lists.proxmox.com
Date: Mon, 20 Oct 2025 18:45:37 +0200
Message-ID: <20251020164540.517231-12-d.kral@proxmox.com>
In-Reply-To: <20251020164540.517231-1-d.kral@proxmox.com>
References: <20251020164540.517231-1-d.kral@proxmox.com>
Subject: [pve-devel] [PATCH ha-manager v2 7/8] manager: make online node usage computation granular
List-Id: Proxmox VE development discussion

The HA Manager builds $online_node_usage in every FSM iteration in
manage(...) and at every HA resource state change in
change_service_state(...). This becomes quite costly with a high HA
resource count and a lot of state changes happening at once, e.g.
starting up multiple nodes with rebalance_on_request_start set or a
failover of a node with many configured HA resources.

To improve this situation, make the changes to the $online_node_usage
more granular by building $online_node_usage only once per call to
manage(...) and changing the nodes a HA resource uses individually on
every HA resource state transition.

This allows the HA Manager to handle many more HA resources with the
static load scheduler.

Signed-off-by: Daniel Kral
---
changes since v1:
    - remove FIXME
    - remove argument about cache from patch message
    - use add_service_usage(...) helper from $online_node_usage now
    - did not add R-b from Fiona as add_service_usage(...) was moved

 src/PVE/HA/Manager.pm | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/src/PVE/HA/Manager.pm b/src/PVE/HA/Manager.pm
index bf6895ad..3bd6e1a6 100644
--- a/src/PVE/HA/Manager.pm
+++ b/src/PVE/HA/Manager.pm
@@ -238,8 +238,6 @@ my $valid_service_states = {
     error => 1,
 };
 
-# FIXME with 'static' mode and thousands of services, the overhead can be noticable and the fact
-# that this function is called for each state change and upon recovery doesn't help.
 sub recompute_online_node_usage {
     my ($self) = @_;
 
@@ -317,7 +315,9 @@ my $change_service_state = sub {
         $sd->{$k} = $v;
     }
 
-    $self->recompute_online_node_usage();
+    $self->{online_node_usage}->remove_service_usage($sid);
+    $self->{online_node_usage}
+        ->add_service_usage($sid, $sd->{state}, $sd->{node}, $sd->{target});
 
     $sd->{uid} = compute_new_uuid($new_state);
 
@@ -709,6 +709,8 @@ sub manage {
         delete $ss->{$sid};
     }
 
+    $self->recompute_online_node_usage();
+
     my $new_rules = $haenv->read_rules_config();
 
     # TODO PVE 10: Remove group migration when HA groups have been fully migrated to rules
@@ -738,8 +740,6 @@ sub manage {
     for (;;) {
         my $repeat = 0;
 
-        $self->recompute_online_node_usage();
-
         foreach my $sid (sort keys %$ss) {
             my $sd = $ss->{$sid};
             my $cd = $sc->{$sid} || { state => 'disabled' };
-- 
2.47.3