all lists on lists.proxmox.com
 help / color / mirror / Atom feed
From: Daniel Kral <d.kral@proxmox.com>
To: pve-devel@lists.proxmox.com
Subject: [pve-devel] [PATCH ha-manager v2 7/8] manager: make online node usage computation granular
Date: Mon, 20 Oct 2025 18:45:37 +0200	[thread overview]
Message-ID: <20251020164540.517231-12-d.kral@proxmox.com> (raw)
In-Reply-To: <20251020164540.517231-1-d.kral@proxmox.com>

The HA Manager builds $online_node_usage in every FSM iteration in
manage(...) and at every HA resource state change in
change_service_state(...). This becomes quite costly with a high HA
resource count and a lot of state changes happening at once, e.g.
starting up multiple nodes with rebalance_on_request_start set or a
failover of a node with many configured HA resources.

To improve this situation, make the changes to the $online_node_usage
more granular by building $online_node_usage only once per call to
manage(...) and changing the nodes a HA resource uses individually on
every HA resource state transition. This allows the HA Manager to handle
many more HA resources with the static load scheduler.

Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
changes since v1:
 - remove FIXME
 - remove argument about cache from patch message
 - use add_service_usage(...) helper from $online_node_usage now
 - did not add R-b from Fiona as add_service_usage(...) was moved

 src/PVE/HA/Manager.pm | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/src/PVE/HA/Manager.pm b/src/PVE/HA/Manager.pm
index bf6895ad..3bd6e1a6 100644
--- a/src/PVE/HA/Manager.pm
+++ b/src/PVE/HA/Manager.pm
@@ -238,8 +238,6 @@ my $valid_service_states = {
     error => 1,
 };
 
-# FIXME with 'static' mode and thousands of services, the overhead can be noticable and the fact
-# that this function is called for each state change and upon recovery doesn't help.
 sub recompute_online_node_usage {
     my ($self) = @_;
 
@@ -317,7 +315,9 @@ my $change_service_state = sub {
         $sd->{$k} = $v;
     }
 
-    $self->recompute_online_node_usage();
+    $self->{online_node_usage}->remove_service_usage($sid);
+    $self->{online_node_usage}
+        ->add_service_usage($sid, $sd->{state}, $sd->{node}, $sd->{target});
 
     $sd->{uid} = compute_new_uuid($new_state);
 
@@ -709,6 +709,8 @@ sub manage {
         delete $ss->{$sid};
     }
 
+    $self->recompute_online_node_usage();
+
     my $new_rules = $haenv->read_rules_config();
 
     # TODO PVE 10: Remove group migration when HA groups have been fully migrated to rules
@@ -738,8 +740,6 @@ sub manage {
     for (;;) {
         my $repeat = 0;
 
-        $self->recompute_online_node_usage();
-
         foreach my $sid (sort keys %$ss) {
             my $sd = $ss->{$sid};
             my $cd = $sc->{$sid} || { state => 'disabled' };
-- 
2.47.3



_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel


  parent reply	other threads:[~2025-10-20 16:45 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-10-20 16:45 [pve-devel] [PATCH ha-manager/perl-rs/proxmox/qemu-server v2 00/12] Granular online_node_usage accounting Daniel Kral
2025-10-20 16:45 ` [pve-devel] [PATCH qemu-server v2 1/1] config: only fetch necessary default values in get_derived_property helper Daniel Kral
2025-10-21 11:47   ` [pve-devel] applied: " Fiona Ebner
2025-10-20 16:45 ` [pve-devel] [PATCH proxmox v2 1/1] resource-scheduling: change score_nodes_to_start_service signature Daniel Kral
2025-10-21 12:14   ` Fiona Ebner
2025-10-20 16:45 ` [pve-devel] [PATCH perl-rs v2 1/2] pve-rs: resource_scheduling: allow granular usage changes Daniel Kral
2025-10-20 16:45 ` [pve-devel] [PATCH perl-rs v2 2/2] test: resource_scheduling: use score_nodes helper to imitate HA Manager Daniel Kral
2025-10-21 12:14   ` Fiona Ebner
2025-10-20 16:45 ` [pve-devel] [PATCH ha-manager v2 1/8] manager: remove redundant recompute_online_node_usage from next_state_recovery Daniel Kral
2025-10-20 16:45 ` [pve-devel] [PATCH ha-manager v2 2/8] manager: remove redundant add_service_usage_to_node " Daniel Kral
2025-10-20 16:45 ` [pve-devel] [PATCH ha-manager v2 3/8] manager: remove redundant add_service_usage_to_node from next_state_started Daniel Kral
2025-10-20 16:45 ` [pve-devel] [PATCH ha-manager v2 4/8] rules: resource affinity: decouple get_resource_affinity helper from Usage class Daniel Kral
2025-10-21 13:02   ` Fiona Ebner
2025-10-23  8:20     ` Daniel Kral
2025-10-20 16:45 ` [pve-devel] [PATCH ha-manager v2 5/8] manager: make recompute_online_node_usage use add_service_usage helper Daniel Kral
2025-10-21 13:06   ` Fiona Ebner
2025-10-20 16:45 ` [pve-devel] [PATCH ha-manager v2 6/8] usage: allow granular changes to Usage implementations Daniel Kral
2025-10-20 16:45 ` Daniel Kral [this message]
2025-10-21 13:09   ` [pve-devel] [PATCH ha-manager v2 7/8] manager: make online node usage computation granular Fiona Ebner
2025-10-20 16:45 ` [pve-devel] [PATCH ha-manager v2 8/8] implement static service stats cache Daniel Kral
2025-10-21 13:23   ` Fiona Ebner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20251020164540.517231-12-d.kral@proxmox.com \
    --to=d.kral@proxmox.com \
    --cc=pve-devel@lists.proxmox.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.
Service provided by Proxmox Server Solutions GmbH | Privacy | Legal