public inbox for pve-devel@lists.proxmox.com
From: Daniel Kral <d.kral@proxmox.com>
To: pve-devel@lists.proxmox.com
Subject: [pve-devel] [PATCH ha-manager 9/9] manager: make service node usage computation more granular
Date: Tue, 30 Sep 2025 16:19:19 +0200
Message-ID: <20250930142021.366529-13-d.kral@proxmox.com>
In-Reply-To: <20250930142021.366529-1-d.kral@proxmox.com>

Currently, $online_node_usage is rebuilt on every call to manage(...),
but it only needs to be rebuilt when the scheduler mode changes (which
includes initialization and, for completeness, the error path).

This allows recompute_online_node_usage(...) to be reduced to adding
nodes that came online and removing nodes that are no longer online,
while service usage is updated whenever a service changes. Therefore,
recompute_online_node_usage(...) only needs to be called once in
manage(...), after $ns has been properly updated.
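
As a rough illustration of the add/remove logic described above, here is
a minimal, self-contained sketch. SketchUsage and sync_nodes are
hypothetical stand-ins for a PVE::HA::Usage implementation and for
recompute_online_node_usage(...); only the node bookkeeping is modelled,
and the real classes may behave differently.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a PVE::HA::Usage implementation; only the
# node bookkeeping is modelled here.
package SketchUsage;

sub new {
    my ($class) = @_;
    return bless({ nodes => {} }, $class);
}

sub add_node {
    my ($self, $node) = @_;
    $self->{nodes}->{$node} = 1;
    return;
}

sub remove_node {
    my ($self, $node) = @_;
    delete $self->{nodes}->{$node};
    return;
}

sub contains_node {
    my ($self, $node) = @_;
    return exists $self->{nodes}->{$node};
}

sub list_nodes {
    my ($self) = @_;
    return sort keys %{ $self->{nodes} };
}

package main;

# Bring $usage in line with the online nodes without rebuilding it.
sub sync_nodes {
    my ($usage, $online_nodes) = @_;

    my %online = map { $_ => 1 } @$online_nodes;

    # drop nodes that are no longer online
    for my $node ($usage->list_nodes()) {
        $usage->remove_node($node) if !$online{$node};
    }

    # add nodes that came online since the last call
    for my $node (@$online_nodes) {
        $usage->add_node($node) if !$usage->contains_node($node);
    }

    return;
}

my $usage = SketchUsage->new();
$usage->add_node($_) for qw(node1 node2);

# node1 went offline, node3 came online
sync_nodes($usage, [qw(node2 node3)]);

print join(',', $usage->list_nodes()), "\n"; # node2,node3
```

Entries for nodes that stay online are left untouched, which is why the
accumulated usage does not have to be recomputed on every call.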

Note that, with this change, the ha-manager no longer picks up hotplug
changes to guest configs as long as the HA resource state doesn't
change.

Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
If we go for this patch, we would need some mechanism to update the
static usage for a single HA resource or for all HA resources registered
in $online_node_usage at once (or just rebuild $online_node_usage at
that point).
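
One possible shape for such a mechanism, sketched under assumptions: a
single resource is refreshed by removing its usage and re-adding it with
freshly read stats. SketchUsage, refresh_service_usage, node_memory and
the stats layout (a maxmem value per service) are hypothetical and only
loosely mirror the remove_service_usage/add_service_usage names used in
this series; the real signatures may differ.

```perl
use strict;
use warnings;

# Hypothetical stand-in: tracks, per node, the memory claimed by each
# service. This is not the real PVE::HA::Usage::Static interface.
package SketchUsage;

sub new {
    my ($class) = @_;
    return bless({ nodes => {} }, $class);
}

sub add_service_usage {
    my ($self, $node, $sid, $maxmem) = @_;
    $self->{nodes}->{$node}->{$sid} = $maxmem;
    return;
}

sub remove_service_usage {
    my ($self, $sid) = @_;
    delete $_->{$sid} for values %{ $self->{nodes} };
    return;
}

sub node_memory {
    my ($self, $node) = @_;
    my $sum = 0;
    $sum += $_ for values %{ $self->{nodes}->{$node} // {} };
    return $sum;
}

package main;

# Re-register a single service with freshly read stats (assumed layout).
sub refresh_service_usage {
    my ($usage, $sid, $sd, $stats) = @_;

    $usage->remove_service_usage($sid);
    $usage->add_service_usage($sd->{node}, $sid, $stats->{$sid}->{maxmem});

    return;
}

my $usage = SketchUsage->new();
my $ss = {
    'vm:100' => { node => 'node1' },
    'vm:101' => { node => 'node1' },
};
my $stats = {
    'vm:100' => { maxmem => 2048 },
    'vm:101' => { maxmem => 1024 },
};
$usage->add_service_usage($ss->{$_}->{node}, $_, $stats->{$_}->{maxmem})
    for keys %$ss;

# vm:100 gets memory hotplugged; only its entry is refreshed
$stats->{'vm:100'}->{maxmem} = 4096;
refresh_service_usage($usage, 'vm:100', $ss->{'vm:100'}, $stats);

print $usage->node_memory('node1'), "\n"; # 5120
```

Refreshing all resources would then be a loop over the service status
hash, at which point rebuilding $online_node_usage may be just as cheap.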

 src/PVE/HA/Manager.pm | 90 +++++++++++++++++++++++--------------------
 1 file changed, 49 insertions(+), 41 deletions(-)

diff --git a/src/PVE/HA/Manager.pm b/src/PVE/HA/Manager.pm
index 253deba9..6fadb3f3 100644
--- a/src/PVE/HA/Manager.pm
+++ b/src/PVE/HA/Manager.pm
@@ -106,6 +106,7 @@ sub update_crs_scheduler_mode {
     if (!defined($old_mode)) {
         $haenv->log('info', "using scheduler mode '$new_mode'") if $new_mode ne 'basic';
     } elsif ($new_mode eq $old_mode) {
+        $haenv->update_static_service_stats() if $old_mode eq 'static';
         return; # nothing to do
     } else {
         $haenv->log('info', "switching scheduler mode from '$old_mode' to '$new_mode'");
@@ -113,6 +114,39 @@ sub update_crs_scheduler_mode {
 
     $self->{crs}->{scheduler} = $new_mode;
 
+    my $online_node_usage;
+
+    if ($new_mode eq 'static') {
+        $online_node_usage = eval {
+            my $scheduler = PVE::HA::Usage::Static->new($haenv);
+            $scheduler->add_node($_) for $self->{ns}->list_online_nodes()->@*;
+            $haenv->update_static_service_stats();
+            return $scheduler;
+        };
+        if ($@) {
+            $self->{crs}->{scheduler} = 'basic'; # retry on next update
+            $haenv->log(
+                'warning',
+                "fallback to 'basic' scheduler mode, init for 'static' failed - $@",
+            );
+        }
+    } elsif ($new_mode eq 'basic') {
+        # handled below in the general fall-back case
+    } else {
+        $haenv->log('warning', "got unknown scheduler mode '$new_mode', using 'basic'");
+    }
+
+    # fallback to the basic algorithm in any case
+    if (!$online_node_usage) {
+        $online_node_usage = PVE::HA::Usage::Basic->new($haenv);
+        $online_node_usage->add_node($_) for $self->{ns}->list_online_nodes()->@*;
+    }
+
+    $self->{online_node_usage} = $online_node_usage;
+
+    # initialize with current nodes and services states
+    $self->add_service_usage($_, $self->{ss}->{$_}) for keys $self->{ss}->%*;
+
     return;
 }
 
@@ -253,49 +287,19 @@ my $valid_service_states = {
 sub recompute_online_node_usage {
     my ($self) = @_;
 
-    my $haenv = $self->{haenv};
+    my ($haenv, $ns) = $self->@{qw(haenv ns)};
 
-    my $online_nodes = { map { $_ => 1 } $self->{ns}->list_online_nodes()->@* };
+    for my $node ($self->{online_node_usage}->list_nodes()) {
+        next if $ns->node_is_online($node);
 
-    my $online_node_usage;
-
-    if (my $mode = $self->{crs}->{scheduler}) {
-        if ($mode eq 'static') {
-            $online_node_usage = eval {
-                my $scheduler = PVE::HA::Usage::Static->new($haenv);
-                $scheduler->add_node($_) for keys $online_nodes->%*;
-                $haenv->update_static_service_stats();
-                return $scheduler;
-            };
-            $haenv->log(
-                'warning',
-                "fallback to 'basic' scheduler mode, init for 'static' failed - $@",
-            ) if $@;
-        } elsif ($mode eq 'basic') {
-            # handled below in the general fall-back case
-        } else {
-            $haenv->log('warning', "got unknown scheduler mode '$mode', using 'basic'");
-        }
+        $self->{online_node_usage}->remove_node($node);
     }
 
-    # fallback to the basic algorithm in any case
-    if (!$online_node_usage) {
-        $online_node_usage = PVE::HA::Usage::Basic->new($haenv);
-        $online_node_usage->add_node($_) for keys $online_nodes->%*;
+    for my $node ($ns->list_online_nodes()->@*) {
+        next if $self->{online_node_usage}->contains_node($node);
+
+        $self->{online_node_usage}->add_node($node);
     }
-
-    for my $sid (sort keys $self->{ss}->%*) {
-        my $sd = $self->{ss}->{$sid};
-        my $used_nodes = PVE::HA::Tools::get_used_service_nodes($sd, $online_nodes);
-        my ($current, $target) = $used_nodes->@{qw(current target)};
-
-        $online_node_usage->add_service_usage_to_node($current, $sid, $sd->{node}, $sd->{target})
-            if $current;
-        $online_node_usage->add_service_usage_to_node($target, $sid, $sd->{node}, $sd->{target})
-            if $target;
-    }
-
-    $self->{online_node_usage} = $online_node_usage;
 }
 
 my $change_service_state = sub {
@@ -693,6 +697,8 @@ sub manage {
 
     $self->{groups} = $haenv->read_group_config(); # update
 
+    $self->recompute_online_node_usage();
+
     # compute new service status
 
     # add new service
@@ -704,11 +710,13 @@ sub manage {
         $haenv->log('info', "adding new service '$sid' on node '$cd->{node}'");
         # assume we are running to avoid relocate running service at add
         my $state = ($cd->{state} eq 'started') ? 'request_start' : 'request_stop';
-        $ss->{$sid} = {
+        my $sd = $ss->{$sid} = {
             state => $state,
             node => $cd->{node},
             uid => compute_new_uuid('started'),
         };
+
+        $self->add_service_usage($sid, $sd);
     }
 
     # remove stale or ignored services from manager state
@@ -718,12 +726,12 @@ sub manage {
         my $reason = defined($sc->{$sid}) ? 'ignored state requested' : 'no config';
         $haenv->log('info', "removing stale service '$sid' ($reason)");
 
+        $self->{online_node_usage}->remove_service_usage($sid);
+
         # remove all service related state information
         delete $ss->{$sid};
     }
 
-    $self->recompute_online_node_usage();
-
     my $new_rules = $haenv->read_rules_config();
 
     # TODO PVE 10: Remove group migration when HA groups have been fully migrated to rules
-- 
2.47.3



_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel