From: Daniel Kral <d.kral@proxmox.com>
To: pve-devel@lists.proxmox.com
Subject: [PATCH ha-manager 2/2] make idle LRMs resolve leftover moving HA resources while disarmed
Date: Tue, 19 May 2026 16:38:36 +0200
Message-ID: <20260519143842.382324-3-d.kral@proxmox.com>
In-Reply-To: <20260519143842.382324-1-d.kral@proxmox.com>

If HA resources are in transient states that defer the disarming
process while their LRMs are already idle and in disarm mode, these
LRMs never resolve the transient states, even though the HA Manager
assumes that they will.

For HA resources that are still moving, this leaves the HA Manager
stuck in a loop: it keeps deferring the disarming process to wait for
an LRM response for these moving HA resources, which never arrives
because the LRM is idle.

Therefore, allow the LRM to become active in disarm mode if any HA
resource on the LRM's node is in one of these transient states, and
make sure that the LRM only processes the disarm-deferring HA
resources while it is active in disarm mode.

Signed-off-by: Daniel Kral <d.kral@proxmox.com>
---
 src/PVE/HA/LRM.pm                         | 19 ++++++++++-
 src/PVE/HA/Manager.pm                     |  8 ++---
 src/PVE/HA/Tools.pm                       | 17 ++++++++++
 src/test/test-disarm-idle-lrm1/log.expect | 37 ++++++---------------
 src/test/test-disarm-idle-lrm2/log.expect | 39 +++++++----------------
 5 files changed, 58 insertions(+), 62 deletions(-)
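
As an illustration only (not part of the patch), the following minimal
sketch shows how the per-node filter decides whether a disarmed LRM may
stay idle. The helper is re-typed inline so the sketch runs without the
ha-manager sources; the service status hash and lrm_stays_idle() are
hypothetical stand-ins, and the real check in work() additionally depends
on the fence request, the active service count and quorum.

```perl
use strict;
use warnings;

# Inline re-implementation of the helper this patch adds to PVE::HA::Tools
# (same logic, written with a lookup hash instead of grep).
sub get_disarm_deferred_services {
    my ($ss, $node) = @_;

    my $deferred_sids = {};
    my %deferrable = map { $_ => 1 } qw(fence recovery migrate relocate);

    for my $sid (keys %$ss) {
        my $sd = $ss->{$sid};

        # with a node given, only consider services currently on that node
        next if $node && (!$sd->{node} || $sd->{node} ne $node);

        $deferred_sids->{$sid} = 1 if $deferrable{ $sd->{state} // '' };
    }

    return $deferred_sids;
}

# Hypothetical stand-in for the first branch in PVE::HA::LRM::work():
# a disarmed LRM stays idle only if nothing on its node defers the disarm.
sub lrm_stays_idle {
    my ($mode, $ss, $node) = @_;
    return $mode eq 'disarm' && !%{ get_disarm_deferred_services($ss, $node) };
}

# Hypothetical service status, loosely modelled on the test cases.
my $ss = {
    'vm:101' => { state => 'started', node => 'node1' },
    'vm:102' => { state => 'started', node => 'node1' },
    'vm:103' => { state => 'migrate', node => 'node3', target => 'node2' },
};

for my $node (qw(node1 node3)) {
    my $sids = join(', ', sort keys %{ get_disarm_deferred_services($ss, $node) }) || 'none';
    my $idle = lrm_stays_idle('disarm', $ss, $node) ? 'stays idle' : 'may become active';
    print "$node: deferred = $sids, LRM $idle\n";
}
```

With this sample data, node1 has nothing that defers the disarm and stays
idle, while node3 still holds the migrating vm:103 and is allowed to
become active to finish the migration.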

diff --git a/src/PVE/HA/LRM.pm b/src/PVE/HA/LRM.pm
index 426982cc..9100d611 100644
--- a/src/PVE/HA/LRM.pm
+++ b/src/PVE/HA/LRM.pm
@@ -312,6 +312,18 @@ sub active_service_count {
     return PVE::HA::Tools::count_active_services($ss, $nodename);
 }
 
+# returns a truthy value if there are HA resources in transient states, which
+# need to be resolved, e.g. to complete the disarm procedure.
+sub has_disarm_deferred_services {
+    my ($self) = @_;
+
+    my $ss = $self->{service_status};
+    my $nodename = $self->{haenv}->nodename();
+    my $deferred_sids = PVE::HA::Tools::get_disarm_deferred_services($ss, $nodename);
+
+    return %$deferred_sids;
+}
+
 my $wrote_lrm_status_at_startup = 0;
 
 sub do_one_iteration {
@@ -371,7 +383,7 @@ sub work {
 
         my $service_count = $self->active_service_count();
 
-        if ($self->{mode} eq 'disarm') {
+        if ($self->{mode} eq 'disarm' && !$self->has_disarm_deferred_services()) {
             # stay idle while disarmed, don't acquire lock
         } elsif (!$fence_request && $service_count && $haenv->quorate()) {
             if ($self->get_protected_ha_agent_lock()) {
@@ -709,12 +721,17 @@ sub manage_resources {
     my $nodename = $haenv->nodename();
 
     my $ss = $self->{service_status};
+    my $deferred_sids;
+    $deferred_sids = PVE::HA::Tools::get_disarm_deferred_services($ss, $nodename)
+        if $self->{mode} eq 'disarm';
 
     foreach my $sid (keys %{ $self->{restart_tries} }) {
         delete $self->{restart_tries}->{$sid} if !$ss->{$sid};
     }
 
     foreach my $sid (keys %$ss) {
+        next if $deferred_sids && !$deferred_sids->{$sid};
+
         my $sd = $ss->{$sid};
         next if !$sd->{node} || !$sd->{uid};
         next if $sd->{node} ne $nodename;
diff --git a/src/PVE/HA/Manager.pm b/src/PVE/HA/Manager.pm
index 9b901c4f..a2baf349 100644
--- a/src/PVE/HA/Manager.pm
+++ b/src/PVE/HA/Manager.pm
@@ -929,15 +929,13 @@ sub handle_disarm {
     }
 
     # defer disarm if any services are in a transient state that needs the state machine to resolve
-    my $deferred_sids = {};
-    for my $sid (sort keys %$ss) {
+    my $deferred_sids = PVE::HA::Tools::get_disarm_deferred_services($ss);
+    for my $sid (sort keys %$deferred_sids) {
         my $state = $ss->{$sid}->{state};
         if ($state eq 'fence' || $state eq 'recovery') {
             $haenv->log('warning', "deferring disarm - service '$sid' is in '$state' state");
-            $deferred_sids->{$sid} = 1;
-        } elsif ($state eq 'migrate' || $state eq 'relocate') {
+        } else {
             $haenv->log('info', "deferring disarm - service '$sid' is in '$state' state");
-            $deferred_sids->{$sid} = 1;
         }
     }
 
diff --git a/src/PVE/HA/Tools.pm b/src/PVE/HA/Tools.pm
index 26629fb5..37b27e11 100644
--- a/src/PVE/HA/Tools.pm
+++ b/src/PVE/HA/Tools.pm
@@ -213,6 +213,23 @@ sub count_active_services {
     return $active_count;
 }
 
+sub get_disarm_deferred_services {
+    my ($ss, $node) = @_;
+
+    my $deferred_sids = {};
+    my @deferrable_states = qw(fence recovery migrate relocate);
+
+    for my $sid (keys %$ss) {
+        my ($state, $current_node, $target_node) = $ss->{$sid}->@{qw(state node target)};
+
+        next if $node && (!$current_node || $current_node ne $node);
+
+        $deferred_sids->{$sid} = 1 if grep { $state eq $_ } @deferrable_states;
+    }
+
+    return $deferred_sids;
+}
+
 sub get_verbose_service_state {
     my ($service_state, $service_conf) = @_;
 
diff --git a/src/test/test-disarm-idle-lrm1/log.expect b/src/test/test-disarm-idle-lrm1/log.expect
index 1b7f4ece..d46fbebd 100644
--- a/src/test/test-disarm-idle-lrm1/log.expect
+++ b/src/test/test-disarm-idle-lrm1/log.expect
@@ -26,34 +26,15 @@ info     20    node1/crm: recover service 'vm:102' from fenced node 'node2' to n
 info     20    node1/crm: service 'vm:102': state changed from 'recovery' to 'started'  (node = node1)
 info     22    node2/crm: status change wait_for_quorum => slave
 info     24    node3/crm: status change wait_for_quorum => slave
+info     25    node3/lrm: got lock 'ha_agent_node3_lock'
+info     25    node3/lrm: status change wait_for_agent_lock => active
+info     25    node3/lrm: service vm:103 - start migrate to node 'node2'
+info     25    node3/lrm: service vm:103 - end migrate to node 'node2'
 info     40    node1/crm: node 'node2': state changed from 'unknown' => 'online'
 info     40    node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
-info     60    node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
-info     80    node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
-info    100    node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
-info    120    node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
-info    140    node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
-info    160    node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
-info    180    node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
-info    200    node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
-info    220    node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
-info    240    node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
-info    260    node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
-info    280    node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
-info    300    node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
-info    320    node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
-info    340    node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
-info    360    node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
-info    380    node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
-info    400    node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
-info    420    node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
-info    440    node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
-info    460    node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
-info    480    node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
-info    500    node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
-info    520    node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
-info    540    node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
-info    560    node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
-info    580    node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
-info    600    node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info     40    node1/crm: service 'vm:103': state changed from 'migrate' to 'started'  (node = node2)
+info     45    node3/lrm: HA disarm requested, releasing agent lock and watchdog
+info     45    node3/lrm: status change active => wait_for_agent_lock
+info     60    node1/crm: all LRMs disarmed, HA stack is now fully disarmed
+info     60    node1/crm: HA stack fully disarmed, releasing CRM watchdog
 info    620     hardware: exit simulation - done
diff --git a/src/test/test-disarm-idle-lrm2/log.expect b/src/test/test-disarm-idle-lrm2/log.expect
index d0ba96ff..13e3e2a7 100644
--- a/src/test/test-disarm-idle-lrm2/log.expect
+++ b/src/test/test-disarm-idle-lrm2/log.expect
@@ -23,34 +23,17 @@ info     20    node1/crm: recover service 'vm:102' from fenced node 'node2' to n
 info     20    node1/crm: service 'vm:102': state changed from 'recovery' to 'started'  (node = node1)
 info     22    node2/crm: status change wait_for_quorum => slave
 info     24    node3/crm: status change wait_for_quorum => slave
+info     25    node3/lrm: got lock 'ha_agent_node3_lock'
+info     25    node3/lrm: status change wait_for_agent_lock => active
+info     25    node3/lrm: service vm:103 - start migrate to node 'node2'
+info     25    node3/lrm: service vm:103 - end migrate to node 'node2'
 info     40    node1/crm: node 'node2': state changed from 'unknown' => 'online'
 info     40    node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
-info     60    node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
-info     80    node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
-info    100    node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
-info    120    node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
-info    140    node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
-info    160    node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
-info    180    node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
-info    200    node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
-info    220    node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
-info    240    node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
-info    260    node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
-info    280    node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
-info    300    node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
-info    320    node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
-info    340    node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
-info    360    node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
-info    380    node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
-info    400    node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
-info    420    node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
-info    440    node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
-info    460    node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
-info    480    node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
-info    500    node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
-info    520    node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
-info    540    node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
-info    560    node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
-info    580    node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
-info    600    node1/crm: deferring disarm - service 'vm:103' is in 'migrate' state
+info     40    node1/crm: service 'vm:103': state changed from 'migrate' to 'started'  (node = node2)
+info     45    node3/lrm: HA disarm requested, releasing agent lock and watchdog
+info     45    node3/lrm: status change active => wait_for_agent_lock
+info     60    node1/crm: disarm: freezing service 'vm:102' (was 'started')
+info     60    node1/crm: disarm: freezing service 'vm:103' (was 'started')
+info     60    node1/crm: all LRMs disarmed, HA stack is now fully disarmed
+info     60    node1/crm: HA stack fully disarmed, releasing CRM watchdog
 info    620     hardware: exit simulation - done
-- 
2.47.3





Thread overview: 7+ messages
2026-05-19 14:38 [PATCH-SERIES ha-manager 0/2] make idle LRMs resolve leftover moving HA resources while disarmed Daniel Kral
2026-05-19 14:38 ` [PATCH ha-manager 1/2] test: add disarm test cases for idle lrms with transient ha resources Daniel Kral
2026-05-19 14:38 ` Daniel Kral [this message]
2026-05-19 16:00   ` [PATCH ha-manager 2/2] make idle LRMs resolve leftover moving HA resources while disarmed Fiona Ebner
2026-05-19 14:47 ` [PATCH-SERIES ha-manager 0/2] " Daniel Kral
2026-05-19 16:00   ` Fiona Ebner
2026-05-19 20:11 ` applied: " Thomas Lamprecht
